Wednesday, March 9, 2011

One Way Descriptor Matching

It might be useful to know that Lepetit and Fua, who co-authored the Ferns and Randomized Trees key-point classifier techniques, are also contributors to the One Way Descriptor paper.

The team devises a new patch descriptor that combines offline and online training. No descriptor extraction is needed for a run-time query, which is why it is called 'One-Way'. The goal is to save time in feature-point matching and real-time object tracking from video. The team's experience with SLAM suggests that this technique works well on objects that lack distinguishing texture.

Training: Try to find an image with a frontal view of the object to track. The idea is to train a classifier with the same set of feature points viewed from multiple poses. The key-point patches come from only a few input images and are expanded into many warped versions. In the end, a key-point patch is represented by a set of mean-patches, each of which represents one single pose. At each single pose, the image is again expanded into many poses, only this time the variations are small and centred on the associated pose. A mean-patch is the 'average' of patches that differ a little bit around that pose.
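To make the 'average of slightly different patches' idea concrete, here is a minimal sketch in plain Python. It is not the paper's or OpenCV's code: the small warps are replaced by one-pixel translations (an assumption made only to keep the example short; the real method uses small affine/perspective perturbations), and the names shift_patch and mean_patch are hypothetical.

```python
def shift_patch(patch, dx, dy):
    """Translate a patch by (dx, dy) pixels, clamping at the borders.

    Stand-in for a small warp around one pose (assumption, see above).
    """
    h, w = len(patch), len(patch[0])
    return [[patch[min(max(y - dy, 0), h - 1)][min(max(x - dx, 0), w - 1)]
             for x in range(w)]
            for y in range(h)]


def mean_patch(patch, small_warps):
    """Average the patch over a set of small warps around one pose."""
    h, w = len(patch), len(patch[0])
    acc = [[0.0] * w for _ in range(h)]
    for dx, dy in small_warps:
        warped = shift_patch(patch, dx, dy)
        for y in range(h):
            for x in range(w):
                acc[y][x] += warped[y][x]
    n = len(small_warps)
    return [[v / n for v in row] for row in acc]


# Toy 4x4 patch with a single bright pixel at row 1, column 1.
patch = [[0.0] * 4 for _ in range(4)]
patch[1][1] = 9.0

# Nine small warps: every shift of at most one pixel in x and y.
small_warps = [(dx, dy) for dx in (-1, 0, 1) for dy in (-1, 0, 1)]
mp = mean_patch(patch, small_warps)
```

The bright pixel gets smeared evenly over its 3x3 neighbourhood, which is why the mean-patch looks like a blurred version of the original and tolerates small pose errors.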

Matching: Directly compare an incoming patch with the mean-patches from all the poses of all the patches in the database. (How does that work? Comparing pixels to PVs?) The patches are normalized before comparison. (Some heuristics, such as a K-D tree, speed up the search.) The search returns not only a mean-patch but also its associated 'coarse' pose.
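As I understand it, the matching step boils down to a nearest-neighbour search over normalized mean-patches. The sketch below is a brute-force version in plain Python under several assumptions of mine: normalization is taken to be zero-mean, unit-length (the exact scheme is not stated in these notes), the distance is Euclidean, and the function names and the database layout of (keypoint_id, pose_id, mean_patch) tuples are hypothetical. The distance threshold mirrors the matching criterion used by the demo.

```python
import math


def normalize(patch):
    """Flatten a patch, subtract its mean, and scale it to unit length."""
    flat = [v for row in patch for v in row]
    mean = sum(flat) / len(flat)
    centered = [v - mean for v in flat]
    norm = math.sqrt(sum(v * v for v in centered)) or 1.0  # avoid dividing by 0
    return [v / norm for v in centered]


def nearest_mean_patch(query, database, max_distance):
    """Return (keypoint_id, pose_id, distance) of the closest mean-patch,
    or None when even the best candidate is farther than max_distance."""
    q = normalize(query)
    best = None
    for keypoint_id, pose_id, mean_patch in database:
        m = normalize(mean_patch)
        dist = math.sqrt(sum((a - b) ** 2 for a, b in zip(q, m)))
        if best is None or dist < best[2]:
            best = (keypoint_id, pose_id, dist)
    if best is not None and best[2] <= max_distance:
        return best
    return None


# Toy database: two poses of key-point 0 and one pose of key-point 1.
database = [
    (0, 0, [[1.0, 2.0], [3.0, 4.0]]),
    (0, 1, [[4.0, 3.0], [2.0, 1.0]]),
    (1, 0, [[1.0, 0.0], [0.0, 1.0]]),
]

# Same pattern as key-point 0 / pose 0, only brighter.
match = nearest_mean_patch([[10.0, 20.0], [30.0, 40.0]], database, 0.5)

# A flat patch carries no pattern and should be rejected by the threshold.
no_match = nearest_mean_patch([[5.0, 5.0], [5.0, 5.0]], database, 0.5)
```

Because the query is normalized, the brighter copy still lands on key-point 0 at pose 0, and the pose index comes back for free with the match. A real implementation would replace the linear scan with something like a K-D tree.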

Speed up the calculation of mean-patches
The authors use a linear method of computing mean-patches, which they consider preferable to other blurring techniques such as Gaussian. The perspective transforms needed to make the mean-patches take up most of the training time, which makes the process too slow for online training. According to the authors, it takes 300 patches to compute a mean that gives good results. A 'math trick' allows the training to be split into two parts.

Offline Training
Principal Component Analysis is used for offline training. A reference patch is decomposed into a mean and L principal vectors (L is user-defined). So instead of warping the image patch as a pixel array, the warps are applied to the mean and the principal vectors. The mean-patch is a weighted sum of the average-warped mean and the average-warped PVs.
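My reading of the 'math trick', written out. The notation here is my own and not taken from the paper: p is the reference patch, w(p, H) is the patch warped by transform H, H_{h,j} are the N small warps around pose h, \bar{v} is the PCA mean, and v_1 ... v_L are the principal vectors. The only property used is that warping with a fixed H is linear in the pixel values.

```latex
% Notation is an assumption of these notes, not the paper's.
\begin{align*}
\bar{p}_h &= \frac{1}{N}\sum_{j=1}^{N} w(p, H_{h,j})
  && \text{(mean-patch for pose } h\text{)} \\
p &\approx \bar{v} + \sum_{l=1}^{L} \alpha_l v_l,
  \qquad \alpha_l = v_l^{\top}(p - \bar{v})
  && \text{(PCA approximation)} \\
\bar{p}_h &\approx
  \underbrace{\frac{1}{N}\sum_{j=1}^{N} w(\bar{v}, H_{h,j})}_{\text{offline}}
  + \sum_{l=1}^{L} \alpha_l
  \underbrace{\frac{1}{N}\sum_{j=1}^{N} w(v_l, H_{h,j})}_{\text{offline}}
  && \text{(by linearity of } w\text{)}
\end{align*}
```

The two bracketed averages do not depend on the patch p, so they can be computed once offline. Only the L coefficients \alpha_l depend on p.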

Online Training
With the offline training doing the heavy lifting, the mean-patch calculation now only requires time proportional to the number of PV components, not the number of 'small warps'. The major work left in online training is to deduce the 'weights' for each new patch. The patch is projected into the eigenvector space to solve for a set of coefficients (weights). These are then used to compute the mean-patch as a linear sum. (But which feature-point-pose eigenvector space to use?!)
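The online step can be checked on a toy example. This is a sketch in plain Python, not the OpenCV implementation: patches are flattened to 4-element lists, the 'small warps' are cyclic shifts (an assumption chosen only because they are linear, like real warps), the two principal vectors are made up, and all function names are hypothetical.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def avg_warp(vec, shifts=(-1, 0, 1)):
    """Average of cyclic shifts; a linear stand-in for averaging small warps."""
    n = len(vec)
    return [sum(vec[(i - s) % n] for s in shifts) / len(shifts)
            for i in range(n)]


def project(patch, pca_mean, components):
    """Weights of (patch - pca_mean) on each orthonormal principal vector."""
    centered = [p - m for p, m in zip(patch, pca_mean)]
    return [dot(centered, v) for v in components]


def online_mean_patch(weights, warped_mean, warped_components):
    """Linear sum: average-warped mean plus weighted average-warped PVs."""
    out = list(warped_mean)
    for w, wv in zip(weights, warped_components):
        out = [o + w * c for o, c in zip(out, wv)]
    return out


# Made-up PCA basis: a mean and two orthonormal principal vectors.
pca_mean = [1.0, 1.0, 1.0, 1.0]
components = [[0.5, 0.5, -0.5, -0.5],
              [0.5, -0.5, 0.5, -0.5]]

# Offline: warp and average the basis once (the expensive part).
warped_mean = avg_warp(pca_mean)
warped_components = [avg_warp(v) for v in components]

# Online: a new patch only needs a projection and a weighted sum.
patch = [1.5, 2.5, -0.5, 0.5]  # equals pca_mean + 2*v0 - 1*v1
weights = project(patch, pca_mean, components)
fast = online_mean_patch(weights, warped_mean, warped_components)

# Reference: warp and average the patch itself (the slow way).
slow = avg_warp(patch)
```

Here fast and slow agree because the toy patch lies exactly in the span of the basis. With a real patch and a truncated basis of L vectors the result is only an approximation, and the cost of the online step grows with L rather than with the number of small warps.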

Demo Application (one_way_sample.cpp)
The demo code makes use of OneWayDescriptorBase (not the OneWayDescriptorMatcher).

Offline phase: Build the mean-patches from a set of 2 images (the same chessboard viewed from 2 different angles). The number of 'dimensions' (L) is set to 100 by default. The image patches are 24x24. The OpenCV implementation uses SURF to detect key-points in the training images. It builds 2 versions of the data, the second one at half the specified patch dimensions (12x12). The paper mentions something about this improving consistency. There are 50 random poses for each patch. The result is saved to the pca.yml file and then loaded back in as PCA data, in the form of an array of OneWayDescriptor. So far I have been unable to find where the number of 'small pose changes' from which the 'mean' is computed is defined.
Online phase: The SURF detector is used to detect key-points in the reference input image. The OneWayDescriptorBase then computes One-Way Descriptors for these key-points.

The SURF detector is also used to find key-points in the second (incoming) image. The incoming key-points are queried one by one against the first (reference) key-points with OneWayDescriptorBase::FindDescriptor(). The matching criterion is a distance threshold. The pose is also available.
At the end a correspondence map will be drawn and displayed.

  • 42 keypoints detected from the training images (scene_l.bmp, scene_r.bmp)
  • Took 2 hours to create 100 PCA components.
  • Reference image descriptors prepared in 2.1 ms
  • Matching 37 keypoints takes 2.2 ms [result is good, but I guess it's too simple]
Tried matching box.png(train) and box_in_scene.png(query): 72 keypoints
  • Beware: pca.yml was written to the training directory but loaded from the working directory.
  • The matching result is so-so: 1/3 of the matches are false.
Tried img1.ppm and img2.ppm from the Oxford VGG Graffiti (graf) set: 502 keypoints
  • Took several minutes to do matching.
  • Cannot tell if it is good or not with so many correspondences on screen.
Demo (generic_descriptor_match.cpp)
  • Able to load the PCA data from the previous example (a bug was discovered along the way).
  • It does not work, or at least not the way it is supposed to. First, GenericDescriptorMatcher::match() calls OneWayDescriptorMatcher::clone() with the parameter indicating that training was not performed. That means the OneWayDescriptorMatcher is instantiated again, discarding the PCA data loaded with GenericDescriptorMatcher::create(). I noticed this when the training part took too long. Also, the function Initialize() is called instead of InitializeFast() inside OneWayDescriptorBase::InitializeDescriptor().

One more note: There was trouble writing the PCA file (150MB) on the PC. It stopped (without warning) at about 7MB, yet the program was still able to do the matching (I suppose the data is cached). There was no such problem when running on the notebook.

Further note: The paper is too brief for me to understand fully, especially on how to learn new key-points. It seems kind of like magic to train on unrelated images to produce mean-patches that are then used to compare 2 other pairs of images. Is it supposed to work like this?!

Real-Time Learning of Accurate Patch Rectification, Hinterstoisser et al.

Future Reading
  • Simultaneous recognition and homography extraction of local patches with a simple linear classifier, Hinterstoisser et al.
  • Online learning of patch perspective rectification for efficient object detection, Hinterstoisser et al.


  1. Can you post the demo application source files please?

  2. I made a mistake, it should be one_way_matching.cpp, not one_way_matching.cpp. It is one of the OpenCV 2.2 sample programs.