Thursday, March 24, 2011

What are my GUI options?

Evaluating my GUI options for Windows PC applications
  1. Windows Forms (.Net) with C++ or C#; .Net port of Emgu Project
  2. MFC (Native) - Visual Studio Standard/Professional and up
Nokia -> Digia?
  1. Qt: Visual Studio - Qt-add-in to take care of the Qt-specific compilation process. Visual Studio Standard/Pro and up.
  2. Qt: QT Creator
Win32++: Does it have widget like OpenFileDialog?

1. HighGUI + Qt as back-end
2. There is a window_gtk.cpp, wonder if it works on Windows also.

Initial thoughts
Based on what I read in various internet discussions, I could use another GUI framework while keeping the highGUI module for image and (camera) video encode/decode/read/write. The only thing I would need to write is an implementation of cvDrawImage().
I like C# but am afraid that .Net would slow down execution. I also wonder whether a C# port would introduce another set of issues/bugs.
The advantage of choosing Qt is that cvDrawImage() is already there in window_QT.cpp.

Going with Qt

Problem with MSVC OpenCV DLL
Installed QTCreator. Created a simple app that opens a file with the Qt file dialog. It uses the OpenCV API to open an image file. Result: the C API (cvLoadImageM()) compiled and ran fine. Using the corresponding C++ API gave a linking error (undefined reference to cv::imread()). Made the case simpler by trying cvGetTickCount(). Again the C API worked while the C++ function cv::getTickCount() did not. This is the configuration I used for building the application:
Tried to fix the problem with 'reimp' MinGW utility like this:
Unfortunately, 'reimp' did not like opencv_core220d.lib, giving the error 'invalid or corrupt import library'. Someone filed a bug report on this; not sure if that's the same case.
Someone points to this article, which describes the common issues of linking C++ libraries built with different compilers:
Giving up on this for now.

Debugger failed to step into OpenCV DLL
Even though calling the OpenCV C API worked, I was unable to get the debugger to step into OpenCV functions. Installed 32-bit Debugging Tools for Windows (x86). Enabled CDB in the QTCreator Options menu. Followed instructions here: ; Still cannot get it to work.

OpenCV MinGW Build
Since QTCreator compiles with the MinGW toolchain, I rebuilt OpenCV 2.2 with the latest stable version of MinGW (not the one packaged with QTCreator). Instructions here: ; Encountered an error linking with VideoInput.a, and the workaround is described here:
OpenCV then built successfully. There were some compilation errors in the C/C++ samples. g++ doesn't like:
vector<vector<Point>> cpoints, because it parses the closing >> as a right-shift operator. So I changed it to vector<vector<Point> > cpoints
Fixes are trivial. I am now able to run both the adaptiveskindetector and camshiftdemo from the command-line. It can even read from the webcam right away!

MinGW versions
QtCreator uses its own MinGW, with gcc version 4.4. The latest MinGW that I used to build OpenCV has gcc 4.5.2. The mismatch caused problems at runtime. I fixed it by setting the PATH such that the newer MinGW DLLs are picked up.
Debugger: Unusably slow stepping in OpenCV DLLs. Wonder why.... Related to this?!
The qmake manual says automatic code completion and syntax highlighting will be available for external libraries after they are declared in the INCLUDEPATH and LIBS variables. I noticed that it only works if the directory is assigned to INCLUDEPATH as a literal string, not as a variable value $().

At the end...
I am able to get the application to open an image and display it correctly. It uses the OpenCV C++ API to load the image and swaps the color channels from BGR to RGB. The display part uses QGraphicsScene and QGraphicsView to show the QImage.
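A minimal sketch of the channel swap mentioned above, written against a plain byte buffer rather than the actual cv::Mat or QImage types. The function name swapRedBlue and the tightly packed 3-bytes-per-pixel layout (no row padding) are assumptions for illustration; in OpenCV itself cv::cvtColor with CV_BGR2RGB does the same job.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Swap the first and third byte of every 3-byte pixel in place (BGR <-> RGB).
// `pixels` is a hypothetical tightly packed buffer with no row padding.
void swapRedBlue(std::vector<std::uint8_t>& pixels) {
    assert(pixels.size() % 3 == 0);  // whole pixels only
    for (std::size_t i = 0; i + 2 < pixels.size(); i += 3) {
        std::swap(pixels[i], pixels[i + 2]);
    }
}
```

The swap is needed because OpenCV stores color images in BGR order while an RGB888-style Qt image expects RGB; the green channel stays where it is.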

Here is a hyperlink found in window_QT.hpp, a Qt article on how to interactively pan and zoom images smoothly

Wednesday, March 9, 2011

One Way Descriptor Matching

It might be useful to know that Lepetit and Fua, who co-authored the Ferns and Randomized Trees Key-Point Classifier techniques, are also contributors to the One Way Descriptor paper.

The team devises a new patch descriptor that combines both offline and online training. Descriptor extraction is not needed for a run-time query. That is why it is 'One-Way'. The goal was to save time in feature-point matching and real-time object-tracking from video. The team's experience with SLAM suggests that this technique works well for objects that lack distinguishing texture.

Training: Try to find an image with a frontal view of the object-to-track. The idea is to train a classifier with the same set of feature points viewed from multiple poses. The key-point patches come from only a few input images. They are expanded into many warped versions. At the end, a key-point patch is represented by a set of mean-patches. Each mean-patch represents one single pose. At each single pose, the image is again expanded into many poses. Only this time the variations are small and centered on the associated pose. A mean-patch is the 'average' of patches that 'differ' a little bit around that pose.

Matching: Directly compare an incoming patch with the mean-patches from all the poses of all the patches in the database. (How does that work? Comparing pixels to PVs?) The patches are normalized before comparison. (Some heuristics, like a K-D Tree, speed up the search.) The search returns not only a mean-patch but also its associated 'coarse' pose.

Speed up the calculation of mean-patches
The authors use a linear method of computing mean-patches, which makes it preferable to other blurring techniques such as Gaussian. The perspective transforms needed to make the mean-patches take up most of the training time and are too slow to be used for online training. According to the authors, it takes 300 patches to compute the mean in order to get good results. A 'math trick' allows the training to be split into two parts.

Offline Training
Principal Components Analysis is used for offline training. A reference patch is broken down into a mean and L principal vectors (L is user-defined). So instead of warping the image patch as a pixel array, the warping is applied to the mean and the principal vectors. The mean-patch is a weighted sum of the average-warped-mean and the average-warped-PVs.

Online Training
With the offline training having done the heavy lifting, the mean-patch calculation now only requires time proportional to the number of PV components, not the number of 'small-warps'. The major work left in online training is to deduce the 'weights' for each new patch. The patch is projected into the eigenvector space to solve for a set of coefficients (weights). These are used to compute a mean patch as a linear sum. (But which feature-point-pose eigenvector-space to use?!)
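A minimal sketch of the linear sum as I understand it from the description above, not the OpenCV implementation. It assumes patches are flattened to vectors, that the principal vectors are orthonormal, and that the averaged warps of the mean and of each PV were precomputed offline. The names Patch, projectOntoPCA and meanPatchForPose are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

using Patch = std::vector<double>;  // flattened patch, one value per pixel

// Online step: project (patch - mean) onto each principal vector to get
// the weights. Assumes the principal vectors are orthonormal.
std::vector<double> projectOntoPCA(const Patch& patch, const Patch& mean,
                                   const std::vector<Patch>& pvs) {
    assert(mean.size() == patch.size());
    std::vector<double> weights;
    for (const Patch& pv : pvs) {
        assert(pv.size() == patch.size());
        double w = 0.0;
        for (std::size_t i = 0; i < patch.size(); ++i)
            w += (patch[i] - mean[i]) * pv[i];
        weights.push_back(w);
    }
    return weights;
}

// Linear sum for one pose: averaged-warped mean plus the weighted
// averaged-warped principal vectors (both precomputed offline).
Patch meanPatchForPose(const Patch& avgWarpedMean,
                       const std::vector<Patch>& avgWarpedPVs,
                       const std::vector<double>& weights) {
    assert(avgWarpedPVs.size() == weights.size());
    Patch out = avgWarpedMean;
    for (std::size_t l = 0; l < weights.size(); ++l) {
        assert(avgWarpedPVs[l].size() == out.size());
        for (std::size_t i = 0; i < out.size(); ++i)
            out[i] += weights[l] * avgWarpedPVs[l][i];
    }
    return out;
}
```

The cost of meanPatchForPose grows with L (the number of PVs) and the patch size only, which is the point of the trick: the 'small-warps' never run at online-training time.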

Demo Application (one_way_sample.cpp)
The demo code makes use of OneWayDescriptorBase (not the OneWayDescriptorMatcher).

Offline phase: Build the mean-patches from a set of 2 images (the same chessboard viewed from 2 different angles). The number of 'dimensions' (L) is set to 100 by default. The image patches are of size 24x24. The OpenCV implementation uses SURF to detect key-points from the training images. It builds 2 versions; the second one uses half the specified patch dimensions (12x12). The paper mentions something about this improving consistency. There will be 50 random poses for each patch. The result is saved to a pca.yml file, and it is loaded back in as an array of OneWayDescriptor. So far I am unable to find where the number of 'small-pose-changes' from which the 'mean' is computed is defined.
Online phase: SURF detector is used to detect key-points from the reference input image. The OneWayDescriptorBase would compute One-Way Descriptors for these key-points.

The SURF detector is also used to find key-points in the second (incoming) image. Incoming key-points are queried one-by-one against the first (reference) key-points with OneWayDescriptorBase::FindDescriptor(). The matching criterion is a distance threshold. The pose will also be available.
At the end a correspondence map will be drawn and displayed.

  • 42 keypoints detected from the training images (scene_l.bmp, scene_r.bmp)
  • Took 2 hours to create 100 PCA components.
  • Reference image descriptors prepared in 2.1 ms
  • Matching 37 keypoints takes 2.2 ms [ result is good but i guess it's too simple]
Tried matching box.png(train) and box_in_scene.png(query): 72 keypoints
  • Beware pca.yml was produced under training directory while it was loaded from working directory.
  • The matching result is so-so, 1/3 of them are false-matches.
Tried img1.ppm and img2.ppm set from Dundee Graf set: 502 keypoints
  • Took several minutes to do matching.
  • Cannot tell if it is good or not with so many correspondences on screen.
Demo (generic_descriptor_match.cpp)
  • Able to load the PCA data (with a bug discovered) from previous example.
  • It does not work, or at least not the way it is supposed to. First, GenericDescriptorMatcher::match() calls OneWayDescriptorMatcher::clone() with the parameter indicating that the training was not performed. That means the OneWayDescriptorMatcher is instantiated again, discarding the PCA data loaded with GenericDescriptorMatcher::create(). I noticed this when the training part took too long. Also, the function Initialize() is called instead of InitializeFast() inside OneWayDescriptorBase::InitializeDescriptor().

More notes: There is trouble writing the PCA file (150MB) from the PC. It stops (without warning) at about 7MB. It was still able to do the matching (I suppose the data is cached). No such problem running from the notebook.

Further note: The paper is too brief for me to understand totally, especially on how to learn new key-points. It seems kind of like magic: training on unrelated images produces some mean-patches that are then used to compare 2 other pairs of images. Is it supposed to work like this?!

Real-Time Learning of Accurate Patch Rectification, Hinterstoisser et al.

Future Reading
  • Simultaneous recognition and homography extraction of local patches with a simple linear classifier, Hinterstoisser et al.
  • Online learning of patch perspective rectification for efficient object detection, Hinterstoisser et al.

Sunday, March 6, 2011

Random Ferns Classifier - Semi-Naive-Bayes

The team that proposed training Randomized Trees on Binary Descriptors for fast key-point matching is trying another approach to speed up training and matching. This time they use Semi-Naive-Bayes classification instead of Randomized Trees. The word 'semi' here means that not all the input elements are assumed independent. The input vector is divided into groups. Only the probability densities among groups are assumed to be independent. The grouping is selected by randomized permutation. The input vector is extracted from a key-point region using binary-intensity-differences. 300 of them are extracted from a 32x32 patch region. A typical group-size is 11, so there will be about 28 groups. Each group is a 'Fern', so it's called a Fern Classifier / Matcher. An input patch is assigned by the SNB classifier to one of the classes (the set of stable key-points). For each class label, a product of the per-Fern probabilities is calculated given that the class label is true. The input patch is classified to the class with the highest value.

Training Phase: Very similar to Randomized Trees. Only a few training images are required. A set of stable key-points is chosen by transforming the input images in many ways (300). These stable key-points become the class labels. Each image is then transformed again many more times (1000) to obtain the view-set. The classifier keeps a count of each Fern pattern (vector of binary-intensity-differences of a group of pixel-pairs) for each associated class label. The counts are used to set the probabilities.
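A minimal sketch of the counting and classification described above, not the OpenCV FernClassifier. It assumes the binary intensity tests were already evaluated and grouped, so the input is just a bit vector with fernSize bits per Fern. The struct name, the initial count of 1 per bin, and the use of log-probability sums in place of a raw product are my assumptions.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <limits>
#include <vector>

// Hypothetical minimal fern classifier over precomputed binary test results.
struct FernClassifier {
    std::size_t numFerns, fernSize, numClasses;
    // counts[fern][pattern][class], initialised to 1 so that a pattern never
    // seen in training does not zero out the whole product.
    std::vector<std::vector<std::vector<double>>> counts;

    FernClassifier(std::size_t ferns, std::size_t size, std::size_t classes)
        : numFerns(ferns), fernSize(size), numClasses(classes),
          counts(ferns, std::vector<std::vector<double>>(
                            std::size_t(1) << size,
                            std::vector<double>(classes, 1.0))) {}

    // Pack the bits belonging to one fern into an integer pattern index.
    std::size_t pattern(const std::vector<bool>& bits, std::size_t fern) const {
        assert(bits.size() == numFerns * fernSize);
        std::size_t idx = 0;
        for (std::size_t b = 0; b < fernSize; ++b)
            idx = (idx << 1) | (bits[fern * fernSize + b] ? 1u : 0u);
        return idx;
    }

    // Training: count each fern pattern seen for the given class label.
    void train(const std::vector<bool>& bits, std::size_t label) {
        assert(label < numClasses);
        for (std::size_t f = 0; f < numFerns; ++f)
            counts[f][pattern(bits, f)][label] += 1.0;
    }

    // Classification: sum log P(pattern | class) over ferns, return argmax.
    std::size_t classify(const std::vector<bool>& bits) const {
        std::size_t best = 0;
        double bestScore = -std::numeric_limits<double>::infinity();
        for (std::size_t c = 0; c < numClasses; ++c) {
            double score = 0.0;
            for (std::size_t f = 0; f < numFerns; ++f) {
                double total = 0.0;
                for (const auto& perClass : counts[f]) total += perClass[c];
                score += std::log(counts[f][pattern(bits, f)][c] / total);
            }
            if (score > bestScore) { bestScore = score; best = c; }
        }
        return best;
    }
};
```

The table for each Fern has 2^fernSize rows per class, which is why memory grows quickly with Fern size but only linearly with the number of Ferns.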

The training and testing for 2D matching is done on a video frame sequence. The frame with upright front facing object is chosen for training.

An implementation decision has to be made on how to divide up the input vector into groups, i.e. the Fern size. Increasing the Fern size yields better handling of 'variations'. (Is this referring to perspective and lighting variants?) Care must be taken with respect to memory usage. The amount required to store the distributions increases quickly with Fern size. It would also need more training samples (to build distributions of a bigger set of possible values?). On the other hand, increasing the number of Ferns while keeping the same Fern size (small?) (increased vector size?) gives a better recognition rate. This comes with only a linear memory increase, but the run-time cost increases (relevant?!).

There is a paper on mobile AR application using Ferns - Citation 34 "Pose Tracking from Natural Features on Mobile Phones", Wagner et al.

Demo (find_obj_ferns)

This demo uses the LDetector class to detect object keypoints. And it uses PlanarObjectDetector class to do matching. FernsClassifier is one of PlanarObjectDetector members.
  1. Determine the most stable key-points from the object image (by recovering the key-points from affine-transformed object images).
  2. Build 3-level Image Pyramid for object image.
  3. Train the stable key-points with FernsClassifier and save the result to a file. The image pyramid is also supplied for training. Parameters include Ferns size, number of Ferns, Patch size, and Patch Generator.
  4. Load the PlanarObjectDetector from the file obtained from the last step.
  5. Use the LDetector to find keypoints from the scene image pyramid. Match them against the object key-points using the PlanarObjectDetector. The results are represented as index-pairs between the model-keypoints and the scene-keypoints. The model-keypoints are the stable keypoints of object-image. The list is available from the loaded PlanarObjectDetector instance.
  6. Draw the correspondences on screen.
More notes: Object and Scene Images are loaded as grayscale. And they are smoothed with a Gaussian Filter.

Demo (generic_descriptor_match)

The simplest way to exercise Ferns matching is to use the FernDescriptorMatcher class. The demo program is very straightforward. The find_obj_ferns demo app is more informative.

Results and Observations

Using ball.pgm (book is pictured sideways) from Dundee test set as training image.

In most cases, it is able to find and correctly locate the object in the scene images it appears in. The worst result is TestImg010.jpg. It cannot locate the whole upside-down book. I suppose that is because of the lack of keypoints detected. The book title "Rivera" is obscured.

Tested for false positives using TestImg02.pgm. The detector returned a status of 'found' but it was obviously wrong. Half of it is 'out-of-the-picture'.

Fast Keypoint Recognition using Random Ferns, Ozuysal et al.

Calonder Descriptor, Generic Trees and Randomized Trees

Summary of both papers (see Reading)

The first paper proposes using ML classification techniques for fast keypoint matching. Time is spent in offline training, resulting in a shorter matching time. They found that Randomized Trees are a good candidate: they support multi-class classification and, by experiment, give good matching results. If the node-split criteria are also chosen randomly (Extremely Randomized Trees?), not only is the training time reduced, but the matching results are better too. The forest classifies an input key-point against the set of training key-points. So given a key-point from an input image, the forest is able to tell whether it matches one of the original keypoints (or none-of-the-above). The training key-points are chosen by their ability to be recovered from a set of distorted views. Additionally, they found that using a simple key-point detector (that inspired the BRIEF descriptor?) is good enough to achieve illumination invariance. The paper devises a way to achieve invariance with this simple descriptor by 'reproducing' each training image multiple times into a 'view set'. Each original image is randomly rotated and scaled into 100 separate images for the 'view set'. All images from the view-set are used to train the classifier so that it can detect the same keypoint patch from various viewpoints at run-time. Scale invariance is improved by building image pyramids from which key-points are also extracted. (Small) position invariance is enhanced by injecting random 'noise' (in terms of positions?) into the view-set. The training patches are placed on random backgrounds so that the classifier can pick out the trained key-points from a cluttered background. A Binary Intensity Descriptor like this, together with the view-set, performs very well against the sophisticated SIFT descriptor, provided that the random trees are able to divide the keypoint space up to a certain granularity.

The second paper is a continuation of the first paper. The focus this time is on recognizing objects by key-point matching fast enough to use in real-time video. An important application is SLAM, where offline learning is not practical because objects cannot be learned ahead of time. The authors propose a Generic Tree Algorithm. First, a randomized tree classifier is trained with a set of key-points called the 'base-set'. The key-points are selected from only a small number of training images. Similarly, the images are warped in many ways for training to achieve invariance. At run time, a query key-point is passed down the classifier, resulting in a set of probabilities for the n classes. This set is treated as a Signature (descriptor-vector) for this key-point. The Signature has n elements (corresponding to the n classes). Each element is a thresholded value of the class output. The matching between key-point signatures is done using Euclidean Distance. The theory is that any new pair of corresponding keypoints will have similar classifier outputs, even though they do not belong to the base-set.
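A minimal sketch of signature matching as described above, not the OpenCV CalonderDescriptor code. It assumes the per-class outputs of the forest are already available as a vector of doubles. The names Signature, thresholdSignature and nearestSignature, and the choice of zeroing values below the threshold, are assumptions for illustration.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <limits>
#include <vector>

using Signature = std::vector<double>;  // one element per base-set class

// Zero out class responses below `threshold` (a hypothetical parameter).
Signature thresholdSignature(const Signature& classOutputs, double threshold) {
    Signature s = classOutputs;
    for (double& v : s)
        if (v < threshold) v = 0.0;
    return s;
}

// Euclidean (L2) distance between two signatures of equal length.
double euclideanDistance(const Signature& a, const Signature& b) {
    assert(a.size() == b.size());
    double sum = 0.0;
    for (std::size_t i = 0; i < a.size(); ++i) {
        const double d = a[i] - b[i];
        sum += d * d;
    }
    return std::sqrt(sum);
}

// Brute-force search: index of the closest reference signature,
// or -1 when there are no references.
std::ptrdiff_t nearestSignature(const Signature& query,
                                const std::vector<Signature>& references) {
    std::ptrdiff_t best = -1;
    double bestDist = std::numeric_limits<double>::infinity();
    for (std::size_t i = 0; i < references.size(); ++i) {
        const double d = euclideanDistance(query, references[i]);
        if (d < bestDist) {
            bestDist = d;
            best = static_cast<std::ptrdiff_t>(i);
        }
    }
    return best;
}
```

Neither the query nor the reference key-point has to belong to the base-set; only their responses to the same trained forest are compared.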


OpenCV defines a CalonderDescriptor class that can produce the Signature of a query point-of-interest from a given Randomized Trees classifier. The RTreeClassifier class represents the forest and is trained with a given base-set and a patch-generator. The base-set is basically a collection of key-point locations from a training image. The size of the base-set is the number of classes the forest is trained to classify. PatchGenerator objects are used to warp an image using the specified ranges: angles for rotation, intensities for background colors, deltas for position noise, lambda for scales.

Demo code (find_obj_calonder.cpp)

Dataset - Dundee University set.

The demo code trains a Randomized Tree Classifier of 48 trees, each 9 levels deep. It took more than 30 minutes to train 176 keypoints (selected out of 1423) from a single image. PatchGenerator creates 100 views for each key-point. The classifier is saved to a file after training. At run-time, it uses SURF to pick interest points from the reference and query images, extracts Calonder Descriptors with the classifier and performs Brute-Force (L2) matching. By default the input image from the command-line argument is used as the reference image. The query image is a warped version of itself. All images are converted to gray-scale for training and tests.

Wrote another test function so that instead of warping the input image, user supplies another image for matching.

Results and Observations
  • Trained (one-at-a-time) with upright object-only image: book1, book2, ball
  • Finding the object image from the object-in-a-bigger-picture images did not do well. Many false matches.
  • Most time spent on loading the classifier data file (~16MB).
Used one of the trained classifiers and ran the default test-case: matching between a query image and its warped version. The testing images are not related to the training image. The warping is not too severe. The results are satisfactory.

Site for image databases and associated 3D model (stereo-vision):

  • Keypoint Recognition with Randomized Trees, Lepetit & Fua
  • Keypoint Signatures for Fast Learning and Recognition, Calonder, Lepetit & Fua

Opponent Color Space Descriptors, Evaluations

The van de Sande, Gevers and Snoek paper proposes using color to complement the existing intensity-based corner detection and salient region description. The goal is to be able to find more salient key-points and to represent the surrounding region with a discriminative descriptor for better matching and object recognition.

The idea is basically to extend the current single-channel methods to support multiple channels. Images get pre-processed with a color-space transform. The authors use Opponent Color Space as an example and describe how to extend the Harris-Laplace Corner Detector. The paper briefly goes over a few color-SIFT descriptors such as OpponentSIFT, WSIFT and rgSIFT.

The authors compared the degree of color-invariance among those color-SIFT methods. Color Invariance - invariant to illumination highlights, shadow, noise.

Opponent Color Transformation Steps:
1. RGB -> Opponent Color Space (O1, O2, O3)
2. Salient Color Boosting - Normalize the Opponent values with weights 0.850, 0.524, 0.065.
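A minimal sketch of the two steps for a single pixel. The opponent transform is the standard one (O1 and O2 carry color, O3 carries intensity). How the three weights are applied is my assumption: here they are simple multipliers on O1, O2, O3 in that order. The function names are hypothetical and this is not the OpenCV code.

```cpp
#include <array>
#include <cassert>
#include <cmath>

// Step 1: convert one RGB pixel to opponent colour space (O1, O2, O3).
std::array<double, 3> rgbToOpponent(double r, double g, double b) {
    assert(r >= 0.0 && g >= 0.0 && b >= 0.0);  // intensities are non-negative
    std::array<double, 3> o;
    o[0] = (r - g) / std::sqrt(2.0);            // red-green
    o[1] = (r + g - 2.0 * b) / std::sqrt(6.0);  // yellow-blue
    o[2] = (r + g + b) / std::sqrt(3.0);        // intensity
    return o;
}

// Step 2: scale each opponent channel by the weights quoted above.
// Assumption: the weights apply to O1, O2, O3 in that order.
std::array<double, 3> boostSaliency(const std::array<double, 3>& o) {
    std::array<double, 3> boosted;
    boosted[0] = 0.850 * o[0];
    boosted[1] = 0.524 * o[1];
    boosted[2] = 0.065 * o[2];
    return boosted;
}
```

A grey pixel (r = g = b) maps to O1 = O2 = 0, so all of its energy lands in the intensity channel, which gets the smallest weight.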

W-SIFT - I guess that 'W' is the W invariant property, which is a ratio of some spatial-differential transformed pixel value. The transformation is Gaussian Color Model. I suppose this property would be part of the descriptor, useful in matching.

OpenCV implements an Opponent Color Conversion, but I cannot find where it does the Saliency Boosting. It does not seem to implement corner detection using multi-channel images. It does support descriptors using separate channels implicitly.

The OpponentDescriptorExtractor expands an existing Descriptor Type with Opponent Color. It does so by repeating the extraction on all 3 channels and concatenating the results into one big descriptor.

Demo (descriptor_extractor_matcher.cpp)
Cannot find dedicated demo for Opponent Color Space, borrowing this generic one as a try-out.

User picks a trio of Detector, Descriptor and Matcher to perform keypoint-matching between 2 images. The second (query) image could come from 2 different sources: 1) user provided image, 2) a 'warped' version synthesized from the first one.
  • Since the Opponent Color Descriptor builds on an intensity-based one, specify the latter by piggy-backing, such as OpponentSURF = OpponentColor + SURF.
  • Cross-Check-Matching: Pick out strong matches by including only those appearing on both forward and backward matching.
  • Optionally draw in-lier matches only: Use the original homography or (RANSAC) approximate one from the strongly matched keypoints. Transform the reference key-points using this H. Query key-points must be within a threshold distance from the corresponding warped key-point in order to be considered an inlier match.
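A minimal sketch of the cross-check filter and the inlier test from the list above, independent of the OpenCV types. It assumes the best match in each direction was already computed. The names crossCheck, Homography, Point2 and isInlier are hypothetical, and the homography is stored row-major in a flat array.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <utility>
#include <vector>

// forward[i]  = index of the best query match for reference keypoint i
// backward[j] = index of the best reference match for query keypoint j
// Keep the pair (i, j) only when both directions agree.
std::vector<std::pair<std::size_t, std::size_t>> crossCheck(
        const std::vector<std::size_t>& forward,
        const std::vector<std::size_t>& backward) {
    std::vector<std::pair<std::size_t, std::size_t>> strong;
    for (std::size_t i = 0; i < forward.size(); ++i) {
        const std::size_t j = forward[i];
        assert(j < backward.size());
        if (backward[j] == i) strong.emplace_back(i, j);
    }
    return strong;
}

struct Point2 { double x, y; };
struct Homography { double m[9]; };  // row-major 3x3 matrix

// Apply H to a point in homogeneous coordinates and divide by w.
Point2 project(const Homography& H, Point2 p) {
    const double w = H.m[6] * p.x + H.m[7] * p.y + H.m[8];
    assert(w != 0.0);
    return {(H.m[0] * p.x + H.m[1] * p.y + H.m[2]) / w,
            (H.m[3] * p.x + H.m[4] * p.y + H.m[5]) / w};
}

// A match is an inlier when the query keypoint lies within maxDist
// of the reference keypoint warped by H.
bool isInlier(const Homography& H, Point2 reference, Point2 query,
              double maxDist) {
    const Point2 warped = project(H, reference);
    return std::hypot(warped.x - query.x, warped.y - query.y) <= maxDist;
}
```

Cross-checking trades match count for reliability: a pair survives only if each keypoint is the other's best match.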
Secondary function of this application demonstrates how to use Evaluation API from OpenCV. Implementation in evaluation.cpp under features2d module.
  • Feature-Detector Evaluation - Given 2 sets of key-points from different viewpoints of the same scene and the homography matrix between them, the evaluator returns the number of correspondences and the repeatability value. Repeatability is basically the ratio of the correspondences to the key-point count. It is determined by analyzing the overlap of the elliptical key-point regions between the query key-point and the projected reference key-point.
  • Generic Descriptor Matcher Evaluation - Given a DescriptorExtractor-DescriptorMatcher pair in the form of a GenericDescriptorMatcher type, two sets of key-points and their homography, the evaluator returns the Recall-Precision Curve. Recall values are the ratio of the Current Correct Matches to Total Correspondences. Precision values are the ratio of Current Correct Matches to Current Total Correspondences. The term 'Current' is used in the sense that each ratio value is associated with a match. They are calculated in order of descending strength (ascending matching distance). A match is Correct if the overlapping region is big enough, just like how the detector is evaluated.
Results using Graf set from Oxford VGG
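A minimal sketch of how such a curve can be built, not the code in evaluation.cpp. It assumes each match already carries its distance and a correct/incorrect flag, and it takes 'Current Total Correspondences' to mean the number of matches examined so far (my reading). The names ScoredMatch, CurvePoint and recallPrecisionCurve are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

struct ScoredMatch { double distance; bool correct; };
struct CurvePoint  { double recall; double precision; };

// Walk the matches from strongest (smallest distance) to weakest and emit
// one (recall, precision) point per match.
std::vector<CurvePoint> recallPrecisionCurve(std::vector<ScoredMatch> matches,
                                             std::size_t totalCorrespondences) {
    assert(totalCorrespondences > 0);
    std::sort(matches.begin(), matches.end(),
              [](const ScoredMatch& a, const ScoredMatch& b) {
                  return a.distance < b.distance;
              });
    std::vector<CurvePoint> curve;
    std::size_t correctSoFar = 0;
    for (std::size_t i = 0; i < matches.size(); ++i) {
        if (matches[i].correct) ++correctSoFar;
        const double recall =
            static_cast<double>(correctSoFar) / totalCorrespondences;
        const double precision =
            static_cast<double>(correctSoFar) / (i + 1);
        curve.push_back({recall, precision});
    }
    return curve;
}
```

The results listed further down report 1-precision rather than precision, so a point from this sketch would be read as (1 - precision, recall).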

Results (img1 vs img2, 3203 vs 3536 keypoints detected)
Strong Matches 988 - Inliers 499
Strong Matches 1323 - Inliers 834

Results (img1 vs warped-img1 3203 vs 2758 keypoints detected)
FD Evaluation (took 5 minutes) - repeatability 0.673, correspondences 1719
Strong Matches 993 - Inlier 565
GDM Evaluation (took 20 minutes)
1-precision = 0.0; recall = 0.000966184
1-precision = 0.1; recall = 0.000966184
1-precision = 0.2; recall = 0.000966184
1-precision = 0.3; recall = 0.000966184
1-precision = 0.4; recall = 0.000966184
1-precision = 0.5; recall = 0.000966184
1-precision = 0.6; recall = 0.00966184
1-precision = 0.7; recall = 0.0995169
1-precision = 0.8; recall = 0.19372
1-precision = 0.9; recall = 0.319324
Strong Matches 1175 - Inlier 761
1-precision = 0.0; recall = 0.00193237
1-precision = 0.1; recall = 0.00193237
1-precision = 0.2; recall = 0.00193237
1-precision = 0.3; recall = 0.00193237
1-precision = 0.4; recall = 0.00289855
1-precision = 0.5; recall = 0.00338164
1-precision = 0.6; recall = 0.00772947
1-precision = 0.7; recall = 0.0144928
1-precision = 0.8; recall = 0.0550725
1-precision = 0.9; recall = 0.241063

  • Color Descriptors for Object Category Recognition, van de Sande, Gevers and Snoek
  • (Opponent Color Space) Boosting Saliency in Color Image Features, Weijer, Gevers.

Wednesday, March 2, 2011

VC++ 2010 Express Migration

Able to run the C/C++ samples from VC++ 2010 Express on Windows 7, with WebCam working too. But only in 32-bit mode. So in CMake configuration choose Visual Studio 10 without x64.

Problem for 64-bit target on OpenCV 2.2
Link error for highgui - see bug 735. The patches have yet to solve the 64-bit build problem for me.
*update* the problem does not appear any more in OpenCV 2.3. Able to build 32-bit and 64-bit  out of the box. Video capture from webcam works with starter_video sample.
*update-2* seems like vcvars64.bat is not installed by VC Express by default, causing OpenCV 2.3 gpu module build to fail with error: configuration file '(null)' could not be found for installation at "C:/Program Files (x86)/Microsoft Visual Studio 10.0/VC/bin/../.."
Solution: follow the simple instructions here to generate the vcvars64.bat.

The VC++ Directory Settings have changed.
For me, the tricky part was finding out that the per-user (all projects) settings cannot be viewed or edited until a project (OpenCV in this case) is opened.
On the property sheet (which was already there in earlier versions of VS):

Not sure if this is a Windows 7 thing, but I ran into a failure when building the OpenCV documentation with the buildall script. Fortunately, people have already solved this issue -
I followed all the steps except the 'Reboot' part and it worked.

Redistributable Binaries
The VS2010 SP1 32-bit redist binaries are installed to Windows\System32 instead of C:\Program Files (x86)\Microsoft Visual Studio 9.0\VC\redist\, and not in a directory structure like x86\Microsoft.VC90.CRT. CMake Config complains about not being able to find the DLLs when the BUILD_PACKAGE option is turned on.

Tuesday, March 1, 2011

Blob Tracking - Video Surveillance Demo

A blob tracking system is included in OpenCV
Code: OpenCV/modules/legacy/
Doc: OpenCV/docs/vidsurv/

The blob-tracking code consists of a pipeline for detecting, tracking and analyzing foreground objects. Judging by the name of its folder, it is meant to be a video surveillance system demo. It not only detects and tracks blobs, it also tries to pick out unusual movements with the analyzer.

The main loop for processing the input video frames is CvBlobTrackerAuto1::Process(). The processing order differs a little from Blob_Tracking_Modules.doc. For example, the BlobDetector is actually run after the BlobTracker and Post-Processing.

Each stage has a few methods to choose from. For example, I could select between FGD and MOG for the FG/BG separation stage.

Some notes from skimming through the code of the following stages:

Blob-Detectors  (BD_CC, BD_Simple)
  • The file header suggests reading literature by Andrew Senior et al (see Resources section).
  • The purpose of this module is to identify blobs that are 'worthy' of tracking. It does so by first filtering out noise from the foreground mask. The main criterion is that the CCs have to move at a reasonable 'speed' (uniform motion). It determines this by keeping a list of candidate connected-components for the last 5 frames. That probably means that the frame-to-frame change of blob location must not exceed a certain amount in order to qualify.
  • I cannot tell the difference between the BD_CC and BD_Simple methods by looking at the code.

Blob-Trackers (CCMSPF, CC, MS, MSFG, Particle-Filter)
  • Some literature is listed in blobtrackingccwithcr.cpp regarding Particle Filters and tracking temporarily obstructed objects.
  • Some code is there for Multiple Hypothesis Tracking (pBlobHyp), but execution breakpoints were not triggered during my testing.
  • The CC trackers use Contours and Shapes (Moments) to represent blobs while the MS-based ones use color intensity histograms.
Both Connected-Component trackers use a Kalman Filter to predict the next blob location. The ProcessBlob() function updates the blob model with the weighted sum of the newly captured value and the prediction. In the case of a blob collision, only the predicted value is used.

Collision Checking

  • CC Method: Detect collision by exhaustively examining every Blob (position + size) 
  • CCMSPF Method: Go further by 'resolving' collision with a MeanShift-Particle-Filter.
All MeanShift trackers below are instances of CvBlobTrackerList. It holds a list of the corresponding CvBlobTrackerOneMSXXX instances, one tracker for each blob.
  • MS Method: Simple mean-shift tracker for every blob. 
  • MS with FG weights: Foreground pixels are weighted during calculations. A higher value makes the blob in the model move and resize itself faster.
  • MS with Particle Filter: 200 particles are allocated for each Tracker (Blob). Each particle represents the same blob moving and resizing a little differently from the others. Its position and size delta is generated within some preset variances plus some random value. At each frame, the particles get randomized with new values using those parameters. A weighted sum yields the new prediction (position, size). Then the particles are shuffled and the weights are reset to 1. Each particle is associated with a weight. The weights are updated every frame. They are functions of the Bhattacharyya Coefficients calculated between the current Model Histogram and the Candidate Histogram. The Model Histogram is updated every frame from the blob position and size. The Candidate Histogram is the histogram calculated with the hypothesis particle. Where is the mean-shift?!
Condensation Algorithm (Conditional Density Propagation) - to represent non-Gaussian distributions. When an object is obstructed, there are multiple possibilities of what could happen, and that is not Gaussian. My understanding is that the Particle Filter is able to represent a multi-modal distribution. The distribution is represented by groups of particles. The density of each group represents the probability density of one of the ranges along the x-axis.
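A minimal sketch of the two building blocks mentioned in the particle filter bullet: the Bhattacharyya coefficient between two histograms and the weighted sum that yields the new blob estimate. It is not the OpenCV tracker code. It assumes both histograms are normalised to sum to 1, and it leaves the mapping from coefficient to particle weight to the caller, since I did not work out the exact function used. The struct and function names are hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Bhattacharyya coefficient between two histograms that each sum to 1.
// 1.0 means identical distributions, 0.0 means no overlap at all.
double bhattacharyya(const std::vector<double>& model,
                     const std::vector<double>& candidate) {
    assert(model.size() == candidate.size());
    double bc = 0.0;
    for (std::size_t i = 0; i < model.size(); ++i)
        bc += std::sqrt(model[i] * candidate[i]);
    return bc;
}

struct BlobState { double x, y, w, h; };              // position and size
struct Particle  { BlobState state; double weight; }; // one hypothesis

// Weighted mean of all particle states: the new (position, size) estimate.
BlobState weightedEstimate(const std::vector<Particle>& particles) {
    double total = 0.0;
    BlobState s = {0.0, 0.0, 0.0, 0.0};
    for (const Particle& p : particles) {
        total += p.weight;
        s.x += p.weight * p.state.x;
        s.y += p.weight * p.state.y;
        s.w += p.weight * p.state.w;
        s.h += p.weight * p.state.h;
    }
    assert(total > 0.0);  // at least one particle must carry weight
    s.x /= total; s.y /= total; s.w /= total; s.h /= total;
    return s;
}
```

Particles whose candidate histogram resembles the model histogram get larger weights, so they pull the estimate toward their own position and size.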

Post-Processing (Kalman Filter)
Results from the Tracking stage are adjusted by a Kalman Filter. That is, the Blob Position and Size are updated.

Track Generator
  • Record the track (position and size) of each blob to a user-specified file. 
  • The values of both information are represented as a fraction of the video frame size. 

Blob-Track-Analyzers ( Histogram analysis of 2D, 4D, 5D, SS feature-vectors, TrackDist, IOR)
A 'status' value is maintained on all active blobs: Normal or Abnormal.

  • Tracks of Past Blobs (no longer a foreground in recent frames) are added to track database. They will be used as templates to compare with the active blobs tracks. 
  • Find the closest match from the templates for each active blob in terms of their similarity in position, velocity. 
  • The state is Normal if it could find a reasonably close match.
Histogram P, PV, PVS
  • Each active blob has a histogram representing its track. There are 3 types of dimensions: 1) position, 2) position and velocity, 3) position, velocity and state-change. As far as I could understand, the state-change represents the number of successive frames during which the blob moves very slowly, so slowly that it is almost stationary between frames.
  • A sparse-matrix is used to store the histogram of these continuous vector values.
  • Nearby histogram bins are smoothed whenever a new vector is collected.
  • All histograms are updated at every frame.
  • Past Blobs have their histograms merged into a global histogram. Similarly, it is used to decide whether a particular active blob track is 'normal'.
Histogram SS
Similar to the P-V-S histogram except that the vector consists only of the starting position and the stop position. A blob is seen as stopped as soon as the state-change counter reaches 5.

Demo Code (blobtrack_sample.cpp):
  • The demo code puts everything together. In its simplest form, the user supplies an input video file and it displays the 'Tracking' window, marking the moving blobs on the video with a location-size circle, BlobID and Analyzer status (Abnormal in red, Normal in green).
  • The tracking video can be saved to a video file if a file name is provided.
  • The foreground masks can be shown in a separate window and saved to a video file of the user's choice.
  • If a Track-Generator output file is specified, the precise blob locations at every frame, together with the frame-of-entrance, are recorded to that file.
  • There is also a general log file showing the user parameters, the same as those that appear on the command console.
  • Although I haven't tried it, the user should be able to pass method-specific arguments. For example, in addition to choosing the Meanshift method for tracking, the user is also able to pass specific parameters. See function set_params().
Results from Road-Side Camera Video

Command-Line Arguments: bd=BD_Simple fg=FG_1 bt=MSPF fgavi=fgSeparated.mp4 btavi=blobtracked.mp4 bta_data=trackAnalysis.log track=track.log log=blobtracksample.log

In general I am not too satisfied with the tracking results. I don't know whether this is expected with the video I have. For example, the blobID for the same car could change as it goes farther from the camera, and vice-versa. The analyzer result is often abnormal for some reason even if the cars are simply going along the road.


Learning OpenCV, O'Reilly Press.