Very Simplified Summary
A Haar feature is similar to a Haar wavelet. The weights inside the box filter can be oriented horizontally, vertically, or diagonally.
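As a rough illustration of what a box-filter feature computes, here is a minimal sketch of a horizontal two-rectangle feature evaluated through an integral image (summed-area table), the trick the Viola-Jones paper uses to make box sums cheap. The function names and the tiny 4x4 image are made up for this example and are not taken from OpenCV.

```python
def integral_image(img):
    """Summed-area table with an extra zero row and column on the top/left."""
    h, w = len(img), len(img[0])
    ii = [[0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row_sum = 0
        for x in range(w):
            row_sum += img[y][x]
            ii[y + 1][x + 1] = ii[y][x + 1] + row_sum
    return ii


def box_sum(ii, x, y, w, h):
    """Sum of the pixels in the w-by-h box whose top-left corner is (x, y)."""
    return ii[y + h][x + w] - ii[y][x + w] - ii[y + h][x] + ii[y][x]


def haar_two_rect_horizontal(ii, x, y, w, h):
    """Left half weighted +1, right half weighted -1 (w is assumed even)."""
    half = w // 2
    return box_sum(ii, x, y, half, h) - box_sum(ii, x + half, y, half, h)


# Hypothetical image: bright left half, dark right half (a vertical edge).
image = [
    [9, 9, 1, 1],
    [9, 9, 1, 1],
    [9, 9, 1, 1],
    [9, 9, 1, 1],
]
ii = integral_image(image)
edge_response = haar_two_rect_horizontal(ii, 0, 0, 4, 4)

flat = [[5] * 4 for _ in range(4)]
flat_response = haar_two_rect_horizontal(integral_image(flat), 0, 0, 4, 4)
```

The feature responds strongly to the edge and not at all to the flat patch. Each box sum costs four table lookups regardless of the box size.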
The Viola-Jones classifier is a 2-class cascade classifier. The cascade is made up of a series of nodes. Each node is an AdaBoost forest (itself a 2-class classifier). An input vector is classified as 'Yes' only if it 'passes' all the cascaded nodes. The classification process aborts as soon as the current node says 'No'.
Each node is built with a high acceptance rate, so it lets through many false positives but rejects very few true faces. The trees of an AdaBoost forest typically have only a single split, and each forest has about 10 such decision stumps (single-split trees). The theory is that the nodes are built to recognize faces of different orientations. Early rejection means the classifier spends little time on negative samples.
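The pass-all-nodes logic can be sketched in a few lines. This is only an illustration under my own assumptions: the stump thresholds, votes and feature vectors are invented, and real nodes would hold around 10 stumps over Haar feature values rather than the one or two shown here.

```python
def stump(feature_index, threshold, vote_if_above, vote_if_below):
    """A single-split tree: looks at one feature and casts a weighted vote."""
    def predict(features):
        if features[feature_index] > threshold:
            return vote_if_above
        return vote_if_below
    return predict


def node_passes(stumps, node_threshold, features):
    """A node (boosted forest) passes when the summed votes reach its threshold."""
    return sum(s(features) for s in stumps) >= node_threshold


def cascade_classify(nodes, features):
    """Return (accepted, nodes_evaluated); stop at the first node that says 'No'."""
    for evaluated, (stumps, node_threshold) in enumerate(nodes, start=1):
        if not node_passes(stumps, node_threshold, features):
            return False, evaluated
    return True, len(nodes)


# Hypothetical two-node cascade over a three-value feature vector.
nodes = [
    ([stump(0, 0.5, 1.0, -1.0)], 0.0),
    ([stump(1, 0.2, 0.6, -0.6), stump(2, 0.7, 0.4, -0.4)], 0.0),
]
face_like = [0.9, 0.8, 0.9]
background = [0.1, 0.8, 0.9]

face_result = cascade_classify(nodes, face_like)
background_result = cascade_classify(nodes, background)
```

The background vector is thrown out after a single node, which is where the speed comes from. Because a sample has to pass every node, the per-node rates multiply: with made-up rates, ten nodes that each pass 99.5% of faces and 50% of non-faces would together pass about 95% of faces and about 0.1% of non-faces.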
Found this excellent page from the forum after I wrote this entry: http://note.sonots.com/SciSoftware/haartraining.html
It requires thousands of good samples and tens of thousands of bad samples to train the classifier. The book says it could take hours or even a whole day on a fast computer. No exact number is given. I guess it depends on the size of the feature vector or the number of features.
'haarTraining' is a standalone program that will train the classifier with pre-processed feature points from Positive Samples and Negative Samples. User is able to specify parameters to 'shape' the nodes and trees.
Positive Samples: Images with faces marked with a rectangle. Best results if the faces are aligned similarly. Do not mix upright with tilted.
Negative Samples: Simply pictures without faces. Preferably with backgrounds similar to the 'Positive samples'.
'createSample' is a standalone program that extracts the face rectangles and rescales them to the size specified by the user.
(Paraphrasing) The OpenCV book says the Haar feature detector works well on rigid bodies with blocky features (like eyes). An object whose only distinguishing feature is its outline (a coffee mug) is hard to detect. 'Rigid' means an object whose deformation under external pressure is negligible.
Building and Running 'createSamples' and 'haarTraining'
Source code: OpenCV/modules/haartraining/
VC++ Solution file: CMAKE_Build/modules/haartraining/
Test Sample with Coca-Cola Logo (Step 1: createSample)
createSample uses the OpenCV built-in C API to make training and test images by superimposing an input foreground image onto a list of user-provided background images. To create variety, the object (foreground) image is perspective-transformed and intensity-adjusted before it is finally scaled to the specified size and overlaid onto the background image.
- Training Samples: Use createSample to produce a _single_ 'vec' file suitable for training. All the input images are embedded in that file. See header file for details (comment added).
- Test Samples: Use createSample to produce a set of test images together with an 'info' file. The plain text file specifies the region of the transformed foreground object inside each test image. Only a single object would be overlaid on each background image.
- 'createSample' application can be used to view the images inside a 'vec' file.
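For reading the 'info' file back, here is a minimal parsing sketch. It assumes each line has the form "filename count x y width height ...", with one group of four numbers per object; that layout is my understanding of the description-file format, so check it against an actual generated file. The file name in the example is invented.

```python
def parse_info_line(line):
    """Split one 'info' line into (filename, [(x, y, w, h), ...])."""
    parts = line.split()
    filename, count = parts[0], int(parts[1])
    numbers = [int(v) for v in parts[2:]]
    if len(numbers) != 4 * count:
        raise ValueError("expected %d boxes in line: %r" % (count, line))
    boxes = [tuple(numbers[i:i + 4]) for i in range(0, len(numbers), 4)]
    return filename, boxes


# Hypothetical line: one 36x36 object whose top-left corner is at (45, 32).
sample_line = "test_0001.jpg 1 45 32 36 36"
filename, boxes = parse_info_line(sample_line)
```

The parsed rectangles can then be compared with the detector output to score the test images.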
Produced 500 images with the Coca-Cola logo embedded on 6 of the background images chosen from the GrabCut BSDS300 test images.
Test Sample with Coca-Cola Logo (Step 2: haartraining)
The haartraining program is straightforward: it calls cvCreateTreeCascadeClassifier() with the necessary cascade parameters, the input 'vec' file location and the output directory location.
What is the difference between cvCreateTreeCascadeClassifier() and cvCreateCascadeClassifier()?
No idea. Glanced through the code. cvCreateCascadeClassifier() seems to be more straightforward. cvCreateTreeCascadeClassifier() does more than basic cascade training. There is early-termination condition checking. And there is training-data clustering, probably for evaluation of the classifier stages.
The explanation of the 'mem' command-line parameter of haartraining.cpp is misleading.
haartraining.htm says it specifies the maximum memory, in megabytes, allocated for pre-calculation. It is actually passed to cvCreateTreeCascadeClassifier() as the 'numprecalculated' argument, which specifies the number of features to be 'pre-calculated', whatever that means. So it is true that a higher number requires more memory, but the value itself does not cap the amount of memory allocated for this pre-calculation task. In fact, a code comment in cvhaartraining.hpp includes a formula showing how the memory for 'feature pre-calculation' is a function of this argument.
- Used 'createSample' to produce 500 Positive Samples with a Coca-Cola logo embedded on about 6 background images chosen at random. The Coca-Cola logo image is reduced from 482x482 to 36x36 in size.
- Used all 200 images from the GrabCut test sample set as Negative Samples.
- The classifier is created in 2 forms: a single XML file and a database format. The database consists of a set of directories, one per stage. cvCreateTreeCascadeClassifier() actually calls cvLoadHaarClassifierCascade() to produce the XML file from the directory set, as demonstrated in the convert_cascade sample.
- The number of stages built is actually 8 instead of 14 as specified. The training stops with this message: "Required leaf false alarm rate achieved. Branch training terminated.".
- The training function reports the performance on the training data: 98.6% hit rate, 8.96e-6 false-alarm rate.
- The 99% hit rate is achieved at the first stage; the remaining stages lower the false-alarm rate, which starts at 10%.
- BACKGROUND PROCESSING TIME: Time taken to load negative sample (and extract Haar features?)
- "Number of used features": Varies from 1 to 5, corresponding to the number of rows in the tabular output. This number seems to represent the number of trees in the current cluster (stage).
- "Number of features used" (different from the last point): Simply calculated from the size of the object and not from 'feature-detection' of the actual training pictures.
- How come 'Chosen number of splits' and 'Total number of splits' are always zero?
- Training time can be long, and training requires lots of CPU and memory.
- In fact, the CPU is constantly maxed out.
- Time required grows with the number of features, which in turn grows with the size of the foreground object picture (Coca-Cola logo).
- At the original resolution (482x482): the program ran out of memory in a few minutes.
- At 48x48 resolution: about 4.1 million 'features' with MEM set to 512. The 1st stage takes an hour (did not wait for it to complete).
- At 36x36 resolution: about 1.3 million features with MEM kept at 512. It takes 3 hours to complete. It terminates by itself after 8 stages out of 14, for the reason stated earlier.
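To get a feel for how fast the feature count grows with the object size, here is a small counting sketch. It only enumerates the five basic upright rectangle shapes from the Viola-Jones paper, which is an assumption on my part: the feature set used by haartraining is larger, so the absolute totals come out smaller than the figures reported by the program. The growth ratio is the interesting part.

```python
def placements(length, base):
    """Count (position, size) pairs along one axis for sizes base, 2*base, ..."""
    return sum(length - size + 1 for size in range(base, length + 1, base))


def count_basic_haar_features(width, height):
    """Count the five basic upright Haar shapes at every position and scale."""
    # (base width, base height) of each shape: two-rectangle horizontal and
    # vertical, three-rectangle horizontal and vertical, four-rectangle.
    shapes = [(2, 1), (1, 2), (3, 1), (1, 3), (2, 2)]
    return sum(placements(width, bw) * placements(height, bh)
               for bw, bh in shapes)


count_24 = count_basic_haar_features(24, 24)
count_36 = count_basic_haar_features(36, 36)
count_48 = count_basic_haar_features(48, 48)
growth = count_48 / count_36
```

Going from 36x36 to 48x48 multiplies the count by roughly 3.15, close to (48/36)^4, and about the same ratio as the 1.3 million and 4.1 million features reported for those two sizes. So the feature count grows approximately with the fourth power of the side length, which also fits the 482x482 attempt running out of memory.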
Test Sample with Coca-Cola Logo (Step 3 - final: face_detect)
The OpenCV book gives an excellent description of the function parameters for CascadeClassifier::detectMultiScale(), especially the 'scaleFactor' and 'flags' arguments.
- Created 6 test images similar to the training images.
- Original parameters: Able to detect the object in 3 out of 6 images.
One of the failures has an object much smaller than the rest (36x36), which is actually the original object size! The other two failures are probably because the object is tilted.
The book suggests training upright and tilted objects separately.
- Reduced 'scaleFactor' from 1.1 to 1.01: Able to detect the 36x36 object.
The detection is scale-sensitive, so giving it finer scaling steps increases the hit rate, at the expense of more false-positive results.
- Re-generated the set of test images with half the maximum rotation angle for distortion: 1 more object is recognized.
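The cost of a finer 'scaleFactor' can be estimated by listing the window sizes the detector would step through. This is a back-of-the-envelope sketch, not how OpenCV implements the scan internally. The 480-pixel limit is an assumed image height, since the size of the test images is not recorded here.

```python
def window_sizes(base, limit, scale_factor):
    """Window sizes base * scale_factor**k that still fit within limit pixels."""
    sizes = []
    size = float(base)
    while size <= limit:
        sizes.append(round(size))
        size *= scale_factor
    return sizes


# 36x36 trained object; 480 is an assumed image height.
coarse = window_sizes(36, 480, 1.1)
fine = window_sizes(36, 480, 1.01)
steps_coarse = len(coarse)
steps_fine = len(fine)
```

Under these assumptions the scan goes from 28 scales to 261 scales, so there are many more windows close to any given object size. That also means more chances for a false positive and more processing time.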
Test Sample with Running Face Classifier (face_detect)
The face_detect sample demonstrates how to 'nest' classifiers to detect finer features. By default the sample deploys the face-alt-2 classifier to find face regions, followed by the eye-tree-eyeglasses classifier to find smaller features within each of the regions returned by the face-alt-2 classifier.
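The nesting pattern itself is simple and can be sketched without OpenCV. In the sketch below the two detectors are stand-in functions that return fixed rectangles; in the real sample they would be the detectMultiScale() calls of the two classifiers. The point is that the inner detector only sees the cropped region, so its rectangles have to be shifted back into full-image coordinates.

```python
def crop(image, rect):
    """Cut the (x, y, w, h) region out of a row-major list-of-lists image."""
    x, y, w, h = rect
    return [row[x:x + w] for row in image[y:y + h]]


def to_image_coords(outer, inner):
    """Shift a rectangle found inside 'outer' back into full-image coordinates."""
    ox, oy, _, _ = outer
    ix, iy, iw, ih = inner
    return (ox + ix, oy + iy, iw, ih)


def nested_detect(image, detect_outer, detect_inner):
    """Run detect_inner only inside each region returned by detect_outer."""
    results = []
    for outer in detect_outer(image):
        roi = crop(image, outer)
        inner = [to_image_coords(outer, r) for r in detect_inner(roi)]
        results.append((outer, inner))
    return results


# Stand-in detectors with fixed, invented output (x, y, w, h).
def fake_face_detector(image):
    return [(2, 1, 4, 4)]


def fake_eye_detector(roi):
    return [(1, 1, 1, 1)]


image = [[0] * 8 for _ in range(6)]
detections = nested_detect(image, fake_face_detector, fake_eye_detector)
```

Restricting the second classifier to the face regions keeps it from scanning the whole picture and from reporting 'eyes' in the background.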
Pre-built Face Cascade Classifiers
- Location: OpenCV/data/haarcascades/
- Dimensions of the training object could be found in most classifier files inside XML comments near the beginning.
- Check the value of 'minSize' passed to detectMultiScale() of the nested classifier. The minimum for a face could be too big for a mouth.
- Set the 'minSize' argument to maintain aspect ratio of the trained object.
- It takes around one second to finish the detection process for a 30x30 object in a VGA picture, with scaleFactor at 1.1.
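A small helper makes the 'minSize' advice concrete. The sketch below scales the trained object size up to a chosen minimum width while keeping the aspect ratio. The 25x15 trained size is only an example value; read the real dimensions from the comment at the top of the classifier file being used.

```python
def min_size_for(trained_size, min_width):
    """Scale the trained (width, height) so the width becomes min_width."""
    trained_w, trained_h = trained_size
    scale = min_width / trained_w
    return (min_width, max(1, round(trained_h * scale)))


# Example values only: a wide 25x15 trained object and a square 20x20 one.
wide_min = min_size_for((25, 15), 50)
square_min = min_size_for((20, 20), 30)
```

The resulting pair would be passed as the 'minSize' argument, so the smallest search window has the same shape as the object the classifier was trained on.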
Wikipedia on Rigid Body: http://en.wikipedia.org/wiki/Rigid_body
http://note.sonots.com/SciSoftware/haartraining.html (Script to expand CMU GroundTruth Data)
CMU-MIT Face Detection Test Sets: http://www.ri.cmu.edu/research_project_detail.html?project_id=419&menu_id=261
Face Databases as noted from some of the haarcascade classifier files:
- BioID: http://www.bioid.com/support/downloads/software/bioid-face-database.html
- FERET: http://www.itl.nist.gov/iad/humanid/feret/feret_master.html
Robust Real-Time Face Detection, Viola & Jones, International Journal of Computer Vision 57.