Monday, August 1, 2011

OpenCV 2.3.0 GPU speed-up with CUDA 4

Now it's time to build OpenCV 2.3.0 with GPU enabled.

Configuration and Build -
Follow the steps described in the OPENCV_GPU page for Visual Studio 64-bit build.
module-gpu build error: Configuration(null)
Solution - vcvars64.bat was missing from the Windows SDK amd64 directory. Create it by following the simple instructions here
I was taken by surprise at first, because I was already able to build 64-bit OpenCV. I suspect it has to do with the NVIDIA compiler (nvcc): it probably opens a Windows shell to do the compilation, and that shell would not have the 64-bit environment set up without vcvars64.bat.

Test the GPU build by running the module-gpu-test suite from VS 2010 Express
See "Implementing tests" section of
Setting up Test Data
Test-Data is required by the gpu-test-suite (and others too). Download a snapshot of the opencv-extra package that is tagged for OpenCV 2.3.0 release from WillowGarage. There is a "Download Zip" link in the source browsing page that makes it convenient.
Set the environment variable OPENCV_TEST_DATA_PATH to point to the testdata directory.
Run the project module-gpu-test
The run resulted in 3 types of failures:
  1. My NVIDIA hardware has a compute capability of 1.2, and 1 test case requires 1.3.
  2. Crashes in meanShift and meanShiftProc. The stack trace shows that it dies at the point where a GpuMat variable is being released.
  3. An assertion error in NVidia.TestHaarCascadeAppl (didn't investigate further).
The other tests ran OK.
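
The first failure comes down to a version comparison: the device reports 1.2 and the test wants at least 1.3. A minimal sketch of that check, assuming the capability is held as a major/minor pair of integers. The ComputeCapability type and meets_requirement() are hypothetical names for illustration, not part of OpenCV or CUDA:

```cpp
#include <utility>

// Hypothetical helper type (not part of OpenCV or CUDA): a compute
// capability such as 1.2 stored as two integers rather than a float.
struct ComputeCapability {
    int major_version;
    int minor_version;
};

// True if `device` is at least `required`, comparing the major version
// first and the minor version second.
bool meets_requirement(const ComputeCapability& device,
                       const ComputeCapability& required) {
    return std::make_pair(device.major_version, device.minor_version) >=
           std::make_pair(required.major_version, required.minor_version);
}
```

With a device of {1, 2} and a requirement of {1, 3}, meets_requirement() returns false, which matches the failure above. Comparing integer pairs avoids treating the version as a floating-point number.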

Learned to use the gtest_ command-line arguments (see the code comments above ParseGoogleTestFlagsOnlyImpl()):
  • --gtest_list_tests : lists the tests selected to run, then quits
  • --gtest_filter= : selects the tests to run (or not to run) by matching a specified pattern against the test name. Patterns for negative matching follow a minus sign.
  • --gtest_output=xml[: directory name / file-name ] : writes a summary of the test results in XML. For details see ts_gtest.cpp (search for GTEST_DEFINE_string_)
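
To make the filter syntax concrete, here is a simplified model of how a filter string selects test names: ':' separates patterns, '*' matches any run of characters, '?' matches one character, and everything after the first '-' is a list of patterns to exclude. This is my own sketch of the documented behaviour, not the code from ts_gtest.cpp, and the function names are made up for illustration:

```cpp
#include <string>

// Wildcard match: '*' matches any run of characters, '?' matches exactly one.
bool wildcard_match(const char* pattern, const char* text) {
    if (*pattern == '\0') return *text == '\0';
    if (*pattern == '*')
        return wildcard_match(pattern + 1, text) ||
               (*text != '\0' && wildcard_match(pattern, text + 1));
    if (*text == '\0') return false;
    return (*pattern == '?' || *pattern == *text) &&
           wildcard_match(pattern + 1, text + 1);
}

// True if `name` matches any of the ':'-separated patterns in `patterns`.
bool matches_any(const std::string& patterns, const std::string& name) {
    std::size_t start = 0;
    while (start <= patterns.size()) {
        std::size_t end = patterns.find(':', start);
        if (end == std::string::npos) end = patterns.size();
        const std::string pattern = patterns.substr(start, end - start);
        if (wildcard_match(pattern.c_str(), name.c_str())) return true;
        start = end + 1;
    }
    return false;
}

// Models a filter value: positive patterns, then an optional '-' followed
// by negative patterns. An empty positive part means "all tests".
bool filter_selects(const std::string& filter, const std::string& name) {
    const std::size_t dash = filter.find('-');
    std::string positive = filter.substr(0, dash);  // npos keeps the whole string
    std::string negative;
    if (dash != std::string::npos) negative = filter.substr(dash + 1);
    if (positive.empty()) positive = "*";
    if (!negative.empty() && matches_any(negative, name)) return false;
    return matches_any(positive, name);
}
```

For example, filter_selects("-NVidia.*", "NVidia.TestHaarCascadeAppl") is false, so passing --gtest_filter=-NVidia.* would skip the test that hit the assertion error while still running the rest.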
OpenCV GPU module
The library implements accelerated versions of functionality from other areas of OpenCV: image processing, image filtering, matrix calculations, features-2D, object detection, and camera calibration. The API and data structures are defined in the nested namespace cv::gpu. The acceleration makes use of both the NPP API and CUDA parallelization.

Ran a few OpenCV GPU samples that could readily be compared with non-GPU ones
  • surf_keypoint_matcher vs matcher_simple: speed up from 46 secs to 6 secs with the graffiti image from VGG set.
  • morfology vs morphology2 : the speed-up was not very obvious in my quick test, but still noticeable when changing the element shape with Open/Close set at 17 iterations.
  • hog_gpu vs peopledetect : speed up from 67 to 17 secs with my 5M-pixel test image.
  • cascadeclassifier_nvidia_api vs cascadeclassifier (GPU) vs facedetect (no-nested-cascade) : overall (secs): 5.1 / 4.8 / 4.5; detection-only (secs): 1 / 1 / 3.1
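
Expressed as ratios, the timings above work out to roughly 7.7x for SURF matching (46 / 6), 3.9x for HOG people detection (67 / 17), and 3.1x for the detection-only part of the cascade classifier (3.1 / 1). The arithmetic is just CPU time divided by GPU time. The speedup() helper below is a hypothetical name for illustration only:

```cpp
#include <cassert>

// Speed-up factor: how many times faster the GPU run is than the CPU run.
// Both arguments are wall-clock times in seconds.
double speedup(double cpu_seconds, double gpu_seconds) {
    assert(gpu_seconds > 0.0);  // a zero or negative time is a measurement error
    return cpu_seconds / gpu_seconds;
}
```

These are single runs on my machine, so the ratios are only a rough indication.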

