Thursday, February 17, 2011

Decision Tree

The Learning OpenCV book has a good description of the Decision Tree ML method. The API documentation also has a good summary.

Implementation
Based on the book "Classification and Regression Trees" by Breiman et al.

Parameters

  • Related to Tree Pruning (Post-Build)
    •   use_1se_rule, truncate_pruned_tree
  • Is cv_folds used for both tree building and pruning with respect to average Gini error?
  • Supports surrogate splits to handle unknown (missing) data: it finds 'backup' attributes from the feature vector that would split the node with similar 'purity'.
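The surrogate-split idea above can be sketched in a few lines of plain Python. This is only an illustration under simplifying assumptions, not the OpenCV implementation: the attribute names (odor, spore, cap), the toy rows, and the helper functions are all hypothetical, splits are limited to "attribute equals value", and surrogates are ranked purely by how often they send a sample the same way as the primary split.

```python
from collections import Counter

def gini(labels):
    """Gini impurity of a list of class labels (0.0 means pure)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def split_impurity(rows, labels, attr, value):
    """Weighted Gini impurity after splitting on rows[i][attr] == value."""
    left = [y for r, y in zip(rows, labels) if r[attr] == value]
    right = [y for r, y in zip(rows, labels) if r[attr] != value]
    n = len(labels)
    return len(left) / n * gini(left) + len(right) / n * gini(right)

def best_split(rows, labels, attrs):
    """Return the (attr, value) pair with the lowest weighted impurity."""
    candidates = sorted({(a, r[a]) for r in rows for a in attrs})
    return min(candidates, key=lambda c: split_impurity(rows, labels, *c))

def surrogate_splits(rows, primary, attrs):
    """Rank splits on the other attributes by agreement with the primary split."""
    p_attr, p_value = primary
    goes_left = [r[p_attr] == p_value for r in rows]
    ranked = []
    for a in attrs:
        if a == p_attr:
            continue
        for v in sorted({r[a] for r in rows}):
            agree = sum((r[a] == v) == g for r, g in zip(rows, goes_left))
            ranked.append((agree / len(rows), a, v))
    return sorted(ranked, reverse=True)

# Hypothetical toy data: 'p' = poisonous, 'e' = edible.
ATTRS = ["odor", "spore", "cap"]
rows = [
    {"odor": "foul", "spore": "brown", "cap": "flat"},
    {"odor": "foul", "spore": "brown", "cap": "convex"},
    {"odor": "foul", "spore": "white", "cap": "flat"},
    {"odor": "none", "spore": "white", "cap": "convex"},
    {"odor": "none", "spore": "white", "cap": "flat"},
    {"odor": "none", "spore": "white", "cap": "convex"},
]
labels = ["p", "p", "p", "e", "e", "e"]

primary = best_split(rows, labels, ATTRS)
surrogates = surrogate_splits(rows, primary, ATTRS)
print("primary split:", primary)
print("best surrogate:", surrogates[0])
```

When a sample is missing the primary attribute at prediction time, the node would fall back to the highest-ranked surrogate whose attribute is present.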

Sample (mushroom)
  • Classifies mushrooms as poisonous or edible based on 20 discrete attributes.
  • Demonstrates decision tree traversal with an interactive prediction phase.
  • Displays a table of attribute importance after the tree is built.
  • Lots of data is available from the UC Irvine ML Data Repository (see Resources).
  • The sample could easily be modified to tackle other classification databases.
  • Setting 'penalty-weight' to 1 gives 8 percent false negatives. The rate quickly drops to 0 when the weight is set to 2.
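To see why a small increase in the penalty weight can remove false negatives, here is a rough sketch of class weighting at the leaves of an already-built tree. Everything in it is assumed for illustration: the leaf contents are made-up numbers (not the mushroom results), the helper names are hypothetical, and a real implementation would also let the weights influence which splits are chosen, which this sketch ignores.

```python
def weighted_counts(labels, weights):
    """Sum the per-class weights of the samples in a leaf (default weight 1.0)."""
    totals = {}
    for y in labels:
        totals[y] = totals.get(y, 0.0) + weights.get(y, 1.0)
    return totals

def leaf_prediction(labels, weights):
    """A leaf predicts the class with the largest weighted count."""
    totals = weighted_counts(labels, weights)
    return max(sorted(totals), key=totals.get)

def false_negative_rate(leaves, weights, positive="p"):
    """Fraction of positive samples that sit in leaves predicting another class."""
    missed = total = 0
    for labels in leaves:
        n_pos = labels.count(positive)
        total += n_pos
        if leaf_prediction(labels, weights) != positive:
            missed += n_pos
    return missed / total if total else 0.0

# Hypothetical leaves: two pure ones and one mixed leaf (9 edible, 5 poisonous).
leaves = [["p"] * 40, ["e"] * 50, ["e"] * 9 + ["p"] * 5]

for w in (1.0, 2.0):
    rate = false_negative_rate(leaves, {"p": w})
    print("penalty weight", w, "false-negative rate", round(rate, 3))
```

With weight 1 the mixed leaf is labelled edible and its 5 poisonous samples are missed; with weight 2 their weighted count (10) exceeds the 9 edible samples, so the leaf flips to poisonous and the misses disappear, at the cost of 9 edible mushrooms being flagged.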
Resources


Readings

  • Learning OpenCV, Gary Bradski & Adrian Kaehler (O'Reilly Press)
  • Introduction to Machine Learning, 2nd Edition, Ethem Alpaydin (MIT Press)
