library: libTMVA #include "MethodBDT.h"
TMVA::MethodBDT
private:
Double_t AdaBoost(vector<TMVA::Event*,allocator<TMVA::Event*> >, TMVA::DecisionTree* dt)
Double_t Bagging(vector<TMVA::Event*,allocator<TMVA::Event*> >, Int_t iTree)
void InitBDT()
public:
MethodBDT(TString jobName, vector<TString>* theVariables, TTree* theTree, TString theOption = "100:AdaBoost:GiniIndex:10:0:20:-1", TDirectory* theTargetDir = 0)
MethodBDT(vector<TString>* theVariables, TString theWeightFile, TDirectory* theTargetDir = NULL)
MethodBDT(const TMVA::MethodBDT&)
virtual ~MethodBDT()
virtual Double_t Boost(vector<TMVA::Event*,allocator<TMVA::Event*> >, TMVA::DecisionTree* dt, Int_t iTree)
static TClass* Class()
virtual Double_t GetMvaValue(TMVA::Event* e)
virtual void InitEventSample()
virtual TClass* IsA() const
TMVA::MethodBDT& operator=(const TMVA::MethodBDT&)
virtual void ReadWeightsFromFile()
virtual void ShowMembers(TMemberInspector& insp, char* parent)
virtual void Streamer(TBuffer& b)
void StreamerNVirtual(TBuffer& b)
virtual void Train()
virtual void WriteHistosToFile()
virtual void WriteWeightsToFile()
private:
Double_t fAdaBoostBeta parameter in AdaBoost
vector<TMVA::Event*,allocator<TMVA::Event*> > fEventSample the training events
Int_t fNTrees number of decision trees requested
vector<DecisionTree*> fForest the collection of decision trees
vector<double> fBoostWeights the weights applied in the individual boosts
TString fBoostType string specifying the boost type
TMVA::SeparationBase* fSepType the separation used in node splitting
Int_t fNodeMinEvents min number of events in node
Double_t fDummyOpt dummy option (for backward compatibility)
Int_t fNCuts number of steps in the cut scan used in node splitting
Double_t fSignalFraction scalefactor for bkg events to modify initial s/b fraction in training data
TH1F* fBoostWeightHist weights applied in boosting
TH2F* fErrFractHist error fraction vs tree number
TTree* fMonitorNtuple monitoring ntuple
Int_t fITree ntuple var: ith tree
Double_t fBoostWeight ntuple var: boost weight
Double_t fErrorFraction ntuple var: misclassification error fraction
Int_t fNnodes ntuple var: nNodes
_______________________________________________________________________
Analysis of Boosted Decision Trees
Boosted decision trees have been used successfully in High Energy
Physics analyses, for example by the MiniBooNE experiment
(Yang-Roe-Zhu, physics/0508045). In Boosted Decision Trees, the
selection is based on a majority vote over the results of several decision
trees, which are all derived from the same training sample by
supplying different event weights during the training.
Decision trees:
successive decision nodes are used to categorize the
events in the sample as either signal or background. Each node
uses only a single discriminating variable to decide if the event is
signal-like ("goes right") or background-like ("goes left"). This
forms a tree-like structure with "baskets" at the end (leaf nodes),
and an event is classified as either signal or background according to
whether the basket where it ends up was classified as signal or
background during the training. Training a decision tree is the
process of defining the "cut criteria" for each node. The training
starts with the root node: here one takes the full training event
sample and selects the variable and corresponding cut value that give
the best separation between signal and background at this stage. Using
this cut criterion, the sample is then divided into two subsamples, a
signal-like (right) and a background-like (left) sample. Two new nodes
are then created, one for each of the two sub-samples, and they are
constructed using the same mechanism as described for the root
node. The division is stopped once a node has reached either a
minimum number of events, or a minimum or maximum signal purity. These
leaf nodes are then called "signal" or "background" depending on whether
they contain more signal or background events from the training sample.
A schematic sketch of this recursive training is given below.
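For illustration only, the recursive training just described might be sketched
as follows (this is not the actual TMVA implementation; Node, FindBestSplit,
CountSignal and the daughter-creation calls are hypothetical helpers):
   // Recursively train one node: find the best cut, split the sample,
   // and repeat on the two sub-samples until a stop condition is met.
   void TrainNode(Node* node, std::vector<TMVA::Event*>& events,
                  Int_t nEventsMin, Double_t purityLimit)
   {
      Double_t purity = CountSignal(events) / (Double_t)events.size();
      if ((Int_t)events.size() < nEventsMin ||
          purity > purityLimit || purity < 1. - purityLimit) {
         node->SetType(purity > 0.5 ? 1 : -1);   // leaf node: signal (+1) or background (-1)
         return;
      }
      node->SetCut(FindBestSplit(events));       // variable and cut value giving the best separation
      std::vector<TMVA::Event*> left, right;
      for (size_t i = 0; i < events.size(); i++) {
         if (node->GoesRight(*events[i])) right.push_back(events[i]);
         else                             left.push_back(events[i]);
      }
      TrainNode(node->CreateLeftDaughter(),  left,  nEventsMin, purityLimit);
      TrainNode(node->CreateRightDaughter(), right, nEventsMin, purityLimit);
   }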
Boosting:
the idea behind boosting is that signal events from the training
sample that end up in a background node (and vice versa) are given a
larger weight than events that land in the correct leaf node. This
results in a re-weighted training event sample, from which a new
decision tree can then be grown. The boosting can be applied several
times (typically 100-500 times), and one ends up with a set of decision
trees (a forest); see the sketch below.
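Schematically, building the forest could look like this (a simplified sketch,
not the actual code: the DecisionTree constructor arguments and its BuildTree
call are only indicated; fNTrees, fEventSample, fForest, fBoostWeights and
Boost are the members and methods documented above):
   // Schematic forest building: grow a tree on the current (re-weighted) sample,
   // boost the event weights according to its misclassifications, and repeat.
   for (Int_t iTree = 0; iTree < fNTrees; iTree++) {
      TMVA::DecisionTree* dt = new TMVA::DecisionTree(/* separation type, nEventsMin, nCuts, ... */);
      dt->BuildTree(fEventSample);                             // train the tree on the weighted events
      fBoostWeights.push_back(Boost(fEventSample, dt, iTree)); // re-weight misclassified events
      fForest.push_back(dt);
   }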
Bagging:
In this particular variant of the Boosted Decision Trees the boosting
is not done on the basis of previous training results, but by a simple
stochastic re-sampling of the initial training event sample.
Analysis:
applying an individual decision tree to a test event results in a
classification of the event as either signal or background. For the
boosted decision tree selection, an event is successively subjected to
the whole set of decision trees, and depending on how often it is
classified as signal, a "likelihood" estimator for the event being
signal or background is constructed. The value of this estimator is
then used to select events from a sample, and the cut value on this
estimator defines the efficiency and purity of the selection; an
illustration is given below.
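For illustration, the estimator could be used like this (bdt, testEvents and
mvaCut are placeholders, not part of this class):
   // Illustrative selection: keep events whose BDT estimator exceeds a chosen cut.
   Double_t mvaCut = 0.0;                               // chosen to balance efficiency and purity
   for (size_t i = 0; i < testEvents.size(); i++) {
      Double_t mva = bdt->GetMvaValue(testEvents[i]);   // estimator in the range [-1,1]
      if (mva > mvaCut) {
         // accept the event as signal-like
      }
   }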
_______________________________________________________________________
MethodBDT( TString jobName, vector<TString>* theVariables, TTree* theTree, TString theOption, TDirectory* theTargetDir )
the standard constructor for the "boosted decision trees"
MethodBDT (Boosted Decision Trees) options:
format and syntax of the option string:
"nTrees:BoostType:SeparationType:nEventsMin:dummy:nCuts:SignalFraction"
nTrees: number of trees in the forest to be created
BoostType: the boosting type for the trees in the forest (AdaBoost etc.)
SeparationType: the separation criterion applied in the node splitting
nEventsMin: the minimum number of events in a node (leaf criterion, stops the splitting)
dummy: a dummy variable, kept only for backward compatibility
nCuts: the number of steps in the optimisation of the cut for a node
SignalFraction: scale factor applied to the number of Bkg events in the
training sample to simulate a different initial purity
of your data sample.
An example option string is shown after the lists below.
known SeparationTypes are:
- MisClassificationError
- GiniIndex
- CrossEntropy
known BoostTypes are:
- AdaBoost
- Bagging
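As an example, the default option string from the constructor above would be
used as follows (theVariables and theTree are placeholders for the user's
variable list and training tree; the interpretation of SignalFraction = -1 as
"leave the initial purity unchanged" is a presumption):
   // 100 trees, AdaBoost boosting, GiniIndex separation, at least 10 events per node,
   // dummy field 0, 20 cut optimisation steps, SignalFraction -1 (presumably: unchanged).
   TString theOption = "100:AdaBoost:GiniIndex:10:0:20:-1";
   TMVA::MethodBDT* bdt = new TMVA::MethodBDT("MyBDTJob", theVariables, theTree, theOption);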
MethodBDT( vector<TString> *theVariables, TString theWeightFile, TDirectory* theTargetDir )
constructor for calculating the BDT-MVA using previously generated decision trees:
the results of the previous training (the decision trees) are read in via the
weight file. Make sure that "theVariables" correspond to the ones used in
creating the "weight"-file.
void InitBDT( void )
common initialisation with defaults for the BDT-Method
void InitEventSample( void )
write all Events from the Tree into a vector of TMVA::Events, which are
more easily manipulated. This method should never be called without an
existing training tree, as it fills the vector of events from the ROOT
training tree.
void Train( void )
default sanity checks
Double_t Boost( vector<TMVA::Event*> eventSample, TMVA::DecisionTree *dt, Int_t iTree )
apply the boosting algorithm (the algorithm is selected via the "option" given
in the constructor). The return value is the boosting weight.
Double_t AdaBoost( vector<TMVA::Event*> eventSample, TMVA::DecisionTree *dt )
the AdaBoost implementation:
a new training sample is generated by re-weighting
events that are misclassified by the decision tree. The weight
applied is w = (1-err)/err, or more generally
w = ((1-err)/err)^beta,
where err is the fraction of misclassified events in the tree (err < 0.5,
assuming that the previous selection was better than random guessing),
and beta is a free parameter (standard: beta = 1) that modifies the
strength of the boosting.
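As a numerical sketch of this boost weight (err and beta as defined above;
fAdaBoostBeta is the member documented above):
   // Boost weight applied to misclassified events: w = ((1-err)/err)^beta
   Double_t err  = 0.2;                                           // example misclassification fraction (< 0.5)
   Double_t beta = fAdaBoostBeta;                                 // standard value: beta = 1
   Double_t boostWeight = TMath::Power((1.0 - err) / err, beta);  // = 4 for err = 0.2, beta = 1
   // misclassified events then get their weight multiplied by boostWeight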
Double_t Bagging( vector<TMVA::Event*> eventSample, Int_t iTree )
call it bootstrapping, re-sampling or whatever you like; in the end it is nothing
else but applying "random weights" to each event.
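A minimal sketch of such a re-sampling via random weights (Poisson weights with
mean 1 are one common choice; the SetWeight call on TMVA::Event is assumed here,
and the actual implementation may differ):
   // Give each event a random weight, e.g. Poisson-distributed with mean 1,
   // which on average corresponds to re-sampling the training set with replacement.
   TRandom3 rnd(iTree);                                 // one seed per tree (illustrative)
   for (size_t i = 0; i < eventSample.size(); i++) {
      eventSample[i]->SetWeight(rnd.PoissonD(1.0));     // SetWeight assumed to exist on TMVA::Event
   }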
void WriteWeightsToFile( void )
write the whole forest (the sample of decision trees) to a file for later use.
Double_t GetMvaValue(TMVA::Event *e)
return the MVA value (range [-1,1]) that classifies the
event according to the majority vote from the total number of
decision trees.
In the literature, people actually use the weighted majority vote
(using the boost weights); however, no improvement was observed in
doing so here, so this is currently switched off.
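A simplified sketch of the unweighted majority vote mapped onto [-1,1]
(the CheckEvent call on DecisionTree is assumed here and only illustrative;
fForest is the member documented above):
   // Unweighted majority vote over the forest, scaled to [-1,1]:
   // +1 if every tree calls the event signal, -1 if every tree calls it background.
   Double_t myMVA = 0;
   for (size_t iTree = 0; iTree < fForest.size(); iTree++) {
      // CheckEvent assumed to return > 0.5 for signal-like events
      myMVA += (fForest[iTree]->CheckEvent(e) > 0.5) ? 1. : -1.;
   }
   myMVA /= (Double_t)fForest.size();                   // final MVA value in [-1,1]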
void WriteHistosToFile( void )
here we could write some histograms created during the processing
to the output file.
Author: Andreas Hoecker, Joerg Stelzer, Helge Voss, Kai Voss
Last update: root/tmva $Id: MethodBDT.cxx,v 1.4 2006/05/26 09:22:13 brun Exp $
Copyright (c) 2005