library: libTMVA #include "DecisionTree.h"
TMVA::DecisionTree
private:
Double_t SamplePurity(vector<TMVA::Event*> eventSample)
public:
DecisionTree()
DecisionTree(TMVA::SeparationBase* sepType, Int_t minSize, Int_t nCuts)
DecisionTree(const TMVA::DecisionTree&)
virtual ~DecisionTree()
Int_t BuildTree(vector<TMVA::Event*>& eventSample, TMVA::DecisionTreeNode* node = NULL)
Double_t CheckEvent(TMVA::Event*)
static TClass* Class()
virtual TClass* IsA() const
TMVA::DecisionTree& operator=(const TMVA::DecisionTree&)
virtual void ShowMembers(TMemberInspector& insp, char* parent)
virtual void Streamer(TBuffer& b)
void StreamerNVirtual(TBuffer& b)
Double_t TrainNode(vector<TMVA::Event*>& eventSample, TMVA::DecisionTreeNode* node)
private:
Int_t fNvars number of variables used to separate S and B
Int_t fNCuts number of grid points in the variable cut scans
TMVA::SeparationBase* fSepType the separation criterion
Double_t fMinSize minimum number of events in a node
Double_t fMinSepGain minimum separation gain required to perform a node splitting
Bool_t fUseSearchTree whether the cut scan is done with a binary search tree or a simple event loop
_______________________________________________________________________
Implementation of a Decision Tree
In a decision tree, successive decision nodes are used to categorize the
events of a sample as either signal or background. Each node uses only a
single discriminating variable to decide whether an event is signal-like
("goes right") or background-like ("goes left"). This forms a tree-like
structure with "baskets" at the end (leaf nodes), and an event is
classified as either signal or background according to whether the basket
in which it ends up was classified as signal or background during the
training. Training a decision tree is the process of defining the "cut
criteria" for each node. The training starts with the root node: one takes
the full training event sample and selects the variable and corresponding
cut value that give the best separation between signal and background at
this stage. Using this cut criterion, the sample is then divided into two
subsamples, a signal-like (right) and a background-like (left) one. Two
new nodes are then created, one for each of the two subsamples, and they
are constructed with the same mechanism as described for the root node.
The division stops once a node has reached either a minimum number of
events, or a minimum or maximum signal purity. These leaf nodes are then
called "signal" or "background" depending on whether they contain more
signal or more background events from the training sample.
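The recursive training described above can be sketched in a few lines of plain C++. This is a toy illustration only, not the actual TMVA code: the names ToyEvent, ToyNode, and buildToyTree are invented here, and the sample median stands in for the real grid-scan cut optimization.

```cpp
#include <algorithm>
#include <cstddef>
#include <memory>
#include <utility>
#include <vector>

// Toy event: one discriminating variable and a signal/background flag.
struct ToyEvent { double x; bool isSignal; };

// Toy node: either a leaf carrying a class label, or an internal node
// holding the cut value chosen during training.
struct ToyNode {
    bool   leaf = false;
    bool   signalLeaf = false;   // label assigned to a leaf
    double cut = 0.0;            // events with x >= cut "go right"
    std::unique_ptr<ToyNode> left, right;
};

// Recursively split the sample until it is too small or already pure.
// Here the "best cut" is simply the sample median; the real training
// scans a grid of cuts and maximizes a separation criterion.
std::unique_ptr<ToyNode> buildToyTree(std::vector<ToyEvent> sample,
                                      std::size_t minSize) {
    auto node = std::make_unique<ToyNode>();
    std::size_t nSig = 0;
    for (const auto& e : sample) nSig += e.isSignal;
    if (sample.size() < minSize || sample.size() < 2 ||
        nSig == 0 || nSig == sample.size()) {
        node->leaf = true;
        node->signalLeaf = (2 * nSig >= sample.size());  // majority vote
        return node;
    }
    std::sort(sample.begin(), sample.end(),
              [](const ToyEvent& a, const ToyEvent& b) { return a.x < b.x; });
    node->cut = sample[sample.size() / 2].x;
    std::vector<ToyEvent> lo(sample.begin(), sample.begin() + sample.size() / 2);
    std::vector<ToyEvent> hi(sample.begin() + sample.size() / 2, sample.end());
    node->left  = buildToyTree(std::move(lo), minSize);   // background-like
    node->right = buildToyTree(std::move(hi), minSize);   // signal-like
    return node;
}
```

With a cleanly separated sample (background at negative x, signal at positive x), a single split already yields pure leaves.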
_______________________________________________________________________
DecisionTree( void )
fSoverSBUpperThreshold (0),
fSoverSBLowerThreshold (0)
default constructor using the GiniIndex as separation criterion;
no restrictions on the minimum number of events in a leaf node or on
the separation gain in the node splitting
DecisionTree( TMVA::SeparationBase *sepType,Int_t minSize, Int_t nCuts)
fSoverSBUpperThreshold (0),
fSoverSBLowerThreshold (0)
constructor specifying the separation type, the minimum number of
events in a node that is still subjected to further splitting, the
minimum separation gain requested for actually splitting a node
(NEEDS TO BE SET TO ZERO, OTHERWISE I GET A STRANGE BEHAVIOUR
WHICH IS NOT YET COMPLETELY UNDERSTOOD), as well as the number of
bins in the grid used in applying the cut for the node splitting.
Int_t BuildTree( vector<TMVA::Event*> & eventSample, TMVA::DecisionTreeNode *node )
builds the decision tree by recursively splitting one (root) node
into two daughter nodes; returns the number of nodes
Double_t TrainNode(vector<TMVA::Event*> & eventSample, TMVA::DecisionTreeNode *node)
decides how to split a node: at each node, the ONE variable is
chosen that gives the best separation between signal and background
on the sample entering the node.
To do this, for each variable a scan over a grid of fNCuts
different cut values is performed and the resulting separation
gains are compared. This cut scan uses either a binary search tree
or a simple loop over the events, depending on the number of events
in the sample.
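The grid scan can be illustrated with a self-contained C++ sketch. This is not the TMVA implementation: the names ScanEvent, gini, and bestCut are invented here, the Gini index p(1-p) stands in for the configurable SeparationBase criterion, and a simple event loop is used where the real code may use a binary search tree.

```cpp
#include <cmath>
#include <utility>
#include <vector>

struct ScanEvent { double x; bool isSignal; };

// Gini index p*(1-p) with purity p = s/(s+b); smaller means purer.
double gini(double s, double b) {
    if (s + b <= 0) return 0.0;
    double p = s / (s + b);
    return p * (1.0 - p);
}

// Scan nCuts equidistant cut values between xmin and xmax and return
// the (cut, gain) pair maximizing the separation gain
//   gain = Gini(parent) - [nL*Gini(left) + nR*Gini(right)] / n.
std::pair<double, double> bestCut(const std::vector<ScanEvent>& sample,
                                  double xmin, double xmax, int nCuts) {
    double s = 0, b = 0;
    for (const auto& e : sample) (e.isSignal ? s : b) += 1;
    const double parent = gini(s, b), n = s + b;
    double bestGain = -1.0, best = xmin;
    for (int i = 1; i <= nCuts; ++i) {
        double cut = xmin + i * (xmax - xmin) / (nCuts + 1);
        double sL = 0, bL = 0;                      // counts left of the cut
        for (const auto& e : sample)
            if (e.x < cut) (e.isSignal ? sL : bL) += 1;
        double gain = parent - ((sL + bL) * gini(sL, bL)
                    + (s - sL + b - bL) * gini(s - sL, b - bL)) / n;
        if (gain > bestGain) { bestGain = gain; best = cut; }
    }
    return {best, bestGain};
}
```

For a perfectly separable sample the best cut reduces the Gini index of both daughters to zero, so the gain equals the parent Gini of 0.25.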
Double_t CheckEvent(TMVA::Event* e)
the event e is put into the decision tree (starting at the root node)
and the output is the NodeType, (signal) or (background), of the final
node (basket) in which the given event ends up, i.e. the result of the
classification of the event for this decision tree.
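The classification walk can be sketched in a few lines of self-contained C++. The names WalkNode and checkEvent are invented for this illustration; the real DecisionTreeNode carries the cut variable index and value set during training, while here a single variable is assumed.

```cpp
#include <memory>

// Minimal node with the fields the classification walk needs: a cut
// value for internal nodes and a type (+1 signal, -1 background) for
// leaves.
struct WalkNode {
    double cut = 0.0;
    int    nodeType = 0;                 // nonzero only for leaves
    std::unique_ptr<WalkNode> left, right;
};

// Walk from the root: the event "goes right" if its variable passes
// the cut, otherwise left, until a leaf (basket) is reached; return
// that leaf's type as the classification result.
int checkEvent(const WalkNode* node, double x) {
    while (node->left && node->right)
        node = (x >= node->cut) ? node->right.get() : node->left.get();
    return node->nodeType;
}
```

A one-level tree with a cut at 0.5 then classifies events above the cut as signal and below it as background.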
DecisionTree( void )
the constructor needed for "reading" the decision tree from weight files
Author: Andreas Hoecker, Joerg Stelzer, Helge Voss, Kai Voss
Last update: root/tmva $Id: DecisionTree.cxx,v 1.4 2006/05/31 14:01:33 rdm Exp $
Copyright (c) 2005