library: libTMVA #include "MethodBDT.h"
TMVA::MethodBDT
private:
Double_t AdaBoost(vector<TMVA::Event*,allocator<TMVA::Event*> >, TMVA::DecisionTree* dt)
Double_t Bagging(vector<TMVA::Event*,allocator<TMVA::Event*> >, Int_t iTree)
void InitBDT()
public:
MethodBDT(TString jobName, vector<TString>* theVariables, TTree* theTree, TString theOption = "100:AdaBoost:GiniIndex:10:0:20:-1", TDirectory* theTargetDir = 0)
MethodBDT(vector<TString>* theVariables, TString theWeightFile, TDirectory* theTargetDir = NULL)
MethodBDT(const TMVA::MethodBDT&)
virtual ~MethodBDT()
virtual Double_t Boost(vector<TMVA::Event*,allocator<TMVA::Event*> >, TMVA::DecisionTree* dt, Int_t iTree)
static TClass* Class()
virtual Double_t GetMvaValue(TMVA::Event* e)
virtual void InitEventSample()
virtual TClass* IsA() const
TMVA::MethodBDT& operator=(const TMVA::MethodBDT&)
virtual void ReadWeightsFromFile()
virtual void ShowMembers(TMemberInspector& insp, char* parent)
virtual void Streamer(TBuffer& b)
void StreamerNVirtual(TBuffer& b)
virtual void Train()
virtual void WriteHistosToFile()
virtual void WriteWeightsToFile()
private:
Double_t fAdaBoostBeta parameter in AdaBoost
vector<TMVA::Event*,allocator<TMVA::Event*> > fEventSample the training events
Int_t fNTrees number of decision trees requested
vector<DecisionTree*> fForest the collection of decision trees
vector<double> fBoostWeights the weights applied in the individual boosts
TString fBoostType string specifying the boost type
TMVA::SeparationBase* fSepType the separation used in node splitting
Int_t fNodeMinEvents min number of events in node
Double_t fDummyOpt dummy option (for backward compatibility)
Int_t fNCuts number of steps in the cut scan used in node splitting
Double_t fSignalFraction scalefactor for bkg events to modify initial s/b fraction in training data
TH1F* fBoostWeightHist weights applied in boosting
TH2F* fErrFractHist error fraction vs tree number
TTree* fMonitorNtuple monitoring ntuple
Int_t fITree ntuple var: ith tree
Double_t fBoostWeight ntuple var: boost weight
Double_t fErrorFraction ntuple var: misclassification error fraction
Int_t fNnodes ntuple var: nNodes
_______________________________________________________________________
Analysis of Boosted Decision Trees
Boosted decision trees have been used successfully in High Energy
Physics analyses, for example by the MiniBooNE experiment
(Yang-Roe-Zhu, physics/0508045). In Boosted Decision Trees, the
selection is based on a majority vote over the results of several decision
trees, which are all derived from the same training sample by
supplying different event weights during the training.
Decision trees:
successive decision nodes are used to categorize the
events in the sample as either signal or background. Each node
uses only a single discriminating variable to decide if the event is
signal-like ("goes right") or background-like ("goes left"). This
forms a tree-like structure with "baskets" at the end (leaf nodes),
and an event is classified as either signal or background according to
whether the basket where it ends up was classified as signal or
background during the training. Training a decision tree is the
process of defining the "cut criteria" for each node. The training
starts with the root node: here one takes the full training event
sample and selects the variable and corresponding cut value that give
the best separation between signal and background at this stage. Using
this cut criterion, the sample is then divided into two subsamples, a
signal-like (right) and a background-like (left) sample. Two new nodes
are then created, one for each of the two sub-samples, and they are
constructed using the same mechanism as described for the root
node. The division is stopped once a node has reached either a
minimum number of events, or a minimum or maximum signal purity. These
leaf nodes are then called "signal" or "background" depending on whether
they contain more signal or background events from the training sample.
A schematic sketch of this recursive training is given below.
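For illustration only, the recursive training just described might be sketched
as follows (this is not the actual TMVA implementation; Node, FindBestSplit,
CountSignal and the daughter-creation calls are hypothetical helpers):
   // Recursively train one node: find the best cut, split the sample,
   // and repeat on the two sub-samples until a stop condition is met.
   void TrainNode(Node* node, std::vector<TMVA::Event*>& events,
                  Int_t nEventsMin, Double_t purityLimit)
   {
      Double_t purity = CountSignal(events) / (Double_t)events.size();
      if ((Int_t)events.size() < nEventsMin ||
          purity > purityLimit || purity < 1. - purityLimit) {
         node->SetType(purity > 0.5 ? 1 : -1);   // leaf node: signal (+1) or background (-1)
         return;
      }
      node->SetCut(FindBestSplit(events));       // variable and cut value giving the best separation
      std::vector<TMVA::Event*> left, right;
      for (size_t i = 0; i < events.size(); i++) {
         if (node->GoesRight(*events[i])) right.push_back(events[i]);
         else                             left.push_back(events[i]);
      }
      TrainNode(node->CreateLeftDaughter(),  left,  nEventsMin, purityLimit);
      TrainNode(node->CreateRightDaughter(), right, nEventsMin, purityLimit);
   }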
Boosting:
the idea behind boosting is that signal events from the training
sample that end up in a background node (and vice versa) are given a
larger weight than events that land in the correct leaf node. This
results in a re-weighted training event sample, from which a new
decision tree can then be grown. The boosting can be applied several
times (typically 100-500 times), and one ends up with a set of decision
trees (a forest); see the sketch below.
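Schematically, building the forest could look like this (a simplified sketch,
not the actual code: the DecisionTree constructor arguments and its BuildTree
call are only indicated; fNTrees, fEventSample, fForest, fBoostWeights and
Boost are the members and methods documented above):
   // Schematic forest building: grow a tree on the current (re-weighted) sample,
   // boost the event weights according to its misclassifications, and repeat.
   for (Int_t iTree = 0; iTree < fNTrees; iTree++) {
      TMVA::DecisionTree* dt = new TMVA::DecisionTree(/* separation type, nEventsMin, nCuts, ... */);
      dt->BuildTree(fEventSample);                             // train the tree on the weighted events
      fBoostWeights.push_back(Boost(fEventSample, dt, iTree)); // re-weight misclassified events
      fForest.push_back(dt);
   }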
Bagging:
In this particular variant of the Boosted Decision Trees the boosting
is not done on the basis of previous training results, but by a simple
stochastic re-sampling of the initial training event sample.
Analysis:
applying an individual decision tree to a test event results in a
classification of the event as either signal or background. For the
boosted decision tree selection, an event is successively subjected to
the whole set of decision trees, and depending on how often it is
classified as signal, a "likelihood" estimator for the event being
signal or background is constructed. The value of this estimator is
then used to select events from a sample, and the cut value on this
estimator defines the efficiency and purity of the selection; an
illustration is given below.
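For illustration, the estimator could be used like this (bdt, testEvents and
mvaCut are placeholders, not part of this class):
   // Illustrative selection: keep events whose BDT estimator exceeds a chosen cut.
   Double_t mvaCut = 0.0;                               // chosen to balance efficiency and purity
   for (size_t i = 0; i < testEvents.size(); i++) {
      Double_t mva = bdt->GetMvaValue(testEvents[i]);   // estimator in the range [-1,1]
      if (mva > mvaCut) {
         // accept the event as signal-like
      }
   }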
_______________________________________________________________________
MethodBDT( TString jobName, vector<TString>* theVariables, TTree* theTree, TString theOption, TDirectory* theTargetDir )
the standard constructor for the "boosted decision trees"
MethodBDT (Boosted Decision Trees) options:
format and syntax of the option string:
"nTrees:BoostType:SeparationType:nEventsMin:dummy:nCuts:SignalFraction"
nTrees: number of trees in the forest to be created
BoostType: the boosting type for the trees in the forest (AdaBoost etc.)
SeparationType: the separation criterion applied in the node splitting
nEventsMin: the minimum number of events in a node (leaf criterion, stops the splitting)
dummy: a dummy variable, kept only for backward compatibility
nCuts: the number of steps in the optimisation of the cut for a node
SignalFraction: scale factor applied to the number of Bkg events in the
training sample to simulate a different initial purity
of your data sample.
An example option string is shown after the lists below.
known SeparationTypes are:
- MisClassificationError
- GiniIndex
- CrossEntropy
known BoostTypes are:
- AdaBoost
- Bagging
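As an example, the default option string from the constructor above would be
used as follows (theVariables and theTree are placeholders for the user's
variable list and training tree; the interpretation of SignalFraction = -1 as
"leave the initial purity unchanged" is a presumption):
   // 100 trees, AdaBoost boosting, GiniIndex separation, at least 10 events per node,
   // dummy field 0, 20 cut optimisation steps, SignalFraction -1 (presumably: unchanged).
   TString theOption = "100:AdaBoost:GiniIndex:10:0:20:-1";
   TMVA::MethodBDT* bdt = new TMVA::MethodBDT("MyBDTJob", theVariables, theTree, theOption);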
MethodBDT( vector<TString> *theVariables, TString theWeightFile, TDirectory* theTargetDir )
constructor for calculating the BDT-MVA using previously generated decision trees:
the results of the previous training (the decision trees) are read in via the
weight file. Make sure that "theVariables" correspond to the ones used in
creating the "weight"-file.
void InitBDT( void )
common initialisation with defaults for the BDT-Method
void InitEventSample( void )
write all Events from the Tree into a vector of TMVA::Events, which are
more easily manipulated. This method should never be called without an
existing training tree, as it fills the vector of events from the ROOT
training tree.
void Train( void )
default sanity checks
Double_t Boost( vector<TMVA::Event*> eventSample, TMVA::DecisionTree *dt, Int_t iTree )
apply the boosting algorithm (the algorithm is selected via the "option" given
in the constructor). The return value is the boosting weight.
Double_t AdaBoost( vector<TMVA::Event*> eventSample, TMVA::DecisionTree *dt )
the AdaBoost implementation:
a new training sample is generated by re-weighting
events that are misclassified by the decision tree. The weight
applied is w = (1-err)/err, or more generally
w = ((1-err)/err)^beta,
where err is the fraction of misclassified events in the tree (err < 0.5,
assuming that the previous selection was better than random guessing),
and beta is a free parameter (standard: beta = 1) that modifies the
strength of the boosting.
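As a numerical sketch of this boost weight (err and beta as defined above;
fAdaBoostBeta is the member documented above):
   // Boost weight applied to misclassified events: w = ((1-err)/err)^beta
   Double_t err  = 0.2;                                           // example misclassification fraction (< 0.5)
   Double_t beta = fAdaBoostBeta;                                 // standard value: beta = 1
   Double_t boostWeight = TMath::Power((1.0 - err) / err, beta);  // = 4 for err = 0.2, beta = 1
   // misclassified events then get their weight multiplied by boostWeight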
Double_t Bagging( vector<TMVA::Event*> eventSample, Int_t iTree )
call it bootstrapping, re-sampling or whatever you like; in the end it is nothing
else but applying "random weights" to each event.
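A minimal sketch of such a re-sampling via random weights (Poisson weights with
mean 1 are one common choice; the SetWeight call on TMVA::Event is assumed here,
and the actual implementation may differ):
   // Give each event a random weight, e.g. Poisson-distributed with mean 1,
   // which on average corresponds to re-sampling the training set with replacement.
   TRandom3 rnd(iTree);                                 // one seed per tree (illustrative)
   for (size_t i = 0; i < eventSample.size(); i++) {
      eventSample[i]->SetWeight(rnd.PoissonD(1.0));     // SetWeight assumed to exist on TMVA::Event
   }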
void WriteWeightsToFile( void )
write the whole forest (the sample of decision trees) to a file for later use.
Double_t GetMvaValue(TMVA::Event *e)
return the MVA value (range [-1,1]) that classifies the
event according to the majority vote from the total number of
decision trees.
In the literature, people actually use the weighted majority vote
(using the boost weights); however, no improvement was observed in
doing so here, so this is currently switched off.
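A simplified sketch of the unweighted majority vote mapped onto [-1,1]
(the CheckEvent call on DecisionTree is assumed here and only illustrative;
fForest is the member documented above):
   // Unweighted majority vote over the forest, scaled to [-1,1]:
   // +1 if every tree calls the event signal, -1 if every tree calls it background.
   Double_t myMVA = 0;
   for (size_t iTree = 0; iTree < fForest.size(); iTree++) {
      // CheckEvent assumed to return > 0.5 for signal-like events
      myMVA += (fForest[iTree]->CheckEvent(e) > 0.5) ? 1. : -1.;
   }
   myMVA /= (Double_t)fForest.size();                   // final MVA value in [-1,1]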
void WriteHistosToFile( void )
here we could write some histograms created during the processing
to the output file.
Author: Andreas Hoecker, Joerg Stelzer, Helge Voss, Kai Voss
Last update: root/tmva $Id: MethodBDT.cxx,v 1.4 2006/05/26 09:22:13 brun Exp $
Copyright (c) 2005