A helper class to prune a decision tree using the Cost Complexity method (see Classification and Regression Trees by Leo Breiman et al.).
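There are two running modes in CCPruner: (i) one may select a prune strength \( \alpha \) and prune back the tree \( T_{max} \) until the criterion
\[ \alpha < \frac{R(t) - R(T_t)}{|\sim T_t| - 1} \]
holds for every non-terminal node \( t \) of the pruned tree \( T \), where \( T_t \) is the branch rooted at \( t \) and \( |\sim T_t| \) is its number of terminal nodes (the denominator vanishes for a terminal node, so the criterion only applies to non-terminal ones); or (ii) the algorithm finds the sequence of critical points \( \alpha_1 < \alpha_2 < \dots < \alpha_K \) such that \( T_K = root(T_{max}) \) and then selects the optimally-pruned subtree, defined to be the subtree with the best quality index for the validation sample.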
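As an illustration of running mode (i), here is a minimal, self-contained C++ sketch; it is not the TMVA implementation. `Node`, `Leaf`, `Split`, `SubtreeCost`, `NumLeaves`, `CriticalAlpha` and `PruneWithStrength` are hypothetical names, and each node simply carries a precomputed cost \( R(t) \) instead of deriving it from a SeparationBase quality index. It uses Breiman's weakest-link quantity \( g(t) = \frac{R(t) - R(T_t)}{|\sim T_t| - 1} \) and prunes bottom-up, so that every surviving non-terminal node satisfies \( \alpha < g(t) \). It needs C++14 for `std::make_unique`.

```cpp
#include <cassert>
#include <memory>
#include <utility>

// Hypothetical toy node (not TMVA::DecisionTreeNode): each node stores R(t),
// the cost the node would have if it were turned into a terminal node.
struct Node {
    double r = 0.0;
    std::unique_ptr<Node> left, right;
    bool IsLeaf() const { return !left && !right; }
};

// Helpers to build a small binary tree by hand.
std::unique_ptr<Node> Leaf(double r) {
    auto n = std::make_unique<Node>();
    n->r = r;
    return n;
}

std::unique_ptr<Node> Split(double r, std::unique_ptr<Node> l, std::unique_ptr<Node> rt) {
    auto n = Leaf(r);
    n->left = std::move(l);
    n->right = std::move(rt);
    return n;
}

// R(T_t): sum of R over the terminal nodes of the branch rooted at t.
double SubtreeCost(const Node& t) {
    if (t.IsLeaf()) return t.r;
    return SubtreeCost(*t.left) + SubtreeCost(*t.right);
}

// |~T_t|: number of terminal nodes of the branch rooted at t.
int NumLeaves(const Node& t) {
    if (t.IsLeaf()) return 1;
    return NumLeaves(*t.left) + NumLeaves(*t.right);
}

// Weakest-link value g(t) = (R(t) - R(T_t)) / (|~T_t| - 1); only defined
// for non-terminal nodes (the denominator is zero for a leaf).
double CriticalAlpha(const Node& t) {
    assert(!t.IsLeaf());
    return (t.r - SubtreeCost(t)) / (NumLeaves(t) - 1);
}

// Mode (i): prune bottom-up with a fixed strength alpha. Children are handled
// first, so g(t) is evaluated on the already-pruned branch; a node whose
// g(t) <= alpha is collapsed into a terminal node.
void PruneWithStrength(Node& t, double alpha) {
    if (t.IsLeaf()) return;
    PruneWithStrength(*t.left, alpha);
    PruneWithStrength(*t.right, alpha);
    if (CriticalAlpha(t) <= alpha) {
        t.left.reset();
        t.right.reset();
    }
}
```

For the toy tree `Split(0.5, Split(0.25, Leaf(0.0625), Leaf(0.125)), Leaf(0.125))` the left split has \( g = 0.0625 \) and the root \( g = 0.09375 \). Pruning with \( \alpha = 0.05 \) keeps all three terminal nodes, \( \alpha = 0.1 \) collapses only the left split, and \( \alpha = 0.2 \) reduces the tree to its root. With \( \alpha = 0.1 \) the root survives even though its initial \( g \) is below \( \alpha \): once the left branch is pruned, the root's \( g \) rises to 0.125, which is why the check is made after the children have been processed.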
Definition at line 62 of file CCPruner.h.
| Public Types | |
|---|---|
| typedef std::vector< Event * > | EventList |
| Public Member Functions | | |
|---|---|---|
| | CCPruner (DecisionTree *t_max, const DataSet *validationSample, SeparationBase *qualityIndex=nullptr) | constructor |
| | CCPruner (DecisionTree *t_max, const EventList *validationSample, SeparationBase *qualityIndex=nullptr) | constructor |
| | ~CCPruner () | destructor |
| std::vector< TMVA::DecisionTreeNode * > | GetOptimalPruneSequence () const | return the sequence of nodes to prune (the weakest links) that yields the optimally-pruned tree |
| Float_t | GetOptimalPruneStrength () const | |
| Float_t | GetOptimalQualityIndex () const | |
| void | Optimize () | determine the pruning sequence |
| void | SetPruneStrength (Float_t alpha=-1.0) | |
| Private Attributes | | |
|---|---|---|
| Float_t | fAlpha | regularization parameter (prune strength) in CC pruning |
| Bool_t | fDebug | debug flag |
| Int_t | fOptimalK | index of the optimal tree in the pruned tree sequence |
| Bool_t | fOwnQIndex | flag indicating whether fQualityIndex is owned by this object |
| std::vector< TMVA::DecisionTreeNode * > | fPruneSequence | map of weakest links (i.e., branches to prune) -> pruning index |
| std::vector< Float_t > | fPruneStrengthList | map of alpha -> pruning index |
| SeparationBase * | fQualityIndex | the quality index used to calculate \( R(t) \), with \( R(T) = \sum_{t \in \sim T} R(t) \) |
| std::vector< Float_t > | fQualityIndexList | map of R(T) -> pruning index |
| DecisionTree * | fTree | (pruned) decision tree |
| const DataSet * | fValidationDataSet | the event sample to select the optimally-pruned tree |
| const EventList * | fValidationSample | the event sample to select the optimally-pruned tree |
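The comment on fQualityIndex defines the cost of a tree as the sum of the costs of its terminal nodes. Written out in display form, with \( \sim T \) denoting the set of terminal nodes of \( T \), and with leaf costs that are made up purely for illustration:

\[ R(T) = \sum_{t \in \sim T} R(t), \qquad \text{e.g.}\quad R(t_1) = 0.0625,\; R(t_2) = 0.125,\; R(t_3) = 0.125 \;\Rightarrow\; R(T) = 0.3125 \]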
#include <TMVA/CCPruner.h>
typedef std::vector<Event*> TMVA::CCPruner::EventList
Definition at line 64 of file CCPruner.h.
CCPruner::CCPruner (DecisionTree *t_max, const EventList *validationSample, SeparationBase *qualityIndex = nullptr)
constructor
Definition at line 69 of file CCPruner.cxx.
CCPruner::CCPruner (DecisionTree *t_max, const DataSet *validationSample, SeparationBase *qualityIndex = nullptr)
constructor
Definition at line 92 of file CCPruner.cxx.
CCPruner::~CCPruner ()
Definition at line 115 of file CCPruner.cxx.
std::vector< DecisionTreeNode * > CCPruner::GetOptimalPruneSequence () const
return the sequence of nodes to prune (the weakest links) that yields the optimally-pruned tree
Definition at line 240 of file CCPruner.cxx.
inline
Definition at line 89 of file CCPruner.h.
inline
Definition at line 85 of file CCPruner.h.
void CCPruner::Optimize ()
determine the pruning sequence
Definition at line 124 of file CCPruner.cxx.
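Running mode (ii) can be sketched in the same spirit. The following is a hypothetical, self-contained C++ illustration, not the body of CCPruner::Optimize: `TreeNode`, `PruneStep`, `FindWeakestLink` and `PruneSequence` are invented names, each node carries precomputed training and validation costs, and "best quality index" is assumed to mean the lowest validation cost \( R(T) \), with ties resolved toward the smaller tree. The weakest link is collapsed repeatedly until only the root remains, recording each critical \( \alpha_k \) together with the validation cost of \( T_k \); for simplicity the sequence starts from \( T_{max} \) itself. It needs C++14 for `std::make_unique`.

```cpp
#include <cassert>
#include <cstddef>
#include <limits>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical toy node: R(t) on the training sample drives the choice of the
// weakest link, R(t) on the validation sample drives the final selection.
struct TreeNode {
    double rTrain = 0.0;
    double rValid = 0.0;
    std::unique_ptr<TreeNode> left, right;
    bool IsLeaf() const { return !left && !right; }
};

std::unique_ptr<TreeNode> Leaf(double rTrain, double rValid) {
    auto n = std::make_unique<TreeNode>();
    n->rTrain = rTrain;
    n->rValid = rValid;
    return n;
}

std::unique_ptr<TreeNode> Split(double rTrain, double rValid,
                                std::unique_ptr<TreeNode> l, std::unique_ptr<TreeNode> r) {
    auto n = Leaf(rTrain, rValid);
    n->left = std::move(l);
    n->right = std::move(r);
    return n;
}

// R(T_t) summed over terminal nodes, on either the training or validation sample.
double SubtreeCost(const TreeNode& t, bool validation) {
    if (t.IsLeaf()) return validation ? t.rValid : t.rTrain;
    return SubtreeCost(*t.left, validation) + SubtreeCost(*t.right, validation);
}

int NumLeaves(const TreeNode& t) {
    return t.IsLeaf() ? 1 : NumLeaves(*t.left) + NumLeaves(*t.right);
}

// Find the non-terminal node with the smallest g(t) = (R(t) - R(T_t)) / (|~T_t| - 1).
// Ties keep the first node visited (pre-order), so ancestors win over descendants.
void FindWeakestLink(TreeNode& t, TreeNode*& best, double& bestAlpha) {
    if (t.IsLeaf()) return;
    const double g = (t.rTrain - SubtreeCost(t, false)) / (NumLeaves(t) - 1);
    if (g < bestAlpha) {
        bestAlpha = g;
        best = &t;
    }
    FindWeakestLink(*t.left, best, bestAlpha);
    FindWeakestLink(*t.right, best, bestAlpha);
}

struct PruneStep {
    double alpha;     // critical alpha_k at which T_k appears (0 for the unpruned tree)
    int leaves;       // number of terminal nodes of T_k
    double validCost; // R(T_k) on the validation sample
};

// Collapse weakest links one at a time until only the root is left, then
// return the sequence and store in optimalK the index of the step with the
// lowest validation cost (ties go to the later, i.e. smaller, tree).
std::vector<PruneStep> PruneSequence(TreeNode& root, std::size_t& optimalK) {
    std::vector<PruneStep> steps;
    // T_0: the unpruned tree (this sketch skips Breiman's initial alpha = 0 clean-up)
    steps.push_back({0.0, NumLeaves(root), SubtreeCost(root, true)});
    while (!root.IsLeaf()) {
        TreeNode* weakest = nullptr;
        double alpha = std::numeric_limits<double>::infinity();
        FindWeakestLink(root, weakest, alpha);
        assert(weakest != nullptr);
        weakest->left.reset();   // turn the weakest link into a terminal node
        weakest->right.reset();
        steps.push_back({alpha, NumLeaves(root), SubtreeCost(root, true)});
    }
    optimalK = 0;
    for (std::size_t k = 1; k < steps.size(); ++k) {
        if (steps[k].validCost <= steps[optimalK].validCost) optimalK = k;
    }
    return steps;
}
```

On the toy tree `Split(0.5, 0.75, Split(0.25, 0.25, Leaf(0.0625, 0.25), Leaf(0.125, 0.25)), Leaf(0.125, 0.125))`, whose made-up validation costs are chosen so that the full tree overfits, the sequence of critical points is \( \alpha_1 = 0.0625 \), \( \alpha_2 = 0.125 \), and the two-leaf tree \( T_1 \) is selected (validation cost 0.375, against 0.625 for the full tree and 0.75 for the root alone).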
inline
Definition at line 110 of file CCPruner.h.
private
regularization parameter (prune strength) in CC pruning
Definition at line 93 of file CCPruner.h.
private
debug flag
Definition at line 106 of file CCPruner.h.
private
index of the optimal tree in the pruned tree sequence
Definition at line 105 of file CCPruner.h.
private
flag indicating whether fQualityIndex is owned by this object
Definition at line 97 of file CCPruner.h.
private
map of weakest links (i.e., branches to prune) -> pruning index
Definition at line 101 of file CCPruner.h.
private
map of alpha -> pruning index
Definition at line 102 of file CCPruner.h.
private
the quality index used to calculate \( R(t) \), with \( R(T) = \sum_{t \in \sim T} R(t) \)
Definition at line 96 of file CCPruner.h.
private
map of R(T) -> pruning index
Definition at line 103 of file CCPruner.h.
private
(pruned) decision tree
Definition at line 99 of file CCPruner.h.
the event sample to select the optimally-pruned tree
Definition at line 95 of file CCPruner.h.
the event sample to select the optimally-pruned tree
Definition at line 94 of file CCPruner.h.