TMVA::CCPruner Class Reference

A helper class to prune a decision tree using the Cost Complexity method (see Classification and Regression Trees by Leo Breiman et al.).

Some definitions:

  • \( T_{max} \) - the initial, usually highly overtrained tree, that is to be pruned back
  • \( R(T) \) - quality index (Gini, misclassification rate, or other) of a tree \( T \)
  • \( \sim T \) - set of terminal nodes in \( T \)
  • \( T' \) - the pruned subtree of \( T_{max} \) that has the best quality index \( R(T') \)
  • \( \alpha \) - the prune strength parameter in Cost Complexity pruning \( (R_{\alpha}(T) = R(T) + \alpha \cdot |\sim T|) \)
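
The definition of \( R_{\alpha}(T) \) above can be illustrated with a minimal sketch. The helper name `CostComplexity` and the representation of a tree as a plain list of leaf quality indices are assumptions made for illustration; they are not part of the TMVA API.

```cpp
#include <cassert>
#include <vector>

// Hypothetical helper, not part of TMVA.
// Computes R_alpha(T) = R(T) + alpha * |~T|, where R(T) is taken to be the
// sum of the quality indices R(t) over the terminal nodes t of T.
double CostComplexity(const std::vector<double>& leafQuality, double alpha)
{
   assert(alpha >= 0.0);                 // a negative prune strength is not meaningful here
   double r = 0.0;
   for (double q : leafQuality) r += q;  // R(T) = sum over terminal nodes of R(t)
   return r + alpha * static_cast<double>(leafQuality.size()); // penalty alpha * |~T|
}
```

A larger \( \alpha \) penalises trees with many terminal nodes more strongly, so the minimiser of \( R_{\alpha} \) moves towards smaller subtrees as \( \alpha \) grows.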

CCPruner has two running modes. (i) One may select a prune strength \( \alpha \) and prune back the tree \( T_{max} \) until the criterion

\[ \alpha < \frac{R(t) - R(T_t)}{|\sim T_t| - 1} \]

is true for all internal nodes \( t \) in \( T \), where \( T_t \) is the branch of \( T \) rooted at \( t \). (ii) Alternatively, the algorithm finds the sequence of critical points \( \alpha_k < \alpha_{k+1} < ... < \alpha_K \) such that \( T_K = root(T_{max}) \), and then selects the optimally-pruned subtree, defined to be the subtree with the best quality index for the validation sample.
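
Mode (ii) can be sketched on a toy tree. The sketch assumes the standard weakest-link formulation from Breiman et al., in which each internal node \( t \) has a critical value \( g(t) = (R(t) - R(T_t)) / (|\sim T_t| - 1) \) and the node with the smallest \( g(t) \) is pruned first. The types `Node` and `BranchInfo` and all function names are hypothetical; this is not the TMVA implementation and does not use `TMVA::DecisionTreeNode`.

```cpp
#include <cassert>
#include <limits>
#include <memory>
#include <vector>

// Toy node for illustration only. Every internal node is assumed to have
// both children; a terminal node has none.
struct Node {
   double r = 0.0;                    // R(t): quality index of t if it were terminal
   std::unique_ptr<Node> left, right;
   bool IsLeaf() const { return !left && !right; }
};

struct BranchInfo { double r; int leaves; }; // R(T_t) and |~T_t|

// Sum R over the terminal nodes of the branch rooted at t and count them.
BranchInfo Summarize(const Node& t)
{
   if (t.IsLeaf()) return {t.r, 1};
   BranchInfo l = Summarize(*t.left), rr = Summarize(*t.right);
   return {l.r + rr.r, l.leaves + rr.leaves};
}

// Critical prune strength g(t) of an internal node t.
double CriticalAlpha(const Node& t)
{
   BranchInfo b = Summarize(t);
   assert(b.leaves > 1);              // only defined for internal nodes
   return (t.r - b.r) / (b.leaves - 1);
}

// Return the internal node with the smallest g(t) below 'best', updating 'best'.
Node* WeakestLink(Node& t, double& best)
{
   if (t.IsLeaf()) return nullptr;
   Node* result = nullptr;
   double g = CriticalAlpha(t);
   if (g < best) { best = g; result = &t; }
   if (Node* n = WeakestLink(*t.left, best)) result = n;
   if (Node* n = WeakestLink(*t.right, best)) result = n;
   return result;
}

// Collapse the weakest link repeatedly until only the root remains and
// return the critical prune strengths in the order they were encountered.
std::vector<double> PruneSequence(Node& root)
{
   std::vector<double> alphas;
   while (!root.IsLeaf()) {
      double best = std::numeric_limits<double>::infinity();
      Node* weakest = WeakestLink(root, best);
      weakest->left.reset();          // turn the weakest link into a terminal node
      weakest->right.reset();
      alphas.push_back(best);
   }
   return alphas;
}
```

Under the same assumptions, mode (i) would correspond to stopping this loop as soon as the smallest remaining \( g(t) \) exceeds the chosen \( \alpha \), rather than continuing down to the root.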

Definition at line 61 of file CCPruner.h.

Public Types

typedef std::vector< Event * > EventList
 

Public Member Functions

 CCPruner (DecisionTree *t_max, const DataSet *validationSample, SeparationBase *qualityIndex=NULL)
 constructor
 
 CCPruner (DecisionTree *t_max, const EventList *validationSample, SeparationBase *qualityIndex=NULL)
 constructor
 
 ~CCPruner ()
 
std::vector< TMVA::DecisionTreeNode * > GetOptimalPruneSequence () const
 return the sequence of nodes to prune (the weakest links) up to the optimal pruning point
 
Float_t GetOptimalPruneStrength () const
 
Float_t GetOptimalQualityIndex () const
 
void Optimize ()
 determine the pruning sequence
 
void SetPruneStrength (Float_t alpha=-1.0)
 

Private Attributes

Float_t fAlpha
 regularization parameter in CC pruning
 
Bool_t fDebug
 debug flag
 
Int_t fOptimalK
 index of the optimal tree in the pruned tree sequence
 
Bool_t fOwnQIndex
 flag indicating whether fQualityIndex is owned by this object
 
std::vector< TMVA::DecisionTreeNode * > fPruneSequence
 map of weakest links (i.e., branches to prune) -> pruning index
 
std::vector< Float_t > fPruneStrengthList
 map of alpha -> pruning index
 
SeparationBase * fQualityIndex
 the quality index used to calculate R(t), R(T) = sum[t in ~T]{ R(t) }
 
std::vector< Float_t > fQualityIndexList
 map of R(T) -> pruning index
 
DecisionTree * fTree
 (pruned) decision tree
 
const DataSet * fValidationDataSet
 the event sample to select the optimally-pruned tree
 
const EventList * fValidationSample
 the event sample to select the optimally-pruned tree
 

#include <TMVA/CCPruner.h>

Member Typedef Documentation

◆ EventList

typedef std::vector<Event*> TMVA::CCPruner::EventList

Definition at line 63 of file CCPruner.h.

Constructor & Destructor Documentation

◆ CCPruner() [1/2]

CCPruner::CCPruner ( DecisionTree * t_max,
const EventList * validationSample,
SeparationBase * qualityIndex = NULL 
)

constructor

Definition at line 69 of file CCPruner.cxx.

◆ CCPruner() [2/2]

CCPruner::CCPruner ( DecisionTree * t_max,
const DataSet * validationSample,
SeparationBase * qualityIndex = NULL 
)

constructor

Definition at line 92 of file CCPruner.cxx.

◆ ~CCPruner()

CCPruner::~CCPruner ( )

Definition at line 115 of file CCPruner.cxx.

Member Function Documentation

◆ GetOptimalPruneSequence()

std::vector< DecisionTreeNode * > CCPruner::GetOptimalPruneSequence ( ) const

return the sequence of nodes to prune (the weakest links) up to the optimal pruning point

Definition at line 240 of file CCPruner.cxx.

◆ GetOptimalPruneStrength()

Float_t TMVA::CCPruner::GetOptimalPruneStrength ( ) const
inline

Definition at line 88 of file CCPruner.h.

◆ GetOptimalQualityIndex()

Float_t TMVA::CCPruner::GetOptimalQualityIndex ( ) const
inline

Definition at line 84 of file CCPruner.h.

◆ Optimize()

void CCPruner::Optimize ( )

determine the pruning sequence

Definition at line 124 of file CCPruner.cxx.
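
The final selection step described in the class overview, choosing the subtree with the best quality index on the validation sample, might look like the sketch below. It assumes one validation quality value per pruning index, that a larger value is better, and that ties go to the earliest index. The function name `BestPruningIndex` is hypothetical and plain `float` stands in for `Float_t`; this is not the code of `Optimize()`.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper, not part of TMVA.
// Given the validation quality index of each subtree in the pruning sequence
// (entry k belongs to the tree after k pruning steps), return the index of
// the best one. Assumes larger is better; the first maximum wins on ties.
int BestPruningIndex(const std::vector<float>& qualityList)
{
   assert(!qualityList.empty());
   std::size_t best = 0;
   for (std::size_t k = 1; k < qualityList.size(); ++k) {
      if (qualityList[k] > qualityList[best]) best = k;
   }
   return static_cast<int>(best);
}
```

With parallel lists of prune strengths and quality indices, the returned index would identify both the optimal prune strength and the corresponding quality index.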

◆ SetPruneStrength()

void TMVA::CCPruner::SetPruneStrength ( Float_t  alpha = -1.0)
inline

Definition at line 109 of file CCPruner.h.

Member Data Documentation

◆ fAlpha

Float_t TMVA::CCPruner::fAlpha
private

regularization parameter in CC pruning

Definition at line 92 of file CCPruner.h.

◆ fDebug

Bool_t TMVA::CCPruner::fDebug
private

debug flag

Definition at line 105 of file CCPruner.h.

◆ fOptimalK

Int_t TMVA::CCPruner::fOptimalK
private

index of the optimal tree in the pruned tree sequence

Definition at line 104 of file CCPruner.h.

◆ fOwnQIndex

Bool_t TMVA::CCPruner::fOwnQIndex
private

flag indicating whether fQualityIndex is owned by this object

Definition at line 96 of file CCPruner.h.

◆ fPruneSequence

std::vector<TMVA::DecisionTreeNode*> TMVA::CCPruner::fPruneSequence
private

map of weakest links (i.e., branches to prune) -> pruning index

Definition at line 100 of file CCPruner.h.

◆ fPruneStrengthList

std::vector<Float_t> TMVA::CCPruner::fPruneStrengthList
private

map of alpha -> pruning index

Definition at line 101 of file CCPruner.h.

◆ fQualityIndex

SeparationBase* TMVA::CCPruner::fQualityIndex
private

the quality index used to calculate R(t), R(T) = sum[t in ~T]{ R(t) }

Definition at line 95 of file CCPruner.h.

◆ fQualityIndexList

std::vector<Float_t> TMVA::CCPruner::fQualityIndexList
private

map of R(T) -> pruning index

Definition at line 102 of file CCPruner.h.

◆ fTree

DecisionTree* TMVA::CCPruner::fTree
private

(pruned) decision tree

Definition at line 98 of file CCPruner.h.

◆ fValidationDataSet

const DataSet* TMVA::CCPruner::fValidationDataSet
private

the event sample to select the optimally-pruned tree

Definition at line 94 of file CCPruner.h.

◆ fValidationSample

const EventList* TMVA::CCPruner::fValidationSample
private

the event sample to select the optimally-pruned tree

Definition at line 93 of file CCPruner.h.


The documentation for this class was generated from the following files: