TMVA::CostComplexityPruneTool Class Reference

A class to prune a decision tree using the Cost Complexity method.

(see "Classification and Regression Trees" by Leo Breiman et al.)

### Some definitions:

• $$T_{max}$$ - the initial, usually highly overtrained tree, that is to be pruned back
• $$R(T)$$ - quality index (Gini, misclassification rate, or other) of a tree $$T$$
• $$\sim T$$ - set of terminal nodes in $$T$$
• $$T'$$ - the pruned subtree of $$T_{max}$$ that has the best quality index $$R(T')$$
• $$\alpha$$ - the prune strength parameter in Cost Complexity pruning $$(R_{\alpha}(T) = R(T) + \alpha \cdot |\sim T|)$$

There are two running modes in CCPruner: (i) one may select a prune strength and prune back the tree $$T_{max}$$ until the criterion:

$\alpha < \frac{R(t) - R(T_t)}{|\sim T_t| - 1}$

is true for all nodes $$t$$ in $$T$$, or (ii) the algorithm finds the sequence of critical points $$\alpha_k < \alpha_{k+1} < \dots < \alpha_K$$ such that $$T_K = \mathrm{root}(T_{max})$$, and then selects the optimally pruned subtree, defined as the subtree with the best quality index on the validation sample.
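To make the definitions above concrete, here is a minimal standalone sketch that evaluates $$R_{\alpha}(T) = R(T) + \alpha \cdot |\sim T|$$ for a toy binary tree. It does not use TMVA: `Node`, `MakeLeaf`, `MakeSplit`, `CountLeaves`, `SubtreeR`, and `CostComplexity` are hypothetical names introduced only for illustration, and $$R(T)$$ is assumed to be the sum of $$R(t)$$ over the terminal nodes.

```cpp
#include <memory>
#include <utility>

// Hypothetical toy node (not TMVA::DecisionTreeNode).
struct Node {
   double r = 0.0;               // R(t): quality index of this node taken as a leaf
   std::unique_ptr<Node> left;   // either both children are set, or neither (leaf)
   std::unique_ptr<Node> right;
   bool IsLeaf() const { return !left && !right; }
};

// Build a terminal node with quality index r.
std::unique_ptr<Node> MakeLeaf(double r)
{
   std::unique_ptr<Node> n(new Node);
   n->r = r;
   return n;
}

// Build an internal node with quality index r and two children.
std::unique_ptr<Node> MakeSplit(double r, std::unique_ptr<Node> l, std::unique_ptr<Node> rgt)
{
   std::unique_ptr<Node> n = MakeLeaf(r);
   n->left = std::move(l);
   n->right = std::move(rgt);
   return n;
}

// |~T|: number of terminal nodes in the tree rooted at n.
int CountLeaves(const Node &n)
{
   return n.IsLeaf() ? 1 : CountLeaves(*n.left) + CountLeaves(*n.right);
}

// R(T): sum of R(t) over the terminal nodes of the tree rooted at n.
double SubtreeR(const Node &n)
{
   return n.IsLeaf() ? n.r : SubtreeR(*n.left) + SubtreeR(*n.right);
}

// R_alpha(T) = R(T) + alpha * |~T|
double CostComplexity(const Node &root, double alpha)
{
   return SubtreeR(root) + alpha * CountLeaves(root);
}
```

A larger $$\alpha$$ penalises every additional terminal node more strongly, so trees with many leaves become less favourable as the prune strength grows.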

## Public Member Functions

CostComplexityPruneTool (SeparationBase *qualityIndex=NULL)
the constructor for the cost complexity pruning

virtual ~CostComplexityPruneTool ()
the destructor for the cost complexity pruning

virtual PruningInfo * CalculatePruningInfo (DecisionTree *dt, const IPruneTool::EventSample *testEvents=NULL, Bool_t isAutomatic=kFALSE)
the routine that "steers" the pruning process.

Public Member Functions inherited from TMVA::IPruneTool
IPruneTool ()

virtual ~IPruneTool ()

virtual PruningInfo * CalculatePruningInfo (DecisionTree *dt, const EventSample *testEvents=NULL, Bool_t isAutomatic=kFALSE)=0

Double_t GetPruneStrength () const

Bool_t IsAutomatic () const

void SetAutomatic ()

void SetPruneStrength (Double_t alpha)

## Private Member Functions

void InitTreePruningMetaData (DecisionTreeNode *n)
initialise the "meta data" for the pruning for each node

MsgLogger & Log () const
output stream to save logging information

void Optimize (DecisionTree *dt, Double_t weights)
after the critical $$\alpha$$ values (at which the corresponding nodes would be pruned away) have been established in InitTreePruningMetaData, carry out automatic or fixed parameter pruning

## Private Attributes

MsgLogger * fLogger

Int_t fOptimalK
the optimal index of the prune sequence

std::vector< DecisionTreeNode * > fPruneSequence
map of weakest links (i.e., branches to prune) -> pruning index

std::vector< Double_t > fPruneStrengthList
map of alpha -> pruning index

std::vector< Double_t > fQualityIndexList
map of R(T) -> pruning index

SeparationBase * fQualityIndexTool
the quality index used to calculate R(t), R(T) = sum[t in ~T]{ R(t) }

Public Types inherited from TMVA::IPruneTool
typedef std::vector< const Event * > EventSample

Protected Attributes inherited from TMVA::IPruneTool
Double_t B

Double_t fPruneStrength
regularization parameter in pruning

Double_t S

#include <TMVA/CostComplexityPruneTool.h>

## ◆ CostComplexityPruneTool()

 CostComplexityPruneTool::CostComplexityPruneTool ( SeparationBase * qualityIndex = NULL )

the constructor for the cost complexity pruning

## ◆ ~CostComplexityPruneTool()

 CostComplexityPruneTool::~CostComplexityPruneTool ( )
virtual

the destructor for the cost complexity pruning

## ◆ CalculatePruningInfo()

 PruningInfo * CostComplexityPruneTool::CalculatePruningInfo ( DecisionTree * dt, const IPruneTool::EventSample * validationSample = NULL, Bool_t isAutomatic = kFALSE )
virtual

the routine that "steers" the pruning process.

Calls the calculation of the pruning sequence, the tree quality, and the like.

Implements TMVA::IPruneTool.

## ◆ InitTreePruningMetaData()

 void CostComplexityPruneTool::InitTreePruningMetaData ( DecisionTreeNode * n )
private

initialise the "meta data" for the pruning, such as the cost complexity, the critical alpha, and the minimal alpha down the tree, for each node.
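The per-node bookkeeping described here can be sketched as a single recursive pass. The following is a standalone illustration, not the TMVA implementation: `MetaNode`, `MakeNode`, and `InitMetaData` are hypothetical names, and the critical alpha is assumed to follow the weakest-link formula of Breiman et al., $$g(t) = \frac{R(t) - R(T_t)}{|\sim T_t| - 1}$$, with terminal nodes assigned an infinite critical alpha so that they are never selected.

```cpp
#include <algorithm>
#include <limits>
#include <memory>
#include <utility>

// Hypothetical toy node carrying the pruning meta data (not TMVA::DecisionTreeNode).
struct MetaNode {
   double nodeR = 0.0;      // R(t): quality index of the node taken as a leaf
   int nTerminal = 1;       // |~T_t|: terminal nodes in the branch rooted here
   double subTreeR = 0.0;   // R(T_t): summed quality index of those terminal nodes
   double alpha = 0.0;      // critical alpha at which this branch would be pruned
   double alphaMin = 0.0;   // smallest critical alpha found in this branch
   std::unique_ptr<MetaNode> left;
   std::unique_ptr<MetaNode> right;
};

// Build a node; omit the children to get a terminal node.
std::unique_ptr<MetaNode> MakeNode(double r, std::unique_ptr<MetaNode> l = nullptr,
                                   std::unique_ptr<MetaNode> rgt = nullptr)
{
   std::unique_ptr<MetaNode> n(new MetaNode);
   n->nodeR = r;
   n->left = std::move(l);
   n->right = std::move(rgt);
   return n;
}

// Fill the meta data bottom-up: children first, then the node itself.
void InitMetaData(MetaNode &n)
{
   const double inf = std::numeric_limits<double>::infinity();
   if (n.left && n.right) {
      InitMetaData(*n.left);
      InitMetaData(*n.right);
      n.nTerminal = n.left->nTerminal + n.right->nTerminal;
      n.subTreeR = n.left->subTreeR + n.right->subTreeR;
      // Assumed weakest-link formula: (R(t) - R(T_t)) / (|~T_t| - 1)
      n.alpha = (n.nodeR - n.subTreeR) / (n.nTerminal - 1);
      n.alphaMin = std::min(n.alpha, std::min(n.left->alphaMin, n.right->alphaMin));
   } else {
      // Terminal node: nothing below it can be pruned.
      n.nTerminal = 1;
      n.subTreeR = n.nodeR;
      n.alpha = inf;
      n.alphaMin = inf;
   }
}
```

Storing the minimal alpha of each branch lets a later pass walk from the root straight to the weakest link without revisiting every node.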

## ◆ Log()

 MsgLogger & TMVA::CostComplexityPruneTool::Log ( ) const
inline private

output stream to save logging information

## ◆ Optimize()

 void CostComplexityPruneTool::Optimize ( DecisionTree * dt, Double_t weights )
private

After the critical $$\alpha$$ values (at which the corresponding nodes would be pruned away) have been established in InitTreePruningMetaData, one of two modes applies:

• automatic pruning: find the value of $$\alpha$$ for which the test sample gives minimal error, on the tree with all nodes pruned that have $$\alpha_{critical} < \alpha$$
• fixed parameter pruning: use the prune strength selected by the user
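The choice between the two modes can be illustrated with a small standalone sketch. It is an assumption-laden illustration rather than the TMVA code: `SelectOptimalK` is a hypothetical helper, the critical alphas are assumed to be sorted in ascending order, the automatic mode is assumed to minimise an error measured on the test sample, and the fixed mode is assumed to keep every prune step whose critical alpha lies below the chosen prune strength.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper: returns the index k into the prune sequence up to which
// branches are pruned, or -1 when nothing should be pruned.
int SelectOptimalK(const std::vector<double> &criticalAlphas,   // ascending alpha_k
                   const std::vector<double> &validationError,  // error after step k
                   bool isAutomatic, double pruneStrength)
{
   int best = -1;
   if (isAutomatic) {
      // Automatic mode: take the prune step with the smallest test-sample error.
      for (std::size_t k = 0; k < validationError.size(); ++k) {
         if (best < 0 || validationError[k] < validationError[best]) {
            best = static_cast<int>(k);
         }
      }
   } else {
      // Fixed mode: keep pruning while the critical alpha is below the prune strength.
      for (std::size_t k = 0; k < criticalAlphas.size(); ++k) {
         if (criticalAlphas[k] < pruneStrength) {
            best = static_cast<int>(k);
         } else {
            break;
         }
      }
   }
   return best;
}
```

In the automatic mode the prune strength argument is ignored; in the fixed mode the error list is ignored, so no test sample is needed.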

## ◆ fLogger

 MsgLogger* TMVA::CostComplexityPruneTool::fLogger
mutable private

## ◆ fOptimalK

 Int_t TMVA::CostComplexityPruneTool::fOptimalK
private

the optimal index of the prune sequence

## ◆ fPruneSequence

 std::vector< DecisionTreeNode * > TMVA::CostComplexityPruneTool::fPruneSequence
private

map of weakest links (i.e., branches to prune) -> pruning index

## ◆ fPruneStrengthList

 std::vector< Double_t > TMVA::CostComplexityPruneTool::fPruneStrengthList
private

map of alpha -> pruning index

## ◆ fQualityIndexList

 std::vector< Double_t > TMVA::CostComplexityPruneTool::fQualityIndexList
private

map of R(T) -> pruning index

## ◆ fQualityIndexTool

 SeparationBase* TMVA::CostComplexityPruneTool::fQualityIndexTool
private

the quality index used to calculate R(t), R(T) = sum[t in ~T]{ R(t) }

