MethodDT.cxx
// @(#)root/tmva $Id$
// Author: Andreas Hoecker, Joerg Stelzer, Helge Voss, Kai Voss

/**********************************************************************************
 * Project: TMVA - a Root-integrated toolkit for multivariate data analysis *
 * Package: TMVA *
 * Class : MethodDT (DT = Decision Trees) *
 * *
 * *
 * Description: *
 * Analysis of Boosted Decision Trees *
 * *
 * Authors (alphabetical): *
 * Andreas Hoecker <Andreas.Hocker@cern.ch> - CERN, Switzerland *
 * Helge Voss <Helge.Voss@cern.ch> - MPI-K Heidelberg, Germany *
 * Or Cohen <orcohenor@gmail.com> - Weizmann Inst., Israel *
 * *
 * Copyright (c) 2005: *
 * CERN, Switzerland *
 * MPI-K Heidelberg, Germany *
 * *
 * Redistribution and use in source and binary forms, with or without *
 * modification, are permitted according to the terms listed in LICENSE *
 * (see tmva/doc/LICENSE) *
 **********************************************************************************/

/*! \class TMVA::MethodDT
\ingroup TMVA

Analysis of Boosted Decision Trees

Boosted decision trees have been successfully used in High Energy
Physics analysis, for example by the MiniBooNE experiment
(Yang-Roe-Zhu, physics/0508045). In Boosted Decision Trees, the
selection is based on a majority vote over the results of several decision
trees, which are all derived from the same training sample by
supplying different event weights during the training.

### Decision trees:

Successive decision nodes are used to categorize the
events out of the sample as either signal or background. Each node
uses only a single discriminating variable to decide if the event is
signal-like ("goes right") or background-like ("goes left"). This
forms a tree-like structure with "baskets" at the end (leaf nodes),
and an event is classified as either signal or background according to
whether the basket where it ends up has been classified as signal or
background during the training. Training of a decision tree is the
process of defining the "cut criteria" for each node. The training
starts with the root node. Here one takes the full training event
sample and selects the variable and corresponding cut value that gives
the best separation between signal and background at this stage. Using
this cut criterion, the sample is then divided into two subsamples, a
signal-like (right) and a background-like (left) sample. Two new nodes
are then created for each of the two sub-samples and they are
constructed using the same mechanism as described for the root
node. The division is stopped once a certain node has reached either a
minimum number of events, or a minimum or maximum signal purity. These
leaf nodes are then called "signal" or "background" if they contain
more signal or background events, respectively, from the training sample.
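
The recursive training step can be pictured with the following sketch (illustrative
pseudo-C++ only, not the TMVA implementation; `Node`, `Cut`, `Purity`, `BestCut`,
`Split` and the thresholds are hypothetical placeholders):

~~~ {.cpp}
// grow one node from the training events that reach it
void TrainNode(Node* node, const std::vector<Event*>& sample)
{
   double p = Purity(sample);                        // weighted signal fraction in this node
   if (sample.size() < minEvents || p < minPurity || p > maxPurity) {
      node->MakeLeaf(p > 0.5);                       // leaf: classified as signal if mostly signal
      return;
   }
   Cut best = BestCut(sample);                       // variable and cut value with the best separation
   node->SetCut(best);
   auto halves = Split(sample, best);                // background-like events left, signal-like right
   TrainNode(node->CreateLeftChild(),  halves.left);
   TrainNode(node->CreateRightChild(), halves.right);
}
~~~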

### Boosting:

The idea behind boosting is that signal events from the training
sample that end up in a background node (and vice versa) are given a
larger weight than events that are in the correct leaf node. This
results in a re-weighted training event sample, with which a new
decision tree can then be developed. The boosting can be applied several
times (typically 100-500 times) and one ends up with a set of decision
trees (a forest).
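
A minimal sketch of one such re-weighting step (assumed AdaBoost-like form; in TMVA
the boosting of this classifier is steered externally by MethodBoost, and
`MisclassifiedWeightFraction`, `Misclassified`, `ScaleWeight` and `NormaliseWeights`
are hypothetical helpers):

~~~ {.cpp}
double err   = MisclassifiedWeightFraction(tree, sample); // weighted error of the previous tree
double boost = (1.0 - err) / err;                         // boost factor, > 1 as long as err < 0.5
for (Event* ev : sample)
   if (Misclassified(tree, ev)) ScaleWeight(ev, boost);   // enhance the events the tree got wrong
NormaliseWeights(sample);                                 // keep the total sample weight unchanged
~~~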

### Bagging:

In this particular variant of Boosted Decision Trees, the boosting
is not done on the basis of previous training results, but by a simple
stochastic re-sampling of the initial training event sample.
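
A bagging step could be sketched as follows (illustrative only; `sample` and `seed`
are placeholders):

~~~ {.cpp}
// draw a bootstrap re-sample of the training events (with replacement)
TRandom3 rng(seed);
std::vector<Event*> resampled;
for (size_t i = 0; i < sample.size(); ++i)
   resampled.push_back(sample[rng.Integer(sample.size())]);
// the next tree is then grown on 'resampled' instead of a re-weighted sample
~~~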

### Analysis:

Applying an individual decision tree to a test event results in a
classification of the event as either signal or background. For the
boosted decision tree selection, an event is successively subjected to
the whole set of decision trees, and depending on how often it is
classified as signal, a "likelihood" estimator is constructed for the
event being signal or background. The value of this estimator is
then used to select the events from an event sample, and
the cut value on this estimator defines the efficiency and purity of
the selection.
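
This estimator can be pictured as the fraction of trees voting "signal"; a minimal
sketch using the existing DecisionTree::CheckEvent / GetNodePurityLimit interface
(the forest bookkeeping itself is done by MethodBoost, not by this class):

~~~ {.cpp}
double ForestResponse(const std::vector<TMVA::DecisionTree*>& forest, const TMVA::Event* ev)
{
   double nSignal = 0;
   for (auto* tree : forest)
      if (tree->CheckEvent(ev) > tree->GetNodePurityLimit()) ++nSignal; // this tree votes "signal"
   return nSignal / forest.size();   // value in [0,1], used as the discriminant
}
~~~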
*/

91#include "TMVA/MethodDT.h"
92
94#include "TMVA/CCPruner.h"
96#include "TMVA/Configurable.h"
97#include "TMVA/CrossEntropy.h"
98#include "TMVA/DataSet.h"
99#include "TMVA/DecisionTree.h"
100#include "TMVA/GiniIndex.h"
101#include "TMVA/IMethod.h"
102#include "TMVA/MethodBase.h"
103#include "TMVA/MethodBoost.h"
105#include "TMVA/MsgLogger.h"
106#include "TMVA/Ranking.h"
107#include "TMVA/SdivSqrtSplusB.h"
108#include "TMVA/SeparationBase.h"
109#include "TMVA/Timer.h"
110#include "TMVA/Tools.h"
111#include "TMVA/Types.h"
112
113#include "TRandom3.h"
114
115#include <iostream>
116#include <algorithm>
117
118using std::vector;
119
121
123
////////////////////////////////////////////////////////////////////////////////
/// the standard constructor for just an ordinary "decision tree"

TMVA::MethodDT::MethodDT( const TString& jobName,
                          const TString& methodTitle,
                          DataSetInfo& theData,
                          const TString& theOption) :
   TMVA::MethodBase( jobName, Types::kDT, methodTitle, theData, theOption)
   , fTree(0)
   , fSepType(0)
   , fMinNodeEvents(0)
   , fMinNodeSize(0)
   , fNCuts(0)
   , fUseYesNoLeaf(kFALSE)
   , fNodePurityLimit(0)
   , fMaxDepth(0)
   , fErrorFraction(0)
   , fPruneStrength(0)
   , fPruneMethod(DecisionTree::kNoPruning)
   , fAutomatic(kFALSE)
   , fRandomisedTrees(kFALSE)
   , fUseNvars(0)
   , fUsePoissonNvars(0) // don't use this initialisation, only here to make Coverity happy. Is set in Init()
   , fDeltaPruneStrength(0)
{
}

////////////////////////////////////////////////////////////////////////////////
/// constructor from Reader

TMVA::MethodDT::MethodDT( DataSetInfo& dsi,
                          const TString& theWeightFile) :
   TMVA::MethodBase( Types::kDT, dsi, theWeightFile)
   , fTree(0)
   , fSepType(0)
   , fMinNodeEvents(0)
   , fMinNodeSize(0)
   , fNCuts(0)
   , fUseYesNoLeaf(kFALSE)
   , fNodePurityLimit(0)
   , fMaxDepth(0)
   , fErrorFraction(0)
   , fPruneStrength(0)
   , fPruneMethod(DecisionTree::kNoPruning)
   , fAutomatic(kFALSE)
   , fRandomisedTrees(kFALSE)
   , fUseNvars(0)
   , fDeltaPruneStrength(0)
{
}

////////////////////////////////////////////////////////////////////////////////
/// DT currently handles only classification with 2 classes

Bool_t TMVA::MethodDT::HasAnalysisType( Types::EAnalysisType type, UInt_t numberClasses, UInt_t /*numberTargets*/ )
{
   if( type == Types::kClassification && numberClasses == 2 ) return kTRUE;
   return kFALSE;
}


////////////////////////////////////////////////////////////////////////////////
/// Define the options (their key words) that can be set in the option string;
/// a short booking example is given below the list.
///
/// - UseRandomisedTrees  choose at each node splitting a random set of variables
/// - UseNvars  use UseNvars variables in randomised trees
/// - SeparationType  the separation criterion applied in the node splitting.
///   known:
///      - GiniIndex
///      - MisClassificationError
///      - CrossEntropy
///      - SDivSqrtSPlusB
/// - nEventsMin: the minimum number of events in a node (leaf criterion, stop splitting)
/// - nCuts:  the number of steps in the optimisation of the cut for a node (if < 0, then
///   the step size is determined by the events)
/// - UseYesNoLeaf  decide if the classification is done simply by the node type, or by the S/B
///   (from the training) in the leaf node
/// - NodePurityLimit  the minimum purity to classify a node as a signal node (used in pruning and boosting to determine
///   the misclassification error rate)
/// - PruneMethod  the pruning method:
///   known:
///      - NoPruning  // switch off pruning completely
///      - ExpectedError
///      - CostComplexity
/// - PruneStrength  a parameter to adjust the amount of pruning. Should be large enough such that overtraining is avoided.
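///
/// As an illustrative sketch (the option values are just examples, and `factory`
/// and `dataloader` stand for an existing TMVA::Factory and TMVA::DataLoader),
/// a booking call using these keywords could look like:
/// ~~~ {.cpp}
/// factory->BookMethod( dataloader, TMVA::Types::kDT, "DT",
///                      "SeparationType=GiniIndex:nCuts=20:MinNodeSize=5%:"
///                      "MaxDepth=3:PruneMethod=CostComplexity:PruneStrength=-1" );
/// ~~~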

void TMVA::MethodDT::DeclareOptions()
{
   DeclareOptionRef(fRandomisedTrees,"UseRandomisedTrees","Choose at each node splitting a random set of variables and *bagging*");
   DeclareOptionRef(fUseNvars,"UseNvars","Number of variables used if randomised Tree option is chosen");
   DeclareOptionRef(fUsePoissonNvars,"UsePoissonNvars", "Interpret \"UseNvars\" not as fixed number but as mean of a Poisson distribution in each split with RandomisedTree option");
   DeclareOptionRef(fUseYesNoLeaf=kTRUE, "UseYesNoLeaf",
                    "Use Sig or Bkg node type or the ratio S/B as classification in the leaf node");
   DeclareOptionRef(fNodePurityLimit=0.5, "NodePurityLimit", "In boosting/pruning, nodes with purity > NodePurityLimit are signal; background otherwise.");
   DeclareOptionRef(fSepTypeS="GiniIndex", "SeparationType", "Separation criterion for node splitting");
   AddPreDefVal(TString("MisClassificationError"));
   AddPreDefVal(TString("GiniIndex"));
   AddPreDefVal(TString("CrossEntropy"));
   AddPreDefVal(TString("SDivSqrtSPlusB"));
   DeclareOptionRef(fMinNodeEvents=-1, "nEventsMin", "deprecated !!! Minimum number of events required in a leaf node");
   DeclareOptionRef(fMinNodeSizeS, "MinNodeSize", "Minimum percentage of training events required in a leaf node (default: Classification: 10%, Regression: 1%)");
   DeclareOptionRef(fNCuts, "nCuts", "Number of steps during node cut optimisation");
   DeclareOptionRef(fPruneStrength, "PruneStrength", "Pruning strength (negative value == automatic adjustment)");
   DeclareOptionRef(fPruneMethodS="NoPruning", "PruneMethod", "Pruning method: NoPruning (switched off), ExpectedError or CostComplexity");

   AddPreDefVal(TString("NoPruning"));
   AddPreDefVal(TString("ExpectedError"));
   AddPreDefVal(TString("CostComplexity"));

   if (DoRegression()) {
      DeclareOptionRef(fMaxDepth=50,"MaxDepth","Max depth of the decision tree allowed");
   }else{
      DeclareOptionRef(fMaxDepth=3,"MaxDepth","Max depth of the decision tree allowed");
   }
}

////////////////////////////////////////////////////////////////////////////////
/// options that are used ONLY for the READER to ensure backward compatibility

void TMVA::MethodDT::DeclareCompatibilityOptions() {

   MethodBase::DeclareCompatibilityOptions();

   DeclareOptionRef(fPruneBeforeBoost=kFALSE, "PruneBeforeBoost",
                    "--> removed option .. only kept for reader backward compatibility");
}

////////////////////////////////////////////////////////////////////////////////
/// the option string is decoded, for available options see "DeclareOptions"

void TMVA::MethodDT::ProcessOptions()
{
   fSepTypeS.ToLower();
   if      (fSepTypeS == "misclassificationerror") fSepType = new MisClassificationError();
   else if (fSepTypeS == "giniindex")              fSepType = new GiniIndex();
   else if (fSepTypeS == "crossentropy")           fSepType = new CrossEntropy();
   else if (fSepTypeS == "sdivsqrtsplusb")         fSepType = new SdivSqrtSplusB();
   else {
      Log() << kINFO << GetOptions() << Endl;
      Log() << kFATAL << "<ProcessOptions> unknown Separation Index option called" << Endl;
   }

   //   std::cout << "fSeptypes " << fSepTypeS << "  fseptype " << fSepType << std::endl;

   fPruneMethodS.ToLower();
   if      (fPruneMethodS == "expectederror" )  fPruneMethod = DecisionTree::kExpectedErrorPruning;
   else if (fPruneMethodS == "costcomplexity" ) fPruneMethod = DecisionTree::kCostComplexityPruning;
   else if (fPruneMethodS == "nopruning" )      fPruneMethod = DecisionTree::kNoPruning;
   else {
      Log() << kINFO << GetOptions() << Endl;
      Log() << kFATAL << "<ProcessOptions> unknown PruneMethod option:" << fPruneMethodS << " called" << Endl;
   }

   if (fPruneStrength < 0) fAutomatic = kTRUE;
   else fAutomatic = kFALSE;
   if (fAutomatic && fPruneMethod == DecisionTree::kExpectedErrorPruning){
      Log() << kFATAL
            << "Sorry automatic pruning strength determination is not implemented yet for ExpectedErrorPruning" << Endl;
   }


   if (this->Data()->HasNegativeEventWeights()){
      Log() << kINFO << " You are using a Monte Carlo that has also negative weights. "
            << "That should in principle be fine as long as on average you end up with "
            << "something positive. For this you have to make sure that the minimal number "
            << "of (un-weighted) events demanded for a tree node (currently you use: MinNodeSize="
            << fMinNodeSizeS
            << ", (or the deprecated equivalent nEventsMin) you can set this via the "
            << "MethodDT option string when booking the "
            << "classifier) is large enough to allow for reasonable averaging!!! "
            << " If this does not help.. maybe you want to try the option: IgnoreNegWeightsInTraining "
            << "which ignores events with negative weight in the training. " << Endl
            << Endl << "Note: You'll get a WARNING message during the training if that should ever happen" << Endl;
   }

   if (fRandomisedTrees){
      Log() << kINFO << " Randomised trees should use *bagging* as *boost* method. Did you set this in the *MethodBoost* ? . Here I can enforce only the *no pruning*" << Endl;
      fPruneMethod = DecisionTree::kNoPruning;
      //      fBoostType   = "Bagging";
   }

   if (fMinNodeEvents > 0){
      fMinNodeSize = fMinNodeEvents * 100.0 / Data()->GetNTrainingEvents();
      Log() << kWARNING << "You have explicitly set *nEventsMin*, the min absolute number \n"
            << "of events in a leaf node. This is DEPRECATED, please use the option \n"
            << "*MinNodeSize* giving the relative number as percentage of training \n"
            << "events instead. \n"
            << "nEventsMin="<<fMinNodeEvents<< "--> MinNodeSize="<<fMinNodeSize<<"%"
            << Endl;
   }else{
      SetMinNodeSize(fMinNodeSizeS);
   }
}

void TMVA::MethodDT::SetMinNodeSize(Double_t sizeInPercent){
   if (sizeInPercent > 0 && sizeInPercent < 50){
      fMinNodeSize=sizeInPercent;

   } else {
      Log() << kERROR << "you have demanded a minimal node size of "
            << sizeInPercent << "% of the training events.. \n"
            << " that somehow does not make sense "<<Endl;
   }

}

void TMVA::MethodDT::SetMinNodeSize(TString sizeInPercent){
   sizeInPercent.ReplaceAll("%","");
   if (sizeInPercent.IsAlnum()) SetMinNodeSize(sizeInPercent.Atof());
   else {
      Log() << kERROR << "I had problems reading the option MinNodeEvents, which\n"
            << "after removing a possible % sign now reads " << sizeInPercent << Endl;
   }
}

////////////////////////////////////////////////////////////////////////////////
/// common initialisation with defaults for the DT-Method

void TMVA::MethodDT::Init( void )
{
   fMinNodeEvents   = -1;
   fMinNodeSize     = 5;
   fMinNodeSizeS    = "5%";
   fNCuts           = 20;
   fPruneMethod     = DecisionTree::kNoPruning;
   fPruneStrength   = 5;     // -1 means automatic determination of the prune strength using a validation sample
   fDeltaPruneStrength=0.1;
   fRandomisedTrees = kFALSE;
   fUseNvars        = GetNvar();
   fUsePoissonNvars = kTRUE;

   // reference cut value to distinguish signal-like from background-like events
   SetSignalReferenceCut( 0 );
   if (fAnalysisType == Types::kClassification || fAnalysisType == Types::kMulticlass ) {
      fMaxDepth = 3;
   }else {
      fMaxDepth = 50;
   }
}

////////////////////////////////////////////////////////////////////////////////
/// destructor

TMVA::MethodDT::~MethodDT( void )
{
   delete fTree;
}

////////////////////////////////////////////////////////////////////////////////

void TMVA::MethodDT::Train( void )
{
   fTree = new DecisionTree( fSepType, fMinNodeSize, fNCuts, &(DataInfo()), 0,
                             fRandomisedTrees, fUseNvars, fUsePoissonNvars, fMaxDepth, 0 );
   fTree->SetNVars(GetNvar());
   if (fRandomisedTrees) Log()<<kWARNING<<" randomised Trees do not work yet in this framework,"
                              << " as I do not know how to give each tree a new random seed, now they"
                              << " will be all the same and that is not good " << Endl;
   fTree->SetAnalysisType( GetAnalysisType() );

   //fTree->BuildTree(GetEventCollection(Types::kTraining));
   Data()->SetCurrentType(Types::kTraining);
   UInt_t nevents = Data()->GetNTrainingEvents();
   std::vector<const TMVA::Event*> tmp;
   for (Long64_t ievt=0; ievt<nevents; ievt++) {
      const Event *event = GetEvent(ievt);
      tmp.push_back(event);
   }
   fTree->BuildTree(tmp);
   if (fPruneMethod != DecisionTree::kNoPruning) fTree->PruneTree();

   ExitFromTraining();
}

////////////////////////////////////////////////////////////////////////////////
/// prune the decision tree if requested (good for individual trees that are best grown out, and then
/// pruned back, while boosted decision trees are best 'small' trees to start with. Well, at least the
/// standard "optimal pruning algorithms" don't result in 'weak enough' classifiers !!

Double_t TMVA::MethodDT::PruneTree( )
{
   // remember the number of nodes beforehand (for monitoring purposes)


   if (fAutomatic && fPruneMethod == DecisionTree::kCostComplexityPruning) { // automatic cost complexity pruning
      CCPruner* pruneTool = new CCPruner(fTree, this->Data() , fSepType);
      pruneTool->Optimize();
      std::vector<DecisionTreeNode*> nodes = pruneTool->GetOptimalPruneSequence();
      fPruneStrength = pruneTool->GetOptimalPruneStrength();
      for(UInt_t i = 0; i < nodes.size(); i++)
         fTree->PruneNode(nodes[i]);
      delete pruneTool;
   }
   else if (fAutomatic && fPruneMethod != DecisionTree::kCostComplexityPruning){
      /*

      Double_t alpha = 0;
      Double_t delta = fDeltaPruneStrength;

      DecisionTree* dcopy;
      std::vector<Double_t> q;
      multimap<Double_t,Double_t> quality;
      Int_t nnodes=fTree->GetNNodes();

      // find the maximum prune strength that still leaves some nodes
      Bool_t forceStop = kFALSE;
      Int_t troubleCount=0, previousNnodes=nnodes;


      nnodes=fTree->GetNNodes();
      while (nnodes > 3 && !forceStop) {
         dcopy = new DecisionTree(*fTree);
         dcopy->SetPruneStrength(alpha+=delta);
         dcopy->PruneTree();
         q.push_back(TestTreeQuality(dcopy));
         quality.insert(std::pair<const Double_t,Double_t>(q.back(),alpha));
         nnodes=dcopy->GetNNodes();
         if (previousNnodes == nnodes) troubleCount++;
         else {
            troubleCount=0; // reset counter
            if (nnodes < previousNnodes / 2 ) fDeltaPruneStrength /= 2.;
         }
         previousNnodes = nnodes;
         if (troubleCount > 20) {
            if (methodIndex == 0 && fPruneStrength <=0) {//maybe you need larger stepsize ??
               fDeltaPruneStrength *= 5;
               Log() << kINFO << "<PruneTree> trouble determining optimal prune strength"
                     << " for Tree " << methodIndex
                     << " --> first try to increase the step size"
                     << " currently Prunestrenght= " << alpha
                     << " stepsize " << fDeltaPruneStrength << " " << Endl;
               troubleCount = 0;   // try again
               fPruneStrength = 1; // if it was for the first time..
            } else if (methodIndex == 0 && fPruneStrength <=2) {//maybe you need much larger stepsize ??
               fDeltaPruneStrength *= 5;
               Log() << kINFO << "<PruneTree> trouble determining optimal prune strength"
                     << " for Tree " << methodIndex
                     << " --> try to increase the step size even more.. "
                     << " if that still didn't work, TRY IT BY HAND"
                     << " currently Prunestrenght= " << alpha
                     << " stepsize " << fDeltaPruneStrength << " " << Endl;
               troubleCount = 0;   // try again
               fPruneStrength = 3; // if it was for the first time..
            } else {
               forceStop=kTRUE;
               Log() << kINFO << "<PruneTree> trouble determining optimal prune strength"
                     << " for Tree " << methodIndex << " at tested prune strength: " << alpha << " --> abort forced, use same strength as for previous tree:"
                     << fPruneStrength << Endl;
            }
         }
         if (fgDebugLevel==1) Log() << kINFO << "Pruned with ("<<alpha
                                    << ") give quality: " << q.back()
                                    << " and #nodes: " << nnodes
                                    << Endl;
         delete dcopy;
      }
      if (!forceStop) {
         multimap<Double_t,Double_t>::reverse_iterator it=quality.rend();
         it++;
         fPruneStrength = it->second;
         // adjust the step size for the next tree.. think that 20 steps are sort of
         // fine enough.. could become a tunable option later..
         fDeltaPruneStrength *= Double_t(q.size())/20.;
      }

      fTree->SetPruneStrength(fPruneStrength);
      fTree->PruneTree();
      */
   }
   else {
      fTree->SetPruneStrength(fPruneStrength);
      fTree->PruneTree();
   }

   return fPruneStrength;
}

////////////////////////////////////////////////////////////////////////////////

Double_t TMVA::MethodDT::TestTreeQuality( DecisionTree *dt )
{
   Data()->SetCurrentType(Types::kValidation);
   // test the tree quality.. in terms of Misclassification
   Double_t SumCorrect=0,SumWrong=0;
   for (Long64_t ievt=0; ievt<Data()->GetNEvents(); ievt++)
      {
         const Event * ev = Data()->GetEvent(ievt);
         if ((dt->CheckEvent(ev) > dt->GetNodePurityLimit() ) == DataInfo().IsSignal(ev)) SumCorrect+=ev->GetWeight();
         else SumWrong+=ev->GetWeight();
      }
   Data()->SetCurrentType(Types::kTraining);
   return SumCorrect / (SumCorrect + SumWrong);
}

////////////////////////////////////////////////////////////////////////////////

void TMVA::MethodDT::AddWeightsXMLTo( void* parent ) const
{
   fTree->AddXMLTo(parent);
   //Log() << kFATAL << "Please implement writing of weights as XML" << Endl;
}

////////////////////////////////////////////////////////////////////////////////

void TMVA::MethodDT::ReadWeightsFromXML( void* wghtnode)
{
   if(fTree)
      delete fTree;
   fTree = new DecisionTree();
   fTree->ReadXML(wghtnode,GetTrainingTMVAVersionCode());
}

////////////////////////////////////////////////////////////////////////////////

void TMVA::MethodDT::ReadWeightsFromStream( std::istream& istr )
{
   delete fTree;
   fTree = new DecisionTree();
   fTree->Read(istr);
}

////////////////////////////////////////////////////////////////////////////////
/// returns MVA value

Double_t TMVA::MethodDT::GetMvaValue( Double_t* err, Double_t* errUpper )
{
   // cannot determine error
   NoErrorCalc(err, errUpper);

   return fTree->CheckEvent(GetEvent(),fUseYesNoLeaf);
}

////////////////////////////////////////////////////////////////////////////////

void TMVA::MethodDT::GetHelpMessage() const
{
}

////////////////////////////////////////////////////////////////////////////////

const TMVA::Ranking* TMVA::MethodDT::CreateRanking()
{
   return 0;
}