MethodBDT.cxx
1 // Author: Andreas Hoecker, Joerg Stelzer, Helge Voss, Kai Voss, Eckhard v. Toerne, Jan Therhaag
2 
3 /**********************************************************************************
4  * Project: TMVA - a Root-integrated toolkit for multivariate data analysis *
5  * Package: TMVA *
6  * Class : MethodBDT (BDT = Boosted Decision Trees) *
7  * Web : http://tmva.sourceforge.net *
8  * *
9  * Description: *
10  * Analysis of Boosted Decision Trees *
11  * *
12  * Authors (alphabetical): *
13  * Andreas Hoecker <Andreas.Hocker@cern.ch> - CERN, Switzerland *
14  * Helge Voss <Helge.Voss@cern.ch> - MPI-K Heidelberg, Germany *
15  * Kai Voss <Kai.Voss@cern.ch> - U. of Victoria, Canada *
16  * Doug Schouten <dschoute@sfu.ca> - Simon Fraser U., Canada *
17  * Jan Therhaag <jan.therhaag@cern.ch> - U. of Bonn, Germany *
18  * Eckhard v. Toerne <evt@uni-bonn.de> - U of Bonn, Germany *
19  * *
20  * Copyright (c) 2005-2011: *
21  * CERN, Switzerland *
22  * U. of Victoria, Canada *
23  * MPI-K Heidelberg, Germany *
24  * U. of Bonn, Germany *
25  * *
26  * Redistribution and use in source and binary forms, with or without *
27  * modification, are permitted according to the terms listed in LICENSE *
28  * (http://tmva.sourceforge.net/LICENSE) *
29  **********************************************************************************/
30 
31 /*! \class TMVA::MethodBDT
32 \ingroup TMVA
33 
34 Analysis of Boosted Decision Trees
35 
36 Boosted decision trees have been successfully used in High Energy
37 Physics analysis, for example by the MiniBooNE experiment
38 (Yang-Roe-Zhu, physics/0508045). In boosted decision trees, the
39 selection is based on a majority vote over the results of several decision
40 trees, which are all derived from the same training sample by
41 supplying different event weights during the training.
42 
43 ### Decision trees:
44 
45 Successive decision nodes are used to categorize the
46 events of the sample as either signal or background. Each node
47 uses only a single discriminating variable to decide if the event is
48 signal-like ("goes right") or background-like ("goes left"). This
49 forms a tree-like structure with "baskets" at the end (leaf nodes),
50 and an event is classified as either signal or background according to
51 whether the basket where it ends up was classified as signal or
52 background during the training. Training a decision tree is the
53 process of defining the "cut criteria" for each node. The training
54 starts with the root node. Here one takes the full training event
55 sample and selects the variable and corresponding cut value that give
56 the best separation between signal and background at this stage. Using
57 this cut criterion, the sample is then divided into two subsamples, a
58 signal-like (right) and a background-like (left) sample. Two new nodes
59 are then created for the two sub-samples, and they are
60 constructed using the same mechanism as described for the root
61 node. The division is stopped once a node has reached either a
62 minimum number of events or a minimum or maximum signal purity. These
63 leaf nodes are then called "signal" or "background" if they contain
64 more signal or background events, respectively, from the training sample.
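
As a schematic illustration of this splitting criterion (not a verbatim transcription of
the implementation): at each node the variable and cut value are chosen to maximise the
decrease of the separation index I, weighted by the sum of event weights W in each node,
\f[ \Delta = W_{parent} I_{parent} - W_{left} I_{left} - W_{right} I_{right} , \f]
where, for the default GiniIndex criterion, \f$ I = p(1-p) \f$ (up to normalisation),
with p the signal purity of the node.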
65 
66 ### Boosting:
67 
68 The idea behind adaptive boosting (AdaBoost) is that signal events
69 from the training sample that end up in a background node
70 (and vice versa) are given a larger weight than events that are in
71 the correct leaf node. This results in a re-weighted training event
72 sample, with which a new decision tree can then be developed.
73 The boosting can be applied several times (typically 100-500 times)
74 and one ends up with a set of decision trees (a forest).
75 Gradient boosting works more like a function expansion approach, where
76 each tree corresponds to a summand. The parameters of each summand (tree)
77 are determined by minimizing an error function (binomial log-
78 likelihood for classification and Huber loss for regression).
79 A greedy algorithm is used, which means that only one tree is modified
80 at a time, while the other trees stay fixed.
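
As a schematic illustration of the AdaBoost re-weighting (details of the implementation
may differ): with err the weighted misclassification fraction of the previous tree, the
boost weight is
\f[ \alpha = \left( \frac{1-err}{err} \right)^{\beta} , \f]
the weights of misclassified events are multiplied by \f$\alpha\f$, and the sample is
then renormalised; \f$\beta\f$ corresponds to the AdaBoostBeta option.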
81 
82 ### Bagging:
83 
84 In this particular variant of the Boosted Decision Trees the boosting
85 is not done on the basis of previous training results, but by a simple
86 stochastic re-sampling of the initial training event sample.
87 
88 ### Random Trees:
89 
90 Similar to the "Random Forests" of Leo Breiman and Adele Cutler, this
91 option combines the bagging algorithm with a determination of the best
92 node split that is based on a random subset of the variables only,
93 chosen individually for each split.
94 
95 ### Analysis:
96 
97 Applying an individual decision tree to a test event results in a
98 classification of the event as either signal or background. For the
99 boosted decision tree selection, an event is successively subjected to
100 the whole set of decision trees, and depending on how often it is
101 classified as signal, a "likelihood" estimator is constructed for the
102 event being signal or background. The value of this estimator is
103 then used to select the events from an event sample, and
104 the cut value on this estimator defines the efficiency and purity of
105 the selection.
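
Schematically, the combined BDT response for an event x is a weighted sum over the trees,
\f[ y_{BDT}(x) = \frac{\sum_i w_i \, h_i(x)}{\sum_i w_i} , \f]
where \f$ h_i(x) \f$ is the answer of tree i (its leaf type, or its leaf purity for
Real-AdaBoost) and \f$ w_i \f$ its boost weight; the exact normalisation and mapping of
the sum depend on the chosen boost type.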
106 
107 */
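
// Illustrative usage (not part of this file): a typical way to book this method through
// the TMVA::Factory. The variables `outputFile` and `dataloader` as well as the option
// values below are assumptions for the sake of the example, not defaults of this class.
//
//    TMVA::Factory factory("TMVAClassification", outputFile,
//                          "!V:!Silent:AnalysisType=Classification");
//    factory.BookMethod(dataloader, TMVA::Types::kBDT, "BDT",
//                       "NTrees=800:MaxDepth=3:MinNodeSize=5%:BoostType=AdaBoost:"
//                       "AdaBoostBeta=0.5:SeparationType=GiniIndex:nCuts=20");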
108 
109 
110 #include "TMVA/MethodBDT.h"
111 
112 #include "TMVA/BDTEventWrapper.h"
113 #include "TMVA/BinarySearchTree.h"
114 #include "TMVA/ClassifierFactory.h"
115 #include "TMVA/Configurable.h"
116 #include "TMVA/CrossEntropy.h"
117 #include "TMVA/DecisionTree.h"
118 #include "TMVA/DataSet.h"
119 #include "TMVA/GiniIndex.h"
121 #include "TMVA/Interval.h"
122 #include "TMVA/IMethod.h"
123 #include "TMVA/LogInterval.h"
124 #include "TMVA/MethodBase.h"
126 #include "TMVA/MsgLogger.h"
128 #include "TMVA/PDF.h"
129 #include "TMVA/Ranking.h"
130 #include "TMVA/Results.h"
131 #include "TMVA/ResultsMulticlass.h"
132 #include "TMVA/SdivSqrtSplusB.h"
133 #include "TMVA/SeparationBase.h"
134 #include "TMVA/Timer.h"
135 #include "TMVA/Tools.h"
136 #include "TMVA/Types.h"
137 
138 #include "Riostream.h"
139 #include "TDirectory.h"
140 #include "TRandom3.h"
141 #include "TMath.h"
142 #include "TMatrixTSym.h"
143 #include "TObjString.h"
144 #include "TGraph.h"
145 
146 #include <algorithm>
147 #include <fstream>
148 #include <math.h>
149 #include <unordered_map>
150 
151 
152 using std::vector;
153 using std::make_pair;
154 
156 
158 
160 
161 ////////////////////////////////////////////////////////////////////////////////
162 /// The standard constructor for the "boosted decision trees".
163 
164 TMVA::MethodBDT::MethodBDT( const TString& jobName,
165  const TString& methodTitle,
166  DataSetInfo& theData,
167  const TString& theOption ) :
168  TMVA::MethodBase( jobName, Types::kBDT, methodTitle, theData, theOption)
169  , fTrainSample(0)
170  , fNTrees(0)
171  , fSigToBkgFraction(0)
172  , fAdaBoostBeta(0)
173 // , fTransitionPoint(0)
174  , fShrinkage(0)
175  , fBaggedBoost(kFALSE)
176  , fBaggedGradBoost(kFALSE)
177 // , fSumOfWeights(0)
178  , fMinNodeEvents(0)
179  , fMinNodeSize(5)
180  , fMinNodeSizeS("5%")
181  , fNCuts(0)
182  , fUseFisherCuts(0) // don't use this initialisation, only here to make Coverity happy. Is set in DeclareOptions()
183  , fMinLinCorrForFisher(.8) // don't use this initialisation, only here to make Coverity happy. Is set in DeclareOptions()
184  , fUseExclusiveVars(0) // don't use this initialisation, only here to make Coverity happy. Is set in DeclareOptions()
185  , fUseYesNoLeaf(kFALSE)
186  , fNodePurityLimit(0)
187  , fNNodesMax(0)
188  , fMaxDepth(0)
189  , fPruneMethod(DecisionTree::kNoPruning)
190  , fPruneStrength(0)
191  , fFValidationEvents(0)
192  , fAutomatic(kFALSE)
193  , fRandomisedTrees(kFALSE)
194  , fUseNvars(0)
195  , fUsePoissonNvars(0) // don't use this initialisation, only here to make Coverity happy. Is set in Init()
196  , fUseNTrainEvents(0)
197  , fBaggedSampleFraction(0)
198  , fNoNegWeightsInTraining(kFALSE)
199  , fInverseBoostNegWeights(kFALSE)
200  , fPairNegWeightsGlobal(kFALSE)
201  , fTrainWithNegWeights(kFALSE)
202  , fDoBoostMonitor(kFALSE)
203  , fITree(0)
204  , fBoostWeight(0)
205  , fErrorFraction(0)
206  , fCss(0)
207  , fCts_sb(0)
208  , fCtb_ss(0)
209  , fCbb(0)
210  , fDoPreselection(kFALSE)
211  , fSkipNormalization(kFALSE)
212  , fHistoricBool(kFALSE)
213 {
215  fSepType = NULL;
216  fRegressionLossFunctionBDTG = nullptr;
217 }
218 
219 ////////////////////////////////////////////////////////////////////////////////
220 
221 TMVA::MethodBDT::MethodBDT( DataSetInfo& theData,
222  const TString& theWeightFile)
223  : TMVA::MethodBase( Types::kBDT, theData, theWeightFile)
224  , fTrainSample(0)
225  , fNTrees(0)
226  , fSigToBkgFraction(0)
227  , fAdaBoostBeta(0)
228 // , fTransitionPoint(0)
229  , fShrinkage(0)
232 // , fSumOfWeights(0)
233  , fMinNodeEvents(0)
234  , fMinNodeSize(5)
235  , fMinNodeSizeS("5%")
236  , fNCuts(0)
237  , fUseFisherCuts(0) // don't use this initialisation, only here to make Coverity happy. Is set in DeclareOptions()
238  , fMinLinCorrForFisher(.8) // don't use this initialisation, only here to make Coverity happy. Is set in DeclareOptions()
239  , fUseExclusiveVars(0) // don't use this initialisation, only here to make Coverity happy. Is set in DeclareOptions()
241  , fNodePurityLimit(0)
242  , fNNodesMax(0)
243  , fMaxDepth(0)
244  , fPruneMethod(DecisionTree::kNoPruning)
245  , fPruneStrength(0)
246  , fFValidationEvents(0)
247  , fAutomatic(kFALSE)
249  , fUseNvars(0)
250  , fUsePoissonNvars(0) // don't use this initialisation, only here to make Coverity happy. Is set in Init()
251  , fUseNTrainEvents(0)
258  , fITree(0)
259  , fBoostWeight(0)
260  , fErrorFraction(0)
261  , fCss(0)
262  , fCts_sb(0)
263  , fCtb_ss(0)
264  , fCbb(0)
268 {
270  fSepType = NULL;
271  fRegressionLossFunctionBDTG = nullptr;
272  // constructor for calculating BDT-MVA using previously generated decision trees
273  // the results of the previous training (the decision trees) are read in via the
274  // weight file. Make sure the variables correspond to the ones used in
275  // creating the "weight"-file
276 }
277 
278 ////////////////////////////////////////////////////////////////////////////////
279 /// BDT can handle classification with multiple classes and regression with one regression-target.
280 
281 Bool_t TMVA::MethodBDT::HasAnalysisType( Types::EAnalysisType type, UInt_t numberClasses, UInt_t numberTargets )
282 {
283  if (type == Types::kClassification && numberClasses == 2) return kTRUE;
284  if (type == Types::kMulticlass ) return kTRUE;
285  if( type == Types::kRegression && numberTargets == 1 ) return kTRUE;
286  return kFALSE;
287 }
288 
289 ////////////////////////////////////////////////////////////////////////////////
290 /// Define the options (their key words) that can be set in the option string.
291 ///
292 /// Known options:
293 ///
294 /// - nTrees number of trees in the forest to be created
295 /// - BoostType the boosting type for the trees in the forest (AdaBoost etc.).
296 /// Known:
297 /// - AdaBoost
298 /// - AdaBoostR2 (AdaBoost for regression)
299 /// - Bagging
300 /// - GradBoost
301 /// - AdaBoostBeta the boosting parameter, beta, for AdaBoost
302 /// - UseRandomisedTrees choose at each node splitting a random set of variables
303 /// - UseNvars use UseNvars variables in randomised trees
304 /// - UsePoissonNvars use UseNvars not as a fixed number but as the mean of a Poisson distribution
305 /// - SeparationType the separation criterion applied in the node splitting.
306 /// Known:
307 /// - GiniIndex
308 /// - MisClassificationError
309 /// - CrossEntropy
310 /// - SDivSqrtSPlusB
311 /// - MinNodeSize: minimum percentage of training events in a leaf node (leaf criteria, stop splitting)
312 /// - nCuts: the number of steps in the optimisation of the cut for a node (if < 0, then
313 /// step size is determined by the events)
314 /// - UseFisherCuts: use multivariate splits using the Fisher criterion
315 /// - UseYesNoLeaf decide if the classification is done simply by the node type, or the S/B
316 /// (from the training) in the leaf node
317 /// - NodePurityLimit the minimum purity to classify a node as a signal node (used in pruning and boosting to determine
318 /// misclassification error rate)
319 /// - PruneMethod The Pruning method.
320 /// Known:
321 /// - NoPruning // switch off pruning completely
322 /// - ExpectedError
323 /// - CostComplexity
324 /// - PruneStrength a parameter to adjust the amount of pruning. Should be large enough such that overtraining is avoided.
325 /// - PruningValFraction fraction of events to use for optimizing pruning (only if PruneStrength < 0, i.e. automatic pruning)
326 /// - NegWeightTreatment
327 /// - IgnoreNegWeightsInTraining Ignore negative weight events in the training.
328 /// - DecreaseBoostWeight Boost ev. with neg. weight with 1/boostweight instead of boostweight
329 /// - PairNegWeightsGlobal Pair ev. with neg. and pos. weights in training sample and "annihilate" them
330 /// - MaxDepth maximum depth of the decision tree allowed before further splitting is stopped
331 /// - SkipNormalization Skip normalization at initialization, to keep expectation value of BDT output
332 /// according to the fraction of events
333 
334 void TMVA::MethodBDT::DeclareOptions()
335 {
336  DeclareOptionRef(fNTrees, "NTrees", "Number of trees in the forest");
337  if (DoRegression()) {
338  DeclareOptionRef(fMaxDepth=50,"MaxDepth","Max depth of the decision tree allowed");
339  }else{
340  DeclareOptionRef(fMaxDepth=3,"MaxDepth","Max depth of the decision tree allowed");
341  }
342 
343  TString tmp="5%"; if (DoRegression()) tmp="0.2%";
344  DeclareOptionRef(fMinNodeSizeS=tmp, "MinNodeSize", "Minimum percentage of training events required in a leaf node (default: Classification: 5%, Regression: 0.2%)");
345  // MinNodeSize: minimum percentage of training events in a leaf node (leaf criteria, stop splitting)
346  DeclareOptionRef(fNCuts, "nCuts", "Number of grid points in variable range used in finding optimal cut in node splitting");
347 
348  DeclareOptionRef(fBoostType, "BoostType", "Boosting type for the trees in the forest (note: AdaCost is still experimental)");
349 
350  AddPreDefVal(TString("AdaBoost"));
351  AddPreDefVal(TString("RealAdaBoost"));
352  AddPreDefVal(TString("AdaCost"));
353  AddPreDefVal(TString("Bagging"));
354  // AddPreDefVal(TString("RegBoost"));
355  AddPreDefVal(TString("AdaBoostR2"));
356  AddPreDefVal(TString("Grad"));
357  if (DoRegression()) {
358  fBoostType = "AdaBoostR2";
359  }else{
360  fBoostType = "AdaBoost";
361  }
362  DeclareOptionRef(fAdaBoostR2Loss="Quadratic", "AdaBoostR2Loss", "Type of Loss function in AdaBoostR2");
363  AddPreDefVal(TString("Linear"));
364  AddPreDefVal(TString("Quadratic"));
365  AddPreDefVal(TString("Exponential"));
366 
367  DeclareOptionRef(fBaggedBoost=kFALSE, "UseBaggedBoost","Use only a random subsample of all events for growing the trees in each boost iteration.");
368  DeclareOptionRef(fShrinkage=1.0, "Shrinkage", "Learning rate for GradBoost algorithm");
369  DeclareOptionRef(fAdaBoostBeta=.5, "AdaBoostBeta", "Learning rate for AdaBoost algorithm");
370  DeclareOptionRef(fRandomisedTrees,"UseRandomisedTrees","Determine at each node splitting the cut variable only as the best out of a random subset of variables (like in RandomForests)");
371  DeclareOptionRef(fUseNvars,"UseNvars","Size of the subset of variables used with RandomisedTree option");
372  DeclareOptionRef(fUsePoissonNvars,"UsePoissonNvars", "Interpret \"UseNvars\" not as fixed number but as mean of a Poisson distribution in each split with RandomisedTree option");
373  DeclareOptionRef(fBaggedSampleFraction=.6,"BaggedSampleFraction","Relative size of bagged event sample to original size of the data sample (used whenever bagging is used (i.e. UseBaggedBoost, Bagging,)" );
374 
375  DeclareOptionRef(fUseYesNoLeaf=kTRUE, "UseYesNoLeaf",
376  "Use Sig or Bkg categories, or the purity=S/(S+B) as classification of the leaf node -> Real-AdaBoost");
377  if (DoRegression()) {
379  }
380 
381  DeclareOptionRef(fNegWeightTreatment="InverseBoostNegWeights","NegWeightTreatment","How to treat events with negative weights in the BDT training (particular the boosting) : IgnoreInTraining; Boost With inverse boostweight; Pair events with negative and positive weights in training sample and *annihilate* them (experimental!)");
382  AddPreDefVal(TString("InverseBoostNegWeights"));
383  AddPreDefVal(TString("IgnoreNegWeightsInTraining"));
384  AddPreDefVal(TString("NoNegWeightsInTraining")); // well, let's be nice to users and keep at least this old name anyway ..
385  AddPreDefVal(TString("PairNegWeightsGlobal"));
386  AddPreDefVal(TString("Pray"));
387 
388 
389 
390  DeclareOptionRef(fCss=1., "Css", "AdaCost: cost of true signal selected signal");
391  DeclareOptionRef(fCts_sb=1.,"Cts_sb","AdaCost: cost of true signal selected bkg");
392  DeclareOptionRef(fCtb_ss=1.,"Ctb_ss","AdaCost: cost of true bkg selected signal");
393  DeclareOptionRef(fCbb=1., "Cbb", "AdaCost: cost of true bkg selected bkg ");
394 
395  DeclareOptionRef(fNodePurityLimit=0.5, "NodePurityLimit", "In boosting/pruning, nodes with purity > NodePurityLimit are signal; background otherwise.");
396 
397 
398  DeclareOptionRef(fSepTypeS, "SeparationType", "Separation criterion for node splitting");
399  AddPreDefVal(TString("CrossEntropy"));
400  AddPreDefVal(TString("GiniIndex"));
401  AddPreDefVal(TString("GiniIndexWithLaplace"));
402  AddPreDefVal(TString("MisClassificationError"));
403  AddPreDefVal(TString("SDivSqrtSPlusB"));
404  AddPreDefVal(TString("RegressionVariance"));
405  if (DoRegression()) {
406  fSepTypeS = "RegressionVariance";
407  }else{
408  fSepTypeS = "GiniIndex";
409  }
410 
411  DeclareOptionRef(fRegressionLossFunctionBDTGS = "Huber", "RegressionLossFunctionBDTG", "Loss function for BDTG regression.");
412  AddPreDefVal(TString("Huber"));
413  AddPreDefVal(TString("AbsoluteDeviation"));
414  AddPreDefVal(TString("LeastSquares"));
415 
416  DeclareOptionRef(fHuberQuantile = 0.7, "HuberQuantile", "In the Huber loss function this is the quantile that separates the core from the tails in the residuals distribution.");
417 
418  DeclareOptionRef(fDoBoostMonitor=kFALSE,"DoBoostMonitor","Create control plot with ROC integral vs tree number");
419 
420  DeclareOptionRef(fUseFisherCuts=kFALSE, "UseFisherCuts", "Use multivariate splits using the Fisher criterion");
421  DeclareOptionRef(fMinLinCorrForFisher=.8,"MinLinCorrForFisher", "The minimum linear correlation between two variables demanded for use in Fisher criterion in node splitting");
422  DeclareOptionRef(fUseExclusiveVars=kFALSE,"UseExclusiveVars","Variables already used in fisher criterion are not anymore analysed individually for node splitting");
423 
424 
425  DeclareOptionRef(fDoPreselection=kFALSE,"DoPreselection","Apply automatic pre-selection for 100% efficient signal (bkg) cuts prior to training");
426 
427 
428  DeclareOptionRef(fSigToBkgFraction=1,"SigToBkgFraction","Sig to Bkg ratio used in Training (similar to NodePurityLimit, which cannot be used in real AdaBoost)");
429 
430  DeclareOptionRef(fPruneMethodS, "PruneMethod", "Note: for BDTs use small trees (e.g. MaxDepth=3) and NoPruning: Pruning: Method used for pruning (removal) of statistically insignificant branches");
431  AddPreDefVal(TString("NoPruning"));
432  AddPreDefVal(TString("ExpectedError"));
433  AddPreDefVal(TString("CostComplexity"));
434 
435  DeclareOptionRef(fPruneStrength, "PruneStrength", "Pruning strength");
436 
437  DeclareOptionRef(fFValidationEvents=0.5, "PruningValFraction", "Fraction of events to use for optimizing automatic pruning.");
438 
439  DeclareOptionRef(fSkipNormalization=kFALSE, "SkipNormalization", "Skip normalization at initialization, to keep expectation value of BDT output according to the fraction of events");
440 
441  // deprecated options, still kept for the moment:
442  DeclareOptionRef(fMinNodeEvents=0, "nEventsMin", "deprecated: Use MinNodeSize (in % of training events) instead");
443 
444  DeclareOptionRef(fBaggedGradBoost=kFALSE, "UseBaggedGrad","deprecated: Use *UseBaggedBoost* instead: Use only a random subsample of all events for growing the trees in each iteration.");
445  DeclareOptionRef(fBaggedSampleFraction, "GradBaggingFraction","deprecated: Use *BaggedSampleFraction* instead: Defines the fraction of events to be used in each iteration, e.g. when UseBaggedGrad=kTRUE. ");
446  DeclareOptionRef(fUseNTrainEvents,"UseNTrainEvents","deprecated: Use *BaggedSampleFraction* instead: Number of randomly picked training events used in randomised (and bagged) trees");
447  DeclareOptionRef(fNNodesMax,"NNodesMax","deprecated: Use MaxDepth instead to limit the tree size" );
448 
449 
450 }
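
// Example option strings combining the options declared above (illustrative values,
// not defaults of this class):
//
//    AdaBoost:  "NTrees=850:MaxDepth=3:MinNodeSize=2.5%:BoostType=AdaBoost:AdaBoostBeta=0.5:"
//               "UseBaggedBoost:BaggedSampleFraction=0.5:SeparationType=GiniIndex:nCuts=20"
//
//    GradBoost: "NTrees=1000:MaxDepth=2:MinNodeSize=2.5%:BoostType=Grad:Shrinkage=0.10:"
//               "UseBaggedBoost:BaggedSampleFraction=0.5:nCuts=20"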
451 
452 ////////////////////////////////////////////////////////////////////////////////
453 /// Options that are used ONLY for the READER to ensure backward compatibility.
454 
457 
458 
459  DeclareOptionRef(fHistoricBool=kTRUE, "UseWeightedTrees",
460  "Use weighted trees or simple average in classification from the forest");
461  DeclareOptionRef(fHistoricBool=kFALSE, "PruneBeforeBoost", "Flag to prune the tree before applying boosting algorithm");
462  DeclareOptionRef(fHistoricBool=kFALSE,"RenormByClass","Individually re-normalize each event class to the original size after boosting");
463 
464  AddPreDefVal(TString("NegWeightTreatment"),TString("IgnoreNegWeights"));
465 
466 }
467 
468 ////////////////////////////////////////////////////////////////////////////////
469 /// The option string is decoded, for available options see "DeclareOptions".
470 
471 void TMVA::MethodBDT::ProcessOptions()
472 {
473  fSepTypeS.ToLower();
474  if (fSepTypeS == "misclassificationerror") fSepType = new MisClassificationError();
475  else if (fSepTypeS == "giniindex") fSepType = new GiniIndex();
476  else if (fSepTypeS == "giniindexwithlaplace") fSepType = new GiniIndexWithLaplace();
477  else if (fSepTypeS == "crossentropy") fSepType = new CrossEntropy();
478  else if (fSepTypeS == "sdivsqrtsplusb") fSepType = new SdivSqrtSplusB();
479  else if (fSepTypeS == "regressionvariance") fSepType = NULL;
480  else {
481  Log() << kINFO << GetOptions() << Endl;
482  Log() << kFATAL << "<ProcessOptions> unknown Separation Index option " << fSepTypeS << " called" << Endl;
483  }
484 
485  if(!(fHuberQuantile >= 0.0 && fHuberQuantile <= 1.0)){
486  Log() << kINFO << GetOptions() << Endl;
487  Log() << kFATAL << "<ProcessOptions> Huber Quantile must be in range [0,1]. Value given, " << fHuberQuantile << ", does not match this criterion" << Endl;
488  }
489 
494  else {
495  Log() << kINFO << GetOptions() << Endl;
496  Log() << kFATAL << "<ProcessOptions> unknown Regression Loss Function BDT option " << fRegressionLossFunctionBDTGS << " called" << Endl;
497  }
498 
501  else if (fPruneMethodS == "costcomplexity") fPruneMethod = DecisionTree::kCostComplexityPruning;
502  else if (fPruneMethodS == "nopruning") fPruneMethod = DecisionTree::kNoPruning;
503  else {
504  Log() << kINFO << GetOptions() << Endl;
505  Log() << kFATAL << "<ProcessOptions> unknown PruneMethod " << fPruneMethodS << " option called" << Endl;
506  }
508  else fAutomatic = kFALSE;
510  Log() << kFATAL
511  << "Sorry automatic pruning strength determination is not implemented yet for ExpectedErrorPruning" << Endl;
512  }
513 
514 
515  if (fMinNodeEvents > 0){
517  Log() << kWARNING << "You have explicitly set ** nEventsMin = " << fMinNodeEvents<<" ** the min absolute number \n"
518  << "of events in a leaf node. This is DEPRECATED, please use the option \n"
519  << "*MinNodeSize* giving the relative number as percentage of training \n"
520  << "events instead. \n"
521  << "nEventsMin="<<fMinNodeEvents<< "--> MinNodeSize="<<fMinNodeSize<<"%"
522  << Endl;
523  Log() << kWARNING << "Note also that explicitly setting *nEventsMin* so far OVERWRITES the option recommended \n"
524  << " *MinNodeSize* = " << fMinNodeSizeS << " option !!" << Endl ;
525  fMinNodeSizeS = Form("%3.2f",fMinNodeSize);
526 
527  }else{
529  }
530 
531 
533 
534  if (fBoostType=="Grad") {
536  if (fNegWeightTreatment=="InverseBoostNegWeights"){
537  Log() << kINFO << "the option *InverseBoostNegWeights* does not exist for BoostType=Grad --> change" << Endl;
538  Log() << kINFO << "to new default for GradBoost *Pray*" << Endl;
539  Log() << kDEBUG << "i.e. simply keep them as is, which should work fine for Grad Boost" << Endl;
540  fNegWeightTreatment="Pray";
542  }
543  } else if (fBoostType=="RealAdaBoost"){
544  fBoostType = "AdaBoost";
546  } else if (fBoostType=="AdaCost"){
548  }
549 
550  if (fFValidationEvents < 0.0) fFValidationEvents = 0.0;
551  if (fAutomatic && fFValidationEvents > 0.5) {
552  Log() << kWARNING << "You have chosen to use more than half of your training sample "
553  << "to optimize the automatic pruning algorithm. This is probably wasteful "
554  << "and your overall results will be degraded. Are you sure you want this?"
555  << Endl;
556  }
557 
558 
559  if (this->Data()->HasNegativeEventWeights()){
560  Log() << kINFO << " You are using a Monte Carlo that has also negative weights. "
561  << "That should in principle be fine as long as on average you end up with "
562  << "something positive. For this you have to make sure that the minimal number "
563  << "of (un-weighted) events demanded for a tree node (currently you use: MinNodeSize="
564  << fMinNodeSizeS << " ("<< fMinNodeSize << "%)"
565  <<", (or the deprecated equivalent nEventsMin) you can set this via the "
566  <<"BDT option string when booking the "
567  << "classifier) is large enough to allow for reasonable averaging!!! "
568  << " If this does not help.. maybe you want to try the option: IgnoreNegWeightsInTraining "
569  << "which ignores events with negative weight in the training. " << Endl
570  << Endl << "Note: You'll get a WARNING message during the training if that should ever happen" << Endl;
571  }
572 
573  if (DoRegression()) {
575  Log() << kWARNING << "Regression Trees do not work with fUseYesNoLeaf=TRUE --> I will set it to FALSE" << Endl;
577  }
578 
579  if (fSepType != NULL){
580  Log() << kWARNING << "Regression Trees do not work with Separation type other than <RegressionVariance> --> I will use it instead" << Endl;
581  fSepType = NULL;
582  }
583  if (fUseFisherCuts){
584  Log() << kWARNING << "Sorry, UseFisherCuts is not available for regression analysis, I will ignore it!" << Endl;
586  }
587  if (fNCuts < 0) {
588  Log() << kWARNING << "Sorry, the option of nCuts<0 using a more elaborate node splitting algorithm " << Endl;
589  Log() << kWARNING << "is not implemented for regression analysis ! " << Endl;
590  Log() << kWARNING << "--> I switch to default nCuts = 20 and use standard node splitting"<<Endl;
591  fNCuts=20;
592  }
593  }
594  if (fRandomisedTrees){
595  Log() << kINFO << " Randomised trees use no pruning" << Endl;
597  // fBoostType = "Bagging";
598  }
599 
600  if (fUseFisherCuts) {
601  Log() << kWARNING << "When using the option UseFisherCuts, the other option nCuts<0 (i.e. using" << Endl;
602  Log() << " a more elaborate node splitting algorithm) is not implemented. " << Endl;
603  //I will switch o " << Endl;
604  //Log() << "--> I switch do default nCuts = 20 and use standard node splitting WITH possible Fisher criteria"<<Endl;
605  fNCuts=20;
606  }
607 
608  if (fNTrees==0){
609  Log() << kERROR << " Zero Decision Trees demanded... that does not work !! "
610  << " I set it to 1 .. just so that the program does not crash"
611  << Endl;
612  fNTrees = 1;
613  }
614 
616  if (fNegWeightTreatment == "ignorenegweightsintraining") fNoNegWeightsInTraining = kTRUE;
617  else if (fNegWeightTreatment == "nonegweightsintraining") fNoNegWeightsInTraining = kTRUE;
618  else if (fNegWeightTreatment == "inverseboostnegweights") fInverseBoostNegWeights = kTRUE;
619  else if (fNegWeightTreatment == "pairnegweightsglobal") fPairNegWeightsGlobal = kTRUE;
620  else if (fNegWeightTreatment == "pray") Log() << kDEBUG << "Yes, good luck with praying " << Endl;
621  else {
622  Log() << kINFO << GetOptions() << Endl;
623  Log() << kFATAL << "<ProcessOptions> unknown option for treating negative event weights during training " << fNegWeightTreatment << " requested" << Endl;
624  }
625 
626  if (fNegWeightTreatment == "pairnegweightsglobal")
627  Log() << kWARNING << " you specified the option NegWeightTreatment=PairNegWeightsGlobal : This option is still considered EXPERIMENTAL !! " << Endl;
628 
629 
630  // dealing with deprecated options !
631  if (fNNodesMax>0) {
632  UInt_t tmp=1; // depth=0 == 1 node
633  fMaxDepth=0;
634  while (tmp < fNNodesMax){
635  tmp+=2*tmp;
636  fMaxDepth++;
637  }
638  Log() << kWARNING << "You have specified a deprecated option *NNodesMax="<<fNNodesMax
639  << "* \n this has been translated to MaxDepth="<<fMaxDepth<<Endl;
640  }
641 
642 
643  if (fUseNTrainEvents>0){
645  Log() << kWARNING << "You have specified a deprecated option *UseNTrainEvents="<<fUseNTrainEvents
646  << "* \n this has been translated to BaggedSampleFraction="<<fBaggedSampleFraction<<"(%)"<<Endl;
647  }
648 
649  if (fBoostType=="Bagging") fBaggedBoost = kTRUE;
650  if (fBaggedGradBoost){
652  Log() << kWARNING << "You have specified a deprecated option *UseBaggedGrad* --> please use *UseBaggedBoost* instead" << Endl;
653  }
654 
655 }
656 
657 ////////////////////////////////////////////////////////////////////////////////
658 
659 void TMVA::MethodBDT::SetMinNodeSize(Double_t sizeInPercent){
660  if (sizeInPercent > 0 && sizeInPercent < 50){
661  fMinNodeSize=sizeInPercent;
662 
663  } else {
664  Log() << kFATAL << "you have demanded a minimal node size of "
665  << sizeInPercent << "% of the training events.. \n"
666  << " that somehow does not make sense "<<Endl;
667  }
668 
669 }
670 
671 ////////////////////////////////////////////////////////////////////////////////
672 
673 void TMVA::MethodBDT::SetMinNodeSize( TString sizeInPercent ){
674  sizeInPercent.ReplaceAll("%","");
675  sizeInPercent.ReplaceAll(" ","");
676  if (sizeInPercent.IsFloat()) SetMinNodeSize(sizeInPercent.Atof());
677  else {
678  Log() << kFATAL << "I had problems reading the option MinNodeEvents, which "
679  << "after removing a possible % sign now reads " << sizeInPercent << Endl;
680  }
681 }
682 
683 ////////////////////////////////////////////////////////////////////////////////
684 /// Common initialisation with defaults for the BDT-Method.
685 
686 void TMVA::MethodBDT::Init( void )
687 {
688  fNTrees = 800;
689  if (fAnalysisType == Types::kClassification || fAnalysisType == Types::kMulticlass ) {
690  fMaxDepth = 3;
691  fBoostType = "AdaBoost";
692  if(DataInfo().GetNClasses()!=0) //workaround for multiclass application
693  fMinNodeSize = 5.;
694  }else {
695  fMaxDepth = 50;
696  fBoostType = "AdaBoostR2";
697  fAdaBoostR2Loss = "Quadratic";
698  if(DataInfo().GetNClasses()!=0) //workaround for multiclass application
699  fMinNodeSize = .2;
700  }
701 
702 
703  fNCuts = 20;
704  fPruneMethodS = "NoPruning";
706  fPruneStrength = 0;
707  fAutomatic = kFALSE;
708  fFValidationEvents = 0.5;
710  // fUseNvars = (GetNvar()>12) ? UInt_t(GetNvar()/8) : TMath::Max(UInt_t(2),UInt_t(GetNvar()/3));
713  fShrinkage = 1.0;
714 // fSumOfWeights = 0.0;
715 
716  // reference cut value to distinguish signal-like from background-like events
718 }
719 
720 
721 ////////////////////////////////////////////////////////////////////////////////
722 /// Reset the method, as if it had just been instantiated (forget all training etc.).
723 
724 void TMVA::MethodBDT::Reset( void )
725 {
726  // I keep the BDT EventSample and its Validation sample (eventually they should all
727  // disappear and just use the DataSet samples ..
728 
729  // remove all the trees
730  for (UInt_t i=0; i<fForest.size(); i++) delete fForest[i];
731  fForest.clear();
732 
733  fBoostWeights.clear();
735  fVariableImportance.clear();
736  fResiduals.clear();
737  fLossFunctionEventInfo.clear();
738  // now done in "InitEventSample" which is called in "Train"
739  // reset all previously stored/accumulated BOOST weights in the event sample
740  //for (UInt_t iev=0; iev<fEventSample.size(); iev++) fEventSample[iev]->SetBoostWeight(1.);
742  Log() << kDEBUG << " successfully(?) reset the method " << Endl;
743 }
744 
745 
746 ////////////////////////////////////////////////////////////////////////////////
747 /// Destructor.
748 ///
749 /// - Note: fEventSample and ValidationSample are already deleted at the end of TRAIN,
750 /// when they are not used anymore
751 
752 TMVA::MethodBDT::~MethodBDT( void )
753 {
754  for (UInt_t i=0; i<fForest.size(); i++) delete fForest[i];
755 }
756 
757 ////////////////////////////////////////////////////////////////////////////////
758 /// Initialize the event sample (i.e. reset the boost-weights... etc).
759 
760 void TMVA::MethodBDT::InitEventSample( void )
761 {
762  if (!HasTrainingTree()) Log() << kFATAL << "<Init> Data().TrainingTree() is zero pointer" << Endl;
763 
764  if (fEventSample.size() > 0) { // do not re-initialise the event sample, just set all boostweights to 1. as if it were untouched
765  // reset all previously stored/accumulated BOOST weights in the event sample
766  for (UInt_t iev=0; iev<fEventSample.size(); iev++) fEventSample[iev]->SetBoostWeight(1.);
767  } else {
769  UInt_t nevents = Data()->GetNTrainingEvents();
770 
771  std::vector<const TMVA::Event*> tmpEventSample;
772  for (Long64_t ievt=0; ievt<nevents; ievt++) {
773  // const Event *event = new Event(*(GetEvent(ievt)));
774  Event* event = new Event( *GetTrainingEvent(ievt) );
775  tmpEventSample.push_back(event);
776  }
777 
778  if (!DoRegression()) DeterminePreselectionCuts(tmpEventSample);
779  else fDoPreselection = kFALSE; // just to make sure...
780 
781  for (UInt_t i=0; i<tmpEventSample.size(); i++) delete tmpEventSample[i];
782 
783 
784  Bool_t firstNegWeight=kTRUE;
785  Bool_t firstZeroWeight=kTRUE;
786  for (Long64_t ievt=0; ievt<nevents; ievt++) {
787  // const Event *event = new Event(*(GetEvent(ievt)));
788  // const Event* event = new Event( *GetTrainingEvent(ievt) );
789  Event* event = new Event( *GetTrainingEvent(ievt) );
790  if (fDoPreselection){
791  if (TMath::Abs(ApplyPreselectionCuts(event)) > 0.05) {
792  delete event;
793  continue;
794  }
795  }
796 
797  if (event->GetWeight() < 0 && (IgnoreEventsWithNegWeightsInTraining() || fNoNegWeightsInTraining)){
798  if (firstNegWeight) {
799  Log() << kWARNING << " Note, you have events with negative event weight in the sample, but you've chosen to ignore them" << Endl;
800  firstNegWeight=kFALSE;
801  }
802  delete event;
803  }else if (event->GetWeight()==0){
804  if (firstZeroWeight) {
805  firstZeroWeight = kFALSE;
806  Log() << "Events with weight == 0 are going to be simply ignored " << Endl;
807  }
808  delete event;
809  }else{
810  if (event->GetWeight() < 0) {
812  if (firstNegWeight){
813  firstNegWeight = kFALSE;
815  Log() << kWARNING << "Events with negative event weights are found and "
816  << " will be removed prior to the actual BDT training by global "
817  << " pairing (and subsequent annihilation) with positive weight events"
818  << Endl;
819  }else{
820  Log() << kWARNING << "Events with negative event weights are USED during "
821  << "the BDT training. This might cause problems with small node sizes "
822  << "or with the boosting. Please remove negative events from training "
823  << "using the option *IgnoreEventsWithNegWeightsInTraining* in case you "
824  << "observe problems with the boosting"
825  << Endl;
826  }
827  }
828  }
829  // if fAutomatic == true you need a validation sample to optimize pruning
830  if (fAutomatic) {
831  Double_t modulo = 1.0/(fFValidationEvents);
832  Int_t imodulo = static_cast<Int_t>( fmod(modulo,1.0) > 0.5 ? ceil(modulo) : floor(modulo) );
833  if (ievt % imodulo == 0) fValidationSample.push_back( event );
834  else fEventSample.push_back( event );
835  }
836  else {
837  fEventSample.push_back(event);
838  }
839  }
840  }
841 
842  if (fAutomatic) {
843  Log() << kINFO << "<InitEventSample> Internally I use " << fEventSample.size()
844  << " for Training and " << fValidationSample.size()
845  << " for Pruning Validation (" << ((Float_t)fValidationSample.size())/((Float_t)fEventSample.size()+fValidationSample.size())*100.0
846  << "% of training used for validation)" << Endl;
847  }
848 
849  // some pre-processing for events with negative weights
851  }
852 
853  if (!DoRegression() && !fSkipNormalization){
854  Log() << kDEBUG << "\t<InitEventSample> For classification trees, "<< Endl;
855  Log() << kDEBUG << " \tthe effective number of backgrounds is scaled to match "<<Endl;
856  Log() << kDEBUG << " \tthe signal. Otherwise the first boosting step would do 'just that'!"<<Endl;
857  // it does not make sense in decision trees to start with unequal number of signal/background
858  // events (weights) .. hence normalize them now (happens otherwise in first 'boosting step'
859  // anyway..
860  // Also make sure, that the sum_of_weights == sample.size() .. as this is assumed in
861  // the DecisionTree to derive a sensible number for "fMinSize" (min.#events in node)
862  // that currently is an OR between "weighted" and "unweighted number"
863  // I want:
864  // nS + nB = n
865  // a*SW + b*BW = n
866  // (a*SW)/(b*BW) = fSigToBkgFraction
867  //
868  // ==> b = n/((1+f)BW) and a = (nf/(1+f))/SW
869 
870  Double_t nevents = fEventSample.size();
871  Double_t sumSigW=0, sumBkgW=0;
872  Int_t sumSig=0, sumBkg=0;
873  for (UInt_t ievt=0; ievt<fEventSample.size(); ievt++) {
874  if ((DataInfo().IsSignal(fEventSample[ievt])) ) {
875  sumSigW += fEventSample[ievt]->GetWeight();
876  sumSig++;
877  } else {
878  sumBkgW += fEventSample[ievt]->GetWeight();
879  sumBkg++;
880  }
881  }
882  if (sumSigW && sumBkgW){
883  Double_t normSig = nevents/((1+fSigToBkgFraction)*sumSigW)*fSigToBkgFraction;
884  Double_t normBkg = nevents/((1+fSigToBkgFraction)*sumBkgW); ;
885  Log() << kDEBUG << "\tre-normalise events such that Sig and Bkg have respective sum of weights = "
886  << fSigToBkgFraction << Endl;
887  Log() << kDEBUG << " \tsig->sig*"<<normSig << "ev. bkg->bkg*"<<normBkg << "ev." <<Endl;
888  Log() << kHEADER << "#events: (reweighted) sig: "<< sumSigW*normSig << " bkg: " << sumBkgW*normBkg << Endl;
889  Log() << kINFO << "#events: (unweighted) sig: "<< sumSig << " bkg: " << sumBkg << Endl;
890  for (Long64_t ievt=0; ievt<nevents; ievt++) {
891  if ((DataInfo().IsSignal(fEventSample[ievt])) ) fEventSample[ievt]->SetBoostWeight(normSig);
892  else fEventSample[ievt]->SetBoostWeight(normBkg);
893  }
894  }else{
895  Log() << kINFO << "--> could not determine scaling factors as either there are " << Endl;
896  Log() << kINFO << " no signal events (sumSigW="<<sumSigW<<") or no bkg ev. (sumBkgW="<<sumBkgW<<")"<<Endl;
897  }
898 
899  }
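
 // Worked example of the re-normalisation above (illustrative numbers only): with
 // fSigToBkgFraction=1, nevents=1000, sumSigW=200 and sumBkgW=600 one obtains
 // normSig = 1000/(2*200)*1 = 2.5 and normBkg = 1000/(2*600) ~ 0.83, so the re-weighted
 // sums become 200*2.5 = 500 and 600*0.83 ~ 500, i.e. equal signal and background
 // weight adding up to the total number of events.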
900 
902  if (fBaggedBoost){
905  }
906 
907  //just for debug purposes..
908  /*
909  sumSigW=0;
910  sumBkgW=0;
911  for (UInt_t ievt=0; ievt<fEventSample.size(); ievt++) {
912  if ((DataInfo().IsSignal(fEventSample[ievt])) ) sumSigW += fEventSample[ievt]->GetWeight();
913  else sumBkgW += fEventSample[ievt]->GetWeight();
914  }
915  Log() << kWARNING << "sigSumW="<<sumSigW<<"bkgSumW="<<sumBkgW<< Endl;
916  */
917 }
918 
919 ////////////////////////////////////////////////////////////////////////////////
920 /// O.k. you know there are events with negative event weights. This routine will remove
921 /// them by pairing them with the closest event(s) of the same event class with positive
922 /// weights.
923 /// A first attempt is "brute force": I don't try to be clever using search trees etc.,
924 /// just quick and dirty to see if the result is any good.
925 
926 void TMVA::MethodBDT::PreProcessNegativeEventWeights(){
927  Double_t totalNegWeights = 0;
928  Double_t totalPosWeights = 0;
929  Double_t totalWeights = 0;
930  std::vector<const Event*> negEvents;
931  for (UInt_t iev = 0; iev < fEventSample.size(); iev++){
932  if (fEventSample[iev]->GetWeight() < 0) {
933  totalNegWeights += fEventSample[iev]->GetWeight();
934  negEvents.push_back(fEventSample[iev]);
935  } else {
936  totalPosWeights += fEventSample[iev]->GetWeight();
937  }
938  totalWeights += fEventSample[iev]->GetWeight();
939  }
940  if (totalNegWeights == 0 ) {
941  Log() << kINFO << "no negative event weights found .. no preprocessing necessary" << Endl;
942  return;
943  } else {
944  Log() << kINFO << "found a total of " << totalNegWeights << " of negative event weights which I am going to try to pair with positive events to annihilate them" << Endl;
945  Log() << kINFO << "found a total of " << totalPosWeights << " of events with positive weights" << Endl;
946  Log() << kINFO << "--> total sum of weights = " << totalWeights << " = " << totalNegWeights+totalPosWeights << Endl;
947  }
948 
949  std::vector<TMatrixDSym*>* cov = gTools().CalcCovarianceMatrices( fEventSample, 2);
950 
951  TMatrixDSym *invCov;
952 
953  for (Int_t i=0; i<2; i++){
954  invCov = ((*cov)[i]);
955  if ( TMath::Abs(invCov->Determinant()) < 10E-24 ) {
956  std::cout << "<MethodBDT::PreProcessNeg...> matrix is almost singular with determinant="
957  << TMath::Abs(invCov->Determinant())
958  << " did you use the variables that are linear combinations or highly correlated?"
959  << std::endl;
960  }
961  if ( TMath::Abs(invCov->Determinant()) < 10E-120 ) {
962  std::cout << "<MethodBDT::PreProcessNeg...> matrix is singular with determinant="
963  << TMath::Abs(invCov->Determinant())
964  << " did you use the variables that are linear combinations?"
965  << std::endl;
966  }
967 
968  invCov->Invert();
969  }
970 
971 
972 
973  Log() << kINFO << "Found a total of " << totalNegWeights << " in negative weights out of " << fEventSample.size() << " training events " << Endl;
974  Timer timer(negEvents.size(),"Negative Event paired");
975  for (UInt_t nev = 0; nev < negEvents.size(); nev++){
976  timer.DrawProgressBar( nev );
977  Double_t weight = negEvents[nev]->GetWeight();
978  UInt_t iClassID = negEvents[nev]->GetClass();
979  invCov = ((*cov)[iClassID]);
980  while (weight < 0){
981  // find closest event with positive event weight and "pair" it with the negative event
982  // (add their weight) until there is no negative weight anymore
983  Int_t iMin=-1;
984  Double_t dist, minDist=10E270;
985  for (UInt_t iev = 0; iev < fEventSample.size(); iev++){
986  if (iClassID==fEventSample[iev]->GetClass() && fEventSample[iev]->GetWeight() > 0){
987  dist=0;
988  for (UInt_t ivar=0; ivar < GetNvar(); ivar++){
989  for (UInt_t jvar=0; jvar<GetNvar(); jvar++){
990  dist += (negEvents[nev]->GetValue(ivar)-fEventSample[iev]->GetValue(ivar))*
991  (*invCov)[ivar][jvar]*
992  (negEvents[nev]->GetValue(jvar)-fEventSample[iev]->GetValue(jvar));
993  }
994  }
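 // (the quantity accumulated above is the squared Mahalanobis distance between the
 // negative-weight event and the candidate, computed with the inverted covariance
 // matrix of the corresponding event class)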
995  if (dist < minDist) { iMin=iev; minDist=dist;}
996  }
997  }
998 
999  if (iMin > -1) {
1000  // std::cout << "Happily pairing .. weight before : " << negEvents[nev]->GetWeight() << " and " << fEventSample[iMin]->GetWeight();
1001  Double_t newWeight = (negEvents[nev]->GetWeight() + fEventSample[iMin]->GetWeight());
1002  if (newWeight > 0){
1003  negEvents[nev]->SetBoostWeight( 0 );
1004  fEventSample[iMin]->SetBoostWeight( newWeight/fEventSample[iMin]->GetOriginalWeight() ); // note the weight*boostweight should be "newWeight"
1005  } else {
1006  negEvents[nev]->SetBoostWeight( newWeight/negEvents[nev]->GetOriginalWeight() ); // note the weight*boostweight should be "newWeight"
1007  fEventSample[iMin]->SetBoostWeight( 0 );
1008  }
1009  // std::cout << " and afterwards " << negEvents[nev]->GetWeight() << " and the paired " << fEventSample[iMin]->GetWeight() << " dist="<<minDist<< std::endl;
1010  } else Log() << kFATAL << "preprocessing didn't find event to pair with the negative weight ... probably a bug" << Endl;
1011  weight = negEvents[nev]->GetWeight();
1012  }
1013  }
1014  Log() << kINFO << "<Negative Event Pairing> took: " << timer.GetElapsedTime()
1015  << " " << Endl;
1016 
1017  // just check.. now there should be no negative event weight left anymore
1018  totalNegWeights = 0;
1019  totalPosWeights = 0;
1020  totalWeights = 0;
1021  Double_t sigWeight=0;
1022  Double_t bkgWeight=0;
1023  Int_t nSig=0;
1024  Int_t nBkg=0;
1025 
1026  std::vector<const Event*> newEventSample;
1027 
1028  for (UInt_t iev = 0; iev < fEventSample.size(); iev++){
1029  if (fEventSample[iev]->GetWeight() < 0) {
1030  totalNegWeights += fEventSample[iev]->GetWeight();
1031  totalWeights += fEventSample[iev]->GetWeight();
1032  } else {
1033  totalPosWeights += fEventSample[iev]->GetWeight();
1034  totalWeights += fEventSample[iev]->GetWeight();
1035  }
1036  if (fEventSample[iev]->GetWeight() > 0) {
1037  newEventSample.push_back(new Event(*fEventSample[iev]));
1038  if (fEventSample[iev]->GetClass() == fSignalClass){
1039  sigWeight += fEventSample[iev]->GetWeight();
1040  nSig+=1;
1041  }else{
1042  bkgWeight += fEventSample[iev]->GetWeight();
1043  nBkg+=1;
1044  }
1045  }
1046  }
1047  if (totalNegWeights < 0) Log() << kFATAL << " compensation of negative event weights with positive ones did not work " << totalNegWeights << Endl;
1048 
1049  for (UInt_t i=0; i<fEventSample.size(); i++) delete fEventSample[i];
1050  fEventSample = newEventSample;
1051 
1052  Log() << kINFO << " after PreProcessing, the Event sample is left with " << fEventSample.size() << " events (unweighted), all with positive weights, adding up to " << totalWeights << Endl;
1053  Log() << kINFO << " nSig="<<nSig << " sigWeight="<<sigWeight << " nBkg="<<nBkg << " bkgWeight="<<bkgWeight << Endl;
1054 
1055 
1056 }
1057 
1058 ////////////////////////////////////////////////////////////////////////////////
1059 /// Call the Optimizer with the set of parameters and ranges that
1060 /// are meant to be tuned.
1061 
1062 std::map<TString,Double_t> TMVA::MethodBDT::OptimizeTuningParameters(TString fomType, TString fitType)
1063 {
1064  // fill all the tuning parameters that should be optimized into a map:
1065  std::map<TString,TMVA::Interval*> tuneParameters;
1066  std::map<TString,Double_t> tunedParameters;
1067 
1068  // note: the 3rd parameter in the interval is the "number of bins", NOT the stepsize !!
1069  // the actual VALUES at (at least for the scan, guess also in GA) are always
1070  // read from the middle of the bins. Hence.. the choice of Intervals e.g. for the
1071  // MaxDepth, in order to make nice integer values!!!
1072 
1073  // find some reasonable ranges for the optimisation of MinNodeEvents:
1074 
1075  tuneParameters.insert(std::pair<TString,Interval*>("NTrees", new Interval(10,1000,5))); // stepsize 50
1076  tuneParameters.insert(std::pair<TString,Interval*>("MaxDepth", new Interval(2,4,3))); // stepsize 1
1077  tuneParameters.insert(std::pair<TString,Interval*>("MinNodeSize", new LogInterval(1,30,30))); //
1078  //tuneParameters.insert(std::pair<TString,Interval*>("NodePurityLimit",new Interval(.4,.6,3))); // stepsize .1
1079  //tuneParameters.insert(std::pair<TString,Interval*>("BaggedSampleFraction",new Interval(.4,.9,6))); // stepsize .1
1080 
1081  // method-specific parameters
1082  if (fBoostType=="AdaBoost"){
1083  tuneParameters.insert(std::pair<TString,Interval*>("AdaBoostBeta", new Interval(.2,1.,5)));
1084 
1085  }else if (fBoostType=="Grad"){
1086  tuneParameters.insert(std::pair<TString,Interval*>("Shrinkage", new Interval(0.05,0.50,5)));
1087 
1088  }else if (fBoostType=="Bagging" && fRandomisedTrees){
1089  Int_t min_var = TMath::FloorNint( GetNvar() * .25 );
1090  Int_t max_var = TMath::CeilNint( GetNvar() * .75 );
1091  tuneParameters.insert(std::pair<TString,Interval*>("UseNvars", new Interval(min_var,max_var,4)));
1092 
1093  }
1094 
1095  Log()<<kINFO << " the following BDT parameters will be tuned on the respective *grid*\n"<<Endl;
1096  std::map<TString,TMVA::Interval*>::iterator it;
1097  for(it=tuneParameters.begin(); it!= tuneParameters.end(); it++){
1098  Log() << kWARNING << it->first << Endl;
1099  std::ostringstream oss;
1100  (it->second)->Print(oss);
1101  Log()<<oss.str();
1102  Log()<<Endl;
1103  }
1104 
1105  OptimizeConfigParameters optimize(this, tuneParameters, fomType, fitType);
1106  tunedParameters=optimize.optimize();
1107 
1108  return tunedParameters;
1109 
1110 }
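
// Illustrative use of the parameter optimisation above (sketch only): `bdt` is assumed
// to be a pointer to an already booked and initialised TMVA::MethodBDT, and the figure
// of merit / fitter names are example arguments, not defaults of this method.
//
//    std::map<TString,Double_t> tuned = bdt->OptimizeTuningParameters("ROCIntegral", "Scan");
//    bdt->SetTuneParameters(tuned);   // apply the tuned values before calling Train()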
1111 
1112 ////////////////////////////////////////////////////////////////////////////////
1113 /// Set the tuning parameters according to the argument.
1114 
1115 void TMVA::MethodBDT::SetTuneParameters(std::map<TString,Double_t> tuneParameters)
1116 {
1117  std::map<TString,Double_t>::iterator it;
1118  for(it=tuneParameters.begin(); it!= tuneParameters.end(); it++){
1119  Log() << kWARNING << it->first << " = " << it->second << Endl;
1120  if (it->first == "MaxDepth" ) SetMaxDepth ((Int_t)it->second);
1121  else if (it->first == "MinNodeSize" ) SetMinNodeSize (it->second);
1122  else if (it->first == "NTrees" ) SetNTrees ((Int_t)it->second);
1123  else if (it->first == "NodePurityLimit") SetNodePurityLimit (it->second);
1124  else if (it->first == "AdaBoostBeta" ) SetAdaBoostBeta (it->second);
1125  else if (it->first == "Shrinkage" ) SetShrinkage (it->second);
1126  else if (it->first == "UseNvars" ) SetUseNvars ((Int_t)it->second);
1127  else if (it->first == "BaggedSampleFraction" ) SetBaggedSampleFraction (it->second);
1128  else Log() << kFATAL << " SetParameter for " << it->first << " not yet implemented " <<Endl;
1129  }
1130 }
1131 
1132 ////////////////////////////////////////////////////////////////////////////////
1133 /// BDT training.
1134 
1135 void TMVA::MethodBDT::Train()
1136 {
1138 
1139  // fill the STL Vector with the event sample
1140  // (needs to be done here and cannot be done in "init" as the options need to be
1141  // known).
1142  InitEventSample();
1143 
1144  if (fNTrees==0){
1145  Log() << kERROR << " Zero Decision Trees demanded... that does not work !! "
1146  << " I set it to 1 .. just so that the program does not crash"
1147  << Endl;
1148  fNTrees = 1;
1149  }
1150 
1152  std::vector<TString> titles = {"Boost weight", "Error Fraction"};
1153  fInteractive->Init(titles);
1154  }
1155  fIPyMaxIter = fNTrees;
1156  fExitFromTraining = false;
1157 
1158  // HHV (it's been here since looong but I really don't know why we cannot handle
1159  // normalized variables in BDTs... todo
1160  if (IsNormalised()) Log() << kFATAL << "\"Normalise\" option cannot be used with BDT; "
1161  << "please remove the option from the configuration string, or "
1162  << "use \"!Normalise\""
1163  << Endl;
1164 
1165  if(DoRegression())
1166  Log() << kINFO << "Regression Loss Function: "<< fRegressionLossFunctionBDTG->Name() << Endl;
1167 
1168  Log() << kINFO << "Training "<< fNTrees << " Decision Trees ... patience please" << Endl;
1169 
1170  Log() << kDEBUG << "Training with maximal depth = " <<fMaxDepth
1171  << ", MinNodeEvents=" << fMinNodeEvents
1172  << ", NTrees="<<fNTrees
1173  << ", NodePurityLimit="<<fNodePurityLimit
1174  << ", AdaBoostBeta="<<fAdaBoostBeta
1175  << Endl;
1176 
1177  // weights applied in boosting
1178  Int_t nBins;
1179  Double_t xMin,xMax;
1180  TString hname = "AdaBoost weight distribution";
1181 
1182  nBins= 100;
1183  xMin = 0;
1184  xMax = 30;
1185 
1186  if (DoRegression()) {
1187  nBins= 100;
1188  xMin = 0;
1189  xMax = 1;
1190  hname="Boost event weights distribution";
1191  }
1192 
1193  // book monitoring histograms (for AdaBoost only)
1194 
1195  TH1* h = new TH1F(Form("%s_BoostWeight",DataInfo().GetName()),hname,nBins,xMin,xMax);
1196  TH1* nodesBeforePruningVsTree = new TH1I(Form("%s_NodesBeforePruning",DataInfo().GetName()),"nodes before pruning",fNTrees,0,fNTrees);
1197  TH1* nodesAfterPruningVsTree = new TH1I(Form("%s_NodesAfterPruning",DataInfo().GetName()),"nodes after pruning",fNTrees,0,fNTrees);
1198 
1199 
1200 
1201  if(!DoMulticlass()){
1203 
1204  h->SetXTitle("boost weight");
1205  results->Store(h, "BoostWeights");
1206 
1207 
1208  // Monitor the performance (on TEST sample) versus number of trees
1209  if (fDoBoostMonitor){
1210  TH2* boostMonitor = new TH2F("BoostMonitor","ROC Integral Vs iTree",2,0,fNTrees,2,0,1.05);
1211  boostMonitor->SetXTitle("#tree");
1212  boostMonitor->SetYTitle("ROC Integral");
1213  results->Store(boostMonitor, "BoostMonitor");
1214  TGraph *boostMonitorGraph = new TGraph();
1215  boostMonitorGraph->SetName("BoostMonitorGraph");
1216  boostMonitorGraph->SetTitle("ROCIntegralVsNTrees");
1217  results->Store(boostMonitorGraph, "BoostMonitorGraph");
1218  }
1219 
1220  // weights applied in boosting vs tree number
1221  h = new TH1F("BoostWeightVsTree","Boost weights vs tree",fNTrees,0,fNTrees);
1222  h->SetXTitle("#tree");
1223  h->SetYTitle("boost weight");
1224  results->Store(h, "BoostWeightsVsTree");
1225 
1226  // error fraction vs tree number
1227  h = new TH1F("ErrFractHist","error fraction vs tree number",fNTrees,0,fNTrees);
1228  h->SetXTitle("#tree");
1229  h->SetYTitle("error fraction");
1230  results->Store(h, "ErrorFrac");
1231 
1232  // nNodesBeforePruning vs tree number
1233  nodesBeforePruningVsTree->SetXTitle("#tree");
1234  nodesBeforePruningVsTree->SetYTitle("#tree nodes");
1235  results->Store(nodesBeforePruningVsTree);
1236 
1237  // nNodesAfterPruning vs tree number
1238  nodesAfterPruningVsTree->SetXTitle("#tree");
1239  nodesAfterPruningVsTree->SetYTitle("#tree nodes");
1240  results->Store(nodesAfterPruningVsTree);
1241 
1242  }
1243 
1244  fMonitorNtuple= new TTree("MonitorNtuple","BDT variables");
1245  fMonitorNtuple->Branch("iTree",&fITree,"iTree/I");
1246  fMonitorNtuple->Branch("boostWeight",&fBoostWeight,"boostWeight/D");
1247  fMonitorNtuple->Branch("errorFraction",&fErrorFraction,"errorFraction/D");
1248 
1249  Timer timer( fNTrees, GetName() );
1250  Int_t nNodesBeforePruningCount = 0;
1251  Int_t nNodesAfterPruningCount = 0;
1252 
1253  Int_t nNodesBeforePruning = 0;
1254  Int_t nNodesAfterPruning = 0;
1255 
1256 
1257  if(fBoostType=="Grad"){
1259  }
1260 
1261  Int_t itree=0;
1262  Bool_t continueBoost=kTRUE;
1263  //for (int itree=0; itree<fNTrees; itree++) {
1264  while (itree < fNTrees && continueBoost){
1265  if (fExitFromTraining) break;
1266  fIPyCurrentIter = itree;
1267  timer.DrawProgressBar( itree );
1268  // Results* results = Data()->GetResults(GetMethodName(), Types::kTraining, GetAnalysisType());
1269  // TH1 *hxx = new TH1F(Form("swdist%d",itree),Form("swdist%d",itree),10000,0,15);
1270  // results->Store(hxx,Form("swdist%d",itree));
1271  // TH1 *hxy = new TH1F(Form("bwdist%d",itree),Form("bwdist%d",itree),10000,0,15);
1272  // results->Store(hxy,Form("bwdist%d",itree));
1273  // for (Int_t iev=0; iev<fEventSample.size(); iev++) {
1274  // if (fEventSample[iev]->GetClass()!=0) hxy->Fill((fEventSample[iev])->GetWeight());
1275  // else hxx->Fill((fEventSample[iev])->GetWeight());
1276  // }
1277 
1278  if(DoMulticlass()){
1279  if (fBoostType!="Grad"){
1280  Log() << kFATAL << "Multiclass is currently only supported by gradient boost. "
1281  << "Please change boost option accordingly (GradBoost)."
1282  << Endl;
1283  }
1284  UInt_t nClasses = DataInfo().GetNClasses();
1285  for (UInt_t i=0;i<nClasses;i++){
1286  fForest.push_back( new DecisionTree( fSepType, fMinNodeSize, fNCuts, &(DataInfo()), i,
1288  itree*nClasses+i, fNodePurityLimit, itree*nClasses+1));
1289  fForest.back()->SetNVars(GetNvar());
1290  if (fUseFisherCuts) {
1291  fForest.back()->SetUseFisherCuts();
1292  fForest.back()->SetMinLinCorrForFisher(fMinLinCorrForFisher);
1293  fForest.back()->SetUseExclusiveVars(fUseExclusiveVars);
1294  }
1295  // the minimum linear correlation between two variables demanded for use in fisher criterion in node splitting
1296 
1297  nNodesBeforePruning = fForest.back()->BuildTree(*fTrainSample);
1298  Double_t bw = this->Boost(*fTrainSample, fForest.back(),i);
1299  if (bw > 0) {
1300  fBoostWeights.push_back(bw);
1301  }else{
1302  fBoostWeights.push_back(0);
1303  Log() << kWARNING << "stopped boosting at itree="<<itree << Endl;
1304  // fNTrees = itree+1; // that should stop the boosting
1305  continueBoost=kFALSE;
1306  }
1307  }
1308  }
1309  else{
1312  itree, fNodePurityLimit, itree));
1313  fForest.back()->SetNVars(GetNvar());
1314  if (fUseFisherCuts) {
1315  fForest.back()->SetUseFisherCuts();
1316  fForest.back()->SetMinLinCorrForFisher(fMinLinCorrForFisher);
1317  fForest.back()->SetUseExclusiveVars(fUseExclusiveVars);
1318  }
1319 
1320  nNodesBeforePruning = fForest.back()->BuildTree(*fTrainSample);
1321 
1322  if (fUseYesNoLeaf && !DoRegression() && fBoostType!="Grad") { // remove leaf nodes where both daughter nodes are of same type
1323  nNodesBeforePruning = fForest.back()->CleanTree();
1324  }
1325 
1326  nNodesBeforePruningCount += nNodesBeforePruning;
1327  nodesBeforePruningVsTree->SetBinContent(itree+1,nNodesBeforePruning);
1328 
1329  fForest.back()->SetPruneMethod(fPruneMethod); // set the pruning method for the tree
1330  fForest.back()->SetPruneStrength(fPruneStrength); // set the strength parameter
1331 
1332  std::vector<const Event*> * validationSample = NULL;
1333  if(fAutomatic) validationSample = &fValidationSample;
1334 
1335  Double_t bw = this->Boost(*fTrainSample, fForest.back());
1336  if (bw > 0) {
1337  fBoostWeights.push_back(bw);
1338  }else{
1339  fBoostWeights.push_back(0);
1340  Log() << kWARNING << "stopped boosting at itree="<<itree << Endl;
1341  continueBoost=kFALSE;
1342  }
1343 
1344 
1345 
1346  // if fAutomatic == true, pruneStrength will be the optimal pruning strength
1347  // determined by the pruning algorithm; otherwise, it is simply the strength parameter
1348  // set by the user
1349  if (fPruneMethod != DecisionTree::kNoPruning) fForest.back()->PruneTree(validationSample);
1350 
1351  if (fUseYesNoLeaf && !DoRegression() && fBoostType!="Grad"){ // remove leaf nodes where both daughter nodes are of same type
1352  fForest.back()->CleanTree();
1353  }
1354  nNodesAfterPruning = fForest.back()->GetNNodes();
1355  nNodesAfterPruningCount += nNodesAfterPruning;
1356  nodesAfterPruningVsTree->SetBinContent(itree+1,nNodesAfterPruning);
1357 
1358  if (fInteractive){
1360  }
1361  fITree = itree;
1362  fMonitorNtuple->Fill();
1363  if (fDoBoostMonitor){
1364  if (! DoRegression() ){
1365  if ( itree==fNTrees-1 || (!(itree%500)) ||
1366  (!(itree%250) && itree <1000)||
1367  (!(itree%100) && itree < 500)||
1368  (!(itree%50) && itree < 250)||
1369  (!(itree%25) && itree < 150)||
1370  (!(itree%10) && itree < 50)||
1371  (!(itree%5) && itree < 20)
1372  ) BoostMonitor(itree);
1373  }
1374  }
1375  }
1376  itree++;
1377  }
1378 
1379  // get elapsed time
1380  Log() << kDEBUG << "\t<Train> elapsed time: " << timer.GetElapsedTime()
1381  << " " << Endl;
1382  if (fPruneMethod == DecisionTree::kNoPruning) {
1383  Log() << kDEBUG << "\t<Train> average number of nodes (w/o pruning) : "
1384  << nNodesBeforePruningCount/GetNTrees() << Endl;
1385  }
1386  else {
1387  Log() << kDEBUG << "\t<Train> average number of nodes before/after pruning : "
1388  << nNodesBeforePruningCount/GetNTrees() << " / "
1389  << nNodesAfterPruningCount/GetNTrees()
1390  << Endl;
1391  }
1393 
1394 
1395  // reset all previously stored/accumulated BOOST weights in the event sample
1396  // for (UInt_t iev=0; iev<fEventSample.size(); iev++) fEventSample[iev]->SetBoostWeight(1.);
1397  Log() << kDEBUG << "Now I delete the private data sample"<< Endl;
1398  for (UInt_t i=0; i<fEventSample.size(); i++) delete fEventSample[i];
1399  for (UInt_t i=0; i<fValidationSample.size(); i++) delete fValidationSample[i];
1400  fEventSample.clear();
1401  fValidationSample.clear();
1402 
1404  ExitFromTraining();
1405 }
1406 
1407 
1408 ////////////////////////////////////////////////////////////////////////////////
1409 /// Returns MVA value: -1 for background, 1 for signal.
1410 
1411 Double_t TMVA::MethodBDT::GetGradBoostMVA(const TMVA::Event* e, UInt_t nTrees)
1412 {
1413  Double_t sum=0;
1414  for (UInt_t itree=0; itree<nTrees; itree++) {
1415  //loop over all trees in forest
1416  sum += fForest[itree]->CheckEvent(e,kFALSE);
1417 
1418  }
1419  return 2.0/(1.0+exp(-2.0*sum))-1; //MVA output between -1 and 1
1420 }
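
Illustration (not part of MethodBDT.cxx): the mapping used above is 2/(1+exp(-2*sum))-1, i.e. tanh of the summed leaf responses, so the MVA output is bounded in [-1,1]. A minimal standalone sketch:

#include <cmath>
#include <cstdio>

// Map an accumulated gradient-boost score onto the [-1,1] MVA range,
// as in GetGradBoostMVA() above: 2/(1+exp(-2*sum)) - 1 == tanh(sum).
double GradBoostScoreToMva(double sum) { return 2.0 / (1.0 + std::exp(-2.0 * sum)) - 1.0; }

int main() {
   const double sums[] = {-2.0, -0.5, 0.0, 0.5, 2.0};
   for (double s : sums)
      std::printf("sum = %+.1f  ->  MVA = %+.3f, tanh = %+.3f\n", s, GradBoostScoreToMva(s), std::tanh(s));
   return 0;
}
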
1421 
1422 ////////////////////////////////////////////////////////////////////////////////
1423 /// Calculate residual for all events.
1424 
1425 void TMVA::MethodBDT::UpdateTargets(std::vector<const TMVA::Event*>& eventSample, UInt_t cls)
1426 {
1427  if (DoMulticlass()) {
1428  UInt_t nClasses = DataInfo().GetNClasses();
1429  std::vector<Double_t> expCache;
1430  if (cls == nClasses - 1) {
1431  expCache.resize(nClasses);
1432  }
1433  for (auto e : eventSample) {
1434  fResiduals[e].at(cls) += fForest.back()->CheckEvent(e, kFALSE);
1435  if (cls == nClasses - 1) {
1436  auto &residualsThisEvent = fResiduals[e];
1437  std::transform(residualsThisEvent.begin(),
1438  residualsThisEvent.begin() + nClasses,
1439  expCache.begin(), [](Double_t d) { return exp(d); });
1440  for (UInt_t i = 0; i < nClasses; i++) {
1441  Double_t norm = 0.0;
1442  for (UInt_t j = 0; j < nClasses; j++) {
1443  if (i != j) {
1444  norm += expCache[j] / expCache[i];
1445  }
1446  }
1447  Double_t p_cls = 1.0 / (1.0 + norm);
1448  Double_t res = (e->GetClass() == i) ? (1.0 - p_cls) : (-p_cls);
1449  const_cast<TMVA::Event *>(e)->SetTarget(i, res);
1450  }
1451  }
1452  }
1453  } else {
1454  for (auto e : eventSample) {
1455  auto &residualAt0 = fResiduals[e].at(0);
1456  residualAt0 += fForest.back()->CheckEvent(e, kFALSE);
1457  Double_t p_sig = 1.0 / (1.0 + exp(-2.0 * residualAt0));
1458  Double_t res = (DataInfo().IsSignal(e) ? 1 : 0) - p_sig;
1459  const_cast<TMVA::Event *>(e)->SetTarget(0, res);
1460  }
1461  }
1462 }
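
Illustration (not part of MethodBDT.cxx): in the two-class case the target set above is the gradient of the binomial log-likelihood, y - p_sig with p_sig = 1/(1+exp(-2F)), where F is the accumulated score and y is 1 for signal, 0 for background. A standalone sketch of one such update:

#include <cmath>
#include <cstdio>

// One binary-classification target update in the spirit of UpdateTargets() above.
// F is the accumulated score of an event, y is 1 (signal) or 0 (background).
double NextTarget(double F, int y) {
   const double pSig = 1.0 / (1.0 + std::exp(-2.0 * F));  // current signal probability
   return static_cast<double>(y) - pSig;                  // gradient of the binomial log-likelihood
}

int main() {
   std::printf("signal event,     F = 0.3: new target = %+.3f\n", NextTarget(0.3, 1));
   std::printf("background event, F = 0.3: new target = %+.3f\n", NextTarget(0.3, 0));
   return 0;
}
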
1463 
1464 ////////////////////////////////////////////////////////////////////////////////
1465 /// Calculate current residuals for all events and update targets for next iteration.
1466 
1467 void TMVA::MethodBDT::UpdateTargetsRegression(std::vector<const TMVA::Event*>& eventSample, Bool_t first)
1468 {
1469  if(!first){
1470  for (std::vector<const TMVA::Event*>::const_iterator e=fEventSample.begin(); e!=fEventSample.end();e++) {
1471  fLossFunctionEventInfo[*e].predictedValue += fForest.back()->CheckEvent(*e,kFALSE);
1472  }
1473  }
1474 
1475  fRegressionLossFunctionBDTG->SetTargets(eventSample, fLossFunctionEventInfo);
1476 }
1477 
1478 ////////////////////////////////////////////////////////////////////////////////
1479 /// Calculate the desired response value for each region.
1480 
1481 Double_t TMVA::MethodBDT::GradBoost(std::vector<const TMVA::Event*>& eventSample, DecisionTree *dt, UInt_t cls)
1482 {
1483  struct LeafInfo {
1484  Double_t sumWeightTarget = 0;
1485  Double_t sum2 = 0;
1486  };
1487 
1488  std::unordered_map<TMVA::DecisionTreeNode*, LeafInfo> leaves;
1489  for (auto e : eventSample) {
1490  Double_t weight = e->GetWeight();
1491  TMVA::DecisionTreeNode* node = dt->GetEventNode(*e);
1492  auto &v = leaves[node];
1493  auto target = e->GetTarget(cls);
1494  v.sumWeightTarget += target * weight;
1495  v.sum2 += fabs(target) * (1.0-fabs(target)) * weight * weight;
1496  }
1497  for (auto &iLeave : leaves) {
1498  constexpr auto minValue = 1e-30;
1499  if (iLeave.second.sum2 < minValue) {
1500  iLeave.second.sum2 = minValue;
1501  }
1502  iLeave.first->SetResponse(fShrinkage/DataInfo().GetNClasses() * iLeave.second.sumWeightTarget/iLeave.second.sum2);
1503  }
1504 
1505  //call UpdateTargets before next tree is grown
1506 
1507  DoMulticlass() ? UpdateTargets(fEventSample, cls) : UpdateTargets(fEventSample);
1508  return 1; //trees all have the same weight
1509 }
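
Illustration (not part of MethodBDT.cxx): the response assigned to each leaf above is fShrinkage/nClasses * sum(w*t) / sum(|t|(1-|t|)*w*w), with the denominator floored at 1e-30. A standalone sketch of the per-leaf computation on toy (weight, target) pairs:

#include <cmath>
#include <cstdio>
#include <utility>
#include <vector>

// Compute one leaf response as in GradBoost() above.
double LeafResponse(const std::vector<std::pair<double,double>>& wt, // (weight, target) per event
                    double shrinkage, unsigned nClasses) {
   double num = 0.0, den = 0.0;
   for (const auto& p : wt) {
      const double w = p.first, t = p.second;
      num += t * w;
      den += std::fabs(t) * (1.0 - std::fabs(t)) * w * w;
   }
   if (den < 1e-30) den = 1e-30;     // protect against an empty or saturated leaf
   return shrinkage / nClasses * num / den;
}

int main() {
   std::vector<std::pair<double,double>> leaf = {{1.0, 0.4}, {1.0, -0.1}, {0.5, 0.3}};
   std::printf("leaf response = %.4f\n", LeafResponse(leaf, 0.1, 2));
   return 0;
}
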
1510 
1511 ////////////////////////////////////////////////////////////////////////////////
1512 /// Implementation of M_TreeBoost using any loss function as described by Friedman 1999.
1513 
1514 Double_t TMVA::MethodBDT::GradBoostRegression(std::vector<const TMVA::Event*>& eventSample, DecisionTree *dt )
1515 {
1516  // get the vector of events for each terminal so that we can calculate the constant fit value in each
1517  // terminal node
1518  std::map<TMVA::DecisionTreeNode*,vector< TMVA::LossFunctionEventInfo > > leaves;
1519  for (std::vector<const TMVA::Event*>::const_iterator e=eventSample.begin(); e!=eventSample.end();e++) {
1520  TMVA::DecisionTreeNode* node = dt->GetEventNode(*(*e));
1521  (leaves[node]).push_back(fLossFunctionEventInfo[*e]);
1522  }
1523 
1524  // calculate the constant fit for each terminal node based upon the events in the node
1525  // node (iLeave->first), vector of event information (iLeave->second)
1526  for (std::map<TMVA::DecisionTreeNode*,vector< TMVA::LossFunctionEventInfo > >::iterator iLeave=leaves.begin();
1527  iLeave!=leaves.end();++iLeave){
1528  Double_t fit = fRegressionLossFunctionBDTG->Fit(iLeave->second);
1529  (iLeave->first)->SetResponse(fShrinkage*fit);
1530  }
1531 
1532  UpdateTargetsRegression(*fTrainSample);
1533  return 1;
1534 }
1535 
1536 ////////////////////////////////////////////////////////////////////////////////
1537 /// Initialize targets for first tree.
1538 
1539 void TMVA::MethodBDT::InitGradBoost( std::vector<const TMVA::Event*>& eventSample)
1540 {
1541  // Should get rid of this line. It's just for debugging.
1542  //std::sort(eventSample.begin(), eventSample.end(), [](const TMVA::Event* a, const TMVA::Event* b){
1543  // return (a->GetTarget(0) < b->GetTarget(0)); });
1544  fSepType=NULL; //set fSepType to NULL (regression trees are used for both classification and regression)
1545  if(DoRegression()){
1546  for (std::vector<const TMVA::Event*>::const_iterator e=eventSample.begin(); e!=eventSample.end();e++) {
1547  fLossFunctionEventInfo[*e]= TMVA::LossFunctionEventInfo((*e)->GetTarget(0), 0, (*e)->GetWeight());
1548  }
1549 
1552  return;
1553  }
1554  else if(DoMulticlass()){
1555  UInt_t nClasses = DataInfo().GetNClasses();
1556  for (std::vector<const TMVA::Event*>::const_iterator e=eventSample.begin(); e!=eventSample.end();e++) {
1557  for (UInt_t i=0;i<nClasses;i++){
1558  //Calculate initial residuals, assuming equal probability for all classes
1559  Double_t r = (*e)->GetClass()==i?(1-1.0/nClasses):(-1.0/nClasses);
1560  const_cast<TMVA::Event*>(*e)->SetTarget(i,r);
1561  fResiduals[*e].push_back(0);
1562  }
1563  }
1564  }
1565  else{
1566  for (std::vector<const TMVA::Event*>::const_iterator e=eventSample.begin(); e!=eventSample.end();e++) {
1567  Double_t r = (DataInfo().IsSignal(*e)?1:0)-0.5; //Calculate initial residuals
1568  const_cast<TMVA::Event*>(*e)->SetTarget(0,r);
1569  fResiduals[*e].push_back(0);
1570  }
1571  }
1572 
1573 }
1574 ////////////////////////////////////////////////////////////////////////////////
1575 /// Test the tree quality in terms of misclassification on the validation sample.
1576 
1577 Double_t TMVA::MethodBDT::TestTreeQuality( DecisionTree *dt )
1578 {
1579  Double_t ncorrect=0, nfalse=0;
1580  for (UInt_t ievt=0; ievt<fValidationSample.size(); ievt++) {
1581  Bool_t isSignalType= (dt->CheckEvent(fValidationSample[ievt]) > fNodePurityLimit ) ? 1 : 0;
1582 
1583  if (isSignalType == (DataInfo().IsSignal(fValidationSample[ievt])) ) {
1584  ncorrect += fValidationSample[ievt]->GetWeight();
1585  }
1586  else{
1587  nfalse += fValidationSample[ievt]->GetWeight();
1588  }
1589  }
1590 
1591  return ncorrect / (ncorrect + nfalse);
1592 }
1593 
1594 ////////////////////////////////////////////////////////////////////////////////
1595 /// Apply the boosting algorithm (the algorithm is selected via the "option" given
1596 /// in the constructor). The return value is the boosting weight.
1597 
1598 Double_t TMVA::MethodBDT::Boost( std::vector<const TMVA::Event*>& eventSample, DecisionTree *dt, UInt_t cls )
1599 {
1600  Double_t returnVal=-1;
1601 
1602  if (fBoostType=="AdaBoost") returnVal = this->AdaBoost (eventSample, dt);
1603  else if (fBoostType=="AdaCost") returnVal = this->AdaCost (eventSample, dt);
1604  else if (fBoostType=="Bagging") returnVal = this->Bagging ( );
1605  else if (fBoostType=="RegBoost") returnVal = this->RegBoost (eventSample, dt);
1606  else if (fBoostType=="AdaBoostR2") returnVal = this->AdaBoostR2(eventSample, dt);
1607  else if (fBoostType=="Grad"){
1608  if(DoRegression())
1609  returnVal = this->GradBoostRegression(eventSample, dt);
1610  else if(DoMulticlass())
1611  returnVal = this->GradBoost (eventSample, dt, cls);
1612  else
1613  returnVal = this->GradBoost (eventSample, dt);
1614  }
1615  else {
1616  Log() << kINFO << GetOptions() << Endl;
1617  Log() << kFATAL << "<Boost> unknown boost option " << fBoostType<< " called" << Endl;
1618  }
1619 
1620  if (fBaggedBoost){
1621     GetBaggedSubSample(fEventSample);
1622  }
1623 
1624 
1625  return returnVal;
1626 }
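
Illustration (not part of MethodBDT.cxx): the branch taken in this dispatcher is chosen by the BoostType option given when the BDT is booked. A hedged usage sketch, assuming the standard Factory/DataLoader setup of the TMVA tutorials; the factory and dataloader objects are placeholders created elsewhere:

#include "TMVA/Factory.h"
#include "TMVA/DataLoader.h"
#include "TMVA/Types.h"

// Sketch only: assumes factory and dataloader have been created and the input
// variables / training trees registered as in the usual TMVA examples.
void BookSomeBDTs(TMVA::Factory* factory, TMVA::DataLoader* dataloader) {
   // AdaBoost (the default) with an explicit beta and node-size setting
   factory->BookMethod(dataloader, TMVA::Types::kBDT, "BDT_Ada",
                       "NTrees=400:BoostType=AdaBoost:AdaBoostBeta=0.5:MinNodeSize=2.5%");
   // Gradient boosting ("Grad"), the only BoostType supported for multiclass problems
   factory->BookMethod(dataloader, TMVA::Types::kBDT, "BDTG",
                       "NTrees=400:BoostType=Grad:Shrinkage=0.10:UseBaggedBoost:BaggedSampleFraction=0.6");
}
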
1627 
1628 ////////////////////////////////////////////////////////////////////////////////
1629 /// Fills the ROCIntegral vs Itree graph from the test sample for the monitoring plots
1630 /// during the training, i.e. using the testing events.
1631 
1632 void TMVA::MethodBDT::BoostMonitor(Int_t iTree)
1633 {
1634  Results* results = Data()->GetResults(GetMethodName(), Types::kTraining, Types::kMaxAnalysisType);
1635 
1636  TH1F *tmpS = new TH1F( "tmpS", "", 100 , -1., 1.00001 );
1637  TH1F *tmpB = new TH1F( "tmpB", "", 100 , -1., 1.00001 );
1638  TH1F *tmp;
1639 
1640 
1641  UInt_t signalClassNr = DataInfo().GetClassInfo("Signal")->GetNumber();
1642 
1643  // const std::vector<Event*> events=Data()->GetEventCollection(Types::kTesting);
1644  // // fMethod->GetTransformationHandler().CalcTransformations(fMethod->Data()->GetEventCollection(Types::kTesting));
1645  // for (UInt_t iev=0; iev < events.size() ; iev++){
1646  // if (events[iev]->GetClass() == signalClassNr) tmp=tmpS;
1647  // else tmp=tmpB;
1648  // tmp->Fill(PrivateGetMvaValue(*(events[iev])),events[iev]->GetWeight());
1649  // }
1650 
1651  UInt_t nevents = Data()->GetNTestEvents();
1652  for (UInt_t iev=0; iev < nevents; iev++){
1653  const Event* event = GetTestingEvent(iev);
1654 
1655  if (event->GetClass() == signalClassNr) {tmp=tmpS;}
1656  else {tmp=tmpB;}
1657  tmp->Fill(PrivateGetMvaValue(event),event->GetWeight());
1658  }
1659  Double_t max=1;
1660 
1661  std::vector<TH1F*> hS;
1662  std::vector<TH1F*> hB;
1663  for (UInt_t ivar=0; ivar<GetNvar(); ivar++){
1664  hS.push_back(new TH1F(Form("SigVar%dAtTree%d",ivar,iTree),Form("SigVar%dAtTree%d",ivar,iTree),100,DataInfo().GetVariableInfo(ivar).GetMin(),DataInfo().GetVariableInfo(ivar).GetMax()));
1665  hB.push_back(new TH1F(Form("BkgVar%dAtTree%d",ivar,iTree),Form("BkgVar%dAtTree%d",ivar,iTree),100,DataInfo().GetVariableInfo(ivar).GetMin(),DataInfo().GetVariableInfo(ivar).GetMax()));
1666  results->Store(hS.back(),hS.back()->GetTitle());
1667  results->Store(hB.back(),hB.back()->GetTitle());
1668  }
1669 
1670 
1671  for (UInt_t iev=0; iev < fEventSample.size(); iev++){
1672  if (fEventSample[iev]->GetBoostWeight() > max) max = 1.01*fEventSample[iev]->GetBoostWeight();
1673  }
1674  TH1F *tmpBoostWeightsS = new TH1F(Form("BoostWeightsInTreeS%d",iTree),Form("BoostWeightsInTreeS%d",iTree),100,0.,max);
1675  TH1F *tmpBoostWeightsB = new TH1F(Form("BoostWeightsInTreeB%d",iTree),Form("BoostWeightsInTreeB%d",iTree),100,0.,max);
1676  results->Store(tmpBoostWeightsS,tmpBoostWeightsS->GetTitle());
1677  results->Store(tmpBoostWeightsB,tmpBoostWeightsB->GetTitle());
1678 
1679  TH1F *tmpBoostWeights;
1680  std::vector<TH1F*> *h;
1681 
1682  for (UInt_t iev=0; iev < fEventSample.size(); iev++){
1683  if (fEventSample[iev]->GetClass() == signalClassNr) {
1684  tmpBoostWeights=tmpBoostWeightsS;
1685  h=&hS;
1686  }else{
1687  tmpBoostWeights=tmpBoostWeightsB;
1688  h=&hB;
1689  }
1690  tmpBoostWeights->Fill(fEventSample[iev]->GetBoostWeight());
1691  for (UInt_t ivar=0; ivar<GetNvar(); ivar++){
1692  (*h)[ivar]->Fill(fEventSample[iev]->GetValue(ivar),fEventSample[iev]->GetWeight());
1693  }
1694  }
1695 
1696 
1697  TMVA::PDF *sig = new TMVA::PDF( " PDF Sig", tmpS, TMVA::PDF::kSpline3 );
1698  TMVA::PDF *bkg = new TMVA::PDF( " PDF Bkg", tmpB, TMVA::PDF::kSpline3 );
1699 
1700 
1701  TGraph* gr=results->GetGraph("BoostMonitorGraph");
1702  Int_t nPoints = gr->GetN();
1703  gr->Set(nPoints+1);
1704  gr->SetPoint(nPoints,(Double_t)iTree+1,GetROCIntegral(sig,bkg));
1705 
1706  tmpS->Delete();
1707  tmpB->Delete();
1708 
1709  delete sig;
1710  delete bkg;
1711 
1712  return;
1713 }
1714 
1715 ////////////////////////////////////////////////////////////////////////////////
1716 /// The AdaBoost implementation.
1717 /// A new training sample is generated by re-weighting
1718 /// events that are misclassified by the decision tree. The weight
1719 /// applied is \f$ w = \frac{(1-err)}{err} \f$ or, more generally,
1720 /// \f$ w = (\frac{(1-err)}{err})^\beta \f$,
1721 /// where \f$err\f$ is the fraction of misclassified events in the tree ( < 0.5, assuming
1722 /// that the previous selection was better than random guessing),
1723 /// and "beta" is a free parameter (default: beta = 1) that moderates the strength of the
1724 /// boosting.
1725 
1726 Double_t TMVA::MethodBDT::AdaBoost( std::vector<const TMVA::Event*>& eventSample, DecisionTree *dt )
1727 {
1728  Double_t err=0, sumGlobalw=0, sumGlobalwfalse=0, sumGlobalwfalse2=0;
1729 
1730  std::vector<Double_t> sumw(DataInfo().GetNClasses(),0); //for individually re-scaling each class
1731 
1732  Double_t maxDev=0;
1733  for (std::vector<const TMVA::Event*>::const_iterator e=eventSample.begin(); e!=eventSample.end();e++) {
1734  Double_t w = (*e)->GetWeight();
1735  sumGlobalw += w;
1736  UInt_t iclass=(*e)->GetClass();
1737  sumw[iclass] += w;
1738 
1739  if ( DoRegression() ) {
1740  Double_t tmpDev = TMath::Abs(dt->CheckEvent(*e,kFALSE) - (*e)->GetTarget(0) );
1741  sumGlobalwfalse += w * tmpDev;
1742  sumGlobalwfalse2 += w * tmpDev*tmpDev;
1743  if (tmpDev > maxDev) maxDev = tmpDev;
1744  }else{
1745 
1746  if (fUseYesNoLeaf){
1747  Bool_t isSignalType = (dt->CheckEvent(*e,fUseYesNoLeaf) > fNodePurityLimit );
1748  if (!(isSignalType == DataInfo().IsSignal(*e))) {
1749  sumGlobalwfalse+= w;
1750  }
1751  }else{
1752  Double_t dtoutput = (dt->CheckEvent(*e,fUseYesNoLeaf) - 0.5)*2.;
1753  Int_t trueType;
1754  if (DataInfo().IsSignal(*e)) trueType = 1;
1755  else trueType = -1;
1756  sumGlobalwfalse+= w*trueType*dtoutput;
1757  }
1758  }
1759  }
1760 
1761  err = sumGlobalwfalse/sumGlobalw ;
1762  if ( DoRegression() ) {
1763  //if quadratic loss:
1764  if (fAdaBoostR2Loss=="linear"){
1765  err = sumGlobalwfalse/maxDev/sumGlobalw ;
1766  }
1767  else if (fAdaBoostR2Loss=="quadratic"){
1768  err = sumGlobalwfalse2/maxDev/maxDev/sumGlobalw ;
1769  }
1770  else if (fAdaBoostR2Loss=="exponential"){
1771  err = 0;
1772  for (std::vector<const TMVA::Event*>::const_iterator e=eventSample.begin(); e!=eventSample.end();e++) {
1773  Double_t w = (*e)->GetWeight();
1774  Double_t tmpDev = TMath::Abs(dt->CheckEvent(*e,kFALSE) - (*e)->GetTarget(0) );
1775  err += w * (1 - exp (-tmpDev/maxDev)) / sumGlobalw;
1776  }
1777 
1778  }
1779  else {
1780  Log() << kFATAL << " you've chosen a loss type for AdaBoost other than linear, quadratic or exponential, "
1781  << " namely " << fAdaBoostR2Loss << ",\n"
1782  << "which is not implemented... perhaps a typo in the options?" <<Endl;
1783  }
1784  }
1785 
1786  Log() << kDEBUG << "BDT AdaBoost wrong/all: " << sumGlobalwfalse << "/" << sumGlobalw << Endl;
1787 
1788 
1789  Double_t newSumGlobalw=0;
1790  std::vector<Double_t> newSumw(sumw.size(),0);
1791 
1792  Double_t boostWeight=1.;
1793  if (err >= 0.5 && fUseYesNoLeaf) { // sanity check ... should never happen as otherwise there is apparently
1794  // something odd with the assignment of the leaf nodes (rem: you use the training
1795  // events for this determination of the error rate)
1796  if (dt->GetNNodes() == 1){
1797  Log() << kERROR << " YOUR tree has only 1 Node... kind of a funny *tree*. I cannot "
1798  << "boost such a thing... if after 1 step the error rate is == 0.5"
1799  << Endl
1800  << "please check why this happens, maybe too many events per node requested ?"
1801  << Endl;
1802 
1803  }else{
1804  Log() << kERROR << " The error rate in the BDT boosting is > 0.5. ("<< err
1805  << ") That should not happen, please check your code (i.e... the BDT code), I "
1806  << " stop boosting here" << Endl;
1807  return -1;
1808  }
1809  err = 0.5;
1810  } else if (err < 0) {
1811  Log() << kERROR << " The error rate in the BDT boosting is < 0. That can happen"
1812  << " due to improper treatment of negative weights in a Monte Carlo.. (if you have"
1813  << " an idea on how to do it in a better way, please let me know (Helge.Voss@cern.ch)"
1814  << " for the time being I set it to its absolute value.. just to continue.." << Endl;
1815  err = TMath::Abs(err);
1816  }
1817  if (fUseYesNoLeaf)
1818  boostWeight = TMath::Log((1.-err)/err)*fAdaBoostBeta;
1819  else
1820  boostWeight = TMath::Log((1.+err)/(1-err))*fAdaBoostBeta;
1821 
1822 
1823  Log() << kDEBUG << "BDT AdaBoost wrong/all: " << sumGlobalwfalse << "/" << sumGlobalw << " 1-err/err="<<boostWeight<< " log.."<<TMath::Log(boostWeight)<<Endl;
1824 
1824 
1825  Results* results = Data()->GetResults(GetMethodName(), Types::kTraining, Types::kMaxAnalysisType);
1826 
1827 
1828  for (std::vector<const TMVA::Event*>::const_iterator e=eventSample.begin(); e!=eventSample.end();e++) {
1829 
1830  if (fUseYesNoLeaf||DoRegression()){
1831  if ((!( (dt->CheckEvent(*e,fUseYesNoLeaf) > fNodePurityLimit ) == DataInfo().IsSignal(*e))) || DoRegression()) {
1832  Double_t boostfactor = TMath::Exp(boostWeight);
1833 
1834  if (DoRegression()) boostfactor = TMath::Power(1/boostWeight,(1.-TMath::Abs(dt->CheckEvent(*e,kFALSE) - (*e)->GetTarget(0) )/maxDev ) );
1835  if ( (*e)->GetWeight() > 0 ){
1836  (*e)->SetBoostWeight( (*e)->GetBoostWeight() * boostfactor);
1837  // Helge change back (*e)->ScaleBoostWeight(boostfactor);
1838  if (DoRegression()) results->GetHist("BoostWeights")->Fill(boostfactor);
1839  } else {
1840  if ( fInverseBoostNegWeights )(*e)->ScaleBoostWeight( 1. / boostfactor); // if the original event weight is negative, and you want to "increase" the event's "positive" influence, you'd rather make the event weight "smaller" in terms of its absolute value while still keeping it something "negative"
1841  else (*e)->SetBoostWeight( (*e)->GetBoostWeight() * boostfactor);
1842 
1843  }
1844  }
1845 
1846  }else{
1847  Double_t dtoutput = (dt->CheckEvent(*e,fUseYesNoLeaf) - 0.5)*2.;
1848  Int_t trueType;
1849  if (DataInfo().IsSignal(*e)) trueType = 1;
1850  else trueType = -1;
1851  Double_t boostfactor = TMath::Exp(-1*boostWeight*trueType*dtoutput);
1852 
1853  if ( (*e)->GetWeight() > 0 ){
1854  (*e)->SetBoostWeight( (*e)->GetBoostWeight() * boostfactor);
1855  // Helge change back (*e)->ScaleBoostWeight(boostfactor);
1856  if (DoRegression()) results->GetHist("BoostWeights")->Fill(boostfactor);
1857  } else {
1858  if ( fInverseBoostNegWeights )(*e)->ScaleBoostWeight( 1. / boostfactor); // if the original event weight is negative, and you want to "increase" the event's "positive" influence, you'd rather make the event weight "smaller" in terms of its absolute value while still keeping it something "negative"
1859  else (*e)->SetBoostWeight( (*e)->GetBoostWeight() * boostfactor);
1860  }
1861  }
1862  newSumGlobalw+=(*e)->GetWeight();
1863  newSumw[(*e)->GetClass()] += (*e)->GetWeight();
1864  }
1865 
1866 
1867  // Double_t globalNormWeight=sumGlobalw/newSumGlobalw;
1868  Double_t globalNormWeight=( (Double_t) eventSample.size())/newSumGlobalw;
1869  Log() << kDEBUG << "new Nsig="<<newSumw[0]*globalNormWeight << " new Nbkg="<<newSumw[1]*globalNormWeight << Endl;
1870 
1871 
1872  for (std::vector<const TMVA::Event*>::const_iterator e=eventSample.begin(); e!=eventSample.end();e++) {
1873  // if (fRenormByClass) (*e)->ScaleBoostWeight( normWeightByClass[(*e)->GetClass()] );
1874  // else (*e)->ScaleBoostWeight( globalNormWeight );
1875  // else (*e)->ScaleBoostWeight( globalNormWeight );
1876  if (DataInfo().IsSignal(*e))(*e)->ScaleBoostWeight( globalNormWeight * fSigToBkgFraction );
1877  else (*e)->ScaleBoostWeight( globalNormWeight );
1878  }
1879 
1880  if (!(DoRegression()))results->GetHist("BoostWeights")->Fill(boostWeight);
1881  results->GetHist("BoostWeightsVsTree")->SetBinContent(fForest.size(),boostWeight);
1882  results->GetHist("ErrorFrac")->SetBinContent(fForest.size(),err);
1883 
1884  fBoostWeight = boostWeight;
1885  fErrorFraction = err;
1886 
1887  return boostWeight;
1888 }
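
Illustration (not part of MethodBDT.cxx): with the default yes/no leaves, the tree weight computed above is alpha = AdaBoostBeta * ln((1-err)/err), and misclassified events with positive weight are scaled by exp(alpha) before the sample is renormalised. A short numeric sketch:

#include <cmath>
#include <cstdio>

int main() {
   const double beta = 1.0;    // AdaBoostBeta
   const double err  = 0.30;   // weighted misclassification fraction of the current tree
   const double alpha = beta * std::log((1.0 - err) / err);   // tree (boost) weight
   const double boostfactor = std::exp(alpha);                // applied to misclassified events
   std::printf("err = %.2f -> alpha = %.3f, misclassified events get weight * %.3f\n",
               err, alpha, boostfactor);
   // afterwards all event weights are renormalised so that their sum stays constant
   return 0;
}
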
1889 
1890 ////////////////////////////////////////////////////////////////////////////////
1891 /// The AdaCost boosting algorithm takes a simple cost matrix (currently fixed for
1892 /// all events; it could later be modified to use individual cost matrices for each
1893 /// event as in the original paper)...
1894 ///
1895 ///                  true_signal    true_bkg
1896 ///     --------------------------------------
1897 ///     sel_signal |    Css           Ctb_ss     (all entries in the range [0,1])
1898 ///     sel_bkg    |    Cts_sb        Cbb
1899 ///
1900 /// and takes this into account when calculating the misclassification cost (formerly: error fraction):
1901 ///
1902 ///     err = sum_events ( weight * y_true * y_sel * beta(event) )
1903 
1904 Double_t TMVA::MethodBDT::AdaCost( vector<const TMVA::Event*>& eventSample, DecisionTree *dt )
1905 {
1906  Double_t Css = fCss;
1907  Double_t Cbb = fCbb;
1908  Double_t Cts_sb = fCts_sb;
1909  Double_t Ctb_ss = fCtb_ss;
1910 
1911  Double_t err=0, sumGlobalWeights=0, sumGlobalCost=0;
1912 
1913  std::vector<Double_t> sumw(DataInfo().GetNClasses(),0); //for individually re-scaling each class
1914 
1915  for (vector<const TMVA::Event*>::const_iterator e=eventSample.begin(); e!=eventSample.end();e++) {
1916  Double_t w = (*e)->GetWeight();
1917  sumGlobalWeights += w;
1918  UInt_t iclass=(*e)->GetClass();
1919 
1920  sumw[iclass] += w;
1921 
1922  if ( DoRegression() ) {
1923  Log() << kFATAL << " AdaCost not implemented for regression"<<Endl;
1924  }else{
1925 
1926  Double_t dtoutput = (dt->CheckEvent(*e,false) - 0.5)*2.;
1927  Int_t trueType;
1928  Bool_t isTrueSignal = DataInfo().IsSignal(*e);
1929  Bool_t isSelectedSignal = (dtoutput>0);
1930  if (isTrueSignal) trueType = 1;
1931  else trueType = -1;
1932 
1933  Double_t cost=0;
1934  if (isTrueSignal && isSelectedSignal) cost=Css;
1935  else if (isTrueSignal && !isSelectedSignal) cost=Cts_sb;
1936  else if (!isTrueSignal && isSelectedSignal) cost=Ctb_ss;
1937  else if (!isTrueSignal && !isSelectedSignal) cost=Cbb;
1938  else Log() << kERROR << "something went wrong in AdaCost" << Endl;
1939 
1940  sumGlobalCost+= w*trueType*dtoutput*cost;
1941 
1942  }
1943  }
1944 
1945  if ( DoRegression() ) {
1946  Log() << kFATAL << " AdaCost not implemented for regression"<<Endl;
1947  }
1948 
1949  // Log() << kDEBUG << "BDT AdaBoos wrong/all: " << sumGlobalCost << "/" << sumGlobalWeights << Endl;
1950  // Log() << kWARNING << "BDT AdaBoos wrong/all: " << sumGlobalCost << "/" << sumGlobalWeights << Endl;
1951  sumGlobalCost /= sumGlobalWeights;
1952  // Log() << kWARNING << "BDT AdaBoos wrong/all: " << sumGlobalCost << "/" << sumGlobalWeights << Endl;
1953 
1954 
1955  Double_t newSumGlobalWeights=0;
1956  vector<Double_t> newSumClassWeights(sumw.size(),0);
1957 
1958  Double_t boostWeight = TMath::Log((1+sumGlobalCost)/(1-sumGlobalCost)) * fAdaBoostBeta;
1959 
1959 
1960  Results* results = Data()->GetResults(GetMethodName(), Types::kTraining, Types::kMaxAnalysisType);
1961 
1962  for (vector<const TMVA::Event*>::const_iterator e=eventSample.begin(); e!=eventSample.end();e++) {
1963  Double_t dtoutput = (dt->CheckEvent(*e,false) - 0.5)*2.;
1964  Int_t trueType;
1965  Bool_t isTrueSignal = DataInfo().IsSignal(*e);
1966  Bool_t isSelectedSignal = (dtoutput>0);
1967  if (isTrueSignal) trueType = 1;
1968  else trueType = -1;
1969 
1970  Double_t cost=0;
1971  if (isTrueSignal && isSelectedSignal) cost=Css;
1972  else if (isTrueSignal && !isSelectedSignal) cost=Cts_sb;
1973  else if (!isTrueSignal && isSelectedSignal) cost=Ctb_ss;
1974  else if (!isTrueSignal && !isSelectedSignal) cost=Cbb;
1975  else Log() << kERROR << "something went wrong in AdaCost" << Endl;
1976 
1977  Double_t boostfactor = TMath::Exp(-1*boostWeight*trueType*dtoutput*cost);
1978  if (DoRegression())Log() << kFATAL << " AdaCost not implemented for regression"<<Endl;
1979  if ( (*e)->GetWeight() > 0 ){
1980  (*e)->SetBoostWeight( (*e)->GetBoostWeight() * boostfactor);
1981  // Helge change back (*e)->ScaleBoostWeight(boostfactor);
1982  if (DoRegression())Log() << kFATAL << " AdaCost not implemented for regression"<<Endl;
1983  } else {
1984  if ( fInverseBoostNegWeights )(*e)->ScaleBoostWeight( 1. / boostfactor); // if the original event weight is negative, and you want to "increase" the event's "positive" influence, you'd rather make the event weight "smaller" in terms of its absolute value while still keeping it something "negative"
1985  }
1986 
1987  newSumGlobalWeights+=(*e)->GetWeight();
1988  newSumClassWeights[(*e)->GetClass()] += (*e)->GetWeight();
1989  }
1990 
1991 
1992  // Double_t globalNormWeight=sumGlobalWeights/newSumGlobalWeights;
1993  Double_t globalNormWeight=Double_t(eventSample.size())/newSumGlobalWeights;
1994  Log() << kDEBUG << "new Nsig="<<newSumClassWeights[0]*globalNormWeight << " new Nbkg="<<newSumClassWeights[1]*globalNormWeight << Endl;
1995 
1996 
1997  for (std::vector<const TMVA::Event*>::const_iterator e=eventSample.begin(); e!=eventSample.end();e++) {
1998  // if (fRenormByClass) (*e)->ScaleBoostWeight( normWeightByClass[(*e)->GetClass()] );
1999  // else (*e)->ScaleBoostWeight( globalNormWeight );
2000  if (DataInfo().IsSignal(*e))(*e)->ScaleBoostWeight( globalNormWeight * fSigToBkgFraction );
2001  else (*e)->ScaleBoostWeight( globalNormWeight );
2002  }
2003 
2004 
2005  if (!(DoRegression()))results->GetHist("BoostWeights")->Fill(boostWeight);
2006  results->GetHist("BoostWeightsVsTree")->SetBinContent(fForest.size(),boostWeight);
2007  results->GetHist("ErrorFrac")->SetBinContent(fForest.size(),err);
2008 
2009  fBoostWeight = boostWeight;
2010  fErrorFraction = err;
2011 
2012 
2013  return boostWeight;
2014 }
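
Illustration (not part of MethodBDT.cxx): a standalone sketch of the two AdaCost-specific ingredients, the cost looked up from the (Css, Cts_sb, Ctb_ss, Cbb) matrix and the per-event factor exp(-alpha * y_true * y_sel * cost):

#include <cmath>
#include <cstdio>

// Cost lookup as in AdaCost() above: true class vs. selected class.
double AdaCostEntry(bool isTrueSignal, bool isSelectedSignal,
                    double Css, double Cts_sb, double Ctb_ss, double Cbb) {
   if ( isTrueSignal &&  isSelectedSignal) return Css;     // correctly selected signal
   if ( isTrueSignal && !isSelectedSignal) return Cts_sb;  // true signal selected as background
   if (!isTrueSignal &&  isSelectedSignal) return Ctb_ss;  // true background selected as signal
   return Cbb;                                             // correctly rejected background
}

// Per-event boost factor: exp(-alpha * y_true * y_sel * cost), with y in {-1,+1}.
double AdaCostBoostFactor(double alpha, int trueType, double dtOutput, double cost) {
   return std::exp(-1.0 * alpha * trueType * dtOutput * cost);
}

int main() {
   const double cost = AdaCostEntry(true, false, 1.0, 1.0, 1.0, 1.0); // a misclassified signal event
   std::printf("cost = %.1f, boost factor = %.3f\n", cost, AdaCostBoostFactor(0.4, +1, -0.8, cost));
   return 0;
}
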
2015 
2016 ////////////////////////////////////////////////////////////////////////////////
2017 /// Call it bootstrapping, re-sampling or whatever you like; in the end it is nothing
2018 /// else but applying "random" Poisson weights to each event.
2019 
2020 Double_t TMVA::MethodBDT::Bagging( )
2021 {
2022  // this is now done in "MethodBDT::Boost as it might be used by other boost methods, too
2023  // GetBaggedSample(eventSample);
2024 
2025  return 1.; //here as there are random weights for each event, just return a constant==1;
2026 }
2027 
2028 ////////////////////////////////////////////////////////////////////////////////
2029 /// Fills fSubSample with, on average, fBaggedSampleFraction*NEvents random training events.
2030 
2031 void TMVA::MethodBDT::GetBaggedSubSample(std::vector<const TMVA::Event*>& eventSample)
2032 {
2033 
2034  Double_t n;
2035  TRandom3 *trandom = new TRandom3(100*fForest.size()+1234);
2036 
2037  if (!fSubSample.empty()) fSubSample.clear();
2038 
2039  for (std::vector<const TMVA::Event*>::const_iterator e=eventSample.begin(); e!=eventSample.end();e++) {
2040  n = trandom->PoissonD(fBaggedSampleFraction);
2041  for (Int_t i=0;i<n;i++) fSubSample.push_back(*e);
2042  }
2043 
2044  delete trandom;
2045  return;
2046 
2047  /*
2048  UInt_t nevents = fEventSample.size();
2049 
2050  if (!fSubSample.empty()) fSubSample.clear();
2051  TRandom3 *trandom = new TRandom3(fForest.size()+1);
2052 
2053  for (UInt_t ievt=0; ievt<nevents; ievt++) { // recreate new random subsample
2054  if(trandom->Rndm()<fBaggedSampleFraction)
2055  fSubSample.push_back(fEventSample[ievt]);
2056  }
2057  delete trandom;
2058  */
2059 
2060 }
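
Illustration (not part of MethodBDT.cxx): the bagged sub-sample built above contains, for each training event, a Poisson(fBaggedSampleFraction)-distributed number of copies of that event. A standalone sketch using the C++ standard library (the listed code uses TRandom3::PoissonD instead):

#include <cstdio>
#include <random>
#include <vector>

int main() {
   const double baggedSampleFraction = 0.6;          // mean multiplicity per event
   std::mt19937 rng(1234);
   std::poisson_distribution<int> pois(baggedSampleFraction);

   std::vector<int> sample(10);                      // stand-in for 10 training "events"
   for (unsigned iev = 0; iev < sample.size(); ++iev) sample[iev] = static_cast<int>(iev);

   std::vector<int> subSample;
   for (int ev : sample) {
      const int n = pois(rng);                       // how many copies of this event to keep
      for (int i = 0; i < n; ++i) subSample.push_back(ev);
   }
   std::printf("original: %zu events, bagged sub-sample: %zu entries\n",
               sample.size(), subSample.size());
   return 0;
}
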
2061 
2062 ////////////////////////////////////////////////////////////////////////////////
2063 /// A special boosting only for Regression (not implemented).
2064 
2065 Double_t TMVA::MethodBDT::RegBoost( std::vector<const TMVA::Event*>& /* eventSample */, DecisionTree* /* dt */ )
2066 {
2067  return 1;
2068 }
2069 
2070 ////////////////////////////////////////////////////////////////////////////////
2071 /// Adaptation of AdaBoost to regression problems (see H. Drucker 1997).
2072 
2073 Double_t TMVA::MethodBDT::AdaBoostR2( std::vector<const TMVA::Event*>& eventSample, DecisionTree *dt )
2074 {
2075  if ( !DoRegression() ) Log() << kFATAL << "Somehow you chose a regression boost method for a classification job" << Endl;
2076 
2077  Double_t err=0, sumw=0, sumwfalse=0, sumwfalse2=0;
2078  Double_t maxDev=0;
2079  for (std::vector<const TMVA::Event*>::const_iterator e=eventSample.begin(); e!=eventSample.end();e++) {
2080  Double_t w = (*e)->GetWeight();
2081  sumw += w;
2082 
2083  Double_t tmpDev = TMath::Abs(dt->CheckEvent(*e,kFALSE) - (*e)->GetTarget(0) );
2084  sumwfalse += w * tmpDev;
2085  sumwfalse2 += w * tmpDev*tmpDev;
2086  if (tmpDev > maxDev) maxDev = tmpDev;
2087  }
2088 
2089  //if quadratic loss:
2090  if (fAdaBoostR2Loss=="linear"){
2091  err = sumwfalse/maxDev/sumw ;
2092  }
2093  else if (fAdaBoostR2Loss=="quadratic"){
2094  err = sumwfalse2/maxDev/maxDev/sumw ;
2095  }
2096  else if (fAdaBoostR2Loss=="exponential"){
2097  err = 0;
2098  for (std::vector<const TMVA::Event*>::const_iterator e=eventSample.begin(); e!=eventSample.end();e++) {
2099  Double_t w = (*e)->GetWeight();
2100  Double_t tmpDev = TMath::Abs(dt->CheckEvent(*e,kFALSE) - (*e)->GetTarget(0) );
2101  err += w * (1 - exp (-tmpDev/maxDev)) / sumw;
2102  }
2103 
2104  }
2105  else {
2106  Log() << kFATAL << " you've chosen a loss type for AdaBoost other than linear, quadratic or exponential, "
2107  << " namely " << fAdaBoostR2Loss << ",\n"
2108  << "which is not implemented... perhaps a typo in the options?" <<Endl;
2109  }
2110 
2111 
2112  if (err >= 0.5) { // sanity check ... should never happen as otherwise there is apparently
2113  // something odd with the assignment of the leaf nodes (rem: you use the training
2114  // events for this determination of the error rate)
2115  if (dt->GetNNodes() == 1){
2116  Log() << kERROR << " YOUR tree has only 1 Node... kind of a funny *tree*. I cannot "
2117  << "boost such a thing... if after 1 step the error rate is == 0.5"
2118  << Endl
2119  << "please check why this happens, maybe too many events per node requested ?"
2120  << Endl;
2121 
2122  }else{
2123  Log() << kERROR << " The error rate in the BDT boosting is > 0.5. ("<< err
2124  << ") That should not happen, but is possible for regression trees, and"
2125  << " should trigger a stop for the boosting. please check your code (i.e... the BDT code), I "
2126  << " stop boosting " << Endl;
2127  return -1;
2128  }
2129  err = 0.5;
2130  } else if (err < 0) {
2131  Log() << kERROR << " The error rate in the BDT boosting is < 0. That can happen"
2132  << " due to improper treatment of negative weights in a Monte Carlo.. (if you have"
2133  << " an idea on how to do it in a better way, please let me know (Helge.Voss@cern.ch)"
2134  << " for the time being I set it to its absolute value.. just to continue.." << Endl;
2135  err = TMath::Abs(err);
2136  }
2137 
2138  Double_t boostWeight = err / (1.-err);
2139  Double_t newSumw=0;
2140 
2141  Results* results = Data()->GetResults(GetMethodName(), Types::kTraining, Types::kMaxAnalysisType);
2142 
2143  for (std::vector<const TMVA::Event*>::const_iterator e=eventSample.begin(); e!=eventSample.end();e++) {
2144  Double_t boostfactor = TMath::Power(boostWeight,(1.-TMath::Abs(dt->CheckEvent(*e,kFALSE) - (*e)->GetTarget(0) )/maxDev ) );
2145  results->GetHist("BoostWeights")->Fill(boostfactor);
2146  // std::cout << "R2 " << boostfactor << " " << boostWeight << " " << (1.-TMath::Abs(dt->CheckEvent(*e,kFALSE) - (*e)->GetTarget(0) )/maxDev) << std::endl;
2147  if ( (*e)->GetWeight() > 0 ){
2148  Float_t newBoostWeight = (*e)->GetBoostWeight() * boostfactor;
2149  Float_t newWeight = (*e)->GetWeight() * (*e)->GetBoostWeight() * boostfactor;
2150  if (newWeight == 0) {
2151  Log() << kINFO << "Weight= " << (*e)->GetWeight() << Endl;
2152  Log() << kINFO << "BoostWeight= " << (*e)->GetBoostWeight() << Endl;
2153  Log() << kINFO << "boostweight="<<boostWeight << " err= " <<err << Endl;
2154  Log() << kINFO << "NewBoostWeight= " << newBoostWeight << Endl;
2155  Log() << kINFO << "boostfactor= " << boostfactor << Endl;
2156  Log() << kINFO << "maxDev = " << maxDev << Endl;
2157  Log() << kINFO << "tmpDev = " << TMath::Abs(dt->CheckEvent(*e,kFALSE) - (*e)->GetTarget(0) ) << Endl;
2158  Log() << kINFO << "target = " << (*e)->GetTarget(0) << Endl;
2159  Log() << kINFO << "estimate = " << dt->CheckEvent(*e,kFALSE) << Endl;
2160  }
2161  (*e)->SetBoostWeight( newBoostWeight );
2162  // (*e)->SetBoostWeight( (*e)->GetBoostWeight() * boostfactor);
2163  } else {
2164  (*e)->SetBoostWeight( (*e)->GetBoostWeight() / boostfactor);
2165  }
2166  newSumw+=(*e)->GetWeight();
2167  }
2168 
2169  // re-normalise the weights
2170  Double_t normWeight = sumw / newSumw;
2171  for (std::vector<const TMVA::Event*>::const_iterator e=eventSample.begin(); e!=eventSample.end();e++) {
2172  //Helge (*e)->ScaleBoostWeight( sumw/newSumw);
2173  // (*e)->ScaleBoostWeight( normWeight);
2174  (*e)->SetBoostWeight( (*e)->GetBoostWeight() * normWeight );
2175  }
2176 
2177 
2178  results->GetHist("BoostWeightsVsTree")->SetBinContent(fForest.size(),1./boostWeight);
2179  results->GetHist("ErrorFrac")->SetBinContent(fForest.size(),err);
2180 
2181  fBoostWeight = boostWeight;
2182  fErrorFraction = err;
2183 
2184  return TMath::Log(1./boostWeight);
2185 }
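
Illustration (not part of MethodBDT.cxx): numerically, the update above uses beta = err/(1-err) < 1; each positive-weight event is scaled by beta^(1-|dev|/maxDev), so well-predicted events are down-weighted most, and the tree enters the forest with weight ln(1/beta). A short numeric sketch:

#include <cmath>
#include <cstdio>

int main() {
   const double err    = 0.2;                 // loss-weighted error of the tree, in [0, 0.5)
   const double beta   = err / (1.0 - err);   // = 0.25
   const double maxDev = 2.0;                 // largest absolute deviation in the sample

   const double devs[] = {0.0, 1.0, 2.0};
   for (double dev : devs) {
      const double factor = std::pow(beta, 1.0 - std::fabs(dev) / maxDev);
      std::printf("|dev| = %.1f -> event weight factor = %.3f\n", dev, factor);
   }
   std::printf("tree weight in the forest: ln(1/beta) = %.3f\n", std::log(1.0 / beta));
   return 0;
}
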
2186 
2187 ////////////////////////////////////////////////////////////////////////////////
2188 /// Write weights to XML.
2189 
2190 void TMVA::MethodBDT::AddWeightsXMLTo( void* parent ) const
2191 {
2192  void* wght = gTools().AddChild(parent, "Weights");
2193 
2194  if (fDoPreselection){
2195  for (UInt_t ivar=0; ivar<GetNvar(); ivar++){
2196  gTools().AddAttr( wght, Form("PreselectionLowBkgVar%d",ivar), fIsLowBkgCut[ivar]);
2197  gTools().AddAttr( wght, Form("PreselectionLowBkgVar%dValue",ivar), fLowBkgCut[ivar]);
2198  gTools().AddAttr( wght, Form("PreselectionLowSigVar%d",ivar), fIsLowSigCut[ivar]);
2199  gTools().AddAttr( wght, Form("PreselectionLowSigVar%dValue",ivar), fLowSigCut[ivar]);
2200  gTools().AddAttr( wght, Form("PreselectionHighBkgVar%d",ivar), fIsHighBkgCut[ivar]);
2201  gTools().AddAttr( wght, Form("PreselectionHighBkgVar%dValue",ivar),fHighBkgCut[ivar]);
2202  gTools().AddAttr( wght, Form("PreselectionHighSigVar%d",ivar), fIsHighSigCut[ivar]);
2203  gTools().AddAttr( wght, Form("PreselectionHighSigVar%dValue",ivar),fHighSigCut[ivar]);
2204  }
2205  }
2206 
2207 
2208  gTools().AddAttr( wght, "NTrees", fForest.size() );
2209  gTools().AddAttr( wght, "AnalysisType", fForest.back()->GetAnalysisType() );
2210 
2211  for (UInt_t i=0; i< fForest.size(); i++) {
2212  void* trxml = fForest[i]->AddXMLTo(wght);
2213  gTools().AddAttr( trxml, "boostWeight", fBoostWeights[i] );
2214  gTools().AddAttr( trxml, "itree", i );
2215  }
2216 }
2217 
2218 ////////////////////////////////////////////////////////////////////////////////
2219 /// Reads the BDT from the xml file.
2220 
2221 void TMVA::MethodBDT::ReadWeightsFromXML(void* parent) {
2222  UInt_t i;
2223  for (i=0; i<fForest.size(); i++) delete fForest[i];
2224  fForest.clear();
2225  fBoostWeights.clear();
2226 
2227  UInt_t ntrees;
2228  UInt_t analysisType;
2229  Float_t boostWeight;
2230 
2231 
2232  if (gTools().HasAttr( parent, Form("PreselectionLowBkgVar%d",0))) {
2233  fIsLowBkgCut.resize(GetNvar());
2234  fLowBkgCut.resize(GetNvar());
2235  fIsLowSigCut.resize(GetNvar());
2236  fLowSigCut.resize(GetNvar());
2237  fIsHighBkgCut.resize(GetNvar());
2238  fHighBkgCut.resize(GetNvar());
2239  fIsHighSigCut.resize(GetNvar());
2240  fHighSigCut.resize(GetNvar());
2241 
2242  Bool_t tmpBool;
2243  Double_t tmpDouble;
2244  for (UInt_t ivar=0; ivar<GetNvar(); ivar++){
2245  gTools().ReadAttr( parent, Form("PreselectionLowBkgVar%d",ivar), tmpBool);
2246  fIsLowBkgCut[ivar]=tmpBool;
2247  gTools().ReadAttr( parent, Form("PreselectionLowBkgVar%dValue",ivar), tmpDouble);
2248  fLowBkgCut[ivar]=tmpDouble;
2249  gTools().ReadAttr( parent, Form("PreselectionLowSigVar%d",ivar), tmpBool);
2250  fIsLowSigCut[ivar]=tmpBool;
2251  gTools().ReadAttr( parent, Form("PreselectionLowSigVar%dValue",ivar), tmpDouble);
2252  fLowSigCut[ivar]=tmpDouble;
2253  gTools().ReadAttr( parent, Form("PreselectionHighBkgVar%d",ivar), tmpBool);
2254  fIsHighBkgCut[ivar]=tmpBool;
2255  gTools().ReadAttr( parent, Form("PreselectionHighBkgVar%dValue",ivar), tmpDouble);
2256  fHighBkgCut[ivar]=tmpDouble;
2257  gTools().ReadAttr( parent, Form("PreselectionHighSigVar%d",ivar),tmpBool);
2258  fIsHighSigCut[ivar]=tmpBool;
2259  gTools().ReadAttr( parent, Form("PreselectionHighSigVar%dValue",ivar), tmpDouble);
2260  fHighSigCut[ivar]=tmpDouble;
2261  }
2262  }
2263 
2264 
2265  gTools().ReadAttr( parent, "NTrees", ntrees );
2266 
2267  if(gTools().HasAttr(parent, "TreeType")) { // pre 4.1.0 version
2268  gTools().ReadAttr( parent, "TreeType", analysisType );
2269  } else { // from 4.1.0 onwards
2270  gTools().ReadAttr( parent, "AnalysisType", analysisType );
2271  }
2272 
2273  void* ch = gTools().GetChild(parent);
2274  i=0;
2275  while(ch) {
2276  fForest.push_back( dynamic_cast<DecisionTree*>( DecisionTree::CreateFromXML(ch, GetTrainingTMVAVersionCode()) ) );
2277  fForest.back()->SetAnalysisType(Types::EAnalysisType(analysisType));
2278  fForest.back()->SetTreeID(i++);
2279  gTools().ReadAttr(ch,"boostWeight",boostWeight);
2280  fBoostWeights.push_back(boostWeight);
2281  ch = gTools().GetNextChild(ch);
2282  }
2283 }
2284 
2285 ////////////////////////////////////////////////////////////////////////////////
2286 /// Read the weights (BDT coefficients).
2287 
2288 void TMVA::MethodBDT::ReadWeightsFromStream( std::istream& istr )
2289 {
2290  TString dummy;
2291  // Types::EAnalysisType analysisType;
2292  Int_t analysisType(0);
2293 
2294  // coverity[tainted_data_argument]
2295  istr >> dummy >> fNTrees;
2296  Log() << kINFO << "Read " << fNTrees << " Decision trees" << Endl;
2297 
2298  for (UInt_t i=0;i<fForest.size();i++) delete fForest[i];
2299  fForest.clear();
2300  fBoostWeights.clear();
2301  Int_t iTree;
2302  Double_t boostWeight;
2303  for (int i=0;i<fNTrees;i++) {
2304  istr >> dummy >> iTree >> dummy >> boostWeight;
2305  if (iTree != i) {
2306  fForest.back()->Print( std::cout );
2307  Log() << kFATAL << "Error while reading weight file; mismatch iTree="
2308  << iTree << " i=" << i
2309  << " dummy " << dummy
2310  << " boostweight " << boostWeight
2311  << Endl;
2312  }
2313  fForest.push_back( new DecisionTree() );
2314  fForest.back()->SetAnalysisType(Types::EAnalysisType(analysisType));
2315  fForest.back()->SetTreeID(i);
2316  fForest.back()->Read(istr, GetTrainingTMVAVersionCode());
2317  fBoostWeights.push_back(boostWeight);
2318  }
2319 }
2320 
2321 ////////////////////////////////////////////////////////////////////////////////
2322 
2323 Double_t TMVA::MethodBDT::GetMvaValue( Double_t* err, Double_t* errUpper ){
2324  return this->GetMvaValue( err, errUpper, 0 );
2325 }
2326 
2327 ////////////////////////////////////////////////////////////////////////////////
2328 /// Return the MVA value (range [-1;1]) that classifies the
2329 /// event according to the majority vote from the total number of
2330 /// decision trees.
2331 
2332 Double_t TMVA::MethodBDT::GetMvaValue( Double_t* err, Double_t* errUpper, UInt_t useNTrees )
2333 {
2334  const Event* ev = GetEvent();
2335  if (fDoPreselection) {
2336  Double_t val = ApplyPreselectionCuts(ev);
2337  if (TMath::Abs(val)>0.05) return val;
2338  }
2339  return PrivateGetMvaValue(ev, err, errUpper, useNTrees);
2340 
2341 }
2342 
2343 ////////////////////////////////////////////////////////////////////////////////
2344 /// Return the MVA value (range [-1;1]) that classifies the
2345 /// event according to the majority vote from the total number of
2346 /// decision trees.
2347 
2348 Double_t TMVA::MethodBDT::PrivateGetMvaValue(const TMVA::Event* ev, Double_t* err, Double_t* errUpper, UInt_t useNTrees )
2349 {
2350  // cannot determine error
2351  NoErrorCalc(err, errUpper);
2352 
2353  // allow for the possibility to use less trees in the actual MVA calculation
2354  // than have been originally trained.
2355  UInt_t nTrees = fForest.size();
2356 
2357  if (useNTrees > 0 ) nTrees = useNTrees;
2358 
2359  if (fBoostType=="Grad") return GetGradBoostMVA(ev,nTrees);
2360 
2361  Double_t myMVA = 0;
2362  Double_t norm = 0;
2363  for (UInt_t itree=0; itree<nTrees; itree++) {
2364  //
2365  myMVA += fBoostWeights[itree] * fForest[itree]->CheckEvent(ev,fUseYesNoLeaf);
2366  norm += fBoostWeights[itree];
2367  }
2368  return ( norm > std::numeric_limits<double>::epsilon() ) ? myMVA /= norm : 0 ;
2369 }
2370 
2371 
2372 ////////////////////////////////////////////////////////////////////////////////
2373 /// Get the multiclass MVA response for the BDT classifier.
2374 
2375 const std::vector<Float_t>& TMVA::MethodBDT::GetMulticlassValues()
2376 {
2377  const TMVA::Event *e = GetEvent();
2378  if (fMulticlassReturnVal == NULL) fMulticlassReturnVal = new std::vector<Float_t>();
2379  fMulticlassReturnVal->clear();
2380 
2381  UInt_t nClasses = DataInfo().GetNClasses();
2382  std::vector<Double_t> temp(nClasses);
2383  auto forestSize = fForest.size();
2384  // trees 0, nClasses, 2*nClasses, ... belong to class 0
2385  // trees 1, nClasses+1, 2*nClasses+1, ... belong to class 1 and so forth
2386  UInt_t classOfTree = 0;
2387  for (UInt_t itree = 0; itree < forestSize; ++itree) {
2388  temp[classOfTree] += fForest[itree]->CheckEvent(e, kFALSE);
2389  if (++classOfTree == nClasses) classOfTree = 0; // cheap modulo
2390  }
2391 
2392  // we want to calculate sum of exp(temp[j] - temp[i]) for all i,j (i!=j)
2393  // first calculate exp(), then replace minus with division.
2394  std::transform(temp.begin(), temp.end(), temp.begin(), [](Double_t d){return exp(d);});
2395 
2396  for(UInt_t iClass=0; iClass<nClasses; iClass++){
2397  Double_t norm = 0.0;
2398  for(UInt_t j=0;j<nClasses;j++){
2399  if(iClass!=j)
2400  norm += temp[j] / temp[iClass];
2401  }
2402  (*fMulticlassReturnVal).push_back(1.0/(1.0+norm));
2403  }
2404 
2405  return *fMulticlassReturnVal;
2406 }
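
Illustration (not part of MethodBDT.cxx): the per-class value 1/(1 + sum_{j!=i} exp(F_j - F_i)) computed above is algebraically the softmax exp(F_i)/sum_j exp(F_j) of the accumulated class scores. A standalone check of the identity:

#include <cmath>
#include <cstdio>
#include <vector>

int main() {
   std::vector<double> F = {0.8, -0.2, 0.1};               // accumulated score per class

   for (unsigned i = 0; i < F.size(); ++i) {
      double norm = 0.0;                                   // sum_{j!=i} exp(F_j - F_i)
      for (unsigned j = 0; j < F.size(); ++j)
         if (j != i) norm += std::exp(F[j] - F[i]);
      const double p1 = 1.0 / (1.0 + norm);

      double sumExp = 0.0;                                 // plain softmax for comparison
      for (double f : F) sumExp += std::exp(f);
      const double p2 = std::exp(F[i]) / sumExp;

      std::printf("class %u: %.6f vs %.6f\n", i, p1, p2);
   }
   return 0;
}
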
2407 
2408 ////////////////////////////////////////////////////////////////////////////////
2409 /// Get the regression value generated by the BDTs.
2410 
2411 const std::vector<Float_t> & TMVA::MethodBDT::GetRegressionValues()
2412 {
2413 
2414  if (fRegressionReturnVal == NULL) fRegressionReturnVal = new std::vector<Float_t>();
2415  fRegressionReturnVal->clear();
2416 
2417  const Event * ev = GetEvent();
2418  Event * evT = new Event(*ev);
2419 
2420  Double_t myMVA = 0;
2421  Double_t norm = 0;
2422  if (fBoostType=="AdaBoostR2") {
2423  // rather than using the weighted average of the tree responses in the forest,
2424  // H.Drucker (1997) proposed to use the "weighted median":
2425 
2426  // sort all individual tree responses according to the prediction value
2427  // (keeping the association to their tree weight),
2428  // then sum up all the associated weights (starting from the one whose tree
2429  // yielded the smallest response) up to the tree "t" at which you have
2430  // added enough tree weights to exceed half of the sum of all tree weights.
2431  // choose as response of the forest the one which belongs to this "t"
2432 
2433  vector< Double_t > response(fForest.size());
2434  vector< Double_t > weight(fForest.size());
2435  Double_t totalSumOfWeights = 0;
2436 
2437  for (UInt_t itree=0; itree<fForest.size(); itree++) {
2438  response[itree] = fForest[itree]->CheckEvent(ev,kFALSE);
2439  weight[itree] = fBoostWeights[itree];
2440  totalSumOfWeights += fBoostWeights[itree];
2441  }
2442 
2443  std::vector< std::vector<Double_t> > vtemp;
2444  vtemp.push_back( response ); // this is the vector that will get sorted
2445  vtemp.push_back( weight );
2446  gTools().UsefulSortAscending( vtemp );
2447 
2448  Int_t t=0;
2449  Double_t sumOfWeights = 0;
2450  while (sumOfWeights <= totalSumOfWeights/2.) {
2451  sumOfWeights += vtemp[1][t];
2452  t++;
2453  }
2454 
2455  Double_t rVal=0;
2456  Int_t count=0;
2457  for (UInt_t i= TMath::Max(UInt_t(0),UInt_t(t-(fForest.size()/6)-0.5));
2458  i< TMath::Min(UInt_t(fForest.size()),UInt_t(t+(fForest.size()/6)+0.5)); i++) {
2459  count++;
2460  rVal+=vtemp[0][i];
2461  }
2462  // fRegressionReturnVal->push_back( rVal/Double_t(count));
2463  evT->SetTarget(0, rVal/Double_t(count) );
2464  }
2465  else if(fBoostType=="Grad"){
2466  for (UInt_t itree=0; itree<fForest.size(); itree++) {
2467  myMVA += fForest[itree]->CheckEvent(ev,kFALSE);
2468  }
2469  // fRegressionReturnVal->push_back( myMVA+fBoostWeights[0]);
2470  evT->SetTarget(0, myMVA+fBoostWeights[0] );
2471  }
2472  else{
2473  for (UInt_t itree=0; itree<fForest.size(); itree++) {
2474  //
2475  myMVA += fBoostWeights[itree] * fForest[itree]->CheckEvent(ev,kFALSE);
2476  norm += fBoostWeights[itree];
2477  }
2478  // fRegressionReturnVal->push_back( ( norm > std::numeric_limits<double>::epsilon() ) ? myMVA /= norm : 0 );
2479  evT->SetTarget(0, ( norm > std::numeric_limits<double>::epsilon() ) ? myMVA /= norm : 0 );
2480  }
2481 
2482 
2483 
2484  const Event* evT2 = GetTransformationHandler().InverseTransform( evT );
2485  fRegressionReturnVal->push_back( evT2->GetTarget(0) );
2486 
2487  delete evT;
2488 
2489 
2490  return *fRegressionReturnVal;
2491 }
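
Illustration (not part of MethodBDT.cxx): for the AdaBoostR2 branch above, the forest response is essentially the weighted median of the tree responses (the listed code additionally averages a small window around that point). A standalone sketch of the plain weighted median:

#include <algorithm>
#include <cstdio>
#include <utility>
#include <vector>

// Weighted median of (response, weight) pairs, in the spirit of the AdaBoostR2 branch above.
double WeightedMedian(std::vector<std::pair<double,double>> rw) {
   std::sort(rw.begin(), rw.end());                   // sort by response value
   double total = 0.0;
   for (const auto& p : rw) total += p.second;
   double running = 0.0;
   for (const auto& p : rw) {
      running += p.second;
      if (running > 0.5 * total) return p.first;      // first response past half the total weight
   }
   return rw.empty() ? 0.0 : rw.back().first;
}

int main() {
   std::vector<std::pair<double,double>> rw = {{1.2, 0.5}, {0.9, 2.0}, {1.8, 1.0}, {1.1, 0.7}};
   std::printf("weighted median response = %.2f\n", WeightedMedian(rw));
   return 0;
}
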
2492 
2493 ////////////////////////////////////////////////////////////////////////////////
2494 /// Here we could write some histograms created during the processing
2495 /// to the output file.
2496 
2497 void TMVA::MethodBDT::WriteMonitoringHistosToFile( void ) const
2498 {
2499  Log() << kDEBUG << "\tWrite monitoring histograms to file: " << BaseDir()->GetPath() << Endl;
2500 
2501  //Results* results = Data()->GetResults(GetMethodName(), Types::kTraining, Types::kMaxAnalysisType);
2502  //results->GetStorage()->Write();
2503  fMonitorNtuple->Write();
2504 }
2505 
2506 ////////////////////////////////////////////////////////////////////////////////
2507 /// Return the relative variable importance, normalized to all
2508 /// variables together having the importance 1. The importance is
2509 /// evaluated as the total separation-gain that this variable had in
2510 /// the decision trees (weighted by the number of events)
2511 
2512 std::vector<Double_t> TMVA::MethodBDT::GetVariableImportance()
2513 {
2514  fVariableImportance.resize(GetNvar());
2515  for (UInt_t ivar = 0; ivar < GetNvar(); ivar++) {
2516  fVariableImportance[ivar]=0;
2517  }
2518  Double_t sum=0;
2519  for (UInt_t itree = 0; itree < GetNTrees(); itree++) {
2520  std::vector<Double_t> relativeImportance(fForest[itree]->GetVariableImportance());
2521  for (UInt_t i=0; i< relativeImportance.size(); i++) {
2522  fVariableImportance[i] += fBoostWeights[itree] * relativeImportance[i];
2523  }
2524  }
2525 
2526  for (UInt_t ivar=0; ivar< fVariableImportance.size(); ivar++){
2527  fVariableImportance[ivar] = TMath::Sqrt(fVariableImportance[ivar]);
2528  sum += fVariableImportance[ivar];
2529  }
2530  for (UInt_t ivar=0; ivar< fVariableImportance.size(); ivar++) fVariableImportance[ivar] /= sum;
2531 
2532  return fVariableImportance;
2533 }
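
Illustration (not part of MethodBDT.cxx): the returned importances are the boost-weighted separation gains accumulated per variable, normalised so that they sum to one. A standalone sketch of the final normalisation step:

#include <cstdio>
#include <vector>

int main() {
   // boost-weighted separation gains accumulated per input variable (toy numbers)
   std::vector<double> importance = {4.0, 1.0, 0.5};

   double sum = 0.0;
   for (double v : importance) sum += v;
   for (double& v : importance) v /= sum;            // normalise to unit total importance

   for (unsigned i = 0; i < importance.size(); ++i)
      std::printf("var %u: relative importance = %.3f\n", i, importance[i]);
   return 0;
}
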
2534 
2535 ////////////////////////////////////////////////////////////////////////////////
2536 /// Returns the measure for the variable importance of variable "ivar"
2537 /// which is later used in GetVariableImportance() to calculate the
2538 /// relative variable importances.
2539 
2540 Double_t TMVA::MethodBDT::GetVariableImportance( UInt_t ivar )
2541 {
2542  std::vector<Double_t> relativeImportance = this->GetVariableImportance();
2543  if (ivar < (UInt_t)relativeImportance.size()) return relativeImportance[ivar];
2544  else Log() << kFATAL << "<GetVariableImportance> ivar = " << ivar << " is out of range " << Endl;
2545 
2546  return -1;
2547 }
2548 
2549 ////////////////////////////////////////////////////////////////////////////////
2550 /// Compute ranking of input variables
2551 
2552 const TMVA::Ranking* TMVA::MethodBDT::CreateRanking()
2553 {
2554  // create the ranking object
2555  fRanking = new Ranking( GetName(), "Variable Importance" );
2556  vector< Double_t> importance(this->GetVariableImportance());
2557 
2558  for (UInt_t ivar=0; ivar<GetNvar(); ivar++) {
2559 
2560  fRanking->AddRank( Rank( GetInputLabel(ivar), importance[ivar] ) );
2561  }
2562 
2563  return fRanking;
2564 }
2565 
2566 ////////////////////////////////////////////////////////////////////////////////
2567 /// Get help message text.
2568 
2569 void TMVA::MethodBDT::GetHelpMessage() const
2570 {
2571  Log() << Endl;
2572  Log() << gTools().Color("bold") << "--- Short description:" << gTools().Color("reset") << Endl;
2573  Log() << Endl;
2574  Log() << "Boosted Decision Trees are a collection of individual decision" << Endl;
2575  Log() << "trees which form a multivariate classifier by (weighted) majority " << Endl;
2576  Log() << "vote of the individual trees. Consecutive decision trees are " << Endl;
2577  Log() << "trained using the original training data set with re-weighted " << Endl;
2578  Log() << "events. By default, the AdaBoost method is employed, which gives " << Endl;
2579  Log() << "events that were misclassified in the previous tree a larger " << Endl;
2580  Log() << "weight in the training of the following tree." << Endl;
2581  Log() << Endl;
2582  Log() << "Decision trees are a sequence of binary splits of the data sample" << Endl;
2583  Log() << "using a single discriminant variable at a time. A test event " << Endl;
2584  Log() << "ending up after the sequence of left-right splits in a final " << Endl;
2585  Log() << "(\"leaf\") node is classified as either signal or background" << Endl;
2586  Log() << "depending on the majority type of training events in that node." << Endl;
2587  Log() << Endl;
2588  Log() << gTools().Color("bold") << "--- Performance optimisation:" << gTools().Color("reset") << Endl;
2589  Log() << Endl;
2590  Log() << "By the nature of the binary splits performed on the individual" << Endl;
2591  Log() << "variables, decision trees do not deal well with linear correlations" << Endl;
2592  Log() << "between variables (they need to approximate the linear split in" << Endl;
2593  Log() << "the two dimensional space by a sequence of splits on the two " << Endl;
2594  Log() << "variables individually). Hence decorrelation could be useful " << Endl;
2595  Log() << "to optimise the BDT performance." << Endl;
2596  Log() << Endl;
2597  Log() << gTools().Color("bold") << "--- Performance tuning via configuration options:" << gTools().Color("reset") << Endl;
2598  Log() << Endl;
2599  Log() << "The two most important parameters in the configuration are the " << Endl;
2600  Log() << "minimal number of events requested by a leaf node as percentage of the " <<Endl;
2601  Log() << " number of training events (option \"MinNodeSize\" replacing the actual number " << Endl;
2602  Log() << " of events \"nEventsMin\" as given in earlier versions" << Endl;
2603  Log() << "If this number is too large, detailed features " << Endl;
2604  Log() << "in the parameter space are hard to be modelled. If it is too small, " << Endl;
2605  Log() << "the risk to overtrain rises and boosting seems to be less effective" << Endl;
2606  Log() << " typical values from our current experience for best performance " << Endl;
2607  Log() << " are between 0.5(%) and 10(%) " << Endl;
2608  Log() << Endl;
2609  Log() << "The default minimal number is currently set to " << Endl;
2610  Log() << " max(20, (N_training_events / N_variables^2 / 10)) " << Endl;
2611  Log() << "and can be changed by the user." << Endl;
2612  Log() << Endl;
2613  Log() << "The other crucial parameter, the pruning strength (\"PruneStrength\")," << Endl;
2614  Log() << "is also related to overtraining. It is a regularisation parameter " << Endl;
2615  Log() << "that is used when determining after the training which splits " << Endl;
2616  Log() << "are considered statistically insignificant and are removed. The" << Endl;
2617  Log() << "user is advised to carefully watch the BDT screen output for" << Endl;
2618  Log() << "the comparison between efficiencies obtained on the training and" << Endl;
2619  Log() << "the independent test sample. They should be equal within statistical" << Endl;
2620  Log() << "errors, in order to minimize statistical fluctuations in different samples." << Endl;
2621 }
2622 
2623 ////////////////////////////////////////////////////////////////////////////////
2624 /// Make ROOT-independent C++ class for classifier response (classifier-specific implementation).
2625 
2626 void TMVA::MethodBDT::MakeClassSpecific( std::ostream& fout, const TString& className ) const
2627 {
2628  TString nodeName = className;
2629  nodeName.ReplaceAll("Read","");
2630  nodeName.Append("Node");
2631  // write BDT-specific classifier response
2632  fout << " std::vector<"<<nodeName<<"*> fForest; // i.e. root nodes of decision trees" << std::endl;
2633  fout << " std::vector<double> fBoostWeights; // the weights applied in the individual boosts" << std::endl;
2634  fout << "};" << std::endl << std::endl;
2635  fout << "double " << className << "::GetMvaValue__( const std::vector<double>& inputValues ) const" << std::endl;
2636  fout << "{" << std::endl;
2637  fout << " double myMVA = 0;" << std::endl;
2638  if (fDoPreselection){
2639  for (UInt_t ivar = 0; ivar< fIsLowBkgCut.size(); ivar++){
2640  if (fIsLowBkgCut[ivar]){
2641  fout << " if (inputValues["<<ivar<<"] < " << fLowBkgCut[ivar] << ") return -1; // is background preselection cut" << std::endl;
2642  }
2643  if (fIsLowSigCut[ivar]){
2644  fout << " if (inputValues["<<ivar<<"] < "<< fLowSigCut[ivar] << ") return 1; // is signal preselection cut" << std::endl;
2645  }
2646  if (fIsHighBkgCut[ivar]){
2647  fout << " if (inputValues["<<ivar<<"] > "<<fHighBkgCut[ivar] <<") return -1; // is background preselection cut" << std::endl;
2648  }
2649  if (fIsHighSigCut[ivar]){
2650  fout << " if (inputValues["<<ivar<<"] > "<<fHighSigCut[ivar]<<") return 1; // is signal preselection cut" << std::endl;
2651  }
2652  }
2653  }
2654 
2655  if (fBoostType!="Grad"){
2656  fout << " double norm = 0;" << std::endl;
2657  }
2658  fout << " for (unsigned int itree=0; itree<fForest.size(); itree++){" << std::endl;
2659  fout << " "<<nodeName<<" *current = fForest[itree];" << std::endl;
2660  fout << " while (current->GetNodeType() == 0) { //intermediate node" << std::endl;
2661  fout << " if (current->GoesRight(inputValues)) current=("<<nodeName<<"*)current->GetRight();" << std::endl;
2662  fout << " else current=("<<nodeName<<"*)current->GetLeft();" << std::endl;
2663  fout << " }" << std::endl;
2664  if (fBoostType=="Grad"){
2665  fout << " myMVA += current->GetResponse();" << std::endl;
2666  }else{
2667  if (fUseYesNoLeaf) fout << " myMVA += fBoostWeights[itree] * current->GetNodeType();" << std::endl;
2668  else fout << " myMVA += fBoostWeights[itree] * current->GetPurity();" << std::endl;
2669  fout << " norm += fBoostWeights[itree];" << std::endl;
2670  }
2671  fout << " }" << std::endl;
2672  if (fBoostType=="Grad"){
2673  fout << " return 2.0/(1.0+exp(-2.0*myMVA))-1.0;" << std::endl;
2674  }
2675  else fout << " return myMVA /= norm;" << std::endl;
2676  fout << "};" << std::endl << std::endl;
2677  fout << "void " << className << "::Initialize()" << std::endl;
2678  fout << "{" << std::endl;
2679  //Now for each decision tree, write directly the constructors of the nodes in the tree structure
2680  for (UInt_t itree=0; itree<GetNTrees(); itree++) {
2681  fout << " // itree = " << itree << std::endl;
2682  fout << " fBoostWeights.push_back(" << fBoostWeights[itree] << ");" << std::endl;
2683  fout << " fForest.push_back( " << std::endl;
2684  this->MakeClassInstantiateNode((DecisionTreeNode*)fForest[itree]->GetRoot(), fout, className);
2685  fout <<" );" << std::endl;
2686  }
2687  fout << " return;" << std::endl;
2688  fout << "};" << std::endl;
2689  fout << " " << std::endl;
2690  fout << "// Clean up" << std::endl;
2691  fout << "inline void " << className << "::Clear() " << std::endl;
2692  fout << "{" << std::endl;
2693  fout << " for (unsigned int itree=0; itree<fForest.size(); itree++) { " << std::endl;
2694  fout << " delete fForest[itree]; " << std::endl;
2695  fout << " }" << std::endl;
2696  fout << "}" << std::endl;
2697 }
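Note on the generated response: for gradient boosting the accumulated forest output F is mapped into [-1,1] via 2/(1+exp(-2F)) - 1, i.e. tanh(F), while for the other boost types the boost-weighted vote is divided by the sum of boost weights. The class written here is self-contained and ROOT-independent; a minimal usage sketch is given below. The file and class names are assumptions (they depend on the job and method name used when exporting with MakeClass()):

#include <string>
#include <vector>
#include "TMVAClassification_BDT.class.C"   // standalone file written by MakeClass(); name is an assumption

int main()
{
   // variable names and ordering must match those used in the training (assumed here)
   std::vector<std::string> inputVars = { "var0", "var1", "var2" };
   ReadBDT bdt( inputVars );                        // generated class name: "Read" + method name
   std::vector<double> event = { 0.3, -1.2, 4.5 };  // one value per input variable
   double mva = bdt.GetMvaValue( event );           // runs the preselection cuts and the forest loop shown above
   return (mva > 0.0) ? 0 : 1;
}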
2698 
2699 ////////////////////////////////////////////////////////////////////////////////
2700 /// Write the specific class header: the standalone decision-tree node class used by the generated classifier response.
2701 
2702 void TMVA::MethodBDT::MakeClassSpecificHeader( std::ostream& fout, const TString& className) const
2703 {
2704  TString nodeName = className;
2705  nodeName.ReplaceAll("Read","");
2706  nodeName.Append("Node");
2707  //fout << "#ifndef NN" << std::endl; commented out on purpose see next line
2708  fout << "#define NN new "<<nodeName << std::endl; // NN definition depends on individual methods. Important to have NO #ifndef if several BDT methods compile together
2709  //fout << "#endif" << std::endl; commented out on purpose see previous line
2710  fout << " " << std::endl;
2711  fout << "#ifndef "<<nodeName<<"__def" << std::endl;
2712  fout << "#define "<<nodeName<<"__def" << std::endl;
2713  fout << " " << std::endl;
2714  fout << "class "<<nodeName<<" {" << std::endl;
2715  fout << " " << std::endl;
2716  fout << "public:" << std::endl;
2717  fout << " " << std::endl;
2718  fout << " // constructor of an essentially \"empty\" node floating in space" << std::endl;
2719  fout << " "<<nodeName<<" ( "<<nodeName<<"* left,"<<nodeName<<"* right," << std::endl;
2720  if (fUseFisherCuts){
2721  fout << " int nFisherCoeff," << std::endl;
2722  for (UInt_t i=0;i<GetNVariables()+1;i++){
2723  fout << " double fisherCoeff"<<i<<"," << std::endl;
2724  }
2725  }
2726  fout << " int selector, double cutValue, bool cutType, " << std::endl;
2727  fout << " int nodeType, double purity, double response ) :" << std::endl;
2728  fout << " fLeft ( left )," << std::endl;
2729  fout << " fRight ( right )," << std::endl;
2730  if (fUseFisherCuts) fout << " fNFisherCoeff ( nFisherCoeff )," << std::endl;
2731  fout << " fSelector ( selector )," << std::endl;
2732  fout << " fCutValue ( cutValue )," << std::endl;
2733  fout << " fCutType ( cutType )," << std::endl;
2734  fout << " fNodeType ( nodeType )," << std::endl;
2735  fout << " fPurity ( purity )," << std::endl;
2736  fout << " fResponse ( response ){" << std::endl;
2737  if (fUseFisherCuts){
2738  for (UInt_t i=0;i<GetNVariables()+1;i++){
2739  fout << " fFisherCoeff.push_back(fisherCoeff"<<i<<");" << std::endl;
2740  }
2741  }
2742  fout << " }" << std::endl << std::endl;
2743  fout << " virtual ~"<<nodeName<<"();" << std::endl << std::endl;
2744  fout << " // test event if it descends the tree at this node to the right" << std::endl;
2745  fout << " virtual bool GoesRight( const std::vector<double>& inputValues ) const;" << std::endl;
2746  fout << " "<<nodeName<<"* GetRight( void ) {return fRight; };" << std::endl << std::endl;
2747  fout << " // test event if it descends the tree at this node to the left " << std::endl;
2748  fout << " virtual bool GoesLeft ( const std::vector<double>& inputValues ) const;" << std::endl;
2749  fout << " "<<nodeName<<"* GetLeft( void ) { return fLeft; }; " << std::endl << std::endl;
2750  fout << " // return S/(S+B) (purity) at this node (from training)" << std::endl << std::endl;
2751  fout << " double GetPurity( void ) const { return fPurity; } " << std::endl;
2752  fout << " // return the node type" << std::endl;
2753  fout << " int GetNodeType( void ) const { return fNodeType; }" << std::endl;
2754  fout << " double GetResponse(void) const {return fResponse;}" << std::endl << std::endl;
2755  fout << "private:" << std::endl << std::endl;
2756  fout << " "<<nodeName<<"* fLeft; // pointer to the left daughter node" << std::endl;
2757  fout << " "<<nodeName<<"* fRight; // pointer to the right daughter node" << std::endl;
2758  if (fUseFisherCuts){
2759  fout << " int fNFisherCoeff; // =0 if this node doesn't use fisher, else =nvar+1 " << std::endl;
2760  fout << " std::vector<double> fFisherCoeff; // the fisher coeff (offset at the last element)" << std::endl;
2761  }
2762  fout << " int fSelector; // index of variable used in node selection (decision tree) " << std::endl;
2763  fout << " double fCutValue; // cut value applied on this node to discriminate bkg against sig" << std::endl;
2764  fout << " bool fCutType; // true: if event variable > cutValue ==> signal , false otherwise" << std::endl;
2765  fout << " int fNodeType; // Type of node: -1 == Bkg-leaf, 1 == Signal-leaf, 0 = internal " << std::endl;
2766  fout << " double fPurity; // Purity of node from training"<< std::endl;
2767  fout << " double fResponse; // Regression response value of node" << std::endl;
2768  fout << "}; " << std::endl;
2769  fout << " " << std::endl;
2770  fout << "//_______________________________________________________________________" << std::endl;
2771  fout << " "<<nodeName<<"::~"<<nodeName<<"()" << std::endl;
2772  fout << "{" << std::endl;
2773  fout << " if (fLeft != NULL) delete fLeft;" << std::endl;
2774  fout << " if (fRight != NULL) delete fRight;" << std::endl;
2775  fout << "}; " << std::endl;
2776  fout << " " << std::endl;
2777  fout << "//_______________________________________________________________________" << std::endl;
2778  fout << "bool "<<nodeName<<"::GoesRight( const std::vector<double>& inputValues ) const" << std::endl;
2779  fout << "{" << std::endl;
2780  fout << " // test event if it descends the tree at this node to the right" << std::endl;
2781  fout << " bool result;" << std::endl;
2782  if (fUseFisherCuts){
2783  fout << " if (fNFisherCoeff == 0){" << std::endl;
2784  fout << " result = (inputValues[fSelector] > fCutValue );" << std::endl;
2785  fout << " }else{" << std::endl;
2786  fout << " double fisher = fFisherCoeff.at(fFisherCoeff.size()-1);" << std::endl;
2787  fout << " for (unsigned int ivar=0; ivar<fFisherCoeff.size()-1; ivar++)" << std::endl;
2788  fout << " fisher += fFisherCoeff.at(ivar)*inputValues.at(ivar);" << std::endl;
2789  fout << " result = fisher > fCutValue;" << std::endl;
2790  fout << " }" << std::endl;
2791  }else{
2792  fout << " result = (inputValues[fSelector] > fCutValue );" << std::endl;
2793  }
2794  fout << " if (fCutType == true) return result; //the cuts are selecting Signal ;" << std::endl;
2795  fout << " else return !result;" << std::endl;
2796  fout << "}" << std::endl;
2797  fout << " " << std::endl;
2798  fout << "//_______________________________________________________________________" << std::endl;
2799  fout << "bool "<<nodeName<<"::GoesLeft( const std::vector<double>& inputValues ) const" << std::endl;
2800  fout << "{" << std::endl;
2801  fout << " // test event if it descends the tree at this node to the left" << std::endl;
2802  fout << " if (!this->GoesRight(inputValues)) return true;" << std::endl;
2803  fout << " else return false;" << std::endl;
2804  fout << "}" << std::endl;
2805  fout << " " << std::endl;
2806  fout << "#endif" << std::endl;
2807  fout << " " << std::endl;
2808 }
2809 
2810 ////////////////////////////////////////////////////////////////////////////////
2811 /// Recursively descends a tree and writes the node instance to the output stream.
2812 
2813 void TMVA::MethodBDT::MakeClassInstantiateNode( DecisionTreeNode *n, std::ostream& fout, const TString& className ) const
2814 {
2815  if (n == NULL) {
2816  Log() << kFATAL << "MakeClassInstantiateNode: started with undefined node" <<Endl;
2817  return ;
2818  }
2819  fout << "NN("<<std::endl;
2820  if (n->GetLeft() != NULL){
2821  this->MakeClassInstantiateNode( (DecisionTreeNode*)n->GetLeft() , fout, className);
2822  }
2823  else {
2824  fout << "0";
2825  }
2826  fout << ", " <<std::endl;
2827  if (n->GetRight() != NULL){
2828  this->MakeClassInstantiateNode( (DecisionTreeNode*)n->GetRight(), fout, className );
2829  }
2830  else {
2831  fout << "0";
2832  }
2833  fout << ", " << std::endl
2834  << std::setprecision(6);
2835  if (fUseFisherCuts){
2836  fout << n->GetNFisherCoeff() << ", ";
2837  for (UInt_t i=0; i< GetNVariables()+1; i++) {
2838  if (n->GetNFisherCoeff() == 0 ){
2839  fout << "0, ";
2840  }else{
2841  fout << n->GetFisherCoeff(i) << ", ";
2842  }
2843  }
2844  }
2845  fout << n->GetSelector() << ", "
2846  << n->GetCutValue() << ", "
2847  << n->GetCutType() << ", "
2848  << n->GetNodeType() << ", "
2849  << n->GetPurity() << ","
2850  << n->GetResponse() << ") ";
2851 }
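The nested constructor calls written by this routine use the NN macro defined in MakeClassSpecificHeader. For a tree of depth one (Fisher cuts disabled) the code emitted into the generated Initialize() looks roughly like the sketch below; all numerical values are invented. With UseFisherCuts, the number of Fisher coefficients and the coefficients themselves would appear between the right daughter and the selector index:

// sketch of the generated output (values invented)
fBoostWeights.push_back(0.438);
fForest.push_back(
   NN(
      NN( 0, 0, 1, 0.52, true, -1, 0.12, 0 ),   // left daughter: background leaf  (nodeType = -1)
      NN( 0, 0, 1, 0.52, true,  1, 0.91, 0 ),   // right daughter: signal leaf     (nodeType = +1)
      1, 0.52, true, 0, 0.63, 0 )               // this node: cut inputValues[1] > 0.52, intermediate (nodeType = 0)
   );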
2852 
2853 ////////////////////////////////////////////////////////////////////////////////
2854 /// Find useful preselection cuts that will be applied before the
2855 /// decision tree training (and of course also applied in GetMVA:
2856 /// -1 for background, +1 for signal).
2857 
2858 void TMVA::MethodBDT::DeterminePreselectionCuts(const std::vector<const TMVA::Event*>& eventSample)
2859 {
2860  Double_t nTotS = 0.0, nTotB = 0.0;
2861  Int_t nTotS_unWeighted = 0, nTotB_unWeighted = 0;
2862 
2863  std::vector<TMVA::BDTEventWrapper> bdtEventSample;
2864 
2865  fIsLowSigCut.assign(GetNvar(),kFALSE);
2866  fIsLowBkgCut.assign(GetNvar(),kFALSE);
2867  fIsHighSigCut.assign(GetNvar(),kFALSE);
2868  fIsHighBkgCut.assign(GetNvar(),kFALSE);
2869 
2870  fLowSigCut.assign(GetNvar(),0.); // ---------------| --> in var is signal (accept all above lower cut)
2871  fLowBkgCut.assign(GetNvar(),0.); // ---------------| --> in var is bkg (accept all above lower cut)
2872  fHighSigCut.assign(GetNvar(),0.); // <-- | -------------- in var is signal (accept all below cut)
2873  fHighBkgCut.assign(GetNvar(),0.); // <-- | -------------- in var is bkg (accept all below cut)
2874 
2875 
2876  // Initialize (un)weighted counters for signal & background
2877  // Construct a list of event wrappers that point to the original data
2878  for( std::vector<const TMVA::Event*>::const_iterator it = eventSample.begin(); it != eventSample.end(); ++it ) {
2879  if (DataInfo().IsSignal(*it)){
2880  nTotS += (*it)->GetWeight();
2881  ++nTotS_unWeighted;
2882  }
2883  else {
2884  nTotB += (*it)->GetWeight();
2885  ++nTotB_unWeighted;
2886  }
2887  bdtEventSample.push_back(TMVA::BDTEventWrapper(*it));
2888  }
2889 
2890  for( UInt_t ivar = 0; ivar < GetNvar(); ivar++ ) { // loop over all discriminating variables
2891  TMVA::BDTEventWrapper::SetVarIndex(ivar); // select the variable to sort by
2892  std::sort( bdtEventSample.begin(),bdtEventSample.end() ); // sort the event data
2893 
2894  Double_t bkgWeightCtr = 0.0, sigWeightCtr = 0.0;
2895  std::vector<TMVA::BDTEventWrapper>::iterator it = bdtEventSample.begin(), it_end = bdtEventSample.end();
2896  for( ; it != it_end; ++it ) {
2897  if (DataInfo().IsSignal(**it))
2898  sigWeightCtr += (**it)->GetWeight();
2899  else
2900  bkgWeightCtr += (**it)->GetWeight();
2901  // Store the accumulated signal (background) weights
2902  it->SetCumulativeWeight(false,bkgWeightCtr);
2903  it->SetCumulativeWeight(true,sigWeightCtr);
2904  }
2905 
2906  // dVal determines how "tightly" the preselection cut found in the training data is applied;
2907  // here 1% of the variable range is used.
2908  Double_t dVal = (DataInfo().GetVariableInfo(ivar).GetMax() - DataInfo().GetVariableInfo(ivar).GetMin())/100. ;
2909  Double_t nSelS, nSelB, effS=0.05, effB=0.05, rejS=0.05, rejB=0.05;
2910  Double_t tmpEffS, tmpEffB, tmpRejS, tmpRejB;
2911  // Locate the optimal cut for this (ivar-th) variable
2912 
2913 
2914 
2915  for(UInt_t iev = 1; iev < bdtEventSample.size(); iev++) {
2916  //dVal = bdtEventSample[iev].GetVal() - bdtEventSample[iev-1].GetVal();
2917 
2918  nSelS = bdtEventSample[iev].GetCumulativeWeight(true);
2919  nSelB = bdtEventSample[iev].GetCumulativeWeight(false);
2920  // look for a 100% efficient pre-selection cut that removes background, i.e. nSelS==0 && nSelB > 5% of nTotB (or nSelB==0 && nSelS > 5% of nTotS)
2921  tmpEffS=nSelS/nTotS;
2922  tmpEffB=nSelB/nTotB;
2923  tmpRejS=1-tmpEffS;
2924  tmpRejB=1-tmpEffB;
2925  if (nSelS==0 && tmpEffB>effB) {effB=tmpEffB; fLowBkgCut[ivar] = bdtEventSample[iev].GetVal() - dVal; fIsLowBkgCut[ivar]=kTRUE;}
2926  else if (nSelB==0 && tmpEffS>effS) {effS=tmpEffS; fLowSigCut[ivar] = bdtEventSample[iev].GetVal() - dVal; fIsLowSigCut[ivar]=kTRUE;}
2927  else if (nSelB==nTotB && tmpRejS>rejS) {rejS=tmpRejS; fHighSigCut[ivar] = bdtEventSample[iev].GetVal() + dVal; fIsHighSigCut[ivar]=kTRUE;}
2928  else if (nSelS==nTotS && tmpRejB>rejB) {rejB=tmpRejB; fHighBkgCut[ivar] = bdtEventSample[iev].GetVal() + dVal; fIsHighBkgCut[ivar]=kTRUE;}
2929 
2930  }
2931  }
2932 
2933  Log() << kDEBUG << " \tfound and suggest the following possible pre-selection cuts " << Endl;
2934  if (fDoPreselection) Log() << kDEBUG << "\tthe training will be done after these cuts... and GetMVA returns +1 (-1) for a signal (bkg) event that passes these cuts" << Endl;
2935  else Log() << kDEBUG << "\tas the option DoPreselection was not used, these cuts will not be applied, and the training will see the full sample" << Endl;
2936  for (UInt_t ivar=0; ivar < GetNvar(); ivar++ ) { // loop over all discriminating variables
2937  if (fIsLowBkgCut[ivar]){
2938  Log() << kDEBUG << " \tfound cut: Bkg if var " << ivar << " < " << fLowBkgCut[ivar] << Endl;
2939  }
2940  if (fIsLowSigCut[ivar]){
2941  Log() << kDEBUG << " \tfound cut: Sig if var " << ivar << " < " << fLowSigCut[ivar] << Endl;
2942  }
2943  if (fIsHighBkgCut[ivar]){
2944  Log() << kDEBUG << " \tfound cut: Bkg if var " << ivar << " > " << fHighBkgCut[ivar] << Endl;
2945  }
2946  if (fIsHighSigCut[ivar]){
2947  Log() << kDEBUG << " \tfound cut: Sig if var " << ivar << " > " << fHighSigCut[ivar] << Endl;
2948  }
2949  }
2950 
2951  return;
2952 }
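Condensed into a standalone sketch for a single variable (a hypothetical helper, not part of TMVA), the low-side background scan above amounts to the following; the other three cut types are found the same way with the roles of signal and background, or the cut direction, swapped:

#include <algorithm>
#include <cstddef>
#include <vector>

struct Ev { double val; double weight; bool isSignal; };

// Hypothetical illustration of the scan above: returns true (and fills 'cut') if a
// requirement "var > cut" keeps 100% of the signal while rejecting more than 5% of
// the background weight; the cut is placed 1% of the variable range below the value
// at which this still holds, mirroring DeterminePreselectionCuts.
bool FindLowBkgCut(std::vector<Ev> evs, double nTotB, double& cut)
{
   if (evs.size() < 2) return false;
   std::sort(evs.begin(), evs.end(),
             [](const Ev& a, const Ev& b) { return a.val < b.val; });
   const double dVal = (evs.back().val - evs.front().val) / 100.;  // 1% of the range
   double cumS = 0., cumB = 0., effB = 0.05;                       // require > 5% background rejection
   bool found = false;
   for (std::size_t i = 0; i < evs.size(); ++i) {
      (evs[i].isSignal ? cumS : cumB) += evs[i].weight;            // cumulative weights up to this value
      if (cumS == 0. && cumB / nTotB > effB) {                     // no signal lost, enough background removed
         effB  = cumB / nTotB;
         cut   = evs[i].val - dVal;
         found = true;
      }
   }
   return found;
}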
2953 
2954 ////////////////////////////////////////////////////////////////////////////////
2955 /// Apply the preselection cuts before any decision tree is evaluated
2956 /// in GetMVA: returns -1 for background, +1 for signal, 0 if no cut applies.
2957 
2958 Double_t TMVA::MethodBDT::ApplyPreselectionCuts(const Event* ev)
2959 {
2960  Double_t result=0;
2961 
2962  for (UInt_t ivar=0; ivar < GetNvar(); ivar++ ) { // loop over all discriminating variables
2963  if (fIsLowBkgCut[ivar]){
2964  if (ev->GetValue(ivar) < fLowBkgCut[ivar]) result = -1; // is background
2965  }
2966  if (fIsLowSigCut[ivar]){
2967  if (ev->GetValue(ivar) < fLowSigCut[ivar]) result = 1; // is signal
2968  }
2969  if (fIsHighBkgCut[ivar]){
2970  if (ev->GetValue(ivar) > fHighBkgCut[ivar]) result = -1; // is background
2971  }
2972  if (fIsHighSigCut[ivar]){
2973  if (ev->GetValue(ivar) > fHighSigCut[ivar]) result = 1; // is signal
2974  }
2975  }
2976 
2977  return result;
2978 }
2979 