ROOT 6.08/07 Reference Guide
MethodBDT.cxx
1 // Author: Andreas Hoecker, Joerg Stelzer, Helge Voss, Kai Voss, Eckhard v. Toerne, Jan Therhaag
2 
3 /**********************************************************************************
4  * Project: TMVA - a Root-integrated toolkit for multivariate data analysis *
5  * Package: TMVA *
6  * Class : MethodBDT (BDT = Boosted Decision Trees) *
7  * Web : http://tmva.sourceforge.net *
8  * *
9  * Description: *
10  * Analysis of Boosted Decision Trees *
11  * *
12  * Authors (alphabetical): *
13  * Andreas Hoecker <Andreas.Hocker@cern.ch> - CERN, Switzerland *
14  * Helge Voss <Helge.Voss@cern.ch> - MPI-K Heidelberg, Germany *
15  * Kai Voss <Kai.Voss@cern.ch> - U. of Victoria, Canada *
16  * Doug Schouten <dschoute@sfu.ca> - Simon Fraser U., Canada *
17  * Jan Therhaag <jan.therhaag@cern.ch> - U. of Bonn, Germany *
18  * Eckhard v. Toerne <evt@uni-bonn.de> - U of Bonn, Germany *
19  * *
20  * Copyright (c) 2005-2011: *
21  * CERN, Switzerland *
22  * U. of Victoria, Canada *
23  * MPI-K Heidelberg, Germany *
24  * U. of Bonn, Germany *
25  * *
26  * Redistribution and use in source and binary forms, with or without *
27  * modification, are permitted according to the terms listed in LICENSE *
28  * (http://tmva.sourceforge.net/LICENSE) *
29  **********************************************************************************/
30 
31 //_______________________________________________________________________
32 //
33 // Analysis of Boosted Decision Trees
34 //
35 // Boosted decision trees have been successfully used in High Energy
36 // Physics analysis for example by the MiniBooNE experiment
37 // (Yang-Roe-Zhu, physics/0508045). In Boosted Decision Trees, the
38 // selection is done by a majority vote on the results of several decision
39 // trees, which are all derived from the same training sample by
40 // supplying different event weights during the training.
41 //
42 // Decision trees:
43 //
44 // Successive decision nodes are used to categorize the
45 // events in the sample as either signal or background. Each node
46 // uses only a single discriminating variable to decide if the event is
47 // signal-like ("goes right") or background-like ("goes left"). This
48 // forms a tree-like structure with "baskets" at the end (leaf nodes),
49 // and an event is classified as either signal or background according to
50 // whether the basket where it ends up has been classified as signal or
51 // background during the training. Training of a decision tree is the
52 // process of defining the "cut criteria" for each node. The training
53 // starts with the root node. Here one takes the full training event
54 // sample and selects the variable and corresponding cut value that gives
55 // the best separation between signal and background at this stage. Using
56 // this cut criterion, the sample is then divided into two subsamples, a
57 // signal-like (right) and a background-like (left) sample. Two new nodes
58 // are then created for each of the two sub-samples and they are
59 // constructed using the same mechanism as described for the root
60 // node. The division is stopped once a certain node has reached either a
61 // minimum number of events, or a minimum or maximum signal purity. These
62 // leaf nodes are then called "signal" or "background" depending on whether
63 // they contain more signal or background events from the training sample.
64 //
65 // Boosting:
66 //
67 // The idea behind adaptive boosting (AdaBoost) is that signal events
68 // from the training sample that end up in a background node
69 // (and vice versa) are given a larger weight than events that are in
70 // the correct leaf node. This results in a re-weighted training event
71 // sample, from which a new decision tree can then be grown.
72 // The boosting can be applied several times (typically 100-500 times)
73 // and one ends up with a set of decision trees (a forest).
74 // Gradient boosting works more like a function expansion approach, where
75 // each tree corresponds to a summand. The parameters for each summand (tree)
76 // are determined by the minimization of an error function (binomial log-
77 // likelihood for classification and Huber loss for regression).
78 // A greedy algorithm is used, which means that only one tree is modified
79 // at a time while the other trees stay fixed.
80 //
81 // Bagging:
82 //
83 // In this particular variant of the Boosted Decision Trees the boosting
84 // is not done on the basis of previous training results, but by a simple
85 // stochastic re-sampling of the initial training event sample.
86 //
87 // Random Trees:
88 // Similar to the "Random Forests" from Leo Breiman and Adele Cutler, it
89 // uses the bagging algorithm together with the idea of basing the
90 // determination of the best node split during the training on a random
91 // subset of variables only, which is chosen individually for each split.
92 //
93 // Analysis:
94 //
95 // Applying an individual decision tree to a test event results in a
96 // classification of the event as either signal or background. For the
97 // boosted decision tree selection, an event is successively subjected to
98 // the whole set of decision trees and, depending on how often it is
99 // classified as signal, a "likelihood" estimator is constructed for the
100 // event being signal or background. The value of this estimator is then
101 // used to select the events from an event sample, and the cut value on
102 // this estimator defines the efficiency and purity of the selection.
103 // (A schematic sketch of this forest evaluation is given right after this header.)
104 //
105 //_______________________________________________________________________
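Before the implementation starts, here is a small illustrative sketch (added for this write-up, not part of MethodBDT.cxx) of the weighted-vote forest evaluation described in the Boosting and Analysis paragraphs above. The helper name ForestResponse and the normalisation by the sum of boost weights are assumptions; the quantities it mimics (fForest, fBoostWeights, DecisionTree::CheckEvent) all appear further down in this file.

#include <vector>
#include "TMVA/DecisionTree.h"
#include "TMVA/Event.h"

// Illustrative sketch only: weighted majority vote over a boosted forest.
// "forest" and "boostWeights" play the roles of fForest and fBoostWeights.
double ForestResponse(const std::vector<TMVA::DecisionTree*>& forest,
                      const std::vector<double>& boostWeights,
                      const TMVA::Event* ev)
{
   double sum = 0., norm = 0.;
   for (size_t i = 0; i < forest.size(); ++i) {
      // each tree returns its signal/background answer (or purity) for this event
      sum  += boostWeights[i] * forest[i]->CheckEvent(ev, kFALSE);
      norm += boostWeights[i];
   }
   return norm > 0. ? sum / norm : 0.;   // large values -> signal-like
}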
106 
107 #include "TMVA/MethodBDT.h"
108 
109 #include "TMVA/BDTEventWrapper.h"
110 #include "TMVA/BinarySearchTree.h"
111 #include "TMVA/ClassifierFactory.h"
112 #include "TMVA/Configurable.h"
113 #include "TMVA/CrossEntropy.h"
114 #include "TMVA/DecisionTree.h"
115 #include "TMVA/DataSet.h"
116 #include "TMVA/GiniIndex.h"
118 #include "TMVA/Interval.h"
119 #include "TMVA/IMethod.h"
120 #include "TMVA/LogInterval.h"
121 #include "TMVA/MethodBase.h"
123 #include "TMVA/MsgLogger.h"
125 #include "TMVA/PDF.h"
126 #include "TMVA/Ranking.h"
127 #include "TMVA/Results.h"
128 #include "TMVA/ResultsMulticlass.h"
129 #include "TMVA/SdivSqrtSplusB.h"
130 #include "TMVA/SeparationBase.h"
131 #include "TMVA/Timer.h"
132 #include "TMVA/Tools.h"
133 #include "TMVA/Types.h"
134 
135 #include "Riostream.h"
136 #include "TDirectory.h"
137 #include "TRandom3.h"
138 #include "TMath.h"
139 #include "TMatrixTSym.h"
140 #include "TObjString.h"
141 #include "TGraph.h"
142 
143 #include <algorithm>
144 #include <fstream>
145 #include <math.h>
146 
147 
148 using std::vector;
149 using std::make_pair;
150 
151 REGISTER_METHOD(BDT)
152 
153 ClassImp(TMVA::MethodBDT)
154 
155  const Int_t TMVA::MethodBDT::fgDebugLevel = 0;
156 
157 ////////////////////////////////////////////////////////////////////////////////
158 /// the standard constructor for the "boosted decision trees"
159 
160 TMVA::MethodBDT::MethodBDT( const TString& jobName,
161  const TString& methodTitle,
162  DataSetInfo& theData,
163  const TString& theOption ) :
164  TMVA::MethodBase( jobName, Types::kBDT, methodTitle, theData, theOption)
165  , fTrainSample(0)
166  , fNTrees(0)
167  , fSigToBkgFraction(0)
168  , fAdaBoostBeta(0)
169 // , fTransitionPoint(0)
170  , fShrinkage(0)
171  , fBaggedBoost(kFALSE)
172  , fBaggedGradBoost(kFALSE)
173 // , fSumOfWeights(0)
174  , fMinNodeEvents(0)
175  , fMinNodeSize(5)
176  , fMinNodeSizeS("5%")
177  , fNCuts(0)
178  , fUseFisherCuts(0) // don't use this initialisation, only here to make Coverity happy. Is set in DeclareOptions()
179  , fMinLinCorrForFisher(.8) // don't use this initialisation, only here to make Coverity happy. Is set in DeclareOptions()
180  , fUseExclusiveVars(0) // don't use this initialisation, only here to make Coverity happy. Is set in DeclareOptions()
181  , fUseYesNoLeaf(kFALSE)
182  , fNodePurityLimit(0)
183  , fNNodesMax(0)
184  , fMaxDepth(0)
185  , fPruneMethod(DecisionTree::kNoPruning)
186  , fPruneStrength(0)
187  , fFValidationEvents(0)
188  , fAutomatic(kFALSE)
189  , fRandomisedTrees(kFALSE)
190  , fUseNvars(0)
191  , fUsePoissonNvars(0) // don't use this initialisation, only here to make Coverity happy. Is set in Init()
192  , fUseNTrainEvents(0)
193  , fBaggedSampleFraction(0)
194  , fNoNegWeightsInTraining(kFALSE)
195  , fInverseBoostNegWeights(kFALSE)
196  , fPairNegWeightsGlobal(kFALSE)
197  , fTrainWithNegWeights(kFALSE)
198  , fDoBoostMonitor(kFALSE)
199  , fITree(0)
200  , fBoostWeight(0)
201  , fErrorFraction(0)
202  , fCss(0)
203  , fCts_sb(0)
204  , fCtb_ss(0)
205  , fCbb(0)
206  , fDoPreselection(kFALSE)
207  , fSkipNormalization(kFALSE)
208  , fHistoricBool(kFALSE)
209 {
211  fSepType = NULL;
212 }
213 
214 ////////////////////////////////////////////////////////////////////////////////
215 
216 TMVA::MethodBDT::MethodBDT( DataSetInfo& theData,
217  const TString& theWeightFile)
218  : TMVA::MethodBase( Types::kBDT, theData, theWeightFile)
219  , fTrainSample(0)
220  , fNTrees(0)
221  , fSigToBkgFraction(0)
222  , fAdaBoostBeta(0)
223 // , fTransitionPoint(0)
224  , fShrinkage(0)
227 // , fSumOfWeights(0)
228  , fMinNodeEvents(0)
229  , fMinNodeSize(5)
230  , fMinNodeSizeS("5%")
231  , fNCuts(0)
232  , fUseFisherCuts(0) // don't use this initialisation, only here to make Coverity happy. Is set in DeclareOptions()
233  , fMinLinCorrForFisher(.8) // don't use this initialisation, only here to make Coverity happy. Is set in DeclareOptions()
234  , fUseExclusiveVars(0) // don't use this initialisation, only here to make Coverity happy. Is set in DeclareOptions()
236  , fNodePurityLimit(0)
237  , fNNodesMax(0)
238  , fMaxDepth(0)
239  , fPruneMethod(DecisionTree::kNoPruning)
240  , fPruneStrength(0)
241  , fFValidationEvents(0)
242  , fAutomatic(kFALSE)
244  , fUseNvars(0)
245  , fUsePoissonNvars(0) // don't use this initialisation, only here to make Coverity happy. Is set in Init()
246  , fUseNTrainEvents(0)
253  , fITree(0)
254  , fBoostWeight(0)
255  , fErrorFraction(0)
256  , fCss(0)
257  , fCts_sb(0)
258  , fCtb_ss(0)
259  , fCbb(0)
263 {
265  fSepType = NULL;
266  // constructor for calculating BDT-MVA using previously generated decision trees
267  // the result of the previous training (the decision trees) are read in via the
268  // weight file. Make sure that the variables correspond to the ones used in
269  // creating the "weight"-file
270 }
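For orientation, a minimal sketch of how this application-mode constructor is typically reached from user code, via TMVA::Reader (the variable names and the weight-file path are placeholders, not taken from this file):

#include "TMVA/Reader.h"

// Illustrative only: apply a previously trained BDT from its weight file.
void ApplyBDTExample()
{
   TMVA::Reader reader("!Color:!Silent");
   Float_t var1 = 0, var2 = 0;
   reader.AddVariable("var1", &var1);   // must match the training variables
   reader.AddVariable("var2", &var2);
   reader.BookMVA("BDT", "dataset/weights/TMVAClassification_BDT.weights.xml");
   // ... fill var1, var2 for each event, then:
   Double_t mvaValue = reader.EvaluateMVA("BDT");
   (void) mvaValue;
}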
271 
272 ////////////////////////////////////////////////////////////////////////////////
273 /// BDT can handle classification with multiple classes and regression with one regression-target
274 
275 Bool_t TMVA::MethodBDT::HasAnalysisType( Types::EAnalysisType type, UInt_t numberClasses, UInt_t numberTargets )
276 {
277  if (type == Types::kClassification && numberClasses == 2) return kTRUE;
278  if (type == Types::kMulticlass ) return kTRUE;
279  if( type == Types::kRegression && numberTargets == 1 ) return kTRUE;
280  return kFALSE;
281 }
282 
283 ////////////////////////////////////////////////////////////////////////////////
284 /// define the options (their key words) that can be set in the option string
285 /// known options:
286 /// nTrees number of trees in the forest to be created
287 /// BoostType the boosting type for the trees in the forest (AdaBoost etc.)
288 /// known: AdaBoost
289 /// AdaBoostR2 (Adaboost for regression)
290 /// Bagging
291 /// GradBoost
292 /// AdaBoostBeta the boosting parameter, beta, for AdaBoost
293 /// UseRandomisedTrees choose at each node splitting a random set of variables
294 /// UseNvars use UseNvars variables in randomised trees
295 /// UsePoissonNvars use UseNvars not as a fixed number but as the mean of a Poisson distribution in each split
296 /// SeparationType the separation criterion applied in the node splitting
297 /// known: GiniIndex
298 /// MisClassificationError
299 /// CrossEntropy
300 /// SDivSqrtSPlusB
301 /// MinNodeSize: minimum percentage of training events in a leaf node (leaf criteria, stop splitting)
302 /// nCuts: the number of steps in the optimisation of the cut for a node (if < 0, then
303 /// step size is determined by the events)
304 /// UseFisherCuts: use multivariate splits using the Fisher criterion
305 /// UseYesNoLeaf decide if the classification is done simply by the node type, or the S/B
306 /// (from the training) in the leaf node
307 /// NodePurityLimit the minimum purity to classify a node as a signal node (used in pruning and boosting to determine
308 /// misclassification error rate)
309 /// PruneMethod The Pruning method:
310 /// known: NoPruning // switch off pruning completely
311 /// ExpectedError
312 /// CostComplexity
313 /// PruneStrength a parameter to adjust the amount of pruning. Should be large enough such that overtraining is avoided.
314 /// PruningValFraction fraction of events to use for optimizing pruning (only if PruneStrength < 0, i.e. automatic pruning)
315 /// NegWeightTreatment IgnoreNegWeightsInTraining Ignore negative weight events in the training.
316 /// DecreaseBoostWeight Boost ev. with neg. weight with 1/boostweight instead of boostweight
317 /// PairNegWeightsGlobal Pair ev. with neg. and pos. weights in training sample and "annihilate" them
318 /// MaxDepth maximum depth of the decision tree allowed before further splitting is stopped
319 /// SkipNormalization Skip normalization at initialization, to keep expectation value of BDT output
320 /// according to the fraction of events
321 
322 
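The options listed above are passed as a single colon-separated configuration string when booking the method. A minimal usage sketch, assuming a TMVA::Factory* factory and a TMVA::DataLoader* dataloader have already been set up (the particular option values are only an example in the spirit of the TMVAClassification tutorial):

#include "TMVA/Factory.h"
#include "TMVA/DataLoader.h"
#include "TMVA/Types.h"

// Illustrative only: book a BDT with an explicit option string.
void BookBDTExample(TMVA::Factory* factory, TMVA::DataLoader* dataloader)
{
   factory->BookMethod(dataloader, TMVA::Types::kBDT, "BDT",
      "!H:!V:NTrees=850:MinNodeSize=2.5%:MaxDepth=3:BoostType=AdaBoost:AdaBoostBeta=0.5:"
      "UseBaggedBoost:BaggedSampleFraction=0.5:SeparationType=GiniIndex:nCuts=20");
}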
323 void TMVA::MethodBDT::DeclareOptions()
324 {
325  DeclareOptionRef(fNTrees, "NTrees", "Number of trees in the forest");
326  if (DoRegression()) {
327  DeclareOptionRef(fMaxDepth=50,"MaxDepth","Max depth of the decision tree allowed");
328  }else{
329  DeclareOptionRef(fMaxDepth=3,"MaxDepth","Max depth of the decision tree allowed");
330  }
331 
332  TString tmp="5%"; if (DoRegression()) tmp="0.2%";
333  DeclareOptionRef(fMinNodeSizeS=tmp, "MinNodeSize", "Minimum percentage of training events required in a leaf node (default: Classification: 5%, Regression: 0.2%)");
334  // MinNodeSize: minimum percentage of training events in a leaf node (leaf criteria, stop splitting)
335  DeclareOptionRef(fNCuts, "nCuts", "Number of grid points in variable range used in finding optimal cut in node splitting");
336 
337  DeclareOptionRef(fBoostType, "BoostType", "Boosting type for the trees in the forest (note: AdaCost is still experimental)");
338 
339  AddPreDefVal(TString("AdaBoost"));
340  AddPreDefVal(TString("RealAdaBoost"));
341  AddPreDefVal(TString("AdaCost"));
342  AddPreDefVal(TString("Bagging"));
343  // AddPreDefVal(TString("RegBoost"));
344  AddPreDefVal(TString("AdaBoostR2"));
345  AddPreDefVal(TString("Grad"));
346  if (DoRegression()) {
347  fBoostType = "AdaBoostR2";
348  }else{
349  fBoostType = "AdaBoost";
350  }
351  DeclareOptionRef(fAdaBoostR2Loss="Quadratic", "AdaBoostR2Loss", "Type of Loss function in AdaBoostR2");
352  AddPreDefVal(TString("Linear"));
353  AddPreDefVal(TString("Quadratic"));
354  AddPreDefVal(TString("Exponential"));
355 
356  DeclareOptionRef(fBaggedBoost=kFALSE, "UseBaggedBoost","Use only a random subsample of all events for growing the trees in each boost iteration.");
357  DeclareOptionRef(fShrinkage=1.0, "Shrinkage", "Learning rate for GradBoost algorithm");
358  DeclareOptionRef(fAdaBoostBeta=.5, "AdaBoostBeta", "Learning rate for AdaBoost algorithm");
359  DeclareOptionRef(fRandomisedTrees,"UseRandomisedTrees","Determine at each node splitting the cut variable only as the best out of a random subset of variables (like in RandomForests)");
360  DeclareOptionRef(fUseNvars,"UseNvars","Size of the subset of variables used with RandomisedTree option");
361  DeclareOptionRef(fUsePoissonNvars,"UsePoissonNvars", "Interpret \"UseNvars\" not as fixed number but as mean of a Poisson distribution in each split with RandomisedTree option");
362  DeclareOptionRef(fBaggedSampleFraction=.6,"BaggedSampleFraction","Relative size of bagged event sample to original size of the data sample (used whenever bagging is used, i.e. UseBaggedBoost, Bagging)" );
363 
364  DeclareOptionRef(fUseYesNoLeaf=kTRUE, "UseYesNoLeaf",
365  "Use Sig or Bkg categories, or the purity=S/(S+B) as classification of the leaf node -> Real-AdaBoost");
366  if (DoRegression()) {
368  }
369 
370  DeclareOptionRef(fNegWeightTreatment="InverseBoostNegWeights","NegWeightTreatment","How to treat events with negative weights in the BDT training (in particular the boosting) : IgnoreInTraining; Boost With inverse boostweight; Pair events with negative and positive weights in training sample and *annihilate* them (experimental!)");
371  AddPreDefVal(TString("InverseBoostNegWeights"));
372  AddPreDefVal(TString("IgnoreNegWeightsInTraining"));
373  AddPreDefVal(TString("NoNegWeightsInTraining")); // well, let's be nice to users and keep at least this old name anyway ..
374  AddPreDefVal(TString("PairNegWeightsGlobal"));
375  AddPreDefVal(TString("Pray"));
376 
377 
378 
379  DeclareOptionRef(fCss=1., "Css", "AdaCost: cost of true signal selected signal");
380  DeclareOptionRef(fCts_sb=1.,"Cts_sb","AdaCost: cost of true signal selected bkg");
381  DeclareOptionRef(fCtb_ss=1.,"Ctb_ss","AdaCost: cost of true bkg selected signal");
382  DeclareOptionRef(fCbb=1., "Cbb", "AdaCost: cost of true bkg selected bkg ");
383 
384  DeclareOptionRef(fNodePurityLimit=0.5, "NodePurityLimit", "In boosting/pruning, nodes with purity > NodePurityLimit are signal; background otherwise.");
385 
386 
387  DeclareOptionRef(fSepTypeS, "SeparationType", "Separation criterion for node splitting");
388  AddPreDefVal(TString("CrossEntropy"));
389  AddPreDefVal(TString("GiniIndex"));
390  AddPreDefVal(TString("GiniIndexWithLaplace"));
391  AddPreDefVal(TString("MisClassificationError"));
392  AddPreDefVal(TString("SDivSqrtSPlusB"));
393  AddPreDefVal(TString("RegressionVariance"));
394  if (DoRegression()) {
395  fSepTypeS = "RegressionVariance";
396  }else{
397  fSepTypeS = "GiniIndex";
398  }
399 
400  DeclareOptionRef(fRegressionLossFunctionBDTGS = "Huber", "RegressionLossFunctionBDTG", "Loss function for BDTG regression.");
401  AddPreDefVal(TString("Huber"));
402  AddPreDefVal(TString("AbsoluteDeviation"));
403  AddPreDefVal(TString("LeastSquares"));
404 
405  DeclareOptionRef(fHuberQuantile = 0.7, "HuberQuantile", "In the Huber loss function this is the quantile that separates the core from the tails in the residuals distribution.");
406 
407  DeclareOptionRef(fDoBoostMonitor=kFALSE,"DoBoostMonitor","Create control plot with ROC integral vs tree number");
408 
409  DeclareOptionRef(fUseFisherCuts=kFALSE, "UseFisherCuts", "Use multivariate splits using the Fisher criterion");
410  DeclareOptionRef(fMinLinCorrForFisher=.8,"MinLinCorrForFisher", "The minimum linear correlation between two variables demanded for use in Fisher criterion in node splitting");
411  DeclareOptionRef(fUseExclusiveVars=kFALSE,"UseExclusiveVars","Variables already used in fisher criterion are not anymore analysed individually for node splitting");
412 
413 
414  DeclareOptionRef(fDoPreselection=kFALSE,"DoPreselection","Find and apply automatic pre-selection for 100% efficient signal (bkg) cuts prior to training");
415 
416 
417  DeclareOptionRef(fSigToBkgFraction=1,"SigToBkgFraction","Sig to Bkg ratio used in Training (similar to NodePurityLimit, which cannot be used in real adaboost)");
418 
419  DeclareOptionRef(fPruneMethodS, "PruneMethod", "Note: for BDTs use small trees (e.g.MaxDepth=3) and NoPruning: Pruning: Method used for pruning (removal) of statistically insignificant branches ");
420  AddPreDefVal(TString("NoPruning"));
421  AddPreDefVal(TString("ExpectedError"));
422  AddPreDefVal(TString("CostComplexity"));
423 
424  DeclareOptionRef(fPruneStrength, "PruneStrength", "Pruning strength");
425 
426  DeclareOptionRef(fFValidationEvents=0.5, "PruningValFraction", "Fraction of events to use for optimizing automatic pruning.");
427 
428  DeclareOptionRef(fSkipNormalization=kFALSE, "SkipNormalization", "Skip normalization at initialization, to keep expectation value of BDT output according to the fraction of events");
429 
430  // deprecated options, still kept for the moment:
431  DeclareOptionRef(fMinNodeEvents=0, "nEventsMin", "deprecated: Use MinNodeSize (in % of training events) instead");
432 
433  DeclareOptionRef(fBaggedGradBoost=kFALSE, "UseBaggedGrad","deprecated: Use *UseBaggedBoost* instead: Use only a random subsample of all events for growing the trees in each iteration.");
434  DeclareOptionRef(fBaggedSampleFraction, "GradBaggingFraction","deprecated: Use *BaggedSampleFraction* instead: Defines the fraction of events to be used in each iteration, e.g. when UseBaggedGrad=kTRUE. ");
435  DeclareOptionRef(fUseNTrainEvents,"UseNTrainEvents","deprecated: Use *BaggedSampleFraction* instead: Number of randomly picked training events used in randomised (and bagged) trees");
436  DeclareOptionRef(fNNodesMax,"NNodesMax","deprecated: Use MaxDepth instead to limit the tree size" );
437 
438 
439 }
440 
441 ////////////////////////////////////////////////////////////////////////////////
442 /// options that are used ONLY for the READER to ensure backward compatibility
443 
446 
447 
448  DeclareOptionRef(fHistoricBool=kTRUE, "UseWeightedTrees",
449  "Use weighted trees or simple average in classification from the forest");
450  DeclareOptionRef(fHistoricBool=kFALSE, "PruneBeforeBoost", "Flag to prune the tree before applying boosting algorithm");
451  DeclareOptionRef(fHistoricBool=kFALSE,"RenormByClass","Individually re-normalize each event class to the original size after boosting");
452 
453  AddPreDefVal(TString("NegWeightTreatment"),TString("IgnoreNegWeights"));
454 
455 }
456 
457 
458 
459 
460 ////////////////////////////////////////////////////////////////////////////////
461 /// the option string is decoded, for available options see "DeclareOptions"
462 
463 void TMVA::MethodBDT::ProcessOptions()
464 {
465  fSepTypeS.ToLower();
466  if (fSepTypeS == "misclassificationerror") fSepType = new MisClassificationError();
467  else if (fSepTypeS == "giniindex") fSepType = new GiniIndex();
468  else if (fSepTypeS == "giniindexwithlaplace") fSepType = new GiniIndexWithLaplace();
469  else if (fSepTypeS == "crossentropy") fSepType = new CrossEntropy();
470  else if (fSepTypeS == "sdivsqrtsplusb") fSepType = new SdivSqrtSplusB();
471  else if (fSepTypeS == "regressionvariance") fSepType = NULL;
472  else {
473  Log() << kINFO << GetOptions() << Endl;
474  Log() << kFATAL << "<ProcessOptions> unknown Separation Index option " << fSepTypeS << " called" << Endl;
475  }
476 
477  if(!(fHuberQuantile >= 0.0 && fHuberQuantile <= 1.0)){
478  Log() << kINFO << GetOptions() << Endl;
479  Log() << kFATAL << "<ProcessOptions> Huber Quantile must be in range [0,1]. Value given, " << fHuberQuantile << ", does not match this criteria" << Endl;
480  }
481 
486  else {
487  Log() << kINFO << GetOptions() << Endl;
488  Log() << kFATAL << "<ProcessOptions> unknown Regression Loss Function BDT option " << fRegressionLossFunctionBDTGS << " called" << Endl;
489  }
490 
491  fPruneMethodS.ToLower();
492  if (fPruneMethodS == "expectederror") fPruneMethod = DecisionTree::kExpectedErrorPruning;
493  else if (fPruneMethodS == "costcomplexity") fPruneMethod = DecisionTree::kCostComplexityPruning;
494  else if (fPruneMethodS == "nopruning") fPruneMethod = DecisionTree::kNoPruning;
495  else {
496  Log() << kINFO << GetOptions() << Endl;
497  Log() << kFATAL << "<ProcessOptions> unknown PruneMethod " << fPruneMethodS << " option called" << Endl;
498  }
500  else fAutomatic = kFALSE;
502  Log() << kFATAL
503  << "Sorry autmoatic pruning strength determination is not implemented yet for ExpectedErrorPruning" << Endl;
504  }
505 
506 
507  if (fMinNodeEvents > 0){
509  Log() << kWARNING << "You have explicitly set ** nEventsMin = " << fMinNodeEvents<<" ** the min ablsolut number \n"
510  << "of events in a leaf node. This is DEPRECATED, please use the option \n"
511  << "*MinNodeSize* giving the relative number as percentage of training \n"
512  << "events instead. \n"
513  << "nEventsMin="<<fMinNodeEvents<< "--> MinNodeSize="<<fMinNodeSize<<"%"
514  << Endl;
515  Log() << kWARNING << "Note also that explicitly setting *nEventsMin* so far OVERWRITES the option recomeded \n"
516  << " *MinNodeSize* = " << fMinNodeSizeS << " option !!" << Endl ;
517  fMinNodeSizeS = Form("%F3.2",fMinNodeSize);
518 
519  }else{
521  }
522 
523 
525 
526  if (fBoostType=="Grad") {
528  if (fNegWeightTreatment=="InverseBoostNegWeights"){
529  Log() << kINFO << "the option *InverseBoostNegWeights* does not exist for BoostType=Grad --> change" << Endl;
530  Log() << kINFO << "to new default for GradBoost *Pray*" << Endl;
531  Log() << kDEBUG << "i.e. simply keep them as if which should work fine for Grad Boost" << Endl;
532  fNegWeightTreatment="Pray";
534  }
535  } else if (fBoostType=="RealAdaBoost"){
536  fBoostType = "AdaBoost";
538  } else if (fBoostType=="AdaCost"){
540  }
541 
542  if (fFValidationEvents < 0.0) fFValidationEvents = 0.0;
543  if (fAutomatic && fFValidationEvents > 0.5) {
544  Log() << kWARNING << "You have chosen to use more than half of your training sample "
545  << "to optimize the automatic pruning algorithm. This is probably wasteful "
546  << "and your overall results will be degraded. Are you sure you want this?"
547  << Endl;
548  }
549 
550 
551  if (this->Data()->HasNegativeEventWeights()){
552  Log() << kINFO << " You are using a Monte Carlo that has also negative weights. "
553  << "That should in principle be fine as long as on average you end up with "
554  << "something positive. For this you have to make sure that the minimal number "
555  << "of (un-weighted) events demanded for a tree node (currently you use: MinNodeSize="
556  << fMinNodeSizeS << " ("<< fMinNodeSize << "%)"
557  <<", (or the deprecated equivalent nEventsMin) you can set this via the "
558  <<"BDT option string when booking the "
559  << "classifier) is large enough to allow for reasonable averaging!!! "
560  << " If this does not help.. maybe you want to try the option: IgnoreNegWeightsInTraining "
561  << "which ignores events with negative weight in the training. " << Endl
562  << Endl << "Note: You'll get a WARNING message during the training if that should ever happen" << Endl;
563  }
564 
565  if (DoRegression()) {
566  if (fUseYesNoLeaf){
567  Log() << kWARNING << "Regression Trees do not work with fUseYesNoLeaf=TRUE --> I will set it to FALSE" << Endl;
568  fUseYesNoLeaf = kFALSE;
569  }
570 
571  if (fSepType != NULL){
572  Log() << kWARNING << "Regression Trees do not work with Separation type other than <RegressionVariance> --> I will use it instead" << Endl;
573  fSepType = NULL;
574  }
575  if (fUseFisherCuts){
576  Log() << kWARNING << "Sorry, UseFisherCuts is not available for regression analysis, I will ignore it!" << Endl;
578  }
579  if (fNCuts < 0) {
580  Log() << kWARNING << "Sorry, the option of nCuts<0 using a more elaborate node splitting algorithm " << Endl;
581  Log() << kWARNING << "is not implemented for regression analysis ! " << Endl;
582  Log() << kWARNING << "--> I switch do default nCuts = 20 and use standard node splitting"<<Endl;
583  fNCuts=20;
584  }
585  }
586  if (fRandomisedTrees){
587  Log() << kINFO << " Randomised trees use no pruning" << Endl;
589  // fBoostType = "Bagging";
590  }
591 
592  if (fUseFisherCuts) {
593  Log() << kWARNING << "When using the option UseFisherCuts, the other option nCuts<0 (i.e. using" << Endl;
594  Log() << " a more elaborate node splitting algorithm) is not implemented. " << Endl;
595  //I will switch o " << Endl;
596  //Log() << "--> I switch do default nCuts = 20 and use standard node splitting WITH possible Fisher criteria"<<Endl;
597  fNCuts=20;
598  }
599 
600  if (fNTrees==0){
601  Log() << kERROR << " Zero Decision Trees demanded... that does not work !! "
602  << " I set it to 1 .. just so that the program does not crash"
603  << Endl;
604  fNTrees = 1;
605  }
606 
608  if (fNegWeightTreatment == "ignorenegweightsintraining") fNoNegWeightsInTraining = kTRUE;
609  else if (fNegWeightTreatment == "nonegweightsintraining") fNoNegWeightsInTraining = kTRUE;
610  else if (fNegWeightTreatment == "inverseboostnegweights") fInverseBoostNegWeights = kTRUE;
611  else if (fNegWeightTreatment == "pairnegweightsglobal") fPairNegWeightsGlobal = kTRUE;
612  else if (fNegWeightTreatment == "pray") Log() << kDEBUG << "Yes, good luck with praying " << Endl;
613  else {
614  Log() << kINFO << GetOptions() << Endl;
615  Log() << kFATAL << "<ProcessOptions> unknown option for treating negative event weights during training " << fNegWeightTreatment << " requested" << Endl;
616  }
617 
618  if (fNegWeightTreatment == "pairnegweightsglobal")
619  Log() << kWARNING << " you specified the option NegWeightTreatment=PairNegWeightsGlobal : This option is still considered EXPERIMENTAL !! " << Endl;
620 
621 
622  // dealing with deprecated options !
623  if (fNNodesMax>0) {
624  UInt_t tmp=1; // depth=0 == 1 node
625  fMaxDepth=0;
626  while (tmp < fNNodesMax){
627  tmp+=2*tmp;
628  fMaxDepth++;
629  }
630  Log() << kWARNING << "You have specified a deprecated option *NNodesMax="<<fNNodesMax
631  << "* \n this has been translated to MaxDepth="<<fMaxDepth<<Endl;
632  }
633 
634 
635  if (fUseNTrainEvents>0){
637  Log() << kWARNING << "You have specified a deprecated option *UseNTrainEvents="<<fUseNTrainEvents
638  << "* \n this has been translated to BaggedSampleFraction="<<fBaggedSampleFraction<<"(%)"<<Endl;
639  }
640 
641  if (fBoostType=="Bagging") fBaggedBoost = kTRUE;
642  if (fBaggedGradBoost){
644  Log() << kWARNING << "You have specified a deprecated option *UseBaggedGrad* --> please use *UseBaggedBoost* instead" << Endl;
645  }
646 
647 }
648 
649 
650 //_______________________________________________________________________
651 
652 void TMVA::MethodBDT::SetMinNodeSize(Double_t sizeInPercent){
653  if (sizeInPercent > 0 && sizeInPercent < 50){
654  fMinNodeSize=sizeInPercent;
655 
656  } else {
657  Log() << kFATAL << "you have demanded a minimal node size of "
658  << sizeInPercent << "% of the training events.. \n"
659  << " that somehow does not make sense "<<Endl;
660  }
661 
662 }
663 ////////////////////////////////////////////////////////////////////////////////
664 
665 void TMVA::MethodBDT::SetMinNodeSize( TString sizeInPercent ){
666  sizeInPercent.ReplaceAll("%","");
667  sizeInPercent.ReplaceAll(" ","");
668  if (sizeInPercent.IsFloat()) SetMinNodeSize(sizeInPercent.Atof());
669  else {
670  Log() << kFATAL << "I had problems reading the option MinNodeEvents, which "
671  << "after removing a possible % sign now reads " << sizeInPercent << Endl;
672  }
673 }
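An illustrative trace of the two setters above (added note, not part of the original file):

//  option string "MinNodeSize=2.5%"  ->  fMinNodeSizeS = "2.5%"
//  SetMinNodeSize("2.5%")            ->  strips '%' and blanks, calls SetMinNodeSize(2.5)
//  SetMinNodeSize(2.5)               ->  0 < 2.5 < 50, so fMinNodeSize = 2.5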
674 
675 
676 
677 ////////////////////////////////////////////////////////////////////////////////
678 /// common initialisation with defaults for the BDT-Method
679 
680 void TMVA::MethodBDT::Init( void )
681 {
682  fNTrees = 800;
684  fMaxDepth = 3;
685  fBoostType = "AdaBoost";
686  if(DataInfo().GetNClasses()!=0) //workaround for multiclass application
687  fMinNodeSize = 5.;
688  }else {
689  fMaxDepth = 50;
690  fBoostType = "AdaBoostR2";
691  fAdaBoostR2Loss = "Quadratic";
692  if(DataInfo().GetNClasses()!=0) //workaround for multiclass application
693  fMinNodeSize = .2;
694  }
695 
696 
697  fNCuts = 20;
698  fPruneMethodS = "NoPruning";
700  fPruneStrength = 0;
701  fAutomatic = kFALSE;
702  fFValidationEvents = 0.5;
704  // fUseNvars = (GetNvar()>12) ? UInt_t(GetNvar()/8) : TMath::Max(UInt_t(2),UInt_t(GetNvar()/3));
707  fShrinkage = 1.0;
708 // fSumOfWeights = 0.0;
709 
710  // reference cut value to distinguish signal-like from background-like events
712 }
713 
714 
715 ////////////////////////////////////////////////////////////////////////////////
716 /// reset the method, as if it had just been instantiated (forget all training etc.)
717 
718 void TMVA::MethodBDT::Reset( void )
719 {
720  // I keep the BDT EventSample and its Validation sample (eventually they should all
721  // disappear and we should just use the DataSet samples) ..
722 
723  // remove all the trees
724  for (UInt_t i=0; i<fForest.size(); i++) delete fForest[i];
725  fForest.clear();
726 
727  fBoostWeights.clear();
729  fVariableImportance.clear();
730  fResiduals.clear();
731  fLossFunctionEventInfo.clear();
732  // now done in "InitEventSample" which is called in "Train"
733  // reset all previously stored/accumulated BOOST weights in the event sample
734  //for (UInt_t iev=0; iev<fEventSample.size(); iev++) fEventSample[iev]->SetBoostWeight(1.);
736  Log() << kDEBUG << " successfully(?) reset the method " << Endl;
737 }
738 
739 
740 ////////////////////////////////////////////////////////////////////////////////
741 ///destructor
742 /// Note: fEventSample and ValidationSample are already deleted at the end of TRAIN
743 /// When they are not used anymore
744 /// for (UInt_t i=0; i<fEventSample.size(); i++) delete fEventSample[i];
745 /// for (UInt_t i=0; i<fValidationSample.size(); i++) delete fValidationSample[i];
746 
747 TMVA::MethodBDT::~MethodBDT( void )
748 {
749  for (UInt_t i=0; i<fForest.size(); i++) delete fForest[i];
750 }
751 
752 ////////////////////////////////////////////////////////////////////////////////
753 /// initialize the event sample (i.e. reset the boost-weights... etc)
754 
755 void TMVA::MethodBDT::InitEventSample( void )
756 {
757  if (!HasTrainingTree()) Log() << kFATAL << "<Init> Data().TrainingTree() is zero pointer" << Endl;
758 
759  if (fEventSample.size() > 0) { // do not re-initialise the event sample, just set all boostweights to 1. as if it were untouched
760  // reset all previously stored/accumulated BOOST weights in the event sample
761  for (UInt_t iev=0; iev<fEventSample.size(); iev++) fEventSample[iev]->SetBoostWeight(1.);
762  } else {
764  UInt_t nevents = Data()->GetNTrainingEvents();
765 
766  std::vector<const TMVA::Event*> tmpEventSample;
767  for (Long64_t ievt=0; ievt<nevents; ievt++) {
768  // const Event *event = new Event(*(GetEvent(ievt)));
769  Event* event = new Event( *GetTrainingEvent(ievt) );
770  tmpEventSample.push_back(event);
771  }
772 
773  if (!DoRegression()) DeterminePreselectionCuts(tmpEventSample);
774  else fDoPreselection = kFALSE; // just to make sure...
775 
776  for (UInt_t i=0; i<tmpEventSample.size(); i++) delete tmpEventSample[i];
777 
778 
779  Bool_t firstNegWeight=kTRUE;
780  Bool_t firstZeroWeight=kTRUE;
781  for (Long64_t ievt=0; ievt<nevents; ievt++) {
782  // const Event *event = new Event(*(GetEvent(ievt)));
783  // const Event* event = new Event( *GetTrainingEvent(ievt) );
784  Event* event = new Event( *GetTrainingEvent(ievt) );
785  if (fDoPreselection){
786  if (TMath::Abs(ApplyPreselectionCuts(event)) > 0.05) {
787  delete event;
788  continue;
789  }
790  }
791 
792  if (event->GetWeight() < 0 && (IgnoreEventsWithNegWeightsInTraining() || fNoNegWeightsInTraining)){
793  if (firstNegWeight) {
794  Log() << kWARNING << " Note, you have events with negative event weight in the sample, but you've chosen to ignore them" << Endl;
795  firstNegWeight=kFALSE;
796  }
797  delete event;
798  }else if (event->GetWeight()==0){
799  if (firstZeroWeight) {
800  firstZeroWeight = kFALSE;
801  Log() << "Events with weight == 0 are going to be simply ignored " << Endl;
802  }
803  delete event;
804  }else{
805  if (event->GetWeight() < 0) {
807  if (firstNegWeight){
808  firstNegWeight = kFALSE;
810  Log() << kWARNING << "Events with negative event weights are found and "
811  << " will be removed prior to the actual BDT training by global "
812  << " paring (and subsequent annihilation) with positiv weight events"
813  << Endl;
814  }else{
815  Log() << kWARNING << "Events with negative event weights are USED during "
816  << "the BDT training. This might cause problems with small node sizes "
817  << "or with the boosting. Please remove negative events from training "
818  << "using the option *IgnoreEventsWithNegWeightsInTraining* in case you "
819  << "observe problems with the boosting"
820  << Endl;
821  }
822  }
823  }
824  // if fAutomatic == true you need a validation sample to optimize pruning
825  if (fAutomatic) {
826  Double_t modulo = 1.0/(fFValidationEvents);
827  Int_t imodulo = static_cast<Int_t>( fmod(modulo,1.0) > 0.5 ? ceil(modulo) : floor(modulo) );
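  // Illustrative numbers (added note, not part of the original file): with the
  // default fFValidationEvents = 0.5 one gets modulo = 1/0.5 = 2 and imodulo = 2,
  // i.e. every 2nd training event is moved to the pruning-validation sample (~50%);
  // with fFValidationEvents = 0.2, imodulo = 5, i.e. every 5th event (~20%).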
828  if (ievt % imodulo == 0) fValidationSample.push_back( event );
829  else fEventSample.push_back( event );
830  }
831  else {
832  fEventSample.push_back(event);
833  }
834  }
835  }
836 
837  if (fAutomatic) {
838  Log() << kINFO << "<InitEventSample> Internally I use " << fEventSample.size()
839  << " for Training and " << fValidationSample.size()
840  << " for Pruning Validation (" << ((Float_t)fValidationSample.size())/((Float_t)fEventSample.size()+fValidationSample.size())*100.0
841  << "% of training used for validation)" << Endl;
842  }
843 
844  // some pre-processing for events with negative weights
846  }
847 
848  if (!DoRegression() && !fSkipNormalization){
849  Log() << kDEBUG << "\t<InitEventSample> For classification trees, "<< Endl;
850  Log() << kDEBUG << " \tthe effective number of backgrounds is scaled to match "<<Endl;
851  Log() << kDEBUG << " \tthe signal. Otherwise the first boosting step would do 'just that'!"<<Endl;
852  // it does not make sense in decision trees to start with unequal number of signal/background
853  // events (weights) .. hence normalize them now (this happens otherwise in the first 'boosting step'
854  // anyway..
855  // Also make sure, that the sum_of_weights == sample.size() .. as this is assumed in
856  // the DecisionTree to derive a sensible number for "fMinSize" (min.#events in node)
857  // that currently is an OR between "weighted" and "unweighted number"
858  // I want:
859  // nS + nB = n
860  // a*SW + b*BW = n
861  // (a*SW)/(b*BW) = fSigToBkgFraction
862  //
863  // ==> b = n/((1+f)BW) and a = (nf/(1+f))/SW
864 
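  // Worked example of the relations above (added for illustration, not part of
  // the original file): with f = fSigToBkgFraction = 1, n = 1000, SW = 100 and
  // BW = 300 one gets a = (n*f/(1+f))/SW = 5 and b = n/((1+f)*BW) = 5/3,
  // so that a*SW = b*BW = 500 and a*SW + b*BW = n = 1000, as required.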
865  Double_t nevents = fEventSample.size();
866  Double_t sumSigW=0, sumBkgW=0;
867  Int_t sumSig=0, sumBkg=0;
868  for (UInt_t ievt=0; ievt<fEventSample.size(); ievt++) {
869  if ((DataInfo().IsSignal(fEventSample[ievt])) ) {
870  sumSigW += fEventSample[ievt]->GetWeight();
871  sumSig++;
872  } else {
873  sumBkgW += fEventSample[ievt]->GetWeight();
874  sumBkg++;
875  }
876  }
877  if (sumSigW && sumBkgW){
878  Double_t normSig = nevents/((1+fSigToBkgFraction)*sumSigW)*fSigToBkgFraction;
879  Double_t normBkg = nevents/((1+fSigToBkgFraction)*sumBkgW); ;
880  Log() << kDEBUG << "\tre-normalise events such that Sig and Bkg have respective sum of weights = "
881  << fSigToBkgFraction << Endl;
882  Log() << kDEBUG << " \tsig->sig*"<<normSig << "ev. bkg->bkg*"<<normBkg << "ev." <<Endl;
883  Log() << kHEADER << "#events: (reweighted) sig: "<< sumSigW*normSig << " bkg: " << sumBkgW*normBkg << Endl;
884  Log() << kINFO << "#events: (unweighted) sig: "<< sumSig << " bkg: " << sumBkg << Endl;
885  for (Long64_t ievt=0; ievt<nevents; ievt++) {
886  if ((DataInfo().IsSignal(fEventSample[ievt])) ) fEventSample[ievt]->SetBoostWeight(normSig);
887  else fEventSample[ievt]->SetBoostWeight(normBkg);
888  }
889  }else{
890  Log() << kINFO << "--> could not determine scaleing factors as either there are " << Endl;
891  Log() << kINFO << " no signal events (sumSigW="<<sumSigW<<") or no bkg ev. (sumBkgW="<<sumBkgW<<")"<<Endl;
892  }
893 
894  }
895 
897  if (fBaggedBoost){
900  }
901 
902  //just for debug purposes..
903  /*
904  sumSigW=0;
905  sumBkgW=0;
906  for (UInt_t ievt=0; ievt<fEventSample.size(); ievt++) {
907  if ((DataInfo().IsSignal(fEventSample[ievt])) ) sumSigW += fEventSample[ievt]->GetWeight();
908  else sumBkgW += fEventSample[ievt]->GetWeight();
909  }
910  Log() << kWARNING << "sigSumW="<<sumSigW<<"bkgSumW="<<sumBkgW<< Endl;
911  */
912 }
913 
914 ////////////////////////////////////////////////////////////////////////////////
915 /// o.k. you know there are events with negative event weights. This routine will remove
916 /// them by pairing them with the closest event(s) of the same event class with positive
917 /// weights.
918 /// A first attempt is "brute force": I don't try to be clever using search trees etc.,
919 /// just quick and dirty to see if the result is any good.
920 
921 void TMVA::MethodBDT::PreProcessNegativeEventWeights(){
922  Double_t totalNegWeights = 0;
923  Double_t totalPosWeights = 0;
924  Double_t totalWeights = 0;
925  std::vector<const Event*> negEvents;
926  for (UInt_t iev = 0; iev < fEventSample.size(); iev++){
927  if (fEventSample[iev]->GetWeight() < 0) {
928  totalNegWeights += fEventSample[iev]->GetWeight();
929  negEvents.push_back(fEventSample[iev]);
930  } else {
931  totalPosWeights += fEventSample[iev]->GetWeight();
932  }
933  totalWeights += fEventSample[iev]->GetWeight();
934  }
935  if (totalNegWeights == 0 ) {
936  Log() << kINFO << "no negative event weights found .. no preprocessing necessary" << Endl;
937  return;
938  } else {
939  Log() << kINFO << "found a total of " << totalNegWeights << " of negative event weights which I am going to try to pair with positive events to annihilate them" << Endl;
940  Log() << kINFO << "found a total of " << totalPosWeights << " of events with positive weights" << Endl;
941  Log() << kINFO << "--> total sum of weights = " << totalWeights << " = " << totalNegWeights+totalPosWeights << Endl;
942  }
943 
944  std::vector<TMatrixDSym*>* cov = gTools().CalcCovarianceMatrices( fEventSample, 2);
945 
946  TMatrixDSym *invCov;
947 
948  for (Int_t i=0; i<2; i++){
949  invCov = ((*cov)[i]);
950  if ( TMath::Abs(invCov->Determinant()) < 10E-24 ) {
951  std::cout << "<MethodBDT::PreProcessNeg...> matrix is almost singular with determinant="
952  << TMath::Abs(invCov->Determinant())
953  << " did you use the variables that are linear combinations or highly correlated?"
954  << std::endl;
955  }
956  if ( TMath::Abs(invCov->Determinant()) < 10E-120 ) {
957  std::cout << "<MethodBDT::PreProcessNeg...> matrix is singular with determinant="
958  << TMath::Abs(invCov->Determinant())
959  << " did you use the variables that are linear combinations?"
960  << std::endl;
961  }
962 
963  invCov->Invert();
964  }
965 
966 
967 
968  Log() << kINFO << "Found a total of " << totalNegWeights << " in negative weights out of " << fEventSample.size() << " training events " << Endl;
969  Timer timer(negEvents.size(),"Negative Event paired");
970  for (UInt_t nev = 0; nev < negEvents.size(); nev++){
971  timer.DrawProgressBar( nev );
972  Double_t weight = negEvents[nev]->GetWeight();
973  UInt_t iClassID = negEvents[nev]->GetClass();
974  invCov = ((*cov)[iClassID]);
975  while (weight < 0){
976  // find closest event with positive event weight and "pair" it with the negative event
977  // (add their weight) until there is no negative weight anymore
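  // Clarifying note (added, not part of the original file): the "dist" computed
  // in the loop below is the Mahalanobis distance between the negative-weight
  // event and a candidate positive-weight event of the same class, evaluated
  // with the inverted covariance matrix (*invCov) of that class.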
978  Int_t iMin=-1;
979  Double_t dist, minDist=10E270;
980  for (UInt_t iev = 0; iev < fEventSample.size(); iev++){
981  if (iClassID==fEventSample[iev]->GetClass() && fEventSample[iev]->GetWeight() > 0){
982  dist=0;
983  for (UInt_t ivar=0; ivar < GetNvar(); ivar++){
984  for (UInt_t jvar=0; jvar<GetNvar(); jvar++){
985  dist += (negEvents[nev]->GetValue(ivar)-fEventSample[iev]->GetValue(ivar))*
986  (*invCov)[ivar][jvar]*
987  (negEvents[nev]->GetValue(jvar)-fEventSample[iev]->GetValue(jvar));
988  }
989  }
990  if (dist < minDist) { iMin=iev; minDist=dist;}
991  }
992  }
993 
994  if (iMin > -1) {
995  // std::cout << "Happily pairing .. weight before : " << negEvents[nev]->GetWeight() << " and " << fEventSample[iMin]->GetWeight();
996  Double_t newWeight = (negEvents[nev]->GetWeight() + fEventSample[iMin]->GetWeight());
997  if (newWeight > 0){
998  negEvents[nev]->SetBoostWeight( 0 );
999  fEventSample[iMin]->SetBoostWeight( newWeight/fEventSample[iMin]->GetOriginalWeight() ); // note the weight*boostweight should be "newWeight"
1000  } else {
1001  negEvents[nev]->SetBoostWeight( newWeight/negEvents[nev]->GetOriginalWeight() ); // note the weight*boostweight should be "newWeight"
1002  fEventSample[iMin]->SetBoostWeight( 0 );
1003  }
1004  // std::cout << " and afterwards " << negEvents[nev]->GetWeight() << " and the paired " << fEventSample[iMin]->GetWeight() << " dist="<<minDist<< std::endl;
1005  } else Log() << kFATAL << "preprocessing didn't find event to pair with the negative weight ... probably a bug" << Endl;
1006  weight = negEvents[nev]->GetWeight();
1007  }
1008  }
1009  Log() << kINFO << "<Negative Event Pairing> took: " << timer.GetElapsedTime()
1010  << " " << Endl;
1011 
1012  // just check.. now there should be no negative event weight left anymore
1013  totalNegWeights = 0;
1014  totalPosWeights = 0;
1015  totalWeights = 0;
1016  Double_t sigWeight=0;
1017  Double_t bkgWeight=0;
1018  Int_t nSig=0;
1019  Int_t nBkg=0;
1020 
1021  std::vector<const Event*> newEventSample;
1022 
1023  for (UInt_t iev = 0; iev < fEventSample.size(); iev++){
1024  if (fEventSample[iev]->GetWeight() < 0) {
1025  totalNegWeights += fEventSample[iev]->GetWeight();
1026  totalWeights += fEventSample[iev]->GetWeight();
1027  } else {
1028  totalPosWeights += fEventSample[iev]->GetWeight();
1029  totalWeights += fEventSample[iev]->GetWeight();
1030  }
1031  if (fEventSample[iev]->GetWeight() > 0) {
1032  newEventSample.push_back(new Event(*fEventSample[iev]));
1033  if (fEventSample[iev]->GetClass() == fSignalClass){
1034  sigWeight += fEventSample[iev]->GetWeight();
1035  nSig+=1;
1036  }else{
1037  bkgWeight += fEventSample[iev]->GetWeight();
1038  nBkg+=1;
1039  }
1040  }
1041  }
1042  if (totalNegWeights < 0) Log() << kFATAL << " compensation of negative event weights with positive ones did not work " << totalNegWeights << Endl;
1043 
1044  for (UInt_t i=0; i<fEventSample.size(); i++) delete fEventSample[i];
1045  fEventSample = newEventSample;
1046 
1047  Log() << kINFO << " after PreProcessing, the Event sample is left with " << fEventSample.size() << " events (unweighted), all with positive weights, adding up to " << totalWeights << Endl;
1048  Log() << kINFO << " nSig="<<nSig << " sigWeight="<<sigWeight << " nBkg="<<nBkg << " bkgWeight="<<bkgWeight << Endl;
1049 
1050 
1051 }
1052 
1053 //
1054 
1055 ////////////////////////////////////////////////////////////////////////////////
1056 /// call the Optimizer with the set of parameters and ranges that
1057 /// are meant to be tuned.
1058 
1059 std::map<TString,Double_t> TMVA::MethodBDT::OptimizeTuningParameters(TString fomType, TString fitType)
1060 {
1061  // fill all the tuning parameters that should be optimized into a map:
1062  std::map<TString,TMVA::Interval*> tuneParameters;
1063  std::map<TString,Double_t> tunedParameters;
1064 
1065  // note: the 3rd parameter in the interval is the "number of bins", NOT the stepsize !!
1066  // the actual VALUES (at least for the scan, presumably also in GA) are always
1067  // read from the middle of the bins. Hence the choice of Intervals, e.g. for
1068  // MaxDepth, should be made such that they give nice integer values!!!
1069 
1070  // find some reasonable ranges for the optimisation of MinNodeEvents:
1071 
1072  tuneParameters.insert(std::pair<TString,Interval*>("NTrees", new Interval(10,1000,5))); // stepsize 50
1073  tuneParameters.insert(std::pair<TString,Interval*>("MaxDepth", new Interval(2,4,3))); // stepsize 1
1074  tuneParameters.insert(std::pair<TString,Interval*>("MinNodeSize", new LogInterval(1,30,30))); //
1075  //tuneParameters.insert(std::pair<TString,Interval*>("NodePurityLimit",new Interval(.4,.6,3))); // stepsize .1
1076  //tuneParameters.insert(std::pair<TString,Interval*>("BaggedSampleFraction",new Interval(.4,.9,6))); // stepsize .1
1077 
1078  // method-specific parameters
1079  if (fBoostType=="AdaBoost"){
1080  tuneParameters.insert(std::pair<TString,Interval*>("AdaBoostBeta", new Interval(.2,1.,5)));
1081 
1082  }else if (fBoostType=="Grad"){
1083  tuneParameters.insert(std::pair<TString,Interval*>("Shrinkage", new Interval(0.05,0.50,5)));
1084 
1085  }else if (fBoostType=="Bagging" && fRandomisedTrees){
1086  Int_t min_var = TMath::FloorNint( GetNvar() * .25 );
1087  Int_t max_var = TMath::CeilNint( GetNvar() * .75 );
1088  tuneParameters.insert(std::pair<TString,Interval*>("UseNvars", new Interval(min_var,max_var,4)));
1089 
1090  }
1091 
1092  Log()<<kINFO << " the following BDT parameters will be tuned on the respective *grid*\n"<<Endl;
1093  std::map<TString,TMVA::Interval*>::iterator it;
1094  for(it=tuneParameters.begin(); it!= tuneParameters.end(); it++){
1095  Log() << kWARNING << it->first << Endl;
1096  std::ostringstream oss;
1097  (it->second)->Print(oss);
1098  Log()<<oss.str();
1099  Log()<<Endl;
1100  }
1101 
1102  OptimizeConfigParameters optimize(this, tuneParameters, fomType, fitType);
1103  tunedParameters=optimize.optimize();
1104 
1105  return tunedParameters;
1106 
1107 }
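A hedged usage sketch of the tuning interface above; the figure of merit "ROCIntegral" and the fitter "FitGA" are example arguments, and "bdt" is assumed to be a pointer to an already booked and initialised MethodBDT:

// Illustrative only: scan the tuning grid and apply the best parameter set.
std::map<TString, Double_t> tuned = bdt->OptimizeTuningParameters("ROCIntegral", "FitGA");
bdt->SetTuneParameters(tuned);   // uses the mapping implemented in SetTuneParameters just below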
1108 
1109 ////////////////////////////////////////////////////////////////////////////////
1110 /// set the tuning parameters according to the argument
1111 
1112 void TMVA::MethodBDT::SetTuneParameters(std::map<TString,Double_t> tuneParameters)
1113 {
1114  std::map<TString,Double_t>::iterator it;
1115  for(it=tuneParameters.begin(); it!= tuneParameters.end(); it++){
1116  Log() << kWARNING << it->first << " = " << it->second << Endl;
1117  if (it->first == "MaxDepth" ) SetMaxDepth ((Int_t)it->second);
1118  else if (it->first == "MinNodeSize" ) SetMinNodeSize (it->second);
1119  else if (it->first == "NTrees" ) SetNTrees ((Int_t)it->second);
1120  else if (it->first == "NodePurityLimit") SetNodePurityLimit (it->second);
1121  else if (it->first == "AdaBoostBeta" ) SetAdaBoostBeta (it->second);
1122  else if (it->first == "Shrinkage" ) SetShrinkage (it->second);
1123  else if (it->first == "UseNvars" ) SetUseNvars ((Int_t)it->second);
1124  else if (it->first == "BaggedSampleFraction" ) SetBaggedSampleFraction (it->second);
1125  else Log() << kFATAL << " SetParameter for " << it->first << " not yet implemented " <<Endl;
1126  }
1127 
1128 
1129 }
1130 
1131 ////////////////////////////////////////////////////////////////////////////////
1132 /// BDT training
1133 
1134 void TMVA::MethodBDT::Train()
1135 {
1137 
1138  // fill the STL Vector with the event sample
1139  // (needs to be done here and cannot be done in "init" as the options need to be
1140  // known).
1141  InitEventSample();
1142 
1143  if (fNTrees==0){
1144  Log() << kERROR << " Zero Decision Trees demanded... that does not work !! "
1145  << " I set it to 1 .. just so that the program does not crash"
1146  << Endl;
1147  fNTrees = 1;
1148  }
1149 
1151  std::vector<TString> titles = {"Boost weight", "Error Fraction"};
1152  fInteractive->Init(titles);
1153  }
1154  fIPyMaxIter = fNTrees;
1155  fExitFromTraining = false;
1156 
1157  // HHV (it's been here since looong but I really don't know why we cannot handle
1158  // normalized variables in BDTs... todo
1159  if (IsNormalised()) Log() << kFATAL << "\"Normalise\" option cannot be used with BDT; "
1160  << "please remove the option from the configuration string, or "
1161  << "use \"!Normalise\""
1162  << Endl;
1163 
1164  if(DoRegression())
1165  Log() << kINFO << "Regression Loss Function: "<< fRegressionLossFunctionBDTG->Name() << Endl;
1166 
1167  Log() << kINFO << "Training "<< fNTrees << " Decision Trees ... patience please" << Endl;
1168 
1169  Log() << kDEBUG << "Training with maximal depth = " <<fMaxDepth
1170  << ", MinNodeEvents=" << fMinNodeEvents
1171  << ", NTrees="<<fNTrees
1172  << ", NodePurityLimit="<<fNodePurityLimit
1173  << ", AdaBoostBeta="<<fAdaBoostBeta
1174  << Endl;
1175 
1176  // weights applied in boosting
1177  Int_t nBins;
1178  Double_t xMin,xMax;
1179  TString hname = "AdaBoost weight distribution";
1180 
1181  nBins= 100;
1182  xMin = 0;
1183  xMax = 30;
1184 
1185  if (DoRegression()) {
1186  nBins= 100;
1187  xMin = 0;
1188  xMax = 1;
1189  hname="Boost event weights distribution";
1190  }
1191 
1192  // book monitoring histograms (for AdaBoost only)
1193 
1194  TH1* h = new TH1F(Form("%s_BoostWeight",DataInfo().GetName()),hname,nBins,xMin,xMax);
1195  TH1* nodesBeforePruningVsTree = new TH1I(Form("%s_NodesBeforePruning",DataInfo().GetName()),"nodes before pruning",fNTrees,0,fNTrees);
1196  TH1* nodesAfterPruningVsTree = new TH1I(Form("%s_NodesAfterPruning",DataInfo().GetName()),"nodes after pruning",fNTrees,0,fNTrees);
1197 
1198 
1199 
1200  if(!DoMulticlass()){
1201  Results* results = Data()->GetResults(GetMethodName(), Types::kTraining, GetAnalysisType());
1202 
1203  h->SetXTitle("boost weight");
1204  results->Store(h, "BoostWeights");
1205 
1206 
1207  // Monitor the performance (on TEST sample) versus number of trees
1208  if (fDoBoostMonitor){
1209  TH2* boostMonitor = new TH2F("BoostMonitor","ROC Integral Vs iTree",2,0,fNTrees,2,0,1.05);
1210  boostMonitor->SetXTitle("#tree");
1211  boostMonitor->SetYTitle("ROC Integral");
1212  results->Store(boostMonitor, "BoostMonitor");
1213  TGraph *boostMonitorGraph = new TGraph();
1214  boostMonitorGraph->SetName("BoostMonitorGraph");
1215  boostMonitorGraph->SetTitle("ROCIntegralVsNTrees");
1216  results->Store(boostMonitorGraph, "BoostMonitorGraph");
1217  }
1218 
1219  // weights applied in boosting vs tree number
1220  h = new TH1F("BoostWeightVsTree","Boost weights vs tree",fNTrees,0,fNTrees);
1221  h->SetXTitle("#tree");
1222  h->SetYTitle("boost weight");
1223  results->Store(h, "BoostWeightsVsTree");
1224 
1225  // error fraction vs tree number
1226  h = new TH1F("ErrFractHist","error fraction vs tree number",fNTrees,0,fNTrees);
1227  h->SetXTitle("#tree");
1228  h->SetYTitle("error fraction");
1229  results->Store(h, "ErrorFrac");
1230 
1231  // nNodesBeforePruning vs tree number
1232  nodesBeforePruningVsTree->SetXTitle("#tree");
1233  nodesBeforePruningVsTree->SetYTitle("#tree nodes");
1234  results->Store(nodesBeforePruningVsTree);
1235 
1236  // nNodesAfterPruning vs tree number
1237  nodesAfterPruningVsTree->SetXTitle("#tree");
1238  nodesAfterPruningVsTree->SetYTitle("#tree nodes");
1239  results->Store(nodesAfterPruningVsTree);
1240 
1241  }
1242 
1243  fMonitorNtuple= new TTree("MonitorNtuple","BDT variables");
1244  fMonitorNtuple->Branch("iTree",&fITree,"iTree/I");
1245  fMonitorNtuple->Branch("boostWeight",&fBoostWeight,"boostWeight/D");
1246  fMonitorNtuple->Branch("errorFraction",&fErrorFraction,"errorFraction/D");
1247 
1248  Timer timer( fNTrees, GetName() );
1249  Int_t nNodesBeforePruningCount = 0;
1250  Int_t nNodesAfterPruningCount = 0;
1251 
1252  Int_t nNodesBeforePruning = 0;
1253  Int_t nNodesAfterPruning = 0;
1254 
1255 
1256  if(fBoostType=="Grad"){
1258  }
1259 
1260  Int_t itree=0;
1261  Bool_t continueBoost=kTRUE;
1262  //for (int itree=0; itree<fNTrees; itree++) {
1263  while (itree < fNTrees && continueBoost){
1264  if (fExitFromTraining) break;
1265  fIPyCurrentIter = itree;
1266  timer.DrawProgressBar( itree );
1267  // Results* results = Data()->GetResults(GetMethodName(), Types::kTraining, GetAnalysisType());
1268  // TH1 *hxx = new TH1F(Form("swdist%d",itree),Form("swdist%d",itree),10000,0,15);
1269  // results->Store(hxx,Form("swdist%d",itree));
1270  // TH1 *hxy = new TH1F(Form("bwdist%d",itree),Form("bwdist%d",itree),10000,0,15);
1271  // results->Store(hxy,Form("bwdist%d",itree));
1272  // for (Int_t iev=0; iev<fEventSample.size(); iev++) {
1273  // if (fEventSample[iev]->GetClass()!=0) hxy->Fill((fEventSample[iev])->GetWeight());
1274  // else hxx->Fill((fEventSample[iev])->GetWeight());
1275  // }
1276 
1277  if(DoMulticlass()){
1278  if (fBoostType!="Grad"){
1279  Log() << kFATAL << "Multiclass is currently only supported by gradient boost. "
1280  << "Please change boost option accordingly (GradBoost)."
1281  << Endl;
1282  }
1283  UInt_t nClasses = DataInfo().GetNClasses();
1284  for (UInt_t i=0;i<nClasses;i++){
1285  fForest.push_back( new DecisionTree( fSepType, fMinNodeSize, fNCuts, &(DataInfo()), i,
1287  itree*nClasses+i, fNodePurityLimit, itree*nClasses+1));
1288  fForest.back()->SetNVars(GetNvar());
1289  if (fUseFisherCuts) {
1290  fForest.back()->SetUseFisherCuts();
1291  fForest.back()->SetMinLinCorrForFisher(fMinLinCorrForFisher);
1292  fForest.back()->SetUseExclusiveVars(fUseExclusiveVars);
1293  }
1294  // the minimum linear correlation between two variables demanded for use in fisher criterion in node splitting
1295 
1296  nNodesBeforePruning = fForest.back()->BuildTree(*fTrainSample);
1297  Double_t bw = this->Boost(*fTrainSample, fForest.back(),i);
1298  if (bw > 0) {
1299  fBoostWeights.push_back(bw);
1300  }else{
1301  fBoostWeights.push_back(0);
1302  Log() << kWARNING << "stopped boosting at itree="<<itree << Endl;
1303  // fNTrees = itree+1; // that should stop the boosting
1304  continueBoost=kFALSE;
1305  }
1306  }
1307  }
1308  else{
1311  itree, fNodePurityLimit, itree));
1312  fForest.back()->SetNVars(GetNvar());
1313  if (fUseFisherCuts) {
1314  fForest.back()->SetUseFisherCuts();
1315  fForest.back()->SetMinLinCorrForFisher(fMinLinCorrForFisher);
1316  fForest.back()->SetUseExclusiveVars(fUseExclusiveVars);
1317  }
1318 
1319  nNodesBeforePruning = fForest.back()->BuildTree(*fTrainSample);
1320 
1321  if (fUseYesNoLeaf && !DoRegression() && fBoostType!="Grad") { // remove leaf nodes where both daughter nodes are of same type
1322  nNodesBeforePruning = fForest.back()->CleanTree();
1323  }
1324 
1325  nNodesBeforePruningCount += nNodesBeforePruning;
1326  nodesBeforePruningVsTree->SetBinContent(itree+1,nNodesBeforePruning);
1327 
1328  fForest.back()->SetPruneMethod(fPruneMethod); // set the pruning method for the tree
1329  fForest.back()->SetPruneStrength(fPruneStrength); // set the strength parameter
1330 
1331  std::vector<const Event*> * validationSample = NULL;
1332  if(fAutomatic) validationSample = &fValidationSample;
1333 
1334  Double_t bw = this->Boost(*fTrainSample, fForest.back());
1335  if (bw > 0) {
1336  fBoostWeights.push_back(bw);
1337  }else{
1338  fBoostWeights.push_back(0);
1339  Log() << kWARNING << "stopped boosting at itree="<<itree << Endl;
1340  continueBoost=kFALSE;
1341  }
1342 
1343 
1344 
1345  // if fAutomatic == true, pruneStrength will be the optimal pruning strength
1346  // determined by the pruning algorithm; otherwise, it is simply the strength parameter
1347  // set by the user
1348  if (fPruneMethod != DecisionTree::kNoPruning) fForest.back()->PruneTree(validationSample);
1349 
1350  if (fUseYesNoLeaf && !DoRegression() && fBoostType!="Grad"){ // remove leaf nodes where both daughter nodes are of same type
1351  fForest.back()->CleanTree();
1352  }
1353  nNodesAfterPruning = fForest.back()->GetNNodes();
1354  nNodesAfterPruningCount += nNodesAfterPruning;
1355  nodesAfterPruningVsTree->SetBinContent(itree+1,nNodesAfterPruning);
1356 
1357  if (fInteractive){
1359  }
1360  fITree = itree;
1361  fMonitorNtuple->Fill();
1362  if (fDoBoostMonitor){
1363  if (! DoRegression() ){
1364  if ( itree==fNTrees-1 || (!(itree%500)) ||
1365  (!(itree%250) && itree <1000)||
1366  (!(itree%100) && itree < 500)||
1367  (!(itree%50) && itree < 250)||
1368  (!(itree%25) && itree < 150)||
1369  (!(itree%10) && itree < 50)||
1370  (!(itree%5) && itree < 20)
1371  ) BoostMonitor(itree);
1372  }
1373  }
1374  }
1375  itree++;
1376  }
1377 
1378  // get elapsed time
1379  Log() << kDEBUG << "\t<Train> elapsed time: " << timer.GetElapsedTime()
1380  << " " << Endl;
1381  if (fPruneMethod == DecisionTree::kNoPruning) {
1382  Log() << kDEBUG << "\t<Train> average number of nodes (w/o pruning) : "
1383  << nNodesBeforePruningCount/GetNTrees() << Endl;
1384  }
1385  else {
1386  Log() << kDEBUG << "\t<Train> average number of nodes before/after pruning : "
1387  << nNodesBeforePruningCount/GetNTrees() << " / "
1388  << nNodesAfterPruningCount/GetNTrees()
1389  << Endl;
1390  }
1392 
1393 
1394  // reset all previously stored/accumulated BOOST weights in the event sample
1395  // for (UInt_t iev=0; iev<fEventSample.size(); iev++) fEventSample[iev]->SetBoostWeight(1.);
1396  Log() << kDEBUG << "Now I delete the private data sample"<< Endl;
1397  for (UInt_t i=0; i<fEventSample.size(); i++) delete fEventSample[i];
1398  for (UInt_t i=0; i<fValidationSample.size(); i++) delete fValidationSample[i];
1399  fEventSample.clear();
1400  fValidationSample.clear();
1401 
1403  ExitFromTraining();
1404 }
1405 
1406 
1407 ////////////////////////////////////////////////////////////////////////////////
1408 ///Returns the MVA value in the range [-1,1]: -1 for background-like, +1 for signal-like events
1409 
1411 {
1412  Double_t sum=0;
1413  for (UInt_t itree=0; itree<nTrees; itree++) {
1414  //loop over all trees in forest
1415  sum += fForest[itree]->CheckEvent(e,kFALSE);
1416 
1417  }
1418  return 2.0/(1.0+exp(-2.0*sum))-1; //MVA output between -1 and 1
1419 }
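// A minimal standalone sketch (not the TMVA code path) of the mapping used above: the
// per-tree leaf responses are summed and squashed onto (-1,1) via 2/(1+exp(-2x))-1,
// which is just tanh(x). Function and variable names here are illustrative assumptions.
#include <cmath>
#include <numeric>
#include <vector>

double GradBoostScore(const std::vector<double>& leafResponses)
{
   // sum of the leaf responses the event picks up, one per tree in the forest
   const double sum = std::accumulate(leafResponses.begin(), leafResponses.end(), 0.0);
   // squash onto (-1,1): large negative sums -> background-like, large positive -> signal-like
   return 2.0 / (1.0 + std::exp(-2.0 * sum)) - 1.0;   // equivalent to std::tanh(sum)
}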
1420 
1421 ////////////////////////////////////////////////////////////////////////////////
1422 ///Calculate residuals for all events.
1423 
1424 void TMVA::MethodBDT::UpdateTargets(std::vector<const TMVA::Event*>& eventSample, UInt_t cls)
1425 {
1426  if(DoMulticlass()){
1427  UInt_t nClasses = DataInfo().GetNClasses();
1428  for (std::vector<const TMVA::Event*>::iterator e=eventSample.begin(); e!=eventSample.end();e++) {
1429  fResiduals[*e].at(cls)+=fForest.back()->CheckEvent(*e,kFALSE);
1430  if(cls == nClasses-1){
1431  for(UInt_t i=0;i<nClasses;i++){
1432  Double_t norm = 0.0;
1433  for(UInt_t j=0;j<nClasses;j++){
1434  if(i!=j)
1435  norm+=exp(fResiduals[*e].at(j)-fResiduals[*e].at(i));
1436  }
1437  Double_t p_cls = 1.0/(1.0+norm);
1438  Double_t res = ((*e)->GetClass()==i)?(1.0-p_cls):(-p_cls);
1439  const_cast<TMVA::Event*>(*e)->SetTarget(i,res);
1440  }
1441  }
1442  }
1443  }
1444  else{
1445  for (std::vector<const TMVA::Event*>::const_iterator e=eventSample.begin(); e!=eventSample.end();e++) {
1446  fResiduals[*e].at(0)+=fForest.back()->CheckEvent(*e,kFALSE);
1447  Double_t p_sig=1.0/(1.0+exp(-2.0*fResiduals[*e].at(0)));
1448  Double_t res = (DataInfo().IsSignal(*e)?1:0)-p_sig;
1449  const_cast<TMVA::Event*>(*e)->SetTarget(0,res);
1450  }
1451  }
1452 }
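// A standalone sketch (illustrative names, not the TMVA interface) of the multiclass
// pseudo-residual computed above: p_i is the softmax of the accumulated forest scores F_i,
// written as 1/(1+sum_{j!=i} exp(F_j-F_i)), and the new target for class i is
// (I[trueClass==i] - p_i), i.e. the gradient of the multinomial log-likelihood.
#include <cmath>
#include <vector>

std::vector<double> MulticlassResiduals(const std::vector<double>& F, unsigned trueClass)
{
   std::vector<double> res(F.size());
   for (unsigned i = 0; i < F.size(); ++i) {
      double norm = 0.0;                          // sum over the other classes
      for (unsigned j = 0; j < F.size(); ++j)
         if (j != i) norm += std::exp(F[j] - F[i]);
      const double p_i = 1.0 / (1.0 + norm);      // softmax probability of class i
      res[i] = (trueClass == i ? 1.0 : 0.0) - p_i;
   }
   return res;
}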
1453 
1454 ////////////////////////////////////////////////////////////////////////////////
1455 ///Calculate current residuals for all events and update targets for next iteration
1456 
1457 void TMVA::MethodBDT::UpdateTargetsRegression(std::vector<const TMVA::Event*>& eventSample, Bool_t first)
1458 {
1459  if(!first){
1460  for (std::vector<const TMVA::Event*>::const_iterator e=fEventSample.begin(); e!=fEventSample.end();e++) {
1461  fLossFunctionEventInfo[*e].predictedValue += fForest.back()->CheckEvent(*e,kFALSE);
1462  }
1463  }
1464 
1466 }
1467 
1468 ////////////////////////////////////////////////////////////////////////////////
1469 ///Calculate the desired response value for each region
1470 
1471 Double_t TMVA::MethodBDT::GradBoost(std::vector<const TMVA::Event*>& eventSample, DecisionTree *dt, UInt_t cls)
1472 {
1473  std::map<TMVA::DecisionTreeNode*,std::vector<Double_t> > leaves;
1474  for (std::vector<const TMVA::Event*>::const_iterator e=eventSample.begin(); e!=eventSample.end();e++) {
1475  Double_t weight = (*e)->GetWeight();
1476  TMVA::DecisionTreeNode* node = dt->GetEventNode(*(*e));
1477  if ((leaves[node]).empty()){
1478  (leaves[node]).push_back((*e)->GetTarget(cls)* weight);
1479  (leaves[node]).push_back(fabs((*e)->GetTarget(cls))*(1.0-fabs((*e)->GetTarget(cls))) * weight* weight);
1480  }
1481  else {
1482  (leaves[node])[0]+=((*e)->GetTarget(cls)* weight);
1483  (leaves[node])[1]+=fabs((*e)->GetTarget(cls))*(1.0-fabs((*e)->GetTarget(cls))) * weight* weight;
1484  }
1485  }
1486  for (std::map<TMVA::DecisionTreeNode*,std::vector<Double_t> >::iterator iLeave=leaves.begin();
1487  iLeave!=leaves.end();++iLeave){
1488  if ((iLeave->second)[1]<1e-30) (iLeave->second)[1]=1e-30;
1489 
1490  (iLeave->first)->SetResponse(fShrinkage/DataInfo().GetNClasses()*(iLeave->second)[0]/((iLeave->second)[1]));
1491  }
1492 
1493  //call UpdateTargets before next tree is grown
1494 
1496  return 1; //trees all have the same weight
1497 }
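// A hedged, standalone sketch of the per-leaf response set above: a Newton-like step for the
// (multi-)binomial log-likelihood, value = shrinkage/nClasses * sum(t*w) / sum(|t|(1-|t|)*w*w),
// with the same weighting as the loop above. The struct and names are illustrative only.
#include <cmath>
#include <vector>

struct LeafEvent { double target; double weight; };

double GradBoostLeafResponse(const std::vector<LeafEvent>& evts, double shrinkage, unsigned nClasses)
{
   double num = 0.0, den = 0.0;
   for (const LeafEvent& e : evts) {
      num += e.target * e.weight;
      den += std::fabs(e.target) * (1.0 - std::fabs(e.target)) * e.weight * e.weight;
   }
   if (den < 1e-30) den = 1e-30;                 // protect against (nearly) pure leaves
   return shrinkage / nClasses * num / den;      // constant value returned by this leaf
}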
1498 
1499 ////////////////////////////////////////////////////////////////////////////////
1500 /// Implementation of M_TreeBoost using any loss function as described by Friedman 1999
1501 
1502 Double_t TMVA::MethodBDT::GradBoostRegression(std::vector<const TMVA::Event*>& eventSample, DecisionTree *dt )
1503 {
1504  // get the vector of events for each terminal so that we can calculate the constant fit value in each
1505  // terminal node
1506  std::map<TMVA::DecisionTreeNode*,vector< TMVA::LossFunctionEventInfo > > leaves;
1507  for (std::vector<const TMVA::Event*>::const_iterator e=eventSample.begin(); e!=eventSample.end();e++) {
1508  TMVA::DecisionTreeNode* node = dt->GetEventNode(*(*e));
1509  (leaves[node]).push_back(fLossFunctionEventInfo[*e]);
1510  }
1511 
1512  // calculate the constant fit for each terminal node based upon the events in the node
1513  // node (iLeave->first), vector of event information (iLeave->second)
1514  for (std::map<TMVA::DecisionTreeNode*,vector< TMVA::LossFunctionEventInfo > >::iterator iLeave=leaves.begin();
1515  iLeave!=leaves.end();++iLeave){
1516  Double_t fit = fRegressionLossFunctionBDTG->Fit(iLeave->second);
1517  (iLeave->first)->SetResponse(fShrinkage*fit);
1518  }
1519 
1521  return 1;
1522 }
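// A standalone sketch of the "constant fit per terminal node" idea: for a squared-error loss
// the optimal constant is the weighted mean residual of the events in the leaf. TMVA delegates
// the actual fit to fRegressionLossFunctionBDTG->Fit(), whose result depends on the loss
// chosen (e.g. Huber, least squares, absolute deviation); the helper below and its names are
// assumptions for illustration only.
#include <vector>

struct ResidualInfo { double residual; double weight; };

double LeastSquaresLeafFit(const std::vector<ResidualInfo>& leaf)
{
   double sumwr = 0.0, sumw = 0.0;
   for (const ResidualInfo& e : leaf) { sumwr += e.weight * e.residual; sumw += e.weight; }
   return (sumw > 0.0) ? sumwr / sumw : 0.0;     // weighted mean residual in this leaf
}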
1523 
1524 ////////////////////////////////////////////////////////////////////////////////
1525 /// initialize targets for first tree
1526 
1527 void TMVA::MethodBDT::InitGradBoost( std::vector<const TMVA::Event*>& eventSample)
1528 {
1529  // Should get rid of this line. It's just for debugging.
1530  //std::sort(eventSample.begin(), eventSample.end(), [](const TMVA::Event* a, const TMVA::Event* b){
1531  // return (a->GetTarget(0) < b->GetTarget(0)); });
1532  fSepType=NULL; //set fSepType to NULL (regression trees are used for both classification and regression)
1533  if(DoRegression()){
1534  for (std::vector<const TMVA::Event*>::const_iterator e=eventSample.begin(); e!=eventSample.end();e++) {
1535  fLossFunctionEventInfo[*e]= TMVA::LossFunctionEventInfo((*e)->GetTarget(0), 0, (*e)->GetWeight());
1536  }
1537 
1540  return;
1541  }
1542  else if(DoMulticlass()){
1543  UInt_t nClasses = DataInfo().GetNClasses();
1544  for (std::vector<const TMVA::Event*>::const_iterator e=eventSample.begin(); e!=eventSample.end();e++) {
1545  for (UInt_t i=0;i<nClasses;i++){
1546  //Calculate initial residuals, assuming equal probability for all classes
1547  Double_t r = (*e)->GetClass()==i?(1-1.0/nClasses):(-1.0/nClasses);
1548  const_cast<TMVA::Event*>(*e)->SetTarget(i,r);
1549  fResiduals[*e].push_back(0);
1550  }
1551  }
1552  }
1553  else{
1554  for (std::vector<const TMVA::Event*>::const_iterator e=eventSample.begin(); e!=eventSample.end();e++) {
1555  Double_t r = (DataInfo().IsSignal(*e)?1:0)-0.5; //Calculate initial residuals
1556  const_cast<TMVA::Event*>(*e)->SetTarget(0,r);
1557  fResiduals[*e].push_back(0);
1558  }
1559  }
1560 
1561 }
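// Standalone sketch of the initial targets set above, assuming equal prior probability
// for every class: the residual is 1 - 1/nClasses for the true class and -1/nClasses
// otherwise; for two classes this reduces to +0.5 for signal and -0.5 for background.
#include <vector>

std::vector<double> InitialResiduals(unsigned nClasses, unsigned trueClass)
{
   std::vector<double> r(nClasses);
   for (unsigned i = 0; i < nClasses; ++i)
      r[i] = (i == trueClass) ? 1.0 - 1.0 / nClasses : -1.0 / nClasses;
   return r;
}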
1562 ////////////////////////////////////////////////////////////////////////////////
1563 /// Test the tree quality in terms of misclassification on the validation sample
1564 
1566 {
1567  Double_t ncorrect=0, nfalse=0;
1568  for (UInt_t ievt=0; ievt<fValidationSample.size(); ievt++) {
1569  Bool_t isSignalType= (dt->CheckEvent(fValidationSample[ievt]) > fNodePurityLimit ) ? 1 : 0;
1570 
1571  if (isSignalType == (DataInfo().IsSignal(fValidationSample[ievt])) ) {
1572  ncorrect += fValidationSample[ievt]->GetWeight();
1573  }
1574  else{
1575  nfalse += fValidationSample[ievt]->GetWeight();
1576  }
1577  }
1578 
1579  return ncorrect / (ncorrect + nfalse);
1580 }
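// The quantity returned above is the event-weighted fraction of validation events that the
// single tree classifies correctly. A minimal standalone version (illustrative names, not
// the TMVA interface) could look like this:
#include <vector>

struct ValEvent { bool isSignal; double weight; double treeOutput; };

double WeightedAccuracy(const std::vector<ValEvent>& sample, double purityLimit)
{
   double ncorrect = 0.0, nfalse = 0.0;
   for (const ValEvent& e : sample) {
      const bool predictedSignal = (e.treeOutput > purityLimit);
      if (predictedSignal == e.isSignal) ncorrect += e.weight;
      else                               nfalse   += e.weight;
   }
   return ncorrect / (ncorrect + nfalse);
}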
1581 
1582 ////////////////////////////////////////////////////////////////////////////////
1583 /// Apply the boosting algorithm (the algorithm is selected via the "option" given
1584 /// in the constructor). The return value is the boosting weight.
1585 
1586 Double_t TMVA::MethodBDT::Boost( std::vector<const TMVA::Event*>& eventSample, DecisionTree *dt, UInt_t cls )
1587 {
1588  Double_t returnVal=-1;
1589 
1590  if (fBoostType=="AdaBoost") returnVal = this->AdaBoost (eventSample, dt);
1591  else if (fBoostType=="AdaCost") returnVal = this->AdaCost (eventSample, dt);
1592  else if (fBoostType=="Bagging") returnVal = this->Bagging ( );
1593  else if (fBoostType=="RegBoost") returnVal = this->RegBoost (eventSample, dt);
1594  else if (fBoostType=="AdaBoostR2") returnVal = this->AdaBoostR2(eventSample, dt);
1595  else if (fBoostType=="Grad"){
1596  if(DoRegression())
1597  returnVal = this->GradBoostRegression(eventSample, dt);
1598  else if(DoMulticlass())
1599  returnVal = this->GradBoost (eventSample, dt, cls);
1600  else
1601  returnVal = this->GradBoost (eventSample, dt);
1602  }
1603  else {
1604  Log() << kINFO << GetOptions() << Endl;
1605  Log() << kFATAL << "<Boost> unknown boost option " << fBoostType<< " called" << Endl;
1606  }
1607 
1608  if (fBaggedBoost){
1609  GetBaggedSubSample(fEventSample);
1610  }
1611 
1612 
1613  return returnVal;
1614 }
1615 
1616 ////////////////////////////////////////////////////////////////////////////////
1617 /// Fills the ROC-integral-vs-iTree graph for the monitoring plots during the training,
1618 /// using the testing events.
1619 
1621 {
1623 
1624  TH1F *tmpS = new TH1F( "tmpS", "", 100 , -1., 1.00001 );
1625  TH1F *tmpB = new TH1F( "tmpB", "", 100 , -1., 1.00001 );
1626  TH1F *tmp;
1627 
1628 
1629  UInt_t signalClassNr = DataInfo().GetClassInfo("Signal")->GetNumber();
1630 
1631  // const std::vector<Event*> events=Data()->GetEventCollection(Types::kTesting);
1632  // // fMethod->GetTransformationHandler().CalcTransformations(fMethod->Data()->GetEventCollection(Types::kTesting));
1633  // for (UInt_t iev=0; iev < events.size() ; iev++){
1634  // if (events[iev]->GetClass() == signalClassNr) tmp=tmpS;
1635  // else tmp=tmpB;
1636  // tmp->Fill(PrivateGetMvaValue(*(events[iev])),events[iev]->GetWeight());
1637  // }
1638 
1639  UInt_t nevents = Data()->GetNTestEvents();
1640  for (UInt_t iev=0; iev < nevents; iev++){
1641  const Event* event = GetTestingEvent(iev);
1642 
1643  if (event->GetClass() == signalClassNr) {tmp=tmpS;}
1644  else {tmp=tmpB;}
1645  tmp->Fill(PrivateGetMvaValue(event),event->GetWeight());
1646  }
1647  Double_t max=1;
1648 
1649  std::vector<TH1F*> hS;
1650  std::vector<TH1F*> hB;
1651  for (UInt_t ivar=0; ivar<GetNvar(); ivar++){
1652  hS.push_back(new TH1F(Form("SigVar%dAtTree%d",ivar,iTree),Form("SigVar%dAtTree%d",ivar,iTree),100,DataInfo().GetVariableInfo(ivar).GetMin(),DataInfo().GetVariableInfo(ivar).GetMax()));
1653  hB.push_back(new TH1F(Form("BkgVar%dAtTree%d",ivar,iTree),Form("BkgVar%dAtTree%d",ivar,iTree),100,DataInfo().GetVariableInfo(ivar).GetMin(),DataInfo().GetVariableInfo(ivar).GetMax()));
1654  results->Store(hS.back(),hS.back()->GetTitle());
1655  results->Store(hB.back(),hB.back()->GetTitle());
1656  }
1657 
1658 
1659  for (UInt_t iev=0; iev < fEventSample.size(); iev++){
1660  if (fEventSample[iev]->GetBoostWeight() > max) max = 1.01*fEventSample[iev]->GetBoostWeight();
1661  }
1662  TH1F *tmpBoostWeightsS = new TH1F(Form("BoostWeightsInTreeS%d",iTree),Form("BoostWeightsInTreeS%d",iTree),100,0.,max);
1663  TH1F *tmpBoostWeightsB = new TH1F(Form("BoostWeightsInTreeB%d",iTree),Form("BoostWeightsInTreeB%d",iTree),100,0.,max);
1664  results->Store(tmpBoostWeightsS,tmpBoostWeightsS->GetTitle());
1665  results->Store(tmpBoostWeightsB,tmpBoostWeightsB->GetTitle());
1666 
1667  TH1F *tmpBoostWeights;
1668  std::vector<TH1F*> *h;
1669 
1670  for (UInt_t iev=0; iev < fEventSample.size(); iev++){
1671  if (fEventSample[iev]->GetClass() == signalClassNr) {
1672  tmpBoostWeights=tmpBoostWeightsS;
1673  h=&hS;
1674  }else{
1675  tmpBoostWeights=tmpBoostWeightsB;
1676  h=&hB;
1677  }
1678  tmpBoostWeights->Fill(fEventSample[iev]->GetBoostWeight());
1679  for (UInt_t ivar=0; ivar<GetNvar(); ivar++){
1680  (*h)[ivar]->Fill(fEventSample[iev]->GetValue(ivar),fEventSample[iev]->GetWeight());
1681  }
1682  }
1683 
1684 
1685  TMVA::PDF *sig = new TMVA::PDF( " PDF Sig", tmpS, TMVA::PDF::kSpline3 );
1686  TMVA::PDF *bkg = new TMVA::PDF( " PDF Bkg", tmpB, TMVA::PDF::kSpline3 );
1687 
1688 
1689  TGraph* gr=results->GetGraph("BoostMonitorGraph");
1690  Int_t nPoints = gr->GetN();
1691  gr->Set(nPoints+1);
1692  gr->SetPoint(nPoints,(Double_t)iTree+1,GetROCIntegral(sig,bkg));
1693 
1694  tmpS->Delete();
1695  tmpB->Delete();
1696 
1697  delete sig;
1698  delete bkg;
1699 
1700  return;
1701 }
1702 
1703 ////////////////////////////////////////////////////////////////////////////////
1704 /// The AdaBoost implementation.
1705 /// A new training sample is generated by re-weighting the
1706 /// events that were misclassified by the decision tree. The weight
1707 /// applied is w = (1-err)/err or, more generally,
1708 /// w = ((1-err)/err)^beta
1709 /// where err is the fraction of misclassified events in the tree (err < 0.5, assuming
1710 /// that the previous selection was better than random guessing)
1711 /// and "beta" is a free parameter (default: beta = 1) that modifies the
1712 /// strength of the boosting.
1713 
1714 Double_t TMVA::MethodBDT::AdaBoost( std::vector<const TMVA::Event*>& eventSample, DecisionTree *dt )
1715 {
1716  Double_t err=0, sumGlobalw=0, sumGlobalwfalse=0, sumGlobalwfalse2=0;
1717 
1718  std::vector<Double_t> sumw(DataInfo().GetNClasses(),0); //for individually re-scaling each class
1719  std::map<Node*,Int_t> sigEventsInNode; // how many signal events of the training tree
1720 
1721  Double_t maxDev=0;
1722  for (std::vector<const TMVA::Event*>::const_iterator e=eventSample.begin(); e!=eventSample.end();e++) {
1723  Double_t w = (*e)->GetWeight();
1724  sumGlobalw += w;
1725  UInt_t iclass=(*e)->GetClass();
1726  sumw[iclass] += w;
1727 
1728  if ( DoRegression() ) {
1729  Double_t tmpDev = TMath::Abs(dt->CheckEvent(*e,kFALSE) - (*e)->GetTarget(0) );
1730  sumGlobalwfalse += w * tmpDev;
1731  sumGlobalwfalse2 += w * tmpDev*tmpDev;
1732  if (tmpDev > maxDev) maxDev = tmpDev;
1733  }else{
1734 
1735  if (fUseYesNoLeaf){
1736  Bool_t isSignalType = (dt->CheckEvent(*e,fUseYesNoLeaf) > fNodePurityLimit );
1737  if (!(isSignalType == DataInfo().IsSignal(*e))) {
1738  sumGlobalwfalse+= w;
1739  }
1740  }else{
1741  Double_t dtoutput = (dt->CheckEvent(*e,fUseYesNoLeaf) - 0.5)*2.;
1742  Int_t trueType;
1743  if (DataInfo().IsSignal(*e)) trueType = 1;
1744  else trueType = -1;
1745  sumGlobalwfalse+= w*trueType*dtoutput;
1746  }
1747  }
1748  }
1749 
1750  err = sumGlobalwfalse/sumGlobalw ;
1751  if ( DoRegression() ) {
1752  // compute the error, depending on the chosen loss type:
1753  if (fAdaBoostR2Loss=="linear"){
1754  err = sumGlobalwfalse/maxDev/sumGlobalw ;
1755  }
1756  else if (fAdaBoostR2Loss=="quadratic"){
1757  err = sumGlobalwfalse2/maxDev/maxDev/sumGlobalw ;
1758  }
1759  else if (fAdaBoostR2Loss=="exponential"){
1760  err = 0;
1761  for (std::vector<const TMVA::Event*>::const_iterator e=eventSample.begin(); e!=eventSample.end();e++) {
1762  Double_t w = (*e)->GetWeight();
1763  Double_t tmpDev = TMath::Abs(dt->CheckEvent(*e,kFALSE) - (*e)->GetTarget(0) );
1764  err += w * (1 - exp (-tmpDev/maxDev)) / sumGlobalw;
1765  }
1766 
1767  }
1768  else {
1769  Log() << kFATAL << " you've chosen a Loss type for Adaboost other than linear, quadratic or exponential "
1770  << " namely " << fAdaBoostR2Loss << "\n"
1771  << "and this is not implemented... a typo in the options ??" <<Endl;
1772  }
1773  }
1774 
1775  Log() << kDEBUG << "BDT AdaBoost wrong/all: " << sumGlobalwfalse << "/" << sumGlobalw << Endl;
1776 
1777 
1778  Double_t newSumGlobalw=0;
1779  std::vector<Double_t> newSumw(sumw.size(),0);
1780 
1781  Double_t boostWeight=1.;
1782  if (err >= 0.5 && fUseYesNoLeaf) { // sanity check ... should never happen as otherwise there is apparently
1783  // something odd with the assignment of the leaf nodes (rem: you use the training
1784  // events for this determination of the error rate)
1785  if (dt->GetNNodes() == 1){
1786  Log() << kERROR << " YOUR tree has only 1 Node... kind of a funny *tree*. I cannot "
1787  << "boost such a thing... if after 1 step the error rate is == 0.5"
1788  << Endl
1789  << "please check why this happens, maybe too many events per node requested ?"
1790  << Endl;
1791 
1792  }else{
1793  Log() << kERROR << " The error rate in the BDT boosting is > 0.5. ("<< err
1794  << ") That should not happen, please check your code (i.e... the BDT code), I "
1795  << " stop boosting here" << Endl;
1796  return -1;
1797  }
1798  err = 0.5;
1799  } else if (err < 0) {
1800  Log() << kERROR << " The error rate in the BDT boosting is < 0. That can happen"
1801  << " due to improper treatment of negative weights in a Monte Carlo.. (if you have"
1802  << " an idea on how to do it in a better way, please let me know (Helge.Voss@cern.ch)"
1803  << " for the time being I set it to its absolute value.. just to continue.." << Endl;
1804  err = TMath::Abs(err);
1805  }
1806  if (fUseYesNoLeaf)
1807  boostWeight = TMath::Log((1.-err)/err)*fAdaBoostBeta;
1808  else
1809  boostWeight = TMath::Log((1.+err)/(1-err))*fAdaBoostBeta;
1810 
1811 
1812  Log() << kDEBUG << "BDT AdaBoost wrong/all: " << sumGlobalwfalse << "/" << sumGlobalw << " boostWeight="<<boostWeight<< " log.."<<TMath::Log(boostWeight)<<Endl;
1813 
1815 
1816 
1817  for (std::vector<const TMVA::Event*>::const_iterator e=eventSample.begin(); e!=eventSample.end();e++) {
1818 
1819  if (fUseYesNoLeaf||DoRegression()){
1820  if ((!( (dt->CheckEvent(*e,fUseYesNoLeaf) > fNodePurityLimit ) == DataInfo().IsSignal(*e))) || DoRegression()) {
1821  Double_t boostfactor = TMath::Exp(boostWeight);
1822 
1823  if (DoRegression()) boostfactor = TMath::Power(1/boostWeight,(1.-TMath::Abs(dt->CheckEvent(*e,kFALSE) - (*e)->GetTarget(0) )/maxDev ) );
1824  if ( (*e)->GetWeight() > 0 ){
1825  (*e)->SetBoostWeight( (*e)->GetBoostWeight() * boostfactor);
1826  // Helge change back (*e)->ScaleBoostWeight(boostfactor);
1827  if (DoRegression()) results->GetHist("BoostWeights")->Fill(boostfactor);
1828  } else {
1829  if ( fInverseBoostNegWeights )(*e)->ScaleBoostWeight( 1. / boostfactor); // if the original event weight is negative and you want to "increase" the event's "positive" influence, you'd rather make the event weight "smaller" in terms of its absolute value while still keeping it negative
1830  else (*e)->SetBoostWeight( (*e)->GetBoostWeight() * boostfactor);
1831 
1832  }
1833  }
1834 
1835  }else{
1836  Double_t dtoutput = (dt->CheckEvent(*e,fUseYesNoLeaf) - 0.5)*2.;
1837  Int_t trueType;
1838  if (DataInfo().IsSignal(*e)) trueType = 1;
1839  else trueType = -1;
1840  Double_t boostfactor = TMath::Exp(-1*boostWeight*trueType*dtoutput);
1841 
1842  if ( (*e)->GetWeight() > 0 ){
1843  (*e)->SetBoostWeight( (*e)->GetBoostWeight() * boostfactor);
1844  // Helge change back (*e)->ScaleBoostWeight(boostfactor);
1845  if (DoRegression()) results->GetHist("BoostWeights")->Fill(boostfactor);
1846  } else {
1847  if ( fInverseBoostNegWeights )(*e)->ScaleBoostWeight( 1. / boostfactor); // if the original event weight is negative and you want to "increase" the event's "positive" influence, you'd rather make the event weight "smaller" in terms of its absolute value while still keeping it negative
1848  else (*e)->SetBoostWeight( (*e)->GetBoostWeight() * boostfactor);
1849  }
1850  }
1851  newSumGlobalw+=(*e)->GetWeight();
1852  newSumw[(*e)->GetClass()] += (*e)->GetWeight();
1853  }
1854 
1855 
1856  // Double_t globalNormWeight=sumGlobalw/newSumGlobalw;
1857  Double_t globalNormWeight=( (Double_t) eventSample.size())/newSumGlobalw;
1858  Log() << kDEBUG << "new Nsig="<<newSumw[0]*globalNormWeight << " new Nbkg="<<newSumw[1]*globalNormWeight << Endl;
1859 
1860 
1861  for (std::vector<const TMVA::Event*>::const_iterator e=eventSample.begin(); e!=eventSample.end();e++) {
1862  // if (fRenormByClass) (*e)->ScaleBoostWeight( normWeightByClass[(*e)->GetClass()] );
1863  // else (*e)->ScaleBoostWeight( globalNormWeight );
1864  // else (*e)->ScaleBoostWeight( globalNormWeight );
1865  if (DataInfo().IsSignal(*e))(*e)->ScaleBoostWeight( globalNormWeight * fSigToBkgFraction );
1866  else (*e)->ScaleBoostWeight( globalNormWeight );
1867  }
1868 
1869  if (!(DoRegression()))results->GetHist("BoostWeights")->Fill(boostWeight);
1870  results->GetHist("BoostWeightsVsTree")->SetBinContent(fForest.size(),boostWeight);
1871  results->GetHist("ErrorFrac")->SetBinContent(fForest.size(),err);
1872 
1873  fBoostWeight = boostWeight;
1874  fErrorFraction = err;
1875 
1876  return boostWeight;
1877 }
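// A compact, standalone sketch of the yes/no-leaf classification branch of the update above:
// alpha = beta * log((1-err)/err), misclassified events are multiplied by exp(alpha), and the
// weights are renormalised so their sum stays constant. This is the textbook AdaBoost step,
// not the full TMVA code path; it assumes 0 < err < 0.5 and illustrative names.
#include <cmath>
#include <cstddef>
#include <vector>

struct TrainEvent { bool isSignal; double weight; };

double AdaBoostStep(std::vector<TrainEvent>& sample,
                    const std::vector<bool>& predictedSignal, double beta = 1.0)
{
   double sumw = 0.0, sumwWrong = 0.0;
   for (std::size_t i = 0; i < sample.size(); ++i) {
      sumw += sample[i].weight;
      if (predictedSignal[i] != sample[i].isSignal) sumwWrong += sample[i].weight;
   }
   const double err   = sumwWrong / sumw;                      // weighted misclassification rate
   const double alpha = beta * std::log((1.0 - err) / err);    // tree (boost) weight

   double newSumw = 0.0;
   for (std::size_t i = 0; i < sample.size(); ++i) {
      if (predictedSignal[i] != sample[i].isSignal)
         sample[i].weight *= std::exp(alpha);                   // boost the misclassified events
      newSumw += sample[i].weight;
   }
   for (TrainEvent& e : sample) e.weight *= sumw / newSumw;     // keep the total weight fixed
   return alpha;
}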
1878 
1879 
1880 ////////////////////////////////////////////////////////////////////////////////
1881 /// The AdaCost boosting algorithm takes a simple cost matrix (currently fixed for
1882 /// all events; it could later be modified to use an individual cost matrix for each
1883 /// event, as in the original paper):
1884 ///
1885 ///                 true_signal    true_bkg
1886 ///             ----------------------------------
1887 ///  sel_signal |     Css           Ctb_ss          all Cxx in the range [0,1]
1888 ///  sel_bkg    |     Cts_sb        Cbb
1889 ///
1890 /// and takes this into account when calculating the misclassification cost (formerly: error fraction):
1891 ///
1892 ///  err = sum_events ( weight * y_true * y_sel * cost(event) )
1893 ///
1894 
1895 Double_t TMVA::MethodBDT::AdaCost( vector<const TMVA::Event*>& eventSample, DecisionTree *dt )
1896 {
1897  Double_t Css = fCss;
1898  Double_t Cbb = fCbb;
1899  Double_t Cts_sb = fCts_sb;
1900  Double_t Ctb_ss = fCtb_ss;
1901 
1902  Double_t err=0, sumGlobalWeights=0, sumGlobalCost=0;
1903 
1904  std::vector<Double_t> sumw(DataInfo().GetNClasses(),0); //for individually re-scaling each class
1905  std::map<Node*,Int_t> sigEventsInNode; // how many signal events of the training tree
1906 
1907  for (vector<const TMVA::Event*>::const_iterator e=eventSample.begin(); e!=eventSample.end();e++) {
1908  Double_t w = (*e)->GetWeight();
1909  sumGlobalWeights += w;
1910  UInt_t iclass=(*e)->GetClass();
1911 
1912  sumw[iclass] += w;
1913 
1914  if ( DoRegression() ) {
1915  Log() << kFATAL << " AdaCost not implemented for regression"<<Endl;
1916  }else{
1917 
1918  Double_t dtoutput = (dt->CheckEvent(*e,false) - 0.5)*2.;
1919  Int_t trueType;
1920  Bool_t isTrueSignal = DataInfo().IsSignal(*e);
1921  Bool_t isSelectedSignal = (dtoutput>0);
1922  if (isTrueSignal) trueType = 1;
1923  else trueType = -1;
1924 
1925  Double_t cost=0;
1926  if (isTrueSignal && isSelectedSignal) cost=Css;
1927  else if (isTrueSignal && !isSelectedSignal) cost=Cts_sb;
1928  else if (!isTrueSignal && isSelectedSignal) cost=Ctb_ss;
1929  else if (!isTrueSignal && !isSelectedSignal) cost=Cbb;
1930  else Log() << kERROR << "something went wrong in AdaCost" << Endl;
1931 
1932  sumGlobalCost+= w*trueType*dtoutput*cost;
1933 
1934  }
1935  }
1936 
1937  if ( DoRegression() ) {
1938  Log() << kFATAL << " AdaCost not implemented for regression"<<Endl;
1939  }
1940 
1941  // Log() << kDEBUG << "BDT AdaBoos wrong/all: " << sumGlobalCost << "/" << sumGlobalWeights << Endl;
1942  // Log() << kWARNING << "BDT AdaBoos wrong/all: " << sumGlobalCost << "/" << sumGlobalWeights << Endl;
1943  sumGlobalCost /= sumGlobalWeights;
1944  // Log() << kWARNING << "BDT AdaBoos wrong/all: " << sumGlobalCost << "/" << sumGlobalWeights << Endl;
1945 
1946 
1947  Double_t newSumGlobalWeights=0;
1948  vector<Double_t> newSumClassWeights(sumw.size(),0);
1949 
1950  Double_t boostWeight = TMath::Log((1+sumGlobalCost)/(1-sumGlobalCost)) * fAdaBoostBeta;
1951 
1953 
1954  for (vector<const TMVA::Event*>::const_iterator e=eventSample.begin(); e!=eventSample.end();e++) {
1955  Double_t dtoutput = (dt->CheckEvent(*e,false) - 0.5)*2.;
1956  Int_t trueType;
1957  Bool_t isTrueSignal = DataInfo().IsSignal(*e);
1958  Bool_t isSelectedSignal = (dtoutput>0);
1959  if (isTrueSignal) trueType = 1;
1960  else trueType = -1;
1961 
1962  Double_t cost=0;
1963  if (isTrueSignal && isSelectedSignal) cost=Css;
1964  else if (isTrueSignal && !isSelectedSignal) cost=Cts_sb;
1965  else if (!isTrueSignal && isSelectedSignal) cost=Ctb_ss;
1966  else if (!isTrueSignal && !isSelectedSignal) cost=Cbb;
1967  else Log() << kERROR << "something went wrong in AdaCost" << Endl;
1968 
1969  Double_t boostfactor = TMath::Exp(-1*boostWeight*trueType*dtoutput*cost);
1970  if (DoRegression())Log() << kFATAL << " AdaCost not implemented for regression"<<Endl;
1971  if ( (*e)->GetWeight() > 0 ){
1972  (*e)->SetBoostWeight( (*e)->GetBoostWeight() * boostfactor);
1973  // Helge change back (*e)->ScaleBoostWeight(boostfactor);
1974  if (DoRegression())Log() << kFATAL << " AdaCost not implemented for regression"<<Endl;
1975  } else {
1976  if ( fInverseBoostNegWeights )(*e)->ScaleBoostWeight( 1. / boostfactor); // if the original event weight is negative and you want to "increase" the event's "positive" influence, you'd rather make the event weight "smaller" in terms of its absolute value while still keeping it negative
1977  }
1978 
1979  newSumGlobalWeights+=(*e)->GetWeight();
1980  newSumClassWeights[(*e)->GetClass()] += (*e)->GetWeight();
1981  }
1982 
1983 
1984  // Double_t globalNormWeight=sumGlobalWeights/newSumGlobalWeights;
1985  Double_t globalNormWeight=Double_t(eventSample.size())/newSumGlobalWeights;
1986  Log() << kDEBUG << "new Nsig="<<newSumClassWeights[0]*globalNormWeight << " new Nbkg="<<newSumClassWeights[1]*globalNormWeight << Endl;
1987 
1988 
1989  for (std::vector<const TMVA::Event*>::const_iterator e=eventSample.begin(); e!=eventSample.end();e++) {
1990  // if (fRenormByClass) (*e)->ScaleBoostWeight( normWeightByClass[(*e)->GetClass()] );
1991  // else (*e)->ScaleBoostWeight( globalNormWeight );
1992  if (DataInfo().IsSignal(*e))(*e)->ScaleBoostWeight( globalNormWeight * fSigToBkgFraction );
1993  else (*e)->ScaleBoostWeight( globalNormWeight );
1994  }
1995 
1996 
1997  if (!(DoRegression()))results->GetHist("BoostWeights")->Fill(boostWeight);
1998  results->GetHist("BoostWeightsVsTree")->SetBinContent(fForest.size(),boostWeight);
1999  results->GetHist("ErrorFrac")->SetBinContent(fForest.size(),err);
2000 
2001  fBoostWeight = boostWeight;
2002  fErrorFraction = err;
2003 
2004 
2005  return boostWeight;
2006 }
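// A hedged sketch of the AdaCost-style event update used above: the boost factor of an event
// depends both on whether it was classified correctly and on the cost assigned to the
// (true class, selected class) combination, exp(-alpha * y_true * y_sel * cost). For
// simplicity this sketch uses a hard +-1 decision in place of the continuous tree output
// dtoutput; the struct and names are illustrative assumptions.
#include <cmath>

struct CostMatrix { double Css, Cts_sb, Ctb_ss, Cbb; };   // entries expected in [0,1]

double AdaCostBoostFactor(bool isTrueSignal, bool isSelectedSignal,
                          double alpha, const CostMatrix& C)
{
   const int yTrue = isTrueSignal     ?  1 : -1;
   const int ySel  = isSelectedSignal ?  1 : -1;
   double cost;
   if      ( isTrueSignal &&  isSelectedSignal) cost = C.Css;     // correctly selected signal
   else if ( isTrueSignal && !isSelectedSignal) cost = C.Cts_sb;  // signal rejected as background
   else if (!isTrueSignal &&  isSelectedSignal) cost = C.Ctb_ss;  // background selected as signal
   else                                         cost = C.Cbb;     // correctly rejected background
   return std::exp(-alpha * yTrue * ySel * cost);   // > 1 for costly mistakes, < 1 otherwise
}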
2007 
2008 
2009 ////////////////////////////////////////////////////////////////////////////////
2010 /// Call it bootstrapping, re-sampling or whatever you like; in the end it is nothing
2011 /// else but applying "random" Poisson weights to each event.
2012 
2014 {
2015  // this is now done in MethodBDT::Boost as it might be used by other boost methods, too
2016  // GetBaggedSample(eventSample);
2017 
2018  return 1.; //here as there are random weights for each event, just return a constant==1;
2019 }
2020 
2021 ////////////////////////////////////////////////////////////////////////////////
2022 /// Fills fSubSample with, on average, fBaggedSampleFraction*NEvents randomly drawn training events
2023 
2024 void TMVA::MethodBDT::GetBaggedSubSample(std::vector<const TMVA::Event*>& eventSample)
2025 {
2026 
2027  Double_t n;
2028  TRandom3 *trandom = new TRandom3(100*fForest.size()+1234);
2029 
2030  if (!fSubSample.empty()) fSubSample.clear();
2031 
2032  for (std::vector<const TMVA::Event*>::const_iterator e=eventSample.begin(); e!=eventSample.end();e++) {
2033  n = trandom->PoissonD(fBaggedSampleFraction);
2034  for (Int_t i=0;i<n;i++) fSubSample.push_back(*e);
2035  }
2036 
2037  delete trandom;
2038  return;
2039 
2040  /*
2041  UInt_t nevents = fEventSample.size();
2042 
2043  if (!fSubSample.empty()) fSubSample.clear();
2044  TRandom3 *trandom = new TRandom3(fForest.size()+1);
2045 
2046  for (UInt_t ievt=0; ievt<nevents; ievt++) { // recreate new random subsample
2047  if(trandom->Rndm()<fBaggedSampleFraction)
2048  fSubSample.push_back(fEventSample[ievt]);
2049  }
2050  delete trandom;
2051  */
2052 
2053 }
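// Standalone sketch of the Poisson resampling above: each training event enters the bagged
// sub-sample k times, with k drawn from Poisson(sampleFraction), so the sub-sample contains
// on average sampleFraction * N entries. The template and names are illustrative assumptions.
#include <random>
#include <vector>

template <class Event>
std::vector<const Event*> BaggedSubSample(const std::vector<const Event*>& sample,
                                          double sampleFraction, unsigned seed)
{
   std::mt19937 rng(seed);
   std::poisson_distribution<int> pois(sampleFraction);
   std::vector<const Event*> sub;
   for (const Event* e : sample)
      for (int k = pois(rng); k > 0; --k) sub.push_back(e);   // event duplicated k times
   return sub;
}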
2054 
2055 ////////////////////////////////////////////////////////////////////////////////
2056 /// a special boosting only for Regression ...
2057 /// maybe I'll implement it later...
2058 
2059 Double_t TMVA::MethodBDT::RegBoost( std::vector<const TMVA::Event*>& /* eventSample */, DecisionTree* /* dt */ )
2060 {
2061  return 1;
2062 }
2063 
2064 ////////////////////////////////////////////////////////////////////////////////
2065 /// Adaptation of AdaBoost to regression problems (see H. Drucker 1997)
2066 
2067 Double_t TMVA::MethodBDT::AdaBoostR2( std::vector<const TMVA::Event*>& eventSample, DecisionTree *dt )
2068 {
2069  if ( !DoRegression() ) Log() << kFATAL << "Somehow you chose a regression boost method for a classification job" << Endl;
2070 
2071  Double_t err=0, sumw=0, sumwfalse=0, sumwfalse2=0;
2072  Double_t maxDev=0;
2073  for (std::vector<const TMVA::Event*>::const_iterator e=eventSample.begin(); e!=eventSample.end();e++) {
2074  Double_t w = (*e)->GetWeight();
2075  sumw += w;
2076 
2077  Double_t tmpDev = TMath::Abs(dt->CheckEvent(*e,kFALSE) - (*e)->GetTarget(0) );
2078  sumwfalse += w * tmpDev;
2079  sumwfalse2 += w * tmpDev*tmpDev;
2080  if (tmpDev > maxDev) maxDev = tmpDev;
2081  }
2082 
2083  // compute the error, depending on the chosen loss type:
2084  if (fAdaBoostR2Loss=="linear"){
2085  err = sumwfalse/maxDev/sumw ;
2086  }
2087  else if (fAdaBoostR2Loss=="quadratic"){
2088  err = sumwfalse2/maxDev/maxDev/sumw ;
2089  }
2090  else if (fAdaBoostR2Loss=="exponential"){
2091  err = 0;
2092  for (std::vector<const TMVA::Event*>::const_iterator e=eventSample.begin(); e!=eventSample.end();e++) {
2093  Double_t w = (*e)->GetWeight();
2094  Double_t tmpDev = TMath::Abs(dt->CheckEvent(*e,kFALSE) - (*e)->GetTarget(0) );
2095  err += w * (1 - exp (-tmpDev/maxDev)) / sumw;
2096  }
2097 
2098  }
2099  else {
2100  Log() << kFATAL << " you've chosen a Loss type for Adaboost other than linear, quadratic or exponential "
2101  << " namely " << fAdaBoostR2Loss << "\n"
2102  << "and this is not implemented... a typo in the options ??" <<Endl;
2103  }
2104 
2105 
2106  if (err >= 0.5) { // sanity check ... should never happen as otherwise there is apparently
2107  // something odd with the assignment of the leaf nodes (rem: you use the training
2108  // events for this determination of the error rate)
2109  if (dt->GetNNodes() == 1){
2110  Log() << kERROR << " YOUR tree has only 1 Node... kind of a funny *tree*. I cannot "
2111  << "boost such a thing... if after 1 step the error rate is == 0.5"
2112  << Endl
2113  << "please check why this happens, maybe too many events per node requested ?"
2114  << Endl;
2115 
2116  }else{
2117  Log() << kERROR << " The error rate in the BDT boosting is > 0.5. ("<< err
2118  << ") That should not happen, but is possible for regression trees, and"
2119  << " should trigger a stop for the boosting. please check your code (i.e... the BDT code), I "
2120  << " stop boosting " << Endl;
2121  return -1;
2122  }
2123  err = 0.5;
2124  } else if (err < 0) {
2125  Log() << kERROR << " The error rate in the BDT boosting is < 0. That can happen"
2126  << " due to improper treatment of negative weights in a Monte Carlo.. (if you have"
2127  << " an idea on how to do it in a better way, please let me know (Helge.Voss@cern.ch)"
2128  << " for the time being I set it to its absolute value.. just to continue.." << Endl;
2129  err = TMath::Abs(err);
2130  }
2131 
2132  Double_t boostWeight = err / (1.-err);
2133  Double_t newSumw=0;
2134 
2136 
2137  for (std::vector<const TMVA::Event*>::const_iterator e=eventSample.begin(); e!=eventSample.end();e++) {
2138  Double_t boostfactor = TMath::Power(boostWeight,(1.-TMath::Abs(dt->CheckEvent(*e,kFALSE) - (*e)->GetTarget(0) )/maxDev ) );
2139  results->GetHist("BoostWeights")->Fill(boostfactor);
2140  // std::cout << "R2 " << boostfactor << " " << boostWeight << " " << (1.-TMath::Abs(dt->CheckEvent(*e,kFALSE) - (*e)->GetTarget(0) )/maxDev) << std::endl;
2141  if ( (*e)->GetWeight() > 0 ){
2142  Float_t newBoostWeight = (*e)->GetBoostWeight() * boostfactor;
2143  Float_t newWeight = (*e)->GetWeight() * (*e)->GetBoostWeight() * boostfactor;
2144  if (newWeight == 0) {
2145  Log() << kINFO << "Weight= " << (*e)->GetWeight() << Endl;
2146  Log() << kINFO << "BoostWeight= " << (*e)->GetBoostWeight() << Endl;
2147  Log() << kINFO << "boostweight="<<boostWeight << " err= " <<err << Endl;
2148  Log() << kINFO << "NewBoostWeight= " << newBoostWeight << Endl;
2149  Log() << kINFO << "boostfactor= " << boostfactor << Endl;
2150  Log() << kINFO << "maxDev = " << maxDev << Endl;
2151  Log() << kINFO << "tmpDev = " << TMath::Abs(dt->CheckEvent(*e,kFALSE) - (*e)->GetTarget(0) ) << Endl;
2152  Log() << kINFO << "target = " << (*e)->GetTarget(0) << Endl;
2153  Log() << kINFO << "estimate = " << dt->CheckEvent(*e,kFALSE) << Endl;
2154  }
2155  (*e)->SetBoostWeight( newBoostWeight );
2156  // (*e)->SetBoostWeight( (*e)->GetBoostWeight() * boostfactor);
2157  } else {
2158  (*e)->SetBoostWeight( (*e)->GetBoostWeight() / boostfactor);
2159  }
2160  newSumw+=(*e)->GetWeight();
2161  }
2162 
2163  // re-normalise the weights
2164  Double_t normWeight = sumw / newSumw;
2165  for (std::vector<const TMVA::Event*>::const_iterator e=eventSample.begin(); e!=eventSample.end();e++) {
2166  //Helge (*e)->ScaleBoostWeight( sumw/newSumw);
2167  // (*e)->ScaleBoostWeight( normWeight);
2168  (*e)->SetBoostWeight( (*e)->GetBoostWeight() * normWeight );
2169  }
2170 
2171 
2172  results->GetHist("BoostWeightsVsTree")->SetBinContent(fForest.size(),1./boostWeight);
2173  results->GetHist("ErrorFrac")->SetBinContent(fForest.size(),err);
2174 
2175  fBoostWeight = boostWeight;
2176  fErrorFraction = err;
2177 
2178  return TMath::Log(1./boostWeight);
2179 }
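// A hedged sketch of the AdaBoost.R2 step above, for the linear loss: deviations are scaled
// by the largest deviation, the aggregate error gives beta = err/(1-err), and every event is
// re-weighted by beta^(1-L) (well-predicted events, small L, are down-weighted the most);
// the tree weight is log(1/beta). Assumes 0 < err < 0.5 and maxDev > 0; illustrative names.
#include <algorithm>
#include <cmath>
#include <vector>

struct RegEvent { double target, prediction, weight; };

double AdaBoostR2Step(std::vector<RegEvent>& sample)
{
   double maxDev = 0.0, sumw = 0.0, sumwLoss = 0.0;
   for (const RegEvent& e : sample) {
      maxDev = std::max(maxDev, std::fabs(e.prediction - e.target));
      sumw  += e.weight;
   }
   for (const RegEvent& e : sample)
      sumwLoss += e.weight * std::fabs(e.prediction - e.target) / maxDev;   // linear loss
   const double err  = sumwLoss / sumw;
   const double beta = err / (1.0 - err);

   double newSumw = 0.0;
   for (RegEvent& e : sample) {
      const double L = std::fabs(e.prediction - e.target) / maxDev;
      e.weight *= std::pow(beta, 1.0 - L);        // beta < 1: small deviations shrink the weight
      newSumw  += e.weight;
   }
   for (RegEvent& e : sample) e.weight *= sumw / newSumw;   // renormalise the total weight
   return std::log(1.0 / beta);                             // weight of this regression tree
}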
2180 
2181 ////////////////////////////////////////////////////////////////////////////////
2182 /// write weights to XML
2183 
2184 void TMVA::MethodBDT::AddWeightsXMLTo( void* parent ) const
2185 {
2186  void* wght = gTools().AddChild(parent, "Weights");
2187 
2188  if (fDoPreselection){
2189  for (UInt_t ivar=0; ivar<GetNvar(); ivar++){
2190  gTools().AddAttr( wght, Form("PreselectionLowBkgVar%d",ivar), fIsLowBkgCut[ivar]);
2191  gTools().AddAttr( wght, Form("PreselectionLowBkgVar%dValue",ivar), fLowBkgCut[ivar]);
2192  gTools().AddAttr( wght, Form("PreselectionLowSigVar%d",ivar), fIsLowSigCut[ivar]);
2193  gTools().AddAttr( wght, Form("PreselectionLowSigVar%dValue",ivar), fLowSigCut[ivar]);
2194  gTools().AddAttr( wght, Form("PreselectionHighBkgVar%d",ivar), fIsHighBkgCut[ivar]);
2195  gTools().AddAttr( wght, Form("PreselectionHighBkgVar%dValue",ivar),fHighBkgCut[ivar]);
2196  gTools().AddAttr( wght, Form("PreselectionHighSigVar%d",ivar), fIsHighSigCut[ivar]);
2197  gTools().AddAttr( wght, Form("PreselectionHighSigVar%dValue",ivar),fHighSigCut[ivar]);
2198  }
2199  }
2200 
2201 
2202  gTools().AddAttr( wght, "NTrees", fForest.size() );
2203  gTools().AddAttr( wght, "AnalysisType", fForest.back()->GetAnalysisType() );
2204 
2205  for (UInt_t i=0; i< fForest.size(); i++) {
2206  void* trxml = fForest[i]->AddXMLTo(wght);
2207  gTools().AddAttr( trxml, "boostWeight", fBoostWeights[i] );
2208  gTools().AddAttr( trxml, "itree", i );
2209  }
2210 }
2211 
2212 ////////////////////////////////////////////////////////////////////////////////
2213 /// reads the BDT from the xml file
2214 
2216  UInt_t i;
2217  for (i=0; i<fForest.size(); i++) delete fForest[i];
2218  fForest.clear();
2219  fBoostWeights.clear();
2220 
2221  UInt_t ntrees;
2222  UInt_t analysisType;
2223  Float_t boostWeight;
2224 
2225 
2226  if (gTools().HasAttr( parent, Form("PreselectionLowBkgVar%d",0))) {
2227  fIsLowBkgCut.resize(GetNvar());
2228  fLowBkgCut.resize(GetNvar());
2229  fIsLowSigCut.resize(GetNvar());
2230  fLowSigCut.resize(GetNvar());
2231  fIsHighBkgCut.resize(GetNvar());
2232  fHighBkgCut.resize(GetNvar());
2233  fIsHighSigCut.resize(GetNvar());
2234  fHighSigCut.resize(GetNvar());
2235 
2236  Bool_t tmpBool;
2237  Double_t tmpDouble;
2238  for (UInt_t ivar=0; ivar<GetNvar(); ivar++){
2239  gTools().ReadAttr( parent, Form("PreselectionLowBkgVar%d",ivar), tmpBool);
2240  fIsLowBkgCut[ivar]=tmpBool;
2241  gTools().ReadAttr( parent, Form("PreselectionLowBkgVar%dValue",ivar), tmpDouble);
2242  fLowBkgCut[ivar]=tmpDouble;
2243  gTools().ReadAttr( parent, Form("PreselectionLowSigVar%d",ivar), tmpBool);
2244  fIsLowSigCut[ivar]=tmpBool;
2245  gTools().ReadAttr( parent, Form("PreselectionLowSigVar%dValue",ivar), tmpDouble);
2246  fLowSigCut[ivar]=tmpDouble;
2247  gTools().ReadAttr( parent, Form("PreselectionHighBkgVar%d",ivar), tmpBool);
2248  fIsHighBkgCut[ivar]=tmpBool;
2249  gTools().ReadAttr( parent, Form("PreselectionHighBkgVar%dValue",ivar), tmpDouble);
2250  fHighBkgCut[ivar]=tmpDouble;
2251  gTools().ReadAttr( parent, Form("PreselectionHighSigVar%d",ivar),tmpBool);
2252  fIsHighSigCut[ivar]=tmpBool;
2253  gTools().ReadAttr( parent, Form("PreselectionHighSigVar%dValue",ivar), tmpDouble);
2254  fHighSigCut[ivar]=tmpDouble;
2255  }
2256  }
2257 
2258 
2259  gTools().ReadAttr( parent, "NTrees", ntrees );
2260 
2261  if(gTools().HasAttr(parent, "TreeType")) { // pre 4.1.0 version
2262  gTools().ReadAttr( parent, "TreeType", analysisType );
2263  } else { // from 4.1.0 onwards
2264  gTools().ReadAttr( parent, "AnalysisType", analysisType );
2265  }
2266 
2267  void* ch = gTools().GetChild(parent);
2268  i=0;
2269  while(ch) {
2270  fForest.push_back( dynamic_cast<DecisionTree*>( DecisionTree::CreateFromXML(ch, GetTrainingTMVAVersionCode()) ) );
2271  fForest.back()->SetAnalysisType(Types::EAnalysisType(analysisType));
2272  fForest.back()->SetTreeID(i++);
2273  gTools().ReadAttr(ch,"boostWeight",boostWeight);
2274  fBoostWeights.push_back(boostWeight);
2275  ch = gTools().GetNextChild(ch);
2276  }
2277 }
2278 
2279 ////////////////////////////////////////////////////////////////////////////////
2280 /// read the weights (BDT coefficients)
2281 
2282 void TMVA::MethodBDT::ReadWeightsFromStream( std::istream& istr )
2283 {
2284  TString dummy;
2285  // Types::EAnalysisType analysisType;
2286  Int_t analysisType(0);
2287 
2288  // coverity[tainted_data_argument]
2289  istr >> dummy >> fNTrees;
2290  Log() << kINFO << "Read " << fNTrees << " Decision trees" << Endl;
2291 
2292  for (UInt_t i=0;i<fForest.size();i++) delete fForest[i];
2293  fForest.clear();
2294  fBoostWeights.clear();
2295  Int_t iTree;
2296  Double_t boostWeight;
2297  for (int i=0;i<fNTrees;i++) {
2298  istr >> dummy >> iTree >> dummy >> boostWeight;
2299  if (iTree != i) {
2300  fForest.back()->Print( std::cout );
2301  Log() << kFATAL << "Error while reading weight file; mismatch iTree="
2302  << iTree << " i=" << i
2303  << " dummy " << dummy
2304  << " boostweight " << boostWeight
2305  << Endl;
2306  }
2307  fForest.push_back( new DecisionTree() );
2308  fForest.back()->SetAnalysisType(Types::EAnalysisType(analysisType));
2309  fForest.back()->SetTreeID(i);
2310  fForest.back()->Read(istr, GetTrainingTMVAVersionCode());
2311  fBoostWeights.push_back(boostWeight);
2312  }
2313 }
2314 
2315 ////////////////////////////////////////////////////////////////////////////////
2316 
2318  return this->GetMvaValue( err, errUpper, 0 );
2319 }
2320 
2321 ////////////////////////////////////////////////////////////////////////////////
2322 /// Return the MVA value (range [-1;1]) that classifies the
2323 /// event according to the majority vote from the total number of
2324 /// decision trees.
2325 
2327 {
2328  const Event* ev = GetEvent();
2329  if (fDoPreselection) {
2330  Double_t val = ApplyPreselectionCuts(ev);
2331  if (TMath::Abs(val)>0.05) return val;
2332  }
2333  return PrivateGetMvaValue(ev, err, errUpper, useNTrees);
2334 
2335 }
2336 ////////////////////////////////////////////////////////////////////////////////
2337 /// Return the MVA value (range [-1;1]) that classifies the
2338 /// event according to the majority vote from the total number of
2339 /// decision trees.
2340 
2342 {
2343  // cannot determine error
2344  NoErrorCalc(err, errUpper);
2345 
2346  // allow for the possibility to use less trees in the actual MVA calculation
2347  // than have been originally trained.
2348  UInt_t nTrees = fForest.size();
2349 
2350  if (useNTrees > 0 ) nTrees = useNTrees;
2351 
2352  if (fBoostType=="Grad") return GetGradBoostMVA(ev,nTrees);
2353 
2354  Double_t myMVA = 0;
2355  Double_t norm = 0;
2356  for (UInt_t itree=0; itree<nTrees; itree++) {
2357  //
2358  myMVA += fBoostWeights[itree] * fForest[itree]->CheckEvent(ev,fUseYesNoLeaf);
2359  norm += fBoostWeights[itree];
2360  }
2361  return ( norm > std::numeric_limits<double>::epsilon() ) ? myMVA /= norm : 0 ;
2362 }
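// Standalone sketch of the default (non-gradient-boost) response above: the MVA value is the
// boost-weight-weighted average of the individual tree outputs, where each output is either
// +-1 (yes/no leaves) or the leaf purity mapped to [-1,1]. Illustrative names only.
#include <cstddef>
#include <limits>
#include <vector>

double WeightedVote(const std::vector<double>& treeOutput,
                    const std::vector<double>& boostWeight)
{
   double myMVA = 0.0, norm = 0.0;
   for (std::size_t i = 0; i < treeOutput.size(); ++i) {
      myMVA += boostWeight[i] * treeOutput[i];
      norm  += boostWeight[i];
   }
   return (norm > std::numeric_limits<double>::epsilon()) ? myMVA / norm : 0.0;
}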
2363 
2364 
2365 ////////////////////////////////////////////////////////////////////////////////
2366 /// get the multiclass MVA response for the BDT classifier
2367 
2368 const std::vector<Float_t>& TMVA::MethodBDT::GetMulticlassValues()
2369 {
2370  const TMVA::Event *e = GetEvent();
2371  if (fMulticlassReturnVal == NULL) fMulticlassReturnVal = new std::vector<Float_t>();
2372  fMulticlassReturnVal->clear();
2373 
2374  std::vector<double> temp;
2375 
2376  UInt_t nClasses = DataInfo().GetNClasses();
2377  for(UInt_t iClass=0; iClass<nClasses; iClass++){
2378  temp.push_back(0.0);
2379  for(UInt_t itree = iClass; itree<fForest.size(); itree+=nClasses){
2380  temp[iClass] += fForest[itree]->CheckEvent(e,kFALSE);
2381  }
2382  }
2383 
2384  for(UInt_t iClass=0; iClass<nClasses; iClass++){
2385  Double_t norm = 0.0;
2386  for(UInt_t j=0;j<nClasses;j++){
2387  if(iClass!=j)
2388  norm+=exp(temp[j]-temp[iClass]);
2389  }
2390  (*fMulticlassReturnVal).push_back(1.0/(1.0+norm));
2391  }
2392 
2393 
2394  return *fMulticlassReturnVal;
2395 }
2396 
2397 
2398 
2399 
2400 ////////////////////////////////////////////////////////////////////////////////
2401 /// get the regression value generated by the BDTs
2402 
2403 const std::vector<Float_t> & TMVA::MethodBDT::GetRegressionValues()
2404 {
2405 
2406  if (fRegressionReturnVal == NULL) fRegressionReturnVal = new std::vector<Float_t>();
2407  fRegressionReturnVal->clear();
2408 
2409  const Event * ev = GetEvent();
2410  Event * evT = new Event(*ev);
2411 
2412  Double_t myMVA = 0;
2413  Double_t norm = 0;
2414  if (fBoostType=="AdaBoostR2") {
2415  // rather than using the weighted average of the tree responses in the forest,
2416  // H. Drucker (1997) proposed to use the "weighted median"
2417 
2418  // sort all individual tree responses according to the prediction value
2419  // (keep the association to their tree weight)
2420  // then sum up all the associated weights (starting from the one whose tree
2421  // yielded the smallest response) up to the tree "t" at which you've
2422  // added enough tree weights to have more than half of the sum of all tree weights.
2423  // choose as response of the forest that one which belongs to this "t"
2424 
2425  vector< Double_t > response(fForest.size());
2426  vector< Double_t > weight(fForest.size());
2427  Double_t totalSumOfWeights = 0;
2428 
2429  for (UInt_t itree=0; itree<fForest.size(); itree++) {
2430  response[itree] = fForest[itree]->CheckEvent(ev,kFALSE);
2431  weight[itree] = fBoostWeights[itree];
2432  totalSumOfWeights += fBoostWeights[itree];
2433  }
2434 
2435  std::vector< std::vector<Double_t> > vtemp;
2436  vtemp.push_back( response ); // this is the vector that will get sorted
2437  vtemp.push_back( weight );
2438  gTools().UsefulSortAscending( vtemp );
2439 
2440  Int_t t=0;
2441  Double_t sumOfWeights = 0;
2442  while (sumOfWeights <= totalSumOfWeights/2.) {
2443  sumOfWeights += vtemp[1][t];
2444  t++;
2445  }
2446 
2447  Double_t rVal=0;
2448  Int_t count=0;
2449  for (UInt_t i= TMath::Max(UInt_t(0),UInt_t(t-(fForest.size()/6)-0.5));
2450  i< TMath::Min(UInt_t(fForest.size()),UInt_t(t+(fForest.size()/6)+0.5)); i++) {
2451  count++;
2452  rVal+=vtemp[0][i];
2453  }
2454  // fRegressionReturnVal->push_back( rVal/Double_t(count));
2455  evT->SetTarget(0, rVal/Double_t(count) );
2456  }
2457  else if(fBoostType=="Grad"){
2458  for (UInt_t itree=0; itree<fForest.size(); itree++) {
2459  myMVA += fForest[itree]->CheckEvent(ev,kFALSE);
2460  }
2461  // fRegressionReturnVal->push_back( myMVA+fBoostWeights[0]);
2462  evT->SetTarget(0, myMVA+fBoostWeights[0] );
2463  }
2464  else{
2465  for (UInt_t itree=0; itree<fForest.size(); itree++) {
2466  //
2467  myMVA += fBoostWeights[itree] * fForest[itree]->CheckEvent(ev,kFALSE);
2468  norm += fBoostWeights[itree];
2469  }
2470  // fRegressionReturnVal->push_back( ( norm > std::numeric_limits<double>::epsilon() ) ? myMVA /= norm : 0 );
2471  evT->SetTarget(0, ( norm > std::numeric_limits<double>::epsilon() ) ? myMVA /= norm : 0 );
2472  }
2473 
2474 
2475 
2476  const Event* evT2 = GetTransformationHandler().InverseTransform( evT );
2477  fRegressionReturnVal->push_back( evT2->GetTarget(0) );
2478 
2479  delete evT;
2480 
2481 
2482  return *fRegressionReturnVal;
2483 }
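// A hedged sketch of the "weighted median" used above for AdaBoostR2 regression: sort the
// tree responses, accumulate the associated tree weights, and take the response at which half
// of the total weight is exceeded (the code above additionally averages over a window of
// roughly a third of the forest around that position). Illustrative names only.
#include <algorithm>
#include <utility>
#include <vector>

double WeightedMedian(std::vector<std::pair<double,double> > respAndWeight)  // (response, tree weight)
{
   if (respAndWeight.empty()) return 0.0;
   std::sort(respAndWeight.begin(), respAndWeight.end());     // sort by response value
   double total = 0.0;
   for (const std::pair<double,double>& rw : respAndWeight) total += rw.second;
   double running = 0.0;
   for (const std::pair<double,double>& rw : respAndWeight) {
      running += rw.second;
      if (running > total / 2.0) return rw.first;             // first response past half the weight
   }
   return respAndWeight.back().first;
}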
2484 
2485 ////////////////////////////////////////////////////////////////////////////////
2486 /// Here we could write some histograms created during the processing
2487 /// to the output file.
2488 
2490 {
2491  Log() << kDEBUG << "\tWrite monitoring histograms to file: " << BaseDir()->GetPath() << Endl;
2492 
2493  //Results* results = Data()->GetResults(GetMethodName(), Types::kTraining, Types::kMaxAnalysisType);
2494  //results->GetStorage()->Write();
2495  fMonitorNtuple->Write();
2496 }
2497 
2498 ////////////////////////////////////////////////////////////////////////////////
2499 /// Return the relative variable importance, normalized to all
2500 /// variables together having the importance 1. The importance is
2501 /// evaluated as the total separation-gain that this variable had in
2502 /// the decision trees (weighted by the number of events)
2503 
2505 {
2506  fVariableImportance.resize(GetNvar());
2507  for (UInt_t ivar = 0; ivar < GetNvar(); ivar++) {
2508  fVariableImportance[ivar]=0;
2509  }
2510  Double_t sum=0;
2511  for (UInt_t itree = 0; itree < GetNTrees(); itree++) {
2512  std::vector<Double_t> relativeImportance(fForest[itree]->GetVariableImportance());
2513  for (UInt_t i=0; i< relativeImportance.size(); i++) {
2514  fVariableImportance[i] += fBoostWeights[itree] * relativeImportance[i];
2515  }
2516  }
2517 
2518  for (UInt_t ivar=0; ivar< fVariableImportance.size(); ivar++){
2520  sum += fVariableImportance[ivar];
2521  }
2522  for (UInt_t ivar=0; ivar< fVariableImportance.size(); ivar++) fVariableImportance[ivar] /= sum;
2523 
2524  return fVariableImportance;
2525 }
2526 
2527 ////////////////////////////////////////////////////////////////////////////////
2528 /// Returns the measure for the variable importance of variable "ivar"
2529 /// which is later used in GetVariableImportance() to calculate the
2530 /// relative variable importances.
2531 
2533 {
2534  std::vector<Double_t> relativeImportance = this->GetVariableImportance();
2535  if (ivar < (UInt_t)relativeImportance.size()) return relativeImportance[ivar];
2536  else Log() << kFATAL << "<GetVariableImportance> ivar = " << ivar << " is out of range " << Endl;
2537 
2538  return -1;
2539 }
2540 
2541 ////////////////////////////////////////////////////////////////////////////////
2542 /// Compute ranking of input variables
2543 
2545 {
2546  // create the ranking object
2547  fRanking = new Ranking( GetName(), "Variable Importance" );
2548  vector< Double_t> importance(this->GetVariableImportance());
2549 
2550  for (UInt_t ivar=0; ivar<GetNvar(); ivar++) {
2551 
2552  fRanking->AddRank( Rank( GetInputLabel(ivar), importance[ivar] ) );
2553  }
2554 
2555  return fRanking;
2556 }
2557 
2558 ////////////////////////////////////////////////////////////////////////////////
2559 /// Get help message text
2560 ///
2561 /// typical length of text line:
2562 /// "|--------------------------------------------------------------|"
2563 
2565 {
2566  Log() << Endl;
2567  Log() << gTools().Color("bold") << "--- Short description:" << gTools().Color("reset") << Endl;
2568  Log() << Endl;
2569  Log() << "Boosted Decision Trees are a collection of individual decision" << Endl;
2570  Log() << "trees which form a multivariate classifier by (weighted) majority " << Endl;
2571  Log() << "vote of the individual trees. Consecutive decision trees are " << Endl;
2572  Log() << "trained using the original training data set with re-weighted " << Endl;
2573  Log() << "events. By default, the AdaBoost method is employed, which gives " << Endl;
2574  Log() << "events that were misclassified in the previous tree a larger " << Endl;
2575  Log() << "weight in the training of the following tree." << Endl;
2576  Log() << Endl;
2577  Log() << "Decision trees are a sequence of binary splits of the data sample" << Endl;
2578  Log() << "using a single discriminant variable at a time. A test event " << Endl;
2579  Log() << "ending up after the sequence of left-right splits in a final " << Endl;
2580  Log() << "(\"leaf\") node is classified as either signal or background" << Endl;
2581  Log() << "depending on the majority type of training events in that node." << Endl;
2582  Log() << Endl;
2583  Log() << gTools().Color("bold") << "--- Performance optimisation:" << gTools().Color("reset") << Endl;
2584  Log() << Endl;
2585  Log() << "By the nature of the binary splits performed on the individual" << Endl;
2586  Log() << "variables, decision trees do not deal well with linear correlations" << Endl;
2587  Log() << "between variables (they need to approximate the linear split in" << Endl;
2588  Log() << "the two dimensional space by a sequence of splits on the two " << Endl;
2589  Log() << "variables individually). Hence decorrelation could be useful " << Endl;
2590  Log() << "to optimise the BDT performance." << Endl;
2591  Log() << Endl;
2592  Log() << gTools().Color("bold") << "--- Performance tuning via configuration options:" << gTools().Color("reset") << Endl;
2593  Log() << Endl;
2594  Log() << "The two most important parameters in the configuration are the " << Endl;
2595  Log() << "minimal number of events requested by a leaf node as percentage of the " <<Endl;
2596  Log() << " number of training events (option \"MinNodeSize\", replacing the actual number " << Endl;
2597  Log() << " of events \"nEventsMin\" as given in earlier versions)." << Endl;
2598  Log() << "If this number is too large, detailed features " << Endl;
2599  Log() << "in the parameter space are hard to model. If it is too small, " << Endl;
2600  Log() << "the risk of overtraining rises and boosting seems to be less effective. " << Endl;
2601  Log() << " Typical values from our current experience for best performance " << Endl;
2602  Log() << " are between 0.5(%) and 10(%). " << Endl;
2603  Log() << Endl;
2604  Log() << "The default minimal number is currently set to " << Endl;
2605  Log() << " max(20, (N_training_events / N_variables^2 / 10)) " << Endl;
2606  Log() << "and can be changed by the user." << Endl;
2607  Log() << Endl;
2608  Log() << "The other crucial parameter, the pruning strength (\"PruneStrength\")," << Endl;
2609  Log() << "is also related to overtraining. It is a regularisation parameter " << Endl;
2610  Log() << "that is used when determining after the training which splits " << Endl;
2611  Log() << "are considered statistically insignificant and are removed. The" << Endl;
2612  Log() << "user is advised to carefully watch the BDT screen output for" << Endl;
2613  Log() << "the comparison between efficiencies obtained on the training and" << Endl;
2614  Log() << "the independent test sample. They should be equal within statistical" << Endl;
2615  Log() << "errors, in order to minimize statistical fluctuations in different samples." << Endl;
2616 }
2617 
2618 ////////////////////////////////////////////////////////////////////////////////
2619 /// make ROOT-independent C++ class for classifier response (classifier-specific implementation)
2620 
2621 void TMVA::MethodBDT::MakeClassSpecific( std::ostream& fout, const TString& className ) const
2622 {
2623  TString nodeName = className;
2624  nodeName.ReplaceAll("Read","");
2625  nodeName.Append("Node");
2626  // write BDT-specific classifier response
2627  fout << " std::vector<"<<nodeName<<"*> fForest; // i.e. root nodes of decision trees" << std::endl;
2628  fout << " std::vector<double> fBoostWeights; // the weights applied in the individual boosts" << std::endl;
2629  fout << "};" << std::endl << std::endl;
2630  fout << "double " << className << "::GetMvaValue__( const std::vector<double>& inputValues ) const" << std::endl;
2631  fout << "{" << std::endl;
2632  fout << " double myMVA = 0;" << std::endl;
2633  if (fDoPreselection){
2634  for (UInt_t ivar = 0; ivar< fIsLowBkgCut.size(); ivar++){
2635  if (fIsLowBkgCut[ivar]){
2636  fout << " if (inputValues["<<ivar<<"] < " << fLowBkgCut[ivar] << ") return -1; // is background preselection cut" << std::endl;
2637  }
2638  if (fIsLowSigCut[ivar]){
2639  fout << " if (inputValues["<<ivar<<"] < "<< fLowSigCut[ivar] << ") return 1; // is signal preselection cut" << std::endl;
2640  }
2641  if (fIsHighBkgCut[ivar]){
2642  fout << " if (inputValues["<<ivar<<"] > "<<fHighBkgCut[ivar] <<") return -1; // is background preselection cut" << std::endl;
2643  }
2644  if (fIsHighSigCut[ivar]){
2645  fout << " if (inputValues["<<ivar<<"] > "<<fHighSigCut[ivar]<<") return 1; // is signal preselection cut" << std::endl;
2646  }
2647  }
2648  }
2649 
2650  if (fBoostType!="Grad"){
2651  fout << " double norm = 0;" << std::endl;
2652  }
2653  fout << " for (unsigned int itree=0; itree<fForest.size(); itree++){" << std::endl;
2654  fout << " "<<nodeName<<" *current = fForest[itree];" << std::endl;
2655  fout << " while (current->GetNodeType() == 0) { //intermediate node" << std::endl;
2656  fout << " if (current->GoesRight(inputValues)) current=("<<nodeName<<"*)current->GetRight();" << std::endl;
2657  fout << " else current=("<<nodeName<<"*)current->GetLeft();" << std::endl;
2658  fout << " }" << std::endl;
2659  if (fBoostType=="Grad"){
2660  fout << " myMVA += current->GetResponse();" << std::endl;
2661  }else{
2662  if (fUseYesNoLeaf) fout << " myMVA += fBoostWeights[itree] * current->GetNodeType();" << std::endl;
2663  else fout << " myMVA += fBoostWeights[itree] * current->GetPurity();" << std::endl;
2664  fout << " norm += fBoostWeights[itree];" << std::endl;
2665  }
2666  fout << " }" << std::endl;
2667  if (fBoostType=="Grad"){
2668  fout << " return 2.0/(1.0+exp(-2.0*myMVA))-1.0;" << std::endl;
2669  }
2670  else fout << " return myMVA /= norm;" << std::endl;
2671  fout << "};" << std::endl << std::endl;
2672  fout << "void " << className << "::Initialize()" << std::endl;
2673  fout << "{" << std::endl;
2674  //Now for each decision tree, write directly the constructors of the nodes in the tree structure
2675  for (UInt_t itree=0; itree<GetNTrees(); itree++) {
2676  fout << " // itree = " << itree << std::endl;
2677  fout << " fBoostWeights.push_back(" << fBoostWeights[itree] << ");" << std::endl;
2678  fout << " fForest.push_back( " << std::endl;
2679  this->MakeClassInstantiateNode((DecisionTreeNode*)fForest[itree]->GetRoot(), fout, className);
2680  fout <<" );" << std::endl;
2681  }
2682  fout << " return;" << std::endl;
2683  fout << "};" << std::endl;
2684  fout << " " << std::endl;
2685  fout << "// Clean up" << std::endl;
2686  fout << "inline void " << className << "::Clear() " << std::endl;
2687  fout << "{" << std::endl;
2688  fout << " for (unsigned int itree=0; itree<fForest.size(); itree++) { " << std::endl;
2689  fout << " delete fForest[itree]; " << std::endl;
2690  fout << " }" << std::endl;
2691  fout << "}" << std::endl;
2692 }
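
For illustration (this is not part of MethodBDT.cxx; the class name "ReadBDT" and node name "BDTNode" are only assumed here), the response function emitted by MakeClassSpecific() for a plain AdaBoost forest with UseYesNoLeaf would roughly read:

   double ReadBDT::GetMvaValue__( const std::vector<double>& inputValues ) const
   {
      double myMVA = 0;
      double norm = 0;
      for (unsigned int itree=0; itree<fForest.size(); itree++){
         BDTNode *current = fForest[itree];
         while (current->GetNodeType() == 0) { //intermediate node
            if (current->GoesRight(inputValues)) current=(BDTNode*)current->GetRight();
            else current=(BDTNode*)current->GetLeft();
         }
         myMVA += fBoostWeights[itree] * current->GetNodeType();
         norm += fBoostWeights[itree];
      }
      return myMVA /= norm;
   }

For BoostType=Grad the same loop sums current->GetResponse() instead and returns 2.0/(1.0+exp(-2.0*myMVA))-1.0 without the normalisation.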
2693 
2694 ////////////////////////////////////////////////////////////////////////////////
2695 /// specific class header
2696 
2697 void TMVA::MethodBDT::MakeClassSpecificHeader( std::ostream& fout, const TString& className) const
2698 {
2699  TString nodeName = className;
2700  nodeName.ReplaceAll("Read","");
2701  nodeName.Append("Node");
2702  //fout << "#ifndef NN" << std::endl; commented out on purpose see next line
2703  fout << "#define NN new "<<nodeName << std::endl; // NN definition depends on individual methods. Important to have NO #ifndef if several BDT methods compile together
2704  //fout << "#endif" << std::endl; commented out on purpose see previous line
2705  fout << " " << std::endl;
2706  fout << "#ifndef "<<nodeName<<"__def" << std::endl;
2707  fout << "#define "<<nodeName<<"__def" << std::endl;
2708  fout << " " << std::endl;
2709  fout << "class "<<nodeName<<" {" << std::endl;
2710  fout << " " << std::endl;
2711  fout << "public:" << std::endl;
2712  fout << " " << std::endl;
2713  fout << " // constructor of an essentially \"empty\" node floating in space" << std::endl;
2714  fout << " "<<nodeName<<" ( "<<nodeName<<"* left,"<<nodeName<<"* right," << std::endl;
2715  if (fUseFisherCuts){
2716  fout << " int nFisherCoeff," << std::endl;
2717  for (UInt_t i=0;i<GetNVariables()+1;i++){
2718  fout << " double fisherCoeff"<<i<<"," << std::endl;
2719  }
2720  }
2721  fout << " int selector, double cutValue, bool cutType, " << std::endl;
2722  fout << " int nodeType, double purity, double response ) :" << std::endl;
2723  fout << " fLeft ( left )," << std::endl;
2724  fout << " fRight ( right )," << std::endl;
2725  if (fUseFisherCuts) fout << " fNFisherCoeff ( nFisherCoeff )," << std::endl;
2726  fout << " fSelector ( selector )," << std::endl;
2727  fout << " fCutValue ( cutValue )," << std::endl;
2728  fout << " fCutType ( cutType )," << std::endl;
2729  fout << " fNodeType ( nodeType )," << std::endl;
2730  fout << " fPurity ( purity )," << std::endl;
2731  fout << " fResponse ( response ){" << std::endl;
2732  if (fUseFisherCuts){
2733  for (UInt_t i=0;i<GetNVariables()+1;i++){
2734  fout << " fFisherCoeff.push_back(fisherCoeff"<<i<<");" << std::endl;
2735  }
2736  }
2737  fout << " }" << std::endl << std::endl;
2738  fout << " virtual ~"<<nodeName<<"();" << std::endl << std::endl;
2739  fout << " // test event if it descends the tree at this node to the right" << std::endl;
2740  fout << " virtual bool GoesRight( const std::vector<double>& inputValues ) const;" << std::endl;
2741  fout << " "<<nodeName<<"* GetRight( void ) {return fRight; };" << std::endl << std::endl;
2742  fout << " // test event if it descends the tree at this node to the left " << std::endl;
2743  fout << " virtual bool GoesLeft ( const std::vector<double>& inputValues ) const;" << std::endl;
2744  fout << " "<<nodeName<<"* GetLeft( void ) { return fLeft; }; " << std::endl << std::endl;
2745  fout << " // return S/(S+B) (purity) at this node (from training)" << std::endl << std::endl;
2746  fout << " double GetPurity( void ) const { return fPurity; } " << std::endl;
2747  fout << " // return the node type" << std::endl;
2748  fout << " int GetNodeType( void ) const { return fNodeType; }" << std::endl;
2749  fout << " double GetResponse(void) const {return fResponse;}" << std::endl << std::endl;
2750  fout << "private:" << std::endl << std::endl;
2751  fout << " "<<nodeName<<"* fLeft; // pointer to the left daughter node" << std::endl;
2752  fout << " "<<nodeName<<"* fRight; // pointer to the right daughter node" << std::endl;
2753  if (fUseFisherCuts){
2754  fout << " int fNFisherCoeff; // =0 if this node doesn't use Fisher, else =nvar+1 " << std::endl;
2755  fout << " std::vector<double> fFisherCoeff; // the fisher coeff (offset at the last element)" << std::endl;
2756  }
2757  fout << " int fSelector; // index of variable used in node selection (decision tree) " << std::endl;
2758  fout << " double fCutValue; // cut value applied on this node to discriminate bkg against sig" << std::endl;
2759  fout << " bool fCutType; // true: if event variable > cutValue ==> signal, false otherwise" << std::endl;
2760  fout << " int fNodeType; // Type of node: -1 == Bkg-leaf, 1 == Signal-leaf, 0 = internal " << std::endl;
2761  fout << " double fPurity; // Purity of node from training"<< std::endl;
2762  fout << " double fResponse; // Regression response value of node" << std::endl;
2763  fout << "}; " << std::endl;
2764  fout << " " << std::endl;
2765  fout << "//_______________________________________________________________________" << std::endl;
2766  fout << " "<<nodeName<<"::~"<<nodeName<<"()" << std::endl;
2767  fout << "{" << std::endl;
2768  fout << " if (fLeft != NULL) delete fLeft;" << std::endl;
2769  fout << " if (fRight != NULL) delete fRight;" << std::endl;
2770  fout << "}; " << std::endl;
2771  fout << " " << std::endl;
2772  fout << "//_______________________________________________________________________" << std::endl;
2773  fout << "bool "<<nodeName<<"::GoesRight( const std::vector<double>& inputValues ) const" << std::endl;
2774  fout << "{" << std::endl;
2775  fout << " // test event if it descends the tree at this node to the right" << std::endl;
2776  fout << " bool result;" << std::endl;
2777  if (fUseFisherCuts){
2778  fout << " if (fNFisherCoeff == 0){" << std::endl;
2779  fout << " result = (inputValues[fSelector] > fCutValue );" << std::endl;
2780  fout << " }else{" << std::endl;
2781  fout << " double fisher = fFisherCoeff.at(fFisherCoeff.size()-1);" << std::endl;
2782  fout << " for (unsigned int ivar=0; ivar<fFisherCoeff.size()-1; ivar++)" << std::endl;
2783  fout << " fisher += fFisherCoeff.at(ivar)*inputValues.at(ivar);" << std::endl;
2784  fout << " result = fisher > fCutValue;" << std::endl;
2785  fout << " }" << std::endl;
2786  }else{
2787  fout << " result = (inputValues[fSelector] > fCutValue );" << std::endl;
2788  }
2789  fout << " if (fCutType == true) return result; // the cuts are selecting Signal" << std::endl;
2790  fout << " else return !result;" << std::endl;
2791  fout << "}" << std::endl;
2792  fout << " " << std::endl;
2793  fout << "//_______________________________________________________________________" << std::endl;
2794  fout << "bool "<<nodeName<<"::GoesLeft( const std::vector<double>& inputValues ) const" << std::endl;
2795  fout << "{" << std::endl;
2796  fout << " // test event if it descends the tree at this node to the left" << std::endl;
2797  fout << " if (!this->GoesRight(inputValues)) return true;" << std::endl;
2798  fout << " else return false;" << std::endl;
2799  fout << "}" << std::endl;
2800  fout << " " << std::endl;
2801  fout << "#endif" << std::endl;
2802  fout << " " << std::endl;
2803 }
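
Assembled from the streams above (again not part of MethodBDT.cxx; "BDTNode" is an assumed node-class name), the GoesRight() of the generated standalone node class, in the common case without Fisher cuts, looks like:

   bool BDTNode::GoesRight( const std::vector<double>& inputValues ) const
   {
      // test event if it descends the tree at this node to the right
      bool result;
      result = (inputValues[fSelector] > fCutValue );
      if (fCutType == true) return result; // the cuts are selecting Signal
      else return !result;
   }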
2804 
2805 ////////////////////////////////////////////////////////////////////////////////
2806 /// recursively descends a tree and writes the node instance to the output stream
2807 
2808 void TMVA::MethodBDT::MakeClassInstantiateNode( DecisionTreeNode *n, std::ostream& fout, const TString& className ) const
2809 {
2810  if (n == NULL) {
2811  Log() << kFATAL << "MakeClassInstantiateNode: started with undefined node" <<Endl;
2812  return ;
2813  }
2814  fout << "NN("<<std::endl;
2815  if (n->GetLeft() != NULL){
2816  this->MakeClassInstantiateNode( (DecisionTreeNode*)n->GetLeft() , fout, className);
2817  }
2818  else {
2819  fout << "0";
2820  }
2821  fout << ", " <<std::endl;
2822  if (n->GetRight() != NULL){
2823  this->MakeClassInstantiateNode( (DecisionTreeNode*)n->GetRight(), fout, className );
2824  }
2825  else {
2826  fout << "0";
2827  }
2828  fout << ", " << std::endl
2829  << std::setprecision(6);
2830  if (fUseFisherCuts){
2831  fout << n->GetNFisherCoeff() << ", ";
2832  for (UInt_t i=0; i< GetNVariables()+1; i++) {
2833  if (n->GetNFisherCoeff() == 0 ){
2834  fout << "0, ";
2835  }else{
2836  fout << n->GetFisherCoeff(i) << ", ";
2837  }
2838  }
2839  }
2840  fout << n->GetSelector() << ", "
2841  << n->GetCutValue() << ", "
2842  << n->GetCutType() << ", "
2843  << n->GetNodeType() << ", "
2844  << n->GetPurity() << ","
2845  << n->GetResponse() << ") ";
2846 }
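
As a sketch of the output (not part of MethodBDT.cxx; all numerical values are invented for this example), a single depth-1 tree without Fisher cuts would be written into the generated Initialize() roughly as:

   fBoostWeights.push_back(0.46);
   fForest.push_back(
   NN(
   NN( 0, 0, -1, 0, 1, -1, 0.12, -0.31 ),   // left leaf:  left, right, selector, cutValue, cutType, nodeType=-1 (bkg), purity, response
   NN( 0, 0, -1, 0, 1,  1, 0.83,  0.27 ),   // right leaf: nodeType=+1 (signal)
   2, 1.52, 1, 0, 0.55, 0.02 )              // root: cut inputValues[2] > 1.52, nodeType=0 (intermediate)
    );

where NN is the #define from MakeClassSpecificHeader() that expands to "new BDTNode"; the generated file writes the numbers with precision 6 and one argument per line, the layout here is compressed for readability.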
2847 
2848 ////////////////////////////////////////////////////////////////////////////////
2849 /// find useful preselection cuts that will be applied before
2850 /// any Decision Tree training (these cuts are of course also applied
2851 /// in GetMVA: --> -1 for background, +1 for Signal)
2852 
2853 
2854 void TMVA::MethodBDT::DeterminePreselectionCuts(const std::vector<const TMVA::Event*>& eventSample)
2855 {
2856  Double_t nTotS = 0.0, nTotB = 0.0;
2857  Int_t nTotS_unWeighted = 0, nTotB_unWeighted = 0;
2858 
2859  std::vector<TMVA::BDTEventWrapper> bdtEventSample;
2860 
2861  fIsLowSigCut.assign(GetNvar(),kFALSE);
2862  fIsLowBkgCut.assign(GetNvar(),kFALSE);
2863  fIsHighSigCut.assign(GetNvar(),kFALSE);
2864  fIsHighBkgCut.assign(GetNvar(),kFALSE);
2865 
2866  fLowSigCut.assign(GetNvar(),0.); // ---------------| --> in var is signal (accept all above lower cut)
2867  fLowBkgCut.assign(GetNvar(),0.); // ---------------| --> in var is bkg (accept all above lower cut)
2868  fHighSigCut.assign(GetNvar(),0.); // <-- | -------------- in var is signal (accept all below cut)
2869  fHighBkgCut.assign(GetNvar(),0.); // <-- | -------------- in var is bkg (accept all below cut)
2870 
2871 
2872  // Initialize (un)weighted counters for signal & background
2873  // Construct a list of event wrappers that point to the original data
2874  for( std::vector<const TMVA::Event*>::const_iterator it = eventSample.begin(); it != eventSample.end(); ++it ) {
2875  if (DataInfo().IsSignal(*it)){
2876  nTotS += (*it)->GetWeight();
2877  ++nTotS_unWeighted;
2878  }
2879  else {
2880  nTotB += (*it)->GetWeight();
2881  ++nTotB_unWeighted;
2882  }
2883  bdtEventSample.push_back(TMVA::BDTEventWrapper(*it));
2884  }
2885 
2886  for( UInt_t ivar = 0; ivar < GetNvar(); ivar++ ) { // loop over all discriminating variables
2887  TMVA::BDTEventWrapper::SetVarIndex(ivar); // select the variable to sort by
2888  std::sort( bdtEventSample.begin(),bdtEventSample.end() ); // sort the event data
2889 
2890  Double_t bkgWeightCtr = 0.0, sigWeightCtr = 0.0;
2891  std::vector<TMVA::BDTEventWrapper>::iterator it = bdtEventSample.begin(), it_end = bdtEventSample.end();
2892  for( ; it != it_end; ++it ) {
2893  if (DataInfo().IsSignal(**it))
2894  sigWeightCtr += (**it)->GetWeight();
2895  else
2896  bkgWeightCtr += (**it)->GetWeight();
2897  // Store the accumulated signal (background) weights
2898  it->SetCumulativeWeight(false,bkgWeightCtr);
2899  it->SetCumulativeWeight(true,sigWeightCtr);
2900  }
2901 
2902  // variable that determines how "exactly" you cut on the preselection found in the training data. Here I chose
2903  // 1% of the variable range...
2904  Double_t dVal = (DataInfo().GetVariableInfo(ivar).GetMax() - DataInfo().GetVariableInfo(ivar).GetMin())/100. ;
2905  Double_t nSelS, nSelB, effS=0.05, effB=0.05, rejS=0.05, rejB=0.05;
2906  Double_t tmpEffS, tmpEffB, tmpRejS, tmpRejB;
2907  // Locate the optimal cut for this (ivar-th) variable
2908 
2909 
2910 
2911  for(UInt_t iev = 1; iev < bdtEventSample.size(); iev++) {
2912  //dVal = bdtEventSample[iev].GetVal() - bdtEventSample[iev-1].GetVal();
2913 
2914  nSelS = bdtEventSample[iev].GetCumulativeWeight(true);
2915  nSelB = bdtEventSample[iev].GetCumulativeWeight(false);
2916  // look for a 100% efficient pre-selection cut to remove background, i.e. nSelS==0 && nSelB > 5% of nTotB, or (nSelB==0 && nSelS > 5% of nTotS)
2917  tmpEffS=nSelS/nTotS;
2918  tmpEffB=nSelB/nTotB;
2919  tmpRejS=1-tmpEffS;
2920  tmpRejB=1-tmpEffB;
2921  if (nSelS==0 && tmpEffB>effB) {effB=tmpEffB; fLowBkgCut[ivar] = bdtEventSample[iev].GetVal() - dVal; fIsLowBkgCut[ivar]=kTRUE;}
2922  else if (nSelB==0 && tmpEffS>effS) {effS=tmpEffS; fLowSigCut[ivar] = bdtEventSample[iev].GetVal() - dVal; fIsLowSigCut[ivar]=kTRUE;}
2923  else if (nSelB==nTotB && tmpRejS>rejS) {rejS=tmpRejS; fHighSigCut[ivar] = bdtEventSample[iev].GetVal() + dVal; fIsHighSigCut[ivar]=kTRUE;}
2924  else if (nSelS==nTotS && tmpRejB>rejB) {rejB=tmpRejB; fHighBkgCut[ivar] = bdtEventSample[iev].GetVal() + dVal; fIsHighBkgCut[ivar]=kTRUE;}
2925 
2926  }
2927  }
2928 
2929  Log() << kDEBUG << " \tfound and suggest the following possible pre-selection cuts " << Endl;
2930  if (fDoPreselection) Log() << kDEBUG << "\tthe training will be done after these cuts... and GetMVA value returns +1, (-1) for a signal (bkg) event that passes these cuts" << Endl;
2931  else Log() << kDEBUG << "\tas the option DoPreselection was not used, these cuts will not be applied; the training will see the full sample"<<Endl;
2932  for (UInt_t ivar=0; ivar < GetNvar(); ivar++ ) { // loop over all discriminating variables
2933  if (fIsLowBkgCut[ivar]){
2934  Log() << kDEBUG << " \tfound cut: Bkg if var " << ivar << " < " << fLowBkgCut[ivar] << Endl;
2935  }
2936  if (fIsLowSigCut[ivar]){
2937  Log() << kDEBUG << " \tfound cut: Sig if var " << ivar << " < " << fLowSigCut[ivar] << Endl;
2938  }
2939  if (fIsHighBkgCut[ivar]){
2940  Log() << kDEBUG << " \tfound cut: Bkg if var " << ivar << " > " << fHighBkgCut[ivar] << Endl;
2941  }
2942  if (fIsHighSigCut[ivar]){
2943  Log() << kDEBUG << " \tfound cut: Sig if var " << ivar << " > " << fHighSigCut[ivar] << Endl;
2944  }
2945  }
2946 
2947  return;
2948 }
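
The scan above is easiest to see in a stripped-down form. The following is a minimal standalone sketch (not TMVA code; the function and type names are made up) of the "low background cut" case only: accumulate signal and background weight while walking through the sample sorted in one variable, and as long as no signal has been seen, remember the cut that rejects the largest background fraction above the 5% threshold:

   #include <algorithm>
   #include <cstddef>
   #include <vector>

   struct ToyEvt { double val; double weight; bool isSignal; };

   bool findLowBkgCut(std::vector<ToyEvt> evts, double nTotB, double dVal, double& cut)
   {
      if (evts.empty()) return false;
      std::sort(evts.begin(), evts.end(),
                [](const ToyEvt& a, const ToyEvt& b) { return a.val < b.val; });
      double sig = 0., bkg = 0., effB = 0.05;
      bool found = false;
      (evts[0].isSignal ? sig : bkg) += evts[0].weight;      // cumulative weights include the current event
      for (std::size_t iev = 1; iev < evts.size(); ++iev) {
         (evts[iev].isSignal ? sig : bkg) += evts[iev].weight;
         if (sig == 0. && bkg/nTotB > effB) {                // still 100% efficient for signal
            effB  = bkg/nTotB;                               // keep the cut with the largest background rejection
            cut   = evts[iev].val - dVal;                    // place the cut just below this event
            found = true;
         }
      }
      return found;
   }

The low-signal, high-signal and high-background cuts follow the same pattern with the conditions exchanged (nSelB==0, nSelB==nTotB, nSelS==nTotS), as in the four branches of the loop above.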
2949 
2950 ////////////////////////////////////////////////////////////////////////////////
2951 /// apply the preselection cuts before even bothering about any
2952 /// Decision Trees in GetMVA: --> -1 for background, +1 for Signal
2953 
2954 Double_t TMVA::MethodBDT::ApplyPreselectionCuts( const Event* ev)
2955 {
2956  Double_t result=0;
2957 
2958  for (UInt_t ivar=0; ivar < GetNvar(); ivar++ ) { // loop over all discriminating variables
2959  if (fIsLowBkgCut[ivar]){
2960  if (ev->GetValue(ivar) < fLowBkgCut[ivar]) result = -1; // is background
2961  }
2962  if (fIsLowSigCut[ivar]){
2963  if (ev->GetValue(ivar) < fLowSigCut[ivar]) result = 1; // is signal
2964  }
2965  if (fIsHighBkgCut[ivar]){
2966  if (ev->GetValue(ivar) > fHighBkgCut[ivar]) result = -1; // is background
2967  }
2968  if (fIsHighSigCut[ivar]){
2969  if (ev->GetValue(ivar) > fHighSigCut[ivar]) result = 1; // is signal
2970  }
2971  }
2972 
2973  return result;
2974 }
2975 