Logo ROOT  
Reference Guide
MethodLikelihood.cxx
Go to the documentation of this file.
1// @(#)root/tmva $Id$
2// Author: Andreas Hoecker, Joerg Stelzer, Helge Voss, Kai Voss, Eckhard von Toerne, Jan Therhaag
3
4/**********************************************************************************
5 * Project: TMVA - a Root-integrated toolkit for multivariate Data analysis *
6 * Package: TMVA *
7 * Class : TMVA::MethodLikelihood *
8 * Web : http://tmva.sourceforge.net *
9 * *
10 * Description: *
11 * Implementation (see header for description) *
12 * *
13 * Authors (alphabetical): *
14 * Andreas Hoecker <Andreas.Hocker@cern.ch> - CERN, Switzerland *
15 * Helge Voss <Helge.Voss@cern.ch> - MPI-K Heidelberg, Germany *
16 * Kai Voss <Kai.Voss@cern.ch> - U. of Victoria, Canada *
17 * Jan Therhaag <Jan.Therhaag@cern.ch> - U of Bonn, Germany *
18 * Eckhard v. Toerne <evt@uni-bonn.de> - U. of Bonn, Germany *
19 * *
20 * Copyright (c) 2005-2011: *
21 * CERN, Switzerland *
22 * U. of Victoria, Canada *
23 * MPI-K Heidelberg, Germany *
24 * U. of Bonn, Germany *
25 * *
26 * Redistribution and use in source and binary forms, with or without *
27 * modification, are permitted according to the terms listed in LICENSE *
28 * (http://tmva.sourceforge.net/LICENSE) *
29 **********************************************************************************/
30
31/*! \class TMVA::MethodLikelihood
32\ingroup TMVA
33
34Likelihood analysis ("non-parametric approach")
35
36
37Also implemented is a "diagonalized likelihood approach",
38which improves over the uncorrelated likelihood approach by
39transforming linearly the input variables into a diagonal space,
40using the square-root of the covariance matrix
41
42
43The method of maximum likelihood is the most straightforward, and
44certainly among the most elegant multivariate analyser approaches.
45We define the likelihood ratio, \f$ R_L \f$, for event
46\f$ i \f$, by:
47
48\f[
49R_L(i) = \frac{L_S(i)}{L_B(i) + L_B(i)}
50\f]
51
52Here the signal and background likelihoods, \f$ L_S \f$,
53\f$ L_B \f$, are products of the corresponding probability
54densities, \f$ p_S \f$, \f$ p_B \f$, of the
55\f$ N_{var} \f$ discriminating variables used in the MVA:
56
57\f[
58L_S(i) \ \prod_{j=1}^{N_{var}} p_{Sj} (i)
59\f]
60
61and accordingly for \f$ L_B \f$.
62In practise, TMVA uses polynomial splines to estimate the probability
63density functions (PDF) obtained from the distributions of the
64training variables.
65
66
67Note that in TMVA the output of the likelihood ratio is transformed by:
68
69\f[
70R_L(i) \to R'_L(i) = -\frac{1}{\tau} ln(R_L^{-1}(i) -1)
71\f]
72
73to avoid the occurrence of heavy peaks at \f$ R_L = 0.1 \f$ .
74
75#### Decorrelated (or "diagonalized") Likelihood
76
77The biggest drawback of the Likelihood approach is that it assumes
78that the discriminant variables are uncorrelated. If it were the case,
79it can be proven that the discrimination obtained by the above likelihood
80ratio is optimal, ie, no other method can beat it. However, in most
81practical applications of MVAs correlations are present. </p>
82
83
84Linear correlations, measured from the training sample, can be taken
85into account in a straightforward manner through the square-root
86of the covariance matrix. The square-root of a matrix
87\f$ C \f$ is the matrix \f$ C&prime; \f$ that multiplied with itself
88yields \f$ C \f$: \f$ C \f$=\f$ C&prime;C&prime; \f$. We compute the
89square-root matrix (SQM) by means of diagonalising (\f$ D \f$) the
90covariance matrix:
91
92\f[
93D = S^TCS \Rightarrow C' = S \sqrt{DS^T}
94\f]
95
96and the linear transformation of the linearly correlated into the
97uncorrelated variables space is then given by multiplying the measured
98variable tuple by the inverse of the SQM. Note that these transformations
99are performed for both signal and background separately, since the
100correlation pattern is not the same in the two samples.
101
102
103The above diagonalisation is complete for linearly correlated,
104Gaussian distributed variables only. In real-world examples this
105is not often the case, so that only little additional information
106may be recovered by the diagonalisation procedure. In these cases,
107non-linear methods must be applied.
108*/
109
111
112#include "TMVA/Configurable.h"
114#include "TMVA/DataSet.h"
115#include "TMVA/DataSetInfo.h"
116#include "TMVA/IMethod.h"
117#include "TMVA/MethodBase.h"
118#include "TMVA/MsgLogger.h"
119#include "TMVA/PDF.h"
120#include "TMVA/Ranking.h"
121#include "TMVA/Tools.h"
122#include "TMVA/Types.h"
123#include "TMVA/VariableInfo.h"
124
125#include "TMatrixD.h"
126#include "TVector.h"
127#include "TMath.h"
128#include "TFile.h"
129#include "TKey.h"
130#include "TH1.h"
131#include "TClass.h"
132#include "Riostream.h"
133
134#include <iomanip>
135#include <vector>
136#include <cstdlib>
137
138REGISTER_METHOD(Likelihood)
139
141
142////////////////////////////////////////////////////////////////////////////////
143/// standard constructor
144
146 const TString& methodTitle,
147 DataSetInfo& theData,
148 const TString& theOption ) :
149 TMVA::MethodBase( jobName, Types::kLikelihood, methodTitle, theData, theOption),
150 fEpsilon ( 1.e3 * DBL_MIN ),
151 fTransformLikelihoodOutput( kFALSE ),
152 fDropVariable ( 0 ),
153 fHistSig ( 0 ),
154 fHistBgd ( 0 ),
155 fHistSig_smooth( 0 ),
156 fHistBgd_smooth( 0 ),
157 fDefaultPDFLik ( 0 ),
158 fPDFSig ( 0 ),
159 fPDFBgd ( 0 ),
160 fNsmooth ( 2 ),
161 fNsmoothVarS ( 0 ),
162 fNsmoothVarB ( 0 ),
163 fAverageEvtPerBin( 0 ),
164 fAverageEvtPerBinVarS (0),
165 fAverageEvtPerBinVarB (0),
166 fKDEfineFactor ( 0 ),
167 fInterpolateString(0)
168{
169}
170
171////////////////////////////////////////////////////////////////////////////////
172/// construct likelihood references from file
173
175 const TString& theWeightFile) :
176 TMVA::MethodBase( Types::kLikelihood, theData, theWeightFile),
177 fEpsilon ( 1.e3 * DBL_MIN ),
178 fTransformLikelihoodOutput( kFALSE ),
179 fDropVariable ( 0 ),
180 fHistSig ( 0 ),
181 fHistBgd ( 0 ),
182 fHistSig_smooth( 0 ),
183 fHistBgd_smooth( 0 ),
184 fDefaultPDFLik ( 0 ),
185 fPDFSig ( 0 ),
186 fPDFBgd ( 0 ),
187 fNsmooth ( 2 ),
188 fNsmoothVarS ( 0 ),
189 fNsmoothVarB ( 0 ),
190 fAverageEvtPerBin( 0 ),
191 fAverageEvtPerBinVarS (0),
192 fAverageEvtPerBinVarB (0),
193 fKDEfineFactor ( 0 ),
194 fInterpolateString(0)
195{
196}
197
198////////////////////////////////////////////////////////////////////////////////
199/// destructor
200
202{
203 if (NULL != fDefaultPDFLik) delete fDefaultPDFLik;
204 if (NULL != fHistSig) delete fHistSig;
205 if (NULL != fHistBgd) delete fHistBgd;
206 if (NULL != fHistSig_smooth) delete fHistSig_smooth;
207 if (NULL != fHistBgd_smooth) delete fHistBgd_smooth;
208 for (UInt_t ivar=0; ivar<GetNvar(); ivar++) {
209 if ((*fPDFSig)[ivar] !=0) delete (*fPDFSig)[ivar];
210 if ((*fPDFBgd)[ivar] !=0) delete (*fPDFBgd)[ivar];
211 }
212 if (NULL != fPDFSig) delete fPDFSig;
213 if (NULL != fPDFBgd) delete fPDFBgd;
214}
215
216////////////////////////////////////////////////////////////////////////////////
217/// FDA can handle classification with 2 classes
218
220 UInt_t numberClasses, UInt_t /*numberTargets*/ )
221{
222 if (type == Types::kClassification && numberClasses == 2) return kTRUE;
223 return kFALSE;
224}
225
226////////////////////////////////////////////////////////////////////////////////
227/// default initialisation called by all constructors
228
230{
231 // no ranking test
232 fDropVariable = -1;
233 fHistSig = new std::vector<TH1*> ( GetNvar(), (TH1*)0 );
234 fHistBgd = new std::vector<TH1*> ( GetNvar(), (TH1*)0 );
235 fHistSig_smooth = new std::vector<TH1*> ( GetNvar(), (TH1*)0 );
236 fHistBgd_smooth = new std::vector<TH1*> ( GetNvar(), (TH1*)0 );
237 fPDFSig = new std::vector<TMVA::PDF*>( GetNvar(), (TMVA::PDF*)0 );
238 fPDFBgd = new std::vector<TMVA::PDF*>( GetNvar(), (TMVA::PDF*)0 );
239}
240
241////////////////////////////////////////////////////////////////////////////////
242/// define the options (their key words) that can be set in the option string
243///
244/// TransformOutput <bool> transform (often strongly peaked) likelihood output through sigmoid inversion
245
247{
248 DeclareOptionRef( fTransformLikelihoodOutput = kFALSE, "TransformOutput",
249 "Transform likelihood output by inverse sigmoid function" );
250
251 // initialize
252
253 // reading every PDF's definition and passing the option string to the next one to be read and marked
254 TString updatedOptions = GetOptions();
255 fDefaultPDFLik = new PDF( TString(GetName()) + " PDF", updatedOptions );
256 fDefaultPDFLik->DeclareOptions();
257 fDefaultPDFLik->ParseOptions();
258 updatedOptions = fDefaultPDFLik->GetOptions();
259 for (UInt_t ivar = 0; ivar< DataInfo().GetNVariables(); ivar++) {
260 (*fPDFSig)[ivar] = new PDF( Form("%s PDF Sig[%d]", GetName(), ivar), updatedOptions,
261 Form("Sig[%d]",ivar), fDefaultPDFLik );
262 (*fPDFSig)[ivar]->DeclareOptions();
263 (*fPDFSig)[ivar]->ParseOptions();
264 updatedOptions = (*fPDFSig)[ivar]->GetOptions();
265 (*fPDFBgd)[ivar] = new PDF( Form("%s PDF Bkg[%d]", GetName(), ivar), updatedOptions,
266 Form("Bkg[%d]",ivar), fDefaultPDFLik );
267 (*fPDFBgd)[ivar]->DeclareOptions();
268 (*fPDFBgd)[ivar]->ParseOptions();
269 updatedOptions = (*fPDFBgd)[ivar]->GetOptions();
270 }
271
272 // the final marked option string is written back to the original likelihood
273 SetOptions( updatedOptions );
274}
275
276
278{
279 // options that are used ONLY for the READER to ensure backward compatibility
280
282 DeclareOptionRef( fNsmooth = 1, "NSmooth",
283 "Number of smoothing iterations for the input histograms");
284 DeclareOptionRef( fAverageEvtPerBin = 50, "NAvEvtPerBin",
285 "Average number of events per PDF bin");
286 DeclareOptionRef( fKDEfineFactor =1. , "KDEFineFactor",
287 "Fine tuning factor for Adaptive KDE: Factor to multiply the width of the kernel");
288 DeclareOptionRef( fBorderMethodString = "None", "KDEborder",
289 "Border effects treatment (1=no treatment , 2=kernel renormalization, 3=sample mirroring)" );
290 DeclareOptionRef( fKDEiterString = "Nonadaptive", "KDEiter",
291 "Number of iterations (1=non-adaptive, 2=adaptive)" );
292 DeclareOptionRef( fKDEtypeString = "Gauss", "KDEtype",
293 "KDE kernel type (1=Gauss)" );
294 fAverageEvtPerBinVarS = new Int_t[GetNvar()];
295 fAverageEvtPerBinVarB = new Int_t[GetNvar()];
296 fNsmoothVarS = new Int_t[GetNvar()];
297 fNsmoothVarB = new Int_t[GetNvar()];
298 fInterpolateString = new TString[GetNvar()];
299 for(UInt_t i=0; i<GetNvar(); ++i) {
300 fAverageEvtPerBinVarS[i] = fAverageEvtPerBinVarB[i] = 0;
301 fNsmoothVarS[i] = fNsmoothVarB[i] = 0;
302 fInterpolateString[i] = "";
303 }
304 DeclareOptionRef( fAverageEvtPerBinVarS, GetNvar(), "NAvEvtPerBinSig",
305 "Average num of events per PDF bin and variable (signal)");
306 DeclareOptionRef( fAverageEvtPerBinVarB, GetNvar(), "NAvEvtPerBinBkg",
307 "Average num of events per PDF bin and variable (background)");
308 DeclareOptionRef(fNsmoothVarS, GetNvar(), "NSmoothSig",
309 "Number of smoothing iterations for the input histograms");
310 DeclareOptionRef(fNsmoothVarB, GetNvar(), "NSmoothBkg",
311 "Number of smoothing iterations for the input histograms");
312 DeclareOptionRef(fInterpolateString, GetNvar(), "PDFInterpol", "Method of interpolating reference histograms (e.g. Spline2 or KDE)");
313}
314
315////////////////////////////////////////////////////////////////////////////////
316/// process user options
317/// reference cut value to distinguish signal-like from background-like events
318
320{
321 SetSignalReferenceCut( TransformLikelihoodOutput( 0.5, 0.5 ) );
322
323 fDefaultPDFLik->ProcessOptions();
324 for (UInt_t ivar = 0; ivar< DataInfo().GetNVariables(); ivar++) {
325 (*fPDFBgd)[ivar]->ProcessOptions();
326 (*fPDFSig)[ivar]->ProcessOptions();
327 }
328}
329
330////////////////////////////////////////////////////////////////////////////////
331/// create reference distributions (PDFs) from signal and background events:
332/// fill histograms and smooth them; if decorrelation is required, compute
333/// corresponding square-root matrices
334/// the reference histograms require the correct boundaries. Since in Likelihood classification
335/// the transformations are applied using both classes, also the corresponding boundaries
336/// need to take this into account
337
339{
340 UInt_t nvar=GetNvar();
341 std::vector<Double_t> xmin(nvar), xmax(nvar);
342 for (UInt_t ivar=0; ivar<nvar; ivar++) {xmin[ivar]=1e30; xmax[ivar]=-1e30;}
343
344 UInt_t nevents=Data()->GetNEvents();
345 for (UInt_t ievt=0; ievt<nevents; ievt++) {
346 // use the true-event-type's transformation
347 // set the event true event types transformation
348 const Event* origEv = Data()->GetEvent(ievt);
349 if (IgnoreEventsWithNegWeightsInTraining() && origEv->GetWeight()<=0) continue;
350 // loop over classes
351 for (int cls=0;cls<2;cls++){
352 GetTransformationHandler().SetTransformationReferenceClass(cls);
353 const Event* ev = GetTransformationHandler().Transform( origEv );
354 for (UInt_t ivar=0; ivar<nvar; ivar++) {
355 Float_t value = ev->GetValue(ivar);
356 if (value < xmin[ivar]) xmin[ivar] = value;
357 if (value > xmax[ivar]) xmax[ivar] = value;
358 }
359 }
360 }
361
362 // create reference histograms
363 // (KDE smoothing requires very finely binned reference histograms)
364 for (UInt_t ivar=0; ivar<GetNvar(); ivar++) {
365 TString var = (*fInputVars)[ivar];
366
367 // the reference histograms require the correct boundaries. Since in Likelihood classification
368 // the transformations are applied using both classes, also the corresponding boundaries
369 // need to take this into account
370
371 // special treatment for discrete variables
372 if (DataInfo().GetVariableInfo(ivar).GetVarType() == 'I') {
373 // special treatment for integer variables
374 Int_t ixmin = TMath::Nint( xmin[ivar] );
375 xmax[ivar]=xmax[ivar]+1; // make sure that all entries are included in histogram
376 Int_t ixmax = TMath::Nint( xmax[ivar] );
377 Int_t nbins = ixmax - ixmin;
378 (*fHistSig)[ivar] = new TH1F(GetMethodName()+"_"+var + "_sig", var + " signal training", nbins, ixmin, ixmax );
379 (*fHistBgd)[ivar] = new TH1F(GetMethodName()+"_"+var + "_bgd", var + " background training", nbins, ixmin, ixmax );
380 } else {
381
382 UInt_t minNEvt = TMath::Min(Data()->GetNEvtSigTrain(),Data()->GetNEvtBkgdTrain());
383 Int_t nbinsS = (*fPDFSig)[ivar]->GetHistNBins( minNEvt );
384 Int_t nbinsB = (*fPDFBgd)[ivar]->GetHistNBins( minNEvt );
385
386 (*fHistSig)[ivar] = new TH1F( Form("%s_%s_%s_sig",DataInfo().GetName(),GetMethodName().Data(),var.Data()),
387 Form("%s_%s_%s signal training",DataInfo().GetName(),GetMethodName().Data(),var.Data()), nbinsS, xmin[ivar], xmax[ivar] );
388 (*fHistBgd)[ivar] = new TH1F( Form("%s_%s_%s_bgd",DataInfo().GetName(),GetMethodName().Data(),var.Data()),
389 Form("%s_%s_%s background training",DataInfo().GetName(),GetMethodName().Data(),var.Data()), nbinsB, xmin[ivar], xmax[ivar] );
390 }
391 }
392
393 // ----- fill the reference histograms
394 Log() << kINFO << "Filling reference histograms" << Endl;
395
396 // event loop
397 for (Int_t ievt=0; ievt<Data()->GetNEvents(); ievt++) {
398
399 // use the true-event-type's transformation
400 // set the event true event types transformation
401 const Event* origEv = Data()->GetEvent(ievt);
402 if (IgnoreEventsWithNegWeightsInTraining() && origEv->GetWeight()<=0) continue;
403 GetTransformationHandler().SetTransformationReferenceClass( origEv->GetClass() );
404 const Event* ev = GetTransformationHandler().Transform( origEv );
405
406 // the event weight
407 Float_t weight = ev->GetWeight();
408
409 // fill variable vector
410 for (UInt_t ivar=0; ivar<GetNvar(); ivar++) {
411 Double_t value = ev->GetValue(ivar);
412 // verify limits
413 if (value >= xmax[ivar]) value = xmax[ivar] - 1.0e-10;
414 else if (value < xmin[ivar]) value = xmin[ivar] + 1.0e-10;
415 // inserting check if there are events in overflow or underflow
416 if (value >=(*fHistSig)[ivar]->GetXaxis()->GetXmax() ||
417 value <(*fHistSig)[ivar]->GetXaxis()->GetXmin()){
418 Log()<<kWARNING
419 <<"error in filling likelihood reference histograms var="
420 <<(*fInputVars)[ivar]
421 << ", xmin="<<(*fHistSig)[ivar]->GetXaxis()->GetXmin()
422 << ", value="<<value
423 << ", xmax="<<(*fHistSig)[ivar]->GetXaxis()->GetXmax()
424 << Endl;
425 }
426 if (DataInfo().IsSignal(ev)) (*fHistSig)[ivar]->Fill( value, weight );
427 else (*fHistBgd)[ivar]->Fill( value, weight );
428 }
429 }
430
431 // building the pdfs
432 Log() << kINFO << "Building PDF out of reference histograms" << Endl;
433 for (UInt_t ivar=0; ivar<GetNvar(); ivar++) {
434
435 // the PDF is built from (binned) reference histograms
436 // in case of KDE, this has a large number of bins, which makes it quasi-unbinned
437 (*fPDFSig)[ivar]->BuildPDF( (*fHistSig)[ivar] );
438 (*fPDFBgd)[ivar]->BuildPDF( (*fHistBgd)[ivar] );
439
440 (*fPDFSig)[ivar]->ValidatePDF( (*fHistSig)[ivar] );
441 (*fPDFBgd)[ivar]->ValidatePDF( (*fHistBgd)[ivar] );
442
443 // saving the smoothed histograms
444 if ((*fPDFSig)[ivar]->GetSmoothedHist() != 0) (*fHistSig_smooth)[ivar] = (*fPDFSig)[ivar]->GetSmoothedHist();
445 if ((*fPDFBgd)[ivar]->GetSmoothedHist() != 0) (*fHistBgd_smooth)[ivar] = (*fPDFBgd)[ivar]->GetSmoothedHist();
446 }
447 ExitFromTraining();
448}
449
450////////////////////////////////////////////////////////////////////////////////
451/// returns the likelihood estimator for signal
452/// fill a new Likelihood branch into the testTree
453
455{
456 UInt_t ivar;
457
458 // cannot determine error
459 NoErrorCalc(err, errUpper);
460
461 // retrieve variables, and transform, if required
462 TVector vs( GetNvar() );
463 TVector vb( GetNvar() );
464
465 // need to distinguish signal and background in case of variable transformation
466 // signal first
467
468 GetTransformationHandler().SetTransformationReferenceClass( fSignalClass );
469 // temporary: JS --> FIX
470 //GetTransformationHandler().SetTransformationReferenceClass( 0 );
471 const Event* ev = GetEvent();
472 for (ivar=0; ivar<GetNvar(); ivar++) vs(ivar) = ev->GetValue(ivar);
473
474 GetTransformationHandler().SetTransformationReferenceClass( fBackgroundClass );
475 // temporary: JS --> FIX
476 //GetTransformationHandler().SetTransformationReferenceClass( 1 );
477 ev = GetEvent();
478 for (ivar=0; ivar<GetNvar(); ivar++) vb(ivar) = ev->GetValue(ivar);
479
480 // compute the likelihood (signal)
481 Double_t ps(1), pb(1), p(0);
482 for (ivar=0; ivar<GetNvar(); ivar++) {
483
484 // drop one variable (this is ONLY used for internal variable ranking !)
485 if ((Int_t)ivar == fDropVariable) continue;
486
487 Double_t x[2] = { vs(ivar), vb(ivar) };
488
489 for (UInt_t itype=0; itype < 2; itype++) {
490
491 // verify limits
492 if (x[itype] >= (*fPDFSig)[ivar]->GetXmax()) x[itype] = (*fPDFSig)[ivar]->GetXmax() - 1.0e-10;
493 else if (x[itype] < (*fPDFSig)[ivar]->GetXmin()) x[itype] = (*fPDFSig)[ivar]->GetXmin();
494
495 // find corresponding histogram from cached indices
496 PDF* pdf = (itype == 0) ? (*fPDFSig)[ivar] : (*fPDFBgd)[ivar];
497 if (pdf == 0) Log() << kFATAL << "<GetMvaValue> Reference histograms don't exist" << Endl;
498 TH1* hist = pdf->GetPDFHist();
499
500 // interpolate linearly between adjacent bins
501 // this is not useful for discrete variables
502 Int_t bin = hist->FindBin(x[itype]);
503
504 // **** POTENTIAL BUG: PREFORMANCE IS WORSE WHEN USING TRUE TYPE ***
505 // ==> commented out at present
506 if ((*fPDFSig)[ivar]->GetInterpolMethod() == TMVA::PDF::kSpline0 ||
507 DataInfo().GetVariableInfo(ivar).GetVarType() == 'N') {
508 p = TMath::Max( hist->GetBinContent(bin), fEpsilon );
509 } else { // splined PDF
510 Int_t nextbin = bin;
511 if ((x[itype] > hist->GetBinCenter(bin) && bin != hist->GetNbinsX()) || bin == 1)
512 nextbin++;
513 else
514 nextbin--;
515
516
517 Double_t dx = hist->GetBinCenter(bin) - hist->GetBinCenter(nextbin);
518 Double_t dy = hist->GetBinContent(bin) - hist->GetBinContent(nextbin);
519 Double_t like = hist->GetBinContent(bin) + (x[itype] - hist->GetBinCenter(bin)) * dy/dx;
520
521 p = TMath::Max( like, fEpsilon );
522 }
523
524 if (itype == 0) ps *= p;
525 else pb *= p;
526 }
527 }
528
529 // the likelihood ratio (transform it ?)
530 return TransformLikelihoodOutput( ps, pb );
531}
532
533////////////////////////////////////////////////////////////////////////////////
534/// returns transformed or non-transformed output
535
537{
538 if (ps < fEpsilon) ps = fEpsilon;
539 if (pb < fEpsilon) pb = fEpsilon;
540 Double_t r = ps/(ps + pb);
541 if (r >= 1.0) r = 1. - 1.e-15;
542
543 if (fTransformLikelihoodOutput) {
544 // inverse Fermi function
545
546 // sanity check
547 if (r <= 0.0) r = fEpsilon;
548 else if (r >= 1.0) r = 1. - 1.e-15;
549
550 Double_t tau = 15.0;
551 r = - TMath::Log(1.0/r - 1.0)/tau;
552 }
553
554 return r;
555}
556
557////////////////////////////////////////////////////////////////////////////////
558/// write options to stream
559
560void TMVA::MethodLikelihood::WriteOptionsToStream( std::ostream& o, const TString& prefix ) const
561{
563
564 // writing the options defined for the different pdfs
565 if (fDefaultPDFLik != 0) {
566 o << prefix << std::endl << prefix << "#Default Likelihood PDF Options:" << std::endl << prefix << std::endl;
567 fDefaultPDFLik->WriteOptionsToStream( o, prefix );
568 }
569 for (UInt_t ivar = 0; ivar < fPDFSig->size(); ivar++) {
570 if ((*fPDFSig)[ivar] != 0) {
571 o << prefix << std::endl << prefix << Form("#Signal[%d] Likelihood PDF Options:",ivar) << std::endl << prefix << std::endl;
572 (*fPDFSig)[ivar]->WriteOptionsToStream( o, prefix );
573 }
574 if ((*fPDFBgd)[ivar] != 0) {
575 o << prefix << std::endl << prefix << "#Background[%d] Likelihood PDF Options:" << std::endl << prefix << std::endl;
576 (*fPDFBgd)[ivar]->WriteOptionsToStream( o, prefix );
577 }
578 }
579}
580
581////////////////////////////////////////////////////////////////////////////////
582/// write weights to XML
583
585{
586 void* wght = gTools().AddChild(parent, "Weights");
587 gTools().AddAttr(wght, "NVariables", GetNvar());
588 gTools().AddAttr(wght, "NClasses", 2);
589 void* pdfwrap;
590 for (UInt_t ivar=0; ivar<GetNvar(); ivar++) {
591 if ( (*fPDFSig)[ivar]==0 || (*fPDFBgd)[ivar]==0 )
592 Log() << kFATAL << "Reference histograms for variable " << ivar
593 << " don't exist, can't write it to weight file" << Endl;
594 pdfwrap = gTools().AddChild(wght, "PDFDescriptor");
595 gTools().AddAttr(pdfwrap, "VarIndex", ivar);
596 gTools().AddAttr(pdfwrap, "ClassIndex", 0);
597 (*fPDFSig)[ivar]->AddXMLTo(pdfwrap);
598 pdfwrap = gTools().AddChild(wght, "PDFDescriptor");
599 gTools().AddAttr(pdfwrap, "VarIndex", ivar);
600 gTools().AddAttr(pdfwrap, "ClassIndex", 1);
601 (*fPDFBgd)[ivar]->AddXMLTo(pdfwrap);
602 }
603}
604
605////////////////////////////////////////////////////////////////////////////////
606/// computes ranking of input variables
607
609{
610 // create the ranking object
611 if (fRanking) delete fRanking;
612 fRanking = new Ranking( GetName(), "Delta Separation" );
613
614 Double_t sepRef = -1, sep = -1;
615 for (Int_t ivar=-1; ivar<(Int_t)GetNvar(); ivar++) {
616
617 // this variable should not be used
618 fDropVariable = ivar;
619
620 TString nameS = Form( "rS_%i", ivar+1 );
621 TString nameB = Form( "rB_%i", ivar+1 );
622 TH1* rS = new TH1F( nameS, nameS, 80, 0, 1 );
623 TH1* rB = new TH1F( nameB, nameB, 80, 0, 1 );
624
625 // the event loop
626 for (Int_t ievt=0; ievt<Data()->GetNTrainingEvents(); ievt++) {
627
628 const Event* origEv = Data()->GetEvent(ievt);
629 GetTransformationHandler().SetTransformationReferenceClass( origEv->GetClass() );
630 const Event* ev = GetTransformationHandler().Transform(Data()->GetEvent(ievt));
631
632 Double_t lk = this->GetMvaValue();
633 Double_t w = ev->GetWeight();
634 if (DataInfo().IsSignal(ev)) rS->Fill( lk, w );
635 else rB->Fill( lk, w );
636 }
637
638 // compute separation
639 sep = TMVA::gTools().GetSeparation( rS, rB );
640 if (ivar == -1) sepRef = sep;
641 sep = sepRef - sep;
642
643 // don't need these histograms anymore
644 delete rS;
645 delete rB;
646
647 if (ivar >= 0) fRanking->AddRank( Rank( DataInfo().GetVariableInfo(ivar).GetInternalName(), sep ) );
648 }
649
650 fDropVariable = -1;
651
652 return fRanking;
653}
654
655////////////////////////////////////////////////////////////////////////////////
656/// write reference PDFs to ROOT file
657
659{
660 TString pname = "PDF_";
661 for (UInt_t ivar=0; ivar<GetNvar(); ivar++){
662 (*fPDFSig)[ivar]->Write( pname + GetInputVar( ivar ) + "_S" );
663 (*fPDFBgd)[ivar]->Write( pname + GetInputVar( ivar ) + "_B" );
664 }
665}
666
667////////////////////////////////////////////////////////////////////////////////
668/// read weights from XML
669
671{
672 TString pname = "PDF_";
673 Bool_t addDirStatus = TH1::AddDirectoryStatus();
674 TH1::AddDirectory(0); // this avoids the binding of the hists in TMVA::PDF to the current ROOT file
675 UInt_t nvars=0;
676 gTools().ReadAttr(wghtnode, "NVariables",nvars);
677 void* descnode = gTools().GetChild(wghtnode);
678 for (UInt_t ivar=0; ivar<nvars; ivar++){
679 void* pdfnode = gTools().GetChild(descnode);
680 Log() << kDEBUG << "Reading signal and background PDF for variable: " << GetInputVar( ivar ) << Endl;
681 if ((*fPDFSig)[ivar] !=0) delete (*fPDFSig)[ivar];
682 if ((*fPDFBgd)[ivar] !=0) delete (*fPDFBgd)[ivar];
683 (*fPDFSig)[ivar] = new PDF( GetInputVar( ivar ) + " PDF Sig" );
684 (*fPDFBgd)[ivar] = new PDF( GetInputVar( ivar ) + " PDF Bkg" );
685 (*fPDFSig)[ivar]->SetReadingVersion( GetTrainingTMVAVersionCode() );
686 (*fPDFBgd)[ivar]->SetReadingVersion( GetTrainingTMVAVersionCode() );
687 (*(*fPDFSig)[ivar]).ReadXML(pdfnode);
688 descnode = gTools().GetNextChild(descnode);
689 pdfnode = gTools().GetChild(descnode);
690 (*(*fPDFBgd)[ivar]).ReadXML(pdfnode);
691 descnode = gTools().GetNextChild(descnode);
692 }
693 TH1::AddDirectory(addDirStatus);
694}
695
696////////////////////////////////////////////////////////////////////////////////
697/// read weight info from file
698/// nothing to do for this method
699
701{
702 TString pname = "PDF_";
703 Bool_t addDirStatus = TH1::AddDirectoryStatus();
704 TH1::AddDirectory(0); // this avoids the binding of the hists in TMVA::PDF to the current ROOT file
705 for (UInt_t ivar=0; ivar<GetNvar(); ivar++){
706 Log() << kDEBUG << "Reading signal and background PDF for variable: " << GetInputVar( ivar ) << Endl;
707 if ((*fPDFSig)[ivar] !=0) delete (*fPDFSig)[ivar];
708 if ((*fPDFBgd)[ivar] !=0) delete (*fPDFBgd)[ivar];
709 (*fPDFSig)[ivar] = new PDF(GetInputVar( ivar ) + " PDF Sig" );
710 (*fPDFBgd)[ivar] = new PDF(GetInputVar( ivar ) + " PDF Bkg");
711 (*fPDFSig)[ivar]->SetReadingVersion( GetTrainingTMVAVersionCode() );
712 (*fPDFBgd)[ivar]->SetReadingVersion( GetTrainingTMVAVersionCode() );
713 istr >> *(*fPDFSig)[ivar];
714 istr >> *(*fPDFBgd)[ivar];
715 }
716 TH1::AddDirectory(addDirStatus);
717}
718
719////////////////////////////////////////////////////////////////////////////////
720/// read reference PDF from ROOT file
721
723{
724 TString pname = "PDF_";
725 Bool_t addDirStatus = TH1::AddDirectoryStatus();
726 TH1::AddDirectory(0); // this avoids the binding of the hists in TMVA::PDF to the current ROOT file
727 for (UInt_t ivar=0; ivar<GetNvar(); ivar++){
728 (*fPDFSig)[ivar] = (TMVA::PDF*)rf.Get( Form( "PDF_%s_S", GetInputVar( ivar ).Data() ) );
729 (*fPDFBgd)[ivar] = (TMVA::PDF*)rf.Get( Form( "PDF_%s_B", GetInputVar( ivar ).Data() ) );
730 }
731 TH1::AddDirectory(addDirStatus);
732}
733
734////////////////////////////////////////////////////////////////////////////////
735/// write histograms and PDFs to file for monitoring purposes
736
738{
739 Log() << kINFO << "Write monitoring histograms to file: " << BaseDir()->GetPath() << Endl;
740 BaseDir()->cd();
741
742 for (UInt_t ivar=0; ivar<GetNvar(); ivar++) {
743 (*fHistSig)[ivar]->Write();
744 (*fHistBgd)[ivar]->Write();
745 if ((*fHistSig_smooth)[ivar] != 0) (*fHistSig_smooth)[ivar]->Write();
746 if ((*fHistBgd_smooth)[ivar] != 0) (*fHistBgd_smooth)[ivar]->Write();
747 (*fPDFSig)[ivar]->GetPDFHist()->Write();
748 (*fPDFBgd)[ivar]->GetPDFHist()->Write();
749
750 if ((*fPDFSig)[ivar]->GetNSmoothHist() != 0) (*fPDFSig)[ivar]->GetNSmoothHist()->Write();
751 if ((*fPDFBgd)[ivar]->GetNSmoothHist() != 0) (*fPDFBgd)[ivar]->GetNSmoothHist()->Write();
752
753 // add special plots to check the smoothing in the GetVal method
754 Float_t xmin=((*fPDFSig)[ivar]->GetPDFHist()->GetXaxis())->GetXmin();
755 Float_t xmax=((*fPDFSig)[ivar]->GetPDFHist()->GetXaxis())->GetXmax();
756 TH1F* mm = new TH1F( (*fInputVars)[ivar]+"_additional_check",
757 (*fInputVars)[ivar]+"_additional_check", 15000, xmin, xmax );
758 Double_t intBin = (xmax-xmin)/15000;
759 for (Int_t bin=0; bin < 15000; bin++) {
760 Double_t x = (bin + 0.5)*intBin + xmin;
761 mm->SetBinContent(bin+1 ,(*fPDFSig)[ivar]->GetVal(x));
762 }
763 mm->Write();
764
765 // ---------- create cloned low-binned histogram for comparison in macros (mainly necessary for KDE)
766 TH1* h[2] = { (*fHistSig)[ivar], (*fHistBgd)[ivar] };
767 for (UInt_t i=0; i<2; i++) {
768 TH1* hclone = (TH1F*)h[i]->Clone( TString(h[i]->GetName()) + "_nice" );
769 hclone->SetName ( TString(h[i]->GetName()) + "_nice" );
770 hclone->SetTitle( TString(h[i]->GetTitle()) + "" );
771 if (hclone->GetNbinsX() > 100) {
772 Int_t resFactor = 5;
773 hclone->Rebin( resFactor );
774 hclone->Scale( 1.0/resFactor );
775 }
776 hclone->Write();
777 }
778 // ----------
779 }
780}
781
782////////////////////////////////////////////////////////////////////////////////
783/// write specific header of the classifier (mostly include files)
784
785void TMVA::MethodLikelihood::MakeClassSpecificHeader( std::ostream& fout, const TString& ) const
786{
787 fout << "#include <math.h>" << std::endl;
788 fout << "#include <cstdlib>" << std::endl;
789}
790
791////////////////////////////////////////////////////////////////////////////////
792/// write specific classifier response
793
794void TMVA::MethodLikelihood::MakeClassSpecific( std::ostream& fout, const TString& className ) const
795{
796 Int_t dp = fout.precision();
797 fout << " double fEpsilon;" << std::endl;
798
799 Int_t * nbin = new Int_t[GetNvar()];
800
801 Int_t nbinMax=-1;
802 for (UInt_t ivar=0; ivar<GetNvar(); ivar++) {
803 nbin[ivar]=(*fPDFSig)[ivar]->GetPDFHist()->GetNbinsX();
804 if (nbin[ivar] > nbinMax) nbinMax=nbin[ivar];
805 }
806
807 fout << " static float fRefS[][" << nbinMax << "]; "
808 << "// signal reference vector [nvars][max_nbins]" << std::endl;
809 fout << " static float fRefB[][" << nbinMax << "]; "
810 << "// backgr reference vector [nvars][max_nbins]" << std::endl << std::endl;
811 fout << "// if a variable has its PDF encoded as a spline0 --> treat it like an Integer valued one" <<std::endl;
812 fout << " bool fHasDiscretPDF[" << GetNvar() <<"]; "<< std::endl;
813 fout << " int fNbin[" << GetNvar() << "]; "
814 << "// number of bins (discrete variables may have less bins)" << std::endl;
815 fout << " double fHistMin[" << GetNvar() << "]; " << std::endl;
816 fout << " double fHistMax[" << GetNvar() << "]; " << std::endl;
817
818 fout << " double TransformLikelihoodOutput( double, double ) const;" << std::endl;
819 fout << "};" << std::endl;
820 fout << "" << std::endl;
821 fout << "inline void " << className << "::Initialize() " << std::endl;
822 fout << "{" << std::endl;
823 fout << " fEpsilon = " << fEpsilon << ";" << std::endl;
824 for (UInt_t ivar=0; ivar<GetNvar(); ivar++) {
825 fout << " fNbin[" << ivar << "] = " << (*fPDFSig)[ivar]->GetPDFHist()->GetNbinsX() << ";" << std::endl;
826 fout << " fHistMin[" << ivar << "] = " << (*fPDFSig)[ivar]->GetPDFHist()->GetXaxis()->GetXmin() << ";" << std::endl;
827 fout << " fHistMax[" << ivar << "] = " << (*fPDFSig)[ivar]->GetPDFHist()->GetXaxis()->GetXmax() << ";" << std::endl;
828 // sanity check (for previous code lines)
829 if ((((*fPDFSig)[ivar]->GetPDFHist()->GetNbinsX() != nbin[ivar] ||
830 (*fPDFBgd)[ivar]->GetPDFHist()->GetNbinsX() != nbin[ivar])
831 ) ||
832 (*fPDFSig)[ivar]->GetPDFHist()->GetNbinsX() != (*fPDFBgd)[ivar]->GetPDFHist()->GetNbinsX()) {
833 Log() << kFATAL << "<MakeClassSpecific> Mismatch in binning of variable "
834 << "\"" << GetOriginalVarName(ivar) << "\" of type: \'" << DataInfo().GetVariableInfo(ivar).GetVarType()
835 << "\' : "
836 << "nxS = " << (*fPDFSig)[ivar]->GetPDFHist()->GetNbinsX() << ", "
837 << "nxB = " << (*fPDFBgd)[ivar]->GetPDFHist()->GetNbinsX()
838 << " while we expect " << nbin[ivar]
839 << Endl;
840 }
841 }
842 for (UInt_t ivar=0; ivar<GetNvar(); ivar++){
843 if ((*fPDFSig)[ivar]->GetInterpolMethod() == TMVA::PDF::kSpline0)
844 fout << " fHasDiscretPDF[" << ivar <<"] = true; " << std::endl;
845 else
846 fout << " fHasDiscretPDF[" << ivar <<"] = false; " << std::endl;
847 }
848
849 fout << "}" << std::endl << std::endl;
850
851 fout << "inline double " << className
852 << "::GetMvaValue__( const std::vector<double>& inputValues ) const" << std::endl;
853 fout << "{" << std::endl;
854 fout << " double ps(1), pb(1);" << std::endl;
855 fout << " std::vector<double> inputValuesSig = inputValues;" << std::endl;
856 fout << " std::vector<double> inputValuesBgd = inputValues;" << std::endl;
857 if (GetTransformationHandler().GetTransformationList().GetSize() != 0) {
858 fout << " Transform(inputValuesSig,0);" << std::endl;
859 fout << " Transform(inputValuesBgd,1);" << std::endl;
860 }
861 fout << " for (size_t ivar = 0; ivar < GetNvar(); ivar++) {" << std::endl;
862 fout << std::endl;
863 fout << " // dummy at present... will be used for variable transforms" << std::endl;
864 fout << " double x[2] = { inputValuesSig[ivar], inputValuesBgd[ivar] };" << std::endl;
865 fout << std::endl;
866 fout << " for (int itype=0; itype < 2; itype++) {" << std::endl;
867 fout << std::endl;
868 fout << " // interpolate linearly between adjacent bins" << std::endl;
869 fout << " // this is not useful for discrete variables (or forced Spline0)" << std::endl;
870 fout << " int bin = int((x[itype] - fHistMin[ivar])/(fHistMax[ivar] - fHistMin[ivar])*fNbin[ivar]) + 0;" << std::endl;
871 fout << std::endl;
872 fout << " // since the test data sample is in general different from the training sample" << std::endl;
873 fout << " // it can happen that the min/max of the training sample are trespassed --> correct this" << std::endl;
874 fout << " if (bin < 0) {" << std::endl;
875 fout << " bin = 0;" << std::endl;
876 fout << " x[itype] = fHistMin[ivar];" << std::endl;
877 fout << " }" << std::endl;
878 fout << " else if (bin >= fNbin[ivar]) {" << std::endl;
879 fout << " bin = fNbin[ivar]-1;" << std::endl;
880 fout << " x[itype] = fHistMax[ivar];" << std::endl;
881 fout << " }" << std::endl;
882 fout << std::endl;
883 fout << " // find corresponding histogram from cached indices" << std::endl;
884 fout << " float ref = (itype == 0) ? fRefS[ivar][bin] : fRefB[ivar][bin];" << std::endl;
885 fout << std::endl;
886 fout << " // sanity check" << std::endl;
887 fout << " if (ref < 0) {" << std::endl;
888 fout << " std::cout << \"Fatal error in " << className
889 << ": bin entry < 0 ==> abort\" << std::endl;" << std::endl;
890 fout << " std::exit(1);" << std::endl;
891 fout << " }" << std::endl;
892 fout << std::endl;
893 fout << " double p = ref;" << std::endl;
894 fout << std::endl;
895 fout << " if (GetType(ivar) != 'I' && !fHasDiscretPDF[ivar]) {" << std::endl;
896 fout << " float bincenter = (bin + 0.5)/fNbin[ivar]*(fHistMax[ivar] - fHistMin[ivar]) + fHistMin[ivar];" << std::endl;
897 fout << " int nextbin = bin;" << std::endl;
898 fout << " if ((x[itype] > bincenter && bin != fNbin[ivar]-1) || bin == 0) " << std::endl;
899 fout << " nextbin++;" << std::endl;
900 fout << " else" << std::endl;
901 fout << " nextbin--; " << std::endl;
902 fout << std::endl;
903 fout << " double refnext = (itype == 0) ? fRefS[ivar][nextbin] : fRefB[ivar][nextbin];" << std::endl;
904 fout << " float nextbincenter = (nextbin + 0.5)/fNbin[ivar]*(fHistMax[ivar] - fHistMin[ivar]) + fHistMin[ivar];" << std::endl;
905 fout << std::endl;
906 fout << " double dx = bincenter - nextbincenter;" << std::endl;
907 fout << " double dy = ref - refnext;" << std::endl;
908 fout << " p += (x[itype] - bincenter) * dy/dx;" << std::endl;
909 fout << " }" << std::endl;
910 fout << std::endl;
911 fout << " if (p < fEpsilon) p = fEpsilon; // avoid zero response" << std::endl;
912 fout << std::endl;
913 fout << " if (itype == 0) ps *= p;" << std::endl;
914 fout << " else pb *= p;" << std::endl;
915 fout << " } " << std::endl;
916 fout << " } " << std::endl;
917 fout << std::endl;
918 fout << " // the likelihood ratio (transform it ?)" << std::endl;
919 fout << " return TransformLikelihoodOutput( ps, pb ); " << std::endl;
920 fout << "}" << std::endl << std::endl;
921
922 fout << "inline double " << className << "::TransformLikelihoodOutput( double ps, double pb ) const" << std::endl;
923 fout << "{" << std::endl;
924 fout << " // returns transformed or non-transformed output" << std::endl;
925 fout << " if (ps < fEpsilon) ps = fEpsilon;" << std::endl;
926 fout << " if (pb < fEpsilon) pb = fEpsilon;" << std::endl;
927 fout << " double r = ps/(ps + pb);" << std::endl;
928 fout << " if (r >= 1.0) r = 1. - 1.e-15;" << std::endl;
929 fout << std::endl;
930 fout << " if (" << (fTransformLikelihoodOutput ? "true" : "false") << ") {" << std::endl;
931 fout << " // inverse Fermi function" << std::endl;
932 fout << std::endl;
933 fout << " // sanity check" << std::endl;
934 fout << " if (r <= 0.0) r = fEpsilon;" << std::endl;
935 fout << " else if (r >= 1.0) r = 1. - 1.e-15;" << std::endl;
936 fout << std::endl;
937 fout << " double tau = 15.0;" << std::endl;
938 fout << " r = - log(1.0/r - 1.0)/tau;" << std::endl;
939 fout << " }" << std::endl;
940 fout << std::endl;
941 fout << " return r;" << std::endl;
942 fout << "}" << std::endl;
943 fout << std::endl;
944
945 fout << "// Clean up" << std::endl;
946 fout << "inline void " << className << "::Clear() " << std::endl;
947 fout << "{" << std::endl;
948 fout << " // nothing to clear" << std::endl;
949 fout << "}" << std::endl << std::endl;
950
951 fout << "// signal map" << std::endl;
952 fout << "float " << className << "::fRefS[][" << nbinMax << "] = " << std::endl;
953 fout << "{ " << std::endl;
954 for (UInt_t ivar=0; ivar<GetNvar(); ivar++) {
955 fout << " { ";
956 for (Int_t ibin=1; ibin<=nbinMax; ibin++) {
957 if (ibin-1 < nbin[ivar])
958 fout << (*fPDFSig)[ivar]->GetPDFHist()->GetBinContent(ibin);
959 else
960 fout << -1;
961
962 if (ibin < nbinMax) fout << ", ";
963 }
964 fout << " }, " << std::endl;
965 }
966 fout << "}; " << std::endl;
967 fout << std::endl;
968
969 fout << "// background map" << std::endl;
970 fout << "float " << className << "::fRefB[][" << nbinMax << "] = " << std::endl;
971 fout << "{ " << std::endl;
972 for (UInt_t ivar=0; ivar<GetNvar(); ivar++) {
973 fout << " { ";
974 fout << std::setprecision(8);
975 for (Int_t ibin=1; ibin<=nbinMax; ibin++) {
976 if (ibin-1 < nbin[ivar])
977 fout << (*fPDFBgd)[ivar]->GetPDFHist()->GetBinContent(ibin);
978 else
979 fout << -1;
980
981 if (ibin < nbinMax) fout << ", ";
982 }
983 fout << " }, " << std::endl;
984 }
985 fout << "}; " << std::endl;
986 fout << std::endl;
987 fout << std::setprecision(dp);
988
989 delete[] nbin;
990}
991
992////////////////////////////////////////////////////////////////////////////////
993/// get help message text
994///
995/// typical length of text line:
996/// "|--------------------------------------------------------------|"
997
999{
1000 Log() << Endl;
1001 Log() << gTools().Color("bold") << "--- Short description:" << gTools().Color("reset") << Endl;
1002 Log() << Endl;
1003 Log() << "The maximum-likelihood classifier models the data with probability " << Endl;
1004 Log() << "density functions (PDF) reproducing the signal and background" << Endl;
1005 Log() << "distributions of the input variables. Correlations among the " << Endl;
1006 Log() << "variables are ignored." << Endl;
1007 Log() << Endl;
1008 Log() << gTools().Color("bold") << "--- Performance optimisation:" << gTools().Color("reset") << Endl;
1009 Log() << Endl;
1010 Log() << "Required for good performance are decorrelated input variables" << Endl;
1011 Log() << "(PCA transformation via the option \"VarTransform=Decorrelate\"" << Endl;
1012 Log() << "may be tried). Irreducible non-linear correlations may be reduced" << Endl;
1013 Log() << "by precombining strongly correlated input variables, or by simply" << Endl;
1014 Log() << "removing one of the variables." << Endl;
1015 Log() << Endl;
1016 Log() << gTools().Color("bold") << "--- Performance tuning via configuration options:" << gTools().Color("reset") << Endl;
1017 Log() << Endl;
1018 Log() << "High fidelity PDF estimates are mandatory, i.e., sufficient training " << Endl;
1019 Log() << "statistics is required to populate the tails of the distributions" << Endl;
1020 Log() << "It would be a surprise if the default Spline or KDE kernel parameters" << Endl;
1021 Log() << "provide a satisfying fit to the data. The user is advised to properly" << Endl;
1022 Log() << "tune the events per bin and smooth options in the spline cases" << Endl;
1023 Log() << "individually per variable. If the KDE kernel is used, the adaptive" << Endl;
1024 Log() << "Gaussian kernel may lead to artefacts, so please always also try" << Endl;
1025 Log() << "the non-adaptive one." << Endl;
1026 Log() << "" << Endl;
1027 Log() << "All tuning parameters must be adjusted individually for each input" << Endl;
1028 Log() << "variable!" << Endl;
1029}
1030
#define REGISTER_METHOD(CLASS)
for example
ROOT::R::TRInterface & r
Definition: Object.C:4
#define h(i)
Definition: RSha256.hxx:106
int Int_t
Definition: RtypesCore.h:43
const Bool_t kFALSE
Definition: RtypesCore.h:90
double Double_t
Definition: RtypesCore.h:57
float Float_t
Definition: RtypesCore.h:55
const Bool_t kTRUE
Definition: RtypesCore.h:89
#define ClassImp(name)
Definition: Rtypes.h:361
int type
Definition: TGX11.cxx:120
float xmin
Definition: THbookFile.cxx:93
float xmax
Definition: THbookFile.cxx:93
char * Form(const char *fmt,...)
TObject * Get(const char *namecycle) override
Return pointer to object identified by namecycle.
A ROOT file is a suite of consecutive data records (TKey instances) with a well defined format.
Definition: TFile.h:53
1-D histogram with a float per channel (see TH1 documentation)}
Definition: TH1.h:571
The TH1 histogram class.
Definition: TH1.h:56
virtual void SetTitle(const char *title)
See GetStatOverflows for more information.
Definition: TH1.cxx:6345
virtual Double_t GetBinCenter(Int_t bin) const
Return bin center for 1D histogram.
Definition: TH1.cxx:8597
static void AddDirectory(Bool_t add=kTRUE)
Sets the flag controlling the automatic add of histograms in memory.
Definition: TH1.cxx:1226
virtual Int_t GetNbinsX() const
Definition: TH1.h:292
virtual Int_t Fill(Double_t x)
Increment bin with abscissa X by 1.
Definition: TH1.cxx:3275
virtual void SetName(const char *name)
Change the name of this histogram.
Definition: TH1.cxx:8416
virtual Double_t GetBinContent(Int_t bin) const
Return content of bin number bin.
Definition: TH1.cxx:4907
virtual void Scale(Double_t c1=1, Option_t *option="")
Multiply this histogram by a constant c1.
Definition: TH1.cxx:6246
virtual Int_t FindBin(Double_t x, Double_t y=0, Double_t z=0)
Return Global bin number corresponding to x,y,z.
Definition: TH1.cxx:3596
static Bool_t AddDirectoryStatus()
Static function: cannot be inlined on Windows/NT.
Definition: TH1.cxx:706
virtual TH1 * Rebin(Int_t ngroup=2, const char *newname="", const Double_t *xbins=0)
Rebin this histogram.
Definition: TH1.cxx:5904
void WriteOptionsToStream(std::ostream &o, const TString &prefix) const
write options to output stream (e.g. in writing the MVA weight files
Class that contains all the data information.
Definition: DataSetInfo.h:60
Float_t GetValue(UInt_t ivar) const
return value of i'th variable
Definition: Event.cxx:236
Double_t GetWeight() const
return the event weight - depending on whether the flag IgnoreNegWeightsInTraining is or not.
Definition: Event.cxx:381
UInt_t GetClass() const
Definition: Event.h:86
Virtual base Class for all MVA method.
Definition: MethodBase.h:111
virtual void DeclareCompatibilityOptions()
options that are used ONLY for the READER to ensure backward compatibility they are hence without any...
Definition: MethodBase.cxx:598
Likelihood analysis ("non-parametric approach")
const Ranking * CreateRanking()
computes ranking of input variables
void Train()
create reference distributions (PDFs) from signal and background events: fill histograms and smooth t...
virtual Bool_t HasAnalysisType(Types::EAnalysisType type, UInt_t numberClasses, UInt_t numberTargets)
FDA can handle classification with 2 classes.
virtual void WriteOptionsToStream(std::ostream &o, const TString &prefix) const
write options to stream
void WriteMonitoringHistosToFile() const
write histograms and PDFs to file for monitoring purposes
void DeclareCompatibilityOptions()
options that are used ONLY for the READER to ensure backward compatibility they are hence without any...
void MakeClassSpecific(std::ostream &, const TString &) const
write specific classifier response
virtual ~MethodLikelihood()
destructor
void Init()
default initialisation called by all constructors
void ReadWeightsFromStream(std::istream &istr)
read weight info from file nothing to do for this method
MethodLikelihood(const TString &jobName, const TString &methodTitle, DataSetInfo &theData, const TString &theOption="")
standard constructor
void GetHelpMessage() const
get help message text
Double_t GetMvaValue(Double_t *err=0, Double_t *errUpper=0)
returns the likelihood estimator for signal fill a new Likelihood branch into the testTree
void ReadWeightsFromXML(void *wghtnode)
read weights from XML
void MakeClassSpecificHeader(std::ostream &, const TString &="") const
write specific header of the classifier (mostly include files)
Double_t TransformLikelihoodOutput(Double_t ps, Double_t pb) const
returns transformed or non-transformed output
void ProcessOptions()
process user options reference cut value to distinguish signal-like from background-like events
void WriteWeightsToStream(TFile &rf) const
write reference PDFs to ROOT file
void AddWeightsXMLTo(void *parent) const
write weights to XML
void DeclareOptions()
define the options (their key words) that can be set in the option string
PDF wrapper for histograms; uses user-defined spline interpolation.
Definition: PDF.h:63
TH1 * GetPDFHist() const
Definition: PDF.h:92
@ kSpline0
Definition: PDF.h:70
Ranking for variables in method (implementation)
Definition: Ranking.h:48
void * GetNextChild(void *prevchild, const char *childname=0)
XML helpers.
Definition: Tools.cxx:1173
Double_t GetSeparation(TH1 *S, TH1 *B) const
compute "separation" defined as
Definition: Tools.cxx:132
void * AddChild(void *parent, const char *childname, const char *content=0, bool isRootNode=false)
add child node
Definition: Tools.cxx:1135
const TString & Color(const TString &)
human readable color strings
Definition: Tools.cxx:839
void * GetChild(void *parent, const char *childname=0)
get child node
Definition: Tools.cxx:1161
void ReadAttr(void *node, const char *, T &value)
read attribute from xml
Definition: Tools.h:335
void AddAttr(void *node, const char *, const T &value, Int_t precision=16)
add attribute to xml
Definition: Tools.h:353
Singleton class for Global types used by TMVA.
Definition: Types.h:73
EAnalysisType
Definition: Types.h:127
@ kClassification
Definition: Types.h:128
virtual Int_t Write(const char *name=0, Int_t option=0, Int_t bufsize=0)
Write this object to the current directory.
Definition: TObject.cxx:796
Basic string class.
Definition: TString.h:131
const char * Data() const
Definition: TString.h:364
TVectorT.
Definition: TVectorT.h:27
Double_t x[n]
Definition: legend1.C:17
static constexpr double mm
static constexpr double ps
void GetMethodName(TString &name, TKey *mkey)
Definition: tmvaglob.cxx:335
create variable transformations
Tools & gTools()
MsgLogger & Endl(MsgLogger &ml)
Definition: MsgLogger.h:158
Int_t Nint(T x)
Round to nearest integer. Rounds half integers to the nearest even integer.
Definition: TMath.h:703
Short_t Max(Short_t a, Short_t b)
Definition: TMathBase.h:212
Double_t Log(Double_t x)
Definition: TMath.h:750
Short_t Min(Short_t a, Short_t b)
Definition: TMathBase.h:180