MethodFisher.cxx
// @(#)root/tmva $Id$
// Author: Andreas Hoecker, Xavier Prudent, Joerg Stelzer, Helge Voss, Kai Voss

/**********************************************************************************
 * Project: TMVA - a Root-integrated toolkit for multivariate Data analysis       *
 * Package: TMVA                                                                  *
 * Class  : MethodFisher                                                          *
 * Web    : http://tmva.sourceforge.net                                           *
 *                                                                                *
 * Description:                                                                   *
 *      Implementation (see header for description)                               *
 *                                                                                *
 * Original author of this Fisher-Discriminant implementation:                    *
 *      Andre Gaidot, CEA-France;                                                 *
 *      (Translation from FORTRAN)                                                *
 *                                                                                *
 * Authors (alphabetical):                                                        *
 *      Andreas Hoecker <Andreas.Hocker@cern.ch> - CERN, Switzerland              *
 *      Xavier Prudent <prudent@lapp.in2p3.fr> - LAPP, France                     *
 *      Helge Voss <Helge.Voss@cern.ch> - MPI-K Heidelberg, Germany               *
 *      Kai Voss <Kai.Voss@cern.ch> - U. of Victoria, Canada                      *
 *                                                                                *
 * Copyright (c) 2005:                                                            *
 *      CERN, Switzerland                                                         *
 *      U. of Victoria, Canada                                                    *
 *      MPI-K Heidelberg, Germany                                                 *
 *      LAPP, Annecy, France                                                      *
 *                                                                                *
 * Redistribution and use in source and binary forms, with or without             *
 * modification, are permitted according to the terms listed in LICENSE           *
 * (http://tmva.sourceforge.net/LICENSE)                                          *
 **********************************************************************************/

/*! \class TMVA::MethodFisher
\ingroup TMVA

Fisher and Mahalanobis Discriminants (Linear Discriminant Analysis)

In the method of Fisher discriminants, event selection is performed
in a transformed variable space with zero linear correlations, by
distinguishing the mean values of the signal and background
distributions.

The linear discriminant analysis determines an axis in the (correlated)
hyperspace of the input variables
such that, when projecting the output classes (signal and background)
upon this axis, they are pushed as far as possible away from each other,
while events of the same class are confined to a close vicinity.
The linearity property of this method is reflected in the metric with
which "far apart" and "close vicinity" are determined: the covariance
matrix of the discriminant variable space.

The classification of the events into signal and background classes
relies (only) on the following characteristics: the overall sample means, \f$ \bar{x}_i \f$,
for each input variable, \f$ i \f$,
the class-specific sample means, \f$ \bar{x}_{S(B),i} \f$,
and the total covariance matrix \f$ T_{ij} \f$. The covariance matrix
can be decomposed into the sum of a _within-class_ (\f$ W_{ij} \f$)
and a _between-class_ (\f$ B_{ij} \f$) matrix. They describe
the dispersion of events relative to the means of their own class (within-class
matrix), and relative to the overall sample means (between-class matrix).
The Fisher coefficients, \f$ F_i \f$, are then given by

\f[
F_i = \frac{\sqrt{N_S N_B}}{N_S + N_B} \sum_{j=1}^{N_{\mathrm{var}}} W_{ij}^{-1} (\bar{x}_{S,j} - \bar{x}_{B,j})
\f]

where \f$ N_{\mathrm{var}} \f$ is the number of input variables. For \f$ N_S = N_B \f$ the factor
in front of the sum simplifies to \f$ \frac{1}{2} \f$; the implementation
evaluates it from the sums of the signal and background event weights.
The Fisher discriminant then reads

\f[
x_{\mathrm{Fi}} = F_0 + \sum_{i=1}^{N_{\mathrm{var}}} F_i x_i
\f]

The offset \f$ F_0 \f$ centers the sample mean of \f$ x_{\mathrm{Fi}} \f$
at zero. Instead of using the within-class matrix, the Mahalanobis variant
determines the Fisher coefficients as follows:

\f[
F_i = \frac{\sqrt{N_S N_B}}{N_S + N_B} \sum_{j=1}^{N_{\mathrm{var}}} (W + B)_{ij}^{-1} (\bar{x}_{S,j} - \bar{x}_{B,j})
\f]

with resulting \f$ x_{\mathrm{Ma}} \f$ that are very similar to the \f$ x_{\mathrm{Fi}} \f$.
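
As an illustration of the coefficient formulas above, the following is a minimal
standalone sketch for two input variables. It does not use ROOT or TMVA: the names
`FisherResult` and `FisherCoefficients2D` are hypothetical and introduced only for
this example, and the 2x2 matrix is inverted in closed form rather than with
`TMatrixD`. The offset follows the convention used in GetFisherCoeff(), i.e. minus
one half of the sum over `F_i` times the sum of the two class means.

```cpp
#include <array>
#include <cassert>
#include <cmath>

// Hypothetical result type (not part of TMVA): coefficients F_1, F_2 and offset F_0
struct FisherResult {
   std::array<double, 2> coeff;
   double offset;
};

// Sketch: Fisher coefficients for two variables.
//   m            : 2x2 matrix to invert (W for "Fisher", W + B for "Mahalanobis")
//   meanS, meanB : class-specific sample means
//   nS, nB       : sums of signal and background event weights
inline FisherResult FisherCoefficients2D( const std::array<std::array<double, 2>, 2>& m,
                                          const std::array<double, 2>& meanS,
                                          const std::array<double, 2>& meanB,
                                          double nS, double nB )
{
   const double det = m[0][0]*m[1][1] - m[0][1]*m[1][0];
   assert( std::fabs(det) > 0.0 && "matrix must be invertible" );

   // closed-form inverse of a 2x2 matrix
   const double inv[2][2] = { {  m[1][1]/det, -m[0][1]/det },
                              { -m[1][0]/det,  m[0][0]/det } };

   // prefactor sqrt(N_S N_B)/(N_S + N_B); equals 1/2 for N_S = N_B
   const double xfact = std::sqrt( nS*nB ) / (nS + nB);

   FisherResult r{};
   for (int i = 0; i < 2; i++) {
      double sum = 0.0;
      for (int j = 0; j < 2; j++) sum += inv[i][j]*(meanS[j] - meanB[j]);
      r.coeff[i] = xfact*sum;
   }

   // offset: places the midpoint between the two class means at zero
   r.offset = 0.0;
   for (int i = 0; i < 2; i++) r.offset += r.coeff[i]*(meanS[i] + meanB[i]);
   r.offset *= -0.5;
   return r;
}
```

The discriminant value for an event `(x_1, x_2)` is then
`r.offset + r.coeff[0]*x_1 + r.coeff[1]*x_2`. Passing the sum of the within-class
and between-class matrices as `m` gives the Mahalanobis variant.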

TMVA provides two outputs for the ranking of the input variables:

  - __Fisher test:__ the Fisher analysis aims at simultaneously maximising
the between-class separation while minimising the within-class dispersion.
A useful measure of the discrimination power of a variable is hence given
by the diagonal quantity \f$ \frac{B_{ii}}{W_{ii}} \f$.

  - __Discrimination power:__ the value of the Fisher coefficient is a
measure of the discriminating power of a variable. The discrimination power
of a set of input variables can therefore be measured by the scalar

\f[
\lambda = \frac{\sqrt{N_S N_B}}{N_S + N_B} \sum_{j=1}^{N_{\mathrm{var}}} F_j (\bar{x}_{S,j} - \bar{x}_{B,j})
\f]

The corresponding numbers are printed on standard output.
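
The Fisher test quantity can be sketched in plain C++ as follows. This is an
illustration only: `FisherTestRatio` is a hypothetical helper, not a TMVA function.
It builds the diagonal of the between-class matrix from the class means and the
sums of weights, in the same way as GetCov_BetweenClass(), and divides by the
within-class diagonal as in the formula above. Note that GetDiscrimPower() in this
file divides by the diagonal of the full covariance matrix (within plus between)
instead.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper (not part of TMVA): per-variable ratio B_ii / W_ii
inline std::vector<double> FisherTestRatio( const std::vector<double>& meanS,
                                            const std::vector<double>& meanB,
                                            const std::vector<double>& withinDiag,
                                            double nS, double nB )
{
   assert( meanS.size() == meanB.size() && meanS.size() == withinDiag.size() );
   assert( nS > 0 && nB > 0 );

   std::vector<double> ratio( meanS.size(), 0.0 );
   for (std::size_t i = 0; i < meanS.size(); i++) {
      // overall (signal + background) weighted mean of variable i
      const double meanAll = (nS*meanS[i] + nB*meanB[i]) / (nS + nB);

      // diagonal element of the between-class matrix
      const double dS   = meanS[i] - meanAll;
      const double dB   = meanB[i] - meanAll;
      const double betw = (nS*dS*dS + nB*dB*dB) / (nS + nB);

      // guard against a vanishing within-class dispersion
      if (withinDiag[i] != 0.0) ratio[i] = betw / withinDiag[i];
   }
   return ratio;
}
```

A variable whose signal and background means coincide gets a ratio of zero,
whatever the shapes of its distributions.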
*/

#include "TMVA/MethodFisher.h"

#include "TMVA/ClassifierFactory.h"
#include "TMVA/Configurable.h"
#include "TMVA/DataSet.h"
#include "TMVA/DataSetInfo.h"
#include "TMVA/Event.h"
#include "TMVA/IMethod.h"
#include "TMVA/MethodBase.h"
#include "TMVA/MsgLogger.h"
#include "TMVA/Ranking.h"
#include "TMVA/Tools.h"
#include "TMVA/TransformationHandler.h"
#include "TMVA/Types.h"
#include "TMVA/VariableTransformBase.h"

#include "TMath.h"
#include "TMatrix.h"
#include "TList.h"

#include <iostream>
#include <iomanip>
#include <cassert>
#include <cstring>

REGISTER_METHOD(Fisher)

ClassImp(TMVA::MethodFisher);

////////////////////////////////////////////////////////////////////////////////
/// standard constructor for the "Fisher" method

TMVA::MethodFisher::MethodFisher( const TString& jobName,
                                  const TString& methodTitle,
                                  DataSetInfo& dsi,
                                  const TString& theOption ) :
   MethodBase( jobName, Types::kFisher, methodTitle, dsi, theOption),
   fMeanMatx     ( 0 ),
   fTheMethod    ( "Fisher" ),
   fFisherMethod ( kFisher ),
   fBetw         ( 0 ),
   fWith         ( 0 ),
   fCov          ( 0 ),
   fSumOfWeightsS( 0 ),
   fSumOfWeightsB( 0 ),
   fDiscrimPow   ( 0 ),
   fFisherCoeff  ( 0 ),
   fF0           ( 0 )
{
}

////////////////////////////////////////////////////////////////////////////////
/// constructor from weight file

TMVA::MethodFisher::MethodFisher( DataSetInfo& dsi,
                                  const TString& theWeightFile) :
   MethodBase( Types::kFisher, dsi, theWeightFile),
   fMeanMatx     ( 0 ),
   fTheMethod    ( "Fisher" ),
   fFisherMethod ( kFisher ),
   fBetw         ( 0 ),
   fWith         ( 0 ),
   fCov          ( 0 ),
   fSumOfWeightsS( 0 ),
   fSumOfWeightsB( 0 ),
   fDiscrimPow   ( 0 ),
   fFisherCoeff  ( 0 ),
   fF0           ( 0 )
{
}

////////////////////////////////////////////////////////////////////////////////
/// default initialization called by all constructors

void TMVA::MethodFisher::Init( void )
{
   // allocate Fisher coefficients
   fFisherCoeff = new std::vector<Double_t>( GetNvar() );

   // the minimum requirement to declare an event signal-like
   SetSignalReferenceCut( 0.0 );

   // this is the preparation for training
   InitMatrices();
}

////////////////////////////////////////////////////////////////////////////////
/// MethodFisher options:
/// format and syntax of option string: "type"
/// where type is "Fisher" or "Mahalanobis"

void TMVA::MethodFisher::DeclareOptions()
{
   DeclareOptionRef( fTheMethod = "Fisher", "Method", "Discrimination method" );
   AddPreDefVal(TString("Fisher"));
   AddPreDefVal(TString("Mahalanobis"));
}

////////////////////////////////////////////////////////////////////////////////
/// process user options

void TMVA::MethodFisher::ProcessOptions()
{
   if (fTheMethod == "Fisher" ) fFisherMethod = kFisher;
   else                         fFisherMethod = kMahalanobis;

   // this is the preparation for training
   InitMatrices();
}

////////////////////////////////////////////////////////////////////////////////
/// destructor

TMVA::MethodFisher::~MethodFisher( void )
{
   if (fBetw       ) { delete fBetw;        fBetw        = 0; }
   if (fWith       ) { delete fWith;        fWith        = 0; }
   if (fCov        ) { delete fCov;         fCov         = 0; }
   if (fDiscrimPow ) { delete fDiscrimPow;  fDiscrimPow  = 0; }
   if (fFisherCoeff) { delete fFisherCoeff; fFisherCoeff = 0; }
}

////////////////////////////////////////////////////////////////////////////////
/// Fisher can only handle classification with 2 classes

Bool_t TMVA::MethodFisher::HasAnalysisType( Types::EAnalysisType type, UInt_t numberClasses, UInt_t /*numberTargets*/ )
{
   if (type == Types::kClassification && numberClasses == 2) return kTRUE;
   return kFALSE;
}

////////////////////////////////////////////////////////////////////////////////
/// computation of Fisher coefficients by series of matrix operations

void TMVA::MethodFisher::Train( void )
{
   // get mean value of each variable for signal, backgd and signal+backgd
   GetMean();

   // get the matrix of covariance 'within class'
   GetCov_WithinClass();

   // get the matrix of covariance 'between class'
   GetCov_BetweenClass();

   // get the full covariance matrix (within class + between class)
   GetCov_Full();

   //--------------------------------------------------------------

   // get the Fisher coefficients
   GetFisherCoeff();

   // get the discriminating power of each variable
   GetDiscrimPower();

   // nice output
   PrintCoefficients();

   ExitFromTraining();
}

////////////////////////////////////////////////////////////////////////////////
/// returns the Fisher value (no fixed range)

Double_t TMVA::MethodFisher::GetMvaValue( Double_t* err, Double_t* errUpper )
{
   const Event * ev = GetEvent();
   Double_t result = fF0;
   for (UInt_t ivar=0; ivar<GetNvar(); ivar++)
      result += (*fFisherCoeff)[ivar]*ev->GetValue(ivar);

   // cannot determine error
   NoErrorCalc(err, errUpper);

   return result;

}

////////////////////////////////////////////////////////////////////////////////
/// initialization method; creates global matrices and vectors

void TMVA::MethodFisher::InitMatrices( void )
{
   // average value of each variable for S, B, S+B
   fMeanMatx = new TMatrixD( GetNvar(), 3 );

   // the covariance 'within class' and 'between class' matrices
   fBetw = new TMatrixD( GetNvar(), GetNvar() );
   fWith = new TMatrixD( GetNvar(), GetNvar() );
   fCov  = new TMatrixD( GetNvar(), GetNvar() );

   // discriminating power
   fDiscrimPow = new std::vector<Double_t>( GetNvar() );
}

////////////////////////////////////////////////////////////////////////////////
/// compute mean values of variables in each sample, and the overall means

void TMVA::MethodFisher::GetMean( void )
{
   // initialize internal sum-of-weights variables
   fSumOfWeightsS = 0;
   fSumOfWeightsB = 0;

   const UInt_t nvar = DataInfo().GetNVariables();

   // init vectors
   Double_t* sumS = new Double_t[nvar];
   Double_t* sumB = new Double_t[nvar];
   for (UInt_t ivar=0; ivar<nvar; ivar++) { sumS[ivar] = sumB[ivar] = 0; }

   // compute sample means
   for (Int_t ievt=0; ievt<Data()->GetNEvents(); ievt++) {

      // read the Training Event into "event"
      const Event * ev = GetEvent(ievt);

      // sum of weights
      Double_t weight = ev->GetWeight();
      if (DataInfo().IsSignal(ev)) fSumOfWeightsS += weight;
      else                         fSumOfWeightsB += weight;

      Double_t* sum = DataInfo().IsSignal(ev) ? sumS : sumB;

      for (UInt_t ivar=0; ivar<nvar; ivar++) sum[ivar] += ev->GetValue( ivar )*weight;
   }

   // columns of fMeanMatx: 0 = signal, 1 = background, 2 = signal + background
   for (UInt_t ivar=0; ivar<nvar; ivar++) {
      (*fMeanMatx)( ivar, 2 ) = sumS[ivar];
      (*fMeanMatx)( ivar, 0 ) = sumS[ivar]/fSumOfWeightsS;

      (*fMeanMatx)( ivar, 2 ) += sumB[ivar];
      (*fMeanMatx)( ivar, 1 ) = sumB[ivar]/fSumOfWeightsB;

      // signal + background
      (*fMeanMatx)( ivar, 2 ) /= (fSumOfWeightsS + fSumOfWeightsB);
   }

   // fMeanMatx->Print();
   delete [] sumS;
   delete [] sumB;
}

////////////////////////////////////////////////////////////////////////////////
/// the matrix of covariance 'within class' reflects the dispersion of the
/// events relative to the center of gravity of their own class

void TMVA::MethodFisher::GetCov_WithinClass( void )
{
   // assert required
   assert( fSumOfWeightsS > 0 && fSumOfWeightsB > 0 );

   // product matrices (x-<x>)(y-<y>) where x;y are variables

   // init
   const Int_t nvar  = GetNvar();
   const Int_t nvar2 = nvar*nvar;
   Double_t *sumSig  = new Double_t[nvar2];
   Double_t *sumBgd  = new Double_t[nvar2];
   Double_t *xval    = new Double_t[nvar];
   memset(sumSig,0,nvar2*sizeof(Double_t));
   memset(sumBgd,0,nvar2*sizeof(Double_t));

   // 'within class' covariance
   for (Int_t ievt=0; ievt<Data()->GetNEvents(); ievt++) {

      // read the Training Event into "event"
      const Event* ev = GetEvent(ievt);

      Double_t weight = ev->GetWeight(); // may ignore events with negative weights

      for (Int_t x=0; x<nvar; x++) xval[x] = ev->GetValue( x );
      Int_t k=0;
      for (Int_t x=0; x<nvar; x++) {
         for (Int_t y=0; y<nvar; y++) {
            if (DataInfo().IsSignal(ev)) {
               Double_t v = ( (xval[x] - (*fMeanMatx)(x, 0))*(xval[y] - (*fMeanMatx)(y, 0)) )*weight;
               sumSig[k] += v;
            }else{
               Double_t v = ( (xval[x] - (*fMeanMatx)(x, 1))*(xval[y] - (*fMeanMatx)(y, 1)) )*weight;
               sumBgd[k] += v;
            }
            k++;
         }
      }
   }
   Int_t k=0;
   for (Int_t x=0; x<nvar; x++) {
      for (Int_t y=0; y<nvar; y++) {
         //(*fWith)(x, y) = (sumSig[k] + sumBgd[k])/(fSumOfWeightsS + fSumOfWeightsB);
         // HHV: I am still convinced that THIS is how it should be (below). However, while
         // the old version corresponded so nicely with LD, the FIXED version does not, unless
         // we agree to change LD. For LD, it is not "defined" to my knowledge how the weights
         // are weighted, while it is clear how the "Within" matrix for Fisher should be calculated
         // (i.e. as seen below). In order to agree with the Fisher classifier, one would have to
         // weigh signal and background such that they correspond to the same number of effective
         // (weighted) events.
         // THAT is NOT done currently, but just "event weights" are used.
         (*fWith)(x, y) = sumSig[k]/fSumOfWeightsS + sumBgd[k]/fSumOfWeightsB;
         k++;
      }
   }

   delete [] sumSig;
   delete [] sumBgd;
   delete [] xval;
}

////////////////////////////////////////////////////////////////////////////////
/// the matrix of covariance 'between class' reflects the dispersion of the
/// events of a class relative to the global center of gravity of all the classes,
/// hence the separation between classes

void TMVA::MethodFisher::GetCov_BetweenClass( void )
{
   // assert required
   assert( fSumOfWeightsS > 0 && fSumOfWeightsB > 0);

   Double_t prodSig, prodBgd;

   for (UInt_t x=0; x<GetNvar(); x++) {
      for (UInt_t y=0; y<GetNvar(); y++) {

         prodSig = ( ((*fMeanMatx)(x, 0) - (*fMeanMatx)(x, 2))*
                     ((*fMeanMatx)(y, 0) - (*fMeanMatx)(y, 2)) );
         prodBgd = ( ((*fMeanMatx)(x, 1) - (*fMeanMatx)(x, 2))*
                     ((*fMeanMatx)(y, 1) - (*fMeanMatx)(y, 2)) );

         (*fBetw)(x, y) = (fSumOfWeightsS*prodSig + fSumOfWeightsB*prodBgd) / (fSumOfWeightsS + fSumOfWeightsB);
      }
   }
}

////////////////////////////////////////////////////////////////////////////////
/// compute full covariance matrix from sum of within and between matrices

void TMVA::MethodFisher::GetCov_Full( void )
{
   for (UInt_t x=0; x<GetNvar(); x++)
      for (UInt_t y=0; y<GetNvar(); y++)
         (*fCov)(x, y) = (*fWith)(x, y) + (*fBetw)(x, y);
}

////////////////////////////////////////////////////////////////////////////////
/// Fisher = Sum { [coeff]*[variables] }
///
/// let Xs be the array of the mean values of variables for signal evts
/// let Xb be the array of the mean values of variables for backgd evts
/// let InvWith be the inverse matrix of the 'within class' covariance matrix
///
/// then the array of Fisher coefficients is
/// [coeff] = sqrt(fNsig*fNbgd)/fNevt*transpose{Xs-Xb}*InvWith

void TMVA::MethodFisher::GetFisherCoeff( void )
{
   // assert required
   assert( fSumOfWeightsS > 0 && fSumOfWeightsB > 0);

   // invert covariance matrix
   TMatrixD* theMat = 0;
   switch (GetFisherMethod()) {
   case kFisher:
      theMat = fWith;
      break;
   case kMahalanobis:
      theMat = fCov;
      break;
   default:
      Log() << kFATAL << "<GetFisherCoeff> undefined method" << GetFisherMethod() << Endl;
   }

   TMatrixD invCov( *theMat );

   if ( TMath::Abs(invCov.Determinant()) < 10E-24 ) {
      Log() << kWARNING << "<GetFisherCoeff> matrix is almost singular with determinant="
            << TMath::Abs(invCov.Determinant())
            << " did you use variables that are linear combinations or highly correlated?"
            << Endl;
   }
   if ( TMath::Abs(invCov.Determinant()) < 10E-120 ) {
      theMat->Print();
      Log() << kFATAL << "<GetFisherCoeff> matrix is singular with determinant="
            << TMath::Abs(invCov.Determinant())
            << " did you use variables that are linear combinations? \n"
            << " do you have any clue as to what went wrong in the above printout of the covariance matrix? "
            << Endl;
   }

   invCov.Invert();

   // apply rescaling factor
   Double_t xfact = TMath::Sqrt( fSumOfWeightsS*fSumOfWeightsB ) / (fSumOfWeightsS + fSumOfWeightsB);

   // compute difference of mean values
   std::vector<Double_t> diffMeans( GetNvar() );
   UInt_t ivar, jvar;
   for (ivar=0; ivar<GetNvar(); ivar++) {
      (*fFisherCoeff)[ivar] = 0;

      for (jvar=0; jvar<GetNvar(); jvar++) {
         Double_t d = (*fMeanMatx)(jvar, 0) - (*fMeanMatx)(jvar, 1);
         (*fFisherCoeff)[ivar] += invCov(ivar, jvar)*d;
      }
      // rescale
      (*fFisherCoeff)[ivar] *= xfact;
   }


   // offset correction
   fF0 = 0.0;
   for (ivar=0; ivar<GetNvar(); ivar++){
      fF0 += (*fFisherCoeff)[ivar]*((*fMeanMatx)(ivar, 0) + (*fMeanMatx)(ivar, 1));
   }
   fF0 /= -2.0;
}

////////////////////////////////////////////////////////////////////////////////
/// computation of discrimination power indicator for each variable
/// small values of "fWith" indicate compact signal and background classes
/// big values of "fBetw" indicate large separation between sig & backgd
///
/// we want signal & backgd classes as compact and separated as possible
/// the discriminating power is computed here as the ratio "fBetw/fCov",
/// where fCov = fWith + fBetw is the full covariance matrix

void TMVA::MethodFisher::GetDiscrimPower( void )
{
   for (UInt_t ivar=0; ivar<GetNvar(); ivar++) {
      if ((*fCov)(ivar, ivar) != 0)
         (*fDiscrimPow)[ivar] = (*fBetw)(ivar, ivar)/(*fCov)(ivar, ivar);
      else
         (*fDiscrimPow)[ivar] = 0;
   }
}

////////////////////////////////////////////////////////////////////////////////
/// computes ranking of input variables

const TMVA::Ranking* TMVA::MethodFisher::CreateRanking()
{
   // create the ranking object
   fRanking = new Ranking( GetName(), "Discr. power" );

   for (UInt_t ivar=0; ivar<GetNvar(); ivar++) {
      fRanking->AddRank( Rank( GetInputLabel(ivar), (*fDiscrimPow)[ivar] ) );
   }

   return fRanking;
}

////////////////////////////////////////////////////////////////////////////////
/// display Fisher coefficients and discriminating power for each variable
/// check maximum length of variable name

void TMVA::MethodFisher::PrintCoefficients( void )
{
   Log() << kHEADER << "Results for Fisher coefficients:" << Endl;

   if (GetTransformationHandler().GetTransformationList().GetSize() != 0) {
      Log() << kINFO << "NOTE: The coefficients must be applied to TRANSFORMED variables" << Endl;
      Log() << kINFO << " List of the transformation: " << Endl;
      TListIter trIt(&GetTransformationHandler().GetTransformationList());
      while (VariableTransformBase *trf = (VariableTransformBase*) trIt()) {
         Log() << kINFO << " -- " << trf->GetName() << Endl;
      }
   }
   std::vector<TString>  vars;
   std::vector<Double_t> coeffs;
   for (UInt_t ivar=0; ivar<GetNvar(); ivar++) {
      vars  .push_back( GetInputLabel(ivar) );
      coeffs.push_back( (*fFisherCoeff)[ivar] );
   }
   vars  .push_back( "(offset)" );
   coeffs.push_back( fF0 );
   TMVA::gTools().FormattedOutput( coeffs, vars, "Variable" , "Coefficient", Log() );

   // for (int i=0; i<coeffs.size(); i++)
   //    std::cout << "fisher coeff["<<i<<"]="<<coeffs[i]<<std::endl;

   if (IsNormalised()) {
      Log() << kINFO << "NOTE: You have chosen to use the \"Normalise\" booking option. Hence, the" << Endl;
      Log() << kINFO << " coefficients must be applied to NORMALISED (') variables as follows:" << Endl;
      Int_t maxL = 0;
      for (UInt_t ivar=0; ivar<GetNvar(); ivar++) if (GetInputLabel(ivar).Length() > maxL) maxL = GetInputLabel(ivar).Length();

      // Print normalisation expression (see Tools.cxx): "2*(x - xmin)/(xmax - xmin) - 1.0"
      for (UInt_t ivar=0; ivar<GetNvar(); ivar++) {
         Log() << kINFO
               << std::setw(maxL+9) << TString("[") + GetInputLabel(ivar) + "]' = 2*("
               << std::setw(maxL+2) << TString("[") + GetInputLabel(ivar) + "]"
               << std::setw(3) << (GetXmin(ivar) > 0 ? " - " : " + ")
               << std::setw(6) << TMath::Abs(GetXmin(ivar)) << std::setw(3) << ")/"
               << std::setw(6) << (GetXmax(ivar) - GetXmin(ivar) )
               << std::setw(3) << " - 1"
               << Endl;
      }
      Log() << kINFO << "The TMVA Reader will properly account for this normalisation, but if the" << Endl;
      Log() << kINFO << "Fisher classifier is applied outside the Reader, the transformation must be" << Endl;
      Log() << kINFO << "implemented -- or the \"Normalise\" option is removed and Fisher retrained." << Endl;
      Log() << kINFO << Endl;
   }
}

////////////////////////////////////////////////////////////////////////////////
/// read Fisher coefficients from weight file

void TMVA::MethodFisher::ReadWeightsFromStream( std::istream& istr )
{
   istr >> fF0;
   for (UInt_t ivar=0; ivar<GetNvar(); ivar++) istr >> (*fFisherCoeff)[ivar];
}

////////////////////////////////////////////////////////////////////////////////
/// create XML description of Fisher classifier

void TMVA::MethodFisher::AddWeightsXMLTo( void* parent ) const
{
   void* wght = gTools().AddChild(parent, "Weights");
   gTools().AddAttr( wght, "NCoeff", GetNvar()+1 );
   void* coeffxml = gTools().AddChild(wght, "Coefficient");
   gTools().AddAttr( coeffxml, "Index", 0   );
   gTools().AddAttr( coeffxml, "Value", fF0 );
   for (UInt_t ivar=0; ivar<GetNvar(); ivar++) {
      coeffxml = gTools().AddChild( wght, "Coefficient" );
      gTools().AddAttr( coeffxml, "Index", ivar+1 );
      gTools().AddAttr( coeffxml, "Value", (*fFisherCoeff)[ivar] );
   }
}

////////////////////////////////////////////////////////////////////////////////
/// read Fisher coefficients from xml weight file

void TMVA::MethodFisher::ReadWeightsFromXML( void* wghtnode )
{
   UInt_t ncoeff, coeffidx;
   gTools().ReadAttr( wghtnode, "NCoeff", ncoeff );
   fFisherCoeff->resize(ncoeff-1);

   // the coefficient with index 0 is the offset F0
   void* ch = gTools().GetChild(wghtnode);
   Double_t coeff;
   while (ch) {
      gTools().ReadAttr( ch, "Index", coeffidx );
      gTools().ReadAttr( ch, "Value", coeff    );
      if (coeffidx==0) fF0 = coeff;
      else             (*fFisherCoeff)[coeffidx-1] = coeff;
      ch = gTools().GetNextChild(ch);
   }
}

////////////////////////////////////////////////////////////////////////////////
/// write Fisher-specific classifier response

void TMVA::MethodFisher::MakeClassSpecific( std::ostream& fout, const TString& className ) const
{
   Int_t dp = fout.precision();
   fout << "   double fFisher0;" << std::endl;
   fout << "   std::vector<double> fFisherCoefficients;" << std::endl;
   fout << "};" << std::endl;
   fout << "" << std::endl;
   fout << "inline void " << className << "::Initialize() " << std::endl;
   fout << "{" << std::endl;
   fout << "   fFisher0 = " << std::setprecision(12) << fF0 << ";" << std::endl;
   for (UInt_t ivar=0; ivar<GetNvar(); ivar++) {
      fout << "   fFisherCoefficients.push_back( " << std::setprecision(12) << (*fFisherCoeff)[ivar] << " );" << std::endl;
   }
   fout << std::endl;
   fout << "   // sanity check" << std::endl;
   fout << "   if (fFisherCoefficients.size() != fNvars) {" << std::endl;
   fout << "      std::cout << \"Problem in class \\\"\" << fClassName << \"\\\"::Initialize: mismatch in number of input values\"" << std::endl;
   fout << "                << fFisherCoefficients.size() << \" != \" << fNvars << std::endl;" << std::endl;
   fout << "      fStatusIsClean = false;" << std::endl;
   fout << "   } " << std::endl;
   fout << "}" << std::endl;
   fout << std::endl;
   fout << "inline double " << className << "::GetMvaValue__( const std::vector<double>& inputValues ) const" << std::endl;
   fout << "{" << std::endl;
   fout << "   double retval = fFisher0;" << std::endl;
   fout << "   for (size_t ivar = 0; ivar < fNvars; ivar++) {" << std::endl;
   fout << "      retval += fFisherCoefficients[ivar]*inputValues[ivar];" << std::endl;
   fout << "   }" << std::endl;
   fout << std::endl;
   fout << "   return retval;" << std::endl;
   fout << "}" << std::endl;
   fout << std::endl;
   fout << "// Clean up" << std::endl;
   fout << "inline void " << className << "::Clear() " << std::endl;
   fout << "{" << std::endl;
   fout << "   // clear coefficients" << std::endl;
   fout << "   fFisherCoefficients.clear(); " << std::endl;
   fout << "}" << std::endl;
   fout << std::setprecision(dp);
}

////////////////////////////////////////////////////////////////////////////////
/// get help message text
///
/// typical length of text line:
///         "|--------------------------------------------------------------|"

void TMVA::MethodFisher::GetHelpMessage() const
{
   Log() << Endl;
   Log() << gTools().Color("bold") << "--- Short description:" << gTools().Color("reset") << Endl;
   Log() << Endl;
   Log() << "Fisher discriminants select events by distinguishing the mean " << Endl;
   Log() << "values of the signal and background distributions in a trans- " << Endl;
   Log() << "formed variable space where linear correlations are removed." << Endl;
   Log() << Endl;
   Log() << " (More precisely: the \"linear discriminator\" determines" << Endl;
   Log() << " an axis in the (correlated) hyperspace of the input " << Endl;
   Log() << " variables such that, when projecting the output classes " << Endl;
   Log() << " (signal and background) upon this axis, they are pushed " << Endl;
   Log() << " as far as possible away from each other, while events" << Endl;
   Log() << " of the same class are confined to a close vicinity. The " << Endl;
   Log() << " linearity property of this classifier is reflected in the " << Endl;
   Log() << " metric with which \"far apart\" and \"close vicinity\" are " << Endl;
   Log() << " determined: the covariance matrix of the discriminating" << Endl;
   Log() << " variable space.)" << Endl;
   Log() << Endl;
   Log() << gTools().Color("bold") << "--- Performance optimisation:" << gTools().Color("reset") << Endl;
   Log() << Endl;
   Log() << "Optimal performance for Fisher discriminants is obtained for " << Endl;
   Log() << "linearly correlated Gaussian-distributed variables. Any deviation" << Endl;
   Log() << "from this ideal reduces the achievable separation power. In " << Endl;
   Log() << "particular, no discrimination at all is achieved for a variable" << Endl;
   Log() << "that has the same sample mean for signal and background, even if " << Endl;
   Log() << "the shapes of the distributions are very different. Thus, Fisher " << Endl;
   Log() << "discriminants often benefit from suitable transformations of the " << Endl;
   Log() << "input variables. For example, if a variable x in [-1,1] has a " << Endl;
   Log() << "parabolic signal distribution and a uniform background" << Endl;
   Log() << "distribution, the mean value is zero in both cases, leading " << Endl;
   Log() << "to no separation. The simple transformation x -> |x| renders this " << Endl;
   Log() << "variable powerful for the use in a Fisher discriminant." << Endl;
   Log() << Endl;
   Log() << gTools().Color("bold") << "--- Performance tuning via configuration options:" << gTools().Color("reset") << Endl;
   Log() << Endl;
   Log() << "<None>" << Endl;
}