library: libTMVA
#include "MethodCuts.h"

TMVA::MethodCuts



class TMVA::MethodCuts : public TMVA::MethodBase

Inheritance Chart:
TObject <- TMVA::MethodBase <- TMVA::MethodCuts

    private:
        void CreateVariablePDFs()
        void GetEffsfromPDFs(Double_t* cutMin, Double_t* cutMax, Double_t& effS, Double_t& effB)
        void GetEffsfromSelection(Double_t* cutMin, Double_t* cutMax, Double_t& effS, Double_t& effB)
        void InitCuts()
        void MatchCutsToPars(Double_t*, Double_t*, Double_t*)
        void MatchParsToCuts(const vector<Double_t>&, Double_t*, Double_t*)
        void MatchParsToCuts(Double_t*, Double_t*, Double_t*)
        Bool_t SanityChecks()
    public:
        MethodCuts(TString jobName, vector<TString>* theVariables, TTree* theTree = 0, TString theOption = "MC:150:10000:", TDirectory* theTargetFile = 0)
        MethodCuts(vector<TString>* theVariables, TString theWeightFile, TDirectory* theTargetDir = NULL)
        MethodCuts(const TMVA::MethodCuts&)
        virtual ~MethodCuts()
        static TClass* Class()
        Double_t ComputeEstimator(const vector<Double_t>&)
        virtual Double_t GetEfficiency(TString, TTree*)
        virtual Double_t GetmuTransform(TTree*)
        virtual Double_t GetMvaValue(TMVA::Event* e)
        virtual Double_t GetSeparation()
        virtual Double_t GetSignificance()
        virtual TClass* IsA() const
        TMVA::MethodCuts& operator=(const TMVA::MethodCuts&)
        virtual void ReadWeightsFromFile()
        void SetTestSignalEfficiency(Double_t eff)
        virtual void ShowMembers(TMemberInspector& insp, char* parent)
        virtual void Streamer(TBuffer& b)
        void StreamerNVirtual(TBuffer& b)
        virtual void TestInitLocal(TTree* testTree)
        static TMVA::MethodCuts* ThisCuts()
        virtual void Train()
        virtual void WriteHistosToFile()
        virtual void WriteWeightsToFile()

Data Members

    private:
        TMVA::MethodCuts::ConstrainType fConstrainType
        TMVA::MethodCuts::FitMethodType fFitMethod    chosen fit method
        TMVA::MethodCuts::EffMethod fEffMethod    chosen efficiency calculation method
        vector<FitParameters>* fFitParams    vector for series of fit methods
        Double_t fTestSignalEff    used to test optimized signal efficiency
        Double_t fEffSMin    used to test optimized signal efficiency
        Double_t fEffSMax    used to test optimized signal efficiency
        TMVA::BinarySearchTree* fBinaryTreeS
        TMVA::BinarySearchTree* fBinaryTreeB
        Int_t fGa_nsteps    GA settings: number of steps
        Int_t fGa_preCalc    GA settings: number of pre-calc steps
        Int_t fGa_SC_steps    GA settings: SC_steps
        Int_t fGa_SC_offsteps    GA settings: SC_offsteps
        Double_t fGa_SC_factor    GA settings: SC_factor
        Double_t fEffRef    reference efficiency
        vector<Int_t>* fRangeSign    used to match cuts to fit parameters (and vice versa)
        TRandom* fTrandom    random generator for MC optimisation method
        Int_t fNpar    number of parameters in fit (default: 2*Nvar)
        vector<Double_t>* fMeanS    means of variables (signal)
        vector<Double_t>* fMeanB    means of variables (background)
        vector<Double_t>* fRmsS    RMSs of variables (signal)
        vector<Double_t>* fRmsB    RMSs of variables (background)
        vector<Double_t>* fXmin    minimum values of variables
        vector<Double_t>* fXmax    maximum values of variables
        TH1* fEffBvsSLocal    intermediate eff. background versus eff signal histo
        vector<TH1*>* fVarHistS    reference histograms (signal)
        vector<TH1*>* fVarHistB    reference histograms (background)
        vector<TH1*>* fVarHistS_smooth    smoothed reference histograms (signal)
        vector<TH1*>* fVarHistB_smooth    smoothed reference histograms (background)
        vector<PDF*>* fVarPdfS    reference PDFs (signal)
        vector<PDF*>* fVarPdfB    reference PDFs (background)
        Int_t fNRandCuts    number of random cut samplings
        Double_t** fCutMin    minimum requirement
        Double_t** fCutMax    maximum requirement
        Double_t* fTmpCutMin    temporary minimum requirement
        Double_t* fTmpCutMax    temporary maximum requirement
        static TMVA::MethodCuts* fgThisCuts    needed for function reference (GA)
    public:
        static const TMVA::MethodCuts::ConstrainType kConstrainEffS
        static const TMVA::MethodCuts::ConstrainType kConstrainEffB
        static const TMVA::MethodCuts::FitMethodType kUseMonteCarlo
        static const TMVA::MethodCuts::FitMethodType kUseGeneticAlgorithm
        static const TMVA::MethodCuts::EffMethod kUseEventSelection
        static const TMVA::MethodCuts::EffMethod kUsePDFs
        static const TMVA::MethodCuts::FitParameters kNotEnforced
        static const TMVA::MethodCuts::FitParameters kForceMin
        static const TMVA::MethodCuts::FitParameters kForceMax
        static const TMVA::MethodCuts::FitParameters kForceSmart
        static const TMVA::MethodCuts::FitParameters kForceVerySmart

Class Description

_______________________________________________________________________
Multivariate optimisation of signal efficiency for given background efficiency, applying rectangular minimum and maximum requirements.

Other optimisation criteria, such as maximising the signal significance squared, S^2/(S+B), with S and B being the signal and background yields, correspond to a particular point on the optimised background rejection versus signal efficiency curve. Choosing this working point requires knowledge of the expected yields, which is in general not available. Note also that for rare signals, Poissonian statistics should be used, which modifies the significance criterion.
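
As an illustration only (not TMVA code), the following sketch picks the working point that maximises S^2/(S+B) once expected yields are supplied. The names WorkingPoint, SignificanceSquared and BestWorkingPoint, and the yields nSig and nBkg, are hypothetical and do not appear in MethodCuts.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical working point on the background versus signal efficiency curve.
struct WorkingPoint {
    double effS;  // signal efficiency of the cut sample
    double effB;  // background efficiency of the cut sample
};

// S^2/(S+B) for expected yields before cuts, nSig and nBkg (assumed inputs).
double SignificanceSquared(const WorkingPoint& wp, double nSig, double nBkg) {
    const double s = wp.effS * nSig;
    const double b = wp.effB * nBkg;
    return (s + b > 0.0) ? s * s / (s + b) : 0.0;
}

// Index of the working point with the largest S^2/(S+B).
std::size_t BestWorkingPoint(const std::vector<WorkingPoint>& curve,
                             double nSig, double nBkg) {
    assert(!curve.empty());
    std::size_t best = 0;
    for (std::size_t i = 1; i < curve.size(); ++i) {
        if (SignificanceSquared(curve[i], nSig, nBkg) >
            SignificanceSquared(curve[best], nSig, nBkg)) {
            best = i;
        }
    }
    return best;
}
```

The selected point moves along the curve as the assumed yields change, which is why the criterion cannot be applied without them.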

The rectangular cut of a volume in the variable space is performed using a binary tree to sort the training events. This provides a significant reduction in computing time (up to several orders of magnitude, depending on the complexity of the problem at hand).
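
The idea can be sketched with a generic k-d tree that counts the events inside a rectangular volume while skipping subtrees that cannot overlap it. This is an assumption-laden illustration: it does not reproduce the layout or interface of TMVA::BinarySearchTree, and Point, Node, Build and CountInBox are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

using Point = std::vector<double>;  // one event: a value per variable

struct Node {
    Point point;
    std::unique_ptr<Node> left, right;
};

// Build a tree over pts[lo, hi), splitting on variable (depth % nVar)
// at the median event. Reorders pts in place.
std::unique_ptr<Node> Build(std::vector<Point>& pts, std::size_t lo,
                            std::size_t hi, std::size_t depth) {
    assert(hi <= pts.size());
    if (lo >= hi) return nullptr;
    const std::size_t dim = depth % pts[lo].size();
    const std::size_t mid = lo + (hi - lo) / 2;
    std::nth_element(pts.begin() + lo, pts.begin() + mid, pts.begin() + hi,
                     [dim](const Point& a, const Point& b) { return a[dim] < b[dim]; });
    std::unique_ptr<Node> node(new Node);
    node->point = pts[mid];
    node->left = Build(pts, lo, mid, depth + 1);
    node->right = Build(pts, mid + 1, hi, depth + 1);
    return node;
}

// Count events with cutMin[i] <= x[i] <= cutMax[i] for all variables i.
std::size_t CountInBox(const Node* node, const Point& cutMin,
                       const Point& cutMax, std::size_t depth) {
    if (!node) return 0;
    const std::size_t dim = depth % cutMin.size();
    bool inside = true;
    for (std::size_t i = 0; i < cutMin.size(); ++i) {
        if (node->point[i] < cutMin[i] || node->point[i] > cutMax[i]) {
            inside = false;
            break;
        }
    }
    std::size_t n = inside ? 1 : 0;
    // Left subtree holds values <= the split value, right subtree values >= it,
    // so a side is visited only if the cut range can reach it.
    if (cutMin[dim] <= node->point[dim])
        n += CountInBox(node->left.get(), cutMin, cutMax, depth + 1);
    if (cutMax[dim] >= node->point[dim])
        n += CountInBox(node->right.get(), cutMin, cutMax, depth + 1);
    return n;
}
```

The saving comes from the two pruning tests: a narrow cut volume touches only a small fraction of the nodes instead of every training event.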

Technically, optimisation is achieved in TMVA by two methods:

  1. Monte Carlo generation using uniform priors for the lower cut value, and the cut width, thrown within the variable ranges.
  2. A Genetic Algorithm (GA) searches for the optimal ("fittest") cut sample. The GA is configurable by many external settings through the option string. For difficult cases (such as many variables), some tuning may be necessary to achieve satisfactory results.
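
A minimal sketch of the Monte Carlo sampling in method 1, assuming the lower cut is drawn uniformly within the variable range and the width uniformly within the remaining range above it. The exact sampling used by MethodCuts is not specified here; CutSample and ThrowRandomCuts are hypothetical names, and std::mt19937 stands in for the TRandom generator.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <random>
#include <vector>

// Hypothetical container for one random cut sample.
struct CutSample {
    std::vector<double> cutMin;
    std::vector<double> cutMax;
};

// Throw one cut sample: uniform lower cut value and uniform cut width,
// both kept inside [xmin[i], xmax[i]] for each variable i.
CutSample ThrowRandomCuts(const std::vector<double>& xmin,
                          const std::vector<double>& xmax,
                          std::mt19937& rng) {
    assert(xmin.size() == xmax.size());
    std::uniform_real_distribution<double> unit(0.0, 1.0);
    CutSample c;
    for (std::size_t i = 0; i < xmin.size(); ++i) {
        assert(xmin[i] <= xmax[i]);
        const double range = xmax[i] - xmin[i];
        // Clamp to guard against floating-point rounding at the upper edge.
        const double lower = std::min(xmin[i] + unit(rng) * range, xmax[i]);
        const double width = unit(rng) * (xmax[i] - lower);
        c.cutMin.push_back(lower);
        c.cutMax.push_back(std::min(lower + width, xmax[i]));
    }
    return c;
}
```

Each thrown sample would then be evaluated for its signal and background efficiency, keeping the sample with the lowest background efficiency at each signal efficiency.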

Attempts to use Minuit fits (Simplex or Migrad) instead have not shown superior results, and often failed due to convergence at local minima.

The tests we have performed so far showed that in generic applications, the GA is superior to MC sampling, and hence GA is the default method. It is worthwhile to try both anyway.

MethodCuts( TString jobName, vector<TString>* theVariables, TTree* theTree, TString theOption, TDirectory* theTargetDir )
 standard constructor
 ----------------------------------------------------------------------------------
 format of option string: "OptMethod:EffMethod:Option_var1:...:Option_varn"
 "OptMethod" can be:
     - "GA"    : Genetic Algorithm (recommended)
     - "MC"    : Monte-Carlo optimization
 "EffMethod" can be:
     - "EffSel": compute efficiency by event counting
     - "EffPDF": compute efficiency from PDFs
 === For "GA" method ======
 "Option_var1++" are (see GA for explanation of parameters):
     - fGa_nsteps
     - fGa_preCalc
     - fGa_SC_steps
     - fGa_SC_offsteps
     - fGa_SC_factor
 === For "MC" method ======
 "Option_var1" is number of random samples
 "Option_var2++" can be
     - "FMax"  : ForceMax   (the max cut is fixed to maximum of variable i)
     - "FMin"  : ForceMin   (the min cut is fixed to minimum of variable i)
     - "FSmart": ForceSmart (the min or max cut is fixed to min/max, based on mean value)
     - Adding "All" to "option_vari", e.g. "AllFSmart", will use this option for all variables
     - if "option_vari" is empty (== ""), no assumptions on cut min/max are made
 ----------------------------------------------------------------------------------
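
For illustration, the colon-separated format above can be split as sketched below. This is not the parser used by MethodCuts: it uses std::string instead of TString, and SplitOptions, CutOptions and ParseCutOptions are hypothetical names. Empty fields are kept, since an empty per-variable option carries meaning.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Split on ':' and keep empty fields ("" means no assumption on cut min/max).
std::vector<std::string> SplitOptions(const std::string& opt) {
    std::vector<std::string> tokens;
    std::size_t start = 0;
    while (true) {
        const std::size_t pos = opt.find(':', start);
        if (pos == std::string::npos) {
            tokens.push_back(opt.substr(start));
            break;
        }
        tokens.push_back(opt.substr(start, pos - start));
        start = pos + 1;
    }
    return tokens;
}

// Hypothetical holder for the parsed fields.
struct CutOptions {
    std::string optMethod;          // "GA" or "MC"
    std::string effMethod;          // "EffSel" or "EffPDF"
    std::vector<std::string> rest;  // Option_var1 ... Option_varn
};

// Precondition (assumed by this sketch): at least "OptMethod:EffMethod" is present.
CutOptions ParseCutOptions(const std::string& opt) {
    const std::vector<std::string> tokens = SplitOptions(opt);
    assert(tokens.size() >= 2 && "expected at least OptMethod:EffMethod");
    CutOptions o;
    o.optMethod = tokens[0];
    o.effMethod = tokens[1];
    o.rest.assign(tokens.begin() + 2, tokens.end());
    return o;
}
```

With the string "MC:EffSel:10000:FSmart::FMax", the remaining fields are the number of random samples followed by one option per variable, the second of which is empty.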
MethodCuts( vector<TString> *theVariables, TString theWeightFile, TDirectory* theTargetDir )
 construction from weight file
void InitCuts( void )
 default initialisation called by all constructors
~MethodCuts( void )
 destructor
Double_t GetMvaValue( TMVA::Event *e )
 cut evaluation: returns 1.0 if event passed, 0.0 otherwise
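
 A minimal sketch of such a cut evaluation, assuming inclusive boundaries and plain vectors in place of TMVA::Event (PassesCuts is a hypothetical name, not a MethodCuts member):

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Returns 1.0 if every variable lies within [cutMin[i], cutMax[i]], else 0.0.
// Inclusive boundaries are an assumption of this sketch.
double PassesCuts(const std::vector<double>& x,
                  const std::vector<double>& cutMin,
                  const std::vector<double>& cutMax) {
    assert(x.size() == cutMin.size() && x.size() == cutMax.size());
    for (std::size_t i = 0; i < x.size(); ++i) {
        if (x[i] < cutMin[i] || x[i] > cutMax[i]) return 0.0;
    }
    return 1.0;
}
```
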
void Train( void )
 training method: here the cuts are optimised for the training sample
Double_t ComputeEstimator( const std::vector<Double_t> & par )
 returns estimator for "cut fitness" used by GA
 there are two requirements:
 1) the signal efficiency must be equal to the required one in the
    efficiency scan
 2) the background efficiency must be as small as possible
 the requirement 1) has priority over 2)
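
 One possible way to encode these two requirements in a single number is a penalty term, sketched below. The functional form, the penalty weight and the convention that smaller values are fitter are all assumptions of this sketch; the estimator actually computed by ComputeEstimator is not documented here, and CutFitness is a hypothetical name.

```cpp
#include <cassert>

// Smaller is better (assumed convention). The quadratic penalty makes a miss
// of the required signal efficiency cost far more than any gain in effB,
// which approximates "requirement 1) has priority over 2)".
double CutFitness(double effS, double effB, double effSRequired,
                  double penaltyWeight = 1000.0) {
    assert(penaltyWeight > 0.0);
    const double miss = effS - effSRequired;
    return effB + penaltyWeight * miss * miss;
}
```
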
void MatchParsToCuts( const std::vector<Double_t> & par, Double_t* cutMin, Double_t* cutMax )
 translates parameters into cuts
void MatchCutsToPars( Double_t* par, Double_t* cutMin, Double_t* cutMax )
 translates cuts into parameters
void GetEffsfromPDFs( Double_t* cutMin, Double_t* cutMax, Double_t& effS, Double_t& effB )
 compute signal and background efficiencies from PDFs
 for given cut sample
void GetEffsfromSelection( Double_t* cutMin, Double_t* cutMax, Double_t& effS, Double_t& effB)
 compute signal and background efficiencies from event counting
 for given cut sample
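
 Event counting can be sketched with a plain loop over the events, as below. This illustration assumes unweighted events and inclusive boundaries, and it does not use the binary trees that MethodCuts relies on for speed; Efficiencies, CountPassing and EffsFromSelection are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

using Events = std::vector<std::vector<double>>;  // one inner vector per event

// Hypothetical result type: fractions of events passing the cut sample.
struct Efficiencies {
    double effS;
    double effB;
};

// Number of events with cutMin[i] <= x[i] <= cutMax[i] for all variables i.
std::size_t CountPassing(const Events& events,
                         const std::vector<double>& cutMin,
                         const std::vector<double>& cutMax) {
    std::size_t n = 0;
    for (const std::vector<double>& x : events) {
        assert(x.size() == cutMin.size() && x.size() == cutMax.size());
        bool pass = true;
        for (std::size_t i = 0; i < x.size(); ++i) {
            if (x[i] < cutMin[i] || x[i] > cutMax[i]) {
                pass = false;
                break;
            }
        }
        if (pass) ++n;
    }
    return n;
}

// Efficiency = passing events / all events, separately for each sample.
Efficiencies EffsFromSelection(const Events& signal, const Events& background,
                               const std::vector<double>& cutMin,
                               const std::vector<double>& cutMax) {
    assert(!signal.empty() && !background.empty());
    Efficiencies e;
    e.effS = static_cast<double>(CountPassing(signal, cutMin, cutMax)) / signal.size();
    e.effB = static_cast<double>(CountPassing(background, cutMin, cutMax)) / background.size();
    return e;
}
```
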
void CreateVariablePDFs( void )
 for PDF method: create efficiency reference histograms and PDFs
Bool_t SanityChecks( void )
 basic checks to ensure that assumptions on variable order are satisfied
void WriteWeightsToFile( void )
 write cuts to file
void ReadWeightsFromFile( void )
 read cuts from file
void WriteHistosToFile( void )
 write histograms and PDFs (if exist) to file for monitoring purposes
void TestInitLocal( TTree *theTree )
 create binary trees (global member variables) for signal and background
MethodCuts( TString jobName, vector<TString>* theVariables, TTree* theTree = 0, TString theOption = "MC:150:10000:", TDirectory* theTargetFile = 0 )
Double_t GetSignificance( void )
 also overwrite:
void SetTestSignalEfficiency( Double_t eff )
MethodCuts* ThisCuts( void )
 static pointer to this object

Author: Andreas Hoecker, Peter Speckmayer, Helge Voss, Kai Voss
Last update: root/tmva $Id: MethodCuts.cxx,v 1.3 2006/05/23 19:35:06 brun Exp $
Copyright (c) 2005: *


