Multivariate optimisation of signal efficiency for given background efficiency, applying rectangular minimum and maximum requirements.
Also implemented is a "decorrelated/diagonalized cuts approach", which improves on the uncorrelated cuts ansatz by linearly transforming the input variables into a diagonal space, using the square root of the covariance matrix.
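The decorrelation step can be illustrated for two input variables. The following is a minimal sketch, not the TMVA implementation: the names `Sym2`, `sqrtSym2` and `decorrelate` are hypothetical, and the closed-form square root used here only holds for symmetric positive-definite 2x2 matrices.

```cpp
#include <array>
#include <cmath>
#include <stdexcept>

// Symmetric 2x2 matrix [[a, b], [b, c]] (hypothetical helper type).
struct Sym2 {
    double a, b, c;
};

// Principal square root of a symmetric positive-definite 2x2 matrix M,
// using sqrt(M) = (M + s*I) / t with s = sqrt(det M), t = sqrt(tr M + 2 s).
Sym2 sqrtSym2(const Sym2& m) {
    const double det = m.a * m.c - m.b * m.b;
    if (det <= 0.0 || m.a <= 0.0) {
        throw std::domain_error("matrix is not positive definite");
    }
    const double s = std::sqrt(det);
    const double t = std::sqrt(m.a + m.c + 2.0 * s);
    return {(m.a + s) / t, m.b / t, (m.c + s) / t};
}

// Transform x' = R^{-1} x, where R is the square root of the covariance
// matrix. In the transformed space the covariance matrix is the unit matrix.
std::array<double, 2> decorrelate(const Sym2& cov, const std::array<double, 2>& x) {
    const Sym2 r = sqrtSym2(cov);
    const double det = r.a * r.c - r.b * r.b;
    return {( r.c * x[0] - r.b * x[1]) / det,
            (-r.b * x[0] + r.a * x[1]) / det};
}
```

Rectangular cuts are then applied to the transformed variables rather than to the original ones.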
Other optimisation criteria, such as maximising the signal significance squared, S^2/(S+B), with S and B being the signal and background yields, correspond to a particular point on the optimised background rejection versus signal efficiency curve. Choosing this working point requires knowledge of the expected yields, which are not known in general. Note also that for rare signals, Poissonian statistics should be used, which modifies the significance criterion.
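How the expected yields select a point on the efficiency curve can be sketched as follows. This is an illustration only: `WorkingPoint` and `bestWorkingPoint` are hypothetical names, and the simple S^2/(S+B) criterion is used without the Poissonian correction mentioned above.

```cpp
#include <cstddef>
#include <vector>

// One point of the optimised curve (hypothetical helper type).
struct WorkingPoint {
    double effS;  // signal efficiency
    double effB;  // background efficiency
};

// Returns the index of the point maximising S^2/(S+B), where
// S = effS * nSig and B = effB * nBkg are the yields after cuts.
// Returns -1 if no point has a positive total yield.
int bestWorkingPoint(const std::vector<WorkingPoint>& curve,
                     double nSig, double nBkg) {
    int best = -1;
    double bestVal = -1.0;
    for (std::size_t i = 0; i < curve.size(); ++i) {
        const double s = curve[i].effS * nSig;
        const double b = curve[i].effB * nBkg;
        if (s + b <= 0.0) continue;  // criterion undefined
        const double val = s * s / (s + b);
        if (val > bestVal) {
            bestVal = val;
            best = static_cast<int>(i);
        }
    }
    return best;
}
```

With the same curve, changing the assumed background yield moves the preferred point, which is why the criterion cannot be applied without yield estimates.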
The rectangular cut of a volume in the variable space is performed using a binary tree to sort the training events. This provides a significant reduction in computing time (up to several orders of magnitude, depending on the complexity of the problem at hand).
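The idea of counting events inside a rectangular volume with a binary tree can be sketched as below. This is a simplified stand-in, not the TMVA::BinarySearchTree class: `EventTree`, `insert` and `sumInBox` are hypothetical names, the tree splits on the variables in turn with depth, and it is not balanced.

```cpp
#include <cstddef>
#include <memory>
#include <vector>

typedef std::vector<double> Point;

class EventTree {
public:
    explicit EventTree(std::size_t nDim) : fDim(nDim) {}

    // Descend from the root, comparing variable (depth % nDim) at each level.
    void insert(const Point& x, double weight = 1.0) {
        std::unique_ptr<Node>* cur = &fRoot;
        std::size_t depth = 0;
        while (*cur) {
            const std::size_t d = depth % fDim;
            cur = (x[d] < (*cur)->x[d]) ? &(*cur)->left : &(*cur)->right;
            ++depth;
        }
        cur->reset(new Node(x, weight));
    }

    // Sum of weights of all events with lo[d] <= x[d] <= hi[d] for every d.
    double sumInBox(const Point& lo, const Point& hi) const {
        return sum(fRoot.get(), lo, hi, 0);
    }

private:
    struct Node {
        Node(const Point& p, double w) : x(p), weight(w) {}
        Point x;
        double weight;
        std::unique_ptr<Node> left, right;
    };

    bool inside(const Point& x, const Point& lo, const Point& hi) const {
        for (std::size_t d = 0; d < fDim; ++d) {
            if (x[d] < lo[d] || x[d] > hi[d]) return false;
        }
        return true;
    }

    double sum(const Node* n, const Point& lo, const Point& hi,
               std::size_t depth) const {
        if (!n) return 0.0;
        const std::size_t d = depth % fDim;
        double total = inside(n->x, lo, hi) ? n->weight : 0.0;
        // Left subtree holds x[d] < n->x[d]: skip it if the box starts above.
        if (lo[d] < n->x[d]) total += sum(n->left.get(), lo, hi, depth + 1);
        // Right subtree holds x[d] >= n->x[d]: skip it if the box ends below.
        if (hi[d] >= n->x[d]) total += sum(n->right.get(), lo, hi, depth + 1);
        return total;
    }

    std::size_t fDim;
    std::unique_ptr<Node> fRoot;
};
```

The gain comes from the two pruning tests in `sum`: whole subtrees that cannot overlap the cut volume are never visited, whereas a plain event loop tests every event for every candidate set of cuts.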
Technically, optimisation is achieved in TMVA by two methods: Monte Carlo (MC) sampling and a Genetic Algorithm (GA).
Attempts to use Minuit fits (Simplex or Migrad) instead have not shown superior results, and often failed due to convergence at local minima.
The tests we have performed so far showed that in generic applications the GA is superior to MC sampling, and hence GA is the default method. It is nevertheless worthwhile trying both.
Decorrelated (or "diagonalized") Cuts
See class description for Method Likelihood for a detailed explanation.
void | CreateVariablePDFs() |
void | GetEffsfromPDFs(Double_t* cutMin, Double_t* cutMax, Double_t& effS, Double_t& effB) |
void | GetEffsfromSelection(Double_t* cutMin, Double_t* cutMax, Double_t& effS, Double_t& effB) |
virtual void | Init() |
void | MatchCutsToPars(vector<Double_t>&, Double_t*, Double_t*) |
void | MatchCutsToPars(vector<Double_t>&, Double_t**, Double_t**, Int_t ibin) |
void | MatchParsToCuts(const vector<Double_t>&, Double_t*, Double_t*) |
void | MatchParsToCuts(Double_t*, Double_t*, Double_t*) |
enum EFitMethodType {
    kUseMonteCarlo
    kUseGeneticAlgorithm
    kUseSimulatedAnnealing
    kUseMinuit
    kUseEventScan
    kUseMonteCarloEvents
};
enum EEffMethod {
    kUseEventSelection
    kUsePDFs
};
enum EFitParameters {
    kNotEnforced
    kForceMin
    kForceMax
    kForceSmart
};
enum TMVA::MethodBase::EWeightFileType {
    kROOT
    kTEXT
};
enum TObject::EStatusBits {
    kCanDelete
    kMustCleanup
    kObjInCanvas
    kIsReferenced
    kHasUUID
    kCannotPick
    kNoContextMenu
    kInvalidObject
};
enum TObject::[unnamed] {
    kIsOnHeap
    kNotDeleted
    kZombie
    kBitMask
    kSingleKey
    kOverwrite
    kWriteDelete
};
Bool_t | TMVA::MethodBase::fSetupCompleted | is method setup |
const TMVA::Event* | TMVA::MethodBase::fTmpEvent | ! temporary event when testing on a different DataSet than the own one |
static const Double_t | fgMaxAbsCutVal |
TMVA::Types::EAnalysisType | TMVA::MethodBase::fAnalysisType | method-mode : true --> regression, false --> classification |
UInt_t | TMVA::MethodBase::fBackgroundClass | index of the Background-class |
vector<TString>* | TMVA::MethodBase::fInputVars | vector of input variables used in MVA |
vector<Float_t>* | TMVA::MethodBase::fMulticlassReturnVal | holds the return-values for the multiclass classification |
Int_t | TMVA::MethodBase::fNbins | number of bins in input variable histograms |
Int_t | TMVA::MethodBase::fNbinsH | number of bins in evaluation histograms |
Int_t | TMVA::MethodBase::fNbinsMVAoutput | number of bins in MVA output histograms |
TMVA::Ranking* | TMVA::MethodBase::fRanking | pointer to ranking object (created by derived classifiers) |
vector<Float_t>* | TMVA::MethodBase::fRegressionReturnVal | holds the return-values for the regression |
UInt_t | TMVA::MethodBase::fSignalClass | index of the Signal-class |
TString* | fAllVarsI | what to do with variables |
TMVA::BinarySearchTree* | fBinaryTreeB | |
TMVA::BinarySearchTree* | fBinaryTreeS | |
Double_t** | fCutMax | maximum requirement |
Double_t** | fCutMin | minimum requirement |
vector<Interval*> | fCutRange | allowed ranges for cut optimisation |
Double_t* | fCutRangeMax | maximum of allowed cut range |
Double_t* | fCutRangeMin | minimum of allowed cut range |
TH1* | fEffBvsSLocal | intermediate eff. background versus eff signal histo |
TMVA::MethodCuts::EEffMethod | fEffMethod | chosen efficiency calculation method |
TString | fEffMethodS | chosen efficiency calculation method (string) |
Double_t | fEffRef | reference efficiency |
Double_t | fEffSMax | used to test optimized signal efficiency |
Double_t | fEffSMin | used to test optimized signal efficiency |
TMVA::MethodCuts::EFitMethodType | fFitMethod | chosen fit method |
TString | fFitMethodS | chosen fit method (string) |
vector<EFitParameters>* | fFitParams | vector for series of fit methods |
vector<Double_t>* | fMeanB | means of variables (background) |
vector<Double_t>* | fMeanS | means of variables (signal) |
Bool_t | fNegEffWarning | flag raised in case of negative efficiency warning |
Int_t | fNpar | number of parameters in fit (default: 2*Nvar) |
TRandom* | fRandom | random generator for MC optimisation method |
vector<Int_t>* | fRangeSign | used to match cuts to fit parameters (and vice versa) |
vector<Double_t>* | fRmsB | RMSs of variables (background) |
vector<Double_t>* | fRmsS | RMSs of variables (signal) |
Double_t | fTestSignalEff | used to test optimized signal efficiency |
Double_t* | fTmpCutMax | temporary maximum requirement |
Double_t* | fTmpCutMin | temporary minimum requirement |
vector<TH1*>* | fVarHistB | reference histograms (background) |
vector<TH1*>* | fVarHistB_smooth | smoothed reference histograms (background) |
vector<TH1*>* | fVarHistS | reference histograms (signal) |
vector<TH1*>* | fVarHistS_smooth | smoothed reference histograms (signal) |
vector<PDF*>* | fVarPdfB | reference PDFs (background) |
vector<PDF*>* | fVarPdfS | reference PDFs (signal) |
standard constructor
construction from weight file
Cuts can only handle classification with 2 classes
define the options (their key words) that can be set in the option string
known options:
Method <string> Minimisation method
   available values are: MC Monte Carlo <default>
                         GA Genetic Algorithm
                         SA Simulated annealing
EffMethod <string> Efficiency selection method
   available values are: EffSel <default>
                         EffPDF
VarProp <string> Property of variable 1 for the MC method (taking precedence over the global setting). The same values as for the global option are available. Variables 1..10 can be set this way
CutRangeMin/Max <float> user-defined ranges in which cuts are varied
process user options
sanity check: do not allow the input variables to be normalised, because this only creates problems when interpreting the cuts
cut evaluation: returns 1.0 if event passed, 0.0 otherwise
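A minimal sketch of such a cut evaluation is shown below. The function name `passesCuts` is hypothetical, and the boundary convention (lower edge exclusive, upper edge inclusive) is an assumption made for this example.

```cpp
#include <cstddef>
#include <vector>

// Returns 1.0 if every variable lies inside its window, 0.0 otherwise.
// Assumed convention: cutMin[i] < x[i] <= cutMax[i].
double passesCuts(const std::vector<double>& x,
                  const std::vector<double>& cutMin,
                  const std::vector<double>& cutMax) {
    for (std::size_t i = 0; i < x.size(); ++i) {
        if (x[i] <= cutMin[i] || x[i] > cutMax[i]) return 0.0;
    }
    return 1.0;
}
```

Returning a floating-point 0.0 or 1.0 keeps the interface uniform with classifiers that return a continuous response.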
retrieve cut values for given signal efficiency; assumes vectors of the correct size!
retrieve cut values for given signal efficiency
returns estimator for "cut fitness" used by GA
there are two requirements:
1) the signal efficiency must be equal to the required one in the
efficiency scan
2) the background efficiency must be as small as possible
the requirement 1) has priority over 2)
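One simple way to encode these two requirements in a single number to be minimised is a penalty term. The sketch below is an illustration under that assumption and is not the estimator used by the class; `cutFitness`, `targetEffS` and `penalty` are hypothetical names.

```cpp
#include <cmath>

// Smaller values are better.
// Requirement 1: deviation of effS from the target is penalised heavily.
// Requirement 2: among cuts with the same effS, lower effB wins.
double cutFitness(double effS, double effB, double targetEffS,
                  double penalty = 1000.0) {
    return penalty * std::fabs(effS - targetEffS) + effB;
}
```

Since effB lies in [0,1], the first requirement dominates whenever the signal efficiency misses the target by more than 1/penalty.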
translates parameters into cuts
translate the cuts into parameters (obsolete function)
compute signal and background efficiencies from PDFs for given cut sample
compute signal and background efficiencies from event counting for given cut sample
create XML description for LD classification and regression (for arbitrary number of output classes/targets)
- overloaded function to create background efficiency (rejection) versus signal efficiency plot (first call of this function)
- the function returns the signal efficiency at background efficiency indicated in theString
"theString" must have two entries:
[0]: "Efficiency"
[1]: the value of background efficiency at which the signal efficiency is to be returned
get help message text
typical length of text line:
"|--------------------------------------------------------------|"
this is a workaround which is necessary since CINT is not capable of handling dynamic casts
{ return dynamic_cast<MethodCuts*>(method); }
rarity distributions (signal or background (default) is uniform in [0,1])
{ return 0; }
the definition of fit parameters can be different from the actual cut requirements; these functions provide the matching
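The matching between fit parameters and cuts can be sketched as follows, assuming two parameters per variable (consistent with the 2*Nvar default noted for fNpar) that encode the lower edge and the width of the cut window. That parametrisation, and the names `parsToCuts` and `cutsToPars`, are assumptions made for this example.

```cpp
#include <cstddef>
#include <vector>

// pars = (min_0, width_0, min_1, width_1, ...)  ->  cutMin, cutMax
void parsToCuts(const std::vector<double>& pars,
                std::vector<double>& cutMin,
                std::vector<double>& cutMax) {
    const std::size_t nVar = pars.size() / 2;
    cutMin.resize(nVar);
    cutMax.resize(nVar);
    for (std::size_t i = 0; i < nVar; ++i) {
        cutMin[i] = pars[2 * i];
        cutMax[i] = pars[2 * i] + pars[2 * i + 1];
    }
}

// Inverse mapping: cutMin, cutMax  ->  pars
void cutsToPars(const std::vector<double>& cutMin,
                const std::vector<double>& cutMax,
                std::vector<double>& pars) {
    pars.resize(2 * cutMin.size());
    for (std::size_t i = 0; i < cutMin.size(); ++i) {
        pars[2 * i] = cutMin[i];
        pars[2 * i + 1] = cutMax[i] - cutMin[i];
    }
}
```

Fitting a width constrained to be non-negative, rather than the upper edge itself, guarantees that cutMax is never below cutMin during the optimisation.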