The Toolkit for Multivariate Analysis (TMVA) provides a ROOT-integrated machine learning environment for the processing and parallel evaluation of multivariate classification and regression techniques. TMVA is designed specifically for the needs of high-energy physics (HEP) applications, but its use is not restricted to them.
The package includes:
- Neural networks
- Deep networks
- Multilayer perceptron
- Boosted/Bagged decision trees
- Function discriminant analysis (FDA)
- Linear discriminant analysis (H-Matrix, Fisher and linear (LD) discriminants)
- Multidimensional probability density estimation (PDE - range-search approach and PDE-Foam)
- Multidimensional k-nearest neighbour method
- Predictive learning via rule ensembles (RuleFit)
- Projective likelihood estimation (PDE approach)
- Rectangular cut optimisation
- Support Vector Machine (SVM)
TMVA consists of object-oriented C++ implementations of each of these multivariate methods and provides training, testing and performance evaluation algorithms, together with visualisation scripts. MVA training and testing are performed on user-supplied data sets in the form of ROOT trees or text files, in which each event can carry an individual weight. The true event classification, or the target value for regression problems, must be known in these data sets. Preselection requirements and transformations can be applied to the data. TMVA supports the use of variable combinations and formulas.
TMVA works in a transparent factory mode to guarantee an unbiased performance comparison between the algorithms: all MVA methods see the same training and test data, and are evaluated following the same prescriptions within the same execution job. A Factory class organises the interaction between the user and the TMVA analysis steps. It performs preanalysis and preprocessing of the training data to assess basic properties of the discriminating variables used as input to the methods. The correlation coefficients of the input variables are calculated and displayed, and a preliminary ranking is derived (which is later superseded by method-specific variable rankings). The variables can be linearly transformed (individually for each classifier) into a non-correlated variable space or projected onto their principal components. For performance comparison, the analysis job prints tabulated results for some benchmark measures. Smooth efficiency versus background rejection curves are stored in a ROOT output file, together with other graphical evaluation information. These results can be displayed using ROOT macros, which are conveniently executed via graphical user interfaces (one each for classification and regression) that come with the TMVA distribution.
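The efficiency versus background rejection curve mentioned above can be illustrated with a minimal sketch in plain Python. It is not TMVA code and does not reproduce TMVA's smoothing or binning; the function names `efficiency_vs_rejection` and `area_under_curve` are assumptions made for this example. It scans a cut over the classifier output, takes event weights into account, and records the signal efficiency and background rejection at each cut.

```python
from typing import List, Sequence, Tuple


def efficiency_vs_rejection(
    scores: Sequence[float],
    is_signal: Sequence[bool],
    weights: Sequence[float],
) -> List[Tuple[float, float]]:
    """Return (signal efficiency, background rejection) pairs, one per cut.

    An event passes a cut if its score is >= the cut value. Cuts are placed
    at every distinct score, scanned from the highest score downwards.
    """
    total_sig = sum(w for w, s in zip(weights, is_signal) if s)
    total_bkg = sum(w for w, s in zip(weights, is_signal) if not s)
    if total_sig <= 0 or total_bkg <= 0:
        raise ValueError("need positive signal and background weight sums")

    # Sort events by descending score so that lowering the cut adds events.
    events = sorted(zip(scores, is_signal, weights), key=lambda e: -e[0])

    curve = [(0.0, 1.0)]  # cut above every score: nothing passes
    passed_sig = 0.0
    passed_bkg = 0.0
    i = 0
    while i < len(events):
        cut = events[i][0]
        # Accumulate all events sharing this score (ties move together).
        while i < len(events) and events[i][0] == cut:
            _, sig, w = events[i]
            if sig:
                passed_sig += w
            else:
                passed_bkg += w
            i += 1
        curve.append((passed_sig / total_sig, 1.0 - passed_bkg / total_bkg))
    return curve


def area_under_curve(curve: Sequence[Tuple[float, float]]) -> float:
    """Trapezoidal integral of background rejection over signal efficiency."""
    return sum(
        0.5 * (r0 + r1) * (e1 - e0)
        for (e0, r0), (e1, r1) in zip(curve, curve[1:])
    )
```

Because every method is evaluated on the same test events, curves computed this way for different classifiers can be compared directly; the integral is one possible single-number summary.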
The TMVA training job can be run as a ROOT script, as a standalone executable in which libTMVA.so is linked as a shared library, or as a Python script via the PyROOT interface. Each classifier trained in one of these applications writes its configuration and training results to result ("weight") files, which consist of text and (optionally) ROOT files.
An easy-to-use Reader class is provided, which reads and interprets the weight files (interfaced by the corresponding classifiers) and can be included in any C++ executable, ROOT macro or Python analysis job.
For standalone use of the trained classifiers, TMVA also generates lightweight C++ response classes that contain the encoded information from the weight files, so that the files themselves are no longer required. These classes do not depend on TMVA, ROOT, or any other external library.
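The idea behind such a standalone response class can be sketched as follows. The classes TMVA generates are C++; this toy version is written in Python only for brevity and is not TMVA output. The class name, the variable names, the coefficients, and the choice of a simple linear discriminant are all hypothetical. The point is that the trained parameters live in the source itself, so no weight file and no external library are needed at evaluation time.

```python
from typing import Sequence


class StandaloneLinearResponse:
    """Toy stand-in for a generated response class (hypothetical values)."""

    # Values that would otherwise be read from a weight file are embedded
    # directly in the source. These numbers are made up for illustration.
    INPUT_NAMES = ("var1", "var2", "var3")
    OFFSET = 0.25
    COEFFICIENTS = (0.5, -1.0, 2.0)

    def __init__(self, input_names: Sequence[str]) -> None:
        # Refuse to run if the caller's variable order differs from the
        # order used in training (a design choice of this sketch).
        if tuple(input_names) != self.INPUT_NAMES:
            raise ValueError(
                f"expected inputs {self.INPUT_NAMES}, got {tuple(input_names)}"
            )

    def get_response(self, values: Sequence[float]) -> float:
        """Evaluate the linear discriminant for one event."""
        if len(values) != len(self.COEFFICIENTS):
            raise ValueError("wrong number of input values")
        return self.OFFSET + sum(
            c * v for c, v in zip(self.COEFFICIENTS, values)
        )
```

A caller would construct the object once with the list of variable names and then call `get_response` per event; checking the names up front guards against silently feeding variables in the wrong order.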
We have put emphasis on the clarity and functionality of the Factory and Reader interfaces to the user applications. All MVA methods run with reasonable default configurations, so that for standard applications that do not require particular tuning, the user script for a full TMVA analysis will hardly exceed a few lines of code. For individual optimisation, the user can (and should) customise the classifiers via configuration strings.
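To make the notion of a configuration string concrete, here is a small sketch of how a colon-separated option string with `Name=value` settings and boolean flags (a leading `!` switching a flag off) might be split into individual options. This is an illustration in plain Python, not TMVA's own parser; `parse_option_string` is a hypothetical helper, the option names in the usage note are used only as examples, and corner cases such as values that themselves contain colons are not handled.

```python
from typing import Dict, Union

OptionValue = Union[bool, str]


def parse_option_string(options: str) -> Dict[str, OptionValue]:
    """Split a colon-separated option string into a name -> value mapping.

    "Name=value" tokens keep the value as a string, a bare "Name" is stored
    as True, and "!Name" is stored as False.
    """
    parsed: Dict[str, OptionValue] = {}
    for token in options.split(":"):
        token = token.strip()
        if not token:
            continue  # tolerate empty segments such as a trailing colon
        if "=" in token:
            key, value = token.split("=", 1)
            parsed[key.strip()] = value.strip()
        elif token.startswith("!"):
            parsed[token[1:]] = False
        else:
            parsed[token] = True
    return parsed
```

For example, `parse_option_string("!H:V:NTrees=400:MaxDepth=3")` yields a mapping in which `H` is off, `V` is on, and `NTrees` and `MaxDepth` hold the strings `"400"` and `"3"`. Values are left as strings because their interpretation depends on the method that consumes them.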