T M V A --- Toolkit for Multivariate Data Analysis (http://tmva.sf.net) ======================================================================================= Release notes (started for release 3.7.3). Append to the top. _______________________________________________________________________________________ Release v4.1.2 (newest): TMVA version 4.1.2 is included in ROOT release 5.30/00. The changes with respect to TMVA-v4.1.1 are listed below. - Bug fixes: o Requested number of training and testing events was not correct when pre-selection cuts were applied. Now the number of requested events scales with the preselection efficiency and hence does not need to be adjusted with the pre-selection. This also corrects the problems seen in the Category classifier, where pre-selection is used to build the categories. o Correct histogram boundaries in PlotVariable. o Correct scanning procedure in OptimizeTuningParameters. o Print the significance formula that is actually used o Small speed improvement for PDEFoam functions. o Several minor bug fixes for version 4.1.2. _______________________________________________________________________________________ Release v4.1.1: - Variable transformations can now be applied to a user-defined subset of variables (and regression targets). Description: - Enable variable transformations for general boosting - Extended PDEFoam functionality: o Multiclass classification by training of one discriminator foam for each variable. o The cell tree can now be plotted from the macro test/PlotFoams. This makes it easyer to compare the PDEFoam structure to a decision tree. o Variable importance ranking by counting the number of cuts made in each dimension. The variable, for which the most cuts were done is ranked highest. - Fixed the size of the sampling box in PDEFoam: In TMVA 4.1.0 the size of the PDEFoam sampling box in each dimension was 2*VolFrac times the foam size. This was contrary to the intention and the documentation in the UserGuide and is now corrected: In TMVA 4.1.1 the size of the PDEFoam sampling box in each dimension is now VolFrac times the foam size. This implies that in TMVA 4.1.1 the VolFrac value for training a PDEFoam must be doubled in order to give the same results as in TMVA 4.1.0. The default VolFrac value was also changed from 0.0333 to 0.0666. - New configuration variable "NbinsMVAoutput" defining the bins of the MVA output variables in the TMVA training plots produced via the GUI. As always, Config settings can be modified in the training script via, eg, the command (TMVA::gConfig().GetVariablePlotting()).fNbinsMVAoutput = 50; to be called AFTER initialising the TMVA Factory object. - Bug fixes: o Fix for MethodBoost which ensures that the method options for the boosted classifier are handled correctly during boosting. o Fixed problems in classification of some methods when booking background training tree before signal one. o Fixed preprocessing transformation bug in HMatrix _______________________________________________________________________________________ Release v4.1.0: - The most important new feature is the support for the simulataneous classification of multiple output classes for several multivariate methods. A lot of effort went into consolidation of the software, i.e. method performance and robustness, and framework stability. TMVA 4.1.0 is included in the ROOT production release 5.28. The changes with respect to TMVA 4.0.7 are listed below. - Framework: o Multi-class support. Support of multiple output classes (i.e., more than a single background and signal class) has been enabled for the methods: MLP (NN), BDTG, FDA. The multiclass functionality can be enabled with the Factory option "AnalysisType=multiclass". Training data are specified with an additional classname: factory->AddTree(tree,"myclassname");. After the training a genetic algorithm is invoked to determine the best cuts for selecting a specific class, based on the figure of merit: purity*efficiency. TMVA comes with two examples for the use of mutiple output classes in $ROOTSYS/tmva/test: TMVAMulticlass.C and TMVAMulticlassApplication.C o New TMVA event vector building. The code for splitting the input data into training and test samples for all classes and the mixing of those samples to one training and one test sample has been rewritten. The new code performs better and has a clearer structure. This fixes several bugs reported by TMVA users. o Code and performance test framework. A unit test framework for nightly software and method performance validation has been implemented. - Method: o BDT Automatic parameter optimisation for building the tree architecture. The optimisation procedure uses the performance of the trained classifier on the "test sample" for finding the set of optimal parameters. Two different optimisation methods are available: parameger scanning and genetic algorithm. Currently, parameter optimisation is implemented only for the three parameters that influence the tree architecture: the maximum depth of a tree, MaxDepth, the minimum number of events in each node, NodeMinEvents, and the number of tress, NTrees. Optimisation is invoked by calling factory->OptimizeAllMethods(), prior to the call factory->TrainAllMethods(). Automated and configurable parameter optimisation is soon to be enabled for all methods via the central framework (for those parameters where optimisation is applicable). o BDT node splitting. While Decision Trees typically have only univariate splits, in TMVA one can now also opt for multivariate splits that use a "Fisher Discriminant" (option: UseFisherCuts), built from all observables that show correlations larger than some settable threshold (MinLinCorrForFisher). The training will then test at each split a cut on this Fisher discriminant in addition to all univariate cuts on the variables (or only on those variables that have not been used in the Fisher discriminant, option UseExcusiveVars). No obvious improvement betwen very simple decision trees after boosting has been observed so far, but only a limited number of studies has been performed concerning potiential benenfit of these simple multivariate splits. - Bug fixes: o A problem in the BDTG has been fixed, leading to a much improved regression performance. o A problem in the TMVA::Reader has been fixed. o With the new test framework and the coverity checks of ROOT a number of bugs were discovered and fixed. They mainly concerned memory leaks, and did not affect the performance. _______________________________________________________________________________________ Release v4.0.3: - New Category method allowing the user to separate the training data (and accordingly the application data) into disjoint sub-populations exhibiting significantly different properties. The separation into phase space regions is done by applying requirements on the input and/or spectator variables. In each of these disjoint regions (each event must belong to one and only one region), an independent training is performed using the most appropriate MVA method, training options and set of training variables in that zone. The division into categories in presence of distinct sub-populations reduces the correlations between the training variables, improves the modelling, and hence increases the classification and regression performance. Presently, the Category method works for classification only, but regression will follow soon. Please contact us if urgently needed. - Added new example scripts and data files illustrating how the new Category method is configured and used. Please check: TMVA/macros/TMVAClassificationCategory.C, TMVA/macros/TMVAClassificationCategoryApplication.C TMVA/execs/TMVAClassificationCategory.cxx, TMVA/execs/TMVAClassificationCategoryApplication.cxx - The analysis type is no longer defined by calling a dedicated TestAllMethods-member-function of the Factory, but with the option "AnalysisType" in the Factory. The default value is "Auto" where TMVA tries to determine the most suitable analysis type from the targets and classes the user has defined. Other values are "regression", "classification" and "multiclass" for the forthcoming multiclass classification. - Added regression functionality for gradient boosted trees using a Huber loss function. - Update of Users Guide. - Removed obsolete option "FVerySmart" from Cuts method. - Bug fixes: - Spectators and Targets could not be used with by-hand assignment of events. - Corrected types (training/testing) for assigning single events. - Changed message from FATAL to WARNING when the user requests more events for training or testing than available. - Fixed bug which caused TMVA to crash if the number of input variables exceeded the allowed maximum for generating scatter plots. - Prevent TMVA from crashing when running with an empty TTree or TChain. - Added missing regression evaluation plots for training sample. _______________________________________________________________________________________ Release v4.0.2: - Checks are performed if events are unvoluntarily cut by using a non-filled array entry (e.g. "arr[4]" is used, when the array has not always at least 5 entries). A warning is given in that case. - Display of convergence information in the progress bar for MLP during training. - Creation of animated gifs for MLP convergence monitoring (please contact authors if you want to do this). - Variables, targets and spectators are now checked if they are constant. (The execution of TMVA is stopped for variables and targets, a warning is given for spectators.) - Bug fixes: - A variable expression like "Alt$(arr[3],0)" can now be used to give a default value for a variable if for some events the array don't contain enough elements (e.g. in two jet events, sometimes only one jet is found and thus, the array jetPt[] has only one entry in that cases). - Plot ranges for scatter-plots showing the transformed events are now correct. - User defined training/testing-trees are now handled correctly. - Fix bug in correlation computation for regression. - Consistent use of variable labels (for the log output) and variable titles (in histograms). _______________________________________________________________________________________ Release v4.0.1: - "Spectator" variables can be defined now which are computed just as the input variables and which are written out into the TestTree, but which don't participate in any MVA calculation (useful for correlation studies). - New booking option "IgnoreNegWeightsInTraining" to test the effect of events with negative weights on the training. This is especially useful for methods, which do not properly deal with such events. - Bug fixes: - Fixed regression bug in VariableNormalizeTransform (Use number of targets from Event instead of DataSet) - Fixed Multitarget-Regression in PDEFoam, foam dimensions were miscalculated - Added writing of targets to the weight files in regression mode to fix problems in RegressionApplication - Added missing standard C++ header files missing to some classes, which lead to compilation failures on some architectures (thanks to Lucian Ancu, Nijmegen, for reporting these). - Added checks for unused options to Factory and DataSetFactory configuration options interpretation. Will now complain if wrong option labels are used. - Fixed standard creation of correlation matrix plots - Fixed internal mapping problem giving a fatal error message when destroying and recreating the Factory. - A number of non-intrusive modifications required for integration into ROOT v5.23 _______________________________________________________________________________________ Release v4.0.0: -------------------------------- - Main changes and new features: -------------------------------- + Reorganisation of internal data handling and constructors of methods to allow to build arbitrary composite MVA methods, and to deal with multi-class classification and multi-target regression + Extended TMVA to multivariate multi-target regression + Any TMVA method can now be boosted (linearly or non-linearly) + Transformation of input variables can be chained as wished + Weight files are now in XML format + New MVA methods "PDE-Foam" and "LD", both featuring classification and regression + Moved from CVS to SVN concurrent development system ----------- - Comments: ----------- On XML format: The old text format is obsolete though still readable in the application. Backward compatibility is NOT guaranteed. Please contact the authors if you require the reading of old text weight files in TMVA 4. Standard macros: The structure of the standard macros has changed: macros are still in the "TMVA/macros" directory, but distringuished for classification and regression examples: in "macros": TMVAClassification.C, TMVAClassificationApplication.C TMVARegression.C, TMVARegressionApplication.C in "execs" : TMVAClassification.cxx, TMVAClassificationApplication.cxx TMVARegression.cx, TMVARegressionApplication.cxx ("execs" replaces the previous "examples", for exectutables) Classification and regression analysis (training) is analysed as usual via standard macros that can be called from dedicated GUIs. Regression: Not yet available for all MVA methods. It exists for: PDE-RS, PDE-Foam, K-NN, LD, FDA, MLP, BDT for single targets (1D), and MLP for multiple targets (nD). Not all transformation of input variables are available (only "Norm" so far). Regression requires specific evaluation tools: - During the trainin we provide a ranking of input variables, using various criteria: correlations, transposed correlation, correlation ratio, and "mutual information" between input variables and regression target. (Correlation ratio and mutual information implmentations provided by Moritz Backes, Geneva U) - After the training, the trained MVA methods are ranked wrt. the deviations between regression target and estimate. - Macros plot various deviation and correlation quantities. A new GUI (macros/TMVARegGui.C) collects these macros. ------------------------------------------------- - Improvements of / new features for MVA methods: ------------------------------------------------- + Linear Discriminant: Re-implementation of "Fisher" method as general linear discriminant ("LD"), which is also regression capable (so far: single-target only) + PDEFoam: PDE-Foam is a variation of the PDE-RS method using a self-adapting binning method to divide the multi-dimensional variable space into a finite number of hyper-rectangles (cells). The binning algorithm adjusts the size and position of a predefined number of cells such that the variance of the signal and background densities inside the cells reaches a minimum. + BDT: Introduced gradient boosting and stochastic gradient boosting for classification with BDT (as desribed by Friedman 1999). See "BDTG" example in TMVAClassification.C/cxx. A new option allows to restrict the maximum tree depth. This may be used to avoid overtraining and often gives better performance than pruning. (The pruning mechanism needs to be revisited) + MLP: Introduced recognition of convergence via general ConvergenceTest-class for interrupting computations when convergence is reached. This feature has is used now in MethodMLP. Improved treatment of event-weights in BFGS training. Implemented random and importance sampling of events in DataSet. Implemented the usage of this feature for MLP. + TMlpANN (interface to TMultiLayerPerceptron) now also uses event weights and writes standalone C++ class. + k-NN: A new global knn search function has been added to NodekNN that searches for k-nearest neighbor using event weights instead of raw event counts. ModulekNN has been modified to allow searches using "weight" or "count" option, where "count" is default. Added UseWeight option to MethodKNN to allow using of "weight" or "count". (Work by Rustem Ospanov, CERN). + Likelihood (and general PDF treatment): Adaptive smoothing the PDF class, allowing it to smooth between MinSmoothNum (for regions with more signal) and MaxSmoothNum (for regions with less signal). Configuration of the PDF parameters from the option string moved to PDF class, allowing the user to define all the PDF functionalities in every classifier the PDF is used (i.e., also for the MVA PDFs). The reading of these variables was removed from MethodBase and MethodLikelihood. This also allows improved (full) PDF configuration of MVA output via the "CreateMvaPdf" option. (Work by Or Cohen, CERN & Weizmann) + New generalisation methods: - MethodCompositeBase: combines more than one classifier within one. - MethodBoost: boosts/bags any classifier type. A special booking procedure for it was added to Factory class. - MethodDT: a classifier composed of a single decision tree, boosted using MethodBoost. Results are compatible with BDT, but BDT remains the default for boosted decision trees, because it has pruning (among other additional features). -------- - Other: -------- + Improved handling of small Likelihood values such that Likelihood performance increases in analyses with many variables (~>10). Thanks to Ralph Schaefer (Bonn U.) for reporting this. + Nicer plotting: custom variable titles and units can be assigned in "AddVariable" call. + Introduced the inverse transformation InverseTransform for the variable transformations into the framework. While this is not necessary for classification, it is necessary for regression. The inverse transformation of the normalisation transformation has been implemented. + Started to extend the variable transformations to the regression targets as well. + MethodCuts now produces the 'optimal-cut' histograms needed by macro mvaeffs.C. (macro 5a of TMVAGui.C) + MsgLogger can be silenced in order to prevent excess output during boosting. + Third dataset type added centrally (Training, Validation and Testing). The validation data is split off the original training data set. + Update of GUI and other Macros according to the new features of PDF and the addition of MethodBoost. _______________________________________________________________________________________ Release v3.9.6 [under CVS branch: V03-09-05-branch]: - Added variable binning and interpolation of the cumulative distributions to Gauss-Decorrelation preprocessing (thanks to Diego Martinez). Include proper event-weight treatment. - BDT: when "bagging" do not count events with weight "0" for weighted and unweighted events. New bagging option "SampleSizeFraction" which gives the size of the resampled event sample relative to the original sample. - Bug fixes: - Inserted missing variable initialisations in TSynapse of MLP. Thanks to Jiahang Zhong (Academia Sinica) for finding this. - Corrected screen print of decorrelation transformation (missing '-' sign in first variable). - Corrected event-weight treatment in MVA-PDF creation, used for Rarity calculation. Thanks to Bertrand Chapleau (McGill U.) for reporting this. A similar weight bug was fixed in MethodLikelihood. - Fix that now the number of variables set by "UseNvars" for RanomiseForests is indeed properly set. Before it was typically some value less than the one that was asked for. - Introduce missing event-weight treatment to Decorrelation preprocessing. - Corrected mismatch in binning between output and weight file for Cuts classifier (thanks to Bjoern Gosdzik (DESY) for reporting this). - The macros TMVAnalysis.C and TMVApplication.C now compile in ROOT (via .L TMVAnalysis.C++). _______________________________________________________________________________________ Release v3.9.5: - New Decision Tree Pruning algorithm (Cost Complexity Pruning a la CART) written by Doug Schouten (Fraser U.). It replaces the old CostComplexity and CostComplexity2 algorithms. - Further improvements introduced by Doug Schouten: o New no-splitting option (can be chosen with NCuts<0) that finds best split point by first sorting the events for each variable and then looping through all events, placing the cuts always in the middle between two of the sorted events, and finding the true possible maximum separation gain in the training sample by cutting on this variable. o The beta parameter in the AdaBoost is now an option (default is 1). o The node purity at which a node is classified as signal (respective background node) for determining the error fraction in the pruning became a parameter that can be set via the option NodePurityLimit (default is 0.5). - First implementation of a new preprocessing method: transformation of the variables first into a Gaussian distribution, then performing a decorrelation of the "Gaussianised" variables. The transformation is again done by default such that (by default) the signal distributions become Gaussian and are decorrelated. Note that simultaneous Gaussianisation and decorrelation of signal and background is only possible (and done) for methods, such as Likelihood, which test both hypotheses. - Bug fixes: - Corrected compilation failure on some architectures with ROOT 5.18.00. - Fix in Expected error pruning: rather than multiplying both sides, the error on the node and the sub-tree, with the prune strength, now only the expected error of the sub-tree is scaled. - Fix in FDA parsing of the input formula. There were problems when treating more than 10 parameters (thanks to Hugh Skottowe for reporting this). ______________________________________________________________________________________ Release v3.9.4: - New automatically generated configuration option reference for newest release: http://tmva.sourceforge.net/optionRef.html - Bug fixes: - Calculation of "Separation": fixed bin-shift and normalisation bugs. Thanks to Dag Gillberg (Fraser U) for spotting these. - Fixed problem in "SetSignal(Background)WeightExpression": signal (background weight expressions not existing in the background (signal) tree led to an abort of the tree reading ("Bad numerical expression"). Thanks to Alfio Rizzo (Brussels) for pointing this out. - Fixed problem when specifying train and test tree explicitly. Some code was forgotten in the background part, creating incompatibilities. Thanks to Zhiyi Liu (Fraser U) for reporting this. _______________________________________________________________________________________ Release v3.9.3 (skipped) _______________________________________________________________________________________ Release v3.9.2: - Bug fixes: - Fixed problem introduced in 3.9.0/1: preprocessing cuts could not be applied on a variable that was not declared via AddVariable (thanks to Daniel Stricker, Karlsruhe U., for spotting this). - Corrected configurable random seed in GeneticAlgorithm (thanks to David Gonzalez Maline, CERN, for pointing this out) - Fixed Cuts (optimisation) method -> event with smallest value was not included in search for optimal cut (thanks to Dimitris Varouchas, LAL-Orsay, for helping us detecting the problem). _______________________________________________________________________________________ Release v3.9.1: - Bug fix: - Added missing link to libMLP shared library to Makefile; also fixed compilation issue with ROOT 5.08. _______________________________________________________________________________________ Release v3.9.0: - Entirely new Simulated Annealing (SA) algorithm for global minimisation in presence of local minima (optionally used in cut optimisation (MethodCuts) and the Function Discriminant (MethodFDA)). The SA algorithm features two approaches, one starting at minimal temperature (ie, from within a local minimum), slowly increasing, and another one starting at high temperature, slowly decreasing into a minimum. Code developed and written by Kamil Bartlomiej Kraszewski, Maciej Kruk and Krzysztof Danielowski from IFJ and AGH/UJ, Krakow, Poland. - Plugin capability: custom multivariate classifier can now be plugged into the TMVA framework to benefit from TMVA's analysis and performance comparison tools. The user needs to derive the custom class from TMVA::MethodBase and implement the (few) virtual methods required by the TMVA::IMethod interface. The classifier can then be directly called via ROOT's plugin mechanism. An example for this is given in TMVA/macros/TMVAnalysis.C. Many thanks to Daniel Martschei and Thomas Kuhr (Karlsruhe U.) for suggesting and implementing this feature. - Preselection cuts now work on arrays. Previously used TEventlists (only event wise pass/fail) were replaced by TreeFormulas (sensitive to array position). Thanks to Arnaud Robert (LPNHE) for his contributions. - Framework/dataset preparation: Signal and background trees can now be assigned individually to training and test purposes. This is achieved by setting the third parameter of the Factory::AddSignalTree/AddBackgroundTree() methods to "Train" or "Test" (const string). The only restriction is that either none or all signal (background) trees need to be specified with that option. It is possible to mix the two modes, for instance one can assign individual training and test trees for signal, but not for background. - For increased flexibility, users can also directly input signal and background, training and test events to TMVA, instead of letting TMVA interpret user-given trees. Note that either one of the two approaches must be chosen (no mix). The syntax of the new calls is described in the macros/TMVAnalysis.C test macro. --> The User runs the event loop, copies for each event the input variables into a std:vector, and "adds" them to TMVA, using the dedicated calls: factory->AddSignalTrainingEvent( vars, signalWeight ); (and replacing "Signal" by "Background", and "Training" by "Test"). After the event loop, everything continues as in the standard method. - Cut optimisation: added physical limits to min/max cuts if smart option is used. - TMlpANN: fixed crash with ROOT>=5.17 when using large number of test events; also corrected bias in cross validation: before the test events were used, which led to an overestimated performance evaluation in case of a small number of degrees of freedom; separate now training tree in two parts for training and validation with configurable ValidationFraction Extended options to TMultilayerPerceptron learning methods. - BDT: removed hard-coded weight file name; now, paths and names of weight files are written as TObjStrings into ROOT target file, and retrieved for plotting; available weight files (corresponding to target used) can be chosen from pop-up GUI. Changes in handling negative weights in BDT algorithm. Events with negative weights now get their weight reduced (*= 1/boostweight) rather than increased (*= boostweight) as the other events do. Otherwise these events tend to receive increasingly stronger boosts, because their effects on the separation gain are as if background events were selected as signal and vice versa (hence the events tend to be "wanted" in signal nodes, but are boosted as if they were misclassified). In addition, the separation indices are protected against negative S or S+B returning 0.5 (no separation at all) in case that occurs. In addition there is a new BDT option to ignore events with negative event weights for the training. This option could be used as a cross check of a "worst case" solution for Monte Carlo samples with negative weights. Note that the results of the testing phase still include these events and are hence objective. Added randomised trees: similar to the "Random Forests" technique of Leo Breiman and Adele Cutler, it uses the "bagging" algorithm and bases the determination of the best node-split during the training on a random subset of variables only, which is individually chosen for each split. Move to TRandom2 for the "bagging" algorithm and throw random weights according to Poisson statistics. (This way the random weights are closer to a resampling with replacement algorithm.) - GUI: New macro (and GUI button) for Parallel Coordinate plotting. - Python (PyROOT): added example for reader application: TMVA/python/TMVApplication.py - Bug fixes: - Corrected inconsistency in MethodCuts: the signal efficiency written out into the weight file does not correspond to the center of the bin within which the background rejection is maximised (as before) but to the lower left edge of it. This is because the cut optimisation algorithm determines the best background rejection for all signal efficiencies belonging into a bin. Since the best background rejection is in general obtained for the lowest possible signal efficiency, the reference signal efficiency is the lowest value in the bin. - Fixes in input-variable and MVA plotting: under/over-flow numbers given on plots were not properly normalised; the maximum histogram ranges have been increased to avoid cut-offs. Thanks to Andreas Wenger, Zuerich, for pointing these out. - Cleaned up memory leaks. _______________________________________________________________________________________ Release v3.8.14: - Added printouts to "Cuts" classifier quoting the explicit cut application for given signal efficiency. In case of transformations of the input variables, the full expressions are given. Added warning to Fisher in case of variable normalisation. - Macro fixes for ROOT-integrated version. - Version integrated into ROOT 5.18. _______________________________________________________________________________________ Release v3.8.13: - Improved "mvaeffs.C" macro for significance calculation with arbitrary numbers of signal and background events. This concerns button (5a) of the main GUI (TMVAGui.C). - Improved "BDT.C" macro for decision tree drawing. This concerns button (8) of the main GUI (TMVAGui.C). - A few minor GUI upgrades. - Multiple internal changes to prepare ROOT 5.18 production release. _______________________________________________________________________________________ Release v3.8.12: - Bug fixes: - Small fix in MLP network plotter. - Fixes of smaller memory leaks. _______________________________________________________________________________________ Release v3.8.11: - Bug fixes: - Reimplemented possibility to cut on variables that are not used in the MVAs during training and test tree preparation. _______________________________________________________________________________________ Release v3.8.10: - New normalisation option "NormTree" for the binary search tree used in PDERS. The tree sorting is modified to achieve trees with equal branch lengths, which, in average, speeds up searches. - Bug fixes: - Fix of problem with TChain treatment (occurred in ROOT >=5.15) in DataSet. _______________________________________________________________________________________ Release v3.8.9: - Bug fixes: - Memory leaks in the Reader class are removed: the Reader is now properly destructed (deletion of all handled classifiers). Thereby, pointer problems in the destructors of Fisher and SVM have been found and fixed. - In later ROOT versions (after 5.14) the macro TMVApplication.C produced a segmentation fault when run from the ROOT prompt (the compiled version in the examples directory worked fine). This problem is now solved. - The color selection has been adapted to the new color palette that was introduced in ROOT 5.16. The macros should now look alike with all ROOT versions (above 4.02/00). _______________________________________________________________________________________ Release v3.8.8: - Cuts can now be applied independently for signal and background in the PrepareTrainingAndTestTree phase. - Previously, the input variables used by the Fisher classifier were always normalised to [-1,1] by default. This has been removed, so that it is now in the hand of the user to decide whether or not normalisation is applied. Choose "Normalise" ("!Normalise") for normalisation (no normalisation), default is "!Normalise". - Bug fixes: - Very important bug fix: the application of cuts in the PrepareTrainingAndTestTree call in conjunction with the use of several trees (ie, several consecutive calls of factory->AddSignalTree(...) or factory->AddBackgroundTree(...)), lead to a wrong application of the cut to all trees but the first one in the signal and background chains. More details can be provided if requested - please contact the authors. We wish to thank Manfred Groh for spotting and analysing the problem! - Some compilers complained about a missing #include "TMVA/Configurable.h" in the Reader class. This has been fixed. _______________________________________________________________________________________ Release v3.8.7: - Significant speed improvements for PDERS. For the options to benefit from this, see the example "PDERSkNN" in macros/TMVAnalysis.C or examples/TMVAnalysis.cxx. Thanks to Kamil Kraszewski and friends from Cracow for implementing this. - Re-established backward compatibility of TMVA code down to ROOT version 4.02/00. - Shortened BDT weight-file and standalone C++ reader class by 20% and 50%, respectively. - Bug fixes: - Fixed problem in RuleFit's standalone class when using integer input variables. - Fixed compilation problem when using decorrelation preprocessing of input variables in C++ standalone reader classes. - Fixed bug in number-of-plots calculation in correlation script. - Fixed bug in printing of number of events in case of several trees (no impact on results). - Fixed inconsistency between cut optimisation and cut reading: the aligned definition of min and max cuts is: a variable passes a cut if: min < var <= max. (This inconsistency may have affected your results if you used cut optimisation together with integer variables. Please check with the new version.) - Fixed macro path in TMVAGui.C to fix problem when running the GUI in the ROOT/TMVA distribution. Also: TMVA Style moved from TMVAlogon into tmvaglob to fix style problem when running in the ROOT/TMVA distribution. _______________________________________________________________________________________ Release v3.8.6: - Weight expressions can now be set individually for signal and background via the calls factory->SetSignalWeightExpression( "" ) and factory->SetBackgroundWeightExpression( "" ). The former call is still supported. - Overtraining test: a new GUI button (corresponding to an extension of the macro "mvas.C") is available to plot a comparison of the classifier response distributions for the training and independent test data sets. The results of a Kolmogorov-Smirnov compatibility test are printed on stdout and plots. - The cuts corresponding to a given signal efficiency can be retrieved via the reader. An example for this is implemented in "macros/TMVApplication.C". Briefly, retrieve the cuts classifier object as follows: TMVA::MethodCuts* mcuts = (TMVA::MethodCuts*)reader->FindMVA( "CutsGA method" );, define cut vectors (a vector of pairs can also be retrieved via overloaded GetCuts function): std::vector cutsMin; std::vector cutsMax; and fill them via: mcuts->GetCuts( wantedSignalEfficiency, cutsMin, cutsMax ); - Clean up of code and include headers to improve forward declaration. - Bug fixes: - Fixed typos in weight file names in MLP and BDT macros - Comment on code size: for an intermediate period we are developing a new, more flexible TMVA design in the TMVA/reader repository (for simplicity, to avoid CVS branching). This entails code duplication, which will be removed once the new design is deployed. _______________________________________________________________________________________ Release v3.8.5: - Bug fixes: - Fixed "MinMax" and "RMS" options of PDERS (thanks to Junpei Maeda for spotting this) - Fixed compilation problem in MetricEuler class on some platforms _______________________________________________________________________________________ Release v3.8.4: - Added new option for event weight renormalisation to "PrepareTrainingAndTestTree" configuration. Possible settings are "None", "NumEvents", "EqualNumEvents", where the latter two options renormalise the event weights so that their sums equal the number of signal and background events, or the number of signal events, respectively. The current version of the TMVA Users Guide has been updated. - Bug fixes: - Fixed bug in weights read back of CFMlpANN -> now fully operational again - Fixed convergence test macro for MLP neural network _______________________________________________________________________________________ Release v3.8.3: - fixed problem of combined use of GA or MC with a Minuit converger in FDA - updated README _______________________________________________________________________________________ Release v3.8.0 -- v3.8.2: - New "k-Nearest Neighbour" classifier (MethodKNN), implemented by Rustem Ospanov Texas U., USA). The classifier is similar to PDERS, but intrinsically adaptive and faster. The TMVA users guide has been updated with the new classifier. - New classifier for a "Function Discriminant Analysis" (MethodFDA). This simple classifier fits any user-defined TFormula (via option configuration string) to the training data by requiring a formula response of as close as possible to 1 (0) for signal (background) events. The parameter fitting is done via the abstract class FitterBase (see below), where the concrete fitter is selected via the configuration option string. The TMVA users guide has been updated with the new classifier. - The Likelihood method now allows to select the interpolation procedure for the reference distributions individually for each variable, i.e. splines for var1 Kernel density estimator for var2 etc. In this context, the naming of the options has been changed. While before one would write "Spline=2" or "useKDE", it is now consistently done by: "PDFInterpol=Spline2" or "PDFInterpol=KDE". The previous syntax chooses it for "ALL variables", but one can also specify individually by writing: "PDFInterpol[0]=Spline2:PDFInterpol[1]=KDE" etc. - Other work on classifiers: + RuleFit has been improved: the forest now consists of boosted trees; significant speed enhancement; reorganisation of configuration options + Change of default options in BDT: if no minimum number of events in leaf node is specified, max(20, N_train/(Nvar^2)/10)) is used. - New Make Class feature: each classifier (exception: PDERS, k-NN and Cut optimisation) writes together with its weight file a standalone, i.e., TMVA and ROOT-independent C++ reader class into the same directory as the weight files, and following the same naming convention. The classes allow users who need ROOT independence to interpret the trained classifiers. They provide identical results as the TMVA-integrated Reader. The example script TMVA/macros/ClassApplication.C shows how to use the new classes. - New help messages: on the "H" option in the configuration option string, each classifier prints a help message with a short summary of the working principles, and, more importantly, hints on which options to tune to improve selection and/or speed performance. The help messages of all booked classifiers can also be printed by invoking the factory via the call: factory->PrintHelpMessage(). - The various fitters used for minimisation by the classifiers have been unified under the common base class FitterBase. Any TMVA module that uses minimisation inherits from IFitterTarget, which interfaces the minimisation function. Presently, four concrete fit algorithms are implemented: Monte Carlo sampling (MC), Genetic Algorithm (GA), MINUIT and Simulated Annealing (SA not yet finalised). Also allowed are combinations of fitters, such as using MC or GA together with MINUIT for better convergence under starting values provided by the former algorithms. - Change of the somewhat ingest mu-transform into Rarity. The rarity (originally defined in a Mark-II note) is a variable transformation that makes the background uniform distributed in [0,1], while the signal is peaking -> 1. It is also useful to spot simulation (or control sample) problems: if the background in data is not flat this would hint to problems in the background description. A useful variable, try it out! - New directory structure for histograms in the target file produced by the training. The classifier directories have subdirectories according to the classifier names. This allows to better organise the histograms produced by different instances of the same classifier (e.g., during optimisation scans of that classifier). The GUI correctly uses the new structure. Some changes also occur in the weight files, though backward compatibility is guaranteed through the TMVA versioning. - Change in TMVA::Config class: the structs with global settings can no longer be addressed via (e.g.): , TMVA::gConfig().ionames.weightFileDir = "myWeightDirectory"; but rather via: (TMVA::gConfig().GetIONames()).fWeightFileDir = "myWeightDirectory"; This change is not backward compatible, but only concerns script commands, which should be easily updatable. - Internal renormalisation of event weights: to avoid artifacts if event weights have very small or large numbers, we renormalise the sum of weights of all signal and background training events to be equal the number of events. This does not affect the relative event weights. - Bug fixes: + Full fix of treatment of integer variables in Reader application (thanks to Andrea Bulgarelli for spotting this bug in the first place). + Fix of potential branch name/title mismatch in DataSet (bug spotted by Zhiyi Liu). - Known problems: + The performance of the CFMlpANN neural network appears questionable. We did not yet investigate whether there is a real problem. In the meantime, users who apply this ANN please either move the the recommended MLP ANN, or contact the authors. + The combined use of GA or MC with a Minuit converger in FDA has a problem in the configuration and is thus nonoperational at present. _______________________________________________________________________________________ Release v3.7.3: - bug fix release resolving problem with discrete variables in Reader. Message sent to users: "A serious bug has been discovered in the TMVA::Reader when dealing with 'I'-declared (int) variables. In that case, the Reader response is inconsistent with the test results derived in the training phase, and wrong (the test results are correct). We have uploaded TMVA-v3.7.3 with a fix of this bug - with however the temporary restriction that decorrelation preprocessing in presence of 'I'-declared variables does not work. We are working on a full fix of the problem." _______________________________________________________________________________________ TMVA Copyright (2005-2009): Andreas Hoecker (CERN) Peter Speckmayer (CERN) Joerg Stelzer (CERN) Fredrik Tegenfeldt (Iowa U. until August 2007) Jan Therhaag (Bonn U.) Eckhard von Toerne (Bonn U.) Helge Voss (MPI-KP Heidelberg) Kai Voss (U. of Victoria until May 2006) Redistribution and use of TMVA in source and binary forms, with or without modification, are permitted according to the terms listed in the BSD license: http://tmva.sourceforge.net/LICENSE