The input data is a toy-MC sample consisting of two Gaussian distributions.
The output file "TMVAReg.root" can be analysed with dedicated macros (simply run: root -l <macro.C>), which can be conveniently invoked through a GUI that appears at the end of this macro's run. Launch the GUI via the command:
Cross evaluation is a special case of k-folds cross validation where the splitting into k folds is computed deterministically. This ensures that a given event will always end up in the same fold.
In addition, all resulting classifiers are saved and can be applied to new data using MethodCrossValidation. One requirement for this to work is a splitting function that is evaluated for each event to determine which fold it goes into (for training/evaluation) or which classifier it is passed to (for application).
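The role of the splitting function can be illustrated with a minimal sketch in plain C++ (no ROOT dependency). It assumes the conventional k-folds arrangement: the model of a fold is trained on the events of all other folds and tested on the events of its own fold, and at application time an event is passed to the model of its own fold. The names FoldPartition, makePartitions and modelForEvent are hypothetical and are not part of TMVA.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical container for one fold: the events that train the fold's
// model and the events that are held out to test it.
struct FoldPartition {
    std::vector<std::size_t> trainIdx; // events from all other folds
    std::vector<std::size_t> testIdx;  // events belonging to this fold
};

// Build the train/test partition of every fold from the fold index that the
// splitting function assigned to each event (values in [0, numFolds)).
std::vector<FoldPartition> makePartitions(const std::vector<unsigned>& foldOfEvent,
                                          unsigned numFolds)
{
    assert(numFolds > 0);
    std::vector<FoldPartition> parts(numFolds);
    for (std::size_t i = 0; i < foldOfEvent.size(); ++i) {
        assert(foldOfEvent[i] < numFolds);
        for (unsigned f = 0; f < numFolds; ++f) {
            if (foldOfEvent[i] == f)
                parts[f].testIdx.push_back(i);  // held out for fold f
            else
                parts[f].trainIdx.push_back(i); // used to train fold f
        }
    }
    return parts;
}

// Assumption: at application time an event is routed to the model of its own
// fold, which is the one model that did not train on this event.
unsigned modelForEvent(unsigned foldOfThisEvent)
{
    return foldOfThisEvent;
}
```

With 5000 events and 2 folds this gives 2500 training and 2500 testing events per fold, which agrees with the event counts reported for BDTG_fold1 and BDTG_fold2 in the output of this macro.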
Cross evaluation partitions the data into folds using a deterministic split, called the split expression. The expression can be any valid TFormula as long as all parts used are defined.
For each event the split expression is evaluated to a number and the event is put in the fold corresponding to that number.
The split expression has access to all spectators and variables defined in the dataloader. Additionally, the number of folds in the split can be accessed with NumFolds (or numFolds).
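As an illustration, a split expression could take a form such as int(fabs([eventID]))%int([numFolds]), where eventID is assumed to be a spectator holding a per-event identifier (an assumption for this sketch; it is not named in the text above). The plain C++ function below is a hypothetical stand-in that mirrors what such an expression would compute. It is not the TFormula evaluation itself.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical stand-in for the split expression
//   int(fabs([eventID]))%int([numFolds])
// eventID:  assumed per-event identifier (a spectator in the dataloader)
// numFolds: number of folds in the split
unsigned foldFromEventId(double eventID, double numFolds)
{
    assert(numFolds >= 1.0);
    const int id = static_cast<int>(std::fabs(eventID)); // int(fabs([eventID]))
    const int k  = static_cast<int>(numFolds);           // int([numFolds])
    return static_cast<unsigned>(id % k);                // fold index in [0, k)
}
```

Because the result depends only on the event's own identifier and on the number of folds, evaluating the function again for the same event gives the same fold, independent of the order in which events are read.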
0.0271029472351
38.842607975
Processing /mnt/build/workspace/root-makedoc-v614/rootspi/rdoc/src/v6-14-00-patches/tutorials/tmva/TMVACrossValidationRegression.C...
DataSetInfo : [dataset] : Added class "Regression"
: Add Tree TreeR of type Regression with 10000 events
--- TMVACrossValidationRegression: Using input file: ./files/tmva_reg_example.root
DataSetInfo : [dataset] : Added class "Signal"
DataSetInfo : [dataset] : Added class "Background"
: Evaluate method: BDTG
<HEADER> Factory : Booking method: BDTG_fold1
:
<WARNING> <WARNING> : Value for option maxdepth was previously set to 3
: the option *InverseBoostNegWeights* does not exist for BoostType=Grad --> change
: to new default for GradBoost *Pray*
: Regression Loss Function: Huber
: Training 2000 Decision Trees ... patience please
: Elapsed time for training with 2500 events: 3.51 sec
: Dataset[dataset] : Create results for training
: Dataset[dataset] : Evaluation of BDTG_fold1 on training sample
: Dataset[dataset] : Elapsed time for evaluation of 2500 events: 0.842 sec
: Create variable histograms
: Create regression target histograms
: Create regression average deviation
: Results created
: Creating xml weight file: dataset/weights/TMVACrossValidationRegression_BDTG_fold1.weights.xml
<HEADER> Factory : Test all methods
<HEADER> Factory : Test method: BDTG_fold1 for Regression performance
:
: Dataset[dataset] : Create results for testing
: Dataset[dataset] : Evaluation of BDTG_fold1 on testing sample
: Dataset[dataset] : Elapsed time for evaluation of 2500 events: 0.783 sec
: Create variable histograms
: Create regression target histograms
: Create regression average deviation
: Results created
<HEADER> Factory : Evaluate all methods
: Evaluate regression method: BDTG_fold1
: TestRegression (testing)
: Calculate regression for all events
: Elapsed time for evaluation of 2500 events: 0.751 sec
: TestRegression (training)
: Calculate regression for all events
: Elapsed time for evaluation of 2500 events: 0.837 sec
:
: Evaluation results ranked by smallest RMS on test sample:
: ("Bias" quotes the mean deviation of the regression from true target.
: "MutInf" is the "Mutual Information" between regression and target.
: Indicated by "_T" are the corresponding "truncated" quantities ob-
: tained when removing events deviating more than 2sigma from average.)
: --------------------------------------------------------------------------------------------------
: --------------------------------------------------------------------------------------------------
: dataset BDTG_fold1 : 2.15 -1.55 83.3 75.0 | 2.424 2.346
: --------------------------------------------------------------------------------------------------
:
: Evaluation results ranked by smallest RMS on training sample:
: (overtraining check)
: --------------------------------------------------------------------------------------------------
: DataSet Name: MVA Method: <Bias> <Bias_T> RMS RMS_T | MutInf MutInf_T
: --------------------------------------------------------------------------------------------------
: dataset BDTG_fold1 : 3.54 -0.525 85.4 77.5 | 2.412 2.339
: --------------------------------------------------------------------------------------------------
:
<HEADER> Factory : Thank you for using TMVA!
: For citation information, please visit: http://tmva.sf.net/citeTMVA.html
<HEADER> Factory : Booking method: BDTG_fold2
:
<WARNING> <WARNING> : Value for option maxdepth was previously set to 3
: the option *InverseBoostNegWeights* does not exist for BoostType=Grad --> change
: to new default for GradBoost *Pray*
: Regression Loss Function: Huber
: Training 2000 Decision Trees ... patience please
: Elapsed time for training with 2500 events: 4.91 sec
: Dataset[dataset] : Create results for training
: Dataset[dataset] : Evaluation of BDTG_fold2 on training sample
: Dataset[dataset] : Elapsed time for evaluation of 2500 events: 1.13 sec
: Create variable histograms
: Create regression target histograms
: Create regression average deviation
: Results created
: Creating xml weight file: dataset/weights/TMVACrossValidationRegression_BDTG_fold2.weights.xml
<HEADER> Factory : Test all methods
<HEADER> Factory : Test method: BDTG_fold2 for Regression performance
:
: Dataset[dataset] : Create results for testing
: Dataset[dataset] : Evaluation of BDTG_fold2 on testing sample
: Dataset[dataset] : Elapsed time for evaluation of 2500 events: 1.08 sec
: Create variable histograms
: Create regression target histograms
: Create regression average deviation
: Results created
<HEADER> Factory : Evaluate all methods
: Evaluate regression method: BDTG_fold2
: TestRegression (testing)
: Calculate regression for all events
: Elapsed time for evaluation of 2500 events: 1.15 sec
: TestRegression (training)
: Calculate regression for all events
: Elapsed time for evaluation of 2500 events: 1.14 sec
:
: Evaluation results ranked by smallest RMS on test sample:
: ("Bias" quotes the mean deviation of the regression from true target.
: "MutInf" is the "Mutual Information" between regression and target.
: Indicated by "_T" are the corresponding "truncated" quantities ob-
: tained when removing events deviating more than 2sigma from average.)
: --------------------------------------------------------------------------------------------------
: --------------------------------------------------------------------------------------------------
: dataset BDTG_fold2 : 5.32 0.938 83.8 75.3 | 2.460 2.390
: --------------------------------------------------------------------------------------------------
:
: Evaluation results ranked by smallest RMS on training sample:
: (overtraining check)
: --------------------------------------------------------------------------------------------------
: DataSet Name: MVA Method: <Bias> <Bias_T> RMS RMS_T | MutInf MutInf_T
: --------------------------------------------------------------------------------------------------
: dataset BDTG_fold2 : 3.47 0.0323 81.5 73.5 | 2.419 2.355
: --------------------------------------------------------------------------------------------------
:
<HEADER> Factory : Thank you for using TMVA!
: For citation information, please visit: http://tmva.sf.net/citeTMVA.html
<HEADER> Factory : Booking method: BDTG
:
: Reading weightfile: dataset/weights/TMVACrossValidationRegression_BDTG_fold1.weights.xml
: Reading weight file: dataset/weights/TMVACrossValidationRegression_BDTG_fold1.weights.xml
: Reading weightfile: dataset/weights/TMVACrossValidationRegression_BDTG_fold2.weights.xml
: Reading weight file: dataset/weights/TMVACrossValidationRegression_BDTG_fold2.weights.xml
: Evaluate method: MLP
<HEADER> Factory : Booking method: MLP_fold1
:
<HEADER> MLP_fold1 : [dataset] : Create Transformation "Norm" with events from all classes.
:
<HEADER> : Transformation, Variable selection :
: Input : variable 'var1' <---> Output : variable 'var1'
: Input : variable 'var2' <---> Output : variable 'var2'
: Input : target 'fvalue' <---> Output : target 'fvalue'
<HEADER> MLP_fold1 : Building Network.
: Initializing weights
<HEADER> TFHandler_MLP_fold1 : Variable Mean RMS [ Min Max ]
: -----------------------------------------------------------
: var1: 0.34813 0.47197 [ -1.0000 1.0000 ]
: var2: -0.0010602 0.57141 [ -1.0000 1.0000 ]
: fvalue: -0.15911 0.43044 [ -1.0000 1.0000 ]
: -----------------------------------------------------------
: Training Network
:
: Inaccurate progress timing for MLP...
: Elapsed time for training with 2500 events: 5.18 sec
: Dataset[dataset] : Create results for training
: Dataset[dataset] : Evaluation of MLP_fold1 on training sample
: Dataset[dataset] : Elapsed time for evaluation of 2500 events: 0.0087 sec
: Create variable histograms
: Create regression target histograms
: Create regression average deviation
: Results created
: Creating xml weight file: dataset/weights/TMVACrossValidationRegression_MLP_fold1.weights.xml
<HEADER> Factory : Test all methods
<HEADER> Factory : Test method: MLP_fold1 for Regression performance
:
: Dataset[dataset] : Create results for testing
: Dataset[dataset] : Evaluation of MLP_fold1 on testing sample
: Dataset[dataset] : Elapsed time for evaluation of 2500 events: 0.00852 sec
: Create variable histograms
: Create regression target histograms
: Create regression average deviation
: Results created
<HEADER> Factory : Evaluate all methods
: Evaluate regression method: MLP_fold1
: TestRegression (testing)
: Calculate regression for all events
: Elapsed time for evaluation of 2500 events: 0.00699 sec
: TestRegression (training)
: Calculate regression for all events
: Elapsed time for evaluation of 2500 events: 0.00771 sec
:
: Evaluation results ranked by smallest RMS on test sample:
: ("Bias" quotes the mean deviation of the regression from true target.
: "MutInf" is the "Mutual Information" between regression and target.
: Indicated by "_T" are the corresponding "truncated" quantities ob-
: tained when removing events deviating more than 2sigma from average.)
: --------------------------------------------------------------------------------------------------
: --------------------------------------------------------------------------------------------------
: dataset MLP_fold1 : 0.0525 0.0518 0.632 0.508 | 3.382 3.389
: --------------------------------------------------------------------------------------------------
:
: Evaluation results ranked by smallest RMS on training sample:
: (overtraining check)
: --------------------------------------------------------------------------------------------------
: DataSet Name: MVA Method: <Bias> <Bias_T> RMS RMS_T | MutInf MutInf_T
: --------------------------------------------------------------------------------------------------
: dataset MLP_fold1 : 0.0359 0.0451 0.617 0.503 | 3.393 3.400
: --------------------------------------------------------------------------------------------------
:
<HEADER> Factory : Thank you for using TMVA!
: For citation information, please visit: http://tmva.sf.net/citeTMVA.html
<HEADER> Factory : Booking method: MLP_fold2
:
<HEADER> MLP_fold2 : [dataset] : Create Transformation "Norm" with events from all classes.
:
<HEADER> : Transformation, Variable selection :
: Input : variable 'var1' <---> Output : variable 'var1'
: Input : variable 'var2' <---> Output : variable 'var2'
: Input : target 'fvalue' <---> Output : target 'fvalue'
<HEADER> MLP_fold2 : Building Network.
: Initializing weights
<HEADER> TFHandler_MLP_fold2 : Variable Mean RMS [ Min Max ]
: -----------------------------------------------------------
: var1: 0.34494 0.47865 [ -1.0000 1.0000 ]
: var2: -0.023394 0.57756 [ -1.0000 1.0000 ]
: fvalue: -0.16964 0.42085 [ -1.0000 1.0000 ]
: -----------------------------------------------------------
: Training Network
:
: Inaccurate progress timing for MLP...
: Elapsed time for training with 2500 events: 4.99 sec
: Dataset[dataset] : Create results for training
: Dataset[dataset] : Evaluation of MLP_fold2 on training sample
: Dataset[dataset] : Elapsed time for evaluation of 2500 events: 0.00836 sec
: Create variable histograms
: Create regression target histograms
: Create regression average deviation
: Results created
: Creating xml weight file: dataset/weights/TMVACrossValidationRegression_MLP_fold2.weights.xml
<HEADER> Factory : Test all methods
<HEADER> Factory : Test method: MLP_fold2 for Regression performance
:
: Dataset[dataset] : Create results for testing
: Dataset[dataset] : Evaluation of MLP_fold2 on testing sample
: Dataset[dataset] : Elapsed time for evaluation of 2500 events: 0.00729 sec
: Create variable histograms
: Create regression target histograms
: Create regression average deviation
: Results created
<HEADER> Factory : Evaluate all methods
: Evaluate regression method: MLP_fold2
: TestRegression (testing)
: Calculate regression for all events
: Elapsed time for evaluation of 2500 events: 0.00662 sec
: TestRegression (training)
: Calculate regression for all events
: Elapsed time for evaluation of 2500 events: 0.00672 sec
:
: Evaluation results ranked by smallest RMS on test sample:
: ("Bias" quotes the mean deviation of the regression from true target.
: "MutInf" is the "Mutual Information" between regression and target.
: Indicated by "_T" are the corresponding "truncated" quantities ob-
: tained when removing events deviating more than 2sigma from average.)
: --------------------------------------------------------------------------------------------------
: --------------------------------------------------------------------------------------------------
: dataset MLP_fold2 : 0.0112 0.0124 0.509 0.438 | 3.423 3.415
: --------------------------------------------------------------------------------------------------
:
: Evaluation results ranked by smallest RMS on training sample:
: (overtraining check)
: --------------------------------------------------------------------------------------------------
: DataSet Name: MVA Method: <Bias> <Bias_T> RMS RMS_T | MutInf MutInf_T
: --------------------------------------------------------------------------------------------------
: dataset MLP_fold2 : 0.0264 0.0304 0.507 0.432 | 3.406 3.401
: --------------------------------------------------------------------------------------------------
:
<HEADER> Factory : Thank you for using TMVA!
: For citation information, please visit: http://tmva.sf.net/citeTMVA.html
<HEADER> Factory : Booking method: MLP
:
: Reading weightfile: dataset/weights/TMVACrossValidationRegression_MLP_fold1.weights.xml
: Reading weight file: dataset/weights/TMVACrossValidationRegression_MLP_fold1.weights.xml
<HEADER> MLP_fold1 : Building Network.
: Initializing weights
: Reading weightfile: dataset/weights/TMVACrossValidationRegression_MLP_fold2.weights.xml
: Reading weight file: dataset/weights/TMVACrossValidationRegression_MLP_fold2.weights.xml
<HEADER> MLP_fold2 : Building Network.
: Initializing weights
<HEADER> Factory : [dataset] : Create Transformation "I" with events from all classes.
:
<HEADER> : Transformation, Variable selection :
: Input : variable 'var1' <---> Output : variable 'var1'
: Input : variable 'var2' <---> Output : variable 'var2'
<HEADER> TFHandler_Factory : Variable Mean RMS [ Min Max ]
: -----------------------------------------------------------
: var1: 3.3668 1.1876 [ 0.0019664 5.0000 ]
: var2: 2.4713 1.4343 [ 0.0032142 4.9990 ]
: fvalue: 164.71 82.946 [ 1.7144 391.85 ]
: -----------------------------------------------------------
: Ranking input variables (method unspecific)...
<HEADER> IdTransformation : Ranking result (top variable is best ranked)
: --------------------------------------------
: Rank : Variable : |Correlation with target|
: --------------------------------------------
: 1 : var2 : 7.529e-01
: 2 : var1 : 5.988e-01
: --------------------------------------------
<HEADER> IdTransformation : Ranking result (top variable is best ranked)
: -------------------------------------
: Rank : Variable : Mutual information
: -------------------------------------
: 1 : var1 : 2.229e+00
: 2 : var2 : 2.152e+00
: -------------------------------------
<HEADER> IdTransformation : Ranking result (top variable is best ranked)
: ------------------------------------
: Rank : Variable : Correlation Ratio
: ------------------------------------
: 1 : var1 : 6.230e+00
: 2 : var2 : 2.531e+00
: ------------------------------------
<HEADER> IdTransformation : Ranking result (top variable is best ranked)
: ----------------------------------------
: Rank : Variable : Correlation Ratio (T)
: ----------------------------------------
: 1 : var2 : 8.750e-01
: 2 : var1 : 3.664e-01
: ----------------------------------------
: Elapsed time for training with 5000 events: 4.05e-06 sec
: Dataset[dataset] : Create results for training
: Dataset[dataset] : Evaluation of BDTG on training sample
: Dataset[dataset] : Elapsed time for evaluation of 5000 events: 1.18 sec
: Create variable histograms
: Create regression target histograms
: Create regression average deviation
: Results created
: Creating xml weight file: dataset/weights/TMVACrossValidationRegression_BDTG.weights.xml
: Elapsed time for training with 5000 events: 3.81e-06 sec
: Dataset[dataset] : Create results for training
: Dataset[dataset] : Evaluation of MLP on training sample
: Dataset[dataset] : Elapsed time for evaluation of 5000 events: 0.0189 sec
: Create variable histograms
: Create regression target histograms
: Create regression average deviation
: Results created
: Creating xml weight file: dataset/weights/TMVACrossValidationRegression_MLP.weights.xml
<HEADER> Factory : Test all methods
<HEADER> Factory : Test method: BDTG for Regression performance
:
: Dataset[dataset] : Create results for testing
: Dataset[dataset] : Evaluation of BDTG on testing sample
: Dataset[dataset] : Elapsed time for evaluation of 5000 events: 1.1 sec
: Create variable histograms
: Create regression target histograms
: Create regression average deviation
: Results created
<HEADER> Factory : Test method: MLP for Regression performance
:
: Dataset[dataset] : Create results for testing
: Dataset[dataset] : Evaluation of MLP on testing sample
: Dataset[dataset] : Elapsed time for evaluation of 5000 events: 0.0188 sec
: Create variable histograms
: Create regression target histograms
: Create regression average deviation
: Results created
<HEADER> Factory : Evaluate all methods
: Evaluate regression method: BDTG
: TestRegression (testing)
: Calculate regression for all events
: Elapsed time for evaluation of 5000 events: 1.13 sec
: TestRegression (training)
: Calculate regression for all events
: Elapsed time for evaluation of 5000 events: 1.08 sec
<HEADER> TFHandler_BDTG : Variable Mean RMS [ Min Max ]
: -----------------------------------------------------------
: var1: 3.3668 1.1876 [ 0.0019664 5.0000 ]
: var2: 2.4713 1.4343 [ 0.0032142 4.9990 ]
: fvalue: 164.71 82.946 [ 1.7144 391.85 ]
: -----------------------------------------------------------
: Evaluate regression method: MLP
: TestRegression (testing)
: Calculate regression for all events
: Elapsed time for evaluation of 5000 events: 0.0211 sec
: TestRegression (training)
: Calculate regression for all events
: Elapsed time for evaluation of 5000 events: 0.0182 sec
<HEADER> TFHandler_MLP : Variable Mean RMS [ Min Max ]
: -----------------------------------------------------------
: var1: 3.3668 1.1876 [ 0.0019664 5.0000 ]
: var2: 2.4713 1.4343 [ 0.0032142 4.9990 ]
: fvalue: 164.71 82.946 [ 1.7144 391.85 ]
: -----------------------------------------------------------
:
: Evaluation results ranked by smallest RMS on test sample:
: ("Bias" quotes the mean deviation of the regression from true target.
: "MutInf" is the "Mutual Information" between regression and target.
: Indicated by "_T" are the corresponding "truncated" quantities ob-
: tained when removing events deviating more than 2sigma from average.)
: --------------------------------------------------------------------------------------------------
: --------------------------------------------------------------------------------------------------
: dataset MLP : 0.0317 0.0317 0.574 0.473 | 3.399 3.400
: dataset BDTG : 3.75 -0.280 83.6 75.2 | 2.406 2.338
: --------------------------------------------------------------------------------------------------
:
: Evaluation results ranked by smallest RMS on training sample:
: (overtraining check)
: --------------------------------------------------------------------------------------------------
: DataSet Name: MVA Method: <Bias> <Bias_T> RMS RMS_T | MutInf MutInf_T
: --------------------------------------------------------------------------------------------------
: dataset MLP : 0.0317 0.0317 0.574 0.473 | 3.399 3.400
: dataset BDTG : 3.75 -0.280 83.6 75.2 | 2.406 2.338
: --------------------------------------------------------------------------------------------------
:
<HEADER> Dataset:dataset : Created tree 'TestTree' with 5000 events
:
<HEADER> Dataset:dataset : Created tree 'TrainTree' with 5000 events
:
<HEADER> Factory : Thank you for using TMVA!
: For citation information, please visit: http://tmva.sf.net/citeTMVA.html
: Evaluation done.
==> Wrote root file: TMVAReg.root
==> TMVACrossValidationRegression is done!
(int) 0