TMVA_RNN_Classification.C File Reference

Detailed Description

TMVA Classification Example Using a Recurrent Neural Network

This is an example of using an RNN in TMVA. We perform classification on a toy time-dependent data set that is generated when running this example macro.
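
In the run logged below, each event is a sequence of 10 time steps with 30 values per step, which TMVA sees as 300 flat input variables named `vars_time<t>[<i>]`. The following is a minimal sketch of that flattening. The function name `makeVariableNames` is hypothetical and not part of the macro; only the naming pattern and the sizes are taken from the log.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Build the flat list of variable names for ntime time steps of ndim values.
// The pattern vars_time<t>[<i>] is copied from the log output; the function
// itself is an illustrative helper, not code from the tutorial macro.
std::vector<std::string> makeVariableNames(std::size_t ntime, std::size_t ndim) {
    assert(ntime > 0 && ndim > 0);
    std::vector<std::string> names;
    names.reserve(ntime * ndim);
    for (std::size_t t = 0; t < ntime; ++t) {
        for (std::size_t i = 0; i < ndim; ++i) {
            names.push_back("vars_time" + std::to_string(t) +
                            "[" + std::to_string(i) + "]");
        }
    }
    return names;
}
```

Calling `makeVariableNames(10, 30)` yields 300 names ordered by time step first, matching the order in which the log lists them.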

Running with nthreads = 16
--- RNNClassification : Using input file: time_data_t10_d30.root
DataSetInfo : [dataset] : Added class "Signal"
: Add Tree sgn of type Signal with 10000 events
DataSetInfo : [dataset] : Added class "Background"
: Add Tree bkg of type Background with 10000 events
number of variables is 300
vars_time0[0],vars_time0[1],vars_time0[2],vars_time0[3],vars_time0[4],vars_time0[5],vars_time0[6],vars_time0[7],vars_time0[8],vars_time0[9],vars_time0[10],vars_time0[11],vars_time0[12],vars_time0[13],vars_time0[14],vars_time0[15],vars_time0[16],vars_time0[17],vars_time0[18],vars_time0[19],vars_time0[20],vars_time0[21],vars_time0[22],vars_time0[23],vars_time0[24],vars_time0[25],vars_time0[26],vars_time0[27],vars_time0[28],vars_time0[29],vars_time1[0],vars_time1[1],vars_time1[2],vars_time1[3],vars_time1[4],vars_time1[5],vars_time1[6],vars_time1[7],vars_time1[8],vars_time1[9],vars_time1[10],vars_time1[11],vars_time1[12],vars_time1[13],vars_time1[14],vars_time1[15],vars_time1[16],vars_time1[17],vars_time1[18],vars_time1[19],vars_time1[20],vars_time1[21],vars_time1[22],vars_time1[23],vars_time1[24],vars_time1[25],vars_time1[26],vars_time1[27],vars_time1[28],vars_time1[29],vars_time2[0],vars_time2[1],vars_time2[2],vars_time2[3],vars_time2[4],vars_time2[5],vars_time2[6],vars_time2[7],vars_time2[8],vars_time2[9],vars_time2[10],vars_time2[11],vars_time2[12],vars_time2[13],vars_time2[14],vars_time2[15],vars_time2[16],vars_time2[17],vars_time2[18],vars_time2[19],vars_time2[20],vars_time2[21],vars_time2[22],vars_time2[23],vars_time2[24],vars_time2[25],vars_time2[26],vars_time2[27],vars_time2[28],vars_time2[29],vars_time3[0],vars_time3[1],vars_time3[2],vars_time3[3],vars_time3[4],vars_time3[5],vars_time3[6],vars_time3[7],vars_time3[8],vars_time3[9],vars_time3[10],vars_time3[11],vars_time3[12],vars_time3[13],vars_time3[14],vars_time3[15],vars_time3[16],vars_time3[17],vars_time3[18],vars_time3[19],vars_time3[20],vars_time3[21],vars_time3[22],vars_time3[23],vars_time3[24],vars_time3[25],vars_time3[26],vars_time3[27],vars_time3[28],vars_time3[29],vars_time4[0],vars_time4[1],vars_time4[2],vars_time4[3],vars_time4[4],vars_time4[5],vars_time4[6],vars_time4[7],vars_time4[8],vars_time4[9],vars_time4[10],vars_time4[11],vars_time4[12],vars_time4[13],vars_time4[14],vars_time4[15],vars_time4[16],vars_time4[17],vars_time4[18],vars_time4[19],vars_time4[20],vars_time4[21],vars_time4[22],vars_time4[23],vars_time4[24],vars_time4[25],vars_time4[26],vars_time4[27],vars_time4[28],vars_time4[29],vars_time5[0],vars_time5[1],vars_time5[2],vars_time5[3],vars_time5[4],vars_time5[5],vars_time5[6],vars_time5[7],vars_time5[8],vars_time5[9],vars_time5[10],vars_time5[11],vars_time5[12],vars_time5[13],vars_time5[14],vars_time5[15],vars_time5[16],vars_time5[17],vars_time5[18],vars_time5[19],vars_time5[20],vars_time5[21],vars_time5[22],vars_time5[23],vars_time5[24],vars_time5[25],vars_time5[26],vars_time5[27],vars_time5[28],vars_time5[29],vars_time6[0],vars_time6[1],vars_time6[2],vars_time6[3],vars_time6[4],vars_time6[5],vars_time6[6],vars_time6[7],vars_time6[8],vars_time6[9],vars_time6[10],vars_time6[11],vars_time6[12],vars_time6[13],vars_time6[14],vars_time6[15],vars_time6[16],vars_time6[17],vars_time6[18],vars_time6[19],vars_time6[20],vars_time6[21],vars_time6[22],vars_time6[23],vars_time6[24],vars_time6[25],vars_time6[26],vars_time6[27],vars_time6[28],vars_time6[29],vars_time7[0],vars_time7[1],vars_time7[2],vars_time7[3],vars_time7[4],vars_time7[5],vars_time7[6],vars_time7[7],vars_time7[8],vars_time7[9],vars_time7[10],vars_time7[11],vars_time7[12],vars_time7[13],vars_time7[14],vars_time7[15],vars_time7[16],vars_time7[17],vars_time7[18],vars_time7[19],vars_time7[20],vars_time7[21],vars_time7[22],vars_time7[23],vars_time7[24],vars_time7[25],vars_time7[26],vars_time7[27],vars_time7[28],vars_time7[29],vars_time8[0],vars_time8[1],vars_time8[2],vars_time8[3],vars_time8[4],vars_time8[5],vars_time8[6],vars_time8[7],vars_time8[8],vars_time8[9],vars_time8[10],vars_time8[11],vars_time8[12],vars_time8[13],vars_time8[14],vars_time8[15],vars_time8[16],vars_time8[17],vars_time8[18],vars_time8[19],vars_time8[20],vars_time8[21],vars_time8[22],vars_time8[23],vars_time8[24],vars_time8[25],vars_time8[26],vars_time8[27],vars_time8[28],vars_time8[29],vars_time9[0],vars_time9[1],vars_time9[2],vars_time9[3],vars_time9[4],vars_time9[5],vars_time9[6],vars_time9[7],vars_time9[8],vars_time9[9],vars_time9[10],vars_time9[11],vars_time9[12],vars_time9[13],vars_time9[14],vars_time9[15],vars_time9[16],vars_time9[17],vars_time9[18],vars_time9[19],vars_time9[20],vars_time9[21],vars_time9[22],vars_time9[23],vars_time9[24],vars_time9[25],vars_time9[26],vars_time9[27],vars_time9[28],vars_time9[29],
prepared DATA LOADER
Factory : Booking method: TMVA_LSTM
:
: Parsing option string:
: ... "!H:V:ErrorStrategy=CROSSENTROPY:VarTransform=None:WeightInitialization=XAVIERUNIFORM:ValidationSize=0.2:RandomSeed=1234:InputLayout=10|30:Layout=LSTM|10|30|10|0|1,RESHAPE|FLAT,DENSE|64|TANH,LINEAR:TrainingStrategy=LearningRate=1e-3,Momentum=0.0,Repetitions=1,ConvergenceSteps=5,BatchSize=100,TestRepetitions=1,WeightDecay=1e-2,Regularization=None,MaxEpochs=20,Optimizer=ADAM,DropConfig=0.0+0.+0.+0.:Architecture=CPU"
: The following options are set:
: - By User:
: <none>
: - Default:
: Boost_num: "0" [Number of times the classifier will be boosted]
: Parsing option string:
: ... "!H:V:ErrorStrategy=CROSSENTROPY:VarTransform=None:WeightInitialization=XAVIERUNIFORM:ValidationSize=0.2:RandomSeed=1234:InputLayout=10|30:Layout=LSTM|10|30|10|0|1,RESHAPE|FLAT,DENSE|64|TANH,LINEAR:TrainingStrategy=LearningRate=1e-3,Momentum=0.0,Repetitions=1,ConvergenceSteps=5,BatchSize=100,TestRepetitions=1,WeightDecay=1e-2,Regularization=None,MaxEpochs=20,Optimizer=ADAM,DropConfig=0.0+0.+0.+0.:Architecture=CPU"
: The following options are set:
: - By User:
: V: "True" [Verbose output (short form of "VerbosityLevel" below - overrides the latter one)]
: VarTransform: "None" [List of variable transformations performed before training, e.g., "D_Background,P_Signal,G,N_AllClasses" for: "Decorrelation, PCA-transformation, Gaussianisation, Normalisation, each for the given class of events ('AllClasses' denotes all events of all classes, if no class indication is given, 'All' is assumed)"]
: H: "False" [Print method-specific help message]
: InputLayout: "10|30" [The Layout of the input]
: Layout: "LSTM|10|30|10|0|1,RESHAPE|FLAT,DENSE|64|TANH,LINEAR" [Layout of the network.]
: ErrorStrategy: "CROSSENTROPY" [Loss function: Mean squared error (regression) or cross entropy (binary classification).]
: WeightInitialization: "XAVIERUNIFORM" [Weight initialization strategy]
: RandomSeed: "1234" [Random seed used for weight initialization and batch shuffling]
: ValidationSize: "0.2" [Part of the training data to use for validation. Specify as 0.2 or 20% to use a fifth of the data set as validation set. Specify as 100 to use exactly 100 events. (Default: 20%)]
: Architecture: "CPU" [Which architecture to perform the training on.]
: TrainingStrategy: "LearningRate=1e-3,Momentum=0.0,Repetitions=1,ConvergenceSteps=5,BatchSize=100,TestRepetitions=1,WeightDecay=1e-2,Regularization=None,MaxEpochs=20,Optimizer=ADAM,DropConfig=0.0+0.+0.+0." [Defines the training strategies.]
: - Default:
: VerbosityLevel: "Default" [Verbosity level]
: CreateMVAPdfs: "False" [Create PDFs for classifier outputs (signal and background)]
: IgnoreNegWeightsInTraining: "False" [Events with negative weights are ignored in the training (but are included for testing and performance evaluation)]
: BatchLayout: "0|0|0" [The Layout of the batch]
: Will now use the CPU architecture with BLAS and IMT support !
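
The `Layout` option above is a comma-separated list of layers, each with `|`-separated fields. As a rough illustration of how such a string breaks down, the sketch below tokenizes it with plain standard-library calls. The helper names `split` and `parseLayout` are hypothetical; this is not TMVA's parser, and it does not interpret what each field means.

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <vector>

// Split text on a single delimiter character.
std::vector<std::string> split(const std::string& text, char delim) {
    std::vector<std::string> parts;
    std::istringstream stream(text);
    std::string item;
    while (std::getline(stream, item, delim)) {
        parts.push_back(item);
    }
    return parts;
}

// Tokenize a layout string: ',' separates layers, '|' separates the fields
// of one layer. Illustrative only; TMVA's own parsing is not reproduced here.
std::vector<std::vector<std::string>> parseLayout(const std::string& layout) {
    assert(!layout.empty());
    std::vector<std::vector<std::string>> layers;
    for (const std::string& layer : split(layout, ',')) {
        layers.push_back(split(layer, '|'));
    }
    return layers;
}
```

Applied to the LSTM layout string from this log, the sketch returns four layers, which agrees with the `Depth = 4` reported when the network is built.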
Factory : Booking method: TMVA_DNN
:
: Parsing option string:
: ... "!H:V:ErrorStrategy=CROSSENTROPY:VarTransform=None:WeightInitialization=XAVIER:RandomSeed=0:InputLayout=1|1|300:Layout=DENSE|64|TANH,DENSE|TANH|64,DENSE|TANH|64,LINEAR:TrainingStrategy=LearningRate=1e-3,Momentum=0.0,Repetitions=1,ConvergenceSteps=10,BatchSize=256,TestRepetitions=1,WeightDecay=1e-4,Regularization=None,MaxEpochs=20DropConfig=0.0+0.+0.+0.,Optimizer=ADAM:CPU"
: The following options are set:
: - By User:
: <none>
: - Default:
: Boost_num: "0" [Number of times the classifier will be boosted]
: Parsing option string:
: ... "!H:V:ErrorStrategy=CROSSENTROPY:VarTransform=None:WeightInitialization=XAVIER:RandomSeed=0:InputLayout=1|1|300:Layout=DENSE|64|TANH,DENSE|TANH|64,DENSE|TANH|64,LINEAR:TrainingStrategy=LearningRate=1e-3,Momentum=0.0,Repetitions=1,ConvergenceSteps=10,BatchSize=256,TestRepetitions=1,WeightDecay=1e-4,Regularization=None,MaxEpochs=20DropConfig=0.0+0.+0.+0.,Optimizer=ADAM:CPU"
: The following options are set:
: - By User:
: V: "True" [Verbose output (short form of "VerbosityLevel" below - overrides the latter one)]
: VarTransform: "None" [List of variable transformations performed before training, e.g., "D_Background,P_Signal,G,N_AllClasses" for: "Decorrelation, PCA-transformation, Gaussianisation, Normalisation, each for the given class of events ('AllClasses' denotes all events of all classes, if no class indication is given, 'All' is assumed)"]
: H: "False" [Print method-specific help message]
: InputLayout: "1|1|300" [The Layout of the input]
: Layout: "DENSE|64|TANH,DENSE|TANH|64,DENSE|TANH|64,LINEAR" [Layout of the network.]
: ErrorStrategy: "CROSSENTROPY" [Loss function: Mean squared error (regression) or cross entropy (binary classification).]
: WeightInitialization: "XAVIER" [Weight initialization strategy]
: RandomSeed: "0" [Random seed used for weight initialization and batch shuffling]
: Architecture: "CPU" [Which architecture to perform the training on.]
: TrainingStrategy: "LearningRate=1e-3,Momentum=0.0,Repetitions=1,ConvergenceSteps=10,BatchSize=256,TestRepetitions=1,WeightDecay=1e-4,Regularization=None,MaxEpochs=20DropConfig=0.0+0.+0.+0.,Optimizer=ADAM" [Defines the training strategies.]
: - Default:
: VerbosityLevel: "Default" [Verbosity level]
: CreateMVAPdfs: "False" [Create PDFs for classifier outputs (signal and background)]
: IgnoreNegWeightsInTraining: "False" [Events with negative weights are ignored in the training (but are included for testing and performance evaluation)]
: BatchLayout: "0|0|0" [The Layout of the batch]
: ValidationSize: "20%" [Part of the training data to use for validation. Specify as 0.2 or 20% to use a fifth of the data set as validation set. Specify as 100 to use exactly 100 events. (Default: 20%)]
: Will now use the CPU architecture with BLAS and IMT support !
Factory : Booking method: BDTG
:
: the option NegWeightTreatment=InverseBoostNegWeights does not exist for BoostType=Grad
: --> change to new default NegWeightTreatment=Pray
: Building event vectors for type 2 Signal
: Dataset[dataset] : create input formulas for tree sgn
: Using variable vars_time0[0] from array expression vars_time0 of size 30
: Using variable vars_time1[0] from array expression vars_time1 of size 30
: Using variable vars_time2[0] from array expression vars_time2 of size 30
: Using variable vars_time3[0] from array expression vars_time3 of size 30
: Using variable vars_time4[0] from array expression vars_time4 of size 30
: Using variable vars_time5[0] from array expression vars_time5 of size 30
: Using variable vars_time6[0] from array expression vars_time6 of size 30
: Using variable vars_time7[0] from array expression vars_time7 of size 30
: Using variable vars_time8[0] from array expression vars_time8 of size 30
: Using variable vars_time9[0] from array expression vars_time9 of size 30
: Building event vectors for type 2 Background
: Dataset[dataset] : create input formulas for tree bkg
: Using variable vars_time0[0] from array expression vars_time0 of size 30
: Using variable vars_time1[0] from array expression vars_time1 of size 30
: Using variable vars_time2[0] from array expression vars_time2 of size 30
: Using variable vars_time3[0] from array expression vars_time3 of size 30
: Using variable vars_time4[0] from array expression vars_time4 of size 30
: Using variable vars_time5[0] from array expression vars_time5 of size 30
: Using variable vars_time6[0] from array expression vars_time6 of size 30
: Using variable vars_time7[0] from array expression vars_time7 of size 30
: Using variable vars_time8[0] from array expression vars_time8 of size 30
: Using variable vars_time9[0] from array expression vars_time9 of size 30
DataSetFactory : [dataset] : Number of events in input trees
:
:
: Number of training and testing events
: ---------------------------------------------------------------------------
: Signal -- training events : 8000
: Signal -- testing events : 2000
: Signal -- training and testing events: 10000
: Background -- training events : 8000
: Background -- testing events : 2000
: Background -- training and testing events: 10000
:
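
The 16000 training events in the table above (8000 per class) are split once more for the deep learning methods according to `ValidationSize`, whose help text allows a fraction (`0.2`), a percentage (`20%`) or an absolute count (`100`). The sketch below is one possible reading of that help text. The function name `validationEvents` is hypothetical, and treating any plain value of 1 or more as an absolute count is an assumption, not something the log confirms.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <string>

// Number of validation events for a ValidationSize-style specification.
// Assumption: "20%" is a percentage, a plain value below 1 is a fraction,
// and a plain value of 1 or more is an absolute event count.
std::size_t validationEvents(const std::string& spec, std::size_t nTrain) {
    assert(!spec.empty());
    double fraction = 0.0;
    if (spec.back() == '%') {
        fraction = std::stod(spec.substr(0, spec.size() - 1)) / 100.0;
    } else {
        const double value = std::stod(spec);
        if (value >= 1.0) {
            return static_cast<std::size_t>(value);  // absolute count
        }
        fraction = value;
    }
    return static_cast<std::size_t>(
        std::llround(fraction * static_cast<double>(nTrain)));
}
```

With `ValidationSize=0.2` and 16000 events this gives 3200 validation events and leaves 12800 for the weight updates, the same numbers the training log prints as "Using 12800 events for training and 3200 for testing".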
Factory : Train all methods
Factory : Train method: TMVA_LSTM for Classification
:
: Start of deep neural network training on CPU using MT, nthreads = 16
:
: ***** Deep Learning Network *****
DEEP NEURAL NETWORK: Depth = 4 Input = ( 10, 1, 30 ) Batch size = 100 Loss function = C
Layer 0 LSTM Layer: (NInput = 30, NState = 10, NTime = 10 ) Output = ( 100 , 10 , 10 )
Layer 1 RESHAPE Layer Input = ( 1 , 10 , 10 ) Output = ( 1 , 100 , 100 )
Layer 2 DENSE Layer: ( Input = 100 , Width = 64 ) Output = ( 1 , 100 , 64 ) Activation Function = Tanh
Layer 3 DENSE Layer: ( Input = 64 , Width = 1 ) Output = ( 1 , 100 , 1 ) Activation Function = Identity
: Using 12800 events for training and 3200 for testing
: Compute initial loss on the validation data
: Training phase 1 of 1: Optimizer ADAM Learning rate = 0.001 regularization 0 minimum error = 0.704414
: --------------------------------------------------------------
: Epoch | Train Err. Val. Err. t(s)/epoch t(s)/Loss nEvents/s Conv. Steps
: --------------------------------------------------------------
: Start epoch iteration ...
: 1 Minimum Test error found - save the configuration
: 1 | 0.68277 0.63685 3.28728 0.22606 4181.34 0
: 2 Minimum Test error found - save the configuration
: 2 | 0.61607 0.582259 3.15884 0.178726 4295.14 0
: 3 Minimum Test error found - save the configuration
: 3 | 0.563187 0.524031 2.74038 0.165785 4971.66 0
: 4 Minimum Test error found - save the configuration
: 4 | 0.507419 0.487824 2.7981 0.176094 4881.75 0
: 5 Minimum Test error found - save the configuration
: 5 | 0.47962 0.454596 2.86022 0.184216 4783.26 0
: 6 Minimum Test error found - save the configuration
: 6 | 0.456611 0.452345 2.82835 0.168757 4812.77 0
: 7 Minimum Test error found - save the configuration
: 7 | 0.443273 0.435399 2.59108 0.168617 5283.89 0
: 8 Minimum Test error found - save the configuration
: 8 | 0.435494 0.417676 2.80635 0.182042 4877.47 0
: 9 Minimum Test error found - save the configuration
: 9 | 0.42672 0.41285 2.79083 0.172066 4887.79 0
: 10 Minimum Test error found - save the configuration
: 10 | 0.417906 0.411237 2.56413 0.16 5324.17 0
: 11 Minimum Test error found - save the configuration
: 11 | 0.412323 0.409474 2.48007 0.160882 5519.18 0
: 12 Minimum Test error found - save the configuration
: 12 | 0.410313 0.404334 2.39637 0.158628 5720.06 0
: 13 Minimum Test error found - save the configuration
: 13 | 0.403103 0.397356 2.41573 0.157647 5668.52 0
: 14 | 0.395619 0.403822 2.40063 0.15855 5708.99 1
: 15 Minimum Test error found - save the configuration
: 15 | 0.395975 0.39255 2.4275 0.15833 5640.82 0
: 16 | 0.392582 0.400194 2.39054 0.157629 5732.43 1
: 17 Minimum Test error found - save the configuration
: 17 | 0.388816 0.383036 2.39693 0.158213 5717.57 0
: 18 | 0.388074 0.394672 2.40966 0.157344 5683.04 1
: 19 | 0.384397 0.39661 2.39084 0.156835 5729.62 2
: 20 | 0.381914 0.383042 2.42884 0.158352 5637.56 3
:
: Elapsed time for training with 16000 events: 52.8 sec
: Evaluate deep neural network on CPU using batches with size = 100
:
TMVA_LSTM : [dataset] : Evaluation of TMVA_LSTM on training sample (16000 events)
: Elapsed time for evaluation of 16000 events: 0.791 sec
: Creating xml weight file: dataset/weights/TMVAClassification_TMVA_LSTM.weights.xml
: Creating standalone class: dataset/weights/TMVAClassification_TMVA_LSTM.class.C
Factory : Training finished
:
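
The `Conv. Steps` column in the table above behaves like a counter of epochs since the last new minimum of the validation error: it resets to 0 on every "Minimum Test error found" line and increases by one otherwise. The sketch below reproduces that column from a list of validation errors. The function name `convergenceSteps` is hypothetical, and the exact stopping condition TMVA applies to `ConvergenceSteps` is not shown in the log, so it is left out.

```cpp
#include <cassert>
#include <vector>

// For each epoch, count how many epochs have passed since the validation
// error last reached a new minimum. initialBest is the error before the
// first epoch (the log prints it as "minimum error").
// Illustrative reconstruction of the log column, not TMVA code.
std::vector<int> convergenceSteps(const std::vector<double>& valErrors,
                                  double initialBest) {
    assert(initialBest >= 0.0);
    std::vector<int> steps;
    steps.reserve(valErrors.size());
    double best = initialBest;
    int sinceBest = 0;
    for (double err : valErrors) {
        if (err < best) {
            best = err;       // new minimum: reset the counter
            sinceBest = 0;
        } else {
            ++sinceBest;      // no improvement in this epoch
        }
        steps.push_back(sinceBest);
    }
    return steps;
}
```

Fed with the validation errors of epochs 13 to 20 and the epoch 12 value as the starting minimum, the sketch returns 0, 1, 0, 1, 0, 1, 2, 3, as in the log. The counter stays below the configured `ConvergenceSteps=5`, and the run ends at epoch 20, the configured `MaxEpochs=20`.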
Factory : Train method: TMVA_DNN for Classification
:
: Start of deep neural network training on CPU using MT, nthreads = 16
:
: ***** Deep Learning Network *****
DEEP NEURAL NETWORK: Depth = 4 Input = ( 1, 1, 300 ) Batch size = 256 Loss function = C
Layer 0 DENSE Layer: ( Input = 300 , Width = 64 ) Output = ( 1 , 256 , 64 ) Activation Function = Tanh
Layer 1 DENSE Layer: ( Input = 64 , Width = 64 ) Output = ( 1 , 256 , 64 ) Activation Function = Tanh
Layer 2 DENSE Layer: ( Input = 64 , Width = 64 ) Output = ( 1 , 256 , 64 ) Activation Function = Tanh
Layer 3 DENSE Layer: ( Input = 64 , Width = 1 ) Output = ( 1 , 256 , 1 ) Activation Function = Identity
: Using 12800 events for training and 3200 for testing
: Compute initial loss on the validation data
: Training phase 1 of 1: Optimizer ADAM Learning rate = 0.001 regularization 0 minimum error = 1.14003
: --------------------------------------------------------------
: Epoch | Train Err. Val. Err. t(s)/epoch t(s)/Loss nEvents/s Conv. Steps
: --------------------------------------------------------------
: Start epoch iteration ...
: 1 Minimum Test error found - save the configuration
: 1 | 0.714263 0.687872 0.981615 0.0910586 14373 0
: 2 Minimum Test error found - save the configuration
: 2 | 0.683777 0.678751 0.972967 0.0897263 14492.1 0
: 3 | 0.677773 0.705111 0.98286 0.09427 14404.8 1
: 4 | 0.685459 0.679209 0.975122 0.0899719 14460.8 2
: 5 Minimum Test error found - save the configuration
: 5 | 0.670674 0.652663 0.976125 0.0901297 14447 0
: 6 | 0.666919 0.684163 0.97412 0.0906605 14488.5 1
: 7 Minimum Test error found - save the configuration
: 7 | 0.663392 0.637325 0.971705 0.0895721 14510.3 0
: 8 | 0.662556 0.659865 0.97299 0.0904439 14503.5 1
: 9 Minimum Test error found - save the configuration
: 9 | 0.639819 0.626769 0.979269 0.0899969 14393.8 0
: 10 | 0.636365 0.649645 0.981553 0.0910713 14374.2 1
: 11 Minimum Test error found - save the configuration
: 11 | 0.606136 0.604293 0.980946 0.0904022 14373.2 0
: 12 | 0.616097 0.624643 0.984434 0.0905393 14319.4 1
: 13 Minimum Test error found - save the configuration
: 13 | 0.599959 0.603684 0.984823 0.0896639 14299.1 0
: 14 | 0.614713 0.623757 0.995522 0.0898476 14133.1 1
: 15 Minimum Test error found - save the configuration
: 15 | 0.587372 0.550483 1.02215 0.0891858 13719.7 0
: 16 | 0.569293 0.580623 1.04485 0.0900216 13405.6 1
: 17 | 0.588866 0.556252 1.06383 0.0891463 13132.5 2
: 18 Minimum Test error found - save the configuration
: 18 | 0.545505 0.540063 1.04562 0.0904567 13400.8 0
: 19 | 0.534485 0.594832 1.03979 0.0894674 13469.1 1
: 20 Minimum Test error found - save the configuration
: 20 | 0.544712 0.524199 1.04064 0.0900661 13465.6 0
:
: Elapsed time for training with 16000 events: 20.1 sec
: Evaluate deep neural network on CPU using batches with size = 256
:
TMVA_DNN : [dataset] : Evaluation of TMVA_DNN on training sample (16000 events)
: Elapsed time for evaluation of 16000 events: 0.466 sec
: Creating xml weight file: dataset/weights/TMVAClassification_TMVA_DNN.weights.xml
: Creating standalone class: dataset/weights/TMVAClassification_TMVA_DNN.class.C
Factory : Training finished
:
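
The two layer summaries printed above are enough to estimate how many trainable parameters each network has. The sketch below does the arithmetic under the assumption of a textbook LSTM cell (four gates, each with input weights, recurrent weights and a bias) and dense layers with one bias per output unit. TMVA's internal layout may differ, and the function names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>

// Weights plus biases of a fully connected layer.
std::size_t denseParams(std::size_t nIn, std::size_t width) {
    assert(nIn > 0 && width > 0);
    return nIn * width + width;
}

// Textbook LSTM cell: 4 gates, each with an (nState x nInput) input matrix,
// an (nState x nState) recurrent matrix and nState biases.
// Assumption: this standard formulation, not a statement about TMVA internals.
std::size_t lstmParams(std::size_t nInput, std::size_t nState) {
    assert(nInput > 0 && nState > 0);
    return 4 * (nState * nInput + nState * nState + nState);
}

// LSTM network from the log: LSTM(30 -> 10), flatten 10 x 10 = 100,
// DENSE(100 -> 64), DENSE(64 -> 1).
std::size_t lstmNetworkParams() {
    return lstmParams(30, 10) + denseParams(100, 64) + denseParams(64, 1);
}

// Dense network from the log: 300 -> 64 -> 64 -> 64 -> 1.
std::size_t denseNetworkParams() {
    return denseParams(300, 64) + 2 * denseParams(64, 64) + denseParams(64, 1);
}
```

Under these assumptions the recurrent network has 8169 parameters and the fully connected one has 27649, so the network with fewer parameters in this run is the one that takes longer per epoch.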
Factory : Train method: BDTG for Classification
:
BDTG : #events: (reweighted) sig: 8000 bkg: 8000
: #events: (unweighted) sig: 8000 bkg: 8000
: Training 100 Decision Trees ... patience please
: Elapsed time for training with 16000 events: 8.93 sec
BDTG : [dataset] : Evaluation of BDTG on training sample (16000 events)
: Elapsed time for evaluation of 16000 events: 0.0823 sec
: Creating xml weight file: dataset/weights/TMVAClassification_BDTG.weights.xml
: Creating standalone class: dataset/weights/TMVAClassification_BDTG.class.C
: data_RNN_CPU.root:/dataset/Method_BDT/BDTG
Factory : Training finished
:
: Ranking input variables (method specific)...
: No variable ranking supplied by classifier: TMVA_LSTM
: No variable ranking supplied by classifier: TMVA_DNN
BDTG : Ranking result (top variable is best ranked)
: --------------------------------------------
: Rank : Variable : Variable Importance
: --------------------------------------------
: 1 : vars_time8 : 2.843e-02
: 2 : vars_time8 : 2.805e-02
: 3 : vars_time9 : 2.757e-02
: 4 : vars_time8 : 2.751e-02
: 5 : vars_time7 : 2.741e-02
: 6 : vars_time7 : 2.711e-02
: 7 : vars_time6 : 2.578e-02
: 8 : vars_time9 : 2.416e-02
: 9 : vars_time9 : 2.381e-02
: 10 : vars_time8 : 2.375e-02
: 11 : vars_time9 : 2.366e-02
: 12 : vars_time7 : 2.361e-02
: 13 : vars_time6 : 2.326e-02
: 14 : vars_time6 : 2.233e-02
: 15 : vars_time7 : 2.220e-02
: 16 : vars_time8 : 2.197e-02
: 17 : vars_time9 : 2.194e-02
: 18 : vars_time7 : 2.162e-02
: 19 : vars_time5 : 2.158e-02
: 20 : vars_time0 : 1.987e-02
: 21 : vars_time6 : 1.970e-02
: 22 : vars_time7 : 1.884e-02
: 23 : vars_time5 : 1.772e-02
: 24 : vars_time6 : 1.727e-02
: 25 : vars_time8 : 1.668e-02
: 26 : vars_time7 : 1.617e-02
: 27 : vars_time8 : 1.592e-02
: 28 : vars_time9 : 1.582e-02
: 29 : vars_time7 : 1.427e-02
: 30 : vars_time5 : 1.240e-02
: 31 : vars_time7 : 1.230e-02
: 32 : vars_time7 : 1.219e-02
: 33 : vars_time6 : 1.192e-02
: 34 : vars_time5 : 1.177e-02
: 35 : vars_time9 : 1.131e-02
: 36 : vars_time7 : 1.112e-02
: 37 : vars_time8 : 1.082e-02
: 38 : vars_time9 : 1.079e-02
: 39 : vars_time0 : 1.079e-02
: 40 : vars_time7 : 1.078e-02
: 41 : vars_time8 : 1.076e-02
: 42 : vars_time8 : 1.057e-02
: 43 : vars_time4 : 1.013e-02
: 44 : vars_time8 : 9.769e-03
: 45 : vars_time9 : 9.479e-03
: 46 : vars_time1 : 9.298e-03
: 47 : vars_time0 : 9.202e-03
: 48 : vars_time6 : 8.944e-03
: 49 : vars_time8 : 8.916e-03
: 50 : vars_time7 : 8.604e-03
: 51 : vars_time8 : 8.587e-03
: 52 : vars_time0 : 8.524e-03
: 53 : vars_time9 : 8.489e-03
: 54 : vars_time8 : 7.978e-03
: 55 : vars_time5 : 7.872e-03
: 56 : vars_time0 : 7.586e-03
: 57 : vars_time6 : 6.828e-03
: 58 : vars_time9 : 6.780e-03
: 59 : vars_time8 : 6.560e-03
: 60 : vars_time7 : 6.189e-03
: 61 : vars_time7 : 6.116e-03
: 62 : vars_time1 : 5.545e-03
: 63 : vars_time7 : 4.935e-03
: 64 : vars_time5 : 4.847e-03
: 65 : vars_time8 : 4.834e-03
: 66 : vars_time9 : 4.781e-03
: 67 : vars_time9 : 4.628e-03
: 68 : vars_time8 : 4.536e-03
: 69 : vars_time0 : 4.424e-03
: 70 : vars_time7 : 3.918e-03
: 71 : vars_time5 : 3.536e-03
: 72 : vars_time5 : 3.530e-03
: 73 : vars_time7 : 3.314e-03
: 74 : vars_time2 : 3.030e-03
: 75 : vars_time6 : 2.756e-03
: 76 : vars_time0 : 0.000e+00
: 77 : vars_time0 : 0.000e+00
: 78 : vars_time0 : 0.000e+00
: 79 : vars_time0 : 0.000e+00
: 80 : vars_time0 : 0.000e+00
: 81 : vars_time0 : 0.000e+00
: 82 : vars_time0 : 0.000e+00
: 83 : vars_time0 : 0.000e+00
: 84 : vars_time0 : 0.000e+00
: 85 : vars_time0 : 0.000e+00
: 86 : vars_time0 : 0.000e+00
: 87 : vars_time0 : 0.000e+00
: 88 : vars_time0 : 0.000e+00
: 89 : vars_time0 : 0.000e+00
: 90 : vars_time0 : 0.000e+00
: 91 : vars_time0 : 0.000e+00
: 92 : vars_time0 : 0.000e+00
: 93 : vars_time0 : 0.000e+00
: 94 : vars_time0 : 0.000e+00
: 95 : vars_time0 : 0.000e+00
: 96 : vars_time0 : 0.000e+00
: 97 : vars_time0 : 0.000e+00
: 98 : vars_time0 : 0.000e+00
: 99 : vars_time0 : 0.000e+00
: 100 : vars_time1 : 0.000e+00
: 101 : vars_time1 : 0.000e+00
: 102 : vars_time1 : 0.000e+00
: 103 : vars_time1 : 0.000e+00
: 104 : vars_time1 : 0.000e+00
: 105 : vars_time1 : 0.000e+00
: 106 : vars_time1 : 0.000e+00
: 107 : vars_time1 : 0.000e+00
: 108 : vars_time1 : 0.000e+00
: 109 : vars_time1 : 0.000e+00
: 110 : vars_time1 : 0.000e+00
: 111 : vars_time1 : 0.000e+00
: 112 : vars_time1 : 0.000e+00
: 113 : vars_time1 : 0.000e+00
: 114 : vars_time1 : 0.000e+00
: 115 : vars_time1 : 0.000e+00
: 116 : vars_time1 : 0.000e+00
: 117 : vars_time1 : 0.000e+00
: 118 : vars_time1 : 0.000e+00
: 119 : vars_time1 : 0.000e+00
: 120 : vars_time1 : 0.000e+00
: 121 : vars_time1 : 0.000e+00
: 122 : vars_time1 : 0.000e+00
: 123 : vars_time1 : 0.000e+00
: 124 : vars_time1 : 0.000e+00
: 125 : vars_time1 : 0.000e+00
: 126 : vars_time1 : 0.000e+00
: 127 : vars_time1 : 0.000e+00
: 128 : vars_time2 : 0.000e+00
: 129 : vars_time2 : 0.000e+00
: 130 : vars_time2 : 0.000e+00
: 131 : vars_time2 : 0.000e+00
: 132 : vars_time2 : 0.000e+00
: 133 : vars_time2 : 0.000e+00
: 134 : vars_time2 : 0.000e+00
: 135 : vars_time2 : 0.000e+00
: 136 : vars_time2 : 0.000e+00
: 137 : vars_time2 : 0.000e+00
: 138 : vars_time2 : 0.000e+00
: 139 : vars_time2 : 0.000e+00
: 140 : vars_time2 : 0.000e+00
: 141 : vars_time2 : 0.000e+00
: 142 : vars_time2 : 0.000e+00
: 143 : vars_time2 : 0.000e+00
: 144 : vars_time2 : 0.000e+00
: 145 : vars_time2 : 0.000e+00
: 146 : vars_time2 : 0.000e+00
: 147 : vars_time2 : 0.000e+00
: 148 : vars_time2 : 0.000e+00
: 149 : vars_time2 : 0.000e+00
: 150 : vars_time2 : 0.000e+00
: 151 : vars_time2 : 0.000e+00
: 152 : vars_time2 : 0.000e+00
: 153 : vars_time2 : 0.000e+00
: 154 : vars_time2 : 0.000e+00
: 155 : vars_time2 : 0.000e+00
: 156 : vars_time2 : 0.000e+00
: 157 : vars_time3 : 0.000e+00
: 158 : vars_time3 : 0.000e+00
: 159 : vars_time3 : 0.000e+00
: 160 : vars_time3 : 0.000e+00
: 161 : vars_time3 : 0.000e+00
: 162 : vars_time3 : 0.000e+00
: 163 : vars_time3 : 0.000e+00
: 164 : vars_time3 : 0.000e+00
: 165 : vars_time3 : 0.000e+00
: 166 : vars_time3 : 0.000e+00
: 167 : vars_time3 : 0.000e+00
: 168 : vars_time3 : 0.000e+00
: 169 : vars_time3 : 0.000e+00
: 170 : vars_time3 : 0.000e+00
: 171 : vars_time3 : 0.000e+00
: 172 : vars_time3 : 0.000e+00
: 173 : vars_time3 : 0.000e+00
: 174 : vars_time3 : 0.000e+00
: 175 : vars_time3 : 0.000e+00
: 176 : vars_time3 : 0.000e+00
: 177 : vars_time3 : 0.000e+00
: 178 : vars_time3 : 0.000e+00
: 179 : vars_time3 : 0.000e+00
: 180 : vars_time3 : 0.000e+00
: 181 : vars_time3 : 0.000e+00
: 182 : vars_time3 : 0.000e+00
: 183 : vars_time3 : 0.000e+00
: 184 : vars_time3 : 0.000e+00
: 185 : vars_time3 : 0.000e+00
: 186 : vars_time3 : 0.000e+00
: 187 : vars_time4 : 0.000e+00
: 188 : vars_time4 : 0.000e+00
: 189 : vars_time4 : 0.000e+00
: 190 : vars_time4 : 0.000e+00
: 191 : vars_time4 : 0.000e+00
: 192 : vars_time4 : 0.000e+00
: 193 : vars_time4 : 0.000e+00
: 194 : vars_time4 : 0.000e+00
: 195 : vars_time4 : 0.000e+00
: 196 : vars_time4 : 0.000e+00
: 197 : vars_time4 : 0.000e+00
: 198 : vars_time4 : 0.000e+00
: 199 : vars_time4 : 0.000e+00
: 200 : vars_time4 : 0.000e+00
: 201 : vars_time4 : 0.000e+00
: 202 : vars_time4 : 0.000e+00
: 203 : vars_time4 : 0.000e+00
: 204 : vars_time4 : 0.000e+00
: 205 : vars_time4 : 0.000e+00
: 206 : vars_time4 : 0.000e+00
: 207 : vars_time4 : 0.000e+00
: 208 : vars_time4 : 0.000e+00
: 209 : vars_time4 : 0.000e+00
: 210 : vars_time4 : 0.000e+00
: 211 : vars_time4 : 0.000e+00
: 212 : vars_time4 : 0.000e+00
: 213 : vars_time4 : 0.000e+00
: 214 : vars_time4 : 0.000e+00
: 215 : vars_time4 : 0.000e+00
: 216 : vars_time5 : 0.000e+00
: 217 : vars_time5 : 0.000e+00
: 218 : vars_time5 : 0.000e+00
: 219 : vars_time5 : 0.000e+00
: 220 : vars_time5 : 0.000e+00
: 221 : vars_time5 : 0.000e+00
: 222 : vars_time5 : 0.000e+00
: 223 : vars_time5 : 0.000e+00
: 224 : vars_time5 : 0.000e+00
: 225 : vars_time5 : 0.000e+00
: 226 : vars_time5 : 0.000e+00
: 227 : vars_time5 : 0.000e+00
: 228 : vars_time5 : 0.000e+00
: 229 : vars_time5 : 0.000e+00
: 230 : vars_time5 : 0.000e+00
: 231 : vars_time5 : 0.000e+00
: 232 : vars_time5 : 0.000e+00
: 233 : vars_time5 : 0.000e+00
: 234 : vars_time5 : 0.000e+00
: 235 : vars_time5 : 0.000e+00
: 236 : vars_time5 : 0.000e+00
: 237 : vars_time5 : 0.000e+00
: 238 : vars_time6 : 0.000e+00
: 239 : vars_time6 : 0.000e+00
: 240 : vars_time6 : 0.000e+00
: 241 : vars_time6 : 0.000e+00
: 242 : vars_time6 : 0.000e+00
: 243 : vars_time6 : 0.000e+00
: 244 : vars_time6 : 0.000e+00
: 245 : vars_time6 : 0.000e+00
: 246 : vars_time6 : 0.000e+00
: 247 : vars_time6 : 0.000e+00
: 248 : vars_time6 : 0.000e+00
: 249 : vars_time6 : 0.000e+00
: 250 : vars_time6 : 0.000e+00
: 251 : vars_time6 : 0.000e+00
: 252 : vars_time6 : 0.000e+00
: 253 : vars_time6 : 0.000e+00
: 254 : vars_time6 : 0.000e+00
: 255 : vars_time6 : 0.000e+00
: 256 : vars_time6 : 0.000e+00
: 257 : vars_time6 : 0.000e+00
: 258 : vars_time6 : 0.000e+00
: 259 : vars_time7 : 0.000e+00
: 260 : vars_time7 : 0.000e+00
: 261 : vars_time7 : 0.000e+00
: 262 : vars_time7 : 0.000e+00
: 263 : vars_time7 : 0.000e+00
: 264 : vars_time7 : 0.000e+00
: 265 : vars_time7 : 0.000e+00
: 266 : vars_time7 : 0.000e+00
: 267 : vars_time7 : 0.000e+00
: 268 : vars_time7 : 0.000e+00
: 269 : vars_time7 : 0.000e+00
: 270 : vars_time7 : 0.000e+00
: 271 : vars_time8 : 0.000e+00
: 272 : vars_time8 : 0.000e+00
: 273 : vars_time8 : 0.000e+00
: 274 : vars_time8 : 0.000e+00
: 275 : vars_time8 : 0.000e+00
: 276 : vars_time8 : 0.000e+00
: 277 : vars_time8 : 0.000e+00
: 278 : vars_time8 : 0.000e+00
: 279 : vars_time8 : 0.000e+00
: 280 : vars_time8 : 0.000e+00
: 281 : vars_time8 : 0.000e+00
: 282 : vars_time8 : 0.000e+00
: 283 : vars_time8 : 0.000e+00
: 284 : vars_time9 : 0.000e+00
: 285 : vars_time9 : 0.000e+00
: 286 : vars_time9 : 0.000e+00
: 287 : vars_time9 : 0.000e+00
: 288 : vars_time9 : 0.000e+00
: 289 : vars_time9 : 0.000e+00
: 290 : vars_time9 : 0.000e+00
: 291 : vars_time9 : 0.000e+00
: 292 : vars_time9 : 0.000e+00
: 293 : vars_time9 : 0.000e+00
: 294 : vars_time9 : 0.000e+00
: 295 : vars_time9 : 0.000e+00
: 296 : vars_time9 : 0.000e+00
: 297 : vars_time9 : 0.000e+00
: 298 : vars_time9 : 0.000e+00
: 299 : vars_time9 : 0.000e+00
: 300 : vars_time9 : 0.000e+00
: --------------------------------------------
TH1.Print Name = TrainingHistory_TMVA_LSTM_trainingError, Entries= 0, Total sum= 8.98219
TH1.Print Name = TrainingHistory_TMVA_LSTM_valError, Entries= 0, Total sum= 8.78016
TH1.Print Name = TrainingHistory_TMVA_DNN_trainingError, Entries= 0, Total sum= 12.5081
TH1.Print Name = TrainingHistory_TMVA_DNN_valError, Entries= 0, Total sum= 12.4642
Factory : === Destroy and recreate all methods via weight files for testing ===
:
: Reading weight file: dataset/weights/TMVAClassification_TMVA_LSTM.weights.xml
: Reading weight file: dataset/weights/TMVAClassification_TMVA_DNN.weights.xml
: Reading weight file: dataset/weights/TMVAClassification_BDTG.weights.xml
nthreads = 16
Factory : Test all methods
Factory : Test method: TMVA_LSTM for Classification performance
:
: Evaluate deep neural network on CPU using batches with size = 1000
:
TMVA_LSTM : [dataset] : Evaluation of TMVA_LSTM on testing sample (4000 events)
: Elapsed time for evaluation of 4000 events: 0.173 sec
Factory : Test method: TMVA_DNN for Classification performance
:
: Evaluate deep neural network on CPU using batches with size = 1000
:
TMVA_DNN : [dataset] : Evaluation of TMVA_DNN on testing sample (4000 events)
: Elapsed time for evaluation of 4000 events: 0.0999 sec
Factory : Test method: BDTG for Classification performance
:
BDTG : [dataset] : Evaluation of BDTG on testing sample (4000 events)
: Elapsed time for evaluation of 4000 events: 0.0195 sec
Factory : Evaluate all methods
Factory : Evaluate classifier: TMVA_LSTM
:
TMVA_LSTM : [dataset] : Loop over test events and fill histograms with classifier response...
:
: Evaluate deep neural network on CPU using batches with size = 1000
:
: Dataset[dataset] : variable plots are not produces ! The number of variables is 300 , it is larger than 200
Factory : Evaluate classifier: TMVA_DNN
:
TMVA_DNN : [dataset] : Loop over test events and fill histograms with classifier response...
:
: Evaluate deep neural network on CPU using batches with size = 1000
:
: Dataset[dataset] : variable plots are not produces ! The number of variables is 300 , it is larger than 200
Factory : Evaluate classifier: BDTG
:
BDTG : [dataset] : Loop over test events and fill histograms with classifier response...
:
: Dataset[dataset] : variable plots are not produces ! The number of variables is 300 , it is larger than 200
:
: Evaluation results ranked by best signal efficiency and purity (area)
: -------------------------------------------------------------------------------------------------------------------
: DataSet       MVA
: Name:         Method:          ROC-integ
: dataset       TMVA_LSTM      : 0.900
: dataset       BDTG           : 0.842
: dataset       TMVA_DNN       : 0.819
: -------------------------------------------------------------------------------------------------------------------
:
: Testing efficiency compared to training efficiency (overtraining check)
: -------------------------------------------------------------------------------------------------------------------
: DataSet       MVA              Signal efficiency: from test sample (from training sample)
: Name:         Method:          @B=0.01          @B=0.10          @B=0.30
: -------------------------------------------------------------------------------------------------------------------
: dataset       TMVA_LSTM      : 0.300 (0.322)    0.708 (0.720)    0.900 (0.910)
: dataset       BDTG           : 0.188 (0.221)    0.569 (0.578)    0.805 (0.831)
: dataset       TMVA_DNN       : 0.061 (0.067)    0.519 (0.543)    0.737 (0.739)
: -------------------------------------------------------------------------------------------------------------------
:
Dataset:dataset : Created tree 'TestTree' with 4000 events
:
Dataset:dataset : Created tree 'TrainTree' with 16000 events
:
Factory : Thank you for using TMVA!
: For citation information, please visit: http://tmva.sf.net/citeTMVA.html
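The two summary tables in the output above report the ROC integral and the signal efficiency at fixed background efficiencies (the @B=0.01, @B=0.10 and @B=0.30 columns). As a rough illustration of what these numbers mean, the sketch below computes both quantities from two plain vectors of classifier scores. The function names `RocIntegral` and `SignalEffAtBkgEff` are hypothetical and not part of TMVA, and TMVA's own computation may differ in detail (the log indicates that it works from filled response histograms).

```cpp
#include <algorithm>
#include <cstddef>
#include <functional>
#include <limits>
#include <vector>

// Area under the ROC curve, computed as the probability that a randomly chosen
// signal event gets a higher score than a randomly chosen background event
// (ties count one half). Simple O(N*M) loop, intended for illustration only.
double RocIntegral(const std::vector<double> &sig, const std::vector<double> &bkg)
{
   if (sig.empty() || bkg.empty()) return 0.0;
   double wins = 0.0;
   for (double s : sig)
      for (double b : bkg)
         wins += (s > b) ? 1.0 : (s == b ? 0.5 : 0.0);
   return wins / (static_cast<double>(sig.size()) * static_cast<double>(bkg.size()));
}

// Signal efficiency for the tightest cut that accepts at most the fraction
// bkgEff (assumed to be in [0, 1]) of the background events.
double SignalEffAtBkgEff(const std::vector<double> &sig, std::vector<double> bkg, double bkgEff)
{
   if (sig.empty() || bkg.empty()) return 0.0;
   // sort the background scores, highest first
   std::sort(bkg.begin(), bkg.end(), std::greater<double>());
   // number of background events allowed to pass the cut
   const std::size_t nPass = static_cast<std::size_t>(bkgEff * bkg.size());
   // an event passes when score > cut; the cut is the score of the first
   // background event that has to be rejected
   const double cut = (nPass < bkg.size()) ? bkg[nPass] : -std::numeric_limits<double>::infinity();
   const auto nSig = std::count_if(sig.begin(), sig.end(), [cut](double s) { return s > cut; });
   return static_cast<double>(nSig) / static_cast<double>(sig.size());
}
```

Under this definition, a classifier with a larger ROC integral separates the two classes better on average, while the efficiency columns compare the classifiers at specific working points.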
/***
# TMVA Classification Example Using a Recurrent Neural Network
This is an example of using a RNN in TMVA.
We do the classification using a toy data set containing a time series with ntime steps,
each of dimension ndim, which is generated by the provided function `MakeTimeData(nevents, ntime, ndim)`
**/
#include<TROOT.h>
#include "TMVA/Factory.h"
#include "TMVA/DataLoader.h"
#include "TMVA/DataSetInfo.h"
#include "TMVA/Config.h"
#include "TMVA/MethodDL.h"
#include "TFile.h"
#include "TTree.h"
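/// Helper function to generate the toy time data set.
/// For each of the ntime time steps, a histogram with ndim bins is filled from a Gaussian
/// whose mean and width depend on the time step, and Gaussian noise is added to every bin content.
/// Signal and background differ only in how the mean and width vary with time.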
void MakeTimeData(int n, int ntime, int ndim )
{
// const int ntime = 10;
// const int ndim = 30; // number of dim/time
TString fname = TString::Format("time_data_t%d_d%d.root", ntime, ndim);
std::vector<TH1 *> v1(ntime);
std::vector<TH1 *> v2(ntime);
int i = 0;
for (int i = 0; i < ntime; ++i) {
v1[i] = new TH1D(TString::Format("h1_%d", i), "h1", ndim, 0, 10);
v2[i] = new TH1D(TString::Format("h2_%d", i), "h2", ndim, 0, 10);
}
auto f1 = new TF1("f1", "gaus");
auto f2 = new TF1("f2", "gaus");
TTree sgn("sgn", "sgn");
TTree bkg("bkg", "bkg");
TFile f(fname, "RECREATE");
std::vector<std::vector<float>> x1(ntime);
std::vector<std::vector<float>> x2(ntime);
for (int i = 0; i < ntime; ++i) {
x1[i] = std::vector<float>(ndim);
x2[i] = std::vector<float>(ndim);
}
for (auto i = 0; i < ntime; i++) {
bkg.Branch(Form("vars_time%d", i), "std::vector<float>", &x1[i]);
sgn.Branch(Form("vars_time%d", i), "std::vector<float>", &x2[i]);
}
sgn.SetDirectory(&f);
bkg.SetDirectory(&f);
gRandom->SetSeed(0);
std::vector<double> mean1(ntime);
std::vector<double> mean2(ntime);
std::vector<double> sigma1(ntime);
std::vector<double> sigma2(ntime);
for (int j = 0; j < ntime; ++j) {
mean1[j] = 5. + 0.2 * sin(TMath::Pi() * j / double(ntime));
mean2[j] = 5. + 0.2 * cos(TMath::Pi() * j / double(ntime));
sigma1[j] = 4 + 0.3 * sin(TMath::Pi() * j / double(ntime));
sigma2[j] = 4 + 0.3 * cos(TMath::Pi() * j / double(ntime));
}
for (int i = 0; i < n; ++i) {
if (i % 1000 == 0)
std::cout << "Generating event ... " << i << std::endl;
for (int j = 0; j < ntime; ++j) {
auto h1 = v1[j];
auto h2 = v2[j];
h1->Reset();
h2->Reset();
f1->SetParameters(1, mean1[j], sigma1[j]);
f2->SetParameters(1, mean2[j], sigma2[j]);
h1->FillRandom("f1", 1000);
h2->FillRandom("f2", 1000);
for (int k = 0; k < ndim; ++k) {
// std::cout << j*10+k << " ";
x1[j][k] = h1->GetBinContent(k + 1) + gRandom->Gaus(0, 10);
x2[j][k] = h2->GetBinContent(k + 1) + gRandom->Gaus(0, 10);
}
}
// std::cout << std::endl;
sgn.Fill();
bkg.Fill();
if (n == 1) {
auto c1 = new TCanvas();
c1->Divide(ntime, 2);
for (int j = 0; j < ntime; ++j) {
c1->cd(j + 1);
v1[j]->Draw();
}
for (int j = 0; j < ntime; ++j) {
c1->cd(ntime + j + 1);
v2[j]->Draw();
}
gPad->Update();
}
}
if (n > 1) {
sgn.Write();
bkg.Write();
sgn.Print();
bkg.Print();
f.Close();
}
}
/// macro for performing a classification using a Recurrent Neural Network
/// @param use_type
/// use_type = 0 use Simple RNN network
/// use_type = 1 use LSTM network
/// use_type = 2 use GRU
/// use_type = 3 build 3 different networks with RNN, LSTM and GRU
void TMVA_RNN_Classification(int use_type = 1)
{
const int ninput = 30;
const int ntime = 10;
const int batchSize = 100;
const int maxepochs = 20;
int nTotEvts = 10000; // total events to be generated for signal or background
bool useKeras = true;
bool useTMVA_RNN = true;
bool useTMVA_DNN = true;
bool useTMVA_BDT = false;
std::vector<std::string> rnn_types = {"RNN", "LSTM", "GRU"};
std::vector<bool> use_rnn_type = {1, 1, 1};
if (use_type >=0 && use_type < 3) {
use_rnn_type = {0,0,0};
use_rnn_type[use_type] = 1;
}
bool useGPU = true; // use GPU for TMVA if available
#ifndef R__HAS_TMVAGPU
useGPU = false;
#ifndef R__HAS_TMVACPU
Warning("TMVA_RNN_Classification", "TMVA is not built with GPU or CPU multi-thread support. Cannot use TMVA Deep Learning for RNN");
useTMVA_RNN = false;
#endif
#endif
TString archString = (useGPU) ? "GPU" : "CPU";
bool writeOutputFile = true;
const char *rnn_type = "RNN";
#ifdef R__HAS_PYMVA
TMVA::PyMethodBase::PyInitialize();
#else
useKeras = false;
#endif
int num_threads = 0; // use by default all threads
// do enable MT running
if (num_threads >= 0) {
ROOT::EnableImplicitMT(num_threads);
if (num_threads > 0) gSystem->Setenv("OMP_NUM_THREADS", TString::Format("%d",num_threads));
}
else
gSystem->Setenv("OMP_NUM_THREADS", "1");
TMVA::Config::Instance();
std::cout << "Running with nthreads = " << ROOT::GetThreadPoolSize() << std::endl;
TString inputFileName = "time_data_t10_d30.root";
bool fileExist = !gSystem->AccessPathName(inputFileName);
// if the file does not exist, create it
if (!fileExist) {
MakeTimeData(nTotEvts,ntime, ninput);
}
auto inputFile = TFile::Open(inputFileName);
if (!inputFile) {
Error("TMVA_RNN_Classification", "Error opening input file %s - exit", inputFileName.Data());
return;
}
std::cout << "--- RNNClassification : Using input file: " << inputFile->GetName() << std::endl;
// Create a ROOT output file where TMVA will store ntuples, histograms, etc.
TString outfileName(TString::Format("data_RNN_%s.root", archString.Data()));
TFile *outputFile = nullptr;
if (writeOutputFile) outputFile = TFile::Open(outfileName, "RECREATE");
/**
## Declare Factory
Create the Factory class. Later you can choose the methods
whose performance you'd like to investigate.
The factory is the major TMVA object you have to interact with. Here is the list of parameters you need to
pass
- The first argument is the base of the name of all the output
weight files, which are created in the directory dataset/weights/ and hold the
method parameters
- The second argument is the output file for the training results
- The third argument is a string option defining some general configuration for the TMVA session.
For example all TMVA output can be suppressed by removing the "!" (not) in front of the "Silent" argument in
the option string
**/
// Creating the factory object
TMVA::Factory *factory = new TMVA::Factory("TMVAClassification", outputFile,
"!V:!Silent:Color:DrawProgressBar:Transformations=None:!Correlations:"
"AnalysisType=Classification:ModelPersistence");
TMVA::DataLoader *dataloader = new TMVA::DataLoader("dataset");
TTree *signalTree = (TTree *)inputFile->Get("sgn");
TTree *background = (TTree *)inputFile->Get("bkg");
const int nvar = ninput * ntime;
/// add variables - use new AddVariablesArray function
for (auto i = 0; i < ntime; i++) {
dataloader->AddVariablesArray(Form("vars_time%d", i), ninput);
}
dataloader->AddSignalTree(signalTree, 1.0);
dataloader->AddBackgroundTree(background, 1.0);
// check given input
auto &datainfo = dataloader->GetDataSetInfo();
auto vars = datainfo.GetListOfVariables();
std::cout << "number of variables is " << vars.size() << std::endl;
for (auto &v : vars)
std::cout << v << ",";
std::cout << std::endl;
int nTrainSig = 0.8 * nTotEvts;
int nTrainBkg = 0.8 * nTotEvts;
// build the string options for DataLoader::PrepareTrainingAndTestTree
TString prepareOptions = TString::Format("nTrain_Signal=%d:nTrain_Background=%d:SplitMode=Random:SplitSeed=100:NormMode=NumEvents:!V:!CalcCorrelations", nTrainSig, nTrainBkg);
// Apply additional cuts on the signal and background samples (can be different)
TCut mycuts = ""; // for example: TCut mycuts = "abs(var1)<0.5 && abs(var2-0.5)<1";
TCut mycutb = "";
dataloader->PrepareTrainingAndTestTree(mycuts, mycutb, prepareOptions);
std::cout << "prepared DATA LOADER " << std::endl;
/**
## Book TMVA recurrent models
Book the different types of recurrent models in TMVA (SimpleRNN, LSTM or GRU)
**/
if (useTMVA_RNN) {
for (int i = 0; i < 3; ++i) {
if (!use_rnn_type[i])
continue;
const char *rnn_type = rnn_types[i].c_str();
/// define the input layout string for the RNN
/// the input data should be organized as follows:
/// input layout for RNN: time x ndim
TString inputLayoutString = TString::Format("InputLayout=%d|%d", ntime, ninput);
/// Define RNN layer layout
/// it should be LayerType (RNN or LSTM or GRU) | number of units | number of inputs | time steps | remember output (typically no=0) | return full sequence
TString rnnLayout = TString::Format("%s|10|%d|%d|0|1", rnn_type, ninput, ntime);
/// add after the RNN a reshape layer (needed to flatten the output), a dense layer with 64 units, and a final linear layer
/// Note the last layer is linear because when using Crossentropy a Sigmoid is applied already
TString layoutString = TString("Layout=") + rnnLayout + TString(",RESHAPE|FLAT,DENSE|64|TANH,LINEAR");
/// Define the training strategies. Several training strings can be concatenated with "|"; here only one is used
TString trainingString1 = TString::Format("LearningRate=1e-3,Momentum=0.0,Repetitions=1,"
"ConvergenceSteps=5,BatchSize=%d,TestRepetitions=1,"
"WeightDecay=1e-2,Regularization=None,MaxEpochs=%d,"
"Optimizer=ADAM,DropConfig=0.0+0.+0.+0.",
batchSize,maxepochs);
TString trainingStrategyString("TrainingStrategy=");
trainingStrategyString += trainingString1; // + "|" + trainingString2
/// Define the full RNN option string, adding the final options common to all networks
TString rnnOptions("!H:V:ErrorStrategy=CROSSENTROPY:VarTransform=None:"
"WeightInitialization=XAVIERUNIFORM:ValidationSize=0.2:RandomSeed=1234");
rnnOptions.Append(":");
rnnOptions.Append(inputLayoutString);
rnnOptions.Append(":");
rnnOptions.Append(layoutString);
rnnOptions.Append(":");
rnnOptions.Append(trainingStrategyString);
rnnOptions.Append(":");
rnnOptions.Append(TString::Format("Architecture=%s", archString.Data()));
TString rnnName = "TMVA_" + TString(rnn_type);
factory->BookMethod(dataloader, TMVA::Types::kDL, rnnName, rnnOptions);
}
}
/**
## Book TMVA fully connected dense layer models
**/
if (useTMVA_DNN) {
// Method DL with Dense Layer
TString inputLayoutString = TString::Format("InputLayout=1|1|%d", ntime * ninput);
TString layoutString("Layout=DENSE|64|TANH,DENSE|TANH|64,DENSE|TANH|64,LINEAR");
// Training strategies.
TString trainingString1("LearningRate=1e-3,Momentum=0.0,Repetitions=1,"
"ConvergenceSteps=10,BatchSize=256,TestRepetitions=1,"
"WeightDecay=1e-4,Regularization=None,MaxEpochs=20,"
"DropConfig=0.0+0.+0.+0.,Optimizer=ADAM");
TString trainingStrategyString("TrainingStrategy=");
trainingStrategyString += trainingString1; // + "|" + trainingString2
// General Options.
TString dnnOptions("!H:V:ErrorStrategy=CROSSENTROPY:VarTransform=None:"
"WeightInitialization=XAVIER:RandomSeed=0");
dnnOptions.Append(":");
dnnOptions.Append(inputLayoutString);
dnnOptions.Append(":");
dnnOptions.Append(layoutString);
dnnOptions.Append(":");
dnnOptions.Append(trainingStrategyString);
dnnOptions.Append(":");
dnnOptions.Append(archString);
TString dnnName = "TMVA_DNN";
factory->BookMethod(dataloader, TMVA::Types::kDL, dnnName, dnnOptions);
}
/**
## Book Keras recurrent models
Book the different types of recurrent models in Keras (SimpleRNN, LSTM or GRU)
**/
if (useKeras) {
for (int i = 0; i < 3; i++) {
if (use_rnn_type[i]) {
TString modelName = TString::Format("model_%s.h5", rnn_types[i].c_str());
TString trainedModelName = TString::Format("trained_model_%s.h5", rnn_types[i].c_str());
Info("TMVA_RNN_Classification", "Building recurrent keras model using a %s layer", rnn_types[i].c_str());
// create a python script which can be executed
// build a Keras model: reshape + recurrent layer + flatten + dense layers
TMacro m;
m.AddLine("import keras");
m.AddLine("from keras.models import Sequential");
m.AddLine("from keras.optimizers import Adam");
m.AddLine("from keras.layers import Input, Dense, Dropout, Flatten, SimpleRNN, GRU, LSTM, Reshape, "
"BatchNormalization");
m.AddLine("");
m.AddLine("model = keras.models.Sequential() ");
m.AddLine("model.add(Reshape((10, 30), input_shape = (10*30, )))");
// add recurrent neural network depending on type / Use option to return the full output
if (rnn_types[i] == "LSTM")
m.AddLine("model.add(LSTM(units=10, return_sequences=True) )");
else if (rnn_types[i] == "GRU")
m.AddLine("model.add(GRU(units=10, return_sequences=True) )");
else
m.AddLine("model.add(SimpleRNN(units=10, return_sequences=True) )");
// m.AddLine("model.add(BatchNormalization())");
m.AddLine("model.add(Flatten())"); // needed if returning the full time output sequence
m.AddLine("model.add(Dense(64, activation = 'tanh')) ");
m.AddLine("model.add(Dense(2, activation = 'sigmoid')) ");
m.AddLine(
"model.compile(loss = 'binary_crossentropy', optimizer = Adam(lr = 0.001), metrics = ['accuracy'])");
m.AddLine(TString::Format("modelName = '%s'", modelName.Data()));
m.AddLine("model.save(modelName)");
m.AddLine("model.summary()");
m.SaveSource("make_rnn_model.py");
// execute
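gSystem->Exec("python make_rnn_model.py");
// AccessPathName returns true when the file is NOT accessible
if (gSystem->AccessPathName(modelName)) {
Warning("TMVA_RNN_Classification", "Error creating Keras recurrent model file - Skip using Keras");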
useKeras = false;
} else {
// book PyKeras method only if Keras model could be created
Info("TMVA_RNN_Classification", "Booking Keras %s model", rnn_types[i].c_str());
factory->BookMethod(dataloader, TMVA::Types::kPyKeras,
TString::Format("PyKeras_%s", rnn_types[i].c_str()),
TString::Format("!H:!V:VarTransform=None:FilenameModel=%s:"
"FilenameTrainedModel=%s:GpuOptions=allow_growth=True:"
"NumEpochs=%d:BatchSize=%d",
modelName.Data(), trainedModelName.Data(), maxepochs, batchSize));
}
}
}
}
// use BDT in case not using Keras or TMVA DL
if (!useKeras || !useTMVA_BDT)
useTMVA_BDT = true;
/**
## Book TMVA BDT
**/
if (useTMVA_BDT) {
factory->BookMethod(dataloader, TMVA::Types::kBDT, "BDTG",
"!H:!V:NTrees=100:MinNodeSize=2.5%:BoostType=Grad:Shrinkage=0.10:UseBaggedBoost:"
"BaggedSampleFraction=0.5:nCuts=20:"
"MaxDepth=2");
}
/// Train all methods
factory->TrainAllMethods();
std::cout << "nthreads = " << ROOT::GetThreadPoolSize() << std::endl;
// ---- Evaluate all MVAs using the set of test events
factory->TestAllMethods();
// ----- Evaluate and compare performance of all configured MVAs
factory->EvaluateAllMethods();
// check method
// plot ROC curve
auto c1 = factory->GetROCCurve(dataloader);
c1->Draw();
if (outputFile) outputFile->Close();
}
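The macro above assembles the option string for each recurrent network piece by piece: top-level options are separated by ":", the layers inside "Layout=" by ",", and the fields of one layer by "|". The following sketch reproduces that assembly with plain `std::string` so the final string can be inspected without ROOT. The helper name `BuildRnnOptions` is hypothetical, and the training strategy and several general options are left out for brevity.

```cpp
#include <string>

// Hypothetical helper (not part of TMVA): joins the option pieces in the same
// way as the macro does for its recurrent networks.
std::string BuildRnnOptions(const std::string &rnnType, int ntime, int ninput, const std::string &arch)
{
   // input layout: time steps | features per time step
   const std::string inputLayout = "InputLayout=" + std::to_string(ntime) + "|" + std::to_string(ninput);
   // recurrent layer: type | units | inputs | time steps | remember output | return full sequence
   const std::string rnnLayer =
      rnnType + "|10|" + std::to_string(ninput) + "|" + std::to_string(ntime) + "|0|1";
   // the recurrent layer is followed by a flattening reshape, a dense layer and a linear output
   const std::string layout = "Layout=" + rnnLayer + ",RESHAPE|FLAT,DENSE|64|TANH,LINEAR";
   // top-level options are separated by ':'
   return "!H:V:ErrorStrategy=CROSSENTROPY:" + inputLayout + ":" + layout + ":Architecture=" + arch;
}
```

With the values used in the macro, `BuildRnnOptions("LSTM", 10, 30, "CPU")` returns `!H:V:ErrorStrategy=CROSSENTROPY:InputLayout=10|30:Layout=LSTM|10|30|10|0|1,RESHAPE|FLAT,DENSE|64|TANH,LINEAR:Architecture=CPU`.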
Author
Lorenzo Moneta

Definition in file TMVA_RNN_Classification.C.
