ROOT
Reference Guide
tmva100_DataPreparation.py File Reference

Namespaces

 tmva100_DataPreparation
 

Detailed Description

This tutorial illustrates how to prepare ROOT datasets so that most machine learning methods can read them easily. This requires filtering the initial complex datasets and writing the data in a flat format.
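To make "filtering" and "flat format" concrete before the ROOT code that follows, here is a minimal plain-Python sketch of the same idea. The toy events and the helper names `passes_selection` and `flatten` are hypothetical and serve only as illustration; the branch names `Muon_pt` and `Electron_pt` and the output column names match those used in the tutorial.

```python
# Hypothetical toy events. Each event holds variable-length ("jagged")
# collections, similar in spirit to the Muon_pt and Electron_pt branches.
events = [
    {"Muon_pt": [40.1, 22.3, 5.0], "Electron_pt": [31.7, 18.2]},
    {"Muon_pt": [55.0], "Electron_pt": [27.4, 12.9]},
    {"Muon_pt": [33.3, 21.0], "Electron_pt": [44.8, 10.5, 7.7]},
]


def passes_selection(ev):
    """Keep only events with at least two muons and two electrons."""
    return len(ev["Muon_pt"]) >= 2 and len(ev["Electron_pt"]) >= 2


def flatten(ev):
    """Turn the jagged collections into fixed scalar columns."""
    return {
        "Muon_pt_1": ev["Muon_pt"][0],
        "Muon_pt_2": ev["Muon_pt"][1],
        "Electron_pt_1": ev["Electron_pt"][0],
        "Electron_pt_2": ev["Electron_pt"][1],
    }


# Filter first, so that indexing [0] and [1] is always valid.
flat_rows = [flatten(ev) for ev in events if passes_selection(ev)]

for row in flat_rows:
    print(row)
```

The filter has to run before the flattening step: only after requiring two objects of each kind is it safe to index the first and second element. Every surviving row then has the same four scalar columns, which is the tabular layout most machine learning libraries expect.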

import ROOT


def filter_events(df):
    """
    Reduce initial dataset to only events which shall be used for training
    """
    return df.Filter("nElectron>=2 && nMuon>=2", "At least two electrons and two muons")


def define_variables(df):
    """
    Define the variables which shall be used for training
    """
    return df.Define("Muon_pt_1", "Muon_pt[0]")\
             .Define("Muon_pt_2", "Muon_pt[1]")\
             .Define("Electron_pt_1", "Electron_pt[0]")\
             .Define("Electron_pt_2", "Electron_pt[1]")


variables = ["Muon_pt_1", "Muon_pt_2", "Electron_pt_1", "Electron_pt_2"]


if __name__ == "__main__":
    for filename, label in [["SMHiggsToZZTo4L.root", "signal"], ["ZZTo2e2mu.root", "background"]]:
        print(">>> Extract the training and testing events for {} from the {} dataset.".format(
            label, filename))

        # Load dataset, filter the required events and define the training variables
        filepath = "root://eospublic.cern.ch//eos/root-eos/cms_opendata_2012_nanoaod/" + filename
        df = ROOT.RDataFrame("Events", filepath)
        df = filter_events(df)
        df = define_variables(df)

        # Book cutflow report
        report = df.Report()

        # Split dataset by event number for training and testing
        columns = ROOT.std.vector["string"](variables)
        df.Filter("event % 2 == 0", "Select events with even event number for training")\
          .Snapshot("Events", "train_" + label + ".root", columns)
        df.Filter("event % 2 == 1", "Select events with odd event number for testing")\
          .Snapshot("Events", "test_" + label + ".root", columns)

        # Print cutflow report
        report.Print()
>>> Extract the training and testing events for signal from the SMHiggsToZZTo4L.root dataset.
At least two electrons and two muons: pass=45352 all=299973 -- eff=15.12 % cumulative eff=15.12 %
>>> Extract the training and testing events for background from the ZZTo2e2mu.root dataset.
At least two electrons and two muons: pass=262776 all=1497445 -- eff=17.55 % cumulative eff=17.55 %
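The efficiencies in the cutflow report above are the ratio of passing events to all events, expressed in percent. The short check below recomputes them from the printed counts. The helper name `efficiency_percent` and the `cutflow` dictionary are illustrative assumptions, not part of ROOT.

```python
def efficiency_percent(n_pass, n_all):
    """Selection efficiency in percent: passing events over all events."""
    return 100.0 * n_pass / n_all


# (pass, all) counts copied from the report printed above
cutflow = {
    "signal": (45352, 299973),
    "background": (262776, 1497445),
}

for label, (n_pass, n_all) in cutflow.items():
    print("{}: eff={:.2f} %".format(label, efficiency_percent(n_pass, n_all)))
# signal: eff=15.12 %
# background: eff=17.55 %
```

Because the report is booked before the even/odd split, it lists only the common selection. The cumulative efficiency therefore equals the single-filter efficiency in both cases.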
Date
August 2019
Author
Stefan Wunsch

Definition in file tmva100_DataPreparation.py.
