df007_snapshot.py
## \file
## \ingroup tutorial_dataframe
## \notebook -draw
## Write ROOT data with RDataFrame.
##
## This tutorial shows how to write out datasets in ROOT format using RDataFrame.
##
## \macro_image
## \macro_code
##
## \date April 2017
## \author Danilo Piparo (CERN)

import ROOT

# A simple helper function to fill a test tree: this makes the example stand-alone.
def fill_tree(treeName, fileName):
    df = ROOT.RDataFrame(10000)
    df.Define("b1", "(int) rdfentry_") \
      .Define("b2", "(float) rdfentry_ * rdfentry_").Snapshot(treeName, fileName)

# We prepare an input tree to run on
fileName = "df007_snapshot_py.root"
outFileName = "df007_snapshot_output_py.root"
outFileNameAllColumns = "df007_snapshot_output_allColumns_py.root"
treeName = "myTree"
fill_tree(treeName, fileName)

# We read the tree from the file and create an RDataFrame.
d = ROOT.RDataFrame(treeName, fileName)

# ## Select entries
# We now keep only the entries with an even value of b1
d_cut = d.Filter("b1 % 2 == 0")
# ## Enrich the dataset
# Build some new columns: we will write them out as well

getVector_code = '''
std::vector<float> getVector (float b2)
{
   std::vector<float> v;
   for (int i = 0; i < 3; i++) v.push_back(b2*i);
   return v;
}
'''
ROOT.gInterpreter.Declare(getVector_code)

d2 = d_cut.Define("b1_square", "b1 * b1") \
          .Define("b2_vector", "getVector( b2 )")
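
# Plain-Python reference model of one row of d2, an illustration only: the
# helper expected_columns is hypothetical and not part of ROOT. It mirrors the
# expressions used above (b1 = entry, b2 = entry * entry, the even-b1 Filter,
# b1_square = b1 * b1 and b2_vector = getVector(b2)) and can be used to check
# the written file by eye. Assumption: C++ float arithmetic rounds for large
# entry numbers while Python floats are doubles, so exact agreement is only
# expected for small entries.
def expected_columns(entry):
    """Return the columns of d2 for an input entry, or None if it is filtered out."""
    b1 = int(entry)
    b2 = float(entry * entry)
    if b1 % 2 != 0:
        return None  # odd b1 fails the "b1 % 2 == 0" selection
    return {"b1": b1,
            "b2": b2,
            "b1_square": b1 * b1,
            "b2_vector": [b2 * i for i in range(3)]}  # same loop as getVector

# For example, expected_columns(4) gives b2_vector == [0.0, 16.0, 32.0].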

# ## Write it to disk in ROOT format
# We now write to disk a new dataset with one of the columns originally
# present in the tree (b1) and the two new columns.
# In C++, the types of the columns can be specified explicitly as template
# arguments of the Snapshot method; otherwise, as here, they are inferred
# automatically.
branchList = ROOT.vector('string')()
for branchName in ["b1", "b1_square", "b2_vector"]:
    branchList.push_back(branchName)
d2.Snapshot(treeName, outFileName, branchList)

# Open the new file and list the columns of the tree
f1 = ROOT.TFile(outFileName)
t = f1.myTree
print("These are the columns b1, b1_square and b2_vector:")
for branch in t.GetListOfBranches():
    print("Branch: %s" % branch.GetName())

f1.Close()

# We are not forced to list every column name explicitly: a regular
# expression can be passed instead. If no column selection is given at all,
# as below, all columns are written out.
d2.Snapshot(treeName, outFileNameAllColumns)

# Open the new file and list the columns of the tree
f2 = ROOT.TFile(outFileNameAllColumns)
t = f2.myTree
print("These are all the columns available to this dataframe:")
for branch in t.GetListOfBranches():
    print("Branch: %s" % branch.GetName())

f2.Close()
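
# A plain-Python sketch of the regular-expression selection mentioned above,
# for illustration only: select_columns is a hypothetical helper, not a ROOT
# function. It assumes, as we understand RDataFrame's behaviour, that the
# pattern has to match the whole column name and that an empty pattern keeps
# every column. With ROOT itself the pattern is passed as the third argument,
# e.g. d2.Snapshot(treeName, someFileName, "b1.*"), where someFileName is a
# placeholder for an output file name of your choice.
import re

def select_columns(column_names, regex):
    """Return the column names fully matched by regex (all of them if empty)."""
    if not regex:
        return list(column_names)
    return [name for name in column_names if re.fullmatch(regex, name)]

# Columns of d2: b1 and b2 come from the input tree, the others are defined.
d2_columns = ["b1", "b2", "b1_square", "b2_vector"]
b1_columns = select_columns(d2_columns, "b1.*")  # ['b1', 'b1_square']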

# We can also get a fresh RDataFrame out of the snapshot and restart the
# analysis chain from it.

branchList.clear()
branchList.push_back("b1_square")
snapshot_df = d2.Snapshot(treeName, outFileName, branchList)
h = snapshot_df.Histo1D("b1_square")

c = ROOT.TCanvas()
h.Draw()
c.SaveAs("df007_snapshot.png")

print("Saved figure to df007_snapshot.png")
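
# Plain-Python cross-check of the histogram, an illustration only:
# expected_b1_square_mean is a hypothetical helper. The input tree has 10000
# entries (see fill_tree) and the Filter keeps the even b1 values
# 0, 2, ..., 9998, so the mean of the b1_square values filled into h can be
# computed directly; h.GetMean() should be close to it.
def expected_b1_square_mean(n_entries):
    """Mean of b1 * b1 over the even b1 values in range(n_entries)."""
    squares = [b1 * b1 for b1 in range(n_entries) if b1 % 2 == 0]
    return sum(squares) / len(squares)

expected_mean = expected_b1_square_mean(10000)  # 33323334.0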