ROOT 6.10/09
Reference Guide
tdf001_introduction.C
/// \file
/// \ingroup tutorial_tdataframe
/// \notebook -nodraw
/// This tutorial illustrates the basic features of the TDataFrame class,
/// a utility that lets you interact with data stored in TTrees through
/// a functional-chain-like approach.
///
/// \macro_code
///
/// \date December 2016
/// \author Enrico Guiraud

// ## Preparation
// This tutorial can be compiled with the following invocation:
// `g++ -o tdf001_introduction tdf001_introduction.C `root-config --cflags --libs` -lTreePlayer`

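#include "TClass.h"
#include "TFile.h"
#include "TH1F.h"
#include "TTree.h"

#include "ROOT/TDataFrame.hxx"

// Standard headers for std::cout, typeid and std::vector, which are used below.
#include <iostream>
#include <typeinfo>
#include <vector>
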
// A simple helper function to fill a test tree: this makes the example
// stand-alone.
void fill_tree(const char *filename, const char *treeName)
{
   TFile f(filename, "RECREATE");
   TTree t(treeName, treeName);
   double b1;
   int b2;
   t.Branch("b1", &b1);
   t.Branch("b2", &b2);
   for (int i = 0; i < 10; ++i) {
      b1 = i;
      b2 = i * i;
      t.Fill();
   }
   t.Write();
   f.Close();
   return;
}

int tdf001_introduction()
{

   // We prepare an input tree to run on
   auto fileName = "tdf001_introduction.root";
   auto treeName = "myTree";
   fill_tree(fileName, treeName);

   // We read the tree from the file and create a TDataFrame, a class that
   // allows us to interact with the data contained in the tree.
   // We select a default column (a *branch*, in ROOT jargon), which is used
   // whenever the user specifies none when dealing with filters and actions.
   ROOT::Experimental::TDataFrame d(treeName, fileName, {"b1"});

   // ## Operations on the dataframe
   // We now review some *actions* which can be performed on the data frame.
   // All actions but Foreach return a TActionResultPtr<T>. The series of
   // operations on the data frame is not executed until one of those pointers
   // is accessed. If the Foreach action is invoked, the execution is immediate.
   // But first of all, let us define our cut-flow with two lambda
   // functions. We can use free functions too.
   auto cutb1 = [](double b1) { return b1 < 5.; };
   auto cutb1b2 = [](int b2, double b1) { return b2 % 2 && b1 < 4.; };

   // ### `Count` action
   // The `Count` action retrieves the number of entries that passed the
   // filters. Here we show how the automatic selection of the default column
   // kicks in when the user specifies none.
   auto entries1 = d.Filter(cutb1) // <- no column name specified here!
                      .Filter(cutb1b2, {"b2", "b1"})
                      .Count();

   std::cout << *entries1 << " entries passed all filters" << std::endl;

   // Filters can be expressed as strings. The content must be C++ code. The
   // names of the variables must be the names of the branches. The code is
   // just-in-time compiled.
   auto entries2 = d.Filter("b1 < 5.").Count();
   std::cout << *entries2 << " entries passed the string filter" << std::endl;

   // ### `Min`, `Max` and `Mean` actions
   // These actions retrieve statistical information about the entries
   // passing the cuts, if any.
   auto b1b2_cut = d.Filter(cutb1b2, {"b2", "b1"});
   auto minVal = b1b2_cut.Min();
   auto maxVal = b1b2_cut.Max();
   auto meanVal = b1b2_cut.Mean();
   auto nonDefmeanVal = b1b2_cut.Mean("b2"); // <- Column is not the default
   std::cout << "The mean is always included between the min and the max: " << *minVal << " <= " << *meanVal
             << " <= " << *maxVal << std::endl;

   // ### `Take` action
   // The `Take` action retrieves all values of the variable stored in a
   // particular column that passed the filters we specified. The values are
   // stored in a list by default, but other collections can be chosen.
   auto b1_cut = d.Filter(cutb1);
   auto b1List = b1_cut.Take<double>();
   auto b1Vec = b1_cut.Take<double, std::vector<double>>();

   std::cout << "Selected b1 entries" << std::endl;
   for (auto b1_entry : *b1List) std::cout << b1_entry << " ";
   std::cout << std::endl;
   auto b1VecCl = TClass::GetClass(typeid(*b1Vec));
   std::cout << "The type of b1Vec is " << b1VecCl->GetName() << std::endl;

   // ### `Histo1D` action
   // The `Histo1D` action fills a histogram. It returns a TH1F filled
   // with values of the column that passed the filters. For the most common
   // types, the type of the values stored in the column is automatically
   // guessed.
   auto hist = d.Filter(cutb1).Histo1D();
   std::cout << "Filled h " << hist->GetEntries() << " times, mean: " << hist->GetMean() << std::endl;

   // ### `Foreach` action
   // The most generic action of all: an operation is applied to all entries.
   // In this case we fill a histogram. In some sense this violates a purely
   // functional paradigm, since the lambda modifies external state, but C++
   // allows it.
   TH1F h("h", "h", 12, -1, 11);
   d.Filter([](int b2) { return b2 % 2 == 0; }, {"b2"}).Foreach([&h](double b1) { h.Fill(b1); });

   std::cout << "Filled h with " << h.GetEntries() << " entries" << std::endl;

   // ## Express your chain of operations with clarity!
   // We are discussing a small example here, but it is not hard to imagine
   // much more complex pipelines of actions acting on data. Those call for
   // well-organised code, for example to add filters conditionally or to
   // clearly separate filters and actions without writing the entire
   // pipeline on one line. This can be easily achieved.
   // We'll show this by reworking the `Count` example:
   auto cutb1_result = d.Filter(cutb1);
   auto cutb1b2_result = d.Filter(cutb1b2, {"b2", "b1"});
   auto cutb1_cutb1b2_result = cutb1_result.Filter(cutb1b2, {"b2", "b1"});
   // Now we want to count:
   auto evts_cutb1_result = cutb1_result.Count();
   auto evts_cutb1b2_result = cutb1b2_result.Count();
   auto evts_cutb1_cutb1b2_result = cutb1_cutb1b2_result.Count();

   std::cout << "Events passing cutb1: " << *evts_cutb1_result << std::endl
             << "Events passing cutb1b2: " << *evts_cutb1b2_result << std::endl
             << "Events passing both: " << *evts_cutb1_cutb1b2_result << std::endl;

   // ## Calculating quantities starting from existing columns
   // Often, operations need to be carried out on quantities calculated from
   // the ones present in the columns. In this example we create a third
   // column whose values are the sum of the *b1* and *b2* values, entry by
   // entry. The new quantity is defined via a callable.
   // It is important to note three aspects at this point:
   // - The value is created on the fly only if the entry passed the existing
   //   filters.
   // - The newly created column behaves like one present in the file on disk.
   // - The operation creates a new value, without modifying anything. De facto,
   //   this is like having a general container at our disposal, able to
   //   accommodate any value of any type.
   // Let's dive into an example:
   auto entries_sum = d.Define("sum", [](double b1, int b2) { return b2 + b1; }, {"b1", "b2"})
                         .Filter([](double sum) { return sum > 4.2; }, {"sum"})
                         .Count();
   std::cout << *entries_sum << std::endl;

   // Additional columns can be expressed as strings. The content must be C++
   // code. The names of the variables must be the names of the branches. The
   // code is just-in-time compiled.
   auto entries_sum2 = d.Define("sum", "b1 + b2").Filter("sum > 4.2").Count();
   std::cout << *entries_sum2 << std::endl;

   return 0;
}

int main()
{
   return tdf001_introduction();
}
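As a cross-check of the numbers the tutorial prints, the selections can be replayed in plain standard C++ on an in-memory copy of the ten entries written by `fill_tree` (`b1 = i`, `b2 = i * i`). This is only a sketch under stated assumptions: it does not use ROOT, it is not how TDataFrame evaluates its chain (which is lazy and reads from the file), and the names `Entry`, `make_entries`, `count_if_passing` and the `pass_*` functions are hypothetical helpers introduced here for illustration.

```cpp
#include <cstddef>
#include <vector>

// One row of the toy tree written by fill_tree: b1 = i, b2 = i * i.
// (Hypothetical struct, not a ROOT type.)
struct Entry {
   double b1;
   int b2;
};

// Rebuild the ten entries in memory instead of reading them from the file.
std::vector<Entry> make_entries()
{
   std::vector<Entry> entries;
   for (int i = 0; i < 10; ++i) entries.push_back({static_cast<double>(i), i * i});
   return entries;
}

// Count the entries accepted by a predicate: an eager stand-in for
// Filter(...).Count() in the tutorial.
template <typename Pred>
std::size_t count_if_passing(const std::vector<Entry> &entries, Pred pred)
{
   std::size_t n = 0;
   for (const auto &e : entries)
      if (pred(e)) ++n;
   return n;
}

// The same selections as the tutorial's cutb1, cutb1b2 and "sum > 4.2",
// rewritten to act on an Entry.
bool pass_cutb1(const Entry &e) { return e.b1 < 5.; }
bool pass_cutb1b2(const Entry &e) { return e.b2 % 2 && e.b1 < 4.; }
bool pass_both(const Entry &e) { return pass_cutb1(e) && pass_cutb1b2(e); }
bool pass_sum(const Entry &e) { return e.b1 + e.b2 > 4.2; }
```

Calling `count_if_passing(make_entries(), pass_cutb1)` and the analogous calls for the other predicates gives 5 entries for `cutb1`, 2 for `cutb1b2`, 2 for both cuts chained, and 8 for the `sum > 4.2` selection, which is what the corresponding `Count` results in the tutorial should report for this input.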