df019_Cache.C
/// \file
/// \ingroup tutorial_dataframe
/// \notebook -draw
/// This tutorial shows how the content of a data frame can be cached in memory
/// in the form of another data frame. The content of the columns is stored in
/// contiguous slabs of memory and is "ready to use", i.e. no ROOT IO operation
/// is performed.
///
/// Creating a cached data frame storing all of its content deserialised and uncompressed
/// in memory is particularly useful when dealing with datasets of a moderate size
/// (small enough to fit in RAM) over which several explorative loops need to be
/// performed as fast as possible. In addition, caching can be useful when no file
/// on disk needs to be created as a side effect of checkpointing part of the analysis.
///
/// All steps in the caching are lazy, i.e. the cached data frame is actually filled
/// only when the event loop is triggered on it.
///
/// \macro_code
/// \macro_image
///
/// \date June 2018
/// \author Danilo Piparo

void df019_Cache()
{
   // We create a data frame on top of the hsimple example
   auto hsimplePath = gROOT->GetTutorialDir();
   hsimplePath += "/hsimple.root";
   ROOT::RDataFrame df("ntuple", hsimplePath.Data());

   // We apply a simple cut and define a new column
   auto df_cut = df.Filter([](float py) { return py > 0.f; }, {"py"})
                    .Define("px_plus_py", [](float px, float py) { return px + py; }, {"px", "py"});

   // We cache the content of the dataset. Nothing has happened yet: the work to accomplish
   // has only been described. As for `Snapshot`, the types and columns can be written out
   // explicitly or left for the jitting to handle (`df_cached` is intentionally unused: it
   // shows how to create a *cached* data frame specifying column types explicitly):
   auto df_cached = df_cut.Cache<float, float>({"px_plus_py", "py"});
   auto df_cached_implicit = df_cut.Cache();
   auto h = df_cached_implicit.Histo1D<float>("px_plus_py");

   // Now the event loop on the cached data frame is triggered. This in turn lazily
   // triggers the event loop on the original `df` data frame, which fills the cache.
   h->DrawCopy();
}