df019_Cache.py
## \file
## \ingroup tutorial_dataframe
## \notebook -draw
## This tutorial shows how the content of a data frame can be cached in memory
## in the form of a data frame. The content of the columns is stored in
## contiguous slabs of memory and is "ready to use", i.e. no ROOT IO operation
## is performed.
##
## Creating a cached data frame that stores all of its content deserialised and uncompressed
## in memory is particularly useful when dealing with datasets of a moderate size
## (small enough to fit in RAM) over which several explorative loops need to be
## performed as fast as possible. In addition, caching can be useful when no file
## on disk needs to be created as a side effect of checkpointing part of the analysis.
##
## All steps in the caching are lazy, i.e. the cached data frame is actually filled
## only when the event loop is triggered on it.
##
## \macro_code
## \macro_image
##
## \date June 2018
## \author Danilo Piparo

import ROOT
import os

# We create a data frame on top of the hsimple example
hsimplePath = os.path.join(str(ROOT.gROOT.GetTutorialDir().Data()), "hsimple.root")
df = ROOT.RDataFrame("ntuple", hsimplePath)

# We apply a simple cut and define a new column
df_cut = df.Filter("py > 0.f")\
           .Define("px_plus_py", "px + py")

# We cache the content of the dataset. Nothing has happened yet: the work to accomplish
# has only been described.
df_cached = df_cut.Cache()

# Booking a histogram on the cached data frame is lazy as well
h = df_cached.Histo1D("px_plus_py")

# Now the event loop on the cached dataset is triggered. Triggering it also runs,
# lazily, the loop on the `df` data frame, which fills the cache.
h.Draw()
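
The lazy behaviour described in the comments above can be illustrated without ROOT. The following is a toy model in plain Python, not ROOT's implementation: the classes `LazyFrame` and `CachedFrame`, their methods, and the three sample rows are all hypothetical names and data invented for this sketch. It only mirrors the idea that calling `cache()` runs nothing, that the first loop on the cached frame fills the in-memory store, and that later loops reuse it.

```python
class LazyFrame:
    """Toy stand-in for a lazy data frame: it stores a recipe, not rows."""

    def __init__(self, make_rows):
        # make_rows: zero-argument callable returning an iterable of dict rows
        self._make_rows = make_rows
        self.loops = 0  # how many times the rows of this node were requested

    def rows(self):
        self.loops += 1
        return self._make_rows()

    def filter(self, predicate):
        # Only describes the cut; rows are read when rows() is called
        return LazyFrame(lambda: (r for r in self.rows() if predicate(r)))

    def define(self, name, func):
        # Only describes the new column; nothing is computed here
        return LazyFrame(lambda: ({**r, name: func(r)} for r in self.rows()))

    def cache(self):
        return CachedFrame(self)


class CachedFrame(LazyFrame):
    """Materialises the parent's rows in memory, on first use only."""

    def __init__(self, parent):
        self._parent = parent
        self._store = None
        super().__init__(self._fill_once)

    def _fill_once(self):
        if self._store is None:
            self._store = list(self._parent.rows())
        return self._store


# Hypothetical sample data standing in for the "ntuple" dataset
source = LazyFrame(lambda: [
    {"px": 1.0, "py": 2.0},
    {"px": 3.0, "py": -1.0},
    {"px": -0.5, "py": 0.5},
])
cut = source.filter(lambda r: r["py"] > 0).define(
    "px_plus_py", lambda r: r["px"] + r["py"])
cached = cut.cache()

loops_before = source.loops  # caching was only described, nothing ran
first = [r["px_plus_py"] for r in cached.rows()]   # fills the cache
second = [r["px_plus_py"] for r in cached.rows()]  # served from memory
loops_after = source.loops   # the source was read only by the first loop
```

In this sketch the second loop never touches `source`, which is the property that makes repeated explorative loops on a cached data frame cheap.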