You can use RDataFrame in Python thanks to the dynamic Python/C++ translation of [PyROOT](https:
The interface is the same as for C++; a simple example follows.
sum = df.Filter("x > 10").Sum("y")
### User code in the RDataFrame workflow
In the simple example shown above, a C++ expression is passed to the Filter() operation as a string (`"x > 10"`), even though the method is called from Python. Under the hood, the analysis computations run in C++, while Python is just the interface language.
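
Because the expression is a C++ string, Python variables are not visible inside it; their values have to be formatted into the string first. The following is a minimal sketch, assuming a hypothetical helper `make_cut` (not part of ROOT) and a toy in-memory dataset with made-up columns `x` and `y`. The RDataFrame part is skipped when PyROOT is not installed.

```python
try:
    import ROOT  # PyROOT; only needed for the RDataFrame part below
except ImportError:
    ROOT = None


def make_cut(column, op, value):
    """Hypothetical helper: format a C++ comparison such as 'x > 10'."""
    if op not in {"<", "<=", ">", ">=", "==", "!="}:
        raise ValueError(f"unsupported operator: {op!r}")
    return f"{column} {op} {value}"


threshold = 10  # a Python value, not visible inside the C++ expression
cut = make_cut("x", ">", threshold)
print(cut)  # x > 10

if ROOT is not None:
    # Toy dataset: 20 entries, x = 0..19, y = 1 for every entry
    df = ROOT.RDataFrame(20).Define("x", "(int)rdfentry_").Define("y", "1")
    print(df.Filter(cut).Sum("y").GetValue())
```

Restricting the operator to a fixed set keeps arbitrary text out of the expression that is handed to the C++ interpreter.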
To perform more complex operations that don't fit into a simple expression string, you can just-in-time compile C++ functions via the C++ interpreter cling and use those functions in an expression. See the following snippet for an example:
# JIT a C++ function from Python
ROOT.gInterpreter.Declare("""
bool myFilter(float x) {
    return x > 10;
}
""")
df = ROOT.RDataFrame("myTree", "myFile.root")
# Use the function in an RDF operation
sum = df.Filter("myFilter(x)").Sum("y")
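
When the same `Declare` call runs twice in one session (for example, when a notebook cell is re-executed), the interpreter reports a redefinition error. A common workaround is to wrap the C++ code in a preprocessor guard. The sketch below assumes a hypothetical helper `with_guard`, a made-up macro name, and an illustrative function body; none of them is part of ROOT.

```python
try:
    import ROOT  # PyROOT; only needed for the Declare calls below
except ImportError:
    ROOT = None


def with_guard(code, macro):
    """Hypothetical helper: wrap C++ code in an #ifndef/#define/#endif guard."""
    return f"#ifndef {macro}\n#define {macro}\n{code}\n#endif\n"


# Illustrative function body and macro name (assumptions for this sketch)
cpp = with_guard("bool myFilter(float x) { return x > 10; }", "MYFILTER_DECLARED")
print(cpp)

if ROOT is not None:
    ROOT.gInterpreter.Declare(cpp)
    # On the second call the macro is already defined, so the body is skipped
    ROOT.gInterpreter.Declare(cpp)
```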
To increase the performance even further, you can also pre-compile a C++ library with full code optimizations and load the function into the RDataFrame computation as follows.
ROOT.gSystem.Load("path/to/myLibrary.so") # Library with the myFilter function
ROOT.gInterpreter.Declare('#include "myLibrary.h"') # Header with the declaration of the myFilter function
sum = df.Filter("myFilter(x)").Sum("y")
A more thorough explanation of how to use C++ code from Python can be found in the [PyROOT manual](https:
ROOT also offers the option to compile Python functions with fundamental types and arrays thereof using [Numba](https:
Such compiled functions can then be used in a C++ expression provided to RDataFrame.
The function to be compiled should be decorated with `ROOT.Numba.Declare`, which lets you specify the parameter and return types. See the following snippet for a simple example or the full tutorial [here](pyroot004__NumbaDeclare_8py.html).
@ROOT.Numba.Declare(["float"], "bool")
def myFilter(x):
    return x > 10
df = ROOT.RDataFrame("myTree", "myFile.root")
sum = df.Filter("Numba::myFilter(x)").Sum("y")
It also works with collections: `RVec` objects of fundamental types can be transparently converted to/from NumPy arrays:
@ROOT.Numba.Declare(['RVec<float>', 'int'], 'RVec<float>')
def pypowarray(numpyvec, pow):
    return numpyvec**pow
df.Define('array', 'ROOT::RVecF{1.,2.,3.}')\
  .Define('arraySquared', 'Numba::pypowarray(array, 2)')
Note that this functionality requires the Python packages `numba` and `cffi` to be installed.
### Interoperability with NumPy
#### Conversion to NumPy arrays
Eventually, you will probably want to inspect the content of the RDataFrame or process the data further with Python libraries. For this purpose, we provide the `AsNumpy()` function, which returns the columns of your RDataFrame as a dictionary of NumPy arrays. See a simple example below or a full tutorial [here](df026__AsNumpyArrays_8py.html).
cols = df.Filter("x > 10").AsNumpy(["x", "y"]) # retrieve columns "x" and "y" as NumPy arrays
print(cols["x"], cols["y"]) # the values of the cols dictionary are NumPy arrays
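
The dictionary returned by `AsNumpy()` is column-oriented: one array per column. Some downstream code prefers one record per entry instead. The sketch below assumes a hypothetical helper `to_records` (not part of ROOT) and a toy dataset with made-up columns; when PyROOT is not installed, a stand-in dictionary of plain lists with the same values is used.

```python
try:
    import ROOT  # PyROOT; a stand-in dictionary is used if it is missing
except ImportError:
    ROOT = None


def to_records(cols):
    """Hypothetical helper: turn {column: values} into a list of per-entry dicts."""
    names = list(cols)
    return [dict(zip(names, row)) for row in zip(*(cols[n] for n in names))]


if ROOT is not None:
    # Toy dataset: x = 9..13, y = 2 * x
    df = ROOT.RDataFrame(5).Define("x", "(int)rdfentry_ + 9").Define("y", "2 * x")
    cols = df.Filter("x > 10").AsNumpy(["x", "y"])
else:
    cols = {"x": [11, 12, 13], "y": [22, 24, 26]}

records = to_records(cols)
print(records)
```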
#### Processing data stored in NumPy arrays
If you have data in NumPy arrays in Python and want to process it with ROOT, you can easily create an RDataFrame using `ROOT.RDF.FromNumpy`. The factory function accepts a dictionary where the keys are the column names and the values are NumPy arrays, and returns a new RDataFrame with the provided columns.
Only arrays of fundamental types (integers and floating point values) are supported and the arrays must have the same length.
Data is read directly from the arrays: no copies are performed.
import ROOT
import numpy
# Read data from NumPy arrays
# The column names in the RDataFrame are taken from the dictionary keys
x, y = numpy.array([1, 2, 3]), numpy.array([4, 5, 6])
df = ROOT.RDF.FromNumpy({"x": x, "y": y})
# Use RDataFrame as usual, e.g. write out a ROOT file
df.Define("z", "x + y").Snapshot("tree", "file.root")
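
Since the arrays must all have the same length, it can help to check the input before building the RDataFrame. The following is a minimal sketch, assuming a hypothetical helper `check_columns` that is not part of ROOT; the `FromNumpy` call only runs when both NumPy and PyROOT are installed.

```python
try:
    import numpy
    import ROOT  # PyROOT; only needed for the FromNumpy call below
except ImportError:
    numpy = ROOT = None


def check_columns(columns):
    """Hypothetical helper: raise ValueError unless all columns have equal length."""
    lengths = {name: len(values) for name, values in columns.items()}
    if len(set(lengths.values())) > 1:
        raise ValueError(f"columns differ in length: {lengths}")
    return lengths


columns = {"x": [1, 2, 3], "y": [4, 5, 6]}
print(check_columns(columns))  # {'x': 3, 'y': 3}

if ROOT is not None:
    # Keep a reference to the arrays: the data is read in place, not copied
    arrays = {name: numpy.array(values) for name, values in columns.items()}
    df = ROOT.RDF.FromNumpy(arrays)
    print(df.Sum("x").GetValue())
```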
### Interoperability with [AwkwardArray](https:
The function for RDataFrame to Awkward conversion is `ak.from_rdataframe()`. It accepts a tuple of strings that are the RDataFrame column names. By default, this function returns an `ak.Array`.
array = ak.from_rdataframe(
The function for Awkward to RDataFrame conversion is `ak.to_rdataframe()`.
It requires a dictionary of the form `{ <column name string> : <awkward array> }` and always returns an RDataFrame object.
The arrays given for each column must have equal length:
array_x = ak.Array(
    [
        {"x": [1.1, 1.2, 1.3]},
        # ...
        {"x": [4.1, 4.2, 4.3, 4.4]},
        # ...
    ]
)
array_y = ak.Array([1, 2, 3, 4, 5])
array_z = ak.Array([[1.1], [2.1, 2.3, 2.4], [3.1], [4.1, 4.2, 4.3], [5.1]])
assert len(array_x) == len(array_y) == len(array_z)
df = ak.to_rdataframe({"x": array_x, "y": array_y, "z": array_z})
### Construct histogram and profile models from a tuple
The Histo1D(), Histo2D(), Histo3D(), Profile1D() and Profile2D() methods return histograms and profiles, respectively, which can be constructed using a model argument.
In Python, we can specify the arguments for the constructor of such a histogram or profile model with a Python tuple, as shown in the example below:
# First argument is a tuple with the arguments to construct a TH1D model
h = df.Histo1D(("histName", "histTitle", 64, 0., 128.), "myColumn")
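
The same convention applies to the other models: the tuple holds the constructor arguments in order. As a sketch, and assuming the usual `TH2D`-style argument order (name, title, x bins and range, y bins and range), a hypothetical helper `th2d_model` could assemble the tuple for `Histo2D()`. The helper, the histogram name and the toy columns are illustrative and not part of ROOT.

```python
try:
    import ROOT  # PyROOT; only needed for the Histo2D call below
except ImportError:
    ROOT = None


def th2d_model(name, title, xbins, ybins):
    """Hypothetical helper: build a TH2D-style model tuple.

    xbins and ybins are (number of bins, lower edge, upper edge) triples.
    """
    nx, xlow, xup = xbins
    ny, ylow, yup = ybins
    return (name, title, int(nx), float(xlow), float(xup),
            int(ny), float(ylow), float(yup))


model = th2d_model("h2", "x vs y", (8, 0, 16), (4, 0, 32))
print(model)

if ROOT is not None:
    # Toy dataset: x = 0..15, y = 2 * x
    df = ROOT.RDataFrame(16).Define("x", "(double)rdfentry_").Define("y", "2 * x")
    h2 = df.Histo2D(model, "x", "y")
    print(h2.GetEntries())
```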
### AsRNode helper function
ROOT.gInterpreter.Declare("""
ROOT::RDF::RNode MyTransformation(ROOT::RDF::RNode df) {
    auto myFunc = [](float x){ return -x;};
    return df.Define("y", myFunc, {"x"});
}
""")
# Cast the RDataFrame head node
df = ROOT.RDataFrame("myTree", "myFile.root")
df_transformed = ROOT.MyTransformation(ROOT.RDF.AsRNode(df))
# ... or any other node
df2 = df.Filter("x > 42")
df2_transformed = ROOT.MyTransformation(ROOT.RDF.AsRNode(df2))