25#include <nlohmann/json.hpp>  
 1247   void Exec(unsigned int slot)
 
 1249      fPerThreadResults[slot]++;
 
 1252   // Called at the end of the event loop.
 
 1255      *fFinalResult = std::accumulate(fPerThreadResults.begin(), fPerThreadResults.end(), 0);
 
 1258   // Called by RDataFrame to retrieve the name of this action.
 
 1259   std::string GetActionName() const { return "MyCounter"; }
 
 1263   ROOT::RDataFrame df(10);
 
 1264   ROOT::RDF::RResultPtr<int> resultPtr = df.Book<>(MyCounter{df.GetNSlots()}, {});
 
 1265   // The GetValue call triggers the event loop
 
 1266   std::cout << "Number of processed entries: " <<  resultPtr.GetValue() << std::endl;
 
 1270See the Book() method for more information and [this tutorial](https://root.cern/doc/master/df018__customActions_8C.html)
 
 1271for a more complete example.
 
 1273#### Injecting arbitrary code in the event loop with Foreach() and ForeachSlot()
 
 1275Foreach() takes a callable (lambda expression, free function, functor...) and a list of columns and
 
 1276executes the callable on the values of those columns for each event that passes all upstream selections.
 
 1277It can be used to perform actions that are not already available in the interface. For example, the following snippet
 
 1278evaluates the root mean square of column "x":
 
 1280// Single-thread evaluation of RMS of column "x" using Foreach
 
 1283df.Foreach([&sumSq, &n](double x) { ++n; sumSq += x*x; }, {"x"});
 
 1284std::cout << "rms of x: " << std::sqrt(sumSq / n) << std::endl;
 
 1286In multi-thread runs, users are responsible for the thread-safety of the expression passed to Foreach():
 
 1287thread will execute the expression concurrently.
 
 1288The code above would need to employ some resource protection mechanism to ensure non-concurrent writing of `rms`; but
 
 1289this is probably too much head-scratch for such a simple operation.
 
 1291ForeachSlot() can help in this situation. It is an alternative version of Foreach() for which the function takes an
 
 1292additional "processing slot" parameter besides the columns it should be applied to. RDataFrame
 
 1293guarantees that ForeachSlot() will invoke the user expression with different `slot` parameters for different concurrent
 
 1294executions (see [Special helper columns: rdfentry_ and rdfslot_](\ref helper-cols) for more information on the slot parameter).
 
 1295We can take advantage of ForeachSlot() to evaluate a thread-safe root mean square of column "x":
 
 1297// Thread-safe evaluation of RMS of column "x" using ForeachSlot
 
 1298ROOT::EnableImplicitMT();
 
 1299const unsigned int nSlots = df.GetNSlots();
 
 1300std::vector<double> sumSqs(nSlots, 0.);
 
 1301std::vector<unsigned int> ns(nSlots, 0);
 
 1303df.ForeachSlot([&sumSqs, &ns](unsigned int slot, double x) { sumSqs[slot] += x*x; ns[slot] += 1; }, {"x"});
 
 1304double sumSq = std::accumulate(sumSqs.begin(), sumSqs.end(), 0.); // sum all squares
 
 1305unsigned int n = std::accumulate(ns.begin(), ns.end(), 0); // sum all counts
 
 1306std::cout << "rms of x: " << std::sqrt(sumSq / n) << std::endl;
 
 1308Notice how we created one `double` variable for each processing slot and later merged their results via `std::accumulate`.
 
 1312### Dataset joins with friend trees
 
 1314Vertically concatenating multiple trees that have the same columns (creating a logical dataset with the same columns and
 
 1315more rows) is trivial in RDataFrame: just pass the tree name and a list of file names to RDataFrame's constructor, or create a TChain
 
 1316out of the desired trees and pass that to RDataFrame.
 
 1318Horizontal concatenations of trees or chains (creating a logical dataset with the same number of rows and the union of the
 
 1319columns of multiple trees) leverages TTree's "friend" mechanism.
 
 1321Simple joins of trees that do not have the same number of rows are also possible with indexed friend trees (see below).
 
 1323To use friend trees in RDataFrame, set up trees with the appropriate relationships and then instantiate an RDataFrame
 
 1329main.AddFriend(&friend, "myFriend");
 
 1332auto df2 = df.Filter("myFriend.MyCol == 42");
 
 1335The same applies for TChains. Columns coming from the friend trees can be referred to by their full name, like in the example above,
 
 1336or the friend tree name can be omitted in case the column name is not ambiguous (e.g. "MyCol" could be used instead of
 
 1337"myFriend.MyCol" in the example above if there is no column "MyCol" in the main tree).
 
 1339\note A common source of confusion is that trees that are written out from a multi-thread Snapshot() call will have their
 
 1340      entries (block-wise) shuffled with respect to the original tree. Such trees cannot be used as friends of the original
 
 1341      one: rows will be mismatched.
 
 1343Indexed friend trees provide a way to perform simple joins of multiple trees over a common column.
 
 1344When a certain entry in the main tree (or chain) is loaded, the friend trees (or chains) will then load an entry where the
 
 1345"index" columns have a value identical to the one in the main one. For example, in Python:
 
 1351# If a friend tree has an index on `commonColumn`, when the main tree loads
 
 1352# a given row, it also loads the row of the friend tree that has the same
 
 1353# value of `commonColumn`
 
 1354aux_tree.BuildIndex("commonColumn")
 
 1356mainTree.AddFriend(aux_tree)
 
 1358df = ROOT.RDataFrame(mainTree)
 
 1361RDataFrame supports indexed friend TTrees from ROOT v6.24 in single-thread mode and from v6.28/02 in multi-thread mode.
 
 1363\anchor other-file-formats
 
 1364### Reading data formats other than ROOT trees
 
 1365RDataFrame can be interfaced with RDataSources. The ROOT::RDF::RDataSource interface defines an API that RDataFrame can use to read arbitrary columnar data formats.
 
 1367RDataFrame calls into concrete RDataSource implementations to retrieve information about the data, retrieve (thread-local) readers or "cursors" for selected columns
 
 1368and to advance the readers to the desired data entry.
 
 1369Some predefined RDataSources are natively provided by ROOT such as the ROOT::RDF::RCsvDS which allows to read comma separated files:
 
 1371auto tdf = ROOT::RDF::FromCSV("MuRun2010B.csv");
 
 1372auto filteredEvents =
 
 1373   tdf.Filter("Q1 * Q2 == -1")
 
 1374      .Define("m", "sqrt(pow(E1 + E2, 2) - (pow(px1 + px2, 2) + pow(py1 + py2, 2) + pow(pz1 + pz2, 2)))");
 
 1375auto h = filteredEvents.Histo1D("m");
 
 1379See also FromNumpy (Python-only), FromRNTuple(), FromArrow(), FromSqlite().
 
 1382### Computation graphs (storing and reusing sets of transformations)
 
 1384As we saw, transformed dataframes can be stored as variables and reused multiple times to create modified versions of the dataset. This implicitly defines a **computation graph** in which
 
 1385several paths of filtering/creation of columns are executed simultaneously, and finally aggregated results are produced.
 
 1387RDataFrame detects when several actions use the same filter or the same defined column, and **only evaluates each
 
 1388filter or defined column once per event**, regardless of how many times that result is used down the computation graph.
 
 1389Objects read from each column are **built once and never copied**, for maximum efficiency.
 
 1390When "upstream" filters are not passed, subsequent filters, temporary column expressions and actions are not evaluated,
 
 1391so it might be advisable to put the strictest filters first in the graph.
 
 1393\anchor representgraph
 
 1394### Visualizing the computation graph
 
 1395It is possible to print the computation graph from any node to obtain a [DOT (graphviz)](https://en.wikipedia.org/wiki/DOT_(graph_description_language)) representation either on the standard output
 
 1398Invoking the function ROOT::RDF::SaveGraph() on any node that is not the head node, the computation graph of the branch
 
 1399the node belongs to is printed. By using the head node, the entire computation graph is printed.
 
 1401Following there is an example of usage:
 
 1403// First, a sample computational graph is built
 
 1404ROOT::RDataFrame df("tree", "f.root");
 
 1406auto df2 = df.Define("x", []() { return 1; })
 
 1407             .Filter("col0 % 1 == col0")
 
 1408             .Filter([](int b1) { return b1 <2; }, {"cut1"})
 
 1409             .Define("y", []() { return 1; });
 
 1411auto count =  df2.Count();
 
 1413// Prints the graph to the rd1.dot file in the current directory
 
 1414ROOT::RDF::SaveGraph(df, "./mydot.dot");
 
 1415// Prints the graph to standard output
 
 1416ROOT::RDF::SaveGraph(df);
 
 1419The generated graph can be rendered using one of the graphviz filters, e.g. `dot`. For instance, the image below can be generated with the following command:
 
 1421$ dot -Tpng computation_graph.dot -ocomputation_graph.png
 
 1424\image html RDF_Graph2.png
 
 1427### Activating RDataFrame execution logs
 
 1429RDataFrame has experimental support for verbose logging of the event loop runtimes and other interesting related information. It is activated as follows:
 
 1431#include <ROOT/RLogger.hxx>
 
 1433// this increases RDF's verbosity level as long as the `verbosity` variable is in scope
 
 1434auto verbosity = ROOT::Experimental::RLogScopedVerbosity(ROOT::Detail::RDF::RDFLogChannel(), ROOT::Experimental::ELogLevel::kInfo);
 
 1441verbosity = ROOT.Experimental.RLogScopedVerbosity(ROOT.Detail.RDF.RDFLogChannel(), ROOT.Experimental.ELogLevel.kInfo)
 
 1444More information (e.g. start and end of each multi-thread task) is printed using `ELogLevel.kDebug` and even more
 
 1445(e.g. a full dump of the generated code that RDataFrame just-in-time-compiles) using `ELogLevel.kDebug+10`.
 
 1447\anchor rdf-from-spec
 
 1448### Creating an RDataFrame from a dataset specification file
 
 1450RDataFrame can be created using a dataset specification JSON file: 
 
 1455df = ROOT.RDF.Experimental.FromSpec("spec.json")
 
 1458The input dataset specification JSON file needs to be provided by the user and it describes all necessary samples and
 
 1459their associated metadata information. The main required key is the "samples" (at least one sample is needed) and the
 
 1460required sub-keys for each sample are "trees" and "files". Additionally, one can specify a metadata dictionary for each
 
 1461sample in the "metadata" key.
 
 1463A simple example for the formatting of the specification in the JSON file is the following:
 
 1469         "trees": ["tree1", "tree2"],
 
 1470         "files": ["file1.root", "file2.root"],
 
 1474            "sample_category" = "data"
 
 1478         "trees": ["tree3", "tree4"],
 
 1479         "files": ["file3.root", "file4.root"],
 
 1483            "sample_category" = "MC_background"
 
 1490The metadata information from the specification file can be then accessed using the DefinePerSample function.
 
 1491For example, to access luminosity information (stored as a double):
 
 1494df.DefinePerSample("lumi", 'rdfsampleinfo_.GetD("lumi")')
 
 1497or sample_category information (stored as a string):
 
 1500df.DefinePerSample("sample_category", 'rdfsampleinfo_.GetS("sample_category")')
 
 1503or directly the filename:
 
 1506df.DefinePerSample("name", "rdfsampleinfo_.GetSampleName()")
 
 1509An example implementation of the "FromSpec" method is available in tutorial: df106_HiggstoFourLeptons.py, which also
 
 1510provides a corresponding exemplary JSON file for the dataset specification.
 
 1513### Adding a progress bar 
 
 1515A progress bar showing the processed event statistics can be added to any RDataFrame program.
 
 1516The event statistics include elapsed time, currently processed file, currently processed events, the rate of event processing 
 
 1517and an estimated remaining time (per file being processed). It is recorded and printed in the terminal every m events and every 
 
 1518n seconds (by default m = 1000 and n = 1). The ProgressBar can be also added when the multithread (MT) mode is enabled. 
 
 1520ProgressBar is added after creating the dataframe object (df):
 
 1522ROOT::RDataFrame df("tree", "file.root");
 
 1523ROOT::RDF::Experimental::AddProgressBar(df);
 
 1526Alternatively, RDataFrame can be cast to an RNode first, giving the user more flexibility 
 
 1527For example, it can be called at any computational node, such as Filter or Define, not only the head node,
 
 1528with no change to the ProgressBar function itself: 
 
 1530ROOT::RDataFrame df("tree", "file.root");
 
 1531auto df_1 = ROOT::RDF::RNode(df.Filter("x>1"));
 
 1532ROOT::RDF::Experimental::AddProgressBar(df_1);
 
 1534Examples of implemented progress bars can be seen by running [Higgs to Four Lepton tutorial](https://root.cern/doc/master/df106__HiggsToFourLeptons_8py_source.html) and [Dimuon tutorial](https://root.cern/doc/master/df102__NanoAODDimuonAnalysis_8C.html). 
 
 1554   : RInterface(std::make_shared<
RDFDetail::RLoopManager>(nullptr, defaultColumns))
 
 1557      auto msg = 
"Invalid TDirectory!";
 
 1558      throw std::runtime_error(msg);
 
 1560   const std::string treeNameInt(treeName);
 
 1561   auto tree = 
static_cast<TTree *
>(dirPtr->
Get(treeNameInt.c_str()));
 
 1563      auto msg = 
"Tree \"" + treeNameInt + 
"\" cannot be found!";
 
 1564      throw std::runtime_error(msg);
 
 1566   GetProxiedPtr()->SetTree(std::shared_ptr<TTree>(
tree, [](
TTree *) {}));
 
 1581RDataFrame::RDataFrame(std::string_view treeName, std::string_view filenameglob, 
const ColumnNames_t &defaultColumns)
 
 1584   const std::string treeNameInt(treeName);
 
 1585   const std::string filenameglobInt(filenameglob);
 
 1587   chain->Add(filenameglobInt.c_str());
 
 1606   std::string treeNameInt(treeName);
 
 1608   for (
auto &
f : fileglobs)
 
 1609      chain->Add(
f.c_str());
 
 1677namespace Experimental {
 
 1711   const nlohmann::ordered_json fullData = nlohmann::ordered_json::parse(std::ifstream(jsonFile));
 
 1712   if (!fullData.contains(
"samples") || fullData[
"samples"].size() == 0) {
 
 1713      throw std::runtime_error(
 
 1714         R
"(The input specification does not contain any samples. Please provide the samples in the specification like: 
 1718         "trees": ["tree1", "tree2"], 
 1719         "files": ["file1.root", "file2.root"], 
 1720         "metadata": {"lumi": 1.0, } 
 1723         "trees": ["tree3", "tree4"], 
 1724         "files": ["file3.root", "file4.root"], 
 1725         "metadata": {"lumi": 0.5, } 
 1733   for (
const auto &keyValue : fullData[
"samples"].items()) {
 
 1734      const std::string &sampleName = keyValue.key();
 
 1735      const auto &sample = keyValue.value();
 
 1738      if (!sample.contains(
"trees")) {
 
 1739         throw std::runtime_error(
"A list of tree names must be provided for sample " + sampleName + 
".");
 
 1741      std::vector<std::string> trees = sample[
"trees"];
 
 1742      if (!sample.contains(
"files")) {
 
 1743         throw std::runtime_error(
"A list of files must be provided for sample " + sampleName + 
".");
 
 1745      std::vector<std::string> files = sample[
"files"];
 
 1746      if (!sample.contains(
"metadata")) {
 
 1750         for (
const auto &metadata : sample[
"metadata"].items()) {
 
 1751            const auto &val = metadata.value();
 
 1752            if (val.is_string())
 
 1753               m.Add(metadata.key(), val.get<std::string>());
 
 1754            else if (val.is_number_integer())
 
 1755               m.Add(metadata.key(), val.get<
int>());
 
 1756            else if (val.is_number_float())
 
 1757               m.Add(metadata.key(), val.get<
double>());
 
 1759               throw std::logic_error(
"The metadata keys can only be of type [string|int|double].");
 
 1764   if (fullData.contains(
"friends")) {
 
 1765      for (
const auto &friends : fullData[
"friends"].items()) {
 
 1766         std::string alias = friends.key();
 
 1767         std::vector<std::string> trees = friends.value()[
"trees"];
 
 1768         std::vector<std::string> files = friends.value()[
"files"];
 
 1769         if (files.size() != trees.size() && trees.size() > 1)
 
 1770            throw std::runtime_error(
"Mismatch between trees and files in a friend.");
 
 1775   if (fullData.contains(
"range")) {
 
 1776      std::vector<int> range = fullData[
"range"];
 
 1778      if (range.size() == 1)
 
 1780      else if (range.size() == 2)
 
 1797   auto *
tree = df.GetTree();
 
 1798   auto defCols = df.GetDefaultColumnNames();
 
 1800   std::ostringstream ret;
 
 1802      ret << 
"A data frame built on top of the " << 
tree->GetName() << 
" dataset.";
 
 1803      if (!defCols.empty()) {
 
 1804         if (defCols.size() == 1)
 
 1805            ret << 
"\nDefault column: " << defCols[0];
 
 1807            ret << 
"\nDefault columns:\n";
 
 1808            for (
auto &&col : defCols) {
 
 1809               ret << 
" - " << col << 
"\n";
 
 1814      ret << 
"A data frame associated to the data source \"" << cling::printValue(ds) << 
"\"";
 
 1816      ret << 
"An empty data frame that will create " << df.GetNEmptyEntries() << 
" entries\n";
 
unsigned long long ULong64_t
The head node of a RDF computation graph.
A dataset specification for RDataFrame.
RDatasetSpec & WithGlobalFriends(const std::string &treeName, const std::string &fileNameGlob, const std::string &alias="")
RDatasetSpec & AddSample(RSample sample)
RDatasetSpec & WithGlobalRange(const RDatasetSpec::REntryRange &entryRange={})
Class representing a sample (grouping of trees (and their fileglobs) and (optional) metadata)
RDataSource * fDataSource
Non-owning pointer to a data-source object. Null if no data-source. RLoopManager has ownership of the...
RDFDetail::RLoopManager * GetLoopManager() const
const std::shared_ptr< RDFDetail::RLoopManager > & GetProxiedPtr() const
ROOT's RDataFrame offers a modern, high-level interface for analysis of data stored in TTree ,...
RDataFrame(std::string_view treeName, std::string_view filenameglob, const ColumnNames_t &defaultColumns={})
Build the dataframe.
ROOT::RDF::ColumnNames_t ColumnNames_t
Describe directory structure in memory.
virtual TObject * Get(const char *namecycle)
Return pointer to object identified by namecycle.
A TTree represents a columnar dataset.
std::unique_ptr< TChain > MakeChainForMT(const std::string &name="", const std::string &title="")
Create a TChain object with options that avoid common causes of thread contention.
ROOT::RDataFrame FromSpec(const std::string &jsonFile)
Factory method to create an RDataFrame from a JSON specification file.
std::vector< std::string > ColumnNames_t
This file contains a specialised ROOT message handler to test for diagnostic in unit tests.
std::shared_ptr< const ColumnNames_t > ColumnNamesPtr_t