{ "cells": [ { "cell_type": "markdown", "id": "ac6d87ed", "metadata": {}, "source": [ "# rf408_RDataFrameToRooFit\n", "Fill RooDataSet/RooDataHist in RDataFrame.\n", "\n", "This tutorial shows how to fill RooFit data classes directly from RDataFrame.\n", "Using two small helpers, we tell RDataFrame where the data has to go.\n", "\n", "\n", "\n", "\n", "**Author:** Stephan Hageboeck (CERN) \n", "This notebook tutorial was automatically generated with ROOTBOOK-izer from the macro found in the ROOT repository on Tuesday, May 19, 2026 at 08:31 PM." ] }, { "cell_type": "code", "execution_count": 1, "id": "56b8cd76", "metadata": { "collapsed": false, "execution": { "iopub.execute_input": "2026-05-19T20:31:57.454974Z", "iopub.status.busy": "2026-05-19T20:31:57.454869Z", "iopub.status.idle": "2026-05-19T20:31:57.463131Z", "shell.execute_reply": "2026-05-19T20:31:57.462692Z" } }, "outputs": [], "source": [ "%%cpp -d\n", "#include <RooAbsDataHelper.h>\n", "\n", "#include <TRandom.h>\n", "#include <iomanip>\n", "#include <iostream>" ] }, { "cell_type": "markdown", "id": "754ca839", "metadata": {}, "source": [ " Print the first few entries and summary statistics.\n", " " ] }, { "cell_type": "code", "execution_count": 2, "id": "c3ea3242", "metadata": { "collapsed": false, "execution": { "iopub.execute_input": "2026-05-19T20:31:57.464670Z", "iopub.status.busy": "2026-05-19T20:31:57.464543Z", "iopub.status.idle": "2026-05-19T20:31:57.491494Z", "shell.execute_reply": "2026-05-19T20:31:57.491007Z" } }, "outputs": [], "source": [ "%%cpp -d\n", "void printData(const RooAbsData& data) {\n", " std::cout << \"\\n\";\n", " data.Print();\n", "\n", " for (int i=0; i < data.numEntries() && i < 20; ++i) {\n", " std::cout << \"(\";\n", " for (const auto var : *data.get(i)) {\n", " std::cout << std::setprecision(3) << std::right << std::fixed << std::setw(8) << static_cast<const RooAbsReal*>(var)->getVal() << \", \";\n", " }\n", " std::cout << \")\\tweight=\" << std::setw(10) << data.weight() << std::endl;\n", " }\n", "\n", " // Get the x and y variables from the dataset:\n", " const auto & 
x = static_cast<const RooRealVar&>(*(*data.get())[0]);\n", " const auto & y = static_cast<const RooRealVar&>(*(*data.get())[1]);\n", "\n", " std::cout << \"mean(x) = \" << data.mean(x) << \"\\tsigma(x) = \" << std::sqrt(data.moment(x, 2.))\n", " << \"\\n\" << \"mean(y) = \" << data.mean(y) << \"\\tsigma(y) = \" << std::sqrt(data.moment(y, 2.)) << std::endl;\n", "}" ] }, { "cell_type": "markdown", "id": "a86a2879", "metadata": {}, "source": [ "Set up\n", "------------------------" ] }, { "cell_type": "markdown", "id": "5827766e", "metadata": {}, "source": [ "We create an RDataFrame with two columns filled with 2 million random numbers." ] }, { "cell_type": "code", "execution_count": 3, "id": "5d60c12e", "metadata": { "collapsed": false, "execution": { "iopub.execute_input": "2026-05-19T20:31:57.493018Z", "iopub.status.busy": "2026-05-19T20:31:57.492902Z", "iopub.status.idle": "2026-05-19T20:31:58.193270Z", "shell.execute_reply": "2026-05-19T20:31:58.192568Z" } }, "outputs": [], "source": [ "auto df = ROOT::RDataFrame{2000000}.Define(\"x\", []() { return gRandom->Uniform(-5., 5.); }).Define(\"y\", []() {\n", " return gRandom->Gaus(1., 3.);\n", "});" ] }, { "cell_type": "markdown", "id": "45580595", "metadata": {}, "source": [ "We create RooFit variables that will represent the dataset." 
] }, { "cell_type": "code", "execution_count": 4, "id": "e2c614dc", "metadata": { "collapsed": false, "execution": { "iopub.execute_input": "2026-05-19T20:31:58.195378Z", "iopub.status.busy": "2026-05-19T20:31:58.195259Z", "iopub.status.idle": "2026-05-19T20:31:58.403615Z", "shell.execute_reply": "2026-05-19T20:31:58.402940Z" } }, "outputs": [], "source": [ "RooRealVar x(\"x\", \"x\", -5., 5.);\n", "RooRealVar y(\"y\", \"y\", -50., 50.);\n", "x.setBins(10);\n", "y.setBins(20);" ] }, { "cell_type": "markdown", "id": "d866db6b", "metadata": {}, "source": [ "Booking the creation of RooDataSet / RooDataHist in RDataFrame\n", "----------------------------------------------------------------" ] }, { "cell_type": "markdown", "id": "1c1a95a2", "metadata": {}, "source": [ "Method 1:\n", "---------\n", "We directly book the RooDataSetHelper action.\n", "We need to pass\n", "- the RDataFrame column types as template parameters\n", "- the constructor arguments for RooDataSet (they follow the same syntax as the usual RooDataSet constructors)\n", "- the column names that RDataFrame should fill into the dataset\n", "\n", "NOTE: RDataFrame columns are matched to RooFit variables by position, *not by name*!\n", "\n", "The returned object is not yet a RooDataSet, but an RResultPtr that will\n", "be lazy-evaluated once you call GetValue() on it. 
We will only evaluate\n", "the RResultPtr once all other RDataFrame related actions are declared.\n", "This way we trigger the event loop computation only once, which will\n", "improve the runtime significantly.\n", "\n", "To learn more about lazy actions, see:\n", "https://root.cern/doc/master/classROOT_1_1RDataFrame.html#actions" ] }, { "cell_type": "code", "execution_count": 5, "id": "e644bc66", "metadata": { "collapsed": false, "execution": { "iopub.execute_input": "2026-05-19T20:31:58.405755Z", "iopub.status.busy": "2026-05-19T20:31:58.405626Z", "iopub.status.idle": "2026-05-19T20:31:59.323965Z", "shell.execute_reply": "2026-05-19T20:31:59.323362Z" } }, "outputs": [], "source": [ "ROOT::RDF::RResultPtr<RooDataSet> rooDataSetResult = df.Book<double, double>(\n", " RooDataSetHelper(\"dataset\", // Name\n", " \"Title of dataset\", // Title\n", " RooArgSet(x, y) // Variables in this dataset\n", " ),\n", " {\"x\", \"y\"} // Column names in RDataFrame.\n", ");" ] }, { "cell_type": "markdown", "id": "28712b8b", "metadata": {}, "source": [ "Method 2:\n", "---------\n", "We first declare the RooDataHistHelper" ] }, { "cell_type": "code", "execution_count": 6, "id": "860afefe", "metadata": { "collapsed": false, "execution": { "iopub.execute_input": "2026-05-19T20:31:59.326202Z", "iopub.status.busy": "2026-05-19T20:31:59.326085Z", "iopub.status.idle": "2026-05-19T20:31:59.528355Z", "shell.execute_reply": "2026-05-19T20:31:59.527630Z" } }, "outputs": [], "source": [ "RooDataHistHelper rdhMaker{\"datahist\", // Name\n", " \"Title of data hist\", // Title\n", " RooArgSet(x, y) // Variables in this dataset\n", "};" ] }, { "cell_type": "markdown", "id": "9bd55f99", "metadata": {}, "source": [ "Then, we move it into an RDataFrame action:" ] }, { "cell_type": "code", "execution_count": 7, "id": "3ae94acb", "metadata": { "collapsed": false, "execution": { "iopub.execute_input": "2026-05-19T20:31:59.530399Z", "iopub.status.busy": "2026-05-19T20:31:59.530265Z", "iopub.status.idle": 
"2026-05-19T20:32:00.068628Z", "shell.execute_reply": "2026-05-19T20:32:00.067996Z" } }, "outputs": [], "source": [ "ROOT::RDF::RResultPtr<RooDataHist> rooDataHistResult = df.Book<double, double>(std::move(rdhMaker), {\"x\", \"y\"});" ] }, { "cell_type": "markdown", "id": "4f82a5b3", "metadata": {}, "source": [ "Run it and inspect the results\n", "-------------------------------" ] }, { "cell_type": "markdown", "id": "b28874fd", "metadata": {}, "source": [ "At this point, all RDF actions have been defined (namely, the `Book`\n", "operations), so we can get values from the RResultPtr objects, triggering\n", "the event loop and getting the actual RooFit data objects." ] }, { "cell_type": "code", "execution_count": 8, "id": "b34b2806", "metadata": { "collapsed": false, "execution": { "iopub.execute_input": "2026-05-19T20:32:00.070799Z", "iopub.status.busy": "2026-05-19T20:32:00.070683Z", "iopub.status.idle": "2026-05-19T20:32:00.752868Z", "shell.execute_reply": "2026-05-19T20:32:00.752191Z" } }, "outputs": [], "source": [ "RooDataSet const& rooDataSet = rooDataSetResult.GetValue();\n", "RooDataHist const& rooDataHist = rooDataHistResult.GetValue();" ] }, { "cell_type": "markdown", "id": "f02a76a2", "metadata": {}, "source": [ "Let's inspect the dataset / datahist." 
] }, { "cell_type": "code", "execution_count": 9, "id": "23acf511", "metadata": { "collapsed": false, "execution": { "iopub.execute_input": "2026-05-19T20:32:00.754576Z", "iopub.status.busy": "2026-05-19T20:32:00.754457Z", "iopub.status.idle": "2026-05-19T20:32:01.311678Z", "shell.execute_reply": "2026-05-19T20:32:01.311112Z" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "\n", "RooDataSet::dataset[x,y] = 2000000 entries\n", "( 4.997, -0.304, )\tweight= 1.000\n", "( 4.472, 0.910, )\tweight= 1.000\n", "( 4.575, 0.830, )\tweight= 1.000\n", "( 0.400, 0.776, )\tweight= 1.000\n", "( 2.599, -0.232, )\tweight= 1.000\n", "( -1.844, 1.575, )\tweight= 1.000\n", "( 0.197, 0.853, )\tweight= 1.000\n", "( -1.077, -0.721, )\tweight= 1.000\n", "( -4.697, -3.165, )\tweight= 1.000\n", "( 4.437, -1.208, )\tweight= 1.000\n", "( 3.983, -0.146, )\tweight= 1.000\n", "( -0.014, -1.447, )\tweight= 1.000\n", "( -3.177, -2.704, )\tweight= 1.000\n", "( -4.371, -0.363, )\tweight= 1.000\n", "( 2.254, -0.499, )\tweight= 1.000\n", "( 2.139, 6.533, )\tweight= 1.000\n", "( 1.993, 6.991, )\tweight= 1.000\n", "( -3.708, 7.781, )\tweight= 1.000\n", "( -4.168, 1.284, )\tweight= 1.000\n", "( -4.177, 4.650, )\tweight= 1.000\n", "mean(x) = 0.001\tsigma(x) = 2.886\n", "mean(y) = 1.000\tsigma(y) = 3.000\n", "\n", "RooDataHist::datahist[x,y] = 200 bins (2000000.000 weights)\n", "( -4.500, -47.500, )\tweight= 0.000\n", "( -4.500, -42.500, )\tweight= 0.000\n", "( -4.500, -37.500, )\tweight= 0.000\n", "( -4.500, -32.500, )\tweight= 0.000\n", "( -4.500, -27.500, )\tweight= 0.000\n", "( -4.500, -22.500, )\tweight= 0.000\n", "( -4.500, -17.500, )\tweight= 0.000\n", "( -4.500, -12.500, )\tweight= 24.000\n", "( -4.500, -7.500, )\tweight= 4537.000\n", "( -4.500, -2.500, )\tweight= 69653.000\n", "( -4.500, 2.500, )\tweight=107838.000\n", "( -4.500, 7.500, )\tweight= 17790.000\n", "( -4.500, 12.500, )\tweight= 292.000\n", "( -4.500, 17.500, )\tweight= 0.000\n", "( -4.500, 22.500, )\tweight= 
0.000\n", "( -4.500, 27.500, )\tweight= 0.000\n", "( -4.500, 32.500, )\tweight= 0.000\n", "( -4.500, 37.500, )\tweight= 0.000\n", "( -4.500, 42.500, )\tweight= 0.000\n", "( -4.500, 47.500, )\tweight= 0.000\n", "mean(x) = 0.001\tsigma(x) = 2.872\n", "mean(y) = 0.999\tsigma(y) = 3.329\n" ] } ], "source": [ "printData(rooDataSet);\n", "printData(rooDataHist);" ] }, { "cell_type": "markdown", "id": "e27f908a", "metadata": {}, "source": [ "Draw all canvases " ] }, { "cell_type": "code", "execution_count": 10, "id": "add3b754", "metadata": { "collapsed": false, "execution": { "iopub.execute_input": "2026-05-19T20:32:01.313024Z", "iopub.status.busy": "2026-05-19T20:32:01.312899Z", "iopub.status.idle": "2026-05-19T20:32:01.521248Z", "shell.execute_reply": "2026-05-19T20:32:01.520617Z" } }, "outputs": [], "source": [ "gROOT->GetListOfCanvases()->Draw()" ] } ], "metadata": { "kernelspec": { "display_name": "ROOT C++", "language": "c++", "name": "root" }, "language_info": { "codemirror_mode": "text/x-c++src", "file_extension": ".C", "mimetype": " text/x-c++src", "name": "c++" } }, "nbformat": 4, "nbformat_minor": 5 }