{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "2828e849",
   "metadata": {},
   "source": [
    "# df001_introduction\n",
    "Basic usage of RDataFrame from Python.\n",
    "\n",
    "This tutorial illustrates the basic features of the RDataFrame class,\n",
    "a utility for interacting with data stored in TTrees through a chain of\n",
    "functional-style operations.\n",
    "\n",
    "\n",
    "\n",
    "\n",
    "**Author:** Danilo Piparo (CERN)  \n",
    "<i><small>This notebook tutorial was automatically generated with <a href= \"https://github.com/root-project/root/blob/master/documentation/doxygen/converttonotebook.py\">ROOTBOOK-izer</a> from the macro found in the ROOT repository  on Tuesday, May 19, 2026 at 08:09 PM.</small></i>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "id": "ec7333c1",
   "metadata": {
    "collapsed": false,
    "execution": {
     "iopub.execute_input": "2026-05-19T20:09:17.824436Z",
     "iopub.status.busy": "2026-05-19T20:09:17.824296Z",
     "iopub.status.idle": "2026-05-19T20:09:18.801434Z",
     "shell.execute_reply": "2026-05-19T20:09:18.801069Z"
    }
   },
   "outputs": [],
   "source": [
    "import ROOT\n",
    "\n",
    "def fill_tree(treeName, fileName):\n",
    "    \"\"\"A simple helper function to fill a test tree: this makes the example stand-alone.\"\"\"\n",
    "    df = ROOT.RDataFrame(10)\n",
    "    df.Define(\"b1\", \"static_cast<double>(rdfentry_)\")\\\n",
    "      .Define(\"b2\", \"static_cast<int>(rdfentry_ * rdfentry_)\").Snapshot(treeName, fileName)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "587b593b",
   "metadata": {},
   "source": [
    "We prepare an input tree to run on."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "id": "5412c42a",
   "metadata": {
    "collapsed": false,
    "execution": {
     "iopub.execute_input": "2026-05-19T20:09:18.813456Z",
     "iopub.status.busy": "2026-05-19T20:09:18.813306Z",
     "iopub.status.idle": "2026-05-19T20:09:20.564094Z",
     "shell.execute_reply": "2026-05-19T20:09:20.563458Z"
    }
   },
   "outputs": [],
   "source": [
    "fileName = \"df001_introduction_py.root\"\n",
    "treeName = \"myTree\"\n",
    "fill_tree(treeName, fileName)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "58d90f99",
   "metadata": {},
   "source": [
    "We read the tree from the file and create an RDataFrame, a class that\n",
    "allows us to interact with the data contained in the tree."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "id": "6bca4e28",
   "metadata": {
    "collapsed": false,
    "execution": {
     "iopub.execute_input": "2026-05-19T20:09:20.572877Z",
     "iopub.status.busy": "2026-05-19T20:09:20.572748Z",
     "iopub.status.idle": "2026-05-19T20:09:20.687365Z",
     "shell.execute_reply": "2026-05-19T20:09:20.686639Z"
    }
   },
   "outputs": [],
   "source": [
    "d = ROOT.RDataFrame(treeName, fileName)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "8c0e4176",
   "metadata": {},
   "source": [
    "## Operations on the dataframe\n",
    "We now review some *actions* which can be performed on the data frame.\n",
    "Actions can be divided into instant actions (e.g. `Foreach()`) and lazy\n",
    "actions (e.g. `Count()`), depending on whether they trigger the event\n",
    "loop immediately or only when one of the results is accessed for the\n",
    "first time. Actions that return \"something\" return their result wrapped\n",
    "either in an RResultPtr or in an RDataFrame.\n",
    "\n",
    "But first of all, let us define our cut-flow with two strings.\n",
    "Filters can be expressed as strings. The content must be C++ code, the\n",
    "variable names must match the branch names, and the code is\n",
    "just-in-time compiled."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "id": "3eeb566e",
   "metadata": {
    "collapsed": false,
    "execution": {
     "iopub.execute_input": "2026-05-19T20:09:20.689141Z",
     "iopub.status.busy": "2026-05-19T20:09:20.689019Z",
     "iopub.status.idle": "2026-05-19T20:09:20.791941Z",
     "shell.execute_reply": "2026-05-19T20:09:20.791560Z"
    }
   },
   "outputs": [],
   "source": [
    "cutb1 = 'b1 < 5.'\n",
    "cutb1b2 = 'b2 % 2 && b1 < 4.'"
   ]
  },
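  {
   "cell_type": "markdown",
   "id": "xcheck-cuts-md",
   "metadata": {},
   "source": [
    "As a cross-check that does not need ROOT, the two cuts can be evaluated by\n",
    "hand in plain Python. This is only a sketch: it assumes the tree holds the\n",
    "10 entries written by `fill_tree`, with *b1* equal to the entry number and\n",
    "*b2* equal to its square. The names `entries`, `pass_cutb1` and\n",
    "`pass_cutb1b2` are illustrative and not part of the tutorial."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "xcheck-cuts-code",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Plain-Python cross-check of the two cuts; no ROOT involved.\n",
    "# Assumption: 10 entries with b1 = entry number and b2 = entry number squared.\n",
    "entries = [(float(i), i * i) for i in range(10)]\n",
    "\n",
    "# Same conditions as the C++ strings 'b1 < 5.' and 'b2 % 2 && b1 < 4.'\n",
    "pass_cutb1 = [b1 for b1, b2 in entries if b1 < 5.]\n",
    "pass_cutb1b2 = [b1 for b1, b2 in entries if b2 % 2 and b1 < 4.]\n",
    "\n",
    "print(pass_cutb1)    # [0.0, 1.0, 2.0, 3.0, 4.0]\n",
    "print(pass_cutb1b2)  # [1.0, 3.0]"
   ]
  },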
  {
   "cell_type": "markdown",
   "id": "9abad312",
   "metadata": {},
   "source": [
    "### `Count` action\n",
    "The `Count` action retrieves the number of entries that passed the\n",
    "filters. Here we show how the columns are selected automatically when\n",
    "the user specifies none."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "id": "1df44457",
   "metadata": {
    "collapsed": false,
    "execution": {
     "iopub.execute_input": "2026-05-19T20:09:20.801669Z",
     "iopub.status.busy": "2026-05-19T20:09:20.801511Z",
     "iopub.status.idle": "2026-05-19T20:09:21.886567Z",
     "shell.execute_reply": "2026-05-19T20:09:21.885959Z"
    }
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "2 entries passed all filters\n",
      "5 entries passed all filters\n"
     ]
    }
    ],
   "source": [
    "entries1 = d.Filter(cutb1) \\\n",
    "            .Filter(cutb1b2) \\\n",
    "            .Count()\n",
    "\n",
    "print('{} entries passed all filters'.format(entries1.GetValue()))\n",
    "\n",
    "entries2 = d.Filter(\"b1 < 5.\").Count()\n",
    "print('{} entries passed all filters'.format(entries2.GetValue()))"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "65a735e6",
   "metadata": {},
   "source": [
    "### `Min`, `Max` and `Mean` actions\n",
    "These actions retrieve statistical information about the entries\n",
    "passing the cuts, if any."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "id": "33501e17",
   "metadata": {
    "collapsed": false,
    "execution": {
     "iopub.execute_input": "2026-05-19T20:09:21.899173Z",
     "iopub.status.busy": "2026-05-19T20:09:21.899031Z",
     "iopub.status.idle": "2026-05-19T20:09:23.128700Z",
     "shell.execute_reply": "2026-05-19T20:09:23.128230Z"
    }
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "The mean is always included between the min and the max: 1.0 <= 2.0 <= 3.0\n"
     ]
    }
   ],
   "source": [
    "b1b2_cut = d.Filter(cutb1b2)\n",
    "minVal = b1b2_cut.Min('b1')\n",
    "maxVal = b1b2_cut.Max('b1')\n",
    "meanVal = b1b2_cut.Mean('b1')\n",
    "nonDefmeanVal = b1b2_cut.Mean(\"b2\")\n",
    "print('The mean is always included between the min and the max: {0} <= {1} <= {2}'.format(minVal.GetValue(), meanVal.GetValue(), maxVal.GetValue()))"
   ]
  },
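  {
   "cell_type": "markdown",
   "id": "xcheck-stats-md",
   "metadata": {},
   "source": [
    "The values printed above can be reproduced without ROOT. The sketch below\n",
    "assumes the same 10 entries (*b1* is the entry number, *b2* its square) and\n",
    "applies the `cutb1b2` condition by hand. The names `selected`, `b1_values`,\n",
    "`b2_values`, `mean_b1` and `mean_b2` are illustrative only."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "xcheck-stats-code",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Plain-Python cross-check of Min, Max and Mean for the cutb1b2 selection.\n",
    "# Assumption: 10 entries with b1 = entry number and b2 = entry number squared.\n",
    "selected = [(float(i), i * i) for i in range(10) if (i * i) % 2 and i < 4.]\n",
    "b1_values = [b1 for b1, b2 in selected]\n",
    "b2_values = [b2 for b1, b2 in selected]\n",
    "\n",
    "mean_b1 = sum(b1_values) / len(b1_values)\n",
    "mean_b2 = sum(b2_values) / len(b2_values)\n",
    "\n",
    "print(min(b1_values), mean_b1, max(b1_values))  # 1.0 2.0 3.0\n",
    "print(mean_b2)  # 5.0"
   ]
  },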
  {
   "cell_type": "markdown",
   "id": "f1a4af6b",
   "metadata": {},
   "source": [
    "### `Histo1D` action\n",
    "The `Histo1D` action fills a histogram. It returns a TH1D filled\n",
    "with the values of the column that passed the filters. For the most common\n",
    "types, the type of the values stored in the column is automatically\n",
    "guessed."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "id": "a90c10e3",
   "metadata": {
    "collapsed": false,
    "execution": {
     "iopub.execute_input": "2026-05-19T20:09:23.135650Z",
     "iopub.status.busy": "2026-05-19T20:09:23.135505Z",
     "iopub.status.idle": "2026-05-19T20:09:23.897559Z",
     "shell.execute_reply": "2026-05-19T20:09:23.888459Z"
    }
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Filled h 5.0 times, mean: 2.0\n"
     ]
    }
   ],
   "source": [
    "hist = d.Filter(cutb1).Histo1D('b1')\n",
    "print('Filled h {0} times, mean: {1}'.format(hist.GetEntries(), hist.GetMean()))"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "d673426f",
   "metadata": {},
   "source": [
    "### Express your chain of operations with clarity!\n",
    "We are discussing a simple example here, but it is not hard to imagine much\n",
    "more complex pipelines of actions acting on data. Those require well\n",
    "organised code, for example to add filters conditionally or to clearly\n",
    "separate filters from actions, without writing the entire pipeline on one\n",
    "line. This is easily achieved by storing intermediate nodes in variables.\n",
    "We show this by reworking the `Count` example:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "id": "59ec6aec",
   "metadata": {
    "collapsed": false,
    "execution": {
     "iopub.execute_input": "2026-05-19T20:09:23.911327Z",
     "iopub.status.busy": "2026-05-19T20:09:23.911190Z",
     "iopub.status.idle": "2026-05-19T20:09:24.014390Z",
     "shell.execute_reply": "2026-05-19T20:09:24.014011Z"
    }
   },
   "outputs": [],
   "source": [
    "cutb1_result = d.Filter(cutb1)\n",
    "cutb1b2_result = d.Filter(cutb1b2)\n",
    "cutb1_cutb1b2_result = cutb1_result.Filter(cutb1b2)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "5a5246b6",
   "metadata": {},
   "source": [
    "Now we want to count:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "id": "e64ffe45",
   "metadata": {
    "collapsed": false,
    "execution": {
     "iopub.execute_input": "2026-05-19T20:09:24.023149Z",
     "iopub.status.busy": "2026-05-19T20:09:24.023026Z",
     "iopub.status.idle": "2026-05-19T20:09:24.126907Z",
     "shell.execute_reply": "2026-05-19T20:09:24.126498Z"
    }
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Events passing cutb1: 5\n",
      "Events passing cutb1b2: 2\n",
      "Events passing both: 2\n"
     ]
    }
   ],
   "source": [
    "evts_cutb1_result = cutb1_result.Count()\n",
    "evts_cutb1b2_result = cutb1b2_result.Count()\n",
    "evts_cutb1_cutb1b2_result = cutb1_cutb1b2_result.Count()\n",
    "\n",
    "print('Events passing cutb1: {}'.format(evts_cutb1_result.GetValue()))\n",
    "print('Events passing cutb1b2: {}'.format(evts_cutb1b2_result.GetValue()))\n",
    "print('Events passing both: {}'.format(evts_cutb1_cutb1b2_result.GetValue()))"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "4369c5fa",
   "metadata": {},
   "source": [
    "## Calculating quantities starting from existing columns\n",
    "Often, operations need to be carried out on quantities calculated starting\n",
    "from the ones present in the columns. In this example we create a third\n",
    "column, the values of which are the sum of the *b1* and *b2* ones, entry by\n",
    "entry. The new quantity is defined here via a string holding a C++\n",
    "expression.\n",
    "It is important to note three aspects at this point:\n",
    "\n",
    "- The value is created on the fly only if the entry passed the existing\n",
    "filters.\n",
    "- The newly created column behaves like the ones present in the file on disk.\n",
    "- The operation creates a new value, without modifying anything. De facto,\n",
    "this is like having a general container at disposal able to accommodate\n",
    "any value of any type.\n",
    "\n",
    "Let's dive into an example:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "id": "155f5e92",
   "metadata": {
    "collapsed": false,
    "execution": {
     "iopub.execute_input": "2026-05-19T20:09:24.131618Z",
     "iopub.status.busy": "2026-05-19T20:09:24.131486Z",
     "iopub.status.idle": "2026-05-19T20:09:24.375988Z",
     "shell.execute_reply": "2026-05-19T20:09:24.375524Z"
    }
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "8\n"
     ]
    }
   ],
   "source": [
    "entries_sum = d.Define('sum', 'b2 + b1') \\\n",
    "               .Filter('sum > 4.2') \\\n",
    "               .Count()\n",
    "print(entries_sum.GetValue())"
   ]
  }
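  ,
  {
   "cell_type": "markdown",
   "id": "xcheck-define-md",
   "metadata": {},
   "source": [
    "The count obtained from the `Define` and `Filter` chain can also be checked\n",
    "by hand. This sketch uses no ROOT and assumes the 10 entries written by\n",
    "`fill_tree` (*b1* is the entry number, *b2* its square). The names `sums`\n",
    "and `n_pass` are illustrative only."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "xcheck-define-code",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Plain-Python cross-check of Define('sum', 'b2 + b1') followed by Filter('sum > 4.2').\n",
    "# Assumption: 10 entries with b1 = entry number and b2 = entry number squared.\n",
    "sums = [i * i + float(i) for i in range(10)]\n",
    "\n",
    "# Only entries 0 and 1 (sums 0.0 and 2.0) fail the cut.\n",
    "n_pass = sum(1 for s in sums if s > 4.2)\n",
    "print(n_pass)  # 8"
   ]
  }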
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.12.12"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}