{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "dafa2bda",
   "metadata": {},
   "source": [
    "# df036_missingBranches\n",
    "\n",
    "This example shows how to process a dataset where entries might be\n",
    "incomplete due to one or more missing branches in one or more of the files\n",
    "in the dataset. It shows usage of the FilterAvailable and DefaultValueFor\n",
    "RDataFrame functionalities to act upon the missing entries.\n",
    "\n",
    "\n",
    "\n",
    "\n",
    "**Author:** Vincenzo Eduardo Padulano (CERN)  \n",
    "<i><small>This notebook tutorial was automatically generated with <a href= \"https://github.com/root-project/root/blob/master/documentation/doxygen/converttonotebook.py\">ROOTBOOK-izer</a> from the macro found in the ROOT repository  on Tuesday, May 19, 2026 at 08:10 PM.</small></i>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "62ac143b",
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "import array\n",
    "import os\n",
    "\n",
    "import ROOT\n",
    "\n",
    "\n",
    "class DatasetContext:\n",
    "    \"\"\"A helper class to create the dataset for the tutorial below.\"\"\"\n",
    "\n",
    "    filenames = [\n",
    "        \"df036_missingBranches_py_file_1.root\",\n",
    "        \"df036_missingBranches_py_file_2.root\",\n",
    "        \"df036_missingBranches_py_file_3.root\",\n",
    "    ]\n",
    "    treenames = [\"tree_1\", \"tree_2\", \"tree_3\"]\n",
    "    nentries = 5\n",
    "\n",
    "    def __init__(self):\n",
    "        with ROOT.TFile(self.filenames[0], \"RECREATE\"):\n",
    "            t = ROOT.TTree(self.treenames[0], self.treenames[0])\n",
    "            x = array.array(\"i\", [0])  # any array can also be a numpy array\n",
    "            y = array.array(\"i\", [0])\n",
    "            t.Branch(\"x\", x, \"x/I\")\n",
    "            t.Branch(\"y\", y, \"y/I\")\n",
    "\n",
    "            for i in range(1, self.nentries + 1):\n",
    "                x[0] = i\n",
    "                y[0] = 2 * i\n",
    "                t.Fill()\n",
    "\n",
    "            t.Write()\n",
    "\n",
    "        with ROOT.TFile(self.filenames[1], \"RECREATE\"):\n",
    "            t = ROOT.TTree(self.treenames[1], self.treenames[1])\n",
    "            y = array.array(\"i\", [0])  # any array can also be a numpy array\n",
    "            t.Branch(\"y\", y, \"y/I\")\n",
    "\n",
    "            for i in range(1, self.nentries + 1):\n",
    "                y[0] = 3 * i\n",
    "                t.Fill()\n",
    "\n",
    "            t.Write()\n",
    "\n",
    "        with ROOT.TFile(self.filenames[2], \"RECREATE\"):\n",
    "            t = ROOT.TTree(self.treenames[2], self.treenames[2])\n",
    "            x = array.array(\"i\", [0])  # any array can also be a numpy array\n",
    "            t.Branch(\"x\", x, \"x/I\")\n",
    "\n",
    "            for i in range(1, self.nentries + 1):\n",
    "                x[0] = 4 * i\n",
    "                t.Fill()\n",
    "\n",
    "            t.Write()\n",
    "\n",
    "    def __enter__(self):\n",
    "        \"\"\"Enable using the class as a context manager.\"\"\"\n",
    "        return self\n",
    "\n",
    "    def __exit__(self, *_):\n",
    "        \"\"\"\n",
    "        Enable using the class as a context manager. At the end of the context,\n",
    "        remove the files created.\n",
    "        \"\"\"\n",
    "        for filename in self.filenames:\n",
    "            os.remove(filename)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "3ea77bfe",
   "metadata": {},
   "source": [
    "The input dataset contains three files, with one TTree each.\n",
    "The first contains branches (x, y), the second only branch y, the third\n",
    "only branch x. The TChain will process the three files, encountering a\n",
    "different missing branch when switching to the next tree"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "1111544c",
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "chain = ROOT.TChain()\n",
    "for fname, tname in zip(dataset.filenames, dataset.treenames):\n",
    "    chain.Add(fname + \"?#\" + tname)\n",
    "\n",
    "df = ROOT.RDataFrame(chain)\n",
    "\n",
    "default_value = ROOT.std.numeric_limits[int].min()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "61ce47fc",
   "metadata": {},
   "source": [
    "Example 1: provide a default value for all missing branches"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "98d914d5",
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "display_1 = (\n",
    "    df.DefaultValueFor(\"x\", default_value)\n",
    "    .DefaultValueFor(\"y\", default_value)\n",
    "    .Display(columnList=(\"x\", \"y\"), nRows=15)\n",
    ")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "91805e65",
   "metadata": {},
   "source": [
    "Example 2: provide a default value for branch y, but skip events where\n",
    "branch x is missing"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "5275d8b6",
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "display_2 = df.DefaultValueFor(\"y\", default_value).FilterAvailable(\"x\").Display(columnList=(\"x\", \"y\"), nRows=15)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "166ed1a4",
   "metadata": {},
   "source": [
    "Example 3: only keep events where branch y is missing and display values for branch x"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "9c14710b",
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "display_3 = df.FilterMissing(\"y\").Display(columnList=(\"x\",), nRows=15)\n",
    "\n",
    "print(\"Example 1: provide a default value for all missing branches\")\n",
    "display_1.Print()\n",
    "print(\"Example 2: provide a default value for branch y, but skip events where branch x is missing\")\n",
    "display_2.Print()\n",
    "print(\"Example 3: only keep events where branch y is missing and display values for branch x\")\n",
    "display_3.Print()"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 2
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython2",
   "version": "2.7.10"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}
