{
"cells": [
{
"cell_type": "markdown",
"id": "0bc63b1d",
"metadata": {},
"source": [
"# df036_missingBranches\n",
"\n",
"This example shows how to process a dataset in which some entries are\n",
"incomplete because one or more branches are missing from one or more of the\n",
"files. It uses the RDataFrame operations DefaultValueFor, FilterAvailable and\n",
"FilterMissing to fill in, skip, or select the entries with missing values.\n",
"\n",
"**Author:** Vincenzo Eduardo Padulano (CERN) \n",
"This notebook tutorial was automatically generated with ROOTBOOK-izer from the macro found in the ROOT repository on Tuesday, May 19, 2026 at 08:10 PM."
]
},
{
"cell_type": "code",
"execution_count": 1,
"id": "32592120",
"metadata": {
"collapsed": false,
"execution": {
"iopub.execute_input": "2026-05-19T20:10:16.649883Z",
"iopub.status.busy": "2026-05-19T20:10:16.649784Z",
"iopub.status.idle": "2026-05-19T20:10:16.653350Z",
"shell.execute_reply": "2026-05-19T20:10:16.652961Z"
}
},
"outputs": [],
"source": [
"%%cpp -d\n",
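"#include <ROOT/RDataFrame.hxx>\n",
"#include <TChain.h>\n",
"#include <TFile.h>\n",
"#include <TTree.h>\n",
"\n",
"#include <array>\n",
"#include <cstdio>\n",
"#include <iostream>\n",
"#include <limits>\n",
"#include <string>"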
]
},
{
"cell_type": "markdown",
"id": "a215098f",
"metadata": {},
"source": [
"A helper struct that creates the dataset for the tutorial below. Its constructor writes the three ROOT files and its destructor removes them."
]
},
{
"cell_type": "code",
"execution_count": 2,
"id": "142ea902",
"metadata": {
"collapsed": false,
"execution": {
"iopub.execute_input": "2026-05-19T20:10:16.655037Z",
"iopub.status.busy": "2026-05-19T20:10:16.654922Z",
"iopub.status.idle": "2026-05-19T20:10:17.010378Z",
"shell.execute_reply": "2026-05-19T20:10:17.001381Z"
}
},
"outputs": [],
"source": [
"struct Dataset {\n",
"\n",
"   constexpr static std::array<const char *, 3> fFileNames{\"df036_missingBranches_C_file_1.root\",\n",
"                                                           \"df036_missingBranches_C_file_2.root\",\n",
"                                                           \"df036_missingBranches_C_file_3.root\"};\n",
"   constexpr static std::array<const char *, 3> fTreeNames{\"tree_1\", \"tree_2\", \"tree_3\"};\n",
"   constexpr static auto fTreeEntries{5};\n",
"\n",
"   Dataset()\n",
"   {\n",
"      // First file: both branches, x = i and y = 2 * i\n",
"      {\n",
"         TFile f(fFileNames[0], \"RECREATE\");\n",
"         TTree t(fTreeNames[0], fTreeNames[0]);\n",
"         int x{};\n",
"         int y{};\n",
"         t.Branch(\"x\", &x, \"x/I\");\n",
"         t.Branch(\"y\", &y, \"y/I\");\n",
"         for (int i = 1; i <= fTreeEntries; i++) {\n",
"            x = i;\n",
"            y = 2 * i;\n",
"            t.Fill();\n",
"         }\n",
"\n",
"         t.Write();\n",
"      }\n",
"\n",
"      // Second file: only branch y, y = 3 * i\n",
"      {\n",
"         TFile f(fFileNames[1], \"RECREATE\");\n",
"         TTree t(fTreeNames[1], fTreeNames[1]);\n",
"         int y{};\n",
"         t.Branch(\"y\", &y, \"y/I\");\n",
"         for (int i = 1; i <= fTreeEntries; i++) {\n",
"            y = 3 * i;\n",
"            t.Fill();\n",
"         }\n",
"\n",
"         t.Write();\n",
"      }\n",
"\n",
"      // Third file: only branch x, x = 4 * i\n",
"      {\n",
"         TFile f(fFileNames[2], \"RECREATE\");\n",
"         TTree t(fTreeNames[2], fTreeNames[2]);\n",
"         int x{};\n",
"         t.Branch(\"x\", &x, \"x/I\");\n",
"         for (int i = 1; i <= fTreeEntries; i++) {\n",
"            x = 4 * i;\n",
"            t.Fill();\n",
"         }\n",
"\n",
"         t.Write();\n",
"      }\n",
"   }\n",
"\n",
"   // Remove the files created by the constructor\n",
"   ~Dataset()\n",
"   {\n",
"      for (auto &&fileName : fFileNames)\n",
"         std::remove(fileName);\n",
"   }\n",
"};"
]
},
{
"cell_type": "markdown",
"id": "b829a169",
"metadata": {},
"source": [
"Create the example dataset. Three files are created with one TTree each.\n",
"The first contains branches (x, y), the second only branch y, the third\n",
"only branch x."
]
},
{
"cell_type": "code",
"execution_count": 3,
"id": "b490699e",
"metadata": {
"collapsed": false,
"execution": {
"iopub.execute_input": "2026-05-19T20:10:17.014481Z",
"iopub.status.busy": "2026-05-19T20:10:17.014336Z",
"iopub.status.idle": "2026-05-19T20:10:17.217693Z",
"shell.execute_reply": "2026-05-19T20:10:17.217020Z"
}
},
"outputs": [],
"source": [
"Dataset trees{};"
]
},
{
"cell_type": "markdown",
"id": "161bb2da",
"metadata": {},
"source": [
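"The TChain processes the three files in order: the first tree has both\n",
"branches, the second lacks x and the third lacks y. The value used below to\n",
"fill in missing entries is the smallest int, which none of the branches contains."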
]
},
{
"cell_type": "code",
"execution_count": 4,
"id": "6100ae74",
"metadata": {
"collapsed": false,
"execution": {
"iopub.execute_input": "2026-05-19T20:10:17.219453Z",
"iopub.status.busy": "2026-05-19T20:10:17.219342Z",
"iopub.status.idle": "2026-05-19T20:10:17.430932Z",
"shell.execute_reply": "2026-05-19T20:10:17.426182Z"
}
},
"outputs": [],
"source": [
"TChain c{};\n",
"for (std::size_t i = 0; i < trees.fFileNames.size(); i++) {\n",
"   // \"file.root?#treename\" selects the tree with the given name inside the file\n",
"   const auto fullPath = std::string(trees.fFileNames[i]) + \"?#\" + trees.fTreeNames[i];\n",
"   c.Add(fullPath.c_str());\n",
"}\n",
"\n",
"ROOT::RDataFrame df{c};\n",
"\n",
"constexpr static auto defaultValue = std::numeric_limits<int>::min();"
]
},
{
"cell_type": "markdown",
"id": "25cca832",
"metadata": {},
"source": [
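"Example 1: provide a default value for all missing branches. All 15 entries are kept, and an entry from a file that lacks x or y shows the default value in that column."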
]
},
{
"cell_type": "code",
"execution_count": 5,
"id": "d483dd08",
"metadata": {
"collapsed": false,
"execution": {
"iopub.execute_input": "2026-05-19T20:10:17.437726Z",
"iopub.status.busy": "2026-05-19T20:10:17.437546Z",
"iopub.status.idle": "2026-05-19T20:10:18.750622Z",
"shell.execute_reply": "2026-05-19T20:10:18.750240Z"
}
},
"outputs": [],
"source": [
"auto display1 = df.DefaultValueFor(\"x\", defaultValue)\n",
"                  .DefaultValueFor(\"y\", defaultValue)\n",
"                  .Display({\"x\", \"y\"}, /*nRows*/ 15);"
]
},
{
"cell_type": "markdown",
"id": "96708720",
"metadata": {},
"source": [
"Example 2: provide a default value for branch y, but skip events where\n",
"branch x is missing. The five entries from the second file, which has no x, are dropped."
]
},
{
"cell_type": "code",
"execution_count": 6,
"id": "635980d9",
"metadata": {
"collapsed": false,
"execution": {
"iopub.execute_input": "2026-05-19T20:10:18.753248Z",
"iopub.status.busy": "2026-05-19T20:10:18.753130Z",
"iopub.status.idle": "2026-05-19T20:10:19.550030Z",
"shell.execute_reply": "2026-05-19T20:10:19.549530Z"
}
},
"outputs": [],
"source": [
"auto display2 =\n",
"   df.DefaultValueFor(\"y\", defaultValue).FilterAvailable(\"x\").Display({\"x\", \"y\"}, /*nRows*/ 15);"
]
},
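{
"cell_type": "markdown",
"id": "7c41e9a2",
"metadata": {},
"source": [
"One way to check how many entries a FilterAvailable node keeps is to book a Count on it. The cell below is a minimal sketch and not part of the original macro; the names nWithX and nWithY are illustrative."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "b85d03fe",
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"// Illustrative sketch: nWithX and nWithY are hypothetical names, not from the original macro.\n",
"// Each Count is booked on a node that only lets through entries where the branch is available.\n",
"auto nWithX = df.FilterAvailable(\"x\").Count();\n",
"auto nWithY = df.FilterAvailable(\"y\").Count();\n",
"\n",
"// Dereferencing a result runs the event loop for the actions booked so far.\n",
"std::cout << \"Entries where x is available: \" << *nWithX << \"\\n\";\n",
"std::cout << \"Entries where y is available: \" << *nWithY << \"\\n\";"
]
},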
{
"cell_type": "markdown",
"id": "3b28a5c9",
"metadata": {},
"source": [
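"Example 3: only keep events where branch y is missing and display values for branch x. Only the five entries from the third file remain. The same cell then prints the tables of all three examples."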
]
},
{
"cell_type": "code",
"execution_count": 7,
"id": "e851dda8",
"metadata": {
"collapsed": false,
"execution": {
"iopub.execute_input": "2026-05-19T20:10:19.564867Z",
"iopub.status.busy": "2026-05-19T20:10:19.564735Z",
"iopub.status.idle": "2026-05-19T20:10:20.210403Z",
"shell.execute_reply": "2026-05-19T20:10:20.209224Z"
}
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Example 1: provide a default value for all missing branches\n",
"+-----+-------------+-------------+\n",
"| Row | x           | y           | \n",
"+-----+-------------+-------------+\n",
"| 0   | 1           | 2           | \n",
"+-----+-------------+-------------+\n",
"| 1   | 2           | 4           | \n",
"+-----+-------------+-------------+\n",
"| 2   | 3           | 6           | \n",
"+-----+-------------+-------------+\n",
"| 3   | 4           | 8           | \n",
"+-----+-------------+-------------+\n",
"| 4   | 5           | 10          | \n",
"+-----+-------------+-------------+\n",
"| 5   | -2147483648 | 3           | \n",
"+-----+-------------+-------------+\n",
"| 6   | -2147483648 | 6           | \n",
"+-----+-------------+-------------+\n",
"| 7   | -2147483648 | 9           | \n",
"+-----+-------------+-------------+\n",
"| 8   | -2147483648 | 12          | \n",
"+-----+-------------+-------------+\n",
"| 9   | -2147483648 | 15          | \n",
"+-----+-------------+-------------+\n",
"| 10  | 4           | -2147483648 | \n",
"+-----+-------------+-------------+\n",
"| 11  | 8           | -2147483648 | \n",
"+-----+-------------+-------------+\n",
"| 12  | 12          | -2147483648 | \n",
"+-----+-------------+-------------+\n",
"| 13  | 16          | -2147483648 | \n",
"+-----+-------------+-------------+\n",
"| 14  | 20          | -2147483648 | \n",
"+-----+-------------+-------------+\n",
"Example 2: provide a default value for branch y, but skip events where branch x is missing\n",
"+-----+----+-------------+\n",
"| Row | x  | y           | \n",
"+-----+----+-------------+\n",
"| 0   | 1  | 2           | \n",
"+-----+----+-------------+\n",
"| 1   | 2  | 4           | \n",
"+-----+----+-------------+\n",
"| 2   | 3  | 6           | \n",
"+-----+----+-------------+\n",
"| 3   | 4  | 8           | \n",
"+-----+----+-------------+\n",
"| 4   | 5  | 10          | \n",
"+-----+----+-------------+\n",
"| 10  | 4  | -2147483648 | \n",
"+-----+----+-------------+\n",
"| 11  | 8  | -2147483648 | \n",
"+-----+----+-------------+\n",
"| 12  | 12 | -2147483648 | \n",
"+-----+----+-------------+\n",
"| 13  | 16 | -2147483648 | \n",
"+-----+----+-------------+\n",
"| 14  | 20 | -2147483648 | \n",
"+-----+----+-------------+\n",
"Example 3: only keep events where branch y is missing and display values for branch x\n",
"+-----+----+\n",
"| Row | x  | \n",
"+-----+----+\n",
"| 10  | 4  | \n",
"+-----+----+\n",
"| 11  | 8  | \n",
"+-----+----+\n",
"| 12  | 12 | \n",
"+-----+----+\n",
"| 13  | 16 | \n",
"+-----+----+\n",
"| 14  | 20 | \n",
"+-----+----+\n"
]
}
],
"source": [
"auto display3 = df.FilterMissing(\"y\").Display({\"x\"}, /*nRows*/ 15);\n",
"\n",
"std::cout << \"Example 1: provide a default value for all missing branches\\n\";\n",
"display1->Print();\n",
"\n",
"std::cout << \"Example 2: provide a default value for branch y, but skip events where branch x is missing\\n\";\n",
"display2->Print();\n",
"\n",
"std::cout << \"Example 3: only keep events where branch y is missing and display values for branch x\\n\";\n",
"display3->Print();"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "ROOT C++",
"language": "c++",
"name": "root"
},
"language_info": {
"codemirror_mode": "text/x-c++src",
"file_extension": ".C",
"mimetype": "text/x-c++src",
"name": "c++"
}
},
"nbformat": 4,
"nbformat_minor": 5
}