{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "88a5da87",
   "metadata": {},
   "source": [
    "# df026_AsNumpyArrays\n",
    "Read data from RDataFrame into NumPy arrays.\n",
    "\n",
    "\n",
    "\n",
    "\n",
    "**Author:** Stefan Wunsch (KIT, CERN)  \n",
    "<i><small>This notebook tutorial was automatically generated with <a href=\"https://github.com/root-project/root/blob/master/documentation/doxygen/converttonotebook.py\">ROOTBOOK-izer</a> from the macro found in the ROOT repository on Tuesday, May 19, 2026 at 08:09 PM.</small></i>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "id": "60db843e",
   "metadata": {
    "collapsed": false,
    "execution": {
     "iopub.execute_input": "2026-05-19T20:10:01.369051Z",
     "iopub.status.busy": "2026-05-19T20:10:01.368925Z",
     "iopub.status.idle": "2026-05-19T20:10:02.322326Z",
     "shell.execute_reply": "2026-05-19T20:10:02.321212Z"
    }
   },
   "outputs": [],
   "source": [
    "import ROOT\n",
    "from sys import exit"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "ad776f28",
   "metadata": {},
   "source": [
    "Let's create a simple RDataFrame with ten rows and two columns, `x` and `y`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "id": "5355e103",
   "metadata": {
    "collapsed": false,
    "execution": {
     "iopub.execute_input": "2026-05-19T20:10:02.324023Z",
     "iopub.status.busy": "2026-05-19T20:10:02.323894Z",
     "iopub.status.idle": "2026-05-19T20:10:02.843169Z",
     "shell.execute_reply": "2026-05-19T20:10:02.842805Z"
    }
   },
   "outputs": [],
   "source": [
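    "# rdfentry_ holds the index of the current entry (0 to 9 here).\n",
    "# The cast makes x an int column; the 1.f literals make y a float column.\n",
    "df = (ROOT.RDataFrame(10)\n",
    "      .Define(\"x\", \"(int)rdfentry_\")\n",
    "      .Define(\"y\", \"1.f/(1.f+rdfentry_)\"))"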
   ]
  },
  {
   "cell_type": "markdown",
   "id": "0e9c2471",
   "metadata": {},
   "source": [
    "Next, we want to access the data from Python as NumPy arrays. To do so, the\n",
    "content of the dataframe is converted using the `AsNumpy` method. The returned\n",
    "object is a dictionary with the column names as keys and 1D NumPy arrays holding\n",
    "the column contents as values."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "id": "7b45d865",
   "metadata": {
    "collapsed": false,
    "execution": {
     "iopub.execute_input": "2026-05-19T20:10:02.845239Z",
     "iopub.status.busy": "2026-05-19T20:10:02.845122Z",
     "iopub.status.idle": "2026-05-19T20:10:04.681852Z",
     "shell.execute_reply": "2026-05-19T20:10:04.680932Z"
    }
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Read-out of the full RDataFrame:\n",
      "{'x': ndarray([0, 1, 2, 3, 4, 5, 6, 7, 8, 9], dtype=int32), 'y': ndarray([1.        , 0.5       , 0.33333334, 0.25      , 0.2       ,\n",
      "         0.16666667, 0.14285715, 0.125     , 0.11111111, 0.1       ],\n",
      "        dtype=float32)}\n",
      "\n"
     ]
    }
   ],
   "source": [
    "npy = df.AsNumpy()\n",
    "print(\"Read-out of the full RDataFrame:\\n{}\\n\".format(npy))"
   ]
  },
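  {
   "cell_type": "markdown",
   "id": "sketch-dict-postprocessing",
   "metadata": {},
   "source": [
    "Because the result is a plain dictionary of NumPy arrays, any NumPy code can operate on it. The cell below is a minimal sketch that is not part of the original tutorial. Assuming the values and dtypes printed above, it rebuilds an equivalent dictionary by hand (the name `npy_sketch` is illustrative), so it also runs without ROOT. It then applies two typical post-processing steps: element-wise arithmetic on whole columns, and stacking the columns into a 2D matrix, a layout commonly expected as input by machine learning libraries."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "sketch-dict-postprocessing-code",
   "metadata": {},
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "\n",
    "# Hand-made stand-in for the dictionary returned by df.AsNumpy() above\n",
    "# (assumption: same values and dtypes as the printed output).\n",
    "entries = np.arange(10)\n",
    "npy_sketch = {\n",
    "    \"x\": entries.astype(np.int32),\n",
    "    # Mirrors the C++ expression 1.f/(1.f+rdfentry_) in single precision.\n",
    "    \"y\": np.float32(1.0) / (np.float32(1.0) + entries.astype(np.float32)),\n",
    "}\n",
    "\n",
    "# Element-wise arithmetic on whole columns, without a Python loop.\n",
    "product = npy_sketch[\"x\"] * npy_sketch[\"y\"]\n",
    "\n",
    "# Stack the columns into an (n_entries, n_columns) matrix with a common dtype.\n",
    "features = np.column_stack([npy_sketch[\"x\"], npy_sketch[\"y\"]]).astype(np.float32)\n",
    "print(features.shape)"
   ]
  },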
  {
   "cell_type": "markdown",
   "id": "50b57743",
   "metadata": {},
   "source": [
    "Since reading data out into memory is expensive, always try to read out only what\n",
    "is needed for your analysis. You can use all RDataFrame features to reduce your\n",
    "dataset, e.g., the `Filter` transformation. Furthermore, you can pass to the\n",
    "`AsNumpy` method a list of columns to include with the option `columns`, or a list\n",
    "of columns to skip with the option `exclude`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "id": "0b96238c",
   "metadata": {
    "collapsed": false,
    "execution": {
     "iopub.execute_input": "2026-05-19T20:10:04.683423Z",
     "iopub.status.busy": "2026-05-19T20:10:04.683221Z",
     "iopub.status.idle": "2026-05-19T20:10:05.719367Z",
     "shell.execute_reply": "2026-05-19T20:10:05.718585Z"
    }
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Read-out of the filtered RDataFrame:\n",
      "{'x': ndarray([6, 7, 8, 9], dtype=int32), 'y': ndarray([0.14285715, 0.125     , 0.11111111, 0.1       ], dtype=float32)}\n",
      "\n",
      "Read-out of the filtered RDataFrame with the columns option:\n",
      "{'x': ndarray([6, 7, 8, 9], dtype=int32)}\n",
      "\n",
      "Read-out of the filtered RDataFrame with the exclude option:\n",
      "{'y': ndarray([0.14285715, 0.125     , 0.11111111, 0.1       ], dtype=float32)}\n",
      "\n"
     ]
    }
   ],
   "source": [
    "df2 = df.Filter(\"x>5\")\n",
    "npy2 = df2.AsNumpy()\n",
    "print(\"Read-out of the filtered RDataFrame:\\n{}\\n\".format(npy2))\n",
    "\n",
    "npy3 = df2.AsNumpy(columns=[\"x\"])\n",
    "print(\"Read-out of the filtered RDataFrame with the columns option:\\n{}\\n\".format(npy3))\n",
    "\n",
    "npy4 = df2.AsNumpy(exclude=[\"x\"])\n",
    "print(\"Read-out of the filtered RDataFrame with the exclude option:\\n{}\\n\".format(npy4))"
   ]
  },
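  {
   "cell_type": "markdown",
   "id": "sketch-selection-semantics",
   "metadata": {},
   "source": [
    "To make the semantics of `Filter`, `columns` and `exclude` concrete, the cell below is a minimal NumPy sketch that applies equivalent selections to a hand-made, in-memory stand-in dictionary (the names `full`, `filtered`, `only_x` and `without_x` are illustrative). It is not how RDataFrame works internally: this after-the-fact approach first materializes every row and column, which is exactly what filtering in RDataFrame before calling `AsNumpy` avoids."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "sketch-selection-semantics-code",
   "metadata": {},
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "\n",
    "# In-memory stand-in for df.AsNumpy() (assumption: same content as above).\n",
    "entries = np.arange(10)\n",
    "full = {\n",
    "    \"x\": entries.astype(np.int32),\n",
    "    \"y\": np.float32(1.0) / (np.float32(1.0) + entries.astype(np.float32)),\n",
    "}\n",
    "\n",
    "# Counterpart of Filter(\"x>5\"): one boolean mask applied to every column.\n",
    "mask = full[\"x\"] > 5\n",
    "filtered = {name: values[mask] for name, values in full.items()}\n",
    "\n",
    "# Counterpart of columns=[\"x\"]: keep only the listed columns.\n",
    "only_x = {name: filtered[name] for name in [\"x\"]}\n",
    "\n",
    "# Counterpart of exclude=[\"x\"]: keep every column except the listed ones.\n",
    "without_x = {name: values for name, values in filtered.items() if name not in [\"x\"]}\n",
    "\n",
    "print(filtered[\"x\"], list(only_x), list(without_x))"
   ]
  },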
  {
   "cell_type": "markdown",
   "id": "577e6f34",
   "metadata": {},
   "source": [
    "You can read out all kinds of objects, including those stored in ROOT files, since PyROOT\n",
    "wraps them in the Python world. However, be aware that objects other than fundamental types,\n",
    "such as complex C++ objects rather than `int` or `float`, are costly to read out."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "id": "1a358de1",
   "metadata": {
    "collapsed": false,
    "execution": {
     "iopub.execute_input": "2026-05-19T20:10:05.720771Z",
     "iopub.status.busy": "2026-05-19T20:10:05.720635Z",
     "iopub.status.idle": "2026-05-19T20:10:06.421858Z",
     "shell.execute_reply": "2026-05-19T20:10:06.421386Z"
    }
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Read-out of C++ objects:\n",
      "[<cppyy.gbl.CustomObject object at 0x55643a8030b0>\n",
      " <cppyy.gbl.CustomObject object at 0x55643a8030b4>\n",
      " <cppyy.gbl.CustomObject object at 0x55643a8030b8>\n",
      " <cppyy.gbl.CustomObject object at 0x55643a8030bc>\n",
      " <cppyy.gbl.CustomObject object at 0x55643a8030c0>\n",
      " <cppyy.gbl.CustomObject object at 0x55643a8030c4>\n",
      " <cppyy.gbl.CustomObject object at 0x55643a8030c8>\n",
      " <cppyy.gbl.CustomObject object at 0x55643a8030cc>\n",
      " <cppyy.gbl.CustomObject object at 0x55643a8030d0>\n",
      " <cppyy.gbl.CustomObject object at 0x55643a8030d4>]\n"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "\n",
      "Access to all methods and data members of the C++ object:\n",
      "Object: <cppyy.gbl.CustomObject object at 0x55643a8030b0>\n",
      "Access data member: custom_object.x = 42\n",
      "\n"
     ]
    }
   ],
   "source": [
    "ROOT.gInterpreter.Declare(\"\"\"\n",
    "// Inject the C++ class CustomObject in the C++ runtime.\n",
    "class CustomObject {\n",
    "public:\n",
    "    int x = 42;\n",
    "};\n",
    "// Create a function that returns such an object. This is called to fill the dataframe.\n",
    "CustomObject fill_object() { return CustomObject(); }\n",
    "\"\"\")\n",
    "\n",
    "df3 = df.Define(\"custom_object\", \"fill_object()\")\n",
    "npy5 = df3.AsNumpy()\n",
    "print(\"Read-out of C++ objects:\\n{}\\n\".format(npy5[\"custom_object\"]))\n",
    "first_object = npy5[\"custom_object\"][0]\n",
    "print(\"Access to all methods and data members of the C++ object:\\nObject: {}\\nAccess data member: custom_object.x = {}\\n\".format(\n",
    "    repr(first_object), first_object.x))"
   ]
  },
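  {
   "cell_type": "markdown",
   "id": "sketch-object-member-extraction",
   "metadata": {},
   "source": [
    "The printed output above shows that `AsNumpy` hands C++ objects back as an array holding one wrapped Python object per entry, so extracting a data member requires a Python-level loop over the entries. The cell below sketches this pattern with a hypothetical pure-Python class `StandInObject`, so it runs without ROOT; with the real `npy5[\"custom_object\"]` the same list comprehension should apply. When only a fundamental member is needed, it is usually cheaper to define it as its own column in RDataFrame, e.g. `df3.Define(\"custom_x\", \"custom_object.x\")` (the column name `custom_x` is illustrative), and read out that column instead."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "sketch-object-member-extraction-code",
   "metadata": {},
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "\n",
    "\n",
    "# Hypothetical pure-Python stand-in for the C++ class CustomObject.\n",
    "class StandInObject:\n",
    "    def __init__(self):\n",
    "        self.x = 42\n",
    "\n",
    "\n",
    "# Object-dtype array with one Python object per entry, similar in spirit to npy5[\"custom_object\"].\n",
    "objects = np.array([StandInObject() for _ in range(10)], dtype=object)\n",
    "\n",
    "# Pull the data member out entry by entry (a Python-level loop) into a compact int32 array.\n",
    "member_x = np.array([obj.x for obj in objects], dtype=np.int32)\n",
    "print(objects.dtype, member_x.dtype, member_x[:3])"
   ]
  },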
  {
   "cell_type": "markdown",
   "id": "20a3d0bb",
   "metadata": {},
   "source": [
    "Note that you can pass the dictionary returned by `AsNumpy` directly to `pandas.DataFrame`,\n",
    "including columns that hold complex C++ objects."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "id": "03841dc1",
   "metadata": {
    "collapsed": false,
    "execution": {
     "iopub.execute_input": "2026-05-19T20:10:06.423425Z",
     "iopub.status.busy": "2026-05-19T20:10:06.423300Z",
     "iopub.status.idle": "2026-05-19T20:10:06.670118Z",
     "shell.execute_reply": "2026-05-19T20:10:06.669318Z"
    }
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Content of the ROOT.RDataFrame as pandas.DataFrame:\n",
      "                                       custom_object  x         y\n",
      "0  <cppyy.gbl.CustomObject object at 0x55643a8030b0>  0  1.000000\n",
      "1  <cppyy.gbl.CustomObject object at 0x55643a8030b4>  1  0.500000\n",
      "2  <cppyy.gbl.CustomObject object at 0x55643a8030b8>  2  0.333333\n",
      "3  <cppyy.gbl.CustomObject object at 0x55643a8030bc>  3  0.250000\n",
      "4  <cppyy.gbl.CustomObject object at 0x55643a8030c0>  4  0.200000\n",
      "5  <cppyy.gbl.CustomObject object at 0x55643a8030c4>  5  0.166667\n",
      "6  <cppyy.gbl.CustomObject object at 0x55643a8030c8>  6  0.142857\n",
      "7  <cppyy.gbl.CustomObject object at 0x55643a8030cc>  7  0.125000\n",
      "8  <cppyy.gbl.CustomObject object at 0x55643a8030d0>  8  0.111111\n",
      "9  <cppyy.gbl.CustomObject object at 0x55643a8030d4>  9  0.100000\n",
      "\n"
     ]
    }
   ],
   "source": [
    "try:\n",
    "    import pandas\n",
    "except ImportError:\n",
    "    print(\"Please install the pandas package to run this section of the tutorial.\")\n",
    "    exit(1)\n",
    "\n",
    "# Use a new name so the RDataFrame stored in `df` is not overwritten.\n",
    "pandas_df = pandas.DataFrame(npy5)\n",
    "print(\"Content of the ROOT.RDataFrame as pandas.DataFrame:\\n{}\\n\".format(pandas_df))"
   ]
  },
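  {
   "cell_type": "markdown",
   "id": "sketch-pandas-dtypes",
   "metadata": {},
   "source": [
    "As a quick check of how the conversion treats each column, the cell below is a small sketch, assuming pandas is installed, that builds a DataFrame from a hand-made stand-in dictionary (the names `columns_sketch` and `pandas_sketch` are illustrative, and Python `Fraction` objects stand in for wrapped C++ objects). `pandas.DataFrame` keeps the NumPy dtype of each column, so the `int32` and `float32` columns stay compact while the object column keeps the `object` dtype."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "sketch-pandas-dtypes-code",
   "metadata": {},
   "outputs": [],
   "source": [
    "from fractions import Fraction\n",
    "\n",
    "import numpy as np\n",
    "import pandas\n",
    "\n",
    "# Hypothetical stand-in dictionary: two fundamental-type columns and one object column.\n",
    "entries = np.arange(10)\n",
    "columns_sketch = {\n",
    "    \"x\": entries.astype(np.int32),\n",
    "    \"y\": np.float32(1.0) / (np.float32(1.0) + entries.astype(np.float32)),\n",
    "    # Python objects play the role of the wrapped C++ objects here.\n",
    "    \"y_exact\": np.array([Fraction(1, 1 + int(i)) for i in entries], dtype=object),\n",
    "}\n",
    "\n",
    "pandas_sketch = pandas.DataFrame(columns_sketch)\n",
    "# Each column keeps the dtype of the NumPy array it was built from.\n",
    "print(pandas_sketch.dtypes)"
   ]
  },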
  {
   "cell_type": "markdown",
   "id": "ed9c3251",
   "metadata": {},
   "source": [
    "Draw all canvases"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "id": "538cdd20",
   "metadata": {
    "collapsed": false,
    "execution": {
     "iopub.execute_input": "2026-05-19T20:10:06.671482Z",
     "iopub.status.busy": "2026-05-19T20:10:06.671347Z",
     "iopub.status.idle": "2026-05-19T20:10:06.779740Z",
     "shell.execute_reply": "2026-05-19T20:10:06.778631Z"
    }
   },
   "outputs": [],
   "source": [
    "from ROOT import gROOT\n",
    "gROOT.GetListOfCanvases().Draw()"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.12.12"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}