{ "cells": [ { "cell_type": "markdown", "id": "c3da9b87", "metadata": {}, "source": [ "# df014_CSVDataSource\n", "Process a CSV file with RDataFrame and the CSV data source.\n", "\n", "This tutorial illustrates how use the RDataFrame in combination with a\n", "RDataSource. In this case we use a RCsvDS. This data source allows to read\n", "a CSV file from a RDataFrame.\n", "As a result of running this tutorial, we will produce plots of the dimuon\n", "spectrum starting from a subset of the CMS collision events of Run2010B.\n", "Dataset Reference:\n", "McCauley, T. (2014). Dimuon event information derived from the Run2010B\n", "public Mu dataset. CERN Open Data Portal.\n", "DOI: [10.7483/OPENDATA.CMS.CB8H.MFFA](http://opendata.cern.ch/record/700).\n", "\n", "\n", "\n", "\n", "**Author:** Enric Tejedor (CERN) \n", "This notebook tutorial was automatically generated with ROOTBOOK-izer from the macro found in the ROOT repository on Tuesday, May 19, 2026 at 08:09 PM." ] }, { "cell_type": "markdown", "id": "c85640e3", "metadata": {}, "source": [ "Let's first create a RDF that will read from the CSV file.\n", "The types of the columns will be automatically inferred." ] }, { "cell_type": "code", "execution_count": 1, "id": "b5034614", "metadata": { "collapsed": false, "execution": { "iopub.execute_input": "2026-05-19T20:09:41.924557Z", "iopub.status.busy": "2026-05-19T20:09:41.924414Z", "iopub.status.idle": "2026-05-19T20:09:42.341505Z", "shell.execute_reply": "2026-05-19T20:09:42.341046Z" } }, "outputs": [], "source": [ "auto fileUrl = \"http://root.cern/files/tutorials/df014_CsvDataSource_MuRun2010B.csv\";\n", "auto df = ROOT::RDF::FromCSV(fileUrl);" ] }, { "cell_type": "markdown", "id": "2febcd69", "metadata": {}, "source": [ "Now we will apply a first filter based on two columns of the CSV,\n", "and we will define a new column that will contain the invariant mass.\n", "Note how the new invariant mass column is defined from several other\n", "columns that already existed in the CSV file." ] }, { "cell_type": "code", "execution_count": 2, "id": "f8197b40", "metadata": { "collapsed": false, "execution": { "iopub.execute_input": "2026-05-19T20:09:42.359255Z", "iopub.status.busy": "2026-05-19T20:09:42.359104Z", "iopub.status.idle": "2026-05-19T20:09:42.562709Z", "shell.execute_reply": "2026-05-19T20:09:42.562079Z" } }, "outputs": [], "source": [ "auto filteredEvents =\n", " df.Filter(\"Q1 * Q2 == -1\")\n", " .Define(\"m\", \"sqrt(pow(E1 + E2, 2) - (pow(px1 + px2, 2) + pow(py1 + py2, 2) + pow(pz1 + pz2, 2)))\");" ] }, { "cell_type": "markdown", "id": "68704ac5", "metadata": {}, "source": [ "Next we create a histogram to hold the invariant mass values and we draw it." ] }, { "cell_type": "code", "execution_count": 3, "id": "1ed64cb9", "metadata": { "collapsed": false, "execution": { "iopub.execute_input": "2026-05-19T20:09:42.573072Z", "iopub.status.busy": "2026-05-19T20:09:42.572900Z", "iopub.status.idle": "2026-05-19T20:09:45.480022Z", "shell.execute_reply": "2026-05-19T20:09:45.479534Z" } }, "outputs": [], "source": [ "auto invMass =\n", " filteredEvents.Histo1D({\"invMass\", \"CMS Opendata: #mu#mu mass;#mu#mu mass [GeV];Events\", 512, 2, 110}, \"m\");\n", "\n", "auto c = new TCanvas();\n", "c->SetLogx();\n", "c->SetLogy();\n", "invMass->DrawClone();" ] }, { "cell_type": "markdown", "id": "8fae4ffb", "metadata": {}, "source": [ "We will now produce a plot also for the J/Psi particle. We will plot\n", "on the same canvas the full spectrum and the zoom in on the J/psi particle.\n", "First we will create the full spectrum histogram from the invariant mass\n", "column, using a different histogram model than before." ] }, { "cell_type": "code", "execution_count": 4, "id": "bef18571", "metadata": { "collapsed": false, "execution": { "iopub.execute_input": "2026-05-19T20:09:45.482106Z", "iopub.status.busy": "2026-05-19T20:09:45.481984Z", "iopub.status.idle": "2026-05-19T20:09:45.684918Z", "shell.execute_reply": "2026-05-19T20:09:45.684262Z" } }, "outputs": [], "source": [ "auto fullSpectrum =\n", " filteredEvents.Histo1D({\"Spectrum\", \"Subset of CMS Run 2010B;#mu#mu mass [GeV];Events\", 1024, 2, 110}, \"m\");" ] }, { "cell_type": "markdown", "id": "ea4033bd", "metadata": {}, "source": [ "Next we will create the histogram for the J/psi particle, applying first\n", "the corresponding cut." ] }, { "cell_type": "code", "execution_count": 5, "id": "c5a41535", "metadata": { "collapsed": false, "execution": { "iopub.execute_input": "2026-05-19T20:09:45.686906Z", "iopub.status.busy": "2026-05-19T20:09:45.686787Z", "iopub.status.idle": "2026-05-19T20:09:46.084539Z", "shell.execute_reply": "2026-05-19T20:09:46.083383Z" } }, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "input_line_69:4:1: warning: captures will be by reference, not copy\n", "auto jpsiCut = [jpsiLow, jpsiHigh](double m) { return m < jpsiHigh && m > jpsiLow; };\n", "^\n" ] } ], "source": [ "double jpsiLow = 2.95;\n", "double jpsiHigh = 3.25;\n", "auto jpsiCut = [jpsiLow, jpsiHigh](double m) { return m < jpsiHigh && m > jpsiLow; };\n", "auto jpsi =\n", " filteredEvents.Filter(jpsiCut, {\"m\"})\n", " .Histo1D({\"jpsi\", \"Subset of CMS Run 2010B: J/#psi window;#mu#mu mass [GeV];Events\", 128, jpsiLow, jpsiHigh},\n", " \"m\");" ] }, { "cell_type": "markdown", "id": "6f372418", "metadata": {}, "source": [ "Finally we draw the two histograms side by side." ] }, { "cell_type": "code", "execution_count": 6, "id": "2c983ecd", "metadata": { "collapsed": false, "execution": { "iopub.execute_input": "2026-05-19T20:09:46.086366Z", "iopub.status.busy": "2026-05-19T20:09:46.086238Z", "iopub.status.idle": "2026-05-19T20:09:47.598796Z", "shell.execute_reply": "2026-05-19T20:09:47.593389Z" } }, "outputs": [], "source": [ "auto dualCanvas = new TCanvas(\"DualCanvas\", \"DualCanvas\", 800, 512);\n", "dualCanvas->Divide(2, 1);\n", "auto leftPad = dualCanvas->cd(1);\n", "leftPad->SetLogx();\n", "leftPad->SetLogy();\n", "fullSpectrum->DrawClone(\"Hist\");\n", "dualCanvas->cd(2);\n", "jpsi->SetMarkerStyle(20);\n", "jpsi->DrawClone(\"HistP\");\n", "\n", "return 0;" ] }, { "cell_type": "markdown", "id": "b5a4831a", "metadata": {}, "source": [ "Draw all canvases " ] }, { "cell_type": "code", "execution_count": 7, "id": "05801eeb", "metadata": { "collapsed": false, "execution": { "iopub.execute_input": "2026-05-19T20:09:47.607316Z", "iopub.status.busy": "2026-05-19T20:09:47.607177Z", "iopub.status.idle": "2026-05-19T20:09:47.816959Z", "shell.execute_reply": "2026-05-19T20:09:47.816385Z" } }, "outputs": [ { "data": { "text/html": [ "\n", "\n", "
\n", "
\n", "\n", "\n", "\n" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "\n", "\n", "
\n", "
\n", "\n", "\n", "\n" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "gROOT->GetListOfCanvases()->Draw()" ] } ], "metadata": { "kernelspec": { "display_name": "ROOT C++", "language": "c++", "name": "root" }, "language_info": { "codemirror_mode": "text/x-c++src", "file_extension": ".C", "mimetype": " text/x-c++src", "name": "c++" } }, "nbformat": 4, "nbformat_minor": 5 }