{ "cells": [ { "cell_type": "markdown", "id": "53044d3e", "metadata": {}, "source": [ "# ntpl017_shared_reader\n", "Example of efficient multi-threaded reading when multiple threads share a single reader.\n", "In this example, two threads share the work as follows: the first thread processes all even entry numbers\n", "and the second thread all odd entry numbers. The second thread runs at half the speed of the first one.\n", "\n", "As a result, the threads need the same clusters and pages but at different points in time.\n", "With the naive way of using the reader, this read pattern results in cache thrashing.\n", "\n", "With the \"active entry token\" API, on the other hand, the reader is told which entries\n", "must be kept in the caches, which prevents cache thrashing.\n", "\n", "\n", "\n", "\n", "**Author:** The ROOT Team \n", "This notebook tutorial was automatically generated with ROOTBOOK-izer from the macro found in the ROOT repository on Tuesday, May 19, 2026 at 08:15 PM." 
] }, { "cell_type": "code", "execution_count": 1, "id": "258f766a", "metadata": { "collapsed": false, "execution": { "iopub.execute_input": "2026-05-19T20:15:37.985887Z", "iopub.status.busy": "2026-05-19T20:15:37.985766Z", "iopub.status.idle": "2026-05-19T20:15:37.994126Z", "shell.execute_reply": "2026-05-19T20:15:37.993588Z" } }, "outputs": [], "source": [ "%%cpp -d\n", "\n", "#include <ROOT/RNTupleModel.hxx>\n", "#include <ROOT/RNTupleReader.hxx>\n", "#include <ROOT/RNTupleWriter.hxx>\n", "\n", "#include <TAxis.h>\n", "#include <TCanvas.h>\n", "#include <TGraph.h>\n", "#include <TLatex.h>\n", "#include <TLegend.h>\n", "#include <TROOT.h>\n", "#include <TRandom3.h>\n", "#include <TStyle.h>\n", "\n", "#include <array>\n", "#include <atomic>\n", "#include <chrono>\n", "#include <exception>\n", "#include <functional>\n", "#include <iostream>\n", "#include <memory>\n", "#include <mutex>\n", "#include <thread>\n", "#include <utility>\n", "#include <vector>\n", "\n", "using namespace std::chrono_literals;" ] }, { "cell_type": "markdown", "id": "424220b6", "metadata": {}, "source": [ "Where to store the ntuple of this example" ] }, { "cell_type": "code", "execution_count": 2, "id": "970a258a", "metadata": { "collapsed": false, "execution": { "iopub.execute_input": "2026-05-19T20:15:37.995815Z", "iopub.status.busy": "2026-05-19T20:15:37.995663Z", "iopub.status.idle": "2026-05-19T20:15:38.310749Z", "shell.execute_reply": "2026-05-19T20:15:38.310072Z" } }, "outputs": [], "source": [ "constexpr char const *kNTupleFileName = \"ntpl017_shared_reader.root\";" ] }, { "cell_type": "markdown", "id": "c090d1cd", "metadata": {}, "source": [ "The sample class that is stored in the RNTuple" ] }, { "cell_type": "code", "execution_count": 3, "id": "72439e4d", "metadata": { "collapsed": false, "execution": { "iopub.execute_input": "2026-05-19T20:15:38.313021Z", "iopub.status.busy": "2026-05-19T20:15:38.312904Z", "iopub.status.idle": "2026-05-19T20:15:38.517263Z", "shell.execute_reply": "2026-05-19T20:15:38.516632Z" } }, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "input_line_54:7:2: error: expected expression\n", " %%cpp -d\n", " ^\n", "input_line_54:7:3: error: expected expression\n", " %%cpp -d\n", " ^\n", "input_line_54:7:4: error: 
use of undeclared identifier 'cpp'\n", " %%cpp -d\n", " ^\n", "input_line_54:7:9: error: use of undeclared identifier 'd'\n", " %%cpp -d\n", " ^\n" ] } ], "source": [ "%%cpp -d\n", "struct Point2D {\n", "   float fX;\n", "   float fY;\n", "};" ] }, { "cell_type": "markdown", "id": "80ec3d6b", "metadata": {}, "source": [ "Whether we read with active entry tokens set (informed) or not (naive)" ] }, { "cell_type": "code", "execution_count": 5, "id": "c3da2719", "metadata": { "collapsed": false, "execution": { "iopub.execute_input": "2026-05-19T20:15:38.726625Z", "iopub.status.busy": "2026-05-19T20:15:38.726490Z", "iopub.status.idle": "2026-05-19T20:15:38.729053Z", "shell.execute_reply": "2026-05-19T20:15:38.728525Z" } }, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "input_line_56:1:5: error: use of undeclared identifier 'naive'\n", "ot (naive)\n", " ^\n" ] } ], "source": [ "%%cpp -d\n", "enum class EReadMode {\n", "   kNaive,\n", "   kInformed\n", "};" ] }, 
{ "cell_type": "markdown", "id": "6490c225", "metadata": {}, "source": [ "Nicify output of EReadMode values" ] }, { "cell_type": "code", "execution_count": 4, "id": "8fe21c7b", "metadata": { "collapsed": false, "execution": { "iopub.execute_input": "2026-05-19T20:15:38.518877Z", "iopub.status.busy": "2026-05-19T20:15:38.518760Z", "iopub.status.idle": "2026-05-19T20:15:38.725071Z", "shell.execute_reply": "2026-05-19T20:15:38.724003Z" } }, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "input_line_55:1:50: error: unknown type name 'EReadMode'\n", "std::ostream &operator<<(std::ostream &os, const EReadMode &e)\n", " ^\n", "input_line_55:4:9: error: use of undeclared identifier 'EReadMode'\n", " case EReadMode::kNaive: os << \"naive\"; break;\n", " ^\n", "input_line_55:5:9: error: use of undeclared identifier 'EReadMode'\n", " case EReadMode::kInformed: os << \"informed\"; break;\n", " ^\n", "input_line_55:13:2: error: expected expression\n", " / To be reset between ProcessEntries calls to Read()\n", " ^\n", "input_line_55:13:4: error: use of undeclared identifier 'To'\n", " / To be reset between ProcessEntries calls to Read()\n", " ^\n" ] } ], "source": [ "%%cpp -d\n", "std::ostream &operator<<(std::ostream &os, const EReadMode &e)\n", "{\n", "   switch (e) {\n", "   case EReadMode::kNaive: os << \"naive\"; break;\n", "   case EReadMode::kInformed: os << \"informed\"; break;\n", "   default: os << \"???\";\n", "   }\n", "   return os;\n", "}\n", "\n", "// To be reset between calls to Read(); used by ProcessEntries()\n", "static std::atomic<int> gNEntriesDone;\n", "static std::atomic<int> gThreadId;" ] }, { "cell_type": "markdown", "id": "c2e066c8", "metadata": {}, "source": [ "The read thread's main function" ] }, { "cell_type": "code", "execution_count": 6, "id": "a24f8a86", "metadata": { "collapsed": false, "execution": { "iopub.execute_input": "2026-05-19T20:15:38.730773Z", "iopub.status.busy": "2026-05-19T20:15:38.730644Z", "iopub.status.idle": "2026-05-19T20:15:38.748301Z", "shell.execute_reply": "2026-05-19T20:15:38.747706Z" } }, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "input_line_57:1:50: error: unknown type name 'EReadMode'\n", "void ProcessEntries(ROOT::RNTupleReader &reader, EReadMode readMode, const std::chrono::microseconds &usPerEvent,\n", " ^\n", "input_line_57:7:25: error: use of undeclared identifier 'gThreadId'\n", " const int threadId = gThreadId++;\n", " ^\n", "input_line_57:17:15: error: use of undeclared identifier 'EReadMode'\n", " case EReadMode::kInformed: token.SetEntryNumber(i); break;\n", " ^\n", "input_line_57:18:15: error: use of undeclared identifier 'EReadMode'\n", " case EReadMode::kNaive: break;\n", " ^\n", "input_line_57:27:31: error: use of undeclared identifier 'gNEntriesDone'\n", " const int entriesDone = gNEntriesDone++;\n", " ^\n" ] } ], "source": [ "%%cpp -d\n", "void ProcessEntries(ROOT::RNTupleReader &reader, EReadMode readMode, const std::chrono::microseconds &usPerEvent,\n", " 
std::vector<int> &sumLoadedClusters, std::vector<int> &sumUnsealedPages,\n", "                    std::vector<int> &nClusters, std::vector<int> &nPages)\n", "{\n", "   static std::mutex gLock;\n", "\n", "   // Thread 0 processes the even entries, thread 1 the odd ones\n", "   const int threadId = gThreadId++;\n", "\n", "   auto token = reader.CreateActiveEntryToken();\n", "   for (unsigned int i = threadId; i < reader.GetNEntries(); i += 2) {\n", "      {\n", "         // Serialize access to the shared reader\n", "         std::lock_guard<std::mutex> guard(gLock);\n", "\n", "         // The only difference between naive and informed reading: in informed reading, we indicate which\n", "         // entry we are going to use before loading it.\n", "         switch (readMode) {\n", "         case EReadMode::kInformed: token.SetEntryNumber(i); break;\n", "         case EReadMode::kNaive: break;\n", "         default: std::terminate(); // never here\n", "         }\n", "\n", "         reader.LoadEntry(i);\n", "      }\n", "\n", "      // Simulate the time spent processing the entry\n", "      std::this_thread::sleep_for(usPerEvent);\n", "\n", "      // Record the reader's counters after this entry\n", "      const int entriesDone = gNEntriesDone++;\n", "      sumLoadedClusters.at(entriesDone) =\n", "         reader.GetMetrics().GetCounter(\"RNTupleReader.RPageSourceFile.nClusterLoaded\")->GetValueAsInt();\n", "      sumUnsealedPages[entriesDone] =\n", "         reader.GetMetrics().GetCounter(\"RNTupleReader.RPageSourceFile.nPageUnsealed\")->GetValueAsInt();\n", "      nClusters[entriesDone] =\n", "         reader.GetMetrics().GetCounter(\"RNTupleReader.RPageSourceFile.RClusterPool.nCluster\")->GetValueAsInt();\n", "      nPages[entriesDone] =\n", "         reader.GetMetrics().GetCounter(\"RNTupleReader.RPageSourceFile.RPagePool.nPage\")->GetValueAsInt();\n", "   }\n", "}" ] }, { "cell_type": "markdown", "id": "0539c903", "metadata": {}, "source": [ "Read the ntuple with two threads in the given mode and print the reader's metrics" ] }, { "cell_type": "code", "execution_count": 7, "id": "6bc81e67", "metadata": { "collapsed": false, "execution": { "iopub.execute_input": "2026-05-19T20:15:38.750146Z", "iopub.status.busy": "2026-05-19T20:15:38.750017Z", "iopub.status.idle": "2026-05-19T20:15:38.763882Z", "shell.execute_reply": "2026-05-19T20:15:38.763286Z" } }, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "input_line_58:1:11: error: 
unknown type name 'EReadMode'\n", "void Read(EReadMode readMode, std::vector &sumLoadedClusters, std::vector &sumUnsealedPages,\n", " ^\n", "input_line_58:7:4: error: use of undeclared identifier 'gNEntriesDone'\n", " gNEntriesDone = 0;\n", " ^\n", "input_line_58:8:4: error: use of undeclared identifier 'gThreadId'\n", " gThreadId = 0;\n", " ^\n", "input_line_58:17:29: error: use of undeclared identifier 'ProcessEntries'\n", " threads[0] = std::thread(ProcessEntries, std::ref(*reader), readMode, 100us, std::ref(sumLoadedClusters),\n", " ^\n", "input_line_58:19:29: error: use of undeclared identifier 'ProcessEntries'\n", " threads[1] = std::thread(ProcessEntries, std::ref(*reader), readMode, 200us, std::ref(sumLoadedClusters),\n", " ^\n" ] } ], "source": [ "%%cpp -d\n", "void Read(EReadMode readMode, std::vector<int> &sumLoadedClusters, std::vector<int> &sumUnsealedPages,\n", "          std::vector<int> &nClusters, std::vector<int> &nPages)\n", "{\n", "   auto reader = ROOT::RNTupleReader::Open(\"ntpl\", kNTupleFileName);\n", "   reader->EnableMetrics();\n", "\n", "   gNEntriesDone = 0;\n", "   gThreadId = 0;\n", "\n", "   const auto N = reader->GetNEntries();\n", "   sumLoadedClusters.resize(N);\n", "   sumUnsealedPages.resize(N);\n", "   nClusters.resize(N);\n", "   nPages.resize(N);\n", "\n", "   std::array<std::thread, 2> threads;\n", "   threads[0] = std::thread(ProcessEntries, std::ref(*reader), readMode, 100us, std::ref(sumLoadedClusters),\n", "                            std::ref(sumUnsealedPages), std::ref(nClusters), std::ref(nPages));\n", "   threads[1] = std::thread(ProcessEntries, std::ref(*reader), readMode, 200us, std::ref(sumLoadedClusters),\n", "                            std::ref(sumUnsealedPages), std::ref(nClusters), std::ref(nPages));\n", "   for (auto &t : threads) {\n", "      t.join();\n", "   }\n", "\n", "   std::cout << \"Reading in mode '\" << readMode << \"':\" << std::endl;\n", "   std::cout << \"===========================\" << std::endl;\n", "   reader->GetMetrics().Print(std::cout);\n", "   std::cout << \"===========================\" << std::endl << std::endl;\n", "}" ] 
}, { "cell_type": "markdown", "id": "47d7398d", "metadata": {}, "source": [ "Write an ntuple with 1000 entries, committing a cluster every 100 entries" ] }, { "cell_type": "code", "execution_count": 8, "id": "13cccd72", "metadata": { "collapsed": false, "execution": { "iopub.execute_input": "2026-05-19T20:15:38.765375Z", "iopub.status.busy": "2026-05-19T20:15:38.765259Z", "iopub.status.idle": "2026-05-19T20:15:38.804169Z", "shell.execute_reply": "2026-05-19T20:15:38.803791Z" } }, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "input_line_59:4:37: error: use of undeclared identifier 'Point2D'\n", " auto ptrPoint = model->MakeField(\"point\");\n", " ^\n" ] } ], "source": [ "%%cpp -d\n", "void Write()\n", "{\n", "   auto model = ROOT::RNTupleModel::Create();\n", "   auto ptrPoint = model->MakeField<Point2D>(\"point\");\n", "\n", "   auto writer = ROOT::RNTupleWriter::Recreate(std::move(model), \"ntpl\", kNTupleFileName);\n", "\n", "   // Create and seed the random number generator once, not once per entry\n", "   auto prng = std::make_unique<TRandom3>();\n", "   prng->SetSeed();\n", "\n", "   for (int i = 0; i < 1000; ++i) {\n", "      if (i % 100 == 0)\n", "         writer->CommitCluster();\n", "\n", "      ptrPoint->fX = prng->Rndm(1);\n", "      ptrPoint->fY = prng->Rndm(1);\n", "      writer->Fill();\n", "   }\n", "}" ] }, { "cell_type": "markdown", "id": "5562150b", "metadata": {}, "source": [ "Turn a series of counter values into a graph over the number of processed entries" ] }, { "cell_type": "code", "execution_count": 9, "id": "6f3646b4", "metadata": { "collapsed": false, "execution": { "iopub.execute_input": "2026-05-19T20:15:38.806087Z", "iopub.status.busy": "2026-05-19T20:15:38.805955Z", "iopub.status.idle": "2026-05-19T20:15:38.812156Z", "shell.execute_reply": "2026-05-19T20:15:38.811582Z" } }, "outputs": [], "source": [ "%%cpp -d\n", "TGraph *GetGraph(const std::vector<int> &counts)\n", "{\n", "   auto graph = new TGraph();\n", "   for (unsigned int i = 0; i < counts.size(); ++i) {\n", "      graph->SetPoint(i, i, counts[i]);\n", "   }\n", "   graph->GetXaxis()->SetTitle(\"Number of processed entries\");\n", "   return graph;\n", "}" ] }, { "cell_type": "code", 
"execution_count": 10, "id": "f89e0fbe", "metadata": { "collapsed": false, "execution": { "iopub.execute_input": "2026-05-19T20:15:38.813736Z", "iopub.status.busy": "2026-05-19T20:15:38.813588Z", "iopub.status.idle": "2026-05-19T20:15:39.019878Z", "shell.execute_reply": "2026-05-19T20:15:39.019032Z" } }, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "input_line_61:20:6: error: 'EReadMode' is not a class, namespace, or enumeration\n", "Read(EReadMode::kNaive, sumLoadedClusters, sumUnsealedPages, nClusters, nPages);\n", " ^\n", "input_line_61:20:6: note: 'EReadMode' declared here\n" ] } ], "source": [ "ROOT::EnableImplicitMT();\n", "\n", "Write();\n", "ROOT::RNTupleReader::Open(\"ntpl\", kNTupleFileName)->PrintInfo(ROOT::ENTupleInfo::kStorageDetails);\n", "\n", "std::vector<int> sumLoadedClusters;\n", "std::vector<int> sumUnsealedPages;\n", "std::vector<int> nClusters;\n", "std::vector<int> nPages;\n", "TLatex latex;\n", "\n", "gStyle->SetOptStat(0);\n", "gStyle->SetLineWidth(2);\n", "gStyle->SetMarkerStyle(8);\n", "TCanvas *canvas = new TCanvas(\"\", \"Shared Reader Example\", 200, 10, 1500, 1000);\n", "\n", "canvas->Divide(2, 2);\n", "\n", "Read(EReadMode::kNaive, sumLoadedClusters, sumUnsealedPages, nClusters, nPages);\n", "\n", "canvas->cd(1);\n", "auto graph1 = GetGraph(sumUnsealedPages);\n", "graph1->SetLineColor(kRed);\n", "graph1->Draw(\"AL\");\n", "auto graph2 = GetGraph(sumLoadedClusters);\n", "graph2->SetLineColor(kBlue);\n", "graph2->Draw(\"SAME L\");\n", "\n", "auto legend = new TLegend(0.125, 0.725, 0.625, 0.875);\n", "legend->AddEntry(graph1, \"Number of decompressed pages\", \"l\");\n", "legend->AddEntry(graph2, \"Number of loaded clusters\", \"l\");\n", "legend->Draw();\n", "\n", "latex.SetTextAlign(22);\n", "latex.DrawLatexNDC(0.5, 0.95, \"Naive Reading\");\n", "\n", "canvas->cd(3);\n", "\n", "auto graph3 = GetGraph(nPages);\n", "graph3->SetMarkerColor(kRed);\n", "graph3->GetYaxis()->SetNdivisions(8);\n", "graph3->GetYaxis()->SetRangeUser(-0.5, 
14);\n", "graph3->Draw(\"AP\");\n", "\n", "auto graph4 = GetGraph(nClusters);\n", "graph4->SetMarkerColor(kBlue);\n", "graph4->Draw(\"SAME P\");\n", "\n", "legend = new TLegend(0.35, 0.725, 0.85, 0.875);\n", "legend->AddEntry(graph3, \"Number of currently cached pages\", \"p\");\n", "legend->AddEntry(graph4, \"Number of currently cached clusters\", \"p\");\n", "legend->Draw();" ] }, { "cell_type": "markdown", "id": "bb0ea1cd", "metadata": {}, "source": [ "Repeat the measurement with informed reading and draw the results in the right-hand pads" ] }, { "cell_type": "code", "execution_count": 11, "id": "35224f0f", "metadata": { "collapsed": false, "execution": { "iopub.execute_input": "2026-05-19T20:15:39.021487Z", "iopub.status.busy": "2026-05-19T20:15:39.021360Z", "iopub.status.idle": "2026-05-19T20:15:39.227365Z", "shell.execute_reply": "2026-05-19T20:15:39.226712Z" } }, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "input_line_62:2:7: error: 'EReadMode' is not a class, namespace, or enumeration\n", " Read(EReadMode::kInformed, sumLoadedClusters, sumUnsealedPages, nClusters, nPages);\n", " ^\n", "input_line_62:2:7: note: 'EReadMode' declared here\n", "input_line_62:2:29: error: use of undeclared identifier 'sumLoadedClusters'\n", " Read(EReadMode::kInformed, sumLoadedClusters, sumUnsealedPages, nClusters, nPages);\n", " ^\n", "input_line_62:2:48: error: use of undeclared identifier 'sumUnsealedPages'\n", " Read(EReadMode::kInformed, sumLoadedClusters, sumUnsealedPages, nClusters, nPages);\n", " ^\n", "input_line_62:2:66: error: use of undeclared identifier 'nClusters'\n", " Read(EReadMode::kInformed, sumLoadedClusters, sumUnsealedPages, nClusters, nPages);\n", " ^\n", "input_line_62:2:77: error: use of undeclared identifier 'nPages'\n", " Read(EReadMode::kInformed, sumLoadedClusters, sumUnsealedPages, nClusters, nPages);\n", " ^\n", "input_line_62:4:1: error: use of undeclared identifier 'canvas'\n", "canvas->cd(2);\n", "^\n", "input_line_62:5:24: error: use of undeclared identifier 
'sumUnsealedPages'\n", "auto graph5 = GetGraph(sumUnsealedPages);\n", " ^\n", "input_line_62:9:24: error: use of undeclared identifier 'sumLoadedClusters'\n", "auto graph6 = GetGraph(sumLoadedClusters);\n", " ^\n", "input_line_62:13:1: error: use of undeclared identifier 'latex'\n", "latex.SetTextAlign(22);\n", "^\n", "input_line_62:14:1: error: use of undeclared identifier 'latex'\n", "latex.DrawLatexNDC(0.5, 0.95, \"Informed Reading\");\n", "^\n", "input_line_62:16:1: error: use of undeclared identifier 'canvas'\n", "canvas->cd(4);\n", "^\n", "input_line_62:18:24: error: use of undeclared identifier 'nPages'\n", "auto graph7 = GetGraph(nPages);\n", " ^\n", "input_line_62:24:24: error: use of undeclared identifier 'nClusters'\n", "auto graph8 = GetGraph(nClusters);\n", " ^\n" ] } ], "source": [ "Read(EReadMode::kInformed, sumLoadedClusters, sumUnsealedPages, nClusters, nPages);\n", "\n", "canvas->cd(2);\n", "auto graph5 = GetGraph(sumUnsealedPages);\n", "graph5->SetLineColor(kRed);\n", "graph5->Draw(\"AL\");\n", "\n", "auto graph6 = GetGraph(sumLoadedClusters);\n", "graph6->SetLineColor(kBlue);\n", "graph6->Draw(\"SAME L\");\n", "\n", "latex.SetTextAlign(22);\n", "latex.DrawLatexNDC(0.5, 0.95, \"Informed Reading\");\n", "\n", "canvas->cd(4);\n", "\n", "auto graph7 = GetGraph(nPages);\n", "graph7->SetMarkerColor(kRed);\n", "graph7->GetYaxis()->SetNdivisions(8);\n", "graph7->GetYaxis()->SetRangeUser(-0.5, 14);\n", "graph7->Draw(\"AP\");\n", "\n", "auto graph8 = GetGraph(nClusters);\n", "graph8->SetMarkerColor(kBlue);\n", "graph8->Draw(\"SAME P\");" ] }, { "cell_type": "markdown", "id": "66fb238e", "metadata": {}, "source": [ "Draw all canvases " ] }, { "cell_type": "code", "execution_count": 12, "id": "a5bdad6f", "metadata": { "collapsed": false, "execution": { "iopub.execute_input": "2026-05-19T20:15:39.229237Z", "iopub.status.busy": "2026-05-19T20:15:39.229112Z", "iopub.status.idle": "2026-05-19T20:15:39.434551Z", "shell.execute_reply": 
"2026-05-19T20:15:39.433903Z" } }, "outputs": [], "source": [ "gROOT->GetListOfCanvases()->Draw()" ] } ], "metadata": { "kernelspec": { "display_name": "ROOT C++", "language": "c++", "name": "root" }, "language_info": { "codemirror_mode": "text/x-c++src", "file_extension": ".C", "mimetype": " text/x-c++src", "name": "c++" } }, "nbformat": 4, "nbformat_minor": 5 }