ReadSpeedCLI.cxx
// Author: Enrico Guiraud, David Poulton 2022

/*************************************************************************
 * Copyright (C) 1995-2022, Rene Brun and Fons Rademakers.               *
 * All rights reserved.                                                  *
 *                                                                       *
 * For the licensing terms see $ROOTSYS/LICENSE.                         *
 * For the list of contributors see $ROOTSYS/README/CREDITS.             *
 *************************************************************************/

#include "ReadSpeedCLI.hxx"

#ifdef R__USE_IMT
#include <ROOT/TTreeProcessorMT.hxx> // for TTreeProcessorMT::SetTasksPerWorkerHint
#endif

#include <algorithm> // for std::max
#include <cstring>
#include <iostream>
#include <string> // for std::string, std::stoi
#include <vector>

using namespace ReadSpeed;

const auto usageText = "Usage:\n"
                       " rootreadspeed --files fname1 [fname2 ...]\n"
                       " --trees tname1 [tname2 ...]\n"
                       " (--all-branches | --branches bname1 [bname2 ...] | --branches-regex bregex1 "
                       "[bregex2 ...])\n"
                       " [--threads nthreads]\n"
                       " [--tasks-per-worker ntasks]\n"
                       " rootreadspeed (--help|-h)\n"
                       " \n"
                       " Use -h for usage help, --help for detailed information.\n";

const auto argUsageText =
   "Arguments:\n"
   " Specifying files and trees:\n"
   " --files fname1 [fname2...]\n"
   " The list of ROOT files to read from.\n"
   "\n"
   " --trees tname1 [tname2...]\n"
   " The list of TTrees to read from the files.\n"
   " If only one TTree is provided then it will be used for all files.\n"
   " If multiple TTrees are specified, each TTree is read from the respective file.\n"
   "\n"
   " Specifying branches:\n"
   " Branches can be specified using one of the following flags. Currently only one can be used at a time.\n"
   "\n"
   " --all-branches\n"
   " Reads every branch from the specified files and TTrees.\n"
   " --branches bname1 [bname2...]\n"
   " Reads the branches with matching names. Will error if any of the branches are not found.\n"
   " --branches-regex bregex1 [bregex2 ...]\n"
   " Reads any branches with a name matching the provided regex. Will error if any provided regex does not match "
   "at least one branch.\n"
   "\n"
   " Meta arguments:\n"
   " --threads nthreads\n"
   " The number of threads to use for file reading. Will automatically cap to the number of available threads on "
   "the machine.\n"
   " --tasks-per-worker ntasks\n"
   " The number of tasks to generate for each worker thread when using multithreading.";

const auto fullUsageText =
   "Description:\n"
   "\n"
   "rootreadspeed is a tool used to help identify bottlenecks in ROOT analysis programs by providing an idea of what "
   "throughput you can expect when reading ROOT files in certain configurations. It does this by providing information "
   "about the number of bytes read from your files, how long this takes, and the different throughputs in MB/s, both "
   "in total and per thread.\n"
   "\n\n"
   "Compressed vs Uncompressed Throughput:\n"
   "\n"
   "Throughput speeds are provided for both compressed and uncompressed data. ROOT files are usually saved in "
   "compressed format, so these will often differ. Compressed bytes is the total number of bytes read from TFiles "
   "during the readspeed test (possibly including meta-data). Uncompressed bytes is the number of bytes processed by "
   "reading the branch values in the TTree. Throughput is calculated as the total number of bytes over the total "
   "runtime (including decompression time) in the uncompressed and compressed cases.\n"
   "\n\n"
   "Interpreting results:\n"
   "\n"
   "There are three possible scenarios when using rootreadspeed, namely:\n"
   "\n"
   "1. The 'Real Time' is significantly lower than your own analysis runtime.\n"
   "This would imply your actual application code is dominating the runtime of your analysis, i.e. your analysis "
   "logic or framework is taking up the time. The best way to decrease the runtime would be to optimize your code "
   "(or the framework's), parallelize it onto multiple threads if possible (for example with RDataFrame and "
   "EnableImplicitMT) or switch to a machine with a more performant CPU.\n"
   "\n"
   "2. The 'Real Time' is significantly higher than 'CPU Time / number of threads'.\n"
   "If the real time is higher than the CPU time per core it implies the reading of data is the bottleneck, as "
   "the CPU cores are wasting time waiting for data to arrive from your disk/drive or network connection in order to "
   "decompress it. The best way to decrease your runtime would be transferring the data you need onto a faster storage "
   "medium (i.e. a faster disk/drive such as an SSD, or connecting to a faster network for remote file access), or to "
   "use a compression algorithm with a higher compression ratio, possibly at the cost of the decompression rate. "
   "Changing the number of threads is unlikely to help, and in fact using too many threads may degrade "
   "performance if they make requests to different regions of your local storage.\n"
   "N.B. The number of threads is 1 if no '--threads' argument was provided; otherwise it is the minimum of the value "
   "provided and the number of threads your CPU can run in parallel. Note that on shared systems, or when running "
   "other heavy applications, the number of your own threads running at any time may be lower than this limit due to "
   "demand on the CPU.\n"
   "\n"
   "3. The 'Real Time' is similar to 'CPU Time / number of threads' AND 'Compressed Throughput' is lower than "
   "expected for your storage medium.\n"
   "This would imply that your CPU threads aren't decompressing data as fast as your storage medium can provide it, "
   "and so decompression is the bottleneck. The best way to decrease your runtime would be to utilise a system with a "
   "faster CPU, or make use of more threads when running, or use a compression algorithm with a higher "
   "decompression rate such as LZ4, possibly at the cost of some extra file size.\n"
   "\n\n"
   "A note on caching:\n"
   "\n"
   "If your data is stored on a local disk, the system may cache some or all of the file in memory after it is first "
   "read. If this reflects how your analysis will run, there is no concern. However, if you expect to read files "
   "only once in a while, so that they are unlikely to be in the cache, consider clearing the cache before running "
   "rootreadspeed. On Linux this can be done by running `echo 3 > /proc/sys/vm/drop_caches` as a "
   "superuser, or a specific file can be dropped from the cache with `dd of=<FILENAME> oflag=nocache "
   "conv=notrunc,fdatasync count=0 > /dev/null 2>&1`.\n"
   "\n\n"
   "Known overhead of TTreeReader, RDataFrame:\n"
   "\n"
   "rootreadspeed is designed to read all data present in the specified branches, trees and files at the highest "
   "possible speed. When the application bottleneck is not in the computations performed by analysis logic, "
   "higher-level interfaces built on top of TTree such as TTreeReader and RDataFrame are known to add a significant "
   "runtime overhead with respect to the runtimes reported by rootreadspeed (up to a factor of 2). In realistic "
   "analysis applications it has been observed that a large part of that overhead is compensated by the ability of "
   "TTreeReader and RDataFrame to read branch values selectively, based on event cuts. This overhead will also be "
   "reduced significantly when using RDataFrame in conjunction with RNTuple.";

void ReadSpeed::PrintThroughput(const Result &r)
{
   std::cout << "Thread pool size:\t\t" << r.fThreadPoolSize << '\n';

   // The multi-threaded setup timings are only reported when they were measured.
   if (r.fMTSetupRealTime > 0.) {
      std::cout << "Real time to setup MT run:\t" << r.fMTSetupRealTime << " s\n";
      std::cout << "CPU time to setup MT run:\t" << r.fMTSetupCpuTime << " s\n";
   }

   std::cout << "Real time:\t\t\t" << r.fRealTime << " s\n";
   std::cout << "CPU time:\t\t\t" << r.fCpuTime << " s\n";

   std::cout << "Uncompressed data read:\t\t" << r.fUncompressedBytesRead << " bytes\n";
   std::cout << "Compressed data read:\t\t" << r.fCompressedBytesRead << " bytes\n";

   // A thread pool size of 0 is treated as a single thread for the per-thread figures.
   const unsigned int effectiveThreads = std::max(r.fThreadPoolSize, 1u);

   // Throughput is bytes / wall-clock seconds, scaled by 1024 * 1024.
   std::cout << "Uncompressed throughput:\t" << r.fUncompressedBytesRead / r.fRealTime / 1024 / 1024 << " MB/s\n";
   std::cout << "\t\t\t\t" << r.fUncompressedBytesRead / r.fRealTime / 1024 / 1024 / effectiveThreads
             << " MB/s/thread for " << effectiveThreads << " threads\n";
   std::cout << "Compressed throughput:\t\t" << r.fCompressedBytesRead / r.fRealTime / 1024 / 1024 << " MB/s\n";
   std::cout << "\t\t\t\t" << r.fCompressedBytesRead / r.fRealTime / 1024 / 1024 / effectiveThreads
             << " MB/s/thread for " << effectiveThreads << " threads\n\n";

   // Fraction of the wall-clock time that each thread spent on the CPU.
   const float cpuEfficiency = (r.fCpuTime / effectiveThreads) / r.fRealTime;

   std::cout << "CPU Efficiency: \t\t" << (cpuEfficiency * 100) << "%\n";
   std::cout << "Reading data is ";
   if (cpuEfficiency > 0.80f) {
      std::cout << "likely CPU bound (decompression).\n";
   } else if (cpuEfficiency < 0.50f) {
      std::cout << "likely I/O bound.\n";
   } else {
      std::cout << "likely balanced (more threads may help though).\n";
   }
   std::cout << "For details run with the --help command.\n";
}

Args ReadSpeed::ParseArgs(const std::vector<std::string> &args)
{
   // Print the usage message and return early if no arguments were given or if "-h"/"--help" was used
   const auto argsProvided = args.size() >= 2;
   const auto helpUsed = argsProvided && (args[1] == "--help" || args[1] == "-h");
   const auto longHelpUsed = argsProvided && args[1] == "--help";

   if (!argsProvided || helpUsed) {
      std::cout << usageText;
      if (helpUsed)
         std::cout << "\n" << argUsageText;
      if (longHelpUsed)
         std::cout << "\n\n" << fullUsageText;
      std::cout << std::endl;

      return {};
   }

   Data d;
   unsigned int nThreads = 0;

   // argState tracks which option the following positional values belong to
   enum class EArgState { kNone, kTrees, kFiles, kBranches, kThreads, kTasksPerWorkerHint } argState = EArgState::kNone;
   enum class EBranchState { kNone, kRegular, kRegex, kAll } branchState = EBranchState::kNone;
   const auto branchOptionsErrMsg =
      "Options --all-branches, --branches, and --branches-regex are mutually exclusive. You can use only one.\n";

   for (size_t i = 1; i < args.size(); ++i) {
      const auto &arg = args[i];

      if (arg == "--trees") {
         argState = EArgState::kTrees;
      } else if (arg == "--files") {
         argState = EArgState::kFiles;
      } else if (arg == "--all-branches") {
         argState = EArgState::kNone;
         if (branchState != EBranchState::kNone && branchState != EBranchState::kAll) {
            std::cerr << branchOptionsErrMsg;
            return {};
         }
         branchState = EBranchState::kAll;
         d.fUseRegex = true;
         d.fBranchNames = {".*"};
      } else if (arg == "--branches") {
         argState = EArgState::kBranches;
         if (branchState != EBranchState::kNone && branchState != EBranchState::kRegular) {
            std::cerr << branchOptionsErrMsg;
            return {};
         }
         branchState = EBranchState::kRegular;
      } else if (arg == "--branches-regex") {
         argState = EArgState::kBranches;
         if (branchState != EBranchState::kNone && branchState != EBranchState::kRegex) {
            std::cerr << branchOptionsErrMsg;
            return {};
         }
         branchState = EBranchState::kRegex;
         d.fUseRegex = true;
      } else if (arg == "--threads") {
         argState = EArgState::kThreads;
      } else if (arg == "--tasks-per-worker") {
         argState = EArgState::kTasksPerWorkerHint;
      } else if (arg[0] == '-') {
         std::cerr << "Unrecognized option '" << arg << "'\n";
         return {};
      } else {
         switch (argState) {
         case EArgState::kTrees: d.fTreeNames.emplace_back(arg); break;
         case EArgState::kFiles: d.fFileNames.emplace_back(arg); break;
         case EArgState::kBranches: d.fBranchNames.emplace_back(arg); break;
         case EArgState::kThreads:
            nThreads = std::stoi(arg);
            argState = EArgState::kNone;
            break;
         case EArgState::kTasksPerWorkerHint:
#ifdef R__USE_IMT
            ROOT::TTreeProcessorMT::SetTasksPerWorkerHint(std::stoi(arg));
            argState = EArgState::kNone;
#else
            std::cerr << "ROOT was built without implicit multi-threading (IMT) support. The --tasks-per-worker option "
                         "will be ignored.\n";
#endif
            break;
         default: std::cerr << "Unrecognized option '" << arg << "'\n"; return {};
         }
      }
   }

   return Args{std::move(d), nThreads, branchState == EBranchState::kAll, /*fShouldRun=*/true};
}

Args ReadSpeed::ParseArgs(int argc, char **argv)
{
   std::vector<std::string> args;
   args.reserve(argc);

   for (int i = 0; i < argc; ++i) {
      args.emplace_back(argv[i]);
   }

   return ParseArgs(args);
}
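
The CPU-efficiency heuristic printed at the end of PrintThroughput can be looked at in isolation. The following is a minimal sketch, not part of ROOT: `TimingSketch`, `CpuEfficiency` and `Classify` are hypothetical names introduced here for illustration, and the struct only mirrors the three `Result` fields that the listing reads. The 0.80 and 0.50 thresholds are the ones used in the listing.

```cpp
#include <algorithm>
#include <string>

// Hypothetical stand-in for the timing fields of ReadSpeed::Result used in the listing.
struct TimingSketch {
   double fRealTime;             // wall-clock seconds
   double fCpuTime;              // CPU seconds summed over all threads
   unsigned int fThreadPoolSize; // may be 0, which is treated as one thread
};

// CPU time per thread divided by wall-clock time, mirroring PrintThroughput.
double CpuEfficiency(const TimingSketch &t)
{
   const unsigned int effectiveThreads = std::max(t.fThreadPoolSize, 1u);
   return (t.fCpuTime / effectiveThreads) / t.fRealTime;
}

// Same thresholds as the listing: above 0.80 is CPU bound, below 0.50 is I/O bound.
std::string Classify(double cpuEfficiency)
{
   if (cpuEfficiency > 0.80)
      return "CPU bound";
   if (cpuEfficiency < 0.50)
      return "I/O bound";
   return "balanced";
}
```

For instance, a run with 10 s of real time and 36 s of CPU time on 4 threads gives 36 / 4 / 10 = 0.9, which this sketch classifies as CPU bound, while 8 s of CPU time on the same 4 threads gives 0.2 and is classified as I/O bound.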