" rootreadspeed --files fname1 [fname2 ...]\n"
" --trees tname1 [tname2 ...]\n"
" (--all-branches | --branches bname1 [bname2 ...] | --branches-regex bregex1 "
"[bregex2 ...])\n"
" [--threads nthreads]\n"
" [--tasks-per-worker ntasks]\n"
" rootreadspeed (--help|-h)\n"
" Use -h for usage help, --help for detailed information.\n";
" Specifying files and trees:\n"
" --files fname1 [fname2...]\n"
" The list of ROOT files to read from.\n"
" --trees tname1 [tname2...]\n"
" The list of TTrees to read from the files.\n"
" If only one TTree is provided, it is used for all files.\n"
" If multiple TTrees are specified, each TTree is read from the corresponding file.\n"
" Specifying branches:\n"
" Branches can be specified using one of the following flags. Currently only one can be used at a time.\n"
" --all-branches\n"
" Reads every branch from the specified files and TTrees.\n"
" --branches bname1 [bname2...]\n"
" Reads the branches with matching names. Fails if any of the branches is not found.\n"
" --branches-regex bregex1 [bregex2 ...]\n"
" Reads any branches with a name matching the provided regex. Fails if any provided regex does not match "
"at least one branch.\n"
" --threads nthreads\n"
" The number of threads to use for file reading. Will automatically cap to the number of available threads on "
"the machine.\n"
" --tasks-per-worker ntasks\n"
" The number of tasks to generate for each worker thread when using multithreading.";
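// Illustrative sketch only. The Sketch* helpers below are hypothetical: they are not part of
// rootreadspeed. They show one way to express the rules stated in the option descriptions above.
//  - SketchPairFilesWithTrees: a single tree name is reused for every file; otherwise the i-th
//    tree is read from the i-th file. Rejecting any other count is an assumption.
//  - SketchMatchBranches: every --branches-regex pattern must match at least one branch.
//    Whole-name matching with std::regex_match is an assumption.
//  - SketchCapThreads: caps --threads at std::thread::hardware_concurrency(), which is assumed
//    here to stand for "the number of available threads".
#include <algorithm>
#include <cstddef>
#include <regex>
#include <stdexcept>
#include <string>
#include <thread>
#include <utility>
#include <vector>

// Returns (file, tree) pairs according to the one-tree-for-all / one-tree-per-file rule.
inline std::vector<std::pair<std::string, std::string>>
SketchPairFilesWithTrees(const std::vector<std::string> &files, const std::vector<std::string> &trees)
{
   if (trees.size() != 1 && trees.size() != files.size())
      throw std::invalid_argument("expected a single tree name or one tree name per file");

   std::vector<std::pair<std::string, std::string>> pairs;
   pairs.reserve(files.size());
   for (std::size_t i = 0; i < files.size(); ++i)
      pairs.emplace_back(files[i], trees.size() == 1 ? trees[0] : trees[i]);
   return pairs;
}

// Returns the branches matched by any pattern; throws if some pattern matches nothing.
inline std::vector<std::string>
SketchMatchBranches(const std::vector<std::string> &branches, const std::vector<std::string> &patterns)
{
   std::vector<std::string> selected;
   for (const auto &pattern : patterns) {
      const std::regex re(pattern);
      bool matchedAny = false;
      for (const auto &branch : branches) {
         if (!std::regex_match(branch, re))
            continue;
         matchedAny = true;
         // Keep each branch once, even if several patterns match it.
         if (std::find(selected.begin(), selected.end(), branch) == selected.end())
            selected.push_back(branch);
      }
      if (!matchedAny)
         throw std::runtime_error("regex '" + pattern + "' does not match any branch");
   }
   return selected;
}

// Caps the requested thread count to the hardware limit.
inline unsigned int SketchCapThreads(unsigned int requested)
{
   // hardware_concurrency() returns 0 if the value cannot be determined; keep the request then.
   const unsigned int available = std::thread::hardware_concurrency();
   return available == 0 ? requested : std::min(requested, available);
}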
"rootreadspeed is a tool that helps identify bottlenecks in ROOT analysis programs by giving an idea of the "
"throughput you can expect when reading ROOT files in certain configurations. It does this by reporting "
"the number of bytes read from your files, how long this takes, and the resulting throughputs in MB/s, both "
"in total and per thread.\n"
"Compressed vs Uncompressed Throughput:\n"
"Throughput is reported for both compressed and uncompressed data: ROOT files are usually saved in compressed "
"format, so the two will often differ. 'Compressed bytes' is the total number of bytes read from TFiles during the "
"readspeed test (possibly including metadata). 'Uncompressed bytes' is the number of bytes processed by reading the "
"branch values in the TTree. In both cases, throughput is the total number of bytes divided by the total real "
"(wall-clock) runtime, which includes decompression time.\n"
"Interpreting results:\n"
"There are three possible scenarios when using rootreadspeed, namely:\n"
"1. The 'Real Time' is significantly lower than your own analysis runtime.\n"
"This would imply your actual application code is dominating the runtime of your analysis, i.e. your analysis "
"logic or framework is taking up the time. The best way to decrease the runtime would be to optimize your code "
"(or the framework's), parallelize it onto multiple threads if possible (for example with RDataFrame and "
"EnableImplicitMT) or switch to a machine with a more performant CPU.\n"
"2. The 'Real Time' is significantly higher than 'CPU Time / number of threads'.\n"
"If the real time is higher than the CPU time per thread, reading the data is the bottleneck: the CPU "
"threads are wasting time waiting for data to arrive from your disk/drive or network connection in order to "
"decompress it. The best way to decrease your runtime would be to transfer the data you need onto a faster storage "
"medium (e.g. a faster disk/drive such as an SSD, or a faster network connection for remote file access), or to "
"use a compression algorithm with a higher compression ratio, possibly at the cost of the decompression rate. "
"Changing the number of threads is unlikely to help, and in fact using too many threads may degrade "
"performance if they make requests to different regions of your local storage.\n"
"N.B. The number of threads is 1 if no '--threads' argument was provided; otherwise it is the minimum of the "
"value provided and the number of threads your CPU can run in parallel. Note that on shared systems, or when "
"running other heavy applications, fewer of your own threads may be running at any given time than this limit "
"allows, because of competing demand for the CPU.\n"
"3. The 'Real Time' is similar to 'CPU Time / number of threads' AND 'Compressed Throughput' is lower than "
"expected for your storage medium:\n"
"This would imply that your CPU threads aren't decompressing data as fast as your storage medium can provide it, "
"and so decompression is the bottleneck. The best way to decrease your runtime would be to utilise a system with a "
"faster CPU, make use of more threads when running, or use a compression algorithm with a higher "
"decompression rate such as LZ4, possibly at the cost of some extra file size.\n"
"A note on caching:\n"
"If your data is stored on a local disk, the system may cache some or all of the file in memory after it is first "
"read. If this is representative of how your analysis will run, then there is no concern. However, if you expect to "
"read files only once in a while, so that they are unlikely to be in the cache, consider clearing the "
"cache before running rootreadspeed. On Linux this can be done by running `echo 3 > /proc/sys/vm/drop_caches` as a "
"superuser, or a specific file can be dropped from the cache with `dd of=<FILENAME> oflag=nocache "
"conv=notrunc,fdatasync count=0 > /dev/null 2>&1`.\n"
"Known overhead of TTreeReader, RDataFrame:\n"
"rootreadspeed is designed to read all data present in the specified branches, trees and files at the highest "
"possible speed. When the application bottleneck is not in the computations performed by analysis logic, "
"higher-level interfaces built on top of TTree such as TTreeReader and RDataFrame are known to add a significant "
"runtime overhead with respect to the runtimes reported by rootreadspeed (up to a factor of 2). In realistic "
"analysis applications it has been observed that a large part of that overhead is compensated by the ability of "
"TTreeReader and RDataFrame to read branch values selectively, based on event cuts, and this overhead is expected "
"to be reduced significantly when using RDataFrame in conjunction with RNTuple.";
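// Illustrative sketch only. SketchThroughputMBps and SketchClassifyBottleneck are hypothetical
// helpers, not part of rootreadspeed. They restate the throughput definition and the
// 'CPU Time / number of threads' comparison from the help text above. The 0.80 and 0.50
// efficiency thresholds are the ones used by the result-printing code below. What counts as
// "significantly" higher or lower in a real analysis remains a judgement call.
#include <algorithm>
#include <string>

// Throughput in MB/s: bytes divided by the wall-clock ('Real Time') seconds,
// with 1 MB = 1024 * 1024 bytes, as in the printing code below.
inline double SketchThroughputMBps(double bytes, double realTimeSeconds)
{
   return bytes / realTimeSeconds / 1024. / 1024.;
}

// Efficiency = (CPU time / threads) / real time.
// Near 1: 'Real Time' is close to 'CPU Time / number of threads', so the threads are busy on the CPU
// (decompression, as in scenario 3). Well below 1: the threads mostly wait for data (scenario 2).
inline std::string SketchClassifyBottleneck(double cpuTimeSeconds, double realTimeSeconds, unsigned int threads)
{
   // A thread count of 0 (no thread pool) is treated as a single thread.
   const double efficiency = (cpuTimeSeconds / std::max(threads, 1u)) / realTimeSeconds;
   if (efficiency > 0.80)
      return "likely CPU bound (decompression)";
   if (efficiency < 0.50)
      return "likely I/O bound";
   return "likely balanced";
}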
   std::cout << "Thread pool size:\t\t" << r.fThreadPoolSize << '\n';

   if (r.fMTSetupRealTime > 0.) {
      std::cout << "Real time to setup MT run:\t" << r.fMTSetupRealTime << " s\n";
      std::cout << "CPU time to setup MT run:\t" << r.fMTSetupCpuTime << " s\n";
   }

   std::cout << "Real time:\t\t\t" << r.fRealTime << " s\n";
   std::cout << "CPU time:\t\t\t" << r.fCpuTime << " s\n";

   std::cout << "Uncompressed data read:\t\t" << r.fUncompressedBytesRead << " bytes\n";
   std::cout << "Compressed data read:\t\t" << r.fCompressedBytesRead << " bytes\n";

   // A thread pool size of 0 means no thread pool was used: count it as one thread.
   const unsigned int effectiveThreads = std::max(r.fThreadPoolSize, 1u);

   std::cout << "Uncompressed throughput:\t" << r.fUncompressedBytesRead / r.fRealTime / 1024 / 1024 << " MB/s\n";
   std::cout << "\t\t\t\t" << r.fUncompressedBytesRead / r.fRealTime / 1024 / 1024 / effectiveThreads
             << " MB/s/thread for " << effectiveThreads << " threads\n";
   std::cout << "Compressed throughput:\t\t" << r.fCompressedBytesRead / r.fRealTime / 1024 / 1024 << " MB/s\n";
   std::cout << "\t\t\t\t" << r.fCompressedBytesRead / r.fRealTime / 1024 / 1024 / effectiveThreads
             << " MB/s/thread for " << effectiveThreads << " threads\n\n";

   // Fraction of the wall-clock time during which the threads were busy on the CPU.
   const float cpuEfficiency = (r.fCpuTime / effectiveThreads) / r.fRealTime;

   std::cout << "CPU Efficiency: \t\t" << (cpuEfficiency * 100) << "%\n";
   std::cout << "Reading data is ";
   if (cpuEfficiency > 0.80f) {
      std::cout << "likely CPU bound (decompression).\n";
   } else if (cpuEfficiency < 0.50f) {
      std::cout << "likely I/O bound.\n";
   } else {
      std::cout << "likely balanced (more threads may help though).\n";
   }
   std::cout << "For details run with the --help command.\n";
   const auto argsProvided = args.size() >= 2;
   const auto helpUsed = argsProvided && (args[1] == "--help" || args[1] == "-h");
   const auto longHelpUsed = argsProvided && args[1] == "--help";

   if (!argsProvided || helpUsed) {
      // ...
      std::cout << std::endl;
      // ...
   }

   // ...
   unsigned int nThreads = 0;

   enum class EArgState { kNone, kTrees, kFiles, kBranches, kThreads, kTasksPerWorkerHint } argState = EArgState::kNone;
   enum class EBranchState { kNone, kRegular, kRegex, kAll } branchState = EBranchState::kNone;
   const auto branchOptionsErrMsg =
      "Options --all-branches, --branches, and --branches-regex are mutually exclusive. You can use only one.\n";

   for (size_t i = 1; i < args.size(); ++i) {
      const auto &arg = args[i];

      if (arg == "--trees") {
         argState = EArgState::kTrees;
      } else if (arg == "--files") {
         argState = EArgState::kFiles;
      } else if (arg == "--all-branches") {
         argState = EArgState::kNone;
         if (branchState != EBranchState::kNone && branchState != EBranchState::kAll) {
            std::cerr << branchOptionsErrMsg;
            return {};
         }
         branchState = EBranchState::kAll;
         // Select every branch through a match-all regex.
         d.fUseRegex = true;
         d.fBranchNames = {".*"};
      } else if (arg == "--branches") {
         argState = EArgState::kBranches;
         if (branchState != EBranchState::kNone && branchState != EBranchState::kRegular) {
            std::cerr << branchOptionsErrMsg;
            return {};
         }
         branchState = EBranchState::kRegular;
      } else if (arg == "--branches-regex") {
         argState = EArgState::kBranches;
         if (branchState != EBranchState::kNone && branchState != EBranchState::kRegex) {
            std::cerr << branchOptionsErrMsg;
            return {};
         }
         branchState = EBranchState::kRegex;
         d.fUseRegex = true;
      } else if (arg == "--threads") {
         argState = EArgState::kThreads;
      } else if (arg == "--tasks-per-worker") {
         argState = EArgState::kTasksPerWorkerHint;
      } else if (arg[0] == '-') {
         std::cerr << "Unrecognized option '" << arg << "'\n";
         return {};
      } else {
         // Positional value: attach it to the most recent option that takes values.
         switch (argState) {
         case EArgState::kTrees: d.fTreeNames.emplace_back(arg); break;
         case EArgState::kFiles: d.fFileNames.emplace_back(arg); break;
         case EArgState::kBranches: d.fBranchNames.emplace_back(arg); break;
         case EArgState::kThreads:
            nThreads = std::stoi(arg);
            argState = EArgState::kNone;
            break;
         case EArgState::kTasksPerWorkerHint:
#ifdef R__USE_IMT
            ROOT::TTreeProcessorMT::SetTasksPerWorkerHint(std::stoi(arg));
#else
            std::cerr << "ROOT was built without implicit multi-threading (IMT) support. The --tasks-per-worker option "
                         "will be ignored.\n";
#endif
            // Consume exactly one value in both build configurations.
            argState = EArgState::kNone;
            break;
         default:
            std::cerr << "Unrecognized option '" << arg << "'\n";
            return {};
         }
      }
   }

   return Args{std::move(d), nThreads, branchState == EBranchState::kAll, true};
   std::vector<std::string> args;
   for (int i = 0; i < argc; ++i) {
      args.emplace_back(argv[i]);
   }