The PROOF benchmark framework: TProofBench

  1. Introduction
  2. Creating the TProofBench object
  3. The CPU benchmark
    1. Default benchmark
    2. Using an alternative selector
    3. Running the benchmark
  4. The I/O benchmark
    1. Default benchmark
    2. Creating the default dataset
    3. Running the benchmark
  5. The output file
  6. Saving the performance tree
  7. Displaying the results
    1. DrawCPU
    2. DrawDataSet
  8. Getting the performance specs
  9. Importing proofbench into an older ROOT version
    1. Check-out the module from the trunk
    2. Build the module
    3. Run the module

Introduction 

This page describes how to use the new benchmark module developed by Sangsu Ryu (KISTI) and available starting with ROOT 5.29/02. The code is located in the dedicated sub-directory 'proof/proofbench'. The old benchmark utilities under $ROOTSYS/test/ProofBench, still provided for legacy purposes only, are described separately.

The steering class is called TProofBench. This class has one constructor, whose purpose is to connect to the cluster to be benchmarked. TProofBench steers the running of two types of benchmarks: cycle-driven (aka CPU-intensive) and data-driven (aka I/O-intensive). For I/O-intensive benchmarks TProofBench provides a framework to generate the relevant data files and create the related datasets. By default, the selectors provided with the benchmark module are used. However, it is possible to use alternative selectors and data files.

Creating the TProofBench object

The TProofBench constructor takes three arguments:

TProofBench(const char *url, const char *outfile = "", const char *proofopt = 0)
  1. The URL of the PROOF master or "lite" for a PROOF-Lite session.
  2. The full path of the file in which to save the results of the benchmark. By default the results are saved to a file created in the current directory, with a name of the form proofbench-master-Nw-yyyymmdd-hhmm.root.
  3. Additional options to be passed to TProof::Open when opening the PROOF session for the benchmark.
Example:
    root [0] TProofBench pb("cernvm24.cern.ch")
    Starting master: opening connection ...
    Starting master: OK                                                 
    Opening connections to workers: OK (40 workers)                 
    Setting up worker servers: OK (40 workers)                 
    
    
      ### Welcome to the PROOF test cluster of the SFT group ###
    
    
    PROOF set to parallel mode (40 workers)
    Info in <:setoutfile>: using default output file: 'proofbench-cernvm24.cern.ch-40w-20110208-1217.root'
    root [1] 
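
The default naming scheme can be illustrated with a small standalone sketch. The helper DefaultOutFileName below is hypothetical (it is not part of TProofBench) and only assembles a name following the documented pattern proofbench-master-Nw-yyyymmdd-hhmm.root; how TProofBench obtains the master name and the timestamp internally is not covered here.

```cpp
#include <cstdio>
#include <string>

// Hypothetical helper, not part of ROOT: builds a file name following the
// documented pattern proofbench-<master>-<N>w-<yyyymmdd>-<hhmm>.root.
// The date and time are passed explicitly to keep the sketch deterministic.
std::string DefaultOutFileName(const std::string &master, int nWorkers,
                               int year, int month, int day,
                               int hour, int minute)
{
   char stamp[64];
   // Zero-padded timestamp, e.g. 20110208-1217
   std::snprintf(stamp, sizeof(stamp), "%04d%02d%02d-%02d%02d",
                 year, month, day, hour, minute);
   return "proofbench-" + master + "-" + std::to_string(nWorkers) + "w-" +
          stamp + ".root";
}
```

With the values of the session above (master cernvm24.cern.ch, 40 workers, 8 February 2011 at 12:17) the helper returns 'proofbench-cernvm24.cern.ch-40w-20110208-1217.root', the name reported in the log.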
    

The CPU benchmark

Default benchmark

The default CPU benchmark consists of creating 16 one-dimensional histograms filled with 30000*Nworkers random numbers.

Using an alternative selector

The name of the alternative selector must be set using TProofBench::SetCPUSel(const char *sel). The selector must be known to the system before running; for example, it can be loaded with TProof::Load. Alternatively, a comma-separated list of PAR files to be enabled before the run can be passed via TProofBench::SetCPUPar(const char *parlist).

Running the benchmark

RunCPU: stepping on the number of workers 
Int_t RunCPU(Long64_t ncycles=-1, Int_t start=-1, Int_t stop=-1, Int_t step=-1);
  1. ncycles: number of cycles. Default 1000000;
  2. start: number of workers to start with. Default 1;
  3. stop: maximum number of workers for the scan. Default: number of active workers;
  4. step: increase in number of workers between two points. Default 1;
RunCPUx: stepping on the number of workers per worker node
Int_t RunCPUx(Long64_t ncycles=-1, Int_t start=-1, Int_t stop=-1);
  1. ncycles: number of cycles. Default 1000000;
  2. start: number of workers per node to start with. Default 1;
  3. stop: maximum number of workers per node for the scan. Default: number of active workers per node;
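
As a rough illustration of how start, stop and step define a RunCPU scan, the sketch below lists the worker counts that would be visited. ScanPoints is a hypothetical helper, not a TProofBench method. It assumes that negative arguments select the defaults listed above and that the scan is a plain inclusive loop from start to stop, which may differ in detail from the actual implementation.

```cpp
#include <vector>

// Hypothetical helper, not part of ROOT: returns the numbers of workers
// visited by a scan. nActive is the number of active workers in the session.
std::vector<int> ScanPoints(int start, int stop, int step, int nActive)
{
   // Negative values select the documented defaults
   if (start < 0) start = 1;
   if (stop < 0) stop = nActive;
   if (step < 0) step = 1;

   std::vector<int> points;
   if (step == 0) return points;   // guard against an endless loop

   // Assumption: inclusive loop; 'stop' itself is visited only if the
   // stepping lands on it
   for (int n = start; n <= stop; n += step) points.push_back(n);
   return points;
}
```

Under these assumptions, a session with 40 active workers scanned with start=2, stop=10 and step=4 would be measured at 2, 6 and 10 workers.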

When the benchmark is run, a dedicated TCanvas pops up showing the results in real time. The canvas is divided vertically into two regions. The left zone shows the absolute scaling plot (cycles/s); the right zone shows the same plot normalized to the number of workers. This second plot makes it easy to spot deviations from ideal scalability. An example of the real-time canvas is shown below.

 Example of real time plot during a RunCPUx
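
The relation between the two plots can be sketched in a few lines of standalone code. The names ScanPoint, NormalizedRates and RelativeEfficiency are hypothetical and not part of TProofBench; the sketch only shows the arithmetic of dividing the absolute rate by the number of workers, and compares each point to the first one as one possible way to quantify the deviation from ideal scalability.

```cpp
#include <vector>

// Hypothetical container for one point of a scan
struct ScanPoint {
   int nWorkers;   // number of active workers for this point (assumed > 0)
   double rate;    // absolute rate, e.g. cycles/s
};

// Rate per worker: the quantity shown in the normalized plot
std::vector<double> NormalizedRates(const std::vector<ScanPoint> &points)
{
   std::vector<double> norm;
   norm.reserve(points.size());
   for (const ScanPoint &p : points) norm.push_back(p.rate / p.nWorkers);
   return norm;
}

// Ratio of each normalized rate to that of the first point; with ideal
// scalability all the values are 1
std::vector<double> RelativeEfficiency(const std::vector<ScanPoint> &points)
{
   std::vector<double> eff;
   const std::vector<double> norm = NormalizedRates(points);
   for (double r : norm) eff.push_back(r / norm.front());
   return eff;
}
```

For instance, rates of 100, 200 and 300 cycles/s measured with 1, 2 and 4 workers give normalized rates of 100, 100 and 75 cycles/s per worker: the last point falls to 75% of the ideal value, which appears as a drop in the otherwise flat normalized plot.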

The I/O benchmark

Default benchmark

The default benchmark is based on reading trees of Event structures (see $ROOTSYS/test/Event.h, .cxx).

Creating the default dataset

The default dataset can be created using the MakeDataSet method taking three arguments:

Int_t MakeDataSet(const char *dset = 0, Long64_t nevt = -1, const char *fnroot = "event")
  1. the name of the created dataset. Default is 'BenchDataSet';
  2. the number of events in the files. Default is 30000;
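  3. the root for the file names. Default is 'event', so the files are named 'event-<host>-1.root', 'event-<host>-2.root', etc., where <host> is the name of the worker node holding the file (see the example output below).
Example: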
    root [1] pb.MakeDataSet()
    Info in <:makedataset>: uploading 'proof/proofbench/src/ProofBenchDataSel.par' ...
    Info in <:makedataset>: enabling 'ProofBenchDataSel' ...
    Collection name='TMap', class='TMap', size=5
     Key:   TObjString = cernvm30.cern.ch
     Value:  Collection name='THashList', class='THashList', size=32
     Key:   TObjString = cernvm34.cern.ch
     Value:  Collection name='THashList', class='THashList', size=32
     Key:   TObjString = cernvm28.cern.ch
     Value:  Collection name='THashList', class='THashList', size=32
     Key:   TObjString = cernvm32.cern.ch
     Value:  Collection name='THashList', class='THashList', size=32
     Key:   TObjString = cernvm26.cern.ch
     Value:  Collection name='THashList', class='THashList', size=32
    Mst-0: merging output objects ... done                                     
    Mst-0: grand total: sent 45 objects, size: 49008 bytes                            
    Collection name='TList', class='TList', size=45
     Collection name='MissingFiles', class='TList', size=0
     OBJ: TStatus   PROOF_Status    OK
     OBJ: TOutputListSelectorDataMap        PROOF_TOutputListSelectorDataMap_object Converter from output list to TSelector data members
     Collection name='PROOF_FilesGenerated_cernvm30.cern.ch_0.27', class='TList', size=5
     Collection name='PROOF_FilesGenerated_cernvm30.cern.ch_0.17', class='TList', size=6
    
     ...
    
     Collection name='PROOF_FilesGenerated_cernvm26.cern.ch_0.5', class='TList', size=4
     Collection name='PROOF_FilesGenerated_cernvm26.cern.ch_0.30', class='TList', size=4
     OBJ: TParameter        PROOF_MinPacketTime     Named templated parameter type
     OBJ: TParameter        PROOF_MaxPacketTime     Named templated parameter type
    TFileCollection dum - dum contains: 0 files with a size of 0 bytes, 0.0 % staged - default tree name: '(null)'
    The collection contains the following files:
    Collection name='THashList', class='THashList', size=160
     root://cernvm30.cern.ch//pool/proofbox/data/default/ganis/event-cernvm30.cern.ch-1.root -|-|- d41d8cd98f00b204e9800998ecf8427e
     root://cernvm30.cern.ch//pool/proofbox/data/default/ganis/event-cernvm30.cern.ch-10.root -|-|- d41d8cd98f00b204e9800998ecf8427e
    
     ...
    
     root://cernvm26.cern.ch//pool/proofbox/data/default/ganis/event-cernvm26.cern.ch-152.root -|-|- d41d8cd98f00b204e9800998ecf8427e
     root://cernvm26.cern.ch//pool/proofbox/data/default/ganis/event-cernvm26.cern.ch-160.root -|-|- d41d8cd98f00b204e9800998ecf8427e
    16:12:39 25785 Mst-0 | Info in <:scandataset>: opening 160 files that appear to be newly staged
    16:12:39 25785 Mst-0 | Info in <:scandataset>: processing 0.'new' file: root://cernvm26.cern.ch//pool/proofbox/data/default/ganis/event-cernvm26.cern.ch-129.root
    16:12:39 25785 Mst-0 | Info in <:scandataset>: processing 16.'new' file: root://cernvm26.cern.ch//pool/proofbox/data/default/ganis/event-cernvm26.cern.ch-145.root
    16:12:39 25785 Mst-0 | Info in <:scandataset>: processing 32.'new' file: root://cernvm28.cern.ch//pool/proofbox/data/default/ganis/event-cernvm28.cern.ch-65.root
    16:12:40 25785 Mst-0 | Info in <:scandataset>: processing 48.'new' file: root://cernvm28.cern.ch//pool/proofbox/data/default/ganis/event-cernvm28.cern.ch-81.root
    16:12:40 25785 Mst-0 | Info in <:scandataset>: processing 64.'new' file: root://cernvm30.cern.ch//pool/proofbox/data/default/ganis/event-cernvm30.cern.ch-1.root
    16:12:40 25785 Mst-0 | Info in <:scandataset>: processing 80.'new' file: root://cernvm30.cern.ch//pool/proofbox/data/default/ganis/event-cernvm30.cern.ch-24.root
    16:12:40 25785 Mst-0 | Info in <:scandataset>: processing 96.'new' file: root://cernvm32.cern.ch//pool/proofbox/data/default/ganis/event-cernvm32.cern.ch-100.root
    16:12:40 25785 Mst-0 | Info in <:scandataset>: processing 112.'new' file: root://cernvm32.cern.ch//pool/proofbox/data/default/ganis/event-cernvm32.cern.ch-116.root
    16:12:41 25785 Mst-0 | Info in <:scandataset>: processing 128.'new' file: root://cernvm34.cern.ch//pool/proofbox/data/default/ganis/event-cernvm34.cern.ch-33.root
    16:12:41 25785 Mst-0 | Info in <:scandataset>: processing 144.'new' file: root://cernvm34.cern.ch//pool/proofbox/data/default/ganis/event-cernvm34.cern.ch-49.root
    16:12:41 25785 Mst-0 | Info in <:scandataset>: 160 files 'new'; 0 files touched; 0 files disappeared
    (Int_t)0
    root [2]
    

Running the benchmark

RunDataSet: stepping on the number of workers 
Int_t RunDataSet(const char *dset = "BenchDataSet",
                    Int_t start = 1, Int_t stop = -1, Int_t step = 1);
  1. dset: dataset name. Default 'BenchDataSet';
  2. start: number of workers to start with. Default 1;
  3. stop: maximum number of workers for the scan. Default: number of active workers;
  4. step: increase in number of workers between two points. Default 1;
RunDataSetx: stepping on the number of workers per worker node
Int_t RunDataSetx(const char *dset = "BenchDataSet", Int_t start = 1, Int_t stop = -1)
  1. dset: dataset name. Default 'BenchDataSet';
  2. start: number of workers per node to start with. Default 1;
  3. stop: maximum number of workers per node for the scan. Default: number of active workers per node;

When the benchmark is run, a dedicated TCanvas pops up showing the results in real time. The canvas is divided into four regions. The left zone shows the absolute scaling plots in terms of events/s (top plot) and MBytes/s (bottom plot); the right zone shows the same quantities normalized to the number of workers. An example of the real-time canvas is shown below.

 Example of real time plot for a data read benchmark

 

The output file

The results are saved in the output file at the path passed by the caller (or at the default path created automatically). The results of the CPU benchmark are saved under the directories 'RunCPU' or 'RunCPUx', those of the I/O-intensive benchmark under 'RunDataRead' or 'RunDataReadx'.

Saving the performance tree

The proofbench tool can save in the output file the performance trees from the various runs done during the benchmark; these can be useful for detailed studies of the results. To save the performance trees, set the debug flag to true by calling the method TProofBench::SetDebug before the run:

pb.SetDebug(kTRUE);

The trees are saved under the relevant subdirectory (i.e. RunCPU, RunCPUx, etc.) and are called PROOF_PerfStats_Type_Nwrks_Tthtry, where Type is the type of benchmark (CPU or DataRead), N is the number of workers during the related run and T is the ordinal number of the run.

Example: PROOF_PerfStats_CPU_8wrks_0thtry will be the name of the tree corresponding to the first run (T=0) of the CPU benchmark (Type=CPU) with 8 workers (N=8).
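
When looking up a given tree in the output file it can be handy to build its name programmatically. The function PerfTreeName below is a hypothetical helper, not part of TProofBench; it only assembles the name from the three components described above.

```cpp
#include <string>

// Hypothetical helper, not part of ROOT: composes the name of a performance
// tree following the pattern PROOF_PerfStats_<Type>_<N>wrks_<T>thtry.
//   type:     benchmark type, "CPU" or "DataRead"
//   nWorkers: number of workers during the run (N)
//   tryIndex: ordinal number of the run, starting from 0 (T)
std::string PerfTreeName(const std::string &type, int nWorkers, int tryIndex)
{
   return "PROOF_PerfStats_" + type + "_" + std::to_string(nWorkers) +
          "wrks_" + std::to_string(tryIndex) + "thtry";
}
```

For example, PerfTreeName("CPU", 8, 0) returns "PROOF_PerfStats_CPU_8wrks_0thtry", the name used in the example above.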

Displaying the results

Two methods are provided to show the results: DrawCPU and DrawDataSet.

TProofBench::DrawCPU: drawing the results of the CPU benchmark
static void DrawCPU(const char *filewithresults, const char *opt = "std:", Bool_t verbose = kFALSE, Int_t dofit = 0);
  1. filewithresults: the file with the results to be displayed; this is filled during RunCPU or RunCPUx;
  2. opt: what to plot:
    1. 'std:' : the standard plot rate vs number of workers (default);
    2. 'stdx:' : the standard plot rate vs number of workers per node;
    3. 'norm:' : the normalized rate plot vs number of workers;
    4. 'normx:' : the normalized rate plot vs number of workers per node;
  3. verbose: if kTRUE print details about the graph points;
  4. dofit: control extraction of performance specs (see below);
TProofBench::DrawDataSet: drawing the results of the I/O benchmark
static void DrawDataSet(const char *filewithresults, const char *opt = "std:", const char *type = "mbs", Bool_t verbose = kFALSE);
  1. filewithresults: the file with the results to be displayed; this is filled during RunDataSet or RunDataSetx;
  2. opt: what to plot:
    1. 'std:' : the standard plot rate vs number of workers (default);
    2. 'stdx:' : the standard plot rate vs number of workers per node;
    3. 'norm:' : the normalized rate plot vs number of workers;
    4. 'normx:' : the normalized rate plot vs number of workers per node;
  3. type: which rate:
    1. 'mbs:' : I/O rate (default);
    2. 'evts:' : event rate;
  4. verbose: if kTRUE print details about the graph points;
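
The four values of opt combine two independent choices: absolute or normalized rate, and stepping per worker or per worker node. The sketch below makes this mapping explicit. PlotChoice and ParsePlotOption are hypothetical names introduced only for illustration; the sketch does not reproduce how TProofBench itself interprets the option string.

```cpp
#include <string>

// Hypothetical description of what a plot option selects
struct PlotChoice {
   bool normalized;   // true for 'norm:' and 'normx:'
   bool perNode;      // true for 'stdx:' and 'normx:'
};

// Hypothetical helper, not part of ROOT: decodes one of the four documented
// option strings. Returns false, leaving 'choice' untouched, for any other
// string.
bool ParsePlotOption(const std::string &opt, PlotChoice &choice)
{
   if (opt == "std:")   { choice = {false, false}; return true; }
   if (opt == "stdx:")  { choice = {false, true};  return true; }
   if (opt == "norm:")  { choice = {true,  false}; return true; }
   if (opt == "normx:") { choice = {true,  true};  return true; }
   return false;
}
```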

Getting the performance specs

Starting from the development version 5.33/02 (and the tagged patch versions 5.32/01 and 5.30/06), TProofBench provides a way to extract some performance specs from the output of the CPU benchmark. This is controlled by the last argument of TProofBench::DrawCPU, an integer that can take three values: 0, do nothing; 1, extract the numbers using a 1st degree parametrization; 2, as 1 but using a 2nd degree parametrization.

For the scalability plot the parametrizations are just 1st or 2nd degree polynomials. For the normalized plot, the fitting function is the rational expression obtained from the ratio of the same polynomial and the simple 1st degree polynomial with parameters {0,1.}.
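
The two parametrizations can be written down explicitly. Reading the 1st degree polynomial with parameters {0,1.} as 0 + 1*x = x, the normalized function is simply the scaling polynomial divided by the number of workers x. The functions ScalingModel and NormalizedModel below are hypothetical names used to sketch this reading; they are not the functions used internally by TProofBench.

```cpp
#include <cstddef>
#include <vector>

// Scaling parametrization: p[0] + p[1]*x (+ p[2]*x*x), with x the number of
// workers. Evaluated with Horner's scheme, so it works for any degree.
double ScalingModel(double x, const std::vector<double> &p)
{
   double value = 0.;
   for (std::size_t i = p.size(); i-- > 0;) value = value * x + p[i];
   return value;
}

// Normalized parametrization: ratio of the scaling polynomial and the
// 1st degree polynomial 0 + 1*x (assumes x != 0)
double NormalizedModel(double x, const std::vector<double> &p)
{
   return ScalingModel(x, p) / x;
}
```

With an ideal linear scaling, e.g. parameters {0, 100}, the normalized function is constant (100 per worker); a negative 2nd degree coefficient makes it decrease as the number of workers grows.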

When the argument is 1 or 2 a table with the relevant measurements is printed on the screen.

The method TProofBench::GetPerfSpecs provides a simplified wrapper around TProofBench::DrawCPU. It adds the possibility to scan a directory for proofbench outputs and to choose which one to use for the measurement.

A collection of performance specs from a few setups is also available.

Importing proofbench into an older ROOT version

It is possible to import and build the proofbench module in previous ROOT versions. The module is client-side only, so once built it can be used to benchmark any PROOF installation.

The following instructions assume that we are in $ROOTSYS; they have been tested on ROOT 5.28/00 but should work with other reasonably recent ROOT versions.

1. Check-out the module from the trunk

$ cd proof
$ svn co http://root.cern.ch/svn/root/trunk/proof/proofbench
A    proofbench/src
A    proofbench/src/TProofBenchRun.cxx
A    proofbench/src/TProofBenchDataSet.cxx
A    proofbench/src/TSelHist.cxx
A    proofbench/src/TProofNodes.cxx
A    proofbench/src/TSelEventGen.cxx
A    proofbench/src/TProofBenchRunDataRead.cxx
A    proofbench/src/TProofBenchRunCPU.cxx
A    proofbench/src/TProofBench.cxx
A    proofbench/src/TSelEvent.cxx
A    proofbench/src/TSelHandleDataSet.cxx
A    proofbench/inc
A    proofbench/inc/TProofBench.h
A    proofbench/inc/TSelEvent.h
A    proofbench/inc/TSelHandleDataSet.h
A    proofbench/inc/LinkDef.h
A    proofbench/inc/TProofBenchRun.h
A    proofbench/inc/TProofBenchTypes.h
A    proofbench/inc/TProofBenchDataSet.h
A    proofbench/inc/TSelHist.h
A    proofbench/inc/TProofNodes.h
A    proofbench/inc/TSelEventGen.h
A    proofbench/inc/TProofBenchRunDataRead.h
A    proofbench/inc/TProofBenchRunCPU.h
A    proofbench/Module.mk
Checked out revision 40758.
$

2. Build the module

To build the module we need to make the main ROOT Makefile aware of it. We do this by overriding the MODULES variable on the command line, listing only a minimal set of modules (the core ones) needed to build proofbench. Before running make, however, we must create by hand the directory in which the default PAR files needed by the benchmark will be generated:

$ cd ..
$ mkdir -p etc/proof/proofbench
$ make MODULES="build cint/cint core/metautils core/pcre core/clib core/utils core/base core/cont core/meta core/zip core/thread core/newdelete proof/proofbench" all-proofbench
cp /home/ganis/local/root/5-28-00-patches/root/proof/proofbench/inc/TProofBenchDataSet.h include/TProofBenchDataSet.h
cp /home/ganis/local/root/5-28-00-patches/root/proof/proofbench/inc/TProofBench.h include/TProofBench.h
cp /home/ganis/local/root/5-28-00-patches/root/proof/proofbench/inc/TProofBenchRunCPU.h include/TProofBenchRunCPU.h
cp /home/ganis/local/root/5-28-00-patches/root/proof/proofbench/inc/TProofBenchRunDataRead.h include/TProofBenchRunDataRead.h
cp /home/ganis/local/root/5-28-00-patches/root/proof/proofbench/inc/TProofBenchRun.h include/TProofBenchRun.h
cp /home/ganis/local/root/5-28-00-patches/root/proof/proofbench/inc/TProofBenchTypes.h include/TProofBenchTypes.h
cp /home/ganis/local/root/5-28-00-patches/root/proof/proofbench/inc/TProofNodes.h include/TProofNodes.h
bin/rmkdepend -R -fproof/proofbench/src/TProofBench.d -Y -w 1000 -- -pipe -m64 -Wshadow -Wall -W -Woverloaded-virtual -fPIC -Iinclude  -pthread -D__cplusplus -- /home/ganis/local/root/5-28-00-patches/root/proof/proofbench/src/TProofBench.cxx
g++ -O2 -pipe -m64 -Wshadow -Wall -W -Woverloaded-virtual -fPIC -Iinclude  -pthread -o proof/proofbench/src/TProofBench.o -c /home/ganis/local/root/5-28-00-patches/root/proof/proofbench/src/TProofBench.cxx
bin/rmkdepend -R -fproof/proofbench/src/TProofBenchDataSet.d -Y -w 1000 -- -pipe -m64 -Wshadow -Wall -W -Woverloaded-virtual -fPIC -Iinclude  -pthread -D__cplusplus -- /home/ganis/local/root/5-28-00-patches/root/proof/proofbench/src/TProofBenchDataSet.cxx
g++ -O2 -pipe -m64 -Wshadow -Wall -W -Woverloaded-virtual -fPIC -Iinclude  -pthread -o proof/proofbench/src/TProofBenchDataSet.o -c /home/ganis/local/root/5-28-00-patches/root/proof/proofbench/src/TProofBenchDataSet.cxx
bin/rmkdepend -R -fproof/proofbench/src/TProofBenchRunCPU.d -Y -w 1000 -- -pipe -m64 -Wshadow -Wall -W -Woverloaded-virtual -fPIC -Iinclude  -pthread -D__cplusplus -- /home/ganis/local/root/5-28-00-patches/root/proof/proofbench/src/TProofBenchRunCPU.cxx
g++ -O2 -pipe -m64 -Wshadow -Wall -W -Woverloaded-virtual -fPIC -Iinclude  -pthread -o proof/proofbench/src/TProofBenchRunCPU.o -c /home/ganis/local/root/5-28-00-patches/root/proof/proofbench/src/TProofBenchRunCPU.cxx
bin/rmkdepend -R -fproof/proofbench/src/TProofBenchRun.d -Y -w 1000 -- -pipe -m64 -Wshadow -Wall -W -Woverloaded-virtual -fPIC -Iinclude  -pthread -D__cplusplus -- /home/ganis/local/root/5-28-00-patches/root/proof/proofbench/src/TProofBenchRun.cxx
g++ -O2 -pipe -m64 -Wshadow -Wall -W -Woverloaded-virtual -fPIC -Iinclude  -pthread -o proof/proofbench/src/TProofBenchRun.o -c /home/ganis/local/root/5-28-00-patches/root/proof/proofbench/src/TProofBenchRun.cxx
bin/rmkdepend -R -fproof/proofbench/src/TProofBenchRunDataRead.d -Y -w 1000 -- -pipe -m64 -Wshadow -Wall -W -Woverloaded-virtual -fPIC -Iinclude  -pthread -D__cplusplus -- /home/ganis/local/root/5-28-00-patches/root/proof/proofbench/src/TProofBenchRunDataRead.cxx
g++ -O2 -pipe -m64 -Wshadow -Wall -W -Woverloaded-virtual -fPIC -Iinclude  -pthread -o proof/proofbench/src/TProofBenchRunDataRead.o -c /home/ganis/local/root/5-28-00-patches/root/proof/proofbench/src/TProofBenchRunDataRead.cxx
bin/rmkdepend -R -fproof/proofbench/src/TProofNodes.d -Y -w 1000 -- -pipe -m64 -Wshadow -Wall -W -Woverloaded-virtual -fPIC -Iinclude  -pthread -D__cplusplus -- /home/ganis/local/root/5-28-00-patches/root/proof/proofbench/src/TProofNodes.cxx
g++ -O2 -pipe -m64 -Wshadow -Wall -W -Woverloaded-virtual -fPIC -Iinclude  -pthread -o proof/proofbench/src/TProofNodes.o -c /home/ganis/local/root/5-28-00-patches/root/proof/proofbench/src/TProofNodes.cxx
Generating dictionary proof/proofbench/src/G__ProofBench.cxx...
core/utils/src/rootcint_tmp -cint -f proof/proofbench/src/G__ProofBench.cxx -c /home/ganis/local/root/5-28-00-patches/root/proof/proofbench/inc/TProofBenchDataSet.h /home/ganis/local/root/5-28-00-patches/root/proof/proofbench/inc/TProofBench.h /home/ganis/local/root/5-28-00-patches/root/proof/proofbench/inc/TProofBenchRunCPU.h /home/ganis/local/root/5-28-00-patches/root/proof/proofbench/inc/TProofBenchRunDataRead.h /home/ganis/local/root/5-28-00-patches/root/proof/proofbench/inc/TProofBenchRun.h /home/ganis/local/root/5-28-00-patches/root/proof/proofbench/inc/TProofBenchTypes.h /home/ganis/local/root/5-28-00-patches/root/proof/proofbench/inc/TProofNodes.h /home/ganis/local/root/5-28-00-patches/root/proof/proofbench/inc/LinkDef.h
bin/rmkdepend -R -fproof/proofbench/src/G__ProofBench.d -Y -w 1000 -- \
           -pipe -m64 -Wshadow -Wall -W -Woverloaded-virtual -fPIC -Iinclude  -pthread  -D__cplusplus -Icint/cint/lib/prec_stl \
           -Icint/cint/stl -I/home/ganis/local/root/5-28-00-patches/root/cint/cint/inc -- proof/proofbench/src/G__ProofBench.cxx
g++  -pipe -m64 -Wshadow -Wall -W -Woverloaded-virtual -fPIC -Iinclude  -pthread  -I. -I/home/ganis/local/root/5-28-00-patches/root/cint/cint/inc  -o proof/proofbench/src/G__ProofBench.o -c proof/proofbench/src/G__ProofBench.cxx
g++ -shared -Wl,-soname,libProofBench.so -m64 -O2 -o lib/libProofBench.so proof/proofbench/src/TProofBench.o proof/proofbench/src/TProofBenchDataSet.o proof/proofbench/src/TProofBenchRunCPU.o proof/proofbench/src/TProofBenchRun.o proof/proofbench/src/TProofBenchRunDataRead.o proof/proofbench/src/TProofNodes.o proof/proofbench/src/G__ProofBench.o
==> lib/libProofBench.so done
bin/rlibmap -o lib/libProofBench.rootmap -l lib/libProofBench.so \
                   -d  -c /home/ganis/local/root/5-28-00-patches/root/proof/proofbench/inc/LinkDef.h
Generating PAR file etc/proof/proofbench/ProofBenchDataSel.par...
Generating PAR file etc/proof/proofbench/ProofBenchCPUSel.par...

3. Run the module

At this point we should be able to run the module. However, since the dependencies file cannot be modified on the fly, the library needed by libProofBench must be loaded by hand, both in the ROOT session and in the PROOF session; the latter can only be done once the session has started, i.e. after the TProofBench object has been created:

$ root -l
root [0] gSystem->Load("libProofPlayer.so")
(int)0
root [1] TProofBench pb("proofadm@cernvm24.cern.ch")
Starting master: opening connection ...
Starting master: OK                                                 
Opening connections to workers: OK (40 workers)                 
Setting up worker servers: OK (40 workers)                 
PROOF set to parallel mode (40 workers)
Info in <:setoutfile>: using default output file: 'proofbench-cernvm24.cern.ch-40w-20110830-1727.root'
root [2] gProof->Exec("gSystem->Load(\"libProofPlayer.so\")")
(int)0
(int)0
...
(Int_t)0
root [3] pb.RunCPU()

Here we go!