You are here

Analysing the performance tree

The performance tree is a TTree containing detailed information about the workflow of a PROOF query. The tree is not created by default: detailed instructions to create and save it can be found in the dedicated section.
Content of this page:

  1. TProofPerfAnalysis: set of tools to analyse the performance tree
    1. Constructor
    2. Worker query information
    3. Event and packet distributions
    4. Processing rates vs query time
    5. Packet retrieval latency vs query time
    6. File distribution
    7. Detailed processing information per file
    8. Saving intermediate drawn objects to file
  2. Description of the performance tree

1. TProofPerfAnalysis: set of tools to analyse the performance tree

The container class TProofPerfAnalysis is an interface to a set of tools to analyse the performance tree of a given query. The class is located under $ROOTSYS/proof/proofbench and therefore appears in the $ROOTSYS/lib/libProofBench.so plug-in. The class has been introduced during the development cycle 5.33/01 and back-ported to the patch branches of production versions 5.30 and 5.32; it therefore ships with the ROOT development release 5.33/02 and with the patch releases 5.30/07 and 5.32/01.

1.1 Constructor

The class constructor has the following signature:

TProofPerfAnalysis(const char *perffile, const char *title = "", const char *treename = "PROOF_PerfStats");

with one mandatory and two optional arguments:

  • perffile, mandatory, is the file to be analysed; can also be located remotely, for example on the web.
  • title, optional, is the title to be used on the canvas to characterize the results of the analyzed queries; by default is set to ""
  • treename, optional, the name of the tree to be analyzed; default is "PROOF_PerfStats". It is also possible to speficy a subdirectory or to add the string passed her to 'PROOF_PerfStats', so that the final serached name is 'PROOF_PerfStats'. If the name specified by treename, or derived by treename, is not found, the constructor will look into the file directory structure for the first TTree whose name contains treename. This is typically enough tolocate the performance tree in the file.

If the initialization is successful, the constructor will run some analysis on the worker performance and print a summary of the result.

Here is an example:

root [0] TProofPerfAnalysis ppa1("H4muEOS-001.root")
 +++ TTree 'PROOF_PerfStats' has 612 entries
 +++ 16 workers were active during this query
 +++ Total query time: 713.634155 secs (init: 0.133229 secs, merge: 432.601013 secs)
 +++ Avg processing rates: 153.8833 evts/s, 29.3369 MB/s
 +++ Max processing rates: 594.7538 evts/s, 80.6259 MB/s
 +++ 11 files were processed during this query

The summary information can be also re-printed with the method

   void  Summary(Option_t *opt = "", const char *out = "") .
Currently only option 'S' is supported to get a compact format:
root [1] ppa1.Summary("S")
16 713.634155 0.133229 432.601013 153.883303 594.753847 29.336907 80.625946

 

The second argument specifies a text file where to save the information; the file is (re)created at each call, unless a '+' is pre-pended to the file name, in which case the information is appended to en existing file.

1.2 Worker query information

The performance tree contains detailed information about packets. This can be used to the performance of a given workers during the query. A summary of the worker activity can be displayed using the method PrintWrkInfo, available with two signatures:

   void  PrintWrkInfo(Int_t showlast = 10);
   void  PrintWrkInfo(const char *wrk);
The first form can be used to printout the information about the showlast slowest workers. The second to display the the information of a given worker or a set of workers whose ordinal number or FQDN matches the string passed argument. Here are two ways to display the same information using the two methods:
root [1] pf.PrintWrkInfo(2)                                                                                                                                            
 +++ TWrkInfo ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++                                                                                       
 +++ Worker:             0.100, lxfssl3310.cern.ch
 +++ Activity interval:  3.842385 -> 433.544617
 +++ Amounts processed:  32 packets (32 remote), 263633 evts, 2516378014 bytes
 +++ Processing time:    429.675842 s (CPU: 48.580000 s)
 +++ Averages:           613.562538 evts/s, 5.585152 MB/s
 +++ Total latency:      0.025222
 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ 
 +++ TWrkInfo ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ 
 +++ Worker:             0.112, lxfssl3404.cern.ch
 +++ Activity interval:  3.500722 -> 433.617889
 +++ Amounts processed:  18 packets (18 remote), 234250 evts, 2084701916 bytes
 +++ Processing time:    430.099195 s (CPU: 49.230000 s)
 +++ Averages:           544.641801 evts/s, 4.622484 MB/s
 +++ Total latency:      0.017882
 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ 
and
root [1] pf.PrintWrkInfo("0.100,lxfssl3404.cern.ch")
 +++ TWrkInfo ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++                                                                                       
 +++ Worker:             0.100, lxfssl3310.cern.ch
 +++ Activity interval:  3.842385 -> 433.544617
 +++ Amounts processed:  32 packets (32 remote), 263633 evts, 2516378014 bytes
 +++ Processing time:    429.675842 s (CPU: 48.580000 s)
 +++ Averages:           613.562538 evts/s, 5.585152 MB/s
 +++ Total latency:      0.025222
 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ 
 +++ TWrkInfo ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ 
 +++ Worker:             0.112, lxfssl3404.cern.ch
 +++ Activity interval:  3.500722 -> 433.617889
 +++ Amounts processed:  18 packets (18 remote), 234250 evts, 2084701916 bytes
 +++ Processing time:    430.099195 s (CPU: 49.230000 s)
 +++ Averages:           544.641801 evts/s, 4.622484 MB/s
 +++ Total latency:      0.017882
 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ 

Graphically is interesting to see how the processing activity developed during the query time. This can be displayed using the WorkerActivity method

void WorkerActivity();
An example of output produced by this method is shown in the figure.

Example of worker activity during the query

The upper plots show the query start-up and processing end. For a good query - like the one shown in here - the starting and finishing time is very close for all the workers. Bad performing workers will show up as long tails in here. The bottom plot quantifies the query finishing time and measures the spread across the workers.

1.3 Event and packet distributions

The packet and event distribution across workers can be displayed using the method EventDist():

 
void  EventDist();
A call to this creates a canvas with in the top plot the events per worker and in the bottom the packets per worker. An example is shown in the figure: Example of worker activity during the query

1.4 Processing rate vs query time

The method RatePlot() displays the event and MB processing rates as a function of the query time per each worker. The signature is the following:

 void  RatePlot(const char *wrks = 0);
The argument allows to select one or more workers by their ordinal number (comma-separated list). The MB rate makes sense only for data-driven queries. In such a case results from remote packets are displayed in red. By pointing on a curve and pressing the right button of the mouse is possible to find out which curve related to which worker, and have a second run for the given worker(s).

An example of the plot is shown in the figure. Example of worker activity during the query

 

1.5 Packet retrieval latency vs query time

The latency in getting a packet from the master measured by the worker can displayed as a function of the query time using the method LatencyPlot():

void  LatencyPlot(const char *wrks = 0);
As in the case of rates, the argument allows to select one or more workers by their ordinal number (comma-separated list). By pointing on a curve and pressing the right button of the mouse is possible to find out which curve related to which worker, and have a second run for the given worker(s).

An example of the plot is shown in the figure. Example of packet retrieval latency as a function of the query time

 

1.6 File distribution

For data-driven queries reading files from several servers it may be interesting to look at the distribution of packets across the servers and workers. This is the purpose of the FileDist() method:

void  FileDist(Bool_t writedetails = kFALSE);
which produces two canvases, one with the distribution of packets and MB across the data servers, which gives an idea of how even are data distributed, and a second canvas with a 3D plots showing the correlations between workers and servers for data serving. Optionally the details of this information can be saved into a text file.

Examples of the plots are shown in the following figures. Example of packet and MB distribution across data servers

Example of {worker, data server} correlations

 

 

 

1.7 Detailed processing information per file

Two methods are provided to display detailed information about the processing of each file:

   void  PrintFileInfo(Int_t showslowest = 10, const char *opt = "", const char *out = 0);
   void  PrintFileInfo(const char *fn, const char *opt = "P", const char *out = 0);

The first method displays information about the files that took longer to process (by default the last ten):

root [6] ppa1.PrintFileInfo(3)
 +++ TFileInfo ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ 
 +++ Server:         root://eosatlas/
 +++ File:           /eos/atlas/user/g/gganis/data1/mc12_8TeV.160155.PowhegPythia8_AU2CT10_ggH125_ZZ4lep.merge.NTUP_HSG2.e1191_s1469_s1470_r3542_r3549_p1344_tid01133894_00/NTUP_HSG2.01133894._000009.root.1
 +++ Processing interval:  0.090499 -> 229.785583
 +++ Packets:         45 (45 remote)
 +++ Processing wrks: 7 (7 remote)
 +++ MB rates:       5.388210 MB/s (avg), 2.640950 MB/s (min), 10.326082 MB/s (max)
 +++ Sizes:          222  (avg), 19 (min), 1000 (max)
 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ 
 +++ TFileInfo ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ 
 +++ Server:         root://eosatlas/
 +++ File:           /eos/atlas/user/g/gganis/data1/mc12_8TeV.160155.PowhegPythia8_AU2CT10_ggH125_ZZ4lep.merge.NTUP_HSG2.e1191_s1469_s1470_r3542_r3549_p1344_tid01133894_00/NTUP_HSG2.01133894._000001.root.1
 +++ Processing interval:  0.062248 -> 612.267883
 +++ Packets:         11 (11 remote)
 +++ Processing wrks: 2 (2 remote)
 +++ MB rates:       7.362400 MB/s (avg), 0.239816 MB/s (min), 11.984926 MB/s (max)
 +++ Sizes:          909  (avg), 686 (min), 1180 (max)
 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ 
 +++ TFileInfo ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ 
 +++ Server:         root://eosatlas/
 +++ File:           /eos/atlas/user/g/gganis/data1/mc12_8TeV.160155.PowhegPythia8_AU2CT10_ggH125_ZZ4lep.merge.NTUP_HSG2.e1191_s1469_s1470_r3542_r3549_p1344_tid01133894_00/NTUP_HSG2.01133894._000004.root.1
 +++ Processing interval:  151.918732 -> 713.634155
 +++ Packets:         159 (159 remote)
 +++ Processing wrks: 15 (15 remote)
 +++ MB rates:       7.150163 MB/s (avg), 0.129910 MB/s (min), 19.217761 MB/s (max)
 +++ Sizes:          62  (avg), 8 (min), 1000 (max)
 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ 

The second prototype allows to address a given file. Using the option filed, detailed information about the packets can be obtained:

root [10] ppa1.PrintFileInfo("/eos/atlas/user/g/gganis/data1/mc12_8TeV.160155.PowhegPythia8_AU2CT10_ggH125_ZZ4lep.merge.NTUP_HSG2.e1191_s1469_s1470_r3542_r3549_p1344_tid01133894_00/NTUP_HSG2.01133894._000001.root.1", "WP")
 +++ TFileInfo ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ 
 +++ Server:         root://eosatlas/
 +++ File:           /eos/atlas/user/g/gganis/data1/mc12_8TeV.160155.PowhegPythia8_AU2CT10_ggH125_ZZ4lep.merge.NTUP_HSG2.e1191_s1469_s1470_r3542_r3549_p1344_tid01133894_00/NTUP_HSG2.01133894._000001.root.1
 +++ Processing interval:  0.062248 -> 612.267883
 +++ Packets:         11 (11 remote)
 +++ Processing wrks: 2 (2 remote)
Collection name='TList', class='TList', size=2
  Worker: 0.3,  packet(s): 10
              1000 evts,          4.92 MB/s,         0.062 ->       22.280 s
               900 evts,          5.63 MB/s,        22.283 ->       53.295 s
               713 evts,         11.98 MB/s,        53.296 ->       63.765 s
               820 evts,         10.39 MB/s,        63.766 ->       71.557 s
               960 evts,          9.76 MB/s,        71.557 ->       81.750 s
              1075 evts,          9.92 MB/s,        81.751 ->       92.715 s
              1180 evts,          5.75 MB/s,        92.715 ->      117.761 s
               892 evts,          8.44 MB/s,       117.761 ->      127.196 s
               774 evts,          5.77 MB/s,       127.196 ->      144.127 s
               686 evts,          8.17 MB/s,       144.128 ->      151.290 s
  Worker: 0.8,  packet(s): 1
              1000 evts,          0.24 MB/s,         0.111 ->      612.268 s
 +++ MB rates:       7.362400 MB/s (avg), 0.239816 MB/s (min), 11.984926 MB/s (max)
 +++ Sizes:          909  (avg), 686 (min), 1180 (max)
 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ 
The last argument allows to specify a text file where the information displayed on the screen can be saved.

 

1.8 Saving drawn objects to file

Starting with version 5.34/32 (and 6.04/00) the temporary objects created during the various calls described above can be save to a file for further processing. This functionality is controlled by the TProofPerfAnalysis method SetSaveResult:

Int_t SetSaveResult(const char *file = "results.root", Option_t *mode = "RECREATE")

The fuctionality remains enabled after a first call to SetSaveResult and can be disabled by a call with 'file' null . By default a new file is created each time SetSaveResult is calle with a non-null 'file'; the second argument allows to update files instead of recreate them.

Example:

root [] TProofPerfAnalysis paf("perf_analysis_output.root")
root [] paf.SetSaveResult()
Drawn objects will be saved in file 'results.root'
root [] paf.RatePlot()
root [] paf.SetSaveResult("results-global.root", "UPDATE")
Drawn objects will be saved in file 'results-global.root'
root [] paf.RatePlot()
root [] paf.SetSaveResult(0)
Drawn objects saving disabled

 

2. Description of the performance tree

The name of the tree is PROOF_PerfStats and its entries are instances of the class TPerfEvent. The tree is filled via the TPerfStats interface class, an implementation of TVirtualPerfStats. The class TPerfEvent contains the following:

class TPerfEvent : public TObject {

public:
   TString                       fEvtNode;   // node on which the event was generated
   TTimeStamp                    fTimeStamp; // time offset from start of run
   TVirtualPerfStats::EEventType fType;
   TString                       fSlaveName;
   TString                       fNodeName;
   TString                       fFileName;
   TString                       fFileClass;
   TString                       fSlave;
   Long64_t                      fEventsProcessed;
   Long64_t                      fBytesRead;
   Long64_t                      fLen;
   Double_t                      fLatency;
   Double_t                      fProcTime;
   Double_t                      fCpuTime;
   Bool_t                        fIsStart;
   Bool_t                        fIsOk;

   TPerfEvent(TTimeStamp *offset = 0);
   virtual ~TPerfEvent() {}

   Bool_t   IsSortable() const { return kTRUE; }
   Int_t    Compare(const TObject *obj) const;
   void     Print(Option_t *option="") const;

   ClassDef(TPerfEvent,3) // Class holding TProof Event Info
};

The meaning of the different members depend of the event type. The types are described in TVirtualPerfStats.h :

   enum EEventType {
      kUnDefined,
      kPacket,       //info of single packet processing
      kStart,        //begin of run
      kStop,         //end of run
      kFile,         //file started/finished in packetizer
      kFileOpen,     //opening data file statistics
      kFileRead,     //reading data file event
      kRate,         //processing {evt, MB} rates
      kNumEventType  //number of entries, must be last
   };

Not all event types are filled during a PROOF query, but only the following:

  1. kStart, marking the begin of query processing;
  2. kPacket, after each packet has been processed;
  3. kFile, when a file is assigned to a worker and when there is no more packets from the file to be given to the worker;
  4. kRate, when a message with processing progress information is sent to the client;
  5. kStop, marking the end of query processing.

Each entry in the tree has a timestamp, so that time correlation between the different events are possible.