Analysing the performance tree

The performance tree is a TTree containing detailed information about the workflow of a PROOF query. The tree is not created by default: detailed instructions to create and save it can be found in the dedicated section.
Content of this page:

  1. TProofPerfAnalysis: set of tools to analyse the performance tree
    1. Constructor
    2. Worker query information
    3. Event and packet distributions
    4. Processing rates vs query time
    5. Packet retrieval latency vs query time
    6. File distribution
  2. Description of the performance tree

1. TProofPerfAnalysis: set of tools to analyse the performance tree

The container class TProofPerfAnalysis is an interface to a set of tools to analyse the performance tree of a given query. The class is located under $ROOTSYS/proof/proofbench and is therefore available in the $ROOTSYS/lib/libProofBench.so plug-in. The class was introduced during the development cycle 5.33/01 and back-ported to the patch branches of the production versions 5.30 and 5.32; it therefore ships with the ROOT development release 5.33/02 and with the patch releases 5.30/07 and 5.32/01.

1.1 Constructor

The class constructor has the following signature:

TProofPerfAnalysis(const char *perffile, const char *title = "", const char *treename = "PROOF_PerfStats");

with one mandatory and two optional arguments:

  • perffile, mandatory, is the file to be analysed; can also be located remotely, for example on the web.
  • title, optional, is the title to be used on the canvases to characterize the results of the analysed queries; by default it is set to "".
  • treename, optional, is the name of the tree to be analysed; the default is "PROOF_PerfStats". It is also possible to specify a subdirectory, or to have the string passed here appended to 'PROOF_PerfStats', so that the final searched name is 'PROOF_PerfStats<treename>'. If the name specified by treename, or derived from treename, is not found, the constructor will look through the file directory structure for the first TTree whose name contains treename. This is typically enough to locate the performance tree in the file.

If the initialization is successful, the constructor will run some analysis on the worker performance and print a summary of the result.

Here is an example:

root [0] TProofPerfAnalysis pf("perftree-alice-caf-20w-eventproc.root")
 +++ TTree 'PROOF_PerfStats' has 1515 entries
 +++ 20 workers were active during this query
 +++ Total query time: 433.617889 secs (init: 3.843145 secs, merge: 2.958466 secs)
root [1] 

1.2 Worker query information

The performance tree contains detailed information about packets. This can be used to analyse the performance of a given worker during the query. A summary of the worker activity can be displayed using the method PrintWrkInfo, available with two signatures:

   void  PrintWrkInfo(Int_t showlast = 10);
   void  PrintWrkInfo(const char *wrk);
The first form can be used to print out the information about the showlast slowest workers. The second displays the information of a given worker, or of a set of workers, whose ordinal number or FQDN matches the string passed as argument. Here are two ways to display the same information using the two methods:
root [1] pf.PrintWrkInfo(2)                                                                                                                                            
 +++ TWrkInfo ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++                                                                                       
 +++ Worker:             0.100, lxfssl3310.cern.ch
 +++ Activity interval:  3.842385 -> 433.544617
 +++ Amounts processed:  32 packets (32 remote), 263633 evts, 2516378014 bytes
 +++ Processing time:    429.675842 s (CPU: 48.580000 s)
 +++ Averages:           613.562538 evts/s, 5.585152 MB/s
 +++ Total latency:      0.025222
 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ 
 +++ TWrkInfo ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ 
 +++ Worker:             0.112, lxfssl3404.cern.ch
 +++ Activity interval:  3.500722 -> 433.617889
 +++ Amounts processed:  18 packets (18 remote), 234250 evts, 2084701916 bytes
 +++ Processing time:    430.099195 s (CPU: 49.230000 s)
 +++ Averages:           544.641801 evts/s, 4.622484 MB/s
 +++ Total latency:      0.017882
 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ 
and
root [1] pf.PrintWrkInfo("0.100,lxfssl3404.cern.ch")
 +++ TWrkInfo ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++                                                                                       
 +++ Worker:             0.100, lxfssl3310.cern.ch
 +++ Activity interval:  3.842385 -> 433.544617
 +++ Amounts processed:  32 packets (32 remote), 263633 evts, 2516378014 bytes
 +++ Processing time:    429.675842 s (CPU: 48.580000 s)
 +++ Averages:           613.562538 evts/s, 5.585152 MB/s
 +++ Total latency:      0.025222
 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ 
 +++ TWrkInfo ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ 
 +++ Worker:             0.112, lxfssl3404.cern.ch
 +++ Activity interval:  3.500722 -> 433.617889
 +++ Amounts processed:  18 packets (18 remote), 234250 evts, 2084701916 bytes
 +++ Processing time:    430.099195 s (CPU: 49.230000 s)
 +++ Averages:           544.641801 evts/s, 4.622484 MB/s
 +++ Total latency:      0.017882
 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ 

Graphically, it is interesting to see how the processing activity developed during the query time. This can be displayed using the WorkerActivity method

void WorkerActivity();
An example of output produced by this method is shown in the figure.

Example of worker activity during the query

The upper plots show the query start-up and processing end. For a good query - like the one shown here - the starting and finishing times are very close for all the workers. Badly performing workers will show up as long tails here. The bottom plot quantifies the query finishing time and measures the spread across the workers.
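A minimal session sketch, reusing the performance file from the constructor example above:

```
root [0] TProofPerfAnalysis pf("perftree-alice-caf-20w-eventproc.root")
root [1] pf.WorkerActivity()
```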

1.3 Event and packet distributions

The packet and event distribution across workers can be displayed using the method EventDist():

void  EventDist();
A call to this method creates a canvas with the event distribution per worker in the top plot and the packet distribution per worker in the bottom plot. An example is shown in the figure: Example of event and packet distribution across workers

1.4 Processing rates vs query time

The method RatePlot() displays the event and MB processing rates as a function of the query time for each worker. The signature is the following:

 void  RatePlot(const char *wrks = 0);
The argument allows one to select one or more workers by their ordinal number (comma-separated list). The MB rate makes sense only for data-driven queries; in that case results from remote packets are displayed in red. By pointing at a curve and pressing the right mouse button it is possible to find out which curve relates to which worker, and to produce a second plot for the given worker(s) only.

An example of the plot is shown in the figure. Example of processing rates as a function of the query time
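As a usage sketch, a subset of workers can be selected by passing their ordinals; the ordinals below are the ones appearing in the PrintWrkInfo output earlier:

```
root [1] pf.RatePlot()              // rates for all workers
root [2] pf.RatePlot("0.100,0.112") // only workers 0.100 and 0.112
```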

 

1.5 Packet retrieval latency vs query time

The latency in retrieving a packet from the master, as measured by the worker, can be displayed as a function of the query time using the method LatencyPlot():

void  LatencyPlot(const char *wrks = 0);
As in the case of rates, the argument allows one to select one or more workers by their ordinal number (comma-separated list). By pointing at a curve and pressing the right mouse button it is possible to find out which curve relates to which worker, and to produce a second plot for the given worker(s) only.

An example of the plot is shown in the figure. Example of packet retrieval latency as a function of the query time

 

1.6 File distribution

For data-driven queries reading files from several servers it may be interesting to look at the distribution of packets across the servers and workers. This is the purpose of the FileDist() method:

void  FileDist(Bool_t writedetails = kFALSE);
which produces two canvases: one with the distribution of packets and MB across the data servers, giving an idea of how evenly the data are distributed, and a second with a 3D plot showing the correlations between workers and servers in data serving. Optionally the details of this information can be saved into a text file.

Examples of the plots are shown in the following figures. Example of packet and MB distribution across data servers

Example of {worker, data server} correlations


2. Description of the performance tree

The name of the tree is PROOF_PerfStats and its entries are instances of the class TPerfEvent. The tree is filled via the TPerfStats interface class, an implementation of TVirtualPerfStats. The class TPerfEvent contains the following:

class TPerfEvent : public TObject {
 
public:
   TString                       fEvtNode;   // node on which the event was generated
   TTimeStamp                    fTimeStamp; // time offset from start of run
   TVirtualPerfStats::EEventType fType;
   TString                       fSlaveName;
   TString                       fNodeName;
   TString                       fFileName;
   TString                       fFileClass;
   TString                       fSlave;
   Long64_t                      fEventsProcessed;
   Long64_t                      fBytesRead;
   Long64_t                      fLen;
   Double_t                      fLatency;
   Double_t                      fProcTime;
   Double_t                      fCpuTime;
   Bool_t                        fIsStart;
   Bool_t                        fIsOk;
 
   TPerfEvent(TTimeStamp *offset = 0);
   virtual ~TPerfEvent() {}
 
   Bool_t   IsSortable() const { return kTRUE; }
   Int_t    Compare(const TObject *obj) const;
   void     Print(Option_t *option="") const;
 
   ClassDef(TPerfEvent,3) // Class holding TProof Event Info
};

The meaning of the different members depends on the event type. The types are described in TVirtualPerfStats.h:

   enum EEventType {
      kUnDefined,
      kPacket,       //info of single packet processing
      kStart,        //begin of run
      kStop,         //end of run
      kFile,         //file started/finished in packetizer
      kFileOpen,     //opening data file statistics
      kFileRead,     //reading data file event
      kRate,         //processing {evt, MB} rates
      kNumEventType  //number of entries, must be last
   };

Only the following event types are filled during a PROOF query:

  1. kStart, marking the beginning of query processing;
  2. kPacket, after each packet has been processed;
  3. kFile, when a file is assigned to a worker and when there are no more packets from the file to be given to the worker;
  4. kRate, when a message with processing progress information is sent to the client;
  5. kStop, marking the end of query processing.

Each entry in the tree has a timestamp, so that time correlations between the different events are possible.
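The tree can also be inspected directly with a short ROOT macro. The following is a minimal sketch that loops over the entries and prints per-packet information; it assumes the entries are stored in a branch named "PerfEvents" (the branch created by TPerfStats) and that TPerfEvent is available through the TPerfStats.h header:

```cpp
// readperf.C - sketch: loop over a performance tree and print packet events.
// Assumes the branch is named "PerfEvents" (as filled by TPerfStats).
#include "TFile.h"
#include "TTree.h"
#include "TPerfStats.h"   // declares TPerfEvent

void readperf(const char *fn = "perftree-alice-caf-20w-eventproc.root")
{
   TFile *f = TFile::Open(fn);
   if (!f || f->IsZombie()) { Printf("cannot open %s", fn); return; }

   TTree *t = (TTree *) f->Get("PROOF_PerfStats");
   if (!t) { Printf("tree PROOF_PerfStats not found"); return; }

   TPerfEvent *pe = 0;
   t->SetBranchAddress("PerfEvents", &pe);

   // Print one line per kPacket event
   for (Long64_t i = 0; i < t->GetEntries(); i++) {
      t->GetEntry(i);
      if (pe->fType == TVirtualPerfStats::kPacket)
         Printf("wrk %s: %lld evts, %lld bytes, proc time %.3f s",
                pe->fSlave.Data(), pe->fEventsProcessed,
                pe->fBytesRead, pe->fProcTime);
   }
   delete f;
}
```

The same loop can be adapted to the other event types, e.g. kRate entries for progress information.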