
More advanced examples

In these pages we dissect the PROOF tutorials and steering macros available under tutorials/proof. The tutorials directory is located under $ROOTSYS for non-prefixed installations, or under docdir for prefixed installations (by default prefix/share/doc/root).

See also the slides for a one-day PROOF tutorial: Introduction, Basic hands-on.

  1. Run the available examples: runProof.C
    1. Cycle-driven processing
      1. "simple": simple random number generation
      2. "event": simple Monte Carlo generation
      3. "pythia8": example of using Pythia 8
    2. Data-driven processing
      1. "h1": h1 analysis
      2. "eventproc": Processing of 'Event' entries
      3. "friends": using TTree friends
    3. Large-output handling via files on workers
      1. "simplefile": merging histograms via files
      2. "ntuple": merging simple ntuple (TTree) via files
      3. "dataset": access simple ntuple (TTree) files via a dataset
  2. Start a PROOF session using getProof.C

 

1. Run the available examples: runProof.C

The macro 'runProof.C' steers the running of the tutorials. It starts a PROOF session, sets the required options, and runs the requested tutorial. The signature of the macro is:

void runProof(const char *what = "simple",
              const char *url = "proof://localhost:40000",
              Int_t nwrks = -1)

The first argument, what, is a string composed of the name of the tutorial to run, optionally followed by one or more arguments, in the form 'name(arg1,arg2,...)'; arguments can be general (applying to all tutorials) or specific to a given tutorial. The available tutorials and general arguments are described in the tables below. The second argument, url, is the place where the tutorial is run; it is passed as the first argument to getProof. The third argument, nwrks, is the number of workers to be used for the tutorial: it is passed as the second argument to getProof and as the argument of a call to TProof::SetParallel.
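The structure of the 'what' string can be illustrated with a small, ROOT-free C++ sketch. The names parseWhat and WhatSpec are hypothetical; this is not the parser used inside runProof.C, and it assumes that arguments contain no nested parentheses or commas.

```cpp
#include <string>
#include <vector>

// Hypothetical illustration of the 'what' syntax "name(arg1,arg2,...)":
// a tutorial name, optionally followed by a parenthesized,
// comma-separated argument list. Not the code of runProof.C.
struct WhatSpec {
    std::string name;               // tutorial name, e.g. "simple"
    std::vector<std::string> args;  // e.g. {"nevt=100000", "asyn"}
};

WhatSpec parseWhat(const std::string &what) {
    WhatSpec spec;
    const auto open = what.find('(');
    if (open == std::string::npos) {
        spec.name = what;  // no argument list given
        return spec;
    }
    spec.name = what.substr(0, open);
    // Take everything up to the closing parenthesis (or the end, if missing).
    auto close = what.find(')', open);
    if (close == std::string::npos) close = what.size();
    const std::string list = what.substr(open + 1, close - open - 1);
    // Split the list on commas, skipping empty items.
    std::string::size_type start = 0;
    while (start <= list.size()) {
        auto comma = list.find(',', start);
        if (comma == std::string::npos) comma = list.size();
        if (comma > start) spec.args.push_back(list.substr(start, comma - start));
        start = comma + 1;
    }
    return spec;
}
```

With this sketch, a string such as "simple(nevt=100000,asyn)" would be split into the name "simple" and the general arguments "nevt=100000" and "asyn".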

The available tutorials are shown in the table:

Available tutorials

Name        Type   Output                  Description
simple      cycle  histograms, 1 canvas    Generate Gaussian random numbers and fill histograms
simplefile  cycle  histograms, 2 canvases  Generate Gaussian random numbers; save histograms in two directories; merge via file
event       cycle  histograms, 1 canvas    Generate Event entries; run a simple analysis; uses the 'event.par' PAR file
pythia8     cycle  histograms, 1 canvas    Generate Pythia 8 Monte Carlo events; run a simple analysis; uses the 'pythia8.par' PAR file
ntuple      cycle  ntuple, 1 canvas        Generate a simple ntuple; merge via file
dataset     cycle  dataset, 1 canvas       Same as 'ntuple', but with dataset creation instead of file merging; uses TProof::DrawSelect
friends     data   histograms, 1 canvas    Generate some simple main trees and their friends; run a simple analysis
h1          data   histograms, 2 canvases  Usual H1 analysis
eventproc   data   histograms, 1 canvas    Simple analysis of files with Event entries; uses the 'event.par' PAR file

The 'Type' column refers to the type of processing. The keyword 'data' means data-driven: a TTree is processed and its entries steer the distribution of work. The keyword 'cycle' means that a task is repeated a specified number of times, with no steering TTree involved; this is the case, for example, when generating Monte Carlo events.

The general arguments are shown in the table:

General arguments

Argument                    Description
debug=[what:]level          Set verbosity to 'level'; optionally select the scope with 'what' (same names as in TProofDebug)
nevt=N                      Process N entries
first=F                     Start processing from entry F (when processing data files)
asyn                        Run in non-blocking (asynchronous) mode
nwrk=N                      Set the number of active workers to N (may not always succeed)
punzip                      Use parallel unzip when reading files
cache=bytes                 Change the size of the TTree cache; use …
cache=kbytesK
cache=mbytesM
submergers[=S]              Enable merging via submergers (number set to S, or to the default)
rateest=average             Use the measured average to estimate the current processing speed reported by the progress bar
perftree=perftreefile.root  Generate the performance tree and save it to perftreefile.root
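Two of the general arguments carry a small value syntax of their own. The sketch below shows one way to read them; cacheSizeBytes, parseDebug and DebugSpec are hypothetical names, the multipliers K = 1024 and M = 1024*1024 are an assumption (this page does not state them), and none of this is taken from runProof.C.

```cpp
#include <cstdlib>
#include <string>

// Hypothetical helper: convert the value of "cache=..." into bytes,
// following the three forms in the table (plain bytes, K suffix, M suffix).
// ASSUMPTION: K = 1024 and M = 1024*1024. Returns -1 for malformed input.
long long cacheSizeBytes(const std::string &value) {
    if (value.empty()) return -1;
    long long multiplier = 1;
    std::string digits = value;
    if (value.back() == 'K') {
        multiplier = 1024;
        digits.pop_back();
    } else if (value.back() == 'M') {
        multiplier = 1024LL * 1024;
        digits.pop_back();
    }
    if (digits.empty()) return -1;
    char *end = nullptr;
    const long long n = std::strtoll(digits.c_str(), &end, 10);
    if (*end != '\0') return -1;  // reject trailing garbage
    return n * multiplier;
}

// Hypothetical helper: split the value of "debug=[what:]level" into an
// optional scope name and a verbosity level (empty scope = none given).
struct DebugSpec {
    std::string scope;
    int level;
};

DebugSpec parseDebug(const std::string &value) {
    DebugSpec d{"", 0};
    const auto colon = value.find(':');
    if (colon != std::string::npos) {
        d.scope = value.substr(0, colon);
        d.level = std::atoi(value.c_str() + colon + 1);
    } else {
        d.level = std::atoi(value.c_str());
    }
    return d;
}
```

Under these assumptions, "cache=10K" would request a 10240-byte cache, and "debug=2" would set level 2 with no scope selected.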

 

In addition to these general arguments, the ACLiC mode used to build the selectors can be controlled. By default '+' is used, i.e. compile-if-changed. However, this may lead to problems if the available selector libraries were compiled in previous sessions with a different set of loaded libraries (this is a general problem in ROOT). When this happens, the best solution is to force recompilation (ACLiC mode '++'): just append '++' to the name of the tutorial, e.g. runProof("event++").
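The rule for choosing the ACLiC mode can be sketched as follows. splitAclicMode and TutorialMode are hypothetical names used only for illustration; runProof.C may implement this differently.

```cpp
#include <string>

// Hypothetical helper mirroring the rule described above: a trailing "++"
// on the tutorial name requests forced recompilation (ACLiC mode "++");
// otherwise compile-if-changed ("+") is used. Not the code of runProof.C.
struct TutorialMode {
    std::string name;   // tutorial name without the suffix
    std::string aclic;  // "+" or "++"
};

TutorialMode splitAclicMode(const std::string &tutorial) {
    const std::string suffix = "++";
    if (tutorial.size() > suffix.size() &&
        tutorial.compare(tutorial.size() - suffix.size(), suffix.size(), suffix) == 0) {
        return {tutorial.substr(0, tutorial.size() - suffix.size()), "++"};
    }
    return {tutorial, "+"};
}
```

Following the usual ACLiC convention, the resulting mode string is what gets appended to the selector file name, e.g. 'ProofEvent.C+' versus 'ProofEvent.C++'.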

 

 

1.1 Cycle-driven processing

The PROOF tutorials include three examples of cycle-driven processing, i.e. of a generic task executed in parallel N times.
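Conceptually, cycle-driven processing amounts to calling the same task for cycle indices 0..N-1, sharing the cycles among workers, and merging the partial results. The ROOT-free sketch below (hypothetical names cyclesPerWorker and runCycles) uses a static, contiguous split purely for illustration; PROOF itself hands out work dynamically through its packetizer, and the workers run in parallel rather than in a loop.

```cpp
#include <cstdint>
#include <vector>

// Conceptual sketch only: split n cycles as evenly as possible among
// 'workers' (the first n % workers workers get one extra cycle).
std::vector<std::int64_t> cyclesPerWorker(std::int64_t n, int workers) {
    std::vector<std::int64_t> share(workers, n / workers);
    for (int w = 0; w < n % workers; ++w) ++share[w];  // spread the remainder
    return share;
}

// 'task' stands in for the per-cycle body of a selector (its Process call);
// here it returns a number that is summed into the worker's partial result.
template <typename Task>
std::int64_t runCycles(std::int64_t n, int workers, Task task) {
    const auto share = cyclesPerWorker(n, workers);
    std::int64_t next = 0, merged = 0;
    for (int w = 0; w < workers; ++w) {  // each iteration plays one worker
        std::int64_t partial = 0;
        for (std::int64_t i = 0; i < share[w]; ++i) partial += task(next++);
        merged += partial;  // the "merge" step performed on the master
    }
    return merged;
}
```

For example, 10 cycles on 4 workers give shares of 3, 3, 2 and 2, and summing the cycle indices yields 45 regardless of the split, which is the property that makes the merged result independent of the number of workers.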

 

1.1.1 "simple": ProofSimple

Selector: tutorials/proof/ProofSimple.h, tutorials/proof/ProofSimple.C
PAR file: none

This example generates Gaussian random numbers and fills a configurable number of 1D histograms.
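The per-cycle work can be pictured with the ROOT-free sketch below: in each cycle one Gaussian number is drawn for each histogram and filled into it. The real selector uses ROOT histograms and ROOT's random generator; Hist1D, runSimple, the binning (100 bins in [-4, 4)), and the std::mt19937 seed are all assumptions made for illustration.

```cpp
#include <random>
#include <vector>

// Minimal fixed-bin histogram: bins[0] is underflow, bins.back() is overflow.
struct Hist1D {
    double lo, hi;
    std::vector<long> bins;
    Hist1D(int nbins, double lo_, double hi_) : lo(lo_), hi(hi_), bins(nbins + 2, 0L) {}
    void fill(double x) {
        const int nb = static_cast<int>(bins.size()) - 2;
        int idx;
        if (x < lo) idx = 0;                 // underflow
        else if (x >= hi) idx = nb + 1;      // overflow
        else idx = 1 + static_cast<int>((x - lo) / (hi - lo) * nb);
        bins[idx]++;
    }
    long entries() const {
        long s = 0;
        for (long b : bins) s += b;
        return s;
    }
};

// Conceptual sketch of "simple": 'ncycles' cycles, 'nhist' histograms,
// each filled with one standard-normal value per cycle.
std::vector<Hist1D> runSimple(long ncycles, int nhist, unsigned seed = 12345) {
    std::vector<Hist1D> hists(nhist, Hist1D(100, -4.0, 4.0));
    std::mt19937 gen(seed);
    std::normal_distribution<double> gaus(0.0, 1.0);
    for (long cycle = 0; cycle < ncycles; ++cycle)
        for (auto &h : hists) h.fill(gaus(gen));
    return hists;
}
```

In this picture each histogram ends up with exactly one entry per cycle, so the number of cycles requested (the 'nevt=N' general argument in runProof) fixes the number of entries per histogram.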

 

1.1.2 "event": ProofEvent

Selector: tutorials/proof/ProofEvent.h, tutorials/proof/ProofEvent.C
PAR file: tutorials/proof/event.par

This is an example of a simple Monte Carlo generation creating 'event'-like structures (see $ROOTSYS/test/Event.h). It also shows how to use a PAR package.

 

 

1.1.3 "pythia8": ProofPythia

Selector: tutorials/proof/ProofPythia.h, tutorials/proof/ProofPythia.C
PAR file: tutorials/proof/pythia8.par

This is similar to the previous example, but uses a real Monte Carlo generator, Pythia 8. It shows how to use a PAR package and how to set the relevant external variables.

 

1.2 Data-driven processing

The PROOF tutorials include three examples of data-driven processing, i.e. of processing steered by an existing TTree.
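In data-driven processing, the entries of the input TTree define the work. The sketch below shows how the general arguments first=F and nevt=N select the entry range [F, F+N), and how that range could be cut into packets for the workers. makePackets is a hypothetical name; the fixed packet size and the convention that a negative nevt means "all remaining entries" are assumptions, since PROOF's packetizers size packets dynamically.

```cpp
#include <algorithm>
#include <cstdint>
#include <utility>
#include <vector>

// Conceptual sketch: return (start, count) packets covering the selected
// entries [first, first + nevt), clipped to the number of entries in the tree.
// ASSUMPTION: nevt < 0 means "process all remaining entries".
std::vector<std::pair<std::int64_t, std::int64_t>>
makePackets(std::int64_t totalEntries, std::int64_t first, std::int64_t nevt,
            std::int64_t packetSize) {
    std::vector<std::pair<std::int64_t, std::int64_t>> packets;
    if (first < 0 || first >= totalEntries || packetSize <= 0) return packets;
    const std::int64_t last =
        (nevt < 0) ? totalEntries : std::min(totalEntries, first + nevt);
    for (std::int64_t start = first; start < last; start += packetSize)
        packets.emplace_back(start, std::min(packetSize, last - start));
    return packets;
}
```

For a tree with 1000 entries, first=100 and nevt=250 with packets of 100 entries give three packets starting at entries 100, 200 and 300, the last one holding only 50 entries.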

 

1.2.1 "h1": h1analysis

Selector: tutorials/tree/h1analysis.h, tutorials/tree/h1analysis.C
PAR file: none

This is the well-known H1 analysis, available for a long time under tutorials/tree. By default the data are read from the ROOT HTTP server; however, the location of the source can be changed.

 

1.2.2 "eventproc": ProofEventProc

Selector: tutorials/proof/ProofEventProc.h, tutorials/proof/ProofEventProc.C
PAR file: tutorials/proof/event.par

This is an example of data-driven analysis using a PAR package. It also shows how to vary the fraction of the event read for the analysis.

 

1.2.3 "friends": using TTree friends

Selector: tutorials/proof/ProofFriends.h, tutorials/proof/ProofFriends.C
PAR file: none

 

1.3 Large-output handling via files on workers

The PROOF tutorials include three examples of handling large outputs by saving the worker outputs to intermediate files local to the workers.

 

1.3.1 "simplefile": merging histograms via files

Selector: tutorials/proof/ProofSimpleFile.h, tutorials/proof/ProofSimpleFile.C
PAR file: none

 

 

1.3.2 "ntuple": merging simple ntuple (TTree) via files

Selector: tutorials/proof/ProofNtuple.h, tutorials/proof/ProofNtuple.C
PAR file: none

This example shows how to generate in parallel a simple ntuple, to save it in local files on the workers, and to automatically merge the files as part of the finalization phase.

 

1.3.3 "dataset": access simple ntuple (TTree) files via a dataset

Selector: tutorials/proof/ProofNtuple.h, tutorials/proof/ProofNtuple.C
PAR file: none

This example is similar to the previous one, except that the files, instead of being merged, are left on the workers, and a dataset is created and registered on the master so that it can be used automatically in subsequent runs. In the example, the subsequent runs are drawing operations via PROOF.

 

2. Start a PROOF session using getProof.C

In this section we describe the macro 'getProof.C', located under $ROOTSYS/tutorials/proof, which is used by 'runProof.C' (and by '$ROOTSYS/test/stressProof.C') to start a PROOF session, either locally (PROOF-Lite or standard) or on a remote cluster. When relevant, the macro checks the state of the daemon; for local daemons, it performs the necessary configuration steps and automatically (re)starts the daemon.

Signature:

TProof *getProof(const char *url = "proof://localhost:40000",
                 Int_t nwrks = -1, const char *dir = 0,
                 const char *opt = "ask", Bool_t dyn = kFALSE,
                 Bool_t tutords = kFALSE)
The first argument, url, is the URL of the master on which to start the PROOF session; use 'lite://' for a PROOF-Lite session. The default is 'proof://localhost:40000', i.e. a standard cluster on the local machine on port 40000; the macro tries to start the daemon if it is not found. The second argument, nwrks, specifies the number of workers for the session; the request can only be fulfilled in PROOF-Lite, or if the daemon allows the requested number (or needs to be started and, therefore, configured).
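The three situations the url argument can lead to can be summarized in a short sketch. classifyUrl and SessionKind are hypothetical names; treating only the literal "localhost:40000" as the local daemon is a simplifying assumption, and this is not the logic of getProof.C.

```cpp
#include <string>

// Hypothetical classification of the 'url' argument, based on the text:
// "lite://" selects PROOF-Lite; a master on the local machine at port 40000
// is the case where getProof may configure and start the daemon itself;
// anything else is treated as an existing (remote) cluster.
enum class SessionKind { Lite, LocalDaemon, Remote };

SessionKind classifyUrl(const std::string &url) {
    if (url.rfind("lite://", 0) == 0) return SessionKind::Lite;  // prefix test
    std::string rest = url;
    const auto scheme = rest.find("://");
    if (scheme != std::string::npos) rest = rest.substr(scheme + 3);
    const auto slash = rest.find('/');
    if (slash != std::string::npos) rest = rest.substr(0, slash);  // drop any path
    return (rest == "localhost:40000") ? SessionKind::LocalDaemon
                                       : SessionKind::Remote;
}
```

Only the LocalDaemon case is the one in which the remaining arguments (dir, opt, dyn, tutords) come into play.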

The other arguments apply only when the local daemon on port 40000 needs to be started:

  1. dir: directory to be used for the files and working areas. When a new instance of the daemon is started, the relevant files and directories in this directory are cleaned. If left null, the default '/tmp/user/.getproof' is used;
  2. opt: defines what to do if an existing xrootd uses the same ports. Possible options are "ask" (ask the user) and "force" (kill the existing xrootd and start a new one); with any other string the existing xrootd is used. Default is 'ask'. Note that for a change in 'nwrks' to be effective you need to specify 'force';
  3. dyn: switches on dynamic, per-job worker setup scheduling. Default is off;
  4. tutords: forces a dataset directory under the tutorial directory. Default is no.
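The three-way behaviour of opt can be written as a tiny decision function. DaemonAction and actionForOpt are hypothetical names chosen for illustration and are not part of getProof.C.

```cpp
#include <string>

// Hypothetical mapping of the 'opt' values described above, for the case
// where an xrootd already uses the same ports.
enum class DaemonAction { AskUser, KillAndRestart, ReuseExisting };

DaemonAction actionForOpt(const std::string &opt) {
    if (opt == "ask") return DaemonAction::AskUser;           // default
    if (opt == "force") return DaemonAction::KillAndRestart;  // needed for a new nwrks
    return DaemonAction::ReuseExisting;                       // any other string
}
```

Because only a restarted daemon is reconfigured, "force" is the value to use when a different nwrks must take effect.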

Examples:

  1. Standard session with startup of the local daemon
    root [0] .L tutorials/proof/getProof.C+
    root [1] TProof *p = getProof()
    getProof: working area not specified temp 
    getProof: working area (tutorial dir): /tmp/ganis/.getproof
    SysError in <TUnixSystem::UnixTcpConnect>: connect (localhost:40000) (Connection refused)
    getProof: xrootd config file at /tmp/ganis/.getproof/xpd.cf
    getProof: xrootd log file at /tmp/ganis/.getproof/xpdtut/xpd.log
    (NB: any error line from XrdClientSock::RecvRaw and XrdClientMessage::ReadRaw should be ignored)
    getProof: waiting for xrootd to start ...
    getProof: xrootd pid: 11681
    getProof: start / attach the PROOF session ...
    Starting master: opening connection ...
    Starting master: OK                                                 
    Opening connections to workers: OK (4 workers)                 
    Setting up worker servers: OK (4 workers)                 
    PROOF set to parallel mode (4 workers)
    root [2]
    
  2. PROOF-Lite session with 8 workers
    root [0] .L tutorials/proof/getProof.C+
    root [1] TProof *p = getProof("lite://", 8)
    getProof: trying to open a PROOF-Lite session with 8 workers                                                                                                           
     +++ Starting PROOF-Lite with 8 workers +++
    Opening connections to workers: OK (8 workers)                 
    Setting up worker servers: OK (8 workers)                 
    PROOF set to parallel mode (8 workers)
    root [2]