More advanced examples
On this page we dissect the PROOF tutorials and steering macros available under tutorials/proof. The tutorials directory is available under $ROOTSYS for non-prefixed installations or under docdir for prefixed installations, defaulting to prefix/share/doc/root.
See also the slides for a one-day PROOF tutorial: Introduction, Basic hands-on.
- Run the available examples: runProof.C
- Start a PROOF session using getProof.C
1. Run the available examples: runProof.C
The macro 'runProof.C' steers the running of the tutorials. It starts a PROOF session, sets the required options, and runs the required tutorial. The signature of the runProof macro is the following:
void runProof(const char *what = "simple", const char *url = "proof://localhost:40000", Int_t nwrks = -1)
The first argument, what, is a string composed of the name of the tutorial to run, optionally followed by one or more arguments in the form 'name(arg1,arg2,...)'; arguments can be general (applying to all tutorials) or specific to a given tutorial. Available tutorials and general arguments are described in the tables below. The second argument, url, is the place where the tutorial is run; it is used as the first argument to getProof. The third argument, nwrks, is the number of workers to be used for the tutorial: it is used as the second argument in the call to getProof and as the argument of a call to TProof::SetParallel.
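To make the 'name(arg1,arg2,...)' form concrete, here is a minimal standalone C++ sketch that splits such a string into the tutorial name and its raw arguments. The names `TutorialRequest` and `parseWhat` are hypothetical and chosen for illustration only; this is not the code used inside runProof.C, which may handle the string differently.

```cpp
#include <string>
#include <vector>

// Hypothetical container for the pieces of the 'what' argument.
struct TutorialRequest {
    std::string name;               // tutorial name, e.g. "simple"
    std::vector<std::string> args;  // raw arguments, e.g. "nevt=1000"
};

// Split "name(arg1,arg2,...)" into the name and the comma-separated arguments.
// A string without parentheses is treated as a bare tutorial name.
TutorialRequest parseWhat(const std::string &what)
{
    TutorialRequest req;
    const std::string::size_type open = what.find('(');
    if (open == std::string::npos) {
        req.name = what;
        return req;
    }
    req.name = what.substr(0, open);

    // Take the text between '(' and the last ')' (or the end if ')' is missing).
    std::string::size_type close = what.rfind(')');
    if (close == std::string::npos || close < open) close = what.size();
    const std::string body = what.substr(open + 1, close - open - 1);

    // Split the body on commas, skipping empty fields.
    std::string::size_type start = 0;
    while (start <= body.size()) {
        std::string::size_type comma = body.find(',', start);
        if (comma == std::string::npos) comma = body.size();
        if (comma > start) req.args.push_back(body.substr(start, comma - start));
        start = comma + 1;
    }
    return req;
}
```

With this sketch, "simple(nevt=1000,asyn)" yields the name "simple" and the two arguments "nevt=1000" and "asyn", while a bare "h1" yields the name "h1" and no arguments.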
The available tutorials are shown in the table:
| Name | Type | Output | Description |
|---|---|---|---|
| simple | cycle | histograms, 1 canvas | Generate gaussian random numbers and fill histograms |
| simplefile | cycle | histograms, 2 canvases | Generate gaussian randoms; saved in two directories; merge via file |
| event | cycle | histograms, 1 canvas | Generate Event entries; run simple analysis; uses 'event.par' PAR file |
| pythia8 | cycle | histograms, 1 canvas | Generate Pythia8 Monte Carlo events; run simple analysis; uses 'pythia8.par' PAR file |
| ntuple | cycle | ntuple, 1 canvas | Generate simple ntuple; merge via file |
| dataset | cycle | dataset, 1 canvas | Same as 'ntuple' but dataset creation instead of file merging; uses TProof::DrawSelect |
| friends | data | histograms, 1 canvas | Generate some simple main TTree trees and their friends; run simple analysis |
| h1 | data | histograms, 2 canvases | Usual H1 analysis |
| eventproc | data | histograms, 1 canvas | Simple analysis of files with Event entries; uses 'event.par' PAR file |
The 'Type' refers to the type of processing. The keyword 'data' means data-driven; this happens when processing a TTree, whose entries steer the distribution of work. The type 'cycle' indicates that a cycle of tasks has to be repeated the specified number of times and there is no steering TTree involved; this happens, for example, when generating Monte Carlo events.
The general arguments are shown in the table:
| Argument | Description |
|---|---|
| debug=[what:]level | Set verbosity to 'level'; optionally select the scope with 'what' (same names as in TProofDebug) |
| nevt=N | Process N entries |
| first=F | Start processing from entry F (when processing data files) |
| asyn | Run in non-blocking (asynchronous) mode |
| nwrk=N | Set the number of active workers to N (may not always be successful) |
| punzip | Use parallel unzip in reading files |
| cache=bytes, cache=kbytesK, cache=mbytesM | Change the size of the TTree cache; use <=0 to disable |
| submergers[=S] | Enable merging via submergers (number set to S or to the default) |
| rateest=average | Use the measured average to estimate the current processing speed reported by the progress bar |
| perftree=perftreefile.root | Generate the performance tree and save it to perftreefile.root |
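As an illustration of the three accepted forms of the cache argument, the following standalone C++ sketch converts the value part ("bytes", "kbytesK" or "mbytesM") into a byte count. The function names are hypothetical, and the sketch assumes that K and M denote binary multiples (1024 and 1024*1024); the table above does not state this, so check runProof.C if the exact factor matters.

```cpp
#include <cstdint>
#include <cstdlib>
#include <string>

// Convert the value of a 'cache=' argument into a number of bytes.
// Assumption (not stated in the table): 'K' means 1024 and 'M' means 1024*1024.
// Returns false if the value is not an integer with an optional K/M suffix.
bool parseCacheSize(const std::string &value, std::int64_t &bytes)
{
    if (value.empty()) return false;

    std::string digits = value;
    std::int64_t factor = 1;
    const char last = digits.back();
    if (last == 'K') {
        factor = 1024;
        digits.pop_back();
    } else if (last == 'M') {
        factor = 1024 * 1024;
        digits.pop_back();
    }
    if (digits.empty()) return false;

    // strtoll leaves 'end' at the first character it could not convert.
    char *end = nullptr;
    const long long n = std::strtoll(digits.c_str(), &end, 10);
    if (end == digits.c_str() || *end != '\0') return false;

    bytes = static_cast<std::int64_t>(n) * factor;
    return true;
}

// Per the table, a size <= 0 disables the TTree cache.
bool cacheDisabled(std::int64_t bytes)
{
    return bytes <= 0;
}
```

Under the stated assumption, "30M" maps to 31457280 bytes and "512K" to 524288 bytes, while "0" or a negative value means the cache is switched off.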
In addition to these common arguments, there is a way to control the ACLiC mode used to build the selectors: by default '+' is used, i.e. compile-if-changed. However, this may lead to problems if the available selector libs were compiled in previous sessions with a different set of loaded libraries (this is a general problem in ROOT). When this happens the best solution is to force recompilation (ACLiC mode '++'). To do this just add '++' to the name of the tutorial, e.g. runProof("event++") .
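The '++' suffix convention can be sketched in standalone C++ as follows. The helper name `splitAclicMode` is hypothetical, and the sketch only covers the case shown above, where '++' is appended directly to a bare tutorial name; it is not the parsing code of runProof.C.

```cpp
#include <string>
#include <utility>

// Return {tutorial name without suffix, ACLiC mode}.
// "event++" -> {"event", "++"} (force recompilation)
// "event"   -> {"event", "+"}  (default: compile if changed)
std::pair<std::string, std::string> splitAclicMode(const std::string &tutorial)
{
    const std::string force = "++";
    if (tutorial.size() > force.size() &&
        tutorial.compare(tutorial.size() - force.size(), force.size(), force) == 0) {
        return {tutorial.substr(0, tutorial.size() - force.size()), force};
    }
    return {tutorial, "+"};
}
```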
2.1 Cycle-driven processing
The PROOF tutorials include three examples of cycle-driven processing, i.e. of a generic task executed in parallel N times.
2.1.1 "simple": ProofSimple
Selector: tutorials/proof/ProofSimple.h, tutorials/proof/ProofSimple.C
PAR file: none
This example generates gaussian random numbers and uses them to fill a configurable number of 1D histograms.
2.1.2 "event": ProofEvent
Selector: tutorials/proof/ProofEvent.h, tutorials/proof/ProofEvent.C
PAR file: tutorials/proof/event.par
This is an example of a simple Monte Carlo generation creating 'event'-like structures (see $ROOTSYS/test/Event.h). This example shows how to use a PAR package.
2.1.3 "pythia8": ProofPythia
Selector: tutorials/proof/ProofPythia.h, tutorials/proof/ProofPythia.C
PAR file: tutorials/proof/pythia8.par
This is similar to the previous example but with a real Monte Carlo generator, Pythia 8. It shows how to use a PAR package and how to set the relevant external variables.
2.2 Data-driven processing
The PROOF tutorials include three examples of data-driven processing, i.e. of processing steered by an existing TTree.
2.2.1 "h1": h1analysis
Selector: tutorials/tree/h1analysis.h, tutorials/tree/h1analysis.C
PAR file: none
This is the well-known H1 analysis, which has long been available under tutorials/tree. By default the data are read from the ROOT HTTP server; however, the location of the source can be changed.
2.2.2 "eventproc": ProofEventProc
Selector: tutorials/proof/ProofEventProc.h, tutorials/proof/ProofEventProc.C
PAR file: tutorials/proof/event.par
This is an example of data-driven analysis using a PAR package. It also shows how to vary the fraction of the event read for the analysis.
2.2.3 "friends": using TTree friends
Selector: tutorials/proof/ProofFriends.h, tutorials/proof/ProofFriends.C
PAR file: none
2.3 Large-output handling via files on workers
The PROOF tutorials include three examples of handling of large outputs via intermediate saving of the worker outputs on files local to the workers.
2.3.1 "simplefile": merging histograms via files
Selector: tutorials/proof/ProofSimpleFile.h, tutorials/proof/ProofSimpleFile.C
PAR file: none
2.3.2 "ntuple": merge simple ntuple (TTree) via files
Selector: tutorials/proof/ProofNtuple.h, tutorials/proof/ProofNtuple.C
PAR file: none
This example shows how to generate in parallel a simple ntuple, to save it in local files on the workers, and to automatically merge the files as part of the finalization phase.
2.3.3 "dataset": access simple ntuple (TTree) files via a dataset
Selector: tutorials/proof/ProofNtuple.h, tutorials/proof/ProofNtuple.C
PAR file: none
This example is similar to the previous one, except that the files, instead of being merged, are left on the workers and a dataset is created and registered on the master so that it can be automatically used in a subsequent run. In the example, the 'subsequent runs' are drawing operations performed via PROOF.
2. Start a PROOF session using getProof.C
In this section we describe the macro 'getProof.C', located under $ROOTSYS/tutorials/proof, which is used by 'runProof.C' (and '$ROOTSYS/test/stressProof.C') to start a PROOF session, either locally (lite or standard) or on a remote cluster. When relevant, the macro checks the state of the relevant daemon. For local daemons, the macro performs the necessary configuration steps and automatically (re)starts the daemon.
Signature:
TProof *getProof(const char *url = "proof://localhost:40000", Int_t nwrks = -1, const char *dir = 0, const char *opt = "ask", Bool_t dyn = kFALSE, Bool_t tutords = kFALSE)
The first two arguments, url and nwrks, have the meaning described above for runProof. The other arguments apply only when the local daemon on port 40000 needs to be started:
- dir: directory to be used for the files and working areas. When starting a new instance of the daemon the relevant files and directories in this directory are cleaned. If left null, the default is used: '/tmp/user/.getproof';
- opt: defines what to do if an existing xrootd uses the same ports; possible options are: "ask", ask the user; "force", kill the xrootd and start a new one; if any other string is specified the existing xrootd will be used. Default is 'ask'. Note that for a change in 'nwrks' to be effective you need to specify 'force';
- dyn: this flag can be used to switch on dynamic, per-job worker setup scheduling. Default is off;
- tutords: this flag can be used to force a dataset dir under the tutorial dir. Default is no.
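The behaviour described for the opt argument amounts to a three-way decision, sketched here in standalone C++. The enum `DaemonAction` and the function `actionForOpt` are hypothetical names introduced for illustration; the sketch also assumes an exact, case-sensitive comparison, which the description above does not specify.

```cpp
#include <string>

// Hypothetical names for the three outcomes described for 'opt'.
enum class DaemonAction {
    AskUser,         // "ask": let the user decide
    KillAndRestart,  // "force": kill the running xrootd and start a new one
    ReuseExisting    // any other string: keep using the existing xrootd
};

// Map the 'opt' string to the action taken when an existing xrootd
// already uses the same ports. Assumption: the match is case-sensitive.
DaemonAction actionForOpt(const std::string &opt)
{
    if (opt == "ask") return DaemonAction::AskUser;
    if (opt == "force") return DaemonAction::KillAndRestart;
    return DaemonAction::ReuseExisting;
}
```

Since a change in 'nwrks' only takes effect with "force", a call that changes the number of workers of an already running local daemon needs the KillAndRestart branch.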
Examples:
- Standard session with startup of the local daemon
    root [0] .L tutorials/proof/getProof.C+
    root [1] TProof *p = getProof()
    getProof: working area not specified temp
    getProof: working area (tutorial dir): /tmp/ganis/.getproof
    SysError in <TUnixSystem::UnixTcpConnect>: connect (localhost:40000) (Connection refused)
    getProof: xrootd config file at /tmp/ganis/.getproof/xpd.cf
    getProof: xrootd log file at /tmp/ganis/.getproof/xpdtut/xpd.log
    (NB: any error line from XrdClientSock::RecvRaw and XrdClientMessage::ReadRaw should be ignored)
    getProof: waiting for xrootd to start ...
    getProof: xrootd pid: 11681
    getProof: start / attach the PROOF session ...
    Starting master: opening connection ...
    Starting master: OK
    Opening connections to workers: OK (4 workers)
    Setting up worker servers: OK (4 workers)
    PROOF set to parallel mode (4 workers)
    root [2]
- PROOF-Lite session with 8 workers
    root [0] .L tutorials/proof/getProof.C+
    root [1] TProof *p = getProof("lite://", 8)
    getProof: trying to open a PROOF-Lite session with 8 workers
    +++ Starting PROOF-Lite with 8 workers +++
    Opening connections to workers: OK (8 workers)
    Setting up worker servers: OK (8 workers)
    PROOF set to parallel mode (8 workers)
    root [2]
| Attachment | Size |
|---|---|
| Tutorial-ProofIntro-1d.pdf | 580.7 KB |
| Tutorial-Proof-1d.pdf | 479.41 KB |