Basic processing

  1. Introduction
  2. Dataset definition
    1. Using a TFileCollection
    2. Using a TDSet
    3. Using a TChain
  3. TProof::Process API
    1. Cycle-driven processing
    2. Data-driven processing
      1. Processing a TFileCollection by object
      2. Processing a TFileCollection by name
      3. Processing a TDSet
      4. Processing a TChain
  4. Information about the last query

1. Introduction

Basic processing in PROOF is steered by the TProof::Process methods. In PROOF terminology, a call to any of the TProof::Process methods constitutes a query. A query is defined by an algorithm, the number of times a basic task is repeated and, optionally, a dataset. When a dataset is given, the number of times the task is repeated is typically the number of entries in the dataset files. The algorithm must be defined in the TSelector framework (see also Developing a TSelector). The TSelector is passed either by implementation file, from which a TSelector object is instantiated on each worker process, or by selector name, if the class inheriting from TSelector is already known to the system (for example because it is included in a loaded package).

Starting with ROOT version 5.34 it is also possible to pass the selector by object, that is, to create a TSelector object in the client session and give it to TProof::Process; the selector is then streamed to the relevant processes.

2. Dataset definition

2.1 Using a TFileCollection

The TFileCollection class is a named collection of files (each described by a TFileInfo) and is the recommended way to describe a dataset in ROOT. A TFileCollection can store meta-information about the content of the files, and can therefore describe at once all the trees included in a file. In addition, some experiment catalogues (e.g. the ALICE one) return a TFileCollection object from their query engine. PROOF provides a way to register TFileCollection objects on the master, so that they can be referred to by name. See ...
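As an illustration, a TFileCollection could be built by hand along the following lines. This is only a sketch to be run as a ROOT macro: the collection name, the title and the file URLs are hypothetical placeholders, not part of the original documentation.

```cpp
// buildFileCollection.C: sketch only, requires a ROOT installation.
#include "TFileCollection.h"
#include "TFileInfo.h"

TFileCollection *buildFileCollection()
{
   // Hypothetical name and title, chosen for this example.
   TFileCollection *fc = new TFileCollection("myDataSet", "Example dataset");

   // Each file is described by a TFileInfo; the URLs are placeholders.
   fc->Add(new TFileInfo("root://example.org//data/run1.root"));
   fc->Add(new TFileInfo("root://example.org//data/run2.root"));

   // Refresh the collection-level bookkeeping and show the content.
   fc->Update();
   fc->Print();
   return fc;
}
```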

2.2 Using a TDSet

The TDSet class is the way PROOF internally handles the dataset. Starting from a TDSet will minimize the number of internal transformations, but may be less convenient if the original collection is not in TDSet format.
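A minimal sketch of a TDSet describing a TTree stored in two files might look as follows. The tree name "MyTree" and the file URLs are assumptions made for the example, and the macro needs a ROOT installation with PROOF support.

```cpp
// buildDSet.C: sketch only, requires a ROOT installation with PROOF.
#include "TDSet.h"

TDSet *buildDSet()
{
   // First argument: type of the objects in the set; second: object name.
   // "MyTree" is a hypothetical tree name.
   TDSet *dset = new TDSet("TTree", "MyTree");

   // Placeholder file URLs.
   dset->Add("root://example.org//data/run1.root");
   dset->Add("root://example.org//data/run2.root");

   dset->Print("a");   // list all the elements
   return dset;
}
```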

2.3 Using a TChain

TChain is a class that describes a dataset, i.e. a set of files, with a TTree interface. By construction a TChain can describe only one TTree, so if the files contain several trees a separate TChain must be defined for each TTree. PROOF must internally transform the TChain into a TDSet, so for large datasets starting from a TChain is quite inefficient.

3. TProof::Process API

In this section we describe the TProof methods devoted to processing.

In all cases the option field is available inside the selector via TSelector::GetOption() (some special options, e.g. "ASYN", "feedback=...", are filtered out before starting processing).

The methods taking a TSelector object are only available starting from ROOT 5.34. In this case the client can create and configure the TSelector object before processing starts. Note that the TSelector object must be streamable for this to work, i.e. a positive class version must be set via the ClassDef macro and increased each time the selector class members are modified. Also, the TSelector class must be known on the master and worker nodes. Typical logical workflow:

root [] TProof *proof = TProof::Open("")
root [] proof->Load("MySelector.C+") // Load locally and on the cluster
root [] MySelector *mysel = new MySelector(arg1, arg2)
root [] mysel->setMyParms(par1, par2, par3)
root [] proof->Process(mysel, 1000000)
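The streamability requirement can be sketched with a header such as the one below. The data members and the argument types of the constructor and of setMyParms are assumptions: the workflow above does not specify them. The sketch requires ROOT, and the dictionary is expected to be generated when MySelector.C+ is loaded.

```cpp
// MySelector.h: sketch of a streamable selector (requires ROOT).
#ifndef MySelector_h
#define MySelector_h

#include "TSelector.h"

class MySelector : public TSelector {
private:
   // Hypothetical configuration members; they are streamed to the workers.
   Int_t    fArg1;
   Int_t    fArg2;
   Double_t fPar1;
   Double_t fPar2;
   Double_t fPar3;

public:
   // The default arguments make this usable as the default constructor,
   // which ROOT I/O needs to rebuild the object when reading it back.
   MySelector(Int_t arg1 = 0, Int_t arg2 = 0)
      : fArg1(arg1), fArg2(arg2), fPar1(0.), fPar2(0.), fPar3(0.) {}
   virtual ~MySelector() {}

   void setMyParms(Double_t p1, Double_t p2, Double_t p3)
   {
      fPar1 = p1; fPar2 = p2; fPar3 = p3;
   }

   virtual Int_t  Version() const { return 2; }   // use Process(Long64_t)
   virtual Bool_t Process(Long64_t /*entry*/) { return kTRUE; }

   // Positive class version: increase it whenever the members change.
   ClassDef(MySelector, 1);
};

#endif
```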

 

3.1 Cycle-driven processing

In this case we only need to specify a TSelector and the number of times, ncycles, its Process method is called.

Long64_t Process(const char *selector, Long64_t ncycles, Option_t *option = "")
Long64_t Process(TSelector *selector, Long64_t ncycles, Option_t *option = "")

 

3.2 Data-driven processing

Data-driven processing is steered by a dataset, i.e. by a main TTree stored in a set of files. The Process methods take, therefore, a dataset - in one of the forms described above - and a TSelector; the number of entries, nentries, and the first entry to process, firstentry, are optional and only needed to restrict processing to a sub-sample of events.
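The effect of nentries and firstentry can be pictured with the small stand-alone sketch below. It is not PROOF code: the function SelectedRange is a hypothetical helper, and it assumes that a negative nentries means "all entries from firstentry onwards" and that the requested range is clipped to the size of the dataset.

```cpp
#include <algorithm>
#include <utility>

// Returns the half-open range [first, last) of entries that would be
// processed in a dataset with 'total' entries (illustrative model only).
std::pair<long long, long long> SelectedRange(long long total,
                                              long long nentries = -1,
                                              long long firstentry = 0)
{
   // Clip the starting point to the dataset boundaries.
   long long first = std::min(std::max(firstentry, 0LL), total);
   // Negative nentries: go to the end; otherwise stop after nentries.
   long long last = (nentries < 0) ? total
                                   : std::min(first + nentries, total);
   return std::make_pair(first, last);
}
```

With the defaults the whole dataset is selected; SelectedRange(100, 10, 20) selects the entries from 20 to 29.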

3.2.1 Processing a TFileCollection by object

In this case the dataset is defined by a TFileCollection object created by the client. This object is passed directly to Process and then transferred by PROOF to the master.

Long64_t Process(TFileCollection *fc, const char *selector, Option_t *option = "", Long64_t nentries = -1, Long64_t firstentry = 0)
Long64_t Process(TFileCollection *fc, TSelector *selector, Option_t *option = "", Long64_t nentries = -1, Long64_t firstentry = 0)

 

3.2.2 Processing a TFileCollection by name

Processing a TFileCollection by name is possible when the TFileCollection has been registered on the master using the PROOF dataset manager interface (see Working with data sets). The available datasets can be displayed using TProof::ShowDataSets().

Long64_t Process(const char *dsetname, const char *selector, Option_t *option = "", Long64_t nentries = -1, Long64_t firstentry = 0, TObject *enl = 0)
Long64_t Process(const char *dsetname, TSelector *selector, Option_t *option = "", Long64_t nentries = -1, Long64_t firstentry = 0, TObject *enl = 0)
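A possible sequence for registering a collection and then processing it by name is sketched below. The dataset name "myDataSet" and the selector file "MySelector.C" are placeholders, and the macro assumes an open PROOF session and an already filled TFileCollection; see Working with data sets for the authoritative description of the dataset manager.

```cpp
// processByName.C: sketch only, requires a ROOT installation with PROOF.
#include "TFileCollection.h"
#include "TProof.h"

void processByName(TProof *proof, TFileCollection *fc)
{
   if (!proof || !fc) return;

   // Register the collection on the master under a hypothetical name.
   if (!proof->RegisterDataSet("myDataSet", fc)) return;

   // Display the datasets known to the master.
   proof->ShowDataSets();

   // Process the registered dataset by name.
   proof->Process("myDataSet", "MySelector.C+");
}
```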

 

3.2.3 Processing a TDSet

In this case the dataset is defined by a TDSet object created by the client. This object is passed directly to Process and then transferred by PROOF to the master.

Long64_t Process(TDSet *dset, const char *selector, Option_t *option = "", Long64_t nentries = -1, Long64_t firstentry = 0)
Long64_t Process(TDSet *dset, TSelector *selector, Option_t *option = "", Long64_t nentries = -1, Long64_t firstentry = 0)

 

3.2.4 Processing a TChain

The Process(...) methods used in this case are those of TChain:

Long64_t Process(const char *selector, Option_t *option = "", Long64_t nentries = -1, Long64_t firstentry = 0)
Long64_t Process(TSelector *selector, Option_t *option = "", Long64_t nentries = -1, Long64_t firstentry = 0)

The TChain::SetProof(TProof *) method is used to tell the chain to use PROOF for processing instead of the local ROOT session. This is the typical flow:

root [] TChain chain("MyTree")
root [] TProof *proof = TProof::Open("")
root [] chain.SetProof(proof)
root [] chain.Process("myselector.C+")
root [] chain.SetProof(0) // Detach from the PROOF session

 

The TChain is internally transformed into a TDSet object, which is sent to the master.

4. Information about the last query

Detailed information about the last query can be obtained from the TQueryResult object returned by the TProof::GetQueryResult() method. For example:

root [3] gProof->GetQueryResult()->Print()
+++ #:1 ref:"pcphsft64-1444127173-2526:q1" sel:h1analysis finalized evts:0-283812
root [4] gProof->GetQueryResult()->Print("F")
+++
+++ #:1 ref:"pcphsft64-1444127173-2526:q1" sel:h1analysis finalized
+++        started:   Tue Oct  6 12:26:13 2015
+++        init:      2.932 sec
+++        process:   11.957 sec (CPU time: 5.2 sec)
+++        merge:     0.313 sec
+++        processed: 283813 events (size: 36.352 MBs)
+++        rate:      23736.8 evts/sec
+++        # workers: 8
+++        results:   sent to client
+++        outlist:   19 objects
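The figures in this printout can be cross-checked with a little arithmetic, sketched below in plain C++. The two helper functions are hypothetical and only restate the numbers shown above; the sketch assumes that the range in "evts:0-283812" is inclusive and that the rate is the number of processed events divided by the process time.

```cpp
// Number of events in an inclusive entry range such as "evts:0-283812".
long long EventsInRange(long long first, long long last)
{
   return last - first + 1;
}

// Average processing rate in events per second.
double EventRate(long long events, double processSeconds)
{
   return static_cast<double>(events) / processSeconds;
}
```

Under these assumptions EventsInRange(0, 283812) gives the 283813 events reported in the "processed" line, and EventRate(283813, 11.957) is about 23736 events per second. The small difference from the printed 23736.8 is consistent with the process time being rounded to three decimals in the printout.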