Uploading data files to a PROOF cluster

Access to the data to be processed must be provided to the workers by appropriate means. For example, this can be done by opening access to the required mass storage systems and/or by equipping the storage available on the nodes with a suitable system such as xrootd, and then using the related tools such as xrdcp.

Assume the chosen mass storage is accessible from the client machine via the standard ROOT tools (TFile::Open, TSystem, ...). In that case, two simple functions can be used to upload a set of files to this mass storage. The files can be given via a TList, a text file or a directory.

These static functions are members of TProofMgr and are described in this section. They are available starting from ROOT development version 5.33/02.

The TProofMgr::UploadFiles functions

The signatures of the functions are the following:

static TFileCollection *UploadFiles(TList *src, const char *mss, const char *dest = 0); 
static TFileCollection *UploadFiles(const char *srcfiles, const char *mss, const char *dest = 0); 

The two functions differ only in the first argument.
In the first case the argument is a TList of either TFileInfo or TObjString objects. The function interprets the first URL in each TFileInfo, or the string in each TObjString, as the URL of a file to be uploaded.
In the second case the argument is either the path to a text file listing the paths of the files to be uploaded, or the path of a directory containing the files to be uploaded. In a text file the paths must be specified one per line; lines beginning with '#' are ignored:

# The H1 files under "/data/h1" (this line is ignored)
/data/h1/dstarmb.root
/data/h1/dstarp1a.root
/data/h1/dstarp1b.root
/data/h1/dstarp2.root
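The rules above for the text-file or directory argument can be sketched in standard C++17 as follows. This is not the ROOT implementation. `collectSources` is a hypothetical helper. Skipping empty lines and sorting directory entries are assumptions of this sketch, not documented behaviour of UploadFiles.

```cpp
#include <algorithm>
#include <cassert>
#include <filesystem>
#include <fstream>
#include <string>
#include <vector>

namespace fs = std::filesystem;

// Hypothetical helper mirroring the 'const char *srcfiles' argument:
// a directory yields its regular files; any other path is read as a
// text file with one path per line, where lines starting with '#'
// (and, in this sketch, empty lines) are skipped.
std::vector<std::string> collectSources(const std::string &srcfiles) {
    assert(!srcfiles.empty());
    std::vector<std::string> out;
    if (fs::is_directory(srcfiles)) {
        for (const auto &entry : fs::directory_iterator(srcfiles))
            if (entry.is_regular_file())
                out.push_back(entry.path().string());
        // directory_iterator order is unspecified; sort for a stable numbering
        std::sort(out.begin(), out.end());
        return out;
    }
    std::ifstream in(srcfiles);
    std::string line;
    while (std::getline(in, line)) {
        if (!line.empty() && line.back() == '\r') line.pop_back(); // tolerate CRLF
        if (line.empty() || line[0] == '#') continue;             // comment or blank
        out.push_back(line);
    }
    return out;
}
```

Applied to the list shown above, the sketch returns the four /data/h1 paths in file order. The comment line is dropped.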

The second and third arguments are the same for both functions.
The second argument, mss, specifies the URL of the destination. This includes the protocol, host, port and possibly the first components of the path. This argument is mandatory.

The third argument, dest, specifies the remaining part of the destination path. It accepts place-holders, which are expanded to build the final path. The supported place-holders are listed in the table:

UploadFiles: supported place-holders to define the destination

      Place-holder            Meaning
      <d0>, <d1>, <d2>, ...   n-th component of the source sub-path
      <bn>                    basename in the source path
      <sn>                    sequential number of the file in the list or text file
      <fn>                    full file path in the source
      <us>                    local user name
      <gr>                    local group name

This argument is optional and defaults to '<fn>'.
As an example, if the source file is

      protosrc://host//d0/d1/d2/d3/d4/d5/myfile

dest is

      /pool/user/<d3>/<d4>/<d5>/<sn>/<bn>

mss is

      protodst://hostdst//nm/

and the file's sequential number is 99, then the corresponding destination path is

      protodst://hostdst//nm/pool/user/d3/d4/d5/99/myfile
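Here is a minimal sketch of the place-holder expansion in plain C++ without ROOT. It assumes that <dN> counts the components of the source path from 0, starting after the leading '/'. It also assumes that mss and the expanded dest are joined with exactly one '/'. `expandDest` and its parameters are hypothetical. The user and group names are passed in here, whereas ROOT obtains them from the local system.

```cpp
#include <cassert>
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Replace every occurrence of 'key' in 's' with 'val'.
static void replaceAll(std::string &s, const std::string &key, const std::string &val) {
    for (std::size_t pos = s.find(key); pos != std::string::npos;
         pos = s.find(key, pos + val.size()))
        s.replace(pos, key.size(), val);
}

// Hypothetical sketch: expand the place-holders in 'dest' for one file.
// 'srcPath' is the path part of the source URL (e.g. "/d0/d1/.../myfile"),
// 'sn' its sequential number, 'user'/'group' the local names.
std::string expandDest(const std::string &mss, std::string dest,
                       const std::string &srcPath, int sn,
                       const std::string &user, const std::string &group) {
    assert(sn >= 0);
    if (dest.empty()) dest = "<fn>";  // default when dest is empty (assumed <fn>)

    // Split the source path into components; <dN> is the N-th one.
    std::vector<std::string> comps;
    std::stringstream ss(srcPath);
    for (std::string c; std::getline(ss, c, '/');)
        if (!c.empty()) comps.push_back(c);

    for (std::size_t i = 0; i < comps.size(); ++i)
        replaceAll(dest, "<d" + std::to_string(i) + ">", comps[i]);
    replaceAll(dest, "<bn>", comps.empty() ? "" : comps.back());
    replaceAll(dest, "<sn>", std::to_string(sn));
    replaceAll(dest, "<fn>", srcPath);
    replaceAll(dest, "<us>", user);
    replaceAll(dest, "<gr>", group);

    // Join mss and the expanded dest with a single '/'.
    std::string base = mss;
    while (!base.empty() && base.back() == '/') base.pop_back();
    std::size_t start = dest.find_first_not_of('/');
    return base + "/" + (start == std::string::npos ? "" : dest.substr(start));
}
```

With the values of the example above, expandDest("protodst://hostdst//nm/", "/pool/user/<d3>/<d4>/<d5>/<sn>/<bn>", "/d0/d1/d2/d3/d4/d5/myfile", 99, user, group) yields protodst://hostdst//nm/pool/user/d3/d4/d5/99/myfile.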

Example

This example shows how to upload the files in directory "/data/files" to the MSS associated with the PROOF cluster at 'master', adding the sequential number to the destination path. The files are then registered as dataset 'mydataset':

root [] TProof *proof = TProof::Open("master")
root [] TFileCollection *fc = TProofMgr::UploadFiles("/data/files", proof->Mgr()->GetMssUrl(), ".")
root [] proof->RegisterDataSet("mydataset", fc)