
Running the PROOF query in valgrind

For Linux-based systems, the possibility to run PROOF queries in valgrind is available since ROOT version 5.14/00. The several manual settings needed to achieve this were automated in ROOT 5.24/00 for standard PROOF, and in 5.26/00 for PROOF-Lite. For versions earlier than these (and >= 5.14/00), the manual setup described below should still work.

The idea behind the valgrind runs is to help trace memory leaks, memory corruption, and anything else that valgrind can catch. Note, however, that this kind of run is meaningful only with a ROOT version built with debug symbols (configured with '--build-debug'). Note also that valgrind must be available on the cluster: no check is currently done for this, and its absence may leave the system unresponsive.

  1. Automatic setup
    1. Master session
    2. Worker sessions
    3. Options to valgrind
    4. Valgrinding the tutorials (runProof, stressProof)
    5. Caveats
      1. PROOF-Lite caveats
  2. Manual setup
    1. Standard PROOF
    2. PROOF-Lite

Automatic setup

Master session

To run the master session under valgrind, start PROOF with the option 'valgrind' or 'valgrind=master':

root [] p = TProof::Open("ganis@alicecaf.cern.ch", "valgrind")

 ---> Starting a debug run with valgrind (master:YES, workers:NO)
 ---> Please be patient: startup may be VERY slow ...
 ---> Logs will be available as special tags in the log window (from the progress dialog or TProof::LogViewer())
 ---> (Reminder: this debug run makes sense only if you are running a debug version of ROOT)

Starting master: opening connection ...
Starting master: connection open: setting up server ...

The session may take a few minutes to start. Once it has started, run your query as in a normal run. The output from valgrind is saved in a separate log file, which is also available via the TProofLog object with a special '-valgrind' tag.

Note that valgrinding the master does not make sense in PROOF-Lite; however, 5.26/00 lacks a protection against this, so requesting it leads to confusing error messages.

Worker sessions

To valgrind the worker sessions, start PROOF with the option 'valgrind=workers':

root [] p = TProof::Open("ganis@alicecaf.cern.ch", "valgrind=workers")

 ---> Starting a debug run with valgrind (master:NO, workers:2)
 ---> Please be patient: startup may be VERY slow ...
 ---> Logs will be available as special tags in the log window (from the progress dialog or TProof::LogViewer())
 ---> (Reminder: this debug run makes sense only if you are running a debug version of ROOT)

Starting master: opening connection ...
Starting master: connection open: setting up server ...
Opening connections to workers: 

 

By default, to reduce the overhead on the machines, only two worker sessions are started for this run; this is typically enough to study a problem on the workers. The number of workers started under valgrind can be set with the '#' character: for example, to start 5 workers, enter 'valgrind=workers#5'.

As for the master run, the logs are available in the log viewer frame with special '-valgrind' tags.

To run both the master and the workers under valgrind, enter the option 'valgrind=master+workers' (the number of workers is controlled by '#' in this case too).
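When these runs are scripted (for example from a ROOT macro), the option string can be assembled programmatically. The sketch below is a hypothetical helper, ValgrindOpenOption, that is not part of ROOT; it only encodes the syntax described above ('valgrind=master', 'valgrind=workers', 'valgrind=master+workers', plus an optional '#N'). It is written in plain C++ so that it can be checked outside ROOT.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper (not part of ROOT): builds the option string passed as
// the second argument of TProof::Open, following the syntax described above.
//   master only      -> "valgrind=master"
//   workers only     -> "valgrind=workers[#N]"
//   master + workers -> "valgrind=master+workers[#N]"
// nworkers <= 0 omits '#N', so PROOF falls back to its default of 2 workers.
std::string ValgrindOpenOption(bool master, bool workers, int nworkers = 0)
{
   assert(master || workers);  // at least one kind of session must be valgrinded
   std::string opt = "valgrind=";
   if (master && workers)
      opt += "master+workers";
   else if (master)
      opt += "master";
   else
      opt += "workers";
   // '#N' is only meaningful when the workers are valgrinded
   if (workers && nworkers > 0)
      opt += "#" + std::to_string(nworkers);
   return opt;
}
```

In a ROOT session one could then write, e.g., TProof::Open("ganis@alicecaf.cern.ch", ValgrindOpenOption(true, true, 3).c_str()). Keep in mind that with PROOF-Lite in 5.26/00 the '#N' suffix does not take effect (see the caveats below).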

Options to valgrind

By default valgrind is started with the following options:
          -v          verbose mode
          --suppressions=$ROOTSYS/etc/valgrind-root.supp
                       apply the known suppressions from the ROOT distribution being used
          --log-file=.valgrind.log'
                       save logs so that the standard log retrieval mechanisms can find them

Options can be added using the TProof::AddEnvVar mechanism with the variable 'PROOF_WRAPPERCMD' and the 'valgrind_opts:' prefix. For example, to run a full memory check on the workers, enter the following:

root [] TProof::AddEnvVar("PROOF_WRAPPERCMD", "valgrind_opts:--leak-check=full")
root [] p = TProof::Open("ganis@alicecaf.cern.ch", "valgrind=workers")
Info in <TProof::ParseConfigField>: valgrind run: resetting 'PROOF_WRAPPERCMD': must be set again for next run , if any
 ---> Starting a debug run with valgrind (master:NO, workers:2)
...
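To pass several valgrind options at once, the value given to PROOF_WRAPPERCMD can be composed as below. This is a hypothetical helper (not part of ROOT), and it rests on an assumption not confirmed above: that several options may follow the single 'valgrind_opts:' prefix, separated by blanks.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper (not part of ROOT): builds the value for
// TProof::AddEnvVar("PROOF_WRAPPERCMD", ...) that adds options to valgrind.
// Assumption: multiple options are blank-separated after one 'valgrind_opts:'.
std::string ValgrindOptsValue(const std::vector<std::string> &opts)
{
   assert(!opts.empty());
   std::string value = "valgrind_opts:";
   for (std::size_t i = 0; i < opts.size(); ++i) {
      if (i > 0)
         value += ' ';   // separate consecutive options with a blank
      value += opts[i];
   }
   return value;
}
```

As the Info message above shows, PROOF_WRAPPERCMD is reset after a valgrind run, so the AddEnvVar call has to be repeated before each TProof::Open.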

Valgrinding the tutorials (runProof, stressProof)

It is possible to trigger the automatic valgrind setup by defining the environment variable GETPROOF_VALGRIND before running runProof. For example, to run the master in valgrind, do

$ export GETPROOF_VALGRIND="valgrind=master"

or, to also pass some options to valgrind,

$ export GETPROOF_VALGRIND="valgrind=master valgrind_opts:--leak-check=full"

Note that this acts at the level of getProof, which is also called internally by runProof; this is therefore also the way to valgrind stressProof, which also uses getProof to start the PROOF session.
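If the tutorials are launched from a driver program rather than from an interactive shell, the same variable can be exported from C++. The sketch below is a hypothetical helper (not part of ROOT or of getProof) that composes the GETPROOF_VALGRIND value in the format shown above and sets it with POSIX setenv, so that any process started afterwards (for example a `root` session running runProof) inherits it, just as with `export` in the shell.

```cpp
#include <cassert>
#include <cstdlib>   // setenv (POSIX), std::getenv
#include <string>

// Hypothetical helper (not part of ROOT): composes the GETPROOF_VALGRIND value
// and exports it into the current process environment. Child processes started
// afterwards inherit it, as with `export GETPROOF_VALGRIND="..."`.
//   mode: e.g. "valgrind=master" or "valgrind=workers#3"
//   opts: extra valgrind options, e.g. "--leak-check=full" (may be empty)
// Returns the exported value, or an empty string if setenv fails.
std::string SetGetProofValgrind(const std::string &mode, const std::string &opts = "")
{
   assert(!mode.empty());
   std::string value = mode;
   if (!opts.empty())
      value += " valgrind_opts:" + opts;
   // overwrite = 1: replace any value already present in the environment
   if (setenv("GETPROOF_VALGRIND", value.c_str(), 1) != 0)
      return "";
   return value;
}
```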

Caveats

Note that valgrind logs much of its useful information (for example, the memory-leak summary) only when the application exits; therefore, to get the information you are looking for, you may need to quit the session, restart, and browse the logs with the log viewer.

PROOF-Lite caveats

In PROOF-Lite, valgrinding the workers should be the default; however, in 5.26/00 you still need to enter the full 'valgrind=workers' option string to enable it. Also, in 5.26/00 the number of workers in a PROOF-Lite session cannot be controlled with the '#' character; use the standard PROOF-Lite way instead: e.g. TProof::Open("workers=2", "valgrind=workers") starts a valgrind PROOF-Lite session with 2 workers, regardless of how many cores the machine has.

Manual setup

We describe here how to set up a valgrind run manually. This functionality is available starting with ROOT version 5.14/00. The whole idea is to run the PROOF sessions within valgrind. To achieve this, one needs to define 'valgrind' as the wrapper command via the proper environment variables, and to increase the relevant timeouts to account for the longer startup time of sessions running within valgrind.

Note that with the manual setup it is not possible to retrieve the valgrind log files automatically via the PROOF log viewer: you must be able to access those files directly by other means.

Standard PROOF

The environment variables that control the wrapper used to start a PROOF session are given in the table:

Variable                  Effect
PROOF_MASTER_WRAPPERCMD   Wrapper command used to start the master session; workers are started normally
PROOF_SLAVE_WRAPPERCMD    Wrapper command used to start the worker sessions; the master is started normally
PROOF_WRAPPERCMD          Wrapper command used to start all the sessions

To make these effective for the sessions, one has to add them to the proper list using the static method TProof::AddEnvVar. For example:

root [] TProof::AddEnvVar("PROOF_WRAPPERCMD","valgrind -v --log-file=/tmp/vg/proof-%p.log")

This starts all the sessions (master and workers) via valgrind, in verbose mode, with the log files created under /tmp/vg and named 'proof-%p.log', where valgrind replaces '%p' with the process ID of each session (this placeholder is available in recent valgrind versions: check your valgrind installation for the exact options available).
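The choice of variable from the table, the wrapper command, and the resulting log-file names can be summarized in a few lines. The helpers below (WrapperVar, ValgrindWrapper, ExpandPid) are hypothetical and not part of ROOT; ExpandPid mimics valgrind's '%p' substitution only in a simplified way (it ignores other placeholders such as '%q{VAR}' and '%%'), to predict which file a session with a given process ID writes.

```cpp
#include <cassert>
#include <string>
#include <utility>

// Hypothetical helpers (not part of ROOT) for the manual setup described above.

// Picks the wrapper variable from the table: master only, workers only, or both.
std::string WrapperVar(bool master, bool workers)
{
   assert(master || workers);
   if (master && workers) return "PROOF_WRAPPERCMD";
   if (master)            return "PROOF_MASTER_WRAPPERCMD";
   return "PROOF_SLAVE_WRAPPERCMD";   // workers only
}

// Returns {variable, command}: valgrind in verbose mode, one log file per
// process under logDir (valgrind replaces '%p' with the process ID).
std::pair<std::string, std::string> ValgrindWrapper(bool master, bool workers,
                                                    const std::string &logDir)
{
   return {WrapperVar(master, workers),
           "valgrind -v --log-file=" + logDir + "/proof-%p.log"};
}

// Simplified '%p' expansion: replaces every "%p" with the given process ID.
std::string ExpandPid(std::string pattern, long pid)
{
   const std::string id = std::to_string(pid);
   std::string::size_type pos = 0;
   while ((pos = pattern.find("%p", pos)) != std::string::npos) {
      pattern.replace(pos, 2, id);
      pos += id.size();   // continue after the inserted ID
   }
   return pattern;
}
```

A possible use in a ROOT session: auto w = ValgrindWrapper(true, true, "/tmp/vg"); TProof::AddEnvVar(w.first.c_str(), w.second.c_str()). Make sure the log directory already exists on every node, since valgrind does not create it.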

The relevant timeout in this case is the one defined via xpd.intwait in the xproofd (xrootd) configuration file: it should be set to a high value (600 or more), e.g.

xpd.intwait 600

When checking workers, it is also a good idea to reduce the number of workers to a small number to speed up the run a bit: most of the problems that valgrind can spot are already visible with 2 workers (this is the default in the automatic setup). For ROOT versions >= 5.24/00, the number of workers to be started is controlled by the environment variable PROOF_NWORKERS, e.g.

root [] TProof::AddEnvVar("PROOF_NWORKERS","2")

For earlier versions, the proof.conf file must be modified instead.
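For versions older than 5.24/00, trimming proof.conf by hand can be scripted. The sketch below is a hypothetical helper (not part of ROOT) that keeps every line of a proof.conf text but only the first N worker lines. It assumes that worker lines begin with the keyword "worker" (or the older synonym "slave"), possibly preceded by blanks; check this against your own proof.conf before relying on it, and keep a backup of the original file.

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Hypothetical helper (not part of ROOT): returns a copy of a proof.conf text
// with all lines kept except worker lines beyond the first 'nworkers'.
// Assumption: worker lines start with "worker" or "slave" as first token.
std::string LimitWorkers(const std::string &conf, int nworkers)
{
   assert(nworkers >= 0);
   std::istringstream in(conf);
   std::string line, out;
   int kept = 0;
   while (std::getline(in, line)) {
      std::istringstream words(line);
      std::string first;
      words >> first;   // first blank-separated token (empty for blank lines)
      bool isWorker = (first == "worker" || first == "slave");
      if (isWorker && kept++ >= nworkers)
         continue;      // drop surplus worker lines
      out += line + '\n';
   }
   return out;
}
```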

PROOF-Lite

In the case of PROOF-Lite, the wrapper can in principle be set manually in exactly the same way as for the standard case. However, this did not work properly before 5.26/00. Fortunately, the way PROOF-Lite works provides an easy workaround: PROOF-Lite workers automatically inherit the environment of the client that starts them. Therefore the TProof::AddEnvVar calls can simply be replaced by the equivalent environment settings made before starting ROOT. For example, to valgrind 2 PROOF-Lite workers, do the following:

$ export PROOF_WRAPPERCMD="valgrind -v --log-file=/tmp/vg/lite-22-%p"
$ root -l
root [0] gEnv->SetValue("ProofLite.StartupTimeOut", 500)
root [1] TProof *p = TProof::Open("workers=2")
The example also shows how to control the relevant timeout and the number of workers to be valgrinded.
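The workaround relies on ordinary process-environment inheritance. The generic POSIX sketch below (not PROOF code; ChildSeesEnv is a hypothetical name) shows the mechanism: a variable set in a parent process is visible to a child process forked afterwards, which is how PROOF_WRAPPERCMD exported in the shell reaches ROOT and, from there, the PROOF-Lite workers.

```cpp
#include <cassert>
#include <cstdlib>      // setenv (POSIX), std::getenv
#include <cstring>      // std::strcmp
#include <sys/types.h>
#include <sys/wait.h>   // waitpid, WIFEXITED, WEXITSTATUS
#include <unistd.h>     // fork, _exit

// Generic POSIX illustration (not PROOF code): sets 'name' to 'value' in the
// parent, forks, and returns true if the child reads back exactly 'value'.
bool ChildSeesEnv(const char *name, const char *value)
{
   assert(name && value);
   if (setenv(name, value, 1) != 0)
      return false;
   pid_t pid = fork();
   if (pid < 0)
      return false;   // fork failed
   if (pid == 0) {
      // Child: report through the exit status whether the value was inherited.
      const char *seen = std::getenv(name);
      _exit(seen && std::strcmp(seen, value) == 0 ? 0 : 1);
   }
   int status = 0;
   if (waitpid(pid, &status, 0) != pid)
      return false;
   return WIFEXITED(status) && WEXITSTATUS(status) == 0;
}
```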