Enabling the use of submergers

1. Introduction

PROOF can parallelize the merging step by using a subset of the workers (the ones that finish first) as mergers. This technique is particularly useful when the output consists of a large number of objects whose size does not depend on the number of entries or on the processing time. Histograms are the typical case. Under certain assumptions it can be shown that the optimal number of mergers is the square root of the number of output sets to be merged. Since there is typically one output set per worker, the optimal number is the square root of the number of available workers.
See root.cern.ch/drupal/sites/default/files/OpocenskaThesis.pdf for details.
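
The square-root rule can be checked numerically with a toy cost model. The sketch below is only an illustration: the model (each merger handles its share of the outputs in parallel, then the master merges the partial results one by one) is an assumption made here, not necessarily the one used in the thesis, and the function names mergeCost and bestMergerCount are hypothetical, not part of ROOT.

```cpp
#include <cmath>

// Hypothetical cost model (an assumption, not taken from PROOF itself):
// each of nMergers mergers merges nOutputs / nMergers outputs in parallel,
// then the master merges the nMergers partial results sequentially,
// so the total time is proportional to nOutputs / nMergers + nMergers.
double mergeCost(int nOutputs, int nMergers) {
    return static_cast<double>(nOutputs) / nMergers + nMergers;
}

// Brute-force search for the merger count that minimizes the model cost.
// Returns 1 when there is at most one output to merge.
int bestMergerCount(int nOutputs) {
    int best = 1;
    for (int m = 2; m <= nOutputs; ++m) {
        if (mergeCost(nOutputs, m) < mergeCost(nOutputs, best)) {
            best = m;
        }
    }
    return best;
}

// Square-root rule from the text, rounded to the nearest integer.
// The rounding actually applied by PROOF is not stated in this document.
long sqrtRule(int nOutputs) {
    return std::lround(std::sqrt(static_cast<double>(nOutputs)));
}
```

Under this model, bestMergerCount(100) and sqrtRule(100) both give 10: with 100 workers, each of the 10 mergers combines 10 outputs and the master combines the 10 partial results.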

2. Enabling use of mergers in PROOF

The use of sub-mergers can be enabled on either the client or the server side. Clients can request it by setting the parameter PROOF_UseMergers to 0, to let PROOF calculate the optimal number of mergers, or to the number of mergers to be used:

root [] proof->SetParameter("PROOF_UseMergers", (Int_t)0)

A negative value disables the use of mergers when it is enabled by default on the server side.
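
The three cases for the client-side value can be summarized in a small helper. This is a sketch for illustration only: describeMergerSetting is a hypothetical function, not part of ROOT, and the automatic case assumes the square-root rule from the introduction with rounding to the nearest integer, which this document does not specify.

```cpp
#include <cmath>
#include <string>

// Hypothetical helper (not part of ROOT): describes how a client-side
// PROOF_UseMergers value is interpreted according to the text above.
std::string describeMergerSetting(int value, int nWorkers) {
    if (value < 0) {
        // Negative: overrides a server-side default and turns mergers off.
        return "mergers disabled";
    }
    if (value == 0) {
        // Zero: PROOF chooses the number itself.
        // Assumption: square root of the worker count, rounded to nearest.
        long n = std::lround(std::sqrt(static_cast<double>(nWorkers)));
        return "automatic: " + std::to_string(n) + " mergers";
    }
    // Positive: the caller forces exactly this many mergers.
    return "forced: " + std::to_string(value) + " mergers";
}
```

For example, describeMergerSetting(0, 16) yields "automatic: 4 mergers", while describeMergerSetting(-1, 16) yields "mergers disabled".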

On the server side the use of mergers is enabled by the directive

xpd.putrc Proof.UseMergers 0

As for clients, 0 instructs PROOF to calculate the optimal number of mergers; a positive number forces that number of mergers.

2.1 Versions 5.34/10 to 5.34/28

An optimized mechanism for sending output was introduced in 5.34/10; the modified logic prevents the 'PROOF_UseMergers' setting from taking effect if used alone. The additional setting

root [] proof->SetParameter("PROOF_ControlSendOutput", (Int_t)0)

is required for those versions.

3. Caveats

The mergers open a server socket to accept connections from the workers assigned to them. The system must allow these incoming connections: TCP ports 5000-15000 must be open among the worker nodes.