PROOF
- New functionality
- PROOF-Lite
- 2-tier realization of PROOF intended for multi-core machines: the
client starts the workers directly; no daemon is required. To start a
session just use TProof::Open("") or TProof::Open("lite"). From there on
everything should work as in normal PROOF. To start a standard PROOF
session (i.e. via daemons) on the localhost use
TProof::Open("localhost").
- XrdProofd plug-in
- Possibility to define the list of workers directly in the
xrootd config file (new directive xpd.worker, see Wiki reference pages)
- Support for automatic reconnections in the case xrootd
is restarted
- Dedicated admin area (under xrd.admin/.xproof.port) to
keep information about active and terminated sessions, and active
clients. This is used to regularly check the client and session
activity, to clean up orphaned sessions and to shut down inactive client
connections.
- Domain and level control of printout messages
- Dynamic "per-query" scheduling
- Dynamic worker startup. It can be enabled by the cluster
administrator with the 'xpd.putrc Proof.DynamicStartup 1' directive
in the config file. The effect is that a session starts only on
the master. When a query is submitted (call to TProof::Process),
the session master contacts the scheduler.
In response it receives a list of workers and starts the worker
processes. The environment is copied from the master to the workers.
It consists of the include and library paths, the set of enabled
packages and the macros loaded by the user.
- Flexible and fault-tolerant workers
- A packet resubmitting mechanism. When a worker dies all the
packets that it processed are resubmitted.
- Added the possibility to handle dynamically removed workers and partly processed
packets (when a worker is stopped while processing a packet it finishes
the current event and the rest of the packet is reassigned to another worker).
This is done by the new methods TPacketizerAdaptive::AddProcessed(TSlave *sl,
TProofProgressStatus *st, TList **) and TPacketizerAdaptive::ReassignPacket.
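The resubmission and reassignment behaviour described above can be illustrated with a small, self-contained C++ sketch. It is not the TPacketizerAdaptive code: the Packet struct, the PacketBook class and all of its method names are hypothetical, and the sketch assumes that a packet is simply a contiguous range of entries.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <map>
#include <string>
#include <vector>

// Hypothetical packet: a contiguous range of entries [first, first + num).
struct Packet {
    long first;
    long num;
};

// Hypothetical bookkeeping class; it only models the behaviour described in
// the notes and does not mirror the real packetizer data structures.
class PacketBook {
public:
    void AddPending(const Packet &p) { pending_.push_back(p); }
    std::size_t PendingCount() const { return pending_.size(); }

    // Give the next pending packet to a worker; false if nothing is left.
    bool Assign(const std::string &worker, Packet &out) {
        if (pending_.empty()) return false;
        out = pending_.front();
        pending_.pop_front();
        current_[worker] = out;
        return true;
    }

    // The worker finished its current packet.
    void Complete(const std::string &worker) {
        auto it = current_.find(worker);
        if (it == current_.end()) return;
        done_[worker].push_back(it->second);
        current_.erase(it);
    }

    // The worker died: everything it processed, plus the packet it was
    // working on, goes back to the pending queue.
    void WorkerDied(const std::string &worker) {
        auto d = done_.find(worker);
        if (d != done_.end()) {
            for (const Packet &p : d->second) pending_.push_back(p);
            done_.erase(d);
        }
        auto c = current_.find(worker);
        if (c != current_.end()) {
            pending_.push_back(c->second);
            current_.erase(c);
        }
    }

    // The worker was removed after processing `processed` entries of its
    // current packet: keep the processed part, requeue only the remainder.
    void WorkerStopped(const std::string &worker, long processed) {
        auto c = current_.find(worker);
        if (c == current_.end()) return;
        const Packet cur = c->second;
        current_.erase(c);
        assert(processed >= 0 && processed <= cur.num);
        if (processed > 0) done_[worker].push_back(Packet{cur.first, processed});
        if (processed < cur.num)
            pending_.push_back(Packet{cur.first + processed, cur.num - processed});
    }

private:
    std::deque<Packet> pending_;                       // not yet assigned
    std::map<std::string, Packet> current_;            // in progress, per worker
    std::map<std::string, std::vector<Packet>> done_;  // completed, per worker
};
```

In this model a worker stopped after 30 entries of the packet [100, 200) leaves the packet [130, 200) in the pending queue, while a worker that dies returns all of its packets to the queue.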
- Add possibility to display the memory footprint on workers and master as a
function of the entry processed (workers) or of the merging step
(master). A new button has been added to the PROOF dialog box to
retrieve and display the memory usage. On the workers about 100
measurements are recorded by default; this number can be changed with
'proof->SetParameter("PROOF_MemLogFreq", memlogfreq)'.
- Improvements:
- More complete set of tests in test/stressProof. To run with PROOF-Lite pass
the argument 'lite' as master URL, e.g. './stressProof lite'.
- Possibility to control on the client, via rc variables, the location of the
sandbox, package directory, cache and dataset directory (the latter two only
for PROOF-Lite); the variable names are 'Proof.Sandbox', 'Proof.PackageDir',
'Proof.CacheDir' and 'Proof.DataSetDir'. The default location of the sandbox
has been changed from "~/proof" to "~/.proof" to avoid interference with
possible users' working areas.
- XrdProofd plug-in
- Overall refactoring for easier maintenance and improved robustness
- Improved format of printout messages: all information
messages now contain the tag 'xpd-I' and all error messages the
tag 'xpd-E', so that they can easily be grepped out from the
log file.
- Log sending
- Implement selective sending of logs from workers to master to avoid duplicating
too many text lines on the master log. Logs are now sent only after Exec and Print
requests and in case an error (level >= kError) occurred. Of course, the full
logs can always be retrieved via TProofMgr::GetSessionLogs.
- Log retrieval:
- for 'grep' operations, use the system 'grep' command via 'popen'
instead of a handmade filtering; this implies that the full grep
functionality is now available
- set the default number of displayed lines to 100 instead of 10
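As an illustration of delegating the filtering to the system 'grep' through 'popen', here is a minimal POSIX C++ sketch. It is not the code used in ROOT: the helper names ShellQuote and GrepLines are hypothetical, and the sketch assumes a POSIX shell with 'grep' available on the PATH.

```cpp
#include <stdio.h>   // popen, pclose, fgets (POSIX)
#include <string>
#include <vector>

// Wrap a string in single quotes so the shell passes it through literally.
std::string ShellQuote(const std::string &s) {
    std::string out = "'";
    for (char c : s) {
        if (c == '\'') out += "'\\''";  // close quote, escaped quote, reopen
        else out += c;
    }
    out += "'";
    return out;
}

// Hypothetical helper: return the lines of `path` that match `pattern`,
// delegating the matching to the system 'grep'.
std::vector<std::string> GrepLines(const std::string &pattern,
                                   const std::string &path) {
    std::vector<std::string> lines;
    const std::string cmd =
        "grep -e " + ShellQuote(pattern) + " -- " + ShellQuote(path);
    FILE *pipe = popen(cmd.c_str(), "r");
    if (!pipe) return lines;  // popen failed: report no matches

    char buf[4096];
    std::string current;
    while (fgets(buf, sizeof(buf), pipe)) {
        current += buf;
        if (!current.empty() && current.back() == '\n') {
            current.pop_back();  // strip the newline, keep the text
            lines.push_back(current);
            current.clear();
        }
    }
    if (!current.empty()) lines.push_back(current);  // last line without newline
    pclose(pipe);
    return lines;
}
```

Because the pattern is handed to 'grep' unchanged, any regular expression that 'grep' accepts works here; for example, GrepLines("xpd-E", logfile) would select the error messages tagged as described above.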
- Improve diagnostics in case of worker death: clients will now
receive a message containing the low-level reason for the failure and a
hint for getting more information
- In TProofOutputFile, support the "<user>" and "<group>"
placeholders in the output file name to automatically redirect the
output to an area specific to the logged-in user.
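The placeholder mechanism can be pictured with the following plain C++ sketch. It is not the TProofOutputFile implementation: ReplaceAll and ExpandPlaceholders are hypothetical helpers, and the sketch assumes that the user and group names are already known as strings.

```cpp
#include <cstddef>
#include <string>

// Replace every occurrence of `from` in `s` with `to`.
std::string ReplaceAll(std::string s, const std::string &from,
                       const std::string &to) {
    if (from.empty()) return s;
    std::size_t pos = 0;
    while ((pos = s.find(from, pos)) != std::string::npos) {
        s.replace(pos, from.size(), to);
        pos += to.size();  // continue after the inserted text
    }
    return s;
}

// Hypothetical helper: expand the "<user>" and "<group>" placeholders
// in an output file name.
std::string ExpandPlaceholders(const std::string &name,
                               const std::string &user,
                               const std::string &group) {
    return ReplaceAll(ReplaceAll(name, "<user>", user), "<group>", group);
}
```

With the made-up values user "alice" and group "physics", the name "/pool/<group>/<user>/out.root" expands to "/pool/physics/alice/out.root".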
- Addition of a new class TProofProgressStatus, which is used to keep
the query progress status in all the TProofPlayer objects and in the
TPacketizerAdaptive. It is also sent in kPROOF_GETPACKET and
kPROOF_STOPPROCESS messages.
- The class TPacketizerProgressive is removed.
- The protocol version is changed to 19: TProofProgressStatus is used in
kPROOF_STOPPROCESS and kPROOF_GETNEXTPACKET messages in master-worker communication
- Fixes
- Invalidate the TProofMgr when the physical connection is closed; this
avoids a crash when trying to get the logs after a failure.
- Fix a memory leak in log retrieval (the TProofLog object was never
deleted)
- Add protections for cases in which the manager cannot be initialized
- Fix a race condition possibly affecting the handling of worker death
- Avoid duplicating worker logs in the master log file unless
explicitly needed by the request (Exec(...), Print(...)) or when
an error occurred
- Fix problem with the determination and transmission of the name of the
object to be processed. The problem appeared when processing files
containing more than one tree in changing order.
- Fix problem with TProof::Load loading the macro on only one worker per machine
- Fix wrong return code preventing the correct propagation of the full ClearPackage to workers
- Fix a problem causing the whole query to stop even when a worker was terminated gracefully with SIGTERM.
- Fix a problem triggering a full re-build of a package upon change of a
single file; the version info file was wrongly reset; this should
happen only after a re-build.
- Make sure that when multiple TProofOutputFile objects are present, each gets merged correctly
- Fix problem in TProofServLogHandler::Notify due to bad usage of Form(...).