Re: fault tolerant processing

From: Fons Rademakers (Fons.Rademakers@cern.ch)
Date: Thu Mar 11 1999 - 17:34:21 MET


In principle, errors are trapped by ROOT. However, whether processing
can continue correctly depends a lot on where the error happened. If it
was somewhere in the interpreter, the interpreter state is currently not
reset correctly to allow a restart.
We expect to work on this problem with Masa during the ROOT workshop
at Fermilab. If the error happens somewhere else, a lot depends on how
well the code handles re-entrancy, etc. The system should support
some global reset so that one can continue with the next event in a clean
state.
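
To illustrate the "global reset" idea, here is a minimal sketch of an
event loop that resets its state before moving on to the next event.
It is plain C++ and does not use any ROOT API: JobState, ResetState and
RunEventLoop are hypothetical names invented for this example, and it
only covers failures reported as C++ exceptions. A crash such as a
segmentation fault or an error inside the interpreter would need
separate handling.

```cpp
#include <cassert>
#include <functional>
#include <stdexcept>
#include <vector>

// Hypothetical stand-in for whatever per-event state a job keeps.
struct JobState {
    std::vector<int> scratch;  // partial results of the current event
    int eventsOk = 0;          // events processed without error
    int eventsFailed = 0;      // events abandoned after an error
};

// Hypothetical "global reset": drop everything the last event left behind.
void ResetState(JobState& state) {
    state.scratch.clear();
}

// Run the handler on each event. If the handler throws, count the failure,
// reset, and continue with the next event instead of ending the job.
void RunEventLoop(const std::vector<int>& events,
                  const std::function<void(int, JobState&)>& processEvent,
                  JobState& state) {
    for (int event : events) {
        try {
            processEvent(event, state);
            ++state.eventsOk;
        } catch (const std::exception&) {
            ++state.eventsFailed;
        }
        ResetState(state);              // next event starts from a clean state
        assert(state.scratch.empty());  // postcondition of the reset
    }
}
```

The reset runs after every event, not only after a failure, so a
handler that forgets to clean up cannot leak state into the next event
either.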

Cheers, Fons.


Valeriy Onuchin wrote:
> 
> Hi Rooters!
> We are using ROOT for online monitoring
> http://emcal06.rhic.bnl.gov/~onuchin/Sproot/html/USER_Index.html
> 
>  One of our main problems is providing
> fault-tolerant processing, i.e.
> recovery from system/ROOT/process failure.
> 
>  Does anybody have solutions or experience with how to deal with it?
> 
>  With best regards,                     Valery
> 
> P.S.
>  I suppose similar problems must exist in offline processing too,
> e.g. AtlasFast and Star have a chain of makers.
> What do you do when one of the makers crashes your ROOT session?
> 
>  ... and a suggestion:
> we are using TMapFiles for local storage of processed data
> http://emcal06.rhic.bnl.gov/~onuchin/Sproot/html/DbManager.html
> After the introduction of TMapRec it became possible to loop over
> the objects in a TMapFile,
> 
>  but could you (Fons) change TMapFile::AcquireSemaphore()
> and TMapFile::ReleaseSemaphore() from protected to public?

-- 
Org:    CERN, European Laboratory for Particle Physics.
Mail:   1211 Geneve 23, Switzerland
E-Mail: Fons.Rademakers@cern.ch              Phone: +41 22 7679248
WWW:    http://root.cern.ch/~rdm/            Fax:   +41 22 7677910


