Re: [ROOT] reading a partially corrupted file

From: Rene Brun (Rene.Brun@cern.ch)
Date: Tue Jan 02 2001 - 22:12:34 MET


Hi Sue,
I have implemented your request. It is now in CVS.

Rene Brun

On Tue, 2 Jan 2001, Susan Kasahara wrote:

> Hi ROOT team,
> I have encountered a problem very similar to the problem reported by
> Dave Casper a couple of weeks ago.
> In my situation, I am running two processes - a daq & dispatcher.
> The daq saves at regular intervals its raw data TTree to file and the
> dispatcher reads the most recent version of this TTree at regular
> intervals to give its clients access to the most recent data generated
> by the daq.
>  Occasionally, the dispatcher and the daq "collide" such that the dispatcher
> attempts to read a region of the file just as the daq is overwriting that
> region with new data.  In this case the new TTree retrieved by the
> dispatcher is corrupted.
>    The tree is retrieved by the dispatcher via the statement:
> TTree* tree = (TTree*)gDirectory->Get("EventTree");
> which in turn invokes TKey::ReadObj() to actually recover the
> requested tree object.
> If the tree is corrupt, TKey::ReadObj() prints the message:
> 
> R__unzip: error in header
> 
> but a non-null pointer to the corrupt tree is still returned to my calling
> routine as though nothing is wrong.
>   It seems that, as in the case Dave described, the return value could
> be used to signal that a failure has occurred.  In my case, this signal
> would be a null pointer to the object.  I could then use this null return
> value as a signal to the dispatcher to try again.
> Thanks for your help,
> Sue Kasahara
> 
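The retry that Sue describes can be sketched in a few lines. This is only an illustration under stated assumptions: FetchWithRetry and its maxAttempts parameter are hypothetical names, not part of ROOT, and the sketch assumes the read call returns a null pointer when the object could not be recovered, which is the behaviour requested above.

```cpp
#include <cassert>

// Hypothetical helper, not part of ROOT: call `fetch` until it returns a
// non-null pointer or `maxAttempts` calls have been made. Returns the last
// result, which is null if every attempt failed.
template <typename Fetch>
auto FetchWithRetry(Fetch fetch, int maxAttempts) -> decltype(fetch())
{
    assert(maxAttempts > 0);
    decltype(fetch()) result = nullptr;
    for (int attempt = 0; attempt < maxAttempts && result == nullptr; ++attempt) {
        result = fetch();  // a null result means "corrupt, try again"
    }
    return result;
}
```

In the dispatcher, the callable passed as fetch would presumably wrap the statement quoted above, (TTree*)gDirectory->Get("EventTree"), and a real dispatcher would likely pause between attempts so that the daq can finish its write.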
> Dave Casper wrote:
> 
> >         Hi,
> >
> > I have data which was written to a removable disk, a few sectors of which
> > appear to be (or have become) bad.  When reading the file, I get the
> > following error message on the screen:
> >
> > R__unzip: error during decompression
> > Error in <TBasket::Read>: fObjlen = 2497941, nout = 0
> >
> > (There is no "TBasket::Read" method; the message in question actually comes
> > from TBasket::ReadBasketBuffers).  My program crashes (not surprisingly)
> > when trying to process the event in question, probably because some chunk of
> > the data is missing.
> >
> > I am wondering if there is any way to handle such an error more gracefully.
> > Currently I check the return status of TTree::GetEntry.  About all I know to
> > do is check whether the number of bytes read is zero, since I don't know
> > what the number of bytes which *should* be read is.  That fails to catch
> > this error.  I don't see any way to tell whether such an error has occurred.
> > It would seem to me that if there is an error, TBasket should invalidate the
> > current operation and return zero bytes.  This should (naively) filter up
> > through the system such that TTree::GetEntry also returns zero bytes and
> > there is some way to determine that an error occurred during the call.
> > Alternatively, perhaps some status flag could be set if an error occurs at
> > any point during the operation, which I could check in addition to the
> > number of bytes read.  All that is really needed is for ROOT to set some
> > flag when an error is detected.  The user could be responsible for clearing
> > the status flag before any operation he wants to handle errors from, and
> > checking to see if it is set afterward.
> >
> > Of course, one hopes that such errors occur infrequently, but they will
> > happen from time to time, and it seems drastic if there is no way to detect
> > them and continue.  One of the nice things about having a direct access file
> > format is that corruption of a small part of a file shouldn't prevent the
> > other parts of it from being read.
> >
> > Is this type of error flagging something which could be added?  It seems a
> > shame to detect an error and print a message on the screen but provide no
> > way for the caller to know that it has occurred.
> >
> > Dave
> > dcasper@uci.edu
> 
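The clear, operate, check pattern that Dave proposes might look like the sketch below. Everything in it is an assumption made for illustration: ReadStatus, Chunk, ReadChunks and ReadChecked are hypothetical stand-ins and do not correspond to any ROOT class or method. The point it shows is that the byte count alone can be non-zero even though part of the read failed, so a separate flag is needed to detect the error.

```cpp
#include <cassert>
#include <vector>

// Hypothetical status flag, standing in for the flag proposed above.
class ReadStatus {
public:
    void Clear() { fError = false; }
    void SetError() { fError = true; }
    bool HasError() const { return fError; }
private:
    bool fError = false;
};

// Hypothetical stand-in for one compressed buffer of an entry.
struct Chunk {
    int nbytes;    // bytes delivered when the chunk decompresses cleanly
    bool corrupt;  // true if decompression of this chunk fails
};

// Stand-in for a read: sums the bytes of the good chunks and raises the
// flag for every corrupt one, so the total can be non-zero despite an error.
int ReadChunks(const std::vector<Chunk>& chunks, ReadStatus& status)
{
    int total = 0;
    for (const Chunk& c : chunks) {
        assert(c.nbytes >= 0);
        if (c.corrupt) {
            status.SetError();
            continue;
        }
        total += c.nbytes;
    }
    return total;
}

// Caller-side pattern: clear the flag, perform the read, then check the flag.
// Returns true only if no error was flagged during the read.
bool ReadChecked(const std::vector<Chunk>& chunks, ReadStatus& status, int& nbytes)
{
    status.Clear();
    nbytes = ReadChunks(chunks, status);
    return !status.HasError();
}
```

With this arrangement the caller can skip a damaged entry and continue with the next one, instead of processing incomplete data.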



This archive was generated by hypermail 2b29 : Tue Jan 01 2002 - 17:50:33 MET