Re: [ROOT] error handling

From: Rene Brun (Rene.Brun@cern.ch)
Date: Wed Feb 12 2003 - 14:43:13 MET


Hi Susan,

As you have guessed correctly, the message "Error in <TObjArray::At>..."
is generated when Root reads what should be a class version number
from the buffer and gets an invalid value. This typically happens when the file has been overwritten.
The system should, in principle, be able to recover. In fact, I see that
you get the correct error message:
 "Warning in <TBuffer::CheckByteCount>: PlexPlaneId::Streamer() not in
 sync with data on file, fix Streamer()"
It is likely that some data member of one of your classes (e.g. a pointer)
will not contain valid data after this error. Trying to access it and follow it
will generate a segmentation fault.
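
The following is a simplified, standalone C++ model of the byte-count check behind the CheckByteCount messages. It illustrates why recovery is possible in principle. It is not ROOT's TBuffer code: `Reader`, `readObject` and the record layout `[int32 count][payload]` are hypothetical, and `streamed` stands in for however many bytes a streamer actually consumed. When the count is plausible, the reader can jump to the end of the object and carry on. When the count itself is garbage (as with the -1879030366 in the log below), there is nowhere safe to jump, and the entry should be abandoned.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical reader over a raw byte buffer. A simplified model only,
// NOT ROOT's TBuffer: it just illustrates the byte-count idea.
struct Reader {
    explicit Reader(const std::vector<std::uint8_t>& b) : buf(b) {}

    // Read a 4-byte signed integer. Real file formats fix the byte order;
    // host order is used here for simplicity.
    bool readInt32(std::int32_t& out) {
        if (pos + sizeof(out) > buf.size()) return false;
        std::memcpy(&out, buf.data() + pos, sizeof(out));
        pos += sizeof(out);
        return true;
    }

    const std::vector<std::uint8_t>& buf;
    std::size_t pos = 0;
};

enum class Status { Ok, Resynced, Corrupt };

// Read one record laid out as [int32 count][count payload bytes].
// `streamed` is how many payload bytes a (possibly mismatched) streamer
// actually consumed; it stands in for the real unpacking code.
Status readObject(Reader& r, std::size_t streamed) {
    std::int32_t count = 0;
    if (!r.readInt32(count)) return Status::Corrupt;

    // A negative or oversized count is itself corrupt: there is no
    // trustworthy position to resynchronize to, so give up on this record.
    if (count < 0 || static_cast<std::size_t>(count) > r.buf.size() - r.pos)
        return Status::Corrupt;

    const std::size_t expectedEnd = r.pos + static_cast<std::size_t>(count);
    const bool inSync = (streamed == static_cast<std::size_t>(count));

    // In sync or not, continue from where the byte count says this object
    // ends, so the next object can still be read. After a mismatch, any
    // members filled in by the bad read must not be trusted.
    r.pos = expectedEnd;
    return inSync ? Status::Ok : Status::Resynced;
}
```

For example, a record with count 6 whose streamer consumes only 2 bytes yields `Status::Resynced`, with the position moved to the end of the object. The same record with its count overwritten by a negative value yields `Status::Corrupt`.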

I can have a look at your file, if you want. Could you also send a subset
of your classes?

Rene Brun



> Hi roottalk,
> We are experiencing a problem with partially corrupt root data files.  These
> files were produced on a reconstruction production farm; the source of the
> corruption is not yet clear and is being investigated.
>     We believe that only a small subset of the entries on
> any one tree are affected by the data corruption, and I would like to be able
> to recognize when a corrupt record has been read into memory, warn the
> user, and then move to the next record in the tree so that all subsequent unaffected records
> can be processed.  I'm wondering how this can be done.
>   An example of what happens in a case where TTree::GetEntry() attempts
> to read a corrupt data record:
> 
> Error in <TObjArray::At>: index 66 out of bounds (size: 16, this: 0x0921aad0)
> Error in <TObjArray::At>: index 3072 out of bounds (size: 15, this: 0x09239978)
> Error in <TObjArray::At>: index 852 out of bounds (size: 15, this: 0x0923a740)
> Error in <TObjArray::At>: index -11069 out of bounds (size: 68, this: 0x0921aad0)
> Error in <TObjArray::AddAt>: out of bounds at -11069 in 921aad0
> Error in <TBuffer::CheckByteCount>: object of class PlexPlaneId read too many bytes: 6 instead of -1879030366
> Warning in <TBuffer::CheckByteCount>: PlexPlaneId::Streamer() not in sync with data on file, fix Streamer()
> Segmentation fault (core dumped)
> 
> The TObjArray::At error messages, the first sign of a problem, are produced when the
> TClass::ReadBuffer method reads a version number from the corrupt data buffer that is
> invalid for the class and uses that corrupt version number to index the TObjArray
> containing the StreamerInfos. The subsequent segv occurs deep within root, and the
> GetEntry method never returns to the user.
> Other corrupt data records produce different symptoms, but the segv is usually
> preceded by some error messages from root.
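
The failure mode described above, a garbage version number used as an index into the table of StreamerInfos, can be modeled in standalone C++. This is a sketch, not ROOT code: `ClassInfo`, `layoutFor` and `entryLooksValid` are hypothetical names, and a `std::vector` stands in for the TObjArray. Validating the version before using it lets a reader reject the whole entry instead of carrying a missing layout into later unpacking.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the per-class table of layouts, indexed by
// class version. In ROOT this role is played by a TObjArray of StreamerInfos.
struct ClassInfo {
    std::vector<const char*> layouts;  // nullptr = no layout for that version
};

// Return the layout for `version`, or nullptr if that version number cannot
// be valid for this class. Checking here, before any member is unpacked,
// turns "index out of bounds" into a clean "this entry is bad" answer.
const char* layoutFor(const ClassInfo& c, int version) {
    if (version < 0 || static_cast<std::size_t>(version) >= c.layouts.size())
        return nullptr;
    return c.layouts[static_cast<std::size_t>(version)];
}

// Hypothetical per-entry check: false if any object in the entry carries an
// impossible version, so the caller can skip the entry entirely.
bool entryLooksValid(const ClassInfo& c, const std::vector<int>& versions) {
    for (int v : versions)
        if (layoutFor(c, v) == nullptr) return false;
    return true;
}
```

With four slots (versions 0 to 3, version 0 unused), the versions 66, 3072 and -11069 from the log all make `layoutFor` return `nullptr`, so an entry containing any of them is reported as invalid.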
>   I thought I could perhaps use an error handler to catch the errors and abort the
> read of the current entry without aborting the job, which would allow the user to continue
> processing entries.  Unfortunately, I'm really a novice at using error handlers. Although I see that
> I can replace root's default error handler (DefaultErrorHandler) using the SetErrorHandler
> function declared in TError.h, I don't see how to write the handler so that control returns to
> my code just after the TTree::GetEntry() call.  Perhaps this is a bad idea anyway,
> since it may leave some unfinished business in TTree::GetEntry?
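
Here is a minimal sketch of the flag-and-skip idea, written as standalone C++ so it compiles without ROOT. Everything in namespace `mock` is a hypothetical stand-in for the hook in TError.h. In real code you would call `SetErrorHandler()` and use ROOT's own error levels. `getEntry` and `processAll` are also hypothetical. The handler only records that an error occurred, and the loop checks the flag after each read instead of trying to jump out of the handler.

```cpp
#include <cassert>
#include <cstdio>
#include <vector>

// Standalone stand-in for the hook declared in ROOT's TError.h, so this
// sketch compiles without ROOT. Names and the level value are illustrative.
namespace mock {
constexpr int kError = 3000;  // hypothetical level; ROOT defines its own
using ErrorHandlerFunc = void (*)(int level, bool abort,
                                  const char* location, const char* msg);
ErrorHandlerFunc gHandler = nullptr;

// Install a new handler and return the previous one.
ErrorHandlerFunc SetErrorHandler(ErrorHandlerFunc h) {
    ErrorHandlerFunc old = gHandler;
    gHandler = h;
    return old;
}

// Report an error through the installed handler, if any.
void Error(const char* location, const char* msg) {
    if (gHandler) gHandler(kError, false, location, msg);
}
}  // namespace mock

// Set by the handler whenever an error is reported during the current read.
static bool gEntryHadError = false;

// Record the error and print it. In ROOT you would typically also forward
// to DefaultErrorHandler() so the usual message still appears.
void flaggingHandler(int level, bool /*abort*/, const char* location,
                     const char* msg) {
    if (level < mock::kError) return;  // ignore infos and warnings here
    gEntryHadError = true;
    std::fprintf(stderr, "Error in <%s>: %s\n", location, msg);
}

// Hypothetical GetEntry: entries listed in `corrupt` report an error.
void getEntry(long i, const std::vector<long>& corrupt) {
    for (long c : corrupt)
        if (c == i) mock::Error("TObjArray::At", "index out of bounds");
}

// Process entries 0..n-1, skipping any whose read reported an error.
std::vector<long> processAll(long n, const std::vector<long>& corrupt) {
    mock::ErrorHandlerFunc previous = mock::SetErrorHandler(flaggingHandler);
    std::vector<long> processed;
    for (long i = 0; i < n; ++i) {
        gEntryHadError = false;          // reset before each read
        getEntry(i, corrupt);
        if (gEntryHadError) {
            std::fprintf(stderr, "Skipping corrupt entry %ld\n", i);
            continue;
        }
        processed.push_back(i);          // the user's analysis would go here
    }
    mock::SetErrorHandler(previous);     // restore the old handler
    return processed;
}
```

This only helps for entries where GetEntry() actually returns. It cannot catch a segmentation fault raised inside it. Jumping out of the handler (with longjmp or an exception) is deliberately not shown, because it would likely skip cleanup that GetEntry() still had to do, which is the "unfinished business" concern raised above.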
>   Can anyone suggest a solution to skip past these corrupt records?  Of course, our
> first priority is to fix the cause of the data corruption and all data will eventually be
> reprocessed, but this will be a stopgap measure to allow the user to look at the data
> in the meantime.
> Thanks for your help,
> -Sue
> p.s. I'm using root from CVS as of this past Sunday and gcc 3.2 on Red Hat Linux to read the data files.
> The data files were produced with an older version of root.



This archive was generated by hypermail 2b29 : Thu Jan 01 2004 - 17:50:09 MET