Re: [ROOT] error handling

From: Susan Kasahara (schubert@hep.umn.edu)
Date: Thu Feb 13 2003 - 07:10:44 MET


Hi Rene,
Thanks for your offer to look at the file.  I've put a few examples
of corrupt data files along with a description of the problem observed on:
http://www.hep.umn.edu/~schubert/badfiles/
I will also try to pull together a subset of relevant classes and send this
to you tomorrow.
-Sue

Rene Brun wrote:

> Hi Susan,
> 
> As you have guessed correctly, the message "Error in <TObjArray::At.."
> is generated when Root reads what should be a class version number
> from the buffer. This happens typically when the file has been overwritten.
> The system, in principle, should be able to recover. In fact, I see that
> you get the correct error message
>  "Warning in <TBuffer::CheckByteCount>: PlexPlaneId::Streamer() not in
>  sync with data on file, fix Streamer()"
> It is likely that some data member of one of your classes (e.g. a pointer)
> will not contain valid data after this error. Trying to access it and follow it
> will generate a segmentation fault.
> 
> I can have a look at your file, if you want. Could you also send a subset
> of your classes?
> 
> Rene Brun
> 
> 
> 
> 
>>Hi roottalk,
>>We are experiencing a problem with partially corrupt root data files.  These
>>files have been produced on a reconstruction production farm, and the source of the
>>corruption is not clear yet and is being investigated.
>>    We believe that only a small subset of the entries on
>>any one tree are affected by the data corruption, and I would like to be able
>>to recognize when a corrupt record has been read into memory, warn the
>>user, and then move to the next record in the tree so that all subsequent unaffected records
>>can be processed.  I'm wondering how this can be done.
>>  An example of what happens in a case where TTree::GetEntry() attempts
>>to read a corrupt data record:
>>
>>Error in <TObjArray::At>: index 66 out of bounds (size: 16, this: 0x0921aad0)
>>Error in <TObjArray::At>: index 3072 out of bounds (size: 15, this: 0x09239978)
>>Error in <TObjArray::At>: index 852 out of bounds (size: 15, this: 0x0923a740)
>>Error in <TObjArray::At>: index -11069 out of bounds (size: 68, this: 0x0921aad0)
>>Error in <TObjArray::AddAt>: out of bounds at -11069 in 921aad0
>>Error in <TBuffer::CheckByteCount>: object of class PlexPlaneId read too many bytes: 6 instead of -1879030366
>>Warning in <TBuffer::CheckByteCount>: PlexPlaneId::Streamer() not in sync with data on file, fix Streamer()
>>Segmentation fault (core dumped)
>>
>>The first signs of a problem, the TObjArray::At error messages, are produced when the
>>TClass::ReadBuffer method reads a version number from the corrupt data buffer that is
>>ridiculous for the class and uses that corrupt version number to access the TObjArray
>>containing the StreamerInfo's. The subsequent segv occurs deep within root and the
>>GetEntry method never returns to the user.
>>Other corrupt data records produce different symptoms, but the segv is usually
>>preceded by some error messages from root.
>>  I thought I could perhaps use an error handler to catch the errors and abort the
>>read of the current entry without aborting the job, which would allow the user to continue
>>processing entries.  Unfortunately, I'm really a novice at using error handlers, and although I see that
>>I can override root's ErrorHandler default function using TError's SetErrorHandler
>>method, I don't see how to write the function so that it resurfaces at the place in my
>>code just after the TTree::GetEntry() method is invoked.  Perhaps this is a bad idea anyway,
>>since it may leave some unfinished business in TTree::GetEntry?
>>  Can anyone suggest a solution to skip past these corrupt records?  Of course, our
>>first priority is to fix the cause of the data corruption and all data will eventually be
>>reprocessed, but this will be a stopgap measure to allow the user to look at the data
>>in the meantime.
>>Thanks for your help,
>>-Sue
>>p.s. I'm using root cvs as of this past Sunday and gcc 3.2 on rh linux to read the data files.
>>The data files were produced with an older version of root.
>>
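
Editorial sketch of the error-handler idea raised in the original question: a handler that throws a C++ exception is one way to "resurface" just after the read call. This is a minimal illustration under stated assumptions, not a tested recipe for ROOT. To keep it compilable without ROOT, SetHandler, FakeGetEntry, kErrorLevel, CorruptEntry and ReadAll are hypothetical stand-ins; in real code the handler would be installed with SetErrorHandler from TError.h and the read would be TTree::GetEntry(). Whether ROOT's internals tolerate an exception unwinding through GetEntry is exactly the "unfinished business" concern in the question, and a handler cannot intercept a segmentation fault that occurs without a preceding error message.

```cpp
#include <cstddef>
#include <cstdio>
#include <stdexcept>
#include <string>
#include <vector>

// Same shape as ROOT's error handler callback (Bool_t written as bool here).
using ErrorHandlerFunc = void (*)(int level, bool abort,
                                  const char* location, const char* msg);

// Stand-in for ROOT's kError severity; the numeric value is an assumption.
const int kErrorLevel = 3000;

// Stand-in for ROOT's handler registry: installs a handler, returns the old one.
static ErrorHandlerFunc gHandler = nullptr;
ErrorHandlerFunc SetHandler(ErrorHandlerFunc h) {
  ErrorHandlerFunc old = gHandler;
  gHandler = h;
  return old;
}

// Hypothetical exception type used to unwind back to the event loop.
struct CorruptEntry : std::runtime_error {
  using std::runtime_error::runtime_error;
};

// Handler: errors abort the current read by throwing; warnings are printed.
void ThrowingHandler(int level, bool, const char* location, const char* msg) {
  if (level >= kErrorLevel)
    throw CorruptEntry(std::string(location) + ": " + msg);
  std::fprintf(stderr, "Warning in <%s>: %s\n", location, msg);
}

// Stand-in for TTree::GetEntry(): reports an error for entries marked bad.
int FakeGetEntry(const std::vector<bool>& bad, std::size_t i) {
  if (bad[i] && gHandler)
    gHandler(kErrorLevel, false, "TObjArray::At", "index out of bounds");
  return 1;
}

// Event loop: skip entries whose read raised an error, keep the rest.
std::vector<std::size_t> ReadAll(const std::vector<bool>& bad) {
  ErrorHandlerFunc previous = SetHandler(ThrowingHandler);
  std::vector<std::size_t> good;
  for (std::size_t i = 0; i < bad.size(); ++i) {
    try {
      FakeGetEntry(bad, i);
      good.push_back(i);
    } catch (const CorruptEntry& e) {
      std::fprintf(stderr, "skipping entry %zu (%s)\n", i, e.what());
    }
  }
  SetHandler(previous);  // restore whatever handler was installed before
  return good;
}
```

Restoring the previous handler after the loop keeps the change local to the read. A more robust stopgap, at the cost of speed, would be to read each file in a separate process so that a crash only loses that process.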


