[ROOT] error handling

From: Susan Kasahara (schubert@physics.umn.edu)
Date: Wed Feb 12 2003 - 05:22:34 MET


Hi roottalk,
We are experiencing a problem with partially corrupt ROOT data files.  These
files were produced on a reconstruction production farm, and the source of the
corruption is not yet clear and is being investigated.
  We believe that only a small subset of the entries in any one tree is
affected by the data corruption.  I would like to be able to recognize when a
corrupt record has been read into memory, warn the user, and then move on to
the next record in the tree so that all subsequent unaffected records can be
processed.  I'm wondering how this can be done.
  Here is an example of what happens when TTree::GetEntry() attempts
to read a corrupt data record:

Error in <TObjArray::At>: index 66 out of bounds (size: 16, this: 0x0921aad0)
Error in <TObjArray::At>: index 3072 out of bounds (size: 15, this: 0x09239978)
Error in <TObjArray::At>: index 852 out of bounds (size: 15, this: 0x0923a740)
Error in <TObjArray::At>: index -11069 out of bounds (size: 68, this: 0x0921aad0)
Error in <TObjArray::AddAt>: out of bounds at -11069 in 921aad0
Error in <TBuffer::CheckByteCount>: object of class PlexPlaneId read too many bytes: 6 instead of -1879030366
Warning in <TBuffer::CheckByteCount>: PlexPlaneId::Streamer() not in sync with data on file, fix Streamer()
Segmentation fault (core dumped)

The first sign of a problem, the TObjArray::At error messages, is produced when the
TClass::ReadBuffer method reads a version number from the corrupt data buffer that is
nonsensical for the class and uses that corrupt version number to index the TObjArray
containing the StreamerInfos. The subsequent segv occurs deep within ROOT, and the
GetEntry method never returns to the user.
Other corrupt data records produce different symptoms, but the segv is usually
preceded by some error messages from ROOT.
  I thought I could perhaps use an error handler to catch the errors and abort the
read of the current entry without aborting the job, which would allow the user to continue
processing entries.  Unfortunately, I'm a novice at using error handlers.  I see that
I can replace ROOT's default ErrorHandler function using TError's SetErrorHandler
function, but I don't see how to write the handler so that control returns to the place in my
code just after the TTree::GetEntry() call.  Perhaps this is a bad idea anyway,
since it may leave some unfinished business in TTree::GetEntry?
  Can anyone suggest a way to skip past these corrupt records?  Of course, our
first priority is to fix the cause of the data corruption, and all data will eventually be
reprocessed, but this would be a stopgap measure to allow the user to look at the data
in the meantime.
Thanks for your help,
-Sue
P.S. I'm using ROOT from CVS as of this past Sunday and gcc 3.2 on Red Hat Linux to read
the data files.  The data files were produced with an older version of ROOT.



This archive was generated by hypermail 2b29 : Thu Jan 01 2004 - 17:50:09 MET