[ROOT] Strange problem with TBranch buffer size

From: Christian Holm Christensen (cholm@hehi03.nbi.dk)
Date: Tue Aug 21 2001 - 18:41:48 MEST


Hi, 

I've recently come across a strange thing that happens when reading
back a TTree from a file.  The TTree contains one top-level branch of
a class Foo, which contains a TObjArray.  The TObjArray may contain
any number of objects of classes Foo, Bar, Baz, Qux, and so on.
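
For concreteness, the setup looks roughly like this (a CINT-style
sketch; the names are made up, only the structure matters, and
bufsize/splitlevel are the two numbers I vary further down):

    class Foo : public TObject {
    public:
       TObjArray fParts;   // may hold Foo, Bar, Baz, Qux, ... objects
       ClassDef(Foo,1)     // make Foo streamable
    };

    // writing side, schematically:
    TFile file("foo.root", "RECREATE");
    TTree tree("T", "test tree");
    Foo  *event = new Foo;
    tree.Branch("FooBranch", "Foo", &event, bufsize, splitlevel);
    // ... fill the TObjArray, tree.Fill(), and file.Write() as usual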

When things go wrong, it's usually flagged with 

    index -20432 out of bounds (size: 13, .... 

and then a segmentation violation causing an abort and core dump. 

So I did some serious debugging.  It turns out the out-of-bounds error
comes from "Int_t TClass::ReadBuffer(TBuffer &b, void *pointer)": 

   UInt_t R__s, R__c;
   Version_t version = b.ReadVersion(&R__s, &R__c);    
   if (gFile && gFile->GetVersion() < 30000) version = -1;
   TStreamerInfo *sinfo = (TStreamerInfo*)fStreamerInfo->At(version);
                                                            ^^^^^^^

where version is some absurd number, so the At() call (fStreamerInfo
seems to be a TObjArray indexed by the class version) trips the bounds
check, prints the warning quoted above, and returns 0.  Further
debugging showed that the problem actually popped up in
"Version_t TBuffer::ReadVersion(UInt_t *startpos, UInt_t *bcnt)": 

    union {
      UInt_t     cnt;
      Version_t  vers[2];
    } v;
    *this >> v.vers[1];
    *this >> v.vers[0];

Here, v.cnt is supposed to be a masked byte count, but instead it is
0, or some other value such that in the next instruction 

    if (!(v.cnt & kByteCountMask)) {
      fBufCur -= sizeof(UInt_t);
      v.cnt = 0;
    }

the conditional is true, so the buffer backs up sizeof(UInt_t) =
4 bytes and then goes on.  The next thing is: 

    *bcnt = (v.cnt & ~kByteCountMask);
    *this >> version;

So the TClass gets the wrong byte count and a weird version number,
since the buffer was reading the wrong stuff!  Later on, this is also
what causes the segmentation violation.  The whole thing is messed up
because of this short (or is it long?) read.  So I played around a
bit and tried different things.  
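
To make the mechanics concrete: the convention seems to be that each
streamed object starts with a 4-byte count with a mask bit set,
followed by a 2-byte version.  Here is a toy decoder of that header
(a CINT-style sketch; the mask value is copied from the ROOT headers,
so treat it as my assumption):

    const UInt_t kByteCountMask = 0x40000000;  // bit 30 marks a count

    void decode(UInt_t first4)
    {
       if (first4 & kByteCountMask)
          // normal case: masked byte count, 2-byte version follows
          printf("byte count = %u\n", first4 & ~kByteCountMask);
       else
          // ReadVersion() backs up sizeof(UInt_t) and reads the
          // version from the start of these 4 bytes instead
          printf("no byte count; back up 4 bytes\n");
    }

So if the buffer is positioned on the wrong bytes to begin with, the
mask test can go either way and everything downstream is garbage.  In
particular, the absurd version then trips TObjArray's bounds check in
TClass::ReadBuffer(), which gives exactly the warning quoted at the
top:

    TObjArray infos(13);              // size 13, as in the message
    TObject  *si = infos.At(-20432);  // prints "index -20432 out of
                                      // bounds (size: 13, ..." and
                                      // returns 0; using that 0 later
                                      // is presumably the crash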

It turns out that certain combinations of the buffer size and split
level make the thing happen.  I did an investigation, and here's what
I found: 

   split |              buffer size 
   level | 100   2000   4000   6400   16000   32000
   ------+-----------------------------------------
     0   | n/a    n/a     ok    bad     n/a     n/a
     1   |  ok     ok    bad    bad     bad     bad
     2   |  ok    n/a    n/a    ok       ok      ok
    99   |  ok    n/a    n/a    ok       ok      ok

"n/a" means I was lazy and didn't do the test.  "ok" I could read back
fine. "bad" means it failed as outlined above. 
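
The scan itself was just the same write/read macro run over the two
parameters, something like this (writeTree and readTree stand in for
my actual macros):

    const Int_t sizes[]  = { 100, 2000, 4000, 6400, 16000, 32000 };
    const Int_t splits[] = { 0, 1, 2, 99 };
    for (Int_t s = 0; s < 4; s++)
       for (Int_t b = 0; b < 6; b++) {
          writeTree(sizes[b], splits[s]);  // Branch(..., size, split)
          readTree();                      // "bad" = warning + segfault
       }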

OK, so the numbers above only really make sense if you have the full
class specs and are running the thing on a machine like mine.  

This seems odd to me.  Does anyone have a good explanation?  Is this
really the intended behaviour (presuming that I'm not doing something
wrong, which I don't think I am)?  

My machine is a Pentium III, 733 MHz, 256 MB RAM + 1 GB swap, Redhat
6.2, ROOT 3.01/06 (CVS head a week ago). 

Yours, 

Christian Holm Christensen -------------------------------------------
Address: Sankt Hansgade 23, 1. th.           Phone:  (+45) 35 35 96 91 
         DK-2200 Copenhagen N                Cell:   (+45) 28 82 16 23
         Denmark                             Office: (+45) 353  25 305 
Email:   cholm@nbi.dk                        Web:    www.nbi.dk/~cholm


