Re: [ROOT] Strange problem with TBranch buffer size

From: Pasha Murat
Date: Tue Aug 21 2001 - 19:11:34 MEST


We also saw weird things [huge memory leaks] when we were trying
to use split=1 with 3.01/06. We are not using split=1 any longer... -best, Pasha

Christian Holm Christensen wrote:
> 
> Hi,
> 
> I've recently come across a strange thing that happens to me when
> reading back a TTree from a file.  The TTree contains one top-level
> branch of a class Foo, which contains a TObjArray.  The TObjArray may
> contain any number of objects of classes Foo, Bar, Baz, Qux, ...
> 
> When things go wrong, it's usually flagged with
> 
>     index -20432 out of bounds (size: 13, ....
> 
> and then a segmentation violation causing an abort and core dump.
> 
> So I did some serious debugging. It turns out the out of bounds error
> comes from "Int_t TClass::ReadBuffer(TBuffer &b, void *pointer)":
> 
>    UInt_t R__s, R__c;
>    Version_t version = b.ReadVersion(&R__s, &R__c);
>    if (gFile && gFile->GetVersion() < 30000) version = -1;
>    TStreamerInfo *sinfo = (TStreamerInfo*)fStreamerInfo->At(version);
>                                                             ^^^^^^^
> 
> where version is some absurd number.  Further debugging showed that
> the problem actually popped up in
> "Version_t TBuffer::ReadVersion(UInt_t *startpos, UInt_t *bcnt)":
> 
>     union {
>       UInt_t     cnt;
>       Version_t  vers[2];
>     } v;
>     *this >> v.vers[1];
>     *this >> v.vers[0];
> 
> Here, v.cnt is supposed to be a masked byte count, but instead it is
> 0, or some other value without the mask bit set, so that in the next instruction
> 
>     if (!(v.cnt & kByteCountMask)) {
>       fBufCur -= sizeof(UInt_t);
>       v.cnt = 0;
>     }
> 
> the conditional is true, so the buffer backs up sizeof(UInt_t) =
> 4 bytes and then carries on.  The next thing is:
> 
>     *bcnt = (v.cnt & ~kByteCountMask);
>     *this >> version;
> 
> So the TClass gets the wrong byte count and a weird version number,
> since the buffer was reading the wrong data!  Later on this is also
> what causes the segmentation violation.  So the whole thing is messed
> up because of this short (or is it long?) read.  So I played around a
> bit and tried different things.
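To make the failure mode described above concrete, here is a minimal, self-contained C++ sketch of the quoted ReadVersion control flow. It is not ROOT code: `MiniBuffer` is a hypothetical stand-in for TBuffer, the value 0x40000000 for kByteCountMask is an assumption, and the two 16-bit halves are combined with shifts instead of the union (on a little-endian machine such as the Pentium III, the union yields the same 32-bit word).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// ASSUMPTION: value of ROOT's kByteCountMask (bit 30 flags "byte count present").
const std::uint32_t kByteCountMask = 0x40000000;

// Hypothetical stand-in for TBuffer: a read cursor over big-endian bytes.
class MiniBuffer {
public:
    explicit MiniBuffer(std::vector<unsigned char> data) : data_(std::move(data)) {}

    // Read a big-endian 16-bit signed value (Version_t is a 16-bit short).
    std::int16_t ReadShort() {
        assert(pos_ + 2 <= data_.size());
        std::uint16_t raw =
            static_cast<std::uint16_t>((data_[pos_] << 8) | data_[pos_ + 1]);
        pos_ += 2;
        return static_cast<std::int16_t>(raw);
    }

    // Same control flow as the quoted ReadVersion: read two shorts as one
    // 32-bit word (high half first); if the mask bit is clear, rewind the
    // 4 bytes and assume there was no byte count at all.
    std::int16_t ReadVersion(std::uint32_t* bcnt) {
        const std::size_t start = pos_;
        const std::uint32_t hi = static_cast<std::uint16_t>(ReadShort());
        const std::uint32_t lo = static_cast<std::uint16_t>(ReadShort());
        std::uint32_t cnt = (hi << 16) | lo;
        if (!(cnt & kByteCountMask)) {
            pos_ = start;  // equivalent of fBufCur -= sizeof(UInt_t)
            cnt = 0;
        }
        *bcnt = cnt & ~kByteCountMask;
        return ReadShort();  // whatever two bytes come next become the version
    }

    std::size_t Pos() const { return pos_; }

private:
    std::vector<unsigned char> data_;
    std::size_t pos_ = 0;
};
```

With the made-up bytes `40 00 00 0A 00 03`, the sketch returns version 3 and byte count 10. With the equally made-up bytes `B0 30 12 34`, the mask bit is clear, the cursor rewinds, and the first two bytes come back as the version, -20432. That this matches the index in the error message is by construction; the point is only that any two misread bytes can yield such a value.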
> 
> It turns out that certain combinations of buffer size and split
> level make the thing happen.  I did an investigation, and here's what
> I found:
> 
>    split |           buffer size (bytes)
>    level | 100   2000   4000   6400   16000   32000
>    ------+-----------------------------------------
>      0   | n/a    n/a     ok    bad     n/a     n/a
>      1   |  ok     ok    bad    bad     bad     bad
>      2   |  ok    n/a    n/a    ok       ok      ok
>     99   |  ok    n/a    n/a    ok       ok      ok
> 
> "n/a" means I was lazy and didn't do the test, "ok" means I could read
> it back fine, and "bad" means it failed as outlined above.
> 
> OK, so the numbers above only really make sense if you have the full
> class specs and are running the thing on a machine like mine.
> 
> This seems odd to me.  Does anyone have a good explanation?  Is this
> really the intended behaviour (presuming that I'm not doing something
> wrong, which I don't think I am)?
> 
> My machine is a Pentium III, 733 MHz, 256 MB RAM + 1 GB swap, Redhat
> 6.2, ROOT 3.01/06 (CVS head a week ago).
> 
> Yours,
> 
> Christian Holm Christensen -------------------------------------------
> Address: Sankt Hansgade 23, 1. th.           Phone:  (+45) 35 35 96 91
>          DK-2200 Copenhagen N                Cell:   (+45) 28 82 16 23
>          Denmark                             Office: (+45) 353  25 305
> Email:   cholm@nbi.dk                        Web:    www.nbi.dk/~cholm
