We also saw weird things [huge memory leaks] when we were trying to use
split=1 with 3.01/06. We are not using split=1 any longer...

-best,
Pasha

Christian Holm Christensen wrote:
>
> Hi,
>
> I've recently come across a strange thing that happens to me when
> reading back a TTree from a file. The TTree contains one top level
> branch of a class Foo, that contains a TObjArray. The TObjArray may
> contain objects of classes Foo, Bar, Baz, Qux, ... and any number of
> them.
>
> When things go wrong, it's usually flagged with
>
>   index -20432 out of bounds (size: 13, ....
>
> and then a segmentation violation causing an abort and core dump.
>
> So I did some serious debugging. It turns out the out of bounds error
> comes from "Int_t TClass::ReadBuffer(TBuffer &b, void *pointer)":
>
>   UInt_t R__s, R__c;
>   Version_t version = b.ReadVersion(&R__s, &R__c);
>   if (gFile && gFile->GetVersion() < 30000) version = -1;
>   TStreamerInfo *sinfo = (TStreamerInfo*)fStreamerInfo->At(version);
>                                                            ^^^^^^^
>
> where version is some absurd number. Further debugging showed that
> the problem actually popped up in
> "Version_t TBuffer::ReadVersion(UInt_t *startpos, UInt_t *bcnt)":
>
>   union {
>     UInt_t     cnt;
>     Version_t  vers[2];
>   } v;
>   *this >> v.vers[1];
>   *this >> v.vers[0];
>
> Here, v.cnt is supposed to be a masked byte count, but instead it is
> 0, or something so that in the next instruction
>
>   if (!(v.cnt & kByteCountMask)) {
>     fBufCur -= sizeof(UInt_t);
>     v.cnt = 0;
>   }
>
> the conditional is true, and so the buffer backs up "sizeof(UInt_t)" =
> 4 bytes, and then goes on. The next thing is:
>
>   *bcnt = (v.cnt & ~kByteCountMask);
>   *this >> version;
>
> So the TClass gets the wrong byte count, and a weird version number,
> since the buffer was reading the wrong stuff! Later on this is also
> what causes the segmentation violation. So, the whole thing is messed
> up because of this short (or is it long?) read. So, I played around a
> bit, tried different things.
>
> It turns out that certain combinations of the buffer size and split
> level make the thing happen. I did an investigation, and here's what
> I found:
>
>   split |              buffer size
>   level |  100   2000   4000   6400   16000   32000
>   ------+------------------------------------------
>       0 |  n/a   n/a    ok     bad    n/a     n/a
>       1 |  ok    ok     bad    bad    bad     bad
>       2 |  ok    n/a    n/a    ok     ok      ok
>      99 |  ok    n/a    n/a    ok     ok      ok
>
> "n/a" means I was lazy and didn't do the test. "ok" means I could read
> back fine. "bad" means it failed as outlined above.
>
> Ok, so the numbers above only really make sense if you have the full
> class specs and are running the thing on a machine like mine.
>
> This seems odd to me. Does anyone have a good explanation? Is this
> really the behaviour intended (presuming that I'm not doing something
> wrong, which I don't think I am)?
>
> My machine is a Pentium III, 733 MHz, 256 MB RAM + 1 GB swap, Redhat
> 6.2, ROOT 3.01/06 (CVS head a week ago).
>
> Yours,
>
> Christian Holm Christensen -------------------------------------------
> Address: Sankt Hansgade 23, 1. th.    Phone:  (+45) 35 35 96 91
>          DK-2200 Copenhagen N         Cell:   (+45) 28 82 16 23
>          Denmark                      Office: (+45) 353 25 305
> Email:   cholm@nbi.dk                 Web:    www.nbi.dk/~cholm
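
The ReadVersion logic quoted above can be tried outside ROOT. What follows is only a minimal sketch, not ROOT code: readVersion, readU16, VersionRead and selfCheck are hypothetical names, the buffer is assumed to be big-endian, and the value 0x40000000 for kByteCountMask is an assumption. The second test buffer is made up; its bytes were chosen only so that the result reproduces the -20432 from the error message, to show how a reader that starts at the wrong offset ends up with an absurd version number.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Assumed value of the flag bit that marks a 32-bit word as a byte count.
const std::uint32_t kByteCountMask = 0x40000000;

// Hypothetical result type: what the reader believes it has found.
struct VersionRead {
    std::uint32_t byteCount;  // 0 if no byte count was recognised
    std::int16_t  version;    // class version as read from the buffer
    std::size_t   consumed;   // number of bytes actually consumed
};

// Read one big-endian 16-bit value (byte order is an assumption).
static std::uint16_t readU16(const std::vector<std::uint8_t>& buf,
                             std::size_t pos) {
    return static_cast<std::uint16_t>((buf[pos] << 8) | buf[pos + 1]);
}

// Mimics the quoted steps: read two shorts as one 32-bit word, test the
// flag bit, back up 4 bytes if it is missing, then read the version.
VersionRead readVersion(const std::vector<std::uint8_t>& buf,
                        std::size_t pos) {
    std::uint32_t cnt =
        (static_cast<std::uint32_t>(readU16(buf, pos)) << 16) |
        readU16(buf, pos + 2);
    std::size_t cur = pos + 4;

    if (!(cnt & kByteCountMask)) {
        cur -= 4;  // no byte count here: rewind and treat as a bare version
        cnt = 0;
    }

    std::uint32_t byteCount = cnt & ~kByteCountMask;
    std::int16_t version = static_cast<std::int16_t>(readU16(buf, cur));
    cur += 2;
    return {byteCount, version, cur - pos};
}

// Two illustrative cases: a well-formed header and a misaligned read.
void selfCheck() {
    // Flagged byte count of 10 followed by version 3.
    VersionRead good = readVersion({0x40, 0x00, 0x00, 0x0A, 0x00, 0x03}, 0);
    assert(good.byteCount == 10 && good.version == 3 && good.consumed == 6);

    // Made-up bytes without the flag bit: the reader rewinds and takes
    // the first two bytes (0xB030) as a signed version number.
    VersionRead bad = readVersion({0xB0, 0x30, 0x00, 0x00, 0x00, 0x00}, 0);
    assert(bad.byteCount == 0 && bad.version == -20432 && bad.consumed == 2);
}
```

In the second case the sketch consumes only 2 bytes instead of 6, so everything read afterwards is shifted as well, which fits the description of the buffer "reading the wrong stuff".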
This archive was generated by hypermail 2b29 : Tue Jan 01 2002 - 17:50:58 MET