Hi Valeri & Valeri On Fri, 10 May 2002, Valeri Fine wrote: > Hi Valeri > > The branches of TTree objects are written with so called "index-sequential" method. > This doesn't provide the direct access to an arbitrary event rather to the block of the > several events. > correct > This means when you try to access you events sequentially then everything fine. > When you try to access your events in the random order your program are forced > to read the entire block of the events into memory even you need only one single event > from that block. correct > > This may lead to the performance degradation by two reason: > > 1. With one single I/O operation your code reads an extra information (events) > that is no use at the time may be correct > 2. When you will require some particular event then very likely it is to be read in > again (see point 1) Not correct, otherwise there is no point in buffering. There is a compromise to be made between two antagonist requirements -if you read mainly sequentially through all your events, you better have very large block sizes. - if you want to select a tiny percentage of all entries, you better choose a small buffer size. For example, if your event size is 1Kbyte and you define a block size of 16 KBytes, even if you read only one event out of 1000, you will have to read anyway 16 events at each call to GetEntry. When I say "read", I mean read the buffer, not unpack the structure in objects in memory. On the other hand, if your event size is 100 Kbytes and your block size is 32K, you will not see the first problem. > > This is what Rene called "masochist way" > > If this is what you need really you may regard the "normal" ROOT I/O rather > TTree I/O. Wrong. Tree I/O is ALWAYS much better than single object I/O. Single object I/O does not scale to a large number of events. It is OK to store histograms or a few thousand objects, not for millions of objects. In addition, compression will always be better with Trees (in particular in split mode) than with individual objects I/O. > That does provide the direct access to the objects. > However the number of events you may handle may be less those you can handle with > TTree. You may be required to write some code to get the service that TTree class provides. correct. However, you may have to write more code than you may think a priori ::) Rene Brun > > > Hope this helps, Valeri > > ----- > Dr.Valeri Fine > STAR/US Atlas E-mail: fine@bnl.gov > Brookhaven National Lab Phone: +1 631 344 7806 > Upton, NY 11973-5000 FAX: +1 631 344 4206 > USA > > > > ----- Original Message ----- > From: "Valeri Tioukov" <Valeri.Tioukov@na.infn.it> > To: "Rene Brun" <Rene.Brun@cern.ch> > Cc: "roottalk" <roottalk@pcroot.cern.ch> > Sent: Friday, May 10, 2002 11:36 AM > Subject: Re: [ROOT] Arbitrary access to tree entries VS sequential one > > > > Hi Rene, > > > > Actually my case is the access to the tree entries via Index so it is really > > arbitrary from the entry point of view. In script is only the simpliest hack to > > illustrate the problem. > > The reality I fased to is that index access is much less effective then I > > expected, especially in the case of million of entries and it is due to the > > GetEntry slowing down. May be I'm mistaken but my impression that with 2.25 it > > was much faster? > > I do not see how 2 Trees could help in case of abitrary (index) access. > > > > Regards > > Valeri > > > > On Fri, 10 May 2002, Rene Brun wrote: > > > > > Hi Valeri, > > > > > > Your access type is not arbitrary but the most masochist way > > > to destroy the ROOT cache ::) > > > > > > Recently, we had a discussion on roottalk on this subject. > > > I suggested for this kind of applications, to open the same Tree twice. > > > Each Tree will have its own cache and you will see a dramatic improvement > > > in performance. > > > > > > Rene Brun > > > > > > > > > Valeri Tioukov wrote: > > > > > > > > Hi rooters, > > > > > > > > I noted that the arbitrary access to the tree entries is significantly slower > > > > then the sequential ones. > > > > > > > > May be it is an important feature of reading algorithm and it not simple to > > > > optimise, overwise it cold be useful to speed up. > > > > > > > > In the applied script I tested 2 cases of access to the tree entries: > > > > > > > > 1) sequential > > > > 2) "arbitrary" (not really arbitrary in my test) > > > > > > > > Test was done on the file Event.root with 20000 events generated with the > > > > standard settings. > > > > > > > > In case of reading only one branch fNtrack the output is: > > > > 20000 events: Real time 0:0:3, CP time 3.380 > > > > 20000 events: Real time 0:0:40, CP time 40.330 > > > > > > > > so the second case is 12 times slower. In principle one could expect some > > > > encrease of access time due to memory (disk) pages lists or similar effects. > > > > But this could behave like a constant: 40-3 = 37. > > > > Instead when I read more branches the ratio is even more havy: > > > > > > > > 20000 events: Real time 0:0:6, CP time 6.110 > > > > 20000 events: Real time 0:2:19, CP time 139.130 > > > > > > > > ratio = 23, difference = 133 > > > > > > > > Regards, > > > > Valeri > > > > > > > > //--test1.C------------------------------------------------------------------- > > > > > > > > // To start the test do: > > > > // > > > > // root [0] .L libEvent.so > > > > // root [1] .L test1.C++ > > > > // root [2] init() > > > > // root [3] sel() > > > > > > > > #include "TTree.h" > > > > #include "TFile.h" > > > > #include "Event.h" > > > > #include "TStopwatch.h" > > > > #include "iostream.h" > > > > > > > > TFile *f = 0; > > > > TTree *chain =0; > > > > > > > > void init() > > > > { > > > > f = new TFile("Event.root"); > > > > chain = (TTree*)f->Get("T"); > > > > } > > > > > > > > void sel() > > > > { > > > > Event *event = 0; > > > > > > > > //chain->SetBranchStatus("*",0); // ratio 12 > > > > //chain->SetBranchStatus("fNtrack",1); // > > > > > > > > chain->SetBranchStatus("*",1); // ratio 23 > > > > chain->SetBranchStatus("fTracks*",0); // > > > > > > > > chain->SetBranchAddress("event",&event); > > > > > > > > int entries = (int)(chain->GetEntries()); > > > > int entries2 = entries/2; > > > > int counter; > > > > printf("chain has %d entries\n",entries); > > > > > > > > TStopwatch timer1; > > > > timer1.Start(); > > > > counter=0; > > > > for(int i=0; i<entries; i++ ) { > > > > chain->GetEntry(i); > > > > counter++; > > > > } > > > > > > > > timer1.Stop(); > > > > cout << counter<< " events: \t"; timer1.Print(); > > > > > > > > int e1 = 200; > > > > int e2 = entries-200; > > > > > > > > TStopwatch timer2; > > > > timer2.Start(); > > > > counter=0; > > > > for(int i=0; i<entries2; i++ ) { > > > > chain->GetEntry(e1); > > > > counter++; > > > > chain->GetEntry(e2); > > > > counter++; > > > > } > > > > timer2.Stop(); > > > > cout << counter<< " events: \t"; timer2.Print(); > > > > } > > > > //---------------------------------------------------------------- > > > > > > > > >
This archive was generated by hypermail 2b29 : Sat Jan 04 2003 - 23:50:52 MET