Re: [ROOT] Arbitrary access to tree entries VS sequential one

From: Rene Brun (Rene.Brun@cern.ch)
Date: Fri May 10 2002 - 23:28:11 MEST


Hi Valeri & Valeri

On Fri, 10 May 2002, Valeri Fine wrote:

> Hi Valeri
> 
> The branches of TTree objects are written with so called "index-sequential" method.
> This doesn't provide the direct access to an arbitrary event rather to the block of the 
> several events.
> 
correct

> This means when you try to access you events sequentially then everything fine.
> When you try to access your events in the random order your program are forced 
> to read the entire block of the events into memory even you need only one single event 
> from that block.

correct

> 
> This may lead to the performance degradation  by two reason:
> 
>    1. With one single I/O operation your code  reads an extra information (events) 
>        that is no use at the time

may be correct

>    2. When you will require some particular event then
very likely it is to be read in
>       again (see point 1)

Not correct, otherwise there is no point in buffering.
There is a compromise to be made between two antagonist requirements
 -if you read mainly sequentially through all your events, you better have
  very large block sizes.
 - if you want to select a tiny percentage of all entries, you better
  choose a small buffer size.
For example, if your event size is 1Kbyte and you define a block size of
16 KBytes, even if you read only one event out of 1000, you will have
to read anyway 16 events at each call to GetEntry. When I say "read",
I mean read the buffer, not unpack the structure in objects in memory.
On the other hand, if your event size is 100 Kbytes and your block size
is 32K, you will not see the first problem.
> 
> This is what Rene called "masochist way"
> 
> If this is what you need  really you may regard the "normal" ROOT I/O rather 
> TTree I/O.

Wrong. Tree I/O is ALWAYS much better than single object I/O. Single
object I/O does not scale to a large number of events. It is OK to store
histograms or a few thousand objects, not for millions of objects.
In addition, compression will always be better with Trees (in particular
in split mode) than with individual objects I/O.

> That does provide the direct access to the objects.
> However the number of events you may handle may be less those you can handle with
> TTree. You may be  required  to write some code to get the service that TTree class provides.

correct. However, you may have to write more code than you may think a
priori ::)

Rene Brun

> 
> 
>                            Hope this helps, Valeri
> 
> -----
> Dr.Valeri Fine
> STAR/US Atlas                                    E-mail: fine@bnl.gov
> Brookhaven National Lab                Phone: +1 631 344 7806
> Upton, NY 11973-5000                       FAX:     +1 631 344 4206
> USA
> 
> 
> 
> ----- Original Message ----- 
> From: "Valeri Tioukov" <Valeri.Tioukov@na.infn.it>
> To: "Rene Brun" <Rene.Brun@cern.ch>
> Cc: "roottalk" <roottalk@pcroot.cern.ch>
> Sent: Friday, May 10, 2002 11:36 AM
> Subject: Re: [ROOT] Arbitrary access to tree entries VS sequential one
> 
> 
> > Hi Rene,
> > 
> > Actually my case is the access to the tree entries via Index so it is really
> > arbitrary from the entry point of view. In script is only the simpliest hack to
> > illustrate the problem.
> > The reality I fased to is that index access is much less effective then I
> > expected, especially in the case of million of entries and it is due to the
> > GetEntry slowing down. May be I'm mistaken but my impression that with 2.25 it
> > was much faster?
> > I do not see how 2 Trees could help in case of abitrary (index) access.
> > 
> > Regards
> > Valeri
> > 
> > On Fri, 10 May 2002, Rene Brun wrote:
> > 
> > > Hi Valeri,
> > >
> > > Your access type is not arbitrary but the most masochist way
> > > to destroy the ROOT cache ::)
> > >
> > > Recently, we had a discussion on roottalk on this subject.
> > > I suggested for this kind of applications, to open the same Tree twice.
> > > Each Tree will have its own cache and you will see a dramatic improvement
> > > in performance.
> > >
> > > Rene Brun
> > >
> > >
> > > Valeri Tioukov wrote:
> > > >
> > > > Hi rooters,
> > > >
> > > > I noted that the arbitrary access to the tree entries is significantly slower
> > > > then the sequential ones.
> > > >
> > > > May be it is an important feature of reading algorithm  and it not simple to
> > > > optimise, overwise it cold be useful to speed up.
> > > >
> > > > In the applied script I tested 2 cases of access to the tree entries:
> > > >
> > > > 1) sequential
> > > > 2) "arbitrary" (not really arbitrary in my test)
> > > >
> > > > Test was done on the file Event.root with 20000 events generated with the
> > > > standard settings.
> > > >
> > > > In case of reading only one branch fNtrack the output is:
> > > > 20000 events:   Real time 0:0:3, CP time 3.380
> > > > 20000 events:   Real time 0:0:40, CP time 40.330
> > > >
> > > > so the second case is 12 times slower. In principle one could expect some
> > > > encrease of access time due to memory (disk) pages lists or similar effects.
> > > > But this could behave like a constant: 40-3 = 37.
> > > > Instead when I read more branches the ratio is even more havy:
> > > >
> > > > 20000 events:   Real time 0:0:6, CP time 6.110
> > > > 20000 events:   Real time 0:2:19, CP time 139.130
> > > >
> > > > ratio = 23, difference = 133
> > > >
> > > > Regards,
> > > > Valeri
> > > >
> > > > //--test1.C-------------------------------------------------------------------
> > > >
> > > > // To start the test do:
> > > > //
> > > > // root [0] .L libEvent.so
> > > > // root [1] .L test1.C++
> > > > // root [2] init()
> > > > // root [3] sel()
> > > >
> > > > #include "TTree.h"
> > > > #include "TFile.h"
> > > > #include "Event.h"
> > > > #include "TStopwatch.h"
> > > > #include "iostream.h"
> > > >
> > > > TFile      *f = 0;
> > > > TTree      *chain =0;
> > > >
> > > > void init()
> > > > {
> > > >   f = new TFile("Event.root");
> > > >   chain = (TTree*)f->Get("T");
> > > > }
> > > >
> > > > void sel()
> > > > {
> > > >   Event *event = 0;
> > > >
> > > >   //chain->SetBranchStatus("*",0);           // ratio 12
> > > >   //chain->SetBranchStatus("fNtrack",1);     //
> > > >
> > > >   chain->SetBranchStatus("*",1);             // ratio 23
> > > >   chain->SetBranchStatus("fTracks*",0);      //
> > > >
> > > >   chain->SetBranchAddress("event",&event);
> > > >
> > > >   int entries = (int)(chain->GetEntries());
> > > >   int entries2 = entries/2;
> > > >   int counter;
> > > >   printf("chain has %d entries\n",entries);
> > > >
> > > >   TStopwatch timer1;
> > > >   timer1.Start();
> > > >   counter=0;
> > > >   for(int i=0; i<entries; i++ ) {
> > > >     chain->GetEntry(i);
> > > >     counter++;
> > > >   }
> > > >
> > > >   timer1.Stop();
> > > >   cout << counter<< " events: \t"; timer1.Print();
> > > >
> > > >   int e1 = 200;
> > > >   int e2 = entries-200;
> > > >
> > > >   TStopwatch timer2;
> > > >   timer2.Start();
> > > >   counter=0;
> > > >   for(int i=0; i<entries2; i++ ) {
> > > >     chain->GetEntry(e1);
> > > >     counter++;
> > > >     chain->GetEntry(e2);
> > > >     counter++;
> > > >   }
> > > >   timer2.Stop();
> > > >   cout << counter<< " events: \t"; timer2.Print();
> > > > }
> > > > //----------------------------------------------------------------
> > >
> > >
> > 
> 



This archive was generated by hypermail 2b29 : Sat Jan 04 2003 - 23:50:52 MET