Re: [ROOT] Comparison of speed of accessing data

From: Rene Brun (Rene.Brun@cern.ch)
Date: Fri Nov 22 2002 - 15:09:23 MET


Hi HP,

OK, we are converging on a possible problem.
Could you send me the output of TTree::Print for your ROOT file?
I want to see the size of the buffers. If the buffers are too large,
the Get function has to read too much data into memory.

Rene Brun

On Fri, 22 Nov 2002, HP Wei wrote:

> 
> >Could you tell me the total time to process only
> >   for each stock
> >      ** tree = (TTree*) root_file.Get(stock),
> >      activate the Date branch,
>         
>        /****
>        find the indexes for the desired date
> >>     set up containers for the requested data fields, (using STL's vector)
> >>     activate branches (date, time, price, size),
> >>     for all the targeted indexes
> >>         tree->GetEntry(i),
> >>         put the data into the container.
>        ****/
>        
> >      root_file.Delete(stock)
> --------------------------------------------------------------------
> 
>    First, the data month is 200101.
>    The data are sorted by (Date, time).
> 
>    Now, more timing info.  
>    measured_time = (tms.utime + tms.stime) in units of seconds.
>    (aDate = the date for which we are extracting data)
>    Block represents the block between /**** and ****/ in the above
>    pseudo-code.
>    Our_database_package is for the custom software that is currently
>    used in our company.  The database is compressed extensively.
>    It also stores some index information for fast access.
>    Even with the index info stored in the file, the file size
>    is still about 30%-50% smaller than the root_file counterpart
>    (one tree for one stock) for a given month.
>    The timing in the our_database_package column below
>    is for the whole task described in the above pseudo-code.
>    
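
A minimal sketch of how a measured_time like the one defined above could be obtained, assuming a POSIX system and assuming that tms.utime and tms.stime in the message refer to the tms_utime and tms_stime fields filled in by times(). Those fields are reported in clock ticks, so the sketch divides by sysconf(_SC_CLK_TCK) to get seconds. The helper names cpu_seconds and burn_cpu are hypothetical and are not taken from the original program.

```cpp
#include <sys/times.h>
#include <unistd.h>
#include <cassert>

// CPU time (user + system) consumed by this process so far, in seconds.
// Hypothetical helper; the original measurement code is not shown in the thread.
double cpu_seconds() {
    struct tms t;
    times(&t);                                        // fills tms_utime / tms_stime in clock ticks
    const long ticks_per_sec = sysconf(_SC_CLK_TCK);  // ticks per second on this system
    assert(ticks_per_sec > 0);
    return static_cast<double>(t.tms_utime + t.tms_stime) / ticks_per_sec;
}

// Stand-in workload so that the timer has something to measure.
double burn_cpu(long iterations) {
    volatile double sink = 0.0;
    for (long i = 0; i < iterations; ++i) {
        sink = sink + static_cast<double>(i) * 0.5;
    }
    return sink;
}
```

Typical use is to call cpu_seconds() before and after the section of interest and subtract the two values. Because this counts CPU time only, time spent waiting on disk I/O is not included in the result.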
>    [Note: the date 20010131 is chosen 
>           because the real data for this date
>           is located near the last rows of a tree, 
>           while 20010102
>           at the very beginning of a tree.]
>    
>    Table of measured_time for various situations:
>    ---------------------------------------------------------
>    aDate      w/Block  wo/Block    our_database_package  
>    ---------------------------------------------------------
>    20010131   75.93    57.37       21.58
>    20010102   61.65    57.35       17.57
>    
>    With this timing data, it appears that the
>    root_file.Get(stock) operation is very expensive.
>    In the pseudocode, 
>    the step: find the indexes for the desired date
>    is a simple linear search.
>    For the case of 20010131, it takes less than 18 sec (compared with
>    57.37 for Get() etc).
>    
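
Since the data are sorted by (Date, time), the linear search mentioned above could in principle be replaced by a binary search. The following is only a sketch of that swapped-in technique, assuming the dates have already been loaded into a std::vector<int> in YYYYMMDD form; the function name date_range is hypothetical. It addresses only the search inside the Block, not the cost of Get() itself.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Returns the half-open index range [first, last) of the entries whose
// date equals aDate. Assumes `dates` is sorted in ascending order.
// Hypothetical helper, not part of ROOT or of the original program.
std::pair<std::size_t, std::size_t>
date_range(const std::vector<int>& dates, int aDate) {
    assert(std::is_sorted(dates.begin(), dates.end()));
    // equal_range performs a binary search for both ends of the matching run.
    const auto r = std::equal_range(dates.begin(), dates.end(), aDate);
    return { static_cast<std::size_t>(r.first - dates.begin()),
             static_cast<std::size_t>(r.second - dates.begin()) };
}
```

An empty range (first == last) means the requested date is not present. The is_sorted check is itself linear and is compiled out when NDEBUG is defined.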
>    
>    ------------------------------------------------------------
>    Note: this exercise is not meant to show off (or advertise) our
>    custom_database package.  We are investigating whether we
>    can utilize or integrate ROOT.
>    The existing system is thus the obvious benchmark target.
>    
>    I am wondering if there are some 'tricks' (hidden functionalities)
>    in ROOT to improve data storage and access performance.
>    
>    
>    Valeri mentioned the TTable class.
>    Maybe that is another thing to try.
>    Does anyone know the fundamental difference between TTable and TTree
>    in terms of how they organize the data on disk?
>    
> --HP
>    
>    
> 


