Re: [ROOT] Comparison of speed of accessing data

From: HP Wei (hp@rentec.com)
Date: Fri Nov 22 2002 - 14:58:25 MET


>Could you tell me the total time to process only
>   for each stock
>      ** tree = (TTree*) root_file.Get(stock),
>      activate the Date branch,
        
       /****
       find the indexes for the desired date
>>     set up containers for the requested data fields, (using STL's vector)
>>     activate branches (date, time, price, size),
>>     for all the targeted indexes
>>         tree->GetEntry(i),
>>         put the data into the container.
       ****/
       
>      root_file.Delete(stock)
--------------------------------------------------------------------

   First, the data are for the month 200101.
   The data are sorted by (Date, time).

   Now, more timing info.  
   measured_time = (tms.utime + tms.stime), in seconds.
   (aDate = the date for which we are extracting data)
   Block refers to the steps between /**** and ****/ in the above
   pseudo-code.
   Our_database_package is the custom software currently
   used in our company.  Its database is heavily compressed.
   It also stores some index info for fast access.
   Even with the index info stored in the file, the file size
   is still about 30%-50% smaller than the root_file counterpart
   (one tree per stock) for a given month.
   The timing in the our_database_package column below
   is for the whole task described in the above pseudo-code.
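
   A minimal sketch of how measured_time could be obtained, assuming
   the POSIX times() call.  struct tms names the fields tms_utime and
   tms_stime and reports them in clock ticks, so the sum is divided by
   sysconf(_SC_CLK_TCK) to get seconds.  The helper names cpu_seconds
   and measure are made up for illustration; this is not the code that
   produced the numbers below.

```cpp
#include <sys/times.h>
#include <unistd.h>

// CPU time (user + system) consumed by this process so far, in seconds.
// Hypothetical helper: the post only states the formula utime + stime.
double cpu_seconds() {
    struct tms t;
    times(&t);  // fills tms_utime / tms_stime, both in clock ticks
    const double ticks_per_sec = static_cast<double>(sysconf(_SC_CLK_TCK));
    return static_cast<double>(t.tms_utime + t.tms_stime) / ticks_per_sec;
}

// Run any callable and return the CPU seconds it consumed.
// Hypothetical helper, e.g. measure([&]{ /* loop over all stocks */ });
template <typename F>
double measure(F&& work) {
    const double start = cpu_seconds();
    work();
    return cpu_seconds() - start;
}
```

   Because this counts CPU time rather than wall-clock time, time spent
   waiting on disk I/O is not included in the result.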
   
   [Note: the date 20010131 is chosen
          because the data for this date
          are located near the last rows of a tree,
          while those for 20010102
          are at the very beginning of a tree.]
   
   Table of measured_time (in seconds) for each case
   (w/Block = with Block, wo/Block = without Block):
   ---------------------------------------------------------
   aDate      w/Block  wo/Block    our_database_package  
   ---------------------------------------------------------
   20010131   75.93    57.37       21.58
   20010102   61.65    57.35       17.57
   ---------------------------------------------------------
   
   These timings suggest that the
   root_file.Get(stock) operation is very expensive.
   In the pseudo-code,
   the step "find the indexes for the desired date"
   is a simple linear search.
   For 20010131, the whole Block costs 75.93 - 57.37 = 18.56 sec,
   so the search itself takes less than that (compared with
   57.37 for Get() etc).
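
   To make the "find the indexes for the desired date" step concrete,
   here is a rough sketch of the linear search.  It assumes the dates
   have already been read into a std::vector<int> in sorted order.
   Since the data are sorted by (Date, time), a binary search via
   std::equal_range should give the same range with fewer probes.
   The function names are hypothetical, and reading the Date branch
   from the tree is not shown.

```cpp
#include <algorithm>
#include <cstddef>
#include <utility>
#include <vector>

// Half-open index range [first, last) of the entries whose date == aDate.
// Linear scan, as described in the post; assumes dates are sorted ascending.
std::pair<std::size_t, std::size_t>
find_date_linear(const std::vector<int>& dates, int aDate) {
    std::size_t first = 0;
    while (first < dates.size() && dates[first] < aDate) ++first;
    std::size_t last = first;
    while (last < dates.size() && dates[last] == aDate) ++last;
    return {first, last};
}

// Same range via binary search (an alternative, not what the post measured).
std::pair<std::size_t, std::size_t>
find_date_binary(const std::vector<int>& dates, int aDate) {
    auto r = std::equal_range(dates.begin(), dates.end(), aDate);
    return {static_cast<std::size_t>(r.first - dates.begin()),
            static_cast<std::size_t>(r.second - dates.begin())};
}
```

   Both functions return an empty range (first == last) when the date
   is absent.  Whether the binary variant helps in practice depends on
   how cheaply a single Date entry can be read at an arbitrary index.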
   
   
   ------------------------------------------------------------
   Note: this exercise is not meant to show off (or advertise) our
   custom_database package.  We are investigating whether we
   can use or integrate ROOT.
   The existing system is thus the obvious benchmark.
   
   I am wondering if there are some 'tricks' (hidden functionality)
   in ROOT to improve data storage and access performance.
   
   
   Valeri mentioned the TTable class.
   Maybe that is another thing to try.
   Does anyone know the fundamental difference between TTable and TTree
   in terms of how they organize the data on disk?
   
--HP
   
   


