Re: [ROOT] Comparison of speed of accessing data

From: HP Wei (hp@rentec.com)
Date: Fri Dec 13 2002 - 04:28:02 MET


I'd like to thank the many people who replied to my posting
and gave me some insight into ROOT and some ideas to try.
As a result, in the original test I am now able to
'beat' our internal benchmark (although not by a great margin).
However, that test accesses the data in a ROOT tree
SEQUENTIALLY.

Yesterday, a new test suggested that a ROOT tree may be
intrinsically slow for RANDOM access to its data.
I describe the situation below so that maybe
someone can point out anything I have missed.

------------------------------------------------------------

First, the objective is to access the stock price data
for all 10000 stocks at one particular instant (say, 9:40),
i.e. we want to get the data into an array P[10000].

Intuitively, one would think that the natural way of storing
the data is to have a tree like this:
         price_branch
t0       P[10000]
t1       P[10000]
t2       P[10000]
...

However, because stock prices differ greatly from one another,
this arrangement (with compression on)
will blow up the file_size to an unacceptable level.

So, to make file_size smaller, the best way is to store data like
this:
        price_branch
stock1  p[100]         ---> 100 is the number of time instants in a day
stock2  p[100]
stock3  p[100]
....

So, to meet the objective, we need to read through all 10000 stocks
in a day and do a 'transpose-like' operation.
Doing this SEQUENTIALLY for all 10000 stocks, all instants in a day and
all days in a tree is found to be a little faster than our current
software system.
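
The 'transpose-like' step can be sketched in plain C++ as follows. This is
only an illustration, not our actual code: it assumes the per-stock arrays
p[100] have already been read into memory, and the names sliceAtTime and
perStock are made up for the example.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical in-memory layout: perStock[s][t] is the price of stock s
// at time index t (one inner vector per tree entry).
// Returns P, where P[s] is the price of stock s at time index t.
std::vector<double> sliceAtTime(const std::vector<std::vector<double>>& perStock,
                                std::size_t t) {
    std::vector<double> P;
    P.reserve(perStock.size());
    for (const auto& p : perStock) {
        assert(t < p.size());  // every stock must have a value at time t
        P.push_back(p[t]);
    }
    return P;
}
```

With 10000 inner vectors of length 100, sliceAtTime(perStock, t) gives the
array P[10000] for one time index; every stock's entry has to be read to
fill it.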

BUT, if we want to access only, say, 500 stocks (e.g. those in the SP500),
we then have a problem.  The data in a day for those 500 stocks
are (in general) not in a
contiguous block in the branch but are dispersed 'randomly'
(e.g. if the first stock is at entry 2, the second stock may be
      at entry 150).

If the buffer size when the tree was created is 32000,
we waste a lot of time when doing GetEvent(2).
This is because we only need 800 bytes of data, but ROOT will read
32000 bytes of data into the buffer.
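
To put numbers on this, here is a rough sketch in plain C++ (no ROOT calls).
It assumes fixed-size, uncompressed entries of 800 bytes and that buffer k
holds entries k*n to (k+1)*n-1; real buffers are compressed on disk and
their boundaries may differ, so this is only a simplified model. The
function names are made up for the example.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <vector>

// Number of whole entries that fit in one buffer, assuming fixed-size
// uncompressed entries.
std::size_t entriesPerBuffer(std::size_t bufferBytes, std::size_t entryBytes) {
    assert(entryBytes > 0 && entryBytes <= bufferBytes);
    return bufferBytes / entryBytes;
}

// Number of distinct buffers that must be loaded to serve the wanted
// entries, assuming buffer k holds entries [k*perBuffer, (k+1)*perBuffer).
std::size_t buffersTouched(const std::vector<std::size_t>& wanted,
                           std::size_t perBuffer) {
    assert(perBuffer > 0);
    std::set<std::size_t> buffers;
    for (std::size_t e : wanted) {
        buffers.insert(e / perBuffer);  // index of the buffer holding entry e
    }
    return buffers.size();
}
```

Under these assumptions entriesPerBuffer(32000, 800) is 40, so entries 2
and 150 fall in different buffers (0 and 3), and each read brings in 40
entries in order to use one of them.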

Setting buffer_size to 1000
seems to be a reasonable way out, but then the
file_size problem kicks in.

---------------------------------------------
Questions:
   Is there a provision, when accessing a tree branch, that allows
us to control how many bytes ROOT reads from the disk
into the buffer?

   Any other suggestions are appreciated.
--HP


