Re: [ROOT] Comparison of speed of accessing data

From: HP Wei (
Date: Fri Dec 13 2002 - 04:28:02 MET

I'd like to thank the many people who replied to my posting
and shared insights about ROOT and some ideas.
As a result, in the original test I am able to
'beat' our internal benchmark (although not by a great margin).
However, that test accessed the data in
a ROOT tree SEQUENTIALLY.

Yesterday, a new test suggested that a ROOT tree may be
intrinsically slow for 'RANDOM access' to the data in it.
I describe the situation below in the hope that
someone can point out anything that I missed.


First, the objective is to access the stock price data
for all 10000 stocks at one particular instant (say, 9:40),
i.e. we want the data in an array P[10000].

Intuitively, one would think that the natural way of storing
the data is to have a tree like this:
t0       P[10000]
t1       P[10000]
t2       P[10000]

However, because each stock's price is very different from the others,
this arrangement (even with compression on)
blows the file size up to an unacceptable level.

So, to keep the file size small, the best way is to store the data like this:
stock1  p[100]         ---> 100 is the number of time instances in a day
stock2  p[100]
stock3  p[100]

So, to meet the objective, we need to read through all 10000 stocks
for a day and perform a 'transpose-like' operation.
Doing this SEQUENTIALLY for all 10000 stocks, all instances in a day,
and all days in a tree turned out to be a little faster than our current
software system.
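A minimal sketch of the 'transpose-like' step for one day, written in plain C++ over in-memory vectors rather than with ROOT's TTree API (the function name transposeDay and the vector-of-vectors layout are hypothetical). Each stock entry is read once, in order, and its prices are scattered into per-instant rows, so row t ends up holding the array P[10000] for instant t.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical in-memory model of one day's data (not ROOT's API).
// byStock[s][t]: stock-major layout, one entry per stock with nTimes prices.
// Returns byTime[t][s]: time-major layout, one row P[nStocks] per instant.
std::vector<std::vector<double>> transposeDay(
    const std::vector<std::vector<double>>& byStock)
{
    const std::size_t nStocks = byStock.size();
    const std::size_t nTimes = nStocks ? byStock[0].size() : 0;
    std::vector<std::vector<double>> byTime(nTimes, std::vector<double>(nStocks));

    // Sequential pass: each stock entry is visited exactly once.
    for (std::size_t s = 0; s < nStocks; ++s) {
        // Every stock is assumed to have the same number of time instances.
        assert(byStock[s].size() == nTimes);
        for (std::size_t t = 0; t < nTimes; ++t)
            byTime[t][s] = byStock[s][t];
    }
    return byTime;
}
```

Because the outer loop walks the stocks in storage order, this matches the sequential read pattern that performed well; the cost only becomes a problem when the stocks of interest are not read in one pass, as described next.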

BUT, if we want to access only, say, 500 stocks (e.g. those in the SP500),
we have a problem. A day's data for those 500 stocks
is (in general) not in a
contiguous block in the branch but is dispersed 'randomly'.
(e.g. if the first stock is at entry 2, the second stock may be
      at entry 150.)

If the buffer size used when the tree was created is 32000,
we waste a lot of time on GetEvent(2),
because we need only 800 bytes of data, but ROOT reads
32000 bytes of data into the buffer.
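The figures in the paragraph above fit together as follows, assuming each price is stored as an 8-byte double (which is what 800 bytes for 100 time instances implies) and counting uncompressed sizes. With compression on, fewer bytes come off the disk, but the whole buffer still has to be read and decompressed to get one entry.

```latex
\begin{aligned}
\text{bytes needed per stock-day} &= 100 \times 8 = 800,\\
\text{entries per 32000-byte buffer} &= \lfloor 32000 / 800 \rfloor = 40,\\
\text{read amplification for one isolated entry} &= 32000 / 800 = 40.
\end{aligned}
```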

Setting the buffer size to 1000
seems to be a reasonable way out, but then the
file-size problem kicks in.
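The trade-off between the two buffer sizes can be made concrete with a toy cost model. This is a sketch under simplifying assumptions, not a description of ROOT internals: entries are fixed-size and uncompressed, a buffer holds as many whole entries as fit, and touching any entry reads its whole buffer exactly once. The names ReadCost and estimateReadCost are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <vector>

// Result of the toy model: how many buffers and bytes a set of reads costs.
struct ReadCost {
    std::size_t buffersRead;  // distinct buffers touched
    std::size_t bytesRead;    // buffersRead * bufferBytes
    std::size_t bytesNeeded;  // entries requested * entryBytes
};

// entries: distinct entry numbers to read (e.g. the rows of the 500 stocks).
// Assumes fixed-size, uncompressed entries; ignores compression and caching
// beyond "each buffer is read at most once".
ReadCost estimateReadCost(const std::vector<std::size_t>& entries,
                          std::size_t bufferBytes, std::size_t entryBytes)
{
    assert(entryBytes > 0 && bufferBytes >= entryBytes);
    const std::size_t entriesPerBuffer = bufferBytes / entryBytes;

    std::set<std::size_t> buffers;  // which buffer each entry lives in
    for (std::size_t e : entries)
        buffers.insert(e / entriesPerBuffer);

    return {buffers.size(), buffers.size() * bufferBytes,
            entries.size() * entryBytes};
}
```

For the example entries 2 and 150 with 800-byte entries, a 32000-byte buffer gives 40 entries per buffer, so the two entries fall in buffers 0 and 3 and 64000 bytes are read to obtain 1600. With a 1000-byte buffer the same request reads 2000 bytes. If the 500 stocks were spread evenly over the 10000 entries (one every 20), every one of the 250 buffers of 40 entries would be touched, i.e. the whole day would be read. The model deliberately says nothing about file size, which is the other side of the trade-off.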

   Is there a provision, when accessing a tree branch, that allows
us to control how many bytes ROOT reads from the disk
into the buffer?

   Any other suggestions are appreciated.

This archive was generated by hypermail 2b29 : Sat Jan 04 2003 - 23:51:23 MET