I'd like to thank the many people who replied to my posting and gave me some insight into ROOT, along with some ideas. The result is that in the original test I am now able to 'beat' our internal benchmark (although not by a great margin). However, that test accesses the data in a ROOT tree SEQUENTIALLY. Yesterday, a new test revealed that a ROOT tree is perhaps intrinsically slow for 'RANDOM access' to its data. I will describe the situation below so that maybe someone can point out anything that I missed.

------------------------------------------------------------

First, the objective is to access the stock price data for all 10000 stocks at one particular instant (say, 9:40), i.e. we want to get the data in an array P[10000]. Intuitively, one would think that the natural way of storing the data is to have a tree like this:

price_branch
  t0  P[10000]
  t1  P[10000]
  t2  P[10000]
  ...

However, because prices differ so much from one stock to the next, this arrangement (with compression on) will blow up the file size to an unacceptable level. So, to make the file smaller, the best way is to store the data like this:

price_branch
  stock1  p[100]   ---> 100 is the number of time instants in a day
  stock2  p[100]
  stock3  p[100]
  ....

So, to meet the objective, we need to read through all 10000 stocks in a day and do a 'transpose-like' operation. Doing this SEQUENTIALLY for 10000 stocks, for all instants in a day, and for all days in a tree is found to be a little faster than our current software system.

BUT, if we want to access only, say, 500 stocks (for example, the stocks in the SP500), we then have a problem. The data in a day for those 500 stocks are (in general) not in one contiguous block in the branch but are dispersed 'randomly' (e.g. if the first stock is at entry 2, the second stock may be at entry 150). If the buffer size when the tree was created is 32000, we waste a lot of time when doing GetEvent(2).
This is because we need only 800 bytes of data, but ROOT will read 32000 bytes of data into the buffer. Setting buffer_size to 1000 seems to be a reasonable way out, but then the file-size problem kicks in.

---------------------------------------------

Questions:

Is there a provision, when accessing a tree branch, that lets us control how many bytes ROOT reads from the disk into the buffer?

Any other suggestions are appreciated.

--HP
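
To put rough numbers on the access pattern described above, here is a minimal back-of-the-envelope sketch in plain Python. It is not ROOT code and does not model ROOT's actual I/O. It assumes (my assumptions, not measurements) that entries of 800 bytes are packed back to back into fixed-size buffers, that a whole buffer is read whenever any entry in it is requested, and that compression and caching can be ignored. The helper names entries_per_buffer, buffers_touched and read_amplification are made up for this illustration.

```python
import random

ENTRY_BYTES = 800   # bytes per stock per day, the figure quoted in the post
N_STOCKS = 10000    # entries (stocks) in one day's worth of the branch


def entries_per_buffer(buffer_bytes, entry_bytes=ENTRY_BYTES):
    """How many whole entries fit in one buffer (at least one)."""
    return max(1, buffer_bytes // entry_bytes)


def buffers_touched(entries, buffer_bytes, entry_bytes=ENTRY_BYTES):
    """Indices of the buffers that hold the requested entries.

    Assumption: entry i lives in buffer i // entries_per_buffer.
    """
    per_buffer = entries_per_buffer(buffer_bytes, entry_bytes)
    return {e // per_buffer for e in entries}


def read_amplification(entries, buffer_bytes, entry_bytes=ENTRY_BYTES):
    """Bytes read from disk divided by bytes actually wanted."""
    wanted = set(entries)
    per_buffer = entries_per_buffer(buffer_bytes, entry_bytes)
    n_buffers = len(buffers_touched(wanted, buffer_bytes, entry_bytes))
    bytes_read = n_buffers * per_buffer * entry_bytes
    return bytes_read / (len(wanted) * entry_bytes)


if __name__ == "__main__":
    # Hypothetical subset: 500 entries scattered at random among 10000.
    rng = random.Random(0)
    subset = rng.sample(range(N_STOCKS), 500)
    for buffer_bytes in (32000, 1000):
        touched = len(buffers_touched(subset, buffer_bytes))
        total = -(-N_STOCKS // entries_per_buffer(buffer_bytes))  # ceiling
        amp = read_amplification(subset, buffer_bytes)
        print(f"buffer={buffer_bytes}: {touched}/{total} buffers read, "
              f"amplification {amp:.1f}x")
```

Under these assumptions a lone GetEvent(2) with a 32000-byte buffer reads 40 times the bytes it needs, and 500 scattered stocks end up touching most of the 250 buffers in a day, so the 'random' read is not much cheaper than the sequential one. With a 1000-byte buffer the ratio drops to 1, which is the trade-off against file size mentioned above.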