>Could you tell me the total time to process only
> for each stock
> **  tree = (TTree*) root_file.Get(stock),
>     activate the Date branch, /**** find the indexes for the desired date
>>    set up containers for the requested data fields, (using STL's vector)
>>    activate branches (date, time, price, size),
>>    for all the targeted indexes
>>       tree->GetEntry(i),
>>       put the data into the container. ****/
>     root_file.Delete(stock)
--------------------------------------------------------------------

First, the data month is 200101. The data are sorted by (Date, time).

Now, more timing info.

measured_time = (tms.utime + tms.stime), in seconds.
(aDate = the date for which we are extracting data)

"Block" refers to the steps between /**** and ****/ in the above pseudo-code.

our_database_package refers to the custom software that is currently used in our company. Its database is compressed extensively. It also stores some index info for fast access. Even with the index info stored in the file, the file size is still about 30%-50% smaller than the root_file counterpart (one tree per stock) for a given month. The timing in the our_database_package column below is for the whole task described in the above pseudo-code.

[Note: the date 20010131 is chosen because the real data for this date are located near the last rows of a tree, while those for 20010102 are at the very beginning of a tree.]

Table of measured_time (in seconds) for various situations:

---------------------------------------------------------
aDate       w/Block    wo/Block    our_database_package
---------------------------------------------------------
20010131    75.93      57.37       21.58
20010102    61.65      57.35       17.57

With this timing data, it appears that the root_file.Get(stock) operation is a very expensive one.

In the pseudo-code, the step "find the indexes for the desired date" is a simple linear search. For the case of 20010131, it takes less than 18 sec (compared with 57.37 for Get() etc).
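Since the data are sorted by (Date, time), the index lookup does not have to be linear. The following is only a sketch in plain C++, not ROOT API and not the code that was timed above. It assumes the Date column has already been read into a std::vector<int> of YYYYMMDD values; the names cpu_seconds and date_range are made up for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>
#include <sys/times.h>   // POSIX times()
#include <unistd.h>      // sysconf()

// CPU time (user + system) consumed by this process so far, in seconds.
// This mirrors measured_time = utime + stime from the post.
double cpu_seconds() {
    struct tms t;
    times(&t);
    return static_cast<double>(t.tms_utime + t.tms_stime) /
           static_cast<double>(sysconf(_SC_CLK_TCK));
}

// Hypothetical helper: half-open index range [first, last) of the entries
// whose date equals aDate. Assumes dates is sorted in ascending order.
std::pair<std::size_t, std::size_t>
date_range(const std::vector<int>& dates, int aDate) {
    // Precondition check only; it is O(n) and compiled out with -DNDEBUG.
    assert(std::is_sorted(dates.begin(), dates.end()));
    // Binary search for the first and one-past-last matching entries.
    auto r = std::equal_range(dates.begin(), dates.end(), aDate);
    return { static_cast<std::size_t>(r.first - dates.begin()),
             static_cast<std::size_t>(r.second - dates.begin()) };
}
```

The returned range could then drive the tree->GetEntry(i) loop from the pseudo-code, and calling cpu_seconds() before and after a step gives the elapsed CPU time for that step. Whether this helps depends on how much of the Block time is really spent in the search rather than in reading the Date branch itself.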
------------------------------------------------------------
Note: this exercise is not meant to show off (or advertise) our custom_database package. We are investigating whether we can utilize or integrate ROOT. The existing system is thus the obvious benchmark target.

I am wondering if there are some 'tricks' (hidden functionality) in ROOT to improve data storage and access performance. Valeri mentioned the TTable class. Maybe that is another thing to try. Does anyone know the fundamental difference between TTable and TTree in terms of how they organize the data on disk?

--HP
This archive was generated by hypermail 2b29 : Sat Jan 04 2003 - 23:51:20 MET