Hi,

> I'd like to point out a few things. First of all, there is a principal
> difference between the I/O methods used by "DB-like" and "non-DB-like"
> applications and one of the consequences of this difference is that the
> latter can achieve much higher I/O bandwidth per process.

In the good old days of DOS/CP/M I was involved in low-level database design.
From this experience I would say that the highest I/O performance can be
achieved if you

1. Write/read fixed-size binary records.
2. Do not provide insert functionality, but do provide replace functionality
   instead.
3. Delete records by setting a "deleted" flag and clean up (rewrite) the
   database during idle (night) time.
4. Do "format c:" before installing the database, so that the heads do not
   jump between cylinders on the HD.
5. Provide smart buffering/caching which adapts the (system) I/O buffer to
   the record size and to user queries.
6. Provide smart indexing (hashing for string fields).

This lets you read/write data at (close to) your system I/O speed, which can
be much higher than 25-30 MB/sec on modern SCSI devices.

> The I/O performance currently is one of the key issues for HEP experiments.
> For example, CDF already had to modify the ROOT object-oriented I/O
> mechanism when writing the TTrees out of the DAQ to be able to achieve a
> rate of about 25-30 MB/sec per process (the default I/O doesn't provide
> such a rate), and this is what defines the overall data logging rate for
> us now.

I think that ROOT Trees are much heavier than the scheme in points 1-6 above.
That is the reason for your modification.

> ROOT I/O allows objects to be written into a file and to be modified or
> deleted after they have been written. TTree is just one of many objects
> ROOT can write out.
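To make points 1-3 of the list above concrete (fixed-size binary records, replace instead of insert, flag-based delete with a later rewrite), here is a minimal Python sketch. The record layout (a flag byte, a 32-bit event id and a 64-bit energy), the class name `FixedRecordFile` and all its methods are hypothetical and chosen only for illustration; this is neither ROOT code nor the original DOS-era design, and it leaves out the buffering and indexing of points 5-6.

```python
import io
import struct

# Hypothetical layout: 1-byte "deleted" flag, 32-bit event id, 64-bit energy.
# "<" means little-endian with no padding, so every record is 1 + 4 + 8 = 13 bytes.
RECORD = struct.Struct("<BId")


class FixedRecordFile:
    """Append / replace / flag-delete store over a seekable binary file object."""

    def __init__(self, fileobj):
        self.f = fileobj

    def __len__(self):
        # Number of record slots, including those flagged as deleted.
        self.f.seek(0, io.SEEK_END)
        return self.f.tell() // RECORD.size

    def _seek(self, index):
        # Fixed size means the offset is one multiplication, no index lookup.
        if not 0 <= index < len(self):
            raise IndexError(index)
        self.f.seek(index * RECORD.size)

    def append(self, event_id, energy):
        self.f.seek(0, io.SEEK_END)
        self.f.write(RECORD.pack(0, event_id, energy))
        return len(self) - 1  # slot number of the new record

    def read(self, index):
        self._seek(index)
        deleted, event_id, energy = RECORD.unpack(self.f.read(RECORD.size))
        return None if deleted else (event_id, energy)

    def replace(self, index, event_id, energy):
        # Overwrite in place; nothing after this slot has to move.
        self._seek(index)
        self.f.write(RECORD.pack(0, event_id, energy))

    def delete(self, index):
        # Only the flag byte is rewritten; the slot stays where it is.
        self._seek(index)
        self.f.write(b"\x01")

    def compact(self):
        """Rewrite the file without flagged records (the idle-time cleanup)."""
        live = [r for r in (self.read(i) for i in range(len(self))) if r is not None]
        self.f.seek(0)
        self.f.truncate()  # truncates at the current position, i.e. to empty
        for event_id, energy in live:
            self.f.write(RECORD.pack(0, event_id, energy))
        return len(live)


# Usage with an in-memory buffer; a file opened in "w+b" or "r+b" mode works the same way.
store = FixedRecordFile(io.BytesIO())
first = store.append(1, 0.511)
second = store.append(2, 1.275)
store.replace(second, 2, 1.332)  # in-place update
store.delete(first)              # flag only, file size unchanged
store.compact()                  # now only the live record remains
```

Note that `compact()` renumbers the surviving slots, so any index kept outside the file would have to be rebuilt after the cleanup. That is one reason to run it only during idle time.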
> Let people correct me if I'm wrong, but as far as I can tell, TTree is a
> very specialized container, designed to optimize the I/O performance for
> the objects stored in it, and the assumption that the TTree object is not
> going to be modified, only appended to, seems to be quite important for
> this optimization. Therefore, I'd be extremely cautious about making any
> changes to the design of TTree which could have an impact on the
> performance.

The object serialization mechanism which we use in ROOT was initially
developed by Borland for TurboVision and TurboPascal 5.5-6.0 somewhere around
1985-90. I do not think anybody considers this mechanism suitable for real
database work. Unfortunately TTree is the only database-like container in
ROOT. ROOT doesn't have a hierarchy of data storage classes which provide
different functionality at different levels. For example, if I write only
fixed-size records with several binary fields, I do not need 80% of TTree's
functionality. Thus I would guess I can gain xx% in I/O performance by
providing a class for this specific case. At the same time I would like to
use TTree-like querying/drawing, so I would like the same (virtual) user
interface for all database classes.

> Definitely, having additional DB-oriented capabilities in ROOT would be
> nice. However, the question of whether these capabilities should be
> provided by modifying the TTree or by implementing a different kind of
> data container is an open one.

This is exactly what I asked. If nobody needs these features in TTree, I
would like to write my own storage class for my project. I have also noticed
that ROOT doesn't work well with small events of a few tens of bytes. Thus I
think it should be stated quite clearly in what field ROOT is supposed to be
used. Handling hundreds of 10-100 MB HEP events is quite different from
handling millions of 100-byte spectroscopy events.

> I'd also like to comment on another issue.
> I know that there are a lot of requests to the ROOT team coming from the
> HEP experiments whose implementation requires significant resources. For
> example, the PROOF server is a long-awaited project. The implementation of
> the specialized ROOT client-server utility to minimize the traffic over
> the net when running ROOT on a remote node gives another example. Full
> integration of the "TBuffer-exchange" (fast I/O) mode into ROOT is yet
> another one. CDF has requested this mode and is using it for the
> data-taking, and I believe that the next generation of experiments will
> depend on this mode even more strongly. Taking into consideration the
> actual resources of the ROOT team, I think that we need to have
> well-specified priorities.

I do not want to start any kind of flame war, but could you tell me why the
ROOT team consists of only two people if it serves experiments with annual
budgets in the billions? My long research experience tells me that scientific
organizations are managed very inefficiently. Is it a kind of game? Just for
fun, look at the "future plans" on the ROOT site (last updated in 95 or so)
and compare them with the present ROOT status. If ROOT took on one or two
more people in permanent positions, all these plans could come true.

Regards,
Anton
This archive was generated by hypermail 2b29 : Tue Jan 01 2002 - 17:50:37 MET