The table below (also available in PostScript format) shows the result of a ROOT benchmark run on a Pentium Pro 200 MHz (HP Vectra XU) running RedHat Linux v4.0 (kernel 2.0.27). The machine has 256 Mbytes of RAM and I/O is on the local disk (Seagate Barracuda 9GB Ultra SCSI).
The standard test program Event was used to generate the data file in various configurations, with 1000 events for each configuration. The data file is organized in TTree format. The program was invoked with:

Event 1000 comp split

where comp selects the compression option (0 = no compression, 1 = compression on) and split selects the split mode (split=0 writes each complete event into a single buffer; split=1 writes one branch per event data member).
The Total Time column gives the real time in seconds to run the program. The Effective Time column is the real time minus the time spent in non-I/O operations (essentially the random number generator).
The program Event generates on average 600 tracks per event. Each track has 17 data members.
The read benchmark runs under the interactive version of ROOT. The Total time to read All is the real time reported for the execution of the macro eventa via the ROOT interpreter. We did not correct this time for the overhead of the interpreter itself.
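For reference, the full-read loop in eventa amounts to the following macro sketch. It requires a ROOT installation to run; the file name "Event.root", tree name "T", branch name "event", and the Event class are assumptions based on the standard Event example, not taken from the text above.

```cpp
// Sketch of an eventa-style full read: every entry is read completely.
// Names ("Event.root", "T", "event") are assumptions.
void eventa()
{
   TFile f("Event.root");
   TTree *T = (TTree*)f.Get("T");
   Event *event = 0;
   T->SetBranchAddress("event", &event);
   Int_t nentries = (Int_t)T->GetEntries();
   for (Int_t i = 0; i < nentries; i++) {
      T->GetEntry(i);   // reads all branches of entry i into memory
   }
}
```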
The Total time to read Sample corresponds to the execution of the macro eventb. This macro loops over all events. For each event, the branch containing the number of tracks is read; if the number of tracks is less than 585, the full event is read into memory. This selective read is obviously only possible in split mode: in non-split mode the full event must always be read into memory.
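The selective read in eventb can be sketched as follows. As above, this requires ROOT; the branch name "fNtrack" and the accessor GetNtrack() are assumptions based on the standard Event example.

```cpp
// Sketch of an eventb-style selective read: fetch only the track-count
// branch, then read the full event when it has fewer than 585 tracks.
// Names ("Event.root", "T", "event", "fNtrack") are assumptions.
void eventb()
{
   TFile f("Event.root");
   TTree *T = (TTree*)f.Get("T");
   Event *event = 0;
   T->SetBranchAddress("event", &event);
   TBranch *bntrack = T->GetBranch("fNtrack");
   Int_t nentries = (Int_t)T->GetEntries();
   for (Int_t i = 0; i < nentries; i++) {
      bntrack->GetEntry(i);            // read only the track count
      if (event->GetNtrack() < 585)
         T->GetEntry(i);               // read the full event
   }
}
```

Because only one small branch is touched for most entries, the amount of data read (and decompressed) is far smaller than in the full-read case; this is the advantage of split mode measured in the Sample column.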
This benchmark illustrates the pros and cons of the compression option. Compression pays off when the time spent in I/O is small compared to the total processing time, or when the compression factor is high.
Note that, so far, we have made no special effort to tune the ROOT I/O performance. We expect to improve this area in the coming releases.
The times reported in the table correspond to the complete I/O operations necessary to deal with machine-independent binary files. On Linux, this also includes byte-swapping operations. The ROOT file allows direct access to any event in the file and, when split=1, direct access to any part of an event.
Note also that the uncompressed file generated with split=0 is 48.7 Mbytes, while the split=1 file is only 47.17 Mbytes. The difference in size is due to the overhead of the object identification mechanism when each event is written into a single buffer. This overhead does not exist in split mode because the branch buffers are optimized for homogeneous data types.
You can run the test programs yourself on your own architecture. The program Event reports the write performance. You can measure the read performance by executing the macros eventa and eventb. The performance depends not only on the processor type, but also on the disk devices (local, NFS, AFS, etc.).