The ROOT Object I/O System


Why Not Use an OO Data Base Instead?

The ROOT system includes a set of classes to support Input/Output from/to machine independent files. The system is designed to be particularly efficient for objects frequently manipulated by physicists: histograms, ntuples, trees and events.

One could argue that this functionality may also be provided by Object Oriented Data Base Management Systems (OODBMS). We consider OODBMS's as potential candidates for the replacement of tools like HEPDB or FATMEN, i.e. when locking and concurrent writing is required. But we do not believe that they provide a solution for the types of objects mentioned above. Why?

If a connection to an OODBMS is required, we understand now how to make an automatic interface between the ROOT dictionary and the data base schema definition language (ddl). To use an OO data base, the user classes must inherit from a base class ooObj. This inheritance affects many places in the user's code (casting operations in iterators, if templates are not used). The ROOT base class TObject could be used instead. A special version of TObject could be provided to make an automatic and transparent interface to the ooObj class.

The Physical File Structure

The ROOT file structure is shown in the figure below. A file has a file header (less than 64 bytes) and then several logical records of variable length. The first 4 bytes of each physical record are an integer holding the number of bytes in this record. If the number of bytes is negative, it identifies a deleted record that can be reused in a subsequent write operation. The following bytes (documented on the picture) contain all the information to uniquely identify a data block on the file.

The TFile::Map member function can be used to view the contents of a file by reading sequentially all the data blocks. The information stored by the ROOT Output functions is always in machine independent format (ASCII, IEEE floating point, Big Endian byte ordering). The redundancy in the logical record header can be used in case of file corruption or disk errors, to rebuild the original structure. Data in the logical records can be compressed or uncompressed, but the logical record header is never compressed. The first logical record on a file always contain the description of the top level directory of the file.



The logical records contain the following possible data:

Note that the concept of record length or block size has completely disappeared from the ROOT terminology. This simplifies considerably the logic of the system. It makes also simpler the implementation of memory mapping techniques.

The Logical File Structure

A ROOT file is like a Unix file system. It can contain directories and objects with an unlimited number of levels. Each directory has an associated list of keys kept in memory until the file is closed. Finding an object on the file is done in two steps:

ROOT objects are always written in consecutive order on the file. By default, the directory description is written when the TDirectory constructor is invoked. However, if in a previous session, objects were deleted, the space released can be used by newly created objects. In this case, ROOT tries to find the best free block. The list of free blocks is a list supported by the TFile object. If a free block has a length that matches the length of the new stored object, the object is written in the free block and this free block is deleted from the list of free blocks. If not, the first free block bigger than the object is used. When a file is closed, the linked list of each directory is written to the file.

Thanks to this structure, a ROOT file can be read sequentially if all objects must be processed, or individual objects can be randomly accessed using the information in the TKey object. The TTree objects use a variant of the standard TKeys(TBasket). The TBasket keys are designed to address randomly a large quantity of objects (TBranch buffers) in very large files.



Support for Class/Schema Evolution

A ROOT file will in general be written with the same version of a class library. However, in the life time of a collaboration, the definition of many classes is likely to change frequently.

The problem is made even more complicated by the use of inheritance. Assume a class D with its base classes C, B and A. An object of class D must be identified by the four version numbers of its composing classes.

We have implemented in the ROOT files a stamping mechanism that guarantees that old files can always be read by new libraries. A likely case is the one of an analysis program looping on many data sets generated across the years with different class definitions.

The header files containing the class definition must include the macro:

   ClassDef(Classname,VersionID),    e.g.   ClassDef(TLine,1)

The ClassDef macro is defined in the Rtypes.h include file which is automatically referenced by any include file using TObject derived classes. The second parameter of ClassDef is an integer representing the class version number. When the file is processed by the CINT Dictionary Generator the version information is saved in the dictionary data structure and is later available at execution time in the Streamer() I/O method.

The Input/Output Functions

An object is written to a file via the TObject::Write() function. This operation consists of the following steps:

An object is read from the file into memory via the TKey::Read() function using the following sequence of operations:

The prototype for the Streamer() function is automatically declared by the ClassDef macro. An Example of code for the Streamer() function is shown below for the ROOT class TShape. This code illustrates how the Streamer() function deals with base classes, including multiple inheritance. The two data members fNumber and fVisibility are integers. fMaterial is a pointer to the material definition for this shape. In the likely case that many different shapes will reference the same material, only one copy of the referenced material will be written. The code for this function is automatically generated by the CINT Dictionary Generator.

//_______________________________________________________________________
void TShape::Streamer(TBuffer &b)
{
   // Stream a TShape object

   if (b.IsReading()) {
      Version_t v = b.ReadVersion();
      TNamed::Streamer(b);
      TAttLine::Streamer(b);
      TAttFill::Streamer(b);
      b >> fNumber;
      b >> fVisibility;
      b >> fMaterial;
   } else {
      b.WriteVersion(TShape::IsA());
      TNamed::Streamer(b);
      TAttLine::Streamer(b);
      TAttFill::Streamer(b);
      b << fNumber;
      b << fVisibility;
      b << fMaterial;
   }
}

Compression or No Compression?

By default, objects are compressed before being written to a file. The ROOT compression algorithm is based on derivatives of the well known gzip algorithm.

This algorithm supports up to 9 levels of compression. The default compression level is 1. This level is specified as a parameter in the TFile constructor or can be modified by the TFile::SetCompressionLevel() function. If the level is set to 0, no compression is done. The performance of the compression algorithm can be seen on an object by object basis by using the TFile::Map() function. Experience with this algorithm tends to indicate a compression factor of 1.3 for raw data files and around 2 on most DSTs files.

Our choice of default compression level 1 is a compromise between time and compression efficiency. To get an idea of the ROOT compression capabilities, you can try the

gzip -1 file

on any of your files.

The time to uncompress a buffer is small compared to the compression time and is independent of the selected compression level. A ROOT file may contain objects written with different compression levels.

The Compression Algorithm

The algorithm identifies new text as repetitions of old text within a fixed-length sliding window trailing behind the new text.

The deflation process depends on being able to identify portions of the input text which are identical to earlier input (within a sliding window trailing behind the input currently being processed).

The most straightforward technique turns out to be the fastest for most cases: try all possible matches and select the longest. The key feature of this algorithm is that insertions into the string dictionary are very simple and thus fast, and deletions are avoided completely. Insertions are performed at each input character, whereas string matches are performed only when the previous match ends. So it is preferable to spend more time in matches to allow very fast string insertions and avoid deletions. The matching algorithm for small strings is inspired by that of Rabin & Karp. A brute force approach is used to find longer strings when a small match has been found. A similar algorithm is used in comic (by Jan-Mark Wams) and freeze (by Leonid Broukhis). A description of the Rabin and Karp algorithm is given in the book

        "Algorithms" by R. Sedgewick, Addison-Wesley, p252.

see also

      Fiala,E.R., and Greene,D.H.
      Data Compression with Finite Windows, Comm.ACM, 32,4 (1989) 490-595

We would like to thank Evgueni Tcherniaev for providing a very recent version of the algorithm. When you are interested in more information you are invited to read the many comments and the copyright notice in the code (directory ZIP).

The ROOT Trees


For many years, the data flow model in HEP has been:

   Raw Data Tapes --> Data Summary Tapes --> Mini/Micro DST's

The introduction of Ntuples in the PAW framework has proven to be very successful. Many experiments are using Ntuples as a convenient replacement for mini-DSTs or even DSTs.

The PAW Ntuples, however, were restricted to very simple data objects, collection of single variables or arrays.

With ROOT, we are introducing a new concept that we call Trees. Trees provide the functionality of the Ntuples and much more. The Tree architecture extends the concept of the Ntuple to all complex objects or data structures found in Raw Data tapes and DST's. The idea is that the same data model, same language, same style of queries can be used to all data sets in one experiment. Trees are designed to support not only complex objects, but also a very large number of them in a large number of files. The storage model using standard ROOT files is shown below.

In a conventional DST, all data structures of one event were written in a contiguous area on the file. This model has been very successful and robust for sequential files and when the analysis program requires access to a large number of all attributes of one event. On the other hand, this model was particularly inefficient when one had to iterate on a subset of the events or when a small subset of the event attributes was used. A typical structure for one event in HEP experiments is shown below. The pseudo-log scale used is just to indicate the large differences in object sizes from the various detectors.



An example of a ROOT based event structure is described in NA49 classes. The logical structure of the classes is shown in this GIF file.

A Tree (class TTree) is made of branches. Each branch (class TBranch) is described by its leaves (class TLeaf). The leaves can be simple variables, structures, arrays or objects. An array may be of variable length, the length itself being a variable in the same branch or another branch. Branches will in general be objects. However, we thought important to also support variables and structures for these applications not yet converted to C++ and objects. A structure, for example, could be a simple C structure or a list of variables in a Fortran common block.

When the fruits of one branch (detector data) are ready to be picked, they are collected into baskets (class TBasket). When the Baskets become full, they go to the store, the file.

Each Branch will go to a different buffer (Basket). The TPC buffers will be written maybe after every event, whereas Trigger info buffers may be written only after every few hundred events and event flag buffers may be kept in memory. The different buffers can be organized to be written to the same file or to different files.

It is easy and efficient for an interactive analysis program to make selections on the event flag and then only read buffers of the selected events. This mechanism is also well suited for parallel architectures. Note that this scheme allows also insertion of a new Branch at any time in an existing file or set of files.

When queries are executed on one or more variables or objects, only the branch buffers containing these variables are read into memory.

Data are in general processed on different architectures with different memory sizes. In case the analysis is performed on a parallel architecture with a lot of memory, as many buffers as possible are kept in memory, maybe even all buffers.

The Tree data structure allows direct access to any event, direct access to any branch or any leaf even in the case of variable length structures.

A working example of a tree can be found in the description of the class TTree.

See also:



Rene Brun, Fons Rademakers
Last update 26/8/96 by FR