Re: 2 questions on ROOT data storage mechanism

From: Rene Brun (Rene.Brun@cern.ch)
Date: Fri Nov 07 1997 - 15:26:09 MET


Pasha Murat wrote:

>         Hello,
>
> I have a couple of questions on ROOT data storage mechanism, which could be
> interesting for many people especially from large collaborations.
>
> - is it possible to read an event written by somebody else in the collaboration
>   back without knowing *everything* about the event format?
>
>   In "ZEBRA world" if I have an event (a set of ZEBRA banks) written out,
>   I can always read it back as a set of ZEBRA banks, and then use only those
>   banks I need, without knowing the exact format of the remaining banks.
>

When you use the serialisation technique (no split), you MUST have the code that
created the object in order to read it.
If you use the split mode (one more advantage), it is possible to read
a Tree without having access to the original source code. I think
this feature is EXTREMELY important. You may want to look at
some data taken 5 years ago when you no longer have access to the code
that produced the data.
For example, if you look in test/Event, you can generate the file
in split mode (say 1000 events) by invoking the program Event
   Event 1000 0 1 1
and you get the output database (file) Event.root.
In an interactive Root session, you can then do:
  Root > TFile f("Event.root")
  Root > T.Draw("fPx")    (use TBrowser to browse all leaves of the tree)
  Root > T.MakeCode("T.C")

T.C is a prototype of an analysis function where the class data members
have been converted to simple data types or arrays of basic types.

> - I heard that people have encountered certain difficulties with using
>   versioning mechanism provided by Objectivity. So let me ask (I apologize
>   for not trying this feature in advance) how the versioning mechanism
>   works in ROOT?  Suppose I've written several data files out using
>
> ClassDef(MyClass,1)
>
>   for MyClass. Then I changed the definition of the class (say, added one
>   more data member to it) and now am using
>
> ClassDef(MyClass,2)
>
>   and trying to read the data written in "old" format back. How does ROOT
>   know about the differences between version 1 and version 2 of MyClass



  ROOT supports class schema evolution. This is a fundamental feature.
It is likely that you will write data sets with many changes in your
classes during the lifetime of your experiment.
With Root, we do not want to FORCE users to convert their database
when a class changes. This may be OK for trivial databases, but not
for Terabytes of data.
The problem must be described in a more general way than you seem
to imply in your mail.
Suppose a class ClassA that derives from ClassB and ClassC:
   class ClassA : public ClassB {
      int   a1;
      int   a2;
   };

   class ClassB : public ClassC {
      float b1;
      float b2;
   };

   class ClassC : public TObject {
      int   c1;
   };

You must consider all possible scenarios where any of the classes
ClassA, ClassB, ClassC and TObject could change with time.
So you must have a mechanism to identify each level of inheritance.
In Root, I/O goes through the function Streamer, which is automatically
generated by rootcint. For the case above, the generated code is:

//______________________________________________________________________________
void ClassA::Streamer(TBuffer &R__b)
{
   // Stream an object of class ClassA.

   if (R__b.IsReading()) {
      Version_t R__v = R__b.ReadVersion();
      ClassB::Streamer(R__b);
      R__b >> a1;
      R__b >> a2;
   } else {
      R__b.WriteVersion(ClassA::IsA());
      ClassB::Streamer(R__b);
      R__b << a1;
      R__b << a2;
   }
}

//______________________________________________________________________________
void ClassB::Streamer(TBuffer &R__b)
{
   // Stream an object of class ClassB.

   if (R__b.IsReading()) {
      Version_t R__v = R__b.ReadVersion();
      ClassC::Streamer(R__b);
      R__b >> b1;
      R__b >> b2;
   } else {
      R__b.WriteVersion(ClassB::IsA());
      ClassC::Streamer(R__b);
      R__b << b1;
      R__b << b2;
   }
}

//______________________________________________________________________________
void ClassC::Streamer(TBuffer &R__b)
{
   // Stream an object of class ClassC.

   if (R__b.IsReading()) {
      Version_t R__v = R__b.ReadVersion();
      TObject::Streamer(R__b);
      R__b >> c1;
   } else {
      R__b.WriteVersion(ClassC::IsA());
      TObject::Streamer(R__b);
      R__b << c1;
   }
}

In the automatically generated code, a statement like
      R__b.WriteVersion(ClassC::IsA());
writes the ClassC version number (the one given in ClassDef)
to the output buffer as a 2-byte integer.
In read mode, the statement
      Version_t R__v = R__b.ReadVersion();
returns in R__v the version number the class had when the object
was written.
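
To make the 2-byte version tag concrete, here is a minimal standalone
sketch. ToyBuffer and VersionOverhead are hypothetical names invented for
this illustration. They are NOT the Root TBuffer class, and the byte order
chosen here is an assumption. The sketch only shows a version number being
stored as two bytes and recovered on reading.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for TBuffer: a byte array plus a read cursor.
struct ToyBuffer {
    std::vector<std::uint8_t> bytes;
    std::size_t pos = 0;

    // Store the class version as a 2-byte integer, high byte first (assumed order).
    void WriteVersion(std::int16_t v) {
        const std::uint16_t u = static_cast<std::uint16_t>(v);
        bytes.push_back(static_cast<std::uint8_t>(u >> 8));
        bytes.push_back(static_cast<std::uint8_t>(u & 0xFF));
    }

    // Return the version number that was stored when the object was written.
    std::int16_t ReadVersion() {
        assert(pos + 2 <= bytes.size());
        const std::uint16_t u =
            static_cast<std::uint16_t>((bytes[pos] << 8) | bytes[pos + 1]);
        pos += 2;
        return static_cast<std::int16_t>(u);
    }
};

// Each Streamer in an inheritance chain writes its own version tag,
// so the bytes spent on tags grow with the number of levels.
std::size_t VersionOverhead(int levels) {
    ToyBuffer b;
    for (int i = 0; i < levels; ++i) b.WriteVersion(1);
    return b.bytes.size();
}
```

Writing version 2 into an empty ToyBuffer appends two bytes, and a
following ReadVersion() hands back 2, which is the value the reading code
tests against.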
Coming back to your question, suppose that in version 2 of ClassA I add
a data member a3, and that in ClassB I delete the data member b2.
I can modify the corresponding Streamer functions in the following way.
Here I assume that I am running with the latest version of the classes.
//______________________________________________________________________________
void ClassA::Streamer(TBuffer &R__b)
{
   // Stream an object of class ClassA.

   if (R__b.IsReading()) {
      Version_t R__v = R__b.ReadVersion();
      ClassB::Streamer(R__b);
      R__b >> a1;
      R__b >> a2;
      if (R__v > 1)  R__b >> a3;
      else  a3 = 0;    // or a default value that makes sense
   } else {
      R__b.WriteVersion(ClassA::IsA());
      ClassB::Streamer(R__b);
      R__b << a1;
      R__b << a2;
      R__b << a3;
   }
}

//______________________________________________________________________________
void ClassB::Streamer(TBuffer &R__b)
{
   // Stream an object of class ClassB.

   if (R__b.IsReading()) {
      Version_t R__v = R__b.ReadVersion();
      ClassC::Streamer(R__b);
      R__b >> b1;
      float oldb2;
      if (R__v == 1)  R__b >> oldb2;
   } else {
      R__b.WriteVersion(ClassB::IsA());
      ClassC::Streamer(R__b);
      R__b << b1;
   }
}

This example illustrates that support for class schema evolution
implies a small overhead of 2 bytes per level of inheritance
in the output file. This overhead is in general substantially
reduced by the compression algorithm.
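
The version checks in the modified Streamer functions can be exercised
without Root in a small standalone sketch. Everything here (ToyBuffer,
ToyA, ToyB, WriteAsVersion1) is a hypothetical name made up for the
illustration, the ClassC and TObject levels are left out for brevity, and
the byte layout is an assumption rather than the real Root format. The
sketch writes data the way version 1 of the classes would have, then reads
it with the current code.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical stand-in for TBuffer (not the Root class).
class ToyBuffer {
public:
    template <typename T> void Put(const T &v) {
        const std::uint8_t *p = reinterpret_cast<const std::uint8_t *>(&v);
        fBytes.insert(fBytes.end(), p, p + sizeof(T));
    }
    template <typename T> T Get() {
        assert(fPos + sizeof(T) <= fBytes.size());
        T v;
        std::memcpy(&v, fBytes.data() + fPos, sizeof(T));
        fPos += sizeof(T);
        return v;
    }
    bool AtEnd() const { return fPos == fBytes.size(); }
private:
    std::vector<std::uint8_t> fBytes;
    std::size_t fPos = 0;
};

// Current layout of the base class: b2 has been removed.
struct ToyB {
    float b1 = 0;
    void Read(ToyBuffer &buf) {
        const std::int16_t v = buf.Get<std::int16_t>();
        b1 = buf.Get<float>();
        if (v == 1) buf.Get<float>();   // skip the b2 stored by version 1
    }
};

// Current layout of the derived class: a3 has been added.
struct ToyA : ToyB {
    int a1 = 0, a2 = 0, a3 = -1;
    void Read(ToyBuffer &buf) {
        const std::int16_t v = buf.Get<std::int16_t>();
        ToyB::Read(buf);                    // base class first, as in Streamer
        a1 = buf.Get<int>();
        a2 = buf.Get<int>();
        a3 = (v > 1) ? buf.Get<int>() : 0;  // default for old data
    }
};

// Emulate what version 1 of both classes would have written.
ToyBuffer WriteAsVersion1(float b1, float b2, int a1, int a2) {
    ToyBuffer buf;
    buf.Put<std::int16_t>(1);  // derived class version
    buf.Put<std::int16_t>(1);  // base class version
    buf.Put(b1);
    buf.Put(b2);
    buf.Put(a1);
    buf.Put(a2);
    return buf;
}
```

Reading a buffer produced by WriteAsVersion1 into a ToyA leaves a3 at its
default of 0, discards the old b2, and consumes the buffer exactly to its
end, so the old data and the new class stay in step.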

In split mode, the situation is a bit more complex.
You cannot have, in the SAME Tree in the SAME file, objects
generated by two different class versions.
I think that this is an academic case for which one could
imagine providing some support in the future.

This is another reason why we provide TTree::MakeCode.
In case you have lost the original class definition,
you can always convert your Tree back to simple types or arrays.

Rene Brun


