Re: [ROOT] TTree modification

From: Anton Fokin (anton.fokin@smartquant.com)
Date: Mon Feb 26 2001 - 11:28:18 MET


Hi Rene,

> We are well aware of all your points 1 to 6 and the associated theory.
> I respect your experience with DOS/CPM. I have also spent some time in
> the past, maybe too much, developing systems based on fixed-size binary
> records, and there are many reasons why we abandoned such systems.

This is clear, and yet it is not. Fixed-size records are a special case which
could be put in a separate class with the fastest possible I/O. Perhaps this
class could even be a base for a TTree with variable-size records. I understand
all the disadvantages of fixed size, but at the same time I would guess that
many users face a situation where they write small fixed-size events, for
example from a few detectors/telescopes. It would also be interesting to
estimate how much I/O performance you gain going from a TTree to simple
fixed-size I/O with no compression.

>  I/O is of crucial importance for our framework and I hope you understand
> that we are targeting primarily the most urgent requests from the
> experiments we are working with.

Yes, no problems or questions whatsoever with that. But perhaps you should
clarify what kind of experiments you are working with. As I mentioned before,
HEP and NP experiments can be quite different in philosophy and in
data-manipulation design. Perhaps TTrees are fine in HEP, but it might be
stated that they have limitations in the case of spectroscopy or other small
experiments. Then someone from the NP community could take special care and
develop a storage class for millions of small records (with fixed and
non-fixed sizes).

>  Root supports currently two kinds of storage techniques
>  - via MyClass::Write function if MyClass derives from TObject.
>    This is based on a simple serialisation algorithm. You can delete,
>    replace, overwrite objects by new versions. This technique is OK for a small
>    number of objects (say < one million). The main limitation is the size
>    of the table of the keys (on average 60 bytes per key) and not so much
>    the time to find an object (hashtable). Use this technique for objects
>    such as histograms, graphs, canvases, geometries, calibrations.
>    Note that the serialisation algorithm in version 3.00 has nothing to do,
>    like you claim, with the Borland algorithms.

I am afraid that in many cases your data warehouse will end up housing mostly
the hash table and the keys :) I did not claim that ROOT serialization is
taken from Borland; I simply said that, as far as I remember, a general
object-serialization mechanism (even for user-derived classes) was first
introduced by Borland. At the time I was very impressed by this technology.

> Some clarification on a "replace mode" for TTrees.
> A replace option could be considered only in the special case of branches
> containing only basic types and with no compression. Replacing variable-length
> structures or compressed branches will completely destroy the simple and
> efficient addressing scheme used internally.
> Replacing a complete branch could also be considered in all modes.

If ROOT had several inherited data-storage classes, it would be quite easy to
do such things. Everybody understands that one has to pay for variable-size
records and for compression, which again turns fixed-size records into
variable-size ones.

Compression itself is a questionable topic for me. If I needed compression, I
would prefer to install something like SmartDrive, or to spend the money on
smart hardware compression in the drive. Also, it is nice to have compression
for objects with holes, but for data warehousing a good design usually creates
a nearly random binary data flow. But, well, compression is fine and can
impress students :)

> You are also raising some non technical points. Concerning the support
> for Root and the size of the support team: The CERN Computing Review
> has recently recommended in its final report that the system must now be
> officially supported and additional manpower provided.

Fine, congratulations! When you have an opening, send me a message - I will
apply :)

> Talking about Web pages, I noticed that your web page
> http://www.smartquant.com is nearly unreadable.

Well, this is a perfect example of another open project, namely Netscape 4.0
for Linux. www.smartquant.com looks fine under MSIE or Netscape 6.0 on NT. I
am aware of smartquant's problems under Linux, but I do not have enough time
to duplicate all the pages so that on Linux they use only one font.

By the way, it was my girlfriend who designed www.smartquant.com on Windows,
and she would be quite happy if someone could give her suggestions/hints on
how to make Windows web pages look good in Netscape 4.0 on Linux.

Regards,
Anton

>
> Rene Brun
>
> Anton Fokin wrote:
> >
> > Hi,
> >
> > > I'd like to point out a few things. First of all, there is a principal
> > > difference between the I/O methods used by "DB-like" and "non-DB-like"
> > > applications, and one of the consequences of this difference is that the
> > > latter can achieve much higher I/O bandwidth per process.
> >
> > In the good old days of DOS/CPM I was involved in low-level database
> > design. From that experience I would say that the highest I/O performance
> > can be achieved if you:
> >
> > 1. Write/read fixed-size binary records.
> > 2. Do not provide insert functionality, but do provide replace
> >    functionality instead.
> > 3. Delete records by setting a "deleted" flag and clean up (rewrite) the
> >    database during idle (night) time.
> > 4. Do format c: before installing the database, so as not to jump between
> >    cylinders on the HD.
> > 5. Provide smart buffering/caching which adapts the (system) I/O buffer to
> >    the record size and user queries.
> > 6. Provide smart indexing (hashing for string fields).
> >
> > This lets you read/write data at (nearly) your system's I/O speed, which
> > can be much higher than 25-30 MB/sec on modern SCSI devices.
> >
> > > The I/O performance currently is one of the key issues for HEP
> > > experiments. For example, CDF already had to modify the ROOT
> > > object-oriented I/O mechanism when writing the TTrees out of the DAQ to
> > > be able to achieve a rate of about 25-30 MB/sec per process (the default
> > > I/O doesn't provide such a rate), and this is what defines the overall
> > > data-logging rate for us now.
> >
> > I think that ROOT Trees are much heavier than points 1-6 described above.
> > That is the reason for your modification.
> >
> > > ROOT I/O allows one to write objects into a file and to modify/delete
> > > them after they have been written. TTree is just one of many objects
> > > ROOT can write out. Let people correct me if I'm wrong, but as far as I
> > > can tell, TTree is a very specialized container, designed to optimize
> > > the I/O performance for the objects stored in it, and the assumption
> > > that the TTree object is not going to be modified, only appended, seems
> > > to be quite important for this optimization. Therefore, I'd be extremely
> > > cautious about making any changes to the design of TTree which could
> > > have an impact on the performance.
> >
> > The object-serialization mechanism which we use in ROOT was initially
> > developed by Borland for Turbo Vision and Turbo Pascal 5.5-6.0, somewhere
> > around 1985-90. I do not think anybody considers this mechanism for real
> > databasing.
> >
> > Unfortunately, TTree is the only database-like container in ROOT. ROOT
> > doesn't have a hierarchy of data-storage classes which provide different
> > functionality on different levels. For example, if I write only fixed-size
> > records with several binary fields, I do not need 80% of TTree's
> > functionality. Thus I would guess I could gain xx% in I/O performance by
> > providing a class for this specific case. At the same time I would like
> > TTree-like query/drawing, i.e. the same (virtual) user interface for all
> > databasing classes.
> >
> > > Definitely, having additional DB-oriented capabilities in ROOT would be
> > > nice. However, a question of whether these capabilities should be
> > > provided by modifying the TTree or by implementing a different kind of
> > > data container is an open one.
> >
> > This is exactly what I have asked. If nobody needs these features in
> > TTree, I would like to write my own storage class for my project. I have
> > also noticed that ROOT doesn't work well with small events of a few tens
> > of bytes. Thus I think it should be stated quite clearly in what field
> > ROOT is supposed to be used. Operating with hundreds of 10-100 MB HEP
> > events is quite different from operating with millions of 100-byte
> > spectroscopy events.
> >
> > > I'd also like to comment on another issue. I know that there are a lot
> > > of requests to the ROOT team coming from the HEP experiments whose
> > > implementation requires significant resources. For example, the PROOF
> > > server is a long-awaited project. The implementation of the specialized
> > > ROOT client-server utility to minimize the traffic over the net when
> > > running ROOT on a remote node gives another example. Full integration of
> > > the "TBuffer-exchange" (fast I/O) mode into ROOT is yet another one. CDF
> > > has requested this mode and is using it for the data-taking, and I
> > > believe that the next generation of experiments will depend on this mode
> > > even more strongly. Taking into consideration the actual resources of
> > > the ROOT team, I think that we need to have well-specified priorities.
> >
> > I do not want to start any kind of flame war, but could you tell me why
> > the ROOT team consists of only two people if it serves experiments with
> > annual budgets in the billions? My long research experience tells me that
> > scientific organizations have very inefficient management. Is it a kind of
> > game? Just for fun, look at the "future plans" on the ROOT site (last
> > updated in '95 or so) and compare them with the present ROOT status. If
> > ROOT took on one or two more people with permanent positions, all these
> > plans could become true.
> >
> > Regards,
> > Anton
>
>



This archive was generated by hypermail 2b29 : Tue Jan 01 2002 - 17:50:37 MET