Re: again ROOT db (long) (fwd)

From: Christoph Borgmeier (borg@mail.desy.de)
Date: Thu Apr 02 1998 - 13:16:24 MEST


Sorry if this appears twice somewhere. It is a reply to a mail from Pasha
Murat.

---------- Forwarded message ----------
Date: Wed, 1 Apr 1998 19:37:17 +0200
From: Christoph Borgmeier <borg@hera-b.desy.de>
Newsgroups: cern.root
Subject: Re: again ROOT db (long)


Hi Pasha,

thank you for the remarks. It still seems to me that I did not make myself
completely clear, so I'll try to clarify some points. Some of these parts
might look like arguing, but that is definitely not what I want. So first
of all: I understand and agree with many of your points. If my further
explanations and examples seem a bit odd or even sarcastic, it's just my
struggle to be understood and mainly the effect of my limited knowledge of
the English language.

On 31 Mar 1998, Pasha Murat wrote: 

[...]
> Christoph Borgmeier writes:
>  > * all objects of one type must be stored in the same array. That might
>  > lead to problems with temporary and semi-temporary objects, i.e. objects
>  > which should not be stored or only be stored under certain circumstances. 
>  > 
> 	Why isn't it possible to have 2 TClonesArray's or TObjArray's ?
> 	I'm presently working on comparing 2 different pattern recognition
> 	algorithms and 2 TObjArray's of tracks  (with different names!) 
> 	and 2 different arrays of track segments (again - with different
> 	names) coexist in the code just fine...

Oops, of course it's possible to have more than one array of a given
type. But then, as you describe, you need completely disconnected sets of
data.

[...]
>  > * all `integer pointers' point into the same array. This forbids the use
>  > of polymorphism, which is a major advantage of the ROOT system. 
>  > 
> 	Why isn't it possible for a track object to have one integer data 
> 	member being a number of the primary vertex and another integer 
> 	being a number of the calorimeter tower pointed to by a track?

I think that was exactly my point: I have, for example, an array of primary
vertices at the interaction point, and some other vertices, which have
slightly different parameter sets and are linked to the tracks from which
they are reconstructed. Of course, they share a major part of their
functionality, while they are still different types and are stored in
different places. This would be an example of polymorphism.

> 
>  > * the objects pointed to are not defined by language constructs, the
>  > relations are not stored explicitly in the DB. So any code reading the DB
>  > has to have already built in the additional information about the
>  > relations.
>  > 
> 	In case of ROOT it is a Streamer function which writes/reads
> 	an object to/from ROOT file. So code writing the ROOT file already 
> 	has to have built-in knowledge about the things it writes out.
> 	As the same Streamer function does both reading and writing
> 	there is nothing wrong with the same "knowledge" to be available
> 	on the read branch.

Apparently I did not express myself correctly. Let me try again, a
little more verbosely: I might have the track class you mention above with
an integer meaning `calorimeter tower'. I write some data into a file and
access the database later. Now what does the entry `5' in this field tell
me? For example, our reconstruction program has stored clusters in an array
called RCAL. Is it the fifth (sixth) entry of that array? Or does it point
into our geometry array GEOXXX and denote a certain tower? Maybe one has
another array for yet another calorimeter. What I am trying to explain:
nothing, neither a part of the C++ language nor the ROOT run-time type
information, tells me that this `5' actually means I have to look up the
fifth (sixth) element of the recoCal array, which happens to be a data
member of a certain FooEvent class. That was just a pathological example.

A closely related point is that maybe I still cannot get used to the
idea that the identity of an object is tied directly to its position in a
particular (global?) array. Note that this is the very classical (ZEBRA)
approach, which differs from other possible ways: in C/C++ the identity of
an object is given by its location in memory, while an object in a
database might be recovered by a key (maybe similar to TNamed). C/C++
pointers obviously have the advantage of being more dynamic: you can `new'
and `delete' objects without creating visible holes (like in an array) and
without touching the relations of other objects. The latter would become a
major problem when `Compress'ing a TClonesArray object which has collected
some holes.
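
The problem with compressing can be sketched as follows. This is not ROOT
code: Slots and compress() are hypothetical stand-ins built on std::vector,
assuming only that compressing means removing the empty slots and moving
the later objects forward.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <utility>
#include <vector>

struct Vertex { int id; };

// Stand-in for an object array with holes: a null slot is a deleted object.
using Slots = std::vector<std::unique_ptr<Vertex>>;

// Remove the empty slots, shifting the later objects toward the front.
inline void compress(Slots& slots) {
    Slots packed;
    for (auto& s : slots) {
        if (s) packed.push_back(std::move(s));
    }
    slots.swap(packed);
}
```

If some other object has stored the index 2 for the third vertex, and the
second vertex is deleted and the array compressed, the third vertex moves
to index 1. The stored 2 then refers to a different object or lies past
the end of the array, and nothing in the array can repair that relation.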

[...]
>  > ROOT provides part of the necessary power, e.g. the possibility to store
>  > canvases with deep polymorphic substructures. But up to now, I failed to
>  > store parts of my events in a similar way.
>  > 
> 	Here is an important comment: if we consider the requirements to run-time
> 	representation of the objects and to their persistent representation
> 	it is easy to see that they are very different. At run-time one needs
> 	the representation which would be as convenient and efficient as 
> 	possible so, for example, it makes sense to store in track/particle 
> 	object track px,py,pz, pt, mass, eta, phi and energy calculated once 
> 	to avoid multiple calculations of square roots(again - root ...), 
> 	sines, cosines etc.

Important point: I would not like to have more scalars on the surface than
absolutely necessary. Here too I would want a substructure which provides
some encapsulation: a FourVector, a function for calculating distances to
(arbitrary) other objects, etc. That means I would like to have a
transparent functionality, with physics and geometry in the foreground,
invisibly supported by both the run-time representation and the persistent
representation. On the lowest level, even the most beautiful structures
become `0' and `1'.
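
A minimal sketch of such a substructure, in plain C++. The interface
(pt, mass, phi, momentum) is an assumption made for the illustration, not
an existing ROOT class. The derived quantities sit behind member
functions, so whether they are cached or recomputed stays an internal
decision.

```cpp
#include <cassert>
#include <cmath>

class FourVector {
public:
    FourVector(double px, double py, double pz, double e)
        : px_(px), py_(py), pz_(pz), e_(e) {}

    // Derived quantities are computed on demand here; a cached variant
    // could replace these bodies without changing the interface.
    double pt() const { return std::sqrt(px_ * px_ + py_ * py_); }
    double phi() const { return std::atan2(py_, px_); }
    double mass() const {
        const double m2 = e_ * e_ - px_ * px_ - py_ * py_ - pz_ * pz_;
        return m2 > 0.0 ? std::sqrt(m2) : 0.0;
    }

private:
    double px_, py_, pz_, e_;
};

class Track {
public:
    explicit Track(const FourVector& p) : p_(p) {}
    const FourVector& momentum() const { return p_; }

private:
    FourVector p_;  // one substructure instead of eight loose scalars
};
```

A user then writes track.momentum().pt() and never sees how many doubles
are stored.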

This point has of course little to do with the TClonesArray-index
discussion, but in many discussions I have experienced a certain
correlation with it.

You might get the impression that the things I desire are completely
theoretical. So I'll try to give some vague ideas:

* Locality of serialized objects, `crosslinked branches': If one used not
ordinary pointers but a class whose dereferencing operator works like a
pointer, one could load additional objects on demand. This sounds similar
to virtual memory management, where additional pages are `ordered' by a
page fault. Maybe one could even adapt this (platform dependent) mechanism
and avoid replacing the standard pointers.

* `missing information' in TClonesArray-indices: One should define the
relation between certain arrays _systematically_ and _persistently_. (uh,
even our ZEBRA/FORTRAN package does that). That means the class members
should not be ints but `ArrayPointers' which can be dereferenced normally.
Internally, such things could be ints. This is another example of what I
mean when writing about `transparency'.
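
Both ideas can be sketched in a few lines of plain C++. ArrayPointer is
the name used above; LazyRef, the loader callback and the Tower struct are
hypothetical. The sketch covers only the in-memory side: how the relation
to the target array would be recorded persistently is left open.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <memory>
#include <utility>
#include <vector>

// An index that carries its target array, so it dereferences like a pointer.
template <typename T>
class ArrayPointer {
public:
    ArrayPointer(const std::vector<T>& target, std::size_t index)
        : target_(&target), index_(index) {}
    const T& operator*() const { return target_->at(index_); }
    const T* operator->() const { return &target_->at(index_); }
    std::size_t index() const { return index_; }  // the int kept internally

private:
    const std::vector<T>* target_;
    std::size_t index_;
};

// A handle that fetches its object on first dereference (load on demand).
template <typename T>
class LazyRef {
public:
    explicit LazyRef(std::function<std::unique_ptr<T>()> loader)
        : loader_(std::move(loader)) {}
    T& operator*() { load(); return *object_; }
    T* operator->() { load(); return object_.get(); }
    bool loaded() const { return static_cast<bool>(object_); }

private:
    void load() { if (!object_) object_ = loader_(); }
    std::function<std::unique_ptr<T>()> loader_;
    std::unique_ptr<T> object_;
};

struct Tower { double eta; };
```

With this, a track member declared as ArrayPointer<Tower> states in its
type which kind of object it points to, and the holder of the pointer
supplies the array once at construction instead of in every piece of
reading code.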

OK, these are just vague first ideas, but maybe someone has further
thoughts on them?

[...]
> But again - all this is mostly a matter of taste.

yes, and I think it's always helpful and maybe even pleasant to discuss
it.

Cheers,
Christoph

[...]


