Re: again ROOT db (long)

From: Pasha Murat (murat@cdfsga.fnal.gov)
Date: Tue Mar 31 1998 - 18:36:00 MEST


	Hello,

speaking about restrictions and limitations, we should always keep in mind
that *reasonable* restrictions and conventions are very useful.
Moreover, on closer inspection some restrictions turn out to be not restrictions
but just *rules*, and I'll try to illustrate this point below.

Christoph Borgmeier writes:
 > * all objects of one type must be stored in the same array. That might
 > lead to problems with temporary and semi-temporary objects, i.e. objects
 > which should not be stored or only be stored under certain circumstances. 
 > 
	Why isn't it possible to have 2 TClonesArrays or TObjArrays?
	I'm presently working on comparing 2 different pattern recognition
	algorithms, and 2 TObjArrays of tracks (with different names!)
	and 2 different arrays of track segments (again, with different
	names) coexist in the code just fine...

	For the objects which have to be stored only under certain
	circumstances, one could keep pointers to them in the event object
	and set a pointer to zero if the corresponding object should not be
	written out.

 > * all `integer pointers' point into the same array. This forbids the use
 > of polymorphism, which is a major advantage of the ROOT system. 
 > 
	Why isn't it possible for a track object to have one integer data
	member holding the index of the primary vertex and another integer
	holding the index of the calorimeter tower the track points to?
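
	To make this concrete, here is a minimal sketch in plain C++ (not ROOT
	code). The Event, Track, Vertex and Tower types and the helper
	makeExampleEvent() are hypothetical names chosen for illustration. Each
	track carries two integer members, and each one indexes a different array.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical event model; none of these names come from ROOT.
struct Vertex { float x, y, z; };
struct Tower  { float eta, phi, energy; };

struct Track {
    float px, py, pz;
    int   vertexIndex;  // index into Event::vertices, -1 means "no vertex"
    int   towerIndex;   // index into Event::towers,   -1 means "no tower"
};

struct Event {
    std::vector<Vertex> vertices;
    std::vector<Tower>  towers;
    std::vector<Track>  tracks;

    // Resolve the vertex index of a track; nullptr if the link is absent or invalid.
    const Vertex* vertexOf(const Track& t) const {
        if (t.vertexIndex < 0) return nullptr;
        const std::size_t i = static_cast<std::size_t>(t.vertexIndex);
        return i < vertices.size() ? &vertices[i] : nullptr;
    }

    // Resolve the tower index of a track; nullptr if the link is absent or invalid.
    const Tower* towerOf(const Track& t) const {
        if (t.towerIndex < 0) return nullptr;
        const std::size_t i = static_cast<std::size_t>(t.towerIndex);
        return i < towers.size() ? &towers[i] : nullptr;
    }
};

// Build a tiny event: one vertex, two towers, two tracks.
Event makeExampleEvent() {
    Event ev;
    ev.vertices.push_back(Vertex{0.f, 0.f, 1.5f});
    ev.towers.push_back(Tower{0.1f, 0.2f, 10.f});
    ev.towers.push_back(Tower{-1.0f, 2.0f, 25.f});
    ev.tracks.push_back(Track{1.f, 2.f, 3.f, 0, 1});   // vertex 0, tower 1
    ev.tracks.push_back(Track{4.f, 5.f, 6.f, 0, -1});  // vertex 0, no tower
    return ev;
}
```

	Both tracks refer to the same vertex, yet the vertex exists only once in
	its array, so writing the arrays out cannot duplicate it.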

 > * the objects pointed to are not defined by language constructs, the
 > relations are not stored explicitly in the DB. So any code reading the DB
 > has to have already built in the additional information about the
 > relations.
 > 
	In the case of ROOT it is the Streamer function which writes/reads
	an object to/from a ROOT file. So the code writing the ROOT file already
	has to have built-in knowledge about the things it writes out.
	As the same Streamer function does both the reading and the writing,
	there is nothing wrong with the same "knowledge" being available
	on the read branch.
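
	The idea of one function serving both directions can be sketched in plain
	C++ as follows. This is not ROOT's actual Streamer/TBuffer interface: the
	Buffer class, its io() method and Track::stream() are hypothetical
	stand-ins, and the raw byte copy ignores portability concerns such as
	byte order.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

// Hypothetical byte buffer that is either in write mode or in read mode.
class Buffer {
public:
    enum Mode { kWrite, kRead };
    explicit Buffer(Mode m) : mode_(m), pos_(0) {}

    bool isReading() const { return mode_ == kRead; }
    void rewindForReading() { mode_ = kRead; pos_ = 0; }
    std::size_t size() const { return bytes_.size(); }

    // Transfer one value in whichever direction the buffer currently works.
    template <typename T>
    void io(T& value) {
        if (isReading()) {
            assert(pos_ + sizeof(T) <= bytes_.size());
            std::memcpy(&value, bytes_.data() + pos_, sizeof(T));
            pos_ += sizeof(T);
        } else {
            const unsigned char* p = reinterpret_cast<const unsigned char*>(&value);
            bytes_.insert(bytes_.end(), p, p + sizeof(T));
        }
    }

private:
    Mode mode_;
    std::size_t pos_;
    std::vector<unsigned char> bytes_;
};

struct Track {
    float px, py, pz;
    int   vertexIndex;

    // The member list is written down once and used for reading and writing,
    // so the "knowledge" of the layout is automatically the same on both sides.
    void stream(Buffer& b) {
        b.io(px);
        b.io(py);
        b.io(pz);
        b.io(vertexIndex);
    }
};
```

	Calling stream() on a buffer in write mode appends the four members; after
	rewindForReading(), the same call fills a Track from the stored bytes.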

 > It seems to me that there are many possible applications for such missing
 > features, e.g. reconstructed decays with generalized particles like
 > different kinds of tracks and calorimeter information. They could also be
 > related to different types of detector hits and calorimeter clusters. 
 > 
 > One could also have different sets of reconstructed tracks, e.g. simple
 > hit fits or results of global vertex fits. One might also get a set of
 > competing reconstructions and store only some of them. These things will
 > have a rich substructure and will not fit well into `flat' arrays.
 > 
 > ROOT provides part of the necessary power, e.g. the possibility to store
 > canvases with deep polymorphic substructures. But up to now, I failed to
 > store parts of my events in a similar way.
 > 
	Here is an important comment: if we consider the requirements on the
	run-time representation of the objects and on their persistent
	representation, it is easy to see that they are very different. At run-time
	one needs a representation which is as convenient and efficient as
	possible so, for example, it makes sense to store in the track/particle
	object the track px, py, pz, pt, mass, eta, phi and energy, calculated once
	to avoid repeated calculations of square roots (again - root ...),
	sines, cosines etc.

	On the other hand, when the object is being written out, one of the major
	requirements is data compression and compactness, so it is quite enough
	to write out just 4 numbers out of the 8 listed above (for example px, py,
	pz and mass, from which the other 4 follow). This is a simple
	example, which shows that in real life it may not be necessary to make
	persistent all the cross-linked structures which exist at run time.
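
	A minimal sketch of this split in plain C++, assuming that px, py, pz and
	mass are the 4 numbers written out. The PersistentTrack and Track types
	are hypothetical and not part of ROOT.

```cpp
#include <cassert>
#include <cmath>

// Compact persistent form: only the 4 independent numbers.
struct PersistentTrack { float px, py, pz, mass; };

// Run-time form: the same 4 numbers plus 4 cached derived quantities.
struct Track {
    double px, py, pz, mass;
    double pt, eta, phi, energy;  // computed once in the constructor

    explicit Track(const PersistentTrack& p)
        : px(p.px), py(p.py), pz(p.pz), mass(p.mass),
          pt(0.0), eta(0.0), phi(0.0), energy(0.0) {
        pt = std::sqrt(px * px + py * py);
        const double pmag = std::sqrt(pt * pt + pz * pz);
        energy = std::sqrt(pmag * pmag + mass * mass);
        phi = std::atan2(py, px);
        // Pseudorapidity is undefined for pt == 0; 0 is used as a placeholder.
        eta = (pt > 0.0) ? std::asinh(pz / pt) : 0.0;
    }

    // Drop the cached quantities again when the object is written out.
    PersistentTrack pack() const {
        return PersistentTrack{static_cast<float>(px), static_cast<float>(py),
                               static_cast<float>(pz), static_cast<float>(mass)};
    }
};
```

	The derived quantities cost nothing on disk and are recomputed exactly once
	per object after reading.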

	There is the practical experience of the BaBar collaboration, who are using
	Obj/DB (which has all the nice features listed by Christoph)
	for I/O. To keep I/O efficient, BaBar people create large persistent
	objects which they call "banks", pack run-time objects into the banks
	and write the banks into Obj/DB. After the banks are read back, they
	are unpacked and the run-time objects restored.
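
	The pack/unpack pattern can be illustrated generically in plain C++. This
	is not BaBar's code or the Obj/DB API; Hit, HitBank, packHits() and
	unpackHits() are hypothetical names, and the bank here is simply one flat
	array of floats.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical run-time object.
struct Hit { float x, y, z; };

// Hypothetical "bank": one large flat object that is cheap to write and read.
struct HitBank {
    std::vector<float> words;  // layout: x0, y0, z0, x1, y1, z1, ...
};

// Pack many small run-time objects into a single bank before writing.
HitBank packHits(const std::vector<Hit>& hits) {
    HitBank bank;
    bank.words.reserve(hits.size() * 3);
    for (const Hit& h : hits) {
        bank.words.push_back(h.x);
        bank.words.push_back(h.y);
        bank.words.push_back(h.z);
    }
    return bank;
}

// Restore the run-time objects after the bank has been read back.
std::vector<Hit> unpackHits(const HitBank& bank) {
    assert(bank.words.size() % 3 == 0);
    std::vector<Hit> hits;
    hits.reserve(bank.words.size() / 3);
    for (std::size_t i = 0; i + 2 < bank.words.size(); i += 3) {
        hits.push_back(Hit{bank.words[i], bank.words[i + 1], bank.words[i + 2]});
    }
    return hits;
}
```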

	Conclusion: to keep I/O efficient one may well want to consider
	writing out exactly "flat arrays" rather than "rich substructures".

 > The first reason (*) is, that there might be cross-links inside the object
 > tree. These should not lead to the same object stored twice, there seems
 > to be a hash table mechanism to avoid this, but I could not make it work
 > yet. 
 > 
	BTW, working with integers (indices of the objects in their arrays)
	rather than with pointers solves this problem automatically.

 > The second problem is the deleting of the sub-objects. In my event
 > example, the objects containing the pointers do not necessarily _own_ the
 > objects pointed to (maybe because of *), that's why I tried to create my
 > own garbage collection for this. (It failed so far to free all memory of
 > ~1000 events, but I am still struggling.)

	This is a general problem with lists (containers) keeping
	pointers to objects. Normally people use one of the 2
	following solutions:

	- one could have lists which "know" whether or not they own their
	  objects (have a DeleteObjects flag), so the destructor acts according
	  to the value of this flag;
	- in the destructor, one could use Clear() instead of Delete() for the
	  containers which do not own the objects stored in them (the ROOT case).

	I don't see any practical difference between these 2 
	approaches: in both cases it is the user who decides if the objects 
	pointed to have to be deleted... Just minor differences in coding and
	a matter of personal taste... 
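
	Both options can be shown in one small plain C++ sketch. The PtrList
	class is hypothetical and is not a ROOT container; its clear() and
	deleteAll() methods only mirror the Clear()/Delete() distinction made
	above, and the Tracked type exists just to count live objects.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Helper type that counts how many instances are currently alive.
struct Tracked {
    static int alive;
    Tracked() { ++alive; }
    ~Tracked() { --alive; }
};
int Tracked::alive = 0;

// Hypothetical pointer list with an ownership flag.
class PtrList {
public:
    explicit PtrList(bool owns) : owns_(owns) {}
    ~PtrList() {
        // The destructor acts according to the flag: only an owner deletes.
        if (owns_) deleteAll();
    }
    PtrList(const PtrList&) = delete;
    PtrList& operator=(const PtrList&) = delete;

    void add(Tracked* p) { items_.push_back(p); }
    std::size_t size() const { return items_.size(); }

    // Forget the pointers without touching the objects.
    void clear() { items_.clear(); }

    // Delete the objects pointed to, then forget the pointers.
    void deleteAll() {
        for (Tracked* p : items_) delete p;
        items_.clear();
    }

private:
    bool owns_;
    std::vector<Tracked*> items_;
};
```

	A list constructed with owns == false can hold the same pointers as an
	owning list without any risk of a double delete; in either scheme the
	user makes the ownership decision exactly once.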

I wouldn't like people to get the impression that I'm arguing with Christoph.
I agree with his observations, but it seems that they are mostly theoretical.
What I tried to show is that for each concrete case mentioned by Christoph
there is a practical solution with which I personally am pretty comfortable.
But again, all this is mostly a matter of taste.

						Regards, Pasha.


