Re: Response to ROOT criticism? from Rene Brun on 2006-08-03 (RootTalk)

From: Rene Brun <Rene.Brun_at_cern.ch>
Date: Thu, 03 Aug 2006 23:17:22 +0200

Andy Buckley wrote:
>
> A few weeks ago we had the long discussion about ROOT's page on
> Wikipedia. I'm pleased to note that the page is now more balanced than
> it was, and the discussion continues, so some progress on the initial
> topic was made. Good!
>
> During that discussion, Rene Brun said that after the next ROOT
> release he would respond to the specific technical criticisms that
> came up in the discussion
> (http://root.cern.ch/root/roottalk/roottalk06/0823.html). Since the
> answer never came, this email is a reminder!
>
> I'd like to see how the development team answers some of our points,
> hopefully without degenerating into another slanging match. To save
> trawling the previous thread for the topics, here are a selected few
> in fairly brief form (and _roughly_ in decreasing order of importance
> from my point of view):
>

As I said, I will reply in due time. We made release 5.12 in mid-July and I then took a few days
of rest, followed by the usual mail backlog after the holidays. I will reply now to only some
of your points, even though I disagree with your order of importance.
> ####################
>
> Interactive scripting interface
> -------------------------------
> Is C++ really considered as a sensible language for interactive use?
> Given that it's legendary even among other compiled, strong-typed
> languages for being baroque and syntactically complex? Is there any
> intention to use Python, Ruby (or another language specifically
> _designed_ for interactive/scripting use) as the default text UI?
Since you seem to be unaware of it: we provide Python and Ruby interfaces for users who believe that
these front-ends are better, or who are forced by their collaboration to use them.
Personally, I have nothing against Python. I like several things in its principles and implementation.
However, I DO NOT BELIEVE that Python is the right solution. C++ is the language accepted by
the LHC and other experiments for simulation, reconstruction and all the hard work.
You have to know C++ anyhow (and know it pretty well) if you intend to do physics these days.
I have always advocated a simple use of C++, being wary of the prophets of advanced
features. You can perfectly well write readable, very efficient and portable code without being very
fancy.

But why do I not believe in Python for the long term? With more and more powerful CPUs and
multi-core devices, one should be able to compile many lines of code on the fly, directly from
the interactive interpreter shell. This is the direction
that we are taking with ACLiC (great development and support from Philippe). You can do

root > .x myfile.C
This is the standard way, via the CINT interpreter.

root > .x myfile.C+
The file myfile.C will be automatically compiled with the native compiler, including the
automatic, transparent generation of the class dictionaries. This works wonderfully well
on all platforms, and it behaves like make: if you run ".x myfile.C+" again without having
changed myfile.C, the system is clever enough to reuse the already compiled
code directly, and the response is instantaneous.
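
The make-like behaviour can be pictured with a small sketch. It is not ACLiC's actual code: the names needs_rebuild and build_if_stale are hypothetical, only file timestamps are compared, and the "compilation" is a placeholder that writes a file where a real tool would invoke the native compiler.

```cpp
#include <cassert>
#include <filesystem>
#include <fstream>

namespace fs = std::filesystem;

// Hypothetical helper, not part of ROOT: decides, the way make does, whether
// a compiled library is stale with respect to its source file.
bool needs_rebuild(const fs::path& source, const fs::path& library) {
    assert(fs::exists(source) && "the source file must exist");
    // No compiled library yet: the source has to be compiled.
    if (!fs::exists(library)) {
        return true;
    }
    // Recompile only if the source was modified after the library was built.
    return fs::last_write_time(source) > fs::last_write_time(library);
}

// Stand-in for the real compilation step (assumption: a real tool would call
// the native compiler here). Returns true if a (pretend) build took place.
bool build_if_stale(const fs::path& source, const fs::path& library) {
    if (!needs_rebuild(source, library)) {
        return false;  // reuse the existing library, nothing to do
    }
    std::ofstream out(library);
    out << "compiled from " << source.filename().string() << '\n';
    return true;
}
```

Calling build_if_stale twice in a row on an unchanged source does the work only the first time, which is the behaviour described for a repeated ".x myfile.C+".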
We see a rapidly growing acceptance of this way of working. It is safe. It is fast. It is robust.
The further we go, the faster it will be. In less than 3 years from now, you should be able to compile
10,000 lines of C++ on the fly in less than 1 second. The fact that only one language will
be required, i.e. the main language, will be a terrific advantage. I am convinced that this is
the way to go. Of course, one could implement a thin layer that helps looping over our usual
collections and object containers in a transparent way, in memory and/or on disk.

I also support the Python interface. We are very lucky to collaborate with Python enthusiasts
(thanks Wim), and I believe that it is good that some people (or even collaborations) accept
to be guinea pigs in this area. So, in short, you can already use Python (PyROOT) in ROOT.
By the way, why do you use C++ in CEDAR, or RIVET or its successor?

>
> Memory management / object ownership
> ------------------------------------
> If I pass a histogram to a canvas, say to plot it to an EPS file, and
> then want to delete the canvas and do some more manipulations on the
> histogram, I can't do so directly because the canvas' destructor will
> also delete any objects to which it holds pointers. This
> over-aggressive memory management forces weird strategies like cloning
> everything before passing it to a canvas: is that really justifiable?

We have a small chapter in the Users Guide describing object ownership for histograms and Trees.
I just note that your statement above is wrong. If you delete the canvas, you do not delete the
histogram. Deleting the canvas deletes only the low-level graphics objects, such as lines and text,
that were not created by the histogram. And if you do not like the automatic ownership mechanism,
you can disable it by calling

TH1::AddDirectory(false);

There are many reasons why (by default) we take ownership of the objects. As always with
this kind of decision, there are pros and cons. I can imagine the reactions of very unhappy users
if we forced them to manage objects like histograms themselves. And I repeat: you can manage
these objects yourself via the call described above.
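
As a rough picture of what "automatic ownership with an opt-out" means, here is a toy model in plain C++. Directory and Hist are hypothetical stand-ins, not the ROOT classes, and the static switch add_directory only mimics the idea of TH1::AddDirectory(false); ROOT's real bookkeeping is more involved.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

class Hist;

// Toy directory: deletes every histogram still registered when it goes away.
class Directory {
public:
    ~Directory();
    void add(Hist* h) { objects_.push_back(h); }
    void remove(Hist* h) {
        objects_.erase(std::remove(objects_.begin(), objects_.end(), h),
                       objects_.end());
    }
    std::size_t size() const { return objects_.size(); }
private:
    std::vector<Hist*> objects_;
};

// Toy histogram: registers itself with a directory unless told not to.
class Hist {
public:
    // Switch automatic registration on or off for histograms created later.
    static void add_directory(bool on) { auto_register_ = on; }

    Hist(std::string name, Directory& dir) : name_(std::move(name)) {
        if (auto_register_) {
            owner_ = &dir;
            owner_->add(this);
        }
    }
    // A histogram deleted by hand unregisters itself first.
    ~Hist() { if (owner_) owner_->remove(this); }

    bool owned() const { return owner_ != nullptr; }
private:
    static inline bool auto_register_ = true;
    std::string name_;
    Directory* owner_ = nullptr;
};

Directory::~Directory() {
    // Each Hist destructor removes itself from objects_, so loop until empty.
    while (!objects_.empty()) delete objects_.back();
}
```

In this model a histogram created after Hist::add_directory(false) is never touched by the directory, so the user is responsible for deleting it.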
>
> Global state
> ------------
> When I joined this list at the time of the Wikipedia discussion, I saw
> a reply from Rene to a question about tree-filling
> (http://root.cern.ch/root/roottalk/roottalk06/0779.html) where the
> answer was:
>
> Instead of doing:
> TTree *T = new TTree(...)
> TFile *f = new TFile(...)
> you should do:
> TFile *f = new TFile(...)
> TTree *T = new TTree(...)
>
> So, in other words, the order of semantically unrelated statements can
> matter due to hidden state variables. What's the justification for
> such subtle and invisible dependence on the state? Doesn't this create
> pitfalls in development of user code?

What do you do when you want to refill your car with gas? You take your car and open the tank. Right?
ROOT (like PAW, like Unix) has the concept of a current directory. When you create a new file,
you do not specify the name of the directory where you want to create it.
In the vast majority of cases, you create it in the current directory. Right?
It would not be very elegant to have to specify (via arguments) the place where you want to create
your new objects.
Yes, the penalty is a global object, gDirectory. I know the problems with that (multithreading),
but I believe that the pros outweigh the cons. As always, to weigh the pros and cons, one would
need to see an implementation without the
concept of gDirectory, and by implementation I mean one with a global view of all the
consequences (not only when creating a Tree).
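
To make the statement-order point concrete, here is a toy model of a "current directory" in plain C++. File, Tree and g_current are hypothetical names standing in for TFile, TTree and gDirectory; this is a sketch of the idea only, not of how ROOT implements it.

```cpp
#include <string>
#include <utility>

struct File;

// Plays the role of gDirectory: the place where new objects get attached.
inline File* g_current = nullptr;

struct File {
    std::string name;
    // Opening a file makes it the current directory.
    explicit File(std::string n) : name(std::move(n)) { g_current = this; }
    ~File() { if (g_current == this) g_current = nullptr; }
};

struct Tree {
    std::string name;
    File* home;  // whatever was current at construction time (may be null)
    // The tree silently attaches to the current directory.
    explicit Tree(std::string n) : name(std::move(n)), home(g_current) {}
};
```

A Tree constructed before any File exists ends up with home == nullptr (memory only, in this model), while one constructed after the File is attached to it. That is why the file has to be created first.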
>
> Reinvention / compatibility
> ---------------------------
> Why doesn't ROOT encourage use of existing standard tools like the C++
> STL (encouraged pretty much everywhere else in my experience) and the
> established HDF data formats (which pre-date and provide much of the
> functionality of the ROOT format)? In the case of the STL, is there
> any reason for non-use of templated containers in ROOT other than that
> CINT doesn't understand them properly? In the case of HDF5, wouldn't
> it be better to extend it for object storage than to re-invent the wheel?
I think that you have a real fixation on HDF, and I am convinced that you know ZERO about HDF
and HDF5. I know HDF and I am totally convinced that the ROOT file design and Tree structure
are far more powerful than HDF. However, HDF is a standard format in astrophysics.
We know customers (AstroRoot) who have made interfaces between FITS (an HDF dialect)
and ROOT. We have been accused so often in the past of trying to do everything. My point is that
if people feel the need for an interface, they should do some work and we will help them.
We also note that the main requests are for conversions from HDF formats to the ROOT format.

Concerning STL, I have much to say about it. I implemented the first version of the histogram
package using templates and quickly moved to a second, then a third (the current) implementation.
ROOT would have been dead long ago if we had used templates as early as 1995.
Even today, we see so many portability issues with code using templates.

STL collections did not exist when we started (or existed in a different form). We had access
to early versions of STL when Alexander Stepanov was still working at HP. The thing was unusable,
and anyhow using STL is closely tied to templates.

Today the situation is a bit better, in particular if you do not need to compile your code
with anything other than gcc (>3) or VC++ (>7.0). One of the problems with templates is their
static definition (which also has a good side, i.e. type
checking at compilation time). However, templates should be more dynamic, in particular when
used in an interactive environment, but I have no time to discuss this.
There are many places in ROOT where we use STL more and more, but we cannot replace
the main ROOT collections with STL collections.
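
The static/dynamic distinction can be illustrated with a small sketch in plain C++. Object, Hist, Graph and count_of are hypothetical names, not ROOT classes; the only assumption borrowed from ROOT is the idea of a collection that holds pointers to a common base class, so that element types are resolved at run time rather than fixed by a template parameter.

```cpp
#include <cstddef>
#include <memory>
#include <string>
#include <vector>

// Common base class, in the spirit of a single-rooted class hierarchy.
struct Object {
    virtual ~Object() = default;
    virtual std::string class_name() const = 0;
};

struct Hist : Object {
    std::string class_name() const override { return "Hist"; }
};

struct Graph : Object {
    std::string class_name() const override { return "Graph"; }
};

// Heterogeneous collection: the concrete element types are known only at run time.
using ObjectList = std::vector<std::unique_ptr<Object>>;

// Counts the elements that really are of type T, checked at run time.
template <typename T>
std::size_t count_of(const ObjectList& list) {
    std::size_t n = 0;
    for (const auto& obj : list) {
        if (dynamic_cast<const T*>(obj.get()) != nullptr) ++n;
    }
    return n;
}
```

A std::vector<Hist> would reject a Graph at compile time; the base-pointer list accepts both and leaves the type question to run time, which is the trade-off mentioned above.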
>
> Class design
> ------------
> My canonical example of a problematic class structure in ROOT is the
> histogram classes:

The only problem with the design was to name TH1 what should have been called THistogram.
I chose TH1 to minimize typing (we did not have the TAB completion facility for many years).
I am not going to defend all aspects of the TH design. However, I do not think that the
current design penalizes performance, size or ease of use. In particular:
>
> * All histograms inherit from the 1D TH1. Therefore TH1F, for
> example, has
> a GetZAxis() method, which should never be used! So the class design
> renders object polymorphism pointless.
I am afraid that you are wrong. TH1 objects may have a 3-D representation (via the LEGO
options), and we will take advantage of the z-axis to control the thickness of the Z view
when viewing all 2-D objects (histograms in particular) with the 3-D GL viewer.
This was an excellent decision.
> * Histogram data and presentation are conjoined: if I make a
> histogram const
> to protect the sanctity of my data, I also can't change its
> colour. D'oh.
> * and more... ,

So what? Do you mean that if you want to specify a graphics attribute (all those in TAttLine,
TAttFill, TAttMarker and TAttText) you would pass the info as a draw option? I would be delighted
if you could submit a proposal in this direction. What a mess!!

>
> What's the justification for allowing the histogram classes (probably
> the most widely used classes in ROOT) to remain so poorly implemented?
> (There are, of course, other examples, but I know the histogramming
> fairly well)

I do not know what you mean by poorly implemented. There is certainly room for improvement in many places.
Many of these classes were the first classes designed in C++. Let me know if you find a place
where you could gain in code size, performance, etc. Your contribution will be acknowledged.
>
> Reflex, Mathcore/Mathmore and Minuit++
> --------------------------------------
> Can the developers comment on whether these packages, now part of the
> ROOT project, will remain usable without any dependencies on the rest
> of the ROOT libraries? I can see no reason for the objects in these
> classes to inherit from TObject, for example, and to do so would
> greatly reduce their usefulness to areas of HEP code which don't use
> ROOT. Comments?

Reflex and MathCore are independent of the other ROOT libs. MathMore depends on GSL and may
depend on other things in the future. Minuit2 has two components: one independent of ROOT, the
other based on ROOT because it implements an interface for TVirtualFitter. However, note that
in order to use MathCore and MathMore in an interactive environment,
you need the dictionaries, which depend on CINT and libCore.

As pointed out by Konstantin, this is only a question of religion, mainly raised by those who do not
understand the advantages of a coherent framework. The requirement to keep these libraries
independent forces independent build mechanisms and additional work.
>
> ####################
>
> That's it! Well, not quite: others on the Wikipedia talk page, in the
> previous mailing list discussion (and in real life HEP meetings, of
> course) have highlighted concerns about ROOT's persistency and other
> features. But to keep it reasonably short, I've just picked these few.
I did not see in the discussions any worries/concerns about the ROOT persistency mechanism.
I believe that this is a big success and probably the main reason why ROOT is so successful.
>
> Thanks to the ROOT team for (hopefully) taking the time to address my
> concerns. I'd like to re-iterate that I have no reason for opposition
> to ROOT, no other wares to sell: I'm just enumerating the reasons why
> I stopped using ROOT. If these issues are addressed, then maybe I'll
> go back to using it :-)

Your intentions are not clear. For sure, you started this thread as a very strong opponent of ROOT.
One example is your links to some nonsense pages from one or two well-known opponents.
I have no illusions about convincing you to use it again :)
>
> I think a valid point in the previous discussion was that the dominant
> component of data analysis time is often the time spent getting the
> software in a usable state rather than the time spent
> number-crunching. So before the complaints flood into my email inbox,
> consider that this sort of "philosophising" *can* have a definite
> impact on the day-to-day work of data analysis --- just because you
> can currently persuade ROOT to do what you want doesn't mean that it
> can't be improved! But if you're not interested in this design
> discussion, sorry in advance if you have to delete a lot of email.
ROOT has been built (as we say in French, "contre vents et marées", i.e. against wind and tide)
by many people contributing
to it in many different forms: code, suggestions, criticisms. We hope that this will continue.
Today we have less margin than before. We must remain backward compatible. That does not mean
that we are against improvements. We give strong weight to concrete code contributions,
new packages, or optimizations/rewrites of existing algorithms.

Our priority is to consolidate the system, to improve robustness and to be ready for the very
challenging coming months.