Re: Response to ROOT criticism?

From: Rene Brun <Rene.Brun_at_cern.ch>
Date: Sat, 05 Aug 2006 22:26:45 +0200


Andy Buckley wrote:
> I know that, and already acknowledged it in the last thread. However,
> we'll just have to disagree on whether C++ is a suitable language for
> interactive use, and certainly on whether or not CINT is a stable
> interpreter implementation.

We can disagree. Let users vote with their feet. I believe that users have already voted many times.
>
> You're arguing that because multicore CPUs will make compilation
> quick, therefore C++ will be a *fast* language with JIT compilation,
> but that ignores the fact that it's probably one of the most complex
> and pitfall-ridden languages to write. The speed could be matched by
> e.g. a Python interface to already compiled C++ library objects, in
> which the critical path operations, like loops, take place. (There's
> already a nice example of why internalising map operations can be fast
> in the form of Google's MapReduce algorithm)
You did not understand my point. If the interactive language (via CINT/C++) can be rapidly compiled
by the native compiler, this is an irresistible advantage. ACLIC provides a seamless, simple
and efficient interface (in a portable and compiler independent way) to the native compiler.
So you have the combined advantages of an interpreter and a compiler, and you learn only one language.
>
> Here's a couple of other points I disagree with :)
>
> * I disagree with your point that all particle physics researchers
> these days have to know C++ very well and, by implication, write it
> well. In my experience, many HEP researchers write dismal C++. But
> that's opinion, so let's move on :)
Do you have any evidence to the contrary? People who do not understand this are
left on the edge or in a niche. I did not say that it is good. It is a fact.
> * There's an assumption above that the critical path is in the speed
> of the code. However, there's also the substantial time spent
> getting your code to work, which is likely to be greater when
> writing C++ than with a rapid deployment scripting language. If the
> number crunching is done in compiled code, then you *can* have your
> cake and eat it. SciPy (www.scipy.org) does this with great success
> but - whoops - ROOT can't get any benefit from NumPy arrays.
The critical path is the speed at which you achieve your results when working
in a collaborative manner. Working with simple C++ is as efficient as working with Java or Python.
> Because we have no intention of using it for interactive sessions!
> Rivet also needs to be able to interface with native code libraries
> from Fortran and some rather complex C++, which is difficult to do
> from Java. Of course, C++ is also "the official LCG language", but as
> we're not part of LCG we chose it more according to its merits. We're
> also using the STL containers, traits and other functionality, without
> which C++ is pretty primitive, so I'd say our use of C++ is distinctly
> different from ROOT's.

You are confusing C++ within ROOT with user code. I know plenty of people using
sophisticated C++ together with ROOT. When using ACLIC from CINT you have no limitations
with C++, but you are reasoning like a ROOT user in 1997, not 2006.
> I didn't know that the object ownership stuff could be forced off.
As I said, you have never read the most basic pages in the Users Guide. This has been described there for many years.
> And yes, it's a design decision. Admittedly, it would be a moot
> problem if ROOT used the Boost (http://www.boost.org/) or other shared
> pointers, e.g. http://www.boost.org/libs/smart_ptr/smart_ptr.htm .
AHAH! This is what I call in French "trainer les casseroles", i.e. it brakes your car and you die.
Most projects competing with ROOT in past years died because they had so many dependencies
that they quickly became totally unmanageable. Do you have any experience at all with Boost,
and if yes, on how many systems?
>
> Or if Java had been used instead, then the user would never need to be
> exposed to memory management at all, thanks to the built-in garbage
> collector (Jamie Zawinski, as ever, has something to say about GCs:
> http://www.jwz.org/doc/gc.html). Since ROOT has reinvented so many
> other Java features, a garbage collector wouldn't be too surprising :)
AHAHAH! You probably forgot a long list of other candidates. This is the typical argument
of a teenage computer scientist who has produced lines of code only for himself and never
produced a system with a lifetime greater than a few weeks.
>
> So yes, if you've decided not to support shared pointers, then it is
> just a point of view thing. Maybe I should have made my question "why
> don't you use smart pointers to alleviate memory management problems?"
This is a technical issue. I have nothing against smart pointers. One of the problems here is object
persistency. Smart pointers (like our TRefs) have to work in a consistent way in memory
and maintain their relationships across I/O operations.
> Incidentally, I'm not the only one who finds current ROOT memory
> management awkward: it wasn't even one of my criticisms until a bunch
> of other people contacted me to say it was their number one complaint!
This is a vague and poor statement.
>
>>> Global state
> ...
>>> So, in other words, the order of semantically unrelated statements
>>> can matter due to hidden state variables. What's the justification
>>> for such subtle and invisible dependence on the state? Doesn't this
>>> create pitfalls in development of user code?
>> What do you do when you want to refill your car with gas? You take
>> your car and open the tank. Right?
>
> Uhhm, how's that related? :)

Very much related, unless you do not understand what a tree is. Before creating a tree,
you must think about where you are going to store it, just as when you buy gas you had better
take your car with you to fill the tank.
> My complaint isn't that there *is* a global state: that's debatable.
> It's that classes like trees and histograms, even if they're only ever
> memory-resident, are over-sensitive to that global state and - worse -
> manipulate it via invisible side-effects. So before you even get to
> points about design strategy, this question was "why is the global
> state so badly implemented?"

This is a void argument or rather an argument of somebody with no arguments.
> I wish I could comment, because HDF5 is well-documented and I've had a
> look through the design docs, although not in the detail I'd need to
> to extend it for object persistency. By comparison, I can't find any
> definition documents for the ROOT format. Apparently, neither could
> Julius Hrivnac when he had to write a Java ROOT file writer - his
> description was that the design is "a horror" - but I can't comment on
> that personally.

The problem with Julius is that he understands only Latin. Saying that the ROOT file
format is not documented is the best proof that you and he have never read the documentation:
the Users Guide or simply the description of class TFile. By the way, the term "ROOT format"
is totally meaningless and belongs to the Fortran era. I assume that by "ROOT format" you mean
"the principle of operation of ROOT persistency". This has two facets:
  -A: the low level file format
  -B: how objects are serialized
A could not be simpler, and it is fully described in the TFile class description. A ROOT file
is just a collection of variable length logical records. The first word of each record is the
number of bytes in the record, so that it is trivial to navigate in a ROOT file.
To see what I mean, take any ROOT file and do:

   TFile f("myfile.root");
   f.Map();
If at this point you are lost, do not read the following. The content of a logical record corresponds to your class structure (just nested parentheses).
You can figure out how it works with the following example. Suppose that you have a TH1 object
(1-d histogram) and you want to see how it is serialized. Do:

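   TFile f("myfile.root","new");  // create a new file (not opened if it already exists)
   gDebug=2;                      // switch on verbose I/O debug output
   myhist.Write();                // myhist: your existing TH1 object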
You will get one line per data member of each object. More interesting collections like TTree are fully documented in the class TTree.
Clever people (for example Tony Johnson from JAS) have figured out by themselves how this works without asking any questions. Nothing really fancy. Powerful tools are simple, hence efficient.
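
The navigation idea described above (each logical record begins with its own byte count, so a reader can hop from one record to the next) can be sketched in plain C++. This is only an illustration: the 4-byte big-endian length prefix that counts itself, the in-memory buffer, and the names `appendRecord` and `recordOffsets` are assumptions made for the example, not the actual ROOT on-disk layout, which is the one given in the TFile class description.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Append one record: a 4-byte big-endian total length (prefix included)
// followed by the payload bytes. This layout is an assumption of the sketch.
void appendRecord(std::vector<std::uint8_t>& buf, const std::string& payload) {
    const std::uint32_t total = static_cast<std::uint32_t>(payload.size() + 4);
    for (int shift = 24; shift >= 0; shift -= 8)
        buf.push_back(static_cast<std::uint8_t>((total >> shift) & 0xFF));
    buf.insert(buf.end(), payload.begin(), payload.end());
}

// Walk the buffer by reading each record's length and jumping ahead by it.
// Returns the start offset of every complete record found.
std::vector<std::size_t> recordOffsets(const std::vector<std::uint8_t>& buf) {
    std::vector<std::size_t> offsets;
    std::size_t pos = 0;
    while (pos + 4 <= buf.size()) {
        const std::uint32_t total = (std::uint32_t(buf[pos]) << 24) |
                                    (std::uint32_t(buf[pos + 1]) << 16) |
                                    (std::uint32_t(buf[pos + 2]) << 8) |
                                    std::uint32_t(buf[pos + 3]);
        // Stop on a corrupt length or a record that runs past the buffer.
        if (total < 4 || total > buf.size() - pos) break;
        offsets.push_back(pos);
        pos += total;
    }
    return offsets;
}
```

Because the reader never has to interpret a payload to find the next record, a listing tool only needs the length words; the payload bytes can stay opaque.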
>
> But while we're close to the topic, how come the ROOT format doesn't
> implement transient-persistent separation, despite that being a
> requirement of the LCG Persistency Blueprint document?
This is a total nonsense argument. We have people preaching transient-persistent separation
and using ROOT in production. We do not impose any model here. If you believe that the separation
is a good thing, do it; ROOT will support whatever you give it. I just notice a substantial
evolution with respect to what was the Bible in 2002. You seem to be substantially out of phase
with what is happening these days.
>
> I was in a meeting a couple of weeks ago about 4-vector storage in
> HepMC and it transpired that changing a private member variable would
> render the stored ROOT file data unreadable! Transient-persistent
> separation could avoid that problem, if well-designed.
This is again pure fallacy and nonsense. You seem to be unaware that ROOT supports a very powerful
automatic class evolution system. Again, read the doc before making such statements.
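
As a rough illustration of why a changed class layout need not make old data unreadable, here is a minimal C++ sketch of the general idea behind class evolution: stored data that describes its own members by name can be read into a newer layout. Everything in it (`StoredObject`, `FourVectorV1`, `FourVectorV2`, `store`, `member`, `readV2`, and the default of -1 for a missing member) is hypothetical and invented for the example; it does not show how ROOT implements its class evolution, for which the ROOT documentation is the reference.

```cpp
#include <cassert>
#include <map>
#include <string>

// A stored object in this sketch: member name -> value (self-describing).
using StoredObject = std::map<std::string, double>;

// Hypothetical "old" class layout, as it was when the data was written.
struct FourVectorV1 {
    double px, py, pz, e;
};

// Hypothetical "new" layout: one member was added after the data was written.
struct FourVectorV2 {
    double px, py, pz, e;
    double mass;
};

// Write an old-layout object as named members.
StoredObject store(const FourVectorV1& v) {
    return {{"px", v.px}, {"py", v.py}, {"pz", v.pz}, {"e", v.e}};
}

// Look a member up by name; fall back to a default if the data lacks it.
double member(const StoredObject& s, const std::string& name, double fallback) {
    const auto it = s.find(name);
    return it == s.end() ? fallback : it->second;
}

// Read old data into the new layout: known members are matched by name,
// and the member that did not exist at write time gets a default value.
FourVectorV2 readV2(const StoredObject& s) {
    return FourVectorV2{member(s, "px", 0.0), member(s, "py", 0.0),
                        member(s, "pz", 0.0), member(s, "e", 0.0),
                        member(s, "mass", -1.0)};
}
```

Matching by name rather than by position is the design choice that matters here: adding, removing, or reordering members changes which lookups succeed, not whether the stored bytes can be interpreted at all.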
>
>
>> We also notice that the main requests are for conversions from HDF
>> formats to ROOT format.
>
> I wouldn't rule out selection effects in that observation: people who
> wish to use HDF rather than ROOT are unlikely to contact the ROOT
> developers!

So why is it a problem then? Be consistent with yourself.
>
>
> That's a shame, because I think it's an interesting point: templates
> are dynamic through polymorphism, i.e. good class hierarchy design.
> I'd be interested to know of a circumstance where they aren't flexible
> and generic enough, as much for personal information as anything else.
No, I am sorry, C++ templates are static. That's why you can do type-checking at compilation time.
It is also the reason for the code bloat induced by the heavy use of templates.
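
A small standard C++ sketch of what "static" means here: every template instantiation is a distinct type fixed at compile time, which is what makes compile-time type checking possible and also why each instantiation brings its own generated code. The names `Box` and `twice` are hypothetical and exist only for this illustration.

```cpp
#include <cassert>
#include <string>
#include <type_traits>

// Hypothetical class template used only for this illustration.
template <typename T>
struct Box {
    T value;
    T get() const { return value; }
};

// Each instantiation is a separate, unrelated type fixed at compile time.
static_assert(!std::is_same<Box<int>, Box<double>>::value,
              "Box<int> and Box<double> are distinct types");

// Return types are known to the compiler, so mismatches fail to compile.
static_assert(std::is_same<decltype(Box<int>{1}.get()), int>::value,
              "Box<int>::get() returns int");

// A function template is likewise instantiated once per argument type used.
template <typename T>
T twice(const T& x) { return x + x; }
```

Calling `twice(21)` and `twice(std::string("ab"))` in the same program makes the compiler generate two separate functions, one per type; nothing about the set of instantiations can be decided at run time.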
> And this isn't just because of CINT's inability to understand them?
> And what's the excuse when it comes to STL components like
> std::string, which are actually static template instances and are much
> safer and more flexible than char*?

Again, the proof that you are making statements about something that you do not know.
You can perfectly well use templates with CINT. But because templates are statically defined
in the source, one is forced to generate dictionaries for the concrete class instances
that one wants to use interactively or do I/O with. It would be good if templates could be
defined at run time.
>
> Again, this is just personal opinion, but I find that fairly
> incomprehensible, and it sounds more like an opportunistic twist on an
> early mistake. So what's the point of that GetZaxis() when NOT using a
> Lego representation? How is this an improvement over having external
> "HistogramPainter" classes?

One more proof that you do not know the system. We have a class THistPainter deriving from an abstract interface TVirtualHistPainter, giving you the possibility to implement your own version if you don't like the existing implementation.
From all your comments, you really give the impression of not being concerned with I/O. Setting attributes of any kind in a histogram is essential. Most of these attributes must be persistent. All implementations of histogramming systems ignoring these basic facts have failed miserably,
as you will fail in your projects if you do not understand this point. Probably simply a question of experience.
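
The arrangement described above (an abstract painter interface with a replaceable implementation, while the drawing attributes stay in the histogram so that they can be stored with it) can be sketched in plain C++. The classes `Hist`, `VirtualPainter` and `AsciiPainter` and the function `draw` are hypothetical stand-ins invented for this example; they are not ROOT classes and do not reproduce the TVirtualHistPainter interface.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for a histogram; not a ROOT class.
struct Hist {
    std::vector<int> bins;  // bin contents
    char marker = '*';      // a drawing attribute kept with the data
};

// Abstract painting interface: users can supply their own implementation.
class VirtualPainter {
public:
    virtual ~VirtualPainter() = default;
    virtual std::string paint(const Hist& h) const = 0;
};

// One concrete painter: one text row per bin, using the histogram's marker.
class AsciiPainter : public VirtualPainter {
public:
    std::string paint(const Hist& h) const override {
        std::string out;
        for (int n : h.bins) {
            out += std::string(static_cast<std::size_t>(n > 0 ? n : 0), h.marker);
            out += '\n';
        }
        return out;
    }
};

// The caller depends only on the interface, so painters are swappable.
std::string draw(const Hist& h, const VirtualPainter& p) { return p.paint(h); }
```

Since `marker` lives in `Hist`, copying or saving the histogram carries the attribute along, and any painter that is plugged in later sees the same setting.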

> No, I mean to use external classes to present the data in graphical
> form, rather than tying the ideas of data and presentation together.
> Then you can pass graphics attributes whichever way you like - you
> just pass them to a "HistogramPainter" rather than a histogram itself.
Ditto. Not only do you ignore persistency, but you want to complicate the user's life VERY SERIOUSLY.
What a mess it would be if people followed your way. It would be impossible (or very complex)
to clone pads and canvases. The nice TLegend facility could not be implemented, and there would be so many parameters to specify that the code would become unreadable.
>
>>> What's the justification for allowing the histogram classes
>>> (probably the most widely used classes in ROOT) to remain so poorly
>>> implemented? (There are, of course, other examples, but I know the
>>> histogramming fairly well)
>> I do not know what you mean by poorly implemented. There are
>> certainly many places for improvements. Many of these classes were
>> the first classes designed in C++. Let me know if you find a place
>> where you could gain in code size, performance, etc. Your
>> contribution will be acknowledged.
>
> Sorry, I should have said poorly designed: the design is a bad
> implementation of the idea of representing data :)
Like many of your arguments, I find this one particularly arrogant.
>
> Your argument can be extended until all windowing systems, readline
> libraries and other system libraries have to be included in ROOT in
> the name of "coherence"! It seems to me that your concept of the
> dependency is the wrong way around, but as long as I can continue to
> use Minuit without the need for dictionaries and libCore, I'm happy.
So you will be unhappy. To use Minuit interactively, you need a dictionary.
>
> That's sad: if I thought ROOT would do what I need in a useful way, I
> would use it, regardless of the name and any personal politics. I
> didn't start using ROOT as a "very strong opponent" --- that came
> naturally as a result of learning how OO *should* be done. But let's
> avoid the personal stuff, yes?
>
>

Let's talk about this in a few years, if you are still in the field.

Rene Brun

Received on Sat Aug 05 2006 - 22:27:10 MEST
