Re: Response to ROOT criticism? from Andy Buckley on 2006-08-04 (RootTalk)

From: Andy Buckley <andy.buckley_at_durham.ac.uk>
Date: Fri, 04 Aug 2006 19:10:01 +0100

Rene Brun wrote:
> Andy Buckley wrote:

>> Interactive scripting interface
>> -------------------------------
>> Is C++ really considered as a sensible language for interactive use?

> So to be short, you can already use Python (pyroot) in ROOT.

I know that, and already acknowledged it in the last thread. However, we'll just have to disagree on whether C++ is a suitable language for interactive use, and certainly on whether or not CINT is a stable interpreter implementation.

You're arguing that because multicore CPUs will make compilation quick, C++ will be a *fast* language with JIT compilation, but that ignores the fact that it's probably one of the most complex and pitfall-ridden languages to write. The speed could be matched by e.g. a Python interface to already compiled C++ library objects, in which the critical path operations, like loops, take place. (There's already a nice example of why internalising map operations can be fast in the form of Google's MapReduce algorithm.)

Here's a couple of other points I disagree with :)

I disagree with your point that all particle physics researchers these days have to know C++ very well and, by implication, write it well. In my experience, many HEP researchers write dismal C++. But that's opinion, so let's move on :)

There's an assumption above that the critical path is in the speed of the code. However, there's also the substantial time spent getting your code to work, which is likely to be greater when writing C++ than with a rapid deployment scripting language. If the number crunching is done in compiled code, then you *can* have your cake and eat it. SciPy (www.scipy.org) does this with great success but - whoops - ROOT can't get any benefit from NumPy arrays.

> By the way, why do you use C++ in CEDAR, or RIVET or its successor?

Because we have no intention of using it for interactive sessions! Rivet also needs to be able to interface with native code libraries from Fortran and some rather complex C++, which is difficult to do from Java. Of course, C++ is also "the official LCG language", but as we're not part of LCG we chose it more according to its merits. We're also using the STL containers, traits and other functionality, without which C++ is pretty primitive, so I'd say our use of C++ is distinctly different from ROOT's.

The rest of CEDAR's code is written in Java, with a lot of the HepForge functionality written in Python. No silver bullet :)

>> Memory management / object ownership
>> ------------------------------------

> There are many reasons why (by default) we
> take the object ownership. As always when taking this kind of
> decisions, you have pros and cons. I imagine the reactions from very
> unhappy users if we were forcing them to manage objects like
> histogram. And I repeat, you can manage these objects yourself

I didn't know that the object ownership stuff could be forced off. And yes, it's a design decision. Admittedly, it would be a moot problem if ROOT used the Boost (http://www.boost.org/) or other shared pointers, e.g. http://www.boost.org/libs/smart_ptr/smart_ptr.htm .

Or if Java had been used instead, then the user would never need to be exposed to memory management at all, thanks to the built-in garbage collector (Jamie Zawinski, as ever, has something to say about GCs: http://www.jwz.org/doc/gc.html). Since ROOT has reinvented so many other Java features, a garbage collector wouldn't be too surprising :)

So yes, if you've decided not to support shared pointers, then it is just a point of view thing. Maybe I should have made my question "why don't you use smart pointers to alleviate memory management problems?"
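To make the smart-pointer question concrete, here is a minimal sketch of shared ownership. It is only an illustration under stated assumptions: `Histogram`, `Directory` and `book` are hypothetical names, not ROOT classes, and it uses `std::shared_ptr`, the standard-library descendant of the Boost shared pointer linked above.

```cpp
#include <cstddef>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical stand-in for a histogram; not a ROOT class.
struct Histogram {
    explicit Histogram(std::size_t nbins) : bins(nbins, 0.0) {}
    void fill(std::size_t bin, double weight = 1.0) { bins.at(bin) += weight; }
    std::vector<double> bins;
};

// Hypothetical container that shares ownership of the histograms it holds.
struct Directory {
    void add(std::shared_ptr<Histogram> h) { contents.push_back(std::move(h)); }
    std::vector<std::shared_ptr<Histogram>> contents;
};

// Create a histogram, register it in a directory, and hand it back.
// Both the directory and the caller hold a reference; the object is
// deleted only when the last reference goes away.
std::shared_ptr<Histogram> book(Directory& dir, std::size_t nbins) {
    auto h = std::make_shared<Histogram>(nbins);
    dir.add(h);
    return h;
}
```

With this arrangement neither side has to be declared "the owner": clearing the directory leaves the caller's histogram valid, and dropping the caller's pointer leaves the directory's copy valid.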

Incidentally, I'm not the only one who finds current ROOT memory management awkward: it wasn't even one of my criticisms until a bunch of other people contacted me to say it was their number one complaint!

>> Global state
...

>> So, in other words, the order of semantically unrelated statements
>> can matter due to hidden state variables. What's the justification
>> for such subtle and invisible dependence on the state? Doesn't this
>> create pitfalls in development of user code?

> What do you do when you want to refill your car with gas? You take
> your car and open the tank. Right?

Uhhm, how's that related? :)

> Yes, the penalty is
> a global object gDirectory. I know the problems with that
> (multithreading), but I believe that the pros outweighs the cons.

My complaint isn't that there *is* a global state: that's debatable. It's that classes like trees and histograms, even if they're only ever memory-resident, are over-sensitive to that global state and - worse - manipulate it via invisible side-effects. So before you even get to points about design strategy, this question was "why is the global state so badly implemented?"
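A small sketch of the kind of order dependence meant here. Everything in it is hypothetical (`Dir`, `g_current`, `ImplicitHist`, `ExplicitHist` are illustration names, not ROOT's); it only shows how a constructor that consults a hidden global makes the order of otherwise unrelated statements matter, and how passing the owner explicitly removes that.

```cpp
#include <string>
#include <vector>

// Hypothetical sketch; none of these names are ROOT's.
struct Dir {
    std::vector<std::string> names;
};

Dir* g_current = nullptr;  // hidden global "current directory"

// Implicit style: the constructor consults the global and registers itself.
struct ImplicitHist {
    explicit ImplicitHist(const std::string& name) {
        if (g_current) g_current->names.push_back(name);  // invisible side effect
    }
};

// Explicit style: the dependency is visible at the call site.
struct ExplicitHist {
    ExplicitHist(const std::string& name, Dir& owner) { owner.names.push_back(name); }
};

// Returns which directory ended up holding "h": 'a', 'b', or '?'.
// Moving the directory switch before or after the histogram's construction
// changes the answer, although neither statement mentions the other.
char where_does_h_land(bool switch_first) {
    Dir a, b;
    g_current = &a;
    if (switch_first) g_current = &b;   // e.g. "open another file"
    ImplicitHist h("h");                // lands wherever g_current points now
    if (!switch_first) g_current = &b;
    g_current = nullptr;                // do not leave a dangling global behind
    if (!a.names.empty()) return 'a';
    if (!b.names.empty()) return 'b';
    return '?';
}
```

In the explicit version the same reordering cannot change where the histogram is registered, because the owner is named in the constructor call.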

>> Reinvention / compatibility
>> ROOT encourage use of existing standard tools like the C++ STL
>> (encouraged pretty much everywhere else in my experience) and the 
>> established HDF data formats (which pre-date and provide much of
>> the functionality of the ROOT format)? In the case of the STL, is
>> there any reason for non-use of templated containers in ROOT other
>> than that CINT doesn't understand them properly? In the case of
>> HDF5, wouldn't it be better to extend it for object storage than to
>> re-invent the wheel?

> I think that you have a real fixation on HDF and I am convinced that
> you know ZERO about HDF and HDF5. I know HDF and I am totally
> convinced that the ROOT file design and Tree structure is far more
> powerful than HDF.

I wish I could comment, because HDF5 is well-documented and I've had a look through the design docs, although not in the detail I'd need in order to extend it for object persistency. By comparison, I can't find any definition documents for the ROOT format. Apparently, neither could Julius Hrivnac when he had to write a Java ROOT file writer - his description was that the design is "a horror" - but I can't comment on that personally.

But while we're close to the topic, how come the ROOT format doesn't implement transient-persistent separation, despite that being a requirement of the LCG Persistency Blueprint document?

I was in a meeting a couple of weeks ago about 4-vector storage in HepMC and it transpired that changing a private member variable would render the stored ROOT file data unreadable! Transient-persistent separation could avoid that problem, if well-designed.
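For readers unfamiliar with the term, a minimal sketch of transient-persistent separation follows. It assumes a hypothetical `FourVector` class and a hypothetical stored layout `FourVector_p1`; neither is the HepMC class nor the ROOT on-disk format. The idea is that only the persistent struct is written out, so the private members of the transient class can change as long as the two converters are kept up to date.

```cpp
#include <array>

// Persistent form: a fixed, versioned layout, the only thing written to disk.
// Hypothetical; not the HepMC or ROOT on-disk format.
struct FourVector_p1 {
    std::array<double, 4> data;  // px, py, pz, E
};

// Transient form: what user code works with. Its private members may change
// without invalidating stored data, provided the converters are updated.
class FourVector {
public:
    FourVector(double px, double py, double pz, double e)
        : px_(px), py_(py), pz_(pz), e_(e) {}
    double px() const { return px_; }
    double py() const { return py_; }
    double pz() const { return pz_; }
    double e() const { return e_; }
    double mass2() const { return e_ * e_ - px_ * px_ - py_ * py_ - pz_ * pz_; }
private:
    double px_, py_, pz_, e_;
};

// Converters are the single place that knows about both representations.
FourVector_p1 to_persistent(const FourVector& v) {
    return FourVector_p1{{v.px(), v.py(), v.pz(), v.e()}};
}

FourVector to_transient(const FourVector_p1& p) {
    return FourVector(p.data[0], p.data[1], p.data[2], p.data[3]);
}
```

If the transient class later stored, say, a different internal parameterisation, only `to_persistent` and `to_transient` would need to change; files written with `FourVector_p1` would stay readable.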

> We also notice that the main requests are for conversions from HDF
> formats to ROOT format.

I wouldn't rule out selection effects in that observation: people who wish to use HDF rather than ROOT are unlikely to contact the ROOT developers!

> Concerning STL, I have much to say about it.

[...]

Your point that STL evolved alongside ROOT is certainly true. However, now that it's part of the C++ standard and is certainly the accepted way to write C++ code outside of ROOT, the argument for not using it in a visible way is getting weaker (IMO).

> However, templates should
> be more dynamic, in particular when using them in an interactive
> environment, but I have no time to discuss this.

That's a shame, because I think it's an interesting point: templates are dynamic through polymorphism, i.e. good class hierarchy design. I'd be interested to know of a circumstance where they aren't flexible and generic enough, as much for personal information as anything else.

> There are many
> places in ROOT where we use more and more STL, but we cannot change
> the main collections using ROOT collections for STL collections.

And this isn't just because of CINT's inability to understand them? And what's the excuse when it comes to STL components like std::string, which are actually static template instances and are much safer and more flexible than char*?

>> * All histograms inherit from the 1D TH1. Therefore TH1F, for
>> example, has a GetZaxis() method, which should never be used! So
>> the class design renders object polymorphism pointless.

> I am afraid that you are wrong. TH1 objects may have a 3-d
> representation (via LEGO options) and we will take advantage of the
> zaxis to control the thickness of the Z view when viewing all 2-d
> objects (and in particular) histograms with the 3-D GL viewer. This
> was an excellent decision.

Again, this is just personal opinion, but I find that fairly incomprehensible, and it sounds more like an opportunistic twist on an early mistake. So what's the point of that GetZaxis() when NOT using a Lego representation? How is this an improvement over having external "HistogramPainter" classes?

>> * Histogram data and presentation are conjoined: if I make a
>> histogram const to protect the sanctity of my data, I also can't
>> change its colour. D'oh.
>> * and more...

>
> And what? Do you mean that if you want to specify a graphics
> attribute (all in TAttLine, TAttFill, TAttMarker and TAttText) you
> will pass the info as a draw option. I would be delighted if you
> could submit a proposal in this direction. What a mess!!

No, I mean to use external classes to present the data in graphical form, rather than tying the ideas of data and presentation together. Then you can pass graphics attributes whichever way you like - you just pass them to a "HistogramPainter" rather than a histogram itself.
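A rough sketch of what such a separation could look like. The class names `Histogram1D` and `HistogramPainter` and all their methods are hypothetical, chosen only to illustrate the idea; the text "rendering" is a placeholder for real graphics. The point is that the painter takes the data by const reference, so the data can stay const while the colour is changed freely.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Data only. Hypothetical; not ROOT's TH1.
class Histogram1D {
public:
    explicit Histogram1D(std::size_t nbins) : bins_(nbins, 0.0) {}
    void fill(std::size_t bin, double w = 1.0) { bins_.at(bin) += w; }
    std::size_t size() const { return bins_.size(); }
    double content(std::size_t bin) const { return bins_.at(bin); }
private:
    std::vector<double> bins_;
};

// Presentation only: holds the style and a const reference to the data.
class HistogramPainter {
public:
    explicit HistogramPainter(const Histogram1D& h) : hist_(h) {}
    void set_colour(const std::string& c) { colour_ = c; }
    const std::string& colour() const { return colour_; }
    // Crude text rendering: one row of '#' per bin, truncated to whole
    // units; non-positive contents give an empty row.
    std::string render() const {
        std::string out;
        for (std::size_t i = 0; i < hist_.size(); ++i) {
            const double c = hist_.content(i);
            const std::size_t n = c > 0.0 ? static_cast<std::size_t>(c) : 0;
            out += std::string(n, '#');
            out += '\n';
        }
        return out;
    }
private:
    const Histogram1D& hist_;
    std::string colour_ = "black";
};
```

The painter must not outlive the histogram it refers to; a shared pointer would be one way to enforce that, at the cost of a little more ceremony.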

>> What's the justification for allowing the histogram classes
>> (probably the most widely used classes in ROOT) to remain so poorly
>> implemented? (There are, of course, other examples, but I know the
>> histogramming fairly well)

> I do not know what you mean by poorly implemented. There are
> certainly many places for improvements. Many of these classes were
> the first classes designed in C++. Let me know if you find a place
> where you could gain in code size, performance, etc. Your
> contribution will be acknowledged.

Sorry, I should have said poorly designed: the design is a bad implementation of the idea of representing data :)

>> Reflex, Mathcore/Mathmore and Minuit++
>> --------------------------------------
>> Can the developers comment
>> on whether these packages, now part of the ROOT project, will
>> remain usable without any dependencies on the rest of the ROOT
>> libraries? I can see no reason for the objects in these classes to
>> inherit from TObject, for example, and to do so would greatly
>> reduce their usefulness to areas of HEP code which don't use ROOT.
>> Comments?

> Reflex and MathCore are independent of the other ROOT libs. MathMore
> depends on GSL and may depend on other things in the future. Minuit2
> has two components: one independent of ROOT, the other based on ROOT
> because it implements an interface for TVirtualFitter. However, note
> that in order to use MathCore, MathMore in an interactive
> environment, you need the dictionaries that will depend on CINT and
> libCore. As pointed out by Konstantin, this is only a question of
> religion, mainly raised by those not understanding the advantages of
> a coherent framework. The requirements to keep these libraries
> independent force independent build mechanisms and additional work.

"this is only a question of religion, mainly raised by those not understanding the advantages of a coherent framework": condemning "religious" views and then presenting one :) I have no problem with a coherent framework, I just want to be able to use one other than ROOT.

Your argument can be extended until all windowing systems, readline libraries and other system libraries have to be included in ROOT in the name of "coherence"! It seems to me that your concept of the dependency is the wrong way around, but as long as I can continue to use Minuit without the need for dictionaries and libCore, I'm happy.

> I did not see in the discussions any worries/concerns about the ROOT
> persistency mechanism. I believe that this is a big success and
> probably the main reason why ROOT is so successful.

http://hrivnac.web.cern.ch/hrivnac/Blog/#2006.06.30

and, I'm afraid, many private emails to me that I'm not going to reproduce. And my experience in the HepMC/CLHEP discussion recently was that many senior computing people in LCG and the experiments regard ROOT persistency (and in particular the fact that it doesn't separate transient and persistent objects) as deeply problematic.

>> Thanks to the ROOT team for (hopefully) taking the time to address
>> my concerns. I'd like to re-iterate that I have no reason for
>> opposition to ROOT, no other wares to sell: I'm just enumerating
>> the reasons why I stopped using ROOT. If these issues are
>> addressed, then maybe I'll go back to using it :-)

> Your intentions are not clear. For sure you started this thread as a
> very strong opponent to ROOT. An example are your links to some
> non-sense pages from one or two well known opponents, I have no
> illusion to convince you to use it again ::)

That's sad: if I thought ROOT would do what I need in a useful way, I would use it, regardless of the name and any personal politics. I didn't start using ROOT as a "very strong opponent" --- that came naturally as a result of learning how OO *should* be done. But let's avoid the personal stuff, yes?

> More in the coming days or weeks.

Thanks for taking time to reply. I can't say I agree with (m)any of your points, but it's been informative :)

Andy

Received on Fri Aug 04 2006 - 20:10:33 MEST
