Do we need yet another custom C++ interpreter?
Hi,
"A ROOT User" asks "Is it really necessary to replace CINT dictionary with cling?", bringing up very reasonable concerns and arguments against re-implementing CINT. I will try to answer his comments to clarify why we do it, and how it connects with the rest.
A fundamental misconception is that the status quo is acceptable. It is not, for several reasons.
- CINT vs C++
CINT was designed (20 years ago!) to be a C interpreter; C++ support was added later. It still has many shortcomings with C++ 2003, let alone C++11.
- CINT maintenance
The original author of CINT, Masaharu Goto, has moved on; CINT has been maintained mainly by the ROOT team. It has 300k lines of code; that's a considerable fraction of ROOT's 2.5MLOC. It was designed to fit into an integrated processing unit of appliances (like medical ones) - not for 16GB RAM, 8 compute threads, 50000 class environments.
- Reflex and GCCXML solve it
ATLAS, CMS and LHCb use GCCXML to parse their headers, a set of Python scripts to parse the generated XML file and write a C++ source file, the Reflex dictionary, which then gets compiled, linked, loaded, its data injected into the Reflex reflection database, which then gets copied through Cintex into CINT. We thus have many duplications of strings (three in the worst case with Reflex) and conflicts between duplicate dictionaries in Reflex versus CINT (famous: "std::map<std::string, TH1*>" must not be described through Reflex). On top of that, GCCXML is a limited parser (e.g. it swallows typedefs in certain conditions, think Double32_t); as it uses the GCC parser this will not be fixed. I.e. the current setup is fragile, inefficient, and limiting.
- CINT is not relevant, I use PyROOT
For calling into C++, PyROOT relies on CINT's reflection data from ROOT (which is why it's so fantastic compared to static SWIG-based approaches). And ROOT relies on CINT for I/O, both the dictionaries and the interpreter. I.e. you use CINT much, much more than you think: it's not just the prompt, it is in the core of most of ROOT.
So we need to do something. C++ interpreters are extremely rare. Instead of rewriting a C++ interpreter we decided to reuse existing code. Code that we can still influence, but that's nevertheless production-grade. We expect that this will solve the maintenance and correctness issues. And because it's correct we don't need Reflex, but can instead use one central, fast (compiler!) reflection database.
So yes, this is a major overhaul of ROOT and the dictionaries. We will signal that with a new major ROOT version number. But we expect it to solve the correctness, stability, memory and CPU-consumption as well as the maintenance issues we currently have. The current implementation of cling (which is not yet complete) uses a mere 5000 lines of custom code developed by HEP; everything else is provided through LLVM and clang.
And regarding PyROOT: I am sure Wim will make good use of the new JIT power that comes with cling! Just like we expect the JIT to leave traces e.g. in TFormula, and the real reflection database in the I/O, THtml etc. It gets us unstuck, flexible and future-safe in many central areas of ROOT. Oh, the places you'll go!
Cheers,

Other Python bindings
Re: Other Python bindings
Hi Bram,
Thanks for your question. The main issue about the boost binding is that it is - as far as I understand - completely static and intrusive. PyROOT on the other hand is based on reflection data, and it has features that e.g. the Boost binding doesn't offer (e.g. the mapping of concepts). Other bindings (e.g. SWIG-based ones) are difficult to maintain, not compatible with C++, and don't offer PyROOT's features either. So the cost is both on the implementation side and the feature side. Thus why not simply use PyROOT? :-)
Note that we will soon have a PyROOT that builds on top of clang, as part of ROOT 6. I think Wim (the author of PyROOT) plans to port it to a version without ROOT, likely involving PyPy. So that might be exactly what you are looking for :-)
Cheers, Axel.
Re: Other Python bindings
Hi Bram, Axel,
let me add to that (and point out that none of the mentioned tools are intrusive, btw.). The biggest problem with boost.python (with pyste; standalone it is a non-starter) and SWIG is that you need to run a separate tool to create and compile bindings. On top, these bindings are compiled against a specific version of Python, making for a distribution headache (just see the non-pickup of Python3 because of this problem). Compare: dictionaries are already available for all the most important classes in experiments, the EDM, because they are generated for I/O needs. They also do not depend on Python, and thus not on any specific version (only PyROOT does). Besides the obvious ease of use, there is also the benefit of lower memory footprints by not replicating structures. (For that matter, PyROOT creates bindings lazily, the others do not.)
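To make the "bindings are created lazily" point concrete, here is a minimal sketch in plain Python. It is not PyROOT's implementation: the REFLECTION_DB literal, the LazyBindings class and the class and method names in it are all hypothetical stand-ins for dictionary data that would already exist for I/O.

```python
# Hypothetical reflection data: class name -> method names.
# In a real setup this would come from existing dictionaries, not a literal.
REFLECTION_DB = {
    "Track": ["pt", "eta"],
    "Jet": ["energy"],
}


class LazyBindings:
    """Namespace that builds a Python proxy class only on first access."""

    def __init__(self, reflection_db):
        self._db = reflection_db
        self.created = []  # records which proxies were actually built

    def __getattr__(self, name):
        # Only called when normal attribute lookup fails, i.e. on first use.
        if name.startswith("_") or name not in self._db:
            raise AttributeError(name)
        methods = {
            m: (lambda self, _m=m: f"{name}::{_m}() called")
            for m in self._db[name]
        }
        proxy = type(name, (object,), methods)
        self.created.append(name)
        setattr(self, name, proxy)  # cache: later lookups bypass __getattr__
        return proxy


cpp = LazyBindings(REFLECTION_DB)
track = cpp.Track()   # the proxy for Track is built here, on demand
print(track.pt())
print(cpp.created)    # Jet was never touched, so no proxy exists for it
```

Nothing is generated or compiled ahead of time in this sketch, and classes that are never used cost nothing beyond their reflection entry, which is the memory argument made above.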
Other problems we've had, are that boost.python is very, very slow and only in "keeping alive" mode since 2004 or so. Pyste is based on gccxml, so no C++11 there, and has seen no major updates since 2005. SWIG is much, much better in both regards, but not up to snuff: it plain and simply can not parse our header files. The way around that, is to write .i files, but as you can imagine, that duplication is not nice for maintenance. Worse, the developers of individual packages need to do this work, and not every C++ developer has Python, let alone SWIG, experience.
Then there's PyPy. All existing binding generator tools (including PyROOT) rely on CPython internals, or at least on the Python C-API. That does not jibe with PyPy as it has for example a garbage collector instead of reference counting. Through some heroics, it does expose a Python C-API, but it's slow as it interferes with (blocks, really) the just-in-time compiler. Therefore, within PyPy, there are two new approaches: cffi for C and cppyy for C++. Both are part of the standard PyPy releases. There is also already a PyROOT version for the latter (see: http://root.cern.ch/drupal/content/pypyroot).
Cheers,
Wim
Why?
Re: Why?
Hi Matt!
Thanks for your feedback; I'll try to reply to each of your comments one by one. I do not disagree with all of your comments, but I might have explanations for some of them :-) Sometimes you seem to confuse "backward compatibility" (which means "what used to work will continue to work") with "no change" - but that might just have been your motivation to take the time for writing your feedback, so I don't complain :-) Given the relevance of your comments I decided to reply in a separate blog post.
Cheers, Axel
Thank you for the very nice
Re: Backward Compatibility
Hi ROOT user,
Thanks for your comment! And yes, backward compatibility is key in this area. I will do all I can to reduce the amount of code we need to maintain only for backward compatibility reasons - e.g. Reflex can hopefully be removed instead of being rewired to tap the clang AST (i.e. the cling reflection database). But at the same time we will make sure that all data stored by the experiments remains readable (ideally even from 2001 :-).
This is mostly an issue of type names; CINT has some non-obvious (and non-standard compliant) naming conventions for types, and we must make sure that cling continues to understand them. Or we cannot read an edm::TaggedVector<edm::Jet> anymore (because CINT would have called it an edm::TaggedVector<Jet>).
We plan to release a snapshot of ROOT using cling in the third quarter of 2012; we will really appreciate feedback on problems with reading old files - as you correctly pointed out this is one of the most crucial ingredients of this project.
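To illustrate the type-name issue with old files, here is a small Python sketch of resolving a name as stored in a file against the fully qualified names a compiler-based reflection database would know. The rule used (qualify template arguments with the namespace of the outer type) is an assumption made up for this example; it is not CINT's actual naming convention, nor how cling handles it. The function names are hypothetical too.

```python
import re


def legacy_name_candidates(stored_name):
    """Yield the stored name, then a variant whose template arguments are
    qualified with the outer type's namespace (illustrative rule only)."""
    yield stored_name
    head, sep, tail = stored_name.partition("<")
    if not sep or "::" not in head:
        return
    namespace = head.rsplit("::", 1)[0]
    # Prefix identifiers that are not already part of a qualified name.
    qualified = re.sub(
        r"(?<![\w:])([A-Za-z_]\w*)(?![\w:])",
        lambda m: f"{namespace}::{m.group(1)}",
        tail,
    )
    yield head + sep + qualified


def resolve_type_name(stored_name, known_types):
    """Return the first candidate found among the known types, else None."""
    for candidate in legacy_name_candidates(stored_name):
        if candidate in known_types:
            return candidate
    return None


known = {"edm::TaggedVector<edm::Jet>"}
print(resolve_type_name("edm::TaggedVector<Jet>", known))
```

A candidate is only accepted if it matches a type that is actually known, so a wrong guess (for instance a builtin type that got a namespace prefix) simply fails to resolve instead of silently mapping to the wrong class.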
Cheers, Axel.
Thank you for clarifying a
Re: I/O Performance
Hi ROOT User,
We have dramatically improved the I/O performance over the last two years. If you use the latest production release also for writing data you might be able to see a performance improvement of an order of magnitude compared to e.g. 5.26, both in real and CPU time! See e.g. this blog entry.
We have been comparing the performance of ROOT I/O with competitors like Google ProtoBuf; we know exactly where we spend extra time and why, e.g. for schema evolution, proper C++ type support, introspection, pointers.
On the other hand, are you sure you make use of all the performance features ROOT offers? Did you enable the tree cache (on by default for PROOF and one tree per file, off - for now, still - otherwise)? Do you only read the branches you need? I am working on a new TTree read access class that should simplify all of that considerably (and is type safe - no more void*&!); maybe I should take your comment as an invitation to speed up :-)
Cheers, Axel.
Thanks again Axel for another
-- Rely on gcc with Reflex (or any other similar mechanism/compiler) to generate dictionaries for I/O and PyROOT. This probably does not require an interpreter.
-- Promote use of PyROOT for analysis macros instead of macros interpreted by CINT.
-- Maintain CINT for legacy reasons and plan an eventual phase out.
I suspect there is a good reason why C/C++ interpreters are rare - it is just not an easy/efficient way to run code. PyROOT is a fantastic tool and it can do everything that CINT can, with the advantage of having access to all Python libraries.
Anyway, this is just to clarify my point. I understand that you went through similar arguments and chose the development path best suited to the community.
Re: Interpreters
Hi ROOT user,
Thanks for your comments - they are excellent!
Your scenario would probably work - but we decided against it, and I believe that we have good reasons for that :-)
GCCXML's future is limited; there is a re-write based on GCC's plugin mechanism, but both suffer from the same problems: we cannot influence what the GCC parser does. And reading headers, writing XML, parsing XML, writing (huge files of) C++, compiling, linking, loading - that's really, really inefficient and error prone.
Python is much simpler than C++. But it's still a horrible language in our environment, unless it's used as bash++. Not a single algorithm should be written in Python: it's terribly hard to convert it into C++, and it's incredibly slow in Python (ask the Google developers about youtube).
So C++ is not a good interpreted language, mainly due to its syntactic verbosity and its lack of dynamic interfaces and reflection capabilities - think
Cheers, Axel
Hi Alex, Very good points
Very good points but let me try to defend Python. I have found that the following approach (used by ATLAS, and which I also adopted in my private code) works fantastically well:
-- Use python to read configuration, find input files, etc;
-- Write performance critical code in C++;
-- Create C++ objects in python (relying on ROOT for dictionary support);
-- Pass configuration from python to C++;
-- Do calculations in C++;
-- Return results to python for processing, plotting, etc;
-- Run entire plot making code in python for stacking, labeling, etc.
Granted, this is probably a more complex approach than most of us in physics are willing to tolerate. I suspect that you do not have much choice since the user community wants CINT-like functionality from ROOT (and one feature of the ROOT project that makes it great is a full consideration of what experiments and users need for data taking and analysis).
Thanks for the interesting discussion! I have learned quite a bit about ROOT plans and it all seems very promising. Cheers!
Dependency on Python
CINT need to be communitized, that's the whole problem
CINT and Open Source
Hi Daniel,
Thank you for your comment! As a matter of fact, CINT does not depend on ROOT at all. It is open source. It was used in commercial products independently of ROOT. I also don't see where the connection between cling and a python dependence comes in?
Given the amount of work that went into GCC to bring C++11 support I find it unrealistic that we (not compiler people!) would be able to lift CINT to C++11...
Cheers, Axel