PyPyROOT: high performance PyROOT (first beta)

by wlav » Thu Sep 05, 2013 22:45

Dear all,

PyPy (http://pypy.org) offers a highly compatible version of the Python 2.7 interpreter and comes with a just-in-time compiler (JIT) that can greatly speed up certain types of computing tasks, in particular mathematics done in loops, as is common in HEP analysis codes.
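As an illustration of the kind of code meant here, the following is a minimal sketch of a pure-Python arithmetic loop. The function name and the data layout (tuples of E, px, py, pz) are assumptions made for this example only; nothing in it depends on ROOT, and no particular speedup is claimed.

```python
import math

def invariant_masses(pairs):
    """Return the invariant mass of each pair of four-vectors.

    Each four-vector is a tuple (E, px, py, pz); this layout is an
    assumption made for the sketch, not a ROOT convention.
    """
    masses = []
    for (e1, px1, py1, pz1), (e2, px2, py2, pz2) in pairs:
        # Sum the two four-vectors component by component.
        e = e1 + e2
        px = px1 + px2
        py = py1 + py2
        pz = pz1 + pz2
        # m^2 = E^2 - |p|^2; clamp small negative values from rounding.
        m2 = e * e - (px * px + py * py + pz * pz)
        masses.append(math.sqrt(m2) if m2 > 0.0 else 0.0)
    return masses

# Two back-to-back massless particles with energy 45 each.
pairs = [((45.0, 45.0, 0.0, 0.0), (45.0, -45.0, 0.0, 0.0))]
masses = invariant_masses(pairs)
```

The same file runs unchanged under CPython and PyPy, so the two interpreters can be timed against each other on a realistic input size.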

Where PyPy breaks compatibility is in the use of extension libraries, especially those that rely on the internals of the CPython interpreter. PyROOT is one of them, so until now it did not work on PyPy.

Now, a first beta release of PyPy supporting PyROOT is available here:

/afs/.cern.ch/sw/lcg/external/pypy

Run the setup script (which sets up the proper CPython, ROOT, and GCC):

/afs/.cern.ch/sw/lcg/external/pypy/x86_64-slc6/setup-pypyroot.sh

which will make the executable 'pypyroot' available. It can be used just like the normal CPython interpreter. For example:

Code:
   $ pypyroot
   >>>> import ROOT
   >>>> c1 = ROOT.TCanvas()
   >>>> # etc.


I hope to collect feedback over a period of about a month or so, then cut a release 1.0 by CHEP. Please help out by trying it out!

For performance improvements, the "compilation" part of the JIT is only half the story. Far more important are the recognition of use cases by PyPy and the specialization of them. One such case is ROOT I/O: TTree reading, for example, runs at C++ speeds in pypyroot (it can be 50x slower in PyROOT). As you can imagine, that is a long (but worthwhile) work in progress.

Other developments by the PyPy team that are important for the future of high performance Python codes include automatic thread safety using software transactional memory and the use of vector instructions in the JIT for numpy code. Likewise, these are works in progress, but they are coming along nicely.

Compatibility with PyROOT is not 100%, and in some ways never will be. For example, PyPy uses a garbage collector rather than reference counting, and "from ROOT import *" cannot work together with the JIT. However, both of these can simply be worked around to have code that works fine on both interpreters (e.g. consistently use "import ROOT"), and the list of remaining features to-be-done is rather small and shrinking. (1)
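To illustrate the garbage-collection point: under reference counting, cleanup runs as soon as the last reference goes away, whereas a garbage collector may run it at some later time. Code that releases resources explicitly behaves the same on both interpreters. The sketch below uses a hypothetical OutputLog class as a stand-in for any object that owns an external resource (a file, for instance); it is not part of ROOT or PyPy.

```python
class OutputLog(object):
    """Hypothetical resource owner, used only for illustration."""

    def __init__(self, sink):
        self.sink = sink      # list that receives the written records
        self.closed = False

    def write(self, text):
        if self.closed:
            raise ValueError("write to closed log")
        self.sink.append(text)

    def close(self):
        # Explicit cleanup: does not depend on when (or whether) the
        # interpreter decides to destroy this object.
        if not self.closed:
            self.sink.append("<flushed>")
            self.closed = True

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        self.close()
        return False          # do not swallow exceptions

sink = []
with OutputLog(sink) as log:
    log.write("event 1")
# At this point the log has been closed deterministically.
```

The same idea applies to ROOT objects that hold files open: calling their close method explicitly, rather than relying on the object going out of scope, avoids depending on the interpreter's memory management.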

Furthermore, comparing pypyroot today to what PyROOT had in functionality when it was first included in ROOT, back in 2004, it is clear that pypyroot is light years ahead of that version. As such, it is already very useful for any standalone ROOT work. (2)

A list of C++ features supported can be found here (this is documenting the Reflex backend, but it's the same feature-list for CINT):

http://doc.pypy.org/en/latest/cppyy.html#features

In addition, for the CINT backend, pypyroot has several ROOT-specific pythonizations, such as for TObject, TTree, TString, etc.

When ROOT6 comes out, the CINT backend of pypyroot is ready to be swapped out for a Cling backend. The latter allows for a tighter integration with the JIT (as has been shown with the Reflex backend, which is part of PyPy by default since release 2.0).

Please try it out and give me feedback. Thanks!

Best regards,
Wim Lavrijsen

(1) See: http://root.cern.ch/drupal/content/pypyroot for a short-list
(2) If you use a lot of (C-)extension modules, not bound using rootcint, some of your favorite ones may not (yet) be available for PyPy.
wlav
 
Posts: 1221
Joined: Mon Jun 14, 2004 18:40
Location: Lawrence Berkeley National Lab

Re: PyPyROOT: high performance PyROOT (first beta)

by jfcaron » Sun Sep 08, 2013 7:15

Is there a way for me to try PyPyROOT on my own machine outside of CERN?

I can get PyPy and ROOT from MacPorts, but does ROOT need to be rebuilt to work with PyPy? Can I just try to import ROOT from PyPy naïvely?

Jean-François
jfcaron
 
Posts: 225
Joined: Fri Apr 01, 2011 10:49

Re: PyPyROOT: high performance PyROOT (first beta)

by wlav » Sun Sep 08, 2013 17:16

Jean-François,

looking at MacPorts, I see their version of PyPy is 2.0.2, so it will have cppyy. The default used, however, is the loadable backend so that pypy does not need to be linked with any C++ libraries. This backend needs to be installed separately, and I only have that for Reflex. The problem with CINT is that certain features are baked into pypy (e.g. TString converters and the RecursiveRemove callback), until I figure out a decent API for adding such functionality at the user-level.

You can, of course, install pypy from source. Details are here:
http://doc.pypy.org/en/latest/cppyy.html#installation

I recommend doing the 'hg up reflex-support', since I have not yet moved the latest changes (virtually all for CINT) to default. To enable the CINT backend, you need to modify two files. First, select the builtin_capi, then select from there cint_capi. Details are here:
http://doc.pypy.org/en/latest/cppyy_backend.html

All this is going to be cleaned up with ROOT6 and the LLVM backend. :)

On lxplus, I also needed to install libffi (the shared library is easiest to deal with, so if you build libffi from source, use --enable-shared).

Beyond that, yes, theoretically ROOT will need to be rebuilt as well, but only libPyROOT. However, that should only be needed to get TPython to work; all the rest is fine. But I'm thinking of including that code in the CINT backend directly, since the headers in PyROOT only contain forward declarations that should pre-empt the code in libPyROOT, and all should work. Haven't done that yet, though.

Thanks for trying it out!

Cheers,
Wim

Re: PyPyROOT: high performance PyROOT (first beta)

by jfcaron » Mon Sep 09, 2013 21:07

I followed the procedure at your first link, including enabling the CINT backend. The translation took 2 hours with MacPorts' CPython (MacPorts' PyPy didn't work; it gave an error about the __pycache__ files):

Code:
jfcaron@jfcaron-MacBook:~/Projects/PyPyRoot/pypy$ pypy rpython/translator/goal/translate.py --opt=jit pypy/goal/targetpypystandalone.py --withmod-cppyy
Traceback (most recent call last):
 File "app_main.py", line 72, in run_toplevel
 File "rpython/translator/goal/translate.py", line 89, in <module>
   log = py.log.Producer("translation")
 File "/Users/jfcaron/Projects/PyPyRoot/pypy/py/_apipkg.py", line 114, in __makeattr
   result = importobj(modpath, attrname)
 File "/Users/jfcaron/Projects/PyPyRoot/pypy/py/_apipkg.py", line 37, in importobj
   module = __import__(modpath, None, None, ['__doc__'])
 File "/Users/jfcaron/Projects/PyPyRoot/pypy/py/_log/log.py", line 184, in <module>
   setattr(Syslog, _prio, getattr(py.std.syslog, _prio))
 File "/Users/jfcaron/Projects/PyPyRoot/pypy/py/_std.py", line 13, in __getattr__
   m = __import__(name)
 File "/opt/local/lib/pypy/lib_pypy/syslog.py", line 68, in <module>
   lib = ffi.verify("""
 File "/opt/local/lib/pypy/lib_pypy/cffi/api.py", line 311, in verify
   lib = self.verifier.load_library()
 File "/opt/local/lib/pypy/lib_pypy/cffi/verifier.py", line 68, in load_library
   self.compile_module()
 File "/opt/local/lib/pypy/lib_pypy/cffi/verifier.py", line 55, in compile_module
   self._write_source()
 File "/opt/local/lib/pypy/lib_pypy/cffi/verifier.py", line 117, in _write_source
   file = open(self.sourcefilename, 'w')
IOError: [Errno 2] No such file or directory: '/opt/local/lib/pypy/lib_pypy/__pycache__/_cffi__g7019d5d3xad93c709.c'


After translation, I tried the "Basic bindings example" from your link, but it seems to not recognize the MyClass:
Code:
>>>> import cppyy
>>>> cppyy.load_reflection_info("libMyClassDict.so")
<CPPLibrary object at 0x0000000106933370>
>>>> myinst = cppyy.gbl.MyClass(42)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: <class '__main__.::'> object has no attribute 'MyClass' (details: '<class '__main__.::'>' has no attribute 'MyClass')
>>>> "MyClass" in dir(cppyy.gbl)
False


A few extra notes:
I had to install gccxml-devel from MacPorts to get genreflex to work.
I had to remove the -rdynamic switch from the g++ call, because apparently that's on by default on OSX.
I did install libffi from MacPorts, but it's not clear at what step it is linked in.

So I am still a few steps behind trying PyPyROOT, as cppyy is not working yet. I'd be happy to try any other recommendations to get it working so I can test my PyROOT code. Unfortunately I'm not very good at getting stuff to compile/link, but I can report error messages!

Jean-François

Re: PyPyROOT: high performance PyROOT (first beta)

by wlav » Mon Sep 09, 2013 21:45

Jean-François,

for the first, I'm not sure, but it might be an access-rights issue to the __pycache__ directory. It is surprising though, that cffi caches are created in the installed pypy version. Maybe Armin or Maciej will have an answer later on the dev list (most folks hang out on IRC though, rather than on dev).

As for the example: yes, the instructions there are for Reflex. With the CINT backend, Reflex is not supported. Theoretically I could mix them natively (rather than through Cintex), but then you'd lose I/O for Reflex classes, which isn't much of an option either. The CINT backend is for ROOT only (PyCintex.py can be modified to make it work, though, just haven't done so yet). LLVM/ROOT6 will unify all.

Easiest with CINT is just to use ACLiC. This should work:
Code:
>>>> import cppyy
>>>> cppyy.gbl.gROOT.LoadMacro("MyClass.h+")
>>>> myinst = cppyy.gbl.MyClass(42)

And then all you need is to pick up ROOT.py from /afs/.cern.ch/sw/lcg/external/pypy/x86_64-slc6/python/ROOT.py and that'd be it (although I have not tested this on a Mac yet).

(With the Reflex or the loadable backend, there won't be any 'gROOT' builtin.)

Thanks for trying! :)

Cheers,
Wim

Re: PyPyROOT: high performance PyROOT (first beta)

by jfcaron » Tue Sep 10, 2013 2:30

Thanks, I didn't realize that the example for cppyy was Reflex only and not CINT. I tried your example instead which compiles MyClass.h with ACLiC, and it works. I copied the ROOT.py from your afs address, and the basic test of creating a TH1F & Drawing also works.

Perhaps tomorrow I will try to run some of my analysis code with pypy-c.

Jean-François

Re: PyPyROOT: high performance PyROOT (first beta)

by jfcaron » Tue Sep 10, 2013 18:48

It somewhat works, but has weird failures! Read on for the whole story.

The times here were all measured using the bash built-in "time" on my fast MacBook. In reality the main processing was done on a cluster whose individual processors were much slower. The code doesn't do any parallel processing, but I ran hundreds of independent data chunks simultaneously. All the Python code avoids using NumPy because getting it set up properly on the cluster was painful, and I only really needed contiguous arrays for ROOT interoperability, so I mostly used the builtin Python arrays and my own array arithmetic extension.

For the project that I just finished, my analysis went in several stages. The first is a C++-only compiled ROOT program, so that's fixed (and takes ~tens of minutes to run per data chunk).

The second is a Python-only (using PyROOT) program. Here are the results:
CPython: ~2m45s
PyPy: ~1m0s

The third stage is a shorter all-Python script. With CPython it runs in ~14s, but with PyPyROOT it fails when I try to make a TF1. Here is the error message:
Code:
Traceback (most recent call last):
  File "app_main.py", line 72, in run_toplevel
  File "calculate_R.py", line 180, in <module>
    pois = ROOT.TF1("pois",Poisson(),clulims[0],clulims[1],2)
TypeError: none of the 8 overloaded methods succeeded. Full details:
  TF1::TF1() =>
    TypeError: wrong number of arguments
  TF1::TF1(const TF1&) =>
    TypeError: wrong number of arguments
  TF1::TF1(const char*, const char*, Double_t, Double_t) =>
    TypeError: wrong number of arguments
  TF1::TF1(const char*, Double_t, Double_t, Int_t) =>
    TypeError: wrong number of arguments
  TF1::TF1(const char*, ROOT::Math::ParamFunctor, Double_t, Double_t, Int_t) =>
    TypeError: cannot pass instance as ParamFunctor
  TF1::TF1(const char*, void*, Double_t, Double_t, Int_t) =>
    TypeError: 'CPPInstance' object expected, got 'instance' instead
  TF1::TF1(const char*, void*, Double_t, Double_t, Int_t, const char*) =>
    TypeError: wrong number of arguments
  TF1::TF1(const char*, void*, void*, Double_t, Double_t, Int_t, const char*, const char*) =>
    TypeError: wrong number of arguments

The code is here http://bazaar.launchpad.net/~jfcaron/+junk/TRIUMFBeamTest/view/head:/cluster_analysis/calculate_R.py if you wish to look at how I am using the TF1. The Poisson() call creates a Python functor object that internally calls ROOT::TMath stuff.
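For readers without access to the linked file, here is a minimal sketch of what such a functor can look like. It is a hypothetical stand-in, not the code from calculate_R.py: it follows the (x, par) calling convention that PyROOT uses for Python callables passed to TF1, uses the standard math module in place of TMath, and the meaning assigned to the two parameters (normalization and mean) is an assumption.

```python
import math

class Poisson(object):
    """Hypothetical stand-in for the functor passed to TF1 above."""

    def __call__(self, x, par):
        # x[0] is the point of evaluation; par[0] and par[1] are assumed
        # to be a normalization and the Poisson mean, respectively.
        k = x[0]
        norm, mean = par[0], par[1]
        if k < 0.0 or mean <= 0.0:
            return 0.0
        # Continuous extension of mean**k * exp(-mean) / k!, computed in
        # log space to avoid overflow for large k.
        log_p = k * math.log(mean) - math.lgamma(k + 1.0) - mean
        return norm * math.exp(log_p)

pois = Poisson()
value = pois([2.0], [1.0, 2.0])   # plain lists stand in for ROOT's buffers
```

Calling the object directly like this, without ROOT, is a quick way to check the functor on either interpreter before handing it to TF1.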

The fourth stage, the third Python script, makes some plots (all with ROOT-based stuff like TCanvases and TGraphs). With CPython it runs in ~3s, but PyPy again crashes with this error message:
Code:
Traceback (most recent call last):
  File "app_main.py", line 72, in run_toplevel
  File "FOMplots.py", line 1444, in <module>
    FOMplots(sys.argv)
  File "FOMplots.py", line 283, in FOMplots
    rescaleaxis(traces[c][i],sample_width/1e-9)
  File "FOMplots.py", line 98, in rescaleaxis
    g.SetHistogram(0)
TypeError: cannot pass int as TH1F

Again the code responsible for the crash is here: http://bazaar.launchpad.net/~jfcaron/+junk/TRIUMFBeamTest/view/head:/cluster_analysis/FOMplots.py It happens in a function that rescales a TGraph, and that code works fine in CPython.

So overall, I am happy that the first python stage works with PyPy, and magically runs faster. The other two show mysterious crashes in code that otherwise works in CPython. I should note that I had already put in some effort to optimize the first python analysis stage. For example it uses tons of memory to cache results rather than recompute them, so if one of PyPy's magic tricks is to speed up native python calculations, my caching would reduce the visible benefit from PyPy.

As before, I am willing to try modifications to get PyPy working for all the stages (or to track down the problem if it's something in PyPy(ROOT)). I am very excited by the prospect of using PyPyROOT for my current project (which is still in its infant-C++ stage).

Jean-François

Re: PyPyROOT: high performance PyROOT (first beta)

by wlav » Tue Sep 10, 2013 21:47

Jean-François,

cool, thanks for the feedback!

The exception for the TF1 occurs b/c I have yet to write the TF1/2/3 callback implementations and pythonizations. The second occurs because the code does not allow the integer '0' to pass through a pointer. I just need to write/add those. The latter is trivial to fix; the former is more work. I'll get to it.

As for caching ... it depends: PyPy does not so much memoize results, but rather elides them. This only works if the compute-heavy function call is completely side-effect free, and there are only a limited number of different inputs within a compiled trace (otherwise there is the risk of an explosion of combinatorics) or the inputs to the function are known to be constant over the scope of the trace. Those are hard requirements to meet.

Of course, if the cached function is relatively simple and can be inlined within a trace, then that may very well be faster than the lookup in the cache.

Thanks again,
Wim

Re: PyPyROOT: high performance PyROOT (first beta)

by wlav » Sat Sep 14, 2013 3:07

Jean-François,

back to this one. :)

So I have a working TF1 callback. Still, I'm seeing a crash when using it for a fit, but only post-translation. What I see in gdb seems simple to fix (NULL-check), but isn't for today anymore. I realize that a fit is what you need, not just the callback ...

The other TODO left is that errors are currently silently absorbed, but that is no more than annoying.

Performance seems fine (tested on plotting, not fitting), as I'm able to bring the callback first back to the interpreter, only then call the user function. Meaning, the user function is open for JIT-ing, and the penalty is 'only' in the song-and-dance through CINT. I do not know however if the code will warm up if the loop is in C++ (as is the case when doing a Fit). I would expect not, so that would require some JIT hints.

The other problem, passing an int 0 through a pointer, is fixed.

Code has been pushed on the reflex-support branch, so it can be tried out if you are adventurous, but I want to fix those (post-translation) errors and a few other feedback items before rebuilding on lxplus.

Thanks,
Wim