Root_numpy

Hi all,
This is not exactly a PyROOT topic, but I think it may be useful for people who use numpy with ROOT, especially in an IPython notebook.

I found myself spending a lot of time waiting while large data files were read in and converted to numpy arrays.

So I wrote a small CPython extension that reads a ROOT file and converts it to a numpy structured array. It is a C++ extension that calls the ROOT library directly, which is where most of the speed improvement comes from.

github.com/piti118/root_numpy

A tutorial is also included in the package.

Very nice! Would you be interested in including this in rootpy?

https://github.com/rootpy/rootpy

See: https://github.com/rootpy/rootpy/blob/master/rootpy/root2array.py

Your C extension could greatly speed up these functions.

Noel

Hello, I want to read a dataset from a *.root file and then process it with Python and numpy, but it seems that root_numpy cannot handle complex branches, such as a branch that is a vector. Is that right? Do you have any suggestions for how to read data with a structure like this:


*Tree :TruthTree : Truth Tree *
*Entries : 1000 : Total = 94806134 bytes File Size = 48397513 *

* : : Tree compression factor = 1.94 *

*Br 0 :Event : Event/I *
*Entries : 1000 : Total Size= 7278 bytes File Size = 238 *
*Baskets : 2 : Basket Size= 16017 bytes Compression= 17.45 *

*Br 1 :Event_Number : vector *
*Entries : 1000 : Total Size= 36518 bytes File Size = 4153 *
*Baskets : 2 : Basket Size= 16017 bytes Compression= 5.34 *

*Br 2 :Event_Nparticles : vector *
*Entries : 1000 : Total Size= 30070 bytes File Size = 4977 *
*Baskets : 2 : Basket Size= 16017 bytes Compression= 3.66 *

*Br 3 :Event_ProcessID : vector *
*Entries : 1000 : Total Size= 30062 bytes File Size = 1929 *
*Baskets : 2 : Basket Size= 16017 bytes Compression= 9.43 *

*Br 4 :Event_Weight : vector *
*Entries : 1000 : Total Size= 36518 bytes File Size = 1990 *
*Baskets : 2 : Basket Size= 16017 bytes Compression= 11.15 *

Hi,

what is the goal? An std::vector can be read as an std::vector. To use it as a numpy array, it would need to be copied over. The problem with vectors is that they’re templates, so you’d need a converter method for each different type if you want that code to be in C++, as is done in root_numpy.

Cheers,
Wim

My goal is to read the data from a tree (whose structure I posted above) and then use this data for some machine learning tasks.
Right now the urgent task is to read the data and put it into an array.
I’m not familiar with ROOT and PyROOT, so I would like to know whether there is an easy way to read the data directly. Thanks.

Hi,

something simple like this should work for access:[code]for event in mytree:
    npart = event.Event_Nparticles
    for N in npart:
        print(N)
[/code]
Not too efficient (and can’t be in pure python, but that’s why I’m working on CppyyROOT). There’s no way to get to the underlying memory of an std::vector, so element-wise copy is the only option. You could hand it, since it’s iterable, to a python array, though:[code]>>> from ROOT import std
>>> import array
>>> array.array('i', std.vector(int)(10))
array('i', [0, 0, 0, 0, 0, 0, 0, 0, 0, 0])
[/code]
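For a whole tree, the same element-wise copy could be collected into one flat array plus per-event offsets, which is often a convenient shape for later numerical work. The following is only a sketch: `flatten_vectors` is a hypothetical helper, and `fake_events` is made-up data standing in for the `event.Event_Nparticles` values you would get while looping over the tree, so that it runs without ROOT.

```python
from array import array


def flatten_vectors(events, typecode="i"):
    """Copy variable-length per-event sequences into one flat array.

    Returns (values, offsets); the entries of event k are
    values[offsets[k]:offsets[k + 1]].
    """
    values = array(typecode)
    offsets = array("l", [0])
    for vec in events:
        values.extend(vec)           # element-wise copy of any iterable
        offsets.append(len(values))  # where the next event starts
    return values, offsets


# Made-up stand-in for one vector branch read over three tree entries.
fake_events = [[3, 5], [], [7]]
values, offsets = flatten_vectors(fake_events)
```

With a real tree, the argument would presumably be a generator such as `(event.Event_Nparticles for event in mytree)`; that part is untested here.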
Cheers,
Wim

Like Wim said, each vector type needs to be instantiated by hand.
I just added support for vectors of a few frequently used types (int, float, double, long, char) in the HEAD version.

This is done with memcpy from &((*v)[0]), relying on the guarantee that std::vector stores its elements contiguously.
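
To illustrate the idea without ROOT, here is a rough Python analogue using only the standard library. It is not the root_numpy code (that part is C++); a ctypes array stands in for the contiguous storage behind the vector, and `CTYPE_TO_TYPECODE` and `copy_contiguous` are hypothetical names made up for this sketch. The table plays the role of the per-type instantiation, and `ctypes.memmove` plays the role of memcpy.

```python
import ctypes
from array import array

# One entry per supported element type, mirroring the need for one
# converter per vector type. The mapping is illustrative only.
CTYPE_TO_TYPECODE = {
    ctypes.c_int: "i",
    ctypes.c_long: "l",
    ctypes.c_float: "f",
    ctypes.c_double: "d",
    ctypes.c_char: "b",
}


def copy_contiguous(src):
    """Bulk-copy a contiguous ctypes array into a new array.array."""
    elem_type = src._type_
    typecode = CTYPE_TO_TYPECODE[elem_type]  # KeyError for unsupported types
    n = len(src)
    # Pre-size the destination with zero bytes, then overwrite it in one copy.
    dst = array(typecode, bytes(n * ctypes.sizeof(elem_type)))
    if n:
        dst_addr, _ = dst.buffer_info()
        ctypes.memmove(dst_addr, ctypes.addressof(src), ctypes.sizeof(src))
    return dst


# Stand-in for the contents of a vector of doubles.
vec = (ctypes.c_double * 4)(1.5, 2.5, 3.5, 4.5)
out = copy_contiguous(vec)
```

The single bulk copy only works because the source elements sit next to each other in memory, which is the same property the C++ code relies on for std::vector.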

github.com/piti118/root_numpy