Re: list of unique values from a TTree

From: Noel Dawe <Noel.Dawe_at_cern.ch>
Date: Fri, 25 May 2012 10:32:15 -0700


Hi Rob,

Another easy and fast way to do this is with numpy.unique [1] and rootpy's [2] tree_to_ndarray [3]:

> python
Python 2.7.2 (default, Aug 4 2011, 05:34:32)
[GCC 4.1.2 20080704 (Red Hat 4.1.2-50)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>>
>>> from rootpy.io import open
>>> from rootpy.root2array import tree_to_ndarray
>>> import numpy as np
>>>
>>> with open('test.root') as f:
...     tree = f.test
...     print np.unique(tree_to_ndarray(tree, branches=['x']))
...
[-4.31474733 -3.86986685 -3.37964296 ..., 4.45551538 4.61414433   4.66035557]
>>>

There is no limit on the length of the array here (other than your memory), and tree_to_ndarray is fast because the conversion is performed by a compiled C extension module.
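
As a minimal sketch of the numpy.unique step on its own, assuming tree_to_ndarray returns a structured array with one field per branch: the helper name unique_branch_values and the synthetic two-million-entry array below are made up for illustration and stand in for the real tree.

```python
import numpy as np


def unique_branch_values(records, branch):
    """Return the sorted unique values of one field and their counts.

    `records` is assumed to be a NumPy structured array with one field
    per branch (hypothetical stand-in for the tree_to_ndarray output).
    """
    return np.unique(records[branch], return_counts=True)


# Synthetic stand-in for a tree with a single branch 'x' and 2,000,000
# entries, i.e. more than the 1,000,000 that Draw() keeps by default.
rng = np.random.RandomState(0)
records = np.zeros(2000000, dtype=[('x', np.float64)])
records['x'] = rng.randint(-5, 6, size=records.shape[0])  # integers -5..5

values, counts = unique_branch_values(records, 'x')
print(values)
print(counts.sum())
```

Every entry is taken into account, so the counts add up to the full length of the array.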

Cheers,
Noel

[1] http://docs.scipy.org/doc/numpy/reference/generated/numpy.unique.html
[2] https://github.com/rootpy/rootpy
[3] https://github.com/rootpy/rootpy/blob/master/rootpy/root2array/root2array.py#L51

On Fri, May 25, 2012 at 9:13 AM, Rob Mahurin <rob_at_jlab.org> wrote:

> Thanks for the replies.
>
> On Fri, May 25, 2012 at 01:20:45PM +0000, Amnon Harel wrote:
> > I wonder how much quicker your PyROOT solution will be if you first:
> >
> > tree.SetBranchStatus( '*', 0 ) # disables all branches
> > tree.SetBranchStatus( 'scandata1', 1 ) # and enables the one we really need
>
> This gains me a little in speed, but it's still an order of magnitude
> slower than Draw() (tens of seconds versus less than a second).
>
> > AFAIK this is much harder in C++. I wrote a class that can handle most
> > cases (too often a complicated enough tree/chain will break a ROOT
> > event loop).
> > If you want to try it out:
> >
> > http://www-d0.fnal.gov/~aharel/value_lister.h
> > http://www-d0.fnal.gov/~aharel/value_lister.c
>
> Thanks, I will see if these meet my needs.
>
> On Fri, May 25, 2012 at 02:23:53PM +0100, Tim Head wrote:
> > On Fri, May 25, 2012 at 2:08 PM, Rob Mahurin <rob_at_jlab.org> wrote:
> > >
> > > 2. Why is this (and the equivalent GetEntry loop called from
> > > CINT) so much slower than tree.Draw("scandata1")?
> > >
> >
> > A workaround for this might be something along the lines of the
> > following (unfortunately) multi-line snippet:
> >
> > tree.Draw("scandata1>>h", "", "goff") # "goff" is the option (third argument), not the selection
> > v1 = tree.GetV1() # gets an array of the values you just drew
> > set(v1[n] for n in xrange(tree.GetEntries()))
>
> Yes! This does exactly what I want, and is quite fast. It must move
> all the looping from interpreted code into compiled code, or something.
>
> Unfortunately, only the first 1000000 elements of v1 are filled, and
> no exception is thrown when v1[1000000] and beyond are read. (The
> subset that I'm fiddling with draws 1.5M entries; len(v1) reports
> 2147483647, an obvious lie.) I can apparently adjust this limit by
> calling tree.SetEstimate() before tree.Draw().
>
> If any developers are reading, it'd be nice if the PyDoubleBuffer
> acted a little more like a python object: knew its own size,
> supported slicing, raised an IndexError (or suchlike) when
> appropriate, etc.
>
> Thanks,
> Rob
>
> --
> Rob Mahurin
> University of Manitoba, Department of Physics
> and Thomas Jefferson National Accelerator Facility
> 12000 Jefferson Avenue Suite 6, Newport News, VA 23606
> office 757-269-6510; elsewhere 865-207-2594; rob_at_jlab.org
>
>

-- 
Vancouver: +1 778 373 9738
Geneva: +41 22 518 13 92
Skype: noel.dawe
Received on Fri May 25 2012 - 19:56:38 CEST
