Re: Minima and Maxima of ntuples

From: Birger Koblitz (koblitz@mail.desy.de)
Date: Mon Dec 06 1999 - 11:22:00 MET


Hi Rene,

I can understand your objections. For a general object stored in a tree
they are of course valid: how should the tree know how to calculate a
minimum or maximum for an arbitrary object? One solution would be to let
the objects provide an interface for comparing two of them. An object that
wants its maximum calculated would have to inherit from an abstract class,
let's call it Sortable, which declares the pure virtual function
bool Sortable::isGreaterOrEqual(Sortable &).
If a class derives from this, which can be determined using RTTI, one
could calculate the minimum and maximum at the time the tree is filled.
This Java-ish ansatz would also perform quite well.
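
A minimal sketch of the Sortable idea, to make the RTTI step concrete. It
is an illustration only and does not use ROOT: Storable, Energy and
ExtremeTracker are hypothetical names, and the const qualifiers on
isGreaterOrEqual are an adaptation of the signature given above.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical comparison interface from the text; not part of ROOT.
class Sortable {
public:
    virtual ~Sortable() {}
    // True if *this compares greater than or equal to other.
    virtual bool isGreaterOrEqual(const Sortable &other) const = 0;
};

// Hypothetical stand-in for "a general object stored in a tree".
class Storable {
public:
    virtual ~Storable() {}
};

// Example payload that opts in to min/max tracking.
class Energy : public Storable, public Sortable {
public:
    explicit Energy(double v) : value(v) {}
    bool isGreaterOrEqual(const Sortable &other) const override {
        // Assumption: only objects of the same concrete type are compared.
        const Energy *e = dynamic_cast<const Energy *>(&other);
        assert(e != nullptr);
        return value >= e->value;
    }
    double value;
};

// Keeps track of the extremes while objects are being filled.
// The pointers are non-owning: the caller must keep the objects alive.
class ExtremeTracker {
public:
    ExtremeTracker() : fMin(nullptr), fMax(nullptr) {}
    // Returns true if the object took part in the min/max bookkeeping.
    bool Fill(const Storable *obj) {
        const Sortable *s = dynamic_cast<const Sortable *>(obj); // RTTI check
        if (!s) return false;  // not comparable: nothing to track
        if (!fMax || s->isGreaterOrEqual(*fMax)) fMax = s;
        if (!fMin || fMin->isGreaterOrEqual(*s)) fMin = s;
        return true;
    }
    const Sortable *Min() const { return fMin; }
    const Sortable *Max() const { return fMax; }
private:
    const Sortable *fMin;
    const Sortable *fMax;
};
```

Objects that do not derive from Sortable are simply skipped, so the cost
per fill is one dynamic_cast plus at most two virtual comparisons.
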
On the other hand, this might well be too much fuss. I myself would be
quite happy if TNtuple provided this feature only for its float
columns ;-) . To do this, one would have to give TNtuple an additional
array to store the minima and maxima, and override the virtual functions
TTree::GetMinimum() and TTree::GetMaximum() with versions that return the
array entry. One would also have to change TNtuple::Fill() to update the
minima and maxima in the array. This would be quite easy to achieve, and
TNtuple would merely provide an additional feature compared to TTree. One
would not need to change anything in the leaf classes.
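
The bookkeeping described here could look roughly like the following. This
is a sketch under stated assumptions, not ROOT code: RangeNtuple is a
hypothetical standalone class with float columns, and its Fill(),
GetMinimum() and GetMaximum() only mimic the roles of the TNtuple and
TTree functions named above.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for an ntuple with float columns; not ROOT's TNtuple.
class RangeNtuple {
public:
    explicit RangeNtuple(std::size_t ncols)
        : fNcols(ncols), fEntries(0), fMin(ncols, 0.0f), fMax(ncols, 0.0f) {}

    // Stores one row and updates the per-column extremes as a side effect.
    void Fill(const std::vector<float> &row) {
        assert(row.size() == fNcols);
        for (std::size_t i = 0; i < fNcols; ++i) {
            // The first entry seeds the range, so no sentinel value is needed.
            if (fEntries == 0 || row[i] < fMin[i]) fMin[i] = row[i];
            if (fEntries == 0 || row[i] > fMax[i]) fMax[i] = row[i];
        }
        fData.insert(fData.end(), row.begin(), row.end());
        ++fEntries;
    }

    // Constant-time lookups; meaningful only after at least one Fill().
    float GetMinimum(std::size_t col) const {
        assert(col < fNcols);
        return fMin[col];
    }
    float GetMaximum(std::size_t col) const {
        assert(col < fNcols);
        return fMax[col];
    }
    std::size_t GetEntries() const { return fEntries; }

private:
    std::size_t fNcols;
    std::size_t fEntries;
    std::vector<float> fMin, fMax;  // the "additional array" of extremes
    std::vector<float> fData;       // row-major storage of all filled values
};
```

The extra cost is two comparisons per column per fill, and the lookups no
longer need to loop over all entries.
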
In my eyes, having the minima and maxima calculated once and for all
while the ntuple is filled is quite important, since one would like to
know beforehand which ranges to use for histograms or (in my case) for
algorithms that use the ntuple entries. I also think it is important for
ROOT to provide ntuples that work as one would expect, i.e. like the PAW
ones.

On Fri, 3 Dec 1999, Rene Brun wrote:

> Hi Birger,
> The possibility to compute & store the max/minimum per leaf
> is already foreseen in the TLeaf classes. It is currently
> only filled for leaves that represent a branch count.
> I did not implement it yet because there are cases where
> it does not make sense (a TLeaf is an object) or where it is ambiguous
> and time consuming, for example for arrays.
> For example, I had several requests to process fixed length arrays
> separately from variable length arrays.
> In case of variable length arrays, one can assume homogeneous info
> inside the array. For fixed length, some people store a[0] = something,
> a[1] = something else. In this case min/max have no meaning except
> for each element of the array.
> Another reason why GetMinimum and GetMaximum loop over all entries was
> that I was thinking of extending the meaning of the parameter to be an
> expression of the original variables. However, this could be two
> separate functions.
> 
> I agree with your remark about the 1e30. There are a few other places
> where this constant appears and should be changed as you propose.
> Thanks for your remarks.
> 
> Rene Brun
> 
> On Fri, 3 Dec 1999, Birger Koblitz wrote:
> 
> > Hi,
> > 
> > I noticed that TNtuple::GetMaximum(Text_t *) takes a significant amount of
> > time. Obviously it loops over all entries and calculates the maximum. This
> > is of course very slow. Wouldn't it be possible to calculate this
> > information while the ntuple is filled? I don't think it is very
> > difficult to implement, and it would be quite useful, especially if one
> > needs some bounds, e.g. before filling a histogram with the information
> > from the ntuple. The same is of course true for GetMinimum.
> > Actually, I encountered the problem while porting some kumac from PAW,
> > where this seems to be implemented, and I don't think that ROOT should
> > lack any feature PAW has, right ;-) ?
> > So, Rene, would it be possible for you to implement this feature in the
> > next release?
> > 
> > Cheers,
> >   Birger
> > 
> > PS: Looking at the source, one actually finds:
> > 
> >  Float_t TTree::GetMinimum(const char *columname)
> > {
> > 
> >    TLeaf *leaf = GetLeaf(columname);
> >    if (!leaf) return 0;
> >    TBranch *branch = leaf->GetBranch();
> >    Float_t cmin = 1e30;                <============ ?????
> >    for (Int_t i=0;i<fEntries;i++) {
> >       branch->GetEntry(i);
> >       Float_t val = leaf->GetValue();
> >       if (val < cmin) cmin = val;
> >    }
> >    return cmin;
> > }
> > 
> > Obviously there is a bug in the marked line, because it makes assumptions
> > about the representation of float values on the machine: it assumes
> > 1e30 to be the largest float value, which is false on most machines. One
> > should initialize either with the first entry or with the largest float
> > constant (FLT_MAX from float.h, or MAXFLOAT from math.h or values.h).
> > Sorry for being pedantic, but numerics is something to take seriously...
> > 
> > 
> > 
> > /------------------------------------------------------------\
> > | Birger Koblitz                    koblitz@mail.desy.de     |
> > | Max-Planck-Institut fuer Physik                            |
> > | (Werner Heisenberg-Institut)                               |
> > | DESY-FH1K                         Tel. (40) 8998-3971      |
> > | Notkestr. 85                                               |
> > | D-22603  HAMBURG                                           |
> > \------------------------------------------------------------/
> > 
> 
> 
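
The initialization issue raised in the PS quoted above can be avoided
along the following lines. This is only a sketch: ColumnMinimum and
CheckLargeValues are hypothetical helpers that scan a plain std::vector
rather than a TTree branch, and returning 0 for an empty column is an
assumption that echoes the "return 0" fallback of the quoted function.

```cpp
#include <cassert>
#include <cstddef>
#include <limits>
#include <vector>

// Hypothetical helper; works on a plain vector, not on a TTree branch.
// Returns the smallest value, or 0 if the column is empty (an assumption).
float ColumnMinimum(const std::vector<float> &values) {
    if (values.empty()) return 0;
    // Largest finite float on this platform instead of a hard-coded 1e30.
    float cmin = std::numeric_limits<float>::max();
    for (std::size_t i = 0; i < values.size(); ++i) {
        if (values[i] < cmin) cmin = values[i];
    }
    return cmin;
}

// Usage sketch: values above 1e30 are handled, unlike with the old sentinel.
void CheckLargeValues() {
    std::vector<float> v;
    v.push_back(2e35f);
    v.push_back(5e36f);
    assert(ColumnMinimum(v) == 2e35f);
}
```

Seeding cmin with values[0] instead would work equally well and avoids
relying on any platform constant.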

/------------------------------------------------------------\
| Birger Koblitz                    koblitz@mail.desy.de     |
| Max-Planck-Institut fuer Physik                            |
| (Werner Heisenberg-Institut)                               |
| DESY-FH1K                         Tel. (40) 8998-3971      |
| Notkestr. 85                                               |
| D-22603  HAMBURG                                           |
\------------------------------------------------------------/


