Re: Some thoughts about histogram classes in root [LONG MESSAGE]

From: Rene Brun (Rene.Brun@cern.ch)
Date: Tue Feb 15 2000 - 11:24:46 MET


Hi Vincent & Rooters,
This is an answer to Vincent's mail, with some additional ideas.
The existing histogram classes in Root are the result of a long and
painful exercise. At the start of the project, we implemented a straight
copy of the Hbook functions for 1-d and 2-d histograms. These were
working well.
However, we had many requests to support additional data types, in
particular double. The "obvious" thing to do was to use templates to
implement all the data types in an "elegant" way and avoid code
duplication. We implemented a second iteration of the histogram classes
using templates. We quickly faced a long list of problems with this
implementation when we added more functionality (projections, fits, etc).
Template support at the time was non-portable and not very efficient.
In addition, the size of the histogramming package in memory was
becoming unacceptable, as were the compile/link times (and we had less
than 20,000 lines of code!!). We rewrote the entire package, converging
gradually towards the current form. The TH2 and TH3 classes were added
later, mainly to support additional statistics parameters (correlation
factors for 2-d, etc). We could have added these parameters in the base
class.
One important constraint was to keep the size (memory and persistent
version) as small as possible. We know of many applications generating
1000s of histograms. We also converged on an implementation with the
3 TAxis objects in the base class. This is a long history. I think it
was a wise decision, but one with a lot of consequences.
I know that many things should be cleaned up in the package.
You refer to the Fill3 functions for example. These functions are
redundant now and will be removed in a future version. By the way, if
you think about it for more than one second, your proposal to move these
functions to TH3 will not work.
There are several areas to be improved (see below). You are however
making some statements that are difficult to accept. Be careful with the
word ridiculous!
A real program generates a mixture of many histogram types, all in the
same list. You certainly do not want to have a container per histogram
type! One has to protect against all possible user mistakes (this may be
found inelegant by a computer scientist fresh from the oven, but I don't
care). These kinds of tests will have to be done, independently of the
class hierarchy.
You are proposing to add new functions such as TH1::GetXmin, etc.
I think that this is a bad idea. This would save some typing, but:
  - One will have to duplicate all the TAxis functions 3 times in TH1
  - This will make TH1, and the dictionary as well, even bigger.
It was certainly a mistake to implement functions like
TH1::GetBinCenter, TH1::GetNdivisions, etc. Access via TAxis
is more logical.
I agree with your proposal for a symmetric function to TH1::GetBin
for TH2s and TH3s.
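
To illustrate what such a pair of functions could look like, here is a
minimal sketch outside of Root. It assumes the global bin number is laid
out as binx + (nbinsx+2)*(biny + (nbinsy+2)*binz), the +2 accounting for
the underflow and overflow cells, which is the layout implied by the
Bin1ToBin3 code quoted below. The names BinCoords, GlobalBin and SplitBin
are hypothetical and are not existing Root functions.

```cpp
#include <cassert>

// Per-axis bin indices: 0 is the underflow cell, nbins+1 the overflow cell.
struct BinCoords {
    int binx;
    int biny;
    int binz;
};

// Forward mapping (hypothetical helper): pack three per-axis indices
// into one global bin number. nbinsx and nbinsy are the numbers of
// regular bins; each axis carries 2 extra cells (underflow, overflow).
int GlobalBin(int nbinsx, int nbinsy, int binx, int biny, int binz)
{
    const int nx = nbinsx + 2;
    const int ny = nbinsy + 2;
    assert(binx >= 0 && binx < nx);
    assert(biny >= 0 && biny < ny);
    assert(binz >= 0);
    return binx + nx * (biny + ny * binz);
}

// Inverse mapping (hypothetical helper): recover the three per-axis
// indices from a global bin number produced by GlobalBin.
BinCoords SplitBin(int nbinsx, int nbinsy, int bin)
{
    const int nx = nbinsx + 2;
    const int ny = nbinsy + 2;
    assert(bin >= 0);
    BinCoords c;
    c.binx = bin % nx;
    c.biny = (bin / nx) % ny;
    c.binz = (bin / nx) / ny;
    return c;
}
```

For a 2-d histogram the same functions can be used with binz = 0.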

Alexander is suggesting 4-d, 5-d histograms, N-d histograms. However
I believe that ntuples are a better alternative and give you more
freedom (as pointed out by Pasha). I do not see this as a constraint
for the class hierarchy.

I am not opposed at all to a constructive debate about the evolution
and restructuring of the histogramming package and related service
classes. Recently, we had discussions for example on the following
questions:
  - Should we have an abstract interface below TH1 ? If yes, what for ?
    common API with other histogramming packages ? risk of defining
    a lowest common denominator and endless fights ?
  - Can we imagine the notion of a histogram object (just data with
    getters and setters) ? This would facilitate the import/export with
    other packages and also Java. I am in favour of this option.
  - Should we move the high level functions such as projections, fits,
    operations to associated classes ? We started a similar exercise in
    2.23 by delegating the graphics functions to THistPainter. This has
    certainly complicated the structure (OK, it is more modular). Because
    the histogram package must work in a multi-threaded environment
    (online), this has many consequences for the architecture.
  - Remove classes TH2 and TH3 (not TH2xxx, TH3xxx). As said above,
    these two classes are only used to store statistical quantities.
    The stats could be a dynamic double* in TH1 instead of named data
    members in TH1, TH2, TH3.
  - We must include support for string data types. This also has some
    implications for TAxis (labels, sorting, etc).
  - Damir Buskulic has raised the question of precision in TAxis
    (float). Should we promote float to double in TAxis ? If yes, TH1
    constructors must be changed accordingly.
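
As an illustration of the "dynamic stats" item in the list above, here
is a minimal sketch of statistics held in an array whose length depends
on the dimension, instead of named members spread over three classes.
Everything in it is an assumption made for the example: the class name
HistStats, the ordering of the sums, and the use of std::vector in place
of a raw double*. It is not the layout used in Root.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical container for histogram statistics in 1, 2 or 3 dimensions.
// Assumed layout: [0] = sum of w, [1] = sum of w*w, then for each axis i
// the pair (sum of w*x_i, sum of w*x_i*x_i), then one cross term
// sum of w*x_i*x_j for each pair of axes i < j.
class HistStats {
public:
    explicit HistStats(int dimension)
        : fDim(dimension), fSums(NumberOfSums(dimension), 0.0) {}

    // 4 sums for 1-d, 7 for 2-d, 11 for 3-d with the layout above.
    static std::size_t NumberOfSums(int dimension)
    {
        assert(dimension >= 1 && dimension <= 3);
        return static_cast<std::size_t>(
            2 + 2 * dimension + dimension * (dimension - 1) / 2);
    }

    // Add one entry at coordinates x (one value per axis) with weight w.
    void Accumulate(const std::vector<double>& x, double w = 1.0)
    {
        assert(static_cast<int>(x.size()) == fDim);
        fSums[0] += w;
        fSums[1] += w * w;
        std::size_t k = 2;
        for (int i = 0; i < fDim; ++i) {
            fSums[k++] += w * x[i];
            fSums[k++] += w * x[i] * x[i];
        }
        for (int i = 0; i < fDim; ++i)
            for (int j = i + 1; j < fDim; ++j)
                fSums[k++] += w * x[i] * x[j];
    }

    // Weighted mean along one axis; requires a non-zero sum of weights.
    double Mean(int axis) const
    {
        assert(axis >= 0 && axis < fDim);
        assert(fSums[0] != 0.0);
        return fSums[2 + 2 * axis] / fSums[0];
    }

    std::size_t Size() const { return fSums.size(); }

private:
    int fDim;                   // number of dimensions (1, 2 or 3)
    std::vector<double> fSums;  // running sums, length set by fDim
};
```

With this kind of layout a 1-d histogram pays for 4 doubles only, and
the correlation terms appear only when the dimension requires them.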

We should also consider an evolution of the TGraph and TF1 classes.
Adding TAxis objects directly in these classes seems unavoidable.
Going via TGraph::GetHistogram is not very popular.

We also would like to start a discussion on the evolution of the
TTree/TBranch/TLeaf classes. Should we define more abstraction below
these 3 classes ?
Fred Gray has some interesting suggestions to abstract the interface
to facilitate access to non-Root data.

As usual, comments are welcome.

Rene Brun




Vincent Colin de Verdiere wrote:
> 
> I started using 3D histograms in root and I came up with some remarks
> and thoughts about Histogram classes in root ...
> 
> //////////  Inheritance tree //////////////////////////
> 
> There is 1 class per number of dimensions: TH1, TH2 and TH3. This is very
> nice. It is, for sure, very convenient to make TH2 and TH3 derive from
> TH1. But, from the OO point of view, it should also mean that TH3-specific
> methods are declared and defined in TH3 and not in TH1. This is currently
> not the case: for example, there are fill() functions with 1, 2, 3
> dimensions in TH1... why?
> if people are building 3D histograms using TH1 -> it's wrong. The
> class name should be something like THisto.
> if people are building 3D histograms using TH3 -> it's useless to
> define the functions in TH1.
> 
> One of my favorite examples:
> //////////////////
> TProfile *TH1::ProfileX(...) {
>   if (GetDimension() != 2) {
>      Error("ProfileX","Not a 2-D histogram");
>      return 0;
>   }
> ....
> }
> ////////////////
> What is this function doing in TH1 ???
> 
> There are also examples on the other side:
>  multiple definitions of the same function
> fill3 has exactly the same code in TH3S, TH3C, TH3F...
> why not use inheritance by defining it in TH3 ?
> 
> A main point is that using OO programming has to simplify the user's or
> programmer's life. I don't think that putting all methods in a single class
> with many "if" cases really helps the user. Are there any technical
> (linked with cint?) or historical reasons for this architecture ?
> If not, it seems to me very easy to move methods from TH1 to TH2 or
> TH3 without breaking backward compatibility.
> 
> ///////////// Naming //////////////////////////////////////
>  fill12, fill3, fill...
>    * can anyone tell me what are the differences between these functions ?
>    * without reading the source code ?
> 
> //////////// Some functions which could be added ? ////////
>  access to min/max values on each axis:
> example:
>  Float_t TH1::GetXmin(void) { return GetXaxis()->GetXmin(); }
>  Float_t TH3::GetZmax(void) { return GetZaxis()->GetXmax(); }
>    ....
> 
> // A function to get all 3 coordinates of a bin in a 3D histogram out
> // of the 1D bin number of TH1.
> // Useful to give meaning to the GetMaximumBin() function when used
> // in TH3.
> void TH3::Bin1ToBin3(Int_t bin, Int_t& binx, Int_t& biny, Int_t& binz)
> {
>   Int_t nbbinX= GetNbinsX()+2;
>   Int_t nbbinY= GetNbinsY()+2;
> 
>   binx= bin%nbbinX;
>   biny= (bin/nbbinX) % nbbinY;
>   binz= (bin/nbbinX) / nbbinY;
> }
> 
> // A function to get original values of bins.
> // for example:
> void TH3::BinToCoord(Int_t bin, Float_t& x, Float_t& y, Float_t& z)
> {
>   TAxis* xax = GetXaxis();
>   TAxis* yax = GetYaxis();
>   TAxis* zax = GetZaxis();
> 
>   Int_t binx, biny, binz;
>   Bin1ToBin3(bin, binx, biny, binz);
> 
>   x= xax->GetBinCenter(binx);
>   y= yax->GetBinCenter(biny);
>   z= zax->GetBinCenter(binz);
> }
> 
> -- Vincent
> 
> Vincent Colin de Verdiere (vincent.colin.de.verdiere@cern.ch)
> at home: 44, rue de la Fruitiere, 01710-Thoiry tel: 04 50 20 87 90
> at work: CERN, division EP, Office 13-1-038, CH - 1211 Geneva 23
>         tel: (+41) 22 76 72839, fax: (+41) 22 767 9075


