Re: [ROOT] KolmogorovTest

From: Victor Perevoztchikov (perev@bnl.gov)
Date: Wed Jul 30 2003 - 20:26:47 MEST


Hi Rene,
> I reimplemented TMath::KolmogorovProb starting from the latest
> version of CERNLIB PROBKL (see CVS) and the result is identical.

Yes, I said that the reimplementation is correct. I compared the F77 and
C++ versions and they are the same.
I thought that the bug was in the original Fortran version.

Now I have tested them. I wrote my own prob function, not optimal but
correct, and found that the result is the same again. That means there is
no bug in KolmogorovProb or ksprob().

I think there is only one explanation: the Kolmogorov test is correct only
in the limit of large numbers.
When I increased the number of events in one sample from 100 to 1000 and
used the same 10-bin probability histogram with the same 100 entries in it,
the result was much better: no empty bins and a distribution close to flat.
There is still some excess at high probability; probably 1000 events in a
sample is still not large enough.
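
For reference, the textbook large-sample form of the two-sample Kolmogorov
probability is given below. Here D is the maximum distance between the two
cumulative distributions and n_1, n_2 are the sample sizes. The formula is
exact only as n_1, n_2 go to infinity; that PROBKL uses exactly this
normalization of z is an assumption, not something checked in this thread.

\[
z = D \sqrt{\frac{n_1 n_2}{n_1 + n_2}}, \qquad
Q_{KS}(z) = 2 \sum_{j=1}^{\infty} (-1)^{j-1} e^{-2 j^2 z^2}
\]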

So there is no bug in the Kolmogorov test implementation,
but it is usable only when the number of events in a sample is about 1000.
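
A minimal sketch of the kind of independent check described above, written
in plain C++ without ROOT or histograms. It is not the code used in this
thread: the function names (kolmogorovProbSketch, ksDistance, ksProbSketch,
fractionAbove) are hypothetical, the probability is a straightforward
summation of the asymptotic series rather than the CERNLIB routine, and the
samples are assumed to be unbinned flat random numbers.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <random>
#include <vector>

// Asymptotic Kolmogorov survival function,
// Q(z) = 2 * sum_{j>=1} (-1)^(j-1) * exp(-2 j^2 z^2), summed term by term.
double kolmogorovProbSketch(double z) {
    if (z < 0.2) return 1.0;  // Q differs from 1 by less than 1e-10 here
    double sum = 0.0, sign = 1.0;
    for (int j = 1; j <= 100; ++j) {
        const double term = std::exp(-2.0 * j * j * z * z);
        sum += sign * term;
        if (term < 1e-15) break;  // remaining terms are negligible
        sign = -sign;
    }
    return std::min(1.0, std::max(0.0, 2.0 * sum));
}

// Maximum distance between the empirical CDFs of two unbinned samples.
// Arguments are taken by value because they are sorted locally.
double ksDistance(std::vector<double> a, std::vector<double> b) {
    std::sort(a.begin(), a.end());
    std::sort(b.begin(), b.end());
    std::size_t i = 0, j = 0;
    double d = 0.0;
    while (i < a.size() && j < b.size()) {
        const double x = std::min(a[i], b[j]);
        while (i < a.size() && a[i] <= x) ++i;  // step both CDFs past x
        while (j < b.size() && b[j] <= x) ++j;
        const double diff = std::fabs(static_cast<double>(i) / a.size() -
                                      static_cast<double>(j) / b.size());
        d = std::max(d, diff);
    }
    return d;
}

// Two-sample probability using z = D * sqrt(n1 * n2 / (n1 + n2)).
double ksProbSketch(const std::vector<double>& a, const std::vector<double>& b) {
    assert(!a.empty() && !b.empty());
    const double na = static_cast<double>(a.size());
    const double nb = static_cast<double>(b.size());
    const double z = ksDistance(a, b) * std::sqrt(na * nb / (na + nb));
    return kolmogorovProbSketch(z);
}

// Fraction of trials whose probability exceeds `threshold` when both
// samples of size nPerSample are drawn from the same flat distribution.
double fractionAbove(std::size_t nPerSample, int nTrials, double threshold,
                     unsigned seed = 12345) {
    std::mt19937 gen(seed);
    std::uniform_real_distribution<double> flat(0.0, 1.0);
    std::vector<double> a(nPerSample), b(nPerSample);
    int above = 0;
    for (int t = 0; t < nTrials; ++t) {
        for (auto& x : a) x = flat(gen);
        for (auto& x : b) x = flat(gen);
        if (ksProbSketch(a, b) > threshold) ++above;
    }
    return static_cast<double>(above) / nTrials;
}
```

A perfectly flat probability distribution would put about 10% of the trials
above 0.9, so comparing fractionAbove(100, 1000, 0.9) with
fractionAbove(1000, 1000, 0.9) gives a rough measure of how the excess at
high probability depends on the sample size.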

Victor

Victor M. Perevoztchikov   perev@bnl.gov
Brookhaven National Laboratory MS 510A PO Box 5000 Upton NY 11973-5000
tel office : 631-344-7894; fax 631-344-4206;

----- Original Message -----
From: "Rene Brun" <brun@pcbrun.cern.ch>
To: "Victor Perevoztchikov" <perev@bnl.gov>
Cc: "Rene Brun" <Rene.Brun@cern.ch>; "Ben Kilminster" <bjk@fnal.gov>;
<roottalk@pcroot.cern.ch>
Sent: Wednesday, July 30, 2003 12:51 AM
Subject: Re: [ROOT] KolmogorovTest


> Victor,
>
> You did not completely read my previous mail. There are entries between
> 0.8 and 0.9. I was myself surprised by the result. Change the number
> of bins from 10 to 100.
> I reimplemented TMath::KolmogorovProb starting from the latest
> version of CERNLIB PROBKL (see CVS) and the result is identical.
>
> Rene Brun
>
>
> On Tue, 29 Jul 2003, Victor Perevoztchikov wrote:
>
> > Hi Rene and Ben,
> > > The values given to TMath::KolmogorovProb are discrete due to
> > > the difference between consecutive bins (small integers).
> > it is possible, but too suspicious. I did more tests.
> >
> > 1. I repeated Ben's test in Fortran with HBOOK and got exactly the same
> >    result. So it is not a result of the F77 ==> C++ conversion. The
> >    conversion is correct.
> >
> > 2. I have made a test with two flat distributions (rndm) without
> >    histograms involved.
> >    So now "The values given to TMath::KolmogorovProb are NOT discrete".
> >    But the result is still bad. Again there are no entries between 0.8
> >    and 0.9.
> >
> > Summary: KolmogorovProb is wrong. It is wrong already in HBOOK.
> > Is it possible to find the author of this function?
> >
> > Victor
> >
> >
> > Victor M. Perevoztchikov   perev@bnl.gov
> > Brookhaven National Laboratory MS 510A PO Box 5000 Upton NY 11973-5000
> > tel office : 631-344-7894; fax 631-344-4206;
> >
> > ----- Original Message -----
> > From: "Rene Brun" <Rene.Brun@cern.ch>
> > To: "Ben Kilminster" <bjk@fnal.gov>
> > Cc: <roottalk@pcroot.cern.ch>
> > Sent: Sunday, July 27, 2003 10:50 AM
> > Subject: Re: [ROOT] KolmogorovTest
> >
> >
> > > Hi Ben,
> > >
> > > The fact that there are no entries between 0.8 and 0.9 is simply a
> > > binning artefact. Increase your number of bins from 10 to 100.
> > >
> > > The values given to TMath::KolmogorovProb are discrete due to
> > > the difference between consecutive bins (small integers).
> > > This introduces an asymmetry in favour of high z values.
> > >
> > > Rene Brun
> > >
> > > On Thu, 24 Jul 2003, Ben Kilminster wrote:
> > >
> > > > Hi fellow Rooters,
> > > >
> > > > The KS probability is supposed to be a value which is uniformly
> > > > distributed between zero and one if you are comparing two
> > > > distributions which come from the same parent distribution.
> > > >
> > > > In the following root macro that I run, I see that it is not flat,
> > > > it is peaked strongly at one, and that there is a big hole where the
> > > > probability is not filled.
> > > >
> > > > Does anyone have any idea what is wrong ?
> > > >
> > > > Cheers,
> > > > Ben
> > > >
> > > > (I am using root v3_05_04d KCC_4_0 Linux+2.4)
> > > >
> > > >
> > > > {
> > > > TCanvas *c1 = new TCanvas("c1","plots",600,700);
> > > > c1->Divide(2,2);
> > > > // Make a gaussian distribution
> > > > TH1F *HGaussMain = new TH1F("HGaussMain","Gaussian pseudoexperiment",
> > > >                             100,0,10);
> > > > for (int i = 0; i < 10000; i++) {
> > > >   HGaussMain->Fill(gRandom->Gaus(5,1.0));
> > > >   }
> > > > // Draw the parent distribution
> > > > c1->cd(1);
> > > > HGaussMain->Draw();
> > > >
> > > > // Now loop through choosing 100 event
> > > > // daughter distributions from the parent distribution
> > > > // comparing them with KS statistic
> > > >
> > > > TH1F *HGauss1 = new TH1F("HGauss1","Random Gaussian pseudoexperiment",
> > > >                          100,0,10);
> > > > TH1F *HGauss2 = new TH1F("HGauss2","Random Gaussian pseudoexperiment",
> > > >                          100,0,10);
> > > > TH1F *HKSValues = new TH1F("HKSValues","KS values",10,0,1.0);
> > > >
> > > > for (int j = 0; j < 1000; j++) {
> > > >   HGauss1->Reset();
> > > >   HGauss2->Reset();
> > > >   for (int i = 0; i < 100; i++) {
> > > >     HGauss1->Fill(HGaussMain->GetRandom());
> > > >     HGauss2->Fill(HGaussMain->GetRandom());
> > > >   }
> > > >   double KS_agree = HGauss1->KolmogorovTest(HGauss2);
> > > >   HKSValues->Fill(KS_agree);
> > > > }
> > > >
> > > > c1->cd(2);
> > > > HGauss1->Draw();  // typical daughter distribution
> > > > c1->cd(3);
> > > > HGauss2->Draw();  // another typical daughter distribution
> > > > c1->cd(4);
> > > > HKSValues->Draw();  // This is not flat
> > > >
> > > > }
> > > >
> > > >
> > >
> >
>


