Re: ROOFIT and TMinuit

From: Lorenzo Moneta <Lorenzo.Moneta_at_cern.ch>
Date: Wed, 19 Aug 2009 14:58:16 +0200


Hi,

Thanks Wouter for the detailed explanations. Just one correction, the formula for the chi2 is the same in ROOT and Roofit. The difference is in ROOT we
skip the empty bins and do not consider them in the chi2 calculation and also in the total ndf of the fit.

I will cc message to the rootalk list, since I forgot to reply to the list before,

  Lorenzo
On Aug 19, 2009, at 2:46 PM, Wouter Verkerke wrote:

>
> Hi Lorenzo, Anil,
>
> For (mostly) historical reasons, RooFit has some unusual conventions
> in its
> chi-squared calculations. Let me highlight 2 points:
>
> - RooChi2Var calculates chi2 = sum_data [ data(i) - func(i) ] ^2 /
> E_data(i)^2
> where E_data(i) is the error associated with the data point. (By
> default
> that is sqrt(data(i)). (this instead of func(i) in the
> denominator, which
> is used in root).
>
> - RooPlot::chiSquare() returns an already reduced chi^2, so the
> additional
> division by ndof in the macro below should not be done.
>
> I will add an option to use the func(i) denominator in the chi-
> squared calculation in the next release.
>
> Aside from this, if you have many bins with statistics <10, you are
> better off
> with a likehood fit than with a chi-squared fit, as the Gaussian
> approximations
> made in the latter are not valid for bins with low statistics. You
> can of course
> still calculate the chi-squared afterwards.
>
> You can calculate any kind of test statistic based on
> (x_data,y_data,y_func) yourself easily starting with your RooPlot
> with a function and a dataset as follows
>
> RooHist* histo = (RooHist*) frame->findObject("data") ;
> RooCurve* func = (RooCurve*) frame->findObject("model") ;
> for (Int_t i=0 ; i<histo->GetN() ; i++) {
> Double_t xdata,ydata ;
> histo->GetPoint(i,xdata,ydata) ;
> Double_t yfunc = curve->interpolate(xdata) ;
> // Use xdata,ydata,yfunc here
> }
>
>
> Wouter
>
>
> On Wed, 19 Aug 2009, Lorenzo Moneta wrote:
>
>> Hi,
>>
>> The result using TMInuit looks reasonable, the chi2 is high because
>> the function does not parametrize well the data. You should improve
>> probably the
>> modeling of the function.
>> The RooFit ch2 fit does not work, because you have some bins with
>> zero errors (empty).
>> Apparently, looking at the example tutorials/rf602_chi2fit.C the
>> fit will work only when all bins are filled.
>> The chi2 value printed also does not make any sense to me, I don't
>> know how is computed, maybe Wouter can comment on this
>>
>> Lorenzo
>>
>>
>>
>> On Aug 19, 2009, at 12:22 PM, Anil Singh wrote:
>>
>>> From: Anil Singh <anil79_at_fnal.gov>
>>> Date: August 19, 2009 12:21:40 PM GMT+02:00
>>> To: Lorenzo Moneta <Lorenzo.Moneta_at_cern.ch>
>>> Subject: Re: [ROOT] ROOFIT and TMinuit
>>> Hi Lorenzo,
>>> Thanks for the reply. But is n't the difference too large
>>> (21 vs 1.8), to be explained by limited statistics ?
>>> Moreover Iam afraid that these observations can hardly
>>> be due to lack of statistics in my case.
>>> There are ~100000 events for Z production.
>>> I am attatching the rootfile in mail.
>>> The programs used in earlier file should run fine..one
>>> just has to change the source file name.
>>> anil
>>> ----- Original Message -----
>>> From: Lorenzo Moneta <Lorenzo.Moneta_at_cern.ch>
>>> Date: Wednesday, August 19, 2009 3:00 pm
>>> Subject: Re: [ROOT] ROOFIT and TMinuit
>>> To: Anil Singh <anil79_at_fnal.gov>
>>> Cc: roottalk_at_cern.ch
>>>> Hello Anil,
>>>> Although I don't have the data file you are fitting I presume the
>>>> difference is due to the treatment of the empty bins.
>>>> In bare ROOT and also in your case you are skipping the empty bins,
>>>> while in Roofit they are considered with an error given by the
>>>> 68% CL
>>>> of the Poisson distribution with mu=0. The fit will be biased in
>>>> both
>>>> cases, since when the number of entry/bins is less than ~5, the
>>>> normal
>>>> approximation for the
>>>> bin error is not anymore valid.
>>>> If you have low statistics the reccomeded way to fit an histogram
>>>> is
>>>> by using a binned likelihood method using Poisson statistics for
>>>> the
>>>> bin content.
>>>> This can be done in ROOT by fitting an histogram using the TH1::Fit
>>>> method with option "L".
>>>> In you case also you don;t need to use the TMinuit class, but when
>>>> you can just use TH1::Fit also for a least square fit.
>>>> Best Regards
>>>> Lorenzo
>>>> On Aug 19, 2009, at 7:36 AM, Anil Singh wrote:
>>>>> Dear Rooters,
>>>>> I have certain confusions (may be very stupid )regarding the
>>>>> working
>>>>> of TMinuit
>>>>> and RooFit packages. I am vaguley aware that RooFit and TMinuit
>>>>> use
>>>>> Minuit2
>>>>> engine for minimization.
>>>>> I am trying to fit a resonance using Breit Wigner shape. The fit
>>>>> looks very nice
>>>>> with the code that I created using example given in minuit manual.
>>>>> But the
>>>>> chi square shows very high (~21) even if I limit myself to the
>>>>> range
>>>>> where fit is especially
>>>>> good. Suspecting some hidden errors I moved to use TMinuit class
>>>>> but
>>>>> the results
>>>>> were same. Now just when I was ready to accept the numbers I made
>>>>> last
>>>>> validation effort using the RooFit package. The chisquare of the
>>>>> fit
>>>>> now turned out
>>>>> to be "~1.8", while the visual features of plot and the values of
>>>>> fiited parameters
>>>>> looked same as is other two attempts. I dont know what to believe
>>>>> in. Here are
>>>>> the TMinuit and RooFit Program that I used.
>>>>> TMINUIT
>>>>> =
>>>>> =
>>>>> ==================================================================
>>>>> #include <iostream>
>>>>> #include <fstream>
>>>>> #include <cstdlib>
>>>>> #include <cmath>
>>>>> #include <string>
>>>>> #include <vector>
>>>>> #include <TMinuit.h>
>>>>> #include <TCanvas.h>
>>>>> #include <TStyle.h>
>>>>> #include <TROOT.h>
>>>>> #include <TF1.h>
>>>>> TFile* f1 = new TFile("MyAnalysis.root");
>>>>> TH1D* myHist =(TH1D*) gDirectory->Get("h_z_mass1");
>>>>> std::vector<double>* the_pos = new std::vector<double>() ;
>>>>> std::vector<double>* the_mes = new std::vector<double>();
>>>>> std::vector<double>* the_err = new std::vector<double>();
>>>>> double yield(){
>>>>> double sum=0;
>>>>> for(int i=0; i!=(*the_mes).size(); i++){
>>>>> sum = sum+(*the_mes)[i];
>>>>> }
>>>>> return sum;
>>>>> }
>>>>> double BreitWigner(double* xptr, double* par){
>>>>> double mean = par[0];
>>>>> double gamma=par[1];
>>>>> double cons=par[2];
>>>>> double m2 = mean*mean;
>>>>> double g2 = gamma*gamma;
>>>>> double g2overm2 = g2/m2;
>>>>> double x = *xptr;
>>>>> double s = x*x;
>>>>> double deltaS = s-m2;
>>>>> double lineshape=0;
>>>>> if (fabs(deltaS/m2)<16){
>>>>> double prop = deltaS*deltaS + s*s*g2overm2;
>>>>> lineshape = (2/3.14)*g2*m2/(deltaS*deltaS + s*s*(g2overm2));
>>>>> }
>>>>> return cons*lineshape;
>>>>> }
>>>>> void ReadData(){
>>>>> myHist->Sumw2();
>>>>> int n = myHist->GetNbinsX();
>>>>> for(int i=0; i<n; i++){
>>>>>
>>>>> double content = myHist->GetBinContent(i+1);
>>>>> double x = myHist->GetBinCenter(i+1);
>>>>> double er = myHist->GetBinError(i+1);
>>>>> if((x<40)||(x>200))
>>>>> continue;
>>>>> the_pos->push_back(x);
>>>>> the_mes->push_back(content);
>>>>> the_err->push_back(er);
>>>>> }
>>>>> }
>>>>> void fcn(Int_t &npar, Double_t *gin, Double_t &f, Double_t *par,
>>>>> Int_t iflag)
>>>>> {
>>>>> Int_t nbins = the_mes->size();
>>>>> // cout<<"-------------->>"<<nbins<<endl;
>>>>> //calculate chisquare
>>>>> Double_t chisq = 0;
>>>>> Double_t delta;
>>>>> for (int i=0;i<nbins; i++) {
>>>>> if((*the_err)[i]){
>>>>> double thepoint = (*the_pos)[i];
>>>>> delta = ((*the_mes)[i]-BreitWigner(&thepoint,par))/(*the_err)
>>>>> [i];
>>>>> chisq += delta*delta;}
>>>>> // cout<<"delta: "<<delta<<endl;
>>>>> }
>>>>> cout<<"chisq: "<<chisq/nbins<<endl;
>>>>> f = chisq;
>>>>> }
>>>>> void main(int argc, char **argv){
>>>>> TFile* file=new TFile("TMinuitFit.root","recreate");
>>>>> TF1* func = new TF1("funcplot", BreitWigner,40,195,3);
>>>>> ReadData();
>>>>> TMinuit *gMinuit = new TMinuit(3);
>>>>> gMinuit->SetFCN(fcn);
>>>>> Double_t arglist[3];
>>>>> Int_t ierflg = 0;
>>>>> double yld = yield();
>>>>> arglist[0] = 1;
>>>>>
>>>>> gMinuit->mnexcm("SET ERR", arglist ,1,ierflg);
>>>>> Double_t vstart[3] = {91,2.5,12000};
>>>>> Double_t step[3] = {10 , 0.01 , 1200};
>>>>>
>>>>> gMinuit->mnparm(0, "mean", vstart[0], step[0], 0,0,ierflg);
>>>>> gMinuit->mnparm(1, "gamma", vstart[1], step[1], 0,0,ierflg);
>>>>> gMinuit->mnparm(2, "cons", vstart[2], step[2], 0,0,ierflg);
>>>>>
>>>>> arglist[0]=5000;
>>>>> arglist[1]=0.1;
>>>>> gMinuit->mnexcm("SET STR", arglist,2,ierflg);
>>>>> gMinuit->mnexcm("MIGRAD", arglist, 2, ierflg);
>>>>> double err[3];
>>>>> double outpar[3];
>>>>> for(int i=0; i<3; i++){
>>>>> gMinuit->GetParameter(i,outpar[i],err[i]);
>>>>> }
>>>>> func->SetParameters(outpar);
>>>>> func->SetLineWidth(2);
>>>>> file->cd();
>>>>> myHist->Write();
>>>>> func->Write();
>>>>> file->Write();
>>>>> file->Close();
>>>>> cout<<"=================================="<<endl;
>>>>>
>>>>> Double_t amin,edm,errdef;
>>>>> Int_t nvpar,nparx,icstat;
>>>>> gMinuit->mnstat(amin,edm,errdef,nvpar,nparx,icstat);
>>>>> gMinuit->mnprin(3,amin);
>>>>> cout<<"AMIN: "<<amin<<std::endl;
>>>>> //
>>>>> =
>>>>> =
>>>>> =
>>>>> =
>>>>> =
>>>>> =
>>>>> =
>>>>> =
>>>>> =
>>>>> ==================================================================
>>>>> ROOFIT
>>>>> //
>>>>> =
>>>>> =
>>>>> =
>>>>> =
>>>>> =
>>>>> ==================================================================
>>>>> //Access Data from histograms
>>>>> TFile* f1 = new TFile("MyAnalysis.root");
>>>>> TH1D* myHist =(TH1D*) gDirectory->Get("h_z_mass1");
>>>>> myHist->Sumw2();
>>>>> //Declare variable for quantity under study.
>>>>> RooRealVar mass("mass","mass",40,200);
>>>>> //Declare a data set using histogram.
>>>>> RooDataHist data("mass","mass",mass,myHist);
>>>>> //Make a Briet-Wigner Model.
>>>>> RooRealVar mean("mean","mean",70,110);
>>>>> RooRealVar width("gamma","gamma",0,10);
>>>>> RooBreitWigner bw("bw","bw",mass,mean,width);
>>>>> RooRealVar n1("n1","peak",0,20000);
>>>>> RooAddPdf model("model","model",RooArgList(bw),n1);
>>>>> //Fit the Model to data
>>>>> // model.fitTo(data, RooFit::Range(40,200));
>>>>> RooChi2Var Chi2("Chi2","Chi2",model,data,true);
>>>>> RooMinuit m2(Chi2);
>>>>> m2.migrad();
>>>>> // m2.hesse();
>>>>> RooFitResult* r2 = m2.save() ;
>>>>> //Print the Parameter Info
>>>>> mean.Print();
>>>>> width.Print();
>>>>> n1.Print();
>>>>> //Plotting Bussiness
>>>>> RooPlot* xframe = mass.frame();
>>>>> data.plotOn(xframe, RooFit::Name("data"));
>>>>> model.plotOn(xframe,RooFit::Name("model"));
>>>>> xframe->Draw();
>>>>> //Goodness of Fit
>>>>> double chi2 = xframe->chiSquare("model","data",3);
>>>>> double ndoff = xframe->GetNbinsX();
>>>>> cout<<"chi2/ndoff: "<<chi2<<"/"<<ndoff<<'\t'<<
>>>>> chi2/ndoff<<endl;
>>>>> //
>>>>> =
>>>>> =
>>>>> =
>>>>> =
>>>>> =
>>>>> =
>>>>> =
>>>>> =
>>>>> ==================================================================
>>>>> Any advice is greatly appreciated.
>>>>> Anil Singh
>>> <TMinuitFit.root>
>>
Received on Wed Aug 19 2009 - 14:58:22 CEST

This archive was generated by hypermail 2.2.0 : Wed Aug 19 2009 - 17:50:02 CEST