Re: different fit behaviour in ROOT and ROOFIT

From: Lorenzo Moneta <Lorenzo.Moneta_at_cern.ch>
Date: Mon, 5 Sep 2011 09:42:06 +0000


Hi Roberta,
On Sep 5, 2011, at 11:28 AM, Roberta Arnaldi wrote:

> Dear Lorenzo,
>
> thanks a lot for investigating this problem and finding the reason for the discrepancy!
>
> Just to be sure I understood correctly:
> - the "I" option, not yet implemented in RooFit, causes the difference with respect to ROOT
> - the "I" option should have a larger effect in RooFit than the one I observe in ROOT, because of the slightly different likelihood function adopted. In ROOT, in fact, the result seems independent of the options:
> N= 2226 +- 189 -> options RL
> N= 2220 +- 183 -> options RLI
> To understand this better, where can I find a definition of the RooFit likelihood?

Yes, not using the "I" option is an approximation, and it has a much larger effect in RooFit than in ROOT. Often the resulting bias is very small and passes unnoticed, but in some cases, like yours, it does not.
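
To give a feeling for the size of the approximation, here is a minimal sketch in plain Python (not ROOT or RooFit code). It compares the expected content of the bin centered on a Gaussian peak computed from the function value at the bin center with the one computed from the integral over the bin. The yield, mean, width and bin sizes are hypothetical values chosen for illustration, not the ones of the histogram discussed here, and all function names are made up for this example.

```python
import math

def gauss_pdf(x, mean, sigma):
    """Normalised Gaussian density."""
    z = (x - mean) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

def gauss_cdf(x, mean, sigma):
    """Gaussian cumulative distribution, via the error function."""
    return 0.5 * (1.0 + math.erf((x - mean) / (sigma * math.sqrt(2.0))))

def expected_from_center(n_signal, lo, hi, mean, sigma):
    """Expected bin content as pdf(bin center) * bin width (no "I" option)."""
    return n_signal * gauss_pdf(0.5 * (lo + hi), mean, sigma) * (hi - lo)

def expected_from_integral(n_signal, lo, hi, mean, sigma):
    """Expected bin content as the integral of the pdf over the bin ("I" option)."""
    return n_signal * (gauss_cdf(hi, mean, sigma) - gauss_cdf(lo, mean, sigma))

# Hypothetical numbers, not taken from the histogram discussed in this thread.
N_SIGNAL, MEAN, SIGMA = 2200.0, 3.10, 0.07

def relative_excess(half_width):
    """Relative overestimate of the peak-bin content for a bin of this half width."""
    lo, hi = MEAN - half_width, MEAN + half_width
    center = expected_from_center(N_SIGNAL, lo, hi, MEAN, SIGMA)
    integral = expected_from_integral(N_SIGNAL, lo, hi, MEAN, SIGMA)
    return (center - integral) / integral

for half_width in (0.05, 0.02, 0.005):
    excess = 100.0 * relative_excess(half_width)
    print(f"bin width {2 * half_width:.3f}: center value overestimates by {excess:.2f}%")
```

In this sketch the overestimate in the peak bin shrinks as the bins become narrow compared to the width of the peak, which is consistent with the bias often passing unnoticed.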

The RooFit likelihood is defined exactly as the extended unbinned likelihood (see for example G. Cowan's book, page 84, formula 6.34). RooFit uses the same definition for binned and unbinned data sets: a binned data set is treated as having n(i) points placed at the center of bin i. This coincides with the PDG formula only if you consider the "I" option.
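
The two definitions can be put side by side in a short plain-Python sketch. It is written from the description in this thread, not taken from the ROOT or RooFit sources; the function names, the toy pdf f(x) = 3x^2 on [0, 1], the bin contents and the trial yield are all hypothetical.

```python
import math

def nll_poisson_binned(observed, expected):
    """Binned Poisson NLL up to constant terms: sum_i [nu_i - n_i * ln(nu_i)]."""
    return sum(nu - n * math.log(nu) for n, nu in zip(observed, expected))

def nll_extended_at_centers(observed, density_at_centers, n_expected):
    """Extended unbinned NLL with n_i points placed at each bin center:
    N - sum_i n_i * ln(N * f(c_i))."""
    log_terms = sum(n * math.log(n_expected * f)
                    for n, f in zip(observed, density_at_centers))
    return n_expected - log_terms

# Hypothetical toy: pdf f(x) = 3 x^2 on [0, 1], four bins of width 0.25.
WIDTH = 0.25
edges = [0.0, 0.25, 0.5, 0.75, 1.0]
centers = [0.5 * (lo + hi) for lo, hi in zip(edges[:-1], edges[1:])]
density = [3.0 * c * c for c in centers]
observed = [2, 40, 118, 230]   # hypothetical bin contents
n_expected = 390.0             # hypothetical trial value of the total yield

# Expected bin contents from the bin center and from the exact bin integral.
nu_center = [n_expected * f * WIDTH for f in density]
nu_integral = [n_expected * (hi ** 3 - lo ** 3) for lo, hi in zip(edges[:-1], edges[1:])]

poisson = nll_poisson_binned(observed, nu_center)
extended = nll_extended_at_centers(observed, density, n_expected)

# Apart from a parameter-independent constant, the two forms differ only in
# the total term: sum_i nu_i (bin-center values) versus the total yield N.
constant = -sum(observed) * math.log(WIDTH)
print("sum of bin-center expectations:", sum(nu_center))
print("sum of bin-integral expectations:", sum(nu_integral))
print("total yield N:", n_expected)
print("NLL difference:", poisson - extended)
print("(sum nu_i - N) + constant:", sum(nu_center) - n_expected + constant)
```

In this toy the bin-integral expectations add up to N, while the bin-center ones do not, so the two likelihoods treat the total term differently unless the integral is used in each bin.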

>
> If I use unbinned RooFit fits, should I get similar results in ROOT and RooFit?

Correct, with an unbinned RooFit fit you should get a result consistent with the binned ROOT fit. In ROOT the unbinned extended likelihood fit is not yet implemented in TTree::UnbinnedFit. Note, however, that an unbinned fit on data sets of millions of events can use quite a lot of CPU time.

  Cheers, Lorenzo

> Please let me know if there is news from the RooFit author!
>
> Thanks again for your help in solving my problem!
>
> Roberta
>
>
> Lorenzo Moneta wrote:

>> Dear Roberta, 
>> I have investigated this further and found the reason for the difference. It is due to how the binned likelihood fits are implemented in ROOT and RooFit. When implementing the likelihood, both ROOT and RooFit approximate the expected number of events in each bin by the function value at the center of the bin, instead of using the integral over the bin.
>> 
>> The formula in ROOT is the one described in the PDG (equation 33.12 in http://pdg.lbl.gov/2011/reviews/rpp2011-rev-statistics.pdf ). RooFit uses a similar formula, but it replaces the sum of the expected bin contents by its total value, so for that term the approximation of using the function value at the center of the bin instead of the integral is not used. This enhances the bias with respect to ROOT, where the approximation is always used: since the two terms containing the expected bin entries (nu(i) in 33.12) have opposite signs, in the formula used in ROOT the bias is reduced with respect to RooFit.
>> 
>> I could reproduce this by fitting a toy example with exponential backgrounds plus a Gaussian signal. In this case, using the function value at the center of the bin overestimates the expected bin entries, so the fitted number of entries is biased towards lower values, as in your histogram.
>> If you want to be safe, use the option "I" when fitting in ROOT, so that the integral of the fitted function over each bin is used to compute the expected bin content. Unfortunately RooFit does not provide this possibility in binned fits. I will report this problem to the RooFit author and also ask him to implement the option of using the integral for each bin.
>>  Thank you for reporting this problem, 
>>  Lorenzo
>> On Sep 2, 2011, at 10:19 AM, Roberta Arnaldi wrote:
>> 
>>  
>>> Dear Lorenzo,
>>> 
>>> thanks a lot for having looked into my problem and for spotting the "guilty" fitter :)
>>> Unfortunately, the discrepancy between ROOT and RooFit is something I always observe when performing fits to invariant mass spectra, i.e. it is not related to the specific histogram I had
>>> attached to the mail...
>>> It's really a pity to give up the use of RooFit, since (before discovering the discrepancy with ROOT) I had the impression that the fit stability and the speed were really very good!
>>> 
>>> ...in my opinion, it would be important to understand why RooFit seems not to behave properly, since it is now widely used in the LHC experiments!
>>> Please, let me know if there are any news!
>>> 
>>> Ciao and thanks again!
>>> 
>>>   Roberta
>>> 
>>> 
>>> 
>>> On Thu, 1 Sep 2011, Lorenzo Moneta wrote:
>>> 
>>>    
>>>> Hi Roberta,
>>>> 
>>>> Sorry for looking into this problem so late. I have investigated further; it took me some time, and I have concluded that the RooFit result is probably not right, while the ROOT one is. The reason for the difference is still to be found.
>>>> I have used the same pdf definition as RooFit, by transforming the pdf into a TF1 and then fitting with ROOT, and I get results consistent
>>>> with your TestROOT macro. In particular, if I use the RooFit parameters as input, I clearly get a smaller value of the likelihood function
>>>> (meaning that they are not optimal).
>>>> Furthermore, when using a chi2 fit method (not maximum likelihood), which should work perfectly fine and give the same results in your case since the histogram bin errors are Gaussian, I again get the same result (NJPSI ~ 2200).
>>>> 
>>>> So, something is probably wrong in the RooFit fitting; it could also be a numerical problem not handled correctly in the likelihood calculation in RooFit.
>>>> 
>>>> I attach my macro, which uses RooFit to build the model but ROOT for fitting.
>>>> 
>>>> Cheers,
>>>> 
>>>> Lorenzo
>>>> 
>>>> 
>>>>      
>> 
>>  
Received on Mon Sep 05 2011 - 11:42:18 CEST
