ROOT   Reference Guide
Searching...
No Matches
rf614_binned_fit_problems.py
Go to the documentation of this file.
1## \file
2## \ingroup tutorial_roofit
3## \notebook -js
4## A tutorial that explains you how to solve problems with binning effects and
5## numerical stability in binned fits.
6##
7## ### Introduction
8##
9## In this tutorial, you will learn three new things:
10##
11## 1. How to reduce the bias in binned fits by changing the definition of the
12## normalization integral
13##
14## 2. How to completely get rid of binning effects by integrating the pdf over each bin
15##
16## 3. How to improve the numeric stability of fits with a greatly different
17## number of events per bin, using a constant per-bin counterterm
18##
19## \macro_code
20## \macro_output
21##
22## \date January 2023
23## \author Jonas Rembser
24
25import ROOT
26
27
28def generateBinnedAsimov(pdf, x, n_events):
29 """
30 Generate binned Asimov dataset for a continuous pdf.
31 One should in principle be able to use
32 pdf.generateBinned(x, n_events, RooFit::ExpectedData()).
33 Unfortunately it has a problem: it also has the bin bias that this tutorial
34 demonstrates, to if we would use it, the biases would cancel out.
35 """
36 data_h = ROOT.RooDataHist("dataH", "dataH", {x})
37 x_binning = x.getBinning()
38
39 for i_bin in range(x.numBins()):
40 x.setRange("bin", x_binning.binLow(i_bin), x_binning.binHigh(i_bin))
41 integ = pdf.createIntegral(x, NormSet=x, Range="bin")
42 ROOT.SetOwnership(integ, True)
43 integ.getVal()
44 data_h.set(i_bin, n_events * integ.getVal(), -1)
45
46 return data_h
47
48
49def enableBinIntegrator(func, num_bins):
50 """
51 Force numeric integration and do this numeric integration with the
52 RooBinIntegrator, which sums the function values at the bin centers.
53 """
54 custom_config = ROOT.RooNumIntConfig(func.getIntegratorConfig())
55 custom_config.method1D().setLabel("RooBinIntegrator")
56 custom_config.getConfigSection("RooBinIntegrator").setRealValue("numBins", num_bins)
57 func.setIntegratorConfig(custom_config)
58 func.forceNumInt(True)
59
60
61def disableBinIntegrator(func):
62 """
63 Reset the integrator config to disable the RooBinIntegrator.
64 """
65 func.setIntegratorConfig()
66 func.forceNumInt(False)
67
68
69# Silence info output for this tutorial
70ROOT.RooMsgService.instance().getStream(1).removeTopic(ROOT.RooFit.Minimization)
71ROOT.RooMsgService.instance().getStream(1).removeTopic(ROOT.RooFit.Fitting)
72ROOT.RooMsgService.instance().getStream(1).removeTopic(ROOT.RooFit.Generation)
73
74# Exponential example
75# -------------------
76
77# Set up the observable
78x = ROOT.RooRealVar("x", "x", 0.1, 5.1)
79x.setBins(10)
80# fewer bins so we have larger binning effects for this demo
81
82# Let's first look at the example of an exponential function
83c = ROOT.RooRealVar("c", "c", -1.8, -5, 5)
84expo = ROOT.RooExponential("expo", "expo", x, c)
85
86# Generate an Asimov dataset such that the only difference between the fit
87# result and the true parameters comes from binning effects.
88expo_data = generateBinnedAsimov(expo, x, 10000)
89
90# If you do the fit the usual was in RooFit, you will get a bias in the
91# result. This is because the continuous, normalized pdf is evaluated only
92# at the bin centers.
93fit1 = expo.fitTo(expo_data, Save=True, PrintLevel=-1, SumW2Error=False)
94fit1.Print()
95
96# In the case of an exponential function, the bias that you get by
97# evaluating the pdf only at the bin centers is a constant scale factor in
98# each bin. Here, we can do a trick to get rid of the bias: we also
99# evaluate the normalization integral for the pdf the same way, i.e.,
100# summing the values of the unnormalized pdf at the bin centers. Like this
101# the bias cancels out. You can achieve this by customizing the way how the
103enableBinIntegrator(expo, x.numBins())
104fit2 = expo.fitTo(expo_data, Save=True, PrintLevel=-1, SumW2Error=False)
105fit2.Print()
106disableBinIntegrator(expo)
107
108# Power law example
109# -----------------
110
111# Let's not look at another example: a power law \f[x^a\f].
112a = ROOT.RooRealVar("a", "a", -0.3, -5.0, 5.0)
113powerlaw = ROOT.RooPowerSum("powerlaw", "powerlaw", x, ROOT.RooFit.RooConst(1.0), a)
114powerlaw_data = generateBinnedAsimov(powerlaw, x, 10000)
115
116# Again, if you do a vanilla fit, you'll get a bias
117fit3 = powerlaw.fitTo(powerlaw_data, Save=True, PrintLevel=-1, SumW2Error=False)
118fit3.Print()
119
120# This time, the bias is not the same factor in each bin! This means our
121# trick by sampling the integral in the same way doesn't cancel out the
122# bias completely. The average bias is canceled, but there are per-bin
123# biases that remain. Still, this method has some value: it is cheaper than
124# rigurously correcting the bias by integrating the pdf in each bin. So if
125# you know your per-bin bias variations are small or performance is an
126# issue, this approach can be sufficient.
127enableBinIntegrator(powerlaw, x.numBins())
128fit4 = powerlaw.fitTo(powerlaw_data, Save=True, PrintLevel=-1, SumW2Error=False)
129fit4.Print()
130disableBinIntegrator(powerlaw)
131
132# To get rid of the binning effects in the general case, one can use the
133# IntegrateBins() command argument. Now, the pdf is not evaluated at the
134# bin centers, but numerically integrated over each bin and divided by the
135# bin width. The parameter for IntegrateBins() is the required precision
136# for the numeric integrals. This is computationally expensive, but the
137# bias is now not a problem anymore.
138fit5 = powerlaw.fitTo(powerlaw_data, IntegrateBins=1e-3, Save=True, PrintLevel=-1, SumW2Error=False)
139fit5.Print()
140
141# Improving numerical stability
142# -----------------------------
143
144# There is one more problem with binned fits that is related to the binning
145# effects because often, a binned fit is affected by both problems.
146#
147# The issue is numerical stability for fits with a greatly different number
148# of events in each bin. For each bin, you have a term \f[n\log(p)\f] in
149# the NLL, where \f[n\f] is the number of observations in the bin, and
150# \f[p\f] the predicted probability to have an event in that bin. The
151# difference in the logarithms for each bin is small, but the difference in
152# \f[n\f] can be orders of magnitudes! Therefore, when summing these terms,
153# lots of numerical precision is lost for the bins with less events.
154
155# We can study this with the example of an exponential plus a Gaussian. The
156# Gaussian is only a faint signal in the tail of the exponential where
157# there are not so many events. And we can't afford any precision loss for
158# these bins, otherwise we can't fit the Gaussian.
159
160x.setBins(100) # It's not about binning effects anymore, so reset the number of bins.
161
162mu = ROOT.RooRealVar("mu", "mu", 3.0, 0.1, 5.1)
163sigma = ROOT.RooRealVar("sigma", "sigma", 0.5, 0.01, 5.0)
164gauss = ROOT.RooGaussian("gauss", "gauss", x, mu, sigma)
165
166nsig = ROOT.RooRealVar("nsig", "nsig", 10000, 0, 1e9)
167nbkg = ROOT.RooRealVar("nbkg", "nbkg", 10000000, 0, 1e9)
168frac = ROOT.RooRealVar("frac", "frac", nsig.getVal() / (nsig.getVal() + nbkg.getVal()), 0.0, 1.0)
169
170model = ROOT.RooAddPdf("model", "model", [gauss, expo], [nsig, nbkg])
171
172model_data = model.generateBinned(x)
173
174# Set the starting values for the Gaussian parameters away from the true
175# value such that the fit is not trivial.
176mu.setVal(2.0)
177sigma.setVal(1.0)
178
179fit6 = model.fitTo(model_data, Save=True, PrintLevel=-1, SumW2Error=False)
180fit6.Print()
181
182# You should see in the previous fit result that the fit did not converge:
183# the MINIMIZE return code should be -1 (a successful fit has status code zero).
184
185# To improve the situation, we can apply a numeric trick: if we subtract in
186# each bin a constant counterterm \f[n\log(n/N)\f], we get terms for each
187# bin that are closer to each other in order of magnitude as long as the
188# initial model is not extremely off. Proving this mathematically is left
189# as an exercise to the reader.
190
191# This counterterms can be enabled by passing the Offset("bin") option to
192# RooAbsPdf::fitTo() or RooAbsPdf::createNLL().
193
194fit7 = model.fitTo(model_data, Offset="bin", Save=True, PrintLevel=-1, SumW2Error=False)
195fit7.Print()
196
197# You should now see in the last fit result that the fit has converged.