Logo ROOT  
Reference Guide
 
Loading...
Searching...
No Matches
rf611_weightedfits.C
Go to the documentation of this file.
1/// \file
2/// \ingroup tutorial_roofit
3/// \notebook -js
4/// Likelihood and minimization: Parameter uncertainties for weighted unbinned ML fits
5///
6/// ## Parameter uncertainties for weighted unbinned ML fits
7///
8/// Based on example from https://arxiv.org/abs/1911.01303
9///
10/// This example compares different approaches to determining parameter uncertainties in weighted unbinned maximum
11/// likelihood fits. Performing a weighted unbinned maximum likelihood fits can be useful to account for acceptance
12/// effects and to statistically subtract background events using the sPlot formalism. It is however well known that the
13/// inverse Hessian matrix does not yield parameter uncertainties with correct coverage in the presence of event
14/// weights. Three approaches to the determination of parameter uncertainties are compared in this example:
15///
16/// 1. Using the inverse weighted Hessian matrix [`SumW2Error(false)`]
17///
18/// 2. Using the expression [`SumW2Error(true)`]
19/// \f[
20/// V_{ij} = H_{ik}^{-1} C_{kl} H_{lj}^{-1}
21/// \f]
22/// where H is the weighted Hessian matrix and C is the Hessian matrix with squared weights
23///
24/// 3. The asymptotically correct approach (for details please see https://arxiv.org/abs/1911.01303)
25/// [`Asymptotic(true)`]
26/// \f[
27/// V_{ij} = H_{ik}^{-1} D_{kl} H_{lj}^{-1}
28/// \f]
29/// where H is the weighted Hessian matrix and D is given by
30/// \f[
31/// D_{kl} = \sum_{e=1}^{N} w_e^2 \frac{\partial \log(P)}{\partial \lambda_k}\frac{\partial \log(P)}{\partial
32/// \lambda_l}
33/// \f]
34/// with the event weight \f$w_e\f$.
35///
36/// The example performs the fit of a second order polynomial in the angle cos(theta) [-1,1] to a weighted data set.
37/// The polynomial is given by
38/// \f[
39/// P = \frac{ 1 + c_0 \cdot \cos(\theta) + c_1 \cdot \cos(\theta) \cdot \cos(\theta) }{\mathrm{Norm}}
40/// \f]
41/// The two coefficients \f$ c_0 \f$ and \f$ c_1 \f$ and their uncertainties are to be determined in the fit.
42///
43/// The per-event weight is used to correct for an acceptance effect, two different acceptance models can be studied:
44/// - `acceptancemodel==1`: eff = \f$ 0.3 + 0.7 \cdot \cos(\theta) \cdot \cos(\theta) \f$
45/// - `acceptancemodel==2`: eff = \f$ 1.0 - 0.7 \cdot \cos(\theta) \cdot \cos(\theta) \f$
46/// The data is generated to be flat before the acceptance effect.
47///
48/// The performance of the different approaches to determine parameter uncertainties is compared using the pull
49/// distributions from a large number of pseudoexperiments. The pull is defined as \f$ (\lambda_i -
50/// \lambda_{gen})/\sigma(\lambda_i) \f$, where \f$ \lambda_i \f$ is the fitted parameter and \f$ \sigma(\lambda_i) \f$
51/// its uncertainty for pseudoexperiment number i. If the fit is unbiased and the parameter uncertainties are estimated
52/// correctly, the pull distribution should be a Gaussian centered around zero with a width of one.
53///
54/// \macro_image
55/// \macro_code
56/// \macro_output
57///
58/// \date November 2019
59/// \author Christoph Langenbruch
60
61#include "TH1D.h"
62#include "TCanvas.h"
63#include "TROOT.h"
64#include "TStyle.h"
65#include "TRandom3.h"
66#include "TLegend.h"
67#include "RooRealVar.h"
68#include "RooFitResult.h"
69#include "RooDataSet.h"
70#include "RooPolynomial.h"
71
72using namespace RooFit;
73
74void rf611_weightedfits(int acceptancemodel = 2)
75{
76 // I n i t i a l i s a t i o n a n d S e t u p
77 //------------------------------------------------
78
79 // plotting options
82 gStyle->SetTitleSize(0.05, "XY");
83 gStyle->SetLabelSize(0.05, "XY");
84 gStyle->SetTitleOffset(0.9, "XY");
85 gStyle->SetTextSize(0.05);
88 gStyle->SetPadTopMargin(0.075);
94
95 // initialise TRandom3
96 TRandom3 *rnd = new TRandom3();
97 rnd->SetSeed(191101303);
98
99 // accepted events and events weighted to account for the acceptance
100 TH1 *haccepted = new TH1D("haccepted", "Generated events;cos(#theta);#events", 40, -1.0, 1.0);
101 TH1 *hweighted = new TH1D("hweighted", "Generated events;cos(#theta);#events", 40, -1.0, 1.0);
102 // histograms holding pull distributions
103 std::array<TH1 *, 3> hc0pull;
104 std::array<TH1 *, 3> hc1pull;
105 std::array<TH1 *, 3> hntotpull;
106 std::array<std::string, 3> methodLabels{"Inverse weighted Hessian matrix [SumW2Error(false)]",
107 "Hessian matrix with squared weights [SumW2Error(true)]",
108 "Asymptotically correct approach [Asymptotic(true)]"};
109 auto makePullXLabel = [](std::string const &pLabel) {
110 return "Pull (" + pLabel + "^{fit}-" + pLabel + "^{gen})/#sigma(" + pLabel + ")";
111 };
112 for (std::size_t i = 0; i < 3; ++i) {
113 std::string const &iLabel = std::to_string(i);
114 // using the inverse Hessian matrix
115 std::string hc0XLabel = methodLabels[i] + ";" + makePullXLabel("c_{0}") + ";";
116 std::string hc1XLabel = methodLabels[i] + ";" + makePullXLabel("c_{1}") + ";";
117 std::string hntotXLabel = methodLabels[i] + ";" + makePullXLabel("N_{tot}") + ";";
118 hc0pull[i] = new TH1D(("hc0pull" + iLabel).c_str(), hc0XLabel.c_str(), 20, -5.0, 5.0);
119 // using the correction with the Hessian matrix with squared weights
120 hc1pull[i] = new TH1D(("hc1pull" + iLabel).c_str(), hc1XLabel.c_str(), 20, -5.0, 5.0);
121 // asymptotically correct approach
122 hntotpull[i] = new TH1D(("hntotpull" + iLabel).c_str(), hntotXLabel.c_str(), 20, -5.0, 5.0);
123 }
124
125 // number of pseudoexperiments (toys) and number of events per pseudoexperiment
126 constexpr std::size_t ntoys = 500;
127 constexpr std::size_t nstats = 500;
128 // parameters used in the generation
129 constexpr double c0gen = 0.0;
130 constexpr double c1gen = 0.0;
131
132 // Silence fitting and minimisation messages
133 auto &msgSv = RooMsgService::instance();
134 msgSv.getStream(1).removeTopic(RooFit::Minimization);
135 msgSv.getStream(1).removeTopic(RooFit::Fitting);
136
137 std::cout << "Running " << ntoys * 3 << " toy fits ..." << std::endl;
138
139 // M a i n l o o p : r u n p s e u d o e x p e r i m e n t s
140 //----------------------------------------------------------------
141 for (std::size_t i = 0; i < ntoys; i++) {
142 // S e t u p p a r a m e t e r s a n d P D F
143 //-----------------------------------------------
144 // angle theta and the weight to account for the acceptance effect
145 RooRealVar costheta("costheta", "costheta", -1.0, 1.0);
146 RooRealVar weight("weight", "weight", 0.0, 1000.0);
147
148 // initialise parameters to fit
149 RooRealVar c0("c0", "0th-order coefficient", c0gen, -1.0, 1.0);
150 RooRealVar c1("c1", "1st-order coefficient", c1gen, -1.0, 1.0);
151 c0.setError(0.01);
152 c1.setError(0.01);
153 // create simple second-order polynomial as probability density function
154 RooPolynomial pol("pol", "pol", costheta, {c0, c1}, 1);
155
156 double ngen = nstats;
157 if (acceptancemodel == 1)
158 ngen *= 2.0 / (23.0 / 15.0);
159 else
160 ngen *= 2.0 / (16.0 / 15.0);
161 RooRealVar ntot("ntot", "ntot", ngen, 0.0, 2.0 * ngen);
162 RooExtendPdf extended("extended", "extended pdf", pol, ntot);
163 int npoisson = rnd->Poisson(nstats);
164
165 // G e n e r a t e d a t a s e t f o r p s e u d o e x p e r i m e n t i
166 //-------------------------------------------------------------------------------
167 RooDataSet data("data", "data", {costheta, weight}, WeightVar("weight"));
168 // generate nstats events
169 for (std::size_t j = 0; j < npoisson; j++) {
170 bool finished = false;
171 // use simple accept/reject for generation
172 while (!finished) {
173 costheta = 2.0 * rnd->Rndm() - 1.0;
174 // efficiency for the specific value of cos(theta)
175 double eff = 1.0;
176 if (acceptancemodel == 1)
177 eff = 1.0 - 0.7 * costheta.getVal() * costheta.getVal();
178 else
179 eff = 0.3 + 0.7 * costheta.getVal() * costheta.getVal();
180 // use 1/eff as weight to account for acceptance
181 weight = 1.0 / eff;
182 // accept/reject
183 if (10.0 * rnd->Rndm() < eff * pol.getVal())
184 finished = true;
185 }
186 haccepted->Fill(costheta.getVal());
187 hweighted->Fill(costheta.getVal(), weight.getVal());
188 data.add({costheta, weight}, weight.getVal());
189 }
190
191 auto fillPulls = [&](std::size_t i) {
192 hc0pull[i]->Fill((c0.getVal() - c0gen) / c0.getError());
193 hc1pull[i]->Fill((c1.getVal() - c1gen) / c1.getError());
194 hntotpull[i]->Fill((ntot.getVal() - ngen) / ntot.getError());
195 };
196
197 // F i t t o y u s i n g t h e t h r e e d i f f e r e n t a p p r o a c h e s t o u n c e r t a i
198 // n t y d e t e r m i n a t i o n
199 //-------------------------------------------------------------------------------------------------------------------------------------------------
200 // this uses the inverse weighted Hessian matrix
201 extended.fitTo(data, SumW2Error(false), PrintLevel(-1));
202 fillPulls(0);
203
204 // this uses the correction with the Hesse matrix with squared weights
205 extended.fitTo(data, SumW2Error(true), PrintLevel(-1));
206 fillPulls(1);
207
208 // this uses the asymptotically correct approach
209 extended.fitTo(data, AsymptoticError(true), PrintLevel(-1));
210 fillPulls(2);
211 }
212
213 std::cout << "... done." << std::endl;
214
215 // P l o t o u t p u t d i s t r i b u t i o n s
216 //--------------------------------------------------
217
218 // plot accepted (weighted) events
219 gStyle->SetOptStat(0);
220 gStyle->SetOptFit(0);
221 TCanvas *cevents = new TCanvas("cevents", "cevents", 800, 600);
222 cevents->cd(1);
223 hweighted->SetMinimum(0.0);
224 hweighted->SetLineColor(2);
225 hweighted->Draw("hist");
226 haccepted->Draw("same hist");
227 TLegend *leg = new TLegend(0.6, 0.8, 0.9, 0.9);
228 leg->AddEntry(haccepted, "Accepted");
229 leg->AddEntry(hweighted, "Weighted");
230 leg->Draw();
231 cevents->Update();
232
233 // plot pull distributions
234 TCanvas *cpull = new TCanvas("cpull", "cpull", 1200, 800);
235 cpull->Divide(3, 3);
236
237 std::vector<TH1 *> pullHistos{hc0pull[0], hc0pull[1], hc0pull[2], hc1pull[0], hc1pull[1],
238 hc1pull[2], hntotpull[0], hntotpull[1], hntotpull[2]};
239
240 gStyle->SetOptStat(1100);
241 gStyle->SetOptFit(11);
242
243 for (std::size_t i = 0; i < pullHistos.size(); ++i) {
244 cpull->cd(i + 1);
245 pullHistos[i]->Fit("gaus");
246 pullHistos[i]->Draw("ep");
247 }
248
249 cpull->Update();
250}
Option_t Option_t TPoint TPoint const char GetTextMagnitude GetFillStyle GetLineColor GetLineWidth GetMarkerStyle GetTextAlign GetTextColor GetTextSize void data
R__EXTERN TStyle * gStyle
Definition TStyle.h:436
Container class to hold unbinned data.
Definition RooDataSet.h:34
RooExtendPdf is a wrapper around an existing PDF that adds a parameteric extended likelihood term to ...
static RooMsgService & instance()
Return reference to singleton instance.
RooPolynomial implements a polynomial p.d.f of the form.
Variable that can be changed from the outside.
Definition RooRealVar.h:37
virtual void SetLineColor(Color_t lcolor)
Set the line color.
Definition TAttLine.h:40
virtual void SetMarkerStyle(Style_t mstyle=1)
Set the marker style.
Definition TAttMarker.h:40
virtual void SetMarkerSize(Size_t msize=1)
Set the marker size.
Definition TAttMarker.h:45
virtual void SetTextSize(Float_t tsize=1)
Set the text size.
Definition TAttText.h:47
The Canvas class.
Definition TCanvas.h:23
TVirtualPad * cd(Int_t subpadnumber=0) override
Set current canvas & pad.
Definition TCanvas.cxx:719
void Update() override
Update canvas pad buffers.
Definition TCanvas.cxx:2489
1-D histogram with a double per channel (see TH1 documentation)
Definition TH1.h:671
TH1 is the base class of all histogram classes in ROOT.
Definition TH1.h:59
virtual Int_t Fill(Double_t x)
Increment bin with abscissa X by 1.
Definition TH1.cxx:3346
void Draw(Option_t *option="") override
Draw this histogram with options.
Definition TH1.cxx:3068
virtual void SetMinimum(Double_t minimum=-1111)
Definition TH1.h:406
This class displays a legend box (TPaveText) containing several legend entries.
Definition TLegend.h:23
void Divide(Int_t nx=1, Int_t ny=1, Float_t xmargin=0.01, Float_t ymargin=0.01, Int_t color=0) override
Automatic pad generation by division.
Definition TPad.cxx:1249
Random number generator class based on M.
Definition TRandom3.h:27
Double_t Rndm() override
Machine independent random number generator.
Definition TRandom3.cxx:99
void SetSeed(ULong_t seed=0) override
Set the random generator sequence if seed is 0 (default value) a TUUID is generated and used to fill ...
Definition TRandom3.cxx:206
virtual ULong64_t Poisson(Double_t mean)
Generates a random integer N according to a Poisson law.
Definition TRandom.cxx:404
void SetPadTopMargin(Float_t margin=0.1)
Definition TStyle.h:359
void SetOptStat(Int_t stat=1)
The type of information printed in the histogram statistics box can be selected via the parameter mod...
Definition TStyle.cxx:1640
void SetPadBottomMargin(Float_t margin=0.1)
Definition TStyle.h:358
void SetPaintTextFormat(const char *format="g")
Definition TStyle.h:386
void SetEndErrorSize(Float_t np=2)
Set the size (in pixels) of the small lines drawn at the end of the error bars (TH1 or TGraphErrors).
Definition TStyle.cxx:1340
void SetPadRightMargin(Float_t margin=0.1)
Definition TStyle.h:361
void SetTitleOffset(Float_t offset=1, Option_t *axis="X")
Specify a parameter offset to control the distance between the axis and the axis title.
Definition TStyle.cxx:1798
void SetPadLeftMargin(Float_t margin=0.1)
Definition TStyle.h:360
void SetHistLineColor(Color_t color=1)
Definition TStyle.h:380
void SetTitleSize(Float_t size=0.02, Option_t *axis="X")
Definition TStyle.cxx:1817
void SetHistLineWidth(Width_t width=1)
Definition TStyle.h:383
void SetLabelSize(Float_t size=0.04, Option_t *axis="X")
Set size of axis labels.
Definition TStyle.cxx:1444
void SetOptFit(Int_t fit=1)
The type of information about fit parameters printed in the histogram statistics box can be selected ...
Definition TStyle.cxx:1593
void Draw(Option_t *option="") override=0
Default Draw method for all objects.
RooCmdArg WeightVar(const char *name="weight", bool reinterpretAsWeight=false)
RooCmdArg AsymptoticError(bool flag)
RooCmdArg SumW2Error(bool flag)
RooCmdArg PrintLevel(Int_t code)
return c1
Definition legend1.C:41
leg
Definition legend1.C:34
The namespace RooFit contains mostly switches that change the behaviour of functions of PDFs (or othe...
Definition CodegenImpl.h:64