Likelihood and minimization: Parameter uncertainties for weighted unbinned ML fits
This example compares different approaches to determining parameter uncertainties in weighted unbinned maximum likelihood fits. Weighted unbinned maximum likelihood fits can be useful to account for acceptance effects and to statistically subtract background events using the sPlot formalism. It is, however, well known that the inverse Hessian matrix does not yield parameter uncertainties with correct coverage in the presence of event weights. Three approaches to the determination of parameter uncertainties are compared in this example: (1) the inverse weighted Hessian matrix [SumW2Error(false)], (2) the correction using the Hessian matrix with squared weights [SumW2Error(true)], and (3) the asymptotically correct approach [AsymptoticError(true)].
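For the SumW2Error(false) and SumW2Error(true) options, the covariance estimates can be written compactly. The following is a sketch in notation introduced here for illustration (\( w_i \) is the weight of event \( i \), \( x_i \) its observable, \( \lambda \) the vector of fit parameters, and \( P \) the probability density); it is not taken from the text of this example:
\[ H_{jk} = -\sum_i w_i \frac{\partial^2 \ln P(x_i;\lambda)}{\partial\lambda_j\,\partial\lambda_k}, \qquad C_{jk} = -\sum_i w_i^2 \frac{\partial^2 \ln P(x_i;\lambda)}{\partial\lambda_j\,\partial\lambda_k} \]
\[ V_{\mathrm{SumW2Error(false)}} = H^{-1}, \qquad V_{\mathrm{SumW2Error(true)}} = H^{-1} \, C \, H^{-1} \]
For a constant weight \( w_i = w \), \( H \) scales with \( w \) and \( C \) with \( w^2 \). The estimate \( H^{-1} \) therefore depends on the arbitrary value of \( w \), whereas \( H^{-1} C H^{-1} \) reduces to the inverse Hessian of the unweighted fit.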
The example fits a second-order polynomial in \( \cos(\theta) \), with \( \cos(\theta) \in [-1, 1] \), to a weighted data set. The polynomial is given by
\[ P = \frac{ 1 + c_0 \cdot \cos(\theta) + c_1 \cdot \cos(\theta) \cdot \cos(\theta) }{\mathrm{Norm}} \]
The two coefficients \( c_0 \) and \( c_1 \) and their uncertainties are to be determined in the fit.
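As a minimal sketch, the normalised polynomial can be evaluated in plain C++ without RooFit. The function names polPdf and polPdfIntegral are hypothetical helpers introduced here, and the closed form of Norm is derived under the assumption that the normalisation range is exactly [-1, 1]:

```cpp
#include <cassert>

// Normalised second-order polynomial in x = cos(theta) on [-1, 1].
// Illustrative helper, not part of the RooFit API.
double polPdf(double x, double c0, double c1)
{
   assert(x >= -1.0 && x <= 1.0);
   // Norm = integral over [-1, 1] of (1 + c0*x + c1*x^2) = 2 + 2*c1/3;
   // the odd term c0*x integrates to zero over the symmetric range.
   const double norm = 2.0 + 2.0 * c1 / 3.0;
   return (1.0 + c0 * x + c1 * x * x) / norm;
}

// Midpoint-rule integral of polPdf over [-1, 1]; should be close to 1.
double polPdfIntegral(double c0, double c1, int nsteps = 1000)
{
   const double h = 2.0 / nsteps;
   double sum = 0.0;
   for (int i = 0; i < nsteps; ++i)
      sum += polPdf(-1.0 + (i + 0.5) * h, c0, c1) * h;
   return sum;
}
```

With \( c_0 = c_1 = 0 \), the values used for generation in this example, the density is flat and polPdf returns 0.5 everywhere in the range.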
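The per-event weight is used to correct for an acceptance effect. Two different acceptance models can be studied, selected by the argument acceptancemodel of the macro: for acceptancemodel = 1 the efficiency is \( 1 - 0.7 \cos^2(\theta) \), and for any other value (the default is 2) it is \( 0.3 + 0.7 \cos^2(\theta) \). Each accepted event is given the weight 1/efficiency.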
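The two efficiency expressions used in the macro can be sketched as standalone functions. The names acceptance and eventWeight are hypothetical and introduced only for illustration; the formulas are the ones that appear in the generation loop of the macro:

```cpp
#include <cassert>

// Acceptance (efficiency) as a function of x = cos(theta).
// Model 1: 1 - 0.7*x^2; any other model value: 0.3 + 0.7*x^2.
double acceptance(int model, double x)
{
   assert(x >= -1.0 && x <= 1.0);
   return (model == 1) ? 1.0 - 0.7 * x * x : 0.3 + 0.7 * x * x;
}

// Per-event weight that corrects for the acceptance.
double eventWeight(int model, double x)
{
   return 1.0 / acceptance(model, x);
}
```

Model 1 loses events at large \( |\cos(\theta)| \), while the default model loses events around \( \cos(\theta) = 0 \). In both cases the efficiency stays between 0.3 and 1, so the weights lie between 1 and 1/0.3.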
The performance of the different approaches to determining parameter uncertainties is compared using the pull distributions from a large number of pseudoexperiments. The pull is defined as \( (\lambda_i - \lambda_{gen})/\sigma(\lambda_i) \), where \( \lambda_i \) is the fitted parameter and \( \sigma(\lambda_i) \) its uncertainty for pseudoexperiment number \( i \), and \( \lambda_{gen} \) is the parameter value used in the generation. If the fit is unbiased and the parameter uncertainties are estimated correctly, the pull distribution should be a Gaussian centered at zero with a width of one.
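The pull definition can be illustrated with a short standalone sketch. The helpers pull and pullMeanAndWidth are hypothetical names introduced here. They summarise a set of pulls by the sample mean and standard deviation, which is a simplification: the fit output below suggests that the macro instead fits a Gaussian to each pull histogram.

```cpp
#include <cassert>
#include <cmath>
#include <utility>
#include <vector>

// Pull of one pseudoexperiment: (fitted - generated) / uncertainty.
double pull(double fitted, double generated, double sigma)
{
   assert(sigma > 0.0);
   return (fitted - generated) / sigma;
}

// Sample mean and standard deviation of a set of pulls.
// For an unbiased fit with correctly estimated uncertainties these
// should be close to 0 and 1, respectively.
std::pair<double, double> pullMeanAndWidth(const std::vector<double> &pulls)
{
   assert(pulls.size() > 1);
   double sum = 0.0;
   for (double p : pulls)
      sum += p;
   const double mean = sum / pulls.size();
   double sq = 0.0;
   for (double p : pulls)
      sq += (p - mean) * (p - mean);
   return {mean, std::sqrt(sq / (pulls.size() - 1))};
}
```

A width above one indicates underestimated uncertainties, and a width below one indicates overestimated uncertainties.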
RooFit v3.60 -- Developed by Wouter Verkerke and David Kirkby
Copyright (C) 2000-2013 NIKHEF, University of California & Stanford University
All rights reserved, please read http://roofit.sourceforge.net/license.txt
Running 1500 toy fits ...
... done.
FCN=13.515 FROM MIGRAD STATUS=CONVERGED 60 CALLS 61 TOTAL
EDM=7.11482e-10 STRATEGY= 1 ERROR MATRIX ACCURATE
EXT PARAMETER STEP FIRST
NO. NAME VALUE ERROR SIZE DERIVATIVE
1 Constant 8.06326e+01 4.61035e+00 6.80051e-03 2.63179e-06
2 Mean 4.59821e-04 5.58735e-02 1.02835e-04 -6.59629e-04
3 Sigma 1.20608e+00 4.29699e-02 1.71309e-05 1.24617e-03
FCN=5.84356 FROM MIGRAD STATUS=CONVERGED 59 CALLS 60 TOTAL
EDM=4.43987e-09 STRATEGY= 1 ERROR MATRIX ACCURATE
EXT PARAMETER STEP FIRST
NO. NAME VALUE ERROR SIZE DERIVATIVE
1 Constant 9.75213e+01 5.59338e+00 5.60402e-03 -1.97617e-05
2 Mean 7.51216e-03 4.64617e-02 5.90804e-05 -7.04981e-04
3 Sigma 1.01358e+00 3.70947e-02 1.21766e-05 -6.78934e-03
FCN=5.92321 FROM MIGRAD STATUS=CONVERGED 59 CALLS 60 TOTAL
EDM=2.82374e-09 STRATEGY= 1 ERROR MATRIX ACCURATE
EXT PARAMETER STEP FIRST
NO. NAME VALUE ERROR SIZE DERIVATIVE
1 Constant 9.67219e+01 5.56424e+00 5.59080e-03 -1.59410e-05
2 Mean 1.31546e-02 4.71082e-02 5.99842e-05 -5.99126e-04
3 Sigma 1.02204e+00 3.77681e-02 1.23082e-05 -4.76847e-03
FCN=9.99353 FROM MIGRAD STATUS=CONVERGED 51 CALLS 52 TOTAL
EDM=1.54209e-08 STRATEGY= 1 ERROR MATRIX ACCURATE
EXT PARAMETER STEP FIRST
NO. NAME VALUE ERROR SIZE DERIVATIVE
1 Constant 7.31330e+01 4.15612e+00 5.34878e-03 8.14696e-06
2 Mean -3.53982e-03 6.24075e-02 1.00776e-04 2.69686e-03
3 Sigma 1.34426e+00 4.86401e-02 1.55414e-05 6.26797e-03
FCN=13.9377 FROM MIGRAD STATUS=CONVERGED 59 CALLS 60 TOTAL
EDM=1.41305e-07 STRATEGY= 1 ERROR MATRIX ACCURATE
EXT PARAMETER STEP FIRST
NO. NAME VALUE ERROR SIZE DERIVATIVE
1 Constant 3.74207e+01 2.42528e+00 3.48790e-03 -5.82511e-05
2 Mean -3.59360e-01 1.24029e-01 2.28239e-04 3.37657e-03
3 Sigma 2.24823e+00 1.11069e-01 2.42149e-05 -2.95187e-02
FCN=6.06878 FROM MIGRAD STATUS=CONVERGED 60 CALLS 61 TOTAL
EDM=2.14707e-09 STRATEGY= 1 ERROR MATRIX ACCURATE
EXT PARAMETER STEP FIRST
NO. NAME VALUE ERROR SIZE DERIVATIVE
1 Constant 9.96656e+01 5.51125e+00 5.82217e-03 -7.19806e-06
2 Mean 2.92964e-02 4.61022e-02 5.98483e-05 1.20218e-03
3 Sigma 9.95652e-01 3.38750e-02 1.20391e-05 -3.38248e-03
(int) 0
#include "RooDataSet.h"
#include "RooGlobalFunc.h"
#include "RooPolynomial.h"
#include "RooRealVar.h"
#include "TCanvas.h"
#include "TH1D.h"
#include "TLegend.h"
#include "TRandom3.h"
#include <iostream>

int rf611_weightedfits(int acceptancemodel=2) {
  // Random number generator used for the accept/reject generation
  TRandom3* rnd = new TRandom3();

  // Accepted events, and events weighted to correct for the acceptance
  TH1D* haccepted = new TH1D("haccepted", "Generated events;cos(#theta);#events", 40, -1.0, 1.0);
  TH1D* hweighted = new TH1D("hweighted", "Generated events;cos(#theta);#events", 40, -1.0, 1.0);

  // Pull distributions of c0 and c1 for the three approaches
  TH1D* hc0pull1 = new TH1D("hc0pull1", "Inverse weighted Hessian matrix [SumW2Error(false)];Pull (c_{0}^{fit}-c_{0}^{gen})/#sigma(c_{0});", 20, -5.0, 5.0);
  TH1D* hc1pull1 = new TH1D("hc1pull1", "Inverse weighted Hessian matrix [SumW2Error(false)];Pull (c_{1}^{fit}-c_{1}^{gen})/#sigma(c_{1});", 20, -5.0, 5.0);
  TH1D* hc0pull2 = new TH1D("hc0pull2", "Hessian matrix with squared weights [SumW2Error(true)];Pull (c_{0}^{fit}-c_{0}^{gen})/#sigma(c_{0});", 20, -5.0, 5.0);
  TH1D* hc1pull2 = new TH1D("hc1pull2", "Hessian matrix with squared weights [SumW2Error(true)];Pull (c_{1}^{fit}-c_{1}^{gen})/#sigma(c_{1});", 20, -5.0, 5.0);
  TH1D* hc0pull3 = new TH1D("hc0pull3", "Asymptotically correct approach [Asymptotic(true)];Pull (c_{0}^{fit}-c_{0}^{gen})/#sigma(c_{0});", 20, -5.0, 5.0);
  TH1D* hc1pull3 = new TH1D("hc1pull3", "Asymptotically correct approach [Asymptotic(true)];Pull (c_{1}^{fit}-c_{1}^{gen})/#sigma(c_{1});", 20, -5.0, 5.0);

  // Number of pseudoexperiments (toys) and number of events per pseudoexperiment
  constexpr unsigned int ntoys = 500;
  constexpr unsigned int nstats = 5000;
  // Parameter values used in the generation
  constexpr double c0gen = 0.0;
  constexpr double c1gen = 0.0;

  std::cout << "Running " << ntoys*3 << " toy fits ..." << std::endl;
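  // Main loop: run the pseudoexperiments
  for (unsigned int i=0; i<ntoys; i++) {
    // Observable cos(theta) and the weight that corrects for the acceptance
    RooRealVar costheta("costheta", "costheta", -1.0, 1.0);
    RooRealVar weight("weight", "weight", 0.0, 1000.0);
    // Fit parameters, initialised to the generated values
    RooRealVar c0("c0", "0th-order coefficient", c0gen, -1.0, 1.0);
    RooRealVar c1("c1", "1st-order coefficient", c1gen, -1.0, 1.0);
    c0.setError(0.01);
    // Polynomial 1 + c0*x + c1*x^2 (the lowest coefficient order is 1)
    RooPolynomial pol("pol", "pol", costheta, RooArgList(c0, c1), 1);
    // Weighted, unbinned data set for this pseudoexperiment
    RooDataSet data("data", "data", RooArgSet(costheta, weight), RooFit::WeightVar("weight"));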
    // Generate nstats events with a simple accept/reject method
    for (unsigned int j=0; j<nstats; j++) {
      bool finished = false;
      while (!finished) {
        costheta = 2.0*rnd->Rndm()-1.0;
        // Efficiency for this value of cos(theta)
        double eff = 1.0;
        if (acceptancemodel == 1)
          eff = 1.0 - 0.7 * costheta.getVal()*costheta.getVal();
        else
          eff = 0.3 + 0.7 * costheta.getVal()*costheta.getVal();
        // Use 1/eff as the weight to correct for the acceptance
        weight = 1.0/eff;
        // Accept or reject the candidate event
        if (10.0*rnd->Rndm() < eff*pol.getVal())
          finished = true;
      }
      haccepted->Fill(costheta.getVal());
      hweighted->Fill(costheta.getVal(), weight.getVal());
      data.add(RooArgSet(costheta, weight), weight.getVal());
    }
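    // Fit the toy once per approach; the fit options match the bracketed labels in the pull-histogram titles
    // Approach 1: inverse weighted Hessian matrix
    pol.fitTo(data, RooFit::SumW2Error(false), RooFit::PrintLevel(-1), RooFit::BatchMode(true));
    hc0pull1->Fill((c0.getVal()-c0gen)/c0.getError());
    hc1pull1->Fill((c1.getVal()-c1gen)/c1.getError());

    // Approach 2: correction using the Hessian matrix with squared weights
    pol.fitTo(data, RooFit::SumW2Error(true), RooFit::PrintLevel(-1), RooFit::BatchMode(true));
    hc0pull2->Fill((c0.getVal()-c0gen)/c0.getError());
    hc1pull2->Fill((c1.getVal()-c1gen)/c1.getError());

    // Approach 3: asymptotically correct approach
    pol.fitTo(data, RooFit::AsymptoticError(true), RooFit::PrintLevel(-1), RooFit::BatchMode(true));
    hc0pull3->Fill((c0.getVal()-c0gen)/c0.getError());
    hc1pull3->Fill((c1.getVal()-c1gen)/c1.getError());
  }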
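  std::cout << "... done." << std::endl;
  // Fit a Gaussian to each pull distribution. Canvas names and layout are assumptions, not taken from this listing.
  TCanvas* cpull = new TCanvas("cpull", "Pull distributions");
  cpull->Divide(3, 2);
  int ipad = 0;
  for (TH1D* h : {hc0pull1, hc0pull2, hc0pull3, hc1pull1, hc1pull2, hc1pull3}) {
    cpull->cd(++ipad);
    h->Fit("gaus");
  }
  // Compare the accepted and the weighted distributions. The legend placement is an assumption.
  TCanvas* cacc = new TCanvas("cacc", "Acceptance");
  hweighted->Draw("hist");
  haccepted->Draw("same hist");
  TLegend* leg = new TLegend(0.6, 0.8, 0.9, 0.9);
  leg->AddEntry(haccepted, "Accepted");
  leg->AddEntry(hweighted, "Weighted");
  leg->Draw();
  cacc->Update();
  return 0;
}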