Logo ROOT  
Reference Guide
 
Loading...
Searching...
No Matches
rf403_weightedevts.py File Reference

Namespaces

namespace  rf403_weightedevts
 

Detailed Description

View in nbviewer Open in SWAN

'DATA AND CATEGORIES' RooFit tutorial macro #403

Using weights in unbinned datasets

from __future__ import print_function
import ROOT
# Create observable and unweighted dataset
# -------------------------------------------
# Declare observable
x = ROOT.RooRealVar("x", "x", -10, 10)
x.setBins(40)
# Construction a uniform pdf
p0 = ROOT.RooPolynomial("px", "px", x)
# Sample 1000 events from pdf
data = p0.generate({x}, 1000)
# Calculate weight and make dataset weighted
# --------------------------------------------------
# Construct formula to calculate (fake) weight for events
wFunc = ROOT.RooFormulaVar("w", "event weight", "(x*x+10)", [x])
# Add column with variable w to previously generated dataset
w = data.addColumn(wFunc)
# Dataset d is now a dataset with two observable (x,w) with 1000 entries
data.Print()
# Instruct dataset wdata in interpret w as event weight rather than as
# observable
wdata = ROOT.RooDataSet(data.GetName(), data.GetTitle(), data, data.get(), "", w.GetName())
# Dataset d is now a dataset with one observable (x) with 1000 entries and
# a sum of weights of ~430K
wdata.Print()
# Unbinned ML fit to weighted data
# ---------------------------------------------------------------
# Construction quadratic polynomial pdf for fitting
a0 = ROOT.RooRealVar("a0", "a0", 1)
a1 = ROOT.RooRealVar("a1", "a1", 0, -1, 1)
a2 = ROOT.RooRealVar("a2", "a2", 1, 0, 10)
p2 = ROOT.RooPolynomial("p2", "p2", x, [a0, a1, a2], 0)
# Fit quadratic polynomial to weighted data
# NOTE: A plain Maximum likelihood fit to weighted data does in general
# NOT result in correct error estimates, individual
# event weights represent Poisson statistics themselves.
#
# Fit with 'wrong' errors
r_ml_wgt = p2.fitTo(wdata, Save=True, PrintLevel=-1)
# A first order correction to estimated parameter errors in an
# (unbinned) ML fit can be obtained by calculating the
# covariance matrix as
#
# V' = V C-1 V
#
# where V is the covariance matrix calculated from a fit
# to -logL = - sum [ w_i log f(x_i) ] and C is the covariance
# matrix calculated from -logL' = -sum [ w_i^2 log f(x_i) ]
# (i.e. the weights are applied squared)
#
# A fit in self mode can be performed as follows:
r_ml_wgt_corr = p2.fitTo(wdata, Save=True, SumW2Error=True, PrintLevel=-1)
# Plot weighted data and fit result
# ---------------------------------------------------------------
# Construct plot frame
frame = x.frame(Title="Unbinned ML fit, chi^2 fit to weighted data")
# Plot data using sum-of-weights-squared error rather than Poisson errors
wdata.plotOn(frame, DataError="SumW2")
# Overlay result of 2nd order polynomial fit to weighted data
p2.plotOn(frame)
# ML fit of pdf to equivalent unweighted dataset
# ---------------------------------------------------------------------
# Construct a pdf with the same shape as p0 after weighting
genPdf = ROOT.RooGenericPdf("genPdf", "x*x+10", [x])
# Sample a dataset with the same number of events as data
data2 = genPdf.generate({x}, 1000)
# Sample a dataset with the same number of weights as data
data3 = genPdf.generate({x}, 43000)
# Fit the 2nd order polynomial to both unweighted datasets and save the
# results for comparison
r_ml_unw10 = p2.fitTo(data2, Save=True, PrintLevel=-1)
r_ml_unw43 = p2.fitTo(data3, Save=True, PrintLevel=-1)
# Chis2 fit of pdf to binned weighted dataset
# ---------------------------------------------------------------------------
# Construct binned clone of unbinned weighted dataset
binnedData = wdata.binnedClone()
binnedData.Print("v")
# Perform chi2 fit to binned weighted dataset using sum-of-weights errors
#
# NB: Within the usual approximations of a chi2 fit, chi2 fit to weighted
# data using sum-of-weights-squared errors does give correct error
# estimates
chi2 = p2.createChi2(binnedData, ROOT.RooFit.DataError("SumW2"))
m = ROOT.RooMinimizer(chi2)
m.migrad()
m.hesse()
# Plot chi^2 fit result on frame as well
r_chi2_wgt = m.save()
p2.plotOn(frame, LineStyle="--", LineColor="r")
# Compare fit results of chi2, L fits to (un)weighted data
# ------------------------------------------------------------
# Note that ML fit on 1Kevt of weighted data is closer to result of ML fit on 43Kevt of unweighted data
# than to 1Kevt of unweighted data, the reference chi^2 fit with SumW2 error gives a result closer to
# that of an unbinned ML fit to 1Kevt of unweighted data.
print("==> ML Fit results on 1K unweighted events")
r_ml_unw10.Print()
print("==> ML Fit results on 43K unweighted events")
r_ml_unw43.Print()
print("==> ML Fit results on 1K weighted events with a summed weight of 43K")
r_ml_wgt.Print()
print("==> Corrected ML Fit results on 1K weighted events with a summed weight of 43K")
r_ml_wgt_corr.Print()
print("==> Chi2 Fit results on 1K weighted events with a summed weight of 43K")
r_chi2_wgt.Print()
c = ROOT.TCanvas("rf403_weightedevts", "rf403_weightedevts", 600, 600)
ROOT.gPad.SetLeftMargin(0.15)
frame.GetYaxis().SetTitleOffset(1.8)
frame.Draw()
c.SaveAs("rf403_weightedevts.png")
RooDataSet::pxData[x,w] = 1000 entries
RooDataSet::pxData[x,weight:w] = 1000 entries (43238.9 weighted)
[#0] WARNING:InputArguments -- RooAbsPdf::fitTo(p2) WARNING: a likelihood fit is requested of what appears to be weighted data.
While the estimated values of the parameters will always be calculated taking the weights into account,
there are multiple ways to estimate the errors of the parameters. You are advised to make an
explicit choice for the error calculation:
- Either provide SumW2Error(true), to calculate a sum-of-weights-corrected HESSE error matrix
(error will be proportional to the number of events in MC).
- Or provide SumW2Error(false), to return errors from original HESSE error matrix
(which will be proportional to the sum of the weights, i.e., a dataset with <sum of weights> events).
- Or provide AsymptoticError(true), to use the asymptotically correct expression
(for details see https://arxiv.org/abs/1911.01303).
[#1] INFO:Minimization -- RooAbsMinimizerFcn::setOptimizeConst: activating const optimization
[#1] INFO:Minimization -- RooAbsMinimizerFcn::setOptimizeConst: deactivating const optimization
[#1] INFO:Minimization -- RooAbsMinimizerFcn::setOptimizeConst: activating const optimization
[#1] INFO:Fitting -- RooAbsPdf::fitTo(p2) Calculating sum-of-weights-squared correction matrix for covariance matrix
[#1] INFO:Minimization -- RooAbsMinimizerFcn::setOptimizeConst: deactivating const optimization
[#1] INFO:NumericIntegration -- RooRealIntegral::init(genPdf_Int[x]) using numeric integrator RooIntegrator1D to calculate Int(x)
[#1] INFO:NumericIntegration -- RooRealIntegral::init(genPdf_Int[x]) using numeric integrator RooIntegrator1D to calculate Int(x)
[#1] INFO:NumericIntegration -- RooRealIntegral::init(genPdf_Int[x]) using numeric integrator RooIntegrator1D to calculate Int(x)
[#1] INFO:NumericIntegration -- RooRealIntegral::init(genPdf_Int[x]) using numeric integrator RooIntegrator1D to calculate Int(x)
[#1] INFO:Minimization -- RooAbsMinimizerFcn::setOptimizeConst: activating const optimization
[#1] INFO:Minimization -- RooAbsMinimizerFcn::setOptimizeConst: deactivating const optimization
[#1] INFO:Minimization -- RooAbsMinimizerFcn::setOptimizeConst: activating const optimization
[#1] INFO:Minimization -- RooAbsMinimizerFcn::setOptimizeConst: deactivating const optimization
DataStore pxData_binned (Generated From px_binned)
Contains 40 entries
Observables:
1) x = 9.75 L(-10 - 10) B(40) "x"
Binned Dataset pxData_binned (Generated From px_binned)
Contains 40 bins with a total weight of 43238.9
Observables: 1) x = 9.75 L(-10 - 10) B(40) "x"
**********
** 54 **SET NOGRAD
**********
PARAMETER DEFINITIONS:
NO. NAME VALUE STEP SIZE LIMITS
1 a1 -1.13263e-03 4.02065e-03 -1.00000e+00 1.00000e+00
2 a2 9.75516e-02 2.36614e-03 0.00000e+00 1.00000e+01
**********
** 55 **SET ERR 1
**********
**********
** 56 **SET PRINT 1
**********
**********
** 57 **SET STR 1
**********
NOW USING STRATEGY 1: TRY TO BALANCE SPEED AGAINST RELIABILITY
**********
** 58 **MIGRAD 1000 1
**********
FIRST CALL TO USER FUNCTION AT NEW START POINT, WITH IFLAG=4.
START MIGRAD MINIMIZATION. STRATEGY 1. CONVERGENCE WHEN EDM .LT. 1.00e-03
FCN=32.3464 FROM MIGRAD STATUS=INITIATE 6 CALLS 7 TOTAL
EDM= unknown STRATEGY= 1 NO ERROR MATRIX
EXT PARAMETER CURRENT GUESS STEP FIRST
NO. NAME VALUE ERROR SIZE DERIVATIVE
1 a1 -1.13263e-03 4.02065e-03 4.02066e-03 2.74413e+01
2 a2 9.75516e-02 2.36614e-03 2.40760e-03 -2.04371e+02
MIGRAD MINIMIZATION HAS CONVERGED.
MIGRAD WILL VERIFY CONVERGENCE AND ERROR MATRIX.
COVARIANCE MATRIX CALCULATED SUCCESSFULLY
FCN=31.3747 FROM MIGRAD STATUS=CONVERGED 28 CALLS 29 TOTAL
EDM=8.52339e-08 STRATEGY= 1 ERROR MATRIX ACCURATE
EXT PARAMETER STEP FIRST
NO. NAME VALUE ERROR SIZE DERIVATIVE
1 a1 -9.98900e-03 2.62975e-02 7.27538e-05 4.94399e-03
2 a2 1.06373e-01 1.01849e-02 2.73457e-05 -3.91519e-02
EXTERNAL ERROR MATRIX. NDIM= 25 NPAR= 2 ERR DEF=1
6.917e-04 -6.529e-06
-6.529e-06 1.037e-04
PARAMETER CORRELATION COEFFICIENTS
NO. GLOBAL 1 2
1 0.02437 1.000 -0.024
2 0.02437 -0.024 1.000
**********
** 59 **SET ERR 1
**********
**********
** 60 **SET PRINT 1
**********
**********
** 61 **HESSE 1000
**********
COVARIANCE MATRIX CALCULATED SUCCESSFULLY
FCN=31.3747 FROM HESSE STATUS=OK 10 CALLS 39 TOTAL
EDM=8.52317e-08 STRATEGY= 1 ERROR MATRIX ACCURATE
EXT PARAMETER INTERNAL INTERNAL
NO. NAME VALUE ERROR STEP SIZE VALUE
1 a1 -9.98900e-03 2.62976e-02 1.45508e-05 -9.98917e-03
2 a2 1.06373e-01 1.01850e-02 5.46914e-06 -1.36415e+00
EXTERNAL ERROR MATRIX. NDIM= 25 NPAR= 2 ERR DEF=1
6.917e-04 -6.575e-06
-6.575e-06 1.037e-04
PARAMETER CORRELATION COEFFICIENTS
NO. GLOBAL 1 2
1 0.02455 1.000 -0.025
2 0.02455 -0.025 1.000
RooFitResult: minimized FCN value: 2766.49, estimated distance to minimum: 7.63714e-07
covariance matrix quality: Full, accurate covariance matrix
Status : MINIMIZE=0 HESSE=0
Floating Parameter FinalValue +/- Error
-------------------- --------------------------
a1 8.9440e-03 +/- 2.69e-02
a2 1.0129e-01 +/- 1.67e-02
RooFitResult: minimized FCN value: 118892, estimated distance to minimum: 2.21254e-06
covariance matrix quality: Full, accurate covariance matrix
Status : MINIMIZE=0 HESSE=0
Floating Parameter FinalValue +/- Error
-------------------- --------------------------
a1 -1.1326e-03 +/- 4.02e-03
a2 9.7552e-02 +/- 2.37e-03
RooFitResult: minimized FCN value: 119682, estimated distance to minimum: 1.28662e-07
covariance matrix quality: Full, accurate covariance matrix
Status : MINIMIZE=0 HESSE=0
Floating Parameter FinalValue +/- Error
-------------------- --------------------------
a1 -4.8560e-03 +/- 4.03e-03
a2 9.8651e-02 +/- 2.41e-03
RooFitResult: minimized FCN value: 6.84247e+06, estimated distance to minimum: 158914
covariance matrix quality: Full, accurate covariance matrix
Status : MINIMIZE=0 HESSE=0 HESSE=0
Floating Parameter FinalValue +/- Error
-------------------- --------------------------
a1 -4.8555e-03 +/- 3.00e-02
a2 9.8652e-02 +/- 2.99e-02
RooFitResult: minimized FCN value: 31.3747, estimated distance to minimum: 8.52317e-08
covariance matrix quality: Full, accurate covariance matrix
Status : MIGRAD=0 HESSE=0
Floating Parameter FinalValue +/- Error
-------------------- --------------------------
a1 -9.9890e-03 +/- 2.63e-02
a2 1.0637e-01 +/- 1.02e-02
==> ML Fit results on 1K unweighted events
==> ML Fit results on 43K unweighted events
==> ML Fit results on 1K weighted events with a summed weight of 43K
==> Corrected ML Fit results on 1K weighted events with a summed weight of 43K
==> Chi2 Fit results on 1K weighted events with a summed weight of 43K
Date
February 2018
Authors
Clemens Lange, Wouter Verkerke (C version)

Definition in file rf403_weightedevts.py.