ROOT   Reference Guide
Searching...
No Matches
BernsteinCorrection.cxx
Go to the documentation of this file.
1// @(#)root/roostats:$Id$
2// Author: Kyle Cranmer 28/07/2008
3
4/*************************************************************************
7 * *
8 * For the licensing terms see $ROOTSYS/LICENSE. * 9 * For the list of contributors see$ROOTSYS/README/CREDITS. *
10 *************************************************************************/
11
12/** \class RooStats::BernsteinCorrection
13 \ingroup Roostats
14
15BernsteinCorrection is a utility in RooStats to augment a nominal PDF with a polynomial
16correction term. This is useful for incorporating systematic variations to the nominal PDF.
17The Bernstein basis polynomials are particularly appropriate because they are positive definite.
18
19This tool was inspired by the work of Glen Cowan together with Stephan Horner, Sascha Caron,
20Eilam Gross, and others.
21The initial implementation is independent work. The major step forward in the approach was
22to provide a well defined algorithm that specifies the order of polynomial to be included
23in the correction. This is an empirical algorithm, so in addition to the nominal model it
24needs either a real data set or a simulated one. In the early work, the nominal model was taken
25to be a histogram from Monte Carlo simulations, but in this implementation it is generalized to an
26arbitrary PDF (which includes a RooHistPdf). The algorithm basically consists of a
27hypothesis test of an nth-order correction (null) against a n+1-th order correction (alternate).
28The quantity q = -2 log LR is used to determine whether the n+1-th order correction is a major
29improvement to the n-th order correction. The distribution of q is expected to be roughly
30\f$\chi^2\f$ with one degree of freedom if the n-th order correction is a good model for the data.
31 Thus, one only moves to the n+1-th order correction of q is relatively large. The chance that
32one moves from the n-th to the n+1-th order correction when the n-th order correction
33(eg. a type 1 error) is sufficient is given by the Prob(\f$\chi^2_1\f$ > threshold). The constructor
34of this class allows you to directly set this tolerance (in terms of probability that the n+1-th
36
37To do:
38Add another method to the utility that will make the sampling distribution for -2 log lambda
39for various m vs. m+1 order corrections using a nominal model and perhaps having two ways of
40generating the toys (either via a histogram or via an independent model that is supposed to
41 reflect reality). That will allow one to make plots like Glen has at the end of his DRAFT
42 very easily.
43
44*/
45
46
48
49#include "RooGlobalFunc.h"
50#include "RooDataSet.h"
51#include "RooRealVar.h"
52#include "RooAbsPdf.h"
53#include "RooFitResult.h"
54#include "TMath.h"
55#include <string>
56#include <vector>
57#include <cstdio>
58#include <sstream>
59#include <iostream>
60
61#include "RooEffProd.h"
62#include "RooWorkspace.h"
63#include "RooDataHist.h"
64#include "RooHistPdf.h"
65
66#include "RooBernstein.h"
67
69
70
72
73using namespace RooFit;
74using namespace RooStats;
75using std::cout, std::endl, std::vector;
76
77////////////////////////////////////////////////////////////////////////////////
78
80 fMaxDegree(10), fMaxCorrection(100), fTolerance(tolerance){
81}
82
83
84////////////////////////////////////////////////////////////////////////////////
85/// Main method for Bernstein correction.
86
88 const char* nominalName,
89 const char* varName,
90 const char* dataName){
91 // get ingredients out of workspace
92 RooRealVar* x = wks->var(varName);
93 RooAbsPdf* nominal = wks->pdf(nominalName);
94 RooAbsData* data = wks->data(dataName);
95
96 if (!x || !nominal || !data) {
97 cout << "Error: wrong name for pdf or variable or dataset - return -1 " << std::endl;
98 return -1;
99 }
100
101 std::cout << "BernsteinCorrection::ImportCorrectedPdf - Doing initial Fit with nominal model " << std::endl;
102
103 // initialize alg, by checking how well nominal model fits
105
106 std::unique_ptr<RooFitResult> nominalResult{nominal->fitTo(*data,Save(),Minos(false), Hesse(false),PrintLevel(printLevel))};
107 double lastNll= nominalResult->minNll();
108
109 if (nominalResult->status() != 0 ) {
110 std::cout << "BernsteinCorrection::ImportCorrectedPdf - Error fit with nominal model failed - exit" << std::endl;
111 return -1;
112 }
113
114 // setup a log
115 std::stringstream log;
116 log << "------ Begin Bernstein Correction Log --------" << endl;
117
118 // Local variables that we want to keep in scope after loop
119 RooArgList coeff;
120 vector<RooRealVar*> coefficients;
121 double q = 1E6;
122 Int_t degree = -1;
123
124 // The while loop
125 bool keepGoing = true;
126 while( keepGoing ) {
127 degree++;
128
129 // we need to generate names for vars on the fly
130 std::stringstream str;
131 str<<"_"<<degree;
132
133 RooRealVar* newCoef = new RooRealVar(("c"+str.str()).c_str(),
134 "Bernstein basis poly coefficient",
135 1.0, 0., fMaxCorrection);
137 coefficients.push_back(newCoef);
138 // Since pdf is normalized - coefficient for degree 0 is fixed to be 1
139 if (degree == 0) {
140 newCoef->setVal(1);
141 newCoef->setConstant(true);
142 continue;
143 }
144
145 // make the polynomial correction term
146 RooBernstein* poly = new RooBernstein("poly", "Bernstein poly", *x, coeff);
147
148 // make the corrected PDF = nominal * poly
149 RooEffProd* corrected = new RooEffProd("corrected","",*nominal,*poly);
150
151 // check to see how well this correction fits
152 std::unique_ptr<RooFitResult> result{corrected->fitTo(*data,Save(),Minos(false), Hesse(false),PrintLevel(printLevel))};
153
154 if (result->status() != 0) {
155 std::cout << "BernsteinCorrection::ImportCorrectedPdf - Error fit with corrected model failed" << std::endl;
156 return -1;
157 }
158
159
160 // Hypothesis test between previous correction (null)
161 // and this one (alternate). Use -2 log LR for test statistic
162 q = 2*(lastNll - result->minNll()); // -2 log lambda, goes like significance^2
163 // check if we should keep going based on rate of Type I error
164 keepGoing = (degree < 1 || TMath::Prob(q,1) < fTolerance );
165 if (degree >= fMaxDegree) keepGoing = false;
166
167 if(!keepGoing){
168 // terminate loop, import corrected PDF
169 //RooMsgService::instance().setGlobalKillBelow(RooFit::FATAL) ;
170 wks->import(*corrected);
171 //RooMsgService::instance().setGlobalKillBelow(RooFit::DEBUG) ;
172 } else {
173 // memory management
174 delete corrected;
175 delete poly;
176 }
177
178 // for the log
179 if(degree != 0){
180 log << "degree = " << degree
181 << " -log L("<<degree-1<<") = " << lastNll
182 << " -log L(" << degree <<") = " << result->minNll()
183 << " q = " << q
184 << " P(chi^2_1 > q) = " << TMath::Prob(q,1) << endl;;
185 }
186
187 // update last result for next iteration in loop
188 lastNll = result->minNll();
189 }
190
191 log << "------ End Bernstein Correction Log --------" << endl;
192 cout << log.str();
193
194 return degree;
195}
196
197
198////////////////////////////////////////////////////////////////////////////////
199/// Create sampling distribution for q given degree-1 vs. degree corrections
200
202 const char* nominalName,
203 const char* varName,
204 const char* dataName,
205 TH1F* samplingDist,
206 TH1F* samplingDistExtra,
207 Int_t degree,
208 Int_t nToys){
209 // get ingredients out of workspace
210 RooRealVar* x = wks->var(varName);
211 RooAbsPdf* nominal = wks->pdf(nominalName);
212 RooAbsData* data = wks->data(dataName);
213
214 if (!x || !nominal || !data) {
215 cout << "Error: wrong name for pdf or variable or dataset ! " << std::endl;
216 return;
217 }
218
219 // setup a log
220 std::stringstream log;
221 log << "------ Begin Bernstein Correction Log --------" << endl;
222
223 // Local variables that we want to keep in scope after loop
224 RooArgList coeff; // n-th degree correction
225 RooArgList coeffNull; // n-1 correction
226 RooArgList coeffExtra; // n+1 correction
227 vector<RooRealVar*> coefficients;
228
229 //cout << "make coefs" << endl;
230 for(int i = 0; i<=degree+1; ++i) {
231 // we need to generate names for vars on the fly
232 std::stringstream str;
233 str<<"_"<<i;
234
235 RooRealVar* newCoef = new RooRealVar(("c"+str.str()).c_str(),
236 "Bernstein basis poly coefficient",
237 1., 0., fMaxCorrection);
238
239 // keep three sets of coefficients for n-1, n, n+1 corrections
243 coefficients.push_back(newCoef);
244 }
245
246 // make the polynomial correction term
247 RooBernstein* poly
248 = new RooBernstein("poly", "Bernstein poly", *x, coeff);
249
250 // make the polynomial correction term
251 RooBernstein* polyNull
252 = new RooBernstein("polyNull", "Bernstein poly", *x, coeffNull);
253
254 // make the polynomial correction term
255 RooBernstein* polyExtra
256 = new RooBernstein("polyExtra", "Bernstein poly", *x, coeffExtra);
257
258 // make the corrected PDF = nominal * poly
259 RooEffProd* corrected
260 = new RooEffProd("corrected","",*nominal,*poly);
261
262 RooEffProd* correctedNull
263 = new RooEffProd("correctedNull","",*nominal,*polyNull);
264
265 RooEffProd* correctedExtra
266 = new RooEffProd("correctedExtra","",*nominal,*polyExtra);
267
268
269 cout << "made pdfs, make toy generator" << endl;
270
271 // make a PDF to generate the toys
272 RooDataHist dataHist("dataHist","",*x,*data);
273 RooHistPdf toyGen("toyGen","",*x,dataHist);
274
276
278 if (printLevel < 0) {
280 }
281
282
283 // TH1F* samplingDist = new TH1F("samplingDist","",20,0,10);
284 double q = 0;
285 double qExtra = 0;
286 // do toys
287 for (int i = 0; i < nToys; ++i) {
288 cout << "on toy " << i << endl;
289
290 std::unique_ptr<RooDataSet> tmpData{toyGen.generate(*x, data->numEntries())};
291 // check to see how well this correction fits
292 std::unique_ptr<RooFitResult> result{
293 corrected->fitTo(*tmpData, Save(), Minos(false), Hesse(false), PrintLevel(printLevel))};
294
295 std::unique_ptr<RooFitResult> resultNull{
296 correctedNull->fitTo(*tmpData, Save(), Minos(false), Hesse(false), PrintLevel(printLevel))};
297
298 std::unique_ptr<RooFitResult> resultExtra{
299 correctedExtra->fitTo(*tmpData, Save(), Minos(false), Hesse(false), PrintLevel(printLevel))};
300
301 // Hypothesis test between previous correction (null)
302 // and this one (alternate). Use -2 log LR for test statistic
303 q = 2 * (resultNull->minNll() - result->minNll());
304
305 qExtra = 2 * (result->minNll() - resultExtra->minNll());
306
307 samplingDist->Fill(q);
308 samplingDistExtra->Fill(qExtra);
309 if (printLevel > 0) {
310 cout << "NLL Results: null " << resultNull->minNll() << " ref = " << result->minNll() << " extra"
311 << resultExtra->minNll() << endl;
312 }
313 }
314
316
317 // return samplingDist;
318}
#define ClassImp(name)
Definition Rtypes.h:377
Option_t Option_t TPoint TPoint const char GetTextMagnitude GetFillStyle GetLineColor GetLineWidth GetMarkerStyle GetTextAlign GetTextColor GetTextSize void data
Option_t Option_t TPoint TPoint const char GetTextMagnitude GetFillStyle GetLineColor GetLineWidth GetMarkerStyle GetTextAlign GetTextColor GetTextSize void char Point_t Rectangle_t WindowAttributes_t Float_t Float_t Float_t Int_t Int_t UInt_t UInt_t Rectangle_t result
float * q
virtual bool add(const RooAbsArg &var, bool silent=false)
Add the specified argument to list.
Abstract base class for binned and unbinned datasets.
Definition RooAbsData.h:57
Abstract interface for all probability density functions.
Definition RooAbsPdf.h:40
RooFit::OwningPtr< RooFitResult > fitTo(RooAbsData &data, CmdArgs_t const &... cmdArgs)
Fit PDF to given dataset.
Definition RooAbsPdf.h:157
RooFit::OwningPtr< RooDataSet > generate(const RooArgSet &whatVars, Int_t nEvents, const RooCmdArg &arg1, const RooCmdArg &arg2={}, const RooCmdArg &arg3={}, const RooCmdArg &arg4={}, const RooCmdArg &arg5={})
See RooAbsPdf::generate(const RooArgSet&,const RooCmdArg&,const RooCmdArg&,const RooCmdArg&,...
Definition RooAbsPdf.h:57
void setConstant(bool value=true)
RooArgList is a container object that can hold multiple RooAbsArg objects.
Definition RooArgList.h:22
Bernstein basis polynomials are positive-definite in the range [0,1].
Container class to hold N-dimensional binned data.
Definition RooDataHist.h:39
The class RooEffProd implements the product of a PDF with an efficiency function.
Definition RooEffProd.h:19
A propability density function sampled from a multidimensional histogram.
Definition RooHistPdf.h:30
static RooMsgService & instance()
Return reference to singleton instance.
void setGlobalKillBelow(RooFit::MsgLevel level)
RooFit::MsgLevel globalKillBelow() const
Variable that can be changed from the outside.
Definition RooRealVar.h:37
void setVal(double value) override
Set value of variable to 'value'.
BernsteinCorrection is a utility in RooStats to augment a nominal PDF with a polynomial correction te...
Int_t ImportCorrectedPdf(RooWorkspace *, const char *, const char *, const char *)
Main method for Bernstein correction.
double fMaxCorrection
maximum correction factor at any point (default is 100)
BernsteinCorrection(double tolerance=0.05)
double fTolerance
probability to add an unnecessary term
Int_t fMaxDegree
maximum polynomial degree correction (default is 10)
void CreateQSamplingDist(RooWorkspace *wks, const char *nominalName, const char *varName, const char *dataName, TH1F *, TH1F *, Int_t degree, Int_t nToys=500)
Create sampling distribution for q given degree-1 vs. degree corrections.
Persistable container for RooFit projects.
RooAbsPdf * pdf(RooStringView name) const
Retrieve p.d.f (RooAbsPdf) with given name. A null pointer is returned if not found.
bool import(const RooAbsArg &arg, const RooCmdArg &arg1={}, const RooCmdArg &arg2={}, const RooCmdArg &arg3={}, const RooCmdArg &arg4={}, const RooCmdArg &arg5={}, const RooCmdArg &arg6={}, const RooCmdArg &arg7={}, const RooCmdArg &arg8={}, const RooCmdArg &arg9={})
Import a RooAbsArg object, e.g.
RooRealVar * var(RooStringView name) const
Retrieve real-valued variable (RooRealVar) with given name. A null pointer is returned if not found.
RooAbsData * data(RooStringView name) const
Retrieve dataset (binned or unbinned) with given name. A null pointer is returned if not found.
1-D histogram with a float per channel (see TH1 documentation)
Definition TH1.h:621
virtual Int_t Fill(Double_t x)
Increment bin with abscissa X by 1.
Definition TH1.cxx:3344
RooCmdArg Hesse(bool flag=true)
RooCmdArg Save(bool flag=true)
RooCmdArg Minos(bool flag=true)
RooCmdArg PrintLevel(Int_t code)
Double_t x[n]
Definition legend1.C:17
The namespace RooFit contains mostly switches that change the behaviour of functions of PDFs (or othe...
Definition JSONIO.h:26
MsgLevel
Verbosity level for RooMsgService::StreamConfig in RooMsgService.
Namespace for the RooStats classes.
Definition Asimov.h:19
Double_t Prob(Double_t chi2, Int_t ndf)
Computation of the probability for a certain Chi-squared (chi2) and number of degrees of freedom (ndf...
Definition TMath.cxx:637