Logo ROOT  
Reference Guide
 
Loading...
Searching...
No Matches
TwoSidedFrequentistUpperLimitWithBands.C
Go to the documentation of this file.
1/// \file
2/// \ingroup tutorial_roostats
3/// \notebook -js
4/// TwoSidedFrequentistUpperLimitWithBands
5///
6///
7/// This is a standard demo that can be used with any ROOT file
8/// prepared in the standard way. You specify:
9/// - name for input ROOT file
10/// - name of workspace inside ROOT file that holds model and data
11/// - name of ModelConfig that specifies details for calculator tools
12/// - name of dataset
13///
14/// With default parameters the macro will attempt to run the
15/// standard hist2workspace example and read the ROOT file
16/// that it produces.
17///
18/// You may want to control:
19/// ~~~{.cpp}
20/// double confidenceLevel=0.95;
21/// double additionalToysFac = 1.;
22/// int nPointsToScan = 12;
23/// int nToyMC = 200;
24/// ~~~
25///
26/// This uses a modified version of the profile likelihood ratio as
27/// a test statistic for upper limits (eg. test stat = 0 if muhat>mu).
28///
29/// Based on the observed data, one defines a set of parameter points
30/// to be tested based on the value of the parameter of interest
31/// and the conditional MLE (eg. profiled) values of the nuisance parameters.
32///
33/// At each parameter point, pseudo-experiments are generated using this
34/// fixed reference model and then the test statistic is evaluated.
35/// The auxiliary measurements (global observables) associated with the
36/// constraint terms in nuisance parameters are also fluctuated in the
37/// process of generating the pseudo-experiments in a frequentist manner
38/// forming an 'unconditional ensemble'. One could form a 'conditional'
39/// ensemble in which these auxiliary measurements are fixed. Note that the
40/// nuisance parameters are not randomized, which is a Bayesian procedure.
41/// Note, the nuisance parameters are floating in the fits. For each point,
42/// the threshold that defines the 95% acceptance region is found. This
43/// forms a "Confidence Belt".
44///
45/// After constructing the confidence belt, one can find the confidence
46/// interval for any particular dataset by finding the intersection
47/// of the observed test statistic and the confidence belt. First
48/// this is done on the observed data to get an observed 1-sided upper limt.
49///
50/// Finally, there expected limit and bands (from background-only) are
51/// formed by generating background-only data and finding the upper limit.
52/// The background-only is defined as such that the nuisance parameters are
53/// fixed to their best fit value based on the data with the signal rate fixed to 0.
54/// The bands are done by hand for now, will later be part of the RooStats tools.
55///
56/// On a technical note, this technique IS the generalization of Feldman-Cousins
57/// with nuisance parameters.
58///
59/// Building the confidence belt can be computationally expensive.
60/// Once it is built, one could save it to a file and use it in a separate step.
61///
62/// We can use PROOF to speed things along in parallel, however,
63/// the test statistic has to be installed on the workers
64/// so either turn off PROOF or include the modified test statistic
65/// in your $ROOTSYS/roofit/roostats/inc directory,
66/// add the additional line to the LinkDef.h file,
67/// and recompile root.
68///
69/// Note, if you have a boundary on the parameter of interest (eg. cross-section)
70/// the threshold on the two-sided test statistic starts off at moderate values and plateaus.
71///
72/// [#0] PROGRESS:Generation -- generated toys: 500 / 999
73/// NeymanConstruction: Prog: 12/50 total MC = 39 this test stat = 0
74/// SigXsecOverSM=0.69 alpha_syst1=0.136515 alpha_syst3=0.425415 beta_syst2=1.08496 [-1e+30, 0.011215] in interval = 1
75///
76/// this tells you the values of the parameters being used to generate the pseudo-experiments
77/// and the threshold in this case is 0.011215. One would expect for 95% that the threshold
78/// would be ~1.35 once the cross-section is far enough away from 0 that it is essentially
79/// unaffected by the boundary. As one reaches the last points in the scan, the
80/// threshold starts to get artificially high. This is because the range of the parameter in
81/// the fit is the same as the range in the scan. In the future, these should be independently
82/// controlled, but they are not now. As a result the ~50% of pseudo-experiments that have an
83/// upward fluctuation end up with muhat = muMax. Because of this, the upper range of the
84/// parameter should be well above the expected upper limit... but not too high or one will
85/// need a very large value of nPointsToScan to resolve the relevant region. This can be
86/// improved, but this is the first version of this script.
87///
88/// Important note: when the model includes external constraint terms, like a Gaussian
89/// constraint to a nuisance parameter centered around some nominal value there is
90/// a subtlety. The asymptotic results are all based on the assumption that all the
91/// measurements fluctuate... including the nominal values from auxiliary measurements.
92/// If these do not fluctuate, this corresponds to an "conditional ensemble". The
93/// result is that the distribution of the test statistic can become very non-chi^2.
94/// This results in thresholds that become very large.
95///
96/// \macro_image
97/// \macro_output
98/// \macro_code
99///
100/// \authors Kyle Cranmer,Contributions from Aaron Armbruster, Haoshuang Ji, Haichen Wang and Daniel Whiteson
101
102#include "TFile.h"
103#include "TROOT.h"
104#include "TH1F.h"
105#include "TCanvas.h"
106#include "TSystem.h"
107#include <iostream>
108
109#include "RooWorkspace.h"
110#include "RooSimultaneous.h"
111#include "RooAbsData.h"
112
113#include "RooStats/ModelConfig.h"
118
121
122using namespace RooFit;
123using namespace RooStats;
124using std::cout, std::endl;
125
126// -------------------------------------------------------
127
128void workspaceName = "combined",
129 const char *modelConfigName = "ModelConfig",
130 const char *dataName = "obsData")
131{
132
133 double confidenceLevel = 0.95;
134 // degrade/improve number of pseudo-experiments used to define the confidence belt.
135 // value of 1 corresponds to default number of toys in the tail, which is 50/(1-confidenceLevel)
136 double additionalToysFac = 0.5;
137 int nPointsToScan = 20; // number of steps in the parameter of interest
138 int nToyMC = 200; // number of toys used to define the expected limit and band
139
140 // -------------------------------------------------------
141 // First part is just to access a user-defined file
142 // or create the standard example file if it doesn't exist
143 const char *filename = "";
144 if (!strcmp(infile, "")) {
145 filename = "results/example_combined_GaussExample_model.root";
146 bool fileExist = !gSystem->AccessPathName(filename); // note opposite return code
147 // if file does not exists generate with histfactory
148 if (!fileExist) {
149 // Normally this would be run on the command line
150 cout << "will run standard hist2workspace example" << endl;
151 gROOT->ProcessLine(".! prepareHistFactory .");
152 gROOT->ProcessLine(".! hist2workspace config/example.xml");
153 cout << "\n\n---------------------" << endl;
154 cout << "Done creating example input" << endl;
155 cout << "---------------------\n\n" << endl;
156 }
157
158 } else
160
161 // Try to open the file
163
164 // -------------------------------------------------------
165 // Now get the data and workspace
166
167 // get the workspace out of the file
169
170 // get the modelConfig out of the file
172
173 // get the modelConfig out of the file
174 RooAbsData *data = w->data(dataName);
175
176 cout << "Found data and ModelConfig:" << endl;
177 mc->Print();
178
179 // -------------------------------------------------------
180 // Now get the POI for convenience
181 // you may want to adjust the range of your POI
182 RooRealVar *firstPOI = (RooRealVar *)mc->GetParametersOfInterest()->first();
183 /* firstPOI->setMin(0);*/
184 /* firstPOI->setMax(10);*/
185
186 // -------------------------------------------------------
187 // create and use the FeldmanCousins tool
188 // to find and plot the 95% confidence interval
189 // on the parameter of interest as specified
190 // in the model config
191 // REMEMBER, we will change the test statistic
192 // so this is NOT a Feldman-Cousins interval
194 fc.SetConfidenceLevel(confidenceLevel);
195 fc.AdditionalNToysFactor(additionalToysFac); // improve sampling that defines confidence belt
196 // fc.UseAdaptiveSampling(true); // speed it up a bit, but don't use for expected limits
197 fc.SetNBins(nPointsToScan); // set how many points per parameter of interest to scan
198 fc.CreateConfBelt(true); // save the information in the belt for plotting
199
200 // -------------------------------------------------------
201 // Feldman-Cousins is a unified limit by definition
202 // but the tool takes care of a few things for us like which values
203 // of the nuisance parameters should be used to generate toys.
204 // so let's just change the test statistic and realize this is
205 // no longer "Feldman-Cousins" but is a fully frequentist Neyman-Construction.
206 // fc.GetTestStatSampler()->SetTestStatistic(&onesided);
207 // ((ToyMCSampler*) fc.GetTestStatSampler())->SetGenerateBinned(true);
208 ToyMCSampler *toymcsampler = (ToyMCSampler *)fc.GetTestStatSampler();
209 ProfileLikelihoodTestStat *testStat = dynamic_cast<ProfileLikelihoodTestStat *>(toymcsampler->GetTestStatistic());
210
211 // Since this tool needs to throw toy MC the PDF needs to be
212 // extended or the tool needs to know how many entries in a dataset
213 // per pseudo experiment.
214 // In the 'number counting form' where the entries in the dataset
215 // are counts, and not values of discriminating variables, the
216 // datasets typically only have one entry and the PDF is not
217 // extended.
218 if (!mc->GetPdf()->canBeExtended()) {
219 if (data->numEntries() == 1)
220 fc.FluctuateNumDataEntries(false);
221 else
222 cout << "Not sure what to do about this model" << endl;
223 }
224
225 if (mc->GetGlobalObservables()) {
226 cout << "will use global observables for unconditional ensemble" << endl;
227 mc->GetGlobalObservables()->Print();
228 toymcsampler->SetGlobalObservables(*mc->GetGlobalObservables());
229 }
230
231 // Now get the interval
232 PointSetInterval *interval = fc.GetInterval();
233 ConfidenceBelt *belt = fc.GetConfidenceBelt();
234
235 // print out the interval on the first Parameter of Interest
236 cout << "\n95% interval on " << firstPOI->GetName() << " is : [" << interval->LowerLimit(*firstPOI) << ", "
237 << interval->UpperLimit(*firstPOI) << "] " << endl;
238
239 // get observed UL and value of test statistic evaluated there
241 double observedUL = interval->UpperLimit(*firstPOI);
242 firstPOI->setVal(observedUL);
243 double obsTSatObsUL = fc.GetTestStatSampler()->EvaluateTestStatistic(*data, tmpPOI);
244
245 // Ask the calculator which points were scanned
246 RooDataSet *parameterScan = (RooDataSet *)fc.GetPointsToScan();
248
249 // make a histogram of parameter vs. threshold
251 new TH1F("histOfThresholds", "", parameterScan->numEntries(), firstPOI->getMin(), firstPOI->getMax());
252 histOfThresholds->GetXaxis()->SetTitle(firstPOI->GetName());
253 histOfThresholds->GetYaxis()->SetTitle("Threshold");
254
255 // loop through the points that were tested and ask confidence belt
256 // what the upper/lower thresholds were.
257 // For FeldmanCousins, the lower cut off is always 0
258 for (Int_t i = 0; i < parameterScan->numEntries(); ++i) {
259 tmpPoint = (RooArgSet *)parameterScan->get(i)->clone("temp");
260 // cout <<"get threshold"<<endl;
261 double arMax = belt->GetAcceptanceRegionMax(*tmpPoint);
262 double poiVal = tmpPoint->getRealValue(firstPOI->GetName());
264 }
265 TCanvas *c1 = new TCanvas();
266 c1->Divide(2);
267 c1->cd(1);
268 histOfThresholds->SetMinimum(0);
269 histOfThresholds->Draw();
270 c1->cd(2);
271
272 // -------------------------------------------------------
273 // Now we generate the expected bands and power-constraint
274
275 // First: find parameter point for mu=0, with conditional MLEs for nuisance parameters
276 std::unique_ptr<RooAbsReal> nll{mc->GetPdf()->createNLL(*data)};
277 std::unique_ptr<RooAbsReal> profile{nll->createProfile(*mc->GetParametersOfInterest())};
278 firstPOI->setVal(0.);
279 profile->getVal(); // this will do fit and set nuisance parameters to profiled values
281 if (mc->GetNuisanceParameters())
282 poiAndNuisance->add(*mc->GetNuisanceParameters());
283 poiAndNuisance->add(*mc->GetParametersOfInterest());
284 w->saveSnapshot("paramsToGenerateData", *poiAndNuisance);
286 cout << "\nWill use these parameter points to generate pseudo data for bkg only" << endl;
287 paramsToGenerateData->Print("v");
288
290 unconditionalObs.add(*mc->GetObservables());
291 unconditionalObs.add(*mc->GetGlobalObservables()); // comment this out for the original conditional ensemble
292
293 double CLb = 0;
294 double CLbinclusive = 0;
295
296 // Now we generate background only and find distribution of upper limits
297 TH1F *histOfUL = new TH1F("histOfUL", "", 100, 0, firstPOI->getMax());
298 histOfUL->GetXaxis()->SetTitle("Upper Limit (background only)");
299 histOfUL->GetYaxis()->SetTitle("Entries");
300 for (int imc = 0; imc < nToyMC; ++imc) {
301
302 // set parameters back to values for generating pseudo data
303 // cout << "\n get current nuis, set vals, print again" << endl;
304 w->loadSnapshot("paramsToGenerateData");
305 // poiAndNuisance->Print("v");
306
307 std::unique_ptr<RooDataSet> toyData;
308 // now generate a toy dataset for the main measurement
309 if (!mc->GetPdf()->canBeExtended()) {
310 if (data->numEntries() == 1)
311 toyData = std::unique_ptr<RooDataSet>{mc->GetPdf()->generate(*mc->GetObservables(), 1)};
312 else
313 cout << "Not sure what to do about this model" << endl;
314 } else {
315 // cout << "generating extended dataset"<<endl;
316 toyData = std::unique_ptr<RooDataSet>{mc->GetPdf()->generate(*mc->GetObservables(), Extended())};
317 }
318
319 // generate global observables
320 std::unique_ptr<RooDataSet> one{mc->GetPdf()->generateSimGlobal(*mc->GetGlobalObservables(), 1)};
321 const RooArgSet *values = one->get();
322 std::unique_ptr<RooArgSet> allVars{mc->GetPdf()->getVariables()};
323 allVars->assign(*values);
324
325 // get test stat at observed UL in observed data
326 firstPOI->setVal(observedUL);
327 double toyTSatObsUL = fc.GetTestStatSampler()->EvaluateTestStatistic(*toyData, tmpPOI);
328 // toyData->get()->Print("v");
329 // cout <<"obsTSatObsUL " <<obsTSatObsUL << "toyTS " << toyTSatObsUL << endl;
330 if (obsTSatObsUL < toyTSatObsUL) // not sure about <= part yet
331 CLb += (1.) / nToyMC;
332 if (obsTSatObsUL <= toyTSatObsUL) // not sure about <= part yet
333 CLbinclusive += (1.) / nToyMC;
334
335 // loop over points in belt to find upper limit for this toy data
336 double thisUL = 0;
337 for (Int_t i = 0; i < parameterScan->numEntries(); ++i) {
338 tmpPoint = (RooArgSet *)parameterScan->get(i)->clone("temp");
339 double arMax = belt->GetAcceptanceRegionMax(*tmpPoint);
340 firstPOI->setVal(tmpPoint->getRealValue(firstPOI->GetName()));
341 // double thisTS = profile->getVal();
342 double thisTS = fc.GetTestStatSampler()->EvaluateTestStatistic(*toyData, tmpPOI);
343
344 // cout << "poi = " << firstPOI->getVal()
345 // << " max is " << arMax << " this profile = " << thisTS << endl;
346 // cout << "thisTS = " << thisTS<<endl;
347 if (thisTS <= arMax) {
348 thisUL = firstPOI->getVal();
349 } else {
350 break;
351 }
352 }
353
354 histOfUL->Fill(thisUL);
355
356 // for few events, data is often the same, and UL is often the same
357 // cout << "thisUL = " << thisUL<<endl;
358 }
359 histOfUL->Draw();
360 c1->SaveAs("two-sided_upper_limit_output.pdf");
361
362 // if you want to see a plot of the sampling distribution for a particular scan point:
363 /*
364 SamplingDistPlot sampPlot;
365 int indexInScan = 0;
366 tmpPoint = (RooArgSet*) parameterScan->get(indexInScan)->clone("temp");
367 firstPOI->setVal( tmpPoint->getRealValue(firstPOI->GetName()) );
368 toymcsampler->SetParametersForTestStat(tmpPOI);
369 SamplingDistribution* samp = toymcsampler->GetSamplingDistribution(*tmpPoint);
370 sampPlot.AddSamplingDistribution(samp);
371 sampPlot.Draw();
372 */
373
374 // Now find bands and power constraint
375 Double_t *bins = histOfUL->GetIntegral();
376 TH1F *cumulative = (TH1F *)histOfUL->Clone("cumulative");
377 cumulative->SetContent(bins);
378 double band2sigDown = 0, band1sigDown = 0, bandMedian = 0, band1sigUp = 0, band2sigUp = 0;
379 for (int i = 1; i <= cumulative->GetNbinsX(); ++i) {
380 if (bins[i] < RooStats::SignificanceToPValue(2))
381 band2sigDown = cumulative->GetBinCenter(i);
382 if (bins[i] < RooStats::SignificanceToPValue(1))
383 band1sigDown = cumulative->GetBinCenter(i);
384 if (bins[i] < 0.5)
385 bandMedian = cumulative->GetBinCenter(i);
386 if (bins[i] < RooStats::SignificanceToPValue(-1))
387 band1sigUp = cumulative->GetBinCenter(i);
388 if (bins[i] < RooStats::SignificanceToPValue(-2))
389 band2sigUp = cumulative->GetBinCenter(i);
390 }
391 cout << "-2 sigma band " << band2sigDown << endl;
392 cout << "-1 sigma band " << band1sigDown << " [Power Constraint)]" << endl;
393 cout << "median of band " << bandMedian << endl;
394 cout << "+1 sigma band " << band1sigUp << endl;
395 cout << "+2 sigma band " << band2sigUp << endl;
396
397 // print out the interval on the first Parameter of Interest
398 cout << "\nobserved 95% upper-limit " << interval->UpperLimit(*firstPOI) << endl;
399 cout << "CLb strict [P(toy>obs|0)] for observed 95% upper-limit " << CLb << endl;
400 cout << "CLb inclusive [P(toy>=obs|0)] for observed 95% upper-limit " << CLbinclusive << endl;
401}
int Int_t
Definition RtypesCore.h:45
double Double_t
Definition RtypesCore.h:59
ROOT::Detail::TRangeCast< T, true > TRangeDynCast
TRangeDynCast is an adapter class that allows the typed iteration through a TCollection.
Option_t Option_t TPoint TPoint const char GetTextMagnitude GetFillStyle GetLineColor GetLineWidth GetMarkerStyle GetTextAlign GetTextColor GetTextSize void data
Option_t Option_t TPoint TPoint const char GetTextMagnitude GetFillStyle GetLineColor GetLineWidth GetMarkerStyle GetTextAlign GetTextColor GetTextSize void char Point_t Rectangle_t WindowAttributes_t Float_t Float_t Float_t Int_t Int_t UInt_t UInt_t Rectangle_t Int_t Int_t Window_t TString Int_t GCValues_t GetPrimarySelectionOwner GetDisplay GetScreen GetColormap GetNativeEvent const char const char dpyName wid window const char font_name cursor keysym reg const char only_if_exist regb h Point_t winding char text const char depth char const char Int_t count const char ColorStruct_t color const char filename
#define gROOT
Definition TROOT.h:406
R__EXTERN TSystem * gSystem
Definition TSystem.h:572
Abstract base class for binned and unbinned datasets.
Definition RooAbsData.h:57
RooArgSet is a container object that can hold multiple RooAbsArg objects.
Definition RooArgSet.h:24
Container class to hold unbinned data.
Definition RooDataSet.h:34
Variable that can be changed from the outside.
Definition RooRealVar.h:37
ConfidenceBelt is a concrete implementation of the ConfInterval interface.
The FeldmanCousins class (like the Feldman-Cousins technique) is essentially a specific configuration...
ModelConfig is a simple class that holds configuration information specifying how a model should be u...
Definition ModelConfig.h:35
PointSetInterval is a concrete implementation of the ConfInterval interface.
ProfileLikelihoodTestStat is an implementation of the TestStatistic interface that calculates the pro...
ToyMCSampler is an implementation of the TestStatSampler interface.
Persistable container for RooFit projects.
The Canvas class.
Definition TCanvas.h:23
A ROOT file is an on-disk file, usually with extension .root, that stores objects in a file-system-li...
Definition TFile.h:131
static TFile * Open(const char *name, Option_t *option="", const char *ftitle="", Int_t compress=ROOT::RCompressionSetting::EDefaults::kUseCompiledDefault, Int_t netopt=0)
Create / open a file.
Definition TFile.cxx:4131
1-D histogram with a float per channel (see TH1 documentation)
Definition TH1.h:650
virtual Bool_t AccessPathName(const char *path, EAccessMode mode=kFileExists)
Returns FALSE if one can access a file using the specified access mode.
Definition TSystem.cxx:1308
return c1
Definition legend1.C:41
double nll(double pdf, double weight, int binnedL, int doBinOffset)
Definition MathFuncs.h:379
The namespace RooFit contains mostly switches that change the behaviour of functions of PDFs (or othe...
Definition CodegenImpl.h:64
Namespace for the RooStats classes.
Definition CodegenImpl.h:58
double SignificanceToPValue(double Z)
returns p-value corresponding to a 1-sided significance