Logo ROOT   6.21/01
Reference Guide
FourBinInstructional.C
Go to the documentation of this file.
1 /// \file
2 /// \ingroup tutorial_roostats
3 /// \notebook
4 /// This example is a generalization of the on/off problem.
5 ///
6 /// This example is a generalization of the on/off problem.
7 /// It's a common setup for SUSY searches. Imagine that one has two
8 /// variables "x" and "y" (eg. missing ET and SumET), see figure.
9 /// The signal region has high values of both of these variables (top right).
10 /// One can see low values of "x" or "y" acting as side-bands. If we
11 /// just used "y" as a sideband, we would have the on/off problem.
12 /// - In the signal region we observe non events and expect s+b events.
13 /// - In the region with low values of "y" (bottom right)
14 /// we observe noff events and expect tau*b events.
15 /// Note the significance of tau. In the background only case:
16 ///
17 /// ~~~{.cpp}
18 /// tau ~ <expectation off> / <expectation on>
19 /// ~~~
20 ///
21 /// If tau is known, this model is sufficient, but often tau is not known exactly.
22 /// So one can use low values of "x" as an additional constraint for tau.
23 /// Note that this technique critically depends on the notion that the
24 /// joint distribution for "x" and "y" can be factorized.
25 /// Generally, these regions have many events, so it the ratio can be
26 /// measured very precisely there. So we extend the model to describe the
27 /// left two boxes... denoted with "bar".
28 /// - In the upper left we observe nonbar events and expect bbar events
29 /// - In the bottom left we observe noffbar events and expect tau bbar events
30 /// Note again we have:
31 ///
32 /// ~~~{.cpp}
33 /// tau ~ <expectation off bar> / <expectation on bar>
34 /// ~~~
35 ///
36 /// One can further expand the model to account for the systematic associated
37 /// to assuming the distribution of "x" and "y" factorizes (eg. that
38 /// tau is the same for off/on and offbar/onbar). This can be done in several
39 /// ways, but here we introduce an additional parameter rho, which so that
40 /// one set of models will use tau and the other tau*rho. The choice is arbitrary,
41 /// but it has consequences on the numerical stability of the algorithms.
42 /// The "bar" measurements typically have more events (& smaller relative errors).
43 /// If we choose
44 ///
45 /// ~~~{.cpp}
46 /// <expectation noffbar> = tau * rho * <expectation noonbar>
47 /// ~~~
48 ///
49 /// the product tau*rho will be known very precisely (~1/sqrt(bbar)) and the contour
50 /// in those parameters will be narrow and have a non-trivial tau~1/rho shape.
51 /// However, if we choose to put rho on the non/noff measurements (where the
52 /// product will have an error `~1/sqrt(b))`, the contours will be more amenable
53 /// to numerical techniques. Thus, here we choose to define:
54 ///
55 /// ~~~{.cpp}
56 /// tau := <expectation off bar> / (<expectation on bar>)
57 /// rho := <expectation off> / (<expectation on> * tau)
58 ///
59 /// ^ y
60 /// |
61 /// |---------------------------+
62 /// | | |
63 /// | nonbar | non |
64 /// | bbar | s+b |
65 /// | | |
66 /// |---------------+-----------|
67 /// | | |
68 /// | noffbar | noff |
69 /// | tau bbar | tau b rho |
70 /// | | |
71 /// +-----------------------------> x
72 /// ~~~
73 ///
74 /// Left in this way, the problem is under-constrained. However, one may
75 /// have some auxiliary measurement (usually based on Monte Carlo) to
76 /// constrain rho. Let us call this auxiliary measurement that gives
77 /// the nominal value of rho "rhonom". Thus, there is a 'constraint' term in
78 /// the full model: P(rhonom | rho). In this case, we consider a Gaussian
79 /// constraint with standard deviation sigma.
80 ///
81 /// In the example, the initial values of the parameters are:
82 ///
83 /// ~~~{.cpp}
84 /// - s = 40
85 /// - b = 100
86 /// - tau = 5
87 /// - bbar = 1000
88 /// - rho = 1
89 /// (sigma for rho = 20%)
90 /// ~~~
91 ///
92 /// and in the toy dataset:
93 ///
94 /// ~~~{.cpp}
95 /// - non = 139
96 /// - noff = 528
97 /// - nonbar = 993
98 /// - noffbar = 4906
99 /// - rhonom = 1.27824
100 /// ~~~
101 ///
102 /// Note, the covariance matrix of the parameters has large off-diagonal terms.
103 /// Clearly s,b are anti-correlated. Similarly, since noffbar >> nonbar, one would
104 /// expect bbar,tau to be anti-correlated.
105 ///
106 /// This can be seen below.
107 ///
108 /// ~~~{.cpp}
109 /// GLOBAL b bbar rho s tau
110 /// b 0.96820 1.000 0.191 -0.942 -0.762 -0.209
111 /// bbar 0.91191 0.191 1.000 0.000 -0.146 -0.912
112 /// rho 0.96348 -0.942 0.000 1.000 0.718 -0.000
113 /// s 0.76250 -0.762 -0.146 0.718 1.000 0.160
114 /// tau 0.92084 -0.209 -0.912 -0.000 0.160 1.000
115 /// ~~~
116 ///
117 /// Similarly, since tau*rho appears as a product, we expect rho,tau
118 /// to be anti-correlated. When the error on rho is significantly
119 /// larger than 1/sqrt(bbar), tau is essentially known and the
120 /// correlation is minimal (tau mainly cares about bbar, and rho about b,s).
121 /// In the alternate parametrization (bbar* tau * rho) the correlation coefficient
122 /// for rho,tau is large (and negative).
123 ///
124 /// The code below uses best-practices for RooFit & RooStats as of June 2010.
125 ///
126 /// It proceeds as follows:
127 /// - create a workspace to hold the model
128 /// - use workspace factory to quickly create the terms of the model
129 /// - use workspace factory to define total model (a prod pdf)
130 /// - create a RooStats ModelConfig to specify observables, parameters of interest
131 /// - add to the ModelConfig a prior on the parameters for Bayesian techniques
132 /// note, the pdf it is factorized for parameters of interest & nuisance params
133 /// - visualize the model
134 /// - write the workspace to a file
135 /// - use several of RooStats IntervalCalculators & compare results
136 ///
137 /// \macro_image
138 /// \macro_output
139 /// \macro_code
140 ///
141 /// \authors authors: Kyle Cranmer, Tanja Rommerskirchen
142 
143 #include "TStopwatch.h"
144 #include "TCanvas.h"
145 #include "TROOT.h"
146 #include "RooPlot.h"
147 #include "RooAbsPdf.h"
148 #include "RooWorkspace.h"
149 #include "RooDataSet.h"
150 #include "RooGlobalFunc.h"
151 #include "RooFitResult.h"
152 #include "RooRandom.h"
157 #include "RooStats/MCMCCalculator.h"
158 #include "RooStats/MCMCInterval.h"
160 #include "RooStats/ProposalHelper.h"
161 #include "RooStats/SimpleInterval.h"
162 #include "RooStats/FeldmanCousins.h"
164 
165 using namespace RooFit;
166 using namespace RooStats;
167 
168 void FourBinInstructional(bool doBayesian = false, bool doFeldmanCousins = false, bool doMCMC = false)
169 {
170 
171  // let's time this challenging example
172  TStopwatch t;
173  t.Start();
174 
175  // set RooFit random seed for reproducible results
177 
178  // make model
179  RooWorkspace *wspace = new RooWorkspace("wspace");
180  wspace->factory("Poisson::on(non[0,1000], sum::splusb(s[40,0,100],b[100,0,300]))");
181  wspace->factory("Poisson::off(noff[0,5000], prod::taub(b,tau[5,3,7],rho[1,0,2]))");
182  wspace->factory("Poisson::onbar(nonbar[0,10000], bbar[1000,500,2000])");
183  wspace->factory("Poisson::offbar(noffbar[0,1000000], prod::lambdaoffbar(bbar, tau))");
184  wspace->factory("Gaussian::mcCons(rhonom[1.,0,2], rho, sigma[.2])");
185  wspace->factory("PROD::model(on,off,onbar,offbar,mcCons)");
186  wspace->defineSet("obs", "non,noff,nonbar,noffbar,rhonom");
187 
188  wspace->factory("Uniform::prior_poi({s})");
189  wspace->factory("Uniform::prior_nuis({b,bbar,tau, rho})");
190  wspace->factory("PROD::prior(prior_poi,prior_nuis)");
191 
192  // ----------------------------------
193  // Control some interesting variations
194  // define parameers of interest
195  // for 1-d plots
196  wspace->defineSet("poi", "s");
197  wspace->defineSet("nuis", "b,tau,rho,bbar");
198  // for 2-d plots to inspect correlations:
199  // wspace->defineSet("poi","s,rho");
200 
201  // test simpler cases where parameters are known.
202  // wspace->var("tau")->setConstant();
203  // wspace->var("rho")->setConstant();
204  // wspace->var("b")->setConstant();
205  // wspace->var("bbar")->setConstant();
206 
207  // inspect workspace
208  // wspace->Print();
209 
210  // ----------------------------------------------------------
211  // Generate toy data
212  // generate toy data assuming current value of the parameters
213  // import into workspace.
214  // add Verbose() to see how it's being generated
215  RooDataSet *data = wspace->pdf("model")->generate(*wspace->set("obs"), 1);
216  // data->Print("v");
217  wspace->import(*data);
218 
219  // ----------------------------------
220  // Now the statistical tests
221  // model config
222  ModelConfig *modelConfig = new ModelConfig("FourBins");
223  modelConfig->SetWorkspace(*wspace);
224  modelConfig->SetPdf(*wspace->pdf("model"));
225  modelConfig->SetPriorPdf(*wspace->pdf("prior"));
226  modelConfig->SetParametersOfInterest(*wspace->set("poi"));
227  modelConfig->SetNuisanceParameters(*wspace->set("nuis"));
228  wspace->import(*modelConfig);
229  wspace->writeToFile("FourBin.root");
230 
231  // -------------------------------------------------
232  // If you want to see the covariance matrix uncomment
233  // wspace->pdf("model")->fitTo(*data);
234 
235  // use ProfileLikelihood
236  ProfileLikelihoodCalculator plc(*data, *modelConfig);
237  plc.SetConfidenceLevel(0.95);
238  LikelihoodInterval *plInt = plc.GetInterval();
241  plInt->LowerLimit(*wspace->var("s")); // get ugly print out of the way. Fix.
243 
244  // use FeldmaCousins (takes ~20 min)
245  FeldmanCousins fc(*data, *modelConfig);
246  fc.SetConfidenceLevel(0.95);
247  // number counting: dataset always has 1 entry with N events observed
248  fc.FluctuateNumDataEntries(false);
249  fc.UseAdaptiveSampling(true);
250  fc.SetNBins(40);
251  PointSetInterval *fcInt = NULL;
252  if (doFeldmanCousins) { // takes 7 minutes
253  fcInt = (PointSetInterval *)fc.GetInterval(); // fix cast
254  }
255 
256  // use BayesianCalculator (only 1-d parameter of interest, slow for this problem)
257  BayesianCalculator bc(*data, *modelConfig);
258  bc.SetConfidenceLevel(0.95);
259  SimpleInterval *bInt = NULL;
260  if (doBayesian && wspace->set("poi")->getSize() == 1) {
261  bInt = bc.GetInterval();
262  } else {
263  cout << "Bayesian Calc. only supports on parameter of interest" << endl;
264  }
265 
266  // use MCMCCalculator (takes about 1 min)
267  // Want an efficient proposal function, so derive it from covariance
268  // matrix of fit
269  RooFitResult *fit = wspace->pdf("model")->fitTo(*data, Save());
270  ProposalHelper ph;
271  ph.SetVariables((RooArgSet &)fit->floatParsFinal());
272  ph.SetCovMatrix(fit->covarianceMatrix());
273  ph.SetUpdateProposalParameters(kTRUE); // auto-create mean vars and add mappings
274  ph.SetCacheSize(100);
276 
277  MCMCCalculator mc(*data, *modelConfig);
278  mc.SetConfidenceLevel(0.95);
279  mc.SetProposalFunction(*pf);
280  mc.SetNumBurnInSteps(500); // first N steps to be ignored as burn-in
281  mc.SetNumIters(50000);
282  mc.SetLeftSideTailFraction(0.5); // make a central interval
283  MCMCInterval *mcInt = NULL;
284  if (doMCMC)
285  mcInt = mc.GetInterval();
286 
287  // ----------------------------------
288  // Make some plots
289  TCanvas *c1 = (TCanvas *)gROOT->Get("c1");
290  if (!c1)
291  c1 = new TCanvas("c1");
292 
293  if (doBayesian && doMCMC) {
294  c1->Divide(3);
295  c1->cd(1);
296  } else if (doBayesian || doMCMC) {
297  c1->Divide(2);
298  c1->cd(1);
299  }
300 
301  LikelihoodIntervalPlot *lrplot = new LikelihoodIntervalPlot(plInt);
302  lrplot->Draw();
303 
304  if (doBayesian && wspace->set("poi")->getSize() == 1) {
305  c1->cd(2);
306  // the plot takes a long time and print lots of error
307  // using a scan it is better
308  bc.SetScanOfPosterior(20);
309  RooPlot *bplot = bc.GetPosteriorPlot();
310  bplot->Draw();
311  }
312 
313  if (doMCMC) {
314  if (doBayesian && wspace->set("poi")->getSize() == 1)
315  c1->cd(3);
316  else
317  c1->cd(2);
318  MCMCIntervalPlot mcPlot(*mcInt);
319  mcPlot.Draw();
320  }
321 
322  // ----------------------------------
323  // querry intervals
324  cout << "Profile Likelihood interval on s = [" << plInt->LowerLimit(*wspace->var("s")) << ", "
325  << plInt->UpperLimit(*wspace->var("s")) << "]" << endl;
326  // Profile Likelihood interval on s = [12.1902, 88.6871]
327 
328  if (doBayesian && wspace->set("poi")->getSize() == 1) {
329  cout << "Bayesian interval on s = [" << bInt->LowerLimit() << ", " << bInt->UpperLimit() << "]" << endl;
330  }
331 
332  if (doFeldmanCousins) {
333  cout << "Feldman Cousins interval on s = [" << fcInt->LowerLimit(*wspace->var("s")) << ", "
334  << fcInt->UpperLimit(*wspace->var("s")) << "]" << endl;
335  // Feldman Cousins interval on s = [18.75 +/- 2.45, 83.75 +/- 2.45]
336  }
337 
338  if (doMCMC) {
339  cout << "MCMC interval on s = [" << mcInt->LowerLimit(*wspace->var("s")) << ", "
340  << mcInt->UpperLimit(*wspace->var("s")) << "]" << endl;
341  // MCMC interval on s = [15.7628, 84.7266]
342  }
343 
344  t.Print();
345 }
RooDataSet * generate(const RooArgSet &whatVars, Int_t nEvents, const RooCmdArg &arg1, const RooCmdArg &arg2=RooCmdArg::none(), const RooCmdArg &arg3=RooCmdArg::none(), const RooCmdArg &arg4=RooCmdArg::none(), const RooCmdArg &arg5=RooCmdArg::none())
See RooAbsPdf::generate(const RooArgSet&,const RooCmdArg&,const RooCmdArg&,const RooCmdArg&,const RooCmdArg&,const RooCmdArg&,const RooCmdArg&)
Definition: RooAbsPdf.h:55
ProposalFunction is an interface for all proposal functions that would be used with a Markov Chain Mo...
ModelConfig is a simple class that holds configuration information specifying how a model should be u...
Definition: ModelConfig.h:30
const RooArgList & floatParsFinal() const
Definition: RooFitResult.h:110
void Start(Bool_t reset=kTRUE)
Start the stopwatch.
Definition: TStopwatch.cxx:58
void Draw(const Option_t *options=0)
draw the likelihood interval or contour for the 1D case a RooPlot is drawn by default of the profiled...
This class provides simple and straightforward utilities to plot a MCMCInterval object.
void Print(Option_t *option="") const
Print the real and cpu time passed between the start and stop events.
Definition: TStopwatch.cxx:219
LikelihoodInterval is a concrete implementation of the RooStats::ConfInterval interface.
virtual void SetWorkspace(RooWorkspace &ws)
Definition: ModelConfig.h:66
return c1
Definition: legend1.C:41
RooArgSet is a container object that can hold multiple RooAbsArg objects.
Definition: RooArgSet.h:28
const TMatrixDSym & covarianceMatrix() const
Return covariance matrix.
virtual ProposalFunction * GetProposalFunction()
virtual void SetUpdateProposalParameters(Bool_t updateParams)
This class provides simple and straightforward utilities to plot a LikelihoodInterval object...
RooFit::MsgLevel globalKillBelow() const
The ProfileLikelihoodCalculator is a concrete implementation of CombinedCalculator (the interface cla...
#define gROOT
Definition: TROOT.h:415
MsgLevel
Verbosity level for RooMsgService::StreamConfig in RooMsgService.
Definition: RooGlobalFunc.h:65
RooFitResult is a container class to hold the input and output of a PDF fit to a dataset.
Definition: RooFitResult.h:40
static RooMsgService & instance()
Return reference to singleton instance.
Double_t LowerLimit(const RooRealVar &param)
return the lower bound of the interval on a given parameter
static struct mg_connection * fc(struct mg_context *ctx)
Definition: civetweb.c:3728
virtual void SetSeed(ULong_t seed=0)
Set the random generator seed.
Definition: TRandom.cxx:597
The namespace RooFit contains mostly switches that change the behaviour of functions of PDFs (or othe...
virtual void SetPdf(const RooAbsPdf &pdf)
Set the Pdf, add to the the workspace if not already there.
Definition: ModelConfig.h:81
static TRandom * randomGenerator()
Return a pointer to a singleton random-number generator implementation.
Definition: RooRandom.cxx:54
virtual Double_t LowerLimit()
Double_t LowerLimit(RooRealVar &param)
return lower limit on a given parameter
Int_t getSize() const
Double_t UpperLimit(const RooRealVar &param)
return the upper bound of the interval on a given parameter
virtual void SetNuisanceParameters(const RooArgSet &set)
Specify the nuisance parameters (parameters that are not POI).
Definition: ModelConfig.h:119
virtual void SetCacheSize(Int_t size)
Bool_t defineSet(const char *name, const RooArgSet &aset, Bool_t importMissing=kFALSE)
Define a named RooArgSet with given constituents.
void setGlobalKillBelow(RooFit::MsgLevel level)
Bool_t writeToFile(const char *fileName, Bool_t recreate=kTRUE)
Save this current workspace into given file.
virtual Double_t UpperLimit(RooRealVar &param)
get the highest value of param that is within the confidence interval
RooDataSet is a container class to hold unbinned data.
Definition: RooDataSet.h:31
Double_t UpperLimit(RooRealVar &param)
return upper limit on a given parameter
A RooPlot is a plot frame and a container for graphics objects within that frame. ...
Definition: RooPlot.h:44
The FeldmanCousins class (like the Feldman-Cousins technique) is essentially a specific configuration...
The Canvas class.
Definition: TCanvas.h:31
Namespace for the RooStats classes.
Definition: Asimov.h:20
PointSetInterval is a concrete implementation of the ConfInterval interface.
RooAbsPdf * pdf(const char *name) const
Retrieve p.d.f (RooAbsPdf) with given name. A null pointer is returned if not found.
virtual Double_t LowerLimit(RooRealVar &param)
get the lowest value of param that is within the confidence interval
virtual void SetPriorPdf(const RooAbsPdf &pdf)
Set the Prior Pdf, add to the the workspace if not already there.
Definition: ModelConfig.h:87
virtual void SetCovMatrix(const TMatrixDSym &covMatrix)
RooRealVar * var(const char *name) const
Retrieve real-valued variable (RooRealVar) with given name. A null pointer is returned if not found...
RooFactoryWSTool & factory()
Return instance to factory tool.
RooCmdArg Save(Bool_t flag=kTRUE)
virtual Double_t UpperLimit()
SimpleInterval is a concrete implementation of the ConfInterval interface.
Bool_t import(const RooAbsArg &arg, const RooCmdArg &arg1=RooCmdArg(), const RooCmdArg &arg2=RooCmdArg(), const RooCmdArg &arg3=RooCmdArg(), const RooCmdArg &arg4=RooCmdArg(), const RooCmdArg &arg5=RooCmdArg(), const RooCmdArg &arg6=RooCmdArg(), const RooCmdArg &arg7=RooCmdArg(), const RooCmdArg &arg8=RooCmdArg(), const RooCmdArg &arg9=RooCmdArg())
Import a RooAbsArg object, e.g.
const RooArgSet * set(const char *name)
Return pointer to previously defined named set with given nmame If no such set is found a null pointe...
virtual RooFitResult * fitTo(RooAbsData &data, const RooCmdArg &arg1=RooCmdArg::none(), const RooCmdArg &arg2=RooCmdArg::none(), const RooCmdArg &arg3=RooCmdArg::none(), const RooCmdArg &arg4=RooCmdArg::none(), const RooCmdArg &arg5=RooCmdArg::none(), const RooCmdArg &arg6=RooCmdArg::none(), const RooCmdArg &arg7=RooCmdArg::none(), const RooCmdArg &arg8=RooCmdArg::none())
Fit PDF to given dataset.
Definition: RooAbsPdf.cxx:1256
virtual void SetParametersOfInterest(const RooArgSet &set)
Specify parameters of interest.
Definition: ModelConfig.h:100
MCMCInterval is a concrete implementation of the RooStats::ConfInterval interface.
Definition: MCMCInterval.h:30
virtual void SetVariables(RooArgList &vars)
Bayesian Calculator estimating an interval or a credible region using the Markov-Chain Monte Carlo me...
const Bool_t kTRUE
Definition: RtypesCore.h:87
BayesianCalculator is a concrete implementation of IntervalCalculator, providing the computation of a...
The RooWorkspace is a persistable container for RooFit projects.
Definition: RooWorkspace.h:43
virtual void Draw(Option_t *options=0)
Draw this plot and all of the elements it contains.
Definition: RooPlot.cxx:712
Stopwatch class.
Definition: TStopwatch.h:28