Logo ROOT   6.12/07
Reference Guide
FourBinInstructional.C
Go to the documentation of this file.
1 /// \file
2 /// \ingroup tutorial_roostats
3 /// \notebook
4 /// This example is a generalization of the on/off problem.
5 ///
6 /// This example is a generalization of the on/off problem.
7 /// It's a common setup for SUSY searches. Imagine that one has two
8 /// variables "x" and "y" (eg. missing ET and SumET), see figure.
9 /// The signal region has high values of both of these variables (top right).
10 /// One can see low values of "x" or "y" acting as side-bands. If we
11 /// just used "y" as a sideband, we would have the on/off problem.
12 /// - In the signal region we observe non events and expect s+b events.
13 /// - In the region with low values of "y" (bottom right)
14 /// we observe noff events and expect tau*b events.
15 /// Note the significance of tau. In the background only case:
16 ///
17 /// ~~~{.cpp}
18 /// tau ~ <expectation off> / <expectation on>
19 /// ~~~
20 ///
21 /// If tau is known, this model is sufficient, but often tau is not known exactly.
22 /// So one can use low values of "x" as an additional constraint for tau.
23 /// Note that this technique critically depends on the notion that the
24 /// joint distribution for "x" and "y" can be factorized.
25 /// Generally, these regions have many events, so it the ratio can be
26 /// measured very precisely there. So we extend the model to describe the
27 /// left two boxes... denoted with "bar".
28 /// - In the upper left we observe nonbar events and expect bbar events
29 /// - In the bottom left we observe noffbar events and expect tau bbar events
30 /// Note again we have:
31 ///
32 /// ~~~{.cpp}
33 /// tau ~ <expectation off bar> / <expectation on bar>
34 /// ~~~
35 ///
36 /// One can further expand the model to account for the systematic associated
37 /// to assuming the distribution of "x" and "y" factorizes (eg. that
38 /// tau is the same for off/on and offbar/onbar). This can be done in several
39 /// ways, but here we introduce an additional parameter rho, which so that
40 /// one set of models will use tau and the other tau*rho. The choice is arbitrary,
41 /// but it has consequences on the numerical stability of the algorithms.
42 /// The "bar" measurements typically have more events (& smaller relative errors).
43 /// If we choose
44 ///
45 /// ~~~{.cpp}
46 /// <expectation noffbar> = tau * rho * <expectation noonbar>
47 /// ~~~
48 ///
49 /// the product tau*rho will be known very precisely (~1/sqrt(bbar)) and the contour
50 /// in those parameters will be narrow and have a non-trivial tau~1/rho shape.
51 /// However, if we choose to put rho on the non/noff measurements (where the
52 /// product will have an error `~1/sqrt(b))`, the contours will be more amenable
53 /// to numerical techniques. Thus, here we choose to define:
54 ///
55 /// ~~~{.cpp}
56 /// tau := <expectation off bar> / (<expectation on bar>)
57 /// rho := <expectation off> / (<expectation on> * tau)
58 ///
59 /// ^ y
60 /// |
61 /// |---------------------------+
62 /// | | |
63 /// | nonbar | non |
64 /// | bbar | s+b |
65 /// | | |
66 /// |---------------+-----------|
67 /// | | |
68 /// | noffbar | noff |
69 /// | tau bbar | tau b rho |
70 /// | | |
71 /// +-----------------------------> x
72 /// ~~~
73 ///
74 /// Left in this way, the problem is under-constrained. However, one may
75 /// have some auxiliary measurement (usually based on Monte Carlo) to
76 /// constrain rho. Let us call this auxiliary measurement that gives
77 /// the nominal value of rho "rhonom". Thus, there is a 'constraint' term in
78 /// the full model: P(rhonom | rho). In this case, we consider a Gaussian
79 /// constraint with standard deviation sigma.
80 ///
81 /// In the example, the initial values of the parameters are:
82 ///
83 /// ~~~{.cpp}
84 /// - s = 40
85 /// - b = 100
86 /// - tau = 5
87 /// - bbar = 1000
88 /// - rho = 1
89 /// (sigma for rho = 20%)
90 /// ~~~
91 ///
92 /// and in the toy dataset:
93 ///
94 /// ~~~{.cpp}
95 /// - non = 139
96 /// - noff = 528
97 /// - nonbar = 993
98 /// - noffbar = 4906
99 /// - rhonom = 1.27824
100 /// ~~~
101 ///
102 /// Note, the covariance matrix of the parameters has large off-diagonal terms.
103 /// Clearly s,b are anti-correlated. Similarly, since noffbar >> nonbar, one would
104 /// expect bbar,tau to be anti-correlated.
105 ///
106 /// This can be seen below.
107 ///
108 /// ~~~{.cpp}
109 /// GLOBAL b bbar rho s tau
110 /// b 0.96820 1.000 0.191 -0.942 -0.762 -0.209
111 /// bbar 0.91191 0.191 1.000 0.000 -0.146 -0.912
112 /// rho 0.96348 -0.942 0.000 1.000 0.718 -0.000
113 /// s 0.76250 -0.762 -0.146 0.718 1.000 0.160
114 /// tau 0.92084 -0.209 -0.912 -0.000 0.160 1.000
115 /// ~~~
116 ///
117 /// Similarly, since tau*rho appears as a product, we expect rho,tau
118 /// to be anti-correlated. When the error on rho is significantly
119 /// larger than 1/sqrt(bbar), tau is essentially known and the
120 /// correlation is minimal (tau mainly cares about bbar, and rho about b,s).
121 /// In the alternate parametrization (bbar* tau * rho) the correlation coefficient
122 /// for rho,tau is large (and negative).
123 ///
124 /// The code below uses best-practices for RooFit & RooStats as of June 2010.
125 ///
126 /// It proceeds as follows:
127 /// - create a workspace to hold the model
128 /// - use workspace factory to quickly create the terms of the model
129 /// - use workspace factory to define total model (a prod pdf)
130 /// - create a RooStats ModelConfig to specify observables, parameters of interest
131 /// - add to the ModelConfig a prior on the parameters for Bayesian techniques
132 /// note, the pdf it is factorized for parameters of interest & nuisance params
133 /// - visualize the model
134 /// - write the workspace to a file
135 /// - use several of RooStats IntervalCalculators & compare results
136 ///
137 /// \macro_image
138 /// \macro_output
139 /// \macro_code
140 ///
141 /// \authors authors: Kyle Cranmer, Tanja Rommerskirchen
142 
143 #include "TStopwatch.h"
144 #include "TCanvas.h"
145 #include "TROOT.h"
146 #include "RooPlot.h"
147 #include "RooAbsPdf.h"
148 #include "RooWorkspace.h"
149 #include "RooDataSet.h"
150 #include "RooGlobalFunc.h"
151 #include "RooFitResult.h"
152 #include "RooRandom.h"
157 #include "RooStats/MCMCCalculator.h"
158 #include "RooStats/MCMCInterval.h"
160 #include "RooStats/ProposalHelper.h"
161 #include "RooStats/SimpleInterval.h"
162 #include "RooStats/FeldmanCousins.h"
164 
165 using namespace RooFit;
166 using namespace RooStats;
167 
168 void FourBinInstructional(bool doBayesian=false, bool doFeldmanCousins=false, bool doMCMC=false){
169 
170  // let's time this challenging example
171  TStopwatch t;
172  t.Start();
173 
174  // set RooFit random seed for reproducible results
176 
177  // make model
178  RooWorkspace* wspace = new RooWorkspace("wspace");
179  wspace->factory("Poisson::on(non[0,1000], sum::splusb(s[40,0,100],b[100,0,300]))");
180  wspace->factory("Poisson::off(noff[0,5000], prod::taub(b,tau[5,3,7],rho[1,0,2]))");
181  wspace->factory("Poisson::onbar(nonbar[0,10000], bbar[1000,500,2000])");
182  wspace->factory("Poisson::offbar(noffbar[0,1000000], prod::lambdaoffbar(bbar, tau))");
183  wspace->factory("Gaussian::mcCons(rhonom[1.,0,2], rho, sigma[.2])");
184  wspace->factory("PROD::model(on,off,onbar,offbar,mcCons)");
185  wspace->defineSet("obs","non,noff,nonbar,noffbar,rhonom");
186 
187  wspace->factory("Uniform::prior_poi({s})");
188  wspace->factory("Uniform::prior_nuis({b,bbar,tau, rho})");
189  wspace->factory("PROD::prior(prior_poi,prior_nuis)");
190 
191  // ----------------------------------
192  // Control some interesting variations
193  // define parameers of interest
194  // for 1-d plots
195  wspace->defineSet("poi","s");
196  wspace->defineSet("nuis","b,tau,rho,bbar");
197  // for 2-d plots to inspect correlations:
198  // wspace->defineSet("poi","s,rho");
199 
200  // test simpler cases where parameters are known.
201  // wspace->var("tau")->setConstant();
202  // wspace->var("rho")->setConstant();
203  // wspace->var("b")->setConstant();
204  // wspace->var("bbar")->setConstant();
205 
206  // inspect workspace
207  // wspace->Print();
208 
209  // ----------------------------------------------------------
210  // Generate toy data
211  // generate toy data assuming current value of the parameters
212  // import into workspace.
213  // add Verbose() to see how it's being generated
214  RooDataSet* data = wspace->pdf("model")->generate(*wspace->set("obs"),1);
215  // data->Print("v");
216  wspace->import(*data);
217 
218  // ----------------------------------
219  // Now the statistical tests
220  // model config
221  ModelConfig* modelConfig = new ModelConfig("FourBins");
222  modelConfig->SetWorkspace(*wspace);
223  modelConfig->SetPdf(*wspace->pdf("model"));
224  modelConfig->SetPriorPdf(*wspace->pdf("prior"));
225  modelConfig->SetParametersOfInterest(*wspace->set("poi"));
226  modelConfig->SetNuisanceParameters(*wspace->set("nuis"));
227  wspace->import(*modelConfig);
228  wspace->writeToFile("FourBin.root");
229 
230  // -------------------------------------------------
231  // If you want to see the covariance matrix uncomment
232  // wspace->pdf("model")->fitTo(*data);
233 
234  // use ProfileLikelihood
235  ProfileLikelihoodCalculator plc(*data, *modelConfig);
236  plc.SetConfidenceLevel(0.95);
237  LikelihoodInterval* plInt = plc.GetInterval();
240  plInt->LowerLimit( *wspace->var("s") ); // get ugly print out of the way. Fix.
242 
243  // use FeldmaCousins (takes ~20 min)
244  FeldmanCousins fc(*data, *modelConfig);
245  fc.SetConfidenceLevel(0.95);
246  //number counting: dataset always has 1 entry with N events observed
247  fc.FluctuateNumDataEntries(false);
248  fc.UseAdaptiveSampling(true);
249  fc.SetNBins(40);
250  PointSetInterval* fcInt = NULL;
251  if(doFeldmanCousins){ // takes 7 minutes
252  fcInt = (PointSetInterval*) fc.GetInterval(); // fix cast
253  }
254 
255 
256  // use BayesianCalculator (only 1-d parameter of interest, slow for this problem)
257  BayesianCalculator bc(*data, *modelConfig);
258  bc.SetConfidenceLevel(0.95);
259  SimpleInterval* bInt = NULL;
260  if(doBayesian && wspace->set("poi")->getSize() == 1) {
261  bInt = bc.GetInterval();
262  } else{
263  cout << "Bayesian Calc. only supports on parameter of interest" << endl;
264  }
265 
266 
267  // use MCMCCalculator (takes about 1 min)
268  // Want an efficient proposal function, so derive it from covariance
269  // matrix of fit
270  RooFitResult* fit = wspace->pdf("model")->fitTo(*data,Save());
271  ProposalHelper ph;
273  ph.SetCovMatrix(fit->covarianceMatrix());
274  ph.SetUpdateProposalParameters(kTRUE); // auto-create mean vars and add mappings
275  ph.SetCacheSize(100);
277 
278  MCMCCalculator mc(*data, *modelConfig);
279  mc.SetConfidenceLevel(0.95);
280  mc.SetProposalFunction(*pf);
281  mc.SetNumBurnInSteps(500); // first N steps to be ignored as burn-in
282  mc.SetNumIters(50000);
283  mc.SetLeftSideTailFraction(0.5); // make a central interval
284  MCMCInterval* mcInt = NULL;
285  if(doMCMC)
286  mcInt = mc.GetInterval();
287 
288  // ----------------------------------
289  // Make some plots
290  TCanvas* c1 = (TCanvas*) gROOT->Get("c1");
291  if(!c1)
292  c1 = new TCanvas("c1");
293 
294  if(doBayesian && doMCMC){
295  c1->Divide(3);
296  c1->cd(1);
297  }
298  else if (doBayesian || doMCMC){
299  c1->Divide(2);
300  c1->cd(1);
301  }
302 
303  LikelihoodIntervalPlot* lrplot = new LikelihoodIntervalPlot(plInt);
304  lrplot->Draw();
305 
306  if(doBayesian && wspace->set("poi")->getSize() == 1) {
307  c1->cd(2);
308  // the plot takes a long time and print lots of error
309  // using a scan it is better
310  bc.SetScanOfPosterior(20);
311  RooPlot* bplot = bc.GetPosteriorPlot();
312  bplot->Draw();
313  }
314 
315  if(doMCMC){
316  if(doBayesian && wspace->set("poi")->getSize() == 1)
317  c1->cd(3);
318  else
319  c1->cd(2);
320  MCMCIntervalPlot mcPlot(*mcInt);
321  mcPlot.Draw();
322  }
323 
324  // ----------------------------------
325  // querry intervals
326  cout << "Profile Likelihood interval on s = [" <<
327  plInt->LowerLimit( *wspace->var("s") ) << ", " <<
328  plInt->UpperLimit( *wspace->var("s") ) << "]" << endl;
329  //Profile Likelihood interval on s = [12.1902, 88.6871]
330 
331 
332  if(doBayesian && wspace->set("poi")->getSize() == 1) {
333  cout << "Bayesian interval on s = [" <<
334  bInt->LowerLimit( ) << ", " <<
335  bInt->UpperLimit( ) << "]" << endl;
336  }
337 
338  if(doFeldmanCousins){
339  cout << "Feldman Cousins interval on s = [" <<
340  fcInt->LowerLimit( *wspace->var("s") ) << ", " <<
341  fcInt->UpperLimit( *wspace->var("s") ) << "]" << endl;
342  //Feldman Cousins interval on s = [18.75 +/- 2.45, 83.75 +/- 2.45]
343  }
344 
345  if(doMCMC){
346  cout << "MCMC interval on s = [" <<
347  mcInt->LowerLimit(*wspace->var("s") ) << ", " <<
348  mcInt->UpperLimit(*wspace->var("s") ) << "]" << endl;
349  //MCMC interval on s = [15.7628, 84.7266]
350 
351  }
352 
353  t.Print();
354 }
ProposalFunction is an interface for all proposal functions that would be used with a Markov Chain Mo...
ModelConfig is a simple class that holds configuration information specifying how a model should be u...
Definition: ModelConfig.h:30
const RooArgList & floatParsFinal() const
Definition: RooFitResult.h:110
void Start(Bool_t reset=kTRUE)
Start the stopwatch.
Definition: TStopwatch.cxx:58
void Draw(const Option_t *options=0)
draw the likelihood interval or contour for the 1D case a RooPlot is drawn by default of the profiled...
This class provides simple and straightforward utilities to plot a MCMCInterval object.
void Print(Option_t *option="") const
Print the real and cpu time passed between the start and stop events.
Definition: TStopwatch.cxx:219
LikelihoodInterval is a concrete implementation of the RooStats::ConfInterval interface.
virtual void SetWorkspace(RooWorkspace &ws)
Definition: ModelConfig.h:66
return c1
Definition: legend1.C:41
const TMatrixDSym & covarianceMatrix() const
Return covariance matrix.
virtual ProposalFunction * GetProposalFunction()
virtual void SetUpdateProposalParameters(Bool_t updateParams)
TVirtualPad * cd(Int_t subpadnumber=0)
Set current canvas & pad.
Definition: TCanvas.cxx:688
This class provides simple and straightforward utilities to plot a LikelihoodInterval object...
RooFit::MsgLevel globalKillBelow() const
ProfileLikelihoodCalculator is a concrete implementation of CombinedCalculator (the interface class f...
#define gROOT
Definition: TROOT.h:402
static RooMsgService & instance()
Return reference to singleton instance.
Double_t LowerLimit(const RooRealVar &param)
return the lower bound of the interval on a given parameter
static struct mg_connection * fc(struct mg_context *ctx)
Definition: civetweb.c:1956
virtual void SetSeed(ULong_t seed=0)
Set the random generator seed.
Definition: TRandom.cxx:589
virtual void SetPdf(const RooAbsPdf &pdf)
Set the Pdf, add to the the workspace if not already there.
Definition: ModelConfig.h:75
static TRandom * randomGenerator()
Return a pointer to a singleton random-number generator implementation.
Definition: RooRandom.cxx:54
virtual Double_t LowerLimit()
Double_t LowerLimit(RooRealVar &param)
return lower limit on a given parameter
Int_t getSize() const
Double_t UpperLimit(const RooRealVar &param)
return the upper bound of the interval on a given parameter
virtual void SetNuisanceParameters(const RooArgSet &set)
specify the nuisance parameters (e.g. the rest of the parameters)
Definition: ModelConfig.h:108
virtual void SetCacheSize(Int_t size)
Bool_t defineSet(const char *name, const RooArgSet &aset, Bool_t importMissing=kFALSE)
Define a named RooArgSet with given constituents.
void setGlobalKillBelow(RooFit::MsgLevel level)
Bool_t writeToFile(const char *fileName, Bool_t recreate=kTRUE)
Save this current workspace into given file.
virtual Double_t UpperLimit(RooRealVar &param)
get the highest value of param that is within the confidence interval
RooDataSet is a container class to hold unbinned data.
Definition: RooDataSet.h:29
Double_t UpperLimit(RooRealVar &param)
return upper limit on a given parameter
A RooPlot is a plot frame and a container for graphics objects within that frame. ...
Definition: RooPlot.h:41
The FeldmanCousins class (like the Feldman-Cousins technique) is essentially a specific configuration...
The Canvas class.
Definition: TCanvas.h:31
Namespace for the RooStats classes.
Definition: Asimov.h:20
PointSetInterval is a concrete implementation of the ConfInterval interface.
RooAbsPdf * pdf(const char *name) const
Retrieve p.d.f (RooAbsPdf) with given name. A null pointer is returned if not found.
virtual Double_t LowerLimit(RooRealVar &param)
get the lowest value of param that is within the confidence interval
virtual void SetPriorPdf(const RooAbsPdf &pdf)
Set the Prior Pdf, add to the the workspace if not already there.
Definition: ModelConfig.h:81
virtual void SetCovMatrix(const TMatrixDSym &covMatrix)
RooRealVar * var(const char *name) const
Retrieve real-valued variable (RooRealVar) with given name. A null pointer is returned if not found...
RooFactoryWSTool & factory()
Return instance to factory tool.
RooCmdArg Save(Bool_t flag=kTRUE)
virtual Double_t UpperLimit()
SimpleInterval is a concrete implementation of the ConfInterval interface.
virtual void Divide(Int_t nx=1, Int_t ny=1, Float_t xmargin=0.01, Float_t ymargin=0.01, Int_t color=0)
Automatic pad generation by division.
Definition: TPad.cxx:1153
Bool_t import(const RooAbsArg &arg, const RooCmdArg &arg1=RooCmdArg(), const RooCmdArg &arg2=RooCmdArg(), const RooCmdArg &arg3=RooCmdArg(), const RooCmdArg &arg4=RooCmdArg(), const RooCmdArg &arg5=RooCmdArg(), const RooCmdArg &arg6=RooCmdArg(), const RooCmdArg &arg7=RooCmdArg(), const RooCmdArg &arg8=RooCmdArg(), const RooCmdArg &arg9=RooCmdArg())
Import a RooAbsArg object, e.g.
RooDataSet * generate(const RooArgSet &whatVars, Int_t nEvents, const RooCmdArg &arg1, const RooCmdArg &arg2=RooCmdArg::none(), const RooCmdArg &arg3=RooCmdArg::none(), const RooCmdArg &arg4=RooCmdArg::none(), const RooCmdArg &arg5=RooCmdArg::none())
Generate a new dataset containing the specified variables with events sampled from our distribution...
Definition: RooAbsPdf.cxx:1725
const RooArgSet * set(const char *name)
Return pointer to previously defined named set with given nmame If no such set is found a null pointe...
virtual RooFitResult * fitTo(RooAbsData &data, const RooCmdArg &arg1=RooCmdArg::none(), const RooCmdArg &arg2=RooCmdArg::none(), const RooCmdArg &arg3=RooCmdArg::none(), const RooCmdArg &arg4=RooCmdArg::none(), const RooCmdArg &arg5=RooCmdArg::none(), const RooCmdArg &arg6=RooCmdArg::none(), const RooCmdArg &arg7=RooCmdArg::none(), const RooCmdArg &arg8=RooCmdArg::none())
Fit PDF to given dataset.
Definition: RooAbsPdf.cxx:1079
virtual void SetParametersOfInterest(const RooArgSet &set)
Definition: ModelConfig.h:93
MCMCInterval is a concrete implementation of the RooStats::ConfInterval interface.
Definition: MCMCInterval.h:30
virtual void SetVariables(RooArgList &vars)
Bayesian Calculator estimating an interval or a credible region using the Markov-Chain Monte Carlo me...
const Bool_t kTRUE
Definition: RtypesCore.h:87
BayesianCalculator is a concrete implementation of IntervalCalculator, providing the computation of a...
The RooWorkspace is a persistable container for RooFit projects.
Definition: RooWorkspace.h:42
virtual void Draw(Option_t *options=0)
Draw this plot and all of the elements it contains.
Definition: RooPlot.cxx:559
Stopwatch class.
Definition: TStopwatch.h:28