ROOT  6.07/01
Reference Guide
 All Classes Namespaces Files Functions Variables Typedefs Enumerations Enumerator Properties Friends Macros Groups Pages
FourBinInstructional.C
Go to the documentation of this file.
1 // This example is a generalization of the on/off problem.
2 
3 /*
4 FourBin Instructional Tutorial:
5  authors:
6  Kyle Cranmer <cranmer@cern.ch>
7  Tanja Rommerskirchen <tanja.rommerskirchen@cern.ch>
8 
9  date: June 1, 2010
10 
11  This example is a generalization of the on/off problem.
12 It's a common setup for SUSY searches. Imagine that one has two
13 variables "x" and "y" (eg. missing ET and SumET), see figure.
14 The signal region has high values of both of these variables (top right).
15 One can see low values of "x" or "y" acting as side-bands. If we
16 just used "y" as a sideband, we would have the on/off problem.
17  - In the signal region we observe non events and expect s+b events.
18  - In the region with low values of "y" (bottom right)
19  we observe noff events and expect tau*b events.
20 Note the significance of tau. In the background only case:
21  tau ~ <expectation off> / <expectation on>
22 If tau is known, this model is sufficient, but often tau is not known exactly.
23 So one can use low values of "x" as an additional constraint for tau.
24 Note that this technique critically depends on the notion that the
25 joint distribution for "x" and "y" can be factorized.
26 Generally, these regions have many events, so it the ratio can be
27 measured very precisely there. So we extend the model to describe the
28 left two boxes... denoted with "bar".
29  - In the upper left we observe nonbar events and expect bbar events
30  - In the bottom left we observe noffbar events and expect tau bbar events
31 Note again we have:
32  tau ~ <expecation off bar> / <expectation on bar>
33 One can further expand the model to account for the systematic associated
34 to assuming the distribution of "x" and "y" factorizes (eg. that
35 tau is the same for off/on and offbar/onbar). This can be done in several
36 ways, but here we introduce an additional parameter rho, which so that
37 one set of models will use tau and the other tau*rho. The choice is arbitary,
38 but it has consequences on the numerical stability of the algorithms.
39 The "bar" measurements typically have more events (& smaller relative errors).
40 If we choose <expectation noffbar> = tau * rho * <expectation noonbar>, the
41 product tau*rho will be known very precisely (~1/sqrt(bbar)) and the contour
42 in those parameters will be narrow and have a non-trivial tau~1/rho shape.
43 However, if we choose to put rho on the non/noff measurements (where the
44 product will have an error ~1/sqrt(b)), the contours will be more ameanable
45 to numerical techniques. Thus, here we choose to define
46  tau := <expecation off bar> / (<expectation on bar>)
47  rho := <expecation off> / (<expectation on> * tau)
48 
49 ^ y
50 |
51 |---------------------------+
52 | | |
53 | nonbar | non |
54 | bbar | s+b |
55 | | |
56 |---------------+-----------|
57 | | |
58 | noffbar | noff |
59 | tau bbar | tau b rho |
60 | | |
61 +-----------------------------> x
62 
63 
64 Left in this way, the problem is under-constrained. However, one may
65 have some auxiliary measurement (usually based on Monte Carlo) to
66 constrain rho. Let us call this auxiliary measurement that gives
67 the nominal value of rho "rhonom". Thus, there is a 'constraint' term in
68 the full model: P(rhonom | rho). In this case, we consider a Gaussian
69 constraint with standard deviation sigma.
70 
71 In the example, the initial values of the parameters are
72  - s = 40
73  - b = 100
74  - tau = 5
75  - bbar = 1000
76  - rho = 1
77  (sigma for rho = 20%)
78 and in the toy dataset:
79  - non = 139
80  - noff = 528
81  - nonbar = 993
82  - noffbar = 4906
83  - rhonom = 1.27824
84 
85 Note, the covariance matrix of the parameters has large off-diagonal terms.
86 Clearly s,b are anti-correlated. Similary, since noffbar >> nonbar, one would
87 expect bbar,tau to be anti-correlated.
88 
89 This can be seen below.
90  GLOBAL b bbar rho s tau
91  b 0.96820 1.000 0.191 -0.942 -0.762 -0.209
92  bbar 0.91191 0.191 1.000 0.000 -0.146 -0.912
93  rho 0.96348 -0.942 0.000 1.000 0.718 -0.000
94  s 0.76250 -0.762 -0.146 0.718 1.000 0.160
95  tau 0.92084 -0.209 -0.912 -0.000 0.160 1.000
96 
97 Similarly, since tau*rho appears as a product, we expect rho,tau
98 to be anti-correlated. When the error on rho is significantly
99 larger than 1/sqrt(bbar), tau is essentially known and the
100 correlation is minimal (tau mainly cares about bbar, and rho about b,s).
101 In the alternate parametrizaton (bbar* tau * rho) the correlation coefficient
102 for rho,tau is large (and negative).
103 
104 
105 The code below uses best-practices for RooFit & RooStats as of
106 June 2010. It proceeds as follows:
107  - create a workspace to hold the model
108  - use workspace factory to quickly create the terms of the model
109  - use workspace factory to define total model (a prod pdf)
110  - create a RooStats ModelConfig to specify observables, parameters of interest
111  - add to the ModelConfig a prior on the parameters for Bayesian techniques
112  - note, the pdf it is factorized for parameters of interest & nuisance params
113  - visualize the model
114  - write the workspace to a file
115  - use several of RooStats IntervalCalculators & compare results
116 */
117 
118 #include "TStopwatch.h"
119 #include "TCanvas.h"
120 #include "TROOT.h"
121 #include "RooPlot.h"
122 #include "RooAbsPdf.h"
123 #include "RooWorkspace.h"
124 #include "RooDataSet.h"
125 #include "RooGlobalFunc.h"
126 #include "RooFitResult.h"
127 #include "RooRandom.h"
132 #include "RooStats/MCMCCalculator.h"
133 #include "RooStats/MCMCInterval.h"
135 #include "RooStats/ProposalHelper.h"
136 #include "RooStats/SimpleInterval.h"
137 #include "RooStats/FeldmanCousins.h"
139 
140 using namespace RooFit;
141 using namespace RooStats;
142 
143 void FourBinInstructional(bool doBayesian=false, bool doFeldmanCousins=false, bool doMCMC=false){
144 
145  // let's time this challenging example
146  TStopwatch t;
147  t.Start();
148 
149  // set RooFit random seed for reproducible results
151 
152  // make model
153  RooWorkspace* wspace = new RooWorkspace("wspace");
154  wspace->factory("Poisson::on(non[0,1000], sum::splusb(s[40,0,100],b[100,0,300]))");
155  wspace->factory("Poisson::off(noff[0,5000], prod::taub(b,tau[5,3,7],rho[1,0,2]))");
156  wspace->factory("Poisson::onbar(nonbar[0,10000], bbar[1000,500,2000])");
157  wspace->factory("Poisson::offbar(noffbar[0,1000000], prod::lambdaoffbar(bbar, tau))");
158  wspace->factory("Gaussian::mcCons(rhonom[1.,0,2], rho, sigma[.2])");
159  wspace->factory("PROD::model(on,off,onbar,offbar,mcCons)");
160  wspace->defineSet("obs","non,noff,nonbar,noffbar,rhonom");
161 
162  wspace->factory("Uniform::prior_poi({s})");
163  wspace->factory("Uniform::prior_nuis({b,bbar,tau, rho})");
164  wspace->factory("PROD::prior(prior_poi,prior_nuis)");
165 
166  ///////////////////////////////////////////
167  // Control some interesting variations
168  // define parameers of interest
169  // for 1-d plots
170  wspace->defineSet("poi","s");
171  wspace->defineSet("nuis","b,tau,rho,bbar");
172  // for 2-d plots to inspect correlations:
173  // wspace->defineSet("poi","s,rho");
174 
175  // test simpler cases where parameters are known.
176  // wspace->var("tau")->setConstant();
177  // wspace->var("rho")->setConstant();
178  // wspace->var("b")->setConstant();
179  // wspace->var("bbar")->setConstant();
180 
181  // inspect workspace
182  // wspace->Print();
183 
184  ////////////////////////////////////////////////////////////
185  // Generate toy data
186  // generate toy data assuming current value of the parameters
187  // import into workspace.
188  // add Verbose() to see how it's being generated
189  RooDataSet* data = wspace->pdf("model")->generate(*wspace->set("obs"),1);
190  // data->Print("v");
191  wspace->import(*data);
192 
193  /////////////////////////////////////////////////////
194  // Now the statistical tests
195  // model config
196  ModelConfig* modelConfig = new ModelConfig("FourBins");
197  modelConfig->SetWorkspace(*wspace);
198  modelConfig->SetPdf(*wspace->pdf("model"));
199  modelConfig->SetPriorPdf(*wspace->pdf("prior"));
200  modelConfig->SetParametersOfInterest(*wspace->set("poi"));
201  modelConfig->SetNuisanceParameters(*wspace->set("nuis"));
202  wspace->import(*modelConfig);
203  wspace->writeToFile("FourBin.root");
204 
205  //////////////////////////////////////////////////
206  // If you want to see the covariance matrix uncomment
207  // wspace->pdf("model")->fitTo(*data);
208 
209  // use ProfileLikelihood
210  ProfileLikelihoodCalculator plc(*data, *modelConfig);
211  plc.SetConfidenceLevel(0.95);
212  LikelihoodInterval* plInt = plc.GetInterval();
215  plInt->LowerLimit( *wspace->var("s") ); // get ugly print out of the way. Fix.
217 
218  // use FeldmaCousins (takes ~20 min)
219  FeldmanCousins fc(*data, *modelConfig);
220  fc.SetConfidenceLevel(0.95);
221  //number counting: dataset always has 1 entry with N events observed
222  fc.FluctuateNumDataEntries(false);
223  fc.UseAdaptiveSampling(true);
224  fc.SetNBins(40);
225  PointSetInterval* fcInt = NULL;
226  if(doFeldmanCousins){ // takes 7 minutes
227  fcInt = (PointSetInterval*) fc.GetInterval(); // fix cast
228  }
229 
230 
231  // use BayesianCalculator (only 1-d parameter of interest, slow for this problem)
232  BayesianCalculator bc(*data, *modelConfig);
233  bc.SetConfidenceLevel(0.95);
234  SimpleInterval* bInt = NULL;
235  if(doBayesian && wspace->set("poi")->getSize() == 1) {
236  bInt = bc.GetInterval();
237  } else{
238  cout << "Bayesian Calc. only supports on parameter of interest" << endl;
239  }
240 
241 
242  // use MCMCCalculator (takes about 1 min)
243  // Want an efficient proposal function, so derive it from covariance
244  // matrix of fit
245  RooFitResult* fit = wspace->pdf("model")->fitTo(*data,Save());
246  ProposalHelper ph;
248  ph.SetCovMatrix(fit->covarianceMatrix());
249  ph.SetUpdateProposalParameters(kTRUE); // auto-create mean vars and add mappings
250  ph.SetCacheSize(100);
252 
253  MCMCCalculator mc(*data, *modelConfig);
254  mc.SetConfidenceLevel(0.95);
255  mc.SetProposalFunction(*pf);
256  mc.SetNumBurnInSteps(500); // first N steps to be ignored as burn-in
257  mc.SetNumIters(50000);
258  mc.SetLeftSideTailFraction(0.5); // make a central interval
259  MCMCInterval* mcInt = NULL;
260  if(doMCMC)
261  mcInt = mc.GetInterval();
262 
263  //////////////////////////////////////
264  // Make some plots
265  TCanvas* c1 = (TCanvas*) gROOT->Get("c1");
266  if(!c1)
267  c1 = new TCanvas("c1");
268 
269  if(doBayesian && doMCMC){
270  c1->Divide(3);
271  c1->cd(1);
272  }
273  else if (doBayesian || doMCMC){
274  c1->Divide(2);
275  c1->cd(1);
276  }
277 
278  LikelihoodIntervalPlot* lrplot = new LikelihoodIntervalPlot(plInt);
279  lrplot->Draw();
280 
281  if(doBayesian && wspace->set("poi")->getSize() == 1) {
282  c1->cd(2);
283  // the plot takes a long time and print lots of error
284  // using a scan it is better
285  bc.SetScanOfPosterior(20);
286  RooPlot* bplot = bc.GetPosteriorPlot();
287  bplot->Draw();
288  }
289 
290  if(doMCMC){
291  if(doBayesian && wspace->set("poi")->getSize() == 1)
292  c1->cd(3);
293  else
294  c1->cd(2);
295  MCMCIntervalPlot mcPlot(*mcInt);
296  mcPlot.Draw();
297  }
298 
299  ////////////////////////////////////
300  // querry intervals
301  cout << "Profile Likelihood interval on s = [" <<
302  plInt->LowerLimit( *wspace->var("s") ) << ", " <<
303  plInt->UpperLimit( *wspace->var("s") ) << "]" << endl;
304  //Profile Likelihood interval on s = [12.1902, 88.6871]
305 
306 
307  if(doBayesian && wspace->set("poi")->getSize() == 1) {
308  cout << "Bayesian interval on s = [" <<
309  bInt->LowerLimit( ) << ", " <<
310  bInt->UpperLimit( ) << "]" << endl;
311  }
312 
313  if(doFeldmanCousins){
314  cout << "Feldman Cousins interval on s = [" <<
315  fcInt->LowerLimit( *wspace->var("s") ) << ", " <<
316  fcInt->UpperLimit( *wspace->var("s") ) << "]" << endl;
317  //Feldman Cousins interval on s = [18.75 +/- 2.45, 83.75 +/- 2.45]
318  }
319 
320  if(doMCMC){
321  cout << "MCMC interval on s = [" <<
322  mcInt->LowerLimit(*wspace->var("s") ) << ", " <<
323  mcInt->UpperLimit(*wspace->var("s") ) << "]" << endl;
324  //MCMC interval on s = [15.7628, 84.7266]
325 
326  }
327 
328  t.Print();
329 
330 
331 }
RooPlot * GetPosteriorPlot(bool norm=false, double precision=0.01) const
get the plot with option to get it normalized
ProposalFunction is an interface for all proposal functions that would be used with a Markov Chain Mo...
void Print(Option_t *option="") const
Print the real and cpu time passed between the start and stop events.
Definition: TStopwatch.cxx:217
ModelConfig is a simple class that holds configuration information specifying how a model should be u...
Definition: ModelConfig.h:52
void Start(Bool_t reset=kTRUE)
Start the stopwatch.
Definition: TStopwatch.cxx:56
void Draw(const Option_t *options=0)
draw the likelihood interval or contour for the 1D case a RooPlot is drawn by default of the profiled...
This class provides simple and straightforward utilities to plot a MCMCInterval object.
const RooArgList & floatParsFinal() const
Definition: RooFitResult.h:109
LikelihoodInterval is a concrete implementation of the RooStats::ConfInterval interface.
virtual void SetWorkspace(RooWorkspace &ws)
Definition: ModelConfig.h:88
void SetScanOfPosterior(int nbin=100)
use directly the approximate posterior function obtained by binning it in nbins by default the cdf is...
TCanvas * c1
Definition: legend1.C:2
void FourBinInstructional(bool doBayesian=false, bool doFeldmanCousins=false, bool doMCMC=false)
friend class RooWorkspace
Definition: RooAbsArg.h:472
virtual ProposalFunction * GetProposalFunction()
virtual void SetUpdateProposalParameters(Bool_t updateParams)
This class provides simple and straightforward utilities to plot a LikelihoodInterval object...
ProfileLikelihoodCalculator is a concrete implementation of CombinedCalculator (the interface class f...
virtual void SetNumBurnInSteps(Int_t numBurnInSteps)
set the number of steps in the chain to discard as burn-in, starting from the first ...
#define gROOT
Definition: TROOT.h:344
void Draw(const Option_t *options=NULL)
static RooMsgService & instance()
Return reference to singleton instance.
virtual void SetSeed(UInt_t seed=0)
Set the random generator seed.
Definition: TRandom.cxx:568
Double_t LowerLimit(const RooRealVar &param)
return the lower bound of the interval on a given parameter
virtual LikelihoodInterval * GetInterval() const
Return a likelihood interval.
virtual SimpleInterval * GetInterval() const
compute the interval.
static struct mg_connection * fc(struct mg_context *ctx)
Definition: civetweb.c:839
virtual void SetConfidenceLevel(Double_t cl)
set the confidence level for the interval (eg. 0.95 for a 95% Confidence Interval) ...
virtual void SetConfidenceLevel(Double_t cl)
set the confidence level for the interval (eg. 0.95 for a 95% Confidence Interval) ...
RooAbsPdf * pdf(const char *name) const
Retrieve p.d.f (RooAbsPdf) with given name. A null pointer is returned if not found.
virtual void SetPdf(const RooAbsPdf &pdf)
Set the Pdf, add to the the workspace if not already there.
Definition: ModelConfig.h:97
static TRandom * randomGenerator()
Return a pointer to a singleton random-number generator implementation.
Definition: RooRandom.cxx:54
virtual Double_t LowerLimit()
virtual void SetConfidenceLevel(Double_t cl)
set the confidence level for the interval (eg. 0.95 for a 95% Confidence Interval) ...
RooFit::MsgLevel globalKillBelow() const
Double_t LowerLimit(RooRealVar &param)
return lower limit on a given parameter
TThread * t[5]
Definition: threadsh1.C:13
Double_t UpperLimit(const RooRealVar &param)
return the upper bound of the interval on a given parameter
virtual void SetNumIters(Int_t numIters)
set the number of iterations to run the metropolis algorithm
virtual MCMCInterval * GetInterval() const
Main interface to get a ConfInterval.
virtual void SetNuisanceParameters(const RooArgSet &set)
specify the nuisance parameters (e.g. the rest of the parameters)
Definition: ModelConfig.h:130
void SetNBins(Int_t bins)
virtual void SetCacheSize(Int_t size)
Bool_t defineSet(const char *name, const RooArgSet &aset, Bool_t importMissing=kFALSE)
Define a named RooArgSet with given constituents.
void FluctuateNumDataEntries(bool flag=true)
const TMatrixDSym & covarianceMatrix() const
Return covariance matrix.
void setGlobalKillBelow(RooFit::MsgLevel level)
Bool_t writeToFile(const char *fileName, Bool_t recreate=kTRUE)
Save this current workspace into given file.
RooDataSet is a container class to hold unbinned data.
Definition: RooDataSet.h:29
RooRealVar * var(const char *name) const
Retrieve real-valued variable (RooRealVar) with given name. A null pointer is returned if not found...
Double_t UpperLimit(RooRealVar &param)
return upper limit on a given parameter
A RooPlot is a plot frame and a container for graphics objects within that frame. ...
Definition: RooPlot.h:41
The FeldmanCousins class (like the Feldman-Cousins technique) is essentially a specific configuration...
PointSetInterval is a concrete implementation of the ConfInterval interface.
virtual void SetProposalFunction(ProposalFunction &proposalFunction)
set the proposal function for suggesting new points for the MCMC
virtual void SetPriorPdf(const RooAbsPdf &pdf)
Set the Prior Pdf, add to the the workspace if not already there.
Definition: ModelConfig.h:103
virtual PointSetInterval * GetInterval() const
Main interface to get a ConfInterval (will be a PointSetInterval)
virtual void SetCovMatrix(const TMatrixDSym &covMatrix)
virtual void SetConfidenceLevel(Double_t cl)
set the confidence level for the interval (eg. 0.95 for a 95% Confidence Interval) ...
RooFactoryWSTool & factory()
Return instance to factory tool.
RooCmdArg Save(Bool_t flag=kTRUE)
virtual Double_t UpperLimit()
Bool_t import(const RooAbsArg &arg, const RooCmdArg &arg1=RooCmdArg(), const RooCmdArg &arg2=RooCmdArg(), const RooCmdArg &arg3=RooCmdArg(), const RooCmdArg &arg4=RooCmdArg(), const RooCmdArg &arg5=RooCmdArg(), const RooCmdArg &arg6=RooCmdArg(), const RooCmdArg &arg7=RooCmdArg(), const RooCmdArg &arg8=RooCmdArg(), const RooCmdArg &arg9=RooCmdArg())
Import a RooAbsArg object, e.g.
RooDataSet * generate(const RooArgSet &whatVars, Int_t nEvents, const RooCmdArg &arg1, const RooCmdArg &arg2=RooCmdArg::none(), const RooCmdArg &arg3=RooCmdArg::none(), const RooCmdArg &arg4=RooCmdArg::none(), const RooCmdArg &arg5=RooCmdArg::none())
Generate a new dataset containing the specified variables with events sampled from our distribution...
Definition: RooAbsPdf.cxx:1702
const RooArgSet * set(const char *name)
Return pointer to previously defined named set with given nmame If no such set is found a null pointe...
#define NULL
Definition: Rtypes.h:82
virtual RooFitResult * fitTo(RooAbsData &data, const RooCmdArg &arg1=RooCmdArg::none(), const RooCmdArg &arg2=RooCmdArg::none(), const RooCmdArg &arg3=RooCmdArg::none(), const RooCmdArg &arg4=RooCmdArg::none(), const RooCmdArg &arg5=RooCmdArg::none(), const RooCmdArg &arg6=RooCmdArg::none(), const RooCmdArg &arg7=RooCmdArg::none(), const RooCmdArg &arg8=RooCmdArg::none())
Fit PDF to given dataset.
Definition: RooAbsPdf.cxx:1056
virtual void SetLeftSideTailFraction(Double_t a)
Set the left side tail fraction.
virtual void SetParametersOfInterest(const RooArgSet &set)
Definition: ModelConfig.h:115
Int_t getSize() const
virtual void SetVariables(RooArgList &vars)
const Bool_t kTRUE
Definition: Rtypes.h:91
Bayesian Calculator estimating an interval or a credible region using the Markov-Chain Monte Carlo me...
BayesianCalculator is a concrete implementation of IntervalCalculator, providing the computation of a...
The RooWorkspace is a persistable container for RooFit projects.
Definition: RooWorkspace.h:42
void UseAdaptiveSampling(bool flag=true)
virtual void Draw(Option_t *options=0)
Draw this plot and all of the elements it contains.
Definition: RooPlot.cxx:559
Stopwatch class.
Definition: TStopwatch.h:30