ROOT   Reference Guide
Searching...
No Matches
NeymanConstruction.cxx
Go to the documentation of this file.
1// @(#)root/roostats:$Id$
2// Author: Kyle Cranmer January 2009
3
4/*************************************************************************
7 * *
8 * For the licensing terms see $ROOTSYS/LICENSE. * 9 * For the list of contributors see$ROOTSYS/README/CREDITS. *
10 *************************************************************************/
11
12/** \class RooStats::NeymanConstruction
13 \ingroup Roostats
14
15NeymanConstruction is a concrete implementation of the NeymanConstruction
16interface that, as the name suggests, performs a NeymanConstruction. It produces
17a RooStats::PointSetInterval, which is a concrete implementation of the
18ConfInterval interface.
19
20The Neyman Construction is not a uniquely defined statistical technique, it
21requires that one specify an ordering rule or ordering principle, which is
22usually incoded by choosing a specific test statistic and limits of integration
23(corresponding to upper/lower/central limits). As a result, this class must be
24configured with the corresponding information before it can produce an interval.
25Common configurations, such as the Feldman-Cousins approach, can be enforced by
26other light weight classes.
27
28The Neyman Construction considers every point in the parameter space
29independently, no assumptions are made that the interval is connected or of a
30particular shape. As a result, the PointSetInterval class is used to represent
31the result. The user indicate which points in the parameter space to perform
32the construction by providing a PointSetInterval instance with the desired points.
33
34This class is fairly light weight, because the choice of parameter points to be
35considered is factorized and so is the creation of the sampling distribution of
36the test statistic (which is done by a concrete class implementing the
37DistributionCreator interface). As a result, this class basically just drives the
38construction by:
39
40 - using a DistributionCreator to create the SamplingDistribution of a user-
41 defined test statistic for each parameter point of interest,
42 - defining the acceptance region in the data by finding the thresholds on the
43 test statistic such that the integral of the sampling distribution is of the
44 appropriate size and consistent with the limits of integration
45 (eg. upper/lower/central limits),
46 - and finally updating the PointSetInterval based on whether the value of the
47 test statistic evaluated on the data are in the acceptance region.
48
49*/
50
52
54
56
60
61#include "RooMsgService.h"
62#include "RooGlobalFunc.h"
63
64#include "RooDataSet.h"
65#include "TFile.h"
66#include "TMath.h"
67#include "TH1F.h"
68
70
71using namespace RooFit;
72using namespace RooStats;
73using namespace std;
74
75
76////////////////////////////////////////////////////////////////////////////////
77/// default constructor
78
80 fSize(0.05),
81 fData(data),
82 fModel(model),
83 fTestStatSampler(0),
84 fPointsToTest(0),
85 fLeftSideFraction(0),
86 fConfBelt(0), // constructed with tree data
89 fSaveBeltToFile(false),
90 fCreateBelt(false)
91
92{
93// fWS = new RooWorkspace();
94// fOwnsWorkspace = true;
95// fDataName = "";
96// fPdfName = "";
97}
98
99////////////////////////////////////////////////////////////////////////////////
100/// default constructor
101/// if(fOwnsWorkspace && fWS) delete fWS;
102/// if(fConfBelt) delete fConfBelt;
103
105}
106
107////////////////////////////////////////////////////////////////////////////////
108/// Main interface to get a RooStats::ConfInterval.
109/// It constructs a RooStats::SetInterval.
110
112
113 TFile* f=0;
114 if(fSaveBeltToFile){
115 //coverity[FORWARD_NULL]
116 oocoutI(f,Contents) << "NeymanConstruction saving ConfidenceBelt to file SamplingDistributions.root" << endl;
117 f = new TFile("SamplingDistributions.root","recreate");
118 }
119
120 Int_t npass = 0;
121 RooArgSet* point;
122
123 // strange problems when using snapshots.
124 // RooArgSet* fPOI = (RooArgSet*) fModel.GetParametersOfInterest()->snapshot();
126
127 RooDataSet* pointsInInterval = new RooDataSet("pointsInInterval",
128 "points in interval",
129 *(fPointsToTest->get(0)) );
130
131 // loop over points to test
132 for(Int_t i=0; i<fPointsToTest->numEntries(); ++i){
133 // get a parameter point from the list of points to test.
134 point = (RooArgSet*) fPointsToTest->get(i);//->clone("temp");
135
136 // set parameters of interest to current point
137 *fPOI = *point;
138
139 // set test stat sampler to use this point
141
142 // get the value of the test statistic for this data set
143 Double_t thisTestStatistic = fTestStatSampler->EvaluateTestStatistic(fData, *fPOI );
144 /*
145 cout << "NC CHECK: " << i << endl;
146 point->Print();
147 fPOI->Print("v");
148 fData.Print();
149 cout <<"thisTestStatistic = " << thisTestStatistic << endl;
150 */
151
152 // find the lower & upper thresholds on the test statistic that
153 // define the acceptance region in the data
154
155 SamplingDistribution* samplingDist=0;
157 Double_t upperEdgeOfAcceptance, upperEdgeMinusSigma, upperEdgePlusSigma;
158 Double_t lowerEdgeOfAcceptance, lowerEdgeMinusSigma, lowerEdgePlusSigma;
160
161 // the adaptive sampling algorithm wants at least one toy event to be outside
162 // of the requested pvalue including the sampling variation. That leads to an equation
163 // N-1 = (1-alpha)N + Z sqrt(N - (1-alpha)N) // for upper limit and
164 // 1 = alpha N - Z sqrt(alpha N) // for lower limit
165 //
166 // solving for N gives:
167 // N = 1/alpha * [3/2 + sqrt(5)] for Z = 1 (which is used currently)
168 // thus, a good guess for the first iteration of events is N=3.73/alpha~4/alpha
169 // should replace alpha here by smaller tail probability: eg. alpha*Min(leftsideFrac, 1.-leftsideFrac)
170 // totalMC will be incremented by 2 before first call, so initiated it at half the value
172 if(fLeftSideFraction==0. || fLeftSideFraction ==1.){
173 totalMC = (Int_t) (2./fSize);
174 }
175 // use control
177 totalMC = (Int_t) tmc;
178
179 ToyMCSampler* toyMCSampler = dynamic_cast<ToyMCSampler*>(fTestStatSampler);
181 do{
182 // this will be executed first, then while conditioned checked
183 // as an exit condition for the loop.
184
185 // the next line is where most of the time will be spent
186 // generating the sampling dist of the test statistic.
187 additionalMC = 2*totalMC; // grow by a factor of two
188 samplingDist =
189 toyMCSampler->AppendSamplingDistribution(*point,
190 samplingDist,
192 if (!samplingDist) {
193 oocoutE((TObject*)0,Eval) << "Neyman Construction: error generating sampling distribution" << endl;
194 return 0;
195 }
196 totalMC=samplingDist->GetSize();
197
198 //cout << "without sigma upper = " <<
199 //samplingDist->InverseCDF( 1. - ((1.-fLeftSideFraction) * fSize) ) << endl;
200
201 sigma = 1;
202 upperEdgeOfAcceptance =
203 samplingDist->InverseCDF( 1. - ((1.-fLeftSideFraction) * fSize) ,
204 sigma, upperEdgePlusSigma);
205 sigma = -1;
206 samplingDist->InverseCDF( 1. - ((1.-fLeftSideFraction) * fSize) ,
207 sigma, upperEdgeMinusSigma);
208
209 sigma = 1;
210 lowerEdgeOfAcceptance =
211 samplingDist->InverseCDF( fLeftSideFraction * fSize ,
212 sigma, lowerEdgePlusSigma);
213 sigma = -1;
214 samplingDist->InverseCDF( fLeftSideFraction * fSize ,
215 sigma, lowerEdgeMinusSigma);
216
217 ooccoutD(samplingDist,Eval) << "NeymanConstruction: "
218 << "total MC = " << totalMC
219 << " this test stat = " << thisTestStatistic << endl
220 << " upper edge -1sigma = " << upperEdgeMinusSigma
221 << ", upperEdge = "<<upperEdgeOfAcceptance
222 << ", upper edge +1sigma = " << upperEdgePlusSigma << endl
223 << " lower edge -1sigma = " << lowerEdgeMinusSigma
224 << ", lowerEdge = "<<lowerEdgeOfAcceptance
225 << ", lower edge +1sigma = " << lowerEdgePlusSigma << endl;
226 } while((
227 (thisTestStatistic <= upperEdgeOfAcceptance &&
228 thisTestStatistic > upperEdgeMinusSigma)
229 || (thisTestStatistic >= upperEdgeOfAcceptance &&
230 thisTestStatistic < upperEdgePlusSigma)
231 || (thisTestStatistic <= lowerEdgeOfAcceptance &&
232 thisTestStatistic > lowerEdgeMinusSigma)
233 || (thisTestStatistic >= lowerEdgeOfAcceptance &&
234 thisTestStatistic < lowerEdgePlusSigma)
235 ) && (totalMC < 100./fSize)
236 ) ; // need ; here
237 } else {
238 // the next line is where most of the time will be spent
239 // generating the sampling dist of the test statistic.
240 samplingDist = fTestStatSampler->GetSamplingDistribution(*point);
241 if (!samplingDist) {
242 oocoutE((TObject*)0,Eval) << "Neyman Construction: error generating sampling distribution" << endl;
243 return 0;
244 }
245
246 lowerEdgeOfAcceptance =
247 samplingDist->InverseCDF( fLeftSideFraction * fSize );
248 upperEdgeOfAcceptance =
249 samplingDist->InverseCDF( 1. - ((1.-fLeftSideFraction) * fSize) );
250 }
251
252 // add acceptance region to ConfidenceBelt
253 if(fConfBelt && fCreateBelt){
254 // cout << "conf belt set " << fConfBelt << endl;
256 lowerEdgeOfAcceptance,
257 upperEdgeOfAcceptance);
258 }
259
260 // printout some debug info
261 TIter itr = point->createIterator();
262 RooRealVar* myarg;
263 ooccoutP(samplingDist,Eval) << "NeymanConstruction: Prog: "<< i+1<<"/"<<fPointsToTest->numEntries()
264 << " total MC = " << samplingDist->GetSize()
265 << " this test stat = " << thisTestStatistic << endl;
266 ooccoutP(samplingDist,Eval) << " ";
267 while ((myarg = (RooRealVar *)itr.Next())) {
268 ooccoutP(samplingDist,Eval) << myarg->GetName() << "=" << myarg->getVal() << " ";
269 }
270 ooccoutP(samplingDist,Eval) << "[" << lowerEdgeOfAcceptance << ", "
271 << upperEdgeOfAcceptance << "] " << " in interval = " <<
272 (thisTestStatistic >= lowerEdgeOfAcceptance && thisTestStatistic <= upperEdgeOfAcceptance)
273 << endl << endl;
274
275 // Check if this data is in the acceptance region
276 if(thisTestStatistic >= lowerEdgeOfAcceptance && thisTestStatistic <= upperEdgeOfAcceptance) {
277 // if so, set this point to true
278 // fPointsToTest->add(*point, 1.); // this behaves differently for Hist and DataSet
280 ++npass;
281 }
282
283 if(fSaveBeltToFile){
284 //write to file
285 samplingDist->Write();
286 string tmpName = "hist_";
287 tmpName+=samplingDist->GetName();
288 TH1F* h = new TH1F(tmpName.c_str(),"",500,0.,5.);
289 for(int ii=0; ii<samplingDist->GetSize(); ++ii){
290 h->Fill(samplingDist->GetSamplingDistribution().at(ii) );
291 }
292 h->Write();
293 delete h;
294 }
295
296 delete samplingDist;
297 // delete point; // from dataset
298 }
299 oocoutI(pointsInInterval,Eval) << npass << " points in interval" << endl;
300
301 // create an interval based pointsInInterval
302 PointSetInterval* interval
303 = new PointSetInterval("ClassicalConfidenceInterval", *pointsInInterval);
304
305
306 if(fSaveBeltToFile){
307 // write belt to file
308 fConfBelt->Write();
309
310 f->Close();
311 }
312
313 delete f;
314 //delete data;
315 return interval;
316}
size_t fSize
#define f(i)
Definition RSha256.hxx:104
#define h(i)
Definition RSha256.hxx:106
#define oocoutE(o, a)
#define oocoutI(o, a)
#define ooccoutP(o, a)
#define ooccoutD(o, a)
int Int_t
Definition RtypesCore.h:45
#define ClassImp(name)
Definition Rtypes.h:364
TIterator * createIterator(Bool_t dir=kIterForward) const
TIterator-style iteration over contained elements.
RooAbsData is the common abstract base class for binned and unbinned datasets.
Definition RooAbsData.h:49
virtual const RooArgSet * get() const
Definition RooAbsData.h:92
virtual Int_t numEntries() const
Return number of entries in dataset, i.e., count unweighted entries.
Double_t getVal(const RooArgSet *normalisationSet=nullptr) const
Evaluate object.
Definition RooAbsReal.h:91
RooArgSet is a container object that can hold multiple RooAbsArg objects.
Definition RooArgSet.h:29
RooDataSet is a container class to hold unbinned data.
Definition RooDataSet.h:33
virtual void add(const RooArgSet &row, Double_t weight=1.0, Double_t weightError=0) override
Add a data point, with its coordinates specified in the 'data' argset, to the data set.
RooRealVar represents a variable that can be changed from the outside.
Definition RooRealVar.h:39
void AddAcceptanceRegion(RooArgSet &, AcceptanceRegion region, Double_t cl=-1., Double_t leftside=-1.)
ModelConfig is a simple class that holds configuration information specifying how a model should be u...
Definition ModelConfig.h:30
const RooArgSet * GetParametersOfInterest() const
get RooArgSet containing the parameter of interest (return NULL if not existing)
NeymanConstruction is a concrete implementation of the NeymanConstruction interface that,...
virtual PointSetInterval * GetInterval() const
Main interface to get a ConfInterval (will be a PointSetInterval)
virtual ~NeymanConstruction()
default constructor if(fOwnsWorkspace && fWS) delete fWS; if(fConfBelt) delete fConfBelt;
RooAbsData & fData
size of the test (eg. specified rate of Type I error)
NeymanConstruction(RooAbsData &data, ModelConfig &model)
NeymanConstruction();.
PointSetInterval is a concrete implementation of the ConfInterval interface.
This class simply holds a sampling distribution of some test statistic.
Double_t InverseCDF(Double_t pvalue)
get the inverse of the Cumulative distribution function
Int_t GetSize() const
size of samples
const std::vector< Double_t > & GetSamplingDistribution() const
Get test statistics values.
virtual Double_t EvaluateTestStatistic(RooAbsData &data, RooArgSet &paramsOfInterest)=0
virtual void SetParametersForTestStat(const RooArgSet &)=0
virtual SamplingDistribution * GetSamplingDistribution(RooArgSet &paramsOfInterest)=0
ToyMCSampler is an implementation of the TestStatSampler interface.
virtual SamplingDistribution * AppendSamplingDistribution(RooArgSet &allParameters, SamplingDistribution *last, Int_t additionalMC)
Extended interface to append to sampling distribution more samples.
A ROOT file is a suite of consecutive data records (TKey instances) with a well defined format.
Definition TFile.h:54
1-D histogram with a float per channel (see TH1 documentation)}
Definition TH1.h:575
TObject * Next()
virtual const char * GetName() const
Returns name of object.
Definition TNamed.h:47
Mother of all ROOT objects.
Definition TObject.h:37
virtual Int_t Write(const char *name=0, Int_t option=0, Int_t bufsize=0)
Write this object to the current directory.
Definition TObject.cxx:798
const Double_t sigma
The namespace RooFit contains mostly switches that change the behaviour of functions of PDFs (or othe...
Namespace for the RooStats classes.
Definition Asimov.h:19
Short_t Min(Short_t a, Short_t b)
Definition TMathBase.h:180