Logo ROOT  
Reference Guide
RooBinSamplingPdf.cxx
Go to the documentation of this file.
1// Authors: Stephan Hageboeck, CERN; Andrea Sciandra, SCIPP-UCSC/Atlas; Nov 2020
2
3/*****************************************************************************
4 * RooFit
5 * Authors: *
6 * WV, Wouter Verkerke, UC Santa Barbara, verkerke@slac.stanford.edu *
7 * DK, David Kirkby, UC Irvine, dkirkby@uci.edu *
8 * *
9 * Copyright (c) 2000-2020, Regents of the University of California *
10 * and Stanford University. All rights reserved. *
11 * *
12 * Redistribution and use in source and binary forms, *
13 * with or without modification, are permitted according to the terms *
14 * listed in LICENSE (http://roofit.sourceforge.net/license.txt) *
15 *****************************************************************************/
16
17/**
18 * \class RooBinSamplingPdf
19 * The RooBinSamplingPdf is supposed to be used as an adapter between a continuous PDF
20 * and a binned distribution.
21 * When RooFit is used to fit binned data, and the PDF is continuous, it takes the probability density
22 * at the bin centre as a proxy for the probability averaged (integrated) over the entire bin. This is
23 * correct only if the second derivative of the function vanishes, though. This is shown in the plots
24 * below.
25 *
26 * For PDFs that have larger curvatures, the RooBinSamplingPdf can be used. It integrates the PDF in each
27 * bin using an adaptive integrator. This usually requires 21 times more function evaluations, but significantly
28 * reduces biases due to better sampling of the PDF. The integrator can be accessed from the outside
29 * using integrator(). This can be used to change the integration rules, so less/more function evaluations are
30 * performed. The target precision of the integrator can be set in the constructor.
31 *
32 *
33 * ### How to use it
34 * There are two ways to use this class:
35 * - Manually wrap a PDF:
36 * ```
37 * RooBinSamplingPdf binSampler("<name>", "title", <binned observable of PDF>, <original PDF> [, <precision for integrator>]);
38 * binSampler.fitTo(data);
39 * ```
40 * When a PDF is wrapped with a RooBinSamplingPDF, just use the bin sampling PDF instead of the original one for fits
41 * or plotting etc.
42 * \note The binning will be taken from the observable. Make sure that this binning is the same as the one of the dataset that should be fit.
43 * Use RooRealVar::setBinning() to adapt it.
44 * - Instruct test statistics to carry out this wrapping automatically:
45 * ```
46 * pdf.fitTo(data, IntegrateBins(<precision>));
47 * ```
48 * This method is especially useful when used with a simultaneous PDF, since each component will automatically be wrapped,
49 * depending on the value of `precision`:
50 * - `precision < 0.`: None of the PDFs are touched, bin sampling is off.
51 * - `precision = 0.`: Continuous PDFs that are fit to a RooDataHist are wrapped into a RooBinSamplingPdf. The target precision
52 * forwarded to the integrator is 1.E-4 (the default argument of the constructor).
53 * - `precision > 0.`: All continuous PDFs are automatically wrapped into a RooBinSamplingPdf, regardless of what data they are
54 * fit to (see next paragraph). The same `'precision'` is used for all integrators.
55 *
56 * ### Simulating a binned fit using RooDataSet
57 * Some frameworks use unbinned data (RooDataSet) to simulate binned datasets. By adding one entry for each bin centre with the
58 * appropriate weight, one can achieve the same result as fitting with RooDataHist. In this case, however, RooFit cannot
59 * auto-detect that a binned fit is running, and that an integration over the bin is desired (note that there are no bins to
60 * integrate over in this kind of dataset).
61 *
62 * In this case, `IntegrateBins(>0.)` needs to be used, and the desired binning needs to be assigned to the observable
63 * of the dataset:
64 * ```
65 * RooRealVar x("x", "x", 0., 5.);
66 * x.setBins(10);
67 *
68 * // <create dataset and model>
69 *
70 * model.fitTo(data, IntegrateBins(>0.));
71 * ```
72 *
73 * \see RooAbsPdf::fitTo()
74 * \see IntegrateBins()
75 *
76 * \note This feature is currently limited to one-dimensional PDFs.
77 *
78 *
79 * \htmlonly <style>div.image img[src="RooBinSamplingPdf_OFF.png"]{width:12cm;}</style> \endhtmlonly
80 * \htmlonly <style>div.image img[src="RooBinSamplingPdf_ON.png" ]{width:12cm;}</style> \endhtmlonly
81 * <table>
82 * <tr><th> Binned fit without %RooBinSamplingPdf <th> Binned fit with %RooBinSamplingPdf </td></tr>
83 * <tr><td> \image html RooBinSamplingPdf_OFF.png ""
84 * </td>
85 * <td> \image html RooBinSamplingPdf_ON.png ""
86 * </td></tr>
87 * </table>
88 *
89 */
90
91
92#include "RooBinSamplingPdf.h"
93
94#include "RooHelpers.h"
95#include "RooRealBinding.h"
96#include "RunContext.h"
97#include "RooRealVar.h"
98#include "RooGlobalFunc.h"
99#include "RooDataHist.h"
100
101#include "Math/Integrator.h"
102
103#include <algorithm>
104
105////////////////////////////////////////////////////////////////////////////////
106/// Construct a new RooBinSamplingPdf.
107/// \param[in] name A name to identify this object.
108/// \param[in] title Title (for e.g. plotting)
109/// \param[in] observable Observable to integrate over (the one that is binned).
110/// \param[in] inputPdf A PDF whose bins should be sampled with higher precision.
111/// \param[in] epsilon Relative precision for the integrator, which is used to sample the bins.
112/// Note that ROOT's default is to use an adaptive integrator, which in its first iteration usually reaches
113/// relative precision of 1.E-4 or better. Therefore, asking for lower precision rarely has an effect.
114RooBinSamplingPdf::RooBinSamplingPdf(const char *name, const char *title, RooAbsRealLValue& observable,
115 RooAbsPdf& inputPdf, double epsilon) :
116 RooAbsPdf(name, title),
117 _pdf("inputPdf", "Function to be converted into a PDF", this, inputPdf),
118 _observable("observable", "Observable to integrate over", this, observable, true, true),
119 _relEpsilon(epsilon) {
120 if (!_pdf->dependsOn(*_observable)) {
121 throw std::invalid_argument(std::string("RooBinSamplingPDF(") + GetName()
122 + "): The PDF " + _pdf->GetName() + " needs to depend on the observable "
123 + _observable->GetName());
124 }
125}
126
127
128 ////////////////////////////////////////////////////////////////////////////////
129 /// Copy a RooBinSamplingPdf.
130 /// \param[in] other PDF to copy.
131 /// \param[in] name Optionally rename the copy.
133 RooAbsPdf(other, name),
134 _pdf("inputPdf", this, other._pdf),
135 _observable("observable", this, other._observable),
136 _relEpsilon(other._relEpsilon) { }
137
138
139////////////////////////////////////////////////////////////////////////////////
140/// Integrate the PDF over the current bin of the observable.
142 const unsigned int bin = _observable->getBin();
143 const double low = _observable->getBinning().binLow(bin);
144 const double high = _observable->getBinning().binHigh(bin);
145
146 const double oldX = _observable->getVal();
147 double result;
148 {
149 // Important: When the integrator samples x, caching of sub-tree values needs to be off.
151 result = integrate(_normSet, low, high) / (high-low);
152 }
153
154 _observable->setVal(oldX);
155
156 return result;
157}
158
159
160////////////////////////////////////////////////////////////////////////////////
161/// Integrate the PDF over all its bins, and return a batch with those values.
162/// \param[in,out] evalData Struct with evaluation data.
163/// \param[in] normSet Normalisation set that's used to evaluate the PDF.
165 // Retrieve binning, which we need to compute the probabilities
166 auto boundaries = binBoundaries();
167 auto xValues = _observable->getValues(evalData, normSet);
168 auto results = evalData.makeBatch(this, xValues.size());
169
170 // Important: When the integrator samples x, caching of sub-tree values needs to be off.
172
173 // Now integrate PDF in each bin:
174 for (unsigned int i=0; i < xValues.size(); ++i) {
175 const double x = xValues[i];
176 const auto upperIt = std::upper_bound(boundaries.begin(), boundaries.end(), x);
177 const unsigned int bin = std::distance(boundaries.begin(), upperIt) - 1;
178 assert(bin < boundaries.size());
179
180 results[i] = integrate(normSet, boundaries[bin], boundaries[bin+1]) / (boundaries[bin+1]-boundaries[bin]);
181 }
182
183 return results;
184}
185
186
187////////////////////////////////////////////////////////////////////////////////
188/// Get the bin boundaries for the observable.
189/// These will be recomputed whenever the shape of this object is dirty.
191 if (isShapeDirty() || _binBoundaries.empty()) {
192 _binBoundaries.clear();
193 const RooAbsBinning& binning = _observable->getBinning(nullptr);
194 const double* boundaries = binning.array();
195
196 for (int i=0; i < binning.numBoundaries(); ++i) {
197 _binBoundaries.push_back(boundaries[i]);
198 }
199
200 assert(std::is_sorted(_binBoundaries.begin(), _binBoundaries.end()));
201
203 }
204
205 return {_binBoundaries};
206}
207
208
209////////////////////////////////////////////////////////////////////////////////
210/// Return a list of all bin boundaries, so the PDF is plotted correctly.
211/// \param[in] obs Observable to generate the boundaries for.
212/// \param[in] xlo Beginning of range to create list of boundaries for.
213/// \param[in] xhi End of range to create to create list of boundaries for.
214/// \return Pointer to a list to be deleted by caller.
215std::list<double>* RooBinSamplingPdf::binBoundaries(RooAbsRealLValue& obs, Double_t xlo, Double_t xhi) const {
216 if (obs.namePtr() != _observable->namePtr()) {
217 coutE(Plotting) << "RooBinSamplingPdf::binBoundaries(" << GetName() << "): observable '" << obs.GetName()
218 << "' is not the observable of this PDF ('" << _observable->GetName() << "')." << std::endl;
219 return nullptr;
220 }
221
222 auto list = new std::list<double>;
223 for (double val : binBoundaries()) {
224 if (xlo <= val && val < xhi)
225 list->push_back(val);
226 }
227
228 return list;
229}
230
231
232////////////////////////////////////////////////////////////////////////////////
233/// Return a list of all bin edges, so the PDF is plotted as a step function.
234/// \param[in] obs Observable to generate the sampling hint for.
235/// \param[in] xlo Beginning of range to create sampling hint for.
236/// \param[in] xhi End of range to create sampling hint for.
237/// \return Pointer to a list to be deleted by caller.
239 if (obs.namePtr() != _observable->namePtr()) {
240 coutE(Plotting) << "RooBinSamplingPdf::plotSamplingHint(" << GetName() << "): observable '" << obs.GetName()
241 << "' is not the observable of this PDF ('" << _observable->GetName() << "')." << std::endl;
242 return nullptr;
243 }
244
245 auto binEdges = new std::list<double>;
246 const auto& binning = obs.getBinning();
247
248 for (unsigned int bin=0, numBins = static_cast<unsigned int>(binning.numBins()); bin < numBins; ++bin) {
249 const double low = std::max(binning.binLow(bin), xlo);
250 const double high = std::min(binning.binHigh(bin), xhi);
251 const double width = high - low;
252
253 // Check if this bin is in plotting range at all
254 if (low >= high)
255 continue;
256
257 // Move support points slightly inside the bin, so step function is plotted correctly.
258 binEdges->push_back(low + 0.001 * width);
259 binEdges->push_back(high - 0.001 * width);
260 }
261
262 return binEdges;
263}
264
265
266////////////////////////////////////////////////////////////////////////////////
267/// Direct access to the unique_ptr holding the integrator that's used to sample the bins.
268/// This can be used to change options such as sampling accuracy or to entirely exchange the integrator.
269///
270/// #### Example: Use the 61-point Gauss-Kronrod integration rule
271/// ```{.cpp}
272/// ROOT::Math::IntegratorOneDimOptions intOptions = pdf.integrator()->Options();
273/// intOptions.SetNPoints(6); // 61-point integration rule
274/// intOptions.SetRelTolerance(1.E-9); // Smaller tolerance -> more subdivisions
275/// pdf.integrator()->SetOptions(intOptions);
276/// ```
277/// \see ROOT::Math::IntegratorOneDim::SetOptions for more details on integration options.
278/// \note When RooBinSamplingPdf is loaded from files, integrator options will fall back to the default values.
279std::unique_ptr<ROOT::Math::IntegratorOneDim>& RooBinSamplingPdf::integrator() const {
280 if (!_integrator) {
282 ROOT::Math::IntegrationOneDim::kADAPTIVE, // GSL Integrator. Will really get it only if MathMore enabled.
283 -1., _relEpsilon, // Abs epsilon = default, rel epsilon set by us.
284 0, // We don't limit the sub-intervals. Steer run time via _relEpsilon.
285 2 // This should read ROOT::Math::Integration::kGAUSS21, but this is in MathMore, so we cannot include it here.
286 ));
287 }
288
289 return _integrator;
290}
291
292
293////////////////////////////////////////////////////////////////////////////////
294/// Binding used by the integrator to evaluate the PDF.
295double RooBinSamplingPdf::operator()(double x) const {
297 return _pdf->getVal();
298}
299
300
301////////////////////////////////////////////////////////////////////////////////
302/// Integrate the wrapped PDF using our current integrator, with given norm set and limits.
303double RooBinSamplingPdf::integrate(const RooArgSet* /*normSet*/, double low, double high) const {
304 return integrator()->Integral(low, high);
305}
306
307
308/// Creates a wrapping RooBinSamplingPdf if appropriate.
309/// \param[in] pdf The input pdf.
310/// \param[in] data The dataset to be used in the fit, used to figure out the
311/// observables and whether the dataset is binned.
312/// \param[in] precision Precision argument for all created RooBinSamplingPdfs.
313std::unique_ptr<RooAbsPdf> RooBinSamplingPdf::create(RooAbsPdf& pdf, RooAbsData const &data, double precision) {
314 if (precision < 0.)
315 return nullptr;
316
317 std::unique_ptr<RooArgSet> funcObservables( pdf.getObservables(data) );
318 const bool oneDimAndBinned = (1 == std::count_if(funcObservables->begin(), funcObservables->end(), [](const RooAbsArg* arg) {
319 auto var = dynamic_cast<const RooRealVar*>(arg);
320 return var && var->numBins() > 1;
321 }));
322
323 if (!oneDimAndBinned) {
324 if (precision > 0.) {
326 << "Integration over bins was requested, but this is currently only implemented for 1-D fits." << std::endl;
327 }
328 return nullptr;
329 }
330
331 // Find the real-valued observable. We don't care about categories.
332 auto theObs = std::find_if(funcObservables->begin(), funcObservables->end(), [](const RooAbsArg* arg){
333 return dynamic_cast<const RooAbsRealLValue*>(arg);
334 });
335 assert(theObs != funcObservables->end());
336
337 std::unique_ptr<RooAbsPdf> newPdf;
338
339 if (precision > 0.) {
340 // User forced integration. Let just apply it.
341 newPdf = std::make_unique<RooBinSamplingPdf>(
342 (std::string(pdf.GetName()) + "_binSampling").c_str(), pdf.GetTitle(),
343 *static_cast<RooAbsRealLValue *>(*theObs), pdf, precision);
344 } else if (dynamic_cast<RooDataHist const *>(&data) != nullptr &&
345 precision == 0. && !pdf.isBinnedDistribution(*data.get())) {
346 // User didn't forbid integration, and it seems appropriate with a
347 // RooDataHist.
348 newPdf = std::make_unique<RooBinSamplingPdf>(
349 (std::string(pdf.GetName()) + "_binSampling").c_str(), pdf.GetTitle(),
350 *static_cast<RooAbsRealLValue *>(*theObs), pdf);
351 }
352
353 return newPdf;
354}
#define oocoutE(o, a)
Definition: RooMsgService.h:48
#define coutE(a)
Definition: RooMsgService.h:33
Option_t Option_t TPoint TPoint const char GetTextMagnitude GetFillStyle GetLineColor GetLineWidth GetMarkerStyle GetTextAlign GetTextColor GetTextSize void data
Option_t Option_t TPoint TPoint const char GetTextMagnitude GetFillStyle GetLineColor GetLineWidth GetMarkerStyle GetTextAlign GetTextColor GetTextSize void char Point_t Rectangle_t WindowAttributes_t Float_t Float_t Float_t Int_t Int_t UInt_t UInt_t Rectangle_t result
Option_t Option_t width
char name[80]
Definition: TGX11.cxx:110
User Class for performing numerical integration of a function in one dimension.
Definition: Integrator.h:98
RooAbsArg is the common abstract base class for objects that represent a value and a "shape" in RooFi...
Definition: RooAbsArg.h:78
RooArgSet * getObservables(const RooArgSet &set, Bool_t valueOnly=kTRUE) const
Given a set of possible observables, return the observables that this PDF depends on.
Definition: RooAbsArg.h:318
const TNamed * namePtr() const
De-duplicated pointer to this object's name.
Definition: RooAbsArg.h:586
Bool_t dependsOn(const RooAbsCollection &serverList, const RooAbsArg *ignoreArg=0, Bool_t valueOnly=kFALSE) const
Test whether we depend on (ie, are served by) any object in the specified collection.
Definition: RooAbsArg.cxx:835
Bool_t isShapeDirty() const
Definition: RooAbsArg.h:440
Bool_t inhibitDirty() const
Delete watch flag.
Definition: RooAbsArg.cxx:114
void clearShapeDirty() const
Definition: RooAbsArg.h:628
RooAbsBinning is the abstract base class for RooRealVar binning definitions.
Definition: RooAbsBinning.h:26
virtual Double_t binLow(Int_t bin) const =0
virtual Double_t * array() const =0
virtual Double_t binHigh(Int_t bin) const =0
virtual Int_t numBoundaries() const =0
RooAbsData is the common abstract base class for binned and unbinned datasets.
Definition: RooAbsData.h:61
RooArgSet const * _normSet
Normalization integral (owned by _normMgr)
Definition: RooAbsPdf.h:354
RooAbsRealLValue is the common abstract base class for objects that represent a real value that may a...
virtual const RooAbsBinning & getBinning(const char *name=0, Bool_t verbose=kTRUE, Bool_t createOnTheFly=kFALSE) const =0
Retrive binning configuration with given name or default binning.
virtual void setVal(Double_t value)=0
Set the current value of the object. Needs to be overridden by implementations.
Int_t getBin(const char *rangeName=0) const override
virtual RooSpan< const double > getValues(RooBatchCompute::RunContext &evalData, const RooArgSet *normSet=nullptr) const
Definition: RooAbsReal.cxx:311
Double_t getVal(const RooArgSet *normalisationSet=nullptr) const
Evaluate object.
Definition: RooAbsReal.h:93
virtual Bool_t isBinnedDistribution(const RooArgSet &) const
Tests if the distribution is binned. Unless overridden by derived classes, this always returns false.
Definition: RooAbsReal.h:344
RooArgSet is a container object that can hold multiple RooAbsArg objects.
Definition: RooArgSet.h:57
The RooBinSamplingPdf is supposed to be used as an adapter between a continuous PDF and a binned dist...
double integrate(const RooArgSet *normSet, double low, double high) const
Integrate the wrapped PDF using our current integrator, with given norm set and limits.
RooTemplateProxy< RooAbsPdf > _pdf
std::vector< double > _binBoundaries
! Workspace to store data for bin sampling
std::unique_ptr< ROOT::Math::IntegratorOneDim > & integrator() const
Direct access to the unique_ptr holding the integrator that's used to sample the bins.
RooTemplateProxy< RooAbsRealLValue > _observable
RooSpan< const double > binBoundaries() const
Get the bin boundaries for the observable.
std::unique_ptr< ROOT::Math::IntegratorOneDim > _integrator
! Integrator used to sample bins.
double _relEpsilon
Default integrator precision.
static std::unique_ptr< RooAbsPdf > create(RooAbsPdf &pdf, RooAbsData const &data, double precision)
Creates a wrapping RooBinSamplingPdf if appropriate.
double operator()(double x) const
Binding used by the integrator to evaluate the PDF.
double evaluate() const override
Integrate the PDF over the current bin of the observable.
const RooAbsPdf & pdf() const
std::list< double > * plotSamplingHint(RooAbsRealLValue &obs, double xlo, double xhi) const override
Return a list of all bin edges, so the PDF is plotted as a step function.
RooSpan< double > evaluateSpan(RooBatchCompute::RunContext &evalData, const RooArgSet *normSet) const override
Integrate the PDF over all its bins, and return a batch with those values.
The RooDataHist is a container class to hold N-dimensional binned data.
Definition: RooDataHist.h:45
A simple container to hold a batch of data values.
Definition: RooSpan.h:34
const char * GetName() const override
Returns name of object.
Definition: TNamed.h:47
const char * GetTitle() const override
Returns title of object.
Definition: TNamed.h:48
@ kADAPTIVE
to be used for general functions without singularities
Double_t x[n]
Definition: legend1.C:17
This struct enables passing computation data around between elements of a computation graph.
Definition: RunContext.h:31
RooSpan< double > makeBatch(const RooAbsArg *owner, std::size_t size)
Create a writable batch.
Definition: RunContext.cxx:89
Disable all caches for sub-branches in an expression tree.
Definition: RooHelpers.h:102
double epsilon
Definition: triangle.c:618