Logo ROOT  
Reference Guide
buildLikelihood.cxx
Go to the documentation of this file.
1/*
2 * Project: RooFit
3 * Authors:
4 * PB, Patrick Bos, Netherlands eScience Center, p.bos@esciencecenter.nl
5 *
6 * Copyright (c) 2021, CERN
7 *
8 * Redistribution and use in source and binary forms,
9 * with or without modification, are permitted according to the terms
10 * listed in LICENSE (http://roofit.sourceforge.net/license.txt)
11 */
12
14
15#include <RooSimultaneous.h>
20#include <RooAbsPdf.h>
21#include <RooAbsData.h>
22#include <RooRealSumPdf.h>
23#include <RooProdPdf.h>
24#include <TClass.h>
25
26#include <memory>
27
28namespace RooFit {
29/**
30 * \brief Namespace for new RooFit test statistic calculation.
31 *
32 * RooFit::TestStatistics contains a major refactoring of the RooAbsTestStatistic-RooAbsOptTestStatistic-RooNLLVar
33 * inheritance tree into:
34 * 1. statistics-based classes on the one hand;
35 * 2. calculation/evaluation/optimization based classes on the other hand.
36 *
37 * The likelihood is the central unit on the statistics side. The RooAbsL class is implemented for four kinds of
38 * likelihoods: binned, unbinned, "subsidiary" (an optimization for numerical stability that gathers components like
39 * global observables) and "sum" (over multiple components of the other types). These classes provide ways to compute
40 * their components in parallelizable chunks that can be used by the calculator classes as they see fit.
41 *
42 * On top of the likelihood classes, we also provide for convenience a likelihood builder buildLikelihood, as a free
43 * function in the namespace. This function analyzes the pdf and automatically constructs the proper likelihood, built
44 * up from the available RooAbsL subclasses.
45 *
46 * The calculator "Wrapper" classes are abstract interfaces. These can be implemented for different kinds of algorithms,
47 * or with different kinds of optimization "back-ends" in mind. In an upcoming PR, we will introduce the fork-based
48 * multi-processing implementation based on RooFit::MultiProcess. Other possible implementations could use the GPU or
49 * external tools like TensorFlow.
50 *
51 * The coupling of all these classes to RooMinimizer is made via the MinuitFcnGrad class, which owns the Wrappers that
52 * calculate the likelihood components.
53 */
54namespace TestStatistics {
55
56namespace { // private implementation details
57
58RooArgSet getConstraintsSet(RooAbsPdf *pdf, RooAbsData *data, ConstrainedParameters constrained_parameters,
59 ExternalConstraints external_constraints, GlobalObservables global_observables,
60 std::string global_observables_tag)
61{
62 // BEGIN CONSTRAINT COLLECTION; copied from RooAbsPdf::createNLL
63
64 bool doStripDisconnected = false;
65 // If no explicit list of parameters to be constrained is specified apply default algorithm
66 // All terms of RooProdPdfs that do not contain observables and share parameters with one or more
67 // terms that do contain observables are added as constraints.
68#ifndef NDEBUG
69 bool did_default_constraint_algo = false;
70 std::size_t N_default_constraints = 0;
71#endif
72 if (constrained_parameters.set.empty()) {
73 std::unique_ptr<RooArgSet> default_constraints{pdf->getParameters(*data, false)};
74 constrained_parameters.set.add(*default_constraints);
75 doStripDisconnected = true;
76#ifndef NDEBUG
77 did_default_constraint_algo = true;
78 N_default_constraints = default_constraints->getSize();
79#endif
80 }
81#ifndef NDEBUG
82 if (did_default_constraint_algo) {
83 assert(N_default_constraints == static_cast<std::size_t>(constrained_parameters.set.getSize()));
84 }
85#endif
86
87 // Collect internal and external constraint specifications
88 RooArgSet allConstraints;
89
90 if (!global_observables_tag.empty()) {
91 if (!global_observables.set.empty()) {
92 global_observables.set.removeAll();
93 }
94 std::unique_ptr<RooArgSet> allVars{pdf->getVariables()};
95 global_observables.set.add(
96 *dynamic_cast<RooArgSet *>(allVars->selectByAttrib(global_observables_tag.c_str(), true)));
97 oocoutI(nullptr, Minimization)
98 << "User-defined specification of global observables definition with tag named '" << global_observables_tag
99 << "'" << std::endl;
100 } else if (global_observables.set.empty()) {
101 // neither global_observables nor global_observables_tag was given - try if a default tag is defined in the head
102 // node
103 const char *defGlobObsTag = pdf->getStringAttribute("DefaultGlobalObservablesTag");
104 if (defGlobObsTag) {
105 oocoutI(nullptr, Minimization)
106 << "p.d.f. provides built-in specification of global observables definition with tag named '"
107 << defGlobObsTag << "'" << std::endl;
108 std::unique_ptr<RooArgSet> allVars{pdf->getVariables()};
109 global_observables.set.add(*dynamic_cast<RooArgSet *>(allVars->selectByAttrib(defGlobObsTag, true)));
110 }
111 }
112
113 // EGP: removed workspace (RooAbsPdf::_myws) based stuff for now; TODO: reconnect this class to workspaces
114
115 if (!constrained_parameters.set.empty()) {
116 std::unique_ptr<RooArgSet> constraints{
117 pdf->getAllConstraints(*data->get(), constrained_parameters.set, doStripDisconnected)};
118 allConstraints.add(*constraints);
119 }
120 if (!external_constraints.set.empty()) {
121 allConstraints.add(external_constraints.set);
122 }
123
124 return allConstraints;
125}
126
127/*
128 * \brief Extract a collection of subsidiary likelihoods from a pdf
129 *
130 * \param[in] pdf Raw pointer to the pdf
131 * \param[in] data Raw pointer to the dataset
132 * \param[in] constrained_parameters Set of parameters that are constrained. Pdf components dependent on these alone are
133 * added to the subsidiary likelihood.
134 * \param[in] external_constraints Set of external constraint pdfs, i.e. constraints
135 * not necessarily in the pdf itself. These are always added to the subsidiary likelihood.
136 * \param[in] global_observables
137 * Observables that have a constant value, independent of the dataset events. Pdf components dependent on these alone
138 * are added to the subsidiary likelihood. \note Overrides all other likelihood parameters (like those in \p
139 * constrained_parameters) if present.
140 * \param[in] global_observables_tag String that can be set as attribute in pdf
141 * components to indicate that it is a global observable. Can be used instead of or in addition to \p
142 * global_observables.
143 * \return A unique pointer to a RooSubsidiaryL that contains all terms in the pdf that can be
144 * calculated separately from the other components in the full likelihood.
145 */
146std::unique_ptr<RooSubsidiaryL>
147buildSubsidiaryL(RooAbsPdf *pdf, RooAbsData *data, ConstrainedParameters constrained_parameters,
148 ExternalConstraints external_constraints, GlobalObservables global_observables,
149 std::string global_observables_tag)
150{
151 auto allConstraints = getConstraintsSet(pdf, data, constrained_parameters, external_constraints, global_observables,
152 global_observables_tag);
153
154 std::unique_ptr<RooSubsidiaryL> subsidiary_likelihood;
155 // Include constraints, if any, in likelihood
156 if (!allConstraints.empty()) {
157
158 oocoutI(nullptr, Minimization)
159 << " Including the following contraint terms in minimization: " << allConstraints << std::endl;
160 if (!global_observables.set.empty()) {
161 oocoutI(nullptr, Minimization)
162 << "The following global observables have been defined: " << global_observables.set << std::endl;
163 }
164 std::string name("likelihood for pdf ");
165 name += pdf->GetName();
166 subsidiary_likelihood = std::make_unique<RooSubsidiaryL>(
167 name, allConstraints,
168 (!global_observables.set.empty()) ? global_observables.set : constrained_parameters.set);
169 }
170
171 return subsidiary_likelihood;
172}
173
174bool isSimultaneous(RooAbsPdf *pdf)
175{
176 auto sim_pdf = dynamic_cast<RooSimultaneous *>(pdf);
177 return sim_pdf != nullptr;
178}
179
180/// Get the binned part of a pdf
181///
182/// \param pdf Raw pointer to the pdf
183/// \return A pdf is binned if it has attribute "BinnedLikelihood" and it is a RooRealSumPdf (or derived class).
184/// If \p pdf itself is binned, it will be returned. If the pdf is a RooProdPdf (or derived), the product terms
185/// will be searched for a binned component and the first such term that is found will be returned. Note that
186/// the simultaneous pdf setup is such that it is assumed that only one component is binned, so this should
187/// always return the correct binned component. If no binned component is found, nullptr is returned.
188RooAbsPdf *getBinnedPdf(RooAbsPdf *pdf)
189{
190 RooAbsPdf *binnedPdf = nullptr;
191 if (pdf->getAttribute("BinnedLikelihood") && pdf->IsA()->InheritsFrom(RooRealSumPdf::Class())) {
192 // Simplest case: top-level of pdf is a RRSP
193 binnedPdf = pdf;
194 } else if (pdf->IsA()->InheritsFrom(RooProdPdf::Class())) {
195 // Default case: top-level pdf is a product of RRSP and other pdfs
196 for (const auto component : ((RooProdPdf *)pdf)->pdfList()) {
197 if (component->getAttribute("BinnedLikelihood") && component->IsA()->InheritsFrom(RooRealSumPdf::Class())) {
198 binnedPdf = (RooAbsPdf *)component;
199 break;
200 }
201 }
202 }
203 return binnedPdf;
204}
205
206/*
207 * \brief Build a set of likelihood components to build a likelihood from a simultaneous pdf
208 *
209 * \param[in] pdf Raw pointer to the pdf
210 * \param[in] data Raw pointer to the dataset
211 * \param[in] extended Set extended term calculation on, off or use Extended::Auto to determine automatically based on
212 * the pdf whether to activate or not.
213 * \return A vector to RooAbsL unique_ptrs that contain all component binned and/or
214 * unbinned likelihoods. Note: subsidiary components are not included; use getConstraintsSet and/or
215 * buildSubsidiaryLikelihood to add those.
216 */
217std::vector<std::unique_ptr<RooAbsL>>
218getSimultaneousComponents(RooAbsPdf *pdf, RooAbsData *data, RooAbsL::Extended extended)
219{
220 auto sim_pdf = dynamic_cast<RooSimultaneous *>(pdf);
221
222 // the rest of this function is an adaptation of RooAbsTestStatistic::initSimMode:
223
224 RooAbsCategoryLValue &simCat = (RooAbsCategoryLValue &)sim_pdf->indexCat();
225
226 // note: this is valid for simultaneous likelihoods, not for other test statistic types (e.g. chi2) for which this
227 // should return true.
228 bool process_empty_data_sets = RooAbsL::isExtendedHelper(pdf, extended);
229
230 TString simCatName(simCat.GetName());
231 // Note: important not to use cloned dataset here (possible when this code is run in Roo[...]L ctor), use the
232 // original one (which is data_ in Roo[...]L ctors, but data here)
233 std::unique_ptr<TList> dsetList{data->split(*sim_pdf, process_empty_data_sets)};
234 if (!dsetList) {
235 throw std::logic_error(
236 "getSimultaneousComponents ERROR, index category of simultaneous pdf is missing in dataset, aborting");
237 }
238
239 // Count number of used states
240 std::size_t N_components = 0;
241
242 for (const auto &catState : simCat) {
243 // Retrieve the PDF for this simCat state
244 RooAbsPdf *component_pdf = sim_pdf->getPdf(catState.first.c_str());
245 auto dset = (RooAbsData *)dsetList->FindObject(catState.first.c_str());
246
247 if (component_pdf && dset && (0. != dset->sumEntries() || process_empty_data_sets)) {
248 ++N_components;
249 }
250 }
251
252 // Allocate arrays
253 std::vector<std::unique_ptr<RooAbsL>> components;
254 components.reserve(N_components);
255 // _gofSplitMode.resize(N_components); // not used, Hybrid mode only, see below
256
257 // Create array of regular fit contexts, containing subset of data and single fitCat PDF
258 std::size_t n = 0;
259 for (const auto &catState : simCat) {
260 const std::string &catName = catState.first;
261 // Retrieve the PDF for this simCat state
262 RooAbsPdf *component_pdf = sim_pdf->getPdf(catName.c_str());
263 auto dset = (RooAbsData *)dsetList->FindObject(catName.c_str());
264
265 if (component_pdf && dset && (0. != dset->sumEntries() || process_empty_data_sets)) {
266 ooccoutI((TObject *)nullptr, Fitting)
267 << "getSimultaneousComponents: creating slave calculator #" << n << " for state " << catName << " ("
268 << dset->numEntries() << " dataset entries)" << std::endl;
269
270 RooAbsPdf *binnedPdf = getBinnedPdf(component_pdf);
271 bool binnedL = (binnedPdf != nullptr);
272 if (binnedPdf == nullptr && component_pdf->IsA()->InheritsFrom(RooProdPdf::Class())) {
273 // Default case: top-level pdf is a product of RRSP and other pdfs
274 for (const auto component : ((RooProdPdf *)component_pdf)->pdfList()) {
275 if (component->getAttribute("MAIN_MEASUREMENT")) {
276 // not really a binned pdf, but this prevents a (potentially) long list of subsidiary measurements to
277 // be passed to the slave calculator
278 binnedPdf = (RooAbsPdf *)component;
279 break;
280 }
281 }
282 }
283 // Below here directly pass binnedPdf instead of PROD(binnedPdf,constraints) as constraints are evaluated
284 // elsewhere anyway and omitting them reduces model complexity and associated handling/cloning times
285 if (binnedL) {
286 components.push_back(std::make_unique<RooBinnedL>((binnedPdf ? binnedPdf : component_pdf), dset));
287 } else {
288 components.push_back(std::make_unique<RooUnbinnedL>((binnedPdf ? binnedPdf : component_pdf), dset));
289 }
290 // }
291 components.back()->setSimCount(N_components);
292
293 // Servers may have been redirected between instantiation and (deferred) initialization
294
295 std::unique_ptr<RooArgSet> actualParams{binnedPdf ? binnedPdf->getParameters(dset)
296 : component_pdf->getParameters(dset)};
297 std::unique_ptr<RooArgSet> selTargetParams{
298 (RooArgSet *)pdf->getParameters(*data)->selectCommon(*actualParams)};
299
300 assert(selTargetParams->equals(*components.back()->getParameters()));
301
302 ++n;
303 } else {
304 if ((!dset || (0. != dset->sumEntries() && !process_empty_data_sets)) && component_pdf) {
305 ooccoutD((TObject *)nullptr, Fitting) << "getSimultaneousComponents: state " << catName
306 << " has no data entries, no slave calculator created" << std::endl;
307 }
308 }
309 }
310 oocoutI(nullptr, Fitting) << "getSimultaneousComponents: created " << n << " slave calculators."
311 << std::endl;
312
313 return components;
314}
315
316} // anonymous namespace with private implementation details
317
318
319/*
320 * \brief Build a likelihood from a pdf + dataset, optionally with a subsidiary likelihood component
321 *
322 * This function analyzes the pdf and automatically constructs the proper likelihood, built up from the available
323 * RooAbsL subclasses. In essence, this can give 8 conceptually different combinations, based on three questions:
324 * 1. Is it a simultaneous pdf?
325 * 2. Is the pdf binned?
326 * 3. Does the pdf have subsidiary terms?
327 * If questions 1 and 3 are answered negatively, this function will either return a RooBinnedL or RooUnbinnedL. In all
328 * other cases it returns a RooSumL, which will contain RooBinnedL and/or RooUnbinnedL component(s) and possibly a
329 * RooSubsidiaryL component with constraint terms.
330 *
331 * \param[in] pdf Raw pointer to the pdf
332 * \param[in] data Raw pointer to the dataset
333 * \param[in] extended Set extended term calculation on, off or use Extended::Auto to determine automatically based on
334 * the pdf whether to activate or not.
335 * \param[in] constrained_parameters Set of parameters that are constrained. Pdf components dependent on these alone are
336 * added to the subsidiary likelihood.
337 * \param[in] external_constraints Set of external constraint pdfs, i.e. constraints
338 * not necessarily in the pdf itself. These are always added to the subsidiary likelihood.
339 * \param[in] global_observables
340 * Observables that have a constant value, independent of the dataset events. Pdf components dependent on these alone
341 * are added to the subsidiary likelihood. \note Overrides all other likelihood parameters (like those in \p
342 * constrained_parameters) if present.
343 * \param[in] global_observables_tag String that can be set as attribute in pdf
344 * components to indicate that it is a global observable. Can be used instead of or in addition to \p
345 * global_observables.
346 * \return A unique pointer to a RooSubsidiaryL that contains all terms in
347 * the pdf that can be calculated separately from the other components in the full likelihood.
348 */
349std::unique_ptr<RooAbsL> buildLikelihood(RooAbsPdf *pdf, RooAbsData *data, RooAbsL::Extended extended,
350 ConstrainedParameters constrained_parameters,
351 ExternalConstraints external_constraints, GlobalObservables global_observables,
352 std::string global_observables_tag)
353{
354 std::unique_ptr<RooAbsL> likelihood;
355 std::vector<std::unique_ptr<RooAbsL>> components;
356
357 if (isSimultaneous(pdf)) {
358 components = getSimultaneousComponents(pdf, data, extended);
359 } else if (auto binnedPdf = getBinnedPdf(pdf)) {
360 likelihood = std::make_unique<RooBinnedL>(binnedPdf, data);
361 } else { // unbinned
362 likelihood = std::make_unique<RooUnbinnedL>(pdf, data, extended);
363 }
364
365 auto subsidiary = buildSubsidiaryL(pdf, data, constrained_parameters, external_constraints, global_observables, global_observables_tag);
366 if (subsidiary) {
367 if (likelihood) {
368 components.push_back(std::move(likelihood));
369 }
370 components.push_back(std::move(subsidiary));
371 }
372 if (components.size() > 0) {
373 likelihood = std::make_unique<RooSumL>(pdf, data, std::move(components), extended);
374 }
375 return likelihood;
376}
377
378// delegating convenience overloads
379std::unique_ptr<RooAbsL>
381{
382 return buildLikelihood(pdf, data, RooAbsL::Extended::Auto, constrained_parameters);
383}
384std::unique_ptr<RooAbsL>
386{
387 return buildLikelihood(pdf, data, RooAbsL::Extended::Auto, {}, external_constraints);
388}
389std::unique_ptr<RooAbsL>
391{
392 return buildLikelihood(pdf, data, RooAbsL::Extended::Auto, {}, {}, global_observables);
393}
394std::unique_ptr<RooAbsL>
395buildLikelihood(RooAbsPdf *pdf, RooAbsData *data, std::string global_observables_tag)
396{
397 return buildLikelihood(pdf, data, RooAbsL::Extended::Auto, {}, {}, {}, global_observables_tag);
398}
399std::unique_ptr<RooAbsL>
400buildLikelihood(RooAbsPdf *pdf, RooAbsData *data, ConstrainedParameters constrained_parameters, GlobalObservables global_observables)
401{
402 return buildLikelihood(pdf, data, RooAbsL::Extended::Auto, constrained_parameters, {}, global_observables);
403}
404
405
406} // namespace TestStatistics
407} // namespace RooFit
#define oocoutI(o, a)
Definition: RooMsgService.h:49
#define ooccoutI(o, a)
Definition: RooMsgService.h:57
#define ooccoutD(o, a)
Definition: RooMsgService.h:56
Option_t Option_t TPoint TPoint const char GetTextMagnitude GetFillStyle GetLineColor GetLineWidth GetMarkerStyle GetTextAlign GetTextColor GetTextSize void data
char name[80]
Definition: TGX11.cxx:110
const Text_t * getStringAttribute(const Text_t *key) const
Get string attribute mapped under key 'key'.
Definition: RooAbsArg.cxx:299
RooArgSet * getVariables(bool stripDisconnected=true) const
Return RooArgSet with all variables (tree leaf nodes of expresssion tree)
Definition: RooAbsArg.cxx:2035
bool getAttribute(const Text_t *name) const
Check if a named attribute is set. By default, all attributes are unset.
Definition: RooAbsArg.cxx:269
RooArgSet * getParameters(const RooAbsData *data, bool stripDisconnected=true) const
Create a list of leaf nodes in the arg tree starting with ourself as top node that don't match any of...
Definition: RooAbsArg.cxx:541
RooAbsCategoryLValue is the common abstract base class for objects that represent a discrete value th...
virtual void removeAll()
Remove all arguments from our set, deleting them if we own them.
bool empty() const
virtual bool add(const RooAbsArg &var, bool silent=false)
Add the specified argument to list.
bool selectCommon(const RooAbsCollection &refColl, RooAbsCollection &outColl) const
Create a subset of the current collection, consisting only of those elements that are contained as we...
RooAbsData is the common abstract base class for binned and unbinned datasets.
Definition: RooAbsData.h:62
TClass * IsA() const override
Definition: RooAbsPdf.h:385
virtual RooArgSet * getAllConstraints(const RooArgSet &observables, RooArgSet &constrainedParams, bool stripDisconnected=true) const
This helper function finds and collects all constraints terms of all component p.d....
Definition: RooAbsPdf.cxx:3374
RooArgSet is a container object that can hold multiple RooAbsArg objects.
Definition: RooArgSet.h:56
static bool isExtendedHelper(RooAbsPdf *pdf, Extended extended)
Definition: RooAbsL.cxx:34
RooProdPdf is an efficient implementation of a product of PDFs of the form.
Definition: RooProdPdf.h:33
static TClass * Class()
static TClass * Class()
RooSimultaneous facilitates simultaneous fitting of multiple PDFs to subsets of a given dataset.
Bool_t InheritsFrom(const char *cl) const override
Return kTRUE if this class inherits from a class with name "classname".
Definition: TClass.cxx:4863
const char * GetName() const override
Returns name of object.
Definition: TNamed.h:47
Mother of all ROOT objects.
Definition: TObject.h:37
virtual TObject * FindObject(const char *name) const
Must be redefined in derived classes.
Definition: TObject.cxx:327
Basic string class.
Definition: TString.h:136
RooCmdArg GlobalObservables(Args_t &&... argsOrArgSet)
RooCmdArg ExternalConstraints(const RooArgSet &constraintPdfs)
const Int_t n
Definition: legend1.C:16
std::unique_ptr< RooAbsL > buildLikelihood(RooAbsPdf *pdf, RooAbsData *data, RooAbsL::Extended extended=RooAbsL::Extended::Auto, ConstrainedParameters constrained_parameters={}, ExternalConstraints external_constraints={}, GlobalObservables global_observables={}, std::string global_observables_tag={})
The namespace RooFit contains mostly switches that change the behaviour of functions of PDFs (or othe...
Definition: Common.h:18
@ Minimization
Definition: RooGlobalFunc.h:63
Optional parameter used in buildLikelihood(), see documentation there.
Optional parameter used in buildLikelihood(), see documentation there.
Optional parameter used in buildLikelihood(), see documentation there.