Logo ROOT  
Reference Guide
 
Loading...
Searching...
No Matches
buildLikelihood.cxx
Go to the documentation of this file.
1/*
2 * Project: RooFit
3 * Authors:
4 * PB, Patrick Bos, Netherlands eScience Center, p.bos@esciencecenter.nl
5 *
6 * Copyright (c) 2021, CERN
7 *
8 * Redistribution and use in source and binary forms,
9 * with or without modification, are permitted according to the terms
10 * listed in LICENSE (http://roofit.sourceforge.net/license.txt)
11 */
12
14
15#include <RooSimultaneous.h>
20#include <RooAbsPdf.h>
21#include <RooAbsData.h>
22#include <RooRealSumPdf.h>
23#include <RooProdPdf.h>
24#include <TClass.h>
25
26#include <memory>
27
28namespace RooFit {
29/**
30 * \brief Namespace for new RooFit test statistic calculation.
31 *
32 * RooFit::TestStatistics contains a major refactoring of the RooAbsTestStatistic-RooAbsOptTestStatistic-RooNLLVar
33 * inheritance tree into:
34 * 1. statistics-based classes on the one hand;
35 * 2. calculation/evaluation/optimization based classes on the other hand.
36 *
37 * The likelihood is the central unit on the statistics side. The RooAbsL class is implemented for four kinds of
38 * likelihoods: binned, unbinned, "subsidiary" (an optimization for numerical stability that gathers components like
39 * global observables) and "sum" (over multiple components of the other types). These classes provide ways to compute
40 * their components in parallelizable chunks that can be used by the calculator classes as they see fit.
41 *
42 * On top of the likelihood classes, we also provide for convenience a likelihood builder `NLLFactory`. This factory
43 * analyzes the pdf and automatically constructs the proper likelihood, built up from the available RooAbsL subclasses.
44 * Options, like specifying constraint terms or global observables, can be passed using method chaining. The
45 * `NLLFactory::build` method finally returns the constructed likelihood as a RooRealL object that can be fit to using
46 * RooMinimizer.
47 *
48 * The calculator "Wrapper" classes are abstract interfaces. These can be implemented for different kinds of algorithms,
49 * or with different kinds of optimization "back-ends" in mind. Two fork-based multi-processing implementations based
50 * on RooFit::MultiProcess are available, one to calculate the gradient of the likelihood in parallel and one for the
51 * likelihood itself. The likelihood can also be calculated serially.
52 *
53 * The coupling of all these classes to RooMinimizer is made via the MinuitFcnGrad class, which owns the Wrappers that
54 * calculate the likelihood components.
55 *
56 * More extensive documentation is available at
57 * https://github.com/root-project/root/blob/master/roofit/doc/developers/test_statistics.md
58 */
59namespace TestStatistics {
60
61namespace { // private implementation details
62
63RooArgSet getConstraintsSet(RooAbsPdf *pdf, RooAbsData *data, RooArgSet constrained_parameters,
64 RooArgSet const &external_constraints, RooArgSet global_observables,
65 std::string const &global_observables_tag)
66{
67 // BEGIN CONSTRAINT COLLECTION; copied from RooAbsPdf::createNLL
68
69 bool doStripDisconnected = false;
70 // If no explicit list of parameters to be constrained is specified apply default algorithm
71 // All terms of RooProdPdfs that do not contain observables and share parameters with one or more
72 // terms that do contain observables are added as constraints.
73#ifndef NDEBUG
74 bool did_default_constraint_algo = false;
75 std::size_t N_default_constraints = 0;
76#endif
77 if (constrained_parameters.empty()) {
78 std::unique_ptr<RooArgSet> default_constraints{pdf->getParameters(*data, false)};
79 constrained_parameters.add(*default_constraints);
80 doStripDisconnected = true;
81#ifndef NDEBUG
82 did_default_constraint_algo = true;
83 N_default_constraints = default_constraints->size();
84#endif
85 }
86#ifndef NDEBUG
87 if (did_default_constraint_algo) {
88 assert(N_default_constraints == static_cast<std::size_t>(constrained_parameters.size()));
89 }
90#endif
91
92 // Collect internal and external constraint specifications
93 RooArgSet allConstraints;
94
95 if (!global_observables_tag.empty()) {
96 if (!global_observables.empty()) {
97 global_observables.removeAll();
98 }
99 std::unique_ptr<RooArgSet> allVars{pdf->getVariables()};
100 global_observables.add(
101 *std::unique_ptr<RooArgSet>{allVars->selectByAttrib(global_observables_tag.c_str(), true)});
102 oocoutI(nullptr, Minimization) << "User-defined specification of global observables definition with tag named '"
103 << global_observables_tag << "'" << std::endl;
104 } else if (global_observables.empty()) {
105 // neither global_observables nor global_observables_tag was given - try if a default tag is defined in the head
106 // node
107 const char *defGlobObsTag = pdf->getStringAttribute("DefaultGlobalObservablesTag");
108 if (defGlobObsTag) {
109 oocoutI(nullptr, Minimization)
110 << "p.d.f. provides built-in specification of global observables definition with tag named '"
111 << defGlobObsTag << "'" << std::endl;
112 std::unique_ptr<RooArgSet> allVars{pdf->getVariables()};
113 global_observables.add(*std::unique_ptr<RooArgSet>{allVars->selectByAttrib(defGlobObsTag, true)});
114 }
115 }
116
117 // EGP: removed workspace (RooAbsPdf::_myws) based stuff for now; TODO: reconnect this class to workspaces
118
119 if (!constrained_parameters.empty()) {
120 std::unique_ptr<RooArgSet> constraints{
121 pdf->getAllConstraints(*data->get(), constrained_parameters, doStripDisconnected)};
122 allConstraints.add(*constraints);
123 }
124 if (!external_constraints.empty()) {
125 allConstraints.add(external_constraints);
126 }
127
128 return allConstraints;
129}
130
131/*
132 * \brief Extract a collection of subsidiary likelihoods from a pdf
133 *
134 * \param[in] pdf Raw pointer to the pdf
135 * \param[in] data Raw pointer to the dataset
136 * \param[in] constrained_parameters Set of parameters that are constrained. Pdf components dependent on these alone are
137 * added to the subsidiary likelihood.
138 * \param[in] external_constraints Set of external constraint pdfs, i.e. constraints
139 * not necessarily in the pdf itself. These are always added to the subsidiary likelihood.
140 * \param[in] global_observables
141 * Observables that have a constant value, independent of the dataset events. Pdf components dependent on these alone
142 * are added to the subsidiary likelihood. \note Overrides all other likelihood parameters (like those in \p
143 * constrained_parameters) if present.
144 * \param[in] global_observables_tag String that can be set as attribute in pdf
145 * components to indicate that it is a global observable. Can be used instead of or in addition to \p
146 * global_observables.
147 * \return A unique pointer to a RooSubsidiaryL that contains all terms in the pdf that can be
148 * calculated separately from the other components in the full likelihood.
149 */
150std::unique_ptr<RooSubsidiaryL> buildSubsidiaryL(RooAbsPdf *pdf, RooAbsData *data, RooArgSet constrained_parameters,
151 RooArgSet const &external_constraints, RooArgSet global_observables,
152 std::string const &global_observables_tag)
153{
154 auto allConstraints = getConstraintsSet(pdf, data, constrained_parameters, external_constraints, global_observables,
155 global_observables_tag);
156
157 std::unique_ptr<RooSubsidiaryL> subsidiary_likelihood;
158 // Include constraints, if any, in likelihood
159 if (!allConstraints.empty()) {
160
161 oocoutI(nullptr, Minimization) << " Including the following constraint terms in minimization: " << allConstraints
162 << std::endl;
163 if (!global_observables.empty()) {
164 oocoutI(nullptr, Minimization) << "The following global observables have been defined: " << global_observables
165 << std::endl;
166 }
167 std::string name("likelihood for pdf ");
168 name += pdf->GetName();
169 subsidiary_likelihood = std::make_unique<RooSubsidiaryL>(
170 name, allConstraints, (!global_observables.empty()) ? global_observables : constrained_parameters);
171 }
172
173 return subsidiary_likelihood;
174}
175
176/// Get the binned part of a pdf
177///
178/// \param pdf Raw pointer to the pdf
179/// \return A pdf is binned if it has attribute "BinnedLikelihood" and it is a RooRealSumPdf (or derived class).
180/// If \p pdf itself is binned, it will be returned. If the pdf is a RooProdPdf (or derived), the product terms
181/// will be searched for a binned component and the first such term that is found will be returned. Note that
182/// the simultaneous pdf setup is such that it is assumed that only one component is binned, so this should
183/// always return the correct binned component. If no binned component is found, nullptr is returned.
184RooAbsPdf *getBinnedPdf(RooAbsPdf *pdf)
185{
186 RooAbsPdf *binnedPdf = nullptr;
187 if (pdf->getAttribute("BinnedLikelihood") && pdf->IsA()->InheritsFrom(RooRealSumPdf::Class())) {
188 // Simplest case: top-level of pdf is a RRSP
189 binnedPdf = pdf;
190 } else if (pdf->IsA()->InheritsFrom(RooProdPdf::Class())) {
191 // Default case: top-level pdf is a product of RRSP and other pdfs
192 for (const auto component : (static_cast<RooProdPdf *>(pdf))->pdfList()) {
193 if (component->getAttribute("BinnedLikelihood") && component->IsA()->InheritsFrom(RooRealSumPdf::Class())) {
194 binnedPdf = static_cast<RooAbsPdf *>(component);
195 break;
196 }
197 }
198 }
199 return binnedPdf;
200}
201
202} // namespace
203
204/*
205 * \brief Build a set of likelihood components to build a likelihood from a simultaneous pdf.
206 *
207 * \return A vector to RooAbsL unique_ptrs that contain all component binned and/or
208 * unbinned likelihoods. Note: subsidiary components are not included; use getConstraintsSet and/or
209 * buildSubsidiaryLikelihood to add those.
210 */
211std::vector<std::unique_ptr<RooAbsL>> NLLFactory::getSimultaneousComponents()
212{
213 auto sim_pdf = dynamic_cast<RooSimultaneous *>(&_pdf);
214
215 // the rest of this function is an adaptation of RooAbsTestStatistic::initSimMode:
216
217 auto &simCat = const_cast<RooAbsCategoryLValue &>(sim_pdf->indexCat());
218
219 // note: this is valid for simultaneous likelihoods, not for other test statistic types (e.g. chi2) for which this
220 // should return true.
221 bool process_empty_data_sets = RooAbsL::isExtendedHelper(&_pdf, _extended);
222
223 TString simCatName(simCat.GetName());
224 // Note: important not to use cloned dataset here (possible when this code is run in Roo[...]L ctor), use the
225 // original one (which is data_ in Roo[...]L ctors, but data here)
226 std::unique_ptr<TList> dsetList{_data.split(*sim_pdf, process_empty_data_sets)};
227 if (!dsetList) {
228 throw std::logic_error(
229 "getSimultaneousComponents ERROR, index category of simultaneous pdf is missing in dataset, aborting");
230 }
231
232 // Count number of used states
233 std::size_t N_components = 0;
234
235 for (const auto &catState : simCat) {
236 // Retrieve the PDF for this simCat state
237 RooAbsPdf *component_pdf = sim_pdf->getPdf(catState.first.c_str());
238 auto *dset = static_cast<RooAbsData *>(dsetList->FindObject(catState.first.c_str()));
239
240 if (component_pdf && dset && (0. != dset->sumEntries() || process_empty_data_sets)) {
241 ++N_components;
242 }
243 }
244
245 // Allocate arrays
246 std::vector<std::unique_ptr<RooAbsL>> components;
247 components.reserve(N_components);
248 // _gofSplitMode.resize(N_components); // not used, Hybrid mode only, see below
249
250 // Create array of regular fit contexts, containing subset of data and single fitCat PDF
251 std::size_t n = 0;
252 for (const auto &catState : simCat) {
253 const std::string &catName = catState.first;
254 // Retrieve the PDF for this simCat state
255 RooAbsPdf *component_pdf = sim_pdf->getPdf(catName.c_str());
256 auto *dset = static_cast<RooAbsData *>(dsetList->FindObject(catName.c_str()));
257
258 if (component_pdf && dset && (0. != dset->sumEntries() || process_empty_data_sets)) {
259 ooccoutI(nullptr, Fitting) << "getSimultaneousComponents: creating slave calculator #" << n << " for state "
260 << catName << " (" << dset->numEntries() << " dataset entries)" << std::endl;
261
262 RooAbsPdf *binnedPdf = getBinnedPdf(component_pdf);
263 bool binnedL = (binnedPdf != nullptr);
264 if (binnedPdf == nullptr && component_pdf->IsA()->InheritsFrom(RooProdPdf::Class())) {
265 // Default case: top-level pdf is a product of RRSP and other pdfs
266 for (const auto component : (static_cast<RooProdPdf *>(component_pdf))->pdfList()) {
267 if (component->getAttribute("MAIN_MEASUREMENT")) {
268 // not really a binned pdf, but this prevents a (potentially) long list of subsidiary measurements to
269 // be passed to the slave calculator
270 binnedPdf = static_cast<RooAbsPdf *>(component);
271 break;
272 }
273 }
274 }
275 // Below here directly pass binnedPdf instead of PROD(binnedPdf,constraints) as constraints are evaluated
276 // elsewhere anyway and omitting them reduces model complexity and associated handling/cloning times
277 if (binnedL) {
278 components.push_back(std::make_unique<RooBinnedL>((binnedPdf ? binnedPdf : component_pdf), dset));
279 } else {
280 components.push_back(
281 std::make_unique<RooUnbinnedL>((binnedPdf ? binnedPdf : component_pdf), dset, _extended, _evalBackend));
282 }
283 // }
284 components.back()->setSimCount(N_components);
285
286 // Servers may have been redirected between instantiation and (deferred) initialization
287
288 std::unique_ptr<RooArgSet> actualParams{binnedPdf ? binnedPdf->getParameters(dset)
289 : component_pdf->getParameters(dset)};
290 RooArgSet params;
291 _pdf.getParameters(_data.get(), params);
292 RooArgSet selTargetParams;
293 params.selectCommon(*actualParams, selTargetParams);
294
295 assert(selTargetParams.equals(*components.back()->getParameters()));
296
297 ++n;
298 } else {
299 if ((!dset || (0. != dset->sumEntries() && !process_empty_data_sets)) && component_pdf) {
300 ooccoutD(nullptr, Fitting) << "getSimultaneousComponents: state " << catName
301 << " has no data entries, no slave calculator created" << std::endl;
302 }
303 }
304 }
305 oocoutI(nullptr, Fitting) << "getSimultaneousComponents: created " << n << " slave calculators." << std::endl;
306
307 return components;
308}
309
310/// Create a likelihood builder for a given pdf and dataset.
311/// \param[in] pdf Raw pointer to the pdf
312/// \param[in] data Raw pointer to the dataset
314
315/*
316 * \brief Build a likelihood from a pdf + dataset, optionally with a subsidiary likelihood component.
317 *
318 * This function analyzes the pdf and automatically constructs the proper likelihood, built up from the available
319 * RooAbsL subclasses. In essence, this can give 8 conceptually different combinations, based on three questions:
320 * 1. Is it a simultaneous pdf?
321 * 2. Is the pdf binned?
322 * 3. Does the pdf have subsidiary terms?
323 * If questions 1 and 3 are answered negatively, this function will either return a RooBinnedL or RooUnbinnedL. In all
324 * other cases it returns a RooSumL, which will contain RooBinnedL and/or RooUnbinnedL component(s) and possibly a
325 * RooSubsidiaryL component with constraint terms.
326 *
327 * \return A unique pointer to a RooSubsidiaryL that contains all terms in
328 * the pdf that can be calculated separately from the other components in the full likelihood.
329 */
330std::unique_ptr<RooAbsL> NLLFactory::build()
331{
332 std::unique_ptr<RooAbsL> likelihood;
333 std::vector<std::unique_ptr<RooAbsL>> components;
334
335 if (dynamic_cast<RooSimultaneous const *>(&_pdf)) {
336 components = getSimultaneousComponents();
337 } else if (auto binnedPdf = getBinnedPdf(&_pdf)) {
338 likelihood = std::make_unique<RooBinnedL>(binnedPdf, &_data);
339 } else { // unbinned
340 likelihood = std::make_unique<RooUnbinnedL>(&_pdf, &_data, _extended, _evalBackend);
341 }
342
345 if (subsidiary) {
346 if (likelihood) {
347 components.push_back(std::move(likelihood));
348 }
349 components.push_back(std::move(subsidiary));
350 }
351 if (!components.empty()) {
352 likelihood = std::make_unique<RooSumL>(&_pdf, &_data, std::move(components), _extended);
353 }
354 return likelihood;
355}
356
357/// \param[in] extended Set extended term calculation on, off or use
358/// RooAbsL::Extended::Auto to determine automatically based on the
359/// pdf whether to activate or not.
361{
362 _extended = extended;
363 return *this;
364}
365
366/// \param[in] constrainedParameters Set of parameters that are constrained.
367/// Pdf components dependent on these alone are added to the
368/// subsidiary likelihood.
370{
371 _constrainedParameters.add(constrainedParameters);
372 return *this;
373}
374
375/// \param[in] externalConstraints Set of external constraint pdfs, i.e.
376/// constraints not necessarily in the pdf itself. These are always
377/// added to the subsidiary likelihood.
379{
380 _externalConstraints.add(externalConstraints);
381 return *this;
382}
383
384/// \param[in] globalObservables Observables that have a constant value,
385/// independent of the dataset events. Pdf components dependent on
386/// these alone are added to the subsidiary likelihood.
387/// \note Overrides all other likelihood parameters (like those in
388/// NLLFactory::ConstrainedParameters()) if present.
390{
391 _globalObservables.add(globalObservables);
392 return *this;
393}
394
395/// \param[in] globalObservablesTag String that can be set as attribute in
396/// pdf components to indicate that it is a global observable. Can
397/// be used instead of or in addition to
398/// NLLFactory::GlobalObservables().
399NLLFactory &NLLFactory::GlobalObservablesTag(const char *globalObservablesTag)
400{
401 _globalObservablesTag = globalObservablesTag;
402 return *this;
403}
404
406{
407 _evalBackend = evalBackend;
408 return *this;
409}
410
411} // namespace TestStatistics
412} // namespace RooFit
#define oocoutI(o, a)
#define ooccoutI(o, a)
#define ooccoutD(o, a)
Option_t Option_t TPoint TPoint const char GetTextMagnitude GetFillStyle GetLineColor GetLineWidth GetMarkerStyle GetTextAlign GetTextColor GetTextSize void data
char name[80]
Definition TGX11.cxx:110
RooFit::OwningPtr< RooArgSet > getParameters(const RooAbsData *data, bool stripDisconnected=true) const
Create a list of leaf nodes in the arg tree starting with ourself as top node that don't match any of...
const Text_t * getStringAttribute(const Text_t *key) const
Get string attribute mapped under key 'key'.
bool getAttribute(const Text_t *name) const
Check if a named attribute is set. By default, all attributes are unset.
RooFit::OwningPtr< RooArgSet > getVariables(bool stripDisconnected=true) const
Return RooArgSet with all variables (tree leaf nodes of expression tree)
Abstract base class for objects that represent a discrete value that can be set from the outside,...
bool equals(const RooAbsCollection &otherColl) const
Check if this and other collection have identically-named contents.
virtual void removeAll()
Remove all arguments from our set, deleting them if we own them.
virtual bool add(const RooAbsArg &var, bool silent=false)
Add the specified argument to list.
Storage_t::size_type size() const
Abstract base class for binned and unbinned datasets.
Definition RooAbsData.h:57
virtual const RooArgSet * get() const
Definition RooAbsData.h:101
virtual RooFit::OwningPtr< TList > split(const RooAbsCategory &splitCat, bool createEmptyDataSets=false) const
Split the dataset into subsets based on states of a categorical variable in this dataset.
Abstract interface for all probability density functions.
Definition RooAbsPdf.h:40
TClass * IsA() const override
Definition RooAbsPdf.h:351
RooArgSet * getAllConstraints(const RooArgSet &observables, RooArgSet &constrainedParams, bool stripDisconnected=true) const
This helper function finds and collects all constraints terms of all component p.d....
RooArgSet is a container object that can hold multiple RooAbsArg objects.
Definition RooArgSet.h:24
RooArgSet * selectCommon(const RooAbsCollection &refColl) const
Use RooAbsCollection::selecCommon(), but return as RooArgSet.
Definition RooArgSet.h:154
std::unique_ptr< RooAbsL > build()
std::vector< std::unique_ptr< RooAbsL > > getSimultaneousComponents()
NLLFactory & EvalBackend(RooFit::EvalBackend evalBackend)
NLLFactory(RooAbsPdf &pdf, RooAbsData &data)
Create a likelihood builder for a given pdf and dataset.
NLLFactory & ExternalConstraints(const RooArgSet &externalconstraints)
NLLFactory & Extended(RooAbsL::Extended extended)
NLLFactory & GlobalObservables(const RooArgSet &globalObservables)
NLLFactory & ConstrainedParameters(const RooArgSet &constrainedParameters)
NLLFactory & GlobalObservablesTag(const char *globalObservablesTag)
static bool isExtendedHelper(RooAbsPdf *pdf, Extended extended)
Definition RooAbsL.cxx:35
Efficient implementation of a product of PDFs of the form.
Definition RooProdPdf.h:39
static TClass * Class()
static TClass * Class()
Facilitates simultaneous fitting of multiple PDFs to subsets of a given dataset.
Bool_t InheritsFrom(const char *cl) const override
Return kTRUE if this class inherits from a class with name "classname".
Definition TClass.cxx:4943
const char * GetName() const override
Returns name of object.
Definition TNamed.h:47
Basic string class.
Definition TString.h:139
const Int_t n
Definition legend1.C:16
The namespace RooFit contains mostly switches that change the behaviour of functions of PDFs (or othe...
Definition CodegenImpl.h:64