You are here

RooFit in 20 Minutes

Purpose

The RooFit library provides a toolkit for modeling the expected distribution of events in a physics analysis. Models can be used to perform unbinned maximum likelihood fits, produce plots, and generate "toy Monte Carlo" samples for various studies. RooFit was originally developed for the BaBar collaboration, a particle physics experiment at the Stanford Linear Accelerator Center. The software is primarily designed as a particle physics data analysis tool, but its general nature and open architecture make it useful for other types of data analysis also.

Mathematical model

The core functionality of RooFit is to enable the modeling of ‘event data’ distributions, where each event is a discrete occurrence in time, and has one or more measured observables associated with it. Experiments of this nature result in datasets obeying Poisson (or binomial) statistics.

The natural modeling language for such distributions are probability density functions F(x;p) that describe the probability density the distribution of observables x in terms of function in parameter p. The defining properties of probability density functions, unit normalization with respect to all observables and positive definiteness, also provide important benefits for the design of a structured modeling language: p.d.f.s are easily added with intuitive interpretation of fraction coefficients, they allow construction of higher dimensional p.d.f.s out of lower dimensional building block with an intuitive language to introduce and describe correlations between observables, they allow the universal implementation of toy Monte Carlo sampling techniques, and are of course an prerequisite for the use of (unbinned) maximum likelihood parameter estimation technique.

Design

RooFit introduces a granular structure in its mapping of mathematical data models components to C++ objects: rather than aiming at a monolithic entity that describes a data model, each math symbol is presented by a separate object. A feature of this design philosophy is that all RooFit models always consist of multiple objects.

For example a Gaussian probability density function consists typically of four objects, three objects representing the observable, the mean and the sigma parameters, and one object representing a Gaussian probability density function.

// --- Observable ---
RooRealVar mes("mes","m_{ES} (GeV)",5.20,5.30) ;

// --- Parameters ---
RooRealVar sigmean("sigmean","B^{#pm} mass",5.28,5.20,5.30) ;
RooRealVar sigwidth("sigwidth","B^{#pm} width",0.0027,0.001,1.) ;

// --- Build Gaussian PDF ---
RooGaussian signal("signal","signal PDF",mes,sigmean,sigwidth) ;

Similarly, model building operations such as addition, multiplication, integration are represented by separate operator objects and make the modeling language scale well to models of arbitrary complexity.

A complete example

Starting with the Gaussian signal p.d.f. defined above, the example below constructs a one-dimensional probability density function with a Gaussian signal component and a 'ARGUS' phase space background component.

// --- Build Argus background PDF ---
RooRealVar argpar("argpar","argus shape parameter",-20.0,-100.,-1.) ;
RooArgusBG background("background","Argus PDF",mes,RooConst(5.291),argpar) ;

// --- Construct signal+background PDF ---
RooRealVar nsig("nsig","#signal events",200,0.,10000) ;
RooRealVar nbkg("nbkg","#background events",800,0.,10000) ;
RooAddPdf model("model","g+a",RooArgList(signal,background),RooArgList(nsig,nbkg)) ;

We can now use this p.d.f. to generate an unbinned toy dataset, fit the p.d.f to that dataset using an unbinned maximum likelihood fit and visalizes the data with the p.d.f overlaid.

// --- Generate a toyMC sample from composite PDF ---
RooDataSet *data = model.generate(mes,2000) ;

// --- Perform extended ML fit of composite PDF to toy data ---
model.fitTo(*data) ;

// --- Plot toy data and composite PDF overlaid ---
RooPlot* mesframe = mes.frame() ;
data->plotOn(mesframe) ;
model.plotOn(mesframe) ;
model.plotOn(mesframe,Components(argus),LineStyle(kDashed)) ;

In the example above, all indivual components of the RooFit p.d.f (the variables, component p.d.f.s and combined p.d.f.) are all created individually in the macro. It is also possible to organize them in a container class 'the workspace' that has an associated factory tool to create trees of RooFit objects of arbitrary complexity from a simple construction language. A much shorter macro that uses a workspace and is equivalent to the above macro is shown below:

using namespace RooFit ;

RooWorkspace w("w",kTRUE) ;
w.factory("Gaussian::gauss(mes[5.20,5.30],mean[5.28,5.2,5.3],width[0.0027,0.001,1])");
w.factory("ArgusBG::argus(mes,5.291,argpar[-20,-100,-1])") ;
w.factory("SUM::sum(nsig[200,0,10000]*gauss,nbkg[800,0,10000]*argus)") ;

// --- Generate a toyMC sample from composite PDF ---
RooDataSet *data = w::sum.generate(w::mes,2000) ;

// --- Perform extended ML fit of composite PDF to toy data ---
w::sum.fitTo(*data) ;

// --- Plot toy data and composite PDF overlaid ---
RooPlot* mesframe = w::mes.frame() ;
data->plotOn(mesframe) ;
w::sum.plotOn(mesframe) ;
w::sum.plotOn(mesframe,Components(w::argus),LineStyle(kDashed)) ;

mesframe->Draw()  ;

Example of convolution of two p.d.f.s

The first example illustrated the use of the addition operator p.d.f RooAddPdf to compose p.d.f. It is also possible to construct convolutions of p.d.f.s using the FFT convolution operator p.d.f.

RooWorkspace w("w",kTRUE) ;

w.factory("Landau::landau(t[-10,30],ml[5,-20,20],sl[1,0.1,10])") ;
w.factory("Gaussian::gauss(t,mg[0],sg[2,0.1,10])") ;

// Construct convoluted pdf lxg(x) = landau(x) (*) gauss(x)
RooFFTConvPdf lxg("lxg","landau (X) gauss",w::t,w::landau,w::gauss) ;

// Alternative construction method using workspace
// w.factory("FCONV::lxg(x,landau,gauss)") ;

The above p.d.f. lxg can be used for fitting, plotting and event generation in exactly the same way as the p.d.f. model of the first example. The picture below shows the fitted lxg overlaid on data, as well as the unconvoluted Landau p.d.f. (dashed).

Example of a multi-dimensional p.d.f.

One can also construct multi-dimensional p.d.f.s with and without correlations using the product operator RooProdPdf. The example below shows how to construct a 2-dimensional p.d.f. with correlations of the form F(x|y)*G(y) where the conditional p.d.f. F(x|y) describes the distribution in observable x given a value of y, and p.d.f G(y) describes the distribution in observable y

RooWorkspace w("w",kTRUE) ;

// Construct F(x|y) -- a Gaussian in x with a mean that depends on y
w.factory("PolyVar::meanx(y[-5,5],{a0[-0.5,-5,5],a1[-0.5,-1,1]})") ;
w.factory("Gaussian::gaussx(x[-5,5],meanx,sigmax[0.5])") ;

// Construct G(y) -- an Exponential in y
w.factory("Exponential::gaussy(y,-0.2)") ;

// Construct M(x,y) = F(x|y)*G(y)
w.factory("PROD::model(F|y,G)") ;

Fitting, plotting and event generation with multi-dimensional p.d.f.s is very similar to that of one-dimensional p.d.f.s:

RooDataSet *data = w::model.generate(RooArgSet(w::x,w::y),10000) ;

w::model.fitTo(*data) ;

RooPlot* xframe = w::x.frame() ;
data->plotOn(frame) ;
w::model.plotOn(frame) ;

The picture below shows a 2-D visualization of the M(x,y) and the projection on the x observable (i.e. Int M(x,y) dy)) overlaid on the x distribution of the data

Example of working with likelihood functions and profile likelihood

The previous examples illustrated various techniques to construct probability density functions in RooFit. This examples illustrates various operations that can be applied at the likelihood level.

Given a p.d.f. and a dataset, the likelihood function can be constructed as

RooAbsReal* nll = pdf.createNLL(data) ;

The likelihood function behaves like a regular RooFit function and can be plotted the same way probability density functions can.

RooPlot* frame = myparam.frame() ;
nll->plotOn(frame) ;

Since likelihood evaluations are potentially time-consuming RooFit facilitates calculation of likelihood in parallel on multiple processes. This parallelization process is transparent to the user. To request parallel calculation on 8 processors (on the same host), construct the likelihood function as follows

RooAbsReal* nll = pdf.createNLL(data,NumCPU(8)) ;

One can also construct along similar lines the profile likelihood, which is the likelihood minimized w.r.t. the nuisance parameters, i.e for a likelihood L(p,q) where p is a parameter of interest and q is a nuisance parameters, the value of the profile likelihood PL(p) is the value of L(p,q) at the value of q where L(p,q) is lowest. A profile likelihood is construct as follows

RooAbsReal* pll = nll->createProfile(paramOfInterest) ;

The complete example below construct a toy p.d.f. and dataset and compare a likelihood scan and a profile likelihood scan in one of the parameters.

// Construct model
RooWorkspace w("w",kTRUE) ;
w.factory("Gaussian::g1(x[-20,20],mean[-10,10],sigma_g1[3])") ;
w.factory("Gaussian::g2(x,mean,sigma_g2[4,3,6])") ;
w.factory("SUM::model(frac[0.5,0,1]*g1,g2)") ;

// Generate 1000 events 
DataSet* data = w::model.generate(w::x,1000) ;

// Create likelihood function
RooAbsReal* nll = w::model.createNLL(*data,NumCPU(2)) ;

// Minimize likelihood
RooMinuit(*nll).migrad() ;

// Plot likelihood scan in parameter frac
RooPlot* frame1 = w::frac.frame(Bins(10),Range(0.01,0.95)) ;
nll->plotOn(frame1,ShiftToZero()) ;

// Plot the profile likelihood in frac
RooAbsReal* pll_frac = nll->createProfile(w::frac) ;
pll_frac->plotOn(frame1,LineColor(kRed)) ;

The resulting plot containing the likelihood and the profile likelihood in the frac parameter, as well as a similar plot for the sigma_g2 parameter is shown below.

Documentation

Please refer to this page for more documentation

Tutorial macros

A set of 83 tutorial macros (also referenced in the Users Manual) is available in the ROOT distribution in $ROOTSYS/tutorials/roofit as well as here.

Class Documentation

The detailed documentation of all class methods and data members is available for the core classes and the pdf classes.

Other documentation

Here is a link to a 200 slide presentation on RooFit presented in the French School of Statistics 2008 (slides are in English)

Forum for help and questions

Please post your questions on RooFit in the ROOT Math & Statistics tools forum