class TSPlot: public TObject


Overview

A common method used in High Energy Physics to perform measurements is the maximum Likelihood method, exploiting discriminating variables to disentangle signal from background. The crucial point for such an analysis to be reliable is to use an exhaustive list of sources of events combined with an accurate description of all the Probability Density Functions (PDF).

To assess the validity of the fit, a convincing quality check is to explore the data sample further by examining the distributions of control variables. A control variable can be obtained, for instance, by removing one of the discriminating variables before performing the maximum Likelihood fit again: the removed variable is then a control variable. The expected distribution of this control variable, for signal, is to be compared to the one extracted, for signal, from the data sample. To do so, one must be able to unfold the signal contribution from the distribution of the whole data sample.

The TSPlot method allows one to reconstruct the distributions of the control variable, independently for each of the various sources of events, without making use of any a priori knowledge of this variable. The aim is thus to use the knowledge available for the discriminating variables to infer the behaviour of the individual sources of events with respect to the control variable.

TSPlot is optimal if the control variable is uncorrelated with the discriminating variables.

A detailed description of the formalism itself, called $\hbox{$_s$}{\cal P}lot$, is given in [1].

The method

The $\hbox{$_s$}{\cal P}lot$ technique is developed in the above context of a maximum Likelihood method making use of discriminating variables.

One considers a data sample in which several species of events are merged. These species represent various signal components and background components which, taken together, account for the data sample. The different terms of the log-Likelihood are:

  • $N$: the total number of events in the data sample,
  • ${\rm N}_{\rm s}$: the number of species of events populating the data sample,
  • $N_i$: the number of events expected on average for the $i^{\rm th}$ species,
  • ${\rm f}_i(y_e)$: the value of the PDFs of the discriminating variables $y$ for the $i^{th}$ species and for event $e$,
  • $x$: the set of control variables which, by definition, do not appear in the expression of the Likelihood function ${\cal L}$.
The extended log-Likelihood reads:
\begin{displaymath}
{\cal L}=\sum_{e=1}^{N}\ln \Big\{ \sum_{i=1}^{{\rm N}_{\rm s}}N_i{\rm f}_i(y_e) \Big\} -\sum_{i=1}^{{\rm N}_{\rm s}}N_i ~.
\end{displaymath} (1)
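As an illustration, Eq. (1) can be evaluated, and maximized with respect to the yields, along the lines of the sketch below. This is not the TSPlot implementation, which relies on Minuit: the fixed-point iteration used here is a swapped-in technique chosen only to keep the example short, and the names `PdfTable`, `extendedLogLikelihood` and `fitYields` are hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// pdf[e][i] holds f_i(y_e): the PDF value of species i for event e.
using PdfTable = std::vector<std::vector<double>>;

// Extended log-likelihood of Eq. (1) for the yields N_i.
double extendedLogLikelihood(const std::vector<double>& yields, const PdfTable& pdf)
{
    double logL = 0.0;
    for (const std::vector<double>& f : pdf) {
        assert(f.size() == yields.size());
        double density = 0.0;                       // sum_i N_i f_i(y_e)
        for (std::size_t i = 0; i < yields.size(); ++i)
            density += yields[i] * f[i];
        logL += std::log(density);
    }
    for (double n : yields)
        logL -= n;                                  // extended term: -sum_i N_i
    return logL;
}

// Assumption: a fixed-point (EM-style) iteration stands in for Minuit.
// Setting dL/dN_i = 0 and multiplying by N_i gives
//   N_i = sum_e N_i f_i(y_e) / sum_k N_k f_k(y_e),
// which is iterated until the yields stop changing.
std::vector<double> fitYields(std::vector<double> yields, const PdfTable& pdf,
                              int maxIterations = 10000, double tolerance = 1e-12)
{
    for (int it = 0; it < maxIterations; ++it) {
        std::vector<double> updated(yields.size(), 0.0);
        for (const std::vector<double>& f : pdf) {
            double density = 0.0;
            for (std::size_t i = 0; i < yields.size(); ++i)
                density += yields[i] * f[i];
            for (std::size_t i = 0; i < yields.size(); ++i)
                updated[i] += yields[i] * f[i] / density;
        }
        double change = 0.0;
        for (std::size_t i = 0; i < yields.size(); ++i)
            change += std::fabs(updated[i] - yields[i]);
        yields = updated;
        if (change < tolerance)
            break;
    }
    return yields;
}
```

The starting yields passed to `fitYields` play the same role as the initial estimates given to TSPlot through SetInitialNumbersOfSpecies.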

From this expression, after maximization of ${\cal L}$ with respect to the $N_i$ parameters, a weight can be computed for every event and each species, in order to obtain later the true distribution ${\hbox{\bf {M}}}_i(x)$ of variable $x$. If ${\rm n}$ is one of the ${\rm N}_{\rm s}$ species present in the data sample, the weight for this species is defined by:
\begin{displaymath}
\begin{Large}
\fbox{$
{_s{\cal P}}_{\rm n}(y_e)={\sum_{j=1}^{{\rm N}_{\rm s}}\hbox{\bf V}_{{\rm n}j}{\rm f}_j(y_e)\over\sum_{k=1}^{{\rm N}_{\rm s}}N_k{\rm f}_k(y_e) } $}\end{Large} ~,
\end{displaymath} (2)

where $\hbox{\bf V}_{{\rm n}j}$ is the covariance matrix resulting from the Likelihood maximization. This matrix can be used directly from the fit, but this is numerically less accurate than the direct computation:
\begin{displaymath}
\hbox{\bf V}^{-1}_{{\rm n}j}~=~
{\partial^2(-{\cal L})\over\partial N_{\rm n}\partial N_j}~=~
\sum_{e=1}^{N}{{\rm f}_{\rm n}(y_e){\rm f}_j(y_e)\over(\sum_{k=1}^{{\rm N}_{\rm s}}N_k{\rm f}_k(y_e))^2} ~.
\end{displaymath} (3)

The distribution of the control variable $x$ obtained by histogramming the weighted events reproduces, on average, the true distribution ${\hbox{\bf {M}}}_{\rm n}(x)$.
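A minimal sketch of the weight computation may help to fix ideas. It is restricted, as an assumption, to two species so that the covariance matrix can be inverted explicitly; it builds the inverse covariance matrix by direct summation over events and then forms the weights. All names (`Vec2`, `Mat2`, `inverseCovariance`, `invert`, `sWeights`) are hypothetical and do not belong to TSPlot.

```cpp
#include <array>
#include <cassert>
#include <cmath>
#include <vector>

using Vec2 = std::array<double, 2>;   // one value per species
using Mat2 = std::array<Vec2, 2>;

// Inverse covariance matrix of Eq. (3), summed over all events.
// pdf[e] = {f_1(y_e), f_2(y_e)}; yields = {N_1, N_2} at the likelihood maximum.
Mat2 inverseCovariance(const Vec2& yields, const std::vector<Vec2>& pdf)
{
    Mat2 vinv{};                       // zero-initialised
    for (const Vec2& f : pdf) {
        const double density = yields[0] * f[0] + yields[1] * f[1];
        for (int n = 0; n < 2; ++n)
            for (int j = 0; j < 2; ++j)
                vinv[n][j] += f[n] * f[j] / (density * density);
    }
    return vinv;
}

// Explicit inverse of a 2x2 matrix.
Mat2 invert(const Mat2& m)
{
    const double det = m[0][0] * m[1][1] - m[0][1] * m[1][0];
    assert(std::fabs(det) > 0.0);
    Mat2 inv{};
    inv[0][0] =  m[1][1] / det;
    inv[0][1] = -m[0][1] / det;
    inv[1][0] = -m[1][0] / det;
    inv[1][1] =  m[0][0] / det;
    return inv;
}

// sWeights of Eq. (2): weights[e][n] is the weight of event e for species n.
std::vector<Vec2> sWeights(const Vec2& yields, const std::vector<Vec2>& pdf)
{
    const Mat2 V = invert(inverseCovariance(yields, pdf));
    std::vector<Vec2> weights;
    weights.reserve(pdf.size());
    for (const Vec2& f : pdf) {
        const double density = yields[0] * f[0] + yields[1] * f[1];
        Vec2 w{};
        for (int n = 0; n < 2; ++n)
            w[n] = (V[n][0] * f[0] + V[n][1] * f[1]) / density;
        weights.push_back(w);
    }
    return weights;
}
```

Note that individual weights can be negative, since the off-diagonal elements of the covariance matrix are typically negative.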

The class TSPlot allows one to reconstruct the true distribution ${\hbox{\bf {M}}}_{\rm n}(x)$ of a control variable $x$ for each of the ${\rm N}_{\rm s}$ species from the sole knowledge of the PDFs of the discriminating variables ${\rm f}_i(y)$. The plots obtained with the TSPlot class are called $\hbox {$_s$}{\cal P}lots$.

Some properties and checks

Besides reproducing the true distribution, $\hbox {$_s$}{\cal P}lots$ bear remarkable properties:

  • Each $x$-distribution is properly normalized:
    \begin{displaymath}
\sum_{e=1}^{N} {_s{\cal P}}_{\rm n}(y_e)~=~N_{\rm n}~.
\end{displaymath} (4)

  • For any event:
    \begin{displaymath}
\sum_{l=1}^{{\rm N}_{\rm s}} {_s{\cal P}}_l(y_e) ~=~1 ~.
\end{displaymath} (5)

    That is to say, summing up the ${\rm N}_{\rm s}$ $\hbox {$_s$}{\cal P}lots$, one recovers the data sample distribution in $x$, and summing up the number of events entering a $\hbox{$_s$}{\cal P}lot$ for a given species, one recovers the yield of the species, as provided by the fit. Property (4) is implemented in the TSPlot class as a check.
  • The sum in quadrature of the statistical uncertainties per bin,
    \begin{displaymath}
\sigma[N_{\rm n}\ _s\tilde{\rm M}_{\rm n}(x) {\delta x}]~=~\sqrt{\sum_{e \subset {\delta x}} ({_s{\cal P}}_{\rm n})^2} ~,
\end{displaymath} (6)

    reproduces the statistical uncertainty on the yield $N_{\rm n}$, as provided by the fit: $\sigma[N_{\rm n}]\equiv\sqrt{\hbox{\bf V}_{{\rm n}{\rm n}}}$. Because of this, and since the determination of the yields is optimal when obtained using a Likelihood fit, one can conclude that the $\hbox{$_s$}{\cal P}lot$ technique is itself an optimal method to reconstruct distributions of control variables.
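The properties above can be checked numerically once the weights are available. The sketch below is one possible set of checks, with hypothetical names (`WeightTable`, `yieldsAreRecovered`, `eventsSumToOne`, `totalUncertainty`); it assumes that the weights were computed with yields taken at the Likelihood maximum, since the properties hold only there.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// weights[e][n]: sWeight of event e for species n.
using WeightTable = std::vector<std::vector<double>>;

// Property (4): the weights of species n sum to the fitted yield N_n.
bool yieldsAreRecovered(const WeightTable& weights,
                        const std::vector<double>& yields, double tolerance)
{
    for (std::size_t n = 0; n < yields.size(); ++n) {
        double sum = 0.0;
        for (const std::vector<double>& w : weights) {
            assert(w.size() == yields.size());
            sum += w[n];
        }
        if (std::fabs(sum - yields[n]) > tolerance)
            return false;
    }
    return true;
}

// Property (5): for every event, the weights summed over species give one.
bool eventsSumToOne(const WeightTable& weights, double tolerance)
{
    for (const std::vector<double>& w : weights) {
        double sum = 0.0;
        for (double value : w)
            sum += value;
        if (std::fabs(sum - 1.0) > tolerance)
            return false;
    }
    return true;
}

// Eq. (6) summed in quadrature over the whole sample: sqrt(sum_e w_n^2),
// to be compared with sqrt(V_nn) from the fit.
double totalUncertainty(const WeightTable& weights, std::size_t species)
{
    double sumSquares = 0.0;
    for (const std::vector<double>& w : weights)
        sumSquares += w[species] * w[species];
    return std::sqrt(sumSquares);
}
```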

Different steps followed by TSPlot

  1. A maximum Likelihood fit is performed to obtain the yields $N_i$ of the various species. The fit relies on discriminating variables $y$ uncorrelated with a control variable $x$: the latter is therefore totally absent from the fit.
  2. The weights ${_s{\cal P}}$ are calculated using Eq. (2) where the covariance matrix is taken from Minuit.
  3. Histograms of $x$ are filled by weighting the events with ${_s{\cal P}}$.
  4. Error bars per bin are given by Eq. (6).
The $\hbox {$_s$}{\cal P}lots$ reproduce the true distributions of the species in the control variable $x$, within the above defined statistical uncertainties.
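Steps 3 and 4 amount to filling a weighted histogram and keeping the sum of squared weights per bin. A minimal, ROOT-free sketch is given below; the struct `WeightedHistogram` and the function `fillWeighted` are hypothetical names, and the choice to skip out-of-range events is an assumption of this example, not a statement about TSPlot.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

struct WeightedHistogram {
    std::vector<double> content;   // sum of weights per bin
    std::vector<double> error;     // sqrt of the sum of squared weights per bin, Eq. (6)
};

// Fills nbins equal-width bins on [xmin, xmax) with the control variable x,
// each event being weighted by its sWeight for the species of interest.
WeightedHistogram fillWeighted(const std::vector<double>& x,
                               const std::vector<double>& weight,
                               std::size_t nbins, double xmin, double xmax)
{
    assert(x.size() == weight.size());
    assert(nbins > 0 && xmax > xmin);
    WeightedHistogram h{std::vector<double>(nbins, 0.0),
                        std::vector<double>(nbins, 0.0)};
    for (std::size_t e = 0; e < x.size(); ++e) {
        if (x[e] < xmin || x[e] >= xmax)
            continue;                                  // assumption: out-of-range events are skipped
        std::size_t bin = static_cast<std::size_t>((x[e] - xmin) / (xmax - xmin) * nbins);
        bin = std::min(bin, nbins - 1);                // guard against rounding at the upper edge
        h.content[bin] += weight[e];
        h.error[bin] += weight[e] * weight[e];         // accumulate squared weights first
    }
    for (double& err : h.error)
        err = std::sqrt(err);
    return h;
}
```

Because weights can be negative, a bin content can be close to zero while its error bar remains sizeable.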

Illustrations

To illustrate the technique, one considers an example derived from the analysis where $\hbox {$_s$}{\cal P}lots$ were first used (charmless B decays). One is dealing with a data sample in which two species are present: the first is termed signal and the second background. A maximum Likelihood fit is performed to obtain the two yields $N_1$ and $N_2$. The fit relies on two discriminating variables, collectively denoted $y$, which are chosen among three possible variables denoted ${m_{\rm ES}}$, $\Delta E$ and ${\cal F}$. The variable which is not incorporated in $y$ is used as the control variable $x$. The six distributions of the three variables are assumed to be the ones depicted in Fig. 1.

Figure 1: Distributions of the three discriminating variables available to perform the Likelihood fit: ${m_{\rm ES}}$, $\Delta E$, ${\cal F}$. Among the three variables, two are used to perform the fit while one is kept out of the fit to serve the purpose of a control variable. The three distributions on the top (resp. bottom) of the figure correspond to the signal (resp. background). The unit of the vertical axis is chosen such that it indicates the number of entries per bin, if one slices the histograms in 25 bins.

Building a data sample through a Monte Carlo simulation based on the distributions shown in Fig. 1, one obtains the three distributions of Fig. 2. Whereas the distribution of $\Delta E$ clearly indicates the presence of the signal, the distributions of ${m_{\rm ES}}$ and ${\cal F}$ are less obviously populated by signal.

Figure 2: Distributions of the three discriminating variables for signal plus background. The three distributions are obtained from a data sample generated through a Monte Carlo simulation based on the distributions shown in Fig. 1. The data sample consists of 500 signal events and 5000 background events.

Choosing $\Delta E$ and ${\cal F}$ as discriminating variables to determine $N_1$ and $N_2$ through a maximum Likelihood fit, one builds, for the control variable ${m_{\rm ES}}$ which is unknown to the fit, the two $\hbox {$_s$}{\cal P}lots$ for signal and background shown in Fig. 3. One observes that the $\hbox{$_s$}{\cal P}lot$ for signal correctly reproduces the PDF even where the latter vanishes, although the error bars remain sizeable. This results from the almost complete cancellation between positive and negative weights: the sum of weights is close to zero while the sum of weights squared is not. Negative weights arise through the covariance matrix, and its negative components, in the definition of Eq. (2).

A word of caution is in order with respect to the error bars. Although their sum in quadrature is identical to the statistical uncertainties of the yields determined by the fit, and although they are asymptotically correct, the error bars should be handled with care for low statistics and/or for too fine a binning. This is because the error bars do not incorporate two known properties of the PDFs: PDFs are positive definite, and they can be non-zero in a given x-bin even if, in the particular data sample at hand, no event is observed in this bin. The latter limitation is not specific to $\hbox {$_s$}{\cal P}lots$; it is always present when one wishes to infer the PDF at the origin of a histogram when, for some bins, the number of entries does not guarantee the applicability of the Gaussian regime. In such situations, a satisfactory practice is to attach allowed ranges to the histogram to indicate the upper and lower limits of the PDF value which are consistent with the actual observation, at a given confidence level.

Figure 3: The $\hbox {$_s$}{\cal P}lots$ (signal on the left, background on the right) obtained for ${m_{\rm ES}}$ are represented as dots with error bars. They are obtained from a fit using only information from $\Delta E$ and ${\cal F}$.

Choosing ${m_{\rm ES}}$ and $\Delta E$ as discriminating variables to determine $N_1$ and $N_2$ through a maximum Likelihood fit, one builds, for the control variable ${\cal F}$ which is unknown to the fit, the two $\hbox {$_s$}{\cal P}lots$ for signal and background shown in Fig. 4. In the $\hbox{$_s$}{\cal P}lot$ for signal one observes that the error bars are largest in the $x$ regions where the background is largest.

Figure 4: The $\hbox {$_s$}{\cal P}lots$ (signal on the left, background on the right) obtained for ${\cal F}$ are represented as dots with error bars. They are obtained from a fit using only information from ${m_{\rm ES}}$ and $\Delta E$.

The results above can be obtained by running the tutorial TestSPlot.C.

Function Members (Methods)

public:
TSPlot()
TSPlot(Int_t nx, Int_t ny, Int_t ne, Int_t ns, TTree* tree)
virtual ~TSPlot()
void TObject::AbstractMethod(const char* method) const
virtual void TObject::AppendPad(Option_t* option = "")
virtual void Browse(TBrowser* b)
virtual void TObject::Browse(TBrowser* b)
static TClass* Class()
virtual const char* TObject::ClassName() const
virtual void TObject::Clear(Option_t* = "")
virtual TObject* TObject::Clone(const char* newname = "") const
virtual Int_t TObject::Compare(const TObject* obj) const
virtual void TObject::Copy(TObject& object) const
virtual void TObject::Delete(Option_t* option = "")
virtual Int_t TObject::DistancetoPrimitive(Int_t px, Int_t py)
virtual void TObject::Draw(Option_t* option = "")
virtual void TObject::DrawClass() const
virtual TObject* TObject::DrawClone(Option_t* option = "") const
virtual void TObject::Dump() const
virtual void TObject::Error(const char* method, const char* msgfmt) const
virtual void TObject::Execute(const char* method, const char* params, Int_t* error = 0)
virtual void TObject::Execute(TMethod* method, TObjArray* params, Int_t* error = 0)
virtual void TObject::ExecuteEvent(Int_t event, Int_t px, Int_t py)
virtual void TObject::Fatal(const char* method, const char* msgfmt) const
void FillSWeightsHists(Int_t nbins = 50)
void FillXvarHists(Int_t nbins = 100)
void FillYpdfHists(Int_t nbins = 100)
void FillYvarHists(Int_t nbins = 100)
virtual TObject* TObject::FindObject(const char* name) const
virtual TObject* TObject::FindObject(const TObject* obj) const
virtual Option_t* TObject::GetDrawOption() const
static Long_t TObject::GetDtorOnly()
virtual const char* TObject::GetIconName() const
virtual const char* TObject::GetName() const
Int_t GetNevents()
Int_t GetNspecies()
virtual char* TObject::GetObjectInfo(Int_t px, Int_t py) const
static Bool_t TObject::GetObjectStat()
virtual Option_t* TObject::GetOption() const
void GetSWeights(TMatrixD& weights)
void GetSWeights(Double_t* weights)
TH1D* GetSWeightsHist(Int_t ixvar, Int_t ispecies, Int_t iyexcl = -1)
TObjArray* GetSWeightsHists()
virtual const char* TObject::GetTitle() const
TString* GetTreeExpression()
TString* GetTreeName()
TString* GetTreeSelection()
virtual UInt_t TObject::GetUniqueID() const
TH1D* GetXvarHist(Int_t ixvar)
TObjArray* GetXvarHists()
TH1D* GetYpdfHist(Int_t iyvar, Int_t ispecies)
TObjArray* GetYpdfHists()
TH1D* GetYvarHist(Int_t iyvar)
TObjArray* GetYvarHists()
virtual Bool_t TObject::HandleTimer(TTimer* timer)
virtual ULong_t TObject::Hash() const
virtual void TObject::Info(const char* method, const char* msgfmt) const
virtual Bool_t TObject::InheritsFrom(const char* classname) const
virtual Bool_t TObject::InheritsFrom(const TClass* cl) const
virtual void TObject::Inspect() const
void TObject::InvertBit(UInt_t f)
virtual TClass* IsA() const
virtual Bool_t TObject::IsEqual(const TObject* obj) const
virtual Bool_t IsFolder() const
virtual Bool_t TObject::IsFolder() const
Bool_t TObject::IsOnHeap() const
virtual Bool_t TObject::IsSortable() const
Bool_t TObject::IsZombie() const
virtual void TObject::ls(Option_t* option = "") const
void MakeSPlot(Option_t* option = "v")
void TObject::MayNotUse(const char* method) const
virtual Bool_t TObject::Notify()
static void TObject::operator delete(void* ptr)
static void TObject::operator delete(void* ptr, void* vp)
static void TObject::operator delete[](void* ptr)
static void TObject::operator delete[](void* ptr, void* vp)
void* TObject::operator new(size_t sz)
void* TObject::operator new(size_t sz, void* vp)
void* TObject::operator new[](size_t sz)
void* TObject::operator new[](size_t sz, void* vp)
TObject& TObject::operator=(const TObject& rhs)
virtual void TObject::Paint(Option_t* option = "")
virtual void TObject::Pop()
virtual void TObject::Print(Option_t* option = "") const
virtual Int_t TObject::Read(const char* name)
virtual void TObject::RecursiveRemove(TObject* obj)
void RefillHist(Int_t type, Int_t var, Int_t nbins, Double_t min, Double_t max, Int_t nspecies = -1)
void TObject::ResetBit(UInt_t f)
virtual void TObject::SaveAs(const char* filename = "", Option_t* option = "") const
virtual void TObject::SavePrimitive(basic_ostream<char,char_traits<char> >& out, Option_t* option = "")
void TObject::SetBit(UInt_t f)
void TObject::SetBit(UInt_t f, Bool_t set)
virtual void TObject::SetDrawOption(Option_t* option = "")
static void TObject::SetDtorOnly(void* obj)
void SetInitialNumbersOfSpecies(Int_t* numbers)
void SetNEvents(Int_t ne)
void SetNSpecies(Int_t ns)
void SetNX(Int_t nx)
void SetNY(Int_t ny)
static void TObject::SetObjectStat(Bool_t stat)
void SetTree(TTree* tree)
void SetTreeSelection(const char* varexp = "", const char* selection = "", Long64_t firstentry = 0)
virtual void TObject::SetUniqueID(UInt_t uid)
virtual void ShowMembers(TMemberInspector& insp, char* parent)
virtual void TObject::ShowMembers(TMemberInspector& insp, char* parent)
virtual void Streamer(TBuffer& b)
virtual void TObject::Streamer(TBuffer& b)
void StreamerNVirtual(TBuffer& b)
void TObject::StreamerNVirtual(TBuffer& b)
virtual void TObject::SysError(const char* method, const char* msgfmt) const
Bool_t TObject::TestBit(UInt_t f) const
Int_t TObject::TestBits(UInt_t f) const
virtual void TObject::UseCurrentStyle()
virtual void TObject::Warning(const char* method, const char* msgfmt) const
virtual Int_t TObject::Write(const char* name = 0, Int_t option = 0, Int_t bufsize = 0)
virtual Int_t TObject::Write(const char* name = 0, Int_t option = 0, Int_t bufsize = 0) const
protected:
virtual void TObject::DoError(int level, const char* location, const char* fmt, va_list va) const
void TObject::MakeZombie()
void SPlots(Double_t* covmat, Int_t i_excl)

Data Members

protected:
TMatrixD    fMinmax           mins and maxs of variables for histogramming
Int_t       fNSpecies         Number of species
Int_t       fNevents          Total number of events
Double_t*   fNumbersOfEvents  [fNSpecies] estimates of numbers of events in each species
Int_t       fNx               Number of control variables
Int_t       fNy               Number of discriminating variables
TMatrixD    fPdfTot           !
TMatrixD    fSWeights         computed sWeights
TObjArray   fSWeightsHists    histograms of weighted variables
TString*    fSelection        Selection on the tree
TTree*      fTree             !
TString*    fTreename         The name of the data tree
TString*    fVarexp           Variables used for splot
TMatrixD    fXvar             !
TObjArray   fXvarHists        histograms of control variables
TMatrixD    fYpdf             !
TObjArray   fYpdfHists        histograms of pdfs
TMatrixD    fYvar             !
TObjArray   fYvarHists        histograms of discriminating variables

Function documentation

TSPlot()
 default constructor (used by I/O only)
TSPlot(Int_t nx, Int_t ny, Int_t ne, Int_t ns, TTree* tree)
normal TSPlot constructor
 nx :  number of control variables
 ny :  number of discriminating variables
 ne :  total number of events
 ns :  number of species
 tree: input data
~TSPlot()
 destructor
void Browse(TBrowser* b)
To browse the histograms
void SetInitialNumbersOfSpecies(Int_t* numbers)
Sets the initial number of events of each species, used
as initial estimates in Minuit
void MakeSPlot(Option_t* option = "v")
Calculates the sWeights
The option controls the print level
"Q" - no print out
"V" - prints the estimated number of events in each species - default
"VV" - as "V" + the minuit printing + sums of weights for control
void SPlots(Double_t* covmat, Int_t i_excl)
Computes the sWeights from the covariance matrix
void GetSWeights(TMatrixD &weights)
Returns the matrix of sweights
void GetSWeights(Double_t *weights)
Returns the matrix of sweights. It is assumed that the array passed in the argument is big enough
void FillXvarHists(Int_t nbins = 100)
Fills the histograms of x variables (not weighted) with nbins
TObjArray* GetXvarHists()
Returns the array of histograms of x variables (not weighted)
If histograms have not already
been filled, they are filled with default binning 100.
TH1D * GetXvarHist(Int_t ixvar)
Returns the histogram of variable #ixvar
If histograms have not already
been filled, they are filled with default binning 100.
void FillYvarHists(Int_t nbins = 100)
Fill the histograms of y variables
TObjArray* GetYvarHists()
Returns the array of histograms of y variables. If histograms have not already
been filled, they are filled with default binning 100.
TH1D * GetYvarHist(Int_t iyvar)
Returns the histogram of variable iyvar. If histograms have not already
been filled, they are filled with default binning 100.
void FillYpdfHists(Int_t nbins = 100)
Fills the histograms of pdf-s of y variables with binning nbins
TObjArray* GetYpdfHists()
Returns the array of histograms of pdf's of y variables with binning nbins
If histograms have not already
been filled, they are filled with default binning 100.
TH1D * GetYpdfHist(Int_t iyvar, Int_t ispecies)
Returns the histogram of the pdf of variable #iyvar for species #ispecies, binning nbins
If histograms have not already
been filled, they are filled with default binning 100.
void FillSWeightsHists(Int_t nbins = 50)
Fills the histograms of the variables, weighted with sWeights, with binning nbins
The order of histograms in the array:
x0_species0, x0_species1,..., x1_species0, x1_species1,..., y0_species0, y0_species1,...
If the histograms have already been filled with a different binning, the old histograms
are deleted and all histograms are refilled
TObjArray * GetSWeightsHists()
Returns an array of all histograms of variables, weighted with sWeights
If histograms have not been already filled, they are filled with default binning 50
The order of histograms in the array:
x0_species0, x0_species1,..., x1_species0, x1_species1,..., y0_species0, y0_species1,...
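The documented ordering translates into a simple index computation, sketched below. The helper names `sWeightsHistIndexX` and `sWeightsHistIndexY` are hypothetical; the formulas are derived only from the ordering stated above, assuming nspecies histograms per variable and all x-variable histograms preceding the y-variable ones.

```cpp
#include <cassert>

// Position of the histogram of x-variable #ixvar for species #ispecies
// in the order x0_species0, x0_species1, ..., x1_species0, x1_species1, ...
int sWeightsHistIndexX(int ixvar, int ispecies, int nspecies)
{
    assert(ixvar >= 0 && ispecies >= 0 && ispecies < nspecies);
    return ixvar * nspecies + ispecies;
}

// Position of the histogram of y-variable #iyvar for species #ispecies:
// the y histograms follow the nx * nspecies histograms of the x variables.
int sWeightsHistIndexY(int iyvar, int ispecies, int nx, int nspecies)
{
    assert(iyvar >= 0 && nx >= 0 && ispecies >= 0 && ispecies < nspecies);
    return nx * nspecies + iyvar * nspecies + ispecies;
}
```

In practice GetSWeightsHist(ixvar, ispecies, iyexcl) already returns a single histogram, so such an index is only needed when walking through the returned TObjArray by hand.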
void RefillHist(Int_t type, Int_t var, Int_t nbins, Double_t min, Double_t max, Int_t nspecies = -1)
The Fill...Hist() methods fill the histograms with the real limits on the variables.
This method allows the specified histogram to be refilled with user-set boundaries min and max.
Parameters:
type = 1 - histogram of x variable #var
     = 2 - histogram of y variable #var
     = 3 - histogram of y_pdf for y #var and species #nspecies
     = 4 - histogram of x variable #var, species #nspecies, WITH sWeights
     = 5 - histogram of y variable #var, species #nspecies, WITH sWeights
TH1D * GetSWeightsHist(Int_t ixvar, Int_t ispecies, Int_t iyexcl = -1)
Returns the histogram of a variable, weighted with sWeights
If histograms have not been already filled, they are filled with default binning 50
If parameter ixvar!=-1, the histogram of x-variable #ixvar is returned for species ispecies
If parameter ixvar==-1, the histogram of y-variable #iyexcl is returned for species ispecies
If the histogram has already been filled and the binning is different from the parameter nbins
all histograms with old binning will be deleted and refilled.
void SetTree(TTree* tree)
 Set the input Tree
void SetTreeSelection(const char* varexp = "", const char* selection = "", Long64_t firstentry = 0)
Specifies the variables from the tree to be used for splot

Variables fNx, fNy, fNSpecies and fNevents should already be set!

In the 1st parameter it is assumed that the first fNx variables are x (control variables),
then fNy y variables (discriminating variables),
then fNy*fNSpecies ypdf variables (probability distribution functions of discriminating
variables for different species). The order of pdfs should be: species0_y0, species0_y1,...
species1_y0, species1_y1,...species[fNSpecies-1]_y0...
The 2nd parameter allows a cut to be applied.
The TTree::Draw method description contains more details on specifying the expression and selection.
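The required ordering of the first parameter can be assembled programmatically. The helper `buildVarexp` below is hypothetical, as are the branch names used in the usage note; it assumes that the expression is a colon-separated list in the style of TTree::Draw.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Builds the expression: x variables, then y variables, then the pdf
// variables ordered as species0_y0, species0_y1, ..., species1_y0, ...
// pdfNames[species][iy] is the name of the pdf of y-variable #iy for that species.
std::string buildVarexp(const std::vector<std::string>& xNames,
                        const std::vector<std::string>& yNames,
                        const std::vector<std::vector<std::string>>& pdfNames)
{
    std::vector<std::string> all(xNames);
    all.insert(all.end(), yNames.begin(), yNames.end());
    for (const std::vector<std::string>& speciesPdfs : pdfNames) {
        assert(speciesPdfs.size() == yNames.size());   // one pdf per y variable
        all.insert(all.end(), speciesPdfs.begin(), speciesPdfs.end());
    }
    std::string expression;
    for (std::size_t i = 0; i < all.size(); ++i) {
        if (i > 0)
            expression += ':';                          // assumed TTree::Draw-style separator
        expression += all[i];
    }
    return expression;
}
```

For one control variable `mes`, two discriminating variables `dE` and `F`, and two species, this would give `mes:dE:F:dE_sig:F_sig:dE_bkg:F_bkg`, a string intended to be passed as the first argument of SetTreeSelection.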
Bool_t IsFolder() const
{ return kTRUE;}
Int_t GetNevents()
{return fNevents;}
Int_t GetNspecies()
{return fNSpecies;}
TString* GetTreeName()
{return fTreename;}
TString* GetTreeSelection()
{return fSelection;}
TString* GetTreeExpression()
{return fVarexp;}
void SetNX(Int_t nx)
{fNx=nx;}
void SetNY(Int_t ny)
{fNy=ny;}
void SetNSpecies(Int_t ns)
{fNSpecies=ns;}
void SetNEvents(Int_t ne)
{fNevents=ne;}