ROOT Reference Guide
 
df103_NanoAODHiggsAnalysis.py File Reference

Namespaces

namespace  df103_NanoAODHiggsAnalysis
 

Detailed Description

An example of complex analysis with RDataFrame: reconstructing the Higgs boson.

This tutorial is a simplified yet complex example of an analysis that reconstructs the Higgs boson decaying to two Z bosons from events with four leptons. The data and simulated events are taken from CERN Open Data and represent a subset of the data recorded in 2012 with the CMS detector at the LHC. The tutorial follows the Higgs-to-four-leptons analysis published on the CERN Open Data portal (10.7483/OPENDATA.CMS.JKB8.RR42). The resulting plots show the invariant mass of the selected four-lepton systems in the different decay modes (four muons, four electrons, and two of each kind) and in a combined plot indicating the decay of the Higgs boson with a mass of about 125 GeV.
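The invariant mass plotted for each four-lepton system follows from the summed four-momenta, m² = (ΣE)² − |Σp|². A minimal pure-Python sketch of that formula is shown below. It is not the tutorial's implementation and assumes each lepton is given as a hypothetical `(pt, eta, phi, mass)` tuple in GeV.

```python
import math


def to_cartesian(pt, eta, phi, mass):
    """Convert (pt, eta, phi, mass) to a four-vector (E, px, py, pz)."""
    px = pt * math.cos(phi)
    py = pt * math.sin(phi)
    pz = pt * math.sinh(eta)
    energy = math.sqrt(px * px + py * py + pz * pz + mass * mass)
    return energy, px, py, pz


def invariant_mass(leptons):
    """Invariant mass of leptons given as (pt, eta, phi, mass) tuples in GeV.

    Sums the four-momenta and applies m^2 = E^2 - |p|^2.
    """
    e_sum = px_sum = py_sum = pz_sum = 0.0
    for lepton in leptons:
        e, px, py, pz = to_cartesian(*lepton)
        e_sum += e
        px_sum += px
        py_sum += py
        pz_sum += pz
    m2 = e_sum ** 2 - (px_sum ** 2 + py_sum ** 2 + pz_sum ** 2)
    # Clamp tiny negative values caused by floating-point rounding
    return math.sqrt(max(m2, 0.0))


# Four massless leptons with pT = 50 GeV, spread evenly in phi at eta = 0:
# the momenta cancel, so the four-lepton mass equals the summed energy
# (200 GeV) up to floating-point rounding.
leptons = [(50.0, 0.0, k * math.pi / 2, 0.0) for k in range(4)]
print(invariant_mass(leptons))
```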

The following steps are performed for each sample of data and simulated events to reconstruct the Higgs boson from the selected muons and electrons:

  1. Select interesting events with multiple cuts on event properties, e.g., number of leptons, kinematics of the leptons and quality of the tracks.
  2. Reconstruct two Z bosons from the selected events, only one of which is on the mass shell, and apply additional cuts on the reconstructed objects.
  3. Reconstruct the Higgs boson from the remaining Z boson candidates and calculate its invariant mass.
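Steps 2 and 3 can be sketched in plain Python for a single event with exactly four same-flavor leptons. This is an illustration under stated assumptions, not the tutorial's implementation: leptons are hypothetical `(pt, eta, phi, mass, charge)` tuples, only pairings into two opposite-charge pairs are considered, lepton flavor and the tutorial's additional cuts are ignored, and the on-shell Z candidate is taken to be the pair whose mass is closest to the nominal Z mass.

```python
import math

Z_MASS = 91.1876  # nominal Z boson mass in GeV


def four_vector(pt, eta, phi, mass):
    """(pt, eta, phi, mass) -> (E, px, py, pz)."""
    px, py, pz = pt * math.cos(phi), pt * math.sin(phi), pt * math.sinh(eta)
    return math.sqrt(px * px + py * py + pz * pz + mass * mass), px, py, pz


def mass_of(vectors):
    """Invariant mass of the sum of the given four-vectors."""
    e, px, py, pz = (sum(v[i] for v in vectors) for i in range(4))
    return math.sqrt(max(e * e - px * px - py * py - pz * pz, 0.0))


def reconstruct_zz(leptons):
    """Return (m_Z1, m_Z2, m_4l) for four (pt, eta, phi, mass, charge) tuples,
    or None if no pairing into two opposite-charge pairs exists.
    Hypothetical helper: flavor and kinematic cuts are not modeled."""
    vectors = [four_vector(*lep[:4]) for lep in leptons]
    charges = [lep[4] for lep in leptons]
    best = None
    # Pair lepton 0 with each other lepton; the remaining two form the second pair.
    for j in (1, 2, 3):
        pair_a = (0, j)
        pair_b = tuple(k for k in (1, 2, 3) if k != j)
        if any(charges[p] + charges[q] != 0 for p, q in (pair_a, pair_b)):
            continue  # both pairs must be opposite-charge
        m_a = mass_of([vectors[i] for i in pair_a])
        m_b = mass_of([vectors[i] for i in pair_b])
        # Z1: candidate closest to the nominal Z mass (on shell); Z2: the other one
        m_z1, m_z2 = sorted((m_a, m_b), key=lambda m: abs(m - Z_MASS))
        if best is None or abs(m_z1 - Z_MASS) < abs(best[0] - Z_MASS):
            best = (m_z1, m_z2)
    if best is None:
        return None
    # Step 3: the Higgs candidate mass is the mass of all four leptons combined
    return best[0], best[1], mass_of(vectors)


event = [
    (45.6, 0.0, 0.0, 0.0, +1),
    (45.6, 0.0, math.pi, 0.0, -1),
    (15.0, 0.0, math.pi / 2, 0.0, +1),
    (15.0, 0.0, 3 * math.pi / 2, 0.0, -1),
]
print(reconstruct_zz(event))
```

For this event the pairing (0, 1) + (2, 3) is chosen, giving Z1 ≈ 91.2 GeV, Z2 ≈ 30 GeV and a four-lepton mass of about 121.2 GeV.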

Another aim of this version of the tutorial is to show a way to blend C++ and Python code. All functions that perform computations on the data, either to define new columns or to filter events based on existing ones, and that are better suited to be written in C++, have been moved to a header file that is then declared to the ROOT C++ interpreter. The calls that instead create nodes of the computational graph (e.g. Filter, Define) remain in the main Python script.

The tutorial has fast mode enabled by default, in which it reads the data from already skimmed datasets with a total size of only 51 MB. If fast mode is disabled, the tutorial runs over the full dataset, with a size of 12 GB.

Definition in file df103_NanoAODHiggsAnalysis.py.