Convert between NumPy arrays or Pandas DataFrames and RooDataSets.
This tutorial first shows how to export a RooDataSet to NumPy arrays or a Pandas DataFrame, and then how to create a RooDataSet from NumPy arrays. It ends by converting between a RooDataHist and NumPy histogram arrays.
import ROOT
import numpy as np
n_events = 10000
x = ROOT.RooRealVar("x", "x", -10, 10)
mean = ROOT.RooRealVar("mean", "mean of gaussian", 1, -10, 10)
sigma = ROOT.RooRealVar("sigma", "width of gaussian", 1, 0.1, 10)
gauss = ROOT.RooGaussian("gauss", "gaussian PDF", x, mean, sigma)
data = gauss.generate(ROOT.RooArgSet(x), n_events)
arrays = data.to_numpy()
print("Mean of numpy array:", np.mean(arrays["x"]))
print("Standard deviation of numpy array:", np.std(arrays["x"]))
df = data.to_pandas()
try:
    # The import is only there to detect whether matplotlib is available.
    import matplotlib.pyplot as plt
    df.hist(column="x", bins=x.bins())
except Exception:
    print(
        'Skipping `df.hist(column="x", bins=x.bins())` because matplotlib could not be imported or was not able to display the plot.'
    )
del data
del arrays
del df
x_arr = np.random.normal(-1.0, 1.0, (n_events,))
data = ROOT.RooDataSet.from_numpy({"x": x_arr}, [x])
fit_result = gauss.fitTo(data, PrintLevel=-1, Save=True)
fit_result.Print()
xframe = x.frame(Title="Gaussian pdf")
data.plotOn(xframe)
gauss.plotOn(xframe)
c = ROOT.TCanvas("rf409_NumPyPandasToRooFit", "rf409_NumPyPandasToRooFit", 800, 400)
xframe.Draw()
c.SaveAs("rf409_NumPyPandasToRooFit.png")
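def print_histogram_output(histogram_output):
    # histogram_output is a (counts, bin_edges) pair, where bin_edges holds
    # one array of edges per dimension.
    counts, bin_edges = histogram_output
    print(np.array(counts, dtype=int))
    print(bin_edges[0])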
datahist = data.binnedClone()
counts, bin_edges = datahist.to_numpy()
print("Counts and bin edges from RooDataHist.to_numpy:")
print_histogram_output((counts, bin_edges))
print("Counts and bin edges from np.histogram:")
print_histogram_output(np.histogramdd([x_arr], bins=[x.bins()]))
datahist_new_1 = ROOT.RooDataHist.from_numpy(counts, [x])
print("RooDataHist imported with default binning and exported back to numpy:")
print_histogram_output(datahist_new_1.to_numpy())
bins = [np.linspace(-10, 10, 21)]
counts, _ = np.histogramdd([x_arr], bins=bins)
datahist_new_2 = ROOT.RooDataHist.from_numpy(counts, [x], bins=bins)
print("RooDataHist imported with linspace binning and exported back to numpy:")
print_histogram_output(datahist_new_2.to_numpy())
bins = [20]
ranges = [(-10, 10)]
counts, _ = np.histogramdd([x_arr], bins=bins, range=ranges)
datahist_new_3 = ROOT.RooDataHist.from_numpy(counts, [x], bins=bins, ranges=ranges)
print("RooDataHist imported with uniform binning and exported back to numpy:")
print_histogram_output(datahist_new_3.to_numpy())
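The comparison with `np.histogramdd` in the listing above relies on its return value being a pair: the counts array and a list with one edge array per dimension, which is why the helper prints `bin_edges[0]`. The following NumPy-only sketch (no ROOT required) illustrates that layout and checks that explicit `np.linspace` edges and the `bins=[20]`, `range=[(-10, 10)]` form describe the same binning. The fixed seed and the variable names are assumptions made for illustration and are not part of the tutorial.

```python
import numpy as np

# Assumption: a fixed seed, used here only to make the sketch reproducible.
rng = np.random.default_rng(seed=0)
x_arr = rng.normal(-1.0, 1.0, 10000)

# Variant 1: explicit bin edges, one array of edges per dimension.
edges = [np.linspace(-10, 10, 21)]
counts_edges, out_edges = np.histogramdd([x_arr], bins=edges)

# Variant 2: number of uniform bins plus a (low, high) range per dimension.
counts_uniform, uniform_edges = np.histogramdd(
    [x_arr], bins=[20], range=[(-10, 10)]
)

# The counts array has one axis per dimension; the edges come back as a list.
print(counts_edges.shape)
print(len(out_edges), out_edges[0].shape)

# Both variants describe the same 20 uniform bins on [-10, 10].
print(np.array_equal(counts_edges, counts_uniform))
```

Either form should therefore produce the same counts when passed through `np.histogramdd`, which is consistent with the identical output the tutorial prints for the linspace and uniform cases.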
[#1] INFO:Fitting -- RooAbsPdf::fitTo(gauss_over_gauss_Int[x]) fixing normalization set for coefficient determination to observables in data
[#1] INFO:Fitting -- using CPU computation library compiled with -mavx2
[#1] INFO:Fitting -- RooAddition::defaultErrorLevel(nll_gauss_over_gauss_Int[x]_) Summation contains a RooNLLVar, using its error level
[#1] INFO:Minimization -- RooAbsMinimizerFcn::setOptimizeConst: activating const optimization
[#1] INFO:Minimization -- RooAbsMinimizerFcn::setOptimizeConst: deactivating const optimization
RooFitResult: minimized FCN value: 14132.6, estimated distance to minimum: 1.02873e-09
covariance matrix quality: Full, accurate covariance matrix
Status : MINIMIZE=0 HESSE=0
Floating Parameter FinalValue +/- Error
-------------------- --------------------------
mean -1.0053e+00 +/- 9.94e-03
sigma 9.9434e-01 +/- 7.03e-03
Mean of numpy array: 1.0066466535473984
Standard deviation of numpy array: 0.9973499677811349
Counts and bin edges from RooDataHist.to_numpy:
[ 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 2 0 4 4 13 25 31 58 91 114
188 238 385 423 545 635 686 779 796 832 762 714 633 505 413 334 253 185
128 85 52 29 22 20 7 5 1 0 2 1 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0]
[-10. -9.8 -9.6 -9.4 -9.2 -9. -8.8 -8.6 -8.4 -8.2 -8. -7.8
-7.6 -7.4 -7.2 -7. -6.8 -6.6 -6.4 -6.2 -6. -5.8 -5.6 -5.4
-5.2 -5. -4.8 -4.6 -4.4 -4.2 -4. -3.8 -3.6 -3.4 -3.2 -3.
-2.8 -2.6 -2.4 -2.2 -2. -1.8 -1.6 -1.4 -1.2 -1. -0.8 -0.6
-0.4 -0.2 0. 0.2 0.4 0.6 0.8 1. 1.2 1.4 1.6 1.8
2. 2.2 2.4 2.6 2.8 3. 3.2 3.4 3.6 3.8 4. 4.2
4.4 4.6 4.8 5. 5.2 5.4 5.6 5.8 6. 6.2 6.4 6.6
6.8 7. 7.2 7.4 7.6 7.8 8. 8.2 8.4 8.6 8.8 9.
9.2 9.4 9.6 9.8 10. ]
Counts and bin edges from np.histogram:
[ 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 2 0 4 4 13 25 31 58 91 114
188 238 385 423 545 635 686 779 796 832 762 714 633 505 413 334 253 185
128 85 52 29 22 20 7 5 1 0 2 1 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0]
[-10. -9.8 -9.6 -9.4 -9.2 -9. -8.8 -8.6 -8.4 -8.2 -8. -7.8
-7.6 -7.4 -7.2 -7. -6.8 -6.6 -6.4 -6.2 -6. -5.8 -5.6 -5.4
-5.2 -5. -4.8 -4.6 -4.4 -4.2 -4. -3.8 -3.6 -3.4 -3.2 -3.
-2.8 -2.6 -2.4 -2.2 -2. -1.8 -1.6 -1.4 -1.2 -1. -0.8 -0.6
-0.4 -0.2 0. 0.2 0.4 0.6 0.8 1. 1.2 1.4 1.6 1.8
2. 2.2 2.4 2.6 2.8 3. 3.2 3.4 3.6 3.8 4. 4.2
4.4 4.6 4.8 5. 5.2 5.4 5.6 5.8 6. 6.2 6.4 6.6
6.8 7. 7.2 7.4 7.6 7.8 8. 8.2 8.4 8.6 8.8 9.
9.2 9.4 9.6 9.8 10. ]
RooDataHist imported with default binning and exported back to numpy:
[ 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 2 0 4 4 13 25 31 58 91 114
188 238 385 423 545 635 686 779 796 832 762 714 633 505 413 334 253 185
128 85 52 29 22 20 7 5 1 0 2 1 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0]
[-10. -9.8 -9.6 -9.4 -9.2 -9. -8.8 -8.6 -8.4 -8.2 -8. -7.8
-7.6 -7.4 -7.2 -7. -6.8 -6.6 -6.4 -6.2 -6. -5.8 -5.6 -5.4
-5.2 -5. -4.8 -4.6 -4.4 -4.2 -4. -3.8 -3.6 -3.4 -3.2 -3.
-2.8 -2.6 -2.4 -2.2 -2. -1.8 -1.6 -1.4 -1.2 -1. -0.8 -0.6
-0.4 -0.2 0. 0.2 0.4 0.6 0.8 1. 1.2 1.4 1.6 1.8
2. 2.2 2.4 2.6 2.8 3. 3.2 3.4 3.6 3.8 4. 4.2
4.4 4.6 4.8 5. 5.2 5.4 5.6 5.8 6. 6.2 6.4 6.6
6.8 7. 7.2 7.4 7.6 7.8 8. 8.2 8.4 8.6 8.8 9.
9.2 9.4 9.6 9.8 10. ]
RooDataHist imported with linspace binning and exported back to numpy:
[ 0 0 0 0 0 10 218 1348 3441 3446 1313 208 15 1
0 0 0 0 0 0]
[-10. -9. -8. -7. -6. -5. -4. -3. -2. -1. 0. 1. 2. 3.
4. 5. 6. 7. 8. 9. 10.]
RooDataHist imported with uniform binning and exported back to numpy:
[ 0 0 0 0 0 10 218 1348 3441 3446 1313 208 15 1
0 0 0 0 0 0]
[-10. -9. -8. -7. -6. -5. -4. -3. -2. -1. 0. 1. 2. 3.
4. 5. 6. 7. 8. 9. 10.]
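As a rough plausibility check of the fit output above, the parameter errors can be compared with the textbook large-sample approximations for an unbinned Gaussian fit: sigma / sqrt(N) for the mean and sigma / sqrt(2N) for the width. The sketch below is an illustration only. It assumes these asymptotic formulas apply and neglects the truncation of `x` to [-10, 10]; it is not how the minimizer computes its errors.

```python
import math

n_events = 10000      # number of events used in the fit
sigma_hat = 0.99434   # fitted sigma, copied from the fit output above

# Large-sample approximations for the uncertainties of a Gaussian fit.
mean_err = sigma_hat / math.sqrt(n_events)
sigma_err = sigma_hat / math.sqrt(2 * n_events)

print(f"expected error on mean:  {mean_err:.2e}")   # 9.94e-03
print(f"expected error on sigma: {sigma_err:.2e}")  # 7.03e-03
```

Both values agree with the errors reported in the RooFitResult table to the precision shown.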
- Date: November 2021
- Author: Jonas Rembser
Definition in file rf409_NumPyPandasToRooFit.py.