Overview of the RooBatchCompute library.
Contains optimized computation functions for PDFs that enable significantly faster fits.
While fitting, a significant amount of time and processing power is spent computing the probability function for every event and PDF in the fit model. To speed this up, RooFit can use the computation functions provided in this library. These functions process whole data arrays (batches) instead of a single event at a time, as the legacy evaluate() function in RooFit does. In addition, the code is written in a way that allows compiler optimizations, notably auto-vectorization. The library is compiled multiple times for different vector instruction set architectures, and an automatic hardware detection mechanism in the library selects the optimal version at runtime. As a result, fits can benefit from a speedup of 3x-16x.
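The difference between per-event and batch evaluation can be illustrated with a minimal, self-contained sketch. It does not use RooFit or the actual RooBatchCompute functions: the names gaussSingle and gaussBatch are hypothetical, and the unnormalized Gaussian is only an example of a function one might evaluate this way.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Per-event style: one call computes the value for a single event.
// (Hypothetical helper, not a RooFit function.)
double gaussSingle(double x, double mean, double sigma)
{
   const double arg = (x - mean) / sigma;
   return std::exp(-0.5 * arg * arg);
}

// Batch style: one call fills the results for all events at once.
// The loop body has no branches and no dependency between iterations,
// which is the kind of loop a compiler can try to auto-vectorize.
// (Hypothetical helper, not a RooBatchCompute function.)
void gaussBatch(const std::vector<double> &x, double mean, double sigma, std::vector<double> &output)
{
   assert(output.size() == x.size());
   const double halfInvVar = -0.5 / (sigma * sigma); // hoisted out of the loop
   for (std::size_t i = 0; i < x.size(); ++i) {
      const double d = x[i] - mean;
      output[i] = std::exp(halfInvVar * d * d);
   }
}
```

Both functions compute the same values. The batch version avoids one function call per event and exposes a simple loop to the compiler; whether that loop is actually vectorized depends on the compiler, its flags, and the math library in use.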
As of ROOT v6.26, RooBatchCompute also provides multithreaded and CUDA instances of the computation functions, resulting in even greater improvements in fitting times.
This library is an internal component of RooFit, so users are not supposed to interact with it directly. Instead, they can benefit from significantly faster fits by calling fitTo() and providing a BatchMode("cpu") or a BatchMode("cuda") option.
Note: If the system does not support vector instructions, the RooBatchCompute::Cpu option is guaranteed to work properly by using a generic CPU library. In contrast, users must first make sure that their system supports CUDA in order to use the RooBatchCompute::Cuda option. If it does not, an exception will be thrown.
If "cuda" is selected, RooFit will launch CUDA kernels for computing likelihoods and potentially other intensive computations. At the same time, the most efficient CPU library loaded will also handle parts of the computations in parallel with the GPU (or potentially all of them, if that is faster), thus taking full advantage of the available hardware. For this purpose RooFitDriver, a newly created RooFit class (in roofitcore), takes over the task of analyzing the computations and assigning each to the correct piece of hardware, taking into consideration the performance boost or penalty that may arise with each method of computing.
The CPU instance of the computing library can furthermore execute multithreaded computations. This also applies to computations handled by the CPU in the "cuda" mode. To use them, set the desired number of parallel tasks before calling fitTo().
The easiest and most efficient way of accelerating your PDFs is to request their addition to the official RooFit by submitting a ticket. The ROOT team will gladly assist you and take care of the details.
While your code is being integrated, you can already significantly improve the speed of fitting (though without taking full advantage of the RooBatchCompute library) by using the batch evaluation feature. To make use of it, override RooAbsReal::computeBatch(). This method must be implemented so that it fills the output array with the normalized probabilities computed for nEvents events, the data of which can be retrieved from dataMap. dataMap is a simple std::map<RooRealVar*, std::span<const double>>. Note that it is not necessary to evaluate any of the objects that the PDF relies on, because they have already been evaluated by the RooFitDriver, so their updated results are always present in dataMap. The RooBatchCompute::RooBatchComputeInterface pointer should be ignored.
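The shape of such a method can be sketched without RooFit. Everything below is a hypothetical stand-in: Var takes the place of RooRealVar, ConstSpan the place of std::span<const double>, and MyExpPdf the place of a user PDF. The real signature also carries the RooBatchCompute::RooBatchComputeInterface pointer, which is left out here. Treating a span of size 1 as a value shared by all events is an assumption of this sketch, and normalization is omitted for brevity.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <map>
#include <vector>

// Hypothetical stand-in for RooRealVar; only its address is used as a key.
struct Var {
   const char *name;
};

// Hypothetical minimal stand-in for std::span<const double>.
struct ConstSpan {
   const double *data;
   std::size_t size;
   // Assumption of this sketch: a size-1 span holds a value shared by all events.
   double operator[](std::size_t i) const { return size == 1 ? data[0] : data[i]; }
};

using DataMap = std::map<const Var *, ConstSpan>;

// Hypothetical PDF computing exp(c * x) for every event (unnormalized).
struct MyExpPdf {
   const Var *x;
   const Var *c;

   // Fills output[0..nEvents) using values that are already present in dataMap;
   // nothing the PDF depends on is re-evaluated here.
   void computeBatch(double *output, std::size_t nEvents, const DataMap &dataMap) const
   {
      const ConstSpan xs = dataMap.at(x);
      const ConstSpan cs = dataMap.at(c);
      assert(xs.size == nEvents || xs.size == 1);
      assert(cs.size == nEvents || cs.size == 1);
      for (std::size_t i = 0; i < nEvents; ++i) {
         output[i] = std::exp(cs[i] * xs[i]);
      }
   }
};
```

The point of the sketch is the data flow: the method only reads inputs from the map and writes one result per event into the output array.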
Make sure to add the computeBatch() function signature to the header RooMyPDF.h and mark it as override to ensure that you have successfully overridden the method. As a final note, always remember to prepend RooBatchCompute:: to the names of classes defined in the RooBatchCompute library, or write using namespace RooBatchCompute.
Files
file ComputeFunctions.cxx: This file contains vectorizable computation functions for PDFs and other RooFit objects.
Classes
class Batches: These classes encapsulate the necessary data for the computations.
class RbcClass: This file contains the code for CUDA computations using the RooBatchCompute library.
class RooBatchCompute::RooBatchComputeInterface: The interface which should be implemented to provide optimised computation functions for implementations of RooAbsReal::doEval().