Building and loading the chunks from the blocks and chunks constructed in RChunkConstructor.
In this class the blocks are stitched together to form chunks that are loaded into memory. The blocks used to create each chunk come from different parts of the dataset. This is achieved by shuffling the blocks before distributing them into chunks. The purpose of this process is to reduce bias during machine learning training by ensuring that the data is well mixed. The dataset is also split into training and validation sets according to the user-defined validation split fraction.
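The shuffle-then-distribute idea described above can be illustrated with a minimal, standard-library-only sketch. It is not the RChunkLoader implementation: the names `BlockInterval`, `MakeBlocks` and `StitchChunks` are hypothetical, and the sketch assumes that blocks are half-open entry ranges and that a chunk holds `chunkSize / blockSize` blocks.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <random>
#include <utility>
#include <vector>

// Hypothetical type: half-open entry range [first, second) covered by one block.
using BlockInterval = std::pair<std::size_t, std::size_t>;

// Cut [0, numEntries) into consecutive blocks of at most blockSize entries.
std::vector<BlockInterval> MakeBlocks(std::size_t numEntries, std::size_t blockSize)
{
   assert(blockSize > 0);
   std::vector<BlockInterval> blocks;
   for (std::size_t start = 0; start < numEntries; start += blockSize)
      blocks.emplace_back(start, std::min(start + blockSize, numEntries));
   return blocks;
}

// Shuffle the blocks, then group consecutive shuffled blocks into chunks,
// so that each chunk mixes entries from different parts of the dataset.
std::vector<std::vector<BlockInterval>>
StitchChunks(std::vector<BlockInterval> blocks, std::size_t chunkSize, std::size_t blockSize, std::size_t seed)
{
   assert(blockSize > 0 && chunkSize >= blockSize);
   std::mt19937_64 rng(seed);
   std::shuffle(blocks.begin(), blocks.end(), rng);

   // Assumption of this sketch: a chunk is a whole number of blocks.
   const std::size_t blocksPerChunk = chunkSize / blockSize;
   std::vector<std::vector<BlockInterval>> chunks;
   for (std::size_t i = 0; i < blocks.size(); i += blocksPerChunk) {
      const std::size_t last = std::min(i + blocksPerChunk, blocks.size());
      chunks.emplace_back(blocks.begin() + i, blocks.begin() + last);
   }
   return chunks;
}
```

With 10 entries, a block size of 3 and a chunk size of 6, this sketch yields four blocks (the last one holding a single entry) grouped into two chunks of two blocks each. Which blocks land in which chunk depends on the seed.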
Definition at line 112 of file RChunkLoader.hxx.
Public Member Functions | |
| RChunkLoader (ROOT::RDF::RNode &rdf, const std::size_t chunkSize, const std::size_t blockSize, const float validationSplit, const std::vector< std::string > &cols, const std::vector< std::size_t > &vecSizes={}, const float vecPadding=0.0, bool shuffle=true, const std::size_t setSeed=0) | |
| void | CheckIfOverlap (RFlat2DMatrix &Tensor1, RFlat2DMatrix &Tensor2) |
| void | CheckIfUnique (RFlat2DMatrix &Tensor) |
| void | CreateTrainingChunksIntervals () |
| Create training chunks consisting of block intervals of different types. | |
| void | CreateValidationChunksIntervals () |
| Create validation chunks consisting of block intervals of different types. | |
| std::size_t | GetNumTrainingChunks () |
| std::size_t | GetNumTrainingEntries () |
| std::size_t | GetNumValidationChunks () |
| std::size_t | GetNumValidationEntries () |
| std::vector< std::size_t > | GetTrainingChunkSizes () |
| std::vector< std::size_t > | GetValidationChunkSizes () |
| void | LoadTrainingChunk (RFlat2DMatrix &TrainChunkTensor, std::size_t chunk) |
| Load the nth chunk from the training dataset into a tensor. | |
| void | LoadValidationChunk (RFlat2DMatrix &ValidationChunkTensor, std::size_t chunk) |
| Load the nth chunk from the validation dataset into a tensor. | |
| void | ResetDataframe () |
| void | SplitDataset () |
| Distribute the blocks into training and validation datasets. | |
Private Attributes | |
| ROOT::RDF::RNode & | f_rdf |
| std::size_t | fBlockSize |
| std::size_t | fChunkSize |
| std::vector< std::string > | fCols |
| ROOT::RDF::RResultPtr< std::vector< ULong64_t > > | fEntries |
| bool | fNotFiltered |
| std::size_t | fNumChunkCols |
| std::size_t | fNumCols |
| std::size_t | fNumEntries |
| std::size_t | fNumTrainEntries |
| std::size_t | fNumValidationEntries |
| std::size_t | fSetSeed |
| bool | fShuffle |
| std::size_t | fSumVecSizes |
| std::unique_ptr< RFlat2DMatrixOperators > | fTensorOperators |
| std::unique_ptr< RChunkConstructor > | fTraining |
| std::unique_ptr< RChunkConstructor > | fValidation |
| float | fValidationSplit |
| std::size_t | fVecPadding |
| std::vector< std::size_t > | fVecSizes |
#include <ROOT/ML/RChunkLoader.hxx>
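| RChunkLoader (ROOT::RDF::RNode &rdf, const std::size_t chunkSize, const std::size_t blockSize, const float validationSplit, const std::vector< std::string > &cols, const std::vector< std::size_t > &vecSizes={}, const float vecPadding=0.0, bool shuffle=true, const std::size_t setSeed=0) | inline |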
Definition at line 142 of file RChunkLoader.hxx.
| void CheckIfOverlap (RFlat2DMatrix &Tensor1, RFlat2DMatrix &Tensor2) | inline |
Definition at line 448 of file RChunkLoader.hxx.
| void CheckIfUnique (RFlat2DMatrix &Tensor) | inline |
Definition at line 440 of file RChunkLoader.hxx.
| void CreateTrainingChunksIntervals () | inline |
Create training chunks consisting of block intervals of different types.
Definition at line 255 of file RChunkLoader.hxx.
| void CreateValidationChunksIntervals () | inline |
Create validation chunks consisting of block intervals of different types.
Definition at line 295 of file RChunkLoader.hxx.
| std::size_t GetNumTrainingChunks () | inline |
Definition at line 471 of file RChunkLoader.hxx.
| std::size_t GetNumTrainingEntries () | inline |
Definition at line 437 of file RChunkLoader.hxx.
| std::size_t GetNumValidationChunks () | inline |
Definition at line 473 of file RChunkLoader.hxx.
| std::size_t GetNumValidationEntries () | inline |
Definition at line 438 of file RChunkLoader.hxx.
| std::vector< std::size_t > GetTrainingChunkSizes () | inline |
Definition at line 434 of file RChunkLoader.hxx.
| std::vector< std::size_t > GetValidationChunkSizes () | inline |
Definition at line 435 of file RChunkLoader.hxx.
| void LoadTrainingChunk (RFlat2DMatrix &TrainChunkTensor, std::size_t chunk) | inline |
Load the nth chunk from the training dataset into a tensor.
| [in] | TrainChunkTensor | RTensor for the training chunk |
| [in] | chunk | Index of the chunk in the dataset |
Definition at line 333 of file RChunkLoader.hxx.
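As a rough illustration of what loading a chunk into a flat tensor can look like, the sketch below copies the rows covered by a chunk's block intervals into a row-major buffer. It is an assumption-laden stand-in, not the code in RChunkLoader.hxx: `FlatMatrix`, `BlockInterval` and `LoadChunk` are hypothetical names, and the real loader reads from an RDataFrame rather than from an in-memory matrix.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical stand-in for a flat, row-major 2D buffer (not the real RFlat2DMatrix).
struct FlatMatrix {
   std::size_t rows = 0;
   std::size_t cols = 0;
   std::vector<float> data;

   void Resize(std::size_t r, std::size_t c)
   {
      rows = r;
      cols = c;
      data.assign(r * c, 0.f);
   }
   float &At(std::size_t r, std::size_t c) { return data[r * cols + c]; }
};

// Hypothetical type: half-open row range [first, second) covered by one block.
using BlockInterval = std::pair<std::size_t, std::size_t>;

// Copy the rows of `source` covered by the chunk's block intervals into `chunk`,
// keeping the order in which the intervals are listed.
void LoadChunk(const FlatMatrix &source, const std::vector<BlockInterval> &intervals, FlatMatrix &chunk)
{
   std::size_t numRows = 0;
   for (const auto &iv : intervals) {
      assert(iv.first <= iv.second && iv.second <= source.rows);
      numRows += iv.second - iv.first;
   }
   chunk.Resize(numRows, source.cols);

   std::size_t out = 0; // next row to fill in the chunk
   for (const auto &iv : intervals) {
      for (std::size_t row = iv.first; row < iv.second; ++row, ++out) {
         std::copy_n(source.data.begin() + row * source.cols, source.cols,
                     chunk.data.begin() + out * source.cols);
      }
   }
}
```

In this sketch the chunk tensor is resized and overwritten by the call, so the caller can reuse one buffer for successive chunk indices.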
| void LoadValidationChunk (RFlat2DMatrix &ValidationChunkTensor, std::size_t chunk) | inline |
Load the nth chunk from the validation dataset into a tensor.
| [in] | ValidationChunkTensor | RTensor for the validation chunk |
| [in] | chunk | Index of the chunk in the dataset |
Definition at line 385 of file RChunkLoader.hxx.
| void ResetDataframe () | inline |
Definition at line 432 of file RChunkLoader.hxx.
| void SplitDataset () | inline |
Distribute the blocks into training and validation datasets.
Definition at line 180 of file RChunkLoader.hxx.
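One simple way to distribute blocks according to a validation split fraction is sketched below. This is an illustration under stated assumptions, not the logic of SplitDataset itself: `BlockInterval`, `DatasetSplit` and `SplitBlocks` are hypothetical names, the blocks are assumed to be shuffled already, and whole blocks are assigned to validation until the target number of entries is reached.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical type: half-open entry range [first, second) covered by one block.
using BlockInterval = std::pair<std::size_t, std::size_t>;

// Hypothetical result type holding the blocks assigned to each dataset.
struct DatasetSplit {
   std::vector<BlockInterval> training;
   std::vector<BlockInterval> validation;
};

// Assign whole blocks to the validation set while they fit within
// validationSplit * numEntries entries; every other block goes to training.
DatasetSplit SplitBlocks(const std::vector<BlockInterval> &blocks, float validationSplit)
{
   assert(validationSplit >= 0.f && validationSplit <= 1.f);

   std::size_t numEntries = 0;
   for (const auto &b : blocks)
      numEntries += b.second - b.first;
   const auto target = static_cast<std::size_t>(validationSplit * static_cast<float>(numEntries));

   DatasetSplit split;
   std::size_t taken = 0; // validation entries assigned so far
   for (const auto &b : blocks) {
      const std::size_t size = b.second - b.first;
      if (taken + size <= target) {
         split.validation.push_back(b);
         taken += size;
      } else {
         split.training.push_back(b);
      }
   }
   return split;
}
```

Because only whole blocks are moved in this sketch, the validation set can end up slightly smaller than the requested fraction when the block size does not divide the target evenly.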
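| ROOT::RDF::RNode & f_rdf | private |
Definition at line 128 of file RChunkLoader.hxx.
| std::size_t fBlockSize | private |
Definition at line 116 of file RChunkLoader.hxx.
| std::size_t fChunkSize | private |
Definition at line 115 of file RChunkLoader.hxx.
| std::vector< std::string > fCols | private |
Definition at line 129 of file RChunkLoader.hxx.
| ROOT::RDF::RResultPtr< std::vector< ULong64_t > > fEntries | private |
Definition at line 136 of file RChunkLoader.hxx.
| bool fNotFiltered | private |
Definition at line 133 of file RChunkLoader.hxx.
| std::size_t fNumChunkCols | private |
Definition at line 122 of file RChunkLoader.hxx.
| std::size_t fNumCols | private |
Definition at line 130 of file RChunkLoader.hxx.
| std::size_t fNumEntries | private |
Definition at line 114 of file RChunkLoader.hxx.
| std::size_t fNumTrainEntries | private |
Definition at line 124 of file RChunkLoader.hxx.
| std::size_t fNumValidationEntries | private |
Definition at line 125 of file RChunkLoader.hxx.
| std::size_t fSetSeed | private |
Definition at line 131 of file RChunkLoader.hxx.
| bool fShuffle | private |
Definition at line 134 of file RChunkLoader.hxx.
| std::size_t fSumVecSizes | private |
Definition at line 120 of file RChunkLoader.hxx.
| std::unique_ptr< RFlat2DMatrixOperators > fTensorOperators | private |
Definition at line 126 of file RChunkLoader.hxx.
| std::unique_ptr< RChunkConstructor > fTraining | private |
Definition at line 138 of file RChunkLoader.hxx.
| std::unique_ptr< RChunkConstructor > fValidation | private |
Definition at line 139 of file RChunkLoader.hxx.
| float fValidationSplit | private |
Definition at line 117 of file RChunkLoader.hxx.
| std::size_t fVecPadding | private |
Definition at line 121 of file RChunkLoader.hxx.
| std::vector< std::size_t > fVecSizes | private |
Definition at line 119 of file RChunkLoader.hxx.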