Abstract interface to read data from an ntuple.
The page source is initialized with the columns of interest. Alias columns from projected fields are mapped to the corresponding physical columns. Pages from the columns of interest can then be mapped into memory. The page source also gives access to the ntuple's meta-data.
Definition at line 551 of file RPageStorage.hxx.
Classes | |
class | RActivePhysicalColumns |
Keeps track of the requested physical column IDs and their in-memory target type via a column element identifier. More... | |
struct | RClusterInfo |
Summarizes cluster-level information that is necessary to load a certain page. More... | |
struct | RCounters |
Default I/O performance counters that get registered in fMetrics. More... | |
struct | REntryRange |
Used in SetEntryRange / GetEntryRange. More... | |
class | RExclDescriptorGuard |
An RAII wrapper used for the writable access to RPageSource::fDescriptor. See GetSharedDescriptorGuard(). More... | |
class | RSharedDescriptorGuard |
An RAII wrapper used for the read-only access to RPageSource::fDescriptor. See GetExclDescriptorGuard(). More... | |
Public Member Functions | |
RPageSource (const RPageSource &)=delete | |
RPageSource (RPageSource &&)=delete | |
RPageSource (std::string_view ntupleName, const RNTupleReadOptions &fOptions) | |
~RPageSource () override | |
ColumnHandle_t | AddColumn (DescriptorId_t fieldId, RColumn &column) override |
Register a new column. | |
void | Attach () |
Open the physical storage container and deserialize header and footer. | |
std::unique_ptr< RPageSource > | Clone () const |
Open the same storage multiple times, e.g. for reading in multiple threads. | |
void | DropColumn (ColumnHandle_t columnHandle) override |
Unregisters a column. | |
REntryRange | GetEntryRange () const |
NTupleSize_t | GetNElements (ColumnHandle_t columnHandle) |
NTupleSize_t | GetNEntries () |
const RNTupleReadOptions & | GetReadOptions () const |
const RSharedDescriptorGuard | GetSharedDescriptorGuard () const |
Takes the read lock for the descriptor. | |
EPageStorageType | GetType () final |
Whether the concrete implementation is a sink or a source. | |
virtual std::vector< std::unique_ptr< RCluster > > | LoadClusters (std::span< RCluster::RKey > clusterKeys)=0 |
Populates all the pages of the given cluster ids and columns; it is possible that some columns do not contain any pages. | |
virtual RPageRef | LoadPage (ColumnHandle_t columnHandle, NTupleSize_t globalIndex) |
Allocates and fills a page that contains the index-th element. | |
virtual RPageRef | LoadPage (ColumnHandle_t columnHandle, RClusterIndex clusterIndex) |
Another version of LoadPage that allows specifying cluster-relative indexes. | |
virtual void | LoadSealedPage (DescriptorId_t physicalColumnId, RClusterIndex clusterIndex, RSealedPage &sealedPage)=0 |
Read the packed and compressed bytes of a page into the memory buffer provided by sealedPage. | |
void | LoadStructure () |
Loads header and footer without decompressing or deserializing them. | |
RPageSource & | operator= (const RPageSource &)=delete |
RPageSource & | operator= (RPageSource &&)=delete |
void | SetEntryRange (const REntryRange &range) |
Promise to only read from the given entry range. | |
RResult< RPage > | UnsealPage (const RSealedPage &sealedPage, const RColumnElementBase &element) |
void | UnzipCluster (RCluster *cluster) |
Parallel decompression and unpacking of the pages in the given cluster. | |
Public Member Functions inherited from ROOT::Experimental::Internal::RPageStorage | |
RPageStorage (const RPageStorage &other)=delete | |
RPageStorage (RPageStorage &&other)=default | |
RPageStorage (std::string_view name) | |
virtual | ~RPageStorage () |
ColumnId_t | GetColumnId (ColumnHandle_t columnHandle) const |
virtual Detail::RNTupleMetrics & | GetMetrics () |
Returns the default metrics object. | |
const std::string & | GetNTupleName () const |
Returns the NTuple name. | |
RPageStorage & | operator= (const RPageStorage &other)=delete |
RPageStorage & | operator= (RPageStorage &&other)=default |
void | SetTaskScheduler (RTaskScheduler *taskScheduler) |
Static Public Member Functions | |
static std::unique_ptr< RPageSource > | Create (std::string_view ntupleName, std::string_view location, const RNTupleReadOptions &options=RNTupleReadOptions()) |
Guess the concrete derived page source from the file name (location) | |
static RResult< RPage > | UnsealPage (const RSealedPage &sealedPage, const RColumnElementBase &element, RPageAllocator &pageAlloc) |
Helper for unstreaming a page. | |
Protected Member Functions | |
virtual RNTupleDescriptor | AttachImpl ()=0 |
LoadStructureImpl() has been called before AttachImpl() is called | |
virtual std::unique_ptr< RPageSource > | CloneImpl () const =0 |
Returns a new, unattached page source for the same data set. | |
void | EnableDefaultMetrics (const std::string &prefix) |
Enables the default set of metrics provided by RPageSource. | |
RExclDescriptorGuard | GetExclDescriptorGuard () |
Note that the underlying lock is not recursive. See GetSharedDescriptorGuard() for further information. | |
virtual RPageRef | LoadPageImpl (ColumnHandle_t columnHandle, const RClusterInfo &clusterInfo, ClusterSize_t::ValueType idxInCluster)=0 |
virtual void | LoadStructureImpl ()=0 |
void | PrepareLoadCluster (const RCluster::RKey &clusterKey, ROnDiskPageMap &pageZeroMap, std::function< void(DescriptorId_t, NTupleSize_t, const RClusterDescriptor::RPageRange::RPageInfo &)> perPageFunc) |
Prepare a page range read for the column set in clusterKey. | |
virtual void | UnzipClusterImpl (RCluster *cluster) |
Protected Member Functions inherited from ROOT::Experimental::Internal::RPageStorage | |
void | WaitForAllTasks () |
Protected Attributes | |
RActivePhysicalColumns | fActivePhysicalColumns |
The active columns are implicitly defined by the model fields or views. | |
std::unique_ptr< RCounters > | fCounters |
RNTupleReadOptions | fOptions |
RPagePool | fPagePool |
Pages that are unzipped with IMT are staged into the page pool. | |
Protected Attributes inherited from ROOT::Experimental::Internal::RPageStorage | |
Detail::RNTupleMetrics | fMetrics |
std::string | fNTupleName |
std::unique_ptr< RPageAllocator > | fPageAllocator |
For the time being, we will use the heap allocator for all sources and sinks. This may change in the future. | |
RTaskScheduler * | fTaskScheduler = nullptr |
Private Member Functions | |
void | UpdateLastUsedCluster (DescriptorId_t clusterId) |
Does nothing if fLastUsedCluster == clusterId. | |
Private Attributes | |
RNTupleDescriptor | fDescriptor |
std::shared_mutex | fDescriptorLock |
REntryRange | fEntryRange |
Used by the cluster pool to prevent reading beyond the given range. | |
bool | fHasStructure = false |
Set to true once LoadStructure() is called. | |
bool | fIsAttached = false |
Set to true once Attach() is called. | |
DescriptorId_t | fLastUsedCluster = kInvalidDescriptorId |
Remembers the last cluster id from which a page was requested. | |
std::map< NTupleSize_t, DescriptorId_t > | fPreloadedClusters |
Clusters from where pages got preloaded in UnzipClusterImpl(), ordered by first entry number of the clusters. | |
Additional Inherited Members | |
Public Types inherited from ROOT::Experimental::Internal::RPageStorage | |
using | ColumnHandle_t = RColumnHandle |
The column handle identifies a column within the currently open page storage. | |
using | SealedPageSequence_t = std::deque< RSealedPage > |
Static Public Attributes inherited from ROOT::Experimental::Internal::RPageStorage | |
static constexpr std::size_t | kNBytesPageChecksum = sizeof(std::uint64_t) |
The page checksum is a 64-bit xxhash3. | |
#include <ROOT/RPageStorage.hxx>
ROOT::Experimental::Internal::RPageSource::RPageSource | ( | std::string_view | ntupleName, |
const RNTupleReadOptions & | fOptions | ||
) |
Definition at line 145 of file RPageStorage.cxx.
|
delete |
|
delete |
|
override |
Definition at line 150 of file RPageStorage.cxx.
|
overridevirtual |
Register a new column.
When reading, the column must exist in the ntuple on disk corresponding to the meta-data. When writing, every column can only be attached once.
Implements ROOT::Experimental::Internal::RPageStorage.
Definition at line 173 of file RPageStorage.cxx.
void ROOT::Experimental::Internal::RPageSource::Attach | ( | ) |
Open the physical storage container and deserialize header and footer.
Definition at line 203 of file RPageStorage.cxx.
|
protectedpure virtual |
LoadStructureImpl() has been called before AttachImpl() is called.
Implemented in ROOT::Experimental::Internal::RPageSourceDaos, and ROOT::Experimental::Internal::RPageSourceFile.
std::unique_ptr< ROOT::Experimental::Internal::RPageSource > ROOT::Experimental::Internal::RPageSource::Clone | ( | ) | const |
Open the same storage multiple times, e.g. for reading in multiple threads.
If the source is already attached, the clone will be attached, too. The clone will, however, use its own connection to the underlying storage (e.g., file descriptor, XRootD handle, etc.).
Definition at line 211 of file RPageStorage.cxx.
|
protectedpure virtual |
Returns a new, unattached page source for the same data set.
Implemented in ROOT::Experimental::Internal::RPageSourceDaos, and ROOT::Experimental::Internal::RPageSourceFile.
|
static |
Guess the concrete derived page source from the file name (location)
Definition at line 153 of file RPageStorage.cxx.
|
overridevirtual |
Unregisters a column.
A page source decreases the reference counter for the corresponding active column. For a page sink, dropping columns is currently a no-op.
Implements ROOT::Experimental::Internal::RPageStorage.
Definition at line 183 of file RPageStorage.cxx.
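The reference counting mentioned for DropColumn() can be illustrated with a minimal, standard-library-only sketch. The class MockActiveColumns and its members are hypothetical names chosen for illustration; they are not the ROOT RActivePhysicalColumns interface, and the actual bookkeeping in ROOT may differ.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>

// Hypothetical stand-in for the bookkeeping of active columns; not the ROOT class.
class MockActiveColumns {
   // physical column ID -> number of users that registered the column
   std::map<std::uint64_t, std::size_t> fRefCounters;

public:
   // Called when a column is added: the first user activates the column.
   void Insert(std::uint64_t physicalColumnId) { ++fRefCounters[physicalColumnId]; }

   // Called when a column is dropped: the last user deactivates the column.
   void Erase(std::uint64_t physicalColumnId)
   {
      auto itr = fRefCounters.find(physicalColumnId);
      if (itr == fRefCounters.end())
         return; // unknown column: nothing to do in this sketch
      if (--itr->second == 0)
         fRefCounters.erase(itr);
   }

   bool IsActive(std::uint64_t physicalColumnId) const { return fRefCounters.count(physicalColumnId) > 0; }
};
```

In this sketch a column stays active until every user that added it has dropped it again, which is why dropping is a counter decrement rather than an unconditional removal.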
|
protected |
Enables the default set of metrics provided by RPageSource.
prefix will be used as the prefix for the counters registered in the internal RNTupleMetrics object. A subclass using the default set of metrics is responsible for updating the counters appropriately, e.g. fCounters->fNRead.Inc(). Alternatively, a subclass might provide its own RNTupleMetrics object by overriding the GetMetrics() member function.
Definition at line 432 of file RPageStorage.cxx.
|
inline |
Definition at line 772 of file RPageStorage.hxx.
|
inlineprotected |
Note that the underlying lock is not recursive. See GetSharedDescriptorGuard() for further information.
Definition at line 719 of file RPageStorage.hxx.
ROOT::Experimental::NTupleSize_t ROOT::Experimental::Internal::RPageSource::GetNElements | ( | ColumnHandle_t | columnHandle | ) |
Definition at line 227 of file RPageStorage.cxx.
ROOT::Experimental::NTupleSize_t ROOT::Experimental::Internal::RPageSource::GetNEntries | ( | ) |
Definition at line 222 of file RPageStorage.cxx.
|
inline |
Definition at line 743 of file RPageStorage.hxx.
|
inline |
Takes the read lock for the descriptor.
Multiple threads can take the lock concurrently. The underlying std::shared_mutex, however, is neither read nor write recursive: within one thread, at most one lock (shared or exclusive) may be held at any time. This requires special care in sections protected by GetSharedDescriptorGuard() and GetExclDescriptorGuard(), especially to avoid acquiring the locks indirectly (e.g. by a call to GetNEntries()). As a general guideline, no other method of the page source should be called (directly or indirectly) in a guarded section.
Definition at line 751 of file RPageStorage.hxx.
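The locking discipline described for GetSharedDescriptorGuard() can be sketched with the standard library alone. All names below (Descriptor, SharedGuard, ExclGuard, DescriptorHolder) are hypothetical stand-ins, not the ROOT classes; the sketch only assumes that the guards wrap a std::shared_mutex as stated above.

```cpp
#include <cassert>
#include <cstdint>
#include <mutex>
#include <shared_mutex>

// Hypothetical stand-in for the descriptor; not the ROOT RNTupleDescriptor.
struct Descriptor {
   std::uint64_t nEntries = 0;
};

// Read-only RAII guard: holds a shared lock for its whole lifetime.
class SharedGuard {
   const Descriptor &fDescriptor;
   std::shared_lock<std::shared_mutex> fLock;

public:
   SharedGuard(const Descriptor &desc, std::shared_mutex &mutex) : fDescriptor(desc), fLock(mutex) {}
   const Descriptor *operator->() const { return &fDescriptor; }
};

// Writable RAII guard: holds an exclusive lock for its whole lifetime.
class ExclGuard {
   Descriptor &fDescriptor;
   std::unique_lock<std::shared_mutex> fLock;

public:
   ExclGuard(Descriptor &desc, std::shared_mutex &mutex) : fDescriptor(desc), fLock(mutex) {}
   Descriptor *operator->() const { return &fDescriptor; }
};

// Hypothetical owner of the descriptor and its lock.
class DescriptorHolder {
   Descriptor fDescriptor;
   mutable std::shared_mutex fLock;

public:
   SharedGuard GetSharedGuard() const { return SharedGuard(fDescriptor, fLock); }
   ExclGuard GetExclGuard() { return ExclGuard(fDescriptor, fLock); }

   // Takes the shared lock internally. Because the mutex is not recursive,
   // this must not be called while the calling thread already holds a guard.
   std::uint64_t GetNEntries() const { return GetSharedGuard()->nEntries; }
};
```

The guards are kept in tight scopes so that the lock is released before any other method of the holder is called. In this sketch, calling GetNEntries() while the same thread holds a guard is exactly the kind of indirect acquisition the guideline warns about.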
|
inlinefinalvirtual |
Whether the concrete implementation is a sink or a source.
Implements ROOT::Experimental::Internal::RPageStorage.
Definition at line 742 of file RPageStorage.hxx.
|
pure virtual |
Populates all the pages of the given cluster ids and columns; it is possible that some columns do not contain any pages.
The page source may load more columns than the minimal necessary set from columns. To indicate which columns have been loaded, LoadClusters() must mark them with SetColumnAvailable(). That includes the ones from the columns that don't have pages; otherwise subsequent requests for the cluster would assume an incomplete cluster and trigger loading again.
LoadClusters() is typically called from the I/O thread of a cluster pool, i.e. the method runs concurrently to other methods of the page source.
Implemented in ROOT::Experimental::Internal::RPageSourceDaos, and ROOT::Experimental::Internal::RPageSourceFile.
|
virtual |
Allocates and fills a page that contains the index-th element.
The default implementation searches the page and calls LoadPageImpl(). Returns a default-constructed RPage for suppressed columns.
Definition at line 359 of file RPageStorage.cxx.
|
virtual |
Another version of LoadPage that allows specifying cluster-relative indexes.
Returns a default-constructed RPage for suppressed columns.
Definition at line 397 of file RPageStorage.cxx.
|
protectedpure virtual |
|
pure virtual |
Read the packed and compressed bytes of a page into the memory buffer provided by sealedPage.
The sealed page can be used subsequently in a call to RPageSink::CommitSealedPage. The fSize and fNElements members of the sealedPage parameter are always set. If sealedPage.fBuffer is nullptr, no data will be copied but the returned size information can be used by the caller to allocate a large enough buffer and call LoadSealedPage again.
Implemented in ROOT::Experimental::Internal::RPageSourceDaos, and ROOT::Experimental::Internal::RPageSourceFile.
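The two-step calling convention described for LoadSealedPage() (query the size with a null buffer, then allocate and call again) might look like the following sketch. SealedPage and MockSealedSource are hypothetical stand-ins that keep only the three members named above; the real RSealedPage and the real method signature (which also takes a column ID and a cluster index) are not reproduced here.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <utility>
#include <vector>

// Hypothetical stand-in for RSealedPage with only the members named in the text.
struct SealedPage {
   void *fBuffer = nullptr;
   std::size_t fSize = 0;
   std::size_t fNElements = 0;
};

// Hypothetical source that serves one page from an in-memory byte vector.
class MockSealedSource {
   std::vector<unsigned char> fOnDisk;
   std::size_t fNElements;

public:
   MockSealedSource(std::vector<unsigned char> bytes, std::size_t nElements)
      : fOnDisk(std::move(bytes)), fNElements(nElements)
   {
   }

   // Always sets the size information; copies bytes only if a buffer is provided.
   void LoadSealedPage(SealedPage &page) const
   {
      page.fSize = fOnDisk.size();
      page.fNElements = fNElements;
      if (page.fBuffer)
         std::memcpy(page.fBuffer, fOnDisk.data(), fOnDisk.size());
   }
};

// Caller side: first call queries the size, second call fills the buffer.
std::vector<unsigned char> ReadSealedBytes(const MockSealedSource &source)
{
   SealedPage page;
   source.LoadSealedPage(page); // fBuffer is nullptr: size query only
   std::vector<unsigned char> buffer(page.fSize);
   page.fBuffer = buffer.data();
   source.LoadSealedPage(page); // now the bytes are copied
   return buffer;
}
```

The benefit of this pattern is that the caller owns the allocation, so the same buffer can be reused or handed on without an extra copy.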
void ROOT::Experimental::Internal::RPageSource::LoadStructure | ( | ) |
Loads header and footer without decompressing or deserializing them.
This can be used to asynchronously open a file in the background. The method is idempotent and it is called as a first step in Attach(). Page sources may or may not make use of splitting loading and processing meta-data. Therefore, LoadStructure() may do nothing and defer loading the meta-data to Attach().
Definition at line 196 of file RPageStorage.cxx.
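The relationship between LoadStructure() and Attach() can be sketched as follows. MockStructureSource is a hypothetical class; only the flag names fHasStructure and fIsAttached are taken from this page, and the counters exist purely to make the idempotent behavior observable. Guarding Attach() against repeated calls is an assumption of the sketch.

```cpp
#include <cassert>

// Hypothetical page source that only models the two-phase opening.
class MockStructureSource {
   bool fHasStructure = false; // set once LoadStructure() has run
   bool fIsAttached = false;   // set once Attach() has run
   int fNStructureLoads = 0;   // how often the raw header/footer load ran
   int fNAttachSteps = 0;      // how often the deserialization step ran

   void LoadStructureImpl() { ++fNStructureLoads; }
   void AttachImpl()
   {
      assert(fHasStructure); // the structure is always loaded first
      ++fNAttachSteps;
   }

public:
   // Idempotent: repeated calls do not load the structure again.
   void LoadStructure()
   {
      if (fHasStructure)
         return;
      LoadStructureImpl();
      fHasStructure = true;
   }

   // Loads the structure as a first step, then processes the meta-data once.
   void Attach()
   {
      LoadStructure();
      if (!fIsAttached) {
         AttachImpl();
         fIsAttached = true;
      }
   }

   int GetNStructureLoads() const { return fNStructureLoads; }
   int GetNAttachSteps() const { return fNAttachSteps; }
};
```

With this split, a caller can trigger LoadStructure() early (for instance from a background task) and call Attach() later without paying for the raw load twice.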
|
protectedpure virtual |
|
delete |
|
delete |
|
protected |
Prepare a page range read for the column set in clusterKey.
Specifically, pages referencing the kTypePageZero locator are filled in pageZeroMap; otherwise, perPageFunc is called for each page. This is commonly used as part of LoadClusters() in derived classes.
Definition at line 308 of file RPageStorage.cxx.
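The dispatch performed by PrepareLoadCluster() (page-zero pages are recorded directly, all other pages are handed to a callback) can be illustrated with a simplified sketch. LocatorType, PageInfo and PreparePages are hypothetical; the real method works on RCluster::RKey, ROnDiskPageMap and RPageInfo, which are not modeled here.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <vector>

// Hypothetical locator kinds; only the page-zero case matters for the sketch.
enum class LocatorType { kFile, kPageZero };

// Hypothetical, reduced page description.
struct PageInfo {
   std::uint64_t fColumnId;
   std::uint64_t fPageNo;
   LocatorType fType;
};

// Pages with a page-zero locator need no I/O and are collected in pageZeroPages;
// every other page is passed to perPageFunc, e.g. to schedule a read request.
void PreparePages(const std::vector<PageInfo> &pages, std::vector<PageInfo> &pageZeroPages,
                  const std::function<void(const PageInfo &)> &perPageFunc)
{
   for (const auto &page : pages) {
      if (page.fType == LocatorType::kPageZero) {
         pageZeroPages.push_back(page);
         continue;
      }
      perPageFunc(page);
   }
}
```

A derived page source would typically use the callback to build its list of read requests, so that the storage-specific part stays out of the shared traversal logic.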
void ROOT::Experimental::Internal::RPageSource::SetEntryRange | ( | const REntryRange & | range | ) |
Promise to only read from the given entry range.
If set, prevents the cluster pool from reading ahead beyond the given range. The range needs to be within [0, GetNEntries()).
Definition at line 188 of file RPageStorage.cxx.
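The range condition stated for SetEntryRange() and its use for limiting read-ahead can be sketched as two small checks. EntryRange, IsValidRange and Intersects are hypothetical helpers; the member names fFirstEntry and fNEntries are assumptions and may not match the real REntryRange.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for REntryRange; member names are assumptions.
struct EntryRange {
   std::uint64_t fFirstEntry = 0;
   std::uint64_t fNEntries = 0;
};

// True if [fFirstEntry, fFirstEntry + fNEntries) lies within [0, nEntriesTotal).
// Written without computing fFirstEntry + fNEntries to avoid integer overflow.
bool IsValidRange(const EntryRange &range, std::uint64_t nEntriesTotal)
{
   return range.fFirstEntry <= nEntriesTotal && range.fNEntries <= nEntriesTotal - range.fFirstEntry;
}

// True if a cluster covering [clusterFirst, clusterFirst + clusterN) overlaps the range.
// Assumes both intervals were validated, so the sums cannot overflow.
bool Intersects(const EntryRange &range, std::uint64_t clusterFirst, std::uint64_t clusterN)
{
   return clusterFirst < range.fFirstEntry + range.fNEntries && range.fFirstEntry < clusterFirst + clusterN;
}
```

A read-ahead component could skip every cluster for which Intersects() returns false, which is the effect the promise made through SetEntryRange() is meant to have.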
ROOT::RResult< ROOT::Experimental::Internal::RPage > ROOT::Experimental::Internal::RPageSource::UnsealPage | ( | const RSealedPage & | sealedPage, |
const RColumnElementBase & | element | ||
) |
Definition at line 528 of file RPageStorage.cxx.
|
static |
Helper for unstreaming a page.
This is commonly used in derived, concrete page sources. The implementation currently always makes a memory copy, even if the sealed page is uncompressed and in the final memory layout. The optimization of directly mapping pages is left to the concrete page source implementations.
Definition at line 534 of file RPageStorage.cxx.
void ROOT::Experimental::Internal::RPageSource::UnzipCluster | ( | RCluster * | cluster | ) |
Parallel decompression and unpacking of the pages in the given cluster.
The unzipped pages are supposed to be preloaded in a page pool attached to the source. The method is triggered by the cluster pool's unzip thread. It is an optional optimization; the method can safely do nothing. In particular, the actual implementation will only run if a task scheduler is set. In practice, a task scheduler is set if implicit multi-threading is turned on.
Definition at line 232 of file RPageStorage.cxx.
|
protectedvirtual |
Definition at line 238 of file RPageStorage.cxx.
|
private |
Does nothing if fLastUsedCluster == clusterId.
Otherwise, updates fLastUsedCluster and evicts unused pages from the page pool of all previous clusters. Must not be called when the descriptor guard is taken.
Definition at line 334 of file RPageStorage.cxx.
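The eviction of previous clusters can be sketched with an ordered map keyed by first entry number, mirroring the description of fPreloadedClusters. MockPreloadTracker is a hypothetical class: it records evicted cluster IDs in a vector instead of talking to a page pool, it receives the first entry number as an argument instead of looking it up in the descriptor, and any further eviction policy the real implementation may apply is not modeled.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <vector>

// Hypothetical tracker for clusters whose pages were preloaded.
class MockPreloadTracker {
   std::map<std::uint64_t, std::uint64_t> fPreloadedClusters; // first entry -> cluster ID
   std::uint64_t fLastUsedCluster = static_cast<std::uint64_t>(-1);
   std::vector<std::uint64_t> fEvicted; // stands in for evictions from a page pool

public:
   void RegisterPreloaded(std::uint64_t firstEntry, std::uint64_t clusterId)
   {
      fPreloadedClusters[firstEntry] = clusterId;
   }

   // Evicts all preloaded clusters that start before the newly used cluster.
   void UpdateLastUsedCluster(std::uint64_t clusterId, std::uint64_t firstEntryOfCluster)
   {
      if (fLastUsedCluster == clusterId)
         return;
      auto itr = fPreloadedClusters.begin();
      while (itr != fPreloadedClusters.end() && itr->first < firstEntryOfCluster) {
         fEvicted.push_back(itr->second);
         itr = fPreloadedClusters.erase(itr);
      }
      fLastUsedCluster = clusterId;
   }

   const std::vector<std::uint64_t> &GetEvicted() const { return fEvicted; }
   std::size_t GetNPreloaded() const { return fPreloadedClusters.size(); }
};
```

Ordering the map by first entry number makes "all previous clusters" a prefix of the container, so eviction is a simple walk from the beginning.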
|
protected |
The active columns are implicitly defined by the model fields or views.
Definition at line 687 of file RPageStorage.hxx.
|
protected |
Definition at line 683 of file RPageStorage.hxx.
|
private |
Definition at line 605 of file RPageStorage.hxx.
|
mutableprivate |
Definition at line 606 of file RPageStorage.hxx.
|
private |
Used by the cluster pool to prevent reading beyond the given range.
Definition at line 607 of file RPageStorage.hxx.
|
private |
Set to true once LoadStructure() is called.
Definition at line 608 of file RPageStorage.hxx.
|
private |
Set to true once Attach() is called.
Definition at line 609 of file RPageStorage.hxx.
|
private |
Remembers the last cluster id from which a page was requested.
Definition at line 612 of file RPageStorage.hxx.
|
protected |
Definition at line 685 of file RPageStorage.hxx.
|
protected |
Pages that are unzipped with IMT are staged into the page pool.
Definition at line 690 of file RPageStorage.hxx.
|
private |
Clusters from where pages got preloaded in UnzipClusterImpl(), ordered by first entry number of the clusters.
If the last used cluster changes in LoadPage(), all unused pages from previous clusters are evicted from the page pool.
Definition at line 616 of file RPageStorage.hxx.