Abstract interface to read data from an ntuple.
The page source is initialized with the columns of interest. Alias columns from projected fields are mapped to the corresponding physical columns. Pages from the columns of interest can then be mapped into memory. The page source also gives access to the ntuple's meta-data.
Definition at line 558 of file RPageStorage.hxx.
Classes | |
class | RActivePhysicalColumns |
Keeps track of the requested physical column IDs and their in-memory target type via a column element identifier. More... | |
struct | RClusterInfo |
Summarizes cluster-level information that is necessary to load a certain page. More... | |
struct | RCounters |
Default I/O performance counters that get registered in fMetrics. More... | |
struct | REntryRange |
Used in SetEntryRange / GetEntryRange. More... | |
class | RExclDescriptorGuard |
An RAII wrapper used for the writable access to RPageSource::fDescriptor. See GetSharedDescriptorGuard(). More... | |
class | RSharedDescriptorGuard |
An RAII wrapper used for the read-only access to RPageSource::fDescriptor. See GetExclDescriptorGuard(). More... | |
Static Public Member Functions | |
static std::unique_ptr< RPageSource > | Create (std::string_view ntupleName, std::string_view location, const RNTupleReadOptions &options=RNTupleReadOptions()) |
Guess the concrete derived page source from the file name (location) | |
static RResult< RPage > | UnsealPage (const RSealedPage &sealedPage, const RColumnElementBase &element, RPageAllocator &pageAlloc) |
Helper for unstreaming a page. | |
Protected Attributes | |
RActivePhysicalColumns | fActivePhysicalColumns |
The active columns are implicitly defined by the model fields or views. | |
std::unique_ptr< RCounters > | fCounters |
RNTupleReadOptions | fOptions |
RPagePool | fPagePool |
Pages that are unzipped with IMT are staged into the page pool. | |
Detail::RNTupleMetrics | fMetrics |
std::string | fNTupleName |
std::unique_ptr< RPageAllocator > | fPageAllocator |
For the time being, we will use the heap allocator for all sources and sinks. This may change in the future. | |
RTaskScheduler * | fTaskScheduler = nullptr |
Private Member Functions | |
void | UpdateLastUsedCluster (DescriptorId_t clusterId) |
Does nothing if fLastUsedCluster == clusterId. | |
Private Attributes | |
RNTupleDescriptor | fDescriptor |
std::shared_mutex | fDescriptorLock |
REntryRange | fEntryRange |
Used by the cluster pool to prevent reading beyond the given range. | |
bool | fHasStructure = false |
Set to true once LoadStructure() is called. | |
bool | fIsAttached = false |
Set to true once Attach() is called. | |
DescriptorId_t | fLastUsedCluster = kInvalidDescriptorId |
Remembers the last cluster id from which a page was requested. | |
std::map< NTupleSize_t, DescriptorId_t > | fPreloadedClusters |
Clusters from where pages got preloaded in UnzipClusterImpl(), ordered by first entry number of the clusters. | |
Additional Inherited Members | |
using | ColumnHandle_t = RColumnHandle |
The column handle identifies a column with the current open page storage. | |
using | SealedPageSequence_t = std::deque<RSealedPage> |
static constexpr std::size_t | kNBytesPageChecksum = sizeof(std::uint64_t) |
The page checksum is a 64-bit xxhash3. | |
#include <ROOT/RPageStorage.hxx>
ROOT::Experimental::Internal::RPageSource::RPageSource | ( | std::string_view | ntupleName, |
const RNTupleReadOptions & | fOptions ) |
Definition at line 145 of file RPageStorage.cxx.
|
delete |
|
delete |
|
override |
Definition at line 150 of file RPageStorage.cxx.
|
overridevirtual |
Register a new column.
When reading, the column must exist in the ntuple on disk corresponding to the meta-data. When writing, every column can only be attached once.
Implements ROOT::Experimental::Internal::RPageStorage.
Definition at line 173 of file RPageStorage.cxx.
void ROOT::Experimental::Internal::RPageSource::Attach | ( | RNTupleSerializer::EDescriptorDeserializeMode | mode = RNTupleSerializer::EDescriptorDeserializeMode::kForReading | ) |
Open the physical storage container and deserialize header and footer.
Definition at line 203 of file RPageStorage.cxx.
|
protectedpure virtual |
LoadStructureImpl() has been called before AttachImpl() is called.
Implemented in ROOT::Experimental::Internal::RPageSourceDaos, and ROOT::Experimental::Internal::RPageSourceFile.
std::unique_ptr< ROOT::Experimental::Internal::RPageSource > ROOT::Experimental::Internal::RPageSource::Clone | ( | ) | const |
Open the same storage multiple times, e.g. for reading in multiple threads.
If the source is already attached, the clone will be attached, too. The clone will, however, use its own connection to the underlying storage (e.g., file descriptor, XRootD handle, etc.).
Definition at line 211 of file RPageStorage.cxx.
|
protectedpure virtual |
Returns a new, unattached page source for the same data set.
Implemented in ROOT::Experimental::Internal::RPageSourceDaos, and ROOT::Experimental::Internal::RPageSourceFile.
|
static |
Guess the concrete derived page source from the file name (location)
Definition at line 153 of file RPageStorage.cxx.
|
overridevirtual |
Unregisters a column.
A page source decreases the reference counter for the corresponding active column. For a page sink, dropping columns is currently a no-op.
Implements ROOT::Experimental::Internal::RPageStorage.
Definition at line 183 of file RPageStorage.cxx.
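The reference counting described for DropColumn() can be pictured with a small, self-contained sketch. It is not the ROOT implementation: `ToyActiveColumns` and its members are hypothetical names, and the real RActivePhysicalColumns also tracks the in-memory target type, which is left out here.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>

// Hypothetical, simplified stand-in for a set of active columns.
// Assumption: a column is identified by a plain integer ID only.
class ToyActiveColumns {
   std::map<std::uint64_t, std::size_t> fRefCounts; // column ID -> reference count

public:
   // Adding the same column again only bumps its reference counter.
   void Insert(std::uint64_t columnId) { ++fRefCounts[columnId]; }

   // Dropping a column decreases the counter; the column stays active
   // until the last user has dropped it.
   void Erase(std::uint64_t columnId)
   {
      auto it = fRefCounts.find(columnId);
      assert(it != fRefCounts.end() && "column was never added");
      if (--it->second == 0)
         fRefCounts.erase(it);
   }

   bool IsActive(std::uint64_t columnId) const { return fRefCounts.count(columnId) > 0; }
};
```

With this sketch, a column added twice and dropped once is still reported as active; only the second drop removes it.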
|
protected |
Enables the default set of metrics provided by RPageSource.
prefix will be used as the prefix for the counters registered in the internal RNTupleMetrics object. A subclass using the default set of metrics is responsible for updating the counters appropriately, e.g. fCounters->fNRead.Inc().
Alternatively, a subclass might provide its own RNTupleMetrics object by overriding the GetMetrics() member function.
Definition at line 434 of file RPageStorage.cxx.
|
inline |
Definition at line 780 of file RPageStorage.hxx.
|
inlineprotected |
Note that the underlying lock is not recursive. See GetSharedDescriptorGuard() for further information.
Definition at line 726 of file RPageStorage.hxx.
ROOT::Experimental::NTupleSize_t ROOT::Experimental::Internal::RPageSource::GetNElements | ( | ColumnHandle_t | columnHandle | ) |
Definition at line 227 of file RPageStorage.cxx.
ROOT::Experimental::NTupleSize_t ROOT::Experimental::Internal::RPageSource::GetNEntries | ( | ) |
Definition at line 222 of file RPageStorage.cxx.
|
inline |
Definition at line 750 of file RPageStorage.hxx.
|
inline |
Takes the read lock for the descriptor.
Multiple threads can take the lock concurrently. The underlying std::shared_mutex, however, is neither read nor write recursive: within one thread, only one lock (shared or exclusive) must be acquired at the same time. This requires special care in sections protected by GetSharedDescriptorGuard() and GetExclDescriptorGuard(), especially to avoid that the locks are acquired indirectly (e.g. by a call to GetNEntries()). As a general guideline, no other method of the page source should be called (directly or indirectly) in a guarded section.
Definition at line 758 of file RPageStorage.hxx.
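The locking guideline can be illustrated with a minimal sketch of RAII guards over a std::shared_mutex. All type and function names below (`ToyDescriptor`, `ToyGuardedSource`, `ReadNEntries`) are hypothetical and only mimic the idea; they are not the ROOT classes.

```cpp
#include <cassert>
#include <cstdint>
#include <mutex>
#include <shared_mutex>

// Hypothetical stand-in for the ntuple descriptor.
struct ToyDescriptor {
   std::uint64_t fNEntries = 0;
};

// Hypothetical source whose descriptor is protected by a std::shared_mutex.
class ToyGuardedSource {
   ToyDescriptor fDescriptor;
   mutable std::shared_mutex fDescriptorLock;

public:
   // Read-only guard: holds a shared lock for its whole lifetime.
   class SharedGuard {
      const ToyDescriptor &fDesc;
      std::shared_lock<std::shared_mutex> fLock;

   public:
      SharedGuard(const ToyDescriptor &desc, std::shared_mutex &lock) : fDesc(desc), fLock(lock) {}
      const ToyDescriptor *operator->() const { return &fDesc; }
   };

   // Writable guard: holds the exclusive lock for its whole lifetime.
   class ExclGuard {
      ToyDescriptor &fDesc;
      std::unique_lock<std::shared_mutex> fLock;

   public:
      ExclGuard(ToyDescriptor &desc, std::shared_mutex &lock) : fDesc(desc), fLock(lock) {}
      ToyDescriptor *operator->() const { return &fDesc; }
   };

   SharedGuard GetSharedGuard() const { return SharedGuard(fDescriptor, fDescriptorLock); }
   ExclGuard GetExclGuard() { return ExclGuard(fDescriptor, fDescriptorLock); }
};

// Copy what is needed inside the guarded section and release the lock
// before doing anything that might take the lock again.
std::uint64_t ReadNEntries(const ToyGuardedSource &source)
{
   std::uint64_t nEntries = 0;
   {
      auto guard = source.GetSharedGuard();
      nEntries = guard->fNEntries;
   } // shared lock released here
   return nEntries;
}
```

Keeping the guarded scope this small is one way to follow the guideline: no other method of the source runs while the guard is alive, so the non-recursive lock cannot be taken twice by the same thread.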
|
inlinefinalvirtual |
Whether the concrete implementation is a sink or a source.
Implements ROOT::Experimental::Internal::RPageStorage.
Definition at line 749 of file RPageStorage.hxx.
|
pure virtual |
Populates all the pages of the given cluster ids and columns; it is possible that some columns do not contain any pages.
The page source may load more columns than the minimal necessary set from columns. To indicate which columns have been loaded, LoadClusters() must mark them with SetColumnAvailable(). That includes the ones from columns that don't have pages; otherwise subsequent requests for the cluster would assume an incomplete cluster and trigger loading again.
LoadClusters() is typically called from the I/O thread of a cluster pool, i.e. the method runs concurrently to other methods of the page source.
Implemented in ROOT::Experimental::Internal::RPageSourceDaos, and ROOT::Experimental::Internal::RPageSourceFile.
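The requirement to mark page-less columns as available can be sketched as follows. This is a toy model under stated assumptions: `ToyCluster`, `ToyLoadCluster` and `NeedsReload` are hypothetical names, and storage is represented as a simple map from column ID to page bytes.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <set>
#include <vector>

using ToyPageBytes = std::vector<unsigned char>;
using ToyStorage = std::map<std::uint64_t, ToyPageBytes>; // column ID -> page bytes

// Hypothetical in-memory cluster: loaded pages plus the set of columns
// that the loader has declared as available.
struct ToyCluster {
   std::set<std::uint64_t> fAvailableColumns;
   ToyStorage fPages;

   void SetColumnAvailable(std::uint64_t columnId) { fAvailableColumns.insert(columnId); }
   bool ContainsColumn(std::uint64_t columnId) const { return fAvailableColumns.count(columnId) > 0; }
};

// Loads the requested columns. Every requested column is marked as
// available, including those without any page in storage.
ToyCluster ToyLoadCluster(const ToyStorage &storage, const std::set<std::uint64_t> &columns)
{
   ToyCluster cluster;
   for (auto columnId : columns) {
      auto it = storage.find(columnId);
      if (it != storage.end())
         cluster.fPages[columnId] = it->second;
      cluster.SetColumnAvailable(columnId);
   }
   return cluster;
}

// A later request treats the cluster as incomplete if any requested
// column was not marked, which would trigger another load.
bool NeedsReload(const ToyCluster &cluster, const std::set<std::uint64_t> &columns)
{
   for (auto columnId : columns) {
      if (!cluster.ContainsColumn(columnId))
         return true;
   }
   return false;
}
```

If the loader skipped the marking step for a column without pages, `NeedsReload` would keep returning true for every request that includes that column.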
|
virtual |
Allocates and fills a page that contains the index-th element.
The default implementation searches the page and calls LoadPageImpl(). Returns a default-constructed RPage for suppressed columns.
Definition at line 361 of file RPageStorage.cxx.
|
virtual |
Another version of LoadPage that allows specifying cluster-relative indexes.
Returns a default-constructed RPage for suppressed columns.
Definition at line 399 of file RPageStorage.cxx.
|
protectedpure virtual |
|
pure virtual |
Read the packed and compressed bytes of a page into the memory buffer provided by sealedPage.
The sealed page can be used subsequently in a call to RPageSink::CommitSealedPage. The fSize and fNElements members of the sealedPage parameter are always set. If sealedPage.fBuffer is nullptr, no data will be copied but the returned size information can be used by the caller to allocate a large enough buffer and call LoadSealedPage again.
Implemented in ROOT::Experimental::Internal::RPageSourceDaos, and ROOT::Experimental::Internal::RPageSourceFile.
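The two-call pattern described for LoadSealedPage (first query the size with a null buffer, then allocate and read) can be sketched with a mock loader. `ToySealedPage`, `ToyLoadSealedPage` and `ReadSealedBytes` are hypothetical; only the member names fBuffer, fSize and fNElements are taken from the text above, and the mock signature does not match the real method.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

// Hypothetical stand-in for a sealed page with the three members named above.
struct ToySealedPage {
   void *fBuffer = nullptr;
   std::size_t fSize = 0;
   std::size_t fNElements = 0;
};

// Mock loader: always sets fSize and fNElements, and copies the bytes
// only if the caller supplied a buffer.
void ToyLoadSealedPage(const std::vector<unsigned char> &onDisk, std::size_t nElements, ToySealedPage &sealedPage)
{
   sealedPage.fSize = onDisk.size();
   sealedPage.fNElements = nElements;
   if (sealedPage.fBuffer != nullptr)
      std::memcpy(sealedPage.fBuffer, onDisk.data(), onDisk.size());
}

// Two-step read: query the size, allocate, then read the bytes.
std::vector<unsigned char> ReadSealedBytes(const std::vector<unsigned char> &onDisk, std::size_t nElements)
{
   ToySealedPage sealedPage;
   ToyLoadSealedPage(onDisk, nElements, sealedPage); // first call: size information only

   std::vector<unsigned char> buffer(sealedPage.fSize);
   sealedPage.fBuffer = buffer.data();
   ToyLoadSealedPage(onDisk, nElements, sealedPage); // second call: copies the data
   assert(sealedPage.fSize == buffer.size());
   return buffer;
}
```

The caller owns the buffer in this sketch, which matches the description that the page source only fills memory provided through sealedPage.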
void ROOT::Experimental::Internal::RPageSource::LoadStructure | ( | ) |
Loads header and footer without decompressing or deserializing them.
This can be used to asynchronously open a file in the background. The method is idempotent and it is called as a first step in Attach(). Page sources may or may not make use of splitting loading and processing meta-data. Therefore, LoadStructure() may do nothing and defer loading the meta-data to Attach().
Definition at line 196 of file RPageStorage.cxx.
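The relationship between LoadStructure() and Attach() can be pictured with a small state sketch. `ToyAttachableSource` and its counters are hypothetical; the flag names fHasStructure and fIsAttached come from the member list, but how the real class uses them, including whether a repeated Attach() is skipped, is an assumption here.

```cpp
#include <cassert>

// Hypothetical source that counts how often the expensive steps run.
class ToyAttachableSource {
   bool fHasStructure = false;
   bool fIsAttached = false;

public:
   int fNStructureLoads = 0; // times the raw header/footer were "read"
   int fNAttaches = 0;       // times the meta-data was "deserialized"

   // Idempotent: repeated calls perform the load only once.
   void LoadStructure()
   {
      if (fHasStructure)
         return;
      ++fNStructureLoads; // placeholder for reading header and footer bytes
      fHasStructure = true;
   }

   // Loads the structure as a first step, then processes it once.
   // Assumption: a second Attach() call is a no-op.
   void Attach()
   {
      LoadStructure();
      if (fIsAttached)
         return;
      ++fNAttaches; // placeholder for decompressing and deserializing
      fIsAttached = true;
   }
};
```

Calling LoadStructure() early, for example from a background thread, and Attach() later would then trigger the load step exactly once in this sketch.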
|
protectedpure virtual |
|
delete |
|
delete |
|
protected |
Prepare a page range read for the column set in clusterKey.
Specifically, pages referencing the kTypePageZero locator are filled in pageZeroMap; otherwise, perPageFunc is called for each page. This is commonly used as part of LoadClusters() in derived classes.
Definition at line 309 of file RPageStorage.cxx.
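The split between page-zero pages and pages that need a real read can be sketched as a simple dispatch loop. Everything here is hypothetical and simplified: `ToyLocatorType`, `ToyPageInfo` and `ToyPrepareLoad` are illustrative names, and a plain vector stands in for pageZeroMap.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <vector>

// Hypothetical locator kinds; only kTypePageZero is named in the text above.
enum class ToyLocatorType { kTypeFile, kTypePageZero };

struct ToyPageInfo {
   std::uint64_t fPageNo;
   ToyLocatorType fType;
};

// Pages with a page-zero locator are recorded directly; for all other
// pages the callback decides how to schedule the read.
void ToyPrepareLoad(const std::vector<ToyPageInfo> &pages, std::vector<std::uint64_t> &pageZeroPages,
                    const std::function<void(const ToyPageInfo &)> &perPageFunc)
{
   assert(perPageFunc && "a per-page callback is required");
   for (const auto &page : pages) {
      if (page.fType == ToyLocatorType::kTypePageZero)
         pageZeroPages.push_back(page.fPageNo); // no I/O needed for this page
      else
         perPageFunc(page); // e.g. queue a read request
   }
}
```

A derived source could pass a callback that collects read requests, which keeps the storage-specific I/O out of the shared preparation step.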
void ROOT::Experimental::Internal::RPageSource::SetEntryRange | ( | const REntryRange & | range | ) |
Promise to only read from the given entry range.
If set, prevents the cluster pool from reading ahead beyond the given range. The range needs to be within [0, GetNEntries()).
Definition at line 188 of file RPageStorage.cxx.
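The range constraint and its effect on read-ahead can be sketched with two small checks. `ToyEntryRange`, `IsWithin` and `Intersects` are hypothetical helpers; the layout of the real REntryRange and the way an invalid range is reported are not shown on this page and are assumptions.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical entry range: first entry and number of entries.
struct ToyEntryRange {
   std::uint64_t fFirst = 0;
   std::uint64_t fNEntries = 0;
};

// True if the range lies within [0, nEntriesTotal). The subtraction form
// avoids overflow in fFirst + fNEntries.
bool IsWithin(const ToyEntryRange &range, std::uint64_t nEntriesTotal)
{
   return range.fFirst <= nEntriesTotal && range.fNEntries <= nEntriesTotal - range.fFirst;
}

// True if the cluster [clusterFirst, clusterFirst + clusterNEntries)
// overlaps the range; a read-ahead would skip clusters for which this
// returns false. Assumes the sums do not overflow.
bool Intersects(const ToyEntryRange &range, std::uint64_t clusterFirst, std::uint64_t clusterNEntries)
{
   return clusterFirst < range.fFirst + range.fNEntries && range.fFirst < clusterFirst + clusterNEntries;
}
```

For a data set of 100 entries, a range starting at entry 10 with 20 entries passes `IsWithin`, while one starting at entry 90 with 20 entries does not.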
ROOT::RResult< ROOT::Experimental::Internal::RPage > ROOT::Experimental::Internal::RPageSource::UnsealPage | ( | const RSealedPage & | sealedPage, |
const RColumnElementBase & | element ) |
Definition at line 530 of file RPageStorage.cxx.
|
static |
Helper for unstreaming a page.
This is commonly used in derived, concrete page sources. The implementation currently always makes a memory copy, even if the sealed page is uncompressed and in the final memory layout. The optimization of directly mapping pages is left to the concrete page source implementations.
Definition at line 536 of file RPageStorage.cxx.
void ROOT::Experimental::Internal::RPageSource::UnzipCluster | ( | RCluster * | cluster | ) |
Parallel decompression and unpacking of the pages in the given cluster.
The unzipped pages are supposed to be preloaded in a page pool attached to the source. The method is triggered by the cluster pool's unzip thread. It is an optional optimization; the method can safely do nothing. In particular, the actual implementation will only run if a task scheduler is set. In practice, a task scheduler is set if implicit multi-threading is turned on.
Definition at line 232 of file RPageStorage.cxx.
|
protectedvirtual |
Definition at line 238 of file RPageStorage.cxx.
|
private |
Does nothing if fLastUsedCluster == clusterId.
Otherwise, updates fLastUsedCluster and evicts unused pages from the page pool of all previous clusters. Must not be called when the descriptor guard is taken.
Definition at line 335 of file RPageStorage.cxx.
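The eviction step can be illustrated with a toy model that keeps preloaded clusters ordered by their first entry number. This is a sketch under assumptions, not the ROOT code: `ToyEvictor` is hypothetical, the first entry number is passed in explicitly instead of being looked up in the descriptor, and "previous clusters" is interpreted as clusters that start before the current one.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <vector>

using ToyClusterId = std::uint64_t;
constexpr ToyClusterId kToyInvalidId = static_cast<ToyClusterId>(-1);

struct ToyEvictor {
   ToyClusterId fLastUsedCluster = kToyInvalidId;
   // First entry number of the cluster -> cluster ID, ordered by first entry.
   std::map<std::uint64_t, ToyClusterId> fPreloadedClusters;
   // Records which clusters had their pages evicted (stands in for the page pool).
   std::vector<ToyClusterId> fEvicted;

   void Update(ToyClusterId clusterId, std::uint64_t firstEntry)
   {
      if (fLastUsedCluster == clusterId)
         return; // nothing to do
      fLastUsedCluster = clusterId;
      // All preloaded clusters that start before the current one are evicted.
      auto end = fPreloadedClusters.lower_bound(firstEntry);
      for (auto it = fPreloadedClusters.begin(); it != end; ++it)
         fEvicted.push_back(it->second);
      fPreloadedClusters.erase(fPreloadedClusters.begin(), end);
   }
};
```

Because the map is ordered by first entry number, the clusters to evict form a prefix of the map and can be removed with a single range erase.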
|
protected |
The active columns are implicitly defined by the model fields or views.
Definition at line 694 of file RPageStorage.hxx.
|
protected |
Definition at line 690 of file RPageStorage.hxx.
|
private |
Definition at line 612 of file RPageStorage.hxx.
|
mutableprivate |
Definition at line 613 of file RPageStorage.hxx.
|
private |
Used by the cluster pool to prevent reading beyond the given range.
Definition at line 614 of file RPageStorage.hxx.
Set to true once LoadStructure() is called.
Definition at line 615 of file RPageStorage.hxx.
Set to true once Attach() is called.
Definition at line 616 of file RPageStorage.hxx.
|
private |
Remembers the last cluster id from which a page was requested.
Definition at line 619 of file RPageStorage.hxx.
|
protected |
Definition at line 692 of file RPageStorage.hxx.
|
protected |
Pages that are unzipped with IMT are staged into the page pool.
Definition at line 697 of file RPageStorage.hxx.
|
private |
Clusters from where pages got preloaded in UnzipClusterImpl(), ordered by first entry number of the clusters.
If the last used cluster changes in LoadPage(), all unused pages from previous clusters are evicted from the page pool.
Definition at line 623 of file RPageStorage.hxx.