Storage provider that reads ntuple pages from a file.
Definition at line 124 of file RPageStorageFile.hxx.
Classes | |
| struct | RFileCounters |
| File-specific I/O performance counters. More... | |
Public Types | |
| using | ColumnHandle_t = RColumnHandle |
| The column handle identifies a column with the current open page storage. | |
| using | SealedPageSequence_t = std::deque<RSealedPage> |
Static Public Member Functions | |
| static std::unique_ptr< RPageSource > | Create (std::string_view ntupleName, std::string_view location, const ROOT::RNTupleReadOptions &options=ROOT::RNTupleReadOptions()) |
| Guess the concrete derived page source from the file name (location) | |
| static std::unique_ptr< RPageSourceFile > | CreateFromAnchor (const RNTuple &anchor, const ROOT::RNTupleReadOptions &options=ROOT::RNTupleReadOptions()) |
| Used from the RNTuple class to build a datasource if the anchor is already available. | |
| static RResult< ROOT::Internal::RPage > | UnsealPage (const RSealedPage &sealedPage, const ROOT::Internal::RColumnElementBase &element, ROOT::Internal::RPageAllocator &pageAlloc) |
| Helper for unstreaming a page. | |
Static Public Attributes | |
| static constexpr std::size_t | kNBytesPageChecksum = sizeof(std::uint64_t) |
| The page checksum is a 64bit xxhash3. | |
Protected Attributes | |
| std::unique_ptr< RCounters > | fCounters |
| ROOT::Experimental::Detail::RNTupleMetrics | fMetrics |
| std::string | fNTupleName |
| ROOT::RNTupleReadOptions | fOptions |
| std::unique_ptr< ROOT::Internal::RPageAllocator > | fPageAllocator |
| For the time being, we will use the heap allocator for all sources and sinks. This may change in the future. | |
| RStructureBuffer | fStructureBuffer |
| Populated by LoadStructureImpl(), reset at the end of Attach() | |
| RTaskScheduler * | fTaskScheduler = nullptr |
Private Member Functions | |
| RPageSourceFile (std::string_view ntupleName, const ROOT::RNTupleReadOptions &options) | |
| ROOT::Internal::RPageRef | LoadPageFromSummary (ColumnHandle_t columnHandle, const RPageSummary &pageSummary) |
| ROOT::Internal::RPageRef | LoadZeroPage (ColumnHandle_t columnHandle, const RPageSummary &pageSummary) |
| std::unique_ptr< ROOT::Internal::RCluster > | PrepareSingleCluster (const ROOT::Internal::RCluster::RKey &clusterKey, std::vector< RRawFile::RIOVec > &readRequests) |
| Helper function for LoadClusters: it prepares the memory buffer (page map) and the read requests for a given cluster and columns. | |
| void | UpdateLastUsedCluster (ROOT::DescriptorId_t clusterId) |
| Does nothing if fLastUsedCluster == clusterId. | |
Private Attributes | |
| RActivePhysicalColumns | fActivePhysicalColumns |
| The active columns are implicitly defined by the model fields or views. | |
| std::optional< RNTuple > | fAnchor |
| Either provided by CreateFromAnchor, or read from the ROOT file given the ntuple name. | |
| ROOT::Internal::RClusterPool | fClusterPool |
| The cluster pool asynchronously preloads the next few clusters. | |
| ROOT::Internal::RCluster * | fCurrentCluster = nullptr |
| The last cluster from which a page got loaded. Points into fClusterPool->fPool. | |
| ROOT::RNTupleDescriptor | fDescriptor |
| RNTupleDescriptorBuilder | fDescriptorBuilder |
| The descriptor is created from the header and footer either in AttachImpl or in CreateFromAnchor. | |
| std::shared_mutex | fDescriptorLock |
| REntryRange | fEntryRange |
| Used by the cluster pool to prevent reading beyond the given range. | |
| std::unique_ptr< RRawFile > | fFile |
| An RRawFile is used to request the necessary byte ranges from a local or a remote file. | |
| std::unique_ptr< RFileCounters > | fFileCounters |
| std::int64_t | fFileSize = 0 |
| Total file size, set once in AttachImpl() | |
| bool | fHasStreamerInfosRegistered = false |
| Set to true when RegisterStreamerInfos() is called. | |
| bool | fHasStructure = false |
| Set to true once LoadStructure() is called. | |
| bool | fIsAttached = false |
| Set to true once Attach() is called. | |
| std::uint64_t | fLastOffset = 0 |
| Tracks the last read offset for seek distance calculation. | |
| ROOT::DescriptorId_t | fLastUsedCluster = ROOT::kInvalidDescriptorId |
| Remembers the last cluster id from which a page was requested. | |
| ROOT::Internal::RPagePool | fPagePool |
| Pages that are unzipped with IMT are staged into the page pool. | |
| std::unordered_set< ROOT::DescriptorId_t > | fPinnedClusters |
| Pinned clusters and their $2 * (cluster bunch size) - 1$ successors will not be evicted from the cluster pool. | |
| std::map< ROOT::NTupleSize_t, ROOT::DescriptorId_t > | fPreloadedClusters |
| Clusters from where pages got preloaded in UnzipClusterImpl(), ordered by first entry number of the clusters. | |
| ROOT::Internal::RMiniFileReader | fReader |
| Takes the fFile to read ntuple blobs from it. | |
Friends | |
| class | ROOT::RNTuple |
#include <ROOT/RPageStorageFile.hxx>
The column handle identifies a column with the current open page storage.
Definition at line 180 of file RPageStorage.hxx.
|
inherited |
Definition at line 130 of file RPageStorage.hxx.
|
private |
Definition at line 323 of file RPageStorageFile.cxx.
| ROOT::Internal::RPageSourceFile::RPageSourceFile | ( | std::string_view | ntupleName, |
| std::string_view | path, | ||
| const ROOT::RNTupleReadOptions & | options ) |
Definition at line 379 of file RPageStorageFile.cxx.
| ROOT::Internal::RPageSourceFile::RPageSourceFile | ( | std::string_view | ntupleName, |
| std::unique_ptr< RRawFile > | file, | ||
| const ROOT::RNTupleReadOptions & | options ) |
Definition at line 369 of file RPageStorageFile.cxx.
|
delete |
|
delete |
|
override |
Definition at line 411 of file RPageStorageFile.cxx.
|
overridevirtualinherited |
Register a new column.
When reading, the column must exist in the ntuple on disk corresponding to the metadata. When writing, every column can only be attached once.
Implements ROOT::Internal::RPageStorage.
Definition at line 195 of file RPageStorage.cxx.
|
inherited |
Open the physical storage container and deserialize header and footer.
Definition at line 225 of file RPageStorage.cxx.
|
finalprotectedvirtual |
LoadStructureImpl() has been called before AttachImpl() is called
Implements ROOT::Internal::RPageSource.
Definition at line 476 of file RPageStorageFile.cxx.
|
inherited |
Open the same storage multiple times, e.g., for reading in multiple threads.
If the source is already attached, the clone will be attached, too. The clone will, however, use its own connection to the underlying storage (e.g., file descriptor, XRootD handle).
Definition at line 251 of file RPageStorage.cxx.
|
finalprotectedvirtual |
The cloned page source creates a new raw file and reader and opens its own file descriptor to the data.
Implements ROOT::Internal::RPageSource.
Definition at line 523 of file RPageStorageFile.cxx.
|
staticinherited |
Guess the concrete derived page source from the file name (location)
Definition at line 175 of file RPageStorage.cxx.
|
static |
Used from the RNTuple class to build a datasource if the anchor is already available.
Requires the RNTuple object to be streamed from a file.
Definition at line 386 of file RPageStorageFile.cxx.
|
overridevirtualinherited |
Unregisters a column.
A page source decreases the reference counter for the corresponding active column. For a page sink, dropping columns is currently a no-op.
Implements ROOT::Internal::RPageStorage.
Definition at line 205 of file RPageStorage.cxx.
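The reference counting mentioned for page sources can be pictured with a minimal sketch. The class `ActiveColumnsSketch` and its member names are hypothetical and are not ROOT's RActivePhysicalColumns; the sketch only assumes that each physical column id maps to a counter that is incremented on registration and decremented on drop.

```cpp
#include <cstddef>
#include <cstdint>
#include <unordered_map>

// Hypothetical stand-in, not ROOT's RActivePhysicalColumns.
class ActiveColumnsSketch {
   std::unordered_map<std::uint64_t, std::size_t> fRefCounts;

public:
   // Registering the same column again just bumps its counter.
   void Insert(std::uint64_t physicalColumnId) { ++fRefCounts[physicalColumnId]; }

   // Dropping decreases the counter; the column stops being active at zero.
   void Erase(std::uint64_t physicalColumnId)
   {
      auto it = fRefCounts.find(physicalColumnId);
      if (it == fRefCounts.end())
         return; // unknown column: nothing to do in this sketch
      if (--it->second == 0)
         fRefCounts.erase(it);
   }

   bool IsActive(std::uint64_t physicalColumnId) const { return fRefCounts.count(physicalColumnId) > 0; }
};
```

In this sketch a column registered twice stays active after the first drop and becomes inactive only after the second.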
|
protectedinherited |
Enables the default set of metrics provided by RPageSource.
prefix will be used as the prefix for the counters registered in the internal RNTupleMetrics object. A subclass using the default set of metrics is responsible for updating the counters appropriately, e.g., fCounters->fNRead.Inc(). Alternatively, a subclass might provide its own RNTupleMetrics object by overriding the GetMetrics() member function.
Definition at line 583 of file RPageStorage.cxx.
|
inlineinherited |
Definition at line 188 of file RPageStorage.hxx.
|
inlineinherited |
Definition at line 840 of file RPageStorage.hxx.
|
inlineprotectedinherited |
Note that the underlying lock is not recursive. See GetSharedDescriptorGuard() for further information.
Definition at line 782 of file RPageStorage.hxx.
|
inlinevirtualinherited |
Returns the default metrics object.
Subclasses might alternatively provide their own metrics object by overriding this.
Definition at line 192 of file RPageStorage.hxx.
|
inherited |
Definition at line 267 of file RPageStorage.cxx.
|
inherited |
Definition at line 262 of file RPageStorage.cxx.
|
inlineinherited |
Returns the NTuple name.
Definition at line 195 of file RPageStorage.hxx.
|
inlineinherited |
Definition at line 878 of file RPageStorage.hxx.
|
inlineinherited |
Definition at line 810 of file RPageStorage.hxx.
|
inlineinherited |
Takes the read lock for the descriptor.
Multiple threads can take the lock concurrently. The underlying std::shared_mutex, however, is neither read nor write recursive: within one thread, at most one lock (shared or exclusive) may be held at any time. Sections protected by GetSharedDescriptorGuard() and GetExclDescriptorGuard() therefore require special care, in particular to avoid acquiring the locks a second time indirectly. As a general guideline, no other method of the page source should be called (directly or indirectly) in a guarded section.
Definition at line 818 of file RPageStorage.hxx.
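The guideline above can be illustrated with standard C++ alone. This is a sketch under assumptions: `GuardedSourceSketch` and its methods are hypothetical and only mimic a class that protects some state with a std::shared_mutex. The point shown is to copy what is needed inside the guarded section and release the guard before calling any other method that locks.

```cpp
#include <cstdint>
#include <mutex>
#include <shared_mutex>

// Hypothetical stand-in for a source that guards descriptor state with a std::shared_mutex.
class GuardedSourceSketch {
   mutable std::shared_mutex fLock;
   std::uint64_t fNEntries = 0; // stands in for descriptor state

public:
   void SetNEntries(std::uint64_t n)
   {
      std::unique_lock<std::shared_mutex> guard(fLock); // exclusive (write) lock
      fNEntries = n;
   }

   std::uint64_t GetNEntries() const
   {
      std::shared_lock<std::shared_mutex> guard(fLock); // shared (read) lock
      return fNEntries;
   }

   // Copies the value inside the guarded section and releases the lock before
   // calling other locking methods. Locking again while the guard is still held
   // by the same thread would be undefined behavior for std::shared_mutex.
   std::uint64_t DoubleNEntries()
   {
      std::uint64_t n;
      {
         std::shared_lock<std::shared_mutex> guard(fLock);
         n = fNEntries;
      } // guard released here
      SetNEntries(2 * n);
      return GetNEntries();
   }
};
```

Note that the read and the subsequent write in `DoubleNEntries()` are not one atomic step; the sketch only demonstrates the locking discipline.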
|
inlinefinalvirtualinherited |
Whether the concrete implementation is a sink or a source.
Implements ROOT::Internal::RPageStorage.
Definition at line 809 of file RPageStorage.hxx.
|
finalvirtual |
Populates all the pages of the given cluster ids and columns; it is possible that some columns do not contain any pages.
The page source may load more columns than the minimal necessary set from columns. To indicate which columns have been loaded, LoadClusters() must mark them with SetColumnAvailable(). That includes the ones from columns that don't have pages; otherwise, subsequent requests for the cluster would assume an incomplete cluster and trigger loading again. LoadClusters() is typically called from the I/O thread of a cluster pool, i.e., the method runs concurrently to other methods of the page source.
Implements ROOT::Internal::RPageSource.
Definition at line 657 of file RPageStorageFile.cxx.
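Why columns without pages still have to be marked can be seen in a small sketch. `ClusterSketch` and `LoadClusterSketch` are hypothetical and not ROOT's RCluster; the sketch assumes that a later request is considered satisfied only if every requested column id is in the cluster's set of available columns.

```cpp
#include <cstdint>
#include <unordered_set>
#include <vector>

// Hypothetical stand-in for a loaded cluster; not ROOT's RCluster.
struct ClusterSketch {
   std::unordered_set<std::uint64_t> fAvailableColumns;

   void SetColumnAvailable(std::uint64_t columnId) { fAvailableColumns.insert(columnId); }

   // A request is satisfied only if every requested column is marked available.
   bool ContainsAll(const std::vector<std::uint64_t> &columns) const
   {
      for (auto id : columns) {
         if (fAvailableColumns.count(id) == 0)
            return false;
      }
      return true;
   }
};

// Marks every requested column as available. Reading the pages is omitted here;
// the marking happens regardless of whether a column has any pages.
ClusterSketch LoadClusterSketch(const std::vector<std::uint64_t> &requested)
{
   ClusterSketch cluster;
   for (auto id : requested)
      cluster.SetColumnAvailable(id);
   return cluster;
}
```

If the loader skipped a page-less column, `ContainsAll()` would return false for the same request and the cluster would be loaded again.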
|
virtualinherited |
Another version of LoadPage that allows specifying cluster-relative indexes.
Returns a default-constructed RPage for suppressed columns.
Definition at line 552 of file RPageStorage.cxx.
|
virtualinherited |
Allocates and fills a page that contains the index-th element.
Calls into the concrete page source for loading the corresponding sealed page of cluster where necessary. Returns a default-constructed RPage for suppressed columns.
Definition at line 519 of file RPageStorage.cxx.
|
privateinherited |
Definition at line 455 of file RPageStorage.cxx.
|
finalprotectedvirtual |
Implements ROOT::Internal::RPageSource.
Definition at line 502 of file RPageStorageFile.cxx.
|
inherited |
Read the packed and compressed bytes of a page into the memory buffer provided by sealedPage.
The sealed page can be used subsequently in a call to RPageSink::CommitSealedPage. The fSize and fNElements members of the sealedPage parameter are always set. If sealedPage.fBuffer is nullptr, no data will be copied, but the returned size information can be used by the caller to allocate a large enough buffer and call LoadSealedPage again.
Definition at line 411 of file RPageStorage.cxx.
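The two-step use described above (query the size with a null buffer, allocate, then load) might look like the following sketch. `SealedPageSketch`, `LoadSealedPageSketch`, and `FetchSealedPageSketch` are hypothetical; only the member names fBuffer, fSize, and fNElements come from the text, and the "storage" is an in-memory byte vector.

```cpp
#include <cstddef>
#include <cstring>
#include <vector>

// Hypothetical stand-in for a sealed page; only the three members named in the text.
struct SealedPageSketch {
   void *fBuffer = nullptr;
   std::size_t fSize = 0;
   std::size_t fNElements = 0;
};

// Hypothetical loader over an in-memory "stored page".
// Size information is always filled in; bytes are copied only if a buffer is supplied.
void LoadSealedPageSketch(const std::vector<unsigned char> &stored, std::size_t nElements, SealedPageSketch &page)
{
   page.fSize = stored.size();
   page.fNElements = nElements;
   if (page.fBuffer)
      std::memcpy(page.fBuffer, stored.data(), stored.size());
}

// Two-step use: query the size, allocate, then load for real.
std::vector<unsigned char> FetchSealedPageSketch(const std::vector<unsigned char> &stored, std::size_t nElements)
{
   SealedPageSketch page;
   LoadSealedPageSketch(stored, nElements, page); // fBuffer is nullptr: sizes only
   std::vector<unsigned char> buffer(page.fSize);
   page.fBuffer = buffer.data();
   LoadSealedPageSketch(stored, nElements, page); // now copies the bytes
   return buffer;
}
```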
|
finalprotectedvirtual |
Implements ROOT::Internal::RPageSource.
Definition at line 507 of file RPageStorageFile.cxx.
|
finalvirtual |
Forces the loading of ROOT StreamerInfo from the underlying file.
This currently only has an effect for TFile-backed sources.
Implements ROOT::Internal::RPageSource.
Definition at line 724 of file RPageStorageFile.cxx.
|
inherited |
Loads header and footer without decompressing or deserializing them.
This can be used to asynchronously open a file in the background. The method is idempotent and it is called as a first step in Attach(). Page sources may or may not make use of splitting loading and processing metadata. Therefore, LoadStructure() may do nothing and defer loading the metadata to Attach().
Definition at line 218 of file RPageStorage.cxx.
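The relation between LoadStructure() and Attach() can be sketched as a small state machine. `AttachSketch` is hypothetical; the flag names mirror fHasStructure and fIsAttached from this page, while the call counter and the guard against attaching twice are assumptions added for illustration.

```cpp
// Hypothetical stand-in for the load/attach state machine.
class AttachSketch {
   bool fHasStructure = false;
   bool fIsAttached = false;
   int fNStructureLoads = 0; // counts how often the expensive step ran

   void LoadStructureImpl() { ++fNStructureLoads; }

public:
   // Idempotent: repeated calls do the work only once.
   void LoadStructure()
   {
      if (fHasStructure)
         return;
      LoadStructureImpl();
      fHasStructure = true;
   }

   // Attach() starts by making sure the structure is loaded.
   void Attach()
   {
      LoadStructure();
      if (!fIsAttached) {
         // deserializing header and footer would happen here
         fIsAttached = true;
      }
   }

   int GetNStructureLoads() const { return fNStructureLoads; }
   bool IsAttached() const { return fIsAttached; }
};
```

Calling `LoadStructure()` early, for instance from a background thread, and `Attach()` later runs the loading step exactly once in this sketch.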
|
finalprotectedvirtual |
Fills fStructureBuffer with the compressed header and footer.
Implements ROOT::Internal::RPageSource.
Definition at line 431 of file RPageStorageFile.cxx.
|
privateinherited |
Definition at line 437 of file RPageStorage.cxx.
|
finalvirtual |
Creates a new PageSource using the same underlying file as this but referring to a different RNTuple, described by anchorLink.
Implements ROOT::Internal::RPageSource.
Definition at line 417 of file RPageStorageFile.cxx.
|
delete |
|
delete |
|
inlineinherited |
Instructs the cluster pool and page pool to consider the given cluster as active (should stay cached).
Definition at line 875 of file RPageStorage.hxx.
|
protectedinherited |
Prepare a page range read for the column set in clusterKey.
Specifically, pages referencing the kTypePageZero locator are filled in pageZeroMap; otherwise, perPageFunc is called for each page. This is commonly used as part of LoadClusters() in derived classes.
Definition at line 350 of file RPageStorage.cxx.
|
private |
Helper function for LoadClusters: it prepares the memory buffer (page map) and the read requests for a given cluster and columns.
The read requests are appended to the provided vector. This way, requests can be collected for multiple clusters before sending them to RRawFile::ReadV().
Definition at line 532 of file RPageStorageFile.cxx.
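Appending requests to a shared vector, so that several clusters can be served by one vectored read, can be sketched as follows. All types and functions here are hypothetical: `ReadRequestSketch` is not the actual RRawFile::RIOVec layout, each cluster is simplified to a single contiguous extent, and the "file" is an in-memory byte vector.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical request record; not the actual RRawFile::RIOVec layout.
struct ReadRequestSketch {
   void *fBuffer;
   std::uint64_t fOffset;
   std::size_t fSize;
};

// Hypothetical description of where one cluster's pages live in the file.
struct ClusterExtentSketch {
   std::uint64_t fOffset;
   std::size_t fSize;
};

// Allocates the cluster's buffer and appends (does not replace) its read request.
std::vector<unsigned char> PrepareClusterSketch(const ClusterExtentSketch &extent, std::vector<ReadRequestSketch> &requests)
{
   std::vector<unsigned char> buffer(extent.fSize);
   requests.push_back({buffer.data(), extent.fOffset, extent.fSize});
   return buffer; // moving a vector keeps its heap storage, so fBuffer stays valid
}

// Serves all collected requests in one pass over an in-memory "file".
void ReadVSketch(const std::vector<unsigned char> &file, const std::vector<ReadRequestSketch> &requests)
{
   for (const auto &req : requests) {
      if (req.fOffset + req.fSize <= file.size())
         std::memcpy(req.fBuffer, file.data() + req.fOffset, req.fSize);
   }
}
```

A caller would invoke `PrepareClusterSketch()` once per cluster with the same request vector and then issue a single `ReadVSketch()` call.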
|
inherited |
Builds the streamer info records from the descriptor's extra type info section.
This is necessary when connecting streamer fields so that emulated classes can be read.
Definition at line 720 of file RPageStorage.cxx.
|
inherited |
Promise to only read from the given entry range.
If set, prevents the cluster pool from reading-ahead beyond the given range. The range needs to be within [0, GetNEntries()).
Definition at line 210 of file RPageStorage.cxx.
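How an entry range could limit read-ahead is sketched below. The names are hypothetical and the overlap test is an assumption about what "not reading beyond the given range" amounts to; the sketch treats both the promised range and each cluster as half-open entry intervals.

```cpp
#include <algorithm>
#include <cstdint>

// Hypothetical half-open entry range [fFirst, fFirst + fNEntries).
struct EntryRangeSketch {
   std::uint64_t fFirst = 0;
   std::uint64_t fNEntries = 0;
};

// Checks the precondition from the text: the range lies within [0, nEntriesTotal).
bool IsValidRangeSketch(const EntryRangeSketch &range, std::uint64_t nEntriesTotal)
{
   return range.fFirst <= nEntriesTotal && range.fNEntries <= nEntriesTotal - range.fFirst;
}

// A cluster covering [clusterFirst, clusterFirst + clusterNEntries) is worth
// reading ahead only if it overlaps the promised range.
bool ShouldPreloadSketch(const EntryRangeSketch &range, std::uint64_t clusterFirst, std::uint64_t clusterNEntries)
{
   const auto rangeEnd = range.fFirst + range.fNEntries;
   const auto clusterEnd = clusterFirst + clusterNEntries;
   return std::max(range.fFirst, clusterFirst) < std::min(rangeEnd, clusterEnd);
}
```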
|
inlineinherited |
Definition at line 197 of file RPageStorage.hxx.
|
inlineprotectedinherited |
Definition at line 785 of file RPageStorage.hxx.
|
inlineinherited |
Allows the given cluster to be evicted from the cluster pool and page pool.
Definition at line 877 of file RPageStorage.hxx.
|
inherited |
Definition at line 676 of file RPageStorage.cxx.
|
staticinherited |
Helper for unstreaming a page.
This is commonly used in derived, concrete page sources. The implementation currently always makes a memory copy, even if the sealed page is uncompressed and in the final memory layout. The optimization of directly mapping pages is left to the concrete page source implementations.
Definition at line 681 of file RPageStorage.cxx.
|
inherited |
Parallel decompression and unpacking of the pages in the given cluster.
The unzipped pages are supposed to be preloaded in a page pool attached to the source. The method is triggered by the cluster pool's unzip thread. It is an optional optimization, so the method can safely do nothing. In particular, the actual implementation will only run if a task scheduler is set. In practice, a task scheduler is set if implicit multi-threading is turned on.
Definition at line 272 of file RPageStorage.cxx.
|
protectedvirtualinherited |
Definition at line 278 of file RPageStorage.cxx.
|
privateinherited |
Does nothing if fLastUsedCluster == clusterId.
Otherwise, updates fLastUsedCluster and evicts unused pages from the page pool of all previous clusters. Must not be called when the descriptor guard is taken.
Definition at line 377 of file RPageStorage.cxx.
|
inlineprotectedinherited |
Definition at line 153 of file RPageStorage.hxx.
|
friend |
Definition at line 125 of file RPageStorageFile.hxx.
|
privateinherited |
The active columns are implicitly defined by the model fields or views.
Definition at line 675 of file RPageStorage.hxx.
|
private |
Either provided by CreateFromAnchor, or read from the ROOT file given the ntuple name.
Definition at line 129 of file RPageStorageFile.hxx.
|
privateinherited |
The cluster pool asynchronously preloads the next few clusters.
Note that derived classes should call StopClusterPoolBackgroundThread() in their destructor so that the I/O background thread does not call methods from the destructed derived class.
Definition at line 680 of file RPageStorage.hxx.
|
protectedinherited |
Definition at line 745 of file RPageStorage.hxx.
|
private |
The last cluster from which a page got loaded. Points into fClusterPool->fPool.
Definition at line 131 of file RPageStorageFile.hxx.
|
privateinherited |
Definition at line 667 of file RPageStorage.hxx.
|
private |
The descriptor is created from the header and footer either in AttachImpl or in CreateFromAnchor.
Definition at line 137 of file RPageStorageFile.hxx.
|
mutableprivateinherited |
Definition at line 668 of file RPageStorage.hxx.
|
privateinherited |
Used by the cluster pool to prevent reading beyond the given range.
Definition at line 669 of file RPageStorage.hxx.
|
private |
An RRawFile is used to request the necessary byte ranges from a local or a remote file.
Definition at line 133 of file RPageStorageFile.hxx.
|
private |
Definition at line 148 of file RPageStorageFile.hxx.
|
private |
Total file size, set once in AttachImpl()
Definition at line 150 of file RPageStorageFile.hxx.
Set to true when RegisterStreamerInfos() is called.
Definition at line 672 of file RPageStorage.hxx.
Set to true once LoadStructure() is called.
Definition at line 670 of file RPageStorage.hxx.
Set to true once Attach() is called.
Definition at line 671 of file RPageStorage.hxx.
|
private |
Tracks the last read offset for seek distance calculation.
Definition at line 139 of file RPageStorageFile.hxx.
|
privateinherited |
Remembers the last cluster id from which a page was requested.
Definition at line 687 of file RPageStorage.hxx.
|
protectedinherited |
Definition at line 146 of file RPageStorage.hxx.
|
protectedinherited |
Definition at line 151 of file RPageStorage.hxx.
|
protectedinherited |
Definition at line 748 of file RPageStorage.hxx.
|
protectedinherited |
For the time being, we will use the heap allocator for all sources and sinks. This may change in the future.
Definition at line 149 of file RPageStorage.hxx.
|
privateinherited |
Pages that are unzipped with IMT are staged into the page pool.
Definition at line 684 of file RPageStorage.hxx.
|
privateinherited |
Pinned clusters and their $2 * (cluster bunch size) - 1$ successors will not be evicted from the cluster pool.
Pages of pinned clusters won't be evicted from the page pool.
Definition at line 695 of file RPageStorage.hxx.
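The arithmetic in the description above can be made concrete with a short sketch. It rests on an assumption not stated on this page: cluster ids are taken to be consecutive, so the successors of cluster c are c + 1, c + 2, and so on. The function name is hypothetical.

```cpp
#include <cstdint>

// Assumption for this sketch: cluster ids are consecutive, so the successors of
// cluster c are c + 1, c + 2, ...
// Returns true if clusterId is the pinned cluster or one of its
// 2 * bunchSize - 1 successors.
bool IsProtectedSketch(std::uint64_t clusterId, std::uint64_t pinnedId, std::uint64_t bunchSize)
{
   if (bunchSize == 0 || clusterId < pinnedId)
      return false;
   const std::uint64_t nSuccessors = 2 * bunchSize - 1;
   return clusterId - pinnedId <= nSuccessors;
}
```

With a bunch size of 2 and pinned cluster 10, the sketch protects clusters 10 through 13, i.e., the pinned cluster plus three successors.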
|
privateinherited |
Clusters from where pages got preloaded in UnzipClusterImpl(), ordered by first entry number of the clusters.
If the last used cluster changes in LoadPage(), all unused pages from previous clusters are evicted from the page pool. Pinned clusters won't be evicted.
Definition at line 691 of file RPageStorage.hxx.
|
private |
Takes the fFile to read ntuple blobs from it.
Definition at line 135 of file RPageStorageFile.hxx.
|
protectedinherited |
Populated by LoadStructureImpl(), reset at the end of Attach()
Definition at line 746 of file RPageStorage.hxx.
|
protectedinherited |
Definition at line 152 of file RPageStorage.hxx.
|
staticconstexprinherited |
The page checksum is a 64bit xxhash3.
Definition at line 73 of file RPageStorage.hxx.