RSqliteDS is an RDF data source implementation for SQL result sets from sqlite3 files.
RDataFrame data source class for reading SQLite files.
The RSqliteDS is able to feed an RDataFrame with data from a SQLite SELECT query. It can be used as follows:
auto rdf = ROOT::RDF::MakeSqliteDataFrame("/path/to/file.sqlite", "select name from table");
auto h = rdf.Define("lName", "name.length()").Histo1D("lName");
The data source has to provide column types for all the columns. Determining column types in SQLite is tricky because it is dynamically typed and, in principle, each row can have different column types. The following heuristics are used:
Definition at line 52 of file RSqliteDS.hxx.
Classes
struct | Value_t
Used to hold a single "cell" of the SELECT query's result table. Can be changed to std::variant once available.
Public Member Functions
RSqliteDS (const std::string &fileName, const std::string &query)
Build the dataframe.
~RSqliteDS ()
Frees the sqlite resources and closes the file.
const std::vector< std::string > & | GetColumnNames () const final
Returns the column names of the SELECT query.
std::vector< std::pair< ULong64_t, ULong64_t > > | GetEntryRanges () final
Returns a range of size 1 as long as more rows are available in the SQL result set.
std::string | GetLabel () final
Return a string representation of the datasource type.
std::string | GetTypeName (std::string_view colName) const final
Returns the C++ type for a given column name, implemented as a linear search through all the columns.
bool | HasColumn (std::string_view colName) const final
A linear search through the columns for the given name.
void | Initialise () final
Resets the SQLite query engine at the beginning of the event loop.
bool | SetEntry (unsigned int slot, ULong64_t entry) final
Stores the result of the current active sqlite query row as a C++ value.
void | SetNSlots (unsigned int nSlots) final
Almost a no-op; many slots can in fact reduce performance due to thread synchronization.
Public Member Functions inherited from ROOT::RDF::RDataSource
virtual | ~RDataSource ()=default
virtual void | Finalise ()
Convenience method called after concluding an event-loop.
virtual void | FinaliseSlot (unsigned int)
Convenience method called at the end of the data processing associated to a slot.
virtual const std::vector< std::string > & | GetColumnNames () const =0
Returns a reference to the collection of the dataset's column names.
template<typename T > std::vector< T ** > | GetColumnReaders (std::string_view columnName)
Called at most once per column by RDF.
virtual std::vector< std::pair< ULong64_t, ULong64_t > > | GetEntryRanges ()=0
Return ranges of entries to distribute to tasks.
virtual std::string | GetLabel ()
Return a string representation of the datasource type.
virtual std::string | GetTypeName (std::string_view) const =0
Type of a column as a string.
virtual bool | HasColumn (std::string_view) const =0
Checks if the dataset has a certain column.
virtual void | Initialise ()
Convenience method called before starting an event-loop.
virtual void | InitSlot (unsigned int, ULong64_t)
Convenience method called at the start of the data processing associated to a slot.
virtual bool | SetEntry (unsigned int slot, ULong64_t entry)=0
Advance the "cursors" returned by GetColumnReaders to the selected entry for a particular slot.
virtual void | SetNSlots (unsigned int nSlots)=0
Inform RDataSource of the number of processing slots.
Protected Member Functions
Record_t | GetColumnReadersImpl (std::string_view name, const std::type_info &) final
Activates the given column's result value.
Protected Member Functions inherited from ROOT::RDF::RDataSource
virtual std::string | AsString ()
virtual Record_t | GetColumnReadersImpl (std::string_view name, const std::type_info &)=0
Type-erased vector of pointers to pointers to column values, one per slot.
Private Types
enum class | ETypes { kInteger, kReal, kText, kBlob, kNull }
All the types known to SQLite. Changes require changing fgTypeNames, too.
Private Member Functions
void | SqliteError (int errcode)
Helper function to throw an exception if there is a fatal sqlite error, e.g. an I/O error.
Private Attributes
std::vector< std::string > | fColumnNames
std::vector< ETypes > | fColumnTypes
std::unique_ptr< Internal::RSqliteDSDataSet > | fDataSet
ULong64_t | fNRow
unsigned int | fNSlots
std::vector< Value_t > | fValues
The data source is inherently single-threaded and returns only one row at a time. This vector holds the results.
Static Private Attributes
static constexpr char const * | fgTypeNames []
Corresponds to the types defined in ETypes.
Additional Inherited Members
Protected Types inherited from ROOT::RDF::RDataSource
using | Record_t = std::vector< void * >
#include <ROOT/RSqliteDS.hxx>
enum class ROOT::RDF::RSqliteDS::ETypes [strong, private]
All the types known to SQLite. Changes require changing fgTypeNames, too.
Enumerator | |
---|---|
kInteger | |
kReal | |
kText | |
kBlob | |
kNull |
Definition at line 56 of file RSqliteDS.hxx.
ROOT::RDF::RSqliteDS::RSqliteDS (const std::string &fileName, const std::string &query)
Build the dataframe.
[in] | fileName | The path to an sqlite3 file, will be opened read-only |
[in] | query | A valid sqlite3 SELECT query |
The constructor opens the sqlite file, prepares the query engine and determines the column names and types.
Definition at line 364 of file RSqliteDS.cxx.
ROOT::RDF::RSqliteDS::~RSqliteDS ()
Frees the sqlite resources and closes the file.
Definition at line 441 of file RSqliteDS.cxx.
const std::vector< std::string > & ROOT::RDF::RSqliteDS::GetColumnNames () const [final, virtual]
Returns the column names of the SELECT query.
The column names are cached in the constructor. For expressions, the column name is the text of the expression unless the query assigns a name with AS, as in "SELECT 1 + 1 as mycolumn FROM table".
Implements ROOT::RDF::RDataSource.
Definition at line 454 of file RSqliteDS.cxx.
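Record_t ROOT::RDF::RSqliteDS::GetColumnReadersImpl (std::string_view name, const std::type_info &) [final, protected, virtual]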
Activates the given column's result value.
Implements ROOT::RDF::RDataSource.
Definition at line 461 of file RSqliteDS.cxx.
std::vector< std::pair< ULong64_t, ULong64_t > > ROOT::RDF::RSqliteDS::GetEntryRanges () [final, virtual]
Returns a range of size 1 as long as more rows are available in the SQL result set.
This inherently serializes the RDF, independent of the number of slots.
Implements ROOT::RDF::RDataSource.
Definition at line 484 of file RSqliteDS.cxx.
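The one-row-at-a-time protocol described above can be illustrated with a small mock. This is only a sketch under stated assumptions: `MockRowSource`, `Row` and `ReadAll` are hypothetical names that are not part of ROOT, and a `std::vector` stands in for the SQL result set. It shows how handing out ranges of size 1 until the rows are exhausted forces the rows to be consumed sequentially.

```cpp
#include <cstdint>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a forward-only SQL cursor; not part of ROOT.
class MockRowSource {
public:
   explicit MockRowSource(std::vector<std::string> rows) : fRows(std::move(rows)) {}

   // Rewind the cursor, as Initialise() does before an event loop.
   void Initialise() { fNRow = 0; }

   // Hand out exactly one entry per call; an empty vector ends the loop.
   std::vector<std::pair<std::uint64_t, std::uint64_t>> GetEntryRanges()
   {
      if (fNRow >= fRows.size())
         return {};
      const std::uint64_t begin = fNRow++;
      return {{begin, begin + 1}};
   }

   // Expose the row that belongs to `entry`.
   const std::string &Row(std::uint64_t entry) const { return fRows.at(entry); }

private:
   std::vector<std::string> fRows;
   std::uint64_t fNRow = 0;
};

// Drive the source the way a single-threaded event loop would.
std::vector<std::string> ReadAll(MockRowSource &source)
{
   std::vector<std::string> seen;
   source.Initialise();
   for (auto ranges = source.GetEntryRanges(); !ranges.empty(); ranges = source.GetEntryRanges()) {
      for (const auto &range : ranges) {
         for (auto entry = range.first; entry < range.second; ++entry)
            seen.push_back(source.Row(entry));
      }
   }
   return seen;
}
```

Because each call yields a single entry, adding more slots to such a loop cannot make it faster; the rows still arrive one after another.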
std::string ROOT::RDF::RSqliteDS::GetLabel () [final, virtual]
Return a string representation of the datasource type.
The returned string will be used by ROOT::RDF::SaveGraph() to represent the datasource in the visualization of the computation graph. Concrete datasources can override the default implementation.
Reimplemented from ROOT::RDF::RDataSource.
Definition at line 532 of file RSqliteDS.cxx.
std::string ROOT::RDF::RSqliteDS::GetTypeName (std::string_view colName) const [final, virtual]
Returns the C++ type for a given column name, implemented as a linear search through all the columns.
Implements ROOT::RDF::RDataSource.
Definition at line 503 of file RSqliteDS.cxx.
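A linear search over index-aligned name and type vectors could look like the sketch below. `LookupTypeName` and `ContainsColumn` are hypothetical free functions written for illustration; throwing `std::runtime_error` for an unknown column is an assumption, not a statement about the actual implementation.

```cpp
#include <algorithm>
#include <cstddef>
#include <iterator>
#include <stdexcept>
#include <string>
#include <vector>

// Scan `names` for `colName`; `types` is assumed to be index-aligned with `names`.
std::string LookupTypeName(const std::vector<std::string> &names,
                           const std::vector<std::string> &types,
                           const std::string &colName)
{
   const auto it = std::find(names.begin(), names.end(), colName);
   if (it == names.end())
      throw std::runtime_error("Unknown column: " + colName); // assumed error policy
   const auto index = static_cast<std::size_t>(std::distance(names.begin(), it));
   return types.at(index);
}

// The same scan, reduced to a yes/no answer.
bool ContainsColumn(const std::vector<std::string> &names, const std::string &colName)
{
   return std::find(names.begin(), names.end(), colName) != names.end();
}
```

A linear scan is a reasonable choice here because a SELECT result usually has only a handful of columns, so a hash map would add complexity without a measurable gain.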
bool ROOT::RDF::RSqliteDS::HasColumn (std::string_view colName) const [final, virtual]
A linear search through the columns for the given name.
Implements ROOT::RDF::RDataSource.
Definition at line 517 of file RSqliteDS.cxx.
void ROOT::RDF::RSqliteDS::Initialise () [final, virtual]
Resets the SQLite query engine at the beginning of the event loop.
Reimplemented from ROOT::RDF::RDataSource.
Definition at line 524 of file RSqliteDS.cxx.
bool ROOT::RDF::RSqliteDS::SetEntry (unsigned int slot, ULong64_t entry) [final, virtual]
Stores the result of the current active sqlite query row as a C++ value.
Implements ROOT::RDF::RDataSource.
Definition at line 549 of file RSqliteDS.cxx.
void ROOT::RDF::RSqliteDS::SetNSlots (unsigned int nSlots) [final, virtual]
Almost a no-op; many slots can in fact reduce performance due to thread synchronization.
Implements ROOT::RDF::RDataSource.
Definition at line 585 of file RSqliteDS.cxx.
void ROOT::RDF::RSqliteDS::SqliteError (int errcode) [private]
Helper function to throw an exception if there is a fatal sqlite error, e.g. an I/O error.
Definition at line 596 of file RSqliteDS.cxx.
std::vector< std::string > ROOT::RDF::RSqliteDS::fColumnNames [private]
Definition at line 84 of file RSqliteDS.hxx.
std::vector< ETypes > ROOT::RDF::RSqliteDS::fColumnTypes [private]
Definition at line 85 of file RSqliteDS.hxx.
std::unique_ptr< Internal::RSqliteDSDataSet > ROOT::RDF::RSqliteDS::fDataSet [private]
Definition at line 81 of file RSqliteDS.hxx.
constexpr char const * ROOT::RDF::RSqliteDS::fgTypeNames [] [static, constexpr, private]
Corresponds to the types defined in ETypes.
Definition at line 91 of file RSqliteDS.hxx.
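The correspondence between ETypes and a table of type names can be sketched as follows. The enumerators come from this page; the concrete strings in `kTypeNames` and the helper `TypeNameOf` are assumptions made for illustration and may differ from the contents of fgTypeNames.

```cpp
#include <cstddef>
#include <string>

// Same enumerators, in the same order, as ETypes on this page.
enum class ETypes { kInteger, kReal, kText, kBlob, kNull };

// Assumed C++ type names, one per enumerator and in the same order.
constexpr const char *kTypeNames[] = {"Long64_t", "double", "std::string",
                                      "std::vector<unsigned char>", "void *"};

// Guard against the table and the enum drifting apart.
static_assert(sizeof(kTypeNames) / sizeof(kTypeNames[0]) == 5,
              "kTypeNames needs one entry per ETypes enumerator");

// Hypothetical helper: use the enumerator value as an index into the table.
std::string TypeNameOf(ETypes type)
{
   return kTypeNames[static_cast<std::size_t>(type)];
}
```

Indexing by enumerator value is why a change to the enum requires a matching change to the name table: the two are coupled only by position.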
ULong64_t ROOT::RDF::RSqliteDS::fNRow [private]
Definition at line 83 of file RSqliteDS.hxx.
unsigned int ROOT::RDF::RSqliteDS::fNSlots [private]
Definition at line 82 of file RSqliteDS.hxx.
std::vector< Value_t > ROOT::RDF::RSqliteDS::fValues [private]
The data source is inherently single-threaded and returns only one row at a time. This vector holds the results.
Definition at line 87 of file RSqliteDS.hxx.
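The Value_t description notes that the cell type could be changed to std::variant once available. A minimal C++17 sketch of such a cell is shown below; `Cell` and `DescribeCell` are hypothetical names, and the choice of alternatives (one per ETypes enumerator) is an assumption rather than the layout of Value_t.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <variant>
#include <vector>

// Hypothetical variant-based cell: one alternative per SQLite storage class.
using Cell = std::variant<std::nullptr_t, std::int64_t, double, std::string,
                          std::vector<unsigned char>>;

// Report which alternative a cell currently holds.
std::string DescribeCell(const Cell &cell)
{
   struct Visitor {
      std::string operator()(std::nullptr_t) const { return "null"; }
      std::string operator()(std::int64_t) const { return "integer"; }
      std::string operator()(double) const { return "real"; }
      std::string operator()(const std::string &) const { return "text"; }
      std::string operator()(const std::vector<unsigned char> &) const { return "blob"; }
   };
   return std::visit(Visitor{}, cell);
}
```

A single result row could then be held as a `std::vector<Cell>` with one element per selected column, for example `std::vector<Cell> row{Cell{std::int64_t{7}}, Cell{2.5}, Cell{std::string("abc")}, Cell{}};`, where the default-constructed `Cell{}` holds the null alternative.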