ROOT 6.12/07
Reference Guide
ROOT::Experimental::TDF::TCsvDS Class Reference (final)

TDataFrame data source class for reading CSV files.

The TCsvDS class implements a CSV file reader for TDataFrame.

A TDataFrame that reads from a CSV file can be constructed using the factory method ROOT::Experimental::TDF::MakeCsvDataFrame, which accepts three parameters:

  1. Path to the CSV file.
  2. Boolean that specifies whether the first row of the CSV file contains headers or not (optional, default true). If false, header names will be automatically generated.
  3. Delimiter (optional, default ',').

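The types of the columns in the CSV file are automatically inferred. Judging by the per-type value buffers listed under Private Attributes, the supported types are bool, double, Long64_t and std::string.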

These are some formatting rules expected by the TCsvDS implementation:

The current implementation of TCsvDS reads the entire CSV file content into memory before TDataFrame starts processing it. Therefore, before creating a CSV TDataFrame, it is important to check both how much memory is available and the size of the CSV file.
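
As a rough illustration of the check suggested above, the following sketch compares the on-disk size of a CSV file against a caller-chosen budget before any data source is constructed. The helper names `CsvFileSize` and `CsvFitsBudget` are hypothetical and not part of ROOT, and the in-memory footprint of the parsed data may well differ from the file size, so the budget should leave generous headroom.

```cpp
#include <fstream>
#include <string>

// Hypothetical helper (not part of ROOT): size of the file in bytes,
// or -1 if the file cannot be opened.
long long CsvFileSize(const std::string &path)
{
   // Open at the end of the file so that tellg() reports its size.
   std::ifstream in(path, std::ios::binary | std::ios::ate);
   if (!in)
      return -1;
   const std::streamoff size = in.tellg();
   return static_cast<long long>(size);
}

// Hypothetical helper (not part of ROOT): true if the file exists and its
// on-disk size does not exceed budgetBytes.
bool CsvFitsBudget(const std::string &path, long long budgetBytes)
{
   const long long size = CsvFileSize(path);
   return size >= 0 && size <= budgetBytes;
}
```

A caller could, for instance, only proceed to create the CSV TDataFrame when `CsvFitsBudget(path, budget)` returns true, where `budget` is a fraction of the memory known to be available.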

Definition at line 18 of file TCsvDS.hxx.

Public Member Functions

 TCsvDS (std::string_view fileName, bool readHeaders=true, char delimiter=',')
 Constructor to create a CSV TDataSource for TDataFrame.
 
 ~TCsvDS ()
 Destructor.
 
const std::vector< std::string > & GetColumnNames () const
 Returns a reference to the collection of the dataset's column names.
 
std::vector< std::pair< ULong64_t, ULong64_t > > GetEntryRanges ()
 Return ranges of entries to distribute to tasks.
 
std::string GetTypeName (std::string_view colName) const
 Type of a column as a string, e.g. GetTypeName("x") == "double".
 
bool HasColumn (std::string_view colName) const
 Checks if the dataset has a certain column.
 
void Initialise ()
 Convenience method called before starting an event-loop.
 
void SetEntry (unsigned int slot, ULong64_t entry)
 Advance the "cursors" returned by GetColumnReaders to the selected entry for a particular slot.
 
void SetNSlots (unsigned int nSlots)
 Inform TDataSource of the number of processing slots (i.e. worker threads) used by the associated TDataFrame.
 
Public Member Functions inherited from ROOT::Experimental::TDF::TDataSource
virtual ~TDataSource ()=default
 
virtual void Finalise ()
 Convenience method called after concluding an event-loop.
 
virtual void FinaliseSlot (unsigned int)
 Convenience method called at the end of the data processing associated to a slot.
 
template<typename T >
std::vector< T ** > GetColumnReaders (std::string_view columnName)
 Called at most once per column by TDF.
 
virtual void InitSlot (unsigned int, ULong64_t)
 Convenience method called at the start of the data processing associated to a slot.
 

Private Types

using Record = std::vector< void * >
 

Private Member Functions

void FillHeaders (const std::string &)
 
void FillRecord (const std::string &, Record &)
 
void GenerateHeaders (size_t)
 
std::vector< void * > GetColumnReadersImpl (std::string_view, const std::type_info &)
 Type-erased vector of pointers to pointers to column values, one per slot.
 
void InferColTypes (std::vector< std::string > &)
 
void InferType (const std::string &, unsigned int)
 
std::vector< std::string > ParseColumns (const std::string &)
 
size_t ParseValue (const std::string &, std::vector< std::string > &, size_t)
 

Private Attributes

std::vector< std::deque< bool > > fBoolEvtValues
 
std::vector< std::vector< void * > > fColAddresses
 
std::map< std::string, std::string > fColTypes
 
std::list< std::string > fColTypesList
 
char fDelimiter
 
std::vector< std::vector< double > > fDoubleEvtValues
 
std::vector< std::pair< ULong64_t, ULong64_t > > fEntryRanges
 
std::string fFileName
 
std::vector< std::string > fHeaders
 
std::vector< std::vector< Long64_t > > fLong64EvtValues
 
unsigned int fNSlots = 0U
 
std::vector< Record > fRecords
 
std::vector< std::vector< std::string > > fStringEvtValues
 

Static Private Attributes

static TRegexp doubleRegex1
 
static TRegexp doubleRegex2
 
static TRegexp falseRegex
 
static TRegexp intRegex
 
static TRegexp trueRegex
 

#include <ROOT/TCsvDS.hxx>

Inheritance diagram for ROOT::Experimental::TDF::TCsvDS: TCsvDS derives from ROOT::Experimental::TDF::TDataSource.

Member Typedef Documentation

◆ Record

using ROOT::Experimental::TDF::TCsvDS::Record = std::vector<void *>
private

Definition at line 21 of file TCsvDS.hxx.

Constructor & Destructor Documentation

◆ TCsvDS()

ROOT::Experimental::TDF::TCsvDS::TCsvDS ( std::string_view  fileName,
bool  readHeaders = true,
char  delimiter = ',' 
)

Constructor to create a CSV TDataSource for TDataFrame.

Parameters
[in] fileName     Path of the CSV file.
[in] readHeaders  true if the CSV file contains headers as first row, false otherwise (default true).
[in] delimiter    Delimiter character (default ',').

Definition at line 239 of file TCsvDS.cxx.

◆ ~TCsvDS()

ROOT::Experimental::TDF::TCsvDS::~TCsvDS ( )

Destructor.

Definition at line 278 of file TCsvDS.cxx.

Member Function Documentation

◆ FillHeaders()

void ROOT::Experimental::TDF::TCsvDS::FillHeaders ( const std::string &  line)
private

Definition at line 95 of file TCsvDS.cxx.

◆ FillRecord()

void ROOT::Experimental::TDF::TCsvDS::FillRecord ( const std::string &  line,
Record record 
)
private

Definition at line 103 of file TCsvDS.cxx.

◆ GenerateHeaders()

void ROOT::Experimental::TDF::TCsvDS::GenerateHeaders ( size_t  size)
private

Definition at line 129 of file TCsvDS.cxx.

◆ GetColumnNames()

const std::vector< std::string > & ROOT::Experimental::TDF::TCsvDS::GetColumnNames ( ) const
virtual

Returns a reference to the collection of the dataset's column names.

Implements ROOT::Experimental::TDF::TDataSource.

Definition at line 298 of file TCsvDS.cxx.

◆ GetColumnReadersImpl()

std::vector< void * > ROOT::Experimental::TDF::TCsvDS::GetColumnReadersImpl ( std::string_view  name,
const std::type_info &   
)
private, virtual

Type-erased vector of pointers to pointers to column values, one per slot.

Implements ROOT::Experimental::TDF::TDataSource.

Definition at line 136 of file TCsvDS.cxx.

◆ GetEntryRanges()

std::vector< std::pair< ULong64_t, ULong64_t > > ROOT::Experimental::TDF::TCsvDS::GetEntryRanges ( )
virtual

Return ranges of entries to distribute to tasks.

They are required to be contiguous intervals with no entries skipped. Supposing a dataset with nEntries, the intervals must start at 0 and end at nEntries, e.g. [0-5],[5-10] for 10 entries; the upper bound of each interval is exclusive, so it doubles as the lower bound of the next one.

Implements ROOT::Experimental::TDF::TDataSource.

Definition at line 303 of file TCsvDS.cxx.
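
To make the contiguity requirement concrete, here is a minimal sketch of one way to split nEntries into ranges, one per slot. It is not the code in TCsvDS.cxx: the function name `MakeEntryRanges`, the alias `EntryRange` and the policy of giving the first ranges one extra entry are assumptions chosen for illustration, and `unsigned long long` stands in for ULong64_t so the sketch compiles without ROOT.

```cpp
#include <utility>
#include <vector>

// Stand-in for std::pair<ULong64_t, ULong64_t>; the second member is exclusive.
using EntryRange = std::pair<unsigned long long, unsigned long long>;

// Hypothetical helper (not ROOT code): split [0, nEntries) into at most
// nSlots contiguous ranges with no entries skipped.
std::vector<EntryRange> MakeEntryRanges(unsigned long long nEntries, unsigned int nSlots)
{
   std::vector<EntryRange> ranges;
   if (nSlots == 0 || nEntries == 0)
      return ranges;

   const unsigned long long chunk = nEntries / nSlots;
   const unsigned long long remainder = nEntries % nSlots;

   unsigned long long start = 0;
   for (unsigned int i = 0; i < nSlots; ++i) {
      // The first `remainder` ranges receive one extra entry.
      const unsigned long long end = start + chunk + (i < remainder ? 1 : 0);
      if (end > start) // skip empty ranges when there are more slots than entries
         ranges.emplace_back(start, end);
      start = end;
   }
   return ranges;
}
```

With 10 entries and 2 slots this yields the pairs (0, 5) and (5, 10), matching the example above.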

◆ GetTypeName()

std::string ROOT::Experimental::TDF::TCsvDS::GetTypeName ( std::string_view  ) const
virtual

Type of a column as a string, e.g. GetTypeName("x") == "double". Required for jitting, e.g. df.Filter("x>0").

Parameters
[in] columnName  The name of the column

Implements ROOT::Experimental::TDF::TDataSource.

Definition at line 309 of file TCsvDS.cxx.

◆ HasColumn()

bool ROOT::Experimental::TDF::TCsvDS::HasColumn ( std::string_view  ) const
virtual

Checks if the dataset has a certain column.

Parameters
[in] columnName  The name of the column

Implements ROOT::Experimental::TDF::TDataSource.

Definition at line 320 of file TCsvDS.cxx.

◆ InferColTypes()

void ROOT::Experimental::TDF::TCsvDS::InferColTypes ( std::vector< std::string > &  columns)
private

Definition at line 168 of file TCsvDS.cxx.

◆ InferType()

void ROOT::Experimental::TDF::TCsvDS::InferType ( const std::string &  col,
unsigned int  idxCol 
)
private

Definition at line 177 of file TCsvDS.cxx.
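
InferType carries no description on this page. As a hedged illustration of how pattern-based type inference for a single CSV cell can work, the sketch below uses std::regex rather than the TRegexp members listed under Static Private Attributes. The function name `InferCsvType`, the patterns, the order in which they are tried and all returned type names other than "double" are assumptions, not the actual TCsvDS rules.

```cpp
#include <regex>
#include <string>

// Hypothetical helper (not ROOT code): guess a column type from one cell.
// The patterns below are illustrative assumptions.
std::string InferCsvType(const std::string &value)
{
   static const std::regex intRe(R"([-+]?[0-9]+)");
   static const std::regex doubleRe(R"([-+]?([0-9]+\.[0-9]*|\.[0-9]+|[0-9]+)([eE][-+]?[0-9]+)?)");
   static const std::regex boolRe(R"(true|false)");

   // regex_match requires the whole string to match the pattern.
   if (std::regex_match(value, intRe))
      return "Long64_t";
   if (std::regex_match(value, doubleRe))
      return "double";
   if (std::regex_match(value, boolRe))
      return "bool";
   return "std::string"; // fallback for anything else
}
```

Integers are tested before floating point numbers because every integer literal would also match the floating point pattern.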

◆ Initialise()

void ROOT::Experimental::TDF::TCsvDS::Initialise ( )
virtual

Convenience method called before starting an event-loop.

This method might be called multiple times over the lifetime of a TDataSource, since users can run multiple event-loops with the same TDataFrame. Ideally, Initialise should set the state of the TDataSource so that multiple identical event-loops will produce identical results.

Reimplemented from ROOT::Experimental::TDF::TDataSource.

Definition at line 360 of file TCsvDS.cxx.

◆ ParseColumns()

std::vector< std::string > ROOT::Experimental::TDF::TCsvDS::ParseColumns ( const std::string &  line)
private

Definition at line 197 of file TCsvDS.cxx.
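
ParseColumns is likewise undocumented here. The sketch below shows one common way to split a CSV line into fields for a configurable delimiter. The quoting behaviour (double-quoted fields may contain the delimiter, and a doubled quote inside a quoted field stands for a literal quote) is an assumption borrowed from common CSV practice, and `SplitCsvLine` is a hypothetical name, not the TCsvDS implementation.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper (not ROOT code): split one CSV line into fields.
std::vector<std::string> SplitCsvLine(const std::string &line, char delimiter = ',')
{
   std::vector<std::string> fields;
   std::string current;
   bool inQuotes = false;

   for (std::size_t i = 0; i < line.size(); ++i) {
      const char c = line[i];
      if (inQuotes) {
         if (c == '"') {
            if (i + 1 < line.size() && line[i + 1] == '"') {
               current += '"'; // doubled quote: literal quote character
               ++i;
            } else {
               inQuotes = false; // closing quote
            }
         } else {
            current += c; // delimiters inside quotes are kept verbatim
         }
      } else if (c == '"') {
         inQuotes = true; // opening quote
      } else if (c == delimiter) {
         fields.push_back(current);
         current.clear();
      } else {
         current += c;
      }
   }
   fields.push_back(current); // last field (possibly empty)
   return fields;
}
```

Passing a different `delimiter`, for example ';', covers files that do not use the default ','.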

◆ ParseValue()

size_t ROOT::Experimental::TDF::TCsvDS::ParseValue ( const std::string &  line,
std::vector< std::string > &  columns,
size_t  i 
)
private

Definition at line 208 of file TCsvDS.cxx.

◆ SetEntry()

void ROOT::Experimental::TDF::TCsvDS::SetEntry ( unsigned int  slot,
ULong64_t  entry 
)
virtual

Advance the "cursors" returned by GetColumnReaders to the selected entry for a particular slot.

Parameters
[in] slot   The data processing slot that needs to be considered
[in] entry  The entry which needs to be pointed to by the reader pointers

Slots are adopted to accommodate parallel data processing. Different workers will loop over different ranges and will be labelled by different "slot" values.

Implements ROOT::Experimental::TDF::TDataSource.

Definition at line 325 of file TCsvDS.cxx.
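
The interplay between SetNSlots, GetColumnReaders and SetEntry can be illustrated with a toy, single-column source. This is a simplified sketch under stated assumptions, not the TCsvDS implementation: the class `ToyDoubleColumn` is hypothetical, handles only one column of doubles, and uses `unsigned long long` in place of ULong64_t so it compiles without ROOT.

```cpp
#include <utility>
#include <vector>

// Hypothetical toy source (not ROOT code) with one column of doubles.
class ToyDoubleColumn {
   std::vector<double> fValues;        // all entries of the column
   std::vector<double *> fSlotCursors; // one "current value" pointer per slot

public:
   explicit ToyDoubleColumn(std::vector<double> values) : fValues(std::move(values)) {}

   // Must be called before GetColumnReaders: resizing later would
   // invalidate the addresses handed out below.
   void SetNSlots(unsigned int nSlots) { fSlotCursors.assign(nSlots, nullptr); }

   // One double** per slot. Each points at that slot's cursor, so it stays
   // valid while SetEntry changes what the cursor points to.
   std::vector<double **> GetColumnReaders()
   {
      std::vector<double **> readers;
      for (auto &cursor : fSlotCursors)
         readers.push_back(&cursor);
      return readers;
   }

   // Point the cursor of `slot` at the value of `entry`.
   void SetEntry(unsigned int slot, unsigned long long entry)
   {
      fSlotCursors[slot] = &fValues[entry];
   }
};
```

A worker assigned slot `s` would dereference `**readers[s]` after each `SetEntry(s, entry)` call. Because every slot has its own cursor, workers using different slots do not overwrite each other's current entry.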

◆ SetNSlots()

void ROOT::Experimental::TDF::TCsvDS::SetNSlots ( unsigned int  nSlots)
virtual

Inform TDataSource of the number of processing slots (i.e. worker threads) used by the associated TDataFrame. Slot numbers are used to simplify parallel execution: TDataFrame guarantees that different threads will always pass different slot values when calling methods concurrently.

Implements ROOT::Experimental::TDF::TDataSource.

Definition at line 343 of file TCsvDS.cxx.

Member Data Documentation

◆ doubleRegex1

TRegexp ROOT::Experimental::TDF::TCsvDS::doubleRegex1
static, private

Definition at line 39 of file TCsvDS.hxx.

◆ doubleRegex2

TRegexp ROOT::Experimental::TDF::TCsvDS::doubleRegex2
static, private

Definition at line 39 of file TCsvDS.hxx.

◆ falseRegex

TRegexp ROOT::Experimental::TDF::TCsvDS::falseRegex
static, private

Definition at line 39 of file TCsvDS.hxx.

◆ fBoolEvtValues

std::vector<std::deque<bool> > ROOT::Experimental::TDF::TCsvDS::fBoolEvtValues
private

Definition at line 37 of file TCsvDS.hxx.

◆ fColAddresses

std::vector<std::vector<void *> > ROOT::Experimental::TDF::TCsvDS::fColAddresses
private

Definition at line 29 of file TCsvDS.hxx.

◆ fColTypes

std::map<std::string, std::string> ROOT::Experimental::TDF::TCsvDS::fColTypes
private

Definition at line 27 of file TCsvDS.hxx.

◆ fColTypesList

std::list<std::string> ROOT::Experimental::TDF::TCsvDS::fColTypesList
private

Definition at line 28 of file TCsvDS.hxx.

◆ fDelimiter

char ROOT::Experimental::TDF::TCsvDS::fDelimiter
private

Definition at line 25 of file TCsvDS.hxx.

◆ fDoubleEvtValues

std::vector<std::vector<double> > ROOT::Experimental::TDF::TCsvDS::fDoubleEvtValues
private

Definition at line 32 of file TCsvDS.hxx.

◆ fEntryRanges

std::vector<std::pair<ULong64_t, ULong64_t> > ROOT::Experimental::TDF::TCsvDS::fEntryRanges
private

Definition at line 30 of file TCsvDS.hxx.

◆ fFileName

std::string ROOT::Experimental::TDF::TCsvDS::fFileName
private

Definition at line 24 of file TCsvDS.hxx.

◆ fHeaders

std::vector<std::string> ROOT::Experimental::TDF::TCsvDS::fHeaders
private

Definition at line 26 of file TCsvDS.hxx.

◆ fLong64EvtValues

std::vector<std::vector<Long64_t> > ROOT::Experimental::TDF::TCsvDS::fLong64EvtValues
private

Definition at line 33 of file TCsvDS.hxx.

◆ fNSlots

unsigned int ROOT::Experimental::TDF::TCsvDS::fNSlots = 0U
private

Definition at line 23 of file TCsvDS.hxx.

◆ fRecords

std::vector<Record> ROOT::Experimental::TDF::TCsvDS::fRecords
private

Definition at line 31 of file TCsvDS.hxx.

◆ fStringEvtValues

std::vector<std::vector<std::string> > ROOT::Experimental::TDF::TCsvDS::fStringEvtValues
private

Definition at line 34 of file TCsvDS.hxx.

◆ intRegex

TRegexp ROOT::Experimental::TDF::TCsvDS::intRegex
static, private

Definition at line 39 of file TCsvDS.hxx.

◆ trueRegex

TRegexp ROOT::Experimental::TDF::TCsvDS::trueRegex
static, private

Definition at line 39 of file TCsvDS.hxx.

The documentation for this class was generated from the files TCsvDS.hxx and TCsvDS.cxx.