Hi Dimitri,
However, in this case there is no clear meaning to the compressed size of a single entry.
One GetEntry call uses only part of the buffers that are compressed (i.e. each compression unit (a basket) contains the data of a branch for several entries), so a compressed value per entry would be at best an approximation.
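If what you want is an overall compression number, the tree itself knows both totals, so you can compute the average factor at the tree (or branch) level. Something along these lines should do it (untested sketch; "myfile.root" and "T" are just placeholders for your file and tree names):

   TFile *f = TFile::Open("myfile.root");
   TTree *t = (TTree*)f->Get("T");
   Long64_t tot = t->GetTotBytes();   // uncompressed bytes (what GetEntry counts)
   Long64_t zip = t->GetZipBytes();   // compressed bytes (what sits on disk / goes over the wire)
   printf("average compression factor ~ %.2f\n", (double)tot / zip);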
> As a user, I would be way more interested to get the "real" number of bytes flowing from the source to my application, not the "uncompressed" one.
Actually, in my opinion, both are very important. The uncompressed size represents your real data size, and as an overall flow of data it is much more interesting from an application point of view than the compressed size. The compressed size is of course also important, but more from a storage-management point of view.
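If you want to see both numbers at once, a quick way (per branch) is simply:

   t->Print();   // lists, for each branch, the total (uncompressed) size, the file (compressed) size and the compression factor

which is usually enough to see where the storage actually goes (here "t" is the tree, as in the sketch above).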
> so the read rate is ~ 2 MB/s
By the way, this read rate looks pretty low and is likely 'only' due to the high compression ratio (which means that the application spends proportionally more time unzipping than normal). If the compression ratio in your example is not actually typical of the real use case, you will find that the read rate expressed in uncompressed bytes per second is closer to what you would see in the real use case.
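If you want your loop to report the rate both ways, you can scale by the tree's overall compression factor. A rough sketch (again untested; "t" is the tree from above and the whole tree is assumed to be read):

   TStopwatch sw;
   sw.Start();
   Long64_t nbytes = 0;
   for (Long64_t i = 0; i < t->GetEntries(); ++i) nbytes += t->GetEntry(i);
   Double_t seconds = sw.RealTime();
   Double_t factor  = (Double_t)t->GetTotBytes() / t->GetZipBytes();
   printf("uncompressed rate: %.1f MB/s, estimated compressed rate: %.1f MB/s\n",
          nbytes / seconds / 1.e6, nbytes / factor / seconds / 1.e6);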
Cheers,
Philippe.
On 11/9/11 3:56 PM, Dimitri Bourilkov wrote:
> Hi Philippe,
>
> Thanks, I see. This is a test file; probably the number of branches not filled for each event (and therefore easy to
> compress) is higher than average.
>
> As a user, I would be way more interested to get the "real" number of bytes flowing from the source to my application, not
> the "uncompressed" one. Or, as in rsync, I wouldn't mind getting both, but just the latter is not so useful in many cases.
>
> Cheers,
> Dimitri
>
> On 11/09/2011 04:14 PM, Philippe Canal wrote:
>> Hi,
>>
>> The number returned by GetEntry is the uncompressed size of the data while the size of the file
>> is the sum of the compressed data sizes. So it is expected that the size/rate as seen from
>> GetEntry should be higher than the one seen from the file size (/file system).
>> However, a factor of 125 is still a bit high ....
>>
>> Cheers,
>> Philippe.
>>
>> On 11/9/11 11:44 AM, Dimitri Bourilkov wrote:
>>> Hi,
>>>
>>> root 5.22 on SLC5
>>>
>>> The documentation about TTree says:
>>>
>>> Int_t GetEntry(Long64_t entry = 0, Int_t getall = 0)
>>>
>>> The function returns the number of bytes read from the input buffer.
>>>
>>> When I loop on a 7 GB file with 250k events, a tree with many branches, it takes ~ 3600 s, so the read rate is ~ 2 MB/s. As I am
>>> reading from a remote source, this rate is confirmed by the Ganglia plot for the network traffic. So far so good.
>>>
>>> Now if I look at the output from the root code (generated with MakeClass, snippet below) for nb (bytes per event) and nbytes, I
>>> get a rate which is 125 times higher?!
>>>
>>> Can someone comment on the output from GetEntry?
>>>
>>> Thanks, Dimitri
>>>
>>>
>>> Long64_t nentries = fChain->GetEntriesFast();
>>>
>>> Long64_t nbytes = 0, nb = 0;
>>> for (Long64_t jentry=0; jentry<nentries; jentry++) {
>>>    Long64_t ientry = LoadTree(jentry);
>>>    if (ientry < 0) break;
>>>    nb = fChain->GetEntry(jentry); nbytes += nb;
>>>    // if (Cut(ientry) < 0) continue;
>>>    if (jentry-int(jentry/ifreq)*ifreq == 0) {
>>>       cout << "Event, bytes = " << jentry << ", " << nb << endl;
>>>    }
>>> }
>>>
>>> cout << "N entries, Nbytes = " << nentries << ", " << nbytes << endl;
>