Re: root file splitting and merging

From: Philippe Canal <pcanal_at_fnal.gov>
Date: Tue, 8 Nov 2011 13:48:39 -0600


Hi Donal,

I strongly suggest that you look into using PROOF, our Parallel ROOT Facility. PROOF is an extension of ROOT <http://root.cern.ch> enabling interactive analysis of large sets of ROOT files in parallel on clusters of computers or many-core machines. See <http://root.cern.ch/drupal/content/proof>.
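
As a rough sketch of how this could be used (the file name "data.root", the tree name "T" and the selector "MySelector.C" below are only placeholders, not something specific to your setup), a local PROOF-Lite session processing a chain looks like:

   #include "TProof.h"
   #include "TChain.h"

   void run_proof()
   {
      TProof::Open("lite://");            // start a local PROOF-Lite session
      TChain *chain = new TChain("T");    // tree name is a placeholder
      chain->Add("data.root");            // your large input file
      chain->SetProof();                  // route Process() through PROOF
      chain->Process("MySelector.C+");    // your analysis code in a TSelector
   }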

 > I have a large ROOT file (say 2 GB) and want an efficient way to
 > split it into small chunks (say 64 MB), and store these chunks on
 > different computing nodes to analyse them separately.

The cost of actively splitting the file will more than likely be much higher than the gain you would get by accessing the file in that manner.

 > I wonder if I can read the ROOT file sequentially, object by object, into
 > a buffer, and save it to a new file when the buffer is >= 64 MB.

If you use PROOF and have created the file with the auto-flush size set to 64 MB, PROOF will essentially do this separation for you and access the file only in 64 MB chunks.
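
For reference, writing a tree with that auto-flush setting could look roughly like the following (file, tree and branch names are just placeholders for illustration):

   #include "TFile.h"
   #include "TTree.h"

   void write_with_autoflush()
   {
      TFile *f = TFile::Open("big.root", "RECREATE");
      TTree *t = new TTree("T", "tree written in ~64 MB clusters");
      t->SetAutoFlush(-64*1024*1024);   // flush baskets after ~64 MB of data
      double x = 0;
      t->Branch("x", &x, "x/D");
      for (Long64_t i = 0; i < 1000000; ++i) {
         x = i;
         t->Fill();
      }
      f->Write();
      f->Close();
   }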

Cheers,
Philippe.

On 11/8/11 12:43 AM, donal0412 wrote:
> Thanks Philippe.
> Approximate size is fine.
> But my use case is:
> I have a large ROOT file (say 2 GB) and want an efficient way to split it into small chunks (say 64 MB), and store these chunks
> on different computing nodes to analyse them separately.
> I wonder if I can read the ROOT file sequentially, object by object, into a buffer, and save it to a new file when the buffer
> is >= 64 MB.
> I have little knowledge about ROOT, so I would appreciate it if you could give me some example code.
> Cheers,
> Donal
> On 2011/11/7 23:40, Philippe Canal wrote:
>> Hi Donal.
>>
>> There is currently no easy way to enforce that a ROOT file is written in exactly 64 MB chunks. However, you can get close (within
>> statistical fluctuations) by setting the 'auto-flush' size to 64 MB when writing the TTree (SetAutoFlush(-64*1024*1024)).
>>
>> Cheers,
>> Philippe.
>>
>> On 11/5/11 3:56 AM, donal0412 wrote:
>>> Hi ROOT experts and users,
>>> I'm considering using HDFS to store ROOT files and a map-reduce framework to process them (reconstruction, analysis, MC).
>>> I wonder if there is an efficient way to split a ROOT file into fixed-size chunks (say 64 MB), and to merge several ROOT files
>>> into one file.
>>>
>>> Thanks !
>>> Donal
>>>
>
Received on Tue Nov 08 2011 - 20:48:46 CET
