Re: hadd: "too many open files"

From: Philippe Canal <pcanal_at_fnal.gov>
Date: Thu, 11 Aug 2011 16:50:26 -0500


Hi Vladimir,

I was not very clear. The newest version of hadd (in the trunk) does exactly what you need: it can handle an arbitrary number of files, and by default on SLC5 it will automatically process them in bunches of 924 files (i.e. 1024 minus 100 for some wiggle room).
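
For illustration, the same behaviour is also reachable from a macro through TFileMerger. A minimal sketch, with placeholder file names:

   #include "TFileMerger.h"
   #include "TString.h"

   // Sketch: merge 5000 files in one go. The trunk version batches the
   // inputs internally (924 at a time by default), so no wrapper script
   // is needed. File names are placeholders.
   void mergeMany() {
      TFileMerger merger;
      merger.OutputFile("merged.root");
      for (int i = 0; i < 5000; ++i)
         merger.AddFile(Form("input_%d.root", i));
      merger.Merge();
   }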

Cheers,
Philippe.

On 8/11/11 4:43 PM, Vladimir Savinov wrote:
> Hi Philippe,
> the standard setting on SLC5 is 1024 descriptors (only).
> Would it be possible to include one of those scripts that
> can merge an arbitrary number of ROOT files in the ROOT
> distribution? I am one of those people who have to call
> hadd several times from another script to do the merging in
> groups of approx. 800 files. I routinely need to merge 5K files.
> thanks,
> v
>
> On 8/11/2011 3:40 PM, Philippe Canal wrote:
>> Hi Noel,
>>
>> TFileMerger and hadd are now limited to 'ulimit -n' (minus some wiggle room for files opened by the system or CINT) files opened
>> at the same time, and the value of this maximum can be customized at run-time (hadd -n max, TFileMerger::SetMaxOpenedFiles).
>> (In revision 40569 and up.)
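>>
>> For instance, from a macro (a sketch; the output name and the limit of 500 are made up for illustration):
>>
>>    #include "TFileMerger.h"
>>
>>    // Sketch: cap the number of simultaneously open files at run time.
>>    void cappedMerge() {
>>       TFileMerger merger;
>>       merger.SetMaxOpenedFiles(500);  // same effect as 'hadd -n 500'
>>       merger.OutputFile("merged.root");
>>       merger.AddFile("a.root");
>>       merger.AddFile("b.root");
>>       merger.Merge();
>>    }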
>>
>> Cheers,
>> Philippe.
>>
>> On 8/8/11 2:11 AM, Noel Dawe wrote:
>>> Hi Philippe,
>>>
>>> I see. Why not perform the merge in batches containing a maximum of "ulimit -n" files then? Or add an option -n allowing the
>>> user to specify the maximum number of files to consider at once. Merging more than "ulimit -n" files would then take a slight
>>> performance hit, but at least hadd would not hit the system limit and fail. I think that slight performance hit is definitely
>>> worth actually running to completion; in fact, I think any necessary performance hit is worth it. Otherwise users typically
>>> write some kind of wrapper script which calls hadd on a subset of the files until all files are merged (essentially doing as
>>> suggested above).
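>>>
>>> (To illustrate, the batching such a wrapper performs looks roughly like this when written as a ROOT macro; the batch size and all names are made up:)
>>>
>>>    #include "TFileMerger.h"
>>>    #include "TString.h"
>>>    #include <algorithm>
>>>    #include <string>
>>>    #include <vector>
>>>
>>>    // Sketch of the batching idea: merge the inputs in groups that stay
>>>    // under the descriptor limit, then merge the partial outputs.
>>>    void mergeInBatches(const std::vector<std::string> &inputs) {
>>>       const size_t kBatch = 800;  // safely below a 1024 'ulimit -n'
>>>       TFileMerger finalMerger;
>>>       finalMerger.OutputFile("merged.root");
>>>       for (size_t b = 0; b * kBatch < inputs.size(); ++b) {
>>>          TString partialName = Form("partial_%d.root", (int)b);
>>>          TFileMerger partial;
>>>          partial.OutputFile(partialName);
>>>          size_t end = std::min(inputs.size(), (b + 1) * kBatch);
>>>          for (size_t i = b * kBatch; i < end; ++i)
>>>             partial.AddFile(inputs[i].c_str());
>>>          partial.Merge();
>>>          finalMerger.AddFile(partialName);
>>>       }
>>>       finalMerger.Merge();
>>>    }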
>>>
>>> Noel
>>>
>>> On Mon, Aug 8, 2011 at 1:51 AM, Philippe Canal <pcanal_at_fnal.gov> wrote:
>>>
>>> Hi Noel,
>>>
>>> The current scheme comes from two observations. The first is that opening a file is comparatively slow, especially if the
>>> file is not local. The second is that it is more efficient time-wise to take one object to be merged, merge into it the
>>> equivalent objects from all the remaining files, and only then move on to the next object/directory. This is particularly
>>> helpful with deep directory hierarchies, as it reduces the number of traversals that are needed.
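>>>
>>> (Concretely, the second observation amounts to a loop of roughly this shape. A sketch only, restricted to histograms in a flat file; the real implementation is hadd's MergeRecursive:)
>>>
>>>    #include "TFile.h"
>>>    #include "TH1.h"
>>>    #include "TKey.h"
>>>    #include "TList.h"
>>>    #include <vector>
>>>
>>>    // All inputs stay open, and each object is fetched from every file
>>>    // before moving on, so a deep directory tree is traversed once per
>>>    // object rather than once per file.
>>>    void mergeObjectByObject(const std::vector<TFile*> &inputs, TFile *out) {
>>>       TIter nextKey(inputs[0]->GetListOfKeys());
>>>       while (TKey *key = (TKey*)nextKey()) {
>>>          TH1 *h = dynamic_cast<TH1*>(key->ReadObj());
>>>          if (!h) continue;  // this sketch handles only histograms
>>>          TList others;
>>>          for (size_t i = 1; i < inputs.size(); ++i)
>>>             others.Add(inputs[i]->Get(key->GetName()));
>>>          h->Merge(&others);
>>>          out->cd();
>>>          h->Write();
>>>       }
>>>    }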
>>>
>>> Cheers,
>>> Philippe.
>>>
>>>
>>> On 8/6/11 5:19 AM, Noel Dawe wrote:
>>>
>>> I don't know why hadd needs to open all the files at the same time, but probably a better way to write this tool would be
>>> to never open more than two files at once: copy the first file to the destination and keep it open, then pop off the
>>> next file, open it, merge it into the first, close it, then pop off the next file and open it, etc.
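>>>
>>> (As a sketch, one way to get that effect with existing pieces, at the cost of rewriting the output at every step; all names are placeholders:)
>>>
>>>    #include "TFileMerger.h"
>>>    #include "TSystem.h"
>>>    #include <string>
>>>    #include <vector>
>>>
>>>    // Fold each input into the running output one file at a time, so
>>>    // only two inputs are ever open together. Rewriting the output on
>>>    // every iteration is what makes this approach slow.
>>>    void mergeIncrementally(const std::vector<std::string> &inputs) {
>>>       gSystem->CopyFile(inputs[0].c_str(), "merged.root", kTRUE);
>>>       for (size_t i = 1; i < inputs.size(); ++i) {
>>>          TFileMerger m;
>>>          m.OutputFile("tmp.root");
>>>          m.AddFile("merged.root");
>>>          m.AddFile(inputs[i].c_str());
>>>          m.Merge();
>>>          gSystem->Rename("tmp.root", "merged.root");
>>>       }
>>>    }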
>>>
>>> Noel
>>>
>>>
Received on Thu Aug 11 2011 - 23:50:36 CEST
