Re: hadd: "too many open files"

From: Noel Dawe <Noel.Dawe_at_cern.ch>
Date: Mon, 8 Aug 2011 09:11:56 +0200


Hi Philippe,

I see. Why not perform the merge in batches of at most "ulimit -n" files, then? Or add an option, -n, letting the user specify the maximum number of files to consider at once. Even if merging more than "ulimit -n" files took a slight performance hit, at least hadd would not run into the system limit and fail. I think a slight performance hit is definitely worth actually running to completion; in fact, any necessary performance hit is worth it. Otherwise users typically end up writing some kind of wrapper script that calls hadd on subsets of the files until everything is merged (essentially doing what is suggested above).
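
(For concreteness, here is a minimal sketch of such a wrapper in Python. The function name, the batch size, and the ".partN" naming are arbitrary choices for illustration, not hadd options; the only thing assumed is that "hadd -f target sources..." is available on the PATH. A single level of batching is used; for very large inputs the second pass could be batched the same way.)

#!/usr/bin/env python
"""Merge ROOT files in batches that stay below the per-process open-file
limit, then merge the partial outputs into the final target."""
import subprocess
import sys

def hadd_in_batches(target, sources, batch_size=500):
    # First pass: merge each batch of inputs into its own partial file.
    partials = []
    for i in range(0, len(sources), batch_size):
        part = "%s.part%d" % (target, len(partials))
        subprocess.check_call(["hadd", "-f", part] + sources[i:i + batch_size])
        partials.append(part)
    # Second pass: merge the partial files into the requested target.
    subprocess.check_call(["hadd", "-f", target] + partials)

if __name__ == "__main__":
    # Usage: merge_in_batches.py merged.root input1.root input2.root ...
    hadd_in_batches(sys.argv[1], sys.argv[2:])

With batch_size kept below "ulimit -n", no single hadd invocation opens too many files; the cost is rereading the partial outputs in the second pass, which is the performance hit mentioned above. Setting the batch size to 1 essentially degenerates into the two-files-at-a-time scheme from my earlier message quoted below.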

Noel

On Mon, Aug 8, 2011 at 1:51 AM, Philippe Canal <pcanal_at_fnal.gov> wrote:

> Hi Noel,
>
> The current scheme comes from two observations. The first is that opening a
> file is comparatively slow, especially if the file is not local.
> The second is that it is more efficient, time-wise, to take one object to be
> merged, merge into it the equivalent objects from all the remaining files,
> and only then move on to the next object/directory (see the sketch below).
> This is particularly helpful with deep directory hierarchies, as it reduces
> the number of traversals that are needed.
>
> Cheers,
> Philippe.
>
>
> On 8/6/11 5:19 AM, Noel Dawe wrote:
>
>> I don't know why hadd needs to open all the files at the same time, but
>> probably a better way to write this tool would be to never open more than
>> two files at once: copy the first file to the destination and keep it open,
>> then pop off the next file, open it, merge it into the first, close it, then
>> pop off the next file and open it, and so on.
>>
>> Noel
>>
>
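
(For reference, a rough PyROOT sketch of the object-by-object scheme Philippe describes in the quoted message. This is not hadd's actual code; it assumes a flat file containing only objects with a Merge(TCollection*) method, such as histograms, and ignores subdirectories and error handling. The point to note is that every input has to stay open for the whole merge, which is why the scheme runs into "ulimit -n".)

import ROOT

def merge_object_by_object(inputs, output):
    # All inputs are opened up front and stay open until the end.
    files = [ROOT.TFile.Open(name) for name in inputs]
    fout = ROOT.TFile.Open(output, "RECREATE")
    for key in files[0].GetListOfKeys():
        target = key.ReadObj()
        # Collect the equivalent object from every remaining file ...
        rest = ROOT.TList()
        for f in files[1:]:
            rest.Add(f.Get(key.GetName()))
        # ... merge them into the first copy, and only then move on.
        target.Merge(rest)
        fout.cd()
        target.Write()
    fout.Close()
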
Received on Mon Aug 08 2011 - 09:12:27 CEST
