Re: hadd: "too many open files"

From: Vladimir Savinov <vps3_at_pitt.edu>
Date: Thu, 11 Aug 2011 17:43:29 -0400


Hi Philippe,
the standard setting on SLC5 is only 1024 descriptors. Would it be possible to include in the ROOT distribution one of those scripts that can merge an arbitrary number of ROOT files? I am one of those people who have to call hadd several times from another script to merge in groups of approx. 800 files, and I routinely need to merge 5K files. Thanks,
v
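
For reference, the kind of wrapper script mentioned above could look roughly like the sketch below. It only plans and runs plain "hadd -f target source1 source2 ..." calls. The function names, the intermediate file prefix "hadd_part" and the default batch size of 800 are assumptions made for illustration; this is not a script shipped with ROOT.

```python
import subprocess


def plan_merges(target, sources, batch_size=800, tmp_prefix="hadd_part"):
    """Return a list of hadd command lines that merge `sources` into `target`
    without passing more than `batch_size` inputs to any single hadd call.

    Intermediate file names (tmp_prefix) are hypothetical; pick a location
    that suits your setup.
    """
    if batch_size < 2:
        raise ValueError("batch_size must be at least 2")
    current = list(sources)
    if not current:
        raise ValueError("no input files given")

    commands = []
    level = 0
    # Keep merging in groups until the remaining inputs fit in one call.
    while len(current) > batch_size:
        next_level = []
        for start in range(0, len(current), batch_size):
            part = f"{tmp_prefix}_{level}_{start // batch_size}.root"
            commands.append(["hadd", "-f", part] + current[start:start + batch_size])
            next_level.append(part)
        current = next_level
        level += 1

    # Final call: merge the remaining (possibly intermediate) files.
    commands.append(["hadd", "-f", target] + current)
    return commands


def run_merges(commands):
    """Run the planned hadd calls in order, stopping at the first failure."""
    for cmd in commands:
        subprocess.run(cmd, check=True)
```

With 5000 inputs and groups of 800 this plans seven intermediate merges followed by one final merge. The intermediate files are left on disk in this sketch and would need to be removed afterwards.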

On 8/11/2011 3:40 PM, Philippe Canal wrote:
> Hi Noel,
>
> TFileMerger and hadd are now limited to 'ulimit -n' files open at the
> same time (minus some wiggle room for files opened by the system or
> CINT), and this maximum can be customized at run-time (hadd -n max,
> TFileMerger::SetMaxOpenedFiles). This is available in revision 40569
> and up.
>
> Cheers,
> Philippe.
>
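
As a rough illustration of the limit described above, the soft 'ulimit -n' value can be read from a script and reduced by some wiggle room. This is only a sketch: the reserve of 24 descriptors is an assumed number for illustration, not the margin ROOT actually uses, and the function names are hypothetical.

```python
import resource


def inputs_for_limit(limit, reserve=24):
    """Number of input files to open at once for a given descriptor limit.

    `reserve` is an assumed safety margin for descriptors already used by
    the process (stdin/stdout, libraries, the output file, ...).
    """
    return max(limit - reserve, 2)


def max_inputs_per_merge(reserve=24):
    """Derive the value from the current soft limit (what 'ulimit -n' shows).

    Returns None if the limit is reported as unlimited.
    """
    soft, _hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    if soft == resource.RLIM_INFINITY:
        return None
    return inputs_for_limit(soft, reserve)
```

On a system with the default of 1024 descriptors this gives 1000 with the assumed reserve. The result could be used as the group size in a wrapper script, or passed as the maximum to "hadd -n" on versions that have that option.
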
> On 8/8/11 2:11 AM, Noel Dawe wrote:
>> Hi Philippe,
>>
>> I see. Why not perform the merge in batches containing a maximum of
>> "ulimit -n" files then? Or add an option -n allowing the user to
>> specify a maximum number of files to consider at once. Although
>> there would be a slight performance hit if more than "ulimit -n"
>> files were being merged, at least hadd would not hit the system
>> limits and fail.
>> I think the slight performance hit is definitely worth actually
>> running to completion. Actually, I think any necessary performance
>> hit is worth it. Otherwise users typically write some kind of wrapper
>> script which calls hadd on a subset of the files until all files are
>> merged (essentially doing as suggested above).
>>
>> Noel
>>
>> On Mon, Aug 8, 2011 at 1:51 AM, Philippe Canal <pcanal_at_fnal.gov
>> <mailto:pcanal_at_fnal.gov>> wrote:
>>
>> Hi Noel,
>>
>> The current scheme comes from 2 observations, one being that
>> opening a file is comparatively slow, especially if the file is
>> not local.
>> The 2nd is that it is more efficient time-wise to get one object
>> to be merged, then merge into this object the equivalent
>> objects from all the remaining files, and then move on to the
>> next object/directory. This is particularly helpful with deep
>> directory hierarchies, as it reduces the number of traversals
>> that are needed.
>>
>> Cheers,
>> Philippe.
>>
>>
>> On 8/6/11 5:19 AM, Noel Dawe wrote:
>>
>> I don't know why hadd needs to open all the files at the same
>> time but probably a better way to write this tool would be to
>> never open more than two files at once: copy the first file
>> to the destination and keep it open, then pop off the next
>> file, open it, merge it into the first, close it, then pop
>> off the next file and open it, etc...
>>
>> Noel
>>
>>
Received on Thu Aug 11 2011 - 23:43:38 CEST
