Re: hadd: "too many open files"

From: Vladimir Savinov <vps3_at_pitt.edu>
Date: Thu, 11 Aug 2011 21:13:48 -0400


Hi Philippe,
fantastic!!!
thank you very much!
v

On 8/11/2011 5:50 PM, Philippe Canal wrote:
> Hi Vladimir,
>
> I was not very clear. The newest version of hadd (in the trunk) does
> exactly what you need: it can handle an arbitrary number of files and,
> by default on SLC5, will automatically process them in bunches of 924
> files (i.e. 1024 - 100 for some wiggle room).
>
> Cheers,
> Philippe.
>
> On 8/11/11 4:43 PM, Vladimir Savinov wrote:
>> Hi Philippe,
>> the standard setting on SLC5 is 1024 descriptors (only).
>> Would it be possible to include in the ROOT distribution one of those
>> scripts that can merge an arbitrary number of ROOT files? I am one of
>> those people who have to call hadd several times from another script
>> to do the merging in groups of approx. 800 files. I routinely need to
>> merge 5K files.
>> thanks,
>> v
>>
>> On 8/11/2011 3:40 PM, Philippe Canal wrote:
>>> Hi Noel,
>>>
>>> TFileMerger and hadd are now limited to 'ulimit -n' (minus some
>>> wiggle room for files opened by the system or CINT) files opened at
>>> the same time, and the value of this maximum can be customized at
>>> run-time (hadd -n max, TFileMerger::SetMaxOpenedFiles). (In revision
>>> 40569 and up.)
>>>
>>> Cheers,
>>> Philippe.
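
For illustration, a minimal sketch of the run-time knob mentioned above as it
would look from a ROOT macro; the file names and the limit of 500 are
placeholders, not values from the thread:

    // merge_with_limit.C -- cap the number of simultaneously open inputs
    #include "TFileMerger.h"
    #include "TString.h"

    void merge_with_limit()
    {
       TFileMerger merger;
       merger.SetMaxOpenedFiles(500);                 // stay well below `ulimit -n`
       merger.OutputFile("merged.root");
       for (int i = 0; i < 5000; ++i)
          merger.AddFile(Form("run_%d.root", i));     // hypothetical input names
       merger.Merge();                                // inputs processed in batches
    }

The command-line equivalent would be along the lines of
"hadd -n 500 merged.root run_*.root".
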
>>>
>>> On 8/8/11 2:11 AM, Noel Dawe wrote:
>>>> Hi Philippe,
>>>>
>>>> I see. Why not perform the merge in batches containing at most
>>>> "ulimit -n" files then? Or add an option -n that lets the user
>>>> specify the maximum number of files to consider at once. Even if
>>>> that means a slight performance hit when more than "ulimit -n"
>>>> files are being merged, at least hadd would not hit the system
>>>> limit and fail. I think a slight performance hit is definitely
>>>> worth actually running to completion; in fact, any necessary
>>>> performance hit is worth it. Otherwise users typically write some
>>>> kind of wrapper script that calls hadd on a subset of the files
>>>> until all files are merged (essentially doing as suggested above).
>>>>
>>>> Noel
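
The wrapper approach mentioned above, sketched as a ROOT macro rather than a
shell script; the batch size and partial-file names are made up, and this is
only an outline of the workaround, not code from ROOT:

    // batch_merge.C -- merge in batches, then merge the partial results
    #include "TFileMerger.h"
    #include "TString.h"
    #include <vector>

    void batch_merge(const std::vector<TString> &inputs, const char *out,
                     size_t batch = 800)
    {
       std::vector<TString> partials;
       for (size_t i = 0; i < inputs.size(); i += batch) {
          TString part = TString::Format("partial_%u.root", (unsigned)(i / batch));
          TFileMerger m;
          m.OutputFile(part);
          for (size_t j = i; j < inputs.size() && j < i + batch; ++j)
             m.AddFile(inputs[j]);
          m.Merge();                                  // at most `batch` inputs open here
          partials.push_back(part);
       }
       TFileMerger second;                            // second pass over the few partials
       second.OutputFile(out);
       for (size_t k = 0; k < partials.size(); ++k)
          second.AddFile(partials[k]);
       second.Merge();
    }
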
>>>>
>>>> On Mon, Aug 8, 2011 at 1:51 AM, Philippe Canal <pcanal_at_fnal.gov> wrote:
>>>>
>>>> Hi Noel,
>>>>
>>>> The current scheme comes from two observations. One is that
>>>> opening a file is comparatively slow, especially if the file is
>>>> not local. The second is that it is more efficient, time-wise, to
>>>> take one object to be merged, merge into it the equivalent objects
>>>> from all the remaining files, and only then move on to the next
>>>> object/directory. This is particularly helpful with deep directory
>>>> hierarchies, as it reduces the number of traversals that are needed.
>>>>
>>>> Cheers,
>>>> Philippe.
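
A hedged illustration of the scheme described above, not the actual
TFileMerger code: all inputs stay open, the key list is traversed once, and
each object is merged from every input before moving on (restricted to
histograms here for brevity):

    // per_object_sketch.C -- illustration only; the real code also handles
    // trees, nested directories, and other mergeable types
    #include "TFile.h"
    #include "TKey.h"
    #include "TList.h"
    #include "TH1.h"
    #include <vector>

    void per_object_sketch(const std::vector<TFile*> &inputs, TFile *output)
    {
       TIter nextKey(inputs[0]->GetListOfKeys());
       while (TKey *key = (TKey*)nextKey()) {            // single traversal
          TH1 *target = dynamic_cast<TH1*>(inputs[0]->Get(key->GetName()));
          if (!target) continue;                         // histograms only here
          TList others;
          for (size_t i = 1; i < inputs.size(); ++i)     // same object from every other file
             if (TObject *o = inputs[i]->Get(key->GetName()))
                others.Add(o);
          target->Merge(&others);                        // one Merge call per object
          output->cd();
          target->Write();
       }
    }
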
>>>>
>>>>
>>>> On 8/6/11 5:19 AM, Noel Dawe wrote:
>>>>
>>>> I don't know why hadd needs to open all the files at the
>>>> same time, but a better way to write this tool would probably
>>>> be to never open more than two files at once: copy the first
>>>> file to the destination and keep it open, then pop off the next
>>>> file, open it, merge it into the first, close it, then pop off
>>>> the next file and open it, etc.
>>>>
>>>> Noel
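
For contrast, a hedged sketch of the one-input-at-a-time idea outlined above
(again histograms only, and not how hadd actually works): keep running totals
in memory, open each input in turn, add it in, and close it.

    // incremental_sketch.C -- illustration of the two-files-open approach;
    // names are placeholders and only histograms are handled
    #include "TFile.h"
    #include "TKey.h"
    #include "TList.h"
    #include "TH1.h"
    #include <map>
    #include <string>
    #include <vector>

    void incremental_sketch(const std::vector<std::string> &inputs, const char *out)
    {
       std::map<std::string, TH1*> totals;               // running merged objects
       for (size_t i = 0; i < inputs.size(); ++i) {
          TFile in(inputs[i].c_str());                   // only this input is open
          TIter nextKey(in.GetListOfKeys());
          while (TKey *key = (TKey*)nextKey()) {
             TH1 *h = dynamic_cast<TH1*>(in.Get(key->GetName()));
             if (!h) continue;
             std::map<std::string, TH1*>::iterator it = totals.find(key->GetName());
             if (it == totals.end()) {
                h->SetDirectory(0);                      // detach so it survives the close
                totals[key->GetName()] = h;
             } else {
                it->second->Add(h);                      // merge into the running total
             }
          }
       }                                                 // input closed here
       TFile output(out, "RECREATE");
       for (std::map<std::string, TH1*>::iterator it = totals.begin();
            it != totals.end(); ++it)
          it->second->Write();
    }

This keeps at most two files open, but pays for it by holding every merged
object in memory and re-reading the key list once per input, which is the
trade-off Philippe points to above.
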
>>>>
>>>>
Received on Fri Aug 12 2011 - 03:13:44 CEST
