Re: Adding THnSparseD objects

From: Axel Naumann <Axel.Naumann_at_cern.ch>
Date: Tue, 16 Jun 2009 10:15:58 +0200


Hi,

20 * 1.2 MB (about 24 MB in total) is still reasonable. There must be something else going on. Can you send (or put on the web) one of the files you are trying to merge? You can send it to me personally if you prefer.

Cheers, Axel

Alberto Pulvirenti wrote on 06/16/2009 09:55 AM:

> Hello,
> 
> to tell the truth, I have at least 20 such histograms per file. The 
> nested lists are a way to keep things well organized, but I was 
> wondering whether having too many nested levels of sorting could cause 
> memory problems, since I once tried putting all of these histograms in 
> just one TList level and it seemed to behave better.
> 
> As for TTrees: the tree would be filled with combinations of pairs of 
> tracks from events containing up to 1000 tracks each, so in principle 
> almost 10^6 combinations per event, for at least 200 events per job, 
> and for a total of around 1 million events. I think that a TTree could 
> explode with such sizes, and it could become extremely huge in terms 
> of file size... unless there is something I don't know...
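> 
> (A quick back-of-the-envelope check with these numbers: 10^6 pairs per 
> event times 10^6 events is about 10^12 entries in total; even at a few 
> bytes per entry after compression that would be terabytes, which is 
> why I fear the file size would explode.)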
> 
> Cheers
> 
>    Alberto
> 
> Axel Naumann wrote:

>> Hi,
>>
>> you say you have 400k bins, 1/4 of them filled, so that makes 100k
>> bins. You use doubles (8 bytes), and THnSparse's bin bookkeeping most
>> probably costs you much less than 4 bytes per bin (depends on the
>> number of dimensions and the number of bins per axis), so they should
>> use less than 1.2M bytes. That does not explain why you run out of
>> memory.
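>>
>> Just to make that concrete, here is a rough check you can run on one
>> of your histograms (an untested sketch; the 12 bytes per filled bin
>> is only my guess for content plus coordinate bookkeeping, and "h" is
>> assumed to point to one of your objects):
>>
>>   // "h" is one of your THnSparseD objects
>>   Long64_t filled = h->GetNbins();   // number of filled bins
>>   double estMB = filled * 12. / (1024. * 1024.);
>>   printf("%lld filled bins, roughly %.2f MB\n", filled, estMB);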
>>
>> Of course the question now is how many *different* THnSparses you are
>> trying to merge, i.e. how many THnSparses you expect to end up with
>> after the merge process. You have 4 nested lists; the number of
>> objects easily explodes because you can get an exponential increase.
>> It really depends on how you store things in there.
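>>
>> (Just to illustrate the scaling: with, say, 20 entries per list level
>> and 4 levels, you already end up with 20^4 = 160,000 leaf objects.)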
>>
>> Btw: why don't you store the analysis result in a TTree? That
>> consumes much less memory, you can create projections from it just as
>> easily as from a THnSparse, and TTrees are trivial to merge. They
>> should also allow you to get rid of the four nested list levels. If
>> you really need to keep things in memory you could try in-memory
>> trees (i.e. trees that are not attached to a TFile).
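>>
>> Something like this untested sketch (the branch names are just
>> placeholders for your pair variables):
>>
>>   Double_t pt, eta, mass;            // one track pair per entry
>>   TTree t("pairs", "track pairs");
>>   t.SetDirectory(0);                 // keep it in memory, no TFile
>>   t.Branch("pt",   &pt,   "pt/D");
>>   t.Branch("eta",  &eta,  "eta/D");
>>   t.Branch("mass", &mass, "mass/D");
>>   // set the variables and call t.Fill() once per pair, then
>>   // project just like you would with a THnSparse:
>>   t.Draw("mass", "pt > 1.", "goff");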
>>
>> Cheers, Axel.
>>
>> Alberto Pulvirenti wrote on 06/16/2009 09:17 AM:
>>> Dear all,
>>>
>>> I have tried to switch to THnSparseD for my analysis, since it
>>> allows multiple binnings, which helps to speed up my analysis and to
>>> keep the relations between all the variables I use to bin my
>>> histogram. However, this results in an object with very many bins
>>> (~400k), even though, on the most segmented axes, almost one quarter
>>> of them should be empty.
>>> Since I run jobs on the grid, I have many output files, each one
>>> containing a set of such objects, and I then have to merge them into
>>> a single file.
>>>
>>> In my file, the histograms are stored in a TList object, which
>>> contains 3 deeper levels of TLists (I mean:
>>> TList->TList->TList->TList->object).
>>>
>>> Now, when I use "hadd" (or the TFileMerger) to do this, the memory
>>> occupancy of the process grows to almost all the RAM available in my
>>> PC (4 GB), so I cannot add the files without getting segmentation
>>> faults or aborts due to excessive memory consumption.
>>>
>>> I was wondering whether this can be attributed to the huge size of
>>> the THnSparses or to problems related to having many nested levels
>>> of TLists, which are all stored as a single key inside the file.
>>>
>>> Can someone help me understand this, or give me some suggestions
>>> about the best way to store the data in the file?
>>>
>>> Thanks, best regards
>>>
>>> Alberto
>>>
>>>

Received on Tue Jun 16 2009 - 10:15:54 CEST
