Re: Time for merging of THnSparse

From: Axel Naumann <Axel.Naumann_at_cern.ch>
Date: Wed, 28 Jul 2010 12:54:17 +0200


Hi Marian,

can you check whether the attached patch also solves it in your real case? It helps tremendously for my test case; see the attached plot.

Cheers, Axel.

Marian Ivanov wrote on 07/21/2010 02:19 PM:

> Hello Axel,
> 
> The test was run on an 8-core machine with 32 GB of RAM.
> We have a limit of 4 GB per process. In my test I used 2.4 GB at most.
> 
> I do not expect a problem with swapping.
> 
> I am running callgrind; maybe tomorrow I will know more.
> 
> In the meantime, using just simple "profiling" (stopping the program with
> the debugger), I almost always end up in TExMap::GetValue.
> 
> TExMap::fSize is ~14 million. I do not know exactly how it works, but
> this huge number is suspicious.
> 
> Ciao
> Marian.
> 
> 
> 
> 
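One way a lookup such as TExMap::GetValue could come to dominate the run time is probe-chain growth in a hash table. The following is a minimal standalone sketch under stated assumptions, not ROOT's TExMap code: it assumes an open-addressing table with linear probing, the class name ProbeMap and its slot layout are hypothetical, and only the Add(hash, key, value) / GetValue(hash, key) call shape is borrowed from TExMap. It counts how many slots a lookup has to inspect.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical illustration class; NOT the ROOT implementation.
class ProbeMap {
public:
    explicit ProbeMap(std::size_t capacity) : fSlots(capacity) {}

    // Insert a (key, value) pair, or overwrite the value of an existing key.
    // Returns false if the table is full (this sketch never resizes).
    bool Add(std::uint64_t hash, std::int64_t key, std::int64_t value) {
        const std::size_t n = fSlots.size();
        if (n == 0) return false;
        std::size_t i = static_cast<std::size_t>(hash % n);
        for (std::size_t step = 0; step < n; ++step) {
            Slot& s = fSlots[i];
            if (!s.used) {
                s.used = true;
                s.key = key;
                s.value = value;
                return true;
            }
            if (s.key == key) {
                s.value = value;
                return true;
            }
            i = (i + 1) % n;  // linear probing: try the next slot
        }
        return false;
    }

    // Return the stored value, or 0 if the key is absent.
    // If probes is non-null, it receives the number of slots inspected.
    std::int64_t GetValue(std::uint64_t hash, std::int64_t key,
                          std::size_t* probes = nullptr) const {
        const std::size_t n = fSlots.size();
        std::size_t count = 0;
        std::int64_t result = 0;
        if (n != 0) {
            std::size_t i = static_cast<std::size_t>(hash % n);
            for (std::size_t step = 0; step < n; ++step) {
                ++count;
                const Slot& s = fSlots[i];
                if (!s.used) break;  // empty slot: key is not in the table
                if (s.key == key) {
                    result = s.value;
                    break;
                }
                i = (i + 1) % n;
            }
        }
        if (probes) *probes = count;
        return result;
    }

private:
    struct Slot {
        bool used;
        std::int64_t key;
        std::int64_t value;
    };
    std::vector<Slot> fSlots;  // value-initialized: every slot starts unused
};
```

In this sketch, keys whose hashes land on the same slot form a chain that every later lookup has to walk, so the cost per lookup grows with the amount of clustering rather than staying constant. Whether that is what happens in the real merge is something only the profile can tell.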
> Marian Ivanov wrote:

>> Hello Axel,
>>
>> All histograms have approximately the same number of filled bins (same order
>> of magnitude).
>> The problem is not there:
>>
>> miranov_at_lxb346:testMerge> cat calib.list | xargs ls -al
>> -rw-r--r-- 1 miranov alice 151955578 2010-07-18 22:02
>> /lustre/alice/miranov/calibPass2/testAlign1307/119037/merge0/CalibObjects.root
>> -rw-r--r-- 1 miranov alice 98555232 2010-07-19 02:22
>> /lustre/alice/miranov/calibPass2/testAlign1307/119037/merge10/CalibObjects.root
>> -rw-r--r-- 1 miranov alice 87543700 2010-07-19 02:26
>> /lustre/alice/miranov/calibPass2/testAlign1307/119037/merge11/CalibObjects.root
>> -rw-r--r-- 1 miranov alice 79393978 2010-07-19 02:29
>> /lustre/alice/miranov/calibPass2/testAlign1307/119037/merge12/CalibObjects.root
>> -rw-r--r-- 1 miranov alice 455069427 2010-07-19 04:01
>> /lustre/alice/miranov/calibPass2/testAlign1307/119037/merge13/CalibObjects.root
>> -rw-r--r-- 1 miranov alice 343900123 2010-07-19 04:38
>> /lustre/alice/miranov/calibPass2/testAlign1307/119037/merge14/CalibObjects.root
>> -rw-r--r-- 1 miranov alice 286235767 2010-07-18 22:16
>> /lustre/alice/miranov/calibPass2/testAlign1307/119037/merge1/CalibObjects.root
>> -rw-r--r-- 1 miranov alice 434115635 2010-07-18 23:51
>> /lustre/alice/miranov/calibPass2/testAlign1307/119037/merge2/CalibObjects.root
>> -rw-r--r-- 1 miranov alice 451920293 2010-07-19 01:26
>> /lustre/alice/miranov/calibPass2/testAlign1307/119037/merge3/CalibObjects.root
>> -rw-r--r-- 1 miranov alice 307624494 2010-07-19 01:45
>> /lustre/alice/miranov/calibPass2/testAlign1307/119037/merge4/CalibObjects.root
>> -rw-r--r-- 1 miranov alice 309207616 2010-07-19 02:01
>> /lustre/alice/miranov/calibPass2/testAlign1307/119037/merge5/CalibObjects.root
>> -rw-r--r-- 1 miranov alice 121205820 2010-07-19 02:05
>> /lustre/alice/miranov/calibPass2/testAlign1307/119037/merge6/CalibObjects.root
>> -rw-r--r-- 1 miranov alice 156846508 2010-07-19 02:10
>> /lustre/alice/miranov/calibPass2/testAlign1307/119037/merge7/CalibObjects.root
>> -rw-r--r-- 1 miranov alice 96025399 2010-07-19 02:14
>> /lustre/alice/miranov/calibPass2/testAlign1307/119037/merge8/CalibObjects.root
>> -rw-r--r-- 1 miranov alice 76993658 2010-07-19 02:18
>> /lustre/alice/miranov/calibPass2/testAlign1307/119037/merge9/CalibObjects.root
>>
>> Example file:
>> TFile
>> f("/lustre/alice/miranov/calibPass2/testAlign1307/119037/merge13/CalibObjects.root")
>> align->GetClusterDelta(2)->GetNbins()
>> (const Long64_t)5817301
>> align->GetClusterDelta(2)->GetSparseFractionBins()
>> (const Double_t)8.22225454228849850e-02
>> root [13] align->GetClusterDelta(2)->GetSparseFractionMem()
>> (const Double_t)4.93335272537309910e-01
>>
>> Expected size in a non-sparse regime (if one existed):
>> root [15] 60*180*53*36
>> (const int)20606400
>> 20606400 bins x 4 bytes = ~80 MB
>>
>> In general it looks like we are approaching an exponential regime as the
>> occupancy approaches 100%.
>>
>> I have not killed the job yet; attached you can find the current
>> progress:
>> Time vs. merge number
>> Virtual memory vs. merge number
>>
>> Maybe it is also VM related.
>>
>> I do not know who else could implement the new alternative algorithm, but
>> our calibration relies on the use of THnSparse. Unfortunately I cannot
>> reduce the number of bins.
>>
>>
>> Ciao
>> Marian.
>>
>>
>>
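The dense-size estimate quoted above (60*180*53*36 cells at 4 bytes each) can be reproduced with a small standalone sketch. The function names are hypothetical and nothing here calls ROOT; the optional handling of under/overflow bins (two extra bins per axis) is an assumption added for illustration.

```cpp
#include <cstdint>
#include <vector>

// Number of cells in a dense N-dimensional array with the given bins per axis.
// With includeFlow = true, each axis gets two extra bins (underflow, overflow).
std::uint64_t DenseCells(const std::vector<std::uint64_t>& binsPerAxis,
                         bool includeFlow = false) {
    std::uint64_t cells = 1;
    for (std::uint64_t n : binsPerAxis) {
        cells *= includeFlow ? n + 2 : n;
    }
    return cells;
}

// Memory needed for dense storage, in bytes.
std::uint64_t DenseBytes(const std::vector<std::uint64_t>& binsPerAxis,
                         std::uint64_t bytesPerCell,
                         bool includeFlow = false) {
    return DenseCells(binsPerAxis, includeFlow) * bytesPerCell;
}

// Fraction of cells that are filled; returns 0 for an empty layout.
double FillFraction(std::uint64_t filledCells, std::uint64_t totalCells) {
    if (totalCells == 0) return 0.0;
    return static_cast<double>(filledCells) / static_cast<double>(totalCells);
}
```

For the numbers in the message, DenseCells({60, 180, 53, 36}) gives 20606400 cells and DenseBytes({60, 180, 53, 36}, 4) gives 82425600 bytes, which is the "~80 MB" quoted above.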
>> Axel Naumann wrote:
>>
>>> Hi Marian,
>>>
>>> Marian Ivanov wrote on 07/19/2010 09:54 AM:
>>>
>>>
>>>> The CPU time for merging is smaller (compared with 5.26.xxx), but I still
>>>> observe a nonlinear time dependence of merging.
>>>> See the attached picture: time to merge versus merge number.
>>>>
>>>>
>>> thanks for the timing info, I will investigate. Do the different merge
>>> steps have approximately the same number of filled bins? The step from 4
>>> to 5 is a bit scary...
>>>
>>>
>>>
>>>> but I think that
>>>> you could try to implement a non-sparse mode for THnSparse, used when the
>>>> occupancy exceeds some threshold.
>>>> I expect it would save time and also disk space, as we discussed before.
>>>>
>>>>
>>> Yes, agreed :-) Do you know someone who could implement this? Otherwise
>>> I will do it, but I cannot guarantee a date for that...
>>>
>>> Cheers, Axel.
>>>
>>>

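The idea discussed in the thread, switching to non-sparse storage once the occupancy exceeds some threshold, could look roughly like the following. This is a hedged standalone sketch, not THnSparse code: the class AdaptiveHist, its linearized cell index, and the threshold parameter are all hypothetical, and a real implementation would also have to deal with axes, errors, and I/O.

```cpp
#include <cstddef>
#include <unordered_map>
#include <vector>

// Hypothetical illustration class; NOT part of ROOT.
class AdaptiveHist {
public:
    // nCells: total number of (linearized) cells.
    // denseThreshold: filled fraction above which storage becomes dense.
    AdaptiveHist(std::size_t nCells, double denseThreshold)
        : fNCells(nCells), fThreshold(denseThreshold) {}

    void Fill(std::size_t cell, float w = 1.f) {
        if (cell >= fNCells) return;  // out of range: ignored in this sketch
        if (fIsDense) {
            fDense[cell] += w;
            return;
        }
        fSparse[cell] += w;
        if (static_cast<double>(fSparse.size()) >
            fThreshold * static_cast<double>(fNCells)) {
            Densify();
        }
    }

    float GetContent(std::size_t cell) const {
        if (cell >= fNCells) return 0.f;
        if (fIsDense) return fDense[cell];
        auto it = fSparse.find(cell);
        return it == fSparse.end() ? 0.f : it->second;
    }

    // Add the contents of another histogram with the same layout.
    // Self-merge and mismatched layouts are not supported in this sketch.
    void Add(const AdaptiveHist& other) {
        if (&other == this || other.fNCells != fNCells) return;
        if (other.fIsDense) {
            for (std::size_t i = 0; i < fNCells; ++i) {
                if (other.fDense[i] != 0.f) Fill(i, other.fDense[i]);
            }
        } else {
            for (const auto& kv : other.fSparse) Fill(kv.first, kv.second);
        }
    }

    bool IsDense() const { return fIsDense; }

private:
    // Copy the sparse entries into a flat array and drop the map.
    void Densify() {
        fDense.assign(fNCells, 0.f);
        for (const auto& kv : fSparse) fDense[kv.first] = kv.second;
        fSparse.clear();
        fIsDense = true;
    }

    std::size_t fNCells;
    double fThreshold;
    bool fIsDense = false;
    std::unordered_map<std::size_t, float> fSparse;
    std::vector<float> fDense;
};
```

Once the storage is dense, both filling and merging reduce to direct array indexing with no hash lookups, which is the time saving the non-sparse mode is expected to bring. The threshold trades memory for speed and would need tuning on real data.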
merge_time.png
Received on Wed Jul 28 2010 - 12:54:22 CEST