Hi Marian,
can you check whether the attached patch also solves it in your real case? It
helps tremendously for my test case; see the attached plot.
Cheers, Axel.
Marian Ivanov wrote on 07/21/2010 02:19 PM:
> Hello Axel.
>
> The test was run on an 8-core node with 32 GB of RAM.
> We have a limit of 4 GB per process. In my test I used 2.4 GB at most.
>
> I do not expect a problem with swapping.
>
> I'm running callgrind; maybe tomorrow I will know more.
>
> In the meantime, using just simple "profiling" (stopping the program with
> the debugger), I almost always end up in TExMap::GetValue.
>
> TExMap::fSize ~ 14 million. I do not know exactly how it works, but
> this huge number is suspicious.
>
> Ciao
> Marian.
>
>
>
>
> Marian Ivanov wrote:
>> Hello Axel.
>>
>> All histograms have approximately the same number of filled bins (to
>> within an order of magnitude).
>> The problem is not there:
>>
>> miranov_at_lxb346:testMerge> cat calib.list | xargs ls -al
>> -rw-r--r-- 1 miranov alice 151955578 2010-07-18 22:02
>> /lustre/alice/miranov/calibPass2/testAlign1307/119037/merge0/CalibObjects.root
>> -rw-r--r-- 1 miranov alice 98555232 2010-07-19 02:22
>> /lustre/alice/miranov/calibPass2/testAlign1307/119037/merge10/CalibObjects.root
>> -rw-r--r-- 1 miranov alice 87543700 2010-07-19 02:26
>> /lustre/alice/miranov/calibPass2/testAlign1307/119037/merge11/CalibObjects.root
>> -rw-r--r-- 1 miranov alice 79393978 2010-07-19 02:29
>> /lustre/alice/miranov/calibPass2/testAlign1307/119037/merge12/CalibObjects.root
>> -rw-r--r-- 1 miranov alice 455069427 2010-07-19 04:01
>> /lustre/alice/miranov/calibPass2/testAlign1307/119037/merge13/CalibObjects.root
>> -rw-r--r-- 1 miranov alice 343900123 2010-07-19 04:38
>> /lustre/alice/miranov/calibPass2/testAlign1307/119037/merge14/CalibObjects.root
>> -rw-r--r-- 1 miranov alice 286235767 2010-07-18 22:16
>> /lustre/alice/miranov/calibPass2/testAlign1307/119037/merge1/CalibObjects.root
>> -rw-r--r-- 1 miranov alice 434115635 2010-07-18 23:51
>> /lustre/alice/miranov/calibPass2/testAlign1307/119037/merge2/CalibObjects.root
>> -rw-r--r-- 1 miranov alice 451920293 2010-07-19 01:26
>> /lustre/alice/miranov/calibPass2/testAlign1307/119037/merge3/CalibObjects.root
>> -rw-r--r-- 1 miranov alice 307624494 2010-07-19 01:45
>> /lustre/alice/miranov/calibPass2/testAlign1307/119037/merge4/CalibObjects.root
>> -rw-r--r-- 1 miranov alice 309207616 2010-07-19 02:01
>> /lustre/alice/miranov/calibPass2/testAlign1307/119037/merge5/CalibObjects.root
>> -rw-r--r-- 1 miranov alice 121205820 2010-07-19 02:05
>> /lustre/alice/miranov/calibPass2/testAlign1307/119037/merge6/CalibObjects.root
>> -rw-r--r-- 1 miranov alice 156846508 2010-07-19 02:10
>> /lustre/alice/miranov/calibPass2/testAlign1307/119037/merge7/CalibObjects.root
>> -rw-r--r-- 1 miranov alice 96025399 2010-07-19 02:14
>> /lustre/alice/miranov/calibPass2/testAlign1307/119037/merge8/CalibObjects.root
>> -rw-r--r-- 1 miranov alice 76993658 2010-07-19 02:18
>> /lustre/alice/miranov/calibPass2/testAlign1307/119037/merge9/CalibObjects.root
>>
>> Example file:
>> TFile f("/lustre/alice/miranov/calibPass2/testAlign1307/119037/merge13/CalibObjects.root")
>> align->GetClusterDelta(2)->GetNbins()
>> (const Long64_t)5817301
>> align->GetClusterDelta(2)->GetSparseFractionBins()
>> (const Double_t)8.22225454228849850e-02
>> root [13] align->GetClusterDelta(2)->GetSparseFractionMem()
>> (const Double_t)4.93335272537309910e-01
>>
>> Expected size in a non-sparse regime (if one existed):
>> root [15] 60*180*53*36
>> (const int)20606400 x 4 bytes = ~80 MB
>>
>> In general it looks like we are approaching an exponential regime as the
>> occupancy approaches 100%.
>>
>> I have not killed the job yet; in the attachment you can find the current
>> progress:
>> Time vs. merge number
>> Virtual memory vs. merge number
>>
>> Maybe it is also VM related.
>>
>> I do not know who else could implement the new alternative algorithm, but
>> our calibration relies on the use of THnSparse. Unfortunately I cannot
>> reduce the number of bins.
>>
>>
>> Ciao
>> Marian.
>>
>>
>>
>> Axel Naumann wrote:
>>
>>> Hi Marian,
>>>
>>> Marian Ivanov wrote on 07/19/2010 09:54 AM:
>>>
>>>
>>>> The CPU time for merging is smaller (in comparison with 5.26.xxx), but I
>>>> still observe a non-linear time dependence of the merging.
>>>> See the attached picture: time to merge versus merge number.
>>>>
>>>>
>>> thanks for the timing info, I will investigate. Do the different merge
>>> steps have approximately the same number of filled bins? The step from 4
>>> to 5 is a bit scary...
>>>
>>>
>>>
>>>> but I think that
>>>> you could try to implement a non-sparse mode in THnSparse for the case
>>>> where the occupancy exceeds some threshold.
>>>> I expect it will save time and also disk space, as we discussed before.
>>>>
>>>>
>>> Yes, agreed :-) Do you know someone who could implement this? Otherwise
>>> I will do it, but I cannot guarantee a date for that...
>>>
>>> Cheers, Axel.
>>>
>>>
>>
>>
>>
>
>
Received on Wed Jul 28 2010 - 12:54:22 CEST
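
The dense-storage estimate in the thread (60*180*53*36 bins at 4 bytes
each) and the proposed switch to a non-sparse mode above some occupancy
can be sketched in plain C++ as follows. This is only an illustration
under assumptions: the function names denseBytes and preferDense are
hypothetical, the threshold is a free parameter that the thread leaves
open ("occupancy > something"), and nothing here uses or reflects the
actual THnSparse implementation.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Bytes needed to store every bin of an N-dimensional histogram densely.
// binsPerAxis: number of bins along each axis (hypothetical helper input).
// bytesPerBin: size of one bin content, e.g. 4 for a 32-bit value.
std::uint64_t denseBytes(const std::vector<std::uint64_t>& binsPerAxis,
                         std::uint64_t bytesPerBin) {
    assert(!binsPerAxis.empty());
    std::uint64_t total = bytesPerBin;
    for (std::uint64_t n : binsPerAxis) {
        total *= n;  // multiply up the bin counts of all axes
    }
    return total;
}

// Hypothetical policy: prefer dense storage once the fraction of filled
// bins exceeds a caller-chosen threshold. The threshold value is an
// assumption; the thread does not specify one.
bool preferDense(double filledFraction, double threshold) {
    return filledFraction > threshold;
}
```

For the binning quoted above, denseBytes({60, 180, 53, 36}, 4) gives
82425600 bytes, consistent with the ~80 MB estimate. With the reported
GetSparseFractionBins() value of about 0.082 and an assumed threshold of
0.5, preferDense(0.082, 0.5) is false, so such a policy would still keep
that histogram sparse.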