Re: (retry) PROOF and I/O

From: Rene Brun <Rene.Brun_at_cern.ch>
Date: Thu, 8 Jul 2010 06:43:10 +0200


Hi Doug,

 From your numbers, I conclude that you have 2 problems:
   - you do not use the TreeCache, for a reason to be investigated
   - using Python slows down your processing considerably (a factor 10 to 100)

Could you run the same analysis outside PROOF using ROOT version >= 5.26? Please read carefully the 5.26 release notes at http://root.cern.ch/root/v526/Version526.news.html and, in particular, the different use cases of the TreeCache at http://root.cern.ch/root/html/TTreeCache.html
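
For such a standalone test, a minimal PyROOT sketch along these lines could be used (the file name, tree name, branch names and the 30 MB cache size below are only placeholders, not taken from your setup):

    import ROOT

    # Standalone (non-PROOF) read test; file, tree and branch names are placeholders.
    f = ROOT.TFile.Open("myfile.root")
    t = f.Get("mytree")

    # Read only the branches the analysis actually needs ...
    t.SetBranchStatus("*", 0)
    t.SetBranchStatus("el_pt", 1)
    t.SetBranchStatus("el_eta", 1)

    # ... and let the TTreeCache prefetch exactly those branches
    # in a few large reads (30 MB buffer, as an example).
    t.SetCacheSize(30000000)
    t.AddBranchToCache("el_pt", True)
    t.AddBranchToCache("el_eta", True)

    for event in t:
        pass  # analysis code goes here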

It would be good to understand the "IO" issue if you could use the TTreePerfStats class (see the release notes or the class documentation). In this way we will see immediately whether you really use the TreeCache, and what the IO performance is.
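
To illustrate, a minimal way to attach it in PyROOT (again with placeholder file and tree names) could look like:

    import ROOT

    f = ROOT.TFile.Open("myfile.root")
    t = f.Get("mytree")
    t.SetCacheSize(30000000)

    # Attach the I/O monitor before starting the event loop.
    ps = ROOT.TTreePerfStats("ioperf", t)

    for event in t:
        pass  # analysis code goes here

    ps.Print()                # bytes read, number of read calls, CPU/real time
    ps.SaveAs("ioperf.root")  # the stored object can be drawn later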

Rene Brun

Doug Schouten wrote:
> Hi,
>
> I am writing some fairly complicated selectors (TPySelectors,
> actually) and I notice that, particularly when accessing data over
> NFS, the PROOF slaves quickly become I/O bound, as I see many
> proofserv.exe processes sitting nearly idle. This also happens using
> data only on local disk (RAID-5, 7200 rpm Seagate Barracudas ... so
> I can't improve things too much there).
>
> I have tried increasing the TTree cache size using t.SetCacheSize(),
> and I have also slimmed the ROOT files considerably and turned off,
> with SetBranchStatus(), all the branches that I don't need at run-time.
>
> However, I still see relatively poor performance in terms of CPU
> usage. I have 16-core machines (albeit with hyper-threading) and I
> would like to utilize them better.
>
> So my question is two-fold:
>
> (1) are there some methods/tips/tricks to improve performance? Are
> there caching parameters that I can set somewhere to prefetch
> files/trees in larger chunks? Currently I am processing my datasets at
> ~ 2.5 MB/s, reported by the PROOF GUI, which is pretty slow IMHO.
> However, I think this is actually the rate of data being analyzed and
> not the rate at which I am reading through the files, which I guess
> are two very different things for large trees with many branches that
> I am not using. Am I right about this?
>
> (2) anticipating that there are no easy solutions in (1), has anyone
> heard of memcached? This is a distributed memory cache which one can
> use to pool extra RAM from multiple machines. One can then use a FUSE
> filesystem, memcachefs, to store files in pooled memory. I am
> wondering how I could possibly interface this with the TDSet
> infrastructure in PROOF. In particular, I imagine a FIFO buffer
> manager that pre-fetches files in a TDSet and kicks out
> already-processed ones, running in a separate thread/process somewhere
> on my cluster. Somehow, I would have to trick PROOF into not verifying
> the files before running the workers (because they would only 'arrive'
> in the cache just before they are needed), and I would have to have
> some way of communicating to the cache manager where I am in the TDSet
> list of files, so that it can grab the next N files and place them in
> the cache. Then, if the memory cache is large enough, or if I can copy
> files into it ~ as fast as I process them, hopefully I can lessen the
> I/O constraints since reading from this cache will be constrained only
> by network latency and some (apparently) very small CPU overhead in
> memcached.
>
> (Note: there is also a C++ API for memcached which can deal with
> arbitrary chunks of data, not restricted to whole files, but I imagine
> this would be even more low-level and complicated.)
>
> thanks,
> Doug
>
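
A very rough sketch of the FIFO prefetch manager described above, assuming the memcachefs mount behaves like an ordinary directory and that a worker can somehow report finished files back (the class name, paths and lookahead depth are all hypothetical, and the hook into the PROOF/TDSet machinery is left open):

    import os
    import shutil
    import threading
    from collections import deque

    class FifoPrefetcher(object):
        """Stage the next few input files into a cache directory ahead of the
        workers and evict them once processed.  The cache directory stands in
        for the memcachefs mount point."""

        def __init__(self, files, cache_dir, lookahead=4):
            self.pending = deque(files)   # files not yet staged
            self.staged = set()           # files currently in the cache
            self.cache_dir = cache_dir
            self.lookahead = lookahead
            self.lock = threading.Lock()

        def refill(self):
            # Copy files in a background thread so the caller is not blocked.
            threading.Thread(target=self._stage).start()

        def _stage(self):
            with self.lock:
                while self.pending and len(self.staged) < self.lookahead:
                    src = self.pending.popleft()
                    dst = os.path.join(self.cache_dir, os.path.basename(src))
                    shutil.copy(src, dst)
                    self.staged.add(dst)

        def done(self, cached_path):
            # A worker reports a finished file: evict it and top up the window.
            with self.lock:
                self.staged.discard(cached_path)
                if os.path.exists(cached_path):
                    os.remove(cached_path)
            self.refill()

PROOF itself would still have to be pointed at the cached copies and told to skip the up-front file validation, which is the part of the proposal that remains open.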