Hi,
I am writing some fairly complicated selectors (TPySelectors, actually),
and I notice that, particularly when accessing data over NFS, the PROOF
slaves quickly become I/O bound: I see many proofserv.exe processes
sitting nearly idle. This also happens with data on local disk only
(RAID-5, 7200 rpm Seagate Barracudas, so I can't improve things too
much there).
I have tried increasing the TTree cache with t.SetCacheSize(), and I have also slimmed the ROOT files considerably and used SetBranchStatus() to turn off all the branches I don't need at run time.
However, I still see relatively poor performance in terms of CPU usage. I have 16-core machines (albeit with hyper-threading), and I would like to utilize them better.
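For reference, here is a minimal sketch of the kind of per-tree I/O tuning I mean, as I would call it from a TPySelector's Init()/SlaveBegin(). The branch names ("pt", "eta") and the 50 MB cache size are just illustrative assumptions; the point is that SetCacheSize() plus AddBranchToCache() lets TTreeCache prefetch only the enabled branches in large reads:

```python
# Sketch of per-worker TTree I/O tuning (hypothetical branch names and
# cache size). The TTree cache turns many small branch reads into a few
# large sequential reads, which matters most over NFS.

def configure_tree(tree, cache_bytes=50 * 1024 * 1024, branches=("pt", "eta")):
    """Enable the TTree cache and restrict reading to the needed branches."""
    tree.SetCacheSize(cache_bytes)        # one large read buffer for this tree
    tree.SetBranchStatus("*", 0)          # disable every branch ...
    for name in branches:
        tree.SetBranchStatus(name, 1)     # ... then re-enable only what we use
        tree.AddBranchToCache(name, True) # and register it with the cache
    return tree
```

In a selector this would run once per attached tree, before the event loop starts, so that the cache is trained on exactly the branches the Process() method touches.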
So my question is two-fold:
(1) Are there methods/tips/tricks to improve performance? Are there
caching parameters I can set somewhere to prefetch files/trees in
larger chunks? Currently I am processing my datasets at ~2.5 MB/s, as
reported by the PROOF GUI, which is pretty slow IMHO. However, I think
this is actually the rate of data being analyzed, not the rate at
which I am reading through the files; those are two very different
things for large trees with many branches that I am not using.
Am I right about this?
(2) Anticipating that there are no easy solutions to (1), has anyone
heard of memcached? It is a distributed memory cache that can pool
extra RAM from multiple machines. One can then use a FUSE filesystem,
memcachefs, to store files in that pooled memory. I am wondering how I
could interface this with the TDSet infrastructure in PROOF. In
particular, I imagine a FIFO buffer manager that pre-fetches files in
a TDSet and evicts already-processed ones, running in a separate
thread/process somewhere on my cluster. Somehow, I would have to trick
PROOF into not verifying the files before running the workers (because
they would only 'arrive' in the cache just before they are needed),
and I would need some way of communicating my position in the TDSet
list of files to the cache manager, so that it can grab the next N
files and place them in the cache. Then, if the memory cache is large
enough, or if I can copy files into it about as fast as I process
them, I can hopefully lessen the I/O constraints, since reading from
this cache would be limited only by network latency and some
(apparently) very small CPU overhead in memcached.
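To make the idea concrete, here is a minimal, hypothetical sketch of such a FIFO cache manager in pure Python. Nothing here is PROOF API; the stage/evict callables stand in for whatever would copy a file into memcachefs and remove it again, and the capacity, file names, and `done()` protocol are all assumptions of mine:

```python
from collections import deque

class FifoPrefetchCache:
    """Hypothetical FIFO staging manager for a TDSet-like file list:
    keep the next few files staged in the memory cache and evict files
    the workers have finished with."""

    def __init__(self, files, capacity=3, stage=None, evict=None):
        self.pending = deque(files)              # files not yet staged
        self.staged = deque()                    # files currently in the cache
        self.capacity = capacity
        self.stage = stage or (lambda f: None)   # e.g. copy into memcachefs
        self.evict = evict or (lambda f: None)   # e.g. unlink from memcachefs

    def fill(self):
        """Stage pending files until the cache is at capacity."""
        while self.pending and len(self.staged) < self.capacity:
            f = self.pending.popleft()
            self.stage(f)
            self.staged.append(f)

    def done(self, f):
        """A worker reports file f processed: evict it, stage the next one."""
        self.staged.remove(f)
        self.evict(f)
        self.fill()
```

The missing (and hard) piece is exactly the feedback channel mentioned above: something has to call done() as PROOF advances through the TDSet, and PROOF would have to tolerate files that only appear in the cache just-in-time.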
(Note: there is also a C++ API for memcached that can deal with
arbitrary chunks of data rather than whole files, but I imagine
that approach would be even more low-level and complicated.)
thanks,
Doug
Received on Thu Jul 08 2010 - 01:55:38 CEST