[ROOT] Reading ROOT trees: no read-ahead under linux-2.6.8?

From: Konstantin Olchanski (olchansk@sam.triumf.ca)
Date: Mon Oct 04 2004 - 22:29:28 MEST

Rooters- we observe suboptimal performance while reading ROOT trees:
the ROOT process is nominally I/O bound, but we see low CPU utilization
(about 30%, from top and vmstat 1), low disk utilization (40%, from
iostat -x 1) and high wait times (15%, from vmstat 1). This is on Fedora-2
with the stock Fedora 2.6.8 linux kernel.

The observed pattern is consistent with "wait for data from disk, compute
some, wait for more data from disk, compute some more, etc...".

Disk-level and file-level read-ahead inside recent 2.6 linux kernels
is supposed to prevent the "wait for data" thing, but apparently
read-ahead is not happening at all. If we cause "manual" read-ahead (say,
concurrently, "dd" the tree file to /dev/null), disk utilization
and CPU utilization go to 100%, as they should, and the tree reading
code runs about twice as fast.
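
The "manual" read-ahead does not have to come from dd. Below is a minimal
sketch of the same trick in C, assuming a POSIX system. The function name
warm_page_cache and the 1 MiB chunk size are illustrative choices, not
anything taken from ROOT:

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/* Read the whole file sequentially and throw the data away, much like
 * `dd if=path of=/dev/null`. The only purpose is to pull the file into
 * the kernel page cache ahead of the real reader.
 * Returns the number of bytes read, or -1 on error. */
static off_t warm_page_cache(const char *path)
{
    static char chunk[1 << 20];   /* 1 MiB per read(); arbitrary size */
    off_t total = 0;
    int fd;

    assert(path != NULL);
    fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;

    for (;;) {
        ssize_t n = read(fd, chunk, sizeof chunk);
        if (n < 0) {
            if (errno == EINTR)
                continue;         /* interrupted by a signal, retry */
            close(fd);
            return -1;
        }
        if (n == 0)
            break;                /* end of file */
        total += n;
    }
    close(fd);
    return total;
}
```

Calling this from a second process or thread while the tree-reading job
works on the same file should have much the same effect as the concurrent
dd, since both simply stream the file through the page cache.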

I suspect that the pattern of system calls that ROOT uses
to read trees: read() followed by lseek() followed by read(), etc...
(as observed by strace) somehow defeats and disables the file-level
read-ahead in the 2.6 linux kernels. Maybe the lseek() calls make the
kernel think "oh, this process is just jumping around, no bother with
read-ahead"?

I can think of several ROOT-level solutions to this performance problem
(apart from doctoring the Linux kernel):

1) get rid of the lseek() calls for short skips (an "optimization": to skip
   12 Kbytes, do a read() into a dummy buffer instead of lseek(); to skip
   12 Mbytes, still do lseek()).
2) do more data buffering inside ROOT (to read 12 Kbytes, read 12 Mbytes
   into an internal buffer (ram is cheap) and return subsequent data
   from this buffer).
3) maybe use xrootd? Does it do read-ahead data buffering?
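
Options 1 and 2 can be sketched in C as follows. This assumes plain POSIX
file descriptors rather than ROOT's actual I/O classes. The names
skip_forward, window_open, window_read, SKIP_BY_READ_MAX and WINDOW_SIZE
are hypothetical, and both sizes are arbitrary values, not measured ones:

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

#define SKIP_BY_READ_MAX (64 * 1024)    /* hypothetical threshold */
#define WINDOW_SIZE      (1024 * 1024)  /* hypothetical buffer size */

/* read() until len bytes arrive, EOF, or an error; retries on EINTR.
 * Returns the byte count (less than len only at EOF) or -1. */
static ssize_t read_full(int fd, char *dst, size_t len)
{
    size_t got = 0;
    while (got < len) {
        ssize_t n = read(fd, dst + got, len - got);
        if (n < 0) {
            if (errno == EINTR)
                continue;
            return -1;
        }
        if (n == 0)
            break;
        got += (size_t)n;
    }
    return (ssize_t)got;
}

/* Option 1: small skips are read and discarded so the kernel sees one
 * sequential stream; large skips still use lseek().
 * Returns the bytes skipped, or -1 on error. Not thread-safe. */
static off_t skip_forward(int fd, off_t count)
{
    static char scratch[SKIP_BY_READ_MAX];
    if (count < 0) {
        errno = EINVAL;
        return -1;
    }
    if (count > SKIP_BY_READ_MAX)
        return lseek(fd, count, SEEK_CUR) == (off_t)-1 ? (off_t)-1 : count;
    return read_full(fd, scratch, (size_t)count);
}

/* Option 2: one large window of file data kept in memory. */
struct window {
    int    fd;
    off_t  start;              /* file offset of data[0] */
    size_t len;                /* number of valid bytes in data[] */
    char   data[WINDOW_SIZE];
};

static int window_open(struct window *w, const char *path)
{
    assert(w != NULL && path != NULL);
    w->fd = open(path, O_RDONLY);
    w->start = 0;
    w->len = 0;                /* empty window: first read always refills */
    return w->fd < 0 ? -1 : 0;
}

/* Copy len bytes starting at file offset off into dst. Requests inside
 * the window cost no system call; a miss triggers one lseek() plus one
 * large read. Returns the bytes copied (short at EOF) or -1. */
static ssize_t window_read(struct window *w, off_t off, char *dst, size_t len)
{
    size_t done = 0;
    assert(w != NULL && dst != NULL);
    while (done < len) {
        off_t pos = off + (off_t)done;
        if (pos < w->start || pos >= w->start + (off_t)w->len) {
            ssize_t n;
            if (lseek(w->fd, pos, SEEK_SET) == (off_t)-1)
                return -1;
            n = read_full(w->fd, w->data, WINDOW_SIZE);
            if (n < 0)
                return -1;
            w->start = pos;
            w->len = (size_t)n;
            if (n == 0)
                break;         /* nothing left at this offset: EOF */
        }
        size_t in_win = (size_t)(pos - w->start);
        size_t avail  = w->len - in_win;
        size_t want   = len - done;
        size_t take   = want < avail ? want : avail;
        memcpy(dst + done, w->data + in_win, take);
        done += take;
    }
    return (ssize_t)done;
}
```

A struct window is over 1 Mbyte, so it is better declared static or
allocated on the heap than placed on the stack. Whether either approach
actually restores kernel read-ahead on 2.6.8 would have to be checked with
strace and iostat.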

Any thoughts?

