[ROOT] Re: Users_Guide_07.pdf (11)

From: Rene Brun (Rene.Brun@cern.ch)
Date: Tue Feb 20 2001 - 11:41:33 MET


Hi Jacek,

Thanks for reporting the problem with the example on page 269. The numbers shown
are the times to write, not to read. I also realize that these numbers
were generated on a very slow machine. We will rerun the examples
on an 800 MHz Pentium III machine.

About the compression levels:
First remark: the only interesting compression levels are 0, 1 and 2.
Compression levels above 2 are not competitive (too much CPU overhead
compared to the gain in file space).
Assuming that comp is the compression level specified in the TFile constructor,
the following algorithm is used:
 - split = 0: the branch buffer is compressed with level=comp.
 - split = 1: the branch buffers are compressed with level=comp if the data
              type is not float, and with level=comp-1 if the data type
              is float.
              
The reason for the distinction is that the compression gain for float data
is in general only around 20 to 30 per cent, whereas for other types it is
in general better than 100 per cent.
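
The rule above can be sketched as a small function. This is only an
illustration of the rule as stated in this message, not code taken from ROOT:
the name effectiveLevel and its parameters are hypothetical, and clamping the
result at 0 (so that comp = 0 never yields a negative level) is an assumption.

```cpp
// Hypothetical helper, not part of the ROOT API: returns the level that
// would be applied to one branch buffer under the rule described above.
//   comp    - compression level given in the TFile constructor
//   split   - false for split = 0, true for split = 1
//   isFloat - true if the branch holds data of type float
int effectiveLevel(int comp, bool split, bool isFloat)
{
    if (!split) {
        return comp;                  // split = 0: one buffer, level = comp
    }
    if (!isFloat) {
        return comp;                  // split = 1, non-float: level = comp
    }
    // split = 1, float: level = comp - 1.
    // Assumption: clamp at 0 so that comp = 0 stays "no compression".
    return comp > 0 ? comp - 1 : 0;
}
```

Under this sketch, effectiveLevel(1, true, true) gives 0, so with comp = 1
and split = 1 the float branches are left uncompressed, while
effectiveLevel(2, true, true) gives 1, so with comp = 2 every branch is
compressed.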
              
Rene Brun

Jacek M. Holeczek wrote:
> 
> Hi,
> Up to now I thought that the Compression Level (0, 1, ... 9) directly
> corresponds to the "gzip" level (no_compression, -1, ... -9).
> Suddenly I have found that in :
> 1. MainEvent.cxx :
>         if comp = 0 no compression at all
>         if comp = 1 event is compressed
>         if comp = 2 same as 1. In addition branches with floats in the
>                     TClonesArray are also compressed
>         (nothing mentioned about 3, 4, ... 9)
> 2. Ebench.html :
>         comp = 0 means: no compression at all
>         comp = 1 means: compress everything if split = 0
>         comp = 1 means: compress only the Tree branches with integers if
>                         split = 1
>         comp = 2 means: compress everything if split=1
>         (nothing mentioned about 3, 4, ... 9)
> What is the trick here ?
> In case this is really true that the compression level does not directly
> correspond to the "gzip" compression level could you please describe the
> "rules" exactly (in which case something is compressed and with which
> "level", what is not compressed) in the "Users Guide" -> "9 Input/Output"
> -> "The Physical Layout of ROOT Files" -> "Compression" (page 159).
> This also reminds me that on page 159 in chapter "Compression" there are
> some numbers given for the file sizes of compressed files as a function of
> the compression level. Could I ask you to add here some numbers of the
> time needed to "write" and time needed to "read" this file as a function
> of the compression level ?
> While looking at the compression question I noticed one more bug.
> On page 269 in chapter "Event - An Example of a ROOT Application." I can
> read "You can see that the compressed file reads much slower (4.02 seconds
> vs. 12.5 seconds)." This was really a magic for me some days ago when I
> read this text for the first time. Up to now I thought that decompressing
> does not really introduce any big factor (compressing itself does,
> depending on the level). Now, after a careful second look, I can see
> where this "much slower" comes from. In the middle example "box" I can see
> "Real Time=4.02 ... You READ 5.22 MBytes/Realtime seconds ...", while in
> the last example "box" I can see "RealTime=12.5 ... You WRITE 1.66 ...".
> You are comparing 4.02 READ time of uncompressed file with 12.5 WRITE time
> of COMPRESSED file !!! The last "output" was not copied from the "Event 0
> 0 20", but, most probably, from the "Event 400 1 1 1".
> I think that in this place one could add both comparisons : write time
> uncompressed vs. compressed, and read time uncompressed vs. compressed.
> Best regards,
> Jacek.


