Re: Writing objects to file.

From: Rene Brun (Rene.Brun@cern.ch)
Date: Mon Apr 19 1999 - 18:09:12 MEST


Hi Alex,
See my comments below.
Your algorithm goes like N*N. You can do it <<log(N) !!!

Rene Brun

Alexander Zvyagin wrote:
> 
> Date: Fri, 16 Apr 1999 20:17:51 +0000
> From: ZVYAGIN <ZVYAGIN@mx.ihep.su>
> X-Sender: ZVYAGIN@polar.ihep.su
> To: ROOT talk mailing list <roottalk@hpsalo.cern.ch>
> Subject: Writing objects to file.
> Message-ID: <Pine.VMS.3.91-b11-vms.990416201435.6327A-100000@polar.ihep.su>
> MIME-Version: 1.0
> Content-Type: TEXT/PLAIN; charset=US-ASCII
> 
> Dear ROOT developers and users,
> 
> I have several questions about macro attached at the end of this mail.
> The macro instructions are simple:
>   w1) open ROOT file for writing
>   w2) write several TNamed objects in the file
>   w3) close file
>   r1) open the same file in "read" mode
>   r2) read objects back
> When objects are written/read to/from file the time of writing/reading
> is printed to screen.
> 
> Please take a look at the macro and readout from it.
> 
> Well, these are my questions:
> 1) Root increases it size (in memory) on 300 bytes after writing each
>    TNamed("","") object. Why? If this is correct, is it possible to free
>    the memory? (May be I need to write 10^7 objects and I do not have 3Gb
>    of meomry!)

For eack object written into a file with TObject::Write a new TKey
object
is created with a length depending on the object name (say typically 100
bytes
per object). 
TObject::Write is not intended to store millions of objects. There are
much better techniques to solve this problem (TTree). TTrees have been
designed
to cope with a very large number of entries. The overhead per entry is
virtually null.

> 2) The rate of writing/reading drops with time. Why?

The time to write increases slightly with the number of objects. The
Root
TFile/TDirectory has a member of type THashList supporting the list
of keys. This hashlist is created with an initial capacity of 100 slots.
In your case, your names are always Eventxxxxx, introducing some
inefficiency in the hashlist algorithm. However, this case could be
frequent.
We have improved the algorithm to give a better response in this case 
(independent of the number of entries in your case).


Your problem was mainly in reading your events. That is where you had a
N**2
problem. I have modify your algorithm in two forms (alex1 and alex2).
See the code below compared to your original algorithm (alex0).

time for alex0 (your original algorithm)
========================================
root [0] .x alex0.C
Writing...
b         : Real Time =   3.05 seconds Cpu Time =   3.05 seconds
b         : Real Time =   3.25 seconds Cpu Time =   3.25 seconds
b         : Real Time =   3.47 seconds Cpu Time =   3.47 seconds
b         : Real Time =   3.65 seconds Cpu Time =   3.65 seconds

Reading...
b         : Real Time =   7.07 seconds Cpu Time =   7.07 seconds
b         : Real Time =  29.32 seconds Cpu Time =  28.73 seconds
b         : Real Time =  51.22 seconds Cpu Time =  50.48 seconds
b         : Real Time =  71.01 seconds Cpu Time =  70.39 seconds

time for alex1 (a better algorithm)
===================================
root [0] .x alex1.C
Writing...
b         : Real Time =   3.04 seconds Cpu Time =   3.05 seconds
b         : Real Time =   3.26 seconds Cpu Time =   3.26 seconds
b         : Real Time =   3.47 seconds Cpu Time =   3.47 seconds
b         : Real Time =   3.68 seconds Cpu Time =   3.67 seconds

Reading...
b         : Real Time =   0.62 seconds Cpu Time =   0.62 seconds
b         : Real Time =   0.82 seconds Cpu Time =   0.82 seconds
b         : Real Time =   1.02 seconds Cpu Time =   1.02 seconds
b         : Real Time =   1.21 seconds Cpu Time =   1.21 seconds

time for alex2 (the best algorithm)
===================================
root [1] .x alex2.C
Writing...
b         : Real Time =   3.05 seconds Cpu Time =   3.04 seconds
b         : Real Time =   3.26 seconds Cpu Time =   3.26 seconds
b         : Real Time =   3.45 seconds Cpu Time =   3.45 seconds
b         : Real Time =   3.67 seconds Cpu Time =   3.67 seconds

Reading...
b         : Real Time =   0.31 seconds Cpu Time =   0.31 seconds
b         : Real Time =   0.31 seconds Cpu Time =   0.30 seconds
b         : Real Time =   0.31 seconds Cpu Time =   0.31 seconds
b         : Real Time =   0.32 seconds Cpu Time =   0.31 seconds

With the new algorithm in THashList, the time to write is:
root [0] .x alex.C
Writing...
b         : Real Time =   2.96 seconds Cpu Time =   2.96 seconds
b         : Real Time =   2.98 seconds Cpu Time =   2.97 seconds
b         : Real Time =   3.02 seconds Cpu Time =   3.02 seconds
b         : Real Time =   3.04 seconds Cpu Time =   3.04 seconds


Code for alex1.C (read part only)
=================================

  printf("\nReading...\n");

  TFile f2("alex.root");       // Open file for reading
  b.Reset();
  b.Start("b");
  THashList *keys = f2.GetListOfKeys();
  for( int i=1; ; i++ ) {
    if(i%update==0)
    {
      // Print time of reading last portion of objects.
      b.Show("b");
      b.Reset();
      b.Start("b");
    }
    sprintf(name,"event%d",i); // Create object name
    TKey *key = keys->FindObject(name);
    if (key == 0) break; // Can not read object - finish
    TNamed *o = (TNamed*)key->ReadObj();
    delete o;                // Delete object
  }

Code for alex2.C (reading only)
===============================

  printf("\nReading...\n");

  TFile f2("alex.root");       // Open file for reading
  b.Reset();
  b.Start("b");
  TIter next(f2.GetListOfKeys());
  TKey *key;
  Int_t i=0;
  while ((key = (TKey*)next())) {
    i++;
    if(i%update==0)
    {
      // Print time of reading last portion of objects.
      b.Show("b");
      b.Reset();
      b.Start("b");
    }
    TNamed *o = (TNamed*)key->ReadObj();
    delete o;                // Delete object
  }




> 
> With best wishes,
> Alexander Zvyagin.
> 
> // --- CUT HERE ---
> // File b.c
> 
> void b(void)
> {
>   char        name[111];         // Name of object
>   int         N      = 20000,    // Total objects amount
>               update = 5000;     // Update rate
>   TNamed      a        ("","");  // The object
>   TBenchmark  b;                 // For printing statistics
> 
>   // ----------------------------------
> 
>   printf("Writing...\n");
> 
>   TFile f1("1.root","RECREATE","",9);  // Open file for writing
>   b.Start("b");
>   for( int i=1; i<=N; i++ )            // Cycle...
>   {
>     if(i%update==0)
>     {
>       // Print time of writing last portion of objects.
>       b.Show("b");
>       b.Reset();
>       b.Start("b");
>     }
>     sprintf(name,"event%d",i);         // Create object name
>     a.Write(name);                     // Write object
>   }
>   f1.Close();
> 
>   // ----------------------------------
> 
>   printf("\nReading...\n");
> 
>   TFile f2("1.root");       // Open file for reading
>   b.Reset();
>   b.Start("b");
>   for( int i=1; ; i++ )
>   {
>     if(i%update==0)
>     {
>       // Print time of reading last portion of objects.
>       b.Show("b");
>       b.Reset();
>       b.Start("b");
>     }
>     sprintf(name,"event%d",i); // Create object name
>     TNamed *o = f2.Get(name);  // Read next object
>     if( NULL==o )
>       break;                   // Can not read object - finish
>     else
>       delete o;                // Delete object
>   }
> 
>   // ----------------------------------
> }
> 
> // --- CUT HERE ---
> 
>   *******************************************
>   *                                         *
>   *        W E L C O M E  to  R O O T       *
>   *                                         *
>   *   Version   2.21/08     18 March 1999   *
>   *                                         *
>   *  You are welcome to visit our Web site  *
>   *          http://root.cern.ch            *
>   *                                         *
>   *******************************************
> 
> CINT/ROOT C/C++ Interpreter version 5.13.92, Mar 13 1999
> Type ? for help. Commands must be C++ statements.
> Enclose multiple statements between { }.
> root [0] .x b.c
> Writing...
> b         : Real Time =   1.71 seconds Cpu Time =   1.48 seconds
> b         : Real Time =   2.11 seconds Cpu Time =   1.90 seconds
> b         : Real Time =   2.45 seconds Cpu Time =   2.30 seconds
> b         : Real Time =   3.00 seconds Cpu Time =   2.63 seconds
> 
> Reading...
> b         : Real Time =  20.07 seconds Cpu Time =  19.04 seconds
> b         : Real Time =  62.75 seconds Cpu Time =  60.52 seconds
> b         : Real Time = 103.11 seconds Cpu Time = 101.37 seconds
> b         : Real Time = 147.08 seconds Cpu Time = 140.51 seconds
> 
> Memory statistics:
> 
> Root size in memory (kb):
> before running the macro     9440
> after "writing"             15524
> after "reading"             15524



This archive was generated by hypermail 2b29 : Tue Jan 04 2000 - 00:43:31 MET