Re: [ROOT] 2.25.03: anomalous TKey cycles in TFile for large TTrees

From: Fons Rademakers (Fons.Rademakers@cern.ch)
Date: Tue Nov 14 2000 - 18:58:30 MET


Hi Matt,

   Use TDirectory::Purge() to get rid of any lower cycles in your file; by
default it keeps only the highest cycle of each key. The gaps this leaves in
the file are reused the next time you add something to it.
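
To make the cycle bookkeeping in this thread concrete, here is a minimal
sketch in plain C++. It is a toy model, not ROOT code: ToyDirectory and
CyclesAfterJob are hypothetical names, and the rules it encodes are only a
reading of Rene's description quoted below (each AutoSave writes a new cycle
and then deletes the previous one, the final Write always adds a cycle, and
Purge keeps the highest cycle).

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Toy model of the key cycles in one directory. NOT ROOT code: all names
// here are hypothetical and only mirror the behaviour described in the thread.
class ToyDirectory {
public:
    // AutoSave: create a new cycle, then delete the previous cycle (if any).
    void AutoSave(const std::string& name) {
        std::vector<int>& cycles = keys_[name];
        const bool had_previous = !cycles.empty();
        cycles.push_back(NextCycle(cycles));
        if (had_previous) cycles.erase(cycles.end() - 2);
    }

    // Final Write: always create a new cycle and keep the previous one.
    void Write(const std::string& name) {
        std::vector<int>& cycles = keys_[name];
        cycles.push_back(NextCycle(cycles));
    }

    // Purge: keep only the nkeep highest cycles of every key.
    void Purge(std::size_t nkeep = 1) {
        assert(nkeep >= 1);
        for (auto& item : keys_) {
            std::vector<int>& cycles = item.second;
            if (cycles.size() > nkeep) {
                cycles.erase(cycles.begin(),
                             cycles.end() - static_cast<std::ptrdiff_t>(nkeep));
            }
        }
    }

    // Cycle numbers currently stored for a key, in ascending order.
    std::vector<int> Cycles(const std::string& name) const {
        const auto it = keys_.find(name);
        return it == keys_.end() ? std::vector<int>() : it->second;
    }

private:
    // A new cycle is numbered one above the highest existing cycle.
    static int NextCycle(const std::vector<int>& cycles) {
        return cycles.empty() ? 1 : cycles.back() + 1;
    }

    std::map<std::string, std::vector<int>> keys_;
};

// Replays the pattern from the thread: some AutoSaves, then one final Write,
// optionally followed by a Purge. Returns the cycles that remain.
std::vector<int> CyclesAfterJob(int n_autosaves, bool purge) {
    ToyDirectory dir;
    for (int i = 0; i < n_autosaves; ++i) dir.AutoSave("SelectedWABEvents");
    dir.Write("SelectedWABEvents");
    if (purge) dir.Purge();
    return dir.Cycles("SelectedWABEvents");
}
```

Under this model, CyclesAfterJob(10, false) leaves cycles 10 and 11, which
would be consistent with the leftover year_1998/SelectedWABEvents;10 reported
below, and CyclesAfterJob(0, false) leaves a single cycle, as for year_1995.
In ROOT itself the corresponding call is TDirectory::Purge(); I assume it acts
only on the directory it is called on, so each year_* directory would need its
own call.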

Cheers, Fons.


"Matthew D. Langston" wrote:
> 
> Hi Rene,
> 
> Thank you for your help, and for setting me straight about the AutoSave
> feature.
> 
> Wouldn't it make more sense for the AutoSave feature to treat the last
> TTree::Write in the same way as previous calls to TTree::Write?  In other
> words, if the last TTree::Write is successful, why keep around the previous
> cycle produced by the AutoSave mechanism?  That previous cycle is useless,
> and just takes up precious space if the last TTree::Write is successful.
> 
> Also, is there a convenient "purge" facility (ala VMS) to get rid of old
> cycles in a TFile?  Or, does one just have to iterate through the TKeys of
> the TFile by hand and delete the corresponding objects.  If the latter, then
> is it necessary to "defragment" the TFile after deleting old cycles, or does
> ROOT do this for us?
> 
> Regards, Matt
> 
> ----- Original Message -----
> From: "Rene Brun" <Rene.Brun@cern.ch>
> To: "Matthew D. Langston" <langston@SLAC.Stanford.EDU>
> Cc: "roottalk" <roottalk@pcroot.cern.ch>
> Sent: Monday, November 13, 2000 11:43 PM
> Subject: Re: [ROOT] 2.25.03: anomalous TKey cycles in TFile for large TTrees
> 
> > Hi Matt,
> > The behaviour you describe is normal. See URL:
> >     http://root.cern.ch/root/html225/TTree.html#TTree:AutoSave
> >
> > The default value for AutoSave is to save a copy of the Tree header
> > as soon as you have written 100 Mbytes (uncompressed) to the file.
> > When AutoSave is called, a new key is created with the current header.
> > If the creation of the key is successful, the previous key is deleted.
> > At the end of the job, when you do tree.Write, a new key is
> > always created, keeping the previous cycle of the same key, if any.
> > You can use the option kOverwrite when calling the final tree.Write
> > if you want to keep one single key.
> >
> > The AutoSave facility was introduced to protect against job crashes.
> > You can always recover data up to the last AutoSave.
> >
> > If you loop over the list of TKeys in a directory yourself,
> > you have to take into account the fact that one key may have more than
> > one cycle.
> >
> > Rene Brun
> >
> > Matthew D. Langston wrote:
> > >
> > > The problem concerns ROOT 2.25.03 under RedHat Linux 6.1 Intel, and is
> > > reproducible.
> > >
> > > I have come across an anomaly with TFiles containing large TTrees (where
> > > "large" means each TTree is anywhere from a few hundred MB to 1 GB
> > > uncompressed, and the total size of the TFile is about 1 GB compressed).
> > > The problem is that the ROOT I/O seems to be creating erroneous TKey cycles
> > > in the TFile.  Are my TFiles being created corrupt?  Or, are these extra
> > > TKey cycles just an artifact of the TTree creation and filling process?  If
> > > so, do I need to "purge" (ala VMS) these erroneous cycles?  If so, how?
> > >
> > > I have attached to this e-mail (in the file called TTree_Print.txt) the
> > > output of calling TTree::Print just after I write each of six TTrees to a
> > > TFile, where each TTree is written to a separate TDirectory within the
> > > TFile.  As you can see by looking at this attached file, there are six
> > > sections of output from TTree::Print for each of the six TTrees that I
> > > created and filled.
> > >
> > > However, when I later look at this TFile with TBrowser, or by simply
> > > iterating through each TKey in the TFile and printing out the name and cycle
> > > number of the TTrees that I find, I see several cycles for each TTree.  I
> > > have attached this output in the second file included with this email (this
> > > file is called TFile_contents.txt).
> > >
> > > For example, the first TTree that I create has 72019 entries according to
> > > TTree::Print just after filling and writing the TTree.  However, when I
> > > later reopen the TFile and iterate over it, I find two TTrees with cycle
> > > numbers 1 and 2, with 67509 and 72019 entries, respectively.
> > >
> > > Finally, I have attached a table (in the file TKey_cycle_anomaly.txt) of the
> > > output for the TFile that shows the cycle numbers, the number of entries in
> > > each TTree, and the difference in the number of entries between the "good"
> > > cycle and the "erroneous" cycle.  Note that one of the TTrees, the one in
> > > the TDirectory named "year_1995", didn't have one of the erroneous TKey
> > > cycles created (i.e. this particular TTree is fine).  It would appear that
> > > this "erroneous TKey cycle effect" is triggered after a certain number of
> > > bytes is written to the TFile.
> > >
> > > So, based on this data, I would claim that ROOT created these additional
> > > TTree cycles when it shouldn't have:
> > >
> > > year_1993/SelectedWABEvents;1
> > >
> > > year_1994/SelectedWABEvents;1
> > >
> > > year_1996/SelectedWABEvents;2
> > >
> > > year_1997/SelectedWABEvents;5
> > >
> > > year_1998/SelectedWABEvents;10
> > >
> > > Would anyone be able to comment on this?  This is such a dramatic effect
> > > that I would assume that other collaborations which create large TTrees and
> > > TFiles would have seen this too.  Is there a workaround for this?
> > >
> > > Regards, Matt
> > >
> > > --
> > > Matthew D. Langston
> > > SLD, Stanford Linear Accelerator Center
> > > langston@SLAC.Stanford.EDU
> > >
> > >
> > > --------------------------------------------------------------------------------
> > >
> > >                          Name: TTree_Print.txt
> > >    TTree_Print.txt       Type: Plain Text (text/plain)
> > >                      Encoding: 7BIT
> > >
> > >                             Name: TFile_contents.txt
> > >    TFile_contents.txt       Type: Plain Text (text/plain)
> > >                         Encoding: 7BIT
> > >
> > >                                 Name: TKey_cycle_anomaly.txt
> > >    TKey_cycle_anomaly.txt       Type: Plain Text (text/plain)
> > >                             Encoding: 7BIT
> >

-- 
Org:    CERN, European Laboratory for Particle Physics.
Mail:   1211 Geneve 23, Switzerland
E-Mail: Fons.Rademakers@cern.ch              Phone: +41 22 7679248
WWW:    http://root.cern.ch/~rdm/            Fax:   +41 22 7677910


