Re: TTree::AutoSave() from Tom Roberts on 2007-05-31 (RootTalk)

From: Tom Roberts <tjrob_at_fnal.gov>
Date: Wed, 30 May 2007 23:58:59 -0500

Rene Brun wrote:
> -I do not understand why you need to have 17 ntuples in a G4 simulation.

It depends on what one is simulating. I am simulating beamlines, not a detector. In a beamline one usually wants to characterize the composition of the beam at several different locations (i.e., to be able to histogram the particle types and their momenta, positions, angles, etc.). The simplest way to do this is to sample the beam when it enters a thin volume (we call it a virtualdetector because it is not a real component of the beamline), and write the track variables into a TNtuple. It is most natural to have a separate TNtuple for each location (virtualdetector), so for a beamline with 17 virtualdetectors there are 17 TNtuples in one TFile. The user can also create a wide TNtuple that is the union of several virtualdetectors, requiring each track to hit some subset of them.
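As a rough illustration of the layout just described, here is a minimal sketch in plain C++, with a map of row tables standing in for the TNtuples in one TFile. The names `NtupleSet`, `Row`, `recordHit`, and the six-column layout are hypothetical and chosen only for this example; they are not taken from the simulation program or from Root.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Hypothetical column layout for one sampled track: x, y, z, Px, Py, Pz.
constexpr std::size_t kNumVars = 6;
using Row = std::array<float, kNumVars>;

// One table of rows per virtualdetector, keyed by detector name.
// This stands in for "one TNtuple per location, all in one TFile".
class NtupleSet {
public:
    // Append one row to the named detector's table, creating it on first use.
    void recordHit(const std::string& detector, const Row& row) {
        assert(!detector.empty());  // every table needs a name
        tables_[detector].push_back(row);
    }

    // Number of rows recorded so far for a detector (0 if it was never hit).
    std::size_t rows(const std::string& detector) const {
        auto it = tables_.find(detector);
        return it == tables_.end() ? 0 : it->second.size();
    }

    // Number of detectors that have received at least one row.
    std::size_t detectors() const { return tables_.size(); }

private:
    std::map<std::string, std::vector<Row>> tables_;
};
```

Each table is independent of the others, which is why a generic histogramming macro can treat every one of them the same way.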

This design originally used HistoScope, where separate ntuples were the only approach possible. But in Root I can now imagine writing a single TTree consisting of Events, each of which has one or more Tracks, each of which has zero or more hits in the virtualdetectors, a hit being an array of floats (just like a row of the TNtuple) in its own TBranch. I guess this is what you are advocating -- I have to think about it. Fortunately the simulation program is modular enough that I could implement both approaches and let the user choose (the user can still select HistoScope on platforms that support it, as well as several ASCII formats). The Root macro to generate histograms from such a TFile would be significantly more complex, though, and it would probably be specific to output from my program (my current macro can handle any TNtuples from any source). My users are neither Root experts nor C++ programmers.
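To make the alternative concrete, here is a minimal sketch of the nested Event / Track / hit model in plain C++, without any Root classes. The struct names, the integer detector id, the six-float hit, and the helpers `addHit` and `rowsForDetector` are all assumptions made for illustration; how this would map onto TBranch-es is not shown.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical width of one hit: the same floats as one TNtuple row.
constexpr std::size_t kHitVars = 6;
using HitRow = std::array<float, kHitVars>;

struct Hit {
    int detectorId;  // which virtualdetector the track entered
    HitRow values;   // track variables sampled at that detector
};

struct Track {
    int trackId;
    std::vector<Hit> hits;  // zero or more hits
};

struct Event {
    int eventId;
    std::vector<Track> tracks;  // one or more tracks
};

// Record one hit on a track.
void addHit(Track& track, int detectorId, const HitRow& values) {
    assert(detectorId >= 0);  // ids are assumed to be non-negative
    track.hits.push_back(Hit{detectorId, values});
}

// Recover the flat per-detector table that a separate TNtuple would hold.
std::vector<HitRow> rowsForDetector(const std::vector<Event>& events,
                                    int detectorId) {
    std::vector<HitRow> rows;
    for (const Event& event : events)
        for (const Track& track : event.tracks)
            for (const Hit& hit : track.hits)
                if (hit.detectorId == detectorId)
                    rows.push_back(hit.values);
    return rows;
}
```

The triple loop in `rowsForDetector` is the extra work a histogramming macro would have to do to get back the flat table per virtualdetector, and it depends on knowing this particular layout.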

> You have something wrong in your data model. Most simulation applications
> that I see produce one single Tree in output.
> This is simpler to handle and it is more efficient in space and time.

They are simulating a completely different type of system, and naturally want a completely different type of output.

I doubt the approach above would be simpler. And I don't care about efficiency here, as the Root part of this package is plenty good enough. But I do need to think about this alternate design and whether it would be worthwhile....

Thanks,

Tom Roberts

> 
> Rene Brun
> 
> Tom Roberts wrote:

>> Rene Brun wrote:
>>> It looks like your Tree has a small number of entries and all entries
>>> fit in memory in the branch buffers,
>>
>> I don't know. The file without calls to AutoSave() is 326 kbytes, with
>> a dozen calls is 589 kbytes -- clearly small enough to fit in memory.
>> This is 1,000 events.
>>
>>
>>> or more entries but with very large buffer sizes. I can tell you more
>>> if you send the result of TTree::Print.
>>
>> Well, I'm removing the calls to AutoSave(), as in practice it's easier
>> to re-run a crashed job than to deal with the hassle. My users are
>> non-experts, and having two copies of every TNtuple is confusing.
>>
>>
>>
>> Is there really no way to "checkpoint" all of the TTree-s (or
>> TNtuple-s) in a TFile so that if the program crashes the file can be
>> opened and used up to the last checkpoint? Without keeping a second
>> copy of all the data or a second version of each TTree.
>>
>>
>>> Note that, in general, it is a very bad idea to have many Trees in
>>> the same file. It is more efficient
>>> to have one single Tree with more branches.
>>
>> This is a Geant4 simulation that was based on HistoScope (now
>> defunct). It generates 17 different TNtuples, and they cannot be
>> combined; we find it more convenient to have 1 root file than 17. I
>> have written a Root macro that essentially performs the duties of the
>> old "histo" program (i.e. easily and interactively generate histograms
>> from TNtuples, with sliders for cuts). Efficiency is not a problem, as
>> even a 250k event TNtuple can be re-scanned as I move a slider without
>> undue delay (it's great to have such computing power in a laptop!).
>>
>>
>> Tom Roberts
>>
>>
>>>
>>> Rene Brun
>>>
>>>
>>> Tom Roberts wrote:
>>>> When I use TTree::AutoSave() on a tree in a TFile, I get two copies
>>>> of the TTree as expected (different versions), but the file is
>>>> roughly twice as large. As I am making files that are quite large,
>>>> doubling their size is not acceptable. Note my TFile contains many
>>>> TTree-s, and I call AutoSave() on all of them every N events
>>>> (N=100000 by default).
>>>>
>>>> Is there any other way to "checkpoint" the output file so if the
>>>> generating program crashes I can still recover most of the data that
>>>> was written to the .root file, without doubling the size of the file?
>>>>
>>>> In a related question: how does TFile::Flush() relate to such
>>>> "checkpoints"? My guess is that partly filled buffers of the TTree
>>>> are not written to disk, and for consistency this really needs to
>>>> happen at the TTree level, not the TFile level.
>>>>
>>>>
>>>> Tom Roberts
>>>

> Received on Thu May 31 2007 - 06:59:20 CEST
