Hi Pasha, This is a LONG mail explaining the basics of the ROOT I/O. Your original event structure looks like: EVENT_AB A *fA ------------>Word[2] 8 bytes fUniqueID 4 bytes fBits 4 bytes B *fB ------------>Word[2] 8 bytes fUniqueID 4 bytes fBits 4 bytes fUniqueID 4 bytes fBits 4 bytes TOTAL ------------------------> 40 bytes (you should in principle add 2*4 bytes for the 2 pointers) For 100,000 events (I used 10 times more events than you because ROOT I/O is so fast on my PentiumPro that measurements with only 10000 events were not precise enough) you expect to write around 4 Mbytes. I ran your code in the configuration A above with no compression (A0) or compression (A1). Results are: A0 size = 7134335 bytes time = 3.61 seconds A1 = 842108 bytes = 8.49 seconds As you are correctly indicating in your email, the overhead in this rather academical example is pretty high (7.13 Mbytes instead of the expected 4 with no compression, but 0.84 Mbytes with compression). What is the explanation? When an object is written via the Streamer function (eg, the objects pointed by fA and fB in your example), ROOT writes: - The name of the class (in your case 9 bytes). One could in principle eliminate this when using the split method. - The class version identifier (2 bytes) specified in ClassDef. This version stamp is useful in case you intend to modify your class schema. - The 8 bytes of array Word[2] - The TObject class version (2 bytes) - the 8 bytes for TObject::fUniqueID and fBits TOTAL = 29 bytes +Samething for object pointed by fB. + 8 bytes for EVENT_AB::TObject::fUniqueID and fBits Grand TOTAL = 64 bytes The remaining 7.1 bytes/event are used for event bookeeping and pointers to the file. As you can see, the overhead is essentially due to your very small event size. We have optimized ROOT for event sizes of at least a few hundred bytes. For event sizes above 1 Kbyte, the overhead is totally negligible. However, even in the case of this small example, you should include the objects fA and fB in the main object EVENT_AB. I have modified your EVENT_AB.h file to look like: class EVENT_AB : public TObject { A fA; instead A *fA; B fB; of B *fB; In this case, the implementation of EVENT_AB.cc is much simpler: EVENT_AB::EVENT_AB() {} ~EVENT_AB::EVENT_AB() {} void EVENT_AB::init() {} The results for 100,000 events are: B0 size = 3695848 bytes time = 2.08 seconds B1 = 182997 bytes = 3.81 seconds As you can see, the overhead has totally disappeared. The compression algorithm is doing a better job (21.2 compared to 8.8 for case A1). These numbers indicate that the general bookeeping overhead in time is very small. Case B0 shows that ROOT can write up to 150,000 objects per second (3 objects per event) in split mode. You must be very careful in drawing conclusions on the file size. Make a more realistic implementation of your event model. Use composition instead of pointers for very small objects. The ROOT split algorithm works also for composed objects. Unless your event computation time is very small compared to I/O, use the default compression algorithm 1, or even better 2. With compression level 1, only the branches with integers are compressed in split mode. With compression level 2, all branches are compressed. In particular, the compression algorithm works very efficiently when data members are "nearly constant" between events (this is the case for fBits and fUniqueID for example). You should also note that the files written by ROOT are truely portable. The file structure supports direct access to any event in the file or subset of event (branch) if you use the split method. ROOT I/O is designed for large files (our files in NA49 are typically 1.5 Gigabytes) and to process a large collection of these files (chains). The time to insert a new event in a file is independent of the filesize and/or the number of events. If your operating system can supports Terabytes of disk, ROOT I/O should perfectly scale on your machine. I would be delighted if somebody could run the above example with OODBMS systems (such as Objectivity) and publish the results to this list (time, data base size, portability). Rene Brun Pasha Murat wrote: > Hello, > > after exercising a little bit with ROOT I/O I ended up with a problem which > might be interesting for everybody on this list and with an example, > full source code of which (including makefile) is enclosed below. > > What this example is doing: > -------------------------- > I'm writing out 10000 events , each event (EVENT_AB) is a very simple object. > It contains 2 pointers to dynamically allocated objects (A and B), > each of A and B is 2 integers+TObject (see header files below) > Defining pointers to A and B in the event serves for writing events > out in split mode. > > I intentionally use a small buffer(2000 bytes), which however is much larger > than the event size, so buffer size shouldn't be a factor in this exercise. > > Next, I'm trying to estimate the expected size of the output file. > Size of each A and B is 12 bytes (2*4+sizeof(TObject)=12), > size of the event header (EVENT_AB) is also 12 bytes: > 4+4+sizeof(TObject) = 12 bytes (the last number may be even smaller). > > With 10000 events written in it the output file should have its size of > about 10000*(12+12+12) = 360000 bytes. ROOT, however, creates a file, > which is 712959 bytes large, i.e. almost 2 times larger! > > Here is what the output says: > > /cdf/upgrade/tracking/murat/test_root>ab.exe > TFile Writing Name=ab.root Title=ab test > ************************************************************************************ > *Tree :AB : AB tree * > *Entries : 10000 : Total Size = 712644 bytes File Size = 712644 bytes * > * : : Tree compression factor = 1.00 * > ************************************************************************************ > *Branch :Event : Event * > *Entries : 10000 : BranchObject (see below) * > *..................................................................................* > *Branch :fA : fA * > *Entries : 10000 : Total Size = 290002 bytes File Size = 290002 bytes * > *Baskets : 166 : Basket Size = 2000 bytes EvOffsetLen= 1000 * > * : : Branch compression factor = 1.00 * > *..................................................................................* > *Branch :fB : fB * > *Entries : 10000 : Total Size = 290002 bytes File Size = 290002 bytes * > *Baskets : 166 : Basket Size = 2000 bytes EvOffsetLen= 1000 * > * : : Branch compression factor = 1.00 * > *..................................................................................* > *Branch :fUniqueID : fUniqueID * > *Entries : 10000 : Total Size = 39960 bytes File Size = 39960 bytes * > *Baskets : 20 : Basket Size = 2000 bytes EvOffsetLen= 0 * > * : : Branch compression factor = 1.00 * > *..................................................................................* > *Branch :fBits : fBits * > *Entries : 10000 : Total Size = 39960 bytes File Size = 39960 bytes * > *Baskets : 20 : Basket Size = 2000 bytes EvOffsetLen= 0 * > * : : Branch compression factor = 1.00 * > *..................................................................................* > /cdf/upgrade/tracking/murat/test_root>dir -l *.root > -rw-r--r-- 1 murat cdfupg 456479 Aug 29 11:19 Event.root > -rw-r--r-- 1 murat cdfupg 712959 Oct 25 23:36 ab.root > > ********************* > > - my 1st observation is that the size of each of fA and fB branches > (290002) is about 2.5 times larger than one could expect from > multiplying 12*10000 = 120000 > > - the size of fBits and fUniqueID branches (constituents of TObject) is > about right: 4*10000 = 40000 and the branch size is 39960 > > - then there is some missing component which accounts for > 712644-2*(290002+39960) = 52720 bytes ( about 5 bytes per event) > > The resulting question is: where the overhead (factor of 1.98) is coming from? > - It seems to be unacceptably large... > > I'd appreciate any comments and hope that I did something wrong, > > thanks, Pasha. > > ****************************************************** > source codes and makefile of the example which produced output above > ***************************************************** > > ------------------------------ A.hh > #ifndef __A_HH__ > #define __A_HH__ > #include "TObject.h" > > class A: public TObject { > public: > Int_t Word[2]; > > A(); > virtual ~A(); > ClassDef(A,1) > }; > > #endif > ----------------------------- B.hh > #ifndef __B_HH__ > #define __B_HH__ > #include "TObject.h" > > class B: public TObject { > public: > Int_t Word[2]; > > B(); > virtual ~B(); > ClassDef(B,1) > }; > > #endif > ----------------------------- EVENT_AB.hh > #ifndef __EVENT_AB_HH__ > #define __EVENT_AB_HH__ > #include "A.hh" > #include "B.hh" > > class EVENT_AB : public TObject { > A* fA; > B* fB; > public: > EVENT_AB(); > virtual ~EVENT_AB(); > void init(); > ClassDef(EVENT_AB,1) > }; > #endif > ------------------------------- A.cc > #include "A.hh" > > ClassImp(A) > > A::A() { > Word[0] = 101; > Word[1] = 202; > } > > A::~A() { > } > -------------------------------- B.cc > #include "B.hh" > > ClassImp(B) > > B::B() { > Word[0] = 101; > Word[1] = 202; > } > > B::~B() { > } > ---------------------------------- EVENT_AB.cc > #include "EVENT_AB.hh" > > ClassImp(EVENT_AB) > > EVENT_AB::EVENT_AB() { > fA = 0; > fB = 0; > } > > void EVENT_AB::init() { > if (fA) delete fA; > fA = new A(); > if (fB) delete fB; > fB = new B(); > } > > EVENT_AB::~EVENT_AB() { > if (fA) delete fA; > if (fB) delete fB; > } > ---------------------------------- test_write1.cc > // -*- Mode: C++ -*- > //------------------------------------------------------------------------------ > // Oct 03 1997 P.Murat > // > // revision history : > // ------------------ > // *0000 Oct 14 1997 P.Murat: creation date > //------------------------------------------------------------------------------ > #ifdef __GNUG__ > #pragma implementation > #endif > > #include <stdlib.h> > > #include "TROOT.h" > #include "TFile.h" > #include "TRandom.h" > #include "TTree.h" > #include "TBranch.h" > #include "TClonesArray.h" > #include "TStopwatch.h" > > #include "EVENT_AB.hh" > > EVENT_AB* Event; > TROOT* Root; > TFile* File; > TTree* Tree; > TBranch* BranchAB; > > int main() { > > int split, bufsize; > int comp = 0; > > Root = ::new TROOT("root","root"); > > File = new TFile("ab.root","RECREATE","ab test"); > > File->SetCompressionLevel(comp); > Tree = new TTree("AB","AB tree"); > > // autosave when 1 Mbyte written > Tree->SetAutoSave(1000000); > bufsize = 2000; > split = 1; > Event = new EVENT_AB(); > BranchAB = Tree->Branch("Event","EVENT_AB",&Event,bufsize,split); > > for (int i=0; i<10000; i++) { > Event->init(); > Tree->Fill(); > } > > File->Write(); > Tree->Print(); > // tree should be deleted BEFORE > // the file is closed > delete Tree; > delete Event; > File->Close(); > delete File; > delete Root; > } > ------------------------------------------- Makefile > ROOTLIBS = -L$(ROOTSYS)/lib -lBase -lRint -lCint -lClib -lCont -lFunc -lGraf \ > -lGraf3d -lHist -lHtml -lMeta -lMinuit -lNet -lPostscript \ > -lProof -lTree -lUnix -lZip > ROOTGLIBS = -lGpad -lGX11 -lMotif -lWidgets -lX3d > > # SGI with GCC > CXX = gcc > CXXFLAGS = -fsigned-char -fPIC -w -g -I$(ROOTSYS)/include > LD = gcc > LDFLAGS = -Wl,-u,__builtin_new -Wl,-u,__builtin_delete -Wl,-u,__nw__FUiPv > SOFLAGS = -Wl,-soname,libEvent.so -shared > LIBS = $(ROOTLIBS) -lg++ -lm -ldl > #GLIBS = $(ROOTLIBS) $(ROOTGLIBS) -lXm -lXt -lX11 -lg++ -lm -lPW -ldl > > # don't need -ldl on IRIX 6.2 > > GLIBS = $(ROOTLIBS) $(ROOTGLIBS) -lXm -lXt -lX11 -lg++ -lm -lPW > > w1: > > rootcint -f ab_cint.cc -c A.hh B.hh EVENT_AB.hh > gcc -o ab.exe $(CPPFLAGS) $(CXXFLAGS) A.cc B.cc EVENT_AB.cc \ > ab_cint.cc test_write1.cc $(LDFLAGS) $(GLIBS) > ******************************************************************************** > ******************************************************************************** > /cdf/upgrade/tracking/murat/test_root>make w1 > rootcint -f ab_cint.cc -c A.hh B.hh EVENT_AB.hh > Note: operator new() masked c > gcc -o ab.exe -fsigned-char -fPIC -w -g -I/cdf/upgrade/root/v1_03/include A.cc B.cc EVENT_AB.cc \ > ab_cint.cc test_write1.cc -Wl,-u,__builtin_new -Wl,-u,__builtin_delete -Wl,-u,__nw__FUiPv -L/cdf/upgrade/root/v1_03/lib -lBase -lRint lCint -lClib -lCont -lFunc -lGraf -lGraf3d -lHist -lHtml -lMeta -lMinuit -lNet -lPostscript -lProof -lTree -lUnix -lZip -lGpad -lGX11 -lMotif -Widgets -lX3d -lXm -lXt -lX11 -lg++ -lm -lPW > 10.251u 1.871s 0:18.02 67.2% 0+0k 344+36io 96pf+0w
This archive was generated by hypermail 2b29 : Tue Jan 04 2000 - 00:26:21 MET