Hi Konstantin,

In the CVS version, I modified TFile::WriteBuffer to set the bit
kWriteError in case of errors of the type:
  "error writing all requested bytes to file"

I modified TDirectory::WriteObject to return 0 bytes in case of an error
while writing to the file.

I also modified the signature of TTree::AutoSave to return the number of
bytes written to the file. A return value of 0 indicates an error.

Rene Brun

Konstantin Olchanski wrote:
>
> Rooters- in a production environment, our application is writing ROOT
> TTree files into an unreliable storage array (NFS to GPFS to FiberChannel
> to RAID-whatever). Any disk write errors (intermittent disk full,
> intermittent I/O errors, etc) corrupt the output ROOT file, so we
> want to catch the errors and stop the application.
>
> It turns out that catching the disk errors while writing a ROOT TTree
> file is not simple. The TTree->AutoSave() and TTree->Fill() are "void"
> and do not return success or failure status.
>
> One can check the TFile->TestBits(kWriteError), but some write
> errors corrupt the output file without setting the file->SetBit(kWriteError);
>
> For example, consider this stack trace: (ROOT v3.10.2)
>
> #0 0x420d6fb0 in write () from /lib/i686/libc.so.6
> #1 0x40102661 in TFile::SysWrite(int, void const*, int) (this=0x8896098, fd=143231640, buf=0xe8, len=-1) at base/src/TFile.cxx:2019
> #2 0x40100630 in TFile::WriteBuffer(char const*, int) (this=0x8896098, buf=0x8898a98 "", len=232) at base/src/TFile.cxx:1466
> #3 0x4010733f in TKey::WriteFile(int) (this=0x8896ee0, cycle=0) at base/src/TKey.cxx:762
> #4 0x40113264 in TObject::Write(char const*, int, int) (this=0x8896c40, name=0x0, option=0, bufsize=0) at base/src/TObject.cxx:889
> #5 0x40ade893 in TTree::AutoSave(char const*) (this=0x8896c40, option=0x80494b5 "") at tree/src/TTree.cxx:685
> #6 0x08049039 in savetree_ () at tree.cpp:156
>
> If "write()" does not write all the data (the disk is full),
> it returns a short count to TKey::WriteFile(). There, the error condition is
> ignored, with the output file corrupted, and with "kWriteError" not
> flagged.
>
> If the disk error is intermittent and goes away by the time we want to
> write something again (or if write() always returns a short count rather
> than -1), we get undetectable output file corruption.
>
> We do get "error writing all requested bytes to file %s, wrote %d of %d"
> error messages to stderr, but the application cannot see them and
> continues chewing up cpu-hours writing an unreadable output file.
>
> Even if TKey::WriteFile() were to propagate the error condition,
> it is again ignored in TTree::AutoSave() (wOK=Write(), wOK is not used),
> with output file corrupted but "kWriteError" not flagged.
>
> I did not check if similar problems exist in the TTree->Fill() path.
>
> Ideally, TTree->AutoSave() and TTree->Fill() should return an error
> status. Otherwise, we could detect the error and set the TFile::kWriteError
> bit in TKey::WriteFile() and elsewhere.
>
> Any thoughts?
> Should I try to come up with a patch for flagging file->SetBit(kWriteError)?
>
> --
> Konstantin Olchanski
> Data Acquisition Systems: The Bytes Must Flow!
> Email: olchansk-at-triumf-dot-ca
> Snail mail: 4004 Wesbrook Mall, TRIUMF, Vancouver, B.C., V6T 2A3, Canada