Rooters,

In a production environment, our application writes ROOT TTree files to an unreliable storage array (NFS to GPFS to Fibre Channel to RAID-whatever). Any disk write error (intermittent disk full, intermittent I/O errors, etc.) corrupts the output ROOT file, so we want to catch these errors and stop the application.

It turns out that catching disk errors while writing a ROOT TTree file is not simple. TTree->AutoSave() and TTree->Fill() are "void" and do not return a success or failure status. One can check TFile->TestBits(kWriteError), but some write errors corrupt the output file without setting file->SetBit(kWriteError). For example, consider this stack trace (ROOT v3.10.2):

#0 0x420d6fb0 in write () from /lib/i686/libc.so.6
#1 0x40102661 in TFile::SysWrite(int, void const*, int) (this=0x8896098, fd=143231640, buf=0xe8, len=-1) at base/src/TFile.cxx:2019
#2 0x40100630 in TFile::WriteBuffer(char const*, int) (this=0x8896098, buf=0x8898a98 "", len=232) at base/src/TFile.cxx:1466
#3 0x4010733f in TKey::WriteFile(int) (this=0x8896ee0, cycle=0) at base/src/TKey.cxx:762
#4 0x40113264 in TObject::Write(char const*, int, int) (this=0x8896c40, name=0x0, option=0, bufsize=0) at base/src/TObject.cxx:889
#5 0x40ade893 in TTree::AutoSave(char const*) (this=0x8896c40, option=0x80494b5 "") at tree/src/TTree.cxx:685
#6 0x08049039 in savetree_ () at tree.cpp:156

If "write()" does not write all the data (the disk is full), it returns a short count, which travels back up through TFile::SysWrite() and TFile::WriteBuffer() to TKey::WriteFile(). There the error condition is ignored: the output file is corrupted and "kWriteError" is not flagged. If the disk error is intermittent and has gone away by the time we write something again (or if write() always returns a short count rather than -1), we get undetectable output file corruption.
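To make the failure mode concrete, here is a minimal sketch of what "not ignoring a short count" could look like at the lowest level. It is plain POSIX C++, not ROOT code: WriteAll() and CheckedSink are hypothetical names invented for illustration. WriteAll() retries after a short count so that a persistent condition such as a full disk surfaces as -1/ENOSPC instead of being silently dropped, and CheckedSink keeps a sticky error flag, playing the role kWriteError is meant to play: once set, it stays set even if later writes succeed.

```cpp
#include <cerrno>
#include <cstddef>
#include <unistd.h>

// Hypothetical helper (not part of ROOT): write all `len` bytes to `fd`,
// retrying after short counts and EINTR. Returns true only if every byte
// was written.
bool WriteAll(int fd, const char* buf, std::size_t len) {
    while (len > 0) {
        ssize_t n = ::write(fd, buf, len);
        if (n < 0) {
            if (errno == EINTR) continue;  // interrupted by a signal: retry
            return false;                  // real error (ENOSPC, EIO, EBADF, ...)
        }
        if (n == 0) return false;          // no progress at all: treat as an error
        buf += n;                          // short count: advance and retry the rest
        len -= static_cast<std::size_t>(n);
    }
    return true;
}

// Hypothetical stand-in for a file object with a sticky write-error bit.
// A caller that only checks HasWriteError() after an AutoSave()-like call
// still sees an earlier failure, even if the disk has since recovered.
class CheckedSink {
public:
    explicit CheckedSink(int fd) : fd_(fd) {}

    bool Write(const char* buf, std::size_t len) {
        bool ok = WriteAll(fd_, buf, len);
        if (!ok) writeError_ = true;  // never cleared by a later success
        return ok;
    }

    bool HasWriteError() const { return writeError_; }

private:
    int fd_;
    bool writeError_ = false;
};
```

On a regular file that fills up, write() typically returns a short count first and then -1 with errno set to ENOSPC on the retry, so the retry loop is what turns a silent truncation into an error the caller can act on. The sticky flag covers the intermittent case described above.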
We do get "error writing all requested bytes to file %s, wrote %d of %d" error messages on stderr, but the application cannot see them and keeps burning CPU hours writing an unreadable output file.

Even if TKey::WriteFile() were to propagate the error condition, it is ignored again in TTree::AutoSave() (wOK=Write(); wOK is never used), leaving the output file corrupted and "kWriteError" not flagged. I did not check whether similar problems exist in the TTree->Fill() path.

Ideally, TTree->AutoSave() and TTree->Fill() should return an error status. Alternatively, we could detect the error and set the TFile::kWriteError bit in TKey::WriteFile() and elsewhere. Any thoughts? Should I try to come up with a patch that flags file->SetBit(kWriteError)?

--
Konstantin Olchanski
Data Acquisition Systems: The Bytes Must Flow!
Email: olchansk-at-triumf-dot-ca
Snail mail: 4004 Wesbrook Mall, TRIUMF, Vancouver, B.C., V6T 2A3, Canada
This archive was generated by hypermail 2b29 : Sun Jan 02 2005 - 05:50:08 MET