I’m trying to use gProof->GetManager()->GetFile in a compiled program after a PROOF session, and in that case GetFile hangs after about 16 KB. It doesn’t work even when I close the session first and then call TProof::Mgr()->GetFile().
However, in the ROOT console TProof::Mgr()->GetFile() works just fine.
root [0] p = TProof::Open("lxslc22.ihep.ac.cn")
Starting master: opening connection ...
Starting master: OK
Opening connections to workers: OK (6 workers)
Setting up worker servers: OK (6 workers)
PROOF set to parallel mode (1 worker)
(class TProof*)0x84f9b00
root [1] p->GetManager()
(class TProofMgr*)0x844b5e0
root [2] p->GetManager()
(class TProofMgr*)0x844b5e0
root [3] TProof::Mgr("lxslc22.ihep.ac.cn")
(class TProofMgr*)0x844b5e0
root [4] TProof::Mgr("lxslc22.ihep.ac.cn")
(class TProofMgr*)0x844b5e0
You are right, “lxslc.ihep.ac.cn” is not a valid host name. I think the only place it could come from is the reverse DNS record for the IP address associated with lxslc22.
The other thing I’ve just noticed is the following:
root [4] p->GetManager()->GetFile("/afs/ihep.ac.cn/users/e/eugenyboger/root/bin/xrd","/tmp/")
Local file exists already: would you like to overwrite it? [N/y]y
[GetFile] Total 2.03 MB |>...................Error: Symbol #include is not defined in current scope (tmpfile):1:
Error: Symbol exception is not defined in current scope (tmpfile):1:
Syntax Error: #include <exception> (tmpfile):1:
Error: Symbol G__exception is not defined in current scope (tmpfile):1:
Error: type G__exception not defined FILE:(tmpfile) LINE:1
(Int_t)0
*** Interpreter error recovered ***
...
Root > .q
SysError in <TPosixCondition::~TPosixCondition>: pthread_cond_destroy error (No such file or directory)
SysError in <TPosixMutex::~TPosixMutex>: pthread_mutex_destroy error (No such file or directory)
This happens (instead of the hang described above) when the destination file already exists.
I also ran some tests on my laptop with a local PROOF session. After
p = TProof::Open("localhost")
calls to GetFile eventually fail:
root [0] p = TProof::Open("localhost")
Starting master: opening connection ...
Starting master: OK
Opening connections to workers: OK (2 workers)
Setting up worker servers: OK (2 workers)
PROOF set to parallel mode (2 workers)
(class TProof*)0x84c7ef8
root [1] for (i=0;i<20; i++) TProof::Mgr("localhost")->GetFile("test.root","/tmp/","force")
Warning: Automatic variable i is allocated (tmpfile):1:
[GetFile] Total 0.10 MB |====================| 100.00 % [14.1 MB/s]
[GetFile] Total 0.10 MB |===>................Error: Symbol #include is not defined in current scope (tmpfile):1:
Error: Symbol exception is not defined in current scope (tmpfile):1:
Syntax Error: #include <exception> (tmpfile):1:
Error: Symbol G__exception is not defined in current scope (tmpfile):1:
Error: type G__exception not defined FILE:(tmpfile) LINE:1
*** Interpreter error recovered ***
root [2] terminate called after throwing an instance of 'std::bad_alloc'
what(): std::bad_alloc
So the problem doesn’t seem to be related to a particular cluster.
The problem is still there: I have just seen it with the latest svn and with a different PROOF cluster (xrootd is from ROOT 5.26b). Sometimes it works, sometimes it does not:
Mst-0: grand total: sent 57 objects, size: 15541714 bytes
[GetFile] Total 0.06 MB |===================>terminate called after throwing an instance of 'std::bad_alloc'
what(): std::bad_alloc
Aborted
Update: the errors seem to be somehow related to the file size.
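One way to probe the suspected file-size dependence would be to generate test files of several sizes and fetch each of them with GetFile. The sketch below is plain standard C++ and does not touch PROOF at all; the helper names `write_test_file` and `file_size_bytes` are made up for illustration, and which sizes to try is left to the caller.

```cpp
#include <cassert>
#include <cstddef>
#include <fstream>
#include <string>
#include <vector>

// Hypothetical helper: writes exactly `size_bytes` of filler data to `path`,
// replacing any existing file. Returns true if the file was written and
// closed without a stream error.
inline bool write_test_file(const std::string& path, std::size_t size_bytes)
{
    assert(!path.empty());
    std::ofstream out(path, std::ios::binary | std::ios::trunc);
    if (!out) return false;

    const std::vector<char> chunk(4096, 'x');  // filler written in 4 KB pieces
    std::size_t remaining = size_bytes;
    while (remaining > 0) {
        const std::size_t n = remaining < chunk.size() ? remaining : chunk.size();
        out.write(chunk.data(), static_cast<std::streamsize>(n));
        remaining -= n;
    }
    out.close();
    return !out.fail();
}

// Hypothetical helper: returns the size of `path` in bytes, or -1 if the
// file cannot be opened. Useful for comparing the fetched copy with the source.
inline long long file_size_bytes(const std::string& path)
{
    std::ifstream in(path, std::ios::binary | std::ios::ate);
    if (!in) return -1;
    return static_cast<long long>(in.tellg());
}
```

Comparing `file_size_bytes` of the local copy against the size that was written would also show whether a transfer that appears to finish was in fact truncated.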
The stressProof test shows no errors.
However, these two simple steps almost always hang:
p = TProof::Open("xrootd@lgdui01");
for (i=0;i<200; i++) p->GetManager()->GetFile("/data/pool/testfile.dd","/tmp/","force");
The output looks like this:
root [1] for (i=0;i<200; i++) p->GetManager()->GetFile("/data/pool/testfile.dd","/tmp/","force")
Warning: Automatic variable i is allocated (tmpfile):1:
[GetFile] Total 4.00 MB |====================| 100.00 % [10.8 MB/s]
[GetFile] Total 4.00 MB |====================| 100.00 % [10.8 MB/s]
[GetFile] Total 4.00 MB |====================| 100.00 % [10.8 MB/s]
[GetFile] Total 4.00 MB |====================| 100.00 % [10.8 MB/s]
[GetFile] Total 4.00 MB |====================| 100.00 % [10.8 MB/s]
[GetFile] Total 4.00 MB |====================| 100.00 % [10.8 MB/s]
[GetFile] Total 4.00 MB |====================| 100.00 % [10.9 MB/s]
[GetFile] Total 4.00 MB |====================| 100.00 % [10.9 MB/s]
[GetFile] Total 4.00 MB |====================| 100.00 % [10.9 MB/s]
[GetFile] Total 4.00 MB |====================| 100.00 % [10.9 MB/s]
[GetFile] Total 4.00 MB |==============>.....| 0.00 % [10.7 MB/s]
And the console hangs forever.
I don’t really know whether this is related to the problem discussed or not, but it definitely seems strange.
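To pin down which iteration of a loop like the one above first goes wrong in a compiled program, a small harness along these lines might help. It is only a sketch in standard C++: `repeat_transfer` and `RepeatReport` are hypothetical names, the transfer itself is passed in as a callable, and a nonzero return value is assumed to mean failure. It cannot break out of a hang, but because the attempt index is printed before each call, the last index printed identifies the stuck iteration.

```cpp
#include <cassert>
#include <functional>
#include <iostream>
#include <vector>

// Summary of a repeated-transfer run (hypothetical type for illustration).
struct RepeatReport {
    int attempts = 0;         // number of transfers actually started
    std::vector<int> failed;  // 0-based indices of attempts that returned nonzero
};

// Calls `transfer(i)` for i = 0 .. max_attempts-1.
// Assumption: the wrapped call returns 0 on success and nonzero on failure.
// With stop_on_failure == true the loop ends at the first failed attempt.
inline RepeatReport repeat_transfer(int max_attempts,
                                    const std::function<int(int)>& transfer,
                                    bool stop_on_failure = true)
{
    assert(max_attempts >= 0);
    RepeatReport report;
    for (int i = 0; i < max_attempts; ++i) {
        // Printed before the call so that a hang shows where it happened.
        std::cerr << "attempt " << i << std::endl;
        ++report.attempts;
        if (transfer(i) != 0) {
            report.failed.push_back(i);
            if (stop_on_failure) break;
        }
    }
    return report;
}
```

In a ROOT program the callable could be a lambda that returns the result of `p->GetManager()->GetFile("/data/pool/testfile.dd", "/tmp/", "force")`. That 0 means success for GetFile is an assumption here and should be checked against the TProofMgr documentation.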