Passing user proxy to proof

Hello,

I am new to PROOF and am doing some testing on a few nodes, but I am having trouble getting GSI authentication to work. I have two scenarios I want to test: reading from local disk and reading from the CMS xrootd federation. Reading from local disk works without issue. To read from cms-xrd-global.cern.ch I need to pass a valid proxy to PROOF.

I tried to enable gsi authentication on my manager and worker nodes with these config lines:

xpd.seclib libXrdSec.so
xpd.sec.protocol gsi -dlgpxy:1

When I run my script, I am prompted for my passphrase and once I enter it I immediately see this message:

*** Break *** segmentation violation

Here is the bottom of the stack trace:

#11 0x00000032384ae907 in bn_mul_words () from /usr/lib64/libcrypto.so.10
#12 0x00000032384a96fc in BN_mul () from /usr/lib64/libcrypto.so.10
#13 0x00000032384c85de in RSA_check_key () from /usr/lib64/libcrypto.so.10
#14 0x00002b6ba75a582e in XrdSslgsiX509CreateProxy(char const*, char const*, XrdProxyOpt_t*, XrdCryptosslgsiX509Chain*, XrdCryptoRSA**, char const*) () from /cvmfs/cms.cern.ch/slc6_amd64_gcc481/cms/cmssw/CMSSW_7_3_0/external/slc6_amd64_gcc481/lib/libXrdCryptossl.so.1
#15 0x00002b6ba7575df2 in XrdSecProtocolgsi::InitProxy(ProxyIn_t*, XrdCryptosslgsiX509Chain*, XrdCryptoRSA**) () from /cvmfs/cms.cern.ch/slc6_amd64_gcc481/cms/cmssw/CMSSW_7_3_0/external/slc6_amd64_gcc481/lib/libXrdSecgsi.so
#16 0x00002b6ba757652a in XrdSecProtocolgsi::QueryProxy(bool, XrdSutCache*, char const*, XrdCryptoFactory*, int, ProxyIn_t*, ProxyOut_t*) () from /cvmfs/cms.cern.ch/slc6_amd64_gcc481/cms/cmssw/CMSSW_7_3_0/external/slc6_amd64_gcc481/lib/libXrdSecgsi.so
#17 0x00002b6ba75791e3 in XrdSecProtocolgsi::ClientDoInit(XrdSutBuffer*, XrdSutBuffer**, XrdOucString&) () from /cvmfs/cms.cern.ch/slc6_amd64_gcc481/cms/cmssw/CMSSW_7_3_0/external/slc6_amd64_gcc481/lib/libXrdSecgsi.so
#18 0x00002b6ba75793a5 in XrdSecProtocolgsi::ParseClientInput(XrdSutBuffer*, XrdSutBuffer**, XrdOucString&) () from /cvmfs/cms.cern.ch/slc6_amd64_gcc481/cms/cmssw/CMSSW_7_3_0/external/slc6_amd64_gcc481/lib/libXrdSecgsi.so
#19 0x00002b6ba7579705 in XrdSecProtocolgsi::getCredentials(XrdSecBuffer*, XrdOucErrInfo*) () from /cvmfs/cms.cern.ch/slc6_amd64_gcc481/cms/cmssw/CMSSW_7_3_0/external/slc6_amd64_gcc481/lib/libXrdSecgsi.so
#20 0x00002b6ba689e40a in XrdProofConn::Authenticate (this=0x1bfab70, plist=0x1da52b0 “&P=gsi,v:10300,c:ssl,ca:c7a717ce.0”, plsiz=35) at /build/bellenot/SPI/x86_64-slc6-gcc48-dbg/root/proof/proofd/src/XrdProofConn.cxx:1287
#21 0x00002b6ba689db9d in XrdProofConn::Login (this=0x1bfab70) at /build/bellenot/SPI/x86_64-slc6-gcc48-dbg/root/proof/proofd/src/XrdProofConn.cxx:1177
#22 0x00002b6ba689bead in XrdProofConn::GetAccessToSrv (this=0x1bfab70, p=0x0) at /build/bellenot/SPI/x86_64-slc6-gcc48-dbg/root/proof/proofd/src/XrdProofConn.cxx:881
#23 0x00002b6ba6897d8e in XrdProofConn::Connect (this=0x1bfab70) at /build/bellenot/SPI/x86_64-slc6-gcc48-dbg/root/proof/proofd/src/XrdProofConn.cxx:224
#24 0x00002b6ba6897ac9 in XrdProofConn::Init (this=0x1bfab70, url=0x1d92250 “proof://cms-a000.rcac.purdue.edu:1093/”) at /build/bellenot/SPI/x86_64-slc6-gcc48-dbg/root/proof/proofd/src/XrdProofConn.cxx:182
#25 0x00002b6ba689768f in XrdProofConn::XrdProofConn (this=0x1bfab70, url=0x1d92250 “proof://cms-a000.rcac.purdue.edu:1093/”, m=77 ‘M’, psid=36, capver=1 ‘001’, uh=0x1d92530, logbuf=0x1d92589 “”) at /build/bellenot/SPI/x86_64-slc6-gcc48-dbg/root/proof/proofd/src/XrdProofConn.cxx:118
#26 0x00002b6ba688be0b in TXSocket::TXSocket (this=0x1d92390, url=0x1d92250 “proof://cms-a000.rcac.purdue.edu:1093/”, m=67 ‘C’, psid=36, capver=1 ‘001’, logbuf=0x0, loglevel=-1, handler=0x1d8b0a0) at /build/bellenot/SPI/x86_64-slc6-gcc48-dbg/root/proof/proofx/src/TXSocket.cxx:192
#27 0x00002b6ba6878b9d in TXProofMgr::Init (this=0x1d8af20) at /build/bellenot/SPI/x86_64-slc6-gcc48-dbg/root/proof/proofx/src/TXProofMgr.cxx:123
#28 0x00002b6ba6878a6c in TXProofMgr::TXProofMgr (this=0x1d8af20, url=0x1d8aa70 “proof://cms-a000.rcac.purdue.edu/”, dbg=-1, alias=0x0) at /build/bellenot/SPI/x86_64-slc6-gcc48-dbg/root/proof/proofx/src/TXProofMgr.cxx:101
#29 0x00002b6ba68789c0 in GetTXProofMgr (url=0x1d8aa70 “proof://cms-a000.rcac.purdue.edu/”, l=-1, al=0x0) at /build/bellenot/SPI/x86_64-slc6-gcc48-dbg/root/proof/proofx/src/TXProofMgr.cxx:82
#30 0x00002b6ba64c1fc2 in TProofMgr::Create (uin=0x1d8aab0 “http://cms-a000.rcac.purdue.edu:1093/”, loglevel=-1, alias=0x0, xpd=true) at /build/bellenot/SPI/x86_64-slc6-gcc48-dbg/root/proof/proof/src/TProofMgr.cxx:564
#31 0x00002b6ba64a6502 in TProof::Open (cluster=0x1d1d528 “cms-a000.rcac.purdue.edu:1093”, conffile=0x1d1d840 “workers=10”, confdir=0x0, loglevel=0) at /build/bellenot/SPI/x86_64-slc6-gcc48-dbg/root/proof/proof/src/TProof.cxx:12069

I also tried this line I saw from a different post:
xpd.sec.protocol gsi -dlgpxy:1 -d:1 -certdir:/etc/grid-security/certificates -cert:/etc/grid-security/xrd/xrdcert.pem -key:/etc/grid-security/xrd/xrdkey.pem

But I still get the same result.

Any thoughts on where this error is coming from? I attached my xpd.cf. Let me know if you would like any other information.

Thanks,
Erik
xpd.cf.txt (943 Bytes)

Dear Erik,

This looks like a binary incompatibility issue.
The problem already happens on the client machine: is that also SLC6, or something else?
Which version of xrootd are you running? Has it been built against the openssl that is being picked up?
I know these are perhaps difficult questions to answer, but we have to find out before embarking on a stressful debugging process.

From what I can see, it looks like you are working with ROOT 5.34/18 and in the backtrace the source refers to /build/bellenot/SPI/x86_64-slc6-gcc48-dbg/ . The SPI build for 5.34/18 was still done with xrootd v3.2.7 .
Also, on the cluster you seem to take XrdProofd from

     /cvmfs/cms.cern.ch/slc6_amd64_gcc481/cms/cmssw/CMSSW_7_3_0/external/slc6_amd64_gcc481

Is this compatible with the one in

    /grp/cms/tools/xrootd/Packages/5.34.18/x86_64-slc6-gcc48-dbg

?

Unfortunately, in the last couple of years we had incompatible changes in openssl (0.9.x is not compatible with 1.0.x) and also in xrootd. The only way out is to have a ‘closed’ environment where everything is under control.
Do you have all you need under

/cvmfs/cms.cern.ch/slc6_amd64_gcc481/cms/cmssw/CMSSW_7_3_0/external/slc6_amd64_gcc481

?
If yes, perhaps we should stick to that.

G

Yes, the client is SLC6

The servers have xrootd 4.0.4 installed from the OSG repos.
The openssl version is:
/usr/lib64/libcrypto.so → libcrypto.so.1.0.1e
If I just run an xrootd server instance on the server with gsi auth enabled I can copy files out with a valid proxy.

If I understand the above correctly, you are suggesting I change the config to include this line:

xpd.rootsys /cvmfs/cms.cern.ch/slc6_amd64_gcc481/cms/cmssw/CMSSW_7_3_0/external/slc6_amd64_gcc481

I tried that and got the same result. I’m not tied to using a specific CMSSW or ROOT release. If there is a version that is compatible with openssl 1.0.1e I can change the config to point only to those locations. Is that feasible?

Thanks

Hi,

The crash that you posted in the first post is on the client side; it happens right when the client starts the whole process by initializing the proxy and so on. The crash is in the openssl calls and it looks typical of an internal structure mismatch.

This is why it is important to work with a consistent environment.
I am not familiar enough with the CMS software to tell you how to achieve that. I can imagine you have an automatic way to configure the environment for CMSSW. Does that also include setting up ROOT and XROOTD?

Let’s try to debug the client side first.
Can you post the output of:

$ which root
$ which xrootd
$ echo $LD_LIBRARY_PATH

?
Also, can you try to locate the first occurrence of libXrdProofd.so in the list of paths in LD_LIBRARY_PATH and run ‘ldd’ on that?
For example, if it is under

/cvmfs/cms.cern.ch/slc6_amd64_gcc481/cms/cmssw/CMSSW_7_3_0/external/slc6_amd64_gcc481/lib

then do

ldd /cvmfs/cms.cern.ch/slc6_amd64_gcc481/cms/cmssw/CMSSW_7_3_0/external/slc6_amd64_gcc481/lib/libXrdProofd.so

and post it.
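
The same check can be scripted. Below is a minimal Python sketch, not part of ROOT or xrootd: the helper names first_library_dir, parse_ldd and crypto_libraries are made up for illustration. It finds the first directory in LD_LIBRARY_PATH that contains libXrdProofd.so, runs ‘ldd’ on it, and prints which libcrypto/libssl files the loader resolves, which is the part that matters for an openssl mismatch.

```python
import os
import re
import subprocess


def first_library_dir(libname, search_path):
    """Return the first directory of a colon-separated path holding libname, or None."""
    for directory in search_path.split(":"):
        if directory and os.path.isfile(os.path.join(directory, libname)):
            return directory
    return None


def parse_ldd(ldd_output):
    """Map each 'name => target' line of ldd output to its path (None if not found)."""
    resolved = {}
    for line in ldd_output.splitlines():
        # Only lines of the form 'libX.so => /path (0x...)' or 'libX.so => not found'.
        match = re.match(r"\s*(\S+)\s+=>\s+(not found|/\S*)", line)
        if match:
            name, target = match.groups()
            resolved[name] = None if target == "not found" else target
    return resolved


def crypto_libraries(resolved):
    """Keep only the OpenSSL-related entries (libcrypto, libssl)."""
    return {n: p for n, p in resolved.items() if n.startswith(("libcrypto", "libssl"))}


if __name__ == "__main__":
    libdir = first_library_dir("libXrdProofd.so", os.environ.get("LD_LIBRARY_PATH", ""))
    if libdir is None:
        print("libXrdProofd.so not found in LD_LIBRARY_PATH")
    else:
        path = os.path.join(libdir, "libXrdProofd.so")
        out = subprocess.run(["ldd", path], capture_output=True, text=True).stdout
        print(path)
        for name, target in sorted(crypto_libraries(parse_ldd(out)).items()):
            print(" ", name, "->", target)
```

Running it on the client and on a worker and comparing the two outputs should show quickly whether both sides pick up the same openssl libraries.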

If you happen to be at CERN, we can have a look together.

G

Hi,

Sorry I was away for a bit and didn’t have a chance to look at this.

From your path/library suggestions, I cleaned up all my proof and client scripts to only use one instance of CMSSW. There was some mixing of versions between the scripts that start proof and the environment on my client.

This seems to have gotten me past the seg fault, but now I am unable to connect to the proof master. This is what I see in the logs when I try to connect:

150331 14:18:33 4488 xpd-I: ClientMgr::Login: hostname: 'cms-adm.rcac.purdue.edu
150331 14:18:33 4488 xpd-I: goughes.10001:9@cms-adm: ClientMgr::Auth: logging as ‘7ed6d683.0’ instead of ‘goughes’ following admin settings
150331 14:18:33 4488 xpd-I: goughes.10001:9@cms-adm: ClientMgr::Auth: goughes.10001:9@cms-adm login as 7ed6d683.0
150331 14:18:33 4489 XrdPoll: Sever event occured for goughes.10001:9@cms-adm
150331 14:18:33 4488 XrdLink: Unable to receive from goughes.10001:9@cms-adm; connection reset by peer
150331 14:18:33 4488 xpd-I: goughes.10001:9@cms-adm: Protocol::recycle: user disconnected; type: ClientMaster

I am not at CERN, but we could do a WebEx if you would like.

Thanks,
-Erik

Hello,

OK, so the problem now is that it tries to log you in with a username that is not recognized by the machine.
There is no grid-mapfile on the machines, right?
By default this file is located at /etc/grid-security/grid-mapfile, and it is the source of information for mapping certificate DNs to local usernames.
Without a grid-mapfile, the hash of the certificate DN, i.e. ‘7ed6d683.0’, is used as the username, and that is not recognized.
Grid-mapfiles have a very simple form: a text file with lines like this

DN           username

where DN is, for example, the output of

$ openssl x509 -in yourcert.pem -subject | grep subject | sed s/subject=//  

Just create one in the standard location (or in some other location and use -gridmap:my-grid-mapfile in the gsi protocol plug-in configuration; see xrootd.org/doc/dev41/sec_config. … c361850217).
You need to do it on all machines, so perhaps you can use a shared directory.
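
As an illustration of the file format, here is a small Python sketch that writes and reads grid-mapfile entries. It assumes the common convention of putting the DN in double quotes (needed whenever the DN contains spaces); the function names, the example DN and the username ‘jdoe’ are hypothetical.

```python
def gridmap_line(dn, username):
    """Format one grid-mapfile entry: the quoted certificate DN, then the local username."""
    return '"{}" {}'.format(dn.strip(), username)


def parse_gridmap(text):
    """Return {DN: username} from grid-mapfile text, skipping blank and comment lines."""
    mapping = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        if line.startswith('"'):
            # Quoted DN: everything up to the closing quote is the DN.
            dn, _, rest = line[1:].partition('"')
        else:
            # Unquoted DN: only safe when the DN contains no spaces.
            dn, _, rest = line.partition(" ")
        username = rest.strip()
        if dn and username:
            mapping[dn] = username
    return mapping


if __name__ == "__main__":
    # Hypothetical DN and local account, for illustration only.
    entry = gridmap_line("/DC=org/DC=example/OU=People/CN=Jane Doe", "jdoe")
    print(entry)
    print(parse_gridmap(entry))
```

The DN passed to gridmap_line would be the subject string printed by the openssl command above.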

G

Hello,

Sorry for the late reply. Things are working now after I added the grid-mapfile.

I also had to add this line to .rootrc:
XSec.GSI.DelegProxy: 2

Without that I also would get a seg fault when trying to access files from the global CMS redirector.

I attached my final working configs.

Thanks for all the help!
xpd.cf.txt (692 Bytes)
proof.sh.txt (499 Bytes)