Seg fault while compiling ROOT6

Hi all, I was very eager to try out the new ROOT6, but I am getting seg faults when compiling! I have attached the log files. Here are the commands I used to compile:

git clone http://root.cern.ch/git/root.git
cd root
git tag -l
git checkout -b v6-00-00
./configure macosx64 --enable-python --enable-gsl-shared --with-gsl-incdir="/Users/jfcaron/Projects/GSL/compiled/include" --with-gsl-libdir="/Users/jfcaron/Projects/GSL/compiled/lib" --with-f77="/opt/local/bin/gfortran-mp-4.8" --enable-minuit2 > configurelog.txt
make -j 8 1> makelog_stdout.txt 2> makelog_stderr.txt

I use a customized GSL which is why I am compiling myself instead of using MacPorts. The seg fault doesn’t appear to be related to GSL, but it’s hard to tell what the error is from the output. What else can I do to diagnose this?

Jean-François
makelog_stdout.txt.zip (44.8 KB)
makelog_stderr.txt (36.7 KB)
configurelog.txt (7.5 KB)

Hi Jean-François,

the crash you report happens during the generation of dictionaries.
I understand from your post that there can be no build remnants around, since you start from a fresh build.
Are you running OS X 10.9? Which GSL version are you using (so that I can fully reproduce your setup on my machine)?
Is there a reason why you don’t use the system gfortran?

Cheers,
Danilo

Hi Danilo, although I have another version of ROOT in a different directory, this build is from a “fresh” git clone, after running “make clean”. I am using OS X 10.9.2 with the clang provided by Xcode:

Apple LLVM version 5.1 (clang-503.0.40) (based on LLVM 3.4svn)
Target: x86_64-apple-darwin13.1.0
Thread model: posix

My custom version of GSL is based on GSL 1.16 (and works when linked against my other ROOT installation, version 5.34/18).

I don’t use the system gfortran because I don’t think I have one. When Apple’s Xcode switched from gcc to llvm/clang, it lost the automatic bundling of gfortran. So I am using gfortran from MacPorts, provided by gcc 4.8. This may seem odd, but again, it worked for 5.34/18.

Jean-François

Hi Jean-Francois,

on my Mac (OS X 10.9, GSL 1.16 built by hand as you described), with the configure command

./configure macosx64 --enable-python --enable-gsl-shared --with-gsl-incdir=~/jfcaron/gsl/include/ --with-gsl-libdir=~/jfcaron/gsl/lib/ --enable-minuit2

I cannot reproduce the issue, i.e. ROOT6 compiles to the end and works.
As you can see, the only difference is the Fortran compiler, which in my case is gfortran-4.2.

I notice that you did not check out the v6-00-00 tag: you only created and switched to a local branch called v6-00-00, which still points at HEAD.
Could you try the compilation again after moving to the v6-00-00 tag with

git checkout -b v6-00-00 v6-00-00

?

Cheers,
Danilo

Oops, I didn’t notice that I had only typed “v6-00-00” once. I retried and unfortunately nothing seems to have changed. I have attached the new log files, but it looks like the only differences are things like the memory addresses in the stack trace.

If I recall correctly, gcc 4.2 was the last gcc that Apple shipped with Xcode, under OS X 10.7. Is your gfortran-4.2 from Xcode? Does it matter which compiler was used to compile GSL?

Jean-François
makelog_stdout.txt.zip (156 KB)
makelog_stderr.txt (37.5 KB)
configurelog.txt (7.57 KB)

Hi,

this is odd :(
I see from your stdout log that quite a few dictionaries are generated successfully before the crash. Would it be possible to build the dictionaries one by one, for example by deleting all the existing G__* files and then running make (without -j 8)?
That way we can see which dictionary fails first and narrow down the issue a bit.

Cheers,
Danilo

It looks like it’s failing at MathCore. The log is attached. It says “make: *** [math/mathcore/src/G__MathCore.cxx] Error 139”. As far as I know, make’s Error 139 means the command was killed by a segmentation fault (signal 11), but I don’t know what triggers it inside rootcling.
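When a recipe runs through /bin/sh, a command killed by signal N typically surfaces as exit status 128 + N, so 139 = 128 + 11, and signal 11 is SIGSEGV on both macOS and Linux. The minimal C++ sketch below decodes such a status; it assumes that POSIX shell convention, and the helper names are hypothetical, not part of make or ROOT.

```cpp
#include <csignal>
#include <string.h>  // strsignal() is POSIX, declared here
#include <string>

// Hypothetical helper: a POSIX shell reports a child killed by signal N
// as exit status 128 + N. Returns N, or 0 for an ordinary exit code.
int SignalFromShellStatus(int status) {
    if (status > 128 && status <= 255) {
        return status - 128;
    }
    return 0;
}

// Hypothetical helper: human-readable description of make's "Error N".
std::string DescribeMakeError(int status) {
    const int sig = SignalFromShellStatus(status);
    if (sig == 0) {
        return "exited with code " + std::to_string(status);
    }
    // The wording returned by strsignal() differs between platforms.
    return "killed by signal " + std::to_string(sig) + " (" + strsignal(sig) + ")";
}
```

For example, DescribeMakeError(139) starts with "killed by signal 11", which points at a crash in the command itself rather than at a compiler diagnostic.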

By the way, is it more useful to attach separate stderr/stdout logs, or just use &>?

Jean-François
redodictlog.txt (53.1 KB)

Hi,

when only one process runs at a time, &> is the best option in my opinion, since the log then shows exactly what one would see on the terminal.
As I said, unfortunately I cannot reproduce this issue on my machine, so some debugging on your side is needed. To move forward, I propose:

  1. Try to compile ROOT with the standard configuration, i.e. without the special GSL: ./configure;make
  2. Try to compile ROOT with cmake.
This will help us understand whether some idiosyncrasy is present in the general setup of the machine.

Then, with the present setup (GSL and everything else):
  3. Add debug symbols (--build=debug) and get the full stack trace.
  4. See which class the system is trying to load, i.e. the first argument of TROOT::LoadClass.
This will help us better understand what is going on.

  1. ./configure; make runs into the problem, reported in other threads, with building on a single core. ./configure; make -j2 again gives a seg fault in MathCore.
  2. I tried these CMake instructions:
mkdir workdir; cd workdir
cmake .. -DCMAKE_INSTALL_PREFIX=.
make -j 8

This time I get an “Undefined symbols for architecture x86_64” linker error. After “Completed CLING” the output is:

Scanning dependencies of target LLVMRES
Scanning dependencies of target MetaUtils
Scanning dependencies of target MetaUtilsLLVM
Scanning dependencies of target MetaLLVM
[ 9%] [ 9%] Building CXX object core/metautils/CMakeFiles/MetaUtils.dir/src/RConversionRuleParser.cxx.o
Building CXX object core/metautils/CMakeFiles/MetaUtils.dir/src/TClassEdit.cxx.o
[ 9%] [ 9%] [ 9%] [ 9%] Building CXX object core/metautils/CMakeFiles/MetaUtilsLLVM.dir/src/ClassSelectionRule.cxx.o
Building CXX object core/metautils/CMakeFiles/MetaUtilsLLVM.dir/src/BaseSelectionRule.cxx.o
Building CXX object core/metautils/CMakeFiles/MetaUtilsLLVM.dir/src/VariableSelectionRule.cxx.o
Building CXX object core/metautils/CMakeFiles/MetaUtilsLLVM.dir/src/RStl.cxx.o
[ 9%] Building CXX object core/meta/CMakeFiles/MetaLLVM.dir/src/TCling.cxx.o
[ 9%] Built target LLVMRES
[ 9%] Building CXX object core/metautils/CMakeFiles/MetaUtilsLLVM.dir/src/Scanner.cxx.o
[ 9%] Building CXX object core/meta/CMakeFiles/MetaLLVM.dir/src/TClingBaseClassInfo.cxx.o
[ 9%] Building CXX object core/meta/CMakeFiles/MetaLLVM.dir/src/TClingCallbacks.cxx.o
[ 9%] Building CXX object core/metautils/CMakeFiles/MetaUtilsLLVM.dir/src/SelectionRules.cxx.o
[ 9%] Built target MetaUtils
[ 9%] Building CXX object core/meta/CMakeFiles/MetaLLVM.dir/src/TClingCallFunc.cxx.o
[ 9%] Building CXX object core/meta/CMakeFiles/MetaLLVM.dir/src/TClingClassInfo.cxx.o
[ 9%] Building CXX object core/meta/CMakeFiles/MetaLLVM.dir/src/TClingDataMemberInfo.cxx.o
[ 9%] Building CXX object core/metautils/CMakeFiles/MetaUtilsLLVM.dir/src/XMLReader.cxx.o
[ 10%] Building CXX object core/meta/CMakeFiles/MetaLLVM.dir/src/TClingMethodArgInfo.cxx.o
[ 10%] Building CXX object core/meta/CMakeFiles/MetaLLVM.dir/src/TClingMethodInfo.cxx.o
[ 10%] Building CXX object core/meta/CMakeFiles/MetaLLVM.dir/src/TClingTypedefInfo.cxx.o
[ 10%] Building CXX object core/meta/CMakeFiles/MetaLLVM.dir/src/TClingTypeInfo.cxx.o
[ 10%] Building CXX object core/meta/CMakeFiles/MetaLLVM.dir/src/TClingValue.cxx.o
[ 10%] Built target MetaUtilsLLVM
Scanning dependencies of target rootcling_tmp
[ 11%] [ 11%] [ 11%] [ 11%] Building CXX object core/utils/CMakeFiles/rootcling_tmp.dir/src/DictSelectionReader.cxx.o
Building CXX object core/utils/CMakeFiles/rootcling_tmp.dir/src/LinkdefReader.cxx.o
Building CXX object core/utils/CMakeFiles/rootcling_tmp.dir/src/TModuleGenerator.cxx.o
Building CXX object core/utils/CMakeFiles/rootcling_tmp.dir/src/rootcling_tmp.cxx.o
[ 11%] Building CXX object core/utils/CMakeFiles/rootcling_tmp.dir/__/metautils/src/TMetaUtils.cxx.o
[ 11%] Built target MetaLLVM
Linking CXX executable ../../bin/rootcling_tmp
Undefined symbols for architecture x86_64:
  "_AddAncestorPCMROOTFile", referenced from:
      RootCling(int, char**, bool, bool) in rootcling_tmp.cxx.o
  "_AddStreamerInfoToROOTFile", referenced from:
      GenerateFullDict(std::__1::basic_ostream<char, std::__1::char_traits<char> >&, cling::Interpreter&, RScanner&, SelectionRules&, std::__1::list<ROOT::TMetaUtils::RConstructorType, std::__1::allocator<ROOT::TMetaUtils::RConstructorType> > const&, bool, bool, bool, bool) in rootcling_tmp.cxx.o
  "_AddTypedefToROOTFile", referenced from:
      GenerateFullDict(std::__1::basic_ostream<char, std::__1::char_traits<char> >&, cling::Interpreter&, RScanner&, SelectionRules&, std::__1::list<ROOT::TMetaUtils::RConstructorType, std::__1::allocator<ROOT::TMetaUtils::RConstructorType> > const&, bool, bool, bool, bool) in rootcling_tmp.cxx.o
  "_CloseStreamerInfoROOTFile", referenced from:
      GenerateFullDict(std::__1::basic_ostream<char, std::__1::char_traits<char> >&, cling::Interpreter&, RScanner&, SelectionRules&, std::__1::list<ROOT::TMetaUtils::RConstructorType, std::__1::allocator<ROOT::TMetaUtils::RConstructorType> > const&, bool, bool, bool, bool) in rootcling_tmp.cxx.o
  "_InitializeStreamerInfoROOTFile", referenced from:
      RootCling(int, char**, bool, bool) in rootcling_tmp.cxx.o
  "_TCling__GetInterpreter", referenced from:
      RootCling(int, char**, bool, bool) in rootcling_tmp.cxx.o
  "_TROOT__GetExtraInterpreterArgs", referenced from:
      RootCling(int, char**, bool, bool) in rootcling_tmp.cxx.o
ld: symbol(s) not found for architecture x86_64
clang: error: linker command failed with exit code 1 (use -v to see invocation)
make[2]: *** [bin/rootcling_tmp] Error 1
make[1]: *** [core/utils/CMakeFiles/rootcling_tmp.dir/all] Error 2
make: *** [all] Error 2

I will start working on 3 and 4.

I’ve now also tried using the following commands:

./configure macosx64 --enable-python --enable-gsl-shared --with-gsl-incdir="/Users/jfcaron/Projects/GSL/compiled/include" --with-gsl-libdir="/Users/jfcaron/Projects/GSL/compiled/lib" --with-f77="/opt/local/bin/gfortran-mp-4.8" --enable-minuit2 --build=debug
make -j 2 &> makelog.txt

Again, I couldn’t run it in single-threaded mode because of the problems others have had. The combined output is attached.

I’m not used to reading this kind of output, but it doesn’t look like it’s printing the name of the class used with LoadClass. Can you glean anything from it?

By the way, thanks for all the help so far.
Jean-François
makelog.txt (1.03 MB)

Hi Jean-François,

thanks for all the steps taken so far.
Now, the goal of the debug build is to attach a debugger (gdb?) to the dictionary generation process and print the actual value of that argument. I cannot figure out what is going on with cmake. Odd.
If it’s OK with you, I propose changing the strategy a bit. Could you download the source tarball and simply run ./configure; make -jN?

Cheers,
Danilo

I don’t have gdb, only lldb. I am not very good with it, but I was able to re-run the last command from the make error log. lldb automatically stops at the first error and I can list the variables, but this isn’t inside LoadClass and I can’t see any class names:

[code]$ lldb -- bin/rootcling -rootbuild -f io/xml/src/G__XMLIO.cxx -s lib/libXMLIO.so -rml libXMLIO.so -rmf lib/libXMLIO.rootmap -m lib/libCore_rdict.pcm -m lib/libRIO._rdict.pcm -c /Users/jfcaron/Software/custom_root/root6/root/io/xml/inc/TBufferXML.h /Users/jfcaron/Software/custom_root/root6/root/io/xml/inc/TKeyXML.h /Users/jfcaron/Software/custom_root/root6/root/io/xml/inc/TXMLEngine.h /Users/jfcaron/Software/custom_root/root6/root/io/xml/inc/TXMLFile.h /Users/jfcaron/Software/custom_root/root6/root/io/xml/inc/TXMLPlayer.h /Users/jfcaron/Software/custom_root/root6/root/io/xml/inc/TXMLSetup.h /Users/jfcaron/Software/custom_root/root6/root/io/xml/inc/LinkDef.h
Current executable set to 'bin/rootcling' (x86_64).
(lldb) run
Process 68311 launched: '/Users/jfcaron/Software/custom_root/root6/root/bin/rootcling' (x86_64)
Process 68311 stopped
* thread #1: tid = 0x41a549, 0x00000001003ec72a rootcling`TSystem::GetLibraries(this=0x0000000102e11190, regexp=0x0000000101b941f0, options=0x0000000101b941f0, isRegexp=true) + 314 at TSystem.cxx:2023, queue = 'com.apple.main-thread', stop reason = EXC_BAD_ACCESS (code=1, address=0x0)
    frame #0: 0x00000001003ec72a rootcling`TSystem::GetLibraries(this=0x0000000102e11190, regexp=0x0000000101b941f0, options=0x0000000101b941f0, isRegexp=true) + 314 at TSystem.cxx:2023
   2020       opt.ReplaceAll("L", "");
   2021
   2022    if (opt.IsNull() || opt.First('D') != kNPOS)
-> 2023       libs += gInterpreter->GetSharedLibs();
   2024
   2025    // Cint currently register all libraries that
   2026    // are loaded and have a dictionary in them, this
(lldb) frame variable
(TSystem *) this = 0x0000000102e11190
(const char *) regexp = 0x0000000101b941f0 ""
(const char *) options = 0x0000000101b941f0 ""
(Bool_t) isRegexp = true
(TString) libs = {
  fRep = {
     = {
      fLong = (fCap = 0, fSize = 0, fData = 0x0000000000000000)
      fShort = (fSize = '\0', fData = "")
      fRaw = {
        fWords = ([0] = 0, [1] = 0, [2] = 0, [3] = 0)
      }
    }
  }
}
(TString) opt = {
  fRep = {
     = {
      fLong = (fCap = 1606352896, fSize = 32767, fData = 0x0000000000000001)
      fShort = (fSize = '\0', fData = "")
      fRaw = {
        fWords = ([0] = 1606352896, [1] = 32767, [2] = 1, [3] = 0)
      }
    }
  }
}
(Bool_t) so2dylib = false
(TString) slinked = {
  fRep = {
     = {
      fLong = (fCap = 30171168, fSize = 1, fData = 0x6572685462696c12)
      fShort = (fSize = ' ', fData = "?\x01\x01")
      fRaw = {
        fWords = ([0] = 30171168, [1] = 1, [2] = 1651076114, [3] = 1701996628)
      }
    }
  }
}
(const char *) linked = 0x00007fff5fbfc9d0 "ad"
(lldb) fr v -a
(TString) libs = {
  fRep = {
     = {
      fLong = (fCap = 0, fSize = 0, fData = 0x0000000000000000)
      fShort = (fSize = '\0', fData = "")
      fRaw = {
        fWords = ([0] = 0, [1] = 0, [2] = 0, [3] = 0)
      }
    }
  }
}
(TString) opt = {
  fRep = {
     = {
      fLong = (fCap = 1606352896, fSize = 32767, fData = 0x0000000000000001)
      fShort = (fSize = '\0', fData = "")
      fRaw = {
        fWords = ([0] = 1606352896, [1] = 32767, [2] = 1, [3] = 0)
      }
    }
  }
}
(Bool_t) so2dylib = false
(TString) slinked = {
  fRep = {
     = {
      fLong = (fCap = 30171168, fSize = 1, fData = 0x6572685462696c12)
      fShort = (fSize = ' ', fData = "?\x01\x01")
      fRaw = {
        fWords = ([0] = 30171168, [1] = 1, [2] = 1651076114, [3] = 1701996628)
      }
    }
  }
}
(const char *) linked = 0x00007fff5fbfc9d0 "ad"
[/code]
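In the session above, lldb stops with EXC_BAD_ACCESS at address 0x0 on the line that calls gInterpreter->GetSharedLibs(), which is consistent with gInterpreter most likely still being null at that point (a virtual call through a null pointer first reads the vtable pointer at address 0). The sketch below reproduces that pattern with hypothetical stand-ins (Interpreter, gInterp, FakeInterpreter and the two GetLibraries functions); it is not ROOT’s code and not a proposed fix, only an illustration of how a null check avoids the dereference.

```cpp
#include <string>

// Hypothetical stand-in for an abstract interpreter interface.
struct Interpreter {
    virtual ~Interpreter() = default;
    virtual std::string GetSharedLibs() const = 0;
};

// Hypothetical global, analogous to gInterpreter: it stays null until
// some initialization step assigns it.
Interpreter* gInterp = nullptr;

// Unguarded version: calling through gInterp while it is null is undefined
// behavior and, in practice, faults at or near address 0x0.
std::string GetLibrariesUnguarded() {
    std::string libs;
    libs += gInterp->GetSharedLibs();
    return libs;
}

// Guarded version: skip the interpreter query until the interpreter exists.
std::string GetLibrariesGuarded() {
    std::string libs;
    if (gInterp != nullptr) {
        libs += gInterp->GetSharedLibs();
    }
    return libs;
}

// Hypothetical concrete interpreter, assigned once "initialization" is done.
struct FakeInterpreter : Interpreter {
    std::string GetSharedLibs() const override { return "libCore.so libRIO.so"; }
};
```

Before gInterp is assigned, only the guarded version is safe to call; it simply returns an empty string.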

Is that useful information?

Jean-François

Hi,

uhm. This could be an issue. What you could do is the following:
[ul]
[li] See the backtrace when the execution stops because of an error - bt command[/li]
[li] Go to the frame where the error occurs - fr N command (the numbers appear on the left when typing bt)[/li]
[li] Print the value of the variable - print NameOfTheVar command[/li][/ul]
Luckily, any online tutorial can guide you through this, as these are the basics of debugging.

What was the outcome of the compilation of the sources in the tarball?

Cheers,
Danilo

Hi, thanks for your patience. My debugging skills are (shamefully) mostly limited to putting std::cout << "test1" statements throughout my code, though I did learn how to use Xcode Instruments to find memory leaks.

Here is lldb’s output when I list the frames, select the one for TROOT::LoadClass, and print the parameter values:

Process 54247 launched: '/Users/jfcaron/Software/custom_root/root6/root/bin/rootcling' (x86_64)
Process 54247 stopped
* thread #1: tid = 0x44bb70, 0x00000001003ec72a rootcling`TSystem::GetLibraries(this=0x0000000102e12c00, regexp=0x0000000101b941f0, options=0x0000000101b941f0, isRegexp=true) + 314 at TSystem.cxx:2023, queue = 'com.apple.main-thread', stop reason = EXC_BAD_ACCESS (code=1, address=0x0)
    frame #0: 0x00000001003ec72a rootcling`TSystem::GetLibraries(this=0x0000000102e12c00, regexp=0x0000000101b941f0, options=0x0000000101b941f0, isRegexp=true) + 314 at TSystem.cxx:2023
   2020	      opt.ReplaceAll("L", "");
   2021	
   2022	   if (opt.IsNull() || opt.First('D') != kNPOS)
-> 2023	      libs += gInterpreter->GetSharedLibs();
   2024	
   2025	   // Cint currently register all libraries that
   2026	   // are loaded and have a dictionary in them, this
(lldb) bt
* thread #1: tid = 0x44bb70, 0x00000001003ec72a rootcling`TSystem::GetLibraries(this=0x0000000102e12c00, regexp=0x0000000101b941f0, options=0x0000000101b941f0, isRegexp=true) + 314 at TSystem.cxx:2023, queue = 'com.apple.main-thread', stop reason = EXC_BAD_ACCESS (code=1, address=0x0)
  * frame #0: 0x00000001003ec72a rootcling`TSystem::GetLibraries(this=0x0000000102e12c00, regexp=0x0000000101b941f0, options=0x0000000101b941f0, isRegexp=true) + 314 at TSystem.cxx:2023
    frame #1: 0x00000001003eac0a rootcling`TSystem::Load(this=0x0000000102e12c00, module=0x000000010380d900, entry=0x0000000000000000, system=true) + 106 at TSystem.cxx:1765
    frame #2: 0x00000001004bfe2f rootcling`TUnixSystem::Load(this=0x0000000102e12c00, module=0x000000010380d900, entry=0x0000000000000000, system=true) + 63 at TUnixSystem.cxx:2668
    frame #3: 0x00000001003bf170 rootcling`TROOT::LoadClass(this=0x0000000101e189a9, (null)=0x0000000101b57651, libname=0x0000000101b57659, check=false) + 400 at TROOT.cxx:1815
    frame #4: 0x00000001003ba0b5 rootcling`TROOT::InitThreads(this=0x0000000101e189a9) + 165 at TROOT.cxx:1627
    frame #5: 0x00000001003b93cc rootcling`TROOT(this=0x0000000101e189a9, name=0x0000000101b52d5a, title=0x0000000101b5797f, initfunc=0x0000000000000000) + 6988 at TROOT.cxx:562
    frame #6: 0x00000001003b786d rootcling`TROOT(this=0x0000000101e189a9, name=0x0000000101b52d5a, title=0x0000000101b5797f, initfunc=0x0000000000000000) + 45 at TROOT.cxx:573
    frame #7: 0x00000001003c3664 rootcling`TROOTAllocator(this=0x0000000101e189a9) + 84 at TROOT.cxx:286
    frame #8: 0x00000001003c2795 rootcling`TROOTAllocator(this=0x0000000101e189a9) + 21 at TROOT.cxx:287
    frame #9: 0x00000001003b6b59 rootcling`ROOT::GetROOT1() + 89 at TROOT.cxx:299
    frame #10: 0x00000001003b74ba rootcling`ROOT::GetROOT() + 10 at TROOT.cxx:313
    frame #11: 0x00000001003c2719 rootcling`__cxx_global_var_init1 + 9 at TROOT.cxx:321
    frame #12: 0x00000001003c276e rootcling`_GLOBAL__I_a + 14 at TROOT.cxx:158
    frame #13: 0x00007fff5fc11c2e
    frame #14: 0x00007fff5fc11dba
    frame #15: 0x00007fff5fc0ea62
    frame #16: 0x00007fff5fc0e8f6
    frame #17: 0x00007fff5fc021da
    frame #18: 0x00007fff5fc05560
    frame #19: 0x00007fff5fc0127b
    frame #20: 0x00007fff5fc0105e
(lldb) fr s 3
frame #3: 0x00000001003bf170 rootcling`TROOT::LoadClass(this=0x0000000101e189a9, (null)=0x0000000101b57651, libname=0x0000000101b57659, check=false) + 400 at TROOT.cxx:1815
   1812	      if (check)
   1813	         err = 0;
   1814	      else {
-> 1815	         err = gSystem->Load(path, 0, kTRUE);
   1816	      }
   1817	      delete [] path;
   1818	   } else {
(lldb) print libname
(const char *) $0 = 0x0000000101b57659 "Thread"
(lldb) print path
(char *) $1 = 0x000000010380d900 "/Users/jfcaron/Software/custom_root/compiled/lib/root/libThread.so"

So it looks like it’s libThread. I poked around the other frames and it’s all libThread.

I also tried compiling from the tarball (again, plain “make” without -j N for N > 1 failed), but unfortunately the result is the same: there is a segfault in TROOT::LoadClass.

Jean-François

Hi Jean-François,

thanks for the analysis; unfortunately I don’t see anything striking there :(
Another suggestion: could you try configuring a clean source tree with fink disabled (./configure --disable-fink)? I am starting to think that something in your system is unexpectedly making your compilation fail.

Cheers,
Danilo

A clean git clone of v6-00-00, configured with ./configure --disable-fink, gave the same results.

I tried installing the new root6 port from MacPorts (which is 6.00.00), and it installed seemingly without errors, but when I try to run it, I get a seg fault that looks similar to the one I got when I ran ./configure && make myself:

 *** Break *** segmentation violation
 Generating stack trace...
 0x0000000105b7e962 in TROOT::LoadClass(char const*, char const*, bool) (in libCore.6.so) + 194
 0x0000000105b7b960 in TROOT::TROOT(char const*, char const*, void (**)()) (in libCore.6.so) + 4848
 0x0000000105b79ed9 in ROOT::GetROOT1() (in libCore.6.so) + 89
 0x0000000105b8077b in _GLOBAL__I_a (in libCore.6.so) + 27
 0x00007fff6c355c2e in <unknown function>
 0x00007fff6c355dba in <unknown function>
 0x00007fff6c352a62 in <unknown function>
 0x00007fff6c3529eb in <unknown function>
 0x00007fff6c3528f6 in <unknown function>
 0x00007fff6c3461da in <unknown function>
 0x00007fff6c349560 in <unknown function>
 0x00007fff6c34527b in <unknown function>
 0x00007fff6c34505e in <unknown function>
 0x0000000000000002 in <unknown function>

I will email the maintainers listed for this MacPorts port. Perhaps they have some more information about OSX problems.

Jean-François

I have resolved the immediate problem with the help of the MacPorts root6 maintainers, although the problem was unrelated to MacPorts.

I still had my old .rootrc file from ROOT 5.34, and one of its entries was causing root6 to crash mysteriously on startup. The C++ compilation itself was fine; the crash came from rootcling, which the build runs to generate dictionaries and which starts up ROOT (reading .rootrc) just like the root executable does. I never actually tried running “root” after the seg fault.

The bad item in my .rootrc file was this one:

After commenting out this line, my custom-compiled root6, linked against my custom GSL, works as expected. I may open a bug report about the UseThreads option being broken and causing a very mysterious crash.
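For anyone who hits the same crash after upgrading, one quick check is to look for an active thread-related entry in ~/.rootrc. The exact offending line was not preserved in the post above, so the sketch below assumes only that its key contains UseThreads. It also assumes the usual .rootrc layout of one "Key: value" entry per line, with # starting a comment. FindActiveEntries is a hypothetical helper doing a plain text scan, not ROOT’s TEnv parser.

```cpp
#include <sstream>  // std::istream, plus std::istringstream for in-memory input
#include <string>
#include <vector>

// Hypothetical helper: return the active (non-comment) lines of a
// .rootrc-style stream whose key, i.e. the text before the first ':',
// contains `needle`.
std::vector<std::string> FindActiveEntries(std::istream& in, const std::string& needle) {
    std::vector<std::string> hits;
    std::string line;
    while (std::getline(in, line)) {
        // Skip leading whitespace so indented comments are recognized too.
        const auto first = line.find_first_not_of(" \t");
        if (first == std::string::npos || line[first] == '#') {
            continue;  // blank line or commented-out entry
        }
        const auto colon = line.find(':', first);
        if (colon == std::string::npos) {
            continue;  // not a "Key: value" entry
        }
        const std::string key = line.substr(first, colon - first);
        if (key.find(needle) != std::string::npos) {
            hits.push_back(line);
        }
    }
    return hits;
}
```

To check a real file, one could pass a std::ifstream opened on $HOME/.rootrc together with the needle "UseThreads" and print whatever comes back; commented-out entries, like the one that fixed the problem here, are ignored.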

Thanks for all the help in diagnosing this, even though it was a relatively silly problem.

Jean-Francois