Opened 8 years ago

Closed 8 years ago

Last modified 4 years ago

#11967 closed defect (fixed)

os x 10.7 Lion -- Sage segfaults on startup when initializing GiNaC

Reported by: was
Owned by: drkirkby
Priority: blocker
Milestone: sage-5.0
Component: porting
Keywords: python osx lion darwin
Cc: burcin, jhpalmieri
Merged in: sage-5.0.beta4
Authors: John Palmieri
Reviewers: Jeroen Demeyer
Report Upstream: N/A
Work issues:
Branch:
Commit:
Dependencies:
Stopgaps:

Description (last modified by chapoton)

(The problem has been solved -- see below -- though I haven't figured out what the right patch should be yet. One way to get Sage to start up without a segfault is to delete local/lib/python2.6/config/libpython2.6.a and replace it with local/lib/libpython2.6.dylib, then rebuild the pynac spkg and the Sage library.)

After getting Sage to build (as explained at #11881), we get

$ sage -gdb
then "bt" for a backtrace:
...

Program received signal EXC_BAD_ACCESS, Could not access memory.
Reason: KERN_INVALID_ADDRESS at address: 0x0000000000000000
PyInt_FromLong (ival=Cannot access memory at address 0x0
) at intobject.c:91
91 intobject.c: No such file or directory.
 in intobject.c
(gdb) bt
#0  PyInt_FromLong (ival=Cannot access memory at address 0x0
) at intobject.c:91
#1  0x0000000107781951 in GiNaC::numeric::numeric ()
#2  0x00000001077c865a in GiNaC::library_init::library_init ()
#3  0x0000000107662029 in global constructors keyed to _ZN5GiNaC8py_funcsE ()
#4  0x00007fff5fc0fd1a in __dyld__ZN16ImageLoaderMachO18doModInitFunctionsERKN11ImageLoader11LinkContextE ()
#5  0x00007fff5fc0fa66 in __dyld__ZN16ImageLoaderMachO16doInitializationERKN11ImageLoader11LinkContextE ()
#6  0x00007fff5fc0d258 in __dyld__ZN11ImageLoader23recursiveInitializationERKNS_11LinkContextEjRNS_21InitializerTimingListE ()
#7  0x00007fff5fc0d1f1 in __dyld__ZN11ImageLoader23recursiveInitializationERKNS_11LinkContextEjRNS_21InitializerTimingListE ()
#8  0x00007fff5fc0e02b in __dyld__ZN11ImageLoader15runInitializersERKNS_11LinkContextERNS_21InitializerTimingListE ()
#9  0x00007fff5fc03189 in __dyld__ZN4dyld15runInitializersEP11ImageLoader ()
#10 0x00007fff5fc095cb in __dyld_dlopen ()
#11 0x00007fff8fddb95b in dlopen ()
#12 0x00000001000c957a in _PyImport_GetDynLoadFunc ()
#13 0x00000001000b7095 in _PyImport_LoadDynamicModule ()
#14 0x00000001000b56d3 in import_submodule ()
#15 0x00000001000b58eb in load_next ()
...

New spkg: http://sage.math.washington.edu/home/palmieri/SPKG/Old/python-2.7.2.p2.spkg

Attachments (1)

trac_11967-python.patch (1.2 KB) - added by jhpalmieri 8 years ago.
patch for python spkg; for reference only

Change History (32)

comment:1 Changed 8 years ago by burcin

  • Cc burcin added

comment:2 Changed 8 years ago by was

I looked into this further, but haven't solved the problem. I made an *empty* Cython file a.pyx with Extension description:

    Extension('sage.symbolic.a',
              sources = ['sage/symbolic/a.pyx'],
              language = 'c++',
              depends = ginac_depends,
              libraries = ["pynac", "gmp"]),

then I did "sage -ipython" and "import sage.symbolic.a", and I get exactly the same crash as when importing sage.all. I looked at the traceback more carefully. The line

#0  PyInt_FromLong (ival=Cannot access memory at address 0x0
) at intobject.c:91

occurs in the Python library in this code:

#if NSMALLNEGINTS + NSMALLPOSINTS > 0
	if (-NSMALLNEGINTS <= ival && ival < NSMALLPOSINTS) {
		v = small_ints[ival + NSMALLNEGINTS];
		Py_INCREF(v);                                      <------------------------ ***
#ifdef COUNT_ALLOCS

Thus the array small_ints -- this is some "object pool" (maybe) -- hasn't been initialized. PyInt_FromLong was called from the Pynac library on startup, where various constants (e.g., 1, 2, 3, 4, 5, etc.) are defined. In particular, this line in pynac's numeric.cpp causes the trouble:

    t = PYOBJECT;
    if (!(v._pyobject = PyInt_FromLong(x)))
      py_error("Error creating int");

This *feels* to me a lot like what would happen if two separate copies of the Python library were linked in. One of them hasn't been initialized yet, so the object pool is nonsense. However, I don't see a static libpython sitting around anywhere, so I'm not quite sure what happened.
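To make the "two copies of the library" hypothesis concrete, here is a toy model in Python. It is only a sketch: `RuntimeCopy` and its methods are hypothetical names invented for illustration, not part of CPython or Pynac. The constants mirror the `NSMALLNEGINTS` and `NSMALLPOSINTS` values used in CPython 2.x's intobject.c. Each copy owns its own cache, so initializing one copy does nothing for the other.

```python
# Toy model only: each "copy" of the runtime owns its own small-integer cache,
# just as each linked-in copy of libpython would own its own small_ints array.
NSMALLNEGINTS = 5
NSMALLPOSINTS = 257


class RuntimeCopy:
    """Hypothetical stand-in for one copy of the Python C runtime."""

    def __init__(self, name):
        self.name = name
        # Before initialization every cache slot is empty (NULL in C).
        self.small_ints = [None] * (NSMALLNEGINTS + NSMALLPOSINTS)

    def initialize(self):
        # Fill the cache with preallocated integer objects.
        for i in range(len(self.small_ints)):
            self.small_ints[i] = i - NSMALLNEGINTS

    def int_from_long(self, ival):
        if -NSMALLNEGINTS <= ival < NSMALLPOSINTS:
            v = self.small_ints[ival + NSMALLNEGINTS]
            if v is None:
                # In C this is where Py_INCREF(v) dereferences NULL.
                raise RuntimeError(f"{self.name}: small_ints not initialized")
            return v
        return ival


# The interpreter initializes its own copy; a second, statically linked
# copy would never be initialized.
interpreter_copy = RuntimeCopy("copy used by the interpreter")
interpreter_copy.initialize()
static_copy = RuntimeCopy("copy linked statically into libpynac")
```

Calling `interpreter_copy.int_from_long(3)` succeeds, while the same call on `static_copy` raises, which is the Python-level analogue of the crash at address 0x0 in the backtrace.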

comment:3 Changed 8 years ago by jdemeyer

  • Milestone sage-4.7.3 deleted


comment:4 Changed 8 years ago by was

  • Milestone set to sage-4.8

I built sage-4.7.3.alpha1 from source on a 10.6 computer (bsd.math) and on a 10.7 computer (my laptop), then moved the copy built on 10.6 to the 10.7 computer. The version built on 10.6 starts up on 10.7 just fine, but of course the version built on 10.7 crashes as explained above. I looked at the file local/lib/libpynac-0.2.3.dylib in each install, and they are different:

10.6$ ls -l local/lib/libpynac-0.2.3.dylib 
-rwxr-xr-x  1 wstein  staff  3176552 Nov  2 18:15 local/lib/libpynac-0.2.3.dylib
10.7$  ls -l local/lib/libpynac-0.2.3.dylib 
-rwxr-xr-x  1 wstein  staff  5144444 Nov  2 18:27 local/lib/libpynac-0.2.3.dylib

Notice that the version that works is much smaller. The bad one (on 10.7) has twice as many symbols:

10.6$ nm -o local/lib/libpynac-0.2.3.dylib|wc -l
4665
10.7$ nm -o local/lib/libpynac-0.2.3.dylib|wc -l
9526

Check this out:

10.6$ nm -o local/lib/libpynac-0.2.3.dylib|grep -i Py_INCREF
but
10.7$ nm -o local/lib/libpynac-0.2.3.dylib|grep -i Py_INCREF
local/lib/libpynac-0.2.3.dylib: 00000000001a9750 T _Py_IncRef

This suggests again that the build of Pynac on OS X 10.7 is now broken and somehow accidentally links in another copy of Python.
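The grep above can be generalized a little. The following is a minimal sketch, assuming the `nm -o` line layout shown in this comment (`<file>: <address> <type> <symbol>`); the function name `defined_python_symbols` is made up for illustration, and the second line of `sample` is a hypothetical undefined-symbol entry, not output copied from either machine.

```python
def defined_python_symbols(nm_output):
    """Return Python C-API symbols that the library itself defines.

    Type "T" marks a symbol defined in the text section; "U" marks a
    symbol that is only referenced and must be resolved elsewhere.
    """
    found = []
    for line in nm_output.splitlines():
        parts = line.split()
        if len(parts) < 3:
            continue
        sym_type, symbol = parts[-2], parts[-1]
        # Mach-O prefixes C symbols with an underscore, e.g. _Py_IncRef.
        if sym_type == "T" and symbol.startswith(("_Py", "__Py")):
            found.append(symbol)
    return found


sample = (
    "local/lib/libpynac-0.2.3.dylib: 00000000001a9750 T _Py_IncRef\n"
    "local/lib/libpynac-0.2.3.dylib:                  U _PyInt_FromLong\n"
)
defined = defined_python_symbols(sample)
```

A healthy libpynac should only reference Python symbols, so a non-empty result is a hint that a copy of libpython was linked in.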

Looking at the build log, we find this during the pynac build (on 10.7):

-L/Users/wstein/sage/install/sage-4.7.3.alpha1/local/lib/python2.6/config -lpython2.6 

and in that -L'd directory we find only an "evil" static libpython2.6.a:

$ ls local/lib/python2.6/config
Makefile	Setup.config	config.c	install-sh	makesetup
Setup		Setup.local	config.c.in	libpython2.6.a	python.o

The same libpython2.6.a is in that directory in the 10.6 version. The same build line appears there too. So... I'm guessing that one of the changes in XCode 4.x is to make it so that -L works differently. In fact, what now happens (that causes this problem) does seem pretty reasonable to me.
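One way the guess about -L could play out is a change in library search order. The sketch below is a toy model, not a description of what the Xcode linker is confirmed to do in this thread: `resolve_library` and the two strategy names are hypothetical, and it assumes that local/lib (which holds libpython2.6.dylib) is also on the search path after the config directory.

```python
def resolve_library(name, search_dirs, listing, strategy):
    """Return the path that -l<name> would resolve to in this toy model.

    listing maps each directory to the set of file names it contains.
    "dylibs_first": look for lib<name>.dylib in every directory, and only
        then look for lib<name>.a.
    "paths_first": within each directory in turn, accept lib<name>.dylib
        or, failing that, lib<name>.a, before moving to the next directory.
    """
    dylib, archive = f"lib{name}.dylib", f"lib{name}.a"
    if strategy == "dylibs_first":
        passes = [[dylib], [archive]]
    elif strategy == "paths_first":
        passes = [[dylib, archive]]
    else:
        raise ValueError(f"unknown strategy: {strategy}")
    for candidates in passes:
        for directory in search_dirs:
            for filename in candidates:
                if filename in listing.get(directory, set()):
                    return f"{directory}/{filename}"
    return None


dirs = ["local/lib/python2.6/config", "local/lib"]
listing = {
    "local/lib/python2.6/config": {"libpython2.6.a"},
    "local/lib": {"libpython2.6.dylib"},
}
old_style = resolve_library("python2.6", dirs, listing, "dylibs_first")
new_style = resolve_library("python2.6", dirs, listing, "paths_first")
```

Under "dylibs_first" the shared library in local/lib wins; under "paths_first" the static archive in the config directory is picked up first, which would match the larger libpynac seen on 10.7.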

My first attempt to fix this is to rebuild the pynac spkg, but with libpython2.6.dylib simply copied into the local/lib/python2.6/config directory. That one change was *not* sufficient to fix the problem, but I bet something related will work...

comment:5 Changed 8 years ago by was

OK, I'm next rebuilding everything (pynac spkg + "sage -ba"), with that static library (mentioned in the previous comment above) *deleted* to avoid it accidentally getting linked in somewhere. This did not help. Starting Sage fails with this traceback now (different than before):

terminate called throwing an exception
Program received signal SIGABRT, Aborted.
0x00007fff941a182a in __kill ()
(gdb) bt
#0  0x00007fff941a182a in __kill ()
#1  0x00007fff8d186a9c in abort ()
#2  0x00007fff8d0fe7bc in abort_message ()
#3  0x00007fff8d0fbfcf in default_terminate ()
#4  0x00007fff8ccc21cd in _objc_terminate ()
#5  0x00007fff8d0fc001 in safe_handler_caller ()
#6  0x00007fff8d0fc05c in std::terminate ()
#7  0x00007fff8d0fd152 in __cxa_throw ()
#8  0x0000000101f1cb85 in py_error (s=0xe688 <Address 0xe688 out of bounds>) at numeric.cpp:131
#9  0x0000000101f2915e in Integer (x=@0x100308fa0) at numeric.cpp:169
#10 0x0000000101f29dd6 in Rational () at /Users/wstein/sage/install/sage-4.7.3.alpha1/spkg/build/pynac-0.2.3/src/ginac/numeric.cpp:180
#11 0x0000000101f29dd6 in GiNaC::Number_T::operator/ (this=0x6, x=@0x0) at numeric.cpp:554
#12 0x0000000101f2a549 in GiNaC::numeric::numeric (this=0xe688, numer=59016, denom=2) at numeric.cpp:1410
#13 0x0000000101f672ea in GiNaC::library_init::library_init (this=0xe688) at utils.cpp:288
#14 0x0000000101e00bf9 in __static_initialization_and_destruction_0 [inlined] () at /Users/wstein/sage/install/sage-4.7.3.alpha1/spkg/build/pynac-0.2.3/src/ginac/py_funcs.cpp:59

What's happening here is that the code below runs during pynac initialization and fails because PyImport_ImportModule(...) fails and returns a null pointer, which again suggests that the Python library is somehow not properly initialized.

PyObject* Integer(const long int& x) {
  if (initialized) 
    return GiNaC::py_funcs.py_integer_from_long(x);
  
  // Slow version since we can't call Cython-exported code yet.
  PyObject* m = PyImport_ImportModule("sage.rings.integer");
  if (!m)
    py_error("Error importing sage.rings.integer");

comment:6 Changed 8 years ago by was

  • Description modified (diff)

Hey wait, I just realized that in my test above I was trying to import that "a" module I had made first, and some other parts of the Sage library hadn't been loaded yet. Removing that, Sage starts up without segfaulting. W00t!

So I have now successfully built Sage 100% on OS X 10.7.2 with XCode 4.2, and got it to start up. I'll post the result of running the test suite soon.

comment:7 Changed 8 years ago by was

  • Description modified (diff)

comment:8 Changed 8 years ago by jhpalmieri

  • Cc jhpalmieri added

comment:9 follow-up: Changed 8 years ago by jhpalmieri

So should we delete this file in the python spkg? Here's an spkg which does that:

I'm attaching the patch to the spkg, for reference.

comment:10 in reply to: ↑ 9 Changed 8 years ago by was

Replying to jhpalmieri:

So should we delete this file in the python spkg?

I don't know. It might also make sense to somehow change Pynac, but I'm not sure. It is always bad in the context of Sage to link in the python library statically, so it's probably good to not have that file exported at all...
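For anyone who wants to check how a given interpreter was built, here is a small sketch using the standard `sysconfig` module. It is written for a current Python, not the 2.6 discussed here (where, as far as I know, `distutils.sysconfig` would be the place to look), and `libpython_link_info` is a made-up helper name. The config variables may be unset on some platforms, so the code treats missing values as "unknown".

```python
import os
import sysconfig


def libpython_link_info():
    """Summarize how the running interpreter's libpython was built."""
    get = sysconfig.get_config_var
    libpl = get("LIBPL")      # the "config" directory, where defined
    library = get("LIBRARY")  # static archive file name, where defined
    static_path = None
    if libpl and library:
        candidate = os.path.join(libpl, library)
        if os.path.exists(candidate):
            static_path = candidate
    return {
        # True when Python was configured to build a shared libpython.
        "shared_build": bool(get("Py_ENABLE_SHARED")),
        "config_dir": libpl,
        # Path of an installed static archive, or None if there is none.
        "static_archive": static_path,
    }


info = libpython_link_info()
```

A shared build that still ships a static archive in its config directory is the combination that caused trouble in this ticket.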

comment:11 follow-up: Changed 8 years ago by drkirkby

I doubt it is relevant, but note there is a problem on 64-bit Solaris with Pynac causing a segfault, which I think was related to the order in which objects are loaded not being defined. I can't recall the details, and need to leave for work shortly, but there is a Solaris ticket for it and some comments from Burcin about it.

Dave

comment:12 Changed 8 years ago by jdemeyer

  • Priority changed from major to blocker

comment:13 in reply to: ↑ 11 Changed 8 years ago by jhpalmieri

Replying to drkirkby:

I doubt it is relevant, but note there is a problem on 64-bit Solaris with Pynac causing a segfault,

I think the ticket is #11116. It doesn't look relevant to me, either, but other people should take a look.

comment:14 Changed 8 years ago by jdemeyer

  • Milestone changed from sage-4.8 to sage-5.0

comment:15 Changed 8 years ago by was

I'm trying this again with OS X 10.7 and Sage-5.0.beta1:

$ cd SAGE_ROOT
$ rm local/lib/python2.7/config/libpython2.7.a 
$ cp local/lib/libpython2.7.dylib local/lib/python2.7/config/
$ ./sage -f spkg/standard/pynac-0.2.3.p0.spkg
$ touch devel/sage/sage/symbolic/pynac.pyx
$ ./sage -br

And now it starts up!
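Before running the steps above, it can help to see which static archives are actually present. This is a minimal sketch assuming the directory layout used in this ticket (`local/lib/python<version>/config/`); the helper name `static_libpython_archives` is hypothetical and it only lists files, it does not delete anything.

```python
from pathlib import Path


def static_libpython_archives(sage_root):
    """List static libpython archives under local/lib/python*/config."""
    lib_dir = Path(sage_root) / "local" / "lib"
    # Matches e.g. local/lib/python2.7/config/libpython2.7.a
    return sorted(lib_dir.glob("python*/config/libpython*.a"))
```

For example, `static_libpython_archives(".")` run from SAGE_ROOT returns the paths that the `rm` step would target, or an empty list if there is nothing to remove.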

comment:16 Changed 8 years ago by was

NOTE: It's much safer to do ./sage -ba above.

comment:17 Changed 8 years ago by jhpalmieri

  • Authors set to John Palmieri
  • Description modified (diff)
  • Keywords python osx lion darwin added
  • Status changed from new to needs_review

New spkg: on OS X Lion, this deletes the file libpython2.7.a. Self-tests fail, but they always do with Python. More importantly, Sage starts up with no problems.

comment:18 Changed 8 years ago by jdemeyer

  • Status changed from needs_review to needs_work

The issue has nothing to do with the compiler, it is a linker issue. So, you should remove the [ -z "$CC" ] check. Anyway, doesn't Sage always set "$CC"?

comment:19 Changed 8 years ago by jdemeyer

Since I'm changing the Python spkg in #12422 anyway, I might make this change myself and rebase #12422 on this spkg.

comment:20 follow-up: Changed 8 years ago by jhpalmieri

  • Status changed from needs_work to needs_review

You're right, Sage always sets $CC to "gcc". So on tickets where it's necessary, we should check whether [ "$CC" = "gcc" ]. I put up a new spkg here since this is a small change, but go ahead and put the focus on #12422 if you want.

Changed 8 years ago by jhpalmieri

patch for python spkg; for reference only

comment:21 in reply to: ↑ 20 ; follow-ups: Changed 8 years ago by jdemeyer

Replying to jhpalmieri:

You're right, Sage always sets $CC to "gcc". So on tickets where it's necessary, we should check whether [ "$CC" = "gcc" ].

I don't think we should ever do that. Check features, not executable names.

comment:22 Changed 8 years ago by jdemeyer

In #12422, I'm changing "rm" to "rm -f". Objections?

comment:23 in reply to: ↑ 21 Changed 8 years ago by jhpalmieri

Replying to jdemeyer:

Replying to jhpalmieri:

You're right, Sage always sets $CC to "gcc". So on tickets where it's necessary, we should check whether [ "$CC" = "gcc" ].

I don't think we should ever do that. Check features, not executable names.

I don't see anything wrong with checking whether "$CC" is "gcc", as long as it's just a preliminary check, not the only check.

Replying to jdemeyer:

In #12422, I'm changing "rm" to "rm -f". Objections?

No objections, that's fine. (I think in preliminary versions of the patch, I kept misspelling the name of the file to be deleted, so I wanted spkg-install to quit if it didn't delete anything. Otherwise I would have had 'rm -f' there, too.)

comment:24 Changed 8 years ago by jdemeyer

  • Reviewers set to Jeroen Demeyer
  • Status changed from needs_review to positive_review

Works for me!

comment:25 Changed 8 years ago by jdemeyer

  • Merged in set to sage-5.0.beta4
  • Resolution set to fixed
  • Status changed from positive_review to closed

comment:26 follow-up: Changed 8 years ago by jdemeyer

We should not rely on xcodebuild to determine XCode version numbers (see sage-devel). How about applying this fix on all OS X 10.6 and above (on a new ticket)?

comment:27 in reply to: ↑ 26 Changed 8 years ago by jhpalmieri

Replying to jdemeyer:

We should not rely on xcodebuild to determine XCode version numbers (see sage-devel). How about applying this fix on all OS X 10.6 and above (on a new ticket)?

I've only seen the problem on OS X 10.7. Why apply it on 10.6? (I'm guessing that it wouldn't break anything, so it wouldn't hurt, but is there a reason for doing it on 10.6?)
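A check keyed on the OS version rather than on xcodebuild could look roughly like the sketch below. It relies on the fact that OS X 10.7 reports Darwin kernel release 11.x and 10.6 reports 10.x; the function name `workaround_applies` and its parameters are made up for illustration, and this is not what the spkg or the follow-up ticket necessarily does.

```python
import platform


def workaround_applies(system=None, release=None, min_darwin_major=11):
    """Return True on Darwin with kernel major version >= min_darwin_major.

    system and release default to platform.system() and
    platform.release(); passing them explicitly makes the check testable.
    Darwin 11 corresponds to OS X 10.7, Darwin 10 to OS X 10.6.
    """
    system = platform.system() if system is None else system
    release = platform.release() if release is None else release
    if system != "Darwin":
        return False
    try:
        major = int(release.split(".")[0])
    except ValueError:
        # Unparseable release string: do not apply the workaround.
        return False
    return major >= min_darwin_major
```

With the default threshold the workaround applies on 10.7 only; passing `min_darwin_major=10` would extend it to 10.6 as suggested in comment:26.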

comment:28 Changed 8 years ago by jhpalmieri

See #12574 for a followup.

comment:29 in reply to: ↑ 21 Changed 8 years ago by drkirkby

Replying to jdemeyer:

Replying to jhpalmieri:

You're right, Sage always sets $CC to "gcc". So on tickets where it's necessary, we should check whether [ "$CC" = "gcc" ].

I don't think we should ever do that. Check features, not executable names.

The testcc.sh script in $SAGE_LOCAL/bin does exactly that - it tests features. It can identify all the common C compilers, and some of the not-so-common C compilers. There's also another script (I think testcxx.sh), which does the same for C++ compilers. Doing this for Fortran compilers is something I have never managed to work out.

Dave
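To illustrate the "check features, not executable names" idea, here is a sketch that classifies a compiler from its predefined macros instead of from the value of $CC. It is not a description of how testcc.sh works; `parse_defines` and `classify_compiler` are hypothetical helpers. The input is assumed to be a macro dump such as GCC and Clang print for `-dM -E` on an empty input.

```python
def parse_defines(preprocessor_output):
    """Collect macro names from lines of the form '#define NAME value'."""
    names = set()
    for line in preprocessor_output.splitlines():
        parts = line.split()
        if len(parts) >= 2 and parts[0] == "#define":
            names.add(parts[1])
    return names


def classify_compiler(macros):
    """Guess the compiler family from its predefined macros."""
    # Order matters: Clang and the Intel compiler also define __GNUC__.
    for macro, family in (
        ("__clang__", "clang"),
        ("__INTEL_COMPILER", "intel"),
        ("__SUNPRO_C", "sun-studio"),
        ("__GNUC__", "gcc"),
    ):
        if macro in macros:
            return family
    return "unknown"
```

A compiler invoked as "gcc" that is really Clang would be reported as "clang" here, which is exactly the case a name comparison gets wrong.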

comment:30 Changed 8 years ago by jdemeyer

  • Description modified (diff)

comment:31 Changed 4 years ago by chapoton

  • Description modified (diff)