#8788 closed defect (fixed)
segfault in Sage-4.4 built using GCC-4.5.0
Reported by: | was | Owned by: | GeorgSWeber |
---|---|---|---|
Priority: | blocker | Milestone: | sage-4.4.1 |
Component: | build | Keywords: | |
Cc: | wjp, leif | Merged in: | sage-4.4.1.alpha3 |
Authors: | Willem Jan Palenstijn | Reviewers: | William Stein |
Report Upstream: | N/A | Work issues: | |
Branch: | | Commit: | |
Dependencies: | | Stopgaps: | |
Description
If we build Sage-4.4 (with several tickets/patches/elbow grease) with GCC-4.5.0, then many things cause it to segfault at exit. The simplest I found so far is this:
    sage: Mat(GF(9,'a'),3,sparse=True).random_element()
    sage: from sage.matrix.matrix_space import test_trivial_matrices_inverse as tinv
    sage: tinv(ZZ, sparse=False)
    sage: quit
    Exiting Sage (CPU time 0m0.11s, Wall time 0m0.15s).
    *** glibc detected *** python: double free or corruption (fasttop): 0x000000000233a930 ***
    ======= Backtrace: =========
    /lib64/libc.so.6[0x39a6c74a76]
    /lib64/libc.so.6(exit+0xe2)[0x39a6c35b82]
    python[0x4c3896]
    python[0x4c30f5]
    python(PyRun_SimpleFileExFlags+0x159)[0x4c5e69]
    python(Py_Main+0xa5e)[0x413cde]
    /lib64/libc.so.6(__libc_start_main+0xfd)[0x39a6c1eb1d]
    python[0x412f79]
    ======= Memory map: ========
    00400000-00566000 r-xp 00000000 00:13 12537003 /home/wstein/screen/lena/sage-4.4/local/bin/python
    00765000-0079e000 rw-p 00165000 00:13 12537003 /home/wstein/screen/lena/sage-4.4/local/bin/python
    0079e000-007ad000 rw-p 00000000 00:00 0
    00bf1000-04d16000 rw-p 00000000 00:00 0 [heap]
    316e600000-316e61c000 r-xp 00000000 fd:00 8683576 /lib64/libselinux.so.1
    316e61c000-316e81b000 ---p 0001c000 fd:00 8683576 /lib64/libselinux.so.1
    316e81b000-316e81c000 r--p 0001b000 fd:00 8683576 /lib64/libselinux.so.1
    316e81c000-316e81d000 rw-p 0001c000 fd:00 8683576 /lib64/libselinux.so.1
    316e81d000-316e81e000 rw-p 00000000 00:00 0
    3171200000-3171203000 r-xp 00000000 fd:00 8683697 /lib64/libcom_err.so.2.1
    3171203000-3171402000 ---p 00003000 fd:00 8683697 /lib64/libcom_err.so.2.1
    3171402000-3171403000 rw-p 00002000 fd:00 8683697 /lib64/libcom_err.so.2.1
    3171600000-31716b3000 r-xp 00000000 fd:00 8683698 /lib64/libkrb5.so.3.3
    31716b3000-31718b3000 ---p 000b3000 fd:00 8683698 /lib64/libkrb5.so.3.3
    31718b3000-31718bd000 rw-p 000b3000 fd:00 8683698 /lib64/libkrb5.so.3.3
    3171a00000-3171a2d000 r-xp 00000000 fd:00 8683700 /lib64/libgssapi_krb5.so.2.2
    3171a2d000-3171c2d000 ---p 0002d000 fd:00 8683700 /lib64/libgssapi_krb5.so.2.2
    3171c2d000-3171c2f000 rw-p 0002d000 fd:00 8683700 /lib64/libgssapi_krb5.so.2.2
    3171e00000-3171e2a000 r-xp 00000000 fd:00 8683677 /lib64/libk5crypto.so.3.1
    3171e2a000-317202a000 ---p 0002a000 fd:00 8683677 /lib64/libk5crypto.so.3.1
    317202a000-317202c000 rw-p 0002a000 fd:00 8683677 /lib64/libk5crypto.so.3.1
    3172200000-3172208000 r-xp 00000000 fd:00 8683667 /lib64/libkrb5support.so.0.1
    3172208000-3172408000 ---p 00008000 fd:00 8683667 /lib64/libkrb5support.so.0.1
    3172408000-3172409000 rw-p 00008000 fd:00 8683667 /lib64/libkrb5support.so.0.1
    3172600000-3172652000 r-xp 00000000 fd:00 52070079 /usr/lib64/libssl.so.1.0.0
    3172652000-3172851000 ---p 00052000 fd:00 52070079 /usr/lib64/libssl.so.1.0.0
    3172851000-3172859000 rw-p 00051000 fd:00 52070079 /usr/lib64/libssl.so.1.0.0
    39a6800000-39a681e000 r-xp 00000000 fd:00 8683525 /lib64/ld-2.11.1.so
    39a6a1d000-39a6a1e000 r--p 0001d000 fd:00 8683525 /lib64/ld-2.11.1.so
    39a6a1e000-39a6a1f000 rw-p 0001e000 fd:00 8683525 /lib64/ld-2.11.1.so
    ...
Change History (13)
comment:1 Changed 11 years ago by
comment:2 Changed 11 years ago by
- Cc wjp added
comment:3 Changed 11 years ago by
- Cc leif added
comment:4 Changed 11 years ago by
From what I can tell, the issue is related to linbox and givaro both using the randstate stuff in givaro's gmp++_int.inl. On my home machine the internal random states (a local static in Integer::randstate()) in both end up as different objects, but on lena they seem to use the exact same object in memory, causing it to be deleted twice on exit.
If anybody else wants to take a look, I tracked this down by putting a breakpoint on mpir's 'randclear_lc' and looking at the rdi register, which holds the pointer to the randstate.
comment:5 Changed 11 years ago by
Well, this is a fun one. Givaro and Linbox indeed end up destructing the same object.
The destructor is registered once via givaro:
    #0 0x00000039a6c35dd0 in __cxa_atexit_internal () from /lib64/libc.so.6
    #1 0x00007fffddf09ec2 in randstate (...) at sage-4.4/local//include/gmp++/gmp++_int.inl:317
    #2 seeding (...) at sage-4.4/local//include/gmp++/gmp++_int.inl:322
    #3 seeding (...) at sage-4.4/local//include/givaro/givinteger.h:132
    #4 IntFactorDom (...) at sage-4.4/local//include/givaro/givintfactor.h:43
    #5 IntNumTheoDom (...) at sage-4.4/local//include/givaro/givintnumtheo.h:23
    #6 GFqDom<int>::GFqDom (...) at sage-4.4/local//include/givaro/givgfq.inl:931
and once via linbox:
    #0 0x00000039a6c35dd0 in __cxa_atexit_internal () from /lib64/libc.so.6
    #1 0x00007fffd5dbe365 in randstate (...) at sage-4.4/local/include/gmp++/gmp++_int.inl:317
    #2 seeding (...) at sage-4.4/local/include/gmp++/gmp++_int.inl:322
    #3 setSeed (...) at ../../linbox/randiter/random-prime.h:57
    #4 LinBox::RandomPrimeIterator::RandomPrimeIterator (this=0x7fffffffc600, bits=<value optimized out>, seed=<value optimized out>) at ../../linbox/randiter/random-prime.h:26
This might be a compiler and/or linker bug...
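To make the failure mode concrete, here is a toy model of the situation in the two backtraces: the same object gets a cleanup registered twice, so it is cleaned up twice when the handlers run. Everything in it (`ExitHandlers`, `count_cleanup`, `simulate_shared_static`) is a hypothetical stand-in for illustration; it is not what glibc, givaro, or linbox actually do internally, and it counts cleanups rather than freeing anything so that it is safe to run.

```cpp
#include <cassert>
#include <map>
#include <utility>
#include <vector>

// Toy model of an exit-handler list: each entry pairs a cleanup function
// with the object it is supposed to clean up.
struct ExitHandlers {
    using Cleanup = void (*)(void*);
    std::vector<std::pair<Cleanup, void*>> handlers;

    void register_cleanup(Cleanup fn, void* obj) {
        assert(fn != nullptr);
        handlers.emplace_back(fn, obj);
    }

    // Run the handlers in reverse registration order, as exit() does.
    void run_all() {
        for (auto it = handlers.rbegin(); it != handlers.rend(); ++it) {
            it->first(it->second);
        }
        handlers.clear();
    }
};

// Records how often each address was "destroyed" instead of freeing it,
// so the double cleanup can be observed without undefined behaviour.
inline std::map<void*, int>& cleanup_counts() {
    static std::map<void*, int> counts;
    return counts;
}

inline void count_cleanup(void* obj) { ++cleanup_counts()[obj]; }

// Returns how many times the shared object would be cleaned up at exit
// when two libraries each register a cleanup for the same address.
inline int simulate_shared_static() {
    static int shared_state = 0;  // stands in for the shared randstate
    cleanup_counts().clear();
    ExitHandlers exit_list;
    exit_list.register_cleanup(count_cleanup, &shared_state);  // "givaro"
    exit_list.register_cleanup(count_cleanup, &shared_state);  // "linbox"
    exit_list.run_all();
    return cleanup_counts()[&shared_state];
}
```

In this model `simulate_shared_static()` reports two cleanups for one object. With a real destructor that frees memory, the second run is the double free glibc complains about.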
I'm not altogether sure how best to work around it. One possible way would be to simply never clear the randstate in givaro's Integer::randstate(). If I understand things correctly, there won't be more than one copy around for each library using givaro, so it won't actually leak memory except on program exit.
I need to stop looking at this for today, but if anyone wants to test, that would require replacing the following in gmp++_int.inl:
    inline gmp_randclass& Integer::randstate(long unsigned int seed)
    {
        static gmp_randclass randstate(GMP_RAND_ALG_DEFAULT,seed);
        return static_cast<gmp_randclass&>(randstate);
    }
by
    inline gmp_randclass& Integer::randstate(long unsigned int seed)
    {
        static gmp_randclass* randstate = new gmp_randclass(GMP_RAND_ALG_DEFAULT,seed);
        return *randstate;
    }
An initial quick test shows that this might fix the issue, but I only rebuilt linbox after this change; nothing else, not even givaro itself. And I only tried the example given in the initial report in the ticket, no doctests.
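For anyone who wants to see the effect of the proposed change in isolation, here is a minimal sketch of the same pattern without givaro or GMP. `FakeRandState` and `leaky_randstate` are hypothetical names made up for this illustration; `FakeRandState` only stands in for gmp_randclass and counts constructions and destructions.

```cpp
#include <cassert>

// Hypothetical stand-in for gmp_randclass: it only records its seed and
// counts how many instances have been constructed and destroyed.
struct FakeRandState {
    static int constructed;
    static int destroyed;
    unsigned long seed;
    explicit FakeRandState(unsigned long s) : seed(s) { ++constructed; }
    ~FakeRandState() { ++destroyed; }
};
int FakeRandState::constructed = 0;
int FakeRandState::destroyed = 0;

// Same shape as the proposed Integer::randstate(): the object lives on the
// heap and only the pointer is a function-local static. A raw pointer has
// no destructor, so nothing is registered to run at exit for it.
inline FakeRandState& leaky_randstate(unsigned long seed = 0) {
    static FakeRandState* state = new FakeRandState(seed);
    return *state;
}
```

Calling `leaky_randstate(42)` and then `leaky_randstate(7)` should return the same object both times, still carrying seed 42, with one construction and no destruction recorded. The seed of later calls is ignored, which is also how the original static-object version behaves.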
comment:6 Changed 11 years ago by
All doctests pass after the change I mentioned. I'll turn this into a new givaro spkg ready for more testing later today, unless somebody beats me to it.
comment:7 Changed 11 years ago by
- Status changed from new to needs_review
A new givaro spkg to work around this problem:
http://www.math.leidenuniv.nl/~wpalenst/sage/givaro-3.2.13rc2.p1.spkg
It basically fixes the problem by not destructing the randstate objects on exit. This shouldn't be a problem because the destructor only frees memory.
comment:8 Changed 11 years ago by
Does this mean it will create a memory leak?
comment:9 Changed 11 years ago by
No. The objects persist until sage exits, regardless of whether their destructors are called. The only thing that changes is that the objects aren't actually freed when sage exits, which is pretty much irrelevant.
(An exception would be if something were to dlopen/dlclose libgivaro or liblinboxsage repeatedly, but I don't think that's the case.)
comment:10 Changed 11 years ago by
- Status changed from needs_review to positive_review
comment:11 Changed 11 years ago by
- Merged in set to 4.4.1.alpha3
- Resolution set to fixed
- Status changed from positive_review to closed
comment:12 Changed 11 years ago by
- Reviewers set to William Stein
comment:13 Changed 11 years ago by
- Merged in changed from 4.4.1.alpha3 to sage-4.4.1.alpha3
Here is what fails WITH THE DOUBLE FREE ERROR when doctesting sage-4.4.1.alpha0:
Everything is likely to involve something in linear algebra... that's a common theme! Linbox?