Opened 11 years ago

Closed 11 years ago

Last modified 11 years ago

#8788 closed defect (fixed)

segfault in Sage-4.4 built using GCC-4.5.0

Reported by: was
Owned by: GeorgSWeber
Priority: blocker
Milestone: sage-4.4.1
Component: build
Cc: wjp, leif
Merged in: sage-4.4.1.alpha3
Authors: Willem Jan Palenstijn
Reviewers: William Stein
Report Upstream: N/A



If we build Sage-4.4 (with several tickets/patches/elbow grease) with GCC-4.5.0, then many operations cause it to segfault at exit. The simplest reproduction I have found so far is this:

sage: Mat(GF(9,'a'),3,sparse=True).random_element()
sage: from sage.matrix.matrix_space import test_trivial_matrices_inverse as tinv
sage: tinv(ZZ, sparse=False)
sage: quit
Exiting Sage (CPU time 0m0.11s, Wall time 0m0.15s).
*** glibc detected *** python: double free or corruption (fasttop): 0x000000000233a930 ***
======= Backtrace: =========
======= Memory map: ========
00400000-00566000 r-xp 00000000 00:13 12537003                           /home/wstein/screen/lena/sage-4.4/local/bin/python
00765000-0079e000 rw-p 00165000 00:13 12537003                           /home/wstein/screen/lena/sage-4.4/local/bin/python
0079e000-007ad000 rw-p 00000000 00:00 0
00bf1000-04d16000 rw-p 00000000 00:00 0                                  [heap]
316e600000-316e61c000 r-xp 00000000 fd:00 8683576                        /lib64/
316e61c000-316e81b000 ---p 0001c000 fd:00 8683576                        /lib64/
316e81b000-316e81c000 r--p 0001b000 fd:00 8683576                        /lib64/
316e81c000-316e81d000 rw-p 0001c000 fd:00 8683576                        /lib64/
316e81d000-316e81e000 rw-p 00000000 00:00 0
3171200000-3171203000 r-xp 00000000 fd:00 8683697                        /lib64/
3171203000-3171402000 ---p 00003000 fd:00 8683697                        /lib64/
3171402000-3171403000 rw-p 00002000 fd:00 8683697                        /lib64/
3171600000-31716b3000 r-xp 00000000 fd:00 8683698                        /lib64/
31716b3000-31718b3000 ---p 000b3000 fd:00 8683698                        /lib64/
31718b3000-31718bd000 rw-p 000b3000 fd:00 8683698                        /lib64/
3171a00000-3171a2d000 r-xp 00000000 fd:00 8683700                        /lib64/
3171a2d000-3171c2d000 ---p 0002d000 fd:00 8683700                        /lib64/
3171c2d000-3171c2f000 rw-p 0002d000 fd:00 8683700                        /lib64/
3171e00000-3171e2a000 r-xp 00000000 fd:00 8683677                        /lib64/
3171e2a000-317202a000 ---p 0002a000 fd:00 8683677                        /lib64/
317202a000-317202c000 rw-p 0002a000 fd:00 8683677                        /lib64/
3172200000-3172208000 r-xp 00000000 fd:00 8683667                        /lib64/
3172208000-3172408000 ---p 00008000 fd:00 8683667                        /lib64/
3172408000-3172409000 rw-p 00008000 fd:00 8683667                        /lib64/
3172600000-3172652000 r-xp 00000000 fd:00 52070079                       /usr/lib64/
3172652000-3172851000 ---p 00052000 fd:00 52070079                       /usr/lib64/
3172851000-3172859000 rw-p 00051000 fd:00 52070079                       /usr/lib64/
39a6800000-39a681e000 r-xp 00000000 fd:00 8683525                        /lib64/
39a6a1d000-39a6a1e000 r--p 0001d000 fd:00 8683525                        /lib64/
39a6a1e000-39a6a1f000 rw-p 0001e000 fd:00 8683525                        /lib64/

Change History (13)

comment:1 Changed 11 years ago by was

Here is what fails WITH THE DOUBLE FREE ERROR when doctesting sage-4.4.1.alpha0:

        sage -t  "devel/sage/sage/modular/modsym/" # Killed/crashed
        sage -t  "devel/sage/sage/modular/modsym/" # Killed/crashed
        sage -t  "devel/sage/sage/modular/modform/" # Killed/crashed
        sage -t  "devel/sage/sage/modular/ssmod/" # Killed/crashed
        sage -t  "devel/sage/sage/modules/" # Killed/crashed
        sage -t  "devel/sage/sage/matrix/matrix_sparse.pyx" # Killed/crashed
        sage -t  "devel/sage/sage/matrix/" # Killed/crashed
        sage -t  "devel/sage/sage/matrix/matrix2.pyx" # Killed/crashed
        sage -t  "devel/sage/sage/rings/number_field/" # Killed/crashed
        sage -t  "devel/sage/sage/rings/number_field/" # Killed/crashed
        sage -t  "devel/sage/sage/rings/finite_rings/element_ntl_gf2e.pyx" # Killed/crashed
        sage -t  "devel/sage/sage/rings/finite_rings/" # Killed/crashed
        sage -t  "devel/sage/sage/rings/finite_rings/" # Killed/crashed
        sage -t  "devel/sage/sage/rings/polynomial/" # Killed/crashed
        sage -t  "devel/sage/sage/groups/" # Killed/crashed
        sage -t  "devel/sage/sage/tests/" # Killed/crashed
        sage -t  "devel/sage/sage/schemes/hyperelliptic_curves/" # Killed/crashed
        sage -t  "devel/sage/sage/schemes/hyperelliptic_curves/" # Killed/crashed
        sage -t  "devel/sage/sage/schemes/elliptic_curves/" # Killed/crashed
        sage -t  "devel/sage/sage/coding/" # Killed/crashed
        sage -t  "devel/sage/sage/coding/" # Killed/crashed

Everything here likely involves linear algebra in some way; that's the common theme! Linbox?

comment:2 Changed 11 years ago by wjp

  • Cc wjp added

comment:3 Changed 11 years ago by leif

  • Cc leif added

comment:4 Changed 11 years ago by wjp

From what I can tell, the issue is related to linbox and givaro both using the randstate code in givaro's gmp++_int.inl. On my home machine, the internal random states (a local static in Integer::randstate()) used by the two libraries end up as different objects, but on lena they seem to be the exact same object in memory, so that object gets deleted twice on exit.

If anybody else wants to take a look: I tracked this down by putting a breakpoint on mpir's 'randclear_lc' and looking at the rdi register, which holds the pointer to the randstate (on x86-64, rdi carries the first function argument).
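The breakpoint approach above boils down to recording which state pointer each clear call receives and noticing when an address repeats. A minimal in-process sketch of that bookkeeping, assuming a hypothetical `ClearTracker` that you would call from your own wrapper around a cleanup routine (it is not part of MPIR, givaro or linbox):

```cpp
#include <cassert>
#include <cstdio>
#include <unordered_set>

// Hypothetical helper: remembers every state pointer passed to a "clear"
// routine and reports when the same address is cleared a second time,
// the same pattern the rdi value reveals on each randclear_lc hit in gdb.
class ClearTracker {
public:
    // Returns true on the first clear of `state`, false on a repeat.
    bool note_clear(const void* state) {
        bool first = cleared_.insert(state).second;
        if (!first) {
            std::fprintf(stderr, "double clear of state at %p\n",
                         const_cast<void*>(state));
        }
        return first;
    }

private:
    std::unordered_set<const void*> cleared_;
};
```

Each cleanup path would call `tracker.note_clear(&state)`; a `false` return marks the address that is about to be freed twice.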

comment:5 Changed 11 years ago by wjp

Well, this is a fun one. Givaro and Linbox indeed end up destructing the same object.

The destructor is registered once via givaro:

#0  0x00000039a6c35dd0 in __cxa_atexit_internal () from /lib64/
#1  0x00007fffddf09ec2 in randstate (...)
    at sage-4.4/local//include/gmp++/gmp++_int.inl:317
#2  seeding (...)
    at sage-4.4/local//include/gmp++/gmp++_int.inl:322
#3  seeding (...)
    at sage-4.4/local//include/givaro/givinteger.h:132
#4  IntFactorDom (...)
    at sage-4.4/local//include/givaro/givintfactor.h:43
#5  IntNumTheoDom (...)
    at sage-4.4/local//include/givaro/givintnumtheo.h:23
#6  GFqDom<int>::GFqDom (...)
    at sage-4.4/local//include/givaro/givgfq.inl:931

and once via linbox:

#0  0x00000039a6c35dd0 in __cxa_atexit_internal () from /lib64/
#1  0x00007fffd5dbe365 in randstate (...)
    at sage-4.4/local/include/gmp++/gmp++_int.inl:317
#2  seeding (...)
    at sage-4.4/local/include/gmp++/gmp++_int.inl:322
#3  setSeed (...) at ../../linbox/randiter/random-prime.h:57
#4  LinBox::RandomPrimeIterator::RandomPrimeIterator (this=0x7fffffffc600, 
    bits=<value optimized out>, seed=<value optimized out>)
    at ../../linbox/randiter/random-prime.h:26

This might be a compiler and/or linker bug...
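Both backtraces end in `__cxa_atexit`, which is where a function-local static with a non-trivial destructor gets its cleanup registered on first use. The sketch below spells out roughly what the compiler generates for such a static, using `std::atexit` and a plain guard flag instead of the real `__cxa_guard_acquire`/`__cxa_atexit` machinery. It is a simplification, not GCC's actual output, and `State`, `get_state` and the globals are hypothetical. If two libraries each ran this registration step against the same storage, the destructor would run twice, which would be consistent with the double free seen here.

```cpp
#include <cassert>
#include <cstdlib>
#include <new>

// Hypothetical stand-in for gmp_randclass: owns a heap buffer that its
// destructor frees, so destroying it twice would be a double free.
struct State {
    int* buffer;
    State() : buffer(new int[16]()) {}
    ~State() { delete[] buffer; }
    State(const State&) = delete;
    State& operator=(const State&) = delete;
};

namespace {
bool g_initialized = false;   // plays the role of the compiler's guard variable
int g_registrations = 0;      // number of exit-time destructor registrations
alignas(State) unsigned char g_storage[sizeof(State)];
State* g_state = nullptr;

void destroy_state() { g_state->~State(); }
}  // namespace

// Roughly what `static State s; return s;` expands to (ignoring thread safety).
State& get_state() {
    if (!g_initialized) {
        g_state = new (g_storage) State();  // construct on first call
        std::atexit(destroy_state);         // GCC emits __cxa_atexit here
        ++g_registrations;
        g_initialized = true;
    }
    return *g_state;
}
```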

I'm not altogether sure how best to work around it. One possible way would be to simply avoid clearing the randstate in givaro's Integer::randstate(). If I understand things correctly, there won't be more than one copy around for each library using givaro, so it won't actually leak memory; the only difference is that the object isn't freed at program exit.

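I need to stop looking at this for today, but if anyone wants to test, that would require replacing the following in gmp++_int.inl:

inline gmp_randclass& Integer::randstate(long unsigned int seed) {
	// Function-local static object: its destructor is registered with
	// __cxa_atexit on the first call and runs at program exit.
	static gmp_randclass randstate(GMP_RAND_ALG_DEFAULT,seed);
	return static_cast<gmp_randclass&>(randstate);
}

with:

inline gmp_randclass& Integer::randstate(long unsigned int seed) {
        // Only the pointer is static; the object is allocated once and never
        // deleted, so no destructor is registered and nothing is freed at exit.
        static gmp_randclass* randstate = new gmp_randclass(GMP_RAND_ALG_DEFAULT,seed);
        return *randstate;
}
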
An initial quick test shows that this might fix the issue, but I only rebuilt linbox after this change; nothing else, not even givaro itself. I also only tried the example from the ticket's initial report, not the doctests.
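The replacement above is an instance of the "intentionally leaked singleton" idiom: only a pointer has static storage, and a pointer has no destructor, so nothing is registered for exit and nothing is freed then. A minimal standalone sketch contrasting the two forms, with a hypothetical `Counted` type standing in for `gmp_randclass`:

```cpp
#include <cassert>

// Hypothetical stand-in for gmp_randclass that counts destructor calls.
struct Counted {
    static int destroyed;
    ~Counted() { ++destroyed; }
};
int Counted::destroyed = 0;

// Original form: the static object's destructor is registered to run at exit.
inline Counted& destroyed_at_exit() {
    static Counted instance;
    return instance;
}

// Patched form: heap-allocated once and never deleted, so no exit-time
// destructor exists for it; the OS reclaims the memory when the process ends.
inline Counted& never_destroyed() {
    static Counted* instance = new Counted();
    return *instance;
}
```

Both functions hand out one stable object per call site; at process exit only the `destroyed_at_exit()` instance has its destructor run.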

comment:6 Changed 11 years ago by wjp

All doctests pass after the change I mentioned. I'll turn this into a new givaro spkg ready for more testing later today, unless somebody beats me to it.

comment:7 Changed 11 years ago by wjp

  • Status changed from new to needs_review

A new givaro spkg to work around this problem:

It basically fixes the problem by not destroying the randstate objects on exit. This shouldn't be a problem because the destructor only frees memory, which the operating system reclaims when the process exits anyway.

comment:8 Changed 11 years ago by drkirkby

Does this mean it will create a memory leak?

comment:9 Changed 11 years ago by wjp

No. The objects persist until Sage exits, regardless of whether their destructors are called. The only thing that changes is that the objects aren't actually freed when Sage exits, which is pretty much irrelevant.

(An exception would be if something were to dlopen/dlclose libgivaro or liblinboxsage repeatedly, but I don't think that's the case.)

comment:10 Changed 11 years ago by was

  • Status changed from needs_review to positive_review

comment:11 Changed 11 years ago by was

  • Merged in set to 4.4.1.alpha3
  • Resolution set to fixed
  • Status changed from positive_review to closed

comment:12 Changed 11 years ago by mvngu

  • Authors set to Willem Jan Palenstijn
  • Reviewers set to William Stein

comment:13 Changed 11 years ago by mvngu

  • Merged in changed from 4.4.1.alpha3 to sage-4.4.1.alpha3