Opened 4 years ago

Closed 4 years ago

#22536 closed defect (duplicate)

SIGFPE during libgap_initialize()

Reported by: jdemeyer
Owned by:
Priority: blocker
Milestone: sage-duplicate/invalid/wontfix
Component: packages: standard
Keywords:
Cc: dimpase, vbraun
Merged in:
Authors:
Reviewers: Jeroen Demeyer
Report Upstream: N/A
Work issues:
Branch:
Commit:
Dependencies:
Stopgaps:

Description

I have seen this appear on two occasions, seemingly at random. Once it appears, the only fix is to remove the generated workspaces:

Cython backtrace
----------------
#0  0x00007fb7fd50e410 in waitpid()
#1  0x00007fb7f87b7370 in print_enhanced_backtrace()at /tmp/pip-49MJ1g-build/build/src/cysignals/implementation.c:394
#2  0x00007fb7f87b78c0 in sigdie()at /tmp/pip-49MJ1g-build/build/src/cysignals/implementation.c:413
#3  0x00007fb7f87bafd0 in cysigs_signal_handler()at /tmp/pip-49MJ1g-build/build/src/cysignals/implementation.c:206
#4  0x00007fb7fd50e860 in __restore_rt()
#5  0x00007fb75f89cf70 in libGAP_RNamName()
#6  0x00007fb75f9b5118 in libGAP_CallErrorInner()
#7  0x00007fb75f9b5337 in libGAP_ErrorQuit()
#8  0x00007fb75f89be3b in libGAP_LoadObjError()
#9  0x00007fb75f8c42e4 in libGAP_LoadBagData()
#10 0x00007fb75f8c4d57 in libGAP_LoadWorkspace()
#11 0x00007fb75f9ba5b1 in libGAP_InitializeGap()
#12 0x00007fb75f9613f9 in libgap_initialize()
#13 0x00007fb760247110 in __pyx_f_4sage_4libs_3gap_4util_initialize()at /usr/local/src/sage-config/src/build/cythonized/sage/libs/gap/util.c:4568
  4563     *     libgap_start_interaction('')
  4564     *     libgap_initialize(argc, argv)             # <<<<<<<<<<<<<<
  4565     *     gap_error_msg = str(libgap_get_output())
  4566     *     libgap_finish_interaction()
  4567     */
> 4568      libgap_initialize(__pyx_v_argc, __pyx_v_argv);
  4569    
  4570      /* "sage/libs/gap/util.pyx":219
  4571     *     libgap_start_interaction('')
  4572     *     libgap_initialize(argc, argv)
#14 0x00007fb74ac7e681 in __pyx_pf_4sage_4libs_3gap_6libgap_3Gap_24__init__()at /usr/local/src/sage-config/src/build/cythonized/sage/libs/gap/libgap.c:5351
  5346     *         """
  5347     *         initialize()             # <<<<<<<<<<<<<<
  5348     *         libgap_set_gasman_callback(gasman_callback)
  5349     *         from sage.rings.integer_ring import ZZ
  5350     */
> 5351      __pyx_t_1 = __pyx_f_4sage_4libs_3gap_4util_initialize(); if (unlikely(!__pyx_t_1)) __PYX_ERR(0, 626, __pyx_L1_error)
  5352      __Pyx_GOTREF(__pyx_t_1);
  5353      __Pyx_DECREF(__pyx_t_1); __pyx_t_1 = 0;
  5354    
  5355      /* "sage/libs/gap/libgap.pyx":627
------------------------------------------------------------------------
Unhandled SIGFPE: An unhandled floating point exception occurred.
This probably occurred because a *compiled* module has a bug
in it and is not properly wrapped with sig_on(), sig_off().
Python will now terminate.
------------------------------------------------------------------------

I'm setting this to blocker because this never appeared with Sage 7.5.1 or before. So my guess is that #20914 might be responsible.

Change History (14)

comment:1 follow-up: Changed 4 years ago by dimpase

Did you see this on the same system?

comment:2 in reply to: ↑ 1 Changed 4 years ago by jdemeyer

Replying to dimpase:

Did you see this on the same system?

Yes. Twice on my laptop where I do all my Sage development. For example, this time I was doing some Cython development, then I went back to the develop branch and got this error.

comment:3 Changed 4 years ago by jdemeyer

Now I'm getting a SIGSEGV here:

Stack backtrace
---------------
No symbol table info available.
#1  0x00007fa5b169a3e3 in print_enhanced_backtrace () at build/src/cysignals/implementation.c:394
        parent_pid = 30738
        pid = <optimized out>
#2  0x00007fa5b169a8da in sigdie (sig=sig@entry=11, s=s@entry=0x7fa5b16a4570 "Unhandled SIGSEGV: A segmentation fault occurred.")
    at build/src/cysignals/implementation.c:413
No locals.
#3  0x00007fa5b169e167 in cysigs_signal_handler (sig=11) at build/src/cysignals/implementation.c:212
        inside = <optimized out>
#4  <signal handler called>
No symbol table info available.
#5  0x00007fa51768e47b in libGAP_NextBagRestoring () from /usr/local/src/sage-config/local/lib/libgap.so.4
No symbol table info available.
#6  0x00007fa517535318 in libGAP_LoadBagData () from /usr/local/src/sage-config/local/lib/libgap.so.4
No symbol table info available.
#7  0x00007fa517536435 in libGAP_LoadWorkspace () from /usr/local/src/sage-config/local/lib/libgap.so.4
No symbol table info available.
#8  0x00007fa51762b8ba in libGAP_InitializeGap () from /usr/local/src/sage-config/local/lib/libgap.so.4
No symbol table info available.
#9  0x00007fa5175d2474 in libgap_initialize () from /usr/local/src/sage-config/local/lib/libgap.so.4
No symbol table info available.
#10 0x00007fa517eb8717 in __pyx_f_4sage_4libs_3gap_4util_initialize () at build/cythonized/sage/libs/gap/util.c:4589
        __pyx_v_argv = {0x7fa517ec19c7 "sage", 0x7fa517ec1842 "-l", 0x7fa5b1ceb0a4 "/usr/local/src/sage-config/local/gap/latest", 0x7fa517ec1845 "-o", 
          0x7fa5062f87a4 "251m", 0x7fa517ec1848 "-s", 0x7fa5062f87a4 "251m", 0x7fa517ec184b "-m", 0x7fa517ec184e "64m", 0x7fa517ec1852 "-q", 0x7fa517ec1855 "-T", 
          0x7fa517ec1858 "-L", 0x7fa50610cc1c "/home/jdemeyer/.sage/gap/libgap-workspace-4034600654683281042", 0x0}
        __pyx_v_s = 0x7fa5b1ceb080
        __pyx_v__get_gap_memory_pool_size_MB = <optimized out>
        __pyx_v_memory_pool = 0x7fa5062f8780
        __pyx_v_argc = 13
        __pyx_v_workspace = 0x7fa50610cbf8
        __pyx_v_workspace_is_up_to_date = 0x7fa5b69c2d80 <_Py_TrueStruct>
        __pyx_v_gap_error_msg = 0x0
        __pyx_r = 0x0
        __pyx_t_1 = <optimized out>
        __pyx_t_2 = 0x0
        __pyx_t_3 = 0x0
        __pyx_t_4 = 0x0
        __pyx_t_6 = 0x0
        __pyx_t_7 = <optimized out>
        __pyx_t_9 = <optimized out>

comment:4 follow-ups: Changed 4 years ago by dimpase

It's baffling to me that you get SIGFPEs, since GAP doesn't really do floating-point things. Do you have a weird list of GAP packages installed?

comment:5 in reply to: ↑ 4 Changed 4 years ago by jdemeyer

Replying to dimpase:

It's baffling to me that you get SIGFPEs, since GAP doesn't really do floating-point things.

Integer division by 0 also gives a SIGFPE; run cython('''print(<int>1 // <int>0)''') for example.

Do you have a weird list of GAP packages installed?

No, this is on a mostly clean installation of Sage where I regularly run make distclean.

comment:6 Changed 4 years ago by jdemeyer

The last time (from comment 3) was again after doing some Cython development and downgrading from 7.6.beta6 (#22554 to be precise) to 7.6.beta4 (#22471 to be precise).

comment:7 follow-up: Changed 4 years ago by dimpase

Could it be #22437? Did you see this before the latter was merged? (In case you have a stale bunch of gap/ headers somewhere...)

comment:8 in reply to: ↑ 4 Changed 4 years ago by jdemeyer

Replying to dimpase:

It's baffling to me that you get SIGFPEs, since GAP doesn't really do floating-point things. Do you have a weird list of GAP packages installed?

Just a guess here...

The code for libGAP_RNamName (see ticket description) has this line

pos = (pos % libGAP_SizeRNam) + 1;

This would give a SIGFPE in case libGAP_SizeRNam == 0.
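
To make the guess concrete, here is a minimal Python sketch of a wrap-around probe of the same form as the line above. The name next_probe and the table sizes are made up for illustration; this is not the libGAP code. Python raises ZeroDivisionError in the place where C integer modulo by zero is undefined behaviour and typically traps with SIGFPE on x86.

```python
def next_probe(pos: int, size: int) -> int:
    """Advance a 1-based probe position, wrapping around a table of `size` slots.

    Hypothetical helper mirroring `pos = (pos % size) + 1`.
    """
    return (pos % size) + 1


print(next_probe(3, 8))  # 4
print(next_probe(8, 8))  # 1 (wraps around to the first slot)

try:
    # A table size of 0, e.g. read from a damaged file, breaks the modulo.
    next_probe(1, 0)
except ZeroDivisionError as exc:
    print("size 0:", exc)
```

So a zero table size read back from a damaged workspace would be enough to explain the signal, without any floating point being involved.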

comment:9 in reply to: ↑ 7 Changed 4 years ago by jdemeyer

Replying to dimpase:

Could it be #22437? Did you see this before the latter was merged? (In case you have a stale bunch of gap/ headers somewhere...)

In both cases, the problem goes away after removing the generated GAP workspaces. So it cannot be a compilation issue.
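
For reference, the workaround can be scripted roughly as follows. This is only a sketch: the function name remove_libgap_workspaces is made up, and the file name pattern libgap-workspace-* and the directory ~/.sage/gap are assumptions taken from the path in the backtrace of comment 3. It defaults to a dry run so nothing is deleted by accident.

```python
from pathlib import Path


def remove_libgap_workspaces(directory, dry_run=True):
    """Return the libgap-workspace-* files in `directory`; delete them unless dry_run.

    Hypothetical helper; the file name pattern is an assumption based on the
    workspace path shown in the backtrace.
    """
    directory = Path(directory)
    if not directory.is_dir():
        return []
    found = sorted(p for p in directory.glob("libgap-workspace-*") if p.is_file())
    if not dry_run:
        for path in found:
            path.unlink()
    return found


# Example (assumed default location):
#   remove_libgap_workspaces(Path.home() / ".sage" / "gap", dry_run=False)
```

As described in this ticket, Sage regenerates the workspace on the next start once the file is gone.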

comment:10 Changed 4 years ago by jdemeyer

I'm wondering if this is related to #20421 somehow. Maybe a race condition in writing the workspace, leading to corruption?
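
If it is such a race, the usual way to avoid readers ever seeing a half-written file is to write to a temporary file in the same directory and rename it into place. A minimal sketch of that general pattern follows; write_atomically is a made-up name, and this is not a description of what Sage or any other ticket actually does.

```python
import os
import tempfile


def write_atomically(path, data: bytes) -> None:
    """Write `data` to `path` so that readers see either the old or the new file.

    Hypothetical helper illustrating the write-then-rename pattern.
    """
    directory = os.path.dirname(os.path.abspath(path))
    # The temporary file must live on the same filesystem as the target,
    # otherwise the final rename is not atomic.
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".tmp-workspace-")
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(data)
            tmp.flush()
            os.fsync(tmp.fileno())  # make sure the bytes are on disk first
        os.replace(tmp_path, path)  # atomic rename over the target
    except BaseException:
        # Clean up the temporary file if anything went wrong before the rename.
        try:
            os.unlink(tmp_path)
        except FileNotFoundError:
            pass
        raise
```

With this pattern, two processes writing at the same time each produce a complete file, and whichever rename happens last wins.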

comment:11 Changed 4 years ago by jdemeyer

I just recompiled GAP and libGAP with SAGE_DEBUG=yes (on top of 7.6.beta4). After doing that, the workspaces were regenerated automatically and everything worked again.

I then restored the corrupted workspace from comment 3 that I had saved in a separate directory and the corruption appeared again. Traceback:

Stack backtrace
---------------
No symbol table info available.
#1  0x00007f0db97be3e3 in print_enhanced_backtrace ()
    at build/src/cysignals/implementation.c:394
        parent_pid = 16070
        pid = <optimized out>
#2  0x00007f0db97be8da in sigdie (sig=sig@entry=11, 
    s=s@entry=0x7f0db97c8570 "Unhandled SIGSEGV: A segmentation fault occurred.") at build/src/cysignals/implementation.c:413
No locals.
#3  0x00007f0db97c2167 in cysigs_signal_handler (sig=11)
    at build/src/cysignals/implementation.c:212
        inside = <optimized out>
#4  <signal handler called>
No symbol table info available.
#5  0x00007f0d1d98e6ad in libGAP_NextBagRestoring (size=37497077760, type=0)
    at gasman.c:984
        bag = 0x7f0cfe3fd810
        i = 31941069
#6  0x00007f0d1d8354a8 in libGAP_LoadBagData () at saveload.c:465
        bag = 0x7f0cfe3fd808
        size = 37497077760
        type = 0
#7  0x00007f0d1d8365c5 in libGAP_LoadWorkspace (
    fname=0x7f0d0e0fa944 "/home/jdemeyer/.sage/gap/libgap-workspace-4034600654683281042") at saveload.c:911
        nMods = 5
        nGlobs = 491
        nBags = 324935
        i = 26882
        maxSize = 6991526
        isGapRootRelative = 1
        globalcount = 491
        buf = "Bag data\000ib/random.g:NameFunc[3](-48550429)\000\061)\000\000\360\317\064\004\000\000\000\000`6\021\016\r\177\000\000\r\000\000\000\000\000\000\000$\340X\273\r\177\000\000\000\000\000\000\000\000\000\000\020\200<\376\f\177\000\000@\237\023\253\377\177\000\000\020\000\000\000\000\000\000\000\001\000\000\000\000\000\000\000\000\000\001\000\000\000\000\000\336+\237\035\r\177\000\000j\a\230\035\r\177\000\000\200\237\023\253\377\177\000\000Q\234\222\035\r\177\000\000\372\000\236\035\r\177\000\000R\370\212\035\r\177\000\000 \300\236\035\r\177\000\000+T\222\035\r\177\000\000\310(\n\036\r\177\000\000"...
        glob = 0x7f0d1dc361f8 <libGAP_NameFunc+24>
#8  0x00007f0d1d92ba4a in libGAP_InitializeGap (pargc=0x7fffab13a02c, 
    argv=0x7fffab13a0c0) at gap.c:3410
        i = 136
        ret = 0
#9  0x00007f0d1d8d2604 in libgap_initialize (argc=13, argv=0x7fffab13a0c0)
    at libgap.c:73
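
Note the bag size in frames #5 and #6: 37497077760 bytes is exactly 35760 MiB, far more than the "251m" passed with -o in the argv of comment 3. A rough sketch of the kind of plausibility check that would flag such a value is below. The helper names are made up, and reading "251m" as 251 MiB is an assumption; this is not how libGAP validates workspaces.

```python
def parse_memory(text: str) -> int:
    """Convert a size such as '251m' or '64m' to bytes.

    Assumes k/m/g suffixes are binary multiples (1024-based).
    """
    units = {"k": 1024, "m": 1024 ** 2, "g": 1024 ** 3}
    suffix = text[-1].lower()
    if suffix in units:
        return int(text[:-1]) * units[suffix]
    return int(text)


def bag_size_is_plausible(size: int, pool_bytes: int) -> bool:
    """A single bag cannot sensibly be larger than the whole memory pool."""
    return 0 <= size <= pool_bytes


pool = parse_memory("251m")
print(bag_size_is_plausible(37497077760, pool))  # False
```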

comment:12 Changed 4 years ago by dimpase

Corrupt workspaces almost always crash GAP (or libGAP), no wonder.

comment:13 follow-up: Changed 4 years ago by vbraun

I'm not sure if gap workspaces are compatible between normal and debug builds.

Presumably the issue is mitigated by #22570.

I'm leaning towards not making this a blocker

comment:14 in reply to: ↑ 13 Changed 4 years ago by jdemeyer

  • Milestone changed from sage-7.6 to sage-duplicate/invalid/wontfix
  • Resolution set to duplicate
  • Reviewers set to Jeroen Demeyer
  • Status changed from new to closed

Replying to vbraun:

Presumably the issue is mitigated by #22570.

We don't know, but at least we can hope...

I'm leaning towards not making this a blocker

I think it should either be a blocker or closed as duplicate of #22570.
