#28311 closed defect (fixed)

Random failure in combinatorial_polyhedron/base.pyx

Reported by: vbraun
Owned by:
Priority: major
Milestone: sage-9.1
Component: geometry
Keywords: random_fail, combinatorial polyhedron
Cc: gh-kliem, vdelecroix, jdemeyer
Merged in:
Authors: Jonathan Kliem
Reviewers: Travis Scrimshaw
Report Upstream: N/A
Work issues:
Branch: 83e5564
Commit: 83e5564b6c7c277f3497d978eba211453c777afb
Dependencies:
Stopgaps:

Description (last modified by gh-kliem)

sage: N = combinations(range(25),24) ## line 1174 ##
sage: C = CombinatorialPolyhedron(N) ## line 1175 ##
sage: try:
    alarm(0.5)
    C.f_vector()
except:
    print("alarm!") ## line 1176 ##
------------------------------------------------------------------------
/var/lib/buildbot/slave/sage3_git/build/local/lib/python3.7/site-packages/cysignals/signals.cpython-37m-x86_64-linux-gnu.so(+0x87f8)[0x7f9ee7ee97f8]
/var/lib/buildbot/slave/sage3_git/build/local/lib/python3.7/site-packages/cysignals/signals.cpython-37m-x86_64-linux-gnu.so(+0x88a8)[0x7f9ee7ee98a8]
/var/lib/buildbot/slave/sage3_git/build/local/lib/python3.7/site-packages/cysignals/signals.cpython-37m-x86_64-linux-gnu.so(+0xba9c)[0x7f9ee7eeca9c]
/lib/x86_64-linux-gnu/libpthread.so.0(+0x11390)[0x7f9eef7f9390]
/lib/x86_64-linux-gnu/libc.so.6(+0x11509c)[0x7f9eef53309c]
/lib/x86_64-linux-gnu/libc.so.6(+0x802bd)[0x7f9eef49e2bd]
/lib/x86_64-linux-gnu/libc.so.6(cfree+0x4c)[0x7f9eef4a253c]
/var/lib/buildbot/slave/sage3_git/build/local/lib/python3.7/site-packages/sage/ext/memory_allocator.cpython-37m-x86_64-linux-gnu.so(+0x37ed)[0x7f9e818e07ed]
/var/lib/buildbot/slave/sage3_git/build/local/lib/python3.7/site-packages/sage/geometry/polyhedron/combinatorial_polyhedron/face_iterator.cpython-37m-x86_64-linux-gnu.so(+0x7ce7)[0x7f9e6b7a9ce7]
/var/lib/buildbot/slave/sage3_git/build/local/lib/python3.7/site-packages/sage/geometry/polyhedron/combinatorial_polyhedron/base.cpython-37m-x86_64-linux-gnu.so(+0x2dab4)[0x7f9e6be55ab4]
/var/lib/buildbot/slave/sage3_git/build/local/lib/python3.7/site-packages/sage/geometry/polyhedron/combinatorial_polyhedron/base.cpython-37m-x86_64-linux-gnu.so(+0x2e398)[0x7f9e6be56398]
/var/lib/buildbot/slave/sage3_git/build/local/lib/libpython3.7m.so.1.0(_PyMethodDef_RawFastCallKeywords+0x300)[0x7f9eefa9f190]
/var/lib/buildbot/slave/sage3_git/build/local/lib/libpython3.7m.so.1.0(_PyMethodDescr_FastCallKeywords+0x49)[0x7f9eefaa7e69]
/var/lib/buildbot/slave/sage3_git/build/local/lib/libpython3.7m.so.1.0(_PyEval_EvalFrameDefault+0x7a8c)[0x7f9eefa76b2c]
/var/lib/buildbot/slave/sage3_git/build/local/lib/libpython3.7m.so.1.0(_PyEval_EvalCodeWithName+0xa34)[0x7f9eefb80ab4]
/var/lib/buildbot/slave/sage3_git/build/local/lib/libpython3.7m.so.1.0(PyEval_EvalCodeEx+0x3e)[0x7f9eefb80bce]
[...]
/var/lib/buildbot/slave/sage3_git/build/local/lib/libpython3.7m.so.1.0(+0x1d3010)[0x7f9eefbd8010]
/var/lib/buildbot/slave/sage3_git/build/local/lib/libpython3.7m.so.1.0(_Py_UnixMain+0x39)[0x7f9eefbd82c9]
/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf0)[0x7f9eef43e830]
/var/lib/buildbot/slave/sage3_git/build/local/bin/python3(_start+0x29)[0x400729]
------------------------------------------------------------------------
**********************************************************************
----------------------------------------------------------------------
sage -t --long src/sage/geometry/polyhedron/combinatorial_polyhedron/base.pyx  # Timed out (and interrupt failed)
----------------------------------------------------------------------

The backtrace usually points into sage/ext/memory_allocator. My guess is that the computation mostly shuffles memory around, and raising signals / longjmp-ing out of malloc has a chance of corrupting the allocator state.

We fix this by allocating the memory in get_next_level on the stack.
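
The reasoning above can be illustrated with a toy model in plain Python. This is only a sketch under stated assumptions: `ToyHeap`, `Interrupted`, `next_level_heap` and `next_level_scratch` are hypothetical names, not Sage or cysignals APIs, and an ordinary exception stands in for the signal-driven jump out of the computation. It does not reproduce the crash itself. It only shows why an allocate/free pair inside a region that can be abandoned at any point is fragile, while caller-owned scratch space is not.

```python
class Interrupted(Exception):
    """Stands in for the alarm that aborts the computation (assumption)."""


class ToyHeap:
    """Hypothetical allocator that only tracks which blocks are still live."""

    def __init__(self):
        self.live = set()
        self._next_id = 0

    def alloc(self, n):
        block_id = self._next_id
        self._next_id += 1
        self.live.add(block_id)
        return block_id, [0] * n

    def free(self, block_id):
        self.live.discard(block_id)


def next_level_heap(heap, n_faces, interrupt_at=None):
    """Old pattern: allocate and free scratch inside the interruptible region."""
    block_id, is_not_newface = heap.alloc(n_faces)
    for i in range(n_faces):
        if i == interrupt_at:
            # Models leaving the region early; the free() below is skipped.
            raise Interrupted
        is_not_newface[i] = i % 2
    heap.free(block_id)
    return sum(is_not_newface)


def next_level_scratch(is_not_newface, n_faces, interrupt_at=None):
    """Fixed pattern: the caller owns the scratch space; nothing is allocated here."""
    for i in range(n_faces):
        if i == interrupt_at:
            raise Interrupted
        is_not_newface[i] = i % 2
    return sum(is_not_newface[:n_faces])


# Interrupt the heap-allocating version partway through.
heap = ToyHeap()
try:
    next_level_heap(heap, 8, interrupt_at=3)
except Interrupted:
    pass
leaked_blocks = len(heap.live)  # one block is still marked live

# Interrupt the scratch-space version at the same point.
scratch = [0] * 8
try:
    next_level_scratch(scratch, 8, interrupt_at=3)
except Interrupted:
    pass
# The caller still owns `scratch`; there is no allocator state to repair.
```

In the real code the analogous move is to take the buffer from the stack (or from the caller) so that no malloc/free call happens inside the interruptible region.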

Change History (14)

comment:1 Changed 21 months ago by jhpalmieri

On OS X, I've seen

sage -t --long --warn-long 59.7 src/sage/geometry/polyhedron/combinatorial_polyhedron/base.pyx
    Killed due to illegal instruction
**********************************************************************
Tests run before process (pid=84349) failed:
...
sage: try:
    alarm(0.5)
    C.f_vector()
except:
    print("alarm!") ## line 1132 ##
alarm!
sage: C.f_vector()  # long time ## line 1139 ##
------------------------------------------------------------------------
0   signals.cpython-37m-darwin.so       0x00000001023390ba print_backtrace + 58
1   signals.cpython-37m-darwin.so       0x000000010233d3f7 sigdie + 39
2   signals.cpython-37m-darwin.so       0x000000010233d350 sigdie_for_sig + 224
3   libsystem_platform.dylib            0x00007fff596c8b5d _sigtramp + 29
4   ???                                 0x0000000400000000 0x0 + 17179869184
...
83  libpython3.7m.dylib                 0x00000001009acf07 pymain_main + 6727
84  libpython3.7m.dylib                 0x00000001009ada8f _Py_UnixMain + 111
85  libdyld.dylib                       0x00007fff594dd3d5 start + 1
86  ???                                 0x0000000000000006 0x0 + 6
------------------------------------------------------------------------
Unhandled SIGILL: An illegal instruction occurred.
This probably occurred because a *compiled* module has a bug
in it and is not properly wrapped with sig_on(), sig_off().
Python will now terminate.
------------------------------------------------------------------------

It's not repeatable, but I have seen it on two different machines, with both Python 2 and Python 3.

comment:2 Changed 21 months ago by gh-kliem

See also #28287.

Actually in bit_vector_operations.cc there is a memory allocation, namely *is_not_newface = new int[n_faces -1]();.

I never considered it to be a possible problem. I could move this allocation to FaceIterator (outside of sig_on/sig_off). This would also save some time (which is why I have done it that way elsewhere).
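
A minimal sketch of that idea, assuming the scratch array is owned by the iterator object and reused on every call. `FaceIteratorSketch` is a hypothetical stand-in, not Sage's FaceIterator, and the containment check is a simplified placeholder for the real work in get_next_level. It assumes `len(faces)` never exceeds `max_faces`.

```python
class FaceIteratorSketch:
    """Hypothetical owner of the scratch array; not Sage's FaceIterator."""

    def __init__(self, max_faces):
        # Allocated once, before any interruptible work starts.
        self.is_not_newface = bytearray(max_faces)

    def get_next_level(self, faces):
        """Drop every face contained in an earlier, unmarked face of the list."""
        n = len(faces)
        scratch = self.is_not_newface
        scratch[:n] = bytes(n)  # reset the reused buffer in place
        for i in range(n):
            for j in range(i):
                if not scratch[j] and faces[i] <= faces[j]:
                    scratch[i] = 1
                    break
        return [faces[i] for i in range(n) if not scratch[i]]


it = FaceIteratorSketch(max_faces=4)
buffer_before = it.is_not_newface
first = it.get_next_level([frozenset({0, 1}), frozenset({0}), frozenset({1, 2})])
second = it.get_next_level([frozenset({3}), frozenset({3})])
same_buffer = it.is_not_newface is buffer_before
```

Both calls work on the same buffer, so the per-call allocation disappears and nothing is allocated or freed while the loop runs.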

comment:3 Changed 21 months ago by vbraun

  • Cc jdemeyer added

I don't understand how that could ever end up with a SIGILL (instead of SIGSEGV) on OS X, but maybe that's an Apple secret.

comment:4 Changed 21 months ago by gh-kliem

The resolution of #28287 is to remove the tests.

comment:5 Changed 20 months ago by chapoton

  • Milestone changed from sage-8.9 to sage-duplicate/invalid/wontfix
  • Status changed from new to needs_review

So this is now an invalid ticket, right?

comment:6 Changed 20 months ago by gh-kliem

  • Branch set to public/28311
  • Commit set to fe19a772ac4d45c1491e6dcda5660dfa0ae74bba

We could also remove the memory allocation in the method get_next_level as illustrated in my commit.

New commits:

fe19a77 removed memory allocation within sig_on/sig_off

comment:7 Changed 17 months ago by gh-kliem

  • Milestone changed from sage-duplicate/invalid/wontfix to sage-9.0

comment:8 Changed 17 months ago by gh-kliem

  • Branch changed from public/28311 to public/28311-nwe
  • Commit fe19a772ac4d45c1491e6dcda5660dfa0ae74bba deleted

Much simpler fix.

comment:9 Changed 17 months ago by gh-kliem

  • Branch changed from public/28311-nwe to public/28311-new
  • Commit set to 83e5564b6c7c277f3497d978eba211453c777afb

New commits:

83e5564 allocate is_not_newfaces on stack

comment:10 Changed 17 months ago by gh-kliem

  • Description modified (diff)
  • Keywords combinatorial polyhedron added

comment:11 Changed 16 months ago by gh-kliem

  • Authors set to Jonathan Kliem

comment:12 Changed 16 months ago by tscrim

  • Reviewers set to Travis Scrimshaw
  • Status changed from needs_review to positive_review

LGTM.

comment:13 Changed 16 months ago by chapoton

  • Milestone changed from sage-9.0 to sage-9.1

9.0 is out

comment:14 Changed 15 months ago by vbraun

  • Branch changed from public/28311-new to 83e5564b6c7c277f3497d978eba211453c777afb
  • Resolution set to fixed
  • Status changed from positive_review to closed