Opened 3 years ago

Closed 3 years ago

#24534 closed defect (duplicate)

Mysterious hang in linbox after fork on Cygwin

Reported by: embray Owned by: embray
Priority: critical Milestone: sage-duplicate/invalid/wontfix
Component: porting: Cygwin Keywords:
Cc: Merged in:
Authors: Reviewers:
Report Upstream: N/A Work issues:
Branch: Commit:
Dependencies: Stopgaps:

Status badges

Description

The following test reproduces the problem reliably (for me) in Sage 8.1 (plus one tiny patch needed for psutil to build, which has no runtime effect):

from sage.all import *
import os
import sys

def test():
    return random_matrix(ZZ, 30).determinant(algorithm='linbox',
                                             proof=False)

if os.fork() == 0:
    print('in child; the determinant is: %s' % test())
    sys.stdout.flush()
else:
    print('in parent; the determinant is: %s' % test())
    sys.stdout.flush()
    print('waiting for child')
    sys.stdout.flush()
    print(os.wait())
    sys.stdout.flush()

This is especially mysterious for a few reasons:

1) I never saw this error before, and I only first starting noticing after I upgraded my Windows machine to the "Windows 10 Fall Creators Update". That may just be a coincidence though.

2) The issue does not occur when I run the SageMath 8.1 binary for Windows I built a few weeks ago. I made that build before my Windows update, but running it on the updated Windows has no issue. The Cygwin version is the same in both cases.

3) All of linbox's make check tests pass--I'm not sure if any of them deal with fork, however.

So this is only happening currently in my development Cygwin, and not in a previously built binary. This suggests to me some kind of problem occurring at build time but it's hard to say yet.

Change History (3)

comment:1 Changed 3 years ago by embray

  • Owner changed from (none) to embray

comment:2 Changed 3 years ago by embray

Another data point: Although in my test above the child just hangs, I do see in strace an attempt to call open() with an invalid filename pointer, and it also reports an access violation, but then just...hangs. That might be a Cygwin bug, somehow.

But there's another test that is resulting in a seemingly related segfault (along with several other tests in the same module):

File "src/sage/dynamics/arithmetic_dynamics/projective_ds.py", line 4171, in sage.dynamics.arithmetic_dynamics.projective_ds.?.rational_periodic_points
Failed example:
    f.rational_periodic_points()
Expected:
    [(w : 1), (1 : 0), (-w + 1 : 1)]
Got:
    Exception raised by child process with pid=3804:
    Traceback (most recent call last):
      File "/home/embray/src/sagemath/sage/local/lib/python2.7/site-packages/sage/parallel/use_fork.py", line 177, in __call__
        self._subprocess(f, dir, *v0)
      File "/home/embray/src/sagemath/sage/local/lib/python2.7/site-packages/sage/parallel/use_fork.py", line 302, in _subprocess
        value = f(*args, **kwds)
      File "/home/embray/src/sagemath/sage/local/lib/python2.7/site-packages/sage/dynamics/arithmetic_dynamics/projective_ds.py", line 2283, in parallel_function
        return morphism.possible_periods()
      File "/home/embray/src/sagemath/sage/local/lib/python2.7/site-packages/sage/dynamics/arithmetic_dynamics/projective_ds.py", line 5466, in possible_periods
        return _fast_possible_periods(self, return_points)
      File "sage/dynamics/arithmetic_dynamics/projective_ds_helper.pyx", line 27, in sage.dynamics.arithmetic_dynamics.projective_ds_helper._fast_possible_periods (build/cythonized/sage/dynamics/arithmetic_dynamics/projective_ds_helper.c:3385)
        cpdef _fast_possible_periods(self, return_points=False):
      File "sage/dynamics/arithmetic_dynamics/projective_ds_helper.pyx", line 177, in _enum_points (build/cythonized/sage/dynamics/arithmetic_dynamics/projective_ds_helper.c:3588)
        yield _get_point_from_hash(value, prime, dimension)
      File "sage/matrix/matrix2.pyx", line 2329, in sage.matrix.matrix2.Matrix.charpoly (build/cythonized/sage/matrix/matrix2.c:21594)
        f = self.lift().charpoly(var).change_ring(R)
      File "sage/matrix/matrix_integer_dense.pyx", line 1256, in sage.matrix.matrix_integer_dense.Matrix_integer_dense.charpoly (build/cythonized/sage/matrix/matrix_integer_dense.c:12614)
        sig_on()
    SignalError: Segmentation fault
    ------------------------------------------------------------------------
    ------------------------------------------------------------------------
    Attaching gdb to process id 6516.
    Attaching gdb to process id 16424.
    GNU gdb (GDB) (Cygwin 7.10.1-1) 7.10.1
    Copyright (C) 2015 Free Software Foundation, Inc.
    License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
    This is free software: you are free to change and redistribute it.
    There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
    and "show warranty" for details.
    This GDB was configured as "x86_64-pc-cygwin".
    Type "show configuration" for configuration details.
    For bug reporting instructions, please see:
    <http://www.gnu.org/software/gdb/bugs/>.
    Find the GDB manual and other documentation resources online at:
    <http://www.gnu.org/software/gdb/documentation/>.
    For help, type "help".
    Type "apropos word" to search for commands related to "word".
    [New Thread 6516.0x31e0]
    [New Thread 6516.0x5ee0]
    [New Thread 6516.0xbb8]
    [New Thread 6516.0x3328]
    [New Thread 6516.0x48f8]
    [New Thread 6516.0x4938]
    [New Thread 6516.0xf44]
    0x00007ffff1183801 in ntdll!DbgBreakPoint ()
       from /cygdrive/c/WINDOWS/SYSTEM32/ntdll.dll
    <BLANKLINE>
    Stack backtrace
    ---------------
       from /cygdrive/c/WINDOWS/SYSTEM32/ntdll.dll
    No symbol table info available.
    #1  0x00007ffff11b01cb in ntdll!DbgUiRemoteBreakin ()
       from /cygdrive/c/WINDOWS/SYSTEM32/ntdll.dll
    No symbol table info available.
    #2  0x00007fffef911fe4 in KERNEL32!BaseThreadInitThunk ()
       from /cygdrive/c/WINDOWS/System32/KERNEL32.DLL
    No symbol table info available.
    #3  0x00007ffff114efb1 in ntdll!RtlUserThreadStart ()
       from /cygdrive/c/WINDOWS/SYSTEM32/ntdll.dll
    No symbol table info available.
    #4  0x0000000000000000 in ?? ()
    No symbol table info available.
    Backtrace stopped: previous frame inner to this frame (corrupt stack?)
    <BLANKLINE>
    <BLANKLINE>
    Cython backtrace
    ----------------
    #0  0x0000000000000000 in ntdll!DbgBreakPoint()
    #1  0x0000000000000000 in ntdll!DbgUiRemoteBreakin()
    #2  0x0000000000000000 in KERNEL32!BaseThreadInitThunk()
    #3  0x0000000000000000 in ntdll!RtlUserThreadStart()
    <BLANKLINE>
    <BLANKLINE>
    Traceback (most recent call last):
      File "<string>", line 131, in <module>
      File "<string>", line 106, in invoke
      File "<string>", line 113, in newest_first_order
      File "<string>", line 92, in print_stackframe
      File "/usr/lib/python2.7/site-packages/Cython/Debugger/libcython.py", line 88, in wrapper
        raise NoFunctionNameInFrameError()
    NoFunctionNameInFrameError: C function name could not be determined in the current C stack frame
    close failed in file object destructor:
    IOError: [Errno 9] Bad file descriptor
    Saved trace to /home/embray/.sage/crash_logs/crash_JQ9mUA.log
    ------------------------------------------------------------------------
    Unhandled SIGSEGV: A segmentation fault occurred.
    This probably occurred because a *compiled* module has a bug
    in it and is not properly wrapped with sig_on(), sig_off().
    Python will now terminate.
    ------------------------------------------------------------------------

comment:3 Changed 3 years ago by embray

  • Milestone changed from sage-8.2 to sage-duplicate/invalid/wontfix
  • Resolution set to duplicate
  • Status changed from new to closed

Ahah, it seems that when I last upgraded my Cygwin installation it also installed openblas, and I'm being bitten by this again: #22822.

That also explains why I'm not having the same problem from my installer binaries. This ticket can be closed, then, since it's not really an issue with linbox, and originates from another known issue (I've confirmed that the relevant code in LinBox? is called threaded versions of routines in openblas).

Note: See TracTickets for help on using tickets.