#28944 closed defect (duplicate)

Sage crash on OS X 10.15 with Smith normal form for mod 2 matrices

Reported by: jhpalmieri
Owned by:
Priority: blocker
Milestone: sage-duplicate/invalid/wontfix
Component: linear algebra
Keywords: darwin
Cc: mkoeppe, malb
Merged in:
Authors:
Reviewers: John Palmieri
Report Upstream: Reported upstream. No feedback yet.
Work issues:
Branch:
Commit:
Dependencies:
Stopgaps:


Description (last modified by jhpalmieri)

Frequently (maybe half the time, maybe more), the following command crashes Sage:

sage: random_matrix(GF(2), 6, 133).smith_form()
python3(70264,0x10f32edc0) malloc: Incorrect checksum for freed object 0x7fcc61462f50: probably modified after being freed.
Corrupt value: 0x0
python3(70264,0x10f32edc0) malloc: *** set a breakpoint in malloc_error_break to debug
------------------------------------------------------------------------
(no backtrace available)
------------------------------------------------------------------------
Unhandled SIGILL: An illegal instruction occurred.
This probably occurred because a *compiled* module has a bug
in it and is not properly wrapped with sig_on(), sig_off().
Python will now terminate.
------------------------------------------------------------------------
/Users/palmieri/Desktop/Sage_stuff/git/sage/src/bin/sage-python: line 2: 70264 Illegal instruction: 4  sage -python "$@"

This is with Sage 9.0, either a Python 3 or a Python 2 build, OS X 10.15.2. I don't see the same problem on an Ubuntu virtual machine.

Attachments (2)

m4ri-20140914.p0.log (41.5 KB) - added by jhpalmieri 15 months ago.
m4rie-20150908.p0.log (128.6 KB) - added by jhpalmieri 15 months ago.


Change History (42)

comment:1 Changed 16 months ago by jhpalmieri

  • Description modified
  • Keywords darwin added
  • Summary changed from Sage crash with Smith normal form for mod 2 matrices to Sage crash on OS X with Smith normal form for mod 2 matrices

comment:2 Changed 16 months ago by gh-DaveWitteMorris

I confirm that this is a problem. I tried it 10 times (in Python 3 build of 9.0 on Mac OS 10.15.1) and it crashed Sage every time.

comment:3 Changed 16 months ago by gh-mwageringel

On macOS 10.13.6, I cannot replicate this.

comment:4 Changed 16 months ago by gh-DaveWitteMorris

For me, it seems to be consistent. smith_form is happy to deal with a 128 x 128 matrix (or even a 300 x 128 matrix, which is the largest I tried), but it cannot handle more than 128 columns: it crashes on a 2 x 129 matrix.

For comparison, I replaced GF(2) with GF(4) and GF(7). GF(7) gave no errors (up to 150 x 150), but GF(4) gave NotImplementedError at various sizes of square matrices (96 x 96, 68 x 68, 93 x 93). The error messages all look pretty much the same. Here is one of them:

---------------------------------------------------------------------------
NotImplementedError                       Traceback (most recent call last)
<ipython-input-17-84785303824d> in <module>()
      1 for k in range(Integer(1),Integer(151)):
----> 2     x = random_matrix(GF(Integer(4)), k, k).smith_form();
      3     print(k)
      4 

/Users/dmorris/Documents/misc/Programs/sage3/local/lib/python3.7/site-packages/sage/matrix/matrix2.pyx in sage.matrix.matrix2.Matrix.smith_form (build/cythonized/sage/matrix/matrix2.c:92051)()
  13664         mm = t.submatrix(1,1)
  13665         if transformation:
> 13666             dd, uu, vv = mm.smith_form(transformation=True)
  13667         else:
  13668             dd = mm.smith_form(transformation=False)

/Users/dmorris/Documents/misc/Programs/sage3/local/lib/python3.7/site-packages/sage/matrix/matrix2.pyx in sage.matrix.matrix2.Matrix.smith_form (build/cythonized/sage/matrix/matrix2.c:92517)()
  13672             u = uu.new_matrix(1,1,[1]).block_sum(uu) * u
  13673             v = v * vv.new_matrix(1,1,[1]).block_sum(vv)
> 13674         dp, up, vp = _smith_diag(d, transformation=transformation)
  13675         if integral is False:
  13676             dp = dp.change_ring(R)

/Users/dmorris/Documents/misc/Programs/sage3/local/lib/python3.7/site-packages/sage/matrix/matrix2.pyx in sage.matrix.matrix2._smith_diag (build/cythonized/sage/matrix/matrix2.c:106383)()
  15606                 t = t[0]
  15607                 # find lambda, mu such that lambda*d[i,i] + mu*d[j,j] = t
> 15608                 lamb = R(dp[i,i]/t).inverse_mod( R.ideal(dp[j,j]/t))
  15609                 mu = R((t - lamb*dp[i,i]) / dp[j,j])
  15610 

/Users/dmorris/Documents/misc/Programs/sage3/local/lib/python3.7/site-packages/sage/structure/element.pyx in sage.structure.element.CommutativeRingElement.inverse_mod (build/cythonized/sage/structure/element.c:19099)()
   2800         i.e., if `I` and ``self`` together generate the unit ideal.
   2801         """
-> 2802         raise NotImplementedError
   2803 
   2804     def divides(self, x):

NotImplementedError:
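The 128-column threshold described above is consistent with how smith_form recurses: as the traceback shows, each step calls mm = t.submatrix(1,1), which drops one row and one column. The helper below (submatrix_shapes is a hypothetical name, not a Sage function) is a minimal pure-Python sketch that lists the shapes on which submatrix(1, 1) would be called. It assumes every recursive step strips exactly one row and one column and that nothing stops the recursion early. Under that model, the 6 x 133 example from the description passes through the same 2 x 129 shape that is reported above to crash on its own, while a 300 x 128 matrix never has more than 128 columns.

```python
def submatrix_shapes(nrows, ncols):
    """Shapes (rows, cols) of the matrices t on which smith_form would call
    t.submatrix(1, 1), under the simplifying model that every recursive step
    strips exactly one row and one column and the recursion never stops early.
    """
    shapes = []
    r, c = nrows, ncols
    while r >= 1 and c >= 1:
        shapes.append((r, c))  # t is r x c, so t.submatrix(1, 1) is (r-1) x (c-1)
        r, c = r - 1, c - 1
    return shapes


# The example from the description passes through the 2 x 129 shape.
print(submatrix_shapes(6, 133))
# [(6, 133), (5, 132), (4, 131), (3, 130), (2, 129), (1, 128)]

# The largest working case from this comment never exceeds 128 columns.
print(max(c for _, c in submatrix_shapes(300, 128)))  # 128
```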

comment:5 Changed 15 months ago by jhpalmieri

I have checked on three machines: I see the crash on two machines with OS X 10.15, not on the one with 10.14.6.

comment:6 Changed 15 months ago by jhpalmieri

  • Summary changed from Sage crash on OS X with Smith normal form for mod 2 matrices to Sage crash on OS X 10.15 with Smith normal form for mod 2 matrices

comment:7 Changed 15 months ago by gh-DaveWitteMorris

I looked into the NotImplementedError that I am getting with GF(4). Something very strange is happening, and I am leaving this on the same ticket for now, because it may be a different symptom of whatever is causing the crashes over GF(2). The following code should always print True.

M = random_matrix(GF(4), 100, 100)
for k in range(50):
    S,U,V = M.smith_form()
    print(S == U * M * V) # True

Instead, in any given run, I always get False at least a few times. And the loop usually does not get through the entire set of 50, because it is terminated by NotImplementedError (or, occasionally, a Sage crash like the one for GF(2)). We are using the same matrix for the entire run, so the output should not be random.

I understand the superficial cause of the NotImplementedError, but not the underlying cause. The value of t[0,0] is always nonzero at the start of the following code snippet from the definition of smith_form.

        # now recurse: t now has a nonzero entry at 0,0 and zero entries in the rest
        # of the 0th row and column, so we apply smith_form to the smaller submatrix
        mm = t.submatrix(1,1)
        if transformation:
            dd, uu, vv = mm.smith_form(transformation=True)
        else:
            dd = mm.smith_form(transformation=False)
        mone = self.new_matrix(1, 1, [1])
        d = dd.new_matrix(1,1,[t[0,0]]).block_sum(dd)

However, executing mm.smith_form sometimes changes the value of t[0,0], so that t[0,0] is 0 in the definition of d. (In the cases where the value of t[0,0] is changed, it seems to always be changed to 0.) This 0 in the top left corner of d causes a NotImplementedError. (I have opened ticket #28967 to eliminate the NotImplementedError, but the resulting value of smith_form will be incorrect, because the value of t[0,0] is wrong.)

I am clearly out of my depth. I don't see how executing mm.smith_form could be changing the value of t. I thought there might be an out-of-range index error, so I tried defining mm to be a deepcopy of t.submatrix(1,1), but that did not seem to make a difference. There is no need to waste time trying to explain to me what is happening, but let me know if there is something I can do to help. E.g., I could try to locate the place in the code that is crashing Sage for matrices over GF(2), even though I won't be able to understand why it is happening.
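To make precise what the loop above checks: over a field, the Smith form S of M is diagonal with some 1s followed by 0s, and the transformation matrices must satisfy S == U * M * V. Because M is fixed in that loop, a correct implementation should also return the same S every time. The hypothetical helpers below are a pure-Python sketch of that check, assuming a prime p and matrices given as lists of rows. They do not cover GF(4), which is not a prime field, and they do not check that U and V are invertible.

```python
def matmul_mod(A, B, p):
    """Product of matrices A and B (lists of rows) with entries reduced mod p."""
    k = len(B)
    if any(len(row) != k for row in A):
        raise ValueError("inner dimensions do not agree")
    return [[sum(A[i][t] * B[t][j] for t in range(k)) % p
             for j in range(len(B[0]))]
            for i in range(len(A))]


def is_field_smith_form(S, p):
    """True if S (mod p) is diagonal with diagonal 1, ..., 1, 0, ..., 0."""
    diag = []
    for i, row in enumerate(S):
        for j, x in enumerate(row):
            if i == j:
                diag.append(x % p)
            elif x % p != 0:
                return False
    r = diag.count(1)
    return diag == [1] * r + [0] * (len(diag) - r)


def check_smith(S, U, M, V, p):
    """S == U*M*V (mod p) and S has the Smith form shape over a field."""
    reduced_S = [[x % p for x in row] for row in S]
    return (matmul_mod(matmul_mod(U, M, p), V, p) == reduced_S
            and is_field_smith_form(S, p))


# Tiny GF(2) example: U is the inverse of M and V is the identity,
# so the Smith form S must be the identity.
M = [[1, 1], [0, 1]]
U = [[1, 1], [0, 1]]
I2 = [[1, 0], [0, 1]]
print(check_smith(I2, U, M, I2, 2))                # True
print(check_smith([[0, 0], [0, 1]], U, M, I2, 2))  # False
```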

comment:8 Changed 15 months ago by jhpalmieri

I think this is a different issue, since I see the GF(4) problem on OS X 10.14.6, but not the crash: I only see the crash with OS X 10.15.

comment:9 Changed 15 months ago by jhpalmieri

The NotImplementedError arises in the function _smith_diag, which is too complicated in the case of field coefficients. The documentation says: "If any of the d's is a unit, it replaces it with 1". We don't need to compute ideals generated by elements to test that, or compute inverses of elements mod ideals, or anything like that. This patch may avoid the NotImplementedError problem, but it doesn't fix the bad math (returning False for S == U * M * V, plus the occasional crash):

  • src/sage/matrix/matrix2.pyx

    diff --git a/src/sage/matrix/matrix2.pyx b/src/sage/matrix/matrix2.pyx
    index ea139556d1..b58822a021 100644
    --- a/src/sage/matrix/matrix2.pyx
    +++ b/src/sage/matrix/matrix2.pyx
    @@ -13666,12 +13666,14 @@ cdef class Matrix(Matrix1):
                 dd, uu, vv = mm.smith_form(transformation=True)
             else:
                 dd = mm.smith_form(transformation=False)
    -        mone = self.new_matrix(1, 1, [1])
             d = dd.new_matrix(1,1,[t[0,0]]).block_sum(dd)
             if transformation:
                 u = uu.new_matrix(1,1,[1]).block_sum(uu) * u
                 v = v * vv.new_matrix(1,1,[1]).block_sum(vv)
    -        dp, up, vp = _smith_diag(d, transformation=transformation)
    +        if R.is_field() and integral is False:
    +            dp, up, vp = _smith_diag_field(d, transformation=transformation)
    +        else:
    +            dp, up, vp = _smith_diag(d, transformation=transformation)
             if integral is False:
                 dp = dp.change_ring(R)
             elif integral is not None:
    @@ -15625,6 +15627,49 @@ def _smith_diag(d, transformation=True):
                     dp = newlmat*dp*newrmat
         return dp, left, right
     
    +def _smith_diag_field(d, transformation=True):
    +    r"""
    +    For internal use by the smith_form routine. Given a diagonal
    +    matrix d over a field F, return matrices d', a,b such that a\*d\*b
    +    = d' and d' is block diagonal with the identity matrix in the
    +    upper left, zeroes elsewhere. We assume that the nonzero entries
    +    are in the upper left: if a diagonal entry ``d[i,i]`` is zero,
    +    so are all of the entries ``d[j,j]`` with ``j > i``.
    +
    +    This is meant to mimic the results of :func:`_smith_diag`, but
    +    with field coefficients it is simpler. The matrix `d'` is a block
    +    matrix with the identity in the upper left, zeroes elsewhere. The
    +    matrix `b` is the identity matrix. The matrix `a` is diagonal; for
    +    each `i`, its `(i,i)`-entry is either 1 or the inverse of
    +    `d[i,i]`, depending on whether `d[i,i]` is zero.
    +
    +    EXAMPLES::
    +
    +        sage: from sage.matrix.matrix2 import _smith_diag_field
    +        sage: A = matrix(QQ, 2, [2,0,0,0])
    +        sage: D,U,V = _smith_diag_field(A); D,U,V
    +        (
    +        [1 0]  [1/2   0]  [1 0]
    +        [0 0], [  0   1], [0 1]
    +        )
    +    """
    +    dp = d.__copy__()
    +    n = min(d.nrows(), d.ncols())
    +    R = d.base_ring()
    +    one = R.one()
    +    if transformation:
    +        left = d.new_matrix(d.nrows(), d.nrows(), 1)
    +        right = d.new_matrix(d.ncols(), d.ncols(), 1)
    +    else:
    +        left = right = None
    +    for i in xrange(n):
    +        if dp[i,i]:
    +            if dp[i,i] != 1:
    +                if transformation:
    +                    left[i,i] = one/dp[i,i]
    +                dp[i,i] = one
    +    return dp, left, right
    +
     def _generic_clear_column(m):
         r"""
         Reduce the first column of m to canonical form -- that is, all entries
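In the field case, the patch above amounts to one inverse per nonzero diagonal entry: the docstring example turns the entry 2 into 1 using the factor 1/2. The hypothetical helper below is a minimal pure-Python sketch of that arithmetic for a prime field GF(p), assuming the diagonal is given as a list with its nonzero entries first. It does not use Sage, and it omits the right-hand transformation, which is the identity.

```python
def smith_diag_field_mod_p(diag, p):
    """Sketch of the _smith_diag_field idea for GF(p), p prime.

    diag lists the diagonal entries d[i,i] (nonzero entries first).
    Returns (new_diag, left_diag) with left_diag[i] * diag[i] == new_diag[i] (mod p).
    """
    new_diag, left_diag = [], []
    for x in diag:
        x %= p
        if x:
            new_diag.append(1)
            left_diag.append(pow(x, -1, p))  # inverse mod p (Python 3.8+)
        else:
            new_diag.append(0)
            left_diag.append(1)              # zero entries are left untouched
    return new_diag, left_diag


print(smith_diag_field_mod_p([2, 3, 0], 5))  # ([1, 1, 0], [3, 2, 1])
```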

comment:10 Changed 15 months ago by jhpalmieri

Oh, and for what it's worth, in brief testing I did not see any of these GF(4) problems on an Ubuntu virtual machine. So perhaps this is also OS X only.

comment:11 Changed 15 months ago by gh-DaveWitteMorris

Thanks for letting me know that I am not the only one seeing the GF(4) problem.

When smith_form is operating correctly, the diagonal matrix that is fed to _smith_diag will have all of its 0's at the end, so (over a field) inverse_mod will never be called. The patch in #28967 gives a definition for inverse_mod, so the NotImplementedError is eliminated even when smith_form gets bad input (over a field), but you are 100% correct that it does not address any of the root issues.

I will open a new ticket for the GF(4) problem. I have seen your GF(2) problem over several finite fields of even order, but the NotImplementedError has only been for GF(4) so far.

comment:12 Changed 15 months ago by gh-DaveWitteMorris

I opened #28970 to discuss the GF(4) problem.

comment:13 Changed 15 months ago by dimpase

Please have a look at logs/pkgs/m4ri(e)*.log for any clues; perhaps there are telling compiler warnings.

Changed 15 months ago by jhpalmieri

Changed 15 months ago by jhpalmieri

comment:14 Changed 15 months ago by jhpalmieri

I don't see anything interesting, but I'm not an expert, so I'm attaching the logs in case anyone else can spot anything.

comment:15 Changed 15 months ago by dimpase

Could you try updating the m4ri package by cloning m4ri master from https://bitbucket.org/malb/m4ri/? It has a few changes since Sage's m4ri, in particular one or two about "fixing undefined behaviour". See if that helps.

comment:16 Changed 15 months ago by jhpalmieri

The new version of m4ri didn't help with the crash on OS X 10.15 or the GF(4) problem.

comment:17 Changed 15 months ago by dimpase

I also see m4rie fixes newer than ours here: https://bitbucket.org/malb/m4rie/commits/ Could you do the same check for m4rie, too?

comment:18 Changed 15 months ago by dimpase

It smells like a compiler bug to me, probably showing up in the Cython implementation of smith_form rather than in m4ri(e) themselves.

An interesting test is whether one still sees this on a conda-based Sage installation (conda uses its own compiler toolchain).

Which clang versions exhibit this bug?

comment:19 Changed 15 months ago by jhpalmieri

I wouldn't be surprised if it's a compiler bug with repercussions on Cython.

$ clang --version
Apple clang version 11.0.0 (clang-1100.0.33.16)
Target: x86_64-apple-darwin19.3.0
Thread model: posix
InstalledDir: /Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin

exhibits the OS X 10.15 bug (the crash with Smith form over GF(2)). For the GF(4) problem, I see it with this and also on OS X 10.14, same clang version number, but target x86_64-apple-darwin18.7.0.

comment:20 Changed 15 months ago by jhpalmieri

I installed Sage via conda on an OS X 10.15 machine. I still get the crash, but less often: maybe once every 5 or 10 times instead of more than half the time.

The crash with regular Sage:

python3(1568,0x10eafedc0) malloc: Incorrect checksum for freed object 0x7fb1e2298820: probably modified after being freed.
Corrupt value: 0x8000000000000000
python3(1568,0x10eafedc0) malloc: *** set a breakpoint in malloc_error_break to debug
------------------------------------------------------------------------
(no backtrace available)
------------------------------------------------------------------------
Unhandled SIGABRT: An abort() occurred.
This probably occurred because a *compiled* module has a bug
in it and is not properly wrapped with sig_on(), sig_off().
Python will now terminate.
------------------------------------------------------------------------
/Users/palmieri/Sage/sage-9.0/src/bin/sage-python: line 2:  1568 Abort trap: 6           sage -python "$@"

The crash with conda's Sage:

------------------------------------------------------------------------
(no backtrace available)
------------------------------------------------------------------------
Unhandled SIGSEGV: A segmentation fault occurred.
This probably occurred because a *compiled* module has a bug
in it and is not properly wrapped with sig_on(), sig_off().
Python will now terminate.
------------------------------------------------------------------------
/Users/palmieri/anaconda3/envs/sage/bin/sage-python: line 2:  1271 Segmentation fault: 11  sage -python "$@"
Last edited 15 months ago by jhpalmieri
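Crash frequencies such as "more than half the time" versus "once every 5 or 10 times" are easier to compare when measured over many runs. A minimal sketch of such a harness follows; crash_rate is a hypothetical helper built on the standard subprocess module, and it counts any nonzero exit status as a failure. It assumes each run is a separate process, since a crash takes down the whole interpreter.

```python
import os
import subprocess
import sys


def crash_rate(cmd, runs=20, extra_env=None, timeout=600):
    """Run `cmd` (a list of arguments) `runs` times and return the fraction
    of runs that ended with a nonzero exit status.

    A fatal signal shows up either as a negative return code (the child was
    killed directly) or as 128 + signal number (reported by a shell wrapper
    such as sage-python); both count as nonzero here.
    """
    env = dict(os.environ)
    env.update(extra_env or {})
    failures = 0
    for _ in range(runs):
        result = subprocess.run(cmd, env=env, stdout=subprocess.DEVNULL,
                                stderr=subprocess.DEVNULL, timeout=timeout)
        if result.returncode != 0:
            failures += 1
    return failures / runs


if __name__ == "__main__":
    # Sanity check with the current interpreter: this command never fails.
    print(crash_rate([sys.executable, "-c", "pass"], runs=3))  # 0.0
```

With a sage executable on the PATH, a call such as crash_rate(["sage", "-c", "random_matrix(GF(2), 6, 133).smith_form()"], runs=50) would give a rough failure rate for one installation. On Linux, passing extra_env={"MALLOC_CHECK_": "3"} adds glibc's heap checks; that variable has no effect on OS X.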

comment:21 Changed 15 months ago by gh-DaveWitteMorris

In case this is useful information...

I tried some runs where (I think) I replaced all of the matrix multiplications in smith_form, _smith_onestep, and _generic_clear_column with high-school-level row-by-column multiplication. That didn't seem to make much difference, but one of the runs gave an error message that may be more interesting, because it seems to show that the crash was in new_matrix.

python3(953,0x10c845dc0) malloc: Incorrect checksum for freed object 0x7fc47fdaa450: probably modified after being freed.
Corrupt value: 0x0
python3(953,0x10c845dc0) malloc: *** set a breakpoint in malloc_error_break to debug
Traceback (most recent call last):
  File "/Users/dmorris/Downloads/smith_is_random.sage.py", line 12, in <module>
    S, U, V = M.smith_form(transformation=True)
  File "sage/matrix/matrix2.pyx", line 13675, in sage.matrix.matrix2.Matrix.smith_form (build/cythonized/sage/matrix/matrix2.c:92723)
  File "sage/matrix/matrix2.pyx", line 13654, in sage.matrix.matrix2.Matrix.smith_form (build/cythonized/sage/matrix/matrix2.c:92191)
  File "sage/matrix/matrix1.pyx", line 2442, in sage.matrix.matrix1.Matrix.new_matrix (build/cythonized/sage/matrix/matrix1.c:18728)
  File "/Users/dmorris/Documents/misc/Programs/sage3/local/lib/python3.7/site-packages/sage/matrix/matrix_space.py", line 816, in __call__

Line 13675 of matrix2.pyx is dd, uu, vv = mm.smith_form(transformation=True)

Line 13654 of matrix2.pyx is left_mat = self.new_matrix(self.nrows(), self.nrows(), 1)

Line 2442 of matrix1.pyx is return self.matrix_space(nrows, ncols, sparse)(entries=entries, coerce=coerce, copy=copy)

(The line numbers in matrix2.pyx are 9 more than usual, because I inserted the definition of a function to do matrix multiplication.)

comment:22 Changed 15 months ago by jhpalmieri

At some point I was wondering if new_matrix was the problem, since I couldn't see how else the matrix t could have been affected. Some sort of overflow or something in the use of M.get_unsafe or M.set_unsafe?

comment:23 Changed 15 months ago by dimpase

Can you install Xcode 10 and see if using it instead works?

comment:24 Changed 15 months ago by dimpase

https://developer.apple.com/documentation/xcode_release_notes/xcode_11_release_notes has some interesting remarks:

Clang now provides a mechanism for controlling exit-time destructor registration. You can disable these globally with the flag -fno-c++-static-destructors, or apply the attribute [[clang::no_destroy]] to disable the destructors of specific variables. The attribute [[clang::always_destroy]] was also added to enable destructors of specific variables when -fno-c++-static-destructors is used. (21734598)

and perhaps even more relevant:

the static linker (ld) now moves globals that are marked as constant into a new segment: __DATA_CONST. These globals may consist of compiler generated pointers that the dynamic linker (dyld) needs to fix up during load, but are otherwise constant such as vtables and explicitly declared constant pointers. Once dyld has finished loading the image it makes __DATA_CONST readonly. This change doesn’t impact well behaved code, but may break code that depends on undefined behavior such as using a type pun to write to a pointer that’s declared as const. (50898833)

comment:25 Changed 15 months ago by jhpalmieri

I tried to install Xcode 10, and I still see the same problem. That is, I deleted the Xcode 11 app and installed Xcode 10, and I removed /Library/Developer/CommandLineTools/. To get the Sage build to succeed, I had to reinstall the command line tools. clang --version says Apple LLVM version 10.0.1 (clang-1001.0.46.4), but there could still be remnants of Xcode 11 in /usr/lib or other places.

Last edited 15 months ago by jhpalmieri

comment:26 Changed 15 months ago by dimpase

  • Cc mkoeppe added
  • Priority changed from critical to blocker

So the problem is likely in macOS rather than in the compiler?

Tough.

comment:27 Changed 15 months ago by jhpalmieri

If people who are better at debugging C code on OS X could take a look, maybe they could figure out exactly what's going on.

comment:28 Changed 15 months ago by vbraun

I can reproduce the issue on Linux by enabling glibc's malloc consistency checks:

[release@zen Sage]$ MALLOC_CHECK_=3 ./sage
┌────────────────────────────────────────────────────────────────────┐
│ SageMath version 9.1.beta0, Release Date: 2020-01-10               │
│ Using Python 3.7.3. Type "help()" for help.                        │
└────────────────────────────────────────────────────────────────────┘
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃ Warning: this is a prerelease version, and it may be unstable.     ┃
┗━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┛
sage: random_matrix(GF(2), 6, 133).smith_form()
free(): invalid pointer
------------------------------------------------------------------------
/home/release/Sage/local/lib/python3.7/site-packages/cysignals/signals.cpython-37m-x86_64-linux-gnu.so(+0x9158)[0x7f8f19b29158]
/home/release/Sage/local/lib/python3.7/site-packages/cysignals/signals.cpython-37m-x86_64-linux-gnu.so(+0x91f9)[0x7f8f19b291f9]
/home/release/Sage/local/lib/python3.7/site-packages/cysignals/signals.cpython-37m-x86_64-linux-gnu.so(+0xcb6d)[0x7f8f19b2cb6d]
/lib64/libpthread.so.0(+0x14b20)[0x7f8f29881b20]
/lib64/libc.so.6(gsignal+0x145)[0x7f8f2958c625]
/lib64/libc.so.6(abort+0x12b)[0x7f8f295758d9]
/lib64/libc.so.6(+0x804af)[0x7f8f295d04af]
/lib64/libc.so.6(+0x87a9c)[0x7f8f295d7a9c]
/lib64/libc.so.6(+0x8b982)[0x7f8f295db982]
/lib64/libm4ri-0.0.20140914.so(m4ri_mmc_free+0x6c)[0x7f8c1ecbae1c]
/lib64/libm4ri-0.0.20140914.so(mzd_free+0x86)[0x7f8c1ec9dcc6]
/home/release/Sage/local/lib/python3.7/site-packages/sage/matrix/matrix_mod2_dense.cpython-37m-x86_64-linux-gnu.so(+0xc84f)[0x7f8c1ecf484f]
/home/release/Sage/local/lib/python3.7/site-packages/sage/matrix/matrix2.cpython-37m-x86_64-linux-gnu.so(+0x16f493)[0x7f8c1fa25493]
/home/release/Sage/local/lib/python3.7/site-packages/sage/matrix/matrix2.cpython-37m-x86_64-linux-gnu.so(+0x17637c)[0x7f8c1fa2c37c]
/home/release/Sage/local/lib/libpython3.7m.so.1.0(PyCFunction_Call+0x10d)[0x7f8f2998108d]

Running it under valgrind on Linux shows what looks like an out-of-bounds write:

==2156412== Invalid write of size 8
==2156412==    at 0x1D0870F3: mzd_submatrix (in /usr/lib64/libm4ri-0.0.20140914.so)
==2156412==    by 0x1D0080FC: __pyx_pf_4sage_6matrix_17matrix_mod2_dense_17Matrix_mod2_dense_44submatrix (matrix_mod2_dense.c:12591)
==2156412==    by 0x1D008570: __pyx_pw_4sage_6matrix_17matrix_mod2_dense_17Matrix_mod2_dense_45submatrix (matrix_mod2_dense.c:12196)
==2156412==    by 0x48D819B: cfunction_call_varargs (call.c:753)
==2156412==    by 0x48DAD7A: PyCFunction_Call (call.c:784)
==2156412==    by 0x1BF33A66: __Pyx_PyObject_Call (matrix2.c:120348)
==2156412==    by 0x1BFFF7A6: __pyx_pf_4sage_6matrix_7matrix2_6Matrix_266smith_form (matrix2.c:92128)
==2156412==    by 0x1C004F98: __pyx_pw_4sage_6matrix_7matrix2_6Matrix_267smith_form (matrix2.c:91182)
==2156412==    by 0x48D819B: cfunction_call_varargs (call.c:753)
==2156412==    by 0x48DAD7A: PyCFunction_Call (call.c:784)
==2156412==    by 0x1BF33A66: __Pyx_PyObject_Call (matrix2.c:120348)
==2156412==    by 0x1BFFF864: __pyx_pf_4sage_6matrix_7matrix2_6Matrix_266smith_form (matrix2.c:92156)
==2156412==  Address 0x14dfb210 is 0 bytes after a block of size 16 alloc'd
==2156412==    at 0x483BE45: memalign (vg_replace_malloc.c:908)
==2156412==    by 0x483BF47: posix_memalign (vg_replace_malloc.c:1072)
==2156412==    by 0x1D09DD45: m4ri_mmc_malloc (in /usr/lib64/libm4ri-0.0.20140914.so)
==2156412==    by 0x1D0808FF: mzd_init (in /usr/lib64/libm4ri-0.0.20140914.so)
==2156412==    by 0x1D025396: __pyx_pf_4sage_6matrix_17matrix_mod2_dense_17Matrix_mod2_dense___cinit__ (matrix_mod2_dense.c:3763)
==2156412==    by 0x1D026B0C: __pyx_pw_4sage_6matrix_17matrix_mod2_dense_17Matrix_mod2_dense_1__cinit__ (matrix_mod2_dense.c:3576)
==2156412==    by 0x1D026BE5: __pyx_tp_new_4sage_6matrix_17matrix_mod2_dense_Matrix_mod2_dense (matrix_mod2_dense.c:20399)
==2156412==    by 0x493CDBD: type_call (typeobject.c:929)
==2156412==    by 0x48D9180: _PyObject_FastCallKeywords (call.c:199)
==2156412==    by 0x49BED1A: call_function (ceval.c:4619)
==2156412==    by 0x49CBC59: _PyEval_EvalFrameDefault (ceval.c:3093)
==2156412==    by 0x49C00C0: PyEval_EvalFrameEx (ceval.c:547)
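A write "0 bytes after a block of size 16" lands exactly on whatever the allocator placed next. That could be one way an unrelated value changes silently, as with t[0,0] becoming 0 in comment 7, or the "modified after being freed" checksum errors on OS X, but the toy model below only illustrates the mechanism and does not show that this is what happened. It is a pure-Python sketch: a bytearray stands in for the heap, with two 16-byte blocks placed back to back. Real allocators add metadata and alignment, so actual layouts differ. The helper names are hypothetical.

```python
WORD = 8  # bytes in a 64-bit word


def write_words(arena, start, words):
    """Write little-endian 64-bit words at byte offset `start`. There is no
    check against the logical block size, only against the toy arena."""
    if start + WORD * len(words) > len(arena):
        raise ValueError("write would run past the toy arena")
    for k, w in enumerate(words):
        off = start + WORD * k
        arena[off:off + WORD] = w.to_bytes(WORD, "little")


def neighbour_after_copy(words_copied, block_words=2, sentinel=1):
    """Toy heap: block A (block_words words) directly followed by block B.

    B's first word is set to `sentinel`, then `words_copied` zero words are
    written into A. Returns B's first word afterwards.
    """
    arena = bytearray(2 * block_words * WORD)
    b_start = block_words * WORD
    write_words(arena, b_start, [sentinel])
    write_words(arena, 0, [0] * words_copied)
    return int.from_bytes(arena[b_start:b_start + WORD], "little")


print(neighbour_after_copy(2))  # 1: the copy stays inside A
print(neighbour_after_copy(3))  # 0: the third word lands "0 bytes after" A
```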

comment:29 Changed 15 months ago by vbraun

  • Cc malb added

A shorter crasher:

./sage -c 'random_matrix(GF(2), 2, 129).submatrix(1, 1)'

results in an invalid-free message on both Linux (with MALLOC_CHECK_=3) and OS X.

We call mzd_submatrix(A._entries, self._entries, 1, 1, 2, 129)
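The valgrind numbers can be checked with simple arithmetic. GF(2) rows are bit-packed into 64-bit words. Assume, for this sketch only, that each row takes ceil(ncols / 64) words with no padding, which matches the 16-byte block valgrind reports for the 1 x 128 result. Then the 129-column source row needs 3 words, while the destination row has room for 2. If mzd_submatrix wrote one word more than the destination row holds, that 8-byte write would land exactly 0 bytes after the 16-byte block. The helpers below are hypothetical. The arithmetic alone does not explain why 129 is special, because 65 columns also drop from 2 words to 1, and the 128 x 128 case above reportedly works. The actual fix is the one made upstream in M4RI.

```python
WORD_BITS = 64


def words_per_row(ncols):
    """64-bit words for one packed GF(2) row of ncols bits (no padding assumed)."""
    return -(-ncols // WORD_BITS)  # ceiling division


def submatrix_row_words(ncols, col=1):
    """Words per row before and after dropping the first `col` columns."""
    return words_per_row(ncols), words_per_row(ncols - col)


src, dst = submatrix_row_words(129)
print(src, dst)                  # 3 2
print(dst * 8)                   # 16 bytes for the single 1 x 128 result row
print(submatrix_row_words(65))   # (2, 1)
```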

comment:30 Changed 15 months ago by dimpase

Even easier:

$ MALLOC_CHECK_=3 ./sage -c 'zero_matrix(GF(2), 2, 129).submatrix(1, 1)'
free(): invalid pointer
------------------------------------------------------------------------
/home/dimpase/sage/local/lib/python3.7/site-packages/cysignals/signals.cpython-37m-x86_64-linux-gnu.so(+0x9248)[0x79b7b2fa3248]
/home/dimpase/sage/local/lib/python3.7/site-packages/cysignals/signals.cpython-37m-x86_64-linux-gnu.so(+0x92e8)[0x79b7b2fa32e8]
/home/dimpase/sage/local/lib/python3.7/site-packages/cysignals/signals.cpython-37m-x86_64-linux-gnu.so(+0xce6d)[0x79b7b2fa6e6d]
/lib/x86_64-linux-gnu/libpthread.so.0(+0x12730)[0x79b7b3749730]
/lib/x86_64-linux-gnu/libc.so.6(gsignal+0x10b)[0x79b7b341e7bb]
/lib/x86_64-linux-gnu/libc.so.6(abort+0x121)[0x79b7b3409535]
/lib/x86_64-linux-gnu/libc.so.6(+0x79508)[0x79b7b3460508]
/lib/x86_64-linux-gnu/libc.so.6(+0x7fc1a)[0x79b7b3466c1a]
/lib/x86_64-linux-gnu/libc.so.6(+0x83b3e)[0x79b7b346ab3e]
/usr/lib/x86_64-linux-gnu/libm4ri-0.0.20140914.so(m4ri_mmc_cleanup+0x43)[0x79b74bb56973]
/usr/lib/x86_64-linux-gnu/libm4ri-0.0.20140914.so(m4ri_fini+0x9)[0x79b74bb316f9]
/lib64/ld-linux-x86-64.so.2(+0xf6f6)[0x79b7b3af26f6]
/lib/x86_64-linux-gnu/libc.so.6(+0x39d8c)[0x79b7b3420d8c]
/lib/x86_64-linux-gnu/libc.so.6(+0x39eba)[0x79b7b3420eba]
/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf2)[0x79b7b340b0a2]
python3(_start+0x2a)[0x5988f27b608a]

And essentially the same happens with the new m4ri version from #29026.

comment:31 Changed 15 months ago by dimpase

By the way, 129 is a kind of magic value: nearby values don't produce a crash.

comment:32 Changed 15 months ago by dimpase

  • Report Upstream changed from N/A to Reported upstream. No feedback yet.

comment:33 Changed 15 months ago by vbraun

Martin says he'll have a look.

comment:34 Changed 15 months ago by malb

This should be fixed now in M4RI's master. I haven't tested against Sage yet, though. Happy to cut a new release, obviously.

Last edited 15 months ago by malb

comment:35 Changed 15 months ago by dimpase

#29026 now has this patch in. Please verify it fixes the issue.

comment:36 Changed 15 months ago by gh-DaveWitteMorris

Yes, #29026 eliminated the problem for me. I tested thousands of times and never got any kind of error. (It also fixed #28970 for me.) Thanks!!

comment:37 Changed 15 months ago by jhpalmieri

  • Milestone changed from sage-9.1 to sage-duplicate/invalid/wontfix
  • Status changed from new to needs_review

Looks good to me, too. Let's close this as a duplicate of #29026.

comment:38 Changed 15 months ago by jhpalmieri

  • Status changed from needs_review to positive_review

comment:39 Changed 15 months ago by jhpalmieri

  • Reviewers set to John Palmieri

comment:40 Changed 15 months ago by chapoton

  • Resolution set to duplicate
  • Status changed from positive_review to closed