#3113 closed defect (fixed)
[with patch, with positive review] Major segfault related to modular symbols and pickling
Reported by: | was | Owned by: | craigcitro |
---|---|---|---|
Priority: | blocker | Milestone: | sage-3.0.3 |
Component: | modular forms | Keywords: | |
Cc: | wjp | Merged in: | |
Authors: | | Reviewers: | |
Report Upstream: | | Work issues: | |
Branch: | | Commit: | |
Dependencies: | | Stopgaps: | |
Description (last modified by )
    DATA = '/tmp/'
    def ranks(N):
        filename = '%sranks-%s.sobj'%(DATA,N)
        if os.path.exists(filename):
            D = load(filename)
        else:
            D = ModularSymbols(N,sign=1).cuspidal_submodule().new_submodule().decomposition()
        for i,A in enumerate(D):
            eps = -A.atkin_lehner_operator().matrix()[0,0]
            winding_element = A.rational_period_mapping()(A.ambient_module()([0,oo]))
            print N, i, eps, winding_element == 0
        save(D, filename)
then
    sage: ranks(11)
    11 0 1 False
    sage: ranks(11)
    BOOM!
GDB gives
    sage: ranks(11)
    Program received signal EXC_BAD_ACCESS, Could not access memory.
    Reason: KERN_PROTECTION_FAILURE at address: 0x00000001
    0x0077231c in __gmpn_gcd_1 ()
    (gdb) bt
    #0 0x0077231c in __gmpn_gcd_1 ()
    #1 0x0075ba88 in __gmpz_gcd ()
    Cannot access memory at address 0x5
    (gdb) The program is running. Exit anyway? (y or n) lo^H^H
    Please answer y or n.
Attachments (2)
Change History (16)
comment:1 Changed 13 years ago by
comment:2 Changed 13 years ago by
- Cc wjp added
More info:
    DATA = '/tmp/'
    def ranks(N):
        filename = '%sranks-%s.sobj'%(DATA,N)
        if os.path.exists(filename):
            print "loaded pickled object"
            D = load(filename)
        else:
            print "computing modular symbol"
            D = ModularSymbols(N,sign=1).cuspidal_submodule().new_submodule().decomposition()
        for i,A in enumerate(D):
            print "foo"
            eps = -A.atkin_lehner_operator().matrix()[0,0]
            print "bar"
            winding_element = A.rational_period_mapping()(A.ambient_module()([0,oo]))
            print "baz"
            print N, i, eps, winding_element == 0
        save(D, filename)
With that I get:
    sage: ranks(11)
    computing modular symbol
    foo
    bar
    baz
    11 0 1 False
    sage: ranks(11)
    loaded pickled object
    foo
    bar
    *** glibc detected *** realloc(): invalid size: 0x00007fffa25d3f90 ***
    /scratch/mabshoff/release-cycle/sage-3.0.2.alpha0/local/bin/sage-sage: line 214: 12788 Aborted sage-ipython "$@" -c "$SAGE_STARTUP_COMMAND;"
wjp saw the following under valgrind:
    Invalid free() / delete / delete[]
    ==10368== at 0x4C210F2: realloc (in /usr/lib64/valgrind/amd64-linux/vgpreload_memcheck.so)
    ==10368== by 0x925C230: __gmpz_realloc (in /data/sage/sage-3.0.rc0/local/lib/libgmp.so.3.4.1)
    ==10368== by 0x92550DE: __gmpz_gcd (in /data/sage/sage-3.0.rc0/local/lib/libgmp.so.3.4.1)
    ==10368== by 0x925F796: __gmpq_aors (in /data/sage/sage-3.0.rc0/local/lib/libgmp.so.3.4.1)
    ==10368== by 0x25B2D153: __pyx_f_4sage_6matrix_21matrix_rational_dense_21Matrix_rational_dense__vector_times_matrix_c_impl (matrix_rational_dense.c:5570)
    ==10368== by 0x2F0EEC19: __pyx_f_4sage_6matrix_6action_18VectorMatrixAction__call_c_impl (action.c:2903)
    ==10368== by 0xEE8208B: __pyx_f_4sage_10categories_6action_6Action__call_c (action.c:1684)
    ==10368== by 0xEC61415: __pyx_f_4sage_9structure_6coerce_24CoercionModel_cache_maps_bin_op_c (coerce.c:5018)
    ==10368== by 0xE7E180A: __pyx_pf_4sage_9structure_7element_6Vector___mul__ (element.c:10413)
    ==10368== by 0x417FDC: binary_op1 (abstract.c:398)
    ==10368== by 0x41B717: PyNumber_Multiply (abstract.c:669)
    ==10368== by 0x48463C: PyEval_EvalFrameEx (ceval.c:1073)
    ==10368== Address 0x7feffd4b0 is on thread 1's stack
    ==10368==
    ==10368== Invalid write of size 8
    ==10368== at 0x9254E04: __gmpz_gcd (in /data/sage/sage-3.0.rc0/local/lib/libgmp.so.3.4.1)
    ==10368== by 0x925F796: __gmpq_aors (in /data/sage/sage-3.0.rc0/local/lib/libgmp.so.3.4.1)
    ==10368== by 0x25B2D153: __pyx_f_4sage_6matrix_21matrix_rational_dense_21Matrix_rational_dense__vector_times_matrix_c_impl (matrix_rational_dense.c:5570)
    ==10368== by 0x2F0EEC19: __pyx_f_4sage_6matrix_6action_18VectorMatrixAction__call_c_impl (action.c:2903)
    ==10368== by 0xEE8208B: __pyx_f_4sage_10categories_6action_6Action__call_c (action.c:1684)
    ==10368== by 0xEC61415: __pyx_f_4sage_9structure_6coerce_24CoercionModel_cache_maps_bin_op_c (coerce.c:5018)
    ==10368== by 0xE7E180A: __pyx_pf_4sage_9structure_7element_6Vector___mul__ (element.c:10413)
    ==10368== by 0x417FDC: binary_op1 (abstract.c:398)
    ==10368== by 0x41B717: PyNumber_Multiply (abstract.c:669)
    ==10368== by 0x48463C: PyEval_EvalFrameEx (ceval.c:1073)
    ==10368== by 0x4886B1: PyEval_EvalCodeEx (ceval.c:2836)
    ==10368== by 0x4D1E17: function_call (funcobject.c:517)
    ==10368== Address 0x0 is not stack'd, malloc'd or (recently) free'd
And:
    [23:50] <mabshoff|zzz> wstein|afk: around?
    [23:50] <mabshoff|zzz> #3113 is quite odd to say the least.
    [23:50] <mabshoff|zzz> It valgrinds clean, but it either crashes on the first load or not at all it seems.
    [23:55] <wjp> I'm getting what appears to be a realloc() of stack memory in valgrind
    [23:55] <mabshoff|zzz> wjp: where?
    [23:55] <wjp> sec
    [23:55] <mabshoff|zzz> It seems to crash every time on OSX 10.5
    [23:55] <mabshoff|zzz> But on sage.math it is harder to hit.
    [23:56] <wjp> http://www.math.leidenuniv.nl/~wpalenst/sage/3113.valgrind
    [23:57] <mabshoff|zzz> ok, I didn't see that at all.
    [23:57] <wjp> oh, this is with 3.0.rc0, by the way
    [23:57] * wjp should build 3.0.1 at home
    [23:59] <mabshoff|zzz> mmh, that corresponds to "mpq_mul(y, w._entries[j], self._matrix[j][i])"
    [23:59] <mabshoff|zzz> wjp: I doubt that matters.
    [00:00] <wjp> still thought I'd mention it just in case :-)
    [00:00] <mabshoff|zzz> :)
    [00:00] <wjp> (and for line numbers and such)
    [00:00] <mabshoff|zzz> Maybe the pickling is broken
    [00:00] <mabshoff|zzz> Otherwise I can't imagine what goes wrong there.
    [00:00] <wjp> if it's trying to realloc stack memory, it sounds like the unpickling may be allocating things the wrong way?
    [00:01] <mabshoff|zzz> alignment comes to mind.
    [00:04] <mabshoff|zzz> wjp: what goes boom is "winding_element = A.rational_period_mapping()(A.ambient_module()([0,oo]))"
    [00:04] <wjp> hm, it only seems to crash here if I save and load in the same session
    [00:05] <mabshoff|zzz> yep
    [00:05] <mabshoff|zzz> When I do not deleted the pickled objects it seems fine.
Cheers,
Michael
comment:3 Changed 13 years ago by
- Description modified (diff)
comment:4 Changed 13 years ago by
This is a problem in the automatic coercion model code that dispatches how "vector times matrix" is done. To see this:
- Add print statements in matrix_rational_dense.pyx
      cdef Vector _vector_times_matrix_c_impl(self, Vector v):
          ...
          print "Input: v = ", v
          print "type(v) = ", type(v)
- Run the example (after deleting /tmp/*.sobj!!):
      sage: DATA = '/tmp/'
      sage: def ranks(N):
      ....:     filename = '%sranks-%s.sobj'%(DATA,N)
      ....:     if os.path.exists(filename):
      ....:         D = load(filename)
      ....:     else:
      ....:         D = ModularSymbols(N,sign=1).cuspidal_submodule().new_submodule().decomposition()
      ....:     for i,A in enumerate(D):
      ....:         eps = -A.atkin_lehner_operator().matrix()[0,0]
      ....:         winding_element = A.rational_period_mapping()(A.ambient_module()([0,oo]))
      ....:         print N, i, eps, winding_element == 0
      ....:     save(D, filename)
      ....:
      sage:
      sage: ranks(11)
      Input: v = (0, 1)
      type(v) = <type 'sage.modules.vector_rational_dense.Vector_rational_dense'>
      Input: v = (0, 1)
      type(v) = <type 'sage.modules.vector_rational_dense.Vector_rational_dense'>
      Input: v = (-1, 0)
      type(v) = <type 'sage.modules.vector_rational_dense.Vector_rational_dense'>
      11 0 1 False
      sage: ranks(11)
      Input: v = (-1, 0)
      type(v) = <type 'sage.modules.free_module_element.FreeModuleElement_generic_sparse'>
      ------------------------------------------------------------
      Unhandled SIGBUS: A bus error occured in SAGE.
      This probably occured because a *compiled* component
      of SAGE has a bug in it (typically accessing invalid memory)
      or is not properly wrapped with _sig_on, _sig_off.
      You might want to run SAGE under gdb with 'sage -gdb' to debug this.
      SAGE will now terminate (sorry).
      ------------------------------------------------------------
comment:5 Changed 13 years ago by
This is caused by the matrix action caching code, combined with two facts:
1. Sparse and dense matrix spaces of the same dimension and degree are considered "equal" in Sage.
2. Pickling and unpickling happens to change something from dense to sparse.
Fact 2 is certainly weird, and is probably caused by a mistake in some pure Python code for modular symbols. But it shouldn't result in segfaults.
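To make the interaction concrete, here is a minimal sketch in plain Python (runnable outside Sage) of how an equality-keyed cache can hand back an entry built for a different storage format. `ToySpace`, `simulated_reload` and `action_cache` are hypothetical names invented for this illustration; they are not Sage classes, and the "reload" only imitates the dense-to-sparse change rather than reproducing the real pickling code.

```python
class ToySpace:
    """Hypothetical stand-in for a free module (not Sage's class)."""

    def __init__(self, rank, sparse):
        self.rank = rank
        self.sparse = sparse

    # Fact 1: equality and hashing ignore the storage format.
    def __eq__(self, other):
        return isinstance(other, ToySpace) and self.rank == other.rank

    def __hash__(self):
        return hash(self.rank)


def simulated_reload(space):
    """Mimic fact 2: a save/load round trip that comes back sparse."""
    return ToySpace(space.rank, sparse=True)


dense = ToySpace(2, sparse=False)
# An ordinary dict compares keys with ==, as a per-class cache would.
action_cache = {dense: "action built for dense storage"}

reloaded = simulated_reload(dense)
print(reloaded == dense)        # the two spaces compare equal
print(reloaded.sparse)          # but the storage differs
print(action_cache[reloaded])   # so the lookup hits the dense-only action
```

In pure Python the mismatch is only a wrong string; in compiled code that assumes a dense memory layout, the same mismatch can become an invalid memory access.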
If one adds 4 lines to modules/free_module.py so that modular symbols spaces are *not* equal if one is sparse and one is dense, then the problem completely disappears.
Here's the diff that would do that:
    diff -r 0a4213d9da78 sage/modules/free_module.py
    --- a/sage/modules/free_module.py Tue May 06 10:12:53 2008 -0700
    +++ b/sage/modules/free_module.py Tue May 06 17:06:36 2008 -0700
    @@ -2784,6 +2784,8 @@ class FreeModule_ambient(FreeModule_gene
             if not isinstance(other, FreeModule_generic):
                 return cmp(type(self), type(other))
             if isinstance(other, FreeModule_ambient):
    +            c = cmp(self.is_sparse(), other.is_sparse())
    +            if c: return c
                 c = cmp(self.rank(), other.rank())
                 if c: return c
                 c = cmp(self.base_ring(), other.base_ring())
    @@ -3344,6 +3346,8 @@ class FreeModule_submodule_with_basis_pi
                 return 0
             if not isinstance(other, FreeModule_generic):
                 return cmp(type(self), type(other))
    +        c = cmp(self.is_sparse(), other.is_sparse())
    +        if c: return c
             c = cmp(self.ambient_vector_space(), other.ambient_vector_space())
             if c: return c
             c = cmp(self.dimension(), other.dimension())
I will leave it to Robert Bradshaw to decide what to do. Probably the right solution is that sparse and dense vector spaces should never compare equal. Hmmm. Thoughts?
comment:6 Changed 13 years ago by
We probably shouldn't be using dictionaries that compare keys by equality to cache this information. The coercion model itself uses a custom dictionary that keys on the object pointer, but the caching in the class itself is done via standard Python dicts. At the very least there should be a check; I'll write a patch to do that.
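The difference between the two kinds of dictionary can be sketched in plain Python as follows. `Parent` and `IdentityCache` are hypothetical names made up for this example; the custom dictionary used by the coercion model is not shown in this ticket, so this only illustrates the general idea of keying on the object pointer.

```python
class Parent:
    """Hypothetical parent whose equality ignores object identity."""

    def __init__(self, name):
        self.name = name

    def __eq__(self, other):
        return isinstance(other, Parent) and self.name == other.name

    def __hash__(self):
        return hash(self.name)


class IdentityCache:
    """Cache keyed on object identity (illustrative, not Sage's dictionary)."""

    def __init__(self):
        self._entries = {}

    def set(self, key, value):
        # Keep a reference to the key so its id() cannot be reused.
        self._entries[id(key)] = (key, value)

    def get(self, key, default=None):
        entry = self._entries.get(id(key))
        return default if entry is None else entry[1]


a = Parent("QQ^2")
b = Parent("QQ^2")          # equal to a, but a different object

by_equality = {a: "action for a"}
by_identity = IdentityCache()
by_identity.set(a, "action for a")

print(by_equality.get(b))   # found, because b == a
print(by_identity.get(b))   # not found, because b is not a
print(by_identity.get(a))   # found
```

An identity-keyed cache never confuses `a` with `b`, at the cost of recomputing the action for every distinct but equal parent.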
Changed 13 years ago by
comment:7 Changed 13 years ago by
- Summary changed from Major segfault related to modular symbols and pickling to [with patch] Major segfault related to modular symbols and pickling
The patch now adds an extra check to make sure that, for actions, coercion is performed when parents are equal but not the same object.
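The patch itself is not reproduced in this ticket text, so the following is only a sketch of the general idea in plain Python: before a cached action runs its fast path, it checks whether the element's parent is the very object it was built for, and converts the data if the parent is merely equal. `Space`, `coerce` and `DenseDotAction` are hypothetical names for this illustration.

```python
class Space:
    """Hypothetical parent: equal spaces may differ in storage."""

    def __init__(self, rank, sparse):
        self.rank = rank
        self.sparse = sparse

    def __eq__(self, other):
        return isinstance(other, Space) and self.rank == other.rank

    def __hash__(self):
        return hash(self.rank)

    def coerce(self, data):
        """Convert list (dense) or dict (sparse) data into this space's storage."""
        if isinstance(data, dict):
            items = data
        else:
            items = {i: x for i, x in enumerate(data) if x != 0}
        if self.sparse:
            return dict(items)
        return [items.get(i, 0) for i in range(self.rank)]


class DenseDotAction:
    """Fast path that only understands dense (list) storage."""

    def __init__(self, domain, weights):
        self.domain = domain
        self.weights = weights

    def __call__(self, parent, data):
        # The extra check: equal-but-not-identical parents go through coercion.
        if parent is not self.domain:
            if parent != self.domain:
                raise TypeError("action does not apply to this parent")
            data = self.domain.coerce(data)
        return sum(w * x for w, x in zip(self.weights, data))


dense = Space(2, sparse=False)
sparse = Space(2, sparse=True)
cache = {dense: DenseDotAction(dense, [10, 1])}

action = cache[sparse]              # equality-based cache hit
result = action(sparse, {0: -1})    # sparse data for the vector (-1, 0)
print(result)
```

Without the identity check, the fast path would iterate over the dictionary's keys instead of the vector's entries and silently compute the wrong value, which is the pure-Python analogue of the crash seen in the compiled code.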
comment:8 Changed 13 years ago by
- Summary changed from [with patch] Major segfault related to modular symbols and pickling to [with patch, needs review] Major segfault related to modular symbols and pickling
comment:9 Changed 13 years ago by
- Summary changed from [with patch, needs review] Major segfault related to modular symbols and pickling to [with patch, with negative review] Major segfault related to modular symbols and pickling
Patch looks reasonable, but this creates a doctest failure in sage/coding/linear_code.py. The __eq__ code there could probably be improved, but the error seems to be coming up in some sort of internal consistency check with the coercion model. I'm voting to kick this back to robertwb to think about, since he's the expert.
Here's the doctest failure:
    sage -t devel/sage-ref/sage/coding/linear_code.py
    **********************************************************************
    File "/sage/tmp/linear_code.py", line 139:
        sage: C == loads(dumps(C))
    Exception raised:
        Traceback (most recent call last):
          File "/sage/local/lib/python2.5/doctest.py", line 1228, in __run
            compileflags, 1) in test.globs
          File "<doctest __main__.example_0[13]>", line 1, in <module>
            C == loads(dumps(C))###line 139:
        sage: C == loads(dumps(C))
          File "/sage/local/lib/python2.5/site-packages/sage/coding/linear_code.py", line 984, in __eq__
            if scheck*c:
          File "element.pyx", line 2122, in sage.structure.element.Matrix.__mul__ (sage/structure/element.c:11352)
          File "coerce.pyx", line 267, in sage.structure.coerce.CoercionModel_cache_maps.bin_op_c (sage/structure/coerce.c:5039)
          File "coerce.pyx", line 527, in sage.structure.coerce.CoercionModel_cache_maps.get_action_c (sage/structure/coerce.c:7876)
          File "coerce.pyx", line 574, in sage.structure.coerce.CoercionModel_cache_maps.verify_action (sage/structure/coerce.c:8379)
        RuntimeError: There is a BUG in the coercion model:
            Action found for R <built-in function mul> S does not have the correct domains
            R = Full MatrixSpace of 3 by 7 dense matrices over Finite Field of size 2
            S = Vector space of dimension 7 over Finite Field of size 2
            (should be Full MatrixSpace of 3 by 7 dense matrices over Finite Field of size 2, Vector space of dimension 7 over Finite Field of size 2)
            action = Left action by Full MatrixSpace of 3 by 7 dense matrices over Finite Field of size 2 on Vector space of dimension 7 over Finite Field of size 2 (<type 'sage.categories.action.PrecomposedAction'>)
    **********************************************************************
Negative review until this gets corrected, but will be happy to give a positive once this is in order ...
comment:10 Changed 13 years ago by
- Priority changed from major to blocker
Changed 13 years ago by
comment:11 Changed 13 years ago by
- Summary changed from [with patch, with negative review] Major segfault related to modular symbols and pickling to [with patch, needs review] Major segfault related to modular symbols and pickling
Attached patch fixes the last small problem: somewhere there was a left instead of right.
comment:12 Changed 13 years ago by
- Summary changed from [with patch, needs review] Major segfault related to modular symbols and pickling to [with patch, with positive review] Major segfault related to modular symbols and pickling
Dyslexia strikes again! This is the right fix.
comment:13 Changed 13 years ago by
- Resolution set to fixed
- Status changed from new to closed
Merged in Sage 3.0.3.rc0
comment:14 Changed 13 years ago by
- Milestone changed from sage-3.0.4 to sage-3.0.3
This throws an error for me the first time around:
But with some slight fix I also get a crash:
Poking around!
Cheers,
Michael