Ticket #8177 (needs_work defect)

Opened 3 years ago

Last modified 8 weeks ago

Breaking Integer's invariant can lead to a segfault in Sage 4.3.2.alpha1, Mac OS X 10.6.2

Reported by: mvngu Owned by: tbd
Priority: major Milestone: sage-5.10
Component: documentation Keywords:
Cc: nthiery, mvngu, hivert Work issues:
Report Upstream: N/A Reviewers:
Authors: Merged in:
Dependencies: Stopgaps:

Description (last modified by nthiery) (diff)

Integer assumes the following invariant:

x.parent() == ZZ <==> x.class == Integer

Breaking this invariant (which is easy to do using plain Python) can lead to a segfault.

See the following discussion for how it was detected:

From  sage-devel and also  reported here:

> built fine on mac 10.6.2 but one failure for sage -testall :

> The following tests failed:

>        sage -t  "devel/sage/sage/structure/element_wrapper.py" # Segfault

I get the same result on bsd.math (Mac OS X 10.6.2). Doing a verbose
long doctest, I get:

Trying:
    Integer(1) < l11###line 213:_sage_    >>> 1 < l11
Expecting:
    False

------------------------------------------------------------
Unhandled SIGSEGV: A segmentation fault occured in SAGE.
This probably occured because a *compiled* component
of SAGE has a bug in it (typically accessing invalid memory)
or is not properly wrapped with _sig_on, _sig_off.
You might want to run SAGE under gdb with 'sage -gdb' to debug this.
SAGE will now terminate (sorry).
------------------------------------------------------------ 

See #8200 which improves the doctests of ElementWrapper? to not trigger this bug anymore. #8200 does not resolve the issue though!

Attachments

trac_8177-element_wrapper_segfault-fh.patch Download (3.2 KB) - added by hivert 3 years ago.

Change History

comment:1 follow-up: ↓ 5 Changed 3 years ago by was

  • Priority changed from major to blocker

Here is a complete self-contained session that illustrates the problem:

wstein@bsd:~/build/sage-4.3.2.rc0$ ./sage
----------------------------------------------------------------------
| Sage Version 4.3.2.rc0, Release Date: 2010-02-03                   |
| Type notebook() for the GUI, and license() for information.        |
----------------------------------------------------------------------
**********************************************************************
*                                                                    *
* Warning: this is a prerelease version, and it may be unstable.     *
*                                                                    *
**********************************************************************
sage: class MyElement(ElementWrapper): __lt__ = ElementWrapper._lt_by_value
....:
sage: a = MyElement(1, parent = ZZ)
sage: 1 < a


------------------------------------------------------------
Unhandled SIGSEGV: A segmentation fault occured in SAGE.
This probably occured because a *compiled* component
of SAGE has a bug in it (typically accessing invalid memory)
or is not properly wrapped with _sig_on, _sig_off.
You might want to run SAGE under gdb with 'sage -gdb' to debug this.
SAGE will now terminate (sorry).
------------------------------------------------------------

wstein@bsd:~/build/sage-4.3.2.rc0$

comment:2 Changed 3 years ago by was

Oh, and here is the traceback:

sage: 1 < a                                                                 

Program received signal EXC_BAD_ACCESS, Could not access memory.
Reason: KERN_INVALID_ADDRESS at address: 0x0000000000000000
0x000000010164f04d in __gmpz_cmp ()
(gdb) bt
#0  0x000000010164f04d in __gmpz_cmp ()
#1  0x0000000103ab178d in __pyx_f_4sage_5rings_7integer_7Integer__cmp_c_impl (__pyx_v_left=0x1097f81b0, __pyx_v_right=0x10caade10) at sage/rings/integer.c:7334
#2  0x0000000102317afc in __pyx_f_4sage_9structure_7element_7Element__richcmp_c_impl (__pyx_v_left=0x1097f81b0, __pyx_v_right=0x10caade10, __pyx_v_op=0) at sage/structure/element.c:7177
#3  0x0000000102349553 in __pyx_f_4sage_9structure_7element_7Element__richcmp (__pyx_v_left=0x1097f81b0, __pyx_v_right=0x10caade10, __pyx_v_op=0) at sage/structure/element.c:6917
#4  0x0000000103aae77b in __pyx_pf_4sage_5rings_7integer_7Integer___richcmp__ (__pyx_v_left=<value temporarily unavailable, due to optimizations>, __pyx_v_right=<value temporarily unavailable, due to optimizations>, __pyx_v_op=<value temporarily unavailable, due to optimizations>) at sage/rings/integer.c:7293
#5  0x000000010004e15c in try_rich_compare ()
#6  0x000000010004faff in PyObject_RichCompare ()
#7  0x00000001000af100 in PyEval_EvalFrameEx ()
#8  0x00000001000b3e70 in PyEval_EvalCodeEx ()
...

comment:3 Changed 3 years ago by craigcitro

  • Cc nthiery added

As mentioned in one of the sage-devel threads linked above, the issue here is that ZZ assumes that if it's the parent of some x, then x is of type sage.rings.integer.Integer. (Similar assumptions are made all over.) It turns out that this popping up only on OSX was a weird artifact of compiler details/how memory is initialized/the alignment of the planets, as explained in the sage-devel thread.

I'm waiting for Nicolas to weigh in on what he thinks should happen with the ElementWrapper code -- he's probably the most appropriate person to spin a patch. (Nicolas, I'm adding you in the cc.)

Changed 3 years ago by hivert

comment:4 Changed 3 years ago by hivert

  • Cc mvngu, hivert added
  • Status changed from new to needs_review
  • Authors set to Florent Hivert

Hi there !

I uploaded a patch where a replaced ZZ as a parent to Sets().example("facade") which is much closer to a meaningfull parent. It should hide the segfault problem which is actually a problem in Integer. However, I don't have access to a Max OS machine so please tell me if it doesn't work.

Cheers,

Florent

comment:5 in reply to: ↑ 1 Changed 3 years ago by nthiery

Replying to was:

Here is a complete self-contained session that illustrates the problem:

wstein@bsd:~/build/sage-4.3.2.rc0$ ./sage
----------------------------------------------------------------------
| Sage Version 4.3.2.rc0, Release Date: 2010-02-03                   |
| Type notebook() for the GUI, and license() for information.        |
----------------------------------------------------------------------
**********************************************************************
*                                                                    *
* Warning: this is a prerelease version, and it may be unstable.     *
*                                                                    *
**********************************************************************
sage: class MyElement(ElementWrapper): __lt__ = ElementWrapper._lt_by_value
....:
sage: a = MyElement(1, parent = ZZ)
sage: 1 < a

Just for the record, could someone post the result of this more self contained example?

   sage: class F(Element):
    ....:     pass
    ....:
    sage: x = F(ZZ)
    sage: 1 < x
    False

comment:6 Changed 3 years ago by nthiery

  • Priority changed from blocker to major
  • Status changed from needs_review to needs_work
  • Description modified (diff)
  • Summary changed from element_wrapper.py: Sage 4.3.2.alpha1 segfault on Mac OS X 10.6.2 to Breaking Integer's invariant can lead to a segfault in Sage 4.3.2.alpha1, Mac OS X 10.6.2

I have worked further on Florent's patch, and made a separate ticket for it: #8200. I have removed the blocker priority on this, and set it on #8200. I leave this ticket open, since the issue is still there. I let the experts decide whether this should be a won't fix or not.

comment:7 follow-up: ↓ 8 Changed 3 years ago by wdj

For me, this patch fixed the segfault but still failed:

jeeves:sage-4.3.2.rc0 wdj$ ./sage -t  "devel/sage/sage/structure/element_wrapper.py" # Segfault
sage -t  "devel/sage/sage/structure/element_wrapper.py"     
**********************************************************************
File "/Users/wdj/sagefiles/sage-4.3.2.rc0/devel/sage/sage/structure/element_wrapper.py", line 213:
    sage: 1 < l11
Expected:
    False
Got:
    True
**********************************************************************
1 items had failures:
   1 of  14 in __main__.example_8
***Test Failed*** 1 failures.
For whitespace errors, see the file /Users/wdj/.sage//tmp/.doctest_element_wrapper.py
         [2.3 s]
 
----------------------------------------------------------------------
The following tests failed:


        sage -t  "devel/sage/sage/structure/element_wrapper.py"

This is on sage-4.3.2.rc0 and mac 10.6.2.

comment:8 in reply to: ↑ 7 ; follow-up: ↓ 9 Changed 3 years ago by nthiery

Replying to wdj:

For me, this patch fixed the segfault but still failed:

Could you please try with the patch on #8200? It should be fixed in principle. Thanks!

comment:9 in reply to: ↑ 8 ; follow-up: ↓ 10 Changed 3 years ago by wdj

Replying to nthiery:

Replying to wdj:

For me, this patch fixed the segfault but still failed:

Could you please try with the patch on #8200? It should be fixed in principle. Thanks!

I'm running sage -testall on #8200 now (the above single doctest passes though).

Does this new ticket mean you want Minh to make this as wontfix/invalid/duplicate and close it?

comment:10 in reply to: ↑ 9 ; follow-up: ↓ 11 Changed 3 years ago by nthiery

Replying to wdj:

Does this new ticket mean you want Minh to make this as wontfix/invalid/duplicate and close it?

No, it means that, with #8200, I have done my part at fixing my improper usage of ZZ. But the issue that such an improper usage can cause a segfault is still there, and I leave to the experts the decision of whether to fix it now, leave it to later, or resolve it as wontfix.

I personally vote -1 for making it a wontfix.

comment:11 in reply to: ↑ 10 Changed 3 years ago by hivert

  • Authors Florent Hivert deleted

No, it means that, with #8200, I have done my part at fixing my improper usage of ZZ. But the issue that such an improper usage can cause a segfault is still there, and I leave to the experts the decision of whether to fix it now, leave it to later, or resolve it as wontfix.

I personally vote -1 for making it a wontfix.

I strongly second Nicolas -1.

Moreover, if the segfault is not removed the invariant

    x.parent() == ZZ <==> x.class == Integer

must be clearly stated with a _big warning_ in the doc. My opinion is that the segfault must be left only if there is a very large performance penalty fixing it. By the way is this invariant an equivalence ? As far as I understood the segfault only came because we where breaking the ==> part.

Cheers,

Florent

By the way, I've removed myself as author since I won't be hacking in integers (my work in integrated in #8200).

comment:12 Changed 8 weeks ago by roed

  • Component changed from doctest to documentation
Note: See TracTickets for help on using tickets.