Opened 9 years ago

Closed 9 years ago

Last modified 7 years ago

#12480 closed defect (fixed)

NTL segfault on OS X 10.7

Reported by: jdemeyer Owned by: was
Priority: critical Milestone: sage-5.0
Component: interfaces Keywords:
Cc: Merged in: sage-5.0.beta8
Authors: David Roe Reviewers: William Stein, Jeroen Demeyer
Report Upstream: N/A Work issues:
Branch: Commit:
Dependencies: Stopgaps:

Status badges

Description (last modified by jdemeyer)

On OS X 10.7 (compiled with gcc-4.6.2, #12369):

The following tests failed:

        sage -t  --long -force_lib devel/sage/sage/rings/padics/padic_ZZ_pX_CR_element.pyx # Killed/crashed

This is because executing the following code (from the file devel/sage/sage/rings/padics/padic_ZZ_pX_CR_element.pyx):

R = Zp(5,5)
S.<x> = R[]
f = x^5 + 75*x^3 - 15*x^2 +125*x - 5
W.<w> = R.ext(f)
pol = ntl.ZZ_pX([5^40,5^42,3*5^41], 5^44)
W(pol, relprec = 0)

crashes Sage:

Program received signal EXC_BAD_ACCESS, Could not access memory.
Reason: KERN_INVALID_ADDRESS at address: 0x0000000300000002
0x0000000101791ae1 in CopyPointer (dst=@0x10190b700, src=0x300000002) at ZZ_p.c:157
157           if (src->ref_count == NTL_MAX_LONG)
Current language:  auto; currently c++
(gdb) bt
#0  0x0000000101791ae1 in CopyPointer (dst=@0x10190b700, src=0x300000002) at ZZ_p.c:157
#1  0x0000000101791d46 in NTL::ZZ_pContext::restore (this=0x100191148) at ZZ_p.c:205
#2  0x000000010164e1aa in ZZ_pX_conv_modulus ()
#3  0x00000001063a1ff7 in __pyx_f_4sage_5rings_6padics_16pow_computer_ext_ZZ_pX_eis_shift_p (__pyx_v_self=0x10db203f0, __pyx_v_x=0x10e057f18, __pyx_v_a=0x10e057f18, __pyx_v_n=200, __pyx_v_finalprec=0) at pow_computer_ext.cpp:6173
#4  0x0000000106392871 in __pyx_f_4sage_5rings_6padics_16pow_computer_ext_27PowComputer_ZZ_pX_small_Eis_eis_shift (__pyx_v_self=0x10db203f0, __pyx_v_x=0x10e057f18, __pyx_v_a=0x10e057f18, __pyx_v_n=200, __pyx_v_finalprec=0) at pow_computer_ext.cpp:12917
#5  0x000000010638aced in __pyx_f_4sage_5rings_6padics_16pow_computer_ext_17PowComputer_ZZ_pX_eis_shift_capdiv (__pyx_v_self=0x10db203f0, __pyx_v_x=0x10e057f18, __pyx_v_a=0x10e057f18, __pyx_v_n=200, __pyx_v_finalprec=0) at pow_computer_ext.cpp:9448
#6  0x00000001064e0aba in __pyx_f_4sage_5rings_6padics_22padic_ZZ_pX_CR_element_18pAdicZZpXCRElement__internal_lshift (__pyx_v_self=0x10e057ef0, __pyx_v_shift=-200) at padic_ZZ_pX_CR_element.cpp:9197
#7  0x00000001064dfd8b in __pyx_f_4sage_5rings_6padics_22padic_ZZ_pX_CR_element_18pAdicZZpXCRElement__set_from_ZZ_pX_part2 (__pyx_v_self=0x10e057ef0, __pyx_v_poly=0x10e075f98) at padic_ZZ_pX_CR_element.cpp:8445
#8  0x00000001064de4c7 in __pyx_f_4sage_5rings_6padics_22padic_ZZ_pX_CR_element_18pAdicZZpXCRElement__set_from_ZZ_pX_rel (__pyx_v_self=0x10e057ef0, __pyx_v_poly=0x10e075f98, __pyx_v_ctx=0x10e078730, __pyx_v_relprec=0) at padic_ZZ_pX_CR_element.cpp:8146
#9  0x00000001064fe4ee in __pyx_pf_4sage_5rings_6padics_22padic_ZZ_pX_CR_element_18pAdicZZpXCRElement___init__ (__pyx_v_self=0x10e057ef0, __pyx_args=0x10db23d40, __pyx_kwds=0x10da8f290) at padic_ZZ_pX_CR_element.cpp:5163
#10 0x000000010007a1a7 in type_call ()
Previous frame inner to this frame (gdb could not unwind past this frame)

This is a huge can of worms. The relevant code is in sage/rings/padics/pow_computer_ext.pyx:

    cdef ntl_ZZ_pContext_class get_context(self, long n):
        """
        Returns the context for p^n.

        Note that this function will raise an Index error if n > self.cache_limit.
        Also, it will return None on input 0
        [...]
        """
        if n < 0:
            n = -n
        try:
            return self.c[n]
        except IndexError:
            return PowComputer_ZZ_pX.get_context(self, n)

On input 0, we have a Py_None typecast to a ntl_ZZ_pContext_class and we pretend like that's okay. How does this make sense? The fact that it crashes on OS X 10.7 is expected, the surprising thing is that this doesn't crash on other systems.

The None in the self.c[] array comes from line 1671 in pow_computer_ext.pyx:

            self.c.append(None)
            for i from 1 <= i <= cache_limit:
                self.c.append(PowComputer_ZZ_pX.get_context(self,i))

Attachments (2)

12480.patch (1.9 KB) - added by roed 9 years ago.
12480_review.patch (14.9 KB) - added by jdemeyer 9 years ago.
Reviewer patch, apply on top of previous

Download all attachments as: .zip

Change History (21)

comment:1 Changed 9 years ago by jdemeyer

  • Description modified (diff)

comment:2 Changed 9 years ago by jdemeyer

  • Description modified (diff)

comment:3 Changed 9 years ago by jdemeyer

  • Description modified (diff)

comment:4 Changed 9 years ago by jdemeyer

  • Description modified (diff)

Changed 9 years ago by roed

comment:5 Changed 9 years ago by was

  • Status changed from new to needs_review

Positive review to the patch posted right when I write this.

comment:6 Changed 9 years ago by was

  • Status changed from needs_review to positive_review

comment:7 Changed 9 years ago by jdemeyer

  • Authors set to David Roe
  • Reviewers set to William Stein, Jeroen Demeyer
  • Status changed from positive_review to needs_work

Reviewer patch fixes some documentation strings and also removes exceptions which will be ignored anyway (a cdef function can only raise exceptions if it is declared with an "except" value or if it returns a Python object).

comment:8 Changed 9 years ago by jdemeyer

  • Status changed from needs_work to needs_review

comment:9 Changed 9 years ago by jdemeyer

William or David, could any of you please review my reviewer patch?

comment:10 follow-up: Changed 9 years ago by was

  • Status changed from needs_review to needs_work

David and I discussed ValueError? versus RuntimeError? when he wrote this. I initially suggested ValueError? (as you changed it to), but he argued that this error should *never* occur, ever -- there should be no way to call this code from Python to produce this error. I'm convinced by David.

comment:11 in reply to: ↑ 10 Changed 9 years ago by jdemeyer

Replying to was:

he argued that this error should *never* occur, ever

I don't see why this is an argument for RuntimeError instead of ValueError.

How would you like SystemError instead? Of all standard Python exceptions, it seems to be the closest equivalent to what you want: http://docs.python.org/library/exceptions.html#exceptions.SystemError

comment:12 Changed 9 years ago by jdemeyer

* ping *

comment:13 Changed 9 years ago by jdemeyer

* ahem *

comment:14 follow-up: Changed 9 years ago by roed

Sorry: I wasn't getting trac notifications on this ticket for some reason.

The changes look good, except the removal of the three checks for n < 0 in pow_mpz_t_tmp and pow_ZZ_tmp where a ValueError? is raised. While I understand that Cython will ignore those errors, I think the right option is to open another ticket and modify what is done, rather than just removing the checks. With these changes it's easier for these functions to segfault on incorrect input.

Changed 9 years ago by jdemeyer

Reviewer patch, apply on top of previous

comment:15 Changed 9 years ago by jdemeyer

  • Status changed from needs_work to needs_review

I kept all raise statements but annotated them by

# Exception will be ignored by Cython

comment:16 follow-up: Changed 9 years ago by roed

  • Status changed from needs_review to positive_review

Looks good. Have you created a ticket to fix these ignored exceptions, or shall I?

comment:17 in reply to: ↑ 16 Changed 9 years ago by jdemeyer

Replying to roed:

Looks good. Have you created a ticket to fix these ignored exceptions, or shall I?

I don't plan to deal with this code, so go ahead.

comment:18 Changed 9 years ago by jdemeyer

  • Merged in set to sage-5.0.beta8
  • Resolution set to fixed
  • Status changed from positive_review to closed

comment:19 in reply to: ↑ 14 Changed 7 years ago by saraedum

Replying to roed:

The changes look good, except the removal of the three checks for n < 0 in pow_mpz_t_tmp and pow_ZZ_tmp where a ValueError? is raised. While I understand that Cython will ignore those errors, I think the right option is to open another ticket and modify what is done, rather than just removing the checks. With these changes it's easier for these functions to segfault on incorrect input.

Does this refer to how Cython used to work? Does it make sense to add an except NULL to the function definition now? I stumbled upon this while working on #13591.

Note: See TracTickets for help on using tickets.