Opened 10 years ago

Closed 8 years ago

Last modified 5 years ago

#11551 closed defect (worksforme)

Pari segfault on Sage startup in Cygwin

Reported by: kcrisman
Owned by: tbd
Priority: major
Milestone: sage-duplicate/invalid/wontfix
Component: porting: Cygwin
Keywords: pari
Cc: dimpase, mhansen, jdemeyer, jpflori
Merged in:
Authors:
Reviewers: Karl-Dieter Crisman, Jean-Pierre Flori
Report Upstream: N/A
Work issues:
Branch:
Commit:
Dependencies:
Stopgaps:

Description (last modified by kcrisman)

In both Windows XP and Windows 7 it is now possible (again) to build Sage on Cygwin. However, Sage has a segmentation fault in Pari upon startup.

This happens while initializing the Pynac I (init_pynac_I in sage/symbolic/pynac.pyx); ultimately, it is the mpfr number 1.00000000000 that causes the segfault when its ._pari_() method is run. Suggestions as to why that would be, and a potential fix, are welcome.

Attachments (3)

Parisegfault.PNG (87.8 KB) - added by kcrisman 10 years ago.
Screenshot of the problem
Screen shot 2011-06-29 at 9.19.14 PM.png (101.3 KB) - added by kcrisman 10 years ago.
Screenshot of last bits of backtrace from sage -gdb
Pari-2.5.0Segfault.png (76.8 KB) - added by kcrisman 10 years ago.
Segfault with #11130 applied

Change History (67)

Changed 10 years ago by kcrisman

Screenshot of the problem

comment:1 follow-up: Changed 10 years ago by kcrisman

I've attached a screenshot of the traceback - the best I can do in Cygwin with my limited experience.

comment:2 follow-up: Changed 10 years ago by jdemeyer

Can you please attach your sage/rings/real_mpfr.c?

Please do not report errors on non-released Sage versions (like sage-4.7.1.alpha4 in your case). Those versions can (and probably will) change slightly, which makes it harder to reproduce errors.

comment:3 in reply to: ↑ 1 Changed 10 years ago by dimpase

Replying to kcrisman:

I've attached a screenshot of the traceback - the best I can do in Cygwin with my limited experience.

hmm, you should be able to just copy the thing with your mouse and paste...

Perhaps try running in a better terminal window, such as mintty: http://code.google.com/p/mintty/

comment:4 in reply to: ↑ 2 ; follow-up: Changed 10 years ago by kcrisman

Replying to jdemeyer:

Can you please attach your sage/rings/real_mpfr.c?

I'll try - depends on whether my wifi will work. I'm not on that computer currently.


Please do not report errors on non-released Sage versions (like sage-4.7.1.alpha4 in your case). Those versions can (and probably will) change slightly, which makes it harder to reproduce errors.

Well, building on Cygwin is not exactly straightforward, and (at least for me) extremely time-consuming, so I wanted to make sure I had as bleeding-edge of code as possible to catch potential problems. I find it unlikely that patches or spkgs will currently be backed out just because they break Cygwin, though if that is not true, that would make this job much easier and I would be very grateful.

Luckily, Mike Hansen already had this error (almost assuredly the same one) in 4.7.alpha3 - see this sage-devel thread. So I think that is the place to look. He thought it was the new error handling or the Pari upgrade, but <uninformed opinion>the message sounds more like Pari itself</uninformed opinion>.


I'd love to try a better terminal - William also had suggested one at Sage Days 31 - but I've only been really using Cygwin for maybe a week, and so I wouldn't even know how to ask Cygwin to use a different shell. Cut-and-paste does not work, as far as I've been able to tell.

comment:5 in reply to: ↑ 4 Changed 10 years ago by kcrisman

Replying to kcrisman:

Replying to jdemeyer:

Can you please attach your sage/rings/real_mpfr.c?

I'll try - depends on whether my wifi will work. I'm not on that computer currently.

Okay, that's a 1.5 MB file, so I am just posting a link.

http://sage.math.washington.edu/home/kcrisman/real_mpfr.c

This would be so great if it was possible to track down without too much trouble.

comment:6 Changed 10 years ago by kcrisman

As another (possibly unrelated) data point, #6743 has two patches which change the behavior of sage/rings/complex_double.pyx to get Sage to start (well, a year or two ago).

comment:7 Changed 10 years ago by kcrisman

I put in print statements at every conceivable place. Here is as far as it gets:

    def _pari_(self):
<snip comments/docs>
        sig_on()
        if mpfr_nan_p(self.value) or mpfr_inf_p(self.value):
            raise ValueError, 'Cannot convert NaN or infinity to Pari float'

        # wordsize for PARI
        cdef unsigned long wordsize = sizeof(long)*8

        cdef int prec
        prec = (<RealField_class>self._parent).__prec

        # We round up the precision to the nearest multiple of wordsize.
        cdef int rounded_prec
        rounded_prec = (self.prec() + wordsize - 1) & ~(wordsize - 1)

        # Yes, assigning to self works fine, even in Pyrex.
        if rounded_prec > prec:
            self = RealField(rounded_prec)(self)

        cdef mpz_t mantissa
        cdef mp_exp_t exponent
        cdef GEN pari_float

        if mpfr_zero_p(self.value):
            pari_float = real_0_bit(-rounded_prec)
        else:
            # Now we can extract the mantissa, and it will be normalized
            # (the most significant bit of the most significant word will be 1).
            mpz_init(mantissa)
            exponent = mpfr_get_z_exp(mantissa, self.value)
 
            # WE GET HERE AND NO FURTHER
           
            # Create a PARI REAL
            pari_float = cgetr(2 + rounded_prec / wordsize)
            mpz_export(&pari_float[2], NULL, 1, wordsize/8, 0, 0, mantissa)
            mpz_clear(mantissa)
            setexpo(pari_float, exponent + rounded_prec - 1)
            setsigne(pari_float, mpfr_sgn(self.value))
        
        cdef PariInstance P
        P = sage.libs.pari.all.pari
        return P.new_gen(pari_float)

The relevant declarations are:

    # level1.h (incomplete!)
    
    GEN     cgetg_copy(long lx, GEN x)
    GEN     cgetg(long x, long y)
    GEN     cgeti(long x)
    GEN     cgetr(long x)
    long    itos(GEN x)
    GEN     real_0_bit(long bitprec)
    GEN     stoi(long s)

so cgetr is indeed from level1.h, which is where the sage -gdb backtrace ends up before raising the interrupt. What would get that to have problems?

Also attaching screenshot of the traceback.

Changed 10 years ago by kcrisman

Screenshot of last bits of backtrace from sage -gdb

comment:8 Changed 10 years ago by kcrisman

A little further "print"-ing revealed that the cgetr call is the problem. By the way, 2 + rounded_prec / wordsize = 2 + 64/32 = 4.
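The length arithmetic (2 + 64/32 = 4) can be reproduced with a few lines of Python. This is only a sketch of the rounding step quoted in comment:7, not Sage code; the function name is made up, and the precision of 53 bits is an assumption (the default RealField precision) chosen because it rounds up to the 64 seen here.

```python
def pari_real_length(prec_bits, wordsize):
    """Mirror the rounding in _pari_(): round the precision up to a
    multiple of the word size, then add the two PARI codewords."""
    rounded_prec = (prec_bits + wordsize - 1) & ~(wordsize - 1)
    return rounded_prec, 2 + rounded_prec // wordsize


# Assumed inputs: 53-bit precision on a 32-bit and on a 64-bit build.
print(pari_real_length(53, 32))  # (64, 4)
print(pari_real_length(53, 64))  # (64, 3)
```

So on a 32-bit build the argument passed to cgetr would be 4 words, which is a tiny request; a failure here points at the state of the stack rather than at the size asked for.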

What does

cgetr (x=(value optimized out))

mean? Does this mean that the Pari float will always have the same precision no matter what?

comment:9 Changed 10 years ago by kcrisman

GEN cgetr(long n) allocates memory on the stack for a t_REAL of length n, and initializes its first codeword. Identical to cgetg(n,t_REAL).

I'm going to try a few other things and then stop for now. But hopefully this helps.

comment:10 Changed 10 years ago by kcrisman

Trying even cgetg(4,t_REAL) raises a similar error. Pari seems to not be able to allocate anything - I don't know whether there is anything before this in initialization of Sage that has a problem.


Another data point: sage -gp works fine. Something in libpari might be off. How might I test that without actually starting Sage?

comment:11 Changed 10 years ago by kcrisman

I have now confirmed this with the released 4.7.1.alpha3 on both XP and Win7. It is very reproducible, always the same place.

comment:12 Changed 10 years ago by kcrisman

Another update: commenting out everything about initializing the Pynac I doesn't help, because there is another place in initialization this is used:

rings/qqbar.py:5800:    QQbar_I_nf = QuadraticField(-1, 'I', embedding=CC.gen())

which also causes the identical problem.

And _init_qqbar in sage/all.py seems like a fairly big thing to try to work around, even in testing. But commenting this out as well does allow Sage to start!

comment:13 follow-up: Changed 10 years ago by jdemeyer

Some things to try:

  1. The new PARI spkg from #11130.
  2. Compiling PARI with SAGE_DEBUG=yes and posting a backtrace again.

comment:14 follow-up: Changed 10 years ago by kcrisman

I tried 2. first. Not very exciting.

Program received signal SIGSEGV, Segmentation fault.
0x343a8ad5 in pari_err () from /home/.../sage-4.7.1.alpha3/devel/sage/sage-main/build/sage/rings/real_mpfr.dll

I couldn't get anything out of it that I hadn't seen before.

Again, knowing how to test whether libpari is working at all would be really helpful. The files in local/lib/ certainly exist, at any rate, and they are the ones created when I ./sage -f'ed it just now.

comment:15 Changed 10 years ago by kcrisman

I can't get 1. to install on Cygwin. Seems like a linking order error or something, see #11130.

comment:16 in reply to: ↑ 14 Changed 10 years ago by leif

Replying to kcrisman:

I tried 2. first. Not very exciting.

Program received signal SIGSEGV, Segmentation fault.
0x343a8ad5 in pari_err () from /home/.../sage-4.7.1.alpha3/devel/sage/sage-main/build/sage/rings/real_mpfr.dll

I couldn't get anything out of it that I hadn't seen before.

First "result" (i.e., I don't know yet why pari_err() is called at all, but see below):

Debugging this bottom-up, according to your nice screen shot the segfault originates from:

static void
err_init(void)
{
  /* make sure pari_err msg starts at the beginning of line */
  if (!pari_last_was_newline()) pari_putc('\n');
  pariOut->flush(); /***** THIS SEGFAULTS *****/
  pariErr->flush();
  pariOut = pariErr;
  term_color(c_ERR);
}

So obviously pariOut (and most probably also pariErr) aren't properly initialized at that point. (Note that line 885 in the vanilla PARI sources is the assignment statement, but we patch src/src/language/init.c such that we get an offset of +2 lines.)

PARI error number 14 is "errpile" (i.e. heap / [PARI] stack error), which is most probably raised for the same reason, namely because the PARI stack apparently isn't [yet] initialized when cgetr() gets called.

For the moment, it's up to someone else to donate his/her 2 ct or more... ;-)

comment:17 follow-ups: Changed 10 years ago by leif

I have no idea why real_mpfr[.pyx] shouldn't initialize [the] PARI [library] (i.e., the pari_instance variable defined in sage/libs/pari/gen.pyx), but you (Karl-Dieter) could verify it gets initialized by putting some print statement(s) into PariInstance's __init__(), preferably (also) around pari_init_opts(), to make sure the latter really gets called, because of

        if bot:
            return  # pari already initialized.

There are a few things that might be relevant here:

  • Cython doesn't support C enum constants (here e.g. INIT_DFTm), therefore one has to declare them as cdef extern ints, but I don't think that's the problem here.
  • bot is a very bad name for a global variable (of a library!), i.e. some other library / module might use the same for a different purpose, such that the one supposed to be PARI's may actually already have some non-zero value despite PARI not yet being initialized. (The early-return check in PariInstance's __init__() worsens that to some extent, though other problems would certainly arise later in that case.)
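The second point can be illustrated with a toy model in Python. Everything here is hypothetical (the class, the function and the addresses are invented for the illustration and are not PARI or Sage APIs); it only shows why an early return keyed on a global like bot is fragile if something else can give that global a non-zero value.

```python
class FakePariGlobals:
    """Hypothetical stand-in for PARI's global state (not the real API)."""

    def __init__(self, bot=0):
        self.bot = bot            # bottom of the PARI stack, 0 = unset
        self.stack_ready = False  # whether our own init actually ran


def guarded_init(state, stack_base=0x10000):
    """Mimic the 'if bot: return' guard in PariInstance's __init__()."""
    if state.bot:
        return  # treated as "pari already initialized"
    state.bot = stack_base
    state.stack_ready = True


fresh = FakePariGlobals()
guarded_init(fresh)
print(fresh.stack_ready)      # True: the init ran

clobbered = FakePariGlobals(bot=0xDEAD)  # some other library's 'bot'
guarded_init(clobbered)
print(clobbered.stack_ready)  # False: the init was skipped
```

In the second case the guard believes the library is ready although no stack was ever set up, which is the scenario described above.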

comment:18 in reply to: ↑ 17 ; follow-up: Changed 10 years ago by kcrisman

Replying to leif:

I have no idea why real_mpfr[.pyx] shouldn't initialize [the] PARI [library] (i.e., the pari_instance variable defined in sage/libs/pari/gen.pyx), but you (Karl-Dieter) could verify it gets initialized by putting some print statement(s) into PariInstance's __init__(), preferably (also) around pari_init_opts(),

Thanks, Leif - that seems very reasonable. Unfortunately I sort of destroyed my installations trying to do #11130 and I'm not sure how to fix that. I didn't know what the #0 error was, so I just started at #1, which at least I could interpret - well, I don't know much about Pari internals. But this explanation makes sense; can't allocate something to something that doesn't exist.

I'll try this when I get a chance.

comment:19 in reply to: ↑ 17 Changed 10 years ago by kcrisman

  • Cython doesn't support C enum constants (here e.g. INIT_DFTm), therefore one has to declare them as cdef extern ints, but I don't think that's the problem here.
  • bot is a very bad name for a global variable (of a library!), i.e. some other library / module might use the same for a different purpose, such that the one supposed to be PARI's may actually already have some non-zero value despite PARI not yet being initialized. (The early-return check in PariInstance's __init__() worsens that to some extent, though other problems would certainly arise later in that case.)

It's not far enough along to try these, but here's something naive.

        cdef GEN pari_float
<snip>
        else:
<snip>
            # Create a PARI REAL
            pari_float = cgetr(2 + rounded_prec / wordsize)
 <snip>
        cdef PariInstance P
        P = sage.libs.pari.all.pari
        return P.new_gen(pari_float)

So it looks like the GEN gets defined before the PariInstance - is that a problem for some reason? Again, this is totally naive, and probably wrong since this works everywhere else.

comment:20 in reply to: ↑ 18 Changed 10 years ago by leif

Replying to kcrisman:

Unfortunately I sort of destroyed my installations trying to do #11130 and I'm not sure how to fix that.

I don't know how you managed that ;-) but you should be able to just reinstall the "old" PARI (2.4.3.alpha.p7) at least (assuming you also have a Sage branch without #11130's patches applied, though these only change doctests IIRC).

If you think something may get mixed up with a previous installation, you can also

$ rm -rf $SAGE_ROOT/local/include/pari/
$ rm $SAGE_ROOT/local/lib/libpari*
$ rm $SAGE_ROOT/local/bin/{libpari,gp}*

before reinstalling the PARI package. (And perhaps also run ./sage -ba-force after you've reinstalled it.)

I didn't know what the #0 error was, so I just started at #1, which at least I could interpret [...]

No idea what the #0 and #1 refer to...


So it looks like the GEN gets defined before the PariInstance - is that a problem for some reason?

No. The weird trailer just explicitly uses the one and only global "PariInstance" pari_instance alias P alias sage.libs.pari.gen.pari alias sage.libs.pari.all.pari (which should get initialized as soon as you import from that module (sage.libs.pari.gen), which is done far above in real_mpfr.pyx), because new_gen() is only available as a member function (or "method") of an instance, for whatever reason.

comment:21 in reply to: ↑ 13 Changed 10 years ago by leif

Replying to jdemeyer:

Some things to try:

  2. Compiling PARI with SAGE_DEBUG=yes and posting a backtrace again.

We compile PARI with -g by default btw., SAGE_DEBUG only adds -O0.

Changed 10 years ago by kcrisman

Segfault with #11130 applied

comment:22 Changed 10 years ago by kcrisman

Latest screenshot shows that the upgrade of Pari in #11130 causes a slightly different segfault backtrace, but still along the same lines of what Leif is suggesting and nearly the same as before.

  if (x > (avma-bot) / sizeof(long)) pari_err(errpile);

is line 86 in level1.h, unless there are patches in Sage, and with the same two-line offset the #0 error in the backtrace is the same as above. What I find interesting is that this time it doesn't mention real_mpfr or cgetr, though I assume that is still where the problem is.

comment:23 Changed 10 years ago by kcrisman

Okay, inserting appropriate print statements gives

Got to first line of PariInsstance init
Got beyond 'if bot' of PariInstance init instead
Got beyond 'pari_init_opts'
Got here
Got to just before pari_float

which can be interpreted as

  • Pari was initialized
  • the 'if bot' was NOT taken
  • and pari_init_opts was apparently called
  • then we got to the complex number line
  • then we got to the real_mpfr line with the cgetr
  • and we didn't make it past that, as usual.

So apparently bot was not yet set, contrary to your hypothesis, but there is still something weird going on with the stack. The rest of the lines in the __init__ don't look that innocent either; if one of them failed or allocated something null, would it raise an error? (Like the pari_free line or the pariOut lines?)

comment:24 follow-up: Changed 10 years ago by kcrisman

I also put in a print statement for bot and added

if pariOut:
    print "if pariOut worked"

after pariOut is first put in, and that printed and bot turned out to be 2121924616. In case one cared :)

But here is something perhaps slightly more interesting. If I print before and after

init_stack(size)

I get bot=0 before, but bot=212... after, but bot=0 again by the time the program segfaults. There are lots of other places bot could be set, and also the stack is set (I think?) in gen.pyx, so I am not at all sure what is going on here! I hope this helps someone figure it out.

comment:25 Changed 10 years ago by kcrisman

Here is something else that Mike Hansen mentioned on a sage-devel thread I also linked to in comment:4.

Currently Sage 4.7.alpha3 does not start up due to a segfault caused 
by either the new PARI added in 4.6 or the new interrupt handling 
code.  I have a clean backtrace for this, but I haven't delved into it 
yet. 

I wonder whether it would be possible to easily remove the "new interrupt handling code" from a current installation. Or worth it... I wouldn't even attempt to build 4.5.x with all other spkgs 'correct for Cygwin', as there probably would be nasty dependency issues...

comment:26 in reply to: ↑ 24 ; follow-up: Changed 10 years ago by jdemeyer

Replying to kcrisman:

I also put in a print statement for bot and added

if pariOut:
    print "if pariOut worked"

after pariOut is first put in, and that printed and bot turned out to be 2121924616. In case one cared :)

But here is something perhaps slightly more interesting. If I print before and after

init_stack(size)

I get bot=0 before, but bot=212... after, but bot=0 again by the time the program segfaults.

bot == 0 is certainly very bad and is surely the cause of the segfault. It would be great if you could figure out why bot is set to zero.

comment:27 in reply to: ↑ 26 ; follow-up: Changed 10 years ago by kcrisman

Replying to jdemeyer:

Replying to kcrisman:

But here is something perhaps slightly more interesting. If I print before and after I get bot=0 before, but bot=212... after, but bot=0 again by the time the program segfaults.

bot == 0 is certainly very bad and is surely the cause of the segfault. It would be great if you could figure out why bot is set to zero.

Hmm, okay. So here is what I discovered so far:

  • The PariInstance init is called well before the Pynac initialization.
  • In the PariInstance init, the actual deep copy is called three times, and all is well.
  • I am able to insert a statement like
    print "bot is now", bot
    
    in real_mpfr.pyx.
  • I am not allowed to do this in other files where PariInstance is imported, such as matrix/matrix_integer_dense.pyx. On the very reasonable grounds that
    undeclared name not builtin:bot
    

Well! So why the heck is real_mpfr.pyx allowing me to insert this print statement in the first place? It seems like this is why bot is causing problems - it shouldn't even be defined. Nowhere else in that file is bot used, other than in my print statements, as far as I could tell.

Anyway, in integer.pyx, Cython doesn't complain about this, and to my surprise bot is already 0 there!

Let's look at the relevant order in sage/all.py:

<snip>
from sage.libs.all       import *  # here is where Pari gets imported, and hence the PariInstance

from sage.rings.all      import * # here is where integer.pyx presumably is initialized and bot is already zero
from sage.matrix.all     import *

# This must come before Calculus -- it initializes the Pynac library.
import sage.symbolic.pynac # here is where the symptom turned up while initializing the square root of -1

I'll try to look into this a little more now, otherwise more tomorrow.

comment:28 Changed 10 years ago by kcrisman

It's somewhat disturbing how much stuff happens so early in sage/all.py just because we import sage/misc/functional.py in there. Pari is initialized and I get the integer.pyx bot=0, all within that.

My bisecting skills are getting a lot of practice now...

comment:29 Changed 10 years ago by kcrisman

The magic all happens in

from sage.rings.complex_double import CDF

in sage/misc/functional.py. I have been having a lot of difficulty narrowing it down further within complex_double.pyx, though.

comment:30 Changed 10 years ago by kcrisman

Okay, here is what happens in a normal Sage build (OS X, Sage 4.7):

----------------------------------------------------------------------
| Sage Version 4.7, Release Date: 2011-05-23                         |
| Type notebook() for the GUI, and license() for information.        |
----------------------------------------------------------------------
Going to import from functional # this is in sage/misc/all.py
Going to import CDF # this is in sage/misc/functional.py, near the top
0 # bot starts at 0
4338077696 # bot becomes this
In integer.pyx, bot is  4338077696 # it's still correct at this point
Top of complex_double # Now we actually reach complex_double!
Middle of complex_double
Bottom of complex_double
End Going to import CDF
End Going to import from functional

So somehow Pari is initialized, bot is created, and then other stuff happens including doing integer.pyx all before we actually get to the top of complex_double, even though we are importing from complex_double!

I think this is because the C file generated by Cython imports a lot of stuff ahead of translating the things that actually happen from complex_double.pyx. But at this point I'm out of my depth.

Can someone look in complex_double.c to see where Pari would first be initialized (and then where integer.pyx/integer.c would be invoked, precisely ONCE)? For instance, close to the top there is

#include "pari/paricfg.h"
#include "pari/pari.h"
#include "pari/paripriv.h"
#include "stdsage.h"
#include "interrupt.h"
#include "complex.h"
#include "ntl_wrap.h"
#include "ZZ_pylong.h"

but I have no idea whether that is relevant. There is also more than one place where I see

* cdef class PariInstance(sage.structure.parent_base.ParentWithBase):             # <<<<<<<<<<<<<<

in things coming from other files which are imported, I guess. The stretch around 1430 seems most likely

/* "sage/libs/pari/gen.pxd":20
 * cimport sage.structure.parent_base
 * 
 * cdef class PariInstance(sage.structure.parent_base.ParentWithBase):             # <<<<<<<<<<<<<<
 *     cdef gen PARI_ZERO, PARI_ONE, PARI_TWO
 *     cdef gen new_gen(self, GEN x)
 */

struct __pyx_vtabstruct_4sage_4libs_4pari_3gen_PariInstance {
  struct __pyx_vtabstruct_4sage_9structure_11parent_base_ParentWithBase __pyx_base;
  struct __pyx_obj_4sage_4libs_4pari_3gen_gen *(*new_gen)(struct __pyx_obj_4sage_4libs_4pari_3gen_PariInstance *, GEN);
  PyObject *(*new_gen_to_string)(struct __pyx_obj_4sage_4libs_4pari_3gen_PariInstance *, GEN);
  struct __pyx_obj_4sage_4libs_4pari_3gen_gen *(*new_gen_noclear)(struct __pyx_obj_4sage_4libs_4pari_3gen_PariInstance *, GEN);
  struct __pyx_obj_4sage_4libs_4pari_3gen_gen *(*new_gen_from_mpz_t)(struct __pyx_obj_4sage_4libs_4pari_3gen_PariInstance *, mpz_t);
  struct __pyx_obj_4sage_4libs_4pari_3gen_gen *(*new_gen_from_mpq_t)(struct __pyx_obj_4sage_4libs_4pari_3gen_PariInstance *, mpq_t);
  GEN (*new_GEN_from_mpz_t)(struct __pyx_obj_4sage_4libs_4pari_3gen_PariInstance *, mpz_t);
  struct __pyx_obj_4sage_4libs_4pari_3gen_gen *(*new_gen_from_int)(struct __pyx_obj_4sage_4libs_4pari_3gen_PariInstance *, int);
  struct __pyx_obj_4sage_4libs_4pari_3gen_gen *(*new_t_POL_from_int_star)(struct __pyx_obj_4sage_4libs_4pari_3gen_PariInstance *, int *, int, long);
  struct __pyx_obj_4sage_4libs_4pari_3gen_gen *(*new_gen_from_padic)(struct __pyx_obj_4sage_4libs_4pari_3gen_PariInstance *, long, long, mpz_t, mpz_t, mpz_t);
  void (*clear_stack)(struct __pyx_obj_4sage_4libs_4pari_3gen_PariInstance *);
  void (*set_mytop_to_avma)(struct __pyx_obj_4sage_4libs_4pari_3gen_PariInstance *);
  struct __pyx_obj_4sage_4libs_4pari_3gen_gen *(*double_to_gen_c)(struct __pyx_obj_4sage_4libs_4pari_3gen_PariInstance *, double);
  GEN (*double_to_GEN)(struct __pyx_obj_4sage_4libs_4pari_3gen_PariInstance *, double);
  GEN (*deepcopy_to_python_heap)(struct __pyx_obj_4sage_4libs_4pari_3gen_PariInstance *, GEN, pari_sp *);
  struct __pyx_obj_4sage_4libs_4pari_3gen_gen *(*new_ref)(struct __pyx_obj_4sage_4libs_4pari_3gen_PariInstance *, GEN, struct __pyx_obj_4sage_4libs_4pari_3gen_gen *);
  struct __pyx_obj_4sage_4libs_4pari_3gen_gen *(*_empty_vector)(struct __pyx_obj_4sage_4libs_4pari_3gen_PariInstance *, long);
  long (*get_var)(struct __pyx_obj_4sage_4libs_4pari_3gen_PariInstance *, PyObject *);
  GEN (*toGEN)(struct __pyx_obj_4sage_4libs_4pari_3gen_PariInstance *, PyObject *, int);
  GEN (*integer_matrix_GEN)(struct __pyx_obj_4sage_4libs_4pari_3gen_PariInstance *, mpz_t **, Py_ssize_t, Py_ssize_t);
  GEN (*integer_matrix_permuted_for_hnf_GEN)(struct __pyx_obj_4sage_4libs_4pari_3gen_PariInstance *, mpz_t **, Py_ssize_t, Py_ssize_t);
  PyObject *(*integer_matrix)(struct __pyx_obj_4sage_4libs_4pari_3gen_PariInstance *, mpz_t **, Py_ssize_t, Py_ssize_t, int);
  GEN (*rational_matrix_GEN)(struct __pyx_obj_4sage_4libs_4pari_3gen_PariInstance *, mpq_t **, Py_ssize_t, Py_ssize_t);
  PyObject *(*rational_matrix)(struct __pyx_obj_4sage_4libs_4pari_3gen_PariInstance *, mpq_t **, Py_ssize_t, Py_ssize_t);
};
static struct __pyx_vtabstruct_4sage_4libs_4pari_3gen_PariInstance *__pyx_vtabptr_4sage_4libs_4pari_3gen_PariInstance;

because it has these references to other files like integer_matrix, but again I have no idea. Help!

comment:31 in reply to: ↑ 27 ; follow-up: Changed 10 years ago by leif

Replying to kcrisman:

  • I am able to insert a statement like
    print "bot is now", bot

    in real_mpfr.pyx.
  • I am not allowed to do this in other files where PariInstance is imported, such as matrix/matrix_integer_dense.pyx. On the very reasonable grounds that
    undeclared name not builtin:bot

Well! So why the heck is real_mpfr.pyx allowing me to insert this print statement in the first place?

You're "allowed" to print bot in some of the Cython files because they include .../pari/decl.pxi which declares it, e.g.:

$ grep "pari/decl" devel/sage/sage/rings/*
devel/sage/sage/rings/complex_double.pxd:include '../libs/pari/decl.pxi'
devel/sage/sage/rings/factorint.pyx:include "../libs/pari/decl.pxi"
devel/sage/sage/rings/fast_arith.pyx:include "../libs/pari/decl.pxi"
devel/sage/sage/rings/integer.pyx:include "../libs/pari/decl.pxi"
devel/sage/sage/rings/rational.pyx:include "../libs/pari/decl.pxi"
devel/sage/sage/rings/real_mpfr.pxd:include '../libs/pari/decl.pxi'

It seems like this is why bot is causing problems - it shouldn't even be defined. Nowhere else in that file is bot used, other than in my print statements, as far as I could tell.

Anyway, in integer.pyx, Cython doesn't complain about this, and to my surprise bot is already 0 there!

The value of bot (the lower / start address of PARI's stack) seems quite large, both on Cygwin and MacOS X.

init_stack() has some flaws, but these shouldn't be the cause of your problem.

The assignments to pariOut etc. and the pari_free() call are harmless; they don't allocate anything and cannot fail.

You can try to start sage with -min (bypassing a lot of imports), do

sage: from sage.libs.pari import *

and play with the PARI library, to see if that works.

Btw., complex_number.pyx doesn't import sage.libs.pari[.all] although it uses it.

Sorry, not of much help I guess.

comment:32 in reply to: ↑ 31 Changed 10 years ago by kcrisman

You're "allowed" to print bot in some of the Cython files because they include .../pari/decl.pxi which declares it, e.g.:

I see, that explains it.

Anyway, in integer.pyx, Cython doesn't complain about this, and to my surprise bot is already 0 there!

The value of bot (the lower / start address of PARI's stack) seems quite large, both on Cygwin and MacOS X.

Hmm, well, I can't tackle that :)

You can try to start sage with -min (bypassing a lot of imports), do

Doesn't bypass enough imports - I get the same. In fact, it even initializes I (and hence segfaults).

Btw., complex_number.pyx doesn't import sage.libs.pari[.all] although it uses it.

True. Nonetheless, I get the following sequence:

  1. sage/all.py imports misc
  2. sage/misc/all.py imports some stuff from functional.py
  3. In order to import that stuff, functional.py must be loaded, I guess (according to Python doc)
  4. One of the things imported in functional.py is from complex_double import CDF
  5. This is the exact line that causes the bot to become 0 in Cygwin
    1. In fact, it is the exact line which first initializes Pari and bot in the first place.
    2. Then, still within that line in functional.py, apparently integer.pyx is also called (for the first time?) and in Cygwin bot becomes 0 again
    3. But nowhere in complex_double.pyx is there something I can print to identify where this happens. Presumably it's in the various cimport statements there, which apparently come earlier in the C file than the import stuff and anything else, like print statements.
  6. By the time we leave this line in functional.py, all the damage is done.

Does that help? I just have no idea how all this happens in complex_double.c, yet importing CDF from that in misc/functional.py is where the bot=0 happens, for sure.
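The ordering in steps 3 to 6 matches how plain Python imports behave: importing a single name from a module first runs that module's own imports, and only then its body. The sketch below demonstrates this with throwaway modules; the module names are hypothetical, and it is only an assumption that the Cython-generated complex_double.c behaves analogously for its cimports.

```python
import importlib
import os
import sys
import tempfile

# Hypothetical module names, used only for this demonstration.
SOURCES = {
    "demo_log.py": "events = []\n",
    "demo_dep.py": (
        "import demo_log\n"
        "demo_log.events.append('demo_dep initialised')\n"
    ),
    "demo_main.py": (
        "import demo_log\n"
        "import demo_dep\n"
        "demo_log.events.append('demo_main body runs')\n"
        "VALUE = 42\n"
    ),
}


def import_order():
    """Import one name from demo_main and report what ran, in order."""
    with tempfile.TemporaryDirectory() as tmp:
        for name, text in SOURCES.items():
            with open(os.path.join(tmp, name), "w") as handle:
                handle.write(text)
        sys.path.insert(0, tmp)
        importlib.invalidate_caches()
        try:
            from demo_main import VALUE  # triggers the whole chain
            import demo_log
            return list(demo_log.events)
        finally:
            # Leave the interpreter as we found it.
            sys.path.remove(tmp)
            for module in ("demo_main", "demo_dep", "demo_log"):
                sys.modules.pop(module, None)


print(import_order())
# ['demo_dep initialised', 'demo_main body runs']
```

The dependency is fully initialised before the first line of the importing module's body runs, so a print statement at the "top" of a module can appear after a lot of other work has already happened.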

comment:33 follow-up: Changed 10 years ago by leif

Just curious, do you run Cygwin on a 32-bit Windows / machine?

Also, bot==0 alone isn't sufficient to cause the stack overflow, since (avma-0) / sizeof(long) should certainly be larger than x in new_chunk(size_t x) (unless avma is so large that it is interpreted as a negative number), so apparently avma gets corrupted as well (as bot and pariOut appear to be).

But it would perhaps be helpful to print bot, top and avma after init_stack() has been called (in gen.pyx).

Maybe we should just fix the buggy init_stack(), though that doesn't explain why bot becomes zero (and perhaps sage_pariOut is also corrupted). I also wonder why this doesn't cause problems on other systems.
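The argument about bot==0 can be checked with a small model of the stack check quoted in comment:22. This is a sketch under stated assumptions, not PARI code: it assumes 32-bit unsigned arithmetic and sizeof(long) == 4, and the stack size of 8,000,000 bytes is invented for the illustration; only the bot value 2121924616 comes from comment:24.

```python
# Model of the check in new_chunk():
#     if (x > (avma-bot) / sizeof(long)) pari_err(errpile);
# Assumptions: 32-bit unsigned arithmetic, sizeof(long) == 4.
MASK32 = 0xFFFFFFFF
SIZEOF_LONG = 4


def stack_check_fails(x, avma, bot):
    """Return True if the modelled check would raise errpile."""
    available_words = ((avma - bot) & MASK32) // SIZEOF_LONG
    return x > available_words


BOT = 2121924616          # value reported in comment:24
AVMA = BOT + 8_000_000    # hypothetical stack size, illustration only

print(stack_check_fails(4, AVMA, BOT))  # False: healthy stack
print(stack_check_fails(4, AVMA, 0))    # False: only bot zeroed
print(stack_check_fails(4, 0, 0))       # True: bot and avma both zeroed
```

In this model a request for 4 words only trips the check once avma is zero as well, which is consistent with the suspicion that more than one of PARI's globals ends up wrong.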

comment:34 Changed 10 years ago by leif

Oh, just noticed it's not as bad as I thought, as PARI actually defines pari_sp as ulong rather than long*.

(If it didn't, top and avma would point to bot + sizeof(long) * size, i.e. far beyond what gets allocated.)

comment:35 follow-up: Changed 10 years ago by leif

P.S.:

Does ./sage -t [-long] devel/sage/sage/tests/interrupt.pyx work on Cygwin?

comment:36 in reply to: ↑ 33 Changed 10 years ago by kcrisman

Replying to leif:

Just curious, do you run Cygwin on a 32-bit Windows / machine?

I run it on a Parallels running Boot Camp on a 64-bit Mac OS X machine :) Though I also had the same problem on a Win 7 machine I borrowed once, at least the segfault happened, I mean.

Also, bot==0 alone isn't sufficient to cause the stack overflow, since (avma-0) / sizeof(long) should certainly be larger than x in new_chunk(size_t x) (unless avma is so large that it is interpreted as a negative number), so apparently avma gets corrupted as well (as bot and pariOut appear to be).

But it would perhaps be helpful to print bot, top and avma after init_stack() has been called (in gen.pyx).

I'll try that when I get a chance, not immediately.

Maybe we should just fix the buggy init_stack(), though that doesn't explain why bot becomes zero (and perhaps sage_pariOut is also corrupted). I also wonder why this doesn't cause problems on other systems.

Well, you can do that one :) Thanks for the ideas.

comment:37 in reply to: ↑ 35 ; follow-up: Changed 10 years ago by leif

Replying to leif:

Does ./sage -t [-long] devel/sage/sage/tests/interrupt.pyx work on Cygwin?

Testing this (which isn't directly related to PARI) would be important, too, since according to MSDN longjmping from a signal handler (which we do) doesn't work on Windows. I don't know if Cygwin works around that somehow.


So do you run a 32-bit Windows version (on your 64-bit machine)?

comment:38 in reply to: ↑ 37 ; follow-up: Changed 10 years ago by kcrisman

Replying to leif:

Replying to leif:

Does ./sage -t [-long] devel/sage/sage/tests/interrupt.pyx work on Cygwin?

Testing this (which isn't directly related to PARI) would be important, too, since according to MSDN longjmping from a signal handler (which we do) doesn't work on Windows. I don't know if Cygwin works around that somehow.

Yes, I saw that comment, but I'm still fixing the old builds to try to test this. ./sage -ba-force takes a long time on a VM, and I don't want to comment out "I" and "qqbar" on the newest build where I have all the print statements to try to fix this bug.

However, that said, I don't recall that being one of the files that caused problems when I tested before (with no maximalib/ECLlib and no I or qqbar).


So do you run a 32-bit Windows version (on your 64-bit machine)?

Well, I guess it's 64-bit but not 64-bit

  64-bit Kernel and Extensions:	No

So the processor is 64-bit, but not the kernel (whatever that difference is), and I don't have 64-bit on. In any case, the Parallels is running XP, and I checked and it is definitely a 32-bit version of XP.

comment:39 in reply to: ↑ 38 Changed 10 years ago by kcrisman

Replying to kcrisman:

Replying to leif:

Replying to leif:

Does ./sage -t [-long] devel/sage/sage/tests/interrupt.pyx work on Cygwin?

Testing this (which isn't directly related to PARI) would be important, too, since according to MSDN longjmping from a signal handler (which we do) doesn't work on Windows. I don't know if Cygwin works around that somehow.

Passed - 240.3 seconds.

comment:40 follow-up: Changed 10 years ago by kcrisman

As for the printing:

  • after initializing the stack, bot, top, and avma are all 2121924616
  • in integer.pyx, they are all zero.

So, I guess it's possible that x > (0 - 0) /sizeof(long) = 0. That doesn't explain how these are corrupted/reset, though. I think I will post to sage-devel asking someone exactly what is happening (that is, the sequence of files visited) in that import of CDF. It's happening somewhere in there.

comment:41 in reply to: ↑ 40 ; follow-up: Changed 10 years ago by leif

Replying to kcrisman:

As for the printing:

  • after initializing the stack, bot, top, and avma are all 2121924616

I assume top and avma are 2121924616 + 16000000 = 2137924616, or bot is 2105924616; otherwise this would be the first error.

  • in integer.pyx, they are all zero.

Nice.

So, I guess it's possible that x > (0 - 0) /sizeof(long) = 0.

Certainly, since x there is the number of atomic PARI items (longs) to allocate on / put onto the stack.
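The consistency conditions used in this exchange can be written down as a small checker. This is only a sketch: the stack size of 16000000 bytes is taken from the sum quoted above, and the function name and messages are invented for illustration.

```python
STACK_SIZE = 16000000  # bytes; the increment quoted in the comment above


def stack_problems(bot, top, avma, size=STACK_SIZE):
    """Return a list of violated invariants for a PARI-style stack."""
    problems = []
    if bot == 0 or top == 0:
        problems.append("bot or top is zero")
    if top - bot != size:
        problems.append("top - bot != stack size")
    if not (bot <= avma <= top):
        problems.append("avma outside [bot, top]")
    return problems


# Freshly initialized stack: avma starts at top.
print(stack_problems(2121924616, 2137924616, 2137924616))
# All three equal, as first reported: the size invariant fails.
print(stack_problems(2121924616, 2121924616, 2121924616))
# All zero, as seen in integer.pyx.
print(stack_problems(0, 0, 0))
```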

comment:42 in reply to: ↑ 41 Changed 10 years ago by kcrisman

As for the printing:

  • after initializing the stack, bot, top, and avma are all 2121924616

I assume top and avma are 2121924616 + 16000000 = 2137924616, or bot is 2105924616; otherwise this would be the first error.

You are correct, I didn't look closely enough. Should have had a check digit :)

  • in integer.pyx, they are all zero.

So now we just need that import hook thing Robert Bradshaw talked about to find out where this could have changed in between.

comment:43 Changed 10 years ago by kcrisman

Well, this doesn't happen in any of the other files where /pari/decl.pxi is defined, unfortunately (as it would have been easier to find) - those are all imported on startup after integer.pyx, apparently, if at all (for instance, factorint.pyx isn't). I still can't find any other places in libs/pari/gen.pyx that are called during startup where bot and friends are bad, either.

In fact, avma is exactly what it is supposed to be (213...) again well after all the bad stuff happens! Maybe Pari is 'uninitialized' somehow, then initialized again since bot is once again zero... just not in time to save the Pynac_I initialization.

comment:44 Changed 10 years ago by leif

Well, some random pointer might corrupt PARI's stack variables as well.

But dumping the values of these variables whenever some module gets imported should help narrow down the place where this or something similar happens.

comment:45 Changed 10 years ago by kcrisman

Just before the 0s appear, the 'deep copy to Python heap' is called and avma goes down to ...600 instead of 616. The others are the same.

Then we have zeros, also in the other things while importing CDF.

But the next time the deep copy in gen.pyx is called, avma is back to normal, down to 588 (later up to 592 when a new_gen is created).

So it seems that it might indeed be happening in one of those places where avma is reset in complex_double.pyx? Presumably not the special function ones.


Random? But why is it so reliable then, and on different computers/versions of Windows? (Unless the importing of randstate did it, but I don't think that's what you meant, and I don't think this has avma or bot or top.)


The real problem is that I can't dump values of these variables while the pyx files are being imported, because editing the pyx file complex_double won't do that until after they are all imported. And I can't dump them from places where they aren't defined - precious few, really.

Really needing that import hook thingie.

comment:46 follow-up: Changed 9 years ago by kcrisman

This has reappeared in the discussion on #12104. Leif has a patch somewhere to get Cython files to say where they have problems importing; hopefully it will be of use.

comment:47 in reply to: ↑ 46 Changed 9 years ago by dimpase

Replying to kcrisman:

This has reappeared in the discussion on #12104. Leif has a patch somewhere to get Cython files to say where they have problems importing; hopefully it will be of use.

These global variables in Windows DLLs... IMHO they need special treatment: search for "global" here: http://cygwin.com/faq/faq.programming.html

Perhaps this is the source of all these blues.

comment:48 Changed 9 years ago by kcrisman

  • Description modified (diff)

comment:49 Changed 9 years ago by kcrisman

Still a problem. But:

User 1@GC02635 /home/SageUser/sage-4.7.2
$ ./sage
----------------------------------------------------------------------
| Sage Version 4.7.2, Release Date: 2011-10-29                       |
| Type notebook() for the GUI, and license() for information.        |
----------------------------------------------------------------------
sage: 2+2
4

This is on XP, after commenting out the inits in sage/symbolic/pynac.pyx and sage/all.py. So we just need to track down the problem.

comment:50 Changed 9 years ago by kcrisman

  • Cc jpflori added

@jpflori - any ideas on this one?

comment:51 Changed 9 years ago by jpflori

I remember doing something with the symbolic i some time ago, potentially while updating pynac. Not sure this is related, but I'll begin with finding traces of that.

comment:52 Changed 9 years ago by jpflori

This was here http://trac.sagemath.org/sage_trac/ticket/12950#comment:11 but seems unrelated at first sight.

comment:53 Changed 9 years ago by kcrisman

No, this is quite different, I think. You might as well read the long list of updates here first - it was quite an education even doing all this, though I was ultimately woefully unsuccessful.

comment:54 Changed 8 years ago by kcrisman

This is not showing up on Cygwin on XP with the current status of building, nor apparently on Windows 7 (JP, can you confirm this?). Maybe we should close this, though it's frustrating not to know what the problem was.

comment:55 Changed 8 years ago by jpflori

No, I did not get that error on my Windows 7 install. Maybe this is related to #11116?

comment:56 Changed 8 years ago by kcrisman

  • Milestone changed from sage-5.4 to sage-duplicate/invalid/wontfix
  • Reviewers set to Karl-Dieter Crisman, Jean-Pierre Flori
  • Status changed from new to needs_review

I doubt it. Since neither of us is seeing this currently, I'll mark it to close, though if it ever happens again at least this info is here for posterity!

comment:57 Changed 8 years ago by kcrisman

  • Status changed from needs_review to positive_review

comment:58 follow-up: Changed 8 years ago by jpflori

The bug in #11116 only happens on one arch, and potentially with only a few versions of Sage. Maybe it's the same here. You were unlucky enough to try out a particular version of Sage on a particular machine where the problems of initialization order lead to a segfault. But an innocent-looking change anywhere in the Sage library since then might have made the problem disappear.

comment:59 in reply to: ↑ 58 ; follow-up: Changed 8 years ago by kcrisman

But an innocent-looking change anywhere in the Sage library since then might have made the problem disappear.

It's true. At the same time, it wasn't just me - Mike Hansen had this traceback well over a year ago. It is possible it was only ever on XP, who knows.

comment:60 in reply to: ↑ 59 ; follow-up: Changed 8 years ago by dimpase

Replying to kcrisman:

But an innocent-looking change anywhere in the Sage library since then might have made the problem disappear.

It's true. At the same time, it wasn't just me - Mike Hansen had this traceback well over a year ago. It is possible it was only ever on XP, who knows.

AFAIR, I saw this on my 32-bit Win7, too. IMHO, a Cygwin improvement deserves the credit.

By the way, I have strange problems with my new 64-bit Win7: the Sage install does not get past bzip2. Or is bzip2 supposed to come from Cygwin natively? Perhaps the toolchain is broken?

I use the latest Cygwin. Does it still need a manual fix in libtool or autoconf, or whatever that was?

comment:61 in reply to: ↑ 60 Changed 8 years ago by jpflori

Replying to dimpase:

Replying to kcrisman:

But an innocent-looking change anywhere in the Sage library since then might have made the problem disappear.

It's true. At the same time, it wasn't just me - Mike Hansen had this traceback well over a year ago. It is possible it was only ever on XP, who knows.

AFAIR, I saw this on my 32-bit Win7, too. IMHO, a Cygwin improvement deserves the credit.

By the way, I have strange problems with my new 64-bit Win7: the Sage install does not get past bzip2. Or is bzip2 supposed to come from Cygwin natively? Perhaps the toolchain is broken?

Did not encounter that problem, but I had the Cygwin bzip2 package installed before building Sage.

I use the latest Cygwin. Does it still need a manual fix in libtool or autoconf, or whatever that was?

I don't know... the problem was that updating from the gcc4 default package (4.3.stg) to the 4.5.stg "forgot" to update some paths in configuration files in the postinst script. The latest 4.5 package seems to be from October 2011, so I doubt the problem has been fixed.

comment:62 Changed 8 years ago by jdemeyer

  • Resolution set to worksforme
  • Status changed from positive_review to closed

comment:63 Changed 5 years ago by embray

I've encountered exactly this issue trying to import sage, which I finally got to build after a number of other fixes (not all of which I've posted patches for yet).

Since this ticket has already been closed and is quite long, should I open a new one? Or just reopen this one? It appears to be the exact same issue--I'm getting a segfault at the line

pari_float = cgetr(2 + rounded_prec / wordsize)

in real_mpfr.pyx.
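For reference, the size of that request can be sketched as follows. This assumes that rounded_prec is the precision in bits rounded up to a multiple of the word size, which is not spelled out in this ticket, and the 53-bit precision is just an example value.

```python
def pari_real_words(prec_bits, wordsize):
    """Words requested by cgetr(2 + rounded_prec / wordsize).

    Assumes rounded_prec is prec_bits rounded up to a multiple of
    wordsize (wordsize must be a power of two).
    """
    rounded_prec = (prec_bits + wordsize - 1) & ~(wordsize - 1)
    return 2 + rounded_prec // wordsize


print(pari_real_words(53, 32))  # words for a 53-bit real on a 32-bit build
print(pari_real_words(53, 64))  # same precision on a 64-bit build
```

The request is only a handful of words, so it can only fail if the stack variables themselves are wrong, as in the earlier comments where bot, top and avma were all zero.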

comment:64 Changed 5 years ago by jdemeyer

I suggest opening a new ticket.
