#11551 closed defect (worksforme)
Pari segfault on Sage startup in Cygwin
Reported by: | kcrisman | Owned by: | tbd |
---|---|---|---|
Priority: | major | Milestone: | sage-duplicate/invalid/wontfix |
Component: | porting: Cygwin | Keywords: | pari |
Cc: | dimpase, mhansen, jdemeyer, jpflori | Merged in: | |
Authors: | Reviewers: | Karl-Dieter Crisman, Jean-Pierre Flori | |
Report Upstream: | N/A | Work issues: | |
Branch: | Commit: | ||
Dependencies: | Stopgaps: |
Description (last modified by )
In both Windows XP and Windows 7 it is now possible (again) to build Sage on Cygwin. However, Sage has a segmentation fault in Pari upon startup.
This happens while initializing the Pynac I (init_pynac_I in sage/symbolic/pynac.pyx), but the underlying problem is that the mpfr number 1.00000000000 causes the segfault when its ._pari_() method is run. Suggestions as to why that would be - and a potential fix - are welcome.
Attachments (3)
Change History (67)
comment:1 follow-up: ↓ 3 Changed 11 years ago by
I've attached a screenshot of the traceback - the best I can do in Cygwin with my limited experience.
comment:2 follow-up: ↓ 4 Changed 11 years ago by
Can you please attach your sage/rings/real_mpfr.c?
Please do not report errors on non-released Sage versions (like sage-4.7.1.alpha4 in your case). Those versions can (and probably will) change slightly, which makes it harder to reproduce errors.
comment:3 in reply to: ↑ 1 Changed 11 years ago by
Replying to kcrisman:
I've attached a screenshot of the traceback - the best I can do in Cygwin with my limited experience.
Hmm, you should be able to just copy the thing with your mouse and paste...
Perhaps try running in a better terminal window, such as mintty: http://code.google.com/p/mintty/
comment:4 in reply to: ↑ 2 ; follow-up: ↓ 5 Changed 11 years ago by
Replying to jdemeyer:
Can you please attach your sage/rings/real_mpfr.c?
I'll try - depends on whether my wifi will work. I'm not on that computer currently.
Please do not report errors on non-released Sage versions (like sage-4.7.1.alpha4 in your case). Those versions can (and probably will) change slightly, which makes it harder to reproduce errors.
Well, building on Cygwin is not exactly straightforward, and (at least for me) extremely time-consuming, so I wanted to make sure I had as bleeding-edge of code as possible to catch potential problems. I find it unlikely that patches or spkgs will currently be backed out just because they break Cygwin, though if that is not true, that would make this job much easier and I would be very grateful.
Luckily, Mike Hansen already had this error (almost assuredly the same one) in 4.7.alpha3 - see this sage-devel thread. So I think that is the place to look. He thought it was the new error handling or the Pari upgrade, but <uninformed opinion>the message sounds more like Pari itself</uninformed opinion>.
I'd love to try a better terminal - William also had suggested one at Sage Days 31 - but I've only been really using Cygwin for maybe a week, and so I wouldn't even know how to ask Cygwin to use a different shell. Cut-and-paste does not work, as far as I've been able to tell.
comment:5 in reply to: ↑ 4 Changed 11 years ago by
Replying to kcrisman:
Replying to jdemeyer:
Can you please attach your sage/rings/real_mpfr.c?
I'll try - depends on whether my wifi will work. I'm not on that computer currently.
Okay, that's a 1.5 MB file, so I am just posting a link.
http://sage.math.washington.edu/home/kcrisman/real_mpfr.c
This would be so great if it was possible to track down without too much trouble.
comment:6 Changed 11 years ago by
As another (possibly unrelated) data point, #6743 has two patches which change the behavior of sage/rings/complex_double.pyx to get Sage to start (well, a year or two ago).
comment:7 Changed 11 years ago by
I put in print statements at every conceivable place. Here is as far as it gets:
    def _pari_(self):
        <snip comments/docs>
        sig_on()
        if mpfr_nan_p(self.value) or mpfr_inf_p(self.value):
            raise ValueError, 'Cannot convert NaN or infinity to Pari float'

        # wordsize for PARI
        cdef unsigned long wordsize = sizeof(long)*8

        cdef int prec
        prec = (<RealField_class>self._parent).__prec

        # We round up the precision to the nearest multiple of wordsize.
        cdef int rounded_prec
        rounded_prec = (self.prec() + wordsize - 1) & ~(wordsize - 1)

        # Yes, assigning to self works fine, even in Pyrex.
        if rounded_prec > prec:
            self = RealField(rounded_prec)(self)

        cdef mpz_t mantissa
        cdef mp_exp_t exponent
        cdef GEN pari_float

        if mpfr_zero_p(self.value):
            pari_float = real_0_bit(-rounded_prec)
        else:
            # Now we can extract the mantissa, and it will be normalized
            # (the most significant bit of the most significant word will be 1).
            mpz_init(mantissa)
            exponent = mpfr_get_z_exp(mantissa, self.value)

            WE GET HERE AND NO FURTHER

            # Create a PARI REAL
            pari_float = cgetr(2 + rounded_prec / wordsize)
            mpz_export(&pari_float[2], NULL, 1, wordsize/8, 0, 0, mantissa)
            mpz_clear(mantissa)
            setexpo(pari_float, exponent + rounded_prec - 1)
            setsigne(pari_float, mpfr_sgn(self.value))

        cdef PariInstance P
        P = sage.libs.pari.all.pari
        return P.new_gen(pari_float)
Since
    # level1.h (incomplete!)
    GEN cgetg_copy(long lx, GEN x)
    GEN cgetg(long x, long y)
    GEN cgeti(long x)
    GEN cgetr(long x)
    long itos(GEN x)
    GEN real_0_bit(long bitprec)
    GEN stoi(long s)
so cgetr is indeed from level1.h, which is where the sage -gdb backtrace ends up before raising the interrupt. What would get that to have problems?
Also attaching screenshot of the traceback.
comment:8 Changed 11 years ago by
A little further "print"-ing revealed that the cgetr call is the problem. By the way, 2 + rounded_prec / wordsize = 2 + 64/32 = 4.
What does
    cgetr (x=(value optimized out))
mean? Does this mean that the Pari float will always have the same precision no matter what?
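For reference, the length passed to cgetr in the _pari_ code above can be reproduced in a few lines of plain Python. This is only a sketch of the arithmetic: the function names are made up for illustration, and the 53-bit input precision is an assumption (Sage's default RealField precision), chosen because it rounds up to the 64 seen here.

```python
def rounded_precision(prec, wordsize):
    """Round prec up to the next multiple of wordsize (wordsize is a power of two)."""
    return (prec + wordsize - 1) & ~(wordsize - 1)


def cgetr_length(prec, wordsize):
    """Words requested from cgetr: 2 code words plus the mantissa words."""
    return 2 + rounded_precision(prec, wordsize) // wordsize


# wordsize 32 matches this Cygwin build; prec=53 is an assumed default.
print(rounded_precision(53, 32))  # 64
print(cgetr_length(53, 32))       # 4
print(cgetr_length(53, 64))       # 3 with 64-bit longs
```

So the requested length follows the precision of the number being converted. The "(value optimized out)" text is gdb reporting that the optimizer did not keep the argument in an inspectable location; it says nothing about the precision.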
comment:9 Changed 11 years ago by
GEN cgetr(long n) allocates memory on the stack for a t_REAL of length n, and initializes its first codeword. Identical to cgetg(n,t_REAL).
I'm going to try a few other things and then stop for now. But hopefully this helps.
comment:10 Changed 11 years ago by
Trying even cgetg(4,t_REAL) raises a similar error. Pari seems to not be able to allocate anything - I don't know whether there is anything before this in initialization of Sage that has a problem.
Another data point: sage -gp works fine. Something in libpari might be off. How might I test that without actually starting Sage?
comment:11 Changed 11 years ago by
I have now confirmed this with the released 4.7.1.alpha3 on both XP and Win7. It is very reproducible, always the same place.
comment:12 Changed 11 years ago by
Another update: commenting out everything about initializing the Pynac I doesn't help, because there is another place in initialization this is used:
    rings/qqbar.py:5800: QQbar_I_nf = QuadraticField(-1, 'I', embedding=CC.gen())
which also causes the identical problem.
And _init_qqbar in sage/all.py seems like a fairly big thing to try to work around, even in testing. But commenting this out as well does allow Sage to start!
comment:13 follow-up: ↓ 21 Changed 11 years ago by
Some things to try:
1. The new PARI spkg from #11130.
2. Compiling PARI with SAGE_DEBUG=yes and posting a backtrace again.
comment:14 follow-up: ↓ 16 Changed 11 years ago by
I tried 2. first. Not very exciting.
    Program received signal SIGSEGV, Segmentation fault.
    0x343a8ad5 in pari_err () from /home/.../sage-4.7.1.alpha3/devel/sage/sage-main/build/sage/rings/real_mpfr.dll
I couldn't get anything out of it that I hadn't seen before.
Again, knowing how to test whether libpari is working at all would be really helpful. The files in local/lib/ certainly exist, at any rate, and they are the ones created when I ./sage -f'ed it just now.
comment:15 Changed 11 years ago by
I can't get 1. to install on Cygwin. Seems like a linking order error or something, see #11130.
comment:16 in reply to: ↑ 14 Changed 11 years ago by
Replying to kcrisman:
I tried 2. first. Not very exciting.
    Program received signal SIGSEGV, Segmentation fault.
    0x343a8ad5 in pari_err () from /home/.../sage-4.7.1.alpha3/devel/sage/sage-main/build/sage/rings/real_mpfr.dll
I couldn't get anything out of it that I hadn't seen before.
First "result" (i.e., I don't know yet why pari_error() is called at all, but see below):
Debugging this bottom-up, according to your nice screen shot the segfault originates from:
    static void
    err_init(void)
    {
      /* make sure pari_err msg starts at the beginning of line */
      if (!pari_last_was_newline()) pari_putc('\n');
      pariOut->flush(); /***** THIS SEGFAULTS *****/
      pariErr->flush();
      pariOut = pariErr;
      term_color(c_ERR);
    }
So obviously pariOut (and most probably also pariErr) aren't properly initialized at that point. (Note that line 885 in the vanilla PARI sources is the assignment statement, but we patch src/src/language/init.c such that we get an offset of +2 lines.)
PARI error number 14 is "errpile" (i.e. heap / [PARI] stack error), which is most probably raised for the same reason, namely because the PARI stack apparently isn't [yet] initialized when cgetr() gets called.
For the moment, it's up to someone else to donate his/her 2 ct or more... ;-)
comment:17 follow-ups: ↓ 18 ↓ 19 Changed 11 years ago by
I have no idea why real_mpfr[.pyx] shouldn't initialize [the] PARI [library] (i.e., the pari_instance variable defined in sage/libs/pari/gen.pyx), but you (Karl-Dieter) could verify it gets initialized by putting some print statement(s) into PariInstance's __init__(), preferably (also) around pari_init_opts(), to make sure the latter really gets called, because of
    if bot: return # pari already initialized.
There are a few things that might be relevant here:
- Cython doesn't support C enum constants (here e.g. INIT_DFTm), therefore one has to declare them as cdef extern ints, but I don't think that's the problem here.
- bot is a very bad name for a global variable (of a library!), i.e. some other library / module might use the same for a different purpose, such that the one supposed to be PARI's may actually already have some non-zero value despite PARI not yet being initialized. (The early-return check in PariInstance's __init__() worsens that to some extent, though other problems would certainly arise later in that case.)
comment:18 in reply to: ↑ 17 ; follow-up: ↓ 20 Changed 11 years ago by
Replying to leif:
I have no idea why real_mpfr[.pyx] shouldn't initialize [the] PARI [library] (i.e., the pari_instance variable defined in sage/libs/pari/gen.pyx), but you (Karl-Dieter) could verify it gets initialized by putting some print statement(s) into PariInstance's __init__(), preferably (also) around pari_init_opts(),
Thanks, Leif - that seems very reasonable. Unfortunately I sort of destroyed my installations trying to do #11130 and I'm not sure how to fix that. I didn't know what the #0 error was, so I just started at #1, which at least I could interpret - well, I don't know much about Pari internals. But this explanation makes sense; can't allocate something to something that doesn't exist.
I'll try this when I get a chance.
comment:19 in reply to: ↑ 17 Changed 11 years ago by
- Cython doesn't support C enum constants (here e.g. INIT_DFTm), therefore one has to declare them as cdef extern ints, but I don't think that's the problem here.
- bot is a very bad name for a global variable (of a library!), i.e. some other library / module might use the same for a different purpose, such that the one supposed to be PARI's may actually already have some non-zero value despite PARI not yet being initialized. (The early-return check in PariInstance's __init__() worsens that to some extent, though other problems would certainly arise later in that case.)
It's not far enough along to try these, but here's something naive.
    cdef GEN pari_float
    <snip>
    else:
        <snip>
        # Create a PARI REAL
        pari_float = cgetr(2 + rounded_prec / wordsize)
    <snip>
    cdef PariInstance P
    P = sage.libs.pari.all.pari
    return P.new_gen(pari_float)
So it looks like the GEN gets defined before the PariInstance - is that a problem for some reason? Again, this is totally naive, and probably wrong since this works everywhere else.
comment:20 in reply to: ↑ 18 Changed 11 years ago by
Replying to kcrisman:
Unfortunately I sort of destroyed my installations trying to do #11130 and I'm not sure how to fix that.
I don't know how you managed that ;-) but you should be able to just reinstall the "old" PARI (2.4.3.alpha.p7) at least (assuming you also have a Sage branch without #11130's patches applied, though these only change doctests IIRC).
If you think something may get mixed up with a previous installation, you can also
    $ rm -rf $SAGE_ROOT/local/include/pari/
    $ rm $SAGE_ROOT/local/lib/libpari*
    $ rm $SAGE_ROOT/local/bin/{libpari,gp}*
before reinstalling the PARI package.
(And perhaps also run ./sage -ba-force after you've reinstalled it.)
I didn't know what the #0 error was, so I just started at #1, which at least I could interpret [...]
No idea what the #0 and #1 refer to...
So it looks like the GEN gets defined before the PariInstance - is that a problem for some reason?
No. The weird trailer just explicitly uses the one and only global "PariInstance" pari_instance alias P alias sage.libs.pari.gen.pari alias sage.libs.pari.all.pari (which should get initialized as soon as you import from that module (sage.libs.pari.gen), which is done far above in real_mpfr.pyx), because new_gen() is only available as a member function (or "method") of an instance, for whatever reason.
comment:21 in reply to: ↑ 13 Changed 11 years ago by
Replying to jdemeyer:
Some things to try:
- Compiling PARI with SAGE_DEBUG=yes and posting a backtrace again.
We compile PARI with -g by default btw., SAGE_DEBUG only adds -O0.
comment:22 Changed 11 years ago by
Latest screenshot shows that the upgrade of Pari in #11130 causes a slightly different segfault backtrace, but still along the same lines of what Leif is suggesting and nearly the same as before.
    if (x > (avma-bot) / sizeof(long)) pari_err(errpile);
is line 86 in level1.h, unless there are patches in Sage, and with the same two-line offset the #0 error in the backtrace is the same as above. What I find interesting is that this time it doesn't mention real_mpfr or cgetr, though I assume that is still where the problem is.
comment:23 Changed 11 years ago by
Okay, inserting appropriate print statements gives
    Got to first line of PariInsstance init
    Got beyond 'if bot' of PariInstance init instead
    Got beyond 'pari_init_opts'
    Got here
    Got to just before pari_float
which can be interpreted as
- Pari was initialized
- the 'if bot' was NOT taken
- and pari_init_opts was apparently called
- then we got to the complex number line
- then we got to the real_mpfr line with the cgetr
- and we didn't make it past that, as usual.
So apparently bot was not yet set, contrary to your hypothesis, but there is still something weird going on with the stack. The rest of the lines in the __init__ don't look that innocent either; if one of them failed or allocated something null, would it raise an error? (Like the pari_free line or the pariOut lines?)
comment:24 follow-up: ↓ 26 Changed 11 years ago by
I also put in a print statement for bot and added
    if pariOut: print "if pariOut worked"
after pariOut is first put in, and that printed and bot turned out to be 2121924616. In case one cared :)
But here is something perhaps slightly more interesting. If I print before and after
    init_stack(size)
I get bot=0 before, but bot=212... after, but bot=0 again by the time the program segfaults. There are lots of other places bot could be set, and also the stack is set (I think?) in gen.pyx, so I am not at all sure what is going on here! I hope this helps someone figure it out.
comment:25 Changed 11 years ago by
Here is something else that Mike Hansen mentioned on a sage-devel thread I also linked to in comment:4.
Currently Sage 4.7.alpha3 does not start up due to a segfault caused by either the new PARI added in 4.6 or the new interrupt handling code. I have a clean backtrace for this, but I haven't delved into it yet.
I wonder whether it would be possible to easily remove the "new interrupt handling code" from a current installation. Or worth it... I wouldn't even attempt to build 4.5.x with all other spkgs 'correct for Cygwin', as there probably would be nasty dependency issues...
comment:26 in reply to: ↑ 24 ; follow-up: ↓ 27 Changed 11 years ago by
Replying to kcrisman:
I also put in a print statement for bot and added
    if pariOut: print "if pariOut worked"
after pariOut is first put in, and that printed and bot turned out to be 2121924616. In case one cared :)
But here is something perhaps slightly more interesting. If I print before and after
    init_stack(size)
I get bot=0 before, but bot=212... after, but bot=0 again by the time the program segfaults.
bot == 0 is certainly very bad and is surely the cause of the segfault. It would be great if you could figure out why bot is set to zero.
comment:27 in reply to: ↑ 26 ; follow-up: ↓ 31 Changed 11 years ago by
Replying to jdemeyer:
Replying to kcrisman:
But here is something perhaps slightly more interesting. If I print before and after I get bot=0 before, but bot=212... after, but bot=0 again by the time the program segfaults.
bot == 0 is certainly very bad and is surely the cause of the segfault. It would be great if you could figure out why bot is set to zero.
Hmm, okay. So here is what I discovered so far:
- The PariInstance init is called well before the Pynac initialization.
- In the PariInstance init, the actual deep copy is called three times, and all is well.
- I am able to insert a statement like print "bot is now", bot in real_mpfr.pyx.
- I am not allowed to do this in other files where PariInstance is imported, such as matrix/matrix_integer_dense.pyx. On the very reasonable grounds that undeclared name not builtin: bot
Well! So why the heck is real_mpfr.pyx allowing me to insert this print statement in the first place? It seems like this is why bot is causing problems - it shouldn't even be defined. Nowhere else in that file is bot used, other than in my print statements, as far as I could tell.
Anyway, in integer.pyx, Cython doesn't complain about this, and to my surprise bot is already 0 there!
Let's look at the relevant order in sage/all.py:
    <snip>
    from sage.libs.all import *      # here is where Pari gets imported, and hence the PariInstance
    from sage.rings.all import *     # here is where integer.pyx presumably is initialized and bot is already zero
    from sage.matrix.all import *

    # This must come before Calculus -- it initializes the Pynac library.
    import sage.symbolic.pynac       # here is where the symptom turned up while initializing the square root of -1
I'll try to look into this a little more now, otherwise more tomorrow.
comment:28 Changed 11 years ago by
It's somewhat disturbing how much stuff happens so early in sage/all.py just because we import sage/misc/functional.py in there. Pari is initialized and I get the integer.pyx bot=0, all within that.
My bisecting skills are getting a lot of practice now...
comment:29 Changed 11 years ago by
The magic all happens in
    from sage.rings.complex_double import CDF
in sage/misc/functional.py. I have been having a lot of difficulty narrowing it down further within complex_double.pyx, though.
comment:30 Changed 11 years ago by
Okay, here is what happens in a normal Sage build (OS X, Sage 4.7):
    ----------------------------------------------------------------------
    | Sage Version 4.7, Release Date: 2011-05-23                         |
    | Type notebook() for the GUI, and license() for information.        |
    ----------------------------------------------------------------------
    Going to import from functional     # this is in sage/misc/all.py
    Going to import CDF                 # this is in sage/misc/functional.py, near the top
    0                                   # bot starts at 0
    4338077696                          # bot becomes this
    In integer.pyx, bot is 4338077696   # it's still correct at this point
    Top of complex_double               # Now we actually reach complex_double!
    Middle of complex_double
    Bottom of complex_double
    End Going to import CDF
    End Going to import from functional
So somehow Pari is initialized, bot is created, and then other stuff happens including doing integer.pyx all before we actually get to the top of complex_double, even though we are importing from complex_double!
I think this is because the C file generated by Cython imports a lot of stuff ahead of translating the things that actually happen from complex_double.pyx. But at this point I'm out of my depth.
Can someone look in complex_double.c to see where Pari would first be initialized (and then where integer.pyx/integer.c would be invoked, precisely ONCE)? For instance, close to the top there is
    #include "pari/paricfg.h"
    #include "pari/pari.h"
    #include "pari/paripriv.h"
    #include "stdsage.h"
    #include "interrupt.h"
    #include "complex.h"
    #include "ntl_wrap.h"
    #include "ZZ_pylong.h"
but I have no idea whether that is relevant. There is also more than one place where I see
* cdef class PariInstance(sage.structure.parent_base.ParentWithBase): # <<<<<<<<<<<<<<
in things coming from other files which are imported, I guess. The stretch around 1430 seems most likely
    /* "sage/libs/pari/gen.pxd":20
     * cimport sage.structure.parent_base
     *
     * cdef class PariInstance(sage.structure.parent_base.ParentWithBase): # <<<<<<<<<<<<<<
     *     cdef gen PARI_ZERO, PARI_ONE, PARI_TWO
     *     cdef gen new_gen(self, GEN x)
     */
    struct __pyx_vtabstruct_4sage_4libs_4pari_3gen_PariInstance {
      struct __pyx_vtabstruct_4sage_9structure_11parent_base_ParentWithBase __pyx_base;
      struct __pyx_obj_4sage_4libs_4pari_3gen_gen *(*new_gen)(struct __pyx_obj_4sage_4libs_4pari_3gen_PariInstance *, GEN);
      PyObject *(*new_gen_to_string)(struct __pyx_obj_4sage_4libs_4pari_3gen_PariInstance *, GEN);
      struct __pyx_obj_4sage_4libs_4pari_3gen_gen *(*new_gen_noclear)(struct __pyx_obj_4sage_4libs_4pari_3gen_PariInstance *, GEN);
      struct __pyx_obj_4sage_4libs_4pari_3gen_gen *(*new_gen_from_mpz_t)(struct __pyx_obj_4sage_4libs_4pari_3gen_PariInstance *, mpz_t);
      struct __pyx_obj_4sage_4libs_4pari_3gen_gen *(*new_gen_from_mpq_t)(struct __pyx_obj_4sage_4libs_4pari_3gen_PariInstance *, mpq_t);
      GEN (*new_GEN_from_mpz_t)(struct __pyx_obj_4sage_4libs_4pari_3gen_PariInstance *, mpz_t);
      struct __pyx_obj_4sage_4libs_4pari_3gen_gen *(*new_gen_from_int)(struct __pyx_obj_4sage_4libs_4pari_3gen_PariInstance *, int);
      struct __pyx_obj_4sage_4libs_4pari_3gen_gen *(*new_t_POL_from_int_star)(struct __pyx_obj_4sage_4libs_4pari_3gen_PariInstance *, int *, int, long);
      struct __pyx_obj_4sage_4libs_4pari_3gen_gen *(*new_gen_from_padic)(struct __pyx_obj_4sage_4libs_4pari_3gen_PariInstance *, long, long, mpz_t, mpz_t, mpz_t);
      void (*clear_stack)(struct __pyx_obj_4sage_4libs_4pari_3gen_PariInstance *);
      void (*set_mytop_to_avma)(struct __pyx_obj_4sage_4libs_4pari_3gen_PariInstance *);
      struct __pyx_obj_4sage_4libs_4pari_3gen_gen *(*double_to_gen_c)(struct __pyx_obj_4sage_4libs_4pari_3gen_PariInstance *, double);
      GEN (*double_to_GEN)(struct __pyx_obj_4sage_4libs_4pari_3gen_PariInstance *, double);
      GEN (*deepcopy_to_python_heap)(struct __pyx_obj_4sage_4libs_4pari_3gen_PariInstance *, GEN, pari_sp *);
      struct __pyx_obj_4sage_4libs_4pari_3gen_gen *(*new_ref)(struct __pyx_obj_4sage_4libs_4pari_3gen_PariInstance *, GEN, struct __pyx_obj_4sage_4libs_4pari_3gen_gen *);
      struct __pyx_obj_4sage_4libs_4pari_3gen_gen *(*_empty_vector)(struct __pyx_obj_4sage_4libs_4pari_3gen_PariInstance *, long);
      long (*get_var)(struct __pyx_obj_4sage_4libs_4pari_3gen_PariInstance *, PyObject *);
      GEN (*toGEN)(struct __pyx_obj_4sage_4libs_4pari_3gen_PariInstance *, PyObject *, int);
      GEN (*integer_matrix_GEN)(struct __pyx_obj_4sage_4libs_4pari_3gen_PariInstance *, mpz_t **, Py_ssize_t, Py_ssize_t);
      GEN (*integer_matrix_permuted_for_hnf_GEN)(struct __pyx_obj_4sage_4libs_4pari_3gen_PariInstance *, mpz_t **, Py_ssize_t, Py_ssize_t);
      PyObject *(*integer_matrix)(struct __pyx_obj_4sage_4libs_4pari_3gen_PariInstance *, mpz_t **, Py_ssize_t, Py_ssize_t, int);
      GEN (*rational_matrix_GEN)(struct __pyx_obj_4sage_4libs_4pari_3gen_PariInstance *, mpq_t **, Py_ssize_t, Py_ssize_t);
      PyObject *(*rational_matrix)(struct __pyx_obj_4sage_4libs_4pari_3gen_PariInstance *, mpq_t **, Py_ssize_t, Py_ssize_t);
    };
    static struct __pyx_vtabstruct_4sage_4libs_4pari_3gen_PariInstance *__pyx_vtabptr_4sage_4libs_4pari_3gen_PariInstance;
because it has these references to other files like integer_matrix, but again I have no idea. Help!
comment:31 in reply to: ↑ 27 ; follow-up: ↓ 32 Changed 11 years ago by
Replying to kcrisman:
- I am able to insert a statement like print "bot is now", bot in real_mpfr.pyx.
- I am not allowed to do this in other files where PariInstance is imported, such as matrix/matrix_integer_dense.pyx. On the very reasonable grounds that undeclared name not builtin: bot
Well! So why the heck is real_mpfr.pyx allowing me to insert this print statement in the first place?
You're "allowed" to print bot in some of the Cython files because they include .../pari/decl.pxi which declares it, e.g.:
    $ grep "pari/decl" devel/sage/sage/rings/*
    devel/sage/sage/rings/complex_double.pxd:include '../libs/pari/decl.pxi'
    devel/sage/sage/rings/factorint.pyx:include "../libs/pari/decl.pxi"
    devel/sage/sage/rings/fast_arith.pyx:include "../libs/pari/decl.pxi"
    devel/sage/sage/rings/integer.pyx:include "../libs/pari/decl.pxi"
    devel/sage/sage/rings/rational.pyx:include "../libs/pari/decl.pxi"
    devel/sage/sage/rings/real_mpfr.pxd:include '../libs/pari/decl.pxi'
It seems like this is why bot is causing problems - it shouldn't even be defined. Nowhere else in that file is bot used, other than in my print statements, as far as I could tell.
Anyway, in integer.pyx, Cython doesn't complain about this, and to my surprise bot is already 0 there!
The value of bot (the lower / start address of PARI's stack) seems quite large, both on Cygwin and MacOS X.
init_stack() has some flaws, but these shouldn't be the cause of your problem.
The assignments to pariOut etc. and the pari_free() call are harmless; they don't allocate anything and cannot fail.
You can try to start sage with -min (bypassing a lot of imports), do
    sage: from sage.libs.pari import *
and play with the PARI library, to see if that works.
Btw., complex_number.pyx doesn't import sage.libs.pari[.all] although it uses it.
Sorry, not of much help I guess.
comment:32 in reply to: ↑ 31 Changed 11 years ago by
You're "allowed" to print bot in some of the Cython files because they include .../pari/decl.pxi which declares it, e.g.:
I see, that explains it.
Anyway, in integer.pyx, Cython doesn't complain about this, and to my surprise bot is already 0 there!
The value of bot (the lower / start address of PARI's stack) seems quite large, both on Cygwin and MacOS X.
Hmm, well, I can't tackle that :)
You can try to start sage with -min (bypassing a lot of imports), do
Doesn't bypass enough imports - I get the same. In fact, it even initializes I (and hence segfaults).
Btw., complex_number.pyx doesn't import sage.libs.pari[.all] although it uses it.
True. Nonetheless, I get the following sequence:
- sage/all.py imports misc
- sage/misc/all.py imports some stuff from functional.py
- In order to import that stuff, functional.py must be loaded, I guess (according to Python doc)
- One of the things imported in functional.py is from complex_double import CDF
- This is the exact line that causes the bot to become 0 in Cygwin
- In fact, it is the exact line which first initializes Pari and bot in the first place.
- Then, still within that line in functional.py, apparently integer.pyx is also called (for the first time?) and in Cygwin bot becomes 0 again
- But nowhere in complex_double.pyx is there something I can print to identify where this happens. Presumably it's in the various cimport statements there, which apparently come earlier in the C file than the import stuff and anything else, like print statements.
- By the time we leave this line in functional.py, all the damage is done.
Does that help? I just have no idea how all this happens in complex_double.c, yet importing CDF from that in misc/functional.py is where the bot=0 happens, for sure.
comment:33 follow-up: ↓ 36 Changed 11 years ago by
Just curious, do you run Cygwin on a 32-bit Windows / machine?
Also, bot==0 isn't sufficient to cause the stack overflow, since (avma-0) / sizeof(long) should certainly be larger than x in new_chunk(size_t x) (unless avma is that large that it is interpreted as a negative number), so apparently avma gets corrupted as well (as bot and pariOut appear to).
But it would perhaps be helpful to print bot, top and avma after init_stack() has been called (in gen.pyx).
Maybe we should just fix the buggy init_stack(), though that doesn't explain why bot gets zero (and perhaps sage_pariOut also corrupted). I also wonder why this doesn't cause problems on other systems.
comment:34 Changed 11 years ago by
Oh, just noticed it's not as bad as I thought, as PARI actually defines pari_sp to ulong rather than long*.
(If it didn't, top and avma would point to bot + sizeof(long) * size, i.e. far beyond what gets allocated.)
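To make the difference concrete, here is a small sketch of the two address computations in plain Python. It assumes a 4-byte long (32-bit Cygwin), takes bot = 2121924616 from comment:24 and a stack size of 16000000 bytes (the figure that comes up in comment:41), and uses made-up function names.

```python
SIZEOF_LONG = 4  # assumption: 32-bit Cygwin, so sizeof(long) == 4


def top_as_ulong(bot, size):
    """pari_sp is an unsigned integer, so adding size moves size bytes."""
    return bot + size


def top_as_long_pointer(bot, size):
    """With a long* type, C pointer arithmetic would scale by sizeof(long)."""
    return bot + SIZEOF_LONG * size


bot, size = 2121924616, 16000000
print(top_as_ulong(bot, size))         # 2137924616, end of the allocated block
print(top_as_long_pointer(bot, size))  # 2185924616, 48000000 bytes past the end
```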
comment:35 follow-up: ↓ 37 Changed 11 years ago by
P.S.:
Does ./sage -t [-long] devel/sage/sage/tests/interrupt.pyx work on Cygwin?
comment:36 in reply to: ↑ 33 Changed 11 years ago by
Replying to leif:
Just curious, do you run Cygwin on a 32-bit Windows / machine?
I run it on a Parallels running Boot Camp on a 64-bit Mac OS X machine :) Though I also had the same problem on a Win 7 machine I borrowed once, at least the segfault happened, I mean.
Also, bot==0 isn't sufficient to cause the stack overflow, since (avma-0) / sizeof(long) should certainly be larger than x in new_chunk(size_t x) (unless avma is that large that it is interpreted as a negative number), so apparently avma gets corrupted as well (as bot and pariOut appear to).
But it would perhaps be helpful to print bot, top and avma after init_stack() has been called (in gen.pyx).
I'll try that when I get a chance, not immediately.
Maybe we should just fix the buggy init_stack(), though that doesn't explain why bot gets zero (and perhaps sage_pariOut also corrupted). I also wonder why this doesn't cause problems on other systems.
Well, you can do that one :) Thanks for the ideas.
comment:37 in reply to: ↑ 35 ; follow-up: ↓ 38 Changed 11 years ago by
Replying to leif:
Does ./sage -t [-long] devel/sage/sage/tests/interrupt.pyx work on Cygwin?
Testing this (which isn't directly related to PARI) would be important, too, since according to MSDN longjmping from a signal handler (which we do) doesn't work on Windows. I don't know if Cygwin works around that somehow.
So do you run a 32-bit Windows version (on your 64-bit machine)?
comment:38 in reply to: ↑ 37 ; follow-up: ↓ 39 Changed 11 years ago by
Replying to leif:
Replying to leif:
Does ./sage -t [-long] devel/sage/sage/tests/interrupt.pyx work on Cygwin?
Testing this (which isn't directly related to PARI) would be important, too, since according to MSDN longjmping from a signal handler (which we do) doesn't work on Windows. I don't know if Cygwin works around that somehow.
Yes, I saw that comment, but I'm still fixing the old builds to try to test this. ./sage -ba-force takes a long time on a VM, and I don't want to comment out "I" and "qqbar" on the newest build where I have all the print statements to try to fix this bug.
However, that said, I don't recall that being one of the files that caused problems when I tested before (with no maximalib/ECLlib and no I or qqbar).
So do you run a 32-bit Windows version (on your 64-bit machine)?
Well, I guess it's 64-bit but not 64-bit
64-bit Kernel and Extensions: No
So the processor is 64-bit, but not the kernel (whatever that difference is), and I don't have 64-bit on. In any case, the Parallels is running XP, and I checked and it is definitely a 32-bit version of XP.
comment:39 in reply to: ↑ 38 Changed 11 years ago by
Replying to kcrisman:
Replying to leif:
Replying to leif:
Does ./sage -t [-long] devel/sage/sage/tests/interrupt.pyx work on Cygwin?
Testing this (which isn't directly related to PARI) would be important, too, since according to MSDN longjmping from a signal handler (which we do) doesn't work on Windows. I don't know if Cygwin works around that somehow.
Passed - 240.3 seconds.
comment:40 follow-up: ↓ 41 Changed 11 years ago by
As for the printing:
- after initializing the stack, bot, top, and avma are all 2121924616
- in integer.pyx, they are all zero.
So, I guess it's possible that x > (0 - 0) / sizeof(long) = 0. That doesn't explain how these are corrupted/reset, though. I think I will post to sage-devel asking someone exactly what is happening (that is, the sequence of files gone to) in that import of CDF. It's happening somewhere in there.
comment:41 in reply to: ↑ 40 ; follow-up: ↓ 42 Changed 11 years ago by
Replying to kcrisman:
As for the printing:
- after initializing the stack, bot, top, and avma are all 2121924616
I assume top and avma are 2121924616 + 16000000 = 2137924616, or bot is 2105924616; otherwise this would be the first error.
- in integer.pyx, they are all zero.
Nice.
So, I guess it's possible that x > (0 - 0) / sizeof(long) = 0.
Certainly, since x there is the number of atomic PARI items (longs) to allocate on / put onto the stack.
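To make the arithmetic above concrete, here is a small Python sketch of the overflow test being discussed. It is only a model: the function name, the 4-byte sizeof(long) and the request size of 4 words are assumptions for illustration, and it takes the first of the two readings above (bot is 2121924616, top and avma are 16000000 bytes higher).

```python
# Model of the check "x > (avma - bot) / sizeof(long)" discussed in this thread.
# WORD_BYTES = 4 assumes a 32-bit build; stack_would_overflow is a hypothetical name.
WORD_BYTES = 4

def stack_would_overflow(x, avma, bot, word_bytes=WORD_BYTES):
    """Return True if a request for x words exceeds the free space between bot and avma."""
    return x > (avma - bot) // word_bytes

# Values reported in the thread (first reading: bot is the smaller number).
bot = 2121924616
stack_size = 16000000
top = avma = bot + stack_size

# Right after initialization there is plenty of room for a small request.
healthy = stack_would_overflow(4, avma, bot)

# Once bot and avma both read as zero, any positive request fails the check.
corrupted = stack_would_overflow(4, 0, 0)
```

With both pointers at zero the free space is zero words, so even the smallest allocation trips the test, which would match the behaviour seen here.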
comment:42 in reply to: ↑ 41 Changed 11 years ago by
As for the printing:
- after initializing the stack, bot, top, and avma are all 2121924616
I assume top and avma are 2121924616 + 16000000 = 2137924616, or bot is 2105924616; otherwise this would be the first error.
You are correct, I didn't look closely enough. Should have had a check digit :)
- in integer.pyx, they are all zero.
So now we just need that import hook thing Robert Bradshaw talked about to find out where this could have changed in between.
comment:43 Changed 11 years ago by
Well, this doesn't happen in any of the other files where /pari/decl.pxi
is defined, unfortunately (as it would have been easier to find) - those are all imported on startup after integer.pyx, apparently, if at all (for instance, factorint.pyx isn't). I still can't find any other places in libs/pari/gen.pyx
which is called during the startup where bot
and friends are bad, either.
In fact, avma
is exactly what it is supposed to be (213...) again well after all the bad stuff happens! Maybe Pari is 'uninitialized' somehow, then initialized again since bot
is once again zero... just not in time to save the Pynac_I
initialization.
comment:44 Changed 11 years ago by
Well, some random pointer might corrupt PARI's stack variables as well.
But dumping the values of these variables whenever some module gets imported should help narrow down the place where this or something similar happens.
comment:45 Changed 11 years ago by
Just before the 0s appear, the 'deep copy to Python heap' is called and avma goes down to ...600 instead of 616. The others are the same.
Then we have zeros, also in the other things while importing CDF.
But the next time the deep copy in gen.pyx is called, avma is back to normal, down to 588 (later up to 592 when a new_gen is created).
So it seems that it might indeed be happening in one of those places where avma is reset in complex_double.pyx? Presumably not the special function ones.
Random? But why is it so reliable then, and on different computers/versions of Windows? (Unless the importing of randstate did it, but I don't think that's what you meant, and I don't think this has avma
or bot
or top
.)
The real problem is that I can't dump values of these variables while the pyx files are being imported, because editing the pyx file complex_double
won't do that until after they are all imported. And I can't dump them from places where they aren't defined - precious few, really.
Really needing that import hook thingie.
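For what it's worth, the kind of import hook meant here could look roughly like the sketch below. It is written for a current Python (the builtins module) rather than the Python that Sage shipped at the time, and everything in it is hypothetical: install_import_probe is a made-up helper, and the probe only reads a stand-in dictionary, since getting at the real bot/top/avma would need a small Cython helper that is not shown.

```python
import builtins

def install_import_probe(probe):
    """Wrap builtins.__import__ so that probe() is recorded after every import."""
    original_import = builtins.__import__
    records = []

    def traced_import(name, globals=None, locals=None, fromlist=(), level=0):
        module = original_import(name, globals, locals, fromlist, level)
        # Record the module name together with whatever the probe reports.
        records.append((name, probe()))
        return module

    builtins.__import__ = traced_import

    def uninstall():
        builtins.__import__ = original_import

    return records, uninstall

# Stand-in for the PARI stack variables (assumption: the real values would
# come from a compiled helper, not from a Python dictionary).
fake_pari_state = {"bot": 2121924616, "avma": 2137924616}

def probe():
    return dict(fake_pari_state)

records, uninstall = install_import_probe(probe)
try:
    import json                      # state is still sane here
    fake_pari_state["bot"] = 0       # simulate the corruption
    fake_pari_state["avma"] = 0
    import string                    # first import after the corruption
finally:
    uninstall()

# Names of the imports after which the probe saw a zeroed stack.
suspects = [name for name, state in records if state["bot"] == 0]
```

Scanning the records for the first entry where bot reads zero would narrow the search to whatever ran between that import and the previous one.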
comment:46 follow-up: ↓ 47 Changed 11 years ago by
This has reappeared in the discussion to #12104. Leif has a patch to get Cython files to say where they have problems importing somewhere, hopefully will be of use.
comment:47 in reply to: ↑ 46 Changed 11 years ago by
Replying to kcrisman:
This has reappeared in the discussion to #12104. Leif has a patch to get Cython files to say where they have problems importing somewhere, hopefully will be of use.
These global variables in Windows DLLs... IMHO they need special treatment: search for "global" here: http://cygwin.com/faq/faq.programming.html
Perhaps this is the source of all these blues.
comment:48 Changed 11 years ago by
- Description modified (diff)
comment:49 Changed 11 years ago by
Still a problem. But:
User 1@GC02635 /home/SageUser/sage-4.7.2
$ ./sage
----------------------------------------------------------------------
| Sage Version 4.7.2, Release Date: 2011-10-29                       |
| Type notebook() for the GUI, and license() for information.        |
----------------------------------------------------------------------
sage: 2+2
4
This is on XP, after commenting out the inits in sage/symbolic/pynac.pyx and sage/all.py. So we just need to track down the problem.
comment:51 Changed 10 years ago by
I remember doing something with the symbolic i some time ago, potentially while updating pynac. Not sure this is related, but I'll begin with finding traces of that.
comment:52 Changed 10 years ago by
This was here http://trac.sagemath.org/sage_trac/ticket/12950#comment:11 but seems unrelated at first sight.
comment:53 Changed 10 years ago by
No, this is quite different, I think. You might as well read the long list of updates here first - it was quite an education even doing all this, though I was ultimately woefully unsuccessful.
comment:54 Changed 10 years ago by
This is not showing up on Cygwin on XP with the current status of building, nor apparently on Windows 7 (JP, can you confirm this?). Maybe we should close this, though it's frustrating not to know what the problem was.
comment:55 Changed 10 years ago by
No, I did not get that error on my Windows 7 install. Maybe this is related to #11116?
comment:56 Changed 10 years ago by
- Milestone changed from sage-5.4 to sage-duplicate/invalid/wontfix
- Reviewers set to Karl-Dieter Crisman, Jean-Pierre Flori
- Status changed from new to needs_review
I doubt it. Since neither of us is seeing this currently, I'll mark it to close, though if it ever happens again at least this info is here for posterity!
comment:57 Changed 10 years ago by
- Status changed from needs_review to positive_review
comment:58 follow-up: ↓ 59 Changed 10 years ago by
The bug in #11116 only happens on one arch, and potentially with only a few versions of Sage. Maybe it's the same here. You were unlucky enough to try out a particular version of Sage on a particular machine where the problems of initialization order led to a segfault. But an innocent-looking change anywhere in the Sage library since then might have made the problem disappear.
comment:59 in reply to: ↑ 58 ; follow-up: ↓ 60 Changed 10 years ago by
But an innocent-looking change anywhere in the Sage library since then might have made the problem disappear.
It's true. At the same time, it wasn't just me - Mike Hansen had this traceback well over a year ago. It is possible it was only ever on XP, who knows.
comment:60 in reply to: ↑ 59 ; follow-up: ↓ 61 Changed 10 years ago by
Replying to kcrisman:
But an innocent-looking change anywhere in the Sage library since then might have made the problem disappear.
It's true. At the same time, it wasn't just me - Mike Hansen had this traceback well over a year ago. It is possible it was only ever on XP, who knows.
AFAIR, I saw this on my 32-bit Win7, too. IMHO, a Cygwin improvement deserves the credit.
By the way, I have strange problems with my new 64-bit Win7: the Sage install does not get past bzip2. Or is bzip2 supposed to come from Cygwin natively? Perhaps the toolchain is broken?
I use the latest Cygwin. Does it still need a manual fix in that libtool or autoconf or what was that?
comment:61 in reply to: ↑ 60 Changed 10 years ago by
Replying to dimpase:
Replying to kcrisman:
But an innocent-looking change anywhere in the Sage library since then might have made the problem disappear.
It's true. At the same time, it wasn't just me - Mike Hansen had this traceback well over a year ago. It is possible it was only ever on XP, who knows.
AFAIR, I saw this on my 32-bit Win7, too. IMHO, a Cygwin improvement deserves the credit.
By the way, I have strange problems with my new 64-bit Win7: the Sage install does not get past bzip2. Or is bzip2 supposed to come from Cygwin natively? Perhaps the toolchain is broken?
Did not encounter that problem, but I had the Cygwin bzip2 package installed before building Sage.
I use the latest Cygwin. Does it still need a manual fix in that libtool or autoconf or what was that?
I don't know... the problem was that updating from the gcc4 default package (4.3.stg) to the 4.5.stg "forgot" to update some paths in configuration files in the postinst script. The latest 4.5 package seems to be from October 2011, so I doubt the problem has been fixed.
comment:62 Changed 10 years ago by
- Resolution set to worksforme
- Status changed from positive_review to closed
comment:63 Changed 6 years ago by
I've encountered exactly this issue trying to import sage, which I finally got to build after a number of other fixes (not all of which I've posted patches for yet).
Since this ticket has already been closed and is quite long, should I open a new one? Or just reopen this one? It appears to be the exact same issue--I'm getting a segfault at the line
pari_float = cgetr(2 + rounded_prec / wordsize)
in real_mpfr.pyx.
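For readers following along, the size passed to cgetr in that line can be worked out by hand. The sketch below assumes that rounded_prec is the MPFR precision rounded up to a multiple of the word size; the rounding helper is my own illustration, not copied from real_mpfr.pyx, and 53 bits is just a convenient example precision.

```python
def round_up_to_word(prec_bits, wordsize):
    """Round a precision in bits up to a multiple of wordsize (a power of two)."""
    return (prec_bits + wordsize - 1) & ~(wordsize - 1)

def cgetr_words(prec_bits, wordsize):
    """Words requested by cgetr(2 + rounded_prec / wordsize): two extra words plus the mantissa."""
    rounded_prec = round_up_to_word(prec_bits, wordsize)
    return 2 + rounded_prec // wordsize

# Example precision of 53 bits (assumption for illustration).
words_32 = cgetr_words(53, 32)   # 32-bit words
words_64 = cgetr_words(53, 64)   # 64-bit words
```

Either way the request is only a handful of words, so a failure at this line points at the state of the PARI stack rather than at the size being asked for.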
comment:64 Changed 6 years ago by
I suggest opening a new ticket.
Screenshot of the problem