Opened 12 years ago
Closed 11 years ago
#9583 closed defect (fixed)
Unhandled SIGSEGV with 4.5.2.alpha0 on t2
Reported by: | mpatel | Owned by: | drkirkby |
---|---|---|---|
Priority: | blocker | Milestone: | sage-4.5.2 |
Component: | porting: Solaris | Keywords: | |
Cc: | drkirkby, jhpalmieri, john_perry, malb, SimonKing, leif | Merged in: | sage-4.5.2.alpha1 |
Authors: | Dan Drake | Reviewers: | Mitesh Patel |
Report Upstream: | N/A | Work issues: | |
Branch: | Commit: | ||
Dependencies: | Stopgaps: |
Description
Reported by John Palmieri on sage-release:
t2.math: seems to build successfully, but I get the following when I try to start sage:

------------------------------------------------------------
Unhandled SIGSEGV: A segmentation fault occurred in Sage.
This probably occurred because a *compiled* component of Sage has a bug in it (typically accessing invalid memory) or is not properly wrapped with _sig_on, _sig_off.
You might want to run Sage under gdb with 'sage -gdb' to debug this.
Sage will now terminate (sorry).
------------------------------------------------------------

I haven't tried to debug this. I don't know how to use gdb, in any case. Any suggestions about what the problem might be? You can find the build in /scratch/palmieri/.
Hardware and software configuration of t2.math.washington.edu
- Sun SPARC Enterprise T5240 Server
- 2 x 1167 MHz UltraSPARC T2 PLUS processors. (16 cores and 128 hardware threads in total).
- 32 GB RAM
- No swap devices configured.
- Solaris 10 update 7 (5/09)
- gcc 4.4.1 configured to use the Sun linker and Sun assembler.
- Sage was built on a local ZFS file system (/scratch) as a 32-bit application.
Attachments (1)
Change History (61)
comment:1 Changed 12 years ago by
comment:2 Changed 12 years ago by
I just checked if anything odd had happened on 't2'. There are no obvious errors in the log file - just the usual ones related to the fact 'disk.math' has been mis-configured.
Jul 22 05:56:37 t2 nfs: [ID 236337 kern.info] NOTICE: [NFS4][Server: disk][Mntpt: /home]NFS op OP_GETATTR got error NFS4ERR_STALE causing recovery action NR_STALE.
Jul 22 05:56:37 t2 nfs: [ID 236337 kern.info] NOTICE: [NFS4][Server: disk][Mntpt: /home]NFS op OP_LOOKUP got error NFS4ERR_STALE causing recovery action NR_STALE.
Jul 22 05:56:37 t2 nfs: [ID 236337 kern.info] NOTICE: [NFS4][Server: disk][Mntpt: /home]NFS op OP_GETATTR got error NFS4ERR_STALE causing recovery action NR_STALE.
Jul 22 05:56:37 t2 nfs: [ID 236337 kern.info] NOTICE: [NFS4][Server: disk][Mntpt: /home]NFS op OP_LOOKUP got error NFS4ERR_STALE causing recovery action NR_STALE.
Jul 22 05:56:37 t2 nfs: [ID 236337 kern.info] NOTICE: [NFS4][Server: disk][Mntpt: /home]NFS op OP_GETATTR got error NFS4ERR_STALE causing recovery action NR_STALE.
Jul 22 05:56:37 t2 nfs: [ID 286389 kern.info] NOTICE: [NFS4][Server: disk][Mntpt: /home]File ./sergey/core (rnode_pt: 30107802420) was closed due to NFS recovery error on server disk(failed to recover from NFS4ERR_STALE NFS4ERR_STALE)
Jul 22 05:56:37 t2 nfs: [ID 941083 kern.info] NOTICE: NFS4 FACT SHEET:
Jul 22 05:56:37 t2 Action: NR_STALE
Jul 22 05:56:37 t2 NFS4 error: NFS4ERR_STALE
Jul 22 06:26:59 t2 sshd[11907]: [ID 800047 auth.crit] fatal: Write failed: Broken pipe
Jul 22 07:40:45 t2 sshd[18797]: [ID 800047 auth.crit] fatal: Write failed: Broken pipe
Jul 22 09:16:24 t2 sshd[7318]: [ID 800047 auth.crit] fatal: Read from socket failed: Connection reset by peer
Jul 22 11:11:05 t2 sshd[6504]: [ID 800047 auth.crit] fatal: Read from socket failed: Connection reset by peer
No messages today, so it's probably not a problem on 't2'.
Dave
comment:3 follow-up: ↓ 13 Changed 12 years ago by
- Cc john_perry malb SimonKing added
Bisection indicates that #1396 is the source of the problem:
[...]
good trac_9012.patch
     9114_doc_infinite_polynomial.patch
     trac_9114-reviewer.patch
good trac_9207.patch
good trac%236922_final.patch
good trac_9499.patch
bad  trac1396-singular_options.2.patch
     trac_9111.patch
     trac_9111-doc-edits.patch
     trac_9111-doc_addition.patch
     trac_9373.patch
     trac_9375-graph-doctests.patch
bad  trac_9485-strongly_connected_componnents_digraph-fix-nt.patch
[...]
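The bisection over an ordered patch queue can be sketched as a binary search. This is only an illustration, not the procedure actually used above: the function name first_bad_patch and the is_good predicate are hypothetical, and in practice is_good would mean "apply these patches, rebuild, and check that Sage starts", which is simulated here.

```python
def first_bad_patch(patches, is_good):
    """Return the first patch whose application makes is_good() fail.

    patches -- ordered list of patch names (the queue)
    is_good -- callable taking the list of applied patches; hypothetical
               stand-in for "apply, rebuild, and start Sage"
    Returns None if the full queue is still good.
    """
    if is_good(patches):
        return None
    # Invariant: patches[:lo] is good, patches[:hi] is bad.
    # The empty prefix is assumed to be good.
    lo, hi = 0, len(patches)
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_good(patches[:mid]):
            lo = mid
        else:
            hi = mid
    return patches[hi - 1]


# Part of the queue listed above, in order.
queue = [
    "trac_9207.patch",
    "trac%236922_final.patch",
    "trac_9499.patch",
    "trac1396-singular_options.2.patch",
    "trac_9111.patch",
    "trac_9373.patch",
]
culprit = "trac1396-singular_options.2.patch"

# Simulated test: the build is "good" as long as the culprit is not applied.
found = first_bad_patch(queue, lambda applied: culprit not in applied)
print(found)
```

Each call to is_good costs a rebuild, so halving the range matters on a slow machine; the search assumes that once the bad patch is applied, every longer prefix is bad as well.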
comment:4 follow-up: ↓ 6 Changed 12 years ago by
We merged #1396's trac1396-singular_options.2.patch in the sage repository's revision 14701.
comment:5 Changed 12 years ago by
I got the same problem on a SPARC of mine, but managed to quite easily find the problem running
sage -gdb
Here we can see what line causes the problem, though I expect it is auto generated by Cython, so knowing which line of python it is might be more difficult.
drkirkby@redstart:~/32/sage-4.5.2.alpha0$ ./sage -gdb ---------------------------------------------------------------------- | Sage Version 4.5.2.alpha0, Release Date: 2010-07-21 | | Type notebook() for the GUI, and license() for information. | ---------------------------------------------------------------------- ********************************************************************** * * * Warning: this is a prerelease version, and it may be unstable. * * * ********************************************************************** /export/home/drkirkby/32/sage-4.5.2.alpha0/local/bin/sage-ipython GNU gdb (GDB) 7.0.1 Copyright (C) 2009 Free Software Foundation, Inc. License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html> This is free software: you are free to change and redistribute it. There is NO WARRANTY, to the extent permitted by law. Type "show copying" and "show warranty" for details. This GDB was configured as "sparc-sun-solaris2.10". For bug reporting instructions, please see: <http://www.gnu.org/software/gdb/bugs/>... Reading symbols from /export/home/drkirkby/32/sage-4.5.2.alpha0/local/bin/python...done. [Thread debugging using libthread_db enabled] [New Thread 1 (LWP 1)] Python 2.6.4 (r264:75706, Jul 23 2010, 17:40:08) [GCC 4.4.3] on sunos5 Type "help", "copyright", "credits" or "license" for more information. Program received signal SIGSEGV, Segmentation fault. [Switching to Thread 1 (LWP 1)] 0xfa660a74 in __pyx_pf_4sage_4libs_8singular_6option_27LibSingularOptions_abstract_load (__pyx_v_self=0x1f2b8f0, __pyx_args=0x1e58310, __pyx_kwds=<value optimized out>) at sage/libs/singular/option.cpp:1800 1800 Kstd1_mu = __pyx_t_5; Current language: auto The current source language is "auto; currently c++". (gdb) br __pyx_pf_4sage_4libs_8singular_6option_27LibSingularOptions_abstract_load Breakpoint 1 at 0xfa6606d8: file sage/libs/singular/option.cpp, line 1679. (gdb) run The program being debugged has been started already. 
Start it from the beginning? (y or n) y Starting program: /export/home/drkirkby/32/sage-4.5.2.alpha0/local/bin/python -i [Thread debugging using libthread_db enabled] [New Thread 1 (LWP 1)] Python 2.6.4 (r264:75706, Jul 23 2010, 17:40:08) [GCC 4.4.3] on sunos5 Type "help", "copyright", "credits" or "license" for more information. [Switching to Thread 1 (LWP 1)] Breakpoint 1, __pyx_pf_4sage_4libs_8singular_6option_27LibSingularOptions_abstract_load (__pyx_v_self=0x1f2b8f0, __pyx_args=0x1e58310, __pyx_kwds=0x0) at sage/libs/singular/option.cpp:1679 1679 if (unlikely(__pyx_kwds)) { (gdb) s 1669 static PyObject *__pyx_pf_4sage_4libs_8singular_6option_27LibSingularOptions_abstract_load(PyObject *__pyx_v_self, PyObject *__pyx_args, PyObject *__pyx_kwds) { (gdb) n 0xfa6532d8 in call_frame_dummy () from /export/home/drkirkby/32/sage-4.5.2.alpha0/local/lib/python2.6/site-packages/sage/libs/singular/option.so (gdb) n Single stepping until exit from function call_frame_dummy, which has no line number information. 
__pyx_pf_4sage_4libs_8singular_6option_27LibSingularOptions_abstract_load (__pyx_v_self=0x1f2b8f0, __pyx_args=0x1e58310, __pyx_kwds=0x0) at sage/libs/singular/option.cpp:1679 1679 if (unlikely(__pyx_kwds)) { (gdb) n 1701 switch (PyTuple_GET_SIZE(__pyx_args)) { (gdb) n 1702 case 1: __pyx_v_value = PyTuple_GET_ITEM(__pyx_args, 0); (gdb) n 1714 __Pyx_INCREF((PyObject *)__pyx_v_self); (gdb) 1715 __Pyx_INCREF(__pyx_v_value); (gdb) 1724 __pyx_t_1 = PyObject_RichCompare(__pyx_v_value, Py_None, Py_EQ); if (unlikely(!__pyx_t_1)) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 280; __pyx_clineno = __LINE__; goto __pyx_L1_error;} (gdb) 1715 __Pyx_INCREF(__pyx_v_value); (gdb) 1724 __pyx_t_1 = PyObject_RichCompare(__pyx_v_value, Py_None, Py_EQ); if (unlikely(!__pyx_t_1)) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 280; __pyx_clineno = __LINE__; goto __pyx_L1_error;} (gdb) 1726 __pyx_t_2 = __Pyx_PyObject_IsTrue(__pyx_t_1); if (unlikely(__pyx_t_2 < 0)) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 280; __pyx_clineno = __LINE__; goto __pyx_L1_error;} (gdb) 1727 __Pyx_DECREF(__pyx_t_1); __pyx_t_1 = 0; (gdb) 1728 if (__pyx_t_2) { (gdb) 1762 __pyx_t_1 = __Pyx_GetItemInt(__pyx_v_value, 0, sizeof(long), PyInt_FromLong); if (!__pyx_t_1) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 282; __pyx_clineno = __LINE__; goto __pyx_L1_error;} (gdb) 1764 __pyx_t_3 = PyTuple_New(1); if (unlikely(!__pyx_t_3)) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 282; __pyx_clineno = __LINE__; goto __pyx_L1_error;} (gdb) 1769 __pyx_t_1 = PyObject_Call(((PyObject *)((PyObject*)&PyInt_Type)), __pyx_t_3, NULL); if (unlikely(!__pyx_t_1)) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 282; __pyx_clineno = __LINE__; goto __pyx_L1_error;} (gdb) 1766 PyTuple_SET_ITEM(__pyx_t_3, 0, __pyx_t_1); (gdb) 1769 __pyx_t_1 = PyObject_Call(((PyObject *)((PyObject*)&PyInt_Type)), __pyx_t_3, NULL); if (unlikely(!__pyx_t_1)) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 282; __pyx_clineno = __LINE__; goto __pyx_L1_error;} (gdb) 1771 
__Pyx_DECREF(__pyx_t_3); __pyx_t_3 = 0; (gdb) 5168 if (likely(PyInt_Check(x))) { (gdb) 1773 __Pyx_DECREF(__pyx_t_1); __pyx_t_1 = 0; (gdb) 1774 (((struct __pyx_obj_4sage_4libs_8singular_6option_LibSingularOptions_abstract *)__pyx_v_self)->global_options[0]) = __pyx_t_4; (gdb) 1783 __pyx_t_1 = __Pyx_GetItemInt(__pyx_v_value, 1, sizeof(long), PyInt_FromLong); if (!__pyx_t_1) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 285; __pyx_clineno = __LINE__; goto __pyx_L1_error;} (gdb) 1785 __pyx_t_5 = __Pyx_PyInt_AsInt(__pyx_t_1); if (unlikely((__pyx_t_5 == (int)-1) && PyErr_Occurred())) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 285; __pyx_clineno = __LINE__; goto __pyx_L1_error;} (gdb) 1786 __Pyx_DECREF(__pyx_t_1); __pyx_t_1 = 0; (gdb) 1787 Kstd1_deg = __pyx_t_5; (gdb) 1796 __pyx_t_1 = __Pyx_GetItemInt(__pyx_v_value, 2, sizeof(long), PyInt_FromLong); if (!__pyx_t_1) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 286; __pyx_clineno = __LINE__; goto __pyx_L1_error;} (gdb) 1798 __pyx_t_5 = __Pyx_PyInt_AsInt(__pyx_t_1); if (unlikely((__pyx_t_5 == (int)-1) && PyErr_Occurred())) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 286; __pyx_clineno = __LINE__; goto __pyx_L1_error;} (gdb) 1799 __Pyx_DECREF(__pyx_t_1); __pyx_t_1 = 0; (gdb) 1802 __pyx_r = Py_None; __Pyx_INCREF(Py_None); (gdb) 1800 Kstd1_mu = __pyx_t_5; (gdb) Program received signal SIGSEGV, Segmentation fault. 0xfa660a74 in __pyx_pf_4sage_4libs_8singular_6option_27LibSingularOptions_abstract_load (__pyx_v_self=0x1f2b8f0, __pyx_args=0x1e58310, __pyx_kwds=<value optimized out>) at sage/libs/singular/option.cpp:1800 1800 Kstd1_mu = __pyx_t_5; (gdb)
Dave
comment:6 in reply to: ↑ 4 Changed 12 years ago by
Replying to mpatel:
We merged #1396's trac1396-singular_options.2.patch in the sage repository's revision 14701.
What Mercurial command could one use to reverse that? If I knew what I was doing, perhaps I could reverse it and rebuild the Sage library. But I don't know how to do this.
Dave
comment:7 Changed 12 years ago by
John said he did not know how to use GDB. These were the steps I took, which made finding this easy. I would add that it is often much more difficult to find bugs; I perhaps got lucky here.
- Start Sage with sage -gdb
- Luckily it crashed immediately, saying it was at __pyx_pf_4sage_4libs_8singular_6option_27LibSingularOptions_abstract_load
- I put a breakpoint on that with br __pyx_pf_4sage_4libs_8singular_6option_27LibSingularOptions_abstract_load
- Restarted the program. It broke at __pyx_pf_4sage_4libs_8singular_6option_27LibSingularOptions_abstract_load as expected.
- I stepped into that bit of code using s, which is short for step.
- I used n, which is short for next, to execute one line at a time.
- After using next once, just hitting return will run next again. For some reason I typed it a few times, but that was not necessary.
Dave
comment:8 follow-up: ↓ 9 Changed 12 years ago by
After replacing libs/singular/option.pyx with the one from sage-4.5.rc1 (the version I happen to have lying around), and doing sage -b, sage started without segfaulting. I assume that a bunch of doctests will break now since I haven't backed out all of the changes from #1396, but this file does seem to be the problem.
comment:9 in reply to: ↑ 8 ; follow-up: ↓ 10 Changed 12 years ago by
Replying to jhpalmieri:
After replacing libs/singular/option.pyx with the one from sage-4.5.rc1 (the version I happen to have lying around), and doing sage -b, sage started without segfaulting. I assume that a bunch of doctests will break now since I haven't backed out all of the changes from #1396, but this file does seem to be the problem.
Thank you John. I will try that.
It would be nice to know if there was a good way to do this in Mercurial though. Adding patches is quite easy, but I assume there is a way to back them out even after they have been committed.
comment:10 in reply to: ↑ 9 Changed 12 years ago by
Replying to drkirkby:
It would be nice to know if there was a good way to do this in Mercurial though. Adding patches is quite easy, but I assume there is a way to back them out even after they have been committed.
You could try
- hg up 14700 to check out revision 14700. To undo this, run hg up, which checks out the "tip" revision.
- Or hg revert -r 14700 libs/singular/option.pyx to revert just option.pyx. To undo this, run hg revert --all, which should revert all files to their "tip" version.
You also could try patch -R with the original patch, though I haven't done this. But I think the complexity of undoing just a given patch depends on whether subsequent commits modified the same files.
By the way, there's also hg bisect, which I didn't use above because I had 4.5.1 + an unfinished queue available. But we might find it useful for tracking down doctest failures, crashes, etc.
comment:11 Changed 12 years ago by
I should add that it sometimes helps to run hg qpop -a before checking out or reverting to other revisions.
comment:12 Changed 12 years ago by
- Description modified (diff)
comment:13 in reply to: ↑ 3 Changed 12 years ago by
Hi!
Replying to mpatel:
Bisection indicates that #1396 is the source of the problem:
Then I think I should explain what portion of that patch I think might be related, and my reasons for writing it.
As David found out using sage -gdb, the segfault occurs in sage.libs.singular.option.LibSingularOptions_abstract.load, and the C code in the traceback suggests that it is exactly in line 286 of the Cython file,
Kstd1_mu = value[2]
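The step from the symbol gdb reported to the dotted Python path can be sketched as follows. This assumes the naming scheme visible in this one symbol (a __pyx_pf_ prefix followed by length-prefixed name components and then the method name); Cython's mangling differs between versions, so treat decode_cython_symbol as a hypothetical helper rather than a general decoder.

```python
import re


def decode_cython_symbol(symbol, prefix="__pyx_pf_"):
    """Decode a length-prefixed Cython symbol into a dotted path.

    Assumed layout (inferred from the symbol in the gdb session above):
    prefix, then repeated <length><name>_ components, then the method name.
    """
    if not symbol.startswith(prefix):
        raise ValueError("unexpected prefix in %r" % symbol)
    rest = symbol[len(prefix):]
    parts = []
    while True:
        match = re.match(r"\d+", rest)
        if match is None:
            break  # no further length prefix: the remainder is the method name
        length = int(match.group(0))
        start = match.end()
        parts.append(rest[start:start + length])
        rest = rest[start + length:]
        if rest.startswith("_"):
            rest = rest[1:]  # drop the separator after each component
    parts.append(rest)
    return ".".join(parts)


symbol = ("__pyx_pf_4sage_4libs_8singular_6option_"
          "27LibSingularOptions_abstract_load")
decoded = decode_cython_symbol(symbol)
print(decoded)
```

The Cython source line itself comes from the generated C, where each statement records its origin (for example __pyx_lineno = 286 in the listing above).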
What happens at startup?
In line 666 of option.pyx, some LibSingularOptions object is created, and in line 667, the method reset_default() is called. There, we have
from sage.libs.singular.singular import _saved_options
self.load(_saved_options)
Where does _saved_options come from?
_saved_options is defined in line 51 of sage/libs/singular/singular.pyx. It is immediately initialised with some value, which I explicitly did in order to prevent a segfault caused by accessing an uninitialised C-variable. But the true initialisation (with the actual value used by libsingular) is then done at line 675.
In particular, value[2] in the segfaulting line should be initialised to the value 0. So, I don't see why it should be a problem to assign this to the int variable Kstd1_mu.
I am not familiar with t2, and I don't know whether I would be able to build sage on it (I don't even know if I have an account). So, could you please test (by adding print commands in the appropriate places or so):
- Is _save_value indeed initialised in line 51 of sage/libs/singular/singular.pyx?
- Is the true initialisation in line 675 of sage/libs/singular/singular.pyx executed?
- What is the argument of the load function right before it segfaults? It should be a list of three int, the last two being zero.
- Could it be that I simply forgot to say global _saved_options before line 51 of sage/libs/singular/singular.pyx?
Cheers, Simon
comment:14 follow-up: ↓ 15 Changed 12 years ago by
I modified the "load" method to add some print statements:
def load(self, value=None):
    if value == None:
        value = (None,0,0)
    self.global_options[0] = int(value[0])
    global Kstd1_deg
    global Kstd1_mu
    print value[0], value[1], value[2]
    Kstd1_deg = value[1]
    print "Kstd1_deg defined"
    print Kstd1_deg
    Kstd1_mu = value[2]
    print "Kstd1_mu defined"
    print Kstd1_mu
Then when I run "sage -br", I get this: it's not happy about setting Kstd1_mu:
----------------------------------------------------------------------
| Sage Version 4.5.2.alpha0, Release Date: 2010-07-21 |
| Type notebook() for the GUI, and license() for information. |
----------------------------------------------------------------------
**********************************************************************
* *
* Warning: this is a prerelease version, and it may be unstable. *
* *
**********************************************************************
100663426 0 0
Kstd1_deg defined
0
------------------------------------------------------------
Unhandled SIGSEGV: A segmentation fault occurred in Sage.
This probably occurred because a *compiled* component of Sage has a bug in it (typically accessing invalid memory) or is not properly wrapped with _sig_on, _sig_off.
You might want to run Sage under gdb with 'sage -gdb' to debug this.
Sage will now terminate (sorry).
------------------------------------------------------------
I tried changing the assignment for Kstd1_mu to Kstd1_mu = 0 but it didn't help.
In the file SAGE_ROOT/local/include/singular/kstd1.h, the variables Kstd1_deg and Kstd1_mu are defined differently; could that be causing the problem?
extern int LazyPass,LazyDegree,mu,Kstd1_deg;
#define Kstd1_mu mu
As far as using t2, I think if you have an account on sage.math, you have one on t2.math, same password, same home directory. Different /scratch directory. My build is in /scratch/palmieri/sage..., and it should be world-readable.
comment:15 in reply to: ↑ 14 ; follow-up: ↓ 18 Changed 12 years ago by
Hi John!
Yes, I already found out that indeed I have an account on t2.math. I could see only part of the advice, but at least I added a .profile in my home directory containing
if [ `uname -n` = t2 ] ; then
    . /usr/local/gcc-4.4.1-sun-linker/gcc441sun
fi
Then, I unpacked the pre-built sage in some directory of mine (but not in /scratch, perhaps that would have been better), and Sage started.
Replying to jhpalmieri:
... In the file SAGE_ROOT/local/include/singular/kstd1.h, the variables Kstd1_deg and Kstd1_mu are defined differently; could that be causing the problem?
extern int LazyPass,LazyDegree,mu,Kstd1_deg;
#define Kstd1_mu mu
Yes, this is something that we have been wondering about in Kaiserslautern at Sage Days 23.5.
My understanding is that the line #define Kstd1_mu mu has the same effect as replacing Kstd1_mu by mu in any C-file that includes the header kstd1.h - in particular, this should also hold for the Cython-generated C-files. Hans Schönemann (Singular developer) agreed that this should not be a problem. And I used similar #define tricks myself repeatedly, so far without problem.
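The effect of that #define on the generated code can be imitated with a whole-token substitution. This is a rough sketch only: the real preprocessor works on tokens and skips string literals and comments, which this regular expression does not, and expand_object_macro is a hypothetical helper name.

```python
import re


def expand_object_macro(source, name, replacement):
    """Replace every whole-identifier occurrence of name in source.

    Approximates an object-like macro such as '#define Kstd1_mu mu'.
    Unlike a real preprocessor, it does not skip strings or comments.
    """
    pattern = re.compile(r"\b" + re.escape(name) + r"\b")
    return pattern.sub(lambda match: replacement, source)


# The statement at option.cpp:1800 where gdb reported the SIGSEGV.
line = "Kstd1_mu = __pyx_t_5;"
expanded = expand_object_macro(line, "Kstd1_mu", "mu")
print(expanded)

# Longer identifiers that merely contain the macro name are left alone.
untouched = expand_object_macro("Kstd1_mu_old = 1;", "Kstd1_mu", "mu")
print(untouched)
```

After expansion the compiled extension refers to a global symbol named mu, so the name Kstd1_mu never reaches the linker.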
But this was all using gcc -- Could it be that the t2/Sun compiler behaves differently in that regard?
However, I would expect -- if kstd1.h really is to blame for it - that Singular would not build. Recall that this is a file taken verbatim from Singular.
Cheers,
Simon
comment:16 Changed 12 years ago by
The compiler on t2 is gcc, but maybe on Solaris, compiler behavior is more strict, so things which are slightly sloppy and work on other systems may fail on Solaris. But I know next to nothing about C and compilers and issues like that.
comment:17 Changed 12 years ago by
Are there any flags we could pass to the compiler for options.pyx (or singular.pyx) in SAGE_ROOT/devel/sage/module_list.py, which would help with this situation?
comment:18 in reply to: ↑ 15 Changed 12 years ago by
Replying to SimonKing:
But this was all using gcc -- Could it be that t2/Sun compiler behave differently in that regard?
The compiler used to build Sage on t2.math.washington.edu is gcc. The Sun compilers are much stricter than gcc, and will not build loads of Sage. So this code is presenting a problem with gcc.
If I build Sage 64-bit on Solaris SPARC it dumps core very easily. I've noticed the errors often seem to be Singular related. Also, if I load a library for debugging, by the time we have got to the sage: prompt, there are already a couple of memory leaks. But those should not cause a crash, unless they exhaust too much memory, which I don't think they will do. But it does make me a bit suspicious of the Singular/Sage combination.
One thing worth doing is looking at compiler warnings. There are tons of them in Sage, but perhaps one near this problem might give us a clue.
Dave
comment:19 follow-up: ↓ 21 Changed 12 years ago by
I touched the files option.pyx and singular.pyx in devel/sage/sage/libs/singular/ and did "sage -b". The only warnings seem to be these:
cc1plus: warning: command line option "-Wstrict-prototypes" is valid for Ada/C/ObjC but not for C++
cc1plus: warning: command line option "-Wstrict-prototypes" is valid for Ada/C/ObjC but not for C++
comment:20 Changed 12 years ago by
For the record: Instead of importing Kstd1_mu from "kstd1.h", I imported mu (should be the same, by #define Kstd1_mu mu) and changed the rest of the code accordingly. But the segfault remains.
comment:21 in reply to: ↑ 19 ; follow-up: ↓ 22 Changed 11 years ago by
Replying to jhpalmieri:
I touched the files option.pyx and singular.pyx in devel/sage/sage/libs/singular/ and did "sage -b". The only warnings seem to be these:
cc1plus: warning: command line option "-Wstrict-prototypes" is valid for Ada/C/ObjC but not for C++
cc1plus: warning: command line option "-Wstrict-prototypes" is valid for Ada/C/ObjC but not for C++
When I touch one of the files and run sage -b I see:
building 'sage.libs.singular.option' extension
gcc -fno-strict-aliasing -g -O2 -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -fPIC -I/export/home/drkirkby/32/sage-4.5.2.alpha0/local/include/singular -I/export/home/drkirkby/32/sage-4.5.2.alpha0/local//include -I/export/home/drkirkby/32/sage-4.5.2.alpha0/local//include/csage -I/export/home/drkirkby/32/sage-4.5.2.alpha0/devel//sage/sage/ext -I/export/home/drkirkby/32/sage-4.5.2.alpha0/local/include/python2.6 -c sage/libs/singular/option.cpp -o build/temp.solaris-2.10-sun4u-2.6/sage/libs/singular/option.o -w
cc1plus: warning: command line option "-Wstrict-prototypes" is valid for Ada/C/ObjC but not for C++
What is certainly not a good idea is that the C compiler gcc is being invoked to compile a C++ program. Since option.cpp is a C++ file, it should be compiled with g++, not gcc.
I doubt that would be extensively checked by the gcc developers. It may be a case of you get away with it on some platforms some of the time, but not all platforms all of the time.
Dave
comment:22 in reply to: ↑ 21 ; follow-up: ↓ 24 Changed 11 years ago by
Replying to drkirkby:
Replying to jhpalmieri: ... What is certainly not a good idea is that the C compiler gcc is being invoked to compile a C++ program. Since option.cpp is a C++ file, it should be compiled with g++, not gcc.
So, the crucial question is: Why is Cython generating a C++ file rather than a C file?
Does it generate C++ since it is linked against Singular sources (which ostensibly are C++)?
And what happens if one invokes g++ rather than gcc on the Cython-generated code?
comment:23 follow-up: ↓ 25 Changed 11 years ago by
libpari-gmp.so.2 also exports a mu symbol, by the way. But it's a function, not data, so assigning to that could cause that crash.
comment:24 in reply to: ↑ 22 Changed 11 years ago by
When I touch one of the files, I see
gcc -fno-strict-aliasing -g -O2 -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -fPIC -I/scratch/palmieri/sage-4.5.2.alpha0/local/include/singular -I/scratch/palmieri/sage-4.5.2.alpha0/local//include -I/scratch/palmieri/sage-4.5.2.alpha0/local//include/csage -I/scratch/palmieri/sage-4.5.2.alpha0/devel//sage/sage/ext -I/scratch/palmieri/sage-4.5.2.alpha0/local/include/python2.6 -c sage/libs/singular/option.cpp -o build/temp.solaris-2.10-sun4v-2.6/sage/libs/singular/option.o -w
then a few lines later,
g++ -shared build/temp.solaris-2.10-sun4v-2.6/sage/libs/singular/option.o -L/scratch/palmieri/sage-4.5.2.alpha0/local//lib -L/scratch/palmieri/sage-4.5.2.alpha0/local/lib -lcsage -lm -lreadline -lsingular -lgivaro -lgmpxx -lgmp -lstdc++ -lntl -lpython2.6 -o build/lib.solaris-2.10-sun4v-2.6/sage/libs/singular/option.so
So, the crucial question is: Why is Cython generating a C++ file rather than a C file?
Presumably because in devel/sage/module_list.py, it says
Extension('sage.libs.singular.option',
          sources = ['sage/libs/singular/option.pyx'],
          libraries = ['m', 'readline', 'singular', 'givaro', 'gmpxx', 'gmp'],
          language="c++",
          include_dirs = [SAGE_ROOT +'/local/include/singular'],
          depends = [SAGE_ROOT + "/local/include/libsingular.h"]),
Note the "language" line.
libpari-gmp.so.2 also exports a mu symbol, by the way. But it's a function, not data, so assigning to that could cause that crash.
That's interesting; could that be the problem? What can we do to fix it?
comment:25 in reply to: ↑ 23 ; follow-up: ↓ 27 Changed 11 years ago by
Replying to wjp:
libpari-gmp.so.2 also exports a mu symbol, by the way. But it's a function, not data, so assigning to that could cause that crash.
OK, but why would this only strike on t2?
comment:26 follow-up: ↓ 31 Changed 11 years ago by
If I comment out all assignments to Kstd1_mu, then Sage starts and for the directory sage/libs/singular, there is only one doctest failure:
sage -t devel/sage/sage/libs/singular/option.pyx
**********************************************************************
File "/scratch/palmieri/sage-4.5.2.alpha0/devel/sage-main/sage/libs/singular/option.pyx", line 415:
    sage: J.groebner_basis(mult_bound=100)
Expected:
    [x^3*y^2 + y^3*z^2 + x^2*z^3, x^2*y^3 + x^3*z^2 + y^2*z^3, y^5, x^6 + x*y^4*z^5, x^4*z^2 - y^4*z^2 - x^2*y*z^3 + x*y^2*z^3, z^6 - x*y^4*z^4 - x^3*y*z^5]
Got:
    [x^3*y^2 + y^3*z^2 + x^2*z^3, x^2*y^3 + x^3*z^2 + y^2*z^3, y^5, x^6, x^4*z^2 - y^4*z^2 - x^2*y*z^3 + x*y^2*z^3, z^6, y^4*z^3 - y^3*z^4 - x^2*z^5, x^3*y*z^4 - x^2*y^2*z^4 + x*y^3*z^4, x^3*z^5, x^2*y*z^5 + y^3*z^5, x*y^3*z^5]
Not surprising, since it looks like setting the multiplicative bound wants to set Kstd1_mu.
Is something like this an option, if we can't fix the whole problem?
comment:27 in reply to: ↑ 25 Changed 11 years ago by
Replying to SimonKing:
Replying to wjp:
libpari-gmp.so.2 also exports a mu symbol, by the way. But it's a function, not data, so assigning to that could cause that crash.
OK, but why would this only strike on t2?
Bugs that only show on one system are quite common. That's one of the big advantages in testing on multiple platforms. I have hit many such bugs over the years.
- I recall writing some multi-threaded code which worked fine on numerous systems - Linux, HP-UX, Unicos, Solaris, tru64 and IRIX. But it occasionally failed on AIX. I suspected it was an AIX bug, but then I realised it was a bug in my code, which could have shown up on any operating system. It just never did.
- I recall an ex-colleague writing some finite difference code that worked fine on his Linux system, but crashed on a quad core Solaris machine. He then looked at his code carefully and found there was a genuine bug.
I actually find that there are some bugs related to Singular and Pari that show up on 64-bit Solaris. I suspect they are genuine bugs and could cause problems on other systems, but just have not to date.
Dave
comment:28 follow-up: ↓ 29 Changed 11 years ago by
DrKirkby: Please don't accuse other projects randomly and unconstructively. If there are 64 bit bugs in them, please report them in a separate ticket. (Or limit discussion of them to those tickets if you already made tickets.) This ticket is getting quite long enough already, and we haven't even fixed the problem in this one yet :-)
jhpalmieri: if doable, it would be good to rename mu entirely inside singular to something less generic. It seems they already use Kstd1_mu internally everywhere with that #define Kstd1_mu mu, so hopefully it's as easy as renaming mu and removing the #define. I'll see if I can get it to work quickly and make a new singular spkg for that.
comment:29 in reply to: ↑ 28 Changed 11 years ago by
Replying to wjp:
jhpalmieri: if doable, it would be good to rename mu entirely inside singular to something less generic. It seems they already use Kstd1_mu internally everywhere with that #define Kstd1_mu mu, so hopefully it's as easy as renaming mu and removing the #define. I'll see if I can get it to work quickly and make a new singular spkg for that.
That sounds like a good plan, I'll inform upstream to see whether they have some insight to share.
comment:30 follow-up: ↓ 33 Changed 11 years ago by
- Status changed from new to needs_review
New SPKG that renames mu to Kstd1_mu at:
http://www.math.leidenuniv.nl/~wpalenst/sage/singular-3.1.0.4.p8.spkg
It builds ok on my 64 bit linux system, and all tests pass.
malb: Waiting for upstream's feedback would be good too; thanks for informing them.
comment:31 in reply to: ↑ 26 ; follow-ups: ↓ 32 ↓ 35 Changed 11 years ago by
Replying to jhpalmieri:
If I comment out all assignments to Kstd1_mu, then Sage starts and for the directory sage/libs/singular, there is only one doctest failure: ... Not surprising, since it looks like setting the multiplicative bound wants to set Kstd1_mu.
Is something like this an option, if we can't fix the whole problem?
Usage of Kstd1_mu was introduced by my patch at #1396. The aim of that patch was to make all Singular options available to libsingular. Two of Singular's options involve an int parameter: degBound and multBound. I used the former a lot, and it is something of a relief that it doesn't cause a problem here. The latter, however, was new to me until two weeks ago.
So, personally, I'd say that your suggestion could be a short term solution. But still, I'd like to see all Singular options available in libsingular.
I'd prefer to first test whether wjp's solution works.
@wjp and @malb:
option.pyx links against certain files of Singular, located in SAGE_LOCAL, if I'm not mistaken. Will installing the new spkg automatically put these files in place? Or do you say that the spkg actually fixes the problem already?
One general question: Is t2 really so slow, or am I getting a wrong impression? After touching option.pyx, it took about an hour to do sage -br.
Cheers, Simon
comment:32 in reply to: ↑ 31 Changed 11 years ago by
Replying to SimonKing:
One general question: Is t2 really so slow, or am I getting a wrong impression? After touching option.pyx, it took about an hour to do sage -br.
... or could the missing speed be caused by my not using /scratch? Now I unpacked the available Sage binary in /scratch, and it was much faster.
comment:33 in reply to: ↑ 30 Changed 11 years ago by
Replying to wjp:
New SPKG that renames mu to Kstd1_mu at: http://www.math.leidenuniv.nl/~wpalenst/sage/singular-3.1.0.4.p8.spkg
Isn't there already a singular-3.1.1...spkg waiting for review (or already with a positive review)? Martin?
comment:34 follow-up: ↓ 36 Changed 11 years ago by
There is a ticket to create a newer version of Singular - see #8059. Although it had positive review, and I was marked as the reviewer, I never gave it positive review - the author did. I particularly asked if it worked properly on Solaris, but the only evidence presented was that the package built - not that Sage built, or Sage passed the doctests on Solaris.
However, if I understand correctly, 4.5.2 is going to be library updates only, with no updates to .spkg files, so I do not believe an updated Singular.spkg is likely to be merged in 4.5.2, though it might if the update is seen as critical. Though changing variable names in one version, when another version is likely to be merged on the release after this, is perhaps not a great idea.
Simon, you are correct, 't2' is slow. It's a shame really, as it is very nice hardware, which is totally unsuited for the task at hand. The CPUs are designed for a very different task. However, with parallel builds it is nowhere near as bad as it used to be.
Dave
comment:35 in reply to: ↑ 31 Changed 11 years ago by
Replying to SimonKing:
One general question: Is t2 really so slow, or am I getting a wrong impression? After touching option.pyx, it took about an hour to do
sage -br
.
Building in t2's /scratch should help, but it really is slow for our purposes. It should also help to use, e.g.,
$ env MAKE="make -j48" ./sage -br
comment:36 in reply to: ↑ 34 ; follow-up: ↓ 39 Changed 11 years ago by
Replying to drkirkby:
However, if I understand correctly, 4.5.2 is going to be library updates only, with no updates to .spkg files, so I do not believe an updated Singular.spkg is likely to be merged in 4.5.2, though it might if the update is seen as critical. Though changing variable names in one version, when another version is likely to be merged on the release after this, is perhaps not a great idea.
Is it really "changing variable names"? If I understood correctly, Singular does not use the variable name mu - it consistently uses Kstd1_mu, and so do I. So, I wouldn't call it "changing variable names" but "clarifying a variable definition".
comment:37 follow-up: ↓ 38 Changed 11 years ago by
Thanks for the link to #8059. The new version of Singular there has actually already done exactly the same thing, and renamed mu to Kstd1_mu. So that spkg supersedes this one.
comment:38 in reply to: ↑ 37 ; follow-up: ↓ 41 Changed 11 years ago by
Replying to wjp:
Thanks for the link to #8059. The new version of Singular there has actually already done exactly the same thing, and renamed mu to Kstd1_mu. So that spkg supersedes this one.
That's good news!
I already started to build your singular-3.1.0-spkg and will report whether it works. And then I'll try again with the spkg from #8059, also doing make ptestlong, so that hopefully a proper positive review for #8059 is possible.
Cheers, Simon
comment:39 in reply to: ↑ 36 Changed 11 years ago by
Replying to SimonKing:
Is it really "changing variable names"? If I understood correctly, Singular does not use the variable name mu - it consistently uses Kstd1_mu, and so do I. So, I wouldn't call it "changing variable names" but "clarifying a variable definition".
Sorry, I misunderstood.
Dave
comment:40 Changed 11 years ago by
It's not clear to me if the updated Singular can go into this release of Sage though. IIRC, William has specifically remarked that the updates to Pari and Singular would not be in this release.
Anyway, I must do something else. Need to finish a job application, which needs to be done in less than two hours!
Dave
comment:41 in reply to: ↑ 38 Changed 11 years ago by
Replying to SimonKing:
I already started to build your singular-3.1.0-spkg and will report whether it works.
I am afraid it didn't. It seems to me that local/include/singular/kstd1.h was not replaced when I installed the new spkg.
So, where does local/include/singular/kstd1.h come from? Where do I get the files in local/include/singular/ from?
comment:42 Changed 11 years ago by
Sorry, that was a very stupid mistake in spkg-install. I updated the spkg to fix that. (Same URL)
comment:43 follow-up: ↓ 44 Changed 11 years ago by
Replying to SimonKing:
Replying to SimonKing:
I already started to build your singular-3.1.0-spkg and will report whether it works.
I am afraid it didn't. It seems to me that local/include/singular/kstd1.h was not replaced when I installed the new spkg.
Willem's first singular-3.1.0.4.p8.spkg yields an "Unhandled SIGSEGV" for me on t2; I haven't tested the latest version. However, Sage does start with the latest patch (I ignored the rejects) and package at #8059, after I run ./sage -b. I'm running the long doctests now.
comment:44 in reply to: ↑ 43 Changed 11 years ago by
Replying to mpatel:
Willem's first singular-3.1.0.4.p8.spkg yields an "Unhandled SIGSEGV" for me on t2; I haven't tested the latest version. However, Sage does start with the latest patch (I ignored the rejects) and package at #8059, after I run ./sage -b. I'm running the long doctests now.
These pass, except for those fixed by #9590. The suite for sage/schemes/elliptic_curves/ell_rational_field.py timed out (3602.4 s) and I'm rerunning it now. I'll try to install and test the new p8 spkg in a separate copy of 4.5.2.alpha0.
comment:45 follow-up: ↓ 46 Changed 11 years ago by
Would Mitesh, (who is a release manager for 4.3.2), consider merging the package at #8059, despite it being a major update to Singular, if it fixes this problem? I know the plan was not to update .spkg files in Sage 4.3.2, but plans do sometimes change.
If my understanding is correct,
http://sage.math.washington.edu/home/malb/spkgs/singular-3-1-1-4.spkg
which is the latest release, does not define Kstd1_mu to be mu, so this particular problem should not exist. However, since singular-3-1-1-4.spkg has not been checked properly on 't2', there may be other problems.
It seems to me that the policy the release managers adopt regarding the updating of .spkg files could have a major impact on what happens on this ticket.
Dave
comment:46 in reply to: ↑ 45 ; follow-up: ↓ 51 Changed 11 years ago by
Replying to drkirkby:
Would Mitesh, (who is a release manager for 4.3.2),
I'm also a release manager for 4.5.2. :)
consider merging the package at #8059, despite it being a major update to Singular, if it fixes this problem? I know the plan was not to update .spkg files in Sage 4.3.2, but plans do sometimes change.
We are planning to merge a new sagenb spkg, but I would prefer to avoid merging a new Singular spkg.
Did the patch from #1396 fix any bugs or failing doctests, or did it only add new functionality? If it only added new functionality, I would really prefer to back it out for now.
If my understanding is correct,
http://sage.math.washington.edu/home/malb/spkgs/singular-3-1-1-4.spkg
which is the latest release, does not define Kstd1_mu to be mu, so this particular problem should not exist. However, since singular-3-1-1-4.spkg has not been checked properly on 't2', there may be other problems.
It seems to me that the policy the release managers adopt regarding the updating of .spkg files could have a major impact on what happens on this ticket.
Dave
comment:47 follow-ups: ↓ 48 ↓ 52 ↓ 59 Changed 11 years ago by
By the way, the proper way in Mercurial to "undo" a changeset is the backout command. Here's how you can test 4.5.2.alpha0 without #1396:
- build 4.5.2.alpha0
- make a new branch
- in that branch, do hg backout --merge 14701
- do hg commit to commit the result of the merge
- test Sage as usual
See http://hgbook.red-bean.com/read/finding-and-fixing-mistakes.html#id392287 for more info. I'm currently testing 4.5.2.alpha0 on t2.math after doing the above.
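The backout steps above can be sketched as a small shell script. This is only an illustration, not a tested release procedure: it assumes it is run from inside a Mercurial clone of the Sage library (for example a freshly made branch under devel/), and the names DRY_RUN, run and backout_changeset are hypothetical helpers invented for this sketch. By default it only prints the commands so they can be reviewed first.

```shell
#!/bin/sh
# Dry-run sketch of the backout steps from the list above.
# Assumption: the current directory is a Mercurial clone of the Sage library.
# DRY_RUN, run and backout_changeset are hypothetical names for illustration.

DRY_RUN=${DRY_RUN:-1}

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Back out one changeset and commit the resulting merge.
backout_changeset() {
    rev=$1
    run hg backout --merge "$rev" &&
    run hg commit -m "Back out changeset $rev"
}

backout_changeset 14701
```

Setting DRY_RUN=0 in the environment would make the script actually invoke hg; after that, Sage still has to be rebuilt and tested as in the last step of the list.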
comment:48 in reply to: ↑ 47 Changed 11 years ago by
comment:49 Changed 11 years ago by
Willem's first singular-3.1.0.4.p8.spkg yields an "Unhandled SIGSEGV" for me on t2.
The first one didn't make any changes. After doing any of the others, you probably have to do "sage -ba" (or maybe you can get away with just rebuilding the pyx files in libs/singular?).
comment:50 follow-up: ↓ 58 Changed 11 years ago by
For what it's worth, I've tested the spkg posted here on a bunch of different platforms, both building from scratch and upgrading (followed by "sage -ba"), and it seems to behave well: passes all tests (except for known, unrelated, failures) on sage.math, taurus, menas, lena, my OS X box, and some tests on t2.math -- I haven't had time to run all of them, but it passes all tests on libs/singular/ and rings/. So I would give this a positive review. If you feel like merging this spkg instead of backing out #1396, you have that option. (For what it's worth, the package at #8059 does not work for me when I try to build 4.5.2.alpha0 with it; maybe it's not compatible with #1396?)
I'll run all long tests on t2.math tonight, and if there are no problems, I'll mark this officially as "positive review".
As I said, after upgrading, you have to make sure to do "sage -ba". Is this automatic in the upgrade process?
comment:51 in reply to: ↑ 46 Changed 11 years ago by
comment:52 in reply to: ↑ 47 Changed 11 years ago by
comment:53 Changed 11 years ago by
Since #1396 didn't fix any bugs or other failures, and since I'm not brave enough to merge any more spkgs than I already have, for 4.5.2 I propose that we backout attachment:trac1396-singular_options.2.patch. I will open a new ticket, in which we can re-merge #1396.
The patch here was obtained by using hg backout, so if we trust Mercurial, it will exactly reverse the patch from #1396.
comment:54 Changed 11 years ago by
The new ticket for remerging is #9599.
comment:55 follow-up: ↓ 57 Changed 11 years ago by
I'm now testing 4.5.2.alpha0 plus comment 47's backout procedure on bsd.math, sage.math, and t2, but I won't be able to report the results until after I wake up.
comment:56 Changed 11 years ago by
- Cc leif added
comment:57 in reply to: ↑ 55 Changed 11 years ago by
- Reviewers set to Mitesh Patel
- Status changed from needs_review to positive_review
Replying to mpatel:
I'm now testing 4.5.2.alpha0 plus comment 47's backout procedure on bsd.math, sage.math, and t2, but I won't be able to report the results until after I wake up.
I get no new long doctest failures. Also,
$ cd SAGE_ROOT/devel/sage
$ wget http://trac.sagemath.org/sage_trac/raw-attachment/ticket/1396/trac1396-singular_options.2.patch
$ wget http://trac.sagemath.org/sage_trac/raw-attachment/ticket/9583/trac_9583.patch
$ hg stat
$ patch -p1 < trac1396-singular_options.2.patch
patching file sage/interfaces/singular.py
patching file sage/libs/singular/option.pyx
patching file sage/libs/singular/singular-cdefs.pxi
patching file sage/libs/singular/singular.pyx
patching file sage/rings/polynomial/multi_polynomial_ideal.py
$ hg stat
M sage/interfaces/singular.py
M sage/libs/singular/option.pyx
M sage/libs/singular/singular-cdefs.pxi
M sage/libs/singular/singular.pyx
M sage/rings/polynomial/multi_polynomial_ideal.py
$ patch -p1 < trac_9583.patch
patching file sage/interfaces/singular.py
patching file sage/libs/singular/option.pyx
patching file sage/libs/singular/singular-cdefs.pxi
patching file sage/libs/singular/singular.pyx
patching file sage/rings/polynomial/multi_polynomial_ideal.py
$ hg stat
$ hg diff
$
so I'm ready to give this a positive review.
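The same kind of round-trip check can be tried on a toy tree without Sage or Mercurial. The sketch below is an assumption-laden illustration: the file name option.txt, its contents, and the two generated patches are all made up, and plain diff and patch stand in for the real ticket attachments. It applies a forward patch, then its reverse, and confirms that the working copy is byte-identical to the original.

```shell
#!/bin/sh
# Toy round-trip check: a reverse patch should exactly undo a forward patch.
# All file names and contents here are invented for illustration.

work=$(mktemp -d)
mkdir "$work/orig" "$work/new" "$work/tree"

printf 'line one\nline two\n' > "$work/orig/option.txt"
printf 'line one\nline two changed\n' > "$work/new/option.txt"
cp "$work/orig/option.txt" "$work/tree/option.txt"

# diff exits with status 1 when the files differ, which is expected here.
( cd "$work" && diff -u orig/option.txt new/option.txt > forward.patch )
( cd "$work" && diff -u new/option.txt orig/option.txt > backout.patch )

# Apply the forward patch, then the reverse one; -p1 strips orig/ or new/.
( cd "$work/tree" && patch -p1 < ../forward.patch > /dev/null )
( cd "$work/tree" && patch -p1 < ../backout.patch > /dev/null )

# The tree should now match the original byte for byte.
if cmp -s "$work/orig/option.txt" "$work/tree/option.txt"; then
    result="clean"
else
    result="dirty"
fi
echo "working tree is $result"

rm -rf "$work"
```

In the real session the role of cmp is played by hg stat and hg diff printing nothing after both patches have been applied.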
comment:58 in reply to: ↑ 50 Changed 11 years ago by
Replying to jhpalmieri:
As I said, after upgrading, you have to make sure to do "sage -ba". Is this automatic in the upgrade process?
As far as I can tell, sage -upgrade (which invokes sage-upgrade and sage-update) and sage -f/i (which call sage-spkg) do not check whether it's necessary to rebuild dependent packages (nor warn about or actually rebuild them), particularly those that are already installed (i.e., have a corresponding marker in SAGE_ROOT/spkg/installed). Or am I misunderstanding your question?
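Since nothing rebuilds dependents automatically, a manual reminder could look roughly like the sketch below. It is hypothetical throughout: rebuild_hint is an invented helper, not part of any Sage script, and it assumes that marker files in the installed directory are named after the package and its version (as in singular-3.1.0.4.p8). The demo runs against a temporary directory rather than a real SAGE_ROOT.

```shell
#!/bin/sh
# Hypothetical helper: remind the user to rebuild the Sage library when a
# package that it links against already has an "installed" marker.
# Assumption: markers are files named <package>-<version>.

rebuild_hint() {
    installed_dir=$1
    pkg=$2
    for marker in "$installed_dir/$pkg"-*; do
        # If the glob matched nothing, the pattern stays literal and -e fails.
        if [ -e "$marker" ]; then
            echo "$pkg is installed ($(basename "$marker")); after replacing it, consider running: sage -ba"
            return 0
        fi
    done
    echo "$pkg has no marker in $installed_dir"
    return 1
}

# Demo against a throwaway directory that stands in for SAGE_ROOT/spkg/installed.
demo=$(mktemp -d)
touch "$demo/singular-3.1.0.4.p8"
hint=$(rebuild_hint "$demo" singular)
echo "$hint"
rm -rf "$demo"
```

Against a real tree the first argument would be SAGE_ROOT/spkg/installed; whether sage -ba or a narrower rebuild is enough is the open question discussed in this thread.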
comment:59 in reply to: ↑ 47 Changed 11 years ago by
Replying to ddrake:
By the way, the proper way in Mercurial to "undo" a changeset is the backout command. Here's how you can test 4.5.2.alpha0 without #1396:
- build 4.5.2.alpha0
- make a new branch
- in that branch, do hg backout --merge 14701
- do hg commit to commit the result of the merge
- test Sage as usual
See http://hgbook.red-bean.com/read/finding-and-fixing-mistakes.html#id392287 for more info. I'm currently testing 4.5.2.alpha0 on t2.math after doing the above.
Thank you for that Dan. Its usefulness to me will extend well beyond the point this ticket gets closed.
Dave
comment:60 Changed 11 years ago by
- Merged in set to sage-4.5.2.alpha1
- Resolution set to fixed
- Status changed from positive_review to closed
I've seen this, too, with
(I've deleted the build to make room in t2's /scratch.)
If it helps: We merged only sage library patches (no new spkgs) in 4.5.2.alpha0.