Opened 11 years ago

Closed 11 years ago

#9583 closed defect (fixed)

Unhandled SIGSEGV with 4.5.2.alpha0 on t2

Reported by: mpatel
Owned by: drkirkby
Priority: blocker
Milestone: sage-4.5.2
Component: porting: Solaris
Keywords:
Cc: drkirkby, jhpalmieri, john_perry, malb, SimonKing, leif
Merged in: sage-4.5.2.alpha1
Authors: Dan Drake
Reviewers: Mitesh Patel
Report Upstream: N/A
Work issues:
Branch:
Commit:
Dependencies:
Stopgaps:

Description (last modified by drkirkby)

Reported by John Palmieri on sage-release:

t2.math: seems to build successfully, but I get the following when I
try to start sage:

------------------------------------------------------------
Unhandled SIGSEGV: A segmentation fault occurred in Sage.
This probably occurred because a *compiled* component
of Sage has a bug in it (typically accessing invalid memory)
or is not properly wrapped with _sig_on, _sig_off.
You might want to run Sage under gdb with 'sage -gdb' to debug this.
Sage will now terminate (sorry).
------------------------------------------------------------

I haven't tried to debug this.  I don't know how to use gdb, in any
case.  Any suggestions about what the problem might be?  You can find
the build in /scratch/palmieri/.

Hardware and software configuration of t2.math.washington.edu

  • Sun SPARC Enterprise T5240 Server
  • 2 x 1167 MHz UltraSPARC T2 PLUS processors. (16 cores and 128 hardware threads in total).
  • 32 GB RAM
  • No swap devices configured.
  • Solaris 10 update 7 (5/09)
  • gcc 4.4.1 configured to use the Sun linker and Sun assembler.
  • Sage was built on a local ZFS file system (/scratch) as a 32-bit application.

Attachments (1)

trac_9583.patch (46.5 KB) - added by ddrake 11 years ago.
Backs out trac1396-singular_options.2.patch

Change History (61)

comment:1 Changed 11 years ago by mpatel

I've seen this, too, with

$ env MAKE="make -j64" SAGE_PARALLEL_SPKG_BUILD="yes" make build

(I've deleted the build to make room in t2's /scratch.)

If it helps: We merged only sage library patches (no new spkgs) in 4.5.2.alpha0.

comment:2 Changed 11 years ago by drkirkby

I just checked if anything odd had happened on 't2'. There are no obvious errors in the log file - just the usual ones related to the fact 'disk.math' has been mis-configured.

Jul 22 05:56:37 t2 nfs: [ID 236337 kern.info] NOTICE: [NFS4][Server: disk][Mntpt: /home]NFS op OP_GETATTR got error NFS4ERR_STALE causing recovery action NR_STALE.
Jul 22 05:56:37 t2 nfs: [ID 236337 kern.info] NOTICE: [NFS4][Server: disk][Mntpt: /home]NFS op OP_LOOKUP got error NFS4ERR_STALE causing recovery action NR_STALE.
Jul 22 05:56:37 t2 nfs: [ID 236337 kern.info] NOTICE: [NFS4][Server: disk][Mntpt: /home]NFS op OP_GETATTR got error NFS4ERR_STALE causing recovery action NR_STALE.
Jul 22 05:56:37 t2 nfs: [ID 236337 kern.info] NOTICE: [NFS4][Server: disk][Mntpt: /home]NFS op OP_LOOKUP got error NFS4ERR_STALE causing recovery action NR_STALE.
Jul 22 05:56:37 t2 nfs: [ID 236337 kern.info] NOTICE: [NFS4][Server: disk][Mntpt: /home]NFS op OP_GETATTR got error NFS4ERR_STALE causing recovery action NR_STALE.
Jul 22 05:56:37 t2 nfs: [ID 286389 kern.info] NOTICE: [NFS4][Server: disk][Mntpt: /home]File ./sergey/core (rnode_pt: 30107802420) was closed due to NFS recovery error on server disk(failed to recover from NFS4ERR_STALE NFS4ERR_STALE)
Jul 22 05:56:37 t2 nfs: [ID 941083 kern.info] NOTICE: NFS4 FACT SHEET: 
Jul 22 05:56:37 t2  Action: NR_STALE 
Jul 22 05:56:37 t2  NFS4 error: NFS4ERR_STALE   
Jul 22 06:26:59 t2 sshd[11907]: [ID 800047 auth.crit] fatal: Write failed: Broken pipe
Jul 22 07:40:45 t2 sshd[18797]: [ID 800047 auth.crit] fatal: Write failed: Broken pipe
Jul 22 09:16:24 t2 sshd[7318]: [ID 800047 auth.crit] fatal: Read from socket failed: Connection reset by peer
Jul 22 11:11:05 t2 sshd[6504]: [ID 800047 auth.crit] fatal: Read from socket failed: Connection reset by peer

No messages today, so it's probably not a problem on 't2'.

Dave

comment:3 follow-up: Changed 11 years ago by mpatel

  • Cc john_perry malb SimonKing added

Bisection indicates that #1396 is the source of the problem:

[...]
good    trac_9012.patch
        9114_doc_infinite_polynomial.patch
        trac_9114-reviewer.patch
good    trac_9207.patch
good    trac%236922_final.patch
good    trac_9499.patch
bad     trac1396-singular_options.2.patch
        trac_9111.patch
        trac_9111-doc-edits.patch
        trac_9111-doc_addition.patch
        trac_9373.patch
        trac_9375-graph-doctests.patch
bad     trac_9485-strongly_connected_componnents_digraph-fix-nt.patch
[...]
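
The bisection above can be sketched as a binary search over the ordered patch series. This is only a minimal illustration, not the procedure that was actually run: `first_bad_patch` and `starts_ok` are hypothetical names, and the predicate is a simulated stand-in for "apply these patches, rebuild, and check that Sage starts". It assumes that every prefix of the series containing the culprit is bad.

```python
def first_bad_patch(patches, starts_ok):
    """Return the first patch whose application makes `starts_ok` fail.

    Assumes the full series is bad and that every prefix containing
    the culprit is bad (monotone behaviour).
    """
    lo, hi = 0, len(patches) - 1  # the culprit's index lies in [lo, hi]
    while lo < hi:
        mid = (lo + hi) // 2
        if starts_ok(patches[:mid + 1]):
            lo = mid + 1  # prefix up to mid is good: culprit comes later
        else:
            hi = mid      # prefix up to mid is bad: culprit is at mid or earlier
    return patches[lo]


# Part of the series from this ticket, in application order.
series = [
    "trac_9012.patch",
    "trac_9207.patch",
    "trac_9499.patch",
    "trac1396-singular_options.2.patch",
    "trac_9111.patch",
    "trac_9373.patch",
]


def starts_ok(applied):
    # Simulated check; a real one would rebuild and try to start Sage.
    return "trac1396-singular_options.2.patch" not in applied


culprit = first_bad_patch(series, starts_ok)
print(culprit)
```

With six patches this needs three rebuilds instead of up to six, which matters when each rebuild is slow.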

comment:4 follow-up: Changed 11 years ago by mpatel

We merged #1396's trac1396-singular_options.2.patch in the sage repository's revision 14701.

comment:5 Changed 11 years ago by drkirkby

I got the same problem on a SPARC of mine, but managed to find the problem quite easily by running

sage -gdb

Here we can see which line causes the problem, though I expect it is auto-generated by Cython, so finding the corresponding line of Python might be more difficult.

drkirkby@redstart:~/32/sage-4.5.2.alpha0$ ./sage -gdb
----------------------------------------------------------------------
| Sage Version 4.5.2.alpha0, Release Date: 2010-07-21                |
| Type notebook() for the GUI, and license() for information.        |
----------------------------------------------------------------------
**********************************************************************
*                                                                    *
* Warning: this is a prerelease version, and it may be unstable.     *
*                                                                    *
**********************************************************************
/export/home/drkirkby/32/sage-4.5.2.alpha0/local/bin/sage-ipython
GNU gdb (GDB) 7.0.1
Copyright (C) 2009 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "sparc-sun-solaris2.10".
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>...
Reading symbols from /export/home/drkirkby/32/sage-4.5.2.alpha0/local/bin/python...done.
[Thread debugging using libthread_db enabled]
[New Thread 1 (LWP 1)]
Python 2.6.4 (r264:75706, Jul 23 2010, 17:40:08) 
[GCC 4.4.3] on sunos5
Type "help", "copyright", "credits" or "license" for more information.

Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 1 (LWP 1)]
0xfa660a74 in __pyx_pf_4sage_4libs_8singular_6option_27LibSingularOptions_abstract_load (__pyx_v_self=0x1f2b8f0, 
    __pyx_args=0x1e58310, __pyx_kwds=<value optimized out>) at sage/libs/singular/option.cpp:1800
1800      Kstd1_mu = __pyx_t_5;
Current language:  auto
The current source language is "auto; currently c++".
(gdb) br __pyx_pf_4sage_4libs_8singular_6option_27LibSingularOptions_abstract_load
Breakpoint 1 at 0xfa6606d8: file sage/libs/singular/option.cpp, line 1679.
(gdb) run
The program being debugged has been started already.
Start it from the beginning? (y or n) y
Starting program: /export/home/drkirkby/32/sage-4.5.2.alpha0/local/bin/python -i
[Thread debugging using libthread_db enabled]
[New Thread 1 (LWP 1)]
Python 2.6.4 (r264:75706, Jul 23 2010, 17:40:08) 
[GCC 4.4.3] on sunos5
Type "help", "copyright", "credits" or "license" for more information.
[Switching to Thread 1 (LWP 1)]

Breakpoint 1, __pyx_pf_4sage_4libs_8singular_6option_27LibSingularOptions_abstract_load (__pyx_v_self=0x1f2b8f0, 
    __pyx_args=0x1e58310, __pyx_kwds=0x0) at sage/libs/singular/option.cpp:1679
1679      if (unlikely(__pyx_kwds)) {
(gdb) s
1669    static PyObject *__pyx_pf_4sage_4libs_8singular_6option_27LibSingularOptions_abstract_load(PyObject *__pyx_v_self, PyObject *__pyx_args, PyObject *__pyx_kwds) {
(gdb) n
0xfa6532d8 in call_frame_dummy ()
   from /export/home/drkirkby/32/sage-4.5.2.alpha0/local/lib/python2.6/site-packages/sage/libs/singular/option.so
(gdb) n
Single stepping until exit from function call_frame_dummy, 
which has no line number information.
__pyx_pf_4sage_4libs_8singular_6option_27LibSingularOptions_abstract_load (__pyx_v_self=0x1f2b8f0, __pyx_args=0x1e58310, 
    __pyx_kwds=0x0) at sage/libs/singular/option.cpp:1679
1679      if (unlikely(__pyx_kwds)) {
(gdb) n
1701        switch (PyTuple_GET_SIZE(__pyx_args)) {
(gdb) n
1702          case  1: __pyx_v_value = PyTuple_GET_ITEM(__pyx_args, 0);
(gdb) n
1714      __Pyx_INCREF((PyObject *)__pyx_v_self);
(gdb) 
1715      __Pyx_INCREF(__pyx_v_value);
(gdb) 
1724      __pyx_t_1 = PyObject_RichCompare(__pyx_v_value, Py_None, Py_EQ); if (unlikely(!__pyx_t_1)) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 280; __pyx_clineno = __LINE__; goto __pyx_L1_error;}
(gdb) 
1715      __Pyx_INCREF(__pyx_v_value);
(gdb) 
1724      __pyx_t_1 = PyObject_RichCompare(__pyx_v_value, Py_None, Py_EQ); if (unlikely(!__pyx_t_1)) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 280; __pyx_clineno = __LINE__; goto __pyx_L1_error;}
(gdb) 
1726      __pyx_t_2 = __Pyx_PyObject_IsTrue(__pyx_t_1); if (unlikely(__pyx_t_2 < 0)) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 280; __pyx_clineno = __LINE__; goto __pyx_L1_error;}
(gdb) 
1727      __Pyx_DECREF(__pyx_t_1); __pyx_t_1 = 0;
(gdb) 
1728      if (__pyx_t_2) {
(gdb) 
1762      __pyx_t_1 = __Pyx_GetItemInt(__pyx_v_value, 0, sizeof(long), PyInt_FromLong); if (!__pyx_t_1) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 282; __pyx_clineno = __LINE__; goto __pyx_L1_error;}
(gdb) 
1764      __pyx_t_3 = PyTuple_New(1); if (unlikely(!__pyx_t_3)) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 282; __pyx_clineno = __LINE__; goto __pyx_L1_error;}
(gdb) 
1769      __pyx_t_1 = PyObject_Call(((PyObject *)((PyObject*)&PyInt_Type)), __pyx_t_3, NULL); if (unlikely(!__pyx_t_1)) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 282; __pyx_clineno = __LINE__; goto __pyx_L1_error;}
(gdb) 
1766      PyTuple_SET_ITEM(__pyx_t_3, 0, __pyx_t_1);
(gdb) 
1769      __pyx_t_1 = PyObject_Call(((PyObject *)((PyObject*)&PyInt_Type)), __pyx_t_3, NULL); if (unlikely(!__pyx_t_1)) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 282; __pyx_clineno = __LINE__; goto __pyx_L1_error;}
(gdb) 
1771      __Pyx_DECREF(__pyx_t_3); __pyx_t_3 = 0;
(gdb) 
5168        if (likely(PyInt_Check(x))) {
(gdb) 
1773      __Pyx_DECREF(__pyx_t_1); __pyx_t_1 = 0;
(gdb) 
1774      (((struct __pyx_obj_4sage_4libs_8singular_6option_LibSingularOptions_abstract *)__pyx_v_self)->global_options[0]) = __pyx_t_4;
(gdb) 
1783      __pyx_t_1 = __Pyx_GetItemInt(__pyx_v_value, 1, sizeof(long), PyInt_FromLong); if (!__pyx_t_1) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 285; __pyx_clineno = __LINE__; goto __pyx_L1_error;}
(gdb) 
1785      __pyx_t_5 = __Pyx_PyInt_AsInt(__pyx_t_1); if (unlikely((__pyx_t_5 == (int)-1) && PyErr_Occurred())) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 285; __pyx_clineno = __LINE__; goto __pyx_L1_error;}
(gdb) 
1786      __Pyx_DECREF(__pyx_t_1); __pyx_t_1 = 0;
(gdb) 
1787      Kstd1_deg = __pyx_t_5;
(gdb) 
1796      __pyx_t_1 = __Pyx_GetItemInt(__pyx_v_value, 2, sizeof(long), PyInt_FromLong); if (!__pyx_t_1) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 286; __pyx_clineno = __LINE__; goto __pyx_L1_error;}
(gdb) 
1798      __pyx_t_5 = __Pyx_PyInt_AsInt(__pyx_t_1); if (unlikely((__pyx_t_5 == (int)-1) && PyErr_Occurred())) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 286; __pyx_clineno = __LINE__; goto __pyx_L1_error;}
(gdb) 
1799      __Pyx_DECREF(__pyx_t_1); __pyx_t_1 = 0;
(gdb) 
1802      __pyx_r = Py_None; __Pyx_INCREF(Py_None);
(gdb) 
1800      Kstd1_mu = __pyx_t_5;
(gdb) 

Program received signal SIGSEGV, Segmentation fault.
0xfa660a74 in __pyx_pf_4sage_4libs_8singular_6option_27LibSingularOptions_abstract_load (__pyx_v_self=0x1f2b8f0, 
    __pyx_args=0x1e58310, __pyx_kwds=<value optimized out>) at sage/libs/singular/option.cpp:1800
1800      Kstd1_mu = __pyx_t_5;
(gdb) 

Dave

comment:6 in reply to: ↑ 4 Changed 11 years ago by drkirkby

Replying to mpatel:

We merged #1396's trac1396-singular_options.2.patch in the sage repository's revision 14701.

What Mercurial command could one use to reverse that? If I knew what I was doing, perhaps I could reverse it and rebuild the Sage library. But I don't know how to do this.

Dave

comment:7 Changed 11 years ago by drkirkby

John said he did not know how to use GDB. These were the steps I took, which made finding this easy. I would add that it is often much more difficult to find bugs; I perhaps got lucky here.

  • Start Sage with sage -gdb
  • Luckily it crashed immediately, saying it was at __pyx_pf_4sage_4libs_8singular_6option_27LibSingularOptions_abstract_load
  • I put a breakpoint on that with br __pyx_pf_4sage_4libs_8singular_6option_27LibSingularOptions_abstract_load
  • Restarted the program. It broke at __pyx_pf_4sage_4libs_8singular_6option_27LibSingularOptions_abstract_load as expected.
  • I stepped into that bit of code using s which is short for step
  • I used n, which is short for next to execute one line at a time.
  • After using next once, just hitting return will run next again. For some reason I typed it a few times, but that was not necessary.

Dave

comment:8 follow-up: Changed 11 years ago by jhpalmieri

After replacing libs/singular/option.pyx with the one from sage-4.5.rc1 (the version I happen to have lying around), and doing sage -b, sage started without segfaulting. I assume that a bunch of doctests will break now since I haven't backed out all of the changes from #1396, but this file does seem to be the problem.

comment:9 in reply to: ↑ 8 ; follow-up: Changed 11 years ago by drkirkby

Replying to jhpalmieri:

After replacing libs/singular/option.pyx with the one from sage-4.5.rc1 (the version I happen to have lying around), and doing sage -b, sage started without segfaulting. I assume that a bunch of doctests will break now since I haven't backed out all of the changes from #1396, but this file does seem to be the problem.

Thank you John. I will try that.

It would be nice to know if there was a good way to do this in Mercurial though. Adding patches is quite easy, but I assume there is a way to back them out even after they have been committed.

comment:10 in reply to: ↑ 9 Changed 11 years ago by mpatel

Replying to drkirkby:

It would be nice to know if there was a good way to do this in Mercurial though. Adding patches is quite easy, but I assume there is a way to back them out even after they have been committed.

You could try

  • hg up 14700 to check out revision 14700. To undo this, run hg up, which checks out the "tip" revision.
  • Or hg revert -r 14700 libs/singular/option.pyx to revert just option.pyx. To undo this, run hg revert --all, which should revert all files to their "tip" version.

You also could try patch -R with the original patch, though I haven't done this. But I think the complexity of undoing just a given patch depends on whether subsequent commits modified the same files.

By the way, there's also hg bisect, which I didn't use above because I had 4.5.1 + an unfinished queue available. But we might find it useful for tracking down doctest failures, crashes, etc.

comment:11 Changed 11 years ago by mpatel

I should add that it sometimes helps to run hg qpop -a before checking out or reverting to other revisions.

comment:12 Changed 11 years ago by drkirkby

  • Description modified (diff)

comment:13 in reply to: ↑ 3 Changed 11 years ago by SimonKing

Hi!

Replying to mpatel:

Bisection indicates that #1396 is the source of the problem:

Then I think I should explain what portion of that patch I think might be related, and my reasons for writing it.

As David found out using sage -gdb, the segfault occurs in sage.libs.singular.option.LibSingularOptions_abstract.load, and the C code in the traceback suggests that it is exactly in line 286 of the Cython file,

Kstd1_mu  = value[2] 

What happens at startup?

In line 666 of option.pyx, some LibSingularOptions object is created, and in line 667, the method reset_default() is called. There, we have

    from sage.libs.singular.singular import _saved_options 
    self.load(_saved_options)

Where does _saved_options come from?

_saved_options is defined in line 51 of sage/libs/singular/singular.pyx. It is immediately initialised with some value, which I explicitly did in order to prevent a segfault caused by accessing an uninitialised C-variable. But the true initialisation (with the actual value used by libsingular) is then done at line 675.

In particular, value[2] in the segfaulting line should be initialised to the value 0. So, I don't see why it should be a problem to assign this to the int variable Kstd1_mu.

I am not familiar with t2, and I don't know whether I would be able to build sage on it (I don't even know if I have an account). So, could you please test (by adding print commands in the appropriate places or so):

  1. Is _saved_options indeed initialised in line 51 of sage/libs/singular/singular.pyx?
  2. Is the true initialisation in line 675 of sage/libs/singular/singular.pyx executed?
  3. What is the argument of the load function right before it segfaults? It should be a list of three ints, the last two being zero.
  4. Could it be that I simply forgot to say global _saved_options before line 51 of sage/libs/singular/singular.pyx?

Cheers, Simon

comment:14 follow-up: Changed 11 years ago by jhpalmieri

I modified the "load" method to add some print statements:

    def load(self, value=None):
        if value == None:
            value = (None,0,0)
        self.global_options[0] = int(value[0])
        global Kstd1_deg
        global Kstd1_mu

        print value[0], value[1], value[2]

        Kstd1_deg = value[1]
        print "Kstd1_deg defined"
        print Kstd1_deg

        Kstd1_mu  = value[2]

        print "Kstd1_mu defined"
        print Kstd1_mu

Then when I run "sage -br", I get this: it's not happy about setting Kstd1_mu:

----------------------------------------------------------------------
| Sage Version 4.5.2.alpha0, Release Date: 2010-07-21                |
| Type notebook() for the GUI, and license() for information.        |
----------------------------------------------------------------------
**********************************************************************
*                                                                    *
* Warning: this is a prerelease version, and it may be unstable.     *
*                                                                    *
**********************************************************************
100663426 0 0
Kstd1_deg defined
0


------------------------------------------------------------
Unhandled SIGSEGV: A segmentation fault occurred in Sage.
This probably occurred because a *compiled* component
of Sage has a bug in it (typically accessing invalid memory)
or is not properly wrapped with _sig_on, _sig_off.
You might want to run Sage under gdb with 'sage -gdb' to debug this.
Sage will now terminate (sorry).
------------------------------------------------------------

I tried changing the assignment for Kstd1_mu to Kstd1_mu = 0 but it didn't help.

In the file SAGE_ROOT/local/include/singular/kstd1.h, the variables Kstd1_deg and Kstd1_mu are defined differently; could that be causing the problem?

extern int LazyPass,LazyDegree,mu,Kstd1_deg;
#define Kstd1_mu mu

As far as using t2, I think if you have an account on sage.math, you have one on t2.math, same password, same home directory. Different /scratch directory. My build is in /scratch/palmieri/sage..., and it should be world-readable.

comment:15 in reply to: ↑ 14 ; follow-up: Changed 11 years ago by SimonKing

Hi John!

Yes, I already found out that indeed I have an account on t2.math. I could see only part of the advice, but at least I added a .profile in my home directory containing

if [ `uname -n` = t2 ] ; then
   . /usr/local/gcc-4.4.1-sun-linker/gcc441sun
fi

Then, I unpacked the pre-built sage in some directory of mine (but not in /scratch, perhaps that would have been better), and Sage started.

Replying to jhpalmieri:

... In the file SAGE_ROOT/local/include/singular/kstd1.h, the variables Kstd1_deg and Kstd1_mu are defined differently; could that be causing the problem?

extern int LazyPass,LazyDegree,mu,Kstd1_deg;
#define Kstd1_mu mu

Yes, this is something that we have been wondering about in Kaiserslautern at Sage Days 23.5.

My understanding is that the line #define Kstd1_mu mu has the same effect as replacing Kstd1_mu by mu in any C-file that includes the header kstd1.h - in particular, this should also hold for the Cython-generated C-files. Hans Schönemann (Singular developer) agreed that this should not be a problem. And I used similar #define tricks myself repeatedly, so far without problem.
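
The textual replacement described above can be mimicked in a few lines. This is a toy sketch, not the real C preprocessor: `expand_object_macro` is a hypothetical helper that only handles a single object-like macro and ignores strings and comments.

```python
import re


def expand_object_macro(source, name, replacement):
    """Replace every whole-word occurrence of `name`, roughly as an
    object-like #define would. Toy model: strings and comments in
    `source` are not treated specially."""
    pattern = re.compile(r"\b" + re.escape(name) + r"\b")
    return pattern.sub(replacement, source)


# Line 1800 of the Cython-generated option.cpp, as shown by gdb above.
generated = "Kstd1_mu = __pyx_t_5;"
expanded = expand_object_macro(generated, "Kstd1_mu", "mu")
print(expanded)
```

Under this model the compiler never sees the name Kstd1_mu; the assignment is compiled as an assignment to the global symbol mu.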

But this was all using gcc. Could it be that the t2/Sun compiler behaves differently in that regard?

However, I would expect, if kstd1.h really is to blame, that Singular would not build. Recall that this file is taken verbatim from Singular.

Cheers,

Simon

comment:16 Changed 11 years ago by jhpalmieri

The compiler on t2 is gcc, but maybe on Solaris, compiler behavior is more strict, so things which are slightly sloppy and work on other systems may fail on Solaris. But I know next to nothing about C and compilers and issues like that.

comment:17 Changed 11 years ago by jhpalmieri

Are there any flags we could pass to the compiler for options.pyx (or singular.pyx) in SAGE_ROOT/devel/sage/module_list.py, which would help with this situation?

comment:18 in reply to: ↑ 15 Changed 11 years ago by drkirkby

Replying to SimonKing:

But this was all using gcc. Could it be that the t2/Sun compiler behaves differently in that regard?

The compiler used to build Sage on t2.math.washington.edu is gcc. The Sun compilers are much stricter than gcc, and will not compile loads of Sage. So this code is presenting a problem with gcc.

If I build Sage 64-bit on Solaris SPARC it dumps core very easily. I've noticed the errors often seem to be singular related. Also, if I load a library for debugging, by the time we have got to the

sage:

prompt, there are already a couple of memory leaks. But those should not cause a crash, unless they exhaust too much memory, which I don't think they will do. But it does make me a bit suspicious of the Singular/Sage combination.

One thing worth doing is looking at compiler warnings. There are tons of them in Sage, but perhaps one near this problem might give us a clue.

Dave

comment:19 follow-up: Changed 11 years ago by jhpalmieri

I touched the files option.pyx and singular.pyx in devel/sage/sage/libs/singular/ and did "sage -b". The only warnings seem to be these:

cc1plus: warning: command line option "-Wstrict-prototypes" is valid for Ada/C/ObjC but not for C++
cc1plus: warning: command line option "-Wstrict-prototypes" is valid for Ada/C/ObjC but not for C++

comment:20 Changed 11 years ago by SimonKing

For the record: Instead of importing Kstd1_mu from "kstd1.h", I imported mu (should be the same, by #define Kstd1_mu mu) and changed the rest of the code accordingly. But the segfault remains.

comment:21 in reply to: ↑ 19 ; follow-up: Changed 11 years ago by drkirkby

Replying to jhpalmieri:

I touched the files option.pyx and singular.pyx in devel/sage/sage/libs/singular/ and did "sage -b". The only warnings seem to be these:

cc1plus: warning: command line option "-Wstrict-prototypes" is valid for Ada/C/ObjC but not for C++
cc1plus: warning: command line option "-Wstrict-prototypes" is valid for Ada/C/ObjC but not for C++

When I touch one of the files and run sage -b I see:

building 'sage.libs.singular.option' extension
gcc -fno-strict-aliasing -g -O2 -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -fPIC -I/export/home/drkirkby/32/sage-4.5.2.alpha0/local/include/singular -I/export/home/drkirkby/32/sage-4.5.2.alpha0/local//include -I/export/home/drkirkby/32/sage-4.5.2.alpha0/local//include/csage -I/export/home/drkirkby/32/sage-4.5.2.alpha0/devel//sage/sage/ext -I/export/home/drkirkby/32/sage-4.5.2.alpha0/local/include/python2.6 -c sage/libs/singular/option.cpp -o build/temp.solaris-2.10-sun4u-2.6/sage/libs/singular/option.o -w
cc1plus: warning: command line option "-Wstrict-prototypes" is valid for Ada/C/ObjC but not for C++

What is certainly not a good idea is that the C compiler gcc is being invoked to compile a C++ program. Since option.cpp is a C++ file, it should be compiled with g++, not gcc.

I doubt that would be extensively checked by the gcc developers. It may be a case of you get away with it on some platforms some of the time, but not all platforms all of the time.

Dave

comment:22 in reply to: ↑ 21 ; follow-up: Changed 11 years ago by SimonKing

Replying to drkirkby:

Replying to jhpalmieri: ... What is certainly not a good idea is that the C compiler gcc is being invoked to compile a C++ program. Since option.cpp is a C++ file, it should be compiled with g++, not gcc.

So, the crucial question is: Why is Cython generating a C++ file rather than a C file?

Does it generate C++ since it is linked against Singular sources (which ostensibly are C++)?

And what happens if one invokes g++ rather than gcc on the Cython-generated code?

comment:23 follow-up: Changed 11 years ago by wjp

libpari-gmp.so.2 also exports a mu symbol, by the way. But it's a function, not data, so assigning to that could cause that crash.
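
A toy model of this hypothesis, under the assumption that the runtime linker binds an undefined reference to the first global definition it finds in load order. The load orders below are hypothetical, `resolve` and `store_int` are illustrative names, and a real linker is far more involved.

```python
from collections import namedtuple

Symbol = namedtuple("Symbol", ["library", "name", "kind"])


def resolve(name, load_order):
    """Return the first exported symbol called `name`, in load order."""
    for symbol in load_order:
        if symbol.name == name:
            return symbol
    raise LookupError("undefined symbol: " + name)


def store_int(symbol):
    """Model writing an int through the resolved address.

    Writing into a function's code is modelled as a crash.
    """
    return "ok" if symbol.kind == "data" else "SIGSEGV"


pari_mu = Symbol("libpari-gmp.so.2", "mu", "function")
singular_mu = Symbol("libsingular", "mu", "data")

# Hypothetical load orders: which library wins decides the outcome.
singular_first = store_int(resolve("mu", [singular_mu, pari_mu]))
pari_first = store_int(resolve("mu", [pari_mu, singular_mu]))
print(singular_first, pari_first)
```

In this model the same source behaves differently depending only on load order, which would be one way for the crash to appear on a single platform.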

comment:24 in reply to: ↑ 22 Changed 11 years ago by jhpalmieri

When I touch one of the files, I see

gcc -fno-strict-aliasing -g -O2 -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -fPIC -I/scratch/palmieri/sage-4.5.2.alpha0/local/include/singular -I/scratch/palmieri/sage-4.5.2.alpha0/local//include -I/scratch/palmieri/sage-4.5.2.alpha0/local//include/csage -I/scratch/palmieri/sage-4.5.2.alpha0/devel//sage/sage/ext -I/scratch/palmieri/sage-4.5.2.alpha0/local/include/python2.6 -c sage/libs/singular/option.cpp -o build/temp.solaris-2.10-sun4v-2.6/sage/libs/singular/option.o -w

then a few lines later,

g++ -shared build/temp.solaris-2.10-sun4v-2.6/sage/libs/singular/option.o -L/scratch/palmieri/sage-4.5.2.alpha0/local//lib -L/scratch/palmieri/sage-4.5.2.alpha0/local/lib -lcsage -lm -lreadline -lsingular -lgivaro -lgmpxx -lgmp -lstdc++ -lntl -lpython2.6 -o build/lib.solaris-2.10-sun4v-2.6/sage/libs/singular/option.so

So, the crucial question is: Why is Cython generating a C++ file rather than a C file?

Presumably because in devel/sage/module_list.py, it says

    Extension('sage.libs.singular.option',
              sources = ['sage/libs/singular/option.pyx'],
              libraries = ['m', 'readline', 'singular', 'givaro', 'gmpxx', 'gmp'],
              language="c++",
              include_dirs = [SAGE_ROOT +'/local/include/singular'],
              depends = [SAGE_ROOT + "/local/include/libsingular.h"]),

Note the "language" line.

libpari-gmp.so.2 also exports a mu symbol, by the way. But it's a function, not data, so assigning to that could cause that crash.

That's interesting; could that be the problem? What can we do to fix it?

comment:25 in reply to: ↑ 23 ; follow-up: Changed 11 years ago by SimonKing

Replying to wjp:

libpari-gmp.so.2 also exports a mu symbol, by the way. But it's a function, not data, so assigning to that could cause that crash.

OK, but why would this only strike on t2?

comment:26 follow-up: Changed 11 years ago by jhpalmieri

If I comment out all assignments to Kstd1_mu, then Sage starts and for the directory sage/libs/singular, there is only one doctest failure:

sage -t  devel/sage/sage/libs/singular/option.pyx
**********************************************************************
File "/scratch/palmieri/sage-4.5.2.alpha0/devel/sage-main/sage/libs/singular/option.pyx", line 415:
    sage: J.groebner_basis(mult_bound=100)
Expected:
    [x^3*y^2 + y^3*z^2 + x^2*z^3, x^2*y^3 + x^3*z^2 + y^2*z^3, y^5, x^6 + x*y^4*z^5, x^4*z^2 - y^4*z^2 - x^2*y*z^3 + x*y^2*z^3, z^6 - x*y^4*z^4 - x^3*y*z^5]
Got:
    [x^3*y^2 + y^3*z^2 + x^2*z^3, x^2*y^3 + x^3*z^2 + y^2*z^3, y^5, x^6, x^4*z^2 - y^4*z^2 - x^2*y*z^3 + x*y^2*z^3, z^6, y^4*z^3 - y^3*z^4 - x^2*z^5, x^3*y*z^4 - x^2*y^2*z^4 + x*y^3*z^4, x^3*z^5, x^2*y*z^5 + y^3*z^5, x*y^3*z^5]

Not surprising, since it looks like setting the multiplicative bound wants to set Kstd1_mu.

Is something like this an option, if we can't fix the whole problem?

comment:27 in reply to: ↑ 25 Changed 11 years ago by drkirkby

Replying to SimonKing:

Replying to wjp:

libpari-gmp.so.2 also exports a mu symbol, by the way. But it's a function, not data, so assigning to that could cause that crash.

OK, but why would this only strike on t2?

Bugs that only show on one system are quite common. That's one of the big advantages in testing on multiple platforms. I have hit many such bugs over the years.

  • I recall writing some multi-threaded code which worked fine on numerous systems - Linux, HP-UX, Unicos, Solaris, tru64 and IRIX. But it occasionally failed on AIX. I suspected it was an AIX bug, but then I realised it was a bug in my code, which could have shown up on any operating system. It just never did.
  • I recall an ex-colleague writing some finite difference code that worked fine on his Linux system, but crashed on a quad core Solaris machine. He then looked at his code carefully and found there was a genuine bug.

I actually find that there are some bugs related to Singular and Pari that show up on 64-bit Solaris. I suspect they are genuine bugs and could cause problems on other systems, but just have not to date.

Dave

comment:28 follow-up: Changed 11 years ago by wjp

DrKirkby: Please don't accuse other projects randomly and unconstructively. If there are 64 bit bugs in them, please report them in a separate ticket. (Or limit discussion of them to those tickets if you already made tickets.) This ticket is getting quite long enough already, and we haven't even fixed the problem in this one yet :-)

jhpalmieri: if doable, it would be good to rename mu everywhere inside Singular to something less generic. It seems they already use Kstd1_mu internally everywhere with that #define Kstd1_mu mu, so hopefully it's as easy as renaming mu and removing the #define. I'll see if I can get it to work quickly and make a new singular spkg for that.

comment:29 in reply to: ↑ 28 Changed 11 years ago by malb

Replying to wjp:

jhpalmieri: if doable, it would be good to rename mu everywhere inside Singular to something less generic. It seems they already use Kstd1_mu internally everywhere with that #define Kstd1_mu mu, so hopefully it's as easy as renaming mu and removing the #define. I'll see if I can get it to work quickly and make a new singular spkg for that.

That sounds like a good plan, I'll inform upstream to see whether they have some insight to share.

comment:30 follow-up: Changed 11 years ago by wjp

  • Status changed from new to needs_review

New SPKG that renames mu to Kstd1_mu at:

http://www.math.leidenuniv.nl/~wpalenst/sage/singular-3.1.0.4.p8.spkg

It builds ok on my 64 bit linux system, and all tests pass.

malb: Waiting for upstream's feedback would be good too; thanks for informing them.

comment:31 in reply to: ↑ 26 ; follow-ups: Changed 11 years ago by SimonKing

Replying to jhpalmieri:

If I comment out all assignments to Kstd1_mu, then Sage starts and for the directory sage/libs/singular, there is only one doctest failure: ... Not surprising, since it looks like setting the multiplicative bound wants to set Kstd1_mu.

Is something like this an option, if we can't fix the whole problem?

Usage of Kstd1_mu was introduced by my patch at #1396. The aim of that patch was to make all Singular options available to libsingular. Two of Singular's options involve an int parameter: degBound and multBound. I used the former a lot, and it is something of a relief that it doesn't cause a problem here. The latter, however, was new to me until two weeks ago.

So, personally, I'd say that your suggestion could be a short term solution. But still, I'd like to see all Singular options available in libsingular.

I'd prefer to first test whether wjp's solution works.

@wjp and @malb: option.pyx links against certain files of Singular, located in SAGE_LOCAL, if I'm not mistaken. Will installing the new spkg automatically put these files in place? Or do you say that the spkg actually fixes the problem already?

One general question: Is t2 really so slow, or am I getting a wrong impression? After touching option.pyx, it took about an hour to do sage -br.

Cheers, Simon

comment:32 in reply to: ↑ 31 Changed 11 years ago by SimonKing

Replying to SimonKing:

One general question: Is t2 really so slow, or am I getting a wrong impression? After touching option.pyx, it took about an hour to do sage -br.

... or could the missing speed be caused by my not using /scratch? Now I unpacked the available Sage binary in /scratch, and it was much faster.

comment:33 in reply to: ↑ 30 Changed 11 years ago by SimonKing

Replying to wjp:

New SPKG that renames mu to Kstd1_mu at:

http://www.math.leidenuniv.nl/~wpalenst/sage/singular-3.1.0.4.p8.spkg

Isn't there already a singular-3.1.1...spkg waiting for review (or already with a positive review)? Martin?

comment:34 follow-up: Changed 11 years ago by drkirkby

There is a ticket to create a newer version of Singular - see #8059. Although it had positive review, and I was marked as the reviewer, I never gave it positive review - the author did. I particularly asked if it worked properly on Solaris, but the only evidence presented was that the package built - not that Sage built, or Sage passed the doctests on Solaris.

However, if I understand correctly, 4.5.2 is going to be library updates only, with no updates to .spkg files, so I do not believe an updated Singular.spkg is likely to be merged in 4.5.2, though it might if the update is seen as critical. Though changing variable names in one version, when another version is likely to be merged on the release after this, is perhaps not a great idea.

Simon, you are correct, 't2' is slow. It's a shame really, as it is very nice hardware, which is totally unsuited for the task at hand. The CPUs are designed for a very different task. However, with parallel builds it is nowhere near as bad as it used to be.

Dave

comment:35 in reply to: ↑ 31 Changed 11 years ago by mpatel

Replying to SimonKing:

One general question: Is t2 really so slow, or am I getting a wrong impression? After touching option.pyx, it took about an hour to do sage -br.

Building in t2's /scratch should help, but it really is slow for our purposes. It should also help to use, e.g.,

$ env MAKE="make -j48" ./sage -br

comment:36 in reply to: ↑ 34 ; follow-up: Changed 11 years ago by SimonKing

Replying to drkirkby:

However, if I understand correctly, 4.5.2 is going to be library updates only, with no updates to .spkg files, so I do not believe an updated Singular.spkg is likely to be merged in 4.5.2, though it might if the update is seen as critical. Though changing variable names in one version, when another version is likely to be merged on the release after this, is perhaps not a great idea.

Is it really "changing variable names"? If I understood correctly, Singular does not use the variable name mu - it consistently uses Kstd1_mu, and so do I. So, I wouldn't call it "changing variable names" but "clarifying a variable definition".

comment:37 follow-up: Changed 11 years ago by wjp

Thanks for the link to #8059. The new version of Singular there actually already has done exactly the same thing, and renamed mu to Kstd1_mu. So that spkg supersedes this one.

comment:38 in reply to: ↑ 37 ; follow-up: Changed 11 years ago by SimonKing

Replying to wjp:

Thanks for the link to #8059. The new version of Singular there actually already has done exactly the same thing, and renamed mu to Kstd1_mu. So that spkg supersedes this one.

That's good news!

I already started to build your singular-3.1.0-spkg and will report whether it works. And then I'll try again with the spkg from #8059, also doing make ptestlong, so that hopefully a proper positive review for #8059 is possible.

Cheers, Simon

comment:39 in reply to: ↑ 36 Changed 11 years ago by drkirkby

Replying to SimonKing:

Is it really "changing variable names"? If I understood correctly, Singular does not use the variable name mu - it consistently uses Kstd1_mu, and so do I. So, I wouldn't call it "changing variable names" but "clarifying a variable definition".

Sorry, I mis-understood.

Dave

comment:40 Changed 11 years ago by drkirkby

It's not clear to me if the updated Singular can go into this release of Sage though. IIRC, William has specifically remarked that the updates to Pari and Singular would not be in this release.

Anyway, I must do something else. Need to finish a job application, which needs to be done in less than two hours!

Dave

comment:41 in reply to: ↑ 38 Changed 11 years ago by SimonKing

Replying to SimonKing:

I already started to build your singular-3.1.0-spkg and will report whether it works.

I am afraid it didn't. It seems to me that local/include/singular/kstd1.h was not replaced when I installed the new spkg.

So, where does local/include/singular/kstd1.h come from? Where do I get the files in local/include/singular/ from?

comment:42 Changed 11 years ago by wjp

Sorry, that was a very stupid mistake in spkg-install. I updated the spkg to fix that. (Same URL)

comment:43 follow-up: Changed 11 years ago by mpatel

Replying to SimonKing:

Replying to SimonKing:

I already started to build your singular-3.1.0-spkg and will report whether it works.

I am afraid it didn't. It seems to me that local/include/singular/kstd1.h was not replaced when I installed the new spkg.

Willem's first singular-3.1.0.4.p8.spkg yields an "Unhandled SIGSEGV" for me on t2; I haven't tested the latest version. However, Sage does start with the latest patch (I ignored the rejects) and package at #8059, after I run ./sage -b. I'm running the long doctests now.

comment:44 in reply to: ↑ 43 Changed 11 years ago by mpatel

Replying to mpatel:

Willem's first singular-3.1.0.4.p8.spkg yields an "Unhandled SIGSEGV" for me on t2; I haven't tested the latest version. However, Sage does start with the latest patch (I ignored the rejects) and package at #8059, after I run ./sage -b. I'm running the long doctests now.

These pass, except for those fixed by #9590. The suite for sage/schemes/elliptic_curves/ell_rational_field.py timed out (3602.4 s) and I'm rerunning it now. I'll try to install and test the new p8 spkg in a separate copy of 4.5.2.alpha0.

comment:45 follow-up: Changed 11 years ago by drkirkby

Would Mitesh, (who is a release manager for 4.5.2), consider merging the package at #8059, despite it being a major update to Singular, if it fixes this problem? I know the plan was not to update .spkg files in Sage 4.5.2, but plans do sometimes change.

If my understanding is correct,

http://sage.math.washington.edu/home/malb/spkgs/singular-3-1-1-4.spkg

which is the latest release, does not define Kstd1_mu to be mu, so this particular problem should not exist. However, since singular-3-1-1-4.spkg has not been checked properly on 't2', there may be other problems.

It seems to me that the policy the release managers adopt regarding the updating of .spkg files could have a major impact on what happens on this ticket.

Dave

comment:46 in reply to: ↑ 45 ; follow-up: Changed 11 years ago by ddrake

Replying to drkirkby:

Would Mitesh, (who is a release manager for 4.5.2),

I'm also a release manager for 4.5.2. :)

consider merging the package at #8059, despite it being a major update to Singular, if it fixes this problem? I know the plan was not to update .spkg files in Sage 4.5.2, but plans do sometimes change.

We are planning to merge a new sagenb spkg, but I would prefer to avoid merging a new Singular spkg.

Did the patch from #1396 fix any bugs or failing doctests, or did it only add new functionality? If it only added new functionality, I would really prefer to back it out for now.

If my understanding is correct,

http://sage.math.washington.edu/home/malb/spkgs/singular-3-1-1-4.spkg

which is the latest release, does not define Kstd1_mu to be mu, so this particular problem should not exist. However, since singular-3-1-1-4.spkg has not been checked properly on 't2', there may be other problems.

It seems to me that the policy the release managers adopt regarding the updating of .spkg files could have a major impact on what happens on this ticket.

Dave

comment:47 follow-ups: Changed 11 years ago by ddrake

By the way, the proper way in Mercurial to "undo" a changeset is the backout command. Here's how you can test 4.5.2.alpha0 without #1396:

  • build 4.5.2.alpha0
  • make a new branch
  • in that branch, do hg backout --merge 14701
  • do hg commit to commit the result of the merge
  • test Sage as usual

See http://hgbook.red-bean.com/read/finding-and-fixing-mistakes.html#id392287 for more info. I'm currently testing 4.5.2.alpha0 on t2.math after doing the above.
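
The steps above can be sketched as a small shell function. This is only a sketch under assumptions: "make a new branch" is done here with a plain hg clone, the clone path is whatever you pass in, and the HG_CMD override is a hypothetical knob that exists only so the function can be dry-run without Mercurial. The changeset number 14701 is the one quoted in this comment.

```shell
# Sketch: back out one changeset in a throwaway clone.
# Assumes Mercurial is on PATH unless HG_CMD (hypothetical) is set.
backout_changeset() {
    hg_cmd=${HG_CMD:-hg}
    repo=$1   # existing repository, e.g. SAGE_ROOT/devel/sage
    rev=$2    # changeset to undo, e.g. 14701
    work=$3   # path for the new clone (assumed not to exist yet)

    # "Make a new branch": clone the repository into a separate directory.
    "$hg_cmd" clone "$repo" "$work" || return 1
    # Back out the changeset and merge the result with the previous parent.
    "$hg_cmd" -R "$work" backout --merge -m "Back out changeset $rev" "$rev" || return 1
    # Commit the result of the merge.
    "$hg_cmd" -R "$work" commit -m "Merge backout of changeset $rev"
}
```

A call might look like backout_changeset devel/sage 14701 devel/sage-backout, after which Sage still has to be rebuilt from the new clone and tested as usual.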

comment:48 in reply to: ↑ 47 Changed 11 years ago by ddrake

Replying to ddrake:

I'm currently testing 4.5.2.alpha0 on t2.math after doing the above.

Sage starts after backing out #1396. I'll see about doctests.

comment:49 Changed 11 years ago by jhpalmieri

Willem's first singular-3.1.0.4.p8.spkg yields an "Unhandled SIGSEGV" for me on t2.

The first one didn't make any changes. After doing any of the others, you probably have to do "sage -ba" (or maybe you can get away with just rebuilding the pyx files in libs/singular?).

comment:50 follow-up: Changed 11 years ago by jhpalmieri

For what it's worth, I've tested the spkg posted here on a bunch of different platforms, both building from scratch and upgrading (followed by "sage -ba"), and it seems to behave well: passes all tests (except for known, unrelated, failures) on sage.math, taurus, menas, lena, my OS X box, and some tests on t2.math -- I haven't had time to run all of them, but it passes all tests on libs/singular/ and rings/. So I would give this a positive review. If you feel like merging this spkg instead of backing out #1396, you have that option. (For what it's worth, the package at #8059 does not work for me when I try to build 4.5.2.alpha0 with it; maybe it's not compatible with #1396?)

I'll run all long tests on t2.math tonight, and if there are no problems, I'll mark this officially as "positive review".

As I said, after upgrading, you have to make sure to do "sage -ba". Is this automatic in the upgrade process?

comment:51 in reply to: ↑ 46 Changed 11 years ago by SimonKing

Replying to ddrake:

... Did the patch from #1396 fix any bugs or failing doctests, or did it only add new functionality?

It is an enhancement.

comment:52 in reply to: ↑ 47 Changed 11 years ago by mpatel

Replying to ddrake:

By the way, the proper way in Mercurial to "undo" a changeset is the backout command. Here's how you can test 4.5.2.alpha0 without #1396:

Thanks for the tip! Thanks also to David for the GDB mini-lesson!

comment:53 Changed 11 years ago by ddrake

Since #1396 didn't fix any bugs or other failures, and since I'm not brave enough to merge any more spkgs than I already have, for 4.5.2 I propose that we backout attachment:trac1396-singular_options.2.patch. I will open a new ticket, in which we can re-merge #1396.

The patch here was obtained by using hg backout, so if we trust Mercurial, it will exactly reverse the patch from #1396.

comment:54 Changed 11 years ago by ddrake

The new ticket for remerging is #9599.

comment:55 follow-up: Changed 11 years ago by mpatel

  • Authors set to Dan Drake

I'm now testing 4.5.2.alpha0 plus comment 47's backout procedure on bsd.math, sage.math, and t2, but I won't be able to report the results until after I wake up.

comment:56 Changed 11 years ago by leif

  • Cc leif added

comment:57 in reply to: ↑ 55 Changed 11 years ago by mpatel

  • Reviewers set to Mitesh Patel
  • Status changed from needs_review to positive_review

Replying to mpatel:

I'm now testing 4.5.2.alpha0 plus comment 47's backout procedure on bsd.math, sage.math, and t2, but I won't be able to report the results until after I wake up.

I get no new long doctest failures. Also,

$ cd SAGE_ROOT/devel/sage
$ wget http://trac.sagemath.org/sage_trac/raw-attachment/ticket/1396/trac1396-singular_options.2.patch
$ wget http://trac.sagemath.org/sage_trac/raw-attachment/ticket/9583/trac_9583.patch
$ hg stat
$ patch -p1 < trac1396-singular_options.2.patch
patching file sage/interfaces/singular.py
patching file sage/libs/singular/option.pyx
patching file sage/libs/singular/singular-cdefs.pxi
patching file sage/libs/singular/singular.pyx
patching file sage/rings/polynomial/multi_polynomial_ideal.py
$ hg stat
M sage/interfaces/singular.py
M sage/libs/singular/option.pyx
M sage/libs/singular/singular-cdefs.pxi
M sage/libs/singular/singular.pyx
M sage/rings/polynomial/multi_polynomial_ideal.py
$ patch -p1 < trac_9583.patch
patching file sage/interfaces/singular.py
patching file sage/libs/singular/option.pyx
patching file sage/libs/singular/singular-cdefs.pxi
patching file sage/libs/singular/singular.pyx
patching file sage/rings/polynomial/multi_polynomial_ideal.py
$ hg stat
$ hg diff
$

so I'm ready to give this a positive review.
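
The same kind of check can be tried on a toy file without a Sage tree. This is only a sketch: the file names and contents are made up, and the two diffs stand in for the #1396 patch and the backout patch. If the second patch exactly reverses the first, the file ends up identical to the original.

```shell
# Toy round trip; everything lives in a temporary directory.
work=$(mktemp -d)
printf 'degBound\n' > "$work/before.txt"
printf 'degBound\nmultBound\n' > "$work/after.txt"
cp "$work/before.txt" "$work/file.txt"

# diff exits with status 1 when the inputs differ, hence "|| true".
diff -u "$work/before.txt" "$work/after.txt" > "$work/forward.patch" || true
diff -u "$work/after.txt" "$work/before.txt" > "$work/backout.patch" || true

if command -v patch >/dev/null 2>&1; then
    # Apply the forward patch, then the patch that should undo it.
    patch "$work/file.txt" < "$work/forward.patch" >/dev/null
    patch "$work/file.txt" < "$work/backout.patch" >/dev/null
    if cmp -s "$work/before.txt" "$work/file.txt"; then
        roundtrip=clean
    else
        roundtrip=changed
    fi
else
    roundtrip=skipped   # patch(1) is not installed on this machine
fi
echo "round trip: $roundtrip"
rm -rf "$work"
```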

comment:58 in reply to: ↑ 50 Changed 11 years ago by mpatel

Replying to jhpalmieri:

As I said, after upgrading, you have to make sure to do "sage -ba". Is this automatic in the upgrade process?

As far as I can tell, sage -upgrade (which invokes sage-upgrade and sage-update) and sage -f/i (which call sage-spkg) do not check whether it's necessary to rebuild dependent packages (nor warn about or actually rebuild them), particularly those that are already installed (i.e., have a corresponding marker in SAGE_ROOT/spkg/installed). Or am I misunderstanding your question?
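
Until something like that is automated, the manual workaround described in this thread can be sketched as below. It assumes it is run from SAGE_ROOT, uses only the sage -f and sage -ba commands mentioned above, and the SAGE override is a hypothetical knob for dry runs.

```shell
# Sketch: force-install an spkg, then rebuild all of the Sage library so
# that Cython modules built against the old headers are recompiled.
reinstall_and_rebuild() {
    sage_cmd=${SAGE:-./sage}   # hypothetical override, defaults to ./sage
    spkg=$1                    # path or name of the spkg to install

    # Only rebuild if the package installation succeeded.
    "$sage_cmd" -f "$spkg" && "$sage_cmd" -ba
}
```

For example, reinstall_and_rebuild singular-3.1.0.4.p8.spkg would install the package posted on this ticket and then run sage -ba.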

comment:59 in reply to: ↑ 47 Changed 11 years ago by drkirkby

Replying to ddrake:

By the way, the proper way in Mercurial to "undo" a changeset is the backout command. Here's how you can test 4.5.2.alpha0 without #1396:

  • build 4.5.2.alpha0
  • make a new branch
  • in that branch, do hg backout --merge 14701
  • do hg commit to commit the result of the merge
  • test Sage as usual

See http://hgbook.red-bean.com/read/finding-and-fixing-mistakes.html#id392287 for more info. I'm currently testing 4.5.2.alpha0 on t2.math after doing the above.

Thank you for that Dan. Its usefulness to me will extend well beyond the point this ticket gets closed.

Dave

comment:60 Changed 11 years ago by ddrake

  • Merged in set to sage-4.5.2.alpha1
  • Resolution set to fixed
  • Status changed from positive_review to closed