Opened 9 years ago

Closed 8 years ago

#13151 closed defect (fixed)

fix pickling of Matrix_modn_dense_double on SPARC

Reported by: malb Owned by: drkirkby
Priority: major Milestone: sage-5.9
Component: porting: Solaris Keywords: Solaris SPARC
Cc: vbraun, jdemeyer, drkirkby, jpflori, leif Merged in: sage-5.9.beta5
Authors: Jeroen Demeyer Reviewers: Jean-Pierre Flori
Report Upstream: N/A Work issues:
Branch: Commit:
Dependencies: Stopgaps:

Status badges

Description (last modified by jdemeyer)

On the Skynet machine mark (Solaris SPARC), pickling a Matrix_modn_dense_double results in a Bus Error (as reported at #12841):

sage -t  --long -force_lib devel/sage/sage/matrix/matrix_modn_dense.pyx
**********************************************************************
File "/home/buildbot/build/sage/mark-1/mark_full/build/sage-5.2.beta0/devel/sage-main/sage/matrix/matrix_modn_dense.pyx", line 464:
    sage: data, version = A._pickle()
Exception raised:
    Traceback (most recent call last):
      File "/home/buildbot/build/sage/mark-1/mark_full/build/sage-5.2.beta0/local/bin/ncadoctest.py", line 1231, in run_one_test
        self.run_one_example(test, example, filename, compileflags)
      File "/home/buildbot/build/sage/mark-1/mark_full/build/sage-5.2.beta0/local/bin/sagedoctest.py", line 38, in run_one_example
        OrigDocTestRunner.run_one_example(self, test, example, filename, compileflags)
      File "/home/buildbot/build/sage/mark-1/mark_full/build/sage-5.2.beta0/local/bin/ncadoctest.py", line 1172, in run_one_example
        compileflags, 1) in test.globs
      File "<doctest __main__.example_6[8]>", line 1, in <module>
        data, version = A._pickle()###line 464:
    sage: data, version = A._pickle()
      File "matrix_modn_dense_template.pxi", line 636, in sage.matrix.matrix_modn_dense_double.Matrix_modn_dense_template._pickle (sage/matrix/matrix_modn_dense_double.cpp:6518)
        sig_on()
    RuntimeError: Bus error
**********************************************************************

Attachments (1)

13151_alignment.patch (8.4 KB) - added by jdemeyer 8 years ago.

Download all attachments as: .zip

Change History (33)

comment:1 Changed 9 years ago by malb

Volker reported at #12841:

The output of !PyString_FromStringAndSize is 4 but not 8-byte aligned. By faking it

           um = <mod_int*>((<char *>s)+4)

I avoided the crash.

comment:2 Changed 9 years ago by vbraun

I tried gcc -munaligned-doubles but that didn't help.

The problem seems to be that malloc on 32-bit SPARC solaris generally returns 4-byte aligned buffers.

Last edited 9 years ago by vbraun (previous) (diff)

comment:3 follow-up: Changed 9 years ago by vbraun

I've built 32-bit Sage on OpenIndiana and it works fine, for the record.

comment:4 in reply to: ↑ 3 ; follow-up: Changed 9 years ago by jdemeyer

Replying to vbraun:

I've built 32-bit Sage on OpenIndiana and it works fine, for the record.

On SPARC hardware?

comment:5 in reply to: ↑ 4 ; follow-up: Changed 9 years ago by vbraun

Replying to jdemeyer:

On SPARC hardware?

Very funny. OI doesn't even support SPARC ;-)

comment:6 in reply to: ↑ 5 Changed 9 years ago by jdemeyer

Replying to vbraun:

Replying to jdemeyer:

On SPARC hardware?

Very funny. OI doesn't even support SPARC ;-)

I didn't know that, but anyway it makes your comment irrelevant. It is more of a hardware-related issue than a software-related issue.

comment:7 follow-up: Changed 9 years ago by jdemeyer

eplying to vbraun:

I tried gcc -munaligned-doubles but that didn't help.

The problem seems to be that malloc on 32-bit SPARC solaris generally returns 4-byte aligned buffers.

That's not necessarily the case. The string <char*>s isn't directly generated from a malloc, instead it is the result of a PyString_AsString() call. That isn't guaranteed to be 8-bytes aligned.

The obvious solution is to allocate a buffer ourselves, fill that and pass the result to PyString_FromStringAndSize(buf, size). This will cause an extra memcpy() but that might be unavoidable.

comment:8 Changed 9 years ago by vbraun

Some more testing reveals that a) malloc on SPARC is in fact 8-byte aligned (as specified by the C99 standard), and b) the output of PyString_FromStringAndSize is never 8-byte aligned, not on SPARC 32-bit, OI 32-bit, Linux 64-bit:

sage: cython("""
....: def f():
....:         cdef char* buf
....:     s = PyString_FromStringAndSize(NULL, 1*8)
....:     buf = <char*>s
....:     return <int>buf
....: """)
sage: 
sage: set([ f() % 8 for i in range(100) ])
set([4])

comment:9 Changed 9 years ago by vbraun

For debugging purposes it might be useful to know that you can turn off the misalignment support and trigger bus errors on x86 processors:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <inttypes.h>

/* Run as (on x86 processors):
 * $ gcc -o test test.c && ./test
 * Bus error (core dumped)
 */


int main(void)
{
  __asm__("pushf\n"
          "orl $0x40000, (%rsp)\n"
          "popf");
 
    char *ptr = malloc(20);
    ptr += 4;

    uint64_t *uint64_ptr = (uint64_t*)ptr;
    (*uint64_ptr) = (uint64_t)1;

    return 0;
}

comment:10 Changed 9 years ago by jdemeyer

  • Component changed from pickling to porting
  • Description modified (diff)
  • Keywords SPARC added
  • Owner changed from was to drkirkby
  • Summary changed from fix pickling of Matrix_modn_dense_double on Solaris (Mark) to fix pickling of Matrix_modn_dense_double on SPARC

comment:11 follow-up: Changed 9 years ago by vbraun

I noticed another oddity: sage -hg works with my sage-4.7.2.alpha3 build, but not with the sage-5.1.beta3 build. Both shipped with mercurial-1.8.4. Perhaps our gcc spkg is at fault? The working build used /usr/local/gcc-4.5.1/sparc-SunOS-ultrasparc3-sun-as-ld

-bash-3.00$ ./sage -hg qpush
/home/vbraun/opt/mark/sage-5.1.beta3/spkg/bin/sage: line 641: 21670 Bus Error               (core dumped) "$SAGE_LOCAL/bin/hg" "$@"
-bash-3.00$ pstack core
core 'core' of 21670:   python /home/vbraun/opt/mark/sage-5.1.beta3/local/bin/hg qpush
 fea40f14 parse_dirstate (3be794, fea417c0, 3bec40, 3bec51, ff223ebc, 2b37d0) + b8
 ff203388 PyCFunction_Call (3725a8, 390170, 0, e9850, 0, fea40e5c) + e0
 ff270f1c PyEval_EvalFrameEx (3be5f0, 0, 38c8a0, 22bc60, 209e0, 390170) + 6030
 ff271738 PyEval_EvalFrameEx (3be490, 0, 3be5f4, 33bb05, 209e0, 209e0) + 684c
 ff271738 PyEval_EvalFrameEx (3b53c8, 0, 3be494, 123b40, 209e0, 209e0) + 684c
 ff272acc PyEval_EvalCodeEx (1291d0, 3b53d4, 3b53c8, 24be4c, 3, 0) + 924
 ff1ea2b0 function_call (13de70, 24be40, 0, 17e4f8, 13de70, 1813a4) + 8c
 ff1be5c8 PyObject_Call (13de70, 24be40, 0, ff337c78, ff1ea224, 24be30) + 60
 ff1bedc8 PyObject_CallFunctionObjArgs (13de70, 3426f0, ffbfce14, ffbfce14, 0, 24be40) + d4
 ff205ea8 _PyObject_GenericGetAttrWithDict (0, 1c69c0, 2a8d20, 3426f0, ff223ebc, 2b37d0) + d0
 ff2052c4 PyObject_GetAttr (2b37d0, 1c69c0, ffffffff, ef088, ef088, 2b37d0) + 64
 ff26cbd0 PyEval_EvalFrameEx (3bc888, 0, 33fe50, 33bdad, 209e0, 3bc9d0) + 1ce4
 ff272acc PyEval_EvalCodeEx (339c38, 3bc890, 3bc888, 355c44, 2, 0) + 924
 ff1ea2b0 function_call (33df30, 355c38, 0, c, 33df30, 24be40) + 8c
[...]
 ff297134 PyRun_FileExFlags (feef31f4, ffbff48b, 16ec00, 52660, 52660, 11fa88) + 7c
 ff297d18 PyRun_SimpleFileExFlags (ffbff4b8, ffbff48b, 1, ffbff294, ff2eac90, feef31f4) + d4
 ff2ac8f4 Py_Main  (1, ffbff2fc, ff33a55c, 0, feef31f4, 0) + c38
 0001055c _start   (0, 0, 0, 0, 0, 0) + 5c

comment:12 follow-up: Changed 9 years ago by jdemeyer

I can confirm the hg problem, but don't feel like investigating that now.

You could try building the current Sage with that gcc version instead of using the GCC spkg (note that we currently only support the Sun linker on Solaris, but that's okay).

comment:13 Changed 9 years ago by vbraun

Back to the pickling issue, why do cast double to uint64_t at all? Since we clearly don't care about endianness, why not just memcpy the bare data?

comment:14 Changed 9 years ago by malb

When we load a pickle from machine A on machine B and they have different endianness, then we care about fixing that.

comment:15 Changed 9 years ago by jdemeyer

  • Cc drkirkby added
  • Keywords Solaris added

comment:16 in reply to: ↑ 12 ; follow-ups: Changed 9 years ago by drkirkby

Replying to jdemeyer:

I can confirm the hg problem, but don't feel like investigating that now.

You could try building the current Sage with that gcc version instead of using the GCC spkg (note that we currently only support the Sun linker on Solaris, but that's okay).

I built Sage sage-5.3.rc0 on a Sun Blade 2000 (2 x UltraSPARC III 1200 MHz CPUs, 8 GB RAM) using gcc 4.5.0, which was on my system, and so I did not use the gcc packaged with Sage. I get the same problem, so it is not some bug related to the particular gcc in Sage.

Is Skynet's SPARC machine running? If not, and someone wants to look at this, they can use my SPARC. I don't keep it on 24/7, but now I am writing some code to control a vector network analyzer, so the machine will be on for a week or two.

Dave

comment:17 in reply to: ↑ 16 Changed 9 years ago by jdemeyer

Replying to drkirkby:

Is Skynet's SPARC machine running?

Yes, it is.

comment:18 in reply to: ↑ 16 Changed 9 years ago by jdemeyer

Replying to drkirkby:

I built Sage sage-5.3.rc0 on a Sun Blade 2000 (2 x UltraSPARC III 1200 MHz CPUs, 8 GB RAM) using gcc 4.5.0, which was on my system, and so I did not use the gcc packaged with Sage.

Which assembler/linker is that using?

comment:19 Changed 8 years ago by jpflori

  • Cc jpflori added

comment:20 Changed 8 years ago by leif

  • Cc leif added

comment:21 in reply to: ↑ 11 ; follow-up: Changed 8 years ago by leif

Replying to vbraun:

I noticed another oddity: sage -hg works with my sage-4.7.2.alpha3 build, [...]

Doesn't really surprise me... ;-)

(I've tested it with GCC 4.5.1 and 4.6.1.)

comment:22 in reply to: ↑ 21 Changed 8 years ago by jpflori

Replying to leif:

Replying to vbraun:

I noticed another oddity: sage -hg works with my sage-4.7.2.alpha3 build, [...]

Doesn't really surprise me... ;-)

(I've tested it with GCC 4.5.1 and 4.6.1.)

On my 5.9.beta3 setup (with Sage's GCC), ./sage -hg seems fine as well.

comment:23 in reply to: ↑ 7 Changed 8 years ago by jpflori

Replying to jdemeyer:

eplying to vbraun:

I tried gcc -munaligned-doubles but that didn't help.

The problem seems to be that malloc on 32-bit SPARC solaris generally returns 4-byte aligned buffers.

That's not necessarily the case. The string <char*>s isn't directly generated from a malloc, instead it is the result of a PyString_AsString() call. That isn't guaranteed to be 8-bytes aligned.

The obvious solution is to allocate a buffer ourselves, fill that and pass the result to PyString_FromStringAndSize(buf, size). This will cause an extra memcpy() but that might be unavoidable.

So should we go with that solution?

comment:24 Changed 8 years ago by jpflori

Not that we really care if matrix unpickling is a little slower...

comment:25 Changed 8 years ago by jdemeyer

OK, you write the patch or me (just asking to avoid duplicating effort).

It would be really cool if we could have full SPARC suppport in sage-5.9, and maybe even full Cygwin support also.

comment:26 Changed 8 years ago by jpflori

I won't have time before late tonight, so please go ahead and I'll review it later if that suits you.

comment:27 Changed 8 years ago by jdemeyer

  • Authors set to Jeroen Demeyer
  • Status changed from new to needs_review

comment:28 Changed 8 years ago by jdemeyer

  • Component changed from porting to porting: Solaris

comment:29 Changed 8 years ago by jdemeyer

See #14429 for another SPARC-related ticket. When applying #13151 and #14429, there should be no more doctest failures on Solaris SPARC, i.e. Solaris SPARC would become fully supported with these two tickets.

comment:30 Changed 8 years ago by jpflori

  • Reviewers set to Jean-Pierre Flori
  • Status changed from needs_review to positive_review

This works fine and the patch looks ok, implementing the solution which was suggested here.

Changed 8 years ago by jdemeyer

comment:31 Changed 8 years ago by jdemeyer

I fixed a small mistake: the sig_on() should be outside the try/finally statement. I assume that the positive review still stands.

comment:32 Changed 8 years ago by jdemeyer

  • Merged in set to sage-5.9.beta5
  • Resolution set to fixed
  • Status changed from positive_review to closed
Note: See TracTickets for help on using tickets.