Opened 9 years ago
Closed 8 years ago
#13151 closed defect (fixed)
fix pickling of Matrix_modn_dense_double on SPARC
Reported by: | malb | Owned by: | drkirkby |
---|---|---|---|
Priority: | major | Milestone: | sage-5.9 |
Component: | porting: Solaris | Keywords: | Solaris SPARC |
Cc: | vbraun, jdemeyer, drkirkby, jpflori, leif | Merged in: | sage-5.9.beta5 |
Authors: | Jeroen Demeyer | Reviewers: | Jean-Pierre Flori |
Report Upstream: | N/A | Work issues: | |
Branch: | Commit: | ||
Dependencies: | Stopgaps: |
Description (last modified by )
On the Skynet machine mark
(Solaris SPARC), pickling a Matrix_modn_dense_double
results in a Bus Error (as reported at #12841):
sage -t --long -force_lib devel/sage/sage/matrix/matrix_modn_dense.pyx ********************************************************************** File "/home/buildbot/build/sage/mark-1/mark_full/build/sage-5.2.beta0/devel/sage-main/sage/matrix/matrix_modn_dense.pyx", line 464: sage: data, version = A._pickle() Exception raised: Traceback (most recent call last): File "/home/buildbot/build/sage/mark-1/mark_full/build/sage-5.2.beta0/local/bin/ncadoctest.py", line 1231, in run_one_test self.run_one_example(test, example, filename, compileflags) File "/home/buildbot/build/sage/mark-1/mark_full/build/sage-5.2.beta0/local/bin/sagedoctest.py", line 38, in run_one_example OrigDocTestRunner.run_one_example(self, test, example, filename, compileflags) File "/home/buildbot/build/sage/mark-1/mark_full/build/sage-5.2.beta0/local/bin/ncadoctest.py", line 1172, in run_one_example compileflags, 1) in test.globs File "<doctest __main__.example_6[8]>", line 1, in <module> data, version = A._pickle()###line 464: sage: data, version = A._pickle() File "matrix_modn_dense_template.pxi", line 636, in sage.matrix.matrix_modn_dense_double.Matrix_modn_dense_template._pickle (sage/matrix/matrix_modn_dense_double.cpp:6518) sig_on() RuntimeError: Bus error **********************************************************************
Attachments (1)
Change History (33)
comment:1 Changed 9 years ago by
comment:2 Changed 9 years ago by
I tried gcc -munaligned-doubles
but that didn't help.
The problem seems to be that malloc on 32-bit SPARC solaris generally returns 4-byte aligned buffers.
comment:3 follow-up: ↓ 4 Changed 9 years ago by
I've built 32-bit Sage on OpenIndiana and it works fine, for the record.
comment:4 in reply to: ↑ 3 ; follow-up: ↓ 5 Changed 9 years ago by
Replying to vbraun:
I've built 32-bit Sage on OpenIndiana and it works fine, for the record.
On SPARC hardware?
comment:5 in reply to: ↑ 4 ; follow-up: ↓ 6 Changed 9 years ago by
comment:6 in reply to: ↑ 5 Changed 9 years ago by
comment:7 follow-up: ↓ 23 Changed 9 years ago by
eplying to vbraun:
I tried
gcc -munaligned-doubles
but that didn't help.The problem seems to be that malloc on 32-bit SPARC solaris generally returns 4-byte aligned buffers.
That's not necessarily the case. The string <char*>s
isn't directly generated from a malloc
, instead it is the result of a PyString_AsString()
call. That isn't guaranteed to be 8-bytes aligned.
The obvious solution is to allocate a buffer ourselves, fill that and pass the result to PyString_FromStringAndSize(buf, size)
. This will cause an extra memcpy()
but that might be unavoidable.
comment:8 Changed 9 years ago by
Some more testing reveals that a) malloc on SPARC is in fact 8-byte aligned (as specified by the C99 standard), and b) the output of PyString_FromStringAndSize
is never 8-byte aligned, not on SPARC 32-bit, OI 32-bit, Linux 64-bit:
sage: cython(""" ....: def f(): ....: cdef char* buf ....: s = PyString_FromStringAndSize(NULL, 1*8) ....: buf = <char*>s ....: return <int>buf ....: """) sage: sage: set([ f() % 8 for i in range(100) ]) set([4])
comment:9 Changed 9 years ago by
For debugging purposes it might be useful to know that you can turn off the misalignment support and trigger bus errors on x86 processors:
#include <stdio.h> #include <stdlib.h> #include <string.h> #include <inttypes.h> /* Run as (on x86 processors): * $ gcc -o test test.c && ./test * Bus error (core dumped) */ int main(void) { __asm__("pushf\n" "orl $0x40000, (%rsp)\n" "popf"); char *ptr = malloc(20); ptr += 4; uint64_t *uint64_ptr = (uint64_t*)ptr; (*uint64_ptr) = (uint64_t)1; return 0; }
comment:10 Changed 9 years ago by
- Component changed from pickling to porting
- Description modified (diff)
- Keywords SPARC added
- Owner changed from was to drkirkby
- Summary changed from fix pickling of Matrix_modn_dense_double on Solaris (Mark) to fix pickling of Matrix_modn_dense_double on SPARC
comment:11 follow-up: ↓ 21 Changed 9 years ago by
I noticed another oddity: sage -hg
works with my sage-4.7.2.alpha3 build, but not with the sage-5.1.beta3 build. Both shipped with mercurial-1.8.4. Perhaps our gcc spkg is at fault? The working build used /usr/local/gcc-4.5.1/sparc-SunOS-ultrasparc3-sun-as-ld
-bash-3.00$ ./sage -hg qpush /home/vbraun/opt/mark/sage-5.1.beta3/spkg/bin/sage: line 641: 21670 Bus Error (core dumped) "$SAGE_LOCAL/bin/hg" "$@" -bash-3.00$ pstack core core 'core' of 21670: python /home/vbraun/opt/mark/sage-5.1.beta3/local/bin/hg qpush fea40f14 parse_dirstate (3be794, fea417c0, 3bec40, 3bec51, ff223ebc, 2b37d0) + b8 ff203388 PyCFunction_Call (3725a8, 390170, 0, e9850, 0, fea40e5c) + e0 ff270f1c PyEval_EvalFrameEx (3be5f0, 0, 38c8a0, 22bc60, 209e0, 390170) + 6030 ff271738 PyEval_EvalFrameEx (3be490, 0, 3be5f4, 33bb05, 209e0, 209e0) + 684c ff271738 PyEval_EvalFrameEx (3b53c8, 0, 3be494, 123b40, 209e0, 209e0) + 684c ff272acc PyEval_EvalCodeEx (1291d0, 3b53d4, 3b53c8, 24be4c, 3, 0) + 924 ff1ea2b0 function_call (13de70, 24be40, 0, 17e4f8, 13de70, 1813a4) + 8c ff1be5c8 PyObject_Call (13de70, 24be40, 0, ff337c78, ff1ea224, 24be30) + 60 ff1bedc8 PyObject_CallFunctionObjArgs (13de70, 3426f0, ffbfce14, ffbfce14, 0, 24be40) + d4 ff205ea8 _PyObject_GenericGetAttrWithDict (0, 1c69c0, 2a8d20, 3426f0, ff223ebc, 2b37d0) + d0 ff2052c4 PyObject_GetAttr (2b37d0, 1c69c0, ffffffff, ef088, ef088, 2b37d0) + 64 ff26cbd0 PyEval_EvalFrameEx (3bc888, 0, 33fe50, 33bdad, 209e0, 3bc9d0) + 1ce4 ff272acc PyEval_EvalCodeEx (339c38, 3bc890, 3bc888, 355c44, 2, 0) + 924 ff1ea2b0 function_call (33df30, 355c38, 0, c, 33df30, 24be40) + 8c [...] ff297134 PyRun_FileExFlags (feef31f4, ffbff48b, 16ec00, 52660, 52660, 11fa88) + 7c ff297d18 PyRun_SimpleFileExFlags (ffbff4b8, ffbff48b, 1, ffbff294, ff2eac90, feef31f4) + d4 ff2ac8f4 Py_Main (1, ffbff2fc, ff33a55c, 0, feef31f4, 0) + c38 0001055c _start (0, 0, 0, 0, 0, 0) + 5c
comment:12 follow-up: ↓ 16 Changed 9 years ago by
I can confirm the hg
problem, but don't feel like investigating that now.
You could try building the current Sage with that gcc
version instead of using the GCC spkg (note that we currently only support the Sun linker on Solaris, but that's okay).
comment:13 Changed 9 years ago by
Back to the pickling issue, why do cast double to uint64_t at all? Since we clearly don't care about endianness, why not just memcpy the bare data?
comment:14 Changed 9 years ago by
When we load a pickle from machine A on machine B and they have different endianness, then we care about fixing that.
comment:15 Changed 9 years ago by
- Cc drkirkby added
- Keywords Solaris added
comment:16 in reply to: ↑ 12 ; follow-ups: ↓ 17 ↓ 18 Changed 9 years ago by
Replying to jdemeyer:
I can confirm the
hg
problem, but don't feel like investigating that now.You could try building the current Sage with that
gcc
version instead of using the GCC spkg (note that we currently only support the Sun linker on Solaris, but that's okay).
I built Sage sage-5.3.rc0 on a Sun Blade 2000 (2 x UltraSPARC III 1200 MHz CPUs, 8 GB RAM) using gcc 4.5.0, which was on my system, and so I did not use the gcc packaged with Sage. I get the same problem, so it is not some bug related to the particular gcc in Sage.
Is Skynet's SPARC machine running? If not, and someone wants to look at this, they can use my SPARC. I don't keep it on 24/7, but now I am writing some code to control a vector network analyzer, so the machine will be on for a week or two.
Dave
comment:17 in reply to: ↑ 16 Changed 9 years ago by
comment:18 in reply to: ↑ 16 Changed 9 years ago by
Replying to drkirkby:
I built Sage sage-5.3.rc0 on a Sun Blade 2000 (2 x UltraSPARC III 1200 MHz CPUs, 8 GB RAM) using gcc 4.5.0, which was on my system, and so I did not use the gcc packaged with Sage.
Which assembler/linker is that using?
comment:19 Changed 8 years ago by
- Cc jpflori added
comment:20 Changed 8 years ago by
- Cc leif added
comment:21 in reply to: ↑ 11 ; follow-up: ↓ 22 Changed 8 years ago by
Replying to vbraun:
I noticed another oddity:
sage -hg
works with my sage-4.7.2.alpha3 build, [...]
Doesn't really surprise me... ;-)
(I've tested it with GCC 4.5.1 and 4.6.1.)
comment:22 in reply to: ↑ 21 Changed 8 years ago by
comment:23 in reply to: ↑ 7 Changed 8 years ago by
Replying to jdemeyer:
eplying to vbraun:
I tried
gcc -munaligned-doubles
but that didn't help.The problem seems to be that malloc on 32-bit SPARC solaris generally returns 4-byte aligned buffers.
That's not necessarily the case. The string
<char*>s
isn't directly generated from amalloc
, instead it is the result of aPyString_AsString()
call. That isn't guaranteed to be 8-bytes aligned.The obvious solution is to allocate a buffer ourselves, fill that and pass the result to
PyString_FromStringAndSize(buf, size)
. This will cause an extramemcpy()
but that might be unavoidable.
So should we go with that solution?
comment:24 Changed 8 years ago by
Not that we really care if matrix unpickling is a little slower...
comment:25 Changed 8 years ago by
OK, you write the patch or me (just asking to avoid duplicating effort).
It would be really cool if we could have full SPARC suppport in sage-5.9, and maybe even full Cygwin support also.
comment:26 Changed 8 years ago by
I won't have time before late tonight, so please go ahead and I'll review it later if that suits you.
comment:27 Changed 8 years ago by
- Status changed from new to needs_review
comment:28 Changed 8 years ago by
- Component changed from porting to porting: Solaris
comment:29 Changed 8 years ago by
comment:30 Changed 8 years ago by
- Reviewers set to Jean-Pierre Flori
- Status changed from needs_review to positive_review
This works fine and the patch looks ok, implementing the solution which was suggested here.
Changed 8 years ago by
comment:31 Changed 8 years ago by
I fixed a small mistake: the sig_on()
should be outside the try/finally
statement. I assume that the positive review still stands.
comment:32 Changed 8 years ago by
- Merged in set to sage-5.9.beta5
- Resolution set to fixed
- Status changed from positive_review to closed
Volker reported at #12841: