Opened 8 years ago

Closed 8 years ago

Last modified 8 years ago

#12954 closed defect (fixed)

Make MPIR support SAGE_FAT_BINARY on all systems

Reported by: vbraun Owned by: tbd
Priority: blocker Milestone: sage-5.0.1
Component: packages: standard Keywords: sd40.5
Cc: jdemeyer, leif, jpflori, mhansen, kcrisman, vbraun Merged in: sage-5.0.1.rc0
Authors: Jeroen Demeyer Reviewers: William Stein, Karl-Dieter Crisman
Report Upstream: N/A Work issues:
Branch: Commit:
Dependencies: Stopgaps:

Description (last modified by jdemeyer)

The OSX binary sage-5.0-OSX-64bit-10.6-x86_64-Darwin.dmg dies in MPFR on Core 2 Duo processors. This is because the MPIR spkg only checks SAGE_FAT_BINARY on Linux systems.

Reported on https://groups.google.com/d/topic/sage-support/IfJCisKo7Ao/discussion

spkg: http://boxen.math.washington.edu/home/jdemeyer/spkg/mpir-2.4.0.p4.spkg

mpir-2.4.0.p4 (Jeroen Demeyer, 26 May 2012)

  • Trac #12954: when SAGE_FAT_BINARY=yes on systems which don't support --enable-fat, compile a generic binary (not assuming any particular CPU) by passing the output of configfsf.guess to the --build option of ./configure.
  • Disable SAGE_FAT_BINARY when bootstrapping GCC.
  • Rename MPIR_EXTRA_OPTS to MPIR_CONFIGURE.
  • Add configure options directly to MPIR_CONFIGURE, no longer use SAGE_CONF_OPTS for this.
  • When user specifies CFLAGS, append them to MPIR's flags instead of completely replacing MPIR's flags.
  • Remove $default_cflags and get_processor_specific_cflags().

Attachments (1)

mpir-2.4.0.p4.diff (11.2 KB) - added by jdemeyer 8 years ago.
Diff between the 2.4.0.p3 and 2.4.0.p4 spkgs. For reference / review only.

Download all attachments as: .zip

Change History (37)

comment:1 Changed 8 years ago by vbraun

  • Cc jdemeyer leif jpflori mhansen added

comment:2 Changed 8 years ago by kcrisman

  • Cc kcrisman added

comment:3 follow-up: Changed 8 years ago by jdemeyer

Can you disassemble to see which instruction failed?

This MPFR binary was built with

-Wall -Wmissing-prototypes -Wpointer-arith -O2 -m64 -march=corei7 -mtune=corei7 

MPIR was built with

-O2 -m64 -march=corei7 -mtune=corei7

So, SAGE_FAT_BINARY indeed doesn't work as it should.

Last edited 8 years ago by jdemeyer (previous) (diff)

comment:4 in reply to: ↑ 3 Changed 8 years ago by kcrisman

Can you disassemble to see which instruction failed?

If you tell me how to do that (explicit commands) I have the offending binary on the right CPU and could at least get you a dump of the info that came out.

comment:5 Changed 8 years ago by jdemeyer

  • Component changed from build to packages
  • Owner changed from GeorgSWeber to tbd
  • Priority changed from major to blocker

comment:6 Changed 8 years ago by jdemeyer

  • Description modified (diff)
  • Summary changed from MPFR illegal instruction in OSX binaries to Make MPIR support SAGE_FAT_BINARY on all systems

The problem is with MPIR (as MPFR simply uses MPIR's flags). The current MPIR spkg only checks SAGE_FAT_BINARY on Linux systems.

comment:7 Changed 8 years ago by jdemeyer

  • Description modified (diff)

comment:8 Changed 8 years ago by jdemeyer

  • Authors set to Jeroen Demeyer
  • Description modified (diff)

comment:9 Changed 8 years ago by jdemeyer

  • Description modified (diff)

comment:10 Changed 8 years ago by jdemeyer

  • Description modified (diff)
  • Milestone changed from sage-5.1 to sage-5.0.1
  • Status changed from new to needs_review

comment:11 follow-ups: Changed 8 years ago by vbraun

SAGE_FAT_BINARY should be synonymous with SAGE_GENERIC_BINARY, though for historical reasons it isn't named well. From http://www.sagemath.org/doc/installation/source.html:

SAGE_FAT_BINARY - to prepare a binary distribution that will run on the widest range of target machines, set this variable to "yes"

I think right now you don't remove advanced CFLAGS on linux platfroms if SAGE_FAT_BINARY is set.

comment:12 in reply to: ↑ 11 ; follow-up: Changed 8 years ago by jdemeyer

Replying to vbraun:

SAGE_FAT_BINARY should be synonymous with SAGE_GENERIC_BINARY, though for historical reasons it isn't named well. From http://www.sagemath.org/doc/installation/source.html:

SAGE_FAT_BINARY - to prepare a binary distribution that will run on the widest range of target machines, set this variable to "yes"

I think right now you don't remove advanced CFLAGS on linux platfroms if SAGE_FAT_BINARY is set.

MPIR actually does support "fat binaries" in the true sense on x86 systems: building a library with code for different processors, determining at run-time which one to use.

comment:13 follow-up: Changed 8 years ago by dimpase

I can confirm that installing this spkg into the sage-5.0-OSX-64bit-10.6-x86_64-Darwin.dmg binary release

on Core 2 Duo (OSX 10.6.8) fixes the problem, apparently. I'll run more tests, just in case.

comment:14 in reply to: ↑ 13 ; follow-up: Changed 8 years ago by mhansen

Replying to dimpase:

I can confirm that installing this spkg into the sage-5.0-OSX-64bit-10.6-x86_64-Darwin.dmg binary release

on Core 2 Duo (OSX 10.6.8) fixes the problem, apparently. I'll run more tests, just in case.

I would think that any spkg installed (i.e., built from source on the machine it's running on) would work. Isn't the issue when it's built on another machine (with newer processor) and then the resulting binary used on an older machine?

comment:15 in reply to: ↑ 14 Changed 8 years ago by dimpase

Replying to mhansen:

Replying to dimpase:

I can confirm that installing this spkg into the sage-5.0-OSX-64bit-10.6-x86_64-Darwin.dmg binary release

on Core 2 Duo (OSX 10.6.8) fixes the problem, apparently. I'll run more tests, just in case.

I would think that any spkg installed (i.e., built from source on the machine it's running on) would work. Isn't the issue when it's built on another machine (with newer processor) and then the resulting binary used on an older machine?

it's a good point. Well, I don't have newer hardware to test this.

comment:16 in reply to: ↑ 12 ; follow-up: Changed 8 years ago by vbraun

Replying to jdemeyer:

MPIR actually does support "fat binaries" in the true sense on x86 systems: building a library with code for different processors, determining at run-time which one to use.

Why do we have to set CFLAGS at all, then?

Also, is this really true? With all CFLAGS turned on won't the compiler potentially emit advanced instructions in the surrounding library code, outside of the time-critical algorithms?

comment:17 in reply to: ↑ 16 Changed 8 years ago by jdemeyer

Replying to vbraun:

With all CFLAGS turned on won't the compiler potentially emit advanced instructions in the surrounding library code, outside of the time-critical algorithms?

The surrounding library code is compiled with basic CFLAGS, without setting -march at all. But the critical routines are compiled several times for various processors.

comment:18 Changed 8 years ago by jdemeyer

  • Description modified (diff)

If you want

comment:19 follow-up: Changed 8 years ago by dimpase

  • Status changed from needs_review to needs_work

I tried the (non-app) binary from 5.0.1.rc0 on on Core 2 Duo (OSX 10.6.8), and got the usual crash:

sage: int(2.75)

------------------------------------------------------------------------
Unhandled SIGILL: An illegal instruction occurred in Sage.
This probably occurred because a *compiled* component of Sage has a bug
in it and is not properly wrapped with sig_on(), sig_off(). You might
want to run Sage under gdb with 'sage -gdb' to debug this.
Sage will now terminate.
------------------------------------------------------------------------
/Users/dima/Desktop/sage/spkg/bin/sage: line 312: 63629 Illegal instruction     sage-ipython "$@" -i

with ./sage -gdb, I get

sage: int(2.75)

Program received signal EXC_BAD_INSTRUCTION, Illegal instruction/operand.
0x00000001016e3ed9 in case1 ()
(gdb) bt
#0  0x00000001016e3ed9 in case1 ()
#1  0x0000000103e6dba4 in parsed_string_to_mpfr ()
#2  0x0000000103e6e7fb in mpfr_strtofr ()
Previous frame inner to this frame (gdb could not unwind past this frame)
(gdb) 
Last edited 8 years ago by dimpase (previous) (diff)

comment:20 in reply to: ↑ 19 ; follow-up: Changed 8 years ago by jdemeyer

Can you send the output of

(gdb) disas

when getting a SIGILL?

comment:21 in reply to: ↑ 20 Changed 8 years ago by dimpase

Replying to jdemeyer:

Can you send the output of

(gdb) disas

when getting a SIGILL?

sage: int(2.75)

Program received signal EXC_BAD_INSTRUCTION, Illegal instruction/operand.
0x00000001016e3ed9 in case1 ()
(gdb) disas
Dump of assembler code for function case1:
0x00000001016e3ed9 <case1+0>:	popcnt 0x20(%rdi),%r8
0x00000001016e3edf <case1+6>:	add    %r8,%rax
End of assembler dump.
(gdb) 

comment:22 follow-up: Changed 8 years ago by vbraun

popcnt is SSE4.2, so Core 2 Duo don't have it. Btw on Linux I can do

[~]$ objdump -d sage/local/lib/libmpir.so | grep popcnt
   45200:	f3 4c 0f b8 04 cf    	popcnt (%rdi,%rcx,8),%r8
   45206:	f3 4c 0f b8 4c cf 08 	popcnt 0x8(%rdi,%rcx,8),%r9
[...]

and I believe the equivalent command on OSX would be otool -tV. Shouldn't we add some script to sage -bdist that checks for the absence of, say, SSE >= 4 instructions?

comment:23 in reply to: ↑ 22 ; follow-up: Changed 8 years ago by dimpase

Replying to vbraun:

popcnt is SSE4.2, so Core 2 Duo don't have it. Btw on Linux I can do

[~]$ objdump -d sage/local/lib/libmpir.so | grep popcnt
   45200:	f3 4c 0f b8 04 cf    	popcnt (%rdi,%rcx,8),%r8
   45206:	f3 4c 0f b8 4c cf 08 	popcnt 0x8(%rdi,%rcx,8),%r9
[...]

and I believe the equivalent command on OSX would be otool -tV.

yes, that's right (except it's not .so, but .dylb)

$ otool -tV local/lib/libmpir.dylib | grep popcnt
000000000003be60	popcnt	(%rdi,%rcx,8),%r8
000000000003be66	popcnt	0x08(%rdi,%rcx,8),%r9

Shouldn't we add some script to sage -bdist that checks for the absence of, say, SSE >= 4 instructions?

how would it help? The problem seems to be in the inability of non-Apple gcc to build fat, a.k.a. universal, dynamic libs and executables. So I think we should either resort to making binary releases for OSX targeted to a particular architecture, or provide some post-install script that basically would do ./sage -f mpir (and anything else that needs similar fix) followed by ./sage -b. It's time-consuming to run, but it needs to be done only once...

comment:24 in reply to: ↑ 23 ; follow-up: Changed 8 years ago by vbraun

Replying to dimpase:

how would it help?

It would help to ensure that we don't accidentally ship binaries that require advanced instruction set extensions. Thats all I meant.

comment:25 in reply to: ↑ 24 ; follow-up: Changed 8 years ago by dimpase

Replying to vbraun:

Replying to dimpase:

how would it help?

It would help to ensure that we don't accidentally ship binaries that require advanced instruction set extensions. Thats all I meant.

well, we certainly should ship such binaries, it's more about the proper labeling of them as such...

comment:26 in reply to: ↑ 11 ; follow-up: Changed 8 years ago by leif

Replying to vbraun:

SAGE_FAT_BINARY should be synonymous with SAGE_GENERIC_BINARY, though for historical reasons it isn't named well. From http://www.sagemath.org/doc/installation/source.html:

SAGE_FAT_BINARY - to prepare a binary distribution that will run on the widest range of target machines, set this variable to "yes"

That description is certainly insufficient (or, to be more precise, one shouldn't read it from right to left). One should in general not prepare "fat" binaries on relatively new (e.g. Core i7) CPUs -- unless one knows what one's doing... ;-)

IMHO the main purpose of SAGE_FAT_BINARY is to avoid the use of processor-specific ("hand-written") assembly code (and prevent Sage from using things like -march=native); it's still up to the distributor to make sure the compiler is targeted to the desired CPU family (whichever that is).


Doing some "sanity check" upon sage -bdist isn't all bad, but of course heavily depends on what (limited) architecture a specific binary distribution is supposed to be run on.


I never liked MPIR_EXTRA_OPTS (for additional options to be passed to MPIR's configure), but it's named like this for consistency with other spkgs, i.e., historical reasons. A couple of spkgs (or every spkg that supports such) use ${PKG_NAME}_EXTRA_OPTS.

comment:27 Changed 8 years ago by jdemeyer

  • Description modified (diff)
  • Status changed from needs_work to needs_review

comment:28 Changed 8 years ago by jdemeyer

  • Keywords sd40.5 added

comment:29 in reply to: ↑ 25 Changed 8 years ago by jdemeyer

Replying to dimpase:

Replying to vbraun:

Replying to dimpase:

how would it help?

It would help to ensure that we don't accidentally ship binaries that require advanced instruction set extensions. Thats all I meant.

well, we certainly should ship such binaries

I don't think we should ship such binaries, otherwise you're just going to confuse people.

comment:30 Changed 8 years ago by jdemeyer

  • Cc vbraun added

New spkg should be ready for review, changes have not yet been committed.

Changed 8 years ago by jdemeyer

Diff between the 2.4.0.p3 and 2.4.0.p4 spkgs. For reference / review only.

comment:31 in reply to: ↑ 26 Changed 8 years ago by jdemeyer

Replying to leif:

I never liked MPIR_EXTRA_OPTS (for additional options to be passed to MPIR's configure), but it's named like this for consistency with other spkgs, i.e., historical reasons. A couple of spkgs (or every spkg that supports such) use ${PKG_NAME}_EXTRA_OPTS.

So let's change it to {PKG}_CONFIGURE. Some of these packages (mpfr, ecm, pari) need to be changed anyway.

comment:32 Changed 8 years ago by was

  • Reviewers set to William Stein
  • Status changed from needs_review to positive_review

I read through your patch twice, tried building with SAGE_FAT_BINARY="yes", and this all looks great to me. Of course, the real test is the buildbot and whether or not this really solves people's problems. So for now I give this a positive review.

comment:33 Changed 8 years ago by kcrisman

  • Reviewers changed from William Stein to William Stein, Karl-Dieter Crisman

I can confirm that this seems to work on an Intel Core 2 Duo which previously could not use the buildbot's binary.

comment:34 Changed 8 years ago by jdemeyer

  • Merged in set to sage-5.0.1
  • Resolution set to fixed
  • Status changed from positive_review to closed

comment:35 Changed 8 years ago by jdemeyer

  • Merged in changed from sage-5.0.1 to sage-5.0.1.rc0

comment:36 Changed 8 years ago by kcrisman

Just for reference, this also appears to have solved a similar issue on older PPC Macs using binaries built on (still old, but not as old) less old PPC Macs which was new with 5.0. Good work.

Note: See TracTickets for help on using tickets.