Opened 5 years ago

Closed 5 years ago

#21064 closed defect (fixed)

Enable NTL's '-march=native' more cautiously

Reported by: leif Owned by: leif
Priority: blocker Milestone: sage-7.3
Component: packages: standard Keywords: assembler shlx mulx
Cc: dunfield, fbissey Merged in:
Authors: Leif Leonhardy Reviewers: Jeroen Demeyer, Volker Braun
Report Upstream: N/A Work issues:
Branch: 41ea5da (Commits, GitHub, GitLab) Commit: 41ea5daf35f02a6e53bd62a08a423a472edbb2d4
Dependencies: Stopgaps:

Status badges

Description (last modified by leif)

Take more care in passing NATIVE=on to NTL's configure.

Upstream only checks whether the compiler accepts the option, nothing beyond that.

This causes problems if the assembler isn't recent enough to understand all of the instructions GCC emits with -march=native, e.g. on older versions of MacOS X (with an ancient GAS as the default assembler), or on older Linux distros where Sage's GCC got built because the native one is outdated (more precisely, not sufficient to build Sage).

While we could (and hopefully will) add more sophisticated checks regarding the assembler's capabilities (or use an alternate one, e.g. clang's/LLVM's on Darwin, cf. #20779), we need a quick (and safe) fix for Sage 7.3.

Change History (24)

comment:1 Changed 5 years ago by leif

  • Description modified (diff)

comment:2 Changed 5 years ago by leif

  • Owner changed from (none) to leif

comment:3 Changed 5 years ago by leif

  • Branch set to u/leif/enable_NTL_native_more_safely
  • Cc dunfield fbissey added
  • Commit set to 7e1ceb270804b7525db553d35be29ccecc03206a
  • Status changed from new to needs_review
  • Work issues push a branch :-) deleted

Here it is.

Feel free to vote for the second commit; the third is less restrictive.


New commits:

0348d59NTL: Don't enable NATIVE (also) when Sage's GCC is used (because the assembler *might* not be capable enough).
3c9ee18NTL: Bump patchlevel, add some messages and prepare for improvement regarding NATIVE=on/off (in spkg-install).
7e1ceb2NTL: Do not entirely disable NATIVE when Sage's GCC is used, white-list some assemblers instead (including LLVM's).

comment:4 follow-up: Changed 5 years ago by fbissey

Option 3 forced by brain to work and conduct some little checks on my machine. Option 3 is more complicated but more refined in its treatment of corner cases, and it is well documented (all options are documented well enough).

Being refined is always going to be more complicated. Anyone has contacted Victor to see if he could make a check that would also test the assembler in a new version? That would be even better in my opinion. But in the meantime. I'll vote for (3) against my better judgement about its complexity.

comment:5 in reply to: ↑ 4 ; follow-up: Changed 5 years ago by leif

Replying to fbissey:

Option 3 forced by brain to work and conduct some little checks on my machine. Option 3 is more complicated but more refined in its treatment of corner cases, and it is well documented (all options are documented well enough).

Being refined is always going to be more complicated. Anyone has contacted Victor to see if he could make a check that would also test the assembler in a new version? That would be even better in my opinion. But in the meantime. I'll vote for (3) against my better judgement about its complexity.

Yep, it became more lengthy than I first thought. (Empathy with Apple fan boys is no good. ;-) )

W.r.t. upstream, thinking more about it I came to the conclusion that it's pretty ok to assume that the parts of the toolchain fit, and for an ordinary user manually running NTL's configure it's easy to just pass NATIVE=off if something doesn't work.

On the other hand, I actually intended to add proper feature checks (for each "non-standard" ISA extension enabled by -march=native, which of course is a moving target), probably also with some runtime checks (actually executing instructions), because what CPUID reports isn't always correct (and hence GCC could be "wrong" as well), be it a crippled and flawed Intel chip, or a VM bug.

comment:6 follow-up: Changed 5 years ago by dunfield

I tried out version (3) of this patch on OS X version 10.9 (which has the old as) and now NTL compiles fine. So it looks good at my end.

One very minor quibble with the OS X-specific comment in the spkg-install file:

TODO: If we pass '-q', it suddenly becomes LLVM's 'as', which should work.

It is true that NTL will compile if you add -q, but I wouldn't recommend doing so just for NTL, only Sage as a whole; I've seen weird segfaults when I've tried mixing the two assemblers...

comment:7 follow-up: Changed 5 years ago by fbissey

That's a good point and in that case it would be motivation for (2) where you don't touch the assembler.

comment:8 in reply to: ↑ 6 ; follow-up: Changed 5 years ago by leif

Replying to dunfield:

One very minor quibble with the OS X-specific comment in the spkg-install file:

TODO: If we pass '-q', it suddenly becomes LLVM's 'as', which should work.

It is true that NTL will compile if you add -q, but I wouldn't recommend doing so just for NTL, only Sage as a whole; I've seen weird segfaults when I've tried mixing the two assemblers...

Better this way?

  • build/pkgs/ntl/spkg-install

    diff --git a/build/pkgs/ntl/spkg-install b/build/pkgs/ntl/spkg-install
    index 0a35355..cb5ad4e 100755
    a b ntl_configure() 
    109109                assembler_ok=true # perhaps until we upgrade Sage's GCC... ;-)
    110110                ;;
    111111              1.38) # This is Apple's ancient version of GNU as.
    112                 # TODO: If we pass '-q', it suddenly becomes LLVM's 'as', which
    113                 #       should work.
     112                # If we passed '-q', it would suddenly become LLVM's 'as'.
    114113                ;;
    115114            esac
    116115        elif [ -n "$LLVMAS_VERSION" ]; then

(The corresponding commit message is more explicit.)

comment:9 in reply to: ↑ 7 Changed 5 years ago by dunfield

Replying to fbissey:

That's a good point and in that case it would be motivation for (2) where you don't touch the assembler.

I think (3) is still OK in this regard since it never changes the assembler, but just determines which will be used (and consequently whether we can expect it to work with the -march=native flag).

comment:10 in reply to: ↑ 8 ; follow-up: Changed 5 years ago by dunfield

Replying to leif:

Better this way?

Yes, that looks good to me.

comment:11 Changed 5 years ago by git

  • Commit changed from 7e1ceb270804b7525db553d35be29ccecc03206a to 3cccfbbe23776c2fc682a6dce73b1efee23a8339

Branch pushed to git repo; I updated commit sha1. New commits:

3cccfbbNTL: Don't encourage people to pass '-q' to Apple's old GAS just in NTL's spkg-install.

comment:12 in reply to: ↑ 10 Changed 5 years ago by leif

Replying to dunfield:

Replying to leif:

Better this way?

Yes, that looks good to me.

Committed.

comment:13 follow-up: Changed 5 years ago by jdemeyer

That's a lot of complicated logic which looks quite fragile. Why not simply:

  1. Try to build normally.
  1. If that failed, retry without -march=native.

comment:14 in reply to: ↑ 13 ; follow-up: Changed 5 years ago by leif

Replying to jdemeyer:

That's a lot of complicated logic which looks quite fragile.

It's about 30 lines of code, 10 of them echos. And I fail to see where it's "fragile", as we don't do dangerous things here, just try to not lose performance without reason.


Why not simply:

  1. Try to build normally.
  1. If that failed, retry without -march=native.

That's nearly dumber than simply disabling NATIVE; if NTL took a few seconds (maybe minutes) to build, I'd perhaps agree. And we cannot be sure it will fail early.

Imagine we'd do so for every package, probably trying more than one option.

As I said, in the long run, proper tests belong to NTL's configure, and the general problem on MacOS X will be solved on #20779. I'd also appreciate if we had an (optional) binutils package for older Linux distros, otherwise we should perhaps stop building our own GCC (on those at least).

comment:15 Changed 5 years ago by leif

P.S.: If we solve the toolchain problems (partially caused by Apple, but also by us), we should IMHO by default build everything with -march=native, unless SAGE_FAT_BINARY=yes.

comment:16 in reply to: ↑ 14 ; follow-up: Changed 5 years ago by jdemeyer

Replying to leif:

And I fail to see where it's "fragile"

The part where you are checking version numbers in strings. This 2.2[2-9]*|2.3*) for example is a very wrong way to test for a version number >= 2.22. But even if you fix that, you still have to "parse" the output of as --version correctly.

comment:17 Changed 5 years ago by jdemeyer

Can you remove this pointless code:

              1.38) # This is Apple's ancient version of GNU as.
                # If we passed '-q', it would suddenly become LLVM's 'as'.
                ;;

comment:18 Changed 5 years ago by git

  • Commit changed from 3cccfbbe23776c2fc682a6dce73b1efee23a8339 to 41ea5daf35f02a6e53bd62a08a423a472edbb2d4

Branch pushed to git repo; I updated commit sha1. New commits:

41ea5daNTL: Do not consider GAS 2.3 as recent, warn if it's still (Apple's) 1.38 in spkg-install

comment:19 in reply to: ↑ 16 Changed 5 years ago by leif

Replying to jdemeyer:

Replying to leif:

And I fail to see where it's "fragile"

The part where you are checking version numbers in strings. This 2.2[2-9]*|2.3*) for example is a very wrong way to test for a version number >= 2.22.

Show me a system where GAS 2.3 is used (and Sage at least partially builds)...

(Fixed that though.)


But even if you fix that, you still have to "parse" the output of as --version correctly.

If it contains LLVM, it's LLVM's as, certainly not GAS at least, and completely sufficient here.

If you instead meant to refer to as -v,

        GAS_VERSION=$( ${AS:-as} -v </dev/null -o /dev/null 2>&1 \
            | sed  -n '/GNU assembler/s/^.* \([12]\.[^ ][^ ]*\).*$/\1/p' )

works for all versions I encountered in this context (namely, Sage). This is spkg-install, not configure.ac. (Note that GAS doesn't support anything like -dumpversion, and distro packagers mess up what vanilla GAS gives.)


And, again repeating myself, I'm not going to mess with Darwin and Xcode versions here, that belongs elsewhere. (You may have noticed that I do not even check whether we're on Darwin if GAS is 1.38 or LLVM's assembler is present, because that's far beyond what we need here. This ticket works around a fall-out of other bugs in Sage: Not properly checking / requiring Xcode versions, and installing a GCC which may not fit, hence break, the other parts of the toolchain -- that's why I'm voting for offering an adequate binutils Sage package.)

comment:20 Changed 5 years ago by leif

P.S.: I'll probably write some script which converts -march=native in *FLAGS to -march=native -mno-foo -mno-bar ... as appropriate, but including (and regularly updating!) such somewhere in Sage is something for future tickets. (We could do that, i.e., build SAGE_NATIVE_*FLAGS, in Sage's top-level configure, and again after having installed Sage's GCC and probably binutils. But in general, -march=native is supposed to work without additional hacks on platforms where GCC supports that option.)

comment:21 in reply to: ↑ 5 Changed 5 years ago by leif

Replying to leif:

Replying to fbissey:

Anyone has contacted Victor to see if he could make a check that would also test the assembler in a new version?

W.r.t. upstream, thinking more about it I came to the conclusion that it's pretty ok to assume that the parts of the toolchain fit, and for an ordinary user manually running NTL's configure it's easy to just pass NATIVE=off if something doesn't work.

On the other hand, I actually intended to add proper feature checks (for each "non-standard" ISA extension enabled by -march=native, which of course is a moving target), probably also with some runtime checks (actually executing instructions), because what CPUID reports isn't always correct (and hence GCC could be "wrong" as well), be it a crippled and flawed Intel chip, or a VM bug.

Looking deeper into NTL's configure process (I haven't done for a while), there's certainly room for improvements, i.e., changes we could submit upstream in the long run.

While there are tests for AVX, AVX2 and __builtin_clzl() actually running test code (implicitly testing the toolchain), the test program (which gets compiled and linked, but currently not run) for CXXAUTOFLAGS (which include -march=native if NATIVE=on) is the following:

int main() { return 0; }

I think it's unlikely that GCC will generate code containing any ISA extension instructions for this, so the assembler won't complain there (and the program -- as is -- is likely to run without errors on any machine with the same [generic] architecture, not only on those with the specific one GCC thinks is the native).

comment:22 follow-up: Changed 5 years ago by vbraun

  • Reviewers set to Jeroen Demeyer, Volker Braun
  • Status changed from needs_review to positive_review

Fixing the configure tests would be the best solution, but for now we should probably ship this workaround.

comment:23 in reply to: ↑ 22 Changed 5 years ago by leif

Replying to vbraun:

Fixing the configure tests would be the best solution, but for now we should probably ship this workaround.

Yep. Unfortunately NTL doesn't have the usual configure && make structure; configure calls a Perl script which generates (and continually rewrites) a makefile (and indirectly config files) which is then again called (make -f ...) from the script to perform some tests.

The "final" makefile then first builds kind of another configure chain and runs that before building the library; there, some tests simply happen "too late" (as building the tools may already fail).

(This is similar to a shell script which first tests features of the shell it is run with, but, when run with the wrong shell, already bails out with a syntax error when performing the tests supposed to avoid exactly that.)

So it's not easy to submit anything useful / acceptable upstream (without completely rewriting the above, or being too specific and probably kind of redundant in just fixing our problem which is rather uncommon).

comment:24 Changed 5 years ago by vbraun

  • Branch changed from u/leif/enable_NTL_native_more_safely to 41ea5daf35f02a6e53bd62a08a423a472edbb2d4
  • Resolution set to fixed
  • Status changed from positive_review to closed
Note: See TracTickets for help on using tickets.