Opened 3 years ago

Closed 3 years ago

Last modified 3 years ago

#21223 closed defect (fixed)

ECM segfaults in Cygwin64

Reported by: embray Owned by:
Priority: major Milestone: sage-7.4
Component: porting: Cygwin Keywords:
Cc: jpflori, tscrim Merged in:
Authors: Erik Bray Reviewers: Jeroen Demeyer
Report Upstream: Fixed upstream, but not in a stable release. Work issues:
Branch: a4adca5 (Commits) Commit:
Dependencies: Stopgaps:

Description (last modified by embray)

Running ECM with almost any inputs results in a segfault on Cygwin64.

This is because any time it enters one of its hand-written assembly routines it needs to use the appropriate calling convention (Microsoft x64 convention on Windows), but it is only picking the correct Windows calling convention when compiling with mingw. It should do the same with Cygwin as well.

One possible workaround without patching would be to configure ECM with --disable-asm-redc to disable the custom assembly code entirely. But this is also overkill--the code works just fine as long as it's configured correctly for the platform.

Submitted upstream at https://gforge.inria.fr/tracker/index.php?func=detail&aid=20697&group_id=135&atid=623

Change History (43)

comment:1 Changed 3 years ago by embray

  • Authors set to Erik Bray
  • Branch set to u/embray/ticket-21223
  • Commit set to c80277a2ea1663d4803e4974985f829d27d5259c
  • Description modified (diff)
  • Status changed from new to needs_review

New commits:

c80277aMake sure the WINDOWS64_ABI flag is set on Cygwin as well, otherwise the assembly routines use the wrong calling convention.

comment:2 Changed 3 years ago by embray

  • Report Upstream changed from N/A to Not yet reported upstream; Will do shortly.

comment:3 Changed 3 years ago by leif

ECM 7.x got "revived", see #20385. 7.0.3 is still kind of beta/rc though I'd say.

comment:4 Changed 3 years ago by git

  • Commit changed from c80277a2ea1663d4803e4974985f829d27d5259c to 6361e6cdac8484964b9e952f5b482b986113aaee

Branch pushed to git repo; I updated commit sha1. New commits:

6361e6cBump patch level

comment:5 follow-up: Changed 3 years ago by leif

... as is the branch on #20385; i.e., not really "needs review" yet, but you could try whether that has been fixed upstream already, probably with vanilla 7.0.3.

comment:6 in reply to: ↑ 5 Changed 3 years ago by leif

Replying to leif:

you could try whether that has been fixed upstream already, probably with vanilla 7.0.3.

Nope, in 7.0.3 (now configure.ac) there's still:

if test "x$enable_asm_redc" = xyes; then
# do the necessary definitions and includes
  AC_DEFINE([USE_ASM_REDC],1,[Define to 1 to use asm redc])
  test "x$CCAS" != x || CCAS="$CC -c"
  AC_SUBST([CCAS])
  GMP_PROG_M4
  GMP_ASM_UNDERSCORE
  GMP_ASM_TEXT
  GMP_ASM_GLOBL
  GMP_ASM_TYPE

  case $host in
    *-*-mingw32) GMP_DEFINE([WINDOWS64_ABI], 1)
                 AC_DEFINE([WINDOWS64_ABI], 1,[Define to 1 if x86_64 mulredc*() functions should be called with Windows ABI]);;
    *) ;;
  esac

  case $host in
    pentium3-*-*)
      echo "WARNING: Your processor is recognized as Pentium3."
      echo "         The asm code uses SSE2, and therefore it might"
      echo "         fail if your proc is indeed a P3, and not a"
      echo "         Pentium M. If you have compilation problems,"
      echo "         consider using --disable-asm-redc." ;;
    *)
  esac
fi
AM_CONDITIONAL([ENABLE_ASM_REDC], [test "x$enable_asm_redc" = xyes])

comment:7 Changed 3 years ago by leif

I'll take care of including this patch on #20385 if upstream won't include it soon.

comment:8 Changed 3 years ago by embray

Okay, thanks.

comment:9 follow-up: Changed 3 years ago by jdemeyer

  • Milestone set to sage-7.4
  • Status changed from needs_review to needs_work
  1. This should be reported upstream.
  1. You need to patch configure.in before configure otherwise the build system will want to run autoconf to update configure.
  1. I cannot test this, but the patch is clear. If you say this makes ECM work on Cygwin64, that's good enough for me.

comment:10 in reply to: ↑ 9 Changed 3 years ago by leif

Replying to jdemeyer:

  1. You need to patch configure.in before configure otherwise the build system will want to run autoconf to update configure.

Oh right, patch doesn't by default take the modification time from the patch header (where configure is about a minute newer).

While it would be safer to explicitly touch configure after patching in spkg-install, make shouldTM treat configure up-to-date if you just reverse the order in the patch. (I'm not even sure it wouldn't as is; you'd probably need a very slow machine to cause it to not do so.)

comment:11 Changed 3 years ago by embray

  • Description modified (diff)
  • Report Upstream changed from Not yet reported upstream; Will do shortly. to Reported upstream. No feedback yet.

comment:12 Changed 3 years ago by git

  • Commit changed from 6361e6cdac8484964b9e952f5b482b986113aaee to a4adca5f24de505dfe56316ddc9bd47954746415

Branch pushed to git repo; I updated commit sha1. New commits:

a4adca5Change the patch order so that configure gets touched after configure.in

comment:13 Changed 3 years ago by embray

  • Status changed from needs_work to needs_review

comment:14 Changed 3 years ago by jdemeyer

  • Reviewers set to Jeroen Demeyer
  • Status changed from needs_review to positive_review

comment:15 Changed 3 years ago by jdemeyer

I assume you actually tested this on Cygwin64 :-)

comment:16 Changed 3 years ago by leif

Jeroen was quicker, but I also didn't test this on Cygwin[64]...

Last edited 3 years ago by leif (previous) (diff)

comment:17 Changed 3 years ago by embray

Yup. Before this patch, running ecm segfaults. Without it, no segfaults (and all the tests that invoke it pass)

comment:18 Changed 3 years ago by leif

FWIW, I guess similar would still happen on Cygwin32 on some x86 CPUs at least (probably x86_64 in 32-bit mode as well), but only if one explicitly configures GMP-ECM with --enable-asm-redc, which we don't do in Sage. (A user could still do that in Sage's environment variable ECM_CONFIGURE though.) The relevant asm code is in athlon/ and pentium4/.

Not sure whether the NTT inline assembly code (which is only used on [some] x86 CPUs, and again probably also on x86_64 in 32-bit mode) is safe w.r.t. the Windows ABI. (It might get used on Cygwin32, even by default, depending on the CPU, AFAICS.)

But both is IMHO beyond this ticket, since the ticket's title mentions just Cygwin64. (Feel free to try and eventually open a follow-up ticket for that... :P -- I personally won't.)

comment:19 follow-up: Changed 3 years ago by embray

The solution in the ticket just targets "cygwin" so it's a moot point I think.

Also I'm personally only targeting Sage for Cygwin64. Reason being I hope soon (once most of the issues are worked out) to release a Windows installer for Sage that includes cygwin as a DLL. This is for users, less so developers, and I'm not interested in non-64-bit platforms (though willing to poke at them if absolutely necessary).

comment:20 in reply to: ↑ 19 Changed 3 years ago by leif

  • Cc jpflori added

Replying to embray:

The solution in the ticket just targets "cygwin" so it's a moot point I think.

Nope, that doesn't fix the two issues I mentioned above, as on 32-bit systems the x86_64 code isn't used, and (of course) only in the latter WINDOWS64_ABI is used at all. (There's no macro for Windows ABI in general, nor Win32 ABI, although native Microsoft Windows -- with MS Visual Foo -- is handled.)


Also I'm personally only targeting Sage for Cygwin64. Reason being I hope soon (once most of the issues are worked out) to release a Windows installer for Sage that includes cygwin as a DLL. This is for users, less so developers, and I'm not interested in non-64-bit platforms (though willing to poke at them if absolutely necessary).

Ok.

Other Sage-on-Cygwin developers may try. (You could still detail your upstream report though, as it effectively only addresses Cygwin64.)

comment:21 follow-up: Changed 3 years ago by embray

I'm not really sure what you're trying to say here.

comment:22 in reply to: ↑ 21 Changed 3 years ago by leif

Replying to embray:

I'm not really sure what you're trying to say here.

On 32-bit systems (including Cygwin32), other assembly files are used, which may cause similar problems (because they at least don't care about Pascal calling convention).

There's also some inline assembly which is used on 32-bit (x86) systems only; there I'm not sure whether it complies also to the Win32 ABI w.r.t. register usage.

comment:23 follow-up: Changed 3 years ago by embray

Ah, okay. That I don't know. If it is a problem it can be addressed in a new ticket.

comment:24 in reply to: ↑ 23 Changed 3 years ago by leif

Replying to embray:

Ah, okay. That I don't know. If it is a problem it can be addressed in a new ticket.

Sure. In any case, it's easy to work around potential problems on Cygwin32 by simply setting ECM_CONFIGURE="--disable-sse2 --disable-asm-redc" before or when (re)installing the package.

Who is likely to test on Cygwin32?

comment:25 Changed 3 years ago by embray

  • Cc tscrim added

comment:26 Changed 3 years ago by vbraun

  • Branch changed from u/embray/ticket-21223 to a4adca5f24de505dfe56316ddc9bd47954746415
  • Resolution set to fixed
  • Status changed from positive_review to closed

comment:27 Changed 3 years ago by zimmerma

  • Commit a4adca5f24de505dfe56316ddc9bd47954746415 deleted

patch applied upstream. Are you sure this is fine under 32-bit cygwin too?

Paul

comment:28 follow-up: Changed 3 years ago by jpflori

I should be able to test Cygwin32 this afternoon.

comment:29 Changed 3 years ago by zimmerma

I should be able to test Cygwin32 this afternoon.

great! Please test revision 2976 (or later) of GMP-ECM:

$ svn checkout svn://scm.gforge.inria.fr/svnroot/ecm/trunk ecm

Paul

comment:30 in reply to: ↑ 28 ; follow-up: Changed 3 years ago by leif

Replying to jpflori:

I should be able to test Cygwin32 this afternoon.

I replied to Paul on the upstream bug report. (The patch here won't break anything on Cygwin32, but it won't solve potential similar issues on 32-bit Cygwin either.)

You may try trunk (outside of Sage) with --enable-sse2 (but without --enable-asm-redc), perhaps running make longcheck instead of just make check, and afterwards with --enable-asm-redc (but without --enable-sse2).

comment:31 in reply to: ↑ 30 ; follow-up: Changed 3 years ago by leif

Replying to leif:

Replying to jpflori:

I should be able to test Cygwin32 this afternoon.

You may try trunk (outside of Sage) with --enable-sse2 (but without --enable-asm-redc), perhaps running make longcheck instead of just make check, and afterwards with --enable-asm-redc (but without --enable-sse2).

Also saving the config.logs and the build logs perhaps, to see what actually got used. (In 6.4.4, there's 32-bit asm code in athlon/ and pentium4/, besides inline assembly in {ntt_gfp,spv}.c, which is less critical I think. Just noticed ECM 7.x now also has x86/, which I haven't at all looked at yet.)

comment:32 in reply to: ↑ 31 Changed 3 years ago by leif

Replying to leif:

(In 6.4.4, there's 32-bit asm code in athlon/ and pentium4/, besides inline assembly in {ntt_gfp,spv}.c, which is less critical I think. Just noticed ECM 7.x now also has x86/, which I haven't at all looked at yet.)

Oh, there's no assembly code in x86/, just params.h:

#define MPZMOD_THRESHOLD 135
#define REDC_THRESHOLD 200
#define MPN_MUL_LO_THRESHOLD_TABLE {0, 0, 1, 1, 0, 0, 5, 1, 0, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 16, 1, 14, 14, 16, 16, 1}
#define NTT_GFP_TWIDDLE_DIF_BREAKOVER 11
#define NTT_GFP_TWIDDLE_DIT_BREAKOVER 11
#define MUL_NTT_THRESHOLD 262144
#define PREREVERTDIVISION_NTT_THRESHOLD 128
#define POLYINVERT_NTT_THRESHOLD 65536
#define POLYEVALT_NTT_THRESHOLD 16384
#define MPZSPV_NORMALISE_STRIDE 512

(And there's of course also inline assembly in longlong.h; don't know what's used on Cygwin[32], but presumably less critical, too.)

comment:33 Changed 3 years ago by jpflori

Some reports using ECM trunk. On a x86_64, with a Cygwin64 setup, outside of Sage, using my own GMP 6.1.1 (dll) and Cygwin's GCC:

./configure --prefix=$LOCAL --with-gmp=$LOCAL --enable-shared --disable-static

I get

...
configure: Build for host type x86_64-unknown-cygwin
configure: CC=gcc, CFLAGS=-W -Wall -Wundef -O2 -pedantic -fomit-frame-pointer -m64 -mtune=sandybridge -march=sandybridge
configure: Linking GMP with -lgmp
configure: Using asm redc code from directory x86_64
configure: Not using SSE2 instructions in NTT code
configure: Using APRCL to prove factors prime/composite
configure: Assertions enabled
configure: Shell command execution disabled
configure: OpenMP disabled
configure: Memory debugging disabled

and make fails with

m4 -I../ -DOPERATION_mulredc1 `test -f mulredc1.asm || echo './'`mulredc1.asm >mulredc1.s
/bin/sh ../libtool    --mode=compile gcc  -O2 -pedantic -fomit-frame-pointer -m64 -mtune=sandybridge -march=sandybridge -c -o mulredc1.lo mulredc1.s
libtool: compile:  gcc -O2 -pedantic -fomit-frame-pointer -m64 -mtune=sandybridge -march=sandybridge -c mulredc1.s  -DDLL_EXPORT -DPIC -o .libs/mulredc1.o
/usr/lib/gcc/x86_64-pc-cygwin/5.4.0/../../../../x86_64-pc-cygwin/bin/as: BFD assertion (GNU Binutils) 2.25.2 a échoué /cygdrive/i/szsz/tmpp/cygwin64/binutils/binutils-2.25-4.x86_64/src/binutils-gdb/bfd/coff-x86_64.c:695
mulredc1.s: Messages de l'assembleur:
mulredc1.s:59: Erreur: ne peut représenter le type de réadressage BFD_RELOC_386_GOTPC

Windows does not like $_GLOBAL_OFFSET_TABLE I would say.

Using ECM 6.4.4 on my Cygwin64 setup with the patch to use Win64 ABI, I get a similar configuration and make is fine. Surely WANT_ASSERT gets set in one case and not in the other, leading to the invalid use of the GOT in the trunk version. But then make check fails with

./test.pp1 ./ecm.exe
Cygwin runtime failure: /home/jp/ecm-6.4.4/.libs/ecm.exe: Invalid relocation.  Offset 0x3025f223a at address 0x100414542 doesn't fit into 32 bits
############### ERROR ###############
Expected return code 8 but got 127
make: *** [Makefile:1982: check] Error 1

Going back to the trunk version and passing --disable-assert, make is fine. And make check runs but fail all tests.

FAIL: test.pp1
FAIL: test.pm1
FAIL: test.ecm

with the same relocation errors as in 6.4.4 as the log files tell.

On the same x86_64 machine, with the trunk version, using a Cygwin32 setup with a similar configure command, I get:

configure: Build for host type i686-pc-cygwin
configure: CC=gcc, CFLAGS=-W -Wall -Wundef -m32 -O2 -pedantic -fomit-frame-pointer -mtune=sandybridge -march=sandybridge
configure: Linking GMP with -lgmp
configure: Not using asm redc code
configure: Using SSE2 instructions in NTT code
configure: Using APRCL to prove factors prime/composite
configure: Assertions enabled
configure: Shell command execution disabled
configure: OpenMP disabled
configure: Memory debugging disabled

and make and make check are ok.

comment:34 follow-up: Changed 3 years ago by jpflori

Just realized the "relocation" errors might just be rebasing issues...

comment:35 Changed 3 years ago by embray

  • Report Upstream changed from Reported upstream. No feedback yet. to Fixed upstream, in a later stable release.

Patch has been accepted upstream.

comment:36 in reply to: ↑ 34 Changed 3 years ago by jpflori

Replying to jpflori:

Just realized the "relocation" errors might just be rebasing issues...

And I can confirm they are.

comment:37 Changed 3 years ago by jpflori

I'll post further experiments on #20385.

comment:38 Changed 3 years ago by leif

  • Report Upstream changed from Fixed upstream, in a later stable release. to Fixed upstream, but not in a stable release.

Or did I miss something?

comment:39 follow-up: Changed 3 years ago by embray

I'm not sure I understand the meaning--I took "in a later stable release" to mean there will eventually be a stable release that contains this fix. "Later" to me implies that it doesn't exist yet, but will.

comment:40 in reply to: ↑ 39 Changed 3 years ago by leif

Replying to embray:

I'm not sure I understand the meaning--I took "in a later stable release" to mean there will eventually be a stable release that contains this fix. "Later" to me implies that it doesn't exist yet, but will.

No, "later" refers to what we currently have in Sage, so "fixed in a later [stable] upstream release" means there exists some upstream version without the issue, but Sage hasn't yet upgraded to it (which is quite often the case as some upgrades aren't straight-forward, for various reasons).

comment:41 Changed 3 years ago by leif

So it's not as speculative as you thought... ;-)

comment:42 Changed 3 years ago by embray

Okay, I've been misusing that then! Indeed I thought it was mostly speculative (but generally likely to be accurate for active projects). Anyways, fine by me.

comment:43 Changed 3 years ago by zimmerma

a new release candidate is now available at

https://members.loria.fr/PZimmermann/ecm-7.0.4-rc1.tar.gz

Paul

Note: See TracTickets for help on using tickets.