Opened 2 years ago

Closed 18 months ago

#29537 closed defect (fixed)

build not portable despite using SAGE_FAT_BINARY=yes, NTL/openblas/numpy-related

Reported by: Matthias Köppe
Owned by:
Priority: blocker
Milestone: sage-9.3
Component: porting
Keywords: sd111
Cc: Dima Pasechnik, Michael Orlitzky, François Bissey, Erik Bray, Darij Grinberg, Travis Scrimshaw, gh-kliem, Sébastien Labbé
Merged in:
Authors: Jonathan Kliem, Matthias Koeppe
Reviewers: Matthias Koeppe, Jonathan Kliem
Report Upstream: N/A
Work issues:
Branch: d6dced2 (Commits, GitHub, GitLab)
Commit: d6dced2457b7c3a6b645f2e56b9ca44c4b09e437
Dependencies: #31493
Stopgaps:

Description (last modified by Matthias Köppe)

We fix several issues involving SAGE_FAT_BINARY:

  • numpy intrinsics need a special build flag in this mode
  • ntl needs to disable use of a problematic instruction set
  • After #31064, the GH Actions tests for cygwin did not pass SAGE_FAT_BINARY to the actual build, randomly causing SIGILLs when a partial build was passed to the next stage through the sage-local artifact

Attachments (1)

openblas-0.3.13.log (12.4 MB) - added by Matthias Köppe 19 months ago.

Change History (84)

comment:1 Changed 2 years ago by Matthias Köppe

Priority: major → blocker

comment:2 Changed 2 years ago by Matthias Köppe

Cc: Darij Grinberg, Travis Scrimshaw added

comment:3 Changed 2 years ago by Dima Pasechnik

could it be some kind of hardware mismatch?

comment:4 Changed 2 years ago by Matthias Köppe

Possibly, I'll investigate

comment:5 Changed 2 years ago by Matthias Köppe

I will try with the configuration that Erik's https://github.com/sagemath/sage-windows/blob/master/Makefile uses

comment:6 Changed 2 years ago by Matthias Köppe

Description: modified (diff)
Milestone: sage-9.1 → sage-9.2
Priority: blocker → critical
Summary: Crash on building documentation and on startup, NTL-related, on cygwin-standard → cygwin-standard: build not portable despite using SAGE_FAT_BINARY=yes, NTL-related

comment:7 Changed 2 years ago by Erik Bray

I looked at one of the build logs on GitHub and it does look like SAGE_FAT_BINARY=yes is being passed down, so other than that I'm not aware of any issue that would cause this unless there's a new issue with NTL I haven't seen before.

comment:8 Changed 2 years ago by Erik Bray

Possibly related: #29109

comment:9 Changed 2 years ago by Matthias Köppe

Milestone: sage-9.2 → sage-9.3

comment:10 Changed 22 months ago by Matthias Köppe

Cc: gh-kliem added
Keywords: sd111 added

comment:11 Changed 21 months ago by Matthias Köppe

It's quite possible that this also affects our Docker builds.

comment:12 Changed 21 months ago by Matthias Köppe

Cc: Sébastien Labbé added
Component: porting: Cygwin → porting
Description: modified (diff)
Summary: cygwin-standard: build not portable despite using SAGE_FAT_BINARY=yes, NTL-related → build not portable despite using SAGE_FAT_BINARY=yes, NTL-related

A report regarding the Docker builds: https://groups.google.com/g/sage-devel/c/Dbd4nFi8R7I (but no NTL involvement visible)

comment:13 Changed 21 months ago by gh-kliem

From NTL/doc/tour-win.html:

Configuration flags.
</b>
<p>

Also in directory "<tt>include/NTL</tt>" is a file called "<tt>config.h</tt>".
You can edit this file to override some of NTL's default options
for basic configuration and performance.
Again, the defaults should be good for
Windows with MSVC++.

In config.h there are a couple of flags that apparently need to be set by hand.

These look like they might cause problems:

#if 0
#define NTL_CLEAN_INT

/*
 *   This will disallow the use of some non-standard integer arithmetic
 *   that may improve performance somewhat.
 *
 */

#endif

#if 1
#define NTL_CLEAN_PTR

/*
 *   This will disallow the use of some non-standard pointer arithmetic
 *   that may improve performance somewhat.
 *
 */

#endif

#if 1
#define NTL_SAFE_VECTORS

/*
 * This will compile NTL in "safe vector" mode, only assuming
 * the relocatability property for trivial types and types
 * explicitly declared relocatable.  See vector.txt for more details.
 */

#endif

#if 0
#define NTL_ENABLE_AVX_FFT

/*
 * This will compile NTL in a way that enables an AVX implemention
 * of the small-prime FFT.
 */

#endif


#if 0
#define NTL_AVOID_AVX512

/*
 * This will compile NTL in a way that avoids 512-bit operations,
 * even if AVX512 is available.
 */

#endif

#if 0
#define NTL_RANGE_CHECK

/*
 *   This will generate vector subscript range-check code.
 *   Useful for debugging, but it slows things down of course.
 *
 */

#endif

comment:14 Changed 21 months ago by gh-kliem

So I guess we have to patch NTL so that those flags can be avoided through configuration.

comment:15 Changed 21 months ago by gh-kliem

The relevant configuration is done in src/DoConfig, which is unfortunately Perl (yet another language whose syntax I do not understand).

But one might even define those things as environment variables, and this might be passed through?

In src/DoConfig there is the following list:

%ConfigFlag = (

'NTL_LEGACY_NO_NAMESPACE' => 'off',
'NTL_LEGACY_INPUT_ERROR'  => 'off',
'NTL_DISABLE_LONGDOUBLE'  => 'off',
'NTL_DISABLE_LONGLONG'    => 'off',
'NTL_DISABLE_LL_ASM'      => 'off',
'NTL_MAXIMIZE_SP_NBITS'   => 'off',
'NTL_LEGACY_SP_MULMOD'    => 'off',
'NTL_THREADS'             => 'on',
'NTL_TLS_HACK'            => 'on',
'NTL_EXCEPTIONS'          => 'off',
'NTL_STD_CXX11'           => 'on',
'NTL_STD_CXX14'           => 'off',
'NTL_DISABLE_MOVE_ASSIGN' => 'on',
'NTL_DISABLE_MOVE'        => 'off',
'NTL_THREAD_BOOST'        => 'on',
'NTL_GMP_LIP'             => 'on',
'NTL_GF2X_LIB'            => 'off',
'NTL_X86_FIX'             => 'off',
'NTL_NO_X86_FIX'          => 'off',
'NTL_NO_INIT_TRANS'       => 'on',
'NTL_CLEAN_INT'           => 'off',
'NTL_CLEAN_PTR'           => 'on',
'NTL_SAFE_VECTORS'        => 'on',
'NTL_RANGE_CHECK'         => 'off',
'NTL_ENABLE_AVX_FFT'      => 'off',
'NTL_AVOID_AVX512'        => 'off',


'NTL_SPMM_ULL'            => 'off',
'NTL_AVOID_BRANCHING'     => 'off',
'NTL_FFT_BIGTAB'          => 'off',
'NTL_FFT_LAZYMUL'         => 'off',
'NTL_TBL_REM'             => 'off',
'NTL_CRT_ALTCODE'         => 'off',
'NTL_CRT_ALTCODE_SMALL'   => 'off',
'NTL_GF2X_NOINLINE'       => 'off',
'NTL_GF2X_ALTCODE'        => 'off',
'NTL_GF2X_ALTCODE1'       => 'off',


);

What makes me think that it might be AVX512-related is that we haven't encountered this before. GitHub workflows are the only place where I have encountered processors capable of AVX512.

comment:16 Changed 21 months ago by Matthias Köppe

I agree that NTL_AVOID_AVX512 is a top suspect here. Perhaps it can be passed to configure, like we already pass NATIVE=off when SAGE_FAT_BINARY=yes?
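A minimal sketch of that idea, in Python for illustration only. It assumes that NTL's configure accepts NTL_AVOID_AVX512 as a NAME=VALUE argument in the same way as NATIVE=off (the 'on'/'off' values mirror the src/DoConfig list quoted above). The helper name `ntl_configure_args` is hypothetical and is not part of Sage's build scripts.

```python
import os


def ntl_configure_args(environ=None):
    """Return extra NAME=VALUE arguments for NTL's configure script.

    Hypothetical helper: only the flag names NATIVE and NTL_AVOID_AVX512
    come from the ticket; everything else is illustrative.
    """
    environ = os.environ if environ is None else environ
    args = []
    if environ.get("SAGE_FAT_BINARY") == "yes":
        # Do not tune the build for the CPU of the build machine.
        args.append("NATIVE=off")
        # Assumption: this flag can be switched on from the command line
        # like the other entries of %ConfigFlag in src/DoConfig.
        args.append("NTL_AVOID_AVX512=on")
    return args


if __name__ == "__main__":
    print(" ".join(["./configure"] + ntl_configure_args()))
```

With SAGE_FAT_BINARY=yes in the environment, this prints a configure line carrying both flags; otherwise it prints a plain `./configure`.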

comment:17 Changed 21 months ago by gh-kliem

Branch: public/29537
Commit: f053669ecdaae1ea48dce00819e0c8f4de6a3f9d

OK. Let's try that. How do I test it? Do I just start the cygwin actions with that? Which tickets do I still need to pull?


New commits:

f053669 try to build portable ntl by disabling avx512

comment:18 Changed 21 months ago by Matthias Köppe

Run it with #31064 (or #29152)

comment:20 Changed 21 months ago by gh-kliem

iii-a already failed. Is this permanent? Should I just rerun?

https://github.com/kliem/sage/runs/1625789472?check_suite_focus=true

comment:21 Changed 21 months ago by Matthias Köppe

Hm. This failure is from fpylll - I have created #31146 for this.

Note that cygwin-standard does not build ntl because it finds the package provided by cygwin. Would need to test cygwin-minimal

comment:22 Changed 21 months ago by gh-kliem

ok, I started cygwin-minimal.

But now I'm confused. How does this relate to the ticket description?

By default the GitHub workflow does not set SAGE_FAT_BINARY, it seems, so of course one cannot use it locally.

comment:23 Changed 21 months ago by Matthias Köppe

Looks like the ticket description is outdated and cygwin's ntl package is accepted in current sage.

comment:24 Changed 21 months ago by Erik Bray

This is also related to this issue with Sage Windows, due to the fact that the NTL package for Cygwin is not built with NATIVE=off (and I was building Sage with the system package).

I reported the issue to the package maintainer who has rebuilt the package: https://cygwin.com/pipermail/cygwin/2020-December/247206.html

comment:25 Changed 19 months ago by Matthias Köppe

Summary: build not portable despite using SAGE_FAT_BINARY=yes, NTL-related → build not portable despite using SAGE_FAT_BINARY=yes, NTL-related / openblas-related

I have encountered a similar problem again on Cygwin. This time it seems to come from OpenBLAS.

The Cygwin build takes place in multiple stages, passing on partial builds of $SAGE_LOCAL to the next stages via artifacts. I have now seen several instances in which the artifacts contain native instructions that lead to SIGILL on use in the next stage.

For example, the run https://github.com/mkoeppe/sage/runs/2029013426?check_suite_focus=true aborts early in the sagelib build. On a local machine I can reproduce it as well by downloading the artifact. Using gdb it can be seen that merely importing some numpy modules already gives SIGILL.
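Without gdb, an extracted artifact can be checked for this failure mode by importing the module in a child process and inspecting the exit status. The following is only a sketch: the helper name `died_from_sigill` is made up, and it relies on the POSIX convention that Python's subprocess module reports a child killed by signal N as return code -N.

```python
import signal
import subprocess
import sys


def died_from_sigill(argv):
    """Run argv and report whether the child was killed by SIGILL."""
    proc = subprocess.run(
        argv,
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    # On POSIX systems a negative return code -N means "killed by signal N".
    return proc.returncode == -signal.SIGILL


if __name__ == "__main__":
    # Importing numpy is the step that crashed in the failing runs.
    crashed = died_from_sigill([sys.executable, "-c", "import numpy"])
    print("SIGILL on 'import numpy':", crashed)
```

Running the import in a separate process keeps the checking script itself alive even when the imported extension module contains instructions the current CPU does not support.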

Perhaps the latest change to OpenBLAS in #22179 needs revisiting.

comment:26 Changed 19 months ago by Matthias Köppe

Testing with revert of #22179 (for Cygwin) in https://github.com/mkoeppe/sage/actions/runs/628022275

comment:27 Changed 19 months ago by Erik Bray

This might also be related to https://github.com/sagemath/sage-windows/issues/57

NTL is working now on sage-windows, but I suspect there is a portability issue with OpenBLAS (though it is resulting in segfaults, apparently, not illegal instructions).

comment:28 Changed 19 months ago by Matthias Köppe

Thanks for the pointer.

Even with reverting #22179 (for Cygwin), I'm still getting SIGILL, but (apparently) from numpy itself, in PyInit__multiarray_umath. It should be checked whether numpy is somehow configuring itself using different CFLAGS.

comment:29 Changed 19 months ago by Matthias Köppe

I think we may have to use -march=... explicitly to fix numpy

comment:30 Changed 19 months ago by gh-kliem

https://github.com/numpy/numpy/blob/cb557b79fa0ce467c881830f8e8e042c484ccfaa/doc/source/reference/simd/simd-optimizations.rst

This is the page to look at, I guess. If I understand correctly, we need to build numpy with --cpu-baseline=NONE if SAGE_FAT_BINARY.

comment:31 in reply to:  30 Changed 19 months ago by Erik Bray

Replying to gh-kliem:

https://github.com/numpy/numpy/blob/cb557b79fa0ce467c881830f8e8e042c484ccfaa/doc/source/reference/simd/simd-optimizations.rst

This is the page to look at, I guess. If I understand correctly, we need to build numpy with --cpu-baseline=NONE if SAGE_FAT_BINARY.

Thanks for finding that. I knew about the new SIMD stuff in Numpy, but not about the new associated compilation options (nor the fact that we weren't using them yet).

I think --cpu-baseline=MIN should also be good enough to run on the vast, vast majority of machines that are modern enough to even be capable of running Sage.

We should also look into --cpu-dispatch. It will result in larger binaries, but IIUC will allow Numpy to perform runtime CPU feature detection and call the correct implementations.

comment:32 Changed 19 months ago by gh-kliem

I agree that we should consider --cpu-dispatch for our binaries.

I don't think --cpu-baseline=MIN will work. If compiled on an x86 this will return SSE SSE2, but that won't work on Apple's M1.

The problem is that --cpu-baseline=MIN still assumes a similar architecture.

comment:33 in reply to:  32 Changed 19 months ago by Erik Bray

Replying to gh-kliem:

I don't think --cpu-baseline=MIN will work. If compiled on an x86 this will return SSE SSE2, but that won't work on Apple's M1.

The problem is that --cpu-baseline=MIN still assumes a similar architecture.

I don't think that's true. If you look at the docs you linked, the meaning of baseline=MIN is architecture-dependent. It won't enable features that are not relevant for the target architecture.

Of course, that means we need to build Sage binaries for aarch64 but that's already the case anyways I assume.

comment:34 Changed 19 months ago by Matthias Köppe

Let's ignore the harder problem of building universal binaries that work on both x86_64 and arm.

Using --cpu-dispatch when SAGE_FAT_BINARY is set sounds like the right solution. Could one of you prepare a branch please?
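As a rough sketch of what such a branch might compute, the following Python helper assembles extra options for numpy's build command. The option names --cpu-baseline and --cpu-dispatch are the ones from the numpy document linked above; the helper name `numpy_build_options` and the default values `min` and `max` are assumptions for illustration, not necessarily what the final branch uses.

```python
def numpy_build_options(fat_binary, baseline="min", dispatch="max"):
    """Return extra options for numpy's build command.

    Hypothetical helper. The idea: keep the always-required instruction
    set (baseline) small, and let numpy select faster code paths at run
    time through its dispatch mechanism.
    """
    if not fat_binary:
        # A native build keeps numpy's own defaults.
        return []
    return [
        "--cpu-baseline={}".format(baseline),
        "--cpu-dispatch={}".format(dispatch),
    ]


if __name__ == "__main__":
    options = numpy_build_options(fat_binary=True)
    print(" ".join(["python3", "setup.py", "build"] + options))
```

Passing `baseline="none"` would correspond to the stricter suggestion made earlier in this discussion.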

comment:35 Changed 19 months ago by Matthias Köppe

Summary: build not portable despite using SAGE_FAT_BINARY=yes, NTL-related / openblas-related → build not portable despite using SAGE_FAT_BINARY=yes, NTL/openblas/numpy-related

comment:36 Changed 19 months ago by Matthias Köppe

Priority: critical → blocker

comment:37 Changed 19 months ago by git

Commit: f053669ecdaae1ea48dce00819e0c8f4de6a3f9d → 9587b4013735a01c0f84eb9455bfa0e03663ebf3

Branch pushed to git repo; I updated commit sha1. This was a forced push. New commits:

9587b40 try to build portable ntl by disabling avx512

comment:38 Changed 19 months ago by git

Commit: 9587b4013735a01c0f84eb9455bfa0e03663ebf3 → 468395d340ffeacaad551fd0301bb07ffc8d6252

Branch pushed to git repo; I updated commit sha1. New commits:

468395d do not allow numpy intrinsics when building fat binary

comment:39 Changed 19 months ago by gh-kliem

Reviewers: https://github.com/kliem/sage/pull/41/checks

comment:40 Changed 19 months ago by Matthias Köppe

I ran this also in https://github.com/mkoeppe/sage/actions/runs/637210121 (on top of #25993 and other tickets - https://github.com/mkoeppe/sage/tree/ci-25993-2021-03-05-29827-29537) and I am still getting SIGSEGVs from plotting and SIGILLs

comment:41 Changed 19 months ago by Matthias Köppe

Erik, in https://groups.google.com/g/sage-devel/c/s_inhvSxLp8/m/Vtb35oYcAgAJ you say you have fixed it in the windows binary build --- what's the solution?

comment:42 in reply to:  41 ; Changed 19 months ago by Erik Bray

Replying to mkoeppe:

Erik, in https://groups.google.com/g/sage-devel/c/s_inhvSxLp8/m/Vtb35oYcAgAJ you say you have fixed it in the windows binary build --- what's the solution?

The issue on the NTL side was fixed: https://github.com/sagemath/sage-windows/issues/53

The solution was to not use the NTL package from Cygwin and custom compile it instead.

I also brought the issue up with the package maintainer: https://cygwin.com/pipermail/cygwin/2020-December/247196.html

He has released a new version of the package to address it: https://cygwin.com/pipermail/cygwin/2020-December/247205.html

He still rebuilt it with my perhaps faulty suggestion of including CXXFLAGS=-mavx, which others have rightfully pointed out will still make it less portable, though I think that's how I built it as well.

He released another new NTL package in January but I'm not sure whether or not it changes anything about this: https://cygwin.com/pipermail/cygwin/2021-January/247439.html

The NTL fix definitely worked for some, but others are still reporting segfaults in plotting, so I think that has to do either with OpenBLAS or Numpy, but I haven't been able to gather enough details: https://github.com/sagemath/sage-windows/issues/57

comment:43 in reply to:  40 ; Changed 19 months ago by Erik Bray

Replying to mkoeppe:

I ran this also in https://github.com/mkoeppe/sage/actions/runs/637210121 (on top of #25993 and other tickets - https://github.com/mkoeppe/sage/tree/ci-25993-2021-03-05-29827-29537) and I am still getting SIGSEGVs from plotting and SIGILLs

Do we know anything about the CPU architecture of the machines these builds are being run on? I looked through the logs but didn't find anything. It might be helpful to add a cat /proc/cpuinfo somewhere.
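Once that output is in the logs, the instruction-set flags of different runners could be compared with something like the sketch below. The function name `cpu_flags` is hypothetical, and the code assumes the usual `flags : ...` line of /proc/cpuinfo on x86 machines.

```python
def cpu_flags(cpuinfo_text):
    """Return the set of CPU feature flags of the first processor listed."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            return set(value.split())
    # No "flags" line found (for example on a non-x86 machine).
    return set()


if __name__ == "__main__":
    with open("/proc/cpuinfo") as f:
        flags = cpu_flags(f.read())
    # avx512f is the AVX-512 Foundation flag as reported by the kernel.
    print("AVX-512 capable:", "avx512f" in flags)
```

A flag that is present on the machine that built an artifact but absent on the machine that consumes it is a natural suspect for the SIGILLs.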

comment:44 Changed 19 months ago by Erik Bray

Possibly related but I'm not sure since others are reporting it's fixed in 1.19.4. I didn't know Numpy was now installing its own copies of libraries in numpy.libs when you install from a wheel. I don't think that should be affecting Sage, but it's worth a look: https://github.com/numpy/numpy/issues/17674

comment:45 Changed 19 months ago by Erik Bray

Worth a look in part because Numpy with OpenBLAS 0.3.12 was causing segfaults, so they reverted to OpenBLAS 0.3.9: https://github.com/numpy/numpy/pull/17680

We are using 0.3.13 which might also have problems with Numpy, though I wish I had more insight into what those problems actually are.

comment:46 Changed 19 months ago by Erik Bray

Many seem to have suggested it might have to do with OpenBLAS threading. Could we try setting export OPENBLAS_NUM_THREADS=1 and see if it makes any difference?

comment:47 Changed 19 months ago by Erik Bray

Alas, Sage 9.2 has OpenBLAS 0.3.9, so I guess this is not directly related to the people reporting this against Sage 9.2 for Windows :/

comment:48 in reply to:  46 Changed 19 months ago by Matthias Köppe

Replying to embray:

Many seem to have suggested it might have to do with OpenBLAS threading. Could we try setting export OPENBLAS_NUM_THREADS=1 and see if it makes any difference?

We already set this in src/bin/sage-env

comment:49 Changed 19 months ago by git

Commit: 468395d340ffeacaad551fd0301bb07ffc8d6252 → 86f4c345a8b8bfe3ab235337d35443771da5c965

Branch pushed to git repo; I updated commit sha1. New commits:

86f4c34 .github/workflows/ci-cygwin-*.yml: Print cpuinfo

comment:50 in reply to:  43 Changed 19 months ago by Matthias Köppe

Replying to embray:

Replying to mkoeppe:

I ran this also in https://github.com/mkoeppe/sage/actions/runs/637210121 (on top of #25993 and other tickets - https://github.com/mkoeppe/sage/tree/ci-25993-2021-03-05-29827-29537) and I am still getting SIGSEGVs from plotting and SIGILLs

Do we know anything about the CPU architecture of the machines these builds are being run on?

Only that it varies even within one run.

I looked through the logs but didn't find anything. It might be helpful to add a cat /proc/cpuinfo somewhere.

I have added it at the beginning of each job

comment:51 in reply to:  42 Changed 19 months ago by Matthias Köppe

Replying to embray:

The issue on the NTL side was fixed: https://github.com/sagemath/sage-windows/issues/53

The solution was to not use the NTL package from Cygwin and custom compile it instead.

I also brought the issue up with the package maintainer: [...] He released another new NTL package in January but I'm not sure whether or not it changes anything about this: https://cygwin.com/pipermail/cygwin/2021-January/247439.html

The NTL fix definitely worked for some [...]

Thanks a lot for the details on NTL. I haven't seen the NTL-related failures in a while, so let's assume it is fixed by the changes to the Cygwin package. (But unrelated issues currently block the cygwin-minimal build -- so we cannot be sure yet that our NTL works in this configuration.)

comment:52 Changed 19 months ago by git

Commit: 86f4c345a8b8bfe3ab235337d35443771da5c965 → f6ec2c962e79469275f1d99c79524bfe4eac6be5

Branch pushed to git repo; I updated commit sha1. New commits:

1092d4b .github/workflows/ci-cygwin-*.yml: Use configure --enable-fat-binary - the environment variable SAGE_FAT_BINARY is not passed through tox
67b01ec tox.ini: Pass through SAGE_NUM_THREADS
f6ec2c9 github/workflows/ci-cygwin-*.yml: Remove setting of SAGE_CHECK* variables which duplicates what tox does

comment:53 Changed 19 months ago by Matthias Köppe

Reviewers: https://github.com/kliem/sage/pull/41/checks → https://github.com/mkoeppe/sage/actions/runs/647496577

comment:54 Changed 19 months ago by Matthias Köppe

Authors: Jonathan Kliem, Matthias Koeppe
Status: new → needs_review

comment:55 Changed 19 months ago by Matthias Köppe

Description: modified (diff)

comment:56 Changed 19 months ago by git

Commit: f6ec2c962e79469275f1d99c79524bfe4eac6be5 → c76bd917e4d36b0414af9875cdb75f1cf4be5aa1

Branch pushed to git repo; I updated commit sha1. New commits:

c76bd91 .github/workflows/ci-cygwin*.yml: Fix up for cpuinfo output

comment:57 Changed 19 months ago by Matthias Köppe

Reviewers: https://github.com/mkoeppe/sage/actions/runs/647496577 → https://github.com/mkoeppe/sage/actions/runs/647547421

comment:58 Changed 19 months ago by Matthias Köppe

In https://github.com/mkoeppe/sage/runs/2099002937?check_suite_focus=true I see a build failure:

  ...
  [openblas-0.3.13]   /usr/lib/gcc/x86_64-pc-cygwin/10/../../../../x86_64-pc-cygwin/bin/ld: cannot export zlatm6_: symbol not defined
  [openblas-0.3.13]   /usr/lib/gcc/x86_64-pc-cygwin/10/../../../../x86_64-pc-cygwin/bin/ld: cannot export zlatm6_: symbol not defined
  [openblas-0.3.13]   /usr/lib/gcc/x86_64-pc-cygwin/10/../../../../x86_64-pc-cygwin/bin/ld: cannot export zlatme_: symbol not defined
  ...

comment:59 Changed 19 months ago by Matthias Köppe

The full log does not reveal a specific error message, but building one object fails; it looks like gcc crashes.

And in a second attempt in the same run, the build succeeds.

Changed 19 months ago by Matthias Köppe

Attachment: openblas-0.3.13.log added

comment:60 Changed 19 months ago by Matthias Köppe

Upon restarting the workflow, the openblas build succeeds without issues. Running into rebasing issues again (haven't seen this in a while!) https://github.com/mkoeppe/sage/runs/2103478775?check_suite_focus=true

  [pillow-8.0.1]         0 [main] python3 47699 child_info_fork::abort: address space needed by 'Scanners.cpython-38-x86_64-cygwin.dll' (0x400000) is already occupied
  [pillow-8.0.1]         0 [main] python3 47700 child_info_fork::abort: address space needed by 'Scanners.cpython-38-x86_64-cygwin.dll' (0x400000) is already occupied
Last edited 19 months ago by Matthias Köppe (previous) (diff)

comment:61 Changed 19 months ago by git

Commit: c76bd917e4d36b0414af9875cdb75f1cf4be5aa1 → 2037d68daf7c7389b7ba9357499b538969604b74

Branch pushed to git repo; I updated commit sha1. New commits:

2037d68 build/bin/sage-print-system-package-command [CYGWIN]: Fix typo

comment:62 Changed 19 months ago by Erik Bray

That's weird. Scanners.cpython-38-x86_64-cygwin.dll is part of Cython. This suggests that rebase failed after installing Cython, or possibly that Cython had just been installed but the rebase lock was still held. It doesn't look that way from the logs though; I would have thought Cython was installed in an earlier stage.

comment:63 Changed 19 months ago by Erik Bray

From the logs from the previous stage of that run:

Copying package files from temporary location /opt/sage-461876e6c77b59756ad9db557c884486eb40881e/var/tmp/sage/build/cython-0.29.21/inst to /opt/sage-461876e6c77b59756ad9db557c884486eb40881e
Waiting for rebase lock
Getting list of dlls. This may take a while...
Now rebasing...
Successfully installed cython-0.29.21

So that's mysterious.

comment:64 Changed 19 months ago by Matthias Köppe

Thanks for looking into this -- I really need your expertise here!

I run rebasing at the end of .github/workflows/extract-sage-local.sh - to make sure that the downloaded stages that have been built in parallel can work together.

comment:65 Changed 19 months ago by Erik Bray

On this build I notice during "Extract sage-local artifact":

Filesystem       Size  Used Avail Use% Mounted on
C:/tools/cygwin  256G  170G   86G  67% /
D:                14G  2.2G   12G  16% /cygdrive/d
Getting list of dlls. This may take a while...
Now rebasing...
rebaseall: only ash or dash processes are allowed during rebasing
    Exit all Cygwin processes and stop all Cygwin services.
    Execute ash (or dash) from Start/Run... or a cmd or command window.
    Execute '/bin/rebaseall' from ash (or dash).
Error: Process completed with exit code 1.

If you want to do a full rebase in this step it can't be called from a shell script run with /bin/bash. The script either needs to be run with dash, or as a separate command with dash.

comment:66 Changed 19 months ago by Matthias Köppe

Yes, in this run I tried to fix the problem by changing to rebaseall...

comment:67 in reply to:  66 Changed 19 months ago by Erik Bray

Replying to mkoeppe:

Yes, in this run I tried to fix the problem by changing to rebaseall...

Ah, OK. I think that's a good idea. It just needs to be run outside a Cygwin process with no other Cygwin processes running (since it can also modify the cygwin DLL itself and other support libraries that are not normally touched by plain rebase).

comment:68 Changed 19 months ago by Matthias Köppe

Is this the cygwin dash that should be used?

comment:69 Changed 19 months ago by Erik Bray

Yes, or you can just call C:\cygwin64\bin\rebaseall directly without going through a shell, I think.

comment:70 Changed 19 months ago by Matthias Köppe

Right, but I want to keep it in the extraction script so that it is easy to use manually too

Last edited 19 months ago by Matthias Köppe (previous) (diff)

comment:71 Changed 19 months ago by Matthias Köppe

comment:72 Changed 19 months ago by Matthias Köppe

This worked! https://github.com/mkoeppe/sage/runs/2134463553

I'll push the fixes to this ticket.

comment:73 Changed 19 months ago by git

Commit: 2037d68daf7c7389b7ba9357499b538969604b74 → da978f85227ad4380b54f471b131bb82d48524bd

Branch pushed to git repo; I updated commit sha1. New commits:

d0df00c .github/workflows/extract-sage-local.sh: Use sage-rebase.sh --all
da978f8 .github/workflows/{extract-sage-local.sh, ci-cygwin-*.yml): Use /bin/dash for scripts invoking rebaseall

comment:74 Changed 19 months ago by Matthias Köppe

Reviewers: https://github.com/mkoeppe/sage/actions/runs/647547421 → https://github.com/mkoeppe/sage/actions/runs/662602246

comment:75 Changed 19 months ago by Matthias Köppe

With the rebasing fixes applied, I am again getting as far as sage-iii (https://github.com/mkoeppe/sage/runs/2140904387?check_suite_focus=true)

  [sagelib-9.3.beta9]   Generating auto-generated sources
  [sagelib-9.3.beta9]   Building interpreters for fast_callable
  [sagelib-9.3.beta9]   -> First build of interpreters
  [sagelib-9.3.beta9]   running build_cython
  [sagelib-9.3.beta9]   Enabling Cython debugging support
  [sagelib-9.3.beta9]   /cygdrive/d/a/sage/sage/build/pkgs/sagelib/spkg-install: line 58: 61038 Illegal instruction     (core dumped) python3 -u setup.py --no-user-cfg build install

which is again coming from numpy, I think. Help....

Last edited 19 months ago by Matthias Köppe (previous) (diff)

comment:76 Changed 19 months ago by Matthias Köppe

Reviewers: https://github.com/mkoeppe/sage/actions/runs/662602246 → Matthias Koeppe, ..., https://github.com/mkoeppe/sage/actions/runs/662602246

But the current branch is already an improvement, so let's get this in...

comment:77 Changed 19 months ago by gh-kliem

Reviewers: Matthias Koeppe, ..., https://github.com/mkoeppe/sage/actions/runs/662602246 → Matthias Koeppe, Jonathan Kliem
Status: needs_review → positive_review

LGTM.

comment:78 Changed 19 months ago by Matthias Köppe

Thanks!

comment:79 Changed 19 months ago by Matthias Köppe

Follow-up in #31521

comment:80 Changed 19 months ago by Volker Braun

Status: positive_review → needs_work

Merge conflict

    STDOUT: Auto-merging tox.ini
    STDOUT: Auto-merging .github/workflows/ci-cygwin-standard.yml
    STDOUT: CONFLICT (content): Merge conflict in .github/workflows/ci-cygwin-standard.yml
    STDOUT: Auto-merging .github/workflows/ci-cygwin-minimal.yml
    STDOUT: CONFLICT (content): Merge conflict in .github/workflows/ci-cygwin-minimal.yml
    STDOUT: Automatic merge failed; fix conflicts and then commit the result.

comment:81 Changed 19 months ago by git

Commit: da978f85227ad4380b54f471b131bb82d48524bd → d6dced2457b7c3a6b645f2e56b9ca44c4b09e437

Branch pushed to git repo; I updated commit sha1. New commits:

7081189 tox.ini, build/bin/write-dockerfile.sh: Use configure --enable-download-from-upstream-url --enable-experimental-packages instead of setting SAGE_SPKG directly
d6dced2 Merge #31493

comment:82 Changed 19 months ago by Matthias Köppe

Dependencies: #31493
Status: needs_work → positive_review

comment:83 Changed 18 months ago by Volker Braun

Branch: public/29537 → d6dced2457b7c3a6b645f2e56b9ca44c4b09e437
Resolution: fixed
Status: positive_review → closed