Opened 18 months ago
Closed 14 months ago
#31565 closed defect (fixed)
Build still non-portable despite SAGE_FAT_BINARY=yes because of numpy
Reported by:  Matthias Köppe  Owned by:  

Priority:  blocker  Milestone:  sage-9.4 
Component:  build  Keywords:  sdl 
Cc:  Erik Bray, ghkliem, Dima Pasechnik, Volker Braun, François Bissey, Samuel Lelièvre  Merged in:  
Authors:  Jonathan Kliem  Reviewers:  Thierry Monteil 
Report Upstream:  N/A  Work issues:  
Branch:  49e531d (Commits, GitHub, GitLab)  Commit:  49e531d3c24ca5bf3f6dff54ce54687256cc429f 
Dependencies:  #32257  Stopgaps: 
Description
Followup from #29537, #31521.
Observed on cygwin-standard but likely also affects the Docker images and the Sage binary distribution.
With the upgrade to numpy 1.20.x (#31008), the non-portability shows as an error message instead of a crash:
[sagelib-9.4.beta0] from numpy.core._multiarray_umath import (
[sagelib-9.4.beta0] RuntimeError: NumPy was built with baseline optimizations:
[sagelib-9.4.beta0] (SSE SSE2 SSE3 SSSE3 SSE41 POPCNT SSE42 AVX F16C FMA3 AVX2 AVX512F AVX512CD AVX512_SKX) but your machine doesn't support:
[sagelib-9.4.beta0] (AVX512F).
[sagelib-9.4.beta0] ************************************************************************
[sagelib-9.4.beta0] Error building the Sage library
[sagelib-9.4.beta0] ************************************************************************
[sagelib-9.4.beta0] Full log file: /cygdrive/d/a/sage/sage/logs/pkgs/sagelib-9.4.beta0.log
Report with the 9.3 Linux binary:
Change History (38)
comment:1 Changed 18 months ago by
Cc:  François Bissey added 

comment:2 Changed 18 months ago by
comment:3 Changed 18 months ago by
The symptom is that something you build on one machine does not run on another machine, aborting with SIGILL.
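To tell this symptom apart from an ordinary import failure, one can import the module in a child process and look at how the child died. The following is only a minimal sketch: the helper names `describe_exit` and `try_import` are made up for illustration and are not part of Sage. It relies on the documented behaviour of Python's `subprocess` on POSIX, where a negative return code means the child was killed by that signal.

```python
import signal
import subprocess
import sys


def describe_exit(returncode):
    """Turn a subprocess return code into a short diagnosis (hypothetical helper)."""
    if returncode == 0:
        return "ok"
    if returncode < 0:
        # On POSIX, death by signal N is reported as return code -N.
        try:
            name = signal.Signals(-returncode).name
        except ValueError:
            name = f"signal {-returncode}"
        return f"killed by {name}"
    return f"exited with status {returncode}"


def try_import(module="numpy", python=sys.executable):
    """Import `module` in a child process so a crash cannot take down the caller."""
    proc = subprocess.run(
        [python, "-c", f"import {module}"],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return describe_exit(proc.returncode)
```

A result of "killed by SIGILL" from `try_import("numpy")` on the target machine would match the symptom described here, whereas "exited with status 1" would point to an ordinary Python exception such as the RuntimeError from the description.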
comment:4 Changed 18 months ago by
Description:  modified (diff) 

comment:5 Changed 17 months ago by
Milestone:  sage-9.3 → sage-9.4 

Moving to 9.4, as 9.3 has been released.
comment:6 Changed 16 months ago by
Dependencies:  → #31008 

Description:  modified (diff) 
Priority:  critical → blocker 
comment:9 Changed 16 months ago by
Description:  modified (diff) 

comment:10 Changed 16 months ago by
Description:  modified (diff) 

comment:11 Changed 15 months ago by
Branch:  → u/mkoeppe/build_still_non_portable_despite_sage_fat_binary_yes_because_of_numpy 

comment:12 Changed 15 months ago by
Authors:  → Jonathan Kliem 

Commit:  → b249c4750f510ba18a36b0306a15914325059398 
Status:  new → needs_review 
New commits:
b249c47  Revert "Revert "do not allow numpy intrinsics when building fat binary""

comment:13 Changed 15 months ago by
Reviewers:  → https://github.com/mkoeppe/sage/actions/runs/952966309 

comment:15 Changed 15 months ago by
The cygwin-standard build looked rather promising (no "baseline optimizations" message, no crash when just importing numpy) but I am getting crashes again https://github.com/mkoeppe/sage/runs/2867645468 when the doctests do any plotting.
comment:16 Changed 15 months ago by
Reviewers:  https://github.com/mkoeppe/sage/actions/runs/952966309 → Matthias Koeppe 

Status:  needs_review → positive_review 
The other runs at https://github.com/mkoeppe/sage/runs/2865933225?check_suite_focus=true look clean.
So I consider this ticket already an improvement. We'll have to chase the crash on cygwin when switching CPUs between build stages in ... yet another ... followup ticket.
comment:17 Changed 15 months ago by
More numpy-related trouble, as seen e.g. in https://groups.google.com/d/msgid/sagedevel/763f865098034bbaa0ec46744204fc22n%40googlegroups.com (there are more reports like this):
[sagelib-9.4.beta2] File "/home/chapoton/sage/local/lib/python3.8/site-packages/numpy/core/overrides.py", line 7, in <module>
[sagelib-9.4.beta2] from numpy.core._multiarray_umath import (
[sagelib-9.4.beta2] RuntimeError: NumPy was built with baseline optimizations:
[sagelib-9.4.beta2] (SSE SSE2 SSE3 SSSE3 SSE41 POPCNT SSE42) but your machine doesn't support:
[sagelib-9.4.beta2] (POPCNT).
So somehow the CPU changes its state and refuses to report features that tested as present during the NumPy build?! Is this due to some manipulation of CPU flags happening during the sagelib build?
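One way to check this independently of NumPy is to compare the baseline list from the error message with what the kernel reports. The sketch below assumes a Linux-style /proc/cpuinfo with a "flags" line. The mapping table and both function names are illustrative only, not anything NumPy or Sage provides. Group names such as AVX512_SKX are skipped because they do not correspond to a single flag.

```python
# Assumed mapping from NumPy feature names (as printed in the error message)
# to the flag names used in the "flags" line of Linux /proc/cpuinfo.
NUMPY_TO_CPUINFO = {
    "SSE": "sse", "SSE2": "sse2", "SSE3": "pni", "SSSE3": "ssse3",
    "SSE41": "sse4_1", "POPCNT": "popcnt", "SSE42": "sse4_2",
    "AVX": "avx", "F16C": "f16c", "FMA3": "fma", "AVX2": "avx2",
    "AVX512F": "avx512f", "AVX512CD": "avx512cd",
}


def read_cpu_flags(path="/proc/cpuinfo"):
    """Return the set of CPU flags listed for the first processor entry."""
    with open(path) as f:
        for line in f:
            if line.startswith("flags"):
                return set(line.split(":", 1)[1].split())
    return set()


def missing_features(baseline, cpu_flags):
    """List baseline features whose cpuinfo flag is absent from cpu_flags.

    Names without an entry in NUMPY_TO_CPUINFO (e.g. feature groups) are skipped.
    """
    return [
        feat for feat in baseline
        if feat in NUMPY_TO_CPUINFO and NUMPY_TO_CPUINFO[feat] not in cpu_flags
    ]
```

Running `missing_features("SSE SSE2 SSE3 SSSE3 SSE41 POPCNT SSE42".split(), read_cpu_flags())` once on the build machine and once on the machine that fails would show whether the two really disagree about POPCNT.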
comment:18 Changed 15 months ago by
I've opened #32021 to deal with that "RuntimeError: NumPy was built with baseline optimizations:" thing.
comment:19 Changed 15 months ago by
Actually the definition of --cpu-dispatch in NUMPY_FCONFIG isn't clean. However, only the value will be passed and not the variable NUMPY_FCONFIG.
comment:20 Changed 15 months ago by
Status:  positive_review → needs_work 

This doesn't work. See https://trac.sagemath.org/ticket/32021#comment:10.
It's a build option not a configure option.
comment:21 Changed 15 months ago by
Actually the place to do it is correct, however:
The command arguments are available in build, build_clib, and build_ext. If build_clib or build_ext are not specified by the user, the arguments of build will be used instead, which also holds the default values.
So this does not work for bdist_wheel, and this is what we use.
comment:22 Changed 15 months ago by
You can do setup.py bdist_wheel build [...buildoptions...], see for example build/pkgs/jupyter_jsmol/spkg-install.in
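As a rough illustration of this command layout, here is how the argument list could be assembled so that the CPU option ends up after the build subcommand. This is only a sketch and not the actual change on the branch: the function name is hypothetical, and using --cpu-baseline=none for a fat binary is an assumption. The --cpu-baseline option itself is a documented NumPy build option.

```python
import os
import sys


def numpy_wheel_command(environ=None, python=sys.executable):
    """Assemble a 'setup.py bdist_wheel build ...' command line (hypothetical helper)."""
    environ = os.environ if environ is None else environ
    # The options after "build" belong to the build subcommand, not to bdist_wheel.
    cmd = [python, "setup.py", "bdist_wheel", "build"]
    if environ.get("SAGE_FAT_BINARY") == "yes":
        # Assumption: request no baseline features at all for a portable build.
        cmd.append("--cpu-baseline=none")
    return cmd
```

With `{"SAGE_FAT_BINARY": "yes"}` passed as `environ`, the returned list ends in `"build", "--cpu-baseline=none"`. Without it the command is unchanged, so the default baseline applies.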
comment:23 Changed 15 months ago by
Thanks. This seems to do the trick.
Actually this itself might even work for this ticket here.
comment:24 Changed 15 months ago by
Branch:  u/mkoeppe/build_still_non_portable_despite_sage_fat_binary_yes_because_of_numpy → public/31565 

Commit:  b249c4750f510ba18a36b0306a15914325059398 → 82ae48525b377032296f7f83cdc70e7acf5cd533 
Status:  needs_work → needs_review 
New commits:
82ae485  disable baseline in case of SAGE_FAT_BINARY

comment:26 Changed 15 months ago by
Status:  needs_review → needs_info 

Unclear whether this is still needed now that #32021 is merged in 9.4.beta5
comment:27 Changed 14 months ago by
Dependencies:  #31008 → #32257 

comment:28 followup: 31 Changed 14 months ago by
With #32021 and the pynac build failure fixed by #32257, we are back to being able to build and run the testsuite on Cygwin. https://github.com/mkoeppe/sage/runs/3126612760?check_suite_focus=true
Numerous SIGSEGVs whenever plotting is involved point to more trouble with numpy.
I'll try out the branch of the present ticket on top of #32257.
comment:29 Changed 14 months ago by
Commit:  82ae48525b377032296f7f83cdc70e7acf5cd533 → 49e531d3c24ca5bf3f6dff54ce54687256cc429f 

comment:30 Changed 14 months ago by
Reviewers:  Matthias Koeppe → https://github.com/mkoeppe/sage/actions/runs/1054288431 

That's now running at https://github.com/mkoeppe/sage/actions/runs/1054288431
comment:31 Changed 14 months ago by
comment:32 Changed 14 months ago by
Reviewers:  https://github.com/mkoeppe/sage/actions/runs/1054288431 

Testing with #32080 merged at https://github.com/mkoeppe/sage/actions/runs/1056686451
comment:33 Changed 14 months ago by
Cc:  Samuel Lelièvre added 

Status:  needs_info → needs_work 
Also with #32080 no change. Segfaults on every plot.
Next step would be to try to reproduce this in a local installation in Cygwin.
comment:34 Changed 14 months ago by
Priority:  blocker → critical 

comment:35 Changed 14 months ago by
Reduced to critical - see https://groups.google.com/g/sagerelease/c/91CGN0cra2k/m/1WwPZNshBQAJ
comment:36 Changed 14 months ago by
Keywords:  sdl added 

Reviewers:  → Thierry Monteil 
I confirm that this branch fixes the SSE2 bug for numpy on a qemu-emulated Pentium 3. Note, however, that the error appeared at run time, not build time.
As this will allow me to rebuild 32-bit patchbots and a new SDL for the bullseye release (which I have not done for a while), I am +1 for making this ticket a blocker and getting it merged in 9.4.
comment:37 Changed 14 months ago by
Priority:  critical → blocker 

Status:  needs_work → positive_review 
If nobody complains.
comment:38 Changed 14 months ago by
Branch:  public/31565 → 49e531d3c24ca5bf3f6dff54ce54687256cc429f 

Resolution:  → fixed 
Status:  positive_review → closed 
What are the symptoms? I don't really do cygwin but I may do docker one day.