#20096 closed enhancement (fixed)
Make OpenBLAS standard instead of ATLAS
Reported by:  cpernet  Owned by:  

Priority:  major  Milestone:  sage7.4 
Component:  packages: standard  Keywords:  BLAS, sd75 
Cc:  jpflori  Merged in:  
Authors:  Clément Pernet  Reviewers:  JeanPierre Flori 
Report Upstream:  N/A  Work issues:  
Branch:  2300a31 (Commits)  Commit:  
Dependencies:  Stopgaps: 
Description (last modified by )
Change History (62)
comment:1 Changed 5 years ago by
 Dependencies set to #17075
comment:2 Changed 5 years ago by
comment:3 Changed 5 years ago by
Sure. I should have been clearer: the idea is to
 change the default BLAS installed by SageMath from ATLAS to OpenBLAS
 let the user the option to specify which BLAS to link against.
 simplify the current BLAS, BLAS2 variable mess
comment:4 Changed 5 years ago by
 Description modified (diff)
comment:5 Changed 5 years ago by
 Description modified (diff)
comment:6 Changed 5 years ago by
 Description modified (diff)
comment:7 Changed 5 years ago by
 Component changed from linear algebra to packages: standard
comment:8 Changed 5 years ago by
 Dependencies changed from #17075 to #17075 #20129
comment:9 Changed 5 years ago by
 Dependencies changed from #17075 #20129 to #17075, #20129, #20130
 Summary changed from Replace ATLAS by OpenBLAS to Choose between ATLAS and OpenBLAS
comment:10 Changed 5 years ago by
 Dependencies changed from #17075, #20129, #20130 to #20133, #20140
 Milestone changed from sagewishlist to sage7.1
comment:11 Changed 4 years ago by
Should we close this after #20542? Or switch to openblas by default here?
comment:12 Changed 4 years ago by
 Dependencies #20133, #20140 deleted
 Summary changed from Choose between ATLAS and OpenBLAS to Make OpenBLAS standard instead of ATLAS
comment:13 Changed 4 years ago by
Anyone else thinks we should go ahead with this one?
comment:14 Changed 4 years ago by
+1... send a patch ;)
comment:15 Changed 4 years ago by
 Branch set to public/openblas
 Commit set to a3aceaa70cbf6c8f9578d632b1f91c8cf2b2aa02
New commits:
a3aceaa  switch default BLAS to OpenBLAS

comment:16 Changed 4 years ago by
 Commit changed from a3aceaa70cbf6c8f9578d632b1f91c8cf2b2aa02 to 2e73362a63297abd0d4ff77a38479245260acf0e
Branch pushed to git repo; I updated commit sha1. New commits:
2e73362  Make OpenBLAS default

comment:17 Changed 4 years ago by
 Commit changed from 2e73362a63297abd0d4ff77a38479245260acf0e to 17c092e0fdb05bef1904211899a7fbb01345c03e
Branch pushed to git repo; I updated commit sha1. New commits:
17c092e  Modify configure help strings about blas.

comment:18 Changed 4 years ago by
 Keywords sd75 added
 Reviewers set to JeanPierre Flori
 Status changed from new to needs_review
comment:19 Changed 4 years ago by
 Status changed from needs_review to positive_review
comment:20 Changed 4 years ago by
 Status changed from positive_review to needs_work
sage t long src/sage/misc/package.py ********************************************************************** File "src/sage/misc/package.py", line 179, in sage.misc.package.list_packages Failed example: sorted(L.keys()) Expected: ['alabaster', 'arb', 'atlas', ... 'zn_poly', 'zope_interface'] Got: ['alabaster', 'arb', 'babel', 'backports_abc',
comment:21 Changed 4 years ago by
 Commit changed from 17c092e0fdb05bef1904211899a7fbb01345c03e to c52483276acbcc6cdbac37afe7febad10222e2a2
comment:22 Changed 4 years ago by
 Status changed from needs_work to positive_review
comment:23 Changed 4 years ago by
 Status changed from positive_review to needs_work
Openblas fails to build on some 32bit Linux platforms, e.g. http://build.sagedev.org/release/builders/%20%20slow%20AIMS%20bu16_32s02%20%28Ubuntu%2016.04%2032%20bit%29%20incremental/builds/258/steps/compile_1/logs/stdio
[openblas0.2.15] gfortran O2 Wall L/home/buildbot/slave/sage_git/build/local/lib Wl,rpath,/home/buildbot/slave/sage_git/build/local/lib o sblat1 sblat1.o ../libopenblas_haswellpr0.2.15.a lm lpthread lgfortran lm lpthread lgfortran [openblas0.2.15] ../libopenblas_haswellpr0.2.15.a(saxpy.o): In function `saxpy_': [openblas0.2.15] axpy.c:(.text+0x9a): undefined reference to `saxpy_k' [openblas0.2.15] axpy.c:(.text+0x12a): undefined reference to `saxpy_k' [openblas0.2.15] ../libopenblas_haswellpr0.2.15.a(sswap.o): In function `sswap_': [openblas0.2.15] swap.c:(.text+0x84): undefined reference to `sswap_k' [openblas0.2.15] swap.c:(.text+0xfb): undefined reference to `sswap_k' [openblas0.2.15] ../libopenblas_haswellpr0.2.15.a(scopy.o): In function `scopy_': [openblas0.2.15] copy.c:(.text+0x41): undefined reference to `scopy_k' [openblas0.2.15] ../libopenblas_haswellpr0.2.15.a(sscal.o): In function `sscal_': [openblas0.2.15] scal.c:(.text+0x67): undefined reference to `sscal_k' [openblas0.2.15] scal.c:(.text+0xc1): undefined reference to `sscal_k' [openblas0.2.15] ../libopenblas_haswellpr0.2.15.a(sdot.o): In function `sdot_': [openblas0.2.15] dot.c:(.text+0x41): undefined reference to `sdot_k' [openblas0.2.15] ../libopenblas_haswellpr0.2.15.a(sasum.o): In function `sasum_': [openblas0.2.15] asum.c:(.text+0x28): undefined reference to `sasum_k' [openblas0.2.15] ../libopenblas_haswellpr0.2.15.a(snrm2.o): In function `snrm2_': [openblas0.2.15] nrm2.c:(.text+0x28): undefined reference to `snrm2_k' [openblas0.2.15] ../libopenblas_haswellpr0.2.15.a(isamax.o): In function `isamax_': [openblas0.2.15] imax.c:(.text+0x2b): undefined reference to `isamax_k' [openblas0.2.15] ../libopenblas_haswellpr0.2.15.a(srot.o): In function `srot_': [openblas0.2.15] rot.c:(.text+0x4f): undefined reference to `srot_k' [openblas0.2.15] ../libopenblas_haswellpr0.2.15.a(memory.o): In function `blas_memory_alloc': [openblas0.2.15] memory.c:(.text+0x456): undefined reference to `cpuid' [openblas0.2.15] ../libopenblas_haswellpr0.2.15.a(parameter.o): In function `get_L2_size': [openblas0.2.15] parameter.c:(.text+0x35): undefined reference to `cpuid' [openblas0.2.15] collect2: error: ld returned 1 exit status
comment:24 Changed 4 years ago by
Also, numerical noise:
sage t long src/sage/matrix/matrix_double_dense.pyx ********************************************************************** File "src/sage/matrix/matrix_double_dense.pyx", line 1009, in sage.matrix.matrix_double_dense.Matrix_double_dense.singular_values Failed example: A.condition() > 1.6e16 or A.condition() Expected: True Got: 1.5972138093369194e+16 ********************************************************************** 1 item had failures: 1 of 31 in sage.matrix.matrix_double_dense.Matrix_double_dense.singular_values [636 tests, 1 failure, 1.91 s]
comment:25 Changed 4 years ago by
More numerical noise:
sage t long src/sage/modular/modform/numerical.py ********************************************************************** File "src/sage/modular/modform/numerical.py", line 429, in sage.modular.modform.numerical.NumericalEigenforms.eigenvalues Failed example: n.eigenvalues([3,5,13]) # rel tol 2e10 Expected: [[177148.0, 252.00000000001896], [48828126.0, 4830.000000001376], [1792160394038.0, 577737.9999898539]] Got: [[177148.0, 252.00000000001413], [48828126.0, 4830.000000003907], [1792160394038.0, 577737.9998566243]] Tolerance exceeded in 1 of 6: 577737.9999898539 vs 577737.9998566243, tolerance 2e10 > 2e10 ********************************************************************** File "src/sage/modular/modform/numerical.py", line 446, in sage.modular.modform.numerical.NumericalEigenforms.eigenvalues.phi Failed example: n.eigenvalues([3,5,13]) # rel tol 2e10 Expected: [[177148.0, 252.00000000001896], [48828126.0, 4830.000000001376], [1792160394038.0, 577737.9999898539]] Got: [[177148.0, 252.00000000001413], [48828126.0, 4830.000000003907], [1792160394038.0, 577737.9998566243]] Tolerance exceeded in 1 of 6: 577737.9999898539 vs 577737.9998566243, tolerance 2e10 > 2e10 ********************************************************************** 2 items had failures: 1 of 3 in sage.modular.modform.numerical.NumericalEigenforms.eigenvalues 1 of 3 in sage.modular.modform.numerical.NumericalEigenforms.eigenvalues.phi [46 tests, 2 failures, 0.56 s]
comment:26 Changed 4 years ago by
Very surprising.
Could you give the details on the host architecture where the numerical noise issues appeared (uname a
and cat /proc/cpuinfo
)?
I could not reproduce them on my x86_64 sandibridge running linuxmint 14.
Similarly for the 32 bit linking errors (it never saw it fail to compile on 32 bits ubuntu's). I could then try to find a similar arch and investigate it and/or report the problem upstream.
comment:27 Changed 4 years ago by
The buildbot machine with the link failure is Ubuntu 16.04 32bit (in a VM).
As far as the numerical noise goes, the tolerance just needs to be incremented a a tiny bit.
comment:28 Changed 4 years ago by
The linking issue looks like https://github.com/xianyi/OpenBLAS/issues/657.
OpenBLAS devs say they did not tune the 32bit haswell case, and recommend passing BINARY=32
to make which will default to nehalem kernels.
comment:29 Changed 4 years ago by
 Commit changed from c52483276acbcc6cdbac37afe7febad10222e2a2 to b37ead3b0cf261d6e7c473497ea3d5009e910725
Branch pushed to git repo; I updated commit sha1. New commits:
b37ead3  fixing numerical noise issues of #20096 in some arch when OpenBLAS replaces ATLAS.

comment:30 Changed 4 years ago by
 Commit changed from b37ead3b0cf261d6e7c473497ea3d5009e910725 to 2a13e137a8688fe5a14b10c147cd500733b5afba
Branch pushed to git repo; I updated commit sha1. New commits:
2a13e13  Tells OpenBLAS to use default kernels on 32bits arch (fixing #20096)

comment:31 Changed 4 years ago by
 Status changed from needs_work to needs_review
The last two commits fix the two issues raised by Volker:
 linking error on 32 bits > passing BINARY=32 to make
 numerical noise > fixing the tolerance
comment:32 followup: ↓ 36 Changed 4 years ago by
 Status changed from needs_review to needs_work
Damn, this patch doesn't work as SAGE64 is not necessarily set to yes when compiling on a 64 bits machine. After reading about the uselessness of SAGE64 on sagedevel, and browsing through MPIR's and M4RI's spkg's I don't find any environment variable that would tell us whether the target architecture is 32 or 64 bits.
An option could be to copy/paste MPIR's long case switch setting ABI=32/64. Does anybody have a better solution?
comment:33 Changed 4 years ago by
 Commit changed from 2a13e137a8688fe5a14b10c147cd500733b5afba to f8fedca2afd96832f1606f5d4fcdbb58623e7fb5
Branch pushed to git repo; I updated commit sha1. New commits:
f8fedca  properly checking whether we are compiling on a 64 or 32 bit architecture.

comment:34 followup: ↓ 37 Changed 4 years ago by
 Status changed from needs_work to needs_review
Ok, I pushed an new python way of checking this.
comment:35 Changed 4 years ago by
IMHO at this point its easier to just write the spkginstall in Python
comment:36 in reply to: ↑ 32 Changed 4 years ago by
Replying to cpernet:
Damn, this patch doesn't work as SAGE64 is not necessarily set to yes when compiling on a 64 bits machine. After reading about the uselessness of SAGE64 on sagedevel, and browsing through
Yes indeed.
SAGE64
was used on strange archs where kernel is 64 bits but userspace is 32 bits by default, e.g. old OS X or Solaris.
comment:37 in reply to: ↑ 34 ; followup: ↓ 60 Changed 4 years ago by
Replying to cpernet:
Ok, I pushed an new python way of checking this.
Then we should make the spkg
dependent on python
.
Note that ATLAS
install script also used python
so there is no regression here.
comment:38 Changed 4 years ago by
(I mean the spkgintall
script for ATLAS
).
comment:39 Changed 4 years ago by
Maybe using something with uname m
could work without depending on python?
And maybe it's time for a testabi.sh
script in sage/src/bin/
?
@volker: any preferred solution?
comment:40 Changed 4 years ago by
We require a system python so it doesn't have to depend on our Python. IMHO the atlas dependency is incorrect. It needs to be python2/3 compatible, though.
comment:41 Changed 4 years ago by
Not sure it is what Volker meant, but you cannot use print
the same way in Python 2 and 3...
comment:42 Changed 4 years ago by
 Commit changed from f8fedca2afd96832f1606f5d4fcdbb58623e7fb5 to d5dd1d018e401f5922e9e2b0949359349488e2be
Branch pushed to git repo; I updated commit sha1. New commits:
d5dd1d0  support system python3

comment:43 Changed 4 years ago by
You can after
from __future__ import print_function
comment:44 followup: ↓ 45 Changed 4 years ago by
I've just fixed the print
so it would work on a system python v3 as well as v2.
As for comment:39 : a solution using calls to uname
is being used in mpir/spkginstall. It would be a good starting point to write a testabi.sh
, but I do not feel confident enough with sage's build system to write myself.
I also find this solution rather complicated, compared to calling python's plateform.architecture()
If required, I'll rewrite the spkginstall
in python, but as I noticed many other spkginstall
s are in bash and make a one line call to python, I thought it would be ok like this.
comment:45 in reply to: ↑ 44 Changed 4 years ago by
Replying to cpernet:
I've just fixed the
That's quite lucky :) Adding
from __future__ import print_function
turns print
into a real function.
With your code it calls the magic print
on ("...")
which turns out to have the same behavior in this specific case.
As for comment:39 : a solution using calls to
uname
is being used in mpir/spkginstall. It would be a good starting point to write atestabi.sh
, but I do not feel confident enough with sage's build system to write myself. I also find this solution rather complicated, compared to calling python'splateform.architecture()
If required, I'll rewrite the
spkginstall
in python, but as I noticed many otherspkginstall
s are in bash and make a one line call to python, I thought it would be ok like this.
Nope, please don't do so.
comment:46 Changed 4 years ago by
 Commit changed from d5dd1d018e401f5922e9e2b0949359349488e2be to 2bff27f16b4d8590955a8716358f6d68e7c82e5a
Branch pushed to git repo; I updated commit sha1. New commits:
2bff27f  Cleaner Python 2/3 use of print.

comment:47 Changed 4 years ago by
 Status changed from needs_review to positive_review
comment:48 Changed 4 years ago by
 Milestone changed from sage7.1 to sage7.4
comment:49 Changed 4 years ago by
 Status changed from positive_review to needs_work
On arando (linux 32bit):
[openblas0.2.15] gfortran O2 Wall L/home/buildslavesage/slave/sage_git/build/local/lib Wl,rpath,/home/buildslavesage/slave/sage_git/build/local/lib o zblat3 zblat3.o ../libopenblas_sandybridgepr0.2.15.a lm lpthread lgfortran lm lpthread lgfortran [openblas0.2.15] ../libopenblas_sandybridgepr0.2.15.a: file not recognized: File truncated [openblas0.2.15] collect2: error: ld returned 1 exit status [openblas0.2.15] make[4]: *** [dblat3] Error 1 [openblas0.2.15] ../libopenblas_sandybridgepr0.2.15.a: file not recognized: File truncated [openblas0.2.15] collect2: error: ld returned 1 exit status [openblas0.2.15] make[4]: *** [cblat3] Error 1 [openblas0.2.15] ../libopenblas_sandybridgepr0.2.15.a: file not recognized: File truncated [openblas0.2.15] collect2: error: ld returned 1 exit status [openblas0.2.15] make[4]: *** [zblat3] Error 1 [openblas0.2.15] make[4]: Target `all' not remade because of errors. [openblas0.2.15] make[4]: Leaving directory `/home/buildslavesage/slave/sage_git/build/local/var/tmp/sage/build/openblas0.2.15/src/test' [openblas0.2.15] make[3]: *** [tests] Error 2 [openblas0.2.15] make[3]: Leaving directory `/home/buildslavesage/slave/sage_git/build/local/var/tmp/sage/build/openblas0.2.15/src' [openblas0.2.15] Error while running the OpenBLAS testsuite ... exiting [openblas0.2.15] [openblas0.2.15] real 0m0.438s [openblas0.2.15] user 0m0.172s [openblas0.2.15] sys 0m0.024s [openblas0.2.15] ************************************************************************ [openblas0.2.15] Error testing package openblas0.2.15
comment:50 Changed 4 years ago by
The first occurence of "libopenblas_sandybridgep" is
[openblas0.2.15] touch libopenblas_sandybridgepr0.2.15.a [openblas0.2.15] make j 4 C test all [openblas0.2.15] make[4]: Entering directory `/home/buildslavesage/slave/sage_git/build/local/var/tmp/sage/build/openblas0.2.15/src/test' [openblas0.2.15] gfortran O2 Wall L/home/buildslavesage/slave/sage_git/build/local/lib Wl,rpath,/home/buildslavesage/slave/sage_git/build/local/lib o sblat1 sblat1.o ../libopenblas_sandybridgepr0.2.15.a lm lpthread lgfortran lm lpthread lgfortran [openblas0.2.15] gfortran O2 Wall L/home/buildslavesage/slave/sage_git/build/local/lib Wl,rpath,/home/buildslavesage/slave/sage_git/build/local/lib o dblat1 dblat1.o ../libopenblas_sandybridgepr0.2.15.a lm lpthread lgfortran lm lpthread lgfortran [openblas0.2.15] gfortran O2 Wall L/home/buildslavesage/slave/sage_git/build/local/lib Wl,rpath,/home/buildslavesage/slave/sage_git/build/local/lib o cblat1 cblat1.o ../libopenblas_sandybridgepr0.2.15.a lm lpthread lgfortran lm lpthread lgfortran [openblas0.2.15] gfortran O2 Wall L/home/buildslavesage/slave/sage_git/build/local/lib Wl,rpath,/home/buildslavesage/slave/sage_git/build/local/lib o zblat1 zblat1.o ../libopenblas_sandybridgepr0.2.15.a lm lpthread lgfortran lm lpthread lgfortran [openblas0.2.15] ../libopenblas_sandybridgepr0.2.15.a: file not recognized: File truncated [openblas0.2.15] collect2: error: ld returned 1 exit status [openblas0.2.15] ../libopenblas_sandybridgepr0.2.15.a: file not recognized: File truncated [openblas0.2.15] make[4]: *** [sblat1] Error 1
You can't build a static archive with "touch" %)
comment:51 Changed 4 years ago by
Oops...
comment:52 Changed 4 years ago by
Isn't there a previous issue with the static archive in the log?
touch
is maybe just here to update the timestamp...
We could/should also disable building a static archive.
And update openblas to 0.2.19 and hope the problem disappear...
comment:53 Changed 4 years ago by
Indeed on my machine, running make tests
starts by touching the static archive.
No we have to devise why passing BINARY=32
somehow prevents the required archive to be produced.
@volker: can you tell us what so
and a
files were produced on arando?
comment:54 Changed 4 years ago by
No, there was no previous apperance. It could have been > /dev/null
but there is no valid .a ...
Ask Dima for an account on arando.
comment:55 Changed 4 years ago by
In general, disabling static libs is a +1 from me
comment:56 Changed 4 years ago by
Hum this was actually an easy one...
In spkgcheck
we run make tests
without BINARY=32
so the static archive name the tests want to link to is not the right one...
comment:57 Changed 4 years ago by
 Commit changed from 2bff27f16b4d8590955a8716358f6d68e7c82e5a to 2300a31a039afc27d111c9757efeec1a2bb643be
Branch pushed to git repo; I updated commit sha1. New commits:
2300a31  Pass the same envvars to make for building and testing openblas.

comment:58 Changed 4 years ago by
 Status changed from needs_work to positive_review
The modification is quite easy so I put this back to positive review.
@volker: if you think we'd better patch the Makefile
to use a symlink for linking to the static archive when building the tests, please put this back to needs_work
.
comment:59 Changed 4 years ago by
 Branch changed from public/openblas to 2300a31a039afc27d111c9757efeec1a2bb643be
 Resolution set to fixed
 Status changed from positive_review to closed
comment:60 in reply to: ↑ 37 Changed 4 years ago by
 Commit 2300a31a039afc27d111c9757efeec1a2bb643be deleted
Replying to jpflori:
Replying to cpernet:
Ok, I pushed an new python way of checking this.
Then we should make the
spkg
dependent onpython
. Note thatATLAS
install script also usedpython
so there is no regression here.
This is a bug. Sage meanwhile requires a system Python (>= 2.6) to build / bootstrap anyway.
And while ATLAS' spkginstall
in some files at least claims to aim at compatibility to Python 2.4(!), I have some patches to make it work with Python 2.6 (in order to drop the dependency on Sage's Python, such that it is built early, almost first)...
comment:61 Changed 4 years ago by
Followup w.r.t. OpenBLAS brokenness: #21561 (build issues; it also segfaults on AMD CPUs, cf. sagedevel thread from September 25th)
comment:62 Changed 4 years ago by
Apologies, wrong ticket!
Hardcoding one blas over another blas ist imho the wrong way. It should be configurable, see #17075.