Opened 4 years ago

Closed 3 years ago

Last modified 3 years ago

#20096 closed enhancement (fixed)

Make OpenBLAS standard instead of ATLAS

Reported by: cpernet Owned by:
Priority: major Milestone: sage-7.4
Component: packages: standard Keywords: BLAS, sd75
Cc: jpflori Merged in:
Authors: Clément Pernet Reviewers: Jean-Pierre Flori
Report Upstream: N/A Work issues:
Branch: 2300a31 (Commits) Commit:
Dependencies: Stopgaps:

Description (last modified by vbraun)

OpenBLAS should replace ATLAS as the by default installed BLAS since it is

  • (much) faster to install
  • faster to run (on some platforms)

This replacement is mentionned in #17635 (upgrade fflas-ffpack) and is related to #19719 (upgrade ATLAS), and #17630 (clean up the BLAS).

Change History (62)

comment:1 Changed 4 years ago by jdemeyer

  • Dependencies set to #17075

comment:2 Changed 4 years ago by vbraun

Hardcoding one blas over another blas ist imho the wrong way. It should be configurable, see #17075.

comment:3 Changed 4 years ago by cpernet

Sure. I should have been clearer: the idea is to

  • change the default BLAS installed by SageMath from ATLAS to OpenBLAS
  • let the user the option to specify which BLAS to link against.
  • simplify the current BLAS, BLAS2 variable mess

comment:4 Changed 4 years ago by cpernet

  • Description modified (diff)

comment:5 Changed 4 years ago by cpernet

  • Description modified (diff)

comment:6 Changed 4 years ago by vbraun

  • Description modified (diff)

comment:7 Changed 4 years ago by cpernet

  • Component changed from linear algebra to packages: standard

comment:8 Changed 4 years ago by vbraun

  • Dependencies changed from #17075 to #17075 #20129

comment:9 Changed 4 years ago by jdemeyer

  • Dependencies changed from #17075 #20129 to #17075, #20129, #20130
  • Summary changed from Replace ATLAS by OpenBLAS to Choose between ATLAS and OpenBLAS

comment:10 Changed 4 years ago by jdemeyer

  • Dependencies changed from #17075, #20129, #20130 to #20133, #20140
  • Milestone changed from sage-wishlist to sage-7.1

comment:11 Changed 4 years ago by jpflori

Should we close this after #20542? Or switch to openblas by default here?

comment:12 Changed 4 years ago by jdemeyer

  • Dependencies #20133, #20140 deleted
  • Summary changed from Choose between ATLAS and OpenBLAS to Make OpenBLAS standard instead of ATLAS

comment:13 Changed 3 years ago by jpflori

Anyone else thinks we should go ahead with this one?

comment:14 Changed 3 years ago by vbraun

+1... send a patch ;-)

comment:15 Changed 3 years ago by cpernet

  • Branch set to public/openblas
  • Commit set to a3aceaa70cbf6c8f9578d632b1f91c8cf2b2aa02

New commits:

a3aceaaswitch default BLAS to OpenBLAS

comment:16 Changed 3 years ago by git

  • Commit changed from a3aceaa70cbf6c8f9578d632b1f91c8cf2b2aa02 to 2e73362a63297abd0d4ff77a38479245260acf0e

Branch pushed to git repo; I updated commit sha1. New commits:

2e73362Make OpenBLAS default

comment:17 Changed 3 years ago by git

  • Commit changed from 2e73362a63297abd0d4ff77a38479245260acf0e to 17c092e0fdb05bef1904211899a7fbb01345c03e

Branch pushed to git repo; I updated commit sha1. New commits:

17c092eModify configure help strings about blas.

comment:18 Changed 3 years ago by jpflori

  • Authors set to Clément Pernet
  • Keywords sd75 added
  • Reviewers set to Jean-Pierre Flori
  • Status changed from new to needs_review

comment:19 Changed 3 years ago by jpflori

  • Status changed from needs_review to positive_review

comment:20 Changed 3 years ago by vbraun

  • Status changed from positive_review to needs_work
sage -t --long src/sage/misc/package.py
**********************************************************************
File "src/sage/misc/package.py", line 179, in sage.misc.package.list_packages
Failed example:
    sorted(L.keys())
Expected:
    ['alabaster',
     'arb',
     'atlas',
     ...
     'zn_poly',
     'zope_interface']
Got:
    ['alabaster',
     'arb',
     'babel',
     'backports_abc',

comment:21 Changed 3 years ago by git

  • Commit changed from 17c092e0fdb05bef1904211899a7fbb01345c03e to c52483276acbcc6cdbac37afe7febad10222e2a2

Branch pushed to git repo; I updated commit sha1. New commits:

d57884cupdate standard package list
c524832Merge branch 'public/openblas' of git://trac.sagemath.org/sage into openblas

comment:22 Changed 3 years ago by cpernet

  • Status changed from needs_work to positive_review

comment:23 Changed 3 years ago by vbraun

  • Status changed from positive_review to needs_work

Openblas fails to build on some 32-bit Linux platforms, e.g. http://build.sagedev.org/release/builders/%20%20slow%20AIMS%20bu16_32s02%20%28Ubuntu%2016.04%2032%20bit%29%20incremental/builds/258/steps/compile_1/logs/stdio

[openblas-0.2.15] gfortran -O2 -Wall  -L/home/buildbot/slave/sage_git/build/local/lib -Wl,-rpath,/home/buildbot/slave/sage_git/build/local/lib  -o sblat1 sblat1.o ../libopenblas_haswellp-r0.2.15.a -lm -lpthread -lgfortran -lm -lpthread -lgfortran 
[openblas-0.2.15] ../libopenblas_haswellp-r0.2.15.a(saxpy.o): In function `saxpy_':
[openblas-0.2.15] axpy.c:(.text+0x9a): undefined reference to `saxpy_k'
[openblas-0.2.15] axpy.c:(.text+0x12a): undefined reference to `saxpy_k'
[openblas-0.2.15] ../libopenblas_haswellp-r0.2.15.a(sswap.o): In function `sswap_':
[openblas-0.2.15] swap.c:(.text+0x84): undefined reference to `sswap_k'
[openblas-0.2.15] swap.c:(.text+0xfb): undefined reference to `sswap_k'
[openblas-0.2.15] ../libopenblas_haswellp-r0.2.15.a(scopy.o): In function `scopy_':
[openblas-0.2.15] copy.c:(.text+0x41): undefined reference to `scopy_k'
[openblas-0.2.15] ../libopenblas_haswellp-r0.2.15.a(sscal.o): In function `sscal_':
[openblas-0.2.15] scal.c:(.text+0x67): undefined reference to `sscal_k'
[openblas-0.2.15] scal.c:(.text+0xc1): undefined reference to `sscal_k'
[openblas-0.2.15] ../libopenblas_haswellp-r0.2.15.a(sdot.o): In function `sdot_':
[openblas-0.2.15] dot.c:(.text+0x41): undefined reference to `sdot_k'
[openblas-0.2.15] ../libopenblas_haswellp-r0.2.15.a(sasum.o): In function `sasum_':
[openblas-0.2.15] asum.c:(.text+0x28): undefined reference to `sasum_k'
[openblas-0.2.15] ../libopenblas_haswellp-r0.2.15.a(snrm2.o): In function `snrm2_':
[openblas-0.2.15] nrm2.c:(.text+0x28): undefined reference to `snrm2_k'
[openblas-0.2.15] ../libopenblas_haswellp-r0.2.15.a(isamax.o): In function `isamax_':
[openblas-0.2.15] imax.c:(.text+0x2b): undefined reference to `isamax_k'
[openblas-0.2.15] ../libopenblas_haswellp-r0.2.15.a(srot.o): In function `srot_':
[openblas-0.2.15] rot.c:(.text+0x4f): undefined reference to `srot_k'
[openblas-0.2.15] ../libopenblas_haswellp-r0.2.15.a(memory.o): In function `blas_memory_alloc':
[openblas-0.2.15] memory.c:(.text+0x456): undefined reference to `cpuid'
[openblas-0.2.15] ../libopenblas_haswellp-r0.2.15.a(parameter.o): In function `get_L2_size':
[openblas-0.2.15] parameter.c:(.text+0x35): undefined reference to `cpuid'
[openblas-0.2.15] collect2: error: ld returned 1 exit status

comment:24 Changed 3 years ago by vbraun

Also, numerical noise:

sage -t --long src/sage/matrix/matrix_double_dense.pyx
**********************************************************************
File "src/sage/matrix/matrix_double_dense.pyx", line 1009, in sage.matrix.matrix_double_dense.Matrix_double_dense.singular_values
Failed example:
    A.condition() > 1.6e16 or A.condition()
Expected:
    True
Got:
    1.5972138093369194e+16
**********************************************************************
1 item had failures:
   1 of  31 in sage.matrix.matrix_double_dense.Matrix_double_dense.singular_values
    [636 tests, 1 failure, 1.91 s]

comment:25 Changed 3 years ago by vbraun

More numerical noise:

sage -t --long src/sage/modular/modform/numerical.py
**********************************************************************
File "src/sage/modular/modform/numerical.py", line 429, in sage.modular.modform.numerical.NumericalEigenforms.eigenvalues
Failed example:
    n.eigenvalues([3,5,13])  # rel tol 2e-10
Expected:
    [[177148.0, 252.00000000001896], [48828126.0, 4830.000000001376], [1792160394038.0, -577737.9999898539]]
Got:
    [[177148.0, 252.00000000001413],
     [48828126.0, 4830.000000003907],
     [1792160394038.0, -577737.9998566243]]
Tolerance exceeded in 1 of 6:
    -577737.9999898539 vs -577737.9998566243, tolerance 2e-10 > 2e-10
**********************************************************************
File "src/sage/modular/modform/numerical.py", line 446, in sage.modular.modform.numerical.NumericalEigenforms.eigenvalues.phi
Failed example:
    n.eigenvalues([3,5,13])  # rel tol 2e-10
Expected:
    [[177148.0, 252.00000000001896], [48828126.0, 4830.000000001376], [1792160394038.0, -577737.9999898539]]
Got:
    [[177148.0, 252.00000000001413],
     [48828126.0, 4830.000000003907],
     [1792160394038.0, -577737.9998566243]]
Tolerance exceeded in 1 of 6:
    -577737.9999898539 vs -577737.9998566243, tolerance 2e-10 > 2e-10
**********************************************************************
2 items had failures:
   1 of   3 in sage.modular.modform.numerical.NumericalEigenforms.eigenvalues
   1 of   3 in sage.modular.modform.numerical.NumericalEigenforms.eigenvalues.phi
    [46 tests, 2 failures, 0.56 s]

comment:26 Changed 3 years ago by cpernet

Very surprising. Could you give the details on the host architecture where the numerical noise issues appeared (uname -a and cat /proc/cpuinfo)? I could not reproduce them on my x86_64 sandibridge running linux-mint 14.

Similarly for the 32 bit linking errors (it never saw it fail to compile on 32 bits ubuntu's). I could then try to find a similar arch and investigate it and/or report the problem upstream.

comment:27 Changed 3 years ago by vbraun

The buildbot machine with the link failure is Ubuntu 16.04 32-bit (in a VM).

As far as the numerical noise goes, the tolerance just needs to be incremented a a tiny bit.

comment:28 Changed 3 years ago by cpernet

The linking issue looks like https://github.com/xianyi/OpenBLAS/issues/657. OpenBLAS devs say they did not tune the 32bit haswell case, and recommend passing BINARY=32 to make which will default to nehalem kernels.

comment:29 Changed 3 years ago by git

  • Commit changed from c52483276acbcc6cdbac37afe7febad10222e2a2 to b37ead3b0cf261d6e7c473497ea3d5009e910725

Branch pushed to git repo; I updated commit sha1. New commits:

b37ead3fixing numerical noise issues of #20096 in some arch when OpenBLAS replaces ATLAS.

comment:30 Changed 3 years ago by git

  • Commit changed from b37ead3b0cf261d6e7c473497ea3d5009e910725 to 2a13e137a8688fe5a14b10c147cd500733b5afba

Branch pushed to git repo; I updated commit sha1. New commits:

2a13e13Tells OpenBLAS to use default kernels on 32bits arch (fixing #20096)

comment:31 Changed 3 years ago by cpernet

  • Status changed from needs_work to needs_review

The last two commits fix the two issues raised by Volker:

  • linking error on 32 bits -> passing BINARY=32 to make
  • numerical noise -> fixing the tolerance

comment:32 follow-up: Changed 3 years ago by cpernet

  • Status changed from needs_review to needs_work

Damn, this patch doesn't work as SAGE64 is not necessarily set to yes when compiling on a 64 bits machine. After reading about the uselessness of SAGE64 on sage-devel, and browsing through MPIR's and M4RI's spkg's I don't find any environment variable that would tell us whether the target architecture is 32 or 64 bits.

An option could be to copy/paste MPIR's long case switch setting ABI=32/64. Does anybody have a better solution?

comment:33 Changed 3 years ago by git

  • Commit changed from 2a13e137a8688fe5a14b10c147cd500733b5afba to f8fedca2afd96832f1606f5d4fcdbb58623e7fb5

Branch pushed to git repo; I updated commit sha1. New commits:

f8fedcaproperly checking whether we are compiling on a 64 or 32 bit architecture.

comment:34 follow-up: Changed 3 years ago by cpernet

  • Status changed from needs_work to needs_review

Ok, I pushed an new python way of checking this.

comment:35 Changed 3 years ago by vbraun

IMHO at this point its easier to just write the spkg-install in Python

comment:36 in reply to: ↑ 32 Changed 3 years ago by jpflori

Replying to cpernet:

Damn, this patch doesn't work as SAGE64 is not necessarily set to yes when compiling on a 64 bits machine. After reading about the uselessness of SAGE64 on sage-devel, and browsing through

Yes indeed. SAGE64 was used on strange archs where kernel is 64 bits but userspace is 32 bits by default, e.g. old OS X or Solaris.

comment:37 in reply to: ↑ 34 ; follow-up: Changed 3 years ago by jpflori

Replying to cpernet:

Ok, I pushed an new python way of checking this.

Then we should make the spkg dependent on python. Note that ATLAS install script also used python so there is no regression here.

comment:38 Changed 3 years ago by jpflori

(I mean the spkg-intall script for ATLAS).

comment:39 Changed 3 years ago by jpflori

Maybe using something with uname -m could work without depending on python? And maybe it's time for a testabi.sh script in sage/src/bin/?

@volker: any preferred solution?

comment:40 Changed 3 years ago by vbraun

We require a system python so it doesn't have to depend on our Python. IMHO the atlas dependency is incorrect. It needs to be python2/3 compatible, though.

comment:41 Changed 3 years ago by jpflori

Not sure it is what Volker meant, but you cannot use print the same way in Python 2 and 3...

comment:42 Changed 3 years ago by git

  • Commit changed from f8fedca2afd96832f1606f5d4fcdbb58623e7fb5 to d5dd1d018e401f5922e9e2b0949359349488e2be

Branch pushed to git repo; I updated commit sha1. New commits:

d5dd1d0support system python3

comment:43 Changed 3 years ago by vbraun

You can after

from __future__ import print_function

comment:44 follow-up: Changed 3 years ago by cpernet

I've just fixed the print so it would work on a system python v3 as well as v2.

As for comment:39 : a solution using calls to uname is being used in mpir/spkg-install. It would be a good starting point to write a testabi.sh, but I do not feel confident enough with sage's build system to write myself. I also find this solution rather complicated, compared to calling python's plateform.architecture()

If required, I'll rewrite the spkg-install in python, but as I noticed many other spkg-installs are in bash and make a one line call to python, I thought it would be ok like this.

comment:45 in reply to: ↑ 44 Changed 3 years ago by jpflori

Replying to cpernet:

I've just fixed the print so it would work on a system python v3 as well as v2.

That's quite lucky :) Adding

from __future__ import print_function

turns print into a real function. With your code it calls the magic print on ("...") which turns out to have the same behavior in this specific case.

As for comment:39 : a solution using calls to uname is being used in mpir/spkg-install. It would be a good starting point to write a testabi.sh, but I do not feel confident enough with sage's build system to write myself. I also find this solution rather complicated, compared to calling python's plateform.architecture()

If required, I'll rewrite the spkg-install in python, but as I noticed many other spkg-installs are in bash and make a one line call to python, I thought it would be ok like this.

Nope, please don't do so.

comment:46 Changed 3 years ago by git

  • Commit changed from d5dd1d018e401f5922e9e2b0949359349488e2be to 2bff27f16b4d8590955a8716358f6d68e7c82e5a

Branch pushed to git repo; I updated commit sha1. New commits:

2bff27fCleaner Python 2/3 use of print.

comment:47 Changed 3 years ago by jpflori

  • Status changed from needs_review to positive_review

comment:48 Changed 3 years ago by jpflori

  • Milestone changed from sage-7.1 to sage-7.4

comment:49 Changed 3 years ago by vbraun

  • Status changed from positive_review to needs_work

On arando (linux 32-bit):

[openblas-0.2.15] gfortran -O2 -Wall  -L/home/buildslave-sage/slave/sage_git/build/local/lib -Wl,-rpath,/home/buildslave-sage/slave/sage_git/build/local/lib  -o zblat3 zblat3.o ../libopenblas_sandybridgep-r0.2.15.a -lm -lpthread -lgfortran -lm -lpthread -lgfortran 
[openblas-0.2.15] ../libopenblas_sandybridgep-r0.2.15.a: file not recognized: File truncated
[openblas-0.2.15] collect2: error: ld returned 1 exit status
[openblas-0.2.15] make[4]: *** [dblat3] Error 1
[openblas-0.2.15] ../libopenblas_sandybridgep-r0.2.15.a: file not recognized: File truncated
[openblas-0.2.15] collect2: error: ld returned 1 exit status
[openblas-0.2.15] make[4]: *** [cblat3] Error 1
[openblas-0.2.15] ../libopenblas_sandybridgep-r0.2.15.a: file not recognized: File truncated
[openblas-0.2.15] collect2: error: ld returned 1 exit status
[openblas-0.2.15] make[4]: *** [zblat3] Error 1
[openblas-0.2.15] make[4]: Target `all' not remade because of errors.
[openblas-0.2.15] make[4]: Leaving directory `/home/buildslave-sage/slave/sage_git/build/local/var/tmp/sage/build/openblas-0.2.15/src/test'
[openblas-0.2.15] make[3]: *** [tests] Error 2
[openblas-0.2.15] make[3]: Leaving directory `/home/buildslave-sage/slave/sage_git/build/local/var/tmp/sage/build/openblas-0.2.15/src'
[openblas-0.2.15] Error while running the OpenBLAS testsuite ... exiting
[openblas-0.2.15] 
[openblas-0.2.15] real	0m0.438s
[openblas-0.2.15] user	0m0.172s
[openblas-0.2.15] sys	0m0.024s
[openblas-0.2.15] ************************************************************************
[openblas-0.2.15] Error testing package openblas-0.2.15

comment:50 Changed 3 years ago by vbraun

The first occurence of "libopenblas_sandybridgep" is

[openblas-0.2.15] touch libopenblas_sandybridgep-r0.2.15.a
[openblas-0.2.15] make -j 4 -C test all
[openblas-0.2.15] make[4]: Entering directory `/home/buildslave-sage/slave/sage_git/build/local/var/tmp/sage/build/openblas-0.2.15/src/test'
[openblas-0.2.15] gfortran -O2 -Wall  -L/home/buildslave-sage/slave/sage_git/build/local/lib -Wl,-rpath,/home/buildslave-sage/slave/sage_git/build/local/lib  -o sblat1 sblat1.o ../libopenblas_sandybridgep-r0.2.15.a -lm -lpthread -lgfortran -lm -lpthread -lgfortran 
[openblas-0.2.15] gfortran -O2 -Wall  -L/home/buildslave-sage/slave/sage_git/build/local/lib -Wl,-rpath,/home/buildslave-sage/slave/sage_git/build/local/lib  -o dblat1 dblat1.o ../libopenblas_sandybridgep-r0.2.15.a -lm -lpthread -lgfortran -lm -lpthread -lgfortran 
[openblas-0.2.15] gfortran -O2 -Wall  -L/home/buildslave-sage/slave/sage_git/build/local/lib -Wl,-rpath,/home/buildslave-sage/slave/sage_git/build/local/lib  -o cblat1 cblat1.o ../libopenblas_sandybridgep-r0.2.15.a -lm -lpthread -lgfortran -lm -lpthread -lgfortran 
[openblas-0.2.15] gfortran -O2 -Wall  -L/home/buildslave-sage/slave/sage_git/build/local/lib -Wl,-rpath,/home/buildslave-sage/slave/sage_git/build/local/lib  -o zblat1 zblat1.o ../libopenblas_sandybridgep-r0.2.15.a -lm -lpthread -lgfortran -lm -lpthread -lgfortran 
[openblas-0.2.15] ../libopenblas_sandybridgep-r0.2.15.a: file not recognized: File truncated
[openblas-0.2.15] collect2: error: ld returned 1 exit status
[openblas-0.2.15] ../libopenblas_sandybridgep-r0.2.15.a: file not recognized: File truncated
[openblas-0.2.15] make[4]: *** [sblat1] Error 1

You can't build a static archive with "touch" %-)

comment:51 Changed 3 years ago by jpflori

Oops...

comment:52 Changed 3 years ago by jpflori

Isn't there a previous issue with the static archive in the log? touch is maybe just here to update the timestamp...

We could/should also disable building a static archive.

And update openblas to 0.2.19 and hope the problem disappear...

comment:53 Changed 3 years ago by jpflori

Indeed on my machine, running make tests starts by touching the static archive. No we have to devise why passing BINARY=32 somehow prevents the required archive to be produced. @volker: can you tell us what so and a files were produced on arando?

comment:54 Changed 3 years ago by vbraun

No, there was no previous apperance. It could have been > /dev/null but there is no valid .a ...

Ask Dima for an account on arando.

comment:55 Changed 3 years ago by vbraun

In general, disabling static libs is a +1 from me

comment:56 Changed 3 years ago by jpflori

Hum this was actually an easy one... In spkg-check we run make tests without BINARY=32 so the static archive name the tests want to link to is not the right one...

comment:57 Changed 3 years ago by git

  • Commit changed from 2bff27f16b4d8590955a8716358f6d68e7c82e5a to 2300a31a039afc27d111c9757efeec1a2bb643be

Branch pushed to git repo; I updated commit sha1. New commits:

2300a31Pass the same envvars to make for building and testing openblas.

comment:58 Changed 3 years ago by jpflori

  • Status changed from needs_work to positive_review

The modification is quite easy so I put this back to positive review.

@volker: if you think we'd better patch the Makefile to use a symlink for linking to the static archive when building the tests, please put this back to needs_work.

comment:59 Changed 3 years ago by vbraun

  • Branch changed from public/openblas to 2300a31a039afc27d111c9757efeec1a2bb643be
  • Resolution set to fixed
  • Status changed from positive_review to closed

comment:60 in reply to: ↑ 37 Changed 3 years ago by leif

  • Commit 2300a31a039afc27d111c9757efeec1a2bb643be deleted

Replying to jpflori:

Replying to cpernet:

Ok, I pushed an new python way of checking this.

Then we should make the spkg dependent on python. Note that ATLAS install script also used python so there is no regression here.

This is a bug. Sage meanwhile requires a system Python (>= 2.6) to build / bootstrap anyway.

And while ATLAS' spkg-install in some files at least claims to aim at compatibility to Python 2.4(!), I have some patches to make it work with Python 2.6 (in order to drop the dependency on Sage's Python, such that it is built early, almost first)...

comment:61 Changed 3 years ago by leif

Follow-up w.r.t. OpenBLAS brokenness: #21561 (build issues; it also segfaults on AMD CPUs, cf. sage-devel thread from September 25th)

comment:62 Changed 3 years ago by vbraun

Apologies, wrong ticket!

Last edited 3 years ago by vbraun (previous) (diff)
Note: See TracTickets for help on using tickets.