Opened 17 months ago

Closed 12 months ago

Last modified 12 months ago

#23272 closed enhancement (fixed)

OpenBLAS - alterations for TARGET

Reported by: aram.dermenjian Owned by:
Priority: major Milestone: sage-8.0
Component: packages: standard Keywords: openblas
Cc: jpflori, vbraun, jdemeyer, moritz Merged in:
Authors: Aram Dermenjian Reviewers: Volker Braun, Dima Pasechnik
Report Upstream: N/A Work issues:
Branch: 895b48f (Commits) Commit:
Dependencies: Stopgaps:

Description (last modified by slelievre)

On purchasing a new laptop, I attempted to MAKE sage as I normally have and ran into the following error:

Error building Sage. 

The following package(s) may have failed to build (not necessarily during this run of 'make all'):

* package: openblas-0.2.19.p0 
  log file: /sage/sage/logs/pkgs/openblas-0.2.19.p0.log
  build directory: /sage/sage/local/var/tmp/sage/build/openblas-0.2.19.p0

The error seemed to be really coming from:

make[3]: *** [getarch_2nd] Error 1
Makefile:123: *** OpenBLAS: Detecting CPU failed. Please set TARGET explicitly, e.g. make TARGET=your_cpu_target. Please read README for the detail..  Stop.
make[3]: Leaving directory '/sage/sage/local/var/tmp/sage/build/openblas-0.2.19.p0/src'
Error building OpenBLAS

It turned out that this error has been seen multiple times in Sage:

etc.

I noticed that updating the config (as mentioned in the threads above) accurately solved the problem, but I wanted to try and fix the problem head on rather than just temporarily solve it. With that in mind, I conversed with the OpenBLAS people: https://github.com/xianyi/OpenBLAS/issues/1204 and they suggested I update the cpu file for the build, which I have included.

Additionally, I wanted to fix this problem for future users and thus I wanted to add a fail-safe for the code so that if this error happens again, then we automatically try the TARGET=ATOM method. But, I want to create this with the caveat that a user can change the method.

Before finalizing these changes, we should note a few things, but as I've not worked that long with sage I don't know how to address them. In essence:

  • Are the changes I made appropriate for the make?
  • I'm adding documentation on this as there is no documentation.
  • There was questions on why we have "p0" at the end of our version? I assume this is from the patch, but I don't know.
  • The programmer suggested we change the " top level option to use -lblas (debian/ubuntu) -lopenblas (suse/fedora/epel)". I don't know what that means in regards to us, but figured someone might be able to answer this question and maybe we can ensure this is happening.

If there are any questions feel free to ask, and if you believe this is not necessarily or if something else is better practice, let me know. I've never done a bash alteration so not sure what the details for that.

The tarball is here https://github.com/xianyi/OpenBLAS/archive/v0.2.20.tar.gz (using the browser to download it will give it correct name)

Change History (43)

comment:1 Changed 17 months ago by aram.dermenjian

  • Branch set to u/aram.dermenjian/openblas___alterations_for_target

comment:2 Changed 17 months ago by aram.dermenjian

  • Authors set to Aram Dermenjian
  • Commit set to 56a64c1d6eb081d714b0ab9630f33708a6ade34e
  • Type changed from defect to enhancement

New commits:

9ce1ab8OpenBLAS changes for CPUs
d5d2e42Update trac tags
56a64c1Merge branch 'openblas' into t/23272/openblas___alterations_for_target

comment:3 Changed 17 months ago by aram.dermenjian

  • Status changed from new to needs_review

I'm setting the status to needs review mainly to get people's attention. I assume there will be things that need to be changed.

comment:4 Changed 17 months ago by jpflori

  • Cc jpflori added

comment:5 Changed 17 months ago by nthiery

  • Cc vbraun jdemeyer added

We discussed a bit. In the long run, we would like OpenBlas? to provide a short mantra to configure itself with "use Atom by default if target unavailable".

For now, I wondered about separating somewhat more the configuration part from the compilation, by adding a new dependency in OpenBlas? makefile like:

    is_architecture_available:
         <here the logic from the beginning of the `libs` dependency

and then, in spkg-install / spkg-check do first a call to

    make is_architecture_available

do the additional logic if needed, and then just call make as usual.

Just throwing ideas in the air; not sure this is the right thing to do.

Insights anyone?

comment:6 Changed 17 months ago by jdemeyer

I don't really like that we are just guessing ATOM as a fall-back when the build failed. I would prefer to just let the failed build be failed (possibly with an error message suggesting OPENBLAS_CONFIGURE="TARGET=ATOM" or something).

Also, if you add a patch, you should bump the version number of openblas (from 0.2.19 to 0.2.19.p0).

comment:7 Changed 16 months ago by git

  • Commit changed from 56a64c1d6eb081d714b0ab9630f33708a6ade34e to 4be16f32aa2ad1b89dd6bc868b415f83c0f0e6b7

Branch pushed to git repo; I updated commit sha1. New commits:

4be16f3Check directly that Target is needed

comment:8 Changed 16 months ago by aram.dermenjian

  • Description modified (diff)

New commits:

4be16f3Check directly that Target is needed

comment:9 Changed 16 months ago by aram.dermenjian

So the reason we are just guessing "ATOM" is because that was the recommended fall-back by the openblas team. (You can see the conversation I had in the link I posted)

THe problem with just letting the build fail is that as new computers come out with new CPUs these CPUs will not be included in the openblas config which tests against CPU. In other words all future computers will start having problems when new CPUs get introduced. Since OpenBLAS doesn't update regularly, we can't rely on us to "remember" every time a new CPU gets released.

The point of this ticket is to upgrade the openblas script through a patch so that new CPUs that have been added in the last year will now properly be able to get a target. Additionally, it will protect us for future CPUs by defaulting to ATOM until we get an update to OpenBLAS.

Note that the version is already 0.2.19.p0 so I've let it as is.

comment:10 Changed 16 months ago by vbraun

Typo: OpenBAS -> OpenBLAS

I agree that it would be best if OpenBLAS would just default to something sensible if it can't detect the CPU. But, unless thats implemented, we should just try a second time with TARGET=ATOM. I'd prefer if we just unconditionally try that a second time and not rely on the spelling of the logged error message, or the location of the log file.

OpenBLAS 0.2.20 should be released really soon now...

comment:11 Changed 16 months ago by vbraun

PS: the patchlevel should be increased 0.2.19.p0 -> 0.2.19.p1 to trigger recompilation.

comment:12 Changed 16 months ago by fbissey

I just received a notice that openblas 0.2.20 has been released

Version 0.2.20

24-Jul-2017



common:

        * Improved CMake support

        * Fixed several thread race and locking bugs

        * Fixed default LAPACK optimization level

        * Updated LAPACK to 3.7.0

        * Added ReLAPACK (https://github.com/HPAC/ReLAPACK, make BUILD_RELAPACK=1)



POWER:

        * Optimizations for Power9

        * Fixed several Power8 assembly bugs

....
x86_64:

        * Detect Intel Bay Trail and Apollo Lake

        * Detect Intel Sky Lake and Kaby Lake

        * Detect Intel Knights Landing

        * Detect AMD A8, A10, A12 and Ryzen

        * Support 64bit builds with Visual Studio

        * Fix building with Intel and PGI compilers

        * Fix building with MINGW and TDM-GCC

        * Fix cmake builds for Haswell and related cpus

        * Fix building for Sandybridge with CLANG 3.9

* Add support for the FLANG compiler

....

While some of the improvements in this ticket should be kept, I think it would be best if it became an upgrade ticket.

comment:13 Changed 16 months ago by fbissey

Upgrading should also get rid of us of all the current patches except for openblas-0.2.19-SMP_conditional.patch (because the PR for it is still sitting on my HD - real life happened and it got forgotten).

comment:14 Changed 16 months ago by dimpase

  • Description modified (diff)

comment:15 Changed 16 months ago by embray

if grep -q "Detecting CPU failed. Please set TARGET explicitly" "${SAGE_ROOT}/logs/pkgs/openblas-$(<${SAGE_ROOT}/build/pkgs/openblas/package-version.txt).log"; then

This isn't going to work for a number reasons; for one the log files are usually appended to so this will be true for builds that failed but then later succeeded if the log files were not deleted first.

Better to explicitly tee the configure command to a separate log file and grep that.

comment:16 Changed 15 months ago by git

  • Commit changed from 4be16f32aa2ad1b89dd6bc868b415f83c0f0e6b7 to 842607822d1501c2d6d0dbca9d3aa2cf750ec25d

Branch pushed to git repo; I updated commit sha1. New commits:

90e56c8Update OpenBLAS package and update install error handling
39094e3Merge branch 'develop' into t/23272/openblas___alterations_for_target
5d758c5Remove unused envar
8426078Typo fix

comment:17 Changed 15 months ago by aram.dermenjian

So in this update I've gone ahead and updated the version and removed the unnecessary patches (keeping the one mentioned by fbissey). I've not worked with packages before so I don't know how I'm supposed to actually get the package itself onto the servers? I personally added it directly into my personal upstream, but not sure if that is a viable solution?

I've also removed the documentation test and instead went with a TARGET test like I had done before. I've gone to my old method of just retrying with "TARGET=ATOM" as was also suggested by vbraun.

comment:18 Changed 15 months ago by dimpase

Where is the updated tarball (the one with

+sha1=a186074145a24823e82c65672dad1cd1ca6fe89c
+md5=48637eb29f5b492b91459175dcc574b1
+cksum=607023019

How does one get it? Please provide a link.

comment:19 Changed 15 months ago by dimpase

Once the ticket is positively reviewed and merged, a copy of the tarball will be put on sagemath mirrors by the release manager.

comment:20 Changed 15 months ago by dimpase

  • Description modified (diff)

OK, I gather it's the official openblas release tarball (md5sum is right). I've modified the ticket description accordingly.

comment:22 Changed 15 months ago by fbissey

Release seems to be tardy. Possibly because they were moving to a 0.3.0 release. We could adopt https://github.com/xianyi/OpenBLAS/commit/63cfa32691680505e6b9daf0997755178ddd3144 instead.

comment:23 Changed 15 months ago by aram.dermenjian

@dimpase Yes, that was the correct one.

@isuruf @fbissey Thanks for the update. I think we might want to consider their release 0.2.21 branch (https://github.com/xianyi/OpenBLAS/tree/release-0.2.21) as it seems to include exactly the patches that are needed to fix the issue that is discussed in Issue 1258 of OpenBLAS (https://github.com/xianyi/OpenBLAS/issues/1258#issuecomment-320223121). I'm going to respond on the issue thread they have there. We'll see if they have any updates. I don't think they are waiting for 0.3.0 release as that issue was created after they had started talking about 0.3.0. So I don't believe that to be the case.

comment:24 Changed 15 months ago by git

  • Commit changed from 842607822d1501c2d6d0dbca9d3aa2cf750ec25d to 341cec9afa8cbb8204a545b1d556ad1f45fef689

Branch pushed to git repo; I updated commit sha1. New commits:

341cec9Patch up error created in v0.2.20

comment:25 Changed 15 months ago by aram.dermenjian

Ok, so according to the authors, v0.2.21 is likely not going to be coming out soon and that the only change is one patch which was added to master. Instead of rezipping everything, I've decided to just add this patch into our patches directory and upload that directly.

Everything is working on my end and this ticket is now ready for review.

comment:26 Changed 15 months ago by git

  • Commit changed from 341cec9afa8cbb8204a545b1d556ad1f45fef689 to 3831e40be64aadb9773fb0e35a569c54c0268ec6

Branch pushed to git repo; I updated commit sha1. New commits:

3831e40Merge branch 'develop' into t/23272/openblas___alterations_for_target

comment:27 Changed 14 months ago by vbraun

I think this needs updating for openblas-0.2.19-cygwin.patch

comment:28 Changed 14 months ago by aram.dermenjian

I had removed all of the patches except for one. If the cygwin patch is required I can re-add it in.

comment:29 Changed 14 months ago by vbraun

It was recently added; please merge in the latest beta

comment:30 Changed 14 months ago by aram.dermenjian

@vbraun I don't know what you mean by the latest beta. Can you be a little more specific? (Do you have a link? I can try mergeing it in and making sure it works. Alternatively, if you've already access to the beta on your machine you can try integrating it and then I can pull and test it on mine as well?)

comment:31 Changed 14 months ago by vbraun

I just mean the latest develop branch, e.g. on trac: git remote update trac && git merge trac/develop

comment:32 Changed 14 months ago by git

  • Commit changed from 3831e40be64aadb9773fb0e35a569c54c0268ec6 to 229870046dfac171c0df30131ff18bb3440b4f71

Branch pushed to git repo; I updated commit sha1. New commits:

4c76040Merge branch 'develop' into t/23272/openblas___alterations_for_target
5d405e4update cygwin patch
2298700Merge branch 'develop' into t/23272/openblas___alterations_for_target

comment:33 Changed 14 months ago by aram.dermenjian

I kept having problems merging in the cygwin patch (kept failing on my build for some reason) I fixed it and have merged again with the latest develop. It should be ready for testing now.

comment:34 Changed 13 months ago by moritz

  • Cc moritz added

I don't have much time to look into the ticket, but just wanted to confirm, that it worked for me.

sage 8.1.beta7 didn't compile on a debian box with a kaby lake i7-7500U CPU, because of the 'missing target' openblas error.

After loading this patch it worked like a charm (All tests passed). I guess it is useful to have a fix even after the problem for my specific cpu will be fixed upstream; there always will be newer cpus...

comment:35 Changed 13 months ago by moritz

In sage 8.1.beta8, I simply compiled with export OPENBLAS_CONFIGURE="TARGET=HASWELL" (with a kabylake CPU..) seems to work.

comment:36 Changed 13 months ago by vbraun

  • Reviewers set to Volker Braun
  • Status changed from needs_review to positive_review

comment:37 Changed 12 months ago by vbraun

On two 32-bit buildbots:

**********************************************************************
File "src/sage/numerical/sdp.pyx", line 100, in sage.numerical.sdp
Failed example:
    p.dual_variable(1)  # rel tol 1e-03
Expected:
    [ 85555.0 -85555.0]
    [-85555.0  85555.0]
Got:
    [ 85463.87196917446 -85463.87196914306]
    [-85463.87196914306  85463.87196917432]
Tolerance exceeded in 4 of 4:
    85555.0 vs 85463.87196917446, tolerance 1e-03 > 1e-03
    -85555.0 vs -85463.87196914306, tolerance 1e-03 > 1e-03
    -85555.0 vs -85463.87196914306, tolerance 1e-03 > 1e-03
    85555.0 vs 85463.87196917432, tolerance 1e-03 > 1e-03
**********************************************************************
1 item had failures:
   1 of  41 in sage.numerical.sdp
    [283 tests, 1 failure, 0.75 s]
----------------------------------------------------------------------
sage -t --long src/sage/numerical/sdp.pyx  # 1 doctest failed
----------------------------------------------------------------------

comment:38 Changed 12 months ago by vbraun

  • Status changed from positive_review to needs_work

comment:39 Changed 12 months ago by dimpase

OK, I'll increase the tolerance to 2e-03 there.

comment:40 Changed 12 months ago by dimpase

  • Branch changed from u/aram.dermenjian/openblas___alterations_for_target to public/openblas___alterations_for_target
  • Commit changed from 229870046dfac171c0df30131ff18bb3440b4f71 to 895b48f4c3ed7ebee36f6ffe4ee8d804e7e47d64
  • Reviewers changed from Volker Braun to Volker Braun, Dima Pasechnik
  • Status changed from needs_work to positive_review

New commits:

0af8f5bMerge branch 'u/aram.dermenjian/openblas___alterations_for_target' of trac.sagemath.org:sage into oblas
895b48fincrease tol

comment:41 Changed 12 months ago by andy

This fixed the build under error I was having on Ubuntu 17.10. The /proc/cpuinfo identifies the CPU as:

Intel(R) Core(TM) i7-7500U CPU @ 2.70GHz

comment:42 Changed 12 months ago by vbraun

  • Branch changed from public/openblas___alterations_for_target to 895b48f4c3ed7ebee36f6ffe4ee8d804e7e47d64
  • Resolution set to fixed
  • Status changed from positive_review to closed

comment:43 Changed 12 months ago by slelievre

  • Commit 895b48f4c3ed7ebee36f6ffe4ee8d804e7e47d64 deleted
  • Description modified (diff)

(Minor fix to ticket description formatting.)

Note: See TracTickets for help on using tickets.