Opened 12 years ago

Last modified 9 years ago

#10508 closed enhancement

Update ATLAS to stable version 3.10 — at Version 127

Reported by: vbraun
Owned by: tbd
Priority: major
Milestone: sage-5.10
Component: packages: standard
Keywords: ATLAS spkg
Cc: dimpase, fbissey, leif, kcrisman
Merged in:
Authors: Volker Braun
Reviewers: Benjamin Jones
Report Upstream: Reported upstream. No feedback yet.
Work issues:
Branch:
Commit:
Dependencies: #13160
Stopgaps:

Description (last modified by vbraun)

The new atlas release now builds netlib lapack itself, so the lapack tarball is now included in the ATLAS spkg.

  • Updated to newest upstream source, various patches are no longer required
  • SAGE_ATLAS_LIB=path now looks for path/libatlas.so instead of path/lib/libatlas.so, so it also works for people with ATLAS in /lib64.
  • Threading is now enabled by default
  • Flush before os.system (#13210)
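The new SAGE_ATLAS_LIB lookup rule can be sketched in Python, the language spkg-install is written in. This is a minimal illustration, not the spkg's code: the function name find_atlas_lib and the injectable exists parameter are hypothetical, added so the rule can be checked without a real ATLAS install.

```python
import os
from typing import Callable, Mapping, Optional


def find_atlas_lib(env: Optional[Mapping[str, str]] = None,
                   exists: Callable[[str], bool] = os.path.isfile) -> Optional[str]:
    """Hypothetical helper: locate libatlas.so via SAGE_ATLAS_LIB.

    The directory named by SAGE_ATLAS_LIB is searched directly
    (path/libatlas.so), not path/lib/libatlas.so, so a value such as
    /usr/lib64 works as-is.
    """
    if env is None:
        env = os.environ
    root = env.get("SAGE_ATLAS_LIB")
    if not root:
        return None  # variable unset or empty: caller falls back to building ATLAS
    candidate = os.path.join(root, "libatlas.so")
    return candidate if exists(candidate) else None
```

With SAGE_ATLAS_LIB=/usr/lib64 this returns /usr/lib64/libatlas.so if that file exists; a library that only exists under path/lib/ is no longer found.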

Upstream has started changing the layout of the shared libraries, which now differs from that of the static libraries. The atlas spkg contains a stub autoconf/libtool project that unpacks the static libraries and repacks them into equivalent shared libraries.

By default, ATLAS will now try twice to get timings and fail immediately if CPU throttling is enabled. If auto-tuning fails, the spkg builds with SAGE_ATLAS_ARCH=fast, and if that fails, with SAGE_ATLAS_ARCH=base. On x86, the fast and base targets are the new ATLAS generic targets x86SSE3 and x86SSE2/x86x87, respectively.
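The fallback order (auto-tune, then fast, then base) is a simple "first success wins" loop. A minimal sketch, assuming each strategy is wrapped in a callable that runs configure and make and returns an exit code; build_with_fallback and the lambdas are hypothetical stand-ins, not functions from spkg-install.

```python
from typing import Callable, Sequence, Tuple

# Each entry pairs a label with a callable that runs configure + make
# for one strategy and returns its exit code (0 on success).
Strategy = Tuple[str, Callable[[], int]]


def build_with_fallback(strategies: Sequence[Strategy]) -> str:
    """Try each build strategy in order; return the label of the first that succeeds."""
    failures = []
    for label, step in strategies:
        rc = step()
        if rc == 0:
            return label
        failures.append("{} (rc={})".format(label, rc))
    raise RuntimeError("all ATLAS build strategies failed: " + ", ".join(failures))


strategies = [
    ("auto-tune", lambda: 1),  # e.g. timings refused because throttling is on
    ("fast", lambda: 0),       # stands in for a build with SAGE_ATLAS_ARCH=fast
    ("base", lambda: 0),       # stands in for a build with SAGE_ATLAS_ARCH=base
]
print(build_with_fallback(strategies))  # fast
```

Keeping the strategies as data makes it easy to drop the auto-tune entry entirely when throttling is known to be on.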

The current spkg version is at

http://www.stp.dias.ie/~vbraun/Sage/spkg/atlas-3.10.0.spkg

Apply trac_10508_root_repo.patch to the SAGE_ROOT repository and 10508_doctest.patch, trac_10508_update_atlas_docs.patch to the Sage repository.

Remove the lapack and blas packages.

Change History (128)

comment:1 Changed 12 years ago by vbraun

Note that you cannot just use sage -f atlas-3.9.32.spkg to update atlas only. Many other spkgs use blas/lapack and must be rebuilt. The easiest way is to do a separate Sage installation...

The cvxopt spkg needs to be updated to link correctly with this atlas release, see #10509.
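Working out which spkgs must be rebuilt after swapping ATLAS amounts to walking the reverse dependency graph. A minimal sketch; the dependency map below is a hypothetical, heavily trimmed illustration, not Sage's real list of package dependencies.

```python
from collections import deque
from typing import Dict, Iterable, Set


def dependents(deps: Dict[str, Iterable[str]], changed: str) -> Set[str]:
    """Return every package that depends on `changed`, directly or transitively."""
    # Invert the "package -> requirements" map into "requirement -> users".
    users: Dict[str, Set[str]] = {}
    for pkg, reqs in deps.items():
        for req in reqs:
            users.setdefault(req, set()).add(pkg)
    # Breadth-first walk over the inverted graph.
    seen: Set[str] = set()
    queue = deque([changed])
    while queue:
        for user in users.get(queue.popleft(), ()):
            if user not in seen:
                seen.add(user)
                queue.append(user)
    return seen


# Hypothetical, trimmed dependency map for illustration only.
EXAMPLE_DEPS = {
    "numpy": ["atlas"],
    "scipy": ["numpy", "atlas"],
    "cvxopt": ["atlas"],
    "iml": ["atlas"],
    "linbox": ["atlas", "iml"],
    "sage": ["linbox", "cvxopt", "scipy"],
    "gmp": [],
}
print(sorted(dependents(EXAMPLE_DEPS, "atlas")))
```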

comment:2 Changed 12 years ago by dimpase

  • Cc dimpase added

comment:3 Changed 12 years ago by dimpase

  • Status changed from new to needs_info

Does it mean that LAPACK spkg can be removed, too?

comment:4 Changed 12 years ago by vbraun

Yes, the lapack spkg can be removed.

I'm still trying to debug some issues with linbox...

comment:5 Changed 12 years ago by fbissey

  • Cc fbissey added

BLAS can also be removed if we go with this. f2c, which was used to provide cblas by f2c-ing the BLAS package before we had ATLAS, should also be removed (I think it is listed only as a dependency of scipy).

comment:6 follow-up: Changed 12 years ago by dimpase

on 32-bit x86 Linux (Debian squeeze) I get the following, when trying to install the spkg (applied the patches to a pristine Sage 4.6.1.alpha3):

...
make -j1 libatlas.so libptf77blas.so libf77blas.so \
                libptcblas.so libcblas.so liblapack.so
make[3]: Entering directory `/usr/local/src/sage/sage-4.6.1.alpha3/spkg/build/atlas-3.9.32/ATLAS-build/lib'
ld -melf_i386 -shared -soname /usr/local/src/sage/sage-4.6.1.alpha3/local/lib/libatlas.so -o libatlas.so \
           -rpath-link /usr/local/src/sage/sage-4.6.1.alpha3/local/lib \
           --whole-archive libatlas.a --no-whole-archive -lc -lm
make[3]: *** No rule to make target `libptf77blas.a', needed by `libptf77blas.so'.  Stop.
make[3]: Leaving directory `/usr/local/src/sage/sage-4.6.1.alpha3/spkg/build/atlas-3.9.32/ATLAS-build/lib'
make[2]: *** [ptshared] Error 2
make[2]: Leaving directory `/usr/local/src/sage/sage-4.6.1.alpha3/spkg/build/atlas-3.9.32/ATLAS-build/lib'
Configuration:
    SAGE_LOCAL: /usr/local/src/sage/sage-4.6.1.alpha3/local
    linker_Solaris?: False
    PPC?: False
    SPKG_DIR: /usr/local/src/sage/sage-4.6.1.alpha3/spkg/build/atlas-3.9.32
    linker_GNU?: True
    ld: GNU
    system: Linux
    Darwin?: False
    machine: i686
    fortran: gfortran
    Solaris?: False
    fortran_g95?: False
    bits: 32bit
    CYGWIN?: False
    SPARC?: False
    fortran_GNU?: True
    FreeBSD?: False
    32bit?: True
    Linux?: True
    64bit?: False
    Intel?: False
    processor: 

comment:7 in reply to: ↑ 6 Changed 12 years ago by dimpase

Replying to dimpase:

on 32-bit x86 Linux (Debian squeeze) I get the following, when trying to install the spkg (applied the patches to a pristine Sage 4.6.1.alpha3):

complete install.log is here: http://boxen.math.washington.edu/home/dima/tmp/install-alt3.9.log.gz

comment:8 follow-up: Changed 12 years ago by vbraun

Hi Dima,

I think the problem is

make[6]: Entering directory `/usr/local/src/sage/sage-4.6.1.alpha3/spkg/build/atlas-3.9.32/ATLAS-build/tune/sysinfo'
gcc -c -DL2SIZE=4194304 -I/usr/local/src/sage/sage-4.6.1.alpha3/spkg/build/atlas-3.9.32/ATLAS-build/include -I/usr/local/src/sage/sage-4.6.1.alpha3/spkg/build/atlas-3.9.32/ATLAS-build/../src//include -I/usr/local/src/sage/sage-4.6.1.alpha3/spkg/build/atlas-3.9.32/ATLAS-build/../src//include/contrib -DAdd_ -DF77_INTEGER=int -DStringSunStyle -DATL_OS_Linux -DATL_ARCH_CoreDuo -DATL_CPUMHZ=800 -DATL_SSE3 -DATL_SSE2 -DATL_SSE1 -DATL_GAS_x8632  -fomit-frame-pointer -O3 -mfpmath=387 -fPIC -m32 /usr/local/src/sage/sage-4.6.1.alpha3/spkg/build/atlas-3.9.32/ATLAS-build/../src//tune/sysinfo/findNT.c
gcc -DL2SIZE=4194304 -I/usr/local/src/sage/sage-4.6.1.alpha3/spkg/build/atlas-3.9.32/ATLAS-build/include -I/usr/local/src/sage/sage-4.6.1.alpha3/spkg/build/atlas-3.9.32/ATLAS-build/../src//include -I/usr/local/src/sage/sage-4.6.1.alpha3/spkg/build/atlas-3.9.32/ATLAS-build/../src//include/contrib -DAdd_ -DF77_INTEGER=int -DStringSunStyle -DATL_OS_Linux -DATL_ARCH_CoreDuo -DATL_CPUMHZ=800 -DATL_SSE3 -DATL_SSE2 -DATL_SSE1 -DATL_GAS_x8632  -fomit-frame-pointer -O3 -mfpmath=387 -fPIC -m32 -o xfindNT findNT.o ATL_walltime.o -lm
/usr/lib/crt1.o: In function `_start':
(.text+0x18): undefined reference to `main'
collect2: ld returned 1 exit status
make[6]: *** [xfindNT] Error 1

This fails because ATL_NCPU is not set. Do you have a single-core processor? I guess the threaded libraries are not built in that case.

comment:9 in reply to: ↑ 8 ; follow-up: Changed 12 years ago by dimpase

Replying to vbraun:

This fails because ATL_NCPU is not set. Do you have a single-core processor? I guess the threaded libraries are not built in that case.

yes, it's single-core. An old Pentium M. (Atlas 3.8 spkg does not build on it, at all)

comment:10 follow-up: Changed 12 years ago by vbraun

I changed the spkg-install to make only single-threaded shared libraries if necessary. Should work now.
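The single-core workaround boils down to choosing which shared-library targets to build. A minimal sketch, assuming the target names from the make invocation in comment 6's log; shared_lib_targets is a hypothetical helper, and os.cpu_count() stands in for however the spkg actually detects the core count.

```python
import os
from typing import List, Optional

# Shared-library targets as they appear in the make invocation in comment 6.
SERIAL_LIBS = ["libatlas.so", "libf77blas.so", "libcblas.so", "liblapack.so"]
THREADED_LIBS = ["libptf77blas.so", "libptcblas.so"]  # "pt" prefix = threaded variants


def shared_lib_targets(ncpu: Optional[int] = None) -> List[str]:
    """Hypothetical helper: skip the threaded libraries on single-core machines."""
    if ncpu is None:
        ncpu = os.cpu_count() or 1  # cpu_count() may return None
    return SERIAL_LIBS + THREADED_LIBS if ncpu > 1 else list(SERIAL_LIBS)
```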

For simplicity, I made spkgs for cvxopt and sage_scripts. So you need to add

http://www.stp.dias.ie/~vbraun/Sage/spkg/atlas-3.9.32.spkg http://www.stp.dias.ie/~vbraun/Sage/spkg/cvxopt-1.1.3.p0.spkg http://www.stp.dias.ie/~vbraun/Sage/spkg/sage_scripts-4.6.1.alpha3.p0.spkg

to $SAGE_ROOT/spkg/standard and then

  • replace spkg/install with the attached version (Note that sage_scripts overwrites this file during installation, aargh)
  • replace spkg/standard/deps with the attached version.

I'm still having doctest errors with linbox...

comment:11 in reply to: ↑ 10 Changed 12 years ago by dimpase

Replying to vbraun:

I changed the spkg-install to make only single-threaded shared libraries if necessary. Should work now.

It builds OK, but then testlong gives quite a bit of failures:

The following tests failed:
        sage -t  -long -force_lib "devel/sage/doc/en/bordeaux_2008/elliptic_curves.rst"
        sage -t  -long -force_lib "devel/sage/sage/modular/modsym/space.py"
        sage -t  -long -force_lib "devel/sage/sage/modular/modsym/tests.py"
        sage -t  -long -force_lib "devel/sage/sage/modular/modsym/subspace.py"
        sage -t  -long -force_lib "devel/sage/sage/modular/modsym/ambient.py"
        sage -t  -long -force_lib "devel/sage/sage/modular/modform/space.py"
        sage -t  -long -force_lib "devel/sage/sage/modular/modform/element.py"
        sage -t  -long -force_lib "devel/sage/sage/modular/modform/ambient.py"
        sage -t  -long -force_lib "devel/sage/sage/modular/hecke/submodule.py"
        sage -t  -long -force_lib "devel/sage/sage/modular/abvar/abvar.py"
        sage -t  -long -force_lib "devel/sage/sage/modular/abvar/torsion_subgroup.py"
        sage -t  -long -force_lib "devel/sage/sage/modular/abvar/cuspidal_subgroup.py"
        sage -t  -long -force_lib "devel/sage/sage/modular/abvar/finite_subgroup.py"
        sage -t  -long -force_lib "devel/sage/sage/matrix/matrix_integer_dense.pyx"
        sage -t  -long -force_lib "devel/sage/sage/matrix/matrix_integer_dense_hnf.py"
        sage -t  -long -force_lib "devel/sage/sage/rings/qqbar.py"
        sage -t  -long -force_lib "devel/sage/sage/finance/time_series.pyx"
        sage -t  -long -force_lib "devel/sage/sage/schemes/elliptic_curves/padic_lseries.py"
        sage -t  -long -force_lib "devel/sage/sage/schemes/elliptic_curves/ell_modular_symbols.py"
        sage -t  -long -force_lib "devel/sage/sage/schemes/elliptic_curves/ell_rational_field.py"
        sage -t  -long -force_lib "devel/sage/sage/schemes/elliptic_curves/sha_tate.py"
Total time for all tests: 30045.4 seconds

I do not know how many of them are Atlas related, though. The log is here: http://boxen.math.washington.edu/home/dima/tmp/testlong-atl3.9.log

comment:12 follow-up: Changed 12 years ago by fbissey

A lot of it is probably related. Was this a build from scratch? linbox calls seem to be particularly affected; there are a lot of failures in code paths going through sage.matrix.matrix_integer_dense.Matrix_integer_dense._charpoly_linbox. sage.matrix.matrix_rational_dense.Matrix_rational_dense.right_kernel is also affected, and that smells like ATLAS at work too.

I must say we have observed a number of failures related to ATLAS-3.9.23 in sage-on-gentoo: https://github.com/cschwan/sage-on-gentoo/issues#issue/3. You have failures in that area too, but as far as I can see they are not due to iml/ATLAS. https://github.com/cschwan/sage-on-gentoo/issues#issue/6 is more subtle. I see you have the failure with devel/sage/sage/finance/time_series.pyx; you may want to look at the comment I wrote about it in the section "known test failures for Sage on Gentoo": https://github.com/cschwan/sage-on-gentoo/wiki/Known-test-failures. Note that sage-on-gentoo can use (c)blas-reference, ATLAS, gsl-cblas or some combination of them; in fact the AMD and Intel libraries could in principle be used, but no one I know has tried. See http://www.gentoo.org/proj/en/science/blas-lapack.xml

Not sure about the SIGFPE, it could come from a result rounded to 0 or the like.

comment:13 in reply to: ↑ 12 Changed 12 years ago by dimpase

Replying to fbissey:

A lot of it is probably related. Was this a build from scratch?

yes, it's a build from scratch, on the same machine (old Pentium M) as already discussed above.

linbox calls seem to be particularly affected, there are a lot of failures in that path:

I wonder if these are Atlas bugs, or Linbox bugs...

One should try an OSX build.

comment:14 follow-up: Changed 12 years ago by fbissey

why OS X in particular? I could do that but not before the 5th of January, when my university reopens. Ok I could do it before that but I'll enjoy the break a little bit more :)

comment:15 in reply to: ↑ 14 ; follow-up: Changed 12 years ago by dimpase

Replying to fbissey:

why OS X in particular? I could do that but not before the 5th of January, when my university reopens. Ok I could do it before that but I'll enjoy the break a little bit more :)

Or does OSX remain disabled for Atlas, i.e. it's not built?

comment:16 in reply to: ↑ 15 Changed 12 years ago by fbissey

Replying to dimpase:

Replying to fbissey:

why OS X in particular? I could do that but not before the 5th of January, when my university reopens. Ok I could do it before that but I'll enjoy the break a little bit more :)

Or does OSX remain disabled for Atlas, i.e. it's not built?

I had forgotten about that. I checked the spkg and ATLAS is not built on cygwin and OS X. So it makes sense.

comment:17 follow-up: Changed 12 years ago by vbraun

I get the same doctest errors. Most of them are linbox related. The SIGFPE is from converting a NAN into a GMP integer, but I haven't gotten to the root of the NAN yet. In trying to debug this I've noticed that there are a bunch of valgrind warnings in the linbox code path we are using. I've asked some more specific questions on the linbox-use mailing list:

https://groups.google.com/d/topic/linbox-use/N3QNNOQuTAc/discussion

But so far no final conclusion.

comment:18 in reply to: ↑ 9 ; follow-up: Changed 12 years ago by fbissey

Replying to dimpase:

Replying to vbraun:

This fails because ATL_NCPU is not set. Do you have a single-core processor? I guess the threaded libraries are not built in that case.

yes, it's single-core. An old Pentium M. (Atlas 3.8 spkg does not build on it, at all)

I notice you say it is a Pentium M, yet ATLAS is compiling things with the assumption that it is a coreduo: -DATL_ARCH_CoreDuo -DATL_CPUMHZ=800 -DATL_SSE3 -DATL_SSE2 -DATL_SSE1 -DATL_GAS_x8632 I am guessing the speed is right but the rest is not. Was your successful build using these parameters? I had in one instance a non-working ATLAS because the cpu type was misdetected (wanted to use sse3 when it didn't have them). The build was curiously successful but the library was unusable.

comment:19 in reply to: ↑ 18 Changed 12 years ago by dimpase

Replying to fbissey:

Replying to dimpase:

Replying to vbraun:

This fails because ATL_NCPU is not set. Do you have a single-core processor? I guess the threaded libraries are not built in that case.

yes, it's single-core. An old Pentium M. (Atlas 3.8 spkg does not build on it, at all)

I notice you say it is a Pentium M, yet ATLAS is compiling things with the assumption that it is a coreduo: -DATL_ARCH_CoreDuo -DATL_CPUMHZ=800 -DATL_SSE3 -DATL_SSE2 -DATL_SSE1 -DATL_GAS_x8632 I am guessing the speed is right but the rest is not. Was your successful build using these parameters? I had in one instance a non-working ATLAS because the cpu type was misdetected (wanted to use sse3 when it didn't have them). The build was curiously successful but the library was unusable.

The processor is a Pentium M Banias 1.1GHz, http://ark.intel.com/Product.aspx?id=27600 (according to the GotoBLAS installation procedure :-))

The ATLAS build is not totally useless; it works for many doctests. By the way, 3.8 also thinks it's a CoreDuo, and it works. I saw a note somewhere saying that this is OK.

comment:20 in reply to: ↑ 17 Changed 11 years ago by fbissey

Replying to vbraun:

I get the same doctest errors. Most of them are linbox related. The SIGFPE is from converting a NAN into a GMP integer, but I haven't gotten to the root of the NAN yet. In trying to debug this I've noticed that there are a bunch of valgrind warnings in the linbox code path we are using. I've asked some more specific questions on the linbox-use mailing list:

https://groups.google.com/d/topic/linbox-use/N3QNNOQuTAc/discussion

But so far no final conclusion.

Converting a NAN into a GMP integer is exactly what was happening in https://github.com/cschwan/sage-on-gentoo/issues/3 and it didn't happen when using gslcblas. I will do a full build of sage-on-gentoo with 3.9.40 (or 41) and see if I can see anything.

comment:21 Changed 11 years ago by fbissey

Yuck, still got problems leading to linbox, and I got the ones leading to iml too. Note that when using another cblas such as gslcblas/openblas/reference (netlib), all these problems disappear, which seems to indicate that there is something going on in ATLAS itself, or that all the others get it wrong (I realise that on a stock Sage you may have trouble compiling iml with anything other than ATLAS; it requires some patching of the configure script to do so).

comment:23 Changed 11 years ago by fbissey

I have 3.9.41 now; the only difference was that it compiled faster because there was tuning for my CPU. But we could investigate 3.8.4. Thanks for pointing it out; some of us (me at least) didn't think it would ever happen. We may have a stable ATLAS supporting newer CPUs at last, and it looks like I could test it quickly.

comment:24 Changed 11 years ago by fbissey

OK 3.8.4 doesn't suffer from any of the drawbacks that are apparent in 3.9.23+.

The big drawback is that it doesn't build LAPACK on its own when pointed to the sources, as the latest 3.9.xx releases do. That means spkg-install would need a little more work.

Opinions?

comment:25 Changed 11 years ago by vbraun

Can we keep this ticket focused on the development version of atlas and discuss the stable ATLAS on #10226? I updated the spkg on the latter ticket to the new stable ATLAS release.

comment:26 Changed 11 years ago by fbissey

Sure. My current opinion, and something I am recommending to sage-on-gentoo users, is to avoid ATLAS 3.9.xx for the time being. It is possible that ATLAS-3.9.xx is doing something permissible that iml and linbox are not ready to catch. My line of thought is that I remember that some algorithm in iml takes the inverse of a result returned by cblas. If the result is 0 instead of a small value, we may get our NaN that way; or, more likely, some result from ATLAS is itself a NaN.
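The failure mode discussed in this thread (a zero where a small value was expected, its reciprocal becoming infinite, and the resulting NaN later being converted to an integer) can be reproduced with plain Python floats. A minimal sketch; float_to_int_checked is a hypothetical guard, and Python's own int() already raises an exception for NaN rather than the SIGFPE reported here for the GMP conversion.

```python
import math


def float_to_int_checked(x: float) -> int:
    """Hypothetical guard: refuse to turn NaN or infinity into an integer."""
    if not math.isfinite(x):
        raise ValueError("non-finite value {!r} cannot be converted to an integer".format(x))
    return int(x)


# In C, 1.0/0.0 yields +inf under IEEE 754; Python raises ZeroDivisionError
# instead, so the infinity is written out explicitly here.
reciprocal_of_zero = math.inf
poisoned = reciprocal_of_zero * 0.0  # inf * 0 is NaN
print(math.isnan(poisoned))          # True
```

Checking for non-finite values at the boundary turns a hard crash into an error message that names the bad value.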

comment:27 Changed 11 years ago by leif

  • Cc leif added

Note that you can copy updated spkgs into spkg/standard/ and then do

env SAGE_UPGRADING=yes make build

to rebuild all dependent packages.

(We should perhaps add make targets for that to the top-level Makefile and document them in the Developer's Guide, as this is currently just a side-effect of the effort to make upgrading work.)

If you at the same time need to apply patches to the Sage library, things get a bit more complicated, as e.g. Sage switches to the main branch before reinstalling the Sage library package. One safe way is to first apply the patches, create a new sage-x.y.z-whatever spkg (with devel/sage/spkg-dist) and replace the one in spkg/standard/ with that one (or at least make sure newest_version will pick up the right one).

Note that the extension modules' dependencies in module_list.py are currently far from complete. #8664 adds some in a generic way by adding them automatically in setup.py, i.e. lets modules also depend on the headers of the libraries they use (which [only] works if the headers' mtimes get modified / updated during installation of their corresponding libraries). The dumb alternative is to run sage -ba-force after an "upgrade" process.

comment:28 follow-up: Changed 11 years ago by leif

Ping.

comment:29 in reply to: ↑ 28 Changed 11 years ago by dimpase

Replying to leif:

Ping.

just fired up my vintage Mac OS X 32-bit PowerBook PPC... Will know more in, like, 12 hours...

comment:30 follow-up: Changed 11 years ago by fbissey

Several issues:

  • atlas is now at version 3.9.49 (last I checked)
  • I have reports of it failing to build on some old Opterons
  • @dimpase: do you need atlas on OS X ppc? Don't you use vectorize like the other OS X?
  • I haven't checked but I am quite sure from other reports that cblas_dgem{m,v} is still buggy (someone posted that R built against that Atlas lapack was giving them trouble and R upstream pointed the finger to Atlas).

comment:31 in reply to: ↑ 30 Changed 11 years ago by dimpase

Replying to fbissey:

Several issues:

  • atlas is now at version 3.9.49 (last I checked)
  • I have reports of it failing to build on some old Opterons

so what? I have a 5-year old 32-bit Intel laptop on which the sage-current Atlas does not build.

  • @dimpase: do you need atlas on OS X ppc? Don't you use vectorize like the other OS X?

indeed. But that's for #10509. Oops, was posting on the wrong ticket...

comment:32 Changed 10 years ago by kcrisman

With respect to comment:33:ticket:12011, should this be closed as a dup of #12011?

comment:33 Changed 10 years ago by dimpase

  • Status changed from needs_info to needs_review

propose to close this one and refer to #12011 for the continuation of the upgrade work.

comment:34 Changed 10 years ago by vbraun

  • Milestone changed from sage-5.0 to sage-duplicate/invalid/wontfix
  • Status changed from needs_review to positive_review

I would say #12011 is a duplicate of my ticket but oh well ;-)

comment:35 Changed 10 years ago by jdemeyer

  • Authors set to Volker Braun
  • Description modified (diff)
  • Milestone changed from sage-duplicate/invalid/wontfix to sage-5.0
  • Status changed from positive_review to needs_work
  • Summary changed from Update ATLAS to version 3.9.32 to Update ATLAS to version 3.9.x

Seems like #12011 isn't a duplicate after all...

comment:36 Changed 10 years ago by vbraun

  • Description modified (diff)

comment:37 Changed 10 years ago by vbraun

  • Description modified (diff)

comment:38 Changed 10 years ago by jdemeyer

  • Milestone changed from sage-5.1 to sage-5.3

Obviously, the patches to spkg/install and spkg/standard/deps must be rebased.

comment:39 Changed 10 years ago by jdemeyer

It makes a lot more sense to me to put the LAPACK tarball at the top level of the spkg instead of in patches/.

patches/ATLAS-lib/autom4te.cache should be removed.

comment:40 Changed 10 years ago by jdemeyer

I don't like using assert for control flow, because that's not what it's meant for.

Why not replace those by (see #13210)

import sys

if rc != 0:
    print("Error: foo")
    sys.exit(rc)

comment:41 Changed 10 years ago by jdemeyer

There is something wrong with the history in SPKG.txt (atlas-3.8.4 is completely missing and there is atlas-3.9.68 for #12011 which never got merged)

comment:42 follow-up: Changed 10 years ago by vbraun

I don't have any strong opinion on where to put the lapack tarball, except that our convention of only allowing a single src/ directory is shortsighted.

Note that I'm not using assertions for control flow, i.e. I'm not using

  try:
    <command>
  except AssertionError:
    <alternative>

Note that you could theoretically also catch the SystemExit exception, so sys.exit() isn't different from assert in that regard. I'm only using assertions to ensure the following contract holds: spkg-install completes successfully only if the relevant atlas configure/make completed with rc==0

I'll replace the asserts by something that returns rc as exit code, though.

comment:43 Changed 10 years ago by vbraun

  • Description modified (diff)

comment:44 in reply to: ↑ 42 Changed 10 years ago by jdemeyer

Replying to vbraun:

I'm only using assertions to ensure the following contract holds: spkg-install completes successfully only if the relevant atlas configure/make completed with rc==0

I think (IMHO but I might be wrong) that assertions should be used only in a situation where an assertion being false indicates a bug in the program. If rc != 0 in spkg-install, that is an ordinary error condition, not a bug in the spkg-install script.
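There is also a practical argument for this: Python removes assert statements entirely when run with -O (or with PYTHONOPTIMIZE set), so an assert-based check on rc can silently disappear, whereas an explicit exit always runs. A minimal sketch; require_success is a hypothetical name, not a function in spkg-install.

```python
import sys


def require_success(rc: int, step: str) -> None:
    """Abort with the failing step's exit code (hypothetical helper)."""
    if rc != 0:
        print("Error: {} failed with exit code {}".format(step, rc), file=sys.stderr)
        sys.exit(rc)


# By contrast, this check is stripped out under `python -O`:
#     assert rc == 0
```

As noted above, SystemExit can still be caught, so this does not change who may intercept the failure; it only guarantees the check is executed.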

comment:45 follow-up: Changed 10 years ago by jdemeyer

Starting from sage-5.1, I get one doctest failure:

File "/release/merger/sage-5.1-atlas/devel/sage-main/sage/rings/polynomial/polynomial_element.pyx", line 1039:
    sage: parent(poly)([ 0.0 if abs(c)<=1e-14 else c for c in poly.coeffs() ])
Expected:
    1.0
Got:
    1.02140518266e-14*x^2 + 1.0

comment:46 in reply to: ↑ 45 Changed 10 years ago by dimpase

Replying to jdemeyer:

Starting from sage-5.1, I get one doctest failure:

     sage: parent(poly)([ 0.0 if abs(c)<=1e-14 else c for c in poly.coeffs() ])

it's most probably not an Atlas problem; rather, the threshold 1e-14 (is there any solid rationale behind this choice? I guess not) needs to be adjusted.
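The doctest zeroes out coefficients whose absolute value is at most 1e-14, so a leftover of 1.02e-14 just misses the cut. A plain-Python sketch of the same filter shows how sensitive the output is to the tolerance; chop is a hypothetical name, and the list below stands in for the polynomial's coefficients with the constant term first.

```python
from typing import List


def chop(coeffs: List[float], tol: float = 1e-14) -> List[float]:
    """Replace coefficients with |c| <= tol by 0.0, like the doctest's list comprehension."""
    return [0.0 if abs(c) <= tol else c for c in coeffs]


# Constant term first: 1.0 + 0*x + 1.02140518266e-14*x^2
coeffs = [1.0, 0.0, 1.02140518266e-14]
print(chop(coeffs))             # x^2 term survives, since 1.02e-14 > 1e-14
print(chop(coeffs, tol=1e-13))  # [1.0, 0.0, 0.0]
```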

comment:47 Changed 10 years ago by benjaminfjones

  • Reviewers set to Benjamin Jones

Testing in sage-5.1.rc1 on x86_64 debian Linux:

$ uname -a
Linux sage 2.6.32 #1 SMP Fri Sep 2 21:08:57 CDT 2011 x86_64 GNU/Linux

$ head -18 /proc/cpuinfo 
processor       : 0
vendor_id       : GenuineIntel
cpu family      : 6
model           : 44
model name      : Intel(R) Xeon(R) CPU           X5690  @ 3.47GHz
stepping        : 2
cpu MHz         : 3465.790
cache size      : 12288 KB
fpu             : yes
fpu_exception   : yes
cpuid level     : 11
wp              : yes
flags           : fpu de tsc msr pae cx8 sep cmov pat clflush mmx fxsr sse sse2 ss ht syscall nx lm constant_tsc rep_good aperfmperf pni pclmulqdq ssse3 cx16 sse4_1 sse4_2 popcnt aes hypervisor lahf_lm ida arat
bogomips        : 6931.58
clflush size    : 64
cache_alignment : 64
address sizes   : 40 bits physical, 48 bits virtual
power management:

BEFORE new ATLAS spkg

sage -t  "devel/sage-main/sage/modular/modsym/ambient.py"   
         [11.6 s]

sage -t  "devel/sage-main/sage/modular/hecke/ambient_module.py"
         [4.2 s]

sage -t  "devel/sage-main/sage/modular/hecke/hecke_operator.py"
         [3.0 s]

AFTER new spkg

  1. untar'd fresh sage-5.1.rc1
  2. replaced atlas-3.8.4.p1.spkg with atlas-3.9.85.spkg
  3. make build

SPKG build log: http://sage.math.washington.edu/home/bjones/atlas-3.9.85.log

Build atlas-3.9.85

real    11m18.595s
user    11m15.906s
sys     2m56.099s
Successfully installed atlas-3.9.85

Tests

sage -t  "devel/sage-main/sage/modular/modsym/ambient.py"   
         [11.6 s]

sage -t  "devel/sage-main/sage/modular/hecke/ambient_module.py"
         [4.2 s]

sage -t  "devel/sage-main/sage/modular/hecke/hecke_operator.py"
         [3.1 s]

All tests

$ make ptestlong
...
All tests passed!
Total time for all tests: 1282.1 seconds

comment:48 Changed 10 years ago by fbissey

3.10.0 has just been released a few hours ago. Do we try to go for it? Like Jeroen remarked there may be tolerance issues depending on the hardware. I know I have one when using openblas instead of ATLAS.

comment:49 Changed 10 years ago by vbraun

  • Description modified (diff)
  • Status changed from needs_work to needs_review
  • Summary changed from Update ATLAS to version 3.9.x to Update ATLAS to stable version 3.10

comment:50 follow-up: Changed 10 years ago by vbraun

The new spkg will attempt to build on OSX if SAGE_ATLAS_ARCH is set. Untested, but a start.

I also am planning to allow SAGE_ATLAS_LIB=system and use ldconfig -p to get the system atlas, if available. Though perhaps we should do that in a later version. This spkg seems to fix some build issues that people reported on the sage mailing lists, so it might be good to get it in sooner rather than later.
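One way the proposed SAGE_ATLAS_LIB=system lookup could work is to scan the output of ldconfig -p for libatlas. A minimal sketch; find_system_atlas is hypothetical, and the sample text imitates the usual "name (flags) => path" lines of glibc's ldconfig -p rather than output captured from a real system.

```python
import re
from typing import Optional

# Matches lines such as "\tlibatlas.so.3 (libc6,x86-64) => /usr/lib/libatlas.so.3"
_LINE = re.compile(r"^\s*(libatlas\.so\S*)\s+\(([^)]*)\)\s+=>\s+(\S+)\s*$")


def find_system_atlas(ldconfig_output: str) -> Optional[str]:
    """Return the path of the first libatlas entry in `ldconfig -p` output, if any."""
    for line in ldconfig_output.splitlines():
        match = _LINE.match(line)
        if match:
            return match.group(3)
    return None


# On a real system the text would come from, e.g.:
#     subprocess.run(["ldconfig", "-p"], capture_output=True, text=True).stdout
SAMPLE = """2 libs found in cache `/etc/ld.so.cache'
\tlibatlas.so.3 (libc6,x86-64) => /usr/lib/x86_64-linux-gnu/libatlas.so.3
\tlibm.so.6 (libc6,x86-64) => /lib/x86_64-linux-gnu/libm.so.6
"""
print(find_system_atlas(SAMPLE))
```

A real implementation would also compare the flags in parentheses against the target ABI, since a multilib system can list 32-bit and 64-bit copies side by side.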

comment:51 in reply to: ↑ 50 Changed 10 years ago by jdemeyer

Replying to vbraun:

so it might be good to get it in sooner rather than later.

+1, let's not worry now about changing the working of SAGE_ATLAS_LIB.

comment:52 Changed 10 years ago by jdemeyer

  • Dependencies set to sage-5.2.beta0

comment:53 Changed 10 years ago by jdemeyer

  • Description modified (diff)

comment:54 Changed 10 years ago by jdemeyer

  • Description modified (diff)

comment:55 Changed 10 years ago by jdemeyer

In spkg-install, I find it confusing that you have a function build() which calls configure() and make_atlas().

I would get rid of build() and simply call configure() and make_atlas() directly.

comment:56 Changed 10 years ago by jdemeyer

  • Status changed from needs_review to needs_work

About SAGE_FAT_BINARY: at one point you are checking

os.environ.has_key('SAGE_FAT_BINARY')

and at another you are checking

os.environ.get('SAGE_FAT_BINARY', 'no') == 'yes' and conf['Intel?']

I think the correct check is

os.environ.get('SAGE_FAT_BINARY', 'no') == 'yes'

"SAGE_FAT_BINARY" has evolved to mean "build a generic binary on any processor", so it's not Intel specific anymore. Just call configure_base().
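The difference between the two checks shows up as soon as someone exports SAGE_FAT_BINARY=no. A minimal sketch in Python 3 syntax, where `in` replaces the Python 2 dict.has_key(); fat_binary_requested is a hypothetical name.

```python
from typing import Mapping


def fat_binary_requested(env: Mapping[str, str]) -> bool:
    """True only if SAGE_FAT_BINARY is literally "yes", regardless of CPU vendor."""
    return env.get("SAGE_FAT_BINARY", "no") == "yes"


# A mere presence check treats SAGE_FAT_BINARY=no as a request for a fat binary:
env = {"SAGE_FAT_BINARY": "no"}
print("SAGE_FAT_BINARY" in env)   # True  (what has_key() tested)
print(fat_binary_requested(env))  # False
```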

comment:57 Changed 10 years ago by jdemeyer

Somebody needs to review 10508_doctest.patch

comment:58 Changed 10 years ago by jdemeyer

Detail: in system_with_flush(), your print command should not have a space after "Running". The space is automatically added by print.
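A minimal Python 3 sketch of such a helper, written for illustration rather than copied from the spkg. It shows both details from this ticket: flushing stdout before os.system so the "Running" line is not emitted after the child's output when stdout is buffered (#13210), and passing the command as a separate print argument so print inserts exactly one space.

```python
import os
import sys


def system_with_flush(cmd: str) -> int:
    """Echo and run a shell command, flushing our own output first."""
    # print() separates its arguments with one space, so no trailing space here.
    print("Running", cmd)
    # Flush so our buffered output reaches the log before the child's output.
    sys.stdout.flush()
    # os.system returns an encoded wait status on POSIX; 0 still means success.
    return os.system(cmd)
```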

comment:59 Changed 10 years ago by vbraun

Updated spkg to remove the space and make the SAGE_FAT_BINARY check uniform.

I'm fine with the doctests patch...

comment:60 Changed 10 years ago by vbraun

I'm always calling build() except in one place where I want to allow configure() to fail (because throttling is enabled).

comment:61 Changed 10 years ago by jdemeyer

Is this expected? Compare the build time on arando (Ubuntu 12.04 i386):

real    104m44.021s
user    97m25.753s
sys     1m18.157s
Successfully installed atlas-3.8.4.p1
real    259m50.222s
user    233m21.879s
sys     7m38.405s
Successfully installed atlas-3.10.0

comment:62 Changed 10 years ago by vbraun

  • Status changed from needs_work to needs_review

I've noticed before that the build time for the "generic" binary is rather long. It's not entirely generic; it is still doing timings for cache edges. But the result will be a working library that won't probe funky asm implementations that other CPUs might not support. I'll ask on the ATLAS mailing list for clarification.

comment:63 follow-up: Changed 10 years ago by jdemeyer

Timing on x86_64 with SAGE_FAT_BINARY=yes on sage.math:

real    139m50.733s
user    134m30.710s
sys     7m12.390s
Successfully installed atlas-3.8.4.p1
real    350m5.662s
user    330m23.460s
sys     22m1.010s
Successfully installed atlas-3.10.0

Why does the new ATLAS take so much longer to build than the old one?
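To put numbers on the question, the real (wall-clock) times above can be converted to seconds and compared; on both machines 3.10.0 takes about 2.5 times as long as 3.8.4.p1. A small sketch; the parsing helper is hypothetical and only handles the NmS.SSSs form printed by bash's time keyword.

```python
import re


def to_seconds(t: str) -> float:
    """Convert bash `time` output such as '104m44.021s' to seconds."""
    m = re.fullmatch(r"(\d+)m(\d+(?:\.\d+)?)s", t.strip())
    if m is None:
        raise ValueError("unrecognized time format: " + t)
    return int(m.group(1)) * 60 + float(m.group(2))


# Wall-clock ("real") times quoted in comments 61 and 63: (3.8.4.p1, 3.10.0).
runs = {
    "arando, i386": ("104m44.021s", "259m50.222s"),
    "sage.math, x86_64, SAGE_FAT_BINARY=yes": ("139m50.733s", "350m5.662s"),
}
for host, (old, new) in runs.items():
    ratio = to_seconds(new) / to_seconds(old)
    print("{}: 3.10.0 took {:.2f}x as long as 3.8.4.p1".format(host, ratio))
```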

comment:64 in reply to: ↑ 63 Changed 10 years ago by leif

Replying to jdemeyer:

Timing on x86_64 with SAGE_FAT_BINARY=yes on sage.math:

real    139m50.733s
user    134m30.710s
sys     7m12.390s
Successfully installed atlas-3.8.4.p1
real    350m5.662s
user    330m23.460s
sys     22m1.010s
Successfully installed atlas-3.10.0

Why does the new ATLAS take so much longer to build than the old one?

I can beat that:

Finished installing shared ATLAS library.

real	821m14.881s
user	739m58.560s
sys	58m23.440s
Successfully installed atlas-3.10.0

(That's with sage -f ... on an otherwise idle machine, an AMD Fusion E-450 running Ubuntu 10.04.4 LTS x86_64.)

It took far more time than building all of Sage [in parallel, on that dual-core CPU], including the old ATLAS spkg. I currently have no figures for a separate ATLAS 3.8.4 build, but the timings from previous parallel Sage builds on that machine vary between

real	214m52.096s
user	175m5.390s
sys	9m56.210s
Successfully installed atlas-3.8.4.p1

and

real	256m32.949s
user	217m30.760s
sys	10m31.050s
Successfully installed atlas-3.8.4.p1

(The LAPACK and BLAS spkg build times in these builds are a few minutes and less than one minute [wall time], respectively.)

I was actually hoping ATLAS 3.9.x / 3.10 meanwhile "knows" these AMD CPUs and therefore builds at least a bit faster...

comment:65 Changed 10 years ago by leif

FWIW, ptestlong passed (after rebuilding all dependent packages) with Sage 5.2.beta0 (without applying the doctest patch).

comment:66 Changed 10 years ago by fbissey

We have noticed the build time increase in Gentoo as well, at 3.9.82 I think. Apparently they changed how they detect the compiler, and that's what is causing the spike. But it seems fixed in 3.10.0, unless we are carrying a specific patch in Gentoo:

     Fri Jun 29 13:42:08 2012 >>> sci-libs/atlas-3.9.80
       merge time: 8 minutes and 4 seconds.

     Wed Jul  4 14:56:21 2012 >>> sci-libs/atlas-3.9.82
       merge time: 3 hours, 54 minutes and 30 seconds.

     Wed Jul 11 11:11:23 2012 >>> sci-libs/atlas-3.10.0
       merge time: 8 minutes and 34 seconds.

comment:67 Changed 10 years ago by vbraun

If I set SAGE_ATLAS_ARCH=Corei2,AVX,SSE3,SSE2,SSE1 then I can also compile atlas-3.10.0 in about 10 minutes. The issue is only with the "generic" archdefs, which seem to be insufficiently specialized. I've raised this issue on the ATLAS mailing list.
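Judging from the example value in this comment, SAGE_ATLAS_ARCH takes an architecture name followed by instruction-set extensions, with "base" and "fast" as special generic targets. A minimal parser under that assumption; the function name and the exact splitting rules are hypothetical, not taken from spkg-install.

```python
from typing import List, Tuple


def parse_atlas_arch(value: str) -> Tuple[str, List[str]]:
    """Split e.g. 'Corei2,AVX,SSE3' into ('Corei2', ['AVX', 'SSE3']) (assumed format)."""
    parts = [p.strip() for p in value.split(",") if p.strip()]
    if not parts:
        raise ValueError("SAGE_ATLAS_ARCH is empty")
    return parts[0], parts[1:]


print(parse_atlas_arch("Corei2,AVX,SSE3,SSE2,SSE1"))
print(parse_atlas_arch("base"))  # ('base', [])
```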

comment:68 Changed 10 years ago by vbraun

The updated atlas spkg also installs a script atlas-config in $SAGE_LOCAL/bin which can be used to compute new architectural defaults. I need somebody on Linux i386 to first install the new spkg and then run

SAGE_FAT_BINARY=yes atlas-config --archdef

to grind out the archdefs for the i386 "generic" target. I'm currently doing this for x86_64, but I don't have an i386 machine. This will use sudo to turn off CPU throttling, so you need to be a sudoer.

[vbraun@volker-desktop sage-5.2.beta1]$ ./local/bin/atlas-config --help
usage: atlas-config [-h] [--unthrottle PID] [--archdef]

(Re-)Build ATLAS (http://math-atlas.sourceforge.net) according to the
SAGE_ATLAS_ARCH environment variable

optional arguments:
  -h, --help        show this help message and exit
  --unthrottle PID  switch CPU throttling off until PID finishes
  --archdef         build archdef tarball and save it to the current directory

comment:69 follow-up: Changed 10 years ago by vbraun

I've updated the spkg with new 64-bit generic archdefs, this now builds in about 10 mins.

comment:70 in reply to: ↑ 69 Changed 10 years ago by leif

Replying to vbraun:

I've updated the spkg with new 64-bit generic archdefs, this now builds in about 10 mins.

md5sum?

I incidentally just downloaded some new version (163f090f18bb8616e93617677a644cd8) and triggered a full build from scratch.

comment:71 Changed 10 years ago by vbraun

The newest version is 8a16c9d39add1c6c3f37e13986e2a3cc and that's what's linked in the ticket.

comment:72 Changed 10 years ago by vbraun

The new version d33e9114156d8373fa61f957b379e029 changes the "fast" 64-bit archdef to P4E64SSE3.

It turns out that there are no 64-bit generic archdefs, which might have been one reason why ATLAS was slow to compile. The SPKG uses the existing 32-bit archdefs or the 64-bit one that I made. On x86 the spkg should produce a working ATLAS library within 10-30 mins, and only go through the tuning process if either

  • CPU throttling is disabled (scaling_governor=performance, needs root), or
  • SAGE_ATLAS_ARCH is explicitly set to something different from "base" / "fast".
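A minimal Python sketch of the two conditions above, assuming the Linux cpufreq interface at /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor. The function names are hypothetical, and treating "base"/"fast" as never self-tuning (even on an unthrottled CPU) is an assumption about how the two conditions combine:

```python
import glob

GOVERNOR_GLOB = "/sys/devices/system/cpu/cpu*/cpufreq/scaling_governor"


def throttling_disabled(pattern=GOVERNOR_GLOB):
    """True if every CPU reports the 'performance' governor.

    Returns False when no cpufreq information is found (conservative
    assumption: we cannot prove throttling is off)."""
    paths = glob.glob(pattern)
    if not paths:
        return False
    for path in paths:
        with open(path) as f:
            if f.read().strip() != "performance":
                return False
    return True


def should_self_tune(atlas_arch, unthrottled):
    """Decide whether to run ATLAS's self-tuning (hypothetical logic)."""
    if atlas_arch in ("base", "fast"):
        return False          # precomputed generic archdefs, no tuning
    if atlas_arch:
        return True           # explicitly requested architecture
    return unthrottled        # default: tune only if timings are reliable


if __name__ == "__main__":
    print(should_self_tune(None, throttling_disabled()))
```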

comment:73 follow-up: Changed 10 years ago by vbraun

PS: Surprisingly enough, the new ATLAS spkg actually compiled on OSX (bsd.math)! If you want to try yourself just set SAGE_ATLAS_ARCH="base" to force building on OSX.

comment:74 in reply to: ↑ 73 Changed 10 years ago by benjaminfjones

Replying to vbraun:

PS: Surprisingly enough, the new ATLAS spkg actually compiled on OSX (bsd.math)! If you want to try yourself just set SAGE_ATLAS_ARCH="base" to force building on OSX.

Very cool. I just got ATLAS to build on my OS X 10.6.8 machine setting SAGE_ATLAS_ARCH=Corei2. Running long tests now.

comment:75 follow-up: Changed 10 years ago by benjaminfjones

Update: ATLAS-3.10.0 built successfully on my OS X 10.6.8 machine with SAGE_ATLAS_ARCH=Corei2, the build took approx. 16 mins. The build log is at http://sage.math.washington.edu/home/bjones/atlas-3.10.0.log. Sage passes all make ptestlong tests. The spkg looks very good to me. I'd give this a positive review, but maybe it should be tested by a few other reviewers on other platforms first.

comment:76 in reply to: ↑ 75 Changed 10 years ago by dimpase

Replying to benjaminfjones:

Update: ATLAS-3.10.0 built successfully on my OS X 10.6.8 machine with SAGE_ATLAS_ARCH=Corei2, the build took approx. 16 mins. The build log is at http://sage.math.washington.edu/home/bjones/atlas-3.10.0.log. Sage passes all make ptestlong tests. The spkg looks very good to me. I'd give this a positive review, but maybe it should be tested by a few other reviewers on other platforms first.

I've built it successfully on OS X 10.6.8 (with Core2 Duo) and setting SAGE_ATLAS_ARCH="base".

comment:77 Changed 10 years ago by leif

Well, there's at least room for nitpicking (a couple of typos and some inconsistencies as well as superfluous code in spkg-install and probably configuration.py, don't recall)... I'll maybe take a look at it again tomorrow, and probably provide a patch (provided Volker doesn't plan to make further major changes to these files).


How about also installing a user script for convenience to save the built ATLAS libraries to another place (for later use with SAGE_ATLAS_LIB)?

Regarding the mentioned excessive tuning times, I also wonder whether we should use something like SAGE_ATLAS_ARCH=fast (or base) by default, i.e., only do self-tuning if the user explicitly asks for it in some way. I guess the Sage Installation Guide needs to get updated anyway w.r.t. ATLAS and environment variables.

comment:78 Changed 10 years ago by leif

The root repo patch should get rebased for Sage 5.2.rc0.

comment:79 Changed 10 years ago by jhpalmieri

On hawk (OpenSolaris):

DONE configure
Finished configuring ATLAS.
Running make -j1
make[2]: Entering directory `/export/home/palmieri/testing/ATLAS/sage-5.2.rc0/spkg/build/atlas-3.10.0/ATLAS-build'
make[2]: warning: -jN forced in submake: disabling jobserver mode.
make -j1 -f Make.top build
make[3]: Entering directory `/export/home/palmieri/testing/ATLAS/sage-5.2.rc0/spkg/build/atlas-3.10.0/ATLAS-build'
Make.top:1: Make.inc: No such file or directory
Make.top:325: warning: overriding commands for target `/AtlasTest'
Make.top:76: warning: ignoring old commands for target `/AtlasTest'
make[3]: *** No rule to make target `Make.inc'.  Stop.
make[3]: Leaving directory `/export/home/palmieri/testing/ATLAS/sage-5.2.rc0/spkg/build/atlas-3.10.0/ATLAS-build'
make[2]: *** [build] Error 2
make[2]: Leaving directory `/export/home/palmieri/testing/ATLAS/sage-5.2.rc0/spkg/build/atlas-3.10.0/ATLAS-build'
------------------------------------------------------------
  File "./spkg-install", line 478, in <module>
    assert_success(rc, bad='Failed to build ATLAS.', good='Finished building ATLAS core.')
  File "./spkg-install", line 74, in assert_success
    traceback.print_stack(file=sys.stdout)
------------------------------------------------------------
Error:  Failed to build ATLAS.

real    4m10.778s
user    0m7.766s
sys     0m8.391s
Successfully installed atlas-3.10.0
Deleting temporary build directory
/export/home/palmieri/testing/ATLAS/sage-5.2.rc0/spkg/build/atlas-3.10.0
Finished installing atlas-3.10.0.spkg

I don't know why it's not building, but it shouldn't exit saying "Successfully installed atlas-3.10.0". I added a print statement, and "rc" is 512. The documentation for sys.exit says that for the argument, "Most systems require it to be in the range 0-127, and produce undefined results otherwise." We could instead do this:

    diff --git a/spkg-install b/spkg-install
    --- a/spkg-install
    +++ b/spkg-install
    @@ -74,8 +74,8 @@ def assert_success(rc, good=None, bad=No
         traceback.print_stack(file=sys.stdout)
         print '-'*60
         if bad is not None:
    -        print 'Error: ', bad
    -    sys.exit(rc)
    +        sys.exit('Error: %s' % bad)
    +    sys.exit(1)
     
     ######################################################################
     ### Skip building ATLAS on specific systems

comment:80 Changed 10 years ago by jhpalmieri

  • Status changed from needs_review to needs_work

comment:81 follow-up: Changed 10 years ago by leif

On Ubuntu 10.04.4 LTS x86_64 (AMD E-450), with Sage 5.2.rc0 and SAGE_ATLAS_ARCH=fast I get:

...

Building using specific architecture.
Fast configuration on Intel x86_64 compatible CPUs.
Running configure with arch = P4E64SSE3, isa extensions ('SSE3', 'SSE2', 'SSE1'), archdef dir None
Traceback (most recent call last):
  File "./spkg-install", line 454, in <module>
    rc = build()
  File "./spkg-install", line 447, in build
    rc = configure(arch, isa_ext, archdef_dir)
  File "./spkg-install", line 315, in configure
    cmd += ' -A '+str(ATLAS_MACHTYPE.index(arch))
ValueError: tuple.index(x): x not in tuple

real    0m0.701s
user    0m0.090s
sys     0m0.060s
************************************************************************
Error installing package atlas-3.10.0
************************************************************************

comment:82 Changed 10 years ago by fbissey

Built successfully on power7. There are a few oddities in the log, but I don't think they are important:

make -j1 atlas_run atldir=/hpc/scratch/frb15/sandbox/sage-5.1.beta5/spkg/build/atlas-3.10.0/ATLAS-build exe=xprobe_comp redir=config1.out \
                args="-v 0 -o atlconf.txt -O 1 -A 7 -Si nof77 0 -V 6  -Fa ic '-fPIC' -C sm 'gcc' -Fa sm '-fPIC' -C dm 'gcc' -Fa dm '-fPIC' -C sk 'gcc' -Fa sk '-fPIC' -C dk 'gcc' -Fa dk '-fPIC' -C xc 'gcc' -Fa xc '-fPIC' -Fa gc '-fPIC' -C if 'sage_fortran' -Fa if '-fPIC' -b 64 -d b /hpc/scratch/frb15/sandbox/sage-5.1.beta5/spkg/build/atlas-3.10.0/ATLAS-build"
make[1]: Entering directory `/hpc/scratch/frb15/sandbox/sage-5.1.beta5/spkg/build/atlas-3.10.0/ATLAS-build'
cd /hpc/scratch/frb15/sandbox/sage-5.1.beta5/spkg/build/atlas-3.10.0/ATLAS-build ; ./xprobe_comp -v 0 -o atlconf.txt -O 1 -A 7 -Si nof77 0 -V 6  -Fa ic '-fPIC' -C sm 'gcc' -Fa sm '-fPIC' -C dm 'gcc' -Fa dm '-fPIC' -C sk 'gcc' -Fa sk '-fPIC' -C dk 'gcc' -Fa dk '-fPIC' -C xc 'gcc' -Fa xc '-fPIC' -Fa gc '-fPIC' -C if 'sage_fortran' -Fa if '-fPIC' -b 64 -d b /hpc/scratch/frb15/sandbox/sage-5.1.beta5/spkg/build/atlas-3.10.0/ATLAS-build > config1.out
sh: -c: line 0: unexpected EOF while looking for matching ``'
sh: -c: line 1: syntax error: unexpected end of file
sh: -c: line 0: unexpected EOF while looking for matching ``'
sh: -c: line 1: syntax error: unexpected end of file
sh: -c: line 0: unexpected EOF while looking for matching ``'
sh: -c: line 1: syntax error: unexpected end of file
sh: -c: line 0: unexpected EOF while looking for matching ``'
sh: -c: line 1: syntax error: unexpected end of file
sh: -c: line 0: unexpected EOF while looking for matching ``'
sh: -c: line 1: syntax error: unexpected end of file
probe_f2c.o: In function `ATL_tmpnam':
/hpc/scratch/frb15/sandbox/sage-5.1.beta5/spkg/build/atlas-3.10.0/ATLAS-build/../src//CONFIG/include/atlas_sys.h:224: warning: the use of `tmpnam' is dangerous, better use `mkstemp'

I noticed that libcblas.so is using rpath:

ldd -r local/lib/libcblas.so.2.1.0 
        linux-vdso64.so.1 =>  (0x0000040000000000)
        libatlas.so.2 => /hpc/scratch/frb15/sandbox/sage-5.1.beta5/local/lib/libatlas.so.2 (0x0000040000060000)
        libpthread.so.0 => /lib64/power7/libpthread.so.0 (0x00000400008a0000)
        libm.so.6 => /lib64/power7/libm.so.6 (0x00000400008e0000)
        libc.so.6 => /lib64/power7/libc.so.6 (0x00000400009c0000)
        /lib64/ld64.so.1 (0x0000000024560000)

And that's from outside the Sage shell. However, for f77blas and presumably lapack, libgfortran is not rpath-ed (I am using Sage's gcc spkg in this case).

That is important info if you want to put atlas libraries in another location.
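When moving the ATLAS libraries elsewhere, a quick way to see which dependencies resolve outside the Sage tree is to parse ldd output like the one above. This is a small sketch assuming the glibc ldd line format "name => path (0xaddr)"; the helper names are hypothetical:

```python
import re

# Matches lines such as "libm.so.6 => /lib64/power7/libm.so.6 (0x...)".
# Lines without a resolved path (e.g. linux-vdso) or without "=>"
# (the dynamic loader itself) are skipped.
LDD_LINE = re.compile(r"^\s*(\S+)\s+=>\s+(\S+)\s+\(0x[0-9a-f]+\)")


def resolved_libraries(ldd_output):
    """Map each library name to the path ldd resolved it to."""
    result = {}
    for line in ldd_output.splitlines():
        m = LDD_LINE.match(line)
        if m:
            result[m.group(1)] = m.group(2)
    return result


def outside_prefix(ldd_output, prefix):
    """Sorted names of libraries resolved outside `prefix` (e.g. $SAGE_LOCAL)."""
    return sorted(name for name, path in resolved_libraries(ldd_output).items()
                  if not path.startswith(prefix))
```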

I will run a few tests shortly.

comment:83 Changed 10 years ago by benjaminfjones

Volker, do you still need architecture defaults for i386? I have access to an old laptop with a Centrino M processor that reports being i386 / i686 in uname -a and in /proc/cpuinfo:

cpu family 6
cpu model 13
Intel Pentium M

I haven't ever built Sage on it; I imagine it would take several years, but I'm willing to give it a try.

comment:84 follow-up: Changed 10 years ago by vbraun

  • John, I'm against changing the exit codes; we report whatever the sub-process spat out. So even if it's >127, that's just what we were handed, so clearly it's a supported exit code.

I don't have an account on hawk, but in any case that should not stop us from shipping an updated ATLAS that fixes build issues on modern hard/software. We can always clean up second-tier platforms later. But do post the whole log, the error is likely further up.

  • Leif, are you working on a patch? In configure_fast(), we should set
    arch = 'P4E'
    instead of 'P4E64SSE3'; the 64SSE3 part is added later. 'P4E64SSE3' is not an entry of ATLAS_MACHTYPE, which is what causes your build failure.
  • Francois: the weird messages are normal. The rpaths are set by libtool and are whatever libtool likes to set. As long as we set LD_LIBRARY_PATH this doesn't matter, except when we distribute the binaries. This needs to be fixed some day in all shared libraries, but not on this ticket.
  • Benjamin: no, I don't need 32-bit archdefs; these should be covered by what ATLAS already ships with. The atlas spkg should now build relatively quickly, except in the cases listed in comment 72.
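The failing line was cmd += ' -A '+str(ATLAS_MACHTYPE.index(arch)). A sketch of a lookup that fails with a clearer message is below. The tuple is passed in as a parameter because the real contents and order of ATLAS_MACHTYPE are not reproduced here; the toy tuple in the usage line is purely illustrative and its indices do not match ATLAS's:

```python
def machtype_flag(arch, machtypes):
    """Return the ' -A <index>' fragment for ATLAS's configure.

    `machtypes` is the ordered tuple of ATLAS machine names
    (ATLAS_MACHTYPE in spkg-install); the index is the position of
    `arch` in that tuple."""
    if arch not in machtypes:
        raise ValueError(
            "unknown ATLAS architecture %r; pass the bare machine name "
            "(e.g. 'P4E'), not an archdef name such as 'P4E64SSE3'" % arch)
    return " -A " + str(machtypes.index(arch))


if __name__ == "__main__":
    toy_machtypes = ("UNKNOWN", "P4", "P4E")   # illustrative only
    print(machtype_flag("P4E", toy_machtypes))
```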

comment:85 in reply to: ↑ 84 Changed 10 years ago by jhpalmieri

Replying to vbraun:

  • John, I'm against changing the exit codes; we report whatever the sub-process spat out. So even if it's >127, that's just what we were handed, so clearly it's a supported exit code.

But after sys.exit(512), Python has a return code of zero on sage.math, OS X, and OpenSolaris. So sage-spkg thinks the spkg installed correctly, which it clearly didn't. From the documentation for os.system, it looks to me like its output should be divided by 256 to get a return code suitable for sys.exit, but I'm not sure about that. However you want to fix it, it has to be changed: it's not acceptable for the compilation to fail but for spkg-install to have a return code of zero.
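The zero return code is easy to reproduce: on POSIX systems the parent only receives the low 8 bits of the exit status, so 512 (= 2 * 256) arrives as 0. A minimal sketch (modern Python 3, assuming a POSIX host):

```python
import subprocess
import sys


def child_exit_code(code):
    """Run a child Python that calls sys.exit(code) and return the
    exit code the parent process actually observes."""
    proc = subprocess.run([sys.executable, "-c", "import sys; sys.exit(%d)" % code])
    return proc.returncode


if __name__ == "__main__":
    for code in (1, 2, 512):
        print(code, "->", child_exit_code(code))
```

On Linux the last line reports 0, which is exactly why sage-spkg believed the failed build had succeeded.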

I don't have an account on hawk, but in any case that should not stop us from shipping an updated ATLAS that fixes build issues on modern hard/software. We can always clean up second-tier platforms later. But do post the whole log, the error is likely further up.

The log is posted at http://sage.math.washington.edu/home/palmieri/misc/atlas-3.10.0.log.

comment:86 Changed 10 years ago by jdemeyer

Something along the following lines should be used to handle rc, see http://docs.python.org/library/os.html

if os.WIFEXITED(rc):
    rc = os.WEXITSTATUS(rc)
elif os.WIFSIGNALED(rc):
    rc = 128 + os.WTERMSIG(rc)
else:
    raise SystemError("Unknown return value %i for os.system()"%rc)
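Wrapped into a self-contained helper (the name system_exit_code is hypothetical), the snippet above turns the raw os.system() status into a shell-style exit code. This uses the Unix-only os.W* functions; Python 3.9+ also offers os.waitstatus_to_exitcode(), which reports a signal as a negative number instead of 128 + n:

```python
import os


def system_exit_code(rc):
    """Convert an os.system() return value (a wait status on Unix)
    into an exit code suitable for sys.exit()."""
    if os.WIFEXITED(rc):
        return os.WEXITSTATUS(rc)          # normal exit: the real status
    if os.WIFSIGNALED(rc):
        return 128 + os.WTERMSIG(rc)       # shell convention for signals
    raise SystemError("Unknown return value %i for os.system()" % rc)


if __name__ == "__main__":
    rc = os.system("exit 2")               # raw wait status, 512 on Linux
    print(rc, "->", system_exit_code(rc))
```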

comment:87 Changed 10 years ago by vbraun

I've investigated and reported the Solaris build issue upstream at https://sourceforge.net/tracker/?func=detail&aid=3545418&group_id=23725&atid=379483

comment:88 Changed 10 years ago by vbraun

The return value is actually the return value of os.system(), which is described in http://docs.python.org/library/os.html#os.wait

To extract the exit status, we should just do sys.exit((rc >> 8) & 0x7f). Leif, are you working on the spkg right now?
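For reference, the traditional Unix wait-status layout (as used on Linux) is: bits 0-6 signal number, bit 7 core-dump flag, bits 8-15 exit status. A sketch decoding it by hand follows. Note that (rc >> 8) & 0x7f keeps the result within 0-127, as the sys.exit documentation quoted above asks, but it folds exit statuses 128-255 (e.g. 130 becomes 2, and 128 becomes 0), whereas & 0xff is what os.WEXITSTATUS computes:

```python
def decode_wait_status(status):
    """Decode a raw wait status (traditional Unix layout).

    Returns (exit_status, signal_number, core_dumped)."""
    signal_number = status & 0x7F          # bits 0-6
    core_dumped = bool(status & 0x80)      # bit 7
    exit_status = (status >> 8) & 0xFF     # bits 8-15
    return exit_status, signal_number, core_dumped


if __name__ == "__main__":
    print(decode_wait_status(512))         # make's "Error 2" as seen via os.system
```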

comment:89 Changed 10 years ago by vbraun

The workaround for the Solaris build issue is to use the fully qualified name (fqn, i.e. the full path) for $CC and sage_fortran.
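A sketch of resolving the compilers to full paths before handing them to ATLAS's configure. It assumes $CC holds a bare command name (no extra flags such as "gcc -m64"); fully_qualified is a hypothetical helper, not the spkg's code:

```python
import os
import shutil


def fully_qualified(cmd):
    """Return the absolute path of a command such as $CC or sage_fortran."""
    if os.path.isabs(cmd):
        return cmd
    path = shutil.which(cmd)
    if path is None:
        raise OSError("cannot find %r on PATH" % cmd)
    return os.path.abspath(path)


if __name__ == "__main__":
    for name in (os.environ.get("CC", "gcc"), "sage_fortran"):
        try:
            print(name, "->", fully_qualified(name))
        except OSError as err:
            print(err)
```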

comment:90 Changed 10 years ago by vbraun

Since Leif apparently isn't around, I implemented the fqn workaround for the Solaris build and fixed the return status issue. The Solaris build is still broken, but now at a different place. Updated spkg at the same place, md5sum is 878695a26071cfe73a9977bd8413b748.

comment:91 Changed 10 years ago by jhpalmieri

This version now builds on hawk: log file here.

comment:92 Changed 10 years ago by vbraun

Sounds good! I updated the spkg with yet another SPARC Solaris fix, md5sum is 6dbcf22c920626380f2cba877cca4cb1. It still doesn't work on mark/skynet, but at least it now makes it into the compile phase. In any case, SPARC Solaris issues shouldn't delay this ticket.

comment:93 in reply to: ↑ 81 ; follow-up: Changed 10 years ago by leif

Replying to leif:

On Ubuntu 10.04.4 LTS x86_64 (AMD E-450), with Sage 5.2.rc0 and SAGE_ATLAS_ARCH=fast I get:

ValueError: tuple.index(x): x not in tuple

Using SAGE_ATLAS_ARCH=base in contrast worked (and ptestlong passed with Sage 5.2.rc0, FWIW):

real    34m12.187s
user    30m3.170s
sys     5m30.850s
Successfully installed atlas-3.10.0

Still not that fast, but approximately within your estimates...

comment:94 in reply to: ↑ 93 ; follow-up: Changed 10 years ago by vbraun

Replying to leif:

real 34m12.187s

That's pretty good for 18W TDP. I take it compiling all of Sage takes 2+ hours on that machine?

comment:95 in reply to: ↑ 94 ; follow-up: Changed 10 years ago by leif

Replying to vbraun:

Replying to leif:

real 34m12.187s

That's pretty good for 18W TDP. I take it compiling all of Sage takes 2+ hours on that machine?

Sure. Although ATLAS currently consumes only <= 9W ;-)

Unfortunately ATLAS is built quite late (due to its odd dependency on Sage's Python, while your script is apparently designed to support Python 2.4 as well), so for a fair amount of the time spent building Sage only one core is used (because the remaining packages directly or indirectly depend on ATLAS).

I reinstalled the updated spkg with SAGE_ATLAS_ARCH=fast:

real    40m20.227s
user    36m26.220s
sys     6m9.500s
Successfully installed atlas-3.10.0

comment:96 follow-up: Changed 10 years ago by jhpalmieri

An update: on hawk, I unpacked a sage-5.2.rc0 tarball, replaced the old ATLAS spkg with this one, and built from scratch. There are a bunch of doctest failures:

The following tests failed:                                                                        
                                                                                                   
        sage -t  --long -force_lib devel/sage/sage/matrix/matrix2.pyx # 12 doctests failed
        sage -t  --long -force_lib devel/sage/sage/misc/functional.py # 1 doctests failed
        sage -t  --long -force_lib devel/sage/sage/finance/time_series.pyx # 6 doctests failed
        sage -t  --long -force_lib devel/sage/sage/numerical/test.py # Killed/crashed
        sage -t  --long -force_lib devel/sage/sage/modular/modform/numerical.py # 3 doctests failed
        sage -t  --long -force_lib devel/sage/sage/numerical/optimize.py # Killed/crashed
        sage -t  --long -force_lib devel/sage/sage/matrix/matrix_double_dense.pyx # 68 doctests failed
        sage -t  --long -force_lib devel/sage/doc/en/a_tour_of_sage/index.rst # Killed/crashed
        sage -t  --long -force_lib devel/sage/doc/en/numerical_sage/cvxopt.rst # Killed/crashed
        sage -t  --long -force_lib devel/sage/doc/fr/a_tour_of_sage/index.rst # Killed/crashed
        sage -t  --long -force_lib devel/sage/doc/tr/a_tour_of_sage/index.rst # Killed/crashed
        sage -t  --long -force_lib devel/sage/sage/combinat/e_one_star.py # Killed/crashed

For example:

File "/export/home/palmieri/testing/ATLAS/sage-5.2.rc0/devel/sage-main/sage/matrix/matrix2.pyx", line 8157:
    sage: (A - M*G).zero_at(10^-12)
Expected:
    [0.0 0.0 0.0]
    [0.0 0.0 0.0]
    [0.0 0.0 0.0]
Got:
    [                                 0.0                                  0.0                                  0.0]
    [  -0.10532733041 + 0.0950573490006*I   0.017805411596 - 0.0512258178986*I -0.0226596712913 + 0.0414519876977*I]
    [  0.100615400305 + 0.0962034401538*I -0.0779990660567 - 0.0543172822202*I   0.057608664751 + 0.0154619373789*I]

and

File "/export/home/palmieri/testing/ATLAS/sage-5.2.rc0/devel/sage-main/sage/misc/functional.py", line 1144:
    sage: norm(M)
Expected:
    10.6903311292
Got:
    10.4323182134

I'll try to build again in case something went wrong the first time.

comment:97 in reply to: ↑ 96 Changed 10 years ago by leif

Replying to jhpalmieri:

An update: on hawk, I unpacked a sage-5.2.rc0 tarball, replaced the old ATLAS spkg with this one, and built from scratch. There are a bunch of doctest failures:

Did you apply the root repo patch (to remove the BLAS and LAPACK spkgs)?

comment:98 Changed 10 years ago by jhpalmieri

Oops, no, I forgot. One more time...

comment:99 Changed 10 years ago by jhpalmieri

Now Sage doesn't build on hawk, I guess due to the problems noted on #10509: cvxopt doesn't build, because it says

ld: fatal: library -lblas: not found

I'll skip building cvxopt and continue with the rest of the build.

comment:100 follow-up: Changed 10 years ago by vbraun

I don't think the cvxopt problem is due to #10509. The cvxopt spkg explicitly links against blas, this is bad. From the cvxopt patches/setup.py.patch:

+ libraries = ['m','lapack','gsl','blas','gslcblas','cblas','gfortran','atlas']

this is wrong, it should be f77blas if the Fortran version is actually used or not there at all. Of course all modern systems have a libblas.so somewhere so the linker finds it, notes that it is not used, and proceeds. Except that on Hawk, I guess, there is no system-wide libblas. We should proceed removing blas in this ticket and then fix cvxopt on second-tier platforms later.
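As a sketch of the suggested fix, based only on the libraries list quoted above: drop 'blas' and add 'f77blas' only when the Fortran BLAS interface is actually used. The function name and the placement of 'f77blas' (before 'atlas', which it depends on) are assumptions, not the actual cvxopt patch:

```python
def cvxopt_libraries(uses_fortran_blas):
    """Hypothetical link list for cvxopt's setup.py without the generic 'blas'."""
    libs = ['m', 'lapack', 'gsl', 'gslcblas', 'cblas', 'gfortran', 'atlas']
    if uses_fortran_blas:
        # ATLAS's Fortran BLAS wrapper; must come before 'atlas' in link order.
        libs.insert(libs.index('cblas'), 'f77blas')
    return libs
```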

comment:101 Changed 10 years ago by jhpalmieri

I meant that linking to blas was noted at #10509 as a possible problem, not that #10509 was causing this issue.

comment:102 Changed 10 years ago by vbraun

You are right: the patch on #10509 should have been applied to cvxopt a long time ago; then it wouldn't break here.

comment:103 in reply to: ↑ 95 ; follow-up: Changed 10 years ago by leif

Replying to leif:

Replying to vbraun:

Replying to leif:

real 34m12.187s

Thats pretty good for 18W TDP. I take it compiling all of Sage takes 2+ hours on that machine?

[...]

I reinstalled the updated spkg again with SAGE_ATLAS_ARCH=fast:

real    40m20.227s
user    36m26.220s
sys     6m9.500s
Successfully installed atlas-3.10.0

ROFL, with SAGE_ATLAS_ARCH="AMD64K10h,SSE3,SSE2,SSE1,3DNow" (which involves self-tuning) it took

real	36m15.153s
user	31m23.290s
sys	5m56.960s
Successfully installed atlas-3.10.0

Also a bit strange is that the timing for ptestlong (all for Sage 5.2.rc0, GCC 4.4.3) was

base < fast < AMD64K10h

(i.e., fastest with SAGE_ATLAS_ARCH=base), although I think at least during the last run the machine was partially loaded with other stuff as well, and clearly ptestlong isn't very appropriate to benchmark ATLAS performance... ;-)

[Not going to use the ATLAS tools for comparison right now, perhaps later...]

comment:104 Changed 10 years ago by leif

P.S.:

Other weird things are (non-fatal w.r.t. the build) errors like

FlagCheck.c:1: error: bad value (ultrasparc) for -mtune= switch
FlagCheck.c:1: error: bad value (ultrasparc) for -mtune= switch
FlagCheck.c:1: error: bad value (ultrasparc) for -mtune= switch
FlagCheck.c:1: error: bad value (ultrasparc) for -mtune= switch
FlagCheck.c:1: error: bad value (armv7) for -march= switch
FlagCheck.c:1: error: bad value (armv7) for -mtune= switch
FlagCheck.c:1: error: bad value (ultrasparc) for -mtune= switch
FlagCheck.c:1: error: bad value (ultrasparc) for -mtune= switch
FlagCheck.c:1: error: bad value (ultrasparc) for -mtune= switch
FlagCheck.c:1: error: bad value (ultrasparc) for -mtune= switch
FlagCheck.c:1: error: bad value (970) for -mtune= switch

even if one specifies the architecture (i.e., on x86).

comment:105 Changed 10 years ago by jhpalmieri

On hawk: with the appropriate patches applied, I still get some test failures, but as far as I can tell, they're all due to cvxopt being broken. So it looks pretty good.

comment:106 follow-up: Changed 10 years ago by leif

Is it intentional that static libraries no longer get installed (although built)?

comment:107 Changed 10 years ago by fbissey

On cvxopt: I am doing a new spkg in #13160; I'll check what I have done there. My main issue with the current spkg is that it is horribly overlinked.

comment:108 in reply to: ↑ 106 ; follow-up: Changed 10 years ago by vbraun

Replying to leif:

Is it intentional that static libraries no longer get installed (although built)?

Upstream really only builds static libraries. But static libraries suck for our purposes. So yes, it is intentional that the static libraries are not installed.

comment:109 in reply to: ↑ 100 Changed 10 years ago by dimpase

Replying to vbraun:

I don't think the cvxopt problem is due to #10509. The cvxopt spkg explicitly links against blas, this is bad. From the cvxopt patches/setup.py.patch:

+ libraries = ['m','lapack','gsl','blas','gslcblas','cblas','gfortran','atlas']

this is wrong, it should be f77blas if the Fortran version is actually used or not there at all. Of course all modern systems have a libblas.so somewhere so the linker finds it, notes that it is not used, and proceeds. Except that on Hawk, I guess, there is no system-wide libblas. We should proceed removing blas in this ticket and then fix cvxopt on second-tier platforms later.

Perhaps this ticket should get #13160 as a dependency. One thing I checked is that it appears to work on OSX 10.6.8, both with native blas/lapack and with ATLAS 3.10 from this ticket. I imagine #13160 can get finalized quickly.

comment:110 in reply to: ↑ 103 Changed 10 years ago by leif

Replying to leif:

[Not going to use the ATLAS tools for comparison right now, perhaps later...]

FWIW, while make time works, make atlvat2.pdf ... (to build ATLAS vs. ATLAS comparison charts) seems to be broken -- for me it always fails with a buffer overflow.

comment:111 in reply to: ↑ 108 Changed 10 years ago by leif

Replying to vbraun:

Replying to leif:

Is it intentional that static libraries no longer get installed (although built)?

Upstream really only builds static libraries. But static libraries suck for our purposes. So yes, it is intentional that the static libraries are not installed.

Well, as long as the shared libraries are also present, they're (usually) preferred over the static ones (i.e., unless one explicitly asks for linking against the latter), so copying these into $SAGE_LOCAL/lib/ IMHO shouldn't hurt. (The static libraries are btw. needed to compare different ATLAS installations; the only way to "keep" them is to reinstall the ATLAS spkg with ./sage -f -s ... or to set SAGE_KEEP_BUILT_SPKGS, and manually copy them.)

Note that previously installed static ATLAS libraries currently don't get removed. I don't know whether that may cause trouble (e.g. with upgrading); see above.

comment:112 Changed 10 years ago by vbraun

It's true that it doesn't hurt to have the static libraries as long as you don't use them. This is like saying that a knife doesn't hurt until you are stabbed with it. True, but why put a sharp blade under the couch pillow in hopes that nobody will sit on it?

It would be nice to have some system to compare different atlas versions and compile runs, but that's definitely for another ticket. Ideally the atlas-config Python script could save the atlas libraries in a private directory, for example by setting a special environment variable while building atlas, and then have some way to tabulate the performance of different installs.
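A minimal sketch of the "private directory" idea. The environment variable SAGE_ATLAS_SAVE_DIR and the function name are hypothetical; nothing in the spkg defines them. If the variable is set, the built static archives are copied there instead of into $SAGE_LOCAL/lib:

```python
import glob
import os
import shutil


def save_static_libraries(build_lib_dir, env=None):
    """Copy *.a archives from build_lib_dir into $SAGE_ATLAS_SAVE_DIR
    (hypothetical variable) if it is set. Returns the copied paths."""
    env = os.environ if env is None else env
    target = env.get("SAGE_ATLAS_SAVE_DIR")
    if not target:
        return []                      # feature not requested
    os.makedirs(target, exist_ok=True)
    copied = []
    for lib in sorted(glob.glob(os.path.join(build_lib_dir, "*.a"))):
        copied.append(shutil.copy2(lib, target))   # keeps timestamps
    return copied
```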

comment:113 Changed 10 years ago by vbraun

  • Dependencies changed from sage-5.2.beta0 to #13160
  • Status changed from needs_work to needs_review

I think the only remaining blocker is that it (or rather, cvxopt) doesn't build on hawk. Since I don't have an account, can someone test it (the spkg + patches from this ticket + the cvxopt spkg from #13160)? Everything else in this ticket has been reviewed already, we just have to check that the interaction with cvxopt is fixed on the last "fully supported" platform.

comment:114 Changed 10 years ago by jhpalmieri

Cvxopt still doesn't build. I see the same error when using the ATLAS spkg here or when setting SAGE_ATLAS_LIB=/ATLAS32. Here is the log.

Does the ATLAS spkg here build on skynet/mark (and Solaris on sparc in general)?

comment:115 Changed 10 years ago by fbissey

Why the heck is it not finding gsl? Oh I see the include line is actually wrong. I'll check the spkg but we should continue with cvxopt issues at #13160.

comment:116 Changed 10 years ago by vbraun

John, now that you verified that it works on Hawk is there anything else that prevents you from pressing the positive review button? ;-)

comment:117 Changed 10 years ago by jhpalmieri

I'm testing on a few skynet machines. Should I expect it to work on mark?

comment:118 follow-up: Changed 10 years ago by vbraun

It worked for me on mark (sparc solaris)

comment:119 in reply to: ↑ 118 Changed 10 years ago by jdemeyer

Replying to vbraun:

It worked for me on mark (sparc solaris)

OMG cool, Skynet is back up. I was totally unaware of that!

comment:120 Changed 10 years ago by leif

I'm still not happy with "discarding" the built static libraries; there should at least be some convenient way to save them (other than SAGE_KEEP_BUILT_SPKGS=yes or installing with sage (-i|-f) -s ....)

Another issue is the extremely increased build time on some machines if one doesn't set SAGE_ATLAS_ARCH. I don't know how we could handle that, but it certainly gives rise to a lot of user complaints.

comment:121 Changed 10 years ago by leif

P.S.: W.r.t. the "knife": If you don't want to install static libraries (somewhere), at least previous ones should get removed (or moved somewhere else) upon a successful ATLAS build.

comment:122 follow-up: Changed 10 years ago by vbraun

Now that I added the generic 64-bit archdefs the default build time (without setting SAGE_ATLAS_ARCH) should be moderate on all x86 systems. I.e. less CPU time than building the rest of Sage.

Your suggestions about handling the static libraries are enhancement requests. By itself, it's useless to keep a backup of the static libraries somewhere. I agree that one should keep them around and devise a way to benchmark them, but not on this ticket. Also, I'm against attempting to delete stuff from previous installs unless it actively conflicts with the new spkg. Which it does not; the damage of statically linking is already done.

comment:123 Changed 10 years ago by jhpalmieri

I'm willing to give this a positive review now. Leif, what about you? Can we defer your issues to a follow-up?

comment:124 in reply to: ↑ 122 ; follow-up: Changed 10 years ago by leif

Replying to vbraun:

Now that I added the generic 64-bit archdefs the default build time (without setting SAGE_ATLAS_ARCH) should be moderate on all x86 systems. I.e. less CPU time than building the rest of Sage.

Ok, hopefully...


Your suggestions about handling the static libraries are enhancement requests. By itself, it's useless to keep a backup of the static libraries somewhere. I agree that one should keep them around and devise a way to benchmark them, but not on this ticket.

I'd rather say not installing them [anywhere] is a regression w.r.t. the previous spkg.

Also I'm against attempting to delete stuff from previous installs unless it actively conflicts with the new spkg. Which it does not, the damage of statically linking is already done.

If so, it shouldn't hurt to keep ATLAS installing them either... ;-)

[I'd expect "more damage" when having different .a and .so library versions.]

comment:125 Changed 10 years ago by leif

I don't want to hold up this ticket, but IMHO the Installation Guide should get updated, at least to document the new atlas-config script.

comment:126 in reply to: ↑ 124 Changed 10 years ago by vbraun

Replying to leif:

I'd rather say not installing them [anywhere] is a regression w.r.t. the previous spkg.

It's a major improvement, not a regression!

If so, it shouldn't hurt to keep ATLAS installing them either... ;-)

As I explained previously, that's not true. We have to change things to make them better. But we can't un-link the static linkage that has happened previously, so when you upgrade from an existing spkg you potentially keep the old code. And there is nothing a new atlas spkg can do about this. If you want to be sure that you don't have cruft statically linked, you'll have to do a clean install. This is precisely why it was a bad idea to install static libraries previously.

Changed 10 years ago by vbraun

Initial patch

comment:127 Changed 10 years ago by vbraun

  • Description modified (diff)

I've added documentation to the installation guide for the atlas-config script and updated the environment variables.
