Ticket #6276 (closed defect: fixed)

Opened 15 months ago

Last modified 14 months ago

[with patch; positive review] atlas-3.8.3.p2 dumps core on Solaris 10 with gcc 4.4.0

Reported by: drkirkby Owned by: drkirkby
Priority: critical Milestone: sage-4.1
Component: solaris Keywords: solaris atlas
Cc: Author(s): David Kirkby
Report Upstream: Reviewer(s): William Stein
Merged in: sage-4.1.rc0 Work issues:

Description

The ATLAS build fails while building sage-4.0.1.alpha0 on t2.math.washington.edu (a Sun T5240 running Solaris 10 update 4). (A sqlite bug had to be fixed first before Sage could get as far as building ATLAS.) Here is information about the build system:

kirkby@t2:~/sage-4.0.1.alpha0$ uname -a
SunOS t2 5.10 Generic_127111-09 sun4v sparc SUNW,T5240
kirkby@t2:~/sage-4.0.1.alpha0$ cat /etc/release
                       Solaris 10 8/07 s10s_u4wos_12b SPARC
           Copyright 2007 Sun Microsystems, Inc.  All Rights Reserved.
                        Use is subject to license terms.
                            Assembled 16 August 2007

I posted this to sage-devel under the title "atlas-3.8.3.p2 failing on Solaris 10 with gcc-4.4.0". William Stein copied it to Clint Whaley, the main ATLAS developer.

Here is the last part of the build output, including the error. A nearly complete copy of the output from building ATLAS is in the attached file; I removed only the 2500 or so lines of tar output produced as the files were extracted.

Dave

make[6]: Entering directory `/home/kirkby/sage-4.0.1.alpha0/spkg/build/atlas-3.8.3.p2/ATLAS-build/src/testing'
make[6]: `zlib.grd' is up to date.
make[6]: Leaving directory `/home/kirkby/sage-4.0.1.alpha0/spkg/build/atlas-3.8.3.p2/ATLAS-build/src/testing'
make clib.grd
make[6]: Entering directory `/home/kirkby/sage-4.0.1.alpha0/spkg/build/atlas-3.8.3.p2/ATLAS-build/src/testing'
make[6]: `clib.grd' is up to date.
make[6]: Leaving directory `/home/kirkby/sage-4.0.1.alpha0/spkg/build/atlas-3.8.3.p2/ATLAS-build/src/testing'
make[5]: Leaving directory `/home/kirkby/sage-4.0.1.alpha0/spkg/build/atlas-3.8.3.p2/ATLAS-build/src/testing'
make INSTALL_LOG/L1CacheSize
make[5]: Entering directory `/home/kirkby/sage-4.0.1.alpha0/spkg/build/atlas-3.8.3.p2/ATLAS-build/bin'
cp /home/kirkby/sage-4.0.1.alpha0/spkg/build/atlas-3.8.3.p2/ATLAS-build/tune/sysinfo/res/L1CacheSize INSTALL_LOG/.
make[5]: Leaving directory `/home/kirkby/sage-4.0.1.alpha0/spkg/build/atlas-3.8.3.p2/ATLAS-build/bin'
make INSTALL_LOG/sMULADD pre=s
make[5]: Entering directory `/home/kirkby/sage-4.0.1.alpha0/spkg/build/atlas-3.8.3.p2/ATLAS-build/bin'
cp /home/kirkby/sage-4.0.1.alpha0/spkg/build/atlas-3.8.3.p2/ATLAS-build/tune/sysinfo/res/sMULADD INSTALL_LOG/.
make[5]: Leaving directory `/home/kirkby/sage-4.0.1.alpha0/spkg/build/atlas-3.8.3.p2/ATLAS-build/bin'
make INSTALL_LOG/dMULADD pre=d
make[5]: Entering directory `/home/kirkby/sage-4.0.1.alpha0/spkg/build/atlas-3.8.3.p2/ATLAS-build/bin'
cp /home/kirkby/sage-4.0.1.alpha0/spkg/build/atlas-3.8.3.p2/ATLAS-build/tune/sysinfo/res/dMULADD INSTALL_LOG/.
make[5]: Leaving directory `/home/kirkby/sage-4.0.1.alpha0/spkg/build/atlas-3.8.3.p2/ATLAS-build/bin'
make[4]: Leaving directory `/home/kirkby/sage-4.0.1.alpha0/spkg/build/atlas-3.8.3.p2/ATLAS-build/bin'
make[4]: Entering directory `/home/kirkby/sage-4.0.1.alpha0/spkg/build/atlas-3.8.3.p2/ATLAS-build/bin'
cd /home/kirkby/sage-4.0.1.alpha0/spkg/build/atlas-3.8.3.p2/ATLAS-build/tune/blas/gemm ; make res/dMMRES pre=d nb=88
make[5]: Entering directory `/home/kirkby/sage-4.0.1.alpha0/spkg/build/atlas-3.8.3.p2/ATLAS-build/tune/blas/gemm'
make xmmsearch
make[6]: Entering directory `/home/kirkby/sage-4.0.1.alpha0/spkg/build/atlas-3.8.3.p2/ATLAS-build/tune/blas/gemm'
make[6]: `xmmsearch' is up to date.
make[6]: Leaving directory `/home/kirkby/sage-4.0.1.alpha0/spkg/build/atlas-3.8.3.p2/ATLAS-build/tune/blas/gemm'
./xmmsearch -p d
Precision='d', FORCE=0, LAT=-1, nreg=-1, MaxL1=128
NB setting not supplied; calculating:
make[6]: Entering directory `/home/kirkby/sage-4.0.1.alpha0/spkg/build/atlas-3.8.3.p2/ATLAS-build/tune/blas/gemm'
rm -f res/L1CacheSize
cd /home/kirkby/sage-4.0.1.alpha0/spkg/build/atlas-3.8.3.p2/ATLAS-build/tune/sysinfo ; make res/L1CacheSize
make[7]: Entering directory `/home/kirkby/sage-4.0.1.alpha0/spkg/build/atlas-3.8.3.p2/ATLAS-build/tune/sysinfo'
make[7]: `res/L1CacheSize' is up to date.
make[7]: Leaving directory `/home/kirkby/sage-4.0.1.alpha0/spkg/build/atlas-3.8.3.p2/ATLAS-build/tune/sysinfo'
ln -s /home/kirkby/sage-4.0.1.alpha0/spkg/build/atlas-3.8.3.p2/ATLAS-build/tune/sysinfo/res/L1CacheSize res/L1CacheSize
make[6]: Leaving directory `/home/kirkby/sage-4.0.1.alpha0/spkg/build/atlas-3.8.3.p2/ATLAS-build/tune/blas/gemm'

      Read in L1 Cache size as = 8KB.
tmp=4, tL1size=1024

      Read in L1 Cache size as = 8KB.
L1Size=1024, pre=d, Smallnb=0
Assertion failed: nb, file /home/kirkby/sage-4.0.1.alpha0/spkg/build/atlas-3.8.3.p2/ATLAS-build/../src//tune/blas/gemm/mmsearch.c, line 1106
mmnreg = 47

NB's to try: 28   20   24   16   32

make[5]: *** [res/dMMRES] Abort (core dumped)
make[5]: Leaving directory `/home/kirkby/sage-4.0.1.alpha0/spkg/build/atlas-3.8.3.p2/ATLAS-build/tune/blas/gemm'
make[4]: *** [/home/kirkby/sage-4.0.1.alpha0/spkg/build/atlas-3.8.3.p2/ATLAS-build/tune/blas/gemm/res/dMMRES] Error 2
make[4]: Leaving directory `/home/kirkby/sage-4.0.1.alpha0/spkg/build/atlas-3.8.3.p2/ATLAS-build/bin'
Assertion failed: fp, file /home/kirkby/sage-4.0.1.alpha0/spkg/build/atlas-3.8.3.p2/ATLAS-build/../src//bin/atlas_install.c, line 376


IN STAGE 1 INSTALL:  SYSTEM PROBE/AUX COMPILE


   Level 1 cache size calculated as 8KB
   dFPU: Separate multiply and add instructions with 1 cycle pipeline.
         Apparent number of registers : 3
         Register-register performance=330.93MFLOPS
   sFPU: Separate multiply and add instructions with 2 cycle pipeline.
         Apparent number of registers : 5
         Register-register performance=642.85MFLOPS


IN STAGE 2 INSTALL:  TYPE-DEPENDENT TUNING


STAGE 2-1: TUNING PREC='d' (precision 1 of 4)


   STAGE 2-1-1 : BUILDING BLOCK MATMUL TUNE
make -f Makefile INSTALL_LOG/dMMRES pre=d 2>&1 | ./xatlas_tee INSTALL_LOG/dMMSEARCH.LOG
Abort - core dumped
make[3]: *** [build] Error 134
make[3]: Leaving directory `/home/kirkby/sage-4.0.1.alpha0/spkg/build/atlas-3.8.3.p2/ATLAS-build'
make[2]: *** [build] Error 2
make[2]: Leaving directory `/home/kirkby/sage-4.0.1.alpha0/spkg/build/atlas-3.8.3.p2/ATLAS-build'
Failed to build ATLAS.
Failed to build ATLAS.

real    46m55.681s
user    37m30.636s
sys     2m54.685s
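
For anyone triaging a similar failure, the assertion lines are the useful part of a log this size. Below is a minimal sketch in Python for pulling them out of a build log. The function names and the regular expression are illustrative assumptions, not part of ATLAS or Sage; the pattern is based only on the two "Assertion failed" lines shown above.

```python
import gzip
import re

# Matches assert output in the form seen in the log above, e.g.
# "Assertion failed: nb, file /path/to/mmsearch.c, line 1106"
ASSERT_RE = re.compile(
    r"Assertion failed: (?P<expr>.+), file (?P<file>\S+), line (?P<line>\d+)"
)

def find_assertion_failures(lines):
    """Return (expression, source file, line number) for each failed assertion."""
    hits = []
    for text in lines:
        m = ASSERT_RE.search(text)
        if m:
            hits.append((m.group("expr"), m.group("file"), int(m.group("line"))))
    return hits

def scan_log(path):
    """Scan a plain or gzip-compressed log file (chosen by the .gz suffix)."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt", errors="replace") as fh:
        return find_assertion_failures(fh)
```

Something like scan_log('install.log.gz') could then be run against the attached log, assuming it has been downloaded to the current directory.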

Attachments

install.log.gz Download (78.0 KB) - added by drkirkby 15 months ago.
Compressed copy of install process.

Change History

Changed 15 months ago by drkirkby

Compressed copy of install process.

Changed 15 months ago by was

  • priority changed from blocker to critical

Changed 15 months ago by drkirkby

  • owner changed from tbd to drkirkby
  • status changed from new to assigned
  • summary changed from atlas-3.8.3.p2 dumps core on Solaris 10 with gcc 4.4.0 to [with patch; needs review] atlas-3.8.3.p2 dumps core on Solaris 10 with gcc 4.4.0

I've now developed a TEMPORARY fix for this, which needs a review.

Change GuessSmallNB() in src/tune/blas/gemm/mmsearch.c as suggested by Clint Whaley to return 28 on Solaris. This is ONLY A TEMPORARY FIX and once the real problem in the function is sorted out, this fix will need to be removed. But for now it permits ATLAS to build on a Sun T5240 with gcc-4.4.0.

Apart from the comments, the only change to the C source code is to add

return(28);

at the top of the function GuessSmallNB(). This fix is only implemented on Solaris, as the spkg-install now includes:

import os
import shutil
# Install the patched mmsearch.c on Solaris only (temporary workaround).
if os.uname()[0] == 'SunOS':
    shutil.copy2('patches/mmsearch-with-temp-Solaris-fix.c', 'src/tune/blas/gemm/mmsearch.c')

With this patch applied, ATLAS builds on Solaris, with the next Solaris failure being in 'linbox'.
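
The Solaris-only copy step could also be written as a small function that is easier to test on other platforms and fails clearly if the patched file is missing. This is only a sketch: the name apply_solaris_patch and its system parameter are hypothetical and are not what the spkg-install actually contains.

```python
import os
import shutil

def apply_solaris_patch(patched, target, system=None):
    """Copy `patched` over `target` only on SunOS; return True if copied.

    `system` defaults to os.uname()[0]; passing it explicitly is an
    assumption added here so the logic can be exercised off Solaris.
    """
    if system is None:
        system = os.uname()[0]
    if system != 'SunOS':
        return False  # leave the upstream source untouched elsewhere
    if not os.path.isfile(patched):
        raise IOError("patched source not found: %s" % patched)
    shutil.copy2(patched, target)
    return True
```

With the paths from this ticket, the call would be apply_solaris_patch('patches/mmsearch-with-temp-Solaris-fix.c', 'src/tune/blas/gemm/mmsearch.c').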

Once an ATLAS developer finds the real reason for the failure, the ATLAS source should be updated, at which point this patch should be removed.

Please see  http://sage.math.washington.edu/home/kirkby/Solaris-fixes/atlas/

I've NOT used 'hg' to commit this in any way (I'm not even sure whether I or the reviewer is supposed to do that), so could the reviewer please do this for me?

Dave

Changed 15 months ago by drkirkby

  • status changed from assigned to new

I see there is a p3 of this package in sage-4.0.2.rc3.tar, so the version should be updated to p4, which will need changes to SPKG.txt.

Changed 15 months ago by drkirkby

I've updated it, so it should be ready to test.

ATLAS sure does take some time to build! Hours and hours.

Changed 15 months ago by was

> ATLAS sure does take some time to build! Hours and hours.

If ATLAS doesn't have pretuning information about a given machine it takes hours and hours. When it does have that tuning information cached, it takes about 15 minutes. There is a database of pretuning info included in the ATLAS spkg. We have to figure out how to include t2's tuning info.

I did start this build, so hopefully I can give this a positive review in hours and hours :-)

Changed 15 months ago by was

  • summary changed from [with patch; needs review] atlas-3.8.3.p2 dumps core on Solaris 10 with gcc 4.4.0 to [with patch; positive review] atlas-3.8.3.p2 dumps core on Solaris 10 with gcc 4.4.0

Changed 14 months ago by rlm

  • status changed from new to closed
  • reviewer set to William Stein
  • resolution set to fixed
  • merged set to sage-4.1.rc0

drkirkby -- can you set the Author line for this ticket, and add your full name to the front page?

Changed 14 months ago by mvngu

  • author set to David Kirkby