Opened 9 years ago

Closed 8 years ago

Last modified 8 years ago

#11168 closed defect (fixed)

rubiks fails doctest with gcc 4.6.0 and -O2 optimisation.

Reported by: drkirkby Owned by: drkirkby
Priority: blocker Milestone: sage-4.7
Component: porting: Solaris Keywords:
Cc: jhpalmieri Merged in: sage-4.7.rc2
Authors: David Kirkby, Jeroen Demeyer Reviewers: John Palmieri, David Kirkby
Report Upstream: Fixed upstream, but not in a stable release. Work issues:
Branch: Commit:
Dependencies: Stopgaps:

Description (last modified by jdemeyer)

On a Sun Ultra 27 running OpenSolaris 06/2009, the rubiks-20070912.p12 package builds, but fails a doctest

drkirkby@hawk:~/sage-4.7.alpha3$ ./sage -t  -long -force_lib devel/sage/sage/interfaces/rubik.py
sage -t -long -force_lib "devel/sage/sage/interfaces/rubik.py"
**********************************************************************
File "/export/home/drkirkby/sage-4.7.alpha3/devel/sage/sage/interfaces/rubik.py", line 132:
    sage: solver = OptimalSolver() # long time
Exception raised:
    Traceback (most recent call last):
      File "/export/home/drkirkby/sage-4.7.alpha3/local/bin/ncadoctest.py", line 1231, in run_one_test
        self.run_one_example(test, example, filename, compileflags)
      File "/export/home/drkirkby/sage-4.7.alpha3/local/bin/sagedoctest.py", line 38, in run_one_example
        OrigDocTestRunner.run_one_example(self, test, example, filename, compileflags)
      File "/export/home/drkirkby/sage-4.7.alpha3/local/bin/ncadoctest.py", line 1172, in run_one_example
        compileflags, 1) in test.globs
      File "<doctest __main__.example_3[4]>", line 1, in <module>
        solver = OptimalSolver() # long time###line 132:
    sage: solver = OptimalSolver() # long time
      File "/export/home/drkirkby/sage-4.7.alpha3/local/lib/python/site-packages/sage/interfaces/rubik.py", line 98, in __init__
        self.ready()
      File "/export/home/drkirkby/sage-4.7.alpha3/local/lib/python/site-packages/sage/interfaces/rubik.py", line 117, in ready
        self.child.expect('enter cube')
      File "/export/home/drkirkby/sage-4.7.alpha3/local/lib/python/site-packages/pexpect.py", line 912, in expect
        return self.expect_list(compiled_pattern_list, timeout, searchwindowsize)
      File "/export/home/drkirkby/sage-4.7.alpha3/local/lib/python/site-packages/pexpect.py", line 978, in expect_list
        raise EOF (str(e) + '\n' + str(self))
    EOF: End Of File (EOF) in read_nonblocking(). Empty string style platform.
    <pexpect.spawn instance at 0xc5367ac>
    version: 2.0 ($Revision: 1.151 $)
    command: /export/home/drkirkby/sage-4.7.alpha3/local/bin/optimal
    args: ['/export/home/drkirkby/sage-4.7.alpha3/local/bin/optimal']
    patterns:
        enter cube
    buffer (last 100 chars): 
    before (last 100 chars): olutions
    
    initializing transformation tables
    
    init_fulledge_to_edgequot : too many  edgequot's

    after: <class 'pexpect.EOF'>
    match: None
    match_index: None
    exitstatus: None
    flag_eof: 1
    pid: 10994
    child_fd: 3
    timeout: None
    delimiter: <class 'pexpect.EOF'>
    logfile: None
    maxread: 2000
    searchwindowsize: None
    delaybeforesend: 0.1
**********************************************************************
1 items had failures:
   1 of  12 in __main__.example_3
***Test Failed*** 1 failures.
For whitespace errors, see the file /export/home/drkirkby/.sage//tmp/.doctest_rubik.py
         [9.4 s]
 
----------------------------------------------------------------------
The following tests failed:


        sage -t -long -force_lib "devel/sage/sage/interfaces/rubik.py"
Total time for all tests: 9.4 seconds

However, if the optimisation level is changed from -O2 to -O1, the program behaves properly.

drkirkby@hawk:~/sage-4.7.alpha3$ ./sage -t  -long -force_lib devel/sage/sage/interfaces/rubik.py
sage -t -long -force_lib "devel/sage/sage/interfaces/rubik.py"
         [26.7 s]
 
----------------------------------------------------------------------
All tests passed!
Total time for all tests: 26.7 seconds

Similar problems have been observed on other platforms too. A revised package, which works around the gcc 4.6.0 bug by adding the flag -fno-ivopts:

http://boxen.math.washington.edu/home/jdemeyer/spkg/rubiks-20070912.p16.spkg

Reported upstream to gcc: http://gcc.gnu.org/bugzilla/show_bug.cgi?id=48702

For other gcc 4.6.0 specific problems see #11216

Attachments (3)

rubiks-optimisation-problem.patch (3.0 KB) - added by drkirkby 8 years ago.
Patch which changes optimisation to -O1 if using gcc 4.6.0 (for review purposes only - changes are in the .spkg)
rubiks-20070912.p14-p15.diff (2.1 KB) - added by jdemeyer 8 years ago.
diff p14 -> p15 (for reviewing only)
rubiks-20070912.p15-p16.diff (1.3 KB) - added by jdemeyer 8 years ago.
Diff for the rubiks spkg, for reviewing only

Download all attachments as: .zip

Change History (42)

comment:1 Changed 9 years ago by drkirkby

  • Authors set to David Kirkby
  • Description modified (diff)
  • Status changed from new to needs_review
  • Type changed from PLEASE CHANGE to defect

comment:2 Changed 9 years ago by drkirkby

  • Description modified (diff)

comment:3 Changed 9 years ago by jhpalmieri

  • Reviewers set to John Palmieri
  • Status changed from needs_review to positive_review

The patch is very simple. The new spkg builds on both hawk (OpenSolaris on x86, with gcc 4.5.0 and gcc 4.6.0) and t2 (Solaris on sparc, with gcc 4.5.1 and 4.6.0), and tests pass in all cases. With the old version, tests failed on both machines using gcc 4.6.0 (but everything worked fine with the earlier version of gcc).

Note: to build with gcc 4.6.0 on t2, I had to set LD_LIBRARY_PATH and unset LD_OPTIONS; I couldn't get LD_OPTIONS to work. But I don't really know how to properly use these variables anyway...

comment:4 Changed 9 years ago by jdemeyer

  • Merged in set to sage-4.7.alpha5
  • Resolution set to fixed
  • Status changed from positive_review to closed

comment:5 Changed 9 years ago by jdemeyer

  • Merged in sage-4.7.alpha5 deleted
  • Resolution fixed deleted
  • Status changed from closed to new

comment:6 Changed 9 years ago by jdemeyer

  • Status changed from new to needs_work

Marking this as needs_work because there are more systems with the same symptom. In particular, on the buildbot Fedora 14-32 (cicero).

comment:7 follow-up: Changed 9 years ago by jdemeyer

Also, I really dislike simply lowering the optimization level. That is wiping the problem under the carpet, not solving it. Also should be reported upstream.

comment:8 in reply to: ↑ 7 Changed 8 years ago by drkirkby

Replying to jdemeyer:

Also, I really dislike simply lowering the optimization level. That is wiping the problem under the carpet, not solving it. Also should be reported upstream.

In the short term, I don't see we can do anything other than just lower the optimisation.

It should be noted that there are quite a few Sage patches applied - some directly to the source code, which should not have been done, but has been done. This package has been patched 13 times already!

Dave

comment:9 follow-up: Changed 8 years ago by drkirkby

  • Description modified (diff)
  • Status changed from needs_work to needs_review

Here's a revised package, which reduces the optimisation level to -O1 (as my previous patch), but this time only on all platforms, and not just Solaris.

Dave

comment:10 in reply to: ↑ 9 Changed 8 years ago by drkirkby

  • Summary changed from rubiks fails doctest on OpenSolaris x86 with gcc 4.6.0 and -O2 optimisation. to rubiks fails doctest with gcc 4.6.0 and -O2 optimisation.

Replying to drkirkby:

Here's a revised package, which reduces the optimisation level to -O1 (as my previous patch), but this time only on all platforms, and not just Solaris.

Dave

The grammar was not very good there! The point being, this patch changes the optimisation to -O1 on all platforms.

Dave

comment:11 follow-up: Changed 8 years ago by mariah

  • Report Upstream changed from Not yet reported upstream; Will do shortly. to Reported upstream. Little or no feedback.
  • Status changed from needs_review to needs_work

I believe only the file src/reid/optimal.c needs to be reduced to -O1 due to an optimization regression in gcc-4.6.0. See gcc bugzilla # 48702 http://gcc.gnu.org/bugzilla/show_bug.cgi?id=48702

comment:12 Changed 8 years ago by mariah

  • Report Upstream changed from Reported upstream. Little or no feedback. to Reported upstream. Developers acknowledge bug.

comment:13 in reply to: ↑ 11 Changed 8 years ago by jdemeyer

Replying to mariah:

I believe only the file src/reid/optimal.c needs to be reduced to -O1 due to an optimization regression in gcc-4.6.0. See gcc bugzilla # 48702 http://gcc.gnu.org/bugzilla/show_bug.cgi?id=48702

Many thanks for sorting this out!!!

I'm not so sure whether it makes sense to change the optimization level for just 1 file. Maybe the whole directory src/reid yes, but one file?

comment:14 Changed 8 years ago by drkirkby

I don't know for sure, but I don't believe any of this is going to be too time-critical. I can't somehow seeing one wanting to solve lots of rubiks cubes and given solving them takes very little time, I don't see this as very important if we lose a few clock cycles here or there.

dave

comment:15 Changed 8 years ago by jdemeyer

Since this is clearly a gcc 4.6.0 bug, I suggest to check in spkg-install whether we are using gcc 4.6.0 and only reduce the optimization level to -O1 if so.

Another annoying thing about this package is that some of the patches in patches/ are the wrong way around (they are patches from the patched version to the original version).

comment:16 Changed 8 years ago by drkirkby

  • Description modified (diff)
  • Status changed from needs_work to needs_review

I've now changed this so instead of reducing the optimisation on Solaris, it does it on all platforms with gcc 4.6.0. First it tests for gcc using $SAGE_LOCAL/bin/testcc.sh, then if we are using gcc, a second test is performed for gcc 4.6.0 by adding the 'dumpversion' option to gcc.

The following is the critical bit of new code.

OPTIMIZATION_FLAGS="-O2"
if [ "x`$SAGE_LOCAL/bin/testcc.sh $CC`" = xGCC ] && [ "x`$CC -dumpversion`" = x4.6.0 ] ; then 
   echo "Warning: Reducing the optimisation level to -O1 due to a bug in gcc 4.6.0"
   OPTIMIZATION_FLAGS="-O1"
fi

There are some typos corrected too - see the attached patch.

Dave

Changed 8 years ago by drkirkby

Patch which changes optimisation to -O1 if using gcc 4.6.0 (for review purposes only - changes are in the .spkg)

comment:17 Changed 8 years ago by drkirkby

  • Description modified (diff)

comment:18 follow-up: Changed 8 years ago by jdemeyer

  • Authors changed from David Kirkby to David Kirkby, Jeroen Demeyer
  • Description modified (diff)

David, in principle I agree with your changes. However, the gcc report indicates a problem with gcc/tree-ssa-loop-ivopts.c and because of this it suffices to disable -fivopts. New spkg which does this (and also simplifies the gcc version check): http://boxen.math.washington.edu/home/jdemeyer/spkg/rubiks-20070912.p15.spkg

comment:19 in reply to: ↑ 18 ; follow-up: Changed 8 years ago by drkirkby

Replying to jdemeyer:

David, in principle I agree with your changes. However, the gcc report indicates a problem with gcc/tree-ssa-loop-ivopts.c and because of this it suffices to disable -fivopts. New spkg which does this (and also simplifies the gcc version check): http://boxen.math.washington.edu/home/jdemeyer/spkg/rubiks-20070912.p15.spkg

I think my test

if [ "x`$SAGE_LOCAL/bin/testcc.sh $CC`" = xGCC ] && [ "x`$CC -dumpversion`" = x4.6.0 ] ; then  

for the version is preferable. I can't see the point in calling a C compiler with --version then greping the output, where there is a documented option (-dumpversion) to show just the version,

Also, this could well fail for another compiler with a version of 4.6.0 - I think its safer to test for gcc first.

Dave

comment:20 in reply to: ↑ 19 Changed 8 years ago by jdemeyer

Replying to drkirkby:

Replying to jdemeyer:

David, in principle I agree with your changes. However, the gcc report indicates a problem with gcc/tree-ssa-loop-ivopts.c and because of this it suffices to disable -fivopts. New spkg which does this (and also simplifies the gcc version check): http://boxen.math.washington.edu/home/jdemeyer/spkg/rubiks-20070912.p15.spkg

I think my test

if [ "x`$SAGE_LOCAL/bin/testcc.sh $CC`" = xGCC ] && [ "x`$CC -dumpversion`" = x4.6.0 ] ; then  

for the version is preferable. I can't see the point in calling a C compiler with --version then greping the output, where there is a documented option (-dumpversion) to show just the version,

Also, this could well fail for another compiler with a version of 4.6.0 - I think its safer to test for gcc first.

My test does test for gcc in the --version output.

If I change the gcc test to use your original test code, would the rest of my patch get positive_review?

comment:21 follow-up: Changed 8 years ago by jdemeyer

David, another minor issue with your test: I would change [ "x`$CC -dumpversion`" = x4.6.0 ] to [ "x`$CC -dumpversion 2>/dev/null`" = x4.6.0 ] in case that $CC does not support -dumpversion.

comment:22 in reply to: ↑ 21 ; follow-ups: Changed 8 years ago by drkirkby

Replying to jdemeyer:

David, another minor issue with your test: I would change [ "x`$CC -dumpversion`" = x4.6.0 ] to [ "x`$CC -dumpversion 2>/dev/null`" = x4.6.0 ] in case that $CC does not support -dumpversion.

I need to be doing something else now (gardening), so don't have chance to do much with this, but a few quick notes.

  • My machine is rather busy now, doing a scrub of the disks (Google scrub and ZFS if you want). The test took 130.9 seconds, compared to the 26.7 seconds with just -O1. That's slowed down by a factor of almost 5. Perhaps the load on my machine has caused this, but can you check this option has not slowed things significantly. If it slower than -O1, then I think it would be better to revert to -O1.
  • My test does not use the -dumpversion option unless the compiler is first confirmed to be gcc, since the second part of the test (the version) will not be executed unless the first part (the type of compiler), is confirmed to be GCC. So I don't really see the need to send stderr to /dev/null, but I'm not bothered about that fact.
  • A very minor point, but it would be worth putting a link to the gcc bug in SPKG.txt.

Assuming you can confirm the new compiler option given does not cause a dramatic slowdown, and change the test, then feel free to change the ticket to positive review - don't wait for me. Issues about SPKG.txt and /dev/null are not too critical - make what changes you feel are appropriate.

Dave

comment:23 in reply to: ↑ 22 Changed 8 years ago by jdemeyer

Replying to drkirkby:

can you check this option has not slowed things significantly. If it slower than -O1, then I think it would be better to revert to -O1.

On my machine (Linux arcanis 2.6.34-gentoo-r12 #2 SMP Mon Dec 6 17:16:04 CET 2010 x86_64 Intel(R) Core(TM)2 Duo CPU T5870 @ 2.00GHz GenuineIntel GNU/Linux) with gcc 4.6.0, the long test sage/interfaces/rubik.py takes:

  1. p14 (using -O1): 39.9 s
  2. p15 (using -O2 -fno-ivopts): 37.5 s

comment:24 in reply to: ↑ 22 ; follow-up: Changed 8 years ago by jdemeyer

Replying to drkirkby:

  • My test does not use the -dumpversion option unless the compiler is first confirmed to be gcc, since the second part of the test (the version) will not be executed unless the first part (the type of compiler), is confirmed to be GCC. So I don't really see the need to send stderr to /dev/null, but I'm not bothered about that fact.

Can you guarantee that every compiler which defines __GNUC__ understands the option -dumpversion?

Changed 8 years ago by jdemeyer

diff p14 -> p15 (for reviewing only)

comment:25 in reply to: ↑ 24 Changed 8 years ago by drkirkby

Replying to jdemeyer:

Replying to drkirkby:

  • My test does not use the -dumpversion option unless the compiler is first confirmed to be gcc, since the second part of the test (the version) will not be executed unless the first part (the type of compiler), is confirmed to be GCC. So I don't really see the need to send stderr to /dev/null, but I'm not bothered about that fact.

Can you guarantee that every compiler which defines __GNUC__ understands the option -dumpversion?

I think if any compiler is going to aim to be GCC compatible then it would need to support the '-dumpversion' option. If it does not, then I would consider that a compiler bug. The Intel compiler supports '-dumpversion' - see

http://software.intel.com/sites/products/documentation/hpc/compilerpro/en-us/cpp/lin/main_cls_lin.pdf

Clang does too:

https://llvm.org/viewvc/llvm-project/cfe/trunk/include/clang/Driver/Options.td?sortby=date&r1=106956&r2=106955&pathrev=106956&diff_format=s

GCC For Sun Systems uses the gcc front end and takes the GNU options. It actually uses the GCC source code, so I assume that would take the '-dumpversion' option too.

I'm not aware of any other GCC compatible compiler. But if they really are GCC compatible, they should take the '-dumpversion' option.

I've just checked gcc 3.4.3, and that takes the '-dumpversion' option.

drkirkby@hawk:~$ /usr/bin/gcc -dumpversion
3.4.3
drkirkby@hawk:~$ /usr/bin/g++ -dumpversion
3.4.3
drkirkby@hawk:~$ 

Just out of interest, I checked on t2.math (a SPARC system), and that supports -dumpversion' too.

kirkby@t2:32 ~$ /usr/sfw/bin/gcc -dumpversion
3.4.3

I would use more caution if it was the Fortran compiler we were testing, as the option gave a different output (same as --version) in older versions of gfortran. But that bug was fixed a year or two ago, so that now 'gfortran' gives just the version number if given the option '-dumpversion'. But that test in the rubiks package only tests the C compiler, so it's immaterial what gfortran does.

I would also note we are using the -dumpversion' compiler option already. The 'prereq' script uses two autoconf macros (ax_gcc_version.m4 & ax_gxx_version.m4) which use the option. That is run to check the versions of gcc and g++ are the same. So I think if there was a problem using the -dumpversion' compiler option, we would have known about it before.

Dave

comment:26 follow-up: Changed 8 years ago by jdemeyer

  • Priority changed from major to blocker

David, I agree with everything you say, but I hope you can agree that the 2>/dev/null doesn't hurt anyway. Can it get positive_review now?

comment:27 in reply to: ↑ 26 Changed 8 years ago by drkirkby

  • Status changed from needs_review to positive_review

Replying to jdemeyer:

David, I agree with everything you say, but I hope you can agree that the 2>/dev/null doesn't hurt anyway. Can it get positive_review now?

Sure.

comment:28 Changed 8 years ago by jdemeyer

  • Merged in set to sage-4.7.rc1
  • Resolution set to fixed
  • Status changed from positive_review to closed

comment:29 Changed 8 years ago by jdemeyer

  • Merged in sage-4.7.rc1 deleted
  • Resolution fixed deleted
  • Status changed from closed to new

comment:30 Changed 8 years ago by jdemeyer

  • Description modified (diff)
  • Status changed from new to needs_review

comment:31 Changed 8 years ago by jdemeyer

  • Reviewers changed from John Palmieri to John Palmieri, David Kirkby

Changed 8 years ago by jdemeyer

Diff for the rubiks spkg, for reviewing only

comment:32 follow-up: Changed 8 years ago by mariah

  • Status changed from needs_review to needs_work

The patch will be applied for later versions than just gcc-4.6.0. Recommend that the patch only be applied for gcc-4.6.0.

comment:33 in reply to: ↑ 32 Changed 8 years ago by jdemeyer

  • Status changed from needs_work to needs_review

Replying to mariah:

The patch will be applied for later versions than just gcc-4.6.0. Recommend that the patch only be applied for gcc-4.6.0.

Why?

There already exist pre-release versions of gcc 4.6.1 manifesting the same problem.

I think that having the workaround for all versions 4.6.x doesn't really hurt. It is certainly a safer alternative than applying the workaround only for gcc 4.6.0. If at some point, this bug is fixed in a later gcc 4.6.x, then we can still change the spkg accordingly.

comment:34 Changed 8 years ago by jdemeyer

  • Description modified (diff)

comment:35 follow-up: Changed 8 years ago by jdemeyer

Can somebody please review? Thanks.

comment:36 in reply to: ↑ 35 Changed 8 years ago by drkirkby

  • Status changed from needs_review to positive_review

Replying to jdemeyer:

Can somebody please review? Thanks.

Sure:

drkirkby@hawk:~/sage-4.7.rc0$  ./sage -t  -long -force_lib devel/sage/sage/interfaces/rubik.py
sage -t -long -force_lib "devel/sage/sage/interfaces/rubik.py"
	 [25.5 s]
 
----------------------------------------------------------------------
All tests passed!
Total time for all tests: 25.6 seconds

so that option is working to resolve the bug on OpenSolaris too. This looks good to me.

Dave

comment:37 Changed 8 years ago by jdemeyer

  • Merged in set to sage-4.7.rc2
  • Resolution set to fixed
  • Status changed from positive_review to closed

comment:38 Changed 8 years ago by jdemeyer

  • Report Upstream changed from Reported upstream. Developers acknowledge bug. to Fixed upstream, but not in a stable release.

comment:39 Changed 8 years ago by jdemeyer

See #11437 for a follow-up.

Note: See TracTickets for help on using tickets.