Opened 9 years ago

Closed 9 years ago

Last modified 9 years ago

#10042 closed defect (fixed)

Doctest failure in sage/rings/polynomial/polynomial_element.pyx

Reported by: mpatel Owned by: mvngu
Priority: blocker Milestone: sage-4.6
Component: doctest coverage Keywords:
Cc: mhansen, zimmerma, was, jason Merged in: sage-4.6.rc0
Authors: Dima Pasechnik Reviewers: Leif Leonhardy
Report Upstream: N/A Work issues:
Branch: Commit:
Dependencies: Stopgaps:

Description (last modified by mpatel)

With the forthcoming 4.6.alpha2, I get this doctest failure on the Skynet machines cicero (x86-Linux-pentium4-fc) and fulvia (x86_64-SunOS-core2):

sage -t -long "devel/sage/sage/rings/polynomial/polynomial_element.pyx"
**********************************************************************
File "/home/mpatel/build/fulvia/sage-4.6.alpha2/devel/sage/sage/rings/polynomial/polynomial_element.pyx", line 4316:
    sage: f.roots(algorithm='numpy')
Expected:
    doctest... UserWarning: NumPy does not support arbitrary precision arithmetic.  The roots found will likely have less precision than you expect.
    [(-1.7724538509055158819194275565678253769874572753906250000000, 1), (1.7724538509055158819194275565678253769874572753906250000000, 1)]
Got:
    doctest:1: UserWarning: NumPy does not support arbitrary precision arithmetic.  The roots found will likely have less precision than you expect.
    [(-1.7724538509055158819194275565678253769874572753906250000000, 1), (1.7724538509055161039640324815991334617137908935546875000000, 1)]

Related: #8054.
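
For reference, the failure boils down to this short session (compare comment 12 below); on the affected machines one of the two roots comes out one unit in the last place (ulp) off:

sage: R.<x> = RealField(200)[]
sage: f = x^2 - R(pi)
sage: f.roots(algorithm='numpy')   # the last digits of one root are machine-dependent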

Attachments (1)

trac_10042_xsquare_pi.patch (1.1 KB) - added by dimpase 9 years ago.
the patch that puts the ellipses in the right places


Change History (88)

comment:1 Changed 9 years ago by mpatel

  • Description modified (diff)

comment:2 Changed 9 years ago by zimmerma

  • Status changed from new to needs_info

Probably something has changed in roots(algorithm='numpy') between 4.6.alpha1 and 4.6.alpha2. Was numpy upgraded? Is the 4.6.alpha2 source available somewhere?

Paul

comment:3 Changed 9 years ago by mpatel

We didn't upgrade NumPy. We released 4.6.alpha2 just a few hours ago. The source is in

http://sage.math.washington.edu/home/release/sage-4.6.alpha2

A list of merged tickets is here.

comment:4 Changed 9 years ago by zimmerma

it works on my machine (64-bit Core 2) with sage-4.6.alpha2, thus it is a machine-dependent problem. Please could someone who has access to cicero or fulvia isolate this problem? The code starts at line 4560 in rings/polynomial/polynomial_element.pyx.

Maybe it is a problem in NumPy itself, or maybe in the conversion between RealField and the NumPy types.

Note the difference is only one unit in the last place (ulp) between the "got" and "expected" values, thus it might be a conversion error too.
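
For the record, the "got" and "expected" roots are adjacent 53-bit doubles; a quick numpy check (a sketch, assuming a numpy recent enough to provide nextafter; the literals are the shortest reprs of the two 60-digit values above):

import numpy
a = 1.7724538509055159    # the expected root
b = 1.7724538509055161    # the root obtained on the failing machines
print(b - a)                         # 2.220446049250313e-16, i.e. 2^-52
print(numpy.nextafter(a, 2.0) == b)  # True: b is the very next representable double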

Paul

comment:5 Changed 9 years ago by drkirkby

On OpenSolaris x86:

sage -t  -long devel/sage/sage/rings/polynomial/polynomial_element.pyx
**********************************************************************
File "/export/home/drkirkby/sage-4.6.alpha2/devel/sage-main/sage/rings/polynomial/polynomial_element.pyx", line 4316:
    sage: f.roots(algorithm='numpy')
Expected:
    doctest... UserWarning: NumPy does not support arbitrary precision arithmetic.  The roots found will likely have less precision than you expect.
    [(-1.7724538509055158819194275565678253769874572753906250000000, 1), (1.7724538509055158819194275565678253769874572753906250000000, 1)]
Got:
    doctest:1: UserWarning: NumPy does not support arbitrary precision arithmetic.  The roots found will likely have less precision than you expect.
    [(-1.7724538509055158819194275565678253769874572753906250000000, 1), (1.7724538509055161039640324815991334617137908935546875000000, 1)]
**********************************************************************

I've not looked at this code, but is this only using machine precision? (That's what the NumPy message semi-implies.) If so, why is there any surprise that this acts differently on different machines? Perhaps I've missed the point of this.

Dave

comment:6 follow-up: Changed 9 years ago by zimmerma

I've not looked at this code, but is this only using machine precision? If so, why is there any surprise this acts differently on different machines?

if the code is only using basic arithmetic operations and the square root, which is what it should do to solve x^2-pi (which is what the failing test does), the behaviour should not depend on the machine used, since those operations are standardized by IEEE 754.
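
indeed, the "expected" value is exactly the correctly rounded double square root of the double nearest pi, which MPFR (correctly rounded by definition) can confirm in Sage; a minimal check:

sage: s = RR(pi).sqrt()   # 53-bit MPFR square root, correctly rounded
sage: RealField(200)(s)   # converting a double to 200 bits is exact
1.7724538509055158819194275565678253769874572753906250000000

(Incidentally, the long double values in comment 78 below show that the exact square root lies barely below the midpoint between two adjacent doubles, which makes this example unusually sensitive to how the final rounding is performed.)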

Paul

comment:7 in reply to: ↑ 6 Changed 9 years ago by drkirkby

Replying to zimmerma:

I've not looked at this code, but is this only using machine precision? If so, why is there any surprise this acts differently on different machines?

if the code is only using basic arithmetic operations and the square root, which is what it should do to solve x^2-pi (which is what the failing test does), the behaviour should not depend on the machine used, since those operations are standardized by IEEE 754.

Paul

But the maths library is not standardised by IEEE 754. It's known, for example, that exp(1.0) gives a different answer on SPARC than on x86.

That said, given that the answers should be sqrt(pi) and -sqrt(pi), I must admit one might hope to get the same magnitude for both.

I just created a noddy C program on my OpenSolaris machine where this failed.

drkirkby@hawk:~$ cat test.c
#include <stdio.h>
#include <math.h>

volatile double x=M_PI;
int main() {
    printf ("%.70lf\n",sqrt(x));
}

drkirkby@hawk:~$ gcc -lm test.c
drkirkby@hawk:~$ ./a.out
1.7724538509055158819194275565678253769874572753906250000000000000000000

comment:8 follow-up: Changed 9 years ago by zimmerma

I just created a noddy C program on my OpenSolaris machine where this failed.

what did fail? You got the expected result.

Paul

comment:9 Changed 9 years ago by kcrisman

Another datapoint. On OS X 10.4 on PPC, I get

Expected:
    doctest... UserWarning: NumPy does not support arbitrary precision arithmetic.  The roots found will likely have less precision than you expect.
    [(-1.7724538509055158819194275565678253769874572753906250000000, 1), (1.7724538509055158819194275565678253769874572753906250000000, 1)]
Got:
    doctest:1: UserWarning: NumPy does not support arbitrary precision arithmetic.  The roots found will likely have less precision than you expect.
    [(-1.7724538509055161039640324815991334617137908935546875000000, 1), (1.7724538509055158819194275565678253769874572753906250000000, 1)]

in Sage 4.6.alpha2

comment:10 follow-up: Changed 9 years ago by zimmerma

Karl-Dieter, please could you isolate the problem (see instructions in comment 4)? In particular we need to know if the problem lies in NumPy itself, or in the RR<->NumPy conversion.

Paul

comment:11 in reply to: ↑ 8 Changed 9 years ago by drkirkby

Replying to zimmerma:

I just created a noddy C program on my OpenSolaris machine where this failed.

what did fail? You got the expected result.

Paul

What I meant by "where this failed" was that the C program was compiled and built on the same computer on which the Sage doctest failed.

I realise that the result I got from the noddy C program was the same as was expected in the Sage doctest.

I don't have the time to investigate this.

Dave

comment:12 in reply to: ↑ 10 Changed 9 years ago by kcrisman

Replying to zimmerma:

Karl-Dieter, please could you isolate the problem (see instructions in comment 4)? In particular we need to know if the problem lies in NumPy itself, or in the RR<->NumPy conversion.

Like Dave, I don't have time to do much with this. Apparently something goes 'wrong' in RealField, based on what I see here:

----------------------------------------------------------------------
| Sage Version 4.6.alpha2, Release Date: 2010-09-29                  |
| Type notebook() for the GUI, and license() for information.        |
----------------------------------------------------------------------
**********************************************************************
*                                                                    *
* Warning: this is a prerelease version, and it may be unstable.     *
*                                                                    *
**********************************************************************
sage: R.<x> = RealField(200)[]
sage: f = x^2 - R(pi)
sage: f.roots(algorithm='numpy')
/Users/student/Desktop/sage-4.6.alpha2/local/bin/sage-ipython:1: UserWarning: NumPy does not support arbitrary precision arithmetic.  The roots found will likely have less precision than you expect.
  #!/usr/bin/env python
[(-1.7724538509055161039640324815991334617137908935546875000000, 1), (1.7724538509055158819194275565678253769874572753906250000000, 1)]
sage: seq=[]
sage: K = f.parent().base_ring()
sage: L = K
sage: K
Real Field with 200 bits of precision
sage: input_fp = True
sage: L  
Real Field with 200 bits of precision
sage: output_fp=True
sage: input_complex=False
sage: output_complex=False
sage: input_arbprec=True        
sage: import numpy
sage: from numpy.linalg.linalg import LinAlgError
sage: coeffs = f.list()
sage: coeffs
[-3.1415926535897932384626433832795028841971693993751058209749, 0.00000000000000000000000000000000000000000000000000000000000, 1.0000000000000000000000000000000000000000000000000000000000]
sage: ty = float                 
sage: numpy_dtype='double'
sage: numpy_array=numpy.array([ty(c) for c in reversed(coeffs)],dtype=numpy_dtype)
sage: ext_rts1 = numpy.roots(numpy_array)
sage: ext_rts1
array([ 1.77245385, -1.77245385])
sage: rts = []
sage: for rt in ext_rts1:
....:     rts.append(CDF(rt))
....:     
sage: rts.sort()
sage: ext_rts = rts
sage: ext_rts
[-1.77245385091, 1.77245385091]
sage: rts = sorted([L(root.real()) for root in ext_rts if root.imag() == 0])
sage: rts
[-1.7724538509055161039640324815991334617137908935546875000000, 1.7724538509055158819194275565678253769874572753906250000000]

which is the same as my final output, I think. Note that ext_rts still have the same magnitude. This is a big-endian machine, I guess (PPC)? I have no idea if that is relevant, and I note that the error seems to be the opposite of the one others reported.
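
Note, though, that the very last step cannot introduce the error: converting a 53-bit double into RealField(200) is exact, so the bad digit must already be present in the double that numpy returns. A quick check (a sketch; the literal is the shortest repr of the offending double):

sage: RealField(200)(float(1.7724538509055161))
1.7724538509055161039640324815991334617137908935546875000000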

comment:13 follow-up: Changed 9 years ago by zimmerma

thank you Karl-Dieter for your help. It is much appreciated. On my computer I get more digits if I print elements of ext_rts1 separately:

sage: ext_rts1[0]
-1.7724538509055159
sage: ext_rts1[1]
1.7724538509055159

which shows that they are exactly opposite:

sage: ext_rts1[0]+ext_rts1[1]
0.0

Similarly I get:

sage: rts[0]
-1.77245385091
sage: rts[1]
1.77245385091
sage: rts[0]+rts[1]
0

What do you get for the above commands?

Paul

comment:14 in reply to: ↑ 13 Changed 9 years ago by kcrisman

Replying to zimmerma:

thank you Karl-Dieter for your help. It is much appreciated. On my computer I get more digits [...] What do you get for the above commands?

Okay, not the same.

sage: ext_rts1[0]
1.7724538509055159
sage: ext_rts1[1]
-1.7724538509055161
sage: ext_rts1[0]+ext_rts1[1]
-2.2204460492503131e-16
sage: rts[0]
-1.7724538509055161039640324815991334617137908935546875000000
sage: rts[1]
1.7724538509055158819194275565678253769874572753906250000000
sage: rts[0]+rts[1]
-2.2204460492503130808472633361816406250000000000000000000000e-16

This happens whether I use 4.6.alpha2 (numpy-1.3.0) or 4.6.alpha1 + numpy-1.4.1 (I happened to be working on that ticket on this computer; of course that's not merged yet). Was this doctest added in alpha2?

comment:15 Changed 9 years ago by jhpalmieri

Up to a sign, I get the same thing on fulvia:

sage: ext_rts1[0]
-1.7724538509055159
sage: ext_rts1[1]
1.7724538509055161
sage: ext_rts1[0] + ext_rts1[1]
2.2204460492503131e-16
sage: rts[0] + rts[1]
2.2204460492503130808472633361816406250000000000000000000000e-16

I think the doctest is new between 4.6.alpha1 and 4.6.alpha2, as well as the warning message about NumPy not supporting arbitrary precision arithmetic.

comment:16 Changed 9 years ago by zimmerma

  • Cc was added
  • Status changed from needs_info to needs_work
  • Work issues set to report upstream (numpy)

thanks Karl-Dieter and John. This clearly shows this is an upstream (numpy) problem. Does anybody know how to report the problem upstream? The numpy SPKG.txt file says that *Josh* Kantor is the maintainer, but on trac.sagemath.org I can only see *Jeff* Kantor.

Paul

comment:17 Changed 9 years ago by zimmerma

Was this doctest added in alpha2?

yes, see #8054.

Paul

comment:18 Changed 9 years ago by kcrisman

With respect to upstream, it's worth pointing out that we are a couple of releases behind. Eventually John or I might be able to try this with the numpy 1.5.0 that is now on Trac # something, but it's possible it's already fixed there, so you may want to try the latest numpy and see about that first.

comment:19 Changed 9 years ago by zimmerma

so you may want to try the latest numpy and see about that first

unfortunately I cannot reproduce the original problem on my computer, thus trying the latest numpy will surely work for me, but will not help resolve the issue.

Paul

comment:20 Changed 9 years ago by jhpalmieri

I downloaded the new numpy spkg from #9808 and I've installed it on fulvia. Now I'm running sage -ba, and then I'll rerun the doctest here.

comment:21 Changed 9 years ago by jhpalmieri

Well, that didn't work ("ImportError: ld.so.1: python: fatal: relocation error:"). So I'll try again, building everything from scratch.

comment:22 follow-up: Changed 9 years ago by zimmerma

Dave kindly opened an account for me on his computer. I did investigate: the problem lies in the eigvals routine from numpy, which itself calls the dgeev routine from lapack. I am stuck there, since the dgeev routine is quite complex. Again, the maintainers of the lapack spkg are Josh Kantor and William.

Paul

comment:23 Changed 9 years ago by jhpalmieri

I'm also stuck because the new version of numpy from #9808 doesn't really build successfully on fulvia, so I can't test whether the new numpy fixes the problem. One suggestion at #9808 was that lapack and/or ATLAS are broken on fulvia, which may be causing the problems with the new numpy build. I suppose it might also cause the problem on this ticket, from what you're saying.

comment:24 in reply to: ↑ 22 Changed 9 years ago by kcrisman

Replying to zimmerma:

Dave kindly opened an account for me on his computer. I did investigate: the problem lies in the eigvals routine from numpy, which itself calls the dgeev routine from lapack. I am stuck there, since the dgeev routine is quite complex. Again, the maintainers of the lapack spkg are Josh Kantor and William.

I get the same problem with the new numpy package, although I didn't do ./sage -ba since it would take forever... I don't think that matters, though, since

sage: numpy.version.version
'1.5.0'

so it's using the new one.

You may want to email sage-devel on this one to ask who would be checking up on lapack upstream; I don't know, since William has a lot of other things going on and is often a 'default' spkg maintainer, and I'm not sure Kantor has been active much (HISTORY.txt on the website indicates his last 'contribution' in terms of refereeing or code was about a year ago).

Feel free to place any commands here that would enable someone to easily replicate what you found on another machine, by the way.

comment:25 Changed 9 years ago by zimmerma

Feel free to place any commands here that would enable someone to easily replicate what you found on another machine, by the way.

Here is what I did on Dave's machine:

R.<x> = RealField(200)[]
f = x^2 - R(pi)
import numpy
coeffs = f.list()
ty = float
numpy_dtype='double'
numpy_array=numpy.array([ty(c) for c in reversed(coeffs)],dtype=numpy_dtype)
ext_rts1 = numpy.roots(numpy_array)
ext_rts1[0]
ext_rts1[1]
p=numpy_array
from numpy.lib.shape_base import hstack, atleast_1d
p = atleast_1d(p)
import numpy.core.numeric as NX
non_zero = NX.nonzero(NX.ravel(p))[0]
trailing_zeros = len(p) - non_zero[-1] - 1
p = p[int(non_zero[0]):int(non_zero[-1])+1]
N = len(p)
from numpy.lib.twodim_base import diag, vander
A = diag(NX.ones((N-2,), p.dtype), -1)
A[0, :] = -p[1:] / p[0]
from numpy.linalg import eigvals, lstsq
roots = eigvals(A)
roots[0]
roots[1]
from numpy.core import asarray, inexact, complexfloating, zeros, double
a = copy(A)
a = asarray(a)
wrap = getattr(a, "__array_wrap__", a.__array_wrap__)
t = double
result_t = double
from numpy.linalg import lapack_lite
n = a.shape[0]
dummy = zeros((1,), t)
wr = zeros((n,), t)
wi = zeros((n,), t)
lwork = 1
work = zeros((lwork,), t)
lapack_routine = lapack_lite.dgeev
results = lapack_routine('N', 'N', n, a, n, wr, wi,
                         dummy, 1, dummy, 1, work, -1, 0)

res=wr.astype(result_t)
res[0]
res[1]

Paul

PS: it seems the final wr value is not always correctly set, sometimes I got 0.0 for res[0] and res[1].
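
Note that lwork=-1 in the dgeev call above is, by LAPACK convention, a workspace query only: dgeev merely stores the optimal workspace size in work[0] and returns without computing any eigenvalues, which would explain wr sometimes staying zero. numpy's linalg code follows the query with a second, real call, roughly like this (a sketch continuing the session above):

lwork = int(work[0])        # optimal workspace size returned by the query call
work = zeros((lwork,), t)   # allocate the real workspace
results = lapack_routine('N', 'N', n, a, n, wr, wi,
                         dummy, 1, dummy, 1, work, lwork, 0)
# only after this second call do wr/wi contain the eigenvalues
res = wr.astype(result_t)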

comment:26 Changed 9 years ago by kcrisman

I'm not sure what I am supposed to get with this - what is wrong with 0.0?

sage: res
array([ 0.,  0.])
sage: res[0]
0.0
sage: res[1]
0.0

Should this not happen, but instead just give 0.? Maybe that is the bug, then; I get this consistently, and astype is a numpy method.

I still get the bad result up through roots, but the rest is mysterious to me (other than that it gets access to the dgeev routine). results seems to be just

sage: results
{'info': 0, 'lda': 2, 'jobvl': 'N', 'ldvl': 1, 'n': 2, 'ldvr': 1, 'lwork': -1, 'jobvr': 'N', 'dgeev_': 0}

I hope that is helpful to know.

comment:27 Changed 9 years ago by drkirkby

I note someone added a work issue "report upstream (numpy)". Is that still appropriate? Has that been done?

http://projects.scipy.org/numpy

is the place to do it if this is an upstream bug. I've just created a bug report there myself http://projects.scipy.org/numpy/ticket/1625 for the Numpy issue at #9808.

comment:28 follow-ups: Changed 9 years ago by zimmerma

  • Work issues report upstream (numpy) deleted

I'm not sure what I am supposed to get with this - what is wrong with 0.0?

I also sometimes got 0.0, instead of the expected values array([ 1.77245385, -1.77245385]).

Thanks Dave for reporting the problem upstream.

Paul

comment:29 in reply to: ↑ 28 Changed 9 years ago by drkirkby

Replying to zimmerma:

I'm not sure what I am supposed to get with this - what is wrong with 0.0?

I also sometimes got 0.0, instead of the expected values array([ 1.77245385, -1.77245385]).

Thanks Dave for reporting the problem upstream.

Paul

Paul,

I have NOT reported this upstream. I reported another Numpy issue to the Numpy list, and gave a link to where to post bug reports, but I have not done this one. It would be better if someone who understands the bug better does that.

Dave

comment:30 Changed 9 years ago by zimmerma

  • Work issues set to report upstream

I have NOT reported this upstream.

sorry. I understand the bug, but I don't know NumPy well enough to isolate it as a NumPy bug.

Paul

comment:31 in reply to: ↑ 28 Changed 9 years ago by kcrisman

Replying to zimmerma:

I'm not sure what I am supposed to get with this - what is wrong with 0.0?

I also sometimes got 0.0, instead of the expected values array([ 1.77245385, -1.77245385]).

Umm... help me out here.

wr = zeros((n,), t)

So, shouldn't that make an array of zeros?

Also, when I do results, the output is not really helpful to me. Is this supposed to create the array you mention above in place of wr? I'm sorry if I'm missing something obvious.

comment:32 follow-up: Changed 9 years ago by zimmerma

my guess is that lapack_routine modifies in place the array wr of zeros, and that res=wr.astype(result_t) recovers the modified array.

Paul

comment:33 in reply to: ↑ 32 Changed 9 years ago by kcrisman

Replying to zimmerma:

my guess is that lapack_routine modifies in place the array wr of zeros, and that res=wr.astype(result_t) recovers the modified array.

Okay, so one does want something other than the zeros. Thanks for clarifying that.

But I have to say, I don't even get anything different on my OS X 10.6 Mac, yet it gets the right roots

sage: roots = eigvals(A)
sage: roots[0]
1.7724538509055159
sage: roots[1]
-1.7724538509055159

Is it possible you mistyped something around the lapack_routine spot?

comment:34 Changed 9 years ago by zimmerma

on Dave's machine with sage-4.6.alpha2 I get:

sage: roots = eigvals(A)
sage: roots[0]
-1.7724538509055159
sage: roots[1]
1.7724538509055161

Note the different last digits; the positive root should print 1.7724538509055159.

On a Core 2 I get the same results as in comment 33.

Paul

comment:35 Changed 9 years ago by kcrisman

Yes, and I get something similar in comment 14. What I would like to see is what you get for res[0] and res[1] on a machine where it supposedly works, because on my 10.6 Mac (where roots is correct) I also get 0.0 for those, just like on the machine where roots is incorrect. I hope this clarifies what I meant.

comment:36 Changed 9 years ago by zimmerma

I cannot reproduce the non-zero values I once got for res[0] and res[1]; I now always get 0.0. It must be something related to the Sage-NumPy interface. Someone more fluent with NumPy should look at this. Otherwise I will try to investigate more next week.

Paul

comment:37 Changed 9 years ago by drkirkby

Has anyone had a chance to look at this? It is marked as a blocker for 4.6, so it is quite important that this is squashed soon.

Dave

comment:38 follow-up: Changed 9 years ago by zimmerma

Has anyone had a chance to look at this? It is marked as a blocker for 4.6, so it is quite important that this is squashed soon.

I agree. The problem has been isolated to a NumPy (or Lapack) problem. Thus someone fluent with NumPy and/or Lapack should look at it. Apart from William, who is quite busy, can we add someone in cc who knows about NumPy and/or Lapack?

Otherwise, a quick-and-dirty workaround is to replace the last digits by ... in the doctest to allow some numerical error.

Paul

comment:39 in reply to: ↑ 38 Changed 9 years ago by drkirkby

Replying to zimmerma:

Has anyone had a chance to look at this? It is marked as a blocker for 4.6, so it is quite important that this is squashed soon.

I agree. The problem has been isolated to a NumPy (or Lapack) problem. Thus someone fluent with NumPy and/or Lapack should look at it. Apart from William, who is quite busy, can we add someone in cc who knows about NumPy and/or Lapack?

If we could get some more precise details about how it's been isolated, we might be able to post to the Numpy list now.

Otherwise, a quick-and-dirty workaround is to replace the last digits by ... in the doctest to allow some numerical error.

Paul

I'm not sure that achieves anything useful. Sure, the doctest will pass, but it might actually reduce the urgency given to fixing the underlying problem. A successful test is one that has uncovered a previously unknown error. This test has done that. I don't personally think it's appropriate to water down the test just to get 100% passes.

I suggested some time ago we should have a page on a wiki which documents known issues with a release - either before or after the release is made.

e.g. I started this page, though it has not been updated:

http://wiki.sagemath.org/errata

I think it would be better to document the problem rather than temporarily hide it.

Dave

comment:40 follow-up: Changed 9 years ago by kcrisman

On sage-release, Georg Weber reports the following:

Sage-4.6.alpha3 builds fine on 32bit MacIntel with OS X 10.4 (Tiger). 

sage -t -long -force_lib "devel/sage/sage/rings/polynomial/polynomial_element.pyx"
**********************************************************************
File "/Users/Shared/sage/test/sage-4.6.alpha3/devel/sage/sage/rings/polynomial/polynomial_element.pyx", line 4317:
    sage: f.roots(algorithm='numpy')
Expected:
    doctest... UserWarning: NumPy does not support arbitrary precision arithmetic.  The roots found will likely have less precision than you expect.
    [(-1.7724538509055158819194275565678253769874572753906250000000, 1), (1.7724538509055158819194275565678253769874572753906250000000, 1)]
Got:
    doctest:1: UserWarning: NumPy does not support arbitrary precision arithmetic.  The roots found will likely have less precision than you expect.
    [(-1.7724538509055161039640324815991334617137908935546875000000, 1), (1.7724538509055158819194275565678253769874572753906250000000, 1)]
**********************************************************************

So this is yet another data point about this issue.

comment:41 Changed 9 years ago by jason

  • Cc jason added

comment:42 follow-up: Changed 9 years ago by drkirkby

If I'm not mistaken, the systems which are showing this problem are all 32-bit builds.

  • 'fulvia', the Solaris 10 x86 system, is using a 32-bit version of Sage, despite the hardware being 64-bit
  • 'hawk', my own personal OpenSolaris machine, is using a 32-bit version of Sage, despite the hardware being 64-bit
  • Cicero is a 32-bit Linux system - see http://wiki.sagemath.org/skynet
  • Georg Weber's Mac is 32-bit.

Are there any counterexamples? I believe John has built Sage on 'mark2' without the failure. Although that's a 64-bit system, the binary is only 32-bit.

From what I understand, sqrt() should be the same on any IEEE 754 system. However, the same is not true for pow(). It seems unlikely but perhaps one root is being computed using sqrt(x) and the other pow(x,0.5). Mathematically they are equal, but perhaps not on a computer with a finite number of bits.

Since pi is irrational, this can never be computed exactly using floating-point maths.

Dave

comment:43 in reply to: ↑ 42 Changed 9 years ago by kcrisman

Replying to drkirkby:

If I'm not mistaken, the systems which are showing this problem are all 32-bit builds.

  • 'fulvia', the Solaris 10 x86 system, is using a 32-bit version of Sage, despite the hardware being 64-bit
  • 'hawk', my own personal OpenSolaris machine, is using a 32-bit version of Sage, despite the hardware being 64-bit
  • Cicero is a 32-bit Linux system - see http://wiki.sagemath.org/skynet
  • Georg Weber's Mac is 32-bit.

My PPC Mac also is building 32-bit.

From what I understand, sqrt() should be the same on any IEEE 754 system. However, the same is not true for pow(). It seems unlikely but perhaps one root is being computed using sqrt(x) and the other pow(x,0.5). Mathematically they are equal, but perhaps not on a computer with a finite number of bits.

Since Paul pointed out that there is some linear algebra algorithm going on here deep within numpy, it seems unlikely that this is what is going on.

comment:44 follow-up: Changed 9 years ago by zimmerma

Since Paul pointed out that there is some linear algorithm going on here from deep within numpy, it seems unlikely that this is what is going on.

yes, it seems the computation is done by the lapack dgeev routine. From the source file dgeev.f in the Sage distribution, it does not call POW; however, it calls SQRT. But the square root is a standard IEEE function. What could happen, however, is that on some machines lapack could use double extended precision (i.e., a 64-bit instead of a 53-bit significand).

One way to debug this is to add print statements in the dgeev.f file, recompile lapack and see the difference between two different computers. How can one (easily) recompile and build a spkg (I guess sage -br will not do that)?

Paul

comment:45 Changed 9 years ago by drkirkby

Paul's comments about the 53- or 64-bit significands have got me thinking.

I think gcc behaves differently with regard to SSE instructions on 32-bit and 64-bit builds. I believe in some cases the FPU is used, but in others the SSE instructions are used, and I recall some comments saying the default changed between 32-bit builds and 64-bit builds, though there are compiler options one can use to override this in modern gccs.

However, it is odd that the absolute magnitudes of the two roots differ. The two roots I got were:

-1.7724538509055158819194275565678253769874572753906250000000
+1.7724538509055161039640324815991334617137908935546875000000

Dave

comment:46 follow-up: Changed 9 years ago by zimmerma

Dave, you might try to recompile the lapack spkg with -O0 instead of the default -O3.

Paul

comment:47 in reply to: ↑ 46 Changed 9 years ago by drkirkby

Replying to zimmerma:

Dave, you might try to recompile the lapack spkg with -O0 instead of the default -O3.

Paul

I'll try that, Paul. The Lapack package is a real mess.

  • It's probably a snapshot taken from CVS, given it has the date in it, but no version number.
  • It's almost 3 years old, but the last Lapack was released on 30th June 2010.
  • The Mercurial repository is basically useless.
drkirkby@hawk:~/sage-4.6.alpha3/spkg/standard/lapack-20071123.p1$ hg status | head
M spkg-install
? spkg-install~
? src/BLAS/SRC/Makefile
? src/BLAS/SRC/caxpy.f
? src/BLAS/SRC/ccopy.f
? src/BLAS/SRC/cdotc.f
? src/BLAS/SRC/cdotu.f
? src/BLAS/SRC/cgbmv.f
? src/BLAS/SRC/cgemm.f
drkirkby@hawk:~/sage-4.6.alpha3/spkg/standard/lapack-20071123.p1$ cat SPKG.txt
MAINTAINERS:
    Josh Kantor
    William Stein
drkirkby@hawk:~/sage-4.6.alpha3/spkg/standard/lapack-20071123.p1$ 

I did a build of the old lapack with -O0, but it made no difference. I've tried my noddy C program with compiler options like -mfpmath=387 & -mfpmath=sse -m32 and -m64, but nothing changes the output.

The problem is that a lot in Sage depends on lapack, so I think a total rebuild is in order in this case. That includes ATLAS on this system, which takes nearly 2 hours to build in 32-bit mode, as there's no tuning data for my Xeon W3580 on 32-bit builds. On 64-bit builds, ATLAS builds in about 8 minutes.

There must be a pretty good case for updating Lapack very soon, if not in this release. As this ticket is a blocker, in the event an updated lapack solves it, then I think that would be worth doing.

Dave

comment:48 follow-up: Changed 9 years ago by zimmerma

David,

The problem is that a lot in Sage depends on lapack, so I think a total rebuild is in order in this case.

I would be surprised if the lapack dgeev routine depended on other components.

Paul

comment:49 in reply to: ↑ 48 Changed 9 years ago by drkirkby

Replying to zimmerma:

David,

The problem is that a lot in Sage depends on lapack, so I think a total rebuild is in order in this case.

I would be surprised if the lapack dgeev routine depended on other components.

Paul

Well. I've created a new lapack package, which I've built on my OpenSolaris machine.

I notice it can build 4 libraries, but it looks like Sage is only building one of them. There's this bit of code in one of the files. I've no idea what they all might do, though.

LAPACKLIB    = lapack$(PLAT).a
TMGLIB       = tmglib$(PLAT).a
EIGSRCLIB    = eigsrc$(PLAT).a
LINSRCLIB    = linsrc$(PLAT).a

Perhaps it would be wise if we just ran "make" and installed the default libraries, rather than a subset of them. The total build time of the lapack library is quite modest:

real	0m13.445s
user	1m7.550s
sys	0m26.048s
Successfully installed lapack-3.2.2

(that's on a quad-core 3.33 GHz Hyperthreaded Xeon, so it would obviously take more/less time on slower/faster hardware. It would no doubt take longer to build all 4 libraries too). But having a partial installation might not be such a good idea and would no doubt make running tests more difficult. (Of course, there's no spkg-check file in the lapack package in Sage. Testing code is a very low priority.)

I've put a temporary copy of the lapack package (md5 checksum 521afc0c5b1fde0a0159c4770b0e173e) at

http://boxen.math.washington.edu/home/kirkby/crap/lapack-3.2.2.spkg

This is just hacked together at this point. There's no Mercurial repository (I deleted the other as it looked pretty useless), SPKG.txt is a mess, there's no spkg-check, it's not spell checked, .... etc etc. BUT perhaps someone else can try it. Sage built OK on my OpenSolaris machine with this, and it's just building the documentation as I write. I'll clean up the package at some point, but someone can try it if they want.

Dave

comment:50 Changed 9 years ago by drkirkby

I created #10123 to update lapack. It does not look like it will solve this problem, so it's not hugely urgent. I'll take care of that.

Dave

comment:51 Changed 9 years ago by drkirkby

I made a mistake. Lapack is building OK, but once ATLAS built, it would have copied the lapack from my global ATLAS install. Hence this was not a correct build. I thought it was a bit quick, as a full build with ATLAS takes 2-3 hours, whereas this built in 20 minutes or so.

I'm rebuilding now.

Dave

comment:52 follow-up: Changed 9 years ago by jhpalmieri

For lapack, see also #10121: a cleaned up version of the current spkg.

comment:53 in reply to: ↑ 52 Changed 9 years ago by drkirkby

Replying to jhpalmieri:

For lapack, see also #10121: a cleaned up version of the current spkg.

Thank you John. I cc'ed you on #10123. It looks like it will be relatively easy to update the version, as the spkg-install is very simple. So producing a new version should be very easy.

The current package name suggests this is based on code from November 2007, but the last commit to the repository is several months before then. Anyway, hopefully #10123 will bring some benefits.

Dave

comment:54 Changed 9 years ago by drkirkby

I'm busy until at least Wednesday of next week (20th October), so will not be spending much time on Sage.

Updating the version of Lapack (#10123) is more complex than I thought it would be, and I don't have time to complete this. Libraries are called foobar.a and not libfoobar.a. This causes issues with both the packages self-tests and Sage doctests. So the package I created is effectively useless and will remain so unless I work on it more.

IIRC, I did manage to get this doctest to run, and it still failed, so the Lapack upgrade will not fix this. However, it's hard to report bugs upstream when we are running 3-year-old versions of packages, so I think updating Lapack is necessary before we report this upstream.

So it would be good if someone could review John's cleaned up Lapack at #10121, and not wait for me to produce a new version of Lapack.

It would be good if someone could write a few doctests broadly similar to this one, to ensure that under some other conditions the NumPy errors do not become totally unacceptable.

Having thought about it more, I think Paul Zimmermann's suggestion to use the dots to make this pass might be sensible, as from a practical point of view such a small error is not really a problem. There does appear to be a bug, but there are a lot more serious bugs in Sage that are not blockers.

For example, if one runs the Maxima self-tests, several fail with small numerical issues with ECL but pass with other Lisp interpreters. One of the Maxima developers has said he thinks ECL is "off by one bit in places". Of course, the fact that we don't run the Maxima self-tests as part of Sage means we don't see this.

Many testing frameworks for software have several categories for failures, which include

  • Expected to fail and did fail
  • Expected to fail and passed.
  • Expected to pass and failed

comment:55 in reply to: ↑ 40 Changed 9 years ago by dimpase

Replying to kcrisman:

On sage-release, Georg Weber reports the following:

 Sage-4.6.alpha3 builds fine on 32bit MacIntel with OS X 10.4 (Tiger).
 
 sage -t -long -force_lib "devel/sage/sage/rings/polynomial/polynomial_element.pyx"
 **********************************************************************
 File "/Users/Shared/sage/test/sage-4.6.alpha3/devel/sage/sage/rings/polynomial/polynomial_element.pyx", line 4317:
......

Same (up to the last digit...) on Mac OS X 10.5 PPC, with gcc-4.1.2 (using an unofficial snapshot of Apple's Xcode from the R homepage).

Just my 0.01c

Dima

comment:56 in reply to: ↑ 44 Changed 9 years ago by leif

Replying to zimmerma:

[...] it seems the computation is done by the lapack dgeev routine. From the source file dgeev.f in the Sage distribution, it does not call POW however it calls SQRT. But the square root is a standard IEEE function. What could happen however is that on some machines lapack could use double extended precision (i.e., 64-bit instead of 53-bit significand).

Yes, the difference is caused by LAPACK, not NumPy (see below).

I'm not sure, but I doubt PPCs have (Intel's!) extended double precision, i.e. -mfpmath=sse vs. -mfpmath=387 doesn't apply there. (But IBM uses other floating-point formats beyond 64-bit double precision.)

One way to debug this is to add print statements in the dgeev.f file, recompile lapack and see the difference between two different computers. How can one (easily) recompile and build a spkg (I guess sage -br will not do that)?

For hacking the Fortran code (avoiding to create new LAPACK spkgs), it is sufficient to extract the corresponding files (e.g. dgeev.f) from the spkg and put modified versions into the [static] LAPACK library, which is $SAGE_ROOT/local/lib/liblapack.a, and to afterwards rebuild / reinstall the NumPy spkg, which links this library into its dynamic ones.

Assuming one has copied lapack-20071123.p1/src/SRC/dgeev.f to $SAGE_ROOT, one can do e.g. the following to include a modified version:

~/sage-x.y.z$ gfortran -fPIC -c dgeev.f                     # or sage_fortran -c dgeev.f
~/sage-x.y.z$ ar rs local/lib/liblapack.a dgeev.o           # "s" saves running ranlib separately
~/sage-x.y.z$ ./sage -f spkg/standard/numpy-1.3.0.p4.spkg   # force reinstallation

and then run Sage.

(Dave: I've not verified if the "s" option is POSIX... ;-) The ar command above is just a short-cut for

~/sage-x.y.z$ ar r local/lib/liblapack.a dgeev.o   # replace the existing dgeev module
~/sage-x.y.z$ ranlib local/lib/liblapack.a         # rebuild the symbol table

)

On a 32-bit Pentium4 (Prescott) and a 64-bit Core2 I get, respectively:

sage: R.<x>=RealField(200)[]                                                
sage: f=x^2 - R(pi)                                                         
sage: import numpy                                                          
sage: na=numpy.array([float(c) for c in reversed(f.list())], dtype="double")
sage: n_roots=numpy.roots(na)                                               
Workspace query only:
A( 1,  1)=   -0.0000000000000000000000000D+00
A( 1,  2)=    0.3141592653589793115998000D+01
A( 2,  1)=    0.1000000000000000000000000D+01
A( 2,  2)=    0.0000000000000000000000000D+00
Actual computation:
A( 1,  1)=   -0.0000000000000000000000000D+00
A( 1,  2)=    0.3141592653589793115998000D+01
A( 2,  1)=    0.1000000000000000000000000D+01
A( 2,  2)=    0.0000000000000000000000000D+00
Eigenvalues only...
WR( 1)=   -0.1772453850905515881919400D+01
WR( 2)=    0.1772453850905516103964000D+01
sage: 
sage: R.<x> = RealField(200)[]
sage: f = x^2 - R(pi)
sage: import numpy
sage: na=numpy.array([float(c) for c in reversed(f.list())], dtype='double')
sage: n_roots=numpy.roots(na)
Workspace query only:
A( 1,  1)=   -0.0000000000000000000000000D+00
A( 1,  2)=    0.3141592653589793115998000D+01
A( 2,  1)=    0.1000000000000000000000000D+01
A( 2,  2)=    0.0000000000000000000000000D+00
Actual computation:
A( 1,  1)=   -0.0000000000000000000000000D+00
A( 1,  2)=    0.3141592653589793115998000D+01
A( 2,  1)=    0.1000000000000000000000000D+01
A( 2,  2)=    0.0000000000000000000000000D+00
Eigenvalues only...
WR( 1)=   -0.1772453850905515881919400D+01
WR( 2)=    0.1772453850905515881919400D+01
sage: 

(The machine parameters queried by LAPACK's DLAMCH() are identical; the chosen precision / format for the PRINT statements, D35.25, should be sufficient.)

What about (temporarily?) tagging the doctest result(s) with # 32-bit / # 64-bit? (It seems the results are identical on all 32-bit platforms...?)

I've not yet tracked this down further (perhaps someone more knowledgeable of LAPACK / Fortran can do this), nor have I tried the more recent LAPACK version.

comment:57 follow-up: Changed 9 years ago by dimpase

The dgeev.f from the canonical place: http://www.netlib.org/lapack/double/dgeev.f does not differ from what we have in lapack spkg. So I doubt that a "more recent version" (well, this one is Fortran VI, no joke...) would provide a fix.

It looks to me more like a compiler bug than a Lapack bug... Well, I used to know enough Fortran to be able to dig into this; I wonder how one can get hold of a non-gcc-based Fortran compiler for one of the platforms where we have this issue...

comment:58 in reply to: ↑ 57 Changed 9 years ago by leif

Replying to dimpase:

The dgeev.f from the canonical place: http://www.netlib.org/lapack/double/dgeev.f does not differ from what we have in lapack spkg. So I doubt that a "more recent version" (well, this one is Fortran VI, no joke...) would provide a fix.

dgeev calls many other modules...

It looks to me more like a compiler bug than a Lapack bug...

I was thinking of GCC built-ins, too.

I wonder how one can get hold of a non-gcc-based Fortran compiler for one of the platforms where we have this issue...

Sun's Fortran compiler on e.g. t2.math? Perhaps Dave also has an IBM (AIX) or HP[-UX] Fortran compiler.

Btw, on Linux, I have a couple of LAPACK library versions, including "3gf" ("gf" for GNU's gfortran).

comment:59 follow-up: Changed 9 years ago by leif

I've just recompiled LAPACK with -march=native -mfpmath=sse (by adding this in the sage_fortran script) on the Pentium4 Prescott (which has SSE3)... with that, the difference vanishes.

I still wonder what happens on PowerPC CPUs.

comment:60 in reply to: ↑ 59 ; follow-up: Changed 9 years ago by dimpase

Replying to leif:

I've just recompiled LAPACK with -march=native -mfpmath=sse (by adding this in the sage_fortran script) on the Pentium4 Prescott (which has SSE3)... with that, the difference vanishes.

I still wonder what happens on PowerPC CPUs.

There is no SSEx on PPC; it's an x86 thing, IMHO... OK, so perhaps there is a miracle compiler switch on PPC that will do the same trick. Let me write a Fortran-only test and see...

comment:61 Changed 9 years ago by leif

Replying to leif:

I've just recompiled LAPACK with -march=native -mfpmath=sse (by adding this in the sage_fortran script) on the Pentium4 Prescott (which has SSE3)... with that, the difference vanishes.

... and the opposite (i.e., -mfpmath=387) on the Core2, which in turn results in:

leif@quadriga:~/Sage/sage-4.6.alpha3$ ./sage -t -long devel/sage/sage/rings/polynomial/polynomial_element.pyx 
sage -t -long "devel/sage/sage/rings/polynomial/polynomial_element.pyx"
**********************************************************************
File "/home/leif/Sage/sage-4.6.alpha3/devel/sage/sage/rings/polynomial/polynomial_element.pyx", line 4317:
    sage: f.roots(algorithm='numpy')
Expected:
    doctest... UserWarning: NumPy does not support arbitrary precision arithmetic.  The roots found will likely have less precision than you expect.
    [(-1.7724538509055158819194275565678253769874572753906250000000, 1), (1.7724538509055158819194275565678253769874572753906250000000, 1)]
Got:
    doctest:1: UserWarning: NumPy does not support arbitrary precision arithmetic.  The roots found will likely have less precision than you expect.
    [(-1.7724538509055158819194275565678253769874572753906250000000, 1), (1.7724538509055161039640324815991334617137908935546875000000, 1)]
**********************************************************************
1 items had failures:
   1 of 148 in __main__.example_83
***Test Failed*** 1 failures.
For whitespace errors, see the file /home/leif/.sage//tmp/.doctest_polynomial_element.py
         [10.7 s]
 
----------------------------------------------------------------------
The following tests failed:


        sage -t -long "devel/sage/sage/rings/polynomial/polynomial_element.pyx"

:D

comment:62 in reply to: ↑ 60 ; follow-up: Changed 9 years ago by leif

Replying to dimpase:

There is no SSEx on PPC; it's an x86 thing, IMHO...

Of course not (nor 387), but some IBM-specific double-double type.

OK, so perhaps there is a miracle compiler switch on PPC that will do the same trick.

       -fdefault-double-8
           Set the "DOUBLE PRECISION" type to an 8 byte wide type.  If -fdefault-real-8 is given, "DOUBLE PRECISION" would instead be promoted to 16 bytes
           if possible, and -fdefault-double-8 can be used to prevent this.  The kind of real constants like "1.d0" will not be changed by -fdefault-real-8
           though, so also -fdefault-double-8 does not affect it.

;-)

comment:63 in reply to: ↑ 62 Changed 9 years ago by dimpase

Replying to leif:

Replying to dimpase:

There is no SSEx on PPC; it's an x86 thing, IMHO...

Of course not (nor 387), but some IBM-specific double-double type.

OK, so perhaps there is a miracle compiler switch on PPC that will do the same trick.

       -fdefault-double-8
           Set the "DOUBLE PRECISION" type to an 8 byte wide type.  If -fdefault-real-8 is given, "DOUBLE PRECISION" would instead be promoted to 16 bytes
           if possible, and -fdefault-double-8 can be used to prevent this.  The kind of real constants like "1.d0" will not be changed by -fdefault-real-8
           though, so also -fdefault-double-8 does not affect it.

;-)

unfortunately this does not help on Mac OS X 10.5 PPC. I tried to recompile lapack with the extra "-fdefault-double-8 -fdefault-real-8" options put into sage_fortran, to no avail. One cannot add the -fdefault-double-8 option alone, as the compiler (gfortran 4.2.1) says that -fdefault-real-8 is required along with it. And -fdefault-real-8 alone produces code that does not work.

The next attempt will be with "-fdefault-real-8 -mlong-double-128 -lmx"...

comment:64 follow-up: Changed 9 years ago by drkirkby

I tried playing with the compiler flags on a noddy C program. My Fortran is almost non-existent, but perhaps it's worth trying a Fortran program. But I can't change the values of these parameters in my noddy C program by use of compiler flags.

drkirkby@hawk:~$ cat test.c
#include <float.h>
#include <math.h>
#include <stdio.h>
#include <stdlib.h>

volatile long double x=M_PI;
volatile long double y=-M_PI;

main(){
  printf("+sqrt(pi)=%.70lf\n",sqrt(x));
  printf("-sqrt(pi)=%.70lf\n",sqrt(fabs(y)));
}
drkirkby@hawk:~$ gcc test.c -mfpmath=sse -lm
drkirkby@hawk:~$ ./a.out
+sqrt(pi)=1.7724538509055158819194275565678253769874572753906250000000000000000000
-sqrt(pi)=1.7724538509055158819194275565678253769874572753906250000000000000000000
drkirkby@hawk:~$ gcc test.c -mfpmath=387 -lm
drkirkby@hawk:~$ ./a.out
+sqrt(pi)=1.7724538509055158819194275565678253769874572753906250000000000000000000
-sqrt(pi)=1.7724538509055158819194275565678253769874572753906250000000000000000000

t2.math is in rather a mess now, following William deleting all the NFS file systems that were shared to t2.math.

I'll create Dima an account on my OpenSolaris machine. The problem exists on that. I can't say for sure whether it does on my AIX server (PowerPC) or HP-UX workstation (PA-RISC CPU). Both those systems have native compilers, but neither can build Sage. Both are quite a bit slower than the OpenSolaris machine too.

Leave that with me.

Dave

comment:65 in reply to: ↑ 64 Changed 9 years ago by dimpase

  • Priority changed from blocker to major

Replying to drkirkby:

I tried playing with the compiler flags on a noddy C program. My Fortran is almost non-existent, but perhaps it's worth trying a Fortran program. But I can't change the values of these parameters in my noddy C program by use of compiler flags.

We are computing the roots of f(x) = x^2 - Pi by computing the eigenvalues of the matrix

[[0.0, Pi], 
[1.0, 0.0]]

We call LAPACK's DGEEV, which does just that, by a relatively nontrivial iterative procedure. The stopping condition is related to the precision guaranteed by LAPACK, and it turns out, on some floating-point architectures, that the error in computing the negative root is very close to the precision guaranteed by LAPACK. A test program that calls DGEEV on this matrix, on Mac OS X 10.5 (PPC G4), outputs:

Eigenvalue( 1) =  1.77245385090551588191942755656782538E+00
Eigenvalue( 2) =-1.77245385090551610396403248159913346E+00
Difference in absolute value  2.22044604925031308084726333618164063E-16

whereas LAPACK precision (for doubles) is, taken from install.log:

  Precision                    =   2.220446049250313E-016

I do not understand why anyone thinks LAPACK must do better than that. Indeed, some other hardware does better on this example, but this is just pure luck, and/or the ability to get better precision from the hardware, something that is not available on the G4 PPC (or rather, the compiler cannot take full advantage of the hardware).
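
For reference, that quoted precision is just the double-precision machine epsilon, 2^-52 (the gap between 1.0 and the next representable double), which numpy exposes directly:

sage: import numpy
sage: numpy.finfo(numpy.double).eps
2.2204460492503131e-16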

I propose to adjust the precision in doctests, and be done with it. It is not a bug...

I take the liberty to reduce the priority to major.

Dima

comment:66 Changed 9 years ago by mpatel

Would it help to try building with the Intel Fortran Compiler? It appears to be free on Linux for non-commercial use, but I haven't tried to download and install it. Or are we disallowed from using it, because Sage might be used commercially?

comment:67 follow-up: Changed 9 years ago by mpatel

  • Priority changed from major to blocker

I'd still like to resolve this before we release 4.6.

Are there any objections to Leif's suggestion to use tagged 32-bit and 64-bit results? Or, if tagging doesn't cover all reported cases, to using dots (...)?

comment:68 in reply to: ↑ 67 ; follow-up: Changed 9 years ago by dimpase

  • Work issues report upstream deleted

Replying to mpatel:

I'd still like to resolve this before we release 4.6.

Are there any objections to Leif's suggestion to use tagged 32-bit and 64-bit results? Or, if tagging doesn't cover all reported cases, to using dots (...)?

Well, do we agree now that this is a purely doctest problem? (mhansen (changeset 14950, trac #8054) has put too many digits there...)

One cannot expect more precision from NumPy/Lapack than a concrete combination of hardware and software gives you. There is in fact a Lapack call that will give you the precision, but it looks to me like overkill to code this into the doctest. On all the platforms we have, a precision of about 2.22045E-16 is what we get. So the test should pass if the difference between the actual root and the NumPy output is less than that. Which amounts to putting dots after, umh, you do the maths :-)

P.S. For the interested, this is how to test DGEEV without anything but a Fortran compiler (assuming Lapack is installed):

1) grab http://www.nag.co.uk/lapack-ex/examples/source/dgeev-ex.f off the net

2) edit it to increase the number of digits printed, i.e. edit the format statements labelled 99999 and 99998 so that they look as follows:

99999 FORMAT (1X,A,I2,A,1P,E42.35)
99998 FORMAT (1X,A,I2,A,'(',1P,E42.35,',',1P,E42.35,')')

3) compile it with your Fortran compiler: (something like $ gfortran -llapack dgeev-ex.f )

4) create a text file, say, inp.txt, with the input (the data begins at the 2nd line):

# blah...
2
0. 0.3141592653589793115998000D+01
1. 0.

5) $ ./a.out <inp.txt

Eigenvalues of the 2x2 matrix in inp.txt, i.e. +/- square root of Pi, up to the precision the setup can manage, will be printed in all gory detail.

Dima

Changed 9 years ago by dimpase

the patch that puts the ellipses in the right places

comment:69 in reply to: ↑ 68 Changed 9 years ago by dimpase

  • Status changed from needs_work to needs_review

Replying to dimpase:

Replying to mpatel:

I'd still like to resolve this before we release 4.6.

Are there any objections to Leif's suggestion to use tagged 32-bit and 64-bit results? Or, if tagging doesn't cover all reported cases, to using dots (...)?

Well, do we agree now that this is a purely doctest problem? (mhansen (changeset 14950, trac #8054) has put too many digits there...)

I've just attached the patch that puts the ellipses in the right places, so that the doctest passes on Mac OS X 10.5 PPC.
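
For the record, the doctested output then takes roughly the following form, with the machine-dependent trailing digits elided (see the patch for the exact placement of the dots):

sage: f.roots(algorithm='numpy')
doctest:... UserWarning: NumPy does not support arbitrary precision arithmetic.  The roots found will likely have less precision than you expect.
[(-1.77245385090551..., 1), (1.77245385090551..., 1)]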

comment:70 Changed 9 years ago by zimmerma

thanks Dima for giving the instructions on how to test DGEEV. I'll give it a try.

Paul

comment:71 Changed 9 years ago by zimmerma

thanks Dima for giving the instructions on how to test DGEEV. I'll give it a try.

on a 64-bit Core 2 I get:

tarte% ./a.out < inp.txt 
 DGEEV Example Program Results

 Eigenvalue( 1) =  1.77245385090551588191940000000000000E+00

 Eigenvector( 1)
  8.7095E-01
  4.9138E-01

 Eigenvalue( 2) = -1.77245385090551588191940000000000000E+00

 Eigenvector( 2)
 -8.7095E-01
  4.9138E-01

On Dave machine I get:

zimmerma@hawk:~$ ./a.out < inp.txt 
 DGEEV Example Program Results

 Eigenvalue( 1) =  1.77245385090551588191940000000000000E+00

 Eigenvector( 1)
  8.7095E-01
  4.9138E-01

 Eigenvalue( 2) = -1.77245385090551610396400000000000000E+00

 Eigenvector( 2)
 -8.7095E-01
  4.9138E-01

The second eigenvalue differs. Now we have to investigate inside the DGEEV call. If it performs only additions, subtractions, multiplications, divisions and square roots, there should be no difference according to IEEE 754 (unless different formats are used).

Paul

comment:72 Changed 9 years ago by leif

  • Authors set to Dima Pasechnik
  • Reviewers set to Leif Leonhardy
  • Status changed from needs_review to positive_review

Positive review from me for Dima's patch. (Feel free to revert in case you disagree.)

Since the floating-point precision does not (or should not) depend on the machine word width (or pointer size), it's better not to give different results for the doctest tagged with "32-bit" / "64-bit".

(As mentioned above, by changing compiler flags I can make the original doctest fail on a 64-bit Core2 as well as make it pass on a 32-bit Pentium4 Prescott.)

comment:73 Changed 9 years ago by leif

P.S.: Despite the positive review for Dima's patch, I appreciate Paul tracking this further down.

I wouldn't consider this a real LAPACK bug, but wonder if it's "intentional". It's quite weird anyway, though one should IMHO in general not expect to get symmetric results here; I think it's mostly our example that makes this perhaps more surprising.

comment:74 Changed 9 years ago by drkirkby

Worth noting is:

  • lapack.a gets copied to $SAGE_LOCAL/lib in the lapack-20071123.p1.spkg
  • On OS X, the ATLAS package is not built.
  • On Linux and Solaris, ATLAS gets built and creates a new lapack.a (I believe).
  • There's a comment dated 28th November 2007 in SPKG.txt for ATLAS that ATLAS produces a borked lapack.so which has missing symbols.
  • On Linux, Solaris and FreeBSD, there's some code in the ATLAS package for creating a shared liblapack.so from the static liblapack.a.
  • On Solaris, the liblapack.so gets deleted as it causes problems with R.

So on OS X, one only needs to look at the LAPACK code, but on Linux, FreeBSD and Solaris one needs to consider what ATLAS might do too.

Also note that on my system, the output of my Noddy C program can't be changed by changing the compiler flags to use the 387 or SSE instructions. It always gives the output expected by the doctest.

Dave

comment:75 Changed 9 years ago by drkirkby

This might end up being "Report upstream to Intel!!" Some of you may be old enough to remember the bug in the Intel i486 for floating point code. I had one hell of a struggle with Dell to get the CPU replaced. Intel sent me a 38 page document explaining why the bug was not of concern to most of the scientific community. Even Intel were reluctant to replace the CPU, but I finally got it replaced.

Dave

comment:76 follow-up: Changed 9 years ago by drkirkby

What I find annoying about these doctests is that whilst the output will be

1.772453850905515

or

1.772453850905516

depending on system,

With these changes, now anything from

1.772453850905510

to

1.772453850905519

is considered acceptable. So we have permitted 'errors' several times higher than necessary.

In some cases I've met in the past, adding ... has resulted in errors 1000x higher being considered acceptable. IMHO, this method of testing is not very good.

Dave

comment:77 in reply to: ↑ 76 ; follow-up: Changed 9 years ago by leif

Replying to drkirkby:

What I find annoying about these doctests is that whilst the output will be 1.772453850905515 or 1.772453850905516 depending on the system, with these changes anything from 1.772453850905510 to 1.772453850905519 is now considered acceptable. So we have permitted 'errors' several times higher than necessary.

In some cases, I've met in the past, the adding of ... has resulted in results errors 1000x higher being considered acceptable. IMHO, this method of testing is not very good.

In the light of the "user warning" given, I think using dots is acceptable here.


W.r.t. 387 vs. SSE, try this:

#include <stdio.h>
#include <math.h>

/* volatile stops the compiler from folding sqrt(M_PI) at compile time,
   so the square roots are actually computed at run time. */
volatile double x=M_PI;
int main() {
    /* Print sqrt(pi) and -sqrt(pi) to 70 digits in each precision. */
    printf ("float:\n%+.70f\n%+.70f\n",sqrtf((float)x),-sqrtf((float)x));

    printf ("double:\n%+.70lf\n%+.70lf\n",sqrt(x),-sqrt(x));

    printf ("long double:\n%+.70Lf\n%+.70Lf\n",
        sqrtl((long double)x),
        -sqrtl((long double)x));

    return 0;
}

I only get a different result for the negated square root in the float case with 387 code (Core2, gcc 4.4.3).

comment:78 in reply to: ↑ 77 Changed 9 years ago by drkirkby

Replying to leif:

In the light of the "user warning" given, I think using dots is acceptable here.

Yes, agreed. Just in general I think this method of comparing floating point numbers is rather dumb.

W.r.t. 387 vs. SSE, try this:

SSE

drkirkby@hawk:~$ gcc test3.c -lm -mfpmath=sse
drkirkby@hawk:~$ ./a.out
float:
+1.7724539041519165039062500000000000000000000000000000000000000000000000
-1.7724539041519165039062500000000000000000000000000000000000000000000000
double:
+1.7724538509055158819194275565678253769874572753906250000000000000000000
-1.7724538509055158819194275565678253769874572753906250000000000000000000
long double:
+1.7724538509055159927248895845863785325491335242986679077148437500000000
-1.7724538509055159927248895845863785325491335242986679077148437500000000

387

drkirkby@hawk:~$ gcc test3.c -lm -mfpmath=387
drkirkby@hawk:~$ ./a.out
float:
+1.7724539041519165039062500000000000000000000000000000000000000000000000
-1.7724538755670267153874419818748719990253448486328125000000000000000000
double:
+1.7724538509055158819194275565678253769874572753906250000000000000000000
-1.7724538509055158819194275565678253769874572753906250000000000000000000
long double:
+1.7724538509055159927248895845863785325491335242986679077148437500000000
-1.7724538509055159927248895845863785325491335242986679077148437500000000

I only get a different result for the negated square root in the float case with 387 code (Core2, gcc 4.4.3).

Yes, I too get different results (OpenSolaris, Intel Xeon W3580 @ 3.33 GHz, gcc 4.5.0).

Perhaps Paul has some comments on those different numbers. They would appear to me to violate one of the principal claims of IEEE 754, but perhaps I am wrong.

Dave

comment:79 follow-up: Changed 9 years ago by zimmerma

the gcc documentation says that with -mfpmath=387, temporary results are computed with 80 bits of precision (i.e., in double-extended precision), thus different results might be obtained. It also points to -ffloat-store, which prevents the use of extended precision. With -ffloat-store on a 64-bit Core 2 I get the following (without it, I get different absolute values in the float case, exactly as Dave in the comment above):

tarte% gcc -ffloat-store -mfpmath=387 test3.c -lm
tarte% ./a.out
float:
+1.7724539041519165039062500000000000000000000000000000000000000000000000
-1.7724539041519165039062500000000000000000000000000000000000000000000000
double:
+1.7724538509055158819194275565678253769874572753906250000000000000000000
-1.7724538509055158819194275565678253769874572753906250000000000000000000
long double:
+1.7724538509055159927248895845863785325491335242986679077148437500000000
-1.7724538509055159927248895845863785325491335242986679077148437500000000

I thus suggest recompiling Lapack with -ffloat-store and trying again.

Paul

comment:80 follow-up: Changed 9 years ago by zimmerma

I've reported to Kaveh Ghazi (a GCC developer) the strange behaviour of comment 78. If/when I get an answer, I will report it here.

Paul

comment:81 in reply to: ↑ 80 ; follow-ups: Changed 9 years ago by zimmerma

Replying to zimmerma:

I've reported to Kaveh Ghazi (a GCC developer) the strange behaviour of comment 78. If/when I get an answer, I will report it here.

cf http://gcc.gnu.org/bugzilla/show_bug.cgi?id=46080 to follow the GCC issue. However I'm not sure this is related to this ticket. Does Lapack use single precision (i.e., the C float type)?

Paul

comment:82 in reply to: ↑ 81 Changed 9 years ago by dimpase

Replying to zimmerma:

Replying to zimmerma:

I've reported to Kaveh Ghazi (a GCC developer) the strange behaviour of comment 78. If/when I get an answer, I will report it here.

cf http://gcc.gnu.org/bugzilla/show_bug.cgi?id=46080 to follow the GCC issue. However I'm not sure this is related to this ticket. Does Lapack use single precision (i.e., the C float type)?

Lapack has routines for both single and double precision (names starting with S and D, respectively).

Dmitrii


comment:83 follow-up: Changed 9 years ago by zimmerma

Lapack has routines for both single and double precision (names starting with S and D, respectively)

thus, since the Lapack issue is with dgeev, it might be independent of the GCC float issue. However, since the GCC issue is marked as a regression from 4.4, somebody with an older version of GCC might try recompiling Lapack with it on the machines where the differing absolute values appeared.
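
Incidentally, dgeev can be exercised with none of the Sage/numpy layers in between. A minimal C sketch of the thread's test case, the companion matrix of x^2 - pi (this assumes the usual Fortran-from-C binding, i.e. trailing underscore and all arguments by reference; the link line varies per system, e.g. -llapack -lblas -lgfortran):

#include <stdio.h>

/* LAPACK's double-precision nonsymmetric eigensolver, declared with the
   conventional Fortran-from-C binding. */
extern void dgeev_(const char *jobvl, const char *jobvr, const int *n,
                   double *a, const int *lda, double *wr, double *wi,
                   double *vl, const int *ldvl, double *vr, const int *ldvr,
                   double *work, const int *lwork, int *info);

int main(void) {
    int n = 2, lda = 2, ldv = 1, lwork = 8, info;
    /* Column-major companion matrix of x^2 - pi: [[0, pi], [1, 0]],
       whose eigenvalues are +sqrt(pi) and -sqrt(pi). */
    double a[4] = { 0.0, 1.0, 3.14159265358979323846, 0.0 };
    double wr[2], wi[2], work[8];
    dgeev_("N", "N", &n, a, &lda, wr, wi,
           NULL, &ldv, NULL, &ldv, work, &lwork, &info);
    if (info == 0)
        printf("%+.60f\n%+.60f\n", wr[0], wr[1]);
    return 0;
}

If the two printed magnitudes differ on a given machine, the asymmetry is reproduced in Lapack alone.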

Paul

comment:84 in reply to: ↑ 83 Changed 9 years ago by leif

Replying to zimmerma:

Lapack has routines for both single and double precision (names starting with S and D, respectively)

thus, since the Lapack issue is with dgeev, it might be independent of the GCC float issue. However, since the GCC issue is marked as a regression from 4.4, somebody with an older version of GCC might try recompiling Lapack with it on the machines where the differing absolute values appeared.

The Sage / LAPACK results on the 32-bit Pentium4 Prescott came from GCC 4.3.3. (And I didn't get differences for my C test program there, regardless of SSE/387 and C99.)

Also, XCode comes with even older GCCs.

comment:85 in reply to: ↑ 79 Changed 9 years ago by dimpase

I thus suggest recompiling Lapack with -ffloat-store and trying again.

I tried this on MacOS X 10.5 PPC (G4) with gfortran 4.2.1 and saw no difference. Basically, there seem to be no appropriate Fortran compiler options available for this platform (Apple never worked on a Fortran compiler that would use the better FP unit (AltiVec) of the G4 processors; IBM did have a Fortran compiler, xlf (?), for that chip, but it is no longer sold). The following must be telling the story:

(sage subshell) cantor:dgeev dima$ sage_fortran -faltivec -llapack -lblas $SAGE_LOCAL/lib/libgfortran.a dgt.f dgeev.f
f951: warning: command line option "-faltivec" is valid for C/C++/ObjC/ObjC++ but not for Fortran
f951: warning: command line option "-faltivec" is valid for C/C++/ObjC/ObjC++ but not for Fortran

comment:86 Changed 9 years ago by mpatel

  • Merged in set to sage-4.6.rc0
  • Resolution set to fixed
  • Status changed from positive_review to closed

comment:87 in reply to: ↑ 81 Changed 9 years ago by zimmerma

cf http://gcc.gnu.org/bugzilla/show_bug.cgi?id=46080 to follow the GCC issue.

the GCC developers say this is not a GCC bug, but "simply the unpredictability of excess precision arithmetics", and advise using -ffloat-store, -fexcess-precision=standard, or volatile float.
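
For the float case from comment 78, their workaround looks like this (a minimal sketch, assuming the same 387 setup; the volatile forces the sqrtf result through a 32-bit memory slot, rounding away the excess precision before negation):

#include <stdio.h>
#include <math.h>

volatile double x = M_PI;

int main() {
    float r = sqrtf((float)x);           /* may keep 80-bit excess
                                            precision in an x87 register */
    volatile float rv = sqrtf((float)x); /* stored to memory, i.e. rounded
                                            to single precision first */
    printf("%+.70f\n%+.70f\n", -r, -rv);
    return 0;
}

With -mfpmath=387 the first value may show the asymmetry, while the second should not.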

Is it possible to use -ffloat-store to build Lapack? It seems to be a valid option of sage_fortran:

tarte% /localdisk/tmp/sage-4.6/local/bin/sage_fortran -v --help | grep store
...
-ffloat-store               Don't allocate floats and doubles in extended-
                              precision registers
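
If it is honoured, a rebuild along these lines would be the thing to try (a hypothetical sketch; whether the lapack spkg actually picks FFLAGS up from the environment needs checking in its spkg-install):

export FFLAGS="-O2 -ffloat-store"
./sage -f lapack-20071123.p1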

Paul
