#10042 closed defect (fixed)
Doctest failure in sage/rings/polynomial/polynomial_element.pyx
Reported by: | mpatel | Owned by: | mvngu
---|---|---|---
Priority: | blocker | Milestone: | sage-4.6
Component: | doctest coverage | Keywords: |
Cc: | mhansen, zimmerma, was, jason | Merged in: | sage-4.6.rc0
Authors: | Dima Pasechnik | Reviewers: | Leif Leonhardy
Report Upstream: | N/A | Work issues: |
Branch: | | Commit: |
Dependencies: | | Stopgaps: |
Description
With the forthcoming 4.6.alpha2, I get this doctest failure on the Skynet machines cicero (x86-Linux-pentium4-fc) and fulvia (x86_64-SunOS-core2):
sage -t -long "devel/sage/sage/rings/polynomial/polynomial_element.pyx"
**********************************************************************
File "/home/mpatel/build/fulvia/sage-4.6.alpha2/devel/sage/sage/rings/polynomial/polynomial_element.pyx", line 4316:
    sage: f.roots(algorithm='numpy')
Expected:
    doctest... UserWarning: NumPy does not support arbitrary precision arithmetic. The roots found will likely have less precision than you expect.
    [(-1.7724538509055158819194275565678253769874572753906250000000, 1), (1.7724538509055158819194275565678253769874572753906250000000, 1)]
Got:
    doctest:1: UserWarning: NumPy does not support arbitrary precision arithmetic. The roots found will likely have less precision than you expect.
    [(-1.7724538509055158819194275565678253769874572753906250000000, 1), (1.7724538509055161039640324815991334617137908935546875000000, 1)]
Related: #8054.
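The failing computation can be reproduced outside Sage with plain NumPy (a minimal sketch; the test's 200-bit coefficients are truncated to machine doubles before numpy.roots ever sees them, which is what the warning is about):

```python
import math
import numpy as np

# x^2 - pi with coefficients truncated to machine doubles, which is
# effectively what roots(algorithm='numpy') hands to NumPy.
coeffs = [1.0, 0.0, -math.pi]
rts = sorted(np.roots(coeffs).real)
print(rts)  # two values near -sqrt(pi) and +sqrt(pi); the last bit is
            # what varies from machine to machine in this ticket
```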
Attachments (1)
Change History (88)
comment:1 Changed 9 years ago by
- Description modified (diff)
comment:2 Changed 9 years ago by
- Status changed from new to needs_info
comment:3 Changed 9 years ago by
We didn't upgrade NumPy. We released 4.6.alpha2 just some hours ago. The source is in
http://sage.math.washington.edu/home/release/sage-4.6.alpha2
A list of merged tickets is here.
comment:4 Changed 9 years ago by
It works on my machine (64-bit Core 2) with sage-4.6.alpha2, so this is a machine-dependent problem. Could someone with access to cicero or fulvia please isolate it? The code starts at line 4560 in rings/polynomial/polynomial_element.pyx.
Maybe the problem is in NumPy itself, or maybe in the conversion between RealField and the NumPy types.
Note the difference between the "got" and "expected" values is only one unit in the last place, so it might be a conversion error too.
Paul
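The two printed 60-digit values are exact decimal expansions of two adjacent machine doubles. A quick check in plain Python (outside Sage) confirms the "got" and "expected" roots differ by exactly one ulp:

```python
import struct

a = 1.7724538509055158819194275565678253769874572753906250000000
b = 1.7724538509055161039640324815991334617137908935546875000000
# Reinterpret the doubles as 64-bit integers; positive floats that are
# adjacent representable values differ by 1 in this encoding.
ia = struct.unpack('<q', struct.pack('<d', a))[0]
ib = struct.unpack('<q', struct.pack('<d', b))[0]
print(ib - ia)  # 1, i.e. the two results are one unit in the last place apart
```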
comment:5 Changed 9 years ago by
On OpenSolaris x86:
sage -t -long devel/sage/sage/rings/polynomial/polynomial_element.pyx
**********************************************************************
File "/export/home/drkirkby/sage-4.6.alpha2/devel/sage-main/sage/rings/polynomial/polynomial_element.pyx", line 4316:
    sage: f.roots(algorithm='numpy')
Expected:
    doctest... UserWarning: NumPy does not support arbitrary precision arithmetic. The roots found will likely have less precision than you expect.
    [(-1.7724538509055158819194275565678253769874572753906250000000, 1), (1.7724538509055158819194275565678253769874572753906250000000, 1)]
Got:
    doctest:1: UserWarning: NumPy does not support arbitrary precision arithmetic. The roots found will likely have less precision than you expect.
    [(-1.7724538509055158819194275565678253769874572753906250000000, 1), (1.7724538509055161039640324815991334617137908935546875000000, 1)]
**********************************************************************

I've not looked at this code, but is this only using machine precision? (That's what the NumPy message implies.) If so, why is there any surprise that this acts differently on different machines? Perhaps I've missed the point.
Dave
comment:6 follow-up: ↓ 7 Changed 9 years ago by
I've not looked at this code, but is this only using machine precision? If so, why is there any surprise this acts differently on different machines?
If the code is only using basic arithmetic operations and the square root, which is what it should do to solve x^2 - pi (which is what the failing test does), the behaviour should not depend on the machine used, since those operations are standardized by IEEE 754.
Paul
comment:7 in reply to: ↑ 6 Changed 9 years ago by
Replying to zimmerma:
I've not looked at this code, but is this only using machine precision? If so, why is there any surprise this acts differently on different machines?
If the code is only using basic arithmetic operations and the square root, which is what it should do to solve x^2 - pi (which is what the failing test does), the behaviour should not depend on the machine used, since those operations are standardized by IEEE 754.

Paul
But the maths library is not standardised by IEEE 754. It's known, for example, that exp(1.0) gives a different answer on SPARC than on x86.
That said, given that the answers should be sqrt(pi) and -sqrt(pi), one might hope that the two roots would at least have the same magnitude, I must admit.
I just created a noddy C program on my OpenSolaris machine where this failed.
drkirkby@hawk:~$ cat test.c
#include <stdio.h>
#include <math.h>

volatile double x = M_PI;

int main() {
    printf("%.70lf\n", sqrt(x));
}
drkirkby@hawk:~$ gcc -lm test.c
drkirkby@hawk:~$ ./a.out
1.7724538509055158819194275565678253769874572753906250000000000000000000
comment:8 follow-up: ↓ 11 Changed 9 years ago by
I just created a noddy C program on my OpenSolaris machine where this failed.
what did fail? You got the expected result.
Paul
comment:9 Changed 9 years ago by
Another datapoint. On OS X 10.4 on PPC, I get
Expected:
    doctest... UserWarning: NumPy does not support arbitrary precision arithmetic. The roots found will likely have less precision than you expect.
    [(-1.7724538509055158819194275565678253769874572753906250000000, 1), (1.7724538509055158819194275565678253769874572753906250000000, 1)]
Got:
    doctest:1: UserWarning: NumPy does not support arbitrary precision arithmetic. The roots found will likely have less precision than you expect.
    [(-1.7724538509055161039640324815991334617137908935546875000000, 1), (1.7724538509055158819194275565678253769874572753906250000000, 1)]
in Sage 4.6.alpha2
comment:10 follow-up: ↓ 12 Changed 9 years ago by
comment:11 in reply to: ↑ 8 Changed 9 years ago by
Replying to zimmerma:
I just created a noddy C program on my OpenSolaris machine where this failed.
what did fail? You got the expected result.
Paul
What I meant by "where this failed" was that the C program was compiled and built on the same computer on which the Sage doctest failed.
I realise that the result I got from the noddy C program was the same as was expected in the Sage doctest.
I don't have the time to investigate this.
Dave
comment:12 in reply to: ↑ 10 Changed 9 years ago by
Replying to zimmerma:
Karl-Dieter, please could you isolate the problem (see instructions in comment 4)? In particular we need to know whether the problem lies in NumPy itself, or in the RR <-> NumPy conversion.
Like Dave, I don't have time to do much with this. Apparently something goes 'wrong' in RealField, based on what I see here:
----------------------------------------------------------------------
| Sage Version 4.6.alpha2, Release Date: 2010-09-29                  |
| Type notebook() for the GUI, and license() for information.        |
----------------------------------------------------------------------
**********************************************************************
*                                                                    *
* Warning: this is a prerelease version, and it may be unstable.     *
*                                                                    *
**********************************************************************
sage: R.<x> = RealField(200)[]
sage: f = x^2 - R(pi)
sage: f.roots(algorithm='numpy')
/Users/student/Desktop/sage-4.6.alpha2/local/bin/sage-ipython:1: UserWarning: NumPy does not support arbitrary precision arithmetic. The roots found will likely have less precision than you expect.
  #!/usr/bin/env python
[(-1.7724538509055161039640324815991334617137908935546875000000, 1), (1.7724538509055158819194275565678253769874572753906250000000, 1)]
sage: seq = []
sage: K = f.parent().base_ring()
sage: L = K
sage: K
Real Field with 200 bits of precision
sage: input_fp = True
sage: L
Real Field with 200 bits of precision
sage: output_fp = True
sage: input_complex = False
sage: output_complex = False
sage: input_arbprec = True
sage: import numpy
sage: from numpy.linalg.linalg import LinAlgError
sage: coeffs = f.list()
sage: coeffs
[-3.1415926535897932384626433832795028841971693993751058209749, 0.00000000000000000000000000000000000000000000000000000000000, 1.0000000000000000000000000000000000000000000000000000000000]
sage: ty = float
sage: numpy_dtype = 'double'
sage: numpy_array = numpy.array([ty(c) for c in reversed(coeffs)], dtype=numpy_dtype)
sage: ext_rts1 = numpy.roots(numpy_array)
sage: ext_rts1
array([ 1.77245385, -1.77245385])
sage: rts = []
sage: for rt in ext_rts1:
....:     rts.append(CDF(rt))
....:
sage: rts.sort()
sage: ext_rts = rts
sage: ext_rts
[-1.77245385091, 1.77245385091]
sage: rts = sorted([L(root.real()) for root in ext_rts if root.imag() == 0])
sage: rts
[-1.7724538509055161039640324815991334617137908935546875000000, 1.7724538509055158819194275565678253769874572753906250000000]
which is the same as my final output, I think. Note that ext_rts still have the same magnitude. This is a big-endian machine, I guess (PPC)? I have no idea whether that is relevant, and I note that the error seems to be the opposite of the one others reported.
comment:13 follow-up: ↓ 14 Changed 9 years ago by
Thank you Karl-Dieter for your help. It is much appreciated. On my computer I get more digits if I print elements of ext_rts1 separately:
sage: ext_rts1[0]
-1.7724538509055159
sage: ext_rts1[1]
1.7724538509055159
which shows that they are exactly opposite:
sage: ext_rts1[0]+ext_rts1[1]
0.0
Similarly I get:
sage: rts[0]
-1.77245385091
sage: rts[1]
1.77245385091
sage: rts[0]+rts[1]
0
What do you get for the above commands?
Paul
comment:14 in reply to: ↑ 13 Changed 9 years ago by
Replying to zimmerma:
thank you Karl-Dieter for your help. It is much appreciated. On my computer I get more digits [...] What do you get for the above commands?
Okay, not the same.
sage: ext_rts1[0]
1.7724538509055159
sage: ext_rts1[1]
-1.7724538509055161
sage: ext_rts1[0]+ext_rts1[1]
-2.2204460492503131e-16
sage: rts[0]
-1.7724538509055161039640324815991334617137908935546875000000
sage: rts[1]
1.7724538509055158819194275565678253769874572753906250000000
sage: rts[0]+rts[1]
-2.2204460492503130808472633361816406250000000000000000000000e-16
This happens whether I use 4.6.alpha2 (numpy-1.3.0) or 4.6.alpha1+numpy-1.4.1 (happened to be working on that ticket on this computer, of course that's not merged yet). Was this doctest added in alpha2?
comment:15 Changed 9 years ago by
Up to a sign, I get the same thing on fulvia:
sage: ext_rts1[0]
-1.7724538509055159
sage: ext_rts1[1]
1.7724538509055161
sage: ext_rts1[0] + ext_rts1[1]
2.2204460492503131e-16
sage: rts[0] + rts[1]
2.2204460492503130808472633361816406250000000000000000000000e-16
I think the doctest is new between 4.6.alpha1 and 4.6.alpha2, as well as the warning message about NumPy not supporting arbitrary precision arithmetic.
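The residual 2.2204460492503131e-16 seen in comments 14 and 15 is itself telling: it is exactly 2^-52, the machine epsilon for doubles, i.e. one ulp at magnitude ~1.77 (a quick check in plain Python):

```python
import sys

diff = 2.2204460492503131e-16  # the value of rts[0] + rts[1] above
print(diff == 2.0 ** -52)              # True: exactly one ulp near 1.77
print(diff == sys.float_info.epsilon)  # True: the double-precision epsilon
```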
comment:16 Changed 9 years ago by
- Cc was added
- Status changed from needs_info to needs_work
- Work issues set to report upstream (numpy)
thanks Karl-Dieter and John. This clearly shows this is an upstream (numpy) problem. Does anybody know how to report the problem upstream? The numpy SPKG.txt file says that *Josh* Kantor is the maintainer, but on trac.sagemath.org I can only see *Jeff* Kantor.
Paul
comment:17 Changed 9 years ago by
comment:18 Changed 9 years ago by
With respect to upstream, it's worth pointing out that we are a couple releases behind. Eventually John or I might be able to try this with the numpy 1.5.0 that is now on Trac # something, but it's possible it's already fixed there, so you may want to try the latest numpy and see about that first.
comment:19 Changed 9 years ago by
so you may want to try the latest numpy and see about that first
Unfortunately I cannot reproduce the original problem on my computer, so trying the latest numpy will surely work for me, but will not help resolve this issue.
Paul
comment:20 Changed 9 years ago by
I downloaded the new numpy spkg from #9808 and installed it on fulvia. Now I'm running sage -ba, and then I'll rerun the doctest here.
comment:21 Changed 9 years ago by
Well, that didn't work ("ImportError: ld.so.1: python: fatal: relocation error:"). So I'll try again, building everything from scratch.
comment:22 follow-up: ↓ 24 Changed 9 years ago by
Dave kindly opened an account for me on his computer. I did investigate: the problem lies in the eigvals routine from numpy, which itself calls the dgeev routine from lapack. I am stuck there, since the dgeev routine is quite complex. Again, the maintainers of the lapack spkg are Josh Kantor and William.
Paul
comment:23 Changed 9 years ago by
I'm also stuck because the new version of numpy from #9808 doesn't really build successfully on fulvia, so I can't test whether the new numpy fixes the problem. One suggestion at #9808 was that lapack and/or ATLAS are broken on fulvia, which may be causing the problems with the new numpy build. I suppose it might also cause the problem on this ticket, from what you're saying.
comment:24 in reply to: ↑ 22 Changed 9 years ago by
Replying to zimmerma:
Dave kindly opened an account for me on his computer. I did investigate: the problem lies in the eigvals routine from numpy, which itself calls the dgeev routine from lapack. I am stuck there, since the dgeev routine is quite complex. Again, the maintainers of the lapack spkg are Josh Kantor and William.
I get the same problem with the new numpy package, although I didn't do ./sage -ba since it would take forever... I don't think that matters, though, since

sage: numpy.version.version
'1.5.0'

so it's using the new one.
You may want to email sage-devel on this one to ask who would be checking up on lapack upstream; I don't know, since William has a lot of other things going on and is often a 'default' spkg maintainer, and I'm not sure Kantor has been active much (HISTORY.txt on the website indicates his last 'contribution' in terms of refereeing or code was about a year ago).
Feel free to place any commands here that would enable someone to easily replicate what you found on another machine, by the way.
comment:25 Changed 9 years ago by
Feel free to place any commands here that would enable someone to easily replicate what you found on another machine, by the way.
Here is what I did on Dave's machine:
R.<x> = RealField(200)[]
f = x^2 - R(pi)
import numpy
coeffs = f.list()
ty = float
numpy_dtype = 'double'
numpy_array = numpy.array([ty(c) for c in reversed(coeffs)], dtype=numpy_dtype)
ext_rts1 = numpy.roots(numpy_array)
ext_rts1[0]
ext_rts1[1]
p = numpy_array
from numpy.lib.shape_base import hstack, atleast_1d
p = atleast_1d(p)
import numpy.core.numeric as NX
non_zero = NX.nonzero(NX.ravel(p))[0]
trailing_zeros = len(p) - non_zero[-1] - 1
p = p[int(non_zero[0]):int(non_zero[-1])+1]
N = len(p)
from numpy.lib.twodim_base import diag, vander
A = diag(NX.ones((N-2,), p.dtype), -1)
A[0, :] = -p[1:] / p[0]
from numpy.linalg import eigvals, lstsq
roots = eigvals(A)
roots[0]
roots[1]
from numpy.core import asarray, inexact, complexfloating, zeros, double
a = copy(A)
a = asarray(a)
wrap = getattr(a, "__array_wrap__", a.__array_wrap__)
t = double
result_t = double
from numpy.linalg import lapack_lite
n = a.shape[0]
dummy = zeros((1,), t)
wr = zeros((n,), t)
wi = zeros((n,), t)
lwork = 1
work = zeros((lwork,), t)
lapack_routine = lapack_lite.dgeev
results = lapack_routine('N', 'N', n, a, n, wr, wi, dummy, 1, dummy, 1, work, -1, 0)
res = wr.astype(result_t)
res[0]
res[1]
Paul
PS: it seems the final wr value is not always correctly set; sometimes I got 0.0 for res[0] and res[1].
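Paul's transcript can be condensed into a version that uses only public NumPy entry points (a sketch; numpy.roots internally builds exactly this companion matrix and feeds it to eigvals, which in turn drives LAPACK's dgeev):

```python
import math
import numpy as np

# Monic polynomial x^2 - pi with double coefficients.
p = np.array([1.0, 0.0, -math.pi])
# Companion matrix, built the same way numpy.roots builds it.
A = np.diag(np.ones(len(p) - 2), -1)
A[0, :] = -p[1:] / p[0]
roots = np.linalg.eigvals(A)
print(sorted(roots.real))  # near -sqrt(pi) and +sqrt(pi)
```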
comment:26 Changed 9 years ago by
I'm not sure what I am supposed to get with this - what is wrong with 0.0?
sage: res
array([ 0., 0.])
sage: res[0]
0.0
sage: res[1]
0.0
Should this not happen, but instead just give 0? Maybe that is the bug, then; I get this consistently, and astype is a numpy method.
I still get the bad result up through roots, but the rest is mysterious to me (other than that it gets access to the dgeev method). results seems to be just

sage: results
{'info': 0, 'lda': 2, 'jobvl': 'N', 'ldvl': 1, 'n': 2, 'ldvr': 1, 'lwork': -1, 'jobvr': 'N', 'dgeev_': 0}
I hope that is helpful to know.
comment:27 Changed 9 years ago by
I note someone added a work issue "report upstream (numpy)". Is that still appropriate? Has that been done?
http://projects.scipy.org/numpy is the place to do it if this is an upstream bug. I've just created a bug report there myself (http://projects.scipy.org/numpy/ticket/1625) for the NumPy issue at #9808.
comment:28 follow-ups: ↓ 29 ↓ 31 Changed 9 years ago by
- Work issues report upstream (numpy) deleted
I'm not sure what I am supposed to get with this - what is wrong with 0.0?
I also got sometimes 0.0, instead of the expected values array([ 1.77245385, -1.77245385]).
Thanks Dave for reporting the problem upstream.
Paul
comment:29 in reply to: ↑ 28 Changed 9 years ago by
Replying to zimmerma:
I'm not sure what I am supposed to get with this - what is wrong with 0.0?
I also got sometimes 0.0, instead of the expected values array([ 1.77245385, -1.77245385]).
Thanks Dave for reporting the problem upstream.
Paul
Paul,
I have NOT reported this upstream. I reported another Numpy issue to the Numpy list, and gave a link to where to post bug reports, but I have not done this one. It would be better if someone who understands the bug better does that.
Dave
comment:30 Changed 9 years ago by
- Work issues set to report upstream
comment:31 in reply to: ↑ 28 Changed 9 years ago by
Replying to zimmerma:
I'm not sure what I am supposed to get with this - what is wrong with 0.0?
I also got sometimes 0.0, instead of the expected values array([ 1.77245385, -1.77245385]).
Umm... help me out here.

wr = zeros((n,), t)

So, shouldn't that make an array of zeros? Also, when I do results, the output is not really helpful to me. Is this supposed to create the array you mention above in place of wr? I'm sorry if I'm missing something obvious.
comment:32 follow-up: ↓ 33 Changed 9 years ago by
My guess is that lapack_routine modifies the array wr of zeros in place, and that res = wr.astype(result_t) recovers the modified array.

Paul
comment:33 in reply to: ↑ 32 Changed 9 years ago by
Replying to zimmerma:
my guess is that lapack_routine modifies in place the array wr of zeros, and that res=wr.astype(result_t) recovers the modified array.
Okay, so one does want something other than the zeros. Thanks for clarifying that.
But I have to say, I don't even get anything different on my OS X 10.6 Mac, yet it gets the right roots
sage: roots = eigvals(A) sage: roots[0] 1.7724538509055159 sage: roots[1] -1.7724538509055159
Is it possible you mistyped something around the lapack_routine
spot?
comment:34 Changed 9 years ago by
on Dave's machine with sage-4.6.alpha2 I get:
sage: roots = eigvals(A)
sage: roots[0]
-1.7724538509055159
sage: roots[1]
1.7724538509055161

Note the different last digits; the positive root should print 1.7724538509055159.
On a Core 2 I get the same results as in comment 33.
Paul
comment:35 Changed 9 years ago by
Yes, and I get something similar in comment 14. My question is that I would like to see what you get for res[0] and res[1] on a machine where it supposedly works, because on my 10.6 Mac (where roots is correct) I also get 0.0 for those, just like on the machine where roots is incorrect. I hope this clarifies what I meant.
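One likely explanation for the universal 0.0 (an assumption based on LAPACK's documented calling convention, not verified against this exact numpy version): a dgeev call with lwork = -1 is a workspace-size query only. It writes the optimal lwork into work[0] and leaves wr/wi untouched, and the transcript in comment 25 stops after that query call, so res == [0.0, 0.0] would be the expected state on every machine. numpy.linalg.eigvals performs the query and then a second, real call internally:

```python
import numpy as np

# Companion matrix of x^2 - pi, as in the transcript above.
A = np.array([[0.0, np.pi],
              [1.0, 0.0]])
# eigvals drives LAPACK's geev routine internally (workspace query,
# then the actual call), so the eigenvalues come back filled in rather
# than as the initial zeros.
w = np.linalg.eigvals(A)
print(sorted(w.real))
```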
comment:36 Changed 9 years ago by
comment:37 Changed 9 years ago by
Has anyone had a chance to look at this? It is marked as a blocker for 4.6, so it is quite important this is squashed soon.
Dave
comment:38 follow-up: ↓ 39 Changed 9 years ago by
Has anyone had a chance to look at this? It is marked as a blocker for 4.6, so it is quite important this is squashed soon.
I agree. The problem has been isolated to a NumPy (or Lapack) problem, so someone fluent with NumPy and/or Lapack should look at it. Apart from William, who is quite busy, can we add someone in cc who knows about NumPy and/or Lapack?
Otherwise, a quick-and-dirty workaround is to replace the last digits by ... in the doctest to allow some numerical error.
Paul
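Paul's ellipsis workaround, sketched as a plain Python doctest (names and digits here are illustrative, not the actual Sage test; Sage's doctest framework honours `...` the same way):

```python
import doctest
import math

def roots_of_x2_minus_pi():
    """Return the two real roots of x^2 - pi as machine doubles.

    Only the stable leading digits are pinned; the trailing digits are
    left to `...` since they vary across platforms:

    >>> roots_of_x2_minus_pi()  # doctest: +ELLIPSIS
    [-1.77245385090551..., 1.77245385090551...]
    """
    r = math.sqrt(math.pi)
    return [-r, r]

failures = doctest.testmod().failed
print(failures)  # 0
```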
comment:39 in reply to: ↑ 38 Changed 9 years ago by
Replying to zimmerma:
Has anyone had a chance to look at this? It is marked as a blocker for 4.6, so it is quite important this is squashed soon.
I agree. The problem has been isolated to a NumPy (or Lapack) problem, so someone fluent with NumPy and/or Lapack should look at it. Apart from William, who is quite busy, can we add someone in cc who knows about NumPy and/or Lapack?
If we could get some more precise details about how it's been isolated, we might be able to post to the Numpy list now.
Otherwise, a quick-and-dirty workaround is to replace the last digits by ... in the doctest to allow some numerical error.

Paul
I'm not sure that achieves anything useful. Sure, the doctest will pass, but it might actually delay the urgency given to fixing the underlying problem. A successful test is one that has uncovered a previously unknown error. This test has done that. I don't personally think it's appropriate to water down the test just to get 100% passes.
I suggested some time ago we should have a page on a wiki which documents known issues with a release - either before or after the release is made.
e.g. I started this, though it's not been updated:
http://wiki.sagemath.org/errata
I think it would be better to document the problem rather than temporarily hide it.
Dave
comment:40 follow-up: ↓ 55 Changed 9 years ago by
On sage-release, Georg Weber reports the following:
Sage-4.6.alpha3 builds fine on 32bit MacIntel with OS X 10.4 (Tiger).

sage -t -long -force_lib "devel/sage/sage/rings/polynomial/polynomial_element.pyx"
**********************************************************************
File "/Users/Shared/sage/test/sage-4.6.alpha3/devel/sage/sage/rings/polynomial/polynomial_element.pyx", line 4317:
    sage: f.roots(algorithm='numpy')
Expected:
    doctest... UserWarning: NumPy does not support arbitrary precision arithmetic. The roots found will likely have less precision than you expect.
    [(-1.7724538509055158819194275565678253769874572753906250000000, 1), (1.7724538509055158819194275565678253769874572753906250000000, 1)]
Got:
    doctest:1: UserWarning: NumPy does not support arbitrary precision arithmetic. The roots found will likely have less precision than you expect.
    [(-1.7724538509055161039640324815991334617137908935546875000000, 1), (1.7724538509055158819194275565678253769874572753906250000000, 1)]
**********************************************************************
So this is yet another data point about this issue.
comment:41 Changed 9 years ago by
- Cc jason added
comment:42 follow-up: ↓ 43 Changed 9 years ago by
If I'm not mistaken, the systems which are showing this problem are all 32-bit builds.
- 'fulvia', the Solaris 10 x86 system, is using a 32-bit version of Sage, even though the hardware is 64-bit
- 'hawk', my own personal OpenSolaris machine, is using a 32-bit version of Sage, even though the hardware is 64-bit
- Cicero is a 32-bit Linux system - see http://wiki.sagemath.org/skynet
- Georg Weber's Mac is 32-bit.
Are there any counter examples? I believe John has built Sage on 'mark2' without the failure. Although that's a 64-bit system, the binary is only 32-bit.
From what I understand, sqrt() should be the same on any IEEE 754 system. However, the same is not true for pow(). It seems unlikely but perhaps one root is being computed using sqrt(x) and the other pow(x,0.5). Mathematically they are equal, but perhaps not on a computer with a finite number of bits.
Since pi is irrational, this will never be able to be computed exactly using floating point maths.
Dave
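Dave's sqrt-vs-pow hypothesis is easy to probe in isolation (a plain Python sketch; IEEE 754 requires sqrt to be correctly rounded, while the accuracy of the libm pow is implementation-defined, so the two may legitimately disagree in the last bit on some platforms):

```python
import math

x = math.pi
s = math.sqrt(x)      # correctly rounded, per IEEE 754
p = math.pow(x, 0.5)  # libm pow; accuracy is implementation-defined
print(s, p, s == p)   # on most modern libms these agree, but equality
                      # is not guaranteed by the standard
```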
comment:43 in reply to: ↑ 42 Changed 9 years ago by
Replying to drkirkby:
If I'm not mistaken, the systems which are showing this problem are all 32-bit builds.
- 'fulvia', the Solaris 10 x86 system, is using a 32-bit version of Sage, even though the hardware is 64-bit
- 'hawk', my own personal OpenSolaris machine, is using a 32-bit version of Sage, even though the hardware is 64-bit
- Cicero is a 32-bit Linux system - see http://wiki.sagemath.org/skynet
- Georg Weber's Mac is 32-bit.
My PPC Mac also is building 32-bit.
From what I understand, sqrt() should be the same on any IEEE 754 system. However, the same is not true for pow(). It seems unlikely but perhaps one root is being computed using sqrt(x) and the other pow(x,0.5). Mathematically they are equal, but perhaps not on a computer with a finite number of bits.
Since Paul pointed out that there is some linear algebra going on here deep within numpy, it seems unlikely that this is what is going on.
comment:44 follow-up: ↓ 56 Changed 9 years ago by
Since Paul pointed out that there is some linear algebra going on here deep within numpy, it seems unlikely that this is what is going on.
Yes, it seems the computation is done by the lapack dgeev routine. From the source file dgeev.f in the Sage distribution, it does not call POW; however, it calls SQRT. But the square root is a standard IEEE function. What could happen, however, is that on some machines lapack could use double extended precision (i.e., a 64-bit instead of a 53-bit significand).
One way to debug this is to add print statements in the dgeev.f file, recompile lapack, and see the difference between two different computers. How can one (easily) recompile and rebuild an spkg (I guess sage -br will not do that)?

Paul
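Paul's extended-precision hypothesis can be probed from Python via numpy.longdouble (a sketch; what longdouble maps to is platform-dependent: on x86 Linux builds it is typically the x87 80-bit format with a 64-bit significand, while on some platforms it is just the 53-bit double):

```python
import numpy as np

# Number of explicitly stored significand bits: 52 for IEEE double,
# 63 where longdouble is the x87 80-bit extended format.
print(np.finfo(np.float64).nmant)
print(np.finfo(np.longdouble).nmant)
# If intermediate dgeev arithmetic were carried in the wider format on
# some machines and rounded to double only at the end, results could
# differ by an ulp from machines doing pure 53-bit arithmetic throughout.
```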
comment:45 Changed 9 years ago by
Paul's comments about the 53 or 64-bits has got me thinking.
I think gcc behaves differently with regard to SSE instructions on 32-bit and 64-bit builds. I believe in some cases the FPU is used, but in others the SSE instructions are used; I recall comments saying the default changed between 32-bit and 64-bit builds, though there are compiler options one can use to override this in modern gccs.
However, why the absolute magnitudes of the two roots differ is odd. The two roots I got were:

-1.7724538509055158819194275565678253769874572753906250000000
+1.7724538509055161039640324815991334617137908935546875000000
Dave
comment:46 follow-up: ↓ 47 Changed 9 years ago by
Dave, you might try to recompile the lapack spkg with -O0 instead of the default -O3.
Paul
comment:47 in reply to: ↑ 46 Changed 9 years ago by
Replying to zimmerma:
Dave, you might try to recompile the lapack spkg with -O0 instead of the default -O3.

Paul
I'll try that Paul. The Lapack package is a real mess.
- It's probably a snapshot taken from CVS, given it has the date in it, but no version number.
- It's almost 3 years old, but the latest Lapack was released on 30th June 2010
- The Mercurial repository is basically useless.
drkirkby@hawk:~/sage-4.6.alpha3/spkg/standard/lapack-20071123.p1$ hg status | head
M spkg-install
? spkg-install~
? src/BLAS/SRC/Makefile
? src/BLAS/SRC/caxpy.f
? src/BLAS/SRC/ccopy.f
? src/BLAS/SRC/cdotc.f
? src/BLAS/SRC/cdotu.f
? src/BLAS/SRC/cgbmv.f
? src/BLAS/SRC/cgemm.f

drkirkby@hawk:~/sage-4.6.alpha3/spkg/standard/lapack-20071123.p1$ cat SPKG.txt
MAINTAINERS:
Josh Kantor
William Stein
drkirkby@hawk:~/sage-4.6.alpha3/spkg/standard/lapack-20071123.p1$
I did a build of the old lapack with -O0, but it made no difference. I've tried my noddy C program with compiler options like -mfpmath=387 and -mfpmath=sse, -m32 and -m64, but nothing changes the output.
The problem is that a lot in Sage depends on lapack, so I think a total rebuild is in order in this case. That includes ATLAS on this system, which takes nearly 2 hours to build in 32-bit mode, as there's no tuning data for my Xeon W3580 on 32-bit builds. On 64-bit builds, ATLAS builds in about 8 minutes.
There must be a pretty good case for updating Lapack very soon, if not in this release. As this ticket is a blocker, in the event an updated lapack solves it, then I think that would be worth doing.
Dave
comment:48 follow-up: ↓ 49 Changed 9 years ago by
David,
The problem is a lot in Sage depends on lapack, so I think a total rebuild is in order in this case.
I would be surprised if the lapack dgeev routine depended on other components.
Paul
comment:49 in reply to: ↑ 48 Changed 9 years ago by
Replying to zimmerma:
David,
The problem is a lot in Sage depends on lapack, so I think a total rebuild is in order in this case.
I would be surprised if the lapack dgeev routine depended on other components.

Paul
Well. I've created a new lapack package, which I've built on my OpenSolaris machine.
I notice it can build 4 libraries, but it looks like Sage is only building one of them. There's this bit of code in one of the files. I've no idea what they all might do, though.
LAPACKLIB = lapack$(PLAT).a
TMGLIB = tmglib$(PLAT).a
EIGSRCLIB = eigsrc$(PLAT).a
LINSRCLIB = linsrc$(PLAT).a
Perhaps it would be wise if we just run "make" and installed the default libraries, rather than a subset of them. The total build time of the lapack library is quite modest:
real 0m13.445s
user 1m7.550s
sys 0m26.048s
Successfully installed lapack-3.2.2

(that's on a quad core 3.33 GHz Hyperthreaded Xeon, so it would obviously take more/less time on slower/faster hardware; it would no doubt take longer to build all 4 libraries too). But having a partial installation might not be such a good idea and would no doubt make running tests more difficult. (Of course, there's no spkg-check file in the lapack package in Sage. Testing code is a very low priority.)
I've put a temporary copy of the lapack package (md5 checksum 521afc0c5b1fde0a0159c4770b0e173e) at
http://boxen.math.washington.edu/home/kirkby/crap/lapack-3.2.2.spkg
This is just hacked together at this point. There's no Mercurial repository (I deleted the other as it looked pretty useless), SPKG.txt is a mess, there's no spkg-check, it's not spell checked, etc. etc. BUT perhaps someone else can try it. Sage built OK on my OpenSolaris machine with this, and it's just building the documentation as I write. I'll clean up the package at some point, but someone can try it if they want.
Dave
comment:50 Changed 9 years ago by
I created #10123 to update lapack. It does not look like it will solve this problem, so it's not of huge urgency. I'll take care of that.
Dave
comment:51 Changed 9 years ago by
I made a mistake. Lapack is building ok, but once ATLAS built, it would have copied the lapack from my global ATLAS install. Hence this was not a correct build. I thought it was a bit quick, as a full build with ATLAS takes 2-3 hours, where this built in 20 minutes or so.
I'm rebuilding now.
Dave
comment:52 follow-up: ↓ 53 Changed 9 years ago by
For lapack, see also #10121: a cleaned up version of the current spkg.
comment:53 in reply to: ↑ 52 Changed 9 years ago by
Replying to jhpalmieri:
For lapack, see also #10121: a cleaned up version of the current spkg.
Thank you John. I cc'ed you on #10123. It looks like it will be relatively easy to update the version, as the spkg-install is very simple. So producing a new version should be very easy.
The current package name suggests this is based on code from November 2007, but the last commit to the repository is several months before then. Anyway, hopefully #10123 will bring some benefits.
Dave
comment:54 Changed 9 years ago by
I'm busy until at least Wednesday of next week (20th October), so will not be spending much time on Sage.
Updating the version of Lapack (#10123) is more complex than I thought it would be, and I don't have time to complete this. Libraries are called foobar.a and not libfoobar.a. This causes issues with both the packages self-tests and Sage doctests. So the package I created is effectively useless and will remain so unless I work on it more.
IIRC, I did manage to get this doc test to run, and it still failed, so the Lapack upgrade will not fix this. However, it's hard to report bugs upstream when we are running 3 year old versions of packages, so I think updating Lapack is necessary before we report this upstream.
So it would be good if someone could review John's cleaned up Lapack at #10121, and not wait for me to produce a new version of Lapack.
It would be good if someone could write a few doctests broadly similar to this one, to ensure that under some other conditions the NumPy errors do not become totally unacceptable.
Having thought about it more, I think Paul Zimmermann's suggestion to use the dots to make this pass might be sensible, as from a practical point of view such a small error is not really a problem. There does appear to be a bug, but there are a lot more serious bugs in Sage that are not blockers.
For example, if one runs the Maxima self-tests, several fail with small numerical issues with ECL but pass with other Lisp interpreters. One of the Maxima developers has said he thinks ECL is "off by one bit in places". Of course, the fact we don't run the Maxima self-tests as part of Sage means we don't see this.
Many testing frameworks for software have several categories for failures, which include
- Expected to fail and did fail
- Expected to fail and passed.
- Expected to pass and failed
comment:55 in reply to: ↑ 40 Changed 9 years ago by
Replying to kcrisman:
On sage-release, Georg Weber reports the following:
Sage-4.6.alpha3 builds fine on 32bit MacIntel with OS X 10.4 (Tiger).

sage -t -long -force_lib "devel/sage/sage/rings/polynomial/polynomial_element.pyx"
**********************************************************************
File "/Users/Shared/sage/test/sage-4.6.alpha3/devel/sage/sage/rings/polynomial/polynomial_element.pyx", line 4317:
......
Same (up to the last digit...) on a Mac OS X 10.5 PPC, with gcc-4.1.2 (using an unofficial snapshot of Apple's Xcode from the R homepage).
Just my 0.01c
Dima
comment:56 in reply to: ↑ 44 Changed 9 years ago by
Replying to zimmerma:
[...] it seems the computation is done by the LAPACK dgeev routine. From the source file dgeev.f in the Sage distribution, it does not call POW; however it calls SQRT. But the square root is a standard IEEE function. What could happen however is that on some machines LAPACK could use double extended precision (i.e., 64-bit instead of 53-bit significand).
Yes, the difference is caused by LAPACK, not NumPy (see below).
I'm not sure, but I doubt PPCs have (Intel's!) extended double precision, i.e. -mfpmath=sse vs. -mfpmath=387 doesn't apply there. (But IBM uses other floating-point formats beyond 64-bit double precision.)
One way to debug this is to add print statements in the dgeev.f file, recompile LAPACK and see the difference between two different computers. How can one (easily) recompile and build an spkg (I guess sage -br will not do that)?
For hacking the Fortran code (avoiding creating new LAPACK spkgs), it is sufficient to extract the corresponding files (e.g. dgeev.f) from the spkg, put modified versions into the [static] LAPACK library, which is $SAGE_ROOT/local/lib/liblapack.a, and to afterwards rebuild / reinstall the NumPy spkg, which links this library into its dynamic ones.

Assuming one has copied lapack-20071123.p1/src/SRC/dgeev.f to $SAGE_ROOT, one can do e.g. the following to include a modified version:
~/sage-x.y.z$ gfortran -fPIC -c dgeev.f                    # or sage_fortran -c dgeev.f
~/sage-x.y.z$ ar rs local/lib/liblapack.a dgeev.o          # "s" saves running ranlib separately
~/sage-x.y.z$ ./sage -f spkg/standard/numpy-1.3.0.p4.spkg  # force reinstallation
and then run Sage.
(Dave: I've not verified if the "s" option is POSIX... ;-) The ar command above is just a short-cut for

~/sage-x.y.z$ ar r local/lib/liblapack.a dgeev.o   # replace the existing dgeev module
~/sage-x.y.z$ ranlib local/lib/liblapack.a         # rebuild the symbol table

)
On a 32-bit Pentium4 (Prescott) and a 64-bit Core2 I get, respectively:
sage: R.<x>=RealField(200)[]
sage: f=x^2 - R(pi)
sage: import numpy
sage: na=numpy.array([float(c) for c in reversed(f.list())], dtype="double")
sage: n_roots=numpy.roots(na)
Workspace query only:
A( 1, 1)= -0.0000000000000000000000000D+00
A( 1, 2)=  0.3141592653589793115998000D+01
A( 2, 1)=  0.1000000000000000000000000D+01
A( 2, 2)=  0.0000000000000000000000000D+00
Actual computation:
A( 1, 1)= -0.0000000000000000000000000D+00
A( 1, 2)=  0.3141592653589793115998000D+01
A( 2, 1)=  0.1000000000000000000000000D+01
A( 2, 2)=  0.0000000000000000000000000D+00
Eigenvalues only...
WR( 1)= -0.1772453850905515881919400D+01
WR( 2)=  0.1772453850905516103964000D+01
sage:
sage: R.<x> = RealField(200)[]
sage: f = x^2 - R(pi)
sage: import numpy
sage: na=numpy.array([float(c) for c in reversed(f.list())], dtype='double')
sage: n_roots=numpy.roots(na)
Workspace query only:
A( 1, 1)= -0.0000000000000000000000000D+00
A( 1, 2)=  0.3141592653589793115998000D+01
A( 2, 1)=  0.1000000000000000000000000D+01
A( 2, 2)=  0.0000000000000000000000000D+00
Actual computation:
A( 1, 1)= -0.0000000000000000000000000D+00
A( 1, 2)=  0.3141592653589793115998000D+01
A( 2, 1)=  0.1000000000000000000000000D+01
A( 2, 2)=  0.0000000000000000000000000D+00
Eigenvalues only...
WR( 1)= -0.1772453850905515881919400D+01
WR( 2)=  0.1772453850905515881919400D+01
sage:
(The machine parameters queried by LAPACK's DLAMCH() are identical; the chosen precision / format for the PRINT statements, D35.25, should be sufficient.)
What about (temporarily?) tagging the doctest result(s) with # 32-bit / # 64-bit? (It seems the results are identical on all 32-bit platforms...?)
I've not yet tracked this down further (perhaps someone more knowledgeable of LAPACK / Fortran can do this), nor have I tried the more recent LAPACK version.
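As a cross-check that needs no Fortran toolchain, the same companion-matrix computation can be driven from plain NumPy: numpy.roots builds exactly the 2x2 matrix printed above and obtains its eigenvalues via LAPACK's dgeev. This is only an illustrative sketch (the last few digits will vary per machine, which is the whole point of this ticket):

```python
import numpy as np

# Companion matrix of x^2 - pi; numpy.roots builds this same matrix and
# computes its eigenvalues (LAPACK dgeev) via numpy.linalg.eigvals.
A = np.array([[0.0, np.pi],
              [1.0, 0.0]])

w = np.linalg.eigvals(A)
# The imaginary parts are exactly zero here, so .real is safe.
for v in np.sort(w.real):
    print("%+.25e" % v)   # roughly the D35.25 precision used in the Fortran hack
```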
comment:57 follow-up: ↓ 58 Changed 9 years ago by
The dgeev.f from the canonical place, http://www.netlib.org/lapack/double/dgeev.f, does not differ from what we have in the lapack spkg. So I doubt that a "more recent version" (well, this one is Fortran VI, no joke...) would provide a fix.
It looks to me more like a compiler bug than a LAPACK bug... Well, I used to know enough Fortran to be able to dig into this; I wonder how one can get hold of a non-gcc-based Fortran compiler for one of the platforms where we have this issue...
comment:58 in reply to: ↑ 57 Changed 9 years ago by
Replying to dimpase:
The dgeev.f from the canonical place: http://www.netlib.org/lapack/double/dgeev.f does not differ from what we have in lapack spkg. So I doubt that a "more recent version" (well, this one is Fortran VI, no joke...) would provide a fix.
dgeev calls many other modules...
It looks to me more of a compiler bug than Lapack bug...
I was thinking of GCC built-ins, too.
I wonder how one can get hold of a non-gcc-based fortran compiler for one of platforms we have this issue...
Sun's Fortran compiler on e.g. t2.math? Perhaps Dave also has an IBM (AIX) or HP[-UX] Fortran compiler.
Btw, on Linux, I have a couple of LAPACK library versions, including "3gf" ("gf" for GNU's gfortran).
comment:59 follow-up: ↓ 60 Changed 9 years ago by
I've just recompiled LAPACK with -march=native -mfpmath=sse (by adding this in the sage_fortran script) on the Pentium4 Prescott (which has SSE3)... with that, the difference vanishes.

I still wonder what happens on PowerPC CPUs.
comment:60 in reply to: ↑ 59 ; follow-up: ↓ 62 Changed 9 years ago by
Replying to leif:
I've just recompiled LAPACK with -march=native -mfpmath=sse (by adding this in the sage_fortran script) on the Pentium4 Prescott (which has SSE3)... with that, the difference vanishes. I still wonder what happens on PowerPC CPUs.
There's no SSEx on PPC, it's an x86 thing, IMHO... OK, so perhaps there is a miracle compiler switch on PPC that will do the same trick. Let me write a Fortran-only test, and see...
comment:61 Changed 9 years ago by
Replying to leif:
I've just recompiled LAPACK with -march=native -mfpmath=sse (by adding this in the sage_fortran script) on the Pentium4 Prescott (which has SSE3)... with that, the difference vanishes.
... and the opposite (i.e., -mfpmath=387) on the Core2, which in turn results in:
leif@quadriga:~/Sage/sage-4.6.alpha3$ ./sage -t -long devel/sage/sage/rings/polynomial/polynomial_element.pyx
sage -t -long "devel/sage/sage/rings/polynomial/polynomial_element.pyx"
**********************************************************************
File "/home/leif/Sage/sage-4.6.alpha3/devel/sage/sage/rings/polynomial/polynomial_element.pyx", line 4317:
sage: f.roots(algorithm='numpy')
Expected:
    doctest... UserWarning: NumPy does not support arbitrary precision arithmetic. The roots found will likely have less precision than you expect.
    [(-1.7724538509055158819194275565678253769874572753906250000000, 1), (1.7724538509055158819194275565678253769874572753906250000000, 1)]
Got:
    doctest:1: UserWarning: NumPy does not support arbitrary precision arithmetic. The roots found will likely have less precision than you expect.
    [(-1.7724538509055158819194275565678253769874572753906250000000, 1), (1.7724538509055161039640324815991334617137908935546875000000, 1)]
**********************************************************************
1 items had failures:
   1 of 148 in __main__.example_83
***Test Failed*** 1 failures.
For whitespace errors, see the file /home/leif/.sage//tmp/.doctest_polynomial_element.py
         [10.7 s]
----------------------------------------------------------------------
The following tests failed:
        sage -t -long "devel/sage/sage/rings/polynomial/polynomial_element.pyx"
:D
comment:62 in reply to: ↑ 60 ; follow-up: ↓ 63 Changed 9 years ago by
Replying to dimpase:
there no SSEx on PPC, it's an x86 thing, IMHO...
Of course not (nor 387), but some IBM-specific double-double type.
OK, so perhaps there is a miracle compiler switch on PPC that will do the same trick.
-fdefault-double-8
    Set the "DOUBLE PRECISION" type to an 8 byte wide type. If -fdefault-real-8 is given, "DOUBLE PRECISION" would instead be promoted to 16 bytes if possible, and -fdefault-double-8 can be used to prevent this. The kind of real constants like "1.d0" will not be changed by -fdefault-real-8 though, so also -fdefault-double-8 does not affect it.
;-)
comment:63 in reply to: ↑ 62 Changed 9 years ago by
Replying to leif:
Replying to dimpase:
there no SSEx on PPC, it's an x86 thing, IMHO...
Of course not (nor 387), but some IBM-specific double-double type.
OK, so perhaps there is a miracle compiler switch on PPC that will do the same trick.
-fdefault-double-8
    Set the "DOUBLE PRECISION" type to an 8 byte wide type. If -fdefault-real-8 is given, "DOUBLE PRECISION" would instead be promoted to 16 bytes if possible, and -fdefault-double-8 can be used to prevent this. The kind of real constants like "1.d0" will not be changed by -fdefault-real-8 though, so also -fdefault-double-8 does not affect it.

;-)
Unfortunately this does not help on Mac OS X 10.5 PPC. I tried to recompile LAPACK with the extra "-fdefault-double-8 -fdefault-real-8" options put into sage_fortran, to no avail. One cannot add the -fdefault-double-8 option alone, as the compiler (gfortran 4.2.1) says that -fdefault-real-8 is required along with it. And -fdefault-real-8 alone produces code that does not work.
The next attempt will be with "-fdefault-real-8 -mlong-double-128 -lmx"...
comment:64 follow-up: ↓ 65 Changed 9 years ago by
I tried playing with the compiler flags on a noddy C program. My Fortran is almost non-existent, but perhaps it's worth trying a Fortran program. But I can't change the values printed by my noddy C program by use of compiler flags.
drkirkby@hawk:~$ cat test.c
#include <float.h>
#include <math.h>
#include <stdio.h>
#include <stdlib.h>

volatile long double x=M_PI;
volatile long double y=-M_PI;

main(){
  printf("+sqrt(pi)=%.70lf\n",sqrt(x));
  printf("-sqrt(pi)=%.70lf\n",sqrt(fabs(y)));
}
drkirkby@hawk:~$ gcc test.c -mfpmath=sse -lm
drkirkby@hawk:~$ ./a.out
+sqrt(pi)=1.7724538509055158819194275565678253769874572753906250000000000000000000
-sqrt(pi)=1.7724538509055158819194275565678253769874572753906250000000000000000000
drkirkby@hawk:~$ gcc test.c -mfpmath=387 -lm
drkirkby@hawk:~$ ./a.out
+sqrt(pi)=1.7724538509055158819194275565678253769874572753906250000000000000000000
-sqrt(pi)=1.7724538509055158819194275565678253769874572753906250000000000000000000
t2.math is in rather a mess now, following William deleting all the NFS file systems that were shared to t2.math.
I'll create Dima an account on my OpenSolaris machine. The problem exists on that. I can't say for sure whether it does on my AIX server (PowerPC) or HP-UX workstation (PA-RISC CPU). Both those systems have native compilers, but neither can build Sage. Both are quite a bit slower than the OpenSolaris machine too.
Leave that with me.
Dave
comment:65 in reply to: ↑ 64 Changed 9 years ago by
- Priority changed from blocker to major
Replying to drkirkby:
I tried playing with the compiler flags on a noddy C program. My Fortran is almost non-existant, but perhaps it's worth trying a Fortran program. But I can't change the values of these parameters in my Noddy C program by use of compiler flags.
We are computing the roots of f(x) = x^2 - Pi by computing the eigenvalues of the matrix

[[0.0, Pi], [1.0, 0.0]]

We call LAPACK's DGEEV, which does just that, by a relatively nontrivial iterative procedure. The stopping condition is related to the precision guaranteed by LAPACK, and it turns out that on some floating-point architectures the error in computing the negative root is very close to that guaranteed precision. A test program that calls DGEEV on this matrix, on Mac OS X 10.5 (PPC G4), outputs:
Eigenvalue( 1) =  1.77245385090551588191942755656782538E+00
Eigenvalue( 2) = -1.77245385090551610396403248159913346E+00
Difference in absolute value  2.22044604925031308084726333618164063E-16
whereas LAPACK precision (for doubles) is, taken from install.log:
Precision = 2.220446049250313E-016
Why anyone thinks LAPACK must do better than that, I do not understand. Indeed, some other hardware does better on this example, but this is just pure luck and/or the ability to get better precision from the hardware (something that is not available on the G4 PPC; or rather, the compiler cannot take full advantage of the hardware).
I propose to adjust the precision in doctests, and be done with it. It is not a bug...
I take the liberty to reduce the priority to major.
Dima
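Dima's observation, that the discrepancy equals the precision LAPACK guarantees, can be checked numerically: the two values reported for the root in this ticket are adjacent IEEE doubles, exactly one unit in the last place apart, and for numbers in [1,2) one ulp equals machine epsilon. A quick sanity check, using the digit strings quoted in this ticket (a sketch, not part of any patch):

```python
import numpy as np

# The two values of |root| observed on different machines, written out to
# the full decimal expansion of the underlying 64-bit doubles (quoted above):
a = 1.7724538509055158819194275565678253769874572753906250000000
b = 1.7724538509055161039640324815991334617137908935546875000000

eps = np.finfo(np.double).eps   # 2.220446049250313e-16, as in LAPACK's install.log
print(b - a == eps)             # the two results differ by exactly one ulp
print(b == np.nextafter(a, 2.0))  # i.e. they are adjacent doubles
```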
comment:66 Changed 9 years ago by
Would it help to try building with the Intel Fortran Compiler? It appears to be free on Linux for non-commercial use, but I haven't tried to download and install it. Or are we disallowed from using it, because Sage might be used commercially?
comment:67 follow-up: ↓ 68 Changed 9 years ago by
- Priority changed from major to blocker
I'd still like to resolve this before we release 4.6.
Are there any objections to Leif's suggestion to use tagged 32-bit and 64-bit results? Or, if tagging doesn't cover all reported cases, to using dots (...)?
comment:68 in reply to: ↑ 67 ; follow-up: ↓ 69 Changed 9 years ago by
- Work issues report upstream deleted
Replying to mpatel:
I'd still like to resolve this before we release 4.6.
Are there any objections to Leif's suggestion to use tagged 32-bit and 64-bit results? Or, if tagging doesn't cover all reported cases, to using dots (...)?
Well, do we agree now that this is a purely doctest problem? (mhansen (changeset 14950, trac #8054) has put too many digits there...)
One cannot expect more precision from NumPy/LAPACK than a concrete combination of hardware and software gives you. There is in fact a LAPACK call that will give you the precision, but it looks like overkill to me to code this into the doctest. On all the platforms we have, an error of up to 2.22045E-16 is what we get. So the test should pass if the difference between the actual root and the NumPy output is less than that. Which amounts to putting dots after, umh, you do the maths :-)
P.S. For the interested, this is how to test DGEEV without anything but a Fortran compiler (assuming Lapack is installed):
1) grab http://www.nag.co.uk/lapack-ex/examples/source/dgeev-ex.f off the net
2) edit it to increase the number of digits printed, i.e. edit the format statements labelled 99999 and 99998 so that they look as follows:

99999 FORMAT (1X,A,I2,A,1P,E42.35)
99998 FORMAT (1X,A,I2,A,'(',1P,E42.35,',',1P,E42.35,')')
3) compile it with your Fortran compiler: (something like $ gfortran -llapack dgeev-ex.f )
4) create a text file, say, inp.txt, with the input (the data begins at the 2nd line):
# blah...
2
0. 0.3141592653589793115998000D+01
1. 0.
5) $ ./a.out <inp.txt
Eigenvalues of the 2x2 matrix in inp.txt, i.e. +/- square root of Pi, up to the precision the setup can manage, will be printed in all gory detail.
Dima
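Dima's proposed pass criterion (accept any root within the LAPACK-guaranteed precision of the true value) could be prototyped as below. This is a sketch only; the factor of 4 ulps of slack is my own assumption, not something from the ticket:

```python
import math
import numpy as np

eps = np.finfo(np.double).eps                    # 2.220446049250313e-16, the value LAPACK reports
roots = sorted(np.roots([1.0, 0.0, -math.pi]))   # roots of x^2 - pi
exact = math.sqrt(math.pi)

for got, expected in zip(roots, (-exact, exact)):
    # Pass if the error stays within a few ulps of the expected root.
    assert abs(got - expected) <= 4 * eps * abs(expected), (got, expected)

print("roots agree to within 4 ulps")
```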
comment:69 in reply to: ↑ 68 Changed 9 years ago by
- Status changed from needs_work to needs_review
Replying to dimpase:
Replying to mpatel:
I'd still like to resolve this before we release 4.6.
Are there any objections to Leif's suggestion to use tagged 32-bit and 64-bit results? Or, if tagging doesn't cover all reported cases, to using dots (
...
)?Well, do we agree now that this is a purely doctest problem? (mhansen (changeset 14950, trac #8054) has put too many digits there...)
I've just attached the patch that puts the ellipses in the right places, so that the doctest passes on Mac OS X 10.5 PPC.
comment:70 Changed 9 years ago by
thanks Dima for giving the instructions on how to test DGEEV. I'll give it a try.
Paul
comment:71 Changed 9 years ago by
thanks Dima for giving the instructions on how to test DGEEV. I'll give it a try.
on a 64-bit Core 2 I get:
tarte% ./a.out < inp.txt
 DGEEV Example Program Results

 Eigenvalue( 1) =  1.77245385090551588191940000000000000E+00
 Eigenvector( 1)   8.7095E-01  4.9138E-01
 Eigenvalue( 2) = -1.77245385090551588191940000000000000E+00
 Eigenvector( 2)  -8.7095E-01  4.9138E-01
On Dave's machine I get:
zimmerma@hawk:~$ ./a.out < inp.txt
 DGEEV Example Program Results

 Eigenvalue( 1) =  1.77245385090551588191940000000000000E+00
 Eigenvector( 1)   8.7095E-01  4.9138E-01
 Eigenvalue( 2) = -1.77245385090551610396400000000000000E+00
 Eigenvector( 2)  -8.7095E-01  4.9138E-01
The second eigenvalue differs. Now we have to investigate inside the DGEEV call. If it performs only additions, subtractions, multiplications, divisions and square roots, there should be no difference according to IEEE 754 (unless different formats are used).
Paul
comment:72 Changed 9 years ago by
- Reviewers set to Leif Leonhardy
- Status changed from needs_review to positive_review
Positive review from me for Dima's patch. (Feel free to revert in case you disagree.)
Since the floating-point precision does not (or should not) depend on the machine word width (or pointer size), it's better not to give different expected results tagged with "32-bit" / "64-bit" in the doctest.
(As mentioned above, by changing compiler flags I can make the original doctest fail on a 64-bit Core2 as well as make it pass on a 32-bit Pentium4 Prescott.)
comment:73 Changed 9 years ago by
P.S.: Despite the positive review for Dima's patch, I appreciate Paul tracking this further down.
I wouldn't consider this a real LAPACK bug, but wonder if it's "intentional". It's quite weird anyway, though one should IMHO in general not expect to get symmetric results here; I think it's mostly our example that makes this perhaps more surprising.
comment:74 Changed 9 years ago by
Worth noting is:
- lapack.a gets copied to $SAGE_LOCAL/lib in the lapack-20071123.p1.spkg
- On OS X, the ATLAS package is not built.
- On Linux and Solaris, ATLAS gets built and creates a new lapack.a (I believe).
- There's a comment dated 28th November 2007 in SPKG.txt for ATLAS that: ATLAS produces borked lapack.so which has missing symbols
- On Linux, Solaris and FreeBSD, there's some code in the ATLAS package for creating a shared liblapack.so from the static liblapack.a
- On Solaris, the liblapack.so gets deleted as it causes problems with R.
So on OS X, one only needs to look at the LAPACK code, but on Linux, FreeBSD and Solaris one needs to consider what ATLAS might do too.
Also note that on my system, the output of my Noddy C program can't be changed by changing the compiler flags to use the 387 or SSE instructions. It always gives the output expected by the doctest.
Dave
comment:75 Changed 9 years ago by
This might end up being "Report upstream to Intel!!" Some of you may be old enough to remember the bug in the Intel i486 for floating point code. I had one hell of a struggle with Dell to get the CPU replaced. Intel sent me a 38 page document explaining why the bug was not of concern to most of the scientific community. Even Intel were reluctant to replace the CPU, but I finally got it replaced.
Dave
comment:76 follow-up: ↓ 77 Changed 9 years ago by
What I find annoying about these doctests is that whilst the output will be 1.772453850905515 or 1.772453850905516 depending on the system, with these changes anything from 1.772453850905510 to 1.772453850905519 is now considered acceptable. So we have permitted 'errors' several times higher than necessary.

In some cases I've met in the past, adding ... has resulted in errors 1000x higher being considered acceptable. IMHO, this method of testing is not very good.
Dave
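The looseness Dave describes can be quantified by counting units in the last place instead of matching decimal prefixes. The helper below is purely illustrative (not Sage or doctest API): the two results observed in this ticket are one ulp apart, while the range the ellipsis accepts spans dozens of ulps.

```python
import struct

def ulp_distance(a, b):
    """Number of representable doubles between two positive floats."""
    ia = struct.unpack('<q', struct.pack('<d', a))[0]
    ib = struct.unpack('<q', struct.pack('<d', b))[0]
    return abs(ia - ib)

# The two results seen in this ticket are exactly one ulp apart:
print(ulp_distance(1.7724538509055158819194275565678253769874572753906250000000,
                   1.7724538509055161039640324815991334617137908935546875000000))

# ...while an ellipsis after "1.77245385090551" accepts a whole decimal
# digit's worth of values, i.e. dozens of ulps:
print(ulp_distance(1.772453850905510, 1.772453850905519))
```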
comment:77 in reply to: ↑ 76 ; follow-up: ↓ 78 Changed 9 years ago by
Replying to drkirkby:
What I find annoying about these doctests is that whilst the output will be 1.772453850905515 or 1.772453850905516 depending on the system, with these changes anything from 1.772453850905510 to 1.772453850905519 is now considered acceptable. So we have permitted 'errors' several times higher than necessary. In some cases I've met in the past, adding ... has resulted in errors 1000x higher being considered acceptable. IMHO, this method of testing is not very good.
In the light of the "user warning" given, I think using dots is acceptable here.
W.r.t. 387 vs. SSE, try this:
#include <stdio.h>
#include <math.h>

volatile double x=M_PI;

int main()
{
  printf ("float:\n%+.70f\n%+.70f\n", sqrtf((float)x), -sqrtf((float)x));
  printf ("double:\n%+.70lf\n%+.70lf\n", sqrt(x), -sqrt(x));
  printf ("long double:\n%+.70Lf\n%+.70Lf\n", sqrtl((long double)x), -sqrtl((long double)x));
  return 0;
}
I only get a different result for the negated square root in the float case with 387 code (Core2, gcc 4.4.3).
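A rough Python analogue of this C test (an illustration only; Python cannot switch between 387 and SSE code paths, so this merely shows how the result depends on the precision at which the square root is rounded; on many platforms np.longdouble is the 80-bit x87 format, but not on all):

```python
import numpy as np

x = np.pi
# Square root of pi rounded at three different precisions:
for dt in (np.float32, np.float64, np.longdouble):
    v = np.sqrt(dt(x))
    print("%-12s %s" % (np.dtype(dt).name, repr(v)))
```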
comment:78 in reply to: ↑ 77 Changed 9 years ago by
Replying to leif:
In the light of the "user warning" given, I think using dots is acceptable here.
Yes, agreed. Just in general I think this method of comparing floating point numbers is rather dumb.
W.r.t. 387 vs. SSE, try this:
SSE
drkirkby@hawk:~$ gcc test3.c -lm -mfpmath=sse
drkirkby@hawk:~$ ./a.out
float:
+1.7724539041519165039062500000000000000000000000000000000000000000000000
-1.7724539041519165039062500000000000000000000000000000000000000000000000
double:
+1.7724538509055158819194275565678253769874572753906250000000000000000000
-1.7724538509055158819194275565678253769874572753906250000000000000000000
long double:
+1.7724538509055159927248895845863785325491335242986679077148437500000000
-1.7724538509055159927248895845863785325491335242986679077148437500000000
387
drkirkby@hawk:~$ gcc test3.c -lm -mfpmath=387
drkirkby@hawk:~$ ./a.out
float:
+1.7724539041519165039062500000000000000000000000000000000000000000000000
-1.7724538755670267153874419818748719990253448486328125000000000000000000
double:
+1.7724538509055158819194275565678253769874572753906250000000000000000000
-1.7724538509055158819194275565678253769874572753906250000000000000000000
long double:
+1.7724538509055159927248895845863785325491335242986679077148437500000000
-1.7724538509055159927248895845863785325491335242986679077148437500000000
I only get a different result for the negated square root in the float case with 387 code (Core2, gcc 4.4.3).
Yes, I too get different results (OpenSolaris, Intel Xeon W3580 @ 3.33 GHz, gcc 4.5.0).

Perhaps Paul has some comments on those different numbers. They would appear to violate one of the principal claims of IEEE 754 to me, but perhaps I am wrong.
Dave
comment:79 follow-up: ↓ 85 Changed 9 years ago by
The gcc documentation says that with -mfpmath=387, temporary results are computed with 80 bits of precision (i.e., with double extended precision), thus different results might be obtained. It also points to -ffloat-store, which prevents using extended precision.

With -ffloat-store I get the following on a 64-bit Core 2 (and without it, I get different absolute values for float, exactly as Dave in the above comment):
tarte% gcc -ffloat-store -mfpmath=387 test3.c -lm
tarte% ./a.out
float:
+1.7724539041519165039062500000000000000000000000000000000000000000000000
-1.7724539041519165039062500000000000000000000000000000000000000000000000
double:
+1.7724538509055158819194275565678253769874572753906250000000000000000000
-1.7724538509055158819194275565678253769874572753906250000000000000000000
long double:
+1.7724538509055159927248895845863785325491335242986679077148437500000000
-1.7724538509055159927248895845863785325491335242986679077148437500000000
I thus suggest to recompile Lapack with -ffloat-store and try again.
Paul
comment:80 follow-up: ↓ 81 Changed 9 years ago by
I've reported to Kaveh Ghazi (a GCC developer) the strange behaviour of comment 78. If/when I get an answer, I will report it here.
Paul
comment:81 in reply to: ↑ 80 ; follow-ups: ↓ 82 ↓ 87 Changed 9 years ago by
Replying to zimmerma:
I've reported to Kaveh Ghazi (a GCC developer) the strange behaviour of comment 78. If/when I get an answer, I will report it here.
cf http://gcc.gnu.org/bugzilla/show_bug.cgi?id=46080 to follow the GCC issue. However I'm not sure this is related to this ticket. Does Lapack use single precision (i.e., the C float type)?
Paul
comment:82 in reply to: ↑ 81 Changed 9 years ago by
Replying to zimmerma:
Replying to zimmerma:
I've reported to Kaveh Ghazi (a GCC developer) the strange behaviour of comment 78. If/when I get an answer, I will report it here.
cf http://gcc.gnu.org/bugzilla/show_bug.cgi?id=46080 to follow the GCC issue. However I'm not sure this is related to this ticket. Does Lapack use single precision (i.e., the C float type)?
Lapack has routines for both single and double precision (names starting with S and D, respectively)
Dmitrii
Paul
comment:83 follow-up: ↓ 84 Changed 9 years ago by
Lapack has routines for both single and double precision (names starting with S and D, respectively)
thus since the Lapack issue is with dgeev, it might be independent from the GCC float issue. However, since the GCC issue is marked as a regression from 4.4, somebody with an older version of GCC might try to recompile Lapack with it on the machines where one got different absolute values.
Paul
comment:84 in reply to: ↑ 83 Changed 9 years ago by
Replying to zimmerma:
Lapack has routines for both single and double precision (names starting with S and D, respectively)
thus since the Lapack issue is with dgeev, it might be independent from the GCC float issue. However, since the GCC issue is marked as a regression from 4.4, somebody with an older version of GCC might try to recompile Lapack with it on the machines where one got different absolute values.
The Sage / LAPACK results on the 32-bit Pentium4 Prescott came from GCC 4.3.3. (And I didn't get differences for my C test program there, regardless of SSE/387 and C99.)
Also, XCode comes with even older GCCs.
comment:85 in reply to: ↑ 79 Changed 9 years ago by
I thus suggest to recompile Lapack with -ffloat-store and try again.
I tried this on Mac OS X 10.5 PPC (G4) with gfortran 4.2.1, and saw no difference. Basically, there seem to be no appropriate Fortran compiler options available for this platform (Apple did not work on a Fortran compiler to use the better FP (AltiVec) on G4 processors; IBM did have a Fortran compiler, xlf(?), for that chip of theirs, but it is no longer sold). The following must be telling the story:
(sage subshell) cantor:dgeev dima$ sage_fortran -faltivec -llapack -lblas $SAGE_LOCAL/lib/libgfortran.a dgt.f dgeev.f
f951: warning: command line option "-faltivec" is valid for C/C++/ObjC/ObjC++ but not for Fortran
f951: warning: command line option "-faltivec" is valid for C/C++/ObjC/ObjC++ but not for Fortran
comment:86 Changed 9 years ago by
- Merged in set to sage-4.6.rc0
- Resolution set to fixed
- Status changed from positive_review to closed
comment:87 in reply to: ↑ 81 Changed 9 years ago by
cf http://gcc.gnu.org/bugzilla/show_bug.cgi?id=46080 to follow the GCC issue.
The GCC developers say this is not a GCC bug, but "simply the unpredictability of excess precision arithmetics", and advise using -ffloat-store / -fexcess-precision=standard / volatile float.
Is it possible to use -ffloat-store to build Lapack? It seems to be a valid option of sage_fortran:

tarte% /localdisk/tmp/sage-4.6/local/bin/sage_fortran -v --help | grep store
...
-ffloat-store           Don't allocate floats and doubles in extended-precision registers
Paul
Probably something has changed in roots(algorithm='numpy') between 4.6.alpha1 and 4.6.alpha2. Was numpy upgraded? Is the 4.6.alpha2 source available somewhere?
Paul