Opened 6 years ago
Closed 6 years ago
#22799 closed defect (fixed)
Strange warnings from numpy/matplotlib when sage is built with clang
Reported by:  François Bissey  Owned by:  

Priority:  major  Milestone:  sage8.0 
Component:  porting  Keywords:  
Cc:  Dima Pasechnik, John Palmieri  Merged in:  
Authors:  François Bissey, Dima Pasechnik, Paul Zimmermann  Reviewers:  John Palmieri, Dima Pasechnik 
Report Upstream:  Fixed upstream, but not in a stable release.  Work issues:  
Branch:  fd29778 (Commits, GitHub, GitLab)  Commit:  fd29778835a64b8582f1b6ce7d47c8c5062ee240 
Dependencies:  #22582  Stopgaps: 
Description (last modified by )
Seen with clang+OS X and freeBSD+clang
sage t long src/doc/en/prep/Calculus.rst # 1 doctest failed sage t long src/sage/plot/graphics.py # 1 doctest failed sage t long src/sage/plot/plot.py # 1 doctest failed sage t long src/sage/rings/polynomial/polynomial_real_mpfr_dense.pyx # 1 doctest failed sage t long src/sage/structure/coerce.pyx # 1 doctest failed
All these doctest fail because an unexpected warning is emitted:
File "src/sage/rings/polynomial/polynomial_real_mpfr_dense.pyx", line 21, in sage.rings.polynomial.polynomial_real_mpfr_dense Failed example: numpy.float32('1.5') * x Expected: 1.50000000000000*x Got: doctest:warning File "/usr/home/dima/Sage/sage/src/bin/sageruntests", line 89, in <module> err = DC.run() File "/usr/home/dima/Sage/sage/local/lib/python2.7/sitepackages/sage/doctest/control.py", line 1134, in run self.run_doctests() File "/usr/home/dima/Sage/sage/local/lib/python2.7/sitepackages/sage/doctest/control.py", line 858, in run_doctests self.dispatcher.dispatch() File "/usr/home/dima/Sage/sage/local/lib/python2.7/sitepackages/sage/doctest/forker.py", line 1705, in dispatch self.parallel_dispatch() File "/usr/home/dima/Sage/sage/local/lib/python2.7/sitepackages/sage/doctest/forker.py", line 1595, in parallel_dispatch w.start() # This might take some time File "/usr/home/dima/Sage/sage/local/lib/python2.7/sitepackages/sage/doctest/forker.py", line 1871, in start super(DocTestWorker, self).start() File "/usr/home/dima/Sage/sage/local/lib/python2.7/multiprocessing/process.py", line 130, in start self._popen = Popen(self) File "/usr/home/dima/Sage/sage/local/lib/python2.7/multiprocessing/forking.py", line 126, in __init__ code = process_obj._bootstrap() File "/usr/home/dima/Sage/sage/local/lib/python2.7/multiprocessing/process.py", line 258, in _bootstrap self.run() File "/usr/home/dima/Sage/sage/local/lib/python2.7/sitepackages/sage/doctest/forker.py", line 1844, in run task(self.options, self.outtmpfile, msgpipe, self.result_queue) File "/usr/home/dima/Sage/sage/local/lib/python2.7/sitepackages/sage/doctest/forker.py", line 2137, in __call__ runner.run(test) File "/usr/home/dima/Sage/sage/local/lib/python2.7/sitepackages/sage/doctest/forker.py", line 641, in run return self._run(test, compileflags, out) File "/usr/home/dima/Sage/sage/local/lib/python2.7/sitepackages/sage/doctest/forker.py", line 503, in _run self.compile_and_execute(example, compiler, test.globs) File "/usr/home/dima/Sage/sage/local/lib/python2.7/sitepackages/sage/doctest/forker.py", line 866, in compile_and_execute exec(compiled, globs) File "<doctest sage.rings.polynomial.polynomial_real_mpfr_dense[7]>", line 1, in <module> numpy.float32('1.5') * x : RuntimeWarning: invalid value encountered in multiply 1.50000000000000*x
More specifically, the warning is emitted by the call
sage: x=polygen(RR) sage: numpy.float32('1.5') * x
seen on freeBSD+clang, OS X+clang and linux+clang.
Similarly, the warning is emitted in
sage: numpy.float64(5)>e
or >=
instead of >
, or pi
instead of e
. Note that pi.n()
and e.n()
are of type RR
, so again it points at the direction on mpfr
.
Change History (138)
comment:1 Changed 6 years ago by
comment:2 Changed 6 years ago by
As I suspected, both warning ultimately originate from inside numpy
. The multiply
warning comes from numpy/core/tests/test_datetime.py
??? And greater_equal
from numpy/lib/tests/test_nanfunctions.py
, comparison with NaN.
comment:3 Changed 6 years ago by
Cc:  John Palmieri added 

comment:4 Changed 6 years ago by
I tried to compare the build logs for numpy (on OS X) using gcc
vs. clang
. I found a few errors or warnings in the clang build that are not present in the gcc build. For example:
numpy/core/src/multiarray/datetime.c:781:24: warning: comparison of constant 1 with expression of type 'NPY_DATETIMEUNIT' is always false [Wtautologicalconstantoutofrangecompare] if (out_meta>base == 1) { ~~~~~~~~~~~~~~ ^ ~~ numpy/core/src/multiarray/datetime.c:1847:24: warning: comparison of constant 1 with expression of type 'NPY_DATETIMEUNIT' is always false [Wtautologicalconstantoutofrangecompare] if (out_meta>base == 1) { ~~~~~~~~~~~~~~ ^ ~~ numpy/core/src/multiarray/datetime.c:1976:20: warning: comparison of unsigned enum expression >= 0 is always true [Wtautologicalcompare] if (meta>base >= 0 && meta>base < (NPY_FR_GENERIC + 1)) { ~~~~~~~~~~ ^ ~ numpy/core/src/multiarray/datetime.c:2395:24: warning: comparison of constant 1 with expression of type 'NPY_DATETIMEUNIT' is always false [Wtautologicalconstantoutofrangecompare] if (meta>base == 1) { ~~~~~~~~~~ ^ ~~ numpy/core/src/multiarray/datetime.c:2411:24: warning: comparison of constant 1 with expression of type 'NPY_DATETIMEUNIT' is always false [Wtautologicalconstantoutofrangecompare] if (meta>base == 1  meta>base == NPY_FR_GENERIC) { ~~~~~~~~~~ ^ ~~ numpy/core/src/multiarray/datetime.c:2424:24: warning: comparison of constant 1 with expression of type 'NPY_DATETIMEUNIT' is always false [Wtautologicalconstantoutofrangecompare] if (meta>base == 1) { ~~~~~~~~~~ ^ ~~ numpy/core/src/multiarray/datetime.c:2463:24: warning: comparison of constant 1 with expression of type 'NPY_DATETIMEUNIT' is always false [Wtautologicalconstantoutofrangecompare] if (meta>base == 1) { ~~~~~~~~~~ ^ ~~ numpy/core/src/multiarray/datetime.c:2495:28: warning: comparison of constant 1 with expression of type 'NPY_DATETIMEUNIT' is always false [Wtautologicalconstantoutofrangecompare] if (meta>base == 1) { ~~~~~~~~~~ ^ ~~ numpy/core/src/multiarray/datetime.c:2522:24: warning: comparison of constant 1 with expression of type 'NPY_DATETIMEUNIT' is always false [Wtautologicalconstantoutofrangecompare] if (meta>base == 1) { ~~~~~~~~~~ ^ ~~ numpy/core/src/multiarray/datetime.c:2598:28: warning: comparison of constant 1 with expression of type 'NPY_DATETIMEUNIT' is always false [Wtautologicalconstantoutofrangecompare] if (meta>base == 1) { ~~~~~~~~~~ ^ ~~ numpy/core/src/multiarray/datetime.c:2609:24: warning: comparison of constant 1 with expression of type 'NPY_DATETIMEUNIT' is always false [Wtautologicalconstantoutofrangecompare] if (meta>base == 1) { ~~~~~~~~~~ ^ ~~ numpy/core/src/multiarray/datetime.c:2622:24: warning: comparison of constant 1 with expression of type 'NPY_DATETIMEUNIT' is always false [Wtautologicalconstantoutofrangecompare] if (meta>base == 1) { ~~~~~~~~~~ ^ ~~ numpy/core/src/multiarray/datetime.c:2661:24: warning: comparison of constant 1 with expression of type 'NPY_DATETIMEUNIT' is always false [Wtautologicalconstantoutofrangecompare] if (meta>base == 1) { ~~~~~~~~~~ ^ ~~ numpy/core/src/multiarray/datetime.c:2730:24: warning: comparison of constant 1 with expression of type 'NPY_DATETIMEUNIT' is always false [Wtautologicalconstantoutofrangecompare] if (meta>base == 1) { ~~~~~~~~~~ ^ ~~ numpy/core/src/multiarray/datetime.c:2784:24: warning: comparison of constant 1 with expression of type 'NPY_DATETIMEUNIT' is always false [Wtautologicalconstantoutofrangecompare] if (meta>base == 1) { ~~~~~~~~~~ ^ ~~ numpy/core/src/multiarray/datetime.c:3118:26: warning: comparison of constant 1 with expression of type 'NPY_DATETIMEUNIT' is always false [Wtautologicalconstantoutofrangecompare] if (inout_meta>base == 1) { ~~~~~~~~~~~~~~~~ ^ ~~ 16 warnings generated.
There are quite a few warnings in the clang build related to comparisons.
comment:5 followup: 6 Changed 6 years ago by
By the way, the page https://www.scipy.org/scipylib/building/macosx.html says that we should use
export FFLAGS=ff2c
along with the remark 'The Fortran flag “ff2c” has been reported to be necessary.' We do not do this. Is there any reason to believe that this would help?
comment:6 Changed 6 years ago by
Replying to jhpalmieri:
By the way, the page https://www.scipy.org/scipylib/building/macosx.html says that we should use
export FFLAGS=ff2calong with the remark 'The Fortran flag “ff2c” has been reported to be necessary.' We do not do this. Is there any reason to believe that this would help?
If we were using g77 for sure it would be necessary. The page itself is dated in my opinion. I'll dig a bit around the compilation warnings, there may be overoptimisation somewhere.
comment:7 followup: 8 Changed 6 years ago by
It is at least interesting that we are getting compiler warnings on some of the files related to doctest failures, and furthermore on comparisons.
comment:8 Changed 6 years ago by
Replying to jhpalmieri:
It is at least interesting that we are getting compiler warnings on some of the files related to doctest failures, and furthermore on comparisons.
It is indeed.
comment:9 Changed 6 years ago by
Just for those wondering: upgrading to numpy 1.12.1 doesn't change a thing.
comment:10 Changed 6 years ago by
I am not sure numpy
is the root cause of the problem. I recompiled numpy
using x86_64appledarwin16.5.0gcc
which is currently installed by gfortran package. I remove gcc, but not more sophisticated names like the above. The error persists. Of course the above compiler is not bootstraped which could have an impact.
However, I am wondering if something funny happens in the coercion framework.
comment:11 Changed 6 years ago by
I suspect that the bug is not (only) in coersion, but in numpy's float conversion/truncation module; on FreeBSD/clang I get no warning with numpy.float
or
numpy.float128
, but I do get warnings with numpy.float<k>
with k<128
:
sage: import numpy as np sage: x=polygen(RR) sage: np.float32('1.5')*x /usr/home/dima/Sage/sage/src/bin/sageipython:1: RuntimeWarning: invalid value encountered in multiply #!/usr/bin/env python 1.50000000000000*x sage: np.float16('1.5')*x /usr/home/dima/Sage/sage/src/bin/sageipython:1: RuntimeWarning: invalid value encountered in multiply #!/usr/bin/env python 1.50000000000000*x sage: np.float64('1.5')*x /usr/home/dima/Sage/sage/src/bin/sageipython:1: RuntimeWarning: invalid value encountered in multiply #!/usr/bin/env python 1.50000000000000*x sage: np.float128('1.5')*x 1.50000000000000*x sage: np.float('1.5')*x 1.50000000000000*x
That's of course still a mystery, and a combination of the fact that it only happens with x
coming from RealField()
(its precision does not matter, I tried many values, with the same effect).
comment:13 followup: 14 Changed 6 years ago by
Replying to fbissey:
All these doctest fail because an unexpected warning is emitted:
RuntimeWarning: invalid value encountered in greater_equal
Traceback please (general rule: when posting bug report, never truncate a traceback).
comment:14 Changed 6 years ago by
Replying to jdemeyer:
Replying to fbissey:
All these doctest fail because an unexpected warning is emitted:
RuntimeWarning: invalid value encountered in greater_equalTraceback please (general rule: when posting bug report, never truncate a traceback).
these are very uninformative (it's just a message that comes from the depths of a compiled part of numpy, not causing any interrupts). For instance:
File "src/sage/rings/polynomial/polynomial_real_mpfr_dense.pyx", line 21, in sage.rings.polynomial.polynomial_real_mpfr_dense Failed example: numpy.float32('1.5') * x Expected: 1.50000000000000*x Got: doctest:warning File "/usr/home/dima/Sage/sage/src/bin/sageruntests", line 89, in <module> err = DC.run() File "/usr/home/dima/Sage/sage/local/lib/python2.7/sitepackages/sage/doctest/control.py", line 1134, in run self.run_doctests() File "/usr/home/dima/Sage/sage/local/lib/python2.7/sitepackages/sage/doctest/control.py", line 858, in run_doctests self.dispatcher.dispatch() File "/usr/home/dima/Sage/sage/local/lib/python2.7/sitepackages/sage/doctest/forker.py", line 1705, in dispatch self.parallel_dispatch() File "/usr/home/dima/Sage/sage/local/lib/python2.7/sitepackages/sage/doctest/forker.py", line 1595, in parallel_dispatch w.start() # This might take some time File "/usr/home/dima/Sage/sage/local/lib/python2.7/sitepackages/sage/doctest/forker.py", line 1871, in start super(DocTestWorker, self).start() File "/usr/home/dima/Sage/sage/local/lib/python2.7/multiprocessing/process.py", line 130, in start self._popen = Popen(self) File "/usr/home/dima/Sage/sage/local/lib/python2.7/multiprocessing/forking.py", line 126, in __init__ code = process_obj._bootstrap() File "/usr/home/dima/Sage/sage/local/lib/python2.7/multiprocessing/process.py", line 258, in _bootstrap self.run() File "/usr/home/dima/Sage/sage/local/lib/python2.7/sitepackages/sage/doctest/forker.py", line 1844, in run task(self.options, self.outtmpfile, msgpipe, self.result_queue) File "/usr/home/dima/Sage/sage/local/lib/python2.7/sitepackages/sage/doctest/forker.py", line 2137, in __call__ runner.run(test) File "/usr/home/dima/Sage/sage/local/lib/python2.7/sitepackages/sage/doctest/forker.py", line 641, in run return self._run(test, compileflags, out) File "/usr/home/dima/Sage/sage/local/lib/python2.7/sitepackages/sage/doctest/forker.py", line 503, in _run self.compile_and_execute(example, compiler, test.globs) File "/usr/home/dima/Sage/sage/local/lib/python2.7/sitepackages/sage/doctest/forker.py", line 866, in compile_and_execute exec(compiled, globs) File "<doctest sage.rings.polynomial.polynomial_real_mpfr_dense[7]>", line 1, in <module> numpy.float32('1.5') * x : RuntimeWarning: invalid value encountered in multiply 1.50000000000000*x
It's also pretty hard to extract anything useful at the Sage prompt, I tried to set up a numpy interrupt following numpy docs, but it ends up telling me that the interrupt happends at the top level of ipython.
comment:15 Changed 6 years ago by
Description:  modified (diff) 

comment:16 followup: 18 Changed 6 years ago by
Could you run numpy tests on clang/OSX? I did so on clang/FreeBSD, with some failures in complex arithmetic/functions (mostly numerical noise, but not only) More precisely:
 untar the numpy tarball in
upstream/
 launch
sage sh
 install nose (
pip install nose
) cd upstream/numpy1.11.1/numpy/testing
 run
./setup.py install
cd ../core/tests
 run
python test*.py
On FreeBSD I get
Ran 3072 tests in 53.164s FAILED (KNOWNFAIL=6, SKIP=4, failures=22)
Perhaps we should upgrade to 1.12.1 and see if this helps.
comment:17 Changed 6 years ago by
I did ./sage i nose
to install nose, but otherwise followed your instructions. I got
Ran 3072 tests in 47.097s OK (KNOWNFAIL=6, SKIP=4)
I also got this warning many times:
/Users/palmieri/Desktop/Sage_stuff/sage_builds/TESTING/sage8.0.beta2/numpy1.11.1/numpy/core/tests/test_datetime.py:1114: FutureWarning: In the future, NAT != NAT will be True rather than False.
comment:18 followup: 25 Changed 6 years ago by
Replying to dimpase:
Perhaps we should upgrade to 1.12.1 and see if this helps.
First thing I tried. No changes. But I will try the tests ASAP.
comment:19 Changed 6 years ago by
So numpy1.12.1
(sagesh) fbissey@Mirage:tests$ python test*.py .................................................................................................................................................................................................................................................................................................................................................................................................................................KKK........................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................S.................................................................................................................................................................................................................................................................................................................................................S.........................................................................................................................................................................................................................................................................................................K......................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................K...SK.S.......S................................................................................  Ran 3167 tests in 81.934s OK (KNOWNFAIL=6, SKIP=5)
No warnings.
comment:20 followup: 21 Changed 6 years ago by
We're discussing this problem on sagedevel and it appears that Sage does not follow the coercion
framework prescription for the case of numpy.float32('1.5')*x
, but instead does something that makes numpy unhappy on clang...
comment:21 followup: 22 Changed 6 years ago by
Replying to dimpase:
it appears that Sage does not follow the coercion framework prescription for the case of
numpy.float32('1.5')*x
This has nothing to do with the coercion framework. When doing a * b
, Python usually calls type(a).__mul__(a, b)
. If a
is not a Sage object (as is the case here), then it's the __mul__
method of a
which handles this multiplication. Only if this returns NotImplemented
, then Python will call type(b).__mul__(a, b)
.
comment:22 followup: 26 Changed 6 years ago by
Replying to jdemeyer:
Replying to dimpase:
it appears that Sage does not follow the coercion framework prescription for the case of
numpy.float32('1.5')*x
This has nothing to do with the coercion framework. When doing
a * b
, Python usually callstype(a).__mul__(a, b)
. Ifa
is not a Sage object (as is the case here), then it's the__mul__
method ofa
which handles this multiplication. Only if this returnsNotImplemented
, then Python will calltype(b).__mul__(a, b)
.
Are you saying that the warning comes from
type(numpy.float32('1.5')).__mul__()
? OK, this makes senseI tried this call on clang/FreeBSD and got the warning in question all right. So we know where to look at.
Still, this does not explain why sage.structure.element.coercion_model.explain()
lies in this case (or in the case x
comes as the 1st argument).
comment:23 Changed 6 years ago by
Description:  modified (diff) 

comment:24 Changed 6 years ago by
Description:  modified (diff) 

comment:25 Changed 6 years ago by
Replying to fbissey:
Replying to dimpase:
Perhaps we should upgrade to 1.12.1 and see if this helps.
First thing I tried. No changes. But I will try the tests ASAP.
by the way, __mul__
is changing in 1.12. On 1.11 I get
sage: import numpy as np sage: v=np.float32('1.5') sage: v*"bla" /home/dima/Sage/sagedev/src/bin/sageipython:1: VisibleDeprecationWarning: using a noninteger number instead of an integer will result in an error in the future #!/usr/bin/env python 'bla'
while on 1.12
sage: v*"bla"  TypeError Traceback (most recent call last) <ipythoninput10f57c3b7020a0> in <module>() > 1 v*"bla" TypeError: 'numpy.float32' object cannot be interpreted as an index
comment:26 followup: 27 Changed 6 years ago by
Replying to dimpase:
Still, this does not explain why
sage.structure.element.coercion_model.explain()
lies in this case
cm.explain()
answers what the coercion model would do if asked to perform this multiplication. It doesn't answer what Python does when asked to perform this multiplication.
comment:27 followup: 28 Changed 6 years ago by
Replying to jdemeyer:
Replying to dimpase:
Still, this does not explain why
sage.structure.element.coercion_model.explain()
lies in this case
cm.explain()
answers what the coercion model would do if asked to perform this multiplication. It doesn't answer what Python does when asked to perform this multiplication.
 Do you mean that
x*numpy.float32('1.5')
does not use coersion, either? (Otherwise intentionally breaking the code path printed bycoercion_model.explain()
would make a differencebut it does not)  Documentation could be much more clear on how he coercion model is (or is not) invoked in particular cases.
comment:28 Changed 6 years ago by
Replying to dimpase:
 Do you mean that
x*numpy.float32('1.5')
does not use coersion, either?
No, that would immediately use the coercion model. If you think that there is a bug with coercion_model.explain()
please open a new selfcontained bug.
comment:29 Changed 6 years ago by
I think that the multiplication in question is done entirely in numpy. (Yes, numpy does have its own polynomial class).
sage: import numpy as np sage: v=np.float32('1.5') sage: x=polygen(RealField()) sage: np.multiply(v,x) /usr/home/dima/Sage/sage/src/bin/sageipython:1: RuntimeWarning: invalid value encountered in multiply #!/usr/bin/env python 1.50000000000000*x sage: np.multiply(x,v) /usr/home/dima/Sage/sage/src/bin/sageipython:1: RuntimeWarning: invalid value encountered in multiply #!/usr/bin/env python 1.50000000000000*x
I have no idea whether RR['x']
was designed to be compatible with numpy's polynomials, or this is a coincidence.
More computations:
sage: np.multiply(x^3,xx^2) x^5 + x^4 sage: np.multiply(v+v*x^3,xx^2) /usr/home/dima/Sage/sage/src/bin/sageipython:1: RuntimeWarning: invalid value encountered in multiply #!/usr/bin/env python /usr/home/dima/Sage/sage/src/bin/sageipython:1: RuntimeWarning: invalid value encountered in add #!/usr/bin/env python 1.50000000000000*x^5 + 1.50000000000000*x^4  1.50000000000000*x^2 + 1.50000000000000*x sage: np.multiply(v,v) 2.25
this is all for clang/FreeBSD (on Linux with gcc, all of the above passes without warnings).
comment:30 Changed 6 years ago by
Description:  modified (diff) 

comment:31 Changed 6 years ago by
Description:  modified (diff) 

comment:32 Changed 6 years ago by
Description:  modified (diff) 

comment:33 Changed 6 years ago by
Description:  modified (diff) 

comment:34 followup: 35 Changed 6 years ago by
I don't see why the doctest
sage: numpy.float32('1.5') * x 1.50000000000000*x
in sage.rings.polynomial.polynomial_real_mpfr_dense
should pass in the first place, since no Sage coercion is taking place. Do we need to add some sort of conversion from Sage types to numpy types?
comment:35 Changed 6 years ago by
Replying to jhpalmieri:
I don't see why the doctest
sage: numpy.float32('1.5') * x 1.50000000000000*xin
sage.rings.polynomial.polynomial_real_mpfr_dense
should pass in the first place, since no Sage coercion is taking place.
Wrong. Let me explain what happens:
 Python calls
type(numpy.float32('1.5')).__mul__(numpy.float32('1.5'), x)
.
 Numpy tries to do this multiplication but realizes that it cannot, so it returns
NotImplemented
.
 Python calls
type(x).__mul__(..., ...)
which involves the coercion model.
 The coercion model knows how to handle this.
So the difference between GCC and Clang almost certainly lies in step 2 above.
comment:36 Changed 6 years ago by
In case it helps, I think the RuntimeWarning
message comes from line 122 of numpy1.11.1/numpy/core/src/umath/ufunc_object.c
.
comment:37 followup: 39 Changed 6 years ago by
Hmm, I was not entirely right. The correct sequence looks like:
 Python calls
type(numpy.float32('1.5')).__mul__(numpy.float32('1.5'), x)
.
 Numpy tries to do this multiplication but realizes that it cannot.
 Numpy "coerces"
numpy.float32(1.5)
tofloat(1.5)
(a Python float) and callsfloat(1.5) * x
instead.
 Python realizes that it cannot do this multiplication.
 Python calls
type(x).__mul__(1.5, x)
which is handled by the coercion model.
Still, the difference must be in step 2.
comment:38 Changed 6 years ago by
If I apply this patch to numpy, then the failing doctests don't fail any more:

numpy/core/src/umath/ufunc_object.c
a b 120 120 switch(method) { 121 121 case UFUNC_ERR_WARN: 122 122 PyOS_snprintf(msg, sizeof(msg), "%s encountered in %s", errtype, name); 123 if (PyErr_Warn(PyExc_RuntimeWarning, msg) < 0) {124 goto fail;125 }126 123 break; 127 124 case UFUNC_ERR_RAISE: 128 125 PyErr_Format(PyExc_FloatingPointError, "%s encountered in %s",
This does not explain what's going on, and I don't see a particular reason to think that this is the right solution, but maybe someone who understands Python and/or numpy better might be able to make something productive out of it.
comment:39 Changed 6 years ago by
Replying to jdemeyer:
Hmm, I was not entirely right. The correct sequence looks like:
 Python calls
type(numpy.float32('1.5')).__mul__(numpy.float32('1.5'), x)
.
 Numpy tries to do this multiplication but realizes that it cannot.
 Numpy "coerces"
numpy.float32(1.5)
tofloat(1.5)
(a Python float) and callsfloat(1.5) * x
instead.
 Python realizes that it cannot do this multiplication.
 Python calls
type(x).__mul__(1.5, x)
which is handled by the coercion model.Still, the difference must be in step 2.
this theory does not work, as step 2 has nothing to do with type(x)
but type(x)
matters, as can be seen here:
sage: type(x) <type 'sage.symbolic.expression.Expression'> sage: type(numpy.float32('1.5')).__mul__(numpy.float32('1.5'), x) 1.5*x sage: x=polygen(RDF) sage: type(numpy.float32('1.5')).__mul__(numpy.float32('1.5'), x) 1.5*x sage: x=polygen(RR) sage: type(numpy.float32('1.5')).__mul__(numpy.float32('1.5'), x) /usr/home/dima/Sage/sage/src/bin/sageipython:1: RuntimeWarning: invalid value encountered in multiply #!/usr/bin/env python 1.50000000000000*x sage: float(1.5)*x 1.50000000000000*x
comment:40 Changed 6 years ago by
Report Upstream:  N/A → Reported upstream. No feedback yet. 

I have created https://github.com/numpy/numpy/issues/9007
comment:41 followups: 42 47 Changed 6 years ago by
You can't multiply a type by a number, or if you can, I don't know what sort of answer you would expect. Note also that your working example at the end is with float
, not numpy.float32
. So I claim that your example doesn't make any sense. I get this:
sage: numpy.float32('1.5') * x /Users/palmieri/Desktop/Sage_stuff/sage_builds/TESTING/sage8.0.beta3/src/bin/sageipython:1: RuntimeWarning: invalid value encountered in multiply #!/usr/bin/env python 1.50000000000000*x
That is, it just comes from plain multiplication in numpy.
comment:42 Changed 6 years ago by
Replying to jhpalmieri:
You can't multiply a type by a number,
in this case the type has __mul__
sage: import numpy sage: v=numpy.float32('1.5') sage: t=type(v) sage: t.__mul__? Type: wrapper_descriptor String form: <slot wrapper '__mul__' of 'numpy.generic' objects> File: Docstring: x.__mul__(y) <==> x*y ... sage: t.__mul__(v,2) 3.0
of course, in this case this is the same as v*2
, just some levels down the code path.
One can unwind it a bit more:
sage: v.__mul__ <methodwrapper '__mul__' of numpy.float32 object at 0x7f4ad5773318> sage: v.__mul__(2) 3.0 sage: v.__mul__(polygen(RR)) /usr/home/dima/Sage/sage/src/bin/sageipython:1: RuntimeWarning: invalid value encountered in multiply #!/usr/bin/env python 1.50000000000000*x
by the way, the same warning with __add__
and __sub__
.
comment:43 followup: 44 Changed 6 years ago by
Right, so isn't
numpy.float32('1.5') * polygen(RR)
a simple illustration of the problem, with no type
involved?
comment:44 Changed 6 years ago by
Replying to jhpalmieri:
Right, so isn't
numpy.float32('1.5') * polygen(RR)a simple illustration of the problem, with no
type
involved?
it's less clear this way whether or how Sage's coersion is involved.
numpy.float32('1.5').__mul__(polygen(RR))
also does not call type explicitly, and is closer to the end of the codepath of this computation.
comment:45 Changed 6 years ago by
First, sorry, I misunderstood about the use of type in the original example. Second, doesn't the problem happen before Sage's coercion? Third, it is not clear how numpy decides whether or not it can do a multiplication (in the language of Jeroen's step 2), so the type of x could very well be relevant there, couldn't it?
Maybe someone who is comfortable with gdb and related tools (= not me) should try the problematic operation in Sage + gcc and also in Sage + clang to see what is happening differently in the two cases.
comment:46 followup: 49 Changed 6 years ago by
Report Upstream:  Reported upstream. No feedback yet. → Reported upstream. Developers acknowledge bug. 

Numpy devs say a lot on https://github.com/numpy/numpy/issues/9007  that apparently Sage might be setting the FPU flag which is read by numpy, and leads to this warning. Also, they are able to show how to exclude np.float32
and np.float64
types from being under suspicion.
Perhaps we should upgrade to numpy 1.12.1, so that we're all on the same version (I am on 1.12.1)
comment:47 followup: 48 Changed 6 years ago by
Replying to jhpalmieri:
You can't multiply a type by a number
Sorry, as Dima said, type(A).__mul__(A, B)
is just a contrived way of writing A.__mul__(B)
. I wrote it with type()
because that's closer to what Python internally actually does.
Note that A.__mul__(B)
is not the same as A * B
: essentially A * B
first calls A.__mul__(B)
. When this returns NotImplemented
, Python tries the reversed multiplication.
comment:48 Changed 6 years ago by
Replying to jdemeyer:
Replying to jhpalmieri:
You can't multiply a type by a number
Sorry, as Dima said,
type(A).__mul__(A, B)
is just a contrived way of writingA.__mul__(B)
.
Right, I understand this now, as was vaguely implied in comment:45.
comment:49 Changed 6 years ago by
Replying to dimpase:
Numpy devs say a lot on https://github.com/numpy/numpy/issues/9007  that apparently Sage might be setting the FPU flag which is read by numpy, and leads to this warning. Also, they are able to show how to exclude
np.float32
andnp.float64
types from being under suspicion.Perhaps we should upgrade to numpy 1.12.1, so that we're all on the same version (I am on 1.12.1)
Yes we could make a ticket numpy1.12.1, we could drop at least one patch and I don't expect anything else to break.
The discussion on the numpy tracker goes in the direction I suspected it would. numpy is issuing the warning because of something happening in sage. I had arrived at that conclusion by recompiling numpy (1.12.1 at the time) with gcc on a full clang install on OS X. And the problem didn't go away, implying something sageside is happening when compiled with clang.
comment:51 Changed 6 years ago by
Let's agree on who would work on debugging; say, in src/sage/rings/polynomial/polynomial_real_mpfr_dense.pyx
one would probably need to set an FPU trap to find exactly what triggers the numpy warning in np.float32('1.5')*polygen(RR)
(it's very likely to be a call to mpfr
, but which one?)
Or perhaps there should be a way to set this up globally, for all the cython modules.
(I'd be too busy in the coming week).
comment:52 Changed 6 years ago by
Dependencies:  #22895 → #22582 

comment:53 Changed 6 years ago by
Description:  modified (diff) 

Similarly, the warning is emitted in
sage: numpy.float64(5)>e
or >=
instead of >
, or pi
instead of e
. Note that pi.n()
and e.n()
are of type RR
, so again it points at the direction on mpfr
.
comment:54 Changed 6 years ago by
In spirit, it's pretty much the same as the polygen(RR)
problem. Let b=numpy.float64(5)
. Then
b.__gt__(e)
prints the same warning; under the hood it apparently calls mpfr
, which raises an FP flag, which then gets picked up by the warning printer.
Note that if I first call np.seterr(invalid='ignore')
then no warnings are printed.
comment:55 Changed 6 years ago by
it could help to configure MPFR with enablelogging
(see details in doc/README.dev in the MPFR source repository):
For example, just define MPFR_LOG_ALL, run you program, and view `mpfr.log`.
comment:56 Changed 6 years ago by
Leads me to two observation. I accidentally rebuilt mpfr with gcc on my linux box (MPFR_CONFIGURE="enablelogging" ./sage f mpfr
) and ran all the test successfully. So mpfr+clang
seems to be the real cause of the trouble.
Once I remembered that on my linux machine I had to set CC
and CXX
I encountered another problem trying to compile with logging:
libtool: compile: clang DMPFR_USE_LOGGING=1 DTIME_WITH_SYS_TIME=1 DHAVE_INTTYPES_H=1 DHAVE_STDINT_H=1 DHAVE_LOCALE_H=1 DHAVE_WCHAR_H=1 DHAVE_STDARG=1 DHAVE_SYS_TIME_H=1 DHAVE_STRUCT_LCONV_DECIMAL_POINT=1 DHAVE_STRUCT_LCONV_THOUSANDS_SEP=1 DHAVE_ALLOCA_H=1 DHAVE_STDINT_H=1 DHAVE_VA_COPY=1 DHAVE_SETLOCALE=1 DHAVE_GETTIMEOFDAY=1 DHAVE_LONG_LONG=1 DHAVE_INTMAX_T=1 DMPFR_HAVE_INTMAX_MAX=1 DMPFR_HAVE_FESETROUND=1 DHAVE_DENORMS=1 DHAVE_SIGNEDZ=1 DHAVE_ROUND=1 DHAVE_TRUNC=1 DHAVE_FLOOR=1 DHAVE_CEIL=1 DHAVE_NEARBYINT=1 DHAVE_LDOUBLE_IEEE_EXT_LITTLE=1 DHAVE_CLOCK_GETTIME=1 DLT_OBJDIR=\".libs/\" DHAVE_ATTRIBUTE_MODE=1 DHAVE___GMPN_ROOTREM=1 I. I/home/fbissey/sandbox/gitfork/sageclang/local/include Wall Wmissingprototypes Wpointerarith m64 O2 march=corei7avx mtune=corei7avx g MT add.lo MD MP MF .deps/add.Tpo c add.c fPIC DPIC o .libs/add.o add.c:28:3: error: illegal storage class on function MPFR_LOG_FUNC ^ ./mpfrimpl.h:1716:3: note: expanded from macro 'MPFR_LOG_FUNC' auto void __mpfr_log_cleanup (int *time); \ ^ add.c:28:3: error: function definition is not allowed here ./mpfrimpl.h:1717:39: note: expanded from macro 'MPFR_LOG_FUNC' void __mpfr_log_cleanup (int *time) { \ ^ 2 errors generated.
So we'll have to fix logging with clang before we can investigate with this tool. I will try on OS X shortly.
comment:57 Changed 6 years ago by
Hum equivalent error on OS X
libtool: compile: gcc DMPFR_USE_LOGGING=1 DTIME_WITH_SYS_TIME=1 DHAVE_INTTYPES_H=1 DHAVE_STDINT_H=1 DHAVE_LOCALE_H=1 DHAVE_WCHAR_H=1 DHAVE_STDARG=1 DHAVE_SYS_TIME_H=1 DHAVE_STRUCT_LCONV_DECIMAL_POINT=1 DHAVE_STRUCT_LCONV_THOUSANDS_SEP=1 DHAVE_ALLOCA_H=1 DHAVE_STDINT_H=1 DHAVE_VA_COPY=1 DHAVE_SETLOCALE=1 DHAVE_GETTIMEOFDAY=1 DHAVE_LONG_LONG=1 DHAVE_INTMAX_T=1 DMPFR_HAVE_INTMAX_MAX=1 DMPFR_HAVE_FESETROUND=1 DHAVE_DENORMS=1 DHAVE_SIGNEDZ=1 DHAVE_ROUND=1 DHAVE_TRUNC=1 DHAVE_FLOOR=1 DHAVE_CEIL=1 DHAVE_NEARBYINT=1 DHAVE_LDOUBLE_IEEE_EXT_LITTLE=1 DHAVE_CLOCK_GETTIME=1 DLT_OBJDIR=\".libs/\" DHAVE_ATTRIBUTE_MODE=1 DHAVE___GMPN_ROOTREM=1 I. I/Users/fbissey/build/sageclang/local/include Wall Wmissingprototypes Wpointerarith m64 O2 march=corei7avx mtune=corei7avx g MT exceptions.lo MD MP MF .deps/exceptions.Tpo c exceptions.c fnocommon DPIC o .libs/exceptions.o In file included from exceptions.c:23: ./mpfrimpl.h:1557:4: error: "Logging not supported (needs gcc >= 3.0 and GNU C Library >= 2.0)." # error "Logging not supported (needs gcc >= 3.0 and GNU C Library >= 2.0)." ^ 1 error generated.
comment:58 followup: 59 Changed 6 years ago by
Unless you already tried this, I'd try rerunning autoconf.
comment:59 Changed 6 years ago by
Replying to dimpase:
Unless you already tried this, I'd try rerunning autoconf.
You mean recreating mpfr's configure by running autoreconf
or something else altogether?
comment:60 followup: 61 Changed 6 years ago by
/* The following test on glibc is there mainly for Darwin (Mac OS X), to obtain a better error message. The real test should have been a test concerning nested functions in gcc, which are disabled by default on Darwin; but it is not possible to do that without a configure test. */ # if defined (__cplusplus)  !(__MPFR_GNUC(3,0) && __MPFR_GLIBC(2,0)) # error "Logging not supported (needs gcc >= 3.0 and GNU C Library >= 2.0)."
seems to say that logging needs nested functions. And clang does not do them, as we know... I'd say this is an MPFR bug, no?
comment:61 Changed 6 years ago by
Replying to dimpase:
/* The following test on glibc is there mainly for Darwin (Mac OS X), to obtain a better error message. The real test should have been a test concerning nested functions in gcc, which are disabled by default on Darwin; but it is not possible to do that without a configure test. */ # if defined (__cplusplus)  !(__MPFR_GNUC(3,0) && __MPFR_GLIBC(2,0)) # error "Logging not supported (needs gcc >= 3.0 and GNU C Library >= 2.0)."seems to say that logging needs nested functions. And clang does not do them, as we know... I'd say this is an MPFR bug, no?
Yes. Using a GNU extension  the word bug can be argued, but it fails to adhere to the standard which you do at your own peril and the cost of portability.
comment:62 Changed 6 years ago by
One can of course do a log on gcc and hope that it's identical to what one would get on clang...
comment:63 followup: 64 Changed 6 years ago by
I'd say this is an MPFR bug, no?
it is in fact a clang bug:
https://bugs.llvm.org//show_bug.cgi?id=6378
"Clang doesn't support the GNU nested function extension, sorry. We have no plans to implement it."
comment:64 followup: 67 Changed 6 years ago by
Replying to zimmerma:
I'd say this is an MPFR bug, no?
it is in fact a clang bug:
https://bugs.llvm.org//show_bug.cgi?id=6378
"Clang doesn't support the GNU nested function extension, sorry. We have no plans to implement it."
Well, as MPFR makes no claims to adhere to a C standard, you may indeed consider it a feature :)
comment:65 Changed 6 years ago by
for what's worth, this is the mpfr.log
I see on linux/gcc, after stripping the initialisation part, and running
np.float64(5).__gt__(e)
; looks like mpfr
is computing exp(1.0)
to certain precision.
> mpfr_exp:IN x[53]=1 rnd=3 > mpfr_const_log2_internal:IN rnd_mode=0 > mpfr_const_log2_internal:ZIV 1st prec=42 > mpfr_div:IN u[42]=2.2496e+21 v[42]=3.24549e+21 rnd=0 > mpfr_div:TIM 0ms > mpfr_div:OUT q[42]=0.693147 inexact=1 > mpfr_const_log2_internal:TIM 0ms > mpfr_const_log2_internal:OUT x[32]=0.693147 inex=1 > mpfr_mul:IN b[32]=0.693147 c[64]=4.61169e+18 rnd=2 > mpfr_mul:TIM 0ms > mpfr_mul:OUT a[32]=3.19658e+18 inexact=1 > mpfr_sub_ui:IN x[64]=4.61169e+18 u=2 rnd=0 > mpfr_sub:IN b[64]=4.61169e+18 c[64]=2 rnd=0 > mpfr_sub:TIM 0ms > mpfr_sub:OUT a[64]=4.61169e+18 > mpfr_sub_ui:TIM 0ms > mpfr_sub_ui:OUT y[64]=4.61169e+18 inexact=0 > mpfr_mul:IN b[32]=0.693147 c[64]=4.61169e+18 rnd=3 > mpfr_mul:TIM 0ms > mpfr_mul:OUT a[32]=3.19658e+18 inexact=1 > mpfr_exp_2:IN x[53]=1 rnd=3 > mpfr_const_log2_internal:IN rnd_mode=0 > mpfr_const_log2_internal:ZIV 1st prec=74 > mpfr_div:IN u[74]=2.17458e+40 v[74]=3.13725e+40 rnd=0 > mpfr_div:TIM 0ms > mpfr_div:OUT q[74]=0.693147 inexact=1 > mpfr_const_log2_internal:TIM 0ms > mpfr_const_log2_internal:OUT x[64]=0.693147 inex=1 > mpfr_div:IN u[53]=1 v[64]=0.693147 rnd=0 > mpfr_div:TIM 0ms > mpfr_div:OUT q[64]=1.4427 inexact=1 > mpfr_exp_2.114: d(x)=1.000000000000000000000000000000e+00 n=1 > mpfr_exp_2:ZIV 1st prec=78 > mpfr_exp_2.152: n=1 K=5 l=11 q=78 error_r=2 > mpfr_const_log2_internal:IN rnd_mode=0 > mpfr_const_log2_internal:ZIV 1st prec=90 > mpfr_div:IN u[90]=8.48887e+52 v[90]=1.22469e+53 rnd=0 > mpfr_div:TIM 0ms > mpfr_div:OUT q[90]=0.693147 inexact=1 > mpfr_const_log2_internal:TIM 0ms > mpfr_const_log2_internal:OUT x[80]=0.693147 inex=1 > mpfr_exp_2.169:x[53]=1 > mpfr_exp_2.170:r[80]=0.693147 > mpfr_sub:IN b[53]=1 c[80]=0.693147 rnd=2 > mpfr_sub:TIM 0ms > mpfr_sub:OUT a[80]=0.306853 > mpfr_exp_2.189:r[78]=0.306853 > mpfr_div_2ui:IN x[78]=0.306853 n=5 rnd=2 > mpfr_div_2ui:TIM 0ms > mpfr_div_2ui:OUT y[78]=0.00958915 inexact=0 > mpfr_exp_2.202: l=270 q=78 (K+l)*q^2=1.673e+06 > mpfr_exp_2.219: before mult. by 2^n: > mpfr_exp_2.220:s[80]=1.35914 > mpfr_exp_2.221: err=5 bits > mpfr_mul_2si:IN x[80]=1.35914 n=1 rnd=3 > mpfr_mul_2si:TIM 0ms > mpfr_mul_2si:OUT y[53]=2.71828 inexact=1 > mpfr_exp_2:TIM 3ms > mpfr_exp_2:OUT y[53]=2.71828 inexact=1 > mpfr_exp:TIM 3ms > mpfr_exp:OUT y[53]=2.71828 inexact=1 > mpfr_exp:IN x[53]=1 rnd=2 > mpfr_mul:IN b[32]=0.693147 c[64]=4.61169e+18 rnd=2 > mpfr_mul:TIM 0ms > mpfr_mul:OUT a[32]=3.19658e+18 inexact=1 > mpfr_sub_ui:IN x[64]=4.61169e+18 u=2 rnd=0 > mpfr_sub:IN b[64]=4.61169e+18 c[64]=2 rnd=0 > mpfr_sub:TIM 0ms > mpfr_sub:OUT a[64]=4.61169e+18 > mpfr_sub_ui:TIM 0ms > mpfr_sub_ui:OUT y[64]=4.61169e+18 inexact=0 > mpfr_mul:IN b[32]=0.693147 c[64]=4.61169e+18 rnd=3 > mpfr_mul:TIM 0ms > mpfr_mul:OUT a[32]=3.19658e+18 inexact=1 > mpfr_exp_2:IN x[53]=1 rnd=2 > mpfr_div:IN u[53]=1 v[64]=0.693147 rnd=0 > mpfr_div:TIM 0ms > mpfr_div:OUT q[64]=1.4427 inexact=1 > mpfr_exp_2.114: d(x)=1.000000000000000000000000000000e+00 n=1 > mpfr_exp_2:ZIV 1st prec=78 > mpfr_exp_2.152: n=1 K=5 l=11 q=78 error_r=2 > mpfr_exp_2.169:x[53]=1 > mpfr_exp_2.170:r[80]=0.693147 > mpfr_sub:IN b[53]=1 c[80]=0.693147 rnd=2 > mpfr_sub:TIM 0ms > mpfr_sub:OUT a[80]=0.306853 > mpfr_exp_2.189:r[78]=0.306853 > mpfr_div_2ui:IN x[78]=0.306853 n=5 rnd=2 > mpfr_div_2ui:TIM 0ms > mpfr_div_2ui:OUT y[78]=0.00958915 inexact=0 > mpfr_exp_2.202: l=270 q=78 (K+l)*q^2=1.673e+06 > mpfr_exp_2.219: before mult. by 2^n: > mpfr_exp_2.220:s[80]=1.35914 > mpfr_exp_2.221: err=5 bits > mpfr_mul_2si:IN x[80]=1.35914 n=1 rnd=2 > mpfr_mul_2si:TIM 0ms > mpfr_mul_2si:OUT y[53]=2.71828 inexact=1 > mpfr_exp_2:TIM 0ms > mpfr_exp_2:OUT y[53]=2.71828 inexact=1 > mpfr_exp:TIM 0ms > mpfr_exp:OUT y[53]=2.71828 inexact=1 > mpfr_mul:IN b[64]=52 c[77]=0.25 rnd=2 > mpfr_mul:TIM 0ms > mpfr_mul:OUT a[64]=13 inexact=0 > mpfr_mul:IN b[64]=52 c[77]=0.25 rnd=2 > mpfr_mul:TIM 0ms > mpfr_mul:OUT a[64]=13 inexact=0 > mpfr_add:IN b[53]=5 c[53]=2.71828 rnd=3 > mpfr_add:TIM 0ms > mpfr_add:OUT a[53]=2.28172 > mpfr_add:IN b[53]=5 c[53]=2.71828 rnd=2 > mpfr_add:TIM 0ms > mpfr_add:OUT a[53]=2.28172 > mpfr_mul:IN b[64]=52 c[77]=0.25 rnd=2 > mpfr_mul:TIM 0ms > mpfr_mul:OUT a[64]=13 inexact=0 > mpfr_mul:IN b[64]=52 c[77]=0.25 rnd=2 > mpfr_mul:TIM 0ms > mpfr_mul:OUT a[64]=13 inexact=0 > mpfr_exp_2: Ziv failed 0.00% (0 bad cases / 2 calls) > mpfr_const_log2_internal: Ziv failed 0.00% (0 bad cases / 3 calls)
comment:66 Changed 6 years ago by
indeed, MPFR is first computing exp(1)
with rounding towards infinity, then again exp(1)
with rounding towards +infinity.
Then it computes twice 52*0.25
with rounding towards +infinity (I wonder why the same value is computed twice).
Then it adds 5
and 2.71828
with rounding towards infinity and +infinity,
I guess this is to compare intervals for 5
and exp(1)
.
Again it computes twice 52*0.25
with the same rounding, I don't know why.
comment:67 Changed 6 years ago by
Replying to dimpase:
Replying to zimmerma:
I'd say this is an MPFR bug, no?
it is in fact a clang bug:
https://bugs.llvm.org//show_bug.cgi?id=6378
"Clang doesn't support the GNU nested function extension, sorry. We have no plans to implement it."
Well, as MPFR makes no claims to adhere to a C standard, you may indeed consider it a feature :)
As a person that has worked/ is working on rather exotic systems and do porting I consider it sad. It has been an extension for a rather a long time, if it was considered a useful or desirable feature it would be in the standard or planned for the next standard. That feature doesn't seem to have a big uptake either, we only had one package that absolutely required porting in sage so far.
Fortunately you don't use it in the functional part of mpfr but it is annoying.
My opinion is that while you make no claim to be standard compliant you should aim towards it and admit it as an issue, it does not have to be a show stopper bug.
comment:68 followup: 70 Changed 6 years ago by
My opinion is that while you make no claim to be standard compliant you should aim towards it and admit it as an issue, it does not have to be a show stopper bug.
feel free to report to the clang developers. It makes no sense to implement in MPFR a feature that should be implemented by the compiler.
Back to the issue, does np.float64(5).__gt__(e)
give the warning with clang?
Paul
comment:69 Changed 6 years ago by
anyway, to trace the function mpfr_exp
for example, you can apply the following patch (against the development version, but it should apply to 3.1.5 as well):
 src/exp.c (revision 11456) +++ src/exp.c (working copy) @@ 42,10 +42,15 @@ int inexact; MPFR_SAVE_EXPO_DECL (expo); +#if 0 MPFR_LOG_FUNC (("x[%Pu]=%.*Rg rnd=%d", mpfr_get_prec (x), mpfr_log_prec, x, rnd_mode), ("y[%Pu]=%.*Rg inexact=%d", mpfr_get_prec (y), mpfr_log_prec, y, inexact)); +#else + mpfr_printf ("x[%Pu]=%.*Rg rnd=%d\n", mpfr_get_prec (x), 6, x, rnd_mode); + fflush (stdout); +#endif if (MPFR_UNLIKELY( MPFR_IS_SINGULAR(x) )) { @@ 185,5 +190,8 @@ } } + mpfr_printf ("y[%Pu]=%.*Rg inexact=%d\n", mpfr_get_prec (y), 6, y, inexact); + fflush (stdout); + return mpfr_check_range (y, inexact, rnd_mode); }
This will enable to see whether the warning occurs inside the mpfr_exp
call
(you might want to replace mpfr_printf(...)
by mpfr_fprintf (stderr, ...)
if the warning is printed to stderr, and change fflush(stdout)
into fflush(stderr)
).
comment:70 followup: 71 Changed 6 years ago by
Replying to zimmerma:
Back to the issue, does
np.float64(5).__gt__(e)
give the warning with clang?
Yes, at least on OS X:
sage: np.float16(5).__gt__(e) /Users/jpalmier/Desktop/Sage_stuff/sage_builds/TESTING/sage8.0.beta5/src/bin/sageipython:1: RuntimeWarning: invalid value encountered in greater #!/usr/bin/env python True sage: np.float32(5).__gt__(e) /Users/jpalmier/Desktop/Sage_stuff/sage_builds/TESTING/sage8.0.beta5/src/bin/sageipython:1: RuntimeWarning: invalid value encountered in greater #!/usr/bin/env python True sage: np.float64(5).__gt__(e) /Users/jpalmier/Desktop/Sage_stuff/sage_builds/TESTING/sage8.0.beta5/src/bin/sageipython:1: RuntimeWarning: invalid value encountered in greater #!/usr/bin/env python True sage: np.float128(5).__gt__(e) /Users/jpalmier/Desktop/Sage_stuff/sage_builds/TESTING/sage8.0.beta5/src/bin/sageipython:1: RuntimeWarning: invalid value encountered in greater #!/usr/bin/env python True
comment:71 Changed 6 years ago by
Replying to jhpalmieri:
Replying to zimmerma:
Back to the issue, does
np.float64(5).__gt__(e)
give the warning with clang?Yes, at least on OS X:
sage: np.float16(5).__gt__(e) /Users/jpalmier/Desktop/Sage_stuff/sage_builds/TESTING/sage8.0.beta5/src/bin/sageipython:1: RuntimeWarning: invalid value encountered in greater #!/usr/bin/env python True sage: np.float32(5).__gt__(e) /Users/jpalmier/Desktop/Sage_stuff/sage_builds/TESTING/sage8.0.beta5/src/bin/sageipython:1: RuntimeWarning: invalid value encountered in greater #!/usr/bin/env python True sage: np.float64(5).__gt__(e) /Users/jpalmier/Desktop/Sage_stuff/sage_builds/TESTING/sage8.0.beta5/src/bin/sageipython:1: RuntimeWarning: invalid value encountered in greater #!/usr/bin/env python True sage: np.float128(5).__gt__(e) /Users/jpalmier/Desktop/Sage_stuff/sage_builds/TESTING/sage8.0.beta5/src/bin/sageipython:1: RuntimeWarning: invalid value encountered in greater #!/usr/bin/env python True
Same on linux.
comment:72 Changed 6 years ago by
With the patch from comment:69:
sage: import numpy as np sage: np.float128(5).__gt__(e) x[53]=1 rnd=3 y[53]=2.71828 inexact=1 x[53]=1 rnd=2 y[53]=2.71828 inexact=1 /Users/jpalmier/Desktop/Sage_stuff/sage_builds/TESTING/sage8.0.beta5/src/bin/sageipython:1: RuntimeWarning: invalid value encountered in greater #!/usr/bin/env python True
comment:73 Changed 6 years ago by
Any suggestions for other changes along the lines of comment:69 to help track down the problem?
comment:74 Changed 6 years ago by
I suppose anything short of actually setting up a watch on FPU bits will not help much. (sorry for slow response  I'm in singleparenting mode for a week :))
comment:75 followup: 76 Changed 6 years ago by
from comment 72 it seems the warning occurs after the two calls to mpfr_exp
.
Here is another patch to see whether it occurs in mpfr_mul
or mpfr_add
:
Index: src/add.c ===================================================================  src/add.c (revision 11456) +++ src/add.c (working copy) @@ 25,11 +25,16 @@ MPFR_HOT_FUNCTION_ATTR int mpfr_add (mpfr_ptr a, mpfr_srcptr b, mpfr_srcptr c, mpfr_rnd_t rnd_mode) { +#if 0 MPFR_LOG_FUNC (("b[%Pu]=%.*Rg c[%Pu]=%.*Rg rnd=%d", mpfr_get_prec (b), mpfr_log_prec, b, mpfr_get_prec (c), mpfr_log_prec, c, rnd_mode), ("a[%Pu]=%.*Rg", mpfr_get_prec (a), mpfr_log_prec, a)); +#else + printf ("enter mpfr_add\n"); + fflush (stdout); +#endif if (MPFR_ARE_SINGULAR_OR_UBF (b, c)) { @@ 100,23 +105,28 @@ MPFR_ASSERTD (MPFR_IS_PURE_FP (b)); MPFR_ASSERTD (MPFR_IS_PURE_FP (c)); + int ret; if (MPFR_UNLIKELY(MPFR_SIGN(b) != MPFR_SIGN(c))) { /* signs differ, it is a subtraction */ if (MPFR_LIKELY(MPFR_PREC(a) == MPFR_PREC(b) && MPFR_PREC(b) == MPFR_PREC(c)))  return mpfr_sub1sp(a, b, c, rnd_mode); + ret = mpfr_sub1sp(a, b, c, rnd_mode); else  return mpfr_sub1(a, b, c, rnd_mode); + ret = mpfr_sub1(a, b, c, rnd_mode); } else { /* signs are equal, it's an addition */ if (MPFR_LIKELY(MPFR_PREC(a) == MPFR_PREC(b) && MPFR_PREC(b) == MPFR_PREC(c)))  return mpfr_add1sp(a, b, c, rnd_mode); + ret = mpfr_add1sp(a, b, c, rnd_mode); else if (MPFR_GET_EXP(b) < MPFR_GET_EXP(c))  return mpfr_add1(a, c, b, rnd_mode); + ret = mpfr_add1(a, c, b, rnd_mode); else  return mpfr_add1(a, b, c, rnd_mode); + ret = mpfr_add1(a, b, c, rnd_mode); } + + printf ("exit mpfr_add\n"); + fflush (stdout); + return ret; } Index: src/mul.c ===================================================================  src/mul.c (revision 11456) +++ src/mul.c (working copy) @@ 688,6 +688,7 @@ mp_size_t bn, cn, tn, k, threshold; MPFR_TMP_DECL (marker); +#if 0 MPFR_LOG_FUNC (("b[%Pu]=%.*Rg c[%Pu]=%.*Rg rnd=%d", mpfr_get_prec (b), mpfr_log_prec, b, @@ 694,6 +695,10 @@ mpfr_get_prec (c), mpfr_log_prec, c, rnd_mode), ("a[%Pu]=%.*Rg inexact=%d", mpfr_get_prec (a), mpfr_log_prec, a, inexact)); +#else + printf ("enter mpfr_mul\n"); + fflush (stdout); +#endif /* deal with special cases */ if (MPFR_ARE_SINGULAR (b, c)) @@ 1030,5 +1035,7 @@ rnd_mode = MPFR_RNDZ; return mpfr_underflow (a, rnd_mode, sign); } + printf ("exit mpfr_mul\n"); + fflush (stdout); MPFR_RET (inexact); }
Side question: what routine of MPFR (if any) does np.float128(5).__gt__
call?
comment:76 Changed 6 years ago by
Replying to zimmerma:
from comment 72 it seems the warning occurs after the two calls to
mpfr_exp
.
This is correct; there is no interrupt mechanics set that would make sure the warning printed immediately. The warning is printed after numpy
completes the task of computing the value of
np.float128(5).__gt__(e)
, before it returns the result.
Side question: what routine of MPFR (if any) does
np.float128(5).__gt__
call?
Numpy people told us some details here.
They say that behind the curtains it will try calling something like e.__lt__
, assuming e
is the argument (and all this happens within a compiled module written in (generated) C, making it hard to debug easily).
So we have two Pythonbased computer algebra systems not talking to each other too well...
comment:77 followup: 79 Changed 6 years ago by
my guess is the following:
(1) first intervals of MPFR values are computed that enclose 5 and exp(1)
(2) then those intervals are converted into the np.float128
type
(3) then the comparison is performed
I guess the warning occurs because a NaN was generated in step (2). It might be inside the mpfr_get_float128
function. Here is another patch to check:
Index: src/get_float128.c ===================================================================  src/get_float128.c (revision 11456) +++ src/get_float128.c (working copy) @@ 30,8 +30,15 @@ mpfr_get_float128 (mpfr_srcptr x, mpfr_rnd_t rnd_mode) { + printf ("enter mpfr_get_float128\n"); + fflush (stdout); + if (MPFR_UNLIKELY (MPFR_IS_SINGULAR (x)))  return (__float128) mpfr_get_d (x, rnd_mode); + { + printf ("exit mpfr_get_float128: MPFR_IS_SINGULAR(x)\n"); + fflush (stdout); + return (__float128) mpfr_get_d (x, rnd_mode); + } else /* now x is a normal nonzero number */ { __float128 r; /* result */ @@ 97,6 +104,8 @@ } if (sign < 0) r = r; + printf ("exit mpfr_get_float128: normal case\n"); + fflush (stdout); return r; } }
comment:78 Changed 6 years ago by
I don't see get_float128.c
in version 3.1.5. I've tried to use similar patches in get_ld.c
, get_d64.c
, and get_float.c
, but none of the relevant functions print anything when I run np.float128(5).__gt__(e)
(or the same with np.float64
, etc.).
comment:79 Changed 6 years ago by
Replying to zimmerma:
my guess is the following:
(1) first intervals of MPFR values are computed that enclose 5 and exp(1)
(2) then those intervals are converted into the
np.float128
type
IMHO it is different (and isn't in so by the complete(?) trace in comment 65 above): numpy
has no way to convert MPFR numbers into numpy
numbers,
without asking Sage to do this. And it does not even know that Sage can do it. So it all happens on the level of Python data: numpy
knows that after getting "not implemented" from np.float128(5).__gt__(e)
it may try e.__lt__(np.float128(5))
.
And the latter invokes comparison in Sage, done with MPFR numbers.
(3) then the comparison is performed
Isn't the actual comparison is performed on MPFR numbers rather than on numpy
numbers?
As above
> mpfr_add:IN b[53]=5 c[53]=2.71828 rnd=3 > mpfr_add:TIM 0ms > mpfr_add:OUT a[53]=2.28172 > mpfr_add:IN b[53]=5 c[53]=2.71828 rnd=2 > mpfr_add:TIM 0ms
comment:80 Changed 6 years ago by
Isn't the actual comparison is performed on MPFR numbers rather than on numpy numbers?
yes it might be, since after the two mpfr_add
calls we should get an interval [u,v] where
exp(1)5 lies. Then I guess Sage should check whether u > 0
or v < 0
. But this is possible via several MPFR functions (which are not logged through enablelogging
). It could be
mpfr_cmp_ui (u, 0)
, or mpfr_cmp (u, zero)
since zero is predefined, or mpfr_sgn(u)
. One should add logging in those functions to see which one is called.
comment:81 Changed 6 years ago by
As far as I can tell, the last thing called (or at least the last thing called in which I've added logging) before numpy reports an error is mpfr_cmp3
.
comment:82 Changed 6 years ago by
As far as I can tell, the last thing called (or at least the last thing called in which I've added logging) before numpy reports an error is
mpfr_cmp3
.
please could you test with the following patch (against mpfr3.1.5)?
 cmp.c 20160927 09:58:15.000000000 +0200 +++ /tmp/cmp.c 20170512 08:32:54.914688069 +0200 @@ 35,6 +35,11 @@ mp_size_t bn, cn; mp_limb_t *bp, *cp; + printf ("enter mpfr_cmp3\n"); + printf ("b="); mpfr_dump (b); + printf ("c="); mpfr_dump (c); + printf ("s=%d\n", s); + s = MPFR_MULT_SIGN( s , MPFR_SIGN(c) ); if (MPFR_ARE_SINGULAR(b, c)) @@ 42,34 +47,59 @@ if (MPFR_IS_NAN (b)  MPFR_IS_NAN (c)) { MPFR_SET_ERANGE (); + printf ("exit mpfr_cmp3: NaN case\n"); return 0; } else if (MPFR_IS_INF(b)) { if (MPFR_IS_INF(c) && s == MPFR_SIGN(b) )  return 0; + { + printf ("exit mpfr_cmp3: Inf1 case\n"); + return 0; + } else  return MPFR_SIGN(b); + { + printf ("exit mpfr_cmp3: Inf2 case\n"); + return MPFR_SIGN(b); + } } else if (MPFR_IS_INF(c))  return s; + { + printf ("exit mpfr_cmp3: Inf3 case\n"); + return s; + } else if (MPFR_IS_ZERO(b))  return MPFR_IS_ZERO(c) ? 0 : s; + { + printf ("exit mpfr_cmp3: zero1 case\n"); + return MPFR_IS_ZERO(c) ? 0 : s; + } else /* necessarily c=0 */  return MPFR_SIGN(b); + { + return MPFR_SIGN(b); + printf ("exit mpfr_cmp3: zero2 case\n"); + } } /* b and c are real numbers */ if (s != MPFR_SIGN(b))  return MPFR_SIGN(b); + { + printf ("exit mpfr_cmp3: s != MPFR_SIGN(b)\n"); + return MPFR_SIGN(b); + } /* now signs are equal */ be = MPFR_GET_EXP (b); ce = MPFR_GET_EXP (c); if (be > ce)  return s; + { + printf ("exit mpfr_cmp3: be > ce\n"); + return s; + } if (be < ce)  return s; + { + printf ("exit mpfr_cmp3: be < ce\n"); + return s; + } /* both signs and exponents are equal */ @@ 82,18 +112,31 @@ for ( ; bn >= 0 && cn >= 0; bn, cn) { if (bp[bn] > cp[cn])  return s; + { + printf ("exit mpfr_cmp3: bp[bn] > cp[cn]\n"); + return s; + } if (bp[bn] < cp[cn])  return s; + { + printf ("exit mpfr_cmp3: bp[bn] < cp[cn]\n"); + return s; + } } for ( ; bn >= 0; bn) if (bp[bn])  return s; + { + printf ("exit mpfr_cmp3: bp[bn] > 0\n"); + return s; + } for ( ; cn >= 0; cn) if (cp[cn])  return s; + { + printf ("exit mpfr_cmp3: cp[bn] > 0\n"); + return s; + }  return 0; + printf ("exit mpfr_cmp3: equal case\n"); + return 0; } #undef mpfr_cmp
comment:83 Changed 6 years ago by
I get this:
enter mpfr_cmp3 b=0 c=0.10011011111100001010100010110001010001010111011010010E1 s=1 exit mpfr_cmp3: zero1 case /Users/jpalmier/Desktop/Sage_stuff/sage_builds/TESTING/sage8.0.beta4/src/bin/sageipython:1: RuntimeWarning: invalid value encountered in less #!/usr/bin/env python
comment:84 followup: 87 Changed 6 years ago by
I see no "less" comparison in the mpfr_cmp3
branch corresponding to the zero1
case. The invalid value encountered in less
warning might correspond to a comparison with NaN, but neither b nor c are NaN here.
comment:85 Changed 6 years ago by
What the the Python call that triggers this?
It used to be np.float64(5).__gt__(e)
, giving invalid value in greater
warning. Does the same call with patched MPFR give invalid value in less
warning?
comment:86 Changed 6 years ago by
I get warnings with both __gt__
and __lt__
; the one in comment:83 was from __lt__
. When I do np.float64('1.5').__gt__(e)
, the various logging messages end in this:
enter mpfr_cmp3 b=0.10011011111100001010100010110001010001010111011010100E1 c=0 s=1 enter mpfr_cmp3 b=0 c=0.10011011111100001010100010110001010001010111011010010E1 s=1 exit mpfr_cmp3: zero1 case /Users/jpalmier/Desktop/Sage_stuff/sage_builds/TESTING/sage8.0.beta4/src/bin/sageipython:1: RuntimeWarning: invalid value encountered in greater #!/usr/bin/env python
comment:87 Changed 6 years ago by
Replying to zimmerma:
I see no "less" comparison in the
mpfr_cmp3
branch corresponding to thezero1
case. Theinvalid value encountered in less
warning might correspond to a comparison with NaN, but neither b nor c are NaN here.
is there a "greater" comparison? This is the one that would correspond to __lt__
in the original call, as numpy people tell us.
comment:88 Changed 6 years ago by
is there a "greater" comparison?
there is no "greater" comparison either. The only comparisons are between exponents and words of the significand, but no such comparison occurs when one of the operands is zero.
And anyway, there is no doubleprecision NaN
in that function.
comment:89 Changed 6 years ago by
here is somewhat less involved way to trigger this, not involving running through numpy evaluation loop
import numpy as np from ctypes import cdll from ctypes.util import find_library libm = cdll.LoadLibrary(find_library('m')) print libm.fetestexcept(int(0x01)) # checks if FE_INVALID is set bool(e.__lt__(np.float32('1.5'))) print libm.fetestexcept(int(0x01))
Running this on Linux/gcc produces
0 False 0
while on FreeBSD/clang I get
0 False 1
After I've found this, I decided to check whether merely importing numpy does something to the FPU flags on FreeBSD, and in fact it does! Namely, the output of
from ctypes import cdll from ctypes.util import find_library libm = cdll.LoadLibrary(find_library('m')) print libm.fetestexcept(int(0x01)) import numpy print libm.fetestexcept(int(0x01))
is
0 0
on Linux/gcc, and
1 0
on FreeBSD/clang! And in fact one can see that the FE_INVALID bit is flipped by
bool(e.__lt__(float('1.5')))
just as well:
sage: import numpy ....: from ctypes import cdll ....: from ctypes.util import find_library ....: libm = cdll.LoadLibrary(find_library('m')) ....: print libm.fetestexcept(int(0x01)) ....: bool(e.__lt__(float('1.5'))) ....: print libm.fetestexcept(int(0x01)) ....: 0 False 1
on FreeBSD/clang (but the last 1 becomes 0 on Linux/gcc).
comment:90 Changed 6 years ago by
That is, we also need to debug Sage for a place that flips FE_INVALID!
$ ./sage python Python 2.7.13 (default, May 14 2017, 23:48:25) [GCC 4.2.1 Compatible Clang 4.0.0 ] on freebsd11 Type "help", "copyright", "credits" or "license" for more information. >>> from ctypes import cdll >>> from ctypes.util import find_library >>> libm = cdll.LoadLibrary(find_library('m')) >>> print libm.fetestexcept(int(0x01)) 0 >>> from sage.all import * >>> print libm.fetestexcept(int(0x01)) 1
comment:91 Changed 6 years ago by
I added a bunch of print ("TAG: {}".format(libm.fetestexcept(int(0x01))))
statements with various tags. The result changes from 0 to 1 in the file sage/libs/pynac/pynac.pyx
, at the line init_pynac_I()
. Within that function, it changes at the line
K = QuadraticField(1, 'I', embedding=CC.gen(), latex_name='i')
Quadratic fields are constructed using UniqueFactory
, and the result changes from 0 to 1 in the try/except
block
cache_key = key print ("020: {}".format(libm.fetestexcept(int(0x01)))) try: try: return self._cache[version, cache_key] except TypeError: # key is unhashable print ("030: {}".format(libm.fetestexcept(int(0x01)))) cache_key = _cache_key(cache_key) return self._cache[version, cache_key] except KeyError: print ("040: {}".format(libm.fetestexcept(int(0x01)))) pass
in the get_object
method for UniqueFactory
in sage/structure/factory.pyx. With the print statements as indicated, I see
020: 0 040: 1 050: 1
So is it something to do with the cache? The cache is defined by
self._cache = sage.misc.weak_dict.WeakValueDictionary()
So is something going on with Sage's weak dictionaries?
There is also the possibility that I'm misinterpreting everything and the problem is somewhere else completely.
comment:92 Changed 6 years ago by
I don't get the need of nested try/except blocks in that fragment of the code, is it just some leftover?
comment:93 Changed 6 years ago by
I suppose it should be
try: blah except TypeError: blah except KeyError: pass
?
comment:94 Changed 6 years ago by
Unless the last except
is also there to catch a KeyError
in the call to return self._cache[version, cache_key]
within the first except
clause.
comment:95 followup: 98 Changed 6 years ago by
what happens if you remove the except KeyError
part?
comment:96 followup: 97 Changed 6 years ago by
Probably one should try building fpectl Python module (not in Sage Python) and use it to locate where flags are raised during the import.
comment:97 Changed 6 years ago by
Replying to dimpase:
Probably one should try building fpectl Python module (not in Sage Python) and use it to locate where flags are raised during the import.
good idea. I'm curious to see where a NaN is generated.
comment:98 Changed 6 years ago by
Replying to zimmerma:
what happens if you remove the
except KeyError
part?
A KeyError
gets raised and Sage crashes on startup:
KeyError: ((8, 0, 'beta5'), (Rational Field, x^2 + 1, ('I',), (Complex Lazy Field, 1*I), 'i', None, False, None))
If instead I print out self._cache, version, and cache_key, I see this:
self._cache: <WeakValueDictionary at 0x21bb99410> version: (8, 0, 'beta5'), cache_key: (Rational Field, x^2 + 1, ('I',), (Complex Lazy Field, 1*I), 'i', None, False, None)
comment:99 followups: 100 107 Changed 6 years ago by
and then do you get 050: 1
or 050: 0
?
I wonder how the access to self._cache[version, cache_key]
can yield an invalid exception.
comment:100 Changed 6 years ago by
Replying to zimmerma:
and then do you get
050: 1
or050: 0
?
If I remove the entire except KeyError
section, Sage crashes before it ever gets to that print statement. If I keep that part, possibly adding some print statements to it, then I get 050: 1
, as I said before.
I wonder how the access to
self._cache[version, cache_key]
can yield an invalid exception.
comment:101 followup: 102 Changed 6 years ago by
I have no other idea, except trying the fpectl Python module as suggested in comment 97.
comment:102 Changed 6 years ago by
Replying to zimmerma:
I have no other idea, except trying the fpectl Python module as suggested in comment 97.
I'm trying this on FreeBSD now. fpectl is in fact quite broken, so I had to hack around to build it, and yes, I can set fpectl.turnon_sigfpe()
and get a coredump while running from sage.all import *
. The problem is to get a good coredump now...
comment:103 Changed 6 years ago by
the coredump (while running from sage.all import *
with sigfpe on) points its finger at a call to mpfr_set_d
. Specifically:
/* "sage/rings/real_mpfr.pyx":5796 * cdef RealField_class parent = <RealField_class>self._codomain * cdef RealNumber y = parent._new() * mpfr_set_d(y.value, x, parent.rnd) # <<<<<<<<<<<<<< * return y * */ __pyx_t_6 = __pyx_PyFloat_AsDouble(__pyx_v_x); if (unlikely((__pyx_t_6 == (double)1) && PyErr_Occurred())) __PYX_ERR(0, 5796, __py x_L1_error) mpfr_set_d(__pyx_v_y>value, __pyx_t_6, __pyx_v_parent>rnd);
as can be seen at
#0 0x0000000800b141ca in kill () from /lib/libc.so.7 #1 0x00000008044892eb in sigdie (sig=8, s=0x80448b8b3 "Unhandled SIGFPE: An unhandled floating point exception occurred.") at implementation.c:427 #2 0x000000080448910f in cysigs_signal_handler (sig=8) at implementation.c:206 #3 0x00000008013ed78f in pthread_sigmask () from /lib/libthr.so.3 #4 0x00000008013ecd6f in pthread_getspecific () from /lib/libthr.so.3 #5 <signal handler called> #6 0x0000000811a7abfd in mpfr_set_d (r=0x8605c89d0, d=<value optimized out>, rnd_mode=MPFR_RNDN) at set_d.c:121 #7 0x0000000848c5af54 in __pyx_f_4sage_5rings_9real_mpfr_11double_toRR__call_ (__pyx_v_self=<value optimized out>, __pyx_v_x=0x84a298320, __pyx_skip_dispatch=<value optimized out>) at /usr/home/dima/Sage/sage/src/build/cythonized/sage/rings/real_mpfr.c:37760 #8 0x000000080cf2b20a in __pyx_f_4sage_9structure_6coerce_24CoercionModel_cache_maps_canonical_coercion ( __pyx_v_self=<value optimized out>, __pyx_v_x=<value optimized out>, __pyx_v_y=0x84a298320, __pyx_skip_dispatch=<value optimized out>) at /usr/home/dima/Sage/sage/src/build/cythonized/sage/structure/coerce.c:10904 #9 0x000000080cf3433b in __pyx_f_4sage_9structure_6coerce_24CoercionModel_cache_maps_richcmp (__pyx_v_self=0x808a86738, __pyx_v_x=0x8605c87d0, __pyx_v_y=0x84a298320, __pyx_v_op=0, __pyx_skip_dispatch=<value optimized out>) at /usr/home/dima/Sage/sage/src/build/cythonized/sage/structure/coerce.c:20224 #10 0x000000080ccd0185 in __pyx_pw_4sage_9structure_7element_7Element_67__richcmp__ (__pyx_v_self=0x8605c87d0, __pyx_v_other=0x84a298320, __pyx_v_op=0) at /usr/home/dima/Sage/sage/src/build/cythonized/sage/structure/element.c:10385 #11 0x0000000800e76d2b in try_rich_compare (v=0x8605c87d0, w=0x84a298320, op=0) at Objects/object.c:622 #12 0x0000000800e74c69 in PyObject_RichCompare (v=0x8605c87d0, w=0x84a298320, op=0) at Objects/object.c:930 #13 0x0000000800ed2ced in PyEval_EvalFrameEx (f=0x855fbdda0, throwflag=<value optimized out>) at Python/ceval.c:4858 #14 0x0000000800ecd874 in PyEval_EvalCodeEx (co=<value optimized out>, globals=<value optimized out>, locals=<value optimized out>, args=<value optimized out>, argcount=<value optimized out>, kws=0x0, kwcount=<value optimized out>, defs=0x0, defcount=0, closure=0x0) at Python/ceval.c:3584 #15 0x0000000800e588c3 in function_call (func=<value optimized out>, arg=<value optimized out>, kw=<value optimized out>) at Objects/funcobject.c:523 #16 0x0000000800e2b9c1 in PyObject_CallFunctionObjArgs (callable=0x810fa2758) at Objects/abstract.c:2547 #17 0x0000000800e5deb8 in listsort (self=0x8605c4908, args=<value optimized out>, kwds=<value optimized out>) at Objects/listobject.c:2100 ...
comment:104 Changed 6 years ago by
even shorter way to see the problem:
$ ./sage python Python 2.7.13 (default, May 17 2017, 16:36:57) [GCC 4.2.1 Compatible Clang 4.0.0 ] on freebsd11 Type "help", "copyright", "credits" or "license" for more information. >>> import fpectl >>> from sage.all import * >>> fpectl.turnon_sigfpe() >>> p=RDF.pi() >>> from sage.rings.real_mpfr import double_toRR >>> f=double_toRR(RDF, RealField(22)) >>> f(p) Fatal Python error: Unprotected floating point exception ...
and the piece of the dump points to the same place:
#13 <signal handler called> #14 0x0000000811a7abfd in mpfr_set_d (r=0x879a93550, d=<value optimized out>, rnd_mode=MPFR_RNDN) at set_d.c:121 #15 0x0000000848c5af54 in __pyx_f_4sage_5rings_9real_mpfr_11double_toRR__call_ (__pyx_v_self=<value optimized out>, __pyx_v_x=0x87983bfd0, __pyx_skip_dispatch=<value optimized out>) at /usr/home/dima/Sage/sage/src/build/cythonized/sage/rings/real_mpfr.c:37760 #16 0x000000080d17a5d1 in __pyx_pw_4sage_10categories_3map_3Map_30__call__ (__pyx_v_self=<value optimized out>, __pyx_args=<value optimized out>, __pyx_kwds=<value optimized out>) at /usr/home/dima/Sage/sage/src/build/cythonized/sage/categories/map.c:6682 #17 0x0000000800e31195 in PyObject_Call (func=0x879a88e60, arg=0x800752610, kw=0x0) at Objects/abstract.c:2547
comment:105 Changed 6 years ago by
Does MPFR suppress compiler's messages?
(there is some >/dev/null
in the log...) Just wondering whether we might be missing some interesting compiler warnings.
comment:106 Changed 6 years ago by
here is sanity check  even without the fpectl thing, one can see that f(p)
above flips an FPU bit:
sage: from ctypes import cdll sage: from ctypes.util import find_library sage: libm = cdll.LoadLibrary(find_library('m')) sage: print libm.fetestexcept(int(0x01)) 1 sage: import numpy # this resets the FPU bits sage: print libm.fetestexcept(int(0x01)) 0 sage: p=RDF.pi() sage: from sage.rings.real_mpfr import double_toRR sage: f=double_toRR(RDF, RealField(22)) sage: f(p) 3.14159 sage: print libm.fetestexcept(int(0x01)) 1
More precisely, it is the FE_INVALID bit.
comment:107 Changed 6 years ago by
Replying to zimmerma:
I wonder how the access to
self._cache[version, cache_key]
can yield an invalid exception.
With ad hoc mpfr logging turned on, I see a lot of messages printed by mpfr between the "try" and "except" parts of the code. I don't know why. For example, if I add some print messages to mpfr_set_d
:
020: 0 enter mpfr_set_d exit mpfr_set_d: normal enter mpfr_set_d exit mpfr_set_d: normal enter mpfr_set_d exit mpfr_set_d: normal enter mpfr_set_d exit mpfr_set_d: normal enter mpfr_set_d exit mpfr_set_d: normal enter mpfr_set_d exit mpfr_set_d: normal enter mpfr_set_d exit mpfr_set_d: normal enter mpfr_set_d exit mpfr_set_d: normal enter mpfr_set_d exit mpfr_set_d: normal enter mpfr_set_d exit mpfr_set_d: normal enter mpfr_set_d exit mpfr_set_d: normal enter mpfr_set_d exit mpfr_set_d: normal 040: 1 enter mpfr_set_d exit mpfr_set_d: normal version:(8, 0, 'beta4'), cache_key: (Rational Field, x^2 + 1, ('I',), (Complex Lazy Field, 1*I), 'i', None, False, None) 050: 1
("exit mpfr_set_d: normal" is printed if the end of that function is reached, as opposed to the special cases for 0 or infinity.)
comment:108 followups: 109 112 Changed 6 years ago by
line 121 of set_d.c
is the following in mpfr3.1.5 (if no patch was applied in Sage):
manl = d;
A first suggestion is to add a cast:
manl = (mp_limb_t) d;
Also, please could you compile with O0 g
so that the debugger does not optimize out the value of d
?
#6 0x0000000811a7abfd in mpfr_set_d (r=0x8605c89d0, d=<value optimized out>, rnd_mode=MPFR_RNDN) at set_d.c:121
Alternatively, one could add at the beginning of __gmpfr_extract_double
:
printf ("d=%f\n", d);
to see which value of d
produces the invalid exception.
comment:109 Changed 6 years ago by
Replying to zimmerma:
line 121 of
set_d.c
is the following in mpfr3.1.5 (if no patch was applied in Sage):manl = d;A first suggestion is to add a cast:
manl = (mp_limb_t) d;Also, please could you compile with
O0 g
so that the debugger does not optimize out the value ofd
?#6 0x0000000811a7abfd in mpfr_set_d (r=0x8605c89d0, d=<value optimized out>, rnd_mode=MPFR_RNDN) at set_d.c:121
Here is the result (coredump) of f(1.5)
invoked as in comment 104 above:
#13 <signal handler called> #14 0x0000000811a86deb in __gmpfr_extract_double (rp=0x7fffffffd438, d=1.3835058055282164e+19) at set_d.c:121 #15 0x0000000811a8696e in mpfr_set_d (r=0x879ad3610, d=1.5, rnd_mode=MPFR_RNDN) at set_d.c:214 #16 0x0000000848c5af54 in __pyx_f_4sage_5rings_9real_mpfr_11double_toRR__call_ (__pyx_v_self=<value optimized out>, __pyx_v_x=0x87987bfd0, __pyx_skip_dispatch=<value optimized out>) at /usr/home/dima/Sage/sage/src/build/cythonized/sage/rings/real_mpfr.c:37760 #17 0x000000080d17a38e in __pyx_pw_4sage_10categories_3map_3Map_30__call__ (__pyx_v_self=<value optimized out>, __pyx_args=<value optimized out>, __pyx_kwds=<value optimized out>) at /usr/home/dima/Sage/sage/src/build/cythonized/sage/categories/map.c:7093
after I recompiled mpfr and mpc with O0
(by using SAGE_DEBUG=yes)
In instead I do f(p)
(exactly as in comment 104) the only difference I see
is
#14 0x0000000811486deb in __gmpfr_extract_double (rp=0x7fffffffd448, d=1.4488038916154245e+19) at set_d.c:121 #15 0x000000081148696e in mpfr_set_d (r=0x87991da30, d=3.1415926535897931, rnd_mode=MPFR_RNDN) at set_d.c:214
What would you like to get printed (or/and Ccast'ed ?) based on this info?
comment:110 Changed 6 years ago by
please could you first apply the following patch and try again?
 set_d.c.orig 20170518 12:01:41.925524683 +0200 +++ set_d.c 20170518 12:04:38.728177129 +0200 @@ 38,6 +38,8 @@ mp_limb_t manh; #endif + printf ("line 41: d=%.16e\n", d); + /* BUGS 1. Should handle Inf and NaN in IEEE specific code. 2. Handle Inf and NaN also in default code, to avoid hangs. @@ 95,11 +97,13 @@ { d *= (1.0 / 65536.0); exp += 16; + printf ("100: d=%.16e exp=%ld\n", d, exp); } while (d >= 1.0) { d *= 0.5; exp += 1; + printf ("106: d=%.16e exp=%ld\n", d, exp); } } else if (d < 0.5) @@ 116,7 +120,9 @@ } } + printf ("MP_BASE_AS_DOUBLE = %.16e\n", MP_BASE_AS_DOUBLE); d *= MP_BASE_AS_DOUBLE; + printf ("125: d=%.16e\n", d); #if GMP_NUMB_BITS >= 64 manl = d; #else
I suspect the value of d
at (original) line 121 is NaN, but I can't see how this can happen.
comment:111 Changed 6 years ago by
with the patch I get
line 41: d=3.1415926535897931e+00 106: d=1.5707963267948966e+00 exp=1 106: d=7.8539816339744828e01 exp=2 MP_BASE_AS_DOUBLE = 1.8446744073709552e+19 125: d=1.4488038916154245e+19 Fatal Python error: Unprotected floating point exception
and essentially the same (up to line number changes):
#13 <signal handler called> #14 0x0000000811486e6c in __gmpfr_extract_double (rp=0x7fffffffd458, d=1.4488038916154245e+19) at set_d.c:127 #15 0x000000081148696e in mpfr_set_d (r=0x87991e910, d=3.1415926535897931, rnd_mode=MPFR_RNDN) at set_d.c:220 #16 0x000000084865af54 in __pyx_f_4sage_5rings_9real_mpfr_11double_toRR__call_ (__pyx_v_self=<value optimized out>, __pyx_v_x=0x8796baf58, __pyx_skip_dispatch=<value optimized out>) at /usr/home/dima/Sage/sage/src/build/cythonized/sage/rings/real_mpfr.c:37760
comment:112 Changed 6 years ago by
Replying to zimmerma:
line 121 of
set_d.c
is the following in mpfr3.1.5 (if no patch was applied in Sage):manl = d;A first suggestion is to add a cast:
manl = (mp_limb_t) d;
Forgot to mention that this cast does not change anything at all.
comment:113 Changed 6 years ago by
ok, I can reproduce the issue on my Linux machine with the following program (where I replaced mp_limb_t manl
by long manl
):
#include <stdio.h> #include <stdlib.h> #include <fenv.h> int main (int argc, char *argv[]) { long manl; double d = atof (argv[1]); printf ("fetestexcept() = %d\n", fetestexcept (FE_INVALID)); manl = d; printf ("fetestexcept() = %d\n", fetestexcept (FE_INVALID)); printf ("manl = %ld\n", manl); return 0; }
which gives:
zimmerma@tomate:/tmp$ gcc O0 e.c lm zimmerma@tomate:/tmp$ ./a.out 1.4488038916154245e+19 fetestexcept() = 0 fetestexcept() = 1 manl = 9223372036854775808
thus a possible explanation could be that mp_limb_t
is a signed 64bit type (aka long
) on this platform, but I doubt.
Please can you check the output of this program (and the same by replacing long
by mp_limb_t
and adding the gmp.h
header)?
comment:114 Changed 6 years ago by
on the platform one has mp_limb_t
of type unsigned long int
, but still if I replace
long manl;
with
unsigned long manl;
and replace format %ld
by %lu
I get
$ ./a.out 1.4488038916154245e+19 fetestexcept() = 0 fetestexcept() = 1 manl = 14488038916154245120
In fact, as well, FE_INEXACT
is raised, not only FE_INVALID
.
By the way: sometimes, if I enter as input something smaller, say 1.44e0
I get no FE_INVALID
raised, but instead FE_INEXACT
is raised.
Entering 1.44e0
on Linux/gcc still raises FE_INEXACT
after the assignment; on FreeBSD/clang FE_INEXACT
is also raised at d=atof()
call.
comment:115 Changed 6 years ago by
By the way: sometimes, if I enter as input something smaller, say 1.44e0 I get no FE_INVALID raised, but instead FE_INEXACT is raised.
this is normal, since 1.44e0 is not an integer, and thus the integer conversion should raise the
inexact flag, but for 1.4488038916154245e+19
it should not be the case, since the value of d
should be 14488038916154245120
, which is the closest 53bit floatingpoint number.
We are making progress. Now I can reproduce the issue with clang on my LinuxDebian computer with the following program:
#include <stdio.h> #include <stdlib.h> #include <fenv.h> int main (int argc, char *argv[]) { unsigned long manl; double d = atof (argv[1]); printf ("d = %f\n", d); printf ("fetestexcept() = %d\n", fetestexcept (FE_ALL_EXCEPT)); manl = d; printf ("fetestexcept() = %d\n", fetestexcept (FE_ALL_EXCEPT)); printf ("manl = %lu\n", manl); return 0; }
which gives:
zimmerma@tomate:/tmp$ gcc O0 g e.c lm; ./a.out 1.4488038916154245e+19 d = 14488038916154245120.000000 fetestexcept() = 0 fetestexcept() = 0 manl = 14488038916154245120 zimmerma@tomate:/tmp$ clang O0 g e.c lm; ./a.out 1.4488038916154245e+19 d = 14488038916154245120.000000 fetestexcept() = 0 fetestexcept() = 1 manl = 14488038916154245120
comment:116 Changed 6 years ago by
by dichotomy I found that the exception seems to be raised for 2^{63}512 and larger values:
zimmerma@tomate:/tmp$ clang O0 g e.c lm; ./a.out 9223372036854775295.0 d = 9223372036854774784.000000 fetestexcept() = 0 fetestexcept() = 0 manl = 9223372036854774784 zimmerma@tomate:/tmp$ clang O0 g e.c lm; ./a.out 9223372036854775296.0 d = 9223372036854775808.000000 fetestexcept() = 0 fetestexcept() = 1 manl = 9223372036854775808
comment:117 Changed 6 years ago by
On FreeBSD/clang the output of the comment 115 code is different (FP_INEXACT is raised, as I already said):
./a.out 1.4488038916154245e+19 d = 14488038916154245120.000000 fetestexcept() = 32 fetestexcept() = 33 manl = 14488038916154245120
But OK, it's indeed something fixable
comment:118 Changed 6 years ago by
seems to be bug 17686 of clang: https://bugs.llvm.org//show_bug.cgi?id=17686
comment:119 Changed 6 years ago by
Hmm, I do not like the reaction of clang devs, sounds like "huh, we don't care" to me. See e.g. related https://bugs.llvm.org//show_bug.cgi?id=8100
comment:120 Changed 6 years ago by
https://bugs.llvm.org//show_bug.cgi?id=8100 is a different issue. I hope they will fix bug 17686.
In the meantime the following ugly patch should fix the warning (in this code the value of d
is
always greater or equal to 2^{63}):
manl = 0x8000000000000000 + (mp_limb_t) (d  9223372036854775808.0);
comment:121 followup: 122 Changed 6 years ago by
Yeah, this does work! I'll post the (1line) patch; not sure how to make it conditional on clang, but that's of secondary importance. At least this nightmare is over, apparently. :)
comment:122 Changed 6 years ago by
Replying to dimpase:
Yeah, this does work! I'll post the (1line) patch; not sure how to make it conditional on clang, but that's of secondary importance. At least this nightmare is over, apparently. :)
Patch to mpfr or numpy? #12426 has stuff to expose the fact that clang is used, so it could be folded back into that.
comment:123 Changed 6 years ago by
Branch:  → u/dimpase/clangtrouble 

Commit:  → 66973af992d75dbf6e1d1aa461cb739044a8e568 
This implements Paul's fix. I'm still running tests, but few things so far indicate that no FPUrelated noise is created. (E.g. from sage.all import *
works with fpectl
armed to go off at any FPU flags raised, etc). And we can leave numpy
alone.
comment:124 Changed 6 years ago by
I think we can improve the patch by inserting a pragma so the fix is only compiled if we are using clang.
comment:126 Changed 6 years ago by
Authors:  → Dima Pasechnik, François Bissey 

Branch:  u/dimpase/clangtrouble → u/fbissey/clangtrouble 
Commit:  66973af992d75dbf6e1d1aa461cb739044a8e568 → 67fa2fa833b1c6b7fc7b6da1577358b2e1b6c128 
Status:  new → needs_review 
comment:128 Changed 6 years ago by
we applied the following patch in the development version of MPFR:
 src/set_d.c (revision 11493) +++ src/set_d.c (working copy) @@ 130,8 +130,16 @@ d *= MP_BASE_AS_DOUBLE; #if GMP_NUMB_BITS >= 64 +#ifndef __clang__ manl = d; #else + /* clang produces an invalid exception when d >= 2^63, + see https://bugs.llvm.org//show_bug.cgi?id=17686. + Since this is always the case, here, we use the following patch. */ + MPFR_STAT_STATIC_ASSERT (GMP_NUMB_BITS == 64); + manl = 0x8000000000000000 + (mp_limb_t) (d  0x8000000000000000); +#endif /* __clang__ */ +#else MPFR_STAT_STATIC_ASSERT (GMP_NUMB_BITS == 32); manh = (mp_limb_t) d; manl = (mp_limb_t) ((d  manh) * MP_BASE_AS_DOUBLE);
comment:129 Changed 6 years ago by
OK, I'll update the branch with this precise patch shortly. I have to think of using ifndef
more.
comment:130 Changed 6 years ago by
Commit:  67fa2fa833b1c6b7fc7b6da1577358b2e1b6c128 → fd29778835a64b8582f1b6ce7d47c8c5062ee240 

Branch pushed to git repo; I updated commit sha1. New commits:
fd29778  Rebase mpfr patch on upstream commit

comment:131 followup: 132 Changed 6 years ago by
I followed your commit as closely as possible, but 3.1.5 doesn't seem to have MPFR_STAT_STATIC_ASSERT
so I had to remove it.
comment:132 Changed 6 years ago by
Replying to fbissey:
I followed your commit as closely as possible, but 3.1.5 doesn't seem to have
MPFR_STAT_STATIC_ASSERT
so I had to remove it.
ok, sorry for that. I also realized that 3.1.5 uses the generic case (_GMP_IEEE_FLOATS not defined) while the development version of MPFR has _MPFR_IEEE_FLOATS defined, thus uses a different code (which does not raise the exception).
comment:134 Changed 6 years ago by
Reviewers:  → John Palmieri 

Status:  needs_review → positive_review 
Yes, positive review.
comment:135 Changed 6 years ago by
Reviewers:  John Palmieri 

I'm running tests on FreeBSD now, but I expect it will all be good with the patch.
comment:136 Changed 6 years ago by
Authors:  Dima Pasechnik, François Bissey → Dima Pasechnik, François Bissey, Paul Zimmermann 

Reviewers:  → Dima Pasechnik, John Palmieri 
comment:137 Changed 6 years ago by
Authors:  Dima Pasechnik, François Bissey, Paul Zimmermann → François Bissey, Dima Pasechnik, Paul Zimmermann 

Report Upstream:  Reported upstream. Developers acknowledge bug. → Fixed upstream, but not in a stable release. 
Reviewers:  Dima Pasechnik, John Palmieri → John Palmieri, Dima Pasechnik 
yes, this also is good on FreeBSD
comment:138 Changed 6 years ago by
Branch:  u/fbissey/clangtrouble → fd29778835a64b8582f1b6ce7d47c8c5062ee240 

Resolution:  → fixed 
Status:  positive_review → closed 
Both polynomial_real_mpfr_dense.pyx and coerce.py are the same test  more or less
There are references to #8426 and #18076.