#10952 closed enhancement (fixed)
better numerical accuracy testing
Reported by:  robertwb  Owned by:  mvngu 

Priority:  critical  Milestone:  sage4.7.2 
Component:  doctest coverage  Keywords:  sd32 noise noisy doctest failure error tolerance 
Cc:  jason, kcrisman  Merged in:  sage4.7.2.alpha3 
Authors:  Robert Bradshaw, Rob Beezer  Reviewers:  Jason Grout, Mariah Lenox, William Stein, John Palmieri 
Report Upstream:  N/A  Work issues:  
Branch:  Commit:  
Dependencies:  Stopgaps: 
Description (last modified by )
If a line contains tol
or tolerance
, numerical results are only
verified to the given tolerance. This may be prefixed by abs[olute]
or rel[ative]
to specify whether to measure absolute or relative error; defaults to relative error except when the expected value is exactly zero:
sage: RDF(pi) # abs tol 1e5 3.14159 sage: [10^n for n in [0.0 .. 4]] # rel tol 2e4 [0.9999, 10.001, 100.01, 999.9, 10001]
This can be useful when the exact output is subject to rounding error and/or processor floating point arithmetic variation.
Related:
.zero_at(epsilon)
methods, to fix noisy (and signed) zeroes; see for example #11848.
Apply
to the Sage scripts repository.
Apply
to the Sage library repository.
Attachments (9)
Change History (54)
Changed 8 years ago by
comment:1 Changed 8 years ago by
 Description modified (diff)
 Status changed from new to needs_review
comment:2 Changed 8 years ago by
comment:3 Changed 8 years ago by
Nitpicky: This comment is now incorrect:
# following three: used only for parsing only_optional; list of comments
Also, something should be added to the docs about this new option. Perhaps here: http://sagemath.org/doc/developer/conventions.html#furtherconventionsforautomatedtestingofexamples
Changed 8 years ago by
comment:4 Changed 8 years ago by
Thanks for the comments, new patches attached.
comment:5 Changed 8 years ago by
 Cc kcrisman added
comment:6 Changed 8 years ago by
Would this works with test of that kind as well:
File "/usr/share/sage/devel/sagemain/sage/stats/hmm/chmm.pyx", line 579: sage: m.viterbi([2,1,.1,0.3]) Expected: ([1, 1, 1, 0], 9.5660236533785135) Got: ([1, 1, 1, 0], 9.566023653378513)
or even
File "/usr/share/sage/devel/sagemain/sage/categories/examples/semigroups.py", line 32: sage: S.some_elements() Expected: [3, 42, 'a', 3.3999999999999999, 'raton laveur'] Got: [3, 42, 'a', 3.4, 'raton laveur']
(test failures courtesy of my work on python2.7)
comment:7 Changed 8 years ago by
Yes, it would (though you do have to explicitly annotate the test with a tolerance, and if Python 2.7 consistently gives, e.g., 3.4 then we should change the doctest rather than make it fuzzy.)
comment:8 Changed 8 years ago by
 Status changed from needs_review to needs_info
In trying to review this ticket, I applied 10952tolbin.patch in the local/bin directory of a skynet/taurus (x86_64Linuxnehalem) build of sage4.7.rc2. I next did 'sage b'. Yet I get
sage: print "The answers are", 1.5, 2, 1e12 # tol 1e3 The answers are 1.50000000000000 2 1.00000000000000e12
Am I missing something?
comment:9 Changed 8 years ago by
No, that is correct.
The example in the description is an example of something that would pass doctests (even though the true output doesn't match exactly, you're seeing what the actual output is).
comment:10 Changed 8 years ago by
Another nitpicky: the doc patch misspells "arithmetic". Definitely not big enough to stop a positive review, but also not big enough to be worth my effort in making a new patch right now.
Changed 8 years ago by
comment:11 Changed 8 years ago by
Argh. Fixed.
comment:12 followup: ↓ 13 Changed 8 years ago by
Thanks! If we had a github pull system, or a system where I could edit the patch inline, I would have fixed it.
comment:13 in reply to: ↑ 12 Changed 8 years ago by
 Reviewers set to Jason Grout
 Status changed from needs_info to needs_review
Replying to jason:
Thanks! If we had a github pull system, or a system where I could edit the patch inline, I would have fixed it.
I have to admit that would be convenient. But wouldn't that create a privileged class of commit people who don't need review? At least that seems to be the model for the projects I'm familiar with.
comment:14 Changed 8 years ago by
That person would be the release manager. He would be the only one able (or supposed to) merge into the master branch.
On the other hand, given our problems with finding release managers who could do the entire release process, maybe sharing the burden to commit patches to master wouldn't be a bad idea.
comment:15 Changed 8 years ago by
 Status changed from needs_review to needs_work
I applied 10952tolbin.patch in the local/bin directory of a skynet/taurus (x86_64Linuxnehalem) build of sage4.7.rc2. I next did 'sage b'. I then modified a doctest to include the lines in the ticket description
sage: print "The answers are", 1.5, 2, 1e12 # tol 1e3 The answers are 1.499999 2.0001 0
I then did 'sage b' again. When I run the doctest, I get
% ./sage t long force_lib "devel/sage/sage/symbolic/units.py" sage t long force_lib "devel/sage/sage/symbolic/units.py" ********************************************************************** File "/home/mariah/sage/sage4.7.rc2x86_64Linuxnehalemfcreview10952/devel/sage/sage/symbolic/units.py", line 13: sage: print "The answers are", 1.5, 2, 1e12 # tol 1e3 Exception raised: Traceback (most recent call last): File "/home/mariah/sage/sage4.7.rc2x86_64Linuxnehalemfcreview10952/local/bin/ncadoctest.py", line 1231, in run_one_test self.run_one_example(test, example, filename, compileflags) File "/home/mariah/sage/sage4.7.rc2x86_64Linuxnehalemfcreview10952/local/bin/sagedoctest.py", line 38, in run_one_example OrigDocTestRunner.run_one_example(self, test, example, filename, compileflags) File "/home/mariah/sage/sage4.7.rc2x86_64Linuxnehalemfcreview10952/local/bin/ncadoctest.py", line 1172, in run_one_example compileflags, 1) in test.globs File "<doctest __main__.example_0[6]>", line 1 res = print "The answers are", RealNumber('1.5'), Integer(2), RealNumber('1e12') # tol 1e3###line 13: sage: print "The answers are", 1.5, 2, 1e12 # tol 1e3 ^ SyntaxError: invalid syntax **********************************************************************
Note that if I put the lines
sage: RDF(pi) # abs tol 1e5 3.14159
in the doctest, then the doctest passes.
comment:16 followup: ↓ 17 Changed 8 years ago by
 Description modified (diff)
 Status changed from needs_work to needs_review
True. It doesn't work with print (or other nonexpression statements). I've created #11336 to generalize it.
I think that this is still plenty useful even with that limitation, and don't know when I'll have more time to work on itwe could either get this in and start using it now or hold off until has time to write a more complete implementation at some later date (which I'd like to get to someday, but that someday list is long...)
comment:17 in reply to: ↑ 16 Changed 8 years ago by
 Reviewers changed from Jason Grout to Jason Grout, Mariah Lenox
 Status changed from needs_review to positive_review
comment:18 Changed 8 years ago by
 Milestone set to sage4.7.1
 Priority changed from major to critical
 Status changed from positive_review to needs_info
Which patches have to be applied?
comment:19 Changed 8 years ago by
I think the documentation should be expanded a bit to show less trivial examples, for example complex numbers or polynomials with nonexact coefficients.
comment:20 Changed 8 years ago by
Also matrices
Changed 8 years ago by
comment:21 Changed 8 years ago by
 Description modified (diff)
 Status changed from needs_info to needs_review
I have added an "Apply" list for this ticket.
My patch fixes one typo in the documentation (a missing ]).
I have added some nontrivial tests.
Jeroen:
 abs() for a matrix computes the determinant. Not useful for testing.
 Is there a class of polynomials with numerical coefficients? Does it have abs() defined on it?
I am working on this as a highpriority ticket for Bug Days 32.
Rob
comment:22 followup: ↓ 23 Changed 8 years ago by
 Description modified (diff)
This ticket is pretty amazing.
Rob, this doctest you added seems very wrong:
A relative tolerance on a root of a polynomial. :: sage: p = (y  10^16)*(y10^(13))*(y2); p y^3 + (1e+16)*y^2 + (2e+16)*y  2000.0 sage: p.roots() sage: p.roots(multiplicities=False)[2] # relative tol 1e10 10^16
In particular,
 it seems that y isn't defined
 the output of
p.roots()
is missing  I get an output of
1e16
instead of10^16
.
comment:23 in reply to: ↑ 22 Changed 8 years ago by
 Status changed from needs_review to needs_work
Replying to was:
Rob, this doctest you added seems very wrong:
Yes, I think this was the patch missing a refresh. I'll clean it in the AM.
Changed 8 years ago by
comment:24 Changed 8 years ago by
 Description modified (diff)
 Status changed from needs_work to needs_review
Reviewer documentation v2 patch cleans up the mess.
I used 10^16
intentionally. The tolerance feature does arithmetic with the
output, so does not have to identically match as text. Maybe that deserves a comment?
Rob
Changed 8 years ago by
comment:25 Changed 8 years ago by
 Description modified (diff)
Reviewer documentation v3 patch adds a comment about intended output being of any form, fixes some missing quotes and expands a bit more on how this works. Passes tests, so should be ready for review.
comment:26 Changed 8 years ago by
 Status changed from needs_review to positive_review
Nice. Game changer!!!! Sweet. It's about time. Great work! W00t. FTW. Frickin' awesome.
comment:27 Changed 8 years ago by
 Reviewers changed from Jason Grout, Mariah Lenox to Jason Grout, Mariah Lenox, William Stein
comment:28 Changed 8 years ago by
 Keywords sd32 added
comment:29 Changed 8 years ago by
 Description modified (diff)
Shouldn't the code only be included if the doctests actually contain # tol ...
?
( +[09.e+]+)?
hardly matches what we want btw.
I've put this onto the 1337 ticket (#11337).
comment:30 followup: ↓ 31 Changed 8 years ago by
 Reviewers changed from Jason Grout, Mariah Lenox, William Stein to Jason Grout, Mariah Lenox, William Stein, John Palmieri
 Status changed from positive_review to needs_work
This doesn't work. Re Leif's comment, I think the regexp should something like ' ((\.[09]+[09]+(\.[09]*)?)e[+]?[09]+)'
, which adds another group to the match, so we need to make this change:

sagedoctest
diff git a/sagedoctest b/sagedoctest
a b if __name__ == '__main__': 217 217 """ % dict 218 218 219 219 NONE=0; LONG_TIME=1; RANDOM=2; OPTIONAL=3; NOT_IMPLEMENTED=4; NOT_TESTED=5; TOLERANCE=6 220 tolerance_pattern = re.compile(r'\b((?:abs(?:olute)?)(?:rel(?:ative)?))? *?tol(?:erance)?\b ( +[09.e+]+)?')220 tolerance_pattern = re.compile(r'\b((?:abs(?:olute)?)(?:rel(?:ative)?))? *?tol(?:erance)?\b +((\.[09]+[09]+(\.[09]*)?)e[+]?[09]+)?') 221 221 222 222 def comment_modifier(s): 223 223 sind = s.find('#') … … def comment_modifier(s): 239 239 m = tolerance_pattern.search(L) 240 240 if m: 241 241 v.append(TOLERANCE) 242 rel_or_abs, epsilon = m.groups()242 rel_or_abs, epsilon, _, _ = m.groups() 243 243 if rel_or_abs is not None: 244 244 rel_or_abs = rel_or_abs[:3] 245 245 if epsilon is None:
I think this should match any of "1.2e12", "1e12", ".1e12", without matching "ee12e+122", like the previous regexp.
In addition to this, the whole thing just flat doesn't work. I created a Python file with some tolerance tests in it, and they all fail, and the failures print badly: they include tracebacks, for example. I'm not sure why the rst file patched in the two "doc" patches isn't being doctested correctly, but as far as I can tell, no doctests are being run in that file  just add an obviously wrong doctest, and tests still pass. Here is a test file:
"""
Here are some examples::
sage: 3+3
6
sage: RDF(pi) # abs tol 1e4
3.14159
Here is another example::
sage: [10^n for n in [0.0 .. 4]] # rel tol 2e4
[0.9999, 10.001, 100.01, 999.9, 10001]
And another::
sage: RDF(pi) # abs tol 1e4
3.14159
And that's the end.
"""
Save this as "new.py" and run "sage t new.py" to see some bad behavior.
comment:31 in reply to: ↑ 30 Changed 8 years ago by
Replying to jhpalmieri:
I think the regexp should something like
' ((\.[09]+[09]+(\.[09]*)?)e[+]?[09]+)'`
But that doesn't match 1ee7
:(
You can of course match funny things first, i.e. use more general patterns, but you then have to check the matched expression further before passing it to float()
.
The whole idea isn't that bad, but carries the danger that people simply increase the tolerance (too much) just to make doctests pass somehow, without caring where the variations originate from.
comment:32 Changed 8 years ago by
But float(1ee7)
raises an error. Why would we want to match that?
Re the issue of raising tolerance, I hope that referees will keep an eye on that sort of thing. Dave Kirkby, for example, is a strong advocate of not arbitrarily raising tolerance.
Changed 8 years ago by
comment:33 Changed 8 years ago by
 Description modified (diff)
I've fixed the issue with blanklines terminating example blocks. As for the regular expression for floats, I don't see any need to make it more complicatedI'd rather match misstyped numbers and raise an error than silently ignore them. If you feel strongly, this could be changed.
Referees should keep an eye on precision. This is back to my philosophy that a computer should run doctests and a human read the code (though we could add a patchbot plugin to flag drops in precision). It's not a new issuewe've been using "..." for quite a while now which is less flexible and more dangerous (as it can match more than just a single number).
comment:34 Changed 8 years ago by
 Status changed from needs_work to needs_review
comment:35 Changed 8 years ago by
How about emitting the extra code only if # tol ...
is used at all in the file?
That's a simple """... %s ...""" % (tolerance_code if uses_tol else "")
.
comment:36 followup: ↓ 37 Changed 8 years ago by
You're talking about the check_with_tolerance
function and helpers. One could add a variable to keep track of this and conditionally emit it; not sure if the added complexity is worth the tiny savings.
comment:37 in reply to: ↑ 36 Changed 8 years ago by
Replying to robertwb:
You're talking about the
check_with_tolerance
function and helpers. One could add a variable to keep track of this and conditionally emit it; not sure if the added complexity is worth the tiny savings.
Added complexity? One boolean variable, initialized to false and set to True
in only a few places?
This not only reduces the file size and run (import!) time but also avoids more potential name clashes.
Why should one add useless code to each and every file to doctest not using # tol
?
comment:38 Changed 8 years ago by
Either we make this variable a global (ugly) or return it on from call to doc_preparse along with the parsed docstring (also ugly). All to save a fraction of a block (~1KB) of an ephemeral file and < 1ms on an already fairly expensive operation. If we're looking to cut fat, it'd be better to avoid the quadratictime creation of the doctest file string.
The name clash is a red herringif we avoid it only because # tol is not used, that's a potentially latent bug in my mind. The function could be renamed if need be.
comment:39 followup: ↓ 40 Changed 8 years ago by
 Description modified (diff)
I don't see anything wrong with more complicated regular expressions, but I like regular expressions. So leave it as is if you want.
Meanwhile, the output for failures using tolerances isn't nice, compared to other sorts of failures:
sage t "builds/sage4.7.2.alpha2/devel/sagenew/sage/homology/new.py" ********************************************************************** File "/Applications/sage_builds/sage4.7.2.alpha2/devel/sagenew/sage/homology/new.py", line 4: sage: 3+3 Expected: 8 Got: 6 ********************************************************************** File "/Applications/sage_builds/sage4.7.2.alpha2/devel/sagenew/sage/homology/new.py", iled example: check_with_tolerance(''' 3.14159 ''', res, 9.9999999999999998e13, 'abs') Exception raised: Traceback (most recent call last): File "/Applications/sage/local/bin/ncadoctest.py", line 1231, in run_one_test self.run_one_example(test, example, filename, compileflags) File "/Applications/sage/local/bin/sagedoctest.py", line 38, in run_one_example OrigDocTestRunner.run_one_example(self, test, example, filename, compileflags) File "/Applications/sage/local/bin/ncadoctest.py", line 1172, in run_one_example compileflags, 1) in test.globs File "<doctest __main__.example_0[11]>", line 3, in <module> ''', res, 9.9999999999999998e13, 'abs') File "/Users/palmieri/.sage//tmp/new.py", line 35, in check_with_tolerance assert abs(expected_value  actual_value) < epsilon, "Out of tolerance %s vs %s" % (expected_value, actual_value) AssertionError: Out of tolerance 3.14159 vs 3.14159265359 ********************************************************************** 1 items had failures: 2 of 13 in __main__.example_0 ***Test Failed*** 2 failures. For whitespace errors, see the file /Users/palmieri/.sage//tmp/.doctest_new.py [3.9 s]
Notice that the line number is missing and the traceback is present. I'm attaching a referee patch which tries to fix this. Please take a look. For me, this gives this sort of output (with a slightly different file):
sage t "builds/sage4.7.2.alpha2/devel/sagenew/new.py" ********************************************************************** File "/Applications/sage_builds/sage4.7.2.alpha2/devel/sagenew/new.py", line 4: sage: 3+3 Expected: 8 Got: 6 ********************************************************************** File "/Applications/sage_builds/sage4.7.2.alpha2/devel/sagenew/new.py", line 6: sage: RDF(pi) # abs tol 1e6 Out of tolerance 3.14159 vs 3.14159265359 ********************************************************************** File "/Applications/sage_builds/sage4.7.2.alpha2/devel/sagenew/new.py", line 16: sage: RDF(pi) # abs tol 1e8 Out of tolerance 3.14159 vs 3.14159265359 ********************************************************************** 1 items had failures: 3 of 16 in __main__.example_0 ***Test Failed*** 3 failures. For whitespace errors, see the file /Users/palmieri/.sage//tmp/.doctest_new.py [4.2 s]
Since the whole ticket had a positive review earlier, I think if you're happy with my changes, you can switch it back to that status.
Finally, the issue with doctests in conventions.rst
is, I think, that the various triple quotes """
confuse the parsing routine (the function pythonify_rst
in particular). This belongs on another ticket. That file contains enough incomplete code stubs that I wonder if it shouldn't be doctested at all...
comment:40 in reply to: ↑ 39 ; followup: ↓ 41 Changed 8 years ago by
Replying to jhpalmieri:
I don't see anything wrong with more complicated regular expressions, but I like regular expressions. So leave it as is if you want.
Yes, only
sage: foo # tol 1ee7 anything
should always pass. ;)
Meanwhile, the output for failures using tolerances isn't nice, compared to other sorts of failures [...]
Notice that the line number is missing and the traceback is present. I'm attaching a referee patch which tries to fix this. Please take a look.
Haven't tested your patch (nor looked at it), but the output you gave looks much better.
Since the whole ticket had a positive review earlier, I think if you're happy with my changes, you can switch it back to that status.
I won't object, though I in general don't like the idea of adding code (regardless how small the overhead or impact might be) unconditionally, i.e., if there's no need to do so.
We have similar situations all around, where everybody adds "just a little", IMHO.
comment:41 in reply to: ↑ 40 Changed 8 years ago by
 Status changed from needs_review to positive_review
Replying to leif:
Replying to jhpalmieri:
I don't see anything wrong with more complicated regular expressions, but I like regular expressions. So leave it as is if you want.
Yes, only
sage: foo # tol 1ee7 anythingshould always pass. ;)
Currently, it raises an error indicating that the *doctest* itself is bad.
Meanwhile, the output for failures using tolerances isn't nice, compared to other sorts of failures [...]
Notice that the line number is missing and the traceback is present. I'm attaching a referee patch which tries to fix this. Please take a look.Haven't tested your patch (nor looked at it), but the output you gave looks much better.
Since the whole ticket had a positive review earlier, I think if you're happy with my changes, you can switch it back to that status.
I won't object, though I in general don't like the idea of adding code (regardless how small the overhead or impact might be) unconditionally, i.e., if there's no need to do so.
We have similar situations all around, where everybody adds "just a little", IMHO.
Yes, it's a tradeoff between adding "just a little" python parsing overhead to the testing infrastructure or "just a little" human parsing overhead to the testing infrastructure. I'd rather put the burden on machines than developers in this case.
The reviewer patches look good, certainly an improvement, so I'm setting this back to positive review unless there's something else that needs to be taken care of.
comment:42 Changed 8 years ago by
 Description modified (diff)
 Keywords noise noisy doctest failure error tolerance added
comment:43 Changed 8 years ago by
 Merged in set to sage4.7.2.alpha3
 Resolution set to fixed
 Status changed from positive_review to closed
comment:44 Changed 7 years ago by
I'd like to draw the attention of folks here to comment:7:ticket:12493. Apparently one can't do onlyoptional
tests along with tol
tests at the same time. My guess is that not too many people use onlyoptional
and the tol
stuff is pretty new.
In fact, I only found one occurrence in 5.0.beta3. Can that be right? This has been in Sage for months!
sage: search_src(" tol ","#") symbolic/integration/integral.py:587: sage: error.numerical_approx() # abs tol 10e10
Anyway, even if my analysis is wrong (let's hope it's easier than that), I figure the people here can give a quick diagnosis of #12493.
comment:45 Changed 7 years ago by
Actually, comment:8:ticket:12493 is even better! Optional and tol don't play well together.
Amazing speed getting this up! Shouldn't "((?:abs(?:solute)?)" be "((?:abs(?:olute)?)" ?