Opened 9 years ago

Closed 8 years ago

Last modified 8 years ago

#10952 closed enhancement (fixed)

better numerical accuracy testing

Reported by: robertwb Owned by: mvngu
Priority: critical Milestone: sage-4.7.2
Component: doctest coverage Keywords: sd32 noise noisy doctest failure error tolerance
Cc: jason, kcrisman Merged in: sage-4.7.2.alpha3
Authors: Robert Bradshaw, Rob Beezer Reviewers: Jason Grout, Mariah Lenox, William Stein, John Palmieri
Report Upstream: N/A Work issues:
Branch: Commit:
Dependencies: Stopgaps:

Description (last modified by leif)

If a line contains tol or tolerance, numerical results are only verified to the given tolerance. This may be prefixed by abs[olute] or rel[ative] to specify whether to measure absolute or relative error; defaults to relative error except when the expected value is exactly zero:

        sage: RDF(pi)                               # abs tol 1e-5 
        3.14159 
        sage: [10^n for n in [0.0 .. 4]]            # rel tol 2e-4 
        [0.9999, 10.001, 100.01, 999.9, 10001] 

This can be useful when the exact output is subject to rounding error and/or processor floating point arithmetic variation.


Related:

  • .zero_at(epsilon) methods, to fix noisy (and signed) zeroes; see for example #11848.

Apply

  1. 10952-tol-bin.2.patch
  2. trac_10952-ref.patch

to the Sage scripts repository.

Apply

  1. 10952-tol-doc.2.patch
  2. trac_10952-reviewer-docs-v3.patch

to the Sage library repository.

Attachments (9)

noise.patch (5.0 KB) - added by robertwb 9 years ago.
10952-tol-bin.patch (5.1 KB) - added by robertwb 9 years ago.
apply only this patch to the bin repo
10952-tol-doc.patch (1.4 KB) - added by robertwb 9 years ago.
10952-tol-doc.2.patch (1.4 KB) - added by robertwb 9 years ago.
trac_10952-reviewer-docs.patch (2.8 KB) - added by rbeezer 8 years ago.
trac_10952-reviewer-docs-v2.patch (2.8 KB) - added by rbeezer 8 years ago.
trac_10952-reviewer-docs-v3.patch (3.7 KB) - added by rbeezer 8 years ago.
10952-tol-bin.2.patch (5.2 KB) - added by robertwb 8 years ago.
trac_10952-ref.patch (2.1 KB) - added by jhpalmieri 8 years ago.
scripts repo; apply on top of other scripts patch

Download all attachments as: .zip

Change History (54)

Changed 9 years ago by robertwb

comment:1 Changed 9 years ago by robertwb

  • Description modified (diff)
  • Status changed from new to needs_review

comment:2 Changed 9 years ago by jason

Amazing speed getting this up! Shouldn't "((?:abs(?:solute)?)" be "((?:abs(?:olute)?)" ?

comment:3 Changed 9 years ago by jason

Nitpicky: This comment is now incorrect:

# following three: used only for parsing only_optional; list of comments 

Also, something should be added to the docs about this new option. Perhaps here: http://sagemath.org/doc/developer/conventions.html#further-conventions-for-automated-testing-of-examples

Changed 9 years ago by robertwb

apply only this patch to the bin repo

Changed 9 years ago by robertwb

comment:4 Changed 9 years ago by robertwb

Thanks for the comments, new patches attached.

comment:5 Changed 9 years ago by kcrisman

  • Cc kcrisman added

comment:6 Changed 9 years ago by fbissey

Would this works with test of that kind as well:

File "/usr/share/sage/devel/sage-main/sage/stats/hmm/chmm.pyx", line 579:
    sage: m.viterbi([-2,-1,.1,0.3])
Expected:
    ([1, 1, 1, 0], -9.5660236533785135)
Got:
    ([1, 1, 1, 0], -9.566023653378513)

or even

File "/usr/share/sage/devel/sage-main/sage/categories/examples/semigroups.py", line 32:
    sage: S.some_elements()
Expected:
    [3, 42, 'a', 3.3999999999999999, 'raton laveur']
Got:
    [3, 42, 'a', 3.4, 'raton laveur']

(test failures courtesy of my work on python-2.7)

comment:7 Changed 9 years ago by robertwb

Yes, it would (though you do have to explicitly annotate the test with a tolerance, and if Python 2.7 consistently gives, e.g., 3.4 then we should change the doctest rather than make it fuzzy.)

comment:8 Changed 9 years ago by mariah

  • Status changed from needs_review to needs_info

In trying to review this ticket, I applied 10952-tol-bin.patch in the local/bin directory of a skynet/taurus (x86_64-Linux-nehalem) build of sage-4.7.rc2. I next did 'sage -b'. Yet I get

sage: print "The answers are", 1.5, 2, 1e-12 # tol 1e-3
The answers are 1.50000000000000 2 1.00000000000000e-12

Am I missing something?

comment:9 Changed 9 years ago by robertwb

No, that is correct.

The example in the description is an example of something that would pass doctests (even though the true output doesn't match exactly, you're seeing what the actual output is).

comment:10 Changed 9 years ago by jason

Another nitpicky: the doc patch misspells "arithmetic". Definitely not big enough to stop a positive review, but also not big enough to be worth my effort in making a new patch right now.

Changed 9 years ago by robertwb

comment:11 Changed 9 years ago by robertwb

Argh. Fixed.

comment:12 follow-up: Changed 9 years ago by jason

Thanks! If we had a github pull system, or a system where I could edit the patch inline, I would have fixed it.

comment:13 in reply to: ↑ 12 Changed 9 years ago by kcrisman

  • Authors set to Robert Bradshaw
  • Reviewers set to Jason Grout
  • Status changed from needs_info to needs_review

Replying to jason:

Thanks! If we had a github pull system, or a system where I could edit the patch inline, I would have fixed it.

I have to admit that would be convenient. But wouldn't that create a privileged class of commit people who don't need review? At least that seems to be the model for the projects I'm familiar with.

comment:14 Changed 9 years ago by jason

That person would be the release manager. He would be the only one able (or supposed to) merge into the master branch.

On the other hand, given our problems with finding release managers who could do the entire release process, maybe sharing the burden to commit patches to master wouldn't be a bad idea.

comment:15 Changed 9 years ago by mariah

  • Status changed from needs_review to needs_work

I applied 10952-tol-bin.patch in the local/bin directory of a skynet/taurus (x86_64-Linux-nehalem) build of sage-4.7.rc2. I next did 'sage -b'. I then modified a doctest to include the lines in the ticket description

sage: print "The answers are", 1.5, 2, 1e-12 # tol 1e-3
The answers are 1.499999 2.0001 0

I then did 'sage -b' again. When I run the doctest, I get

% ./sage -t  -long -force_lib "devel/sage/sage/symbolic/units.py"
sage -t -long -force_lib "devel/sage/sage/symbolic/units.py"
**********************************************************************
File "/home/mariah/sage/sage-4.7.rc2-x86_64-Linux-nehalem-fc-review-10952/devel/sage/sage/symbolic/units.py", line 13:
    sage: print "The answers are", 1.5, 2, 1e-12 # tol 1e-3
Exception raised:
    Traceback (most recent call last):
      File "/home/mariah/sage/sage-4.7.rc2-x86_64-Linux-nehalem-fc-review-10952/local/bin/ncadoctest.py", line 1231, in run_one_test
        self.run_one_example(test, example, filename, compileflags)
      File "/home/mariah/sage/sage-4.7.rc2-x86_64-Linux-nehalem-fc-review-10952/local/bin/sagedoctest.py", line 38, in run_one_example
        OrigDocTestRunner.run_one_example(self, test, example, filename, compileflags)
      File "/home/mariah/sage/sage-4.7.rc2-x86_64-Linux-nehalem-fc-review-10952/local/bin/ncadoctest.py", line 1172, in run_one_example
        compileflags, 1) in test.globs
      File "<doctest __main__.example_0[6]>", line 1
         res =  print "The answers are", RealNumber('1.5'), Integer(2), RealNumber('1e-12') # tol 1e-3###line 13:
    sage: print "The answers are", 1.5, 2, 1e-12 # tol 1e-3
                    ^
     SyntaxError: invalid syntax
**********************************************************************

Note that if I put the lines

    sage: RDF(pi)                               # abs tol 1e-5
    3.14159

in the doctest, then the doctest passes.

comment:16 follow-up: Changed 9 years ago by robertwb

  • Description modified (diff)
  • Status changed from needs_work to needs_review

True. It doesn't work with print (or other non-expression statements). I've created #11336 to generalize it.

I think that this is still plenty useful even with that limitation, and don't know when I'll have more time to work on it--we could either get this in and start using it now or hold off until has time to write a more complete implementation at some later date (which I'd like to get to someday, but that someday list is long...)

comment:17 in reply to: ↑ 16 Changed 9 years ago by mariah

  • Reviewers changed from Jason Grout to Jason Grout, Mariah Lenox
  • Status changed from needs_review to positive_review

Replying to robertwb:

True. It doesn't work with print (or other non-expression statements). I've created #11336 to generalize it.

I think that this is still plenty useful even with that limitation

Agreed. I am just doing my "reviewer" duty.

Positive review.

comment:18 Changed 9 years ago by jdemeyer

  • Milestone set to sage-4.7.1
  • Priority changed from major to critical
  • Status changed from positive_review to needs_info

Which patches have to be applied?

comment:19 Changed 9 years ago by jdemeyer

I think the documentation should be expanded a bit to show less trivial examples, for example complex numbers or polynomials with non-exact coefficients.

comment:20 Changed 9 years ago by jdemeyer

Also matrices

Changed 8 years ago by rbeezer

comment:21 Changed 8 years ago by rbeezer

  • Description modified (diff)
  • Status changed from needs_info to needs_review

I have added an "Apply" list for this ticket.

My patch fixes one typo in the documentation (a missing ]).

I have added some non-trivial tests.

Jeroen:

  1. abs() for a matrix computes the determinant. Not useful for testing.
  1. Is there a class of polynomials with numerical coefficients? Does it have abs() defined on it?

I am working on this as a high-priority ticket for Bug Days 32.

Rob

comment:22 follow-up: Changed 8 years ago by was

  • Description modified (diff)

This ticket is pretty amazing.

Rob, this doctest you added seems very wrong:

   A relative tolerance on a root of a polynomial.
    
   ::
   
       sage: p = (y - 10^16)*(y-10^(-13))*(y-2); p
       y^3 + (-1e+16)*y^2 + (2e+16)*y - 2000.0
       sage: p.roots()
       sage: p.roots(multiplicities=False)[2]     # relative tol 1e-10
       10^16

In particular,

  • it seems that y isn't defined
  • the output of p.roots() is missing
  • I get an output of 1e16 instead of 10^16.

comment:23 in reply to: ↑ 22 Changed 8 years ago by rbeezer

  • Status changed from needs_review to needs_work

Replying to was:

Rob, this doctest you added seems very wrong:

Yes, I think this was the patch missing a refresh. I'll clean it in the AM.

Changed 8 years ago by rbeezer

comment:24 Changed 8 years ago by rbeezer

  • Description modified (diff)
  • Status changed from needs_work to needs_review

Reviewer documentation v2 patch cleans up the mess.

I used 10^16 intentionally. The tolerance feature does arithmetic with the output, so does not have to identically match as text. Maybe that deserves a comment?

Rob

Changed 8 years ago by rbeezer

comment:25 Changed 8 years ago by rbeezer

  • Description modified (diff)

Reviewer documentation v3 patch adds a comment about intended output being of any form, fixes some missing quotes and expands a bit more on how this works. Passes tests, so should be ready for review.

comment:26 Changed 8 years ago by was

  • Status changed from needs_review to positive_review

Nice. Game changer!!!! Sweet. It's about time. Great work! W00t. FTW. Frickin' awesome.

comment:27 Changed 8 years ago by kcrisman

  • Authors changed from Robert Bradshaw to Robert Bradshaw, Rob Beezer
  • Reviewers changed from Jason Grout, Mariah Lenox to Jason Grout, Mariah Lenox, William Stein

comment:28 Changed 8 years ago by was

  • Keywords sd32 added

comment:29 Changed 8 years ago by leif

  • Description modified (diff)

Shouldn't the code only be included if the doctests actually contain # tol ...?

( +[0-9.e+-]+)? hardly matches what we want btw.


I've put this onto the 1337 ticket (#11337).

comment:30 follow-up: Changed 8 years ago by jhpalmieri

  • Reviewers changed from Jason Grout, Mariah Lenox, William Stein to Jason Grout, Mariah Lenox, William Stein, John Palmieri
  • Status changed from positive_review to needs_work

This doesn't work. Re Leif's comment, I think the regexp should something like ' ((\.[0-9]+|[0-9]+(\.[0-9]*)?)e[+-]?[0-9]+)', which adds another group to the match, so we need to make this change:

  • sage-doctest

    diff --git a/sage-doctest b/sage-doctest
    a b if __name__ == '__main__': 
    217217""" % dict
    218218
    219219NONE=0; LONG_TIME=1; RANDOM=2; OPTIONAL=3; NOT_IMPLEMENTED=4; NOT_TESTED=5; TOLERANCE=6
    220 tolerance_pattern = re.compile(r'\b((?:abs(?:olute)?)|(?:rel(?:ative)?))? *?tol(?:erance)?\b( +[0-9.e+-]+)?')
     220tolerance_pattern = re.compile(r'\b((?:abs(?:olute)?)|(?:rel(?:ative)?))? *?tol(?:erance)?\b +((\.[0-9]+|[0-9]+(\.[0-9]*)?)e[+-]?[0-9]+)?')
    221221
    222222def comment_modifier(s):
    223223    sind = s.find('#')
    def comment_modifier(s): 
    239239    m = tolerance_pattern.search(L)
    240240    if m:
    241241        v.append(TOLERANCE)
    242         rel_or_abs, epsilon = m.groups()
     242        rel_or_abs, epsilon, _, _ = m.groups()
    243243        if rel_or_abs is not None:
    244244            rel_or_abs = rel_or_abs[:3]
    245245        if epsilon is None:

I think this should match any of "1.2e12", "1e-12", ".1e12", without matching "ee12e+-122", like the previous regexp.

In addition to this, the whole thing just flat doesn't work. I created a Python file with some tolerance tests in it, and they all fail, and the failures print badly: they include tracebacks, for example. I'm not sure why the rst file patched in the two "doc" patches isn't being doctested correctly, but as far as I can tell, no doctests are being run in that file -- just add an obviously wrong doctest, and tests still pass. Here is a test file:

"""
Here are some examples::

    sage: 3+3
    6
    sage: RDF(pi)                               # abs tol 1e-4
    3.14159

Here is another example::

    sage: [10^n for n in [0.0 .. 4]]            # rel tol 2e-4
    [0.9999, 10.001, 100.01, 999.9, 10001]

And another::

    sage: RDF(pi)                               # abs tol 1e-4
    3.14159

And that's the end.
"""

Save this as "new.py" and run "sage -t new.py" to see some bad behavior.

comment:31 in reply to: ↑ 30 Changed 8 years ago by leif

Replying to jhpalmieri:

I think the regexp should something like

' ((\.[0-9]+|[0-9]+(\.[0-9]*)?)e[+-]?[0-9]+)'`

But that doesn't match 1ee7 :(

You can of course match funny things first, i.e. use more general patterns, but you then have to check the matched expression further before passing it to float().

The whole idea isn't that bad, but carries the danger that people simply increase the tolerance (too much) just to make doctests pass somehow, without caring where the variations originate from.

comment:32 Changed 8 years ago by jhpalmieri

But float(1ee7) raises an error. Why would we want to match that?

Re the issue of raising tolerance, I hope that referees will keep an eye on that sort of thing. Dave Kirkby, for example, is a strong advocate of not arbitrarily raising tolerance.

Changed 8 years ago by robertwb

comment:33 Changed 8 years ago by robertwb

  • Description modified (diff)

I've fixed the issue with blanklines terminating example blocks. As for the regular expression for floats, I don't see any need to make it more complicated--I'd rather match miss-typed numbers and raise an error than silently ignore them. If you feel strongly, this could be changed.

Referees should keep an eye on precision. This is back to my philosophy that a computer should run doctests and a human read the code (though we could add a patchbot plugin to flag drops in precision). It's not a new issue--we've been using "..." for quite a while now which is less flexible and more dangerous (as it can match more than just a single number).

comment:34 Changed 8 years ago by leif

  • Status changed from needs_work to needs_review

comment:35 Changed 8 years ago by leif

How about emitting the extra code only if # tol ... is used at all in the file?

That's a simple """... %s ...""" % (tolerance_code if uses_tol else "").

comment:36 follow-up: Changed 8 years ago by robertwb

You're talking about the check_with_tolerance function and helpers. One could add a variable to keep track of this and conditionally emit it; not sure if the added complexity is worth the tiny savings.

comment:37 in reply to: ↑ 36 Changed 8 years ago by leif

Replying to robertwb:

You're talking about the check_with_tolerance function and helpers. One could add a variable to keep track of this and conditionally emit it; not sure if the added complexity is worth the tiny savings.

Added complexity? One boolean variable, initialized to false and set to True in only a few places?

This not only reduces the file size and run (import!) time but also avoids more potential name clashes.

Why should one add useless code to each and every file to doctest not using # tol?

comment:38 Changed 8 years ago by robertwb

Either we make this variable a global (ugly) or return it on from call to doc_preparse along with the parsed docstring (also ugly). All to save a fraction of a block (~1KB) of an ephemeral file and < 1ms on an already fairly expensive operation. If we're looking to cut fat, it'd be better to avoid the quadratic-time creation of the doctest file string.

The name clash is a red herring--if we avoid it only because # tol is not used, that's a potentially latent bug in my mind. The function could be renamed if need be.

comment:39 follow-up: Changed 8 years ago by jhpalmieri

  • Description modified (diff)

I don't see anything wrong with more complicated regular expressions, but I like regular expressions. So leave it as is if you want.

Meanwhile, the output for failures using tolerances isn't nice, compared to other sorts of failures:

sage -t  "builds/sage-4.7.2.alpha2/devel/sage-new/sage/homology/new.py"
**********************************************************************
File "/Applications/sage_builds/sage-4.7.2.alpha2/devel/sage-new/sage/homology/new.py", line 4:
    sage: 3+3
Expected:
    8
Got:
    6
**********************************************************************
File "/Applications/sage_builds/sage-4.7.2.alpha2/devel/sage-new/sage/homology/new.py", iled example:
    check_with_tolerance('''
        3.14159
    ''', res, 9.9999999999999998e-13, 'abs')
Exception raised:
    Traceback (most recent call last):
      File "/Applications/sage/local/bin/ncadoctest.py", line 1231, in run_one_test
        self.run_one_example(test, example, filename, compileflags)
      File "/Applications/sage/local/bin/sagedoctest.py", line 38, in run_one_example
        OrigDocTestRunner.run_one_example(self, test, example, filename, compileflags)
      File "/Applications/sage/local/bin/ncadoctest.py", line 1172, in run_one_example
        compileflags, 1) in test.globs
      File "<doctest __main__.example_0[11]>", line 3, in <module>
        ''', res, 9.9999999999999998e-13, 'abs')
      File "/Users/palmieri/.sage//tmp/new.py", line 35, in check_with_tolerance
        assert abs(expected_value - actual_value) < epsilon, "Out of tolerance %s vs %s" % (expected_value, actual_value)
    AssertionError: Out of tolerance 3.14159 vs 3.14159265359
**********************************************************************
1 items had failures:
   2 of  13 in __main__.example_0
***Test Failed*** 2 failures.
For whitespace errors, see the file /Users/palmieri/.sage//tmp/.doctest_new.py
	 [3.9 s]

Notice that the line number is missing and the traceback is present. I'm attaching a referee patch which tries to fix this. Please take a look. For me, this gives this sort of output (with a slightly different file):

sage -t  "builds/sage-4.7.2.alpha2/devel/sage-new/new.py"   
**********************************************************************
File "/Applications/sage_builds/sage-4.7.2.alpha2/devel/sage-new/new.py", line 4:
    sage: 3+3
Expected:
    8
Got:
    6
**********************************************************************
File "/Applications/sage_builds/sage-4.7.2.alpha2/devel/sage-new/new.py", line 6:
    sage: RDF(pi)                               # abs tol 1e-6
Out of tolerance 3.14159 vs 3.14159265359
**********************************************************************
File "/Applications/sage_builds/sage-4.7.2.alpha2/devel/sage-new/new.py", line 16:
    sage: RDF(pi)                               # abs tol 1e-8
Out of tolerance 3.14159 vs 3.14159265359
**********************************************************************
1 items had failures:
   3 of  16 in __main__.example_0
***Test Failed*** 3 failures.
For whitespace errors, see the file /Users/palmieri/.sage//tmp/.doctest_new.py
	 [4.2 s]

Since the whole ticket had a positive review earlier, I think if you're happy with my changes, you can switch it back to that status.

Finally, the issue with doctests in conventions.rst is, I think, that the various triple quotes """ confuse the parsing routine (the function pythonify_rst in particular). This belongs on another ticket. That file contains enough incomplete code stubs that I wonder if it shouldn't be doctested at all...

Changed 8 years ago by jhpalmieri

scripts repo; apply on top of other scripts patch

comment:40 in reply to: ↑ 39 ; follow-up: Changed 8 years ago by leif

Replying to jhpalmieri:

I don't see anything wrong with more complicated regular expressions, but I like regular expressions. So leave it as is if you want.

Yes, only

sage: foo   # tol 1ee7
anything

should always pass. ;-)


Meanwhile, the output for failures using tolerances isn't nice, compared to other sorts of failures [...]
Notice that the line number is missing and the traceback is present. I'm attaching a referee patch which tries to fix this. Please take a look.

Haven't tested your patch (nor looked at it), but the output you gave looks much better.


Since the whole ticket had a positive review earlier, I think if you're happy with my changes, you can switch it back to that status.

I won't object, though I in general don't like the idea of adding code (regardless how small the overhead or impact might be) unconditionally, i.e., if there's no need to do so.

We have similar situations all around, where everybody adds "just a little", IMHO.

comment:41 in reply to: ↑ 40 Changed 8 years ago by robertwb

  • Status changed from needs_review to positive_review

Replying to leif:

Replying to jhpalmieri:

I don't see anything wrong with more complicated regular expressions, but I like regular expressions. So leave it as is if you want.

Yes, only

sage: foo   # tol 1ee7
anything

should always pass. ;-)

Currently, it raises an error indicating that the *doctest* itself is bad.

Meanwhile, the output for failures using tolerances isn't nice, compared to other sorts of failures [...]
Notice that the line number is missing and the traceback is present. I'm attaching a referee patch which tries to fix this. Please take a look.

Haven't tested your patch (nor looked at it), but the output you gave looks much better.


Since the whole ticket had a positive review earlier, I think if you're happy with my changes, you can switch it back to that status.

I won't object, though I in general don't like the idea of adding code (regardless how small the overhead or impact might be) unconditionally, i.e., if there's no need to do so.

We have similar situations all around, where everybody adds "just a little", IMHO.

Yes, it's a tradeoff between adding "just a little" python parsing overhead to the testing infrastructure or "just a little" human parsing overhead to the testing infrastructure. I'd rather put the burden on machines than developers in this case.

The reviewer patches look good, certainly an improvement, so I'm setting this back to positive review unless there's something else that needs to be taken care of.

comment:42 Changed 8 years ago by leif

  • Description modified (diff)
  • Keywords noise noisy doctest failure error tolerance added

comment:43 Changed 8 years ago by leif

  • Merged in set to sage-4.7.2.alpha3
  • Resolution set to fixed
  • Status changed from positive_review to closed

comment:44 Changed 8 years ago by kcrisman

I'd like to draw the attention of folks here to comment:7:ticket:12493. Apparently one can't do only-optional tests along with tol tests at the same time. My guess is that not too many people use only-optional and the tol stuff is pretty new.

In fact, I only found one occurrence in 5.0.beta3. Can that be right? This has been in Sage for months!

sage: search_src(" tol ","#")
symbolic/integration/integral.py:587:        sage: error.numerical_approx() # abs tol 10e-10

Anyway, even if my analysis is wrong (let's hope it's easier than that), I figure the people here can give a quick diagnosis of #12493.

comment:45 Changed 8 years ago by kcrisman

Actually, comment:8:ticket:12493 is even better! Optional and tol don't play well together.

Note: See TracTickets for help on using tickets.