#24772 closed enhancement (fixed)

Encoding fixes to the doctest module

Reported by: jdemeyer Owned by:
Priority: major Milestone: sage-8.2
Component: doctest framework Keywords:
Cc: embray Merged in:
Authors: Erik Bray Reviewers: Jeroen Demeyer
Report Upstream: N/A Work issues:
Branch: bf6cf97 (Commits) Commit: bf6cf97ada80f3d70a07cd416a53acd2453b6d88
Dependencies: #24771 Stopgaps:

Description


Change History (12)

comment:1 Changed 13 months ago by jdemeyer

  • Branch set to u/jdemeyer/ticket/24772

comment:2 Changed 13 months ago by jdemeyer

  • Commit set to dcff893007aff8e5bde44025801eb358acdefdad
  • Status changed from new to needs_review

New commits:

3eb2db6Miscellaneous fixes to tests in the doctest module.
51e20d9A few other miscellaneous fixes to the doctests tests
878a62bThis exception reads 'utf-8' on Python 3 and 'utf8' on Python 2
1ba0e67Fix various minor encoding issues
dcff893Correct some encoding issues on Python 2.

comment:3 follow-up: Changed 13 months ago by jdemeyer

  • Status changed from needs_review to needs_work

About

+            if not isinstance(got, six.text_type):
+                # On Python 3 got should already be unicode text, but on Python
+                # 2 it is not.  For comparison's sake we want the unicode text
+                # decoded from UTF-8. If there was some error such that the
+                # output is so malformed that it does not even decode from
+                # UTF-8 at all there will be an error in the test framework
+                # here. But this shouldn't happen at all, so we want it to be
+                # understood as an error in the test framework, and no some
+                # subtle error in the code under test.
+                got = got.decode(locale.getpreferredencoding())

In the comment you are talking about UTF-8 but you actually decode using the locale. Wouldn't UTF-8 make the most sense here?

Also a typo: "no some" -> "not some"

comment:4 Changed 13 months ago by git

  • Commit changed from dcff893007aff8e5bde44025801eb358acdefdad to 34876abb4119f6b5382797d065be23b830fd7762

Branch pushed to git repo; I updated commit sha1. This was a forced push. New commits:

12ff3beFix various minor encoding issues
34876abCorrect some encoding issues on Python 2.

comment:5 Changed 13 months ago by jdemeyer

I fixed your typo and reorganized the branch but there still remains the issue of locale vs utf-8.

comment:6 in reply to: ↑ 3 Changed 13 months ago by embray

Replying to jdemeyer:

About

+            if not isinstance(got, six.text_type):
+                # On Python 3 got should already be unicode text, but on Python
+                # 2 it is not.  For comparison's sake we want the unicode text
+                # decoded from UTF-8. If there was some error such that the
+                # output is so malformed that it does not even decode from
+                # UTF-8 at all there will be an error in the test framework
+                # here. But this shouldn't happen at all, so we want it to be
+                # understood as an error in the test framework, and no some
+                # subtle error in the code under test.
+                got = got.decode(locale.getpreferredencoding())

In the comment you are talking about UTF-8 but you actually decode using the locale. Wouldn't UTF-8 make the most sense here?

I think the use of "UTF-8" in the comment is an artifact--I'm using it as a placeholder for "whatever the correct encoding is". But actually I need to investigate that a bit more carefully. I think what this really ought to be is whatever encoding Python is using for sys.stdout but it may be more complicated than that.

comment:7 Changed 13 months ago by embray

It seems some of the patchbots in fact, however they happen to be configured, are returning 'ascii' for locale.getpreferredencoding(). I'll just make it UTF-8 for now and we can revisit locale-specific encodings if it comes up.

comment:8 follow-up: Changed 13 months ago by embray

  • Branch changed from u/jdemeyer/ticket/24772 to u/embray/python3/ticket-24772
  • Commit changed from 34876abb4119f6b5382797d065be23b830fd7762 to bf6cf97ada80f3d70a07cd416a53acd2453b6d88
  • Status changed from needs_work to positive_review

I just based this off your branch and incorporated the change to UTF-8. I think this should be good enough for now; it's not worth overthinking until there's a concrete reason to. Tests look good with this change.

Setting "positive review" assuming you agree.


New commits:

bf6cf97Correct some encoding issues on Python 2.

comment:9 in reply to: ↑ 8 Changed 13 months ago by jdemeyer

Replying to embray:

Setting "positive review" assuming you agree.

That was my only complaint, so yes I agree.

comment:10 Changed 13 months ago by vbraun

  • Status changed from positive_review to needs_work

Reviewer name...

comment:11 Changed 13 months ago by chapoton

  • Reviewers set to Jeroen Demeyer
  • Status changed from needs_work to positive_review

comment:12 Changed 13 months ago by vbraun

  • Branch changed from u/embray/python3/ticket-24772 to bf6cf97ada80f3d70a07cd416a53acd2453b6d88
  • Resolution set to fixed
  • Status changed from positive_review to closed
Note: See TracTickets for help on using tickets.