Opened 3 years ago
Closed 3 years ago
#24772 closed enhancement (fixed)
Encoding fixes to the doctest module
Reported by: | jdemeyer | Owned by: | |
---|---|---|---|
Priority: | major | Milestone: | sage-8.2 |
Component: | doctest framework | Keywords: | |
Cc: | embray | Merged in: | |
Authors: | Erik Bray | Reviewers: | Jeroen Demeyer |
Report Upstream: | N/A | Work issues: | |
Branch: | bf6cf97 (Commits) | Commit: | bf6cf97ada80f3d70a07cd416a53acd2453b6d88 |
Dependencies: | #24771 | Stopgaps: |
Description
Change History (12)
comment:1 Changed 3 years ago by
- Branch set to u/jdemeyer/ticket/24772
comment:2 Changed 3 years ago by
- Commit set to dcff893007aff8e5bde44025801eb358acdefdad
- Status changed from new to needs_review
comment:3 follow-up: ↓ 6 Changed 3 years ago by
- Status changed from needs_review to needs_work
About
+ if not isinstance(got, six.text_type):
+ # On Python 3 got should already be unicode text, but on Python
+ # 2 it is not. For comparison's sake we want the unicode text
+ # decoded from UTF-8. If there was some error such that the
+ # output is so malformed that it does not even decode from
+ # UTF-8 at all there will be an error in the test framework
+ # here. But this shouldn't happen at all, so we want it to be
+ # understood as an error in the test framework, and no some
+ # subtle error in the code under test.
+ got = got.decode(locale.getpreferredencoding())
In the comment you are talking about UTF-8 but you actually decode using the locale. Wouldn't UTF-8 make the most sense here?
Also a typo: "no some" -> "not some"
comment:4 Changed 3 years ago by
- Commit changed from dcff893007aff8e5bde44025801eb358acdefdad to 34876abb4119f6b5382797d065be23b830fd7762
comment:5 Changed 3 years ago by
I fixed your typo and reorganized the branch but there still remains the issue of locale vs utf-8.
comment:6 in reply to: ↑ 3 Changed 3 years ago by
Replying to jdemeyer:
About
+ if not isinstance(got, six.text_type): + # On Python 3 got should already be unicode text, but on Python + # 2 it is not. For comparison's sake we want the unicode text + # decoded from UTF-8. If there was some error such that the + # output is so malformed that it does not even decode from + # UTF-8 at all there will be an error in the test framework + # here. But this shouldn't happen at all, so we want it to be + # understood as an error in the test framework, and no some + # subtle error in the code under test. + got = got.decode(locale.getpreferredencoding())
In the comment you are talking about UTF-8 but you actually decode using the locale. Wouldn't UTF-8 make the most sense here?
I think the use of "UTF-8" in the comment is an artifact--I'm using it as a placeholder for "whatever the correct encoding is". But actually I need to investigate that a bit more carefully. I think what this really ought to be is whatever encoding Python is using for sys.stdout
but it may be more complicated than that.
comment:7 Changed 3 years ago by
It seems some of the patchbots in fact, however they happen to be configured, are returning 'ascii'
for locale.getpreferredencoding()
. I'll just make it UTF-8 for now and we can revisit locale-specific encodings if it comes up.
comment:8 follow-up: ↓ 9 Changed 3 years ago by
- Branch changed from u/jdemeyer/ticket/24772 to u/embray/python3/ticket-24772
- Commit changed from 34876abb4119f6b5382797d065be23b830fd7762 to bf6cf97ada80f3d70a07cd416a53acd2453b6d88
- Status changed from needs_work to positive_review
I just based this off your branch and incorporated the change to UTF-8
. I think this should be good enough for now; it's not worth overthinking until there's a concrete reason to. Tests look good with this change.
Setting "positive review" assuming you agree.
New commits:
bf6cf97 | Correct some encoding issues on Python 2.
|
comment:9 in reply to: ↑ 8 Changed 3 years ago by
Replying to embray:
Setting "positive review" assuming you agree.
That was my only complaint, so yes I agree.
comment:10 Changed 3 years ago by
- Status changed from positive_review to needs_work
Reviewer name...
comment:11 Changed 3 years ago by
- Reviewers set to Jeroen Demeyer
- Status changed from needs_work to positive_review
comment:12 Changed 3 years ago by
- Branch changed from u/embray/python3/ticket-24772 to bf6cf97ada80f3d70a07cd416a53acd2453b6d88
- Resolution set to fixed
- Status changed from positive_review to closed
New commits:
Miscellaneous fixes to tests in the doctest module.
A few other miscellaneous fixes to the doctests tests
This exception reads 'utf-8' on Python 3 and 'utf8' on Python 2
Fix various minor encoding issues
Correct some encoding issues on Python 2.