Opened 5 years ago
Closed 5 years ago
#24059 closed enhancement (fixed)
py3 : add some decode in jmoldata and tachyon interfaces
Reported by: | Frédéric Chapoton | Owned by: | |
---|---|---|---|
Priority: | major | Milestone: | sage-8.2 |
Component: | python3 | Keywords: | unicode |
Cc: | Erik Bray, Jeroen Demeyer | Merged in: | |
Authors: | Frédéric Chapoton | Reviewers: | Erik Bray |
Report Upstream: | N/A | Work issues: | |
Branch: | f13924e (Commits, GitHub, GitLab) | Commit: | f13924eea90ea5dc0d247cace07ee21353bdabca |
Dependencies: | Stopgaps: |
Description
because subprocess output is bytes
Change History (20)
comment:1 Changed 5 years ago by
Branch: | → u/chapoton/24059 |
---|---|
Commit: | → b769da831df61457be91a8f01ccb996866b8bb0a |
Status: | new → needs_review |
comment:2 Changed 5 years ago by
Cc: | Erik Bray added |
---|
comment:3 Changed 5 years ago by
This still isn't how we want to go about doing this, as discussed in other tickets...
The problem with littering the sources with .decode('utf-8')
are many:
- (minor) It's just kind of ugly, not very easy to grep for, and not always clear what the intent is
- On Python 2 it means you're returning
unicode
objects in code that previous just usedstr
. In my experience, when porting Py2 code to work on both Py2 and Py3 code it's best that objects that were alreadystr
remain asstr
. That way everything works the same as it did before in Python 2. And in Python 3str
is almost always preferable as well. There are exceptions to this where on Python 3 you want abytes
object, but in those cases on Python 2 it still remains asstr
. You never want aunicode
object in Python 2 except in those cases where unicode text was previously expected and supported. - In some cases just assuming 'utf-8' is unsafe. For example when decoding text that contains filesystem paths use
sys.getfilesystemencoding()
(this is also never 100% guaranteed to work but it covers the most common cases correctly across platforms).
I suggest adding a couple utility functions that look something like this:
import six DEFAULT_ENCODING = 'utf-8' if six.PY2: def _to_str(b, encoding): return b else: def _to_str(b, encoding): return b.decode(encoding, errors='surrogateescape') def to_str(b, encoding=DEFAULT_ENCODING): r""" Takes a `bytes` object and returns a `str`. On Python 2 this is a no-op since `bytes is str`. On Python 3 this decodes the given bytes with the specified encoding and returns a `str`. Examples: >>> to_str(b'abc') 'abc' >>> to_str(b'\xe2\x98\x83') '☃' """ return _to_str(b, encoding) if six.PY2: def _to_bytes(s, encoding): return s else: def _to_bytes(s, encoding): return s.encode(encoding, errors='surrogateescape') def to_bytes(s, encoding=DEFAULT_ENCODING): """ Takes a `str` object and returns a `bytes`. Basically the opposite of `to_str`. As with `to_str`, on Python 2 this is a no-op since `str` is `bytes`, whereas on Python 3 the given string is encoding with the specified encoding and a `bytes` is returned. Examples: >>> to_bytes('abc') b'abc' >>> to_bytes('☃') b'\xe2\x98\x83' """ return _to_bytes(s, encoding)
Then instead of calling .decode(...)
us to_str(...)
. This is nice because everything remains as was on Python 2, and on Python 3 you'll get a str
where desired. It's unfortunate to have to specify an encoding (but it also makes sense). This also has baked into it errors='surrogateescape'
which basically means that even if we did guess the encoding wrong we at least preserve the correct byte sequence when round-tripping back to bytes
.
comment:4 Changed 5 years ago by
I should note: The doctests in the above sample code will actually fail in Python 2. This is case where maybe the doctests and examples should be written in the version-specific implementations. This code is just an example...
comment:5 Changed 5 years ago by
Keywords: | unicode added |
---|
comment:7 Changed 5 years ago by
Status: | needs_review → needs_info |
---|
comment:8 Changed 5 years ago by
Not completely, but this can probably be closed. It seems to have accidentally completely removed the tachyon interface. I will open tickets for my fixes to the other interfaces. I don't think I've fixed the jmoldata interface, since I don't have java installed so the tests for it don't work anyways.
comment:9 Changed 5 years ago by
Milestone: | sage-8.1 → sage-8.2 |
---|
comment:10 Changed 5 years ago by
Commit: | b769da831df61457be91a8f01ccb996866b8bb0a → a29fb5826049e271ddaed1f1c6de0e52a4cd4192 |
---|
Branch pushed to git repo; I updated commit sha1. This was a forced push. New commits:
a29fb58 | adding some decode to subprocess in interfaces
|
comment:11 Changed 5 years ago by
Status: | needs_info → needs_review |
---|---|
Summary: | py3 : add some decode in interfaces → py3 : add some decode in jmoldate and tachyon interfaces |
New commits:
a29fb58 | adding some decode to subprocess in interfaces
|
comment:12 Changed 5 years ago by
Summary: | py3 : add some decode in jmoldate and tachyon interfaces → py3 : add some decode in jmoldata and tachyon interfaces |
---|
comment:13 Changed 5 years ago by
Commit: | a29fb5826049e271ddaed1f1c6de0e52a4cd4192 → f13924eea90ea5dc0d247cace07ee21353bdabca |
---|
Branch pushed to git repo; I updated commit sha1. New commits:
f13924e | more bytes_to_str in jmoldata
|
comment:16 Changed 5 years ago by
Fine by me.
(P.S. the pings really aren't necessary--you set this to needs review just 28 hours ago--we all have lots to work on and sometimes it takes >24 hours to get through the existing backlog--if it's in my e-mail I'll get to it (one day (maybe ;-)))
comment:17 Changed 5 years ago by
Reviewers: | → Erik Bray |
---|---|
Status: | needs_review → positive_review |
comment:18 follow-up: 19 Changed 5 years ago by
Sorry for the ping. I am sure you are very busy with other matters. All my apologies for being so impatient to see progress on python3.
comment:19 Changed 5 years ago by
Replying to chapoton:
All my apologies for being so impatient to see progress on python3.
That's nothing to apologize for! I'm excited to get this done too.
comment:20 Changed 5 years ago by
Branch: | u/chapoton/24059 → f13924eea90ea5dc0d247cace07ee21353bdabca |
---|---|
Resolution: | → fixed |
Status: | positive_review → closed |
New commits:
adding some decode to subprocess in interfaces