Opened 2 years ago

Closed 19 months ago

#24059 closed enhancement (fixed)

py3 : add some decode in jmoldata and tachyon interfaces

Reported by: chapoton Owned by:
Priority: major Milestone: sage-8.2
Component: python3 Keywords: unicode
Cc: embray, jdemeyer Merged in:
Authors: Frédéric Chapoton Reviewers: Erik Bray
Report Upstream: N/A Work issues:
Branch: f13924e (Commits) Commit: f13924eea90ea5dc0d247cace07ee21353bdabca
Dependencies: Stopgaps:

Description

because subprocess output is bytes

Change History (20)

comment:1 Changed 2 years ago by chapoton

  • Branch set to u/chapoton/24059
  • Commit set to b769da831df61457be91a8f01ccb996866b8bb0a
  • Status changed from new to needs_review

New commits:

b769da8adding some decode to subprocess in interfaces

comment:2 Changed 2 years ago by chapoton

  • Cc embray added

comment:3 Changed 2 years ago by embray

This still isn't how we want to go about doing this, as discussed in other tickets...

The problem with littering the sources with .decode('utf-8') are many:

  • (minor) It's just kind of ugly, not very easy to grep for, and not always clear what the intent is
  • On Python 2 it means you're returning unicode objects in code that previous just used str. In my experience, when porting Py2 code to work on both Py2 and Py3 code it's best that objects that were already str remain as str. That way everything works the same as it did before in Python 2. And in Python 3 str is almost always preferable as well. There are exceptions to this where on Python 3 you want a bytes object, but in those cases on Python 2 it still remains as str. You never want a unicode object in Python 2 except in those cases where unicode text was previously expected and supported.
  • In some cases just assuming 'utf-8' is unsafe. For example when decoding text that contains filesystem paths use sys.getfilesystemencoding() (this is also never 100% guaranteed to work but it covers the most common cases correctly across platforms).

I suggest adding a couple utility functions that look something like this:

import six

DEFAULT_ENCODING = 'utf-8'

if six.PY2:
    def _to_str(b, encoding):
        return b
else:
    def _to_str(b, encoding):
        return b.decode(encoding, errors='surrogateescape')


def to_str(b, encoding=DEFAULT_ENCODING):
    r"""
    Takes a `bytes` object and returns a `str`.

    On Python 2 this is a no-op since `bytes is str`.  On Python 3 this decodes
    the given bytes with the specified encoding and returns a `str`.

    Examples:

        >>> to_str(b'abc')
        'abc'
        >>> to_str(b'\xe2\x98\x83')
        '☃'

    """

    return _to_str(b, encoding)


if six.PY2:
    def _to_bytes(s, encoding):
        return s
else:
    def _to_bytes(s, encoding):
        return s.encode(encoding, errors='surrogateescape')


def to_bytes(s, encoding=DEFAULT_ENCODING):
    """
    Takes a `str` object and returns a `bytes`.  Basically the opposite of
    `to_str`.

    As with `to_str`, on Python 2 this is a no-op since `str` is `bytes`,
    whereas on Python 3 the given string is encoding with the specified
    encoding and a `bytes` is returned.

    Examples:

        >>> to_bytes('abc')
        b'abc'
        >>> to_bytes('☃')
        b'\xe2\x98\x83'
    """

    return _to_bytes(s, encoding)

Then instead of calling .decode(...) us to_str(...). This is nice because everything remains as was on Python 2, and on Python 3 you'll get a str where desired. It's unfortunate to have to specify an encoding (but it also makes sense). This also has baked into it errors='surrogateescape' which basically means that even if we did guess the encoding wrong we at least preserve the correct byte sequence when round-tripping back to bytes.

Last edited 2 years ago by embray (previous) (diff)

comment:4 Changed 2 years ago by embray

I should note: The doctests in the above sample code will actually fail in Python 2. This is case where maybe the doctests and examples should be written in the version-specific implementations. This code is just an example...

comment:5 Changed 2 years ago by chapoton

  • Keywords unicode added

comment:6 Changed 21 months ago by chapoton

is this ticket replaced by another one, Erik ?

comment:7 Changed 21 months ago by jdemeyer

  • Status changed from needs_review to needs_info

comment:8 Changed 21 months ago by embray

Not completely, but this can probably be closed. It seems to have accidentally completely removed the tachyon interface. I will open tickets for my fixes to the other interfaces. I don't think I've fixed the jmoldata interface, since I don't have java installed so the tests for it don't work anyways.

comment:9 Changed 20 months ago by chapoton

  • Milestone changed from sage-8.1 to sage-8.2

comment:10 Changed 20 months ago by git

  • Commit changed from b769da831df61457be91a8f01ccb996866b8bb0a to a29fb5826049e271ddaed1f1c6de0e52a4cd4192

Branch pushed to git repo; I updated commit sha1. This was a forced push. New commits:

a29fb58adding some decode to subprocess in interfaces

comment:11 Changed 20 months ago by chapoton

  • Status changed from needs_info to needs_review
  • Summary changed from py3 : add some decode in interfaces to py3 : add some decode in jmoldate and tachyon interfaces

New commits:

a29fb58adding some decode to subprocess in interfaces

comment:12 Changed 20 months ago by chapoton

  • Summary changed from py3 : add some decode in jmoldate and tachyon interfaces to py3 : add some decode in jmoldata and tachyon interfaces

comment:13 Changed 20 months ago by git

  • Commit changed from a29fb5826049e271ddaed1f1c6de0e52a4cd4192 to f13924eea90ea5dc0d247cace07ee21353bdabca

Branch pushed to git repo; I updated commit sha1. New commits:

f13924emore bytes_to_str in jmoldata

comment:14 Changed 20 months ago by chapoton

  • Cc jdemeyer added

ready for review, I think

comment:15 Changed 19 months ago by chapoton

ping ?

comment:16 Changed 19 months ago by embray

Fine by me.

(P.S. the pings really aren't necessary--you set this to needs review just 28 hours ago--we all have lots to work on and sometimes it takes >24 hours to get through the existing backlog--if it's in my e-mail I'll get to it (one day (maybe ;-)))

comment:17 Changed 19 months ago by embray

  • Reviewers set to Erik Bray
  • Status changed from needs_review to positive_review

comment:18 follow-up: Changed 19 months ago by chapoton

Sorry for the ping. I am sure you are very busy with other matters. All my apologies for being so impatient to see progress on python3.

comment:19 in reply to: ↑ 18 Changed 19 months ago by embray

Replying to chapoton:

All my apologies for being so impatient to see progress on python3.

That's nothing to apologize for! I'm excited to get this done too.

comment:20 Changed 19 months ago by vbraun

  • Branch changed from u/chapoton/24059 to f13924eea90ea5dc0d247cace07ee21353bdabca
  • Resolution set to fixed
  • Status changed from positive_review to closed
Note: See TracTickets for help on using tickets.