Opened 5 years ago

Closed 5 years ago

#24223 closed defect (fixed)

py3: several string conversion fixes

Reported by: embray Owned by:
Priority: major Milestone: sage-8.2
Component: python3 Keywords:
Cc: chapoton, jdemeyer Merged in:
Authors: Erik Bray Reviewers: Jeroen Demeyer
Report Upstream: N/A Work issues:
Branch: e9a582e (Commits, GitHub, GitLab) Commit: e9a582e0a3a5fe777ea34475200f18288fc3db2c
Dependencies: Stopgaps:

GitHub link to the corresponding issue

Description

This fixes several (hardly exhaustive) string conversions around Sage such that functions and methods that returned str on Python 2 also return str on Python 3 (and the same for functions and methods that take str as arguments) using the new string utilities from #24222.

This also incorporates/replaces the fixes from #23812.

Change History (41)

comment:1 Changed 5 years ago by embray

With this, plus a few other small fixes (including fixes from other existing tickets merged in) I was able to get the Sage REPL up and 1 + 1 working.

comment:2 Changed 5 years ago by git

Commit: 44a55407569e8919b7598ca367a05469732e78bf25d6c32154c88dc4e9f1520d105b586b61c43cdc

Branch pushed to git repo; I updated commit sha1. This was a forced push. New commits:

34ef6dcAdd HAVE_GMPY2 compile-time constant
158d984Force absolute import in have_module()
7f06e71Fix documentation
3ac146cAdd a Cython compile-time constant for Python major version.
f71fd41Add new string conversion utilities.
a5d5219Add support for optional error handling.
627d4c7Add new string conversion utilities.
25d6c32Numerous string conversions from bytes to str and from str to bytes for Python 3

comment:3 Changed 5 years ago by embray

Status: newneeds_review

Rebased on current version of #24222.

comment:4 Changed 5 years ago by jdemeyer

Status: needs_reviewneeds_work

Something is wrong: there are still the files src/sage/misc/string.*

comment:5 Changed 5 years ago by jdemeyer

This is again a ticket with many small independent changes where a "partial positive review" like what I proposed on #24025 would make sense. I didn't like how we handled that, so how should we proceed here?

comment:6 in reply to:  4 Changed 5 years ago by embray

Replying to jdemeyer:

Something is wrong: there are still the files src/sage/misc/string.*

Oops--part of the problem here is I'm moving between two different working trees, one for Python 2 and one for Python 3, so it can get confusing. I'll clean this up.

comment:7 in reply to:  5 Changed 5 years ago by embray

Replying to jdemeyer:

This is again a ticket with many small independent changes where a "partial positive review" like what I proposed on #24025 would make sense. I didn't like how we handled that, so how should we proceed here?

I'd be fine if you proposed your changes and tossed it back to me as "needs review", rather than going immediately to "positive review".

comment:8 Changed 5 years ago by jdemeyer

It seems strange to me to use locale.getpreferredencoding() for strings which do not depend on the locale. I would propose:

  1. Use locale.getpreferredencoding() for strings which actually depend on the locale.
  1. Use sys.getfilesystemencoding() for strings which actually depend on the file system encoding.
  1. Use UTF-8 otherwise.

comment:9 Changed 5 years ago by embray

Milestone: sage-8.1sage-8.2

comment:10 Changed 5 years ago by git

Commit: 25d6c32154c88dc4e9f1520d105b586b61c43cdc56f1c02add22c05178f43c2d14ea4bfa1aae0db9

Branch pushed to git repo; I updated commit sha1. This was a forced push. New commits:

64567aaMake type checks explicit
a358aeaRequire a const char*--this is more consistent for use in cases where
64a1c2dUse a less unusual character in the tests
1bf6709Use r-strings for the docstrings since they contain slashes
2bb8396Clean up these declarations:
6d73a41Use more specific type declarations here.
56f1c02Numerous string conversions from bytes to str and from str to bytes for Python 3

comment:11 Changed 5 years ago by git

Commit: 56f1c02add22c05178f43c2d14ea4bfa1aae0db9a9ebae6bfffc84a549af6e8da8abc52840b2b74c

Branch pushed to git repo; I updated commit sha1. This was a forced push. New commits:

a285af4Add new string conversion utilities.
5b3719aAdd support for optional error handling.
bf5f893Make type checks explicit
1bd27fcRequire a const char*--this is more consistent for use in cases where
bd2d931Use a less unusual character in the tests
b32f000Use r-strings for the docstrings since they contain slashes
ccf7bcdClean up these declarations:
dec9f3aUse more specific type declarations here.
a9ebae6Numerous string conversions from bytes to str and from str to bytes for Python 3

comment:12 Changed 5 years ago by jdemeyer

Is this needs_review again?

comment:13 Changed 5 years ago by embray

Status: needs_workneeds_review

Yeah, IIRC this was just a bunch of miscellaneous string conversions from some other tickets, but redone using the sage.cpython.string utilities.

comment:14 Changed 5 years ago by chapoton

3 failing doctests in integer.pyx

comment:15 Changed 5 years ago by jdemeyer

Status: needs_reviewneeds_work

comment:16 Changed 5 years ago by embray

Interesting--when I added str_to_bytes I predicted we might later want to allow Python 2 unicode objects to be supported somehow, but put it off because I didn't think there was much code in Sage that already was expected to handle unicode (except in a few places like unicode art, etc.). But the tests that are failing here clearly show that there are some places it's expected to work.

So I see at least three options:

  1. Modify str_to_bytes to treat unicode appropriate on Python 2
  2. Add some new function like unicode_to_str that's analogous to str_to_bytes (I'm not crazy about this)
  3. Just special case unicode on Python 2 where we do want to support it.

I lean towards 1. but I'm open to ideas.

Last edited 5 years ago by embray (previous) (diff)

comment:17 in reply to:  16 Changed 5 years ago by jdemeyer

Replying to embray:

I lean towards 1.

I agree. We should just support bytes and unicode (in the Cython sense!) as input to str_to_bytes regardless of Python version. One complication is that the Python 3 code path of str_to_bytes uses some functions specific to Python 3, so we cannot just run that code on Python 2.

comment:18 Changed 5 years ago by embray

Right, I'll give it a look though. It will just have to have a slightly different implementation for Python 2. Welcome back btw!

comment:19 Changed 5 years ago by jdemeyer

I created a new ticket for this: #24475

comment:20 in reply to:  18 Changed 5 years ago by jdemeyer

Replying to embray:

Right, I'll give it a look though. It will just have to have a slightly different implementation for Python 2.

I would rather avoid that. Instead of PyUnicode_EncodeLocale, you can just pass NULL as encoding to PyUnicode_AsEncodedString. That leaves PyUnicode_AsUTF8 which is really only used to translate the encoding and errors arguments. We could require that this is always of type str (so, not unicode on Python 2).

comment:21 in reply to:  18 Changed 5 years ago by jdemeyer

Replying to embray:

I'll give it a look though.

I can also do that. Just let me know, to avoid double work.

comment:22 Changed 5 years ago by git

Commit: a9ebae6bfffc84a549af6e8da8abc52840b2b74cc393e41ee8f627ca5b76149100039c5664208235

Branch pushed to git repo; I updated commit sha1. This was a forced push. New commits:

9cf556bRework str_to_bytes so that it accepts unicode on Python 2 and treats it same as str on Python 3.
086e523Numerous string conversions from bytes to str and from str to bytes for Python 3
728a0bcUse str_to_bytes for 'unicode' (which will be str on Python 3 and unicode on Python 2)
c393e41Minor Python 3 fix for sage.libs.flint.fmpz_poly

comment:23 Changed 5 years ago by chapoton

is this in "needs_review" state ?

Last edited 5 years ago by chapoton (previous) (diff)

comment:24 Changed 5 years ago by embray

Dependencies: #24222#24222, #24475

comment:25 Changed 5 years ago by git

Commit: c393e41ee8f627ca5b76149100039c56642082350925f3b3d747e2f935006576a929c317eda7f1f3

Branch pushed to git repo; I updated commit sha1. New commits:

0925f3bA few additional string conversion fixes for sage.libs.ecl

comment:26 Changed 5 years ago by embray

Status: needs_workneeds_review

comment:27 Changed 5 years ago by git

Commit: 0925f3b3d747e2f935006576a929c317eda7f1f37e4d3ccff889e902f6e93c42533b68e9991c83b4

Branch pushed to git repo; I updated commit sha1. New commits:

7e4d3ccpy3: fix pickling of integers and other minor bytes/str fixes

comment:28 Changed 5 years ago by git

Commit: 7e4d3ccff889e902f6e93c42533b68e9991c83b41338ba4db9db82a4e96191fa5b8ecb3f51cf5805

Branch pushed to git repo; I updated commit sha1. This was a forced push. New commits:

998911fRework str_to_bytes so that it accepts unicode on Python 2 and treats it same as str on Python 3.
c2c898eNumerous string conversions from bytes to str and from str to bytes for Python 3
0637cdeUse str_to_bytes for 'unicode' (which will be str on Python 3 and unicode on Python 2)
c0158a9Minor Python 3 fix for sage.libs.flint.fmpz_poly
f415e2fA few additional string conversion fixes for sage.libs.ecl
1338ba4py3: fix pickling of integers and other minor bytes/str fixes

comment:29 Changed 5 years ago by chapoton

Looks good to me, globally, even if do not feel I understand everything in detail.

Jeroen, do you think we can move forward with this important ticket ?

comment:30 Changed 5 years ago by jdemeyer

I always find it a bit unpleasant to review tickets with non-merged dependencies. So I would rather wait until the next beta.

Two immediate issues that I see:

  1. In src/sage/libs/mpmath/utils.pyx you add an import that you don't use.
  1. In several files, you import stuff from sage.cpython.string where you could use cimport.

comment:31 Changed 5 years ago by chapoton

Branch: u/embray/python3/string-fixespublic/python3-trac24223
Commit: 1338ba4db9db82a4e96191fa5b8ecb3f51cf580500f2776101f58a9c232fc06a3ee3eb2d87fc4d62
Dependencies: #24222, #24475

New commits:

00f2776Merge branch 'u/embray/python3/string-fixes' in 8.2.b4

comment:32 Changed 5 years ago by git

Commit: 00f2776101f58a9c232fc06a3ee3eb2d87fc4d627d68d8c73f2524d8ceb7af3d74bba57f31a4cbe7

Branch pushed to git repo; I updated commit sha1. New commits:

7d68d8csome details of imports

comment:33 Changed 5 years ago by jdemeyer

Reviewers: Jeroen Demeyer
Status: needs_reviewpositive_review

comment:34 Changed 5 years ago by embray

Status: positive_reviewneeds_work

This does not work:

-from sage.cpython.string import FS_ENCODING, str_to_bytes
+from sage.cpython.string cimport FS_ENCODING, str_to_bytes

FS_ENCODING is can't be cimported.

comment:35 Changed 5 years ago by git

Commit: 7d68d8c73f2524d8ceb7af3d74bba57f31a4cbe780b0d1f0c5d761c6804abff6bfbf94a1645105c8

Branch pushed to git repo; I updated commit sha1. New commits:

80b0d1ffix import

comment:36 Changed 5 years ago by chapoton

Status: needs_workneeds_review

import corrected

comment:37 Changed 5 years ago by embray

Branch: public/python3-trac24223u/embray/python3/string-fixes
Commit: 80b0d1f0c5d761c6804abff6bfbf94a1645105c8e9a582e0a3a5fe777ea34475200f18288fc3db2c
Status: needs_reviewpositive_review

I rebased and squashed some of these changes.


New commits:

cc214fdNumerous string conversions from bytes to str and from str to bytes for Python 3
f7e1153Use str_to_bytes for 'unicode' (which will be str on Python 3 and unicode on Python 2)
a4d6fe3A few additional string conversion fixes for sage.libs.ecl
d03a8d4py3: fix pickling of integers and other minor bytes/str fixes
e9a582eimport -> cimport where appropriate

comment:38 Changed 5 years ago by vbraun

Status: positive_reviewneeds_work
[sagelib-8.2.beta4] Error compiling Cython file:
[sagelib-8.2.beta4] ------------------------------------------------------------
[sagelib-8.2.beta4] ...
[sagelib-8.2.beta4] from sage.rings.finite_rings.finite_field_prime_modn import FiniteField_prime_modn
[sagelib-8.2.beta4] from sage.rings.finite_rings.finite_field_givaro import FiniteField_givaro
[sagelib-8.2.beta4] from sage.rings.finite_rings.finite_field_ntl_gf2e import FiniteField_ntl_gf2e
[sagelib-8.2.beta4] from sage.libs.pari.all import pari
[sagelib-8.2.beta4] from sage.libs.gmp.all cimport *
[sagelib-8.2.beta4] from sage.cpython.string cimport FS_ENCODING, str_to_bytes
[sagelib-8.2.beta4] ^
[sagelib-8.2.beta4] ------------------------------------------------------------
[sagelib-8.2.beta4] 
[sagelib-8.2.beta4] sage/libs/singular/singular.pyx:40:0: 'sage/cpython/string/FS_ENCODING.pxd' not found
[sagelib-8.2.beta4] 
[sagelib-8.2.beta4] Error compiling Cython file:
[sagelib-8.2.beta4] ------------------------------------------------------------
[sagelib-8.2.beta4] ...
[sagelib-8.2.beta4]     lib = SINGULAR_SO
[sagelib-8.2.beta4] 
[sagelib-8.2.beta4]     if not os.path.exists(lib):
[sagelib-8.2.beta4]         raise ImportError("cannot locate Singular library ({})".format(lib))
[sagelib-8.2.beta4] 
[sagelib-8.2.beta4]     lib = str_to_bytes(lib, FS_ENCODING, "surrogateescape")
[sagelib-8.2.beta4]                            ^
[sagelib-8.2.beta4] ------------------------------------------------------------
[sagelib-8.2.beta4] 
[sagelib-8.2.beta4] sage/libs/singular/singular.pyx:778:28: 'FS_ENCODING' is not a constant, variable or function identifier
[sagelib-8.2.beta4] Traceback (most recent call last):
[sagelib-8.2.beta4]   File "/home/buildbot/slave/sage_git/build/local/lib/python2.7/site-packages/Cython/Build/Dependencies.py", line 1179, in cythonize_one_helper
[sagelib-8.2.beta4]     return cythonize_one(*m)
[sagelib-8.2.beta4]   File "/home/buildbot/slave/sage_git/build/local/lib/python2.7/site-packages/Cython/Build/Dependencies.py", line 1161, in cythonize_one
[sagelib-8.2.beta4]     raise CompileError(None, pyx_file)
[sagelib-8.2.beta4] CompileError: sage/libs/singular/singular.pyx

comment:39 Changed 5 years ago by chapoton

Volker, did you use the latest branch ?

comment:40 Changed 5 years ago by embray

Status: needs_workpositive_review

He didn't. Setting this back to positive review since this otherwise fixes Jeroen's issue.

comment:41 Changed 5 years ago by vbraun

Branch: u/embray/python3/string-fixese9a582e0a3a5fe777ea34475200f18288fc3db2c
Resolution: fixed
Status: positive_reviewclosed
Note: See TracTickets for help on using tickets.