Opened 2 years ago

Closed 22 months ago

#24223 closed defect (fixed)

py3: several string conversion fixes

Reported by: embray Owned by:
Priority: major Milestone: sage-8.2
Component: python3 Keywords:
Cc: chapoton, jdemeyer Merged in:
Authors: Erik Bray Reviewers: Jeroen Demeyer
Report Upstream: N/A Work issues:
Branch: e9a582e (Commits) Commit: e9a582e0a3a5fe777ea34475200f18288fc3db2c
Dependencies: Stopgaps:

Description

This fixes several (hardly exhaustive) string conversions around Sage such that functions and methods that returned str on Python 2 also return str on Python 3 (and the same for functions and methods that take str as arguments) using the new string utilities from #24222.

This also incorporates/replaces the fixes from #23812.

Change History (41)

comment:1 Changed 2 years ago by embray

With this, plus a few other small fixes (including fixes from other existing tickets merged in) I was able to get the Sage REPL up and 1 + 1 working.

comment:2 Changed 2 years ago by git

  • Commit changed from 44a55407569e8919b7598ca367a05469732e78bf to 25d6c32154c88dc4e9f1520d105b586b61c43cdc

Branch pushed to git repo; I updated commit sha1. This was a forced push. New commits:

34ef6dcAdd HAVE_GMPY2 compile-time constant
158d984Force absolute import in have_module()
7f06e71Fix documentation
3ac146cAdd a Cython compile-time constant for Python major version.
f71fd41Add new string conversion utilities.
a5d5219Add support for optional error handling.
627d4c7Add new string conversion utilities.
25d6c32Numerous string conversions from bytes to str and from str to bytes for Python 3

comment:3 Changed 2 years ago by embray

  • Status changed from new to needs_review

Rebased on current version of #24222.

comment:4 follow-up: Changed 2 years ago by jdemeyer

  • Status changed from needs_review to needs_work

Something is wrong: there are still the files src/sage/misc/string.*

comment:5 follow-up: Changed 2 years ago by jdemeyer

This is again a ticket with many small independent changes where a "partial positive review" like what I proposed on #24025 would make sense. I didn't like how we handled that, so how should we proceed here?

comment:6 in reply to: ↑ 4 Changed 2 years ago by embray

Replying to jdemeyer:

Something is wrong: there are still the files src/sage/misc/string.*

Oops--part of the problem here is I'm moving between two different working trees, one for Python 2 and one for Python 3, so it can get confusing. I'll clean this up.

comment:7 in reply to: ↑ 5 Changed 2 years ago by embray

Replying to jdemeyer:

This is again a ticket with many small independent changes where a "partial positive review" like what I proposed on #24025 would make sense. I didn't like how we handled that, so how should we proceed here?

I'd be fine if you proposed your changes and tossed it back to me as "needs review", rather than going immediately to "positive review".

comment:8 Changed 2 years ago by jdemeyer

It seems strange to me to use locale.getpreferredencoding() for strings which do not depend on the locale. I would propose:

  1. Use locale.getpreferredencoding() for strings which actually depend on the locale.
  1. Use sys.getfilesystemencoding() for strings which actually depend on the file system encoding.
  1. Use UTF-8 otherwise.

comment:9 Changed 2 years ago by embray

  • Milestone changed from sage-8.1 to sage-8.2

comment:10 Changed 2 years ago by git

  • Commit changed from 25d6c32154c88dc4e9f1520d105b586b61c43cdc to 56f1c02add22c05178f43c2d14ea4bfa1aae0db9

Branch pushed to git repo; I updated commit sha1. This was a forced push. New commits:

64567aaMake type checks explicit
a358aeaRequire a const char*--this is more consistent for use in cases where
64a1c2dUse a less unusual character in the tests
1bf6709Use r-strings for the docstrings since they contain slashes
2bb8396Clean up these declarations:
6d73a41Use more specific type declarations here.
56f1c02Numerous string conversions from bytes to str and from str to bytes for Python 3

comment:11 Changed 2 years ago by git

  • Commit changed from 56f1c02add22c05178f43c2d14ea4bfa1aae0db9 to a9ebae6bfffc84a549af6e8da8abc52840b2b74c

Branch pushed to git repo; I updated commit sha1. This was a forced push. New commits:

a285af4Add new string conversion utilities.
5b3719aAdd support for optional error handling.
bf5f893Make type checks explicit
1bd27fcRequire a const char*--this is more consistent for use in cases where
bd2d931Use a less unusual character in the tests
b32f000Use r-strings for the docstrings since they contain slashes
ccf7bcdClean up these declarations:
dec9f3aUse more specific type declarations here.
a9ebae6Numerous string conversions from bytes to str and from str to bytes for Python 3

comment:12 Changed 23 months ago by jdemeyer

Is this needs_review again?

comment:13 Changed 23 months ago by embray

  • Status changed from needs_work to needs_review

Yeah, IIRC this was just a bunch of miscellaneous string conversions from some other tickets, but redone using the sage.cpython.string utilities.

comment:14 Changed 23 months ago by chapoton

3 failing doctests in integer.pyx

comment:15 Changed 23 months ago by jdemeyer

  • Status changed from needs_review to needs_work

comment:16 follow-up: Changed 23 months ago by embray

Interesting--when I added str_to_bytes I predicted we might later want to allow Python 2 unicode objects to be supported somehow, but put it off because I didn't think there was much code in Sage that already was expected to handle unicode (except in a few places like unicode art, etc.). But the tests that are failing here clearly show that there are some places it's expected to work.

So I see at least three options:

  1. Modify str_to_bytes to treat unicode appropriate on Python 2
  2. Add some new function like unicode_to_str that's analogous to str_to_bytes (I'm not crazy about this)
  3. Just special case unicode on Python 2 where we do want to support it.

I lean towards 1. but I'm open to ideas.

Last edited 23 months ago by embray (previous) (diff)

comment:17 in reply to: ↑ 16 Changed 23 months ago by jdemeyer

Replying to embray:

I lean towards 1.

I agree. We should just support bytes and unicode (in the Cython sense!) as input to str_to_bytes regardless of Python version. One complication is that the Python 3 code path of str_to_bytes uses some functions specific to Python 3, so we cannot just run that code on Python 2.

comment:18 follow-ups: Changed 23 months ago by embray

Right, I'll give it a look though. It will just have to have a slightly different implementation for Python 2. Welcome back btw!

comment:19 Changed 23 months ago by jdemeyer

I created a new ticket for this: #24475

comment:20 in reply to: ↑ 18 Changed 23 months ago by jdemeyer

Replying to embray:

Right, I'll give it a look though. It will just have to have a slightly different implementation for Python 2.

I would rather avoid that. Instead of PyUnicode_EncodeLocale, you can just pass NULL as encoding to PyUnicode_AsEncodedString. That leaves PyUnicode_AsUTF8 which is really only used to translate the encoding and errors arguments. We could require that this is always of type str (so, not unicode on Python 2).

comment:21 in reply to: ↑ 18 Changed 23 months ago by jdemeyer

Replying to embray:

I'll give it a look though.

I can also do that. Just let me know, to avoid double work.

comment:22 Changed 22 months ago by git

  • Commit changed from a9ebae6bfffc84a549af6e8da8abc52840b2b74c to c393e41ee8f627ca5b76149100039c5664208235

Branch pushed to git repo; I updated commit sha1. This was a forced push. New commits:

9cf556bRework str_to_bytes so that it accepts unicode on Python 2 and treats it same as str on Python 3.
086e523Numerous string conversions from bytes to str and from str to bytes for Python 3
728a0bcUse str_to_bytes for 'unicode' (which will be str on Python 3 and unicode on Python 2)
c393e41Minor Python 3 fix for sage.libs.flint.fmpz_poly

comment:23 Changed 22 months ago by chapoton

is this in "needs_review" state ?

Last edited 22 months ago by chapoton (previous) (diff)

comment:24 Changed 22 months ago by embray

  • Dependencies changed from #24222 to #24222, #24475

comment:25 Changed 22 months ago by git

  • Commit changed from c393e41ee8f627ca5b76149100039c5664208235 to 0925f3b3d747e2f935006576a929c317eda7f1f3

Branch pushed to git repo; I updated commit sha1. New commits:

0925f3bA few additional string conversion fixes for sage.libs.ecl

comment:26 Changed 22 months ago by embray

  • Status changed from needs_work to needs_review

comment:27 Changed 22 months ago by git

  • Commit changed from 0925f3b3d747e2f935006576a929c317eda7f1f3 to 7e4d3ccff889e902f6e93c42533b68e9991c83b4

Branch pushed to git repo; I updated commit sha1. New commits:

7e4d3ccpy3: fix pickling of integers and other minor bytes/str fixes

comment:28 Changed 22 months ago by git

  • Commit changed from 7e4d3ccff889e902f6e93c42533b68e9991c83b4 to 1338ba4db9db82a4e96191fa5b8ecb3f51cf5805

Branch pushed to git repo; I updated commit sha1. This was a forced push. New commits:

998911fRework str_to_bytes so that it accepts unicode on Python 2 and treats it same as str on Python 3.
c2c898eNumerous string conversions from bytes to str and from str to bytes for Python 3
0637cdeUse str_to_bytes for 'unicode' (which will be str on Python 3 and unicode on Python 2)
c0158a9Minor Python 3 fix for sage.libs.flint.fmpz_poly
f415e2fA few additional string conversion fixes for sage.libs.ecl
1338ba4py3: fix pickling of integers and other minor bytes/str fixes

comment:29 Changed 22 months ago by chapoton

Looks good to me, globally, even if do not feel I understand everything in detail.

Jeroen, do you think we can move forward with this important ticket ?

comment:30 Changed 22 months ago by jdemeyer

I always find it a bit unpleasant to review tickets with non-merged dependencies. So I would rather wait until the next beta.

Two immediate issues that I see:

  1. In src/sage/libs/mpmath/utils.pyx you add an import that you don't use.
  1. In several files, you import stuff from sage.cpython.string where you could use cimport.

comment:31 Changed 22 months ago by chapoton

  • Branch changed from u/embray/python3/string-fixes to public/python3-trac24223
  • Commit changed from 1338ba4db9db82a4e96191fa5b8ecb3f51cf5805 to 00f2776101f58a9c232fc06a3ee3eb2d87fc4d62
  • Dependencies #24222, #24475 deleted

New commits:

00f2776Merge branch 'u/embray/python3/string-fixes' in 8.2.b4

comment:32 Changed 22 months ago by git

  • Commit changed from 00f2776101f58a9c232fc06a3ee3eb2d87fc4d62 to 7d68d8c73f2524d8ceb7af3d74bba57f31a4cbe7

Branch pushed to git repo; I updated commit sha1. New commits:

7d68d8csome details of imports

comment:33 Changed 22 months ago by jdemeyer

  • Reviewers set to Jeroen Demeyer
  • Status changed from needs_review to positive_review

comment:34 Changed 22 months ago by embray

  • Status changed from positive_review to needs_work

This does not work:

-from sage.cpython.string import FS_ENCODING, str_to_bytes
+from sage.cpython.string cimport FS_ENCODING, str_to_bytes

FS_ENCODING is can't be cimported.

comment:35 Changed 22 months ago by git

  • Commit changed from 7d68d8c73f2524d8ceb7af3d74bba57f31a4cbe7 to 80b0d1f0c5d761c6804abff6bfbf94a1645105c8

Branch pushed to git repo; I updated commit sha1. New commits:

80b0d1ffix import

comment:36 Changed 22 months ago by chapoton

  • Status changed from needs_work to needs_review

import corrected

comment:37 Changed 22 months ago by embray

  • Branch changed from public/python3-trac24223 to u/embray/python3/string-fixes
  • Commit changed from 80b0d1f0c5d761c6804abff6bfbf94a1645105c8 to e9a582e0a3a5fe777ea34475200f18288fc3db2c
  • Status changed from needs_review to positive_review

I rebased and squashed some of these changes.


New commits:

cc214fdNumerous string conversions from bytes to str and from str to bytes for Python 3
f7e1153Use str_to_bytes for 'unicode' (which will be str on Python 3 and unicode on Python 2)
a4d6fe3A few additional string conversion fixes for sage.libs.ecl
d03a8d4py3: fix pickling of integers and other minor bytes/str fixes
e9a582eimport -> cimport where appropriate

comment:38 Changed 22 months ago by vbraun

  • Status changed from positive_review to needs_work
[sagelib-8.2.beta4] Error compiling Cython file:
[sagelib-8.2.beta4] ------------------------------------------------------------
[sagelib-8.2.beta4] ...
[sagelib-8.2.beta4] from sage.rings.finite_rings.finite_field_prime_modn import FiniteField_prime_modn
[sagelib-8.2.beta4] from sage.rings.finite_rings.finite_field_givaro import FiniteField_givaro
[sagelib-8.2.beta4] from sage.rings.finite_rings.finite_field_ntl_gf2e import FiniteField_ntl_gf2e
[sagelib-8.2.beta4] from sage.libs.pari.all import pari
[sagelib-8.2.beta4] from sage.libs.gmp.all cimport *
[sagelib-8.2.beta4] from sage.cpython.string cimport FS_ENCODING, str_to_bytes
[sagelib-8.2.beta4] ^
[sagelib-8.2.beta4] ------------------------------------------------------------
[sagelib-8.2.beta4] 
[sagelib-8.2.beta4] sage/libs/singular/singular.pyx:40:0: 'sage/cpython/string/FS_ENCODING.pxd' not found
[sagelib-8.2.beta4] 
[sagelib-8.2.beta4] Error compiling Cython file:
[sagelib-8.2.beta4] ------------------------------------------------------------
[sagelib-8.2.beta4] ...
[sagelib-8.2.beta4]     lib = SINGULAR_SO
[sagelib-8.2.beta4] 
[sagelib-8.2.beta4]     if not os.path.exists(lib):
[sagelib-8.2.beta4]         raise ImportError("cannot locate Singular library ({})".format(lib))
[sagelib-8.2.beta4] 
[sagelib-8.2.beta4]     lib = str_to_bytes(lib, FS_ENCODING, "surrogateescape")
[sagelib-8.2.beta4]                            ^
[sagelib-8.2.beta4] ------------------------------------------------------------
[sagelib-8.2.beta4] 
[sagelib-8.2.beta4] sage/libs/singular/singular.pyx:778:28: 'FS_ENCODING' is not a constant, variable or function identifier
[sagelib-8.2.beta4] Traceback (most recent call last):
[sagelib-8.2.beta4]   File "/home/buildbot/slave/sage_git/build/local/lib/python2.7/site-packages/Cython/Build/Dependencies.py", line 1179, in cythonize_one_helper
[sagelib-8.2.beta4]     return cythonize_one(*m)
[sagelib-8.2.beta4]   File "/home/buildbot/slave/sage_git/build/local/lib/python2.7/site-packages/Cython/Build/Dependencies.py", line 1161, in cythonize_one
[sagelib-8.2.beta4]     raise CompileError(None, pyx_file)
[sagelib-8.2.beta4] CompileError: sage/libs/singular/singular.pyx

comment:39 Changed 22 months ago by chapoton

Volker, did you use the latest branch ?

comment:40 Changed 22 months ago by embray

  • Status changed from needs_work to positive_review

He didn't. Setting this back to positive review since this otherwise fixes Jeroen's issue.

comment:41 Changed 22 months ago by vbraun

  • Branch changed from u/embray/python3/string-fixes to e9a582e0a3a5fe777ea34475200f18288fc3db2c
  • Resolution set to fixed
  • Status changed from positive_review to closed
Note: See TracTickets for help on using tickets.