Opened 5 years ago
Closed 5 years ago
#24222 closed defect (fixed)
py3: simplified string conversion utilities
Reported by: | embray | Owned by: | |
---|---|---|---|
Priority: | major | Milestone: | sage-8.2 |
Component: | python3 | Keywords: | |
Cc: | chapoton, jdemeyer | Merged in: | |
Authors: | Erik Bray, Jeroen Demeyer | Reviewers: | Jeroen Demeyer, Erik Bray |
Report Upstream: | N/A | Work issues: | |
Branch: | dec9f3a (Commits, GitHub, GitLab) | Commit: | dec9f3ac79a992d3152366ae266f1065c70c7867 |
Dependencies: | Stopgaps: |
Description
A possible alternative to #24186, implementing simple conversion from C char
arrays or bytes
objects to str
objects, and of str
objects to bytes
objects. Here "str
" and "bytes
" are to be read exactly for either Python 2 or Python 3, so on Python 2 this means no conversion is performed since str is bytes == True
.
One thing this does not do is implement any kind of conversion from Python 2 unicode
objects to bytes
. This functionality might be worth adding, in some form, to str_to_bytes
. But this would add a new feature on Python 2, whereas for now I'm only trying to preserve the existing functionality on Python 2 exactly, while transparently supporting Python 3 str
s everywhere that Python 2 str
s are supported.
Change History (96)
comment:1 Changed 5 years ago by
Commit: | → 4c787436df8bad6ebf1edac65ba97e080295e3bc |
---|
comment:2 Changed 5 years ago by
Status: | new → needs_review |
---|
comment:3 Changed 5 years ago by
Status: | needs_review → needs_info |
---|
OK, so now we have to discuss encodings... first of all, I really don't like the encoding
argument in the functions, that will just slow everything down. Better pick one (or a few) good encoding(s) and use those.
I don't see why you would pick locale.getpreferredencoding()
as default encoding. Relevant encodings should be:
ascii
. In many cases we know that output from a C library is an ASCII string (for example, the string representation of a number). Luckily, ASCII is compatible with most other encodings, so this isn't an issue.
filesystemencoding
. This is the encoding that Python internally uses for most conversionschar*
<->str
. For this reason, I think that this is the most important use case and it should be the default when in doubt.
utf-8
. This is good if you just want to storeunicode
aschar*
(say, an internal user-specified name for something) and you don't care about the exact encoding but you do care that round-trip conversion works properly.
locale.getpreferredencoding()
. I believe that this is mainly relevant for subprocesses, provided that those subprocesses also look at the encoding.
For efficiency, I would rather have several sets of functions, each with a hard-coded encoding.
I would also suggest to implement these functions (or at least the functions where you care about performance) as cdef inline
functions in the .pxd
file. That way, they will actually be inlined when calling them from an external Cython module.
comment:4 follow-up: 5 Changed 5 years ago by
You don't need six.PY2
. You can just replace
if six.PY2: from cpython.string cimport PyString_FromString else: from cpython.bytes cimport PyBytes_FromString as PyString_FromString
by
from cpython.bytes cimport PyBytes_FromString as PyString_FromString
You could also just use PyBytes_FromString
in the code.
comment:5 Changed 5 years ago by
Replying to jdemeyer:
You don't need
six.PY2
. You can just replace
That's what I thought originally too, but I was having some problems trying to do it as a one-liner. I'll try your suggestion though and double-check whether it works.
comment:6 follow-ups: 8 9 Changed 5 years ago by
I don't think the "encoding" argument slows anything down by any significant amount, especially in cases where it isn't used. I'd be amenable to encoding-specific functions for some cases, as Python has those as well in its API. But right now I'm really trying to avoid breadth of API surface. It is good to have generic versions of these functions that accept any encoding, and this is the bare minimum needed to get Python 3 support off the ground. I think that later we can encoding-specific functions where specific use cases for them can be demonstrated. I don't want to get into the weeds with this right now.
As for the default encoding, locale.getpreferredencoding()
makes more sense. sys.getfilesystemencoding()
is specifically about guessing (more or less) what encoding the filesystem specifically is using for encoding e.g. of filenames, and has nothing to do with encodings of file contents or of arbitrary strings output by software. (Well, that's not exactly true since on *nix the two are essentially the same, although the former can change if the process's locale is changed, I think).
So on *NIX platforms like we care about the most it's actually mostly a moot point. But a lot of the software Sage interfaces with is locale-aware, so it's better to take that into account as the default than not.
comment:7 follow-up: 10 Changed 5 years ago by
I would also suggest to implement these functions (or at least the functions where you care about performance) as cdef inline functions in the .pxd file.
I was wondering about that. But in that case the .pyx
file will be empty (does it even need to exist at all)? What about the default encoding variables?
comment:8 follow-up: 14 Changed 5 years ago by
Maybe the best solution would be to not have a default encoding and require users to think what encoding they want?
comment:9 Changed 5 years ago by
Replying to embray:
I don't think the "encoding" argument slows anything down by any significant amount
The Python 3 C API has specific functions to encode/decode for certainly particular encodings:
- https://docs.python.org/3/c-api/unicode.html#utf-8-codecs
- https://docs.python.org/3/c-api/unicode.html#locale-encoding
- https://docs.python.org/3/c-api/unicode.html#file-system-encoding
I haven't profiled, but I would hope that these are faster than the generic API.
comment:10 Changed 5 years ago by
Replying to embray:
But in that case the
.pyx
file will be empty (does it even need to exist at all)?
It doesn't need to exist.
What about the default encoding variables?
I propose to remove those variables anyway. If you do need them for some reason, they must be in the .pyx
file.
comment:11 follow-up: 15 Changed 5 years ago by
Two more things:
- it is safer to use the Cython-compile-time constant
PY_VERSION_HEX
instead of the C-compile-time constantPY_MAJOR_VERSION
. It is dubious C code to write something likeif (0) nonexisting_function()
when nonexisting_function()
doesn't actually exist like PyUnicode_AsUTF8
on Python 2. The only reason that the above code compiles is compiler optimization.
See also #24215 for Cython-compile-time constants in general.
- Could you move this to
sage/cpython/string.pxd
? I recently created thesage/cpython
directory to deal with low-level Python internals and I think that these functions would fit there.
comment:12 follow-up: 27 Changed 5 years ago by
fails to build, see patchbot report:
from string import Template
comment:13 Changed 5 years ago by
oh, I see. Name conflict between the global "string" module and the local "string" module..
comment:14 follow-ups: 17 21 Changed 5 years ago by
Replying to jdemeyer:
Maybe the best solution would be to not have a default encoding and require users to think what encoding they want?
That's far too onerous and anxiety-provoking for the user :) Python itself uses default encodings all over the place when in doubt (hence e.g. PyUnicode_DecodeFSDefault). It's an unfortunate fact that there's isn't always one "right answer" here; the best we can do is provide sensible defaults and the ability for user-specified encoding where applicable; that is, where we know we want a specific encoding. Moving forward we can also do more, for example, to ensure that any locale-aware code run by Sage is handled well.
I could definitely agree to adding more encoding-specific helper functions, especially for ASCII and UTF-8. But as a first pass, for the sake of getting Python 3 support off the ground, I'd prefer to leave this as is and then make adjustments as specific use cases arise. It will be difficult to even find those specific use cases until and unless we get further along on getting Python 3 working in general (with these functions, plus a few other fixes I'll be posting soon, I've gotten the Sage doctest runner working, so that will help expose a lot of interesting cases quickly).
I'll look at the rest of your suggestions; they seem reasonable.
comment:15 follow-up: 16 Changed 5 years ago by
Replying to jdemeyer:
Two more things:
- it is safer to use the Cython-compile-time constant
PY_VERSION_HEX
instead of the C-compile-time constantPY_MAJOR_VERSION
. It is dubious C code to write something like
Ah, I was actually really looking for something like this but I couldn't find it anywhere in the Cython documentation. Am I just blind?
comment:16 follow-up: 23 Changed 5 years ago by
Replying to embray:
Replying to jdemeyer:
Two more things:
- it is safer to use the Cython-compile-time constant
PY_VERSION_HEX
instead of the C-compile-time constantPY_MAJOR_VERSION
. It is dubious C code to write something likeAh, I was actually really looking for something like this but I couldn't find it anywhere in the Cython documentation. Am I just blind?
Nevermind; I see now that we explicitly pass that in to cythonize
in the setup.py
. For that matter, any reason not to add simply PY2
and PY3
compile-time constants? Comparing against PY_VERSION_HEX
is a little annoying.
comment:17 follow-up: 18 Changed 5 years ago by
It will be difficult to even find those specific use cases until and unless we get further along on getting Python 3 working in general (with these functions, plus a few other fixes I'll be posting soon, I've gotten the Sage doctest runner working, so that will help expose a lot of interesting cases quickly).
This sounds great ! I was hoping that the doctest framework could be made to work at some point, but was not expecting it soon.
comment:18 Changed 5 years ago by
Replying to chapoton:
It will be difficult to even find those specific use cases until and unless we get >further along on getting Python 3 working in general (with these functions, plus a >few other fixes I'll be posting soon, I've gotten the Sage doctest runner working, so >that will help expose a lot of interesting cases quickly).
This sounds great ! I was hoping that the doctest framework could be made to work at some point, but was not expecting it soon.
I had to get it working, in part, so that I could run the doctests for this module :)
I think it will help things go much faster.
comment:19 Changed 5 years ago by
I'm seeing now how wanting to have module-level global variables in conjunction with inline c(p)def
functions in Cython gets problematic. It just compiles them as __Pyx_GetModuleGlobalName(name)
. I feel like this should be a bug (I think not too difficult to fix?) in Cython, since it's really counter to how Python should work in this case.
For functions inlined from another module--at least if those functions access global variables from their original module--it should import that module during module initialization and use the correct module dict for globals lookups (just as normal Python functions do, basically).
That's an issue beyond this one though, so I'll rework things for now to get rid use of the global variables by these functions (I still want to have default encodings though).
comment:20 Changed 5 years ago by
So it turns out PyUnicode_AsEncodedString
and PyUnicode_Decode
do allow the encoding
argument to be NULL
, in which case they default to UTF-8. So if it's a good enough default for Python maybe it's a good enough default for us, for now.
Perhaps I'll just stick with that functionality, and take care to use things like locale.getpreferredencoding()
when necessary, such as interfacing with external software. To that end it would be good to come up with some list of exactly what software Sage interfaces with is locale aware. Certainly ECL and GAP are strong candidates; probably others. It will be good to add some tests to that effect but we can worry about it when we come to it.
comment:21 Changed 5 years ago by
Replying to embray:
Python itself uses default encodings all over the place when in doubt
Right, but there are several defaults (each of UTF-8, sys.getfilesystemencoding()
and locale.getpreferredencoding()
could be considered as defaults). And I don't think that any of those 3 has the large majority. So I still feel that forcing the user to pick one is the best solution.
comment:22 follow-up: 26 Changed 5 years ago by
Also, I feel that error handling should be different for the different cases: if you are communicating with locale-aware software using locale.getpreferredencoding()
you probably want decoding errors to be actual errors. With the file system encoding on the other hand, surrogateescape
is typically a better default.
comment:23 follow-up: 25 Changed 5 years ago by
Replying to embray:
For that matter, any reason not to add simply
PY2
andPY3
compile-time constants? Comparing againstPY_VERSION_HEX
is a little annoying.
No problem for me, although I would prefer PY_MAJOR_VERSION
then. If you want to do that, please make a new ticket and make it depend on #24215 please.
comment:24 Changed 5 years ago by
Commit: | 4c787436df8bad6ebf1edac65ba97e080295e3bc → bb073e885eec1eeae8eac4c4776a81d84c2c61c1 |
---|
Branch pushed to git repo; I updated commit sha1. New commits:
bb073e8 | Move sage.misc.string to sage.cpython.string and rework how default encoding is handled
|
comment:25 Changed 5 years ago by
Replying to jdemeyer:
Replying to embray:
For that matter, any reason not to add simply
PY2
andPY3
compile-time constants? Comparing againstPY_VERSION_HEX
is a little annoying.No problem for me, although I would prefer
PY_MAJOR_VERSION
then. If you want to do that, please make a new ticket and make it depend on #24215 please.
Okay hold on...
comment:26 Changed 5 years ago by
Replying to jdemeyer:
Also, I feel that error handling should be different for the different cases: if you are communicating with locale-aware software using
locale.getpreferredencoding()
you probably want decoding errors to be actual errors. With the file system encoding on the other hand,surrogateescape
is typically a better default.
Yeah, I'm inclined to agree here--I think I'll change that. But in that case I'd also want an optional argument for error handling.
comment:27 Changed 5 years ago by
comment:28 Changed 5 years ago by
Commit: | bb073e885eec1eeae8eac4c4776a81d84c2c61c1 → a5d521980e6ce42afc2b127e8f7d892f32488ee4 |
---|
Branch pushed to git repo; I updated commit sha1. This was a forced push. New commits:
34ef6dc | Add HAVE_GMPY2 compile-time constant
|
158d984 | Force absolute import in have_module()
|
7f06e71 | Fix documentation
|
3ac146c | Add a Cython compile-time constant for Python major version.
|
f71fd41 | Add new string conversion utilities.
|
a5d5219 | Add support for optional error handling.
|
comment:29 follow-up: 32 Changed 5 years ago by
Dependencies: | → #24246 |
---|---|
Status: | needs_info → needs_review |
I'm pretty happy with this now as-is. I'm up for minor tweaking but would prefer not to make any other major changes to how this works, for now, since it's proving pretty useful in getting other code working.
I'd be up for adding additional shortcut functions for certain special cases if they come up often enough--we'll see.
(Also, it turns out if you put cpdef
function definitions into a .pxd
file it won't generate a module for them unless you also have a corresponding .pyx
file, even if it's empty--annoying. That said, it's useful to have a shortcut for sys.getfilesystemencoding()
which is annoying to write :)
comment:30 Changed 5 years ago by
I have just noticed that we use a similar kind of function "from sagenb.misc.misc import encoded_str" in src/sage/misc/latex.py..
comment:31 follow-up: 35 Changed 5 years ago by
If you want to pass encoding
and errors
arguments, can you pass them as the C type const char*
? That avoids needless slow manipulation of Python objects.
comment:32 follow-ups: 34 36 Changed 5 years ago by
Replying to embray:
since it's proving pretty useful in getting other code working.
If you have such use cases, I'd love to see them.
I'm not yet convinced that we should use locale.getpreferredencoding()
as default. Since this is just a preliminary interface, why can't we require an encoding argument until we have a clearer idea that most Unicode conversions are indeed using locale.getpreferredencoding
?
comment:33 follow-up: 37 Changed 5 years ago by
The code is a bit inconsistent in how it checks that the input has the correct type, for example passing non-bytes to bytes_to_str()
. In any case, I would drop the typing on the argument (bytes b
-> b
). If you want checks, make them explicit by using checked casts (<str?>
instead of <str>
).
comment:34 Changed 5 years ago by
comment:35 follow-up: 42 Changed 5 years ago by
Replying to jdemeyer:
If you want to pass
encoding
anderrors
arguments, can you pass them as the C typeconst char*
? That avoids needless slow manipulation of Python objects.
Doesn't that mean that from Python you'd have to pass those arguments in as bytes
then?
Let's please not get too hung up about micro-optimization here. Compared to actual string decoding (except in some trivial cases) this kind of point is pretty trivial.
comment:36 follow-up: 39 Changed 5 years ago by
Replying to jdemeyer:
I'm not yet convinced that we should use
locale.getpreferredencoding()
as default. Since this is just a preliminary interface, why can't we require an encoding argument until we have a clearer idea that most Unicode conversions are indeed usinglocale.getpreferredencoding
?
No, for exactly the reason that in all cases it's not clear what the best choice is. Anyways the end result is going to be everyone just writing "utf-8" everywhere which I'd prefer to avoid. I'd like to see how far we get with a sensible default for now. There are some cases where it's definitely good to specify an encoding--for example all files written by sage can be encoded utf-8 in general. In general that's not the case though.
comment:37 follow-up: 38 Changed 5 years ago by
Replying to jdemeyer:
The code is a bit inconsistent in how it checks that the input has the correct type, for example passing non-bytes to
bytes_to_str()
. In any case, I would drop the typing on the argument (bytes b
->b
). If you want checks, make them explicit by using checked casts (<str?>
instead of<str>
).
"make them explicit" I don't know what isn't explicit about specifying the argument type?
comment:38 follow-up: 49 Changed 5 years ago by
Replying to embray:
I don't know what isn't explicit about specifying the argument type?
It's explicit but not what you want. Cython always allows None
to be passed as any type. If you really want bytes
and not None
, you need to check for it. And if you do an explicit check, there is no reason for the typing of the argument.
comment:39 Changed 5 years ago by
Replying to embray:
Anyways the end result is going to be everyone just writing "utf-8" everywhere which I'd prefer to avoid.
And now we're in a situation where everybody uses locale.getpreferredencoding()
which is also not what you want.
comment:40 follow-up: 41 Changed 5 years ago by
I owe you an apology... I read some Python docs and sources and it turns out that Python uses locale.getpreferredencoding()
for more things than I thought. So with that in mind, locale.getpreferredencoding()
might actually be a reasonable default after all.
comment:41 Changed 5 years ago by
Replying to jdemeyer:
I owe you an apology... I read some Python docs and sources and it turns out that Python uses
locale.getpreferredencoding()
for more things than I thought. So with that in mind,locale.getpreferredencoding()
might actually be a reasonable default after all.
That's okay, I'm just going off experience here. But it's a hazy topic so I don't mind having my assumptions challenged.
What's nice about having a sensible default is that, especially starting out, one needs to use a lot of these and it's not always immediately obvious what to do. It's easier to not worry about it until you know you need to (e.g. something produces some text in an encoding you weren't expecting). The locale encoding is a good bet though since it will typically match what the terminal is using.
comment:42 Changed 5 years ago by
Replying to embray:
Let's please not get too hung up about micro-optimization here. Compared to actual string decoding (except in some trivial cases) this kind of point is pretty trivial.
Another good point: PyUnicode_AsUTF8
caches its own representation as UTF-8 string. So the PyUnicode_AsUTF8()
calls for encoding
and errors
should be fast.
comment:43 Changed 5 years ago by
Reviewers: | → Jeroen Demeyer |
---|
comment:44 follow-up: 45 Changed 5 years ago by
Yes, especially since they'll likely be interned anyways.
I'm still going to look into your point about argument types.
comment:45 Changed 5 years ago by
Replying to embray:
I'm still going to look into your point about argument types.
No. I'm preparing a small reviewer patch.
comment:46 Changed 5 years ago by
Branch: | u/embray/python3/string-conversions → u/jdemeyer/python3/string-conversions |
---|
comment:47 Changed 5 years ago by
Commit: | a5d521980e6ce42afc2b127e8f7d892f32488ee4 → 4a209df305a721e9c30bc1aa384d128e265c419c |
---|
comment:48 Changed 5 years ago by
Note that my commit does not affect the API of these functions, so this shouldn't require any changes for tickets depending on this one.
comment:49 Changed 5 years ago by
Status: | needs_review → needs_work |
---|
Replying to jdemeyer:
Replying to embray:
I don't know what isn't explicit about specifying the argument type?
It's explicit but not what you want. Cython always allows
None
to be passed as any type. If you really wantbytes
and notNone
, you need to check for it. And if you do an explicit check, there is no reason for the typing of the argument.
I'm still not following you here. Dropping the type specification seems to make things worse, not better. I do see what you're saying about None
which is weird. I don't like that. But specifying the argument of the type provides some compile-time static checking that's already been helpful a few times. It's not foolproof of course because in many cases Cython can't be sure what type a Python object will be. But in cases where it can (e.g. something "obvious" like foo = b'abc'; str_to_bytes(foo)
) you get:
Error compiling Cython file: ------------------------------------------------------------ ... foo = b'abc' bar = str_to_bytes(foo) ^ ------------------------------------------------------------ Cannot convert 'bytes' object to str implicitly. This is not portable to Py3.
While this seems to be specific to bytes -> str, it is very useful in this case.
Also, without this type check you get much less useful errors if you pass in the wrong type. With the type specification:
>>> str_to_bytes(b'a') Traceback (most recent call last): File "<stdin>", line 1, in <module> TypeError: Argument 's' has incorrect type (expected str, got bytes)
Without it
>>> str_to_bytes(b'a') Traceback (most recent call last): File "<stdin>", line 1, in <module> File "sage/cpython/string.pxd", line 70, in sage.cpython.string.str_to_bytes (build/cythonized/sage/cpython/string.c:1660) File "sage/cpython/string.pxd", line 99, in sage.cpython.string.str_to_bytes (build/cythonized/sage/cpython/string.c:1523) TypeError: bad argument type for built-in operation
This error happens to be raising from PyUnicode_EncodeLocale
, but one wouldn't guess that just from looking at it. It seems sloppy.
One thing I did recently run into, however, is that str_to_bytes(str s, ...)
does not allow subclasses of str
which is inconvenient. Your version does seem allow str
subclasses (and I guess likewise bytes
subclasses for bytes_to_str
). It would be nice to have a middle ground between accepting subclasses, but also having better type checking.
comment:50 follow-up: 51 Changed 5 years ago by
As you suggested earlier, using explicit type-checking casts like PyUnicode_EncodeLocale(<str?>s, err)
gets part of the way there--it at least provides better exceptions when the wrong type is passed in.
Unfortunately in the case of str
Cython wants to use PyString_CheckExact
, instead of some more generic check, and this disallows str
subclasses (even though for user-defined types it will use a check that allows subclasses). I feel like this a bug in Cython--why use PyString_CheckExact
instead of PyString_Check
? The latter still does the equivalent of the former first as a special case. This is also especially nicely optimized on Python 3 where they added a Py_TPFLAGS_UNICODE_SUBCLASS
flag as a special case :)
comment:51 follow-up: 52 Changed 5 years ago by
Replying to embray:
I feel like this a bug in Cython--why use
PyString_CheckExact
instead ofPyString_Check
?
Feature, not a bug :-)
Cython uses this typing as an optimization tool, not as a checking tool. It happens to do some checking (like assigning cdef str x = []
will fail) but that is just to avoid segfaults. For certain builtin types, Cython accesses the internals of those types. But accessing those internals is only valid for the exact type, it might not be valid for subtypes.
comment:52 Changed 5 years ago by
Replying to jdemeyer:
Replying to embray:
I feel like this a bug in Cython--why use
PyString_CheckExact
instead ofPyString_Check
?Feature, not a bug :-)
Yep, I just confirmed this myself--although it doesn't really make this explicit the way the code is written definitely implies that it's intentional.
Cython uses this typing as an optimization tool, not as a checking tool. It happens to do some checking (like assigning
cdef str x = []
will fail) but that is just to avoid segfaults. For certain builtin types, Cython accesses the internals of those types. But accessing those internals is only valid for the exact type, it might not be valid for subtypes.
Right. It depends. I feel like one should be able to ask for allowing subtypes explicitly though, if nothing else. If Cython needs to muck around with the internals it can still make an "exact" check...
comment:53 follow-ups: 54 55 Changed 5 years ago by
Also I'd say Cython does use types for checking, not just optimization, for example it obviously won't let you do things like
cdef int i i = "abc"
I guess it depends on what you mean by "typing"--is this referring to Python types or C types?
So in the mean time what do we do? Write the explicit checks manually? Or just not allow str
subclasses and require a manual str()
when passing in objects that might be str
subclasses (ew, very un-Pythonic).
comment:54 follow-up: 56 Changed 5 years ago by
Replying to embray:
So in the mean time what do we do?
I would argue that the current branch works and that nothing needs to be changed. Even if the error message from PyUnicode_Decode
is obscure, the traceback would easily point to the right place.
comment:55 Changed 5 years ago by
Replying to embray:
I guess it depends on what you mean by "typing"--is this referring to Python types or C types?
I was specifically talking about builtin Python types (not C types nor cdef classes
).
comment:56 follow-up: 57 Changed 5 years ago by
Replying to jdemeyer:
Replying to embray:
So in the mean time what do we do?
I would argue that the current branch works and that nothing needs to be changed. Even if the error message from
PyUnicode_Decode
is obscure, the traceback would easily point to the right place.
Okay--I'm really not happy about that--given that the <str?>
syntax is the programmer explicitly asking for a type check it should allow some flexibility in how that's performed. But that's a Cython issue. I can live with this in the meantime. However, one thing that still needs to be fixed then is to remove the type checks in the Python 2 cases. I'm not sure what good they're doing, and they are restricting the use of subtypes on Python 2...
comment:57 Changed 5 years ago by
Replying to embray:
given that the
<str?>
syntax is the programmer explicitly asking for a type check it should allow some flexibility in how that's performed. But that's a Cython issue.
Again, feature, not a bug. It is meant as "optimize this code assuming that this is a str
but check it just in case". Of course, it is too tempting to use this to replace isinstance()
checks as I have done. I'll fix it right now.
comment:58 Changed 5 years ago by
Commit: | 4a209df305a721e9c30bc1aa384d128e265c419c → 64567aa772d737b6ff052f99a67cc6e8ad72c21c |
---|
Branch pushed to git repo; I updated commit sha1. This was a forced push. New commits:
64567aa | Make type checks explicit
|
comment:59 Changed 5 years ago by
Status: | needs_work → needs_review |
---|
comment:60 follow-up: 61 Changed 5 years ago by
Okay, I take your word for it, but in that case I think the the Cython documentation is a little deceptive about how this works / is meant to be used. Not deliberately of course, it's just unclear. This is like with importlib.import_module
--the documentation is unhelpful here, but I'm sure you're right...
comment:61 Changed 5 years ago by
Replying to embray:
Okay, I take your word for it, but in that case I think the the Cython documentation is a little deceptive about how this works / is meant to be used.
comment:62 Changed 5 years ago by
Authors: | Erik Bray → Erik Bray, Jeroen Demeyer |
---|---|
Reviewers: | Jeroen Demeyer → Jeroen Demeyer, Erik Bray |
Status: | needs_review → positive_review |
Alright, I'm good with this.
comment:63 Changed 5 years ago by
Status: | positive_review → needs_work |
---|
I think it would actually be best if char_to_str
took a const char *
instead of just char *
. I was lazy about that at first, but for the sake of correctness I think this should be updated.
comment:65 Changed 5 years ago by
Sure--I can do it if you want. I'll just set it back to positive_review then since we're in agreement.
comment:66 Changed 5 years ago by
Branch: | u/jdemeyer/python3/string-conversions → u/embray/python3/string-conversions |
---|---|
Commit: | 64567aa772d737b6ff052f99a67cc6e8ad72c21c → a358aeaec8612efe19678482fb510751a1b7b33a |
Status: | needs_work → positive_review |
New commits:
a358aea | Require a const char*--this is more consistent for use in cases where
|
comment:68 Changed 5 years ago by
It would be helpful if you linked to a build result page demonstrating the issue.
comment:70 Changed 5 years ago by
Milestone: | sage-8.1 → sage-8.2 |
---|
comment:71 Changed 5 years ago by
Well I have unrelated problems building the PDF docs (is there a list somewhere of the dependencies? I have a pretty big texlive install and it still doesn't seem to have all the necessary fonts or something, even for some characters not related to this ticket).
Still if I had to guess it seems likely the problem is my use of the Snowman character (my favorite go-to unicode character). I'll just change it to something a little more likely to be already supported...
comment:72 Changed 5 years ago by
Commit: | a358aeaec8612efe19678482fb510751a1b7b33a → 64a1c2d58713636a51c60e3be508204c88513658 |
---|
Branch pushed to git repo; I updated commit sha1. New commits:
64a1c2d | Use a less unusual character in the tests
|
comment:73 Changed 5 years ago by
Status: | needs_work → needs_review |
---|
comment:74 Changed 5 years ago by
maybe you should add r"""
} when the doctests contains something that can perturb the pdf doc ?
comment:75 Changed 5 years ago by
Ah, there are some slashes in there too; I should make it an r-string just in case.
comment:76 Changed 5 years ago by
Commit: | 64a1c2d58713636a51c60e3be508204c88513658 → 1bf6709cb5b256537bab4058ba3fa578996f521b |
---|
Branch pushed to git repo; I updated commit sha1. New commits:
1bf6709 | Use r-strings for the docstrings since they contain slashes
|
comment:77 Changed 5 years ago by
Have a look at build/pkgs/texlive/package-list.txt
You can add custom codepoints in src/doc/common/conf.py
comment:79 Changed 5 years ago by
Breaks the python 3.6 build for me in sage-on-gentoo
[477/483] Cythonizing sage/symbolic/function.pyx Error compiling Cython file: ------------------------------------------------------------ ... err = NULL # implies "strict" else: err = PyUnicode_AsUTF8(errors) if encoding is None: return PyUnicode_DecodeLocale(c, err) ^ ------------------------------------------------------------ sage/cpython/string.pxd:37:41: Cannot convert Unicode string to 'str' implicitly. This is not portable and requires explicit encoding. Traceback (most recent call last): File "/usr/lib64/python3.6/site-packages/Cython/Build/Dependencies.py", line 1179, in cythonize_one_helper return cythonize_one(*m) File "/usr/lib64/python3.6/site-packages/Cython/Build/Dependencies.py", line 1161, in cythonize_one raise CompileError(None, pyx_file) Cython.Compiler.Errors.CompileError: sage/cpython/string.pyx
cython 0.27.3 and python-3.6.3. Since this is from python3 specific section it won't show up on Volker's build bots.
comment:80 Changed 5 years ago by
Status: | positive_review → needs_work |
---|
Hrm. It seems I never actually merged/tested Jeroen's changes on my Python 3 branch. I assumed he tested it.
comment:81 Changed 5 years ago by
Commit: | 1bf6709cb5b256537bab4058ba3fa578996f521b → 2bb839655c50259c8beec273cab5773eb5158265 |
---|
Branch pushed to git repo; I updated commit sha1. New commits:
2bb8396 | Clean up these declarations:
|
comment:82 Changed 5 years ago by
Status: | needs_work → needs_review |
---|
comment:83 Changed 5 years ago by
Status: | needs_review → needs_work |
---|
Instead of removing the return type specifications, it would be better to fix them. Replacing the unicode
return value by str
should work.
comment:85 Changed 5 years ago by
Commit: | 2bb839655c50259c8beec273cab5773eb5158265 → 6d73a416b18f9d0edec280009495ae77f7b0a49a |
---|
Branch pushed to git repo; I updated commit sha1. New commits:
6d73a41 | Use more specific type declarations here.
|
comment:86 Changed 5 years ago by
Status: | needs_work → needs_review |
---|
comment:88 Changed 5 years ago by
Status: | needs_review → positive_review |
---|
one green bot. I am setting to positive this important ticket.
comment:90 Changed 5 years ago by
Volker, do you by chance have any idea about what was the conflicting ticket ?
comment:91 follow-up: 94 Changed 5 years ago by
Merge conflict with what? If there were a normal "master" branch into which tickets were merged regularly against which I could compare that would be one thing, but you can't just secretly merge a bunch of tickets all at once, claim "merge conflict", and expect me to guess what it conflicts with.
comment:92 Changed 5 years ago by
Commit: | 6d73a416b18f9d0edec280009495ae77f7b0a49a → dec9f3ac79a992d3152366ae266f1065c70c7867 |
---|
Branch pushed to git repo; I updated commit sha1. This was a forced push. New commits:
a285af4 | Add new string conversion utilities.
|
5b3719a | Add support for optional error handling.
|
bf5f893 | Make type checks explicit
|
1bd27fc | Require a const char*--this is more consistent for use in cases where
|
bd2d931 | Use a less unusual character in the tests
|
b32f000 | Use r-strings for the docstrings since they contain slashes
|
ccf7bcd | Clean up these declarations:
|
dec9f3a | Use more specific type declarations here.
|
comment:93 Changed 5 years ago by
Rebased on current develop branch if it helps, but there was no merge conflict.
New commits:
a285af4 | Add new string conversion utilities.
|
5b3719a | Add support for optional error handling.
|
bf5f893 | Make type checks explicit
|
1bd27fc | Require a const char*--this is more consistent for use in cases where
|
bd2d931 | Use a less unusual character in the tests
|
b32f000 | Use r-strings for the docstrings since they contain slashes
|
ccf7bcd | Clean up these declarations:
|
dec9f3a | Use more specific type declarations here.
|
comment:94 Changed 5 years ago by
Replying to embray:
Merge conflict with what? If there were a normal "master" branch into which tickets were merged regularly against which I could compare that would be one thing, but you can't just secretly merge a bunch of tickets all at once, claim "merge conflict", and expect me to guess what it conflicts with.
It's not secret: https://github.com/vbraun/sage/tree/develop
comment:95 Changed 5 years ago by
Dependencies: | #24246 |
---|---|
Status: | needs_work → positive_review |
I don't see any conflict...
comment:96 Changed 5 years ago by
Branch: | u/embray/python3/string-conversions → dec9f3ac79a992d3152366ae266f1065c70c7867 |
---|---|
Resolution: | → fixed |
Status: | positive_review → closed |
Branch pushed to git repo; I updated commit sha1. New commits:
Add new string conversion utilities.