#30053 closed defect (fixed)
Python 3.7+: setlocale: LC_ALL: cannot change locale (C.UTF-8) from build/bin/sage-spkg and in doctests; disable use of system Python 3.6
Reported by: | dimpase | Owned by: | |
---|---|---|---|
Priority: | blocker | Milestone: | sage-9.2 |
Component: | build | Keywords: | |
Cc: | arojas, mkoeppe, slelievre, mjo, fbissey, embray, chapoton | Merged in: | |
Authors: | Dima Pasechnik, Matthias Koeppe | Reviewers: | Matthias Koeppe, Dima Pasechnik |
Report Upstream: | N/A | Work issues: | |
Branch: | be47518 (Commits, GitHub, GitLab) | Commit: | |
Dependencies: | Stopgaps: |
Description (last modified by )
Change History (95)
comment:1 Changed 2 years ago by
- Cc arojas added
comment:2 Changed 2 years ago by
- Cc mkoeppe added
comment:3 follow-up: ↓ 5 Changed 2 years ago by
- Cc arojas mkoeppe removed
comment:4 Changed 2 years ago by
- Cc arojas mkoeppe added
- Component changed from PLEASE CHANGE to build
- Type changed from PLEASE CHANGE to defect
comment:5 in reply to: ↑ 3 Changed 2 years ago by
Replying to dimpase:
an easy way out is just to check whether the locale change worked, and if not, use C locale, not C.UTF-8.
+1
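The "check whether the locale change worked" idea quoted above can be sketched in Python. This is only an illustration, not the code on the branch (which is a shell snippet); the helper name pick_locale is hypothetical.

```python
import locale


def pick_locale(candidates=("C.UTF-8", "C")):
    """Return the first candidate name that setlocale() accepts, or None.

    Hypothetical helper for illustration only. The previous locale is
    restored before returning, so the probe has no lasting side effect.
    """
    saved = locale.setlocale(locale.LC_ALL)  # query the current setting
    try:
        for name in candidates:
            try:
                locale.setlocale(locale.LC_ALL, name)
                return name
            except locale.Error:
                # This name is not available on the system; try the next one.
                continue
        return None
    finally:
        locale.setlocale(locale.LC_ALL, saved)


print(pick_locale())
```

On a system without C.UTF-8 this is expected to fall through to C, which is the fallback proposed in the comment.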
comment:6 Changed 2 years ago by
- Branch set to u/dimpase/build/careful_with_C_UTF8
- Commit set to 37e042c44d71e841e19fcf70b6995a0f9b3c4ec2
- Status changed from new to needs_review
New commits:
37e042c | only use locale C.UTF-8 if available, else C
comment:7 Changed 2 years ago by
comment:8 follow-up: ↓ 17 Changed 2 years ago by
I started test runs:
comment:9 Changed 2 years ago by
this ticket was a result of a bug report on Arch, not centos. Hopefully it works for #30008 too.
comment:10 Changed 2 years ago by
Tests did not complete, because the 9.2.beta3 tests fail everywhere.
https://github.com/sagemath/sage/actions/runs/157607524
Is this a github issue or have we broken sage?
Step 13/18 : RUN ./bootstrap
 ---> Running in 89427ef5c1c4
rm -rf config configure build/make/Makefile-auto.in
rm -f src/doc/en/installation/*.txt
rm -rf src/doc/en/reference/spkg/*.rst
rm -f src/doc/en/reference/repl/*.txt
src/doc/bootstrap:48: installing src/doc/en/installation/arch.txt and src/doc/en/installation/arch-optional.txt
src/doc/bootstrap:48: installing src/doc/en/installation/debian.txt and src/doc/en/installation/debian-optional.txt
src/doc/bootstrap:48: installing src/doc/en/installation/fedora.txt and src/doc/en/installation/fedora-optional.txt
src/doc/bootstrap:48: installing src/doc/en/installation/cygwin.txt and src/doc/en/installation/cygwin-optional.txt
src/doc/bootstrap:48: installing src/doc/en/installation/homebrew.txt and src/doc/en/installation/homebrew-optional.txt
src/doc/bootstrap:55: installing src/doc/en/reference/spkg/*.rst
src/doc/bootstrap:83: installing src/doc/en/reference/repl/options.txt
src/doc/bootstrap: line 84: src/doc/en/reference/repl/options.txt: No such file or directory
The command '/bin/sh -c ./bootstrap' returned a non-zero code: 1
comment:11 Changed 2 years ago by
I just found #30064. Edit: I was cc on that, but I didn't realize how serious this is.
Ok. I'll run a new test then.
comment:12 follow-up: ↓ 14 Changed 2 years ago by
- Status changed from needs_review to needs_work
This breaks building sphinx on windows.
https://github.com/kliem/sage/runs/838933940
Same error as #30008.
As far as I understand the problem is that we need some sort of UTF to make the sphinx build work.
It appears that on cygwin the default is better than C and C.UTF-8 does not work.
So maybe C is not the best alternative for C.UTF-8.
Btw, strangely centos 7 appears to work with the current beta. I don't know what happened. (And I don't know yet, if this behavior is stable).
comment:13 Changed 2 years ago by
And it breaks centos 8.
comment:14 in reply to: ↑ 12 Changed 2 years ago by
Replying to gh-kliem:
This breaks building sphinx on windows.
https://github.com/kliem/sage/runs/838933940
Same error as #30008.
As far as I understand the problem is that we need some sort of UTF to make the sphinx build work. It appears that on cygwin the default is better than C and C.UTF-8 does not work. So maybe C is not the best alternative for C.UTF-8.
I'm not sure what you mean here. C.UTF-8 is supported on Cygwin and is in fact the default locale in absence of any other settings: https://www.cygwin.com/cygwin-ug-net/setup-locale.html
The default locale in the absence of the aforementioned locale environment variables is "C.UTF-8".
comment:15 Changed 2 years ago by
the error UnicodeEncodeError: 'ascii' codec can't encode character '\xe4' in position 45: ordinal not in range(128):
2020-07-05T16:29:58.1676169Z [sphinx-3.0.4.p0] installing. Log file: /cygdrive/d/a/sage/sage/logs/pkgs/sphinx-3.0.4.p0.log
2020-07-05T16:30:01.7883565Z [sphinx-3.0.4.p0] error installing, exit status 1. End of log file:
2020-07-05T16:30:01.8141086Z [sphinx-3.0.4.p0] Found local metadata for sphinx-3.0.4.p0
2020-07-05T16:30:01.8155620Z [sphinx-3.0.4.p0] Attempting to download package Sphinx-3.0.4.tar.gz from mirrors
2020-07-05T16:30:01.8169727Z [sphinx-3.0.4.p0] http://mirrors.mit.edu/sage/spkg/upstream/sphinx/Sphinx-3.0.4.tar.gz
2020-07-05T16:30:01.8174798Z [sphinx-3.0.4.p0] [......................................................................]
2020-07-05T16:30:01.8191111Z [sphinx-3.0.4.p0] sphinx-3.0.4.p0
2020-07-05T16:30:01.8193080Z [sphinx-3.0.4.p0] ====================================================
2020-07-05T16:30:01.8197343Z [sphinx-3.0.4.p0] Setting up build directory for sphinx-3.0.4.p0
2020-07-05T16:30:01.8217424Z [sphinx-3.0.4.p0] Traceback (most recent call last):
2020-07-05T16:30:01.8221790Z [sphinx-3.0.4.p0]   File "/cygdrive/d/a/sage/sage/build/bin/sage-uncompress-spkg", line 23, in <module>
2020-07-05T16:30:01.8222267Z [sphinx-3.0.4.p0]     run()
2020-07-05T16:30:01.8222576Z [sphinx-3.0.4.p0]   File "/cygdrive/d/a/sage/sage/build/bin/../sage_bootstrap/uncompress/cmdline.py", line 72, in run
2020-07-05T16:30:01.8222857Z [sphinx-3.0.4.p0]     unpack_archive(archive, dirname)
2020-07-05T16:30:01.8223251Z [sphinx-3.0.4.p0]   File "/cygdrive/d/a/sage/sage/build/bin/../sage_bootstrap/uncompress/action.py", line 68, in unpack_archive
2020-07-05T16:30:01.8223583Z [sphinx-3.0.4.p0]     archive.extractall(members=archive.names)
2020-07-05T16:30:01.8223861Z [sphinx-3.0.4.p0]   File "/cygdrive/d/a/sage/sage/build/bin/../sage_bootstrap/uncompress/tar_file.py", line 96, in extractall
2020-07-05T16:30:01.8224117Z [sphinx-3.0.4.p0]     **kwargs)
2020-07-05T16:30:01.8224323Z [sphinx-3.0.4.p0]   File "/usr/lib/python3.6/tarfile.py", line 2010, in extractall
2020-07-05T16:30:01.8224793Z [sphinx-3.0.4.p0]     numeric_owner=numeric_owner)
2020-07-05T16:30:01.8225151Z [sphinx-3.0.4.p0]   File "/usr/lib/python3.6/tarfile.py", line 2052, in extract
2020-07-05T16:30:01.8225442Z [sphinx-3.0.4.p0]     numeric_owner=numeric_owner)
2020-07-05T16:30:01.8225898Z [sphinx-3.0.4.p0]   File "/cygdrive/d/a/sage/sage/build/bin/../sage_bootstrap/uncompress/tar_file.py", line 122, in _extract_member
2020-07-05T16:30:01.8226166Z [sphinx-3.0.4.p0]     **kwargs)
2020-07-05T16:30:01.8226601Z [sphinx-3.0.4.p0]   File "/usr/lib/python3.6/tarfile.py", line 2122, in _extract_member
2020-07-05T16:30:01.8226883Z [sphinx-3.0.4.p0]     self.makefile(tarinfo, targetpath)
2020-07-05T16:30:01.8227488Z [sphinx-3.0.4.p0]   File "/usr/lib/python3.6/tarfile.py", line 2163, in makefile
2020-07-05T16:30:01.8227940Z [sphinx-3.0.4.p0]     with bltn_open(targetpath, "wb") as target:
2020-07-05T16:30:01.8228249Z [sphinx-3.0.4.p0] UnicodeEncodeError: 'ascii' codec can't encode character '\xe4' in position 45: ordinal not in range(128)
2020-07-05T16:30:01.8228542Z [sphinx-3.0.4.p0] ************************************************************************
2020-07-05T16:30:01.8228988Z [sphinx-3.0.4.p0] Error: failed to extract /cygdrive/d/a/sage/sage/upstream/Sphinx-3.0.4.tar.gz
2020-07-05T16:30:01.8229264Z [sphinx-3.0.4.p0] ************************************************************************
2020-07-05T16:30:01.8229537Z [sphinx-3.0.4.p0] Full log file: /cygdrive/d/a/sage/sage/logs/pkgs/sphinx-3.0.4.p0.log
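The failure in the log above happens when a file name containing the character '\xe4' (ä) has to be encoded with an ASCII filesystem encoding. A minimal sketch of that mechanism follows; the member name is a made-up placeholder, not the actual path from the Sphinx tarball, and the helper encodable is hypothetical.

```python
import sys


def encodable(filename, encoding):
    """Return True if filename can be encoded with the given codec."""
    try:
        filename.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False


# Hypothetical archive member name containing 'ä' (U+00E4).
name = "example-dir/b\xe4r.txt"

print(encodable(name, "ascii"))   # False
print(encodable(name, "utf-8"))   # True

# The codec Python actually uses for file names in this process:
print(sys.getfilesystemencoding())
```

Under an ASCII-only locale on Python 3.6 the last line would be expected to report an ASCII codec, which is the situation the traceback shows.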
comment:16 Changed 2 years ago by
could it be that locale on Cygwin is not installed by default?
However, https://www.cygwin.com/cygwin-ug-net/setup-locale.html says:
Note For a list of locales supported by your Windows machine, use the new locale -a command, which is part of the Cygwin package. For a description see locale(1)
comment:17 in reply to: ↑ 8 Changed 2 years ago by
Replying to gh-kliem:
I started test runs:
I just started rerunning those tests on top of the current beta. Maybe that stuff just goes away by itself.
comment:18 Changed 2 years ago by
Still causes this error.
comment:19 Changed 2 years ago by
If the centos issue is caused by the sphinx upgrade (according to #30008), why is it blocking this? This is meant to fix another (very annoying) issue on Arch.
comment:20 Changed 2 years ago by
It appears that #30008 fixed itself. However, this here broke the cygwin sphinx build, last I checked.
I have no clue what is going on, but with this ticket we go from passing to failing.
comment:21 Changed 2 years ago by
It seems a default setting on Cygwin is LANG=en_US.UTF-8. Perhaps we can try to only set LC_ALL if LANG is not already set or something like this.
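The "only set LC_ALL if LANG is not already set" suggestion could look roughly like this when preparing the environment for a child process. This is a sketch under assumptions: the function name build_env is hypothetical, and treating an existing LC_ALL the same way as LANG is an extra assumption beyond the comment.

```python
def build_env(environ, default="C.UTF-8"):
    """Return a copy of environ, adding LC_ALL only if no locale is configured.

    Hypothetical helper: an existing LANG (or LC_ALL) is left untouched, so a
    user setting such as LANG=en_US.UTF-8 keeps working.
    """
    env = dict(environ)
    if not env.get("LANG") and not env.get("LC_ALL"):
        env["LC_ALL"] = default
    return env


print(build_env({"LANG": "en_US.UTF-8"}))
print(build_env({"PATH": "/usr/bin"}))
```

The result could be passed as the env argument of subprocess.run; whether this is sufficient on Cygwin is exactly what the comment leaves open.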
comment:22 Changed 2 years ago by
Also it should be investigated whether it was really necessary to add this line in #29033 to achieve Python 3.6 support. In particular note that sage-uncompress-spkg uses sage-system-python (which can even be python2) -- which really has nothing to do with Python 3.6 support (which is about PYTHON_FOR_VENV).
comment:23 Changed 2 years ago by
- Cc slelievre added
comment:24 Changed 2 years ago by
- Cc mjo fbissey added
comment:25 Changed 2 years ago by
What problems arise if we drop the locale mangling entirely? Trac #15791 doesn't mention a problem.
comment:26 Changed 23 months ago by
- Priority changed from major to blocker
comment:27 Changed 23 months ago by
- Description modified (diff)
comment:28 Changed 23 months ago by
- Summary changed from setlocale: LC_ALL: cannot change locale (C.UTF-8) from build/bin/sage-spkg to setlocale: LC_ALL: cannot change locale (C.UTF-8) from build/bin/sage-spkg and in doctests
comment:29 Changed 23 months ago by
- Description modified (diff)
comment:30 Changed 23 months ago by
I am not sure if this is related, but while compiling Cypari on macOS, every file gives a warning of this type:
perl: warning: Setting locale failed.
perl: warning: Please check that your locale settings:
    LC_ALL = "C.UTF-8",
    LC_TERMINAL = "iTerm2",
    LANG = "de_DE.UTF-8"
    are supported and installed on your system.
perl: warning: Falling back to a fallback locale ("de_DE.UTF-8").
The fallback that is used seems to work for me, though.
comment:31 follow-up: ↓ 33 Changed 23 months ago by
- Cc embray added
I just did a fresh build with LC_ALL=C and see no outstanding problems (python-3.7.8).
Maybe we should just revert that line? Why rock the boat?
Erik is the only other person who might know why it was added.
comment:32 Changed 23 months ago by
This will need testing on centos-8 with Python 3.6
comment:33 in reply to: ↑ 31 ; follow-up: ↓ 34 Changed 22 months ago by
Replying to mjo:
I just did a fresh build with LC_ALL=C and see no outstanding problems (python-3.7.8). Maybe we should just revert that line? Why rock the boat?
No. This was added for reasons. Specifically to ensure compatibility between how Python 3.6 and Python 3.7 set the default encoding. Without this, there were bugs on Python 3.6 with Python not using a unicode character encoding by default. See https://www.python.org/dev/peps/pep-0538/
The simplest way to deal with this problem for currently released versions of CPython is to explicitly set a more sensible locale when launching the application. For example:
LC_CTYPE=C.UTF-8 python3 ...
The C.UTF-8 locale is a full locale definition that uses UTF-8 for the LC_CTYPE category, and the same settings as the C locale for all other categories (including LC_COLLATE). It is offered by a number of Linux distributions (including Debian, Ubuntu, Fedora, Alpine and Android) as an alternative to the ASCII-based C locale. Some other platforms (such as HP-UX) offer an equivalent locale definition under the name C.utf8.
Mac OS X and other *BSD systems have taken a different approach: instead of offering a C.UTF-8 locale, they offer a partial UTF-8 locale that only defines the LC_CTYPE category. On such systems, the preferred environmental locale adjustment is to set LC_CTYPE=UTF-8 rather than to set LC_ALL or LANG.
Perhaps this should also try the LC_CTYPE=UTF-8 mentioned here. Otherwise Dima's approach makes sense, though it can't guarantee that everything will just work on those systems. I can't recall exactly what broke without it but I do recall there was something. With Python 3.7 (the default when not using the system Python) this shouldn't be a problem since Python will basically force a UTF-8 locale for itself.
comment:34 in reply to: ↑ 33 ; follow-up: ↓ 36 Changed 22 months ago by
Replying to embray:
Perhaps this should also try the LC_CTYPE=UTF-8 mentioned here. Otherwise Dima's approach makes sense, though it can't guarantee that everything will just work on those systems. I can't recall exactly what broke without it but I do recall there was something. With Python 3.7 (the default when not using the system Python) this shouldn't be a problem since Python will basically force a UTF-8 locale for itself.
How about only doing this export LC_...= on Python 3.x with x<7?
This should in particular make Arch people (apparently Arch has no C.UTF-8 or a similar locale, everything UTF-8 there is language-specific) happy, as their Python is new enough.
comment:35 Changed 22 months ago by
Ok, thanks for the information. I think the major take-away from PEP538 is,
With this change, any *nix platform that does not offer at least one of the C.UTF-8, C.utf8 or UTF-8 locales as part of its standard configuration would only be considered a fully supported platform for CPython 3.7+ deployments when a suitable locale other than the default C locale is configured explicitly (e.g. en_AU.UTF-8, zh_CN.gb18030).
I'm pretty sure we have files that actually need the UTF-8 encoding by now, so that rules out the possibility of "doing nothing" (leaving LC_ALL=C or LC_CTYPE=C) on python-3.6. And if we want to make python-3.6 work the way that python-3.7 does, then we're in the same situation as upstream is with respect to C.UTF-8: we have to consider python-3.6 with no C.UTF-8 (or equivalent) unsupported.
So I see two real options left:
- Try to set the locale to C.UTF-8 or C.utf8 or UTF-8 when python-3.6 is being used, and declare the system unsupported if we can't. If python-3.7+ is being used, we can set the locale to C, and it will coerce the locale to something utf8ish on its own. In either case, a lack of utf8 locale would be unsupported.
- Don't set a locale at all, and rely on the distribution/user to set a utf8 locale by default. This would result in some confusing grep/sort behavior (they're locale-dependent), but maybe we could be extra careful in our SPKGs to work around that, so that e.g. en_US.UTF-8 would work too.
Long-term, as one of the largest python projects in existence, I think we probably have to suck it up and go with (1), even though it pains me to require a locale that glibc doesn't even ship and isn't POSIX. Whatever decisions python makes, we're stuck with.
comment:36 in reply to: ↑ 34 Changed 22 months ago by
Replying to dimpase:
How about only doing this export LC_...= on Python 3.x with x<7? This should in particular make Arch people (apparently Arch has no C.UTF-8 or a similar locale, everything UTF-8 there is language-specific) happy, as their Python is new enough.
I think a combination of this and the current branch is the best we can do. On python-3.6, we should try to set LC_ALL to C.UTF-8, C.utf8, or UTF-8. If we can't, then we should leave it alone and pray that the user's locale is compatible with all of our SPKGs. That situation would be unsupported by sage.
On python-3.7+, we can set LC_ALL=C, and python itself will try to pick an appropriate UTF-8 version of the locale. What does python on arch do in this situation? It's possible that python itself will fail to find a suitable UTF-8 locale, but there's not a lot we can do if upstream python insists on a nonstandard locale. Arch will just have to reconsider their decision unless they want to be not-fully supported by upstream python-3.7+.
comment:37 Changed 22 months ago by
Personally I don't care what glibc or POSIX say on this. I think 1) is a fine option.
comment:38 Changed 22 months ago by
Apparently we carry the C.UTF-8 patch in Gentoo for systemd, who definitely don't care about portability:
comment:39 Changed 22 months ago by
Is anyone working on this?
comment:40 follow-up: ↓ 41 Changed 22 months ago by
- Priority changed from blocker to major
Sage-the-python-library can't realistically support anything that CPython does not; it's 2020, who in their right mind doesn't support utf-8? Better diagnostics for non-compliant systems would be great but imho not a blocker.
comment:41 in reply to: ↑ 40 Changed 22 months ago by
Replying to vbraun:
Sage-the-python-library can't realistically support anything that CPython does not; it's 2020, who in their right mind doesn't support utf-8? Better diagnostics for non-compliant systems would be great but imho not a blocker.
These systems do support UTF-8, but not the (as of yet) non-standard C.UTF-8 locale.
The current branch has the right idea, but since python-3.6 and python-3.7 act differently, it can be made a bit more precise. With python-3.7+, we can set LC_ALL=C and let python do the guessing. (Maybe it doesn't succeed, but officially Not Our Problem at that point.) With python-3.6, we can check for the C.UTF-8 locale and set it when found, with a fallback to LC_ALL=C. The current branch does this unconditionally, but it should only do it for python-3.6 and we should check the other equivalent names C.utf8 and UTF-8 too.
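The version-dependent rule described above can be written down as a small decision function. This is a sketch of the proposal only, not code from the branch; choose_lc_all and its parameters are hypothetical names, and the list of available locales is assumed to come from something like locale -a.

```python
# Names under which a C-based UTF-8 locale may be offered, per PEP 538.
UTF8_C_NAMES = ("C.UTF-8", "C.utf8", "UTF-8")


def choose_lc_all(python_version, available):
    """Pick a value for LC_ALL.

    python_version: (major, minor) tuple of the Python that will be run.
    available: iterable of locale names installed on the system.
    Hypothetical helper illustrating the rule proposed in the comment.
    """
    if python_version >= (3, 7):
        # Python 3.7+ coerces the C locale to a UTF-8 one on its own.
        return "C"
    installed = set(available)
    for name in UTF8_C_NAMES:
        if name in installed:
            return name
    # No C-based UTF-8 locale found: fall back to plain C.
    return "C"


print(choose_lc_all((3, 6), ["C", "C.utf8", "POSIX"]))
print(choose_lc_all((3, 8), ["C", "POSIX", "en_US.utf8"]))
```

The fallback branch corresponds to the case the thread calls unsupported: Python 3.6 on a system without any of the three names.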
I think it's worthwhile to not output a million scary error messages in the sage-9.2 release on these systems that have done nothing wrong. At the very least, we owe it to the Arch maintainers who do a lot for sage and would have to field the resulting bug reports (or patch this themselves). I'm sure there are BSDs where this is problematic too.
comment:42 Changed 22 months ago by
Arch locales maintainers just need to get C.UTF-8 locale, they are being silly (their argument - "it's an evil coming from Debian" - and I'm told they don't reopen the corresponding issue, as it's "decided". Meanwhile everybody else has C.UTF-8 locale, it's just them who don't)
comment:43 Changed 22 months ago by
- Priority changed from major to blocker
It's a blocker because it is a regression regarding platform support.
Can we please get a fix done?
comment:44 Changed 22 months ago by
- Commit changed from 37e042c44d71e841e19fcf70b6995a0f9b3c4ec2 to e5f6663372fab14c6a386004a925de59be216f31
Branch pushed to git repo; I updated commit sha1. This was a forced push. New commits:
e5f6663 | only use locale C.UTF-8 if available, else C
comment:45 Changed 22 months ago by
rebased over the latest beta
comment:46 Changed 22 months ago by
- Description modified (diff)
comment:47 Changed 22 months ago by
Using a freshly installed Ubuntu 18.04 (bionic) and with some French settings set somewhere so that $ git pull returns Déjà à jour, a French equivalent for Already up to date, running make on 9.2.beta12 yields the [dochtml] UnicodeDecodeError: 'ascii' codec can't decode byte 0xe2 in position 2661: ordinal not in range(128) error, see this post, right from the start of the build of the documentation.
With beta12 + current branch, I still get the same error after make doc-clean && make.
comment:48 Changed 22 months ago by
That being said, I now see where it comes from:
sage: with open('src/doc/en/reference/references/index.rst', 'r') as f: s = f.read()
sage: s[2600:2700]
' characteristic,* The Open Book Series, vol. 2, no. 1, pp. 37–53, Jan. 2019.\n\n.. [ABZ2007] \\R. Aharo'
sage: s[2661:2700]
'–53, Jan. 2019.\n\n.. [ABZ2007] \\R. Aharo'
The character – has many occurrences in that file and is possibly not the only non-ASCII character there (for example in names of authors...). So, I am not sure replacing them by -- is the correct fix. And I don't see how this is related at all to the C.UTF-8 configuration.
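The byte 0xe2 in the error message is consistent with the en dash found at that position: in UTF-8, U+2013 is encoded as the three bytes e2 80 93. A small sketch that locates the first non-ASCII byte in a byte string (first_non_ascii is a hypothetical helper name):

```python
def first_non_ascii(data):
    """Return (offset, byte value) of the first non-ASCII byte, or None."""
    for offset, value in enumerate(data):
        if value >= 0x80:
            return offset, value
    return None


dash = "\u2013"                      # EN DASH, as in "pp. 37–53"
print(dash.encode("utf-8"))          # b'\xe2\x80\x93'

sample = "pp. 37\u201353".encode("utf-8")
offset, value = first_non_ascii(sample)
print(offset, hex(value))            # 6 0xe2
```

Run on the raw bytes of a file, the same helper would report the offset that the ASCII decoder complains about.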
comment:49 Changed 22 months ago by
So your machine has no C.UTF-8 locale installed, right?
comment:50 Changed 22 months ago by
How can I figure this out? If it can help, I have this:
$ locale
LANG=fr_CA.UTF-8
LANGUAGE=fr_CA:fr_FR:en_GB:en
LC_CTYPE="fr_CA.UTF-8"
LC_NUMERIC=fr_FR.UTF-8
LC_TIME=fr_FR.UTF-8
LC_COLLATE="fr_CA.UTF-8"
LC_MONETARY=fr_FR.UTF-8
LC_MESSAGES="fr_CA.UTF-8"
LC_PAPER=fr_FR.UTF-8
LC_NAME=fr_FR.UTF-8
LC_ADDRESS=fr_FR.UTF-8
LC_TELEPHONE=fr_FR.UTF-8
LC_MEASUREMENT=fr_FR.UTF-8
LC_IDENTIFICATION=fr_FR.UTF-8
LC_ALL=
slabbe@miami ~ $ man locale
slabbe@miami ~ $ locale -a
C
C.UTF-8
en_AG
en_AG.utf8
en_AU.utf8
en_BW.utf8
en_CA.utf8
en_DK.utf8
en_GB.utf8
en_HK.utf8
en_IE.utf8
en_IL
en_IL.utf8
en_IN
en_IN.utf8
en_NG
en_NG.utf8
en_NZ.utf8
en_PH.utf8
en_SG.utf8
en_US.utf8
en_ZA.utf8
en_ZM
en_ZM.utf8
en_ZW.utf8
fr_BE.utf8
fr_CA.utf8
fr_CH.utf8
french
fr_FR
fr_FR.iso88591
fr_FR.utf8
fr_LU.utf8
POSIX
comment:51 follow-up: ↓ 52 Changed 22 months ago by
in
+if test x`locale -a | grep C\.UTF-8` != x; then
+    export LC_ALL=C.UTF-8;
+else
+    export LC_ALL=C;
+fi
bit of this branch, could you change both LC_ALL to LC_CTYPE and try if it helps?
comment:52 in reply to: ↑ 51 Changed 22 months ago by
Replying to dimpase:
bit of this branch, could you change both LC_ALL to LC_CTYPE and try if it helps?
Same error after make doc-clean and make.
comment:53 Changed 22 months ago by
- Cc chapoton added
To me, it seems like an error of the following kind: we are opening the src/doc/en/reference/references/index.rst file as a bytes type, and at some place (where?), we decode the bytes to ascii and then we get UnicodeDecodeError because it is not ascii at all. Here is a way to reproduce the same error message:
sage: with open('src/doc/en/reference/references/index.rst', 'rb') as f: b = f.read()
sage: b.decode('ascii')
---------------------------------------------------------------------------
UnicodeDecodeError                        Traceback (most recent call last)
<ipython-input-13-498050d5a3fb> in <module>
----> 1 b.decode('ascii')

UnicodeDecodeError: 'ascii' codec can't decode byte 0xe2 in position 2661: ordinal not in range(128)
sage: s = b.decode('utf-8')
I am adding Frédéric in cc since he fixed a lot of those UnicodeDecodeError in recent times with the passage to Python 3.
Also, why does this work in Python 3.8 and not in Python 3.6?
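One way to investigate the difference between Python versions is to print the encodings each interpreter picks by default. In text mode, open() without an encoding argument uses locale.getpreferredencoding(False), and PEP 538 (locale coercion) and PEP 540 (UTF-8 mode) only arrived in Python 3.7. The sketch below merely reports the values; encoding_report is a hypothetical helper name.

```python
import locale
import sys


def encoding_report():
    """Collect the encodings this interpreter uses by default."""
    return {
        "python": "%d.%d" % sys.version_info[:2],
        # Encoding used by open() in text mode when none is given.
        "open() default": locale.getpreferredencoding(False),
        # Encoding used for file names.
        "filesystem": sys.getfilesystemencoding(),
    }


for key, value in encoding_report().items():
    print(key, "=", value)
```

Running this with LC_ALL=C under a system Python 3.6 and again under Python 3.7 or later should make the difference visible: one would expect an ASCII codec in the first case and a UTF-8 one in the second.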
comment:54 follow-up: ↓ 56 Changed 22 months ago by
perhaps Python 3.6 needs an extra punch in form of
--- a/src/sage/docs/conf.py
+++ b/src/sage/docs/conf.py
@@ -36,6 +36,7 @@ plot_pre_code = """
 # Set locale to prevent having commas in decimal numbers
 # in tachyon input (see https://trac.sagemath.org/ticket/28971)
 import locale
+locale.setlocale(LC_ALL, '')
 locale.setlocale(locale.LC_NUMERIC, 'C')
 def sphinx_plot(graphics, **kwds):
     import matplotlib.image as mpimg
could you try this, without what's suggested in comment:51 ?
There is a comment
According to POSIX, a program which has not called setlocale(LC_ALL, '') runs using the portable 'C' locale. Calling setlocale(LC_ALL, '') lets it use the default locale as defined by the LANG variable. Since we do not want to interfere with the current locale setting we thus emulate the behavior in the way described above.
in https://docs.python.org/3.6/library/locale.html which makes me think it might be what's needed.
comment:55 Changed 22 months ago by
The name C.UTF-8 isn't sufficient; you have to check for the others too. On Gentoo:
$ locale -a
C
C.utf8
POSIX
en_US
en_US.iso88591
en_US.utf8
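Since the same locale is spelled C.UTF-8 on some systems and C.utf8 on others, a check against locale -a output has to normalize the names before comparing. A minimal sketch follows, with hypothetical helper names; it works on the text of the output, so it does not depend on which system it is run on.

```python
def normalize(name):
    """Lowercase and drop hyphens so 'C.UTF-8' and 'C.utf8' compare equal."""
    return name.strip().lower().replace("-", "")


def find_c_utf8(locale_a_output):
    """Return the local spelling of the C UTF-8 locale, or None if absent."""
    for line in locale_a_output.splitlines():
        if normalize(line) == "c.utf8":
            return line.strip()
    return None


gentoo = "C\nC.utf8\nPOSIX\nen_US\nen_US.iso88591\nen_US.utf8\n"
print(find_c_utf8(gentoo))
```

Comparing whole lines also avoids matching C.UTF-8 as a substring of some longer locale name.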
comment:56 in reply to: ↑ 54 Changed 22 months ago by
Replying to dimpase:
could you try this, without what's suggested in comment:51 ?
I get the same error after make doc-clean && make. In fact adding a line print('AAAAAAAAAAAAA') at this place does not print any AAAAAAAAA... on my screen, so I wonder if that code is really executed before the error occurs.
comment:57 Changed 22 months ago by
AAAAA, indeed (headbang on kbd) here is the fix, I think
--- a/src/doc/en/reference/conf_sub.py
+++ b/src/doc/en/reference/conf_sub.py
@@ -20,7 +20,7 @@ ref_src = os.path.join(SAGE_DOC_SRC, 'en', 'reference')
 ref_out = os.path.join(SAGE_DOC, 'html', 'en', 'reference')

 # We use the main document's title, if we can find it.
-rst_file = open('index.rst', 'r')
+rst_file = open('index.rst', 'r', encoding='utf-8')
 rst_lines = rst_file.read().splitlines()
 rst_file.close()

indeed, that's one place in our homebaked docbuilder that opens UTF-8-encoded files without specifying them as UTF, and Python 3.6 does not like it.
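The effect of passing encoding='utf-8' explicitly can be checked in isolation. The sketch below writes a small UTF-8 file to a temporary directory, reads it back with an explicit encoding, and then forces the ASCII codec to emulate what an ASCII default does; the file content and the helper read_lines are made up for the example.

```python
import os
import tempfile


def read_lines(path):
    """Read a text file as UTF-8 regardless of the process locale."""
    with open(path, "r", encoding="utf-8") as f:
        return f.read().splitlines()


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "index.rst")
    with open(path, "w", encoding="utf-8") as f:
        f.write("References\n==========\n\npp. 37\u201353\n")

    lines = read_lines(path)

    # Emulate an interpreter whose default text encoding is ASCII.
    try:
        with open(path, "r", encoding="ascii") as f:
            f.read()
        ascii_ok = True
    except UnicodeDecodeError:
        ascii_ok = False

print(lines[0], ascii_ok)
```

With the explicit encoding the read no longer depends on the locale, which is why the change helps on a system Python 3.6.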
comment:58 Changed 22 months ago by
there are also many more missing encoding='utf-8' in open() on source files in the docbuilder, and perhaps on other text files too. All these lead to these UnicodeDecodeError: 'ascii' codec can't decode byte...
Should we just abandon the idea to support random system Python 3.6, they are just broken in this way...
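Finding the remaining open() calls that lack an encoding argument could be automated with the standard ast module. The following is a rough sketch, assuming Python 3.8 or later to run the scan (it relies on ast.Constant); it only recognizes direct calls to the builtin name open with a literal mode, and the function names are hypothetical.

```python
import ast


def _is_binary(call):
    """True if the call passes a literal mode string containing 'b'."""
    mode = call.args[1] if len(call.args) >= 2 else None
    for kw in call.keywords:
        if kw.arg == "mode":
            mode = kw.value
    return (isinstance(mode, ast.Constant)
            and isinstance(mode.value, str)
            and "b" in mode.value)


def opens_without_encoding(source):
    """Return sorted line numbers of text-mode open() calls lacking encoding=."""
    hits = []
    for node in ast.walk(ast.parse(source)):
        if (isinstance(node, ast.Call)
                and isinstance(node.func, ast.Name)
                and node.func.id == "open"
                and not _is_binary(node)
                and not any(kw.arg == "encoding" for kw in node.keywords)):
            hits.append(node.lineno)
    return sorted(hits)


sample = (
    "f = open('a.rst', 'r')\n"
    "g = open('b.rst', 'r', encoding='utf-8')\n"
    "h = open('c.bin', 'rb')\n"
)
print(opens_without_encoding(sample))   # [1]
```

Calls such as io.open or pathlib's read_text are not covered by this sketch and would need separate handling.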
comment:59 Changed 22 months ago by
Worse, there are files with lots of utf-8 stuff, sitting in r"""...""" multiline comments, e.g. src/sage/algebras/lie_algebras/lie_algebra_element.pyx, and they cause problems too.
comment:60 Changed 22 months ago by
this is with Ubuntu 18.04 native Python 3.6:
[dochtml] [polynomia] WARNING: error while formatting arguments for sage.rings.polynomial.multi_polynomial_ideal.MPolynomialIdeal_singular_repr.complete_primary_decomposition: 'ascii' codec can't decode byte 0xc2 in position 4774: ordinal not in range(128)
[dochtml] [polynomia] WARNING: error while formatting arguments for sage.rings.polynomial.multi_polynomial_ideal.MPolynomialIdeal_singular_repr.genus: 'ascii' codec can't decode byte 0xc2 in position 4774: ordinal not in range(128)
[dochtml] [polynomia] WARNING: error while formatting arguments for sage.rings.polynomial.multi_polynomial_ideal.MPolynomialIdeal_singular_repr.primary_decomposition_complete: 'ascii' codec can't decode byte 0xc2 in position 4774: ordinal not in range(128)
[dochtml] [polynomia] /home/dima/sage/dev/sage/local/lib/python3.6/site-packages/sage/rings/polynomial/multi_polynomial_ideal.py:docstring of sage.rings.polynomial.multi_polynomial_ideal.MPolynomialIdeal.degree_of_semi_regularity:8: WARNING: Inline interpreted text or phrase reference start-string without end-string.
[dochtml] [polynomia] /home/dima/sage/dev/sage/local/lib/python3.6/site-packages/sage/rings/polynomial/multi_polynomial_ideal.py:docstring of sage.rings.polynomial.multi_polynomial_ideal.MPolynomialIdeal.degree_of_semi_regularity:9: WARNING: Block quote ends without a blank line; unexpected unindent.
[dochtml] [polynomia] /home/dima/sage/dev/sage/local/lib/python3.6/site-packages/sage/rings/polynomial/multi_polynomial_ideal.py:docstring of sage.rings.polynomial.multi_polynomial_ideal.MPolynomialIdeal.degree_of_semi_regularity:23: WARNING: Inline interpreted text or phrase reference start-string without end-string.
[dochtml] [polynomia] /home/dima/sage/dev/sage/local/lib/python3.6/site-packages/sage/rings/polynomial/multi_polynomial_ideal.py:docstring of sage.rings.polynomial.multi_polynomial_ideal.MPolynomialIdeal.degree_of_semi_regularity:24: WARNING: Block quote ends without a blank line; unexpected unindent.
[dochtml] [polynomia] /home/dima/sage/dev/sage/local/lib/python3.6/site-packages/sage/rings/polynomial/multi_polynomial_ideal.py:docstring of sage.rings.polynomial.multi_polynomial_ideal.MPolynomialIdeal_singular_repr.associated_primes:4: WARNING: Inline interpreted text or phrase reference start-string without end-string.
[dochtml] [polynomia] /home/dima/sage/dev/sage/local/lib/python3.6/site-packages/sage/rings/polynomial/multi_polynomial_ideal.py:docstring of sage.rings.polynomial.multi_polynomial_ideal.MPolynomialIdeal_singular_repr.associated_primes:7: WARNING: Block quote ends without a blank line; unexpected unindent.
[dochtml] [polynomia] /home/dima/sage/dev/sage/local/lib/python3.6/site-packages/sage/rings/polynomial/multi_polynomial_ideal.py:docstring of sage.rings.polynomial.multi_polynomial_ideal.MPolynomialIdeal_singular_repr.primary_decomposition:4: WARNING: Inline interpreted text or phrase reference start-string without end-string.
[dochtml] [polynomia] /home/dima/sage/dev/sage/local/lib/python3.6/site-packages/sage/rings/polynomial/multi_polynomial_ideal.py:docstring of sage.rings.polynomial.multi_polynomial_ideal.MPolynomialIdeal_singular_repr.primary_decomposition:6: WARNING: Block quote ends without a blank line; unexpected unindent.
[dochtml] [polynomia] WARNING: error while formatting arguments for sage.rings.polynomial.polynomial_element.Polynomial.hamming_weight: 'ascii' codec can't decode byte 0xc3 in position 8166: ordinal not in range(128)
[dochtml] [polynomia] WARNING: error while formatting arguments for sage.rings.polynomial.polynomial_element.Polynomial.inverse_series_trunc: 'ascii' codec can't decode byte 0xc3 in position 8166: ordinal not in range(128)
[dochtml] [polynomia] WARNING: error while formatting arguments for sage.rings.polynomial.polynomial_element.Polynomial.is_one: 'ascii' codec can't decode byte 0xc3 in position 8166: ordinal not in range(128)
[dochtml] [polynomia] WARNING: error while formatting arguments for sage.rings.polynomial.polynomial_element.Polynomial.is_term: 'ascii' codec can't decode byte 0xc3 in position 8166: ordinal not in range(128)
[dochtml] [polynomia] WARNING: error while formatting arguments for sage.rings.polynomial.polynomial_element.Polynomial.is_zero: 'ascii' codec can't decode byte 0xc3 in position 8166: ordinal not in range(128)
[dochtml] [polynomia] WARNING: error while formatting arguments for sage.rings.polynomial.polynomial_element.Polynomial.list: 'ascii' codec can't decode byte 0xc3 in position 8166: ordinal not in range(128)
[dochtml] [polynomia] WARNING: error while formatting arguments for sage.rings.polynomial.polynomial_element.Polynomial.number_of_terms: 'ascii' codec can't decode byte 0xc3 in position 8166: ordinal not in range(128)
[dochtml] [polynomia] WARNING: error while formatting arguments for sage.rings.polynomial.polynomial_element.Polynomial.truncate: 'ascii' codec can't decode byte 0xc3 in position 8166: ordinal not in range(128)
[dochtml] [polynomia] WARNING: error while formatting arguments for sage.rings.polynomial.polynomial_element.Polynomial_generic_dense.is_term: 'ascii' codec can't decode byte 0xc3 in position 8166: ordinal not in range(128)
[dochtml] [polynomia] WARNING: error while formatting arguments for sage.rings.polynomial.polynomial_element.Polynomial_generic_dense.list: 'ascii' codec can't decode byte 0xc3 in position 8166: ordinal not in range(128)
[dochtml] [polynomia] WARNING: error while formatting arguments for sage.rings.polynomial.polynomial_element.Polynomial_generic_dense.truncate: 'ascii' codec can't decode byte 0xc3 in position 8166: ordinal not in range(128)
[dochtml] [polynomia] WARNING: error while formatting arguments for sage.rings.polynomial.polynomial_element.generic_power_trunc: 'ascii' codec can't decode byte 0xc3 in position 8166: ordinal not in range(128)
[dochtml] [polynomia] WARNING: error while formatting arguments for sage.rings.polynomial.polynomial_element.Polynomial._mul_trunc_: 'ascii' codec can't decode byte 0xc3 in position 8166: ordinal not in range(128)
[dochtml] [polynomia] The inventory files are in local/share/doc/sage/inventory/en/reference/polynomial_rings.
[dochtml] Error building the documentation.
[dochtml] Traceback (most recent call last):
[dochtml]   File "/usr/lib/python3.6/runpy.py", line 193, in _run_module_as_main
[dochtml]     "__main__", mod_spec)
[dochtml]   File "/usr/lib/python3.6/runpy.py", line 85, in _run_code
[dochtml]     exec(code, run_globals)
[dochtml]   File "/home/dima/sage/dev/sage/local/lib/python3.6/site-packages/sage_setup/docbuild/__main__.py", line 2, in <module>
[dochtml]     main()
[dochtml]   File "/home/dima/sage/dev/sage/local/lib/python3.6/site-packages/sage_setup/docbuild/__init__.py", line 1730, in main
[dochtml]     builder()
[dochtml]   File "/home/dima/sage/dev/sage/local/lib/python3.6/site-packages/sage_setup/docbuild/__init__.py", line 344, in _wrapper
[dochtml]     getattr(get_builder(document), 'inventory')(*args, **kwds)
[dochtml]   File "/home/dima/sage/dev/sage/local/lib/python3.6/site-packages/sage_setup/docbuild/__init__.py", line 570, in _wrapper
[dochtml]     self._build_everything_except_bibliography(lang, format, *args, **kwds)
[dochtml]   File "/home/dima/sage/dev/sage/local/lib/python3.6/site-packages/sage_setup/docbuild/__init__.py", line 556, in _build_everything_except_bibliography
[dochtml]     build_many(build_ref_doc, non_references)
[dochtml]   File "/home/dima/sage/dev/sage/local/lib/python3.6/site-packages/sage_setup/docbuild/__init__.py", line 296, in build_many
[dochtml]     _build_many(target, args, processes=NUM_THREADS)
[dochtml]   File "/home/dima/sage/dev/sage/local/lib/python3.6/site-packages/sage_setup/docbuild/utils.py", line 291, in build_many
[dochtml]     raise worker_exc.original_exception
[dochtml] OSError: WARNING: error while formatting arguments for sage.rings.polynomial.multi_polynomial_ideal.MPolynomialIdeal_singular_repr.complete_primary_decomposition: 'ascii' codec can't decode byte 0xc2 in position 4774: ordinal not in range(128)
if anyone wants to sort all this out, be my guest.
comment:61 Changed 22 months ago by
please push your branch if you started to do some changes using utf-8 encodings. I will take a look tomorrow europe time.
comment:62 Changed 22 months ago by
Because build_many catches the error, it is difficult to debug. The following change makes the real error visible:
diff --git a/src/sage_setup/docbuild/__init__.py b/src/sage_setup/docbuild/__init__.py
index 0841f429e7..c9e08214f3 100644
--- a/src/sage_setup/docbuild/__init__.py
+++ b/src/sage_setup/docbuild/__init__.py
@@ -292,6 +292,8 @@ def build_many(target, args):
     Thin wrapper around `sage_setup.docbuild.utils.build_many` which uses the
     docbuild settings ``NUM_THREADS`` and ``ABORT_ON_ERROR``.
     """
+    for arg in args:
+        target(arg)
     try:
         _build_many(target, args, processes=NUM_THREADS)
     except BaseException as exc:
which is:
[dochtml]   File "/home/slabbe/GitBox/sage/local/lib/python3.6/site-packages/sage_setup/docbuild/__init__.py", line 296, in build_many
[dochtml]     target(arg)
[dochtml]   File "/home/slabbe/GitBox/sage/local/lib/python3.6/site-packages/sage_setup/docbuild/__init__.py", line 79, in build_ref_doc
[dochtml]     getattr(ReferenceSubBuilder(doc, lang), format)(*args, **kwds)
[dochtml]   File "/home/slabbe/GitBox/sage/local/lib/python3.6/site-packages/sage_setup/docbuild/__init__.py", line 763, in _wrapper
[dochtml]     for module_name in self.get_all_included_modules():
[dochtml]   File "/home/slabbe/GitBox/sage/local/lib/python3.6/site-packages/sage_setup/docbuild/__init__.py", line 927, in get_all_included_modules
[dochtml]     for module in self.get_modules(filename):
[dochtml]   File "/home/slabbe/GitBox/sage/local/lib/python3.6/site-packages/sage_setup/docbuild/__init__.py", line 1009, in get_modules
[dochtml]     lines = f.readlines()
[dochtml]   File "/usr/lib/python3.6/encodings/ascii.py", line 26, in decode
[dochtml]     return codecs.ascii_decode(input, self.errors)[0]
[dochtml] UnicodeDecodeError: 'ascii' codec can't decode byte 0xe2 in position 2661: ordinal not in range(128)
But indeed, I see that adding an encoding='utf-8' at this place is not sufficient.
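For readers who have not hit this before, the failure in f.readlines() can be reproduced in isolation. The sketch below is an illustration only: read_lines and demo are hypothetical helpers, not code from sage_setup, and the file content is made up. It writes one line containing U+2019, whose UTF-8 encoding starts with the byte 0xe2 seen in the traceback, then reads it back once with the ASCII codec (what Python 3.6 falls back to under a plain C locale) and once with an explicit encoding.

```python
import os
import tempfile


def read_lines(path, encoding="utf-8"):
    """Read a text file with an explicit encoding instead of the locale default."""
    with open(path, encoding=encoding) as f:
        return f.readlines()


def demo():
    """Return (ascii_read_succeeded, lines_read_as_utf8) for a small test file."""
    fd, path = tempfile.mkstemp(suffix=".rst")
    try:
        with os.fdopen(fd, "wb") as f:
            # U+2019 is encoded as the bytes e2 80 99 in UTF-8.
            f.write("Sage\u2019s modules\n".encode("utf-8"))
        try:
            read_lines(path, encoding="ascii")
            ascii_ok = True
        except UnicodeDecodeError:
            ascii_ok = False
        return ascii_ok, read_lines(path)
    finally:
        os.remove(path)
```

As noted above, fixing a single open() call this way does not make the whole docbuild work on Python 3.6; every place that decodes text implicitly would need the same treatment.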
comment:63 Changed 22 months ago by
It is not sufficient, and I get the codec errors on a file where I cleaned all non-ascii away (sic!).
I am giving up on this. I think the code is too new for Python 3.6, and I don't think we should invest time in a version that will be EOL by the end of 2021.
People who only have Python 3.6 can build Python.
comment:64 Changed 22 months ago by
We already have some code in Sage that was taken from Python 3.7, in src/sage/misc/sageinspect.py, which is used in docbuilding, perhaps it is too new, e.g.
def formatannotation(annotation, base_module=None):
    """
    This is taken from Python 3.7's inspect.py; the only change is to
    add documentation.
    ...
code from sageinspect.py is used in "formatting arguments", the error shown in comment:60.
comment:65 Changed 22 months ago by
I think I will surrender too. I posted the changes that need to be done on the branch u/slabbe/30053 to get Python 3.6 to work. Like you, I now get the error
[dochtml] OSError: WARNING: error while formatting arguments for sage.rings.polynomial.multi_polynomial_ideal.MPolynomialIdeal_singular_repr.complete_primary_decomposition: 'ascii' codec can't decode byte 0xc2 in position 4774: ordinal not in range(128)
comment:66 Changed 22 months ago by
- Dependencies set to #30551
- Status changed from needs_work to needs_review
this ticket helps if one is on python 3.x for x>6. As to 3.6, well, it's apparently hard, as Sage code and perhaps sphinx are too new for it.
comment:67 Changed 22 months ago by
On python >= 3.7, LC_ALL=C is sufficient, because python itself will attempt to find a UTF-8 version of that locale.
(It may be only a semantic difference, but then the non-existence of C.UTF-8 on some systems becomes Not Our Problem.)
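One way to see what a given interpreter actually does under LC_ALL=C is to ask a child process. The sketch below is an illustration only; probe_c_locale is a hypothetical helper, not part of Sage. Python 3.7 introduced UTF-8 mode (PEP 540) alongside the locale coercion of PEP 538, and the sketch does not try to tell which of the two is responsible on a given system. It only reports the encodings the child ends up with.

```python
import os
import subprocess
import sys


def probe_c_locale():
    """Start the current interpreter with LC_ALL=C and report its text encodings."""
    env = dict(os.environ)
    # Drop settings that would override the interpreter's own defaults.
    for name in ("LANG", "LC_CTYPE", "PYTHONUTF8",
                 "PYTHONCOERCECLOCALE", "PYTHONIOENCODING"):
        env.pop(name, None)
    env["LC_ALL"] = "C"
    code = (
        "import sys; "
        "print(getattr(sys.flags, 'utf8_mode', 0), "
        "sys.getfilesystemencoding(), sys.stdout.encoding)"
    )
    result = subprocess.run(
        [sys.executable, "-c", code],
        env=env,
        stdout=subprocess.PIPE,
        universal_newlines=True,
        check=True,
    )
    utf8_mode, filesystem, stdout = result.stdout.split()
    return {"utf8_mode": int(utf8_mode), "filesystem": filesystem, "stdout": stdout}


if __name__ == "__main__":
    print(probe_c_locale())
```

Running it with both a 3.6 and a 3.7+ interpreter on the same machine should make the difference discussed in this ticket directly visible.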
comment:68 follow-up: ↓ 69 Changed 22 months ago by
this patch sets LC_ALL to a meaningful value. Do you want it always to be C?
comment:69 in reply to: ↑ 68 Changed 22 months ago by
Replying to dimpase:
this patch sets LC_ALL to a meaningful value. Do you want it always to be C?
Yes, if we're going to throw python-3.6 under the bus in this release.
LC_ALL=C is the standard choice, and python >= 3.7 already treats LC_ALL=C magically. That's what "PEP 538 -- Coercing the legacy C locale to a UTF-8 based locale" is about.
comment:70 Changed 22 months ago by
- Description modified (diff)
Let's refocus this ticket (blocker for Sage 9.2) on solving the issues for Python 3.7+
To me Python 3.6 support is a "wishlist" item, but not a blocker. I have created a separate ticket for it: #30576
comment:71 Changed 22 months ago by
- Summary changed from setlocale: LC_ALL: cannot change locale (C.UTF-8) from build/bin/sage-spkg and in doctests to Python 3.7+: setlocale: LC_ALL: cannot change locale (C.UTF-8) from build/bin/sage-spkg and in doctests
comment:72 follow-up: ↓ 73 Changed 22 months ago by
what are the 3.7+ problems left here? I presume all of them have plain C locale, so it all should work, no?
comment:73 in reply to: ↑ 72 Changed 22 months ago by
Replying to dimpase:
what are the 3.7+ problems left here? I presume all of them have plain C locale, so it all should work, no?
None, but AFAIK there were none before we started messing with the locale, either.
If no one is going to fix the python-3.6 problems, we should just drop it from the list of supported system pythons and revert this to LC_ALL=C. That's simpler and leaves us with one fewer locale problem to debug in the future.
comment:74 follow-up: ↓ 75 Changed 22 months ago by
I wouldn't close the door on python 3.6 yet, so let's get this patch in please.
comment:75 in reply to: ↑ 74 ; follow-up: ↓ 76 Changed 22 months ago by
Replying to mkoeppe:
I wouldn't close the door on python 3.6 yet, so let's get this patch in please.
It still isn't sufficient for python-3.6, even if someone does fix the other problems. The name C.UTF-8 isn't standard because... it's not standardized yet. On Gentoo, it's C.utf8, and elsewhere apparently (per the PEP) it's called simply UTF-8. All of them should be checked/tried, and we should really only be doing it if python-3.6 is in use. That way, when we drop python-3.6, we're left with just LC_ALL=C again.
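The "try all of them" idea could look roughly like the following. This is a sketch under assumptions, not the patch on this ticket: first_available_locale and CANDIDATES are hypothetical names, the check is done from Python rather than in build/bin/sage-spkg, and only LC_CTYPE is probed. It tries the three spellings mentioned above in order and falls back to C if none can be set.

```python
import locale

# The three spellings mentioned in the comment above, in order of preference.
CANDIDATES = ("C.UTF-8", "C.utf8", "UTF-8")


def first_available_locale(candidates=CANDIDATES, fallback="C"):
    """Return the first candidate that setlocale() accepts, else the fallback."""
    saved = locale.setlocale(locale.LC_CTYPE)  # query only, no change
    try:
        for name in candidates:
            try:
                locale.setlocale(locale.LC_CTYPE, name)
            except locale.Error:
                continue  # this spelling does not exist here, try the next one
            return name
        return fallback
    finally:
        # Leave the process locale as we found it.
        locale.setlocale(locale.LC_CTYPE, saved)


if __name__ == "__main__":
    print(first_available_locale())
```

Asking setlocale() directly avoids parsing the output of a locale-listing command, at the cost of briefly changing the locale of the calling process.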
Also keep in mind that if no one actually fixes the other problems with python-3.6, the door is closed regardless.
comment:76 in reply to: ↑ 75 ; follow-up: ↓ 79 Changed 22 months ago by
comment:77 Changed 22 months ago by
- Reviewers set to Matthias Koeppe, Dima Pasechnik
- Status changed from needs_review to positive_review
there is no harm in adding this, anyway, thus, over to bots.
comment:78 Changed 22 months ago by
- Dependencies #30551 deleted
I've also removed #30551 as a dependency - after all whether Python 3.6 is (somewhat) supported (IMHO docbuilding is the main problem there) can be sorted out later.
comment:79 in reply to: ↑ 76 Changed 22 months ago by
Replying to mkoeppe:
Yes, I know, that's why I opened the separate ticket #30576 for that.
The stuff on ticket #30576 isn't what I was referring to...
Replying to dimpase:
there is no harm in adding this, anyway, thus, over to bots.
The harm is that now we have additional locale problems to debug for a change that doesn't solve anything. With python-3.7, there are now three different configurations:
- The C.UTF-8 locale exists and is used.
- No utf8 locale exists, so C is used.
- A utf8 locale is available, but your grep doesn't find it, so we wind up with C on a machine that could have a utf8 locale.
Why, when LC_ALL=C unconditionally works fine and had been in there for ages?
If someone wants to fix python-3.6, we should add these hacks only for python-3.6, and we should try the alternate names C.utf8 and UTF-8, too.
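The three configurations listed in this comment can be made concrete with a small classifier. It is only a sketch: classify is a hypothetical helper, the exact-match probe merely stands in for whatever search the patch performs, and the input is assumed to be a list of locale names such as a system's locale listing would give.

```python
def classify(available, probe="C.UTF-8"):
    """Say which of the three configurations a list of locale names falls into."""
    names = set(available)
    if probe in names:
        return "probe locale exists and is used"
    # Normalise spellings such as 'utf8', 'UTF-8' and 'utf-8' before comparing.
    has_utf8 = any(n.lower().replace("-", "").endswith("utf8") for n in names)
    if has_utf8:
        return "a utf8 locale exists but the probe misses it, so C is used"
    return "no utf8 locale exists, so C is used"


if __name__ == "__main__":
    print(classify(["C", "POSIX", "C.UTF-8"]))
    print(classify(["C", "POSIX"]))
    print(classify(["C", "POSIX", "C.utf8", "en_US.utf8"]))
```

The third call is the case objected to here: a Gentoo-style C.utf8 is present, yet an exact search for C.UTF-8 ends up on plain C.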
comment:80 Changed 22 months ago by
- Commit changed from e5f6663372fab14c6a386004a925de59be216f31 to ff6f4a68d4f396cb5f10e9918e5942113f739aa1
- Status changed from positive_review to needs_review
Branch pushed to git repo; I updated commit sha1 and set ticket back to needs_review. New commits:
ff6f4a6 | assume Python 3.7 or better
comment:81 Changed 22 months ago by
OK, so how about this?
comment:82 Changed 22 months ago by
I think that's the best solution, possibly combined with dropping 3.6 from the list of system pythons that we accept. Portability is nice and all, but we're wasting a lot of time trying to support an evolutionary dead-end with py3.6.
comment:83 follow-up: ↓ 85 Changed 22 months ago by
Fine with me - then please also adjust python3's spkg-configure...
comment:84 Changed 22 months ago by
- Commit changed from ff6f4a68d4f396cb5f10e9918e5942113f739aa1 to 64c3a8a6c99e916b68dfec6a2cd1346b023916e9
Branch pushed to git repo; I updated commit sha1. New commits:
64c3a8a | only test python3, to be 3.7 or 3.8
comment:85 in reply to: ↑ 83 Changed 22 months ago by
Replying to mkoeppe:
Fine with me - then please also adjust python3's spkg-configure...
fixed, as discussed - only test python3, to be either 3.7 or 3.8.
Please review
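The actual check lives in python3's spkg-configure and is not reproduced here. Expressed in Python, the acceptance rule described above amounts to something like the following sketch; ACCEPTED and is_accepted_system_python are hypothetical names chosen for illustration.

```python
import sys

# Minor versions of a system python3 that are accepted, per the comment above.
ACCEPTED = {(3, 7), (3, 8)}


def is_accepted_system_python(version_info=None):
    """Return True if the (major, minor) pair is one of the accepted versions."""
    if version_info is None:
        version_info = sys.version_info
    return tuple(version_info[:2]) in ACCEPTED


if __name__ == "__main__":
    print(is_accepted_system_python())
```

With this rule a system Python 3.6 is rejected, so Sage would build its own Python instead of using it.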
comment:86 Changed 22 months ago by
- Branch changed from u/dimpase/build/careful_with_C_UTF8 to u/mkoeppe/build/careful_with_C_UTF8
comment:87 Changed 22 months ago by
- Commit changed from 64c3a8a6c99e916b68dfec6a2cd1346b023916e9 to be47518c1019d37e11bbac9a3268d69b196691c8
comment:88 Changed 22 months ago by
Rationale: The changed search behavior would lead to confusing reports on sage-release.
comment:89 Changed 22 months ago by
- Status changed from needs_review to positive_review
comment:90 Changed 22 months ago by
- Description modified (diff)
- Summary changed from Python 3.7+: setlocale: LC_ALL: cannot change locale (C.UTF-8) from build/bin/sage-spkg and in doctests to Python 3.7+: setlocale: LC_ALL: cannot change locale (C.UTF-8) from build/bin/sage-spkg and in doctests; disable use of system Python 3.6
comment:91 Changed 22 months ago by
- Description modified (diff)
comment:92 Changed 22 months ago by
- Description modified (diff)
comment:93 Changed 22 months ago by
- Description modified (diff)
comment:94 Changed 22 months ago by
- Branch changed from u/mkoeppe/build/careful_with_C_UTF8 to be47518c1019d37e11bbac9a3268d69b196691c8
- Resolution set to fixed
- Status changed from positive_review to closed
comment:95 Changed 22 months ago by
- Commit be47518c1019d37e11bbac9a3268d69b196691c8 deleted
- Description modified (diff)
an easy way out is just to check whether the locale change worked, and if not, use C locale, not C.UTF-8. Perhaps print a warning.