Ticket #13161 (closed defect: fixed)
Fix unicode issue in axes labels
| Reported by: | kcrisman | Owned by: | jason, was |
|---|---|---|---|
| Priority: | major | Milestone: | sage-5.3 |
| Component: | graphics | Keywords: | |
| Cc: | ddrake, shahuwang | Work issues: | |
| Report Upstream: | N/A | Reviewers: | Sébastien Labbé |
| Authors: | Dan Drake, 王瑞期 | Merged in: | sage-5.3.rc0 |
| Dependencies: | Stopgaps: |
Description (last modified by slabbe) (diff)
See this ask.sagemath.org post. The proposed fix there is to replace this line by
-self.__axes_labels = (str(l[0]), str(l[1])) +self.__axes_labels = (l[0], l[1])
Considering that l is already supposed to be a tuple or list of two strings, this seems pretty plausible.
Apply trac_13161.patch and trac_13161-review-sl.patch
Attachments
Change History
comment:2 follow-up: ↓ 3 Changed 11 months ago by kcrisman
- Cc ddrake added
There is an update at the ask.sagemath.org post. Apparently it's actually a fair amount of work to get matplotlib to do certain unicode fonts properly... that doesn't mean this isn't a bug.
I'm wondering what the Sage-mobile people at Sunkyunkwan U. in Korea do for this, if they do anything? Dan, do you know?
comment:3 in reply to: ↑ 2 ; follow-up: ↓ 5 Changed 11 months ago by ddrake
Replying to kcrisman:
I'm wondering what the Sage-mobile people at Sunkyunkwan U. in Korea do for this, if they do anything? Dan, do you know?
In my experience, Korean developers are ignorant of encoding issues; it seems like they test it on their computer, and if it works, they ship it. This works fine when all your users are using the same version of Windows XP as you, but I still get lots of emails and see webpages in euc-kr or other encodings that don't declare their encoding and hence don't work when things default to utf-8.
So, short answer: my guess is they don't do anything. But I'll ask them.
comment:4 Changed 10 months ago by slabbe
I came up with this bug yesterday using sage-5.0. Here are the notes I took :
The following works:
sage: c = circle((0,0), 1) sage: c.axes_labels(['X', 'Y']) sage: c
Accent in normal string gets a ValueError:
sage: c.axes_labels(['an accent : é', 'Y']) sage: c Traceback (most recent call last): ... ValueError: matplotlib display text must have all code points < 128 or use Unicode strings
Accent in unicode string gets translated to str which yields a UnicodeError :
sage: c.axes_labels([u'an accent : é', 'Y']) Traceback (most recent call last): ... --> 634 self.__axes_labels = (str(l[0]), str(l[1])) ... UnicodeEncodeError: 'ascii' codec can't encode characters in position 12-13: ordinal not in range(128)
Not declaring the unicode for command text also gives a ValueError:
sage: text('an accent : é', (0,0))
Traceback (most recent call last):
...
ValueError: matplotlib display text must have all code points < 128 or use Unicode strings
But declaring unicode works for command text. Although, no error are raised, the result is not good : the "é" gets printed as the two characters "~A commercial c" :
sage: text(u'an accent : é', (0,0))
Strangely it gets printed correctly if the command is written in a file (without declaring encoding) and then attached :
# file.sage t = text(u'an accent : é', (0,0))
and then
sage: attach file.sage sage: t
is perfect! This shows that somehow matplotlib is able to work with unicode strings. We just have to make it work properly.
comment:5 in reply to: ↑ 3 Changed 10 months ago by ddrake
Replying to ddrake:
So, short answer: my guess is they don't do anything. But I'll ask them.
To followup: their response is, they didn't do anything special and do run into encoding issues here and there.
comment:6 Changed 10 months ago by ddrake
- Status changed from new to needs_review
- Authors set to Dan Drake
Patch up. I removed the str() calls, and also just wrote tuple(l), since the code guarantees that we have a tuple or list of length 2.
I see the problem reported by Sébastian, though; using Unicode strings, the generated plots have the wrong characters. But I think that is a font problem and could be fixed on another ticket; for now, let's get our own code working.
comment:7 follow-up: ↓ 8 Changed 10 months ago by slabbe
Hi, on sage-5.0, I get the following reject :
10 slabbe@pol ~/Applications/sage-5.0/devel/sage/sage/plot $ cat graphics.py.rej
--- graphics.py
+++ graphics.py
@@ -637,7 +656,7 @@
raise TypeError, "l must be a list or tuple"
if len(l) != 2:
raise ValueError, "l must have length 2"
- self._axes_labels = (str(l[0]), str(l[1]))
+ self._axes_labels = tuple(l)
def axes_label_color(self, c=None):
r"""
I need to install a newer version of Sage to do the review...
Sébastien
comment:8 in reply to: ↑ 7 Changed 10 months ago by ddrake
comment:9 follow-up: ↓ 10 Changed 10 months ago by slabbe
- Status changed from needs_review to positive_review
- Reviewers set to Sébastien Labbé
Problem is fixed (tested on sage-5.2.rc0). All tests passed on sage/plot/graphics.py. Documention builds fine and looks good from html and command line. I will open a new ticket for the other problem mentionned above about how matplolib handles unicode.
Positive review.
comment:10 in reply to: ↑ 9 Changed 10 months ago by slabbe
I will open a new ticket for the other problem mentionned above about how matplolib handles unicode.
This is now #13296
comment:11 Changed 10 months ago by kcrisman
- Authors changed from Dan Drake to Dan Drake, shahuwang
I'm just temporarily adding the original author by his ask.sagemath handle. Hopefully we'll get a complete name soon.
comment:13 Changed 10 months ago by jdemeyer
- Status changed from positive_review to needs_work
This breaks when building the PDF documentation:
! Package ucs Error: Unknown Unicode character 45812 = U+B2F4,
(ucs) possibly declared in uni-178.def.
(ucs) Type H to see if it is available with options.
See the ucs package documentation for explanation.
Type H <return> for immediate help.
...
l.39867 ...s}{<EB><8B><B4>}\PYG{l+s}{'}\PYG{p}{]}\PYG{p}{)}
?
! Emergency stop.
...
l.39867 ...s}{<EB><8B><B4>}\PYG{l+s}{'}\PYG{p}{]}\PYG{p}{)}
! ==> Fatal error occurred, no output PDF file produced!
comment:14 follow-up: ↓ 15 Changed 10 months ago by slabbe
I never builded the documentation as pdf before. I just did it and get the same error.
Could this be related to the first line added by the patch:
# -*- encoding: utf-8 -*-
which should be
# -*- coding: utf-8 -*-
?
Another possibility would be to write the test as
sage: c.axes_labels(['z',u'\xeb\x8b\xb4'])
sage: c._axes_labels
('z', u'\xeb\x8b\xb4')
instead of
sage: c.axes_labels(['z',u'ëŽ'])
sage: c._axes_labels
('z', u'\xeb\x8b\xb4')
I am testing those two options right now.
comment:15 in reply to: ↑ 14 Changed 10 months ago by slabbe
I am testing those two options right now.
I get the same error for both options...
comment:16 Changed 10 months ago by jhpalmieri
- Status changed from needs_work to needs_review
My guess is that the Asian (?) character used in the doctest is not recognized by utf-8 encoding, or at least by PDFLaTeX's unicode implementation. So we should try a different character instead, for example with this change:
-
sage/plot/graphics.py
diff --git a/sage/plot/graphics.py b/sage/plot/graphics.py
a b class Graphics(SageObject): 637 637 :: 638 638 639 639 sage: c = circle((0,0), 1) 640 sage: c.axes_labels(['z',u' 담'])640 sage: c.axes_labels(['z',u'é']) 641 641 sage: c._axes_labels 642 ('z', u'\x eb\x8b\xb4')642 ('z', u'\xc3\xa9') 643 643 644 644 sage: c.axes_labels([u'an accent : é', 'Y']) 645 645 sage: c._axes_labels
I'm attaching a new patch which does this.
comment:18 follow-up: ↓ 21 Changed 10 months ago by jdemeyer
Just for interestingness, couldn't you use an actual word instead of one character?
For example, "intérêt" (French for interest, a word which might plausably appear as label).
comment:19 follow-up: ↓ 20 Changed 10 months ago by kcrisman
- Cc shahuwang added
- Authors changed from Dan Drake, shahuwang to Dan Drake, 王瑞期
Hi shahuwang - is this a good way to have your name? Although usually we use more-or-less Roman characters for our stuff (including for e.g. Korean developers), maybe this is a first time...
comment:20 in reply to: ↑ 19 Changed 10 months ago by shahuwang
Replying to kcrisman:
Hi shahuwang - is this a good way to have your name? Although usually we use more-or-less Roman characters for our stuff (including for e.g. Korean developers), maybe this is a first time...
Hi, it seems that there will be some diffcult if I use my name in Chinese,sorry for that,and I will change it into Roman characters.Thanks very much.
Also,I want to report another bug of sage to deal with CJK(Chinese,Japanese,Korean),I open a ticket for it ,this is the ticket .As I am not familiar with the process,maybe I have made some mistake here.
comment:21 in reply to: ↑ 18 Changed 10 months ago by slabbe
Replying to jdemeyer:
Just for interestingness, couldn't you use an actual word instead of one character?
For example, "intérêt" (French for interest, a word which might plausably appear as label).
I just added a patch which applies on top of the previous one and which adds real life example of french word which can occur in axis label and which contain an accent :
- Axes des abscisses (axe des X)
- Axes des ordonnées (axe des Y)
Needs review!
comment:23 Changed 9 months ago by jhpalmieri
- Status changed from needs_review to positive_review
Looks good to me.
comment:24 Changed 9 months ago by jdemeyer
- Status changed from positive_review to closed
- Resolution set to fixed
- Merged in set to sage-5.3.rc0


In fact, I get a kind of weird error trying to do things like
anyway, so we could have better support for this in general. One shouldn't have to set a matplotlib option.