Opened 8 years ago

Closed 8 years ago

#13161 closed defect (fixed)

Fix unicode issue in axes labels

Reported by: kcrisman Owned by: jason, was
Priority: major Milestone: sage-5.3
Component: graphics Keywords:
Cc: ddrake, shahuwang Merged in: sage-5.3.rc0
Authors: Dan Drake, 王瑞期 Reviewers: Sébastien Labbé
Report Upstream: N/A Work issues:
Branch: Commit:
Dependencies: Stopgaps:

Description (last modified by slabbe)

See this ask.sagemath.org post. The proposed fix there is to replace this line by

-self.__axes_labels = (str(l[0]), str(l[1]))
+self.__axes_labels = (l[0], l[1])

Considering that l is already supposed to be a tuple or list of two strings, this seems pretty plausible.


Apply trac_13161.patch and trac_13161-review-sl.patch

Attachments (2)

trac_13161.patch (1.4 KB) - added by jhpalmieri 8 years ago.
trac_13161-review-sl.patch (900 bytes) - added by slabbe 8 years ago.

Download all attachments as: .zip

Change History (26)

comment:1 Changed 8 years ago by kcrisman

In fact, I get a kind of weird error trying to do things like

sage: plot(x,axes_labels=['z','담'])

anyway, so we could have better support for this in general. One shouldn't have to set a matplotlib option.

comment:2 follow-up: Changed 8 years ago by kcrisman

  • Cc ddrake added

There is an update at the ask.sagemath.org post. Apparently it's actually a fair amount of work to get matplotlib to do certain unicode fonts properly... that doesn't mean this isn't a bug.

I'm wondering what the Sage-mobile people at Sunkyunkwan U. in Korea do for this, if they do anything? Dan, do you know?

comment:3 in reply to: ↑ 2 ; follow-up: Changed 8 years ago by ddrake

Replying to kcrisman:

I'm wondering what the Sage-mobile people at Sunkyunkwan U. in Korea do for this, if they do anything? Dan, do you know?

In my experience, Korean developers are ignorant of encoding issues; it seems like they test it on their computer, and if it works, they ship it. This works fine when all your users are using the same version of Windows XP as you, but I still get lots of emails and see webpages in euc-kr or other encodings that don't declare their encoding and hence don't work when things default to utf-8.

So, short answer: my guess is they don't do anything. But I'll ask them.

comment:4 Changed 8 years ago by slabbe

I came up with this bug yesterday using sage-5.0. Here are the notes I took :

The following works:

sage: c = circle((0,0), 1)
sage: c.axes_labels(['X', 'Y'])
sage: c

Accent in normal string gets a ValueError:

sage: c.axes_labels(['an accent : é', 'Y'])
sage: c
Traceback (most recent call last): 
... 
ValueError: matplotlib display text must have all code points < 128 or use Unicode strings

Accent in unicode string gets translated to str which yields a UnicodeError :

sage: c.axes_labels([u'an accent : é', 'Y'])                               
Traceback (most recent call last):
...             
--> 634         self.__axes_labels = (str(l[0]), str(l[1]))
...
UnicodeEncodeError: 'ascii' codec can't encode characters in position 12-13: ordinal not in range(128)

Not declaring the unicode for command text also gives a ValueError:

sage: text('an accent : é', (0,0))
Traceback (most recent call last): 
... 
ValueError: matplotlib display text must have all code points < 128 or use Unicode strings

But declaring unicode works for command text. Although, no error are raised, the result is not good : the "é" gets printed as the two characters "~A commercial c" :

sage: text(u'an accent : é', (0,0))

Strangely it gets printed correctly if the command is written in a file (without declaring encoding) and then attached :

# file.sage
t = text(u'an accent : é', (0,0))

and then

sage: attach file.sage
sage: t

is perfect! This shows that somehow matplotlib is able to work with unicode strings. We just have to make it work properly.

Last edited 8 years ago by slabbe (previous) (diff)

comment:5 in reply to: ↑ 3 Changed 8 years ago by ddrake

Replying to ddrake:

So, short answer: my guess is they don't do anything. But I'll ask them.

To followup: their response is, they didn't do anything special and do run into encoding issues here and there.

comment:6 Changed 8 years ago by ddrake

  • Authors set to Dan Drake
  • Status changed from new to needs_review

Patch up. I removed the str() calls, and also just wrote tuple(l), since the code guarantees that we have a tuple or list of length 2.

I see the problem reported by Sébastian, though; using Unicode strings, the generated plots have the wrong characters. But I think that is a font problem and could be fixed on another ticket; for now, let's get our own code working.

Last edited 8 years ago by ddrake (previous) (diff)

comment:7 follow-up: Changed 8 years ago by slabbe

Hi, on sage-5.0, I get the following reject :

10 slabbe@pol ~/Applications/sage-5.0/devel/sage/sage/plot $ cat graphics.py.rej 
--- graphics.py
+++ graphics.py
@@ -637,7 +656,7 @@
             raise TypeError, "l must be a list or tuple"
         if len(l) != 2:
             raise ValueError, "l must have length 2"
-        self._axes_labels = (str(l[0]), str(l[1]))
+        self._axes_labels = tuple(l)
 
     def axes_label_color(self, c=None):
         r"""

I need to install a newer version of Sage to do the review...

Sébastien

comment:8 in reply to: ↑ 7 Changed 8 years ago by ddrake

Replying to slabbe:

Hi, on sage-5.0, I get the following reject :

Ah! Sorry! I forgot to say that I used 5.2.rc0.

However, you can manually fix it quite easily: the problem is #12974, which changed the number of underscores. On older versions (before 5.2.beta0), use two underscores.

Last edited 8 years ago by ddrake (previous) (diff)

comment:9 follow-up: Changed 8 years ago by slabbe

  • Reviewers set to Sébastien Labbé
  • Status changed from needs_review to positive_review

Problem is fixed (tested on sage-5.2.rc0). All tests passed on sage/plot/graphics.py. Documention builds fine and looks good from html and command line. I will open a new ticket for the other problem mentionned above about how matplolib handles unicode.

Positive review.

Last edited 8 years ago by slabbe (previous) (diff)

comment:10 in reply to: ↑ 9 Changed 8 years ago by slabbe

I will open a new ticket for the other problem mentionned above about how matplolib handles unicode.

This is now #13296

comment:11 Changed 8 years ago by kcrisman

  • Authors changed from Dan Drake to Dan Drake, shahuwang

I'm just temporarily adding the original author by his ask.sagemath handle. Hopefully we'll get a complete name soon.

comment:12 Changed 8 years ago by jdemeyer

  • Milestone changed from sage-5.2 to sage-5.3

comment:13 Changed 8 years ago by jdemeyer

  • Status changed from positive_review to needs_work

This breaks when building the PDF documentation:

! Package ucs Error: Unknown Unicode character 45812 = U+B2F4,
(ucs)                possibly declared in uni-178.def.
(ucs)                Type H to see if it is available with options.

See the ucs package documentation for explanation.
Type  H <return>  for immediate help.
 ...

l.39867 ...s}{<EB><8B><B4>}\PYG{l+s}{'}\PYG{p}{]}\PYG{p}{)}

?
! Emergency stop.
 ...

l.39867 ...s}{<EB><8B><B4>}\PYG{l+s}{'}\PYG{p}{]}\PYG{p}{)}

!  ==> Fatal error occurred, no output PDF file produced!

comment:14 follow-up: Changed 8 years ago by slabbe

I never builded the documentation as pdf before. I just did it and get the same error.

Could this be related to the first line added by the patch:

# -*- encoding: utf-8 -*- 

which should be

# -*- coding: utf-8 -*- 

?

Another possibility would be to write the test as

sage: c.axes_labels(['z',u'\xeb\x8b\xb4']) 
sage: c._axes_labels 
('z', u'\xeb\x8b\xb4') 

instead of

sage: c.axes_labels(['z',u'닮']) 
sage: c._axes_labels 
('z', u'\xeb\x8b\xb4') 

I am testing those two options right now.

comment:15 in reply to: ↑ 14 Changed 8 years ago by slabbe

I am testing those two options right now.

I get the same error for both options...

comment:16 Changed 8 years ago by jhpalmieri

  • Status changed from needs_work to needs_review

My guess is that the Asian (?) character used in the doctest is not recognized by utf-8 encoding, or at least by PDFLaTeX's unicode implementation. So we should try a different character instead, for example with this change:

  • sage/plot/graphics.py

    diff --git a/sage/plot/graphics.py b/sage/plot/graphics.py
    a b class Graphics(SageObject): 
    637637        ::
    638638
    639639            sage: c = circle((0,0), 1)
    640             sage: c.axes_labels(['z',u''])
     640            sage: c.axes_labels(['z',u'é'])
    641641            sage: c._axes_labels
    642             ('z', u'\xeb\x8b\xb4')
     642            ('z', u'\xc3\xa9')
    643643
    644644            sage: c.axes_labels([u'an accent : é', 'Y'])
    645645            sage: c._axes_labels

I'm attaching a new patch which does this.

Changed 8 years ago by jhpalmieri

comment:17 Changed 8 years ago by jhpalmieri

  • Description modified (diff)

comment:18 follow-up: Changed 8 years ago by jdemeyer

Just for interestingness, couldn't you use an actual word instead of one character?

For example, "intérêt" (French for interest, a word which might plausably appear as label).

comment:19 follow-up: Changed 8 years ago by kcrisman

  • Authors changed from Dan Drake, shahuwang to Dan Drake, 王瑞期
  • Cc shahuwang added

Hi shahuwang - is this a good way to have your name? Although usually we use more-or-less Roman characters for our stuff (including for e.g. Korean developers), maybe this is a first time...

comment:20 in reply to: ↑ 19 Changed 8 years ago by shahuwang

Replying to kcrisman:

Hi shahuwang - is this a good way to have your name? Although usually we use more-or-less Roman characters for our stuff (including for e.g. Korean developers), maybe this is a first time...

Hi, it seems that there will be some diffcult if I use my name in Chinese,sorry for that,and I will change it into Roman characters.Thanks very much.

Also,I want to report another bug of sage to deal with CJK(Chinese,Japanese,Korean),I open a ticket for it ,this is the ticket .As I am not familiar with the process,maybe I have made some mistake here.

Changed 8 years ago by slabbe

comment:21 in reply to: ↑ 18 Changed 8 years ago by slabbe

Replying to jdemeyer:

Just for interestingness, couldn't you use an actual word instead of one character?

For example, "intérêt" (French for interest, a word which might plausably appear as label).

I just added a patch which applies on top of the previous one and which adds real life example of french word which can occur in axis label and which contain an accent :

  • Axes des abscisses (axe des X)
  • Axes des ordonnées (axe des Y)

Needs review!

comment:22 Changed 8 years ago by slabbe

  • Description modified (diff)

comment:23 Changed 8 years ago by jhpalmieri

  • Status changed from needs_review to positive_review

Looks good to me.

comment:24 Changed 8 years ago by jdemeyer

  • Merged in set to sage-5.3.rc0
  • Resolution set to fixed
  • Status changed from positive_review to closed
Note: See TracTickets for help on using tickets.