Opened 13 years ago

Closed 13 years ago

Last modified 13 years ago

#8036 closed defect (fixed)

Sage 4.3.1 reference manual: PDF version failed to build due to non-ASCII characters in docstring

Reported by: Minh Van Nguyen
Owned by: Minh Van Nguyen
Priority: minor
Milestone: sage-4.3.2
Component: documentation
Keywords: non-ASCII characters
Cc:
Merged in: sage-4.3.2.rc0
Authors: Mitesh Patel
Reviewers: John Palmieri
Report Upstream: N/A
Work issues:
Branch:
Commit:
Dependencies:
Stopgaps:


Description

Even after applying #8021, the PDF version of the reference manual for Sage 4.3.1 failed to build. This is due to non-ASCII characters in the docstring of the method prove_BSD() of the class EllipticCurve_rational_field in

sage/schemes/elliptic_curves/ell_rational_field.py

Here's a snippet of the error message:

! Package inputenc Error: Unicode char \u8:ǎ not set up for use with LaTeX.

See the inputenc package documentation for explanation.
Type  H <return>  for immediate help.
 ...                                              
                                                  
l.364560 C. Tarniţǎ
                     . Computational verification of the Birch and
?
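
To locate characters like the ones LaTeX complains about before running a full PDF build, a small scan of the source text can help. The following is only a sketch: the function name `find_non_ascii` and the sample string are illustrative assumptions, not part of Sage or of any patch on this ticket.

```python
import unicodedata


def find_non_ascii(text):
    """Return (line, column, char, Unicode name) for each non-ASCII character.

    Hypothetical helper for illustration; line and column are 1-based.
    """
    hits = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for col, ch in enumerate(line, start=1):
            if ord(ch) > 127:
                hits.append((lineno, col, ch, unicodedata.name(ch, "UNKNOWN")))
    return hits


# Sample text modelled on the reference quoted in the error message.
# The two accented letters are written as escapes so this snippet is pure ASCII.
sample = "REFERENCES:\nC. Tarni\u0163\u01ce. Computational verification"

for lineno, col, ch, name in find_non_ascii(sample):
    print(lineno, col, "U+%04X" % ord(ch), name)
# 2 9 U+0163 LATIN SMALL LETTER T WITH CEDILLA
# 2 10 U+01CE LATIN SMALL LETTER A WITH CARON
```

To check a real file, one would read it with an explicit encoding, for example `open(path, encoding="utf-8").read()`, and pass the result to the helper.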

Attachments (5)

trac_8036-non-ascii.patch (1011 bytes) - added by Minh Van Nguyen 13 years ago.
based on Sage 4.3.1
utf8.tex (103 bytes) - added by Gonzalo Tornaría 13 years ago.
Latex file which shows usage of utf8
trac_8036-tex-replacements.patch (875 bytes) - added by John Palmieri 13 years ago.
apply only this patch
trac_8036-three-non-ascii.patch (2.0 KB) - added by Rob Beezer 13 years ago.
trac_8036-docbuild_utf8x.patch (1.7 KB) - added by Mitesh Patel 13 years ago.
Set utf8x in Sphinx option. Solo patch.


Change History (19)

Changed 13 years ago by Minh Van Nguyen

Attachment: trac_8036-non-ascii.patch added

based on Sage 4.3.1

comment:1 Changed 13 years ago by Minh Van Nguyen

Authors: Minh Van Nguyen
Status: new → needs_review

comment:2 Changed 13 years ago by Gonzalo Tornaría

Status: needs_review → needs_work

LaTeX is perfectly fine with utf8 if one uses the inputenc package:

\usepackage[utf8x]{inputenc}

IOW, it's the latex preamble which needs fixing.

Changed 13 years ago by Gonzalo Tornaría

Attachment: utf8.tex added

Latex file which shows usage of utf8

comment:3 Changed 13 years ago by John Palmieri

Authors: Minh Van Nguyen → Minh Van Nguyen, John Palmieri
Status: needs_work → needs_review

Sphinx uses \usepackage[utf8]{inputenc}, so if we want to change this to [utf8x], we need to patch Sphinx. I have no experience with [utf8] or [utf8x], but the documentation for inputenc frowns on utf8x, to some extent. Another option is to add characters one by one, as needed, using

\DeclareUnicodeCharacter{blah}{blah}

(See the documentation for inputenc.) If we knew the details, we could add lines like this to SAGE_ROOT/devel/sage/doc/common/conf.py -- add to the latex_preamble. I don't know the details.

A third option is to get rid of all accents, as mvngu's patch does.

A fourth option is to use the attached patch trac_8036-tex-replacements.patch, which does some preprocessing, changing the offending character to something latex can handle.

I'll mark this as "needs review", in case option 4 is appealing.
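
Options 2 and 4 above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the contents of trac_8036-tex-replacements.patch: the mapping `TEX_REPLACEMENTS` and both function names are hypothetical, and the LaTeX macros `\v{a}` and `\c{t}` are assumed to render acceptably with the fonts in use.

```python
# Hypothetical mapping from offending characters to LaTeX macros.
# U+01CE is "a with caron" and U+0163 is "t with cedilla", the two
# accented letters that appear in the error message.
TEX_REPLACEMENTS = {
    "\u01ce": r"\v{a}",
    "\u0163": r"\c{t}",
}


def declare_unicode_lines(mapping):
    """Option 2: build \\DeclareUnicodeCharacter lines for a LaTeX preamble."""
    return "\n".join(
        r"\DeclareUnicodeCharacter{%04X}{%s}" % (ord(ch), tex)
        for ch, tex in sorted(mapping.items())
    )


def replace_for_latex(text, mapping):
    """Option 4: preprocess text, swapping each character for its macro."""
    for ch, tex in mapping.items():
        text = text.replace(ch, tex)
    return text


print(declare_unicode_lines(TEX_REPLACEMENTS))
# \DeclareUnicodeCharacter{0163}{\c{t}}
# \DeclareUnicodeCharacter{01CE}{\v{a}}

print(replace_for_latex("C. Tarni\u0163\u01ce.", TEX_REPLACEMENTS))
# C. Tarni\c{t}\v{a}.
```

Either way, the mapping has to grow by hand each time a new unsupported character turns up in a docstring, which is the maintenance cost of handling characters one by one.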

Changed 13 years ago by John Palmieri

Attachment: trac_8036-tex-replacements.patch added

apply only this patch

comment:4 Changed 13 years ago by John Palmieri

Note: When I preview my attachment, the "offending character" looks like a capital "C" with a cedilla, but don't be deceived: the actual character (when I download the patch and look at it in emacs, for example), is an "a" with a "vee" accent on top -- the last character in "Tarnita".
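
One plausible explanation for the misleading preview, offered as an assumption rather than something confirmed on this ticket, is that the viewer decodes the patch's UTF-8 bytes as Latin-1. The sketch below shows that the first byte of the UTF-8 encoding of "a with caron" is the Latin-1 code for a capital "C" with a cedilla.

```python
import unicodedata

ch = "\u01ce"                     # the actual character in the docstring
raw = ch.encode("utf-8")          # the bytes stored in the patch file
misread = raw.decode("latin-1")   # what a Latin-1 viewer would display

print(unicodedata.name(ch))          # LATIN SMALL LETTER A WITH CARON
print(raw)                           # b'\xc7\x8e'
print(unicodedata.name(misread[0]))  # LATIN CAPITAL LETTER C WITH CEDILLA
```

The second misread character, U+008E, is a control code with no visible glyph, which would be consistent with the preview showing a single wrong letter.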

comment:5 in reply to:  3 ; Changed 13 years ago by John Palmieri

Replying to jhpalmieri:

I have no experience with [utf8] or [utf8x], but the documentation for inputenc frowns on utf8x, to some extent.

In case you're interested in this, the documentation says

For other languages that do not fit well into LaTeX font selection scheme, ... the outlined inputenc approach will not work. If that is the case one can try using Dominique Unruh’s option utf8x for inputenc which has a somewhat different approach and encodes many more UTF-8 characters than the standard utf8 option. However, we recommend to do so only if you really need such alphabets as there are problems with this extended approach which were precisely the reason that we decided to limit the support to what is properly supported within the boundaries of LaTeX’s font selection.

I don't know what the "problems with this extended approach" are.

comment:6 in reply to:  5 ; Changed 13 years ago by Gonzalo Tornaría

Replying to jhpalmieri:

Replying to jhpalmieri:

I have no experience with [utf8] or [utf8x], but the documentation for inputenc frowns on utf8x, to some extent.

In case you're interested in this, the documentation says

For other languages that do not fit well into LaTeX font selection scheme, ... the outlined inputenc approach will not work. If that is the case one can try using Dominique Unruh’s option utf8x for inputenc which has a somewhat different approach and encodes many more UTF-8 characters than the standard utf8 option. However, we recommend to do so only if you really need such alphabets as there are problems with this extended approach which were precisely the reason that we decided to limit the support to what is properly supported within the boundaries of LaTeX’s font selection.

I don't know what the "problems with this extended approach" are.

I use [utf8x] on a daily basis, without issues. As you quoted above, it is well known that [utf8] supports a reduced set of characters. utf8x does not support arbitrary Unicode characters either, but I think it supports a proper superset of those supported by utf8.

The option [utf8x] is part of latex package "ucs".

Your proposal (according to the posted patch) would be to special-case any characters not supported by [utf8] option? The patch only handles that particular letter.

comment:7 in reply to:  6 Changed 13 years ago by John Palmieri

Replying to tornaria:

Your proposal (according to the posted patch) would be to special-case any characters not supported by [utf8] option? The patch only handles that particular letter.

It's either that or patch Sphinx -- not hard, but I'm reluctant to patch external packages if there are other alternatives. I don't know how often we are likely to come across characters not supported by [utf8], so I don't know which option is better.

comment:8 Changed 13 years ago by Rob Beezer

There are three non-ascii characters in this file, which prevent me from building the HTML version of the documentation. The patches here already seem to address the tex processing that builds the PDF.

The patch simply identifies the three characters and replaces them with straight ASCII equivalents. It might be useful for folks trying to build the docs to test their own fixes/changes elsewhere. I'm not trying to weigh-in on the long-run solution to this problem.
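
For comparison, a generic way to produce plain-ASCII equivalents is Unicode decomposition followed by dropping the combining marks. This is a sketch of that general technique, not what trac_8036-three-non-ascii.patch does (that patch edits the three characters by hand), and the function name `to_ascii` is a hypothetical one.

```python
import unicodedata


def to_ascii(text):
    """Strip accents by decomposing characters and dropping combining marks.

    Characters with no ASCII base letter are discarded, so this is lossy.
    """
    decomposed = unicodedata.normalize("NFKD", text)
    stripped = "".join(c for c in decomposed if not unicodedata.combining(c))
    return stripped.encode("ascii", "ignore").decode("ascii")


print(to_ascii("C. Tarni\u0163\u01ce"))
# C. Tarnita
```

Because the conversion silently changes how names are spelled, it is better suited to a local test build than to a permanent change to the docstrings.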

Changed 13 years ago by Rob Beezer

Attachment: trac_8036-three-non-ascii.patch added

comment:9 Changed 13 years ago by Mitesh Patel

#7999 should take care of the HTML reference manual.

comment:10 Changed 13 years ago by Mitesh Patel

For now, what if we set:

latex_elements['inputenc'] = '\\usepackage[utf8x]{inputenc}'

in doc/common/conf.py?

Changed 13 years ago by Mitesh Patel

Attachment: trac_8036-docbuild_utf8x.patch added

Set utf8x in Sphinx option. Solo patch.

comment:11 in reply to:  10 Changed 13 years ago by Mitesh Patel

Replying to mpatel:

For now, what if we set:

I've attached a patch that does this. It appears to solve the problem in this ticket's description.

But it fails to handle the unicode tests we've added to SageNB at #7249.

comment:12 Changed 13 years ago by John Palmieri

Status: needs_review → positive_review

I like trac_8036-docbuild_utf8x.patch. I didn't know about the latex_elements customization; very nice.

To the release manager: apply only trac_8036-docbuild_utf8x.patch.

comment:13 Changed 13 years ago by Minh Van Nguyen

Authors: Minh Van Nguyen, John Palmieri → Mitesh Patel
Merged in: sage-4.3.2.rc0
Resolution: fixed
Reviewers: John Palmieri
Status: positive_review → closed

comment:14 Changed 13 years ago by Minh Van Nguyen

The attachment trac_8036-docbuild_utf8x.patch breaks the build of the French tutorial. See #8146 for a follow-up to this issue.
