Opened 6 years ago

Closed 6 years ago

#16715 closed defect (fixed)

Remove "nonbreaking spaces" from .rst files

Reported by: strogdon
Owned by:
Priority: major
Milestone: sage-6.3
Component: documentation
Keywords:
Cc: fbissey
Merged in:
Authors: Frédéric Chapoton
Reviewers: Steven Trogdon
Report Upstream: N/A
Work issues:
Branch: 64fb999
Commit: 64fb99992a991b14adee47320a7c6a515eb52492
Dependencies:
Stopgaps:

Description

For reference see sage-devel comment:

https://groups.google.com/forum/#!topic/sage-devel/W-fPceNbp0w

Certain .rst files (src/doc/fr/tutorial/tour_coercion.rst in particular) contain nonbreaking spaces that can cause docutils to fail when generating the HTML docs. Failures are known to occur with docutils-0.11. An octal dump (od -c) of tour_coercion.rst reveals the spaces:

0013340   n   n   e   a   u   ,       e   l   l   e   s       n   e    
0013360   s   o   n   t       p   a   s       d   e       t   y   p   e
0013400       `   `   R   i   n   g   E   l   e   m   e   n   t   `   `
0013420 302 240   :  \n  \n   :   :  \n  \n                   s   a   g
0013440   e   :       M       =       M   a   t   r   i   x   (   Z   Z
0013460   ,   2   ,   2   )   ;       M  \n                   [   0    
0013500   0   ]  \n                   [   0       0   ]  \n            
0013520       s   a   g   e   :       i   s   i   n   s   t   a   n   c

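The bytes 302 240 (octal; 0xC2 0xA0 in hex) are the UTF-8 encoding of U+00A0, the nonbreaking space. Read one byte at a time as Latin-1, 302 would be "Â" (capital A with circumflex) and 240 a nonbreaking space. It is this byte sequence that causes docutils-0.11 to fail. With docutils-0.11 the failure shows up as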

$ rst2html.py src/doc/fr/tutorial/tour_coercion.rst > /dev/null
src/doc/fr/tutorial/tour_coercion.rst:149: (WARNING/2) Inline literal start-string without end-string.

In a Unicode-aware terminal, the text starting at line 149 reads:

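Le type ``RingElement`` ne correspond pas parfaitement à la notion
mathématique d'élément d'anneau. Par exemple, bien que les matrices carrées
appartiennent à un anneau, elles ne sont pas de type ``RingElement`` :

(In English: "The type ``RingElement`` does not correspond exactly to the mathematical notion of a ring element. For example, although square matrices belong to a ring, they are not of type ``RingElement``:")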

The nonbreaking space sits between the closing `` and the colon.
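The byte analysis above can be checked with a short Python 3 sketch (not part of the ticket): it encodes U+00A0 as UTF-8, prints the bytes in octal the way `od -c` does, and shows that decoding the same two bytes as Latin-1 gives the "Â" plus nonbreaking-space pair.

```python
# Sketch: relate the od -c output (302 240) to U+00A0 encoded as UTF-8.
NBSP = "\u00a0"  # U+00A0 NO-BREAK SPACE

encoded = NBSP.encode("utf-8")
print(encoded)                              # b'\xc2\xa0'

octal = " ".join(f"{b:o}" for b in encoded)
print(octal)                                # 302 240 (octal, as od -c shows them)

# Decoded one byte at a time as Latin-1, the pair becomes U+00C2 ("Â") + U+00A0.
misread = encoded.decode("latin-1")
print([hex(ord(c)) for c in misread])       # ['0xc2', '0xa0']
```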

Change History (18)

comment:1 Changed 6 years ago by strogdon

  • Cc fbissey added

comment:2 Changed 6 years ago by strogdon

I'm not sure how pervasive these spaces are, nor what the best way to remove them is. They also appear elsewhere in src/doc/fr/tutorial/tour_coercion.rst and are probably an issue only in non-English .rst files.

comment:3 Changed 6 years ago by strogdon

  • Component changed from PLEASE CHANGE to documentation

comment:4 follow-up: Changed 6 years ago by chapoton

Here is a way to see them all:

grep --color='auto' -P -n "\xA0" *.rst
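For comparison, a rough Python 3 equivalent of the grep above (a sketch, not something used in the ticket). It assumes the files are UTF-8, and unlike `*.rst` it searches subdirectories recursively via `Path.rglob`; `find_nbsp` is a hypothetical helper name.

```python
# Sketch: report file:line for every line of a *.rst file that contains U+00A0,
# roughly like: grep -P -n "\xA0" *.rst (but recursive).
from pathlib import Path

NBSP = "\u00a0"  # U+00A0 NO-BREAK SPACE


def find_nbsp(root):
    """Yield (path, line_number, line) for lines containing a nonbreaking space."""
    for path in sorted(Path(root).rglob("*.rst")):
        text = path.read_text(encoding="utf-8")  # assumes UTF-8 files
        # split on "\n" only, so line numbers match what grep reports
        for lineno, line in enumerate(text.split("\n"), start=1):
            if NBSP in line:
                yield path, lineno, line
```

Usage: `for p, n, line in find_nbsp("src/doc/fr"): print(f"{p}:{n}:{line}")`.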

comment:5 in reply to: ↑ 4 Changed 6 years ago by fbissey

Replying to chapoton:

Here is a way to see them all:

grep --color='auto' -P -n "\xA0" *.rst

Bonus points if you have a sed command that would get rid of them.

comment:6 Changed 6 years ago by chapoton

Found something here:

http://lifehacker.com/5810026/quickly-find-and-replace-text-across-multiple-documents-via-the-command-line

namely

perl -pi -w -e 's/SEARCH_FOR/REPLACE_WITH/g;' *.rst

The following seems to work very well:

perl -pi -w -e 's/\xC2\xA0/ /g;' *.rst
Last edited 6 years ago by chapoton
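The perl one-liner above can be mirrored in Python 3 as follows (a sketch; `replace_nbsp` is a hypothetical helper). Like the perl command, it works on raw bytes and replaces the two-byte sequence 0xC2 0xA0 with a single ASCII space. In valid UTF-8, 0xC2 is always a lead byte, so that pair can only be U+00A0 and other characters (for example "à", encoded as 0xC3 0xA0) are left alone.

```python
# Sketch: in-place replacement of UTF-8 nonbreaking spaces, mirroring
#   perl -pi -w -e 's/\xC2\xA0/ /g;' *.rst
from pathlib import Path

NBSP_UTF8 = b"\xc2\xa0"  # UTF-8 encoding of U+00A0


def replace_nbsp(path):
    """Rewrite the file in place; return the number of replacements made."""
    p = Path(path)
    data = p.read_bytes()
    count = data.count(NBSP_UTF8)
    if count:
        # Replace the whole two-byte sequence, never a single byte of it.
        p.write_bytes(data.replace(NBSP_UTF8, b" "))
    return count
```

Usage: `sum(replace_nbsp(p) for p in Path("src/doc/fr").rglob("*.rst"))` would report how many spaces were replaced.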

comment:7 Changed 6 years ago by fbissey

I will try later.

comment:8 Changed 6 years ago by vbraun

Note that the files are UTF-8 encoded. Hence the U+00A0 code point is encoded as a 2-byte sequence (0xC2 0xA0, i.e. octal 302 240).

comment:9 Changed 6 years ago by chapoton

  • Authors set to Frédéric Chapoton
  • Branch set to u/chapoton/16715
  • Commit set to 8b14f373f8ed4634c8b77583a85d0626a1e1b3df

Hello,

I did the search-and-replace in all the French documentation. It looks OK, I think.

Should it also be done for the other languages?


New commits:

8b14f37 trac #16715 remove all unbreakable spaces from fr documentation

comment:10 Changed 6 years ago by strogdon

Before I check things, did you try, for example,

rst2html.py src/doc/fr/tutorial/tour_coercion.rst > out.html

to see whether there are any WARNINGs, and then load out.html in a browser to check that it renders correctly? I may have done something wrong, but when I remove just the nonbreaking spaces I'm left with a file that is not valid UTF-8.
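One possible explanation for the invalid UTF-8 (an assumption, not confirmed in the ticket) is deleting only the byte 0xA0 instead of the whole sequence 0xC2 0xA0. This Python 3 sketch shows that doing so leaves an orphaned lead byte, and also damages "à", whose UTF-8 encoding 0xC3 0xA0 contains the same 0xA0 byte.

```python
# Sketch: deleting only the 0xA0 byte breaks UTF-8; replacing C2 A0 does not.
line = "``RingElement``\u00a0: \u00e0 la notion".encode("utf-8")

# Wrong: strip the single byte 0xA0. This leaves the 0xC2 lead byte of the
# nonbreaking space behind and also breaks "à" (U+00E0 is encoded as C3 A0).
broken = line.replace(b"\xa0", b"")

# Right: replace the complete two-byte sequence C2 A0 with an ASCII space.
fixed = line.replace(b"\xc2\xa0", b" ")


def is_valid_utf8(data):
    """Return True if the bytes decode cleanly as UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


print(is_valid_utf8(broken))  # False
print(is_valid_utf8(fixed))   # True
```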

comment:11 Changed 6 years ago by strogdon

Of course there will not be WARNINGS if the rst2html.py that's shipped with vanilla Sage is used. I will try to test later today.

comment:12 Changed 6 years ago by strogdon

I've fetched things locally and the files look really good. But I'll test later today.

comment:13 Changed 6 years ago by strogdon

Everything seems fine here, and the WARNING disappears when building against docutils-0.11 (sage-on-gentoo). If I've done things correctly, I see only two other .rst files where the space occurs:

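ru/tutorial/index.rst:7:Sage — это бесплатное и свободно распространяемое математическое программное
ru/tutorial/introduction.rst:5:Данное учебное пособие — лучший способ познакомиться с Sage за несколько

(In English, roughly: "Sage is free and freely distributed mathematical software..." and "This tutorial is the best way to get acquainted with Sage in a few...")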

I can see the spaces in these files, and the perl one-liner above does remove them. I think it would be good to fix these files as well.

Are there any thoughts on how to prevent this sort of thing in the future? Or does one just deal with it whenever it interferes with building the html docs?

comment:14 Changed 6 years ago by git

  • Commit changed from 8b14f373f8ed4634c8b77583a85d0626a1e1b3df to 64fb99992a991b14adee47320a7c6a515eb52492

Branch pushed to git repo; I updated commit sha1. New commits:

64fb999 trac #16715 remove all unbreakable space from russian doc

comment:15 Changed 6 years ago by chapoton

I have taken care of the Russian case. :)

I do not know how to prevent this in the future. There is already a patchbot plugin that warns when a new Unicode character is introduced, but it does not tell whether that character is the "unbreakable space" or something harmless like "éàùèö".
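A check that singles out the problematic characters could look at the Unicode category rather than simply flagging anything non-ASCII. The Python 3 sketch below is hypothetical (it is not the existing patchbot plugin, and `suspicious_spaces` is an invented name): it reports every space separator (category "Zs") other than the ordinary ASCII space, which covers U+00A0 and the narrow U+202F used in French typography, while ignoring accented letters.

```python
# Sketch: flag non-ASCII space characters in .rst text, but not letters like "éàùèö".
import unicodedata


def suspicious_spaces(text):
    """Return (line_number, column, character_name) for each non-ASCII space separator."""
    hits = []
    for lineno, line in enumerate(text.split("\n"), start=1):
        for col, ch in enumerate(line, start=1):
            # "Zs" = space separator: includes U+00A0 and U+202F, excludes letters.
            if ch != " " and unicodedata.category(ch) == "Zs":
                hits.append((lineno, col, unicodedata.name(ch)))
    return hits
```

Such a function could be called on the text of each changed .rst file, failing the check whenever it returns a non-empty list; how to hook it into patchbot is not covered here.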

comment:16 Changed 6 years ago by strogdon

  • Status changed from new to needs_review

comment:17 Changed 6 years ago by strogdon

  • Reviewers set to Steven Trogdon
  • Status changed from needs_review to positive_review

I believe this fixes things.

comment:18 Changed 6 years ago by vbraun

  • Branch changed from u/chapoton/16715 to 64fb99992a991b14adee47320a7c6a515eb52492
  • Resolution set to fixed
  • Status changed from positive_review to closed