Opened 8 years ago

Closed 8 years ago

#10440 closed defect (fixed)

preparser does not correctly identify encoding lines

Reported by: ddrake Owned by: ddrake
Priority: major Milestone: sage-4.6.2
Component: misc Keywords: preparser encoding
Cc: Merged in: sage-4.6.2.alpha4
Authors: Dan Drake Reviewers: Volker Braun, Maxim Cournoyer
Report Upstream: N/A Work issues:
Branch: Commit:
Dependencies: Stopgaps:

Description (last modified by vbraun)

PEP 263 says that the encoding line for a Python file should match a certain regular expression in the first or second line, but the preparser seems to be looking for the precise string "# -*- coding: utf-8 -*-".

We should follow PEP 263 precisely. In particular, this will make it easy for SageTeX users to specify an encoding; because of TeX weirdness that I don't entirely understand, it is easiest to write the line "## -*- coding: utf-8 -*-" to the generated script.

  • Apply trac_10440.patch to the sage library repository.
  • Apply trac_10440_scripts_repo.patch to the sage scripts repository.

Attachments (2)

trac_10440.patch (6.7 KB) - added by ddrake 8 years ago.
trac_10440_scripts_repo.patch (1.6 KB) - added by ddrake 8 years ago.
apply to scripts repo

Download all attachments as: .zip

Change History (13)

Changed 8 years ago by ddrake

comment:1 Changed 8 years ago by ddrake

  • Authors set to Dan Drake
  • Status changed from new to needs_review

comment:2 Changed 8 years ago by ddrake

  • Owner changed from jason to ddrake

Note that we aren't following PEP 263, since we default to UTF-8 and not ASCII. The patch now allows the encoding declaration to be in the first or second line, and makes the handle_encoding_declaration() much simpler and easier to understand.

comment:3 Changed 8 years ago by ddrake

  • Status changed from needs_review to needs_work

Hmm, this doesn't fix the SageTeX issue: it looks like the preparser is still looking for "# -*- coding: utf-8 -*-". I'll work on an updated patch.

Changed 8 years ago by ddrake

apply to scripts repo

comment:4 Changed 8 years ago by ddrake

Apply both patches, although I suspect only the second is necessary to fix the immediate SageTeX problem. It seems like unnecessary code duplication to have two places where we look through a string and extract an encoding declaration, but at least both bits of code work properly now.

comment:5 Changed 8 years ago by ddrake

  • Status changed from needs_work to needs_review

comment:6 Changed 8 years ago by ddrake

  • Keywords preparser added; prepaser removed

comment:7 Changed 8 years ago by jdemeyer

Concerning the default of UTF-8 instead of ASCII: that's perfectly fine since any valid ASCII text is also valid UTF-8. If we default to UTF-8, there is also no reason to look for a UTF-8 byte order mark (BOM).

comment:8 Changed 8 years ago by vbraun

  • Reviewers set to Volker Braun
  • Status changed from needs_review to positive_review

Irregardless of SageTeX, we should hand through the encoding. This patch works for me. Applies cleanly on Sage-4.6.2.alpha3.

comment:9 Changed 8 years ago by vbraun

  • Description modified (diff)

comment:10 Changed 8 years ago by maxim

  • Reviewers changed from Volker Braun to Volker Braun, Maxim Cournoyer

I confirmed that by installing the second patch (trac_10440_scripts_repo.patch), I can now process code comments containing utf-8 characters (â ô à é...) with sagetex correctly.

Thank you for looking into this, Dr. Drake!

comment:11 Changed 8 years ago by jdemeyer

  • Merged in set to sage-4.6.2.alpha4
  • Resolution set to fixed
  • Status changed from positive_review to closed
Note: See TracTickets for help on using tickets.