Opened 10 years ago

Closed 10 years ago

#7249 closed enhancement (fixed)

switch the notebook's templating system to Jinja2

Reported by: ddrake Owned by: boothby
Priority: major Milestone: sage-4.3.2
Component: notebook Keywords:
Cc: mhansen, mpatel, ddrake Merged in: sagenb-0.7
Authors: Tim Dumol, Dan Drake Reviewers: Mitesh Patel
Report Upstream: N/A Work issues:
Branch: Commit:
Dependencies: Stopgaps:

Description

We should switch the notebook's templating system from Jinja to Jinja2. Jinja2, among other things, makes i18n much easier.

Attachments (13)

trac_7249_switch_notebook_to_jinja2.patch (10.0 KB) - added by ddrake 10 years ago.
apply to sagenb repo
trac_7249_jinja2.patch (22.8 KB) - added by timdumol 10 years ago.
Converts much storage to unicode, and adds the necessary encoding for storage and networking. Uses jinja2 instead of jinja. Depends on #7835, #7786 and their dependencies.
trac_7249_jinja2_v2.patch (34.5 KB) - added by mpatel 10 years ago.
Fix several doctests. Replaces previous.
trac_7249_jinja2_v3.patch (36.1 KB) - added by mpatel 10 years ago.
Rebased vs. #7835 v2, #7786 v8, etc. Replaces previous.
trac_7249_jinja2_v4.patch (50.9 KB) - added by timdumol 10 years ago.
Fixes the styling of the worksheet listing page.
trac_7249_jinja2_v5.patch (50.4 KB) - added by timdumol 10 years ago.
Rebase on a new patch set.
trac_7249_jinja2_v5.2.patch (50.4 KB) - added by timdumol 10 years ago.
Added an 'ignore' errors directive to unicode_string
trac_7249_jinja2_v5.3.patch (50.7 KB) - added by timdumol 10 years ago.
Adds the suggested fixes by mpatel.
trac_7249_jinja2_v5.4.patch (59.3 KB) - added by timdumol 10 years ago.
Adds the requested doctests and Se tests and the cleanup on sage_inspect.py
trac_7249-jinja2_v6.patch (60.1 KB) - added by timdumol 10 years ago.
Fixes a problem with text cells with unicode contents.
trac_7249-jinja2_v7.patch (63.0 KB) - added by mpatel 10 years ago.
Test fix, drop pdb. Replaces previous.
trac_7249-jinja2_v8.patch (63.4 KB) - added by mpatel 10 years ago.
Make Worksheet.name's docstring raw, for Sphinx. Replaces previous.
trac_7249-jinja2_v9.5.patch (64.4 KB) - added by mpatel 10 years ago.
Make the docstrings unicode strings. Replaces previous.

Download all attachments as: .zip

Change History (41)

Changed 10 years ago by ddrake

apply to sagenb repo

comment:1 Changed 10 years ago by ddrake

  • Authors set to Dan Drake
  • Status changed from new to needs_review

comment:2 Changed 10 years ago by mpatel

#7269 should now include these changes.

comment:3 follow-up: Changed 10 years ago by timdumol

  • Report Upstream set to N/A
  • Status changed from needs_review to needs_work

#7269 no longer includes these changes. Refer to comment:25:ticket:7269.

Changes must be made to deal with unicode encoding/decoding before this can be reviewed.

comment:4 in reply to: ↑ 3 Changed 10 years ago by ddrake

Replying to timdumol:

#7269 no longer includes these changes. Refer to comment:25:ticket:7269.

Changes must be made to deal with unicode encoding/decoding before this can be reviewed.

At comment:24:ticket:7269 you say that we need Model-View-Presenter/Controller? separation. How hard will that be? It sounds hard. :) I've run into Unicode string problems while trying to internationalize the notebook, so I'd definitely like to do this if it's the right way to go about it.

comment:5 Changed 10 years ago by timdumol

MVC [1] separates the layers of the program into a model, which has the data; views, which are ways to present the data; and the controller/s, which control which data to show etc.

Currently the notebook serves as a model/controller and has some view functions mixed into it (html_*). Separating things will make things cleaner, and will make dealing with unicode easier: the view is presented with encoded byte strings, the controller deals with unicode strings, and data is retrieved from the model (db or filesystem) in encoded data which is converted to unicode strings.

Without this separation, one will need to check whether each string passed is an encoded byte string or a unicode string.

Separation will entail first removing the html_* functions, which in turn means restructuring the templates. I'll be working on that very soon.

[1] http://en.wikipedia.org/wiki/Model%E2%80%93view%E2%80%93controller

Changed 10 years ago by timdumol

Converts much storage to unicode, and adds the necessary encoding for storage and networking. Uses jinja2 instead of jinja. Depends on #7835, #7786 and their dependencies.

comment:6 Changed 10 years ago by timdumol

  • Cc mhansen mpatel ddrake added
  • Status changed from needs_work to needs_review

Hopefully this patch will do the trick and make i18n easier.

Changed 10 years ago by mpatel

Fix several doctests. Replaces previous.

comment:7 follow-up: Changed 10 years ago by mpatel

  • Status changed from needs_review to needs_work

V3:

  • Fixes a number of doctests.
  • iamges --> images in cell.py.
  • inpit_text = input_text.decode('utf-8', str) --> input_text = input_text.decode('utf-8', 'ignore') in cell.py.

I'm not sure about the choice of encode / decode in some places. Should we always decode instances of str and encode instances of unicode? Could you please double-check the patch? Also, would it be better to write and use a helper function to do the conversions?

Should the changes to cell.html be at #7786? Isn't div_wrap a boolean?

Does the change to Worksheet.preparse belong at #7835?

If we're planning to use unicode "everywhere", even for the worksheet / cell system, should we also consistently cover the relevant methods (and examples for) in worksheet.py and elsewhere?

I apologize for my ignorance.

comment:8 Changed 10 years ago by mpatel

  • Authors changed from Dan Drake to Tim Dumol, Dan Drake

Changed 10 years ago by mpatel

Rebased vs. #7835 v2, #7786 v8, etc. Replaces previous.

comment:9 Changed 10 years ago by mpatel

V3 (really) is rebased against the latest at #7835, #7786. Besides consistently using UTF-8 for the cell/worksheet system, I'm curious about Cell.__init__, where strs are encoded, not decoded. Again, I'm not very familiar with encodings, so I may well be wrong. Thanks!

comment:10 Changed 10 years ago by mpatel

According to the Python Unicode HOWTO, we can use the codecs.open to read from (decode) and write to (encode) files transparently.

comment:11 in reply to: ↑ 7 Changed 10 years ago by timdumol

Replying to mpatel:

V3:

  • Fixes a number of doctests.
  • iamges --> images in cell.py.
  • inpit_text = input_text.decode('utf-8', str) --> input_text = input_text.decode('utf-8', 'ignore') in cell.py.

I'm not sure about the choice of encode / decode in some places. Should we always decode instances of str and encode instances of unicode? Could you please double-check the patch? Also, would it be better to write and use a helper function to do the conversions?

Sure, and yes.

Should the changes to cell.html be at #7786? Isn't div_wrap a boolean?

div_wrap is a boolean. The div_wrap_ and wrap_ variables are workarounds to limitations present in Jinja2 but not in Jinja, so no, it's not for #7786.

Does the change to Worksheet.preparse belong at #7835?

Since this particular ticket is for Jinja2, which requires unicode, no. #7835 only replaces functionality that was previously present.

If we're planning to use unicode "everywhere", even for the worksheet / cell system, should we also consistently cover the relevant methods (and examples for) in worksheet.py and elsewhere?

Yes, unicode functionality should be doctested.

I apologize for my ignorance.

comment:12 Changed 10 years ago by timdumol

Also, don't apologize for any supposed ignorance please. We're all ignorant.

comment:13 Changed 10 years ago by timdumol

  • Status changed from needs_work to needs_review

The new patch fixes the styling of the worksheet listing page and adds two new functions in sage.misc.misc: encoded_str and unicode_str for encoding and decoding strings automatically.

Changed 10 years ago by timdumol

Fixes the styling of the worksheet listing page.

comment:14 Changed 10 years ago by timdumol

This is the new patch queue:

trac_7650-sagenb_doctesting_v6.patch
trac_7650-reviewer.patch
trac_7648-missing_indent.patch
trac_7663-rstrip_prompt.patch
trac_7847-empty-trash-no-referer.patch
trac_7786-template-jinja-idiomatic.patch
trac_7863-declare_vars_aux_js_v2.patch
trac_7874-typeset_interact_labels.patch
trac_7858-key_binding_vars_v2.patch
trac_7666-alphanumeric_cell_ids_B5.patch
trac_7666-reviewer.patch
trac_7835-preparsing-unicode_v2.patch
trac_7249_jinja2_v5.patch

Changed 10 years ago by timdumol

Rebase on a new patch set.

Changed 10 years ago by timdumol

Added an 'ignore' errors directive to unicode_string

comment:15 Changed 10 years ago by mpatel

  • Reviewers set to Mitesh Patel

In notebook.py, should hunk = encoded_str(hunk) be hunk = unicode_str(hunk)?

In Cell.__init__, can we simplify self.set_input_text(str(input).replace('\r',''))?

In twist.py, do we need to call unicode_str on the incoming input? The constructors and set_input_text do this.

comment:16 Changed 10 years ago by mpatel

Also, evaluating print 'é' raises

          File "/home/tmp/sagenb-0.5/src/sagenb/sagenb/notebook/twist.py", line 1281, in render
            cell.set_input_text(input_text)
          File "/home/tmp/sagenb-0.5/src/sagenb/sagenb/notebook/cell.py", line 1270, in set_input_text
            input = unicode_str(input)
          File "/home/tmp/sagenb-0.5/src/sagenb/sagenb/misc/misc.py", line 441, in unicode_str
            return unicode(str(obj), encoding, 'ignore')
        exceptions.UnicodeEncodeError: 'ascii' codec can't encode character u'\xe9' in position 7: ordinal not in range(128)

comment:17 Changed 10 years ago by timdumol

  • Status changed from needs_review to positive_review

Nice catches. Here's a patch to fix them.

Changed 10 years ago by timdumol

Adds the suggested fixes by mpatel.

comment:18 Changed 10 years ago by timdumol

  • Status changed from positive_review to needs_work

Woops! Sorry, accidentally put it to positive review. You should be the one, if ever.

comment:19 Changed 10 years ago by mpatel

Upon saving and quitting a worksheet that contains an evaluated print `é` cell, I see

worksheet.py:1924: exceptions.UnicodeWarning: Unicode equal comparison failed to convert both arguments to Unicode - interpreting them as being unequal

What about the unicode_strs in twist.py? I'm just wondering why we need them.

Minor: It appears we can just call encoded_str in sage_inspect.py.

Also, can you slip some Unicode characters into the Se and/or doctests, so that we can detect potential regressions?

comment:20 Changed 10 years ago by timdumol

I'm pretty sure that that warning is unavoidable (has to do with upgrading from non-unicode to unicode). The unicode_strs are there just out of paranoia. Nice catch with the sage_inspect.py thing. I'll update the tests now.

Changed 10 years ago by timdumol

Adds the requested doctests and Se tests and the cleanup on sage_inspect.py

comment:21 Changed 10 years ago by timdumol

  • Status changed from needs_work to needs_review

This new patch should add the requested tests.

Changed 10 years ago by timdumol

Fixes a problem with text cells with unicode contents.

Changed 10 years ago by mpatel

Test fix, drop pdb. Replaces previous.

comment:22 Changed 10 years ago by mpatel

  • Status changed from needs_review to positive_review

V7 fixes a failed doctest in cell.py (line 138) and removes import pdb; pdb.set_trace() from test_worksheet_list.py.

comment:23 Changed 10 years ago by mpatel

On the non-ASCII Unicode characters which may break docbuilds, please see #8000.

comment:24 Changed 10 years ago by mpatel

Sphinx raises

reading sources... [100%] sagenb/notebook/worksheet
Sphinx error:
'utf8' codec can't decode bytes in position 420-422: invalid data

The "invalid" data appears to be u'\xe4' in line 695 (or so) of worksheet.py:

            u'\u03ab\xe4\u013b\u01be\u1e40\u0411'                               

V8 just makes the docstring raw (""" --> r"""). I've added 'Теория чисел', partly for fun.

Changed 10 years ago by mpatel

Make Worksheet.name's docstring raw, for Sphinx. Replaces previous.

comment:25 Changed 10 years ago by mpatel

It may be better to make the docstrings unicode strings. V9 does this, but I can't attach it right now:

Trac detected an internal error:
    IOError: [Errno 28] No space left on device

Changed 10 years ago by mpatel

Make the docstrings unicode strings. Replaces previous.

comment:26 Changed 10 years ago by timdumol

The review patches look good. Nice work. I must say, the reviewing of this patch is real messy.

comment:27 Changed 10 years ago by mpatel

  • Merged in set to sagenb-0.7
  • Resolution set to fixed
  • Status changed from positive_review to closed

comment:28 Changed 10 years ago by mpatel

  • blocking set to 8051
Note: See TracTickets for help on using tickets.