Opened 9 years ago

Closed 6 years ago

#10637 closed enhancement (fixed)

Implement sage -sws2rst

Reported by: nthiery Owned by: jason, mpatel, was
Priority: major Milestone: sage-5.12
Component: notebook Keywords: ReST, worksheet
Cc: nthiery, hivert, slabbe, kcrisman, beezer, whuss, novoselt, kini, jdemeyer Merged in: sage-5.12.beta0
Authors: Pablo Angulo, Karl-Dieter Crisman Reviewers: Nicolas M. Thiéry, Jason Grout, Karl-Dieter Crisman, Jason Bandlow, John Palmieri, Simon King, Karl-Dieter Crisman, Pablo Angulo
Report Upstream: Workaround found; Bug reported upstream. Work issues:
Branch: Commit:
Dependencies: #14330 Stopgaps:

Description (last modified by jdemeyer)

Implement:

    sage -sws2rst bla.sws bli.sws ...

which given worksheets

bla.sws, bli.sws, ... would create ReST files bla.rst, bli.rst, ... together with media directories:

media/bla/
media/bla/data/
media/bla/7/sage0.png
...
media/bli/
...

The proposed implementation adds a script local/bin/sage-sws2rst, edits spkg/bin/sage to add the sws2rst option, and add some libraries in sagenb-main/sagenb/notebook/. It further depends on the BeautifulSoup Python library (released under Python's license).

The script builds the ReST file from the worksheet.html file included in the .sws as follow:

  • Preparsing to handle the input / output fields
  • Parsing of the resulting html using BeautifulSoup
  • Manipulation on the obtained tree

Suggestions for better file layout or implementation welcome!

Install instructions

Release Manager Instructions

Attachments (20)

beautifulsoup-3.2.0.p0.spkg (30.0 KB) - added by pang 9 years ago.
sage package for BeautifulSoup?
add_sws2rst.patch (4.4 KB) - added by pang 9 years ago.
patch to be applied on local/bin
tools_for_sws2rst.patch (16.9 KB) - added by pang 9 years ago.
patch to be applied on sagenb
add_sws2rst_2.patch (4.3 KB) - added by pang 9 years ago.
replaces the first patch
tools_sws2rst_2.patch (21.9 KB) - added by pang 9 years ago.
replaces the first patch
add_sws2rst_3.patch (3.4 KB) - added by pang 9 years ago.
add_sws2rst_3.patch (replaces earlier versions)
tools_sws2rst_3.patch (25.4 KB) - added by pang 9 years ago.
tools_sws2rst_3.patch (replaces earlier versions)
add_sws2rst_4.patch (3.4 KB) - added by pang 8 years ago.
add_sws2rst_4.patch (replaces earlier versions)
tools_sws2rst_4.patch (25.3 KB) - added by pang 8 years ago.
tools_sws2rst_43.patch (replaces earlier versions)
trac_10637-scripts.patch (2.5 KB) - added by kcrisman 7 years ago.
Apply to local/bin
trac_10637-sagenb-reviewer.patch (898 bytes) - added by kcrisman 7 years ago.
Apply to sagenb
trac_10637-root-docsandmore.patch (7.0 KB) - added by kcrisman 7 years ago.
Apply to root repository
sage-sws2rst.rej (6.6 KB) - added by pang 7 years ago.
trac_10637-scripts-docsandmore.patch (7.0 KB) - added by kcrisman 7 years ago.
Apply to scripts
trac_10637_answer_to_kcrisman.patch (11.3 KB) - added by pang 7 years ago.
minor updates suggested by kcrisman
trac_10637-root.patch (1.1 KB) - added by kcrisman 7 years ago.
Apply to root repository
trac_10637-second-review.patch (1.3 KB) - added by kcrisman 7 years ago.
exp1.sws (53.1 KB) - added by dimpase 7 years ago.
example sws with an error
blah.sws (490 bytes) - added by dimpase 7 years ago.
another, trivial, example of sws giving an error
trac_10637_missing_exec.patch (497 bytes) - added by dimpase 7 years ago.
adding missing exec in spkg/bin/sage

Download all attachments as: .zip

Change History (187)

comment:1 Changed 9 years ago by nthiery

  • Description modified (diff)

comment:2 Changed 9 years ago by nthiery

  • Description modified (diff)

Changed 9 years ago by pang

sage package for BeautifulSoup?

comment:3 Changed 9 years ago by pang

So I have some preliminary code, but after a conversation with Vincent there is some things I have to check. For the moment, I have uploaded a sage package for BeautifulSoup?, which I think is important in dealing with the html code generated by tinyMCE, and which I use in the preliminary code. It's pretty straightforward so I hope it works in other platforms. I only tested on linux 32 bits, but it's a pure python file.

Changed 9 years ago by pang

patch to be applied on local/bin

Changed 9 years ago by pang

patch to be applied on sagenb

comment:4 Changed 9 years ago by pang

  • Cc nthiery hivert added

Install instructions

  • It's necessary to install the beautifulsoup spkg
  • A first patch adds the sage-sws2rst script, and it must be imported on the local/bin dir
  • A second patch sends three files to sagenb/misc and it must be imported on the devel/sagenb dir, and followed by a
    sage -python setup.py install && sage -python setup.py develop

Comments

The code above is preliminary and undocumented, but I post it in order to guide the discussion. I'll write comments and make improvements when we decide on some issues.

I have to warn the task is not trivial, as html is more relaxed than rst, so that some editing is required after the conversion. The result from the script produces rst files that get compiled correctly only on occassion, but usually the modifications required are not much.

How it works

The sws file is uncompressed, and the images inside the 'data' and the 'cell/xx' dirs ar copied to a '_media' dir (this is done by the sage-sws2rst file).

Then the file worksheet.html is parsed, and split into comments, source code and output. The first is handled using the library BeautifulSoup?. Source code is mostly untouched, and results are parsed to find some patterns like "image", "latex", "docstring", "other html patterns" and "plain text". The first two are displayed. The docstring, or any unrecognized html in the result cells are discarded. Plain text is indented, and is recognized as the output of the previous cell.

Issues

  • rest does not understand that images or latex are the output of some code cells.
  • docstrings are ignored. I felt stupid trying to parse some html that comes from a rst file. Maybe I can grab the rst file from the source, but only if it belongs to the library, so is it worth it?

Please add your concerns here.

comment:5 Changed 9 years ago by slabbe

  • Cc slabbe added

comment:6 Changed 9 years ago by slabbe

I dont know if it can help but this week I found this makefile which helped me to understand how to call sphinx with which options :

devel/sage-main/doc/en/tutorial/Makefile

Changed 9 years ago by pang

replaces the first patch

Changed 9 years ago by pang

replaces the first patch

comment:7 Changed 9 years ago by pang

  • Status changed from new to needs_review

I have tested this patch against a huge group of complex worksheets, and I'm satisfied with the result. I took into account many little things that are Sage specific or tinyMCE specific, but this new patch is still not 100% guaranteed to produce valid rst. However, it's very likely that the result will get compiled by sage -docbuild, and it's very likely that only small changes are required.

I'm still open to general suggestions.

comment:8 Changed 9 years ago by kcrisman

  • Cc kcrisman added

comment:9 Changed 9 years ago by nthiery

Hi Pablo!

I just used your script for preparing my next demo from a previous worksheet. It worked smoothly, thanks so much! It would be really good to get this in; alas, I don't feel qualified to review code touching the notebook. A couple tiny comments:

  • Could you add the installation instructions to the ticket description? This will make Jeroen happy :-)
  • I had to fix the execution rights on the script:
 chmod +x /opt/sage/local/bin/sage-sws2rst
  • Shall the advertisement of the option appear in sage -h or sage -advanced. I could argue for sage -h since this functionality is for sage users as much as developpers.
  • Support for converting several sws at once would be handy.
  • I got this warning each time I ran the script:
  /opt/sage/local/bin/sage-sws2rst:19: RuntimeWarning: tempnam is a potential security risk to your program
  tempname = os.tempnam('.')

Cheers,

Nicolas

comment:10 Changed 9 years ago by kcrisman

  • Cc beezer whuss added
  • Reviewers set to Nicolas Thiéry

nthiery, this is still reviewing! I'm adding you, even if you don't feel like you can give positive review.

Also, we'll want correct commit messages for the release manager.

It is really a shame that the work here, at tex2sws, and sws2tex are not coordinated. I'm just cc:ing some of those people so they are aware of this. Eventually, of course, those should both be part of the standard Sage, and one could refactor some of them. Well, at least of sws2tex and sws2rst.

comment:11 Changed 9 years ago by nthiery

  • Reviewers changed from Nicolas Thiéry to Nicolas Thiéry, ...

Ah, I forgot to mention:

What about using '----' instead of ':::::' to underline subsection titles?

It seems a bit more readable, and ':' is already used quite a lot by Sphinx; and one could imagine a very (very!) short title producing a confusing '::'.

Cheers,

Nicolas

comment:12 Changed 9 years ago by nthiery

For the record, the following tests fail with the patches applied on 4.6.2 with Error: TAB character found

   sage -t  "4.6.2/devel/sagenb-main/sagenb/misc/worksheet2rst.py"
   sage -t  "4.6.2/devel/sagenb-main/sagenb/misc/comments2rst.py"

comment:13 Changed 9 years ago by pang

Hello: I've addressed those issues, but I can't pass a test with a message:

Expected:
    '::\n\n    sage: 2+2'
Got:
    '::\n\n    sage: 2+2'

They look the same. They both use spaces, not tabs.

If I run the test from a sage console, it works (with ==).

any ideas?

Changed 9 years ago by pang

add_sws2rst_3.patch (replaces earlier versions)

Changed 9 years ago by pang

tools_sws2rst_3.patch (replaces earlier versions)

comment:14 follow-up: Changed 9 years ago by pang

  • Description modified (diff)
  • Owner changed from jason, mpatel, was to pang

Done. I also added some tests:

  • I replaced by """, otherwise tests are ignored!!!
  • I don't use the tests which include the word sage:, because they fail in weird ways. This is a consequence of the testing framework.

comment:15 in reply to: ↑ 14 Changed 9 years ago by pang

  • Owner changed from pang to jason, mpatel, was

Done. I also added some tests:

  • I replaced ' ' ' by " " ", otherwise tests are ignored!!!
  • I don't use the tests which include the word sage:, because they fail in weird ways. This is a consequence of the testing framework.

comment:16 Changed 9 years ago by jbandlow

This is great! I notice that when this is run on old notebook files, one gets the following error:

    Traceback (most recent call last):
  File "/usr/local/src/sage/sage-4.7.rc0/local/bin/sage-sws2rst", line 66, in <module>
    process_sws(file_name)
  File "/usr/local/src/sage/sage-4.7.rc0/local/bin/sage-sws2rst", line 38, in process_sws
    for cell in os.listdir(cells_path):
OSError: [Errno 2] No such file or directory: '/tmp/Iterators and backtracking.sws/sage_worksheet/cells'

I think this is because there is another directory level on these old-style worksheets:

   '/tmp/Iterators and backtracking.sws/sage_worksheet/<some number>/cells'

It would be nice if this error could be caught and the user given a message to the effect of "Conversion failed. Try opening your notebook with a recent version of sage, resaving it, and running the script again".

comment:17 Changed 9 years ago by jbandlow

This post is just an attempt to close the boldface.

comment:18 Changed 9 years ago by jason

I followed the directions in the description to install, but got this when I tried running it:

% sage -sws2rst mysheet.sws
/Users/grout/sage/local/bin/sage-sage: line 385: /Users/grout/sage/local/bin/sage-sws2rst: Permission denied

So perhaps it needs an executable bit, or the patch needs to be exported in git format to pick that up?

comment:19 Changed 9 years ago by jason

My worksheet name included spaces. A few images led to this being in the text:

.. image:: PREP Quickstart, NumAnalysis_media/cell_5_sage0.png

However, the html output from running sphinx on the rst file included a link to this image:

<img alt="PREPQuickstart,NumAnalysis_media/cell_5_sage0.png" class="align-center" src="PREPQuickstart,NumAnalysis_media/cell_5_sage0.png" /> 

(notice the missing spaces)

I looked around and this is a bug report here: https://bitbucket.org/birkenfeld/sphinx/issue/453/spaces-get-stripped-from-image-file-names

According to http://docutils.sourceforge.net/docs/howto/rst-directives.html#image, the first thing done in the image directive is to discard all whitespace, so this script should probably discard all whitespace when making the image links.

comment:20 Changed 9 years ago by jason

I just converted a worksheet, and the input/output of a cell got converted so that the output was indented with a tab:

    sage: spline[1]
	1/2*(x - 2)*((-14.3540674845*x + 37.3205750783)*(x - 2) + 25.741626739501953) + 20.0

It seems like the output should be indented 4 spaces instead.

Changed 8 years ago by pang

add_sws2rst_4.patch (replaces earlier versions)

Changed 8 years ago by pang

tools_sws2rst_43.patch (replaces earlier versions)

comment:21 Changed 8 years ago by pang

  • Description modified (diff)

comment:22 Changed 8 years ago by pang

Solved the problems mentioned by jason: thanks for testing it.

comment:23 Changed 8 years ago by pang

jbandlow, ¿can you send me an old worksheet file? in the old worksheets that I keep, it's rather

/tmp_dir/<some number>/cells

than

/tmp_dir/sage_worksheet/<some number>/cells

It is possible to deal with the old style directly, unless you have a reason to force the user to pass the worksheet through a newer version...

comment:24 follow-up: Changed 8 years ago by kcrisman

Here is another related project - sws2tex, though it uses HTMLParser instead of Beautiful Soup.

comment:25 in reply to: ↑ 24 Changed 8 years ago by pang

Replying to kcrisman:

Here is another related project - sws2tex, though it uses HTMLParser instead of Beautiful Soup.

I know, someone told me about that project before. Though it may seem like duplicated effort, the approach is different: sws2tex converts html+css style into latex style, color by color, font by font, while sws2rst ignores all style. Still I might learn from the code, for sure.

comment:26 Changed 8 years ago by kini

Just checking, but has tools_sws2rst_4.patch been ported to the new notebook at http://github.com/sagemath/sagenb ?

comment:27 Changed 8 years ago by kini

  • Dependencies set to #11080

Setting #11080 as a dependency because the old notebook is basically abandoned now.

comment:28 Changed 8 years ago by kcrisman

Low-priority comment: Beautiful Soup is now at version 4, though that is under the MIT license. It works with Python 2.7 and 3. The 3.2.1 maintenance release is as of last February, also not too old.

comment:29 Changed 8 years ago by kcrisman

Technical comment: see the latest spkg-install instructions for a slight update on the shell commands - in particular, we should have a nonempty exit check at the end, and the SAGE_LOCAL bit has changed slightly.

Also, you don't need to make this a p0 package, because there are no patches! :)

comment:30 Changed 8 years ago by kcrisman

  • Report Upstream changed from N/A to Reported upstream. No feedback yet.
  • Reviewers changed from Nicolas Thiéry, ... to Nicolas Thiéry, Jason Grout, Karl-Dieter Crisman, Jason Bandlow
  • Status changed from needs_review to needs_work

Less trivially, and for this 'needs work':

  • Your patch probably won't apply - there is no longer the script sage-sage! See e.g. the 5.0 sage-sage rump. Luckily, you can probably create nearly the same patch to the new script (which is the old script) at $SAGE_ROOT/spkg/bin/sage.
  • The notebook is now "upstream", so to speak, so we need to report this there and can't really merge it until the new notebook is in in any case. I've submitted an upstream issue report at this location.

So might as well upgrade Beautiful Soup and fix the other stuff too. I would like to try this over the next few weeks to finally start migrating some documentation to ReST format, so I may do all this if I get a chance.

comment:31 follow-up: Changed 8 years ago by pang

Thanks a lot for the careful comment on what needs to be updated. I'll take care soon.

comment:32 in reply to: ↑ 31 Changed 8 years ago by kcrisman

Thanks a lot for the careful comment on what needs to be updated. I'll take care soon.

Great; I would much rather review this functionality so that I can use it than do all of that! Looking forward to it.

comment:33 Changed 7 years ago by kcrisman

Just to encourage you, I took a look and it looks like you can really practically take the identical patch, just on this new file! Since all the other files are new, and don't seem to interact with anything else, they should be fine.

I'm even wondering whether sage-sws2rst would need to live in Sage. This is nearly pure notebook functionality. But maybe that would be for another ticket; far more important to get this in. I may try at least doing this "by hand" next week to work on my project.

comment:34 Changed 7 years ago by kcrisman

  • Description modified (diff)

New Beautiful Soup spkg at http://sage.math.washington.edu/home/kcrisman/beautifulsoup-3.2.1.spkg this location. I'm going with the maintenance release for now because I don't want to have to think about licenses.

I'll try to figure out what to do with the other stuff soon; shouldn't be hard, but I always have trouble with the new notebook upstream business...

comment:35 Changed 7 years ago by kcrisman

Rebased the patch for the sage script for the root directory. I put its advanced message in documentation because all the other sections were even worse, and it certainly can lead to documentation. I'm open to ideas about that, though.

Changed 7 years ago by kcrisman

Apply to local/bin

comment:36 Changed 7 years ago by kcrisman

  • Description modified (diff)
  • Reviewers changed from Nicolas Thiéry, Jason Grout, Karl-Dieter Crisman, Jason Bandlow to Nicolas Thiéry, Jason Grout, Karl-Dieter Crisman, Jason Bandlow, Karl-Dieter Crisman
  • Status changed from needs_work to needs_review

comment:37 Changed 7 years ago by kcrisman

  • Description modified (diff)

Okay, at this point one could just review this as-is. Naturally, the sagenb stuff needs to be applied upstream, but these instructions would work. I'm not sure what the whole setup.py business is about, but I guess it's worth doing? I don't know if you'd have to be in the sagenb directory for that...

  • Apply tools_sws2rst_4.patch to sagenb, possibly followed by
        sage -python setup.py install && sage -python setup.py develop
    

comment:38 Changed 7 years ago by kcrisman

Pablo, why do these files need to be in the notebook? I'm a little confused about this. They don't seem to use anything in the notebook. Couldn't one just have them be in some other directory in the main Sage library? I do understand that they are about the notebook, so maybe it would be best to have them there, but then it's weird that the script isn't also in the notebook.

Anyway, that's a side issue, I guess.

comment:39 Changed 7 years ago by kcrisman

Okay, I tried this, but got

$ ../../sage -sws2rst ~/Downloads/Test4sws2rst.sws 
<snip>
  File "/Users/.../sage-5.1.beta1-flask/local/lib/python/os.py", line 157, in makedirs
    mkdir(name, mode)
OSError: [Errno 20] Not a directory: '/Users/.../Test4sws2rst.sws/sage_worksheet'

which might make sense based on comment:23, except this was a brand-new worksheet. Maybe I'm doing something wrong in using it?

The specific place in sage-sws2rst that breaks is

sws_file.extractall(tempname)

and the rest the Python this calls.

Here is the problem, in pure Python

>>> tempname = os.path.join(tempfile.gettempdir(), "/Users/.../Test4sws2rst.sws")
>>> tempname
'/Users/.../Test4sws2rst.sws'
>>> tempfile.gettempdir()
'/var/folders/Yy/YytEJm5VEB0+pBRD7JNLe++++TQ/-Tmp-'

Why isn't this working right? Ah, looking at the help tells me...

If any component is an absolute path, all previous path components
    will be discarded.

So apparently one can only do an extraction from the same directory (../ gives a similar problem).

But then!

$ ./sage -sws2rst Test4sws2rst.sws 
Test4sws2rst.sws
Traceback (most recent call last):
  File "/Users/.../sage-5.1.beta1-flask/local/bin/sage-sws2rst", line 66, in <module>
    process_sws(file_name)
  File "/Users/.../sage-5.1.beta1-flask/local/bin/sage-sws2rst", line 39, in process_sws
    for cell in os.listdir(cells_path):
OSError: [Errno 2] No such file or directory: '/var/folders/Yy/YytEJm5VEB0+pBRD7JNLe++++TQ/-Tmp-/Test4sws2rst.sws/sage_worksheet/cells'

And now I nearly give up. Isn't the point exactly to create such things? I'm doing something wrong here, or maybe I rebased the patches wrong? I don't think so...

Here we go.

$ ls /var/folders/Yy/YytEJm5VEB0+pBRD7JNLe++++TQ/-Tmp-/Test4sws2rst.sws/sage_worksheet/
worksheet.html		worksheet.txt		worksheet_conf.pickle

No cells. I wonder why? worksheet.html does have stuff in it. Now I really am stumped.

Last edited 7 years ago by kcrisman (previous) (diff)

comment:40 Changed 7 years ago by kcrisman

  • Status changed from needs_review to needs_work

This needs work - the sage -advanced list now has a file conversion category, and I didn't base this on the most recent version of Sage. Patch coming.

comment:41 Changed 7 years ago by kcrisman

  • Authors changed from Pablo Angulo to Pablo Angulo, Karl-Dieter Crisman
  • Reviewers changed from Nicolas Thiéry, Jason Grout, Karl-Dieter Crisman, Jason Bandlow, Karl-Dieter Crisman to Nicolas Thiéry, Jason Grout, Karl-Dieter Crisman, Jason Bandlow
  • Status changed from needs_work to needs_review

Okay, I put it there, and also removed it from the non-advanced list, since the rst2sws and rst2txt were not in the normal list.

comment:42 Changed 7 years ago by kcrisman

  • Report Upstream changed from Reported upstream. No feedback yet. to Workaround found; Bug reported upstream.

See this pull request at sagenb for the latest upstream report.

comment:43 follow-ups: Changed 7 years ago by kini

I wonder why you are doing this conversion from scratch instead of using the docutils syntax tree. We already ship docutils - doesn't it have a way to convert HTML to rST? Why do we need BeautifulSoup?

comment:44 in reply to: ↑ 43 Changed 7 years ago by kcrisman

I wonder why you are doing this conversion from scratch instead of using the docutils syntax tree. We already ship docutils - doesn't it have a way to convert HTML to rST? Why do we need BeautifulSoup?

I'm sure that's a good question, but maybe that would be an improvement for later...

comment:45 Changed 7 years ago by kcrisman

WOW!!! This is really awesome! I did a moderately complicated worksheet just now, and had only one error message.

: WARNING: Bullet list ends without a blank line; unexpected unindent.

This was in a sublist. Here is the html source in sage_worksheet.

<p>This works because Sage is very collaborative. &nbsp;</p>
<ul>
<li>Anyone who has access to the server can be invited to share a document.</li>
<li>In this case, the teacher would post a lab, and the students could share a copy of it with each other to work on it.</li>
<li>Then, when they are done, they can:
<ul>
<li>Publish it to the server for the teacher to view, or&nbsp;</li>
<li>Share it with the teacher, or&nbsp;</li>
<li>Download it to be uploaded by the teacher...</li>
</ul>
</li>
</ul>

I don't know whether this is 'needs work' or for a future ticket, since it still looks right in the built Sphinx output.

Also, along the same priority lines, if you include jpg images in the data directory, I guess they don't always look quite right - that is, they don't scale with the size of the picture. I guess that is just because they are jpg files, not png - probably there is no real solution to this. The graphics from Sage do fine.


More importantly, I think that there we need to add some documentation as to how to use this somewhere. What do you think? I think that the Sphinx tutorial in conjunction with the following would do it.

  • To turn a worksheet into nice documentation outside the Sage doc tree, first make a folder somewhere convenient labeled (say) MyProject.
  • Then move your .sws file into that folder, and cd into it.
  • Now the key is to use Sage's shell to run Sphinx on it! Run sage -sh here.
  • Now type sphinx-quickstart and follow the instructions in the Sphinx tutorial. you will probably want to render math with MathJax?.
  • If you now type make html you should get a beautiful-looking web page in _build/html. Other things like make pdf will do the same.

Here's another thing. As far as I can tell, one can pretty much only run this on an .sws that is in the directory you live in. So we need to document this in the same place.


So here are the issues for 'needs work'.

  • Make sage -sws2rst fail gracefully if one hasn't installed the beautifulsoup spkg. Naturally, this requires beautifulsoup to be an optional or at least experimental spkg. Since it's just a Python thing, that shouldn't be a problem.
  • Add documentation. This doesn't need to be intense.
  • Somehow we have to resolve this issue with the older worksheets (if it is in fact an issue).
  • Figure out how to deal with the case where there are no cells/ in the worksheet for some reason.

comment:46 Changed 7 years ago by kini

I'm sure that's a good question, but maybe that would be an improvement for later...

So, we add BeautifulSoup as a dependency for sagenb in this ticket, and then deem removing it again a followup? That's a bit... odd...

  • Make sage -sws2rst fail gracefully if one hasn't installed the beautifulsoup spkg. Naturally, this requires beautifulsoup to be an optional or at least experimental spkg. Since it's just a Python thing, that shouldn't be a problem.

That's more like it :)

comment:47 in reply to: ↑ 43 Changed 7 years ago by slabbe

Replying to kini:

I wonder why you are doing this conversion from scratch instead of using the docutils syntax tree. We already ship docutils - doesn't it have a way to convert HTML to rST? Why do we need BeautifulSoup?

As far as I know, docutils reads rst only. It translates this into plenty of format (odt, .html, .tex, s5, etc.) but not the other way around. To implement sws2rst, you need some html reader like beautiful soup.

Sébastien

comment:48 Changed 7 years ago by kcrisman

With respect to this worksheet with no cells, it gets weirder. As soon as I downloaded it from sagenb.org and uploaded it locally, then downloaded again, it had everything and worked fine. Continuing to test.

comment:49 Changed 7 years ago by kcrisman

Okay, I can reproduce this.

  • Create a new worksheet.
  • Save it immediately (no computation).
  • Download it and process with sage -sws2rst.

Probably something like "delete all output" could also make this happen. I don't know why a worksheet wouldn't have cells in other cases, but at any rate this can happen under "natural" circumstances as well as my weird one.


By the way, the stuff generated is not perfect, mainly because many of us use TinyMCE header styles to do larger fonts - just easier than trying to figure out how to make larger text and center it. So semantics and actual use don't coincide. But I figure if someone wanted to turn this into a rst file then they would know that might happen.

comment:50 Changed 7 years ago by kcrisman

  • Status changed from needs_review to needs_work
  • Work issues set to fail nicely if no beautifulsoup, add doc in sage-sws2rst, case with no cells/ directory

At #11459, slabbe points out that sage-rst2sws has pretty good docs right in the script itself that print out with sage -sws2rst -h. That would suffice for the purposes here.

comment:51 Changed 7 years ago by kcrisman

So I changed enough to add a second patch for clarity of what is what. I have now plenty of documentation with sage -sws2rst -h and then sage -sws2rst --sphinxify (which is mentioned in the help message.

Also, it turns out that this already does fail nicely if there is no BeautifulSoup, except it recommends to download from this Trac ticket. Wow, an entire spkg attached to a ticket! So that needs to be fixed, but that is only a tiny change. Patch coming up there too.

Changed 7 years ago by kcrisman

Apply to sagenb

Changed 7 years ago by kcrisman

Apply to root repository

comment:52 follow-up: Changed 7 years ago by kcrisman

  • Description modified (diff)
  • Milestone changed from sage-5.1 to sage-5.2
  • Status changed from needs_work to needs_review
  • Work issues fail nicely if no beautifulsoup, add doc in sage-sws2rst, case with no cells/ directory deleted

To patchbot and others, instructions:


Okay, this should take care of all the work issues. I've made enough changes in the sage-sws2rst file that it definitely needs review, whether from pang or slabbe or someone else. Please try especially to break it with weird input; but everything else should really be okay, given that my changes to Pablo's great core work is very minimal.

Note that not only my review patches need review, but also the original sagenb patches as well. The sagenb stuff is up-to-date in the pull request 75 of sagenb.

I'm sure there are more elegant ways to do it, but the original scripts patch is fine.

I suppose someone should also review the spkg, though there is really almost nothing to review other than bringing it up to developer guide guidelines. See this sage-devel thread for a vote about whether this is allowed to be an optional spkg.

Last edited 7 years ago by kcrisman (previous) (diff)

comment:53 in reply to: ↑ 52 Changed 7 years ago by kcrisman

I suppose someone should also review the spkg, though there is really almost nothing to review other than bringing it up to developer guide guidelines. See this sage-devel thread for a vote about whether this is allowed to be an optional spkg.

No response, but given that this is available via both pip and easy_install, I say it should be optional at the end of this.

Now we just need review of the review patches, and the sagenb patches.

comment:54 Changed 7 years ago by pang

Thanks a lot for you care Karl.

I downloaded a source version of sage 5.0.1, and tried to:

"Apply trac_10637-root.patch to the root repository", with the result:

patching file spkg/bin/sage
Hunk #1 FAILED at 189
Hunk #2 succeeded at 397 (offset -23 lines).
1 out of 2 hunks FAILED -- saving rejects to file spkg/bin/sage.rej
patch failed to apply
spkg/bin/sage
patch failed, rejects left in working dir
errors during apply, please fix and refresh trac_10637-root.patch

where spkg/bin/sage.rej reads:

--- sage
+++ sage
@@ -190,6 +190,8 @@
     echo "                         reStructuredText source."
     echo "  -rst2sws [...]      -- Generates Sage worksheet (.sws) from standalone"
     echo "                         reStructuredText source."
+    echo "  -sws2rst <sws doc>  -- Generates a reStructuredText source file from"
+    echo "                         a Sage worksheet (.sws) document."
 
     echo
     ####  1.......................26..................................................78

What am I dong wrong?

comment:55 Changed 7 years ago by kcrisman

  • Dependencies changed from #11080 to #11080, #11459

I based it on a later beta because of the model from rst2sws - this depends on #11459, I guess, sorry for not making that more explicit.

Last edited 7 years ago by kcrisman (previous) (diff)

comment:56 Changed 7 years ago by pang

ok, now

applying trac_10637-root.patch
now at: trac_10637-root.patch

works, but the next patch fails

applying trac_10637-root-docsandmore.patch
unable to find 'sage-sws2rst' for patching
5 out of 5 hunks FAILED -- saving rejects to file sage-sws2rst.rej
patch failed, unable to continue (try -v)
patch failed, rejects left in working dir
errors during apply, please fix and refresh trac_10637-root-docsandmore.patch

I attach sage-sws2rst.rej...

Changed 7 years ago by pang

comment:57 Changed 7 years ago by kcrisman

  • Description modified (diff)

I'm stupid. That patch, despite the name, applies to the scripts (local/bin) repository. Sorry; updating instructions here and in the description.


To patchbot and others, instructions:

comment:58 Changed 7 years ago by kcrisman

pang commented elsewhere:

Hi, karl: It's working now, I just need to doctest the full sage library and stuff like that. Currently my hardware is limited, so I cannot do this at any time.

You may want to get an account on the sage.math cluster.

Anyway, I wouldn't worry too much about doctests outside of sagenb, since this doesn't actually change the Sage library at all! The changes are in the actual script, and so checking that it still works the same as you intended and has correct logic is the most important thing.

Once that was true, all that would remain would be for me or someone else other than you to review the actual sagenb stuff you originally included in tools_sws2rst_4.patch. I may eventually be able to do so, but such parsing is slow going for me. I wonder if one of the folks from sage-combinat might be recruited to do so? On the plus side, there is very little chance of bitrot on the sagenb end, and fixing it on the Sage end should be relatively straightforward.

comment:59 follow-ups: Changed 7 years ago by kcrisman

Okay, I've read through the code, and here is everything I have to say without actually trying it out on a variety of worksheets. Anyone have any comments on my patch to the actual script?


To me, the main issue is that the code needs to be fairly well-formed. Is worksheet.html really always that well-formed of HTML? I just don't know.

The reasons this doesn't concern me too much are

  1. The worst that could happen is that the rst doesn't look good, but nothing gets destroyed
  2. Presumably someone wouldn't bother to use this functionality in the first place without checking that the worksheet at least looked nice
  3. Presumably BeautifulSoup makes the html more well-formed

Places this could go wrong, maybe, with weird input, below. Keep in mind I'm not at all a regex wizard, so that could be part of my questions. I'd appreciate any responses to whether these could be problems; although most of them wouldn't be a big issue, I still feel like especially the first several really require (for me) explanation.

  • $$ that are intended to be empty LaTeX for later filling in - definitely counts on $$ always existing in pairs
  • I'm not sure about the replace_courier thing. Are you saying that you, personally, use Courier font to indicate code in TinyMCE, and this is how you replace it with <code></code> tags? It would be nice for this to be customizable; otherwise, what happens to the poor sap who happens to like Courier and then finds all their text replaced by code?
  • It looks like in visiting ordered lists, that all sublists will automatically becomes numbered. But couldn't one have an unordered list inside an ordered list?
  • in the replace_latex thing, is it conceivable that the re's would match something by mistake? It looks like e.g. latex_beginning is matching anything that starts and ends with a dollar sign, as long as it really does end with a dollar sign and not \$, where that would include any character (possibly zero) before that. Again, in a well-formed worksheet that wouldn't be a problem, but maybe sometimes people would do things like "$\$$".
  • Or what if the TinyMCE was (completely) "h$x+y=z$", then wouldn't this get replaced by ":math:x+y=z" with the h gone? It seems like you're assuming
    1 is a whitespace character?
  • How could the branch of visit_strong with _inside_code_tag be reached? The only place this flag seems to be True is in visit_code, but in that case one just gets plain text and and wouldn't visit_strong, it seems to me
  • What if there was an anchor tag that was NOT an "href"? There are other uses for anchors.
  • There is potential for malformed /// or }}} without the others or cell id's missing and | missing to cause trouble
  • <p> could be appended to no purpose; I guess this counts heavily on all tags being properly paired
  • In results sections, is all html being removed? Is there a reason for this? I assume that this is to take care of a specific type of result that can occur that I can't think of right now. After all, a Sage cell is a Python cell, and html certainly could be legitimate output.
  • Is it conceivable that the rest will do something wrong with the escape_chars + and friends when it's not replaced? I assume not since they're turned into :math: mode
  • Am I correct in saying that once inside a code tag, all other tags are ignored? That appears to be what is going on here.
  • In the table, I suppose it's at least possible there could be a tfoot element. Does this not come out of TinyMCE?
  • What other preformatted text <pre> do we expect? Should they all be ::'ed?
  • I know this might not happen a lot, but in theory a cell could do sage: sage: sage: and it should all be removed... not that this is a very big deal.
  • I do like that most of the traceback is removed - am I right in that? Why is there not a continue traceback thing?

Well, that's a lot of dumb little questions. On the very positive plus side, it looks like this is at least as good, if not much better, than other html2rst things on the web - one even advertises "no support for tables, don't even try". I like the infrastructure for extending this should there be call for more to go, and the use of ints but via the names. Great work!

comment:60 in reply to: ↑ 59 ; follow-up: Changed 7 years ago by pang

Thanks again, Karl. I will address most comments later, after some experiment, but I'd love to hear your opinion on one of them. You said:

Replying to kcrisman:

  • In results sections, is all html being removed? Is there a reason for this? I assume that this is to take care of a specific type of result that can occur that I can't think of right now. After all, a Sage cell is a Python cell, and html certainly could be legitimate output.

What should I do? I don't feel like parsing it into rst, because it's output, and parsing is lossy. I cannot be too literal either, because I have to leave something that works when the rst files are rendered into html, tex or whatever.

The question is related to others, like the one on tracebacks for example. I didn't like long tracebacks in the docs, which would scare my non-technical prospective readers. Should I rewrite sws2rst to incorporate several config options, like long_tracebacks or parse_html_in_results or replace_monospace_font_by_code_tags? Can you think of others?

comment:61 in reply to: ↑ 60 Changed 7 years ago by kcrisman

  • In results sections, is all html being removed? Is there a reason for this? I assume that this is to take care of a specific type of result that can occur that I can't think of right now. After all, a Sage cell is a Python cell, and html certainly could be legitimate output.

Here's an example - @interact output is html. In sws2tex, whuss and friends just completely ignore them - perhaps not the worst idea, given that it's supposed to be interactive and a tex (or rst) document isn't, by definition.

What should I do? I don't feel like parsing it into rst, because it's output, and parsing is lossy. I cannot be too literal either, because I have to leave something that works when the rst files are rendered into html, tex or whatever.

I suppose one could start the tree all over again? But that seems less than ideal. Would it be possible to place the html in code tags? I am wondering what some of the typical examples you ran across were. Certainly we can't (or at least don't want to, maybe rest supports this) support all the style stuff html would support.

The question is related to others, like the one on tracebacks for example. I didn't like long tracebacks in the docs, which would scare my non-technical prospective readers. Should I rewrite sws2rst to incorporate several config options, like long_tracebacks or parse_html_in_results or replace_monospace_font_by_code_tags? Can you think of others?

Maybe in another ticket, in general. The html one I was a little unsure, since html output could be legitimate. I do think that the monospace (Courier, I guess?) by code shouldn't be the default, just because that seems to be your own personal trick, but conceivably others might just want text, and might be surprised if all text that was in lots of TinyMCE fonts looks the same, except the text in that particular font. I wish we had a way to turn the different font sizes into different rest...

Anyway, it's clear that there are no major changes required here. I'm going to start testing the same set of worksheets Jason tried last year now, since I'm hoping to convert them anyway, and see what happens. Let's hope this can get in by Sage 5.3!

Changed 7 years ago by kcrisman

Apply to scripts

comment:62 Changed 7 years ago by kcrisman

  • Description modified (diff)

Made a very minor change to the sphinxify help message, so I renamed that patch so it makes more sense now.

comment:63 follow-ups: Changed 7 years ago by kcrisman

Okay, here are some comments from a first try of a bunch of worksheets. It is a long list! So before I get to it, I want to say that in general, this is working great; everything that doesn't work is easily fixable in the rst source file by a very much non-expert in ReST like myself. If some of these things are fixable - particularly the extra newline needed in lists, and the issue with math formatting - that would be best before this makes it into Sage, but most of the rest should be in an upgrade. Great work!


  • As I suspected, intra-worksheets links are not respected. For example, I had
    <a href="#Main">below</a>
    
    and then later
    <p id="Main">The main part of ...
    
    The ReST is
    please skip  `below <#Main>`_ .
    
    I don't know if that is easy to fix, but anyway it's not perfect. Again, given the goals of this, that could be in a second version, but I wanted to point it out. Maybe if the text is #foo it could be prepended somehow? But of course the ids are mostly gone, I guess.
  • The way you replace a span by a \n means that sometimes TinyMCE stuff left over from trying to unformat things gets in a new line. TinyMCE is annoying that way - sometimes to unitalicize you have to do a few things. Anyway, users will have to expect lots of newlines because of that. In one case, I intentionally had used a lot of different ones, and since bold isn't a span, but everything else you can do is, the doc looked kind of weird, lots of new lines. In another case, the text was colored on purpose.
    Plot a green <span style="color: #008000;">$y=\sin(x)$</span> together with...
    
    and then it made a new line there. I'm wondering why you chose to make \n instead of just a space - surely you encountered some "real-life" examples where that was the better option. I'm assuming we can't easily take the color info in, if that's the only info.
  • I think one might need a few other new lines in other places. Here is an example of something that didn't work right. The html:
    <p>For a typical mathematical function, it's pretty straightforward to define it. &nbsp;Below, we define the function $$f(x)=x^2\; .$$</p>
    
    which became
    For a typical mathematical function, it's pretty straightforward to define it.  Below, we define the function
    .. MATH::
    
        f(x)=x^2\; .
    
    which typeset with the .. MATH: as part of the paragraph, interpreting the double colon as making it an input cell (which it then looked like. I'm not sure what happened here, but I'm guessing it has something to do with the removal of the paragraphs with double dollar signs.
  • Here's an even worse example of math not behaving right.
    <ul>
    <li>For instance, it isn't too hard to add things like $$\zeta(s)=\sum_{n=1}^{\infty}\frac{1}{n^s}=\prod_p \left(\frac{1}{1-p^{-s}}\right)\; .$$&nbsp;</li>
    <li>One just types things like&nbsp;"\$\$\zeta(s)=\sum_{n=1}^{\infty}\frac{1}{n^s}=\prod_p \left(\frac{1}{1-p^{-s}}\right)\$\$" in the word processor. &nbsp;</li>
    <li>Whether this shows up as nicely as possible depends on what fonts you have in your browser, but it should be legible. &nbsp;</li>
    <li>More realistically, we might type "\$f(x)=x^2\$" so that we remember that $f(x)=x^2$ in this worksheet.</li>
    </ul>
    
    became all math mode, because the indentation
      - For instance, it isn't too hard to add things like
    .. MATH::
      
        \zeta(s)=\sum_{n=1}^{\infty}\frac{1}{n^s}=\prod_p \left(\frac{1}{1-p^{-s}}\right)\; .
    
    
    
      - One just types things like "\$\$\zeta(s)=\sum_{n=1}^{\infty}\frac{1}{n^s}=\prod_p \left(\frac{1}{1\-p^{\-s}}\right)\$\$$
        
      - Whether this shows up as nicely as possible depends on what fonts you have in your browser, but it should be legible.
    
    
    all was interpreted as being part of the math block! Instead, the math block should have been inside the list, I guess?
  • This is just amusing.
    arrow $\mapsto$ as "|--&gt;").
    
    becomes the same thing, so it doesn't turn back into |-->, the arrow! Not sure what to do about that; why didn't the greater than sign just become a greater than sign when translated from html back? Also, I thought that | was one of the characters you escaped, but maybe this one escaped being escaped?
  • In that same line, Sphinx says
    WARNING: Inline substitution_reference start-string without end-string.
    
    which I assume is related because of the open pipe | without a close pipe.
  • Dumb question; is there a way to have an empty cell? Currently you remove them, but if the input and output are empty, then maybe it's supposed to be an empty cell in the "live" documentation... that's obviously very low priority.
  • Hyperlinks (to other sites, which do work) add an extra space on either side. That looks ok if there isn't punctuation, but if there is, it looks weird.
    the  `Sage website <http://www.sagemath.org/help.html>`_ , which
    
    Note the space before the comma, which shows up in the built doc as well.
  • Here is an example of the very common error about bullet lists.
     - Sage uses the Python programming language, which uses this syntax, 'under the hood', and
    
     - Because it makes it easier to distinguish among
      - The mathematical object,
    
    yields
    WARNING: Bullet list ends without a blank line; unexpected unindent.
    
    pretty consistently. This should be fixed, and probably isn't too hard to do by adding an extra \n somewhere in the ul code. I would say that 90% of the Sphinx errors I got were this.
  • There were also a few like this.
    WARNING: Block quote ends without a blank line; unexpected unindent.
    
    This comes from turning things like
    <p>The most obvious one is simply turning $$\int f(x)dx$$ into $$\int_a^b f(x)dx\; ,$$ as indicated in the Calculus I section. &nbsp;</p>
    

into

The most obvious one is simply turning
.. MATH::

    \int f(x)dx

  into
.. MATH::

    \int_a^b f(x)dx\; ,

  as indicated in the Calculus I section.

Not only is there the issue with the reinterpretation of the math block as a literal block, but it interprets the extra words as part of the literal block - but only because they for some reason got extra spaces.

  • I found one thing that doesn't show up properly because of the html blocks being removed - documentation! If one evaluates a cell like binomial?? then that is all html, and just disappears. Maybe just making it plain text? I'm not sure either, you are right about that being a pain in the neck to do anything intelligent with.
  • There isn't much you can do about this, but since "the order enforced will be the order as encountered" with section heading markup, one can pretty easily get a lot of SEVERE: Title level inconsistent: warnings. This is because we have a WYSIWYG editor which doesn't remember state between cells.
  • Back to issues about lists. I think that you are starting too big. The first indent level should just be at the main level; the ones after that can be indented by one. Isn't this
    - item one
    
     - item 1a
     - item 1b
    
    - item two
    
     - item 2a
     -item 2b
    
    more correct than
     - item one
      - item 1a
      - item 1b
    
     - item two
      - item 2a
      -item 2b
    
    which you currently have? If you look at rendered worksheets, you'll see what I mean - everything is just a little too far in.

comment:64 Changed 7 years ago by kcrisman

And as a followup, doing another group of simpler worksheets yielded only three bullet list errors and a few other things. I did have a couple MathJax issues, but I'm pretty sure that's what they are, nothing to do with the patches here.

comment:65 Changed 7 years ago by jhpalmieri

Maybe a patch like the following would allow passing absolute path names as arguments:

  • sage-sws2rst

    diff --git a/sage-sws2rst b/sage-sws2rst
    a b def process_sws(file_name): 
    5151    #TODO: python complains about using tempnam, but I don't
    5252    #know hot to fix it or see any danger
    5353#    tempname = os.tempnam('.')
    54     tempname = os.path.join(tempfile.gettempdir(), file_name)
    55     sws_file.extractall(tempname)
    5654    base_name = os.path.split(os.path.splitext(file_name)[0])[1]
    5755    base_name_clean = base_name.replace(' ','_')
     56    tempname = os.path.join(tempfile.gettempdir(), base_name_clean)
     57    sws_file.extractall(tempname)
    5858
    5959    #Images
    6060    images_dir = base_name_clean + '_media'

This needs serious testing, but being unable to pass absolute paths is a show-stopper for me. By the way, you could also use os.path.basename instead of os.path.split to define base_name.

comment:66 Changed 7 years ago by kcrisman

  • Description modified (diff)
  • Reviewers changed from Nicolas Thiéry, Jason Grout, Karl-Dieter Crisman, Jason Bandlow to Nicolas Thiéry, Jason Grout, Karl-Dieter Crisman, Jason Bandlow, John Palmieri
  • Status changed from needs_review to needs_work
  • Work issues set to answer questions, math formatting, lists, maybe absolute paths?

Okay, after thinking about it some more, here is what would be absolutely required for positive review.

  • All questions in comment:59 and comment:63 need to be answered. Not solved! Just to have some answer, even if it's "I don't know". This will help tremendously for any followup work.
  • The issue with the math formatting needing the new line must be fixed.
  • The variety of bullet issues should be made better.
    • The unexpected indent warning
    • Display math in lists
    • If not already true, allow unordered lists as sublists of ordered ones (? maybe not as important as the others)

Humbly disagreeing with jhpalmieri about the absolute paths, since I documented it - but of course that would be great, what magic makes this work but not the other version? If this does in fact work, then I guess that would be good too.

The point is that we don't want to make the review process overly long here.

comment:67 in reply to: ↑ 63 ; follow-up: Changed 7 years ago by kcrisman

Some suggestions for code. Ignore if stupid.


  • This is just amusing.
    arrow $\mapsto$ as "|--&gt;").
    
    becomes the same thing, so it doesn't turn back into |-->, the arrow! Not sure what to do about that; why didn't the greater than sign just become a greater than sign when translated from html back? Also, I thought that | was one of the characters you escaped, but maybe this one escaped being escaped?

Okay, I think that what happens is that in replace_latex you only replace these characters if there was no LaTeX to replace in the first place. But of course that might not always be the case. This isn't a huge issue, but worth pointing out and probably easy to fix.


In visit_li, I think that replacing

        return (' '*self._nested_list 
                + ('#. ' if self._inside_ol else '- ') 
                +' '.join(self.visit(tag) for tag in node.contents))

with ' '*(self._nested_list-1) should work. That was also a very minor point.


Would reversing the setting in visit_ol for visit_ul work for allowing nested mixed lists?

self._inside_ol = False
blah
self._inside_ol = True

Would replacing, in visit_display,

return ('\n.. MATH::\n\n    ' + 

with

return ('\n\n.. MATH::\n\n    ' + 

help with the math display issue?


In visit_li, maybe a number of potential issues (in addition to the math not being indented enough) could be solved by

    def visit_li(self, node): 
        return (' '*self._nested_list 
                + ('#. ' if self._inside_ol else '- ') 
                +' '.join(self.visit(tag) for tag in node.contents))

with

    def visit_li(self, node):
        spacing = ' '*(self._nested_list - 1) 
        return (spacing 
                + ('#. ' if self._inside_ol else '- ') 
                +' '.join(self.visit(tag) for tag in node.contents).replace('\n','\n'+spacing) )

What do you think? This just preserves the indentation until the soup gives us the end of the list item, if I did it right.


To solve the issue with the unexpected indent, it looks like it would suffice to change the visit_ul and visit_ol to have

result = '\n' + '\n'.join(self.visit(tag) for tag in node.contents)

I think that changing visit_li will be less optimal, since there are already newlines between all of those things. This is where it should be handled.


Sorry for not having nice diffs like John - I haven't tried these out, just thinking out loud about code.

comment:68 Changed 7 years ago by kcrisman

Here is something more minor, but proving to be just as annoying as replacing all the other things as I use the actual documents I created with this. Do we really need .. end of output in addition to a couple \ns? It only functions as a comment, and at least in normal Sage documentation this is never there. I understand its semantic purpose, but it seems like this would be something better handled as a switch. Again, that could be for a second version...

comment:69 Changed 7 years ago by kcrisman

Just as a shoutout, #13260 is mostly possible due to this patch :)

comment:70 Changed 7 years ago by novoselt

  • Cc novoselt added

comment:71 Changed 7 years ago by kcrisman

Another thing possibly to do here would be to test this in sage/tests/cmdline.py as pointed out at comment:41:ticket:11459.

comment:72 in reply to: ↑ 59 ; follow-ups: Changed 7 years ago by pang

Replying to kcrisman:

  • $$ that are intended to be empty LaTeX for later filling in - definitely counts on $$ always existing in pairs

I work around a possible problem if $$ and <p>...</p> are not correctly nested, because I observed that this happens, but I assumed that there is no unpaired $$ because the jsmath in the notebook makes that kind of mistake pretty obvious.

  • I'm not sure about the replace_courier thing. Are you saying that you, personally, use Courier font to indicate code in TinyMCE, and this is how you replace it with <code></code> tags? It would be nice for this to be customizable; otherwise, what happens to the poor sap who happens to like Courier and then finds all their text replaced by code?

Done: I don't use that function any more

  • It looks like in visiting ordered lists, that all sublists will automatically becomes numbered. But couldn't one have an unordered list inside an ordered list?

Solved

  • in the replace_latex thing, is it conceivable that the re's would match something by mistake? It looks like e.g. latex_beginning is matching anything that starts and ends with a dollar sign, as long as it really does end with a dollar sign and not \$, where that would include any character (possibly zero) before that. Again, in a well-formed worksheet that wouldn't be a problem, but maybe sometimes people would do things like "$\$$".

Nice catch! I've taken care of that.

  • Or what if the TinyMCE was (completely) "h$x+y=z$", then wouldn't this get replaced by ":math:x+y=z" with the h gone? It seems like you're assuming
    1 is a whitespace character?

Nope, I collect that char too.

  • How could the branch of visit_strong with _inside_code_tag be reached? The only place this flag seems to be True is in visit_code, but in that case one just gets plain text and and wouldn't visit_strong, it seems to me

You're damn right.

  • What if there was an anchor tag that was NOT an "href"? There are other uses for anchors.

ok, done.

  • There is potential for malformed /// or }}} without the others or cell id's missing and | missing to cause trouble

Yes, but that's not tinyMCE dependent, and I haven't seen it fail. With any parser, there is wrong input that will produce problems.

  • <p> could be appended to no purpose; I guess this counts heavily on all tags being properly paired

can you give an example? I've tried several configurations and BeautifulSoup? seems pretty robust

  • Is it conceivable that the rest will do something wrong with the escape_chars + and friends when it's not replaced? I assume not since they're turned into :math: mode

not sure I understand, haven't seen any problem in this line

  • Am I correct in saying that once inside a code tag, all other tags are ignored? That appears to be what is going on here.

That's correct. That was probably to deal with <code>var = <strong>my_func</strong>(spam, eggs)</code>, but if you'd rather do something else I'd like to know.

  • In the table, I suppose it's at least possible there could be a tfoot element. Does this not come out of TinyMCE?

No, TinyMCE does not use the footer, but it's no big deal: now we take care of both.

  • What other preformatted text <pre> do we expect? Should they all be ::'ed?

please detail

  • I know this might not happen a lot, but in theory a cell could do sage: sage: sage: and it should all be removed... not that this is a very big deal.

forgot that one, so I'll leave that to a further patch

comment:73 in reply to: ↑ 67 ; follow-up: Changed 7 years ago by pang

Replying to kcrisman:

Some suggestions for code. Ignore if stupid.


  • This is just amusing.
    arrow $\mapsto$ as "|--&gt;").
    
    becomes the same thing, so it doesn't turn back into |-->, the arrow! Not sure what to do about that; why didn't the greater than sign just become a greater than sign when translated from html back? Also, I thought that | was one of the characters you escaped, but maybe this one escaped being escaped?

Okay, I think that what happens is that in replace_latex you only replace these characters if there was no LaTeX to replace in the first place. But of course that might not always be the case. This isn't a huge issue, but worth pointing out and probably easy to fix.

Solved: now I replace those chars outside of LaTex? as many times as needed.

In visit_li, I think that replacing

        return (' '*self._nested_list 
                + ('#. ' if self._inside_ol else '- ') 
                +' '.join(self.visit(tag) for tag in node.contents))

with ' '*(self._nested_list-1) should work. That was also a very minor point.

ok, did something similar. It did improve a lot.

Would reversing the setting in visit_ol for visit_ul work for allowing nested mixed lists?

self._inside_ol = False
blah
self._inside_ol = True

ok, but I did something slightly more complex. Your proposal would not cover all situations, like more than two nesting levels.

Would replacing, in visit_display,

return ('\n.. MATH::\n\n    ' + 

with

return ('\n\n.. MATH::\n\n    ' + 

help with the math display issue?

It did

To solve the issue with the unexpected indent, it looks like it would suffice to change the visit_ul and visit_ol to have

result = '\n' + '\n'.join(self.visit(tag) for tag in node.contents)

I think that changing visit_li will be less optimal, since there are already newlines between all of those things. This is where it should be handled.

You're right. Problem is: with tinyMCE I got a nested list that was inside an <li> tag for the parent list, but of course I want to allow also for the nested ul to be inside the ul directly, so: is there is any problem is I just add an excess of blank lines:

result = '\n\n' + '\n'.join(self.visit(tag) for tag in node.contents)

comment:74 in reply to: ↑ 63 ; follow-up: Changed 7 years ago by pang

Replying to kcrisman:

  • As I suspected, intra-worksheets links are not respected.

They are now

  • The way you replace a span by a \n means that sometimes TinyMCE stuff left over from trying to unformat things gets in a new line. TinyMCE is annoying that way - sometimes to unitalicize you have to do a few things. Anyway, users will have to expect lots of newlines because of that. In one case, I intentionally had used a lot of different ones, and since bold isn't a span, but everything else you can do is, the doc looked kind of weird, lots of new lines. In another case, the text was colored on purpose.
    Plot a green <span style="color: #008000;">$y=\sin(x)$</span> together with...
    
    and then it made a new line there. I'm wondering why you chose to make \n instead of just a space - surely you encountered some "real-life" examples where that was the better option. I'm assuming we can't easily take the color info in, if that's the only info.

I don't really remember why I did that, but it also screwed html tables in some situations, so it's best, and more naturally, made into a space.

  • I think one might need a few other new lines in other places.
  • Here's an even worse example of math not behaving right.
    <ul>
    <li>For instance, it isn't too hard to add things like $$\zeta(s)=\sum_{n=1}^{\infty}\frac{1}{n^s}=\prod_p \left(\frac{1}{1-p^{-s}}\right)\; .$$&nbsp;</li>
    <li>One just types things like&nbsp;"\$\$\zeta(s)=\sum_{n=1}^{\infty}\frac{1}{n^s}=\prod_p \left(\frac{1}{1-p^{-s}}\right)\$\$" in the word processor. &nbsp;</li>
    <li>Whether this shows up as nicely as possible depends on what fonts you have in your browser, but it should be legible. &nbsp;</li>
    <li>More realistically, we might type "\$f(x)=x^2\$" so that we remember that $f(x)=x^2$ in this worksheet.</li>
    </ul>
    
    became all math mode, because the indentation
      - For instance, it isn't too hard to add things like
    .. MATH::
      
        \zeta(s)=\sum_{n=1}^{\infty}\frac{1}{n^s}=\prod_p \left(\frac{1}{1-p^{-s}}\right)\; .
    
    
    
      - One just types things like "\$\$\zeta(s)=\sum_{n=1}^{\infty}\frac{1}{n^s}=\prod_p \left(\frac{1}{1\-p^{\-s}}\right)\$\$$
        
      - Whether this shows up as nicely as possible depends on what fonts you have in your browser, but it should be legible.
    
    
    all was interpreted as being part of the math block! Instead, the math block should have been inside the list, I guess?

That was solved by unindenting the list items one place, as proposed in other place. Would not work if you have display math inside a nested list. A solution was to place comments ".. end of MATH" after any display math environment. I could do the same with lists and any other indentation-dependent environment.

  • This is just amusing.
    arrow $\mapsto$ as "|--&gt;").
    
    becomes the same thing, so it doesn't turn back into |-->, the arrow! Not sure what to do about that; why didn't the greater than sign just become a greater than sign when translated from html back? Also, I thought that | was one of the characters you escaped, but maybe this one escaped being escaped?

Solved: now I replace those chars outside of LaTex? as many times as needed.

  • Dumb question; is there a way to have an empty cell? Currently you remove them, but if the input and output are empty, then maybe it's supposed to be an empty cell in the "live" documentation... that's obviously very low priority.

forgot that one too

  • Hyperlinks (to other sites, which do work) add an extra space on either side. That looks ok if there isn't punctuation, but if there is, it looks weird.
    the  `Sage website <http://www.sagemath.org/help.html>`_ , which
    
    Note the space before the comma, which shows up in the built doc as well.

fixed

  • I found one thing that doesn't show up properly because of the html blocks being removed - documentation! If one evaluates a cell like binomial?? then that is all html, and just disappears. Maybe just making it plain text? I'm not sure either, you are right about that being a pain in the neck to do anything intelligent with.

Regarding docstring like in binomial??, how about we change the sage code that turns the doctring into html and displays the result, so that it also leaves a comment with the plain rst? The we could simply catch the comment...

  • There isn't much you can do about this, but since "the order enforced will be the order as encountered" with section heading markup, one can pretty easily get a lot of SEVERE: Title level inconsistent: warnings. This is because we have a WYSIWYG editor which doesn't remember state between cells.

The use case that I find more likely is a worksheet with no heading at all. I could catch those ones and use the file name as a title: what do you think?

comment:75 Changed 7 years ago by pang

One general comment: I was introducing plenty of extra blank lines. This was the easy way to be safe in case there is no whitespace, and to make it work in some non-perfect-but-real situations like the ul nested inside the li. I took the easy way: introduce as many blank lines as I need at each point, then replace any ocurrence of two or more blank lines with exacly one blank line.

To jhpalmieri: Your patch works allright, and it's now included.

comment:76 Changed 7 years ago by kcrisman

Hi Pablo! Wow, what a reply.

I don't have time to test this all the way now, or even try it, but I just wanted to know whether you felt like the stuff I listed in comment:66 was addressed thoroughly. I assume so! In that case, presumably any potential reviewer could just look at your responses and see what they think, not just me, which would be good for getting this in.

For full integration, I guess we'd need to know exactly what patches to apply, and we'd really need some update of the fork I made for this at this pull request. I think that my current git version of sagenb is too corrupted or I just can't figure it out, so I'd need to start afresh.

comment:77 follow-up: Changed 7 years ago by pang

I've just downloaded sage 5.3, compiled for ubuntu 12.04, and I could apply all the patches:

do you need me to do anything with mercurial, queues and patches? I'll investigate the github thing later. If I have ever used it, I don't remember.

comment:78 Changed 7 years ago by pang

  • Description modified (diff)

comment:79 in reply to: ↑ 77 Changed 7 years ago by kcrisman

  • Cc kini added

Replying to pang:

I've just downloaded sage 5.3, compiled for ubuntu 12.04, and I could apply all the patches:

do you need me to do anything with mercurial, queues and patches? I'll investigate the github thing later. If I have ever used it, I don't remember.

Yes, technically we need to create a pull request. See here for the current, outdated one. Unfortunately I have a lot of trouble with github (not so much git as the process of syncing with it) so I'd be grateful if you were able to check out a clone of the master and then do this; luckily, nearly everything is just a new file.

Maybe you could even change that pull request, though I doubt it. Cc:ing kini on the off chance he knows some slick way to do that.

comment:80 Changed 7 years ago by kini

I can't change the pull request, since you created and own it. Only you can change it. I can however make a new pull request if you don't know how to change your pull request. I'm currently traveling, though, so I can't do it immediately, I'm afraid.

comment:81 in reply to: ↑ 72 ; follow-up: Changed 7 years ago by kcrisman

  • $$ that are intended to be empty LaTeX for later filling in - definitely counts on $$ always existing in pairs

I work around a possible problem if $$ and <p>...</p> are not correctly nested, because I observed that this happens, but I assumed that there is no unpaired $$ because the jsmath in the notebook makes that kind of mistake pretty obvious.

Hmm, good point.

  • I'm not sure about the replace_courier thing. Are you saying that you, personally, use Courier font to indicate code in TinyMCE, and this is how you replace it with <code></code> tags? It would be nice for this to be customizable; otherwise, what happens to the poor sap who happens to like Courier and then finds all their text replaced by code?

Done: I don't use that function any more

What do you do instead? What happens to things in code tags?

  • It looks like in visiting ordered lists, that all sublists will automatically becomes numbered. But couldn't one have an unordered list inside an ordered list?

Solved

Yup.

  • in the replace_latex thing, is it conceivable that the re's would match something by mistake? It looks like e.g. latex_beginning is matching anything that starts and ends with a dollar sign, as long as it really does end with a dollar sign and not \$, where that would include any character (possibly zero) before that. Again, in a well-formed worksheet that wouldn't be a problem, but maybe sometimes people would do things like "$\$$".

Nice catch! I've taken care of that.

I see that, and I think it works. I may have to look at it more closely, as single_dollar is one I have to consult my regex resources for :)

More to come...

comment:82 in reply to: ↑ 72 Changed 7 years ago by kcrisman

  • What if there was an anchor tag that was NOT an "href"? There are other uses for anchors.

ok, done.

Good solution. Since in ReST things are supposed to be unique, there could conceivably be some nastiness if one had links in several worksheets that pointed to the same thing, but that would be fixable by the user.

  • <p> could be appended to no purpose; I guess this counts heavily on all tags being properly paired

can you give an example? I've tried several configurations and BeautifulSoup? seems pretty robust

If I could, I can't any more. We'll let it pass.

  • In the table, I suppose it's at least possible there could be a tfoot element. Does this not come out of TinyMCE?

No, TinyMCE does not use the footer, but it's no big deal: now we take care of both.

Ok.

  • What other preformatted text <pre> do we expect? Should they all be ::'ed?

please detail

I don't remember any more. I think that I was just wondering what sort of other things might end up in a <pre> tag, and whether some might not be "block literals" in Sphinx terms. I don't know the answer to this.

  • I know this might not happen a lot, but in theory a cell could do sage: sage: sage: and it should all be removed... not that this is a very big deal.

forgot that one, so I'll leave that to a further patch

Well, and hopefully it would all doctest correctly.

comment:83 in reply to: ↑ 73 Changed 7 years ago by kcrisman

Okay, I think that what happens is that in replace_latex you only replace these characters if there was no LaTeX to replace in the first place. But of course that might not always be the case. This isn't a huge issue, but worth pointing out and probably easy to fix.

Solved: now I replace those chars outside of LaTex? as many times as needed.

Check.

In visit_li, I think that replacing

        return (' '*self._nested_list 
                + ('#. ' if self._inside_ol else '- ') 
                +' '.join(self.visit(tag) for tag in node.contents))

with ' '*(self._nested_list-1) should work. That was also a very minor point.

ok, did something similar. It did improve a lot.

Ok, I found it - a little tricky!

Would reversing the setting in visit_ol for visit_ul work for allowing nested mixed lists?

self._inside_ol = False
blah
self._inside_ol = True

ok, but I did something slightly more complex.

Yup, and I like it.

Would replacing, in visit_display,

return ('\n.. MATH::\n\n    ' + 

with

return ('\n\n.. MATH::\n\n    ' + 

help with the math display issue?

It did

Great.

To solve the issue with the unexpected indent, it looks like it would suffice to change the visit_ul and visit_ol to have

result = '\n' + '\n'.join(self.visit(tag) for tag in node.contents)

I think that changing visit_li will be less optimal, since there are already newlines between all of those things. This is where it should be handled.

You're right. Problem is: with tinyMCE I got a nested list that was inside an <li> tag for the parent list, but of course I want to allow also for the nested ul to be inside the ul directly, so: is there is any problem is I just add an excess of blank lines:

result = '\n\n' + '\n'.join(self.visit(tag) for tag in node.contents)

That should be okay, I think.

comment:84 in reply to: ↑ 74 Changed 7 years ago by kcrisman

  • As I suspected, intra-worksheets links are not respected.

They are now

Good.

  • Here's an even worse example of math not behaving right.

That was solved by unindenting the list items one place, as proposed in other place. Would not work if you have display math inside a nested list. A solution was to place comments ".. end of MATH" after any display math environment. I could do the same with lists and any other indentation-dependent environment.

For now this fix is fine.

  • This is just amusing.
    arrow $\mapsto$ as "|--&gt;").
    
    becomes the same thing, so it doesn't turn back into |-->, the arrow! Not sure what to do about that; why didn't the greater than sign just become a greater than sign when translated from html back? Also, I thought that | was one of the characters you escaped, but maybe this one escaped being escaped?

Solved: now I replace those chars outside of LaTex? as many times as needed.

Yup, at the end. Good.

  • Hyperlinks (to other sites, which do work) add an extra space on either side. That looks ok if there isn't punctuation, but if there is, it looks weird.
    the  `Sage website <http://www.sagemath.org/help.html>`_ , which
    
    Note the space before the comma, which shows up in the built doc as well.

fixed

Ok.

Just a few more to look at, and then the regex checking...

comment:85 follow-up: Changed 7 years ago by kcrisman

  • The way you replace a span by a \n means that sometimes TinyMCE stuff left over from trying to unformat things gets in a new line. TinyMCE is annoying that way - sometimes to unitalicize you have to do a few things. Anyway, users will have to expect lots of newlines because of that. In one case, I intentionally had used a lot of different ones, and since bold isn't a span, but everything else you can do is, the doc looked kind of weird, lots of new lines. In another case, the text was colored on purpose.
    Plot a green <span style="color: #008000;">$y=\sin(x)$</span> together with...
    
    and then it made a new line there. I'm wondering why you chose to make \n instead of just a space - surely you encountered some "real-life" examples where that was the better option. I'm assuming we can't easily take the color info in, if that's the only info.

I don't really remember why I did that, but it also screwed html tables in some situations, so it's best, and more naturally, made into a space.

Is this the change in visit_inline_no_tag? Sorry, I just can't figure that one out.

  • Dumb question; is there a way to have an empty cell? Currently you remove them, but if the input and output are empty, then maybe it's supposed to be an empty cell in the "live" documentation... that's obviously very low priority.

forgot that one too

I can't find where you did this.

  • I found one thing that doesn't show up properly because of the html blocks being removed - documentation! If one evaluates a cell like binomial?? then that is all html, and just disappears. Maybe just making it plain text? I'm not sure either, you are right about that being a pain in the neck to do anything intelligent with.

Regarding docstring like in binomial??, how about we change the sage code that turns the doctring into html and displays the result, so that it also leaves a comment with the plain rst? The we could simply catch the comment...

Well, maybe. That should be a different patch; we should really get this in.

  • There isn't much you can do about this, but since "the order enforced will be the order as encountered" with section heading markup, one can pretty easily get a lot of SEVERE: Title level inconsistent: warnings. This is because we have a WYSIWYG editor which doesn't remember state between cells.

The use case that I find more likely is a worksheet with no heading at all. I could catch those ones and use the file name as a title: what do you think?

That's not a bad idea. And I like your catching of the identical heading thing - that may have been a little of the problem. That said, if only we could catch this thing... anyway, fixing the no heading worksheet is a good idea.

Finally, I'm just wondering about a few changes - surely it's obvious what they do, but I'm lazy.

  • The change to ALL_ENTITIES in soup.
  • The images_dir stuff - was that in response to anything I said?

comment:86 in reply to: ↑ 81 ; follow-up: Changed 7 years ago by pang

Replying to kcrisman:

  • I'm not sure about the replace_courier thing. Are you saying that you, personally, use Courier font to indicate code in TinyMCE, and this is how you replace it with <code></code> tags? It would be nice for this to be customizable; otherwise, what happens to the poor sap who happens to like Courier and then finds all their text replaced by code?

Done: I don't use that function any more

What do you do instead? What happens to things in code tags?

Code tags work the same, but I don't turn <span style="font-family...courier...">foo(bar)=spam+eggs</span> into <code>foo(bar)=spam+eggs</code>

comment:87 in reply to: ↑ 86 Changed 7 years ago by kcrisman

What do you do instead? What happens to things in code tags?

Code tags work the same, but I don't turn <span style="font-family...courier...">foo(bar)=spam+eggs</span> into <code>foo(bar)=spam+eggs</code>

Oh, of course - sorry for the question, even. I was trying to look only at the patches and not the full code.

comment:88 in reply to: ↑ 85 ; follow-up: Changed 7 years ago by pang

Replying to kcrisman:

  • The way you replace a span by a \n means that sometimes TinyMCE stuff left over from trying to unformat things gets in a new line. TinyMCE is annoying that way - sometimes to unitalicize you have to do a few things. Anyway, users will have to expect lots of newlines because of that. In one case, I intentionally had used a lot of different ones, and since bold isn't a span, but everything else you can do is, the doc looked kind of weird, lots of new lines. In another case, the text was colored on purpose.
    Plot a green <span style="color: #008000;">$y=\sin(x)$</span> together with...
    
    and then it made a new line there. I'm wondering why you chose to make \n instead of just a space - surely you encountered some "real-life" examples where that was the better option. I'm assuming we can't easily take the color info in, if that's the only info.

I don't really remember why I did that, but it also screwed html tables in some situations, so it's best, and more naturally, made into a space.

Is this the change in visit_inline_no_tag? Sorry, I just can't figure that one out.

No, it's the 147 'p': 'inline_no_tag',

159 'p': 'p',

plus the:

331 def visit_p(self, node): 332 return .join(self.visit(tag) for tag in node.contents) + '\n\n'

by the way, the following lines are garbage:

333 cs = [self.visit(tag) for tag in node.contents] 334 return ' '.join(c for c in cs if c.strip()) + '\n\n'

Karl, should I make a single patch, and fix also this little stupid thing?

  • Dumb question; is there a way to have an empty cell? Currently you remove them, but if the input and output are empty, then maybe it's supposed to be an empty cell in the "live" documentation... that's obviously very low priority.

forgot that one too

I can't find where you did this.

I've done nothing, I guess I could drop the code that discards the empty code cells, but on the other hand, I want them out of my pdf...

Finally, I'm just wondering about a few changes - surely it's obvious what they do, but I'm lazy.

  • The change to ALL_ENTITIES in soup.

Following the documentation of BeautifulSoup?, that should convert entities like &amp; into & and $lt; into <, but it doesn't work, so I do it later anyway. In a later ticket, I'll update to a newer version of BeautifulSoup?, and this will be a useful reminder.

  • The images_dir stuff - was that in response to anything I said?

No, that's in order to be able to use worksheet2rst as a sage-independent script. Found it useful during testing.

comment:89 in reply to: ↑ 88 ; follow-up: Changed 7 years ago by kcrisman

No, it's the 147 'p': 'inline_no_tag',

159 'p': 'p',

plus the:

331 def visit_p(self, node): 332 return .join(self.visit(tag) for tag in node.contents) + '\n\n'

Check, thanks.

by the way, the following lines are garbage:

333 cs = [self.visit(tag) for tag in node.contents] 334 return ' '.join(c for c in cs if c.strip()) + '\n\n'

Karl, should I make a single patch, and fix also this little stupid thing?

Don't make a single patch, but do change the response_to_kcrisman patch and add this and that thing where you catch a worksheet with an empty title and add a title.

I've done nothing, I guess I could drop the code that discards the empty code cells, but on the other hand, I want them out of my pdf...

Eventually one would want this to be a switch, but I guess it's ok for now. That could be an enhancement request.

Thanks for the answers to the other two questions.

So now I should probably actually test that this works on the things I used it on before :) and then hopefully a pull request will be possible to make.

comment:90 in reply to: ↑ 89 Changed 7 years ago by pang

Replying to kcrisman:

Karl, should I make a single patch, and fix also this little stupid thing?

Don't make a single patch, but do change the response_to_kcrisman patch and add this and that thing where you catch a worksheet with an empty title and add a title.

Done: are you happy with the "catch a worksheet with an empty title and add a title"? It's nothing special, I wasn't fully satisfied with other ideas, so went for the simpler one.

comment:91 Changed 7 years ago by kcrisman

  • Status changed from needs_work to needs_review

Good on those things. Now we really do just need final review (if from me, requires checking the most recent patch regex stuff and checking a bunch of worksheets to see that nothing horrible happens), no more 'needs work', then a new pull request. I'm sorry I can't do any of those the next couple days. I'm pretty confident this will look quite good, though.

comment:92 follow-up: Changed 7 years ago by pang

Karl, please hold on this. I've noticed one bug. It should be asey to fix, but I cannot take care now. Please leave your testing for later.

comment:93 in reply to: ↑ 92 Changed 7 years ago by kcrisman

Karl, please hold on this. I've noticed one bug. It should be asey to fix, but I cannot take care now. Please leave your testing for later.

No problem; if you've noticed, I haven't had the time since the summer :)

Changed 7 years ago by pang

minor updates suggested by kcrisman

comment:94 Changed 7 years ago by pang

The bug is gone. It was a regex mistake that swallowed one character. I introduced it while trying to pass all your tests. Negative lookbehind was the answer:

http://www.regular-expressions.info/lookaround.html

comment:95 Changed 7 years ago by jason

What is the status of the sagenb pull request? Is it up to date?

comment:96 Changed 7 years ago by kcrisman

No, the pull request has not been updated since whenever it was created, as far as I know.

As to review, I think that Pablo implicitly liked my additions (?) and so, as I say in comment:91, someone needs to check that the regex changes in attachment:trac_10637_answer_to_kcrisman.patch are ok (the rest of that patch is fine, I think) and then presumably test actual worksheets one more time.

So we'd both be grateful for an independent viewer to do those things. I don't think I will even know how to update my pull request right now because when I upgraded my computer I just deleted my standalone sagenb repository - I still have a lot of trouble making it do the "right" thing with respect to github. But this stuff is really within epsilon of being ready, and would be really useful.

comment:97 follow-up: Changed 7 years ago by jason

How about when this is positively reviewed, I'll merge the sagenb patch(es) in.

comment:98 in reply to: ↑ 97 Changed 7 years ago by kcrisman

How about when this is positively reviewed, I'll merge the sagenb patch(es) in.

Okay; I didn't know who was the current sagenb master master, so I was hesitant. I'll try to make this a priority - Sandy did slow things down somewhat here, though perhaps the extra time would be best spent grading rather than reviewing patches ;-) but I really want Pablo's hard work in as well. At least it can't really bitrot too much.

comment:99 Changed 7 years ago by jason

Well, the sagenb merge maestro position has been somewhat vacant, as you may have noticed. Today I decided to clean up a bunch of pending pull requests, though.

comment:100 Changed 7 years ago by kini

Yes, sorry about that, by the way. I guess I was sort of the "sagenb merge maestro" for a while but I've been pretty busy for the last month and a half or so and have been neglecting Sage. I'll probably be out for the rest of the calendar year as well, actually - sorry about that :(

comment:101 Changed 7 years ago by SimonKing

The instructions in the ticket descriptions don't work. I get

simon@linux-sqwp:~/SAGE/prerelease/sage-5.6.rc0/local/bin> hg qimport -P http://trac.sagemath.org/sage_trac/raw-attachment/ticket/10637/trac_10637_answer_to_kcrisman.patch
Füge trac_10637_answer_to_kcrisman.patch zur Seriendatei hinzu
Wende trac_10637_answer_to_kcrisman.patch an
Kann 'sagenb/misc/comments2rst.py' nicht zum Patchen finden
14 von 14 Teilstücken sind FEHLGESCHLAGEN -- speichere Ausschuss in Datei sagenb/misc/comments2rst.py.rej
Kann 'sagenb/misc/worksheet2rst.py' nicht zum Patchen finden
4 von 4 Teilstücken sind FEHLGESCHLAGEN -- speichere Ausschuss in Datei sagenb/misc/worksheet2rst.py.rej
Patch schlug fehl und Fortsetzung unmöglich (versuche -v)
Patch schlug fehl, Fehlerabschnitte noch im Arbeitsverzeichnis
Fehler beim Anwenden. Bitte beheben und trac_10637_answer_to_kcrisman.patch aktualisieren

I guess one first needs to do the sagenb thingy mentioned in the ticket description.

comment:102 Changed 7 years ago by SimonKing

Aha! Could it be that trac_10637_answer_to_kcrisman.patch is to be applied to the sagenb and not to the scripts repository?

comment:103 Changed 7 years ago by SimonKing

  • Description modified (diff)

Yep. At least trac_10637_answer_to_kcrisman.patch does apply in sagenb.

comment:104 Changed 7 years ago by SimonKing

Is there a sagenb repository in sage-5.7.beta2? I tried to apply the sagenb patches when I was in SAGE_ROOT/devel/sagenb, but the patch would create SAGE_ROOT/sagenb/misc and put stuff there!

So, please elaborate on the installation instructions.

comment:105 follow-up: Changed 7 years ago by kini

Hi Simon, thanks for looking at this ticket! We did decide not to ship the repository with sagenb anymore (as it is quite large due to various... well, some might term them "mistakes" in the past :) ). The patch itself has been migrated to a git branch, which you can see here:

https://github.com/sagemath/sagenb/pull/75

In order to "apply" it, you'll need to install the sagenb repository / development version. There are some instructions on how to do this at the main github project page, https://github.com/sagemath/sagenb (scroll down to the bottom). Hope that helps. I have to go now, but I may post further details later.

comment:106 Changed 7 years ago by kini

Sorry, I meant look at INSTALL.rst

comment:107 Changed 7 years ago by kcrisman

Simon, if you could finish reviewing this, that would be awesome. See comment:96 for what remains to be reviewed - basically one thing with regex and then to go ahead and make sure a bunch of worksheets look okay when using it.

comment:108 in reply to: ↑ 105 Changed 7 years ago by SimonKing

Replying to kini:

Hi Simon, thanks for looking at this ticket! We did decide not to ship the repository with sagenb anymore (as it is quite large due to various... well, some might term them "mistakes" in the past :) ). The patch itself has been migrated to a git branch

Arrgh. So, I'd need to learn git first...

comment:109 Changed 7 years ago by SimonKing

In the short run: Would it work to take the patches from here and manually apply them with that patch command, i.e., without using a version control system?

comment:110 Changed 7 years ago by kcrisman

In the short run: Would it work to take the patches from here and manually apply them with that patch command, i.e., without using a version control system?

I don't see why not.

Hi Simon, thanks for looking at this ticket! We did decide not to ship the repository with sagenb anymore (as it is quite large due to various... well, some might term them "mistakes" in the past :) ). The patch itself has been migrated to a git branch

Arrgh. So, I'd need to learn git first...

Hopefully there would be some sequence of cut-and-paste commands someone could provide... unfortunately, that person is not me.

comment:111 follow-up: Changed 7 years ago by SimonKing

What seems to work (around) is to initialise a new hg repository in SAGE_ROOT/devel/sagenb. Of course I am aware that at some point in the near future I need to learn git...

Now that I have successfully transformed my worksheet into rst with these warnings:

Processing How to implement new algebraic structures in Sage.sws
Warning: node not supported (or something else?) address
Warning: node not supported (or something else?) address
Warning: node not supported (or something else?) address
Warning: node not supported (or something else?) address
Warning: node not supported (or something else?) address
Warning: node not supported (or something else?) h5
Warning: node not supported (or something else?) h5
Warning: node not supported (or something else?) h5
Warning: node not supported (or something else?) h5
Warning: node not supported (or something else?) h5

what remains to do to turn the .rst file into html?

comment:112 follow-up: Changed 7 years ago by SimonKing

Perhaps rst2html.py?

It gave me the warnings

(sage-sh) simon@linux-sqwp:Worksheets$ rst2html.py How_to_implement_new_algebraic_structures_in_Sage.rst > How_to_implement_new_algebraic_structures_in_Sage.html
How_to_implement_new_algebraic_structures_in_Sage.rst:173: (ERROR/3) Unknown interpreted text role "class".
How_to_implement_new_algebraic_structures_in_Sage.rst:192: (WARNING/2) Inline strong start-string without end-string.
How_to_implement_new_algebraic_structures_in_Sage.rst:829: (WARNING/2) Inline strong start-string without end-string.
How_to_implement_new_algebraic_structures_in_Sage.rst:1234: (WARNING/2) Inline interpreted text or phrase reference start-string without end-string.

But I think my problem is that the .rst file really belongs (and links to) to the docs of Sage. So, I'll try to put it into devel/sage/doc/en/thematic_tutorials/ and let sage -docbuild do the work.

comment:113 in reply to: ↑ 112 Changed 7 years ago by kcrisman

Perhaps rst2html.py?

No.

But I think my problem is that the .rst file really belongs (and links to) to the docs of Sage. So, I'll try to put it into devel/sage/doc/en/thematic_tutorials/ and let sage -docbuild do the work.

That's reasonable. I used Sphinx quickstart inside Sage - I thought I documented that somewhere here, but perhaps I didn't.

comment:114 Changed 7 years ago by SimonKing

Since the resulting .rst file is supposed to end up in the thematic tutorials anyway, I tried it, and it seems to work more or less. Some details are not as they were intended.

  • The original worksheet contained an address in some html block. It came out verbously, i.e., in the html page, I see
    <address style=”text-align: right;”><span style=”font-size: small;”>Simon King</span></address> <address style=”text-align: right;”><span style=”font-size: small;”>Friedrich-Schiller-Universität Jena</span></address> <address style=”text-align: right;”><span style=”font-size: small;”>E-mail: simon dot king at uni hyphen jena dot de</span></address> <address style=”text-align: right;”><span style=”font-size: small;”>© 2011</span></address> <address><br /></address>
    
  • Is there the possibility to insert Sage-specific links, such as :class:`sage.structure.parent.Parent`? I think adding this would be a nice feature.
  • Indentation does not work. See below.
  • In my worksheet, I have:

methods for all objects of a category, and for all elements of such objects.

This became

methods *for all objects of a category*, and *for all elements of such objects*

  • Headers of level 5 are not recognised. Hence, <h5 style=”padding-left: 30px;”><span style=”font-size: large;”>1. A coercion is a map, a conversion may be a <em>partial</em> map</span></h5> is printed verbously.

Concerning indentation:

In my worksheet, I have something in a html field that looks like this:

Outline

Use existing base classes

There are sub-classes of sage.structure.parent.Parent resp. of sage.structure.element.Element that will help you a lot. Inheriting from these classes is essential for using Sage's coercion system.

Arithmetic operations should be implemented by single underscore methods, such as _add_, _mul_.

The outcome then looks like

Outline

Use existing base classes

There are sub-classes of sage.structure.parent.Parent resp. of sage.structure.element.Element that will help you a lot. Inheriting from these classes is essential for using Sage's coercion system.

Arithmetic operations should be implemented by single underscore methods, such as _add_, _mul_.

comment:115 Changed 7 years ago by SimonKing

PS: At the end of my previous post, note that the word "Outline" was bold and underlined in the worksheet, but came out just as bold. Also font sizes were not respected. But I didnt use <h2> and those things here, so, it could be considered my fault.

comment:116 follow-up: Changed 7 years ago by kcrisman

Hmm, not recognizing level 5 (or 6) headers might be a problem. Otherwise:


You can insert the links yourself, but I think that adding them all the time wouldn't make sense, as such rst files might not necessarily become part of Sage documentation, per se.

Unrecognized tags will not be guessed at, I think that is a feature.

Indentation I'm not sure about. Whitespace doesn't count; you should really be using some formatting for that, I think - not exactly user error, but probably not something one could design around. What does the original worksheet have there (in its html version)? What that looks like would have a big impact on it.

<p>Here is some text.</p>
<p><span style="white-space: pre;"> </span>Here, I've tabbed.</p>
<p>&nbsp; &nbsp; &nbsp; &nbsp; &nbsp;Here I just used spaces.</p>

is what I get, and it looks to me like the tabs from TinyMCE might not really get much done to them; I bet the third line would work, though. I'd consider this an enhancement request.

comment:117 in reply to: ↑ 116 ; follow-up: Changed 7 years ago by SimonKing

Replying to kcrisman:

You can insert the links yourself, but I think that adding them all the time wouldn't make sense, as such rst files might not necessarily become part of Sage documentation, per se.

OK. I guess that in the end I have to edit the rst file anyway.

Unrecognized tags will not be guessed at, I think that is a feature.

The notebook comes with an editor (tinyMCE, or what is it's name?), and I think sws2rst should be able to work with what is produced by the editor.

Indentation I'm not sure about. Whitespace doesn't count; you should really be using some formatting for that,

I used the editor in the notebook. Shouldn't that be fine?

I think - not exactly user error, but probably not something one could design around. What does the original worksheet have there (in its html version)? What that looks like would have a big impact on it.

When I click on the "text" link, I see

<p><span style="text-decoration: underline;"><strong><span style="font-size: x-large;">Outline</span></strong></span></p>
<p><strong>Use existing base classes</strong></p>
<p style="padding-left: 30px;">There are sub-classes of <span style="font-family: arial,helvetica,sans-serif;">sage.structure.parent.Parent</span> resp. of <span style="font-family: arial,helvetica,sans-serif;">sage.structure.element.Element</span> that will help you a lot. Inheriting from these classes is essential for using Sage's coercion system.</p>
<p style="padding-left: 30px;">Arithmetic operations should be implemented by <span style="text-decoration: underline;"><em>single underscore</em></span> methods, such as <span style="font-family: arial,helvetica,sans-serif;">_add_, _mul_</span>.</p>
<p><strong>Turn your parent structure into an object of a category</strong></p>
<p style="padding-left: 30px;">Declare the category during initialisation - Your parent structure will inherit further useful methods and consistency tests.</p>
<p><strong>Provide your parent structure with an element class</strong></p>
<p style="padding-left: 30px;">Assign to it an attribute called "Element" - The elements will inherit further useful methods from the category.</p>
<p style="padding-left: 30px;">In addition, some basic conversions will immediately work.</p>
<p><strong>Implement further conversions<br /></strong></p>
<p style="padding-left: 30px;">Never override a parent's __call__ method! Provide <span style="font-family: arial,helvetica,sans-serif;">_element_constructor_</span> instead.</p>
<p><strong>Declare coercions</strong></p>
<p style="padding-left: 30px;">If a conversion happens to be a morphism, you may consider to turn it into a coercion. It will then <em>implicitly</em> be used in arithmetic operations.</p>
<p><strong><span style="text-decoration: underline;">Advanced coercion:</span> Define construction functors for your parent structure</strong></p>
<p style="padding-left: 30px;">Sage will automatically create new parents for you when needed, by some kind of "pushout" construction.</p>
<p><strong>Run the automatic test suites</strong></p>
<p style="padding-left: 30px;">Each method should be documented and provide a doc test (we are not giving examples here). In addition, any method defined for a category should be supported by a test method that is executed when running the test suite.</p>

So, that's what is produced by the editor.

comment:118 in reply to: ↑ 117 Changed 7 years ago by kcrisman

The notebook comes with an editor (tinyMCE, or what is it's name?), and I think sws2rst should be able to work with what is produced by the editor.

Well, I don't think that arbitrary stuff should be. For instance, we currently ignore 3D graphics.

Indentation I'm not sure about. Whitespace doesn't count; you should really be using some formatting for that,

I used the editor in the notebook. Shouldn't that be fine?

Yes, in principle, but I'm not sure we can support every single feature of TinyMCE, an upstream project.

I think - not exactly user error, but probably not something one could design around. What does the original worksheet have there (in its html version)? What that looks like would have a big impact on it.

When I click on the "text" link, I see

<p><span style="text-decoration: underline;"><strong><span style="font-size: x-large;">Outline</span></strong></span></p>
<p><strong>Use existing base classes</strong></p>
<p style="padding-left: 30px;">There are sub-classes of <span style="font-family: arial,helvetica,sans-serif;">sage.structure.parent.Parent</span> resp. of <span style="font-family: arial,helvetica,sans-serif;">sage.structure.element.Element</span> that will help you a lot. Inheriting from these classes is essential for using Sage's coercion system.</p>
<p style="padding-left: 30px;">Arithmetic operations should be implemented by <span style="text-decoration: underline;"><em>single underscore</em></span> methods, such as <span style="font-family: arial,helvetica,sans-serif;">_add_, _mul_</span>.</p>
<p><strong>Turn your parent structure into an object of a category</strong></p>
<p style="padding-left: 30px;">Declare the category during initialisation - Your parent structure will inherit further useful methods and consistency tests.</p>
<p><strong>Provide your parent structure with an element class</strong></p>
<p style="padding-left: 30px;">Assign to it an attribute called "Element" - The elements will inherit further useful methods from the category.</p>
<p style="padding-left: 30px;">In addition, some basic conversions will immediately work.</p>
<p><strong>Implement further conversions<br /></strong></p>
<p style="padding-left: 30px;">Never override a parent's __call__ method! Provide <span style="font-family: arial,helvetica,sans-serif;">_element_constructor_</span> instead.</p>
<p><strong>Declare coercions</strong></p>
<p style="padding-left: 30px;">If a conversion happens to be a morphism, you may consider to turn it into a coercion. It will then <em>implicitly</em> be used in arithmetic operations.</p>
<p><strong><span style="text-decoration: underline;">Advanced coercion:</span> Define construction functors for your parent structure</strong></p>
<p style="padding-left: 30px;">Sage will automatically create new parents for you when needed, by some kind of "pushout" construction.</p>
<p><strong>Run the automatic test suites</strong></p>
<p style="padding-left: 30px;">Each method should be documented and provide a doc test (we are not giving examples here). In addition, any method defined for a category should be supported by a test method that is executed when running the test suite.</p>

So, that's what is produced by the editor.

Hopefully Pablo can answer this.

comment:119 follow-up: Changed 7 years ago by pang

In general, we ignore style completely. That's part of the goal, not a failure.

The goal of this script is to turn worksheets into simpler, more maintainable, doctest-friendly, restructured text. Restructured text does not support style, it's roughly equivalent to (a subset of) html without css. For instance, there is no construct for underlined text. Html docs generated from rst sources could be shown underlined, depending on the style used by rst2html.

So some info is lost, but that means it will be easier for the next guy to keep your documentation at sync with the current sage version, and correct typos.

If your text is bold and cursive because it is "strong", then this script must recognize it and sorround it with asterisks. If it bold and cursive because you applied some style, it's not going to work, and it's hopeless and a loss of time to try to convert html into rst so that it generates html with the exact same style. Any way of doing so involves hacking sphinx so that it keeps the original html in a style file, in one way or another. By the way, I don't think rst supports text that is both strong and em, I vaguely recall some problems trying to nest one inside the other. I'll check this eventually, forgive my lousy memory.

Regarding indentation, please think which html tag would correspond to the text you wanted to indent: is it a quote? is a code snippet? Then a sws to rst converter will have a chance.

It is true that tinyMCE abuses style, at points at which tags could be used. For example, I had to use the courier font to mark variable names within the text. I wrote a plugin for tinyMCE to add a code button that worked much like the strong button. It was quick dirty code, and the tinyMCE guys ignored the suggestion, but it's definitely doable. May be could tweak the editor so that uses more tags and less style.

In the meantime, different people might use underline and indent to mean different things, and there's little I can do. Please take this script as a help in your conversion work. At some points, it won't guess the canonical way to map your intentions into the simpler rst format, and you'll have to do it manually.

This said, thanks a lot for trying the code, and finding the h5 bug. For some reason, I stopped at h4: there are only 5 levels, are there?. It's hard to know if I can do anything about the rest, given the goal I described above. I'll check carefully, but will probably need more info.

comment:120 in reply to: ↑ 119 Changed 7 years ago by SimonKing

Replying to pang:

For instance, there is no construct for underlined text.

Didn't know that.

If your text is bold and cursive because it is "strong", then this script must recognize it and sorround it with asterisks. If it bold and cursive because you applied some style, it's not going to work, and it's hopeless and a loss of time to try to convert html into rst so that it generates html with the exact same style.

It is bold and cursive because I clicked the bold and cursive buttons in the editor that comes with the notebook. Since bold and cursive is supported by rst, my uneducated guess is that a converter should be able to deal with it.

By the way, I don't think rst supports text that is both strong and em

OK, if that's the case then I have to withdraw my previous statement, and the converter can of course not deal with it.

Regarding indentation, please think which html tag would correspond to the text you wanted to indent: is it a quote? is a code snippet? Then a sws to rst converter will have a chance.

Indentation is indentation, I'd say. It isn't code, because it is in a text field, and not in a code cell. It isn't a quote because -- well, I don't have the notebook open, but I think the editor has some button to typeset quotes, and I didn't click this.

This said, thanks a lot for trying the code, and finding the h5 bug. For some reason, I stopped at h4: there are only 5 levels, are there?.

I don't know. But in any case, I am aware that no converter will work perfectly, and I need to edit the result anyway.

Last edited 7 years ago by SimonKing (previous) (diff)

comment:121 follow-up: Changed 7 years ago by SimonKing

The rst file created from a worksheet seems to end each code cell by

.. end of output

Is there any use of that line? When I removed it, the resulting html looked the same.

comment:122 in reply to: ↑ 121 Changed 7 years ago by pang

Replying to SimonKing:

The rst file created from a worksheet seems to end each code cell by

.. end of output

Is there any use of that line? When I removed it, the resulting html looked the same.

The problem, if i remember correctly, comes if there is a list after an output cell. The list is indented one space, but the output cell does not end until it finds a non-indented line.

comment:123 follow-ups: Changed 7 years ago by kcrisman

  • Reviewers changed from Nicolas Thiéry, Jason Grout, Karl-Dieter Crisman, Jason Bandlow, John Palmieri to Nicolas Thiéry, Jason Grout, Karl-Dieter Crisman, Jason Bandlow, John Palmieri, Simon King
  • Status changed from needs_review to needs_work
  • Work issues answer questions, math formatting, lists, maybe absolute paths? deleted

It is bold and cursive because I clicked the bold and cursive buttons in the editor that comes with the notebook. Since bold and cursive is supported by rst, my uneducated guess is that a converter should be able to deal with it.

Yes, that does make sense. I guess the question is whether "premature optimization is the root of all evil" or not. In this case, I think that having an sws2rst converter in Sage that at least converts some stuff, but can't handle styles, is worse than having no sws2rst converter at all. I have found this to be extremely useful in its current state, so I'm not sure it's worth postponing it indefinitely for this reason (as opposed to checking the code itself, which I have lamentably not had time to finish off, but which I should do today).

I think that the problem is that

<p style="padding-left: 30px;">

is how you achieved your indentation. That's an HTML style, not whitespace.

Okay, on further investigation, what you must have done is clicked the "indent" button. Usually I only use this with lists, which creates more levels of lists (and which sws2rst already handles correctly).

Pablo, do you think that finding this particular thing? That is, if we have

style="padding-left: 60px;"

where 60 could be any positive integer multiple of 30, as a specific thing in the tag <p ...>, could that specific thing be searched for and then replaced with an equivalent amount of whitespace (maybe four &nbsp; for instance, per 30px)? Certainly it would be nightmarish to try to make that into an appropriate amount of lists, and in any case we don't want the list bullets in that case.


A question for Simon - did the Sage root and scripts patches apply fine on a recent beta of 5.7? I assume so, just checking with respect to rebasing.


Okay, for positive review we need the following, in my opinion. The previous work issues were dealt with long ago.

  • Still to check that regex, though apparently it's working fine
  • Add levels <h5> and <h6> (see here, h6 is also actually supported by TinyMCE in Sage)
  • Decide what to do with address tag (possibly turn it into pre tag? just an idea) and do that
  • Decide whether it's feasible to replace the specific style of indentation mentioned above, and do it if so
  • Make an updated pull request for sagenb (luckily not a problem with respect to rebasing, since we only add new files)

comment:124 in reply to: ↑ 123 Changed 7 years ago by SimonKing

Replying to kcrisman:

I guess the question is whether "premature optimization is the root of all evil" or not. In this case, I think that having an sws2rst converter in Sage that at least converts some stuff, but can't handle styles, is worse than having no sws2rst converter at all.

I guess you mean "is better than"...

Anyway. I found the converter very useful, and so I will soon be able to provide a new thematic tutorial, by conversion of a worksheet that I occasionally used in talks on coercion. So, from me it is "+1" to make it available.

However, I am no sphinx/rst expert nor an expert notebook user. So, I don't feel qualified to be a reviewer for this.


A question for Simon - did the Sage root and scripts patches apply fine on a recent beta of 5.7? I assume so, just checking with respect to rebasing.


Everything applied find to sage-5.7.beta2 (at least after creating a hg repository in devel/sagenb...).

comment:125 in reply to: ↑ 123 ; follow-up: Changed 7 years ago by kcrisman

Okay, for positive review we need the following, in my opinion. The previous work issues were dealt with long ago.

  • Still to check that regex, though apparently it's working fine
  • Add levels <h5> and <h6> (see here, h6 is also actually supported by TinyMCE in Sage)

See here for some conventions, but anyway it should be easy to add something for these, just two places to do it.

  • Decide what to do with address tag (possibly turn it into pre tag? just an idea) and do that
  • Decide whether it's feasible to replace the specific style of indentation mentioned above, and do it if so
  • Make an updated pull request for sagenb (luckily not a problem with respect to rebasing, since we only add new files)

Anyway. I found the converter very useful, and so I will soon be able to provide a new thematic tutorial, by conversion of a worksheet that I occasionally used in talks on coercion. So, from me it is "+1" to make it available.

Great!

However, I am no sphinx/rst expert nor an expert notebook user. So, I don't feel qualified to be a reviewer for this.

No problem.

Everything applied find to sage-5.7.beta2 (at least after creating a hg repository in devel/sagenb...).

Good news.

comment:126 in reply to: ↑ 111 Changed 7 years ago by kcrisman

Now that I have successfully transformed my worksheet into rst with these warnings: what remains to do to turn the .rst file into html?

See the doc you get from running sage --sws2rst, like

sage --sws2rst -h 
sage --sws2rst --sphinxify

Sorry I forgot about that earlier; it's been a while!

comment:127 in reply to: ↑ 125 ; follow-up: Changed 7 years ago by kcrisman

  • Work issues set to address tag, h5 and h6 tags

Okay, for positive review we need the following, in my opinion. The previous work issues were dealt with long ago.

  • Still to check that regex, though apparently it's working fine

Still need to do.

  • Add levels <h5> and <h6> (see here, h6 is also actually supported by TinyMCE in Sage)

See here for some conventions, but anyway it should be easy to add something for these, just two places to do it.

This will be easy, I believe.

  • Decide what to do with address tag (possibly turn it into pre tag? just an idea) and do that

Let's just turn it into pre for now.

  • Decide whether it's feasible to replace the specific style of indentation mentioned above, and do it if so

I think we need to save that for another ticket; although it's unfortunate that TinyMCE does this, I have a feeling this will never get in at all if we wait for one of us to have time to do it (there will be some tricky parsing to do).

  • Make an updated pull request for sagenb (luckily not a problem with respect to rebasing, since we only add new files)

Shouldn't be a problem, probably one of the sagenb folks can help with this as well.

comment:128 Changed 7 years ago by kcrisman

  • Description modified (diff)

comment:129 in reply to: ↑ 127 Changed 7 years ago by kcrisman

  • Status changed from needs_work to needs_review
  • Work issues address tag, h5 and h6 tags deleted
  • Add levels <h5> and <h6> (see here, h6 is also actually supported by TinyMCE in Sage)

See here for some conventions, but anyway it should be easy to add something for these, just two places to do it.

This will be easy, I believe.

Yes. The reason it didn't work for Simon is that Pablo added it to the list of headers, but not the list of tags! I'll be updating the pull shortly and posting a patch here for reference.

  • Decide what to do with address tag (possibly turn it into pre tag? just an idea) and do that

Also done. It turned out that making it into em is the best way to emulate what is there.

I did discover a different issue - it turns out that we probably aren't dealing with line breaks correctly inside of things like strong and em. In HTML you can have multiple lines inside of such a tag, but just adding a newline doesn't work for this, as ReST treats

*Your name
Your email*

as one line of text, which is the usual behavior, but

*Your name

Your email*

is no longer emphasized. I don't think there is a super-easy way to fix this, and as pointed out above one will always have to do some tweaking of the output. Making it em for now seems like a good option, and I've fixed the br tag to have two newlines. I didn't fix it in the pre and math cases because those are working ok as they have a different role with the indentation and all.

  • Still to check that regex, though apparently it's working fine

Okay, I did my best to mess with it, but I think that the only things it won't do correctly wouldn't look very good in a Sage worksheet anyway, so they would be user errors (like \$$x$$ and the like).

  • Make an updated pull request for sagenb (luckily not a problem with respect to rebasing, since we only add new files)

Shouldn't be a problem, probably one of the sagenb folks can help with this as well.

I just updated my pull request with Pablo's second patch as well as my own extra patch, so we are there!


I think that the only thing that has to be done is, at most, for Pablo to confirm that my change to the br tag is okay. It's too late for me to check this with actual worksheets today.

comment:130 Changed 7 years ago by kcrisman

Oops, need to update the Sage patches, I think.

Changed 7 years ago by kcrisman

Apply to root repository

Changed 7 years ago by kcrisman

comment:131 Changed 7 years ago by kcrisman

  • Cc jdemeyer added
  • Dependencies #11080, #11459 deleted
  • Description modified (diff)

Okay, for full review I suppose it would be nice for

  • Pablo to confirm that br tag is okay now or for someone (could be me) to try some worksheets with that to test
  • Jeroen or someone else who knows their way around spkg/bin/sage to make sure the update to the root patch is ok even though the exit etc. is gone (but it seems to be gone from the others as well)

Keshav, the pull requests are updated, so please don't release the next sagenb until this is done.

comment:132 Changed 7 years ago by pang

Karl, I confirm the br tag is ok, and better, with your fix, and I say "thanks a lot". I've been busy switching job, forgot this one!

comment:133 Changed 7 years ago by kcrisman

  • Description modified (diff)
  • Reviewers changed from Nicolas Thiéry, Jason Grout, Karl-Dieter Crisman, Jason Bandlow, John Palmieri, Simon King to Nicolas Thiéry, Jason Grout, Karl-Dieter Crisman, Jason Bandlow, John Palmieri, Simon King, Karl-Dieter Crisman, Pablo Angulo
  • Status changed from needs_review to positive_review

Thanks so much!

Jeroen, I'm setting this to positive review, but please ask for a fix on the root repo patch if that is not the new way these scripts are called.

comment:134 Changed 7 years ago by nthiery

  • Reviewers changed from Nicolas Thiéry, Jason Grout, Karl-Dieter Crisman, Jason Bandlow, John Palmieri, Simon King, Karl-Dieter Crisman, Pablo Angulo to Nicolas M. Thiéry, Jason Grout, Karl-Dieter Crisman, Jason Bandlow, John Palmieri, Simon King, Karl-Dieter Crisman, Pablo Angulo

Congratulations guys for finishing this! I am looking forward to it!

comment:135 Changed 7 years ago by jdemeyer

  • Milestone changed from sage-5.9 to sage-5.10

comment:136 Changed 7 years ago by jdemeyer

  • Dependencies set to sagenb-0.10.6
  • Milestone changed from sage-5.10 to sage-pending

comment:137 Changed 7 years ago by jdemeyer

  • Dependencies changed from sagenb-0.10.6 to #14330

comment:138 Changed 7 years ago by kini

  • Description modified (diff)

comment:139 Changed 7 years ago by jdemeyer

  • Description modified (diff)

comment:140 follow-up: Changed 7 years ago by jdemeyer

  • Status changed from positive_review to needs_work
sage -t devel/sagenb-main/sagenb/misc/comments2rst.py
**********************************************************************
File "devel/sagenb-main/sagenb/misc/comments2rst.py", line 47, in sagenb.misc.comments2rst.preprocess_display_latex
Failed example:
    from sagenb.misc.comments2rst import preprocess_display_latex
Exception raised:
    Traceback (most recent call last):
      File "/mazur/release/merger/sage-5.10.beta2/local/lib/python2.7/site-packages/sage/doctest/forker.py", line 454, in _run
        self.execute(example, compiled, test.globs)
      File "/mazur/release/merger/sage-5.10.beta2/local/lib/python2.7/site-packages/sage/doctest/forker.py", line 813, in execute
        exec compiled in globs
      File "<doctest sagenb.misc.comments2rst.preprocess_display_latex[0]>", line 1, in <module>
        from sagenb.misc.comments2rst import preprocess_display_latex
      File "/mazur/release/merger/sage-5.10.beta2/devel/sagenb/sagenb/misc/comments2rst.py", line 34, in <module>
        """
    ImportError: BeautifulSoup must be installed.

    Please either install using Sage spkg installation

        sage -i beautifulsoup

    or by using one of

        pip install BeautifulSoup
        easy_install BeautifulSoup

    in the Sage shell (sage --sh).

**********************************************************************

comment:141 in reply to: ↑ 140 Changed 7 years ago by dimpase

Replying to jdemeyer: how about

sage -i beautifulsoup

first?

Well, with this I have a different failure (on OSX 10.6.8), rc0 (do I need rc1???)

$ sage -t devel/sagenb-main/sagenb/misc/comments2rst.py
Running doctests with ID 2013-04-30-23-31-02-a3c02abb.
Doctesting 1 file.
sage -t devel/sagenb-main/sagenb/misc/comments2rst.py
**********************************************************************
File "devel/sagenb-main/sagenb/misc/comments2rst.py", line 373, in sagenb.misc.comments2rst.Soup2Rst.get_plain_text
Failed example:
    html2rst('<p>Some text with <em>math</em>: $e^{\pi i}=-1$</p>', '')
Expected:
    u'Some text with  *math* : :math:`e^{\\pi i}=-1`\n'
Got:
    u'Some text with  *math* :  :math:`e^{\\pi i}=-1`\n\n'
**********************************************************************
File "devel/sagenb-main/sagenb/misc/comments2rst.py", line 375, in sagenb.misc.comments2rst.Soup2Rst.get_plain_text
Failed example:
    html2rst('<p>Text with <em>incorrect</p> nesting</em>.', '')       
Expected:
    u'Text with  *incorrect*\n\n nesting\n.'
Got:
    u'Text with  *incorrect* \n\n nesting\n.'
**********************************************************************
File "devel/sagenb-main/sagenb/misc/comments2rst.py", line 377, in sagenb.misc.comments2rst.Soup2Rst.get_plain_text
Failed example:
    html2rst('<pre>Preformatted: \n    a+2\n</pre><p> Not preformatted: \n    a+2\n</p>', '')
Expected:
    u'::\n\n    Preformatted: \n        a+2\n    \nNot preformatted:     a\\+2\n'
Got:
    u'::\n\n    Preformatted: \n        a+2\n    \n Not preformatted:     a\\+2\n\n'
**********************************************************************
File "devel/sagenb-main/sagenb/misc/comments2rst.py", line 381, in sagenb.misc.comments2rst.Soup2Rst.get_plain_text
Failed example:
    html2rst('<p>some text</p><p>$$</p><p>3.183098861 \cdot 10^{-1}</p><p>$$</p>','')
Expected:
    u'some text\n\n.. MATH::\n\n    3.183098861 \\cdot 10^{-1}\n'
Got:
    u'some text\n\n.. MATH::\n\n    3.183098861 \\cdot 10^{-1}\n\n.. end of math\n\n'
**********************************************************************
1 item had failures:
   4 of   7 in sagenb.misc.comments2rst.Soup2Rst.get_plain_text
    [23 tests, 4 failures, 0.09 s]
----------------------------------------------------------------------
sage -t devel/sagenb-main/sagenb/misc/comments2rst.py  # 4 doctests failed

comment:142 follow-up: Changed 7 years ago by jdemeyer

Doctests should pass without optional packages. Add

# optional - beautifulsoup

where needed.

comment:143 in reply to: ↑ 142 Changed 7 years ago by dimpase

Replying to jdemeyer:

Doctests should pass without optional packages. Add

# optional - beautifulsoup

where needed.

shouldn't we consider making it standard? it's 32.5K and the MIT license...

comment:144 Changed 7 years ago by kcrisman

Huh, I didn't consider running tests since this is upstream and why would we be running upstream tests? ;-)

I won't have time to fix this immediately. But maybe I can try to do a pull request "manually" on github. As you can see, all the text failures are due to an additional space and extra newlines which I didn't think to check, so easy to fix.

I don't know whether we should really put the optional EVERYwhere. I guess we can.

I don't see the need to make beautifulsoup standard, though; it's only needed for this one thing, and the import error is pretty informative.

comment:145 follow-up: Changed 7 years ago by kcrisman

I've opened Github issue 159 for this.

comment:146 in reply to: ↑ 145 Changed 7 years ago by kcrisman

I've opened Github issue 159 for this.

And pull request 160 should (?) fix it. I think this is easier than making things standard.

comment:147 Changed 7 years ago by kcrisman

  • Status changed from needs_work to needs_review

comment:148 Changed 7 years ago by kcrisman

  • Status changed from needs_review to positive_review

The sws2rst problems should be gone now, as that pull request has been merged upstream. However, #14330 is not currently ready, so I'm leaving this as sage-pending.

comment:149 Changed 7 years ago by kcrisman

And while we're waiting for #14330...

Anyone involved on this ticket ever seen this extension?

Changed 7 years ago by dimpase

example sws with an error

comment:150 Changed 7 years ago by dimpase

I am getting

$ /usr/local/src/sage/sage-5.10.rc1/sage -sws2rst exp1.sws
Processing exp1.sws
File at exp1.rst
Image directory at exp1_media
  File "exp1.sws", line 1
SyntaxError: Non-ASCII character '\xa2' in file exp1.sws on line 1, but no encoding declared; see http://www.python.org/peps/pep-0263.html for details

on the attached exp1.sws. This is with sage-5.10.rc1 on OSX 10.6.8. Any ideas why? Is it just me?

The error message is really weird, by the way. as sws files are not ASCII...

comment:151 follow-up: Changed 7 years ago by kcrisman

I can look into this eventually. Given that the title is

~/exp1

I suspect that may be the problem. Probably it can't handle certain titles at this point, presumably we could fix this. Weirdly, I can't find where in worksheet.html the title shows up any more. I thought in the past that was in that file, but when I open it, it isn't there (not just this worksheet).

However, unless all your worksheets are suddenly corrupted by this, I think it's not worth rescinding the review.

Changed 7 years ago by dimpase

another, trivial, example of sws giving an error

comment:152 in reply to: ↑ 151 Changed 7 years ago by dimpase

Replying to kcrisman:

I can look into this eventually. Given that the title is

~/exp1

I suspect that may be the problem. Probably it can't handle certain titles at this point, presumably we could fix this. Weirdly, I can't find where in worksheet.html the title shows up any more. I thought in the past that was in that file, but when I open it, it isn't there (not just this worksheet).

Do you mean you can reproduce the error I get on this sws? I attach another one, completely trivial.

In a way, it's not a real error, it's just an error message; I get all the files created as they should.

However, unless all your worksheets are suddenly corrupted by this, I think it's not worth rescinding the review.

Well, it could be the new sagenb spkg and the new Sage release (5.10.rc1).

Changed 7 years ago by dimpase

adding missing exec in spkg/bin/sage

comment:153 Changed 7 years ago by dimpase

  • Description modified (diff)
  • Status changed from positive_review to needs_work

I've found the source of trouble - it's missing exec in invocation of sage-sws2rst in spkg/bin/sage.This causes a different way to parse the argument. More precisely, without exec the script does not exit immediately, but proceeds to analyze other possible execution, and it proceeds to execute sage-run on the sws input!

The 1-line patch is attached. (And only it needs review).

Last edited 7 years ago by dimpase (previous) (diff)

comment:154 Changed 7 years ago by dimpase

  • Status changed from needs_work to needs_review

comment:155 Changed 7 years ago by kcrisman

  • Status changed from needs_review to positive_review

Do you mean you can reproduce the error I get on this sws? I attach another one, completely trivial.

No, I had to ditch the upstream because of the error I mention on #14430. I was referring to something else.

Jeroen, I'm setting this to positive review, but please ask for a fix on the root repo patch if that is not the new way these scripts are called.

I should have not been this passive about it, in retrospect. Of course this totally makes sense looking at the other ones. I do not, however, have time to actually try this out. But I think this should be okay.

comment:156 Changed 7 years ago by jdemeyer

  • Description modified (diff)

comment:157 Changed 7 years ago by jdemeyer

  • Merged in set to sage-5.11.beta1
  • Resolution set to fixed
  • Status changed from positive_review to closed

comment:158 Changed 7 years ago by jdemeyer

  • Merged in sage-5.11.beta1 deleted
  • Resolution fixed deleted
  • Status changed from closed to new

Merged by mistake since it depends on #14330.

comment:159 Changed 7 years ago by jdemeyer

  • Status changed from new to needs_review

comment:160 Changed 7 years ago by jdemeyer

  • Status changed from needs_review to positive_review

comment:161 Changed 6 years ago by jdemeyer

  • Description modified (diff)

comment:162 Changed 6 years ago by jdemeyer

  • Milestone changed from sage-pending to sage-5.11

comment:163 Changed 6 years ago by schilly

the optional spkg is on the server

comment:164 Changed 6 years ago by jdemeyer

  • Milestone changed from sage-5.11 to sage-pending

comment:165 Changed 6 years ago by dimpase

What is needed from #14330 here? I tried this ticket with the sagenb included in Sage 5.9, and it seems to work just fine!

comment:166 Changed 6 years ago by jdemeyer

  • Milestone changed from sage-pending to sage-5.12

comment:167 Changed 6 years ago by jdemeyer

  • Merged in set to sage-5.12.beta0
  • Resolution set to fixed
  • Status changed from positive_review to closed
Note: See TracTickets for help on using tickets.