Opened 10 years ago

Closed 10 years ago

Last modified 10 years ago

#11498 closed enhancement (fixed)

Improve LaTeXing of strings

Reported by: novoselt Owned by: jason, mpatel, was
Priority: major Milestone: sage-4.7.2
Component: user interface Keywords: sd31
Cc: kcrisman, jhpalmieri Merged in: sage-4.7.2.alpha0
Authors: Andrey Novoseltsev Reviewers: John Palmieri
Report Upstream: N/A Work issues:
Branch: Commit:
Dependencies: Stopgaps:

Status badges

Description (last modified by novoselt)

See the attached PDF for an example of a class that has no _latex_. All commands are repeated twice, one with Sage-4.7 and the second one with the patch applied. In both cases typeset checkbox on the top of the worksheet was tuned on.

Also, I invite you to try

RubiksCube()

in the notebook with typeset mode on before and after the patch - such situations are precisely the ones that I wanted to solve (they are so bad, that printout does not reflect them accurately).

It is possible that user classes will not have customized latex method and there are some standard classes that use their string representation. If this representation includes multiple lines, the result in most cases is not very good and sometimes quite horrible, making it unpleasant to use the notebook in typeset mode. The patch aims to fix this situation and clean up latex module along the way.

The major change is in sage.latex.str_function. Also sage.latex.JSMath.eval is simplified a bit and does not alter the code almost at all anymore. The rest are mostly doctest adjustments and some code clean-up and simplifications which happened while I was going through it to understand how it works.

Apply:

  1. trac_11498_LaTeX.patch

Attachments (4)

String_LaTeXing_--_Sage.pdf (58.5 KB) - added by novoselt 10 years ago.
trac_11498_improve_string_LaTeXing.patch (29.0 KB) - added by novoselt 10 years ago.
Updated version.
trac_11498_LaTeX.patch (28.5 KB) - added by jhpalmieri 10 years ago.
new version; apply only this patch
trac_11498_delta.patch (1.9 KB) - added by jhpalmieri 10 years ago.
for reference only; difference between old patch and new one

Download all attachments as: .zip

Change History (18)

Changed 10 years ago by novoselt

comment:1 Changed 10 years ago by novoselt

I still need to work on these failures:

	sage -t -long devel/sage-main/sage/graphs/generic_graph.py # 1 doctests failed
	sage -t -long devel/sage-main/sage/combinat/free_module.py # 13 doctests failed
	sage -t -long devel/sage-main/sage/modular/hecke/algebra.py # 1 doctests failed
	sage -t -long devel/sage-main/doc/de/tutorial/latex.rst # 1 doctests failed
	sage -t -long devel/sage-main/sage/sets/set.py # 1 doctests failed
	sage -t -long devel/sage-main/sage/misc/html.py # 2 doctests failed
	sage -t -long devel/sage-main/doc/en/tutorial/latex.rst # 1 doctests failed

comment:2 Changed 10 years ago by novoselt

  • Description modified (diff)
  • Status changed from new to needs_review

comment:3 Changed 10 years ago by novoselt

By the way, the issue of this ticket was discussed here:

http://groups.google.com/group/sage-devel/browse_thread/thread/9aee79aaa40c8390/e5d55c0b800775b1

So John is cc-ed due to his explicit request, hint-hint ;-)

comment:4 follow-up: Changed 10 years ago by jhpalmieri

Overall, this looks pretty good, but it could use some work. Here are some comments.

  • in html.py, the comments "We work around a limitation of jsmath (it can't typeset '\texttt')::" should be deleted, since those doctests don't involve texttt anymore. (Do we even need to do the replacement s.replace('\\texttt','\\hbox') now, or can that code be deleted?)
  • in latex.py, for bool_function and None_function, "\mathrm{%s}" is preferable to "{\rm %s}". Actually, I think we can just return "\mathrm{%s}" always, regardless of notebook vs. command line distinctions. Do you know why we had two versions for bool_function before? It looks okay to me without the mbox...
  • a few typos and rewordings:
    @@ -239,7 +239,7 @@ def str_function(x):
         If ``x`` contains only digits with, possibly, a single decimal point and/or
         a sign in front, it is considered to be its own representation. Otherwise
         each line of ``x`` is wrapped in a ``\verb`` command and these lines are
    -    assembled in a left-justified array. This gives to complicated string the
    +    assembled in a left-justified array. This gives to complicated strings the
         closest look to their "terminal representation".
        
         .. warning:: Such wrappers **cannot** be used as arguments of LaTeX
    @@ -289,7 +289,7 @@ def str_function(x):
         # There is a bug in verb-space treatment in jsMath...
         spacer = "\\phantom{%s}"
         # \phantom{\verb!%s!} is more accurate and it works, but it is not a valid
    -    # LaTeX and may cause problems, so let's live with the above variant untill
    +    # LaTeX and may cause problems, so let's live with the above variant until
         # spaces are properly treated in jsMath/MathJax and we don't need to worry.
         lines = []
         for line in x.split("\n"):
    
  • another comment about the old code: I think that the output from dict_function looks better with no extra space after the colon, so I would suggest the following, but take a look yourself and see what you think:
    @@ -334,12 +334,10 @@ def dict_function(x):
                    \left[\sin\left(z^{2}\right), \frac{1}{2} \, y\right]\right\}
         """
         return "".join([r"\left\{",
    -                    ", ".join(r"%s :\: %s" % (latex(key), latex(value))
    +                    ", ".join(r"%s : %s" % (latex(key), latex(value))
                                   for key, value in x.iteritems()),
                         r"\right\}"])
    
  • Also, in JSMath().eval, if x is already a LaTeX expression, then x=str(x) doesn't do anything, and neither does x=latex(x), so I think we can simplify it more:
    @@ -1680,11 +1681,7 @@ class JSMath:
                 sage: JSMath().eval(type(3), mode='inline')
                 <html>...\verb|&lt;type|\phantom{x}\verb|'sage.rings.integer.Integer'&gt;|</span></html>
             """
    -        # If x is already a LaTeX expression, i.e. the output of latex(blah),
    -        # we will treat it as a string, so that we can see the code itself.
    -        if isinstance(x, LatexExpr):
    -            x = str(x)
    -        # Now get a regular LaTeX representation of x...
    +        # Get a regular LaTeX representation of x...
    
  • Finally, the output in one of the doctests,
                <html>...\verb|&lt;type|\phantom{x}\verb|'sage.rings.integer.Integer'&gt;|</span></html>
    
    looks like bad latex: the "&lt;" inside of the \verb environment shouldn't be typeset correctly, but it actually seems to work. I wonder why...

When we switch to MathJax (#9774), we'll have to redo some of this.

comment:5 in reply to: ↑ 4 Changed 10 years ago by novoselt

Thanks for the comments!

I don't know why there are differences for different modes, this is the first time I am working with these modules. I agree that the fewer are differences the better and will also remove the extra space.

Here is the reason for extra string conversion:

sage: s = "asdf fs"
sage: l = latex(s)
sage: print l
sage: print latex(l)
sage: print latex(str(l))
\verb|asdf|\phantom{x}\verb|fs|
\verb|asdf|\phantom{x}\verb|fs|
\verb"\verb|asdf|\phantom{x}\verb|fs|"

As I understand, the idea is that when user types latex(x) in the notebook, then (s)he wants to see the actual LaTeX code. So it will be wrapped the second time and displayed.

Regarding &lt; - it is not produced in normal LaTeX but can happen only in jsMath. For some reason it parses < even inside verb and the old solution was to insert extra spaces (which somehow prevents it from this mistake), but this alters the output and looks especially bad on typesetting things like >=. Replacing it with HTML code seems to work, I don't know why and I don't really care - once it is fixed upstream, we will see a doctest break and can remove this workaround. (Hopefully we can also get rid of phantoms eventually, but in MathJax the problem was still present two weeks ago. I don't think it is possible to write a doctest that will detect fixing the spacing issue upstream.)

Changed 10 years ago by novoselt

Updated version.

comment:6 Changed 10 years ago by jhpalmieri

  • Reviewers set to John Palmieri

I'm mostly happy with this. Here's a slightly different version (a full patch plus a patch showing the difference between the old one and the new one); I think this does a better job of preserving the original behavior with respect to combinatorial free modules. If you're happy with my changes, feel free to mark this as "positive review".

Changed 10 years ago by jhpalmieri

new version; apply only this patch

Changed 10 years ago by jhpalmieri

for reference only; difference between old patch and new one

comment:7 Changed 10 years ago by novoselt

Hi John, would you mind explaining to me how does your code work? I am not very familiar with regular expressions.

Also, how did you notice the new patch? It kind of annoys me that trac does not send messages when there are new patches attached to the tickets, so I usually leave some comment to trigger it. This time I was waiting since I didn't yet run tests on the whole library to see if anything got broke ;-)

comment:8 Changed 10 years ago by jhpalmieri

I happened to take a look at the ticket, so I saw the new patch; mine is based on that one.

The documentation for regular expressions in Python is here. In the code

if s.find('\\verb') != -1: 
    import re 
    s = re.sub("\\\\verb(.)(.*?)\\1", "\\2", s) 
    s = s.replace("\\phantom{x}", " ") 

the second and third lines are the ones involving regular expressions. Line 3 is a search-and-replace command: it searches the argument s for a pattern \verb followed by a single character (that's the .), and putting that character in parentheses allows you to refer to it later as \1. Then there are any number of characters (that's the .*?, also enclosed in parentheses, so you can refer to it later as \2), followed by whatever the first character was (that's the \\1). It replaces all of this with the string in the second set of parentheses, \2.

There is one subtlety in this: the standard way to refer to any string of characters is .*, but the problem here is that in a string like

\verb|a|, \verb|b|

if we used .*, then the string \2 would match "a|, \verb|b" -- everything between the first "|" and the last "|": by default, regular expression matches use the longest match possible. Using .*? instead of .* tells it to match the shortest string possible instead, so it does two replacements: first \2 matches "a", and then it matches "b", turning this into

a, b

comment:9 Changed 10 years ago by novoselt

  • Status changed from needs_review to positive_review

Thank you!

All tests pass, positive review!

comment:10 Changed 10 years ago by novoselt

  • Description modified (diff)

comment:11 Changed 10 years ago by jdemeyer

  • Milestone changed from sage-4.7.1 to sage-4.7.2

comment:12 Changed 10 years ago by jdemeyer

  • Component changed from notebook to user interface

comment:13 Changed 10 years ago by jdemeyer

  • Merged in set to sage-4.7.2.alpha0
  • Resolution set to fixed
  • Status changed from positive_review to closed

comment:14 Changed 10 years ago by jdemeyer

See #12156 for a related ticket.

Note: See TracTickets for help on using tickets.