Opened 7 years ago

Closed 6 years ago

#18270 closed enhancement (fixed)

Print matrices using unicode large delimiters (on demand)

Reported by: gagern
Owned by:
Priority: major
Milestone: sage-6.7
Component: user interface
Keywords: unicode matrix
Cc:
Merged in:
Authors: Martin von Gagern
Reviewers: Volker Braun
Report Upstream: N/A
Work issues:
Branch: 6b6f089
Commit: 6b6f089590377c61061bc4629e44adb47da6a0e1
Dependencies:
Stopgaps:

Description (last modified by gagern)

Unicode provides nice symbols which can be combined to form large delimiters. It would be nice if we could print matrices using these, instead of the same ASCII brackets on all the lines. So instead of

[1 2|3]
[4 5|6]
[7 8|9]

I'd like to see one of these:

⎛1 2│3⎞    ⎡1 2│3⎤
⎜4 5│6⎟ or ⎢4 5│6⎥
⎝7 8│9⎠    ⎣7 8│9⎦

Perhaps it's best to do this in small increments: start with a keyword argument to the str method of matrices, then later on make this the default. So the ticket here is only for on-demand support of this feature, not for its automatic use by default. For later reference, see #14733 which switched the banner to Unicode, and thereby made the choice that Sage may look broken on non-Unicode terminals.

Change History (23)

comment:1 Changed 7 years ago by gagern

  • Branch set to u/gagern/MatrixUnicodeDelimiters

comment:2 Changed 7 years ago by gagern

  • Commit set to e174830facfed57934dec14cf107ffd6952955b5
  • Status changed from new to needs_review

New commits:

e174830 Trac #18270: Introduce unicode_symbols keyword argument to Matrix.str

comment:3 follow-up: Changed 7 years ago by vdelecroix

This banner is already a mess. If I ssh + screen + sage at my laboratory, I get an ugly

�����������������������������������

After that I cannot see anything else, although entering Sage commands still works.

The new matrices look nice though. For the on-demand feature it would be nice to have a global flag allowing (or avoiding) unicode

$ sage --with-no-unicode

Vincent

comment:4 follow-up: Changed 7 years ago by vdelecroix

Why not only unicode for the argument name?

comment:5 in reply to: ↑ 4 Changed 7 years ago by gagern

Replying to vdelecroix:

Why not only unicode for the argument name?

Because in Python 2, unicode is also the name of a built-in type, and I feared that at some point we might want to convert element types to unicode and would then need to call that conversion function.
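The shadowing concern can be illustrated by analogy in current Python, where unicode is no longer a builtin but str plays the same role. This is only a sketch: the function names render_shadowed and render_safe are hypothetical and do not come from the patch.

```python
def render_shadowed(entry, str=False):
    # The keyword argument named ``str`` hides the builtin type inside this
    # function, so the call below tries to call a bool and fails.
    return str(entry)


def render_safe(entry, unicode_symbols=False):
    # A distinct argument name leaves the builtin conversion available.
    return str(entry)


try:
    render_shadowed(42)
    shadowing_error = None
except TypeError as exc:
    shadowing_error = exc

print(render_safe(42))
print(type(shadowing_error).__name__)
```

A name such as unicode_symbols avoids the clash at the cost of a longer keyword.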

comment:6 in reply to: ↑ 3 ; follow-up: Changed 7 years ago by gagern

Replying to vdelecroix:

This banner is already a mess. If I ssh + screen + sage at my laboratory, …

Perhaps you should file that as a bug, so it can be addressed? Does using screen -U help? What does locale print on your client's terminal, inside the ssh and inside screen respectively?

For the on-demand feature

I meant “on-demand” as opposed to “automatic”, with the demand being expressed for each matrix that gets printed, i.e. by using the keyword argument from my commit. I believe you're talking about something far more automatic here.

it would be nice to have a global flag allowing (or avoiding) unicode

The canonical way, at least on Linux, would be to inspect the LC_CTYPE facet of the current locale and detect whether it refers to UTF-8 or not. If one used this to automatically choose a sane default, then setting the LC_CTYPE environment variable manually would serve the same function as the switch you suggest:

$ LC_CTYPE=en_US      sage  # without unicode
$ LC_CTYPE=en_US.utf8 sage  # with unicode

Of course, using en_US in this example is just the common default. I guess you as well as I might be using a different setting there in practice. Adding a switch might still make sense to add visibility to this feature. But all of this should probably be discussed in a follow-up ticket once we get the change here accepted.
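A rough sketch of such a check, assuming the decision is made purely from environment variables: the helper names charset_of and wants_unicode are hypothetical, and a locale name without an explicit codeset is treated as non-UTF-8 here, which mirrors the two command lines above but need not hold on every system.

```python
import codecs
import os


def charset_of(locale_name):
    """Return the codeset part of a POSIX locale name, or '' if absent."""
    # POSIX form: language[_territory][.codeset][@modifier]
    name = locale_name.split("@", 1)[0]
    return name.split(".", 1)[1] if "." in name else ""


def wants_unicode(environ=None):
    """Guess whether the environment asks for a UTF-8 character type."""
    environ = os.environ if environ is None else environ
    # POSIX precedence: LC_ALL overrides LC_CTYPE, which overrides LANG.
    for var in ("LC_ALL", "LC_CTYPE", "LANG"):
        value = environ.get(var)
        if value:
            break
    else:
        return False  # none of the variables is set
    charset = charset_of(value)
    if not charset:
        return False  # assumption: no codeset means no UTF-8
    try:
        # codecs normalises spellings such as "utf8" and "UTF-8".
        return codecs.lookup(charset).name == "utf-8"
    except LookupError:
        return False


print(wants_unicode({"LC_CTYPE": "en_US"}))
print(wants_unicode({"LC_CTYPE": "en_US.utf8"}))
```

Passing the environment as a dictionary keeps the check easy to try with different settings, without changing the real locale.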

comment:7 in reply to: ↑ 6 Changed 7 years ago by vdelecroix

Replying to gagern:

Replying to vdelecroix:

This banner is already a mess. If I ssh + screen + sage at my laboratory, …

Perhaps you should file that as a bug, so it can be addressed? Does using screen -U help? What does locale print on your client's terminal, inside the ssh and inside screen respectively?

Indeed, screen -U solves the problem. My locale on the client is

LANG=en_GB.UTF-8
LANGUAGE=en_GB:en
LC_CTYPE="en_GB.UTF-8"
...

and on the remote is

LANG=C
LANGUAGE=fr_FR:
LC_CTYPE="C"
...

For the on-demand feature

I meant “on-demand” as opposed to “automatic”, with the demand being expressed for each matrix that gets printed, i.e. by using the keyword argument from my commit. I believe you're talking about something far more automatic here.

it would be nice to have a global flag allowing (or avoiding) unicode

The canonical way, at least on Linux, would be to inspect the LC_CTYPE facet of the current locale and detect whether it refers to UTF-8 or not. If one used this to automatically choose a sane default, then setting the LC_CTYPE environment variable manually would serve the same function as the switch you suggest:

$ LC_CTYPE=en_US      sage  # without unicode
$ LC_CTYPE=en_US.utf8 sage  # with unicode

Of course, using en_US in this example is just the common default. I guess you as well as I might be using a different setting there in practice. Adding a switch might still make sense to add visibility to this feature. But all of this should probably be discussed in a follow-up ticket once we get the change here accepted.

I like your suggestion very much.

Vincent

comment:8 follow-up: Changed 7 years ago by vdelecroix

  • Status changed from needs_review to needs_work

review comment:

  • I would very much prefer that you avoid replace as well as the final step which deals with the top and bottom line. You should rather define all the characters needed as variables at the beginning (as you did for left_bracket, right_bracket, etc). That would help for readability and if at some point we decide that all these characters are arguments of the function.
  • why not curly bracket?

Vincent

comment:9 in reply to: ↑ 8 ; follow-up: Changed 7 years ago by gagern

Replying to vdelecroix:

review comment:

  • I would very much prefer that you avoid replace as well as the final step which deals with the top and bottom line. You should rather define all the characters needed as variables at the beginning (as you did for left_bracket, right_bracket, etc). That would help for readability and if at some point we decide that all these characters are arguments of the function.

Indeed when I started this patch, I had a line like

top_left_bracket, mid_left_bracket, … = u"⎡⎢⎣⎤⎥⎦"

but the long names made things very hard to read. And shorter alternatives, like tlb, make things a bit hard to understand and therefore harder to maintain. Quite the opposite of “help for readability”, in my opinion. But if you insist, I can go with this approach, using the short names.

Keep in mind that the left_bracket and right_bracket strings, which have been there before, are not only representing the brackets, but also collecting the vertical lines at the ends.

Plugging the delimiting brackets in the right rows is a non-trivial affair, and would therefore in my opinion be far harder to maintain. The possibility of an arbitrary number of horizontal lines at either end complicates things considerably. I haven't been able to come up with a reasonably simple code snippet to achieve correct symbols automatically while building these strings. So unless you absolutely insist on this point, or can give a good reason why my approach has very undesirable repercussions, or can suggest some formulation, I'd rather stick with my approach. Or adjust it in such a way that it assembles the matrix body without any brackets, and then adds all the bracket parts in a single loop instead of replacing existing incorrect parts.
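The last alternative, building the body first and adding the bracket parts in one loop, might look roughly like this. It is a sketch, not the code from the branch: wrap_body is a hypothetical helper, and it treats every body line the same way, including any horizontal subdivision lines at the top or bottom, which may differ from what the patch does.

```python
def wrap_body(body, left, right, single=("[", "]")):
    """Add large delimiters to already formatted matrix lines.

    ``left`` and ``right`` each hold three characters: the top piece, the
    extension piece and the bottom piece of the delimiter.
    """
    last = len(body) - 1
    if last == 0:
        # One line only: use ordinary single-line delimiters.
        return [single[0] + body[0] + single[1]]
    wrapped = []
    for i, line in enumerate(body):
        # 0 selects the top piece, 2 the bottom piece, 1 the extension.
        piece = 0 if i == 0 else 2 if i == last else 1
        wrapped.append(left[piece] + line + right[piece])
    return wrapped


body = ["1 2│3", "4 5│6", "7 8│9"]
print("\n".join(wrap_body(body, "⎡⎢⎣", "⎤⎥⎦")))
```

Because the brackets are chosen by line position only, no incorrect characters need to be replaced afterwards.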

  • why not curly bracket?

Because I don't encounter them in my day-to-day work. Can you give me a hint as to where these might occur? For an even number of rows (except two), big curly brackets would look slightly unsymmetric, but I doubt that should be a concern.

comment:10 in reply to: ↑ 9 ; follow-up: Changed 7 years ago by vdelecroix

Replying to gagern:

Replying to vdelecroix:

review comment:

  • I would very much prefer that you avoid replace as well as the final step which deals with the top and bottom line. You should rather define all the characters needed as variables at the beginning (as you did for left_bracket, right_bracket, etc). That would help for readability and if at some point we decide that all these characters are arguments of the function.

Indeed when I started this patch, I had a line like

top_left_bracket, mid_left_bracket, … = u"⎡⎢⎣⎤⎥⎦"

but the long names made things very hard to read. And shorter alternatives, like tlb, make things a bit hard to understand and therefore harder to maintain. Quite the opposite of “help for readability”, in my opinion. But if you insist, I can go with this approach, using the short names.

If there are comments it is fine. For example

tlb = u"⎡"    # top left bracket
...

or

# we set delimiters as a string composed of 6 characters
#   - top left bracket (tlb)
#   ...
if unicode_symbols:
    delimiters = u"⎡⎢⎣⎤⎥⎦"
else:
    delimiters = "[[[]]]"
tlb, ... = delimiters

  • why not curly bracket?

Because I don't encounter them in my day-to-day work. Can you give me a hint as to where these might occur? For an even number of rows (except two), big curly brackets would look slightly unsymmetric, but I doubt that should be a concern.

By "curly", I meant only using ⎧ ⎫ ⎩ ⎭ instead of your more angular version. The term was probably wrong. See

sage: m = matrix([[2,1],[0,1]])
sage: view(m)

... no angle...

comment:11 in reply to: ↑ 10 Changed 7 years ago by gagern

Replying to vdelecroix:

If there are comments it is fine. For example …

OK, I can do that. It would be 11 characters: the six you used, plus two for single-row matrices, plus three line-drawing characters. The latter could be a separate string if you prefer. I'd make it one string because I'm lazy, but that's a weak argument.

By "curly", I meant only using ⎧ ⎫ ⎩ ⎭ instead of your more angular version.

Ah, round parentheses. Personally, I much prefer these round ones as well. I had the impression that in the USA especially, the square brackets are somewhat more common. But that impression (based mostly on Wikipedia) may be wrong, and in any case the comparison with the LaTeX rendering is a strong motivation to make things consistent and satisfy my personal preference at the same time. We might of course let the user choose between these alternatives, by allowing a string like "round" or "square" instead of True for the keyword argument.
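One possible shape for the 11-character string and the "round"/"square" choice is sketched below. Everything here is an assumption made for illustration: the names DELIMITERS, NAMES and pick_delimiters, the order of the characters, the choice of round as the meaning of True, and the particular line-drawing characters.

```python
# Each string holds 11 characters: six bracket pieces (top, middle and bottom
# for the left side, then the same for the right side), two brackets for
# single-row matrices, and three line-drawing characters (vertical,
# horizontal, crossing).
DELIMITERS = {
    "ascii": "[[[]]][]|-+",
    "square": "⎡⎢⎣⎤⎥⎦[]│─┼",
    "round": "⎛⎜⎝⎞⎟⎠()│─┼",
}
NAMES = "tlb mlb blb trb mrb brb slb srb vline hline cross".split()


def pick_delimiters(unicode_symbols=False):
    """Map the keyword value to a dict of named delimiter characters."""
    if unicode_symbols is False:
        style = "ascii"
    elif unicode_symbols is True:
        style = "round"  # assumed default shape for this sketch
    elif unicode_symbols in ("round", "square"):
        style = unicode_symbols
    else:
        raise ValueError("expected a bool, 'round' or 'square'")
    return dict(zip(NAMES, DELIMITERS[style]))


d = pick_delimiters("square")
print(d["tlb"], d["mlb"], d["blb"])
```

Keeping all characters in one string per style makes it cheap to add another style later, at the cost of relying on character order.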

comment:12 follow-up: Changed 6 years ago by vbraun

I've made #18357: Unicode Art.

IMHO it's

  • British: round bracket ()
  • American: parentheses ()
  • square bracket []
  • curly bracket {}

I also prefer round for matrices. Could you also change it to unicode=<bool> instead of unicode_symbols (shorter and clearer)?

comment:13 in reply to: ↑ 12 Changed 6 years ago by vbraun

Fixed:

  • British: round bracket ()
  • American: parentheses ()
  • square bracket []
  • curly brace {}

Last edited 6 years ago by vbraun

comment:14 Changed 6 years ago by git

  • Commit changed from e174830facfed57934dec14cf107ffd6952955b5 to e4e5f4be04b0124d6d599b937a3ad3fed0377fe3

Branch pushed to git repo; I updated commit sha1. New commits:

e4e5f4b Improved unicode delimiters for matrices, allow choosing the shape

comment:15 Changed 6 years ago by gagern

  • Description modified
  • Status changed from needs_work to needs_review

comment:16 Changed 6 years ago by git

  • Commit changed from e4e5f4be04b0124d6d599b937a3ad3fed0377fe3 to 206b0363fa23c6ae2642ee3b192195705fcef0d9

Branch pushed to git repo; I updated commit sha1. New commits:

206b036 Format link to trac ticket

comment:17 Changed 6 years ago by vbraun

  • Reviewers set to Volker Braun
  • Status changed from needs_review to positive_review

Looks great!

I would prefer unicodedata.lookup('RIGHT SQUARE BRACKET EXTENSION') over ⎥; pasting the Unicode character in makes it hard to review that the correct code point is used. I'll refactor that in #18357 if you don't mind.
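For reference, a small sketch of what name-based lookup could look like for the six square-bracket pieces. unicodedata.lookup is the standard-library call mentioned above; the list name SQUARE_PIECES and the variable square are made up for this example and are not taken from #18357.

```python
import unicodedata

# Unicode character names for the pieces of tall square brackets.
SQUARE_PIECES = [
    "LEFT SQUARE BRACKET UPPER CORNER",
    "LEFT SQUARE BRACKET EXTENSION",
    "LEFT SQUARE BRACKET LOWER CORNER",
    "RIGHT SQUARE BRACKET UPPER CORNER",
    "RIGHT SQUARE BRACKET EXTENSION",
    "RIGHT SQUARE BRACKET LOWER CORNER",
]

# lookup() raises KeyError for an unknown name, so a typo fails loudly
# instead of silently producing the wrong symbol.
square = "".join(unicodedata.lookup(name) for name in SQUARE_PIECES)
print(square)
```

The names are longer than the pasted characters, but a reviewer can check them against the Unicode charts without a hex dump.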

comment:18 Changed 6 years ago by vbraun

  • Status changed from positive_review to needs_work

PDF docs don't build:

! Package inputenc Error: Unicode char \u8:⎛ not set up for use with LaTeX.

See the inputenc package documentation for explanation.
Type  H <return>  for immediate help.
 ...                                              
                                                  
l.7663 \PYG{g+go}{⎛1 2\textbar{}3⎞}
                                       
? 
! Emergency stop.
 ...                                              
                                                  
l.7663 \PYG{g+go}{⎛1 2\textbar{}3⎞}

comment:19 Changed 6 years ago by vbraun

You just have to add a suitable DeclareUnicodeCharacter to src/doc/common/conf.py
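A hedged sketch of what such declarations could look like in a Sphinx configuration file. latex_elements with a preamble entry is a standard Sphinx setting, but whether Sage's conf.py uses it this way is an assumption, and the TeX replacements below are crude stand-ins invented for this example, not the ones from the eventual commit.

```python
# Hypothetical fallbacks: code point -> TeX replacement text.
FALLBACKS = {
    0x239B: r"\ensuremath{(}",  # left parenthesis upper hook
    0x239C: r"\ensuremath{|}",  # left parenthesis extension
    0x239D: r"\ensuremath{(}",  # left parenthesis lower hook
    0x239E: r"\ensuremath{)}",  # right parenthesis upper hook
    0x239F: r"\ensuremath{|}",  # right parenthesis extension
    0x23A0: r"\ensuremath{)}",  # right parenthesis lower hook
}


def unicode_declarations(fallbacks):
    """Build one \\DeclareUnicodeCharacter line per code point."""
    lines = []
    for codepoint in sorted(fallbacks):
        # inputenc expects the code point as hexadecimal digits.
        lines.append(
            r"\DeclareUnicodeCharacter{%04X}{%s}" % (codepoint, fallbacks[codepoint])
        )
    return "\n".join(lines)


# In conf.py the declarations would be added to the LaTeX preamble.
latex_elements = {"preamble": unicode_declarations(FALLBACKS)}
print(latex_elements["preamble"])
```

With declarations like these, inputenc no longer stops on the bracket pieces, although the PDF output only approximates the real shapes.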

comment:20 Changed 6 years ago by git

  • Commit changed from 206b0363fa23c6ae2642ee3b192195705fcef0d9 to 6b6f089590377c61061bc4629e44adb47da6a0e1

Branch pushed to git repo; I updated commit sha1. New commits:

6b6f089 TeX rendering for unicode big delimiters

comment:21 Changed 6 years ago by gagern

  • Status changed from needs_work to needs_review

This is my attempt to make things look reasonably useful, even though the vertical spacing and placement is far from optimal. But seeing how many other unicode symbols are represented with crude ASCII work-arounds, I think the amount of effort I put into this should be at least on a similar level.

I just filed #18370 about better unicode support by switching to XeTeX or LuaTeX. I have no experience with these, but I've read on several occasions that they offer far superior Unicode support. In this sense, I hope that my unicode symbol declarations will be a temporary workaround, although I have no idea just how temporary.

comment:22 Changed 6 years ago by vbraun

  • Status changed from needs_review to positive_review

Sounds good. The main point of running LaTeX on the docs is to validate the markup; nobody is going to print out the PDF.

comment:23 Changed 6 years ago by vbraun

  • Branch changed from u/gagern/MatrixUnicodeDelimiters to 6b6f089590377c61061bc4629e44adb47da6a0e1
  • Resolution set to fixed
  • Status changed from positive_review to closed