Opened 6 years ago

Closed 5 years ago

#15036 closed defect (fixed)

Speed up matrix repr for matrices larger than 20

Reported by: nbruin Owned by:
Priority: major Milestone: sage-6.3
Component: linear algebra Keywords:
Cc: Merged in:
Authors: Nils Bruin Reviewers: Travis Scrimshaw, Nathann Cohen
Report Upstream: N/A Work issues:
Branch: 3b0cc7a (Commits) Commit: 3b0cc7a1fc5c45effa3adf72c7e4411ea3a93740
Dependencies: Stopgaps:

Description

See this sage-devel discussion.

Sage has a very nice feature where it tries to not print all entries of a large matrix:

sage: A=matrix(20,20,[2]*400)
sage: A
20 x 20 dense matrix over Integer Ring (type 'print A.str()' to see all of the entries)

But, as you see, Sage has to do some magic to know that A refers to the matrix in question. Indeed, in other situations it won't find a useful name:

sage: B=A
sage: A
20 x 20 dense matrix over Integer Ring (type 'print obj.str()' to see all of the entries)

The process of finding a name consists of trawling through all "globals" dictionaries on the call stack. This is an incredibly expensive operation which also breaks python's programming model: reverse lookups of bindings are not part of the language specification.

The main concern here is that string representations may not actually make it to an output stream: they may be part of a (perhaps poorly constructed) exception message that gets caught. Fixing the exception message may not be an option, since some of these happen internally in python. TO illustrate the dramatic effect, consider:

sage: def t(depth,N,n):
....:         if depth:
....:                 return t(depth-1,N,n)
....:         else:
....:                 G = matrix([[2 for u in range(N)] for v in range(N)])
....:                 for i in range(n):
....:                         _=repr(G)

Trying it with a flat call stack is already showing that the repr of a 20x20 matrix is much more expensive to produce than for a 19x19 matrix (a factor 80):

sage: timeit('t(0,19,10)')
125 loops, best of 3: 2.14 ms per loop
sage: timeit('t(0,20,10)')
5 loops, best of 3: 172 ms per loop

With a deep call stack this gets even more pronounced:

sage: sage: timeit('t(100,19,10)') 
125 loops, best of 3: 2.22 ms per loop
sage: timeit('t(100,20,10)') 
5 loops, best of 3: 5.45 s per loop

Attachments (2)

15036-matrix_repr_fix.patch (1.4 KB) - added by nbruin 6 years ago.
Make matrix repr for large matrices not trawl the call stack
15036-display_hook_fix.patch (10.4 KB) - added by nbruin 6 years ago.
Basic repr and a static message added in displayhook; doctest fixes.

Download all attachments as: .zip

Change History (40)

Changed 6 years ago by nbruin

Make matrix repr for large matrices not trawl the call stack

comment:1 Changed 6 years ago by nbruin

Straightforward patch attached. Probably needs some doctest fixes.

comment:2 Changed 6 years ago by nbruin

  • Status changed from new to needs_review

comment:3 Changed 6 years ago by nbruin

sage -t devel/sage/sage/schemes/elliptic_curves/heegner.py  # 2 doctests failed
sage -t devel/sage/sage/matrix/matrix2.pyx  # 1 doctest failed
sage -t devel/sage/sage/misc/sageinspect.py  # 2 doctests failed
sage -t devel/sage/sage/modular/modsym/ambient.py  # 3 doctests failed
sage -t devel/sage/sage/matrix/constructor.py  # 1 doctest failed
sage -t devel/sage/sage/rings/polynomial/multi_polynomial_sequence.py  # 1 doctest failed
sage -t devel/sage/sage/matrix/matrix_mod2e_dense.pyx  # 1 doctest failed
sage -t devel/sage/sage/rings/number_field/number_field_ideal.py  # 1 doctest failed
sage -t devel/sage/sage/matrix/matrix_mod2_dense.pyx  # 1 doctest failed
sage -t devel/sage/sage/modular/hecke/hecke_operator.py  # 1 doctest failed
sage -t devel/sage/sage/algebras/steenrod/steenrod_algebra_bases.py  # 2 doctests failed
sage -t devel/sage/sage/matrix/matrix0.pyx  # 5 doctests failed
sage -t devel/sage/sage/homology/chain_complex.py  # 1 doctest failed
sage -t devel/sage/sage/modules/free_module_morphism.py  # 1 doctest failed
sage -t devel/sage/sage/modular/modsym/relation_matrix.py  # 3 doctests failed
sage -t devel/sage/sage/modular/hecke/degenmap.py  # 1 doctest failed

Most of the doctest failures outside of matrix0.pyx are of the form

Expected:
    Vector space of degree 20 and dimension 20 over Rational Field
    Basis matrix:
    20 x 20 dense matrix over Rational Field
Got:
    Vector space of degree 20 and dimension 20 over Rational Field
    Basis matrix:
    20 x 20 dense matrix over Rational Field (type '<matrix>.str()' to see all of the entries)

which suggests that not including a message about .str() may be more appropriate.

A wild thought: If people feel saying something about .str() is really necessary for output printed by the IPython REPL or notebook output, perhaps we should put in a hook on those to do a name lookup in globals there. In that case, we'd still get

sage: A = <large matrix>
sage: A
* x * matrix over * (type 'print A.str()' to see all entries)

but this would happen because TOP LEVEL recognizes it's printing a string representation that may not fully reflect the value. Certainly, TOP LEVEL looking in globals() wouldn't be so bad, AND it wouldn't need to look in stack frames to find what globals dictionary is appropriate.

comment:4 Changed 6 years ago by nbruin

Second attachment regains interactive printing. With the notebook fix suggested on #14469, this should work in the notebook as well.

comment:5 Changed 6 years ago by nbruin

patchbot apply 15036-display_hook_fix.patch

Personally, I wouldn't mind the original patch either. This new patch does stay a little closer to the original output, though.

comment:6 Changed 6 years ago by nbruin

Only a few doctests to amend with the new patch:

sage -t devel/sage/sage/matrix/matrix_mod2e_dense.pyx  # 1 doctest failed
sage -t devel/sage/sage/matrix/matrix0.pyx  # 3 doctests failed

If people are OK with the proposed fix, I'll upload a patch fixing the doctests.

comment:7 follow-up: Changed 6 years ago by rbeezer

My 2 cents.

  1. I like that large matrices do not get printed entirely without being requested explicitly.
  2. Even with a helpful message, I see students who wonder why they do not get the whole matrix. So I would rather not see the bare output
    20 x 20 dense matrix over Rational Field
    
  3. I think it is less important to see the actual name of the matrix parroted in the output (the source of the inefficiency here). So something like the following would be sufficient in my eyes, if it was easier to code/maintain:
    20 x 20 dense matrix over Rational Field (use the '.str()' method to see all of the entries)
    

Thanks for cleaning this up.

Rob

comment:8 in reply to: ↑ 7 ; follow-up: Changed 6 years ago by nbruin

Thanks for the feedback.

UNPATCHED BEHAVIOUR:

sage: L=[matrix(20,20,[2]*400)]
sage: L
[20 x 20 dense matrix over Integer Ring]
sage: A=L[0]
sage: L
[20 x 20 dense matrix over Integer Ring (type 'print A.str()' to see all of the entries)]
sage: B=L[0]
sage: L
[20 x 20 dense matrix over Integer Ring (type 'print obj.str()' to see all of the entries)]

where you can see:

  • No printing tip if no direct binding is found
  • Printing of an explicit name if a unique direct binding is found
  • Printing of a generic obj name designator if more than one direct binding was found.

This all with an excessively verbose:

sage: L=[matrix(20,20,[2]*400)]*10
sage: A=L[0]
sage: L
[20 x 20 dense matrix over Integer Ring (type 'print A.str()' to see all of the entries),
 20 x 20 dense matrix over Integer Ring (type 'print A.str()' to see all of the entries),
 20 x 20 dense matrix over Integer Ring (type 'print A.str()' to see all of the entries),
 20 x 20 dense matrix over Integer Ring (type 'print A.str()' to see all of the entries),
 20 x 20 dense matrix over Integer Ring (type 'print A.str()' to see all of the entries),
 20 x 20 dense matrix over Integer Ring (type 'print A.str()' to see all of the entries),
 20 x 20 dense matrix over Integer Ring (type 'print A.str()' to see all of the entries),
 20 x 20 dense matrix over Integer Ring (type 'print A.str()' to see all of the entries),
 20 x 20 dense matrix over Integer Ring (type 'print A.str()' to see all of the entries),
 20 x 20 dense matrix over Integer Ring (type 'print A.str()' to see all of the entries)]

WITH THE PATCH:

sage: L=[matrix(20,20,[2]*400)]
sage: L
[20 x 20 dense matrix over Integer Ring]
sage: A=L[0]
sage: print A
20 x 20 dense matrix over Integer Ring
sage: A
20 x 20 dense matrix over Integer Ring (type 'print A.str()' to see all of the entries)
sage: L
[20 x 20 dense matrix over Integer Ring]
sage: L[0]
20 x 20 dense matrix over Integer Ring (type 'print A.str()' to see all of the entries)
sage: B=L[0]
sage: L
[20 x 20 dense matrix over Integer Ring]
sage: L[0]
20 x 20 dense matrix over Integer Ring (type 'print obj.str()' to see all of the entries)

Thus:

  • If the value returned is not a matrix itself (but for instance, a list with matrices in it), no tip is printed. I think that's appropriate.
  • Since the print hook of the interface is the routine that appends the message, a straight print statement never included the tip either
  • If the value itself is a matrix then the tip is included according to the rules that applied before.

I think this is preferable to the old behaviour because

  • we're losing the inefficiency in most string productions
  • the tip is often annoying in list printing. That matrices are abbreviated in a list environment is understandable, I think. People will naturally try to look at the matrix itself if they are interested in the value.
  • we're still getting the tip in most cases where it's likely to be useful.

We can implement Beezer's static suggestion in the displayhook, which will be more efficient (although that is not so important) and more appropriate. In that case we would get:

sage: L=[matrix(20,20,[2]*400)]
sage: L
[20 x 20 dense matrix over Integer Ring]
sage: L[0]
20 x 20 dense matrix over Integer Ring (use the '.str()' method to see the entries)
sage: A=L[0]
sage: L[0]
20 x 20 dense matrix over Integer Ring (use the '.str()' method to see the entries)
sage: print L[0]
20 x 20 dense matrix over Integer Ring

I think this is the best solution, because:

  • It provides a useful tip consistently in a minimal case: when a matrix value is returned that gets printed in abbreviated form.
  • It provides a tip that's always correct, regardless of what bindings exist (and indeed, what bindings exist is immaterial for the problem that the user is facing)
  • We won't be digging around in the globals or the call stack without explicitly being asked to do so.

So should we go for that then?

Incidentally, I think .str() is an awful name for this method. The fact that str(M) and M.str() do similar but different things is a horrible trap. Also, .str and .__str__ are too similar to both exist but be different. I don't have a direct alternative in mind, nor the inclination to follow through on the horrible deprecation process that would be required to remove this wart, though.

comment:9 in reply to: ↑ 8 Changed 6 years ago by rbeezer

Replying to nbruin:

So should we go for that then?

Thanks for the very careful examples. I had not dug that deep.

I agree 100% with 100% of what you said, including the str() function/method mess. I will mostly be very glad to see the "obj." go away from the tip. Thanks again for wrestling with this one.

Rob

Changed 6 years ago by nbruin

Basic repr and a static message added in displayhook; doctest fixes.

comment:10 Changed 6 years ago by nbruin

patchbot apply 15036-display_hook_fix.patch

comment:11 Changed 6 years ago by chapoton

  • Status changed from needs_review to needs_work

one doctest is failing

comment:12 Changed 6 years ago by vbraun_spam

  • Milestone changed from sage-6.1 to sage-6.2

comment:13 Changed 6 years ago by tscrim

  • Authors set to Nils Bruin
  • Branch set to public/ticket/matrix_repr_speedup-15036
  • Commit set to e852ce10285877db93309db156139a7bf3b8bc54
  • Status changed from needs_work to needs_review

I've moved it to a git branch, rebased, and fixed the failing doctest.


New commits:

30776ebMake matrix repr for large matrices not trawl the call stack and move that functionality to displayhook instead.
e852ce1Fixed failing doctest.

comment:14 follow-up: Changed 6 years ago by tscrim

If someone could check my changes, then it's a positive review.

comment:15 in reply to: ↑ 14 Changed 6 years ago by ncohen

  • Reviewers set to Travis Scrimshaw, Nathann Cohen
  • Status changed from needs_review to positive_review

If someone could check my changes, then it's a positive review.

This change looks safe :-P

Nathann

Last edited 6 years ago by ncohen (previous) (diff)

comment:16 Changed 6 years ago by tscrim

I liked the unedited comment. :p Thanks.

comment:17 follow-up: Changed 6 years ago by vbraun

  • Status changed from positive_review to needs_work

You didn't actually move the "find variable name" to the displayhook. Instead you just removed that functionality. Which I don't really mind, but then there is no reason to modify the displayhook. Just put it all into _repr_.

comment:18 in reply to: ↑ 17 Changed 6 years ago by nbruin

  • Status changed from needs_work to needs_review

Replying to vbraun:

You didn't actually move the "find variable name" to the displayhook. Instead you just removed that functionality. Which I don't really mind, but then there is no reason to modify the displayhook. Just put it all into _repr_.

There is. This was carefully argued, see comment 8. I'll repeat the (slightly modified) example here for your convenience:

sage: L=[matrix(20,20,[2]*400) for i in range(3)]
sage: L
[20 x 20 dense matrix over Integer Ring,
20 x 20 dense matrix over Integer Ring,
20 x 20 dense matrix over Integer Ring]

It's annoying and not useful to include the '.str()' message with each of these.

The rest of the examples gives an idea of the balance that is struck by handling the message in displayhook, which Rob and I decided on as the most desirable behaviour.

sage: L[0]
20 x 20 dense matrix over Integer Ring (use the '.str()' method to see the entries)
sage: A=L[0]
sage: L[0]
20 x 20 dense matrix over Integer Ring (use the '.str()' method to see the entries)
sage: print L[0]
20 x 20 dense matrix over Integer Ring

Setting back to "needs review"--Volker, would you set it back to positive if this satisfies your concern? If you want the annoying message all the time, go for it, but you'll be changing a LOT more doctests. This behaviour turned out to be much closer to what we already had.

comment:19 follow-up: Changed 6 years ago by vbraun

Fine, but I still think its fugly to hardcode part of the matrix output into the displayhook. If you want to achieve that, how about introducing a special method _repr_help(), say, whose output will be appended in the displayhook if that object is the top-level output. That would give us an extensible mechanism and keep the declaration of the representation in the matrix module.

comment:20 in reply to: ↑ 19 Changed 6 years ago by nbruin

Replying to vbraun:

Fine, but I still think its fugly to hardcode part of the matrix output into the displayhook. If you want to achieve that, how about introducing a special method _repr_help(), say, whose output will be appended in the displayhook if that object is the top-level output. That would give us an extensible mechanism and keep the declaration of the representation in the matrix module.

I agree that refactoring is worth doing if we have more cases (and will be easy). Currently, the only objects this would appy to would be matrices and lists, and we can't reach into lists to implement the protocol there. For now we'd be designing and implementing a protocol to use once.

(Conceptually, it is DisplayHook that should provide the hint: the object is simply giving a compact representation. It's not its task to provide help. The top-level interface is the right place to decide to provide help. Once we have more than one object requiring this, it starts making sense to implement this by letting the DisplayHook ask the object for a "more helpful" string representation.)

comment:21 follow-up: Changed 6 years ago by jhpalmieri

I think you should remove the line

from sage.misc.sageinspect import sage_getvariablename

from the __repr__ method in matrix0.pyx.

comment:22 Changed 6 years ago by nbruin

  • Branch changed from public/ticket/matrix_repr_speedup-15036 to u/nbruin/ticket/15036
  • Created changed from 08/11/13 20:26:57 to 08/11/13 20:26:57
  • Modified changed from 02/08/14 17:48:02 to 02/08/14 17:48:02

comment:23 in reply to: ↑ 21 Changed 6 years ago by nbruin

  • Commit changed from e852ce10285877db93309db156139a7bf3b8bc54 to fdb32821fa9382cb3bd468d1d0615f8f246470b0

Replying to jhpalmieri:

I think you should remove the [import]

Good point. Done.


New commits:

fdb3282trac #15036: remove redundant import statement

comment:24 Changed 5 years ago by jhpalmieri

This was on the verge of getting a positive review a while ago. Any progress?

comment:25 Changed 5 years ago by vbraun_spam

  • Milestone changed from sage-6.2 to sage-6.3

comment:26 follow-up: Changed 5 years ago by rws

  • Status changed from needs_review to needs_work

With 6.3base0 (tested only matrix/):

sage -t src/sage/matrix/matrix_mod2e_dense.pyx  # 1 doctest failed
sage -t src/sage/matrix/constructor.py  # 1 doctest failed
sage -t src/sage/matrix/matrix_mod2_dense.pyx  # 1 doctest failed
sage -t src/sage/matrix/matrix0.pyx  # 4 doctests failed

comment:27 Changed 5 years ago by git

  • Commit changed from fdb32821fa9382cb3bd468d1d0615f8f246470b0 to d41816c57ede520353ed24ea9534c1219137d295

Branch pushed to git repo; I updated commit sha1. This was a forced push. New commits:

508e858Make matrix repr for large matrices not trawl the call stack and move that functionality to displayhook instead.
6edd34fFixed failing doctest.
0eb5514trac #15036: remove redundant import statement
d41816ctrac 15036: make some changes for rebase on 6.2

comment:28 in reply to: ↑ 26 Changed 5 years ago by nbruin

  • Status changed from needs_work to needs_review

Replying to rws:

With 6.3base0 (tested only matrix/):

sage -t src/sage/matrix/matrix_mod2e_dense.pyx  # 1 doctest failed
sage -t src/sage/matrix/constructor.py  # 1 doctest failed
sage -t src/sage/matrix/matrix_mod2_dense.pyx  # 1 doctest failed
sage -t src/sage/matrix/matrix0.pyx  # 4 doctests failed

I did not experience those doctest failures, but I did experience some other problems. Hopefully newly rebased branch is OK. If not, please go at it, because fixing this merge is beyond me if this doesn't do the trick.

comment:29 Changed 5 years ago by ncohen

(new broken doctest in combinat/designs/latin_squares.py ^^;)

comment:30 follow-up: Changed 5 years ago by tscrim

  • Branch changed from u/nbruin/ticket/15036 to public/ticket/matrix_repr_speedup-15036
  • Commit changed from d41816c57ede520353ed24ea9534c1219137d295 to f2d8c363fc26c1dfff9a868714cbc22bfc4759dc

I've fixed the broken doctests that I found (matrix_space.py and the combinat/designs/latin_squares.py). If someone could double-check, I think it's positive review.


New commits:

30776ebMake matrix repr for large matrices not trawl the call stack and move that functionality to displayhook instead.
e852ce1Fixed failing doctest.
fdb3282trac #15036: remove redundant import statement
a53d28fMerge branch 'u/nbruin/ticket/15036' of trac.sagemath.org:sage into public/ticket/large_matrix_repr-15036
f2d8c36Fixed failing doctests.

comment:31 in reply to: ↑ 30 Changed 5 years ago by nbruin

Replying to tscrim:

I've fixed the broken doctests that I found (matrix_space.py and the combinat/designs/latin_squares.py). If someone could double-check, I think it's positive review.

I don't think you based your branch on the branch from comment 27. I had to move the

from sage.matrix.matrix import is_Matrix

to inside simple_format_obj because otherwise Sage won't start for me: the IPython crash report has a long traceback, showing that somewhere along the line, that import statement fails because the name ZZ isn't available (yet).

This problem only arises with interactive use (and I guess in doctests that test sage via an expect interface and hence running with IPython). It would be surprising, but not impossible, if you're not experiencing this problem (in which case import resolutions aren't deterministic), but the fact that the problematic situation can arise means we need to fix it. The relevant (two line) change is at the bottom of http://git.sagemath.org/sage.git/commit/?id=d41816c57ede520353ed24ea9534c1219137d295.

Last edited 5 years ago by nbruin (previous) (diff)

comment:32 Changed 5 years ago by git

  • Commit changed from f2d8c363fc26c1dfff9a868714cbc22bfc4759dc to f1b274282301f0b999969e77a620293235cedd8f

Branch pushed to git repo; I updated commit sha1. New commits:

f1b2742trac #15036: fix late import of is_Matrix to avoid startup problems

comment:33 Changed 5 years ago by nbruin

Hopefully this provides a more elegant approach to importing the is_Matrix symbol at just the right time.

comment:34 follow-up: Changed 5 years ago by tscrim

I had pulled the current branch that was on trac, but perhaps my merge of develop overwrote that...strange... Does this not work with lazy_import?

comment:35 Changed 5 years ago by git

  • Commit changed from f1b274282301f0b999969e77a620293235cedd8f to 3b0cc7a1fc5c45effa3adf72c7e4411ea3a93740

Branch pushed to git repo; I updated commit sha1. New commits:

3b0cc7aUse lazy_import instead

comment:36 in reply to: ↑ 34 Changed 5 years ago by nbruin

Replying to tscrim:

Does this not work with lazy_import?

Good point. That's probably an easier solution (at least to write and understand) And it resolves completely on the first execution.

comment:37 Changed 5 years ago by tscrim

  • Status changed from needs_review to positive_review

LGTM. Thanks Nils.

comment:38 Changed 5 years ago by vbraun

  • Branch changed from public/ticket/matrix_repr_speedup-15036 to 3b0cc7a1fc5c45effa3adf72c7e4411ea3a93740
  • Resolution set to fixed
  • Status changed from positive_review to closed
Note: See TracTickets for help on using tickets.