Opened 6 years ago
Closed 6 years ago
#15036 closed defect (fixed)
Speed up matrix repr for matrices larger than 20
Reported by: | nbruin | Owned by: | |
---|---|---|---|
Priority: | major | Milestone: | sage-6.3 |
Component: | linear algebra | Keywords: | |
Cc: | Merged in: | ||
Authors: | Nils Bruin | Reviewers: | Travis Scrimshaw, Nathann Cohen |
Report Upstream: | N/A | Work issues: | |
Branch: | 3b0cc7a (Commits) | Commit: | 3b0cc7a1fc5c45effa3adf72c7e4411ea3a93740 |
Dependencies: | Stopgaps: |
Description
See this sage-devel discussion.
Sage has a very nice feature where it tries to not print all entries of a large matrix:
sage: A=matrix(20,20,[2]*400) sage: A 20 x 20 dense matrix over Integer Ring (type 'print A.str()' to see all of the entries)
But, as you see, Sage has to do some magic to know that A
refers to the matrix in question. Indeed, in other situations it won't find a useful name:
sage: B=A sage: A 20 x 20 dense matrix over Integer Ring (type 'print obj.str()' to see all of the entries)
The process of finding a name consists of trawling through all "globals" dictionaries on the call stack. This is an incredibly expensive operation which also breaks python's programming model: reverse lookups of bindings are not part of the language specification.
The main concern here is that string representations may not actually make it to an output stream: they may be part of a (perhaps poorly constructed) exception message that gets caught. Fixing the exception message may not be an option, since some of these happen internally in python. TO illustrate the dramatic effect, consider:
sage: def t(depth,N,n): ....: if depth: ....: return t(depth-1,N,n) ....: else: ....: G = matrix([[2 for u in range(N)] for v in range(N)]) ....: for i in range(n): ....: _=repr(G)
Trying it with a flat call stack is already showing that the repr of a 20x20 matrix is much more expensive to produce than for a 19x19 matrix (a factor 80):
sage: timeit('t(0,19,10)') 125 loops, best of 3: 2.14 ms per loop sage: timeit('t(0,20,10)') 5 loops, best of 3: 172 ms per loop
With a deep call stack this gets even more pronounced:
sage: sage: timeit('t(100,19,10)') 125 loops, best of 3: 2.22 ms per loop sage: timeit('t(100,20,10)') 5 loops, best of 3: 5.45 s per loop
Attachments (2)
Change History (40)
Changed 6 years ago by
comment:1 Changed 6 years ago by
Straightforward patch attached. Probably needs some doctest fixes.
comment:2 Changed 6 years ago by
- Status changed from new to needs_review
comment:3 Changed 6 years ago by
sage -t devel/sage/sage/schemes/elliptic_curves/heegner.py # 2 doctests failed sage -t devel/sage/sage/matrix/matrix2.pyx # 1 doctest failed sage -t devel/sage/sage/misc/sageinspect.py # 2 doctests failed sage -t devel/sage/sage/modular/modsym/ambient.py # 3 doctests failed sage -t devel/sage/sage/matrix/constructor.py # 1 doctest failed sage -t devel/sage/sage/rings/polynomial/multi_polynomial_sequence.py # 1 doctest failed sage -t devel/sage/sage/matrix/matrix_mod2e_dense.pyx # 1 doctest failed sage -t devel/sage/sage/rings/number_field/number_field_ideal.py # 1 doctest failed sage -t devel/sage/sage/matrix/matrix_mod2_dense.pyx # 1 doctest failed sage -t devel/sage/sage/modular/hecke/hecke_operator.py # 1 doctest failed sage -t devel/sage/sage/algebras/steenrod/steenrod_algebra_bases.py # 2 doctests failed sage -t devel/sage/sage/matrix/matrix0.pyx # 5 doctests failed sage -t devel/sage/sage/homology/chain_complex.py # 1 doctest failed sage -t devel/sage/sage/modules/free_module_morphism.py # 1 doctest failed sage -t devel/sage/sage/modular/modsym/relation_matrix.py # 3 doctests failed sage -t devel/sage/sage/modular/hecke/degenmap.py # 1 doctest failed
Most of the doctest failures outside of matrix0.pyx
are of the form
Expected: Vector space of degree 20 and dimension 20 over Rational Field Basis matrix: 20 x 20 dense matrix over Rational Field Got: Vector space of degree 20 and dimension 20 over Rational Field Basis matrix: 20 x 20 dense matrix over Rational Field (type '<matrix>.str()' to see all of the entries)
which suggests that not including a message about .str()
may be more appropriate.
A wild thought: If people feel saying something about .str()
is really necessary for output printed by the IPython REPL or notebook output, perhaps we should put in a hook on those to do a name lookup in globals
there. In that case, we'd still get
sage: A = <large matrix> sage: A * x * matrix over * (type 'print A.str()' to see all entries)
but this would happen because TOP LEVEL recognizes it's printing a string representation that may not fully reflect the value. Certainly, TOP LEVEL looking in globals() wouldn't be so bad, AND it wouldn't need to look in stack frames to find what globals dictionary is appropriate.
comment:4 Changed 6 years ago by
Second attachment regains interactive printing. With the notebook fix suggested on #14469, this should work in the notebook as well.
comment:5 Changed 6 years ago by
patchbot apply 15036-display_hook_fix.patch
Personally, I wouldn't mind the original patch either. This new patch does stay a little closer to the original output, though.
comment:6 Changed 6 years ago by
Only a few doctests to amend with the new patch:
sage -t devel/sage/sage/matrix/matrix_mod2e_dense.pyx # 1 doctest failed sage -t devel/sage/sage/matrix/matrix0.pyx # 3 doctests failed
If people are OK with the proposed fix, I'll upload a patch fixing the doctests.
comment:7 follow-up: ↓ 8 Changed 6 years ago by
My 2 cents.
- I like that large matrices do not get printed entirely without being requested explicitly.
- Even with a helpful message, I see students who wonder why they do not get the whole matrix. So I would rather not see the bare output
20 x 20 dense matrix over Rational Field
- I think it is less important to see the actual name of the matrix parroted in the output (the source of the inefficiency here). So something like the following would be sufficient in my eyes, if it was easier to code/maintain:
20 x 20 dense matrix over Rational Field (use the '.str()' method to see all of the entries)
Thanks for cleaning this up.
Rob
comment:8 in reply to: ↑ 7 ; follow-up: ↓ 9 Changed 6 years ago by
Thanks for the feedback.
UNPATCHED BEHAVIOUR:
sage: L=[matrix(20,20,[2]*400)] sage: L [20 x 20 dense matrix over Integer Ring] sage: A=L[0] sage: L [20 x 20 dense matrix over Integer Ring (type 'print A.str()' to see all of the entries)] sage: B=L[0] sage: L [20 x 20 dense matrix over Integer Ring (type 'print obj.str()' to see all of the entries)]
where you can see:
- No printing tip if no direct binding is found
- Printing of an explicit name if a unique direct binding is found
- Printing of a generic
obj
name designator if more than one direct binding was found.
This all with an excessively verbose:
sage: L=[matrix(20,20,[2]*400)]*10 sage: A=L[0] sage: L [20 x 20 dense matrix over Integer Ring (type 'print A.str()' to see all of the entries), 20 x 20 dense matrix over Integer Ring (type 'print A.str()' to see all of the entries), 20 x 20 dense matrix over Integer Ring (type 'print A.str()' to see all of the entries), 20 x 20 dense matrix over Integer Ring (type 'print A.str()' to see all of the entries), 20 x 20 dense matrix over Integer Ring (type 'print A.str()' to see all of the entries), 20 x 20 dense matrix over Integer Ring (type 'print A.str()' to see all of the entries), 20 x 20 dense matrix over Integer Ring (type 'print A.str()' to see all of the entries), 20 x 20 dense matrix over Integer Ring (type 'print A.str()' to see all of the entries), 20 x 20 dense matrix over Integer Ring (type 'print A.str()' to see all of the entries), 20 x 20 dense matrix over Integer Ring (type 'print A.str()' to see all of the entries)]
WITH THE PATCH:
sage: L=[matrix(20,20,[2]*400)] sage: L [20 x 20 dense matrix over Integer Ring] sage: A=L[0] sage: print A 20 x 20 dense matrix over Integer Ring sage: A 20 x 20 dense matrix over Integer Ring (type 'print A.str()' to see all of the entries) sage: L [20 x 20 dense matrix over Integer Ring] sage: L[0] 20 x 20 dense matrix over Integer Ring (type 'print A.str()' to see all of the entries) sage: B=L[0] sage: L [20 x 20 dense matrix over Integer Ring] sage: L[0] 20 x 20 dense matrix over Integer Ring (type 'print obj.str()' to see all of the entries)
Thus:
- If the value returned is not a matrix itself (but for instance, a list with matrices in it), no tip is printed. I think that's appropriate.
- Since the print hook of the interface is the routine that appends the message, a straight print statement never included the tip either
- If the value itself is a matrix then the tip is included according to the rules that applied before.
I think this is preferable to the old behaviour because
- we're losing the inefficiency in most string productions
- the tip is often annoying in list printing. That matrices are abbreviated in a list environment is understandable, I think. People will naturally try to look at the matrix itself if they are interested in the value.
- we're still getting the tip in most cases where it's likely to be useful.
We can implement Beezer's static suggestion in the displayhook, which will be more efficient (although that is not so important) and more appropriate. In that case we would get:
sage: L=[matrix(20,20,[2]*400)] sage: L [20 x 20 dense matrix over Integer Ring] sage: L[0] 20 x 20 dense matrix over Integer Ring (use the '.str()' method to see the entries) sage: A=L[0] sage: L[0] 20 x 20 dense matrix over Integer Ring (use the '.str()' method to see the entries) sage: print L[0] 20 x 20 dense matrix over Integer Ring
I think this is the best solution, because:
- It provides a useful tip consistently in a minimal case: when a matrix value is returned that gets printed in abbreviated form.
- It provides a tip that's always correct, regardless of what bindings exist (and indeed, what bindings exist is immaterial for the problem that the user is facing)
- We won't be digging around in the globals or the call stack without explicitly being asked to do so.
So should we go for that then?
Incidentally, I think .str()
is an awful name for this method. The fact that str(M)
and M.str()
do similar but different things is a horrible trap. Also, .str
and .__str__
are too similar to both exist but be different. I don't have a direct alternative in mind, nor the inclination to follow through on the horrible deprecation process that would be required to remove this wart, though.
comment:9 in reply to: ↑ 8 Changed 6 years ago by
Replying to nbruin:
So should we go for that then?
Thanks for the very careful examples. I had not dug that deep.
I agree 100% with 100% of what you said, including the str()
function/method mess. I will mostly be very glad to see the "obj." go away from the tip. Thanks again for wrestling with this one.
Rob
comment:10 Changed 6 years ago by
patchbot apply 15036-display_hook_fix.patch
comment:11 Changed 6 years ago by
- Status changed from needs_review to needs_work
one doctest is failing
comment:12 Changed 6 years ago by
- Milestone changed from sage-6.1 to sage-6.2
comment:13 Changed 6 years ago by
- Branch set to public/ticket/matrix_repr_speedup-15036
- Commit set to e852ce10285877db93309db156139a7bf3b8bc54
- Status changed from needs_work to needs_review
comment:14 follow-up: ↓ 15 Changed 6 years ago by
If someone could check my changes, then it's a positive review.
comment:15 in reply to: ↑ 14 Changed 6 years ago by
- Reviewers set to Travis Scrimshaw, Nathann Cohen
- Status changed from needs_review to positive_review
If someone could check my changes, then it's a positive review.
This change looks safe :-P
Nathann
comment:16 Changed 6 years ago by
I liked the unedited comment. :p
Thanks.
comment:17 follow-up: ↓ 18 Changed 6 years ago by
- Status changed from positive_review to needs_work
You didn't actually move the "find variable name" to the displayhook. Instead you just removed that functionality. Which I don't really mind, but then there is no reason to modify the displayhook. Just put it all into _repr_
.
comment:18 in reply to: ↑ 17 Changed 6 years ago by
- Status changed from needs_work to needs_review
Replying to vbraun:
You didn't actually move the "find variable name" to the displayhook. Instead you just removed that functionality. Which I don't really mind, but then there is no reason to modify the displayhook. Just put it all into
_repr_
.
There is. This was carefully argued, see comment 8. I'll repeat the (slightly modified) example here for your convenience:
sage: L=[matrix(20,20,[2]*400) for i in range(3)] sage: L [20 x 20 dense matrix over Integer Ring, 20 x 20 dense matrix over Integer Ring, 20 x 20 dense matrix over Integer Ring]
It's annoying and not useful to include the '.str()' message with each of these.
The rest of the examples gives an idea of the balance that is struck by handling the message in displayhook, which Rob and I decided on as the most desirable behaviour.
sage: L[0] 20 x 20 dense matrix over Integer Ring (use the '.str()' method to see the entries) sage: A=L[0] sage: L[0] 20 x 20 dense matrix over Integer Ring (use the '.str()' method to see the entries) sage: print L[0] 20 x 20 dense matrix over Integer Ring
Setting back to "needs review"--Volker, would you set it back to positive if this satisfies your concern? If you want the annoying message all the time, go for it, but you'll be changing a LOT more doctests. This behaviour turned out to be much closer to what we already had.
comment:19 follow-up: ↓ 20 Changed 6 years ago by
Fine, but I still think its fugly to hardcode part of the matrix output into the displayhook. If you want to achieve that, how about introducing a special method _repr_help()
, say, whose output will be appended in the displayhook if that object is the top-level output. That would give us an extensible mechanism and keep the declaration of the representation in the matrix module.
comment:20 in reply to: ↑ 19 Changed 6 years ago by
Replying to vbraun:
Fine, but I still think its fugly to hardcode part of the matrix output into the displayhook. If you want to achieve that, how about introducing a special method
_repr_help()
, say, whose output will be appended in the displayhook if that object is the top-level output. That would give us an extensible mechanism and keep the declaration of the representation in the matrix module.
I agree that refactoring is worth doing if we have more cases (and will be easy). Currently, the only objects this would appy to would be matrices and lists, and we can't reach into lists to implement the protocol there. For now we'd be designing and implementing a protocol to use once.
(Conceptually, it is DisplayHook
that should provide the hint: the object is simply giving a compact representation. It's not its task to provide help. The top-level interface is the right place to decide to provide help. Once we have more than one object requiring this, it starts making sense to implement this by letting the DisplayHook
ask the object for a "more helpful" string representation.)
comment:21 follow-up: ↓ 23 Changed 6 years ago by
I think you should remove the line
from sage.misc.sageinspect import sage_getvariablename
from the __repr__
method in matrix0.pyx
.
comment:22 Changed 6 years ago by
- Branch changed from public/ticket/matrix_repr_speedup-15036 to u/nbruin/ticket/15036
- Created changed from 08/11/13 20:26:57 to 08/11/13 20:26:57
- Modified changed from 02/08/14 17:48:02 to 02/08/14 17:48:02
comment:23 in reply to: ↑ 21 Changed 6 years ago by
- Commit changed from e852ce10285877db93309db156139a7bf3b8bc54 to fdb32821fa9382cb3bd468d1d0615f8f246470b0
Replying to jhpalmieri:
I think you should remove the [import]
Good point. Done.
New commits:
fdb3282 | trac #15036: remove redundant import statement
|
comment:24 Changed 6 years ago by
This was on the verge of getting a positive review a while ago. Any progress?
comment:25 Changed 6 years ago by
- Milestone changed from sage-6.2 to sage-6.3
comment:26 follow-up: ↓ 28 Changed 6 years ago by
- Status changed from needs_review to needs_work
With 6.3base0 (tested only matrix/
):
sage -t src/sage/matrix/matrix_mod2e_dense.pyx # 1 doctest failed sage -t src/sage/matrix/constructor.py # 1 doctest failed sage -t src/sage/matrix/matrix_mod2_dense.pyx # 1 doctest failed sage -t src/sage/matrix/matrix0.pyx # 4 doctests failed
comment:27 Changed 6 years ago by
- Commit changed from fdb32821fa9382cb3bd468d1d0615f8f246470b0 to d41816c57ede520353ed24ea9534c1219137d295
Branch pushed to git repo; I updated commit sha1. This was a forced push. New commits:
508e858 | Make matrix repr for large matrices not trawl the call stack and move that functionality to displayhook instead.
|
6edd34f | Fixed failing doctest.
|
0eb5514 | trac #15036: remove redundant import statement
|
d41816c | trac 15036: make some changes for rebase on 6.2
|
comment:28 in reply to: ↑ 26 Changed 6 years ago by
- Status changed from needs_work to needs_review
Replying to rws:
With 6.3base0 (tested only
matrix/
):sage -t src/sage/matrix/matrix_mod2e_dense.pyx # 1 doctest failed sage -t src/sage/matrix/constructor.py # 1 doctest failed sage -t src/sage/matrix/matrix_mod2_dense.pyx # 1 doctest failed sage -t src/sage/matrix/matrix0.pyx # 4 doctests failed
I did not experience those doctest failures, but I did experience some other problems. Hopefully newly rebased branch is OK. If not, please go at it, because fixing this merge is beyond me if this doesn't do the trick.
comment:29 Changed 6 years ago by
(new broken doctest in combinat/designs/latin_squares.py ^^;
)
comment:30 follow-up: ↓ 31 Changed 6 years ago by
- Branch changed from u/nbruin/ticket/15036 to public/ticket/matrix_repr_speedup-15036
- Commit changed from d41816c57ede520353ed24ea9534c1219137d295 to f2d8c363fc26c1dfff9a868714cbc22bfc4759dc
I've fixed the broken doctests that I found (matrix_space.py
and the combinat/designs/latin_squares.py
). If someone could double-check, I think it's positive review.
New commits:
30776eb | Make matrix repr for large matrices not trawl the call stack and move that functionality to displayhook instead.
|
e852ce1 | Fixed failing doctest.
|
fdb3282 | trac #15036: remove redundant import statement
|
a53d28f | Merge branch 'u/nbruin/ticket/15036' of trac.sagemath.org:sage into public/ticket/large_matrix_repr-15036
|
f2d8c36 | Fixed failing doctests.
|
comment:31 in reply to: ↑ 30 Changed 6 years ago by
Replying to tscrim:
I've fixed the broken doctests that I found (
matrix_space.py
and thecombinat/designs/latin_squares.py
). If someone could double-check, I think it's positive review.
I don't think you based your branch on the branch from comment 27. I had to move the
from sage.matrix.matrix import is_Matrix
to inside simple_format_obj
because otherwise Sage won't start for me: the IPython crash report has a long traceback, showing that somewhere along the line, that import statement fails because the name ZZ
isn't available (yet).
This problem only arises with interactive use (and I guess in doctests that test sage via an expect interface and hence running with IPython). It would be surprising, but not impossible, if you're not experiencing this problem (in which case import resolutions aren't deterministic), but the fact that the problematic situation can arise means we need to fix it. The relevant (two line) change is at the bottom of http://git.sagemath.org/sage.git/commit/?id=d41816c57ede520353ed24ea9534c1219137d295.
comment:32 Changed 6 years ago by
- Commit changed from f2d8c363fc26c1dfff9a868714cbc22bfc4759dc to f1b274282301f0b999969e77a620293235cedd8f
Branch pushed to git repo; I updated commit sha1. New commits:
f1b2742 | trac #15036: fix late import of is_Matrix to avoid startup problems
|
comment:33 Changed 6 years ago by
Hopefully this provides a more elegant approach to importing the is_Matrix
symbol at just the right time.
comment:34 follow-up: ↓ 36 Changed 6 years ago by
I had pulled the current branch that was on trac, but perhaps my merge of develop
overwrote that...strange... Does this not work with lazy_import
?
comment:35 Changed 6 years ago by
- Commit changed from f1b274282301f0b999969e77a620293235cedd8f to 3b0cc7a1fc5c45effa3adf72c7e4411ea3a93740
Branch pushed to git repo; I updated commit sha1. New commits:
3b0cc7a | Use lazy_import instead
|
comment:36 in reply to: ↑ 34 Changed 6 years ago by
Replying to tscrim:
Does this not work with
lazy_import
?
Good point. That's probably an easier solution (at least to write and understand) And it resolves completely on the first execution.
comment:37 Changed 6 years ago by
- Status changed from needs_review to positive_review
LGTM. Thanks Nils.
comment:38 Changed 6 years ago by
- Branch changed from public/ticket/matrix_repr_speedup-15036 to 3b0cc7a1fc5c45effa3adf72c7e4411ea3a93740
- Resolution set to fixed
- Status changed from positive_review to closed
Make matrix repr for large matrices not trawl the call stack