Opened 6 months ago
Closed 5 months ago
#32901 closed enhancement (fixed)
Cache _row_ambient_module of matrices
Reported by: | gh-kliem | Owned by: | |
---|---|---|---|
Priority: | critical | Milestone: | sage-9.5 |
Component: | linear algebra | Keywords: | |
Cc: | nbruin, mjo | Merged in: | |
Authors: | Jonathan Kliem | Reviewers: | Michael Orlitzky |
Report Upstream: | N/A | Work issues: | |
Branch: | e6188d0 (Commits, GitHub, GitLab) | Commit: | e6188d02dacc78f3a594cad5925cccd8513483b2 |
Dependencies: | Stopgaps: |
Description (last modified by )
... and some other optimizations in _row_ambient_module
and _column_ambient_module
:
Before:
sage: M = random_matrix(ZZ, 10) sage: %timeit M._row_ambient_module() 2.77 µs ± 11.5 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each) sage: %timeit M._row_ambient_module(QQ) 2.93 µs ± 14.8 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each) sage: %timeit M._column_ambient_module() 66.3 ns ± 0.0649 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each) sage: v = random_vector(ZZ, 10) sage: %timeit v*M 5.56 µs ± 30.6 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each) sage: %timeit M._clear_cache(); M._row_ambient_module() 2.61 µs ± 11.3 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each) sage: %timeit M._clear_cache(); M._row_ambient_module(QQ) 2.73 µs ± 10.8 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each) sage: %timeit M._clear_cache(); M._column_ambient_module() 1.5 µs ± 4.57 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each) sage: %timeit M._clear_cache(); v*M 5.26 µs ± 20.5 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
After:
sage: M = random_matrix(ZZ, 10) sage: %timeit M._row_ambient_module() 68.6 ns ± 0.0774 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each) sage: %timeit M._row_ambient_module(QQ) 1 µs ± 1.16 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each) sage: %timeit M._column_ambient_module() 64.9 ns ± 0.0636 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each) sage: v = random_vector(ZZ, 10) sage: %timeit v*M 1.81 µs ± 1.17 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each) sage: %timeit M._clear_cache(); M._row_ambient_module() 1.24 µs ± 1.04 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each) sage: %timeit M._clear_cache(); M._row_ambient_module(QQ) 2.57 µs ± 9.12 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each) sage: %timeit M._clear_cache(); M._column_ambient_module() 1.25 µs ± 4.43 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each) sage: %timeit M._clear_cache(); v*M 3.34 µs ± 13.7 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
Change History (23)
comment:1 Changed 6 months ago by
- Branch set to public/32901
- Commit set to 182ee47f0418a11bf9f1f22e6ad58eb4618d003f
- Status changed from new to needs_review
comment:2 Changed 6 months ago by
- Cc nbruin added
Given the following:
sage: M = random_matrix(ZZ, 10) sage: p = M._row_ambient_module() sage: %timeit p.zero_vector() 703 ns ± 1.65 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
I would say that the ticket is a good start. Using cached_method
decorator does not give an improvement (but a slight slowdown).
comment:3 Changed 6 months ago by
- Commit changed from 182ee47f0418a11bf9f1f22e6ad58eb4618d003f to 52efd67e51b06d72586ab508ba2d6da70c0d0218
Branch pushed to git repo; I updated commit sha1. New commits:
52efd67 | ome further optimization of the fast path in _row_ambient_module
|
comment:4 Changed 6 months ago by
This change reduces the penalty for having an optional base_ring
argument a little further: I'm now seeing 109 ns vs. 101 ns for row vs. column. So the price of the optional argument, together with the testing for it, is still measurable, but is now something like 6% rather than 50%.
comment:5 Changed 6 months ago by
- Description modified (diff)
comment:6 Changed 6 months ago by
_repr_
doesn't work unfortunatly. There are failing doctests.
comment:7 Changed 6 months ago by
- Description modified (diff)
With the changes that I'll push in a second, I get the following timings:
sage: M = random_matrix(ZZ, 10) sage: %timeit M._row_ambient_module(QQ) 1.08 µs ± 2.16 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each) sage: %timeit M._row_ambient_module().change_ring(QQ) 2.57 µs ± 7.9 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
So it is still a bit faster. Unfortunatly, __repr__
is lots slower than _repr_
.
comment:8 Changed 6 months ago by
- Commit changed from 52efd67e51b06d72586ab508ba2d6da70c0d0218 to ba5eecf89e6ba47db31265cfe1ea45194bb1b2e3
comment:9 follow-up: ↓ 10 Changed 6 months ago by
This is the __repr__
on SageObject
:
try: name = self.__custom_name if name is not None: return name except AttributeError: pass try: reprfunc = self._repr_ except AttributeError: return super().__repr__() result = reprfunc() if isinstance(result, str): return result # Allow _repr_ to return unicode on Python 2 return result.encode('utf-8')
I think our python 2 compatibility should be removed now. Furthermore, I think __repr__
should return a str
, not a bytes
(which is what encode
will return). So the fallback is now erroneous, I think. That means we should just return result
unconditionally and trust it produces a str
.
EDIT: these changes don't actually affect the speed very much. I assume __custom_name
is rare, so the usual code path at the moment incurs an AttributeError
, which is the expensive part of a try/except
. Rewriting that with a hasattr(self,"__custom_name")
instead didn't really affect the speed much either.
comment:10 in reply to: ↑ 9 Changed 6 months ago by
Replying to nbruin:
This is the
__repr__
onSageObject
:try: name = self.__custom_name if name is not None: return name except AttributeError: pass try: reprfunc = self._repr_ except AttributeError: return super().__repr__() result = reprfunc() if isinstance(result, str): return result # Allow _repr_ to return unicode on Python 2 return result.encode('utf-8')I think our python 2 compatibility should be removed now. Furthermore, I think
__repr__
should return astr
, not abytes
(which is whatencode
will return). So the fallback is now erroneous, I think. That means we should just returnresult
unconditionally and trust it produces astr
. EDIT: these changes don't actually affect the speed very much. I assume__custom_name
is rare, so the usual code path at the moment incurs anAttributeError
, which is the expensive part of atry/except
. Rewriting that with ahasattr(self,"__custom_name")
instead didn't really affect the speed much either.
Yes, unfortunatly this choice of self.__custom_name
is slow.
However, I don't think doing those changes should be part of this ticket.
comment:11 Changed 6 months ago by
- Cc mjo added
Please leave some comments in the code mentioning how performance-critical it is, and that the unusual logic speeds things up. Otherwise someone is going to see the duplication introduced in Nils's commit and try to change it back the way it was.
comment:12 Changed 6 months ago by
- Commit changed from ba5eecf89e6ba47db31265cfe1ea45194bb1b2e3 to 313c645c723a9a1d9a15c6d2fb571de3108b8f4e
comment:13 Changed 6 months ago by
So long as we're shaving off nanoseconds, you can probably replace self.ncols()
with self._ncols
, self.base_ring()
with self._base_ring
, and self.is_sparse()
with self.is_sparse_c()
to help the compiler out if it hasn't already optimized out the intermediate methods.
comment:14 follow-up: ↓ 15 Changed 6 months ago by
I could do that, but it woudn't make difference for the cached version.
comment:15 in reply to: ↑ 14 Changed 6 months ago by
Replying to gh-kliem:
I could do that, but it woudn't make difference for the cached version.
Sure, but it doesn't hurt the cached path, either. Since this is critical for vector-matrix multiplication, I was thinking of all the random tests that I have that generate a random matrix, use it once, and then throw it out. The difference is small but measurable.
Before:
sage: M = matrix.random(ZZ,10) sage: %timeit M._clear_cache(); M._row_ambient_module() 5.16 µs ± 12.9 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each) sage: %timeit M._clear_cache(); M._row_ambient_module() 5.18 µs ± 10.8 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each) sage: %timeit M._clear_cache(); M._row_ambient_module() 5.17 µs ± 9.82 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each) sage: %timeit M._clear_cache(); M._row_ambient_module() 5.17 µs ± 10.7 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each) sage: %timeit M._clear_cache(); M._row_ambient_module() 5.1 µs ± 22.8 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
After:
sage: M = matrix.random(ZZ,10) sage: %timeit M._clear_cache(); M._row_ambient_module() 4.93 µs ± 18 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each) sage: %timeit M._clear_cache(); M._row_ambient_module() 4.85 µs ± 15 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each) sage: %timeit M._clear_cache(); M._row_ambient_module() 4.86 µs ± 19.4 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each) sage: %timeit M._clear_cache(); M._row_ambient_module() 4.93 µs ± 5.97 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each) sage: %timeit M._clear_cache(); M._row_ambient_module() 4.89 µs ± 13.5 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
comment:16 follow-up: ↓ 17 Changed 6 months ago by
While you're at it, check if the same changes can be made to _column_ambient_module.
comment:17 in reply to: ↑ 16 Changed 6 months ago by
Replying to nbruin:
While you're at it, check if the same changes can be made to _column_ambient_module.
Sure. Before:
sage: M = matrix.random(ZZ,10) sage: %timeit M._clear_cache(); M._column_ambient_module() 5.02 µs ± 12.9 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each) sage: %timeit M._clear_cache(); M._column_ambient_module() 5.01 µs ± 17.3 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each) sage: %timeit M._clear_cache(); M._column_ambient_module() 5.02 µs ± 9.4 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each) sage: %timeit M._clear_cache(); M._column_ambient_module() 5.01 µs ± 9.14 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each) sage: %timeit M._clear_cache(); M._column_ambient_module() 5.02 µs ± 12.9 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
And after:
sage: M = matrix.random(ZZ,10) sage: %timeit M._clear_cache(); M._column_ambient_module() 4.36 µs ± 17.8 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each) sage: %timeit M._clear_cache(); M._column_ambient_module() 4.41 µs ± 14.5 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each) sage: %timeit M._clear_cache(); M._column_ambient_module() 4.41 µs ± 25.1 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each) sage: %timeit M._clear_cache(); M._column_ambient_module() 4.41 µs ± 19.1 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each) sage: %timeit M._clear_cache(); M._column_ambient_module() 4.41 µs ± 32.7 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
comment:18 Changed 6 months ago by
The current state:
sage: set_random_seed(0) sage: M = matrix.random(ZZ, 10) sage: v = random_vector(ZZ, 10) sage: %timeit M._clear_cache(); v*M 3.69 µs ± 12.6 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
With optimization:
sage: set_random_seed(0) sage: M = matrix.random(ZZ, 10) sage: v = random_vector(ZZ, 10) sage: %timeit M._clear_cache(); v*M 3.54 µs ± 28.6 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
So you are right. It is noticable.
comment:19 Changed 6 months ago by
- Commit changed from 313c645c723a9a1d9a15c6d2fb571de3108b8f4e to e6188d02dacc78f3a594cad5925cccd8513483b2
Branch pushed to git repo; I updated commit sha1. New commits:
e6188d0 | optimization when obtaining FreeModule
|
comment:20 Changed 6 months ago by
- Description modified (diff)
comment:21 Changed 6 months ago by
- Reviewers set to Michael Orlitzky
- Status changed from needs_review to positive_review
LGTM. Nils, you may want to add yourself as an author?
comment:22 Changed 5 months ago by
Is it really necessary to cache the ambient module when the base_ring
argument is not the underlying base ring of the matrix? Going through the repr
does not look like a reasonable option anyway. Discussion to be continued at #32984.
comment:23 Changed 5 months ago by
- Branch changed from public/32901 to e6188d02dacc78f3a594cad5925cccd8513483b2
- Resolution set to fixed
- Status changed from positive_review to closed
It's a tiny change with a huge perfomance impact on
_vector_times_matrix_
, hence I marked it as critical.New commits:
cache row_ambient_module of matrices