Opened 6 months ago

Closed 5 months ago

#32901 closed enhancement (fixed)

Cache _row_ambient_module of matrices

Reported by: gh-kliem Owned by:
Priority: critical Milestone: sage-9.5
Component: linear algebra Keywords:
Cc: nbruin, mjo Merged in:
Authors: Jonathan Kliem Reviewers: Michael Orlitzky
Report Upstream: N/A Work issues:
Branch: e6188d0 (Commits, GitHub, GitLab) Commit: e6188d02dacc78f3a594cad5925cccd8513483b2
Dependencies: Stopgaps:

Status badges

Description (last modified by gh-kliem)

... and some other optimizations in _row_ambient_module and _column_ambient_module:

Before:

sage: M = random_matrix(ZZ, 10)
sage: %timeit M._row_ambient_module()
2.77 µs ± 11.5 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
sage: %timeit M._row_ambient_module(QQ)
2.93 µs ± 14.8 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
sage: %timeit M._column_ambient_module()
66.3 ns ± 0.0649 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)
sage: v = random_vector(ZZ, 10)
sage: %timeit v*M
5.56 µs ± 30.6 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
sage: %timeit M._clear_cache(); M._row_ambient_module()
2.61 µs ± 11.3 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
sage: %timeit M._clear_cache(); M._row_ambient_module(QQ)
2.73 µs ± 10.8 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
sage: %timeit M._clear_cache(); M._column_ambient_module()
1.5 µs ± 4.57 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
sage: %timeit M._clear_cache(); v*M
5.26 µs ± 20.5 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)

After:

sage: M = random_matrix(ZZ, 10)
sage: %timeit M._row_ambient_module()
68.6 ns ± 0.0774 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)
sage: %timeit M._row_ambient_module(QQ)
1 µs ± 1.16 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
sage: %timeit M._column_ambient_module()
64.9 ns ± 0.0636 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)
sage: v = random_vector(ZZ, 10)
sage: %timeit v*M
1.81 µs ± 1.17 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
sage: %timeit M._clear_cache(); M._row_ambient_module()
1.24 µs ± 1.04 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
sage: %timeit M._clear_cache(); M._row_ambient_module(QQ)
2.57 µs ± 9.12 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
sage: %timeit M._clear_cache(); M._column_ambient_module()
1.25 µs ± 4.43 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
sage: %timeit M._clear_cache(); v*M
3.34 µs ± 13.7 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)

Change History (23)

comment:1 Changed 6 months ago by gh-kliem

  • Branch set to public/32901
  • Commit set to 182ee47f0418a11bf9f1f22e6ad58eb4618d003f
  • Status changed from new to needs_review

It's a tiny change with a huge perfomance impact on _vector_times_matrix_, hence I marked it as critical.


New commits:

182ee47cache row_ambient_module of matrices

comment:2 Changed 6 months ago by gh-kliem

  • Cc nbruin added

Given the following:

sage: M = random_matrix(ZZ, 10)
sage: p = M._row_ambient_module()
sage: %timeit p.zero_vector()
703 ns ± 1.65 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)

I would say that the ticket is a good start. Using cached_method decorator does not give an improvement (but a slight slowdown).

comment:3 Changed 6 months ago by git

  • Commit changed from 182ee47f0418a11bf9f1f22e6ad58eb4618d003f to 52efd67e51b06d72586ab508ba2d6da70c0d0218

Branch pushed to git repo; I updated commit sha1. New commits:

52efd67ome further optimization of the fast path in _row_ambient_module

comment:4 Changed 6 months ago by nbruin

This change reduces the penalty for having an optional base_ring argument a little further: I'm now seeing 109 ns vs. 101 ns for row vs. column. So the price of the optional argument, together with the testing for it, is still measurable, but is now something like 6% rather than 50%.

comment:5 Changed 6 months ago by gh-kliem

  • Description modified (diff)

comment:6 Changed 6 months ago by gh-kliem

_repr_ doesn't work unfortunatly. There are failing doctests.

comment:7 Changed 6 months ago by gh-kliem

  • Description modified (diff)

With the changes that I'll push in a second, I get the following timings:

sage: M = random_matrix(ZZ, 10)
sage: %timeit M._row_ambient_module(QQ)
1.08 µs ± 2.16 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
sage: %timeit M._row_ambient_module().change_ring(QQ)
2.57 µs ± 7.9 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)

So it is still a bit faster. Unfortunatly, __repr__ is lots slower than _repr_.

comment:8 Changed 6 months ago by git

  • Commit changed from 52efd67e51b06d72586ab508ba2d6da70c0d0218 to ba5eecf89e6ba47db31265cfe1ea45194bb1b2e3

Branch pushed to git repo; I updated commit sha1. New commits:

06236ebcythonize _row_ambient_moudle and _column_ambient_module
2dd33b9Merge branch 'public/32901' of git://trac.sagemath.org/sage into public/32901
ba5eecfuse __repr__ as _repr_ isnt always defined

comment:9 follow-up: Changed 6 months ago by nbruin

This is the __repr__ on SageObject:

        try:
            name = self.__custom_name
            if name is not None:
                return name
        except AttributeError:
            pass
        try:
            reprfunc = self._repr_
        except AttributeError:
            return super().__repr__()
        result = reprfunc()
        if isinstance(result, str):
            return result
        # Allow _repr_ to return unicode on Python 2
        return result.encode('utf-8')

I think our python 2 compatibility should be removed now. Furthermore, I think __repr__ should return a str, not a bytes (which is what encode will return). So the fallback is now erroneous, I think. That means we should just return result unconditionally and trust it produces a str. EDIT: these changes don't actually affect the speed very much. I assume __custom_name is rare, so the usual code path at the moment incurs an AttributeError, which is the expensive part of a try/except. Rewriting that with a hasattr(self,"__custom_name") instead didn't really affect the speed much either.

Last edited 6 months ago by nbruin (previous) (diff)

comment:10 in reply to: ↑ 9 Changed 6 months ago by gh-kliem

Replying to nbruin:

This is the __repr__ on SageObject:

        try:
            name = self.__custom_name
            if name is not None:
                return name
        except AttributeError:
            pass
        try:
            reprfunc = self._repr_
        except AttributeError:
            return super().__repr__()
        result = reprfunc()
        if isinstance(result, str):
            return result
        # Allow _repr_ to return unicode on Python 2
        return result.encode('utf-8')

I think our python 2 compatibility should be removed now. Furthermore, I think __repr__ should return a str, not a bytes (which is what encode will return). So the fallback is now erroneous, I think. That means we should just return result unconditionally and trust it produces a str. EDIT: these changes don't actually affect the speed very much. I assume __custom_name is rare, so the usual code path at the moment incurs an AttributeError, which is the expensive part of a try/except. Rewriting that with a hasattr(self,"__custom_name") instead didn't really affect the speed much either.

Yes, unfortunatly this choice of self.__custom_name is slow.

However, I don't think doing those changes should be part of this ticket.

comment:11 Changed 6 months ago by mjo

  • Cc mjo added

Please leave some comments in the code mentioning how performance-critical it is, and that the unusual logic speeds things up. Otherwise someone is going to see the duplication introduced in Nils's commit and try to change it back the way it was.

comment:12 Changed 6 months ago by git

  • Commit changed from ba5eecf89e6ba47db31265cfe1ea45194bb1b2e3 to 313c645c723a9a1d9a15c6d2fb571de3108b8f4e

Branch pushed to git repo; I updated commit sha1. New commits:

a356ff1remark about code duplication
313c645documentation

comment:13 Changed 6 months ago by mjo

So long as we're shaving off nanoseconds, you can probably replace self.ncols() with self._ncols, self.base_ring() with self._base_ring, and self.is_sparse() with self.is_sparse_c() to help the compiler out if it hasn't already optimized out the intermediate methods.

comment:14 follow-up: Changed 6 months ago by gh-kliem

I could do that, but it woudn't make difference for the cached version.

comment:15 in reply to: ↑ 14 Changed 6 months ago by mjo

Replying to gh-kliem:

I could do that, but it woudn't make difference for the cached version.

Sure, but it doesn't hurt the cached path, either. Since this is critical for vector-matrix multiplication, I was thinking of all the random tests that I have that generate a random matrix, use it once, and then throw it out. The difference is small but measurable.

Before:

sage: M = matrix.random(ZZ,10)
sage: %timeit M._clear_cache(); M._row_ambient_module()
5.16 µs ± 12.9 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
sage: %timeit M._clear_cache(); M._row_ambient_module()
5.18 µs ± 10.8 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
sage: %timeit M._clear_cache(); M._row_ambient_module()
5.17 µs ± 9.82 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
sage: %timeit M._clear_cache(); M._row_ambient_module()
5.17 µs ± 10.7 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
sage: %timeit M._clear_cache(); M._row_ambient_module()
5.1 µs ± 22.8 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)

After:

sage: M = matrix.random(ZZ,10)
sage: %timeit M._clear_cache(); M._row_ambient_module()
4.93 µs ± 18 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
sage: %timeit M._clear_cache(); M._row_ambient_module()
4.85 µs ± 15 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
sage: %timeit M._clear_cache(); M._row_ambient_module()
4.86 µs ± 19.4 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
sage: %timeit M._clear_cache(); M._row_ambient_module()
4.93 µs ± 5.97 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
sage: %timeit M._clear_cache(); M._row_ambient_module()
4.89 µs ± 13.5 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)

comment:16 follow-up: Changed 6 months ago by nbruin

While you're at it, check if the same changes can be made to _column_ambient_module.

comment:17 in reply to: ↑ 16 Changed 6 months ago by mjo

Replying to nbruin:

While you're at it, check if the same changes can be made to _column_ambient_module.

Sure. Before:

sage: M = matrix.random(ZZ,10)
sage: %timeit M._clear_cache(); M._column_ambient_module()
5.02 µs ± 12.9 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
sage: %timeit M._clear_cache(); M._column_ambient_module()
5.01 µs ± 17.3 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
sage: %timeit M._clear_cache(); M._column_ambient_module()
5.02 µs ± 9.4 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
sage: %timeit M._clear_cache(); M._column_ambient_module()
5.01 µs ± 9.14 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
sage: %timeit M._clear_cache(); M._column_ambient_module()
5.02 µs ± 12.9 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)

And after:

sage: M = matrix.random(ZZ,10)
sage: %timeit M._clear_cache(); M._column_ambient_module()
4.36 µs ± 17.8 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
sage: %timeit M._clear_cache(); M._column_ambient_module()
4.41 µs ± 14.5 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
sage: %timeit M._clear_cache(); M._column_ambient_module()
4.41 µs ± 25.1 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
sage: %timeit M._clear_cache(); M._column_ambient_module()
4.41 µs ± 19.1 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
sage: %timeit M._clear_cache(); M._column_ambient_module()
4.41 µs ± 32.7 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)

comment:18 Changed 6 months ago by gh-kliem

The current state:

sage: set_random_seed(0)
sage: M = matrix.random(ZZ, 10)
sage: v = random_vector(ZZ, 10)
sage: %timeit M._clear_cache(); v*M
3.69 µs ± 12.6 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)

With optimization:

sage: set_random_seed(0)
sage: M = matrix.random(ZZ, 10)
sage: v = random_vector(ZZ, 10)
sage: %timeit M._clear_cache(); v*M
3.54 µs ± 28.6 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)

So you are right. It is noticable.

comment:19 Changed 6 months ago by git

  • Commit changed from 313c645c723a9a1d9a15c6d2fb571de3108b8f4e to e6188d02dacc78f3a594cad5925cccd8513483b2

Branch pushed to git repo; I updated commit sha1. New commits:

e6188d0optimization when obtaining FreeModule

comment:20 Changed 6 months ago by gh-kliem

  • Description modified (diff)

comment:21 Changed 6 months ago by mjo

  • Reviewers set to Michael Orlitzky
  • Status changed from needs_review to positive_review

LGTM. Nils, you may want to add yourself as an author?

comment:22 Changed 5 months ago by vdelecroix

Is it really necessary to cache the ambient module when the base_ring argument is not the underlying base ring of the matrix? Going through the repr does not look like a reasonable option anyway. Discussion to be continued at #32984.

comment:23 Changed 5 months ago by vbraun

  • Branch changed from public/32901 to e6188d02dacc78f3a594cad5925cccd8513483b2
  • Resolution set to fixed
  • Status changed from positive_review to closed
Note: See TracTickets for help on using tickets.