Ticket #13170 (closed enhancement: fixed)

Opened 11 months ago

Last modified 11 months ago

Speedup the default nonzero test for matrices

Reported by: SimonKing Owned by: tbd
Priority: major Milestone: sage-5.2
Component: performance Keywords:
Cc: malb Work issues:
Report Upstream: N/A Reviewers: Javier López Peña
Authors: Simon King Merged in: sage-5.2.beta1
Dependencies: Stopgaps:

Description

The default __nonzero__ method for matrices as defined in sage/matrix/matrix0.pyx is

    def __nonzero__(self):
        z = self._base_ring(0)
        cdef Py_ssize_t i, j
        for i from 0 <= i < self._nrows:
            for j from 0 <= j < self._ncols:
                if self.get_unsafe(i,j) != z:
                    return True
        return False

This default implementation is actually used, over some fields. But it turns out that the default could be considerably improved. Let's define three alternative methods, and compare with the default:

from sage.matrix.matrix0 cimport Matrix
from sage.structure.element cimport Element
def my_first_bool(Matrix M):
    cdef Py_ssize_t i, j
    for i from 0 <= i < M._nrows:
        for j from 0 <= j < M._ncols:
            if M.get_unsafe(i,j):
                return True
    return False
def my_second_bool(Matrix M):
    cdef Py_ssize_t i, j
    # avoid some overhead: zero_element may be faster than ...(0)
    cdef Element z = M._base_ring.zero_element()
    for i from 0 <= i < M._nrows:
        for j from 0 <= j < M._ncols:
            if M.get_unsafe(i,j)!=z:
                return True
    return False
def my_third_bool(Matrix M):
    cdef Py_ssize_t i, j
    cdef Element z = M._base_ring.zero_element()
    for i from 0 <= i < M._nrows:
        for j from 0 <= j < M._ncols:
            # Try to skip coercion
            if z._cmp_(M.get_unsafe(i,j)) is not 0:
                return True
    return False
def default_bool(Matrix M):
    cdef Py_ssize_t i, j
    z = M._base_ring(0)
    for i from 0 <= i < M._nrows:
        for j from 0 <= j < M._ncols:
            if M.get_unsafe(i,j)!=z:
                return True
    return False

GF(2)

A 5- to 10-fold speed-up is possible:

sage: K = GF(2)
sage: M = random_matrix(K,1000,1000)
sage: my_first_bool(M) == my_second_bool(M) == my_third_bool(M) == default_bool(M) == bool(M)
True
sage: %timeit my_first_bool(M)
625 loops, best of 3: 373 ns per loop
sage: %timeit my_second_bool(M)
625 loops, best of 3: 653 ns per loop
sage: %timeit my_third_bool(M)
625 loops, best of 3: 2.74 µs per loop
sage: %timeit default_bool(M)
625 loops, best of 3: 4.33 µs per loop
sage: %timeit bool(M)
625 loops, best of 3: 4.94 µs per loop
sage: M*=0
sage: my_first_bool(M) == my_second_bool(M) == my_third_bool(M) == default_bool(M) == bool(M)
True
sage: %timeit my_first_bool(M)
25 loops, best of 3: 16.7 ms per loop
sage: %timeit my_second_bool(M)
5 loops, best of 3: 73.2 ms per loop
sage: %timeit my_third_bool(M)
5 loops, best of 3: 365 ms per loop
sage: %timeit default_bool(M)
5 loops, best of 3: 75.9 ms per loop
sage: %timeit bool(M)
5 loops, best of 3: 73.2 ms per loop

GF(4)

A 10- to 20-fold speed-up is possible:

sage: K = GF(4,'z')
sage: M = random_matrix(K,1000,1000)
sage: my_first_bool(M) == my_second_bool(M) == my_third_bool(M) == default_bool(M) == bool(M)
True
sage: %timeit my_first_bool(M)
625 loops, best of 3: 499 ns per loop
sage: %timeit my_second_bool(M)
625 loops, best of 3: 1.51 µs per loop
sage: %timeit my_third_bool(M)
625 loops, best of 3: 2.36 µs per loop
sage: %timeit default_bool(M)
625 loops, best of 3: 9.59 µs per loop
sage: %timeit bool(M)
625 loops, best of 3: 10.3 µs per loop
sage: M*=0
sage: my_first_bool(M) == my_second_bool(M) == my_third_bool(M) == default_bool(M) == bool(M)
True
sage: %timeit my_first_bool(M)
25 loops, best of 3: 30.1 ms per loop
sage: %timeit my_second_bool(M)
5 loops, best of 3: 466 ms per loop
sage: %timeit my_third_bool(M)
5 loops, best of 3: 845 ms per loop
sage: %timeit default_bool(M)
5 loops, best of 3: 482 ms per loop
sage: %timeit bool(M)
5 loops, best of 3: 473 ms per loop

GF(3)

Sage already uses a specialised implementation of __nonzero__:

sage: K = GF(3)
sage: M = random_matrix(K,1000,1000)
sage: my_first_bool(M) == my_second_bool(M) == my_third_bool(M) == default_bool(M) == bool(M)
True
sage: %timeit default_bool(M)
625 loops, best of 3: 33.5 µs per loop
sage: %timeit bool(M)
625 loops, best of 3: 566 ns per loop
sage: M*=0
sage: %timeit default_bool(M)
# Interrupted after a few MINUTES!!
sage: %timeit bool(M)
125 loops, best of 3: 4.23 ms per loop

GF(25)

A 20- to 40-fold speed-up is possible:

sage: K = GF(25,'z')
sage: M = random_matrix(K,1000,1000)
sage: my_first_bool(M) == my_second_bool(M) == my_third_bool(M) == default_bool(M) == bool(M)
True
sage: %timeit my_first_bool(M)
625 loops, best of 3: 408 ns per loop
sage: %timeit my_second_bool(M)
625 loops, best of 3: 2.44 µs per loop
sage: %timeit my_third_bool(M)
625 loops, best of 3: 3.47 µs per loop
sage: %timeit default_bool(M)
625 loops, best of 3: 20.6 µs per loop
sage: %timeit bool(M)
625 loops, best of 3: 22.1 µs per loop
sage: M*=0
sage: my_first_bool(M) == my_second_bool(M) == my_third_bool(M) == default_bool(M) == bool(M)
True
sage: %timeit my_first_bool(M)
25 loops, best of 3: 16.8 ms per loop
sage: %timeit my_second_bool(M)
5 loops, best of 3: 456 ms per loop
sage: %timeit my_third_bool(M)
5 loops, best of 3: 819 ms per loop
sage: %timeit default_bool(M)
5 loops, best of 3: 446 ms per loop
sage: %timeit bool(M)
5 loops, best of 3: 454 ms per loop

Polynomial rings

Over more exotic base rings, a 6- to 40-fold speed-up is possible

sage: K.<x,y> = QQ[]
sage: M = random_matrix(K,100,100)
sage: my_first_bool(M) == my_second_bool(M) == my_third_bool(M) == default_bool(M) == bool(M)
True
sage: %timeit my_first_bool(M)
625 loops, best of 3: 374 ns per loop
sage: %timeit my_second_bool(M)
625 loops, best of 3: 1.03 µs per loop
sage: %timeit my_third_bool(M)
625 loops, best of 3: 758 ns per loop
sage: %timeit default_bool(M)
625 loops, best of 3: 14.8 µs per loop
sage: %timeit bool(M)
625 loops, best of 3: 15.4 µs per loop
sage: M*=0
sage: my_first_bool(M) == my_second_bool(M) == my_third_bool(M) == default_bool(M) == bool(M)
True
sage: %timeit my_first_bool(M)
625 loops, best of 3: 176 µs per loop
sage: %timeit my_second_bool(M)
625 loops, best of 3: 764 µs per loop
sage: %timeit my_third_bool(M)
125 loops, best of 3: 3.56 ms per loop
sage: %timeit default_bool(M)
625 loops, best of 3: 825 µs per loop
sage: %timeit bool(M)
625 loops, best of 3: 848 µs per loop

Conclusion

Of course, a custom __nonzero__ method using special C-functions of the matrix backend (as in the case of GF(3)) would be best. However, the above timings suggest to change the default implementation of __nonzero__ as I do in my patch.

Attachments

trac13170_generic_matrix_nonzero.patch Download (752 bytes) - added by SimonKing 11 months ago.

Change History

Changed 11 months ago by SimonKing

comment:1 Changed 11 months ago by SimonKing

  • Status changed from new to needs_review

With the patch, the timings become

sage: K = GF(2)
sage: M = random_matrix(K,1000,1000)
sage: %timeit bool(M)
625 loops, best of 3: 547 ns per loop
sage: M*=0
sage: %timeit bool(M)
25 loops, best of 3: 38 ms per loop
sage: K = GF(3)
sage: M = random_matrix(K,1000,1000)
sage: %timeit bool(M)
625 loops, best of 3: 464 ns per loop
sage: M*=0
sage: %timeit bool(M)
125 loops, best of 3: 3.61 ms per loop
sage: K = GF(4,'z')
sage: M = random_matrix(K,1000,1000)
sage: %timeit bool(M)
625 loops, best of 3: 665 ns per loop
sage: M*=0
sage: %timeit bool(M)
5 loops, best of 3: 67.9 ms per loop
sage: K = GF(25,'z')
sage: M = random_matrix(K,1000,1000)
sage: %timeit bool(M)
625 loops, best of 3: 504 ns per loop
sage: M*=0
sage: %timeit bool(M)
25 loops, best of 3: 38.6 ms per loop
sage: K.<x,y> = QQ[]
sage: M = random_matrix(K,100,100)
sage: %timeit bool(M)
625 loops, best of 3: 500 ns per loop
sage: M*=0
sage: %timeit bool(M)
625 loops, best of 3: 398 µs per loop

The speed-up is clear, though not as good as expected. Anyway: Needs review!

comment:2 Changed 11 months ago by jlopez

  • Status changed from needs_review to positive_review
  • Reviewers set to Javier López Peña

I made my own tests running the following code:

for i in [2, 3, 4, 5, 7, 9, 11, 16]:
    K = GF(i, "z")
    for j in [100,200,300,500,1000]:
        M = random_matrix(K, j, j)
        bool(M)
        M *= 0
        bool(M)

without the patch:

5 loops, best of 3: 3.68 s per loop

with the patch:

5 loops, best of 3: 2.29 s per loop

So there is some speedup. Code looks harmless and tests pass. Positive review!

comment:3 Changed 11 months ago by jdemeyer

  • Status changed from positive_review to closed
  • Resolution set to fixed
  • Merged in set to sage-5.2.beta1
Note: See TracTickets for help on using tickets.