Opened 5 years ago

Closed 5 years ago

# speed up generic polynomials

Reported by: mmezzarobba
Priority: major
Milestone: sage-7.6
Component: performance
Cc: cremona, nbruin, SimonKing
Authors: Marc Mezzarobba, Travis Scrimshaw
Reviewers: Travis Scrimshaw, Marc Mezzarobba
Report Upstream: N/A
Commit: b846e12

Some hopefully representative examples:

    QQi.<i> = QuadraticField(-1)  # assumed setup (not in the original snippet); i is QQi's generator
    Pol.<x> = QQi[]
    p = x + i + 1
    q = ((8*i - 12)*x^10 + (-i + 2)*x^9 + (-i + 1)*x^8 + (-i + 7)*x^7
         + (-i + 1)*x^6 + (-12*i + 2)*x^5 + (i + 1)*x^4 + (-7*i + 1)*x^3
         + (2*i + 1)*x^2 + x - i - 2)
    a = 42
    b = QQi(42)
    mat = matrix(QQi, [[1, i], [0, 2]])
    y = polygen(Pol, 'y')
    foo = y^2 + x + 1
    fl1 = 1.r  # Python float
    RR1 = 1.   # Sage real number
    for cmd in [
        'p*p',
        'p**5',
        'p._mul_trunc_(q, 3)',
        'p(0)',
        'q(0)',
        'p(a)',
        'p(b)',
        'q(b)',
        'q(fl1)',
        'q(RR1)',
        'q(a)',
        'q(mat)',
        'foo(x=1, y=2)',
    ]:
        timeit(cmd)

    sage: pol = MatrixSpace(ZZ, 1000, 1000)['x'].random_element(degree=10)
    sage: %timeit pol(42)

Results:

test                  old      new
--------------------------------------
p*p                   13.1 µs  8.14 µs
p**5                  53.7 µs  35.9 µs
p._mul_trunc_(q, 3)   17 µs    8.26 µs
p(0)                  3.49 µs  920 ns
q(0)                  3.5 µs   877 ns
p(a)                  6.72 µs  3.78 µs
p(b)                  6.68 µs  2.48 µs
q(b)                  13.8 µs  11.9 µs
q(fl1)                7.47 ms  8.24 ms
q(RR1)                7.48 ms  8.27 ms
q(a)                  16.8 µs  15.8 µs
q(mat)                290 µs   284 µs
foo(x=1, y=2)         64.1 µs  54 µs
--------------------------------------
pol(42)               208 ms   202 ms

(Evaluations at floating-point numbers are very slow and suffer from a small regression, due to the slowness of the generic coercion map through CLF.)

Edit: add up to ~0.5 µs to some of the above timings due to subsequent fixes.
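For context, most of the `p(...)`/`q(...)` timings above exercise generic polynomial evaluation, which for dense polynomials boils down to Horner's scheme. A minimal pure-Python sketch (illustration only, not Sage's actual implementation):

```python
def horner(coeffs, x):
    """Evaluate a polynomial given its coefficients in increasing degree.

    Horner's scheme: ((c_n * x + c_{n-1}) * x + ...) * x + c_0,
    using n multiplications and n additions for degree n.
    """
    result = 0
    for c in reversed(coeffs):
        result = result * x + c
    return result

horner([2, 3, 1], 5)  # x^2 + 3x + 2 at x=5 -> 42
```

Every operation here dispatches through the coefficient ring's arithmetic, which is why the cost of evaluation depends so strongly on the base ring (compare `q(a)` with `q(mat)` and `q(RR1)` above).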

### comment:1 Changed 5 years ago by mmezzarobba

• Cc cremona nbruin SimonKing added
• Status changed from new to needs_review

### comment:2 Changed 5 years ago by git

• Commit changed from 9b7eec254e31c45e80cf0e81b9192e9152ccdf18 to 1fb9d2737f5a9e454f25bf80a1eeef7b4825ccfd

Branch pushed to git repo; I updated commit sha1. New commits:

    0678134  generic Polynomials: [] -> get_unsafe() when possible
    1fb9d27  speed up generic Polynomial.__call__()

### comment:3 Changed 5 years ago by mmezzarobba

• Description modified (diff)
• Summary changed from speed up generic polynomials a little to speed up generic polynomials

The tests in rings/ pass; I haven't yet run the whole test suite.

### comment:4 Changed 5 years ago by git

• Commit changed from 1fb9d2737f5a9e454f25bf80a1eeef7b4825ccfd to 7488953070e2ee0f503c4b5a3258bc12492c9ee8

Branch pushed to git repo; I updated commit sha1. This was a forced push. New commits:

    7488953  speed up generic Polynomial.__call__()

### comment:5 follow-up: ↓ 6 Changed 5 years ago by tscrim

Looks good overall. From a quick look through the code, it seems like the do_* methods always take lists as input. In that case, should we make list a cpdef method that guarantees a list is returned? That might get us some extra speed as well.

### comment:6 in reply to: ↑ 5 Changed 5 years ago by mmezzarobba

> Looks good overall. From a quick look through the code, it seems like the do_* methods always take lists as input. In that case, should we make list a cpdef method that guarantees a list is returned? That might get us some extra speed as well.

Thanks for the suggestion. Adding type annotations to do_* and brutally replacing def list(...) by cdef list list(...) everywhere actually seems to make things slower, though the difference is small. I didn't try to investigate any further. And I must confess that it is not at all clear to me what implications those cdefs have given the relatively complex hierarchy of subclasses that override list() (sometimes from Cython, sometimes from Python, sometimes with additional arguments).

### comment:7 follow-up: ↓ 8 Changed 5 years ago by tscrim

You might be losing some speed from variables not being cdef'ed as lists. Did you happen to save your current branch? I can play around with it later tonight to see if I can get some extra speed out of it.

### comment:8 in reply to: ↑ 7 Changed 5 years ago by mmezzarobba

I noticed a small problem with the last commit: it would cause a terrible regression in cases like polynomials over large matrix spaces evaluated at scalars. I'll push a fix in a minute.

> You might be losing some speed from variables not being cdef'ed as lists. Did you happen to save your current branch?

Yes, see u/mmezzarobba/speed_up_generic_polynomials-alt.

### comment:9 Changed 5 years ago by git

• Commit changed from 7488953070e2ee0f503c4b5a3258bc12492c9ee8 to 25f884d48ae7392c908231dfcaa1931b66cbe900

Branch pushed to git repo; I updated commit sha1. This was a forced push. New commits:

    25f884d  speed up generic Polynomial.__call__()

### comment:10 Changed 5 years ago by mmezzarobba

• Description modified (diff)

### comment:11 Changed 5 years ago by tscrim

• Branch changed from u/mmezzarobba/speed_up_generic_polynomials to u/tscrim/speed_up_generic_polynomials-22152
• Commit changed from 25f884d48ae7392c908231dfcaa1931b66cbe900 to a75321719bd7de6637b740bf00c6f266e7acfda7
• Milestone changed from sage-7.5 to sage-7.6
• Reviewers set to Travis Scrimshaw

I went through and made all of the def list(self) methods into cpdef list list(self, bint copy=True), both to take advantage of the cases where we don't have to copy the list and to specify the output type. This didn't give as much of a speed bump as I was expecting. I also made some other improvements, using get_unsafe and other things. Here are my timings:

Current branch:

625 loops, best of 3: 6.68 µs per loop
625 loops, best of 3: 29.2 µs per loop
625 loops, best of 3: 6.77 µs per loop
625 loops, best of 3: 739 ns per loop
625 loops, best of 3: 699 ns per loop
625 loops, best of 3: 3.3 µs per loop
625 loops, best of 3: 2.23 µs per loop
625 loops, best of 3: 10.4 µs per loop
125 loops, best of 3: 7.11 ms per loop
125 loops, best of 3: 7.1 ms per loop
625 loops, best of 3: 13.2 µs per loop
625 loops, best of 3: 247 µs per loop
625 loops, best of 3: 46.7 µs per loop

vs my new branch

625 loops, best of 3: 6.5 µs per loop
625 loops, best of 3: 28.6 µs per loop
625 loops, best of 3: 6.64 µs per loop
625 loops, best of 3: 739 ns per loop
625 loops, best of 3: 737 ns per loop
625 loops, best of 3: 3.13 µs per loop
625 loops, best of 3: 2.19 µs per loop
625 loops, best of 3: 10.6 µs per loop
125 loops, best of 3: 6.88 ms per loop
125 loops, best of 3: 6.84 ms per loop
625 loops, best of 3: 13.4 µs per loop
625 loops, best of 3: 240 µs per loop
625 loops, best of 3: 44.4 µs per loop

The small timings have too much variation to claim a definitive regression. However, the arithmetic operations have improved.
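The copy-avoiding convention described above can be sketched in plain Python (toy class and attribute names are mine; the real method is a Cython cpdef on Polynomial subclasses):

```python
# Toy illustration of list(self, copy=True): the copy flag lets hot
# internal code skip a list allocation when it only reads coefficients.
class ToyPoly:
    def __init__(self, coeffs):
        self._coeffs = coeffs  # coefficients in increasing degree

    def list(self, copy=True):
        # copy=True (the safe default) returns a fresh list that the
        # caller may mutate; copy=False exposes the internal list for
        # read-only use, saving one allocation per call.
        return list(self._coeffs) if copy else self._coeffs

p = ToyPoly([1, 2, 3])
assert p.list() is not p._coeffs        # independent copy, safe to mutate
assert p.list(copy=False) is p._coeffs  # shared list, no allocation
```

The speed gain per call is tiny, which matches the observation that the change gave less of a bump than expected; it only matters on paths that call list() in tight loops.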

Something else that could probably get us some speed is requiring all polynomials to have a _new_constant_poly method; currently only the dense ones have one. Shall we do that here?
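The idea behind _new_constant_poly can be sketched as follows (toy class, hypothetical attributes; only the method name comes from Sage):

```python
# Toy sketch: construct a constant polynomial directly from one
# already-coerced base-ring element, bypassing the generic
# element-construction and coercion machinery.
class ToyDensePoly:
    def __init__(self, coeffs, parent):
        self._coeffs = coeffs  # coefficients in increasing degree
        self._parent = parent

    def _new_constant_poly(self, a, parent):
        # Fast path: a is assumed to already live in the base ring,
        # so we just wrap it (empty list encodes the zero polynomial).
        return ToyDensePoly([a] if a else [], parent)

c = ToyDensePoly([1, 2], "ZZ[x]")._new_constant_poly(5, "ZZ[x]")
```

Requiring the method on sparse polynomials too would let generic code (e.g. scalar cases of __call__) use this fast path unconditionally instead of testing for its existence.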

New commits:

    f355638  wip
    b3878a2  Merge branch 'u/mmezzarobba/speed_up_generic_polynomials-alt' of git://trac.sagemath.org/sage into u/mmezzarobba/speed_up_generic_polynomials-alt
    526fdad  Using cpdef list list() and some other improvements.
    6b53da5  Making list always take a second input for the dense cases and some other speedups.
    a753217  A few more _new_generic instead of via the coercion framework.

### comment:12 Changed 5 years ago by mmezzarobba

• Authors changed from Marc Mezzarobba to Marc Mezzarobba, Travis Scrimshaw
• Branch changed from u/tscrim/speed_up_generic_polynomials-22152 to u/mmezzarobba/speed_up_generic_polynomials
• Commit changed from a75321719bd7de6637b740bf00c6f266e7acfda7 to 82554f906ead4a2693223115d25b20698c82e1ee
• Reviewers changed from Travis Scrimshaw to Travis Scrimshaw, Travis Scrimshaw

Thanks for the improvements! I cleaned up the branch a little (rebased, squashed my "wip" commit into the following one) without changing the code.

Regarding _new_constant_poly() for sparse polynomials: I really don't think I'll have time to make further changes until Feb. 8 at best, though I'll do my best to review yours if you make some. In any case I'm happy with the current branch; you can set the ticket to positive review if you agree, assuming the patchbot doesn't find anything to complain about.

New commits:

    40cabed  don't call parent() on result from methods of base Polynomial class
    425aa01  speed up truncated multiplication of generic polynomials
    edb148d  cache the result of PolynomialRing.is_exact()
    4628080  polynomial_element: minor code simplifications/speedups
    901e40e  disable cython checks in generic polynomial multiplication
    67aadec  generic Polynomials: [] -> get_unsafe() when possible
    856a74a  speed up generic Polynomial.__call__()
    6bca846  Using cpdef list list() and some other improvements.
    04e89ae  Making list always take a second input for the dense cases and some other speedups.
    82554f9  A few more _new_generic instead of via the coercion framework.

### comment:13 Changed 5 years ago by mmezzarobba

• Reviewers changed from Travis Scrimshaw, Travis Scrimshaw to Travis Scrimshaw, Marc Mezzarobba

### comment:14 Changed 5 years ago by tscrim

• Status changed from needs_review to positive_review

Let's get this into Sage now and do further optimizations on a followup ticket.

### comment:15 Changed 5 years ago by vbraun

• Status changed from positive_review to needs_work

Merge conflict...

### comment:16 Changed 5 years ago by git

• Commit changed from 82554f906ead4a2693223115d25b20698c82e1ee to 2e6ce3627e8cc20b0eaf9813d725969e61089871

Branch pushed to git repo; I updated commit sha1. This was a forced push. New commits:

    232d8a2  don't call parent() on result from methods of base Polynomial class
    623fac3  speed up truncated multiplication of generic polynomials
    66be984  cache the result of PolynomialRing.is_exact()
    e56be91  polynomial_element: minor code simplifications/speedups
    2e71c9a  disable cython checks in generic polynomial multiplication
    7023518  generic Polynomials: [] -> get_unsafe() when possible
    421e9c5  speed up generic Polynomial.__call__()
    59ab09b  Using cpdef list list() and some other improvements.
    bff9081  Making list always take a second input for the dense cases and some other speedups.
    2e6ce36  A few more _new_generic instead of via the coercion framework.

### comment:17 Changed 5 years ago by git

• Commit changed from 2e6ce3627e8cc20b0eaf9813d725969e61089871 to 582bc1451110361c164f5af0f7c9e582bca7eba3

Branch pushed to git repo; I updated commit sha1. This was a forced push. New commits:

    9076574  don't call parent() on result from methods of base Polynomial class
    3bcc65d  speed up truncated multiplication of generic polynomials
    00c32ee  cache the result of PolynomialRing.is_exact()
    5a50716  polynomial_element: minor code simplifications/speedups
    f7ac5d3  disable cython checks in generic polynomial multiplication
    53aa45c  generic Polynomials: [] -> get_unsafe() when possible
    9c386fb  speed up generic Polynomial.__call__()
    9d91c6c  Using cpdef list list() and some other improvements.
    ff878d6  Making list always take a second input for the dense cases and some other speedups.
    582bc14  A few more _new_generic instead of via the coercion framework.

### comment:18 Changed 5 years ago by git

• Commit changed from 582bc1451110361c164f5af0f7c9e582bca7eba3 to c4b9000794094b9b116a15d9ab66f831f7f9d126

Branch pushed to git repo; I updated commit sha1. This was a forced push. New commits:

    ce745fc  don't call parent() on result from methods of base Polynomial class
    e86ab58  speed up truncated multiplication of generic polynomials
    25c536e  cache the result of PolynomialRing.is_exact()
    a87df2b  polynomial_element: minor code simplifications/speedups
    9439459  disable cython checks in generic polynomial multiplication
    2de8e6b  generic Polynomials: [] -> get_unsafe() when possible
    9907674  speed up generic Polynomial.__call__()
    74b1d6b  Using cpdef list list() and some other improvements.
    115876d  Making list always take a second input for the dense cases and some other speedups.
    c4b9000  A few more _new_generic instead of via the coercion framework.

### comment:19 Changed 5 years ago by mmezzarobba

• Status changed from needs_work to positive_review

Rebased.

### comment:20 follow-up: ↓ 21 Changed 5 years ago by vbraun

• Status changed from positive_review to needs_work

merge conflict...

### comment:21 in reply to: ↑ 20 Changed 5 years ago by mmezzarobba

> merge conflict...

Again? With which branch? This ticket's branch (as of c4b9000) is based on the current trac:develop.

### comment:22 Changed 5 years ago by git

• Commit changed from c4b9000794094b9b116a15d9ab66f831f7f9d126 to 29bfdb557e99262cb4c01a6915a55033a4862535

Branch pushed to git repo; I updated commit sha1. This was a forced push. New commits:

    660a603  don't call parent() on result from methods of base Polynomial class
    773036b  speed up truncated multiplication of generic polynomials
    3429977  cache the result of PolynomialRing.is_exact()
    1925f00  polynomial_element: minor code simplifications/speedups
    095bdc5  disable cython checks in generic polynomial multiplication
    74b56ba  generic Polynomials: [] -> get_unsafe() when possible
    03e8691  speed up generic Polynomial.__call__()
    f0f1923  Using cpdef list list() and some other improvements.
    3532b34  Making list always take a second input for the dense cases and some other speedups.
    29bfdb5  A few more _new_generic instead of via the coercion framework.

### comment:23 Changed 5 years ago by mmezzarobba

• Status changed from needs_work to positive_review

Rebased again, on top of 7.6.beta1.

### comment:24 Changed 5 years ago by tscrim

• Status changed from positive_review to needs_work

I am getting the same doctest failure as the patchbot:

    **********************************************************************
    Failed example:
        for N in range(3, max_N):                     # long time
            sigma = E.padic_sigma(p, N, E2=E2)         # long time
            assert sigma == max_sigma
    Exception raised:
        Traceback (most recent call last):
          File "/home/travis/sage-build/local/lib/python2.7/site-packages/sage/doctest/forker.py", line 498, in _run
            self.compile_and_execute(example, compiler, test.globs)
          File "/home/travis/sage-build/local/lib/python2.7/site-packages/sage/doctest/forker.py", line 861, in compile_and_execute
            exec(compiled, globs)
            assert sigma == max_sigma
        AssertionError
    **********************************************************************
    [197 tests, 1 failure, 11.79 s]
    ----------------------------------------------------------------------
    sage -t --long src/sage/schemes/elliptic_curves/padics.py  # 1 doctest failed
    ----------------------------------------------------------------------

I almost certainly did something wrong with my round of improvements. This will require some digging.

### comment:25 Changed 5 years ago by tscrim

This was introduced by me in 3532b34. Now to figure out exactly what the problem is...

### comment:26 follow-up: ↓ 27 Changed 5 years ago by tscrim

• Branch changed from u/mmezzarobba/speed_up_generic_polynomials to u/tscrim/speed_up_generic_polynomials-22152
• Commit changed from 29bfdb557e99262cb4c01a6915a55033a4862535 to b846e12f17880b85c8929471308918ac572f30ea
• Status changed from needs_work to needs_review

Okay, so the problem was in the change I made to __iter__: iterating over the list is not the same as iterating up to the degree. I don't understand why this results in different behavior. However, for now, I think we can just let it be.

New commits:

    b846e12  Reverting change to __iter__.

### comment:27 in reply to: ↑ 26 Changed 5 years ago by mmezzarobba

> Okay, so the problem was in the change I made to __iter__: iterating over the list is not the same as iterating up to the degree. I don't understand why this results in different behavior.

Since the failure has to do with p-adics, I suspect the reason may be that:

    sage: Pol.<x> = Qp(5)[]
    sage: pol = 1 + O(5)*x
    sage: pol.degree()
    0
    sage: list(pol)
    [1 + O(5^20), O(5)]

I don't know whether this is intentional, nor why the implementation of p-adics takes this convention. (I'd have expected degree() to check that the coefficients are nonzero using __nonzero__(), and __nonzero__() to return true except when its argument is exactly zero, as it does for some other inexact parents.) But perhaps we need to check that our other changes don't break things like this.
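A toy model of the mismatch (hypothetical classes, not Sage's p-adics): when a trailing coefficient is an "inexact zero", iterating over the stored coefficient list yields more terms than iterating up to degree() would.

```python
# Stand-in for an inexact p-adic zero like O(5): it is "zero" for
# truth-testing purposes, so degree() skips it.
class InexactZero:
    def __bool__(self):
        return False

class ToyPoly:
    def __init__(self, coeffs):
        self._coeffs = coeffs  # coefficients in increasing degree

    def degree(self):
        d = len(self._coeffs) - 1
        while d > 0 and not self._coeffs[d]:
            d -= 1
        return d

    def __iter__(self):
        # Iterates over the full stored list, including trailing
        # inexact zeros that degree() ignores.
        return iter(self._coeffs)

p = ToyPoly([1, InexactZero()])  # models 1 + O(5)*x
p.degree()      # -> 0
len(list(p))    # -> 2, one more term than degree() + 1
```

So code that switches from "iterate coefficients 0..degree()" to "iterate over list(self)" silently changes behavior exactly in cases like the padics.py doctest.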

### comment:28 Changed 5 years ago by tscrim

I don't think we made any other functional changes, since get_unsafe has to be consistent with __getitem__ (for non-slices). Everything else is extending current behavior or passing type information to Cython.

I'm not sure if that is intentional either. We might need to ask someone more familiar with the p-adics. Although that could be something we push to another ticket too.

### comment:29 Changed 5 years ago by mmezzarobba

• Status changed from needs_review to positive_review

Agreed.

### comment:30 Changed 5 years ago by vbraun

• Branch changed from u/tscrim/speed_up_generic_polynomials-22152 to b846e12f17880b85c8929471308918ac572f30ea
• Resolution set to fixed
• Status changed from positive_review to closed

### comment:31 Changed 5 years ago by mmezzarobba

• Commit b846e12f17880b85c8929471308918ac572f30ea deleted

Looks like we have another problem:

    sage: Pol = ZZ['x']['y']['z']
    ....: d = Pol.gens_dict_recursive()
    ....: x, y, z = d['x'], d['y'], d['z']
    ....: v = QQ(0)
    ....: (x*z)(x=v)
    <BOOM>

### comment:32 Changed 5 years ago by tscrim

This is something very specific to the value being 0 combined with the coercion.

### comment:33 Changed 5 years ago by tscrim

This also works for p = x*z:

    sage: p(1)(x=v)
    0

### comment:34 Changed 5 years ago by tscrim

Followup on #22317.
