#22152 closed enhancement (fixed)
speed up generic polynomials
Reported by:  mmezzarobba  Owned by:  

Priority:  major  Milestone:  sage7.6 
Component:  performance  Keywords:  
Cc:  cremona, nbruin, SimonKing  Merged in:  
Authors:  Marc Mezzarobba, Travis Scrimshaw  Reviewers:  Travis Scrimshaw, Marc Mezzarobba 
Report Upstream:  N/A  Work issues:  
Branch:  b846e12 (Commits, GitHub, GitLab)  Commit:  
Dependencies:  Stopgaps: 
Description (last modified by )
Some hopefully representative examples:
QQi.<i> = QuadraticField(1) Pol.<x> = QQi[] p = x + i + 1 q = ( (8*i  12)*x^10 + (i + 2)*x^9 + (i + 1)*x^8 + (i + 7)*x^7 + (i + 1)*x^6 + (12*i + 2)*x^5 + (i + 1)*x^4 + (7*i + 1)*x^3 + (2*i + 1)*x^2 + x  i  2) a = 42 b = QQi(42) mat = matrix(QQi, [[1, i], [0,2]]) y = polygen(Pol, 'y') foo = y^2 + x + 1 fl1 = 1.r RR1 = 1. for cmd in [ 'p*p', 'p**5', 'p._mul_trunc_(q, 3)', 'p(0)', 'q(0)', 'p(a)', 'p(b)', 'q(b)', 'q(fl1)', 'q(RR1)', 'q(a)', 'q(mat)', 'foo(x=1, y=2)', ]: timeit(cmd)
sage: pol = MatrixSpace(ZZ, 1000, 1000)['x'].random_element(degree=10) sage: %timeit pol(42)
Results:
test old new  p*p 13.1 µs 8.14 µs p**5 53.7 µs 35.9 µs p._mul_trunc_(q, 3) 17 µs 8.26 µs p(0) 3.49 µs 920 ns q(0) 3.5 µs 877 ns p(a) 6.72 µs 3.78 µs p(b) 6.68 µs 2.48 µs q(b) 13.8 µs 11.9 µs q(fl1) 7.47 ms 8.24 ms q(RR1) 7.48 ms 8.27 ms q(a) 16.8 µs 15.8 µs q(mat) 290 µs 284 µs foo(x=1, y=2) 64.1 µs 54 µs  pol(42) 208 ms 202 ms
(Evaluations at floatingpoint numbers are very slow, and suffer from a small regression, due to the slowness of the generic coercion map through CLF
.)
Edit: add up to ~0.5 µs to some of the above timings due to subsequent fixes.
Change History (34)
comment:1 Changed 5 years ago by
 Cc cremona nbruin SimonKing added
 Status changed from new to needs_review
comment:2 Changed 5 years ago by
 Commit changed from 9b7eec254e31c45e80cf0e81b9192e9152ccdf18 to 1fb9d2737f5a9e454f25bf80a1eeef7b4825ccfd
comment:3 Changed 5 years ago by
 Description modified (diff)
 Summary changed from speed up generic polynomials a little to speed up generic polynomials
The tests in rings/
pass; I haven't yet run the whole test suite.
comment:4 Changed 5 years ago by
 Commit changed from 1fb9d2737f5a9e454f25bf80a1eeef7b4825ccfd to 7488953070e2ee0f503c4b5a3258bc12492c9ee8
Branch pushed to git repo; I updated commit sha1. This was a forced push. New commits:
7488953  speed up generic Polynomial.__call__()

comment:5 followup: ↓ 6 Changed 5 years ago by
Looks good overall. From a quick look through the code, it seems like the inputs to the do_*
methods always take lists as inputs. Likewise, should we make list
a cpdef
method that guarantees a list being returned? This might get us some extra speed as well.
comment:6 in reply to: ↑ 5 Changed 5 years ago by
Replying to tscrim:
Looks good overall. From a quick look through the code, it seems like the inputs to the
do_*
methods always take lists as inputs. Likewise, should we makelist
acpdef
method that guarantees a list being returned? This might get us some extra speed as well.
Thanks for the suggestion. Adding type annotations to do_*
and brutally replacing def list(...)
by cdef list list(...)
everywhere actually seems to make things slower, though the difference is small. I didn't try to investigate any further. And I must confess that it is not at all clear to me what implications those cdef
have with the relatively complex hierarchy of subclasses that override list()
(sometimes from Cython, sometimes from Python, sometimes with additional arguments).
comment:7 followup: ↓ 8 Changed 5 years ago by
You might be loosing some speed by variables not being cdef'ed as a list. Did you happen to save your current branch? I can play around with it later tonight to see if I can get some extra speed out of it.
comment:8 in reply to: ↑ 7 Changed 5 years ago by
I noticed a small problem with the last commit: it would cause terrible regression for cases like polynomials over large matrix spaces evaluated at scalars. I'll push a fix in a minute.
Replying to tscrim:
You might be loosing some speed by variables not being cdef'ed as a list. Did you happen to save your current branch?
Yes, see u/mmezzarobba/speed_up_generic_polynomialsalt
.
comment:9 Changed 5 years ago by
 Commit changed from 7488953070e2ee0f503c4b5a3258bc12492c9ee8 to 25f884d48ae7392c908231dfcaa1931b66cbe900
Branch pushed to git repo; I updated commit sha1. This was a forced push. New commits:
25f884d  speed up generic Polynomial.__call__()

comment:10 Changed 5 years ago by
 Description modified (diff)
comment:11 Changed 5 years ago by
 Branch changed from u/mmezzarobba/speed_up_generic_polynomials to u/tscrim/speed_up_generic_polynomials22152
 Commit changed from 25f884d48ae7392c908231dfcaa1931b66cbe900 to a75321719bd7de6637b740bf00c6f266e7acfda7
 Milestone changed from sage7.5 to sage7.6
 Reviewers set to Travis Scrimshaw
I went through and made all of the def list(self)
methods into cpdef list list(self, bint copy=True)
to take advantage of the times when we don't have to copy the list and to specify the output type. This didn't give as much of a speed bump as I was expecting. I also made some other improvements by using get_unsafe
and other things. Here are my timings:
Current branch:
625 loops, best of 3: 6.68 µs per loop 625 loops, best of 3: 29.2 µs per loop 625 loops, best of 3: 6.77 µs per loop 625 loops, best of 3: 739 ns per loop 625 loops, best of 3: 699 ns per loop 625 loops, best of 3: 3.3 µs per loop 625 loops, best of 3: 2.23 µs per loop 625 loops, best of 3: 10.4 µs per loop 125 loops, best of 3: 7.11 ms per loop 125 loops, best of 3: 7.1 ms per loop 625 loops, best of 3: 13.2 µs per loop 625 loops, best of 3: 247 µs per loop 625 loops, best of 3: 46.7 µs per loop
vs my new branch
625 loops, best of 3: 6.5 µs per loop 625 loops, best of 3: 28.6 µs per loop 625 loops, best of 3: 6.64 µs per loop 625 loops, best of 3: 739 ns per loop 625 loops, best of 3: 737 ns per loop 625 loops, best of 3: 3.13 µs per loop 625 loops, best of 3: 2.19 µs per loop 625 loops, best of 3: 10.6 µs per loop 125 loops, best of 3: 6.88 ms per loop 125 loops, best of 3: 6.84 ms per loop 625 loops, best of 3: 13.4 µs per loop 625 loops, best of 3: 240 µs per loop 625 loops, best of 3: 44.4 µs per loop
The small timings have too much variation to say there is a definitive regression. However, the arithmetic operations have improved.
Something else I noticed that probably could get us some speed is requiring all polynomials to have a _new_constant_poly
. Currently only the dense ones have such a method. Shall we do that here?
New commits:
f355638  wip

b3878a2  Merge branch 'u/mmezzarobba/speed_up_generic_polynomialsalt' of git://trac.sagemath.org/sage into u/mmezzarobba/speed_up_generic_polynomialsalt

526fdad  Using cpdef list list() and some other improvements.

6b53da5  Making list always take a second input for the dense cases and some other speedups.

a753217  A few more _new_generic instead of via the coercion framework.

comment:12 Changed 5 years ago by
 Branch changed from u/tscrim/speed_up_generic_polynomials22152 to u/mmezzarobba/speed_up_generic_polynomials
 Commit changed from a75321719bd7de6637b740bf00c6f266e7acfda7 to 82554f906ead4a2693223115d25b20698c82e1ee
 Reviewers changed from Travis Scrimshaw to Travis Scrimshaw, Travis Scrimshaw
Thanks for the improvements! I cleaned up the branch a little (rebased, squashed my "wip" commit into the following one) without changing the code.
Regarding _new_constant_poly()
for sparse polynomials: I really don't think I'll have time to make further changes until Feb. 8 at best, though I'll do my best to review yours if you make some. In any case I'm happy with the current branch; you can set the ticket to positive review if you agree, assuming the patchbot doesn't find anything to complain about.
New commits:
40cabed  don't call parent() on result from methods of base Polynomial class

425aa01  speed up truncated multiplication of generic polynomials

edb148d  cache the result of PolynomialRing.is_exact()

4628080  polynomial_element: minor code simplifications/speedups

901e40e  disable cython checks in generic polynomial multiplication

67aadec  generic Polynomials: [] > get_unsafe() when possible

856a74a  speed up generic Polynomial.__call__()

6bca846  Using cpdef list list() and some other improvements.

04e89ae  Making list always take a second input for the dense cases and some other speedups.

82554f9  A few more _new_generic instead of via the coercion framework.

comment:13 Changed 5 years ago by
 Reviewers changed from Travis Scrimshaw, Travis Scrimshaw to Travis Scrimshaw, Marc Mezzarobba
comment:14 Changed 5 years ago by
 Status changed from needs_review to positive_review
Let's get this into Sage now and do further optimizations on a followup ticket.
comment:15 Changed 5 years ago by
 Status changed from positive_review to needs_work
Merge conflict...
comment:16 Changed 5 years ago by
 Commit changed from 82554f906ead4a2693223115d25b20698c82e1ee to 2e6ce3627e8cc20b0eaf9813d725969e61089871
Branch pushed to git repo; I updated commit sha1. This was a forced push. New commits:
232d8a2  don't call parent() on result from methods of base Polynomial class

623fac3  speed up truncated multiplication of generic polynomials

66be984  cache the result of PolynomialRing.is_exact()

e56be91  polynomial_element: minor code simplifications/speedups

2e71c9a  disable cython checks in generic polynomial multiplication

7023518  generic Polynomials: [] > get_unsafe() when possible

421e9c5  speed up generic Polynomial.__call__()

59ab09b  Using cpdef list list() and some other improvements.

bff9081  Making list always take a second input for the dense cases and some other speedups.

2e6ce36  A few more _new_generic instead of via the coercion framework.

comment:17 Changed 5 years ago by
 Commit changed from 2e6ce3627e8cc20b0eaf9813d725969e61089871 to 582bc1451110361c164f5af0f7c9e582bca7eba3
Branch pushed to git repo; I updated commit sha1. This was a forced push. New commits:
9076574  don't call parent() on result from methods of base Polynomial class

3bcc65d  speed up truncated multiplication of generic polynomials

00c32ee  cache the result of PolynomialRing.is_exact()

5a50716  polynomial_element: minor code simplifications/speedups

f7ac5d3  disable cython checks in generic polynomial multiplication

53aa45c  generic Polynomials: [] > get_unsafe() when possible

9c386fb  speed up generic Polynomial.__call__()

9d91c6c  Using cpdef list list() and some other improvements.

ff878d6  Making list always take a second input for the dense cases and some other speedups.

582bc14  A few more _new_generic instead of via the coercion framework.

comment:18 Changed 5 years ago by
 Commit changed from 582bc1451110361c164f5af0f7c9e582bca7eba3 to c4b9000794094b9b116a15d9ab66f831f7f9d126
Branch pushed to git repo; I updated commit sha1. This was a forced push. New commits:
ce745fc  don't call parent() on result from methods of base Polynomial class

e86ab58  speed up truncated multiplication of generic polynomials

25c536e  cache the result of PolynomialRing.is_exact()

a87df2b  polynomial_element: minor code simplifications/speedups

9439459  disable cython checks in generic polynomial multiplication

2de8e6b  generic Polynomials: [] > get_unsafe() when possible

9907674  speed up generic Polynomial.__call__()

74b1d6b  Using cpdef list list() and some other improvements.

115876d  Making list always take a second input for the dense cases and some other speedups.

c4b9000  A few more _new_generic instead of via the coercion framework.

comment:20 followup: ↓ 21 Changed 5 years ago by
 Status changed from positive_review to needs_work
merge conflict...
comment:21 in reply to: ↑ 20 Changed 5 years ago by
Replying to vbraun:
merge conflict...
Again? With which branch? This ticket's branch (as of c4b9000
) is based on the current trac:develop
.
comment:22 Changed 5 years ago by
 Commit changed from c4b9000794094b9b116a15d9ab66f831f7f9d126 to 29bfdb557e99262cb4c01a6915a55033a4862535
Branch pushed to git repo; I updated commit sha1. This was a forced push. New commits:
660a603  don't call parent() on result from methods of base Polynomial class

773036b  speed up truncated multiplication of generic polynomials

3429977  cache the result of PolynomialRing.is_exact()

1925f00  polynomial_element: minor code simplifications/speedups

095bdc5  disable cython checks in generic polynomial multiplication

74b56ba  generic Polynomials: [] > get_unsafe() when possible

03e8691  speed up generic Polynomial.__call__()

f0f1923  Using cpdef list list() and some other improvements.

3532b34  Making list always take a second input for the dense cases and some other speedups.

29bfdb5  A few more _new_generic instead of via the coercion framework.

comment:23 Changed 5 years ago by
 Status changed from needs_work to positive_review
Rebased again, on top of 7.6.beta1.
comment:24 Changed 5 years ago by
 Status changed from positive_review to needs_work
I am getting the same doctest failure as the patchbot:
sage t long src/sage/schemes/elliptic_curves/padics.py ********************************************************************** File "src/sage/schemes/elliptic_curves/padics.py", line 1058, in sage.schemes.elliptic_curves.padics.padic_sigma Failed example: for N in range(3, max_N): # long time sigma = E.padic_sigma(p, N, E2=E2) # long time assert sigma == max_sigma Exception raised: Traceback (most recent call last): File "/home/travis/sagebuild/local/lib/python2.7/sitepackages/sage/doctest/forker.py", line 498, in _run self.compile_and_execute(example, compiler, test.globs) File "/home/travis/sagebuild/local/lib/python2.7/sitepackages/sage/doctest/forker.py", line 861, in compile_and_execute exec(compiled, globs) File "<doctest sage.schemes.elliptic_curves.padics.padic_sigma[19]>", line 3, in <module> assert sigma == max_sigma AssertionError ********************************************************************** 1 item had failures: 1 of 21 in sage.schemes.elliptic_curves.padics.padic_sigma [197 tests, 1 failure, 11.79 s]  sage t long src/sage/schemes/elliptic_curves/padics.py # 1 doctest failed 
I almost certainly did something wrong with my round of improvements. This will require some digging.
comment:25 Changed 5 years ago by
This was introduced by me in 3532b34. Now to figure out exactly what the problem is...
comment:26 followup: ↓ 27 Changed 5 years ago by
 Branch changed from u/mmezzarobba/speed_up_generic_polynomials to u/tscrim/speed_up_generic_polynomials22152
 Commit changed from 29bfdb557e99262cb4c01a6915a55033a4862535 to b846e12f17880b85c8929471308918ac572f30ea
 Status changed from needs_work to needs_review
Okay, so the problem was in the change I made to the __iter__
in that iterating over the list is not the same as iterating up to the degree. I don't understand why this results in different behavior. However, for now, I think we can just let it be.
New commits:
b846e12  Reverting change to __iter__.

comment:27 in reply to: ↑ 26 Changed 5 years ago by
Replying to tscrim:
Okay, so the problem was in the change I made to the
__iter__
in that iterating over the list is not the same as iterating up to the degree. I don't understand why this results in different behavior.
Since the failure has to do with padics, I suspect the reason may be that:
sage: Pol.<x> = Qp(5)[] sage: pol = 1 + O(5)*x sage: pol.degree() 0 sage: list(pol) [1 + O(5^20), O(5)]
I don't know if this is intentional, nor why the implementation of padics take this convention. (I'd have expected degree()
to check that the coefficients are nonzero using __nonzero__()
, and __nonzero__()
to return true except when its argument is exactly zero, like it does for some other inexact parents.) But perhaps we need to check that our other changes don't break things like this.
comment:28 Changed 5 years ago by
I don't think we made any other functional changes as get_unsafe
has to be consistent with __getitem__
(for nonslices). Everything else is extending current behavior or passing type info to Cython.
I'm not sure if that is intentional either. We might need to ask someone more familiar with the padics. Although that could be something we push to another ticket too.
comment:30 Changed 5 years ago by
 Branch changed from u/tscrim/speed_up_generic_polynomials22152 to b846e12f17880b85c8929471308918ac572f30ea
 Resolution set to fixed
 Status changed from positive_review to closed
comment:31 Changed 5 years ago by
 Commit b846e12f17880b85c8929471308918ac572f30ea deleted
Looks like we have another problem:
sage: Pol = ZZ['x']['y']['z'] ....: d = Pol.gens_dict_recursive() ....: x, y, z = d['x'], d['y'], d['z'] ....: v = QQ(0) ....: (x*z)(x=v) <BOOM>
comment:32 Changed 5 years ago by
This is something very special about it being 0 with the coercion.
comment:33 Changed 5 years ago by
This also works for p = x*z
:
sage: p(1)(x=v) 0
comment:34 Changed 5 years ago by
Followup on #22317.
Branch pushed to git repo; I updated commit sha1. New commits:
generic Polynomials: [] > get_unsafe() when possible
speed up generic Polynomial.__call__()