Opened 11 years ago
Closed 7 years ago
#10255 closed defect (fixed)
Correct karatsuba multiplication of univariate polynomials for different degree polynomials
Reported by:  lftabera  Owned by:  AlexGhitza 

Priority:  major  Milestone:  sage6.1 
Component:  basic arithmetic  Keywords:  karatsuba, multiplication, polynomial 
Cc:  Merged in:  
Authors:  Luis Felipe Tabera Alonso, Marc Mezzarobba  Reviewers:  Marc Mezzarobba, Luis Felipe Tabera Alonso 
Report Upstream:  N/A  Work issues:  
Branch:  u/mmezzarobba/10255karatsuba  Commit:  597747007214bdb661d112296f54a77989aa2a01 
Dependencies:  Stopgaps: 
Description (last modified by )
In the generic implementation of univariate polynomials over exact rings, plain karatsuba is used.
However, for different degree polynomials, the karatsuba code makes more products and additions than the classical multiplication code. Moreover, for equally sized polynomials, the degree in which karatsuba starts to be better than plain multiplication looks too high for me.
See attachments comparison_product_50_400.png and comparison_addition_50_400.png for the number of operations multiplying degree 50 and 400 polynomials, in yellow, the number of operations using _mul_generic, in red using current _mul_karatsuba as of 4.6.1 and in blue with the patch proposed.
sage: K=QQ[I][x] sage: f=K.random_element(1500) sage: g=K.random_element(1500) sage: %time _ = f._mul_generic(g) CPU times: user 6.03 s, sys: 0.08 s, total: 6.11 s Wall time: 6.43 s sage: %time _ = f._mul_karatsuba(g) CPU times: user 6.95 s, sys: 0.06 s, total: 7.01 s Wall time: 7.76 s
See comment:13 for some benchmarks
Apply: trac_10255_karatsuba_improbement.patch
Attachments (6)
Change History (36)
comment:1 Changed 11 years ago by
comment:2 Changed 10 years ago by
the code is too naive for different sized polynomials. Even with my improvements I get this:
Two degree 849 polynomials, no threshold:
274750 additions
87729 products
One degree 849 polynomial and a 449 polynomial:
472812 additions
218649 products
Experimental data
sage: f=ZZ[x].random_element(849) sage: g=ZZ[x].random_element(849) sage: %timeit f._mul_karatsuba(g) 5 loops, best of 3: 95.9 ms per loop sage: f=ZZ[x].random_element(849) sage: g=ZZ[x].random_element(449) sage: %timeit f._mul_karatsuba(g) 5 loops, best of 3: 218 ms per loop
The shorter product is more expensive!
I have a patch right now to solve this, have to clean it up though.
comment:3 Changed 10 years ago by
Further improvements for polynomials of of different sizes:
if f is of degree m1 and g is of degree n1 with m<n, consider g as:
g0 + x^m*g1 + x^(2*m)g2 + ... + x^(q*m)*gq
Compute the products f*gi with karatsuba and reconstruct f*g
With this strategy, for polynomials of degree 849 and 449 I get:
214507 additions
39942 products
Far better from previous method. Anyway this is not optimal and it is hard to messure the recursion overhead. I have taken this strategy from old versions of gmp documentation.
Some timmings (Debian 64bits). All polyomials have been generated with random_element without further parameters. Pure karatsuba without threshold for better comparison with old code. With an appropriate threshold the timming whould slightly better.
Note that for rigns like QQ[x] and GF(2)[x] sage uses faster special libraries (FLINT, NTL and the like).
Degree f:1000,g:1000
QQ[x] old code: 828 ms new code: 498 ms GF(2)[x] old code: 284 ms new code: 99.1 ms QQ[sqrt(2)][x] old code: 2.61 s new code: 480 ms # 5 times faster ZZ[x]['y']['z']['t'] old code: 86.95 s new code: 71.10 s
degree f:849, g:449
QQ[x] old code: 988 ms new code: 343 ms GF(2)[x] old code: 148 ms new code: 67.6 ms QQ[sqrt(2)][x] old code: 1.93 s new code: 313 ms #6 times faster ZZ[x]['y']['z']['t'] old code: 342.16 s new code: 48.11 s # 7 times faster
degree 3, 5
QQ[x] old code: 135 µs new code: 83.6 µs GF(2)[x] old code: 137 µs new code: 100 µs QQ[sqrt(2)][x] old code: 323 µs new code: 86.4 µs ZZ[x]['y']['z']['t'] old code: 19.5 ms new code: 13.2 ms
TODO:
 Sensible default threshold.
 Documentation cleanup. (almost done)
 More testing for correctness rather than speed. This is a very basic feature, so we cannot allow correctness bugs here.
comment:4 Changed 10 years ago by
All doctest passes and I am trying to make doctest independent of future changes of the default threshold. But I get the following difference that puzzles me.
sage t long force_lib "devel/sage/sage/rings/qqbar.py" ********************************************************************** File "/home/rosamari/sage/devel/sage/sage/rings/qqbar.py", line 373: sage: sage_input(one, verify=True) Expected: # Verified R.<x> = QQbar[] v1 = AA(2) v2 = QQbar(sqrt(v1)) v3 = QQbar(3) v4 = sqrt(v3) v5 = v2*v4 v6 = (1  v2)*(1  v4)  1  v5 v7 = QQbar(sqrt(v1)) v8 = sqrt(v3) si1 = v7*v8 cp = AA.common_polynomial(x^2 + ((1  v7)*(1 + v8)  1 + si1)*x  si1) v9 = QQbar.polynomial_root(cp, RIF(RR(1.7320508075688774), RR(1.7320508075688772))) v10 = 1  v9 v11 = v6 + (v10  1) v12 = 1  v4  QQbar.polynomial_root(cp, RIF(RR(1.7320508075688774), RR(1.7320508075688772))) v13 = 1 + v12 v14 = v10*(v6 + v5)  (v6  v5*v9) si2 = v5*v9 AA.polynomial_root(AA.common_polynomial(x^4 + (v11 + (v13  1))*x^3 + (v14 + (v13*v11  v11))*x^2 + (v13*(v14  si2)  (v14  si2*v12))*x  si2*v12), RIF(RR(0.99999999999999989), RR(1.0000000000000002))) Got: # Verified R.<x> = QQbar[] v1 = QQbar(3) v2 = sqrt(v1) v3 = AA(2) v4 = QQbar(sqrt(v3)) v5 = sqrt(v1) si1 = v4*v5 cp = AA.common_polynomial(x^2 + ((1  v4)*(1 + v5)  1 + si1)*x  si1) v6 = 1  v2  QQbar.polynomial_root(cp, RIF(RR(1.7320508075688774), RR(1.7320508075688772))) v7 = QQbar.polynomial_root(cp, RIF(RR(1.7320508075688774), RR(1.7320508075688772))) v8 = QQbar(sqrt(v3)) v9 = v8*v2 v10 = (1  v8)*(1  v2)  1  v9 v11 = v7 + v10 v12 = v6*v11 si2 = v7*v9 v13 = (1  v7)*(v10 + v9)  v10 + si2 si3 = v6*si2 AA.polynomial_root(AA.common_polynomial(x^4 + ((1 + v6)*(1 + v11)  1  v12)*x^3 + (v12 + v13)*x^2 + ((1 + v6)*(v13  si2)  v13 + si3)*x  si3), RIF(RR(0.99999999999999989), RR(1.0000000000000002))) ********************************************************************** 1 items had failures: 1 of 152 in __main__.example_0 ***Test Failed*** 1 failures.
comment:5 Changed 10 years ago by
I have eliminated karatsuba_sumand karatsuba_diff so there are less function calls and less if branches in the coding (we know an can predict at each step the length of every list.
I have also eliminated most slicing of lists so less objects are creating. The recursion does not operates over slices but over the orginial list plus some pointers to the slice we are working on.
With this improvements the code is about 10% faster. I also think that improves readability.
For number fields of degree 2 and 3 the code is about three times slower than pari. For number fields of degree >=6 sage seems faster than pari.
For dense bivariate polynomials in QQ[x][y] the multiplication starts to be faster than in QQ[x,y] for degree 16, for dense polynomials of degree 100 QQ[x][y] is 45x faster than QQ[x,y]. Of course, for nondense polynomials multivariate multiplication is much much faster.
Some doctests added to compare karatsuba with generic multiplication and doctest for the noncommutative ring case.
Still has the QQbar fail.
comment:6 followup: ↓ 7 Changed 10 years ago by
Hello,
First of all, I want to say that it's great that you are looking at this part of the Sage library and are trying to improve it!
From reading your ticket progress, it seems that perhaps you might benefit from some help with this. Have you considered advertising this problem on the developers' list at sagedevel? Perhaps someone else is interested in this, too.
In any case, here are a couple of comments:
1) What is your main motivation for wanting to improve this code? If you want to achieve faster computations in polynomial rings over specific rings, QQ[i], you should most definitely either write some Cython code or even a separate C library. You should *not* be tinkering with the generic Python multiplication code. To keep with the example, if you are interested in QQ[i], then (just as a first idea) you should represent your polynomials as sums of two polynomials internally, one for the real part and one for the imaginary code. This approach also applies to other number fields of small degree.
2) Stating what you definitely already know: Karatsuba multiplication trades additions and multiplications in the base ring. Thus, the threshold highly depends on the relative speeds of addition and multiplication, and it will be very difficult for you to make a default choice that is useful in general. Even worse, what about base rings where multiplication is faster than addition?
3) I haven't looked at the details, but I believe you might simply not be able to apply this code to polynomial rings with inexact base rings.
All that said, while this patch looks very interesting, I think there are a lot of things that need to be sorted out. By the way, I am not quite sure whether any one else is currently reading this ticket, but at least during the next week I should have a lot of time to look at this.
Best wishes,
Sebastian
comment:7 in reply to: ↑ 6 Changed 10 years ago by
Sebastian,
Thanks for looking at this. I feel that the ticket is ready for review, but there is a doctest fail and I wanted to understand why. My impression is that the doctest should be changed, but I did not have enough time to dig it.
Replying to spancratz:
1) What is your main motivation for wanting to improve this code? If you want to achieve faster computations in polynomial rings over specific rings, QQ[i], you should most definitely either write some Cython code or even a separate C library. You should *not* be tinkering with the generic Python multiplication code. To keep with the example, if you are interested in QQ[i], then (just as a first idea) you should represent your polynomials as sums of two polynomials internally, one for the real part and one for the imaginary code. This approach also applies to other number fields of small degree.
My primary interest was univariate polynomial rings over number fields of degree around 3040. I was considering to use the specific class I have added in #8558, because, for number field base rings, the slowness comes really from adding number field elements by python 0. But then I realised that the karatsuba multiplication is not well enough implemented if both polynomials have different degree, so I changed the cython code improving here and there.
2) Stating what you definitely already know: Karatsuba multiplication trades additions and multiplications in the base ring. Thus, the threshold highly depends on the relative speeds of addition and multiplication, and it will be very difficult for you to make a default choice that is useful in general. Even worse, what about base rings where multiplication is faster than addition?
That is why the default threshold is zero. No classical multiplication at all. But I have added a ringdependent parameter to change this value. Univariate polynomial rings have now a method set_karatsuba_threshold to deal with this issue. Rings in which addition is faster than multiplication also benefits from karatsuba, since the number of additions is also subquadratic in the degree of the polynomials.
3) I haven't looked at the details, but I believe you might simply not be able to apply this code to polynomial rings with inexact base rings.
By default, unless the user calls explicitelly _mul_karatsuba, mutiplication for inexact rings is the classical multiplication algorithm. I have not changed this. Also,if a class (ZZ[x], GF(5)[x], etc.) has its own multiplication code, my patch does not change this.
All that said, while this patch looks very interesting, I think there are a lot of things that need to be sorted out. By the way, I am not quite sure whether any one else is currently reading this ticket, but at least during the next week I should have a lot of time to look at this.
Thanks!, if you have any question on why I changed this or that or have further suggestions/improvements, do not hesitate to ask.
Luis
comment:8 Changed 10 years ago by
One can actually a speedup of more than a factor 10 by using PARI:
The following function converts a polynomial over a number field to a PARI object:
def nfpol_to_pari(f): return pari([c._pari_('a') for c in f.list()]).Polrev()
Now let's try your example:
sage: K=QQ[I][x] sage: f=K.random_element(1500) sage: g=K.random_element(1500) sage: %time _ = f._mul_generic(g) CPU times: user 3.57 s, sys: 0.00 s, total: 3.57 s Wall time: 3.59 s sage: %time _ = f._mul_karatsuba(g) CPU times: user 5.06 s, sys: 0.05 s, total: 5.11 s Wall time: 5.12 s sage: fpari = nfpol_to_pari(f) sage: gpari = nfpol_to_pari(g) sage: %time _ = fpari*gpari CPU times: user 0.26 s, sys: 0.00 s, total: 0.26 s Wall time: 0.26 s
Since PARI is written completely in C, it's obvious why one could expect such a speedup. Obviously, writing specialized code to deal with number field polynomials would even be better.
comment:9 Changed 10 years ago by
Obviously you did not try the computations with my patch, it is still much slower than PARI for small number fields (as I acknowledge in the ticket, PARI is around three times faster for degree two extensions):
sage: K=QQ[I][x] sage: f=K.random_element(1500) sage: g=K.random_element(1500) sage: %time _ = f._mul_generic(g) CPU times: user 5.94 s, sys: 0.02 s, total: 5.96 s Wall time: 6.01 s sage: %time _ = f._mul_karatsuba(g) CPU times: user 1.31 s, sys: 0.00 s, total: 1.32 s Wall time: 1.33 s sage: K.set_karatsuba_threshold(8) sage: %time _ = f._mul_karatsuba(g) CPU times: user 0.97 s, sys: 0.00 s, total: 0.97 s Wall time: 1.00 s sage: %time fpari = nfpol_to_pari(f) CPU times: user 0.24 s, sys: 0.00 s, total: 0.24 s Wall time: 0.25 s sage: %time gpari = nfpol_to_pari(g) CPU times: user 0.24 s, sys: 0.00 s, total: 0.24 s Wall time: 0.25 s sage: %time hpari = fpari*gpari CPU times: user 0.42 s, sys: 0.00 s, total: 0.42 s Wall time: 0.44 s sage: %time h = K(hpari) CPU times: user 0.70 s, sys: 0.02 s, total: 0.72 s Wall time: 0.73 s
If you take into account a naive transformation from sage data to pari and back, the total time is worse than the mul_karatsuba time. So if a specialiczed class would use PARI, this transformation has to be taken into account.
This pari advantage disapears for bigger number fields
sage: K=CyclotomicField(7)[x] sage: f=K.random_element(1500) sage: g=K.random_element(1500) sage: %time _ = f._mul_karatsuba(g) CPU times: user 15.33 s, sys: 0.02 s, total: 15.35 s Wall time: 15.44 s sage: %time _ = f._mul_karatsuba(g,8) CPU times: user 12.96 s, sys: 0.08 s, total: 13.03 s Wall time: 13.10 s sage: %time fpari = nfpol_to_pari(f) CPU times: user 0.30 s, sys: 0.00 s, total: 0.30 s Wall time: 0.32 s sage: %time gpari = nfpol_to_pari(g) CPU times: user 0.30 s, sys: 0.00 s, total: 0.31 s Wall time: 0.32 s sage: %time hpari = fpari*gpari CPU times: user 24.26 s, sys: 0.02 s, total: 24.28 s Wall time: 25.23 s sage: %time h = K(hpari) CPU times: user 1.48 s, sys: 0.03 s, total: 1.50 s Wall time: 1.52 s
Note: for absolute number fields, I am used to set a karatsuba threshold in 8.
comment:10 followup: ↓ 11 Changed 10 years ago by
Hi Luis,
I'm sorry for the delay in my reply. Also, thank you Jeroen for adding the explicit timings. Let me just go through a couple of things:
 "I feel that the ticket is ready for review"
I don't really agree. It is certainly in a rather useful state, but it requires more people looking at it, using it, and providing feedback.
 For one, as you say, this "is a very basic feature, so we cannot allow correctness bugs here." There should be a *lot* more tests. Rather than having a single test case for illustrative purposes (as is the case in many places in the Sage library), there should be more bulk tests here. You could have that in the Sage framework e.g. by creating a list with
f._mul_karatsuba(g)  f*g
for many random pairs (f,g), then remove duplicates from the list, and check that the list has length one and only contains 0.
 Your code is still noticeably slower than PARI for the base ring being a number field of small degree. I guess strongly that this will be a much more typical use case than large degree number fields and as such this is not acceptable.
 Another one of your examples is
ZZ[x]y?z?t?
. I think this is a very bad example. I don't think one should ever expect a user to work with a multivariate polynomial ring in this way just to get some obscure speedup. Of course, dense and sparse multivariate polynomial rings offer massively different performance characteristic. Perhaps there should be a
dense
option to
MPolynomialRing
, but that's not for this ticket. In any case, this example should not be relevant here. Finally, if you really care about dense multivariate polynomials over the integers, I am sure this can be done much, much faster using C or Cython and arrays of
mpz_t
's for the coefficients. The speedup of this will be much, much larger than anything you get from nesting a bunch univariate polynomial rings in Python.
 Finally, the statement
"If you take into account a naive transformation from sage data to pari and back, the total time is worse than the mul_karatsuba time. So if a specialiczed class would use PARI, this transformation has to be taken into account."
is not true. In short, the code for polynomials over number fields should then be changed so that the internal representation simply was the PARI object  no converting back and forth in between the computations whatsoever!
I have looked at the currently only remaining failing test case in QQbar. Unfortunately, I am very puzzled by what could possibly cause this and don't have any help to offer right now.
By the way, I am sorry if this might sound a little harsh in places. Of course, as said, I am still very happy to keep looking at this etc.
Best wishes,
Sebastian
comment:11 in reply to: ↑ 10 Changed 10 years ago by
Hi Sebastian,
Replying to spancratz:
Hi Luis,
I'm sorry for the delay in my reply. Also, thank you Jeroen for adding the explicit timings. Let me just go through a couple of things:
 "I feel that the ticket is ready for review"
I don't really agree. It is certainly in a rather useful state, but it requires more people looking at it, using it, and providing feedback.
 For one, as you say, this "is a very basic feature, so we cannot allow correctness bugs here." There should be a *lot* more tests. Rather than having a single test case for illustrative purposes (as is the case in many places in the Sage library), there should be more bulk tests here. You could have that in the Sage framework e.g. by creating a list with
f._mul_karatsuba(g)  f*g
for many random pairs (f,g), then remove duplicates from the list, and check that the list has length one and only contains 0.
This is my fault. Of course I have made much more testing (random polynomials over different rings and ranged degrees), but they are not reflected in the ticket. I will upload the scripts I used for testing.
 Your code is still noticeably slower than PARI for the base ring being a number field of small degree. I guess strongly that this will be a much more typical use case than large degree number fields and as such this is not acceptable.
Ok I agree, but this ticket is not about polynomials for number fields, but multiplication of Polynomial_generic_dense_field elements.
A new class for polynomials over number fields would be good, but that is a different ticket. Also, the claim that PARI is better for small degrees is not so clear to me, at least without some nontrivial work. K.random_element creates polynomials with very small coefficients and no denominator, these polynomials are not the most common userbase case in my opinion. For such polynomials, it seems to me that memory management is the problem, creating and destroying lots of number field elements. If you take not so big coefficients and, specially, if your coefficients do have denominators, then PARI is not better due to flint integers in sage.
Note: computations made on an old laptop.
sage: f=K.random_element(15,10**6) / ZZ.random_element(10**6) sage: g=K.random_element(15,10**6) / ZZ.random_element(10**6) sage: fpari = nfpol_to_pari(f) sage: gpari = nfpol_to_pari(g) sage: %timeit _=f._mul_karatsuba(g,8) 125 loops, best of 3: 4.56 ms per loop sage: %timeit fpari*gpari 25 loops, best of 3: 11 ms per loop sage: f=K.random_element(1500,10**6) / ZZ.random_element(10**6) sage: g=K.random_element(1500,10**6) / ZZ.random_element(10**6) sage: fpari = nfpol_to_pari(f) sage: gpari = nfpol_to_pari(g) sage: %time _=f._mul_karatsuba(g,8) CPU times: user 3.90 s, sys: 0.05 s, total: 3.95 s Wall time: 9.30 s sage: %time _=fpari*gpari CPU times: user 7.17 s, sys: 0.08 s, total: 7.25 s Wall time: 18.11 s
 Another one of your examples is
ZZ[x]y?z?t?
. I think this is a very bad example. I don't think one should ever expect a user to work with a multivariate polynomial ring in this way just to get some obscure speedup. Of course, dense and sparse multivariate polynomial rings offer massively different performance characteristic. Perhaps there should be a
dense
option to
MPolynomialRing
, but that's not for this ticket. In any case, this example should not be relevant here. Finally, if you really care about dense multivariate polynomials over the integers, I am sure this can be done much, much faster using C or Cython and arrays of
mpz_t
's for the coefficients. The speedup of this will be much, much larger than anything you get from nesting a bunch univariate polynomial rings in Python.
Ok, this is a bad example, I needed some rings that produced genuine Polynomial_generic_dense_field appart from number fields. I also wanted to compare the algorithm with singular multivariate multiplication which I think it is considered fairly good. Note that ZZ[x][y] are currently arrays of mpz_t elements.
 Finally, the statement
"If you take into account a naive transformation from sage data to pari and back, the total time is worse than the mul_karatsuba time. So if a specialiczed class would use PARI, this transformation has to be taken into account."
is not true. In short, the code for polynomials over number fields should then be changed so that the internal representation simply was the PARI object  no converting back and forth in between the computations whatsoever!
Answered above.
I have looked at the currently only remaining failing test case in QQbar. Unfortunately, I am very puzzled by what could possibly cause this and don't have any help to offer right now.
QQbar is an exact ring, so it will use karatsuba multiplication. But computations are done inexactly if they are safe. Since the karatsuba code for different degree polynomials have changed, then the inexact results for inexact rings have also changed. I think that this is the reason for the change of the doctest here.
By the way, I am sorry if this might sound a little harsh in places. Of course, as said, I am still very happy to keep looking at this etc.
Do not worry, certainly your comments and Jeroen's are really welcome.
Luis
comment:12 Changed 10 years ago by
Hi Luis,
Ok, this is a bad example, I needed some rings that produced genuine Polynomial_generic_dense_field appart from number fields.
Ok I agree, but this ticket is not about polynomials for number fields, but multiplication of Polynomial_generic_dense_field elements.
Well, which base ring is it that you are interested in?
I don't really think there is all that much point in squeezing a small improvement out of the generic case when often much more is available, even more so when it is not clear that your patch improves the runtime across all base rings.
By the way, perhaps other developers might disagree with me. Perhaps it would be a good idea for you to ask about this on sagedevel?
Of course, these are just words... Here's an example:
sage: R.<x> = QQ[I][] sage: f = R.random_element(1500) sage: g = R.random_element(1500) sage: %time _ = f._mul_generic(g) CPU times: user 4.39 s, sys: 0.01 s, total: 4.40 s Wall time: 4.42 s sage: %time _ = f._mul_karatsuba(g) CPU times: user 5.63 s, sys: 0.04 s, total: 5.67 s Wall time: 5.73 s sage: S.<y> = QQ[] sage: f0 = S([f[i][0] for i in range(f.degree() + 1)]) sage: f1 = S([f[i][1] for i in range(f.degree() + 1)]) sage: g0 = S([g[i][0] for i in range(g.degree() + 1)]) sage: g1 = S([g[i][1] for i in range(g.degree() + 1)]) sage: timeit('f0 * g0  f1 * g1') 125 loops, best of 3: 2.76 ms per loop sage: timeit('f1 * g0 + f0 * g1') 125 loops, best of 3: 2.99 ms per loop
Here, the current code takes about 5s, whereas one can easily complete the computation in 6ms. This improvement is much better than the factor of 7 that your patch provides.
By the way, I believe a similar trick works for all polynomials over number fields in the case when the degree of the polynomial is typically larger than the degree of the number field.
In the opposite, when the degree of the number field is typically larger than the degree of the polynomials, I guess one would want to take a different approach.
I also wanted to compare the algorithm with singular multivariate multiplication which I think it is considered fairly good.
Hopefully, my code snippet above convinces you that the current
implementation of QQ[I]x?
isn't all that good.
Note that ZZ[x][y] are currently arrays of mpz_t elements.
I do not think this is true. ZZx?y? is different from
ZZ['x','y']. The latter is implemented as a sparse multivariate
polynomial ring. But the former is *not* implemented as a single
contiguous array of mpz_t
's but as a list of coefficients,
where each coefficient (i.e. polynomials in ZZx?) is again
implemented (in FLINT, this time) as an array of
mpz_t
's.
That is to say, here you have nested lists, which is not was I
meant: I meant a single contiguous array of
mpz_t
's. I'm
not saying one should do that in the generic case, but if one
really cares about dense twovariate polynomials over ZZ, this
approach should be favourable.
Anyway, all that said, let's be more positive again and look at your actual code:
 In
_mul_generic
, I think you should move some of the isolated and nonedge cases from TESTS to EXAMPLES. TESTS should then contains more bulk tests as suggested earlier on this ticket.
 In
_mul_generic
, your calls to
list()
will likely create new lists of objects. If they are guaranteed to be shallow copies of the elements, that is ok. This needs to be verified. If they are deep copies, however, this should be changed.
 The code in
do_karatsuba
is not very readable. I think this can easily be improved by removing all the comments from the function body  there are many more comment lines then actual code lines!  into a brief descriptive documentation in the function header.
Best wishes,
Sebastian
comment:13 followup: ↓ 15 Changed 10 years ago by
Replying to spancratz:
Hi Luis,
Ok, this is a bad example, I needed some rings that produced genuine Polynomial_generic_dense_field appart from number fields.
Ok I agree, but this ticket is not about polynomials for number fields, but multiplication of Polynomial_generic_dense_field elements.
Well, which base ring is it that you are interested in?
The fact that I am interested in polynomials over number fields does not invalidate the fact that the current generic karatsuba method must be changed because it is too inefficient. Do you think that mul_karatsuba is ok when it is slower for degree 1500 polynomials?
I don't really think there is all that much point in squeezing a small improvement out of the generic case when often much more is available, even more so when it is not clear that your patch improves the runtime across all base rings.
It does improve across all rings (or at least is not worse) because:
 It has less function calls, less slicing of lists and less if branches (low impact).
 It does not perform plenty of unnecessary additions of 0 + ModuleElement? (high impact over some base rings, like number fields or orders of number fields).
 User has control over the karatsuba threshold, this can speed up things in some cases. By default the threshold is the same as the previous implementation. So this is not worse than the previous method (quite ring dependent).
 The case of different sized polynomials is inefficient in the current implementation (high impact over every ring).
This last reason is very important. For instance, take a pair of polynomials over ZZ. One of degree twice the other. _mul_karatsuba is slower than _mul_generic independently of the degree. This is a bug in my opinion.
sage: K=ZZ[x] sage: f=K.random_element(4000) sage: g=K.random_element(2000) sage: h=K.random_element(1500) sage: %time _= f._mul_generic(g) CPU times: user 0.90 s, sys: 0.00 s, total: 0.90 s Wall time: 0.90 s sage: %time _= f._mul_karatsuba(g) CPU times: user 2.65 s, sys: 0.00 s, total: 2.66 s Wall time: 2.66 s sage: %time _= f._mul_karatsuba(h) CPU times: user 3.41 s, sys: 0.00 s, total: 3.41 s Wall time: 3.42 s
With patch:
sage: K=ZZ[x] sage: f=K.random_element(4000) sage: g=K.random_element(2000) sage: h=K.random_element(1500) sage: %time _=f._mul_generic(g) CPU times: user 0.90 s, sys: 0.00 s, total: 0.90 s Wall time: 0.91 s sage: %time _=f._mul_karatsuba(g) CPU times: user 0.24 s, sys: 0.00 s, total: 0.24 s Wall time: 0.24 s sage: %time _=f._mul_karatsuba(h) CPU times: user 0.25 s, sys: 0.00 s, total: 0.25 s Wall time: 0.25 s
I do not care if FLINT is lightning fast here. This example is just a symptom that _mul_karatsuba is not doing its job compared with _mul_generic. Moreover, multiplying a polynomial of degree 4000 and other of degree 1500 is much slower than multiplying a polynomial of degree 4000 and other of degree 2000. With the patch they cost around the same time, not something I am confortable with.
More examples for more rings, taking two polynomials. One of degree 27 and the other 31. The first line is the time taking with my patch, second line with SAGE 4.6.
ZZ[x] 625 loops, best of 3: 195 µs per loop 625 loops, best of 3: 435 µs per loop ZZ[I][x] 625 loops, best of 3: 1.05 ms per loop 5 loops, best of 3: 219 ms per loop ZZ[I,sqrt(2)][x] 25 loops, best of 3: 10.3 ms per loop 5 loops, best of 3: 317 ms per loop QQ[x] 625 loops, best of 3: 1 ms per loop 125 loops, best of 3: 1.93 ms per loop Integers(6)[x] 625 loops, best of 3: 263 µs per loop 625 loops, best of 3: 890 µs per loop ZZ[x]['y'] 625 loops, best of 3: 263 µs per loop 125 loops, best of 3: 5.06 ms per loop QuaternionAlgebra(1,1)[x] 125 loops, best of 3: 2.36 ms per loop 125 loops, best of 3: 4.43 ms per loop GF(2)[x].fraction_field()[y] 5 loops, best of 3: 44 ms per loop 5 loops, best of 3: 70.6 ms per loop QQ[x].fraction_field()['y'] 5 loops, best of 3: 229 ms per loop 5 loops, best of 3: 290 ms per loop GF(5)[x].quotient_ring(x^5x)[x] 5 loops, best of 3: 53.5 ms per loop 5 loops, best of 3: 80.6 ms per loop SR[x] 625 loops, best of 3: 1.26 ms per loop 125 loops, best of 3: 3.95 ms per loop QQ['x,y,z']['t'] 25 loops, best of 3: 25.7 ms per loop 25 loops, best of 3: 34.1 ms per loop GF(7^2,'a')[x] #Here the time is virtually the same due to the cost of creating the lists 5 loops, best of 3: 68.8 ms per loop 5 loops, best of 3: 71.3 ms per loop
 x200 faster for orders of number fields
 x30 for relative orders
 x2 faster for quaternion algebras
 x2 for the symbolic ring
I do not think that this is a small improvement.
By the way, perhaps other developers might disagree with me. Perhaps it would be a good idea for you to ask about this on sagedevel?
I'll do
Of course, these are just words... Here's an example:
sage: R.<x> = QQ[I][] sage: f = R.random_element(1500) sage: g = R.random_element(1500) sage: %time _ = f._mul_generic(g) CPU times: user 4.39 s, sys: 0.01 s, total: 4.40 s Wall time: 4.42 s sage: %time _ = f._mul_karatsuba(g) CPU times: user 5.63 s, sys: 0.04 s, total: 5.67 s Wall time: 5.73 s sage: S.<y> = QQ[] sage: f0 = S([f[i][0] for i in range(f.degree() + 1)]) sage: f1 = S([f[i][1] for i in range(f.degree() + 1)]) sage: g0 = S([g[i][0] for i in range(g.degree() + 1)]) sage: g1 = S([g[i][1] for i in range(g.degree() + 1)]) sage: timeit('f0 * g0  f1 * g1') 125 loops, best of 3: 2.76 ms per loop sage: timeit('f1 * g0 + f0 * g1') 125 loops, best of 3: 2.99 ms per loopHere, the current code takes about 5s, whereas one can easily complete the computation in 6ms. This improvement is much better than the factor of 7 that your patch provides.
Ok, but you are disregarding all other cases.
Anyway, all that said, let's be more positive again and look at your actual code:
 In
_mul_generic
, I think you should move some of the isolated and nonedge cases from TESTS to EXAMPLES. TESTS should then contains more bulk tests as suggested earlier on this ticket.
 In
_mul_generic
, your calls to
list()
will likely create new lists of objects. If they are guaranteed to be shallow copies of the elements, that is ok. This needs to be verified. If they are deep copies, however, this should be changed.
I have not really changed _mul_generic, only moved the code to avoid duplication of code. Calling list is already in the original code. list is also used for addition by the way. But I realize something, I have to take care of inline operations in the code. If base ring elements are inmutable (as I think they are expected to) this will just call noninlined operations. If they are mutable, multiplication of polynomials will change the input polynomials, which is wrong. I will change this.
 The code in
do_karatsuba
is not very readable. I think this can easily be improved by removing all the comments from the function body  there are many more comment lines then actual code lines!  into a brief descriptive documentation in the function header.
I will take a look,
Best regards,
Luis
comment:14 Changed 10 years ago by
For the number field specific case, see #10591
comment:15 in reply to: ↑ 13 Changed 10 years ago by
Hi Luis,
I am terribly sorry for not writing on this thread again any earlier; after returning to the UK following the Sage bug days in Seattle, I was busy catching up on some other things. Anyway...
Ok I agree, but this ticket is not about polynomials for number fields, but multiplication of Polynomial_generic_dense_field elements.
Well, which base ring is it that you are interested in?
The fact that I am interested in polynomials over number fields does not invalidate the fact that the current generic karatsuba method must be changed because it is too inefficient. Do you think that mul_karatsuba is ok when it is slower for degree 1500 polynomials?
Of course, it is great that (via this patch!) there might soon be some code which speeds up the generic implementation. But to be honest, personally I don't think the generic implementation is all that important; after all, it is only a fallback option used if noone has brought up enough energy to implement something more specific.
I do not care if FLINT is lightning fast here.
I feel this is a bit of a shame. Like rjf mentioned on sagedevel for many base rings you will be able to come up with something much, much faster than this small (by comparison!) improvement.
This example is just a symptom that _mul_karatsuba is not doing its job compared with _mul_generic. Moreover, multiplying a polynomial of degree 4000 and other of degree 1500 is much slower than multiplying a polynomial of degree 4000 and other of degree 2000.
Of course, you are right, this is ridiculous. Implicit zeropadding (or even explicit, if you assume only references are stored) should be nearly for free, so a 4000 by 1500 product shouldn't cost any noticeably more than a 4000 by 2000 product. (Of course, ideally it should cost less!)
 x200 faster for orders of number fields
 x30 for relative orders
 x2 faster for quaternion algebras
 x2 for the symbolic ring
I do not think that this is a small improvement.
Obviously, we can agree to disagree here. In absolute terms, all of the above improvements are fantastic! In relative terms, looking at what would be possible just by writing base ring specific code, however, I am confident you could do better.
Of course, these are just words... Here's an example:
sage: R.<x> = QQ[I][] sage: f = R.random_element(1500) sage: g = R.random_element(1500) sage: %time _ = f._mul_generic(g) CPU times: user 4.39 s, sys: 0.01 s, total: 4.40 s Wall time: 4.42 s sage: %time _ = f._mul_karatsuba(g) CPU times: user 5.63 s, sys: 0.04 s, total: 5.67 s Wall time: 5.73 s sage: S.<y> = QQ[] sage: f0 = S([f[i][0] for i in range(f.degree() + 1)]) sage: f1 = S([f[i][1] for i in range(f.degree() + 1)]) sage: g0 = S([g[i][0] for i in range(g.degree() + 1)]) sage: g1 = S([g[i][1] for i in range(g.degree() + 1)]) sage: timeit('f0 * g0  f1 * g1') 125 loops, best of 3: 2.76 ms per loop sage: timeit('f1 * g0 + f0 * g1') 125 loops, best of 3: 2.99 ms per loopOk, but you are disregarding all other cases.
Obviously, and deliberately so. But please let me know which ring you actually, mathematically care about. I would not be surprised if we could come up with something noticeably faster!
By the way, I am not sure whether anyone else is looking at this bit of code. While I can't promise how much spare time I will have in the next couple of weeks, I am happy to keep looking at this, run tests, give comments etc.
Best wishes,
Sebastian
comment:16 Changed 10 years ago by
Hi Sebastian,
I think that all is said, I understand your point of view and appreciate the interest you are showing in this ticket. But I also think that this ticket is necessary and that current behavior for different sized polynomials is a bug.
I am not currently working a lot on Sage, since I am overloaded with preexams and administrative duties, but I will try to finish with the cleaning of the code in the next two weeks.
Bests
Changed 10 years ago by
Changed 10 years ago by
comment:17 Changed 10 years ago by
just for the record, I attach two images showing some numbers.
comparison_product_50_400.png Shows the number of products made to multiply a fixed polynomial of degree 49 and another polynomial of degree from 0 to 399.
comparison_addition_50_400.png shows the number of additions performed by the same polynomials.
 In yellow using algorithm _mul_generic
 In red using current _mul_karatsuba as of 4.6.1
 in blue with this patch
We can appreciate that for a big difference in the degrees of the polynomials, current _mul_karatsuba needs more additions and more products than _mul_generic.
Changed 10 years ago by
comment:18 Changed 10 years ago by
 Status changed from new to needs_review
Apply: trac_karatsuba_improvements.2.patch
Changed 10 years ago by
comment:20 Changed 8 years ago by
 Description modified (diff)
 Summary changed from improve karatsuba multiplication of univariate polynomials to Correct karatsuba multiplication of univariate polynomials for different degree polynomials
 Type changed from enhancement to defect
Reflect better the nature of the problem in the description.
comment:21 Changed 8 years ago by
 Description modified (diff)
Apply: trac_10255_karatsuba_improbement.patch
comment:22 Changed 8 years ago by
 Milestone changed from sage5.11 to sage5.12
comment:23 Changed 7 years ago by
Apply: trac_10255_karatsuba_improbement.patch
comment:24 Changed 7 years ago by
 Branch set to u/lftabera/ticket/10255
comment:25 Changed 7 years ago by
 Commit set to 4d08ddff868a930de116c31163af910ea33b3d7d
Branch pushed to git repo; I updated commit sha1. New commits:
4d08ddf  #10255 Improve karatsuba multiplication.

comment:26 Changed 7 years ago by
 Branch changed from u/lftabera/ticket/10255 to u/mmezzarobba/10255karatsuba
 Commit changed from 4d08ddff868a930de116c31163af910ea33b3d7d to f6cd17edfaaa9ef285174579417855ebae41ae73
 Reviewers set to Marc Mezzarobba
I for one believe that having a reasonable generic multiplication is important, no matter how much faster specialized implementations can be.
The implementation in this ticket is a significant improvement over the existing one. The code looks fine to me and works well on all examples I tried.
(There are still, e.g., a few list copies that would be nice to avoid, but this does not look easy to do without changes to the notsouniform interface of polynomials. In any case, this is not a reason to further delay this ticket IMO. Same goes for the other improvements I can think of.)
A few minor points though, all related to documentation and tests:
 Shouldn't the test in the docstring of
do_karatsuba_different_size
readsage: K = ZZ[x] sage: f = K.random_element(21) # instead of 28 sage: g = K.random_element(34) sage: f*g  f._mul_karatsuba(g,0) 0
in accordance with the comment about Fibonacci numbers?  The tests that call
random_element()
should probably usesage.misc.random_testing
.  I pushed a commit with minor changes to some docstrings and comments (no code change). Could you please have a look and tell me if you agree with my changes?
New commits:
f6cd17e  Improved Karatsuba: cosmetic changes to docstrings and comments

comment:27 Changed 7 years ago by
 Branch changed from u/mmezzarobba/10255karatsuba to u/lftabera/ticket/10255
 Commit changed from f6cd17edfaaa9ef285174579417855ebae41ae73 to 15dabc332b766acd7adffad8cfc1371653037831
Marc,
Thanks a lot for taking a look into this and working on improving the documentation. I agree with your changes. I have also changed the bogus nonfibnacci degree.
Concerning the random tests, I have moved the comparison of the result with external libraries to rings/tests.py, but I have left the rest of the tests in place. Do you agree with this changes?
comment:28 Changed 7 years ago by
 Branch changed from u/lftabera/ticket/10255 to u/mmezzarobba/10255karatsuba
 Commit changed from 15dabc332b766acd7adffad8cfc1371653037831 to 597747007214bdb661d112296f54a77989aa2a01
Hi,
I mostly agree with your changes, but I think randomized tests over other rings than ZZ
would not be a luxury. So I modified test_karatsuba_multiplication
to take the base ring as input. Can you have a look at my patch?
Most importantly, your test used to compare g*f
to f._mul_karatsuba(g, threshold)
. I changed g*f
to (essentially) f*g
in order to handle noncommutative base rings. But perhaps you wrote g*f
deliberately? I also made the degree bound passed to R.random_element
random. And I changed ZZ['x']
to PolynomialRing(base_ring, 'x')
because some rings currently don't support the former syntax.
If you are ok with my changes, please go on and set the ticket to positive_review.
New commits:
5977470  Improved Karatsuba: more thorough tests

comment:29 Changed 7 years ago by
 Reviewers changed from Marc Mezzarobba to Marc Mezzarobba, Luis Felipe Tabera Alonso
 Status changed from needs_review to positive_review
The code looks good to me. I wrote f._mul_karatsuba thinking on testing commutativity, but now I realize that assuming that the other library I am testing against is correct, it does not make much sense.
comment:30 Changed 7 years ago by
 Resolution set to fixed
 Status changed from positive_review to closed
Same example with my patch. Pure karatsuba:
TODO:
Take a more careful look to the correctness of code. Check that the changes do not introduce performance penalties on a wider set of rings, degrees and size of polynomials. Introduce a userdefinable threshold (with a sensible default for most common rings) for which karatsuba falls back to classic multiplication.