Sage: Ticket #12177: Prime slicing for matrix multiplication
https://trac.sagemath.org/ticket/12177
<p>
In Sage, matrix arithmetic over finite fields is fast in the following cases:
</p>
<ul><li>Prime fields, using linbox (<a class="closed ticket" href="https://trac.sagemath.org/ticket/4260" title="enhancement: use LinBox as native matrix representation for dense matrices over GF(p) (closed: fixed)">#4260</a>) resp. <code>M4RI</code> over <code>GF(2)</code>
</li><li><code>GF(2^e)</code>, using <code>M4RIE</code> (<a class="closed ticket" href="https://trac.sagemath.org/ticket/9562" title="enhancement: Add M4RIE to Sage (closed: fixed)">#9562</a>)
</li></ul><p>
In all other cases, it sucks. There is the suggestion to use a wrapper of a fork of <code>C-MeatAxe</code> (<a class="closed ticket" href="https://trac.sagemath.org/ticket/12103" title="defect: Use MeatAxe as an optional back end for dense matrices over `GF(p^n)`, ... (closed: fixed)">#12103</a>), but this would only work up to field size 255 and might have further disadvantages.
</p>
<p>
Martin Albrecht suggested to use "prime slicing" instead:
</p>
<ul><li>Represent matrices over <code>GF(p^n)</code> by a list of n matrices over <code>GF(p)</code> (with Linbox as backend)
</li><li>Matrix multiplication over <code>GF(p^n)</code> is implemented by a series of multiplications over <code>GF(p)</code>
</li><li>Karatsuba type formulae reduce the number of "prime" multiplications involved.
</li></ul><p>
On <a class="ext-link" href="http://groups.google.com/group/sage-devel/browse_thread/thread/8a988a2f5d997a5a/ebcde393f8ef5240"><span class="icon"></span>sage-devel</a>, Martin gave a couple of references:
</p>
<ul><li><a class="ext-link" href="http://www.google.com/url?sa=D&q=http://arxiv.org/abs/0901.1413&usg=AFQjCNFg5cUrXjBTqkon1V6n4Yl_CVQ9yA"><span class="icon"></span>Boothby and Bradshaw</a> Bitslicing and the Method of Four Russians Over Larger Finite Fields
</li><li><a class="ext-link" href="http://bit.ly/r5PVgv"><span class="icon"></span>Montgomery</a> Five, Six, and Seven-Term Karatsuba-Like Formulae
</li></ul><p>
On another occasion, Martin also pointed out how to compute echelon forms in that setting.
</p>
<p>
The purpose of this ticket is to make that idea real.
</p>
<p>
<strong>TODO</strong>
</p>
<ul><li>addition, subtraction (easy)
</li><li>scalar multiplication (medium)
</li><li>row swaps, column swaps (easy)
</li><li>C = C + A*B (easy)
</li><li>fast randomize (for testing, easy)
</li><li>apply LAPACK permutations (easy)
</li><li>matrix windows (cheap submatrices, arder)
</li><li>TRSM lower left / TRSM upper left (medium)
</li><li>PLE (medium)
</li><li>Karatsuba-like for up to degree 10 (harder)
</li></ul>en-usSagehttps://trac.sagemath.org/chrome/site/logo_sagemath_trac.png
https://trac.sagemath.org/ticket/12177
Trac 1.1.6SimonKingSun, 18 Dec 2011 10:36:03 GMTdescription changed
https://trac.sagemath.org/ticket/12177#comment:1
https://trac.sagemath.org/ticket/12177#comment:1
<ul>
<li><strong>description</strong>
modified (<a href="/ticket/12177?action=diff&version=1">diff</a>)
</li>
</ul>
TicketmalbSun, 18 Dec 2011 12:55:39 GMT
https://trac.sagemath.org/ticket/12177#comment:2
https://trac.sagemath.org/ticket/12177#comment:2
<p>
Hi Simon, great that you're getting the ball rolling. Two comments:
</p>
<p>
(a) I think the this code shouldn't be in Sage but on a lower level, preferably LinBox. But perhaps we can write proof of concept in Sage first and then port to C++?
</p>
<p>
(b)I figure we should get an idea about what performance to expect, so:
</p>
<div class="wiki-code"><div class="code"><pre>p <span class="o">=</span> <span class="mi">17</span>
K<span class="o">.<</span>a<span class="o">></span> <span class="o">=</span> GF<span class="p">(</span>p<span class="o">^</span><span class="mi">2</span><span class="p">)</span>
A <span class="o">=</span> <span class="p">[</span>random_matrix<span class="p">(</span>GF<span class="p">(</span>p<span class="p">),</span><span class="mi">2000</span><span class="p">,</span><span class="mi">2000</span><span class="p">)</span> <span class="k">for</span> _ <span class="ow">in</span> <span class="nb">range</span><span class="p">(</span><span class="mi">2</span><span class="p">)]</span>
B <span class="o">=</span> <span class="p">[</span>random_matrix<span class="p">(</span>GF<span class="p">(</span>p<span class="p">),</span><span class="mi">2000</span><span class="p">,</span><span class="mi">2000</span><span class="p">)</span> <span class="k">for</span> _ <span class="ow">in</span> <span class="nb">range</span><span class="p">(</span><span class="mi">2</span><span class="p">)]</span>
C <span class="o">=</span> <span class="p">[</span>matrix<span class="p">(</span>GF<span class="p">(</span>p<span class="p">),</span><span class="mi">2000</span><span class="p">,</span><span class="mi">2000</span><span class="p">)</span> <span class="k">for</span> _ <span class="ow">in</span> <span class="nb">range</span><span class="p">(</span><span class="mi">2</span><span class="p">)]</span>
<span class="k">def</span> <span class="nf">mul2</span><span class="p">(</span>K<span class="p">,</span> C<span class="p">,</span>A<span class="p">,</span>B<span class="p">):</span>
poly <span class="o">=</span> K<span class="o">.</span>polynomial<span class="p">()</span>
T0 <span class="o">=</span> A<span class="p">[</span><span class="mi">0</span><span class="p">]</span><span class="o">*</span>B<span class="p">[</span><span class="mi">0</span><span class="p">]</span>
C<span class="p">[</span><span class="mi">0</span><span class="p">]</span> <span class="o">+=</span> T0
C<span class="p">[</span><span class="mi">1</span><span class="p">]</span> <span class="o">-=</span> T0
C<span class="p">[</span><span class="mi">1</span><span class="p">]</span> <span class="o">+=</span> <span class="p">(</span>A<span class="p">[</span><span class="mi">0</span><span class="p">]</span> <span class="o">+</span> A<span class="p">[</span><span class="mi">1</span><span class="p">])</span><span class="o">*</span><span class="p">(</span>B<span class="p">[</span><span class="mi">0</span><span class="p">]</span><span class="o">+</span>B<span class="p">[</span><span class="mi">1</span><span class="p">])</span>
T0 <span class="o">=</span> A<span class="p">[</span><span class="mi">1</span><span class="p">]</span><span class="o">*</span>B<span class="p">[</span><span class="mi">1</span><span class="p">]</span>
C<span class="p">[</span><span class="mi">1</span><span class="p">]</span> <span class="o">-=</span> T0
<span class="k">for</span> i<span class="p">,</span>c <span class="ow">in</span> <span class="nb">enumerate</span><span class="p">(</span><span class="nb">list</span><span class="p">(</span>poly<span class="p">)):</span>
<span class="k">if</span> i <span class="o">==</span> <span class="mi">2</span><span class="p">:</span>
<span class="k">break</span>
C<span class="p">[</span>i<span class="p">]</span> <span class="o">-=</span> c<span class="o">*</span>T0
<span class="k">return</span> C
</pre></div></div><p>
mul2 (I didn't check for bugs) with these inputs takes about 3 seconds on my computer, which was to be expected since a product over GF(p) takes about 1 second on my computer. For comparison Magma is a bit better:
</p>
<pre class="wiki">Magma V2.15-10 Sun Dec 18 2011 12:53:05 on road [Seed = 1273554698]
Type ? for help. Type <Ctrl>-D to quit.
> K:=GF(17^2);
> A:=RandomMatrix(K,2000,2000);
> B:=RandomMatrix(K,2000,2000);
> time C:=A*B;
Time: 2.320
</pre><p>
I'm not sure we can do much about it, since the performance essentially depends on the speed of mod-p matrices.
</p>
TicketSimonKingSun, 18 Dec 2011 15:42:23 GMT
https://trac.sagemath.org/ticket/12177#comment:3
https://trac.sagemath.org/ticket/12177#comment:3
<p>
I think we have to do two things:
</p>
<blockquote>
<p>
#. Have some code that deals with lists of matrices. I understand that Burcin has such code.
#. Find a way to find a Karatsuba type formula that reduces the number of mod-p multiplications.
</p>
</blockquote>
<p>
I think the second point should actually done not by Karatsuba, but by a modification of Level-n Toom multiplication (where n is the degree of our field extension). See <a class="ext-link" href="http://en.wikipedia.org/wiki/Toom%E2%80%93Cook_multiplication"><span class="icon"></span>Wikipedia</a> for a method to compute the level-n formula.
</p>
<p>
Ideally, we would also merge the defining polynomial of our field extension into the Toom formula.
</p>
TicketSimonKingSun, 18 Dec 2011 15:54:19 GMT
https://trac.sagemath.org/ticket/12177#comment:4
https://trac.sagemath.org/ticket/12177#comment:4
<p>
Here are some thoughts - I am not sure if this is all correct, but I think it is a starting point:
</p>
<p>
Elements of <code>K=GF(p^n)</code> are represented by polynomials of degree <code>n-1</code> over <code>k=GF(p)</code>.
</p>
<p>
Assumption: <code>p>>n</code>.
</p>
<p>
Each polynomial of degree <code>n-1</code> (hence, any element of K) is determined by evaluation at n points. When multipliying two polynomials then (modulo the defining polynomial of K) they also give rise to a polynomial of degree <code>n-1</code>. Hence, again, n evaluation points are enough.
</p>
<p>
I am not sure, though, whether we can choose <em>any</em> n-tuple of elements of k.
</p>
<p>
First step:
</p>
<p>
Choose n points from k. Evalutation of a generic polynomial of degree n-1 gives rise to n linear combinations of the polynomial's coefficients, described by some invertible square matrix. Let A be the inverse of that matrix.
</p>
<p>
Multiplying two polynomials of degree n modulo the defining polynomial now means: Evaluate the two factors at the given n points. For each point, multiply the two evaluations. Multiply this with A, and read off the coefficients of the product.
</p>
<p>
So, this can be used for our polynomials of matrices. If my arguments work, then we would be down to n multiplications over GF(p) for one multiplication over <code>GF(p^n)</code>. And comparing the figures of Linbox over <code>GF(17)</code> and Magma over <code>GF(17^2)</code>, we would be competitive with Magma.
</p>
TicketSimonKingSun, 18 Dec 2011 16:21:02 GMT
https://trac.sagemath.org/ticket/12177#comment:5
https://trac.sagemath.org/ticket/12177#comment:5
<p>
Sorry, in my previous post, I totally forgot to include "reduction modulo defining polynomial". Anyway, I am now trying to produce some code.
</p>
TicketmalbMon, 19 Dec 2011 14:54:48 GMTattachment set
https://trac.sagemath.org/ticket/12177
https://trac.sagemath.org/ticket/12177
<ul>
<li><strong>attachment</strong>
set to <em>toom.py</em>
</li>
</ul>
<p>
proof of concept
</p>
TicketmalbMon, 19 Dec 2011 14:55:26 GMT
https://trac.sagemath.org/ticket/12177#comment:6
https://trac.sagemath.org/ticket/12177#comment:6
<p>
I attached a proof of concept that the Toom thingy works.
</p>
TicketburcinMon, 19 Dec 2011 15:56:27 GMTmilestone changed
https://trac.sagemath.org/ticket/12177#comment:7
https://trac.sagemath.org/ticket/12177#comment:7
<ul>
<li><strong>milestone</strong>
changed from <em>sage-4.8</em> to <em>sage-feature</em>
</li>
</ul>
<p>
I attached a patch with template classes implementing this polynomial with matrix coefficients representation. There is also an implementation of the naive multiplication algorithm for GF(p<sup>k</sup>) to demonstrate FFLAS calls.
</p>
<p>
Timings for the naive multiplication from Martin's example above (<a class="ticket" href="https://trac.sagemath.org/ticket/12177#comment:2" title="Comment 2">comment:2</a>) for p=17 are:
</p>
<pre class="wiki">k time
2 4.51
3 10.28
4 19.17
</pre><p>
That's n<sup>2</sup> coefficient matrix multiplications with some overhead to handle the rollover with the minimal polynomial. We should look at a better algorithm. :)
</p>
TicketSimonKingMon, 19 Dec 2011 16:02:19 GMT
https://trac.sagemath.org/ticket/12177#comment:8
https://trac.sagemath.org/ticket/12177#comment:8
<p>
Replying to <a class="ticket" href="https://trac.sagemath.org/ticket/12177#comment:7" title="Comment 7">burcin</a>:
</p>
<blockquote class="citation">
<p>
That's n<sup>2</sup> coefficient matrix multiplications with some overhead to handle the rollover with the minimal polynomial. We should look at a better algorithm. :)
</p>
</blockquote>
<p>
... which probably is Toom. Martin, do you have code for Toom multiplication?
</p>
TicketmalbMon, 19 Dec 2011 17:12:27 GMT
https://trac.sagemath.org/ticket/12177#comment:9
https://trac.sagemath.org/ticket/12177#comment:9
<p>
I've attached it, but it's going to be slow if you try it because a*A for a in the field there is no special code in Sage yet.
</p>
TicketSimonKingTue, 20 Dec 2011 14:15:25 GMTattachment set
https://trac.sagemath.org/ticket/12177
https://trac.sagemath.org/ticket/12177
<ul>
<li><strong>attachment</strong>
set to <em>toom_matrix.py</em>
</li>
</ul>
<p>
Another proof of concept
</p>
TicketSimonKingTue, 20 Dec 2011 14:16:32 GMT
https://trac.sagemath.org/ticket/12177#comment:10
https://trac.sagemath.org/ticket/12177#comment:10
<p>
Martin, in your Toom proof of concept, aren't you evaluating at too few points? We start with polynomials of degree less than the degree of the field extension. But the product will have (before reduction) twice that degree.
</p>
<p>
Hence, instead of <code>FWD = Matrix(K, l, l, [K(i**j) for i in range(l) for j in range(l)])</code>, I think we need
</p>
<pre class="wiki"> FWD = Matrix(K, l, l, [K(i**j) for i in range(l) for j in range(2*l-1)])
</pre><p>
The following lines are to be changed accordingly.
</p>
<p>
We could of course include the reduction to degree l-1 into the matrix <code>BCK</code>. I did something along these lines in <a class="attachment" href="https://trac.sagemath.org/attachment/ticket/12177/toom_matrix.py" title="Attachment 'toom_matrix.py' in Ticket #12177">toom_matrix.py</a><a class="trac-rawlink" href="https://trac.sagemath.org/raw-attachment/ticket/12177/toom_matrix.py" title="Download"></a>, which returns the equivalent of <code>FWD</code> and <code>BCK</code> - perhaps you can plug it into your code?
</p>
TicketmalbTue, 20 Dec 2011 14:28:12 GMT
https://trac.sagemath.org/ticket/12177#comment:11
https://trac.sagemath.org/ticket/12177#comment:11
<p>
Simon, <code>l = len(A)+len(B)-1</code> i.e., l is *not* the degree of the inputs. Merging the modular reduction with the interpolation might be a good idea though.
</p>
TicketSimonKingTue, 20 Dec 2011 14:34:07 GMT
https://trac.sagemath.org/ticket/12177#comment:12
https://trac.sagemath.org/ticket/12177#comment:12
<p>
Replying to <a class="ticket" href="https://trac.sagemath.org/ticket/12177#comment:11" title="Comment 11">malb</a>:
</p>
<blockquote class="citation">
<p>
Simon, <code>l = len(A)+len(B)-1</code> i.e., l is *not* the degree of the inputs. Merging the modular reduction with the interpolation might be a good idea though.
</p>
</blockquote>
<p>
Sorry! I was somehow misreading it as <code>l = len(A)</code> - what happened to my eyes?
</p>
TicketburcinWed, 21 Dec 2011 11:58:39 GMTattachment set
https://trac.sagemath.org/ticket/12177
https://trac.sagemath.org/ticket/12177
<ul>
<li><strong>attachment</strong>
set to <em>trac_12177-coeff_matrices_template.patch</em>
</li>
</ul>
TicketburcinWed, 21 Dec 2011 12:01:38 GMT
https://trac.sagemath.org/ticket/12177#comment:13
https://trac.sagemath.org/ticket/12177#comment:13
<p>
I refreshed my patch with a version that implements the multi point evaluation approach using FFLAS. It can be reached via the <code>_multiply_toom()</code> function:
</p>
<pre class="wiki">sage: K.<a> = GF(17^6)
sage: MS = MatrixSpace(K, 2000, 2000)
sage: from sage.matrix.matrix_modq_dense_float import Matrix_modq_dense_float
sage: M = Matrix_modq_dense_float(MS, a^5)
sage: res = M._multiply_toom(M)
<some debugging output>
</pre>
TicketmalbWed, 21 Dec 2011 19:33:43 GMTdescription changed
https://trac.sagemath.org/ticket/12177#comment:14
https://trac.sagemath.org/ticket/12177#comment:14
<ul>
<li><strong>description</strong>
modified (<a href="/ticket/12177?action=diff&version=14">diff</a>)
</li>
</ul>
TicketkedlayaWed, 17 Aug 2016 01:08:37 GMT
https://trac.sagemath.org/ticket/12177#comment:15
https://trac.sagemath.org/ticket/12177#comment:15
<p>
See <a class="new ticket" href="https://trac.sagemath.org/ticket/9888" title="defect: matrix multiplication over integer mod ring is slow (new)">#9888</a> for a related discussion.
</p>
Ticket