Opened 10 years ago
Closed 10 years ago
#11610 closed enhancement (fixed)
Reduce memory consumption of generic Strassen-Winograd implementation
Reported by: | SimonKing | Owned by: | jason, was |
---|---|---|---|
Priority: | major | Milestone: | sage-4.7.2 |
Component: | linear algebra | Keywords: | Strassen Winograd matrix multiplication |
Cc: | hedtke | Merged in: | sage-4.7.2.alpha2 |
Authors: | Simon King | Reviewers: | Ivo Hedtke |
Report Upstream: | N/A | Work issues: | |
Branch: | Commit: | ||
Dependencies: | Stopgaps: |
Description (last modified by )
sage.matrix.strassen provides a framework for implementing fast matrix multiplication à la Strassen-Winograd.
In the current implementation, the number of arithmetic operations per iteration is quite alright: 7 matrix multiplications and 15 additions/subtractions. However, there is a lot of temporary memory allocated; that's bad, both for the memory consumption and the computation time.
I am well aware that, even though Strassen-Winograd is asymptotically fast and practically a lot better than the default multiplication of dense matrices over GF(n) (n not prime) in Sage, the additional memory consumption can be a real problem. So, I am not suggesting to use Strassen-Winograd by default for Matrix_modn_dense
.
However, I believe that it would be good to make the generic Strassen-Winograd implementation more useful.
Ivo Hedtke pointed to "Boyer, Dumans, Pernet and Zhou: Memory efficient scheduling of Strassen-Winograd's matrix multiplication algorithm. International Symposium on Symbolic and Algebraic Computation 2009." They showed that in fact by improving the scheduling, no temporary memory needs to be allocated. The price to pay is that the input matrices are destroyed.
However, there is a schedule, apparently due to an article of Douglas-Heroux-Slishman-Smith (but also appearing in the Boyer-Dumans-Pernet-Zhou article) that preserves the input matrices but requiress only rather little additional temporary memory. The attached patch implements that non-destructive schedule.
Here is a small benchmark: Multiplication of two 2000x2000 matrices over GF(125), using Strassen-Winograd with a cutoff of 20. I measure the memory consumption during computation using "top", and express it in "percentage of my computer's memory".
Without any patches:
12.3% memory, 437.50 s
With the new patch:
7.2% memory, 439.92 s
With #11589 and the new patch:
6.9% memory, 425.28 s
The comparison with the default multiplication in this case is quite impressive: Even with #11589, one obtains
6.3% memory, 903.98 s (NO Strassen-Winograd)
I did some smaller examples to determine a good cutoff - it seems to me that something between 15 and 25 is optimal on my machine and using sage.matrix.matrix_modn_dense.Matrix_modn_dense
. Note that MeatAxe
only needs 12 seconds even without Strassen-Winograd, and Magma even needs half of that.
Attachments (1)
Change History (5)
Changed 10 years ago by
comment:1 Changed 10 years ago by
- Description modified (diff)
- Status changed from new to needs_review
comment:2 Changed 10 years ago by
- Reviewers set to hedtke
- Status changed from needs_review to positive_review
First of all: A nive piece of work. Thank you Simon!
- OK: Compiles without any problems or warnings on my 64-bit 4.7 sage on my Mac 10.6.8.
- OK: Made a little test with 10000 random matrices. result is always correct ;-)
- OK: doctests in the directory sage/matrix are ok, "All tests passed!"
- OK: long doctests ok, "All tests passed!"
- OK: no changes in the documentation, therfore nothing to check
- OK: doctest coverage 100%
- OK: docbuild ok
- OK: code is well written and documented.
- OK: source (paper) for the method is given.
- OK: the implemented method is the same method as in the paper
- Memory Requirements (measured with XCode Instruments): 1000x1000 GF(125) matrices
- with patch: 197.11 MB (93.97 seconds)
- without patch: 836.12 MB (98.49 seconds)
comment:3 Changed 10 years ago by
- Reviewers changed from hedtke to Ivo Hedtke
comment:4 Changed 10 years ago by
- Merged in set to sage-4.7.2.alpha2
- Resolution set to fixed
- Status changed from positive_review to closed
Improve memory efficiency of generic Strassen-Winograd implementation