#27444 Expose multithreaded fflas-ffpack features
Authors:  Hongguang Zhu, Clément Pernet  Reviewers:  Luca De Feo 
Description
Dense linear algebra mod small p uses fflas-ffpack over float or double. This library supports multithreading for some routines:
 matrix-matrix multiplication
 PLUQ decomposition
 etc.
This ticket proposes to expose these parallel variants to Sage users.
comment:9 Changed 3 years ago by
For the record: matrix multiplication over a finite field works in parallel. The following experiment was run on a 12-core Intel Skylake-X server:
pernet@retourdest:~/soft/sage$ export OMP_NUM_THREADS=12; ./sage
┌────────────────────────────────────────────────────────────────────┐
│ SageMath version 8.7.beta7, Release Date: 2019-03-10               │
│ Using Python 2.7.15. Type "help()" for help.                       │
└────────────────────────────────────────────────────────────────────┘
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃ Warning: this is a prerelease version, and it may be unstable.     ┃
┗━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┛
sage: a=random_matrix(GF(11),10000)
sage: time b=a*a
CPU times: user 23.3 s, sys: 405 ms, total: 23.7 s
Wall time: 3.3 s
while on the same machine, on master we get:
sage: a=random_matrix(GF(11),10000)
sage: time b=a*a
CPU times: user 18.4 s, sys: 211 ms, total: 18.6 s
Wall time: 18.7 s
The speedup of roughly 6 on a 12-core machine is below what standalone fflas-ffpack achieves. We need to investigate the reasons: alignment, Strassen-Winograd threshold, NUMA placement, etc.
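For reference, the speedup and parallel efficiency implied by the two wall times above can be worked out in a few lines. This is only a sketch: the function names are illustrative, and the inputs are the wall-clock figures quoted in the two sessions.

```python
# Wall-clock times copied from the two sessions above (seconds).
wall_serial = 18.7    # master, single-threaded product
wall_parallel = 3.3   # branch, OMP_NUM_THREADS=12
cores = 12


def speedup(t_serial, t_parallel):
    """Ratio of serial to parallel wall time."""
    return t_serial / t_parallel


def efficiency(t_serial, t_parallel, n_cores):
    """Speedup divided by the number of cores (1.0 would be ideal scaling)."""
    return speedup(t_serial, t_parallel) / n_cores


s = speedup(wall_serial, wall_parallel)
e = efficiency(wall_serial, wall_parallel, cores)
print(f"speedup: {s:.2f}, efficiency: {e:.0%}")
```

With these numbers the efficiency comes out just under one half, which is the gap the investigation would need to explain.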
I confirm that the branch successfully parallelizes det, rank and echelonize. However, in the current version of the branch, the matrix product is no longer parallelized. Could you revert the call to pfgemm?
comment:24 followup: ↓ 27 Changed 3 years ago by
I'm curious about these:
if m*n > 100000: sig_on()
Any reason not to always call sig_on()?
Docs need updating.
EXAMPLES:
The number of processes is initialized to 1 (no parallelization) for each field (only tensor computations are implemented at the moment):
Also, it would be good to document the availability of parallelism in the matrix reference, as is done in http://doc.sagemath.org/html/en/reference/tensor_free_modules/sage/tensor/modules/free_module_tensor.html#sage.tensor.modules.free_module_tensor.FreeModuleTensor.disp
comment:27 in reply to: ↑ 24 Changed 3 years ago by
Replying to defeo:
I'm curious about these:
if m*n > 100000: sig_on()
Any reason not to always call sig_on()?
I guess the reason is to avoid the overhead of sig_on() when the call is on a tiny instance, which runs almost instantly and therefore has no chance of being interrupted anyway.
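To make the pattern concrete, here is a minimal pure-Python sketch of a size-thresholded guard. It is not the code from the branch: interrupt_guard and guarded are hypothetical names, and the context manager only records that it ran, standing in for a real sig_on()/sig_off() pair in Cython. The cutoff is the one from the snippet quoted above.

```python
from contextlib import contextmanager, nullcontext

THRESHOLD = 100000  # cutoff taken from the quoted "m*n > 100000" check

guard_calls = []  # records every entry and exit of the stand-in guard


@contextmanager
def interrupt_guard():
    """Hypothetical stand-in for sig_on()/sig_off(); it only logs its use."""
    guard_calls.append("on")
    try:
        yield
    finally:
        guard_calls.append("off")


def guarded(m, n, work):
    """Run work() under the guard only when the m x n instance is large."""
    ctx = interrupt_guard() if m * n > THRESHOLD else nullcontext()
    with ctx:
        return work()


small = guarded(10, 10, lambda: "small done")      # below the cutoff: no guard
large = guarded(1000, 1000, lambda: "large done")  # above the cutoff: guarded
```

The trade-off is that small instances skip the setup cost entirely, at the price of not being interruptible for the short time they run.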
@defeo: Doc fixed in the above commit. Should be good to go now.
The ticket has no authors?
I don't think this ticket should have removed --disable-openmp from the fflas_ffpack build without mention, because now it requires OpenMP support in any modules which link to it, and this doesn't seem to work properly.
