#27444 closed enhancement (fixed)
Expose multithreaded fflasffpack features
Reported by:  ghZHG2017  Owned by:  

Priority:  major  Milestone:  sage8.9 
Component:  packages: standard  Keywords:  
Cc:  cpernet  Merged in:  
Authors:  Hongguang Zhu, Clément Pernet  Reviewers:  Luca De Feo 
Report Upstream:  N/A  Work issues:  
Branch:  e432cb0 (Commits, GitHub, GitLab)  Commit:  
Dependencies:  #26932  Stopgaps: 
Description
Dense linear algebra mod small p uses fflasffpack over float or double. This library support multithreading for some routines:
 matrixmatrix multiplication
 PLUQ decomposition
 etc.
This ticket proposes to expose these parallel variants to Sage user.
Change History (35)
comment:1 Changed 3 years ago by
 Branch set to u/ghZHG2017/expose_multithreaded_fflas_ffpack_features
comment:2 Changed 3 years ago by
 Commit set to e7f4be4220242e4c70faf180912d6a580a30624b
comment:3 Changed 3 years ago by
 Commit changed from e7f4be4220242e4c70faf180912d6a580a30624b to 22e592e5cdae3c12bf45886aaa6194928c9e7c87
Branch pushed to git repo; I updated commit sha1. New commits:
22e592e  fixing indents and templated parallel helper

comment:4 Changed 3 years ago by
 Commit changed from 22e592e5cdae3c12bf45886aaa6194928c9e7c87 to eae88f5f92c812f68381846dbeba5fbbf38cd8ad
Branch pushed to git repo; I updated commit sha1. New commits:
eae88f5  fgemm aggregated into SageMaths with no more compile time error but not yet find out how to run it with more threads

comment:5 Changed 3 years ago by
 Commit changed from eae88f5f92c812f68381846dbeba5fbbf38cd8ad to 01de1e16a01f54a6bcfed76e6c6b39e588e954a5
Branch pushed to git repo; I updated commit sha1. New commits:
01de1e1  Compile and link flags added for enable openmp but no multithreading found

comment:6 Changed 3 years ago by
 Commit changed from 01de1e16a01f54a6bcfed76e6c6b39e588e954a5 to 86c203840a152fac507804c138348cca550c451e
Branch pushed to git repo; I updated commit sha1. New commits:
fff3992  remove MMHelperAlgo

d81e3c1  Merge branch 'u/ghZHG2017/expose_multithreaded_fflas_ffpack_features' of git://trac.sagemath.org/sage into t/27444/expose_multithreaded_fflas_ffpack_features

86c2038  parallel fgemm working by patching fflasffpack

comment:7 Changed 3 years ago by
 Commit changed from 86c203840a152fac507804c138348cca550c451e to 449b103c8dadc063933a40cae67ee0f9a379cb2f
Branch pushed to git repo; I updated commit sha1. New commits:
449b103  fix const error

comment:8 Changed 3 years ago by
 Commit changed from 449b103c8dadc063933a40cae67ee0f9a379cb2f to 397c10854ad651aad31f8b47043c04d289c7bc1b
Branch pushed to git repo; I updated commit sha1. New commits:
397c108  fix error

comment:9 Changed 3 years ago by
For the record: matrix multiplication over finite field works in parallel. The following experiment is on a 12 cores Intel skylakex server:
pernet@retourdest:~/soft/sage$ export OMP_NUM_THREADS=12;./sage ┌────────────────────────────────────────────────────────────────────┐ │ SageMath version 8.7.beta7, Release Date: 20190310 │ │ Using Python 2.7.15. Type "help()" for help. │ └────────────────────────────────────────────────────────────────────┘ ┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓ ┃ Warning: this is a prerelease version, and it may be unstable. ┃ ┗━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┛ sage: a=random_matrix(GF(11),10000) sage: time b=a*a CPU times: user 23.3 s, sys: 405 ms, total: 23.7 s Wall time: 3.3 s
while on the same machine, on master we get:
sage: a=random_matrix(GF(11),10000) sage: time b=a*a CPU times: user 18.4 s, sys: 211 ms, total: 18.6 s Wall time: 18.7 s
The speedup of 6 for a 12 core machine is below what a selfstanding fflasffpack achieves. We need to investigate the reasons: alignment, strassenwinograd threshold, numa placement, etc.
comment:10 Changed 3 years ago by
 Milestone changed from sage8.7 to sage8.8
Ticket retargeted after milestone closed (if you don't believe this ticket is appropriate for the Sage 8.8 release please retarget manually)
comment:11 Changed 3 years ago by
 Commit changed from 397c10854ad651aad31f8b47043c04d289c7bc1b to 58a5309f4c3da394afa1dae95cac918d9d940262
Branch pushed to git repo; I updated commit sha1. New commits:
58a5309  Using Sage Parallelism() to let user choose number of threads for fgemm

comment:12 Changed 3 years ago by
 Commit changed from 58a5309f4c3da394afa1dae95cac918d9d940262 to 421c853598e7f226ff2d172559fa3596d67f6ad8
Branch pushed to git repo; I updated commit sha1. New commits:
421c853  Multithreaded fgemv using SageMath Parallelism() seems to be working

comment:13 Changed 3 years ago by
 Branch changed from u/ghZHG2017/expose_multithreaded_fflas_ffpack_features to u/cpernet/expose_multithreaded_fflas_ffpack_features
comment:14 Changed 3 years ago by
 Branch changed from u/cpernet/expose_multithreaded_fflas_ffpack_features to u/ghZHG2017/expose_multithreaded_fflas_ffpack_features
comment:15 Changed 3 years ago by
 Milestone sage8.8 deleted
As the Sage8.8 release milestone is pending, we should delete the sage8.8 milestone for tickets that are not actively being worked on or that still require significant work to move forward. If you feel that this ticket should be included in the next Sage release at the soonest please set its milestone to the next release milestone (sage8.9).
comment:16 Changed 3 years ago by
 Commit changed from 421c853598e7f226ff2d172559fa3596d67f6ad8 to 2d552b9bd261147757b02b13461e7325b5bf1a9a
Branch pushed to git repo; I updated commit sha1. New commits:
bd8b2a0  fix 32 compilation failure for linboxrank

ddacbf0  upgrade to linbox1.6.3 fflasffpack2.4.1 and givaro4.1.1

f2d1f29  upgrade to fflasffpack2.4.2 fixing testsuite errors with echlon forms

7848156  upgrade do fflasffpack 2.4.3 fixing an error in charpoly autotune

c391252  Merge branch 't/26932/upgrade_to_givaro_4_1_0_fflas_ffpack_2_4_0_linbox_1_6_0' into t/27444/expose_multithreaded_fflas_ffpack_features

2d552b9  pRank pDet and pReducedRowEchelonForm added with multithreading which also have been tested and cleaned up for code review

comment:17 Changed 3 years ago by
I confirm that the branch successfullly parallizes det, rank and echelonize. However, in the current version of the branch, the matrix product is no longer parallelized. Could you revert the call to pfgemm?
comment:18 Changed 3 years ago by
 Commit changed from 2d552b9bd261147757b02b13461e7325b5bf1a9a to 6ca42e23280f1562253274f9e0df04b45092304e
Branch pushed to git repo; I updated commit sha1. New commits:
6ca42e2  Added back pfgemm with the same parallelism like pDet

comment:19 Changed 3 years ago by
 Dependencies set to 26932
comment:20 Changed 3 years ago by
 Dependencies changed from 26932 to #26932
comment:21 Changed 3 years ago by
 Status changed from new to needs_review
comment:22 Changed 3 years ago by
 Branch changed from u/ghZHG2017/expose_multithreaded_fflas_ffpack_features to u/cpernet/expose_multithreaded_fflas_ffpack_features
comment:23 Changed 3 years ago by
 Commit changed from 6ca42e23280f1562253274f9e0df04b45092304e to 371d63cc721ad867b05e902e428879b2671f5e61
comment:24 followup: ↓ 27 Changed 3 years ago by
I'm curious about these:
if m*n > 100000: sig_on()
Any reason not to always call sig_on()
?
comment:25 Changed 3 years ago by
Docs need updating.
In parallelism.py
:
EXAMPLES:
The number of processes is initialized to 1 (no parallelization) for each field (only tensor computations are implemented at the moment):
Also, it would be good to document in the matrix reference the availability of parallelism, like it is done in http://doc.sagemath.org/html/en/reference/tensor_free_modules/sage/tensor/modules/free_module_tensor.html#sage.tensor.modules.free_module_tensor.FreeModuleTensor.disp
comment:26 Changed 3 years ago by
Otherwise, I confirm there is no regression with Python 3. If you can fix the docs, this ticket is good for me.
comment:27 in reply to: ↑ 24 Changed 3 years ago by
Replying to defeo:
I'm curious about these:
if m*n > 100000: sig_on()Any reason not to always call
sig_on()
?
I guess the reason is to avoid the overhead of using the sigs when the call is with a tiny instance which runs almost instantly and will therefore have no chance of being interrupted.
comment:28 Changed 3 years ago by
 Commit changed from 371d63cc721ad867b05e902e428879b2671f5e61 to e432cb0e542e6c2d7d5e5495611cb42240b89c17
Branch pushed to git repo; I updated commit sha1. New commits:
e432cb0  fixing the doc

comment:29 Changed 3 years ago by
@defeo: Doc fixed in the above commit. Should be good to go now.
comment:30 Changed 3 years ago by
 Reviewers set to Luca De Feo
 Status changed from needs_review to positive_review
The ticket has no authors?
comment:31 Changed 3 years ago by
comment:33 Changed 3 years ago by
 Branch changed from u/cpernet/expose_multithreaded_fflas_ffpack_features to e432cb0e542e6c2d7d5e5495611cb42240b89c17
 Resolution set to fixed
 Status changed from positive_review to closed
comment:34 Changed 3 years ago by
 Commit e432cb0e542e6c2d7d5e5495611cb42240b89c17 deleted
I don't think this ticket should have removed disableopenmp
from the fflas_ffpack build without mention, because now it requires OpenMP support in any modules which link to it, and this doesn't seem to work properly.
Branch pushed to git repo; I updated commit sha1. New commits:
Add ParSeqHelper to Sage