#27444 closed enhancement (fixed)
Expose multithreaded fflasffpack features
Reported by:  ghZHG2017  Owned by:  

Priority:  major  Milestone:  sage8.9 
Component:  packages: standard  Keywords:  
Cc:  Clément Pernet  Merged in:  
Authors:  Hongguang Zhu, Clément Pernet  Reviewers:  Luca De Feo 
Report Upstream:  N/A  Work issues:  
Branch:  e432cb0 (Commits, GitHub, GitLab)  Commit:  
Dependencies:  #26932  Stopgaps: 
Description
Dense linear algebra mod small p uses fflasffpack over float or double. This library support multithreading for some routines:
 matrixmatrix multiplication
 PLUQ decomposition
 etc.
This ticket proposes to expose these parallel variants to Sage user.
Change History (35)
comment:1 Changed 4 years ago by
Branch:  → u/ghZHG2017/expose_multithreaded_fflas_ffpack_features 

comment:2 Changed 4 years ago by
Commit:  → e7f4be4220242e4c70faf180912d6a580a30624b 

comment:3 Changed 4 years ago by
Commit:  e7f4be4220242e4c70faf180912d6a580a30624b → 22e592e5cdae3c12bf45886aaa6194928c9e7c87 

Branch pushed to git repo; I updated commit sha1. New commits:
22e592e  fixing indents and templated parallel helper

comment:4 Changed 4 years ago by
Commit:  22e592e5cdae3c12bf45886aaa6194928c9e7c87 → eae88f5f92c812f68381846dbeba5fbbf38cd8ad 

Branch pushed to git repo; I updated commit sha1. New commits:
eae88f5  fgemm aggregated into SageMaths with no more compile time error but not yet find out how to run it with more threads

comment:5 Changed 4 years ago by
Commit:  eae88f5f92c812f68381846dbeba5fbbf38cd8ad → 01de1e16a01f54a6bcfed76e6c6b39e588e954a5 

Branch pushed to git repo; I updated commit sha1. New commits:
01de1e1  Compile and link flags added for enable openmp but no multithreading found

comment:6 Changed 4 years ago by
Commit:  01de1e16a01f54a6bcfed76e6c6b39e588e954a5 → 86c203840a152fac507804c138348cca550c451e 

Branch pushed to git repo; I updated commit sha1. New commits:
fff3992  remove MMHelperAlgo

d81e3c1  Merge branch 'u/ghZHG2017/expose_multithreaded_fflas_ffpack_features' of git://trac.sagemath.org/sage into t/27444/expose_multithreaded_fflas_ffpack_features

86c2038  parallel fgemm working by patching fflasffpack

comment:7 Changed 4 years ago by
Commit:  86c203840a152fac507804c138348cca550c451e → 449b103c8dadc063933a40cae67ee0f9a379cb2f 

Branch pushed to git repo; I updated commit sha1. New commits:
449b103  fix const error

comment:8 Changed 4 years ago by
Commit:  449b103c8dadc063933a40cae67ee0f9a379cb2f → 397c10854ad651aad31f8b47043c04d289c7bc1b 

Branch pushed to git repo; I updated commit sha1. New commits:
397c108  fix error

comment:9 Changed 4 years ago by
For the record: matrix multiplication over finite field works in parallel. The following experiment is on a 12 cores Intel skylakex server:
pernet@retourdest:~/soft/sage$ export OMP_NUM_THREADS=12;./sage ┌────────────────────────────────────────────────────────────────────┐ │ SageMath version 8.7.beta7, Release Date: 20190310 │ │ Using Python 2.7.15. Type "help()" for help. │ └────────────────────────────────────────────────────────────────────┘ ┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓ ┃ Warning: this is a prerelease version, and it may be unstable. ┃ ┗━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┛ sage: a=random_matrix(GF(11),10000) sage: time b=a*a CPU times: user 23.3 s, sys: 405 ms, total: 23.7 s Wall time: 3.3 s
while on the same machine, on master we get:
sage: a=random_matrix(GF(11),10000) sage: time b=a*a CPU times: user 18.4 s, sys: 211 ms, total: 18.6 s Wall time: 18.7 s
The speedup of 6 for a 12 core machine is below what a selfstanding fflasffpack achieves. We need to investigate the reasons: alignment, strassenwinograd threshold, numa placement, etc.
comment:10 Changed 4 years ago by
Milestone:  sage8.7 → sage8.8 

Ticket retargeted after milestone closed (if you don't believe this ticket is appropriate for the Sage 8.8 release please retarget manually)
comment:11 Changed 4 years ago by
Commit:  397c10854ad651aad31f8b47043c04d289c7bc1b → 58a5309f4c3da394afa1dae95cac918d9d940262 

Branch pushed to git repo; I updated commit sha1. New commits:
58a5309  Using Sage Parallelism() to let user choose number of threads for fgemm

comment:12 Changed 4 years ago by
Commit:  58a5309f4c3da394afa1dae95cac918d9d940262 → 421c853598e7f226ff2d172559fa3596d67f6ad8 

Branch pushed to git repo; I updated commit sha1. New commits:
421c853  Multithreaded fgemv using SageMath Parallelism() seems to be working

comment:13 Changed 4 years ago by
Branch:  u/ghZHG2017/expose_multithreaded_fflas_ffpack_features → u/cpernet/expose_multithreaded_fflas_ffpack_features 

comment:14 Changed 4 years ago by
Branch:  u/cpernet/expose_multithreaded_fflas_ffpack_features → u/ghZHG2017/expose_multithreaded_fflas_ffpack_features 

comment:15 Changed 3 years ago by
Milestone:  sage8.8 

As the Sage8.8 release milestone is pending, we should delete the sage8.8 milestone for tickets that are not actively being worked on or that still require significant work to move forward. If you feel that this ticket should be included in the next Sage release at the soonest please set its milestone to the next release milestone (sage8.9).
comment:16 Changed 3 years ago by
Commit:  421c853598e7f226ff2d172559fa3596d67f6ad8 → 2d552b9bd261147757b02b13461e7325b5bf1a9a 

Branch pushed to git repo; I updated commit sha1. New commits:
bd8b2a0  fix 32 compilation failure for linboxrank

ddacbf0  upgrade to linbox1.6.3 fflasffpack2.4.1 and givaro4.1.1

f2d1f29  upgrade to fflasffpack2.4.2 fixing testsuite errors with echlon forms

7848156  upgrade do fflasffpack 2.4.3 fixing an error in charpoly autotune

c391252  Merge branch 't/26932/upgrade_to_givaro_4_1_0_fflas_ffpack_2_4_0_linbox_1_6_0' into t/27444/expose_multithreaded_fflas_ffpack_features

2d552b9  pRank pDet and pReducedRowEchelonForm added with multithreading which also have been tested and cleaned up for code review

comment:17 Changed 3 years ago by
I confirm that the branch successfullly parallizes det, rank and echelonize. However, in the current version of the branch, the matrix product is no longer parallelized. Could you revert the call to pfgemm?
comment:18 Changed 3 years ago by
Commit:  2d552b9bd261147757b02b13461e7325b5bf1a9a → 6ca42e23280f1562253274f9e0df04b45092304e 

Branch pushed to git repo; I updated commit sha1. New commits:
6ca42e2  Added back pfgemm with the same parallelism like pDet

comment:19 Changed 3 years ago by
Dependencies:  → 26932 

comment:20 Changed 3 years ago by
Dependencies:  26932 → #26932 

comment:21 Changed 3 years ago by
Status:  new → needs_review 

comment:22 Changed 3 years ago by
Branch:  u/ghZHG2017/expose_multithreaded_fflas_ffpack_features → u/cpernet/expose_multithreaded_fflas_ffpack_features 

comment:23 Changed 3 years ago by
Commit:  6ca42e23280f1562253274f9e0df04b45092304e → 371d63cc721ad867b05e902e428879b2671f5e61 

comment:24 followup: 27 Changed 3 years ago by
I'm curious about these:
if m*n > 100000: sig_on()
Any reason not to always call sig_on()
?
comment:25 Changed 3 years ago by
Docs need updating.
In parallelism.py
:
EXAMPLES:
The number of processes is initialized to 1 (no parallelization) for each field (only tensor computations are implemented at the moment):
Also, it would be good to document in the matrix reference the availability of parallelism, like it is done in http://doc.sagemath.org/html/en/reference/tensor_free_modules/sage/tensor/modules/free_module_tensor.html#sage.tensor.modules.free_module_tensor.FreeModuleTensor.disp
comment:26 Changed 3 years ago by
Otherwise, I confirm there is no regression with Python 3. If you can fix the docs, this ticket is good for me.
comment:27 Changed 3 years ago by
Replying to defeo:
I'm curious about these:
if m*n > 100000: sig_on()Any reason not to always call
sig_on()
?
I guess the reason is to avoid the overhead of using the sigs when the call is with a tiny instance which runs almost instantly and will therefore have no chance of being interrupted.
comment:28 Changed 3 years ago by
Commit:  371d63cc721ad867b05e902e428879b2671f5e61 → e432cb0e542e6c2d7d5e5495611cb42240b89c17 

Branch pushed to git repo; I updated commit sha1. New commits:
e432cb0  fixing the doc

comment:30 Changed 3 years ago by
Reviewers:  → Luca De Feo 

Status:  needs_review → positive_review 
The ticket has no authors?
comment:31 Changed 3 years ago by
Authors:  → Hongguang Zhu, Clement Pernet 

comment:33 Changed 3 years ago by
Branch:  u/cpernet/expose_multithreaded_fflas_ffpack_features → e432cb0e542e6c2d7d5e5495611cb42240b89c17 

Resolution:  → fixed 
Status:  positive_review → closed 
comment:34 Changed 3 years ago by
Commit:  e432cb0e542e6c2d7d5e5495611cb42240b89c17 

I don't think this ticket should have removed disableopenmp
from the fflas_ffpack build without mention, because now it requires OpenMP support in any modules which link to it, and this doesn't seem to work properly.
comment:35 Changed 2 years ago by
Authors:  Hongguang Zhu, Clement Pernet → Hongguang Zhu, Clément Pernet 

Branch pushed to git repo; I updated commit sha1. New commits:
Add ParSeqHelper to Sage