Opened 2 years ago
Last modified 4 weeks ago
#29796 new enhancement
Parallelization of Wedge Product
Reported by:  Michael Jung  Owned by:  

Priority:  major  Milestone:  sage9.8 
Component:  geometry  Keywords:  manifolds, differential_forms, parallel 
Cc:  Eric Gourgoulhon, Travis Scrimshaw, Matthias Köppe  Merged in:  
Authors:  Michael Jung  Reviewers:  
Report Upstream:  N/A  Work issues:  
Branch:  u/ghmjungmath/wedge_product_parallel (Commits, GitHub, GitLab)  Commit:  6303e7c19f873255c82c0dd76721baa8c5721669 
Dependencies:  Stopgaps: 
Description (last modified by )
Apparently, the wedge product is not performed on multiple cores when parallel computation is enabled. According to the compontentwise computation of general tensors, I add this feature for the wedge product for alternate forms, too.
Change History (20)
comment:1 Changed 2 years ago by
Authors:  → Michael Jung 

Cc:  Eric Gourgoulhon added 
Component:  PLEASE CHANGE → geometry 
Description:  modified (diff) 
Keywords:  manifolds mixed_forms added 
comment:2 Changed 2 years ago by
Type:  PLEASE CHANGE → enhancement 

comment:3 Changed 2 years ago by
Description:  modified (diff) 

comment:4 Changed 2 years ago by
Description:  modified (diff) 

Keywords:  differential_forms parallel added; mixed_forms removed 
Summary:  Mixed Forms  Fast zero check → Parallelization of Wedge Product 
comment:5 Changed 2 years ago by
Description:  modified (diff) 

comment:6 Changed 2 years ago by
Branch:  → u/ghmjungmath/wedge_product_parallel 

comment:7 Changed 2 years ago by
Cc:  Travis Scrimshaw added 

Commit:  → d8ecedceb0f88de6afb5af3ad4f53a622552fec4 
comment:8 Changed 2 years ago by
Commit:  d8ecedceb0f88de6afb5af3ad4f53a622552fec4 → 6303e7c19f873255c82c0dd76721baa8c5721669 

comment:9 followup: 10 Changed 2 years ago by
Some computations in 4 dimensions made it slightly worse: from around 8 sec to 15 sec. In contrast, more complicated computations in 6 dimensions yield a good improvement.
However, I noticed that the cpus are not fully engaged and run around 2080% workload, varying all the time. Hence, there is much room for improvement.
I appreciate any suggestions. I feel a little bit lost here.
New commits:
0961bdc  Trac #29796: further small improvements

6303e7c  Trac #29796: indentation fixed

comment:10 Changed 2 years ago by
Replying to ghmjungmath:
Some computations in 4 dimensions made it slightly worse: from around 8 sec to 15 sec. In contrast, more complicated computations in 6 dimensions yield a good improvement.
However, I noticed that the cpus are not fully engaged and run around 2080% workload, varying all the time. Hence, there is much room for improvement.
I appreciate any suggestions. I feel a little bit lost here.
I would say that the behaviour that you observe is due to the computation being not fully parallelized in the current code. Indeed, in the final lines
for ii, val in paral_wedge(listParalInput): for jj in val: cmp_r[[jj[0]]] += jj[1]
the computation cmp_r[[jj[0]]] += jj[1]
is performed sequentially.
comment:11 Changed 2 years ago by
Interestingly, I dropped the summation completely, and still, the computation takes longer than without parallelization. This is odd, isn't it?
Even this modification doesn't improve anything:
ind_list = [(ind_s, ind_o) for ind_s in cmp_s._comp for ind_o in cmp_o._comp if len(ind_s+ind_o) == len(set(ind_s+ind_o))] nproc = Parallelism().get('tensor') if nproc != 1: # Parallel computation lol = lambda lst, sz: [lst[i:i + sz] for i in range(0, len(lst), sz)] ind_step = max(1, int(len(ind_list) / nproc)) local_list = lol(ind_list, ind_step) # list of input parameters: listParalInput = [(cmp_s, cmp_o, ind_part) for ind_part in local_list] @parallel(p_iter='multiprocessing', ncpus=nproc) def paral_wedge(s, o, local_list_ind): partial = [] for ind_s, ind_o in local_list_ind: ind_r = ind_s + ind_o partial.append([ind_r, s._comp[ind_s] * o._comp[ind_o]]) return partial for ii, val in paral_wedge(listParalInput): for jj in val: cmp_r[[jj[0]]] = jj[1] else: # Sequential computation for ind_s, ind_o in ind_list: ind_r = ind_s + ind_o cmp_r[[ind_r]] += cmp_s._comp[ind_s] * cmp_o._comp[ind_o]
If nproc
is set to 1, the original speed is preserved.
I am fully aware that this leads to wrong results and the summation should be covered within the parallelization, somehow. Nevertheless, this seems strange to me.
comment:12 Changed 2 years ago by
Besides this odd fact, do you have any ideas how the summation can be parallelized, too?
comment:13 Changed 2 years ago by
Cc:  Matthias Köppe added 

comment:14 Changed 2 years ago by
Milestone:  sage9.2 → sage9.3 

comment:15 Changed 20 months ago by
Milestone:  sage9.3 → sage9.4 

Setting new milestone based on a cursory review of ticket status, priority, and last modification date.
comment:16 Changed 15 months ago by
By the way, why don't we use MapReduce
patterns (or similar) for parallelizations? The parallelization syntax used all over is hardly readable imho.
See for example: https://towardsdatascience.com/abeginnersintroductionintomapreduce2c912bb5e6ac
comment:17 Changed 15 months ago by
Milestone:  sage9.4 → sage9.5 

comment:18 Changed 10 months ago by
Milestone:  sage9.5 → sage9.6 

comment:19 Changed 6 months ago by
Milestone:  sage9.6 → sage9.7 

comment:20 Changed 4 weeks ago by
Milestone:  sage9.7 → sage9.8 

This is my very first approach simply copied from the previous ones. However, I noticed that in lower dimensions, the parallelization is even slower. Furthermore, one could improve this process a little bit further just my considering indices distinct indices from the beginning (see the check in the loop).
I appreciate any help since I have no clue about effective parallelization.
New commits:
Trac #29796: first parallelization approach