Opened 2 years ago

Last modified 5 weeks ago

#29796 new enhancement

Parallelization of Wedge Product

Reported by: Michael Jung
Owned by:
Priority: major
Milestone: sage-9.8
Component: geometry
Keywords: manifolds, differential_forms, parallel
Cc: Eric Gourgoulhon, Travis Scrimshaw, Matthias Köppe
Merged in:
Authors: Michael Jung
Reviewers:
Report Upstream: N/A
Work issues:
Branch: u/gh-mjungmath/wedge_product_parallel (Commits, GitHub, GitLab)
Commit: 6303e7c19f873255c82c0dd76721baa8c5721669
Dependencies:
Stopgaps:


Description (last modified by Michael Jung)

Apparently, the wedge product is not performed on multiple cores when parallel computation is enabled. In line with the component-wise computation of general tensors, I add this feature for the wedge product of alternating forms, too.
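
For reference, parallelism is switched on via Parallelism() and the wedge product then enters as usual. A minimal example of the code path this ticket targets (dimension, degrees and components are only illustrative, and the imports assume the names are exposed in sage.all as in an interactive session):

    from sage.all import Manifold, Parallelism

    Parallelism().set('tensor', nproc=4)   # use 4 processes for tensor algebra

    M = Manifold(6, 'M')
    X = M.chart('x0 x1 x2 x3 x4 x5')
    x0, x1, x2, x3, x4, x5 = X[:]
    a = M.diff_form(2, name='a')
    b = M.diff_form(3, name='b')
    a[0, 1] = x0**2          # a few sample components
    b[2, 3, 4] = x1*x5
    c = a.wedge(b)           # the product whose component loop is parallelized here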

Change History (20)

comment:1 Changed 2 years ago by Michael Jung

Authors: Michael Jung
Cc: Eric Gourgoulhon added
Component: PLEASE CHANGE → geometry
Description: modified (diff)
Keywords: manifolds mixed_forms added

comment:2 Changed 2 years ago by Michael Jung

Type: PLEASE CHANGE → enhancement

comment:3 Changed 2 years ago by Michael Jung

Description: modified (diff)

comment:4 Changed 2 years ago by Michael Jung

Description: modified (diff)
Keywords: differential_forms parallel added; mixed_forms removed
Summary: Mixed Forms - Fast zero check → Parallelization of Wedge Product

comment:5 Changed 2 years ago by Michael Jung

Description: modified (diff)

comment:6 Changed 2 years ago by Michael Jung

Branch: u/gh-mjungmath/wedge_product_parallel

comment:7 Changed 2 years ago by Michael Jung

Cc: Travis Scrimshaw added
Commit: d8ecedceb0f88de6afb5af3ad4f53a622552fec4

This is my very first approach, simply copied from the previous ones. However, I noticed that in lower dimensions the parallelization is even slower. Furthermore, one could improve this process a little further just by considering distinct indices from the beginning (see the check in the loop).

I appreciate any help since I am not familiar with effective parallelization.


New commits:

d8ecedc  Trac #29796: first parallelization approach
Last edited 2 years ago by Michael Jung (previous) (diff)

comment:8 Changed 2 years ago by git

Commit: d8ecedceb0f88de6afb5af3ad4f53a622552fec4 → 6303e7c19f873255c82c0dd76721baa8c5721669

Branch pushed to git repo; I updated commit sha1. New commits:

0961bdc  Trac #29796: further small improvements
6303e7c  Trac #29796: indentation fixed

comment:9 Changed 2 years ago by Michael Jung

Some computations in 4 dimensions became slightly worse: from around 8 sec to 15 sec. In contrast, more complicated computations in 6 dimensions yielded a significant improvement: from around 40 minutes down to around 15 minutes.

However, I noticed that the CPUs are not fully engaged and run at only around 20-40% load.

I appreciate any suggestions. I feel a little bit lost here.


New commits:

0961bdc  Trac #29796: further small improvements
6303e7c  Trac #29796: indentation fixed


comment:10 in reply to:  9 Changed 2 years ago by Eric Gourgoulhon

Replying to gh-mjungmath:

Some computations in 4 dimensions made it slightly worse: from around 8 sec to 15 sec. In contrast, more complicated computations in 6 dimensions yield a good improvement.

However, I noticed that the CPUs are not fully engaged and run at around 20-80% load, varying all the time. Hence, there is much room for improvement.

I appreciate any suggestions. I feel a little bit lost here.

I would say that the behaviour you observe is due to the computation not being fully parallelized in the current code. Indeed, in the final lines

            for ii, val in paral_wedge(listParalInput):
                for jj in val:
                    cmp_r[[jj[0]]] += jj[1]

the computation cmp_r[[jj[0]]] += jj[1] is performed sequentially.
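
To make that merge cheap, each worker could already reduce its own chunk before returning. Roughly along these lines (an untested sketch, with plain dicts standing in for the actual Components objects, only to illustrate the chunked reduction):

    from multiprocessing import Pool

    def perm_sign(ind):
        # sign of the permutation that sorts ind (entries assumed distinct)
        sign = 1
        for i in range(len(ind)):
            for j in range(i + 1, len(ind)):
                if ind[i] > ind[j]:
                    sign = -sign
        return sign

    def wedge_chunk(args):
        # reduce one chunk of index pairs into a dict keyed by the sorted
        # result index; this replaces the per-term += in the main process
        comp_s, comp_o, chunk = args
        partial = {}
        for ind_s, ind_o in chunk:
            ind_r = ind_s + ind_o
            key = tuple(sorted(ind_r))
            term = perm_sign(ind_r) * comp_s[ind_s] * comp_o[ind_o]
            partial[key] = partial.get(key, 0) + term
        return partial

    def parallel_wedge(comp_s, comp_o, nproc=4):
        # comp_s, comp_o: dicts {index tuple: coefficient} of degrees p and q
        ind_list = [(s, o) for s in comp_s for o in comp_o
                    if len(set(s + o)) == len(s + o)]
        size = max(1, len(ind_list) // nproc)
        chunks = [ind_list[i:i + size] for i in range(0, len(ind_list), size)]
        result = {}
        with Pool(nproc) as pool:
            for partial in pool.map(wedge_chunk,
                                    [(comp_s, comp_o, c) for c in chunks]):
                # the sequential merge now does one addition per stored
                # component and chunk instead of one per term
                for key, val in partial.items():
                    result[key] = result.get(key, 0) + val
        return result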

comment:11 Changed 2 years ago by Michael Jung

Interestingly, I dropped the summation completely, and still, the computation takes longer than without parallelization. This is odd, isn't it?

Even this modification doesn't improve anything:

        ind_list = [(ind_s, ind_o) for ind_s in cmp_s._comp
                    for ind_o in cmp_o._comp
                    if len(ind_s + ind_o) == len(set(ind_s + ind_o))]
        nproc = Parallelism().get('tensor')
        if nproc != 1:
            # Parallel computation
            lol = lambda lst, sz: [lst[i:i + sz] for i in
                                   range(0, len(lst), sz)]
            ind_step = max(1, int(len(ind_list) / nproc))
            local_list = lol(ind_list, ind_step)
            # list of input parameters:
            listParalInput = [(cmp_s, cmp_o, ind_part) for ind_part in
                              local_list]

            @parallel(p_iter='multiprocessing', ncpus=nproc)
            def paral_wedge(s, o, local_list_ind):
                partial = []
                for ind_s, ind_o in local_list_ind:
                    ind_r = ind_s + ind_o
                    partial.append([ind_r, s._comp[ind_s] * o._comp[ind_o]])
                return partial
            for ii, val in paral_wedge(listParalInput):
                for jj in val:
                    cmp_r[[jj[0]]] = jj[1]
        else:
            # Sequential computation
            for ind_s, ind_o in ind_list:
                ind_r = ind_s + ind_o
                cmp_r[[ind_r]] += cmp_s._comp[ind_s] * cmp_o._comp[ind_o]

If nproc is set to 1, the original speed is preserved.

I am fully aware that this leads to wrong results and that the summation should somehow be handled within the parallelization. Nevertheless, this behaviour seems strange to me.

Last edited 2 years ago by Michael Jung (previous) (diff)

comment:12 Changed 2 years ago by Michael Jung

Besides this odd fact, do you have any ideas how the summation could be parallelized as well?

comment:13 Changed 2 years ago by Michael Jung

Cc: Matthias Köppe added

comment:14 Changed 2 years ago by Matthias Köppe

Milestone: sage-9.2 → sage-9.3

comment:15 Changed 20 months ago by Matthias Köppe

Milestone: sage-9.3 → sage-9.4

Setting new milestone based on a cursory review of ticket status, priority, and last modification date.

comment:16 Changed 15 months ago by Michael Jung

By the way, why don't we use MapReduce patterns (or similar) for parallelization? The parallelization syntax used all over the place is hardly readable imho.

See for example: https://towardsdatascience.com/a-beginners-introduction-into-mapreduce-2c912bb5e6ac
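
Schematically, I mean a small reusable helper along these lines (a plain-Python illustration, nothing Sage-specific; all names are made up):

    from functools import reduce
    from multiprocessing import Pool

    def map_reduce(func, merge, items, nproc=4):
        # apply func to all items in parallel, then fold the results with
        # the binary function merge
        with Pool(nproc) as pool:
            mapped = pool.map(func, items)
        return reduce(merge, mapped)

    def square(x):
        return x * x

    def add(a, b):
        return a + b

    # e.g. total = map_reduce(square, add, range(10**6))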

comment:17 Changed 15 months ago by Matthias Köppe

Milestone: sage-9.4 → sage-9.5

comment:18 Changed 10 months ago by Matthias Köppe

Milestone: sage-9.5 → sage-9.6

comment:19 Changed 6 months ago by Matthias Köppe

Milestone: sage-9.6 → sage-9.7

comment:20 Changed 5 weeks ago by Matthias Köppe

Milestone: sage-9.7 → sage-9.8