Opened 7 years ago

Closed 3 years ago

Last modified 3 years ago

#13580 closed enhancement (fixed)

Parallel map reduce on SearchForest

Reported by: hivert Owned by: hivert
Priority: major Milestone: sage-7.2
Component: combinatorics Keywords: map-reduce, days57, days77
Cc: sage-combinat, ncohen, ​jdemeyer, slabbe Merged in:
Authors: Florent Hivert, Jean-Baptiste Priez, Nathann Cohen Reviewers: Sébastien Labbé, Jean-Baptiste Priez
Report Upstream: N/A Work issues:
Branch: 134c1fa (Commits) Commit:
Dependencies: Stopgaps:

Description (last modified by hivert)

Implement a map reduce algorithm in parallel on large sets described by a SearchForest. We use a work-stealing algorithm (see https://en.wikipedia.org/wiki/Work_stealing) based on Python's multiprocessing.

Change History (109)

comment:1 Changed 6 years ago by slabbe

Salut Florent et Nathann,

I am starting to think/work on #6637... I have some questions. Mainly, I would like to know what is this ticket, because the above one liner in the description does not say much...

  • Will SearchForest survive or not?
  • Does it replace SearchForest?
  • Does it improve SearchForest?
  • Does it use SearchForest?
  • Or is it used by SearchForest?

Also, Florent wrote on sage-devel in October 2012 that

   I'm also in the process of finalizing a patch which do parallel and even
   distributed map-reduce on recursively enumerated sets (currently badly named
   SearchForest, I'll change the name in my patch, ticket #13580, patch on
   Sage-Combinat queue [1]).
  • I do not see in the cited patch that the name of SearchForest is changed.
  • What would be a better name for SearchForest ?
  • What patch is the more recent? the "old" one or the "experimental" one?
    trac_13580-map_reduce-fh.patch #+experimental
    map_reduce_improved_loop-fh.patch #+experimental
    map_reduce_condition-fh.patch #+experimental
    trac_13580-map_reduce-old-fh.patch 

comment:2 Changed 6 years ago by jdemeyer

  • Milestone changed from sage-5.11 to sage-5.12

comment:3 Changed 5 years ago by vbraun_spam

  • Milestone changed from sage-6.1 to sage-6.2

comment:4 Changed 5 years ago by hivert

  • Branch set to u/hivert/13580/map_reduce

comment:5 Changed 5 years ago by hivert

  • Commit set to 693c672387212c47ac65012466f92ffa2cc55932
  • Keywords days57 added

New commits:

693c672Imported code from trac_13580-map_reduce-fh.patch + fixed multiline doctests.

comment:6 Changed 5 years ago by vbraun_spam

  • Milestone changed from sage-6.2 to sage-6.3

comment:7 Changed 5 years ago by vbraun_spam

  • Milestone changed from sage-6.3 to sage-6.4

comment:8 Changed 4 years ago by nthiery

  • Branch changed from u/hivert/13580/map_reduce to u/nthiery/13580/map_reduce

comment:9 Changed 4 years ago by git

  • Commit changed from 693c672387212c47ac65012466f92ffa2cc55932 to addb17a177397cd9de0952e7c1ee457fe49220bc

Branch pushed to git repo; I updated commit sha1. New commits:

addb17a13580: Trivial rest fix

comment:10 Changed 4 years ago by hivert

  • Branch changed from u/nthiery/13580/map_reduce to u/hivert/13580/map_reduce

comment:11 Changed 4 years ago by git

  • Commit changed from addb17a177397cd9de0952e7c1ee457fe49220bc to 92e6e680771d6f286954bf85ee0064d22a0b253c

Branch pushed to git repo; I updated commit sha1. New commits:

92e6e68Merge branch 't/13580/map_reduce' into 13580

comment:12 Changed 4 years ago by git

  • Commit changed from 92e6e680771d6f286954bf85ee0064d22a0b253c to 734c74810e63321463e4bca7bcf0da85dffc47b9

Branch pushed to git repo; I updated commit sha1. New commits:

734c748#13580 Fixed test after merging #6637

comment:13 Changed 4 years ago by slabbe

I saw the following typo in map reduce file while looking at the previous commit: "As an example, ee compute"

comment:14 Changed 4 years ago by git

  • Commit changed from 734c74810e63321463e4bca7bcf0da85dffc47b9 to f234f60f37e007365c2aec1b4defea55ff255c70

Branch pushed to git repo; I updated commit sha1. New commits:

f234f60#13580 : improved documentation

comment:15 Changed 3 years ago by git

  • Commit changed from f234f60f37e007365c2aec1b4defea55ff255c70 to 7a33037a2d444e2f6981637d3e187cbcca72ba7d

Branch pushed to git repo; I updated commit sha1. New commits:

171be75Merge branch 'master' into t/13580/13580/map_reduce
bfdf33eFixed naming convention
7a33037Work in progress on the DOC

comment:16 Changed 3 years ago by git

  • Commit changed from 7a33037a2d444e2f6981637d3e187cbcca72ba7d to 168ceae047328c6c529e0ce70d91a4300efabe29

Branch pushed to git repo; I updated commit sha1. New commits:

168ceaeAdded architecture picture for map_reduce

comment:17 Changed 3 years ago by git

  • Commit changed from 168ceae047328c6c529e0ce70d91a4300efabe29 to 6c127527dfcd62673f2d871616a22b95a4b368e9

Branch pushed to git repo; I updated commit sha1. New commits:

366fc17More Doc
6c12752Tested timeout option

comment:18 Changed 3 years ago by git

  • Commit changed from 6c127527dfcd62673f2d871616a22b95a4b368e9 to 493bb52f98bb2551eb557528fbfc2d6a908bd38d

Branch pushed to git repo; I updated commit sha1. New commits:

493bb52Done the doc of Map/Reduce

comment:19 Changed 3 years ago by git

  • Commit changed from 493bb52f98bb2551eb557528fbfc2d6a908bd38d to 2534f12eaf4a820c6e9a76894aabf19775b8ba05

Branch pushed to git repo; I updated commit sha1. New commits:

2534f12Moved map/reduce to sage/parallel

comment:20 Changed 3 years ago by hivert

  • Cc ​jdemeyer slabbe added
  • Description modified (diff)
  • Status changed from new to needs_review

comment:21 follow-up: Changed 3 years ago by slabbe

  • Status changed from needs_review to needs_work

The branch currently have conflicts with development version of Sage (because the link is red in the description of the ticket above). The branch seems to be based on 6.2.beta6 which is old and may explain the presence of a conflict.

comment:22 in reply to: ↑ 21 ; follow-up: Changed 3 years ago by hivert

Replying to slabbe:

The branch currently have conflicts with development version of Sage (because the link is red in the description of the ticket above). The branch seems to be based on 6.2.beta6 which is old and may explain the presence of a conflict.

Yes ! there is a trivial conflict. Thank you for pointing it. I'm fixing it, testing and re-uploading.

comment:23 Changed 3 years ago by git

  • Commit changed from 2534f12eaf4a820c6e9a76894aabf19775b8ba05 to e5b74770eec0b0e93e22a70358bc83f16893ce2a

Branch pushed to git repo; I updated commit sha1. New commits:

e5b7477Merge 6.10.rc1 + fixed conflict

comment:24 Changed 3 years ago by git

  • Commit changed from e5b74770eec0b0e93e22a70358bc83f16893ce2a to 82fd1e47599b75ef22133d18199ed1ee15478f5d

Branch pushed to git repo; I updated commit sha1. New commits:

82fd1e4Fixed links according to deprecation/rebase

comment:25 in reply to: ↑ 22 Changed 3 years ago by hivert

  • Status changed from needs_work to needs_review

Replying to hivert:

Replying to slabbe:

The branch currently have conflicts with development version of Sage (because the link is red in the description of the ticket above). The branch seems to be based on 6.2.beta6 which is old and may explain the presence of a conflict.

Yes ! there is a trivial conflict. Thank you for pointing it. I'm fixing it, testing and re-uploading.

Done ! I was actually based on 6.9.

comment:26 Changed 3 years ago by git

  • Commit changed from 82fd1e47599b75ef22133d18199ed1ee15478f5d to 68b6530ecc825e773bdd34b09a98b21eb4d8987c

Branch pushed to git repo; I updated commit sha1. New commits:

68b6530Removed TODO in doc

comment:27 Changed 3 years ago by hivert

  • Summary changed from Parellel map reduce on SearchForest to Parallel map reduce on SearchForest

comment:28 Changed 3 years ago by git

  • Commit changed from 68b6530ecc825e773bdd34b09a98b21eb4d8987c to b93ba7da6f57a9fd6f02469819f4d3ce0c2f289b

Branch pushed to git repo; I updated commit sha1. New commits:

b93ba7dFixed a link + doctest

comment:29 follow-up: Changed 3 years ago by git

  • Commit changed from b93ba7da6f57a9fd6f02469819f4d3ce0c2f289b to 5c7720dae1034f1b2e04bb9abb3ea90797de55ee

Branch pushed to git repo; I updated commit sha1. New commits:

5c7720dFixed doctest continuations and raise statements

comment:30 in reply to: ↑ 29 Changed 3 years ago by hivert

Replying to git:

5c7720dFixed doctest continuations and raise statements

Patchbot was complaining about old style multiline doctests and exception raises. Fixed and reuped. Should be Ok now.

comment:31 Changed 3 years ago by git

  • Commit changed from 5c7720dae1034f1b2e04bb9abb3ea90797de55ee to cb9c011d0b7f2f3a736f7418c51d49ccc8c5b715

Branch pushed to git repo; I updated commit sha1. New commits:

cb9c011Fixed another multiline doctest

comment:32 follow-up: Changed 3 years ago by slabbe

I just looked at the code in the browser. Looks good. I will do real code review later this week hopefully. Quickly, I saw two typos:

First line of /src/sage/parallel/map_reduce.py : Paralell

Later in the same file I think: parameters tu the

comment:33 Changed 3 years ago by git

  • Commit changed from cb9c011d0b7f2f3a736f7418c51d49ccc8c5b715 to 14b086beac66e6a3fe3bf6be2cd7ae0e28c20152

Branch pushed to git repo; I updated commit sha1. New commits:

14b086bFixed Sebastien's Typos

comment:34 in reply to: ↑ 32 Changed 3 years ago by hivert

Replying to slabbe:

I just looked at the code in the browser. Looks good. I will do real code review later this week hopefully. Quickly, I saw two typos:

Fixed ! I'm afraid you'll find more ! By the way, removed distributed from the title since I don't do distributed computation anymore. I had a prototype which allowed to launch those computation on a cluster of machines, but it induces a huge performance loss do I dropped It. maybe we will work on this with Jeroen...

comment:35 follow-up: Changed 3 years ago by slabbe

Dear Florent, I know that you like multiline doctests, but for two reasons I prefer to avoid them as much as possible in doctests:

  1. It is not fun for the user who wants to adapt an example after a copy paste in the console. The up arrow brings the multiline all at once and it is not fun to change the value 10 to a value 25 let say.
  2. Using variables to store the input before calling the method gives lot of information easily to the user: what are the argument names, in what order should they be used.

I know I am asking you to change your style by asking you this, but don't you agree with me or can you provide some counter-arguments to mine? I give an example below:

  • src/sage/combinat/backtrack.py

    diff --git a/src/sage/combinat/backtrack.py b/src/sage/combinat/backtrack.py
    index 9dee057..ac1243a 100644
    a b class SearchForest(Parent): 
    695695                   reduce_function = None,
    696696                   reduce_init = None):
    697697        r"""
    698         Apply en Map Reduce algorithm on ``self``
     698        Apply a Map Reduce algorithm on ``self``
    699699
    700700        EXAMPLES::
    701701
    702             sage: F = RecursivelyEnumeratedSet( [([i],i, i) for i in range(1,10)],
    703             ....:     lambda (list, sum, last):
    704             ....:         [(list + [i], sum + i, i) for i in range(1,last)],
    705             ....:     structure='forest', enumeration='depth')
     702            sage: seeds = [([i],i, i) for i in range(1,10)]
     703            sage: succ = lambda (list, sum, last):
     704            ....:               [(list + [i], sum + i, i) for i in range(1,last)]
     705            sage: F = RecursivelyEnumeratedSet(seeds, succ,
     706            ....:                       structure='forest', enumeration='depth')
    706707            sage: y = var('y')
    707             sage: F.map_reduce(
    708             ....:     lambda (li, sum, _): y**sum, lambda x,y: x + y,  0 )
     708            sage: map_function = lambda (li, sum, _): y**sum
     709            sage: reduce_function = lambda x,y: x + y
     710            sage: F.map_reduce(map_function, reduce_function, 0)
    709711            y^45 + y^44 + y^43 + 2*y^42 + 2*y^41 + 3*y^40 + 4*y^39 + 5*y^38 + 6*y^37 + 8*y^36 + 9*y^35
    710712
    711713        .. SEEALSO:: :mod:`sage.parallel.map_reduce`

I noticed another typo on the same line as the one parameters tu the : On -> One

Last edited 3 years ago by slabbe (previous) (diff)

comment:36 in reply to: ↑ 35 ; follow-up: Changed 3 years ago by hivert

Replying to slabbe:

Dear Florent, I know that you like multiline doctests, but for two reasons I prefer to avoid them as much as possible in doctests:

  1. It is not fun for the user who wants to adapt an example after a copy paste in the console. The up arrow brings the multiline all at once and it is not fun to change the value 10 to a value 25 let say.

I'm tempted to answer that I don't have this problem with emacs ;-). Of course, for the people using a two letters editor...

  1. Using variables to store the input before calling the method gives lot of information easily to the user: what are the argument names, in what order should they be used.

I'm much more convinced by this argument. I'm watching an exam tomorrow evening. I'll change my code accordingly.

comment:37 in reply to: ↑ 36 ; follow-up: Changed 3 years ago by hivert

Replying to hivert:

I'm much more convinced by this argument. I'm watching an exam tomorrow evening. I'll change my code accordingly.

Actually, I'm not that much convinced. It's not clear for me that

      sage: map_function = lambda x: 1
      sage: reduce_function = lambda x,y: x+y
      sage: reduce_init = 0
      sage: S.map_reduce(map_function, reduce_function, reduce_init)
      131071

is much better than

      sage: S.map_reduce(
      ....:   map_function = lambda x: 1,
      ....:   reduce_function = lambda x,y: x+y,
      ....:   reduce_init = 0 )

In this second version which is the style I adopted in map_reduce.py, we don't repeat the name twice, which is better in my opinion.

A third possibility is

      sage: S.map_reduce(lambda x: 1, lambda x,y: x+y, 0)

It is the one used through the whole backtrack.py files. Unless for very short example, I don't find it very readable. But since it's not adding new code but changing old one, which needs even more cleanup and doc improvement, I'd rather keep it for another ticket. More precisly, I'd rather do that during #16351.

What do you think ?

Last edited 3 years ago by hivert (previous) (diff)

comment:38 follow-up: Changed 3 years ago by hivert

I'm not sure to understand the patchbot complaint ? Is it because the map_reduce is not imported as Sage's startup ?

comment:39 Changed 3 years ago by hivert

I just uploaded a doc improvement: the map_reduce function was missing the description of the input.

comment:40 Changed 3 years ago by git

  • Commit changed from 14b086beac66e6a3fe3bf6be2cd7ae0e28c20152 to e3c4c716b4dbd9650f06de80e6477bb4e9d36a1c

Branch pushed to git repo; I updated commit sha1. New commits:

e3c4c71Doc improvements

comment:41 Changed 3 years ago by git

  • Commit changed from e3c4c716b4dbd9650f06de80e6477bb4e9d36a1c to 31c4735651aa4f56c7e6899e6352669fa2b21d99

Branch pushed to git repo; I updated commit sha1. New commits:

31c4735Removed unneded import in pxd

comment:42 in reply to: ↑ 37 Changed 3 years ago by slabbe

Replying to hivert:

What do you think ?

I think that some doctest are more important than other, especially the first doctests that people might be expected to see when using the map reduce code. For example the map_reduce method in the recursively enumerated set file is one possible entry point where clarity of doctests matters more. Therefore, I like the way you change the doctest. Definitively, the top of the module is an important entry point. For other sub methods of the class map reduce, I care less...

comment:43 in reply to: ↑ 38 ; follow-up: Changed 3 years ago by slabbe

Replying to hivert:

I'm not sure to understand the patchbot complaint ? Is it because the map_reduce is not imported as Sage's startup ?

I don't see why neither...

comment:44 follow-up: Changed 3 years ago by slabbe

Event, Condition, time and os are imported in map_reduce file but are not used.

comment:45 follow-up: Changed 3 years ago by slabbe

Is there a reason why you use res=[""] and res[0] in print_communication_statistics instead of res="" and res?

comment:46 in reply to: ↑ 45 ; follow-up: Changed 3 years ago by hivert

Replying to slabbe:

Is there a reason why you use res=[""] and res[0] in print_communication_statistics instead of res="" and res?

Yes ! This is the classical trick to have a local variable shared with the local function (see e.g: http://stackoverflow.com/questions/2609518/python-nested-function-scopes).

comment:47 in reply to: ↑ 44 Changed 3 years ago by hivert

Replying to slabbe:

Event, Condition, time and os are imported in map_reduce file but are not used.

Fixed in my last push.

comment:48 Changed 3 years ago by git

  • Commit changed from 31c4735651aa4f56c7e6899e6352669fa2b21d99 to cca788124c7eb757cceda7cb7a73f9af28149a1e

Branch pushed to git repo; I updated commit sha1. New commits:

cca7881Removed unneeded import

comment:49 in reply to: ↑ 43 Changed 3 years ago by hivert

Replying to slabbe:

I don't see why neither...

According to Vincent on Sage devel it is probably a bug of the patchbot (see https://groups.google.com/forum/#!topic/sage-devel/xlpLwktA5Hk).

comment:50 in reply to: ↑ 46 Changed 3 years ago by slabbe

Replying to hivert:

Replying to slabbe:

Is there a reason why you use res=[""] and res[0] in print_communication_statistics instead of res="" and res?

Yes ! This is the classical trick to have a local variable shared with the local function (see e.g: http://stackoverflow.com/questions/2609518/python-nested-function-scopes).

Could you add this information inside the code of that method?

comment:51 follow-up: Changed 3 years ago by slabbe

map_reduce method of SearchForest should document what happen when this method is called with no arguments, i.e. returns the cardinality. More precisely, it should say that the value None for arguments map_function and reduce_function is replaced by what?

comment:52 Changed 3 years ago by slabbe

On can use the three following parameters: -> One can use the three following parameters:

comment:53 follow-up: Changed 3 years ago by slabbe

I get the following doctests error on top on sage-6.10 on my machine:

Git branch: 13580
Using --optional=gcc,mpir,python2,sage
Doctesting 1 file.
sage -t --warn-long 85.4 map_reduce.py
**********************************************************************
File "map_reduce.py", line 960, in sage.parallel.map_reduce.RESetMapReduce._signal_task_start
Failed example:
    S._active_tasks
Expected:
    <Semaphore(value=2)>
Got:
    <Semaphore(value=unknown)>
**********************************************************************
File "map_reduce.py", line 964, in sage.parallel.map_reduce.RESetMapReduce._signal_task_start
Failed example:
    S._active_tasks
Expected:
    <Semaphore(value=3)>
Got:
    <Semaphore(value=unknown)>
**********************************************************************
File "map_reduce.py", line 985, in sage.parallel.map_reduce.RESetMapReduce._signal_task_done
Failed example:
    S._active_tasks
Expected:
    <Semaphore(value=2)>
Got:
    <Semaphore(value=unknown)>
**********************************************************************
File "map_reduce.py", line 989, in sage.parallel.map_reduce.RESetMapReduce._signal_task_done
Failed example:
    S._active_tasks
Expected:
    <Semaphore(value=1)>
Got:
    <Semaphore(value=unknown)>
**********************************************************************
2 items had failures:
   2 of   9 in sage.parallel.map_reduce.RESetMapReduce._signal_task_done
   2 of   7 in sage.parallel.map_reduce.RESetMapReduce._signal_task_start
    [249 tests, 4 failures, 43.04 s]
----------------------------------------------------------------------
sage -t --warn-long 85.4 map_reduce.py  # 4 doctests failed
----------------------------------------------------------------------

It seems ok on patchbot reporting seemingly unrelated errors :

----------------------------------------------------------------------
sage -t --long src/sage/interfaces/expect.py  # Timed out
sage -t --long src/sage/interfaces/gap.py  # 10 doctests failed
----------------------------------------------------------------------
Last edited 3 years ago by slabbe (previous) (diff)

comment:54 Changed 3 years ago by slabbe

  • Status changed from needs_review to needs_work

I divided the review in a stack of tasks but since I am unable to parallelize my working time, I did the review serially with many interruptions during the previous days:)

I finished to look at what I wanted to look at. The documentation compiles good and helps to understand what is happening. I was able to test the parallel computation on at least one extensive example. To me it will be a positive review after my previous four comments are answered.

comment:55 in reply to: ↑ 53 ; follow-up: Changed 3 years ago by hivert

Replying to slabbe:

I get the following doctests error on top on sage-6.10 on my machine:

Expected:
    <Semaphore(value=1)>
Got:
    <Semaphore(value=unknown)>

Is this happening on MacOS ? I know that Semaphore are implemented differently on this system. See in particular the note after class multiprocessing.BoundedSemaphore, where it says:

on Unix platforms like Mac OS X where sem_getvalue() is not implemented.

If so and if the rests behave correctly, I'm tempted to replace the doctests with

    <Semaphore(value=...)>

What do you think ?

comment:56 in reply to: ↑ 51 ; follow-up: Changed 3 years ago by hivert

Replying to slabbe:

map_reduce method of SearchForest should document what happen when this method is called with no arguments, i.e. returns the cardinality. More precisely, it should say that the value None for arguments map_function and reduce_function is replaced by what?

I didn't document this one because I'm was not sure It was a sensible default. If you think it is then let's go for it.

By the way thanks for your careful review !!!

Florent

comment:57 Changed 3 years ago by git

  • Commit changed from cca788124c7eb757cceda7cb7a73f9af28149a1e to 1a8b78e378bf3909a5c4ad4b15a77df7320ab42d

Branch pushed to git repo; I updated commit sha1. New commits:

1a8b78e- MacOSX compatible doctests for semaphore.

comment:58 Changed 3 years ago by hivert

I uploaded a patch which should address all your concerns:

  • tests on MacOSX
  • typos
  • default values documentation

comment:59 Changed 3 years ago by slabbe

Introduced typo in the previous commit: the function costant to 1. -> constant

comment:60 in reply to: ↑ 55 Changed 3 years ago by slabbe

Replying to hivert:

Replying to slabbe:

I get the following doctests error on top on sage-6.10 on my machine:

Expected:
    <Semaphore(value=1)>
Got:
    <Semaphore(value=unknown)>

Is this happening on MacOS ?

Yes

If so and if the rests behave correctly, I'm tempted to replace the doctests with

    <Semaphore(value=...)>

What do you think ?

I agree. All tests passed now...

comment:61 in reply to: ↑ 56 Changed 3 years ago by slabbe

Replying to hivert:

I didn't document this one because I'm was not sure It was a sensible default. If you think it is then let's go for it.

I think it is a good default. Also, sometimes, you may want to set only some of the arguments and this information allows it. Finally, it's good to see that cardinality is a special case of this method.

Last edited 3 years ago by slabbe (previous) (diff)

comment:62 Changed 3 years ago by slabbe

  • Reviewers set to Sébastien Labbé

comment:63 Changed 3 years ago by slabbe

Just to be clear: I am waiting for the costant -> constant fix to change the status to positive review.

comment:64 Changed 3 years ago by nthiery

  • Branch changed from u/hivert/13580/map_reduce to u/nthiery/13580/map_reduce

comment:65 Changed 3 years ago by nthiery

  • Commit changed from 1a8b78e378bf3909a5c4ad4b15a77df7320ab42d to 98fd9e1c7a085b1ba62d004dba6cca4c89001cd2
  • Status changed from needs_work to positive_review

Typo fixed, hence changing the status to positive review on Sebastien's behalf.


New commits:

98fd9e113580: typo fix

comment:67 Changed 3 years ago by elixyre

Hello and merry christmas every one,

I have some question on this ticket.

Firstly, I don't understand the documentation from the line 571:

    Decription of the map/reduce operation:

    - ``map_function=f`` -- (default to ``None``)
    - ``reduce_function=red`` -- (default to ``None``)
    - ``reduce_init=init`` -- (default to ``None``)

What means f, red and init? My opinion is that f, redand init are irrelevant but an example should be interesting at this point:

(copy-paste of one in the module documentation)

 sage: from sage.parallel.map_reduce import RESetMapReduce
 sage: S = RESetMapReduce(
 ....:   roots = [[]],
 ....:   children = lambda l: [l+[0], l+[1]] if len(l) <= 15 else [],
 ....:   map_function = lambda x : 1,
 ....:   reduce_function = lambda x,y: x+y,
 ....:   reduce_init = 0 )
 sage: S.run()
 131071

Ok, there is the seealso which says go to see the documentation of the module for examples, but one example there is interesting, no?

Then it seems there is some useless import (like line 1345 from multiprocessing import current_process) (I try to compile sage soon and I can remove all useless import if you want).

Finally, I'm not sure to understand the use of AbortError. It is used to abort the computation and to say to the workers and the thiefs every think is done?

Jean-Baptiste

comment:68 in reply to: ↑ 66 Changed 3 years ago by slabbe

Replying to vbraun:

Fails on OSX: http://build.sagedev.org/release/builders/%20%20fast%20Volker%20MiniMac%20%28OSX%2010.10%20x86_64%29%20incremental/builds/636/steps/shell_4/logs/stdio

I can not reproduce this problem on my Mac OS X Yosemite 10.10.2:

$ sage -tp --long src/sage/parallel/map_reduce.py 
Running doctests with ID 2015-12-28-10-44-31-65e477d9.
Git branch: 13580
Using --optional=gcc,mpir,python2,sage
Doctesting 1 file using 2 threads.
sage -t --long --warn-long 85.0 src/sage/parallel/map_reduce.py
    [252 tests, 40.89 s]
----------------------------------------------------------------------
All tests passed!
----------------------------------------------------------------------

comment:69 Changed 3 years ago by elixyre

Hello,

I can reproduce test failures on El Capitan (os x):

I run 3 tests and I obtained 3 distincts failures. My problem is that if I try to execute those lines which fails in sage terminal (several times) then it does not fail!

Is it possible that the test environment is not robust specially on mac os?

comment:70 follow-up: Changed 3 years ago by elixyre

I read the code and I don't know how to understand this:

res = post_process(node)
if res is not None:
    self._res = reduc(self._res, mapp(res))

in the worker code of walk_branch_locally method.

I think this is useless, no? Or if not then the documentation of post_process function should be improve to specify that if post_process returns None then there is a specific behavior.

comment:71 in reply to: ↑ 70 Changed 3 years ago by elixyre

Replying to elixyre:

res = post_process(node)
if res is not None:
    self._res = reduc(self._res, mapp(res))

I think this is useless, no? Or if not then the documentation of post_process function should be improve to specify that if post_process returns None then there is a specific behavior.

Ok, this behavior is specify in the documentation of RESetMapReduce but this is not specified in paragraph (line 135) associated of this module.

comment:72 Changed 3 years ago by elixyre

Furthermore the lines:

for child in newnodes: 
    self._todo.append(child)

of walk_branch_locally should be replaced by

self._todo_.extend(newnodes)

(I just notice that I read and if you are agree with my remark then I can replace that)

comment:73 follow-up: Changed 3 years ago by elixyre

Go back to None activities, if I assume some users want to use nodes which could be None then my first idea is to use the post_process function to avoid problem but... if the thief doesn't receive then he send None and he says there is no job to do.

I propose two solutions:

  • first one, to specify that None is not a node never! (this is assumed but this must be explicited in the documentation),
  • second one (the object way), to define an object NoTask.

Opinion?

comment:74 in reply to: ↑ 73 Changed 3 years ago by hivert

Replying to elixyre:

Go back to None activities, if I assume some users want to use nodes which could be None then my first idea is to use the post_process function to avoid problem but... if the thief doesn't receive then he send None and he says there is no job to do.

I propose two solutions:

  • first one, to specify that None is not a node never! (this is assumed but this must be explicited in the documentation),

I'm clearly in Favor of this solution. We assume that None doesn't blong to the set described by the underlying RESet. Note that this is something which concerns more the underlying RESet than the map-reduce computation.

comment:75 Changed 3 years ago by hivert

  • Branch changed from u/nthiery/13580/map_reduce to u/hivert/13580/map_reduce

comment:76 Changed 3 years ago by hivert

  • Authors changed from Florent Hivert, Nathann Cohen to Florent Hivert, Jean-Baptiste Priez, Nathann Cohen
  • Commit changed from 98fd9e1c7a085b1ba62d004dba6cca4c89001cd2 to 6e3bbd35f4572dffa1953e4f4acddc916a36c967

New commits:

6e3bbd3More logging for debugging

comment:77 Changed 3 years ago by git

  • Commit changed from 6e3bbd35f4572dffa1953e4f4acddc916a36c967 to 6b4a5cffe362093e6011801d3485a9dc02ddc1e2

Branch pushed to git repo; I updated commit sha1. New commits:

98fd9e113580: typo fix
6b4a5cfMerge branch 'u/nthiery/13580/map_reduce' into t/13580/map_reduce

comment:78 in reply to: ↑ 66 Changed 3 years ago by hivert

Replying to vbraun:

Fails on OSX: http://build.sagedev.org/release/builders/%20%20fast%20Volker%20MiniMac%20%28OSX%2010.10%20x86_64%29%20incremental/builds/636/steps/shell_4/logs/stdio

We finally got the problem with Jean-Baptiste. It occurs that semaphore are broken on MacOSX (or at least are not fully POSIX compliant). In particular, on standard unixes, when two processes are trying to acquire a semaphore whose value is more than two, they always both succeeded. On MacOS, one of them may fail. As a consequence, I'm writing a different code form MacOS relying on a Lock and a shared integer. It may be slower on system where semaphore are implemented in a lockless way.

Is there a standard Sage/Python? way to check if we are on MacOS ?

comment:79 Changed 3 years ago by elixyre

  • Branch changed from u/hivert/13580/map_reduce to u/elixyre/ticket/13580
  • Commit changed from 6b4a5cffe362093e6011801d3485a9dc02ddc1e2 to 44566e6dea943f9494fb315d9e3794c7b309c6de
  • Reviewers changed from Sébastien Labbé to Sébastien Labbé, Jean-Baptiste Priez

New commits:

44566e6ticket search forest map reduce with no problem on mac os x

comment:80 Changed 3 years ago by elixyre

I pushed the modifications done with Florent about mac os problem.

It seems there still exists the problem of synchronization with the Pipe (of the worker) but... it is never appear... so may be I am wrong.

Is it possible that two distinct process write in the same time on the _write_task of a worker?

comment:81 Changed 3 years ago by hivert

  • Branch changed from u/elixyre/ticket/13580 to u/hivert/ticket/13580

comment:82 Changed 3 years ago by hivert

  • Commit changed from 44566e6dea943f9494fb315d9e3794c7b309c6de to aa8280f9cf68d1e592b760cc85bb7442b47a243e

I just uploaded a version which hopefully should be working on all OSes including MacOS. Please tests on various oses and report any failure here. I don't put needs review yet because it needs a few refactoring and the coverage it not yet 100%.


New commits:

9456035Work in problem in fixing MACOX broken Semaphore
158023bWrote commented out MacOS version
aa8280fMap-reduce version which should be working even on Darwin based OSes.

comment:83 Changed 3 years ago by git

  • Commit changed from aa8280f9cf68d1e592b760cc85bb7442b47a243e to f9841658ccda142d07cbfaa633dbf4b236165986

Branch pushed to git repo; I updated commit sha1. New commits:

f984165Fixed the doc

comment:84 Changed 3 years ago by git

  • Commit changed from f9841658ccda142d07cbfaa633dbf4b236165986 to a5f4d676ce7201a801bf6c841aae0585fc144336

Branch pushed to git repo; I updated commit sha1. New commits:

a5f4d67Improved doc for map-reduce

comment:85 Changed 3 years ago by git

  • Commit changed from a5f4d676ce7201a801bf6c841aae0585fc144336 to 569de0dae0d23e084cbce7690a2a64b53f906f8b

Branch pushed to git repo; I updated commit sha1. New commits:

569de0dMerge branch 'develop' into t/13580/map_reduce

comment:86 Changed 3 years ago by git

  • Commit changed from 569de0dae0d23e084cbce7690a2a64b53f906f8b to b07dfe28221a7473041d65230be409d6cbfafbfc

Branch pushed to git repo; I updated commit sha1. New commits:

b07dfe2Doc of the two implementationsof ActiveTaskCounter

comment:87 Changed 3 years ago by git

  • Commit changed from b07dfe28221a7473041d65230be409d6cbfafbfc to beebcbcd43c2d7a9d024f142f5e3cb328fbc4036

Branch pushed to git repo; I updated commit sha1. New commits:

beebcbc#13580: Added comment on timing in the doc

comment:88 Changed 3 years ago by hivert

  • Keywords days77 added
  • Status changed from needs_work to needs_review

I just uploaded a patch with basically two implementation of ActiveTaskCounter: one using semaphore and which doesn't work on MACOS and one using semaphore for all POSIX compliant OSes. It should be ready for review.

comment:89 Changed 3 years ago by git

  • Commit changed from beebcbcd43c2d7a9d024f142f5e3cb328fbc4036 to 58eca2e4decb05733145b18e08f16120b8765a81

Branch pushed to git repo; I updated commit sha1. New commits:

58eca2e#13580: Removed comment which is now in the doc

comment:90 Changed 3 years ago by git

  • Commit changed from 58eca2e4decb05733145b18e08f16120b8765a81 to 1badd8aec7efc3229cfdf76bffab3ad1f532bd33

Branch pushed to git repo; I updated commit sha1. New commits:

1badd8a#13580: Renamed ActiveTaskCounter(Posix)

comment:91 Changed 3 years ago by slabbe

  • Status changed from needs_review to needs_work

I get the following errors on OSX :

Running doctests with ID 2016-04-07-19-23-55-8fd43b63.
Git branch: 13580
Using --optional=dot2tex,mpir,python2,sage
Doctesting 1 file using 2 threads.
sage -t --long --warn-long 84.5 src/sage/parallel/map_reduce.py
**********************************************************************
File "src/sage/parallel/map_reduce.py", line 727, in sage.parallel.map_reduce.ActiveTaskCounterPosix.__repr__
Failed example:
    ATC(4)
Expected:
    ActiveTaskCounter(value=4)
Got:
    <repr(<sage.parallel.map_reduce.ActiveTaskCounterPosix at 0x190d32e90>) failed: NotImplementedError>
**********************************************************************
File "src/sage/parallel/map_reduce.py", line 743, in sage.parallel.map_reduce.ActiveTaskCounterPosix.task_start
Failed example:
    c = ATC(4); c
Expected:
    ActiveTaskCounter(value=4)
Got:
    <repr(<sage.parallel.map_reduce.ActiveTaskCounterPosix at 0x190d32e50>) failed: NotImplementedError>
**********************************************************************
File "src/sage/parallel/map_reduce.py", line 745, in sage.parallel.map_reduce.ActiveTaskCounterPosix.task_start
Failed example:
    c.task_start()
Exception raised:
    Traceback (most recent call last):
      File "/Users/slabbe/Applications/sage-git/local/lib/python2.7/site-packages/sage/doctest/forker.py", line 496, in _run
        self.compile_and_execute(example, compiler, test.globs)
      File "/Users/slabbe/Applications/sage-git/local/lib/python2.7/site-packages/sage/doctest/forker.py", line 858, in compile_and_execute
        exec(compiled, globs)
      File "<doctest sage.parallel.map_reduce.ActiveTaskCounterPosix.task_start[2]>", line 1, in <module>
        c.task_start()
      File "/Users/slabbe/Applications/sage-git/local/lib/python2.7/site-packages/sage/parallel/map_reduce.py", line 764, in task_start
        return self._active_tasks.get_value()
      File "/Users/slabbe/Applications/sage-git/local/lib/python/multiprocessing/synchronize.py", line 114, in get_value
        return self._semlock._get_value()
    NotImplementedError
**********************************************************************
File "src/sage/parallel/map_reduce.py", line 747, in sage.parallel.map_reduce.ActiveTaskCounterPosix.task_start
Failed example:
    c
Expected:
    ActiveTaskCounter(value=5)
Got:
    <repr(<sage.parallel.map_reduce.ActiveTaskCounterPosix at 0x190d32e50>) failed: NotImplementedError>
**********************************************************************
File "src/sage/parallel/map_reduce.py", line 755, in sage.parallel.map_reduce.ActiveTaskCounterPosix.task_start
Failed example:
    c
Expected:
    ActiveTaskCounter(value=0)
Got:
    <repr(<sage.parallel.map_reduce.ActiveTaskCounterPosix at 0x190d32f90>) failed: NotImplementedError>
**********************************************************************
File "src/sage/parallel/map_reduce.py", line 778, in sage.parallel.map_reduce.ActiveTaskCounterPosix.task_done
Failed example:
    c = ATC(4); c
Expected:
    ActiveTaskCounter(value=4)
Got:
    <repr(<sage.parallel.map_reduce.ActiveTaskCounterPosix at 0x1026a5c10>) failed: NotImplementedError>
**********************************************************************
File "src/sage/parallel/map_reduce.py", line 780, in sage.parallel.map_reduce.ActiveTaskCounterPosix.task_done
Failed example:
    c.task_done()
Exception raised:
    Traceback (most recent call last):
      File "/Users/slabbe/Applications/sage-git/local/lib/python2.7/site-packages/sage/doctest/forker.py", line 496, in _run
        self.compile_and_execute(example, compiler, test.globs)
      File "/Users/slabbe/Applications/sage-git/local/lib/python2.7/site-packages/sage/doctest/forker.py", line 858, in compile_and_execute
        exec(compiled, globs)
      File "<doctest sage.parallel.map_reduce.ActiveTaskCounterPosix.task_done[2]>", line 1, in <module>
        c.task_done()
      File "/Users/slabbe/Applications/sage-git/local/lib/python2.7/site-packages/sage/parallel/map_reduce.py", line 796, in task_done
        return self._active_tasks.get_value()
      File "/Users/slabbe/Applications/sage-git/local/lib/python/multiprocessing/synchronize.py", line 114, in get_value
        return self._semlock._get_value()
    NotImplementedError
**********************************************************************
File "src/sage/parallel/map_reduce.py", line 782, in sage.parallel.map_reduce.ActiveTaskCounterPosix.task_done
Failed example:
    c
Expected:
    ActiveTaskCounter(value=3)
Got:
    <repr(<sage.parallel.map_reduce.ActiveTaskCounterPosix at 0x1026a5c10>) failed: NotImplementedError>
**********************************************************************
File "src/sage/parallel/map_reduce.py", line 805, in sage.parallel.map_reduce.ActiveTaskCounterPosix.abort
Failed example:
    c = ATC(4); c
Expected:
    ActiveTaskCounter(value=4)
Got:
    <repr(<sage.parallel.map_reduce.ActiveTaskCounterPosix at 0x1026a5910>) failed: NotImplementedError>
**********************************************************************
File "src/sage/parallel/map_reduce.py", line 808, in sage.parallel.map_reduce.ActiveTaskCounterPosix.abort
Failed example:
    c
Expected:
    ActiveTaskCounter(value=0)
Got:
    <repr(<sage.parallel.map_reduce.ActiveTaskCounterPosix at 0x1026a5910>) failed: NotImplementedError>
**********************************************************************
4 items had failures:
   1 of   3 in sage.parallel.map_reduce.ActiveTaskCounterPosix.__repr__
   2 of   5 in sage.parallel.map_reduce.ActiveTaskCounterPosix.abort
   3 of   7 in sage.parallel.map_reduce.ActiveTaskCounterPosix.task_done
   4 of   8 in sage.parallel.map_reduce.ActiveTaskCounterPosix.task_start
    [296 tests, 10 failures, 33.64 s]
----------------------------------------------------------------------
sage -t --long --warn-long 84.5 src/sage/parallel/map_reduce.py  # 10 doctests failed
----------------------------------------------------------------------
Total time for all tests: 33.9 seconds
    cpu time: 5.8 seconds
    cumulative wall time: 33.6 seconds

comment:92 Changed 3 years ago by git

  • Commit changed from 1badd8aec7efc3229cfdf76bffab3ad1f532bd33 to 46cbab9d81890649c3d056f47ed62065bcb04ca3

Branch pushed to git repo; I updated commit sha1. New commits:

46cbab913580: Fixed doctests to pass on Darwin

comment:93 Changed 3 years ago by slabbe

It works now! :

Using --optional=dot2tex,mpir,python2,sage
Doctesting 1 file using 2 threads.
sage -t --warn-long 84.3 src/sage/parallel/map_reduce.py
    [296 tests, 32.56 s]
----------------------------------------------------------------------
All tests passed!
----------------------------------------------------------------------

comment:94 Changed 3 years ago by slabbe

some typos:

compute the the cardinality

its job. Non only mapping -> Not only

method so that to compute the number of element you only need to call:: -> wrap the line

precedingly by we only -> previously? , but

I would suggest to replace this:

        OUTPUT: Calling :meth:`task_done` decrement the counter and returns
        its value after the decrementation.

by

        OUTPUT: 

        Calling :meth:`task_done` decrement the counter and returns
        its value after the decrementation.

to follow sage standards (more than once).

Also, I would replace this:

INPUT: ``o`` -- a node

by

INPUT: 

- ``o`` -- a node

to follow sage standards (more than once).

the computation are done i -> s missing

comment:95 Changed 3 years ago by git

  • Commit changed from 46cbab9d81890649c3d056f47ed62065bcb04ca3 to 134c1fad85dd50fa28528adc343bad759b174cc6

Branch pushed to git repo; I updated commit sha1. New commits:

134c1fa13580: doc rereading

comment:96 follow-up: Changed 3 years ago by tscrim

I am wondering what else is currently needed for this ticket before a positive review?

Also, what are the current plans to cythonize SearchForest and the parallel reduce map?

comment:97 follow-up: Changed 3 years ago by slabbe

  • Status changed from needs_work to positive_review

What I had in mind for the next step to do was to move the SearchForest code from backtrack file into the recursivelyenumerated_set.pyx file *and* into the hierarchy of RecEnumSet classes so that it benefits for example of methods like the newly added to_digraph.

comment:98 in reply to: ↑ 96 ; follow-up: Changed 3 years ago by hivert

Replying to tscrim:

I am wondering what else is currently needed for this ticket before a positive review?

Sebastien is currently rereading it. I think he is very close to put a positive review.

Also, what are the current plans to cythonize SearchForest and the parallel reduce map?

Concerning Cythonization of map-reduce, my current opinion is that, the code is designed for case where the computation of the children is costly. In this case the cost of the current code should be negligible. There are a lot of function calls, I'm using Python synchronization primitive, but most importantly, communication is are by pickling/unpickling which is very slow. So the idea is to reduce at most at possible the communication, but without drastically changing the technology, I don't think you get a large improvement without great effort. You'll need to rewrite all the communication primitive and to inline the function call to the client code.

On the other hand, If you are allowed to write the code directly in C/C++ (which is the case for small basic objects), then the Cilk extension of C is wonderful. It's a small extension which is supported by GCC since version 5, ICC and in recent CLANG (https://cilkplus.github.io/). I've several code where I've been able to have very large speedup. See e.g. https://hal.archives-ouvertes.fr/hal-00823339/file/article.pdf if you are curious.

Now if you have a specific use case, I'll be happy to experiment.

comment:99 in reply to: ↑ 97 Changed 3 years ago by hivert

Replying to slabbe:

What I had in mind for the next step to do was to move the SearchForest code from backtrack file into the recursivelyenumerated_set.pyx file *and* into the hierarchy of RecEnumSet classes so that it benefits for example of methods like the newly added to_digraph.

That was also the next goal in my mind.

comment:100 Changed 3 years ago by slabbe

See #16351

comment:101 in reply to: ↑ 98 ; follow-up: Changed 3 years ago by tscrim

  • Milestone changed from sage-6.4 to sage-7.2

Replying to hivert:

Replying to tscrim:

Also, what are the current plans to cythonize SearchForest and the parallel reduce map?

Concerning Cythonization of map-reduce, my current opinion is that, the code is designed for case where the computation of the children is costly. In this case the cost of the current code should be negligible. There are a lot of function calls, I'm using Python synchronization primitive, but most importantly, communication is are by pickling/unpickling which is very slow. So the idea is to reduce at most at possible the communication, but without drastically changing the technology, I don't think you get a large improvement without great effort. You'll need to rewrite all the communication primitive and to inline the function call to the client code.

On the other hand, If you are allowed to write the code directly in C/C++ (which is the case for small basic objects), then the Cilk extension of C is wonderful. It's a small extension which is supported by GCC since version 5, ICC and in recent CLANG (https://cilkplus.github.io/). I've several code where I've been able to have very large speedup. See e.g. https://hal.archives-ouvertes.fr/hal-00823339/file/article.pdf if you are curious.

Now if you have a specific use case, I'll be happy to experiment.

Christian, Vivien, and I are working on trying to speed up the iteration of finite Coxeter groups by using (subgroups of) permutation groups and the iterator method of generic Coxeter groups, which uses a SearchForest (see #11187 and the CoxeterGroups category). Our goal is to try and match or beat GAP3 iteration time (which we are about ~30x slower).

We are in the process pushing our successor computation code to Cython and some of our other methods, but we would appreciate any insights you have.

Yet the code on this ticket will definitely help with that. Thank you for finishing it.

comment:102 in reply to: ↑ 101 ; follow-up: Changed 3 years ago by hivert

Replying to tscrim:

Christian, Vivien, and I are working on trying to speed up the iteration of finite Coxeter groups by using (subgroups of) permutation groups and the iterator method of generic Coxeter groups, which uses a SearchForest (see #11187 and the CoxeterGroups category). Our goal is to try and match or beat GAP3 iteration time (which we are about ~30x slower).

I'll be both very happy and interested to have usecase for this code.

We are in the process pushing our successor computation code to Cython and some of our other methods, but we would appreciate any insights you have.

Do you have a typical example relying on SearchForest of code on top of #11187 that I can profile ? Note that I included in the documentation of map_reduce.py a guide on how to profile the parallel code. I'd like to measure the proportion of time that is spend on Coxeter code one one hand and on the parallel infrastructure on the other.

Now, concerning coxeter group, I'm quite sure that rewriting part of the code in C/C++ and using advanced parallel stuff (AVX instruction set + SIMD) you can get speedup a large as several hundreds and even thousand ! As a witness on https://github.com/hivert/IVMPG I wrote a code whose goal is to enumerate integer vector modulo permutation group. Compared to the code we have in Sage which has been Cythonized and optimized by Simon King I manage to get speedup a large as x2000 ! I need help on the build system to be able to distribute it. I'm pretty sure the code there is very similar to what you need in Coxeter groups.

By the way, should we transfer this discussion on #11187 ?

comment:103 in reply to: ↑ 102 Changed 3 years ago by tscrim

Replying to hivert:

By the way, should we transfer this discussion on #11187 ?

Done.

comment:104 Changed 3 years ago by vbraun

  • Branch changed from u/hivert/ticket/13580 to 134c1fad85dd50fa28528adc343bad759b174cc6
  • Resolution set to fixed
  • Status changed from positive_review to closed

comment:105 follow-up: Changed 3 years ago by tmonteil

  • Commit 134c1fad85dd50fa28528adc343bad759b174cc6 deleted

This ticket breaks the doctests on my patchbot, i do not know whether it is because it is run from a VM or from a 32-bit system, see http://patchbot.sagemath.org/log/0/debian/8.3/i686/3.16.0-4-586/tmonteil-debian-jessie-32/2016-04-15%2012:13:01?short

comment:106 Changed 3 years ago by tmonteil

Note also that the VM emulates a "Pentium III (Katmai)"

comment:107 in reply to: ↑ 105 ; follow-up: Changed 3 years ago by hivert

Replying to tmonteil:

This ticket breaks the doctests on my patchbot, i do not know whether it is because it is run from a VM or from a 32-bit system, see http://patchbot.sagemath.org/log/0/debian/8.3/i686/3.16.0-4-586/tmonteil-debian-jessie-32/2016-04-15%2012:13:01?short

From the error message, it is neither because of 32 bits nor because of the VM but because your (virtual) machine has only one core. The goal of the patch is to distribute computation on several cores. The doctest assumes that your machine has at least 2 cores. I should fix that. Though I'm not sure what to do on those kind of machine. Pretend that they are two core and let the computation run on those two cores ?

comment:108 in reply to: ↑ 107 Changed 3 years ago by hivert

Replying to hivert:

From the error message, it is neither because of 32 bits nor because of the VM but because your (virtual) machine has only one core. The goal of the patch is to distribute computation on several cores. The doctest assumes that your machine has at least 2 cores. I should fix that. Though I'm not sure what to do on those kind of machine. Pretend that they are two core and let the computation run on those two cores ?

I just checked that I get the very same errors on my machine pretending that it has only one core.

comment:109 Changed 3 years ago by tmonteil

OK thanks for isolating the culplrit, i made #20449 as a follwup.

Last edited 3 years ago by tmonteil (previous) (diff)
Note: See TracTickets for help on using tickets.