#13580 closed enhancement (fixed)
Parallel map reduce on SearchForest
Reported by:  hivert  Owned by:  hivert 

Priority:  major  Milestone:  sage7.2 
Component:  combinatorics  Keywords:  mapreduce, days57, days77 
Cc:  sagecombinat, ncohen, jdemeyer, slabbe  Merged in:  
Authors:  Florent Hivert, JeanBaptiste Priez, Nathann Cohen  Reviewers:  Sébastien Labbé, JeanBaptiste Priez 
Report Upstream:  N/A  Work issues:  
Branch:  134c1fa (Commits, GitHub, GitLab)  Commit:  
Dependencies:  Stopgaps: 
Description (last modified by )
Implement a map reduce algorithm in parallel on large sets described by a SearchForest
. We use a workstealing algorithm (see https://en.wikipedia.org/wiki/Work_stealing) based on Python's multiprocessing.
Change History (109)
comment:1 Changed 8 years ago by
comment:2 Changed 8 years ago by
 Milestone changed from sage5.11 to sage5.12
comment:3 Changed 7 years ago by
 Milestone changed from sage6.1 to sage6.2
comment:4 Changed 7 years ago by
 Branch set to u/hivert/13580/map_reduce
comment:5 Changed 7 years ago by
 Commit set to 693c672387212c47ac65012466f92ffa2cc55932
 Keywords days57 added
New commits:
693c672  Imported code from trac_13580map_reducefh.patch + fixed multiline doctests.

comment:6 Changed 7 years ago by
 Milestone changed from sage6.2 to sage6.3
comment:7 Changed 7 years ago by
 Milestone changed from sage6.3 to sage6.4
comment:8 Changed 6 years ago by
 Branch changed from u/hivert/13580/map_reduce to u/nthiery/13580/map_reduce
comment:9 Changed 6 years ago by
 Commit changed from 693c672387212c47ac65012466f92ffa2cc55932 to addb17a177397cd9de0952e7c1ee457fe49220bc
Branch pushed to git repo; I updated commit sha1. New commits:
addb17a  13580: Trivial rest fix

comment:10 Changed 6 years ago by
 Branch changed from u/nthiery/13580/map_reduce to u/hivert/13580/map_reduce
comment:11 Changed 6 years ago by
 Commit changed from addb17a177397cd9de0952e7c1ee457fe49220bc to 92e6e680771d6f286954bf85ee0064d22a0b253c
Branch pushed to git repo; I updated commit sha1. New commits:
92e6e68  Merge branch 't/13580/map_reduce' into 13580

comment:12 Changed 6 years ago by
 Commit changed from 92e6e680771d6f286954bf85ee0064d22a0b253c to 734c74810e63321463e4bca7bcf0da85dffc47b9
Branch pushed to git repo; I updated commit sha1. New commits:
734c748  #13580 Fixed test after merging #6637

comment:13 Changed 6 years ago by
I saw the following typo in map reduce file while looking at the previous commit: "As an example, ee compute"
comment:14 Changed 6 years ago by
 Commit changed from 734c74810e63321463e4bca7bcf0da85dffc47b9 to f234f60f37e007365c2aec1b4defea55ff255c70
Branch pushed to git repo; I updated commit sha1. New commits:
f234f60  #13580 : improved documentation

comment:15 Changed 6 years ago by
 Commit changed from f234f60f37e007365c2aec1b4defea55ff255c70 to 7a33037a2d444e2f6981637d3e187cbcca72ba7d
comment:16 Changed 6 years ago by
 Commit changed from 7a33037a2d444e2f6981637d3e187cbcca72ba7d to 168ceae047328c6c529e0ce70d91a4300efabe29
Branch pushed to git repo; I updated commit sha1. New commits:
168ceae  Added architecture picture for map_reduce

comment:17 Changed 6 years ago by
 Commit changed from 168ceae047328c6c529e0ce70d91a4300efabe29 to 6c127527dfcd62673f2d871616a22b95a4b368e9
comment:18 Changed 6 years ago by
 Commit changed from 6c127527dfcd62673f2d871616a22b95a4b368e9 to 493bb52f98bb2551eb557528fbfc2d6a908bd38d
Branch pushed to git repo; I updated commit sha1. New commits:
493bb52  Done the doc of Map/Reduce

comment:19 Changed 6 years ago by
 Commit changed from 493bb52f98bb2551eb557528fbfc2d6a908bd38d to 2534f12eaf4a820c6e9a76894aabf19775b8ba05
Branch pushed to git repo; I updated commit sha1. New commits:
2534f12  Moved map/reduce to sage/parallel

comment:20 Changed 6 years ago by
 Cc jdemeyer slabbe added
 Description modified (diff)
 Status changed from new to needs_review
comment:21 followup: ↓ 22 Changed 6 years ago by
 Status changed from needs_review to needs_work
The branch currently have conflicts with development version of Sage (because the link is red in the description of the ticket above). The branch seems to be based on 6.2.beta6 which is old and may explain the presence of a conflict.
comment:22 in reply to: ↑ 21 ; followup: ↓ 25 Changed 6 years ago by
Replying to slabbe:
The branch currently have conflicts with development version of Sage (because the link is red in the description of the ticket above). The branch seems to be based on 6.2.beta6 which is old and may explain the presence of a conflict.
Yes ! there is a trivial conflict. Thank you for pointing it. I'm fixing it, testing and reuploading.
comment:23 Changed 5 years ago by
 Commit changed from 2534f12eaf4a820c6e9a76894aabf19775b8ba05 to e5b74770eec0b0e93e22a70358bc83f16893ce2a
Branch pushed to git repo; I updated commit sha1. New commits:
e5b7477  Merge 6.10.rc1 + fixed conflict

comment:24 Changed 5 years ago by
 Commit changed from e5b74770eec0b0e93e22a70358bc83f16893ce2a to 82fd1e47599b75ef22133d18199ed1ee15478f5d
Branch pushed to git repo; I updated commit sha1. New commits:
82fd1e4  Fixed links according to deprecation/rebase

comment:25 in reply to: ↑ 22 Changed 5 years ago by
 Status changed from needs_work to needs_review
Replying to hivert:
Replying to slabbe:
The branch currently have conflicts with development version of Sage (because the link is red in the description of the ticket above). The branch seems to be based on 6.2.beta6 which is old and may explain the presence of a conflict.
Yes ! there is a trivial conflict. Thank you for pointing it. I'm fixing it, testing and reuploading.
Done ! I was actually based on 6.9.
comment:26 Changed 5 years ago by
 Commit changed from 82fd1e47599b75ef22133d18199ed1ee15478f5d to 68b6530ecc825e773bdd34b09a98b21eb4d8987c
Branch pushed to git repo; I updated commit sha1. New commits:
68b6530  Removed TODO in doc

comment:27 Changed 5 years ago by
 Summary changed from Parellel map reduce on SearchForest to Parallel map reduce on SearchForest
comment:28 Changed 5 years ago by
 Commit changed from 68b6530ecc825e773bdd34b09a98b21eb4d8987c to b93ba7da6f57a9fd6f02469819f4d3ce0c2f289b
Branch pushed to git repo; I updated commit sha1. New commits:
b93ba7d  Fixed a link + doctest

comment:29 followup: ↓ 30 Changed 5 years ago by
 Commit changed from b93ba7da6f57a9fd6f02469819f4d3ce0c2f289b to 5c7720dae1034f1b2e04bb9abb3ea90797de55ee
Branch pushed to git repo; I updated commit sha1. New commits:
5c7720d  Fixed doctest continuations and raise statements

comment:30 in reply to: ↑ 29 Changed 5 years ago by
comment:31 Changed 5 years ago by
 Commit changed from 5c7720dae1034f1b2e04bb9abb3ea90797de55ee to cb9c011d0b7f2f3a736f7418c51d49ccc8c5b715
Branch pushed to git repo; I updated commit sha1. New commits:
cb9c011  Fixed another multiline doctest

comment:32 followup: ↓ 34 Changed 5 years ago by
I just looked at the code in the browser. Looks good. I will do real code review later this week hopefully. Quickly, I saw two typos:
First line of /src/sage/parallel/map_reduce.py
: Paralell
Later in the same file I think: parameters tu the
comment:33 Changed 5 years ago by
 Commit changed from cb9c011d0b7f2f3a736f7418c51d49ccc8c5b715 to 14b086beac66e6a3fe3bf6be2cd7ae0e28c20152
Branch pushed to git repo; I updated commit sha1. New commits:
14b086b  Fixed Sebastien's Typos

comment:34 in reply to: ↑ 32 Changed 5 years ago by
Replying to slabbe:
I just looked at the code in the browser. Looks good. I will do real code review later this week hopefully. Quickly, I saw two typos:
Fixed ! I'm afraid you'll find more ! By the way, removed distributed from the title since I don't do distributed computation anymore. I had a prototype which allowed to launch those computation on a cluster of machines, but it induces a huge performance loss do I dropped It. maybe we will work on this with Jeroen...
comment:35 followup: ↓ 36 Changed 5 years ago by
Dear Florent, I know that you like multiline doctests, but for two reasons I prefer to avoid them as much as possible in doctests:
 It is not fun for the user who wants to adapt an example after a copy paste in the console. The up arrow brings the multiline all at once and it is not fun to change the value 10 to a value 25 let say.
 Using variables to store the input before calling the method gives lot of information easily to the user: what are the argument names, in what order should they be used.
I know I am asking you to change your style by asking you this, but don't you agree with me or can you provide some counterarguments to mine? I give an example below:

src/sage/combinat/backtrack.py
diff git a/src/sage/combinat/backtrack.py b/src/sage/combinat/backtrack.py index 9dee057..ac1243a 100644
a b class SearchForest(Parent): 695 695 reduce_function = None, 696 696 reduce_init = None): 697 697 r""" 698 Apply enMap Reduce algorithm on ``self``698 Apply a Map Reduce algorithm on ``self`` 699 699 700 700 EXAMPLES:: 701 701 702 sage: F = RecursivelyEnumeratedSet( [([i],i, i) for i in range(1,10)], 703 ....: lambda (list, sum, last): 704 ....: [(list + [i], sum + i, i) for i in range(1,last)], 705 ....: structure='forest', enumeration='depth') 702 sage: seeds = [([i],i, i) for i in range(1,10)] 703 sage: succ = lambda (list, sum, last): 704 ....: [(list + [i], sum + i, i) for i in range(1,last)] 705 sage: F = RecursivelyEnumeratedSet(seeds, succ, 706 ....: structure='forest', enumeration='depth') 706 707 sage: y = var('y') 707 sage: F.map_reduce( 708 ....: lambda (li, sum, _): y**sum, lambda x,y: x + y, 0 ) 708 sage: map_function = lambda (li, sum, _): y**sum 709 sage: reduce_function = lambda x,y: x + y 710 sage: F.map_reduce(map_function, reduce_function, 0) 709 711 y^45 + y^44 + y^43 + 2*y^42 + 2*y^41 + 3*y^40 + 4*y^39 + 5*y^38 + 6*y^37 + 8*y^36 + 9*y^35 710 712 711 713 .. SEEALSO:: :mod:`sage.parallel.map_reduce`
I noticed another typo on the same line as the one parameters tu the
: On
> One
comment:36 in reply to: ↑ 35 ; followup: ↓ 37 Changed 5 years ago by
Replying to slabbe:
Dear Florent, I know that you like multiline doctests, but for two reasons I prefer to avoid them as much as possible in doctests:
 It is not fun for the user who wants to adapt an example after a copy paste in the console. The up arrow brings the multiline all at once and it is not fun to change the value 10 to a value 25 let say.
I'm tempted to answer that I don't have this problem with emacs ;). Of course, for the people using a two letters editor...
 Using variables to store the input before calling the method gives lot of information easily to the user: what are the argument names, in what order should they be used.
I'm much more convinced by this argument. I'm watching an exam tomorrow evening. I'll change my code accordingly.
comment:37 in reply to: ↑ 36 ; followup: ↓ 42 Changed 5 years ago by
Replying to hivert:
I'm much more convinced by this argument. I'm watching an exam tomorrow evening. I'll change my code accordingly.
Actually, I'm not that much convinced. It's not clear for me that
sage: map_function = lambda x: 1 sage: reduce_function = lambda x,y: x+y sage: reduce_init = 0 sage: S.map_reduce(map_function, reduce_function, reduce_init) 131071
is much better than
sage: S.map_reduce( ....: map_function = lambda x: 1, ....: reduce_function = lambda x,y: x+y, ....: reduce_init = 0 )
In this second version which is the style I adopted in map_reduce.py
, we don't repeat the name twice, which is better in my opinion.
A third possibility is
sage: S.map_reduce(lambda x: 1, lambda x,y: x+y, 0)
It is the one used through the whole backtrack.py
files. Unless for very short example, I don't find it very readable. But since it's not adding new code but changing old one, which needs even more cleanup and doc improvement, I'd rather keep it for another ticket. More precisly, I'd rather do that during #16351.
What do you think ?
comment:38 followup: ↓ 43 Changed 5 years ago by
I'm not sure to understand the patchbot complaint ? Is it because the map_reduce
is not imported as Sage's startup ?
comment:39 Changed 5 years ago by
I just uploaded a doc improvement: the map_reduce
function was missing the description of the input.
comment:40 Changed 5 years ago by
 Commit changed from 14b086beac66e6a3fe3bf6be2cd7ae0e28c20152 to e3c4c716b4dbd9650f06de80e6477bb4e9d36a1c
Branch pushed to git repo; I updated commit sha1. New commits:
e3c4c71  Doc improvements

comment:41 Changed 5 years ago by
 Commit changed from e3c4c716b4dbd9650f06de80e6477bb4e9d36a1c to 31c4735651aa4f56c7e6899e6352669fa2b21d99
Branch pushed to git repo; I updated commit sha1. New commits:
31c4735  Removed unneded import in pxd

comment:42 in reply to: ↑ 37 Changed 5 years ago by
Replying to hivert:
What do you think ?
I think that some doctest are more important than other, especially the first doctests that people might be expected to see when using the map reduce code. For example the map_reduce
method in the recursively enumerated set file is one possible entry point where clarity of doctests matters more. Therefore, I like the way you change the doctest. Definitively, the top of the module is an important entry point. For other sub methods of the class map reduce, I care less...
comment:43 in reply to: ↑ 38 ; followup: ↓ 49 Changed 5 years ago by
Replying to hivert:
I'm not sure to understand the patchbot complaint ? Is it because the
map_reduce
is not imported as Sage's startup ?
I don't see why neither...
comment:44 followup: ↓ 47 Changed 5 years ago by
Event
, Condition
, time
and os
are imported in map_reduce file but are not used.
comment:45 followup: ↓ 46 Changed 5 years ago by
Is there a reason why you use res=[""]
and res[0]
in print_communication_statistics
instead of res=""
and res
?
comment:46 in reply to: ↑ 45 ; followup: ↓ 50 Changed 5 years ago by
Replying to slabbe:
Is there a reason why you use
res=[""]
andres[0]
inprint_communication_statistics
instead ofres=""
andres
?
Yes ! This is the classical trick to have a local variable shared with the local function (see e.g: http://stackoverflow.com/questions/2609518/pythonnestedfunctionscopes).
comment:47 in reply to: ↑ 44 Changed 5 years ago by
Replying to slabbe:
Event
,Condition
,time
andos
are imported in map_reduce file but are not used.
Fixed in my last push.
comment:48 Changed 5 years ago by
 Commit changed from 31c4735651aa4f56c7e6899e6352669fa2b21d99 to cca788124c7eb757cceda7cb7a73f9af28149a1e
Branch pushed to git repo; I updated commit sha1. New commits:
cca7881  Removed unneeded import

comment:49 in reply to: ↑ 43 Changed 5 years ago by
Replying to slabbe:
I don't see why neither...
According to Vincent on Sage devel it is probably a bug of the patchbot (see https://groups.google.com/forum/#!topic/sagedevel/xlpLwktA5Hk).
comment:50 in reply to: ↑ 46 Changed 5 years ago by
Replying to hivert:
Replying to slabbe:
Is there a reason why you use
res=[""]
andres[0]
inprint_communication_statistics
instead ofres=""
andres
?Yes ! This is the classical trick to have a local variable shared with the local function (see e.g: http://stackoverflow.com/questions/2609518/pythonnestedfunctionscopes).
Could you add this information inside the code of that method?
comment:51 followup: ↓ 56 Changed 5 years ago by
map_reduce
method of SearchForest
should document what happen when this method is called with no arguments, i.e. returns the cardinality. More precisely, it should say that the value None
for arguments map_function
and reduce_function
is replaced by what?
comment:52 Changed 5 years ago by
On can use the three following parameters:
> One can use the three following parameters:
comment:53 followup: ↓ 55 Changed 5 years ago by
I get the following doctests error on top on sage6.10 on my machine:
Git branch: 13580 Using optional=gcc,mpir,python2,sage Doctesting 1 file. sage t warnlong 85.4 map_reduce.py ********************************************************************** File "map_reduce.py", line 960, in sage.parallel.map_reduce.RESetMapReduce._signal_task_start Failed example: S._active_tasks Expected: <Semaphore(value=2)> Got: <Semaphore(value=unknown)> ********************************************************************** File "map_reduce.py", line 964, in sage.parallel.map_reduce.RESetMapReduce._signal_task_start Failed example: S._active_tasks Expected: <Semaphore(value=3)> Got: <Semaphore(value=unknown)> ********************************************************************** File "map_reduce.py", line 985, in sage.parallel.map_reduce.RESetMapReduce._signal_task_done Failed example: S._active_tasks Expected: <Semaphore(value=2)> Got: <Semaphore(value=unknown)> ********************************************************************** File "map_reduce.py", line 989, in sage.parallel.map_reduce.RESetMapReduce._signal_task_done Failed example: S._active_tasks Expected: <Semaphore(value=1)> Got: <Semaphore(value=unknown)> ********************************************************************** 2 items had failures: 2 of 9 in sage.parallel.map_reduce.RESetMapReduce._signal_task_done 2 of 7 in sage.parallel.map_reduce.RESetMapReduce._signal_task_start [249 tests, 4 failures, 43.04 s]  sage t warnlong 85.4 map_reduce.py # 4 doctests failed 
It seems ok on patchbot reporting seemingly unrelated errors :
 sage t long src/sage/interfaces/expect.py # Timed out sage t long src/sage/interfaces/gap.py # 10 doctests failed 
comment:54 Changed 5 years ago by
 Status changed from needs_review to needs_work
I divided the review in a stack of tasks but since I am unable to parallelize my working time, I did the review serially with many interruptions during the previous days:)
I finished to look at what I wanted to look at. The documentation compiles good and helps to understand what is happening. I was able to test the parallel computation on at least one extensive example. To me it will be a positive review after my previous four comments are answered.
comment:55 in reply to: ↑ 53 ; followup: ↓ 60 Changed 5 years ago by
Replying to slabbe:
I get the following doctests error on top on sage6.10 on my machine:
Expected: <Semaphore(value=1)> Got: <Semaphore(value=unknown)>
Is this happening on MacOS ? I know that Semaphore are implemented differently on this system. See in particular the note after class multiprocessing.BoundedSemaphore
, where it says:
on Unix platforms like Mac OS X where sem_getvalue() is not implemented.
If so and if the rests behave correctly, I'm tempted to replace the doctests with
<Semaphore(value=...)>
What do you think ?
comment:56 in reply to: ↑ 51 ; followup: ↓ 61 Changed 5 years ago by
Replying to slabbe:
map_reduce
method ofSearchForest
should document what happen when this method is called with no arguments, i.e. returns the cardinality. More precisely, it should say that the valueNone
for argumentsmap_function
andreduce_function
is replaced by what?
I didn't document this one because I'm was not sure It was a sensible default. If you think it is then let's go for it.
By the way thanks for your careful review !!!
Florent
comment:57 Changed 5 years ago by
 Commit changed from cca788124c7eb757cceda7cb7a73f9af28149a1e to 1a8b78e378bf3909a5c4ad4b15a77df7320ab42d
Branch pushed to git repo; I updated commit sha1. New commits:
1a8b78e   MacOSX compatible doctests for semaphore.

comment:58 Changed 5 years ago by
I uploaded a patch which should address all your concerns:
 tests on MacOSX
 typos
 default values documentation
comment:59 Changed 5 years ago by
Introduced typo in the previous commit: the function costant to 1.
> constant
comment:60 in reply to: ↑ 55 Changed 5 years ago by
Replying to hivert:
Replying to slabbe:
I get the following doctests error on top on sage6.10 on my machine:
Expected: <Semaphore(value=1)> Got: <Semaphore(value=unknown)>Is this happening on MacOS ?
Yes
If so and if the rests behave correctly, I'm tempted to replace the doctests with
<Semaphore(value=...)>What do you think ?
I agree. All tests passed now...
comment:61 in reply to: ↑ 56 Changed 5 years ago by
Replying to hivert:
I didn't document this one because I'm was not sure It was a sensible default. If you think it is then let's go for it.
I think it is a good default. Also, sometimes, you may want to set only some of the arguments and this information allows it. Finally, it's good to see that cardinality is a special case of this method.
comment:62 Changed 5 years ago by
 Reviewers set to Sébastien Labbé
comment:63 Changed 5 years ago by
Just to be clear: I am waiting for the costant
> constant
fix to change the status to positive review.
comment:64 Changed 5 years ago by
 Branch changed from u/hivert/13580/map_reduce to u/nthiery/13580/map_reduce
comment:65 Changed 5 years ago by
 Commit changed from 1a8b78e378bf3909a5c4ad4b15a77df7320ab42d to 98fd9e1c7a085b1ba62d004dba6cca4c89001cd2
 Status changed from needs_work to positive_review
Typo fixed, hence changing the status to positive review on Sebastien's behalf.
New commits:
98fd9e1  13580: typo fix

comment:66 followups: ↓ 68 ↓ 78 Changed 5 years ago by
 Status changed from positive_review to needs_work
comment:67 Changed 5 years ago by
Hello and merry christmas every one,
I have some question on this ticket.
Firstly, I don't understand the documentation from the line 571:
Decription of the map/reduce operation:  ``map_function=f``  (default to ``None``)  ``reduce_function=red``  (default to ``None``)  ``reduce_init=init``  (default to ``None``)
What means f
, red
and init
? My opinion is that f
, red
and init
are irrelevant but an example should be interesting at this point:
(copypaste of one in the module documentation)
sage: from sage.parallel.map_reduce import RESetMapReduce sage: S = RESetMapReduce( ....: roots = [[]], ....: children = lambda l: [l+[0], l+[1]] if len(l) <= 15 else [], ....: map_function = lambda x : 1, ....: reduce_function = lambda x,y: x+y, ....: reduce_init = 0 ) sage: S.run() 131071
Ok, there is the seealso
which says go to see the documentation of the module for examples, but one example there is interesting, no?
Then it seems there is some useless import (like line 1345 from multiprocessing import current_process
) (I try to compile sage soon and I can remove all useless import if you want).
Finally, I'm not sure to understand the use of AbortError
. It is used to abort the computation and to say to the workers and the thiefs every think is done?
JeanBaptiste
comment:68 in reply to: ↑ 66 Changed 5 years ago by
Replying to vbraun:
I can not reproduce this problem on my Mac OS X Yosemite 10.10.2:
$ sage tp long src/sage/parallel/map_reduce.py Running doctests with ID 2015122810443165e477d9. Git branch: 13580 Using optional=gcc,mpir,python2,sage Doctesting 1 file using 2 threads. sage t long warnlong 85.0 src/sage/parallel/map_reduce.py [252 tests, 40.89 s]  All tests passed! 
comment:69 Changed 5 years ago by
Hello,
I can reproduce test failures on El Capitan (os x):
I run 3 tests and I obtained 3 distincts failures. My problem is that if I try to execute those lines which fails in sage terminal (several times) then it does not fail!
Is it possible that the test environment is not robust specially on mac os?
comment:70 followup: ↓ 71 Changed 5 years ago by
I read the code and I don't know how to understand this:
res = post_process(node) if res is not None: self._res = reduc(self._res, mapp(res))
in the worker code of walk_branch_locally
method.
I think this is useless, no? Or if not then the documentation of post_process
function should be improve to specify that if post_process
returns None
then there is a specific behavior.
comment:71 in reply to: ↑ 70 Changed 5 years ago by
Replying to elixyre:
res = post_process(node) if res is not None: self._res = reduc(self._res, mapp(res))I think this is useless, no? Or if not then the documentation of
post_process
function should be improve to specify that ifpost_process
returnsNone
then there is a specific behavior.
Ok, this behavior is specify in the documentation of RESetMapReduce
but this is not specified in paragraph (line 135) associated of this module.
comment:72 Changed 5 years ago by
Furthermore the lines:
for child in newnodes: self._todo.append(child)
of walk_branch_locally
should be replaced by
self._todo_.extend(newnodes)
(I just notice that I read and if you are agree with my remark then I can replace that)
comment:73 followup: ↓ 74 Changed 5 years ago by
Go back to None
activities, if I assume some users want to use nodes which could be None
then my first idea is to use the post_process
function to avoid problem but... if the thief doesn't receive then he send None
and he says there is no job to do.
I propose two solutions:
 first one, to specify that
None
is not a node never! (this is assumed but this must be explicited in the documentation),  second one (the object way), to define an object
NoTask
.
Opinion?
comment:74 in reply to: ↑ 73 Changed 5 years ago by
Replying to elixyre:
Go back to
None
activities, if I assume some users want to use nodes which could beNone
then my first idea is to use thepost_process
function to avoid problem but... if the thief doesn't receive then he sendNone
and he says there is no job to do.I propose two solutions:
 first one, to specify that
None
is not a node never! (this is assumed but this must be explicited in the documentation),
I'm clearly in Favor of this solution. We assume that None
doesn't blong to the set described by the underlying RESet. Note that this is something which concerns more the underlying RESet than the mapreduce computation.
comment:75 Changed 5 years ago by
 Branch changed from u/nthiery/13580/map_reduce to u/hivert/13580/map_reduce
comment:76 Changed 5 years ago by
 Commit changed from 98fd9e1c7a085b1ba62d004dba6cca4c89001cd2 to 6e3bbd35f4572dffa1953e4f4acddc916a36c967
New commits:
6e3bbd3  More logging for debugging

comment:77 Changed 5 years ago by
 Commit changed from 6e3bbd35f4572dffa1953e4f4acddc916a36c967 to 6b4a5cffe362093e6011801d3485a9dc02ddc1e2
comment:78 in reply to: ↑ 66 Changed 5 years ago by
Replying to vbraun:
We finally got the problem with JeanBaptiste. It occurs that semaphore are broken on MacOSX (or at least are not fully POSIX compliant). In particular, on standard unixes, when two processes are trying to acquire a semaphore whose value is more than two, they always both succeeded. On MacOS, one of them may fail. As a consequence, I'm writing a different code form MacOS relying on a Lock and a shared integer. It may be slower on system where semaphore are implemented in a lockless way.
Is there a standard Sage/Python? way to check if we are on MacOS ?
comment:79 Changed 5 years ago by
 Branch changed from u/hivert/13580/map_reduce to u/elixyre/ticket/13580
 Commit changed from 6b4a5cffe362093e6011801d3485a9dc02ddc1e2 to 44566e6dea943f9494fb315d9e3794c7b309c6de
 Reviewers changed from Sébastien Labbé to Sébastien Labbé, JeanBaptiste Priez
New commits:
44566e6  ticket search forest map reduce with no problem on mac os x

comment:80 Changed 5 years ago by
I pushed the modifications done with Florent about mac os problem.
It seems there still exists the problem of synchronization with the Pipe
(of the worker) but... it is never appear... so may be I am wrong.
Is it possible that two distinct process write in the same time on the _write_task
of a worker?
comment:81 Changed 5 years ago by
 Branch changed from u/elixyre/ticket/13580 to u/hivert/ticket/13580
comment:82 Changed 5 years ago by
 Commit changed from 44566e6dea943f9494fb315d9e3794c7b309c6de to aa8280f9cf68d1e592b760cc85bb7442b47a243e
I just uploaded a version which hopefully should be working on all OSes including MacOS. Please tests on various oses and report any failure here. I don't put needs review yet because it needs a few refactoring and the coverage it not yet 100%.
New commits:
9456035  Work in problem in fixing MACOX broken Semaphore

158023b  Wrote commented out MacOS version

aa8280f  Mapreduce version which should be working even on Darwin based OSes.

comment:83 Changed 5 years ago by
 Commit changed from aa8280f9cf68d1e592b760cc85bb7442b47a243e to f9841658ccda142d07cbfaa633dbf4b236165986
Branch pushed to git repo; I updated commit sha1. New commits:
f984165  Fixed the doc

comment:84 Changed 5 years ago by
 Commit changed from f9841658ccda142d07cbfaa633dbf4b236165986 to a5f4d676ce7201a801bf6c841aae0585fc144336
Branch pushed to git repo; I updated commit sha1. New commits:
a5f4d67  Improved doc for mapreduce

comment:85 Changed 5 years ago by
 Commit changed from a5f4d676ce7201a801bf6c841aae0585fc144336 to 569de0dae0d23e084cbce7690a2a64b53f906f8b
Branch pushed to git repo; I updated commit sha1. New commits:
569de0d  Merge branch 'develop' into t/13580/map_reduce

comment:86 Changed 5 years ago by
 Commit changed from 569de0dae0d23e084cbce7690a2a64b53f906f8b to b07dfe28221a7473041d65230be409d6cbfafbfc
Branch pushed to git repo; I updated commit sha1. New commits:
b07dfe2  Doc of the two implementationsof ActiveTaskCounter

comment:87 Changed 5 years ago by
 Commit changed from b07dfe28221a7473041d65230be409d6cbfafbfc to beebcbcd43c2d7a9d024f142f5e3cb328fbc4036
Branch pushed to git repo; I updated commit sha1. New commits:
beebcbc  #13580: Added comment on timing in the doc

comment:88 Changed 5 years ago by
 Keywords days77 added
 Status changed from needs_work to needs_review
I just uploaded a patch with basically two implementation of ActiveTaskCounter
: one using semaphore and which doesn't work on MACOS and one using semaphore for all POSIX compliant OSes. It should be ready for review.
comment:89 Changed 5 years ago by
 Commit changed from beebcbcd43c2d7a9d024f142f5e3cb328fbc4036 to 58eca2e4decb05733145b18e08f16120b8765a81
Branch pushed to git repo; I updated commit sha1. New commits:
58eca2e  #13580: Removed comment which is now in the doc

comment:90 Changed 5 years ago by
 Commit changed from 58eca2e4decb05733145b18e08f16120b8765a81 to 1badd8aec7efc3229cfdf76bffab3ad1f532bd33
Branch pushed to git repo; I updated commit sha1. New commits:
1badd8a  #13580: Renamed ActiveTaskCounter(Posix)

comment:91 Changed 5 years ago by
 Status changed from needs_review to needs_work
I get the following errors on OSX :
Running doctests with ID 201604071923558fd43b63. Git branch: 13580 Using optional=dot2tex,mpir,python2,sage Doctesting 1 file using 2 threads. sage t long warnlong 84.5 src/sage/parallel/map_reduce.py ********************************************************************** File "src/sage/parallel/map_reduce.py", line 727, in sage.parallel.map_reduce.ActiveTaskCounterPosix.__repr__ Failed example: ATC(4) Expected: ActiveTaskCounter(value=4) Got: <repr(<sage.parallel.map_reduce.ActiveTaskCounterPosix at 0x190d32e90>) failed: NotImplementedError> ********************************************************************** File "src/sage/parallel/map_reduce.py", line 743, in sage.parallel.map_reduce.ActiveTaskCounterPosix.task_start Failed example: c = ATC(4); c Expected: ActiveTaskCounter(value=4) Got: <repr(<sage.parallel.map_reduce.ActiveTaskCounterPosix at 0x190d32e50>) failed: NotImplementedError> ********************************************************************** File "src/sage/parallel/map_reduce.py", line 745, in sage.parallel.map_reduce.ActiveTaskCounterPosix.task_start Failed example: c.task_start() Exception raised: Traceback (most recent call last): File "/Users/slabbe/Applications/sagegit/local/lib/python2.7/sitepackages/sage/doctest/forker.py", line 496, in _run self.compile_and_execute(example, compiler, test.globs) File "/Users/slabbe/Applications/sagegit/local/lib/python2.7/sitepackages/sage/doctest/forker.py", line 858, in compile_and_execute exec(compiled, globs) File "<doctest sage.parallel.map_reduce.ActiveTaskCounterPosix.task_start[2]>", line 1, in <module> c.task_start() File "/Users/slabbe/Applications/sagegit/local/lib/python2.7/sitepackages/sage/parallel/map_reduce.py", line 764, in task_start return self._active_tasks.get_value() File "/Users/slabbe/Applications/sagegit/local/lib/python/multiprocessing/synchronize.py", line 114, in get_value return self._semlock._get_value() NotImplementedError ********************************************************************** File "src/sage/parallel/map_reduce.py", line 747, in sage.parallel.map_reduce.ActiveTaskCounterPosix.task_start Failed example: c Expected: ActiveTaskCounter(value=5) Got: <repr(<sage.parallel.map_reduce.ActiveTaskCounterPosix at 0x190d32e50>) failed: NotImplementedError> ********************************************************************** File "src/sage/parallel/map_reduce.py", line 755, in sage.parallel.map_reduce.ActiveTaskCounterPosix.task_start Failed example: c Expected: ActiveTaskCounter(value=0) Got: <repr(<sage.parallel.map_reduce.ActiveTaskCounterPosix at 0x190d32f90>) failed: NotImplementedError> ********************************************************************** File "src/sage/parallel/map_reduce.py", line 778, in sage.parallel.map_reduce.ActiveTaskCounterPosix.task_done Failed example: c = ATC(4); c Expected: ActiveTaskCounter(value=4) Got: <repr(<sage.parallel.map_reduce.ActiveTaskCounterPosix at 0x1026a5c10>) failed: NotImplementedError> ********************************************************************** File "src/sage/parallel/map_reduce.py", line 780, in sage.parallel.map_reduce.ActiveTaskCounterPosix.task_done Failed example: c.task_done() Exception raised: Traceback (most recent call last): File "/Users/slabbe/Applications/sagegit/local/lib/python2.7/sitepackages/sage/doctest/forker.py", line 496, in _run self.compile_and_execute(example, compiler, test.globs) File "/Users/slabbe/Applications/sagegit/local/lib/python2.7/sitepackages/sage/doctest/forker.py", line 858, in compile_and_execute exec(compiled, globs) File "<doctest sage.parallel.map_reduce.ActiveTaskCounterPosix.task_done[2]>", line 1, in <module> c.task_done() File "/Users/slabbe/Applications/sagegit/local/lib/python2.7/sitepackages/sage/parallel/map_reduce.py", line 796, in task_done return self._active_tasks.get_value() File "/Users/slabbe/Applications/sagegit/local/lib/python/multiprocessing/synchronize.py", line 114, in get_value return self._semlock._get_value() NotImplementedError ********************************************************************** File "src/sage/parallel/map_reduce.py", line 782, in sage.parallel.map_reduce.ActiveTaskCounterPosix.task_done Failed example: c Expected: ActiveTaskCounter(value=3) Got: <repr(<sage.parallel.map_reduce.ActiveTaskCounterPosix at 0x1026a5c10>) failed: NotImplementedError> ********************************************************************** File "src/sage/parallel/map_reduce.py", line 805, in sage.parallel.map_reduce.ActiveTaskCounterPosix.abort Failed example: c = ATC(4); c Expected: ActiveTaskCounter(value=4) Got: <repr(<sage.parallel.map_reduce.ActiveTaskCounterPosix at 0x1026a5910>) failed: NotImplementedError> ********************************************************************** File "src/sage/parallel/map_reduce.py", line 808, in sage.parallel.map_reduce.ActiveTaskCounterPosix.abort Failed example: c Expected: ActiveTaskCounter(value=0) Got: <repr(<sage.parallel.map_reduce.ActiveTaskCounterPosix at 0x1026a5910>) failed: NotImplementedError> ********************************************************************** 4 items had failures: 1 of 3 in sage.parallel.map_reduce.ActiveTaskCounterPosix.__repr__ 2 of 5 in sage.parallel.map_reduce.ActiveTaskCounterPosix.abort 3 of 7 in sage.parallel.map_reduce.ActiveTaskCounterPosix.task_done 4 of 8 in sage.parallel.map_reduce.ActiveTaskCounterPosix.task_start [296 tests, 10 failures, 33.64 s]  sage t long warnlong 84.5 src/sage/parallel/map_reduce.py # 10 doctests failed  Total time for all tests: 33.9 seconds cpu time: 5.8 seconds cumulative wall time: 33.6 seconds
comment:92 Changed 5 years ago by
 Commit changed from 1badd8aec7efc3229cfdf76bffab3ad1f532bd33 to 46cbab9d81890649c3d056f47ed62065bcb04ca3
Branch pushed to git repo; I updated commit sha1. New commits:
46cbab9  13580: Fixed doctests to pass on Darwin

comment:93 Changed 5 years ago by
It works now! :
Using optional=dot2tex,mpir,python2,sage Doctesting 1 file using 2 threads. sage t warnlong 84.3 src/sage/parallel/map_reduce.py [296 tests, 32.56 s]  All tests passed! 
comment:94 Changed 5 years ago by
some typos:
compute the the cardinality
its job. Non only mapping
> Not only
method so that to compute the number of element you only need to call::
> wrap the line
precedingly by we only
> previously
? , but
I would suggest to replace this:
OUTPUT: Calling :meth:`task_done` decrement the counter and returns its value after the decrementation.
by
OUTPUT: Calling :meth:`task_done` decrement the counter and returns its value after the decrementation.
to follow sage standards (more than once).
Also, I would replace this:
INPUT: ``o``  a node
by
INPUT:  ``o``  a node
to follow sage standards (more than once).
the computation are done i
> s missing
comment:95 Changed 5 years ago by
 Commit changed from 46cbab9d81890649c3d056f47ed62065bcb04ca3 to 134c1fad85dd50fa28528adc343bad759b174cc6
Branch pushed to git repo; I updated commit sha1. New commits:
134c1fa  13580: doc rereading

comment:96 followup: ↓ 98 Changed 5 years ago by
I am wondering what else is currently needed for this ticket before a positive review?
Also, what are the current plans to cythonize SearchForest
and the parallel reduce map?
comment:97 followup: ↓ 99 Changed 5 years ago by
 Status changed from needs_work to positive_review
What I had in mind for the next step to do was to move the SearchForest
code from backtrack file into the recursivelyenumerated_set.pyx
file *and* into the hierarchy of RecEnumSet
classes so that it benefits for example of methods like the newly added to_digraph
.
comment:98 in reply to: ↑ 96 ; followup: ↓ 101 Changed 5 years ago by
Replying to tscrim:
I am wondering what else is currently needed for this ticket before a positive review?
Sebastien is currently rereading it. I think he is very close to put a positive review.
Also, what are the current plans to cythonize
SearchForest
and the parallel reduce map?
Concerning Cythonization of mapreduce, my current opinion is that, the code is designed for case where the computation of the children is costly. In this case the cost of the current code should be negligible. There are a lot of function calls, I'm using Python synchronization primitive, but most importantly, communication is are by pickling/unpickling which is very slow. So the idea is to reduce at most at possible the communication, but without drastically changing the technology, I don't think you get a large improvement without great effort. You'll need to rewrite all the communication primitive and to inline the function call to the client code.
On the other hand, If you are allowed to write the code directly in C/C++ (which is the case for small basic objects), then the Cilk extension of C is wonderful. It's a small extension which is supported by GCC since version 5, ICC and in recent CLANG (https://cilkplus.github.io/). I've several code where I've been able to have very large speedup. See e.g. https://hal.archivesouvertes.fr/hal00823339/file/article.pdf if you are curious.
Now if you have a specific use case, I'll be happy to experiment.
comment:99 in reply to: ↑ 97 Changed 5 years ago by
Replying to slabbe:
What I had in mind for the next step to do was to move the
SearchForest
code from backtrack file into therecursivelyenumerated_set.pyx
file *and* into the hierarchy ofRecEnumSet
classes so that it benefits for example of methods like the newly addedto_digraph
.
That was also the next goal in my mind.
comment:100 Changed 5 years ago by
See #16351
comment:101 in reply to: ↑ 98 ; followup: ↓ 102 Changed 5 years ago by
 Milestone changed from sage6.4 to sage7.2
Replying to hivert:
Replying to tscrim:
Also, what are the current plans to cythonize
SearchForest
and the parallel reduce map?Concerning Cythonization of mapreduce, my current opinion is that, the code is designed for case where the computation of the children is costly. In this case the cost of the current code should be negligible. There are a lot of function calls, I'm using Python synchronization primitive, but most importantly, communication is are by pickling/unpickling which is very slow. So the idea is to reduce at most at possible the communication, but without drastically changing the technology, I don't think you get a large improvement without great effort. You'll need to rewrite all the communication primitive and to inline the function call to the client code.
On the other hand, If you are allowed to write the code directly in C/C++ (which is the case for small basic objects), then the Cilk extension of C is wonderful. It's a small extension which is supported by GCC since version 5, ICC and in recent CLANG (https://cilkplus.github.io/). I've several code where I've been able to have very large speedup. See e.g. https://hal.archivesouvertes.fr/hal00823339/file/article.pdf if you are curious.
Now if you have a specific use case, I'll be happy to experiment.
Christian, Vivien, and I are working on trying to speed up the iteration of finite Coxeter groups by using (subgroups of) permutation groups and the iterator method of generic Coxeter groups, which uses a SearchForest
(see #11187 and the CoxeterGroups
category). Our goal is to try and match or beat GAP3 iteration time (which we are about ~30x slower).
We are in the process pushing our successor computation code to Cython and some of our other methods, but we would appreciate any insights you have.
Yet the code on this ticket will definitely help with that. Thank you for finishing it.
comment:102 in reply to: ↑ 101 ; followup: ↓ 103 Changed 5 years ago by
Replying to tscrim:
Christian, Vivien, and I are working on trying to speed up the iteration of finite Coxeter groups by using (subgroups of) permutation groups and the iterator method of generic Coxeter groups, which uses a
SearchForest
(see #11187 and theCoxeterGroups
category). Our goal is to try and match or beat GAP3 iteration time (which we are about ~30x slower).
I'll be both very happy and interested to have usecase for this code.
We are in the process pushing our successor computation code to Cython and some of our other methods, but we would appreciate any insights you have.
Do you have a typical example relying on SearchForest
of code on top of
#11187 that I can profile ? Note that I included in the documentation of
map_reduce.py a guide on how to profile the parallel code. I'd like to measure
the proportion of time that is spend on Coxeter code one one hand and on the
parallel infrastructure on the other.
Now, concerning coxeter group, I'm quite sure that rewriting part of the code in C/C++ and using advanced parallel stuff (AVX instruction set + SIMD) you can get speedup a large as several hundreds and even thousand ! As a witness on https://github.com/hivert/IVMPG I wrote a code whose goal is to enumerate integer vector modulo permutation group. Compared to the code we have in Sage which has been Cythonized and optimized by Simon King I manage to get speedup a large as x2000 ! I need help on the build system to be able to distribute it. I'm pretty sure the code there is very similar to what you need in Coxeter groups.
By the way, should we transfer this discussion on #11187 ?
comment:103 in reply to: ↑ 102 Changed 5 years ago by
comment:104 Changed 5 years ago by
 Branch changed from u/hivert/ticket/13580 to 134c1fad85dd50fa28528adc343bad759b174cc6
 Resolution set to fixed
 Status changed from positive_review to closed
comment:105 followup: ↓ 107 Changed 5 years ago by
 Commit 134c1fad85dd50fa28528adc343bad759b174cc6 deleted
This ticket breaks the doctests on my patchbot, i do not know whether it is because it is run from a VM or from a 32bit system, see http://patchbot.sagemath.org/log/0/debian/8.3/i686/3.16.04586/tmonteildebianjessie32/20160415%2012:13:01?short
comment:106 Changed 5 years ago by
Note also that the VM emulates a "Pentium III (Katmai)"
comment:107 in reply to: ↑ 105 ; followup: ↓ 108 Changed 5 years ago by
Replying to tmonteil:
This ticket breaks the doctests on my patchbot, i do not know whether it is because it is run from a VM or from a 32bit system, see http://patchbot.sagemath.org/log/0/debian/8.3/i686/3.16.04586/tmonteildebianjessie32/20160415%2012:13:01?short
From the error message, it is neither because of 32 bits nor because of the VM but because your (virtual) machine has only one core. The goal of the patch is to distribute computation on several cores. The doctest assumes that your machine has at least 2 cores. I should fix that. Though I'm not sure what to do on those kind of machine. Pretend that they are two core and let the computation run on those two cores ?
comment:108 in reply to: ↑ 107 Changed 5 years ago by
Replying to hivert:
From the error message, it is neither because of 32 bits nor because of the VM but because your (virtual) machine has only one core. The goal of the patch is to distribute computation on several cores. The doctest assumes that your machine has at least 2 cores. I should fix that. Though I'm not sure what to do on those kind of machine. Pretend that they are two core and let the computation run on those two cores ?
I just checked that I get the very same errors on my machine pretending that it has only one core.
comment:109 Changed 5 years ago by
OK thanks for isolating the culplrit, i made #20449 as a follwup.
Salut Florent et Nathann,
I am starting to think/work on #6637... I have some questions. Mainly, I would like to know what is this ticket, because the above one liner in the description does not say much...
Also, Florent wrote on sagedevel in October 2012 that