Opened 5 years ago

Closed 5 years ago

Last modified 5 years ago

#21312 closed defect (fixed)

Can't interrupt cleanly RecursivelyEnumeratedSet.graded_component

Reported by: slabbe Owned by:
Priority: major Milestone: sage-7.4
Component: combinatorics Keywords:
Cc: hivert Merged in:
Authors: Sébastien Labbé Reviewers: Salvatore Stella
Report Upstream: N/A Work issues:
Branch: 6634754 (Commits, GitHub, GitLab) Commit:
Dependencies: Stopgaps:


Description (last modified by slabbe)

From the Iterators and KeyboardInterrupt discussion on sage-devel, here is a reproducible bug:

sage: def f(a):
....:    sleep(0.1)
....:    return [a-1,a+1]
sage: C = RecursivelyEnumeratedSet([0], f, structure='symmetric')
sage: from cysignals.alarm import alarm
sage: alarm(0.45); C.graded_component(10)
Traceback (most recent call last):
...
AlarmInterrupt: 
sage: C.graded_component(1)
{-1, 1}
sage: C.graded_component(2)
{-2, 2}
sage: C.graded_component(3)
Traceback (most recent call last):
...
StopIteration: 
sage: C.graded_component(4)
Traceback (most recent call last):
...
StopIteration: 

Change History (27)

comment:1 Changed 5 years ago by slabbe

  • Description modified (diff)

comment:3 Changed 5 years ago by slabbe

It boils down to the graded_component_iterator method:

sage: def f(a):
....:     sleep(0.05)
....:     return [a-1,a+1]
sage: C = RecursivelyEnumeratedSet([0], f, structure='symmetric')
sage: it = C.graded_component_iterator()
sage: next(it)
{0}
sage: next(it)
{-1, 1}
sage: next(it)
{-2, 2}
sage: next(it)
{-3, 3}
sage: from cysignals.alarm import alarm
sage: alarm(0.02); next(it)
Traceback (most recent call last):
...
AlarmInterrupt: 
sage: next(it)
Traceback (most recent call last):
...
StopIteration: 

comment:4 Changed 5 years ago by etn40ff

In essence this seems to be the same issue I raised on sage-devel: graded_component_iterator constructs an iterator using yield, and these do not behave nicely when they get interrupted.
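The mechanism can be reproduced in plain Python, without Sage: once an exception propagates out of a running generator, the generator is finalized, and every later next() raises StopIteration. In this minimal illustration the RuntimeError stands in for cysignals' AlarmInterrupt:

```python
# When an exception escapes a generator, the generator is closed for good:
# every subsequent next() raises StopIteration, exactly as in the ticket.
def counter():
    n = 0
    while True:
        if n == 2:
            raise RuntimeError("simulated AlarmInterrupt")
        yield n
        n += 1

it = counter()
first, second = next(it), next(it)   # 0, 1

try:
    next(it)                         # the exception escapes the generator...
except RuntimeError:
    interrupted = True

try:
    next(it)
    dead = False
except StopIteration:                # ...so the generator is now exhausted
    dead = True
```

This is why the cached iterator inside graded_component keeps raising StopIteration after the alarm fires.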

comment:5 Changed 5 years ago by slabbe

I think I have an idea. We simply need to get rid of graded_component_iterator in the code of graded_component. Then the code will no longer be shared between the graded and symmetric structures. I will try this as soon as I have a compiled Sage, in a few minutes.
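The idea can be sketched in plain Python (a hypothetical helper, not the actual Sage patch): compute the n-th graded component with an ordinary loop instead of a shared generator, so an interrupt leaves no half-consumed iterator behind.

```python
# Sketch: breadth-first levels computed on demand; interrupting this
# function leaves no persistent state, so it can simply be called again.
def graded_component(seeds, succ, n):
    """Return the set of elements at distance exactly n from the seeds."""
    current = set(seeds)
    seen = set(seeds)
    for _ in range(n):
        nxt = set()
        for x in current:
            for y in succ(x):
                if y not in seen:
                    seen.add(y)
                    nxt.add(y)
        current = nxt
    return current

# With succ(a) = [a-1, a+1] from 0, as in the symmetric example above:
succ = lambda a: [a - 1, a + 1]
graded_component([0], succ, 3)  # {-3, 3}
```

For the symmetric structure Sage only needs to keep the previous two levels rather than the full `seen` set; this sketch uses the generic version for brevity.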

comment:6 Changed 5 years ago by slabbe

  • Branch set to u/slabbe/21312
  • Commit set to 6634754063f50147ed813b34a4f822f3b616945a
  • Status changed from new to needs_review

New commits:

6634754  21312: avoid using iterator for graded_component

comment:7 follow-up: Changed 5 years ago by etn40ff

Thanks for this fix; at first glance it looks ok to me, I will try to give it a more in-depth review soon.

On a related topic: in my usual testing example I get a major slowdown when replacing my custom-made iterator with a RecursivelyEnumeratedSet. Exploring an 8-regular graph with 25080 vertices takes 54.6 seconds as opposed to 29.9. I am not sure whether this depends on the __hash__ function I implemented or on the fact that I am a little more careful about which edges I travel along and which I can disregard a priori.

Another thing: maybe depth_first_search_iterator can be tweaked not to compute a whole graded component before returning. I have half an idea on how to do this; let me think carefully about whether it is feasible.

Last edited 5 years ago by etn40ff

comment:8 in reply to: ↑ 7 Changed 5 years ago by slabbe

I will try to give it a more in-depth review soon.

Great. Also, I created #21311 just this morning while looking at the code on github if you can have a look at it.

On a related topic: in my usual testing example I get a major slowdown

If you can provide a working example illustrating this, I will be curious in investigating it.

Another thing: maybe depth_first_search_iterator can be tweaked not to compute a whole graded component before returning. I have half an idea on how to do this; let me think carefully about whether it is feasible.

Great! Tell me if you think it works.

comment:9 follow-ups: Changed 5 years ago by etn40ff

On a related topic: in my usual testing example I get a major slowdown

If you can provide a working example illustrating this, I will be curious in investigating it.

To get the example I am thinking about you need to merge in #21254 and #19538. Then

sage: A = ClusterAlgebra(['E',8])
sage: %time A.explore_to_depth(infinity)
CPU times: user 27.1 s, sys: 215 ms, total: 27.3 s
Wall time: 27.2 s

and

sage: A = ClusterAlgebra(['E',8])
sage: seeds = RecursivelyEnumeratedSet([A.initial_seed()],
....:     lambda S: [S.mutate(i, inplace=False) for i in range(8)],
....:     structure='symmetric')
sage: %time  len(list(seeds))
CPU times: user 54.1 s, sys: 68 ms, total: 54.1 s
Wall time: 54.1 s
25080

(you need to recreate A because it keeps track of a few computationally intense bits).

The speedup I get depends, I think, mostly on line 1961 and partly on line 1968 of cluster_algebra.py. The first makes sure that once an edge is known we do not walk it twice; the second enforces the fact that we only need to walk three sides of a square.

Last edited 5 years ago by etn40ff

comment:10 in reply to: ↑ 9 Changed 5 years ago by slabbe

The speedup I get depends, I think, mostly on line 1961 and partly on line 1968 of cluster_algebra.py. The first makes sure that once an edge is known we do not walk it twice; the second enforces the fact that we only need to walk three sides of a square.

In my experience, this kind of thing can be done by adding information into the nodes:

A = ClusterAlgebra(['E',8])
initial_node = (A.initial_seed(), -1)
initial_nodes = [initial_node]
def succ(node):
    S, i = node
    allowed_directions = range(i+1, 8)
    return [(S.mutate(j,inplace=False),j) for j in allowed_directions]
seeds = RecursivelyEnumeratedSet(initial_nodes, succ, structure='symmetric')

Then post_process can return only the interesting part of each node (at least for forests, this works now).

But I can't say if such an approach could work in your case.
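The effect of annotating nodes with the last direction can be demonstrated on a toy example in plain Python (hypothetical names, not Sage code): exploring the integer line with succ(a) = [a-1, a+1], tagging each node with the step used to reach it prevents walking back to the parent and roughly halves the number of successor computations.

```python
# Generic breadth-first exploration that counts how many successors
# are computed; the pruned variant carries the last step in the node.
def explore(seeds, succ, depth):
    seen, frontier, n_calls = set(seeds), set(seeds), 0
    for _ in range(depth):
        nxt = set()
        for x in frontier:
            ys = succ(x)
            n_calls += len(ys)
            for y in ys:
                if y not in seen:
                    seen.add(y)
                    nxt.add(y)
        frontier = nxt
    return seen, n_calls

plain_seen, plain_calls = explore([0], lambda a: [a - 1, a + 1], 10)

# Annotated nodes are (value, last_step); never undo the previous step.
def succ_pruned(node):
    a, last = node
    return [(a + s, s) for s in (-1, 1) if s != -last]

pruned_seen, pruned_calls = explore([(0, 0)], succ_pruned, 10)
```

Both explorations reach the same 21 integers, but the pruned run computes 20 successors instead of 38.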

Last edited 5 years ago by slabbe

comment:11 in reply to: ↑ 9 Changed 5 years ago by slabbe

The speedup I get depends, I think, mostly on line 1961 and partly on line 1968 of cluster_algebra.py. The first makes sure that once an edge is known we do not walk it twice; the second enforces the fact that we only need to walk three sides of a square.

The speedup seems to be very close to a factor of 2.

sage: table(["E8(yours) E6 E7 E8".split(),[54.1/27.2, 2.97/1.59, 19.2/9.57, 143./72]])
  E8(yours)          E6                 E7                 E8
  1.98897058823529   1.86792452830189   2.00626959247649   1.98611111111111

I would guess this is not a coincidence... but I still do not see how lines 1961 and 1968 give an improvement by a factor of 2.

comment:12 Changed 5 years ago by etn40ff

Can you try with my code, but with line 1961 commented out and j != i instead of j > i in line 1968? I think I ran this once for E8 and the runtime almost doubled.

comment:13 Changed 5 years ago by slabbe

Commenting out line 1961 and changing line 1968 as you suggest makes explore_to_depth slower, but not 2 times slower. With this change, using RecursivelyEnumeratedSet is about 15% slower:

sage: from pandas import DataFrame
sage: df = DataFrame(index="E6 E7 E8".split())
sage: df['explore_to_depth'] = [2.53, 15.8, 124.]
sage: df['rec_enum_set'] = [2.91, 18.5, 141.]
sage: df['ratio'] = df.rec_enum_set / df.explore_to_depth
sage: df
    explore_to_depth      rec_enum_set             ratio
E6  2.53000000000000  2.91000000000000  1.15019762845850
E7  15.8000000000000  18.5000000000000  1.17088607594937
E8  124.000000000000  141.000000000000  1.13709677419355

Are there other hypotheses you are using about the structure of the graph?

comment:14 Changed 5 years ago by etn40ff

I do not think I am using any other assumption.

Note that the main loop I would write without using the assumptions above would be

while gets_bigger and depth_counter < kwargs.get('depth', infinity):
     # remember if we got a new seed
     gets_bigger = False

     for key in clusters.keys():
         sd, directions = clusters[key]
         while directions:
             # we can mutate in some direction
             i = directions.pop()
             new_sd  = sd.mutate(i, inplace=False, mutating_F=mutating_F)
             new_cl = frozenset(new_sd.g_vectors())
             if not new_cl in clusters:
                 # we got a new seed
                 gets_bigger = True
                 # next round, do not mutate back to sd
                 new_directions = [ j for j in allowed_dirs if j != i ]
                 clusters[new_cl] = [ new_sd, new_directions ]
                 yield new_sd
     # we went one step deeper
     depth_counter += 1

which should be slightly faster than the version you tested because it does not run the useless line

j = map(tuple,clusters[new_cl][0].g_vectors()).index(new_sd.g_vector(i))

comment:15 follow-up: Changed 5 years ago by etn40ff

Ah! I am not sure how RecursivelyEnumeratedSet does things, but there is one last assumption I am using: I know the last direction I walked to get to any vertex, so at the next pass I do not travel back in that direction.

Last edited 5 years ago by etn40ff (previous) (diff)

comment:16 in reply to: ↑ 15 Changed 5 years ago by slabbe

which should be slightly faster than the version you tested because it does not run the useless line j = map(tuple,clusters[new_cl][0].g_vectors()).index(new_sd.g_vector(i))

I had already commented out that line in my previous post, since commenting out only line 1961 gives a syntax error. The only difference with your suggestion was that I was using

new_directions = [ j for j in allowed_dirs if j != i or new_sd.b_matrix()[j,i] != 0 ]

instead of

new_directions = [ j for j in allowed_dirs if j != i ]

Replying to etn40ff:

Ah! I am not sure how RecursivelyEnumeratedSet does things but there is one last assumption I am using: I know which is the last direction I walked to get to any vertex so at the next pass I am not travelling in that direction.

That is a hypothesis we can't currently use inside the code of RecursivelyEnumeratedSet with symmetric structure, since if y in succ(x), then we can't assume that

list(succ(x)).index(y) == list(succ(y)).index(x)

Also, the graph might not be k-regular (that is, the number of outgoing edges might not be constant).

But we can use that hypothesis in the succ function:

sage: A = ClusterAlgebra(['E',6])
sage: def succ(S):
....:     p = S.path_from_initial_seed()
....:     i = p[-1] if p else -1
....:     return [S.mutate(j,inplace=False) for j in range(6) if j != i]
sage: seeds = RecursivelyEnumeratedSet([A.initial_seed()], succ, structure='symmetric')
sage: %time len(list(seeds))
CPU times: user 2.69 s, sys: 200 ms, total: 2.89 s
Wall time: 2.75 s
833

With this new succ function, and removing the part or new_sd.b_matrix()[j,i] != 0, using RecursivelyEnumeratedSet becomes only about 5% slower:

sage: from pandas import DataFrame
sage: df = DataFrame(index="E6 E7 E8".split())
sage: df['explore_to_depth'] = [2.61, 16.1, 121.]
sage: df['rec_enum_set'] = [2.75, 16.9, 128.]
sage: df['ratio'] = df.rec_enum_set / df.explore_to_depth
sage: df
    explore_to_depth      rec_enum_set             ratio
E6  2.61000000000000  2.75000000000000  1.05363984674330
E7  16.1000000000000  16.9000000000000  1.04968944099379
E8  121.000000000000  128.000000000000  1.05785123966942

comment:17 follow-up: Changed 5 years ago by etn40ff

OK, 5% may depend on your code being more general or storing more info; I'd be happy to live with that. Now the problem is: how do I enforce the other two assumptions?

PS: This ticket was meant to solve an issue and we are discussing something else. I guess we had better leave it open till we are done.

comment:18 Changed 5 years ago by slabbe

Meanwhile, I managed to use the second assumption (if I am not mistaken):

sage: A = ClusterAlgebra(['E',8])
sage: allowed_dirs = range(8)
sage: def succ(S):
....:     p = S.path_from_initial_seed()
....:     if p:
....:         i = p[-1]
....:         L = []
....:         for j in allowed_dirs:
....:              new_sd = S.mutate(j, inplace=False)
....:              if j > i or new_sd.b_matrix()[j,i] != 0:
....:                  L.append(new_sd)
....:         return L
....:     else:
....:         return [S.mutate(j,inplace=False) for j in allowed_dirs]
sage: seeds = RecursivelyEnumeratedSet([A.initial_seed()], succ) # not symmetric anymore right?

but I do not get any significant improvement compared to just avoiding the branches j != i:

sage: %time len(list(seeds))
CPU times: user 2min 8s, sys: 605 ms, total: 2min 8s
Wall time: 2min 8s
25080

which is the same (128 s) as above.

comment:19 in reply to: ↑ 17 Changed 5 years ago by slabbe

Replying to etn40ff:

ok 5% may depend on your code being more general or storing more info. I'd be happy to live with it. Now the problem is: how do I enforce the other two assumptions?

I don't know if we can encode the first assumption cleanly into the succ function (line 1959 is difficult to understand), but maybe the RecursivelyEnumeratedSet class can be changed (or copied and changed) so that the line does: if y is in B, then change y so that we do not go back to x when we visit y.

We should test if it works and if it is worth it, and if so, create a new ticket for it.

PS: This ticket was meant to solve an issue and we are discussing something else. I guess we better leave this open till we are done.

I agree. I will wait for a review of this ticket then. :)

comment:20 Changed 5 years ago by etn40ff

I had a look at the code and it seems good to me. I am waiting for sage to finish building before running sage -t --long to give a positive review.

On the other topic discussed in the comments of this ticket: I ended up keeping the implementation I had in #21254, with a try statement to catch KeyboardInterrupt. At the moment it is faster, and I do not have much time to refactor the code to use RecursivelyEnumeratedSet. I will probably leave this as a task for a future ticket.

comment:21 Changed 5 years ago by slabbe

  • Authors set to Sébastien Labbé

No problem. RecursivelyEnumeratedSet should be fast so that people really want to use it. So your example raises good questions.

comment:22 follow-ups: Changed 5 years ago by etn40ff

  • Reviewers set to Salvatore Stella
  • Status changed from needs_review to positive_review

Positive review.

It does make sense for code such as #21254 to use RecursivelyEnumeratedSet, if only because it is useless to keep reinventing the wheel each time. Moreover, it offers quite a lot of useful features.

Two considerations about speed:

1) I guess there are several use cases in which the function succ is expensive, but in which it is not too expensive, once a duplicate element has been found, to trim the directions in which succ is computed. This is the case, for example, with the code we discussed. It would be useful if RecursivelyEnumeratedSet could take a second function as input, called each time a new element of the set is found, that takes care of pruning the search tree accordingly.

2) how essential is the 'forest' requirement to the parallelized map/reduce? The code we discussed could benefit much from this.

comment:23 in reply to: ↑ 22 Changed 5 years ago by slabbe

  • Cc hivert added

I will keep your comment 1) in mind for future evolution of RecursivelyEnumeratedSet.

2) how essential is the 'forest' requirement to the parallelized map/reduce? The code we discussed could benefit much from this.

Well, you can set structure='forest' and see what happens. You will get elements generated twice in different trees, and the independent workers will assume their element is new and will continue the search through its successors. Depending on the case, you can get an iterator that never halts even if the graph is finite...

To adapt the parallel map/reduce code to work on graphs that are not forests, some communication would have to be added between the workers. But then it becomes harder to get efficient parallel computations. That question should be asked to Florent:

Could there be a common set of already-known nodes, shared among the workers, so that they could know whether a node they find is new or not?

comment:24 Changed 5 years ago by etn40ff

Well you can set structure='forest' and see what happens.

Nothing good: say your function is symmetric; then it will bounce back and forth between two elements forever. If I am not mistaken, 'forest' does not check for already produced elements.
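The failure mode is easy to see in a depth-limited plain-Python sketch (hypothetical code, not Sage's internals): a forest-style search trusts that the search tree has no repeats, so with a symmetric succ it keeps regenerating old elements.

```python
# Depth-limited DFS with no duplicate checking, the way a 'forest'
# search works; without the depth cap it would never terminate here.
def forest_iter(seeds, succ, max_depth):
    stack = [(s, 0) for s in seeds]
    while stack:
        x, d = stack.pop()
        yield x
        if d < max_depth:
            stack.extend((y, d + 1) for y in succ(x))

out = list(forest_iter([0], lambda a: [a - 1, a + 1], 3))
n_generated, n_distinct = len(out), len(set(out))  # 15 generated, only 7 distinct
```

Already at depth 3 the full binary search tree has 15 nodes but only 7 distinct values, and the duplication grows exponentially with depth.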

comment:25 in reply to: ↑ 22 ; follow-up: Changed 5 years ago by hivert

Replying to etn40ff:

2) how essential is the 'forest' requirement to the parallelized

map/reduce? The code we discussed could benefit much from this.

It is essential. I'm planning to write some analogous code for directed acyclic graphs, but this needs a complete rethink of the algorithm. If I manage, there will certainly be two independent codes for forests and DAGs. And there is a good chance that the DAG code will be less efficient.

Using the forest hypothesis, a work-stealing algorithm allows performing the computation with minimal communication. This is crucial because in Python, interprocess communication is done through pickling, so it is quite slow. For DAGs, you need a strategy to ensure that two different processes are not computing the same nodes...

comment:26 Changed 5 years ago by vbraun

  • Branch changed from u/slabbe/21312 to 6634754063f50147ed813b34a4f822f3b616945a
  • Resolution set to fixed
  • Status changed from positive_review to closed

comment:27 in reply to: ↑ 25 Changed 5 years ago by etn40ff

  • Commit 6634754063f50147ed813b34a4f822f3b616945a deleted

Replying to hivert:

This is crucial because in Python, communication are done interprocess through pickling, so that they are quite slow.

That's a bummer! I am not sure DAGs could be used in my case either.

One stray thought: could you use hashes for the communications rather than the whole objects? You would get some mileage from pickling a smaller amount of data.
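A quick plain-Python sanity check of this thought: pickling a hash is far smaller than pickling the object it came from, so sharing hashes among workers would shrink the communications (at the cost of having to handle the rare hash collision). The `seed` object here is a made-up stand-in for a cluster seed.

```python
import pickle

# Stand-in for a "big" hashable node such as a cluster seed.
seed = frozenset((i, i * i) for i in range(100))

obj_size = len(pickle.dumps(seed))        # hundreds of bytes
hash_size = len(pickle.dumps(hash(seed))) # a single machine integer
```

Of course a shared hash only tells a worker that a node was *probably* seen before; a collision-handling fallback would still be needed for correctness.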
