Opened 6 years ago

Closed 5 years ago

#18910 closed enhancement (fixed)

Boost minimum spanning tree

Reported by: borassi Owned by:
Priority: major Milestone: sage-6.8
Component: graph theory Keywords: Boost, minimum spanning tree
Cc: ncohen, dcoudert, Stefan, Rudi Merged in:
Authors: Michele Borassi Reviewers: David Coudert
Report Upstream: N/A Work issues:
Branch: 38a817d (Commits) Commit: 38a817d387325a25c62df70c6e77d7c06169f152
Dependencies: #18876, #18906 Stopgaps:

Description (last modified by borassi)

Include Boost implementation of Kruskal and Prim algorithms for minimum spanning tree.

Change History (26)

comment:1 Changed 6 years ago by borassi

  • Authors set to Michele Borassi
  • Cc ncohen dcoudert added
  • Component changed from PLEASE CHANGE to graph theory
  • Dependencies set to #18876, #18906
  • Description modified (diff)
  • Keywords Boost minimum spanning tree added
  • Type changed from PLEASE CHANGE to enhancement

comment:2 Changed 5 years ago by borassi

  • Branch set to u/borassi/boost_minimum_spanning_tree

comment:3 Changed 5 years ago by borassi

  • Commit set to 8278b0bf8b3b4c86bdc5b543f5ca57c8b682f83a

Hello!

This is the first version of Boost minimum spanning tree. With this code, I'm inserting in Sage the code to interface with Boost weighted graphs. The improvement on this specific algorithm is not striking (about 2x over the old algorithm), but I still set the Boost algorithm as default.

Some benchmark:

sage: sage: def random_weighted_graph(n, m, lower_weight = 1, upper_weight = 100): 
        import random
        g = graphs.RandomGNM(n,m)
        m = g.num_edges()
        weights = [random.randint(lower_weight, upper_weight) for r in xrange(m)]
        uw_edges = g.edges()
        w_edges = [(uw_edges[i][0], uw_edges[i][1], weights[i]) for i in xrange(m)]
        return Graph(w_edges, weighted = True)
....: 
sage: g = random_weighted_graph(20000,200000)
sage: %timeit(g.min_spanning_tree(algorithm="Prim_Boost"))
1 loops, best of 3: 279 ms per loop
sage: %timeit(g.min_spanning_tree(algorithm="Kruskal_Boost"))
1 loops, best of 3: 361 ms per loop
sage: %timeit(g.min_spanning_tree(algorithm="Kruskal"))
1 loops, best of 3: 492 ms per loop
sage: %timeit(g.min_spanning_tree(algorithm="NetworkX"))
1 loops, best of 3: 1.92 s per loop
sage: %timeit(g.min_spanning_tree(algorithm="Prim_edge"))
1 loops, best of 3: 17.3 s per loop
sage: %timeit(g.min_spanning_tree(algorithm="Prim_fringe"))
1 loops, best of 3: 36.1 s per loop

Hope you like it!

Michele

comment:4 Changed 5 years ago by borassi

  • Status changed from new to needs_review

comment:5 Changed 5 years ago by dcoudert

The weight function you use does not allow to access all possible stuff that might be stored on an edge. For instance, I may have an edge like (u, v, {'weight':3, 'toto':4}) in which case a function like weight_function=lambda e:e[2]['weight'] is possible. With current implementation, you assume that weight_function((u,v)) returns the weight. Better to have weight_function(e) or weight_function((u,v,l)).

Also, could you be a bit more specific with the description of the weight function

- ``weight_function`` -- A function that takes an edge and returns a
          numeric weight. If ``None`` (default), the algorithm uses the edge
          weights, if available, otherwise it assigns weight 1 to each edge (in
          the latter case, the output can be any spanning tree).

In particular, the default behavior assume that edge weights are numbers, that is if e=(u,v,l) then l is a number.

David.

comment:6 Changed 5 years ago by git

  • Commit changed from 8278b0bf8b3b4c86bdc5b543f5ca57c8b682f83a to ceda13f0b83784adce78b527099e08e7e003c6ac

Branch pushed to git repo; I updated commit sha1. New commits:

ceda13fImproved explanation of weight_function

comment:7 Changed 5 years ago by borassi

Hello!

you are absolutely right: I have made the modification you asked, I have improved the explanation of the weight function, and I have added an example in the generic_graph file.

Thank you very much!

Michele

comment:8 follow-up: Changed 5 years ago by dcoudert

Hello,

I know that dealing with weights is not easy. However, here we have different behaviors.

sage: g = Graph([(0,1,{'blop':1}), (1,2,{'blop':'b'})])
sage: g.min_spanning_tree(algorithm="Prim_Boost", weight_function=lambda e:e[2]['blop'])
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
...
TypeError: a float is required
sage: g = Graph([(0,1,1), (1,2,'b')])
sage: g.min_spanning_tree(algorithm="Prim_Boost", weight_function=lambda e:e[2])
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
...
TypeError: a float is required
sage: g.min_spanning_tree(algorithm="Prim_Boost")
[(0, 1, 1), (1, 2, 'b')]

David.

comment:9 in reply to: ↑ 8 Changed 5 years ago by borassi

Hello!

Yeah, I have some problems, too: it is difficult to decide which is the correct behavior. In your examples, I have trouble understanding which improvement you are suggesting: could you be a bit more specific?

In particular, in the first two cases, you provide a weight function which is not correct, so a TypeError? is raised (if you provide a function, the algorithm raises an error if it is not correct). In the third case, your graph is not weighted:

sage: g = Graph([(0,1,1), (1,2,'b')])
sage: g.weighted()
False

Hence, the algorithm simply says "Since the graph is not weighted, I ignore the label and set weight 1 to each edge".

Maybe a problem occurs if the graph is weighted and no function is provided: in this case, the algorithm tries to use labels even if they are not numbers. In Boost algorithms, non-numeric weights are assumed to be 1, while in other algorithms labels are simply compared, and Python decides which is the biggest.

sage: g = Graph([(0,1,1), (1,2,'b')], weighted=True)
sage: g.min_spanning_tree(algorithm="Prim_Boost")
[(0, 1, 1), (1, 2, 'b')]
sage: g.min_spanning_tree(algorithm="Prim_fringe")
[(0, 1, 1), (1, 2, 'b')]
sage: g.min_spanning_tree(algorithm="Prim_edge")
[(0, 1, 1), (1, 2, 'b')]
sage: g.min_spanning_tree(algorithm="Kruskal")
[(0, 1, 1), (1, 2, 'b')]
sage: g.min_spanning_tree(algorithm="Kruskal_Boost")
[(0, 1, 1), (1, 2, 'b')]
sage: g.min_spanning_tree(algorithm="NetworkX")
[(0, 1, 1), (1, 2, 'b')]

In the latter example, do you think I should only improve the explanation, which is in the INPUT section of the documentation and in the examples? Or do you think I should raise an exception if a weight is not a number?

Thank you very much!

Michele

comment:10 Changed 5 years ago by dcoudert

I think it's a weird behavior to assume that an edge like (1, 2, 'b') has weight 1. However, it was the original behavior of the method, so I propose to keep it unchanged. So don't raise exception, but if you can improve further the doc, it should be ok. Thanks.

comment:11 Changed 5 years ago by git

  • Commit changed from ceda13f0b83784adce78b527099e08e7e003c6ac to 4bbb2980aab4b0a81fc07cf62afa5395ee7719c1

Branch pushed to git repo; I updated commit sha1. New commits:

4bbb298Corrected edge weights

comment:12 Changed 5 years ago by borassi

Hello!

I have changed the behavior: now, if edge weights are not convertible to floats through float() function, an error is raised. I have tried to clarify this behavior as much as possible in the documentation.

Thank you very much!

Michele

comment:13 follow-up: Changed 5 years ago by dcoudert

The documentation builds properly and is now good.

I tracked all calls to min_spanning_tree to launch tests. All tests pass in graphs and homology. However, one test is broken in src/sage/matroids/utilities.py, method lift_cross_ratios. I suspect that this is due to the change in the output ordering of edges. I don't know what to do here. I have forced you to sort edges because all methods where not returning edges in the same order, and I think it is better now (the output is independent on the algorithm).

The set T constructed in this method is, before applying this patch:

set([((0, 0), (0, 1), (0, 0)), ((0, 0), (2, 1), (0, 2)), ((0, 1), (1, 0), (1, 0)), ((1, 0), (1, 1), (1, 1)), ((1, 1), (2, 0), (2, 1)), ((0, 0), (3, 1), (0, 3)), ((0, 0), (4, 1), (0, 4))])

and with this patch:

set([((0, 0), (0, 1), (0, 0)), ((0, 0), (2, 1), (0, 2)), ((0, 1), (1, 0), (1, 0)), ((1, 0), (1, 1), (1, 1)), ((2, 0), (2, 1), (2, 2)), ((0, 0), (3, 1), (0, 3)), ((0, 0), (4, 1), (0, 4))])

So only one entry differs: ((1, 1), (2, 0), (2, 1)) becomes ((2, 0), (2, 1), (2, 2)).

I don't know if we can safely update the expected output of this test or not. Nathann?

comment:14 in reply to: ↑ 13 ; follow-up: Changed 5 years ago by ncohen

  • Cc Stefan Rudi added

Hello,

I don't know if we can safely update the expected output of this test or not. Nathann?

Not sure. Possible, but not sure. The best is to ask the matroid guys. Stefan, Rudi? Does it look okay to you if by changing the behaviour of a graph spanning tree routine, the output of

sage: Z = lift_cross_ratios(A, to_sixth_root_of_unity)

at line 416 of matroids/utilities.py changes like that?

Failed example:
    Z
Expected:
    [ 1  0  1  1  1]
    [ 1  1  0  0  z]
    [ 0  1 -z -1  0]
Got:
    [     1      0      1      1      1]
    [     1      1      0      0      z]
    [     0  z - 1      1 -z + 1      0]

It may be caused by a different spanning tree, or by a reordering of its edges.

Nathann

comment:15 in reply to: ↑ 14 ; follow-up: Changed 5 years ago by Rudi

Replying to ncohen:

Stefan, Rudi? Does it look okay to you if by changing the behaviour of a graph spanning tree routine, the output of

sage: Z = lift_cross_ratios(A, to_sixth_root_of_unity)

at line 416 of matroids/utilities.py changes like that?

Failed example:
    Z
Expected:
    [ 1  0  1  1  1]
    [ 1  1  0  0  z]
    [ 0  1 -z -1  0]
Got:
    [     1      0      1      1      1]
    [     1      1      0      0      z]
    [     0  z - 1      1 -z + 1      0]

It may be caused by a different spanning tree, or by a reordering of its edges.

Yes, that looks OK. This output matrix Z is fixed up to scaling of rows and columns, and the tree is used to determine the scaling. Looks like in this example the bottom row is scaled by z-1 (which happens to equal -z-1 in that ring, z is is a sixth root of unity).

So it's safe to just update the output of the test.

comment:16 in reply to: ↑ 15 Changed 5 years ago by ncohen

Yes, that looks OK.

Great. Thanks for your answer :-)

Nathann

comment:17 Changed 5 years ago by git

  • Commit changed from 4bbb2980aab4b0a81fc07cf62afa5395ee7719c1 to fd9e2320a51353b9d90879e237869e4971613a5e

Branch pushed to git repo; I updated commit sha1. New commits:

fd9e232Corrected matroids test

comment:18 Changed 5 years ago by borassi

Done! Thank you very much, Rudi!

comment:19 Changed 5 years ago by dcoudert

  • Reviewers set to David Coudert
  • Status changed from needs_review to positive_review

Good!

comment:20 Changed 5 years ago by git

  • Commit changed from fd9e2320a51353b9d90879e237869e4971613a5e to 66c6aea291d11529833931cc094abd7b35df794c
  • Status changed from positive_review to needs_review

Branch pushed to git repo; I updated commit sha1 and set ticket back to needs_review. New commits:

6c084ecMerge branch 'develop' into t/18876/boost_cuthill_mckee__king_ordering
3847037Merge branch 'develop' into t/18876/boost_cuthill_mckee__king_ordering
66c6aeaMerge branch 't/18876/boost_cuthill_mckee__king_ordering' into t/18910/boost_minimum_spanning_tree

comment:21 Changed 5 years ago by borassi

I have merged this ticket with the latest version of #18876 to avoid conflicts.

comment:22 follow-up: Changed 5 years ago by dcoudert

When I click on the link to the code (to of the page, branch u/borassi/boost_minimum_spanning_tree), for file bandwidth.pyx I see the following line which has been changed in #18876.

+    :meth:`bandwidth_heuristics()<sage.graphs.base.boost_graph.bandwidth_heuristics>` | Uses Boost heuristics to approximate the bandwidth of the input graph

Is it normal??

comment:23 in reply to: ↑ 22 Changed 5 years ago by borassi

Yes, it is normal, I think: I modified that line only after I merged #18876. However, I do not think it is a problem, because the new commit in #18876 should not create conflicts (I have tested the merge).

In any case, I re-merged them with this new commit.

Replying to dcoudert:

When I click on the link to the code (to of the page, branch u/borassi/boost_minimum_spanning_tree), for file bandwidth.pyx I see the following line which has been changed in #18876.

+    :meth:`bandwidth_heuristics()<sage.graphs.base.boost_graph.bandwidth_heuristics>` | Uses Boost heuristics to approximate the bandwidth of the input graph

Is it normal??

comment:24 Changed 5 years ago by git

  • Commit changed from 66c6aea291d11529833931cc094abd7b35df794c to 38a817d387325a25c62df70c6e77d7c06169f152

Branch pushed to git repo; I updated commit sha1. New commits:

ecda30fApplied Nathann's suggestion
38a817dMerge branch 't/18876/boost_cuthill_mckee__king_ordering' into t/18910/boost_minimum_spanning_tree

comment:25 Changed 5 years ago by dcoudert

  • Status changed from needs_review to positive_review

for me the patch is good to go !

comment:26 Changed 5 years ago by vbraun

  • Branch changed from u/borassi/boost_minimum_spanning_tree to 38a817d387325a25c62df70c6e77d7c06169f152
  • Resolution set to fixed
  • Status changed from positive_review to closed
Note: See TracTickets for help on using tickets.