Opened 5 years ago

Closed 4 years ago

#16475 closed defect (fixed)

Bug in Gomory-Hu tree algorithm

Reported by: foosterhof Owned by:
Priority: major Milestone: sage-6.6
Component: graph theory Keywords: gomory hu tree gomory-hu gomory_hu_tree
Cc: Rudi Merged in:
Authors: Nathann Cohen Reviewers: Michele Borassi
Report Upstream: N/A Work issues:
Branch: b0bbcd4 (Commits) Commit: b0bbcd4119209e2b0e158ac3f83a5d2068009993
Dependencies: #12797 Stopgaps:

Description

When trying to come up with a doctest to verify ticket #16404, I stumbled across this error:

sage: G = graphs.PetersenGraph()
sage: for u, v in [(0, 1), (0, 4), (0, 5), (1, 2), (1, 6), (3, 4), (5, 7), (5, 8)]:
....:     G.set_edge_label(u, v, 2)
....: T = G.gomory_hu_tree(method="FF")
....: f = G.flow(1, 9, use_edge_labels = True)
....: f2 = T.edge_label(1, 9)
....: if f != f2:
....:     print 'Proper Tree Edge Error:', f, f2
....: 
Proper Tree Edge Error: 3 6

This is a violation of the property that defines a Gomory-Hu tree.

Furthermore, the derived property that the flow between two vertices in a graph is the minimum of the edge_labels in the constructed Gomory-Hu tree can be violated, for instance in the above graph:

sage: f = G.flow(5, 1, use_edge_labels = True)
sage: P = T.shortest_path(5, 1)
sage: E = zip(P, P[1::])
sage: f2 = min(map(lambda x: T.edge_label(x[0], x[1]), E))
sage: if f != f2:
....:     print 'Tree Path Error:', f, f2, P
....: 
Tree Path Error: 6 3 [5, 9, 1]

Change History (22)

comment:1 Changed 5 years ago by foosterhof

  • Branch set to u/foosterhof/ticket/16475
  • Created changed from 06/12/14 08:40:17 to 06/12/14 08:40:17
  • Modified changed from 06/12/14 08:40:17 to 06/12/14 08:40:17

comment:2 Changed 5 years ago by foosterhof

  • Commit set to 27025d1708e3bc7c0589b266bbf1f2f143e6b97a
  • Status changed from new to needs_review

The problem with the algorithm was that after the recursion, an edge was added. This edge was added to just any 2 vertices before, but there is are 2 specific vertices to which this edge should be added. These vertices are now calculated correctly (hopefully)


New commits:

27025d1Fixed Gomory-Hu tree algorithm by adding the partitions

comment:3 Changed 5 years ago by git

  • Commit changed from 27025d1708e3bc7c0589b266bbf1f2f143e6b97a to 9326d1518c32d30978e7b4ce9e39720f1bf8083d

Branch pushed to git repo; I updated commit sha1. New commits:

9326d15Added doctest

comment:4 Changed 5 years ago by foosterhof

  • Dependencies set to 12797

This fix works correctly, as far as I tested, when ticket #12797 is applied as well.

Florian

Last edited 5 years ago by foosterhof (previous) (diff)

comment:5 Changed 5 years ago by ncohen

Hello !

Just some cosmetic remarks :

  • Set is *MUCH* slower than set or frozenset, so if you don't need Set for a specific reason don't use it:
sage: %timeit Set(range(50))
10000 loops, best of 3: 29.1 µs per loop
sage: %timeit frozenset(range(50))
100000 loops, best of 3: 2.05 µs per loop
  • It would be cool if you could add some comments about what the following block of code does. Just a line or 5 words, i.e. something to say with words what is happening there
+        C = {}
...
+            C[r] = C2[r]

Thanks for fixing that, btw :-)

Nathann

comment:6 Changed 5 years ago by chapoton

  • Dependencies changed from 12797 to #12797

comment:7 Changed 5 years ago by git

  • Commit changed from 9326d1518c32d30978e7b4ce9e39720f1bf8083d to e4604fa9ba9cf8ef5475620ed6b3d156402015d6

Branch pushed to git repo; I updated commit sha1. New commits:

e4604faAdded comment on goal and use of Partition code

comment:8 follow-up: Changed 5 years ago by foosterhof

Hi

I am trying to convert to use your suggestion, using frozenset, but the call g1.vertices() (and thus also g2.vertices()) complains that it cannot compare a (frozen)set to an int, as it will sort them. I have been trying to give it a nice key lambda function to circumvent this problem, but no luck so far. Any suggestions? Or does this mean that I can better just leave it as using Set?

I have added a comment, including a reference to a piece of a book by A. Schrijver, which explains the use of the partitions. (A Gomory-Hu tree can be defined for a specific subset R of V, in which case these partitions play a big role)


New commits:

e4604faAdded comment on goal and use of Partition code

comment:9 in reply to: ↑ 8 Changed 5 years ago by ncohen

Hello !

I am trying to convert to use your suggestion, using frozenset, but the call g1.vertices() (and thus also g2.vertices()) complains that it cannot compare a (frozen)set to an int, as it will sort them.

HMmmmmmmmm O_o

I do not understand what you mean. Could you give me an example of code that produces it ? It looks that you cannot use "&" on a frozenset, but it has an "intersection" method

sage: range(5) & frozenset(range(8))
...
TypeError: unsupported operand type(s) for &: 'list' and 'frozenset'
sage: frozenset(range(8)).intersection(range(5))
frozenset({0, 1, 2, 3, 4})

I have added a comment, including a reference to a piece of a book by A. Schrijver, which explains the use of the partitions. (A Gomory-Hu tree can be defined for a specific subset R of V, in which case these partitions play a big role)

Thanks ! :-)

Nathann

Last edited 5 years ago by ncohen (previous) (diff)

comment:10 follow-up: Changed 5 years ago by foosterhof

In Graph.vertices():

if not boundary_first:
    return sorted(list(self.vertex_iterator()), key=key)

Apparently, it is possible to compare Set and int (or sage.blabla.Integer) objects, but not (frozen)set and int.

sage: Set([252,35]) < 1
True
sage: set([252,35]) < 1
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-16-ea79432632b4> in <module>()
----> 1 set([Integer(2352),Integer(2352)]) < Integer(1)

TypeError: can only compare to a set

sage: frozenset([2352,2352]) < 1
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-17-55b270cfd9be> in <module>()
----> 1 frozenset([Integer(2352),Integer(2352)]) < Integer(1)

TypeError: can only compare to a set

And thus, when doing:

R1 = vertices & frozenset(g1.vertices())

it gives a TypeError?, as g1 contains a vertex that is a frozenset, namely g1_v:

g1_v = frozenset(set2)
g2_v = frozenset(set1)
g1.add_vertex(g1_v)
g2.add_vertex(g2_v)

Florian

EDIT:

This can be fixed very easily, but also more efficiently. First of all, we can pass g1.vertex_iterator() to frozenset(), which will probably solve it all together.

Second of all, we know exactly which vertices g1 has, namely all vertices in set1, and the extra vertex g1_v. None more. Since g1_v is not in the variable vertices, we can just pass set1 to frozenset(), and analogously set2 for g2.

EDIT2:

Ok, so apparently Graph._backend.iterator_edges() also compares vertices before yielding them, to make sure the 'lesser' one is reported as the first vertex. And _ford_fulkerson uses this function through Graph.edge_iterator().

I guess there is no way around using a Set?

Last edited 5 years ago by foosterhof (previous) (diff)

comment:11 in reply to: ↑ 10 Changed 5 years ago by ncohen

Oh. I see ! Sorry, I had not realized that the problem came from frozensets inside of the graph !

This can be fixed very easily, but also more efficiently. First of all, we can pass g1.vertex_iterator() to frozenset(), which will probably solve it all together.

Right, but that's going too far I guess.

Ok, so apparently Graph._backend.iterator_edges() also compares vertices before yielding them, to make sure the 'lesser' one is reported as the first vertex. And _ford_fulkerson uses this function through Graph.edge_iterator().

Yepyep. And g.vertices() sorts the vertices, which is not a good idea in itself but has some advantages, still ... :-/

I guess there is no way around using a Set?

Ahahaah. As you said there are many ways around, but the best is indeed to use Set. I don't think that it is a performance issue in this context, I was just saying aloud that it's better to avoid Set in general. Let's keep the code as it is :-)

Nathann

comment:12 Changed 4 years ago by vbraun_spam

  • Milestone changed from sage-6.3 to sage-6.4

comment:13 Changed 4 years ago by chapoton

  • Milestone changed from sage-6.4 to sage-6.6
  • Status changed from needs_review to needs_work

branch does not apply

comment:14 Changed 4 years ago by ncohen

  • Authors set to Nathann Cohen
  • Branch changed from u/foosterhof/ticket/16475 to public/16475
  • Commit changed from e4604fa9ba9cf8ef5475620ed6b3d156402015d6 to 573257955a6e18241076d9b7030327bf14dcd562
  • Status changed from needs_work to needs_review

Hello,

I had totally forgotten this branch, and this is very bad (wrong results) (EDIT:see below). I looked at the code this week-end and fixed it (the fix actually takes one line ...), and also simplified the function a bit.

As the fix actually avoids adding more code to the function, I propose it instead, and change the branch (I hope that nobody minds). Needs review again.

Nathann

(EDIT 16/05/2015) This sentence is misleading: I meant that the *bug* is very bad, and that forgetting it is dangerous since Sage will return wrong results for as long as it is not fixed.


New commits:

df60f70trac #16475: Move connectivity test to the main function
005e77ctrac #16475: A useless case
ebc6504trac #16475: simplify handling of edge labels (capacities)
70ca162trac #16475: slightly Simplify update of edge labels (capacities)
18c6493trac #16475: Set->frozenset
f8b054etrac #16475: actual bugfix
be53d23trac #16475: Documentation and renamed variables
5732579trac #16475: Bibliographical reference
Last edited 4 years ago by ncohen (previous) (diff)

comment:15 follow-up: Changed 4 years ago by foosterhof

Hello,

First of all, you say that my code was very bad, because it gave wrong results. Could you tell me why? I see no counter-example or anything to back up your claim.

Of course I dont mind any refactoring, and actually I think it is indeed much more readable, and even more efficient, but there is something bugging me:

In the bibliographical reference you give (which is actually what I used as well), the proof to show that a Gomory-Hu tree exists uses a Partition { C_r | r \in R }, which dictates how the two induction-obtained trees should be merged. This is what I tried to add to the code with my patch, but you have completely removed it. Why? You simply put the edge between the two trees to be {u, v} where u and v are the picked vertices. However, why would Schrijver put such rigor in proving that this tree obtained using the partition is correct, and not instead simply use 2 arbitrary vertices.

You say you have fixed the algorithm by a single line of code, however I still think that it might give incorrect results in some cases, for reasons above. Would you agree?

Florian

comment:16 in reply to: ↑ 15 ; follow-up: Changed 4 years ago by ncohen

Helloooooooooooo,

First of all, you say that my code was very bad, because it gave wrong results. Could you tell me why? I see no counter-example or anything to back up your claim.

Oh no sorry, that's a misunderstanding! I just re-read my comment and indeed I see how you could have understood it this way. I will edit it in a second. What I meant to say is "I had forgotten about this *bug*, and it is very bad (that I forgot it) because meanwhile Sage is returning wrong results".

I have found nothing wrong with your code. Actually, as I began reading the proof that you followed, I merely noticed that it could be done in a simpler way. Which is why I proposed this alternative implementation, hoping that you would not mind.

In the bibliographical reference you give (which is actually what I used as well), the proof to show that a Gomory-Hu tree exists uses a Partition { C_r | r \in R }, which dictates how the two induction-obtained trees should be merged. This is what I tried to add to the code with my patch, but you have completely removed it. Why? You simply put the edge between the two trees to be {u, v} where u and v are the picked vertices. However, why would Schrijver put such rigor in proving that this tree obtained using the partition is correct, and not instead simply use 2 arbitrary vertices.

What the current branch does is apply Lemma 15.14alpha (p249 in my edition of it). While I obviously cannot tell you why Schrijver wrote the proof the way he did, I suspect that it is because he wants to *prove* the algorithm while I only want to implement it.

Right now, the code applies Lemma 15.14alpha while *assuming* that every graph has a gomory-hu Tree. Schrijver cannot do this for the very obvious reasons that he wants to *prove* this existence result.

Assuming the existence of a gomory-hu tree, I apply Lemma 15.14alpha to two points to obtain a bipartition U,V of the graph, then successively contract U and V to obtain two 'reduced' graphs. In each graph you will find 'two kinds' of vertices, i.e.:

  • "real vertices from the original graphs" and
  • "fake vertices that only appear because of the merge operation"

With Lemma 15.14alpha, what we know is that the max flow between any two "real points" u,v of a specific "new" graph is equal to their max flow in the original graph. Consequently, the three that is being built must indeed be the gomory-hu tree (that we know exists).

You say you have fixed the algorithm by a single line of code, however I still think that it might give incorrect results in some cases, for reasons above. Would you agree?

Do you believe that the way it is done is wrong? Or do you believe that it is suspect of being wrong because it is a simpler algorithm than the one given by Schrijver's proofs?

In my experience, writing the proof of an algorithm requires much more theoretical machinery than what is actually needed when one implements it. Sometimes, it is even easier to make the (theoretical) algorithm a bit slower in order to make the explanation simpler. Very recently, I had to give a formal proof of an algorithm needed for some research problem: while it was easy to explain on a table, it was so complicated to explain it on paper that I started trying to change the algorithm itself to make the proof easier.

Sorry for the misunderstanding, and thank you for your attention with respect to this issue.

Nathann

comment:17 in reply to: ↑ 16 ; follow-up: Changed 4 years ago by borassi

Hello!

I have analyzed the proofs in Schrijver: foosterhof is right, because you cannot take two arbitrary vertices, but the code is still correct, because the vertices taken are exactly the vertices u,v used in computing the initial u-v cut (these vertices have no name in Schrijver). We could then modify the proof of Theorem 15.14 in Schrijver by using u as r' and v as r'' (r' and r'' are the extremal points of the "new" arc added). In the last sentence of the proof, Lemma 15.14alpha yields the same conclusion for arcs e \neq r'r'', while for the arc r'r'', the two components of the tree obtained by removing r'r'' correspond to a minimum r'-r'' cut because we have chosen u=r' and v=r''.

I have no clue on why Schrijver did not use this proof, which looks much simpler (at least to me). If you have any doubts, I can write the whole proof more formally.

Other (small) issues:

  • in the comments, I'd use "maximum flow" instead of "maximal flow": in my opinion, "maximal" means that there might be other elements that are not comparable (for instance, C is a maximal clique if there is no clique C' \supset C, while it is a maximum clique if it is a clique of maximum cardinality).
  • reading the code, I could not find Schrijver's reference: I think that adding this reference might be a good idea.
  • do you think we should provide a proof that our "simplification" of Schrijver's algorithm works?

Michele

Last edited 4 years ago by borassi (previous) (diff)

comment:18 Changed 4 years ago by git

  • Commit changed from 573257955a6e18241076d9b7030327bf14dcd562 to b0bbcd4119209e2b0e158ac3f83a5d2068009993

Branch pushed to git repo; I updated commit sha1. New commits:

7fadf1ftrac #16475: Merged with 6.7
b0bbcd4trac #16475: maximal->maximum

comment:19 in reply to: ↑ 17 ; follow-up: Changed 4 years ago by ncohen

Hello !

  • in the comments, I'd use "maximum flow" instead of "maximal flow"

I think that there is no confusion possible. I changed it anyway.

  • reading the code, I could not find Schrijver's reference: I think that adding this reference might be a good idea.

It appears in the documentation of Graph.gomory_hu_tree, right before the 'INPUT' block.

  • do you think we should provide a proof that our "simplification" of Schrijver's algorithm works?

As you please. Just remember that for as long as this ticket is not reviewed, Sage returns wrong answers. Also, Wikipedia contains a description of some algorithm to compute a Gomory-Hu tree (I did not read it). If it is the same then that's cool, and if ours is simpler then it is probably where the proof should be written.

http://en.wikipedia.org/wiki/Gomory%E2%80%93Hu_tree

Nathann

comment:20 Changed 4 years ago by borassi

  • Reviewers set to Michele Borassi
  • Status changed from needs_review to positive_review

comment:21 in reply to: ↑ 19 Changed 4 years ago by borassi

Hello!

  • in the comments, I'd use "maximum flow" instead of "maximal flow"

I think that there is no confusion possible. I changed it anyway.

Ok :-)

  • reading the code, I could not find Schrijver's reference: I think that adding this reference might be a good idea.

It appears in the documentation of Graph.gomory_hu_tree, right before the 'INPUT' block.

Sorry, I didn't find it!

  • do you think we should provide a proof that our "simplification" of Schrijver's algorithm works?

As you please. Just remember that for as long as this ticket is not reviewed, Sage returns wrong answers. Also, Wikipedia contains a description of some algorithm to compute a Gomory-Hu tree (I did not read it). If it is the same then that's cool, and if ours is simpler then it is probably where the proof should be written.

I agree: this is not the place where the proof should be written. I think the code is fine as it is!

comment:22 Changed 4 years ago by vbraun

  • Branch changed from public/16475 to b0bbcd4119209e2b0e158ac3f83a5d2068009993
  • Resolution set to fixed
  • Status changed from positive_review to closed
Note: See TracTickets for help on using tickets.