Opened 2 years ago

Closed 2 years ago

Last modified 2 years ago

#23290 closed defect (fixed)

Graph.merge_vertices() destroys loops

Reported by: zgershkoff Owned by:
Priority: minor Milestone: sage-8.0
Component: graph theory Keywords:
Cc: dcoudert, tscrim, Stefan Merged in:
Authors: Zach Gershkoff Reviewers: Stefan van Zwam, David Coudert
Report Upstream: N/A Work issues:
Branch: ff571d6 (Commits) Commit:
Dependencies: Stopgaps:

Description (last modified by zgershkoff)

merge_vertices() accepts a list of vertices as input, but loops are lost unless they're on the first vertex.

Change History (17)

comment:1 Changed 2 years ago by zgershkoff

  • Authors set to Zachary Gershkoff
  • Cc dcoudert tscrim Stefan added
  • Component changed from PLEASE CHANGE to graph theory
  • Description modified (diff)
  • Priority changed from major to minor
  • Type changed from PLEASE CHANGE to defect
sage: edgelist = [(0,0,'a'), (0,1,'b'), (1,1,'c')]
sage: G = Graph(edgelist, loops=True, multiedges=True)
sage: G.merge_vertices([0,1]); G.edges()
[(0,0,'a')]

It looks like merge_vertices() works by computing the edge boundary between the specified set of vertices and the rest of the graph, then deleting all the vertices except the first one, then putting the edges back in.

So there are actually two issues here. (1) Loops not on the first vertex will be lost (because they're not in the edge boundary and their vertices get deleted), and (2) we'll lose other edges that are not in the edge boundary so they should become loops.

(1) is certainly a defect. I'm not so certain about (2) because I don't know if it's appropriate for this method to create loops. In fact, it's probably not.

Last edited 2 years ago by zgershkoff (previous) (diff)

comment:2 Changed 2 years ago by dcoudert

It is not an easy task to decide on the good behavior. See #10357 #9807 #7304 #7159.

comment:3 Changed 2 years ago by zgershkoff

In that case, I'll fix (1) and leave (2) for one of the other tickets.

comment:4 Changed 2 years ago by zgershkoff

  • Branch set to u/zgershkoff/graph_merge_vertices___destroys_loops

comment:5 Changed 2 years ago by zgershkoff

  • Commit set to 412dc9082deb4ed0c7f14cdbee7a37950e0e51a7
  • Status changed from new to needs_review

It now checks if any of the (di)graph's loops are on the vertices to be deleted before deleting them, and it only adds a few steps if loops are disabled.


New commits:

412dc90moved loops from vertices before the vertices get deleted

comment:6 Changed 2 years ago by zgershkoff

Before this gets closed: There's code at the beginning of merge_vertices() to check if the first vertex label is None, but I don't think this will ever happen because of #9362. Should I take those lines out?

Nevermind. A user may wish to input None as the first item in the list because this will give the merged vertex a new name.

Last edited 2 years ago by zgershkoff (previous) (diff)

comment:7 Changed 2 years ago by Stefan

  • Reviewers set to Stefan van Zwam
  • Status changed from needs_review to positive_review

Works as advertised!

comment:8 follow-up: Changed 2 years ago by dcoudert

  • Reviewers changed from Stefan van Zwam to Stefan van Zwam, David Coudert
  • Status changed from positive_review to needs_work

This could be more efficient

        u = vertices[0]
        for v in vertices[1:]:
            if self.has_edge(v, v):
                if self.allows_multiple_edges():
                    for l in self.edge_label(v, v):
                        self.add_edge(u, u, l)
                else:
                    self.add_edge(u, u, self.edge_label(v, v))

Could you add a test to do nothing when len(vertices) < 2.

sage: G = Graph()
sage: G.merge_vertices([None])
sage: G
Graph on 1 vertex
sage: G.merge_vertices([None])
sage: G
Graph on 2 vertices

Other issue: TESTS:: -> TESTS:

comment:9 Changed 2 years ago by dcoudert

Also, when multiedges are not allowed, the behavior must be documented.

sage: edgelist = [(0,0,'a'),(0,1,'b'),(1,1,'c')]
sage: G = Graph(edgelist, loops=True, multiedges=False)
sage: G.merge_vertices([0,1]); G.edges()
[(0, 0, 'c')]
sage: edgelist = [(0,0,'a'),(0,1,'b'),(1,1,'c'), (1, 2, 'd'), (2, 2, 'e')]
sage: G = Graph(edgelist, loops=True, multiedges=False)
sage: G.merge_vertices([0,1,2]); G.edges()
[(0, 0, 'e')]
sage: G = Graph(edgelist, loops=True, multiedges=False)
sage: G.merge_vertices([0,2,1]); G.edges()
[(0, 0, 'e')]

comment:10 in reply to: ↑ 8 ; follow-up: Changed 2 years ago by zgershkoff

Replying to dcoudert:

This could be more efficient

        u = vertices[0]
        for v in vertices[1:]:
            if self.has_edge(v, v):
                if self.allows_multiple_edges():
                    for l in self.edge_label(v, v):
                        self.add_edge(u, u, l)
                else:
                    self.add_edge(u, u, self.edge_label(v, v))

Is this faster because it specifies the vertices, rather than iterating over a list of edges to get the loops?

Could you add a test to do nothing when len(vertices) < 2.

sage: G = Graph()
sage: G.merge_vertices([None])
sage: G
Graph on 1 vertex
sage: G.merge_vertices([None])
sage: G
Graph on 2 vertices

It's an undocumented feature that None as the first entry will give the vertex a new name, and a bug that it will create a new vertex if there's nothing to merge. I'll addressed both.

Other issue: TESTS:: -> TESTS:

Is there a guide where I can see how to correctly mark up documentation? The developer's guide doesn't seem to cover it all, and many of the other methods I've looked at use TESTS:: and not TESTS:. This doesn't seem to be a standard thing for Python docstrings so I'm not sure where to look.

comment:11 in reply to: ↑ 10 ; follow-up: Changed 2 years ago by dcoudert

Replying to zgershkoff:

Replying to dcoudert:

This could be more efficient

        u = vertices[0]
        for v in vertices[1:]:
            if self.has_edge(v, v):
                if self.allows_multiple_edges():
                    for l in self.edge_label(v, v):
                        self.add_edge(u, u, l)
                else:
                    self.add_edge(u, u, self.edge_label(v, v))

Is this faster because it specifies the vertices, rather than iterating over a list of edges to get the loops?

The loop_edges method calls the loop_vertices method which returns [v for v in self if self.has_edge(v,v)] if loops are allowed.

Since vertices is a subset of the vertices of the graph, and we can expect this set to be small most of the time (e.g., 2), what I propose should be faster than your first proposal.

Could you add a test to do nothing when len(vertices) < 2.

sage: G = Graph()
sage: G.merge_vertices([None])
sage: G
Graph on 1 vertex
sage: G.merge_vertices([None])
sage: G
Graph on 2 vertices

It's an undocumented feature that None as the first entry will give the vertex a new name, and a bug that it will create a new vertex if there's nothing to merge. I'll addressed both.

It's documented in the method, 3rd paragraph: The new vertex is named after the first vertex in the list given in argument. If this first name is None, a new vertex is created.

Other issue: TESTS:: -> TESTS:

Is there a guide where I can see how to correctly mark up documentation? The developer's guide doesn't seem to cover it all, and many of the other methods I've looked at use TESTS:: and not TESTS:. This doesn't seem to be a standard thing for Python docstrings so I'm not sure where to look.

To be honest, I don't know. I checked on recent ticket, and #23255 suggests that TESTS:: is the good form. So keep it as is.

comment:12 Changed 2 years ago by git

  • Commit changed from 412dc9082deb4ed0c7f14cdbee7a37950e0e51a7 to ff571d6699ab233eedaf3b9d4a417ef6a1f6c56f

Branch pushed to git repo; I updated commit sha1. New commits:

ff571d6efficiency improvements; bug fix when input is [None]

comment:13 in reply to: ↑ 11 Changed 2 years ago by zgershkoff

Replying to dcoudert:

To be honest, I don't know. I checked on recent ticket, and #23255 suggests that TESTS:: is the good form. So keep it as is.

Judging from that ticket, the convention is to use :: when sage: follows and : when text follows. So in this instance, I should indeed use TESTS:. The usage isn't consistent throughout the file but I'm not going to try and fix it within this ticket.

I made the changes you suggested, but I had to change one of the tests:

sage: edgelist = [(0,0,'a'),(0,1,'b'),(1,1,'c'), (1, 2, 'd'), (2, 2, 'e')]
sage: G = Graph(edgelist, loops=True, multiedges=False)
sage: G.merge_vertices([0,1,2]); G.edges()
[(0, 0, 'e')]
sage: G = Graph(edgelist, loops=True, multiedges=False)
sage: G.merge_vertices([0,2,1]); G.edges()
[(0, 0, 'e')]

I think the last output should be [(0, 0, 'c')], and that's what the code currently returns, since it evaluates vertex 1 last and finds the label 'c' last.

comment:14 Changed 2 years ago by zgershkoff

  • Status changed from needs_work to needs_review

comment:15 Changed 2 years ago by dcoudert

  • Status changed from needs_review to positive_review

That's much better now. Thanks.

comment:16 Changed 2 years ago by vbraun

  • Branch changed from u/zgershkoff/graph_merge_vertices___destroys_loops to ff571d6699ab233eedaf3b9d4a417ef6a1f6c56f
  • Resolution set to fixed
  • Status changed from positive_review to closed

comment:17 Changed 2 years ago by jdemeyer

  • Authors changed from Zachary Gershkoff to Zach Gershkoff
  • Commit ff571d6699ab233eedaf3b9d4a417ef6a1f6c56f deleted
Note: See TracTickets for help on using tickets.