Opened 2 years ago

Closed 2 years ago

#30753 closed enhancement (fixed)

Further improvements in method subgraph

Reported by: David Coudert Owned by:
Priority: major Milestone: sage-9.3
Component: graph theory Keywords:
Cc: gh-kliem Merged in:
Authors: David Coudert, Jonathan Kliem Reviewers: David Coudert
Report Upstream: N/A Work issues:
Branch: 40bacf0 (Commits, GitHub, GitLab) Commit: 40bacf03c0633affd490bac998856618879ba271
Dependencies: #30665, #30769 Stopgaps:

Status badges

Description (last modified by gh-kliem)

We add a fast method subgraph_given_vertices to CGraphBackend. This method allows fast subgraph creation including copying. Also improvements in subgraph with a case forgotten in #30510.

Before #30665:

sage: G = graphs.Grid2dGraph(100, 100)  # dynamic sparse                                                                                                                            
sage: V = list(G)                                                                                                                                                                   
sage: %timeit H = G.subgraph(V[:50], algorithm='add')                                                                                                                               
280 µs ± 465 ns per loop (mean ± std. dev. of 7 runs, 1000 loops each)
sage: %timeit H = G.copy()                                                                                                                                                          
30.3 ms ± 72.8 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
                                                                                                                                                                              
sage: G = G.copy(immutable=True)  # static sparse                                                                                                                                   
sage: %timeit H = G.subgraph(V[:50], algorithm='add')                                                                                                                               
553 µs ± 373 ns per loop (mean ± std. dev. of 7 runs, 1000 loops each)
sage: %timeit H = G.copy(immutable=False)                                                                                                                                           
31.6 ms ± 91.6 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
                                                                                                                                                                               
sage: G = graphs.CompleteGraph(1000)  # dynamic dense                                                                                                                               
sage: V = list(G)                                                                                                                                                                   
sage: %timeit H = G.subgraph(V[:50], algorithm='add')                                                                                                                               
54.2 ms ± 54.6 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
sage: %timeit H = G.copy()                                                                                                                                                          
527 ms ± 663 µs per loop (mean ± std. dev. of 7 runs, 1 loop each)

With #30665 and #30769:

sage: G = graphs.Grid2dGraph(100, 100)  # dynamic sparse                                                                                                                                                                                                                                                                                                                   
sage: V = list(G)                                                                                                                                                                                                                                                                                                                                                          
sage: %timeit H = G.subgraph(V[:50], algorithm='add')                                                                                                                                                                                                                                                                                                                      
257 µs ± 356 ns per loop (mean ± std. dev. of 7 runs, 1000 loops each)
sage: %timeit H = G.copy()                                                                                                                                                                                                                                                                                                                                                 
26.4 ms ± 146 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
                                                                                                                                                                                                                                                                                                                                                                     
sage: G = G.copy(immutable=True)  # static sparse                                                                                                                                                                                                                                                                                                                          
sage: %timeit H = G.subgraph(V[:50], algorithm='add')                                                                                                                                                                                                                                                                                                                      
521 µs ± 308 ns per loop (mean ± std. dev. of 7 runs, 1000 loops each)
sage: %timeit H = G.copy(immutable=False)                                                                                                                                                                                                                                                                                                                                  
28.3 ms ± 95.2 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)

sage: G = graphs.CompleteGraph(1000)  # dynamic dense                                                                                                                                                                                                                                                                                                                      
sage: V = list(G)                                                                                                                                                                                                                                                                                                                                                          
sage: %timeit H = G.subgraph(V[:50], algorithm='add')                                                                                                                                                                                                                                                                                                                      
51.6 ms ± 64.6 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
sage: %timeit H = G.copy()                                                                                                                                                                                                                                                                                                                                                 
429 ms ± 1.23 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

With this ticket:

sage: G = graphs.Grid2dGraph(100, 100)  # dynamic sparse                                                                                                                                                                                                                                                                                                                                     
sage: V = list(G)                                                                                                                                                                                                                                                                                                                                                          
sage: %timeit H = G.subgraph(V[:50], algorithm='add')                                                                                                                                                                                                                                                                                                                      
80.9 µs ± 19.1 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
sage: %timeit H = G.copy()                                                                                                                                                                                                                                                                                                                                                 
15.4 ms ± 10.4 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

sage: G = G.copy(immutable=True)  # static sparse                                                                                                                                                                                                                                                                                                                                           
sage: %timeit H = G.subgraph(V[:50], algorithm='add')                                                                                                                                                                                                                                                                                                                      
327 µs ± 1.04 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
sage: %timeit H = G.copy(immutable=False)                                                                                                                                                                                                                                                                                                                                  
20.3 ms ± 23 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)

sage: G = graphs.CompleteGraph(1000)  # dynamic dense                                                                                                                                                                                                                                                                                                                             
sage: V = list(G)                                                                                                                                                                                                                                                                                                                                                          
sage: %timeit H = G.subgraph(V[:50], algorithm='add')                                                                                                                                                                                                                                                                                                                      
1.03 ms ± 10.6 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
sage: %timeit H = G.copy()                                                                                                                                                                                                                                                                                                                                                 
269 ms ± 3.08 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)

Change History (32)

comment:1 Changed 2 years ago by David Coudert

Branch: public/graphs/30753_subgraph
Cc: gh-kliem added
Commit: 9495c5b111fb3df9b33f2f955ccbbfd44963f9f8

I'm not sure if the threshold for selecting default algorithm is still best possible. At least, we have improved both algorithms.


New commits:

9495c5btrac #30753: improve _subgraph_by_adding

comment:2 Changed 2 years ago by David Coudert

Status: newneeds_review

comment:3 Changed 2 years ago by David Coudert

Currently, the default algorithm is 'delete' if the number of vertices of the subgraph is at least 5% of the number of vertices of the graph. But following tests suggest that 'add' is faster when the number of vertices of the subgraph is up to 90%... so we should change the threshold to e.g. 90%. Do you agree ?

sage: G = graphs.Grid2dGraph(100, 100)
sage: V = list(G)
sage: %timeit H = G.subgraph(V[:50], algorithm=None)   # use add
219 µs ± 7.56 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
sage: %timeit H = G.subgraph(V[:50], algorithm='add')
213 µs ± 10.9 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
sage: %timeit H = G.subgraph(V[:50], algorithm='delete')
62.3 ms ± 1.64 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)

sage: %timeit H = G.subgraph(V[:500], algorithm=None)  # use add
1.85 ms ± 35.4 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
sage: %timeit H = G.subgraph(V[:500], algorithm='add')
1.99 ms ± 70.7 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
sage: %timeit H = G.subgraph(V[:500], algorithm='delete')
64 ms ± 1.14 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)

sage: %timeit H = G.subgraph(V[:5000], algorithm=None)  # use delete but add is faster here
53 ms ± 1.1 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
sage: %timeit H = G.subgraph(V[:5000], algorithm='add')
23.7 ms ± 229 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
sage: %timeit H = G.subgraph(V[:5000], algorithm='delete')
52.3 ms ± 636 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
sage: G = graphs.CompleteGraph(1000)                                                                                                
sage: V = list(G)                                                                                                                   
sage: %timeit H = G.subgraph(V[:50], algorithm=None)  # use 'add'
17.9 ms ± 408 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
sage: %timeit H = G.subgraph(V[:50], algorithm='add')
19.5 ms ± 441 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
sage: %timeit H = G.subgraph(V[:50], algorithm='delete')
962 ms ± 14.9 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

sage: %timeit H = G.subgraph(V[:100], algorithm=None)  # use 'delete' but 'add' is faster
962 ms ± 20.5 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
sage: %timeit H = G.subgraph(V[:100], algorithm='add')
38 ms ± 828 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
sage: %timeit H = G.subgraph(V[:100], algorithm='delete')
959 ms ± 18.3 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

sage: %timeit H = G.subgraph(V[:500], algorithm=None)  # use 'delete' but 'add' is faster
949 ms ± 18.1 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
sage: %timeit H = G.subgraph(V[:500], algorithm='add')
283 ms ± 15.3 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
sage: %timeit H = G.subgraph(V[:100], algorithm='delete')
982 ms ± 32.2 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

sage: %timeit H = G.subgraph(V[:800], algorithm=None)  # use 'delete' but 'add' is faster
918 ms ± 35.2 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
sage: %timeit H = G.subgraph(V[:800], algorithm='add')
643 ms ± 19.4 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
sage: %timeit H = G.subgraph(V[:800], algorithm='delete')
909 ms ± 35.6 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

sage: %timeit H = G.subgraph(V[:900], algorithm=None)  # use 'delete' but 'add' is faster
862 ms ± 19.1 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
sage: %timeit H = G.subgraph(V[:900], algorithm='add')
735 ms ± 16.7 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

comment:4 Changed 2 years ago by git

Commit: 9495c5b111fb3df9b33f2f955ccbbfd44963f9f8c211ff85508ac09c27285aa84c99b98d63e7f4ea

Branch pushed to git repo; I updated commit sha1. New commits:

c211ff8trac #30753: change threshold for default algorithm

comment:5 Changed 2 years ago by David Coudert

I change the threshold for default algorithm to use use 'delete' when the subgraph has at least 90% of the vertices and 'add' otherwise. It seems faster this way. I updated some doctests (mostly changes in orderings).

comment:6 Changed 2 years ago by gh-kliem

The example in this ticket might imply, that #30665 should be put back onto needs work.

I see a huge slowdown.

comment:7 Changed 2 years ago by David Coudert

OK, so may be we should wait for the finalization of #30665 before changing the threshold.

comment:8 Changed 2 years ago by gh-kliem

I think I'm done with #30665.

However, I could open another ticket and implement a subgraph iterator in the backend: Given a list of vertices, iterate over all edges in the induced subgraph. This should be much faster.

comment:9 Changed 2 years ago by David Coudert

We can give it a try, at least for simple cases. You can either do it here (the branch is public) or in a separate ticket but it might be harder to ensure that all changes are consistent (i.e., improve the situation).

comment:10 Changed 2 years ago by gh-kliem

I would say, it is definitely worth it. But I have only a draft now with some todos.

In particular, this should speedup the copy method:

sage: G = graphs.Grid2dGraph(100, 100)                                                                                                                                                                                                                                                                                                                                     
sage: V = list(G)                                                                                                                                                                                                                                                                                                                                                          
sage: %timeit H = G.copy()                                                                                                                                                                                                                                                                                                                                                 
31.1 ms ± 183 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
sage: %timeit H = G.subgraph(V[:10000], algorithm='add')                                                                                                                                                                                                                                                                                                                   
16.6 ms ± 15.7 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

comment:11 Changed 2 years ago by gh-kliem

Btw, subgraph_by_deleting should only be used in place or if the backend has an optimized copy method (which appears to be not the case yet).

Not sure about the special subgraphs yet, which are not obtained by just reducing the number of vertices.

comment:12 Changed 2 years ago by git

Commit: c211ff85508ac09c27285aa84c99b98d63e7f4eaadb2d57c09b4ea0e9abc706c37a831d909203ead

Branch pushed to git repo; I updated commit sha1. Last 10 new commits:

5f7b30dreviewers suggestion
002ec5ddocument that an iterator can be used to relabel edges
5deaa25fix wrong usage and document it
d7c27ebmake iterator saver
86b0ba6expose `sort_vertices`
737e0f8lookup in bitset is much faster
07b7a8ebitset must not be empty
957b4cccheck if vertices are empty
8dbef67Merge branch 'u/gh-kliem/graphs_improve_edge_iterator' of git://trac.sagemath.org/sage into public/graphs/30753_subgraph
adb2d57first step towards subgraph_by_adding in backend

comment:13 Changed 2 years ago by git

Commit: adb2d57c09b4ea0e9abc706c37a831d909203ead3a42e25f9b335edbe748daef3d5d924d3d5b775f

Branch pushed to git repo; I updated commit sha1. New commits:

3a42e25check for bitset capacity

comment:14 Changed 2 years ago by gh-kliem

Authors: David CoudertDavid Coudert, Jonathan Kliem
Status: needs_reviewneeds_work

comment:15 Changed 2 years ago by gh-kliem

Dependencies: #30665

comment:16 Changed 2 years ago by David Coudert

I'm not sure about this last commit as it has a different number than in #30665. Is it normal ?

comment:17 Changed 2 years ago by gh-kliem

This usually happens to me when cherry picking. I can rebase at some point on #30665.

comment:18 Changed 2 years ago by git

Commit: 3a42e25f9b335edbe748daef3d5d924d3d5b775f0372941f0f4e62077b5957322d4ae56427dc06ba

Branch pushed to git repo; I updated commit sha1. This was a forced push. New commits:

b7d739fcheck for bitset capacity
ee04870trac #30753: improve _subgraph_by_adding
c0a3b0atrac #30753: change threshold for default algorithm
0372941first step towards subgraph_by_adding in backend

comment:19 Changed 2 years ago by David Coudert

  • may be the method obtain_subgraph could be renamed like subgraph_given_vertices or something more explicit with respect what it actually do.
  • there is a line cdef CGraph to remove
  • instead of max(b_vertices_2), why not using the size or capacity of bitset active_vertices ? or may be the size of b_vertices ?
  • Somehow, before starting to add vertices to other, we should call realloc with appropriate size, thus avoiding multiple calls.
  • I have no clue how to deal with multiple edges here.
  • Concerning methods add and delete, I fully agree that now delete should be used only when inplace is True or algorithm == 'delete'. The add method is now way faster and should be the default one, without threshold.

comment:20 Changed 2 years ago by gh-kliem

I'll keep that in mind.

Concerning the multiedges, unfortunately I shbould clean up dense and sparse graphs backend a bit first. Otherwise, I think it is not a problem.

If self is mutliedge, it will simply try to make other multiedge. If other is dense, this will raise an error.

If I do things right, optimizing is_subgraph afterwards should be easy. In particual this can be used to check equality fast.

comment:21 Changed 2 years ago by David Coudert

So may be for the moment, we can use the new method for graphs without multiple edges only and let other cases in the todo list (i.e., raise NotImplementedError if called on a graph with multiple edges).

It's cool if we can also optimize other methods. The work you have already done for iterating edges plus these optimizations of the subgraph method makes significant speed ups for the entire graph module :))

comment:22 Changed 2 years ago by gh-kliem

The current method does not even work for labeled graphs.

Give me a day or so and I'll see if I can figure things out.

comment:23 Changed 2 years ago by gh-kliem

Branch: public/graphs/30753_subgraphpublic/graphs/30753_subgraph_new
Commit: 0372941f0f4e62077b5957322d4ae56427dc06ba75b7de5acbbc1be7772b99ae0b8efeaee0d59ba4

This is basically working.

What is missing:

  • Documentation.
  • Cleanup.
  • Exposure in __init__ of DiGraph and Graph to speed up e.g. copy.
  • (Maybe) Heuristics when to delete instead of add: As long as there are no improved copying methods, I changed the deletion subgraph to first obtain the subgraph induced by the given vertices instead of making a copy. Depending on the input of edges and edge_property, this might be better than adding, but maybe in not so many cases. And in those cases it might be more natural to obtain a subgraph first and then delete edges.

Last 10 new commits:

c71a05efix doctests
a1e1445trac #30753: improve _subgraph_by_adding
19f3d62trac #30753: change threshold for default algorithm
3306bc9first step towards subgraph_by_adding in backend
ae9204breviewers comments
fd5a206deal with labels and multiple edges
bafda55avoid bitset must not be empty
853802aimplement subgraph_given_vertices for static sparse
159869fuse cases for subgraph
75b7de5allow deleting of multiedges and loops

comment:24 Changed 2 years ago by David Coudert

This is really nice and I observe some speed up at least when doctesting the graph module :))

comment:25 Changed 2 years ago by git

Commit: 75b7de5acbbc1be7772b99ae0b8efeaee0d59ba4cbda82c0f6674c1248338cb177273b8a24435af1

Branch pushed to git repo; I updated commit sha1. New commits:

1a000effix mistakes
cbda82cexpose subgraph method to initialization/copying

comment:26 Changed 2 years ago by gh-kliem

Dependencies: #30665#30665, #30769

comment:27 Changed 2 years ago by git

Commit: cbda82c0f6674c1248338cb177273b8a24435af1776e38f30e301f4bef172ff752633412f641e770

Branch pushed to git repo; I updated commit sha1. This was a forced push. Last 10 new commits:

1a8ec00trac #30753: change threshold for default algorithm
29b485bfirst step towards subgraph_by_adding in backend
2f25e38reviewers comments
f4f2473deal with labels and multiple edges
fd624f2avoid bitset must not be empty
f4bc40bimplement subgraph_given_vertices for static sparse
05c3f72use cases for subgraph
7a4326eallow deleting of multiedges and loops
fb96022fix mistakes
776e38fexpose subgraph method to initialization/copying

comment:28 Changed 2 years ago by git

Commit: 776e38f30e301f4bef172ff752633412f641e77040bacf03c0633affd490bac998856618879ba271

Branch pushed to git repo; I updated commit sha1. New commits:

7c6e1d5clean up
40bacf0documentation

comment:29 Changed 2 years ago by gh-kliem

Status: needs_workneeds_review

comment:30 Changed 2 years ago by gh-kliem

Description: modified (diff)

comment:31 Changed 2 years ago by David Coudert

Reviewers: David Coudert
Status: needs_reviewpositive_review

It's not easy to review the changes done in this ticket as it depends on two other big tickets. Nonetheless, so far all my tests are successful and show serious speed up with respect previous design. Method subgraph was a bottleneck in several of my own code.

comment:32 Changed 2 years ago by Volker Braun

Branch: public/graphs/30753_subgraph_new40bacf03c0633affd490bac998856618879ba271
Resolution: fixed
Status: positive_reviewclosed
Note: See TracTickets for help on using tickets.