#30753 closed enhancement (fixed)

Further improvements in method subgraph

Reported by: dcoudert Owned by:
Priority: major Milestone: sage-9.3
Component: graph theory Keywords:
Cc: gh-kliem Merged in:
Authors: David Coudert, Jonathan Kliem Reviewers: David Coudert
Report Upstream: N/A Work issues:
Branch: 40bacf0 (Commits, GitHub, GitLab) Commit: 40bacf03c0633affd490bac998856618879ba271
Dependencies: #30665, #30769 Stopgaps:

Status badges

Description (last modified by gh-kliem)

We add a fast method subgraph_given_vertices to CGraphBackend. This method allows fast subgraph creation including copying. Also improvements in subgraph with a case forgotten in #30510.

Before #30665:

sage: G = graphs.Grid2dGraph(100, 100)  # dynamic sparse                                                                                                                            
sage: V = list(G)                                                                                                                                                                   
sage: %timeit H = G.subgraph(V[:50], algorithm='add')                                                                                                                               
280 µs ± 465 ns per loop (mean ± std. dev. of 7 runs, 1000 loops each)
sage: %timeit H = G.copy()                                                                                                                                                          
30.3 ms ± 72.8 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
                                                                                                                                                                              
sage: G = G.copy(immutable=True)  # static sparse                                                                                                                                   
sage: %timeit H = G.subgraph(V[:50], algorithm='add')                                                                                                                               
553 µs ± 373 ns per loop (mean ± std. dev. of 7 runs, 1000 loops each)
sage: %timeit H = G.copy(immutable=False)                                                                                                                                           
31.6 ms ± 91.6 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
                                                                                                                                                                               
sage: G = graphs.CompleteGraph(1000)  # dynamic dense                                                                                                                               
sage: V = list(G)                                                                                                                                                                   
sage: %timeit H = G.subgraph(V[:50], algorithm='add')                                                                                                                               
54.2 ms ± 54.6 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
sage: %timeit H = G.copy()                                                                                                                                                          
527 ms ± 663 µs per loop (mean ± std. dev. of 7 runs, 1 loop each)

With #30665 and #30769:

sage: G = graphs.Grid2dGraph(100, 100)  # dynamic sparse                                                                                                                                                                                                                                                                                                                   
sage: V = list(G)                                                                                                                                                                                                                                                                                                                                                          
sage: %timeit H = G.subgraph(V[:50], algorithm='add')                                                                                                                                                                                                                                                                                                                      
257 µs ± 356 ns per loop (mean ± std. dev. of 7 runs, 1000 loops each)
sage: %timeit H = G.copy()                                                                                                                                                                                                                                                                                                                                                 
26.4 ms ± 146 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
                                                                                                                                                                                                                                                                                                                                                                     
sage: G = G.copy(immutable=True)  # static sparse                                                                                                                                                                                                                                                                                                                          
sage: %timeit H = G.subgraph(V[:50], algorithm='add')                                                                                                                                                                                                                                                                                                                      
521 µs ± 308 ns per loop (mean ± std. dev. of 7 runs, 1000 loops each)
sage: %timeit H = G.copy(immutable=False)                                                                                                                                                                                                                                                                                                                                  
28.3 ms ± 95.2 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)

sage: G = graphs.CompleteGraph(1000)  # dynamic dense                                                                                                                                                                                                                                                                                                                      
sage: V = list(G)                                                                                                                                                                                                                                                                                                                                                          
sage: %timeit H = G.subgraph(V[:50], algorithm='add')                                                                                                                                                                                                                                                                                                                      
51.6 ms ± 64.6 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
sage: %timeit H = G.copy()                                                                                                                                                                                                                                                                                                                                                 
429 ms ± 1.23 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

With this ticket:

sage: G = graphs.Grid2dGraph(100, 100)  # dynamic sparse                                                                                                                                                                                                                                                                                                                                     
sage: V = list(G)                                                                                                                                                                                                                                                                                                                                                          
sage: %timeit H = G.subgraph(V[:50], algorithm='add')                                                                                                                                                                                                                                                                                                                      
80.9 µs ± 19.1 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
sage: %timeit H = G.copy()                                                                                                                                                                                                                                                                                                                                                 
15.4 ms ± 10.4 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

sage: G = G.copy(immutable=True)  # static sparse                                                                                                                                                                                                                                                                                                                                           
sage: %timeit H = G.subgraph(V[:50], algorithm='add')                                                                                                                                                                                                                                                                                                                      
327 µs ± 1.04 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
sage: %timeit H = G.copy(immutable=False)                                                                                                                                                                                                                                                                                                                                  
20.3 ms ± 23 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)

sage: G = graphs.CompleteGraph(1000)  # dynamic dense                                                                                                                                                                                                                                                                                                                             
sage: V = list(G)                                                                                                                                                                                                                                                                                                                                                          
sage: %timeit H = G.subgraph(V[:50], algorithm='add')                                                                                                                                                                                                                                                                                                                      
1.03 ms ± 10.6 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
sage: %timeit H = G.copy()                                                                                                                                                                                                                                                                                                                                                 
269 ms ± 3.08 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)

Change History (32)

comment:1 Changed 14 months ago by dcoudert

  • Branch set to public/graphs/30753_subgraph
  • Cc gh-kliem added
  • Commit set to 9495c5b111fb3df9b33f2f955ccbbfd44963f9f8

I'm not sure if the threshold for selecting default algorithm is still best possible. At least, we have improved both algorithms.


New commits:

9495c5btrac #30753: improve _subgraph_by_adding

comment:2 Changed 14 months ago by dcoudert

  • Status changed from new to needs_review

comment:3 Changed 14 months ago by dcoudert

Currently, the default algorithm is 'delete' if the number of vertices of the subgraph is at least 5% of the number of vertices of the graph. But following tests suggest that 'add' is faster when the number of vertices of the subgraph is up to 90%... so we should change the threshold to e.g. 90%. Do you agree ?

sage: G = graphs.Grid2dGraph(100, 100)
sage: V = list(G)
sage: %timeit H = G.subgraph(V[:50], algorithm=None)   # use add
219 µs ± 7.56 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
sage: %timeit H = G.subgraph(V[:50], algorithm='add')
213 µs ± 10.9 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
sage: %timeit H = G.subgraph(V[:50], algorithm='delete')
62.3 ms ± 1.64 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)

sage: %timeit H = G.subgraph(V[:500], algorithm=None)  # use add
1.85 ms ± 35.4 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
sage: %timeit H = G.subgraph(V[:500], algorithm='add')
1.99 ms ± 70.7 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
sage: %timeit H = G.subgraph(V[:500], algorithm='delete')
64 ms ± 1.14 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)

sage: %timeit H = G.subgraph(V[:5000], algorithm=None)  # use delete but add is faster here
53 ms ± 1.1 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
sage: %timeit H = G.subgraph(V[:5000], algorithm='add')
23.7 ms ± 229 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
sage: %timeit H = G.subgraph(V[:5000], algorithm='delete')
52.3 ms ± 636 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
sage: G = graphs.CompleteGraph(1000)                                                                                                
sage: V = list(G)                                                                                                                   
sage: %timeit H = G.subgraph(V[:50], algorithm=None)  # use 'add'
17.9 ms ± 408 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
sage: %timeit H = G.subgraph(V[:50], algorithm='add')
19.5 ms ± 441 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
sage: %timeit H = G.subgraph(V[:50], algorithm='delete')
962 ms ± 14.9 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

sage: %timeit H = G.subgraph(V[:100], algorithm=None)  # use 'delete' but 'add' is faster
962 ms ± 20.5 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
sage: %timeit H = G.subgraph(V[:100], algorithm='add')
38 ms ± 828 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
sage: %timeit H = G.subgraph(V[:100], algorithm='delete')
959 ms ± 18.3 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

sage: %timeit H = G.subgraph(V[:500], algorithm=None)  # use 'delete' but 'add' is faster
949 ms ± 18.1 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
sage: %timeit H = G.subgraph(V[:500], algorithm='add')
283 ms ± 15.3 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
sage: %timeit H = G.subgraph(V[:100], algorithm='delete')
982 ms ± 32.2 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

sage: %timeit H = G.subgraph(V[:800], algorithm=None)  # use 'delete' but 'add' is faster
918 ms ± 35.2 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
sage: %timeit H = G.subgraph(V[:800], algorithm='add')
643 ms ± 19.4 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
sage: %timeit H = G.subgraph(V[:800], algorithm='delete')
909 ms ± 35.6 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

sage: %timeit H = G.subgraph(V[:900], algorithm=None)  # use 'delete' but 'add' is faster
862 ms ± 19.1 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
sage: %timeit H = G.subgraph(V[:900], algorithm='add')
735 ms ± 16.7 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

comment:4 Changed 14 months ago by git

  • Commit changed from 9495c5b111fb3df9b33f2f955ccbbfd44963f9f8 to c211ff85508ac09c27285aa84c99b98d63e7f4ea

Branch pushed to git repo; I updated commit sha1. New commits:

c211ff8trac #30753: change threshold for default algorithm

comment:5 Changed 14 months ago by dcoudert

I change the threshold for default algorithm to use use 'delete' when the subgraph has at least 90% of the vertices and 'add' otherwise. It seems faster this way. I updated some doctests (mostly changes in orderings).

comment:6 Changed 14 months ago by gh-kliem

The example in this ticket might imply, that #30665 should be put back onto needs work.

I see a huge slowdown.

comment:7 Changed 14 months ago by dcoudert

OK, so may be we should wait for the finalization of #30665 before changing the threshold.

comment:8 Changed 14 months ago by gh-kliem

I think I'm done with #30665.

However, I could open another ticket and implement a subgraph iterator in the backend: Given a list of vertices, iterate over all edges in the induced subgraph. This should be much faster.

comment:9 Changed 14 months ago by dcoudert

We can give it a try, at least for simple cases. You can either do it here (the branch is public) or in a separate ticket but it might be harder to ensure that all changes are consistent (i.e., improve the situation).

comment:10 Changed 14 months ago by gh-kliem

I would say, it is definitely worth it. But I have only a draft now with some todos.

In particular, this should speedup the copy method:

sage: G = graphs.Grid2dGraph(100, 100)                                                                                                                                                                                                                                                                                                                                     
sage: V = list(G)                                                                                                                                                                                                                                                                                                                                                          
sage: %timeit H = G.copy()                                                                                                                                                                                                                                                                                                                                                 
31.1 ms ± 183 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
sage: %timeit H = G.subgraph(V[:10000], algorithm='add')                                                                                                                                                                                                                                                                                                                   
16.6 ms ± 15.7 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

comment:11 Changed 14 months ago by gh-kliem

Btw, subgraph_by_deleting should only be used in place or if the backend has an optimized copy method (which appears to be not the case yet).

Not sure about the special subgraphs yet, which are not obtained by just reducing the number of vertices.

comment:12 Changed 14 months ago by git

  • Commit changed from c211ff85508ac09c27285aa84c99b98d63e7f4ea to adb2d57c09b4ea0e9abc706c37a831d909203ead

Branch pushed to git repo; I updated commit sha1. Last 10 new commits:

5f7b30dreviewers suggestion
002ec5ddocument that an iterator can be used to relabel edges
5deaa25fix wrong usage and document it
d7c27ebmake iterator saver
86b0ba6expose `sort_vertices`
737e0f8lookup in bitset is much faster
07b7a8ebitset must not be empty
957b4cccheck if vertices are empty
8dbef67Merge branch 'u/gh-kliem/graphs_improve_edge_iterator' of git://trac.sagemath.org/sage into public/graphs/30753_subgraph
adb2d57first step towards subgraph_by_adding in backend

comment:13 Changed 14 months ago by git

  • Commit changed from adb2d57c09b4ea0e9abc706c37a831d909203ead to 3a42e25f9b335edbe748daef3d5d924d3d5b775f

Branch pushed to git repo; I updated commit sha1. New commits:

3a42e25check for bitset capacity

comment:14 Changed 14 months ago by gh-kliem

  • Authors changed from David Coudert to David Coudert, Jonathan Kliem
  • Status changed from needs_review to needs_work

comment:15 Changed 14 months ago by gh-kliem

  • Dependencies set to #30665

comment:16 Changed 14 months ago by dcoudert

I'm not sure about this last commit as it has a different number than in #30665. Is it normal ?

comment:17 Changed 14 months ago by gh-kliem

This usually happens to me when cherry picking. I can rebase at some point on #30665.

comment:18 Changed 14 months ago by git

  • Commit changed from 3a42e25f9b335edbe748daef3d5d924d3d5b775f to 0372941f0f4e62077b5957322d4ae56427dc06ba

Branch pushed to git repo; I updated commit sha1. This was a forced push. New commits:

b7d739fcheck for bitset capacity
ee04870trac #30753: improve _subgraph_by_adding
c0a3b0atrac #30753: change threshold for default algorithm
0372941first step towards subgraph_by_adding in backend

comment:19 Changed 14 months ago by dcoudert

  • may be the method obtain_subgraph could be renamed like subgraph_given_vertices or something more explicit with respect what it actually do.
  • there is a line cdef CGraph to remove
  • instead of max(b_vertices_2), why not using the size or capacity of bitset active_vertices ? or may be the size of b_vertices ?
  • Somehow, before starting to add vertices to other, we should call realloc with appropriate size, thus avoiding multiple calls.
  • I have no clue how to deal with multiple edges here.
  • Concerning methods add and delete, I fully agree that now delete should be used only when inplace is True or algorithm == 'delete'. The add method is now way faster and should be the default one, without threshold.

comment:20 Changed 14 months ago by gh-kliem

I'll keep that in mind.

Concerning the multiedges, unfortunately I shbould clean up dense and sparse graphs backend a bit first. Otherwise, I think it is not a problem.

If self is mutliedge, it will simply try to make other multiedge. If other is dense, this will raise an error.

If I do things right, optimizing is_subgraph afterwards should be easy. In particual this can be used to check equality fast.

comment:21 Changed 14 months ago by dcoudert

So may be for the moment, we can use the new method for graphs without multiple edges only and let other cases in the todo list (i.e., raise NotImplementedError if called on a graph with multiple edges).

It's cool if we can also optimize other methods. The work you have already done for iterating edges plus these optimizations of the subgraph method makes significant speed ups for the entire graph module :))

comment:22 Changed 14 months ago by gh-kliem

The current method does not even work for labeled graphs.

Give me a day or so and I'll see if I can figure things out.

comment:23 Changed 14 months ago by gh-kliem

  • Branch changed from public/graphs/30753_subgraph to public/graphs/30753_subgraph_new
  • Commit changed from 0372941f0f4e62077b5957322d4ae56427dc06ba to 75b7de5acbbc1be7772b99ae0b8efeaee0d59ba4

This is basically working.

What is missing:

  • Documentation.
  • Cleanup.
  • Exposure in __init__ of DiGraph and Graph to speed up e.g. copy.
  • (Maybe) Heuristics when to delete instead of add: As long as there are no improved copying methods, I changed the deletion subgraph to first obtain the subgraph induced by the given vertices instead of making a copy. Depending on the input of edges and edge_property, this might be better than adding, but maybe in not so many cases. And in those cases it might be more natural to obtain a subgraph first and then delete edges.

Last 10 new commits:

c71a05efix doctests
a1e1445trac #30753: improve _subgraph_by_adding
19f3d62trac #30753: change threshold for default algorithm
3306bc9first step towards subgraph_by_adding in backend
ae9204breviewers comments
fd5a206deal with labels and multiple edges
bafda55avoid bitset must not be empty
853802aimplement subgraph_given_vertices for static sparse
159869fuse cases for subgraph
75b7de5allow deleting of multiedges and loops

comment:24 Changed 14 months ago by dcoudert

This is really nice and I observe some speed up at least when doctesting the graph module :))

comment:25 Changed 14 months ago by git

  • Commit changed from 75b7de5acbbc1be7772b99ae0b8efeaee0d59ba4 to cbda82c0f6674c1248338cb177273b8a24435af1

Branch pushed to git repo; I updated commit sha1. New commits:

1a000effix mistakes
cbda82cexpose subgraph method to initialization/copying

comment:26 Changed 14 months ago by gh-kliem

  • Dependencies changed from #30665 to #30665, #30769

comment:27 Changed 14 months ago by git

  • Commit changed from cbda82c0f6674c1248338cb177273b8a24435af1 to 776e38f30e301f4bef172ff752633412f641e770

Branch pushed to git repo; I updated commit sha1. This was a forced push. Last 10 new commits:

1a8ec00trac #30753: change threshold for default algorithm
29b485bfirst step towards subgraph_by_adding in backend
2f25e38reviewers comments
f4f2473deal with labels and multiple edges
fd624f2avoid bitset must not be empty
f4bc40bimplement subgraph_given_vertices for static sparse
05c3f72use cases for subgraph
7a4326eallow deleting of multiedges and loops
fb96022fix mistakes
776e38fexpose subgraph method to initialization/copying

comment:28 Changed 14 months ago by git

  • Commit changed from 776e38f30e301f4bef172ff752633412f641e770 to 40bacf03c0633affd490bac998856618879ba271

Branch pushed to git repo; I updated commit sha1. New commits:

7c6e1d5clean up
40bacf0documentation

comment:29 Changed 14 months ago by gh-kliem

  • Status changed from needs_work to needs_review

comment:30 Changed 14 months ago by gh-kliem

  • Description modified (diff)

comment:31 Changed 14 months ago by dcoudert

  • Reviewers set to David Coudert
  • Status changed from needs_review to positive_review

It's not easy to review the changes done in this ticket as it depends on two other big tickets. Nonetheless, so far all my tests are successful and show serious speed up with respect previous design. Method subgraph was a bottleneck in several of my own code.

comment:32 Changed 13 months ago by vbraun

  • Branch changed from public/graphs/30753_subgraph_new to 40bacf03c0633affd490bac998856618879ba271
  • Resolution set to fixed
  • Status changed from positive_review to closed
Note: See TracTickets for help on using tickets.