Opened 2 years ago
Closed 2 years ago
#30753 closed enhancement (fixed)
Further improvements in method subgraph
Reported by:  David Coudert  Owned by:  

Priority:  major  Milestone:  sage9.3 
Component:  graph theory  Keywords:  
Cc:  ghkliem  Merged in:  
Authors:  David Coudert, Jonathan Kliem  Reviewers:  David Coudert 
Report Upstream:  N/A  Work issues:  
Branch:  40bacf0 (Commits, GitHub, GitLab)  Commit:  40bacf03c0633affd490bac998856618879ba271 
Dependencies:  #30665, #30769  Stopgaps: 
Description (last modified by )
We add a fast method subgraph_given_vertices
to CGraphBackend
. This method allows fast subgraph creation including copying.
Also improvements in subgraph
with a case forgotten in #30510.
Before #30665:
sage: G = graphs.Grid2dGraph(100, 100) # dynamic sparse sage: V = list(G) sage: %timeit H = G.subgraph(V[:50], algorithm='add') 280 µs ± 465 ns per loop (mean ± std. dev. of 7 runs, 1000 loops each) sage: %timeit H = G.copy() 30.3 ms ± 72.8 µs per loop (mean ± std. dev. of 7 runs, 10 loops each) sage: G = G.copy(immutable=True) # static sparse sage: %timeit H = G.subgraph(V[:50], algorithm='add') 553 µs ± 373 ns per loop (mean ± std. dev. of 7 runs, 1000 loops each) sage: %timeit H = G.copy(immutable=False) 31.6 ms ± 91.6 µs per loop (mean ± std. dev. of 7 runs, 10 loops each) sage: G = graphs.CompleteGraph(1000) # dynamic dense sage: V = list(G) sage: %timeit H = G.subgraph(V[:50], algorithm='add') 54.2 ms ± 54.6 µs per loop (mean ± std. dev. of 7 runs, 10 loops each) sage: %timeit H = G.copy() 527 ms ± 663 µs per loop (mean ± std. dev. of 7 runs, 1 loop each)
sage: G = graphs.Grid2dGraph(100, 100) # dynamic sparse sage: V = list(G) sage: %timeit H = G.subgraph(V[:50], algorithm='add') 257 µs ± 356 ns per loop (mean ± std. dev. of 7 runs, 1000 loops each) sage: %timeit H = G.copy() 26.4 ms ± 146 µs per loop (mean ± std. dev. of 7 runs, 10 loops each) sage: G = G.copy(immutable=True) # static sparse sage: %timeit H = G.subgraph(V[:50], algorithm='add') 521 µs ± 308 ns per loop (mean ± std. dev. of 7 runs, 1000 loops each) sage: %timeit H = G.copy(immutable=False) 28.3 ms ± 95.2 µs per loop (mean ± std. dev. of 7 runs, 10 loops each) sage: G = graphs.CompleteGraph(1000) # dynamic dense sage: V = list(G) sage: %timeit H = G.subgraph(V[:50], algorithm='add') 51.6 ms ± 64.6 µs per loop (mean ± std. dev. of 7 runs, 10 loops each) sage: %timeit H = G.copy() 429 ms ± 1.23 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
With this ticket:
sage: G = graphs.Grid2dGraph(100, 100) # dynamic sparse sage: V = list(G) sage: %timeit H = G.subgraph(V[:50], algorithm='add') 80.9 µs ± 19.1 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each) sage: %timeit H = G.copy() 15.4 ms ± 10.4 µs per loop (mean ± std. dev. of 7 runs, 100 loops each) sage: G = G.copy(immutable=True) # static sparse sage: %timeit H = G.subgraph(V[:50], algorithm='add') 327 µs ± 1.04 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each) sage: %timeit H = G.copy(immutable=False) 20.3 ms ± 23 µs per loop (mean ± std. dev. of 7 runs, 10 loops each) sage: G = graphs.CompleteGraph(1000) # dynamic dense sage: V = list(G) sage: %timeit H = G.subgraph(V[:50], algorithm='add') 1.03 ms ± 10.6 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each) sage: %timeit H = G.copy() 269 ms ± 3.08 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
Change History (32)
comment:1 Changed 2 years ago by
Branch:  → public/graphs/30753_subgraph 

Cc:  ghkliem added 
Commit:  → 9495c5b111fb3df9b33f2f955ccbbfd44963f9f8 
comment:2 Changed 2 years ago by
Status:  new → needs_review 

comment:3 Changed 2 years ago by
Currently, the default algorithm is 'delete'
if the number of vertices of the subgraph is at least 5% of the number of vertices of the graph. But following tests suggest that 'add'
is faster when the number of vertices of the subgraph is up to 90%... so we should change the threshold to e.g. 90%. Do you agree ?
sage: G = graphs.Grid2dGraph(100, 100) sage: V = list(G) sage: %timeit H = G.subgraph(V[:50], algorithm=None) # use add 219 µs ± 7.56 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each) sage: %timeit H = G.subgraph(V[:50], algorithm='add') 213 µs ± 10.9 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each) sage: %timeit H = G.subgraph(V[:50], algorithm='delete') 62.3 ms ± 1.64 ms per loop (mean ± std. dev. of 7 runs, 10 loops each) sage: %timeit H = G.subgraph(V[:500], algorithm=None) # use add 1.85 ms ± 35.4 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each) sage: %timeit H = G.subgraph(V[:500], algorithm='add') 1.99 ms ± 70.7 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each) sage: %timeit H = G.subgraph(V[:500], algorithm='delete') 64 ms ± 1.14 ms per loop (mean ± std. dev. of 7 runs, 10 loops each) sage: %timeit H = G.subgraph(V[:5000], algorithm=None) # use delete but add is faster here 53 ms ± 1.1 ms per loop (mean ± std. dev. of 7 runs, 10 loops each) sage: %timeit H = G.subgraph(V[:5000], algorithm='add') 23.7 ms ± 229 µs per loop (mean ± std. dev. of 7 runs, 10 loops each) sage: %timeit H = G.subgraph(V[:5000], algorithm='delete') 52.3 ms ± 636 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
sage: G = graphs.CompleteGraph(1000) sage: V = list(G) sage: %timeit H = G.subgraph(V[:50], algorithm=None) # use 'add' 17.9 ms ± 408 µs per loop (mean ± std. dev. of 7 runs, 100 loops each) sage: %timeit H = G.subgraph(V[:50], algorithm='add') 19.5 ms ± 441 µs per loop (mean ± std. dev. of 7 runs, 100 loops each) sage: %timeit H = G.subgraph(V[:50], algorithm='delete') 962 ms ± 14.9 ms per loop (mean ± std. dev. of 7 runs, 1 loop each) sage: %timeit H = G.subgraph(V[:100], algorithm=None) # use 'delete' but 'add' is faster 962 ms ± 20.5 ms per loop (mean ± std. dev. of 7 runs, 1 loop each) sage: %timeit H = G.subgraph(V[:100], algorithm='add') 38 ms ± 828 µs per loop (mean ± std. dev. of 7 runs, 10 loops each) sage: %timeit H = G.subgraph(V[:100], algorithm='delete') 959 ms ± 18.3 ms per loop (mean ± std. dev. of 7 runs, 1 loop each) sage: %timeit H = G.subgraph(V[:500], algorithm=None) # use 'delete' but 'add' is faster 949 ms ± 18.1 ms per loop (mean ± std. dev. of 7 runs, 1 loop each) sage: %timeit H = G.subgraph(V[:500], algorithm='add') 283 ms ± 15.3 ms per loop (mean ± std. dev. of 7 runs, 1 loop each) sage: %timeit H = G.subgraph(V[:100], algorithm='delete') 982 ms ± 32.2 ms per loop (mean ± std. dev. of 7 runs, 1 loop each) sage: %timeit H = G.subgraph(V[:800], algorithm=None) # use 'delete' but 'add' is faster 918 ms ± 35.2 ms per loop (mean ± std. dev. of 7 runs, 1 loop each) sage: %timeit H = G.subgraph(V[:800], algorithm='add') 643 ms ± 19.4 ms per loop (mean ± std. dev. of 7 runs, 1 loop each) sage: %timeit H = G.subgraph(V[:800], algorithm='delete') 909 ms ± 35.6 ms per loop (mean ± std. dev. of 7 runs, 1 loop each) sage: %timeit H = G.subgraph(V[:900], algorithm=None) # use 'delete' but 'add' is faster 862 ms ± 19.1 ms per loop (mean ± std. dev. of 7 runs, 1 loop each) sage: %timeit H = G.subgraph(V[:900], algorithm='add') 735 ms ± 16.7 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
comment:4 Changed 2 years ago by
Commit:  9495c5b111fb3df9b33f2f955ccbbfd44963f9f8 → c211ff85508ac09c27285aa84c99b98d63e7f4ea 

Branch pushed to git repo; I updated commit sha1. New commits:
c211ff8  trac #30753: change threshold for default algorithm

comment:5 Changed 2 years ago by
I change the threshold for default algorithm to use use 'delete'
when the subgraph has at least 90% of the vertices and 'add'
otherwise. It seems faster this way. I updated some doctests (mostly changes in orderings).
comment:6 Changed 2 years ago by
The example in this ticket might imply, that #30665 should be put back onto needs work.
I see a huge slowdown.
comment:7 Changed 2 years ago by
OK, so may be we should wait for the finalization of #30665 before changing the threshold.
comment:8 Changed 2 years ago by
I think I'm done with #30665.
However, I could open another ticket and implement a subgraph iterator in the backend: Given a list of vertices, iterate over all edges in the induced subgraph. This should be much faster.
comment:9 Changed 2 years ago by
We can give it a try, at least for simple cases. You can either do it here (the branch is public) or in a separate ticket but it might be harder to ensure that all changes are consistent (i.e., improve the situation).
comment:10 Changed 2 years ago by
I would say, it is definitely worth it. But I have only a draft now with some todos.
In particular, this should speedup the copy method:
sage: G = graphs.Grid2dGraph(100, 100) sage: V = list(G) sage: %timeit H = G.copy() 31.1 ms ± 183 µs per loop (mean ± std. dev. of 7 runs, 10 loops each) sage: %timeit H = G.subgraph(V[:10000], algorithm='add') 16.6 ms ± 15.7 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
comment:11 Changed 2 years ago by
Btw, subgraph_by_deleting
should only be used in place or if the backend has an optimized copy method (which appears to be not the case yet).
Not sure about the special subgraphs yet, which are not obtained by just reducing the number of vertices.
comment:12 Changed 2 years ago by
Commit:  c211ff85508ac09c27285aa84c99b98d63e7f4ea → adb2d57c09b4ea0e9abc706c37a831d909203ead 

Branch pushed to git repo; I updated commit sha1. Last 10 new commits:
5f7b30d  reviewers suggestion

002ec5d  document that an iterator can be used to relabel edges

5deaa25  fix wrong usage and document it

d7c27eb  make iterator saver

86b0ba6  expose `sort_vertices`

737e0f8  lookup in bitset is much faster

07b7a8e  bitset must not be empty

957b4cc  check if vertices are empty

8dbef67  Merge branch 'u/ghkliem/graphs_improve_edge_iterator' of git://trac.sagemath.org/sage into public/graphs/30753_subgraph

adb2d57  first step towards subgraph_by_adding in backend

comment:13 Changed 2 years ago by
Commit:  adb2d57c09b4ea0e9abc706c37a831d909203ead → 3a42e25f9b335edbe748daef3d5d924d3d5b775f 

Branch pushed to git repo; I updated commit sha1. New commits:
3a42e25  check for bitset capacity

comment:14 Changed 2 years ago by
Authors:  David Coudert → David Coudert, Jonathan Kliem 

Status:  needs_review → needs_work 
comment:15 Changed 2 years ago by
Dependencies:  → #30665 

comment:16 Changed 2 years ago by
I'm not sure about this last commit as it has a different number than in #30665. Is it normal ?
comment:17 Changed 2 years ago by
This usually happens to me when cherry picking. I can rebase at some point on #30665.
comment:18 Changed 2 years ago by
Commit:  3a42e25f9b335edbe748daef3d5d924d3d5b775f → 0372941f0f4e62077b5957322d4ae56427dc06ba 

comment:19 Changed 2 years ago by
 may be the method
obtain_subgraph
could be renamed likesubgraph_given_vertices
or something more explicit with respect what it actually do.  there is a line
cdef CGraph
to remove  instead of
max(b_vertices_2)
, why not using the size or capacity of bitsetactive_vertices
? or may be the size ofb_vertices
?  Somehow, before starting to add vertices to
other
, we should callrealloc
with appropriate size, thus avoiding multiple calls.  I have no clue how to deal with multiple edges here.
 Concerning methods
add
anddelete
, I fully agree that nowdelete
should be used only wheninplace is True or algorithm == 'delete'
. Theadd
method is now way faster and should be the default one, without threshold.
comment:20 Changed 2 years ago by
I'll keep that in mind.
Concerning the multiedges, unfortunately I shbould clean up dense and sparse graphs backend a bit first. Otherwise, I think it is not a problem.
If self
is mutliedge, it will simply try to make
other
multiedge. If other is dense, this will raise an error.
If I do things right, optimizing is_subgraph
afterwards should be easy. In particual this can be used to check equality fast.
comment:21 Changed 2 years ago by
So may be for the moment, we can use the new method for graphs without multiple edges only and let other cases in the todo list (i.e., raise NotImplementedError
if called on a graph with multiple edges).
It's cool if we can also optimize other methods. The work you have already done for iterating edges plus these optimizations of the subgraph method makes significant speed ups for the entire graph module :))
comment:22 Changed 2 years ago by
The current method does not even work for labeled graphs.
Give me a day or so and I'll see if I can figure things out.
comment:23 Changed 2 years ago by
Branch:  public/graphs/30753_subgraph → public/graphs/30753_subgraph_new 

Commit:  0372941f0f4e62077b5957322d4ae56427dc06ba → 75b7de5acbbc1be7772b99ae0b8efeaee0d59ba4 
This is basically working.
What is missing:
 Documentation.
 Cleanup.
 Exposure in
__init__
ofDiGraph
andGraph
to speed up e.g. copy.  (Maybe) Heuristics when to delete instead of add:
As long as there are no improved copying methods, I changed the deletion subgraph to first obtain the subgraph induced by the given vertices instead of making a copy. Depending on the input of
edges
andedge_property
, this might be better than adding, but maybe in not so many cases. And in those cases it might be more natural to obtain a subgraph first and then delete edges.
Last 10 new commits:
c71a05e  fix doctests

a1e1445  trac #30753: improve _subgraph_by_adding

19f3d62  trac #30753: change threshold for default algorithm

3306bc9  first step towards subgraph_by_adding in backend

ae9204b  reviewers comments

fd5a206  deal with labels and multiple edges

bafda55  avoid bitset must not be empty

853802a  implement subgraph_given_vertices for static sparse

159869f  use cases for subgraph

75b7de5  allow deleting of multiedges and loops

comment:24 Changed 2 years ago by
This is really nice and I observe some speed up at least when doctesting the graph module :))
comment:25 Changed 2 years ago by
Commit:  75b7de5acbbc1be7772b99ae0b8efeaee0d59ba4 → cbda82c0f6674c1248338cb177273b8a24435af1 

comment:26 Changed 2 years ago by
Dependencies:  #30665 → #30665, #30769 

comment:27 Changed 2 years ago by
Commit:  cbda82c0f6674c1248338cb177273b8a24435af1 → 776e38f30e301f4bef172ff752633412f641e770 

Branch pushed to git repo; I updated commit sha1. This was a forced push. Last 10 new commits:
1a8ec00  trac #30753: change threshold for default algorithm

29b485b  first step towards subgraph_by_adding in backend

2f25e38  reviewers comments

f4f2473  deal with labels and multiple edges

fd624f2  avoid bitset must not be empty

f4bc40b  implement subgraph_given_vertices for static sparse

05c3f72  use cases for subgraph

7a4326e  allow deleting of multiedges and loops

fb96022  fix mistakes

776e38f  expose subgraph method to initialization/copying

comment:28 Changed 2 years ago by
Commit:  776e38f30e301f4bef172ff752633412f641e770 → 40bacf03c0633affd490bac998856618879ba271 

comment:29 Changed 2 years ago by
Status:  needs_work → needs_review 

comment:30 Changed 2 years ago by
Description:  modified (diff) 

comment:31 Changed 2 years ago by
Reviewers:  → David Coudert 

Status:  needs_review → positive_review 
It's not easy to review the changes done in this ticket as it depends on two other big tickets. Nonetheless, so far all my tests are successful and show serious speed up with respect previous design. Method subgraph was a bottleneck in several of my own code.
comment:32 Changed 2 years ago by
Branch:  public/graphs/30753_subgraph_new → 40bacf03c0633affd490bac998856618879ba271 

Resolution:  → fixed 
Status:  positive_review → closed 
I'm not sure if the threshold for selecting default algorithm is still best possible. At least, we have improved both algorithms.
New commits:
trac #30753: improve _subgraph_by_adding