Opened 2 years ago
Closed 2 years ago
#30665 closed enhancement (fixed)
Optimize edge iterator for graphs
Reported by: | gh-kliem | Owned by: | |
---|---|---|---|
Priority: | major | Milestone: | sage-9.3 |
Component: | graph theory | Keywords: | edge iterator, graph |
Cc: | Merged in: | ||
Authors: | Jonathan Kliem | Reviewers: | David Coudert |
Report Upstream: | N/A | Work issues: | |
Branch: | 8beb573 (Commits, GitHub, GitLab) | Commit: | 8beb57302c92d17f4b703e53a1170aa0acd80920 |
Dependencies: | Stopgaps: |
Description (last modified by )
This ticket reorganizes and improves the iterator over edges in the graph backends.
Currently, the code is duplicated for dense and sparse graphs. Also the iterator works by creating successive lists of neighbors. Mostly, an iterator should not create lists of things.
In case of multiple edges, it might be desirable to create lists of all arcs from u
to v
. Otherwise we might risk segmentation faults, if messing with the graph while iterating through it.
We expose an unsorted edge iterator.
Before:
sage: G = Graph(loops=False, multiedges=False) ....: G.add_edges([(i, (i+85)%100) for i in range(100)]) ....: G.add_edges([(i, (i+37)%100) for i in range(100)]) ....: G.add_edges([(i, (i+83)%100) for i in range(100)]) sage: %timeit edges = list(G.edge_iterator()) 103 µs ± 10.4 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each) sage: H = G.copy() sage: %timeit H == G 268 µs ± 2.43 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each) sage: G = Graph(loops=False, multiedges=False, sparse=False) ....: G.add_edges([(i, (i+85)%100) for i in range(100)]) ....: G.add_edges([(i, (i+37)%100) for i in range(100)]) ....: G.add_edges([(i, (i+83)%100) for i in range(100)]) sage: %timeit edges = list(G.edge_iterator()) 228 µs ± 40.5 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each) sage: H = G.copy() sage: %timeit H == G 380 µs ± 2.07 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each) sage: G = Graph(loops=False, multiedges=True) ....: G.add_edges([(i, (i+85)%100, j) for j in range(20) for i in range(100)]) ....: G.add_edges([(i, (i+37)%100, j) for j in range(20) for i in range(100)]) ....: G.add_edges([(i, (i+83)%100, j) for j in range(20) for i in range(100)]) sage: %timeit edges = list(G.edge_iterator()) 2.22 ms ± 412 µs per loop (mean ± std. dev. of 7 runs, 100 loops each) sage: H = G.copy() sage: %timeit H == G 2.94 ms ± 51.1 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
After:
sage: G = Graph(loops=False, multiedges=False) ....: G.add_edges([(i, (i+85)%100) for i in range(100)]) ....: G.add_edges([(i, (i+37)%100) for i in range(100)]) ....: G.add_edges([(i, (i+83)%100) for i in range(100)]) sage: %timeit edges = list(G.edge_iterator()) 66.2 µs ± 693 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each) sage: %timeit edges = list(G.edge_iterator(sort_vertices=False)) 50.1 µs ± 404 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each) sage: H = G.copy() sage: %timeit H == G 244 µs ± 13.2 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each) sage: G = Graph(loops=False, multiedges=False, sparse=False) ....: G.add_edges([(i, (i+85)%100) for i in range(100)]) ....: G.add_edges([(i, (i+37)%100) for i in range(100)]) ....: G.add_edges([(i, (i+83)%100) for i in range(100)]) sage: %timeit edges = list(G.edge_iterator()) 125 µs ± 597 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each) sage: %timeit edges = list(G.edge_iterator(sort_vertices=False)) 112 µs ± 1.22 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each) sage: H = G.copy() sage: %timeit H == G 318 µs ± 44.4 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each) sage: G = Graph(loops=False, multiedges=True) ....: G.add_edges([(i, (i+85)%100, j) for j in range(20) for i in range(100)]) ....: G.add_edges([(i, (i+37)%100, j) for j in range(20) for i in range(100)]) ....: G.add_edges([(i, (i+83)%100, j) for j in range(20) for i in range(100)]) sage: %timeit edges = list(G.edge_iterator()) 1.05 ms ± 3.2 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each) sage: %timeit edges = list(G.edge_iterator(sort_vertices=False)) 1.02 ms ± 40.4 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each) sage: H = G.copy() sage: %timeit H == G 2.19 ms ± 28.5 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
Change History (50)
comment:1 Changed 2 years ago by
Branch: | → u/gh-kliem/graphs_improve_edge_iterator |
---|
comment:2 Changed 2 years ago by
Commit: | → 86fdd80d12e24f314b9fe291e7a20c033dcff3ac |
---|
comment:3 follow-up: 5 Changed 2 years ago by
in method next_in_neighbor_unsafe
in dense_graph.pyx
, instead of for i in range(u, self.active_vertices.size)
, you could use bitset_next(self.active_vertices, u)
.
comment:4 Changed 2 years ago by
Commit: | 86fdd80d12e24f314b9fe291e7a20c033dcff3ac → 90659358abd6cc5f1fad387cb085a0d5f351c4c3 |
---|
comment:5 Changed 2 years ago by
Replying to dcoudert:
in method
next_in_neighbor_unsafe
indense_graph.pyx
, instead offor i in range(u, self.active_vertices.size)
, you could usebitset_next(self.active_vertices, u)
.
I'm not sure I understand.
There are a bunch of bitsets (a total of self.active_vertices.size - u
) and I'm trying to figure out, if one of them contains v
. I don't see, how to apply bitset_next
in this situation.
dense graphs shouldn't reimplement bitsets for their edges, but that is a different story.
comment:6 Changed 2 years ago by
I'm just saying that we can do something like:
cdef size_t i = u while i != -1: if self.edges[place + i * self.num_longs] & word: return i i = bitset_next(self.active_vertices, i) return -1
I don't know if it is faster or not for dense graphs.
comment:8 Changed 2 years ago by
Btw, bitset_next(foo, i)
will give the next element including i
.
Contrary to that next_in_neighbor_unsafe(v, u, l)
will give the next element not including u
. The reason for this is that in sparse graphs the arcs are grouped into a couple of trees according to u & hash_mask
. So it is much easier to obtain the next arc in the same tree. So next_in_neighbor_unsafe
will sort the u
s not by total order but different.
I don't want CGraph
to know how this works, so this way CGraph
can just trust the specification to figure out, what would be the next neighbor.
comment:9 Changed 2 years ago by
For now I would leave static sparse as it is and just define iterator_unsorted_edges = iterator_edges
.
In order to specify next_in_neighbor_unsafe
for static sparse efficiently, one should add something like u_index
to the signature. I don't know if there is much need for it.
comment:10 Changed 2 years ago by
Commit: | 90659358abd6cc5f1fad387cb085a0d5f351c4c3 → 1a85353a799f43804ade03a18afc9603c773abea |
---|
comment:11 Changed 2 years ago by
Description: | modified (diff) |
---|
comment:12 Changed 2 years ago by
Commit: | 1a85353a799f43804ade03a18afc9603c773abea → 4920ec8452527d3cd73048af6a74976575f9c7c0 |
---|
Branch pushed to git repo; I updated commit sha1. New commits:
4920ec8 | document that an iterator can be used to relabel edges
|
comment:13 Changed 2 years ago by
Code uses an edge iterator to relabel or vertices for example.
I documented that this is somewhat safe. This was one of my problem. In between I had segmentation faults because of this, because I was storing pointers to nodes, which were no longer valid.
comment:14 Changed 2 years ago by
Status: | new → needs_review |
---|
I pretty much happy with it for now.
I even managed to reduce the overall number of lines in the code while optimizing.
There are some optimizations that were simply not necessary. The compiler will figure out case distinctions like label
pretty good itself. If not, it usually doesn't matter for performance. I guess there are some cases, were compilers are pretty bad, but here it works all right and there is no need to duplicate code.
comment:15 Changed 2 years ago by
This is nice.
You added unsorted
to edge_iterator
. You could also add it to edges
and to EdgesView
in views.pyx
. We can of course let that for another ticket. We will have to check in many methods if we can benefit from this new parameter or not. Reducing the number of places where we sort is an important objective.
comment:16 Changed 2 years ago by
Once this is in you may want to check whether this also solves other "edge"-y tickets:
comment:17 follow-up: 19 Changed 2 years ago by
Patchbot reports doctest failures :
There is a problem in src/sage/combinat/cluster_algebra_quiver/mutation_class.py
around line 520, where one iterates over edges and modify the edges at the same time (direct access to the backend).
comment:18 Changed 2 years ago by
Commit: | 4920ec8452527d3cd73048af6a74976575f9c7c0 → 5fc372298b06c774a61da17098b335e0eb9a018d |
---|
Branch pushed to git repo; I updated commit sha1. New commits:
5fc3722 | fix wrong usage and document it
|
comment:19 Changed 2 years ago by
Replying to chapoton:
Patchbot reports doctest failures :
There is a problem in
src/sage/combinat/cluster_algebra_quiver/mutation_class.py
around line 520, where one iterates over edges and modify the edges at the same time (direct access to the backend).
Thank you. That was an easy one. I think it is just wrong to modify the vertices and not specify vertices
. It is difficult to know, what a user wants at this point.
I fixed it and documented this.
It is also kind of wrong to modify the edges while iterating, but as long as the vertices are well-defined this works and its the users problem to deal with whether new edges will appear in this run or not (unless you modify the current edge, like relabeling or deleting, then things are well-defined).
comment:20 follow-up: 22 Changed 2 years ago by
It is generally unsafe to modify the graph while iterating over the edges. Si I'm not sure the proposed modification is sufficient to solve the issue
comment:21 Changed 2 years ago by
Commit: | 5fc372298b06c774a61da17098b335e0eb9a018d → 5e91e9976bac1db76f819bca9c3c14f704cdaa11 |
---|
Branch pushed to git repo; I updated commit sha1. New commits:
5e91e99 | make iterator saver
|
comment:22 Changed 2 years ago by
Replying to dcoudert:
It is generally unsafe to modify the graph while iterating over the edges. Si I'm not sure the proposed modification is sufficient to solve the issue
I agree that iterating over something while modifying it, is not a good idea usually. It is nice, if you can relabel edges or delete some of them, without getting a complete list.
I added a quick check, that v_int
is still and active vertex. This indeed was a big issue (potentially segmentation fault), but also the only issue I can think of (which doesn't mean that it is the only one). Here are my thoughts about it:
next_neighbor_unsafe
will just pick the next neighbor as long as v_int
is an active vertex. If there are multiple arcs, they will be cached in a list, before yielding anything. This way things should be safe.
However, if you delete an edge and then add one, it might happen that the meaning of v_int
changes. I think, this is the users problem. You shouldn't ask for all edges out of a vertex and then along the way delete it and assume that things are fine.
If you have multiple arcs and delete them after obtaining the first arc was yield by the iterator, this is the users problem. The other arcs will still be output, but things should be safe as long as you use safe methods.
If you add/delete/modify edges, things are fine otherwise. However, it might happen that a new edge will be yield by the iterator or not (according to internal order). If you use the edge iterator to relabel edges things are fine, as long as you do not have multiple edges. (Once you have seen an edge (1,2)
, you know that it will not be output according to internal order again.)
comment:23 Changed 2 years ago by
OK, we can let it like that for the moment. This patch has only exhibit an issue in that code that is now fixed. If something goes wrong, a specific bug will be raised by a user.
comment:24 Changed 2 years ago by
I know. It's not like this was very stable before. In my opinion the only difference is, that before the edge iterator cached all arcs from v
, yield them all and then went on to the next vertex.
Now, we only cache all arcs from v
to u
. If there is only one arc, there is nothing to cache.
Either way, you can make the iterator behave strangely, by modifying the graph along the way. It shouldn't crash hard, this I think is important. Also the strange behavior should be somewhat foreseeable: If you add an arc (1,2)
along the way, it would be strange if (3,4)
appears twice now. It would be okay, if (1,2)
pops up again at some point.
If you delete an edge, I think it would even be fine, if it still appears in the iterator (even so it doesn't), but all the other edges should be yield correctly regardless.
comment:26 Changed 2 years ago by
Sure. What about naming it sorted
instead of unsorted (not in the backend), but for edges
, EdgesView
, edge_iterator
?
comment:27 Changed 2 years ago by
Well, edges
and EgesView
already have parameter sort
to sort or not the edges. The new parameter must be sufficiently different. I'm not sure of the best choice...
comment:30 Changed 2 years ago by
Commit: | 5e91e9976bac1db76f819bca9c3c14f704cdaa11 → 98269fd36424e58ddc76de2a204ad7ae668d6bb6 |
---|
Branch pushed to git repo; I updated commit sha1. New commits:
98269fd | expose `sort_vertices`
|
comment:31 Changed 2 years ago by
Description: | modified (diff) |
---|
comment:32 Changed 2 years ago by
Description: | modified (diff) |
---|
comment:33 Changed 2 years ago by
Reviewers: | → David Coudert |
---|---|
Status: | needs_review → positive_review |
LGTM.
Thanks a lot.
comment:35 Changed 2 years ago by
Status: | positive_review → needs_work |
---|
There is a slowdown in the case that we do not access all vertices:
Before:
sage: G = graphs.Grid2dGraph(100, 100) sage: V = list(G) sage: %timeit list(G.edge_iterator(V[:500])) 454 µs ± 266 ns per loop (mean ± std. dev. of 7 runs, 1000 loops each)
After:
sage: G = graphs.Grid2dGraph(100, 100) sage: V = list(G) sage: %timeit list(G.edge_iterator(V[:500])) 2.72 ms ± 1.4 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
comment:36 Changed 2 years ago by
Commit: | 98269fd36424e58ddc76de2a204ad7ae668d6bb6 → 737e0f8011cb417059d23202abbab9dea41051c8 |
---|
Branch pushed to git repo; I updated commit sha1. This was a forced push. Last 10 new commits:
59fb906 | remove duplications
|
4944027 | switch u and v in place
|
38c6031 | fix for directed multiedge graphs with labels
|
dbd17ef | use unsorted edges to speed up equality check
|
5f7b30d | reviewers suggestion
|
002ec5d | document that an iterator can be used to relabel edges
|
5deaa25 | fix wrong usage and document it
|
d7c27eb | make iterator saver
|
86b0ba6 | expose `sort_vertices`
|
737e0f8 | lookup in bitset is much faster
|
comment:37 Changed 2 years ago by
Status: | needs_work → needs_review |
---|
One simple commit:
Before this ticket:
sage: G = graphs.Grid2dGraph(100, 100) sage: V = list(G) sage: %timeit list(G.edge_iterator(V[:500])) 449 µs ± 588 ns per loop (mean ± std. dev. of 7 runs, 1000 loops each) sage: %timeit list(G.edge_iterator(V[:5000])) 4.97 ms ± 4.5 µs per loop (mean ± std. dev. of 7 runs, 100 loops each) sage: G = graphs.CompleteGraph(1000) sage: V = list(G) sage: %timeit list(G.edge_iterator(V[:100])) 20.1 ms ± 24.2 µs per loop (mean ± std. dev. of 7 runs, 10 loops each) sage: %timeit list(G.edge_iterator(V[:800])) 130 ms ± 1.45 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
With this ticket:
sage: %timeit list(G.edge_iterator(V[:500], sort_vertices=False)) 340 µs ± 1.4 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each) sage: G = graphs.Grid2dGraph(100, 100) sage: V = list(G) sage: %timeit list(G.edge_iterator(V[:500])) 377 µs ± 2.14 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each) sage: %timeit list(G.edge_iterator(V[:500], sort_vertices=False)) 338 µs ± 2.08 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each) sage: %timeit list(G.edge_iterator(V[:5000])) 4.3 ms ± 4.44 µs per loop (mean ± std. dev. of 7 runs, 100 loops each) sage: %timeit list(G.edge_iterator(V[:5000], sort_vertices=False)) 3.85 ms ± 2.35 µs per loop (mean ± std. dev. of 7 runs, 100 loops each) sage: G = graphs.CompleteGraph(1000) sage: V = list(G) sage: %timeit list(G.edge_iterator(V[:100], sort_vertices=False)) 14.8 ms ± 103 µs per loop (mean ± std. dev. of 7 runs, 100 loops each) sage: %timeit list(G.edge_iterator(V[:800])) 102 ms ± 67 µs per loop (mean ± std. dev. of 7 runs, 10 loops each) sage: %timeit list(G.edge_iterator(V[:800], sort_vertices=False)) 92.3 ms ± 1.61 ms per loop (mean ± std. dev. of 7 runs, 10 loops each) sage: %timeit list(G.edge_iterator()) # note that this is still slower, so we gain something by requesting only some vertices 110 ms ± 177 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
comment:38 Changed 2 years ago by
I have numerous failing doctests like:
********************************************************************** File "src/sage/graphs/independent_sets.pyx", line 164, in sage.graphs.independent_sets.IndependentSets.__init__ Failed example: for i in range(5): check_with_subgraph_search(graphs.RandomGNP(11, .3)) Exception raised: Traceback (most recent call last): File "/Users/dcoudert/sage/local/lib/python3.8/site-packages/sage/doctest/forker.py", line 720, in _run self.compile_and_execute(example, compiler, test.globs) File "/Users/dcoudert/sage/local/lib/python3.8/site-packages/sage/doctest/forker.py", line 1145, in compile_and_execute exec(compiled, globs) File "<doctest sage.graphs.independent_sets.IndependentSets.__init__[8]>", line 2, in <module> check_with_subgraph_search(graphs.RandomGNP(Integer(11), RealNumber('.3'))) File "<doctest sage.graphs.independent_sets.IndependentSets.__init__[7]>", line 3, in check_with_subgraph_search if not all(G.subgraph(l).is_independent_set() for l in IS): File "<doctest sage.graphs.independent_sets.IndependentSets.__init__[7]>", line 3, in <genexpr> if not all(G.subgraph(l).is_independent_set() for l in IS): File "/Users/dcoudert/sage/local/lib/python3.8/site-packages/sage/graphs/generic_graph.py", line 12589, in subgraph return self._subgraph_by_adding(vertices=vertices, edges=edges, File "/Users/dcoudert/sage/local/lib/python3.8/site-packages/sage/graphs/generic_graph.py", line 12724, in _subgraph_by_adding edges_to_keep = [e for e in self.edges(vertices=vertices, sort=False) File "/Users/dcoudert/sage/local/lib/python3.8/site-packages/sage/graphs/generic_graph.py", line 12724, in <listcomp> edges_to_keep = [e for e in self.edges(vertices=vertices, sort=False) File "sage/graphs/views.pyx", line 438, in __iter__ (build/cythonized/sage/graphs/views.c:3991) yield from self._iter_unsorted(vertices) File "sage/graphs/views.pyx", line 410, in _iter_unsorted (build/cythonized/sage/graphs/views.c:3607) yield from self._graph._backend.iterator_edges(vertices, self._labels) File "sage/graphs/base/c_graph.pyx", line 3683, in _iterator_edges (build/cythonized/sage/graphs/base/c_graph.cpp:25263) b_vertices = FrozenBitset(foo for foo in b_vertices_2 if foo >= 0) File "sage/data_structures/bitset.pyx", line 398, in sage.data_structures.bitset.FrozenBitset.__init__ (build/cythonized/sage/data_structures/bitset.c:2753) raise ValueError("Bitsets must not be empty") ValueError: Bitsets must not be empty ********************************************************************** File "src/sage/graphs/views.pyx", line 209, in sage.graphs.views.EdgesView Failed example: E = EdgesView(G, vertices=[0, 3], labels=False, sort=True); E Expected: [] Got: <repr(<sage.graphs.views.EdgesView at 0x15bfdef40>) failed: ValueError: Bitsets must not be empty>
comment:39 Changed 2 years ago by
Commit: | 737e0f8011cb417059d23202abbab9dea41051c8 → 07b7a8ea5d44447cd8b66d00d7a4485b2dd4b8ec |
---|
Branch pushed to git repo; I updated commit sha1. New commits:
07b7a8e | bitset must not be empty
|
comment:40 Changed 2 years ago by
comment:41 Changed 2 years ago by
It's much better like that.
We have the case len(vertices) == 1
. Why not adding the case not vertices
(empty).
comment:42 Changed 2 years ago by
Commit: | 07b7a8ea5d44447cd8b66d00d7a4485b2dd4b8ec → 957b4cc9eae331af65b64e8e83fa49d35403f4f7 |
---|
Branch pushed to git repo; I updated commit sha1. New commits:
957b4cc | check if vertices are empty
|
comment:43 Changed 2 years ago by
Commit: | 957b4cc9eae331af65b64e8e83fa49d35403f4f7 → b7d739fc55cfedba11be5ab9868312281772a7cf |
---|
Branch pushed to git repo; I updated commit sha1. New commits:
b7d739f | check for bitset capacity
|
comment:45 Changed 2 years ago by
Status: | needs_review → positive_review |
---|
LGTM and better than previous version. Thank you.
comment:46 Changed 2 years ago by
Commit: | b7d739fc55cfedba11be5ab9868312281772a7cf → 8beb57302c92d17f4b703e53a1170aa0acd80920 |
---|---|
Status: | positive_review → needs_review |
Branch pushed to git repo; I updated commit sha1 and set ticket back to needs_review. New commits:
8beb573 | bitset capacity is +1 to last index
|
comment:49 Changed 2 years ago by
Milestone: | sage-9.2 → sage-9.3 |
---|
comment:50 Changed 2 years ago by
Branch: | u/gh-kliem/graphs_improve_edge_iterator → 8beb57302c92d17f4b703e53a1170aa0acd80920 |
---|---|
Resolution: | → fixed |
Status: | positive_review → closed |
Branch pushed to git repo; I updated commit sha1. Last 10 new commits:
implement next_in_neighbor_unsafe for dense graph and use it
copyright edit
document new methods
fit design of dense grpah to sparse graph
implement `out_neighbors_unsafe` and `in_neighbors_unsafe` for CGraphs
somehow working version
let next_neighbor set the arc
somehow working version
fix incorrect doctests; iterator_edges is only supposed to be called in the undirected case
clean up