Opened 2 years ago
Closed 2 years ago
#30665 closed enhancement (fixed)
Optimize edge iterator for graphs
Reported by:  ghkliem  Owned by:  

Priority:  major  Milestone:  sage9.3 
Component:  graph theory  Keywords:  edge iterator, graph 
Cc:  Merged in:  
Authors:  Jonathan Kliem  Reviewers:  David Coudert 
Report Upstream:  N/A  Work issues:  
Branch:  8beb573 (Commits, GitHub, GitLab)  Commit:  8beb57302c92d17f4b703e53a1170aa0acd80920 
Dependencies:  Stopgaps: 
Description (last modified by )
This ticket reorganizes and improves the iterator over edges in the graph backends.
Currently, the code is duplicated for dense and sparse graphs. Also the iterator works by creating successive lists of neighbors. Mostly, an iterator should not create lists of things.
In case of multiple edges, it might be desirable to create lists of all arcs from u
to v
. Otherwise we might risk segmentation faults, if messing with the graph while iterating through it.
We expose an unsorted edge iterator.
Before:
sage: G = Graph(loops=False, multiedges=False) ....: G.add_edges([(i, (i+85)%100) for i in range(100)]) ....: G.add_edges([(i, (i+37)%100) for i in range(100)]) ....: G.add_edges([(i, (i+83)%100) for i in range(100)]) sage: %timeit edges = list(G.edge_iterator()) 103 µs ± 10.4 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each) sage: H = G.copy() sage: %timeit H == G 268 µs ± 2.43 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each) sage: G = Graph(loops=False, multiedges=False, sparse=False) ....: G.add_edges([(i, (i+85)%100) for i in range(100)]) ....: G.add_edges([(i, (i+37)%100) for i in range(100)]) ....: G.add_edges([(i, (i+83)%100) for i in range(100)]) sage: %timeit edges = list(G.edge_iterator()) 228 µs ± 40.5 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each) sage: H = G.copy() sage: %timeit H == G 380 µs ± 2.07 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each) sage: G = Graph(loops=False, multiedges=True) ....: G.add_edges([(i, (i+85)%100, j) for j in range(20) for i in range(100)]) ....: G.add_edges([(i, (i+37)%100, j) for j in range(20) for i in range(100)]) ....: G.add_edges([(i, (i+83)%100, j) for j in range(20) for i in range(100)]) sage: %timeit edges = list(G.edge_iterator()) 2.22 ms ± 412 µs per loop (mean ± std. dev. of 7 runs, 100 loops each) sage: H = G.copy() sage: %timeit H == G 2.94 ms ± 51.1 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
After:
sage: G = Graph(loops=False, multiedges=False) ....: G.add_edges([(i, (i+85)%100) for i in range(100)]) ....: G.add_edges([(i, (i+37)%100) for i in range(100)]) ....: G.add_edges([(i, (i+83)%100) for i in range(100)]) sage: %timeit edges = list(G.edge_iterator()) 66.2 µs ± 693 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each) sage: %timeit edges = list(G.edge_iterator(sort_vertices=False)) 50.1 µs ± 404 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each) sage: H = G.copy() sage: %timeit H == G 244 µs ± 13.2 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each) sage: G = Graph(loops=False, multiedges=False, sparse=False) ....: G.add_edges([(i, (i+85)%100) for i in range(100)]) ....: G.add_edges([(i, (i+37)%100) for i in range(100)]) ....: G.add_edges([(i, (i+83)%100) for i in range(100)]) sage: %timeit edges = list(G.edge_iterator()) 125 µs ± 597 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each) sage: %timeit edges = list(G.edge_iterator(sort_vertices=False)) 112 µs ± 1.22 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each) sage: H = G.copy() sage: %timeit H == G 318 µs ± 44.4 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each) sage: G = Graph(loops=False, multiedges=True) ....: G.add_edges([(i, (i+85)%100, j) for j in range(20) for i in range(100)]) ....: G.add_edges([(i, (i+37)%100, j) for j in range(20) for i in range(100)]) ....: G.add_edges([(i, (i+83)%100, j) for j in range(20) for i in range(100)]) sage: %timeit edges = list(G.edge_iterator()) 1.05 ms ± 3.2 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each) sage: %timeit edges = list(G.edge_iterator(sort_vertices=False)) 1.02 ms ± 40.4 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each) sage: H = G.copy() sage: %timeit H == G 2.19 ms ± 28.5 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
Change History (50)
comment:1 Changed 2 years ago by
Branch:  → u/ghkliem/graphs_improve_edge_iterator 

comment:2 Changed 2 years ago by
Commit:  → 86fdd80d12e24f314b9fe291e7a20c033dcff3ac 

comment:3 followup: 5 Changed 2 years ago by
in method next_in_neighbor_unsafe
in dense_graph.pyx
, instead of for i in range(u, self.active_vertices.size)
, you could use bitset_next(self.active_vertices, u)
.
comment:4 Changed 2 years ago by
Commit:  86fdd80d12e24f314b9fe291e7a20c033dcff3ac → 90659358abd6cc5f1fad387cb085a0d5f351c4c3 

comment:5 Changed 2 years ago by
Replying to dcoudert:
in method
next_in_neighbor_unsafe
indense_graph.pyx
, instead offor i in range(u, self.active_vertices.size)
, you could usebitset_next(self.active_vertices, u)
.
I'm not sure I understand.
There are a bunch of bitsets (a total of self.active_vertices.size  u
) and I'm trying to figure out, if one of them contains v
. I don't see, how to apply bitset_next
in this situation.
dense graphs shouldn't reimplement bitsets for their edges, but that is a different story.
comment:6 Changed 2 years ago by
I'm just saying that we can do something like:
cdef size_t i = u while i != 1: if self.edges[place + i * self.num_longs] & word: return i i = bitset_next(self.active_vertices, i) return 1
I don't know if it is faster or not for dense graphs.
comment:8 Changed 2 years ago by
Btw, bitset_next(foo, i)
will give the next element including i
.
Contrary to that next_in_neighbor_unsafe(v, u, l)
will give the next element not including u
. The reason for this is that in sparse graphs the arcs are grouped into a couple of trees according to u & hash_mask
. So it is much easier to obtain the next arc in the same tree. So next_in_neighbor_unsafe
will sort the u
s not by total order but different.
I don't want CGraph
to know how this works, so this way CGraph
can just trust the specification to figure out, what would be the next neighbor.
comment:9 Changed 2 years ago by
For now I would leave static sparse as it is and just define iterator_unsorted_edges = iterator_edges
.
In order to specify next_in_neighbor_unsafe
for static sparse efficiently, one should add something like u_index
to the signature. I don't know if there is much need for it.
comment:10 Changed 2 years ago by
Commit:  90659358abd6cc5f1fad387cb085a0d5f351c4c3 → 1a85353a799f43804ade03a18afc9603c773abea 

comment:11 Changed 2 years ago by
Description:  modified (diff) 

comment:12 Changed 2 years ago by
Commit:  1a85353a799f43804ade03a18afc9603c773abea → 4920ec8452527d3cd73048af6a74976575f9c7c0 

Branch pushed to git repo; I updated commit sha1. New commits:
4920ec8  document that an iterator can be used to relabel edges

comment:13 Changed 2 years ago by
Code uses an edge iterator to relabel or vertices for example.
I documented that this is somewhat safe. This was one of my problem. In between I had segmentation faults because of this, because I was storing pointers to nodes, which were no longer valid.
comment:14 Changed 2 years ago by
Status:  new → needs_review 

I pretty much happy with it for now.
I even managed to reduce the overall number of lines in the code while optimizing.
There are some optimizations that were simply not necessary. The compiler will figure out case distinctions like label
pretty good itself. If not, it usually doesn't matter for performance. I guess there are some cases, were compilers are pretty bad, but here it works all right and there is no need to duplicate code.
comment:15 Changed 2 years ago by
This is nice.
You added unsorted
to edge_iterator
. You could also add it to edges
and to EdgesView
in views.pyx
. We can of course let that for another ticket. We will have to check in many methods if we can benefit from this new parameter or not. Reducing the number of places where we sort is an important objective.
comment:16 Changed 2 years ago by
Once this is in you may want to check whether this also solves other "edge"y tickets:
comment:17 followup: 19 Changed 2 years ago by
Patchbot reports doctest failures :
There is a problem in src/sage/combinat/cluster_algebra_quiver/mutation_class.py
around line 520, where one iterates over edges and modify the edges at the same time (direct access to the backend).
comment:18 Changed 2 years ago by
Commit:  4920ec8452527d3cd73048af6a74976575f9c7c0 → 5fc372298b06c774a61da17098b335e0eb9a018d 

Branch pushed to git repo; I updated commit sha1. New commits:
5fc3722  fix wrong usage and document it

comment:19 Changed 2 years ago by
Replying to chapoton:
Patchbot reports doctest failures :
There is a problem in
src/sage/combinat/cluster_algebra_quiver/mutation_class.py
around line 520, where one iterates over edges and modify the edges at the same time (direct access to the backend).
Thank you. That was an easy one. I think it is just wrong to modify the vertices and not specify vertices
. It is difficult to know, what a user wants at this point.
I fixed it and documented this.
It is also kind of wrong to modify the edges while iterating, but as long as the vertices are welldefined this works and its the users problem to deal with whether new edges will appear in this run or not (unless you modify the current edge, like relabeling or deleting, then things are welldefined).
comment:20 followup: 22 Changed 2 years ago by
It is generally unsafe to modify the graph while iterating over the edges. Si I'm not sure the proposed modification is sufficient to solve the issue
comment:21 Changed 2 years ago by
Commit:  5fc372298b06c774a61da17098b335e0eb9a018d → 5e91e9976bac1db76f819bca9c3c14f704cdaa11 

Branch pushed to git repo; I updated commit sha1. New commits:
5e91e99  make iterator saver

comment:22 Changed 2 years ago by
Replying to dcoudert:
It is generally unsafe to modify the graph while iterating over the edges. Si I'm not sure the proposed modification is sufficient to solve the issue
I agree that iterating over something while modifying it, is not a good idea usually. It is nice, if you can relabel edges or delete some of them, without getting a complete list.
I added a quick check, that v_int
is still and active vertex. This indeed was a big issue (potentially segmentation fault), but also the only issue I can think of (which doesn't mean that it is the only one). Here are my thoughts about it:
next_neighbor_unsafe
will just pick the next neighbor as long as v_int
is an active vertex. If there are multiple arcs, they will be cached in a list, before yielding anything. This way things should be safe.
However, if you delete an edge and then add one, it might happen that the meaning of v_int
changes. I think, this is the users problem. You shouldn't ask for all edges out of a vertex and then along the way delete it and assume that things are fine.
If you have multiple arcs and delete them after obtaining the first arc was yield by the iterator, this is the users problem. The other arcs will still be output, but things should be safe as long as you use safe methods.
If you add/delete/modify edges, things are fine otherwise. However, it might happen that a new edge will be yield by the iterator or not (according to internal order). If you use the edge iterator to relabel edges things are fine, as long as you do not have multiple edges. (Once you have seen an edge (1,2)
, you know that it will not be output according to internal order again.)
comment:23 Changed 2 years ago by
OK, we can let it like that for the moment. This patch has only exhibit an issue in that code that is now fixed. If something goes wrong, a specific bug will be raised by a user.
comment:24 Changed 2 years ago by
I know. It's not like this was very stable before. In my opinion the only difference is, that before the edge iterator cached all arcs from v
, yield them all and then went on to the next vertex.
Now, we only cache all arcs from v
to u
. If there is only one arc, there is nothing to cache.
Either way, you can make the iterator behave strangely, by modifying the graph along the way. It shouldn't crash hard, this I think is important. Also the strange behavior should be somewhat foreseeable: If you add an arc (1,2)
along the way, it would be strange if (3,4)
appears twice now. It would be okay, if (1,2)
pops up again at some point.
If you delete an edge, I think it would even be fine, if it still appears in the iterator (even so it doesn't), but all the other edges should be yield correctly regardless.
comment:26 Changed 2 years ago by
Sure. What about naming it sorted
instead of unsorted (not in the backend), but for edges
, EdgesView
, edge_iterator
?
comment:27 Changed 2 years ago by
Well, edges
and EgesView
already have parameter sort
to sort or not the edges. The new parameter must be sufficiently different. I'm not sure of the best choice...
comment:30 Changed 2 years ago by
Commit:  5e91e9976bac1db76f819bca9c3c14f704cdaa11 → 98269fd36424e58ddc76de2a204ad7ae668d6bb6 

Branch pushed to git repo; I updated commit sha1. New commits:
98269fd  expose `sort_vertices`

comment:31 Changed 2 years ago by
Description:  modified (diff) 

comment:32 Changed 2 years ago by
Description:  modified (diff) 

comment:33 Changed 2 years ago by
Reviewers:  → David Coudert 

Status:  needs_review → positive_review 
LGTM.
Thanks a lot.
comment:35 Changed 2 years ago by
Status:  positive_review → needs_work 

There is a slowdown in the case that we do not access all vertices:
Before:
sage: G = graphs.Grid2dGraph(100, 100) sage: V = list(G) sage: %timeit list(G.edge_iterator(V[:500])) 454 µs ± 266 ns per loop (mean ± std. dev. of 7 runs, 1000 loops each)
After:
sage: G = graphs.Grid2dGraph(100, 100) sage: V = list(G) sage: %timeit list(G.edge_iterator(V[:500])) 2.72 ms ± 1.4 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
comment:36 Changed 2 years ago by
Commit:  98269fd36424e58ddc76de2a204ad7ae668d6bb6 → 737e0f8011cb417059d23202abbab9dea41051c8 

Branch pushed to git repo; I updated commit sha1. This was a forced push. Last 10 new commits:
59fb906  remove duplications

4944027  switch u and v in place

38c6031  fix for directed multiedge graphs with labels

dbd17ef  use unsorted edges to speed up equality check

5f7b30d  reviewers suggestion

002ec5d  document that an iterator can be used to relabel edges

5deaa25  fix wrong usage and document it

d7c27eb  make iterator saver

86b0ba6  expose `sort_vertices`

737e0f8  lookup in bitset is much faster

comment:37 Changed 2 years ago by
Status:  needs_work → needs_review 

One simple commit:
Before this ticket:
sage: G = graphs.Grid2dGraph(100, 100) sage: V = list(G) sage: %timeit list(G.edge_iterator(V[:500])) 449 µs ± 588 ns per loop (mean ± std. dev. of 7 runs, 1000 loops each) sage: %timeit list(G.edge_iterator(V[:5000])) 4.97 ms ± 4.5 µs per loop (mean ± std. dev. of 7 runs, 100 loops each) sage: G = graphs.CompleteGraph(1000) sage: V = list(G) sage: %timeit list(G.edge_iterator(V[:100])) 20.1 ms ± 24.2 µs per loop (mean ± std. dev. of 7 runs, 10 loops each) sage: %timeit list(G.edge_iterator(V[:800])) 130 ms ± 1.45 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
With this ticket:
sage: %timeit list(G.edge_iterator(V[:500], sort_vertices=False)) 340 µs ± 1.4 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each) sage: G = graphs.Grid2dGraph(100, 100) sage: V = list(G) sage: %timeit list(G.edge_iterator(V[:500])) 377 µs ± 2.14 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each) sage: %timeit list(G.edge_iterator(V[:500], sort_vertices=False)) 338 µs ± 2.08 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each) sage: %timeit list(G.edge_iterator(V[:5000])) 4.3 ms ± 4.44 µs per loop (mean ± std. dev. of 7 runs, 100 loops each) sage: %timeit list(G.edge_iterator(V[:5000], sort_vertices=False)) 3.85 ms ± 2.35 µs per loop (mean ± std. dev. of 7 runs, 100 loops each) sage: G = graphs.CompleteGraph(1000) sage: V = list(G) sage: %timeit list(G.edge_iterator(V[:100], sort_vertices=False)) 14.8 ms ± 103 µs per loop (mean ± std. dev. of 7 runs, 100 loops each) sage: %timeit list(G.edge_iterator(V[:800])) 102 ms ± 67 µs per loop (mean ± std. dev. of 7 runs, 10 loops each) sage: %timeit list(G.edge_iterator(V[:800], sort_vertices=False)) 92.3 ms ± 1.61 ms per loop (mean ± std. dev. of 7 runs, 10 loops each) sage: %timeit list(G.edge_iterator()) # note that this is still slower, so we gain something by requesting only some vertices 110 ms ± 177 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
comment:38 Changed 2 years ago by
I have numerous failing doctests like:
********************************************************************** File "src/sage/graphs/independent_sets.pyx", line 164, in sage.graphs.independent_sets.IndependentSets.__init__ Failed example: for i in range(5): check_with_subgraph_search(graphs.RandomGNP(11, .3)) Exception raised: Traceback (most recent call last): File "/Users/dcoudert/sage/local/lib/python3.8/sitepackages/sage/doctest/forker.py", line 720, in _run self.compile_and_execute(example, compiler, test.globs) File "/Users/dcoudert/sage/local/lib/python3.8/sitepackages/sage/doctest/forker.py", line 1145, in compile_and_execute exec(compiled, globs) File "<doctest sage.graphs.independent_sets.IndependentSets.__init__[8]>", line 2, in <module> check_with_subgraph_search(graphs.RandomGNP(Integer(11), RealNumber('.3'))) File "<doctest sage.graphs.independent_sets.IndependentSets.__init__[7]>", line 3, in check_with_subgraph_search if not all(G.subgraph(l).is_independent_set() for l in IS): File "<doctest sage.graphs.independent_sets.IndependentSets.__init__[7]>", line 3, in <genexpr> if not all(G.subgraph(l).is_independent_set() for l in IS): File "/Users/dcoudert/sage/local/lib/python3.8/sitepackages/sage/graphs/generic_graph.py", line 12589, in subgraph return self._subgraph_by_adding(vertices=vertices, edges=edges, File "/Users/dcoudert/sage/local/lib/python3.8/sitepackages/sage/graphs/generic_graph.py", line 12724, in _subgraph_by_adding edges_to_keep = [e for e in self.edges(vertices=vertices, sort=False) File "/Users/dcoudert/sage/local/lib/python3.8/sitepackages/sage/graphs/generic_graph.py", line 12724, in <listcomp> edges_to_keep = [e for e in self.edges(vertices=vertices, sort=False) File "sage/graphs/views.pyx", line 438, in __iter__ (build/cythonized/sage/graphs/views.c:3991) yield from self._iter_unsorted(vertices) File "sage/graphs/views.pyx", line 410, in _iter_unsorted (build/cythonized/sage/graphs/views.c:3607) yield from self._graph._backend.iterator_edges(vertices, self._labels) File "sage/graphs/base/c_graph.pyx", line 3683, in _iterator_edges (build/cythonized/sage/graphs/base/c_graph.cpp:25263) b_vertices = FrozenBitset(foo for foo in b_vertices_2 if foo >= 0) File "sage/data_structures/bitset.pyx", line 398, in sage.data_structures.bitset.FrozenBitset.__init__ (build/cythonized/sage/data_structures/bitset.c:2753) raise ValueError("Bitsets must not be empty") ValueError: Bitsets must not be empty ********************************************************************** File "src/sage/graphs/views.pyx", line 209, in sage.graphs.views.EdgesView Failed example: E = EdgesView(G, vertices=[0, 3], labels=False, sort=True); E Expected: [] Got: <repr(<sage.graphs.views.EdgesView at 0x15bfdef40>) failed: ValueError: Bitsets must not be empty>
comment:39 Changed 2 years ago by
Commit:  737e0f8011cb417059d23202abbab9dea41051c8 → 07b7a8ea5d44447cd8b66d00d7a4485b2dd4b8ec 

Branch pushed to git repo; I updated commit sha1. New commits:
07b7a8e  bitset must not be empty

comment:40 Changed 2 years ago by
comment:41 Changed 2 years ago by
It's much better like that.
We have the case len(vertices) == 1
. Why not adding the case not vertices
(empty).
comment:42 Changed 2 years ago by
Commit:  07b7a8ea5d44447cd8b66d00d7a4485b2dd4b8ec → 957b4cc9eae331af65b64e8e83fa49d35403f4f7 

Branch pushed to git repo; I updated commit sha1. New commits:
957b4cc  check if vertices are empty

comment:43 Changed 2 years ago by
Commit:  957b4cc9eae331af65b64e8e83fa49d35403f4f7 → b7d739fc55cfedba11be5ab9868312281772a7cf 

Branch pushed to git repo; I updated commit sha1. New commits:
b7d739f  check for bitset capacity

comment:45 Changed 2 years ago by
Status:  needs_review → positive_review 

LGTM and better than previous version. Thank you.
comment:46 Changed 2 years ago by
Commit:  b7d739fc55cfedba11be5ab9868312281772a7cf → 8beb57302c92d17f4b703e53a1170aa0acd80920 

Status:  positive_review → needs_review 
Branch pushed to git repo; I updated commit sha1 and set ticket back to needs_review. New commits:
8beb573  bitset capacity is +1 to last index

comment:49 Changed 2 years ago by
Milestone:  sage9.2 → sage9.3 

comment:50 Changed 2 years ago by
Branch:  u/ghkliem/graphs_improve_edge_iterator → 8beb57302c92d17f4b703e53a1170aa0acd80920 

Resolution:  → fixed 
Status:  positive_review → closed 
Branch pushed to git repo; I updated commit sha1. Last 10 new commits:
implement next_in_neighbor_unsafe for dense graph and use it
copyright edit
document new methods
fit design of dense grpah to sparse graph
implement `out_neighbors_unsafe` and `in_neighbors_unsafe` for CGraphs
somehow working version
let next_neighbor set the arc
somehow working version
fix incorrect doctests; iterator_edges is only supposed to be called in the undirected case
clean up