Opened 9 months ago
Closed 8 months ago
#27570 closed task (fixed)
Minimal weight cycle basis Implementation
Reported by: | gh-rajat1433 | Owned by: | gh-rajat1433 |
---|---|---|---|
Priority: | major | Milestone: | sage-8.8 |
Component: | graph theory | Keywords: | cycle_basis, shortest_paths |
Cc: | dcoudert | Merged in: | |
Authors: | Rajat Mittal | Reviewers: | David Coudert |
Report Upstream: | N/A | Work issues: | |
Branch: | 66fe52d (Commits) | Commit: | 66fe52d1c0c21b2cf4d02e8085a439dbe54cb6e8 |
Dependencies: | Stopgaps: |
Description
Minimal weight cycle basis is a cycle basis for which the total weight of all the cycles is minimum.
Its applications are as follows:
- The minimum weight cycle basis of a nearest neighbor graph of points sampled from a three-dimensional surface can be used to obtain a reconstruction of the surface.
- In cheminformatics, the minimal cycle basis of a molecular graph is referred to as the Smallest Set of Smallest Rings (SSSR).
This ticket will aim at implementing this algorithm.
Change History (48)
comment:1 Changed 9 months ago by
comment:2 Changed 9 months ago by
- Branch set to u/gh-rajat1433/27570_min_weight_cycle_basis
- Owner changed from (none) to gh-rajat1433
comment:3 Changed 8 months ago by
I had a doubt regarding what should we add in examples when getting deprecation warning from external package. like below:
sage: g.minimum_cycle_basis(by_weight=True, algorithm="NetworkX") DeprecationWarning: connected_component_subgraphs is deprecated and will be removedin 2.2. Use (G.subgraph(c).copy() for c in connected_components(G)) [[1, 2, 3, 4], [1, 2, 3], [5, 6, 7]]
should we skip these examples in examples section?
comment:4 Changed 8 months ago by
Skip this example. This is an upstream issue as we use networkx 2.2.
comment:5 Changed 8 months ago by
- Commit set to 464c00c8b9caaba024f8c4587ed6586a0fe0b959
comment:6 Changed 8 months ago by
I have added minimum cycle basis cython implementation and for python implementation used networkx's min_cycle_basis method. The algorithm's complexity is O(m2)n and cython implementation is quite faster than NetwrokX's python as I have tested.
comment:7 Changed 8 months ago by
- Commit changed from 464c00c8b9caaba024f8c4587ed6586a0fe0b959 to 5512e26447bceb185cefe12a22893c6c31c5662c
Branch pushed to git repo; I updated commit sha1. New commits:
5512e26 | remove xtra spaces
|
comment:8 Changed 8 months ago by
- Status changed from new to needs_review
comment:9 Changed 8 months ago by
- Commit changed from 5512e26447bceb185cefe12a22893c6c31c5662c to f37b331169586323739a203dfc4427bbaaaadfe9
Branch pushed to git repo; I updated commit sha1. New commits:
f37b331 | removed unnecessary import
|
comment:10 follow-ups: ↓ 11 ↓ 12 Changed 8 months ago by
A first round of comments.
min_cycle_basis
:
- remove input parameter "algorithm"
- in many methods, we are using a mapping from integers to vertices. This is simply
cdef list int_to_vertex = list(G)
. The reverse mapping is thencdef dict vertex_to_int = {u: i for i, u in enumerate(int_to_vertex)}
.
- To simplify and speed up your code, you should either make a copy of
self
in which vertices are relabeled in0..n-1
, or at least relabel the edges inedgelist
using integers. The only step were you need the real vertex ids is before returning the result.
- You could work with edges as frozenset directly
- don't import
from sage.graphs.base.boost_graph import johnson_shortest_paths
inside a loop
minimum_cycle_basis
:
- I don't see the need for creating a new method here. You could add parameters to
cycle_basis
to make it return a minimum cycle basis when the user asked for it ?
comment:11 in reply to: ↑ 10 Changed 8 months ago by
Replying to dcoudert:
- in many methods, we are using a mapping from integers to vertices. This is simply
cdef list int_to_vertex = list(G)
. The reverse mapping is thencdef dict vertex_to_int = {u: i for i, u in enumerate(int_to_vertex)}
.
It seems SparseGraphBackend? object is not iterable so we cant do list(self)
[UPDATE] So I did this
cdef list int_to_vertex = list(self.iterator_verts(None)) cdef dict vertex_to_int = {u: i for i, u in enumerate(int_to_vertex)}
/home/rajat/new_version/sage-8.7.beta6/local/lib/python2.7/site-packages/sage/graphs/base/c_graph.pyx in sage.graphs.base.c_graph.CGraphBackend.min_cycle_basis (build/cythonized/sage/graphs/base/c_graph.cpp:21478)() 2457 l = len(edges_complement) 2458 cdef list orth_set = [set([e]) for e in edges_complement] -> 2459 cdef list int_to_vertex = list(self) 2460 cdef dict vertex_to_int = {u: i for i, u in enumerate(int_to_vertex)} 2461 if not by_weight: TypeError: 'sage.graphs.base.sparse_graph.SparseGraphBackend' object is not iterable
- You could work with edges as frozenset directly
Can you elaborate on it? I didn't understand.
comment:12 in reply to: ↑ 10 Changed 8 months ago by
Replying to dcoudert:
- To simplify and speed up your code, you should either make a copy of
self
in which vertices are relabeled in0..n-1
, or at least relabel the edges inedgelist
using integers. The only step were you need the real vertex ids is before returning the result.
I think this can be done and will improve dictionary look up time.
comment:13 Changed 8 months ago by
The parameters of cycle_basis and minimum_cycle_basis are quite different as in cyclebasis we just specify the output as vertices or edges but in min_cycle_basis we have algorithm parameter and weight parameters too. If they are merged it will mess a bit.(just my opinion). But if it seems better than I can do it.
comment:14 Changed 8 months ago by
- Commit changed from f37b331169586323739a203dfc4427bbaaaadfe9 to cc4ba48619594fd125413296e2d7d1413cc0376d
Branch pushed to git repo; I updated commit sha1. New commits:
cc4ba48 | improved the code
|
comment:15 Changed 8 months ago by
I agree with your arguments.
In your code, you code document a bit the type of graph you build, and so why you use n+j
or n + vidx
.
I'm a bit worried about the following lines because I don't know what you expect
+ new_cycle = {frozenset((int_to_vertex[u], int_to_vertex[v])) for u, v in edges} + cycle_basis.append(list(set().union(*new_cycle)))
With such operation, you loose the order of the vertices, so I don't see how you get a cycle.
I think the following is better, but it is only because I'm not sure how list manipulation is really done for sure instructions.
- orth_set[i + 1:] = [o ^ base if len(o & new_cycle) % 2 else o for o in orth_set[i + 1:]] + for j in range(i + 1, len(orth_set)): + orth_set[j] = o ^ base if len(o & new_cycle) % 2 else o
also
+ edges_c = [frozenset(e) for e in comp.edges(labels=False) if e not in edges_s]
+ edges_c = [frozenset(e) for e in comp.edge_iterator(labels=False) if e not in edges_s]
What if the input graph has loops ? If not accepted, then use self._scream_if_not_simple()
.
comment:16 Changed 8 months ago by
I forgot to tell/add that our algorithm returns only vertices in the cycle not necessarily in the order of cycle. This is a kind of limitation of the algorithm as given in https://link.springer.com/article/10.1007/s00453-007-9064-z. But the time complexity we get is good. I have added the following text now:
A list of cycle lists is returned. Each cycle list is a list of vertices which forms a cycle in G. Note that the vertices are not necessarily returned in the order by which they appear in the cycle
So now
+ new_cycle = {frozenset((int_to_vertex[u], int_to_vertex[v])) for u, v in edges} + cycle_basis.append(list(set().union(*new_cycle)))
this should not be our worry :) as the vertices are returned here in sorted order.
comment:17 Changed 8 months ago by
- Commit changed from cc4ba48619594fd125413296e2d7d1413cc0376d to 7038d4283e2a96696e20fc0864f308c2fc7fd7dc
Branch pushed to git repo; I updated commit sha1. New commits:
7038d42 | improved code and doc
|
comment:18 Changed 8 months ago by
min_cycle_basis
- A list of cycle lists is returned. Each cycle list is a list of vertices - which forms a cycle in G. Note that the vertices are not necessarily - returned in the order by which they appear in the cycle + A cycle basis is a list of cycles (list of vertices forming a cycle) of + `G`. Note that the vertices are not necessarily returned in the order + in which they appear in the cycle.
- Minimum weight cycle basis is the cycle basis for which the total weight - (length for unweighted graphs) of all the cycles is minimum. + A minimum weight cycle basis is a cycle basis that minimizes the sum of + the weights (length for unweighted graphs) of its cycles.
- - ``edges_complement`` -- list (default: ``None``); a list of edges - present in the ``self`` but not in present in a particular spanning - tree. + - ``edges_complement`` -- list (default: ``None``); list of edges of + ``self`` without the edges of a particular spanning tree.
- # Add 2 copies of each edge in self to T. Cross edge is added if - # edge is in orth otherwise in-plane edge is added + # For each edge in self, add 2 edges to G: "cross" edges if + # edge is in orth otherwise "in-plane" edges
No need for list(..)
- for edge in list(zip(min_path_nodes[:-1], min_path_nodes[1:])): + for edge in zip(min_path_nodes[:-1], min_path_nodes[1:]):
Here an alternative is to use chain
. You need to add before the loop from itertools import chain
. I don't know if it's faster...
new_cycle = {frozenset((int_to_vertex[u], int_to_vertex[v])) for u, v in edges} - cycle_basis.append(list(set().union(*new_cycle))) + cycle_basis.append(list(set(chain(*new_cycle))))
minimum_cycle_basis
- same comments than above for improving the text
- don't use this as plain text
minimum_cycle_basis
in documentation.- * If ``algorithm = "NetworkX"``, a networkx implementation of the - minimum_cycle_basis algorithm is used - - * If ``algorithm = None``, then cython implementation of the - minimum_cycle_basis algorithm is used + * If ``algorithm = "NetworkX"``, use networkx implementation + + * If ``algorithm = None``, use Sage Cython implementation
- make
edge_s
a set to speed up construction ofedges_c
+ edges_s = [(a, b) for a, b, c in sp_edges] + edges_s = set((a, b) for a, b, c in sp_edges)
compliment
->complement
- In fact, this is better
- # compliment of the edges of the spanning tree with respect - # to self + # Edges of self that are not in the spanning tree
comment:19 Changed 8 months ago by
- Commit changed from 7038d4283e2a96696e20fc0864f308c2fc7fd7dc to 734a300d512e85b60d3acbd2f5702912bb9389c4
Branch pushed to git repo; I updated commit sha1. New commits:
734a300 | improved
|
comment:20 follow-up: ↓ 21 Changed 8 months ago by
I think union and chain works differently and in the present case union seems correct as it collects all the vertices of different edges. chain might result in repetition of vertices.
comment:21 in reply to: ↑ 20 Changed 8 months ago by
Replying to gh-rajat1433:
I think union and chain works differently and in the present case union seems correct as it collects all the vertices of different edges. chain might result in repetition of vertices.
no, the behavior is the same here, but as I wrote, I'm not sure it's better here, so let it as is.
comment:22 Changed 8 months ago by
Yes you are right I was misunderstanding the definition of chain.
sage: n = {(1,2),(3,4),(1,4),(5,6),(6,8),(6,4)} sage: %timeit list(set().union(*n)) The slowest run took 11.81 times longer than the fastest. This could mean that an intermediate result is being cached. 1000000 loops, best of 3: 1.53 µs per loop sage: %timeit list(set(chain(*n))) The slowest run took 6.35 times longer than the fastest. This could mean that an intermediate result is being cached. 1000000 loops, best of 3: 1.58 µs per loop
Although not much difference but union is slightly more better than chain here.
comment:23 Changed 8 months ago by
`self`
->``self``
(twice)
- what happen when
edges_complement
is None ?
- you should document the fact that in
edges_complement
, edges are pairs without labels, and in fact frozensets. In fact, it might be better to computeedges_complement
insidemin_cycle_basis
to avoid errors in the input.
- a possible improvement could be to do
cdef list edgelist = list(self.iterator_unsorted_edges(list(self.iterator_verts(None)), True)) cdef list edgelist_int = [(vertex_to_int[e[0]], vertex_to_int[e[1], weight_function(e)) for e in edgelist] cdef list orth_set = [set([(vertex_to_int[e[0]], vertex_to_int[e[1])]) for e in edges_complement]
this way, inside the main loops you work only with integers and precomputed weights. Finally, you reconstruct the correct cycle basis before returning it.
You can then do
for e in edgelist:
->for uidx, vidx, edge_w in edgelist_int:
Think about it.
comment:24 Changed 8 months ago by
- Commit changed from 734a300d512e85b60d3acbd2f5702912bb9389c4 to 66e9c46ef456314a0b0ae3dbf964ea52639af0fd
Branch pushed to git repo; I updated commit sha1. New commits:
66e9c46 | improved
|
comment:25 follow-up: ↓ 26 Changed 8 months ago by
cdef list orth_set = [set([(vertex_to_int[e[0]], vertex_to_int[e[1])]) for e in edges_complement]
this will throw an error as edges_complement being a frozenset can't be indexed.
what happen when edges_complement is None ?
It should return an empty list. I added it in this commit to avoid process further.
I tried doing
cdef list orth_set = [set([(vertex_to_int[e[0]], vertex_to_int[e[1])]) for e in edges_complement]
before converting to frozenset but got some unexpected errors. Will see if it can be added.
comment:26 in reply to: ↑ 25 ; follow-up: ↓ 28 Changed 8 months ago by
Replying to gh-rajat1433:
cdef list orth_set = [set([(vertex_to_int[e[0]], vertex_to_int[e[1])]) for e in edges_complement]this will throw an error as edges_complement being a frozenset can't be indexed.
Then
cdef list orth_set = [set([(vertex_to_int[u], vertex_to_int[v])]) for u, v in edges_complement]
comment:27 Changed 8 months ago by
- Commit changed from 66e9c46ef456314a0b0ae3dbf964ea52639af0fd to 09246cbd3dd71b2a30aef045f6a13dcd109c6c23
Branch pushed to git repo; I updated commit sha1. New commits:
09246cb | merged with beta8.8.2
|
comment:28 in reply to: ↑ 26 Changed 8 months ago by
Replying to dcoudert:
Then
cdef list orth_set = [set([(vertex_to_int[u], vertex_to_int[v])]) for u, v in edges_complement]
We have
new_cycle = {frozenset((int_to_vertex[u], int_to_vertex[v])) for u, v in edges} cycle_basis.append(list(set().union(*new_cycle))) # updating orth_set so that i+1, i+2, ...th elements are orthogonal # to the newly found cycle base = orth_set[i] for j in range(i + 1, len(orth_set)): if len(**orth_set[j] & new_cycle**) % 2: orth_set[j] = orth_set[j] ^ base
In new_cycle we have original vertices so we can't convert vertex to int in orth_set before as they have to be bitwise and with new_cycle as indicated above.
comment:29 Changed 8 months ago by
consider this code
if not edges_complement: return [] if not by_weight: def weight_function(e): return 1 from sage.graphs.base.boost_graph import johnson_shortest_paths from sage.graphs.graph import Graph cdef int u_int, v_int, i, j cdef object u, v cdef list int_to_vertex = list(self.iterator_verts(None)) cdef dict vertex_to_int = {u: u_int for u_int, u in enumerate(int_to_vertex)} cdef list edgelist = [(vertex_to_int[e[0]], vertex_to_int[e[1]], weight_function(e)) for e in self.iterator_unsorted_edges(int_to_vertex, True)] edges_complement = [frozenset((vertex_to_int[u], vertex_to_int[v])) for u, v in edges_complement] cdef int l = len(edges_complement) cdef list orth_set = [set([e]) for e in edges_complement] cdef int n = self.num_verts() cdef list min_path_nodes cdef list min_path cdef dict all_pair_shortest_pathlens cdef dict cross_paths_lens cdef list cycle_basis = [] cdef set base for i in range(l): base = orth_set[i] G = Graph(weighted=True) # For each edge in self, add 2 edges to G: "cross" edges if edge is # in base, otherwise "in-plane" edges for u_int, v_int, edge_w in edgelist: # mapping the nodes in self from 0 to n-1 if frozenset((u_int, v_int)) in base: G.add_edge(u_int, n + v_int, edge_w) G.add_edge(n + u_int, v_int, edge_w) else: G.add_edge(u_int, v_int, edge_w) G.add_edge(n + u_int, n + v_int, edge_w) all_pair_shortest_pathlens = johnson_shortest_paths(G) cross_paths_lens = {j: all_pair_shortest_pathlens[j][n + j] for j in range(n)} u_int = min(cross_paths_lens, key=cross_paths_lens.get) v_int = n + u_int min_path = G._backend.bidirectional_dijkstra(u_int, v_int, distance_flag=False) # Mapping the nodes in G to nodes in self min_path_nodes = [u_int if u_int < n else u_int - n for u_int in min_path] # removal of edges occuring even number of times edges = set() for edge in zip(min_path_nodes[:-1], min_path_nodes[1:]): edges ^= {edge} new_cycle = {frozenset(e) for e in edges} cycle_basis.append([int_to_vertex[u_int] for u_int in set().union(*new_cycle)]) # updating orth_set so that i+1, i+2, ...th elements are orthogonal # to the newly found cycle for j in range(i + 1, l): if len(orth_set[j] & new_cycle) % 2: orth_set[j] = orth_set[j] ^ base return cycle_basis
Now, is c_graph.pyx
the best place for this code ? Somehow, instead of building a Graph
, you could directly build a boost graph and call corresponding methods on it. Plus, with #27518, we will be able to get the predecessor matrix, thus avoiding an extra call to Dijkstra.
comment:30 Changed 8 months ago by
- Now, is c_graph.pyx the best place for this code ? Somehow, instead of building a Graph, you could directly build a boost graph and call corresponding methods on it. Plus, with #27518, we will be able to get the predecessor matrix, thus avoiding an extra call to Dijkstra.
But when we call any method in boost graph it takes in sage graph and converts it into a boost graph like in Johnson's algorithm for all_paths. So if we want to use directly boost graph we may have to write a new function/ modify the existing function in boost_graph.pyx for using the algorithm of boost graph on boost graph directly without expecting a sage graph. If you have any other idea let me know.
comment:31 Changed 8 months ago by
- Commit changed from 09246cbd3dd71b2a30aef045f6a13dcd109c6c23 to 31457ee71044481c7c647e8bef5ff0a129647707
Branch pushed to git repo; I updated commit sha1. New commits:
31457ee | improved the code
|
comment:32 Changed 8 months ago by
- Branch changed from u/gh-rajat1433/27570_min_weight_cycle_basis to public/graphs/27570_min_cycle_basis
- Commit changed from 31457ee71044481c7c647e8bef5ff0a129647707 to 8d9c34e42ec0970b67b8f6bf09f39a55dbfdecd3
New branch with an implementation using boost. The main advantage is to avoid conversion between sage graph and boost graph as we create directly the boost graph.
Please try it, test it, etc. to check if it's working correctly and also if it is better than previous version.
New commits:
fd99eb0 | trac #27570: Merged with 8.8.beta3
|
8d9c34e | trac #27570: min_cycle_basis with boost
|
comment:33 Changed 8 months ago by
Since the branch is in public/
, you can modify it.
comment:34 Changed 8 months ago by
Thank You @dcoudert for helping me with the boost graph code. I have tested it on my system and its timing is improved as well as it is passing all the tests. So its certainly better and appropriate then the previous version.
For me it is good to go.
comment:35 Changed 8 months ago by
- Commit changed from 8d9c34e42ec0970b67b8f6bf09f39a55dbfdecd3 to 805ef5c10252df220b1b08ccceaa469a8a9d207a
Branch pushed to git repo; I updated commit sha1. New commits:
805ef5c | added min_cycle_basis
|
comment:36 Changed 8 months ago by
- Commit changed from 805ef5c10252df220b1b08ccceaa469a8a9d207a to 3e717c3e3217a8865283fcb1ff7675c9f0369d1f
Branch pushed to git repo; I updated commit sha1. New commits:
3e717c3 | trac #27570: reviewer improvement
|
comment:37 follow-up: ↓ 38 Changed 8 months ago by
I added some examples with networkx plus a few corrections.
In min_cycle_basis
, can we avoid input parameter edges_complement
and so compute a spanning tree inside the method. Is it important to have this input parameter ?
comment:38 in reply to: ↑ 37 Changed 8 months ago by
Replying to dcoudert:
I added some examples with networkx plus a few corrections.
In
min_cycle_basis
, can we avoid input parameteredges_complement
and so compute a spanning tree inside the method. Is it important to have this input parameter ?
Since we now have sage graph inside min_cycle_basis , edges_complement can now be computed inside our method , no need to pass it as a parameter. I will do the required change.
comment:39 Changed 8 months ago by
- Commit changed from 3e717c3e3217a8865283fcb1ff7675c9f0369d1f to ca55582d2a6f320d856090657b6ea1ded9c2b258
comment:40 Changed 8 months ago by
I have removed the edges_complement parameter. Note that I have added
+cpdef w_f(e): + return 1
}}} in boost_graph as required to get an unweighted spanning tree as cython doesn't allow lambda expressions. Couldn't find a better way to do it. If you have any other way in mind let me know.
comment:41 Changed 8 months ago by
Since any spanning tree is valid, you can simply avoid adding the weight function. So we don't need to define w_f
.
comment:42 Changed 8 months ago by
- Commit changed from ca55582d2a6f320d856090657b6ea1ded9c2b258 to 4ca454e30ce50c3f3049f26d69266943264a3d44
Branch pushed to git repo; I updated commit sha1. New commits:
4ca454e | removed w_f
|
comment:43 Changed 8 months ago by
I just tried the the patch with Python3 and we have some issues to fix. This is mostly ordering issues as for instance the order of the keys of a dictionary is not always the same in Python 2 and Python 3.
For instance, I get:
File "src/sage/graphs/generic_graph.py", line 4679, in sage.graphs.generic_graph.GenericGraph.minimum_cycle_basis Failed example: g.minimum_cycle_basis(by_weight=True) Expected: [[1, 2, 3], [1, 2, 3, 4], [5, 6, 7]] Got: [[1, 2, 3, 4], [1, 2, 3], [5, 6, 7]] ********************************************************************** File "src/sage/graphs/generic_graph.py", line 4681, in sage.graphs.generic_graph.GenericGraph.minimum_cycle_basis Failed example: g.minimum_cycle_basis(by_weight=False) Expected: [[1, 2, 3], [1, 3, 4], [5, 6, 7]] Got: [[1, 3, 4], [1, 2, 3], [5, 6, 7]]
So, I suggest to use sorted(g.minimum_cycle_basis(...))
in all examples, and possibly
sorted(sorted(c) for c in g.minimum_cycle_basis(...))
. This last one might be too much, but...
comment:44 Changed 8 months ago by
- Commit changed from 4ca454e30ce50c3f3049f26d69266943264a3d44 to 66fe52d1c0c21b2cf4d02e8085a439dbe54cb6e8
Branch pushed to git repo; I updated commit sha1. New commits:
66fe52d | sorted examples
|
comment:45 Changed 8 months ago by
In our examples we do not have disconnected components except in 1 in which sorted and unsorted are same, so sorted(g.minimum_cycle_basis(...)) will work
comment:46 Changed 8 months ago by
This has nothing to do with being connected or not. In all our example, we always get the vertices of a cycle sorted by label, although this is not a requirement in the code.
Instead of [[1, 2, 3], [1, 3, 4], [5, 6, 7]]
, we could get [[1, 3, 2], [3, 4, 1], [5, 7, 6]]
. If we had that, then sorted(sorted(c)...
would be necessary.
We could certainly improve further the code, but this is already quite good.
comment:47 Changed 8 months ago by
- Reviewers set to David Coudert
- Status changed from needs_review to positive_review
LGTM.
comment:48 Changed 8 months ago by
- Branch changed from public/graphs/27570_min_cycle_basis to 66fe52d1c0c21b2cf4d02e8085a439dbe54cb6e8
- Resolution set to fixed
- Status changed from positive_review to closed
minimum weight cycle basis in a graph with m edges and n vertices can be found in polynomial time using https://link.springer.com/article/10.1007/s00453-007-9064-z.
https://arxiv.org/pdf/0912.1208.pdf
I am going through these algorithms before implementing it.
These work for undirected(weighted/non-weighted) graphs without multiedges.