Opened 2 years ago

Closed 22 months ago

#30188 closed enhancement (fixed)

Modify shortest_paths method in boost_graph.pyx to take list of vertices as input

Reported by: gh-vipul79321 Owned by:
Priority: major Milestone: sage-9.2
Component: graph theory Keywords: gsoc2020
Cc: dcoudert, slelievre Merged in:
Authors: Vipul Gupta Reviewers: Samuel Lelièvre, David Coudert
Report Upstream: N/A Work issues:
Branch: 97fef26 (Commits, GitHub, GitLab) Commit: 97fef26ecbd6d0f326f393199e96ab32bd2f73d4
Dependencies: Stopgaps:

Status badges

Description (last modified by slelievre)

This ticket aims to modify shortest_paths method in boost_graph.pyx to take list of vertices as input. This will reduce the current scenario of converting a Sage graph to a Boost graph and then computing shortest paths for each required vertex.

Change History (36)

comment:1 Changed 2 years ago by gh-vipul79321

  • Branch set to public/graphs/30188_shortest_paths

comment:2 Changed 2 years ago by git

  • Commit set to 467fbc70a08b552b3de33d9065204ee9cbfb02c7

Branch pushed to git repo; I updated commit sha1. New commits:

comment:3 Changed 2 years ago by git

  • Commit changed from 467fbc70a08b552b3de33d9065204ee9cbfb02c7 to cc924a3f7348e68f99f9e3e06fac373a8abd70e8

Branch pushed to git repo; I updated commit sha1. New commits:

cc924a3code added

comment:4 follow-up: Changed 2 years ago by slelievre

  • Cc slelievre added
  • Description modified (diff)
  • Reviewers set to Samuel Lelièvre

Please add your full name in the "Authors" field.

Please set to needs_review when ready for review.

Use one more f-string:

-            raise ValueError("the starting vertex " + str(v) + " is not in " +
-                             "the graph")
+            raise ValueError(f"the starting vertex {v} is not in the graph")

Maybe wrap long lines, for instance:

-                        raise RuntimeError("Dijkstra algorithm does not work with negative weights, use Bellman-Ford instead")
+                        raise RuntimeError("Dijkstra algorithm does not "
+                                           "work with negative weights, "
+                                           "use Bellman-Ford instead")

Maybe remove unnecessary parentheses:

-    return (distances, predecessors)
+    return distances, predecessors

comment:5 Changed 2 years ago by git

  • Commit changed from cc924a3f7348e68f99f9e3e06fac373a8abd70e8 to 0c86a049b96f1ddb4bd7a7897b54f38c93bb1db0

Branch pushed to git repo; I updated commit sha1. New commits:

0c86a04minor review commit

comment:6 in reply to: ↑ 4 Changed 2 years ago by gh-vipul79321

Replying to slelievre:

Please add your full name in the "Authors" field.

Please set to needs_review when ready for review.

Currently, I am working on documentation. Will update as soon as I will be done.

Vipul

comment:7 Changed 2 years ago by gh-vipul79321

I have one query regarding the documentation of shortest_path method in boost_graph.pyx. It mentions that

    Note that, if the graph is undirected, a negative edge automatically creates
    a negative cycle: for this reason, in this case, Dijkstra algorithm is
    always better.

I dont understand since there is already negative weight, then why to use dijkstra. And further, in such graphs shortest path computation may not be possible to negative cycle.

comment:8 follow-up: Changed 2 years ago by git

  • Commit changed from 0c86a049b96f1ddb4bd7a7897b54f38c93bb1db0 to 33e436d58f9c4a2c5345168e188401701d3cdf12

Branch pushed to git repo; I updated commit sha1. New commits:

33e436ddocumentation added

comment:9 in reply to: ↑ 8 Changed 2 years ago by gh-vipul79321

  • Authors set to Vipul Gupta
  • Status changed from new to needs_review

Replying to git:

Branch pushed to git repo; I updated commit sha1. New commits:

33e436ddocumentation added

I have added documentation.

One thing, I would like to mention is that when vertext_list contains only one vertex, I have returned two dicts, instead of two dicts of dict. I have done so to keep uniformity in output from our method and current shortest_path method. So that in future, it will be easy for use to replace shortest_path method.

comment:10 Changed 2 years ago by gh-vipul79321

Now after complete review, we just need to expose this method in shortest_path_all_pairs, centrality_closeness and all such methods in generic_graph.py

comment:11 Changed 2 years ago by dcoudert

  • Cc dcoudert added; docudert removed

comment:12 follow-up: Changed 2 years ago by dcoudert

  • Here you should first check if vertex_list is None
    +    if not isinstance(vertex_list, list):
    +        vertex_list = [vertex_list]
    +
    +    for v in vertex_list:
    +        if v not in g:
    +            raise ValueError(f"the starting vertex {v} is not in the graph")
    +
    +    if vertex_list == None:
    +        vertex_list = list(g)
    
  • A list is better here. It's faster to get a value from a list than from a dictionary
    -    cdef dict int_to_v = dict(enumerate(g))
    +    cdef list int_to_v = list(g)
    
  • Why this comment ?
    +    # Needed for rational curves.
    
  • define types of variables distances, predecessors, pred_v, etc.

What I don't like here is that the method returns dict of dicts. A least a matrix would be a better choice. If the graph is large, it's a huge waste of time and space :(

comment:13 in reply to: ↑ 12 Changed 2 years ago by gh-vipul79321

Replying to dcoudert:

What I don't like here is that the method returns dict of dicts. A least a matrix would be a better choice. If the graph is large, it's a huge waste of time and space :(

I was hoping maybe we could return dict instead of dict of dict. And in that dict key values will be vertex in vertex list. And item value will be a list or vector storing distances[v] in order of list(g). Meaning distances[v][i] will return shortest distance between v and list(g)[i].

Similarly for predecessor.

Is it ok?

P.S.- we can do similar thing with matrix. But i just feel that this will be nice way.

comment:14 Changed 2 years ago by git

  • Commit changed from 33e436d58f9c4a2c5345168e188401701d3cdf12 to a44ccbdfedc00aa8eed1b10a7fb23cfaf0d4231e

Branch pushed to git repo; I updated commit sha1. New commits:

f4011b5trac #30188: fix merge conflicts
a44ccbdtrac #30188: review commit

comment:15 Changed 2 years ago by dcoudert

rebased on last beta and some minor improvements.

comment:16 follow-up: Changed 2 years ago by git

  • Commit changed from a44ccbdfedc00aa8eed1b10a7fb23cfaf0d4231e to 8ae87450b2709a36a5b8a3b91ee4ef5437f55b31

Branch pushed to git repo; I updated commit sha1. New commits:

8ae8745made list of lists as return type

comment:17 in reply to: ↑ 16 Changed 2 years ago by gh-vipul79321

Replying to git:

Branch pushed to git repo; I updated commit sha1. New commits:

8ae8745made list of lists as return type

I have made the return type as list of lists. Now for i, v in enumerate(vertex_list): distances[i] and predecessors[i] will give us the shortest path information from v.

Although, one thing I would like to mention is following scenario -

Current output using list of lists
sage: G = Graph(5)
sage: shortest_paths_from_vertices(G,0)
([[0, +Infinity, +Infinity, +Infinity, +Infinity]],
 [[None, None, None, None, None]])

when returning dict of dict
sage: shortest_paths(G,0)
({0: 0}, {0: None})

Clearly, there is a need of extra memory in list of lists in case of disconnected graphs.

One bug that i encountered is:

sage: G = Graph([ (0,1,2), (1,2,0.7)], weighted = True)
sage: shortest_paths_from_vertices(G, 0)
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-12-1e6b63e11f81> in <module>()
----> 1 shortest_paths_from_vertices(G, Integer(0))

/home/vipul/sage/local/lib/python3.7/site-packages/sage/graphs/base/boost_graph.pyx in sage.graphs.base.boost_graph.shortest_paths_from_vertices (build/cythonized/sage/graphs/base/boost_graph.cpp:26092)()
   2101     return LB
   2102 
-> 2103 cpdef shortest_paths_from_vertices(g, vertex_list=None, weight_function=None, algorithm=None):
   2104     r"""
   2105     Compute the shortest paths to all vertices from each vertex in

/home/vipul/sage/local/lib/python3.7/site-packages/sage/graphs/base/boost_graph.pyx in sage.graphs.base.boost_graph.shortest_paths_from_vertices (build/cythonized/sage/graphs/base/boost_graph.cpp:25805)()
   2312         for vert in range(g.num_verts()):
   2313             if result.distances[vert] != sys.float_info.max:
-> 2314                 dist_v.append(correct_type(result.distances[vert]))
   2315                 pred = result.predecessors[vert]
   2316                 if pred != vert:

/home/vipul/sage/local/lib/python3.7/site-packages/sage/rings/integer.pyx in sage.rings.integer.Integer.__init__ (build/cythonized/sage/rings/integer.c:6091)()
    684                     mpz_set_pylong(self.value, n)
    685                 else:
--> 686                     raise TypeError("Cannot convert non-integral float to integer")
    687 
    688             elif isinstance(x, pari_gen):

TypeError: Cannot convert non-integral float to integer

I guess, we have to remove code of correct_type.

Best Vipul

comment:18 follow-up: Changed 2 years ago by git

  • Commit changed from 8ae87450b2709a36a5b8a3b91ee4ef5437f55b31 to c309ab42d962c6ff7017aa4533c97d6f765a3236

Branch pushed to git repo; I updated commit sha1. New commits:

c309ab4minor mistake

comment:19 in reply to: ↑ 18 Changed 2 years ago by gh-vipul79321

Replying to git:

Branch pushed to git repo; I updated commit sha1. New commits:

c309ab4minor mistake

Fixed a error, which i mistakenly introduced.

I havent updated documentation yet. I will update it, when we finalize the changes

comment:20 follow-up: Changed 2 years ago by dcoudert

Deciding of the correct type to use is not easy. May be it's easier to not try to correct the type, no ?

comment:21 in reply to: ↑ 20 Changed 2 years ago by gh-vipul79321

Replying to dcoudert:

Deciding of the correct type to use is not easy. May be it's easier to not try to correct the type, no ?

Yeah, i will remove that part.

comment:22 follow-up: Changed 2 years ago by git

  • Commit changed from c309ab42d962c6ff7017aa4533c97d6f765a3236 to 10143a3fe946e9e47c7023d842bb9ced936ee758

Branch pushed to git repo; I updated commit sha1. New commits:

10143a3correct type code removed

comment:23 in reply to: ↑ 22 Changed 2 years ago by gh-vipul79321

Replying to git:

Branch pushed to git repo; I updated commit sha1. New commits:

10143a3correct type code removed

Correct type code removed. If we are on board on current return method, I will update the documentation and expose it in generic_graph.py

comment:24 Changed 2 years ago by dcoudert

Honestly, I'm not completely sure of what the method does and it's objective. Can you clarify how you plan to use it and where. I'm not sure it does exactly what is needed.

comment:25 Changed 2 years ago by gh-vipul79321

Goal of this ticket is avoid format conversion (from sage graph to boost graph) for each vertex during shortest_path_all_pairs computation or other such method in generic_graph.py. For e.g, see the following code -

    from sage.graphs.base.boost_graph import shortest_paths
    dist = dict()
    pred = dict()
    for u in self:
        dist[u], pred[u] = shortest_paths(self, u, weight_function, algorithm)
    return dist, pred

Here shortest_paths of boost_graph.pyx is called for each vertex and hence format conversion time for each vertex.

This ticket aims to implement a method in boost_graph.pyx and take list of vertices as input and return shortest_path information for each vertex. Thus avoid conversion overhead.

Our current method shortest_path_from_vertices in boost_graph.pyx does that.

I hope it clear your doubt about purpose of the ticket :)

Vipul

comment:26 follow-up: Changed 2 years ago by dcoudert

my main concern is that this method returns lists of distances. Outside this method, the chosen order of the vertices is not known. So I don't know how this can be used.

comment:27 in reply to: ↑ 26 Changed 2 years ago by gh-vipul79321

Replying to dcoudert:

my main concern is that this method returns lists of distances. Outside this method, the chosen order of the vertices is not known. So I don't know how this can be used.

I think we can do it as follows in methods calling this method for retreiving path information -

distances , predecessors = shortest_path_from_vertices(G, vertex_list)
for i, v in enumerate(vertex_list):
    dist_v = distances[i]
    pred_v = predecessors[i]
    
    for id, vertex in enumerate(G):
       distance of vertex from v  = dist_v[id]
       predecessor of vertex in shortest_path from v = pred_v[id]

Will it work?

comment:28 follow-up: Changed 2 years ago by dcoudert

I don't think it's a safe usage, but I may be wrong.

comment:29 in reply to: ↑ 28 Changed 2 years ago by gh-vipul79321

Replying to dcoudert:

I don't think it's a safe usage, but I may be wrong.

If that's the case then, we can returnlist of dicts, although it will be slow, but it will be an overall improvement on current scenario.

I will add a note about in meta-ticket.

Last edited 2 years ago by gh-vipul79321 (previous) (diff)

comment:30 Changed 2 years ago by dcoudert

2 ideas to consider:

  • add input parameter order or int_to_vertex to specify the order of the vertices in the list
  • have 2 possible returned types: dict of lists and dict of dicts. The decision could be either when the given ordering is None, or with an extra input parameter.

comment:31 Changed 23 months ago by git

  • Commit changed from 10143a3fe946e9e47c7023d842bb9ced936ee758 to c5bfbfa6559f55e329ba43f4e1de09229aa59397

Branch pushed to git repo; I updated commit sha1. New commits:

466e7fbupdated
243a084documentation remaining
9ee8ea2introduced a extra parameter order
2c27cb7correct type method re-introduced for sake of doc-test. Will modify after finailizing
c5bfbfaMerge branch 'develop' into temp

comment:32 follow-up: Changed 22 months ago by dcoudert

We are almost done

  • to avoid potential issues when building the doc, avoid : in
    +    The input graph can be weighted: if the algorithm is Dijkstra, no negative
    +    weights are allowed, while if the algorithm is Bellman-Ford, negative
    +    weights are allowed, but there must be no negative cycle (otherwise, the
    +    shortest paths might not exist).
    +
    +    However, Dijkstra algorithm is more efficient: for this reason, we suggest
    +    to use Bellman-Ford only if necessary (which is also the default option).
    
  • in the inout blocks, when the sentences are short, the convention is to not end with ., e.g.,
    -    - ``order`` -- list (default: ``None``); order of vertices of `g`.
    +    - ``order`` -- list (default: ``None``); order of vertices of `g`
    
    of course, if you have a paragraph with several sentences, a final . is needed
  • again avoid : in Two possible outputs:. You can write for instead The type of output depends on the input. More precisely,
  • this is simpler and more efficient
    -           for vertex in g:
    -                if vertex not in order:
    -                    raise ValueError("Given ordering is not valid")
    +            if any(v not in g for v in order:
    +                raise ValueError("Given ordering is not valid")
    
    indeed, since order is a list, the test vertex not in order has time complexity in O(n), while testing v not in g takes constant time (I hope)

comment:33 Changed 22 months ago by git

  • Commit changed from c5bfbfa6559f55e329ba43f4e1de09229aa59397 to 97fef26ecbd6d0f326f393199e96ab32bd2f73d4

Branch pushed to git repo; I updated commit sha1. New commits:

ad7dc7dreview commit
b345479fixed merge conflict
97fef26minor correction

comment:34 in reply to: ↑ 32 Changed 22 months ago by gh-vipul79321

Replying to dcoudert:

  • this is simpler and more efficient
    -           for vertex in g:
    -                if vertex not in order:
    -                    raise ValueError("Given ordering is not valid")
    +            if any(v not in g for v in order:
    +                raise ValueError("Given ordering is not valid")
    
    indeed, since order is a list, the test vertex not in order has time complexity in O(n), while testing v not in g takes constant time (I hope)

Tried this

 +            if any(v not in g for v in order:
 +                raise ValueError("Given ordering is not valid")

but it gave the following error closures inside cpdef functions not yet supported.

comment:35 Changed 22 months ago by dcoudert

  • Reviewers changed from Samuel Lelièvre to Samuel Lelièvre, David Coudert
  • Status changed from needs_review to positive_review

LGTM.

comment:36 Changed 22 months ago by vbraun

  • Branch changed from public/graphs/30188_shortest_paths to 97fef26ecbd6d0f326f393199e96ab32bd2f73d4
  • Resolution set to fixed
  • Status changed from positive_review to closed
Note: See TracTickets for help on using tickets.