Opened 4 years ago
Closed 4 years ago
#19007 closed enhancement (fixed)
Refactor Closeness Centrality
Reported by:  borassi  Owned by:  

Priority:  major  Milestone:  sage6.9 
Component:  graph theory  Keywords:  Closeness centrality 
Cc:  ncohen, dcoudert  Merged in:  
Authors:  Michele Borassi  Reviewers:  David Coudert 
Report Upstream:  N/A  Work issues:  
Branch:  1551c6f (Commits)  Commit:  1551c6fd513cd4f7ac6c787badd589cf349fb788 
Dependencies:  #18931, #18876, #18910  Stopgaps: 
Description (last modified by )
Now, the closeness centrality algorithm simply calls the corresponding NetworkX routine (only for undirected graphs).
This ticket will move this routine to file generic_graph (closeness centrality is defined also in this case ![1]), and it will modify input/output variables as in routine shortest_path. Then, it will call our shortest path new routines instead of NetworkX routines.
This ticket is a first step towards the inclusion of the algorithm in ![2].
![1] http://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=6816651&tag=1
![2] http://arxiv.org/pdf/1507.01490v1.pdf
Change History (37)
comment:1 Changed 4 years ago by
 Cc ncohen dcoudert added
 Component changed from PLEASE CHANGE to graph theory
 Description modified (diff)
 Keywords Closeness centrality added
 Summary changed from refactor_centrality_closeness to Refactor Closeness Centrality
 Type changed from PLEASE CHANGE to enhancement
comment:2 Changed 4 years ago by
 Branch set to u/borassi/refactor_centrality_closeness
comment:3 Changed 4 years ago by
 Commit set to 5c85793ccb8234ac668173620523f45dbae4b985
 Dependencies set to #18931, #18876, #18910
 Status changed from new to needs_review
comment:4 followup: ↓ 7 Changed 4 years ago by
I don't really understand what you are doing in this commit http://git.sagemath.org/sage.git/commit/?h=5c85793ccb8234ac668173620523f45dbae4b985
In the ticket description you say that you will move everything in generic_graph
, but it might be more readable/clean to create a new file for centrality measures, no?
Also, you moved only references to the methods, right? (or the modifications are hidden in the merge with #18931).
+  :meth:`~sage.graphs.graph.Graph.centrality_degree` +  :meth:`~centrality_betweenness`
David.
comment:5 Changed 4 years ago by
 Commit changed from 5c85793ccb8234ac668173620523f45dbae4b985 to 5853eb73a9705229a1947fe89b91624117683e59
Branch pushed to git repo; I updated commit sha1. New commits:
5853eb7  Merged with beta2

comment:6 Changed 4 years ago by
 Commit changed from 5853eb73a9705229a1947fe89b91624117683e59 to a19addbd50db4086adbfc88c29b5c01f368dabc6
Branch pushed to git repo; I updated commit sha1. This was a forced push. New commits:
a19addb  Rewrote closeness centrality according to new standards

comment:7 in reply to: ↑ 4 Changed 4 years ago by
Hello!
Thank you for the comment, I will try to address the main issues raised.
I have put everything in a single commit, which simply removes the closeness centrality routine from graph.py
and rewrites it in generic_graph.py
. Now the ticket should be much easier to review. Since closeness centrality highly depends on shortest paths, the new routine follows all conventions in ticket #18931.
About the creation of a new file: we already have file centrality.pyx
, containing fast algorithms for betweenness centrality. However, these routines are called from a centrality_betweenness
routine in file generic_graph.py
, which should be used by the user. In this ticket, I just copied this policy.
In my opinion, if we want to create a new file, we should at least move also betweenness centrality and degree centrality: in my opinion, this goes beyond the scope of this ticket. Do you agree?
Best,
Michele
Replying to dcoudert:
I don't really understand what you are doing in this commit http://git.sagemath.org/sage.git/commit/?h=5c85793ccb8234ac668173620523f45dbae4b985
In the ticket description you say that you will move everything in
generic_graph
, but it might be more readable/clean to create a new file for centrality measures, no? Also, you moved only references to the methods, right? (or the modifications are hidden in the merge with #18931).+  :meth:`~sage.graphs.graph.Graph.centrality_degree` +  :meth:`~centrality_betweenness`David.
comment:8 followup: ↓ 9 Changed 4 years ago by
Something is wrong here
+ try: + v_iter = iter(vert) + except TypeError: + v_iter = [v_iter] + v_iter = iter(vert)
I assume you want first to have `v_vert = iter(list(vert))
The default of calling distances = self.shortest_path_all_pairs(by_weight,algorithm, weight_function, check_weight)[0]
is that you get a dictionary of dictionary, that is a huge data structure for large graphs (lost of memory and long to create). At least the second part of this method should be cythonized to get the array of distances, as is done for centrality_betweenness
.
David.
comment:9 in reply to: ↑ 8 Changed 4 years ago by
Something is wrong here
+ try: + v_iter = iter(vert) + except TypeError: + v_iter = [v_iter] + v_iter = iter(vert)
+1.
In some cases a vertex can be a pair, and thus is iterable. This would react badly with this code.
sage: g=graphs.KneserGraph(5,2) sage: v=g.vertices()[0] sage: g.centrality_closeness(v) <crash>
Nathann
comment:10 Changed 4 years ago by
Right. My previous proposal is not a good solution.
Something like that might also be unsafe if some labels in vert
are not in G
, but this can be checked when trying to associate computed values and vertices.
v_vert = iter([vert]) if vert in G else iter(vert)
comment:11 Changed 4 years ago by
 Commit changed from a19addbd50db4086adbfc88c29b5c01f368dabc6 to 7dfa2d26bcaf422f60b3ffd083bc535048922b2f
Branch pushed to git repo; I updated commit sha1. New commits:
7dfa2d2  Corrected v_iter, cythonized Johnson

comment:12 Changed 4 years ago by
Hello!
I hope you spent a great Assumption day!
With this new commit, (hopefully) I solved the problems with v_vert
and I cythonized the Johnson algorithm. In my opinion, it is not worth cythonizing FloydWarshall, because it is much slower than Johnson, especially on sparse graphs.
Thank you very much!
Michele
comment:13 followup: ↓ 16 Changed 4 years ago by
Hello,
I'm trying to understand the behavior of the method on digraphs. You wrote that the centrality of vertices of degree 0 is not reported. Does it means that for digraphs you don't report the centrality of vertices with outdegree 0? Also, what's the expected behavior on strongly connected digraphs?
David.
comment:14 Changed 4 years ago by
 Commit changed from 7dfa2d26bcaf422f60b3ffd083bc535048922b2f to 7a4059a8803326baf32f9922307d9f7ee3ce7238
Branch pushed to git repo; I updated commit sha1. New commits:
7a4059a  Improved documentation

comment:15 Changed 4 years ago by
 Commit changed from 7a4059a8803326baf32f9922307d9f7ee3ce7238 to 5c67f79dcb519e1df3a76021677a8292583a3d31
Branch pushed to git repo; I updated commit sha1. New commits:
5c67f79  Small mistake

comment:16 in reply to: ↑ 13 Changed 4 years ago by
Hello!
I tried to improve the documentation, answering your questions.
Michele
Replying to dcoudert:
You wrote that the centrality of vertices of degree 0 is not reported. Does it means that for digraphs you don't report the centrality of vertices with outdegree 0?
Yes, I changed in the documentation "degree" with "(out)degree"
Also, what's the expected behavior on strongly connected digraphs?
The closeness centrality of a vertex v is 1/farn(v), where farn(v) is the average distance between v and a random vertex w different from v (as in the case of connected graphs). I added an example about this.
comment:17 Changed 4 years ago by
Hello,
This is much better now. The method works well, passes all tests and the doc build properly.
One possible modification is to move the method to file centrality.pyx
which is made for gathering centrality methods. Let me know.
Best, David.
comment:18 Changed 4 years ago by
 Commit changed from 5c67f79dcb519e1df3a76021677a8292583a3d31 to 48514adedac2e122137907f410a92dcd5e6e6e3c
Branch pushed to git repo; I updated commit sha1. New commits:
48514ad  Moved centrality_closeness to centrality.pyx

comment:19 Changed 4 years ago by
Done!
However, I had to do some "magic" to avoid duplicate references... Hope it is ok!
comment:20 Changed 4 years ago by
 Commit changed from 48514adedac2e122137907f410a92dcd5e6e6e3c to cf9cf870c430b7f4ba06ec990fa37cf9ae66290d
Branch pushed to git repo; I updated commit sha1. New commits:
cf9cf87  Trailing whitespaces

comment:21 Changed 4 years ago by
Yep, I add to make docclean && make doc
to get it compiled.
I just saw in the doc of centrality_degree
in graph.py
, in section See also, that the link to centrality_closeness
is broken. In fact, the code is :meth:`centrality_closeness`
. Could you check that?
In fact we should also move this methode to centrality.pyx
, but in another ticket.
After that I think it will be ok.
David.
comment:22 Changed 4 years ago by
 Commit changed from cf9cf870c430b7f4ba06ec990fa37cf9ae66290d to 90638aa3a78bf992aa29902f929f131a973c021e
Branch pushed to git repo; I updated commit sha1. New commits:
90638aa  Corrected link

comment:23 Changed 4 years ago by
Done!
comment:24 Changed 4 years ago by
Hello,
do you understand why the patchbot reports some errors? I have no error when testing this patch on my computer (install, long tests, trials, docbuild html et pdf,...).
Best, David.
comment:25 Changed 4 years ago by
What is the patchbot? How can I find these errors?
comment:26 Changed 4 years ago by
on the top of this page, you can see a blue disk (or red or green, etc.) with a link to http://patchbot.sagemath.org/ticket/19007/ It reports on automatic tests of your patch. Tests are performed on all modules.
comment:27 Changed 4 years ago by
 Commit changed from 90638aa3a78bf992aa29902f929f131a973c021e to 73dd6d5e0a69e8c26ba23bf5425a9b4122a34ff6
Branch pushed to git repo; I updated commit sha1. New commits:
73dd6d5  Solve patchbot  first attempt

comment:28 Changed 4 years ago by
 Commit changed from 73dd6d5e0a69e8c26ba23bf5425a9b4122a34ff6 to f1c7fd57d513efda616b6dc8c3a81c7891ef5f12
Branch pushed to git repo; I updated commit sha1. New commits:
f1c7fd5  Solve patchbot  second attempt

comment:29 Changed 4 years ago by
Note that the patchbot needs a lot of time: it is not tested instantaneously and it takes some hours of computations...
comment:30 Changed 4 years ago by
Ok, I will wait until it answers... In my opinion, the problem is that we need to import centrality.pyx in generic_graph at startup, and this slows down the startup time. I tried to change the imports, hoping the total time will decrease. If any of the two attempts work, we are fine, otherwise I probably have to mail sagedevel...
comment:31 Changed 4 years ago by
Well, generic graph is becoming really really big with lots of tests.
For instance, do we really need the tests with for i in range(100):
. Can't we have only one test we randomly chosen number of edges? That should be enough.
David.
comment:32 Changed 4 years ago by
I'm afraid that all the attempts I have made were unsuccessful. The point is that importing module centrality.pyx
at startup is expensive, and I need it to import it to add method centrality_closeness
to GenericGraph
. Hence, the startup time becomes 2 seconds (too much).
In my opinion, this might be a reason why it is better to leave method centrality_closeness
in generic_graph
, and put only Cythonrelated methods inside centrality.pyx
, in order to use lazy imports. May I move method centrality_closeness back?
If we still want to move centrality_closeness
and related methods to centrality.pyx
, I think we should open another ticket, in order to find a way to use lazy imports (I have no idea how, though).
comment:33 Changed 4 years ago by
ok, move it back to generic graph. David.
comment:34 Changed 4 years ago by
 Commit changed from f1c7fd57d513efda616b6dc8c3a81c7891ef5f12 to 1551c6fd513cd4f7ac6c787badd589cf349fb788
Branch pushed to git repo; I updated commit sha1. This was a forced push. New commits:
1551c6f  Undo commit moving closeness_centrality to centrality.pyx. Corrected links.

comment:35 Changed 4 years ago by
 Reviewers set to David Coudert
For me the patch is good to go, but I prefer to wait for the next patchbot answer. David.
comment:36 Changed 4 years ago by
 Status changed from needs_review to positive_review
Since the patchbot reports no error and that the patch passes all tests/docbuild/etc. on my computer, I set it to positive review. Thanks. David.
comment:37 Changed 4 years ago by
 Branch changed from u/borassi/refactor_centrality_closeness to 1551c6fd513cd4f7ac6c787badd589cf349fb788
 Resolution set to fixed
 Status changed from positive_review to closed
Hello!
This is the first version of the refactoring of closeness_centrality. Let me know if you like it!