Opened 4 years ago

Closed 4 years ago

#19007 closed enhancement (fixed)

Refactor Closeness Centrality

Reported by: borassi Owned by:
Priority: major Milestone: sage-6.9
Component: graph theory Keywords: Closeness centrality
Cc: ncohen, dcoudert Merged in:
Authors: Michele Borassi Reviewers: David Coudert
Report Upstream: N/A Work issues:
Branch: 1551c6f (Commits) Commit: 1551c6fd513cd4f7ac6c787badd589cf349fb788
Dependencies: #18931, #18876, #18910 Stopgaps:

Description (last modified by borassi)

Now, the closeness centrality algorithm simply calls the corresponding NetworkX routine (only for undirected graphs).

This ticket will move this routine to file generic_graph (closeness centrality is defined also in this case ![1]), and it will modify input/output variables as in routine shortest_path. Then, it will call our shortest path new routines instead of NetworkX routines.

This ticket is a first step towards the inclusion of the algorithm in ![2].

![1] http://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=6816651&tag=1

![2] http://arxiv.org/pdf/1507.01490v1.pdf

Change History (37)

comment:1 Changed 4 years ago by borassi

  • Authors set to Michele Borassi
  • Cc ncohen dcoudert added
  • Component changed from PLEASE CHANGE to graph theory
  • Description modified (diff)
  • Keywords Closeness centrality added
  • Summary changed from refactor_centrality_closeness to Refactor Closeness Centrality
  • Type changed from PLEASE CHANGE to enhancement

comment:2 Changed 4 years ago by borassi

  • Branch set to u/borassi/refactor_centrality_closeness

comment:3 Changed 4 years ago by borassi

  • Commit set to 5c85793ccb8234ac668173620523f45dbae4b985
  • Dependencies set to #18931, #18876, #18910
  • Status changed from new to needs_review

Hello!

This is the first version of the refactoring of closeness_centrality. Let me know if you like it!

comment:4 follow-up: Changed 4 years ago by dcoudert

I don't really understand what you are doing in this commit http://git.sagemath.org/sage.git/commit/?h=5c85793ccb8234ac668173620523f45dbae4b985

In the ticket description you say that you will move everything in generic_graph, but it might be more readable/clean to create a new file for centrality measures, no? Also, you moved only references to the methods, right? (or the modifications are hidden in the merge with #18931).

+            - :meth:`~sage.graphs.graph.Graph.centrality_degree`
+            - :meth:`~centrality_betweenness`

David.

comment:5 Changed 4 years ago by git

  • Commit changed from 5c85793ccb8234ac668173620523f45dbae4b985 to 5853eb73a9705229a1947fe89b91624117683e59

Branch pushed to git repo; I updated commit sha1. New commits:

5853eb7Merged with beta2

comment:6 Changed 4 years ago by git

  • Commit changed from 5853eb73a9705229a1947fe89b91624117683e59 to a19addbd50db4086adbfc88c29b5c01f368dabc6

Branch pushed to git repo; I updated commit sha1. This was a forced push. New commits:

a19addbRewrote closeness centrality according to new standards

comment:7 in reply to: ↑ 4 Changed 4 years ago by borassi

Hello!

Thank you for the comment, I will try to address the main issues raised.

I have put everything in a single commit, which simply removes the closeness centrality routine from graph.py and rewrites it in generic_graph.py. Now the ticket should be much easier to review. Since closeness centrality highly depends on shortest paths, the new routine follows all conventions in ticket #18931.

About the creation of a new file: we already have file centrality.pyx, containing fast algorithms for betweenness centrality. However, these routines are called from a centrality_betweenness routine in file generic_graph.py, which should be used by the user. In this ticket, I just copied this policy.

In my opinion, if we want to create a new file, we should at least move also betweenness centrality and degree centrality: in my opinion, this goes beyond the scope of this ticket. Do you agree?

Best,

Michele

Replying to dcoudert:

I don't really understand what you are doing in this commit http://git.sagemath.org/sage.git/commit/?h=5c85793ccb8234ac668173620523f45dbae4b985

In the ticket description you say that you will move everything in generic_graph, but it might be more readable/clean to create a new file for centrality measures, no? Also, you moved only references to the methods, right? (or the modifications are hidden in the merge with #18931).

+            - :meth:`~sage.graphs.graph.Graph.centrality_degree`
+            - :meth:`~centrality_betweenness`

David.

Last edited 4 years ago by borassi (previous) (diff)

comment:8 follow-up: Changed 4 years ago by dcoudert

Something is wrong here

+            try:
+                v_iter = iter(vert)
+            except TypeError:
+                v_iter = [v_iter]
+                v_iter = iter(vert)

I assume you want first to have `v_vert = iter(list(vert))

The default of calling distances = self.shortest_path_all_pairs(by_weight,algorithm, weight_function, check_weight)[0] is that you get a dictionary of dictionary, that is a huge data structure for large graphs (lost of memory and long to create). At least the second part of this method should be cythonized to get the array of distances, as is done for centrality_betweenness.

David.

comment:9 in reply to: ↑ 8 Changed 4 years ago by ncohen

Something is wrong here

+            try:
+                v_iter = iter(vert)
+            except TypeError:
+                v_iter = [v_iter]
+                v_iter = iter(vert)

+1.

In some cases a vertex can be a pair, and thus is iterable. This would react badly with this code.

sage: g=graphs.KneserGraph(5,2)
sage: v=g.vertices()[0]
sage: g.centrality_closeness(v)
<crash>

Nathann

comment:10 Changed 4 years ago by dcoudert

Right. My previous proposal is not a good solution. Something like that might also be unsafe if some labels in vert are not in G, but this can be checked when trying to associate computed values and vertices.

v_vert = iter([vert]) if vert in G else iter(vert)

comment:11 Changed 4 years ago by git

  • Commit changed from a19addbd50db4086adbfc88c29b5c01f368dabc6 to 7dfa2d26bcaf422f60b3ffd083bc535048922b2f

Branch pushed to git repo; I updated commit sha1. New commits:

7dfa2d2Corrected v_iter, cythonized Johnson

comment:12 Changed 4 years ago by borassi

Hello!

I hope you spent a great Assumption day!

With this new commit, (hopefully) I solved the problems with v_vert and I cythonized the Johnson algorithm. In my opinion, it is not worth cythonizing Floyd-Warshall, because it is much slower than Johnson, especially on sparse graphs.

Thank you very much!

Michele

comment:13 follow-up: Changed 4 years ago by dcoudert

Hello,

I'm trying to understand the behavior of the method on digraphs. You wrote that the centrality of vertices of degree 0 is not reported. Does it means that for digraphs you don't report the centrality of vertices with out-degree 0? Also, what's the expected behavior on strongly connected digraphs?

David.

comment:14 Changed 4 years ago by git

  • Commit changed from 7dfa2d26bcaf422f60b3ffd083bc535048922b2f to 7a4059a8803326baf32f9922307d9f7ee3ce7238

Branch pushed to git repo; I updated commit sha1. New commits:

7a4059aImproved documentation

comment:15 Changed 4 years ago by git

  • Commit changed from 7a4059a8803326baf32f9922307d9f7ee3ce7238 to 5c67f79dcb519e1df3a76021677a8292583a3d31

Branch pushed to git repo; I updated commit sha1. New commits:

5c67f79Small mistake

comment:16 in reply to: ↑ 13 Changed 4 years ago by borassi

Hello!

I tried to improve the documentation, answering your questions.

Michele

Replying to dcoudert:

You wrote that the centrality of vertices of degree 0 is not reported. Does it means that for digraphs you don't report the centrality of vertices with out-degree 0?

Yes, I changed in the documentation "degree" with "(out)degree"

Also, what's the expected behavior on strongly connected digraphs?

The closeness centrality of a vertex v is 1/farn(v), where farn(v) is the average distance between v and a random vertex w different from v (as in the case of connected graphs). I added an example about this.

comment:17 Changed 4 years ago by dcoudert

Hello,

This is much better now. The method works well, passes all tests and the doc build properly.

One possible modification is to move the method to file centrality.pyx which is made for gathering centrality methods. Let me know.

Best, David.

comment:18 Changed 4 years ago by git

  • Commit changed from 5c67f79dcb519e1df3a76021677a8292583a3d31 to 48514adedac2e122137907f410a92dcd5e6e6e3c

Branch pushed to git repo; I updated commit sha1. New commits:

48514adMoved centrality_closeness to centrality.pyx

comment:19 Changed 4 years ago by borassi

Done!

However, I had to do some "magic" to avoid duplicate references... Hope it is ok!

comment:20 Changed 4 years ago by git

  • Commit changed from 48514adedac2e122137907f410a92dcd5e6e6e3c to cf9cf870c430b7f4ba06ec990fa37cf9ae66290d

Branch pushed to git repo; I updated commit sha1. New commits:

cf9cf87Trailing whitespaces

comment:21 Changed 4 years ago by dcoudert

Yep, I add to make doc-clean && make doc to get it compiled.

I just saw in the doc of centrality_degree in graph.py, in section See also, that the link to centrality_closeness is broken. In fact, the code is :meth:`centrality_closeness`. Could you check that? In fact we should also move this methode to centrality.pyx, but in another ticket.

After that I think it will be ok.

David.

comment:22 Changed 4 years ago by git

  • Commit changed from cf9cf870c430b7f4ba06ec990fa37cf9ae66290d to 90638aa3a78bf992aa29902f929f131a973c021e

Branch pushed to git repo; I updated commit sha1. New commits:

90638aaCorrected link

comment:23 Changed 4 years ago by borassi

Done!

comment:24 Changed 4 years ago by dcoudert

Hello,

do you understand why the patchbot reports some errors? I have no error when testing this patch on my computer (install, long tests, trials, docbuild html et pdf,...).

Best, David.

comment:25 Changed 4 years ago by borassi

What is the patchbot? How can I find these errors?

comment:26 Changed 4 years ago by dcoudert

on the top of this page, you can see a blue disk (or red or green, etc.) with a link to http://patchbot.sagemath.org/ticket/19007/ It reports on automatic tests of your patch. Tests are performed on all modules.

comment:27 Changed 4 years ago by git

  • Commit changed from 90638aa3a78bf992aa29902f929f131a973c021e to 73dd6d5e0a69e8c26ba23bf5425a9b4122a34ff6

Branch pushed to git repo; I updated commit sha1. New commits:

73dd6d5Solve patchbot - first attempt

comment:28 Changed 4 years ago by git

  • Commit changed from 73dd6d5e0a69e8c26ba23bf5425a9b4122a34ff6 to f1c7fd57d513efda616b6dc8c3a81c7891ef5f12

Branch pushed to git repo; I updated commit sha1. New commits:

f1c7fd5Solve patchbot - second attempt

comment:29 Changed 4 years ago by dcoudert

Note that the patchbot needs a lot of time: it is not tested instantaneously and it takes some hours of computations...

comment:30 Changed 4 years ago by borassi

Ok, I will wait until it answers... In my opinion, the problem is that we need to import centrality.pyx in generic_graph at startup, and this slows down the startup time. I tried to change the imports, hoping the total time will decrease. If any of the two attempts work, we are fine, otherwise I probably have to mail sage-devel...

comment:31 Changed 4 years ago by dcoudert

Well, generic graph is becoming really really big with lots of tests. For instance, do we really need the tests with for i in range(100): . Can't we have only one test we randomly chosen number of edges? That should be enough.

David.

comment:32 Changed 4 years ago by borassi

I'm afraid that all the attempts I have made were unsuccessful. The point is that importing module centrality.pyx at startup is expensive, and I need it to import it to add method centrality_closeness to GenericGraph. Hence, the startup time becomes 2 seconds (too much).

In my opinion, this might be a reason why it is better to leave method centrality_closeness in generic_graph, and put only Cython-related methods inside centrality.pyx, in order to use lazy imports. May I move method centrality_closeness back?

If we still want to move centrality_closeness and related methods to centrality.pyx, I think we should open another ticket, in order to find a way to use lazy imports (I have no idea how, though).

comment:33 Changed 4 years ago by dcoudert

ok, move it back to generic graph. David.

comment:34 Changed 4 years ago by git

  • Commit changed from f1c7fd57d513efda616b6dc8c3a81c7891ef5f12 to 1551c6fd513cd4f7ac6c787badd589cf349fb788

Branch pushed to git repo; I updated commit sha1. This was a forced push. New commits:

1551c6fUndo commit moving closeness_centrality to centrality.pyx. Corrected links.

comment:35 Changed 4 years ago by dcoudert

  • Reviewers set to David Coudert

For me the patch is good to go, but I prefer to wait for the next patchbot answer. David.

comment:36 Changed 4 years ago by dcoudert

  • Status changed from needs_review to positive_review

Since the patchbot reports no error and that the patch passes all tests/docbuild/etc. on my computer, I set it to positive review. Thanks. David.

comment:37 Changed 4 years ago by vbraun

  • Branch changed from u/borassi/refactor_centrality_closeness to 1551c6fd513cd4f7ac6c787badd589cf349fb788
  • Resolution set to fixed
  • Status changed from positive_review to closed
Note: See TracTickets for help on using tickets.