Opened 23 months ago

Closed 6 months ago

#24101 closed enhancement (fixed)

Implement Katz centrality and related methods for graphs

Reported by: clienkaemper Owned by: gh-rajat1433
Priority: major Milestone: sage-8.7
Component: graph theory Keywords: sd90
Cc: Merged in:
Authors: Caitlin Lienkaemper, Rajat Mittal Reviewers: David Coudert
Report Upstream: N/A Work issues:
Branch: 1a87f58 (Commits) Commit: 1a87f58d508f68e838cac0ba92d47f5160b23bc4
Dependencies: Stopgaps:

Description

want to be able to compute Katz centrality, use Katz method for link prediction in graphs/networks (https://en.wikipedia.org/wiki/Katz_centrality)

Change History (46)

comment:1 Changed 23 months ago by clienkaemper

  • Branch set to u/clienkaemper/implement_katz_centrality_and_related_methods_for_graphs

comment:2 Changed 23 months ago by git

  • Commit set to 99c2705ffb239f13cf58b6558ee5a66ba72f6e1c

Branch pushed to git repo; I updated commit sha1. New commits:

99c2705Adds functions for Katz centrality and related methods

comment:3 Changed 23 months ago by clienkaemper

  • Status changed from new to needs_review

comment:4 Changed 23 months ago by dcoudert

  • Reviewers set to David Coudert
  • Status changed from needs_review to needs_work

Welcome to Sagemath

I have some comments to help you improving the code.

  • unify nonedges_only and nonedgesonly. Actually, #24094 uses nonedgesonly. I don't know which is best, but should be the same.
  • Returns the Katz matrix of G with parameter alpha. -> Return the Katz matrix of the graph with parameter alpha.
  • only 1 empty line is needed after INPUT::
  • `True` -> ``True``
  • if true, eliminates common neighbors between adjacent vertices. Could you add that the value for each edge is set to 0 ?
  • the Katz matrix of the graph self with parameter alpha -> the Katz matrix of the graph with parameter alpha
  • G1 = Graph({1:[2,3], 2:[1,4], 3:[1,4], 4:[2,3]}) -> G1 = graphs.CycleGraph(4) ?
  • since you don't have G2, you could rename G1 to G
  • add a space between * and the text in *:meth:`~katz_centrality` and the line after
  • add an empty line after TESTS::
  • The examples you give in the TESTS could be in the examples section. In a TESTS block, we expect examples raising errors, with the empty graph, etc.
  • the test alpha <= 0 should be made before any other computation
  • if nonedges_only == True: -> if nonedges_only:

in method def katz_centrality(self, alpha):

  • you could add an optional parameter u to return the centrality of a specified vertex. For that, add u=None and then the return statement becomes something like if u is None: return K else: return K[u]
  • you must indent the text in 80 columns mode. Some code editors are really helpful to do that without effort
  • You return a list. Wouldn't it be better to return a dictionary keyed by vertices ? How to know which value corresponds to a node ?

comment:5 Changed 6 months ago by gh-rajat1433

  • Milestone changed from sage-8.1 to sage-8.7
  • Owner changed from (none) to gh-rajat1433

I would like to work on it. Should I work on existing branch or should I create a new one based on latest sage develop version?

Last edited 6 months ago by gh-rajat1433 (previous) (diff)

comment:6 Changed 6 months ago by dcoudert

You can either rebase this ticket on last develop branch (and fix the merge conflicts), or create a new branch based on last beta.

comment:7 Changed 6 months ago by gh-rajat1433

  • Branch changed from u/clienkaemper/implement_katz_centrality_and_related_methods_for_graphs to u/gh-rajat1433/24101_katz_centrality
  • Commit 99c2705ffb239f13cf58b6558ee5a66ba72f6e1c deleted

comment:8 Changed 6 months ago by git

  • Commit set to c7fff6c43966ae48eeac2ee4b974a56875b5ccc7

Branch pushed to git repo; I updated commit sha1. New commits:

77310bbKatz added
2c0f41dKatz added improved
c7fff6cfixed typos

comment:9 Changed 6 months ago by gh-rajat1433

  • Status changed from needs_work to needs_review

comment:10 Changed 6 months ago by dcoudert

Several issues (I'm skipping the coding style issues for the moment, but please have a look at http://doc.sagemath.org/html/en/developer/coding_basics.html and check how other methods are written).

In method katz_matrix:

  • add some explanation of what is a Katz matrix
  • please add parameter vertices similarly to method .adjacency_matrix(). This is important as in the future we will no longer ensure that the list of vertices is sorted (due to Python 3).

In method katz_centrality

  • add some explanation of what is the Katz centrality, plus a link to relevant wikipedia page
  • in the code, choose an order like verts = list(self), and pass it to katz_matrix. Here we don't need the list of vertices to be sorted, we just want to know the ordering, so we fix it.
  • u not in self.vertices() -> u not in self
  • You can simplify the construction of the dictionary
    -        for i in range(n):
    -            Kdict[vert[i]] = katz_values[0][i]
    +        Kdict = {u: katz_values[0][i] for i, u in enumerate(verts)}
    
  • Do we really need to build dictionary Kdict when u is not None ?

Do not forget to check that all tests pass and that the documentation builds properly and displays nicely.

comment:11 Changed 6 months ago by git

  • Commit changed from c7fff6c43966ae48eeac2ee4b974a56875b5ccc7 to 66151a6333aef8c6af48bb28eb33c53041c87e0a

Branch pushed to git repo; I updated commit sha1. New commits:

66151a6changes made in accordance to the review.

comment:12 follow-up: Changed 6 months ago by gh-rajat1433

Thanks for the prompt reply and suggesting the changes. Done all the improvements mentioned above and also made sure to pass all the tests. I have done the changes as per required by the documentation in generic_graph.py file. But could not figure out how to view those changes as they will actually be looking in documentation. And is there any test command for checking if format is correct for documentation.

Last edited 6 months ago by gh-rajat1433 (previous) (diff)

comment:13 Changed 6 months ago by git

  • Commit changed from 66151a6333aef8c6af48bb28eb33c53041c87e0a to 172082f5a200768a5f769cbc6373cb5a520b6c81

Branch pushed to git repo; I updated commit sha1. New commits:

172082fmade additions for documentation.

comment:14 Changed 6 months ago by git

  • Commit changed from 172082f5a200768a5f769cbc6373cb5a520b6c81 to 47d85e37b4a64b10405088a34adfb7b5693d9ef3

Branch pushed to git repo; I updated commit sha1. New commits:

47d85e3bug fix

comment:15 in reply to: ↑ 12 Changed 6 months ago by dcoudert

I have done the changes as per required by the documentation in generic_graph.py file. But could not figure out how to view those changes as they will actually be looking in documentation. And is there any test command for checking if format is correct for documentation.

See http://doc.sagemath.org/html/en/developer/sage_manuals.html

You can rebuild the full documentation using

sage -b && sage --docbuild reference html

or rebuild only the documentation of the graph module

sage -b && sage --docbuild reference/graphs html

Then, you can look at the documentation with your web browser.

file://PATH_ON_YOUR_COMPUTER/sage/local/share/doc/sage/html/en/reference/graphs/index.html

For the coding style, please try to comply to https://www.python.org/dev/peps/pep-0008

comment:16 Changed 6 months ago by git

  • Commit changed from 47d85e37b4a64b10405088a34adfb7b5693d9ef3 to 67a23844be9cb5a0fce4293bf02d7ce616629c91

Branch pushed to git repo; I updated commit sha1. New commits:

67a2384documentation build changes

comment:17 Changed 6 months ago by gh-rajat1433

I have made changes so as to build the documentation properly and also passed all the tests and tried to follow the coding convention. Can you please review the code so that we can finalize it.

Last edited 6 months ago by gh-rajat1433 (previous) (diff)

comment:18 Changed 6 months ago by dcoudert

  • You must ensure that all comments are formatted in 80 columns mode. Some code editors can be configured to ease this task
  • minor stylistic issue
    -        - ``nonedgesonly`` -- Boolean (default: ``True``); if True, value for each
    -          edge present in the graph is set to zero.
    +        - ``nonedgesonly`` -- boolean (default: ``True``); if ``True``, value for each
    +          edge present in the graph is set to zero
    
  • you can use for u in self or set(self) or if u in self, etc.
    -              set(vertices) != set(self.vertex_iterator())):
    +              set(vertices) != set(self)):
    
  • pep8
    -        K =  (In - alpha * A.transpose()).inverse()-In
    +        K =  (In - alpha * A.transpose()).inverse() - In
    

-            Missing = onesmat - A- In
+            Missing = onesmat - A - In
  • Please revise method katz_centrality: alignment, ensure that the code is correct, remove useless instructions, etc.

comment:19 Changed 6 months ago by git

  • Commit changed from 67a23844be9cb5a0fce4293bf02d7ce616629c91 to 27579a233a20e790ffc17b4273699b4fa0f17534

Branch pushed to git repo; I updated commit sha1. New commits:

05e8e6bempty fix
27579a2fixed alignment and other parts as per review

comment:20 Changed 6 months ago by gh-rajat1433

Thanks for the review. Its been really a learning experience. I have started following alignment and pep8 seriously and I have fixed the issues above.

Last edited 6 months ago by gh-rajat1433 (previous) (diff)

comment:21 Changed 6 months ago by dcoudert

Please check carefully your code, that comments are in 80 columns mode, if possible that you follow the PEP8 coding style, that the comments are sufficient and helpful to the user, and that your code is best possible.

for instance, you have apparently not double check this part:

-        if u :
-            idx = verts.index(u)
-            return katz_values[0][idx]
-
-        Kdict = {}
-        Kdict = {u: katz_values[0][i] for i, u in enumerate(verts)}
-
-        if u is None:
-            return Kdict
-        else :
-            return Kdict[u]
+        if u:
+            return katz_values[0][verts.index(u)]
+        return {u: katz_values[0][i] for i, u in enumerate(verts)}

comment:22 Changed 6 months ago by dcoudert

  • Status changed from needs_review to needs_work

comment:23 Changed 6 months ago by git

  • Commit changed from 27579a233a20e790ffc17b4273699b4fa0f17534 to f85cf1b95a6091ff0d04ec370c97dd77f1a25165

Branch pushed to git repo; I updated commit sha1. New commits:

f85cf1bimproved code

comment:24 Changed 6 months ago by git

  • Commit changed from f85cf1b95a6091ff0d04ec370c97dd77f1a25165 to 42019acaea411c958b480c4f8cfc1f04ffc97cd4

Branch pushed to git repo; I updated commit sha1. New commits:

686b15eimproved code
42019acimproved code

comment:25 Changed 6 months ago by gh-rajat1433

  • Status changed from needs_work to needs_review

comment:26 Changed 6 months ago by dcoudert

  • except in the names of the method, use Katz and not katz everywhere.
  • remove trailing white spaces
  • you should rephrase the description of method katz_matrix to first give the definition of this centrality measure, and then explain what's the returned matrix.
  • Avoid having too many empty lines like this. 1 is often enough
    +
    +
    +
    
  • remove the . at the end of error messages
  • if alpha <= 0: -> if alpha <= 0:
  • -        We compute katz_centrality for the undirected 4-cycle again (note that 
    +        We compute the Katz centrality of a 4-cycle (note that 
    
  • -        Return the katz centrality of the vertex u of the graph.
    +        Return the Katz centrality of vertex `u`.
    
  • there is a α. It should be ok, but it might be safer to use \alpha.

comment:27 Changed 6 months ago by git

  • Commit changed from 42019acaea411c958b480c4f8cfc1f04ffc97cd4 to cb9e45b198907ebbd354128cdddc30374b63fe9b

Branch pushed to git repo; I updated commit sha1. New commits:

cb9e45bfixed the coding style

comment:28 Changed 6 months ago by gh-rajat1433

Thanks for the review , I have fixed the coding style bugs.

comment:29 Changed 6 months ago by dcoudert

  • to use latex in the documentation, you must write `\alpha`, and not only \alpha. Using r""" and then `...` indicates to use latex.
  • in katz_centrality, when you ask for the centrality of a vertex u, you first compute katz_values = (M*matrix(QQ, n, 1, lambda i, j: 1)).transpose(). However, if I understand well the definition of the Katz matrix, it suffices to make the some of the values of a particular row (or column), right ? If so, you could avoid the matrix multiplication when asking for a single value.
  • also, a . and a to remove
    -          reciprocal of the spectral radius of the graph. (the maximum
    -          absolute eigenvalue of the adjacency matrix )
    +          reciprocal of the spectral radius of the graph (the maximum
    +          absolute eigenvalue of the adjacency matrix)
    

comment:30 Changed 6 months ago by git

  • Commit changed from cb9e45b198907ebbd354128cdddc30374b63fe9b to 3d9569b0792d34ec974ac7729c06762545e4710b

Branch pushed to git repo; I updated commit sha1. New commits:

3d9569bImproved Code

comment:31 Changed 6 months ago by gh-rajat1433

I have used sum function now and eliminated the multiplication step. I guess it give better complexity now in terms of space and time specially for a single vertex case.

comment:32 Changed 6 months ago by dcoudert

This is much better now. We certainly same some computation time (but see point below). For the space complexity, it remains in O(n^2) as we compute the Katz matrix in all cases.

  • The test if u and u not in self: must be done before computing verts = list(self) and more importantly the computation of the Katz matrix.

I have now more algorithmic questions (more interesting than coding style :P):

  • What is the expected answer when the graph is not connected ? May be we can save computation time if the answer is always 0 for vertices in different components ?
  • Same question for strongly connected digraphs? In the example, vertices 1, 2, 3, 7, 9 have no in-neighbors and vertex 4 has no out-neighbor. When you will get an answer, you should document the example explaining why some values are 0.

comment:33 Changed 6 months ago by gh-rajat1433

What is the expected answer when the graph is not connected ? May be we can save computation time if the answer is always 0 for vertices in different components ?

  • When graph is not connected answer for a particular vertex is only dependent on the vertices in its connected component only and not in the other component. So yes we really can do optimize here. Will work on that.

Same question for strongly connected digraphs? In the example, vertices 1, 2, 3, 7, 9 have no in-neighbors and vertex 4 has no out-neighbor. When you will get an answer, you should document the example explaining why some values are 0.

  • for strongly connected Digraph the answer will be non zero for every node. Also for any node with in-neighbors 0 the answer will be 0 as its the very essence of katz centrality to measure influence of an actor(node here) in a social network.

Will document the example with reasoning.

comment:34 Changed 6 months ago by git

  • Commit changed from 3d9569b0792d34ec974ac7729c06762545e4710b to c796415614414e13fcb9b80ca32b1f10f6512f61

Branch pushed to git repo; I updated commit sha1. New commits:

c796415optimized

comment:35 Changed 6 months ago by git

  • Commit changed from c796415614414e13fcb9b80ca32b1f10f6512f61 to e2f562c2bba3c04e947df8813133d86cef61597a

Branch pushed to git repo; I updated commit sha1. New commits:

e2f562coptimized

comment:36 Changed 6 months ago by dcoudert

  • You can get the connected component containing vertex u directly using C = G.connected_component_containing_vertex(u, sort=False).

Then you must compute the Katz matrix of the subgraph, i.e., self.subgraph(C).

if u:
    if self.is_connected():
        G = self
    else:
        G = self.subgraph(self.connected_component_containing_vertex(u, sort=False))
        verts = list(G)
    M = G.katz_matrix(alpha, nonedgesonly=False, vertices=verts)
    return sum(M[verts.index(u)])
  • When you compute the Katz centrality of all vertices, you want the union of the Katz centralities of each subgraph. You can use self.connected_components_subgraphs()

comment:37 Changed 6 months ago by git

  • Commit changed from e2f562c2bba3c04e947df8813133d86cef61597a to 13c28217b0d813ff0e4f23f789a5113d7b816935

Branch pushed to git repo; I updated commit sha1. New commits:

13c2821optimized the code

comment:38 Changed 6 months ago by gh-rajat1433

I have optimized the code as per the review.

comment:39 Changed 6 months ago by dcoudert

Please double check your code and test it ! it contains obvious errors.

You can write directly for g in self.connected_components_subgraphs()

Add a doctest with a non connected graph, e.g., graphs.PathGraph(3) + graphs.PathGraph(4).

comment:40 Changed 6 months ago by git

  • Commit changed from 13c28217b0d813ff0e4f23f789a5113d7b816935 to f47a3dd2aac48aa6d400f1a89748844a92dea0c7

Branch pushed to git repo; I updated commit sha1. New commits:

f47a3ddfixed the errors

comment:41 Changed 6 months ago by gh-rajat1433

I am really sorry about the previous errors. Now its fixed and well tested.

comment:42 Changed 6 months ago by dcoudert

Could you change the example to sage: (graphs.PathGraph(3) + graphs.PathGraph(4)).katz_centrality(1/20). This is to show that the values for the vertices in the P3 and in the P4 are the same than in previous examples.

comment:43 Changed 6 months ago by git

  • Commit changed from f47a3dd2aac48aa6d400f1a89748844a92dea0c7 to 1a87f58d508f68e838cac0ba92d47f5160b23bc4

Branch pushed to git repo; I updated commit sha1. New commits:

1a87f58minor addition

comment:44 Changed 6 months ago by gh-rajat1433

Could you change the example to sage: (graphs.PathGraph?(3) + graphs.PathGraph?(4)).katz_centrality(1/20). This is to show that the values for the vertices in the P3 and in the P4 are the same than in previous examples.

Done!

comment:45 Changed 6 months ago by dcoudert

  • Authors changed from Caitlin Lienkaemper to Caitlin Lienkaemper, Rajat Mittal
  • Status changed from needs_review to positive_review

Further speed up improvements can certainly be obtained, but I don't know how.

For me this patch is good to go.

comment:46 Changed 6 months ago by vbraun

  • Branch changed from u/gh-rajat1433/24101_katz_centrality to 1a87f58d508f68e838cac0ba92d47f5160b23bc4
  • Resolution set to fixed
  • Status changed from positive_review to closed
Note: See TracTickets for help on using tickets.