Opened 3 years ago

Closed 3 years ago

Last modified 2 years ago

#27480 closed enhancement (fixed)

Page Rank algorithm implementation for Graphs

Reported by: gh-rajat1433 Owned by: gh-rajat1433
Priority: major Milestone: sage-8.8
Component: graph theory Keywords: pagerank
Cc: dcoudert Merged in:
Authors: Rajat Mittal Reviewers: David Coudert
Report Upstream: N/A Work issues:
Branch: 07a884b (Commits, GitHub, GitLab) Commit:
Dependencies: Stopgaps:

Status badges

Description

Page Rank computes the ranking of the nodes of the graph based on the structure of the incoming links. It is a useful metrics in graphs and can be quite useful. (https://towardsdatascience.com/graphs-and-paths-pagerank-54f180a1aa0a)

Below is a link to a thesis on this algorithm http://www.sagemath.org/files/thesis/augeri-thesis-2008.pdf

It would be good to have its implementation in the Sage's graph module.

Change History (79)

comment:1 Changed 3 years ago by gh-rajat1433

  • Cc dcoudert added; David Coudert removed
  • Owner changed from (none) to gh-rajat1433

comment:2 Changed 3 years ago by gh-rajat1433

I have a brief idea about it(used it in my college project) and can implement it in Sage modules.

comment:3 Changed 3 years ago by gh-rajat1433

I just found that networkx library already has Page Rank algorithm implemented in Python. Should I implement it in Sage freshly or should I make a function to use networkx implementation ? Or maybe we can write a faster cython implemetation of it. Need some suggetsions!

Last edited 3 years ago by gh-rajat1433 (previous) (diff)

comment:4 Changed 3 years ago by dcoudert

networkx has several methods to compute Page rank: a pure Python, one using numpy, another using scipy, etc.

The optional package igraph also has an implementation of pagerank (see igraph documentation for pagerank (to install igraph, do sage -i igraph and sage -i python_igraph).

The best thing to do is to create a method including a parameter algorithm, and that will call the different algorithms. So algorithm could be None, 'networkx', 'numpy', 'scipy', 'igraph'`.

Last edited 3 years ago by slelievre (previous) (diff)

comment:5 Changed 3 years ago by gh-rajat1433

When it is None, should an implementation in sage be used? And if so should I do an implementation in python or cython?

comment:6 Changed 3 years ago by dcoudert

When it's None, the method should choose the best available implementation. And the best implementation could depend on the size or density of the (di)graph. For instance, for very large graphs, methods based on matrix multiplication might not be appropriate (memory consumption), while such methods might be very efficient for small graphs. Some measurements are needed to decide.

Before deciding if a new implementation is needed, we must know if what we can easily get is fast enough or not.

comment:7 Changed 3 years ago by gh-rajat1433

  • Branch set to u/gh-rajat1433/27480_page_rank

comment:8 Changed 3 years ago by gh-rajat1433

  • Dependencies set to #27496

comment:9 Changed 3 years ago by gh-rajat1433

I have installed igraph in my sage module. But however its pagerank algorithm is not able to work. It keeps on throwing a segmentation error. I use igraph_graph to convert sage graph into igraph. All other algorithms of igraph works fine. Following error message I get:

Cython backtrace
----------------

29	../sysdeps/unix/sysv/linux/waitpid.c: No such file or directory.
Traceback (most recent call last):
  File "<string>", line 56, in <module>
  File "/usr/lib/python3/dist-packages/Cython/Debugger/libcython.py", line 689, in invoke
    for arg in string_to_argv(args):
TypeError: argument 1 must be str, not bytes
Saved trace to /home/rajat/.sage/crash_logs/crash_yLpPiW.log
------------------------------------------------------------------------
Unhandled SIGSEGV: A segmentation fault occurred.
This probably occurred because a *compiled* module has a bug
in it and is not properly wrapped with sig_on(), sig_off().
Python will now terminate.

Regarding networkx,numpy and scipy, I have tested them on their runtime. Numpy does it by matrix multiplication and solving for eigenvalues, networkx has pure python implementation and scipy does matrix multiplication iteratively.

Following are the runtimes I got:

sage: G=nx.gnp_random_graph(40,0.01,directed=True)
sage: %timeit nx.pagerank_numpy(G)
1000 loops, best of 3: 883 µs per loop
sage: %timeit nx.pagerank_scipy(G)
1000 loops, best of 3: 1.7 ms per loop
sage: %timeit nx.pagerank(G)
1000 loops, best of 3: 1.7 ms per loop

sage: G=nx.gnp_random_graph(60,0.01,directed=True)
sage: %timeit nx.pagerank(G)
100 loops, best of 3: 2.87 ms per loop
sage: %timeit nx.pagerank_scipy(G)
1000 loops, best of 3: 1.87 ms per loop
sage: %timeit nx.pagerank_numpy(G)
1000 loops, best of 3: 1.58 ms per loop

sage: G=nx.gnp_random_graph(70,0.01,directed=True)
sage: %timeit nx.pagerank(G)
100 loops, best of 3: 3.66 ms per loop
sage: %timeit nx.pagerank_numpy(G)
100 loops, best of 3: 2.22 ms per loop
sage: %timeit nx.pagerank_scipy(G)
100 loops, best of 3: 1.99 ms per loop

sage: G=nx.gnp_random_graph(80,0.01,directed=True)
sage: %timeit nx.pagerank(G)
100 loops, best of 3: 4.66 ms per loop
sage: %timeit nx.pagerank_scipy(G)
1000 loops, best of 3: 1.93 ms per loop
sage: %timeit nx.pagerank_numpy(G)
100 loops, best of 3: 3.3 ms per loop

sage: G=nx.gnp_random_graph(100,0.01,directed=True)
sage: %timeit nx.pagerank(G)
100 loops, best of 3: 7.07 ms per loop
sage: %timeit nx.pagerank_scipy(G)
100 loops, best of 3: 2.29 ms per loop
sage: %timeit nx.pagerank_numpy(G)
100 loops, best of 3: 4.63 ms per loop

sage: G=nx.gnp_random_graph(400,0.01,directed=True)
sage: %timeit nx.pagerank_numpy(G)
10 loops, best of 3: 175 ms per loop
sage: %timeit nx.pagerank_scipy(G)
100 loops, best of 3: 3.9 ms per loop
sage: %timeit nx.pagerank(G)
10 loops, best of 3: 53.2 ms per loop

sage: G=nx.gnp_random_graph(4000,0.01,directed=True)
sage: %timeit nx.pagerank(G)
1 loop, best of 3: 1.81 s per loop
sage: %timeit nx.pagerank_scipy(G)
1 loop, best of 3: 206 ms per loop
sage: %timeit nx.pagerank_numpy(G)
1 loop, best of 3: 1min 1s per loop

As per my analysis upto around 60 vertices numpy is fast but scipy is fastest after that.

comment:10 Changed 3 years ago by dcoudert

I don't know what's the problem with igraph. I have installed igraph and python_igraph on my OSX laptop and I can do

sage: G = graphs.RandomGNP(1000, .01)
sage: I = G.igraph_graph()
sage: %timeit I.pagerank()
100 loops, best of 3: 1.92 ms per loop

Try recompiling sage (make build or sage -b). If not working, you can ask for help on sage-devel. Explain what you did to install igraph, that other algorithms of igraph are working fine for you, and what's the error message you get.

comment:11 Changed 3 years ago by dcoudert

Note that a random graph on 40 nodes with probability 0.01 has in average 8 edges... so your experiments are not very conclusive on small graphs. Nonetheless, scipy seems the most efficient version.

comment:12 Changed 3 years ago by gh-rajat1433

More experiments with dense small graphs also suggest numpy to be best for small graphs

sage: G=nx.gnp_random_graph(40,0.5,directed=True)
sage:  **%timeit nx.pagerank_numpy(G)
The slowest run took 86.64 times longer than the fastest. This could mean that an intermediate result is being cached.
1 loop, best of 3: 2.32 ms per loop**
sage: %timeit nx.pagerank_scipy(G)
The slowest run took 60.81 times longer than the fastest. This could mean that an intermediate result is being cached.
100 loops, best of 3: 2.41 ms per loop
sage: %timeit nx.pagerank(G)
100 loops, best of 3: 12.5 ms per loop


sage: G=nx.gnp_random_graph(60,0.5,directed=True)
sage: %timeit nx.pagerank(G)
10 loops, best of 3: 27 ms per loop
sage: %timeit nx.pagerank_scipy(G)
100 loops, best of 3: 3.47 ms per loop
sage: **%timeit nx.pagerank_numpy(G)
100 loops, best of 3: 3.13 ms per loop**

comment:13 Changed 3 years ago by embray

  • Milestone changed from sage-8.7 to sage-8.8

Ticket retargeted after milestone closed (if you don't believe this ticket is appropriate for the Sage 8.8 release please retarget manually)

comment:14 Changed 3 years ago by gh-rajat1433

  • Dependencies changed from #27496 to #27496, #27502

comment:15 Changed 3 years ago by gh-rajat1433

Page rank for Ipython with weighted edges doesn't seem to be working: However there is a parameter called weights in Pagerank method of Igraph , I looked at its code and doc but it is not clear to me if it is using it.

https://github.com/igraph/python-igraph/blob/master/src/graphobject.c

https://igraph.org/python/doc/igraph.GraphBase-class.html#personalized_pagerank

sage: G = Graph(6)
sage: I = G.igraph_graph()
sage: I.add_edge(0,0,weight=40)
igraph.Edge(<igraph.Graph object at 0x7f338814e148>, 0, {'weight': 40})
sage: I.add_edge(1,2,weight=50)
igraph.Edge(<igraph.Graph object at 0x7f338814e148>, 1, {'weight': 50})
sage: I.add_edge(2,3,weight=60)
igraph.Edge(<igraph.Graph object at 0x7f338814e148>, 2, {'weight': 60})
sage: I.add_edge(0,3,weight=70)
igraph.Edge(<igraph.Graph object at 0x7f338814e148>, 3, {'weight': 70})
sage: I.add_edge(3,4,weight=80)
igraph.Edge(<igraph.Graph object at 0x7f338814e148>, 4, {'weight': 80})
sage: I.add_edge(4,5,weight=20)
igraph.Edge(<igraph.Graph object at 0x7f338814e148>, 5, {'weight': 20})
sage: I.pagerank(weights='weight')
[0.1380494948975056,
 0.11517272680482316,
 0.19683228912731204,
 0.237940473238224,
 0.196832289127312,
 0.11517272680482313]
sage: I.pagerank()
[0.1380494948975056,
 0.11517272680482316,
 0.19683228912731204,
 0.237940473238224,
 0.196832289127312,
 0.11517272680482313]

comment:16 Changed 3 years ago by gh-rajat1433

Following experiments show igraph's results to be the best due to its c-language code.

sage: G = graphs.RandomGNP(40, 0.2)
sage: n = G.networkx_graph()
sage: %timeit nx.pagerank_numpy(n)
1000 loops, best of 3: 1.06 ms per loop
sage: %timeit nx.pagerank_scipy(n)
The slowest run took 37.63 times longer than the fastest. This could mean that an intermediate result is being cached.
100 loops, best of 3: 2.31 ms per loop
sage:  %timeit nx.pagerank(n)
100 loops, best of 3: 11.3 ms per loop
sage: i = G.igraph_graph()
sage: %timeit i.pagerank()
10000 loops, best of 3: **23.3 µs** per loop


sage: G = graphs.RandomGNP(40, 0.8)
sage: n = G.networkx_graph()
sage: %timeit nx.pagerank_numpy(n)
1000 loops, best of 3: 1.33 ms per loop
sage: %timeit nx.pagerank_scipy(n)
100 loops, best of 3: 2.16 ms per loop
sage:  %timeit nx.pagerank(n)
10 loops, best of 3: 20 ms per loop
sage: i = G.igraph_graph()
sage: %timeit i.pagerank()
10000 loops, best of 3: **39.1 µs** per loop


sage: G = graphs.RandomGNP(4000, 0.5)
sage: n = G.networkx_graph()
sage: i = G.igraph_graph()
sage: %timeit i.pagerank()
1 loop, best of 3: **1.03 s** per loop
sage: n = G.networkx_graph()
sage: %timeit nx.pagerank_numpy(n)
1 loop, best of 3: 1min 9s per loop
sage: %timeit nx.pagerank_scipy(n)
1 loop, best of 3: 7.12 s per loop
sage:  %timeit nx.pagerank(n)
....: 
....: 
Killed

comment:17 Changed 3 years ago by dcoudert

The code of the pagerank algorithm is here https://github.com/igraph/igraph/blob/master/src/centrality.c

comment:18 Changed 3 years ago by gh-rajat1433

  • Branch changed from u/gh-rajat1433/27480_page_rank to u/gh-rajat1433/27480_page_rank_implementation

comment:19 Changed 3 years ago by git

  • Commit set to ecbb4a7c73b93b8ff4326e018cb58fea330fdb97

Branch pushed to git repo; I updated commit sha1. New commits:

b2c82ecFixes cycle basis for multiedges
1a68ce1Add doctests
142e50fMerge branch 'develop' of git://trac.sagemath.org/sage into develop
8836f0dcheckpoint1
a2f38d9merged
8433c2amerged
b57b730pagerank method
ecbb4a7removed xtra lines

comment:20 Changed 3 years ago by gh-rajat1433

  • Branch changed from u/gh-rajat1433/27480_page_rank_implementation to u/gh-rajat1433/27480_page_rank_algo
  • Commit ecbb4a7c73b93b8ff4326e018cb58fea330fdb97 deleted

comment:21 Changed 3 years ago by git

  • Commit set to 142e50fa00bf1b76d4ec435278d3a9feec183b45

Branch pushed to git repo; I updated commit sha1. New commits:

b2c82ecFixes cycle basis for multiedges
1a68ce1Add doctests
142e50fMerge branch 'develop' of git://trac.sagemath.org/sage into develop

comment:22 Changed 3 years ago by git

  • Commit changed from 142e50fa00bf1b76d4ec435278d3a9feec183b45 to faafb7947fab662e4fef0f965e68e5b6d16d6062

Branch pushed to git repo; I updated commit sha1. New commits:

faafb79branch corrected

comment:23 Changed 3 years ago by git

  • Commit changed from faafb7947fab662e4fef0f965e68e5b6d16d6062 to cdef2867ecae6aad616400562b2ee825181e603d

Branch pushed to git repo; I updated commit sha1. New commits:

cdef286fix small mistakes

comment:24 Changed 3 years ago by git

  • Commit changed from cdef2867ecae6aad616400562b2ee825181e603d to b4e6b8c4937cba885bfdf81c8f84d67a0de2be24

Branch pushed to git repo; I updated commit sha1. New commits:

b4e6b8cremoving spaces

comment:25 Changed 3 years ago by gh-rajat1433

Sorry for the mess. Its due to update in the sage version now its corrected :)

comment:26 Changed 3 years ago by gh-rajat1433

I think that this is an issue on igraph side that its not able to work for weighted graphs in python. https://lists.nongnu.org/archive/html/igraph-help/2008-08/msg00030.html

Still I am seeing through how the code of igraph can be fixed to take weights in pagerank. Else we can use other libraries like scipy for weighted case.

comment:27 Changed 3 years ago by gh-rajat1433

I think the problem is with the version or codebase of igraph which sage currently uses , maybe its not been updated.

Using sage:

sage: G = Graph(5)
sage: I = G.igraph_graph()
sage: I.add_edge(0,1,weight=50)
igraph.Edge(<igraph.Graph object at 0x7fe144ca7a00>, 0, {'weight': 50})
sage: I.add_edge(1,2,weight=70)
....: 
igraph.Edge(<igraph.Graph object at 0x7fe144ca7a00>, 1, {'weight': 70})
sage: I.add_edge(2,4,weight=35)
igraph.Edge(<igraph.Graph object at 0x7fe144ca7a00>, 2, {'weight': 35})
sage: 
....: 
sage: I.add_edge(3,4,weight=35)
igraph.Edge(<igraph.Graph object at 0x7fe144ca7a00>, 3, {'weight': 35})
sage: I.pagerank()
[0.134527027027027,
 0.24594594594594593,
 0.23905405405405405,
 0.134527027027027,
 0.24594594594594593]
sage: I.pagerank(weights='weight')
[0.134527027027027,
 0.24594594594594593,
 0.23905405405405405,
 0.134527027027027,
 0.24594594594594593]

Using python terminal

>>> I = Graph(5)
>>> I
<igraph.Graph object at 0x7f5bcd250528>
>>> I.summary()
'IGRAPH U--- 5 0 -- '
>>> I.add_edge(0,1,weight=50)
>>> I.add_edge(1,2,weight=70)
>>> I.add_edge(2,4,weight=35)
>>> I.add_edge(3,4,weight=35)
>>> I.pagerank()
[0.134527027027027, 0.24594594594594593, 0.23905405405405405, 0.134527027027027, 0.24594594594594593]
>>> I.pagerank(weights='weight')
[0.1326579757790889, 0.2898578139644863, 0.2595856492098718, 0.11586448311914739, 0.20203407792740563]

comment:28 Changed 3 years ago by dcoudert

Sage 8.8.beta0 has been released and it includes the 2 tickets marked as dependencies. So you can update you develop branch and rebuild this ticket using last beta.

If it is not possible to use weight with igraph, add a TODO block in the method to say that it should be done someday.

comment:29 Changed 3 years ago by gh-rajat1433

I have already updated to sage8.8 beta0 today only!

Please read comment 27, as I am able to use weight parameter of igraph in my python console but in sage its not working (don't know why this discrepancy is present).

comment:30 Changed 3 years ago by gh-rajat1433

  • Dependencies #27496, #27502 deleted

comment:31 Changed 3 years ago by gh-rajat1433

Should I open a new ticket mentioning this discrepancy of igraph package in sage?

comment:32 Changed 3 years ago by dcoudert

I don't understand what's going on. Have you tried other methods using weights?

comment:33 Changed 3 years ago by gh-rajat1433

Yes other algorithms like dijkstra are working fine with weights. I think its an old bug of igraph which is updated as can be seen by the results in my python shell however its not been updated in sage version of igraph.

Reference: https://lists.nongnu.org/archive/html/igraph-help/2008-08/msg00030.html

comment:34 Changed 3 years ago by dcoudert

May the version that has been updated in #27502 is not the same as the version you have on your computer ? upstream should update python_igraph to be at least Python3 compatible. It currently contains deprecated stuff, so don't pass python3 tests :(

comment:35 Changed 3 years ago by git

  • Commit changed from b4e6b8c4937cba885bfdf81c8f84d67a0de2be24 to 78b1f8b5e409ae8115667cccb34f3a5bb6fd78c4

Branch pushed to git repo; I updated commit sha1. New commits:

46051b4remove spaces
566f979indenting
78b1f8bimproved code

comment:36 Changed 3 years ago by gh-rajat1433

  • Authors set to Rajat Mittal
  • Keywords pagerank added
  • Reviewers set to David Coudert
  • Status changed from new to needs_review

comment:37 follow-up: Changed 3 years ago by gh-rajat1433

I have used scipy in case of weighted graphs and igraph in case of unweighted as default case as per the results I got mentioned in the previous comments.

comment:38 Changed 3 years ago by git

  • Commit changed from 78b1f8b5e409ae8115667cccb34f3a5bb6fd78c4 to 112cac944df1d70ab96cebb323add4bcdf4c539f

Branch pushed to git repo; I updated commit sha1. New commits:

112cac9correct igraph example

comment:39 in reply to: ↑ 37 Changed 3 years ago by dcoudert

Replying to gh-rajat1433:

I have used scipy in case of weighted graphs and igraph in case of unweighted as default case as per the results I got mentioned in the previous comments.

  • Unfortunately, igraph is an optional package, so you can use it only if installed. See e.g. clique_maximum to know how to check if a package is installed.
  • Also, I suggest you start checking if default should be used, and then select most suitable default algorithm.
  • instead of implementation, you should use algorithm.
  • You should raise an error if the given name is unknown
  • the name of igraph is igraph, so don't use Igraph. In fact, you could do algorithm = algorithm.lower() to avoid errors.
  • if self.order() == 0: -> if not self.order():
  • this is not needed import igraph

comment:40 Changed 3 years ago by git

  • Commit changed from 112cac944df1d70ab96cebb323add4bcdf4c539f to e7ae5da2138792384eabeae9217b115106c372ff

Branch pushed to git repo; I updated commit sha1. New commits:

e7ae5daimproved the code

comment:41 Changed 3 years ago by gh-rajat1433

As per commment 9, 12 and 16 I think it will be a good idea to have threshold value of number of nodes in the graph as less than 60 nodes numpy implementation should be used as default and greater than that scipy implementation should be used. If you have another idea on thresholding based on density or any other parameter I would like to know.

Last edited 3 years ago by gh-rajat1433 (previous) (diff)

comment:42 Changed 3 years ago by dcoudert

why 60? I though scipy was fast even for small graphs...

comment:43 Changed 3 years ago by gh-rajat1433

Due to the following results I suggested 60. For small graphs time taken by numpy is lesser than that of scipy.

sage: G=nx.gnp_random_graph(40,0.01,directed=True)
sage: %timeit nx.pagerank_numpy(G)
1000 loops, best of 3: 883 µs per loop
sage: %timeit nx.pagerank_scipy(G)
1000 loops, best of 3: 1.7 ms per loop
sage: %timeit nx.pagerank(G)
1000 loops, best of 3: 1.7 ms per loop

sage: G=nx.gnp_random_graph(60,0.01,directed=True)
sage: %timeit nx.pagerank(G)
100 loops, best of 3: 2.87 ms per loop
sage: %timeit nx.pagerank_scipy(G)
1000 loops, best of 3: 1.87 ms per loop
sage: %timeit nx.pagerank_numpy(G)
1000 loops, best of 3: 1.58 ms per loop

sage: G=nx.gnp_random_graph(70,0.01,directed=True)
sage: %timeit nx.pagerank(G)
100 loops, best of 3: 3.66 ms per loop
sage: %timeit nx.pagerank_numpy(G)
100 loops, best of 3: 2.22 ms per loop
sage: %timeit nx.pagerank_scipy(G)
100 loops, best of 3: 1.99 ms per loop

sage: G=nx.gnp_random_graph(80,0.01,directed=True)
sage: %timeit nx.pagerank(G)
100 loops, best of 3: 4.66 ms per loop
sage: %timeit nx.pagerank_scipy(G)
1000 loops, best of 3: 1.93 ms per loop
sage: %timeit nx.pagerank_numpy(G)
100 loops, best of 3: 3.3 ms per loop

sage: G=nx.gnp_random_graph(100,0.01,directed=True)
sage: %timeit nx.pagerank(G)
100 loops, best of 3: 7.07 ms per loop
sage: %timeit nx.pagerank_scipy(G)
100 loops, best of 3: 2.29 ms per loop
sage: %timeit nx.pagerank_numpy(G)
100 loops, best of 3: 4.63 ms per loop

sage: G=nx.gnp_random_graph(400,0.01,directed=True)
sage: %timeit nx.pagerank_numpy(G)
10 loops, best of 3: 175 ms per loop
sage: %timeit nx.pagerank_scipy(G)
100 loops, best of 3: 3.9 ms per loop
sage: %timeit nx.pagerank(G)
10 loops, best of 3: 53.2 ms per loop

sage: G=nx.gnp_random_graph(4000,0.01,directed=True)
sage: %timeit nx.pagerank(G)
1 loop, best of 3: 1.81 s per loop
sage: %timeit nx.pagerank_scipy(G)
1 loop, best of 3: 206 ms per loop
sage: %timeit nx.pagerank_numpy(G)
1 loop, best of 3: 1min 1s per loop

sage: G=nx.gnp_random_graph(40,0.5,directed=True)
sage:  **%timeit nx.pagerank_numpy(G)
The slowest run took 86.64 times longer than the fastest. This could mean that an intermediate result is being cached.
1 loop, best of 3: 2.32 ms per loop**
sage: %timeit nx.pagerank_scipy(G)
The slowest run took 60.81 times longer than the fastest. This could mean that an intermediate result is being cached.
100 loops, best of 3: 2.41 ms per loop
sage: %timeit nx.pagerank(G)
100 loops, best of 3: 12.5 ms per loop


sage: G=nx.gnp_random_graph(60,0.5,directed=True)
sage: %timeit nx.pagerank(G)
10 loops, best of 3: 27 ms per loop
sage: %timeit nx.pagerank_scipy(G)
100 loops, best of 3: 3.47 ms per loop
sage: **%timeit nx.pagerank_numpy(G)
100 loops, best of 3: 3.13 ms per loop**

sage: G = graphs.RandomGNP(40, 0.2)
sage: n = G.networkx_graph()
sage: %timeit nx.pagerank_numpy(n)
1000 loops, best of 3: 1.06 ms per loop
sage: %timeit nx.pagerank_scipy(n)
The slowest run took 37.63 times longer than the fastest. This could mean that an intermediate result is being cached.
100 loops, best of 3: 2.31 ms per loop
sage:  %timeit nx.pagerank(n)
100 loops, best of 3: 11.3 ms per loop
sage: i = G.igraph_graph()
sage: %timeit i.pagerank()
10000 loops, best of 3: **23.3 µs** per loop


sage: G = graphs.RandomGNP(40, 0.8)
sage: n = G.networkx_graph()
sage: %timeit nx.pagerank_numpy(n)
1000 loops, best of 3: 1.33 ms per loop
sage: %timeit nx.pagerank_scipy(n)
100 loops, best of 3: 2.16 ms per loop
sage:  %timeit nx.pagerank(n)
10 loops, best of 3: 20 ms per loop
sage: i = G.igraph_graph()
sage: %timeit i.pagerank()
10000 loops, best of 3: **39.1 µs** per loop


sage: G = graphs.RandomGNP(4000, 0.5)
sage: n = G.networkx_graph()
sage: i = G.igraph_graph()
sage: %timeit i.pagerank()
1 loop, best of 3: **1.03 s** per loop
sage: n = G.networkx_graph()
sage: %timeit nx.pagerank_numpy(n)
1 loop, best of 3: 1min 9s per loop
sage: %timeit nx.pagerank_scipy(n)
1 loop, best of 3: 7.12 s per loop

comment:44 Changed 3 years ago by dcoudert

OK.

comment:45 Changed 3 years ago by git

  • Commit changed from e7ae5da2138792384eabeae9217b115106c372ff to 7364a6b00f70ae05ccc68c602e7797536c4d8a66

Branch pushed to git repo; I updated commit sha1. New commits:

5de6f2bimproved the code
7364a6breduced lines of code

comment:46 Changed 3 years ago by gh-rajat1433

I have added the default values and completed the code.

comment:47 Changed 3 years ago by dcoudert

  • We usually use vertex/vertices and not node/nodes.
  • -        Return the PageRank of the nodes in the graph.
    +        Return the PageRank of the vertices of ``self``.
    
  • This text might be more informative.
    -        PageRank calculates the ranking of nodes in the graph G based on the
    -        structure of the incoming links. It is popularly used to rank web
    -        pages.
    +        PageRank is a centrality measure first used to rank web pages.
    +        The PageRank algorithm outputs the probability distribution that
    +        a random walker in the graph visits a vertex.
    
  • for the default case, the simpler version is:
    +        if algorithm:
    +            algorithm = algorithm.lower()
    +        elif self.order <= 60:
    +            algorithm = 'numpy'
    +        else:
    +            algorithm = 'scipy'
    
  • If the graph is not connected, you can certainly compute pagerank on each connected component. Before doing it, check how to do the normalization afterward. Indeed:
    sage: import networkx
    sage: G = graphs.CycleGraph(4)
    sage: N = G.networkx_graph()
    sage: print(networkx.pagerank(N))
    {0: 0.25, 1: 0.25, 2: 0.25, 3: 0.25}
    sage: N = (G+G).networkx_graph()
    sage: print(networkx.pagerank(N))
    {0: 0.125, 1: 0.125, 2: 0.125, 3: 0.125, 4: 0.125, 5: 0.125, 6: 0.125, 7: 0.125}
    sage: H = graphs.CycleGraph(10)
    sage: N = H.networkx_graph()
    sage: print(networkx.pagerank(N))
    {0: 0.1, 1: 0.1, 2: 0.1, 3: 0.1, 4: 0.1, 5: 0.1, 6: 0.1, 7: 0.1, 8: 0.1, 9: 0.1}
    sage: N = (G+H).networkx_graph()
    sage: print(networkx.pagerank(N))
    {0: 0.07142857142857142, 1: 0.07142857142857142, 2: 0.07142857142857142, 3: 0.07142857142857142, 4: 0.07142857142857142, 5: 0.07142857142857142, 6: 0.07142857142857142, 7: 0.07142857142857142, 8: 0.07142857142857142, 9: 0.07142857142857142, 10: 0.07142857142857142, 11: 0.07142857142857142, 12: 0.07142857142857142, 13: 0.07142857142857142}
    

comment:48 Changed 3 years ago by git

  • Commit changed from 7364a6b00f70ae05ccc68c602e7797536c4d8a66 to 552a429952c404796d612ae9daeca9cc53df60f2

Branch pushed to git repo; I updated commit sha1. New commits:

552a429improved code

comment:49 Changed 3 years ago by gh-rajat1433

optimizing the code for computing pagerank for connected components subgraph will work for undirected graphs as we can easily find the normalization factor but for digraphs the nodes can be dangling also in such a case we randomly connect that node to other nodes in our graph with some weight value of the edge. In such cases this optimization may not be desired.

comment:50 Changed 3 years ago by dcoudert

it can also be let for a future ticket

comment:51 Changed 3 years ago by gh-rajat1433

For undirected graphs, I can do the optimization in this ticket only. :)

Last edited 3 years ago by gh-rajat1433 (previous) (diff)

comment:52 Changed 3 years ago by gh-rajat1433

Even for undirected graphs with personalization parameter set the optimization may not work as can be seen from example below:

sage: g = Graph([(1,2,3),(2,3,5),(3,5,8),(2,5,13),(2,4,25)])
sage: personalization={}
sage: personalization[1] = 8
sage: personalization[2] = 16
sage: personalization[3] = 24
sage: personalization[4] = 36
sage: personalization[5] = 25
sage: g.pagerank(by_weight=True, personalization=personalization)
sage: personalization2={}
sage: personalization2[1] = 84
sage: personalization2[2] = 42
sage: personalization2[3] = 30
sage: personalization2[4] = 70
sage: f = Graph([(1,2,2),(2,3,15),(2,4,22)])
sage: f.pagerank(by_weight=True,personalization=personalization2)
{1: 0.07643670757326188,
 2: 0.474528450109004,
 3: 0.17504521830388875,
 4: 0.27398962401384513}
sage: q = g + f
sage: personalization[0] = 8
sage: personalization[1] = 16
sage: personalization[2] = 24
sage: personalization[3] = 36
sage: personalization[4] = 25
sage: personalization[5] = 84
sage: personalization[6] = 42
sage: personalization[7] = 30
sage: personalization[8] = 70
sage: q.pagerank(by_weight=True,personalization=personalization)
{0: 0.010763853970772248,
 1: 0.12955307566026533,
 2: 0.04384414464203576,
 3: 0.07596743980618663,
 4: 0.0652489170000004,
 5: 0.05156608491296348,
 6: 0.3201286892373546,
 7: 0.11808892042931564,
 8: 0.18483887434110574}

So I guess its best to keep the method as it is for now.

comment:53 Changed 3 years ago by dcoudert

  • no capital letter after ;. So
    -        - ``alpha`` -- float (default: ``0.85``); Damping parameter for
    +        - ``alpha`` -- float (default: ``0.85``); damping parameter for
    
  • you should give slightly more details on what the parameters are. For instance, the PageRank? value for vertices without in-neighbors is 1 - damping. Furthermore, the damping factor (this term is better than parameter here) is the probability of resetting the random walk to a uniform distribution in each step.
  • As reported by the patchbot. Please check it
    -        .. SEEALSO:
    +        .. SEEALSO::
    
  • ...
    -        elif self.order <= 60:
    +        elif self.order() <= 60:
    
  • simplify code and avoid using .vertices() when not necessary (it sorts vertices)
    -            if by_weight:
    -                I = self.igraph_graph(edge_attrs={'weight': [weight_function(e)
    -                                                  for e in self.edge_iterator()]})
    -                page_rank = I.pagerank(damping=alpha, weights='weight')
    -                return {v: page_rank[i] for i, v in enumerate(self.vertices())}
    -            else:
    -                I = self.igraph_graph()
    -                page_rank = I.pagerank(damping=alpha)
    -                return {v: page_rank[i] for i, v in enumerate(self.vertices())}
    +            if by_weight:
    +                I = self.igraph_graph(edge_attrs={'weight': [weight_function(e)
    +                                                  for e in self.edge_iterator()]})
    +            else:
    +                I = self.igraph_graph()
    +            page_rank = I.pagerank(damping=alpha, weights=weight)
    +            return {v: page_rank[i] for i, v in enumerate(self)}
    
  • Don't start error message with a capital letter raise NotImplementedError("Only 'NetworkX', 'Numpy', 'Scipy', and 'igraph' are supported")

comment:54 Changed 3 years ago by git

  • Commit changed from 552a429952c404796d612ae9daeca9cc53df60f2 to c32242e8e8fa04a0fef0518853fb23e5b6accc26

Branch pushed to git repo; I updated commit sha1. New commits:

d5d7479improved code and documented
c32242eimproved

comment:55 Changed 3 years ago by gh-rajat1433

I have added more info on damping parameter alpha and improved the code as per review.

comment:56 Changed 3 years ago by gh-rajat1433

on my system all tests pass! but why isn't patchbot reflecting the same here. I was just wondering about how much time it takes for patchbot to get invoked here?

$ ./sage -t src/sage/graphs/geneic_graph.py
Running doctests with ID 2019-04-07-20-11-50-05efbf4e.
Git branch: u/gh-rajat1433/27480_page_rank_algo
Using --optional=dochtml,igraph,memlimit,mpir,python2,python_igraph,sage
Doctesting 1 file.
sage -t --warn-long 24.3 src/sage/graphs/generic_graph.py
    [3391 tests, 26.98 s]
----------------------------------------------------------------------
All tests passed!
----------------------------------------------------------------------
Total time for all tests: 28.0 seconds
    cpu time: 19.8 seconds
    cumulative wall time: 27.0 seconds

Last edited 3 years ago by gh-rajat1433 (previous) (diff)

comment:57 Changed 3 years ago by dcoudert

  • please don't forget my previous #comment:47
    -    :meth:`~GenericGraph.pagerank` | Return the PageRank of the nodes in the graph.
    +    :meth:`~GenericGraph.pagerank` | Return the PageRank of the vertices of ``self``.
    
    -        Return the PageRank of the vertices in the graph.
    +        Return the PageRank of the vertices of ``self``.
    

The documentation requires further improvements. I know that the documentations of networkx/igraph are not very clear, but we can do better.

  • I don't understand what the personalization parameter is used for. We can at least clarify the text and be closer to what is explained in networkx
    -        - ``personalization`` -- dict (default: ``None``); the "personalization
    -          vector" consisting of a dictionary with a key for every graph node
    -          and nonzero personalization value for each node.
    -          By default, a uniform distribution is used.
    +        - ``personalization`` -- dict (default: ``None``); a dictionary keyed by
    +          vertices associating to each vertex a value. The personalization can be
    +          specified for a subset of the vertices only, and the sum of the values
    +          must be nonzero. 
    +          By default (``None``), a uniform distribution is used.
    
  • if you understand what is dangling, please improve the description.
  • You should indicate the parameters that cannot be used with an algorithm or that can only be used with an algorithm.
  • may be it would be better to start the list of parameters with the algorithms ? Well, think about it.
  • the note Note that 'networkx' does not support multigraphs. could be added directly with the in the line - NetworkX -- ...
  • you could add a TESTS block with errors, specific cases, etc. For instance with personalization={0: 1, 1: -1}

Also, note that using a graph like a 4-cycle is interesting to illustrate the effect of the various parameters. Indeed, the pagerank of each vertex is 0.25.

Note that the example you have is also interesting as it shows that the result is slightly different from an algorithm to another.

comment:58 Changed 3 years ago by dcoudert

and for the patchbot, I don't know when it is invoked. It depends on availability and so on the number of users who are kindly offering computing resource for that and spending time maintaining the system.

comment:59 Changed 3 years ago by git

  • Commit changed from c32242e8e8fa04a0fef0518853fb23e5b6accc26 to 03dd4451465f5c057ca24a19c2df5372c2874794

Branch pushed to git repo; I updated commit sha1. New commits:

e6dff8dimproved code and doc
03dd445merged

comment:60 Changed 3 years ago by git

  • Commit changed from 03dd4451465f5c057ca24a19c2df5372c2874794 to 06dbcc6483839e071908b6de17e76160b773f24e

Branch pushed to git repo; I updated commit sha1. New commits:

06dbcc6revert some changes

comment:61 Changed 3 years ago by gh-rajat1433

I have rebased my branch on 8.8 beta 1 and did the changes as mentioned in comment 57.

comment:62 Changed 3 years ago by git

  • Commit changed from 06dbcc6483839e071908b6de17e76160b773f24e to 30d493def189d6d71796749ef9ed921c9e0f8a19

Branch pushed to git repo; I updated commit sha1. New commits:

30d493dremved unnecessary spaces

comment:63 Changed 3 years ago by dcoudert

Further improvements are certainly possible, but it's already good. Do at least this one (plurals)

-            Parameter ``alpha``, ``by_weight`` and ``weight_function`` is common
-            for all algorithms but ``personalization`` and ``dangling``
-            parameters are used only in ``NetworkX``, ``Numpy`` and ``Scipy``
-            implementations.
+            Parameters ``alpha``, ``by_weight`` and ``weight_function`` are common
+            to all algorithms. Parameters ``personalization`` and ``dangling``
+            are used only by algorithms ``NetworkX``, ``Numpy`` and ``Scipy``.

comment:64 Changed 3 years ago by git

  • Commit changed from 30d493def189d6d71796749ef9ed921c9e0f8a19 to 804fe603a582470952b017d444ee8bb02925482f

Branch pushed to git repo; I updated commit sha1. New commits:

804fe60improved note

comment:65 Changed 3 years ago by gh-rajat1433

Done the changes!

comment:66 Changed 3 years ago by dcoudert

  • Status changed from needs_review to positive_review

LGTM.

comment:67 Changed 3 years ago by vbraun

  • Branch changed from u/gh-rajat1433/27480_page_rank_algo to 804fe603a582470952b017d444ee8bb02925482f
  • Resolution set to fixed
  • Status changed from positive_review to closed

comment:68 Changed 3 years ago by vbraun

  • Branch changed from 804fe603a582470952b017d444ee8bb02925482f to u/gh-rajat1433/27480_page_rank_algo
  • Resolution fixed deleted
  • Status changed from closed to new
**********************************************************************
File "src/sage/graphs/generic_graph.py", line 9536, in sage.graphs.generic_graph.GenericGraph.?
Failed example:
    G.pagerank(alpha=0.50, algorithm="igraph")
Exception raised:
    Traceback (most recent call last):
      File "/var/lib/buildbot/slave/sage_git/build/local/lib/python2.7/site-packages/sage/doctest/forker.py", line 671, in _run
        self.compile_and_execute(example, compiler, test.globs)
      File "/var/lib/buildbot/slave/sage_git/build/local/lib/python2.7/site-packages/sage/doctest/forker.py", line 1095, in compile_and_execute
        exec(compiled, globs)
      File "<doctest sage.graphs.generic_graph.GenericGraph.?[2]>", line 1, in <module>
        G.pagerank(alpha=RealNumber('0.50'), algorithm="igraph")
      File "/var/lib/buildbot/slave/sage_git/build/local/lib/python2.7/site-packages/sage/graphs/generic_graph.py", line 9668, in pagerank
        raise PackageNotFoundError("igraph")
    PackageNotFoundError: the package 'igraph' was not found. You can install it by running 'sage -i igraph' in a shell
**********************************************************************
File "src/sage/graphs/generic_graph.py", line 9581, in sage.graphs.generic_graph.GenericGraph.?
Failed example:
    G.pagerank(algorithm="igraph")
Exception raised:
    Traceback (most recent call last):
      File "/var/lib/buildbot/slave/sage_git/build/local/lib/python2.7/site-packages/sage/doctest/forker.py", line 671, in _run
        self.compile_and_execute(example, compiler, test.globs)
      File "/var/lib/buildbot/slave/sage_git/build/local/lib/python2.7/site-packages/sage/doctest/forker.py", line 1095, in compile_and_execute
        exec(compiled, globs)
      File "<doctest sage.graphs.generic_graph.GenericGraph.?[10]>", line 1, in <module>
        G.pagerank(algorithm="igraph")
      File "/var/lib/buildbot/slave/sage_git/build/local/lib/python2.7/site-packages/sage/graphs/generic_graph.py", line 9668, in pagerank
        raise PackageNotFoundError("igraph")
    PackageNotFoundError: the package 'igraph' was not found. You can install it by running 'sage -i igraph' in a shell
**********************************************************************
1 item had failures:
   2 of 890 in sage.graphs.generic_graph.GenericGraph.?
    [3393 tests, 2 failures, 36.96 s]
**********************************************************************

comment:69 Changed 3 years ago by dcoudert

@Rajat: add to each doctest with igraph # optional - python_igraph, like this

-            sage: G.pagerank(alpha=0.50, algorithm="igraph")
+            sage: G.pagerank(alpha=0.50, algorithm="igraph")  # optional - python_igraph
             {0: 0.25, 1: 0.25, 2: 0.24999999999999997, 3: 0.24999999999999997}

and

-            sage: G.pagerank(algorithm="igraph")
+            sage: G.pagerank(algorithm="igraph")  # optional - python_igraph

@Volker: sorry for this.

comment:70 Changed 3 years ago by git

  • Commit changed from 804fe603a582470952b017d444ee8bb02925482f to f10629a7b9f0efb844de21eaddaf01f8492e59d2

Branch pushed to git repo; I updated commit sha1. New commits:

f10629aadded optional igraph parameter

comment:71 Changed 3 years ago by gh-rajat1433

Added the optional igraph thing. Understood the importance of it as it may fail tests on systems on which igraph is not installed..

comment:72 Changed 3 years ago by dcoudert

  • Status changed from new to needs_review

comment:73 Changed 3 years ago by gh-rajat1433

$ ./sage -t src/sage/graphs/generic_graph.py
Running doctests with ID 2019-04-14-09-52-00-5ed75243.
Git branch: u/gh-rajat1433/27480_page_rank_algo
Using --optional=dochtml,igraph,memlimit,mpir,python2,python_igraph,sage
Doctesting 1 file.
sage -t --warn-long 24.3 src/sage/graphs/generic_graph.py
    [3402 tests, 27.42 s]
----------------------------------------------------------------------
All tests passed!
----------------------------------------------------------------------
Total time for all tests: 28.3 seconds
    cpu time: 19.0 seconds
    cumulative wall time: 27.4 seconds

On my system all tests passes but patchbot has some failed examples don't know why...

comment:74 Changed 3 years ago by dcoudert

This is numerical noise due to floating point arithmetic. See http://doc.sagemath.org/html/en/developer/coding_basics.html#special-markup-to-influence-doctests

A solution is to add # abs tol 1e-9 to all doctests.

-            sage: G.pagerank(algorithm="Numpy")
+            sage: G.pagerank(algorithm="Numpy") # abs tol 1e-9

...

-            sage: G.pagerank()
+            sage: G.pagerank() # abs tol 1e-9

also for igraph

-            sage: G.pagerank(alpha=0.50, algorithm="igraph")  # optional - python_igraph
+            sage: G.pagerank(alpha=0.50, algorithm="igraph") # optional - python_igraph # abs tol 1e-9

comment:75 Changed 3 years ago by git

  • Commit changed from f10629a7b9f0efb844de21eaddaf01f8492e59d2 to 07a884b8ce9f1557b9ac1d2c98fa72aa0eea1e1e

Branch pushed to git repo; I updated commit sha1. New commits:

07a884bfloating point flag included

comment:76 Changed 3 years ago by dcoudert

  • Status changed from needs_review to positive_review

LGTM

comment:77 Changed 3 years ago by vbraun

  • Branch changed from u/gh-rajat1433/27480_page_rank_algo to 07a884b8ce9f1557b9ac1d2c98fa72aa0eea1e1e
  • Resolution set to fixed
  • Status changed from positive_review to closed

comment:78 follow-up: Changed 2 years ago by chapoton

  • Commit 07a884b8ce9f1557b9ac1d2c98fa72aa0eea1e1e deleted

There is an issue with igraph on the arando patchbot with 8.8.b4 and 8.8.b5:

**********************************************************************
File "src/sage/graphs/generic_graph.py", line 9663, in sage.graphs.generic_graph.GenericGraph.?
Failed example:
    G.pagerank(alpha=0.50, algorithm="igraph")  # optional - python_igraph
Expected:
    {0: 0.25, 1: 0.25, 2: 0.24999999999999997, 3: 0.24999999999999997}
Got:
    {0: 0.25, 1: 0.25, 2: 0.25, 3: 0.25}
**********************************************************************
1 item had failures:
   1 of 860 in sage.graphs.generic_graph.GenericGraph.?
    [3497 tests, 1 failure, 50.96 s]
----------------------------------------------------------------------
sage -t --long src/sage/graphs/generic_graph.py  # 1 doctest failed

comment:79 in reply to: ↑ 78 Changed 2 years ago by gh-rajat1433

I think I forgot to add # abs tol 1e-9 to this test.

Thanks David for opening #27811 for fixing it

Replying to chapoton:

There is an issue with igraph on the arando patchbot with 8.8.b4 and 8.8.b5:

**********************************************************************
File "src/sage/graphs/generic_graph.py", line 9663, in sage.graphs.generic_graph.GenericGraph.?
Failed example:
    G.pagerank(alpha=0.50, algorithm="igraph")  # optional - python_igraph
Expected:
    {0: 0.25, 1: 0.25, 2: 0.24999999999999997, 3: 0.24999999999999997}
Got:
    {0: 0.25, 1: 0.25, 2: 0.25, 3: 0.25}
**********************************************************************
1 item had failures:
   1 of 860 in sage.graphs.generic_graph.GenericGraph.?
    [3497 tests, 1 failure, 50.96 s]
----------------------------------------------------------------------
sage -t --long src/sage/graphs/generic_graph.py  # 1 doctest failed
Note: See TracTickets for help on using tickets.