Sage: Ticket #18137: Centrality betweenness in Sage
https://trac.sagemath.org/ticket/18137
<p>
I hate it that we do not appear in comparisons like the following, just because we are slower than the worst library <code>:-P</code>
</p>
<p>
<a class="ext-link" href="http://graph-tool.skewed.de/performance"><span class="icon"></span>http://graph-tool.skewed.de/performance</a>
</p>
<p>
With this branch we can compute the betweenness centrality in Sage with a decent speed.
</p>
<p>
Nathann
</p>
<p>
P.S.: The version of the code that deals with rational instead of floats has been removed because it is *much* slower (60x in some cases), and because I did not see how to make the two coexist without copy/pasting most of the code.
</p>
en-usSagehttps://trac.sagemath.org/chrome/site/logo_sagemath_trac.png
https://trac.sagemath.org/ticket/18137
Trac 1.1.6ncohenTue, 07 Apr 2015 19:40:31 GMTstatus changed; commit, branch set
https://trac.sagemath.org/ticket/18137#comment:1
https://trac.sagemath.org/ticket/18137#comment:1
<ul>
<li><strong>status</strong>
changed from <em>new</em> to <em>needs_review</em>
</li>
<li><strong>commit</strong>
set to <em>0402abffb7656e123ce64d38e6a8cb8392eecf89</em>
</li>
<li><strong>branch</strong>
set to <em>u/ncohen/18137</em>
</li>
</ul>
<p>
New commits:
</p>
<table class="wiki">
<tr><td><a class="ext-link" href="http://git.sagemath.org/sage.git/commit/?id=7b6329709f2d23f9b5b9b1995979c0c2ab2eccd4"><span class="icon"></span>7b63297</a></td><td><code>trac #18137: Add new centrality module</code>
</td></tr><tr><td><a class="ext-link" href="http://git.sagemath.org/sage.git/commit/?id=0402abffb7656e123ce64d38e6a8cb8392eecf89"><span class="icon"></span>0402abf</a></td><td><code>trac #18137: keep only the 'double' version, get rid of the rationals</code>
</td></tr></table>
TicketdcoudertTue, 07 Apr 2015 20:08:10 GMT
https://trac.sagemath.org/ticket/18137#comment:2
https://trac.sagemath.org/ticket/18137#comment:2
<p>
Hello,
</p>
<p>
I have only small remarks:
</p>
<ul><li>Wouldn't it be slightly faster to use arrays of bint rather than bitsets ? It would use more memory, but since you want to be fast, any saving is important.
</li><li>You could add a doctest to compare the result of your implementation with networkx
</li><li>You could pre-compute the <code>((n-1)*(n-2))</code>, although its minor improvement.
</li><li>You could add <code>cdef double x</code>
</li><li>Variables <code>k</code> and <code>d</code> are not used.
</li><li>You wrote <code>from centrality import centrality_betweenness</code>. Shouldn't it be <code>from sage.graphs.centrality import centrality_betweenness</code> ?
</li></ul>
TicketncohenTue, 07 Apr 2015 20:22:58 GMT
https://trac.sagemath.org/ticket/18137#comment:3
https://trac.sagemath.org/ticket/18137#comment:3
<p>
Hello,
</p>
<blockquote class="citation">
<ul><li>Wouldn't it be slightly faster to use arrays of bint rather than bitsets ? It would use more memory, but since you want to be fast, any saving is important.
</li></ul></blockquote>
<p>
It would not be much faster, because most of what this array would contain is zeroes (bint is an int in memory). Plus the bottleneck is float computation in this case <code>:-/</code>
</p>
<blockquote class="citation">
<ul><li>You could add a doctest to compare the result of your implementation with networkx
</li></ul></blockquote>
<p>
There is one, isn't there? In <code>centrality.pyx</code>. The one with a "tolerance" flag.
</p>
<blockquote class="citation">
<ul><li>You could pre-compute the <code>((n-1)*(n-2))</code>, although its minor improvement.
</li></ul></blockquote>
<p>
I don't think that it is worth it. Save a linear number of multiplications after all this work, really?.. <code>:-P</code>
</p>
<blockquote class="citation">
<ul><li>You could add <code>cdef double x</code>
</li></ul></blockquote>
<p>
Done.
</p>
<blockquote class="citation">
<ul><li>Variables <code>k</code> and <code>d</code> are not used.
</li></ul></blockquote>
<p>
Done.
</p>
<blockquote class="citation">
<ul><li>You wrote <code>from centrality import centrality_betweenness</code>. Shouldn't it be <code>from sage.graphs.centrality import centrality_betweenness</code> ?
</li></ul></blockquote>
<p>
They are in the same folder, so it works. It is even more robust as a result, as we can move them wherever we want and the path does not change.
</p>
<p>
I also changed a Cython flag which checks for exceptions when doing float divisions.
</p>
<p>
Nathann
</p>
TicketgitTue, 07 Apr 2015 20:23:19 GMTcommit changed
https://trac.sagemath.org/ticket/18137#comment:4
https://trac.sagemath.org/ticket/18137#comment:4
<ul>
<li><strong>commit</strong>
changed from <em>0402abffb7656e123ce64d38e6a8cb8392eecf89</em> to <em>47291d7a004cef908625eeb40ac3997645a7cce3</em>
</li>
</ul>
<p>
Branch pushed to git repo; I updated commit sha1. New commits:
</p>
<table class="wiki">
<tr><td><a class="ext-link" href="http://git.sagemath.org/sage.git/commit/?id=47291d7a004cef908625eeb40ac3997645a7cce3"><span class="icon"></span>47291d7</a></td><td><code>trac #18137: Review</code>
</td></tr></table>
TicketchapotonWed, 08 Apr 2015 06:21:49 GMT
https://trac.sagemath.org/ticket/18137#comment:5
https://trac.sagemath.org/ticket/18137#comment:5
<p>
there is numerical noise, add tolerance, see patchbot report.
</p>
TicketchapotonWed, 08 Apr 2015 06:23:28 GMTstatus changed
https://trac.sagemath.org/ticket/18137#comment:6
https://trac.sagemath.org/ticket/18137#comment:6
<ul>
<li><strong>status</strong>
changed from <em>needs_review</em> to <em>needs_work</em>
</li>
</ul>
TicketgitWed, 08 Apr 2015 07:15:55 GMTcommit changed
https://trac.sagemath.org/ticket/18137#comment:7
https://trac.sagemath.org/ticket/18137#comment:7
<ul>
<li><strong>commit</strong>
changed from <em>47291d7a004cef908625eeb40ac3997645a7cce3</em> to <em>421dc01c948b631dd227c17aeba0459080293b94</em>
</li>
</ul>
<p>
Branch pushed to git repo; I updated commit sha1. New commits:
</p>
<table class="wiki">
<tr><td><a class="ext-link" href="http://git.sagemath.org/sage.git/commit/?id=421dc01c948b631dd227c17aeba0459080293b94"><span class="icon"></span>421dc01</a></td><td><code>trac #18137: Numerical noise</code>
</td></tr></table>
TicketncohenWed, 08 Apr 2015 07:16:42 GMTstatus changed
https://trac.sagemath.org/ticket/18137#comment:8
https://trac.sagemath.org/ticket/18137#comment:8
<ul>
<li><strong>status</strong>
changed from <em>needs_work</em> to <em>needs_review</em>
</li>
</ul>
<p>
Another thing for which pathbot save us <code>:-P</code>
</p>
<p>
Thanks,
</p>
<p>
Nathann
</p>
TicketncohenWed, 08 Apr 2015 07:38:30 GMTdescription changed
https://trac.sagemath.org/ticket/18137#comment:9
https://trac.sagemath.org/ticket/18137#comment:9
<ul>
<li><strong>description</strong>
modified (<a href="/ticket/18137?action=diff&version=9">diff</a>)
</li>
</ul>
TicketvdelecroixWed, 08 Apr 2015 11:55:21 GMT
https://trac.sagemath.org/ticket/18137#comment:10
https://trac.sagemath.org/ticket/18137#comment:10
<p>
The advantage of using rationals is that it was exact! Here you are using floats but without any guarantee on the result. Aren't you? Do you have an estimate on the error depending on the number of vertices/edges? One solution solution would be to use ball arithmetic that also produce a bound on the error (see the recently added arb package). Or interval arithmetic (but that is slower).
</p>
<p>
Vincent
</p>
TicketdcoudertWed, 08 Apr 2015 12:35:42 GMT
https://trac.sagemath.org/ticket/18137#comment:11
https://trac.sagemath.org/ticket/18137#comment:11
<p>
Although Nathann would prefer not to, we could have 2 versions of the code, the fast one as default, and a slower exact one.
David.
</p>
TicketncohenWed, 08 Apr 2015 12:51:34 GMT
https://trac.sagemath.org/ticket/18137#comment:12
https://trac.sagemath.org/ticket/18137#comment:12
<p>
Hello,
</p>
<blockquote class="citation">
<p>
The advantage of using rationals is that it was exact!
</p>
</blockquote>
<p>
And this is the very reason why I wrote both implementations.
</p>
<p>
I am not so sure that it is a very big problem, however, as the algorithm will not add noise to noise like it can happen for PDE computations.
</p>
<p>
The current version of <code>centrality_betweenness</code>, from <strong>NetworkX</strong> shipped with Sage, also computes on floats (in Python):
</p>
<pre class="wiki"> sage: import networkx
sage: networkx.algorithms.centrality.betweenness._accumulate_basic??
</pre><p>
I wanted to check how <strong>Boost</strong> does it, but I was not able to locate the source code (God, how can anyone read those files???).
</p>
<p>
(15 minutes later)
</p>
<p>
Here it is! Line 338 of:
</p>
<p>
<a class="ext-link" href="http://boost.cvs.sourceforge.net/viewvc/boost/boost/boost/graph/betweenness_centrality.hpp?annotate=1.2.6.1"><span class="icon"></span>http://boost.cvs.sourceforge.net/viewvc/boost/boost/boost/graph/betweenness_centrality.hpp?annotate=1.2.6.1</a>
</p>
<p>
So the answer is that "it depends of dependency_type", which is.. A template.
</p>
<p>
For <strong>igraph</strong> it is apparently a double too:
<a class="ext-link" href="https://github.com/igraph/igraph/blob/master/src/centrality.c#L1685"><span class="icon"></span>https://github.com/igraph/igraph/blob/master/src/centrality.c#L1685</a>
<a class="ext-link" href="https://github.com/igraph/igraph/blob/master/src/centrality.c#L1804"><span class="icon"></span>https://github.com/igraph/igraph/blob/master/src/centrality.c#L1804</a>
</p>
<p>
For <strong>graph-tools</strong> (last of the libraries compared on the link in the ticket's description) it is apparently a double too, though I can't make sure for I do not find the <code>get_betweenness</code> function
</p>
<p>
<a class="ext-link" href="https://git.skewed.de/count0/graph-tool/blob/master/src/graph_tool/centrality/__init__.py#L326"><span class="icon"></span>https://git.skewed.de/count0/graph-tool/blob/master/src/graph_tool/centrality/__init__.py#L326</a>
</p>
<p>
Sooooooo please don't just limit your argumentation to "not exact=BAD". I care about this, and for this reason I implemented both (which definitely took more than a couple of minutes as you can imagine), but I do believe that for this kind of computations working on floats is not that bad, for I know when the divisions occur and, well, we do not mind much.
</p>
<p>
I would personally be very happy to have both in Sage, with an easy flag to switch from one implementation to the other. If you just checkout the first of my commits you will see that only one variable need to be changed so that double become rationals. My trouble is that using Cython's preprocessor instructions requires to run <code>sage -b</code>, and we do not want that.
</p>
<p>
I would also like to NOT have the same code copy/pasted twice, and to not pay for the 'if' inside of the loops.
</p>
<p>
I would be happy to have both if there is a free (in terms of computations) way to handle both at once, and a cheap (in term of lines of code) way to have both.
</p>
<p>
So far I did not find any way out, and I thought that the best was to have what everybody seemds interested in: computations on double (we can also turn them into 'long double' if necessary).
</p>
<p>
Nathann
</p>
<p>
P.S.: I uploaded a commit with both versions so that it will be available somewhere (and not on my computer only) if we ever need that implementation. I did that on purpose, to have it archived somewhere.
</p>
TicketvdelecroixWed, 08 Apr 2015 13:02:51 GMT
https://trac.sagemath.org/ticket/18137#comment:13
https://trac.sagemath.org/ticket/18137#comment:13
<p>
Replying to <a class="ticket" href="https://trac.sagemath.org/ticket/18137#comment:12" title="Comment 12">ncohen</a>:
</p>
<blockquote class="citation">
<p>
Sooooooo please don't just limit your argumentation to "not exact=BAD".
</p>
</blockquote>
<p>
Do not oversimplify. My argumentation was "not exact => extra care needed". Floats are wonderful because they are very fast.
</p>
<blockquote class="citation">
<p>
And this is the very reason why I wrote both implementations.
</p>
</blockquote>
<p>
Would be interesting to investigate (experimentally) the error propagation.
</p>
<blockquote class="citation">
<p>
I am not so sure that it is a very big problem, however, as the algorithm will not add noise to noise like it can happen for PDE computations.
</p>
</blockquote>
<p>
Already summing (a lot of) floating point numbers create problems. Simple (not so dramatic) example
</p>
<pre class="wiki">sage: sum(1.r/n for n in range(1,10000000))
16.69531126585727
sage: sum(1.r/n for n in range(9999999,0,-1))
16.695311265859964
</pre><p>
If you mix that with division, it is of course even worse.
</p>
<blockquote class="citation">
<p>
I care about this, and for this reason I implemented both (which definitely took more than a couple of minutes as you can imagine), but I do believe that for this kind of computations working on floats is not that bad, for I know when the divisions occur and, well, we do not mind much.
</p>
</blockquote>
<p>
I also believe so, but it would be better if we were sure and the documentation mentioned it. Something like: if you do have a graph with <code>m</code> vertices and <code>n</code> edges than the worst case is an error of <code>function(m,n)</code>.
</p>
<p>
Vincent
</p>
TicketncohenWed, 08 Apr 2015 13:20:49 GMT
https://trac.sagemath.org/ticket/18137#comment:14
https://trac.sagemath.org/ticket/18137#comment:14
<p>
Hello !
</p>
<p>
I agree that float operations make errors, but I do not know how to evaluate it. I expect the relative error to stay very very small in those cases, and in the graphs that are of interest for the networks community.
</p>
<p>
Would you know a trick to have both implementations available in the code (without recompilation)? I do not think that we can have 'real templates' in Cython, can we?
</p>
<p>
Nathann
</p>
TicketncohenWed, 08 Apr 2015 15:44:28 GMT
https://trac.sagemath.org/ticket/18137#comment:15
https://trac.sagemath.org/ticket/18137#comment:15
<p>
Okay. Here it is. It cost me the last four hours.
</p>
<p>
Nathann
</p>
TicketgitWed, 08 Apr 2015 15:44:56 GMTcommit changed
https://trac.sagemath.org/ticket/18137#comment:16
https://trac.sagemath.org/ticket/18137#comment:16
<ul>
<li><strong>commit</strong>
changed from <em>421dc01c948b631dd227c17aeba0459080293b94</em> to <em>2f2fbd47b8527ca6a4dac02f01f8c2167907cd20</em>
</li>
</ul>
<p>
Branch pushed to git repo; I updated commit sha1. This was a forced push. New commits:
</p>
<table class="wiki">
<tr><td><a class="ext-link" href="http://git.sagemath.org/sage.git/commit/?id=2f2fbd47b8527ca6a4dac02f01f8c2167907cd20"><span class="icon"></span>2f2fbd4</a></td><td><code>trac #18137: Add new centrality module</code>
</td></tr></table>
TicketvdelecroixWed, 08 Apr 2015 15:47:40 GMT
https://trac.sagemath.org/ticket/18137#comment:17
https://trac.sagemath.org/ticket/18137#comment:17
<p>
Replying to <a class="ticket" href="https://trac.sagemath.org/ticket/18137#comment:15" title="Comment 15">ncohen</a>:
</p>
<blockquote class="citation">
<p>
Okay. Here it is. It cost me the last four hours.
</p>
</blockquote>
<p>
Youhou! You initiated me to the world of Cython templating!
</p>
<p>
I am having a careful look right now.
</p>
TicketncohenWed, 08 Apr 2015 15:50:03 GMT
https://trac.sagemath.org/ticket/18137#comment:18
https://trac.sagemath.org/ticket/18137#comment:18
<blockquote class="citation">
<p>
Youhou! You initiated me to the world of Cython templating!
</p>
<p>
I am having a careful look right now.
</p>
</blockquote>
<p>
It is cool without being great. There are several limitations that you will only notice if you write some code yourself. In particular, the templating forced me to introduce a pure C function.
</p>
<p>
Nathann
</p>
TicketvdelecroixWed, 08 Apr 2015 16:07:11 GMT
https://trac.sagemath.org/ticket/18137#comment:19
https://trac.sagemath.org/ticket/18137#comment:19
<p>
At least from the following example, the computation with double is perfectly valid
</p>
<pre class="wiki">sage: g = graphs.RandomGNP(100,.2)
sage: cb0 = g.centrality_betweenness(exact=0)
sage: cb1 = g.centrality_betweenness(exact=1)
sage: max(cb0[i]-cb1[i] for i in cb0)
6.938893903907228e-18
</pre><p>
The mantissa of the double is <code>2^(-53) ~ 10^(-16)</code> and the numbers are about <code>0.01</code> which makes the error affect only the one or two last bits.
</p>
TicketvdelecroixWed, 08 Apr 2015 16:36:04 GMT
https://trac.sagemath.org/ticket/18137#comment:20
https://trac.sagemath.org/ticket/18137#comment:20
<p>
All right. I did not check what the algorithm does, but at least I can propose a simplification in the Cython fused business. Have a look at <code>public/18137</code>.
</p>
<p>
Vincent
</p>
TicketncohenWed, 08 Apr 2015 16:53:56 GMTcommit, branch changed
https://trac.sagemath.org/ticket/18137#comment:21
https://trac.sagemath.org/ticket/18137#comment:21
<ul>
<li><strong>commit</strong>
changed from <em>2f2fbd47b8527ca6a4dac02f01f8c2167907cd20</em> to <em>6dfcbc3b9043f911ab3f706d4b18dbce34b94d66</em>
</li>
<li><strong>branch</strong>
changed from <em>u/ncohen/18137</em> to <em>public/18137</em>
</li>
</ul>
TicketdcoudertWed, 08 Apr 2015 17:26:35 GMT
https://trac.sagemath.org/ticket/18137#comment:22
https://trac.sagemath.org/ticket/18137#comment:22
<p>
Hello,
</p>
<p>
you should add the description of parameter <code>exact</code> in the cython <code>centrality_betweenness</code> method.
</p>
<p>
It does matter much, but you could specify the return type of the C method (dict).
</p>
<p>
otherwise, the method is working perfectly on my mac ;)
</p>
<p>
David.
</p>
TicketgitWed, 08 Apr 2015 19:07:32 GMTcommit changed
https://trac.sagemath.org/ticket/18137#comment:23
https://trac.sagemath.org/ticket/18137#comment:23
<ul>
<li><strong>commit</strong>
changed from <em>6dfcbc3b9043f911ab3f706d4b18dbce34b94d66</em> to <em>f21ae32a86cfe0c9d86f75906491cb804af45b68</em>
</li>
</ul>
<p>
Branch pushed to git repo; I updated commit sha1. New commits:
</p>
<table class="wiki">
<tr><td><a class="ext-link" href="http://git.sagemath.org/sage.git/commit/?id=f21ae32a86cfe0c9d86f75906491cb804af45b68"><span class="icon"></span>f21ae32</a></td><td><code>trac #18137: Review</code>
</td></tr></table>
TicketncohenWed, 08 Apr 2015 19:09:15 GMTcc changed
https://trac.sagemath.org/ticket/18137#comment:24
https://trac.sagemath.org/ticket/18137#comment:24
<ul>
<li><strong>cc</strong>
<em>borassi</em> added
</li>
</ul>
TicketncohenThu, 09 Apr 2015 07:37:47 GMT
https://trac.sagemath.org/ticket/18137#comment:25
https://trac.sagemath.org/ticket/18137#comment:25
<p>
FYI: <a class="ext-link" href="https://groups.google.com/forum/#!topic/cython-users/xOw-LWUb7aY"><span class="icon"></span>https://groups.google.com/forum/#!topic/cython-users/xOw-LWUb7aY</a>
</p>
TicketchapotonThu, 09 Apr 2015 11:52:28 GMT
https://trac.sagemath.org/ticket/18137#comment:26
https://trac.sagemath.org/ticket/18137#comment:26
<p>
still not passing the tests
</p>
<p>
and please use <code>....:</code> newstyle doctest continuation
</p>
TicketgitThu, 09 Apr 2015 11:55:18 GMTcommit changed
https://trac.sagemath.org/ticket/18137#comment:27
https://trac.sagemath.org/ticket/18137#comment:27
<ul>
<li><strong>commit</strong>
changed from <em>f21ae32a86cfe0c9d86f75906491cb804af45b68</em> to <em>eaa6a3eccacf8f90064cde79febef149c7b8e5e8</em>
</li>
</ul>
<p>
Branch pushed to git repo; I updated commit sha1. New commits:
</p>
<table class="wiki">
<tr><td><a class="ext-link" href="http://git.sagemath.org/sage.git/commit/?id=eaa6a3eccacf8f90064cde79febef149c7b8e5e8"><span class="icon"></span>eaa6a3e</a></td><td><code>trac #18137: ... -> ....:</code>
</td></tr></table>
TicketncohenThu, 09 Apr 2015 11:57:10 GMT
https://trac.sagemath.org/ticket/18137#comment:28
https://trac.sagemath.org/ticket/18137#comment:28
<p>
Done. Not that I did not introduce any '...'. I only added a "# tol" flag on an existing one.
</p>
<p>
Nathann
</p>
TicketborassiThu, 09 Apr 2015 13:01:31 GMT
https://trac.sagemath.org/ticket/18137#comment:29
https://trac.sagemath.org/ticket/18137#comment:29
<p>
Hello,
this is my first comment: please, forgive me if I make mistakes due to my lack of experience, and I am willing to receive suggestions.
First of all, congrats for the excellent running time! Some comments follow (I think the first is important, while the others are only "style" comments):
</p>
<ol><li>bitset_union(seen,seen,next_layer): this operation takes time O(n), and it is performed at each layer. This means that, if there are many layers, then the running time might be O(n<sup>2</sup>) to perform a BFS, and O(mn<sup>2</sup>) overall (for instance, if the input is a path). I would change the code by memorizing the distances from the source, and making tests with distances. Example: it takes more time to analyze a path of length 10,000 than a G(N,p) random graph with N=10,000 and p=0.001 (with about 100,000 edges).
</li><li>Only to improve readability of the code: why do you use layer_current_beginning, layer_current_end, and layer_next_end? I would rather use a variable for the start of the queue and another for the end of the queue, as in a standard BFS. Is there any particular reason why you do not do it?
</li><li>In the comment part, Section "ALGORITHM", I would add something to explain what happens if k=0.
</li></ol><p>
Hope it is useful!
</p>
<p>
Michele
</p>
TicketncohenThu, 09 Apr 2015 13:25:20 GMT
https://trac.sagemath.org/ticket/18137#comment:30
https://trac.sagemath.org/ticket/18137#comment:30
<p>
Hello,
</p>
<blockquote class="citation">
<p>
this is my first comment: please, forgive me if I make mistakes due to my lack of experience, and I am willing to receive suggestions.
</p>
</blockquote>
<p>
Your comments are rather good, actually.
</p>
<blockquote class="citation">
<ol><li>bitset_union(seen,seen,next_layer): this operation takes time O(n), and it is performed at each layer. This means that, if there are many layers, then the running time might be O(n<sup>2</sup>) to perform a BFS, and O(mn<sup>2</sup>) overall (for instance, if the input is a path). I would change the code by memorizing the distances from the source, and making tests with distances. Example: it takes more time to analyze a path of length 10,000 than a G(N,p) random graph with N=10,000 and p=0.001 (with about 100,000 edges).
</li></ol></blockquote>
<p>
HMmm.. Well, indeed when thinking of a long path this may actually make a big difference. Do you feel like giving it a try and telling us about the difference?
</p>
<p>
I believe that it may make performances worse in some case, though I do not know by how much. Storing everything as a bitset (instead of storing distances) also makes things more compact in memory, and the cpu cache works better. Though again, I have no idea how much time that represents.
</p>
<blockquote class="citation">
<ol><li>Only to improve readability of the code: why do you use layer_current_beginning, layer_current_end, and layer_next_end? I would rather use a variable for the start of the queue and another for the end of the queue, as in a standard BFS. Is there any particular reason why you do not do it?
</li></ol></blockquote>
<p>
I do not understand your question. What do you mean by "a variable for the start of the que and another for the end of the queue"? This is precisely what <code>layer_current_beginning</code> and <code>layer_current_end</code> do, in my understanding.
</p>
<blockquote class="citation">
<ol><li>In the comment part, Section "ALGORITHM", I would add something to explain what happens if k=0.
</li></ol></blockquote>
<p>
In this case the sum is empty, and thus equal to 0. Do you think that it should be emphasized?
</p>
<p>
Nathann
</p>
TicketncohenThu, 09 Apr 2015 13:30:31 GMT
https://trac.sagemath.org/ticket/18137#comment:31
https://trac.sagemath.org/ticket/18137#comment:31
<p>
Okay, two news:
</p>
<p>
1) You are right, when profiling the code on a long path the bottleneck is this 'or'
</p>
<p>
2) We can have everything at the same time. Instead of clearing the new layer bitset and running this big 'or', we can keep adding stuff to it without ever clearing it, AND only adding the new elements to the 'seen' bitset at the end of the loop (using the content of the queue). 100% win. I will upload a commit in a second.
</p>
<p>
Nathann
</p>
TicketncohenThu, 09 Apr 2015 13:31:02 GMT
https://trac.sagemath.org/ticket/18137#comment:32
https://trac.sagemath.org/ticket/18137#comment:32
<p>
FYI, before the optimization
</p>
<pre class="wiki">sage: g = graphs.PathGraph(10000)
sage: %time _=g.centrality_betweenness()
CPU times: user 10 s, sys: 0 ns, total: 10 s
Wall time: 10 s
</pre>
TicketgitThu, 09 Apr 2015 13:35:36 GMTcommit changed
https://trac.sagemath.org/ticket/18137#comment:33
https://trac.sagemath.org/ticket/18137#comment:33
<ul>
<li><strong>commit</strong>
changed from <em>eaa6a3eccacf8f90064cde79febef149c7b8e5e8</em> to <em>c84caef2c24cf2a42bf802c6d2e829886fd74149</em>
</li>
</ul>
<p>
Branch pushed to git repo; I updated commit sha1. New commits:
</p>
<table class="wiki">
<tr><td><a class="ext-link" href="http://git.sagemath.org/sage.git/commit/?id=c84caef2c24cf2a42bf802c6d2e829886fd74149"><span class="icon"></span>c84caef</a></td><td><code>trac #18137: Thanks Michele!!!</code>
</td></tr></table>
TicketncohenThu, 09 Apr 2015 13:36:15 GMT
https://trac.sagemath.org/ticket/18137#comment:34
https://trac.sagemath.org/ticket/18137#comment:34
<p>
Here is is <code>:-)</code>
</p>
<pre class="wiki">sage: g = graphs.PathGraph(10000)
sage: %time _=g.centrality_betweenness()
CPU times: user 1.76 s, sys: 8 ms, total: 1.77 s
Wall time: 1.75 s
</pre><p>
Thanks,
</p>
<p>
Nathann
</p>
TicketborassiThu, 09 Apr 2015 14:00:18 GMT
https://trac.sagemath.org/ticket/18137#comment:35
https://trac.sagemath.org/ticket/18137#comment:35
<p>
Replying to <a class="ticket" href="https://trac.sagemath.org/ticket/18137#comment:30" title="Comment 30">ncohen</a>:
</p>
<blockquote class="citation">
<p>
Hello,
</p>
<blockquote class="citation">
<p>
this is my first comment: please, forgive me if I make mistakes due to my lack of experience, and I am willing to receive suggestions.
</p>
</blockquote>
<p>
Your comments are rather good, actually.
</p>
<blockquote class="citation">
<ol><li>bitset_union(seen,seen,next_layer): this operation takes time O(n), and it is performed at each layer. This means that, if there are many layers, then the running time might be O(n<sup>2</sup>) to perform a BFS, and O(mn<sup>2</sup>) overall (for instance, if the input is a path). I would change the code by memorizing the distances from the source, and making tests with distances. Example: it takes more time to analyze a path of length 10,000 than a G(N,p) random graph with N=10,000 and p=0.001 (with about 100,000 edges).
</li></ol></blockquote>
<p>
HMmm.. Well, indeed when thinking of a long path this may actually make a big difference. Do you feel like giving it a try and telling us about the difference?
</p>
<p>
I believe that it may make performances worse in some case, though I do not know by how much. Storing everything as a bitset (instead of storing distances) also makes things more compact in memory, and the cpu cache works better. Though again, I have no idea how much time that represents.
</p>
</blockquote>
<p>
Well, it seems you have already done it! :-)
</p>
<blockquote class="citation">
<blockquote class="citation">
<ol><li>Only to improve readability of the code: why do you use layer_current_beginning, layer_current_end, and layer_next_end? I would rather use a variable for the start of the queue and another for the end of the queue, as in a standard BFS. Is there any particular reason why you do not do it?
</li></ol></blockquote>
<p>
I do not understand your question. What do you mean by "a variable for the start of the que and another for the end of the queue"? This is precisely what <code>layer_current_beginning</code> and <code>layer_current_end</code> do, in my understanding.
</p>
</blockquote>
<p>
My point is that I do not like the nested loops:
</p>
<pre class="wiki">while layer_current_beginning<layer_current_end:
for j in range(layer_current_beginning,layer_current_end):
</pre><p>
I would rather use only the <code>while</code> loop, by replacing <code>j</code> with <code>layer_current_beginning</code>, and adding <code>layer_current_beginning++</code> at the end of the <code>while</code>. However, I realize now that, if you do not want to store distances, this is probably not possible, because you have two <code>for</code> loops. So, maybe the best is to leave it as it is.
</p>
<blockquote class="citation">
<blockquote class="citation">
<ol><li>In the comment part, Section "ALGORITHM", I would add something to explain what happens if k=0.
</li></ol></blockquote>
<p>
In this case the sum is empty, and thus equal to 0. Do you think that it should be emphasized?
</p>
</blockquote>
<p>
That's what I meant, but if you think it's not worth doing it, no problem!
</p>
<blockquote class="citation">
<p>
Nathann
</p>
</blockquote>
TicketncohenThu, 09 Apr 2015 14:11:04 GMT
https://trac.sagemath.org/ticket/18137#comment:36
https://trac.sagemath.org/ticket/18137#comment:36
<p>
Hello,
</p>
<blockquote class="citation">
<p>
Well, it seems you have already done it! :-)
</p>
</blockquote>
<p>
Well.. Three lines of code for an "all-cases improvement" and an order of magnitude for the worst case... Of course <code>:-P</code>
</p>
<blockquote class="citation">
<p>
My point is that I do not like the nested loops:
</p>
</blockquote>
<blockquote class="citation">
<p>
I would rather use only the <code>while</code> loop, by replacing <code>j</code> with <code>layer_current_beginning</code>, and adding <code>layer_current_beginning++</code> at the end of the <code>while</code>. However, I realize now that, if you do not want to store distances, this is probably not possible, because you have two <code>for</code> loops. So, maybe the best is to leave it as it is.
</p>
</blockquote>
<p>
Yep yep, I am forced to do that if I want to know where exactly I am without storing distances <code>^^;</code>
</p>
<blockquote class="citation">
<p>
That's what I meant, but if you think it's not worth doing it, no problem!
</p>
</blockquote>
<p>
I just looked at this explanation, and I do not see any way to hint at it (without saying the obvious 'if the sum is empty, well, it's zero'). SOooooo I will pass.
</p>
<p>
Thanks again for the speedup,
</p>
<p>
Nathann
</p>
TicketdcoudertThu, 09 Apr 2015 14:15:53 GMT
https://trac.sagemath.org/ticket/18137#comment:37
https://trac.sagemath.org/ticket/18137#comment:37
<p>
One possible extra speedup: decompose the graph into biconnected components ;)
the centrality of cut vertices is obvious to compute. For other vertices it might be more difficult to gather every parts.
</p>
<p>
David.
</p>
TicketncohenThu, 09 Apr 2015 14:17:50 GMT
https://trac.sagemath.org/ticket/18137#comment:38
https://trac.sagemath.org/ticket/18137#comment:38
<p>
Ahahah. Well, honestly I don't mind much. I am only implementing this to be able to say 6 months from now (future stable release) to anybody that compares graph libraries that they should also include Sage, and that we are faster than everybody else. Boost included <code>:-P</code>
</p>
<p>
Nathann
</p>
TicketdcoudertThu, 09 Apr 2015 14:19:57 GMT
https://trac.sagemath.org/ticket/18137#comment:39
https://trac.sagemath.org/ticket/18137#comment:39
<p>
That's fair enough ;)
</p>
<p>
We will certainly have future patches for other stuff.
</p>
<p>
One point: apparently boost computes simultaneously the vertex and the edge betweenness. This may explain some slow-down.
</p>
<p>
David.
</p>
TicketncohenThu, 09 Apr 2015 14:25:06 GMT
https://trac.sagemath.org/ticket/18137#comment:40
https://trac.sagemath.org/ticket/18137#comment:40
<p>
Well. We can have it for free too if we care. Instead of
</p>
<pre class="wiki">betweenness_source[v] += (betweenness_source[u]+1)*(n_paths_from_source[v]/n_paths_from_source[u])
</pre><p>
you must save the right value somewhere (that's your edge betweenness). It may induce a small loss in speed, but certainly not an integer factor.
</p>
<p>
Nathann
</p>
<p>
P.S.: Okay, now that I implemented so many cool things would anybody feel like ending the review? <code>:-P</code>
</p>
TicketborassiThu, 09 Apr 2015 14:29:27 GMTstatus changed
https://trac.sagemath.org/ticket/18137#comment:41
https://trac.sagemath.org/ticket/18137#comment:41
<ul>
<li><strong>status</strong>
changed from <em>needs_review</em> to <em>positive_review</em>
</li>
</ul>
TicketncohenThu, 09 Apr 2015 14:40:32 GMT
https://trac.sagemath.org/ticket/18137#comment:42
https://trac.sagemath.org/ticket/18137#comment:42
<p>
HMmmmmmmmmmm <code>O_o</code>
</p>
<p>
Well, I don't mind much your setting my ticket to positive review, but given that you are new here... Have you looked at this page?
</p>
<p>
<a href="http://www.sagemath.org/doc/developer/reviewer_checklist.html">http://www.sagemath.org/doc/developer/reviewer_checklist.html</a>
</p>
<p>
I am not saying that newcomers are not allowed to change the status -- everybody is, and everybody can review anything here -- it is just that I want to make sure that you know what reviewing a ticket involves (running the tests, wondering if they are sufficient, building the doc).
</p>
<p>
Thanks anyway,
</p>
<p>
Nathann
</p>
TicketborassiThu, 09 Apr 2015 14:47:05 GMTstatus changed
https://trac.sagemath.org/ticket/18137#comment:43
https://trac.sagemath.org/ticket/18137#comment:43
<ul>
<li><strong>status</strong>
changed from <em>positive_review</em> to <em>needs_review</em>
</li>
</ul>
TicketborassiThu, 09 Apr 2015 14:50:02 GMT
https://trac.sagemath.org/ticket/18137#comment:44
https://trac.sagemath.org/ticket/18137#comment:44
<p>
Sorry, this is the "mistake due to lack of experience". I thought "positive review" meant that I was happy with the code, but now I understand it is much more. I think it's better to leave this issue to more experienced people.
</p>
TicketncohenThu, 09 Apr 2015 14:53:53 GMT
https://trac.sagemath.org/ticket/18137#comment:45
https://trac.sagemath.org/ticket/18137#comment:45
<p>
No proooooob!!! If you have some spare time you can read our manual a bit. Reviewing a ticket is not very complicated and the 'technical checks' do not take more than a couple of minutes once you get used to them. And of course you can ask us any question if the manual isn't clear <code>;-)</code>
</p>
<p>
Nathann
</p>
TicketdcoudertThu, 09 Apr 2015 17:48:05 GMTstatus changed
https://trac.sagemath.org/ticket/18137#comment:46
https://trac.sagemath.org/ticket/18137#comment:46
<ul>
<li><strong>status</strong>
changed from <em>needs_review</em> to <em>needs_work</em>
</li>
</ul>
<p>
One issue:
</p>
<pre class="wiki">sage: G = Graph()
sage: G.centrality_betweenness()
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
...
ValueError: bitset capacity must be greater than 0
</pre><p>
Did you know that your code is also working for multi-graphs?
</p>
<pre class="wiki">sage: G = Graph(multiedges=True)
sage: G.add_edge(0,1)
sage: G.add_edge(0,1)
sage: G.add_edge(1,2)
sage: G.add_edge(2,3)
sage: G.add_edge(0,3)
sage: G.centrality_betweenness(exact=1)
{0: 2/9, 1: 2/9, 2: 1/9, 3: 1/9}
</pre>
TicketgitFri, 10 Apr 2015 06:52:58 GMTcommit changed
https://trac.sagemath.org/ticket/18137#comment:47
https://trac.sagemath.org/ticket/18137#comment:47
<ul>
<li><strong>commit</strong>
changed from <em>c84caef2c24cf2a42bf802c6d2e829886fd74149</em> to <em>c9f83e7edf6d3b5a7e74b22a98127fbdbbc8cee6</em>
</li>
</ul>
<p>
Branch pushed to git repo; I updated commit sha1. New commits:
</p>
<table class="wiki">
<tr><td><a class="ext-link" href="http://git.sagemath.org/sage.git/commit/?id=c9f83e7edf6d3b5a7e74b22a98127fbdbbc8cee6"><span class="icon"></span>c9f83e7</a></td><td><code>trac #18137: Trivial cases... As always.</code>
</td></tr></table>
TicketncohenFri, 10 Apr 2015 06:55:12 GMT
https://trac.sagemath.org/ticket/18137#comment:48
https://trac.sagemath.org/ticket/18137#comment:48
<blockquote class="citation">
<p>
One issue:
</p>
</blockquote>
<p>
Right. Fixed.
</p>
<blockquote class="citation">
<p>
Did you know that your code is also working for multi-graphs?
</p>
</blockquote>
<p>
Yeah, that was a good news! At some point I wondered whether I should add a 'scream if not simple' somewhere, then figured out that it worked fine. It also extends the definition in the most natural way, i.e. by considering a path as a set of edges instead of a set of vertices.
</p>
<p>
And it also works for loops <code>:-PPPP</code>
</p>
<p>
Nathann
</p>
TicketdcoudertFri, 10 Apr 2015 15:29:52 GMTstatus changed; reviewer set
https://trac.sagemath.org/ticket/18137#comment:49
https://trac.sagemath.org/ticket/18137#comment:49
<ul>
<li><strong>status</strong>
changed from <em>needs_work</em> to <em>positive_review</em>
</li>
<li><strong>reviewer</strong>
set to <em>David Coudert</em>
</li>
</ul>
<p>
Good.
</p>
TicketvbraunSun, 12 Apr 2015 11:57:04 GMTstatus changed
https://trac.sagemath.org/ticket/18137#comment:50
https://trac.sagemath.org/ticket/18137#comment:50
<ul>
<li><strong>status</strong>
changed from <em>positive_review</em> to <em>needs_work</em>
</li>
</ul>
<p>
I'm getting this on 32-bit. You should probably add an <code># abs tol</code>
</p>
<pre class="wiki">sage -t --long src/sage/graphs/graph.py
**********************************************************************
File "src/sage/graphs/graph.py", line 4662, in sage.graphs.graph.Graph.?
Failed example:
(graphs.ChvatalGraph()).centrality_betweenness(
normalized=False) # abs tol abs 1e10
Expected:
{0: 3.833333333333333, 1: 3.833333333333333, 2: 3.333333333333333,
3: 3.333333333333333, 4: 3.833333333333333, 5: 3.833333333333333,
6: 3.333333333333333, 7: 3.333333333333333, 8: 3.333333333333333,
9: 3.333333333333333, 10: 3.333333333333333,
11: 3.333333333333333}
Got:
{0: 3.833333333333333,
1: 3.833333333333333,
2: 3.333333333333333,
3: 3.333333333333333,
4: 3.833333333333333,
5: 3.833333333333333,
6: 3.333333333333333,
7: 3.3333333333333335,
8: 3.333333333333333,
9: 3.333333333333333,
10: 3.333333333333333,
11: 3.333333333333333}
**********************************************************************
1 item had failures:
1 of 66 in sage.graphs.graph.Graph.?
[652 tests, 1 failure, 15.40 s]
</pre>
TicketgitSun, 12 Apr 2015 12:59:16 GMTcommit changed
https://trac.sagemath.org/ticket/18137#comment:51
https://trac.sagemath.org/ticket/18137#comment:51
<ul>
<li><strong>commit</strong>
changed from <em>c9f83e7edf6d3b5a7e74b22a98127fbdbbc8cee6</em> to <em>dddf5024500101cae695b56943f69d433351276e</em>
</li>
</ul>
<p>
Branch pushed to git repo; I updated commit sha1. New commits:
</p>
<table class="wiki">
<tr><td><a class="ext-link" href="http://git.sagemath.org/sage.git/commit/?id=dddf5024500101cae695b56943f69d433351276e"><span class="icon"></span>dddf502</a></td><td><code>trac #18137: broken doctest</code>
</td></tr></table>
TicketncohenSun, 12 Apr 2015 12:59:29 GMTstatus changed
https://trac.sagemath.org/ticket/18137#comment:52
https://trac.sagemath.org/ticket/18137#comment:52
<ul>
<li><strong>status</strong>
changed from <em>needs_work</em> to <em>positive_review</em>
</li>
</ul>
TicketgitSun, 12 Apr 2015 13:01:26 GMTstatus, commit changed
https://trac.sagemath.org/ticket/18137#comment:53
https://trac.sagemath.org/ticket/18137#comment:53
<ul>
<li><strong>status</strong>
changed from <em>positive_review</em> to <em>needs_review</em>
</li>
<li><strong>commit</strong>
changed from <em>dddf5024500101cae695b56943f69d433351276e</em> to <em>2db68fb0f56ce38214484cd4619e2fb2a8f4aa7b</em>
</li>
</ul>
<p>
Branch pushed to git repo; I updated commit sha1 and set ticket back to needs_review. This was a forced push. New commits:
</p>
<table class="wiki">
<tr><td><a class="ext-link" href="http://git.sagemath.org/sage.git/commit/?id=2db68fb0f56ce38214484cd4619e2fb2a8f4aa7b"><span class="icon"></span>2db68fb</a></td><td><code>trac #18137: broken doctest</code>
</td></tr></table>
TicketncohenSun, 12 Apr 2015 13:01:43 GMTstatus changed
https://trac.sagemath.org/ticket/18137#comment:54
https://trac.sagemath.org/ticket/18137#comment:54
<ul>
<li><strong>status</strong>
changed from <em>needs_review</em> to <em>positive_review</em>
</li>
</ul>
TicketvbraunTue, 14 Apr 2015 19:43:06 GMTstatus, branch changed; resolution set
https://trac.sagemath.org/ticket/18137#comment:55
https://trac.sagemath.org/ticket/18137#comment:55
<ul>
<li><strong>status</strong>
changed from <em>positive_review</em> to <em>closed</em>
</li>
<li><strong>resolution</strong>
set to <em>fixed</em>
</li>
<li><strong>branch</strong>
changed from <em>public/18137</em> to <em>2db68fb0f56ce38214484cd4619e2fb2a8f4aa7b</em>
</li>
</ul>
Ticket