Sage: Ticket #18250: G.triangles_count speedup
https://trac.sagemath.org/ticket/18250
<p>
I thought that we had a decent implementation of <code>G.triangles_count</code>. Turns out that we did not.
</p>
<p>
Here is what the branch does:
</p>
<ul><li>Exception on non-simple graphs. The digraph version can handle loops (it
cannot handle multiple edges) and the graph version cannot handle any of the
two.
</li></ul><ul><li>(obvious) speedup in the 'iter' algorithm. Before:
<pre class="wiki">sage: %timeit _=g.triangles_count(algorithm='iter')
1 loops, best of 3: 10.2 s per loop
</pre>After:
<pre class="wiki">sage: g=graphs.RandomGNP(700,.3)
sage: %timeit _=g.triangles_count(algorithm='iter')
1 loops, best of 3: 6.37 s per loop
</pre></li></ul><ul><li>Added <code>triangles_count</code> in <code>static_sparse_graph</code> and another one in
<code>static_dense_graph</code>. They first copy the graph in a proper data structure and
count the triangles on the copy. They can take require more memory (though
they are 100000x cheaper than the default data structure)
</li></ul><ul><li>The default algorithm is 'iter' in the code but 'matrix' in the doc. Fixed:
<code>sparse_copy</code> is now the default.
</li></ul><p>
Overall, comparing default implementations:
</p>
<p>
Before
</p>
<pre class="wiki">sage: g = graphs.RandomGNP(500,.5)
sage: %timeit g.triangles_count()
1 loops, best of 3: 9.78 s per loop
</pre><p>
After
</p>
<pre class="wiki">sage: g = graphs.RandomGNP(500,.5)
sage: %timeit g.triangles_count()
10 loops, best of 3: 162 ms per loop
</pre>en-usSagehttps://trac.sagemath.org/chrome/site/logo_sagemath_trac.png
https://trac.sagemath.org/ticket/18250
Trac 1.1.6ncohenSun, 19 Apr 2015 09:41:35 GMTstatus, description changed; commit, branch set
https://trac.sagemath.org/ticket/18250#comment:1
https://trac.sagemath.org/ticket/18250#comment:1
<ul>
<li><strong>status</strong>
changed from <em>new</em> to <em>needs_review</em>
</li>
<li><strong>commit</strong>
set to <em>e780f711afba40b34c0d11b65adaf3cc4429fac2</em>
</li>
<li><strong>description</strong>
modified (<a href="/ticket/18250?action=diff&version=1">diff</a>)
</li>
<li><strong>branch</strong>
set to <em>public/18250</em>
</li>
</ul>
<p>
New commits:
</p>
<table class="wiki">
<tr><td><a class="ext-link" href="http://git.sagemath.org/sage.git/commit/?id=e780f711afba40b34c0d11b65adaf3cc4429fac2"><span class="icon"></span>e780f71</a></td><td><code>trac #18250: G.triangles_count speedup</code>
</td></tr></table>
TicketgitSun, 19 Apr 2015 11:04:37 GMTcommit changed
https://trac.sagemath.org/ticket/18250#comment:2
https://trac.sagemath.org/ticket/18250#comment:2
<ul>
<li><strong>commit</strong>
changed from <em>e780f711afba40b34c0d11b65adaf3cc4429fac2</em> to <em>ed8bb3c374929b2a32b3e9310124990eb5b2350c</em>
</li>
</ul>
<p>
Branch pushed to git repo; I updated commit sha1. New commits:
</p>
<table class="wiki">
<tr><td><a class="ext-link" href="http://git.sagemath.org/sage.git/commit/?id=ed8bb3c374929b2a32b3e9310124990eb5b2350c"><span class="icon"></span>ed8bb3c</a></td><td><code>Trac 18250: documentation</code>
</td></tr></table>
TicketvdelecroixSun, 19 Apr 2015 11:05:56 GMTstatus changed
https://trac.sagemath.org/ticket/18250#comment:3
https://trac.sagemath.org/ticket/18250#comment:3
<ul>
<li><strong>status</strong>
changed from <em>needs_review</em> to <em>needs_info</em>
</li>
</ul>
<p>
Hello,
</p>
<p>
Small commit for documentation at <code>public/18250</code>.
</p>
<p>
The algorithm <code>'iter'</code> returns <code>int</code> while <code>dense_copy</code> returns <code>Integer</code>. This would better be uniform.
</p>
<p>
You always perform a copy?! Isn't it possible to have a graph whose backend is already a <code>static_sparse_graph</code> or a <code>static_dense_graph</code>?
</p>
<p>
Vincent
</p>
TicketgitSun, 19 Apr 2015 12:36:44 GMTcommit changed
https://trac.sagemath.org/ticket/18250#comment:4
https://trac.sagemath.org/ticket/18250#comment:4
<ul>
<li><strong>commit</strong>
changed from <em>ed8bb3c374929b2a32b3e9310124990eb5b2350c</em> to <em>fa6115dc6cdea907dad9c2cd69668f5e71660024</em>
</li>
</ul>
<p>
Branch pushed to git repo; I updated commit sha1. New commits:
</p>
<table class="wiki">
<tr><td><a class="ext-link" href="http://git.sagemath.org/sage.git/commit/?id=fa6115dc6cdea907dad9c2cd69668f5e71660024"><span class="icon"></span>fa6115d</a></td><td><code>trac #18250: Integer return type and avoid useless copies</code>
</td></tr></table>
TicketncohenSun, 19 Apr 2015 12:43:14 GMTstatus changed
https://trac.sagemath.org/ticket/18250#comment:5
https://trac.sagemath.org/ticket/18250#comment:5
<ul>
<li><strong>status</strong>
changed from <em>needs_info</em> to <em>needs_review</em>
</li>
</ul>
<p>
Yooooooooooooooo,
</p>
<blockquote class="citation">
<p>
Small commit for documentation at <code>public/18250</code>.
</p>
</blockquote>
<p>
Oh, right. Thanks.
</p>
<blockquote class="citation">
<p>
The algorithm <code>'iter'</code> returns <code>int</code> while <code>dense_copy</code> returns <code>Integer</code>. This would better be uniform.
</p>
</blockquote>
<p>
Done.
</p>
<blockquote class="citation">
<p>
You always perform a copy?! Isn't it possible to have a graph whose backend is already a <code>static_sparse_graph</code> or a <code>static_dense_graph</code>?
</p>
</blockquote>
<p>
Well, in this case copies are eally cheap, so I personally prefer a simpler code to one that avoids that copy.
</p>
<p>
Still, I implemented it. There is a 'trick' somewhere:
</p>
<p>
1) I first build a copy of the graph with <code>G.copy(immutable=True)</code> (which makes no copy if the graph is already immutable), then access its internal data structure
</p>
<p>
2) If I do <code>g[0] = G.copy(immutable=True)._backend._cg.g[0]</code> I get a segfault because G.copy() is immediately deallocated. So I first do <code>G = G.copy(immutable=True)._backend</code>, and *then* access its internal data.
</p>
<p>
Fortunately the copy <code>G</code> is not deleted until the end of the function. If Cython ever notices that (as far as it is concerned) it is never used again in the function, then it will be deallocated and we will get a segfault again.
</p>
<p>
Nathann
</p>
TicketvdelecroixSun, 19 Apr 2015 12:47:34 GMT
https://trac.sagemath.org/ticket/18250#comment:6
https://trac.sagemath.org/ticket/18250#comment:6
<p>
Replying to <a class="ticket" href="https://trac.sagemath.org/ticket/18250#comment:5" title="Comment 5">ncohen</a>:
</p>
<blockquote class="citation">
<blockquote class="citation">
<p>
You always perform a copy?! Isn't it possible to have a graph whose backend is already a <code>static_sparse_graph</code> or a <code>static_dense_graph</code>?
</p>
</blockquote>
<p>
Fortunately the copy <code>G</code> is not deleted until the end of the function. If Cython ever notices that (as far as it is concerned) it is never used again in the function, then it will be deallocated and we will get a segfault again.
</p>
</blockquote>
<p>
Cython does not perform deallocation, it is Python job. Though Cython does increment/decrement the reference counter. And precisely, doing <code>G = G.copy(immutable=True)</code> precisely increment the reference counter!
</p>
<p>
Looks good! I am running the tests...
</p>
<p>
Vincent
</p>
TicketncohenSun, 19 Apr 2015 13:10:07 GMT
https://trac.sagemath.org/ticket/18250#comment:7
https://trac.sagemath.org/ticket/18250#comment:7
<blockquote class="citation">
<p>
Cython does not perform deallocation, it is Python job.
</p>
</blockquote>
<p>
Well if you prefer, but this code is translated into C, and gcc may notice that the object is totally useless. They must have done something to keep those Python variables until the end.
</p>
<p>
Nathann
</p>
TicketvdelecroixSun, 19 Apr 2015 13:18:15 GMT
https://trac.sagemath.org/ticket/18250#comment:8
https://trac.sagemath.org/ticket/18250#comment:8
<p>
Replying to <a class="ticket" href="https://trac.sagemath.org/ticket/18250#comment:7" title="Comment 7">ncohen</a>:
</p>
<blockquote class="citation">
<blockquote class="citation">
<p>
Cython does not perform deallocation, it is Python job.
</p>
</blockquote>
<p>
Well if you prefer, but this code is translated into C, and gcc may notice that the object is totally useless. They must have done something to keep those Python variables until the end.
</p>
</blockquote>
<p>
Nope. gcc will not do that. An affectation <code>G = H</code> is not one instruction if <code>H</code> is a Python object. Cython translates it into something like
</p>
<pre class="wiki">cdef PyObject * G = H;
Py_INCREF(G);
</pre><p>
whick prevents deallocation of H until G is garbage collected (typically at the end of the function or until a call of <code>del G</code>).
</p>
<p>
Vincent
</p>
TicketvdelecroixSun, 19 Apr 2015 13:24:22 GMTreviewer set
https://trac.sagemath.org/ticket/18250#comment:9
https://trac.sagemath.org/ticket/18250#comment:9
<ul>
<li><strong>reviewer</strong>
set to <em>Vincent Delecroix</em>
</li>
</ul>
<p>
I do not like the default choice being <code>sparse copy</code>. Especially if <code>G</code> is implemented as a static dense graph! Wouldn't it be better to have a default <code>algorithm=None</code> and be more clever about choosing the right algorithm?
</p>
TicketncohenSun, 19 Apr 2015 13:28:12 GMT
https://trac.sagemath.org/ticket/18250#comment:10
https://trac.sagemath.org/ticket/18250#comment:10
<p>
I don't think that anybody in Sage plays with data structure except me, in Sage's subfunctions. I can do it, but I am rather convinced that it is useless. It may be useful in one or two years if I can rewrite and clean all this stuff.
</p>
<p>
Will do.
</p>
<p>
Nathann
</p>
TicketvdelecroixSun, 19 Apr 2015 13:32:21 GMT
https://trac.sagemath.org/ticket/18250#comment:11
https://trac.sagemath.org/ticket/18250#comment:11
<p>
Replying to <a class="ticket" href="https://trac.sagemath.org/ticket/18250#comment:10" title="Comment 10">ncohen</a>:
</p>
<blockquote class="citation">
<p>
I don't think that anybody in Sage plays with data structure except me, in Sage's subfunctions. I can do it, but I am rather convinced that it is useless. It may be useful in one or two years if I can rewrite and clean all this stuff.
</p>
</blockquote>
<p>
Ok. Let say something based on the density.
</p>
TicketncohenSun, 19 Apr 2015 13:38:54 GMT
https://trac.sagemath.org/ticket/18250#comment:12
https://trac.sagemath.org/ticket/18250#comment:12
<blockquote class="citation">
<p>
Ok. Let say something based on the density.
</p>
</blockquote>
<p>
<code>O_o</code>
</p>
<p>
Nononono. That's for the user to decide. I don't want ugly <code>density>0.3</code> which may only make sense on a specific architecture, and on the kind of graphs that were used for the tests.
</p>
TicketgitSun, 19 Apr 2015 13:55:27 GMTcommit changed
https://trac.sagemath.org/ticket/18250#comment:13
https://trac.sagemath.org/ticket/18250#comment:13
<ul>
<li><strong>commit</strong>
changed from <em>fa6115dc6cdea907dad9c2cd69668f5e71660024</em> to <em>fd88c98cfc2fee8d0750cd9b6f88d1d7b8177ed4</em>
</li>
</ul>
<p>
Branch pushed to git repo; I updated commit sha1. New commits:
</p>
<table class="wiki">
<tr><td><a class="ext-link" href="http://git.sagemath.org/sage.git/commit/?id=fd88c98cfc2fee8d0750cd9b6f88d1d7b8177ed4"><span class="icon"></span>fd88c98</a></td><td><code>trac #18250: algorithm=None by default and better 'wrong values' check</code>
</td></tr></table>
TicketvdelecroixSun, 19 Apr 2015 14:27:13 GMT
https://trac.sagemath.org/ticket/18250#comment:14
https://trac.sagemath.org/ticket/18250#comment:14
<p>
Hello,
</p>
<p>
The method <code>matrix</code> does work for non simple graphs, right? (after removing the loops)
</p>
<p>
Vincent
</p>
TicketncohenSun, 19 Apr 2015 14:28:23 GMT
https://trac.sagemath.org/ticket/18250#comment:15
https://trac.sagemath.org/ticket/18250#comment:15
<blockquote class="citation">
<p>
The method <code>matrix</code> does work for non simple graphs, right? (after removing the loops)
</p>
</blockquote>
<p>
After removing loops and multiple edges, yes.
</p>
<p>
Nathann
</p>
TicketvdelecroixSun, 19 Apr 2015 14:31:37 GMT
https://trac.sagemath.org/ticket/18250#comment:16
https://trac.sagemath.org/ticket/18250#comment:16
<p>
Replying to <a class="ticket" href="https://trac.sagemath.org/ticket/18250#comment:15" title="Comment 15">ncohen</a>:
</p>
<blockquote class="citation">
<blockquote class="citation">
<p>
The method <code>matrix</code> does work for non simple graphs, right? (after removing the loops)
</p>
</blockquote>
<p>
After removing loops and multiple edges, yes.
</p>
</blockquote>
<p>
Please, consider the following <strong>right</strong> answers
</p>
<pre class="wiki">sage: G = Graph([(0,1),(0,1),(1,2),(2,0)], multiedges=True)
sage: (G.adjacency_matrix()**3).trace() / 6
2
sage: sage: G = Graph([(0,1),(0,1),(1,2), (1,2), (2,0)], multiedges=True)
sage: (G.adjacency_matrix()**3).trace()//6
4
</pre>
TicketncohenSun, 19 Apr 2015 14:32:43 GMT
https://trac.sagemath.org/ticket/18250#comment:17
https://trac.sagemath.org/ticket/18250#comment:17
<p>
What do you mean?
</p>
TicketvdelecroixSun, 19 Apr 2015 14:33:40 GMT
https://trac.sagemath.org/ticket/18250#comment:18
https://trac.sagemath.org/ticket/18250#comment:18
<p>
Replying to <a class="ticket" href="https://trac.sagemath.org/ticket/18250#comment:17" title="Comment 17">ncohen</a>:
</p>
<blockquote class="citation">
<p>
What do you mean?
</p>
</blockquote>
<p>
That the method <code>algorithm=matrix</code> is valid for non simple graphs (once you removed the loops).
</p>
TicketncohenSun, 19 Apr 2015 14:35:24 GMT
https://trac.sagemath.org/ticket/18250#comment:19
https://trac.sagemath.org/ticket/18250#comment:19
<p>
Depends what you call a triangle. For me it is a set of three points adjacent to each other.
</p>
TicketncohenSun, 19 Apr 2015 14:36:41 GMT
https://trac.sagemath.org/ticket/18250#comment:20
https://trac.sagemath.org/ticket/18250#comment:20
<p>
If it is a 'closed walk of length 3' then you should keep the loops inside.
</p>
TicketvdelecroixSun, 19 Apr 2015 14:38:32 GMT
https://trac.sagemath.org/ticket/18250#comment:21
https://trac.sagemath.org/ticket/18250#comment:21
<p>
Replying to <a class="ticket" href="https://trac.sagemath.org/ticket/18250#comment:20" title="Comment 20">ncohen</a>:
</p>
<blockquote class="citation">
<p>
If it is a 'closed walk of length 3' then you should keep the loops inside.
</p>
</blockquote>
<p>
It is not. It is <a class="ext-link" href="http://en.wikipedia.org/wiki/Triangle_graph"><span class="icon"></span>C3</a>: three vertices and three edges! The edges are part of the data.
</p>
TicketncohenSun, 19 Apr 2015 14:39:36 GMT
https://trac.sagemath.org/ticket/18250#comment:22
https://trac.sagemath.org/ticket/18250#comment:22
<p>
Ahahah. So if you have three vertices and 4 edges in between, then it is not a triangle? No problem, yet another definition
</p>
<p>
Nathann
</p>
TicketvdelecroixSun, 19 Apr 2015 14:42:09 GMT
https://trac.sagemath.org/ticket/18250#comment:23
https://trac.sagemath.org/ticket/18250#comment:23
<p>
Replying to <a class="ticket" href="https://trac.sagemath.org/ticket/18250#comment:22" title="Comment 22">ncohen</a>:
</p>
<blockquote class="citation">
<p>
Ahahah. So if you have three vertices and 4 edges in between, then it is not a triangle? No problem, yet another definition
</p>
</blockquote>
<p>
No. It is not a triangle. But it does contain many triangles. It is very much like on the sphere. For each (non degenerate) triple of points you have <strong>4</strong> triangles whose vertices are exactly these points.
</p>
TicketncohenSun, 19 Apr 2015 14:43:37 GMT
https://trac.sagemath.org/ticket/18250#comment:24
https://trac.sagemath.org/ticket/18250#comment:24
<p>
See, I am extremely aware of this kind of problems. There is one thousand different ways of defining stuff, and everybody has his own notion of it, depending on where you come from. I'm pretty sure that guys from word theory/automata would think that a triangle is just "a path of length three that leads you back to where you started" (that corresponds to the 'closed walk' I mentionned above).
</p>
<p>
I don't want people to believe stuff and end up with something else than what they had in mind. If two different persons can expect different result from the same function, I want either none of them or a way to force them to tell me what exactly they expect.
</p>
TicketncohenSun, 19 Apr 2015 14:45:17 GMT
https://trac.sagemath.org/ticket/18250#comment:25
https://trac.sagemath.org/ticket/18250#comment:25
<p>
Oh, and then you will also have the guys who have a weight on their edges and want to count the number of weighted triangles.
</p>
TicketvdelecroixSun, 19 Apr 2015 14:48:29 GMT
https://trac.sagemath.org/ticket/18250#comment:26
https://trac.sagemath.org/ticket/18250#comment:26
<p>
Replying to <a class="ticket" href="https://trac.sagemath.org/ticket/18250#comment:25" title="Comment 25">ncohen</a>:
</p>
<blockquote class="citation">
<p>
Oh, and then you will also have the guys who have a weight on their edges and want to count the number of weighted triangles.
</p>
</blockquote>
<p>
This is what I am thinking about when I consider a multi graph, it is just a graph with (positive) integer labels. The weight of a triangle is the product of the labels on the three edges. And to count triangles, you just sum the weights. It is what you get from the matrix trace (when you removed the loops).
</p>
<p>
But I agree that there is one ambiguity: you might want to count induced C3.
</p>
<p>
Vincent
</p>
TicketvdelecroixSun, 19 Apr 2015 14:49:53 GMT
https://trac.sagemath.org/ticket/18250#comment:27
https://trac.sagemath.org/ticket/18250#comment:27
<p>
Replying to <a class="ticket" href="https://trac.sagemath.org/ticket/18250#comment:24" title="Comment 24">ncohen</a>:
</p>
<blockquote class="citation">
<p>
I'm pretty sure that guys from word theory/automata would think that a triangle is just "a path of length three that leads you back to where you started" (that corresponds to the 'closed walk' I mentionned above).
</p>
</blockquote>
<p>
Some people have wrong ideas. I know.
</p>
TicketncohenSun, 19 Apr 2015 14:51:09 GMT
https://trac.sagemath.org/ticket/18250#comment:28
https://trac.sagemath.org/ticket/18250#comment:28
<blockquote class="citation">
<p>
Some people have wrong ideas. I know.
</p>
</blockquote>
<p>
And in my infinite Christ-like generosity I still care for them, and they shouldn't get wrong results just because they are sinners.
</p>
<p>
Nathann
</p>
TicketvdelecroixSun, 19 Apr 2015 14:54:49 GMTstatus changed
https://trac.sagemath.org/ticket/18250#comment:29
https://trac.sagemath.org/ticket/18250#comment:29
<ul>
<li><strong>status</strong>
changed from <em>needs_review</em> to <em>positive_review</em>
</li>
</ul>
<p>
More seriously, I would prefer if the <code>triangles_count</code> in <code>static_dense_graph</code> would only accept backend being <code>static_dense_graph</code> (and move the copy step inside the <code>generic_graph</code>). Would make more sense. But that's very minor. I just want to know your feelings about how you would handle specificities of the backends in the future.
</p>
TicketncohenSun, 19 Apr 2015 15:01:38 GMT
https://trac.sagemath.org/ticket/18250#comment:30
https://trac.sagemath.org/ticket/18250#comment:30
<blockquote class="citation">
<p>
More seriously, I would prefer if the <code>triangles_count</code> in <code>static_dense_graph</code> would only accept backend being <code>static_dense_graph</code> (and move the copy step inside the <code>generic_graph</code>)
</p>
</blockquote>
<p>
Well.. <code>generic_graph.py</code> cannot handle <code>C</code> types so it would have required to move the function to <code>generic_graph_pyx.pyx</code> (what a mess <code>>_<</code>), but I think that I was also influenced by the function <code>is_strongly_regular</code> immediately above.
</p>
<p>
Also, at first this function was building a copy of the graph every time, because I really did not think that it mattered. There are many functions like that in <code>distance_all_pairs</code> too. It contains every function which "relies heavily on the distance_all_pairs functions", and most of them accept Sage graphs directly (though some of them have pure C couterparts).
</p>
<blockquote class="citation">
<p>
I just want to know your feelings about how you would handle specificities of the backends in the future.
</p>
</blockquote>
<p>
Can you tell me what you mean by "specificities of the backends"?
</p>
<p>
Nathann
</p>
TicketncohenSun, 19 Apr 2015 15:01:49 GMT
https://trac.sagemath.org/ticket/18250#comment:31
https://trac.sagemath.org/ticket/18250#comment:31
<p>
(Thanks for the review btw!)
</p>
TicketncohenSun, 19 Apr 2015 16:14:06 GMT
https://trac.sagemath.org/ticket/18250#comment:32
https://trac.sagemath.org/ticket/18250#comment:32
<blockquote class="citation">
<p>
Can you tell me what you mean by "specificities of the backends"?
</p>
</blockquote>
<p>
If you actually mean "specificities of the backens" (and not about the Multi,looped,edge-labelled) graph), then what I want to do is merely to make it simple to switch from one to the other, for the user or for the coders. I am trying to clean the constructor first (<a class="closed ticket" href="https://trac.sagemath.org/ticket/18185" title="defect: Clean the Graph/DiGraph constructors (closed: fixed)">#18185</a> in <code>needs_review</code>), then I will try to make it a bit faster (creating immutable copies is unnecessarily complicated), to distribute them a bit (into subfunctions).
</p>
<p>
The main problem I haven't solved is this: what should the *default* backend for Sage graphs be?
</p>
<p>
For small graphs I expect that the current one is right. But being able to run "has_edge" in <code>log(n)</code> time, or being able to remove edges has a huge cost. If we just want to add edges and vertices (i.e. to build graphs, from an algorithm or from a file) then it can be MUCH cheaper. For analysis of large networks it is something that you cannot afford to pay, and then this backend must be different.
</p>
<p>
So if we want to make no choice at all, it really has to become much simpler and natural to pick a backend for a graph.
</p>
<p>
Also, we may want to have a 'virtual' backend like in Gap for graphs defined from a group action. No graph would actually be stored, yet all algorithms should be available.
</p>
<p>
Well, that's what I think of these days.
</p>
<p>
Nathann
</p>
TicketvbraunTue, 21 Apr 2015 00:10:54 GMTstatus, branch changed; resolution set
https://trac.sagemath.org/ticket/18250#comment:33
https://trac.sagemath.org/ticket/18250#comment:33
<ul>
<li><strong>status</strong>
changed from <em>positive_review</em> to <em>closed</em>
</li>
<li><strong>resolution</strong>
set to <em>fixed</em>
</li>
<li><strong>branch</strong>
changed from <em>public/18250</em> to <em>fd88c98cfc2fee8d0750cd9b6f88d1d7b8177ed4</em>
</li>
</ul>
Ticket