Opened 5 years ago
Closed 5 years ago
#18250 closed enhancement (fixed)
G.triangles_count speedup
Reported by:  ncohen  Owned by:  

Priority:  major  Milestone:  sage6.7 
Component:  graph theory  Keywords:  
Cc:  azi, vdelecroix, dcoudert  Merged in:  
Authors:  Nathann Cohen  Reviewers:  Vincent Delecroix 
Report Upstream:  N/A  Work issues:  
Branch:  fd88c98 (Commits)  Commit:  fd88c98cfc2fee8d0750cd9b6f88d1d7b8177ed4 
Dependencies:  Stopgaps: 
Description (last modified by )
I thought that we had a decent implementation of G.triangles_count
. Turns out that we did not.
Here is what the branch does:
 Exception on nonsimple graphs. The digraph version can handle loops (it cannot handle multiple edges) and the graph version cannot handle any of the two.
 (obvious) speedup in the 'iter' algorithm. Before:
sage: %timeit _=g.triangles_count(algorithm='iter') 1 loops, best of 3: 10.2 s per loop
After:sage: g=graphs.RandomGNP(700,.3) sage: %timeit _=g.triangles_count(algorithm='iter') 1 loops, best of 3: 6.37 s per loop
 Added
triangles_count
instatic_sparse_graph
and another one instatic_dense_graph
. They first copy the graph in a proper data structure and count the triangles on the copy. They can take require more memory (though they are 100000x cheaper than the default data structure)
 The default algorithm is 'iter' in the code but 'matrix' in the doc. Fixed:
sparse_copy
is now the default.
Overall, comparing default implementations:
Before
sage: g = graphs.RandomGNP(500,.5) sage: %timeit g.triangles_count() 1 loops, best of 3: 9.78 s per loop
After
sage: g = graphs.RandomGNP(500,.5) sage: %timeit g.triangles_count() 10 loops, best of 3: 162 ms per loop
Change History (33)
comment:1 Changed 5 years ago by
 Branch set to public/18250
 Commit set to e780f711afba40b34c0d11b65adaf3cc4429fac2
 Description modified (diff)
 Status changed from new to needs_review
comment:2 Changed 5 years ago by
 Commit changed from e780f711afba40b34c0d11b65adaf3cc4429fac2 to ed8bb3c374929b2a32b3e9310124990eb5b2350c
Branch pushed to git repo; I updated commit sha1. New commits:
ed8bb3c  Trac 18250: documentation

comment:3 followup: ↓ 5 Changed 5 years ago by
 Status changed from needs_review to needs_info
Hello,
Small commit for documentation at public/18250
.
The algorithm 'iter'
returns int
while dense_copy
returns Integer
. This would better be uniform.
You always perform a copy?! Isn't it possible to have a graph whose backend is already a static_sparse_graph
or a static_dense_graph
?
Vincent
comment:4 Changed 5 years ago by
 Commit changed from ed8bb3c374929b2a32b3e9310124990eb5b2350c to fa6115dc6cdea907dad9c2cd69668f5e71660024
Branch pushed to git repo; I updated commit sha1. New commits:
fa6115d  trac #18250: Integer return type and avoid useless copies

comment:5 in reply to: ↑ 3 ; followup: ↓ 6 Changed 5 years ago by
 Status changed from needs_info to needs_review
Yooooooooooooooo,
Small commit for documentation at
public/18250
.
Oh, right. Thanks.
The algorithm
'iter'
returnsint
whiledense_copy
returnsInteger
. This would better be uniform.
Done.
You always perform a copy?! Isn't it possible to have a graph whose backend is already a
static_sparse_graph
or astatic_dense_graph
?
Well, in this case copies are eally cheap, so I personally prefer a simpler code to one that avoids that copy.
Still, I implemented it. There is a 'trick' somewhere:
1) I first build a copy of the graph with G.copy(immutable=True)
(which makes no copy if the graph is already immutable), then access its internal data structure
2) If I do g[0] = G.copy(immutable=True)._backend._cg.g[0]
I get a segfault because G.copy() is immediately deallocated. So I first do G = G.copy(immutable=True)._backend
, and *then* access its internal data.
Fortunately the copy G
is not deleted until the end of the function. If Cython ever notices that (as far as it is concerned) it is never used again in the function, then it will be deallocated and we will get a segfault again.
Nathann
comment:6 in reply to: ↑ 5 ; followup: ↓ 7 Changed 5 years ago by
Replying to ncohen:
You always perform a copy?! Isn't it possible to have a graph whose backend is already a
static_sparse_graph
or astatic_dense_graph
?Fortunately the copy
G
is not deleted until the end of the function. If Cython ever notices that (as far as it is concerned) it is never used again in the function, then it will be deallocated and we will get a segfault again.
Cython does not perform deallocation, it is Python job. Though Cython does increment/decrement the reference counter. And precisely, doing G = G.copy(immutable=True)
precisely increment the reference counter!
Looks good! I am running the tests...
Vincent
comment:7 in reply to: ↑ 6 ; followup: ↓ 8 Changed 5 years ago by
Cython does not perform deallocation, it is Python job.
Well if you prefer, but this code is translated into C, and gcc may notice that the object is totally useless. They must have done something to keep those Python variables until the end.
Nathann
comment:8 in reply to: ↑ 7 Changed 5 years ago by
Replying to ncohen:
Cython does not perform deallocation, it is Python job.
Well if you prefer, but this code is translated into C, and gcc may notice that the object is totally useless. They must have done something to keep those Python variables until the end.
Nope. gcc will not do that. An affectation G = H
is not one instruction if H
is a Python object. Cython translates it into something like
cdef PyObject * G = H; Py_INCREF(G);
whick prevents deallocation of H until G is garbage collected (typically at the end of the function or until a call of del G
).
Vincent
comment:9 Changed 5 years ago by
 Reviewers set to Vincent Delecroix
I do not like the default choice being sparse copy
. Especially if G
is implemented as a static dense graph! Wouldn't it be better to have a default algorithm=None
and be more clever about choosing the right algorithm?
comment:10 followup: ↓ 11 Changed 5 years ago by
I don't think that anybody in Sage plays with data structure except me, in Sage's subfunctions. I can do it, but I am rather convinced that it is useless. It may be useful in one or two years if I can rewrite and clean all this stuff.
Will do.
Nathann
comment:11 in reply to: ↑ 10 ; followup: ↓ 12 Changed 5 years ago by
Replying to ncohen:
I don't think that anybody in Sage plays with data structure except me, in Sage's subfunctions. I can do it, but I am rather convinced that it is useless. It may be useful in one or two years if I can rewrite and clean all this stuff.
Ok. Let say something based on the density.
comment:12 in reply to: ↑ 11 Changed 5 years ago by
Ok. Let say something based on the density.
O_o
Nononono. That's for the user to decide. I don't want ugly density>0.3
which may only make sense on a specific architecture, and on the kind of graphs that were used for the tests.
comment:13 Changed 5 years ago by
 Commit changed from fa6115dc6cdea907dad9c2cd69668f5e71660024 to fd88c98cfc2fee8d0750cd9b6f88d1d7b8177ed4
Branch pushed to git repo; I updated commit sha1. New commits:
fd88c98  trac #18250: algorithm=None by default and better 'wrong values' check

comment:14 followup: ↓ 15 Changed 5 years ago by
Hello,
The method matrix
does work for non simple graphs, right? (after removing the loops)
Vincent
comment:15 in reply to: ↑ 14 ; followup: ↓ 16 Changed 5 years ago by
The method
matrix
does work for non simple graphs, right? (after removing the loops)
After removing loops and multiple edges, yes.
Nathann
comment:16 in reply to: ↑ 15 Changed 5 years ago by
Replying to ncohen:
The method
matrix
does work for non simple graphs, right? (after removing the loops)After removing loops and multiple edges, yes.
Please, consider the following right answers
sage: G = Graph([(0,1),(0,1),(1,2),(2,0)], multiedges=True) sage: (G.adjacency_matrix()**3).trace() / 6 2 sage: sage: G = Graph([(0,1),(0,1),(1,2), (1,2), (2,0)], multiedges=True) sage: (G.adjacency_matrix()**3).trace()//6 4
comment:17 followup: ↓ 18 Changed 5 years ago by
What do you mean?
comment:18 in reply to: ↑ 17 Changed 5 years ago by
Replying to ncohen:
What do you mean?
That the method algorithm=matrix
is valid for non simple graphs (once you removed the loops).
comment:19 Changed 5 years ago by
Depends what you call a triangle. For me it is a set of three points adjacent to each other.
comment:20 followup: ↓ 21 Changed 5 years ago by
If it is a 'closed walk of length 3' then you should keep the loops inside.
comment:21 in reply to: ↑ 20 Changed 5 years ago by
comment:22 followup: ↓ 23 Changed 5 years ago by
Ahahah. So if you have three vertices and 4 edges in between, then it is not a triangle? No problem, yet another definition
Nathann
comment:23 in reply to: ↑ 22 Changed 5 years ago by
Replying to ncohen:
Ahahah. So if you have three vertices and 4 edges in between, then it is not a triangle? No problem, yet another definition
No. It is not a triangle. But it does contain many triangles. It is very much like on the sphere. For each (non degenerate) triple of points you have 4 triangles whose vertices are exactly these points.
comment:24 followup: ↓ 27 Changed 5 years ago by
See, I am extremely aware of this kind of problems. There is one thousand different ways of defining stuff, and everybody has his own notion of it, depending on where you come from. I'm pretty sure that guys from word theory/automata would think that a triangle is just "a path of length three that leads you back to where you started" (that corresponds to the 'closed walk' I mentionned above).
I don't want people to believe stuff and end up with something else than what they had in mind. If two different persons can expect different result from the same function, I want either none of them or a way to force them to tell me what exactly they expect.
comment:25 followup: ↓ 26 Changed 5 years ago by
Oh, and then you will also have the guys who have a weight on their edges and want to count the number of weighted triangles.
comment:26 in reply to: ↑ 25 Changed 5 years ago by
Replying to ncohen:
Oh, and then you will also have the guys who have a weight on their edges and want to count the number of weighted triangles.
This is what I am thinking about when I consider a multi graph, it is just a graph with (positive) integer labels. The weight of a triangle is the product of the labels on the three edges. And to count triangles, you just sum the weights. It is what you get from the matrix trace (when you removed the loops).
But I agree that there is one ambiguity: you might want to count induced C3.
Vincent
comment:27 in reply to: ↑ 24 ; followup: ↓ 28 Changed 5 years ago by
Replying to ncohen:
I'm pretty sure that guys from word theory/automata would think that a triangle is just "a path of length three that leads you back to where you started" (that corresponds to the 'closed walk' I mentionned above).
Some people have wrong ideas. I know.
comment:28 in reply to: ↑ 27 Changed 5 years ago by
Some people have wrong ideas. I know.
And in my infinite Christlike generosity I still care for them, and they shouldn't get wrong results just because they are sinners.
Nathann
comment:29 followup: ↓ 30 Changed 5 years ago by
 Status changed from needs_review to positive_review
More seriously, I would prefer if the triangles_count
in static_dense_graph
would only accept backend being static_dense_graph
(and move the copy step inside the generic_graph
). Would make more sense. But that's very minor. I just want to know your feelings about how you would handle specificities of the backends in the future.
comment:30 in reply to: ↑ 29 ; followup: ↓ 32 Changed 5 years ago by
More seriously, I would prefer if the
triangles_count
instatic_dense_graph
would only accept backend beingstatic_dense_graph
(and move the copy step inside thegeneric_graph
)
Well.. generic_graph.py
cannot handle C
types so it would have required to move the function to generic_graph_pyx.pyx
(what a mess >_<
), but I think that I was also influenced by the function is_strongly_regular
immediately above.
Also, at first this function was building a copy of the graph every time, because I really did not think that it mattered. There are many functions like that in distance_all_pairs
too. It contains every function which "relies heavily on the distance_all_pairs functions", and most of them accept Sage graphs directly (though some of them have pure C couterparts).
I just want to know your feelings about how you would handle specificities of the backends in the future.
Can you tell me what you mean by "specificities of the backends"?
Nathann
comment:31 Changed 5 years ago by
(Thanks for the review btw!)
comment:32 in reply to: ↑ 30 Changed 5 years ago by
Can you tell me what you mean by "specificities of the backends"?
If you actually mean "specificities of the backens" (and not about the Multi,looped,edgelabelled) graph), then what I want to do is merely to make it simple to switch from one to the other, for the user or for the coders. I am trying to clean the constructor first (#18185 in needs_review
), then I will try to make it a bit faster (creating immutable copies is unnecessarily complicated), to distribute them a bit (into subfunctions).
The main problem I haven't solved is this: what should the *default* backend for Sage graphs be?
For small graphs I expect that the current one is right. But being able to run "has_edge" in log(n)
time, or being able to remove edges has a huge cost. If we just want to add edges and vertices (i.e. to build graphs, from an algorithm or from a file) then it can be MUCH cheaper. For analysis of large networks it is something that you cannot afford to pay, and then this backend must be different.
So if we want to make no choice at all, it really has to become much simpler and natural to pick a backend for a graph.
Also, we may want to have a 'virtual' backend like in Gap for graphs defined from a group action. No graph would actually be stored, yet all algorithms should be available.
Well, that's what I think of these days.
Nathann
comment:33 Changed 5 years ago by
 Branch changed from public/18250 to fd88c98cfc2fee8d0750cd9b6f88d1d7b8177ed4
 Resolution set to fixed
 Status changed from positive_review to closed
New commits:
trac #18250: G.triangles_count speedup