Opened 6 years ago
Closed 6 years ago
#18811 closed enhancement (fixed)
Boost Clustering Coefficient
Reported by:  borassi  Owned by:  

Priority:  major  Milestone:  sage6.8 
Component:  graph theory  Keywords:  Local clustering coefficient, Boost 
Cc:  ncohen, dcoudert  Merged in:  
Authors:  Michele Borassi  Reviewers:  Nathann Cohen 
Report Upstream:  N/A  Work issues:  
Branch:  582ce34 (Commits)  Commit:  582ce3466d9dff303532a84f9738cdd4f581d5a9 
Dependencies:  18564  Stopgaps: 
Description (last modified by )
Apply Boost algorithm for computing the clustering coefficient, using the interface of Ticket #18564.
Change History (25)
comment:1 Changed 6 years ago by
 Cc ncohen added
comment:2 Changed 6 years ago by
 Branch set to u/borassi/boost_clustering_coefficient
comment:3 followup: ↓ 4 Changed 6 years ago by
 Cc dcoudert added
 Commit set to 283c61be9357902ff555843b27c0ffed6fed6ea4
 Component changed from PLEASE CHANGE to graph theory
 Dependencies set to 18564
 Description modified (diff)
 Keywords Local clustering coefficient Boost added
 Status changed from new to needs_review
 Type changed from PLEASE CHANGE to enhancement
comment:4 in reply to: ↑ 3 ; followup: ↓ 5 Changed 6 years ago by
I hope you like it!
Ahahahah. I do! But the first thing I will do once this thing is reviewed is to see if I can beat it :P
Nathann
comment:5 in reply to: ↑ 4 ; followup: ↓ 6 Changed 6 years ago by
Ahahahah. I do! But the first thing I will do once this thing is reviewed is to see if I can beat it
:P
I'm pretty sure you can: to intersect two neighbors, Boost algorithm iterates over both adjacencies in time O(d_1d_2), where d_1 and d_2 are the degrees. Using binary trees instead of vectors, you could do it in O(d_1 log d_2), or viceversa.
comment:6 in reply to: ↑ 5 Changed 6 years ago by
I'm pretty sure you can: to intersect two neighbors, Boost algorithm iterates over both adjacencies in time O(d_1d_2), where d_1 and d_2 are the degrees. Using binary trees instead of vectors, you could do it in O(d_1 log d_2), or viceversa.
Sage's timings should be approximately those of triangle_count
.
comment:7 followup: ↓ 13 Changed 6 years ago by
Hellooooooo Michele,
 About
boost_clustering_coeff
 to me it makes more sense if this function, meant to apply to a boost graph, does not take a Sage graph as input. You only need it to get the vertices' correct label, but you only need this at a higher level. Returning a dictionary associating the coeffcient to each (integer) vertex is not very expensive considering the computations you do, and so you can afford to relabel it higher.
 Could you define "local clustering" somewhere? Is it any different from the 'clustering coefficient of a vertex'?
 Could you add an INPUT block to
cpdef clustering_coeff
?
 Same function  by using a
sig_check
instead ofsig_on/off
it is impossible to interrupt the function whileboost_resulting_coeff
is running. It will only be interrupted when the function returns.
 Could you rename 'cc' to 'average_clustering_coefficient'? When you need a comment to say what a variable is, it often means that you should rename the variable.
 in
clustering_average
 could you document the default behaviour ofimplementation=None
? (you did it inclustering_coeff
already).
 You should not indent the *text* after a 'TESTS:' block (like it is done for 'INPUT:'). Usually, you only need to indent text in the documentation after a '::'.
 Instead of
weight==False
, it is better to use 'not weight'. Handles better something likeweight=0
(faster to type:P
)
Looks very good!
Thanks,
Nathann
comment:8 Changed 6 years ago by
 Status changed from needs_review to needs_work
comment:9 followup: ↓ 14 Changed 6 years ago by
Hello,
I have not checked the code yet.
Nathann, I have an old code using bitsets for speeding up intersection of neighborhoods. I can update it using static dense graph and see if it is a good competitor ;) The idea is to compute the number of triangles incident to each edge, and then to compute the number of triangles incident to a given vertex.
David.
comment:10 Changed 6 years ago by
 Commit changed from 283c61be9357902ff555843b27c0ffed6fed6ea4 to 18bd01ab82cf0b9d064d4fe680f17df908bed0da
Branch pushed to git repo; I updated commit sha1. New commits:
18bd01a  Applied Nathann's suggestions

comment:11 Changed 6 years ago by
I do not change the status to "needs_review", yet, because I'm a bit tired and I'd like to check my work tomorrow morning.
comment:12 Changed 6 years ago by
 Commit changed from 18bd01ab82cf0b9d064d4fe680f17df908bed0da to 99bd994335f70455d898c83f1a13b3d38e8f5064
Branch pushed to git repo; I updated commit sha1. New commits:
99bd994  Few corrections in the reference manual

comment:13 in reply to: ↑ 7 ; followup: ↓ 15 Changed 6 years ago by
 Status changed from needs_work to needs_review
Helloooooooo!
 About
boost_clustering_coeff
 to me it makes more sense if this function,meant to apply to a boost graph, does not take a Sage graph as input. You only need it to get the vertices' correct label, but you only need this at a higher level. Returning a dictionary associating the coeffcient to each (integer) vertex is not very expensive considering the computations you do, and so you can afford to relabel it higher.
Oops, I wrote it like this to test if it worked, and I forgot to change it. By solving this issue, I also have found problems if the vertex label is not an integer, I corrected it, and I added a doctest.
 Could you define "local clustering" somewhere? Is it any different from the
'clustering coefficient of a vertex'?
It is exactly the same (https://en.wikipedia.org/wiki/Clustering_coefficient#Local_clustering_coefficient): I decided to remove "local" from all comments.
 Could you add an INPUT block to
cpdef clustering_coeff
?
Done!
 Same function  by using a
sig_check
instead ofsig_on/off
it isimpossible to interrupt the function while
boost_resulting_coeff
is running. It will only be interrupted when the function returns.
Done! Now I use sig_on/off.
 Could you rename 'cc' to 'average_clustering_coefficient'? When you need a
comment to say what a variable is, it often means that you should rename the variable.
Done!
 in
clustering_average
 could you document the default behaviour of
implementation=None
? (you did it inclustering_coeff
already).
Done
 You should not indent the *text* after a 'TESTS:' block (like it is done for
'INPUT:'). Usually, you only need to indent text in the documentation after a '::'.
Done!
 Instead of
weight==False
, it is better to use 'not weight'. Handles bettersomething like
weight=0
(faster to type:P
)
Done!
comment:14 in reply to: ↑ 9 ; followup: ↓ 21 Changed 6 years ago by
Nathann, I have an old code using bitsets for speeding up intersection of neighborhoods. I can update it using static dense graph and see if it is a good competitor ;)
Sage does have something like that, too. It would only require minor changes to what was added in #18250.
Nathann
comment:15 in reply to: ↑ 13 Changed 6 years ago by
 Reviewers set to Nathann Cohen
Hello Michele,
Thanks for your updates. I added a commit at public/18811, and if you agree with it then you can add it to the branch and this ticket can go.
Here are the modifications I made:
 remove trailing whitespaces
 The function
boost_clustering_coeff
claimed to accept 'None' as a list of vertices, but did not len(clust_of_v.values())
is equal tolen(clust_of_v)
{g.vertices()[v]: clust_v_int[v] for v in vertices_boost
} this line callsg.vertices()
for each[v]
. You can use 'tolerance' for tests like
sage: abs(G.clustering_average(implementation='boost')  G.clustering_average(implementation='networkx')) < 1E12
Have fuuuuuuuuun,
Nathann
comment:16 Changed 6 years ago by
 Commit changed from 99bd994335f70455d898c83f1a13b3d38e8f5064 to 6c8d49bc6d0101f897c489124d52c5e4a252a959
comment:17 followup: ↓ 18 Changed 6 years ago by
Hey nononon! I had changed the code to handle the 'None' case! Do whatever you want, but please keep the code and the doc consistent.
Nathann
comment:18 in reply to: ↑ 17 ; followup: ↓ 19 Changed 6 years ago by
Sorry, I didn't realize you modified the code... That's why I modified the doc, I thought you forgot a commit, or something like that. In my opinion, it is better to modify the doc, for simplicity!
BTW, how can I use 'tolerance' in the other test?
sage: G = graphs.RandomGNM(10,20) sage: clust_boost = G.clustering_coeff(implementation='boost') sage: clust_networkx = G.clustering_coeff(implementation='networkx') sage: for v in G.vertices(): ....: if abs(clust_boost[v]  clust_networkx[v]) > 1E12: ....: print "Error:" ....: print " Boost clustering of", v, "is", clust_boost[v] ....: print "NetworkX clustering of", v, "is", clust_networkx[v]
Replying to ncohen:
Hey nononon! I had changed the code to handle the 'None' case! Do whatever you want, but please keep the code and the doc consistent.
Nathann
comment:19 in reply to: ↑ 18 Changed 6 years ago by
Helloooooo,
In my opinion, it is better to modify the doc, for simplicity!
No problem for me. But please update the code too, in this case.
BTW, how can I use 'tolerance' in the other test?
I think that it could work by adding the same 'tol' flag on all lnes of the loop. Not sure, you have to try ^^;
http://doc.sagemath.org/html/en/developer/coding_basics.html#specialmarkuptoinfluencetests
Nathann
comment:20 Changed 6 years ago by
OOOppss nononono sorry. Given that your doctest does not print anything when it works, I'd say that it is easier to understand it this way. Otherwise you would have to add lines and lines of useless '0'.
By the way, you can write 'for v in G' instead of 'for v in G.vertices()'.
Have fuuuuun,
Nathann
comment:21 in reply to: ↑ 14 Changed 6 years ago by
comment:22 Changed 6 years ago by
 Commit changed from 6c8d49bc6d0101f897c489124d52c5e4a252a959 to 582ce3466d9dff303532a84f9738cdd4f581d5a9
comment:23 Changed 6 years ago by
For me, now the patch is good to go!
comment:24 Changed 6 years ago by
 Status changed from needs_review to positive_review
Okayyyy, then let it go.
Nathann
comment:25 Changed 6 years ago by
 Branch changed from u/borassi/boost_clustering_coefficient to 582ce3466d9dff303532a84f9738cdd4f581d5a9
 Resolution set to fixed
 Status changed from positive_review to closed
Hello! I have implemented the computation of the local clustering coefficient through Boost. The improvement is not as striking as the edge connectivity, but it is still a 10x improvement, more or less. Some benchmarks:
I hope you like it!