Opened 4 years ago

Closed 4 years ago

#18834 closed enhancement (fixed)

Use Sage to compute clustering coefficient

Reported by: ncohen Owned by:
Priority: major Milestone: sage-6.8
Component: graph theory Keywords:
Cc: borassi, dcoudert Merged in:
Authors: Nathann Cohen Reviewers: David Coudert
Report Upstream: N/A Work issues:
Branch: 8c25063 (Commits) Commit: 8c25063de223a611c202d937d828c78cf0db8bce
Dependencies: #18811 Stopgaps:

Description (last modified by ncohen)

Will have to be rebased+adapted over #18811.

sage: g = graphs.CompleteGraph(500)
sage: %time _=g.clustering_coeff(implementation="boost")
CPU times: user 7.62 s, sys: 8 ms, total: 7.62 s
Wall time: 7.63 s
sage: %time _=g.clustering_coeff()
CPU times: user 340 ms, sys: 0 ns, total: 340 ms
Wall time: 328 ms
sage: %time _=g.clustering_coeff(implementation="dense_copy")
CPU times: user 56 ms, sys: 4 ms, total: 60 ms
Wall time: 61.3 ms

Change History (21)

comment:1 Changed 4 years ago by ncohen

  • Branch set to u/ncohen/18834

comment:2 Changed 4 years ago by git

  • Commit set to 5a40d920d6a8c6585424329958b8b54a30c76275

Branch pushed to git repo; I updated commit sha1. New commits:

5a40d92trac #18834: Use Sage to compute clustering coefficient

comment:3 Changed 4 years ago by ncohen

  • Authors set to Nathann Cohen
  • Status changed from new to needs_review

comment:4 Changed 4 years ago by git

  • Commit changed from 5a40d920d6a8c6585424329958b8b54a30c76275 to dba89c2c25a22a1dd66e53abde9ae8e619723556

Branch pushed to git repo; I updated commit sha1. This was a forced push. New commits:

283c61bFirst draft
18bd01aApplied Nathann's suggestions
99bd994Few corrections in the reference manual
40df844trac #18811: Review
6c8d49bMore reviewer comments
dba89c2trac #18834: Use Sage to compute clustering coefficient

comment:5 Changed 4 years ago by ncohen

  • Description modified (diff)

comment:6 Changed 4 years ago by git

  • Commit changed from dba89c2c25a22a1dd66e53abde9ae8e619723556 to e25d6a4407c2c8cb5092fceed80ab86146134327

Branch pushed to git repo; I updated commit sha1. This was a forced push. New commits:

c365d02Removed "vertices is None" from boost_clustering_coeff
582ce34Changed for v in G.vertices() => for v in G
e25d6a4trac #18834: Use Sage to compute clustering coefficient

comment:7 Changed 4 years ago by ncohen

rebased on top of #18811.

comment:8 Changed 4 years ago by dcoudert

For me the patch is good to go (and the boost method is pretty fast in fact). However, the patch could be rebased on beta7 to solve the patchbot issue.

comment:9 Changed 4 years ago by git

  • Commit changed from e25d6a4407c2c8cb5092fceed80ab86146134327 to 6274bff716b13f4ef7b3e1745d8dcbf761336826

Branch pushed to git repo; I updated commit sha1. New commits:

6274bfftrac #18834: Merged with 6.8.beta7

comment:10 Changed 4 years ago by ncohen

Done! But what do you mean by 'boost is pretty fast'?? Sage is 100x faster O_o

comment:11 follow-up: Changed 4 years ago by dcoudert

On a clique that's for sure, but on other kinds of graphs...

sage: g = graphs.RandomBarabasiAlbert(1000,2)
sage: %timeit _=g.clustering_coeff(implementation="boost")
1000 loops, best of 3: 1.9 ms per loop
sage: %timeit _=g.clustering_coeff(implementation="dense_copy")
100 loops, best of 3: 7.34 ms per loop
sage: %timeit _=g.clustering_coeff(implementation="sparse_copy")
100 loops, best of 3: 13.9 ms per loop
sage: %timeit _=g.clustering_coeff(implementation="networkx")
100 loops, best of 3: 19.3 ms per loop

sage: g = graphs.RandomBarabasiAlbert(10000,2)
sage: %timeit _=g.clustering_coeff(implementation="boost")
10 loops, best of 3: 38 ms per loop
sage: %timeit _=g.clustering_coeff(implementation="dense_copy")
10 loops, best of 3: 143 ms per loop
sage: %timeit _=g.clustering_coeff(implementation="sparse_copy")
10 loops, best of 3: 148 ms per loop
sage: %timeit _=g.clustering_coeff(implementation="networkx")
1 loops, best of 3: 248 ms per loop

comment:12 Changed 4 years ago by dcoudert

Also, I have an issue:

sage: g = graphs.RandomGNP(1000,.1)
sage: N = 1000
sage: g = graphs.RandomGNP(N, log(N)/N)
sage: g.order(), g.size()
(1000, 3567)
sage: %timeit _=g.clustering_coeff(implementation="boost")
100 loops, best of 3: 2.78 ms per loop
sage: %timeit _=g.clustering_coeff(implementation="dense_copy")
---------------------------------------------------------------------------
ZeroDivisionError                         Traceback (most recent call last)
<ipython-input-31-cd519bf674aa> in <module>()
----> 1 get_ipython().magic(u'timeit _=g.clustering_coeff(implementation="dense_copy")')

/Users/dcoudert/sage/local/lib/python2.7/site-packages/IPython/core/interactiveshell.pyc in magic(self, arg_s)
   2305         magic_name, _, magic_arg_s = arg_s.partition(' ')
   2306         magic_name = magic_name.lstrip(prefilter.ESC_MAGIC)
-> 2307         return self.run_line_magic(magic_name, magic_arg_s)
   2308 
   2309     #-------------------------------------------------------------------------

/Users/dcoudert/sage/local/lib/python2.7/site-packages/IPython/core/interactiveshell.pyc in run_line_magic(self, magic_name, line)
   2226                 kwargs['local_ns'] = sys._getframe(stack_depth).f_locals
   2227             with self.builtin_trap:
-> 2228                 result = fn(*args,**kwargs)
   2229             return result
   2230 

/Users/dcoudert/sage/local/lib/python2.7/site-packages/IPython/core/magics/execution.pyc in timeit(self, line, cell)

/Users/dcoudert/sage/local/lib/python2.7/site-packages/IPython/core/magic.pyc in <lambda>(f, *a, **k)
    191     # but it's overkill for just that one bit of state.
    192     def magic_deco(arg):
--> 193         call = lambda f, *a, **k: f(*a, **k)
    194 
    195         if callable(arg):

/Users/dcoudert/sage/local/lib/python2.7/site-packages/IPython/core/magics/execution.pyc in timeit(self, line, cell)
   1034             number = 1
   1035             for _ in range(1, 10):
-> 1036                 time_number = timer.timeit(number)
   1037                 worst_tuning = max(worst_tuning, time_number / number)
   1038                 if time_number >= 0.2:

/Users/dcoudert/sage/local/lib/python2.7/site-packages/IPython/core/magics/execution.pyc in timeit(self, number)
    130         gc.disable()
    131         try:
--> 132             timing = self.inner(it, self.timer)
    133         finally:
    134             if gcold:

<magic-timeit> in inner(_it, _timer)

/Users/dcoudert/sage/local/lib/python2.7/site-packages/sage/graphs/generic_graph.pyc in clustering_coeff(self, nodes, weight, implementation, return_vertex_weights)
  12437             from sage.rings.integer import Integer
  12438             return {v:Integer(count)/((self.degree(v)*(self.degree(v)-1))/2)
> 12439                     for v,count in triangles_count(self).iteritems()}
  12440 
  12441     def cluster_transitivity(self):

/Users/dcoudert/sage/local/lib/python2.7/site-packages/sage/graphs/generic_graph.pyc in <dictcomp>((v, count))
  12437             from sage.rings.integer import Integer
  12438             return {v:Integer(count)/((self.degree(v)*(self.degree(v)-1))/2)
> 12439                     for v,count in triangles_count(self).iteritems()}
  12440 
  12441     def cluster_transitivity(self):

/Users/dcoudert/sage/src/sage/structure/element.pyx in sage.structure.element.RingElement.__div__ (/Users/dcoudert/sage/src/build/cythonized/sage/structure/element.c:17310)()
   1991         if have_same_parent_c(self, right):
   1992             return (<RingElement>self)._div_(<RingElement>right)
-> 1993         return coercion_model.bin_op(self, right, div)
   1994 
   1995     cpdef RingElement _div_(self, RingElement right):

/Users/dcoudert/sage/src/sage/structure/coerce.pyx in sage.structure.coerce.CoercionModel_cache_maps.bin_op (/Users/dcoudert/sage/src/build/cythonized/sage/structure/coerce.c:8195)()
    994         try:
    995             xy = self.canonical_coercion(x,y)
--> 996             return PyObject_CallObject(op, xy)
    997         except TypeError as err:
    998             if xy is not None:

/Users/dcoudert/sage/src/sage/structure/element.pyx in sage.structure.element.RingElement.__div__ (/Users/dcoudert/sage/src/build/cythonized/sage/structure/element.c:17293)()
   1990         """
   1991         if have_same_parent_c(self, right):
-> 1992             return (<RingElement>self)._div_(<RingElement>right)
   1993         return coercion_model.bin_op(self, right, div)
   1994 

/Users/dcoudert/sage/src/sage/rings/integer.pyx in sage.rings.integer.Integer._div_ (/Users/dcoudert/sage/src/build/cythonized/sage/rings/integer.c:11621)()
   1728         # This is vastly faster than doing it here, since here
   1729         # we can't cimport rationals.
-> 1730         return the_integer_ring._div(self, right)
   1731 
   1732     def __floordiv__(x, y):

/Users/dcoudert/sage/src/sage/rings/integer_ring.pyx in sage.rings.integer_ring.IntegerRing_class._div (/Users/dcoudert/sage/src/build/cythonized/sage/rings/integer_ring.c:4894)()
    420         cdef rational.Rational x = rational.Rational.__new__(rational.Rational)
    421         if mpz_sgn(right.value) == 0:
--> 422             raise ZeroDivisionError('Rational division by zero')
    423         mpz_set(mpq_numref(x.value), left.value)
    424         mpz_set(mpq_denref(x.value), right.value)

ZeroDivisionError: Rational division by zero

I'll send you the graph by mail if you want to investigate.

comment:13 in reply to: ↑ 11 Changed 4 years ago by ncohen

On a clique that's for sure, but on other kinds of graphs...

Oh.. Wow..

HMmmm.. It seems that those graphs are stuffed with degree-2 vertices. I wonder how they do that O_o

Nathann

comment:14 Changed 4 years ago by dcoudert

I assume they iterate over the edges, which is what I would do on a sparse graph.

comment:15 Changed 4 years ago by ncohen

They probably do a dichotomic search where I do a linear search. I'll try to see tomorrow evening (I will sleep in an airport) if I can do something about that.

comment:16 Changed 4 years ago by git

  • Commit changed from 6274bff716b13f4ef7b3e1745d8dcbf761336826 to a5b7092b804e504c0dd5487bc368cf9d54082e4e

Branch pushed to git repo; I updated commit sha1. New commits:

a5b7092trac #18834: Division by zero

comment:17 Changed 4 years ago by git

  • Commit changed from a5b7092b804e504c0dd5487bc368cf9d54082e4e to 8c25063de223a611c202d937d828c78cf0db8bce

Branch pushed to git repo; I updated commit sha1. This was a forced push. New commits:

8c25063trac #18834: Division by zero

comment:18 Changed 4 years ago by ncohen

A more correct fix.

About the difference in timings:

sage: g=graphs.RandomBarabasiAlbert(10000,2)
sage: %timeit _ = g.clustering_coeff()
10 loops, best of 3: 96.8 ms per loop
sage: from sage.graphs.base.static_sparse_graph import triangles_count
sage: %timeit _ = triangles_count(g)
10 loops, best of 3: 70 ms per loop

It seems that a nontrivial part of the difference in timing is taken by the construction of the.... final dictionary. Profiling the method does not show a major cost of the actual algorithm.. So well, it seems that this graph is so easy to deal with that returning the result is not negligible :-P

Nathann

Last edited 4 years ago by ncohen (previous) (diff)

comment:19 Changed 4 years ago by dcoudert

  • Reviewers set to David Coudert
  • Status changed from needs_review to positive_review

For me the patch is now good to go!

comment:20 Changed 4 years ago by ncohen

Thanks !

Nathann

comment:21 Changed 4 years ago by vbraun

  • Branch changed from u/ncohen/18834 to 8c25063de223a611c202d937d828c78cf0db8bce
  • Resolution set to fixed
  • Status changed from positive_review to closed
Note: See TracTickets for help on using tickets.