Ticket #13808: trac_13808-v3.patch

File trac_13808-v3.patch, 43.8 KB (added by dcoudert, 8 years ago)

new combined version

  • doc/en/reference/graphs.rst

    # HG changeset patch
    # User dcoudert <david.coudert@inria.fr>
    # Date 1356046910 -3600
    # Node ID 103ad966cd80162f02503f9ff9c8d5da61e7bb2f
    # Parent  653a76423407a118217595cc3e5acea9370bd531
    trac #13808 -- Gromov hyperbolicity of a graph
    
    diff --git a/doc/en/reference/graphs.rst b/doc/en/reference/graphs.rst
    a b  
    6464   sage/graphs/distances_all_pairs
    6565   sage/graphs/graph_latex
    6666   sage/graphs/graph_list
    67 
     67   sage/graphs/hyperbolicity
  • module_list.py

    diff --git a/module_list.py b/module_list.py
    a b  
    461461    Extension('sage.graphs.genus',
    462462              sources = ['sage/graphs/genus.pyx']),
    463463
     464    Extension('sage.graphs.hyperbolicity',
     465              sources = ['sage/graphs/hyperbolicity.pyx']),
     466
    464467        ################################
    465468        ##
    466469        ## sage.graphs.base
  • new file sage/graphs/hyperbolicity.pyx

    diff --git a/sage/graphs/hyperbolicity.pyx b/sage/graphs/hyperbolicity.pyx
    new file mode 100644
    - +  
     1r"""
     2Hyperbolicity of a graph
     3
     4**Definition** :
     5
     6    The hyperbolicity `\delta` of a graph `G` has been defined by Gromov
     7    [Gromov87]_ as follows (we give here the so-called 4-points condition):
     8
     9      Let `a, b, c, d` be vertices of the graph, let `S_1`, `S_2` and `S_3` be
     10      defined by
     11
     12      .. MATH::
     13
     14          S_1 = dist(a, b) + dist(b, c)\\
     15          S_2 = dist(a, c) + dist(b, d)\\
     16          S_3 = dist(a, d) + dist(b, c)\\
     17
     18      and let `M_1` and `M_2` be the two largest values among `S_1`, `S_2`, and
     19      `S_3`. We define `hyp(a, b, c, d) = M_1 - M_2`, and the hyperbolicity
     20      `\delta(G)` of the graph is the maximum of `hyp` over all possible
     21      4-tuples `(a, b, c, d)` divided by 2. That is, the graph is said
     22      `\delta`-hyperbolic when
     23
     24      .. MATH::
     25
     26          \delta(G) = \frac{1}{2}\max_{a,b,c,d\in V(G)}hyp(a, b, c, d)
     27
     28      (note that `hyp(a, b, c, d)=0` whenever two elements among `a,b,c,d` are equal)
     29
     30**Some known results** :
     31
     32    - Trees and cliques are `0`-hyperbolic
     33
     34    - `n\times n` grids are `n-1`-hyperbolic
     35
     36    - Cycles are approximately `n/4`-hyperbolic
     37
     38    - Chordal graphs are `\leq 1`-hyperbolic
     39
     40    Besides, the hyperbolicity of a graph is the maximum over all its
     41    biconnected components.
     42
     43**Algorithms and complexity** :
     44
     45    The time complexity of the naive implementation (i.e. testing all 4-tuples)
     46    is `O( n^4 )`, and an algorithm with time complexity `O(n^{3.69})` has been
     47    proposed in [FIV12]_.  This remains very long for large-scale graphs, and
     48    much harder to implement.
     49
     50
     51    An improvement over the naive algorithm has been proposed in [CCL12]_, and
     52    is implemented in the current module. Like the naive algorithm, it has
     53    complexity `O(n^4)` but behaves much better in practice. It uses the
     54    following fact :
     55
     56      Assume that `S_1 = dist(a, b) + dist(c, d)` is the largest among
     57      `S_1,S_2,S_3`. We have
     58
     59      .. MATH::
     60
     61        S_2 + S_3 =& dist(a, c) + dist(b, d) + dist(a, d) + dist(b, c)\\
     62        =& [ dist(a, c) + dist(b, c) ] + [ dist(a, d) + dist(b, d)]\\
     63        \geq &dist(a,b) + dist(a,b)\\
     64        \geq &2dist(a,b)\\
     65
     66      Now, since `S_1` is the largest sum, we have
     67
     68      .. MATH::
     69
     70        hyp(a, b, c, d) =& S_1 - \max\{S_2, S_3\}\\
     71        \leq& S_1 - \frac{S_2+ S_3}{2}\\
     72        =& S_1 - dist(a, b)\\
     73        =& dist(c, d)\\
     74
     75      We obtain similarly that `hyp(a, b, c, d) \leq dist(a, b)`.  Consequently,
     76      in the implementation, we ensure that `S_1` is larger than `S_2` and `S_3`
     77      using an ordering of the pairs by decreasing lengths. Furthermore, we use
     78      the best value `h` found so far to cut exploration.
     79
     80    The worst case time complexity of this algorithm is `O(n^4)`, but it
     81    performs very well in practice since it cuts the search space.  This
     82    algorithm can be turned into an approximation algorithm since at any step of
     83    its execution we maintain an upper and a lower bound. We can thus stop
     84    execution as soon as a multiplicative approximation factor or an additive
     85    one is proven.
     86
     87TODO:
     88
     89- Add exact methods for the hyperbolicity of chordal graphs
     90
     91- Add method for partitioning the graph with clique separators
     92
     93**This module contains the following functions**
     94
     95At Python level :
     96
     97.. csv-table::
     98    :class: contentstable
     99    :widths: 30, 70
     100    :delim: |
     101
     102    :meth:`~hyperbolicity` | Return the hyperbolicity of the graph or an approximation of this value.
     103    :meth:`~hyperbolicity_distribution` | Return the hyperbolicity distribution of the graph or a sampling of it.
     104
     105REFERENCES:
     106
     107.. [CCL12] N. Cohen, D. Coudert, and A. Lancin. Exact and approximate algorithms
     108   for computing the hyperbolicity of large-scale graphs.  Research Report
     109   RR-8074, Sep. 2012. [http://hal.inria.fr/hal-00735481].
     110
     111.. [FIV12] H. Fournier, A. Ismail, and A. Vigneron. Computing the Gromov
     112   hyperbolicity of a discrete metric space. ArXiv, Tech. Rep. arXiv:1210.3323,
     113   Oct. 2012. [http://arxiv.org/abs/1210.3323].
     114
     115.. [Gromov87] M. Gromov. Hyperbolic groups. Essays in Group Theory, 8:75--263,
     116   1987.
     117
     118AUTHORS:
     119
     120- David Coudert (2012): initial version, exact and approximate algorithm,
     121  distribution, sampling
     122
     123
     124Methods
     125-------
     126"""
     127
     128###############################################################################
     129#           Copyright (C) 2012 David Coudert <david.coudert@inria.fr>
     130#
     131# Distributed  under  the  terms  of  the  GNU  General  Public  License (GPL)
     132#                         http://www.gnu.org/licenses/
     133###############################################################################
     134
     135# imports
     136from sage.graphs.graph import Graph
     137from sage.graphs.distances_all_pairs cimport c_distances_all_pairs
     138from sage.rings.arith import binomial
     139from sage.rings.integer_ring import ZZ
     140from sage.functions.other import floor
     141from sage.misc.bitset import Bitset
     142from libc.stdint cimport uint32_t, uint64_t
     143include "../ext/stdsage.pxi"
     144
     145
     146# Defining a pair of vertices as a C struct
     147ctypedef struct pair:
     148    int s
     149    int t
     150
     151
     152######################################################################
     153# Speedup functions
     154######################################################################
     155
     156def _my_subgraph(G, vertices, relabel=False, return_map=False):
     157    r"""
     158    Return the subgraph containing the given vertices
     159
     160    This method considers only the connectivity. Therefore, edge labels are
     161    ignored as well as any other decoration of the graph (vertex position,
     162    etc.).
     163
     164    If ``relabel`` is ``True``, the vertices of the new graph are relabeled with
     165    integers in the range '0\cdots |vertices|-1'. The relabeling map is returned
     166    if ``return_map`` is also ``True``.
     167
     168    TESTS:
     169
     170    Giving anything else than a Graph::
     171
     172        sage: from sage.graphs.hyperbolicity import _my_subgraph as mysub
     173        sage: mysub([],[])
     174        Traceback (most recent call last):
     175        ...
     176        ValueError: The input parameter must be a Graph.
     177
     178    Subgraph of a PetersenGraph::
     179
     180        sage: from sage.graphs.hyperbolicity import _my_subgraph as mysub
     181        sage: H = mysub(graphs.PetersenGraph(), [0,2,4,6])
     182        sage: H.edges(labels=None)
     183        [(0, 4)]
     184        sage: H.vertices()
     185        [0, 2, 4, 6]
     186    """
     187    if not isinstance(G,Graph):
     188        raise ValueError("The input parameter must be a Graph.")
     189    H = Graph()
     190    if not vertices:
     191        return (H,{}) if (relabel and return_map) else H
     192
     193    if relabel:
     194        map = dict(zip(iter(vertices),xrange(len(vertices))))
     195    else:
     196        map = dict(zip(iter(vertices),iter(vertices)))
     197
     198    B = {}
     199    for v in G.vertex_iterator():
     200        B[v] = False
     201    for v in vertices:
     202        B[v] = True
     203        H.add_vertex(map[v])
     204
     205    for u in vertices:
     206        for v in G.neighbor_iterator(u):
     207            if B[v]:
     208                H.add_edge(map[u],map[v])
     209
     210    return (H,map) if (relabel and return_map) else H
     211
     212
     213######################################################################
     214# Building blocks
     215######################################################################
     216
     217cdef inline int __hyp__(unsigned short ** distances, int a, int b, int c, int d):
     218    """
     219    Return the hyperbolicity of the given 4-tuple.
     220    """
     221    cdef int S1, S2, S3, h
     222    S1 = distances[a][b] + distances[c][d]
     223    S2 = distances[a][c] + distances[b][d]
     224    S3 = distances[a][d] + distances[b][c]
     225    if S1 >= S2:
     226        if S2 > S3:
     227            h = S1-S2
     228        else:
     229            h = abs(S1-S3)
     230    else:
     231        if S1 > S3:
     232            h = S2-S1
     233        else:
     234            h = abs(S2-S3)
     235    return h
     236
     237######################################################################
     238# Basic algorithm for the hyperbolicity
     239######################################################################
     240
     241cdef tuple __hyperbolicity_basic_algorithm__(int N,
     242                                             unsigned short **  distances,
     243                                             verbose = False):
     244    """
     245    Returns **twice** the hyperbolicity of a graph, and a certificate.
     246
     247    This method implements the basic algorithm for computing the hyperbolicity
     248    of a graph, that is iterating over all 4-tuples. See the module's
     249    documentation for more details.
     250
     251    INPUTS:
     252
     253    - ``N`` -- number of vertices of the graph.
     254
     255    - ``distances`` -- path distance matrix (see the distance_all_pairs module).
     256
     257    - ``verbose`` -- (default: ``False``) is boolean. Set to True to display
     258      some information during execution.
     259
     260    OUTPUTS:
     261
     262    This function returns a tuple ( h, certificate ), where:
     263
     264    - ``h`` -- the maximum computed value over all 4-tuples, and so is twice the
     265      hyperbolicity of the graph. If no such 4-tuple is found, -1 is returned.
     266
     267    - ``certificate`` -- 4-tuple of vertices maximizing the value `h`. If no
     268      such 4-tuple is found, the empty list [] is returned.
     269    """
     270    cdef int a, b, c, d, S1, S2, S3, hh, h_LB
     271    cdef list certificate
     272
     273    h_LB = -1
     274    for 0 <= a < N-3:
     275        for a < b < N-2:
     276            for b < c < N-1:
     277                for c < d < N:
     278
     279                    # We compute the hyperbolicity of the 4-tuple
     280                    hh = __hyp__(distances, a, b, c, d)
     281
     282                    # We compare the value with previously known bound
     283                    if hh > h_LB:
     284                        h_LB = hh
     285                        certificate = [a, b, c, d]
     286
     287                        if verbose:
     288                            print 'New lower bound:',ZZ(hh)/2
     289
     290    # Last, we return the computed value and the certificate
     291    if h_LB != -1:
     292        return ( h_LB, certificate )
     293    else:
     294        return ( -1, [] )
     295
     296
     297######################################################################
     298# Decomposition methods
     299######################################################################
     300
     301def elimination_ordering_of_simplicial_vertices(G, max_degree=4, verbose=False):
     302    r"""
     303    Return an elimination ordering of simplicial vertices.
     304
     305    An elimination ordering of simplicial vertices is an elimination ordering of
     306    the vertices of the graphs such that the induced subgraph of their neighbors
     307    is a clique. More precisely, as long as the graph as a vertex ``u`` such
     308    that the induced subgraph of its neighbors is a clique, we remove ``u`` from
     309    the graph, add it to the elimination ordering (list of vertices), and
     310    repeat. This method is inspired from the decomposition of a graph by
     311    clique-separators.
     312
     313    INPUTS:
     314
     315    - ``G`` -- a Graph
     316
     317    - ``max_degree`` -- (default: 4) maximum degree of the vertices to consider.
     318      The running time of this method depends on the value of this parameter.
     319
     320    - ``verbose`` -- (default: ``False``) is boolean set to ``True`` to display
     321      some information during execution.
     322
     323    OUTPUT:
     324
     325    - ``elim`` -- A ordered list of vertices such that vertex ``elim[i]`` is
     326      removed before vertex ``elim[i+1]``.
     327
     328    TESTS:
     329
     330    Giving anything else than a Graph::
     331
     332        sage: from sage.graphs.hyperbolicity import elimination_ordering_of_simplicial_vertices
     333        sage: elimination_ordering_of_simplicial_vertices([])
     334        Traceback (most recent call last):
     335        ...
     336        ValueError: The input parameter must be a Graph.
     337
     338    Giving two small bounds on degree::
     339
     340        sage: from sage.graphs.hyperbolicity import elimination_ordering_of_simplicial_vertices
     341        sage: elimination_ordering_of_simplicial_vertices(Graph(), max_degree=0)
     342        Traceback (most recent call last):
     343        ...
     344        ValueError: The parameter max_degree must be > 0.
     345
     346    Giving a graph build from a bipartite graph plus an edge::
     347
     348        sage: G = graphs.CompleteBipartiteGraph(2,10)
     349        sage: G.add_edge(0,1)
     350        sage: from sage.graphs.hyperbolicity import elimination_ordering_of_simplicial_vertices
     351        sage: elimination_ordering_of_simplicial_vertices(G)
     352        [2, 3, 4, 5, 6, 7, 8, 9, 10, 0, 1, 11]
     353        sage: elimination_ordering_of_simplicial_vertices(G,max_degree=1)
     354        []
     355    """
     356    if verbose:
     357        print 'Entering elimination_ordering_of_simplicial_vertices'
     358
     359    if not isinstance(G,Graph):
     360        raise ValueError("The input parameter must be a Graph.")
     361    elif max_degree < 1:
     362        raise ValueError("The parameter max_degree must be > 0.")
     363
     364    # We make a copy of the graph. We use a NetworkX graph since modifications
     365    # are a bit faster this way.
     366    import networkx
     367    ggnx = networkx.empty_graph()
     368    for u,v in G.edge_iterator(labels=None):
     369        ggnx.add_edge(u,v)
     370
     371    from sage.combinat.combinat import combinations_iterator
     372    cdef list elim = []
     373    cdef set L = set()
     374
     375    # We identify vertices of degree at most max_degree
     376    for u,d in ggnx.degree_iter():
     377        if d<=max_degree:
     378            L.add(u)
     379
     380    while L:
     381        # We pick up a vertex and check if the induced subgraph of its neighbors
     382        # is a clique. If True, we record it, remove it from the graph, and
     383        # update the list of vertices of degree at most max_degree.
     384        u = L.pop()
     385        X = ggnx.neighbors(u)
     386        if all(ggnx.has_edge(v,w) for v,w in combinations_iterator(X,2)):
     387            elim.append(u)
     388            ggnx.remove_node(u)
     389            for v,d in ggnx.degree_iter(X):
     390                if d<=max_degree:
     391                    L.add(v)
     392
     393    if verbose:
     394        print 'Propose to eliminate',len(elim),'of the',G.num_verts(),'vertices'
     395        print 'End elimination_ordering_of_simplicial_vertices'
     396
     397    return elim
     398
     399
     400######################################################################
     401# Compute the hyperbolicity using a path decreasing length ordering
     402######################################################################
     403
     404cdef inline _invert_cells(pair * tab, uint32_t idxa, uint32_t idxb):
     405    cdef pair tmp = tab[idxa]
     406    tab[idxa] = tab[idxb]
     407    tab[idxb] = tmp
     408
     409
     410cdef _order_pairs_according_elimination_ordering(elim, int N, pair *pairs, uint32_t nb_pairs, uint32_t *last_pair):
     411    r"""
     412    Re-order the pairs of vertices according the elimination of simplicial vertices.
     413
     414    We put pairs of vertices with an extremity in ``elim`` at the end of the
     415    array of pairs.  We record the positions of the first pair of vertices in
     416    ``elim``.  If ``elim`` is empty, the ordering is unchanged and we set
     417    ``last_pair=nb_pairs``.
     418    """
     419    cdef uint32_t j, jmax
     420
     421    jmax = nb_pairs-1
     422    if nb_pairs<=1 or not elim:
     423        last_pair[0] = nb_pairs
     424    else:
     425        B = Bitset(iter(elim))
     426        j = 0
     427        while j<jmax:
     428
     429            if pairs[j].s in B or pairs[j].t in B:
     430                while (pairs[jmax].s in B or pairs[jmax].t in B) and (j<jmax):
     431                    jmax -= 1
     432
     433                if j<jmax:
     434                    _invert_cells(pairs, j, jmax)
     435
     436                if jmax>0:
     437                    jmax -= 1
     438
     439            else: # This pair is at a correct position.
     440                j += 1
     441
     442        # We record the position of the first pair of vertices in elim
     443        last_pair[0] = jmax+1
     444
     445
     446cdef tuple __hyperbolicity__(int N,
     447                             unsigned short **  distances,
     448                             int D,
     449                             int h_LB,
     450                             float approximation_factor,
     451                             float additive_gap,
     452                             elim,
     453                             verbose = False):
     454    """
     455    Return the hyperbolicity of a graph.
     456
     457    This method implements the exact and the approximate algorithms proposed in
     458    [CCL12]_. See the module's documentation for more details.
     459
     460    INPUTS:
     461
     462    - ``N`` -- number of vertices of the graph
     463
     464    - ``distances`` -- path distance matrix
     465
     466    - ``D`` -- diameter of the graph
     467
     468    - ``h_LB`` -- lower bound on the hyperbolicity
     469
     470    - ``approximation_factor`` -- When the approximation factor is set to some
     471      value larger than 1.0, the function stop computations as soon as the ratio
     472      between the upper bound and the best found solution is less than the
     473      approximation factor. When the approximation factor is 1.0, the problem is
     474      solved optimaly.
     475
     476     - ``additive_gap`` -- When sets to a positive number, the function stop
     477       computations as soon as the difference between the upper bound and the
     478       best found solution is less than additive gap. When the gap is 0.0, the
     479       problem is solved optimaly.
     480
     481    - ``elim`` -- elimination ordering of simplicial vertices (list of vertices
     482      to eliminate when the lower bound is larger or equal to 1).
     483
     484    - ``verbose`` -- (default: ``False``) is boolean set to ``True`` to display
     485      some information during execution
     486
     487    OUTPUTS:
     488
     489    This function returns a tuple ( h, certificate, h_UB ), where:
     490
     491    - ``h`` -- is an integer. When 4-tuples with hyperbolicity larger or equal
     492     to `h_LB are found, h is the maximum computed value and so twice the
     493     hyperbolicity of the graph. If no such 4-tuple is found, it returns -1.
     494
     495    - ``certificate`` -- is a list of vertices. When 4-tuples with hyperbolicity
     496      larger that h_LB are found, certificate is the list of the 4 vertices for
     497      which the maximum value (and so the hyperbolicity of the graph) has been
     498      computed. If no such 4-tuple is found, it returns the empty list [].
     499
     500    - ``h_UB`` -- is an integer equal to the proven upper bound for `h`. When
     501      ``h == h_UB``, the returned solution is optimal.
     502    """
     503    cdef int i, j, l, l1, l2, x, y, h, hh, h_UB, a, b, c, d, S1, S2, S3
     504    cdef dict distr = {}
     505    cdef list certificate = []
     506
     507    # We count the number of pairs of vertices at distance l for every l
     508    cdef uint32_t * nb_pairs_of_length = <uint32_t *>sage_calloc(D+1,sizeof(uint32_t))
     509    for 0 <= i < N:
     510        for i+1 <= j < N:
     511            nb_pairs_of_length[ distances[i][j] ] += 1
     512
     513    # We organize the pairs by length in an array of pairs
     514    cdef pair ** pairs_of_length = <pair **>sage_malloc(sizeof(pair *)*(D+1))
     515    cdef uint32_t * cpt_pairs = <uint32_t *>sage_calloc(D+1,sizeof(uint32_t))
     516    for 1 <= i <= D:
     517        pairs_of_length[i] = <pair *>sage_malloc(sizeof(pair)*nb_pairs_of_length[i])
     518        # cpt_pairs[i] = 0
     519    for 0 <= i < N:
     520        for i+1 <= j < N:
     521            l = distances[i][j]
     522            pairs_of_length[ l ][ cpt_pairs[ l ] ].s = i
     523            pairs_of_length[ l ][ cpt_pairs[ l ] ].t = j
     524            cpt_pairs[ l ] += 1
     525    sage_free(cpt_pairs)
     526
     527    if verbose:
     528        print "Current 2 connected component has %d vertices and diameter %d" %(N,D)
     529        print "Paths length distribution:", [ (l, nb_pairs_of_length[l]) for l in range(1, D+1) ]
     530
     531    # We improve the ordering of the pairs according the elimination
     532    # ordering. We store in last_pair[l] the index of the last pair of length l
     533    # to consider when the lower bound on the hyperbolicity >=1.
     534    cdef uint32_t * last_pair = <uint32_t *>sage_malloc(sizeof(uint32_t)*(D+1))
     535    for 1 <= l <= D:
     536        _order_pairs_according_elimination_ordering(elim, N, pairs_of_length[l], nb_pairs_of_length[l], last_pair+l)
     537
     538    approximation_factor = min(approximation_factor, D)
     539    additive_gap = min(additive_gap, D)
     540
     541    # We create the list of triples (sum,length1,length2) sorted in
     542    # co-lexicographic order: decreasing by sum, decreasing by length2,
     543    # decreasing length1. This is to ensure a valid ordering for S1, to avoid
     544    # some tests, and to ease computation of bounds.
     545    cdef list triples = []
     546    for i in range(D,0,-1):
     547        for j in range(D,i-1,-1):
     548            triples.append((i+j,j,i))
     549
     550    # We use some short-cut variables for efficiency
     551    cdef pair * pairs_of_length_l1
     552    cdef pair * pairs_of_length_l2
     553    cdef uint32_t nb_pairs_of_length_l1, nb_pairs_of_length_l2
     554    h = h_LB
     555    h_UB = D
     556    for S1, l1, l2 in triples:
     557
     558        # If we cannot improve further, we stop
     559        if l2 <= h:
     560            h_UB = h
     561            break
     562
     563        if h_UB > l2:
     564            h_UB = l2
     565
     566            if verbose:
     567                print "New upper bound:",ZZ(l2)/2
     568
     569            # Termination if required approximation is found
     570            if certificate and ( (h_UB <= h*approximation_factor) or (h_UB-h <= additive_gap) ):
     571                break
     572
     573        pairs_of_length_l1 = pairs_of_length[l1]
     574        nb_pairs_of_length_l1 = last_pair[l1] if h>=1 else nb_pairs_of_length[l1]
     575        x = 0
     576        while x < nb_pairs_of_length_l1:
     577            a = pairs_of_length_l1[x].s
     578            b = pairs_of_length_l1[x].t
     579
     580            if l2 <= h:
     581                # We cut current exploration if lower bound cannot be improved
     582                break
     583            elif l1 == l2:
     584                y = x+1
     585            else:
     586                y = 0
     587
     588            pairs_of_length_l2 = pairs_of_length[l2]
     589            nb_pairs_of_length_l2 = last_pair[l2] if h>=1 else nb_pairs_of_length[l2]
     590            while y < nb_pairs_of_length_l2:
     591                c = pairs_of_length_l2[y].s
     592                d = pairs_of_length_l2[y].t
     593
     594                # If two points are equal, the value will be 0, so we skip the
     595                # test.
     596                if a == c or a == d or b == c or b == d:
     597                    y += 1
     598                    continue
     599
     600                # We compute the hyperbolicity of the 4-tuple. We have S1 = l1 +
     601                # l2, and the order in which pairs are visited allow us to claim
     602                # that S1 = max( S1, S2, S3 ). If at some point S1 is not the
     603                # maximum value, the order ensures that the maximum value has
     604                # previously been checked.
     605                S2 = distances[a][c] + distances[b][d]
     606                S3 = distances[a][d] + distances[b][c]
     607                if S2 > S3:
     608                    hh = S1 - S2
     609                else:
     610                    hh = S1 - S3
     611
     612                if h < hh:
     613                    # We update current bound on the hyperbolicity and the
     614                    # search space
     615                    h = hh
     616                    certificate = [a, b, c, d]
     617                    if h>=1:
     618                        nb_pairs_of_length_l2 = last_pair[l2]
     619                        nb_pairs_of_length_l1 = last_pair[l1]
     620                        if x >= nb_pairs_of_length_l1:
     621                            break
     622
     623                    if verbose:
     624                        print "New lower bound:",ZZ(hh)/2
     625
     626                # We go for the next pair c-d
     627                y += 1
     628                # We cut current exploration if we know we can not improve lower bound
     629                if l2 <= h:
     630                    h_UB = h
     631                    break
     632            # We go for the next pair a-b
     633            x += 1
     634
     635
     636    # We now release the memory
     637    sage_free(nb_pairs_of_length)
     638    for 1 <= i <= D:
     639        sage_free(pairs_of_length[i])
     640    sage_free(pairs_of_length)
     641    sage_free(last_pair)
     642
     643    # Last, we return the computed value and the certificate
     644    if len(certificate) == 0:
     645        return ( -1, [], h_UB )
     646    else:
     647        return (h, certificate, h_UB)
     648
     649
     650
     651def hyperbolicity(G, algorithm='cuts', approximation_factor=1.0, additive_gap=0, verbose = False):
     652    r"""
     653    Return the hyperbolicity of the graph or an approximation of this value.
     654
     655    The hyperbolicity of a graph has been defined by Gromov [Gromov87]_ as
     656    follows: Let `a, b, c, d` be vertices of the graph, let `S_1 = dist(a, b) +
     657    dist(b, c)`, `S_2 = dist(a, c) + dist(b, d)`, and `S_3 = dist(a, d) +
     658    dist(b, c)`, and let `M_1` and `M_2` be the two largest values among `S_1`,
     659    `S_2`, and `S_3`. We have `hyp(a, b, c, d) = |M_1 - M_2|`, and the
     660    hyperbolicity of the graph is the maximum over all possible 4-tuples `(a, b,
     661    c, d)` divided by 2. The worst case time complexity is in `O( n^4 )`.
     662
     663    INPUT:
     664
     665    - ``G`` -- a Graph
     666
     667    - ``algorithm`` -- (default: ``'cuts'``) specifies the algorithm to use
     668      among:
     669
     670          - ``'basic'`` is an exhaustive algorithm considering all possible
     671            4-tuples and so have time complexity in `O(n^4)`.
     672
     673          - ``'cuts'`` is an exact algorithm proposed in [CCL12_]. It considers
     674            the 4-tuples in an ordering allowing to cut the search space as soon
     675            as a new lower bound is found (see the module's documentation). This
     676            algorithm can be turned into a approximation algorithm.
     677
     678          - ``'cuts+'`` is an extension of the ``cuts`` algorithm that uses
     679            elimination ordering of the simplicial vertices as documented in
     680            [CCL12_]. For some classes of graphs it allows a significant
     681            speed-up. Try it to know if it is intersting for you.
     682
     683    - ``approximation_factor`` -- (default: 1.0) When the approximation factor
     684      is set to some value larger than 1.0, the function stop computations as
     685      soon as the ratio between the upper bound and the best found solution is
     686      less than the approximation factor. When the approximation factor is 1.0,
     687      the problem is solved optimaly. This parameter is used only when the
     688      chosen algorithm is ``'cuts'``.
     689
     690    - ``additive_gap`` -- (default: 0.0) When sets to a positive number, the
     691      function stop computations as soon as the difference between the upper
     692      bound and the best found solution is less than additive gap. When the gap
     693      is 0.0, the problem is solved optimaly. This parameter is used only when
     694      the chosen algorithm is ``'cuts'``.
     695
     696    - ``verbose`` -- (default: ``False``) is a boolean set to True to display
     697      some information during execution: new upper and lower bounds, etc.
     698
     699    OUTPUT:
     700
     701    This function returns the tuple ( delta, certificate, delta_UB ), where:
     702
     703    - ``delta`` -- the hyperbolicity of the graph (half-integer value).
     704
     705    - ``certificate`` -- is the list of the 4 vertices for which the maximum
     706      value has been computed, and so the hyperbolicity of the graph.
     707
     708    - ``delta_UB`` -- is an upper bound for ``delta``. When ``delta ==
     709      delta_UB``, the returned solution is optimal. Otherwise, the approximation
     710      factor if ``delta_UB/delta``.
     711
     712    EXAMPLES:
     713
     714    Hyperbolicity of a `3\times 3` grid::
     715
     716        sage: from sage.graphs.hyperbolicity import hyperbolicity
     717        sage: G = graphs.GridGraph([3,3])
     718        sage: hyperbolicity(G,algorithm='cuts')
     719        (2, [(0, 0), (0, 2), (2, 0), (2, 2)], 2)
     720        sage: hyperbolicity(G,algorithm='basic')
     721        (2, [(0, 0), (0, 2), (2, 0), (2, 2)], 2)
     722
     723    Hyperbolicity of a PetersenGraph::
     724
     725        sage: from sage.graphs.hyperbolicity import hyperbolicity
     726        sage: G = graphs.PetersenGraph()
     727        sage: hyperbolicity(G,algorithm='cuts')
     728        (1/2, [0, 1, 2, 3], 1/2)
     729        sage: hyperbolicity(G,algorithm='basic')
     730        (1/2, [0, 1, 2, 3], 1/2)
     731
     732    Asking for an approximation::
     733
     734        sage: from sage.graphs.hyperbolicity import hyperbolicity
     735        sage: G = graphs.GridGraph([2,10])
     736        sage: hyperbolicity(G,algorithm='cuts', approximation_factor=1.5)
     737        (1, [(0, 0), (0, 9), (1, 0), (1, 9)], 3/2)
     738        sage: hyperbolicity(G,algorithm='cuts', approximation_factor=4)
     739        (1, [(0, 0), (0, 9), (1, 0), (1, 9)], 4)
     740        sage: hyperbolicity(G,algorithm='cuts', additive_gap=2)
     741        (1, [(0, 0), (0, 9), (1, 0), (1, 9)], 3)
     742
     743    Comparison of results::
     744
     745        sage: from sage.graphs.hyperbolicity import hyperbolicity
     746        sage: for i in xrange(10): # long test
     747        ...       G = graphs.RandomBarabasiAlbert(100,2)
     748        ...       d1,_,_ = hyperbolicity(G,algorithm='basic')
     749        ...       d2,_,_ = hyperbolicity(G,algorithm='cuts')
     750        ...       d3,_,_ = hyperbolicity(G,algorithm='cuts+')
     751        ...       l3,_,u3 = hyperbolicity(G,approximation_factor=2)
     752        ...       if d1!=d2 or d1!=d3 or l3>d1 or u3<d1:
     753        ...          print "That's not good!"
     754
     755    TESTS:
     756
     757    Giving anything else than a Graph::
     758
     759        sage: from sage.graphs.hyperbolicity import hyperbolicity
     760        sage: hyperbolicity([])
     761        Traceback (most recent call last):
     762        ...
     763        ValueError: The input parameter must be a Graph.
     764
     765    Giving a non connected graph::
     766
     767        sage: from sage.graphs.hyperbolicity import hyperbolicity
     768        sage: G = Graph([(0,1),(2,3)])
     769        sage: hyperbolicity(G)
     770        Traceback (most recent call last):
     771        ...
     772        ValueError: The input Graph must be connected.
     773
     774    Giving wrong approximation factor::
     775
     776        sage: from sage.graphs.hyperbolicity import hyperbolicity
     777        sage: G = graphs.PetersenGraph()
     778        sage: hyperbolicity(G,algorithm='cuts', approximation_factor=0.1)
     779        Traceback (most recent call last):
     780        ...
     781        ValueError: The approximation factor must be >= 1.0.
     782
     783    Giving negative additive gap::
     784
     785        sage: from sage.graphs.hyperbolicity import hyperbolicity
     786        sage: G = Graph()
     787        sage: hyperbolicity(G,algorithm='cuts', additive_gap=-1)
     788        Traceback (most recent call last):
     789        ...
     790        ValueError: The additive gap must be >= 0.
     791
     792    Asking for an unknown algorithm::
     793
     794        sage: from sage.graphs.hyperbolicity import hyperbolicity
     795        sage: G = Graph()
     796        sage: hyperbolicity(G,algorithm='tip top')
     797        Traceback (most recent call last):
     798        ...
     799        ValueError: Algorithm 'tip top' not yet implemented. Please contribute.
     800    """
     801    if not isinstance(G,Graph):
     802        raise ValueError("The input parameter must be a Graph.")
     803    if algorithm=='cuts':
     804        if approximation_factor < 1.0:
     805            raise ValueError("The approximation factor must be >= 1.0.")
     806        if additive_gap < 0.0:
     807            raise ValueError("The additive gap must be >= 0.")
     808    elif not algorithm in ['basic', 'cuts+']:
     809        raise ValueError("Algorithm '%s' not yet implemented. Please contribute." %(algorithm))
     810
     811    # The hyperbolicity is defined on connected graphs
     812    if not G.is_connected():
     813        raise ValueError("The input Graph must be connected.")
     814
     815    # The hyperbolicity of some classes of graphs is known. If it is easy and
     816    # fast to test that a graph belongs to one of these classes, we do it.
     817    if G.num_verts() <= 3:
     818        # The hyperbolicity of a graph with 3 vertices is 0.
     819        # The certificate is the set of vertices.
     820        return 0, G.vertices(), 0
     821
     822    elif G.num_verts() == G.num_edges() + 1:
     823        # G is a tree
     824        # Any set of 4 vertices is a valid certificate
     825        return 0, [G.vertices()[x] for x in range(4)], 0
     826
     827    elif G.is_clique():
     828        # Any set of 4 vertices is a valid certificate
     829        return 0, [G.vertices()[z] for z in xrange(4)], 0
     830
     831
     832    cdef unsigned short * _distances_
     833    cdef unsigned short ** distances
     834    cdef int i, j, k, iN, N, hyp, hyp_UB, hh, hh_UB, D
     835    cdef dict distr = {}
     836    cdef list certificate = []
     837    cdef list certif
     838    cdef dict mymap, myinvmap
     839
     840    # We search for the largest 2-connected component. Indeed, the hyperbolicity
     841    # of a graph is the maximum over its 2-connected components.
     842    B,C = G.blocks_and_cut_vertices()
     843
     844    if verbose:
     845        # we compute the distribution of size of the blocks
     846        for V in B:
     847            i = len(V)
     848            if i in distr:
     849                distr[ i ] += 1
     850            else:
     851                distr[ i ] = 1
     852        print "Graph with %d blocks" %(len(B))
     853        print "Blocks size distribution:", distr
     854
     855    hyp = 0
     856    for V in B:
     857
     858        # The hyperbolicity of a graph with 3 vertices is 0, and a graph cannot
     859        # have hyperbolicity larger than N/2. So we consider only larger
     860        # 2-connected subgraphs.
     861        if len(V) > max( 3, 2*hyp) :
     862
     863            # We build the subgraph
     864            H = _my_subgraph(G,V)
     865            N = H.num_verts()
     866
     867            # We test if the block belongs to a graph class with known
     868            # hyperbolicity and a fast test.
     869            if H.is_clique():
     870                continue
     871
     872            # We relabel the vertices to ensure integer vertex names in the
     873            # range [0..N-1] since the c_distances_all_pairs uses integer labels
     874            # in the range [0..N-1].
     875            mymap = H.relabel( return_map=True )
     876
     877            # We compute the distances and store the results in a 2D array
     878            # Although it seems irrelevant to use a 2D array instead of a 1D
     879            # array, it seems to be faster in practice. We also compute the
     880            # diameter.
     881            _distances_ = c_distances_all_pairs(H)
     882            distances = <unsigned short **>sage_malloc(sizeof(unsigned short *)*N)
     883            D = 0
     884            for 0 <= i < N:
     885                distances[i] = _distances_+i*N
     886                for 0 <= j < N:
     887                    if distances[i][j] > D:
     888                        D = distances[i][j]
     889
     890            # We call the cython function for computing the hyperbolicity with
     891            # the required parameters.
     892            if algorithm == 'cuts' or algorithm == 'cuts+':
     893
     894                if algorithm == 'cuts+':
     895                    # We compute the elimination ordering of simplicial vertices of H
     896                    elim = elimination_ordering_of_simplicial_vertices(H, max(2,floor(N**(1/2.0))), verbose)
     897                else:
     898                    elim = []
     899
     900                hh, certif, hh_UB = __hyperbolicity__(N, distances, D, hyp, approximation_factor, 2*additive_gap, elim, verbose)
     901
     902            elif algorithm == 'basic':
     903                hh, certif = __hyperbolicity_basic_algorithm__(N, distances, verbose)
     904                hh_UB = hh
     905
     906            # We test if the new computed value improves upon previous value.
     907            if hh > hyp:
     908                hyp = hh
     909                hyp_UB = hh_UB
     910
     911                # We construct the inverse mapping of the relabeling of the
     912                # vertices of the subgraph
     913                myinvmap = dict([(mymap[x],x) for x in mymap])
     914
     915                # We then construct the correct certificate
     916                certificate = [myinvmap[u] for u in certif]
     917
     918            # We now release the memory
     919            sage_free(distances)
     920            sage_free(_distances_)
     921
     922    # Last, we return the computed value and the certificate
     923    return  ZZ(hyp)/2, sorted(certificate), ZZ(hyp_UB)/2
     924
     925
     926######################################################################
     927# Distribution of the hyperbolicity of 4-tuples
     928######################################################################
     929
     930cdef dict __hyperbolicity_distribution__(int N, unsigned short ** distances):
     931    """
     932    Return the distribution of the hyperbolicity of the 4-tuples of the graph.
     933
     934    The hyperbolicity of a graph has been defined by Gromov [Gromov87]_ as
     935    follows: Let `a, b, c, d` be vertices of the graph, let `S_1 = dist(a, b) +
     936    dist(b, c)`, `S_2 = dist(a, c) + dist(b, d)`, and `S_3 = dist(a, d) +
     937    dist(b, c)`, and let `M_1` and `M_2` be the two largest values among `S_1`,
     938    `S_2`, and `S_3`. We have `hyp(a, b, c, d) = |M_1 - M_2|`, and the
     939    hyperbolicity of the graph is the maximum over all possible 4-tuples `(a, b,
     940    c, d)` divided by 2.
     941
     942    The computation of the hyperbolicity of each 4-tuple, and so the
     943    hyperbolicity distribution, takes time in `O( n^4 )`.
     944
     945    We use ``unsigned long int`` on 64 bits, so ``uint64_t``, to count the
     946    number of 4-tuples of given hyperbolicity. So we cannot exceed `2^64-1`.
     947    This value should be sufficient for most users.
     948
     949    INPUT:
     950
     951    - ``N`` -- number of vertices of the graph (and side of the matrix)
     952
     953    - ``distances`` -- matrix of distances in the graph
     954
     955    OUTPUT:
     956
     957    - ``hdict`` -- A dictionnary such that hdict[i] is the number of 4-tuples of
     958      hyperbolicity i among the considered 4-tuples.
     959    """
     960    # We initialize the table of hyperbolicity. We use an array of unsigned long
     961    # int instead of a dictionnary since it is much faster.
     962    cdef int i
     963
     964    cdef uint64_t * hdistr = <uint64_t *>sage_calloc(N+1,sizeof(uint64_t))
     965
     966    # We now compute the hyperbolicity of each 4-tuple
     967    cdef int a, b, c, d
     968    for 0 <= a < N-3:
     969        for a < b < N-2:
     970            for b < c < N-1:
     971                for c < d < N:
     972                    hdistr[ __hyp__(distances, a, b, c, d) ] += 1
     973
     974    # We prepare the dictionnary of hyperbolicity distribution to return
     975    Nchoose4 = binomial(N,4)
     976    cdef dict hdict = dict( [ (ZZ(i)/2, ZZ(hdistr[i])/Nchoose4) for 0 <= i <= N if hdistr[i] > 0 ] )
     977
     978    sage_free(hdistr)
     979
     980    return hdict
     981
     982
     983# We use this trick since it is way faster than using the sage randint function.
     984cdef extern from "stdlib.h":
     985    long c_libc_random "random"()
     986    void c_libc_srandom "srandom"(unsigned int seed)
     987
     988cdef dict __hyperbolicity_sampling__(int N, unsigned short ** distances, uint64_t sampling_size):
     989    """
     990    Return a sampling of the hyperbolicity distribution of the graph.
     991
     992    The hyperbolicity of a graph has been defined by Gromov [Gromov87]_ as
     993    follows: Let `a, b, c, d` be vertices of the graph, let `S_1 = dist(a, b) +
     994    dist(b, c)`, `S_2 = dist(a, c) + dist(b, d)`, and `S_3 = dist(a, d) +
     995    dist(b, c)`, and let `M_1` and `M_2` be the two largest values among `S_1`,
     996    `S_2`, and `S_3`. We have `hyp(a, b, c, d) = |M_1 - M_2|`, and the
     997    hyperbolicity of the graph is the maximum over all possible 4-tuples `(a, b,
     998    c, d)` divided by 2.
     999
     1000    We use ``unsigned long int`` on 64 bits, so ``uint64_t``, to count the
     1001    number of 4-tuples of given hyperbolicity. So we cannot exceed `2^64-1`.
     1002    This value should be sufficient for most users.
     1003
     1004    INPUT:
     1005
     1006    - ``N`` -- number of vertices of the graph (and side of the matrix)
     1007
     1008    - ``distances`` -- matrix of distances in the graph
     1009
     1010    - ``sampling_size`` -- number of 4-tuples considered. Default value is 1000.
     1011
     1012    OUTPUT:
     1013
     1014    - ``hdict`` -- A dictionnary such that hdict[i] is the number of 4-tuples of
     1015                hyperbolicity i among the considered 4-tuples.
     1016    """
     1017    cdef int i, a, b, c, d
     1018    cdef uint64_t j
     1019
     1020    # We initialize the table of hyperbolicity. We use an array of unsigned long
     1021    # int instead of a dictionnary since it is much faster.
     1022    cdef uint64_t * hdistr = <uint64_t *>sage_calloc(N+1,sizeof(uint64_t))
     1023
     1024    # We now compute the hyperbolicity of each quadruple
     1025    for 0 <= j < sampling_size:
     1026        a = c_libc_random() % N
     1027        b = c_libc_random() % N
     1028        c = c_libc_random() % N
     1029        d = c_libc_random() % N
     1030        while a == b:
     1031            b = c_libc_random() % N
     1032        while a == c or b == c:
     1033            c = c_libc_random() % N
     1034        while a == d or b == d or c == d:
     1035            d = c_libc_random() % N
     1036
     1037        hdistr[ __hyp__(distances, a, b, c, d) ] += 1
     1038
     1039    # We prepare the dictionnary of hyperbolicity distribution from sampling
     1040    cdef dict hdict = dict( [ (ZZ(i)/2, ZZ(hdistr[i])/ZZ(sampling_size)) for 0 <= i <= N if hdistr[i] > 0 ] )
     1041
     1042    sage_free(hdistr)
     1043
     1044    return hdict
     1045
     1046
     1047def hyperbolicity_distribution(G, algorithm='sampling', sampling_size=10**6):
     1048    r"""
     1049    Return the hyperbolicity distribution of the graph or a sampling of it.
     1050
     1051    The hyperbolicity of a graph has been defined by Gromov [Gromov87]_ as
     1052    follows: Let `a, b, c, d` be vertices of the graph, let `S_1 = dist(a, b) +
     1053    dist(b, c)`, `S_2 = dist(a, c) + dist(b, d)`, and `S_3 = dist(a, d) +
     1054    dist(b, c)`, and let `M_1` and `M_2` be the two largest values among `S_1`,
     1055    `S_2`, and `S_3`. We have `hyp(a, b, c, d) = |M_1 - M_2|`, and the
     1056    hyperbolicity of the graph is the maximum over all possible 4-tuples `(a, b,
     1057    c, d)` divided by 2.
     1058
     1059    The computation of the hyperbolicity of each 4-tuple, and so the
     1060    hyperbolicity distribution, takes time in `O( n^4 )`.
     1061
     1062    INPUT:
     1063
     1064    - ``G`` -- a Graph.
     1065
     1066    - ``algorithm`` -- (default: 'sampling') When algorithm is 'sampling', it
     1067      returns the distribution of the hyperbolicity over a sample of
     1068      ``sampling_size`` 4-tuples. When algorithm is 'exact', it computes the
     1069      distribution of the hyperbolicity over all 4-tuples. Be aware that the
     1070      computation time can be HUGE.
     1071
     1072    - ``sampling_size`` -- (default: `10^6`) number of 4-tuples considered in
     1073      the sampling. Used only when ``algorithm == 'sampling'``.
     1074
     1075    OUTPUT:
     1076
     1077    - ``hdict`` -- A dictionnary such that hdict[i] is the number of 4-tuples of
     1078      hyperbolicity i.
     1079
     1080    EXAMPLES:
     1081
     1082    Exact hyperbolicity distribution of the Petersen Graph::
     1083
     1084        sage: from sage.graphs.hyperbolicity import hyperbolicity_distribution
     1085        sage: G = graphs.PetersenGraph()
     1086        sage: hyperbolicity_distribution(G,algorithm='exact')
     1087        {0: 3/7, 1/2: 4/7}
     1088
     1089    Exact hyperbolicity distribution of a `3\times 3` grid::
     1090
     1091        sage: from sage.graphs.hyperbolicity import hyperbolicity_distribution
     1092        sage: G = graphs.GridGraph([3,3])
     1093        sage: hyperbolicity_distribution(G,algorithm='exact')
     1094        {0: 11/18, 1: 8/21, 2: 1/126}
     1095
     1096    TESTS:
     1097
     1098    Giving anythin else than a Graph::
     1099
     1100        sage: from sage.graphs.hyperbolicity import hyperbolicity_distribution
     1101        sage: hyperbolicity_distribution([])
     1102        Traceback (most recent call last):
     1103        ...
     1104        ValueError: The input parameter must be a Graph.
     1105
     1106    Giving a non connected graph::
     1107
     1108        sage: from sage.graphs.hyperbolicity import hyperbolicity_distribution
     1109        sage: G = Graph([(0,1),(2,3)])
     1110        sage: hyperbolicity_distribution(G)
     1111        Traceback (most recent call last):
     1112        ...
     1113        ValueError: The input Graph must be connected.
     1114    """
     1115    if not isinstance(G,Graph):
     1116        raise ValueError("The input parameter must be a Graph.")
     1117    # The hyperbolicity is defined on connected graphs
     1118    if not G.is_connected():
     1119        raise ValueError("The input Graph must be connected.")
     1120
     1121    # The hyperbolicity distribution of some classes of graphs is known. If it
     1122    # is easy and fast to test that a graph belongs to one of these classes, we
     1123    # do it.
     1124    if (G.num_verts()==G.num_edges()+1) or G.is_clique():
     1125        return {0: sampling_size if algorithm=='sampling' else binomial(G.num_verts(),4)}
     1126
     1127    cdef int N = G.num_verts()
     1128    cdef int i, j
     1129    cdef unsigned short ** distances
     1130    cdef unsigned short * _distances_
     1131    cdef dict hdict
     1132
     1133    # We compute the all pairs shortest path and store the result in a 2D array
     1134    # for faster access.
     1135    H = G.relabel( inplace = False )
     1136    _distances_ = c_distances_all_pairs(H)
     1137    distances = <unsigned short **>sage_malloc(sizeof(unsigned short *)*N)
     1138    for 0 <= i < N:
     1139        distances[i] = _distances_+i*N
     1140
     1141    if algorithm == 'exact':
     1142        hdict = __hyperbolicity_distribution__(N, distances)
     1143    elif algorithm == 'sampling':
     1144        hdict = __hyperbolicity_sampling__(N, distances, sampling_size)
     1145    else:
     1146        raise ValueError("Algorithm '%s' not yet implemented. Please contribute." %(algorithm))
     1147
     1148    # We release memory
     1149    sage_free(distances)
     1150    sage_free(_distances_)
     1151
     1152    return hdict