Opened 3 years ago

Closed 3 years ago

#27482 closed enhancement (fixed)

Filter Kruskal Implementation For Minimum Spanning tree

Reported by: gh-Hrishabh-yadav Owned by:
Priority: major Milestone: sage-8.8
Component: graph theory Keywords:
Cc: dcoudert Merged in:
Authors: Hrishabh Yadav Reviewers: David Coudert
Report Upstream: N/A Work issues:
Branch: 94aa027 (Commits, GitHub, GitLab) Commit: 94aa027a99c5ff33fb9ca065c271887e26810d2f
Dependencies: Stopgaps:

Status badges

Description

This ticket introduces a modified version of Kruskal and is called filter-kruskal as suggested in the paper:

http://algo2.iti.kit.edu/documents/fkruskal.pdf

This algorithm avoids the sorting of a large dataset by partitioning the list of edges using a pivot. Its time complexity is approximately equal to O(m) where m=number of edges. I have implemented this new algorithm and it is passing all the tests. Currently, it takes approximately equal time as compared to Kruskal algorithm of sagemath. But, it can be further optimized.

Change History (53)

comment:1 Changed 3 years ago by dcoudert

Welcome to Sagemath !

Is the algorithm complicated to implement ? Is it faster than, or comparable to, other spanning tree methods that we already have ?

comment:2 Changed 3 years ago by gh-Hrishabh-yadav

  • Branch set to u/gh-Hrishabh-yadav/filter_kruskal_implementation_for_minimum_spanning_tree

comment:3 Changed 3 years ago by git

  • Commit set to a7c90fd8574bbae0f24a7e1d650760cf78f58fe7

Branch pushed to git repo; I updated commit sha1. New commits:

a7c90fdFilter Kruskal Implementation for MST

comment:4 Changed 3 years ago by git

  • Commit changed from a7c90fd8574bbae0f24a7e1d650760cf78f58fe7 to 68d25b6405eb5ecd948b49c4afef1504145e7c33

Branch pushed to git repo; I updated commit sha1. New commits:

68d25b6Updated filter_kruskal

comment:5 follow-up: Changed 3 years ago by gh-Hrishabh-yadav

Hi, Glad to work for Sagemath!

I have implemented the following algorithm and have committed it. Right now it is comparable to kruskal but with further optimization it should work faster than Kruskal and Prim's. This algorithm works quite well for large dataset where number of vertices are more than 60000, This statements has been shown in this following pdf :

http://algo2.iti.kit.edu/documents/algo1-2014/alenex09filterkruskal.pdf

Also notice I have moved the code of "all_paths" from generic_graphs.py file to generic_graph.pyx Please review

comment:6 in reply to: ↑ 5 Changed 3 years ago by dcoudert

I have implemented the following algorithm and have committed it. Right now it is comparable to kruskal but with further optimization it should work faster than Kruskal and Prim's. This algorithm works quite well for large dataset where number of vertices are more than 60000, This statements has been shown in this following pdf :

http://algo2.iti.kit.edu/documents/algo1-2014/alenex09filterkruskal.pdf

60000 is big...

Also notice I have moved the code of "all_paths" from generic_graphs.py file to generic_graph.pyx

  • Why ? is it related to this ticket ? Since the answer is no, you must revert this change.
  • Also, in your branch I see
    diff --git a/git-trac-command b/git-trac-command
    new file mode 160000
    +Subproject b5d75da815a6e261eae54da66d418e60cea7a43
    
    Which is not related to this branch. So please correct that
  • I also see:
    diff --git a/src/sage/graphs/generic_graph.py b/src/sage/graphs/generic_graph.py
    index 3cfb603..9fd276b 100644
    --- a/src/sage/graphs/generic_graph.py
    +++ b/src/sage/graphs/generic_graph.py
    @@ -15318,6 +15318,7 @@ class GenericGraph(GenericGraph_pyx):
                     if not act_path:                 # there is no other vertex ...
                         done = True                  # ... so we are done
             return all_paths
    +        
    
    Why adding an empty line here ? Actually it's not an empty line as you are adding several unnecessary spaces. Please revert that.
  • Another issue:
    -    return list(kruskal_iterator(G, wfunction=wfunction, check=check))
    -
    -
    -def kruskal_iterator(G, wfunction=None, bint check=False):
    +    return  list(kruskal_iterator(G, wfunction=wfunction, check=check))
    +    #print(time.time()-start_time)
    +    
    +def kruskal_iterator(G,wfunction=None, bint check=False):
    
    Please remove the extra space you added after return, remove the print statement, and add the space you have removed in G,wfunction=None.

Understand that we spend a considerable amount of time improving the quality of the code, of its documentation, and of its presentation to ease its maintainability. So please don't introduce unnecessary changes, and please follow the coding style.

  • update your code to avoid the use of global variables. It is not necessary here.
  • Why using an iterator over edges here and variable i?
    +    cdef int i=0
    +    cdef int ch=0
    +    #use of ch is to equally divide the the edges that are equal to the pivot(Random) edge weight 
    +    while i < len(G_edges):
    +        
    +        Curr=next(itera)
    +        i+=1
    
  • What is variable c ?
  • You don't need variable i here. You don't use it. So a for loop is more appropriate.
    +    while i < m:
    +        i+=1
    +        e = next(sortedE_iter)
    

Please review

That's enough for now...

comment:7 Changed 3 years ago by git

  • Commit changed from 68d25b6405eb5ecd948b49c4afef1504145e7c33 to 241302ba800cd003ea15c6a32df10a2055f4f6d2

Branch pushed to git repo; I updated commit sha1. New commits:

c06f7a3A new version of spanning tree has been added, this new algorithm is Filter Krus
69511aeUpdated filter_kruskal
9fa585aUpdated filter_kruskal
660e612updating kruskal
87700fdUpdated filter_kruskal
54bf085Updated filter_kruskal
241302bMerge branch 'u/gh-Hrishabh-yadav/filter_kruskal_implementation_for_minimum_spanning_tree' of git://trac.sagemath.org/sage into t/27482/filter_kruskal_implementation_for_minimum_spanning_tree

comment:8 Changed 3 years ago by git

  • Commit changed from 241302ba800cd003ea15c6a32df10a2055f4f6d2 to cdbd0e2a411e7343fb7368c34797323dbb497929

Branch pushed to git repo; I updated commit sha1. New commits:

cdbd0e2modified some changes

comment:9 Changed 3 years ago by git

  • Commit changed from cdbd0e2a411e7343fb7368c34797323dbb497929 to cf8b9a5b822202fcde37901174b53c9441a37b4d

Branch pushed to git repo; I updated commit sha1. New commits:

cf8b9a5Modified changes

comment:10 Changed 3 years ago by gh-Hrishabh-yadav

Following changes have been made:

1.) Deleted The file git-trac-command

2.) Have modified the code according to your suggestions:

a.) Removed all the extra lines

b.) Removed the useless comments, Global variables

c.) Modified the code to use for loop

comment:11 Changed 3 years ago by gh-Hrishabh-yadav

Result with graphs size of 10000 vertices and edges : 3*105

sage:  from sage.graphs.spanning_tree import kruskal
....: from sage.graphs.spanning_tree import filter_kruskal
....: G= Graph()
....: G.allow_multiple_edges(True)
....: G.allow_loops(True)
....: G.weighted(True)
....: for i in range(300000):
....:     G.add_edge(randint(0,10000),randint(0,10000),randint(0,1000000))
....: E=filter_kruskal(G);
....: E=kruskal(G)
Time for filter_kruskal: 0.708007097244
Time for Kruskal:        0.795707941055

Result with with more dense graphs

sage: from sage.graphs.spanning_tree import kruskal
....: from sage.graphs.spanning_tree import filter_kruskal
....: G= Graph()
....: G.allow_multiple_edges(True)
....: G.allow_loops(True)
....: G.weighted(True)
....: for i in range(3000000):
....:     G.add_edge(randint(0,10000),randint(0,10000),randint(0,1000000))
....: E=filter_kruskal(G);
....: E=kruskal(G)
Time for filter_kruskal: 7.70112395287
Time for Kruskal:        11.8219969273

comment:12 Changed 3 years ago by gh-Hrishabh-yadav

  • Status changed from new to needs_review

comment:13 Changed 3 years ago by dimpase

it seems that doctests errors shown by patchbot are not related to this ticket. (it also says that pyflakes plugin should show things to be fixed, but this does not happen somehow)

comment:14 Changed 3 years ago by gh-Hrishabh-yadav

What can be done?

comment:15 follow-up: Changed 3 years ago by dimpase

You can try running pyflakes locally. There is nothing you can do about patchbot errors not related to this ticket...

comment:16 in reply to: ↑ 15 Changed 3 years ago by gh-Hrishabh-yadav

Replying to dimpase: Is there a way to run pyflakes for cython file, apparently pyflakes is not able read any cpp part in the code

You can try running pyflakes locally. There is nothing you can do about patchbot errors not related to this ticket...

comment:17 Changed 3 years ago by dimpase

Perhaps flake8 is better: see https://stackoverflow.com/questions/31269527/running-pep8-or-pylint-on-cython-code

You can install it via pip into Sage's python, naturally.

comment:18 Changed 3 years ago by dcoudert

Some remarks.

  • Please try to follow the general coding style http://doc.sagemath.org/html/en/developer/coding_basics.html#python-code-style. You need time and experience to learn it of course ;)
    • missing space between declaration of parameters
      -def kruskal_iterator(G,wfunction=None, bint check=False):
      +def kruskal_iterator(G, wfunction=None, bint check=False):
      
    • missing space around =
      -    cpdef list MST_E=[]
      +    cdef list MST_E = []
      
    • Always put a space after #
      -    #creating list of all edges 
      +    # creating list of all edges 
      
  • You can use cpdef for a function, but there is no need for variables. Prefer cdef.
  • method .edges() sorts edges by default. When it's not needed, you can do either .edges(sort=False) or list(G.edge_iterator()). For instance
    -            A=list(G.edges())
    -            return A
    +            return G.edges(sort=False)
    
    and
    -    cpdef list edgesOfGraph=[]
    -    edgesOfGraph=list(g.edges())
    +    cdef edgesOfGraph = g.edges(sort=False)
    
  • to iterate over the vertices, you can do for u in g.vertex_iterator(): or simply and preferred for u in g:.
  • Instead of importing from sage.graphs.graph import Graph in all methods, import it for the all file
  • Instead of defining method filter, you could do directly:
    L1_filter = [e for e in L1_Graph if union_find.find(e[0]) != union_find.find(e[1])]
    
  • method filter_kruskal_iterator is not an itrator as it fills MST_E. Instead, it should yield edges. This way, in method partition, you could do:
         if len(edgesOfGraph) < 10000:
    -        filter_kruskal_iterator(edgesOfGraph,Num_V,MST_E,union_find, wfunction=wfunction, check=False)
    -        return
    +        yield from filter_kruskal_iterator(edgesOfGraph, Num_V, union_find, wfunction=wfunction, check=False)
    
    and also
         if len(L1_filter) > 0:
    -       partition(L1_filter, Num_V, MST_E,union_find,wfunction=wfunction,check=check)
    +       yield from partition(L1_filter, Num_V, union_find, wfunction=wfunction, check=check)
    

comment:19 Changed 3 years ago by git

  • Commit changed from cf8b9a5b822202fcde37901174b53c9441a37b4d to f4d4a1ab999adbb2f88b9e60f3f2867a53a6a6c0

Branch pushed to git repo; I updated commit sha1. New commits:

f4d4a1aImproved quality of code

comment:20 Changed 3 years ago by git

  • Commit changed from f4d4a1ab999adbb2f88b9e60f3f2867a53a6a6c0 to fbe35aff2ff5fd2518a95ab0888a0af7bfeaecde

Branch pushed to git repo; I updated commit sha1. New commits:

fbe35afMade count_edges global

comment:21 Changed 3 years ago by gh-Hrishabh-yadav

Thanks for the following suggestions, Time taken for Filter_kruskal has decreased quite well,

Result with number of edges = 3*106

sage: from sage.graphs.spanning_tree import kruskal
....: from sage.graphs.spanning_tree import filter_kruskal
....: G= Graph()
....: G.allow_multiple_edges(True)
....: G.allow_loops(True)
....: G.weighted(True)
....: for i in range(3000000):
....:     G.add_edge(randint(0,10000),randint(0,10000),randint(0,100000000))
....: E=filter_kruskal(G);
....: E=kruskal(G);
Time taken filter_kruskal : 5.84873104095
Time taken kruskal        : 11.6121520996

Result with number of edges = 3*105

sage: from sage.graphs.spanning_tree import kruskal
....: from sage.graphs.spanning_tree import filter_kruskal
....: G= Graph()
....: G.allow_multiple_edges(True)
....: G.allow_loops(True)
....: G.weighted(True)
....: for i in range(300000):
....:     G.add_edge(randint(0,10000),randint(0,10000),randint(0,100000000))
....: E=filter_kruskal(G);
....: E=kruskal(G);
Time taken filter_kruskal : 0.632566213608
Time taken kruskal        : 0.840409040451

For counting number of edges added in Minimum spanning tree, we can follow two way by making global variable or passing an array of a single element. Which way will be preferred or is there some other way?

comment:22 Changed 3 years ago by gh-Hrishabh-yadav

I still don't understand what's the problem with the patchbot. Can someone explain me?

comment:23 Changed 3 years ago by dcoudert

I don't know what's going on with the patchbot, but when I look at the content of your branch, I still see modifications in generic_graph.py and generic_graph_pyx.pyx that must be reverted. I also see:

diff --git a/git-trac-command b/git-trac-command
new file mode 160000
+Subproject b5d75da815a6e261eae54da66d418e60cea7a43

and I don't know why it's here...

Several comments in your code

  • Please move from sage.graphs.graph import Graph at the top of the file, at the same level than from sage.sets.disjoint_set cimport DisjointSet_of_hashables
  • In a method, you must add an empty line just after the first 1 line description. For instance
         Minimum spanning tree using Filter_Kruskal's algorithm.
    +
         This algorithm is different version of kruskal algorithm, where we don't sort
    
  • Similarly, add an empty line between paragraphs of the documentation. It will drastically improve the readability. The quality of the documentaiton is one of the strength of Sagemath and the graph module. That's why we do our best the maintain the quality, although it takes time to get used to the standard.
  • the keyword Example should be EXAMPLES:, and
    -    Example:
    -    The edges of a minimum spanning tree of ``G``, if one exists, otherwise
    -    returns the empty list.
    -    Output:
    -    sage: from sage.graphs.spanning_tree import filter_kruskal
    -    sage: G = Graph({1:{2:28, 6:10}, 2:{3:16, 7:14}, 3:{4:12}, 4:{5:22, 7:18}, 5:{6:25, 7:24}})
    -    sage: G.weighted(True)
    -    sage: E = filter_kruskal(G, check=True); E
    -    [(1, 6, 10), (3, 4, 12), (2, 7, 14), (2, 3, 16), (4, 5, 22), (5, 6, 25)]
    +    EXAMPLES:
    +
    +    Return the edges of a minimum spanning tree of the graph, if one exists::
    +
    +        sage: from sage.graphs.spanning_tree import filter_kruskal
    +        sage: G = Graph({1:{2:28, 6:10}, 2:{3:16, 7:14}, 3:{4:12}, 4:{5:22, 7:18}, 5:{6:25, 7:24}})
    +        sage: G.weighted(True)
    +        sage: E = filter_kruskal(G, check=True); E
    +        [(1, 6, 10), (3, 4, 12), (2, 7, 14), (2, 3, 16), (4, 5, 22), (5, 6, 25)]
    +
    

Note that the 4 spaces shifts are important for generating a nice html documentation.

  • Error messages have no capital letter at the beginning and no . at the end (unless we have a very long text)
    -        raise ValueError("The input G must be an undirected graph.")
    +        raise ValueError("the input graph must be undirected")
    
  • method .edges() returns a list of edges. So you don't need to do list(g.edges(...)). Moreover, there is no need to initialize edgesOfGraph before assigning it a list.
    -    cdef list edgesOfGraph = []
    -    edgesOfGraph = list(g.edges(sort=False))
    +    cdef list edgesOfGraph = g.edges(sort=False)
    
    not also that I'm not a fan of the name edgesOfGraph. Could be simply edges or E ?
  • You don't need to define variable Num_V. It is used only once, so use g.order() directly
  • Why do you need to define a global variable to count edges ? Do you really need to count edges ? Note that as soon as the union find data structure as a single set, you know that you have selected enough edges. check union_find.number_of_subsets()
  • method partition must be documented and at least one doctest added. Note also that this method could be cdef as it is for local use only
  • typo: psuedocode -> pseudocode
  • you could reorganize the code to 1) select the pivot, and then split the list of input edges according that pivot. So something like
    # Selecting a pivot (edge weight) at random
    pivot = random.choice(edges)[2]
    
    # Divide the list of edges according the pivot.
    # We use variable ch to equally divide edges with weight equal
    # to the pivot
    cdef list L1 = []
    cdef list L2 = []
    cdef int ch = 0
    
    for e in edges:
        if e[2] < pivot:
            L1.append(e)
        elif e[2] > pivot:
            L2.append(e)
        elif ch:
            L1.append(e)
            ch = 0
        else:
            L2.append(e)
            ch = 1
    

comment:24 Changed 3 years ago by gh-Hrishabh-yadav

  • Branch u/gh-Hrishabh-yadav/filter_kruskal_implementation_for_minimum_spanning_tree deleted
  • Commit fbe35aff2ff5fd2518a95ab0888a0af7bfeaecde deleted

comment:25 Changed 3 years ago by gh-Hrishabh-yadav

  • Branch set to u/gh-Hrishabh-yadav/filter_kruskal_spanning_tree

comment:26 Changed 3 years ago by gh-Hrishabh-yadav

  • Commit set to 3aee9726cc4e73ff17df11ecb98496785fa38081

Thanks for the response, I deleted the previous branch and added a new one. I have modified the code according to your suggestions,

Moving from sage.graphs.graph import Graph to the top of the file results in crashing of sage, with crash report saying :-

 
/home/hrishabh/sage/local/lib/python2.7/site-packages/sage/graphs/spanning_tree.pyx in init sage.graphs.spanning_tree (build/cythonized/sage/graphs/spanning_tree.c:13774)()
     53 #       Copyright (c) 2009 Mike Hansen <mhansen@gmail.com>
     54 #       Copyright (c) 2010 Gregory McWhirter <gmcwhirt@uci.edu>
     55 #       Copyright (c) 2010 Minh Van Nguyen <nguyenminh2@gmail.com>
     56 #
     57 # This program is free software: you can redistribute it and/or modify
     58 # it under the terms of the GNU General Public License as published by
     59 # the Free Software Foundation, either version 2 of the License, or
     60 # (at your option) any later version.
     61 #                  https://www.gnu.org/licenses/
     62 # ****************************************************************************
     63 from __future__ import absolute_import
     64 
     65 cimport cython
     66 
     67 from sage.sets.disjoint_set cimport DisjointSet_of_hashables
---> 68 from sage.graphs.graph import Graph
        global sage.graphs.graph = undefined
        global Graph = undefined
     69 cpdef kruskal(G, wfunction=None, bint check=False):
     70     r"""
     71     Minimum spanning tree using Kruskal's algorithm.
     72 
     73     This function assumes that we can only compute minimum spanning trees for
     74     undirected graphs. Such graphs can be weighted or unweighted, and they can
     75     have multiple edges (since we are computing the minimum spanning tree, only
     76     the minimum weight among all `(u,v)`-edges is considered, for each pair
     77     of vertices `u`, `v`).
     78 
     79     INPUT:
     80 
     81     - ``G`` -- an undirected graph.
     82 
     83     - ``weight_function`` (function) - a function that inputs an edge ``e``

ImportError: cannot import name Graph

Is there something that can be done?


New commits:

99d9ec6Modified some changes in documentation
fc0905bMade partition function cdef
3aee972Documentation changes

comment:27 Changed 3 years ago by dcoudert

Well, it's more complex than what I was expecting. So let the import in the methods. BUT, please move the import after the description of the method, for instance just before if not isinstance(G, Graph):

More comments:

  • no need to list( ) G.edges()
    -            return list(G.edges(sort=False))
    +            return G.edges(sort=False)
    
  • plus you must indicate the type of a variable
    -    cdef edges = list(g.edges(sort=False))
    +    cdef list edges = g.edges(sort=False)
    
    in fact, since variable edges is used only once in method filter_kruskal, you should better call g.edges(sort=False) directly, without creating intermediate variable edges

In method partition

  • you must document the method and its parameters. What is the goal of this method ? What is it doing ?
  • the threshold should be a parameter of the method with default value 10000. It's the only way a user can check that for instance 10 is too small without having to modify and recompile the code. So it should also be a parameter of filter_kruskal
  • Method partition does not consider a graph but a list of edges. So the documentation and comments must be consistent with that. For instance
    -    # If graph size is less than the kruskal Threshold, then we implement kruskal
    -    # Here threshold is set to  10000 edges
    -    if len(edges) < 10000:
    -        return filter_kruskal_iterator(edges, union_find, wfunction=wfunction, check=False)
    +    # If we have less edges than the threshold, we run the Kruskal algorithm
    +    if len(edges) < threshold:
    +        return filter_kruskal_iterator(edges, union_find, wfunction=wfunction, check=False)
    +
    +    # Otherwise, we partition the list of edges according a randomly selected pivot
    +
    
  • -    # Checking if we have got the spanning tree
    +    # Check if we have already obtained a spanning tree
    

Method filter_kruskal_iterator

  • This method must also be documented
  • remove sortedE_iter = None. It's not used
  • Actually, this method is just copy/paste of the same code from kruskal_iterator (ok, we have already improved it by removing useless variables/instructions). So it would be much smarter to avoid code repetition by creating a method kruskal_iterator_from_edges that takes as input: a list of edges (edges), a weight function (wfunction), and a DisjointSet_of_hashables (union_find). The method will then yield edges.

This method will then be called by kruskal_iterator and partition like that:

yield from kruskal_iterator_from_edges(edges, union_find, wfunction)

I know it's along process, but the goal is to get the best possible implementation: simpler, faster, more elegant, bug free, easier to maintain, etc.

comment:28 Changed 3 years ago by git

  • Commit changed from 3aee9726cc4e73ff17df11ecb98496785fa38081 to 16407759b6d08221d34da918b47dea2edca55252

Branch pushed to git repo; I updated commit sha1. New commits:

1640775Implemented Kruskal_iterator_from_edges

comment:29 Changed 3 years ago by gh-Hrishabh-yadav

Implementation and Documentation of kruskal_iterator_from_edges is done, Thanks for the guidelines. I am currently on my learning phase and working on such projects is only going to help me. So it doesn't matter about duration of the project. Please review the recent commit, Thanks! :)

comment:30 Changed 3 years ago by dcoudert

  • Why this change ? I think you should revert it.
    -    Return an iterator implementation of Kruskal algorithm.
    +    This method sanity checks and calls the fucntion kruskal_iterator_from_edges.  
    
  • An error message should not start with a capital letter
    -        raise ValueError("The input G must be an undirected graph")
    +        raise ValueError("the input graph must be undirected")
    

Warning when you change an error message, you must make sure that all doctests with this error message are updated. If you are not sure of what you are doing, it's better/safer to let the error message as it was.

  • Method filter_kruskal can be only def instead of cpdef
  • You should rename parameter kruskal_threshold to simply threshold. It's clear enough.
  • since you document parameter weight_function, I suggest you use that name everywhere instead of wfunction
  • a small detail
    -    - ``weight_function`` (function) - a function that inputs an edge ``e``
    +    - ``weight_function`` -- function (default: ``None``); a function that inputs an edge ``e``
    
    and correct the 80 columns formatting
  • proposal to improve the documentation
    -    - ``kruskal_threshold`` -- The maximum number of edges on which kruskal algorithm
    -      should run. Default: ``kruskal_threshold=10000``.  
    +    - ``threshold`` -- integer (default: 10000); maximum number of edges
    +      on which to run kruskal algorithm. Above that value, edges are
    +      partitioned in sets of size at most ``threshold``.
    
  • you can let 1 empty line here, not 2
    +    
    +
    +    """
    

Concerning method partition, I have a proposal that will, in my opinion, ease the code: make method partition a sub-method of filter_krustal. If you like it, just copy/paste the code. I made several corrections in it

def filter_kruskal(G, threshold=10000, weight_function=None, bint check=False):
    r"""
    ... CORRECTED DOCUMENTATION USING ABOVE COMMENTS ...
    """
    import random
    from sage.graphs.graph import Graph

    if not isinstance(G, Graph):
        raise ValueError("the input graph must be undirected")
    if check:
        if not G.order():
            return []
        if not G.is_connected():
            return
        # G is now assumed to be a nonempty connected graph
        if G.num_verts() == G.num_edges() + 1:
            # G is a tree
            return G.edges(sort=False)

        g = G.to_simple(to_undirected=False, keep_label='min')
    else:
        g = G

    def partition(edges, union_find, threshold, weight_function):
        """
        This method is the core of the algorithm.
        """
        # If graph size is less than the threshold, then we call Kruskal.
        if len(edges) < threshold:
            yield from kruskal_iterator_from_edges(edges, union_find, weight_function=weight_function)
            return
        # Selecting a pivot (edge weight) at random
        pivot = random.choice(edges)[2]
 
        # Divide the list of edges according the pivot
        # We use variable ch to equally divide edges with weight equal
        # the to pivot
        cdef list L1_Graph = []
        cdef list L2_Graph = []
        cdef int ch = 0

        for e in edges:
            if e[2] < pivot:
                L1_Graph.append(e)
            elif e[2] > pivot:
                L2_Graph.append(e)
            else:
                if ch:
                    L1_Graph.append(e)
                    ch = 0
                else:
                    L2_Graph.append(e)
                    ch = 1

        # We filter the list of edges to remove edges between vertices
        # of the same component
        L1_filter = [e for e in L1_Graph if union_find.find(e[0]) != union_find.find(e[1])]
        if L1_filter:
            yield from partition(L1_filter, union_find, threshold=threshold, weight_function=weight_function)

        # Checking if we have got the spanning tree
        if union_find.number_of_subsets() == 1:
            return

        L2_filter = [e for e in L2_Graph if union_find.find(e[0]) != union_find.find(e[1])]
        if L2_filter:
            yield from partition(L2_filter, union_find, threshold=threshold, weight_function=weight_function)
            return


    # Making a union set
    cdef DisjointSet_of_hashables union_find = DisjointSet_of_hashables(g.vertex_iterator())

    return list(partition(g.edges(sort=False), union_find, threshold=threshold,
                          weight_function=weight_function))

Concerning method kruskal_iterator_from_edges:

  • a small change
    -    Implements kruskal algorithm for minimum spanning tree
    +    Implement kruskal algorithm for minimum spanning tree
    
  • add some description of what the method do, and that it works on edges
  • document input parameters similarly that in other methods

comment:31 Changed 3 years ago by git

  • Commit changed from 16407759b6d08221d34da918b47dea2edca55252 to ba6973cdb7cb5cf26aaadb303926626dc8ab6654

Branch pushed to git repo; I updated commit sha1. New commits:

ffcea9eModified Floyd-Warshall to work with weighted graphs with non-negative integer edge weights.
0d7299fMerge branch 'u/gh-giorgosgiapis/weighted_floyd_warshall' of git://trac.sagemath.org/sage into filter_kruskal_spanning_tree
ba6973cMade partition sub-function of filter_kruskal

comment:32 Changed 3 years ago by git

  • Commit changed from ba6973cdb7cb5cf26aaadb303926626dc8ab6654 to 4772886cdf2ef90a25c11b8613b0893742ec627a

Branch pushed to git repo; I updated commit sha1. New commits:

4772886Revert "Merge branch 'u/gh-giorgosgiapis/weighted_floyd_warshall' of git://trac.sagemath.org/sage into filter_kruskal_spanning_tree"

comment:33 Changed 3 years ago by dcoudert

sorry I was not clear enough, but there is a misunderstanding. I mean to replace wfunction by weight_function in the new code you are adding (= only in filter_kruskal and kruskal_iterator_from_edges), not in previously existing methods like kruskal, kruskal_iterator and boruvka.

Indeed, these methods are called by other methods (e.g., .min_spanning_tree()). So with the current modifications, some doctests will break. Moreover, we usually add a deprecation warning during 1 year before effectively changing the name of a parameter to let enough time to users to update their own code.

comment:34 Changed 3 years ago by git

  • Commit changed from 4772886cdf2ef90a25c11b8613b0893742ec627a to 697701d62916b7731609cae44ccaf51a3ddb389e

Branch pushed to git repo; I updated commit sha1. New commits:

ff7a876Changed weight_fucntion to wfunction
697701dchanged some value error statement

comment:35 Changed 3 years ago by gh-Hrishabh-yadav

In method kruskal: raise ValueError("The input G must be an undirected graph.") and in method boruvka : raise ValueError("the input graph must be undirected") , error statements have written in such way. I have verified this according to Doctest.

comment:36 Changed 3 years ago by dcoudert

I was not satisfied with this recursive version of filter kruskal as it makes many copies of the list. So I coded a new version that is more in the spirit of quicksort.

You can find it in branch public/graphs/27482_filter_kruskal

The branch do:

  • a new version of filter kruskal avoiding to create many copies of the list of edges, and that is not recursive.
  • expose the method in method min_spanning_tree of generic_graph.py
  • many corrections
  • add reference to the paper at the right place

The branch is in public/, so you can modify it. Of course be careful ;)

Please try it, do many tests, etc. If you like it, you can replace the branch in this ticket by this one.

comment:37 Changed 3 years ago by embray

  • Milestone changed from sage-8.7 to sage-8.8

Ticket retargeted after milestone closed (if you don't believe this ticket is appropriate for the Sage 8.8 release please retarget manually)

comment:38 Changed 3 years ago by gh-Hrishabh-yadav

  • Branch u/gh-Hrishabh-yadav/filter_kruskal_spanning_tree deleted
  • Commit 697701d62916b7731609cae44ccaf51a3ddb389e deleted

comment:39 Changed 3 years ago by gh-Hrishabh-yadav

  • Branch set to public/graphs/27482_filter_kruskal

comment:40 Changed 3 years ago by gh-Hrishabh-yadav

  • Commit set to c10469297ef364b0e917bb9b4407187d2abea9ad

I have changed the current branch to public/graphs/27482_filter_kruskal This code works quite well, yeah making many copies of list was never a good idea.

For large and highly dense graph

sage: from sage.graphs.spanning_tree import kruskal
....: from sage.graphs.spanning_tree import filter_kruskal
....: G= Graph()
....: G.allow_multiple_edges(True)
....: G.allow_loops(True)
....: G.weighted(True)
....: for i in range(3000000):
....:     G.add_edge(randint(0,10000),randint(0,10000),randint(0,100000000))
....: E=filter_kruskal(G);
....: E=kruskal(G);
Time taken filter_kruskal : 4.15850090981
Time taken kruskal        : 11.2105371952

For large and partially dense graph

sage: from sage.graphs.spanning_tree import kruskal
....: from sage.graphs.spanning_tree import filter_kruskal
....: G= Graph()
....: G.allow_multiple_edges(True)
....: G.allow_loops(True)
....: G.weighted(True)
....: for i in range(300000):
....:     G.add_edge(randint(0,10000),randint(0,10000),randint(0,100000000))
....: E=filter_kruskal(G);
....: E=kruskal(G);
Time taken filter_kruskal : 0.484476089478
Time taken kruskal        : 0.874042987823

New commits:

c104692trac #27482: new version of filter kruskal

comment:41 Changed 3 years ago by gh-Hrishabh-yadav

If we see carefully selection of pivot can result in worst case complexity of O(m*m)(really really less odds!), but we can still have a better algorithm for selecting a pivot rather than selecting it randomly. I was thinking of selecting it by finding median. There are algorithm for finding median in linear time. So why not? As suggested in this paper:- http://www.cs.cornell.edu/courses/cs2110/2009su/Lectures/examples/MedianFinding.pdf

Last edited 3 years ago by gh-Hrishabh-yadav (previous) (diff)

comment:42 Changed 3 years ago by dcoudert

You can try it on your side and see how it affects the running time. You could also investigate on how quicksort do in practice (i.e., how is it done in C/C++ libraries). You can certainly find some experiments with different algorithms for pivot selection.

comment:43 Changed 3 years ago by dcoudert

Do you still plan to (try to) improve the implementation or can we try to conclude this ticket ?

comment:44 Changed 3 years ago by gh-Hrishabh-yadav

Yeah, I did try to implement it and results weren't quite good. Problem with median_of_medians is that although it is a linear time algorithm but also has large coefficient.

The result of median_of_median was still better than the one currently used for finding median.

Yeah it will be better if we conclude this ticket now, but still I am planning to study more about pivot selection. I also came across a data structure used for selection algorithm: https://en.wikipedia.org/wiki/Soft_heap Will be planning to implement it in future, for now we can conclude this ticket. Any suggestion?

comment:45 Changed 3 years ago by dcoudert

As reported by the patchbot

-    EXAMPLE:
+    EXAMPLES::

comment:46 Changed 3 years ago by git

  • Commit changed from c10469297ef364b0e917bb9b4407187d2abea9ad to 3aa3c8d654388e00cbdcb146032462a86b7326bb

Branch pushed to git repo; I updated commit sha1. New commits:

3aa3c8dModified "Example:"

comment:47 Changed 3 years ago by dcoudert

No. Please revert last commit. The needed change is in kruskal_iterator_from_edges.

Also, you should build the documentation and check if the resulting html pages display well.

comment:48 Changed 3 years ago by git

  • Commit changed from 3aa3c8d654388e00cbdcb146032462a86b7326bb to 6e928245fdc3300af342bc2dfc99fee7d81b6e0c

Branch pushed to git repo; I updated commit sha1. New commits:

6e92824Revrted and Modified

comment:49 Changed 3 years ago by git

  • Commit changed from 6e928245fdc3300af342bc2dfc99fee7d81b6e0c to 94aa027a99c5ff33fb9ca065c271887e26810d2f

Branch pushed to git repo; I updated commit sha1. New commits:

94aa027Modified "Examples"

comment:50 Changed 3 years ago by dcoudert

  • Reviewers set to David Coudert
  • Status changed from needs_review to positive_review

LGTM.

comment:51 Changed 3 years ago by chapoton

author name should be real name

comment:52 Changed 3 years ago by gh-Hrishabh-yadav

  • Authors changed from gh-Hrishabh-yadav to Hrishabh Yadav

comment:53 Changed 3 years ago by vbraun

  • Branch changed from public/graphs/27482_filter_kruskal to 94aa027a99c5ff33fb9ca065c271887e26810d2f
  • Resolution set to fixed
  • Status changed from positive_review to closed
Note: See TracTickets for help on using tickets.