Opened 3 years ago
Closed 3 years ago
#27482 closed enhancement (fixed)
Filter Kruskal Implementation For Minimum Spanning tree
Reported by:  ghHrishabhyadav  Owned by:  

Priority:  major  Milestone:  sage8.8 
Component:  graph theory  Keywords:  
Cc:  dcoudert  Merged in:  
Authors:  Hrishabh Yadav  Reviewers:  David Coudert 
Report Upstream:  N/A  Work issues:  
Branch:  94aa027 (Commits, GitHub, GitLab)  Commit:  94aa027a99c5ff33fb9ca065c271887e26810d2f 
Dependencies:  Stopgaps: 
Description
This ticket introduces a modified version of Kruskal and is called filterkruskal as suggested in the paper:
This algorithm avoids the sorting of a large dataset by partitioning the list of edges using a pivot. Its time complexity is approximately equal to O(m) where m=number of edges. I have implemented this new algorithm and it is passing all the tests. Currently, it takes approximately equal time as compared to Kruskal algorithm of sagemath. But, it can be further optimized.
Change History (53)
comment:1 Changed 3 years ago by
comment:2 Changed 3 years ago by
 Branch set to u/ghHrishabhyadav/filter_kruskal_implementation_for_minimum_spanning_tree
comment:3 Changed 3 years ago by
 Commit set to a7c90fd8574bbae0f24a7e1d650760cf78f58fe7
Branch pushed to git repo; I updated commit sha1. New commits:
a7c90fd  Filter Kruskal Implementation for MST

comment:4 Changed 3 years ago by
 Commit changed from a7c90fd8574bbae0f24a7e1d650760cf78f58fe7 to 68d25b6405eb5ecd948b49c4afef1504145e7c33
Branch pushed to git repo; I updated commit sha1. New commits:
68d25b6  Updated filter_kruskal

comment:5 followup: ↓ 6 Changed 3 years ago by
Hi, Glad to work for Sagemath!
I have implemented the following algorithm and have committed it. Right now it is comparable to kruskal but with further optimization it should work faster than Kruskal and Prim's. This algorithm works quite well for large dataset where number of vertices are more than 60000, This statements has been shown in this following pdf :
http://algo2.iti.kit.edu/documents/algo12014/alenex09filterkruskal.pdf
Also notice I have moved the code of "all_paths" from generic_graphs.py file to generic_graph.pyx Please review
comment:6 in reply to: ↑ 5 Changed 3 years ago by
I have implemented the following algorithm and have committed it. Right now it is comparable to kruskal but with further optimization it should work faster than Kruskal and Prim's. This algorithm works quite well for large dataset where number of vertices are more than 60000, This statements has been shown in this following pdf :
http://algo2.iti.kit.edu/documents/algo12014/alenex09filterkruskal.pdf
60000 is big...
Also notice I have moved the code of "all_paths" from generic_graphs.py file to generic_graph.pyx
 Why ? is it related to this ticket ? Since the answer is no, you must revert this change.
 Also, in your branch I see
diff git a/gittraccommand b/gittraccommand new file mode 160000 +Subproject b5d75da815a6e261eae54da66d418e60cea7a43
Which is not related to this branch. So please correct that
 I also see:
diff git a/src/sage/graphs/generic_graph.py b/src/sage/graphs/generic_graph.py index 3cfb603..9fd276b 100644  a/src/sage/graphs/generic_graph.py +++ b/src/sage/graphs/generic_graph.py @@ 15318,6 +15318,7 @@ class GenericGraph(GenericGraph_pyx): if not act_path: # there is no other vertex ... done = True # ... so we are done return all_paths +
Why adding an empty line here ? Actually it's not an empty line as you are adding several unnecessary spaces. Please revert that.
 Another issue:
Please remove the extra space you added after
 return list(kruskal_iterator(G, wfunction=wfunction, check=check))   def kruskal_iterator(G, wfunction=None, bint check=False): + return list(kruskal_iterator(G, wfunction=wfunction, check=check)) + #print(time.time()start_time) + +def kruskal_iterator(G,wfunction=None, bint check=False):
return
, remove the print statement, and add the space you have removed inG,wfunction=None
.
Understand that we spend a considerable amount of time improving the quality of the code, of its documentation, and of its presentation to ease its maintainability. So please don't introduce unnecessary changes, and please follow the coding style.
 update your code to avoid the use of global variables. It is not necessary here.
 Why using an iterator over edges here and variable
i
?+ cdef int i=0 + cdef int ch=0 + #use of ch is to equally divide the the edges that are equal to the pivot(Random) edge weight + while i < len(G_edges): + + Curr=next(itera) + i+=1
 What is variable
c
?
 You don't need variable
i
here. You don't use it. So afor
loop is more appropriate.+ while i < m: + i+=1 + e = next(sortedE_iter)
Please review
That's enough for now...
comment:7 Changed 3 years ago by
 Commit changed from 68d25b6405eb5ecd948b49c4afef1504145e7c33 to 241302ba800cd003ea15c6a32df10a2055f4f6d2
Branch pushed to git repo; I updated commit sha1. New commits:
c06f7a3  A new version of spanning tree has been added, this new algorithm is Filter Krus

69511ae  Updated filter_kruskal

9fa585a  Updated filter_kruskal

660e612  updating kruskal

87700fd  Updated filter_kruskal

54bf085  Updated filter_kruskal

241302b  Merge branch 'u/ghHrishabhyadav/filter_kruskal_implementation_for_minimum_spanning_tree' of git://trac.sagemath.org/sage into t/27482/filter_kruskal_implementation_for_minimum_spanning_tree

comment:8 Changed 3 years ago by
 Commit changed from 241302ba800cd003ea15c6a32df10a2055f4f6d2 to cdbd0e2a411e7343fb7368c34797323dbb497929
Branch pushed to git repo; I updated commit sha1. New commits:
cdbd0e2  modified some changes

comment:9 Changed 3 years ago by
 Commit changed from cdbd0e2a411e7343fb7368c34797323dbb497929 to cf8b9a5b822202fcde37901174b53c9441a37b4d
Branch pushed to git repo; I updated commit sha1. New commits:
cf8b9a5  Modified changes

comment:10 Changed 3 years ago by
Following changes have been made:
1.) Deleted The file gittraccommand
2.) Have modified the code according to your suggestions:
a.) Removed all the extra lines
b.) Removed the useless comments, Global variables
c.) Modified the code to use for loop
comment:11 Changed 3 years ago by
Result with graphs size of 10000 vertices and edges : 3*10^{5 }
sage: from sage.graphs.spanning_tree import kruskal ....: from sage.graphs.spanning_tree import filter_kruskal ....: G= Graph() ....: G.allow_multiple_edges(True) ....: G.allow_loops(True) ....: G.weighted(True) ....: for i in range(300000): ....: G.add_edge(randint(0,10000),randint(0,10000),randint(0,1000000)) ....: E=filter_kruskal(G); ....: E=kruskal(G) Time for filter_kruskal: 0.708007097244 Time for Kruskal: 0.795707941055
Result with with more dense graphs
sage: from sage.graphs.spanning_tree import kruskal ....: from sage.graphs.spanning_tree import filter_kruskal ....: G= Graph() ....: G.allow_multiple_edges(True) ....: G.allow_loops(True) ....: G.weighted(True) ....: for i in range(3000000): ....: G.add_edge(randint(0,10000),randint(0,10000),randint(0,1000000)) ....: E=filter_kruskal(G); ....: E=kruskal(G) Time for filter_kruskal: 7.70112395287 Time for Kruskal: 11.8219969273
comment:12 Changed 3 years ago by
 Status changed from new to needs_review
comment:13 Changed 3 years ago by
it seems that doctests errors shown by patchbot are not related to this ticket. (it also says that pyflakes plugin should show things to be fixed, but this does not happen somehow)
comment:14 Changed 3 years ago by
What can be done?
comment:15 followup: ↓ 16 Changed 3 years ago by
You can try running pyflakes locally. There is nothing you can do about patchbot errors not related to this ticket...
comment:16 in reply to: ↑ 15 Changed 3 years ago by
Replying to dimpase: Is there a way to run pyflakes for cython file, apparently pyflakes is not able read any cpp part in the code
You can try running pyflakes locally. There is nothing you can do about patchbot errors not related to this ticket...
comment:17 Changed 3 years ago by
Perhaps flake8 is better: see https://stackoverflow.com/questions/31269527/runningpep8orpylintoncythoncode
You can install it via pip into Sage's python, naturally.
comment:18 Changed 3 years ago by
Some remarks.
 Your branch u/ghHrishabhyadav/filter_kruskal_implementation_for_minimum_spanning_tree still contains modifications that must be reverted in
generic_graph.py
,generic_graph_pyx.pyx
.
 Please try to follow the general coding style http://doc.sagemath.org/html/en/developer/coding_basics.html#pythoncodestyle. You need time and experience to learn it of course ;)
 missing space between declaration of parameters
def kruskal_iterator(G,wfunction=None, bint check=False): +def kruskal_iterator(G, wfunction=None, bint check=False):
 missing space around
=
 cpdef list MST_E=[] + cdef list MST_E = []
 Always put a space after
#
 #creating list of all edges + # creating list of all edges
 missing space between declaration of parameters
 You can use
cpdef
for a function, but there is no need for variables. Prefercdef
.
 method
.edges()
sorts edges by default. When it's not needed, you can do either.edges(sort=False)
orlist(G.edge_iterator())
. For instanceand A=list(G.edges())  return A + return G.edges(sort=False)
 cpdef list edgesOfGraph=[]  edgesOfGraph=list(g.edges()) + cdef edgesOfGraph = g.edges(sort=False)
 to iterate over the vertices, you can do
for u in g.vertex_iterator():
or simply and preferredfor u in g:
.
 Instead of importing
from sage.graphs.graph import Graph
in all methods, import it for the all file
 Instead of defining method
filter
, you could do directly:L1_filter = [e for e in L1_Graph if union_find.find(e[0]) != union_find.find(e[1])]
 method
filter_kruskal_iterator
is not an itrator as it fillsMST_E
. Instead, it should yield edges. This way, in methodpartition
, you could do:and alsoif len(edgesOfGraph) < 10000:  filter_kruskal_iterator(edgesOfGraph,Num_V,MST_E,union_find, wfunction=wfunction, check=False)  return + yield from filter_kruskal_iterator(edgesOfGraph, Num_V, union_find, wfunction=wfunction, check=False)
if len(L1_filter) > 0:  partition(L1_filter, Num_V, MST_E,union_find,wfunction=wfunction,check=check) + yield from partition(L1_filter, Num_V, union_find, wfunction=wfunction, check=check)
comment:19 Changed 3 years ago by
 Commit changed from cf8b9a5b822202fcde37901174b53c9441a37b4d to f4d4a1ab999adbb2f88b9e60f3f2867a53a6a6c0
Branch pushed to git repo; I updated commit sha1. New commits:
f4d4a1a  Improved quality of code

comment:20 Changed 3 years ago by
 Commit changed from f4d4a1ab999adbb2f88b9e60f3f2867a53a6a6c0 to fbe35aff2ff5fd2518a95ab0888a0af7bfeaecde
Branch pushed to git repo; I updated commit sha1. New commits:
fbe35af  Made count_edges global

comment:21 Changed 3 years ago by
Thanks for the following suggestions, Time taken for Filter_kruskal has decreased quite well,
Result with number of edges = 3*10^{6}^{ }
sage: from sage.graphs.spanning_tree import kruskal ....: from sage.graphs.spanning_tree import filter_kruskal ....: G= Graph() ....: G.allow_multiple_edges(True) ....: G.allow_loops(True) ....: G.weighted(True) ....: for i in range(3000000): ....: G.add_edge(randint(0,10000),randint(0,10000),randint(0,100000000)) ....: E=filter_kruskal(G); ....: E=kruskal(G); Time taken filter_kruskal : 5.84873104095 Time taken kruskal : 11.6121520996
Result with number of edges = 3*10^{5}^{ }
sage: from sage.graphs.spanning_tree import kruskal ....: from sage.graphs.spanning_tree import filter_kruskal ....: G= Graph() ....: G.allow_multiple_edges(True) ....: G.allow_loops(True) ....: G.weighted(True) ....: for i in range(300000): ....: G.add_edge(randint(0,10000),randint(0,10000),randint(0,100000000)) ....: E=filter_kruskal(G); ....: E=kruskal(G); Time taken filter_kruskal : 0.632566213608 Time taken kruskal : 0.840409040451
For counting number of edges added in Minimum spanning tree, we can follow two way by making global variable or passing an array of a single element. Which way will be preferred or is there some other way?
comment:22 Changed 3 years ago by
I still don't understand what's the problem with the patchbot. Can someone explain me?
comment:23 Changed 3 years ago by
I don't know what's going on with the patchbot, but when I look at the content of your branch, I still see modifications in generic_graph.py
and generic_graph_pyx.pyx
that must be reverted. I also see:
diff git a/gittraccommand b/gittraccommand new file mode 160000 +Subproject b5d75da815a6e261eae54da66d418e60cea7a43
and I don't know why it's here...
Several comments in your code
 Please move
from sage.graphs.graph import Graph
at the top of the file, at the same level thanfrom sage.sets.disjoint_set cimport DisjointSet_of_hashables
 In a method, you must add an empty line just after the first 1 line description. For instance
Minimum spanning tree using Filter_Kruskal's algorithm. + This algorithm is different version of kruskal algorithm, where we don't sort
 Similarly, add an empty line between paragraphs of the documentation. It will drastically improve the readability. The quality of the documentaiton is one of the strength of Sagemath and the graph module. That's why we do our best the maintain the quality, although it takes time to get used to the standard.
 the keyword
Example
should beEXAMPLES:
, and Example:  The edges of a minimum spanning tree of ``G``, if one exists, otherwise  returns the empty list.  Output:  sage: from sage.graphs.spanning_tree import filter_kruskal  sage: G = Graph({1:{2:28, 6:10}, 2:{3:16, 7:14}, 3:{4:12}, 4:{5:22, 7:18}, 5:{6:25, 7:24}})  sage: G.weighted(True)  sage: E = filter_kruskal(G, check=True); E  [(1, 6, 10), (3, 4, 12), (2, 7, 14), (2, 3, 16), (4, 5, 22), (5, 6, 25)] + EXAMPLES: + + Return the edges of a minimum spanning tree of the graph, if one exists:: + + sage: from sage.graphs.spanning_tree import filter_kruskal + sage: G = Graph({1:{2:28, 6:10}, 2:{3:16, 7:14}, 3:{4:12}, 4:{5:22, 7:18}, 5:{6:25, 7:24}}) + sage: G.weighted(True) + sage: E = filter_kruskal(G, check=True); E + [(1, 6, 10), (3, 4, 12), (2, 7, 14), (2, 3, 16), (4, 5, 22), (5, 6, 25)] +
Note that the 4 spaces shifts are important for generating a nice html documentation.
 Error messages have no capital letter at the beginning and no
.
at the end (unless we have a very long text) raise ValueError("The input G must be an undirected graph.") + raise ValueError("the input graph must be undirected")
 method
.edges()
returns a list of edges. So you don't need to dolist(g.edges(...))
. Moreover, there is no need to initializeedgesOfGraph
before assigning it a list.not also that I'm not a fan of the name cdef list edgesOfGraph = []  edgesOfGraph = list(g.edges(sort=False)) + cdef list edgesOfGraph = g.edges(sort=False)
edgesOfGraph
. Could be simplyedges
orE
?
 You don't need to define variable
Num_V
. It is used only once, so useg.order()
directly
 Why do you need to define a global variable to count edges ? Do you really need to count edges ? Note that as soon as the union find data structure as a single set, you know that you have selected enough edges. check
union_find.number_of_subsets()
 method
partition
must be documented and at least one doctest added. Note also that this method could becdef
as it is for local use only
 typo:
psuedocode
>pseudocode
 you could reorganize the code to 1) select the pivot, and then split the list of input edges according that pivot. So something like
# Selecting a pivot (edge weight) at random pivot = random.choice(edges)[2] # Divide the list of edges according the pivot. # We use variable ch to equally divide edges with weight equal # to the pivot cdef list L1 = [] cdef list L2 = [] cdef int ch = 0 for e in edges: if e[2] < pivot: L1.append(e) elif e[2] > pivot: L2.append(e) elif ch: L1.append(e) ch = 0 else: L2.append(e) ch = 1
comment:24 Changed 3 years ago by
 Branch u/ghHrishabhyadav/filter_kruskal_implementation_for_minimum_spanning_tree deleted
 Commit fbe35aff2ff5fd2518a95ab0888a0af7bfeaecde deleted
comment:25 Changed 3 years ago by
 Branch set to u/ghHrishabhyadav/filter_kruskal_spanning_tree
comment:26 Changed 3 years ago by
 Commit set to 3aee9726cc4e73ff17df11ecb98496785fa38081
Thanks for the response, I deleted the previous branch and added a new one. I have modified the code according to your suggestions,
Moving from sage.graphs.graph import Graph
to the top of the file results in crashing of sage, with crash report saying :
/home/hrishabh/sage/local/lib/python2.7/sitepackages/sage/graphs/spanning_tree.pyx in init sage.graphs.spanning_tree (build/cythonized/sage/graphs/spanning_tree.c:13774)() 53 # Copyright (c) 2009 Mike Hansen <mhansen@gmail.com> 54 # Copyright (c) 2010 Gregory McWhirter <gmcwhirt@uci.edu> 55 # Copyright (c) 2010 Minh Van Nguyen <nguyenminh2@gmail.com> 56 # 57 # This program is free software: you can redistribute it and/or modify 58 # it under the terms of the GNU General Public License as published by 59 # the Free Software Foundation, either version 2 of the License, or 60 # (at your option) any later version. 61 # https://www.gnu.org/licenses/ 62 # **************************************************************************** 63 from __future__ import absolute_import 64 65 cimport cython 66 67 from sage.sets.disjoint_set cimport DisjointSet_of_hashables > 68 from sage.graphs.graph import Graph global sage.graphs.graph = undefined global Graph = undefined 69 cpdef kruskal(G, wfunction=None, bint check=False): 70 r""" 71 Minimum spanning tree using Kruskal's algorithm. 72 73 This function assumes that we can only compute minimum spanning trees for 74 undirected graphs. Such graphs can be weighted or unweighted, and they can 75 have multiple edges (since we are computing the minimum spanning tree, only 76 the minimum weight among all `(u,v)`edges is considered, for each pair 77 of vertices `u`, `v`). 78 79 INPUT: 80 81  ``G``  an undirected graph. 82 83  ``weight_function`` (function)  a function that inputs an edge ``e`` ImportError: cannot import name Graph
Is there something that can be done?
New commits:
99d9ec6  Modified some changes in documentation

fc0905b  Made partition function cdef

3aee972  Documentation changes

comment:27 Changed 3 years ago by
Well, it's more complex than what I was expecting. So let the import in the methods. BUT, please move the import after the description of the method, for instance just before if not isinstance(G, Graph):
More comments:
 no need to
list( )
G.edges()
 return list(G.edges(sort=False)) + return G.edges(sort=False)
 plus you must indicate the type of a variable
in fact, since variable
 cdef edges = list(g.edges(sort=False)) + cdef list edges = g.edges(sort=False)
edges
is used only once in methodfilter_kruskal
, you should better callg.edges(sort=False)
directly, without creating intermediate variableedges
In method partition
 you must document the method and its parameters. What is the goal of this method ? What is it doing ?
 the threshold should be a parameter of the method with default value 10000. It's the only way a user can check that for instance 10 is too small without having to modify and recompile the code.
So it should also be a parameter of
filter_kruskal
 Method
partition
does not consider a graph but a list of edges. So the documentation and comments must be consistent with that. For instance # If graph size is less than the kruskal Threshold, then we implement kruskal  # Here threshold is set to 10000 edges  if len(edges) < 10000:  return filter_kruskal_iterator(edges, union_find, wfunction=wfunction, check=False) + # If we have less edges than the threshold, we run the Kruskal algorithm + if len(edges) < threshold: + return filter_kruskal_iterator(edges, union_find, wfunction=wfunction, check=False) + + # Otherwise, we partition the list of edges according a randomly selected pivot +

 # Checking if we have got the spanning tree + # Check if we have already obtained a spanning tree
Method filter_kruskal_iterator
 This method must also be documented
 remove
sortedE_iter = None
. It's not used
 Actually, this method is just copy/paste of the same code from
kruskal_iterator
(ok, we have already improved it by removing useless variables/instructions). So it would be much smarter to avoid code repetition by creating a methodkruskal_iterator_from_edges
that takes as input: a list of edges (edges
), a weight function (wfunction
), and aDisjointSet_of_hashables
(union_find
). The method will then yield edges.
This method will then be called by
kruskal_iterator
andpartition
like that:yield from kruskal_iterator_from_edges(edges, union_find, wfunction)
I know it's along process, but the goal is to get the best possible implementation: simpler, faster, more elegant, bug free, easier to maintain, etc.
comment:28 Changed 3 years ago by
 Commit changed from 3aee9726cc4e73ff17df11ecb98496785fa38081 to 16407759b6d08221d34da918b47dea2edca55252
Branch pushed to git repo; I updated commit sha1. New commits:
1640775  Implemented Kruskal_iterator_from_edges

comment:29 Changed 3 years ago by
Implementation and Documentation of kruskal_iterator_from_edges
is done, Thanks for the guidelines. I am currently on my learning phase and working on such projects is only going to help me. So it doesn't matter about duration of the project. Please review the recent commit, Thanks! :)
comment:30 Changed 3 years ago by
 Why this change ? I think you should revert it.
 Return an iterator implementation of Kruskal algorithm. + This method sanity checks and calls the fucntion kruskal_iterator_from_edges.
 An error message should not start with a capital letter
 raise ValueError("The input G must be an undirected graph") + raise ValueError("the input graph must be undirected")
Warning when you change an error message, you must make sure that all doctests with this error message are updated. If you are not sure of what you are doing, it's better/safer to let the error message as it was.
 Method
filter_kruskal
can be onlydef
instead ofcpdef
 You should rename parameter
kruskal_threshold
to simplythreshold
. It's clear enough.
 since you document parameter
weight_function
, I suggest you use that name everywhere instead ofwfunction
 a small detail
and correct the 80 columns formatting
  ``weight_function`` (function)  a function that inputs an edge ``e`` +  ``weight_function``  function (default: ``None``); a function that inputs an edge ``e``
 proposal to improve the documentation
  ``kruskal_threshold``  The maximum number of edges on which kruskal algorithm  should run. Default: ``kruskal_threshold=10000``. +  ``threshold``  integer (default: 10000); maximum number of edges + on which to run kruskal algorithm. Above that value, edges are + partitioned in sets of size at most ``threshold``.
 you can let 1 empty line here, not 2
+ + + """
Concerning method partition
, I have a proposal that will, in my opinion, ease the code: make method partition a submethod of filter_krustal
.
If you like it, just copy/paste the code. I made several corrections in it
def filter_kruskal(G, threshold=10000, weight_function=None, bint check=False): r""" ... CORRECTED DOCUMENTATION USING ABOVE COMMENTS ... """ import random from sage.graphs.graph import Graph if not isinstance(G, Graph): raise ValueError("the input graph must be undirected") if check: if not G.order(): return [] if not G.is_connected(): return # G is now assumed to be a nonempty connected graph if G.num_verts() == G.num_edges() + 1: # G is a tree return G.edges(sort=False) g = G.to_simple(to_undirected=False, keep_label='min') else: g = G def partition(edges, union_find, threshold, weight_function): """ This method is the core of the algorithm. """ # If graph size is less than the threshold, then we call Kruskal. if len(edges) < threshold: yield from kruskal_iterator_from_edges(edges, union_find, weight_function=weight_function) return # Selecting a pivot (edge weight) at random pivot = random.choice(edges)[2] # Divide the list of edges according the pivot # We use variable ch to equally divide edges with weight equal # the to pivot cdef list L1_Graph = [] cdef list L2_Graph = [] cdef int ch = 0 for e in edges: if e[2] < pivot: L1_Graph.append(e) elif e[2] > pivot: L2_Graph.append(e) else: if ch: L1_Graph.append(e) ch = 0 else: L2_Graph.append(e) ch = 1 # We filter the list of edges to remove edges between vertices # of the same component L1_filter = [e for e in L1_Graph if union_find.find(e[0]) != union_find.find(e[1])] if L1_filter: yield from partition(L1_filter, union_find, threshold=threshold, weight_function=weight_function) # Checking if we have got the spanning tree if union_find.number_of_subsets() == 1: return L2_filter = [e for e in L2_Graph if union_find.find(e[0]) != union_find.find(e[1])] if L2_filter: yield from partition(L2_filter, union_find, threshold=threshold, weight_function=weight_function) return # Making a union set cdef DisjointSet_of_hashables union_find = DisjointSet_of_hashables(g.vertex_iterator()) return list(partition(g.edges(sort=False), union_find, threshold=threshold, weight_function=weight_function))
Concerning method kruskal_iterator_from_edges
:
 a small change
 Implements kruskal algorithm for minimum spanning tree + Implement kruskal algorithm for minimum spanning tree
 add some description of what the method do, and that it works on edges
 document input parameters similarly that in other methods
comment:31 Changed 3 years ago by
 Commit changed from 16407759b6d08221d34da918b47dea2edca55252 to ba6973cdb7cb5cf26aaadb303926626dc8ab6654
Branch pushed to git repo; I updated commit sha1. New commits:
ffcea9e  Modified FloydWarshall to work with weighted graphs with nonnegative integer edge weights.

0d7299f  Merge branch 'u/ghgiorgosgiapis/weighted_floyd_warshall' of git://trac.sagemath.org/sage into filter_kruskal_spanning_tree

ba6973c  Made partition subfunction of filter_kruskal

comment:32 Changed 3 years ago by
 Commit changed from ba6973cdb7cb5cf26aaadb303926626dc8ab6654 to 4772886cdf2ef90a25c11b8613b0893742ec627a
Branch pushed to git repo; I updated commit sha1. New commits:
4772886  Revert "Merge branch 'u/ghgiorgosgiapis/weighted_floyd_warshall' of git://trac.sagemath.org/sage into filter_kruskal_spanning_tree"

comment:33 Changed 3 years ago by
sorry I was not clear enough, but there is a misunderstanding. I mean to replace wfunction
by weight_function
in the new code you are adding (= only in filter_kruskal
and kruskal_iterator_from_edges
), not in previously existing methods like kruskal
, kruskal_iterator
and boruvka
.
Indeed, these methods are called by other methods (e.g., .min_spanning_tree()
). So with the current modifications, some doctests will break. Moreover, we usually add a deprecation warning during 1 year before effectively changing the name of a parameter to let enough time to users to update their own code.
comment:34 Changed 3 years ago by
 Commit changed from 4772886cdf2ef90a25c11b8613b0893742ec627a to 697701d62916b7731609cae44ccaf51a3ddb389e
comment:35 Changed 3 years ago by
In method kruskal: raise ValueError("The input G must be an undirected graph.")
and in method boruvka : raise ValueError("the input graph must be undirected")
, error statements have written in such way. I have verified this according to Doctest.
comment:36 Changed 3 years ago by
I was not satisfied with this recursive version of filter kruskal as it makes many copies of the list. So I coded a new version that is more in the spirit of quicksort.
You can find it in branch public/graphs/27482_filter_kruskal
The branch do:
 a new version of filter kruskal avoiding to create many copies of the list of edges, and that is not recursive.
 expose the method in method
min_spanning_tree
ofgeneric_graph.py
 many corrections
 add reference to the paper at the right place
The branch is in public/
, so you can modify it. Of course be careful ;)
Please try it, do many tests, etc. If you like it, you can replace the branch in this ticket by this one.
comment:37 Changed 3 years ago by
 Milestone changed from sage8.7 to sage8.8
Ticket retargeted after milestone closed (if you don't believe this ticket is appropriate for the Sage 8.8 release please retarget manually)
comment:38 Changed 3 years ago by
 Branch u/ghHrishabhyadav/filter_kruskal_spanning_tree deleted
 Commit 697701d62916b7731609cae44ccaf51a3ddb389e deleted
comment:39 Changed 3 years ago by
 Branch set to public/graphs/27482_filter_kruskal
comment:40 Changed 3 years ago by
 Commit set to c10469297ef364b0e917bb9b4407187d2abea9ad
I have changed the current branch to public/graphs/27482_filter_kruskal
This code works quite well, yeah making many copies of list was never a good idea.
For large and highly dense graph
sage: from sage.graphs.spanning_tree import kruskal ....: from sage.graphs.spanning_tree import filter_kruskal ....: G= Graph() ....: G.allow_multiple_edges(True) ....: G.allow_loops(True) ....: G.weighted(True) ....: for i in range(3000000): ....: G.add_edge(randint(0,10000),randint(0,10000),randint(0,100000000)) ....: E=filter_kruskal(G); ....: E=kruskal(G); Time taken filter_kruskal : 4.15850090981 Time taken kruskal : 11.2105371952
For large and partially dense graph
sage: from sage.graphs.spanning_tree import kruskal ....: from sage.graphs.spanning_tree import filter_kruskal ....: G= Graph() ....: G.allow_multiple_edges(True) ....: G.allow_loops(True) ....: G.weighted(True) ....: for i in range(300000): ....: G.add_edge(randint(0,10000),randint(0,10000),randint(0,100000000)) ....: E=filter_kruskal(G); ....: E=kruskal(G); Time taken filter_kruskal : 0.484476089478 Time taken kruskal : 0.874042987823
New commits:
c104692  trac #27482: new version of filter kruskal

comment:41 Changed 3 years ago by
If we see carefully selection of pivot can result in worst case complexity of O(m*m)(really really less odds!), but we can still have a better algorithm for selecting a pivot rather than selecting it randomly. I was thinking of selecting it by finding median. There are algorithm for finding median in linear time. So why not? As suggested in this paper: http://www.cs.cornell.edu/courses/cs2110/2009su/Lectures/examples/MedianFinding.pdf
comment:42 Changed 3 years ago by
You can try it on your side and see how it affects the running time. You could also investigate on how quicksort do in practice (i.e., how is it done in C/C++ libraries). You can certainly find some experiments with different algorithms for pivot selection.
comment:43 Changed 3 years ago by
Do you still plan to (try to) improve the implementation or can we try to conclude this ticket ?
comment:44 Changed 3 years ago by
Yeah, I did try to implement it and results weren't quite good. Problem with median_of_medians is that although it is a linear time algorithm but also has large coefficient.
The result of median_of_median was still better than the one currently used for finding median.
Yeah it will be better if we conclude this ticket now, but still I am planning to study more about pivot selection. I also came across a data structure used for selection algorithm: https://en.wikipedia.org/wiki/Soft_heap Will be planning to implement it in future, for now we can conclude this ticket. Any suggestion?
comment:45 Changed 3 years ago by
As reported by the patchbot
 EXAMPLE: + EXAMPLES::
comment:46 Changed 3 years ago by
 Commit changed from c10469297ef364b0e917bb9b4407187d2abea9ad to 3aa3c8d654388e00cbdcb146032462a86b7326bb
Branch pushed to git repo; I updated commit sha1. New commits:
3aa3c8d  Modified "Example:"

comment:47 Changed 3 years ago by
No. Please revert last commit. The needed change is in kruskal_iterator_from_edges
.
Also, you should build the documentation and check if the resulting html pages display well.
comment:48 Changed 3 years ago by
 Commit changed from 3aa3c8d654388e00cbdcb146032462a86b7326bb to 6e928245fdc3300af342bc2dfc99fee7d81b6e0c
Branch pushed to git repo; I updated commit sha1. New commits:
6e92824  Revrted and Modified

comment:49 Changed 3 years ago by
 Commit changed from 6e928245fdc3300af342bc2dfc99fee7d81b6e0c to 94aa027a99c5ff33fb9ca065c271887e26810d2f
Branch pushed to git repo; I updated commit sha1. New commits:
94aa027  Modified "Examples"

comment:50 Changed 3 years ago by
 Reviewers set to David Coudert
 Status changed from needs_review to positive_review
LGTM.
comment:51 Changed 3 years ago by
author name should be real name
comment:52 Changed 3 years ago by
comment:53 Changed 3 years ago by
 Branch changed from public/graphs/27482_filter_kruskal to 94aa027a99c5ff33fb9ca065c271887e26810d2f
 Resolution set to fixed
 Status changed from positive_review to closed
Welcome to Sagemath !
Is the algorithm complicated to implement ? Is it faster than, or comparable to, other spanning tree methods that we already have ?