Opened 3 years ago
Closed 3 years ago
#28220 closed enhancement (fixed)
Cythonize all_simple_paths, all_paths_iterator and _all_paths_iterator
Reported by:  Rajat Mittal  Owned by:  

Priority:  major  Milestone:  sage8.9 
Component:  graph theory  Keywords:  gsoc19 
Cc:  David Coudert  Merged in:  
Authors:  Rajat Mittal  Reviewers:  David Coudert 
Report Upstream:  N/A  Work issues:  
Branch:  20687b9 (Commits, GitHub, GitLab)  Commit:  20687b94f822ea728c1fe3ca4c0d27451505c119 
Dependencies:  #27859  Stopgaps: 
Description
This ticket aims to cythonize the above methods using cython variables and cython data structures to make them fast and efficient. It depends on #27859 in which these methods were moved to separate file path_enumeration.pyx
Change History (33)
comment:1 Changed 3 years ago by
comment:2 Changed 3 years ago by
Branch:  → public/graphs/28220_cythonize_paths 

comment:3 Changed 3 years ago by
Commit:  → 7e193cf9803387f4368bdee444766154c81f2872 

Branch pushed to git repo; I updated commit sha1. Last 10 new commits:
6eb771d  merge conflict solved

8a22e2e  updated

eab8dba  removed bugs, worked on a copy of original graph, added a doctest

32d1bc8  added exclude vertices

fbc0faa  fix bugs in feng

ced65d4  fix case of no paths bw vertices in Feng

dc98836  correct pep8

362ff4b  based on #28221

f02d6e5  trac #27859 merged with 8.9beta4

7e193cf  cythonized using cython data structures

comment:4 Changed 3 years ago by
Status:  new → needs_review 

comment:5 Changed 3 years ago by
Commit:  7e193cf9803387f4368bdee444766154c81f2872 → 15180791edd17ca1045b90c2edccb079dfc8176f 

Branch pushed to git repo; I updated commit sha1. New commits:
1518079  improved cython code

comment:6 Changed 3 years ago by
Can you summarize what you did and what else you plan to do. It's not easy to see from the code (ok, the last commit is clear).
comment:7 Changed 3 years ago by
I have added three new methods all_simple_paths_cython, all_paths_iterator_cython and _all_paths_iterator_cython and used the priority_queue data structure instead of python heap. I have also used cython variable declarations. I have tested the time complexity and the improvement in time is very minute. I am looking for more cythonic style improvements. If you have any in mind w.r.t. these methods it will be helpful here.
comment:9 Changed 3 years ago by
Commit:  15180791edd17ca1045b90c2edccb079dfc8176f → 8c4a92a8972b1503ce129468e8c77f18053431f7 

comment:10 Changed 3 years ago by
sage: g = digraphs.Paley(11) sage: %timeit l = g.all_simple_paths(starting_vertices=[0], ending_vertices=[9], ....: trivial=False) 1 loop, best of 3: 737 ms per loop sage: sage: %timeit l = g.all_simple_paths_cython(starting_vertices=[0], ending_vertic ....: es=[9], trivial=False) 1 loop, best of 3: 722 ms per loop sage: d = digraphs.DeBruijn(3, 2) sage: %timeit l = d.all_simple_paths_cython(starting_vertices=['00'], ending_ver ....: tices=['22'], trivial=False) 100 loops, best of 3: 2.16 ms per loop sage: %timeit l = d.all_simple_paths(starting_vertices=['00'], ending_vertices=[ ....: '22'], trivial=False) 100 loops, best of 3: 2.44 ms per loop
comment:11 Changed 3 years ago by
Commit:  8c4a92a8972b1503ce129468e8c77f18053431f7 → 34b0e95f2c7aa82e9588fd8eaee641f1ab9c534c 

Branch pushed to git repo; I updated commit sha1. New commits:
34b0e95  corrected

comment:12 Changed 3 years ago by
Commit:  34b0e95f2c7aa82e9588fd8eaee641f1ab9c534c → 85be894df27f726dca2d79ee1bae6347fcb9e504 

Branch pushed to git repo; I updated commit sha1. Last 10 new commits:
f503ec8  fixed potetial bugs in feng's

7871701  adding comments

7c9bc30  trac #27859: fix merge conflict with 8.9.beta5

3ae912d  trac #27859: add test on random digraphs

82015e2  made feng faster

4bdbe63  some improvement

fabe87b  trac #27859: review edit in Yen

3a2aca3  trac #27859: review edit in Feng

034cb7f  Merge branch 'public/graphs/27859_enumerate_paths' of git://trac.sagemath.org/sage into public/graphs/27859_enumerate_paths

85be894  merged with #27859

comment:13 Changed 3 years ago by
I'm having hard time identifying the changes done in this ticket, so I will wait for the inclusion of #27859 before starting to review it.
comment:14 Changed 3 years ago by
Commit:  85be894df27f726dca2d79ee1bae6347fcb9e504 → 686a738f0cff7bec785d7a4b6e6779306426c99b 

Branch pushed to git repo; I updated commit sha1. New commits:
686a738  minor changes

comment:15 Changed 3 years ago by
Commit:  686a738f0cff7bec785d7a4b6e6779306426c99b → a0ccabf40cc7693b52207abcebf190054a965d8a 

comment:16 Changed 3 years ago by
Commit:  a0ccabf40cc7693b52207abcebf190054a965d8a → ee12bd0cb01ac6e819e94e1a4067ab272e8a9861 

Branch pushed to git repo; I updated commit sha1. New commits:
ee12bd0  merged with sage 8.9beta6

comment:17 Changed 3 years ago by
You should do as for #28335: modify existing methods instead of creating new ones. There are no big changes in the methods justifying the creation of new methods.
comment:18 Changed 3 years ago by
Commit:  ee12bd0cb01ac6e819e94e1a4067ab272e8a9861 → f1bb530ba027a2f2a1c7bf978ad7b24cb759b709 

comment:19 followup: 20 Changed 3 years ago by
I had created new methods to compare the time complexities etc. Now the methods have been merged and written in Cython.
comment:20 Changed 3 years ago by
Status:  needs_review → needs_work 

Replying to ghrajat1433:
I had created new methods to compare the time complexities etc.
Right, I did not though about it. Sorry.
Now the methods have been merged and written in Cython.
Well, if I understand well what you did, you simply declared the type of some variables and used a c++ priority queue instead of a Python queue at the cost of storing all reported paths in a list. This is not good enough. and I disagree with the current use of idx_to_path
as this list may become giant.
Can't you try to identify first which parts of the methods are critical and could certainly be improved ?
comment:21 Changed 3 years ago by
Commit:  f1bb530ba027a2f2a1c7bf978ad7b24cb759b709 → c3946c6033379c09e1c3cd0c2ea5fefabe59127c 

Branch pushed to git repo; I updated commit sha1. New commits:
c3946c6  used dict instead of a list

comment:22 Changed 3 years ago by
all_paths_iterator methods calls _all_paths_iterator method to get all the paths starting from a particular vertex and ending at a list of ending vertices. But as we see the code of _all_paths_iterator method critical part there seems to be to get to the ending vertex from start but parameter max_length suggests to go step by step to ending vertex which the algorithm does. It explores all the paths to the ending vertices by putting them in the queue.
I don't see much of a scope of improvement from algorithm point of view right now in these methods. If you have any ideas , please let me know. But these methods are currently in cython now, in alignment with other methods of this module path_enumeration.pyx and faster.
comment:24 Changed 3 years ago by
Commit:  c3946c6033379c09e1c3cd0c2ea5fefabe59127c → 6c65e63ced41653dbe91320d81281c02621ab17d 

Branch pushed to git repo; I updated commit sha1. New commits:
6c65e63  resolved merge conflict

comment:25 Changed 3 years ago by
Status:  needs_work → needs_review 

comment:27 Changed 3 years ago by
Commit:  6c65e63ced41653dbe91320d81281c02621ab17d → 20687b94f822ea728c1fe3ca4c0d27451505c119 

Branch pushed to git repo; I updated commit sha1. New commits:
20687b9  trac #28220: fix doctest with Python 3

comment:28 Changed 3 years ago by
I changed 2 doctests to make them compatible with Python 3 (were failing when testing with py3).You may know that we are in the transition to py3 and so some of us are running both a version compiled with py2 and another with py3. For some data structures, like dict, the order in which items are "ordered" change. In this case, it was changing the order in which paths where yielded. The solution was correct nonetheless. With the change I made, it's more robust.
comment:29 Changed 3 years ago by
A good news:
with this patch:
sage: g = digraphs.Complete(6) sage: %time _ = g.all_simple_paths() CPU times: user 24.9 ms, sys: 1.73 ms, total: 26.7 ms Wall time: 25.7 ms
without:
sage: g = digraphs.Complete(6) sage: %time _ = g.all_simple_paths() CPU times: user 53 ms, sys: 8.88 ms, total: 61.9 ms Wall time: 55.8 ms
comment:30 Changed 3 years ago by
A bad news: method all_simple_paths
uses all_paths_iterator
, and this method claim that The paths are enumerated in increasing length order
. However, this is not the case (with and without this patch).
sage: g = digraphs.Complete(3) sage: g.all_simple_paths() [[0, 1], [1, 0], [2, 0], [0, 1, 0], [0, 2], [0, 2, 0], [0, 1, 2], [1, 0, 1], [1, 2], [1, 2, 1], [1, 0, 2], [2, 0, 2], [2, 1], [2, 1, 2], [2, 0, 1], [0, 1, 2, 0], [0, 2, 1], [0, 2, 1, 0], [1, 0, 2, 1], [1, 2, 0], [1, 2, 0, 1], [2, 0, 1, 2], [2, 1, 0], [2, 1, 0, 2]]
What if we used a priority queue instead of a queue in _all_paths_iterator
? As this behavior seems intependent of the changes done here, this can be postpone to a future patch.
comment:31 Changed 3 years ago by
It seems that when starting_vertices and ending_vertices are not single vertex then these paths are not in increasing no. of edges.
"What if we used a priority queue instead of a queue in _all_paths_iterator ? As this behavior seems intependent of the changes done here, this can be postpone to a future patch. "
I tried it but still the results are not in increasing order of edges.
We can fix it in next patch either to reorganize the algorithm to print paths in order of increasing number of edges or stating in simple paths that edges will be in increasing order in case of a single src and destination.
comment:32 Changed 3 years ago by
Status:  needs_review → positive_review 

OK, let's postpone the ordering issue to a specific ticket. You can investigate that before opening a ticket.
LGTM.
comment:33 Changed 3 years ago by
Branch:  public/graphs/28220_cythonize_paths → 20687b94f822ea728c1fe3ca4c0d27451505c119 

Resolution:  → fixed 
Status:  positive_review → closed 
Create distinct methods to make sure you can compare results.