Opened 21 months ago

Closed 4 months ago

#31451 closed enhancement (fixed)

Faster version of longest_increasing_subsequences

Reported by: gh-petermoraw Owned by:
Priority: major Milestone: sage-9.7
Component: combinatorics Keywords: permutation, subsequences
Cc: Nadia Lafrenière, Emile Nadeau, Travis Scrimshaw Merged in:
Authors: Peter Morawitz, Finn Hulse, Nadia Lafrenière Reviewers: David Coudert, Vincent Delecroix
Report Upstream: N/A Work issues:
Branch: 5969eda (Commits, GitHub, GitLab) Commit: 5969edaa2e395e683feaba38db8ea94c6290e996
Dependencies: Stopgaps:

Status badges

Description (last modified by Vincent Delecroix)

The algorithm for the longest_increasing_subsequences function for permutations is currently very inefficient. I worked on a new version that vastly improves the run time of the function. New version:

sage: %timeit Permutations(30).random_element().longest_increasing_subsequences()                                     
19.6 ms ± 2.28 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)

Old version:

sage: %timeit Permutations(30).random_element().longest_increasing_subsequences()                                     
The slowest run took 10.40 times longer than the fastest. This could mean that an intermediate result is being cached.
3min 34s ± 2min 47s per loop (mean ± std. dev. of 7 runs, 1 loop each)

follow-up: #34218

Change History (41)

comment:1 Changed 21 months ago by gh-petermoraw

Branch: u/gh-petermoraw/LongestIncreasingSubsequences

comment:2 Changed 21 months ago by Nadia Lafrenière

Commit: 0bb729e48b1669180599854ac8f2f4ee5741a094
Status: newneeds_review

New commits:

c65429fimproved longest increasing subsequences function
0bb729eAdded reference

comment:3 Changed 21 months ago by Frédéric Chapoton

  • when citing a reference, you should append an underscore, like this
    [Sch1961]_
    
  • Do not use "Returns" in the first line of doc, but "Return"

comment:4 Changed 21 months ago by Frédéric Chapoton

and patchbot complains about

src/doc/en/reference/references/index.rst  # Tab character found

comment:5 Changed 21 months ago by Frédéric Chapoton

Cc: Travis Scrimshaw added

comment:6 Changed 21 months ago by Travis Scrimshaw

I think seq == sorted(seq) would be better replaced by

all(seq[i] <= seq[i+1] for i in range(len(seq)-1))

Also, is there a way you can avoid all of the calls to index here:

self.index(seq[j]) < self.index(seq[j+1]):

I feel like there should be a better way to track things, otherwise this could run in O(len(seq)^2) (maybe O(log(len(seq)*len(seq))) time.

You can remove this useless list

-        potential_sequences = list(itertools.product(*s))
-        for seq in potential_sequences:
+        for seq in itertools.product(*s):

I think it is cleaner to write j += 1 instead of j = j+1. It might also be marginally faster.

comment:7 Changed 21 months ago by git

Commit: 0bb729e48b1669180599854ac8f2f4ee5741a09486204b5f3a3bf3f4c116e35890c94a4fb3087575

Branch pushed to git repo; I updated commit sha1. New commits:

86204b5tscrim edits

comment:8 Changed 21 months ago by Frédéric Chapoton

you still have not fixed comment:3 and comment:4

comment:9 Changed 21 months ago by Travis Scrimshaw

Nor have you commented on why you did not implement all of the suggestions from comment:6.

comment:10 Changed 21 months ago by Nadia Lafrenière

Status: needs_reviewneeds_work

comment:11 Changed 21 months ago by git

Commit: 86204b5f3a3bf3f4c116e35890c94a4fb30875751f7a29ae89f3b1e5b3144451c4860d7b9c19e0fe

Branch pushed to git repo; I updated commit sha1. New commits:

2f5391f[Sch1961]_
1f7a29aTab in references

comment:12 Changed 21 months ago by Emile Nadeau

About comment:6, you actually don't need to use index at all. You can do a single pass on your permutation to check if the element of seq appear in the right order in your permutation.

comment:13 Changed 21 months ago by Matthias Köppe

Milestone: sage-9.3sage-9.4

Sage development has entered the release candidate phase for 9.3. Setting a new milestone for this ticket based on a cursory review of ticket status, priority, and last modification date.

comment:14 Changed 17 months ago by Kevin Dilks

There's a clever Python way to check if one sequence is a subsequence of another with a single pass ( https://stackoverflow.com/questions/24017363/how-to-test-if-one-string-is-a-subsequence-of-another ):

def is_subseq(x, y):
    it = iter(y)
    return all(c in it for c in x)

Short enough that you could eliminate the helper function and go with

it = iter(self)
if seq == sorted(seq) and all(c in it for c in seq):

though including the helper function with the name may make code more transparent.

Were there any other lingering concerns besides fixing that helper function? If no other concerns and no preference between in-line vs helper function, I'll pick one, push the change and set to 'needs review'.

comment:15 Changed 17 months ago by Nadia Lafrenière

I tried to implement Kevin's solution, but it runs slower than Peter's proposed solution on my computer. I agree that the code would look better, though. I think that Peter is working on a clever and much faster solution that would involve a directed graph to store the potential increasing subsequences. That would reduce the number of sequences to check, and they would automatically be subsequences of the permutation (so there would be no need to check that they indeed are; getting rid of the problem pointed out here).

Maybe Peter can confirm my thoughts and update us with where he is in the process.

comment:16 Changed 17 months ago by Matthias Köppe

Milestone: sage-9.4sage-9.5

Setting a new milestone for this ticket based on a cursory review.

comment:17 Changed 12 months ago by Matthias Köppe

Milestone: sage-9.5sage-9.6

comment:18 Changed 8 months ago by Matthias Köppe

Milestone: sage-9.6sage-9.7

comment:19 Changed 5 months ago by Nadia Lafrenière

Branch: u/gh-petermoraw/LongestIncreasingSubsequencesu/nadialafreniere/LongestIncreasingSubsequences

comment:20 Changed 5 months ago by Nadia Lafrenière

Authors: Peter MorawitzPeter Morawitz. Finn Hulse, Nadia Lafrenière
Commit: 1f7a29ae89f3b1e5b3144451c4860d7b9c19e0fef5d40fa41d7cdee746b26a98a6a64f10eba47579
Status: needs_workneeds_review

Peter decided to leave the project. I took his ideas as well as some ideas from Finn Hulse, another student I had in my class, to write the patch that should solve the problem. Instead of listing all potential longest increasing subsequences and checking for their presence in the sequence, it uses a directed graph in which all the paths from a source vertex to a sink are longest increasing subsequences.

As for speeding up the solution, I get the following results:

sage: %timeit Permutations(30).random_element().longest_increasing_subsequences()
416 µs ± 6.73 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
sage: %timeit Permutations(100).random_element().longest_increasing_subsequences()
3.34 ms ± 291 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

In comparison, with the code currently in Sage 9.7.beta5, I get

sage: %timeit Permutations(30).random_element().longest_increasing_subsequences()
The slowest run took 34.93 times longer than the fastest. This could mean that an intermediate result is being cached.
6min 27s ± 6min 38s per loop (mean ± std. dev. of 7 runs, 1 loop each)

New commits:

0466305Merge branch 'u/gh-petermoraw/LongestIncreasingSubsequences' of git://trac.sagemath.org/sage into t/31451/LongestIncreasingSubsequences
f5d40faImplemented the digraph solution for enumerating longest increasing subsequences

comment:21 Changed 5 months ago by Nadia Lafrenière

Branch: u/nadialafreniere/LongestIncreasingSubsequences
Commit: f5d40fa41d7cdee746b26a98a6a64f10eba47579

Sorry, I realized I push the wrong branch. Will push the correct one soon.

comment:22 Changed 5 months ago by Nadia Lafrenière

Branch: u/nadialafreniere/LongestIncreasingSubsequences2
Commit: 8d1c1d6d8e39875c22299ec1912752db0e9332b6

Fixed the branch issue.

comment:23 Changed 4 months ago by Vincent Delecroix

I think the end would be clearer as

-        increasing_subsequences = []
    
        for i in columns[0]:
            D.add_edge(0, i)  # 0 is source
        for i in columns[-1]:
            D.add_edge(i, n+1) # n+1 is sink

-        for p in D.all_paths(0, n+1):
-            increasing_subsequences.append(p[1:-1])
+        increasing_subsequences = [p[1:-1] for p in D.all_paths(0, n+1)]

comment:24 Changed 4 months ago by Vincent Delecroix

Another detail,

        increasing_subsequences.sort()
        increasing_subsequences.reverse()

would better be

        increasing_subsequences.sort(reverse=True)

comment:25 Changed 4 months ago by Vincent Delecroix

Wouldn't it be more efficient to keep the lists in columns sorted so that one can do early stop

for k in columns[j-1]:
    if k >= self[i]:
        break
    D.add_edge(k, self[i])

Of course keeping them sorted imposes a log(|size|) for insertion + something smarter than a list as a data structure.

comment:26 Changed 4 months ago by git

Commit: 8d1c1d6d8e39875c22299ec1912752db0e9332b662b9a399268375ef45311c9d773f40227a7dcfd6

Branch pushed to git repo; I updated commit sha1. New commits:

62b9a39Simplified end of code for longest increasing subsequences

comment:27 in reply to:  25 Changed 4 months ago by Nadia Lafrenière

Thanks, Vincent, for your comments. I implemented the changes to the end of the code. I believe it is more readable now.

Replying to vdelecroix:

Wouldn't it be more efficient to keep the lists in columns sorted so that one can do early stop

for k in columns[j-1]:
    if k >= self[i]:
        break
    D.add_edge(k, self[i])

Of course keeping them sorted imposes a log(|size|) for insertion + something smarter than a list as a data structure.

That would indeed be more efficient. I'm trying to think of the appropriate data structure for it, but I'm not sure how to handle this in order to keep it efficient. Did you have anything in mind?

comment:28 Changed 4 months ago by David Coudert

You can maintain the elements in a column sorted using

sage: from bisect import insort
sage: L = []
sage: for e in Permutations(30).random_element():
....:     insort(L, e)
sage: print(L)
[1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30]

You can also avoid the call to method self.longest_increasing_subsequence_length and ensure that both use first_row_p_tableau and columns[j]` are sorted. Overall this is faster than your current code.

def longest_increasing_subsequences(self):
    r"""
    """
    n = self.size()
    if not n:
        return([[]])

    from bisect import insort

    # getting the column in which each element is inserted
    first_row_p_tableau = []
    columns = []
    D = DiGraph(n+2)
    for x in self:
        j = bisect(first_row_p_tableau, x)
        if j == len(first_row_p_tableau):
            if columns:
                for k in columns[-1]:
                    if k >= x:
                        break
                    D.add_edge(k, x)
            first_row_p_tableau.append(x)
            columns.append([x])
        else:
            first_row_p_tableau[j] = x
            insort(columns[j], x)
            if j:
                for k in columns[j-1]:
                    if k >= x:
                        break
                    D.add_edge(k, x)

    for i in columns[0]:
        D.add_edge(0, i)  # 0 is source
    for i in columns[-1]:
        D.add_edge(i, n+1)  # n+1 is sink

    return sorted([p[1:-1] for p in D.all_paths(0, n+1)], reverse=True)
Last edited 4 months ago by David Coudert (previous) (diff)

comment:29 Changed 4 months ago by Vincent Delecroix

Description: modified (diff)

comment:30 Changed 4 months ago by Vincent Delecroix

I did not know about bisect.insort. The Python implementation of list uses contiguous array so that insertion is O(n) and not O(log n). A priori, using balanced search tree would be more efficient here : iterating through them is O(1) per element and insertion is O(log n). Browsing quickly, I found https://pypi.org/project/blist/. But since we are dealing with integers, I would go with the bstree from boost in C++. Not sure it is worth implementing it, but it could be mentioned as a remark in the code.

comment:31 Changed 4 months ago by David Coudert

Right, insort is O(n). What we need here is a data structure with quick insert and enabling to iterate over the elements in increasing order without extra cost...

comment:32 Changed 4 months ago by git

Commit: 62b9a399268375ef45311c9d773f40227a7dcfd64a7f5c8cf4097ff903f625d1e4b97b5416308fba

Branch pushed to git repo; I updated commit sha1. New commits:

4a7f5c8Use of bisect for columns in longest_increasing_subsequences

comment:33 Changed 4 months ago by Nadia Lafrenière

Thanks, David. I did not know about bisect either. That's a great improvement! I left the columns into sorted lists and did not implement the BST mostly because I think that the code is easier to read this way and because the code is still pretty fast. Given how slow it used to be, I doubt anyone ever used this code on large permutations. As suggested by Vincent, I added a note to the docstring about it. If others think a more efficient data structure is worth implementing, I would also be fine with it.

comment:34 Changed 4 months ago by Vincent Delecroix

Nadia, you need a line break after .. NOTE:: as in

        .. NOTE::

            This algorithm could be made faster using a balanced search tree
            for each column instead of sorted lists. See discussion on
            :trac:`31451`.

comment:35 Changed 4 months ago by Vincent Delecroix

The rest looks perfectly good to me. Thanks.

comment:36 Changed 4 months ago by David Coudert

and also 2 spaces before comments. Just a coding style.

-            D.add_edge(i, n+1) # n+1 is sink
+            D.add_edge(i, n+1)  # n+1 is sink

I also think that this code is good enough for now.

comment:37 Changed 4 months ago by git

Commit: 4a7f5c8cf4097ff903f625d1e4b97b5416308fba5969edaa2e395e683feaba38db8ea94c6290e996

Branch pushed to git repo; I updated commit sha1. New commits:

5969edaproper formatting for longest_increasing_subsequences

comment:38 in reply to:  35 Changed 4 months ago by Nadia Lafrenière

Replying to vdelecroix:

The rest looks perfectly good to me. Thanks.

Thanks, Vincent. I just fixed the formatting.

comment:39 Changed 4 months ago by Vincent Delecroix

Reviewers: David Coudert, Vincent Delecroix
Status: needs_reviewpositive_review

comment:40 Changed 4 months ago by Matthias Köppe

Authors: Peter Morawitz. Finn Hulse, Nadia LafrenièrePeter Morawitz, Finn Hulse, Nadia Lafrenière

comment:41 Changed 4 months ago by Volker Braun

Branch: u/nadialafreniere/LongestIncreasingSubsequences25969edaa2e395e683feaba38db8ea94c6290e996
Resolution: fixed
Status: positive_reviewclosed
Note: See TracTickets for help on using tickets.