Opened 3 years ago

Closed 3 years ago

#26111 closed enhancement (fixed)

Implement faster iterator for Lyndon words

Reported by: tscrim Owned by:
Priority: major Milestone: sage-8.4
Component: combinatorics Keywords:
Cc: sage-combinat, mantepse, vdelecroix Merged in:
Authors: Travis Scrimshaw Reviewers: Martin Rubey
Report Upstream: N/A Work issues:
Branch: 0b4c68b (Commits, GitHub, GitLab) Commit: 0b4c68b2ba3c7e270ac8d4137c137695132fc588
Dependencies: #25462 Stopgaps:

Status badges

Description

Having an iterator for Lyndon words with a fixed length and alphabet size is useful that does not return elements (e.g., the free Lie algebra). Moreover, by having less transient objects, we can also improve the speed.

Current branch:

sage: LW = LyndonWords(11,6)
sage: %time for x in LW: pass
CPU times: user 2.17 s, sys: 40 ms, total: 2.21 s
Wall time: 2.17 s

sage: LW = LyndonWords(11,7)
sage: %time for x in LW: pass
CPU times: user 20.7 s, sys: 20 ms, total: 20.7 s
Wall time: 20.7 s

vs old version:

sage: LW = LyndonWords(11,6)
sage: %time for x in LW: pass
CPU times: user 2.86 s, sys: 12 ms, total: 2.87 s
Wall time: 2.85 s

sage: LW = LyndonWords(11,7)
sage: %time for x in LW: pass
CPU times: user 26.8 s, sys: 12 ms, total: 26.8 s
Wall time: 26.7 s

Change History (28)

comment:1 Changed 3 years ago by tscrim

  • Branch set to public/combinat/fast_lyndon_iter-26111
  • Commit set to a17239b054dbed216feba1b1b03b50f51df196eb
  • Status changed from new to needs_review

New commits:

fdb880bImplementing "fast" iterator for Lyndon words of a fixed length and alphabet size.
a17239bUsing new iterator in the free Lie algebra.

comment:2 follow-up: Changed 3 years ago by mantepse

Given #25262, it would be good to add a "long" doctest, so we can avoid performance regressions in future.

comment:3 Changed 3 years ago by mantepse

Notes to myself:

There seem to be Gray codes available, eg:

(Note that the very simple iterative algorithm on page 217 of Ruskey's book visits also the necklaces.)

Last edited 3 years ago by mantepse (previous) (diff)

comment:4 Changed 3 years ago by git

  • Commit changed from a17239b054dbed216feba1b1b03b50f51df196eb to bbacc76e3cd1da57b7463069c49b4c1d38ae9f21

Branch pushed to git repo; I updated commit sha1. New commits:

bbacc76Added long test to lyndon_word.py.

comment:5 in reply to: ↑ 2 Changed 3 years ago by tscrim

Replying to mantepse:

Given #25262, it would be good to add a "long" doctest, so we can avoid performance regressions in future.

That is not really how #25262 is suppose to work, and I don't believe is a better way to catch regressions (there will be less noise, but the noise will still be there). However, I've added it.

comment:6 Changed 3 years ago by mantepse

Here is a naive implementation of the iterative FKM algorithm from Ruskey's book. I couldn't get http://puma.dimai.unifi.it/17_1_2/weston.pdf to work, and I haven't checked what it's advantage would be.

I won't steel the joy of timing this from you :-)

def FKM_iterativ(k, n):
    a = [0]*(n+1)
    if n == 1:
        yield a[:]
    i = n;
    while True:
        a[i] += 1
        for j in range(1, n-i+1):
            a[j + i] = a[j]
        if n == i:            
            yield a[:]
        i = n
        while a[i] == k-1:
            i -= 1
        if i == 0:
            break

comment:7 Changed 3 years ago by git

  • Commit changed from bbacc76e3cd1da57b7463069c49b4c1d38ae9f21 to 8cc9d0e9cda0a33477caf88a93be69d48bff600e

Branch pushed to git repo; I updated commit sha1. Last 10 new commits:

ce2962brestore richer doc tests
ba08ff3reviewer's comments
95ce20fprovide iterators which return lists of lists
d2e0e6einline a computation by reviewer's request
947233cMerge branch 'public/25462' of git://trac.sagemath.org/sage into public/combinat/speedup_set_partitions-25462
a79e302Reverted to an_element() and added some additional reviewer changes.
87bf535Cythonizing iterator.
ba6a115Fixing doctests due to ordering change.
95d5d91Merge branch 'public/combinat/speedup_set_partitions-25462' of git://trac.sagemath.org/sage into public/combinat/fast_lyndon_iter-26111
8cc9d0eUsing a Cython version of FKM algorithm to generate Lyndon words.

comment:8 Changed 3 years ago by tscrim

  • Dependencies set to #25462

The "joy" of timing...sure...

Anyways...that was a good find. The FKM algorithm is much faster than what was previously there (approximately 4x faster):

sage: %time for l in generate_lyndon_words(11,6): pass
CPU times: user 1.74 s, sys: 36 ms, total: 1.78 s
Wall time: 1.76 s
sage: %time for l in FKM_iterative(11,6): pass
CPU times: user 452 ms, sys: 32 ms, total: 484 ms
Wall time: 432 ms

sage: %time for l in generate_lyndon_words(11,7): pass
CPU times: user 16.7 s, sys: 28 ms, total: 16.7 s
Wall time: 16.7 s
sage: %time for l in FKM_iterative(11,7): pass
CPU times: user 3.75 s, sys: 28 ms, total: 3.78 s
Wall time: 3.74 s

With Cython (approximate another 10x speedup):

sage: %time for l in FKM_iterative_cython(11,6): pass
CPU times: user 36 ms, sys: 20 ms, total: 56 ms
Wall time: 40.7 ms
sage: %time for l in FKM_iterative_cython(11,7): pass
CPU times: user 340 ms, sys: 12 ms, total: 352 ms
Wall time: 322 ms

I have merged in #25462 and added the Cython code in such a way as to not (add) conflict with #25659.

Last edited 3 years ago by tscrim (previous) (diff)

comment:9 Changed 3 years ago by mantepse

Great! (sorry, I actually did think that you would enjoy the timing)

One thing I do not yet understand: on my laptop, FKM is roughly 12 times faster than the original iterator (i.e., without this ticket).

I'll try this ticket now to check.

comment:10 Changed 3 years ago by mantepse

Dear Travis!

The following cuts the time by a half, but I'm not sure whether it's correct... What do you think? (more precisely: should it really be FiniteWord_char?)

  • src/sage/combinat/lyndon_word.py

    diff --git a/src/sage/combinat/lyndon_word.py b/src/sage/combinat/lyndon_word.py
    index 880dd4139c..15d0f56d41 100644
    a b class LyndonWords_nk(UniqueRepresentation, Parent): 
    457457            sage: sum(1 for lw in LyndonWords(11, 6))  # long time
    458458            295020
    459459        """
     460        from sage.combinat.words.word import FiniteWord_char
    460461        for lw in generate_lyndon_words(self._n, self._k):
    461             yield self._words([i+1 for i in lw], check=False)
     462            yield FiniteWord_char(self._words, [i+1 for i in lw])
    462463
    463464def StandardBracketedLyndonWords(n, k):
    464465    """

comment:11 follow-up: Changed 3 years ago by vdelecroix

To be nicer with the words code use

self._words.element_classes['char'](self._words, [i+1 for i in lw])

That should roughly be the same. And you can even keep a reference W = self._words.element_classes['char'] to save extra micro seconds.

comment:12 in reply to: ↑ 11 Changed 3 years ago by mantepse

Thank you Vincent!

However, it turns out that "char" is probably incorrect:

sage: list(LyndonWords(1000,1))
OverflowError: value too large to convert to unsigned char

So it should be

  • src/sage/combinat/lyndon_word.py

    diff --git a/src/sage/combinat/lyndon_word.py b/src/sage/combinat/lyndon_word.py
    index 880dd4139c..8c0f49cfdd 100644
    a b class LyndonWords_nk(UniqueRepresentation, Parent): 
    457457            sage: sum(1 for lw in LyndonWords(11, 6))  # long time
    458458            295020
    459459        """
     460        W = self._words._element_classes['list']
    460461        for lw in generate_lyndon_words(self._n, self._k):
    461             yield self._words([i+1 for i in lw], check=False)
     462            yield W(self._words, [i+1 for i in lw])
    462463
    463464def StandardBracketedLyndonWords(n, k):
    464465    """

This doesn't seem to be essentially slower than "char"...

comment:13 Changed 3 years ago by mantepse

@Travis, a tiny thing:

  • src/sage/algebras/lie_algebras/free_lie_algebra.py

    diff --git a/src/sage/algebras/lie_algebras/free_lie_algebra.py b/src/sage/algebras/lie_algebras/free_lie_algebra.py
    index 1d0665d225..49fcf058c9 100644
    a b class FreeLieAlgebra(Parent, UniqueRepresentation): 
    809809            n = len(self._indices)
    810810            ret = []
    811811            for lw in generate_lyndon_words(n, k):
    812                 b = self._standard_bracket(tuple([names[i] for i in lw]))
     812                b = self._standard_bracket(tuple(names[i] for i in lw))
    813813                ret.append(self.element_class(self, {b: one}))
    814814            return tuple(ret)

comment:14 Changed 3 years ago by mantepse

Another one:

  • src/sage/algebras/lie_algebras/free_lie_algebra.py

    diff --git a/src/sage/algebras/lie_algebras/free_lie_algebra.py b/src/sage/algebras/lie_algebras/free_lie_algebra.py
    index 1d0665d225..1ef05a7f4f 100644
    a b class FreeLieAlgebra(Parent, UniqueRepresentation): 
    774774                 [x0, [[x0, x1], x1]],
    775775                 [x0, [x0, [x1, x2]]],
    776776                 [x0, [[x0, x2], x1]],
    777                  [[x0, x1], [x0, x2]],
    778777                 [x0, [[x0, x2], x2]],
     778                 [[x0, x1], [x0, x2]],
    779779                 [[[x0, x1], x1], x1],
    780780                 [x0, [x1, [x1, x2]]],
    781781                 [[x0, [x1, x2]], x1],
    782                  [[[x0, x2], x1], x1],
    783782                 [x0, [[x1, x2], x2]],
     783                 [[[x0, x2], x1], x1],
    784784                 [[x0, x2], [x1, x2]],
    785785                 [[[x0, x2], x2], x1],
    786786                 [[[x0, x2], x2], x2],

comment:15 Changed 3 years ago by mantepse

  • Reviewers set to Martin Rubey

comment:16 Changed 3 years ago by git

  • Commit changed from 8cc9d0e9cda0a33477caf88a93be69d48bff600e to 7b25ddc646833c95d3d92c0fec399e3df6a7370f

Branch pushed to git repo; I updated commit sha1. New commits:

7b25ddcUsing faster generator for LyndonWords and fixing doctest in free_lie_algebra.py.

comment:17 Changed 3 years ago by tscrim

The change in comment:13 is actually slower in both Python and Cython. It has something to do with how Python constructs these things internally (but I agree that it is counter-intuitive).

With comment:12:

sage: LW = LyndonWords(11,6)
sage: %time for x in LW: pass
CPU times: user 200 ms, sys: 21.3 ms, total: 222 ms
Wall time: 201 ms
sage: LW = LyndonWords(11,7)
sage: %time for x in LW: pass
CPU times: user 1.85 s, sys: 8.12 ms, total: 1.85 s
Wall time: 1.84 s

vs old

sage: LW = LyndonWords(11,6)
sage: %time for x in LW: pass
CPU times: user 475 ms, sys: 42.2 ms, total: 518 ms
Wall time: 465 ms
sage: LW = LyndonWords(11,7)
sage: %time for x in LW: pass
CPU times: user 4.25 s, sys: 17.8 ms, total: 4.27 s
Wall time: 4.24 s

comment:18 Changed 3 years ago by mantepse

One more doctest, a pyflakes thing, and a corner case:

sum(1 for lw in LyndonWords(1, 1000)) is stupid, but (with this patch) it kills my computer by filling up the memory really quickly. Without this patch, it hits the recursion limit, whereas it gives 0 (which is correct) up to and including length 988.

  • src/sage/combinat/combinat_cython.pyx

    diff --git a/src/sage/combinat/combinat_cython.pyx b/src/sage/combinat/combinat_cython.pyx
    index 1a53431a5a..227e5d8900 100644
    a b def generate_lyndon_words(Py_ssize_t n, Py_ssize_t k): 
    168168
    169169        sage: list(generate_lyndon_words(5, 0))
    170170        []
     171
     172        sage: list(generate_lyndon_words(1, 1000))
     173        []
    171174    """
    172175    cdef Py_ssize_t i, j
    173176    if k == 0:
    def generate_lyndon_words(Py_ssize_t n, Py_ssize_t k): 
    176179        for i in range(n):
    177180            yield [i]
    178181        return
     182    if n == 1:
     183        return
    179184
    180185    cdef list a = [0] * (k+1)
    181186    i = k
  • src/sage/combinat/lyndon_word.py

    diff --git a/src/sage/combinat/lyndon_word.py b/src/sage/combinat/lyndon_word.py
    index 8c0f49cfdd..306f7a3cf9 100644
    a b from sage.arith.all import factorial, divisors, gcd, moebius 
    2323from sage.misc.all import prod
    2424
    2525from . import necklace
    26 from sage.combinat.integer_vector import IntegerVectors, integer_vectors_nk_fast_iter
    2726from sage.combinat.words.words import FiniteWords
    2827from sage.combinat.combinat_cython import generate_lyndon_words
    2928
    class LyndonWords_nk(UniqueRepresentation, Parent): 
    456455
    457456            sage: sum(1 for lw in LyndonWords(11, 6))  # long time
    458457            295020
     458
     459            sage: sum(1 for lw in LyndonWords(1000, 1))
     460            1000
     461
     462            sage: sum(1 for lw in LyndonWords(1, 1000))
     463            0
    459464        """
    460465        W = self._words._element_classes['list']
    461466        for lw in generate_lyndon_words(self._n, self._k):

comment:19 Changed 3 years ago by git

  • Commit changed from 7b25ddc646833c95d3d92c0fec399e3df6a7370f to b9d8ee495e9425e34c56361b694899f6aad226da

Branch pushed to git repo; I updated commit sha1. New commits:

b9d8ee4Corner case, pyflakes, and some extra doctests.

comment:20 Changed 3 years ago by tscrim

Fixed and added. I also added more doctests for the other corner case of LyndonWords(1,1).

comment:21 Changed 3 years ago by mantepse

Good catch! Funny that the # long disappeared :-)

I have a few hopefully final questions/requests:

  • since it is very likely that more and more fundamental iterators are cythonized in a similar way (eg. #25864 would probably make a lot of sense), I think it would be good to create and stick to a good naming scheme. It might be a pain to do that later. In short, I think lyndon_word_iterator would be better.
  • there is a typo in the docstring of the iterator, I propose the following instead:
    • src/sage/combinat/combinat_cython.pyx

      diff --git a/src/sage/combinat/combinat_cython.pyx b/src/sage/combinat/combinat_cython.pyx
      index a3e98d3f94..84e707ce7a 100644
      a b def _stirling_number2(n, k): 
      144144
      145145def generate_lyndon_words(Py_ssize_t n, Py_ssize_t k):
      146146    """
      147     Generate all Lyndon words with of length ``k`` with ``n`` letters.
       147    Generate the Lyndon words of fixed length `k` over the alphabet `{0, 1, ..., n-1}`.
      148148
      149149    The resulting Lyndon words will be words represented as lists
      150150    whose alphabet is ``range(n)``.
      151151
      152152    ALGORITHM:
      153153
      154     The FKM algorithm from *Combinatorial Generation* by Ruskey.
       154    The iterative FKM Algorithm 7.2 from *Combinatorial Generation* by Ruskey.
      155155
      156156    EXAMPLES::
  • in the documentation of the module lyndon_word.py, k and n are swapped in the docstrings class LyndonWords_nk and __init__ just thereafter. Should this be fixed in a separate ticket - I think it's easier to do it now.

Otherwise, I think it is ready to go.

comment:22 Changed 3 years ago by tscrim

You can make and push changes too. :) I am going to bed now, but can do it in the morning.

comment:23 Changed 3 years ago by git

  • Commit changed from b9d8ee495e9425e34c56361b694899f6aad226da to 2b3f9d4ae0ef57562e87109ce08db3355dee1795

Branch pushed to git repo; I updated commit sha1. New commits:

2b3f9d4fix doc and rename iterator

comment:24 Changed 3 years ago by git

  • Commit changed from 2b3f9d4ae0ef57562e87109ce08db3355dee1795 to 0b4c68b2ba3c7e270ac8d4137c137695132fc588

Branch pushed to git repo; I updated commit sha1. New commits:

0b4c68bA few more doc tweaks and making Ruskey a proper reference.

comment:25 Changed 3 years ago by tscrim

Thank you. I made a few other small changes: I make Ruskey into an actual reference and the 1-line description of lyndon_word_iterator just use n as code because the alphabet description is given in the main body (and I prefer to use code format as much as possible in the 1-line descriptions for methods).

If my changes are good, then positive review.

comment:26 Changed 3 years ago by mantepse

  • Status changed from needs_review to positive_review

comment:27 Changed 3 years ago by tscrim

Thank you.

comment:28 Changed 3 years ago by vbraun

  • Branch changed from public/combinat/fast_lyndon_iter-26111 to 0b4c68b2ba3c7e270ac8d4137c137695132fc588
  • Resolution set to fixed
  • Status changed from positive_review to closed
Note: See TracTickets for help on using tickets.