Opened 4 years ago

Closed 4 years ago

#19619 closed defect (fixed)

Simplify words.py

Reported by: vdelecroix Owned by:
Priority: major Milestone: sage-6.10
Component: combinatorics Keywords:
Cc: slabbe Merged in:
Authors: Vincent Delecroix Reviewers: Sébastien Labbé
Report Upstream: N/A Work issues:
Branch: 4fd6556 (Commits) Commit: 4fd6556a85deb5b43c571ec7a200570e47e684f6
Dependencies: Stopgaps:

Description (last modified by vdelecroix)

Currently we have too many parent for words:

  • for finite and infinite words (Words_all, Words_over_alphabet, Words_over_OrderedAlphabet)
  • finite words (FiniteWords_over_OrderedAlphabet)
  • infinite words (InfiniteWords_over_OrderedAlphabet)
  • words of fixed length (FiniteWords_length_k_over_OrderedAlphabet and Words_n)

This lead to subtle bug like

sage: W = Words([0,1], finite=False, infinite=True)
sage: u = W.an_element()
sage: u[2:5].parent()   # a finite word !!
Infinite Words over {0, 1}

The proposal of this ticket is to have only four classes:

  • FiniteWords
  • Words_n: words of length n (as a slice of the one before)
  • InfiniteWords (or FullShift)
  • FiniteOrInfiniteWords

The parent FiniteWords should hence have a method .shift() that return the associated shift (e.g u ** Infinity will belong there). Similarly, the parent InfiniteWords should have a method .factors() that return the set of factors (and finite slices will belong there).

We also:

  • deprecate size_of_alphabet and has_letter
  • use a bit more of categories in lyndon_word.py
  • use more often Word_char if possible
  • implement a better iterator for periodic points that avoids computing huge power of the morphism (that is not always doable)

Change History (54)

comment:1 Changed 4 years ago by slabbe

Sounds great. I will be reviewing this.

comment:2 Changed 4 years ago by vdelecroix

  • Authors set to Vincent Delecroix
  • Branch set to u/vdelecroix/19619
  • Commit set to fc24b013baa42774d80d1aef9d03009d676329fe
  • Status changed from new to needs_review

It took me much more time than expected... It is not yet completely finished but all tests pass in combinat/words/, combinat/lyndon_word.py and combinat/e_one_star.py.


New commits:

4d94566Trac 19619: simplify words.py
fc24b01Trac 19619: fix Lyndon words

comment:3 Changed 4 years ago by vdelecroix

  • Description modified (diff)

comment:4 Changed 4 years ago by git

  • Commit changed from fc24b013baa42774d80d1aef9d03009d676329fe to eddb67cbd7ed7284b130b927fec09d97c42d8376

Branch pushed to git repo; I updated commit sha1. New commits:

e16fd47Trac 19619: fix digraph generators
c86a5ffTrac 19619: fix continued fractions
3ccf384Trac 19619: cardinality for set of words
3cdf5d7Trac 19619: doc in algebras with basis
eddb67cTrac 19619: doc in finite state machine

comment:5 Changed 4 years ago by git

  • Commit changed from eddb67cbd7ed7284b130b927fec09d97c42d8376 to 0917c15693fc71c6f3b3d5795d0744f82d9da5f5

Branch pushed to git repo; I updated commit sha1. New commits:

0917c15Trac 19619: fix unpickling

comment:6 Changed 4 years ago by vdelecroix

What a mess this unpickling stuff!!!

comment:7 Changed 4 years ago by git

  • Commit changed from 0917c15693fc71c6f3b3d5795d0744f82d9da5f5 to ab9f0e10ff70cbb1a87f3f7179db7e572ed3771c

Branch pushed to git repo; I updated commit sha1. New commits:

ab9f0e1Trac 19619: typo

comment:8 Changed 4 years ago by git

  • Commit changed from ab9f0e10ff70cbb1a87f3f7179db7e572ed3771c to 661588fc93cd6d90caec33d1adfe5755983ae871

Branch pushed to git repo; I updated commit sha1. New commits:

661588fTrac 19619: bad input block

comment:9 Changed 4 years ago by vdelecroix

All tests pass!

comment:10 follow-up: Changed 4 years ago by slabbe

Is it ready for review? I saw a TODO in the diff...

comment:11 in reply to: ↑ 10 Changed 4 years ago by vdelecroix

Replying to slabbe:

Is it ready for review? I saw a TODO in the diff...

Read for review. The TODO can be removed. It was because of the cmp_letters method. Now its status is decided on construction as follows

        if alphabet.cardinality() == Infinity or \
           (alphabet.cardinality() < 36 and
            all(alphabet.unrank(i) > alphabet.unrank(j) for
            i in range(min(36,alphabet.cardinality())) for j in range(i))):
            self.cmp_letters = cmp
        else:
            self.cmp_letters = self._cmp_letters

where _cmp_letters is the previous version which compares by ranking in the alphabet. It is of course not ideal, but I think that we should get rid of this cmp_letters anyway.

comment:12 Changed 4 years ago by git

  • Commit changed from 661588fc93cd6d90caec33d1adfe5755983ae871 to a706f496accea8937d708f4432925a895b0450f6

Branch pushed to git repo; I updated commit sha1. New commits:

a706f49Trac 19619: remove the TODO

comment:13 Changed 4 years ago by slabbe

Some methods are missing doctests apparently:

Decreased doctests in combinat/words/morphism.py: from 53 / 53 = 100% to 53 / 56 = 94%
Decreased doctests in combinat/words/paths.py: from 57 / 57 = 100% to 57 / 59 = 96%
Decreased doctests in combinat/words/words.py: from 45 / 45 = 100% to 63 / 66 = 95%

comment:14 Changed 4 years ago by vdelecroix

Is that a problem?

comment:15 Changed 4 years ago by vdelecroix

In words.py, I can add some:

TESTS:

    sage: print "hello"
    hello

but I do not see the point. The missing doctest are in the deprecated class Words_all I do not see any relevant test...

For morphism.py and paths.py I will complete the missing ones.

comment:16 Changed 4 years ago by git

  • Commit changed from a706f496accea8937d708f4432925a895b0450f6 to bcc2d8678ed56cb0dca16b4647811834268b36b0

Branch pushed to git repo; I updated commit sha1. New commits:

bcc2d86Trac 19619: more doc (and equality improvements)

comment:17 Changed 4 years ago by slabbe

The ones at lines 302 and 1721 are not deprecated. The method _cmp_letters is tested with cmp_letters (?). To my opinion, the two methods that are deprecated should also be doctested. It will take three lines, show the user how to use those methods. Maybe with a pickle through an indirect doctest, I don't know. For the one at line 2429, say why this method must exist, it contains only "pass", why?

------------------------------------------------------------------------
SCORE src/sage/combinat/words/words.py: 95.5% (63 of 66)

Missing documentation:
     * line 1721: def _element_classes(self)
     * line 2424: def __setstate__(self, state)
     * line 2429: def _element_constructor_(self)

Possibly wrong (function name doesn't occur in doctests):
     * line 302: def _cmp_letters(self, letter1, letter2)
------------------------------------------------------------------------

comment:18 Changed 4 years ago by git

  • Commit changed from bcc2d8678ed56cb0dca16b4647811834268b36b0 to 29daae2736e3884b6513f021596a5c7ef0155f6e

Branch pushed to git repo; I updated commit sha1. New commits:

29daae2Trac 19619: doc for _cmp_letters and _element_classes

comment:19 Changed 4 years ago by git

  • Commit changed from 29daae2736e3884b6513f021596a5c7ef0155f6e to 449488686e52fb15a2a333073eed39403803347f

Branch pushed to git repo; I updated commit sha1. New commits:

4494886Trac 19619: remove __setstate__ and few words for _element_constructor_

comment:20 follow-up: Changed 4 years ago by slabbe

Why Words_n inherits from Parent instead of AbstractLanguage?

Last edited 4 years ago by slabbe (previous) (diff)

comment:21 follow-up: Changed 4 years ago by slabbe

Should a parent return an element for which he is the parent?

sage: W = Words('abc') 
sage: W 
Finite and infinite words over {'a', 'b', 'c'}
sage: W('aabbcc').parent()
Finite words over {'a', 'b', 'c'}
sage: W('aabbcc').parent() == W
False   
sage: W = Words(range(4))     
sage: W(lambda n:n%4).parent() == W 
False  
Last edited 4 years ago by slabbe (previous) (diff)

comment:22 in reply to: ↑ 20 ; follow-up: Changed 4 years ago by vdelecroix

Replying to slabbe:

Why Words_n inherits for Parent instead of AbstractLanguage?

AbstractLanguage is designed to disappear. For the moment it just holds the _alphabet attributes and some deprecated methods. I am not sure that all language should keep this _alphabet attribute. That can be changed. Do we want that languages implement a method alphabet or simply provides an alphabet to the constructor of a base class?

Words_n does not care about an attribute _alphabet since it is initialized with a FiniteWords as first argument. It is essentially an empty shell.

comment:23 in reply to: ↑ 21 ; follow-up: Changed 4 years ago by vdelecroix

Replying to slabbe:

Should a parent return an element for which he is the parent?

There are other parents doing the same

sage: NN.an_element().parent() == NN
False
sage: Primes().an_element().parent() == Primes()
False

And this is the concept of facade (in the Sage category sense).

With my branch you always have:

  • finte words have a parent FiniteWords
  • infinite words have a parent InfiniteWords
  • words with indefinite status have a parent FiniteOrInfiniteWords

In particular

sage: W = Words('ab')
sage: W('ab').parent() is W.finite_words()
True
sage: W(lambda n: 'a', length='infinite').parent() is W.infinite_words()
True
sage: W(iter("ab"*100)).parent() is W
True
Last edited 4 years ago by vdelecroix (previous) (diff)

comment:24 in reply to: ↑ 22 ; follow-up: Changed 4 years ago by slabbe

Replying to vdelecroix:

AbstractLanguage is designed to disappear.

If so, say it in the docstring __doc__ of AbstractLanguage. It helps to know what should be the next improvements to the code.

For the moment it just holds the _alphabet attributes and some deprecated methods.

But it contains some other nondeprecated methods too.

comment:25 in reply to: ↑ 24 Changed 4 years ago by vdelecroix

Replying to slabbe:

Replying to vdelecroix:

AbstractLanguage is designed to disappear.

If so, say it in the docstring __doc__ of AbstractLanguage. It helps to know what should be the next improvements to the code.

For the moment it just holds the _alphabet attributes and some deprecated methods.

But it contains some other nondeprecated methods too.

Right. Let say that it is an experimental abstraction of a potential Language base class... I will update the doc (not changing anything else).

comment:26 in reply to: ↑ 23 ; follow-up: Changed 4 years ago by slabbe

Replying to vdelecroix:

With my branch you always have:

  • finte words have a parent FiniteWords
  • infinite words have a parent InfiniteWords
  • words with indefinite status have a parent FiniteOrInfiniteWords

It is interesting that you write FiniteOrInfiniteWords here instead of FiniteAndInfiniteWords in the code because I was going to suggest to call that class FiniteOrInfiniteWords instead.

comment:27 Changed 4 years ago by git

  • Commit changed from 449488686e52fb15a2a333073eed39403803347f to 51c8f041e399556e7b7e5feb340bc1314e3f1d40

Branch pushed to git repo; I updated commit sha1. New commits:

a8f6921merge u/vdelecroix/19619 in Sage-6.10.beta6
51c8f04Trac 19619: add doc to AbstractLanguage

comment:28 in reply to: ↑ 26 Changed 4 years ago by vdelecroix

Replying to slabbe:

Replying to vdelecroix:

With my branch you always have:

  • finte words have a parent FiniteWords
  • infinite words have a parent InfiniteWords
  • words with indefinite status have a parent FiniteOrInfiniteWords

It is interesting that you write FiniteOrInfiniteWords here instead of FiniteAndInfiniteWords in the code because I was going to suggest to call that class FiniteOrInfiniteWords instead.

Oh right. It is clearly a union, hence Or makes more sense... will do it.

comment:29 Changed 4 years ago by git

  • Commit changed from 51c8f041e399556e7b7e5feb340bc1314e3f1d40 to 7ee2e3c52d273b51cef7993a5e605918a12a99ac

Branch pushed to git repo; I updated commit sha1. New commits:

7ee2e3cTrac 19619: FiniteAndInfinite -> FiniteOrInfinite

comment:30 Changed 4 years ago by vdelecroix

  • Description modified (diff)

comment:31 Changed 4 years ago by slabbe

  • Reviewers set to Sébastien Labbé

I am still not done with the review as I need more time to think and look at the code... Will continue this tomorrow.

comment:32 follow-up: Changed 4 years ago by slabbe

  • Status changed from needs_review to needs_work
  • Are sure that we do not prefer the __call__ to be implemented in one place in AbstractLanguage? Lot of documentation is a copy paste...
  • Argument length in unused in InfiniteWords.__call__.
  • Documentation of InfiniteWords.__call__ should not mention list, str, tuple, char and the length argument.
  • The below lines in FiniteWords._word_from_word seem useless. Same comment for InfiniteWords._word_from_word:
    from sage.combinat.words.finite_word import CallableFromListOfWords
    if isinstance(data._func, CallableFromListOfWords):
        # The following line is important because, in this case,
        # data._func is also a tuple (indeed
        # CallableFromListOfWords inherits from tuple)
        datatype = "callable"
    
  • Words_n.__call__ should include a INPUTS: block
  • The following imports are not used:
    from sage.misc.mrange import xmrange
    from sage.structure.parent import Set_PythonType
    from sage.combinat.combinat import InfiniteAbstractCombinatorialClass
    

comment:33 Changed 4 years ago by git

  • Commit changed from 7ee2e3c52d273b51cef7993a5e605918a12a99ac to f8e4ca55cb76eb29c67d656270f9ce5f6c95b0a0

Branch pushed to git repo; I updated commit sha1. New commits:

f8e4ca5Trac 19619: remove length arguments

comment:34 in reply to: ↑ 32 Changed 4 years ago by vdelecroix

  • Status changed from needs_work to needs_review

Replying to slabbe:

  • Are sure that we do not prefer the __call__ to be implemented in one place in AbstractLanguage? Lot of documentation is a copy paste...

Definitely not. We do not want to impose the class of elements for (future) languages. Moreover:

  • you should not be able to build infinite words from FiniteWords or finite ones from InfiniteWords.
  • the code is now much simpler, isn't it?

To my mind, the documentation is completely useless. Which user will do

sage: W = Words('ab')
sage: W.__call__?

comment:35 follow-up: Changed 4 years ago by mantepse

As a very casual user of words: I would expect FiniteOrInfiniteWords to be Words...

comment:36 in reply to: ↑ 35 Changed 4 years ago by vdelecroix

Replying to mantepse:

As a very casual user of words: I would expect FiniteOrInfiniteWords to be Words...

One problem is that Words is the name of the factory that dispatch to the actual classes FiniteWords, InfiniteWords, FiniteOrInfiniteWords or Words_n. Why do you care about the class names? You will never have to do

sage: FiniteOrInfiniteWords(my_alphabet)

comment:37 Changed 4 years ago by mantepse

OK, great, sorry for the noise!

comment:38 follow-up: Changed 4 years ago by slabbe

  • Status changed from needs_review to needs_work

The fourth item of my previous comment has not been answered.

comment:39 Changed 4 years ago by git

  • Commit changed from f8e4ca55cb76eb29c67d656270f9ce5f6c95b0a0 to fa5ef782448a49b2b3b141b5d91a9ec44ba38064

Branch pushed to git repo; I updated commit sha1. New commits:

fa5ef78Trac 19619: remove useless code

comment:40 in reply to: ↑ 38 Changed 4 years ago by vdelecroix

  • Status changed from needs_work to needs_review

Replying to slabbe:

The fourth item of my previous comment has not been answered.

Indeed.

comment:41 Changed 4 years ago by slabbe

  • Status changed from needs_review to needs_work

I think this comment inside FiniteWords._word_from_word

+            # this must be put first for the reason that CallableFromListOfWords
+            # inherits from tuple

is unnecessary and date from the time were one method was doing everything for the creation of a word. Then, data could be a tuple or a CallableFromListOfWords which inherits from tuple explaining the reason we needed to test for isinstance(data, CallableFromListOfWords) first.

Now, in that method, data is word (not a tuple neither a CallableFromListOfWords) so the isinstance tests can be done in any order. Question: what is best to test first?

comment:42 Changed 4 years ago by git

  • Commit changed from fa5ef782448a49b2b3b141b5d91a9ec44ba38064 to dd103be6eff7e4dc98c6390162059bac319d02ce

Branch pushed to git repo; I updated commit sha1. New commits:

0db113aTrac 19619: simplifications in word constructors
dd103beTrac 19619: doctest failures + documentation

comment:43 Changed 4 years ago by vdelecroix

  • Status changed from needs_work to needs_review

I tried to do my best for FiniteWords._word_from_word. I also removed _word_from_callable from FiniteOrInfiniteWords and simplified its __call__.

Last edited 4 years ago by vdelecroix (previous) (diff)

comment:44 Changed 4 years ago by slabbe

  • Status changed from needs_review to needs_work

I looked at the code. It looks good. But I now get (also reported by the patchbots):

----------------------------------------------------------------------
sage -t --warn-long 31.0 permutation.py  # 1 doctest failed
sage -t --warn-long 31.0 parking_functions.py  # 62 doctests failed
----------------------------------------------------------------------

comment:45 Changed 4 years ago by git

  • Commit changed from dd103be6eff7e4dc98c6390162059bac319d02ce to 13e989e3ab16bcda4980b69d7f6afd65ab46b318

Branch pushed to git repo; I updated commit sha1. New commits:

13e989eTrac 19619: fix wrong guessing

comment:46 Changed 4 years ago by vdelecroix

  • Status changed from needs_work to needs_review

Right. Permutations or parking functions are callable but should be considered as finite...

Last edited 4 years ago by vdelecroix (previous) (diff)

comment:47 follow-up: Changed 4 years ago by slabbe

I would prefer if the code of FiniteOrInfiniteWords.__call__ be like the others FiniteWords.__call__ and InfiniteWords.__call__ in the sense that first the function does:

if datatype is not None:
    #return directly the good word
    if datatype in ['str', 'list', 'char', 'tuple']:
        return etc...
else:
    #guess the datatype, the length, etc.

if you agree. It makes the code easier to read and be convince anybody that it does the minimal and most efficient path in the function.

Doing what you do in InfiniteWords.__call__ essentially reverts ticket #17021 where that kind of guessing was done first. Note that to avoid duplication of code, it is possible that you will have to recreate _word_from_callable because of that... or maybe not.

comment:48 in reply to: ↑ 47 Changed 4 years ago by vdelecroix

Replying to slabbe:

I would prefer if the code of FiniteOrInfiniteWords.__call__ be like the others FiniteWords.__call__ and InfiniteWords.__call__ in the sense that first the function does:

<SNIP>

if you agree. It makes the code easier to read and be convince anybody that it does the minimal and most efficient path in the function.

Doing what you do in InfiniteWords.__call__ essentially reverts ticket #17021 where that kind of guessing was done first. Note that to avoid duplication of code, it is possible that you will have to recreate _word_from_callable because of that... or maybe not.

If a user provides datatype and length there will be no guessing... And with what I changed the first step is to determine the length (in order to determine the parent). I do not see how to handle what you suggested without copy/paste.

comment:49 Changed 4 years ago by vdelecroix

Moreover, the length guessing is based on what the user input as datatype...

comment:50 Changed 4 years ago by vdelecroix

Some timings with the branch

sage: W = Words()
sage: FW = FiniteWords()
sage: L = range(1000)
sage: %timeit W(L, check=False)
100000 loops, best of 3: 2.54 µs per loop
sage: %timeit W(L, length="finite", check=False)
1000000 loops, best of 3: 1.58 µs per loop
sage: %timeit W(L, datatype="list", check=False)
100000 loops, best of 3: 1.75 µs per loop
sage: %timeit W(L, length="finite", datatype="list", check=False)
1000000 loops, best of 3: 1.49 µs per loop
sage: %timeit FW(L, check=False)
1000000 loops, best of 3: 823 ns per loop

Whereas we have on develop

sage: W = Words()
sage: %timeit W(L, check=False)
1000000 loops, best of 3: 800 ns per loop

It looks like the code in the branch is:

  • 3x slower now if nothing is provided
  • 2x slower if length or datatype is provided
  • 1x if call directly FiniteWords

Plenty of reasons: more inheritance, more function calls.

comment:51 Changed 4 years ago by git

  • Commit changed from 13e989e3ab16bcda4980b69d7f6afd65ab46b318 to 4fd6556a85deb5b43c571ec7a200570e47e684f6

Branch pushed to git repo; I updated commit sha1. New commits:

4fd6556Trac 19619: merge into 6.10.beta7

comment:52 Changed 4 years ago by slabbe

  • Status changed from needs_review to positive_review

Good to go.

comment:53 Changed 4 years ago by vdelecroix

Great! Thanks for the review!

comment:54 Changed 4 years ago by vbraun

  • Branch changed from u/vdelecroix/19619 to 4fd6556a85deb5b43c571ec7a200570e47e684f6
  • Resolution set to fixed
  • Status changed from positive_review to closed
Note: See TracTickets for help on using tickets.