#31684 closed enhancement (fixed)

Improve WordMorphism._language_naive

Reported by: gh-mrejmon Owned by:
Priority: major Milestone: sage-9.4
Component: combinatorics Keywords: words
Cc: Merged in:
Authors: Martin Rejmon Reviewers: Travis Scrimshaw
Report Upstream: N/A Work issues:
Branch: 9a49d59 (Commits, GitHub, GitLab) Commit: 9a49d59b96a65f8c304c5325947255f44c6bc389
Dependencies: Stopgaps:

Status badges

Description

I pushed a branch with multiple changes to the _language_naive method of the WordMorphism class.

First commit adds a sentence to the docs mentioning that the substitution should be non-erasing.

Second commit copy-pastes some code to make sure that the argument u is also a part of the language (and also adds a test for it). Example:

Before:

sage: s = WordMorphism({0: [0,1], 1:[0]})
sage: W = s.domain()
sage: W([1, 1]) in s._language_naive(3, W([1, 1]))
False

After:

sage: W([1, 1]) in s._language_naive(3, W([1, 1]))
True

I also replaced the line:

L = set(u.parent()())

with:

L = set()

since it does the same thing and is less confusing.

Third commit adds an alternative path for "non-letter" factors to speed up the method. Example:

Before:

sage: %timeit s._language_naive(100, W([0]))
7.49 s ± 48.9 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

After:

sage: %timeit s._language_naive(100, W([0]))
148 ms ± 2.56 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)

The code is based on the fact, that for images of non-letter words u we only need to check factors which start in the image of the leftmost letter of u and end in the image of the rightmost letter of u, all the other factors will be generated when the images of factors of u are processed. The implementation results in a slight index hell though.

Also while writing this ticket I noticed that the method is way (way) slower if words with datatype string instead of list are used (even before this patch). Maybe also worth mentioning in the docs?

Change History (6)

comment:1 Changed 15 months ago by gh-mrejmon

  • Status changed from new to needs_review

comment:2 follow-up: Changed 15 months ago by tscrim

  • Reviewers set to Travis Scrimshaw

Some quick analysis is that the slicing f = v[i:j] takes the majority of the time for a string, whereas for a list, it is much faster. This is also reflected in the fact v = self(u) takes 3 times as long for strings as for lists. So if you want to get the string-based words to speed up, you need to increase the speed of word creation.

Doing a bit more analysis (%lprun is great for this), the FiniteWords.__call__ function is being called many more times than for a list. Lots of time is spent in the im += self._morph[a] line of WordMorphism.__call__. This comes from the fact that for words in {0,...,255}, there is the FiniteWord_char that has significant optimizations and no safety checks. In particular, we can do stuff like:

sage: si = WordMorphism({0: [0,1], 1:[0]})
sage: si._morph[0] + [265,2]
...
OverflowError: value too large to convert to unsigned char

So there are a number of optimziations that can be made across the word code. One simple one here is to just get the raw data from the finite word and not actually create a word until the entire process is done.

Also, I noticed a typo in WordMorphism.__init__:

                elif not isinstance(domain, FiniteWords):
                    raise TypeError("the codomain must be a set of finite words")

as it should be the domain, not codomain.


Anyways, these are all things for a separate ticket and shouldn't be mentioned in the docs.

The only real comment I have is that non-substitution requirement existed before this ticket, correct?

You can also use self._morph[letter] instead of self.image(letter).

comment:3 Changed 15 months ago by git

  • Commit changed from e6d88b22bd01983a817f95394674e9650df2d2d1 to 9a49d59b96a65f8c304c5325947255f44c6bc389

Branch pushed to git repo; I updated commit sha1. New commits:

9a49d5931684: Inline a function call

comment:4 in reply to: ↑ 2 Changed 15 months ago by gh-mrejmon

Replying to tscrim:

Some quick analysis ...

Interesting, thanks for the explanation.

Anyways, these are all things for a separate ticket and shouldn't be mentioned in the docs.

Alright.

The only real comment I have is that non-substitution requirement existed before this ticket, correct?

Yes. With an erasing substitution it is possible that not all words will be returned (both before and after this patch). A quick example:

sage: f = WordMorphism('s->abc,a->a,b->,c->c')
sage: Word('ac') in f._language_naive(3, Word('s'))
False
sage: Word('ac') in f._language_naive(4, Word('s'))
True

The word ac is found only when words of length at least 3 are also requested.

You can also use self._morph[letter] instead of self.image(letter).

Done.

comment:5 Changed 15 months ago by tscrim

  • Status changed from needs_review to positive_review

I see. Thank you. LGTM.

comment:6 Changed 13 months ago by vbraun

  • Branch changed from u/gh-mrejmon/speedup_language_naive to 9a49d59b96a65f8c304c5325947255f44c6bc389
  • Resolution set to fixed
  • Status changed from positive_review to closed
Note: See TracTickets for help on using tickets.