Opened 12 years ago
Closed 10 years ago
#9877 closed enhancement (fixed)
Add is_sturmian_factor, is_tangent methods for finite words
Reported by: | Thierry Monteil | Owned by: | Thierry Monteil |
---|---|---|---|
Priority: | major | Milestone: | sage-5.9 |
Component: | combinatorics | Keywords: | sturmian word, tangent word |
Cc: | Sage Combinat CC user, Alexandre Blondin Masse, Sébastien Labbé | Merged in: | sage-5.9.beta1 |
Authors: | Thierry Monteil, Frédéric Chapoton | Reviewers: | Alexandre Blondin Massé, Sébastien Labbé, Vincent Delecroix |
Report Upstream: | N/A | Work issues: | |
Branch: | Commit: | ||
Dependencies: | Stopgaps: |
Description (last modified by )
Add 3 methods to sage/combinat/words/finite_word.py
:
- sturmian_desubstitute_as_possible
- is_sturmian_factor
- is_tangent
Add a protecting is_tangent
method to sage/combinat/words/paths.py
.
sage: w = Word('01110110110111011101',alphabet='01') sage: w.is_tangent() True
Depends on #8739.
Apply:
Attachments (2)
Change History (37)
comment:1 Changed 12 years ago by
Cc: | Alexandre Blondin Masse Sébastien Labbé added |
---|---|
Description: | modified (diff) |
Status: | new → needs_review |
comment:2 follow-ups: 5 6 7 Changed 12 years ago by
Status: | needs_review → needs_work |
---|
comment:3 follow-ups: 9 10 Changed 12 years ago by
There is another problem with your function:
TypeError: Your word must be defined on a binary alphabet sage: sage: sage: Word('ababab', alphabet='ab').is_sturmian_factor() True sage: Word('ababab').is_sturmian_factor() --------------------------------------------------------------------------- TypeError Traceback (most recent call last) /Users/alexandre/Applications/sage/devel/sage-combinat/sage/combinat/words/<ipython console> in <module>() /Users/alexandre/Applications/sage/local/lib/python2.6/site-packages/sage/combinat/words/finite_word.pyc in is_sturmian_factor(self) 4197 - Thierry Monteil 4198 """ -> 4199 return self.sturmian_desubstitute_as_possible().is_empty() 4200 4201 /Users/alexandre/Applications/sage/local/lib/python2.6/site-packages/sage/combinat/words/finite_word.pyc in sturmian_desubstitute_as_possible(self) 4086 W = self.parent() 4087 if (W.size_of_alphabet() != 2): -> 4088 raise TypeError, 'Your word must be defined on a binary alphabet' 4089 alphabet = W.alphabet() 4090 if self.is_empty(): TypeError: Your word must be defined on a binary alphabet
I think this shouldn't be the case. In fact, this is interesting, because it raises another problem I had when working on another patch. What I suggest is that you could first verify if the parent has size 2 and, if it's not the case, you check if its *real* alphabet has size 2 by using the expression len(set(self)) == 2
I think the clean solution would be to replace the deprecated function alphabet()
by a new one that would compute the *real alphabet* in question, so that there would be two kinds of alphabet:
- The one corresponding to the parent of the word (for instance
aaab
might be a word defined on the alphabet{a,b,c}
). - The one corresponding to the alphabet realized by the word
Alph(aaab) = {a,b}
The first one is called by w.parent().alphabet()
while the second one would be obtained from w.alphabet()
. But this is for another ticket.
What I suggest is that you simply add the len(set(w)) == 2
condition to your code and we'll clean it eventually.
comment:4 Changed 12 years ago by
I might be wrong, but it seems that the code for the desubstitute_as_possible
is long and should be cut into smaller functions that maybe already exist in Sage. Could you provide me the pseudocode of the function?
comment:5 Changed 12 years ago by
Replying to abmasse:
- Could you add a reference where one can find the proof that a word is finite sturmian if and only if you can desubstitute it to the empy word?
Sure. This is a well known result (though there are many variants of this idea, i do not know whether this exact one is written somewhere), but i will add at least in the Arnoux chapter of the Pytheas Fogg book. I guess i will also add a reference to a recent article of Smillie and Ulcigrai which explains this in a nice way.
comment:6 Changed 12 years ago by
Replying to abmasse:
- Not very important, but there are two comment
# print <something>
that should be removed from the code in the functionis_sturmian_factor
.
ok.
comment:7 follow-up: 8 Changed 12 years ago by
Replying to abmasse:
- As I told you when you were in Montreal, I think I prefer the name
is_finite_sturmian
(or justis_sturmian
) over the nameis_sturmian_factor
. I feel that the last one implies an argument like inw.is_sturmian_factor(u)
and it seems to be used in many articles (just googling it, you find a big list).
Actually, those words are the balanced one, but the method is_balanced
already exists with a slower (quadratic complexity) implementation (that implements the definition of the balanced word). The first name of my method was is_factor_of_a_sturmian_word
, which describes precisely the method but is too long.
I do not really like is_finite_sturmian
, and i am completely opposed to the name is_sturmian
because the word Sturmian is reserved for infinite words (and this is not the role of sage to change historical notations). Also is_finite_sturmian
will not be well positioned in the automatic completion.
Another possibility could be to add a parameter algorithm
with values default
, desubstitution
or definition
and merge this method into the existing slow method is_balanced
(with the desubstitution
algorithm only applying for 1-balanced words).
comment:8 Changed 12 years ago by
Replying to tmonteil:
Another possibility could be to add a parameter
algorithm
with valuesdefault
,desubstitution
ordefinition
and merge this method into the existing slow methodis_balanced
(with thedesubstitution
algorithm only applying for 1-balanced words).
This is a very good idea. I think you should do it.
comment:9 Changed 12 years ago by
Replying to abmasse:
There is another problem with your function:
[cut] TypeError: Your word must be defined on a binary alphabet
I think this shouldn't be the case. In fact, this is interesting, because it raises another problem I had when working on another patch. What I suggest is that you could first verify if the parent has size 2 and, if it's not the case, you check if its *real* alphabet has size 2 by using the expression len(set(self)) == 2
I think the clean solution would be to replace the deprecated function
alphabet()
by a new one that would compute the *real alphabet* in question, so that there would be two kinds of alphabet:
- The one corresponding to the parent of the word (for instance
aaab
might be a word defined on the alphabet{a,b,c}
).- The one corresponding to the alphabet realized by the word
Alph(aaab) = {a,b}
The first one is called by
w.parent().alphabet()
while the second one would be obtained fromw.alphabet()
. But this is for another ticket.What I suggest is that you simply add the
len(set(w)) == 2
condition to your code and we'll clean it eventually.
Since it is explained in the documentation, this is not a bug but a feature ;)
I was thinking about this alphabet problem, and i am willing to write a ticket about that issue. I think that such a decision should be made for the whole word directory, with its whole community. Here are some bits.
If i start to do such tests in such a method, then almost every method of the FiniteWords? class will have to do it as well, and the faster parent().alphabet() method will become useless. So there is a problem of code replication.
Another problem is that a test like len(set(w)) == 2
takes probably a linear time whereas size_of_alphabet
is just looking into the parent, which is constant time.
Some programmer (user) should know where his/her words are living and should be warned if not. Making "friendly code" does not mean to pass over dirty programmer's inconsistencies, because then it won't detect the clean programmer's mistakes anymore. Fore those lazy ways of programming, we can imagine a global option which tells how the methods should care about the alphabet, so that both kinds of behavior are possible.
As for me, this may look Bourbachist (in particular not bash
-ist ;), but i would like the empty word defined on a 2-letter alphabet to be recognized as a factor of a Sturmian word and not necessarily the empty word defined on a three letter alphabet (or at least to be warned if i do so). This may be, a contrario, an argument to make a difference between is_sturmian_factor
(or is_finite_sturmian
) and is_balanced(1)
.
If in some very few occasion, the user will indeed need to use such a method on some words defined on some non-two-letter alphabet, but then there should be some general coerce methods like redefine_alphabet
or minimize_alphabet_to_actually_used_letters
that s-he can use in his/her code.
Another example of why the alphabet question deals with the whole word
directory, and why there should be an overall policy is the following:
sage: words.LowerMechanicalWord(1/sqrt(2)).parent() Words sage: words.CharacteristicSturmianWord(1/sqrt(2)).parent() Words over Ordered Alphabet [0, 1]
comment:10 Changed 12 years ago by
The first one is called by
w.parent().alphabet()
while the second one would be obtained fromw.alphabet()
. But this is for another ticket.What I suggest is that you simply add the
len(set(w)) == 2
condition to your code and we'll clean it eventually.
I think this is a very good suggestion.
Sébastien
comment:11 Changed 12 years ago by
Hi, Thierry !
Sorry if I haven't been available lately (busy, you know, the usual justification...).
If I'm not mistaken, there were last minor issues with your ticket that haven't been completely solved. To recap:
- Cleaning some commented code.
- Adding the
len(set(w)) == 2
condition. I understand your point of view that this should be done by many other functions in Sage, but for now, it will be cleaner if we do so. - Finally, I agree with you about keeping the name
is_sturmian_factor
. - Providing a reference for the theorem that desubstituting as possible to the empty word is equivalent to being Sturmian.
This is pretty much it!
Changed 12 years ago by
Attachment: | trac_9877_words_sturmian_desubstitution_attempt_2-tm.patch added |
---|
Tested on Sage 4.6
comment:13 follow-up: 16 Changed 12 years ago by
Authors: | → Thierry Monteil |
---|---|
Description: | modified (diff) |
Status: | needs_work → needs_review |
Hi Alexandre,
i started to work back on this ticket. This new patch is supposed to solve your four points.
Since the is_tangent()
method can now be applied to a word with more than 2 letters, i added a protection in the FiniteWordPath_all
class into the file paths.py
so that is_tangent()
could not be applied for such word paths. Indeed, there is an extended definition for them, which is not yet written (see the paper).
Also, concerning a previous comment on using existing sage code, i looked for some "run" method and the only similar code i found was the _delta_iterator()
method in the Word_class
class. I tried to use it, but it seems that the use of the groupby
itertool followed by len
makes a slower code:
from itertools import groupby def runs_groupby(self): for letter, run in groupby(self): yield letter, len(list(run)) def runs_homemade(self): try: previous_letter = self[0] except IndexError: return current_run_length = 0 for i in self: if i == previous_letter: current_run_length += 1 else: yield previous_letter , current_run_length current_run_length = 1 previous_letter = i yield previous_letter , current_run_length timeit('for i in runs_groupby(words.FibonacciWord()[:1000]):\n print i') timeit('for i in runs_homemade(words.FibonacciWord()[:1000]):\n print i')
The first takes 28.6 ms per loop whereas the second takes 18.2 ms per loop. The problem is that the first method, though more low-level, is like passing twice on the word, which seems to be slower than the second (high-level) one-pass method.
Ciao, Thierry
comment:14 Changed 12 years ago by
Please fix the 5 test errors. Four of them are the same :
AttributeError?: 'WordGenerator?' object has no attribute 'KolakoskiWord?'
See the report of the patchbot.
comment:15 Changed 12 years ago by
Description: | modified (diff) |
---|
comment:16 Changed 12 years ago by
Status: | needs_review → needs_info |
---|
Salut Thierry,
1.
Since the
is_tangent()
method can now be applied to a word with more than 2 letters, i added a protection in theFiniteWordPath_all
class into the filepaths.py
so thatis_tangent()
could not be applied for such word paths. Indeed, there is an extended definition for them, which is not yet written (see the paper).
Why? Can you explain more? I don't think the protection is needed.
The patch is having the following lines::
w_isolated = W(l_isolated,datatype='letter') #the word associated to the isolated letter w_running = W(l_running,datatype='letter') #the word associated to the running letter
I remember I suggested you to use datatype = 'letter'. But I realized today that this was supported only for the old sage words objects saved in the pickle jar. So, by introducing it now in this patch, we would have to keep it. So we need to make a decision if we keep it or not. If we don't keep it, you can do the following instead:
w_isolated = W([l_isolated]) #the word associated to the isolated letter w_running = W([l_running]) #the word associated to the running letter
- I think we should redefine the function alphabet for finite words and return the letter occuring in the word and caching the result (I can post a patch for this). And so, the function
is_tangent
could look if this method return a two letter alphabet when the parent don't.
comment:17 Changed 12 years ago by
I get the following error while building the documentation :
/Users/slabbe/Applications/sage-4.6.2/local/lib/python2.6/site-packages/sage/combinat/words/finite_word. py:docstring of sage.combinat.words.finite_word.FiniteWord_class.is_sturmian_factor:47: (ERROR/3) Unexpected indentation.
comment:18 Changed 12 years ago by
Trying to help the build bot
Apply trac_9877_words_sturmian_desubstitution_attempt_2-tm.patch
comment:19 Changed 11 years ago by
Hello,
Some comments to motivate the author of the patch.
1) The problem with datatype=letter may be fixed by the following. At the begining of the function, create a dictionnary "word_from_letter" with keys the two letters and values as words of length 1. It is the best way to do as a call to the parent for the creation of an object is always time consuming. With that done, all tests pass on sage-5.0.beta7.
2) It would be nice to put a "reverse engineering" example (random or not) as follows:
sage: s1 = WordMorphism('a->ab,b->b') sage: s2 = WordMorphism('a->ba,b->b') sage: s3 = WordMorphism('a->a,b->ba') sage: s4 = WordMorphism('a->a,b->ab') sage: W=Words('ab') sage: w=W('ab') sage: for i in xrange(8): w = choice([s1,s2,s3,s4])(w) sage: w word: aabaaabaabaaabaabaabaaabaabaabaaabaabaaa... sage: w.is_sturmian_factor() True
3) There is still the doctest problem mentionned in comment: 17
It would be great to have this patch integrated in sage-5.0!
comment:20 Changed 10 years ago by
ping
It would be great to have this patch integrated in sage-5.7!
comment:21 follow-up: 22 Changed 10 years ago by
Status: | needs_info → needs_review |
---|
I have made the necessary changes for 1) and 3).
Should I incorporate 2) ?
comment:22 Changed 10 years ago by
Replying to chapoton:
I have made the necessary changes for 1) and 3).
Should I incorporate 2) ?
I would. You do not like the example ?
comment:23 Changed 10 years ago by
Here it is. I had to change the result to
word: abaaabaaabaabaaabaaabaabaaabaabaaabaaaba...
for the doctest to pass.
Please review.
comment:24 Changed 10 years ago by
Some last docstring comments:
- for expressions that refer to Python variables or reserved names (in particular self) you may use double quotes (at some places it is ok and at some other it is not)
- line 5179: "desubstitution consist" -> "desubstitution consists"
- lines 5190, 5340, 5419: self is not a part of the input
- is_tangent doc: the word tangent, where it is defined, must be bold (put the word between two *). The definition is not very english, I suggest: A binary word is said to be *tangent* if it can appear in infintely many cutting sequences of a smooth curve, where each cutting sequence is observed on a progressively smaller grid.
Best, Vincent
comment:27 Changed 10 years ago by
Authors: | Thierry Monteil → Thierry Monteil, Frédéric Chapoton |
---|---|
Keywords: | sturmian word tangent word added |
Reviewers: | → Vincent Delecroix |
Status: | needs_review → positive_review |
Yes it is. Thanks Frédéric for the finalization !
comment:28 Changed 10 years ago by
Reviewers: | Vincent Delecroix → Alexandre Blondin Massé, Sébastien Labbé, Vincent Delecroix |
---|
comment:29 Changed 10 years ago by
Milestone: | sage-5.8 → sage-5.9 |
---|
comment:30 Changed 10 years ago by
Status: | positive_review → needs_info |
---|
Please clarify which patch(es) must be applied.
comment:31 Changed 10 years ago by
Description: | modified (diff) |
---|---|
Status: | needs_info → needs_review |
comment:32 Changed 10 years ago by
Status: | needs_review → positive_review |
---|
comment:33 Changed 10 years ago by
Status: | positive_review → needs_work |
---|
There is a problem building the documentation:
[combinat ] /release/merger/sage-5.9.beta0/local/lib/python2.7/site-packages/sage/combinat/words/finite_word.py:docstring of sage.combinat.words.finite_word:47: WARNING: Enumerated list ends without a blank line; unexpected unindent.
Changed 10 years ago by
Attachment: | trac_9777-sturm-review-fc.patch added |
---|
comment:34 Changed 10 years ago by
Status: | needs_work → positive_review |
---|
I have corrected the doc, back to positive review
comment:35 Changed 10 years ago by
Merged in: | → sage-5.9.beta1 |
---|---|
Resolution: | → fixed |
Status: | positive_review → closed |
I'm testing your patch in the next two hours. Before testing it explicitly, just some comments:
is_finite_sturmian
(or justis_sturmian
) over the nameis_sturmian_factor
. I feel that the last one implies an argument like inw.is_sturmian_factor(u)
and it seems to be used in many articles (just googling it, you find a big list).# print <something>
that should be removed from the code in the functionis_sturmian_factor
.I'm waiting for sage-combinat to install and I'll test and look more thoroughly at your code.