Opened 8 years ago

Closed 7 years ago

#8227 closed enhancement (fixed)

Improving iterated palindromic closure computation

Reported by: abmasse Owned by: abmasse
Priority: major Milestone: sage-4.3.4
Component: combinatorics Keywords: iterated palindromic closure
Cc: slabbe Merged in: sage-4.3.4.alpha0
Authors: Alexandre Blondin Massé Reviewers: Sébastien Labbé
Report Upstream: N/A Work issues:
Branch: Commit:
Dependencies: Stopgaps:

Description (last modified by abmasse)

There is a faster way to compute the iterated palindromic closure of a word than using the definition. The problem with the latter is that it needs to compute the longest f-palindromic suffix of the current word at each step, while it is possible to easily deduce this length only by looking at the directive word.
Here is an example:

sage: w = words.RandomWord(10)
sage: w.iterated_right_palindromic_closure(algorithm='definition')
word: 0010010001001000100100010010001001001000...
sage: timeit("print w.iterated_right_palindromic_closure(algorithm='definition').length()")
5 loops, best of 3: 211 ms per loop
sage: timeit("print w.iterated_right_palindromic_closure(algorithm='recursive').length()")
25 loops, best of 3: 9.46 ms per loop

The difference is particularly notable for longer words:

sage: w = words.RandomWord(15)
sage: timeit("print w.iterated_right_palindromic_closure(algorithm='definition').length()")
5 loops, best of 3: 3.73 s per loop
sage: timeit("print w.iterated_right_palindromic_closure(algorithm='recursive').length()")
25 loops, best of 3: 37.4 ms per loop

Attachments (2)

trac_8227_iterated_palindromic_closure_improvement-abm.patch (14.9 KB) - added by abmasse 7 years ago.
Updated patch with corrections
trac_8227_review-sl.patch (3.2 KB) - added by slabbe 7 years ago.
Applies over the precedent patch

Download all attachments as: .zip

Change History (14)

comment:1 Changed 8 years ago by abmasse

  • Description modified (diff)
  • Status changed from new to needs_review

I had to implement two other functions: find() and rfind() that were only available for Word_str words. It is not the more efficient implementation yet, but that was not the goal of this ticket...

comment:2 Changed 8 years ago by abmasse

  • Owner changed from AlexGhitza to abmasse

comment:3 Changed 8 years ago by abmasse

  • Component changed from algebra to combinatorics

comment:4 Changed 8 years ago by slabbe

  • Status changed from needs_review to needs_work
  1. I think the patch should follow the python behavior (i.e. return -1)
sage: '0123456789'.find('4566')
-1
sage: '0123456789'.rfind('4566')
-1

for those functions :

sage: Word(range(10)).rfind(Word([4,5,6,6]))
sage: Word(range(10)).find(Word([4,5,6,6]))
  1. The following comment found in the doc of find and rfind
This function is different for Word_str objects:

illustrates a problem : the behavior for Word_str should be the same (Word_str is wrong). Can you fix it? You may consult #8127 for an idea of how to handle it. Make sure to ask the parent using super if type of other is not an str or a Word_str.

  1. An enumeration in the doc of the iterator function is broken as seen in the result of :
sage: w = Word(range(10))
sage: browse_sage_doc(w._iterated_right_palindromic_closure_recursive_iterator)

Adding a blank line before the itemize should repair the problem. I also suggest to put WordMorphism, 'recursive' and _iterative_righ...iterator() inside double backquotes (like input arguments).

  1. Looking the function below but also how naive is the code of find, maybe the function find could use the function first_pos_in instead? This makes me realize that the function first_pos_in was probably a bad choice of name.... Using the new deprecation warning introduced recently by Florent Hivert this (name modif) can be done more easily now (but not in this ticket).
sage: %timeit Word([990,991,992,993]).first_pos_in(Word(range(1000)))
125 loops, best of 3: 1.65 ms per loop
sage: %timeit Word(range(1000)).find(Word([990,991,992,993]))
5 loops, best of 3: 48 ms per loop
  1. Could rfind could be improved easily using _pos_in and other good suffix table already implemented? If so, it can be good to do it now. But if you don't care now, it is fine. The function could be improved later if it is valuable. Anyway, the next step for all those search stuff is to be cythoned...

That's all for my comments.

comment:5 follow-up: Changed 8 years ago by abmasse

Thank you Sébastien for all those comments ! I agree with you on items 4 and 5, but I think these issues should be addressed in another ticket. I intend to do it in a close future, but this is in some way independent of the iterated palindromic closures functions. As for item 1, I agree with you as well, but I think it would be better if find() and rfind() functions could allow the user to choose between different string search algorithms (Boyer-Moore, KMP, etc.), so that it is maybe not necessary to make them look like the Python functions (what do you think?). If I understand you well in item 2, you would like me to change the find() and rfind() functions for Word_str objects or only to detect it in the algorithm computing the iterated palindromic closure? Finally, I will correct item 3 and the other problems as soon as you answer me.

comment:6 Changed 8 years ago by abmasse

  • Description modified (diff)

comment:7 in reply to: ↑ 5 Changed 8 years ago by slabbe

Replying to abmasse:

Thank you Sébastien for all those comments ! I agree with you on items 4 and 5, but I think these issues should be addressed in another ticket.

Well, ok for 5 : rfind could be improved later. But for 4, I would like your new find function to make use of first_pos_in since it is already there and is faster. If you want to keep your implementation there, I suggest you use a parameter algorithm that defaults to suffix_table or a similar word and that make use of first_pos_in.

As for item 1, I agree with you as well, but I think it would be better if find() and rfind() functions could allow the user to choose between different string search algorithms (Boyer-Moore, KMP, etc.), so that it is maybe not necessary to make them look like the Python functions (what do you think?).

I think both are possible (Je ne crois pas que l'un empêche l'autre) : one may selec the algorithm and the function can still behave like python. For example,

sage: w = Word(range(10))
sage: u = Word(range(5, 8))
sage: w.find(u)
5
sage: w.find(u, algorithm='KMP')
5
sage: w.find(u*u)
-1

If I understand you well in item 2, you would like me to change the find() and rfind() functions for Word_str objects or only to detect it in the algorithm computing the iterated palindromic closure? Finally, I will correct item 3 and the other problems as soon as you answer me.

Ok, so let's open a new ticket to clean up find and rfind.

comment:8 Changed 7 years ago by abmasse

I have corrected items 1 to 4.

As discussed, item 5 will be done in another ticket or directly in Cython.

I'll upload the new patch in a few minutes.

Changed 7 years ago by abmasse

Updated patch with corrections

comment:9 Changed 7 years ago by abmasse

  • Status changed from needs_work to needs_review

Changed 7 years ago by slabbe

Applies over the precedent patch

comment:10 Changed 7 years ago by slabbe

  • Authors changed from abmasse to Alexandre Blondin-Massé
  • Reviewers set to Sébastien Labbé

I just tested the patch. All test passed in sage/combinat/words. The speed of the function is greatly improved. Doc builds fine. I am giving a positive review to this ticket.

Althought, I added a patch fixing some small sphinx editing and replace l for L because I was reading 1 at first glance. Alexandre, if you agree with my patch, I allow you to change the status of this ticket to positive review.

comment:11 Changed 7 years ago by abmasse

  • Authors changed from Alexandre Blondin-Massé to Alexandre Blondin Massé
  • Status changed from needs_review to positive_review

I agree with the changes. I tested the two patches and everything seems alright. All tests passed, no problem in the documentation. Positive review to Sébastien's changes.

comment:12 Changed 7 years ago by mvngu

  • Merged in set to sage-4.3.4.alpha0
  • Resolution set to fixed
  • Status changed from positive_review to closed
Note: See TracTickets for help on using tickets.