Opened 12 years ago

Closed 12 years ago

# Improving iterated palindromic closure computation

Reported by: Owned by: abmasse abmasse major sage-4.3.4 combinatorics iterated palindromic closure slabbe sage-4.3.4.alpha0 Alexandre Blondin Massé Sébastien Labbé N/A

There is a faster way to compute the iterated palindromic closure of a word than using the definition. The problem with the latter is that it needs to compute the longest f-palindromic suffix of the current word at each step, while it is possible to easily deduce this length only by looking at the directive word.
Here is an example:

```sage: w = words.RandomWord(10)
sage: w.iterated_right_palindromic_closure(algorithm='definition')
word: 0010010001001000100100010010001001001000...
sage: timeit("print w.iterated_right_palindromic_closure(algorithm='definition').length()")
5 loops, best of 3: 211 ms per loop
sage: timeit("print w.iterated_right_palindromic_closure(algorithm='recursive').length()")
25 loops, best of 3: 9.46 ms per loop
```

The difference is particularly notable for longer words:

```sage: w = words.RandomWord(15)
sage: timeit("print w.iterated_right_palindromic_closure(algorithm='definition').length()")
5 loops, best of 3: 3.73 s per loop
sage: timeit("print w.iterated_right_palindromic_closure(algorithm='recursive').length()")
25 loops, best of 3: 37.4 ms per loop
```

### comment:1 Changed 12 years ago by abmasse

• Description modified (diff)
• Status changed from new to needs_review

I had to implement two other functions: find() and rfind() that were only available for Word_str words. It is not the more efficient implementation yet, but that was not the goal of this ticket...

### comment:2 Changed 12 years ago by abmasse

• Owner changed from AlexGhitza to abmasse

### comment:3 Changed 12 years ago by abmasse

• Component changed from algebra to combinatorics

### comment:4 Changed 12 years ago by slabbe

• Status changed from needs_review to needs_work
1. I think the patch should follow the python behavior (i.e. return -1)
```sage: '0123456789'.find('4566')
-1
sage: '0123456789'.rfind('4566')
-1
```

for those functions :

```sage: Word(range(10)).rfind(Word([4,5,6,6]))
sage: Word(range(10)).find(Word([4,5,6,6]))
```
1. The following comment found in the doc of `find` and `rfind`
```This function is different for Word_str objects:
```

illustrates a problem : the behavior for `Word_str` should be the same (`Word_str` is wrong). Can you fix it? You may consult #8127 for an idea of how to handle it. Make sure to ask the parent using super if type of other is not an str or a `Word_str`.

1. An enumeration in the doc of the iterator function is broken as seen in the result of :
```sage: w = Word(range(10))
sage: browse_sage_doc(w._iterated_right_palindromic_closure_recursive_iterator)
```

Adding a blank line before the itemize should repair the problem. I also suggest to put `WordMorphism`, `'recursive'` and `_iterative_righ...iterator()` inside double backquotes (like input arguments).

1. Looking the function below but also how naive is the code of `find`, maybe the function `find` could use the function `first_pos_in` instead? This makes me realize that the function `first_pos_in` was probably a bad choice of name.... Using the new deprecation warning introduced recently by Florent Hivert this (name modif) can be done more easily now (but not in this ticket).
```sage: %timeit Word([990,991,992,993]).first_pos_in(Word(range(1000)))
125 loops, best of 3: 1.65 ms per loop
sage: %timeit Word(range(1000)).find(Word([990,991,992,993]))
5 loops, best of 3: 48 ms per loop
```
1. Could `rfind` could be improved easily using `_pos_in` and other good suffix table already implemented? If so, it can be good to do it now. But if you don't care now, it is fine. The function could be improved later if it is valuable. Anyway, the next step for all those search stuff is to be cythoned...

### comment:5 follow-up: ↓ 7 Changed 12 years ago by abmasse

Thank you Sébastien for all those comments ! I agree with you on items 4 and 5, but I think these issues should be addressed in another ticket. I intend to do it in a close future, but this is in some way independent of the iterated palindromic closures functions. As for item 1, I agree with you as well, but I think it would be better if find() and rfind() functions could allow the user to choose between different string search algorithms (Boyer-Moore, KMP, etc.), so that it is maybe not necessary to make them look like the Python functions (what do you think?). If I understand you well in item 2, you would like me to change the find() and rfind() functions for Word_str objects or only to detect it in the algorithm computing the iterated palindromic closure? Finally, I will correct item 3 and the other problems as soon as you answer me.

### comment:6 Changed 12 years ago by abmasse

• Description modified (diff)

### comment:7 in reply to: ↑ 5 Changed 12 years ago by slabbe

Thank you Sébastien for all those comments ! I agree with you on items 4 and 5, but I think these issues should be addressed in another ticket.

Well, ok for 5 : `rfind` could be improved later. But for 4, I would like your new find function to make use of `first_pos_in` since it is already there and is faster. If you want to keep your implementation there, I suggest you use a parameter `algorithm` that defaults to `suffix_table` or a similar word and that make use of `first_pos_in`.

As for item 1, I agree with you as well, but I think it would be better if find() and rfind() functions could allow the user to choose between different string search algorithms (Boyer-Moore, KMP, etc.), so that it is maybe not necessary to make them look like the Python functions (what do you think?).

I think both are possible (Je ne crois pas que l'un empêche l'autre) : one may selec the algorithm and the function can still behave like python. For example,

```sage: w = Word(range(10))
sage: u = Word(range(5, 8))
sage: w.find(u)
5
sage: w.find(u, algorithm='KMP')
5
sage: w.find(u*u)
-1
```

If I understand you well in item 2, you would like me to change the find() and rfind() functions for Word_str objects or only to detect it in the algorithm computing the iterated palindromic closure? Finally, I will correct item 3 and the other problems as soon as you answer me.

Ok, so let's open a new ticket to clean up find and rfind.

### comment:8 Changed 12 years ago by abmasse

I have corrected items 1 to 4.

As discussed, item 5 will be done in another ticket or directly in Cython.

I'll upload the new patch in a few minutes.

### Changed 12 years ago by abmasse

Updated patch with corrections

### comment:9 Changed 12 years ago by abmasse

• Status changed from needs_work to needs_review

### Changed 12 years ago by slabbe

Applies over the precedent patch

### comment:10 Changed 12 years ago by slabbe

• Authors changed from abmasse to Alexandre Blondin-Massé
• Reviewers set to Sébastien Labbé

I just tested the patch. All test passed in sage/combinat/words. The speed of the function is greatly improved. Doc builds fine. I am giving a positive review to this ticket.

Althought, I added a patch fixing some small sphinx editing and replace `l` for `L` because I was reading `1` at first glance. Alexandre, if you agree with my patch, I allow you to change the status of this ticket to positive review.

### comment:11 Changed 12 years ago by abmasse

• Authors changed from Alexandre Blondin-Massé to Alexandre Blondin Massé
• Status changed from needs_review to positive_review

I agree with the changes. I tested the two patches and everything seems alright. All tests passed, no problem in the documentation. Positive review to Sébastien's changes.

### comment:12 Changed 12 years ago by mvngu

• Merged in set to sage-4.3.4.alpha0
• Resolution set to fixed
• Status changed from positive_review to closed

Merged in this order:

Note: See TracTickets for help on using tickets.