#14969 closed enhancement (fixed)
Longest common subword
Reported by: | ncohen | Owned by: | |
---|---|---|---|
Priority: | major | Milestone: | sage-5.12 |
Component: | combinatorics | Keywords: | |
Cc: | slabbe, vdelecroix | Merged in: | sage-5.12.beta4 |
Authors: | Nathann Cohen | Reviewers: | Hugh Thomas |
Report Upstream: | N/A | Work issues: | |
Branch: | Commit: | ||
Dependencies: | Stopgaps: |
Description
Looks like this was missing. It's a pity that there is no .pyx file for words, though :-)
Nathann
Attachments (2)
Change History (12)
comment:1 Changed 8 years ago by
- Status changed from new to needs_review
Changed 7 years ago by
comment:2 follow-up: ↓ 3 Changed 7 years ago by
- Reviewers set to Hugh Thomas
- Status changed from needs_review to needs_work
Hi Nathann!
Review patch uploaded. I don't think it will be controversial. I removed one line of code which did nothing useful. (Please confirm.)
The commit message on the original patch should be, um, more descriptive. Other than that, I am ready to give it a positive review if you approve my changes.
What a nice algorithm!
The same approach could be used to find all the longest common subwords. Do you think that would be useful? To me it seems at least as natural.
cheers,
Hugh
Changed 7 years ago by
comment:3 in reply to: ↑ 2 ; follow-up: ↓ 4 Changed 7 years ago by
- Status changed from needs_work to needs_review
Helloooooooooooo !!
> Review patch uploaded. I don't think it will be controversial.
+1
> I removed one line of code which did nothing useful. (Please confirm.)
+1
> The commit message on the original patch should be, um, more descriptive.
Right. Fixed.
> Other than that, I am ready to give it a positive review if you approve my changes.
+1
> What a nice algorithm!
> The same approach could be used to find all the longest common subwords. Do you think that would be useful? To me it seems at least as natural.
Hmmmm. Well, the same algorithm with the same complexity can return the number of longest common subwords too. In order to return all longest subwords, though, one would have to keep track of all l[i,j], and not just l[i,j] and l[i-1,j].
Nathann
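(Editorial note: a minimal Python sketch of the length-only dynamic programming discussed above. It is not the ticket's patch, and the helper name lcs_length is made up; it only illustrates why keeping the current and previous rows of l is enough when just the length is wanted.)

```python
# Length of a longest common subword of u and v, keeping only two rows of l.
# Time O(len(u) * len(v)); extra memory O(len(v)) (swap the arguments so that
# v is the shorter word if memory matters).
def lcs_length(u, v):
    prev = [0] * (len(v) + 1)          # row l[i-1, .]
    for i in range(1, len(u) + 1):
        curr = [0] * (len(v) + 1)      # row l[i, .]
        for j in range(1, len(v) + 1):
            if u[i - 1] == v[j - 1]:
                curr[j] = prev[j - 1] + 1
            else:
                curr[j] = max(prev[j], curr[j - 1])
        prev = curr
    return prev[len(v)]

# lcs_length("abcd", "acbd")  # -> 3 (e.g. "abd" or "acd")
```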
comment:4 in reply to: ↑ 3 ; follow-up: ↓ 6 Changed 7 years ago by
Replying to ncohen:
> Hmmmm. Well, the same algorithm with the same complexity can return the number of longest common subwords too. In order to return all longest subwords, though, one would have to keep track of all l[i,j], and not just l[i,j] and l[i-1,j].
What I was thinking of was that l[i,j] would store a list of the longest subwords of self[:i],other[:j]. Then at each step, you merge the three lists l[i,j-1], l[i-1,j], and l[i-1,j-1] with self[i] tacked onto the end of each one if self[i]==other[j], and remove the items that aren't as long as the maximum.
This wouldn't have the same complexity as the algorithm you implemented, but that seems somehow not unreasonable, since you're asking for more output. Is this inefficient?
comment:5 Changed 7 years ago by
- Status changed from needs_review to positive_review
comment:6 in reply to: ↑ 4 ; follow-up: ↓ 8 Changed 7 years ago by
Hellooooooooo !!
> What I was thinking of was that l[i,j] would store a list of the longest subwords of self[:i], other[:j]. Then at each step, you merge the three lists l[i,j-1], l[i-1,j], and l[i-1,j-1] with self[i] tacked onto the end of each one if self[i]==other[j], and remove the items that aren't as long as the maximum.
> This wouldn't have the same complexity as the algorithm you implemented, but that seems somehow not unreasonable, since you're asking for more output. Is this inefficient?
Well, it's fine. It's just that it would be slightly better to do the computation in two passes: first compute (and remember) all values of l[i,j] (i.e. just the length of the longest common subword), then in a second pass actually build the list of longest subwords from l[i,j], recursing to smaller values of l only when needed, i.e. when a cell participates in a word of maximum length.
This way you make sure that you are not building and maintaining very long lists of words which you would throw away later :-)
Nathann
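(Editorial note: a sketch of the two-pass idea described in this comment, again not code from the ticket; the function name all_longest_common_subwords is hypothetical. The first pass fills the whole table of lengths, the second backtracks from l[n][m], only descending into cells that participate in a word of maximum length.)

```python
from functools import lru_cache

def all_longest_common_subwords(u, v):
    n, m = len(u), len(v)
    # Pass 1: full table of lengths (all rows are needed here, not just two).
    l = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            if u[i - 1] == v[j - 1]:
                l[i][j] = l[i - 1][j - 1] + 1
            else:
                l[i][j] = max(l[i - 1][j], l[i][j - 1])

    # Pass 2: backtrack; memoization avoids rebuilding the same suffix sets.
    @lru_cache(maxsize=None)
    def back(i, j):
        if i == 0 or j == 0:
            return {""}
        if u[i - 1] == v[j - 1]:
            return {w + u[i - 1] for w in back(i - 1, j - 1)}
        words = set()
        if l[i - 1][j] == l[i][j]:   # cell above participates in a maximum
            words |= back(i - 1, j)
        if l[i][j - 1] == l[i][j]:   # cell to the left participates in a maximum
            words |= back(i, j - 1)
        return words

    return back(n, m)

# all_longest_common_subwords("abcd", "acbd")  # -> {"abd", "acd"}
```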
comment:7 Changed 7 years ago by
Oh, and thank you for the review !!!
Nathann
comment:8 in reply to: ↑ 6 Changed 7 years ago by
Replying to ncohen:
> This way you make sure that you are not building and maintaining very long lists of words which you would throw away later :-)
Oh, I see! Thanks very much for the explanation.
And as regards the review, you are very welcome -- it's fun to review efficient code!
Hugh
comment:9 Changed 7 years ago by
- Merged in set to sage-5.12.beta4
- Resolution set to fixed
- Status changed from positive_review to closed
comment:10 in reply to: ↑ description Changed 7 years ago by
Replying to ncohen:
> Looks like this was missing. It's a pity that there is no .pyx file for words, though :-)
There is one:
sage/combinat/words/word_datatypes.pyx
Code that goes there applies only to words over a specific data type (like list, tuple, str). If a method exists both in that Cython file and in the finite_word.py file, the Cython method is called instead.
I would be curious to know what the gain is... tell me if you test it.
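(Editorial note: a hedged usage sketch, assuming the method added by this ticket is exposed on finite words as .longest_common_subword() in finite_word.py; the exact output format may differ.)

```
sage: u = Word("abcdef")
sage: v = Word("xabxcdx")
sage: u.longest_common_subword(v)   # a common subword of maximal length
word: abcd
```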