Opened 5 years ago

Closed 5 years ago

Last modified 5 years ago

#14969 closed enhancement (fixed)

Longest common subword

Reported by: ncohen Owned by:
Priority: major Milestone: sage-5.12
Component: combinatorics Keywords:
Cc: slabbe, vdelecroix Merged in: sage-5.12.beta4
Authors: Nathann Cohen Reviewers: Hugh Thomas
Report Upstream: N/A Work issues:
Branch: Commit:
Dependencies: Stopgaps:

Description

Looks like this was missing. It's a pity that there is no .pyx file for words, though :-)

Nathann

Attachments (2)

trac_14969-review.patch (2.6 KB) - added by hthomas 5 years ago.
trac_14969.patch (3.0 KB) - added by ncohen 5 years ago.

Download all attachments as: .zip

Change History (12)

comment:1 Changed 5 years ago by ncohen

  • Status changed from new to needs_review

Changed 5 years ago by hthomas

comment:2 follow-up: Changed 5 years ago by hthomas

  • Reviewers set to Hugh Thomas
  • Status changed from needs_review to needs_work

Hi Nathann!

Review patch uploaded. I don't think it will be controversial. I removed one line of code which did nothing useful. (Please confirm.)

The commit message on the original patch should be, um, more descriptive. Other than that, I am ready to give it a positive review if you approve my changes.

What a nice algorithm!

The same approach could be used to find all that longest common subwords. Do you think that would be useful? To me it seems at least as natural.

cheers,

Hugh

Changed 5 years ago by ncohen

comment:3 in reply to: ↑ 2 ; follow-up: Changed 5 years ago by ncohen

  • Status changed from needs_work to needs_review

Helloooooooooooo !!

Review patch uploaded. I don't think it will be controversial.

+1

I removed one line of code which did nothing useful. (Please confirm.)

+1

The commit message on the original patch should be, um, more descriptive.

Right. Fixed.

Other than that, I am ready to give it a positive review if you approve my changes.

+1

What a nice algorithm!

The same approach could be used to find all that longest common subwords. Do you think that would be useful? To me it seems at least as natural.

Hmmmm. Well, the same algorithm with the same complexity can return the number of longest common subwords too. In order to return all longest subwords, though, one would have to keep track of all l[i,j], and not just l[i,j] and l[i-1,j].

Nathann

comment:4 in reply to: ↑ 3 ; follow-up: Changed 5 years ago by hthomas

Replying to ncohen:

Hmmmm. Well, the same algorithm with the same complexity can return the number of longest common subwords too. In order to return all longest subwords, though, one would have to keep track of all l[i,j], and not just l[i,j] and l[i-1,j].

What I was thinking of was that l[i,j] would store a list of the longest subwords of self[:i],other[:j]. Then at each step, you merge the three lists l[i,j-1], l[i-1,j], and l[i-1,j-1] with self[i] tacked onto the end of each one if self[i]==other[j], and remove the items that aren't as long as the maximum.

This wouldn't have the same complexity as the algorithm you implemented, but that seems somehow not unreasonable, since you're asking for more output. Is this inefficient?

comment:5 Changed 5 years ago by hthomas

  • Status changed from needs_review to positive_review

comment:6 in reply to: ↑ 4 ; follow-up: Changed 5 years ago by ncohen

Hellooooooooo !!

What I was thinking of was that l[i,j] would store a list of the longest subwords of self[:i],other[:j]. Then at each step, you merge the three lists l[i,j-1], l[i-1,j], and l[i-1,j-1] with self[i] tacked onto the end of each one if self[i]==other[j], and remove the items that aren't as long as the maximum.

This wouldn't have the same complexity as the algorithm you implemented, but that seems somehow not unreasonable, since you're asking for more output. Is this inefficient?

Well it's fine. It's just that it would be slightly better to do the computations twice : at first you only compute (and remember) all values of l[i,j] (i.e. just the size of the longest subword), then in a second pass you can actually build the list of longest subwords, from l[i,j] and recursively to the smaller values of l, only when needed, i.e. when it participates to a word of maximum length.

This way you make sure that you are not building and maintaining very long lists of words which you would throw away later :-)

Nathann

comment:7 Changed 5 years ago by ncohen

Oh, and thank for your the review !!!

Nathann

comment:8 in reply to: ↑ 6 Changed 5 years ago by hthomas

Replying to ncohen:

This way you make sure that you are not building and maintaining very long lists of words which you would throw away later :-)

Oh, I see! Thanks very much for the explanation.

And as regards the review, you are very welcome -- it's fun to review efficient code!

Hugh

comment:9 Changed 5 years ago by jdemeyer

  • Merged in set to sage-5.12.beta4
  • Resolution set to fixed
  • Status changed from positive_review to closed

comment:10 in reply to: ↑ description Changed 5 years ago by slabbe

Replying to ncohen:

Looks like this was missing. It's a pity that there is no .pyx file for words, though :-)

There is one :

sage/combinat/words/word_datatypes.pyx

Code that goes there will apply only for words on specific data type (like list, tuple, str). If a method exists in the cython file and in the finite_word.py file, the cython method is called instead.

I would be curious to know what is the gain... tell me if your test it.

Note: See TracTickets for help on using tickets.