Opened 4 years ago

Closed 4 years ago

#19322 closed enhancement (fixed)

a much faster longest_common_prefix for words

Reported by: vdelecroix Owned by:
Priority: major Milestone: sage-6.9
Component: combinatorics Keywords:
Cc: slabbe Merged in:
Authors: Vincent Delecroix Reviewers: Nathann Cohen
Report Upstream: N/A Work issues:
Branch: ebbc28d (Commits) Commit: ebbc28d6aa6e890312fa23f87863ec4179bf6f4b
Dependencies: Stopgaps:

Description

I had to do some computations of the following kind... which are damn slow

sage: w = words.FibonacciWord()[:10000]
sage: %time L = [[len(w[n:].longest_common_prefix(w[n+fibonacci(i):])) for i in range(5,20)] for n in range(1,1000)]
CPU times: user 20.6 s, sys: 44 ms, total: 20.7 s
Wall time: 20.5 s

sage: w = Words([0,1])(list(words.FibonacciWord()[:10000]))
sage: %time L = [[len(w[n:].longest_common_prefix(w[n+fibonacci(i):])) for i in range(5,20)] for n in range(1,1000)]
CPU times: user 7.99 s, sys: 56 ms, total: 8.04 s
Wall time: 7.93 s

and with the branch

sage: w = Words([0,1])(list(words.FibonacciWord()[:10000]))
sage: %time L = [[len(w[n:].longest_common_prefix(w[n+fibonacci(i):])) for i in range(5,20)] for n in range(1,1000)]
CPU times: user 172 ms, sys: 0 ns, total: 172 ms
Wall time: 168 ms

We also implement longest_common_suffix and fix the following annoying feature of has_prefix

sage: W = Words([0,1,2])
sage: w = W([0,1,0,2])
sage: w.has_prefix(words.FibonacciWord())
False

Change History (7)

comment:1 Changed 4 years ago by vdelecroix

  • Branch set to u/vdelecroix/19322
  • Commit set to fe306f971ac9edde2cec32997c3cff7b4db1ce1b
  • Status changed from new to needs_review

New commits:

fe306f9Trac 19322: faster longest_common_prefix

comment:2 Changed 4 years ago by ncohen

Hello Vincent,

Looks good. A couple of remarks:

1) You *can* use min/max in Cython code. Contrary to Python :-P

2) longest_suffix/Python case: what about 'caching' len(other) instead of recomputing it at every test?

3) longest prefix/Python case: the code generated by python for this 'slice' is scary. Isn't it possible to reimplement it without it? Plus if you need to import 'islice' it is maybe better to do so at module level, it's not thaaat bad either and it may happen that the prefix be one character long, in which case importing stuff could be comparatively more expensive (?).

4) Have you considered returning a 'new word' (through new_c) even when that new word is equal to one of self and other? It would simplify the code, and I don't know if it matters as it does not allocate new memory anyway?

5) I do not understand your 'not able to initialize a word from ..'. The code does not try to initialize a word, does it? It's more something like "unsupported type"?..

Nathann

comment:3 Changed 4 years ago by git

  • Commit changed from fe306f971ac9edde2cec32997c3cff7b4db1ce1b to ebbc28d6aa6e890312fa23f87863ec4179bf6f4b

Branch pushed to git repo; I updated commit sha1. New commits:

ebbc28dTrac 19322: reviewer comments

comment:4 follow-up: Changed 4 years ago by vdelecroix

Hello Nathann,

I implemented your remarks excepted 4. It does allocate memory to call _new_c: the one you need for a Python object. But I don't know whether it is justified or not (ie the ratio between "simple code" versus "efficient code").

Vincent

comment:5 in reply to: ↑ 4 Changed 4 years ago by ncohen

  • Reviewers set to Nathann Cohen
  • Status changed from needs_review to positive_review

Yoooooooooo !

I implemented your remarks excepted 4. It does allocate memory to call _new_c: the one you need for a Python object. But I don't know whether it is justified or not (ie the ratio between "simple code" versus "efficient code").

Okay okay, you decide, just felt like bringing it up when I reviewed this code.

Stamped, and good to go.

Nathann

comment:6 Changed 4 years ago by vdelecroix

Thanks!

comment:7 Changed 4 years ago by vbraun

  • Branch changed from u/vdelecroix/19322 to ebbc28d6aa6e890312fa23f87863ec4179bf6f4b
  • Resolution set to fixed
  • Status changed from positive_review to closed
Note: See TracTickets for help on using tickets.