Opened 5 years ago
Closed 5 years ago
#19890 closed enhancement (fixed)
Improve standardization of words and permutations
Reported by:  elixyre  Owned by:  

Priority:  major  Milestone:  sage7.1 
Component:  combinatorics  Keywords:  
Cc:  virmaux, nthiery  Merged in:  
Authors:  JeanBaptiste Priez, Aladin Virmaux  Reviewers:  Travis Scrimshaw, Vincent Delecroix 
Report Upstream:  N/A  Work issues:  
Branch:  61b15f4 (Commits)  Commit:  61b15f4115dec4a07d21a30dba8c503c5296ae17 
Dependencies:  Stopgaps: 
Description (last modified by )
Improvement of the standardization computation
 the current one:
sage: w = list(Words(100, 10000).random_element()) sage: from sage.combinat.permutation import to_standard sage: %time sig = to_standard(w) CPU times: user 3.06 s, sys: 0 ns, total: 3.06 s Wall time: 3.03 s
 the new version:
sage: w = list(Words(100, 10000).random_element()) sage: from sage.combinat.permutation import to_standard sage: %time sig = to_standard(w) CPU times: user 28 ms, sys: 12 ms, total: 40 ms Wall time: 61.2 ms
Change History (19)
comment:1 Changed 5 years ago by
 Status changed from new to needs_review
comment:2 followup: ↓ 6 Changed 5 years ago by
 Reviewers set to Travis Scrimshaw
 Type changed from PLEASE CHANGE to enhancement
comment:3 Changed 5 years ago by
Here are the changes I'd make:
 def merge(shelves, i=0): + def merge(shelves, i):  if len(shelves) == 0: + if not shelves: return + n = len(shelves)  if len(shelves) == 1: + if n == 1: yield (shelves[0], i) return  n = len(shelves)  m = int(n / 2) + m = int(n // 2) L, R = merge(shelves[:m], i), merge(shelves[m:], i+m) aL, aR = L.next(), R.next()  j, k = 1, 1 + j, k = 0, 0 while True: if aL[0] <= aR[0]: yield aL j += 1  if j == m+1: break  else: aL = L.next() + if j == m: + break + aL = L.next() else: yield aR k += 1  if k == n  m + 1: break  else: aR = R.next() + if k == n  m: + break + aR = R.next()  if j == m+1: + if j == m:  it = R yield aR + for a in R: + yield a else:  it = L yield aL + for a in L: + yield a  for a in it:  yield a
Here is my reasoning:
 I do not think you should have a default for
i
since you almost always call it with an argument. I would just make the one place where you use the default argument (thefor
loop in the main function) explicitly pass0
.  The
if not shelves
is faster than callinglen
and comparing to0
.  Moving the
n = len(shelves)
call up typically avoids an extra call tolen
and variable allocation is faster than a function call.  We should use
//
for integer division.  If we start at 0, we avoid a number of unnecessary
+1
's and keeps it more in line with standard python code.  Standard Python formatting should only have
if
(andelse
) statements on one line.  By not having the
it
variable and the distinctfor
loops, IMO it makes the code more clear as there are less variables to chase around (and it has the same length).
If you agree and make these changes, you can set a positive review on my behalf.
comment:4 Changed 5 years ago by
 Commit changed from 798a3fb4e5809e9aafe9c7ea0b4d13bc0b051788 to 3d1e1bfbc871f7a7c290f659b1db7f44c5e69634
Branch pushed to git repo; I updated commit sha1. New commits:
3d1e1bf  ticket standardization : remove unused import and old code

comment:5 Changed 5 years ago by
 Commit changed from 3d1e1bfbc871f7a7c290f659b1db7f44c5e69634 to b248fb55c423ffc380700234be22fa7a37ace3c2
Branch pushed to git repo; I updated commit sha1. New commits:
b248fb5  ticket standardization: add doctest with old standardization algorithm

comment:6 in reply to: ↑ 2 Changed 5 years ago by
Replying to tscrim:
That is quite the improvement. Here are some comments and suggestions:
 Why do you import
imap
? You don't seem to use it.
Sorry, it is a mistake... I was to fast and I forgot to remove that.
 Are you going to delete the old algorithm or keep it and add an
algorithm
option? I would at least keep the old (naive) algorithm around in a doctest for testing.
I deleted the old below and I had it in doctest!
comment:7 Changed 5 years ago by
 Commit changed from b248fb55c423ffc380700234be22fa7a37ace3c2 to acdaf9d6d8fa028a31e0628d0d61352511a610bc
Branch pushed to git repo; I updated commit sha1. New commits:
acdaf9d  ticket standardization: following Travis comments

comment:8 Changed 5 years ago by
Hey JeanBaptiste!
It is probably possible to go even faster using python sort and a dictionnary!
sage: w = Words(100, 100000).random_element() sage: %time a = w.standard_permutation() CPU times: user 123 ms, sys: 36.7 ms, total: 160 ms Wall time: 128 ms
comment:9 Changed 5 years ago by
Do you have code that you could push or paste here?
comment:10 Changed 5 years ago by
 Commit changed from acdaf9d6d8fa028a31e0628d0d61352511a610bc to 2ee9514188869c987a196893fcd7a8cb61653bcc
Branch pushed to git repo; I updated commit sha1. New commits:
2ee9514  ticket standardization: improvement with Aladin advice

comment:11 Changed 5 years ago by
Thanks Aladin for this idea... It is better now. :)
sage: %time _ = to_standard(w) CPU times: user 169 ms, sys: 20.7 ms, total: 190 ms Wall time: 178 ms
But I remark there exist a better version in FiniteWord_class
... (why this implementation is not used there???)
I correct it
comment:12 Changed 5 years ago by
 Commit changed from 2ee9514188869c987a196893fcd7a8cb61653bcc to 7725254780460706ecd3d933753d08883462c538
Branch pushed to git repo; I updated commit sha1. New commits:
7725254  ticket standardization: move and improve the standardization implementation of finite words into permutation.

comment:13 Changed 5 years ago by
 Commit changed from 7725254780460706ecd3d933753d08883462c538 to 27ac158138b23ae5b5effe0b0ba1b200281c7c5c
Branch pushed to git repo; I updated commit sha1. New commits:
27ac158  ticket standardization: what I was supposed to commit earlier...

comment:14 Changed 5 years ago by
The implementation in finite_word.py
was quite better so I moved it into permutation.py
and updated the code around.
(Specially I defined a function evaluation_dict
in finite_word
which replace/generalize/improve the evaluation_dict
method of finite_word
.)
comment:15 Changed 5 years ago by
standardization of what? The title is very vague, it's not even clear that it's about combinatorics.
comment:16 Changed 5 years ago by
 Branch changed from u/elixyre/standardization to u/tscrim/improve_standardization19890
 Commit changed from 27ac158138b23ae5b5effe0b0ba1b200281c7c5c to 61b15f4115dec4a07d21a30dba8c503c5296ae17
 Milestone changed from sage7.0 to sage7.1
 Summary changed from Improve standardization to Improve standardization of words and permutations
I made some reviewer changes by adding a little bit more description in the doctests, and I moved evaluation_dict
to the end of the file as IMO the main classes should be closest to the top of the file. If you're happy with my changes, then positive review. (I added Aladin to the authors since the current algorithm is based upon his idea as per comment:8.)
New commits:
e450780  Merge branch 'u/elixyre/standardization' of trac.sagemath.org:sage into u/elixyre/standardization

61b15f4  Some other reviewer changes.

comment:17 Changed 5 years ago by
ping
comment:18 Changed 5 years ago by
 Description modified (diff)
 Reviewers changed from Travis Scrimshaw to Travis Scrimshaw, Vincent Delecroix
 Status changed from needs_review to positive_review
comment:19 Changed 5 years ago by
 Branch changed from u/tscrim/improve_standardization19890 to 61b15f4115dec4a07d21a30dba8c503c5296ae17
 Resolution set to fixed
 Status changed from positive_review to closed
That is quite the improvement. Here are some comments and suggestions:
imap
? You don't seem to use it.algorithm
option? I would at least keep the old (naive) algorithm around in a doctest for testing.