Opened 4 years ago

Closed 3 years ago

#19890 closed enhancement (fixed)

Improve standardization of words and permutations

Reported by: elixyre Owned by:
Priority: major Milestone: sage-7.1
Component: combinatorics Keywords:
Cc: virmaux, nthiery Merged in:
Authors: Jean-Baptiste Priez, Aladin Virmaux Reviewers: Travis Scrimshaw, Vincent Delecroix
Report Upstream: N/A Work issues:
Branch: 61b15f4 (Commits) Commit: 61b15f4115dec4a07d21a30dba8c503c5296ae17
Dependencies: Stopgaps:

Description (last modified by vdelecroix)

Improvement of the standardization computation

  • the current one:
    sage: w = list(Words(100, 10000).random_element())
    sage: from sage.combinat.permutation import to_standard
    sage: %time sig = to_standard(w)
    CPU times: user 3.06 s, sys: 0 ns, total: 3.06 s
    Wall time: 3.03 s
    
  • the new version:
    sage: w = list(Words(100, 10000).random_element())
    sage: from sage.combinat.permutation import to_standard
    sage: %time sig = to_standard(w)
    CPU times: user 28 ms, sys: 12 ms, total: 40 ms
    Wall time: 61.2 ms
    

Change History (19)

comment:1 Changed 4 years ago by elixyre

  • Status changed from new to needs_review

comment:2 follow-up: Changed 4 years ago by tscrim

  • Reviewers set to Travis Scrimshaw
  • Type changed from PLEASE CHANGE to enhancement

That is quite the improvement. Here are some comments and suggestions:

  • Why do you import imap? You don't seem to use it.
  • Are you going to delete the old algorithm or keep it and add an algorithm option? I would at least keep the old (naive) algorithm around in a doctest for testing.

comment:3 Changed 4 years ago by tscrim

Here are the changes I'd make:

-    def merge(shelves, i=0):
+    def merge(shelves, i):
 
-        if len(shelves) == 0:
+        if not shelves:
             return
+        n = len(shelves)
-        if len(shelves) == 1:
+        if n == 1:
             yield (shelves[0], i)
             return
 
-        n = len(shelves)
-        m = int(n / 2)
+        m = int(n // 2)
 
         L, R = merge(shelves[:m], i), merge(shelves[m:], i+m)
 
         aL, aR = L.next(), R.next()
-        j, k = 1, 1
+        j, k = 0, 0
 
         while True:
             if aL[0] <= aR[0]:
                 yield aL
                 j += 1
-                if j == m+1: break
-                else: aL = L.next()
+                if j == m:
+                    break
+                aL = L.next()
             else:
                 yield aR
                 k += 1
-                if k == n - m + 1: break
-                else: aR = R.next()
+                if k == n - m:
+                    break
+                aR = R.next()
 
-        if j == m+1:
+        if j == m:
-            it = R
             yield aR
+            for a in R:
+                yield a
         else:
-            it = L
             yield aL
+            for a in L:
+                yield a
-        for a in it:
-            yield a

Here is my reasoning:

  • I do not think you should have a default for i since you almost always call it with an argument. I would just make the one place where you use the default argument (the for loop in the main function) explicitly pass 0.
  • The if not shelves is faster than calling len and comparing to 0.
  • Moving the n = len(shelves) call up typically avoids an extra call to len and variable allocation is faster than a function call.
  • We should use // for integer division.
  • If we start at 0, we avoid a number of unnecessary +1's and keeps it more in line with standard python code.
  • Standard Python formatting should only have if (and else) statements on one line.
  • By not having the it variable and the distinct for loops, IMO it makes the code more clear as there are less variables to chase around (and it has the same length).

If you agree and make these changes, you can set a positive review on my behalf.

comment:4 Changed 4 years ago by git

  • Commit changed from 798a3fb4e5809e9aafe9c7ea0b4d13bc0b051788 to 3d1e1bfbc871f7a7c290f659b1db7f44c5e69634

Branch pushed to git repo; I updated commit sha1. New commits:

3d1e1bfticket standardization : remove unused import and old code

comment:5 Changed 4 years ago by git

  • Commit changed from 3d1e1bfbc871f7a7c290f659b1db7f44c5e69634 to b248fb55c423ffc380700234be22fa7a37ace3c2

Branch pushed to git repo; I updated commit sha1. New commits:

b248fb5ticket standardization: add doctest with old standardization algorithm

comment:6 in reply to: ↑ 2 Changed 4 years ago by elixyre

Replying to tscrim:

That is quite the improvement. Here are some comments and suggestions:

  • Why do you import imap? You don't seem to use it.

Sorry, it is a mistake... I was to fast and I forgot to remove that.

  • Are you going to delete the old algorithm or keep it and add an algorithm option? I would at least keep the old (naive) algorithm around in a doctest for testing.

I deleted the old below and I had it in doctest!

comment:7 Changed 4 years ago by git

  • Commit changed from b248fb55c423ffc380700234be22fa7a37ace3c2 to acdaf9d6d8fa028a31e0628d0d61352511a610bc

Branch pushed to git repo; I updated commit sha1. New commits:

acdaf9dticket standardization: following Travis comments

comment:8 Changed 4 years ago by virmaux

Hey Jean-Baptiste!

It is probably possible to go even faster using python sort and a dictionnary!

sage: w = Words(100, 100000).random_element()
sage: %time a = w.standard_permutation()
CPU times: user 123 ms, sys: 36.7 ms, total: 160 ms
Wall time: 128 ms

comment:9 Changed 4 years ago by tscrim

Do you have code that you could push or paste here?

comment:10 Changed 4 years ago by git

  • Commit changed from acdaf9d6d8fa028a31e0628d0d61352511a610bc to 2ee9514188869c987a196893fcd7a8cb61653bcc

Branch pushed to git repo; I updated commit sha1. New commits:

2ee9514ticket standardization: improvement with Aladin advice

comment:11 Changed 4 years ago by elixyre

Thanks Aladin for this idea... It is better now. :-)

sage: %time _ = to_standard(w)
CPU times: user 169 ms, sys: 20.7 ms, total: 190 ms
Wall time: 178 ms

But I remark there exist a better version in FiniteWord_class... (why this implementation is not used there???)

I correct it

comment:12 Changed 4 years ago by git

  • Commit changed from 2ee9514188869c987a196893fcd7a8cb61653bcc to 7725254780460706ecd3d933753d08883462c538

Branch pushed to git repo; I updated commit sha1. New commits:

7725254ticket standardization: move and improve the standardization implementation of finite words into permutation.

comment:13 Changed 4 years ago by git

  • Commit changed from 7725254780460706ecd3d933753d08883462c538 to 27ac158138b23ae5b5effe0b0ba1b200281c7c5c

Branch pushed to git repo; I updated commit sha1. New commits:

27ac158ticket standardization: what I was supposed to commit earlier...

comment:14 Changed 4 years ago by elixyre

The implementation in finite_word.py was quite better so I moved it into permutation.py and updated the code around. (Specially I defined a function evaluation_dict in finite_word which replace/generalize/improve the evaluation_dict method of finite_word.)

comment:15 Changed 4 years ago by jdemeyer

standardization of what? The title is very vague, it's not even clear that it's about combinatorics.

comment:16 Changed 3 years ago by tscrim

  • Authors changed from Jean-Baptiste Priez to Jean-Baptiste Priez, Aladin Virmaux
  • Branch changed from u/elixyre/standardization to u/tscrim/improve_standardization-19890
  • Commit changed from 27ac158138b23ae5b5effe0b0ba1b200281c7c5c to 61b15f4115dec4a07d21a30dba8c503c5296ae17
  • Milestone changed from sage-7.0 to sage-7.1
  • Summary changed from Improve standardization to Improve standardization of words and permutations

I made some reviewer changes by adding a little bit more description in the doctests, and I moved evaluation_dict to the end of the file as IMO the main classes should be closest to the top of the file. If you're happy with my changes, then positive review. (I added Aladin to the authors since the current algorithm is based upon his idea as per comment:8.)


New commits:

e450780Merge branch 'u/elixyre/standardization' of trac.sagemath.org:sage into u/elixyre/standardization
61b15f4Some other reviewer changes.

comment:17 Changed 3 years ago by tscrim

ping

comment:18 Changed 3 years ago by vdelecroix

  • Description modified (diff)
  • Reviewers changed from Travis Scrimshaw to Travis Scrimshaw, Vincent Delecroix
  • Status changed from needs_review to positive_review

comment:19 Changed 3 years ago by vbraun

  • Branch changed from u/tscrim/improve_standardization-19890 to 61b15f4115dec4a07d21a30dba8c503c5296ae17
  • Resolution set to fixed
  • Status changed from positive_review to closed
Note: See TracTickets for help on using tickets.