Opened 5 months ago

Closed 5 months ago

#29942 closed enhancement (fixed)

More efficient rank and unrank for Permutations_mset

Reported by: dcfifield Owned by:
Priority: major Milestone: sage-9.2
Component: combinatorics Keywords:
Cc: slelievre Merged in:
Authors: David Fifield Reviewers: Frédéric Chapoton, Travis Scrimshaw
Report Upstream: N/A Work issues:
Branch: 141f9a9 (Commits) Commit: 141f9a97c16facf1780bf14402915d68d814f354
Dependencies: #30005 Stopgaps:

Description (last modified by slelievre)

The Permutations_mset rank and unrank methods inherit a generic implementation from EnumeratedSets rank and unrank, which have running time proportional to the rank:

sage: mset = list(range(10)) * 3
sage: p = Permutations(mset)
sage: for e in range(8): timeit('p.unrank({})'.format(10^e), number=1, repeat=5)
1 loop, best of 5: 24.2 μs per loop
1 loop, best of 5: 39.1 μs per loop
1 loop, best of 5: 191 μs per loop
1 loop, best of 5: 1.28 ms per loop
1 loop, best of 5: 12.7 ms per loop
1 loop, best of 5: 126 ms per loop
1 loop, best of 5: 1.27 s per loop
1 loop, best of 5: 13 s per loop
sage: p.rank(list(reversed(sorted(mset))))
... never finishes ...

This branch makes the running time roughly proportional to the size of the multiset:

sage: mset = list(range(10)) * 3
sage: p = Permutations(mset)
sage: for e in range(8): timeit('p.unrank({})'.format(10^e), number=1, repeat=5)
1 loop, best of 5: 1.35 ms per loop
1 loop, best of 5: 1.43 ms per loop
1 loop, best of 5: 1.37 ms per loop
1 loop, best of 5: 1.37 ms per loop
1 loop, best of 5: 2.23 ms per loop
1 loop, best of 5: 2.12 ms per loop
1 loop, best of 5: 1.46 ms per loop
1 loop, best of 5: 1.44 ms per loop
sage: timeit('p.rank(list(reversed(sorted(mset))))', number=1, repeat=5)
1 loop, best of 5: 1.51 ms per loop

Algorithm

The rank algorithm is a straightforward application of the solution to exercise 7.2.1.2–4 in The Art of Computer Programming, Volume 4A. I will quote the exercise:

  1. Generalizing exercise 3, explain how to compute the rank of a1an with respect to Algorithm L [lexicographic permutation generation] when {a1, …, an} is the multiset {n1·x1, …, nt·xt}; here n1 + … + nt = n and x1 < … < xt. (The total number of permutations is, of course, the multinomial coefficient

multinomial(n1, …, nt) = n! / (n1! … nt!);

see Eq. 5.1.2–(3).) What is the rank of 314159265?

  1. Use the recurrence rank(a1an) = 1/n Σj=1…t nj [xj < a1] multinomial(n1, …, nt) + rank(a2an). For example, rank(314159265) is

3/9 multinomial(2, 1, 1, 1, 2, 1, 1) + 0 + 2/7 multinomial(1, 1, 1, 2, 1, 1) + 0 + 1/5 multinomial(1, 2, 1, 1) + 3/4 multinomial(1, 1, 1, 1) + 0 + 1/2 multinomial(1, 1) = 30991.

The unrank algorithm is also based on this recurrence; however the application is not so straightforward. I did not find an explicit statement of an unranking algorithm elsewhere, and the algorithm I came up with isn't particularly elegant. I'd appreciate having another opinion on it or a pointer to some other reference. The main idea is that given a rank r and a multiset m = {n1·x1, …, nt·xt}, we know that r = c/n multinomial(n1, …, nt) + rank(m\c), where c is the number of elements in m that are strictly less than the first element of the output sequence, and m\c stands for m with one instance of the cth smallest element removed.

Notes/caveats

The rank method is actually independent of the Permutations_mset object it is attached to. Other than the self(p).check() line, it never refers to self. It could be defined in terms of a standalone function, as StandardPermutations_n.unrank is defined in terms of from_rank.

I am aware of two ways in which this branch differs in behavior from what is there now. They both seem to me to be because of bugs in the EnumeratedSets fallback implementations, but I'll document them here in case complete compatibility is needed.

The first is that the existing rank method returns None when passed a list or a tuple, because it uses == comparison with an element_class type (I'm not sure how that works exactly, but at any rate it doesn't compare equal.) To get rank to return something other than None, you have to pass in a permutation produced by the Permutations_mset object itself.

sage: mset = [1, 1, 2, 3, 4, 5, 5, 6, 9]
sage: p = Permutations(mset)
sage: p.rank(mset)
sage: p.rank([1, 1, 2, 3, 4, 5, 5, 6, 9])
sage: p.unrank(3)
[1, 1, 2, 3, 4, 5, 6, 9, 5]
sage: type(p.unrank(3))
<class 'sage.combinat.permutation.Permutations_mset_with_category.element_class'>
sage: p.rank(p.unrank(3))
3

In this branch, the value passed to rank doesn't have to be an element_class.

sage: mset = [1, 1, 2, 3, 4, 5, 5, 6, 9]
sage: p = Permutations(mset)
sage: p.rank(mset)
0
sage: p.rank([1, 1, 2, 3, 4, 5, 5, 6, 9])
0
sage: p.unrank(3)
[1, 1, 2, 3, 4, 5, 6, 9, 5]
sage: type(p.unrank(3))
<class 'list'>
sage: p.rank(p.unrank(3))
3

The other way in which this branch changes behavior is that the current code does not canonicalize the multiset used to create the Permutations_mset object. Permutations([1, 1, 2, 3, 4]) and Permutations([3, 2, 1, 4, 1]) visit the same permutations but in a different order; the latter acts like an already partially enumerated version of the former:

sage: Permutations([1, 1, 2, 3, 4]).unrank(0)
[1, 1, 2, 3, 4]
sage: Permutations([3, 2, 1, 4, 1]).unrank(0)
[3, 2, 1, 1, 4]

In this branch, the order of the input multiset does not make a difference:

sage: Permutations([1, 1, 2, 3, 4]).unrank(0)
[1, 1, 2, 3, 4]
sage: Permutations([3, 2, 1, 4, 1]).unrank(0)
[1, 1, 2, 3, 4]

It would be possible to work around this canonicalization, if necessary, by computing the rank of the input multiset on creation, and using it as an offset for future calls to unrank.

I was working on a project that needed fast unranking of a multiset, when I found that Sage could not handle a multiset of the size I needed. I came up with an algorithm that worked well enough for my purposes, however its ranking order is not lexicographic. The algorithms in this branch are the result of a little more research to find a lexicographic algorithm.

Change History (18)

comment:1 Changed 5 months ago by dcfifield

  • Status changed from new to needs_review

comment:2 follow-up: Changed 5 months ago by chapoton

The documentation is not formatted correctly. Each time you insert a sentence or TEST between two blocks of tests, it should end with :: unless there is another sentence just after.

see https://doc.sagemath.org/html/en/developer/coding_basics.html#template

comment:3 Changed 5 months ago by git

  • Commit changed from a59f51403d4d2e35533dfb5c017861b8e0ff8827 to 9b9d936081bc4b04d5476ce4c4e54af0997a9295

Branch pushed to git repo; I updated commit sha1. New commits:

9b9d936Fix doc formatting of Permutations_mset rank and unrank.

comment:4 in reply to: ↑ 2 Changed 5 months ago by dcfifield

Replying to chapoton:

The documentation is not formatted correctly. Each time you insert a sentence or TEST between two blocks of tests, it should end with :: unless there is another sentence just after.

Thanks. I fixed the formatting in 9b9d936. I can see the difference with sage --docbuild --include-tests-blocks.

comment:5 Changed 5 months ago by slelievre

  • Cc slelievre added
  • Description modified (diff)

comment:6 follow-up: Changed 5 months ago by tscrim

A quick comment for testing coverage. I would add an exhaustive test on something X of reasonable size such that

for i, p in enumerate(X):
    assert X.rank(p) == i
    assert X.unrank(i) == p

comment:7 Changed 5 months ago by git

  • Commit changed from 9b9d936081bc4b04d5476ce4c4e54af0997a9295 to a4adb8a15db91dfdec4d9fe43e3826397c47e607

Branch pushed to git repo; I updated commit sha1. New commits:

a4adb8aDo an exhaustive rank/unrank roundtrip test.

comment:8 in reply to: ↑ 6 Changed 5 months ago by dcfifield

Replying to tscrim:

A quick comment for testing coverage. I would add an exhaustive test on something X of reasonable size

I did so in 4adb8a over the multiset [2, 2, 3, 3, 3, 5, 5, 5, 5, 5], which has 2520 permutations. (I wasn't sure what a reasonable size for a test case would be.) In the same loop, I added a check for lexicographic ordering. I had to convert the permutations from a list to a tuple to facilitate comparison.

comment:9 follow-up: Changed 5 months ago by tscrim

  • Reviewers set to Frédéric Chapoton, Travis Scrimshaw

I think that is a reasonable size; thank you.

I just have three more minor changes:

-        - ``p`` -- a permutation of `M`.
+        - ``p`` -- a permutation of `M`
         - ``r`` -- an integer between ``0`` and ``self.cardinality()-1``
-          inclusive.
+          inclusive

Update the Knuth reference based off #30005.

Something that is stylistic, so it is up to you if you want to change it, but I would change the

-Some sentence explanation. ::
+Some sentence explanation::

to have it end each block with a colon as the code is better associated with the sentence. However, that is something you can ignore.

Frédéric, do you have any other comments?

comment:10 Changed 5 months ago by chapoton

ok, looks good to me too. The patchbot warning are false-positive. Once Travis suggestions are fixed, this can be set to positive and will be a very nice improvement.

comment:11 Changed 5 months ago by git

  • Commit changed from a4adb8a15db91dfdec4d9fe43e3826397c47e607 to a694a352f26f0c6236334f485a6c0e7237eecabc

Branch pushed to git repo; I updated commit sha1. New commits:

35b6e19Update a variable reference in a comment.
a694a35Doc improvements in Permutations_mset rank and unrank.

comment:12 in reply to: ↑ 9 ; follow-up: Changed 5 months ago by dcfifield

Replying to tscrim:

-        - ``p`` -- a permutation of `M`.
+        - ``p`` -- a permutation of `M`
         - ``r`` -- an integer between ``0`` and ``self.cardinality()-1``
-          inclusive.
+          inclusive
-Some sentence explanation. ::
+Some sentence explanation::

I made these changes in a694a35.

Update the Knuth reference based off #30005.

It seems the right way to handle this is to remove the REFERENCES section from gray_codes.py, containing the [Knuth-TAOCP4A] reference used in that file only, and update [Knuth-TAOCP4A] references to point to the new [Knu2011] in references/index.rst. But #30005 is not merged yet and [Knu2011] does not exist in its branch; while modifying gray_codes.py in this branch may prevent #30005 from merging cleanly. How should I handle this?

comment:13 in reply to: ↑ 12 ; follow-up: Changed 5 months ago by tscrim

Replying to dcfifield:

Replying to tscrim:

Update the Knuth reference based off #30005.

It seems the right way to handle this is to remove the REFERENCES section from gray_codes.py, containing the [Knuth-TAOCP4A] reference used in that file only, and update [Knuth-TAOCP4A] references to point to the new [Knu2011] in references/index.rst. But #30005 is not merged yet and [Knu2011] does not exist in its branch; while modifying gray_codes.py in this branch may prevent #30005 from merging cleanly. How should I handle this?

You merge in the branch from #30005 into this one and set it as a dependency in the ticket. Actually, you probably should remove that reference on this ticket (again, after merging in #30005) and use the one in the master reference file.

comment:14 Changed 5 months ago by git

  • Commit changed from a694a352f26f0c6236334f485a6c0e7237eecabc to 141f9a97c16facf1780bf14402915d68d814f354

Branch pushed to git repo; I updated commit sha1. New commits:

91990d0Fix variable name in StandardPermutations_n.rank exception message
f424a24Typo fixes in src/sage/combinat/gray_codes.py.
b45d5fbUpdate references in src/sage/combinat/gray_codes.py.
9577bdcMerge branch 'combinat_doc_fixes' into permutations_mset_rank_unrank
141f9a9Point references in combinat/gray_codes.py to the master bibliography file.

comment:15 in reply to: ↑ 13 ; follow-up: Changed 5 months ago by dcfifield

  • Dependencies set to #30005

Replying to tscrim:

You merge in the branch from #30005 into this one and set it as a dependency in the ticket.

Thanks for your guidance. I've merged #30005 into this branch and made it a dependency.

Actually, you probably should remove that reference on this ticket (again, after merging in #30005) and use the one in the master reference file.

It was actually the other way around. It is this ticket (#29942) that adds a reference to the master reference file, and #30005 that used a REFERENCES section local to gray_codes.py. In 141f9a9 I edited gray_codes.py to point to the master reference file.

comment:16 in reply to: ↑ 15 Changed 5 months ago by tscrim

Replying to dcfifield:

Replying to tscrim:

Actually, you probably should remove that reference on this ticket (again, after merging in #30005) and use the one in the master reference file.

It was actually the other way around. It is this ticket (#29942) that adds a reference to the master reference file, and #30005 that used a REFERENCES section local to gray_codes.py. In 141f9a9 I edited gray_codes.py to point to the master reference file.

The reason I was saying that was because #30005 was positively reviewed (and not clear there should be a dependency relation). However, I am happy with the changes here. I will just wait for the patchbot one more time.

comment:17 Changed 5 months ago by tscrim

  • Status changed from needs_review to positive_review

Finished testing locally. LGTM.

comment:18 Changed 5 months ago by vbraun

  • Branch changed from u/dcfifield/permutations_mset_rank_unrank to 141f9a97c16facf1780bf14402915d68d814f354
  • Resolution set to fixed
  • Status changed from positive_review to closed
Note: See TracTickets for help on using tickets.