Opened 5 months ago
Closed 5 months ago
#29942 closed enhancement (fixed)
More efficient rank and unrank for Permutations_mset
Reported by:  dcfifield  Owned by:  

Priority:  major  Milestone:  sage9.2 
Component:  combinatorics  Keywords:  
Cc:  slelievre  Merged in:  
Authors:  David Fifield  Reviewers:  Frédéric Chapoton, Travis Scrimshaw 
Report Upstream:  N/A  Work issues:  
Branch:  141f9a9 (Commits)  Commit:  141f9a97c16facf1780bf14402915d68d814f354 
Dependencies:  #30005  Stopgaps: 
Description (last modified by )
The Permutations_mset
rank
and unrank
methods inherit a generic implementation from EnumeratedSets
rank and
unrank,
which have running time proportional to the rank:
sage: mset = list(range(10)) * 3 sage: p = Permutations(mset) sage: for e in range(8): timeit('p.unrank({})'.format(10^e), number=1, repeat=5) 1 loop, best of 5: 24.2 μs per loop 1 loop, best of 5: 39.1 μs per loop 1 loop, best of 5: 191 μs per loop 1 loop, best of 5: 1.28 ms per loop 1 loop, best of 5: 12.7 ms per loop 1 loop, best of 5: 126 ms per loop 1 loop, best of 5: 1.27 s per loop 1 loop, best of 5: 13 s per loop sage: p.rank(list(reversed(sorted(mset)))) ... never finishes ...
This branch makes the running time roughly proportional to the size of the multiset:
sage: mset = list(range(10)) * 3 sage: p = Permutations(mset) sage: for e in range(8): timeit('p.unrank({})'.format(10^e), number=1, repeat=5) 1 loop, best of 5: 1.35 ms per loop 1 loop, best of 5: 1.43 ms per loop 1 loop, best of 5: 1.37 ms per loop 1 loop, best of 5: 1.37 ms per loop 1 loop, best of 5: 2.23 ms per loop 1 loop, best of 5: 2.12 ms per loop 1 loop, best of 5: 1.46 ms per loop 1 loop, best of 5: 1.44 ms per loop sage: timeit('p.rank(list(reversed(sorted(mset))))', number=1, repeat=5) 1 loop, best of 5: 1.51 ms per loop
Algorithm
The rank
algorithm is a straightforward application of the solution to exercise 7.2.1.2–4 in
The Art of Computer Programming, Volume 4A.
I will quote the exercise:
 Generalizing exercise 3, explain how to compute the rank of a_{1}…a_{n} with respect to Algorithm L [lexicographic permutation generation] when {a_{1}, …, a_{n}} is the multiset {n_{1}·x_{1}, …, n_{t}·x_{t}}; here n_{1} + … + n_{t} = n and x_{1} < … < x_{t}. (The total number of permutations is, of course, the multinomial coefficient
multinomial(n_{1}, …, n_{t}) = n! / (n_{1}! … n_{t}!);
see Eq. 5.1.2–(3).) What is the rank of 314159265?
 Use the recurrence rank(a_{1} … a_{n}) = 1/n Σ_{j=1…t} n_{j} [x_{j} < a_{1}] multinomial(n_{1}, …, n_{t}) + rank(a_{2} … a_{n}). For example, rank(314159265) is
3/9 multinomial(2, 1, 1, 1, 2, 1, 1) + 0 + 2/7 multinomial(1, 1, 1, 2, 1, 1) + 0 + 1/5 multinomial(1, 2, 1, 1) + 3/4 multinomial(1, 1, 1, 1) + 0 + 1/2 multinomial(1, 1) = 30991.
The unrank
algorithm is also based on this recurrence;
however the application is not so straightforward.
I did not find an explicit statement of an unranking algorithm elsewhere,
and the algorithm I came up with isn't particularly elegant.
I'd appreciate having another opinion on it or a pointer to some other reference.
The main idea is that given a rank r and a multiset
m = {n_{1}·x_{1}, …, n_{t}·x_{t}},
we know that
r = c/n multinomial(n_{1}, …, n_{t}) + rank(m\c),
where c is the number of elements in m that are strictly less than
the first element of the output sequence, and m\c stands for
m with one instance of the cth smallest element removed.
Notes/caveats
The rank
method is actually independent of the Permutations_mset
object
it is attached to.
Other than the self(p).check()
line,
it never refers to self
.
It could be defined in terms of a standalone function,
as StandardPermutations_n.unrank
is defined in terms of from_rank.
I am aware of two ways in which this branch differs in behavior from what is there now.
They both seem to me to be because of bugs in the EnumeratedSets
fallback implementations,
but I'll document them here in case complete compatibility is needed.
The first is that the existing
rank
method returns None
when passed a list or a tuple,
because it uses ==
comparison with an element_class
type
(I'm not sure how that works exactly, but at any rate it doesn't compare equal.)
To get rank
to return something other than None
, you have to pass in
a permutation produced by the Permutations_mset
object itself.
sage: mset = [1, 1, 2, 3, 4, 5, 5, 6, 9] sage: p = Permutations(mset) sage: p.rank(mset) sage: p.rank([1, 1, 2, 3, 4, 5, 5, 6, 9]) sage: p.unrank(3) [1, 1, 2, 3, 4, 5, 6, 9, 5] sage: type(p.unrank(3)) <class 'sage.combinat.permutation.Permutations_mset_with_category.element_class'> sage: p.rank(p.unrank(3)) 3
In this branch, the value passed to rank
doesn't have to be an element_class
.
sage: mset = [1, 1, 2, 3, 4, 5, 5, 6, 9] sage: p = Permutations(mset) sage: p.rank(mset) 0 sage: p.rank([1, 1, 2, 3, 4, 5, 5, 6, 9]) 0 sage: p.unrank(3) [1, 1, 2, 3, 4, 5, 6, 9, 5] sage: type(p.unrank(3)) <class 'list'> sage: p.rank(p.unrank(3)) 3
The other way in which this branch changes behavior
is that the current code does not canonicalize the multiset
used to create the Permutations_mset
object.
Permutations([1, 1, 2, 3, 4])
and Permutations([3, 2, 1, 4, 1])
visit the same permutations but in a different order;
the latter acts like an already partially enumerated version of the former:
sage: Permutations([1, 1, 2, 3, 4]).unrank(0) [1, 1, 2, 3, 4] sage: Permutations([3, 2, 1, 4, 1]).unrank(0) [3, 2, 1, 1, 4]
In this branch, the order of the input multiset does not make a difference:
sage: Permutations([1, 1, 2, 3, 4]).unrank(0) [1, 1, 2, 3, 4] sage: Permutations([3, 2, 1, 4, 1]).unrank(0) [1, 1, 2, 3, 4]
It would be possible to work around this canonicalization, if necessary,
by computing the rank of the input multiset on creation,
and using it as an offset for future calls to unrank
.
I was working on a project that needed fast unranking of a multiset, when I found that Sage could not handle a multiset of the size I needed. I came up with an algorithm that worked well enough for my purposes, however its ranking order is not lexicographic. The algorithms in this branch are the result of a little more research to find a lexicographic algorithm.
Change History (18)
comment:1 Changed 5 months ago by
 Status changed from new to needs_review
comment:2 followup: ↓ 4 Changed 5 months ago by
comment:3 Changed 5 months ago by
 Commit changed from a59f51403d4d2e35533dfb5c017861b8e0ff8827 to 9b9d936081bc4b04d5476ce4c4e54af0997a9295
Branch pushed to git repo; I updated commit sha1. New commits:
9b9d936  Fix doc formatting of Permutations_mset rank and unrank.

comment:4 in reply to: ↑ 2 Changed 5 months ago by
Replying to chapoton:
The documentation is not formatted correctly. Each time you insert a sentence or TEST between two blocks of tests, it should end with
::
unless there is another sentence just after.
Thanks. I fixed the formatting in 9b9d936. I can see the difference with sage docbuild includetestsblocks
.
comment:5 Changed 5 months ago by
 Cc slelievre added
 Description modified (diff)
comment:6 followup: ↓ 8 Changed 5 months ago by
A quick comment for testing coverage. I would add an exhaustive test on something X
of reasonable size such that
for i, p in enumerate(X): assert X.rank(p) == i assert X.unrank(i) == p
comment:7 Changed 5 months ago by
 Commit changed from 9b9d936081bc4b04d5476ce4c4e54af0997a9295 to a4adb8a15db91dfdec4d9fe43e3826397c47e607
Branch pushed to git repo; I updated commit sha1. New commits:
a4adb8a  Do an exhaustive rank/unrank roundtrip test.

comment:8 in reply to: ↑ 6 Changed 5 months ago by
Replying to tscrim:
A quick comment for testing coverage. I would add an exhaustive test on something
X
of reasonable size
I did so in 4adb8a over the multiset [2, 2, 3, 3, 3, 5, 5, 5, 5, 5]
, which has 2520 permutations. (I wasn't sure what a reasonable size for a test case would be.) In the same loop, I added a check for lexicographic ordering. I had to convert the permutations from a list
to a tuple
to facilitate comparison.
comment:9 followup: ↓ 12 Changed 5 months ago by
 Reviewers set to Frédéric Chapoton, Travis Scrimshaw
I think that is a reasonable size; thank you.
I just have three more minor changes:
  ``p``  a permutation of `M`. +  ``p``  a permutation of `M`
 ``r``  an integer between ``0`` and ``self.cardinality()1``  inclusive. + inclusive
Update the Knuth reference based off #30005.
Something that is stylistic, so it is up to you if you want to change it, but I would change the
Some sentence explanation. :: +Some sentence explanation::
to have it end each block with a colon as the code is better associated with the sentence. However, that is something you can ignore.
Frédéric, do you have any other comments?
comment:10 Changed 5 months ago by
ok, looks good to me too. The patchbot warning are falsepositive. Once Travis suggestions are fixed, this can be set to positive and will be a very nice improvement.
comment:11 Changed 5 months ago by
 Commit changed from a4adb8a15db91dfdec4d9fe43e3826397c47e607 to a694a352f26f0c6236334f485a6c0e7237eecabc
comment:12 in reply to: ↑ 9 ; followup: ↓ 13 Changed 5 months ago by
Replying to tscrim:
  ``p``  a permutation of `M`. +  ``p``  a permutation of `M` ``r``  an integer between ``0`` and ``self.cardinality()1``  inclusive. + inclusiveSome sentence explanation. :: +Some sentence explanation::
I made these changes in a694a35.
Update the Knuth reference based off #30005.
It seems the right way to handle this is to remove the REFERENCES section from gray_codes.py, containing the [KnuthTAOCP4A] reference used in that file only, and update [KnuthTAOCP4A] references to point to the new [Knu2011] in references/index.rst. But #30005 is not merged yet and [Knu2011] does not exist in its branch; while modifying gray_codes.py in this branch may prevent #30005 from merging cleanly. How should I handle this?
comment:13 in reply to: ↑ 12 ; followup: ↓ 15 Changed 5 months ago by
Replying to dcfifield:
Replying to tscrim:
Update the Knuth reference based off #30005.
It seems the right way to handle this is to remove the REFERENCES section from gray_codes.py, containing the [KnuthTAOCP4A] reference used in that file only, and update [KnuthTAOCP4A] references to point to the new [Knu2011] in references/index.rst. But #30005 is not merged yet and [Knu2011] does not exist in its branch; while modifying gray_codes.py in this branch may prevent #30005 from merging cleanly. How should I handle this?
You merge in the branch from #30005 into this one and set it as a dependency in the ticket. Actually, you probably should remove that reference on this ticket (again, after merging in #30005) and use the one in the master reference file.
comment:14 Changed 5 months ago by
 Commit changed from a694a352f26f0c6236334f485a6c0e7237eecabc to 141f9a97c16facf1780bf14402915d68d814f354
Branch pushed to git repo; I updated commit sha1. New commits:
91990d0  Fix variable name in StandardPermutations_n.rank exception message

f424a24  Typo fixes in src/sage/combinat/gray_codes.py.

b45d5fb  Update references in src/sage/combinat/gray_codes.py.

9577bdc  Merge branch 'combinat_doc_fixes' into permutations_mset_rank_unrank

141f9a9  Point references in combinat/gray_codes.py to the master bibliography file.

comment:15 in reply to: ↑ 13 ; followup: ↓ 16 Changed 5 months ago by
 Dependencies set to #30005
Replying to tscrim:
You merge in the branch from #30005 into this one and set it as a dependency in the ticket.
Thanks for your guidance. I've merged #30005 into this branch and made it a dependency.
Actually, you probably should remove that reference on this ticket (again, after merging in #30005) and use the one in the master reference file.
It was actually the other way around. It is this ticket (#29942) that adds a reference to the master reference file, and #30005 that used a REFERENCES section local to gray_codes.py. In 141f9a9 I edited gray_codes.py to point to the master reference file.
comment:16 in reply to: ↑ 15 Changed 5 months ago by
Replying to dcfifield:
Replying to tscrim:
Actually, you probably should remove that reference on this ticket (again, after merging in #30005) and use the one in the master reference file.
It was actually the other way around. It is this ticket (#29942) that adds a reference to the master reference file, and #30005 that used a REFERENCES section local to gray_codes.py. In 141f9a9 I edited gray_codes.py to point to the master reference file.
The reason I was saying that was because #30005 was positively reviewed (and not clear there should be a dependency relation). However, I am happy with the changes here. I will just wait for the patchbot one more time.
comment:17 Changed 5 months ago by
 Status changed from needs_review to positive_review
Finished testing locally. LGTM.
comment:18 Changed 5 months ago by
 Branch changed from u/dcfifield/permutations_mset_rank_unrank to 141f9a97c16facf1780bf14402915d68d814f354
 Resolution set to fixed
 Status changed from positive_review to closed
The documentation is not formatted correctly. Each time you insert a sentence or TEST between two blocks of tests, it should end with
::
unless there is another sentence just after.see https://doc.sagemath.org/html/en/developer/coding_basics.html#template