Opened 5 months ago

Closed 5 months ago

# More efficient rank and unrank for Permutations_mset

Reported by: Owned by: dcfifield major sage-9.2 combinatorics slelievre David Fifield Frédéric Chapoton, Travis Scrimshaw N/A 141f9a9 (Commits) 141f9a97c16facf1780bf14402915d68d814f354 #30005

The Permutations_mset `rank` and `unrank` methods inherit a generic implementation from `EnumeratedSets` rank and unrank, which have running time proportional to the rank:

```sage: mset = list(range(10)) * 3
sage: p = Permutations(mset)
sage: for e in range(8): timeit('p.unrank({})'.format(10^e), number=1, repeat=5)
1 loop, best of 5: 24.2 μs per loop
1 loop, best of 5: 39.1 μs per loop
1 loop, best of 5: 191 μs per loop
1 loop, best of 5: 1.28 ms per loop
1 loop, best of 5: 12.7 ms per loop
1 loop, best of 5: 126 ms per loop
1 loop, best of 5: 1.27 s per loop
1 loop, best of 5: 13 s per loop
sage: p.rank(list(reversed(sorted(mset))))
... never finishes ...
```

This branch makes the running time roughly proportional to the size of the multiset:

```sage: mset = list(range(10)) * 3
sage: p = Permutations(mset)
sage: for e in range(8): timeit('p.unrank({})'.format(10^e), number=1, repeat=5)
1 loop, best of 5: 1.35 ms per loop
1 loop, best of 5: 1.43 ms per loop
1 loop, best of 5: 1.37 ms per loop
1 loop, best of 5: 1.37 ms per loop
1 loop, best of 5: 2.23 ms per loop
1 loop, best of 5: 2.12 ms per loop
1 loop, best of 5: 1.46 ms per loop
1 loop, best of 5: 1.44 ms per loop
sage: timeit('p.rank(list(reversed(sorted(mset))))', number=1, repeat=5)
1 loop, best of 5: 1.51 ms per loop
```

## Algorithm

The `rank` algorithm is a straightforward application of the solution to exercise 7.2.1.2–4 in The Art of Computer Programming, Volume 4A. I will quote the exercise:

1. Generalizing exercise 3, explain how to compute the rank of a1an with respect to Algorithm L [lexicographic permutation generation] when {a1, …, an} is the multiset {n1·x1, …, nt·xt}; here n1 + … + nt = n and x1 < … < xt. (The total number of permutations is, of course, the multinomial coefficient

multinomial(n1, …, nt) = n! / (n1! … nt!);

see Eq. 5.1.2–(3).) What is the rank of 314159265?

1. Use the recurrence rank(a1an) = 1/n Σj=1…t nj [xj < a1] multinomial(n1, …, nt) + rank(a2an). For example, rank(314159265) is

3/9 multinomial(2, 1, 1, 1, 2, 1, 1) + 0 + 2/7 multinomial(1, 1, 1, 2, 1, 1) + 0 + 1/5 multinomial(1, 2, 1, 1) + 3/4 multinomial(1, 1, 1, 1) + 0 + 1/2 multinomial(1, 1) = 30991.

The `unrank` algorithm is also based on this recurrence; however the application is not so straightforward. I did not find an explicit statement of an unranking algorithm elsewhere, and the algorithm I came up with isn't particularly elegant. I'd appreciate having another opinion on it or a pointer to some other reference. The main idea is that given a rank r and a multiset m = {n1·x1, …, nt·xt}, we know that r = c/n multinomial(n1, …, nt) + rank(m\c), where c is the number of elements in m that are strictly less than the first element of the output sequence, and m\c stands for m with one instance of the cth smallest element removed.

## Notes/caveats

The `rank` method is actually independent of the `Permutations_mset` object it is attached to. Other than the `self(p).check()` line, it never refers to `self`. It could be defined in terms of a standalone function, as StandardPermutations_n.unrank is defined in terms of from_rank.

I am aware of two ways in which this branch differs in behavior from what is there now. They both seem to me to be because of bugs in the `EnumeratedSets` fallback implementations, but I'll document them here in case complete compatibility is needed.

The first is that the existing rank method returns `None` when passed a list or a tuple, because it uses `==` comparison with an `element_class` type (I'm not sure how that works exactly, but at any rate it doesn't compare equal.) To get `rank` to return something other than `None`, you have to pass in a permutation produced by the `Permutations_mset` object itself.

```sage: mset = [1, 1, 2, 3, 4, 5, 5, 6, 9]
sage: p = Permutations(mset)
sage: p.rank(mset)
sage: p.rank([1, 1, 2, 3, 4, 5, 5, 6, 9])
sage: p.unrank(3)
[1, 1, 2, 3, 4, 5, 6, 9, 5]
sage: type(p.unrank(3))
<class 'sage.combinat.permutation.Permutations_mset_with_category.element_class'>
sage: p.rank(p.unrank(3))
3
```

In this branch, the value passed to `rank` doesn't have to be an `element_class`.

```sage: mset = [1, 1, 2, 3, 4, 5, 5, 6, 9]
sage: p = Permutations(mset)
sage: p.rank(mset)
0
sage: p.rank([1, 1, 2, 3, 4, 5, 5, 6, 9])
0
sage: p.unrank(3)
[1, 1, 2, 3, 4, 5, 6, 9, 5]
sage: type(p.unrank(3))
<class 'list'>
sage: p.rank(p.unrank(3))
3
```

The other way in which this branch changes behavior is that the current code does not canonicalize the multiset used to create the `Permutations_mset` object. `Permutations([1, 1, 2, 3, 4])` and `Permutations([3, 2, 1, 4, 1])` visit the same permutations but in a different order; the latter acts like an already partially enumerated version of the former:

```sage: Permutations([1, 1, 2, 3, 4]).unrank(0)
[1, 1, 2, 3, 4]
sage: Permutations([3, 2, 1, 4, 1]).unrank(0)
[3, 2, 1, 1, 4]
```

In this branch, the order of the input multiset does not make a difference:

```sage: Permutations([1, 1, 2, 3, 4]).unrank(0)
[1, 1, 2, 3, 4]
sage: Permutations([3, 2, 1, 4, 1]).unrank(0)
[1, 1, 2, 3, 4]
```

It would be possible to work around this canonicalization, if necessary, by computing the rank of the input multiset on creation, and using it as an offset for future calls to `unrank`.

I was working on a project that needed fast unranking of a multiset, when I found that Sage could not handle a multiset of the size I needed. I came up with an algorithm that worked well enough for my purposes, however its ranking order is not lexicographic. The algorithms in this branch are the result of a little more research to find a lexicographic algorithm.

### comment:1 Changed 5 months ago by dcfifield

• Status changed from new to needs_review

### comment:2 follow-up: ↓ 4 Changed 5 months ago by chapoton

The documentation is not formatted correctly. Each time you insert a sentence or TEST between two blocks of tests, it should end with `::` unless there is another sentence just after.

### comment:3 Changed 5 months ago by git

• Commit changed from a59f51403d4d2e35533dfb5c017861b8e0ff8827 to 9b9d936081bc4b04d5476ce4c4e54af0997a9295

Branch pushed to git repo; I updated commit sha1. New commits:

 ​9b9d936 `Fix doc formatting of Permutations_mset rank and unrank.`

### comment:4 in reply to: ↑ 2 Changed 5 months ago by dcfifield

The documentation is not formatted correctly. Each time you insert a sentence or TEST between two blocks of tests, it should end with `::` unless there is another sentence just after.

Thanks. I fixed the formatting in 9b9d936. I can see the difference with `sage --docbuild --include-tests-blocks`.

### comment:5 Changed 5 months ago by slelievre

• Description modified (diff)

### comment:6 follow-up: ↓ 8 Changed 5 months ago by tscrim

A quick comment for testing coverage. I would add an exhaustive test on something `X` of reasonable size such that

```for i, p in enumerate(X):
assert X.rank(p) == i
assert X.unrank(i) == p
```

### comment:7 Changed 5 months ago by git

• Commit changed from 9b9d936081bc4b04d5476ce4c4e54af0997a9295 to a4adb8a15db91dfdec4d9fe43e3826397c47e607

Branch pushed to git repo; I updated commit sha1. New commits:

 ​a4adb8a `Do an exhaustive rank/unrank roundtrip test.`

### comment:8 in reply to: ↑ 6 Changed 5 months ago by dcfifield

A quick comment for testing coverage. I would add an exhaustive test on something `X` of reasonable size

I did so in 4adb8a over the multiset `[2, 2, 3, 3, 3, 5, 5, 5, 5, 5]`, which has 2520 permutations. (I wasn't sure what a reasonable size for a test case would be.) In the same loop, I added a check for lexicographic ordering. I had to convert the permutations from a `list` to a `tuple` to facilitate comparison.

### comment:9 follow-up: ↓ 12 Changed 5 months ago by tscrim

• Reviewers set to Frédéric Chapoton, Travis Scrimshaw

I think that is a reasonable size; thank you.

I just have three more minor changes:

```-        - ``p`` -- a permutation of `M`.
+        - ``p`` -- a permutation of `M`
```
```         - ``r`` -- an integer between ``0`` and ``self.cardinality()-1``
-          inclusive.
+          inclusive
```

Update the Knuth reference based off #30005.

Something that is stylistic, so it is up to you if you want to change it, but I would change the

```-Some sentence explanation. ::
+Some sentence explanation::
```

to have it end each block with a colon as the code is better associated with the sentence. However, that is something you can ignore.

Frédéric, do you have any other comments?

### comment:10 Changed 5 months ago by chapoton

ok, looks good to me too. The patchbot warning are false-positive. Once Travis suggestions are fixed, this can be set to positive and will be a very nice improvement.

### comment:11 Changed 5 months ago by git

• Commit changed from a4adb8a15db91dfdec4d9fe43e3826397c47e607 to a694a352f26f0c6236334f485a6c0e7237eecabc

Branch pushed to git repo; I updated commit sha1. New commits:

 ​35b6e19 `Update a variable reference in a comment.` ​a694a35 `Doc improvements in Permutations_mset rank and unrank.`

### comment:12 in reply to: ↑ 9 ; follow-up: ↓ 13 Changed 5 months ago by dcfifield

```-        - ``p`` -- a permutation of `M`.
+        - ``p`` -- a permutation of `M`
```
```         - ``r`` -- an integer between ``0`` and ``self.cardinality()-1``
-          inclusive.
+          inclusive
```
```-Some sentence explanation. ::
+Some sentence explanation::
```

I made these changes in a694a35.

Update the Knuth reference based off #30005.

It seems the right way to handle this is to remove the REFERENCES section from gray_codes.py, containing the [Knuth-TAOCP4A] reference used in that file only, and update [Knuth-TAOCP4A] references to point to the new [Knu2011] in references/index.rst. But #30005 is not merged yet and [Knu2011] does not exist in its branch; while modifying gray_codes.py in this branch may prevent #30005 from merging cleanly. How should I handle this?

### comment:13 in reply to: ↑ 12 ; follow-up: ↓ 15 Changed 5 months ago by tscrim

Update the Knuth reference based off #30005.

It seems the right way to handle this is to remove the REFERENCES section from gray_codes.py, containing the [Knuth-TAOCP4A] reference used in that file only, and update [Knuth-TAOCP4A] references to point to the new [Knu2011] in references/index.rst. But #30005 is not merged yet and [Knu2011] does not exist in its branch; while modifying gray_codes.py in this branch may prevent #30005 from merging cleanly. How should I handle this?

You merge in the branch from #30005 into this one and set it as a dependency in the ticket. Actually, you probably should remove that reference on this ticket (again, after merging in #30005) and use the one in the master reference file.

### comment:14 Changed 5 months ago by git

• Commit changed from a694a352f26f0c6236334f485a6c0e7237eecabc to 141f9a97c16facf1780bf14402915d68d814f354

Branch pushed to git repo; I updated commit sha1. New commits:

 ​91990d0 `Fix variable name in StandardPermutations_n.rank exception message` ​f424a24 `Typo fixes in src/sage/combinat/gray_codes.py.` ​b45d5fb `Update references in src/sage/combinat/gray_codes.py.` ​9577bdc `Merge branch 'combinat_doc_fixes' into permutations_mset_rank_unrank` ​141f9a9 `Point references in combinat/gray_codes.py to the master bibliography file.`

### comment:15 in reply to: ↑ 13 ; follow-up: ↓ 16 Changed 5 months ago by dcfifield

• Dependencies set to #30005

You merge in the branch from #30005 into this one and set it as a dependency in the ticket.

Thanks for your guidance. I've merged #30005 into this branch and made it a dependency.

Actually, you probably should remove that reference on this ticket (again, after merging in #30005) and use the one in the master reference file.

It was actually the other way around. It is this ticket (#29942) that adds a reference to the master reference file, and #30005 that used a REFERENCES section local to gray_codes.py. In 141f9a9 I edited gray_codes.py to point to the master reference file.

### comment:16 in reply to: ↑ 15 Changed 5 months ago by tscrim

Actually, you probably should remove that reference on this ticket (again, after merging in #30005) and use the one in the master reference file.

It was actually the other way around. It is this ticket (#29942) that adds a reference to the master reference file, and #30005 that used a REFERENCES section local to gray_codes.py. In 141f9a9 I edited gray_codes.py to point to the master reference file.

The reason I was saying that was because #30005 was positively reviewed (and not clear there should be a dependency relation). However, I am happy with the changes here. I will just wait for the patchbot one more time.

### comment:17 Changed 5 months ago by tscrim

• Status changed from needs_review to positive_review

Finished testing locally. LGTM.

### comment:18 Changed 5 months ago by vbraun

• Branch changed from u/dcfifield/permutations_mset_rank_unrank to 141f9a97c16facf1780bf14402915d68d814f354
• Resolution set to fixed
• Status changed from positive_review to closed
Note: See TracTickets for help on using tickets.