Opened 4 years ago

Closed 4 years ago

#25462 closed enhancement (fixed)

make SetPartition much faster

Reported by: mantepse
Owned by:
Priority: major
Milestone: sage-8.4
Component: combinatorics
Keywords:
Cc: alauve, tscrim, zabrocki
Merged in:
Authors: Martin Rubey, Travis Scrimshaw
Reviewers: Mike Zabrocki, Travis Scrimshaw
Report Upstream: N/A
Work issues:
Branch: ba6a115 (Commits, GitHub, GitLab)
Commit: ba6a115f30e3b7db0b972c0ec2bf3f511457dad3
Dependencies: #25497
Stopgaps:


Description (last modified by mantepse)

SetPartition uses, as internal representation, a ClonableArray with each element being a Set. The latter makes the creation and manipulation of set partitions very slow.

Moreover, the current iterator over set partitions is much slower than necessary.

For example, generating a list of the set partitions of a 7-element set currently takes about 3.3 seconds. Replacing the iterator with a straightforward algorithm from Ruskey's book brings this down to 470 ms; removing the call to Set, down to 16 ms.
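
For concreteness, the kind of iterative enumeration meant here can be sketched in plain Python via restricted growth strings (a standard technique; this is an illustration, not the branch's actual code):

```python
def set_partitions(n):
    """Yield the set partitions of {1, ..., n} as lists of blocks,
    by enumerating restricted growth strings in lexicographic order.
    A sketch of the idea, not the ticket's implementation."""
    if n == 0:
        yield []
        return
    a = [0] * n  # a[i] is the block index of element i + 1
    m = [0] * n  # m[i] = max(a[0], ..., a[i]), maintained incrementally
    while True:
        blocks = [[] for _ in range(m[-1] + 1)]
        for i, j in enumerate(a):
            blocks[j].append(i + 1)
        yield blocks
        # find the rightmost position that may still be incremented:
        # a[i] may grow as long as a[i] <= m[i-1]
        i = n - 1
        while i > 0 and a[i] > m[i - 1]:
            i -= 1
        if i == 0:
            return
        a[i] += 1
        m[i] = max(m[i - 1], a[i])
        for j in range(i + 1, n):  # reset the tail
            a[j] = 0
            m[j] = m[i]
```

For n = 7 this yields all 877 set partitions essentially instantly, since no Set objects are constructed along the way.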

Change History (85)

comment:1 Changed 4 years ago by mantepse

  • Authors set to Martin Rubey
  • Component changed from PLEASE CHANGE to combinatorics
  • Description modified (diff)
  • Type changed from PLEASE CHANGE to enhancement

comment:2 Changed 4 years ago by mantepse

The first question is about the internal representation. Options are:

  • a ClonableArray of frozensets. This gives reasonable performance, with hardly any changes in library code.
  • a ClonableArray of sorted tuples. This should give much better performance, and is more natural in many situations. However, it may be surprising to users.
  • a totally ordered base set, together with a restricted growth function.
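
To illustrate the third option, a minimal plain-Python sketch of converting between a restricted growth function and the list of blocks (hypothetical helper names, not Sage code):

```python
def rgf_to_blocks(w):
    """Convert a restricted growth function w (w[0] == 0 and
    w[i] <= max(w[:i]) + 1) on {1, ..., len(w)} into blocks."""
    blocks = []
    for i, b in enumerate(w, start=1):
        if b == len(blocks):   # a new block opens at its minimum
            blocks.append([])
        blocks[b].append(i)
    return blocks

def blocks_to_rgf(blocks):
    """Inverse: label each element by the index of its block,
    with blocks ordered by their minimal element."""
    blocks = sorted(blocks, key=min)
    w = [0] * sum(len(b) for b in blocks)
    for j, block in enumerate(blocks):
        for x in block:
            w[x - 1] = j
    return w
```

For example, `[0, 0, 1, 0, 2]` corresponds to `{{1, 2, 4}, {3}, {5}}`.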

comment:3 Changed 4 years ago by mantepse

  • Branch set to u/mantepse/make_setpartition_much_faster

comment:4 Changed 4 years ago by git

  • Commit set to 5591b40f3375a1ea6d645adfb5dcfe6cf0c87fa9

Branch pushed to git repo; I updated commit sha1. New commits:

5591b40 add faster iterator

comment:5 Changed 4 years ago by tscrim

I would expect that sorted tuples give much worse performance generically when things get large because of the sorting. I have no problem with set partitions being tuples that represent sets, but I would think the code would be more natural/intuitive/readable by using frozensets.

Quick comments:

-len(z.intersection(p)) != 0
+z.intersection(p)

is a faster test for non-emptiness.
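
A small self-contained check of the equivalence (plain Python; the `isdisjoint` variant is an extra observation, not part of the ticket):

```python
z = frozenset([1, 2, 3])
p = frozenset([3, 4])
q = frozenset([7, 8])

# the truthiness test is equivalent to the explicit length test
assert bool(z.intersection(p)) == (len(z.intersection(p)) != 0)
assert bool(z.intersection(q)) == (len(z.intersection(q)) != 0)

# isdisjoint can be cheaper still: it may stop at the first common
# element instead of materializing the whole intersection set
assert (not z.isdisjoint(p)) is True
assert (not z.isdisjoint(q)) is False
```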

I don't like going to a recursive algorithm; Python's recursion limit gets in the way of computing large examples. Also, why is this iterator faster? Is it because it does not create so many transient (Sage) objects? I think it would be better to explicitly implement a backtracking version instead of the recursion (you will also get fewer transient objects and could do this in Cython for speed).

In your from_word, you could create sets, which are more efficient to convert to a frozenset:

sage: X = set([1,2,3])
sage: L = [1,2,3]
sage: %timeit frozenset(X)
10000000 loops, best of 3: 128 ns per loop
sage: %timeit frozenset(L)
1000000 loops, best of 3: 193 ns per loop

sage: X = set(range(12))
sage: L = list(range(12))
sage: %timeit frozenset(X)
1000000 loops, best of 3: 269 ns per loop
sage: %timeit frozenset(L)
1000000 loops, best of 3: 399 ns per loop

comment:6 Changed 4 years ago by tscrim

Also, likely only you would be able to work with a set partition if it were internally given by a restricted growth function. So that is a non-starter for me.

comment:7 Changed 4 years ago by git

  • Commit changed from 5591b40f3375a1ea6d645adfb5dcfe6cf0c87fa9 to 43be53b1dd431d86f7adb2bd73915a81393c4177

Branch pushed to git repo; I updated commit sha1. New commits:

43be53b alternative, non-recursive iterator

comment:8 follow-up: Changed 4 years ago by mantepse

Dear Travis! Thank you very much for your comments!

I have now included an alternative iterator, due to Knuth. It is a little faster and non-recursive. However, I find it quite a bit more difficult to understand.

Please let me know whether you prefer it.

In your from_word, you could create sets, which are more efficient to convert to a frozenset

In principle this is true; however, I have to add elements (which I know to be distinct, but cannot take advantage of this) to each of the blocks all the time. It turns out that appending to lists and then converting to a frozenset is (marginally) faster.
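
The trade-off can be sketched with two hypothetical variants of a from_word-style helper (not Sage's actual from_word):

```python
def from_word_lists(w):
    # grow each block as a plain list; freeze only once at the end
    blocks = [[] for _ in range(max(w) + 1)]
    for i, b in enumerate(w, start=1):
        blocks[b].append(i)
    return [frozenset(B) for B in blocks]

def from_word_frozensets(w):
    # the alternative: rebuild a frozenset on every single insertion
    blocks = [frozenset() for _ in range(max(w) + 1)]
    for i, b in enumerate(w, start=1):
        blocks[b] = blocks[b] | {i}
    return blocks

# both produce the same blocks; the first avoids repeated
# construction of immutable sets during the scan
assert from_word_lists([0, 0, 1, 0, 2]) == from_word_frozensets([0, 0, 1, 0, 2])
```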

I don't like going to a recursive algorithm; Python's recursion limit gets in the way of computing large examples. Also, why is this iterator faster? Is it because it does not create so many transient (Sage) objects? I think it would be better to explicitly implement a backtracking version instead of the recursion (you will also get fewer transient objects and could do this in Cython for speed).

I think that one reason - apart from using OrderedSetPartitions - is that the original iterator does something much more complicated: you can prescribe the sizes of the blocks, too. I can have a short look at whether we gain much by changing the internal representation of OrderedSetPartition. I must admit that I have not yet understood the iterator.

I have also implemented (but not yet committed) a (recursive) iterator for generating set partitions into a given number of blocks. I'm afraid I do not have a non-recursive version of that one.

If possible, could you give a use case for iterating over set partitions (into k blocks) where we hit the recursion limit? When the number of blocks is unrestricted, I am hitting it for set partitions of a 983-element set.
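
The 983-element figure is consistent with a one-frame-per-element recursion against CPython's default limit; a minimal plain-Python demonstration (assuming the historical default of 1000 frames):

```python
import sys

sys.setrecursionlimit(1000)  # CPython's historical default

def depth(n):
    # one Python stack frame per level, like a recursive iterator
    # that descends one level per element of the base set
    return 0 if n == 0 else 1 + depth(n - 1)

assert depth(500) == 500       # well below the limit: fine
try:
    depth(2000)                # ~2000 nested frames: overflows
    overflowed = False
except RecursionError:
    overflowed = True
assert overflowed
```

The interpreter's own frames eat into the budget, which is why the failure shows up slightly below 1000 elements.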

comment:9 in reply to: ↑ 8 Changed 4 years ago by tscrim

Replying to mantepse:

I have now included an alternative iterator, due to Knuth. It is a little faster and non-recursive. However, I find it quite a bit more difficult to understand.

Please let me know whether you prefer it.

Yes, I do. Thank you.

I don't like going to a recursive algorithm; Python's recursion limit gets in the way of computing large examples. Also, why is this iterator faster? Is it because it does not create so many transient (Sage) objects? I think it would be better to explicitly implement a backtracking version instead of the recursion (you will also get fewer transient objects and could do this in Cython for speed).

I think that one reason - apart from using OrderedSetPartitions - is that the original iterator does something much more complicated: you can prescribe the sizes of the blocks, too. I can have a short look at whether we gain much by changing the internal representation of OrderedSetPartition. I must admit that I have not yet understood the iterator.

I have also implemented (but not yet committed) a (recursive) iterator for generating set partitions into a given number of blocks. I'm afraid I do not have a non-recursive version of that one.

I don't think that would be any worse than the current version. From a quick test, I get a recursion depth error for this:

sage: SP = SetPartitions(5000, Partition([5]*1000))
sage: it = iter(SP)
sage: it.next()

At one point I understood how OrderedSetPartitions iterator worked, but I have forgotten.

If possible, could you give a use case for iterating over set partitions (into k blocks) where we hit the recursion limit? When the number of blocks is unrestricted, I am hitting it for set partitions of a 983-element set.

With combinatorial objects, we can sometimes test conjectures on very large objects. Say you want to test the first million set partitions of 5000 with 1000 parts. Basically it becomes an artificial limitation; plus, recursion is relatively slow.
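
That workflow can be sketched with itertools.islice, using a cheap infinite stream as a stand-in for a huge combinatorial iterator (the helper name is hypothetical):

```python
from itertools import count, islice

def check_first(stream, predicate, n):
    """Test a conjecture on the first n objects of a lazily produced
    stream, never materializing the astronomically large full set."""
    return all(predicate(x) for x in islice(stream, n))

# an infinite stream of integers stands in for a set-partition
# iterator; a non-recursive iterator makes this work at any depth
assert check_first(count(1), lambda k: k > 0, 10**6)
```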

comment:10 Changed 4 years ago by mantepse

Great!

With combinatorial objects, we can sometimes test conjectures on very large objects. Say you want to test the first million set partitions of 5000 with 1000 parts. Basically it becomes an artificial limitation; plus, recursion is relatively slow.

OK, but for that purpose I have already implemented random generation :-) - hm, not for set partitions with given block sizes though :-(

comment:11 Changed 4 years ago by git

  • Commit changed from 43be53b1dd431d86f7adb2bd73915a81393c4177 to 7a2ba0a42dd2230f8beaa51cde386c059a5d02bc

Branch pushed to git repo; I updated commit sha1. New commits:

7a2ba0a add faster iterator for set partitions with given number of blocks

comment:12 Changed 4 years ago by git

  • Commit changed from 7a2ba0a42dd2230f8beaa51cde386c059a5d02bc to d01f25dd69aa1788536c75bd6e8fa917548e075c

Branch pushed to git repo; I updated commit sha1. New commits:

d01f25d adapt doctests

comment:13 Changed 4 years ago by mantepse

  • Status changed from new to needs_review

comment:14 Changed 4 years ago by git

  • Commit changed from d01f25dd69aa1788536c75bd6e8fa917548e075c to eca8497a21a52fea2b29e1a7eac947ea9e6f0654

Branch pushed to git repo; I updated commit sha1. New commits:

eca8497 fix more doctests

comment:15 Changed 4 years ago by mantepse

I am not sure about

sage: Set([1]) == frozenset([1])
False
sage: Set([1]) == set([1])
False
sage: set([1]) == frozenset([1])
True

Shouldn't that always be true?

comment:16 Changed 4 years ago by tscrim

I am inclined to say yes and that is a bug in Set.

comment:17 Changed 4 years ago by mantepse

See #25497

comment:18 Changed 4 years ago by mantepse

  • Dependencies set to 25497

comment:19 Changed 4 years ago by mantepse

  • Dependencies changed from 25497 to #25497

comment:20 Changed 4 years ago by git

  • Commit changed from eca8497a21a52fea2b29e1a7eac947ea9e6f0654 to 75a30435cc07e7414e096e88c14c5af2934a4690

Branch pushed to git repo; I updated commit sha1. New commits:

c27c580 remove an expensive call to Set, adapt some doctests
4ca5dfd revert a call to Set which works after #25496
75a3043 use the fast iterators and correct check

comment:21 follow-up: Changed 4 years ago by mantepse

It turns out that I have another question:

Currently, for two set partitions s and t we have s < t if turning them into lists (sorting each block into increasing order, then the set of blocks by minimal element) makes s come before t in lexicographic order.

This leads to strange results, for example:

sage: it = SetPartitions().__iter__()
sage: sorted(next(it) for x in range(10))
[{},
 {{1}},
 {{1}, {2}},
 {{1}, {2}, {3}},
 {{1}, {2, 3}},
 {{1, 2}},
 {{1, 2}, {3}},
 {{1, 2, 3}},
 {{1, 2, 3, 4}},
 {{1, 3}, {2}}]

Note that {{1},{2,3}} comes before {{1,2}}.

I would rather first sort by size, then by the number of blocks, and finally lexicographically.

Should I do this?
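
The proposed order can be sketched as a plain-Python sort key over set partitions given as collections of blocks (a hypothetical helper, not the ticket's code):

```python
def sp_key(sp):
    """Proposed order: size of the ground set first, then number of
    blocks, then lexicographic order on the standard list form
    (each block sorted, blocks ordered by their minimum)."""
    blocks = sorted((sorted(b) for b in sp), key=lambda b: b[0])
    return (sum(len(b) for b in blocks), len(blocks), blocks)

sps = [[(1,), (2, 3)], [(1, 2)], [], [(1,)]]
sps.sort(key=sp_key)
# now {{1, 2}} comes before {{1}, {2, 3}}, since size decides first
assert sps == [[], [(1,)], [(1, 2)], [(1,), (2, 3)]]
```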

comment:22 Changed 4 years ago by mantepse

  • Status changed from needs_review to needs_work

comment:23 in reply to: ↑ 21 Changed 4 years ago by tscrim

Replying to mantepse:

It turns out that I have another question:

Currently, for two set partitions s and t we have s < t if turning them into lists (sorting each block into increasing order, then the set of blocks by minimal element) makes s come before t in lexicographic order.

This leads to strange results, for example:

sage: it = SetPartitions().__iter__()
sage: sorted(next(it) for x in range(10))
[{},
 {{1}},
 {{1}, {2}},
 {{1}, {2}, {3}},
 {{1}, {2, 3}},
 {{1, 2}},
 {{1, 2}, {3}},
 {{1, 2, 3}},
 {{1, 2, 3, 4}},
 {{1, 3}, {2}}]

Note that {{1},{2,3}} comes before {{1,2}}.

I would rather first sort by size, then by the number of blocks, and finally lexicographically.

Should I do this?

I think the current behavior is fine, but your choice is natural too. Regardless, that should be a separate ticket.

comment:24 Changed 4 years ago by mantepse

The problem is that the new iterator produces the set partitions in different order, which makes something like 30 doctests fail in diagram_algebras.py. I thought of replacing doctests like

P.basis().list()

with

set(P.basis())

and to make them more robust, but then the sorting order is weird.

comment:25 Changed 4 years ago by git

  • Commit changed from 75a30435cc07e7414e096e88c14c5af2934a4690 to a7be1effe7a106c409958801c5d7875f353bf1e7

Branch pushed to git repo; I updated commit sha1. New commits:

a7be1ef fix (and partially improve) tests in diagram algebras

comment:26 Changed 4 years ago by mantepse

I finally decided to simply adapt the doctests to the new iterator.

comment:27 Changed 4 years ago by mantepse

  • Status changed from needs_work to needs_review

comment:28 Changed 4 years ago by git

  • Commit changed from a7be1effe7a106c409958801c5d7875f353bf1e7 to ce6e63c5506a1e686310fc4fe2661bde4edac41e

Branch pushed to git repo; I updated commit sha1. New commits:

ce6e63c remove unused imports

comment:29 Changed 4 years ago by mantepse

  • Cc alauve tscrim zabrocki added

Dear Aaron and Mike!

since my original motivation was to do some computations with the orbit basis, and this ticket makes these computations a bit faster, maybe you could have a look?

For example (on 8.3.beta6):

sage: P5 = PartitionAlgebra(5, 2, ZZ); O5 = P5.orbit_basis(); O5
Orbit basis of Partition Algebra of rank 5 with parameter 2 over Integer Ring
sage: o1 = O5([[1,2,3,-1,-2,-3], [4,-4],[5,-5]])
sage: o2 = P5([[1,-1],[2],[3],[-2],[-3], [4],[-4],[5,-5]])
sage: o3 = O5([[1,2,-2,-1],[3,-3],[4,-4],[-5,-5]])
sage: %timeit o1*o2*o3
1 loop, best of 3: 24.9 s per loop
sage: 

with this ticket:

sage: %timeit o1*o2*o3
1 loop, best of 3: 1.07 s per loop

comment:30 Changed 4 years ago by zabrocki

  • Reviewers set to Mike Zabrocki
  • Status changed from needs_review to positive_review

I had some programs that checked an identity in the orbit basis that I used as an additional test. The speedup is really good (3m50s vs. 9m4s) and the computation is correct.

comment:31 follow-up: Changed 4 years ago by tscrim

  • Status changed from positive_review to needs_work

So there are a few things that warrant a second look:

  • Changing self.n to self._k. While this is the correct thing to do, this is a backwards incompatible change and requires deprecation (it is public). I think you should have n as a lazy attribute that raises the deprecation warning saying that it will become a method (there needs to be a public way to get this information).
  • x, y = min(a), max(a) is better as x, y = sorted(a) as the latter forces a to have length 2.
  • What about using _iterator_part for returning objects that are not elements of set partition, which do have some overhead?
  • You have changed the behavior of partition_diagrams (at least on Python2) as map returns a list. I think it would be better to keep it as an explicit iterator.
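
A small plain-Python illustration of the unpacking point: sorted-unpacking enforces that the set has exactly two elements, while min/max silently accepts any size:

```python
a = {5, 2}
x, y = sorted(a)            # guarantees exactly two elements, x <= y
assert (x, y) == (2, 5)

b = {1, 2, 3}
x, y = min(b), max(b)       # silently ignores the middle element(s)
assert (x, y) == (1, 3)

try:
    x, y = sorted(b)        # unpacking enforces len(b) == 2
    raised = False
except ValueError:
    raised = True
assert raised
```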

comment:32 in reply to: ↑ 31 ; follow-up: Changed 4 years ago by mantepse

Replying to tscrim:

So there are a few things that warrant a second look:

  • Changing self.n to self._k. While this is the correct thing to do, this is a backwards incompatible change and requires deprecation (it is public). I think you should have n as a lazy attribute that raises the deprecation warning saying that it will become a method (there needs to be a public way to get this information).

Could you please provide a code snippet? I do not know what a lazy attribute is, sorry.

  • x, y = min(a), max(a) is better as x, y = sorted(a) as the latter forces a to have length 2.

Right. In fact, I do not know what would be correct. It did not work with x, y = sorted(a), if I remember correctly - sometimes a has length greater than 2.

  • What about using _iterator_part for returning objects that are not elements of set partition, which do have some overhead?

I have no idea what you mean, please expand. I wouldn't mind speeding up _iterator_part, too, but this looks far more difficult to do.

  • You have changed the behavior of partition_diagrams (at least on Python2) as map returns a list. I think it would be better to keep it as an explicit iterator.

Oh, I didn't know that. Thanks, will do!
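
Since a code snippet was requested: a plain-Python sketch of the deprecation pattern described above. Sage itself would use lazy_attribute and its own deprecation machinery, so the class name and message here are hypothetical stand-ins:

```python
import warnings

class DiagramAlgebraSketch:
    """Keep the public attribute n readable for now, but emit a
    deprecation warning pointing at the replacement."""
    def __init__(self, k):
        self._k = k                 # the new, private name

    @property
    def n(self):
        warnings.warn("the attribute n is deprecated; it will become"
                      " a method", DeprecationWarning, stacklevel=2)
        return self._k

A = DiagramAlgebraSketch(5)
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    value = A.n                     # still works, but warns
assert value == 5
assert caught[0].category is DeprecationWarning
```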

comment:34 Changed 4 years ago by git

  • Commit changed from ce6e63c5506a1e686310fc4fe2661bde4edac41e to 589f13596c3ab0f51fdc8792e5c5b3555a19803b

Branch pushed to git repo; I updated commit sha1. New commits:

589f135 address reviewer's comments: new method number_of_blocks and deprecation of n; partition_diagrams now returns an iterator

comment:35 Changed 4 years ago by mantepse

  • Status changed from needs_work to needs_review

comment:36 Changed 4 years ago by tscrim

I don't care for itertools.imap. It looks kind of ugly, and I am not certain it is Python3 compatible. IMO, it is better to write a loop with a yield.

comment:37 Changed 4 years ago by tscrim

Also, change this class to either ``self`` or something like these set partitions (I prefer the former).

comment:38 Changed 4 years ago by git

  • Commit changed from 589f13596c3ab0f51fdc8792e5c5b3555a19803b to ca293e395e482e3da3fe5e875b296eec4518cc8c

Branch pushed to git repo; I updated commit sha1. New commits:

ca293e3 avoid itertools and modify docstring

comment:39 Changed 4 years ago by git

  • Commit changed from ca293e395e482e3da3fe5e875b296eec4518cc8c to 09bd54239403ac40940557d607f8ad49aecef255

Branch pushed to git repo; I updated commit sha1. New commits:

09bd542 pyflakes fixes

comment:40 Changed 4 years ago by git

  • Commit changed from 09bd54239403ac40940557d607f8ad49aecef255 to 823769cb02435497113d13b0e3b22e3b9cd218bd

Branch pushed to git repo; I updated commit sha1. New commits:

823769c Revert "pyflakes fixes"

comment:41 in reply to: ↑ 32 ; follow-up: Changed 4 years ago by tscrim

Replying to mantepse:

Replying to tscrim:

  • x, y = min(a), max(a) is better as x, y = sorted(a) as the latter forces a to have length 2.

Right. In fact, I do not know what would be correct. It did not work with x, y = sorted(a), if I remember correctly - sometimes a has length greater than 2.

Really? That smells like a bug. Either that or the original code had a bug and you need to add a doctest showing it is fixed.

  • What about using _iterator_part for returning objects that are not elements of set partition, which do have some overhead?

I have no idea what you mean, please expand. I wouldn't mind speeding up _iterator_part, too, but this looks far more difficult to do.

Your iterator creates an actual element of the parent, which has overhead. I don't see why you are not rewriting _iterator_part with the new algorithm and using that.

Also please force-push this branch back to the earlier commit to remove 09bd542.

comment:42 in reply to: ↑ 41 ; follow-up: Changed 4 years ago by mantepse

Replying to tscrim:

Replying to mantepse:

Replying to tscrim:

  • x, y = min(a), max(a) is better as x, y = sorted(a) as the latter forces a to have length 2.

Right. In fact, I do not know what would be correct. It did not work with x, y = sorted(a), if I remember correctly - sometimes a has length greater than 2.

Really? That smells like a bug. Either that or the original code had a bug and you need to add a doctest showing it is fixed.

No, the original code took the first and the second element of the Set a; I take, for simplicity, the smallest and the largest.

  • What about using _iterator_part for returning objects that are not elements of set partition, which do have some overhead?

I have no idea what you mean, please expand. I wouldn't mind speeding up _iterator_part, too, but this looks far more difficult to do.

Your iterator creates an actual element of the parent, which has overhead. I don't see why you are not rewriting _iterator_part with the new algorithm and using that.

I do not understand. SetPartitions._iterator_part is very slow, and I have no new algorithm for it, and it is not completely trivial to improve it.

Also please force-push this branch back to the earlier commit to remove 09bd542.

How do I do this?

comment:43 in reply to: ↑ 42 Changed 4 years ago by zabrocki

Right. In fact, I do not know what would be correct. It did not work with x, y = sorted(a), if I remember correctly - sometimes a has length greater than 2.

Most of the time the atoms_of_congruence_lattice have parts of size greater than 2 (in fact, testing a number of lattices, they are often just one block), and this algorithm returns two elements from those blocks. You might be able to speed this up in some cases by turning a into a list and taking the first and second elements, but this doesn't seem like it would be more efficient in general.

  • What about using _iterator_part for returning objects that are not elements of set partition, which do have some overhead?

I have no idea what you mean, please expand. I wouldn't mind speeding up _iterator_part, too, but this looks far more difficult to do.

Your iterator creates an actual element of the parent, which has overhead. I don't see why you are not rewriting _iterator_part with the new algorithm and using that.

I checked and _iterator_part is returning a set of sets. What do you suggest it should return?

sage: SetPartitions(6)._iterator_part(Partition([3,2,1])).next().parent()
<class 'sage.sets.set.Set_object_enumerated_with_category'>

Also please force-push this branch back to the earlier commit to remove 09bd542.

How do I do this?

Reading here: https://stackoverflow.com/questions/34519665/how-to-move-head-back-to-a-previous-location-detached-head/34519716#34519716

git reset --hard ca293e3

comment:44 Changed 4 years ago by git

  • Commit changed from 823769cb02435497113d13b0e3b22e3b9cd218bd to ca293e395e482e3da3fe5e875b296eec4518cc8c

Branch pushed to git repo; I updated commit sha1. This was a forced push. New commits:

comment:45 follow-up: Changed 4 years ago by mantepse

The patchbot produces a different an_element than my machine :-(

comment:46 Changed 4 years ago by git

  • Commit changed from ca293e395e482e3da3fe5e875b296eec4518cc8c to 9c5298df63011683acac988f48c7b02990ea5379

Branch pushed to git repo; I updated commit sha1. New commits:

9c5298d make doctests more independent of ordering

comment:47 Changed 4 years ago by mantepse

I hope this is better now...

comment:48 Changed 4 years ago by mantepse

The patchbots now only report errors unrelated to this ticket. Please review!

comment:49 Changed 4 years ago by zabrocki

  • Branch changed from u/mantepse/make_setpartition_much_faster to public/25462
  • Commit changed from 9c5298df63011683acac988f48c7b02990ea5379 to ce2962b001f088e28a2e568334c5d4daf58a5f32

I agree that the an_element doctests seemed to be machine dependent, but I restored some of the doctests which were not linear combinations with coefficient 1.


Last 10 new commits:

4ca5dfd revert a call to Set which works after #25496
75a3043 use the fast iterators and correct check
a7be1ef fix (and partially improve) tests in diagram algebras
ce6e63c remove unused imports
589f135 address reviewer's comments: new method number_of_blocks and deprecation of n; partition_diagrams now returns an iterator
ca293e3 avoid itertools and modify docstring
594ace9 Merge commit 'ca293e3' into public/25462
9c5298d make doctests more independent of ordering
91e8d5c Merge commit '9c5298d' into public/25462
ce2962b restore richer doc tests

comment:50 Changed 4 years ago by mantepse

Thank you! Looks OK to me, except that I am slightly sceptical about the S.an_element :-)

comment:51 Changed 4 years ago by zabrocki

  • Status changed from needs_review to positive_review

Tests pass on my machine.

comment:52 Changed 4 years ago by tscrim

  • Status changed from positive_review to needs_info

I still have some comments, but I have not yet found the time to give them. I will try to do that by tomorrow.

comment:53 follow-up: Changed 4 years ago by tscrim

Oh, I see, the iterator is for all set partitions, not with a fixed set size. However, I think it would be better to have a separate iterator that returns the lists of frozensets for those things that do not need a real SetPartition. I also think _iterator_parts should be consistent and return lists of frozensets. Would it also be possible to modify the Knuth algorithm for _iterator_parts?

This is horrible code:

            try:
                tst = frozenset(e for B in self._base_diagram for e in B)
                if tst != self.parent()._set:
                    raise TypeError
            except TypeError:

You should not dictate program flow by raising an error. Just let the fact that B is non-iterable propagate up; you do not need to have the "perfect" error message.

This change

  • src/sage/groups/perm_gps/symgp_conjugacy_class.py

    diff --git a/src/sage/groups/perm_gps/symgp_conjugacy_class.py b/src/sage/groups/perm_gps/symgp_conjugacy_class.py
    index 3b55bcf..5e28f8d 100644
    @@ def conjugacy_class_iterator(part, S=None):

         m = len(part)
         for s in SetPartitions(S, part):
    -        firsts = [t[0] for t in s]
    -        rests = [t[1:] for t in s]
    +        blocks = map(Set, s)
    +        firsts = [t[0] for t in blocks]
    +        rests = [t[1:] for t in blocks]
             iterator = tuple(itertools.permutations(r) for r in rests)
             for r in itertools.product(*iterator):
                 yield [(firsts[i],) + r[i] for i in range(m)]

It seems like it would be better doing:

its = [iter(t) for t in s]
firsts = [next(i) for i in its]
rests = [list(i) for i in its]
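
The suggested idiom, made concrete on a small example with blocks given as sorted tuples:

```python
# split each block into its first element and the remaining elements
# in a single pass per block, avoiding repeated indexing and slicing
s = [(1, 4, 5), (2, 3), (6,)]
its = [iter(t) for t in s]
firsts = [next(i) for i in its]
rests = [list(i) for i in its]
assert firsts == [1, 2, 6]
assert rests == [[4, 5], [3], []]
```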

Try to keep things at 80 chars/line, specifically here:

        There is a natural embedding into partition algebras on more elements, by adding identity strands::

Why are you not using an_element()?

Why this change:

  • src/sage/combinat/tableau.py

    diff --git a/src/sage/combinat/tableau.py b/src/sage/combinat/tableau.py
    index d5f24b8..c00e508 100644
    @@ class StandardTableaux_size(StandardTableaux, DisjointUnionEnumeratedSets):

         EXAMPLES::

    -        sage: StandardTableaux(5).random_element() # random
    -        [[1, 4, 5], [2], [3]]
    +        sage: StandardTableaux(10).random_element() # random
    +        [[1, 3, 6], [2, 5, 7], [4, 8], [9], [10]]
             sage: StandardTableaux(0).random_element()
             []
             sage: StandardTableaux(1).random_element()

Why are you needlessly unrolling this:

@@ -544,7 +546,9 @@ class SetPartition(AbstractSetPartition):
             {}
         """
         self._latex_options = {}
-        ClonableArray.__init__(self, parent, sorted(map(Set, s), key=min), check=check)
+        sets = map(frozenset, s)
+        blocks = sorted(sets, key=min)
+        ClonableArray.__init__(self, parent, blocks, check=check)
 
     def check(self):
         """

This makes it seem less like one logical unit.

More towards bikeshedding, but I would appreciate it if you could regroup some of the output so that it is shorter in terms of number of lines.

comment:54 Changed 4 years ago by git

  • Commit changed from ce2962b001f088e28a2e568334c5d4daf58a5f32 to ba08ff35c1aec6cd454219653e2394fd256f396f

Branch pushed to git repo; I updated commit sha1. New commits:

ba08ff3 reviewer's comments

comment:55 Changed 4 years ago by git

  • Commit changed from ba08ff35c1aec6cd454219653e2394fd256f396f to 95ce20f8b875cbb2cc8848b09120544a895612b5

Branch pushed to git repo; I updated commit sha1. New commits:

95ce20f provide iterators which return lists of lists

comment:56 in reply to: ↑ 53 ; follow-up: Changed 4 years ago by mantepse

Replying to tscrim:

I think it would be better to have a separate iterator that returns the lists of frozensets for those things that do not need a real SetPartition.

I provided such a method, but it is not used. It looks a bit like premature optimization to me right now.

I also think _iterator_parts should be consistent and return lists of frozensets. Would it also be possible to modify the Knuth algorithm for _iterator_parts?

This is on my todo list, but on a separate ticket. Currently, I would be transforming Set (from OrderedSetPartition) into frozenset, which is a bit backwards.

Why are you not using an_element()?

Because it turned out to be machine dependent.

Why this change:

  • src/sage/combinat/tableau.py

    diff --git a/src/sage/combinat/tableau.py b/src/sage/combinat/tableau.py
    index d5f24b8..c00e508 100644
    @@ class StandardTableaux_size(StandardTableaux, DisjointUnionEnumeratedSets):

         EXAMPLES::

    -        sage: StandardTableaux(5).random_element() # random
    -        [[1, 4, 5], [2], [3]]
    +        sage: StandardTableaux(10).random_element() # random
    +        [[1, 3, 6], [2, 5, 7], [4, 8], [9], [10]]
             sage: StandardTableaux(0).random_element()
             []
             sage: StandardTableaux(1).random_element()

Because the bug turned up only for tableaux of size 10.

Why are you needlessly unrolling this:

@@ -544,7 +546,9 @@ class SetPartition(AbstractSetPartition):
             {}
         """
         self._latex_options = {}
-        ClonableArray.__init__(self, parent, sorted(map(Set, s), key=min), check=check)
+        sets = map(frozenset, s)
+        blocks = sorted(sets, key=min)
+        ClonableArray.__init__(self, parent, blocks, check=check)
 
     def check(self):
         """

because it is much easier to trace through and, at least for me, easier to understand. If you insist, I'll change it.

More towards bikeshedding, but I would appreciate it if you could regroup some of the output so that it is shorter in terms of number of lines.

There is a reason to have some of the output as it is: it is quite possible that the order in which the set partitions appear changes again. Then the diff will be much easier to check.

Thanks for your careful look! I modified all other things as you suggested.

comment:57 Changed 4 years ago by mantepse

  • Status changed from needs_info to needs_review

comment:58 in reply to: ↑ 56 Changed 4 years ago by tscrim

Replying to mantepse:

Replying to tscrim:

I think it would be better to have a separate iterator that returns the lists of frozensets for those things that do not need a real SetPartition.

I provided such a method, but it is not used. It looks a bit like premature optimization to me right now.

That is a bit unfair, as it is something that could easily be used in places where one iterates over all set partitions but a SetPartition is not the actual returned object.

I also think _iterator_parts should be consistent and return lists of frozensets. Would it also be possible to modify the Knuth algorithm for _iterator_parts?

This is on my todo list, but on a separate ticket.

That is fine.

Currently, I would be transforming Set (from OrderedSetPartition) into frozenset, which is a bit backwards.

Huh? Isn't that what we are doing here?

Why are you not using an_element()?

Because it turned out to be machine dependent.

Really? That makes me very worried that the general iterator order is machine dependent, which seems very strange to me as it looks like there shouldn't be anything machine dependent in the implementation. CFM._an_element_ just iterates over the first few elements.

Why this change:

  • src/sage/combinat/tableau.py

    diff --git a/src/sage/combinat/tableau.py b/src/sage/combinat/tableau.py
    index d5f24b8..c00e508 100644
    @@ class StandardTableaux_size(StandardTableaux, DisjointUnionEnumeratedSets):

         EXAMPLES::

    -        sage: StandardTableaux(5).random_element() # random
    -        [[1, 4, 5], [2], [3]]
    +        sage: StandardTableaux(10).random_element() # random
    +        [[1, 3, 6], [2, 5, 7], [4, 8], [9], [10]]
             sage: StandardTableaux(0).random_element()
             []
             sage: StandardTableaux(1).random_element()

Because the bug turned up only for tableaux of size 10.

What bug? You never mentioned a bug. I cannot tell if this was the correct test or not without knowing what the problem was.

Why are you needlessly unrolling this:

@@ -544,7 +546,9 @@ class SetPartition(AbstractSetPartition):
             {}
         """
         self._latex_options = {}
-        ClonableArray.__init__(self, parent, sorted(map(Set, s), key=min), check=check)
+        sets = map(frozenset, s)
+        blocks = sorted(sets, key=min)
+        ClonableArray.__init__(self, parent, blocks, check=check)
 
     def check(self):
         """

because it is much better to trace through, and at least for me, easier to understand. If you insist, I'll change it.

Please do.

More towards bikeshedding, but I would appreciate it if you could regroup some of the output so that the result takes fewer lines.

There is a reason to have some of the output as it is: it is quite possible that the order the set partitions appear changes again. Then the diff will be much easier to check.

I don't really find that argument convincing. So should every list have an item on every line? I generally think it is better to have a more compact output. However, this is bikeshedding, so it is not something I would hold up this ticket for.

Thanks for your careful look! I modified all other things as you suggested.

Thank you.

comment:59 in reply to: ↑ 45 Changed 4 years ago by mantepse

Replying to tscrim:

Replying to mantepse:

Replying to tscrim:

I think it would be better to have a separate iterator that returns the lists of frozensets for those things that do not need a real SetPartition.

Provided such a method, but not used. It looks a bit like premature optimization to me right now.

That is a bit unfair as it is something that could easily be used in places where ordering over all set partitions but that is not the actual returned object.

OK, please suggest a name - is there any convention? In any case, I think that in the case of diagram algebras one might want to consider using SetPartitions(x).list() instead, because this caches the result. That's why I said it might be premature.

I also think _iterator_parts should be consistent and return lists of frozensets.

Currently, I would be transforming Set (from OrderedSetPartition) into frozenset, which is a bit backwards.

Huh? Isn't that what we are doing here?

No, in the new __iter__ methods we generate restricted growth words (as lists), which are turned into frozensets. On the other hand _iterator_parts obtains its stuff first from OrderedSetPartition, which yields Sets. I want to change that, but in a separate ticket.
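
(For context, a minimal iterative sketch of the restricted-growth-word approach described above - illustrative names and details, not the ticket's actual code: every set partition of {1, ..., n} corresponds to a word a with a[0] = 0 and a[i] <= max(a[0..i-1]) + 1, where a[i] names the block containing element i+1.)

```python
def restricted_growth_strings(n):
    """Yield every restricted growth word of length n as a tuple.

    Sketch only; iterates with a simple odometer instead of recursion.
    """
    if n == 0:
        yield ()
        return
    a = [0] * n        # current word; a[0] is always 0
    m = [0] * n        # m[i] = max(a[0], ..., a[i-1])
    while True:
        yield tuple(a)
        # find the rightmost position whose entry can still grow
        i = n - 1
        while i > 0 and a[i] > m[i]:
            i -= 1
        if i == 0:
            return
        a[i] += 1
        # reset everything to the right and update the prefix maxima
        for j in range(i + 1, n):
            a[j] = 0
            m[j] = max(m[i], a[i])

def rgs_to_blocks(word):
    """Convert a restricted growth word to a sorted list of blocks."""
    blocks = {}
    for elem, b in enumerate(word, start=1):
        blocks.setdefault(b, []).append(elem)
    return [blocks[b] for b in sorted(blocks)]
```

Wrapping each block in frozenset then gives the internal representation discussed above; for instance, rgs_to_blocks((0, 1, 0, 2)) returns [[1, 3], [2], [4]].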

Why are you not using an_element()?

Because it turned out to be machine dependent.

Really? That makes me very worried that the general iterator order is machine dependent, which seems very strange to me as it looks like there shouldn't be anything machine dependent in the implementation. CFM._an_element_ just iterates over the first few elements.

I only saw this because a patchbot failed (see mantepse). The relevant snippet from https://patchbot.sagemath.org/log/25462/Ubuntu/14.04/i686/3.13.0-95-generic/arando/2018-07-02%2008:29:55 is

File "src/sage/combinat/diagram_algebras.py", line 2332, in sage.combinat.diagram_algebras.PartitionAlgebra._element_constructor_
Failed example:
    O.an_element()
Expected:
    3*O{{-3}, {-2, -1, 1, 2, 3}} + 2*O{{-3, -2, -1, 1, 2, 3}} + 2*O{{-3, -1, 1, 2, 3}, {-2}}
Got:
    3*O{{-3}, {-2, -1, 1, 2, 3}} + 2*O{{-3, -2, -1, 1, 2, 3}} + 2*O{{-3, -2, 1, 2, 3}, {-1}}

I have no idea how to debug this.

Why this change: [...]

Because the bug turned up only for tableaux of size 10.

What bug? You never mentioned a bug. I cannot tell if this was the correct test or not without knowing what the problem was.

Sorry, I no longer recall the details precisely. Roughly, I had to modify the code (because frozensets do not support indexing), and the first version had a bug, which I only discovered by luck because it didn't happen for very small tableaux. I don't think that I ever committed the wrong code...
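
(For the record, the indexing pitfall mentioned above is easy to demonstrate: a frozenset is unordered and not subscriptable, so positional access has to be replaced by min() or an explicit sort. A tiny illustration:)

```python
fs = frozenset({3, 1, 2})

# frozensets are unordered, so indexing raises TypeError
try:
    fs[0]
except TypeError:
    pass  # 'frozenset' object is not subscriptable

# the usual replacements: min() for the smallest element,
# or an explicit sort when positional access is needed
assert min(fs) == 1
assert tuple(sorted(fs)) == (1, 2, 3)
```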

There is a reason to have some of the output as it is: it is quite possible that the order the set partitions appear changes again. Then the diff will be much easier to check.

I don't really find that argument convincing. So should every list have an item on every line?

Of course not - only if the order of the items in the lists is very likely to change again.

comment:60 Changed 4 years ago by git

  • Commit changed from 95ce20f8b875cbb2cc8848b09120544a895612b5 to d2e0e6e2114e3fa758ddd7a9295f03a5792421c3

Branch pushed to git repo; I updated commit sha1. New commits:

d2e0e6e  inline a computation by reviewer's request

comment:61 Changed 4 years ago by mantepse

@Travis: as an aside, after #25659, the (slow) _iterator_part will not be used anymore in library code...

comment:62 Changed 4 years ago by mantepse

I just created ticket #25865 for a fast iterator for set partitions with given block sizes.

comment:63 Changed 4 years ago by mantepse

ping?

comment:64 follow-up: Changed 4 years ago by zabrocki

I recompiled and ran all doctests (passed). I think that you addressed all of Travis' comments.

Just to clarify: the bug you mention in comment:56 was one that arose while implementing this algorithm. In that case it's better to add a new test rather than modify an old one, and I would suggest adding one that is not # random. I'm OK with this and willing to give a positive review. Travis?

comment:65 in reply to: ↑ 64 Changed 4 years ago by mantepse

Replying to zabrocki:

Just to clarify: The bug you mention in comment:56 was one that arose in implementing this algorithm. So then its better to just add a new test rather than modify an old one and I would suggest adding one that is not #random .

I am really sorry, but I cannot remember exactly what I did. However, since there are really very few standard Young tableaux on 5 elements, it is quite unlikely that producing such a small tableau would hit a bug.

Also, I must admit that I don't know how to produce a non-random test of .random_element(). Can one fix the seed, across all architectures?
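
(For reference: a seeded pseudo-random generator is deterministic - the same seed yields the same stream - and Sage provides a global set_random_seed() for this; whether the resulting draws agree across all architectures is exactly the open question. A pure-Python sketch of the principle, with illustrative names:)

```python
import random

# Seeding makes a pseudo-random stream reproducible: re-seeding with
# the same value replays exactly the same sequence of draws.
random.seed(42)
first_run = [random.randrange(100) for _ in range(5)]

random.seed(42)
second_run = [random.randrange(100) for _ in range(5)]

assert first_run == second_run
```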

comment:66 Changed 4 years ago by tscrim

I still do not see how an_element is machine dependent without the iterator being machine dependent, which is a bigger problem. The precise code for the _an_element_ (and hence, an_element) for the orbit basis is:

        x = self.zero()
        I = self.basis().keys()
        R = self.base_ring()
        try:
            x = x + self.monomial(I.an_element())
        except Exception:
            pass
        try:
            g = iter(self.basis().keys())
            for c in range(1,4):
                x = x + self.term(next(g), R(c))
        except Exception:
            pass
        return x

Now the doctests themselves may need to be updated because the iteration order of the set partitions has changed, but I do not see the need to move away from an_element. I am okay with constructing an explicit element, but I want to be certain that there is no machine dependency here, as that would mean the iterator is machine dependent.

comment:67 Changed 4 years ago by mantepse

OK, so do you want me to revert (all?) doctests that used an_element (and only adapt the output)?

comment:68 follow-up: Changed 4 years ago by tscrim

Not necessarily, but you did say that an_element was machine dependent in comment:56. So I want to make sure that it (well, really the iterator, which is what it uses) is not machine dependent. If the iterator is, then that is a bug, since it lives in EnumeratedSets (by my interpretation), or at the very least it could mean brittle doctests.

comment:69 in reply to: ↑ 68 Changed 4 years ago by mantepse

Replying to tscrim:

Not necessarily, but you did say that an_element was machine dependent in comment:56.

I think you meant to write comment:45

Is there an easy way to restore the state of affairs we had back then - possibly

823769c  Revert "pyflakes fixes"

and rerun the patchbots - especially the one that produced the strange result?

Last edited 4 years ago by mantepse (previous) (diff)

comment:70 follow-up: Changed 4 years ago by tscrim

I think you mean to say it was said in both, albeit much more explicitly in comment:56.

You can go back to that commit, but on top of develop, by doing

$ git checkout -b new_branch_name develop
$ git merge 823769c

However, if we wanted to properly test it, then we need to add a test that iterates over the corresponding set partitions. I think it would be better to also change the relevant an_element calls back on the same commit and do that on top of the current branch. It will be better in the long run if it is not a problem (as we will want to keep the iteration test).

comment:71 Changed 4 years ago by tscrim

  • Milestone changed from sage-8.3 to sage-8.4
  • Reviewers changed from Mike Zabrocki to Mike Zabrocki, Travis Scrimshaw

comment:72 in reply to: ↑ 70 Changed 4 years ago by mantepse

Replying to tscrim:

I think you mean to say it was said in both, albeit much more explicitly in comment:56.

Oh, yes, apologies!

You can go back to that commit but ontop of develop by doing

$ git checkout -b new_branch_name develop
$ git merge 823769c

OK, I am currently rebuilding.

However, if we wanted to properly test it, then we need to add a test that iterates over the corresponding set partitions. I think it would be better to also change the relevant an_element calls back on the same commit and do that on top of the current branch. It will be better in the long run if it is not a problem (as we will want to keep the iteration test).

I am not sure what you mean by "on the same commit". For the moment, I think I'll simply go back to that commit 823769c on top of develop, and switch the branch, so that the patchbots test it again.

comment:73 Changed 4 years ago by mantepse

  • Branch changed from public/25462 to public/set_partition_mystery

comment:74 Changed 4 years ago by mantepse

  • Commit changed from d2e0e6e2114e3fa758ddd7a9295f03a5792421c3 to 3e6b0a233ad06fe06d3b3186bbde108c253979ab

Travis, I cannot reproduce the error. Is there a way to make arando test the currently attached branch?


New commits:

09bd542  pyflakes fixes
823769c  Revert "pyflakes fixes"
3e6b0a2  Merge commit '823769c' into set_partition_mystery

comment:75 Changed 4 years ago by mantepse

??? where are these commits coming from ???

comment:76 follow-up: Changed 4 years ago by tscrim

Unless you specifically get the arando patchbot manager to call it, no. Patchbots are there trying to help everyone equally. :)

These commits are really coming from comment:73.

comment:77 in reply to: ↑ 76 Changed 4 years ago by mantepse

Replying to tscrim:

Unless you specifically get the arando patchbot manager to call it, no. Patchbots are there trying to help everyone equally. :)

Could you help me to get this ticket done? I am clueless about how to proceed :-(

comment:78 Changed 4 years ago by tscrim

  • Branch changed from public/set_partition_mystery to public/combinat/speedup_set_partitions-25462
  • Commit changed from 3e6b0a233ad06fe06d3b3186bbde108c253979ab to 87bf535461a19ba4d8730fe08e91c55bab042003

Okay, I pushed a bunch of changes, including adding back the an_element() tests. I think the non-determinism might have come from the fact that you were calling list(self.base_set()) in your iterator, which does not have a guaranteed order. I also cythonized the set partition iterators, which cut the time to ~60%.
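
(The suspected culprit is worth spelling out: iterating a Python set exposes its hash-table order, which is an implementation detail and can differ between builds and platforms, whereas sorting pins the order down. A minimal sketch, with variable names of my choosing:)

```python
base = {-3, -2, -1, 1, 2, 3}

# list(...) exposes the hash-table iteration order, which is an
# implementation detail and may differ between builds/platforms
unordered = list(base)

# sorted(...) gives a deterministic order everywhere
ordered = sorted(base)

assert set(unordered) == base          # same elements either way
assert ordered == [-3, -2, -1, 1, 2, 3]
```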

Since you claim this is much faster but have not provided timings, here are some:

sage: S = SetPartitions(10)
sage: S.cardinality()
115975
sage: SP = SetPartitions(12,7)
sage: SP.cardinality()
627396

sage: %time for x in S: pass
CPU times: user 1min 19s, sys: 71.4 ms, total: 1min 19s
Wall time: 1min 19s

sage: %time for x in SP: pass
Longer than 5 minutes

vs "current" branch:

sage: %time for x in S: pass
CPU times: user 881 ms, sys: 28.1 ms, total: 909 ms
Wall time: 851 ms
sage: %time for x in SP: pass
CPU times: user 5.9 s, sys: 19.2 ms, total: 5.92 s
Wall time: 5.89 s

vs with my changes:

sage: %time for x in S: pass
CPU times: user 602 ms, sys: 14.5 ms, total: 617 ms
Wall time: 571 ms
sage: %time for x in SP: pass
CPU times: user 3.84 s, sys: 19.4 ms, total: 3.86 s
Wall time: 3.82 s

Now if we could get a non-recursive version of the set partition iterator with a fixed number of blocks, I could get some more speed out of those iterators. The other possibility would be to try to make it into a parallel function, but that might be tricky because the auxiliary array a needs to be shared in some fashion. Well, those could be made into follow-up tickets.
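
(As a possible starting point for such a follow-up, here is one naive non-recursive sketch: count through restricted growth words with values capped at k-1 and keep only those that actually use all k values. This is not the pruned Ruskey/Knuth algorithm - it does wasted work - and the names are mine; it only shows that the recursion can be eliminated.)

```python
def rgs_with_k_blocks(n, k):
    """Iteratively yield restricted growth words of length n using
    exactly k distinct values, i.e. set partitions with k blocks.

    Naive sketch: values are capped at k - 1 and words using fewer
    than k values are filtered out rather than pruned early.
    """
    if k < 1 or k > n:
        return
    a = [0] * n        # current word
    m = [0] * n        # m[i] = max(a[0], ..., a[i-1])
    while True:
        if max(m[-1], a[-1]) == k - 1:   # all k values were used
            yield tuple(a)
        i = n - 1
        # an entry can grow while below both m[i] + 1 and the cap k - 1
        while i > 0 and a[i] >= min(m[i] + 1, k - 1):
            i -= 1
        if i == 0:
            return
        a[i] += 1
        for j in range(i + 1, n):
            a[j] = 0
            m[j] = max(m[i], a[i])
```

The counts match the Stirling numbers of the second kind, e.g. 7 words for n = 4, k = 2.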

So if tests pass for you, then we can set this to a positive review as I am fairly confident that the iteration order (and hence the an_element()) is not machine-dependent (since I squashed the only other place I could see where a machine dependency might creep in).


New commits:

594ace9  Merge commit 'ca293e3' into public/25462
9c5298d  make doctests more independent of ordering
91e8d5c  Merge commit '9c5298d' into public/25462
ce2962b  restore richer doc tests
ba08ff3  reviewer's comments
95ce20f  provide iterators which return lists of lists
d2e0e6e  inline a computation by reviewer's request
947233c  Merge branch 'public/25462' of git://trac.sagemath.org/sage into public/combinat/speedup_set_partitions-25462
a79e302  Reverted to an_element() and added some additional reviewer changes.
87bf535  Cythonizing iterator.

comment:79 Changed 4 years ago by mantepse

Great and thanks for cythonizing!

I think the speedup is quite respectable - do you have an application in mind where you need an even faster version?

The patchbot currently fails (it seems the order of the output has changed). I am currently compiling locally.

Thanks again!

comment:80 Changed 4 years ago by mantepse

(I was lucky and had develop already compiled)

Indeed, the doctests also fail on my machine, due to a different order in the output. Is this to be expected, or do they pass on your computer?

comment:81 Changed 4 years ago by tscrim

Hmm... I thought I had tested these files, but I guess I did not. I am getting what look to be the same failures as the patchbot (though I didn't check in detail).

The reason I cythonized is that there are a few things (IIRC) that iterate over set partitions and want to do so at a lower level. I was really hoping to also speed up the iterator with a fixed number of parts, but I couldn't figure out (at least in the 30 minutes I gave myself) how to make it non-recursive (in particular, so I could use fixed-size arrays of Py_ssize_t). However, it is not that important for me, as it does not come up in my (current) research.

comment:82 Changed 4 years ago by git

  • Commit changed from 87bf535461a19ba4d8730fe08e91c55bab042003 to ba6a115f30e3b7db0b972c0ec2bf3f511457dad3

Branch pushed to git repo; I updated commit sha1. New commits:

ba6a115  Fixing doctests due to ordering change.

comment:83 Changed 4 years ago by tscrim

Fixed the doctests (at least on the computers I can test, they agree).

comment:84 Changed 4 years ago by mantepse

  • Authors changed from Martin Rubey to Martin Rubey, Travis Scrimshaw
  • Status changed from needs_review to positive_review

I'm setting this to positive review since the patchbot is essentially happy, it works on my computer, and everything is wonderful.

The next steps are #25659 and #25642... (very likely, both need rebasing)

comment:85 Changed 4 years ago by vbraun

  • Branch changed from public/combinat/speedup_set_partitions-25462 to ba6a115f30e3b7db0b972c0ec2bf3f511457dad3
  • Resolution set to fixed
  • Status changed from positive_review to closed