Ticket #10534 (needs_review enhancement)

Opened 2 years ago

Last modified 2 months ago

Optimizations for the generation of subwords, subsets and set partitions

Reported by: vdelecroix Owned by: vdelecroix
Priority: major Milestone: sage-5.10
Component: combinatorics Keywords: generation, combinatorics, set, subset, partition
Cc: sage-combinat Work issues:
Report Upstream: N/A Reviewers: Florent Hivert
Authors: Vincent Delecroix Merged in:
Dependencies: Stopgaps:

Description (last modified by chapoton) (diff)

This ticket stands for a faster method of exhaustive generation of combinatorial object that may be found in sage.combinat.* and more precisely

  • subwords (sage.combinat.subword)
  • subsets (sage.combinat.subset)
  • set partitions (sage.combinat.set_partitions and sage.combinat.set_partition_ordered)

Most of the time, the improvment just consists in calling the python library itertools. For other, it consists in creating a lower level Python iterator which does not use the heavy mechanism of Parent/Element? (which is not intended for the frontend user).

Moreover, the patch provides

  • bugs and typos corrections (doctest coverage is now 100%)
  • random generation capabilities
  • modified cardinality method which always return a Sage Integer (and no more Python int or Sage Rational)

Some timings (on Intel(R) Core(TM)2 Duo CPU T8100 @ 2.10GHz)

Old version

sage: S = SetPartitions(8,[3,3,2])
sage: timeit('for p in S: pass')
5 loops, best of 3: 401 ms per loop

sage: S = SetPartitions(9,[3,3,2,1])
sage: timeit('for p in S: pass')
5 loops, best of 3: 4 s per loop

sage: S = Subsets([1]*1+[2]*2+[3]*3+[4]*4+[5]*5, submultiset=True)
sage: timeit('S.cardinality()')
5 loops, best of 3: 712 ms per loop

New version

sage: S = SetPartitions(8,[3,3,2])
sage: timeit('for p in S: pass')
5 loops, best of 3: 130 ms per loop
sage: timeit('for p in S._fast_iterator(): pass')
25 loops, best of 3: 10.4 ms per loop

sage: S = SetPartitions(9,[3,3,2,1])
sage: timeit('for p in S: pass')
5 loops, best of 3: 1.43 s per loop
sage: timeit('for p in S._fast_iterator(): pass')
5 loops, best of 3: 91.3 ms per loop

sage: S = Subsets([1]*1+[2]*2+[3]*3+[4]*4+[5]*5,submultiset=True)
sage: timeit('S.cardinality()')
625 loops, best of 3: 33.9 µs per loop

Apply only:

  • trac_10534-generation_of_subsets_and_set_partitions.3.patch

Change History

comment:1 Changed 2 years ago by vdelecroix

  • Description modified (diff)
  • Summary changed from Generation of subsets and partitions enumerated sets to Generation of subwords, subsets and set partitions

comment:2 Changed 2 years ago by vdelecroix

  • Status changed from new to needs_review
  • Description modified (diff)

comment:3 Changed 2 years ago by vdelecroix

  • Description modified (diff)

comment:4 Changed 2 years ago by vdelecroix

  • Status changed from needs_review to needs_work
  • Description modified (diff)

comment:5 Changed 2 years ago by vdelecroix

  • Owner vdelecroix deleted

comment:6 Changed 2 years ago by vdelecroix

  • Owner set to vdelecroix
  • Description modified (diff)

comment:7 Changed 2 years ago by vdelecroix

  • Description modified (diff)

comment:8 Changed 2 years ago by vdelecroix

  • Status changed from needs_work to needs_review

comment:9 follow-up: ↓ 10 Changed 2 years ago by hivert

  • Reviewers set to Florent Hivert
  • Authors changed from vdelecroix to Vincent Delecroix

Hi Vincent,

Your patch looks good except a few remarks:

  • the field Author below is for credits. You should put your real name and not your login.
  • the documentation now says:
    [...]
    The elements of the output are tuples of Python int (and not Sage Integer).
    [...]
    It's element are returned as plain list of python int.
    

At least one of those two sentences is wrong. Also they seems a little redundent. This should be fixed.

I'm sorry for not having the time to put a review patch, nor finish the review now.

comment:10 in reply to: ↑ 9 Changed 2 years ago by vdelecroix

  • the field Author below is for credits. You should put your real name and not your login.

Thank you for modifying this.

  • the documentation now says:
    [...]
    The elements of the output are tuples of Python int (and not Sage Integer).
    [...]
    It's element are returned as plain list of python int.
    

That's right. Output value are now tuples and it is fully specified.

Changed 2 years ago by vdelecroix

comment:11 follow-up: ↓ 12 Changed 2 years ago by hivert

Hi Vincent,

I need more time to review your long patch. I just notice the following:

file: sage/combinat/choose_nk.py
85	            sage: [0,2] in c52   # trac XXXX 

You probably want to replace XXXX by an actual number. I'll continue my review tomorrow (hopefully)

Florent

comment:12 in reply to: ↑ 11 Changed 2 years ago by vdelecroix

Hi Florent

I need more time to review your long patch.

Certainly. Thanks for the review.

I have three important design questions relative to that patch and the Combinatorial classes which it modify

  • should the attributes of a CombinatorialClass? be visible like S.w or do we prefer S._w (some code in graphs uses a direct call to S.w) ?
  • what should be the inheritance/relations between say Subsets (subset of a given set) and Subsets_k (subset of size k of a given set) ?
  • what should be the output of Subwords ? tuple, list, strings, any user defined type ? Actually, if you initialize it with list (resp. tuple, string) it output lists (resp. tuple, string).

I just notice the following:

file: sage/combinat/choose_nk.py
85	            sage: [0,2] in c52   # trac XXXX 

You probably want to replace XXXX by an actual number.

Done

comment:13 Changed 16 months ago by ncohen

  • Status changed from needs_review to needs_info

comment:14 Changed 13 months ago by vdelecroix

  • Description modified (diff)

Nathan, you should be more explicit when asking for more info...

comment:15 Changed 13 months ago by vdelecroix

  • Status changed from needs_info to needs_review
  • Description modified (diff)

comment:16 Changed 13 months ago by nthiery

  • Summary changed from Generation of subwords, subsets and set partitions to Optimizations for the generation of subwords, subsets and set partitions

comment:17 Changed 12 months ago by vdelecroix

I update the file to make it works on sage-5.0. I will update the corresponding patch in the combinat queue.

comment:18 Changed 10 months ago by chapoton

This code breaks a test in polytopes.parallelotope, because this last procedure calls Combinations and expect to receive a list and not a tuple. This has to be cured, maybe by inserting a list() at the appropriate place in this procedure. Please test that the correction works before uploading a new patch.

Please also remove all trailing whitespaces, see the bot report.

comment:19 follow-up: ↓ 20 Changed 10 months ago by chapoton

Here is a minimal example of what goes wrong in polytopes.parallelotope :

sage: g=[ vector([1,0]), vector([0,1])]
sage: Combinations(g)

This is a mutability/hash problem, when trying to convert a multiset in a set, apparently..

comment:20 in reply to: ↑ 19 Changed 10 months ago by vdelecroix

Replying to chapoton:

Here is a minimal example of what goes wrong in polytopes.parallelotope :

sage: g=[ vector([1,0]), vector([0,1])]
sage: Combinations(g)

This is a mutability/hash problem, when trying to convert a multiset in a set, apparently..

In the above example, the command fails because the *vector* is not hashable.

Changed 10 months ago by vdelecroix

comment:21 Changed 10 months ago by vdelecroix

I correct the error and add the ticket number on the first line of the patch (as asked by the patchbot).

comment:22 Changed 9 months ago by chapoton

  • Description modified (diff)

Apply only:

  • trac_10534-generation_of_subsets_and_set_partitions.3.patch

comment:23 Changed 9 months ago by chapoton

new patch, to ensure 100% doctest

Last edited 9 months ago by chapoton (previous) (diff)

comment:24 follow-up: ↓ 25 Changed 7 months ago by emchennyc

I got 100% doctest coverage for all the edited files but there was one warning. If someone doesn't mind explaining what is the significance of this, that would be helpful.


SCORE /Applications/sage/devel/sage-main/sage/combinat/subset.py: 100% (33 of 33)

Possibly wrong (function name doesn't occur in doctests):

  • _element_constructor_(self, x):
  • _element_constructor_(self, x):

Also, "devel/sage/sage/tests/interrupt.pyx" timed out when I ran the long doctests.

comment:25 in reply to: ↑ 24 Changed 7 months ago by vdelecroix


SCORE /Applications/sage/devel/sage-main/sage/combinat/subset.py: 100% (33 of 33)

Possibly wrong (function name doesn't occur in doctests):

  • _element_constructor_(self, x):
  • _element_constructor_(self, x):

This is because the methods _element_constructor_(self, x) are not directly tested. There should be a comment "# indirect doctest" after the corresponding test. Basically, this function is used to build an element of a parent, but it is called through the method call of the parent (which is implemented in Parent).

Also, "devel/sage/sage/tests/interrupt.pyx" timed out when I ran the long doctests.

This is much more ennoying and I do not know why.

comment:26 Changed 2 months ago by ncohen

Helloooooooooooooooo !!!

While I read your patch, two things :

  • an "It's" should become an "its"
  • why do you change the ouput type from lists to tuples ?

Btw, sorry I did not answer your comment many months ago, I may have missed the warning from trac : I had set the ticket to needs_info because you seemed to be asking rather fundamental questions which would have had an impact on the patch, so that it would have been a pity to review the patch if you intended to change it so much later.

Have you had an answer from Florent since ?

Nathann

Note: See TracTickets for help on using tickets.