Opened 7 years ago

Closed 5 years ago

#13982 closed enhancement (fixed)

rewrite sage.combinat.combinat.unordered_tuples using itertools.combinations_with_replacement

Reported by: nbruin Owned by: sage-combinat
Priority: minor Milestone: sage-6.5
Component: combinatorics Keywords:
Cc: tscrim Merged in:
Authors: Travis Scrimshaw Reviewers: Vincent Delecroix, Darij Grinberg
Report Upstream: N/A Work issues:
Branch: 8676829 (Commits) Commit: 86768293bb94d077f37dc15aff84f0f332422b87
Dependencies: Stopgaps:

Description

With Python 2.7 there's a native implementation in python's itertools:

sage: list(itertools.combinations_with_replacement([1,2,3,4],2))
[(1, 1), (1, 2), (1, 3), (1, 4), (2, 2), (2, 3), (2, 4), (3, 3), (3, 4), (4, 4)]
sage: sage.combinat.combinat.unordered_tuples([1,2,3,4],2)
[[1, 1], [1, 2], [1, 3], [1, 4], [2, 2], [2, 3], [2, 4], [3, 3], [3, 4], [4, 4]]

Given that the current doc of unordered_tuples mentions that it wraps a GAP routine and that this is undesirable, replacing it (or deprecating it!) may be a good idea.

Change History (35)

comment:1 Changed 6 years ago by jdemeyer

  • Milestone changed from sage-5.11 to sage-5.12

comment:2 Changed 6 years ago by vbraun_spam

  • Milestone changed from sage-6.1 to sage-6.2

comment:3 Changed 6 years ago by vbraun_spam

  • Milestone changed from sage-6.2 to sage-6.3

comment:4 Changed 5 years ago by tscrim

  • Cc tscrim added

comment:5 Changed 5 years ago by vbraun_spam

  • Milestone changed from sage-6.3 to sage-6.4

comment:6 Changed 5 years ago by tscrim

  • Authors set to Travis Scrimshaw
  • Branch set to public/combinat/rewrite_unordered_tuples-13982
  • Commit set to 373e5858694adf822102b2a1b9e40ce53cd776a6
  • Status changed from new to needs_review

In order to get the behavior to match, I had to call sorted(set(S)) on the base set S when using itertools. On using the output from GAP, I had to do map(tuple, eval(ans)) because that returned lists (or sometimes strings). I've given the choice to the user to decide on which algorithm. Here are some timings (with GAP already running):

sage: %timeit unordered_tuples([1,2,3],2,algorithm='itertools')
10000 loops, best of 3: 31.9 µs per loop
sage: %timeit unordered_tuples([1,2,3],2,algorithm='gap')
1000 loops, best of 3: 1.65 ms per loop
sage: %timeit eval(gap.eval( "UnorderedTuples(%s,%s)"%([1,2,3],ZZ(2)) ))
1000 loops, best of 3: 1.65 ms per loop

sage: %timeit unordered_tuples([1,2],6)
10000 loops, best of 3: 31.4 µs per loop
sage: %timeit unordered_tuples([1,2],6,algorithm='gap')
100 loops, best of 3: 2.33 ms per loop
sage: %timeit eval(gap.eval( "UnorderedTuples(%s,%s)"%([1,2],ZZ(6)) ))
100 loops, best of 3: 2.06 ms per loop

sage: %timeit unordered_tuples(range(30), 6)
1 loops, best of 3: 590 ms per loop
sage: %timeit eval(gap.eval( "UnorderedTuples(%s,%s)"%(range(30),ZZ(6)) ))
# I got bored after a minute and killed it

So even with everything, there is a major speedup. Now once this gets replaced with libgap (#16719), we'll have to rerun timings.


New commits:

373e585Implemented unordered_tuples using itertools.

comment:7 Changed 5 years ago by rws

  • Status changed from needs_review to needs_work

Please justify the removal of the docstring with result ['aa', 'ab', 'ac', 'bb', 'bc', 'cc'].

comment:8 Changed 5 years ago by tscrim

  • Status changed from needs_work to needs_review

It's an unexpected format, in particular, the results aren't tuples. One can argue from the POV of ducktyping, it's okay, but I feel the inconsistency and the fact that it would fail a more explicit isinstance check the behavior should be standardized for all inputs. (Side note, I feel that should be a needs_info.)

comment:9 Changed 5 years ago by vdelecroix

  • Status changed from needs_review to needs_info

Hello,

Why not a deprecation? It is nowhere used in Sage code and it would be good to advertise users that it is infinitely better to use itertools directly.

Vincent

comment:10 Changed 5 years ago by tscrim

  • Status changed from needs_info to needs_review

Because using the itertools directly for a multiset gives multiple copies of the same tuple

sage: list(itertools.combinations_with_replacement([1,1], 2))
[(1, 1), (1, 1), (1, 1)]

which is a different behavior (this should just return a single tuple of [(1, 1)]).

comment:11 Changed 5 years ago by ncohen

  • Status changed from needs_review to needs_work

Needs a merge/rebase.

Nathann

comment:12 Changed 5 years ago by git

  • Commit changed from 373e5858694adf822102b2a1b9e40ce53cd776a6 to ca8ec669f1831b24ad6981ae66d86e5dcf1a86de

Branch pushed to git repo; I updated commit sha1. New commits:

ca8ec66Merge branch 'public/combinat/rewrite_unordered_tuples-13982' of trac.sagemath.org:sage into public/combinat/rewrite_unordered_tuples-13982

comment:13 Changed 5 years ago by tscrim

  • Milestone changed from sage-6.4 to sage-6.5
  • Status changed from needs_work to needs_review

Done.

comment:14 Changed 5 years ago by vdelecroix

What about tuples vs itertools.product

sage: S = [1,2]
sage: %timeit tuples(S,3)
10000 loops, best of 3: 26.9 µs per loop
sage: %timeit list(product(S,repeat=3))
100000 loops, best of 3: 1.74 µs per loop

And the function number_of_tuples should not wrap anything, the answer is just len(S)^k.

Last edited 5 years ago by vdelecroix (previous) (diff)

comment:15 Changed 5 years ago by git

  • Commit changed from ca8ec669f1831b24ad6981ae66d86e5dcf1a86de to 3c0c50adb02c5046a9b26273498c1f56eee9385e

Branch pushed to git repo; I updated commit sha1. New commits:

3c0c50aImplemented faster versions of tuples.

comment:16 Changed 5 years ago by tscrim

I've added that as well. Note the output order has changed (and the fact that it now returns honest tuples).

comment:17 Changed 5 years ago by vdelecroix

Hello,

Much cleaner now! What do you think of deprecations (not necessarily in this ticket)? Do we really want to keep the interface to GAP for those two iterators?

Vincent

comment:18 follow-up: Changed 5 years ago by tscrim

I think it is good to have access to multiple implementations since onemight be faster in certain situations and it makes for easier testing. At the very least, it doesn't do any harm being there IMO. (There's also a president for doing this with other such functions.)

comment:19 in reply to: ↑ 18 ; follow-up: Changed 5 years ago by vdelecroix

Replying to tscrim:

I think it is good to have access to multiple implementations since onemight be faster in certain situations and it makes for easier testing.

In the present case itertools is always faster and I think people should use itertools directly.

At the very least, it doesn't do any harm being there IMO. (There's also a president for doing this with other such functions.)

Yes it does. This is one more function in the global namespace, one more file to maintain.

comment:20 in reply to: ↑ 19 Changed 5 years ago by ncohen

Yes it does. This is one more function in the global namespace, one more file to maintain.

Plus let us only keep the code that we think worthy of being used.

Nathann

comment:21 follow-ups: Changed 5 years ago by tscrim

It's not adding anything more to the global namespace. Plus it is a separate (possibly different) algorithm, and you have to prove that in *every* case that the itertools implementation is faster.

comment:22 in reply to: ↑ 21 Changed 5 years ago by ncohen

It's not adding anything more to the global namespace. Plus it is a separate (possibly different) algorithm, and you have to prove that in *every* case that the itertools implementation is faster.

This seems to be a very bad political idea. It is very likely that even though nobody ever uses this implementation because it is obviously slower than itertools, nobody will ever be able to give a formal proof that it is slower in all cases.

Nathann

comment:23 in reply to: ↑ 21 Changed 5 years ago by vdelecroix

Replying to tscrim:

It's not adding anything more to the global namespace.

All of tuples, number_of_tuples, unordered_tuples and number_of_unordered_tuples are in the global namespace. I do not claim that your branch adds something to the namespace but that it was already useless noise.

comment:24 Changed 5 years ago by tscrim

@ncohen Even if you could show it is always slower (large sets can behave very differently, so it's not obvious to me), it makes it better testing because they are independent algorithms (although it is highly unlikely one of the algorithms has a bug).

@vdelecroix Sorry, I interpreted the "this" as my branch.

comment:25 Changed 5 years ago by darij

The "native" implementation of tuple is recursive, and with the current implementation it passes the string 'native' to itself a zillion times. Am I seeing ghosts or could this be an inefficiency? (I would implement the native algorithm, if anywhere, then in a separate method tuples_native, which would recurse upon itself, and which I would then reference *one time* from tuples.)

On number_of_tuples, is there really any chance for a gap call to beat return ZZ( len(set(S)) )**k ? I imagine the set call could take a while if S consists of complex things, but then again the doc of unordered_tuples warns against passing complex things to gap, which I assume should also apply to number_of_tuples. But I am too lazy to do timings...

comment:26 Changed 5 years ago by git

  • Commit changed from 3c0c50adb02c5046a9b26273498c1f56eee9385e to 79cb89296fce3f90b9e59cab797e7c7de57443fc

Branch pushed to git repo; I updated commit sha1. New commits:

256c54fMerge branches 'develop' and
64c8ab7review changes and add analogous change to number_of_unordered_tuples
79cb892let's not scare people with symbolics

comment:27 Changed 5 years ago by darij

Sorry for the badly named merge commit (I started writing my next console command before the pull was finished, and what I typed ended up in the merge message instead; I thought I had cleaned it). If you are fine with my changes, feel free to set this to positive review! (I have dealt with the first paragraph of my previous post; the second paragraph has become two TODOs, but they are very low-priority.)

comment:28 Changed 5 years ago by tscrim

  • Reviewers set to Vincent Delecroix, Darij Grinberg

The merge message is fine with me. I'm happy with your changes except I think we should remove the todo because having the Gap version allows us to do better testing by comparing the output from the algorithms. However if you really want to leave those in there, then you can go ahead and set a positive review.

comment:29 Changed 5 years ago by darij

Thanks -- feel free to remove the TODOs; I would do so myself but currently I'm not at my home or work computer.

comment:30 Changed 5 years ago by git

  • Commit changed from 79cb89296fce3f90b9e59cab797e7c7de57443fc to 697b1ed4e560ef15e1316b6f9957b900d1e99f5e

Branch pushed to git repo; I updated commit sha1. New commits:

697b1edRemoved todo messages.

comment:31 Changed 5 years ago by tscrim

  • Status changed from needs_review to positive_review

comment:32 Changed 5 years ago by vbraun

  • Status changed from positive_review to needs_work

PDF docs don't bulid

comment:33 Changed 5 years ago by git

  • Commit changed from 697b1ed4e560ef15e1316b6f9957b900d1e99f5e to 86768293bb94d077f37dc15aff84f0f332422b87

Branch pushed to git repo; I updated commit sha1. New commits:

8676829Fixing pdf build.

comment:34 Changed 5 years ago by tscrim

  • Status changed from needs_work to positive_review

This should fix the pdf build...

comment:35 Changed 5 years ago by vbraun

  • Branch changed from public/combinat/rewrite_unordered_tuples-13982 to 86768293bb94d077f37dc15aff84f0f332422b87
  • Resolution set to fixed
  • Status changed from positive_review to closed
Note: See TracTickets for help on using tickets.