Opened 7 years ago
Closed 4 years ago
#13982 closed enhancement (fixed)
rewrite sage.combinat.combinat.unordered_tuples using itertools.combinations_with_replacement
Reported by:  nbruin  Owned by:  sagecombinat 

Priority:  minor  Milestone:  sage6.5 
Component:  combinatorics  Keywords:  
Cc:  tscrim  Merged in:  
Authors:  Travis Scrimshaw  Reviewers:  Vincent Delecroix, Darij Grinberg 
Report Upstream:  N/A  Work issues:  
Branch:  8676829 (Commits)  Commit:  86768293bb94d077f37dc15aff84f0f332422b87 
Dependencies:  Stopgaps: 
Description
With Python 2.7 there's a native implementation in python's itertools:
sage: list(itertools.combinations_with_replacement([1,2,3,4],2)) [(1, 1), (1, 2), (1, 3), (1, 4), (2, 2), (2, 3), (2, 4), (3, 3), (3, 4), (4, 4)] sage: sage.combinat.combinat.unordered_tuples([1,2,3,4],2) [[1, 1], [1, 2], [1, 3], [1, 4], [2, 2], [2, 3], [2, 4], [3, 3], [3, 4], [4, 4]]
Given that the current doc of unordered_tuples
mentions that it wraps a GAP routine and that this is undesirable, replacing it (or deprecating it!) may be a good idea.
Change History (35)
comment:1 Changed 6 years ago by
 Milestone changed from sage5.11 to sage5.12
comment:2 Changed 6 years ago by
 Milestone changed from sage6.1 to sage6.2
comment:3 Changed 5 years ago by
 Milestone changed from sage6.2 to sage6.3
comment:4 Changed 5 years ago by
 Cc tscrim added
comment:5 Changed 5 years ago by
 Milestone changed from sage6.3 to sage6.4
comment:6 Changed 5 years ago by
 Branch set to public/combinat/rewrite_unordered_tuples13982
 Commit set to 373e5858694adf822102b2a1b9e40ce53cd776a6
 Status changed from new to needs_review
comment:7 Changed 5 years ago by
 Status changed from needs_review to needs_work
Please justify the removal of the docstring with result ['aa', 'ab', 'ac', 'bb', 'bc', 'cc']
.
comment:8 Changed 5 years ago by
 Status changed from needs_work to needs_review
It's an unexpected format, in particular, the results aren't tuples. One can argue from the POV of ducktyping, it's okay, but I feel the inconsistency and the fact that it would fail a more explicit isinstance
check the behavior should be standardized for all inputs. (Side note, I feel that should be a needs_info
.)
comment:9 Changed 5 years ago by
 Status changed from needs_review to needs_info
Hello,
Why not a deprecation? It is nowhere used in Sage code and it would be good to advertise users that it is infinitely better to use itertools directly.
Vincent
comment:10 Changed 5 years ago by
 Status changed from needs_info to needs_review
Because using the itertools
directly for a multiset gives multiple copies of the same tuple
sage: list(itertools.combinations_with_replacement([1,1], 2)) [(1, 1), (1, 1), (1, 1)]
which is a different behavior (this should just return a single tuple of [(1, 1)]
).
comment:11 Changed 5 years ago by
 Status changed from needs_review to needs_work
Needs a merge/rebase.
Nathann
comment:12 Changed 5 years ago by
 Commit changed from 373e5858694adf822102b2a1b9e40ce53cd776a6 to ca8ec669f1831b24ad6981ae66d86e5dcf1a86de
Branch pushed to git repo; I updated commit sha1. New commits:
ca8ec66  Merge branch 'public/combinat/rewrite_unordered_tuples13982' of trac.sagemath.org:sage into public/combinat/rewrite_unordered_tuples13982

comment:13 Changed 5 years ago by
 Milestone changed from sage6.4 to sage6.5
 Status changed from needs_work to needs_review
Done.
comment:14 Changed 5 years ago by
What about tuples
vs itertools.product
sage: S = [1,2] sage: %timeit tuples(S,3) 10000 loops, best of 3: 26.9 µs per loop sage: %timeit list(product(S,repeat=3)) 100000 loops, best of 3: 1.74 µs per loop
And the function number_of_tuples
should not wrap anything, the answer is just len(S)^k
.
comment:15 Changed 5 years ago by
 Commit changed from ca8ec669f1831b24ad6981ae66d86e5dcf1a86de to 3c0c50adb02c5046a9b26273498c1f56eee9385e
Branch pushed to git repo; I updated commit sha1. New commits:
3c0c50a  Implemented faster versions of tuples.

comment:16 Changed 5 years ago by
I've added that as well. Note the output order has changed (and the fact that it now returns honest tuples).
comment:17 Changed 5 years ago by
Hello,
Much cleaner now! What do you think of deprecations (not necessarily in this ticket)? Do we really want to keep the interface to GAP for those two iterators?
Vincent
comment:18 followup: ↓ 19 Changed 5 years ago by
I think it is good to have access to multiple implementations since onemight be faster in certain situations and it makes for easier testing. At the very least, it doesn't do any harm being there IMO. (There's also a president for doing this with other such functions.)
comment:19 in reply to: ↑ 18 ; followup: ↓ 20 Changed 4 years ago by
Replying to tscrim:
I think it is good to have access to multiple implementations since onemight be faster in certain situations and it makes for easier testing.
In the present case itertools
is always faster and I think people should use itertools directly.
At the very least, it doesn't do any harm being there IMO. (There's also a president for doing this with other such functions.)
Yes it does. This is one more function in the global namespace, one more file to maintain.
comment:20 in reply to: ↑ 19 Changed 4 years ago by
Yes it does. This is one more function in the global namespace, one more file to maintain.
Plus let us only keep the code that we think worthy of being used.
Nathann
comment:21 followups: ↓ 22 ↓ 23 Changed 4 years ago by
It's not adding anything more to the global namespace. Plus it is a separate (possibly different) algorithm, and you have to prove that in *every* case that the itertools implementation is faster.
comment:22 in reply to: ↑ 21 Changed 4 years ago by
It's not adding anything more to the global namespace. Plus it is a separate (possibly different) algorithm, and you have to prove that in *every* case that the itertools implementation is faster.
This seems to be a very bad political idea. It is very likely that even though nobody ever uses this implementation because it is obviously slower than itertools, nobody will ever be able to give a formal proof that it is slower in all cases.
Nathann
comment:23 in reply to: ↑ 21 Changed 4 years ago by
Replying to tscrim:
It's not adding anything more to the global namespace.
All of tuples
, number_of_tuples
, unordered_tuples
and number_of_unordered_tuples
are in the global namespace. I do not claim that your branch adds something to the namespace but that it was already useless noise.
comment:24 Changed 4 years ago by
@ncohen Even if you could show it is always slower (large sets can behave very differently, so it's not obvious to me), it makes it better testing because they are independent algorithms (although it is highly unlikely one of the algorithms has a bug).
@vdelecroix Sorry, I interpreted the "this" as my branch.
comment:25 Changed 4 years ago by
The "native" implementation of tuple
is recursive, and with the current implementation it passes the string 'native'
to itself a zillion times. Am I seeing ghosts or could this be an inefficiency? (I would implement the native algorithm, if anywhere, then in a separate method tuples_native
, which would recurse upon itself, and which I would then reference *one time* from tuples
.)
On number_of_tuples
, is there really any chance for a gap call to beat return ZZ( len(set(S)) )**k
? I imagine the set
call could take a while if S
consists of complex things, but then again the doc of unordered_tuples
warns against passing complex things to gap, which I assume should also apply to number_of_tuples
. But I am too lazy to do timings...
comment:26 Changed 4 years ago by
 Commit changed from 3c0c50adb02c5046a9b26273498c1f56eee9385e to 79cb89296fce3f90b9e59cab797e7c7de57443fc
comment:27 Changed 4 years ago by
Sorry for the badly named merge commit (I started writing my next console command before the pull was finished, and what I typed ended up in the merge message instead; I thought I had cleaned it). If you are fine with my changes, feel free to set this to positive review! (I have dealt with the first paragraph of my previous post; the second paragraph has become two TODOs, but they are very lowpriority.)
comment:28 Changed 4 years ago by
 Reviewers set to Vincent Delecroix, Darij Grinberg
The merge message is fine with me. I'm happy with your changes except I think we should remove the todo because having the Gap version allows us to do better testing by comparing the output from the algorithms. However if you really want to leave those in there, then you can go ahead and set a positive review.
comment:29 Changed 4 years ago by
Thanks  feel free to remove the TODOs; I would do so myself but currently I'm not at my home or work computer.
comment:30 Changed 4 years ago by
 Commit changed from 79cb89296fce3f90b9e59cab797e7c7de57443fc to 697b1ed4e560ef15e1316b6f9957b900d1e99f5e
Branch pushed to git repo; I updated commit sha1. New commits:
697b1ed  Removed todo messages.

comment:31 Changed 4 years ago by
 Status changed from needs_review to positive_review
comment:32 Changed 4 years ago by
 Status changed from positive_review to needs_work
PDF docs don't bulid
comment:33 Changed 4 years ago by
 Commit changed from 697b1ed4e560ef15e1316b6f9957b900d1e99f5e to 86768293bb94d077f37dc15aff84f0f332422b87
Branch pushed to git repo; I updated commit sha1. New commits:
8676829  Fixing pdf build.

comment:34 Changed 4 years ago by
 Status changed from needs_work to positive_review
This should fix the pdf build...
comment:35 Changed 4 years ago by
 Branch changed from public/combinat/rewrite_unordered_tuples13982 to 86768293bb94d077f37dc15aff84f0f332422b87
 Resolution set to fixed
 Status changed from positive_review to closed
In order to get the behavior to match, I had to call
sorted(set(S))
on the base setS
when usingitertools
. On using the output from GAP, I had to domap(tuple, eval(ans))
because that returned lists (or sometimes strings). I've given the choice to the user to decide on which algorithm. Here are some timings (with GAP already running):So even with everything, there is a major speedup. Now once this gets replaced with libgap (#16719), we'll have to rerun timings.
New commits:
Implemented unordered_tuples using itertools.