Opened 7 years ago
Closed 7 years ago
#13764 closed defect (fixed)
CartesianProduct with generators -> silent wrong answer
Reported by: | vbraun | Owned by: | sage-combinat |
---|---|---|---|
Priority: | major | Milestone: | sage-5.7 |
Component: | combinatorics | Keywords: | |
Cc: | Merged in: | sage-5.7.beta3 | |
Authors: | Volker Braun | Reviewers: | Nathann Cohen |
Report Upstream: | N/A | Work issues: | |
Branch: | Commit: | ||
Dependencies: | #13763 | Stopgaps: |
Description
This is actually documented in the docstring:
sage: def a(n): yield 1*n; yield 2*n sage: def b(): yield 'a'; yield 'b' sage: CartesianProduct(a(3), b()).list() [[3, 'a'], [3, 'b']]
But by the principle of minimal surprise this should yield an error about generators and not silently return the wrong answer. Just to illustrate how dangerous this behavior is note that sage/combinat/integer_vector.py
falls precisely into this trap.
Attachments (1)
Change History (24)
comment:1 Changed 7 years ago by
- Status changed from new to needs_info
comment:2 Changed 7 years ago by
Nobody forces you to traverse the whole CartesianProduct
, you might just want 10 different elements of a product of potentially infinite iterators.
comment:3 Changed 7 years ago by
- Status changed from needs_info to needs_review
comment:4 Changed 7 years ago by
Oh right. Then what about, in this case build partial (local) lists of what the iterators contain, as they are explored ?
Nathann
comment:5 Changed 7 years ago by
There is no way to find out if a generator is infinite or not.
comment:6 Changed 7 years ago by
NOnonononoo I know ! But I was thinking of something like that :
You can easily build an iterator over the elements of N^15
: you first need to enumerate the elements of [0] x ... x [0]
. Then, once you are done, you can enumerate the elements of [0,1] x ... x [0,1]
that you have not enumerate before. Then, once you are done, you can enumerate the elements of [0,1,2] x .... x [0,1,2]
that you have not enumerated before, etc, etc. You see what I mean ? What I mean is of course a bit more complicated than that, because in this situation you do not know the cardinality of your iterators, you do not even know (and cannot, as you say) decide whether they are finite, etc... But if you "ask a new element from an iterator" only when you need it, you can always remember a finite partial list from the elements of each iterator, and still iterate over ALL elements !
Nathann
comment:7 Changed 7 years ago by
(and of course there is a way to do this without remembering "the list of things you have already returned")
Or else, a bit easier, is to return the elements in lexicographical order. It's less pretty, but easier to implement. And in this situation, you only remember from each iterator the list of elements you have already used. And you can "ask for another element" when needed.
Actually, this order of enumeration is more realistic. I prefered the previous one, but it is trickier to enumerate :-P
Nathann
comment:8 follow-up: ↓ 9 Changed 7 years ago by
Isn't all the funky stuff part of the *EnumeratedSet
framework that the combinat people are working on? I don't want to replace the CartesianProduct
iterator with something that incurs significant overhead.
comment:9 in reply to: ↑ 8 Changed 7 years ago by
Isn't all the funky stuff part of the
*EnumeratedSet
framework that the combinat people are working on? I don't want to replace theCartesianProduct
iterator with something that incurs significant overhead.
Yeah, when we need significant overhead we know that we can always delegate to combinat stuff :-P
Of course, we could *only* do that with the elements of the list which are generators, and use sensible code otherwise. With warnings everywhere.
Actually, perhaps the best would be that :
There is an optional flag called cache_iterator or something, which is set to False by default. When it is set to False, the function behaves exactly like the one you wrote, that is REFUSES INPUT if there is an iterator among them. The message could even say "if your iterators are all finite, wrap them with a
list()
to make them finite.
Then, we could say (in the exception message, or in the doc, or at both places) that if you want the code to work with infinite stuff, or when you cannot ensure that all iterators are finite, and hence cannot wrap them with list()
casts, then you can set cache_iterator
to True
which would do all the caching (or call the combinat stuff and enjoy their overhead if they already wrote it somewhere).
Hence :
- No speed loss for smart cases
- The speed will not be destroyed because the user forgot to wrap with
list()
calls, or just did not know it could make a difference - Crazy guys iterating on infinite things can still use the function, be it finally implemented in the method itself or delegated to combinat stuff.
Well.. Is there any other corner case ? :-P
Nathann
comment:10 follow-up: ↓ 11 Changed 7 years ago by
Iterators (objects implementing __iter__
) are fine , only generators (the yield
keyword) don't work.
I don't see the point in an extra flag to install different workarounds to generators, if you want to convert the input to a list then just do it yourself.
comment:11 in reply to: ↑ 10 ; follow-up: ↓ 12 Changed 7 years ago by
Iterators (objects implementing
__iter__
) are fine , only generators (theyield
keyword) don't work.
Oops, I see. I thought that the problem came from infinite sets, sorry.. But right now iterators are copied, and so the elements they iterate over are generated several times... Scary O_o
I don't see the point in an extra flag to install different workarounds to generators, if you want to convert the input to a list then just do it yourself.
Well, you cannot convert them to lists when they are infinite, and you do not want to cache stuff when you already stored it once.
Anyway, sorry for the misunderstanding. I'll look at the code and start making sense, if possible.
Nathann
comment:12 in reply to: ↑ 11 ; follow-up: ↓ 13 Changed 7 years ago by
Replying to ncohen:
Oops, I see. I thought that the problem came from infinite sets, sorry.. But right now iterators are copied, and so the elements they iterate over are generated several times...
Thats how the cartesian product works. Shouldn't come as a surprise.
Well, you cannot convert them to lists when they are infinite, and you do not want to cache stuff when you already stored it once.
But you can wrap your generator into an IterableFunctionCall
if it is potentially infinite.
comment:13 in reply to: ↑ 12 Changed 7 years ago by
Thats how the cartesian product works. Shouldn't come as a surprise.
Yeah, but you could want to cache them sometimes.. Do not generate them over and over if it takes time to generate them... Of course the opposite may also be true : it is very cheap to generate them, even over and over, but you do not want to store them.
But you can wrap your generator into an
IterableFunctionCall
if it is potentially infinite.
Yepyep. Right.
Nathann
comment:14 Changed 7 years ago by
Ahahahah. Well, very easy patch in the end... Looks all good, but could you modify the exception to something like that : "Generators are not allowed : read the method's documentation for a workaround" ?
You just told me about IterableFunctionCall
and I had never heard of it before :-)
Nathann
comment:15 Changed 7 years ago by
Of and also could you please make IterableFunctionCall
a html link, so that it is easier to find its documentation when reading the doc ?
(sorryyyyyyyyyy)
Nathann
comment:16 Changed 7 years ago by
Done
comment:17 Changed 7 years ago by
- Status changed from needs_review to positive_review
Well.. Looks good, passes all tests... Sorry for the first misunderstanding :-)
Nathann
comment:18 Changed 7 years ago by
- Status changed from positive_review to needs_work
Arggggggggg....
Ahem...
sage -t -long "devel/sage-2/sage/combinat/root_system/associahedron.py" ********************************************************************** File "/home/ncohen/.Sage/devel/sage-2/sage/combinat/root_system/associahedron.py", line 83: sage: TestSuite(Asso).run() Expected nothing Got: Failure in _test_pickling: Traceback (most recent call last): File "/home/ncohen/.Sage/local/lib/python/site-packages/sage/misc/sage_unittest.py", line 275, in run test_method(tester = tester) File "sage_object.pyx", line 396, in sage.structure.sage_object.SageObject._test_pickling (sage/structure/sage_object.c:3669) File "/home/ncohen/.Sage/local/lib/python2.7/unittest/case.py", line 511, in assertEqual assertion_func(first, second, msg=msg) File "/home/ncohen/.Sage/local/lib/python2.7/unittest/case.py", line 501, in _baseAssertEqual if not first == second: File "/home/ncohen/.Sage/local/lib/python/site-packages/sage/geometry/polyhedron/base.py", line 382, in __eq__ return self._is_subpolyhedron(other) and other._is_subpolyhedron(self) File "/home/ncohen/.Sage/local/lib/python/site-packages/sage/geometry/polyhedron/base.py", line 490, in _is_subpolyhedron CartesianProduct(other.Hrep_generator(), self.Vrep_generator()) ) File "/home/ncohen/.Sage/local/lib/python/site-packages/sage/combinat/cartesian_product.py", line 64, in CartesianProduct '(type "CartesianProduct?") for a workaround') ValueError: generators are not allowed, see the documentation (type "CartesianProduct?") for a workaround ------------------------------------------------------------ The following tests failed: _test_pickling **********************************************************************
comment:19 Changed 7 years ago by
- Dependencies set to #13763
- Reviewers set to Nathann Cohen
- Status changed from needs_work to positive_review
The failure in polyhedra is fixed in #13763. This was where I noticed the breakage in CartesianProduct
and came to open this ticket. I'm switching this ticket back to positive review since there is nothing to change here.
comment:20 Changed 7 years ago by
Well, then... :-)
Nathann
comment:21 Changed 7 years ago by
- Milestone changed from sage-5.6 to sage-pending
comment:22 Changed 7 years ago by
- Milestone changed from sage-pending to sage-5.7
comment:23 Changed 7 years ago by
- Merged in set to sage-5.7.beta3
- Resolution set to fixed
- Status changed from positive_review to closed
Hellooooooo !!
Well, same question as the one asked on sage-devel : why shouldn't we build the list of each iterator ? If the list is infinite then the code will not terminate, but of course the user asked for it, and if we do make lists from generators we store a quantity of information which is (expectedly) small compared to the output ?
Err... Actually, if the function does not return a list but only "yields" the elements one at a time, what we store could actually be much, much more than what we should.
Hence, what about doing this :
With a
.. NOTE::
in the method's documentation, saying exactly that. This way we never use more than what we actually need to use.Nathann