Opened 7 years ago

Closed 7 years ago

#13764 closed defect (fixed)

CartesianProduct with generators -> silent wrong answer

Reported by: vbraun Owned by: sage-combinat
Priority: major Milestone: sage-5.7
Component: combinatorics Keywords:
Cc: Merged in: sage-5.7.beta3
Authors: Volker Braun Reviewers: Nathann Cohen
Report Upstream: N/A Work issues:
Branch: Commit:
Dependencies: #13763 Stopgaps:

Description

This is actually documented in the docstring:

sage: def a(n): yield 1*n; yield 2*n
sage: def b(): yield 'a'; yield 'b'
sage: CartesianProduct(a(3), b()).list()
[[3, 'a'], [3, 'b']]

But by the principle of minimal surprise this should yield an error about generators and not silently return the wrong answer. Just to illustrate how dangerous this behavior is note that sage/combinat/integer_vector.py falls precisely into this trap.

Attachments (1)

trac_13764_CartesianProduct_check.patch (3.6 KB) - added by vbraun 7 years ago.
Updated patch

Download all attachments as: .zip

Change History (24)

comment:1 Changed 7 years ago by ncohen

  • Status changed from new to needs_info

Hellooooooo !!

Well, same question as the one asked on sage-devel : why shouldn't we build the list of each iterator ? If the list is infinite then the code will not terminate, but of course the user asked for it, and if we do make lists from generators we store a quantity of information which is (expectedly) small compared to the output ?

Err... Actually, if the function does not return a list but only "yields" the elements one at a time, what we store could actually be much, much more than what we should.

Hence, what about doing this :

  • If we are given two lists, just do the job without problems
  • If we are given iterators, convert them to lists first then do the job ?

With a .. NOTE:: in the method's documentation, saying exactly that. This way we never use more than what we actually need to use.

Nathann

comment:2 Changed 7 years ago by vbraun

Nobody forces you to traverse the whole CartesianProduct, you might just want 10 different elements of a product of potentially infinite iterators.

Last edited 7 years ago by vbraun (previous) (diff)

comment:3 Changed 7 years ago by vbraun

  • Status changed from needs_info to needs_review

comment:4 Changed 7 years ago by ncohen

Oh right. Then what about, in this case build partial (local) lists of what the iterators contain, as they are explored ?

Nathann

comment:5 Changed 7 years ago by vbraun

There is no way to find out if a generator is infinite or not.

comment:6 Changed 7 years ago by ncohen

NOnonononoo I know ! But I was thinking of something like that :

You can easily build an iterator over the elements of N^15 : you first need to enumerate the elements of [0] x ... x [0]. Then, once you are done, you can enumerate the elements of [0,1] x ... x [0,1] that you have not enumerate before. Then, once you are done, you can enumerate the elements of [0,1,2] x .... x [0,1,2] that you have not enumerated before, etc, etc. You see what I mean ? What I mean is of course a bit more complicated than that, because in this situation you do not know the cardinality of your iterators, you do not even know (and cannot, as you say) decide whether they are finite, etc... But if you "ask a new element from an iterator" only when you need it, you can always remember a finite partial list from the elements of each iterator, and still iterate over ALL elements !

Nathann

comment:7 Changed 7 years ago by ncohen

(and of course there is a way to do this without remembering "the list of things you have already returned")

Or else, a bit easier, is to return the elements in lexicographical order. It's less pretty, but easier to implement. And in this situation, you only remember from each iterator the list of elements you have already used. And you can "ask for another element" when needed.

Actually, this order of enumeration is more realistic. I prefered the previous one, but it is trickier to enumerate :-P

Nathann

comment:8 follow-up: Changed 7 years ago by vbraun

Isn't all the funky stuff part of the *EnumeratedSet framework that the combinat people are working on? I don't want to replace the CartesianProduct iterator with something that incurs significant overhead.

comment:9 in reply to: ↑ 8 Changed 7 years ago by ncohen

Isn't all the funky stuff part of the *EnumeratedSet framework that the combinat people are working on? I don't want to replace the CartesianProduct iterator with something that incurs significant overhead.

Yeah, when we need significant overhead we know that we can always delegate to combinat stuff :-P

Of course, we could *only* do that with the elements of the list which are generators, and use sensible code otherwise. With warnings everywhere.

Actually, perhaps the best would be that :

There is an optional flag called cache_iterator or something, which is set to False by default. When it is set to False, the function behaves exactly like the one you wrote, that is REFUSES INPUT if there is an iterator among them. The message could even say "if your iterators are all finite, wrap them with a list() to make them finite.

Then, we could say (in the exception message, or in the doc, or at both places) that if you want the code to work with infinite stuff, or when you cannot ensure that all iterators are finite, and hence cannot wrap them with list() casts, then you can set cache_iterator to True which would do all the caching (or call the combinat stuff and enjoy their overhead if they already wrote it somewhere).

Hence :

  • No speed loss for smart cases
  • The speed will not be destroyed because the user forgot to wrap with list() calls, or just did not know it could make a difference
  • Crazy guys iterating on infinite things can still use the function, be it finally implemented in the method itself or delegated to combinat stuff.

Well.. Is there any other corner case ? :-P

Nathann

comment:10 follow-up: Changed 7 years ago by vbraun

Iterators (objects implementing __iter__) are fine , only generators (the yield keyword) don't work.

I don't see the point in an extra flag to install different workarounds to generators, if you want to convert the input to a list then just do it yourself.

comment:11 in reply to: ↑ 10 ; follow-up: Changed 7 years ago by ncohen

Iterators (objects implementing __iter__) are fine , only generators (the yield keyword) don't work.

Oops, I see. I thought that the problem came from infinite sets, sorry.. But right now iterators are copied, and so the elements they iterate over are generated several times... Scary O_o

I don't see the point in an extra flag to install different workarounds to generators, if you want to convert the input to a list then just do it yourself.

Well, you cannot convert them to lists when they are infinite, and you do not want to cache stuff when you already stored it once.

Anyway, sorry for the misunderstanding. I'll look at the code and start making sense, if possible.

Nathann

comment:12 in reply to: ↑ 11 ; follow-up: Changed 7 years ago by vbraun

Replying to ncohen:

Oops, I see. I thought that the problem came from infinite sets, sorry.. But right now iterators are copied, and so the elements they iterate over are generated several times...

Thats how the cartesian product works. Shouldn't come as a surprise.

Well, you cannot convert them to lists when they are infinite, and you do not want to cache stuff when you already stored it once.

But you can wrap your generator into an IterableFunctionCall if it is potentially infinite.

comment:13 in reply to: ↑ 12 Changed 7 years ago by ncohen

Thats how the cartesian product works. Shouldn't come as a surprise.

Yeah, but you could want to cache them sometimes.. Do not generate them over and over if it takes time to generate them... Of course the opposite may also be true : it is very cheap to generate them, even over and over, but you do not want to store them.

But you can wrap your generator into an IterableFunctionCall if it is potentially infinite.

Yepyep. Right.

Nathann

comment:14 Changed 7 years ago by ncohen

Ahahahah. Well, very easy patch in the end... Looks all good, but could you modify the exception to something like that : "Generators are not allowed : read the method's documentation for a workaround" ?

You just told me about IterableFunctionCall and I had never heard of it before :-)

Nathann

comment:15 Changed 7 years ago by ncohen

Of and also could you please make IterableFunctionCall a html link, so that it is easier to find its documentation when reading the doc ?

(sorryyyyyyyyyy)

Nathann

Changed 7 years ago by vbraun

Updated patch

comment:16 Changed 7 years ago by vbraun

Done

comment:17 Changed 7 years ago by ncohen

  • Status changed from needs_review to positive_review

Well.. Looks good, passes all tests... Sorry for the first misunderstanding :-)

Nathann

comment:18 Changed 7 years ago by ncohen

  • Status changed from positive_review to needs_work

Arggggggggg....

Ahem...

sage -t -long "devel/sage-2/sage/combinat/root_system/associahedron.py"
**********************************************************************
File "/home/ncohen/.Sage/devel/sage-2/sage/combinat/root_system/associahedron.py", line 83:
    sage: TestSuite(Asso).run()
Expected nothing
Got:
    Failure in _test_pickling:
    Traceback (most recent call last):
      File "/home/ncohen/.Sage/local/lib/python/site-packages/sage/misc/sage_unittest.py", line 275, in run
        test_method(tester = tester)
      File "sage_object.pyx", line 396, in sage.structure.sage_object.SageObject._test_pickling (sage/structure/sage_object.c:3669)
      File "/home/ncohen/.Sage/local/lib/python2.7/unittest/case.py", line 511, in assertEqual
        assertion_func(first, second, msg=msg)
      File "/home/ncohen/.Sage/local/lib/python2.7/unittest/case.py", line 501, in _baseAssertEqual
        if not first == second:
      File "/home/ncohen/.Sage/local/lib/python/site-packages/sage/geometry/polyhedron/base.py", line 382, in __eq__
        return self._is_subpolyhedron(other) and other._is_subpolyhedron(self)
      File "/home/ncohen/.Sage/local/lib/python/site-packages/sage/geometry/polyhedron/base.py", line 490, in _is_subpolyhedron
        CartesianProduct(other.Hrep_generator(), self.Vrep_generator()) )
      File "/home/ncohen/.Sage/local/lib/python/site-packages/sage/combinat/cartesian_product.py", line 64, in CartesianProduct
        '(type "CartesianProduct?") for a workaround')
    ValueError: generators are not allowed, see the documentation (type "CartesianProduct?") for a workaround
    ------------------------------------------------------------
    The following tests failed: _test_pickling
**********************************************************************

comment:19 Changed 7 years ago by vbraun

  • Dependencies set to #13763
  • Reviewers set to Nathann Cohen
  • Status changed from needs_work to positive_review

The failure in polyhedra is fixed in #13763. This was where I noticed the breakage in CartesianProduct and came to open this ticket. I'm switching this ticket back to positive review since there is nothing to change here.

comment:20 Changed 7 years ago by ncohen

Well, then... :-)

Nathann

comment:21 Changed 7 years ago by jdemeyer

  • Milestone changed from sage-5.6 to sage-pending

comment:22 Changed 7 years ago by jdemeyer

  • Milestone changed from sage-pending to sage-5.7

comment:23 Changed 7 years ago by jdemeyer

  • Merged in set to sage-5.7.beta3
  • Resolution set to fixed
  • Status changed from positive_review to closed
Note: See TracTickets for help on using tickets.