Opened 4 years ago

Closed 4 years ago

#19319 closed enhancement (fixed)

iterator over products on diagonals a la Cantor

Reported by: dkrenn Owned by:
Priority: major Milestone: sage-6.9
Component: misc Keywords:
Cc: behackl, cheuberg, ncohen Merged in:
Authors: Vincent Delecroix Reviewers: Daniel Krenn, Clemens Heuberger
Report Upstream: N/A Work issues:
Branch: 96c0366 (Commits) Commit: 96c03668100f8fc42e60ae3f7d7a715d257f41cb
Dependencies: Stopgaps:

Description (last modified by vdelecroix)

Add an iterator, which goes over the elements of a cartesian product of (possible infinite) inputs along the diagonals like the Cantor pairing function.

This was initially factored out from #19048.

Change History (44)

comment:1 Changed 4 years ago by dkrenn

  • Branch set to u/dkrenn/product_cantor_pairing

comment:2 Changed 4 years ago by dkrenn

  • Authors set to Daniel Krenn
  • Commit set to 17229c620b4aff47161b725de2ebb4f890cfc2af
  • Status changed from new to needs_review

New commits:

619cd8ccopy code from #19048
c4d2879rename to product_cantor_pairing
a4697d3fix import in doctest
e8460b9improve docstring
9e41be5doctest with infinite iterator inputs
97cb59cadd seealso blocks
17229c6extend AUTHROS

comment:3 Changed 4 years ago by ncohen

O_o

comment:4 follow-up: Changed 4 years ago by ncohen

Yeah, I also believed that it should be shorter.

Do you really need this Kantor enumeration, or perhaps you don't care for as long as the product of two infinite iterators works correctly?

comment:5 follow-ups: Changed 4 years ago by ncohen

If you don't care, I just gave it a try:

def product(A,B):
    B = iter(B)
    lb = []
    b_done = False
    for a in A:
        try:
            if not b_done:
                lb.append(B.next())
        except StopIteration:
            b_done = True
        for b in lb:
            yield (a,b)

Does the job.

Nathann

comment:6 follow-up: Changed 4 years ago by ncohen

About your 5 copy/paste: if you want to link every function to all others, I think the best is to have an index of methods at the head of the module and add a link toward the module.

By the way, if you feel like reviewing some doc patch: #19061

Nathann

comment:7 in reply to: ↑ 4 ; follow-up: Changed 4 years ago by dkrenn

Replying to ncohen:

Yeah, I also believed that it should be shorter.

:)

Do you really need this Kantor enumeration, or perhaps you don't care for as long as the product of two infinite iterators works correctly?

I care (at least a bit)...so I would leave it as it is.

comment:8 in reply to: ↑ 7 Changed 4 years ago by ncohen

I care (at least a bit)...so I would leave it as it is.

Could you say why? I do not see the point of having a complicated code in Sage if ten lines provide the feature.

comment:9 in reply to: ↑ 5 ; follow-up: Changed 4 years ago by dkrenn

Replying to ncohen:

If you don't care, I just gave it a try:

def product(A,B):
    B = iter(B)
    lb = []
    b_done = False
    for a in A:
        try:
            if not b_done:
                lb.append(B.next())
        except StopIteration:
            b_done = True
        for b in lb:
            yield (a,b)

Your code does not give all pairs...

comment:10 in reply to: ↑ 9 Changed 4 years ago by ncohen

Your code does not give all pairs...

Yeah, only works for infinite sets.

comment:11 in reply to: ↑ 5 Changed 4 years ago by dkrenn

Replying to ncohen:

If you don't care, I just gave it a try:

def product(A,B):
    B = iter(B)
    lb = []
    b_done = False
    for a in A:
        try:
            if not b_done:
                lb.append(B.next())
        except StopIteration:
            b_done = True
        for b in lb:
            yield (a,b)
sage: list(product(srange(2), ['a', 'b', 'c']))
[(0, 'a'), (1, 'a'), (1, 'b')]

comment:12 follow-up: Changed 4 years ago by ncohen

Yeah true, it's really bad. Soooooo I need two lists. Wanted to avoid that.

The question remains, though: is there a reason to insist on Kantor ordering, or would any other order do the job?

comment:13 in reply to: ↑ 12 Changed 4 years ago by dkrenn

Replying to ncohen:

The question remains, though: is there a reason to insist on Kantor ordering, or would any other order do the job?

In #19048 it does not really (except from aesthetic aspects) matter (but it has to deal with the finite case as well and it would be nice if then all pairs would occur). But I use the same code in some other Sage projects as well (not public (yet)), where it matters: There I need an order my the sums of the indices, which is Cantor. Thus having this tool in SageMath would be great.

comment:14 Changed 4 years ago by ncohen

Okay. Longer than the first attempt, though

def product(A,B):
    A = iter(A)
    B = iter(B)
    list_a = []
    list_b = []
    a_nonempty = b_nonempty = True
    while a_nonempty or b_nonempty:
        if a_nonempty:
            try:
                list_a.append(A.next())
            except StopIteration:
                a_nonempty = False
            else:
                for b in list_b:
                    yield (list_a[-1],b)
        if b_nonempty:
            try:
                list_b.append(B.next())
            except StopIteration:
                b_nonempty = False
            else:
                for a in list_a:
                    yield (a, list_b[-1])

comment:15 Changed 4 years ago by ncohen

ARgggg.. Fails too >_<

comment:16 Changed 4 years ago by ncohen

Oh no, false alarm. Seems ok.

comment:17 follow-up: Changed 4 years ago by cheuberg

A few comments on Daniel's code:

  1. Does not terminate if A is empty and B is infinite, thus I'd simply try to yield A[0], B[0] initially and then start s at 1.
  2. try: except: block in iter_as_list can be removed

comment:18 Changed 4 years ago by git

  • Commit changed from 17229c620b4aff47161b725de2ebb4f890cfc2af to 70185c3a13f573cf1eacbec2eb2122082e6c375f

Branch pushed to git repo; I updated commit sha1. New commits:

9a77311deal with product(empty, infinite)
70185c3remove unneccesary try/except block

comment:19 in reply to: ↑ 17 Changed 4 years ago by dkrenn

Replying to cheuberg:

A few comments on Daniel's code:

  1. Does not terminate if A is empty and B is infinite, thus I'd simply try to yield A[0], B[0] initially and then start s at 1.
  2. try: except: block in iter_as_list can be removed

Both changed.

comment:20 in reply to: ↑ 6 ; follow-up: Changed 4 years ago by dkrenn

Replying to ncohen:

About your 5 copy/paste: if you want to link every function to all others, I think the best is to have an index of methods at the head of the module and add a link toward the module.

I think having SEEALSO blocks with links pointing directly to related functions/classes is a very useful thing (There have been some discussions on sage-devel about this.); it makes it easy for the user to find functions without a long search (Finding the "correct" function is definitly a problem for Sage users have...)

comment:21 follow-up: Changed 4 years ago by cheuberg

I have an alternative implementation (pushed as u/cheuberg/product_cantor_pairing):

    from itertools import count
    from sage.rings.infinity import infinity
    A = iter(A)
    B = iter(B)
    max_A = infinity
    max_B = infinity
    cache_A = []
    cache_B = []
    for s in count():
        if s <= max_A:
            try:
                cache_A.append(next(A))
            except StopIteration:
                max_A = s-1
        if s <= max_B:
            try:
                cache_B.append(next(B))
            except StopIteration:
                max_B = s-1
        if s > max_A + max_B or max_A < 0 or max_B < 0:
            return

        for i in range(max(0, s-max_B), min(s, max_A)+1):
            yield cache_A[i], cache_B[s-i]

comment:22 in reply to: ↑ 20 ; follow-up: Changed 4 years ago by ncohen

I think having SEEALSO blocks with links pointing directly to related functions/classes is a very useful thing (There have been some discussions on sage-devel about this.); it makes it easy for the user to find functions without a long search (Finding the "correct" function is definitly a problem for Sage users have...)

You got me wrong: I did not ask you to remove the seealso block. I proposed to have a

SEEALSO::

    For related functions, refer to the module's documentation

with a link of course.

Linking n functions to each other requires n*(n-1) different links. If a function is moved, if a function is added, that requires much more work. Additionally, it possible that somebody *adding* a function may forget to update the others, which will not happen if we have an index of function at the head of the module.

Nathann

comment:23 in reply to: ↑ 21 Changed 4 years ago by ncohen

I have an alternative implementation (pushed as u/cheuberg/product_cantor_pairing):

+1 for the abscence of private class.

Nathann

comment:24 Changed 4 years ago by cheuberg

  • Authors changed from Daniel Krenn to Daniel Krenn, Clemens Heuberger
  • Branch changed from u/dkrenn/product_cantor_pairing to u/cheuberg/product_cantor_pairing
  • Commit changed from 70185c3a13f573cf1eacbec2eb2122082e6c375f to d4dec2be9048e1b8fc9de246fe5c698d3676a13a

New commits:

d4dec2bTrac #19319: alternative implementation

comment:25 in reply to: ↑ 22 Changed 4 years ago by dkrenn

Replying to ncohen:

You got me wrong: I did not ask you to remove the seealso block. I proposed to have a

SEEALSO::

    For related functions, refer to the module's documentation

with a link of course.

I understood.

Linking n functions to each other requires n*(n-1) different links. If a function is moved, if a function is added, that requires much more work. Additionally, it possible that somebody *adding* a function may forget to update the others, which will not happen if we have an index of function at the head of the module.

Having these links directly in the SEEALSO block of the function helps the user...and this ranks higher to me than what a developer has to add when moving something around (and of course there is --warn-links which alerts when something is not where it should be). When adding a new function it is fine that not all SEEALSO blocks contain it; maybe only the close related ones. In sage.misc.mrange there are just a few functions and all those should IMHO be linked. In larger files I would *not* link all in all, but only build "cliques" of very related functions (but this depends on the functions, of course).

comment:26 Changed 4 years ago by dkrenn

  • Branch changed from u/cheuberg/product_cantor_pairing to u/dkrenn/product_cantor_pairing

comment:27 follow-up: Changed 4 years ago by dkrenn

  • Commit changed from d4dec2be9048e1b8fc9de246fe5c698d3676a13a to fde8e6d09f41645ec4faf9d071cf60c7b35ef9e3
  • Reviewers set to Daniel Krenn

Alternative implementation looks good to be (and all tests pass).


New commits:

fde8e6dminor changes to code: spacings, PEP8, remove comment

comment:28 in reply to: ↑ 27 Changed 4 years ago by cheuberg

Replying to dkrenn:

New commits:

fde8e6dminor changes to code: spacings, PEP8, remove comment

thanks.

comment:29 follow-up: Changed 4 years ago by vdelecroix

What is the point of having it only for pairs? What if I want to iterate ZZ x ZZ x ZZ? Using towers of iterators is definitely not a solution (hard to write/use and slow).

comment:30 in reply to: ↑ 29 ; follow-up: Changed 4 years ago by cheuberg

Replying to vdelecroix:

What is the point of having it only for pairs? What if I want to iterate ZZ x ZZ x ZZ? Using towers of iterators is definitely not a solution (hard to write/use and slow).

Short answer: iterating over pairs is what we need right now.

Iteration over the product of n iterators will, as far as I can see, always need some recursive algorithm. I do not know whether avoiding iterators internally would make that much of a difference with respect of speed; IMHO it will not make it more readable.

What about replacing product_cantor_pairing(A, B) by product_cantor_pairing([A, B]) and implementing products of >= 3 iterators by a recursive call to product_cantor_pairing (and flattening the results) for the time being?

comment:31 in reply to: ↑ 30 ; follow-up: Changed 4 years ago by vdelecroix

Replying to cheuberg:

Replying to vdelecroix:

What is the point of having it only for pairs? What if I want to iterate ZZ x ZZ x ZZ? Using towers of iterators is definitely not a solution (hard to write/use and slow).

Short answer: iterating over pairs is what we need right now.

I perfectly understand.

Iteration over the product of n iterators will, as far as I can see, always need some recursive algorithm. I do not know whether avoiding iterators internally would make that much of a difference with respect of speed; IMHO it will not make it more readable.

Nope. If you look at the code of product in Python there is no recursion involved. It 70 lines long, with a lot of spaces and written in pure C.

What about replacing product_cantor_pairing(A, B) by product_cantor_pairing([A, B]) and implementing products of >= 3 iterators by a recursive call to product_cantor_pairing (and flattening the results) for the time being?

Is that the canonical order you want for a triple?

comment:32 in reply to: ↑ 31 Changed 4 years ago by cheuberg

Replying to vdelecroix:

Replying to cheuberg:

Iteration over the product of n iterators will, as far as I can see, always need some recursive algorithm. I do not know whether avoiding iterators internally would make that much of a difference with respect of speed; IMHO it will not make it more readable.

Nope. If you look at the code of product in Python there is no recursion involved. It 70 lines long, with a lot of spaces and written in pure C.

for infinite iterators, I suppose that things get somewhat trickier.

What about replacing product_cantor_pairing(A, B) by product_cantor_pairing([A, B]) and implementing products of >= 3 iterators by a recursive call to product_cantor_pairing (and flattening the results) for the time being?

Is that the canonical order you want for a triple?

Well, I am not really eager for ordering triples anyway.

But I imagine that having product_cantor_pairing([A, B, C]) returning the elements in the same order as product_cantor_pairing([product_cantor_pairing([A, B]), C]) (or the other way around, i.e., product_cantor_pairing([A, product_cantor_pairing([B, C])])) would make sense.

comment:33 follow-up: Changed 4 years ago by vdelecroix

  • Branch changed from u/dkrenn/product_cantor_pairing to u/vdelecroix/19319
  • Commit changed from fde8e6d09f41645ec4faf9d071cf60c7b35ef9e3 to c655f9ffb0ad9f0bfdaf712a4becb2e9eb9c1a70

26 lines and no recursion.

Vincent


New commits:

c655f9fTrac 19319: Cantor iteration of cartesian products

comment:34 Changed 4 years ago by git

  • Commit changed from c655f9ffb0ad9f0bfdaf712a4becb2e9eb9c1a70 to 4a52a845469078b5418937e8927d473e8b3d2ec6

Branch pushed to git repo; I updated commit sha1. New commits:

4a52a84Trac 19319: fix doctests

comment:35 Changed 4 years ago by vdelecroix

  • Description modified (diff)
  • Summary changed from iterator over pairs on diagonals a la Cantor pairing to iterator over products on diagonals a la Cantor

comment:36 follow-up: Changed 4 years ago by nbruin

Perhaps consider using sage.misc.lazy_list.lazy_list as a cache? That should save you making explicit appends. You could even get slightly better memory use for cartesian powers (a common case) by sharing caches.

comment:37 Changed 4 years ago by cheuberg

  • Branch changed from u/vdelecroix/19319 to u/cheuberg/19319

comment:38 Changed 4 years ago by git

  • Commit changed from 4a52a845469078b5418937e8927d473e8b3d2ec6 to 1fee7226728e5ec09ffdeb1acdb59c2aad1dc7d5

Branch pushed to git repo; I updated commit sha1. New commits:

3c5af3bTrac #19319: fix typo
c20bfe5Trac #19319: a.next() -> next(a) (Python3 compliance)
1fee722Trac #19319: added a few blanks

comment:39 in reply to: ↑ 33 Changed 4 years ago by cheuberg

  • Authors changed from Daniel Krenn, Clemens Heuberger to Vincent Delecroix
  • Reviewers changed from Daniel Krenn to Daniel Krenn, Clemens Heuberger

Replying to vdelecroix:

26 lines and no recursion.

thanks --- however, using IntegerListsLex amounts to a little cheating with respect to the length of this routine ;-).

I added three minor commits.

Any particular reason that you return lists instead of tuples?

comment:40 in reply to: ↑ 36 ; follow-up: Changed 4 years ago by vdelecroix

Replying to cheuberg:

Replying to vdelecroix:

26 lines and no recursion.

thanks --- however, using IntegerListsLex amounts to a little cheating with respect to the length of this routine ;-).

Sure. But the moto is "not reinvent the wheel".

Any particular reason that you return lists instead of tuples?

Good question. itertools used to output tuples. But sometimes you want to play with the result (like appending some more elements). For me it is fine either way.

Replying to nbruin:

Perhaps consider using sage.misc.lazy_list.lazy_list as a cache? That should save you making explicit appends. You could even get slightly better memory use for cartesian powers (a common case) by sharing caches.

Using lazy list you would also need to catch IndexError or so until we use the Cython API I would not go with it.

I will add a trivial commit for handling sharing of slices (very good idea!).

Vincent

comment:41 in reply to: ↑ 40 Changed 4 years ago by cheuberg

Replying to vdelecroix:

Replying to cheuberg:

Any particular reason that you return lists instead of tuples?

Good question. itertools used to output tuples. But sometimes you want to play with the result (like appending some more elements). For me it is fine either way.

I'd actually prefer tuples, for consistency with itertools and the other methods in this module.

comment:42 Changed 4 years ago by vdelecroix

  • Branch changed from u/cheuberg/19319 to u/vdelecroix/19319
  • Commit changed from 1fee7226728e5ec09ffdeb1acdb59c2aad1dc7d5 to 96c03668100f8fc42e60ae3f7d7a715d257f41cb

Hi Clemens,

I added a commit on top of yours for returning tuples and accepting a repeat argument (with the very same specification as for itertools.product).

Vincent


New commits:

96c0366Trac 19319: return tuples + repeat argument

comment:43 Changed 4 years ago by cheuberg

  • Status changed from needs_review to positive_review

Code is fine, doctests pass, documentation builds. Thanks.

comment:44 Changed 4 years ago by vbraun

  • Branch changed from u/vdelecroix/19319 to 96c03668100f8fc42e60ae3f7d7a715d257f41cb
  • Resolution set to fixed
  • Status changed from positive_review to closed
Note: See TracTickets for help on using tickets.