Opened 4 years ago

Closed 4 years ago

# iterator over products on diagonals a la Cantor

Reported by: Owned by: dkrenn major sage-6.9 misc behackl, cheuberg, ncohen Vincent Delecroix Daniel Krenn, Clemens Heuberger N/A 96c0366 (Commits) 96c03668100f8fc42e60ae3f7d7a715d257f41cb

Add an iterator, which goes over the elements of a cartesian product of (possible infinite) inputs along the diagonals like the Cantor pairing function.

This was initially factored out from #19048.

### comment:1 Changed 4 years ago by dkrenn

• Branch set to u/dkrenn/product_cantor_pairing

### comment:2 Changed 4 years ago by dkrenn

• Authors set to Daniel Krenn
• Commit set to 17229c620b4aff47161b725de2ebb4f890cfc2af
• Status changed from new to needs_review

New commits:

 ​619cd8c `copy code from #19048` ​c4d2879 `rename to product_cantor_pairing` ​a4697d3 `fix import in doctest` ​e8460b9 `improve docstring` ​9e41be5 `doctest with infinite iterator inputs` ​97cb59c `add seealso blocks` ​17229c6 `extend AUTHROS`

`O_o`

### comment:4 follow-up: ↓ 7 Changed 4 years ago by ncohen

Yeah, I also believed that it should be shorter.

Do you really need this Kantor enumeration, or perhaps you don't care for as long as the product of two infinite iterators works correctly?

### comment:5 follow-ups: ↓ 9 ↓ 11 Changed 4 years ago by ncohen

If you don't care, I just gave it a try:

```def product(A,B):
B = iter(B)
lb = []
b_done = False
for a in A:
try:
if not b_done:
lb.append(B.next())
except StopIteration:
b_done = True
for b in lb:
yield (a,b)
```

Does the job.

Nathann

### comment:6 follow-up: ↓ 20 Changed 4 years ago by ncohen

About your 5 copy/paste: if you want to link every function to all others, I think the best is to have an index of methods at the head of the module and add a link toward the module.

By the way, if you feel like reviewing some doc patch: #19061

Nathann

### comment:7 in reply to: ↑ 4 ; follow-up: ↓ 8 Changed 4 years ago by dkrenn

Yeah, I also believed that it should be shorter.

:)

Do you really need this Kantor enumeration, or perhaps you don't care for as long as the product of two infinite iterators works correctly?

I care (at least a bit)...so I would leave it as it is.

### comment:8 in reply to: ↑ 7 Changed 4 years ago by ncohen

I care (at least a bit)...so I would leave it as it is.

Could you say why? I do not see the point of having a complicated code in Sage if ten lines provide the feature.

### comment:9 in reply to: ↑ 5 ; follow-up: ↓ 10 Changed 4 years ago by dkrenn

If you don't care, I just gave it a try:

```def product(A,B):
B = iter(B)
lb = []
b_done = False
for a in A:
try:
if not b_done:
lb.append(B.next())
except StopIteration:
b_done = True
for b in lb:
yield (a,b)
```

Your code does not give all pairs...

### comment:10 in reply to: ↑ 9 Changed 4 years ago by ncohen

Your code does not give all pairs...

Yeah, only works for infinite sets.

### comment:11 in reply to: ↑ 5 Changed 4 years ago by dkrenn

If you don't care, I just gave it a try:

```def product(A,B):
B = iter(B)
lb = []
b_done = False
for a in A:
try:
if not b_done:
lb.append(B.next())
except StopIteration:
b_done = True
for b in lb:
yield (a,b)
```
```sage: list(product(srange(2), ['a', 'b', 'c']))
[(0, 'a'), (1, 'a'), (1, 'b')]
```

### comment:12 follow-up: ↓ 13 Changed 4 years ago by ncohen

Yeah true, it's really bad. Soooooo I need two lists. Wanted to avoid that.

The question remains, though: is there a reason to insist on Kantor ordering, or would any other order do the job?

### comment:13 in reply to: ↑ 12 Changed 4 years ago by dkrenn

The question remains, though: is there a reason to insist on Kantor ordering, or would any other order do the job?

In #19048 it does not really (except from aesthetic aspects) matter (but it has to deal with the finite case as well and it would be nice if then all pairs would occur). But I use the same code in some other Sage projects as well (not public (yet)), where it matters: There I need an order my the sums of the indices, which is Cantor. Thus having this tool in SageMath would be great.

### comment:14 Changed 4 years ago by ncohen

Okay. Longer than the first attempt, though

```def product(A,B):
A = iter(A)
B = iter(B)
list_a = []
list_b = []
a_nonempty = b_nonempty = True
while a_nonempty or b_nonempty:
if a_nonempty:
try:
list_a.append(A.next())
except StopIteration:
a_nonempty = False
else:
for b in list_b:
yield (list_a[-1],b)
if b_nonempty:
try:
list_b.append(B.next())
except StopIteration:
b_nonempty = False
else:
for a in list_a:
yield (a, list_b[-1])
```

### comment:15 Changed 4 years ago by ncohen

ARgggg.. Fails too `>_<`

### comment:16 Changed 4 years ago by ncohen

Oh no, false alarm. Seems ok.

### comment:17 follow-up: ↓ 19 Changed 4 years ago by cheuberg

A few comments on Daniel's code:

1. Does not terminate if `A` is empty and `B` is infinite, thus I'd simply try to yield A[0], B[0] initially and then start `s` at `1`.
2. `try:` `except:` block in `iter_as_list` can be removed

### comment:18 Changed 4 years ago by git

• Commit changed from 17229c620b4aff47161b725de2ebb4f890cfc2af to 70185c3a13f573cf1eacbec2eb2122082e6c375f

Branch pushed to git repo; I updated commit sha1. New commits:

 ​9a77311 `deal with product(empty, infinite)` ​70185c3 `remove unneccesary try/except block`

### comment:19 in reply to: ↑ 17 Changed 4 years ago by dkrenn

A few comments on Daniel's code:

1. Does not terminate if `A` is empty and `B` is infinite, thus I'd simply try to yield A[0], B[0] initially and then start `s` at `1`.
2. `try:` `except:` block in `iter_as_list` can be removed

Both changed.

### comment:20 in reply to: ↑ 6 ; follow-up: ↓ 22 Changed 4 years ago by dkrenn

About your 5 copy/paste: if you want to link every function to all others, I think the best is to have an index of methods at the head of the module and add a link toward the module.

I think having SEEALSO blocks with links pointing directly to related functions/classes is a very useful thing (There have been some discussions on sage-devel about this.); it makes it easy for the user to find functions without a long search (Finding the "correct" function is definitly a problem for Sage users have...)

### comment:21 follow-up: ↓ 23 Changed 4 years ago by cheuberg

I have an alternative implementation (pushed as `u/cheuberg/product_cantor_pairing`):

```    from itertools import count
from sage.rings.infinity import infinity
A = iter(A)
B = iter(B)
max_A = infinity
max_B = infinity
cache_A = []
cache_B = []
for s in count():
if s <= max_A:
try:
cache_A.append(next(A))
except StopIteration:
max_A = s-1
if s <= max_B:
try:
cache_B.append(next(B))
except StopIteration:
max_B = s-1
if s > max_A + max_B or max_A < 0 or max_B < 0:
return

for i in range(max(0, s-max_B), min(s, max_A)+1):
yield cache_A[i], cache_B[s-i]
```

### comment:22 in reply to: ↑ 20 ; follow-up: ↓ 25 Changed 4 years ago by ncohen

I think having SEEALSO blocks with links pointing directly to related functions/classes is a very useful thing (There have been some discussions on sage-devel about this.); it makes it easy for the user to find functions without a long search (Finding the "correct" function is definitly a problem for Sage users have...)

You got me wrong: I did not ask you to remove the seealso block. I proposed to have a

```SEEALSO::

For related functions, refer to the module's documentation
```

Linking n functions to each other requires n*(n-1) different links. If a function is moved, if a function is added, that requires much more work. Additionally, it possible that somebody *adding* a function may forget to update the others, which will not happen if we have an index of function at the head of the module.

Nathann

### comment:23 in reply to: ↑ 21 Changed 4 years ago by ncohen

I have an alternative implementation (pushed as `u/cheuberg/product_cantor_pairing`):

+1 for the abscence of private class.

Nathann

### comment:24 Changed 4 years ago by cheuberg

• Authors changed from Daniel Krenn to Daniel Krenn, Clemens Heuberger
• Branch changed from u/dkrenn/product_cantor_pairing to u/cheuberg/product_cantor_pairing
• Commit changed from 70185c3a13f573cf1eacbec2eb2122082e6c375f to d4dec2be9048e1b8fc9de246fe5c698d3676a13a

New commits:

 ​d4dec2b `Trac #19319: alternative implementation`

### comment:25 in reply to: ↑ 22 Changed 4 years ago by dkrenn

You got me wrong: I did not ask you to remove the seealso block. I proposed to have a

```SEEALSO::

For related functions, refer to the module's documentation
```

I understood.

Linking n functions to each other requires n*(n-1) different links. If a function is moved, if a function is added, that requires much more work. Additionally, it possible that somebody *adding* a function may forget to update the others, which will not happen if we have an index of function at the head of the module.

Having these links directly in the `SEEALSO` block of the function helps the user...and this ranks higher to me than what a developer has to add when moving something around (and of course there is `--warn-links` which alerts when something is not where it should be). When adding a new function it is fine that not all `SEEALSO` blocks contain it; maybe only the close related ones. In `sage.misc.mrange` there are just a few functions and all those should IMHO be linked. In larger files I would *not* link all in all, but only build "cliques" of very related functions (but this depends on the functions, of course).

### comment:26 Changed 4 years ago by dkrenn

• Branch changed from u/cheuberg/product_cantor_pairing to u/dkrenn/product_cantor_pairing

### comment:27 follow-up: ↓ 28 Changed 4 years ago by dkrenn

• Commit changed from d4dec2be9048e1b8fc9de246fe5c698d3676a13a to fde8e6d09f41645ec4faf9d071cf60c7b35ef9e3
• Reviewers set to Daniel Krenn

Alternative implementation looks good to be (and all tests pass).

New commits:

 ​fde8e6d `minor changes to code: spacings, PEP8, remove comment`

### comment:28 in reply to: ↑ 27 Changed 4 years ago by cheuberg

New commits:

 ​fde8e6d `minor changes to code: spacings, PEP8, remove comment`

thanks.

### comment:29 follow-up: ↓ 30 Changed 4 years ago by vdelecroix

What is the point of having it only for pairs? What if I want to iterate `ZZ x ZZ x ZZ`? Using towers of iterators is definitely not a solution (hard to write/use and slow).

### comment:30 in reply to: ↑ 29 ; follow-up: ↓ 31 Changed 4 years ago by cheuberg

What is the point of having it only for pairs? What if I want to iterate `ZZ x ZZ x ZZ`? Using towers of iterators is definitely not a solution (hard to write/use and slow).

Short answer: iterating over pairs is what we need right now.

Iteration over the product of n iterators will, as far as I can see, always need some recursive algorithm. I do not know whether avoiding iterators internally would make that much of a difference with respect of speed; IMHO it will not make it more readable.

What about replacing `product_cantor_pairing(A, B)` by `product_cantor_pairing([A, B])` and implementing products of `>= 3` iterators by a recursive call to `product_cantor_pairing` (and flattening the results) for the time being?

### comment:31 in reply to: ↑ 30 ; follow-up: ↓ 32 Changed 4 years ago by vdelecroix

What is the point of having it only for pairs? What if I want to iterate `ZZ x ZZ x ZZ`? Using towers of iterators is definitely not a solution (hard to write/use and slow).

Short answer: iterating over pairs is what we need right now.

I perfectly understand.

Iteration over the product of n iterators will, as far as I can see, always need some recursive algorithm. I do not know whether avoiding iterators internally would make that much of a difference with respect of speed; IMHO it will not make it more readable.

Nope. If you look at the code of `product` in Python there is no recursion involved. It 70 lines long, with a lot of spaces and written in pure C.

What about replacing `product_cantor_pairing(A, B)` by `product_cantor_pairing([A, B])` and implementing products of `>= 3` iterators by a recursive call to `product_cantor_pairing` (and flattening the results) for the time being?

Is that the canonical order you want for a triple?

### comment:32 in reply to: ↑ 31 Changed 4 years ago by cheuberg

Iteration over the product of n iterators will, as far as I can see, always need some recursive algorithm. I do not know whether avoiding iterators internally would make that much of a difference with respect of speed; IMHO it will not make it more readable.

Nope. If you look at the code of `product` in Python there is no recursion involved. It 70 lines long, with a lot of spaces and written in pure C.

for infinite iterators, I suppose that things get somewhat trickier.

What about replacing `product_cantor_pairing(A, B)` by `product_cantor_pairing([A, B])` and implementing products of `>= 3` iterators by a recursive call to `product_cantor_pairing` (and flattening the results) for the time being?

Is that the canonical order you want for a triple?

Well, I am not really eager for ordering triples anyway.

But I imagine that having `product_cantor_pairing([A, B, C])` returning the elements in the same order as `product_cantor_pairing([product_cantor_pairing([A, B]), C])` (or the other way around, i.e., `product_cantor_pairing([A, product_cantor_pairing([B, C])])`) would make sense.

### comment:33 follow-up: ↓ 39 Changed 4 years ago by vdelecroix

• Branch changed from u/dkrenn/product_cantor_pairing to u/vdelecroix/19319
• Commit changed from fde8e6d09f41645ec4faf9d071cf60c7b35ef9e3 to c655f9ffb0ad9f0bfdaf712a4becb2e9eb9c1a70

26 lines and no recursion.

Vincent

New commits:

 ​c655f9f `Trac 19319: Cantor iteration of cartesian products`

### comment:34 Changed 4 years ago by git

• Commit changed from c655f9ffb0ad9f0bfdaf712a4becb2e9eb9c1a70 to 4a52a845469078b5418937e8927d473e8b3d2ec6

Branch pushed to git repo; I updated commit sha1. New commits:

 ​4a52a84 `Trac 19319: fix doctests`

### comment:35 Changed 4 years ago by vdelecroix

• Description modified (diff)
• Summary changed from iterator over pairs on diagonals a la Cantor pairing to iterator over products on diagonals a la Cantor

### comment:36 follow-up: ↓ 40 Changed 4 years ago by nbruin

Perhaps consider using `sage.misc.lazy_list.lazy_list` as a cache? That should save you making explicit appends. You could even get slightly better memory use for cartesian powers (a common case) by sharing caches.

### comment:37 Changed 4 years ago by cheuberg

• Branch changed from u/vdelecroix/19319 to u/cheuberg/19319

### comment:38 Changed 4 years ago by git

• Commit changed from 4a52a845469078b5418937e8927d473e8b3d2ec6 to 1fee7226728e5ec09ffdeb1acdb59c2aad1dc7d5

Branch pushed to git repo; I updated commit sha1. New commits:

 ​3c5af3b `Trac #19319: fix typo` ​c20bfe5 `Trac #19319: a.next() -> next(a) (Python3 compliance)` ​1fee722 `Trac #19319: added a few blanks`

### comment:39 in reply to: ↑ 33 Changed 4 years ago by cheuberg

• Authors changed from Daniel Krenn, Clemens Heuberger to Vincent Delecroix
• Reviewers changed from Daniel Krenn to Daniel Krenn, Clemens Heuberger

26 lines and no recursion.

thanks --- however, using `IntegerListsLex` amounts to a little cheating with respect to the length of this routine `;-)`.

Any particular reason that you return lists instead of tuples?

### comment:40 in reply to: ↑ 36 ; follow-up: ↓ 41 Changed 4 years ago by vdelecroix

26 lines and no recursion.

thanks --- however, using `IntegerListsLex` amounts to a little cheating with respect to the length of this routine `;-)`.

Sure. But the moto is "not reinvent the wheel".

Any particular reason that you return lists instead of tuples?

Good question. `itertools` used to output tuples. But sometimes you want to play with the result (like appending some more elements). For me it is fine either way.

Perhaps consider using `sage.misc.lazy_list.lazy_list` as a cache? That should save you making explicit appends. You could even get slightly better memory use for cartesian powers (a common case) by sharing caches.

Using lazy list you would also need to catch `IndexError` or so until we use the Cython API I would not go with it.

I will add a trivial commit for handling sharing of slices (very good idea!).

Vincent

### comment:41 in reply to: ↑ 40 Changed 4 years ago by cheuberg

Any particular reason that you return lists instead of tuples?

Good question. `itertools` used to output tuples. But sometimes you want to play with the result (like appending some more elements). For me it is fine either way.

I'd actually prefer tuples, for consistency with `itertools` and the other methods in this module.

### comment:42 Changed 4 years ago by vdelecroix

• Branch changed from u/cheuberg/19319 to u/vdelecroix/19319
• Commit changed from 1fee7226728e5ec09ffdeb1acdb59c2aad1dc7d5 to 96c03668100f8fc42e60ae3f7d7a715d257f41cb

Hi Clemens,

I added a commit on top of yours for returning tuples and accepting a `repeat` argument (with the very same specification as for `itertools.product`).

Vincent

New commits:

 ​96c0366 `Trac 19319: return tuples + repeat argument`

### comment:43 Changed 4 years ago by cheuberg

• Status changed from needs_review to positive_review

Code is fine, doctests pass, documentation builds. Thanks.

### comment:44 Changed 4 years ago by vbraun

• Branch changed from u/vdelecroix/19319 to 96c03668100f8fc42e60ae3f7d7a715d257f41cb
• Resolution set to fixed
• Status changed from positive_review to closed
Note: See TracTickets for help on using tickets.