Opened 5 years ago

Closed 4 years ago

#18109 closed defect (fixed)

Restructure IntegerListLex code

Reported by: vdelecroix Owned by:
Priority: major Milestone: sage-6.10
Component: combinatorics Keywords:
Cc: aschilling, jdemeyer, nthiery, ncohen, bgillespie Merged in:
Authors: Jeroen Demeyer Reviewers: Anne Schilling, Travis Scrimshaw
Report Upstream: N/A Work issues:
Branch: 65025ab (Commits) Commit: 65025ab68743a1dc6c30eb73b6db4e4283f49d4a
Dependencies: #15525 Stopgaps:

Description (last modified by jdemeyer)

Split up IntegerListsLex in a fast Cython back-end and a (slower) Python front-end. Restructure with multiple implementations (like #17920) in mind.

This ticket will not change anything to the public interface of IntegerListsLex.

Change History (100)

comment:1 follow-up: Changed 5 years ago by nthiery

  • I believe we should rephrase this ticket as "extract the IntegerListsLex *iterator* as a standalone tool that depends on nothing but Python/Cython". In fact this could go as far as making it a standalone library in e.g. C++.

We want to keep the parent to model the set itself, ask questions like cardinality or building the polyhedron, do constructions on top of it (e.g. use it as indexing set for a vector space), etc.

#18056 would be a good occasion to handle this part.

  • Being able to specify an element constructor is a useful feature as well. What we need to discuss here is whether we want to switch to using lists (or tuples!) by default.
  • To remove __classcall__ we need to wait until the end of the deprecation period. To remove global_options we need to wait for the subclasses using it to be refactored to not impose this burden on IntegerListsLex.

comment:2 in reply to: ↑ 1 ; follow-up: Changed 5 years ago by ncohen

We want to keep the parent to model the set itself, ask questions like cardinality or building the polyhedron, do constructions on top of it (e.g. use it as indexing set for a vector space), etc.

The 'Lex' there seems a bit too much for the mathematical object that you want to represent. You describe things that could be a method of an 'IntegerLists?' object (or more specifically methods of 'Compositions' or 'Partitions').

  • Being able to specify an element constructor is a useful feature as well. What we need to discuss here is whether we want to switch to using lists (or tuples!) by default.

There should be a way to enumerate these objects without paying this cost, however. A way to have both is to implement the iterator to return a copy of the current list, or a tuple (or even the current list itself, with big 'read only' warnings), and then implement in IntegerListsLex an __iter__ that wraps every element returned by that iterator with <whatever you need>.

Nathann

comment:3 in reply to: ↑ 2 Changed 5 years ago by jdemeyer

Replying to ncohen:

There should be a way to enumerate these objects without paying this cost, however. A way to have both is to implement the iterator to return a copy of the current list, or a tuple (or even the current list itself, with big 'read only' warnings), and then implement in IntegerListsLex an __iter__ that wraps every element returned by that iterator with <whatever you need>.

That's also what I have in mind: a low-level class designed to be clean and fast implemented in Cython without overhead. And then a class on top of that which can implement whatever extra Python features that you want.

comment:4 Changed 5 years ago by jdemeyer

  • Summary changed from IntegerListLex better not be a parent to Restructure IntegerListLex code

comment:5 Changed 5 years ago by jdemeyer

  • Branch set to u/jdemeyer/ticket/18109

comment:6 Changed 5 years ago by git

  • Commit set to 8d5ca559f1b4a376a84d88a995a3fde6bfb74a6f

Branch pushed to git repo; I updated commit sha1. This was a forced push. New commits:

8d5ca55Restructure IntegerListsLex code

comment:7 Changed 5 years ago by jdemeyer

  • Authors set to Jeroen Demeyer
  • Description modified (diff)

comment:8 Changed 5 years ago by jdemeyer

  • Description modified (diff)

comment:9 follow-ups: Changed 5 years ago by vdelecroix

Hi Jeroen,

As a matter of fact, if you isolate the commit that just move a file, the diff looks much nicer. I am not able to get something reasonable with the options -B, -C, -M or -D. Do you know what to do?

Vincent

comment:10 in reply to: ↑ 9 Changed 5 years ago by jdemeyer

Replying to vdelecroix:

Hi Jeroen,

As a matter of fact, if you isolate the commit that just move a file, the diff looks much nicer. I am not able to get something reasonable with the options -B, -C, -M or -D. Do you know what to do?

Sorry no. I don't know how to nicely show diffs which split up a file in two (interestingly, hg has better support for this!)

comment:11 Changed 5 years ago by jdemeyer

  • Dependencies set to #18181

comment:12 Changed 5 years ago by git

  • Commit changed from 8d5ca559f1b4a376a84d88a995a3fde6bfb74a6f to d8fd440915b0438aecf7f1f5d8d5eb2d5f366ddd

Branch pushed to git repo; I updated commit sha1. This was a forced push. New commits:

f648443Move IntegerListsLex._Iter out of IntegerListsLex
d8fd440Restructure IntegerListsLex code

comment:13 Changed 5 years ago by git

  • Commit changed from d8fd440915b0438aecf7f1f5d8d5eb2d5f366ddd to 9fbbd3c6c2900b5d6f326846522338774751cfe8

Branch pushed to git repo; I updated commit sha1. This was a forced push. New commits:

9fbbd3cRestructure IntegerListsLex code

comment:14 follow-up: Changed 5 years ago by vdelecroix

A comment on the design... Should we really support +Infinity in the iterator? I would go for unsigned int variables and UINT_MAX as a synonmyous for Infinity. Of course it makes sense for the higher classes (e.g. to give a proper answer to .cardinality()).

comment:15 in reply to: ↑ 14 ; follow-up: Changed 5 years ago by jdemeyer

Replying to vdelecroix:

A comment on the design... Should we really support +Infinity in the iterator?

Yes.

I would go for unsigned int variables and UINT_MAX as a synonmyous for Infinity.

And not support IntegerLists(10^100)?

comment:16 in reply to: ↑ 15 ; follow-ups: Changed 5 years ago by vdelecroix

Replying to jdemeyer:

Replying to vdelecroix:

A comment on the design... Should we really support +Infinity in the iterator?

Yes.

I would go for unsigned int variables and UINT_MAX as a synonmyous for Infinity.

And not support IntegerLists(10^100)?

I said for the iterator. Not for the main class. I would not bother if iter(IntegerLists(10^100)) just failed. It should be very fast for small entries. I guess that one option would be to use Nathann strategy in #18137 with fused Cython type (here unsigned int and mpz_t). But I remember that it was nearly impossible to make it work as attributes of an extension class.

comment:17 in reply to: ↑ 16 Changed 5 years ago by jdemeyer

Replying to vdelecroix:

But I remember that it was nearly impossible to make it work as attributes of an extension class.

For attributes of an extension class, no. I guess you could have two classes (one for some C type and one for mpz_t) on top of a common base class. But I have never done this.

comment:18 in reply to: ↑ 16 ; follow-up: Changed 5 years ago by jdemeyer

Replying to vdelecroix:

Replying to jdemeyer:

Replying to vdelecroix:

A comment on the design... Should we really support +Infinity in the iterator?

Yes.

I would go for unsigned int variables and UINT_MAX as a synonmyous for Infinity.

And not support IntegerLists(10^100)?

I said for the iterator. Not for the main class. I would not bother if iter(IntegerLists(10^100)) just failed. It should be very fast for small entries. I guess that one option would be to use Nathann strategy in #18137 with fused Cython type (here unsigned int and mpz_t).

I think that we really should support list(IntegerLists(10^100, length=1)) because in Sage, we always support large integers if possible.

In any case, changing this is certainly outside the scope of this ticket (it could be done in #18055 or #18056). Here, I just want to reorganize the code without changing the implementation.

comment:19 Changed 5 years ago by jdemeyer

  • Dependencies changed from #18181 to #18181, #18184

comment:20 Changed 5 years ago by git

  • Commit changed from 9fbbd3c6c2900b5d6f326846522338774751cfe8 to e39566a935403eb750d444d6b236f43084caa8be

Branch pushed to git repo; I updated commit sha1. This was a forced push. New commits:

e39566aRestructure IntegerListsLex code

comment:21 Changed 5 years ago by git

  • Commit changed from e39566a935403eb750d444d6b236f43084caa8be to cda4b75087577ff9583411799f8832ccf2e72866

Branch pushed to git repo; I updated commit sha1. This was a forced push. New commits:

cda4b75Restructure IntegerListsLex code

comment:22 in reply to: ↑ 18 Changed 5 years ago by nthiery

Replying to jdemeyer:

I think that we really should support list(IntegerLists(10^100, length=1)) because in Sage, we always support large integers if possible.

That's part of why I am thinking of C++; then we can just have a templated iterator, and depending on the input we can choose one instantiation or the other.

In any case, changing this is certainly outside the scope of this ticket (it could be done in #18055 or #18056). Here, I just want to reorganize the code without changing the implementation.

Sounds reasonable indeed.


New commits:

cda4b75Restructure IntegerListsLex code

comment:23 Changed 5 years ago by git

  • Commit changed from cda4b75087577ff9583411799f8832ccf2e72866 to 7f8c40a07f55d18097ca53007ca3dfd29439551e

Branch pushed to git repo; I updated commit sha1. This was a forced push. New commits:

e03140aCombinatorialObject constructor should copy input
e7bc348Merge commit 'e03140a761586bf64f01aceba324abd1eece813b' into HEAD
7f8c40aRestructure IntegerListsLex code

comment:24 Changed 5 years ago by git

  • Commit changed from 7f8c40a07f55d18097ca53007ca3dfd29439551e to 2b88b6416c0a6853b2fff59d7bd2ecb2d56f5a5c

Branch pushed to git repo; I updated commit sha1. This was a forced push. New commits:

2b88b64Restructure IntegerListsLex code

comment:25 in reply to: ↑ 9 Changed 5 years ago by jdemeyer

Replying to vdelecroix:

I am not able to get something reasonable with the options -B, -C, -M or -D. Do you know what to do?

Actually, the following will show everything as copied from integer_list.py:

git show --patience -D -B -C01
Last edited 5 years ago by jdemeyer (previous) (diff)

comment:26 Changed 5 years ago by git

  • Commit changed from 2b88b6416c0a6853b2fff59d7bd2ecb2d56f5a5c to e67e71c7437af2daf3c6653d882e0fafb6e292b8

Branch pushed to git repo; I updated commit sha1. This was a forced push. New commits:

e67e71cRestructure IntegerListsLex code

comment:27 Changed 5 years ago by jdemeyer

This now passes all doctests except for the pickle jar.

comment:28 Changed 5 years ago by git

  • Commit changed from e67e71c7437af2daf3c6653d882e0fafb6e292b8 to 8c587d34a7b3332baaeb97b3197f465eebf0c982

Branch pushed to git repo; I updated commit sha1. This was a forced push. New commits:

8c587d3Restructure IntegerListsLex code

comment:29 Changed 5 years ago by jdemeyer

  • Description modified (diff)

Passes all doctests and documentation builds.

comment:30 Changed 5 years ago by git

  • Commit changed from 8c587d34a7b3332baaeb97b3197f465eebf0c982 to 2be6ad090d51f5eb033cde42a1831ab7d89092ae

Branch pushed to git repo; I updated commit sha1. This was a forced push. New commits:

2be6ad0Restructure IntegerListsLex code

comment:31 follow-up: Changed 5 years ago by jdemeyer

  • Status changed from new to needs_review

comment:32 in reply to: ↑ 31 ; follow-ups: Changed 5 years ago by aschilling

Hi Jeroen!

Since you seem to have cythonized the code already, could you add some timings compared to the old code?

Anne

PS: I was under the impression that it is harder to debug code in cython, which might make life a little harder for the new features in #18055.

comment:33 in reply to: ↑ 32 Changed 5 years ago by jdemeyer

Replying to aschilling:

Since you seem to have cythonized the code already, could you add some timings compared to the old code?

My goal certainly was not to gain speed, but I could check...

I was under the impression that it is harder to debug code in cython, which might make life a little harder for the new features in #18055.

I have no idea really. A lot of the code I write for Sage is Cython and it hasn't bothered me.

On the other hand, I on purpose did not Cythonize IntegerListsLexIter (it's in a Cython source file, but doesn't use any Cython features). So if it makes your life easier, just move that one class to a Python file.

comment:34 in reply to: ↑ 32 Changed 5 years ago by jdemeyer

Replying to aschilling:

you seem to have cythonized the code already

Depends what you mean. I moved the files to a Cython source file and I created extension types instead of Python classes. But I didn't optimize the loops for example.

comment:35 follow-up: Changed 5 years ago by jdemeyer

There is a small but significant speed-up (again: this is without really trying to speed up the code).


Before:

sage: timeit('list(IntegerListsLex(14, min_part=1))')
5 loops, best of 3: 1.18 s per loop

After:

sage: timeit('list(IntegerListsLex(14, min_part=1))')
5 loops, best of 3: 986 ms per loop

Before:

sage: timeit('list(IntegerListsLex(28, length=8, floor=lambda i:i, check=False))')
5 loops, best of 3: 1.11 s per loop

After:

sage: timeit('list(IntegerListsLex(28, length=8, floor=lambda i:i, check=False))')
5 loops, best of 3: 852 ms per loop

comment:36 in reply to: ↑ 35 Changed 5 years ago by aschilling

Nice that there is a slight speedup without even trying hard!

FYI, I rebased this branch on top of sage-6.7.beta0 and everything looks clean.

I noticed that in nn.y in sage.combinat.integer_lists it currently says

    .. WARNING:: this function is likely to disappear in :trac:`17927`.

Briefly looking at 17927 this seems no longer the case.

comment:37 follow-up: Changed 5 years ago by git

  • Commit changed from 2be6ad090d51f5eb033cde42a1831ab7d89092ae to 218808287254c4b462eeed12801310ebe4d3d124

Branch pushed to git repo; I updated commit sha1. New commits:

2188082Remove references to #17927

comment:38 in reply to: ↑ 37 ; follow-ups: Changed 5 years ago by aschilling

In principle this branch looks good to me; I guess it needs to be rebased to the latest development version, however.

Just to check: are you planning to reuse the Envelope class also for #17920? The planned changes there involving backward smoothing etc will be ok, right?

comment:39 in reply to: ↑ 38 Changed 5 years ago by jdemeyer

Replying to aschilling:

Just to check: are you planning to reuse the Envelope class also for #17920? The planned changes there involving backward smoothing etc will be ok, right?

Sure. The only risk is that errors in Envelope will make both implementations wrong.

comment:40 in reply to: ↑ 38 ; follow-up: Changed 5 years ago by jdemeyer

Replying to aschilling:

In principle this branch looks good to me; I guess it needs to be rebased to the latest development version, however.

Despite the red link on Trac, it actually merges cleanly with 6.7.beta1

comment:41 in reply to: ↑ 40 Changed 4 years ago by aschilling

Hi Jeroen,

Could you please rebase this patch on the latest development version, so I can do the final review? Nicolas never made a comment, so I assume he is ok with it!

Best,

Anne

comment:42 follow-up: Changed 4 years ago by git

  • Commit changed from 218808287254c4b462eeed12801310ebe4d3d124 to 9582a7a5248f609a2d2de02497b4b1d36d9dd96e

Branch pushed to git repo; I updated commit sha1. New commits:

9582a7aMerge tag '6.7.beta3' into t/18109/ticket/18109

comment:43 in reply to: ↑ 42 ; follow-up: Changed 4 years ago by aschilling

I went through the code and the restructuring looks good to me. All tests pass and the documentation builds. One question I have though is about the comparison functions in base.py and lists.py. The tests look almost identical. So which one is actually used for IntegerListsLex?

comment:44 follow-ups: Changed 4 years ago by nthiery

Hi Jeroen,

Sorry, I got sidetracked reviewing other tickets. Thanks Anne for the reminder, and for checking the code and the documentation!

I have just been through the code myself (not checking the fine details though).

Altogether, I believe it's correct. Cythonizing the envelope function and making the iterator into a Cython class that depends on little of the Sage infrastructure was in our long term plans, so that's great (I would have done it after the algorithm refactorisation, but since it's done, let's move forward). Generalizing a base IntegerLists? class is useful as well.

On the other hand, is it absolutely necessary to have this separation between frontend and backend classes? It introduces some complexity and in fact some duplication (e.g. in the tests). In particular, I am worried that it might introduce complexity in the construction of new special case classes based on IntegerListsLex, when the primary goal is to make this as trivial as possible. I'd need to actually implement such new classes to qualify this worry.

If it's just a question of avoiding calls to the element constructor, could not this be done just locally in __iter__ (returning directly an IntegerListsLexIter object, or mapping the constructor on it)? If it's a question of making the iterator code independent of Parent, well, it already is: it only uses the IntegerListsLex object as a data holder for the constraints, right? As long as we don't have a specific use case (backed up by benchmarks if speed is the motivation), there may not be much point in maintaining additional code for this.

At the very least, there should be some entry point (typically in the module description) listing all the classes, and clarifying their respective roles. Maybe slightly more systematic class names could help; say: IntegerLists, IntegerListsLex, IntegerListsBackend, IntegerListsLexBackend.

Some small suggestions:

  • some of the cdef attributes could be declared as int's. E.g. Envelope.min_length.
  • maybe we could use the occasion to rename the attribute IntegerListsLexIter.parent especially if it's not a parent anymore.
  • Another small complication is having to handle pickling by hand each time. At least, would there be a way to implement XXX.__reduce__ to return (XXX, *) rather than having to implement an unpickle_XXX function each time?

Cheers,

Nicolas

comment:45 in reply to: ↑ 44 ; follow-up: Changed 4 years ago by jdemeyer

Replying to nthiery:

On the other hand, is it absolutely necessary to have this separation between frontend and backend classes? If it's a question of making the iterator code independent of Parent

It's a question of making the backend class independent of Parent yes. I think it's good to have a fast simple Cython backend and a proper Sage `Parent front-end.

Some small suggestions:

  • some of the cdef attributes could be declared as int's. E.g. Envelope.min_length.

I consider that out of the scope of this ticket. These are micro-optimizations which can be done after we have a clearer view of the final design.

  • maybe we could use the occasion to rename the attribute IntegerListsLexIter.parent especially if it's not a parent anymore.

Sure, fine for me.

  • Another small complication is having to handle pickling by hand each time. At least, would there be a way to implement XXX.__reduce__ to return (XXX, *) rather than having to implement an unpickle_XXX function each time?

It's certainly annoying, but that's because Cython cannot automatically (un)pickle extension types. I have to check if it can be simplified.

Version 1, edited 4 years ago by jdemeyer (previous) (next) (diff)

comment:46 in reply to: ↑ 45 ; follow-ups: Changed 4 years ago by nthiery

Hi Jeroen,

Replying to jdemeyer:

It's a question of making the backend class independent of Parent yes. I think it's good to have a fast simple Cython backend and a proper Sage Parent front-end.

I agree its good in theory. But if it makes things more complicated, then it's only good in practice if we have an actual use case.

A main point being: unless we create millions of small IntegerListsLex? objects, I don't see why having it be a (facade) parent makes things slower.

I consider that out of the scope of this ticket. These are micro-optimizations which can be done after we have a clearer view of the final design.

Fair enough.

It's certainly annoying, but that's because Cython cannot automatically (un)pickle extension types. I have to check if it can be simplified.

Thanks!

Cheers,

Nicolas

PS: I forgot to word my appreciation for the hard work!

comment:47 in reply to: ↑ 46 ; follow-up: Changed 4 years ago by vdelecroix

Replying to nthiery:

Hi Jeroen,

Replying to jdemeyer:

It's a question of making the backend class independent of Parent yes. I think it's good to have a fast simple Cython backend and a proper Sage Parent front-end.

I agree its good in theory. But if it makes things more complicated, then it's only good in practice if we have an actual use case.

A main point being: unless we create millions of small IntegerListsLex? objects, I don't see why having it be a (facade) parent makes things slower.

A use case is in MatrixSpace.__iter__ and I also have others intensive usage for building so-called square tiled surfaces (if you care, look at u/vdelecroix/flat_surfaces-6.5). I would be grateful if the iterator could be accessible without initializing any parent.

Vincent

comment:48 in reply to: ↑ 46 Changed 4 years ago by ncohen

A main point being: unless we create millions of small IntegerListsLex? objects, I don't see why having it be a (facade) parent makes things slower.

Wouldn't this happen if somebody wants, for all n between 10 and 1000, one instance of a partition of n satisfying <a specific set of constraints> ?

There is a (trivial) case at least which makes sense to me, i.e. for a fixed k find some partition of n into integers between floor(n/k) and ceil(n/k). I agree that this can be done manually without this iterator, but I expect that it can happen with more complicated set of constraints too.

Nathann

Last edited 4 years ago by ncohen (previous) (diff)

comment:49 in reply to: ↑ 47 Changed 4 years ago by nthiery

Replying to vdelecroix:

A use case is in MatrixSpace.__iter__ and I also have others intensive usage for building so-called square tiled surfaces (if you care, look at u/vdelecroix/flat_surfaces-6.5). I would be grateful if the iterator could be accessible without initializing any parent.

Thanks for the feedback.

Do you mind running a concrete benchmark in one of those situations, comparing a facade parent IntegerListLex and a non-parent IntegerListsLex as provided by this ticket? Then we coud evaluate on solid ground what the overhead actually is, and whether it's worth the additional complexity?

comment:50 Changed 4 years ago by vdelecroix

sage: class A:
....:     pass
....: 
sage: class B(Parent):
....:     pass
....: 
sage: timeit("A()")
625 loops, best of 3: 413 ns per loop
sage: timeit("B()")
625 loops, best of 3: 13.2 µs per loop

comment:51 follow-up: Changed 4 years ago by jdemeyer

That's not a fair benchmark, since that's Python, not Cython. The real benchmark is this:

sage: cython("cdef class A(object): pass")
sage: cython("from sage.structure.parent cimport Parent\ncdef class B(Parent): pass")
sage: timeit("A()", number=10^4, repeat=20)
10000 loops, best of 20: 68.9 ns per loop
sage: timeit("B()", number=10^4, repeat=20)
10000 loops, best of 20: 14.5 µs per loop

comment:52 in reply to: ↑ 51 ; follow-up: Changed 4 years ago by nthiery

Replying to jdemeyer:

That's not a fair benchmark, since that's Python, not Cython. The real benchmark is this:

sage: cython("cdef class A(object): pass")
sage: cython("from sage.structure.parent cimport Parent\ncdef class B(Parent): pass")
sage: timeit("A()", number=10^4, repeat=20)
10000 loops, best of 20: 68.9 ns per loop
sage: timeit("B()", number=10^4, repeat=20)
10000 loops, best of 20: 14.5 µs per loop

Thanks Vincent and Jeroen. For fairness, we should also initialize the category. Here it is:

sage: %timeit B()
The slowest run took 4.13 times longer than the fastest. This could mean that an intermediate result is being cached 
100000 loops, best of 3: 13.3 µs per loop
sage: %timeit B(category=C)
The slowest run took 7.14 times longer than the fastest. This could mean that an intermediate result is being cached 
10000 loops, best of 3: 21.7 µs per loop

Those benchmarks teach us that constructing a Parent costs 200 times more than an empty object. Not quite surprising, but it's a start. Now what I really had in mind was a benchmark in a real use case, to estimate what's the relative overhead in a typical situation.

Let's take something really trivial:

sage: sage: %timeit IntegerListsLex(n=1, max_length=1).an_element()
The slowest run took 6.67 times longer than the fastest. This could mean that an intermediate result is being cached 
10000 loops, best of 3: 113 µs per loop

Now we start to have some concrete ground to start discussions: what we are speaking about is an overhead of 13% in the case of the most trivial operation. There probably is some room for improvements in the initialization of IntegerListsLex objects; but not so much either: there is some unavoidable option parsing and initialization work to do. Similarly, there may be room for improvement in the initialization of a Parent (I am surprised that initializing the category is costly, since for a Cython parent it does nothing but set an attribute). Altogether, the 13% has to be taken with a grain of salt at this point.

For additional insight, I believe we would be need to have some typical concrete use case, like the one Vincent has for matrices or flat surfaces. And see, then, what's the actual overhead. With that, we will be able to take an informed decision; not just out of fear that "parents are slow".

Vincent, could you pickup such a situation and make a benchmark?

In general I would like to follow this rule of thumb for our enumerated sets:

  • On one hand, implement super optimized standalone iterators with bare bone API, whenever there is a proven need for them.
  • On the other hand, when modeling an enumerated set, implement it as a Parents with nice input parsing and a full fledged API; and typically using the above iterators under the hood.
  • Avoid half measures with iterables that are not Parent.

Here the situation is a bit delicate since we are in between the shores: the iterator needs some non trivial parsing and internal data. Which calls for an intermediate class handling the parsing and the data storage; is this class meant to model the enumerated set itself or not?

All that being said, just make your call. I am fine with additional complexity when there is a proven and worthwhile need for it. And when someone is willing to pay for the maintenance overhead ...

Cheers,

Nicolas

Last edited 4 years ago by nthiery (previous) (diff)

comment:53 in reply to: ↑ 52 ; follow-up: Changed 4 years ago by jdemeyer

Replying to nthiery:

In general I would like to follow this rule of thumb for our enumerated sets:

  • On one hand, implement super optimized standalone iterators with bare bone API, whenever there is a proven need for them.
  • On the other hand, when modeling an enumerated set, implement it as a Parents with nice input parsing and a full fledged API; and typically using the above iterators under the hood.
  • Avoid half measures with iterables that are not Parent.

Why do you think that the current implementation is a "half measure"? I would say it fits your "super optimized standalone iterators with bare bone API" (except that it's currently not yet super optimized, see #18055 and #18056).

comment:54 in reply to: ↑ 53 ; follow-up: Changed 4 years ago by nthiery

Good morning Jeroen,

Replying to jdemeyer:

Why do you think that the current implementation is a "half measure"? I would say it fits your "super optimized standalone iterators with bare bone API" (except that it's currently not yet super optimized, see #18055 and #18056).

Yes, I am very happy with the Cython iterator (and its upcoming optimizations) and the Parent enumerated sets (IntegerLists*). What I am wondering about are the intermediate classes (IntegerLists*Impl). What's their precise role? Do we really need them?

  • Are there situations where an iterator is not enough (we really need to model the enumerated set itself), yet we can't afford the overhead of having a Parent?

And / or

  • Are the *Impl just utility classes that are mandatory to construct the iterator?

Maybe we should have a short video conference at some point; it would be much easier to discuss!

comment:55 in reply to: ↑ 54 Changed 4 years ago by jdemeyer

Replying to nthiery:

Yes, I am very happy with the Cython iterator (and its upcoming optimizations) and the Parent enumerated sets (IntegerLists*). What I am wondering about are the intermediate classes (IntegerLists*Impl). What's their precise role?

Well, I can see two main use cases:

  1. if you need to iterate twice over the same set.
  2. to check containment.

In the future, point counting could be added as an extra algorithm in the Impl class, not in the iterator.

comment:56 in reply to: ↑ 44 ; follow-up: Changed 4 years ago by jdemeyer

Replying to nthiery:

  • At least, would there be a way to implement XXX.__reduce__ to return (XXX, *) rather than having to implement an unpickle_XXX function each time?

That's not quite possible because of https://groups.google.com/forum/#!topic/sage-devel/limDRGuep34

comment:57 in reply to: ↑ 56 Changed 4 years ago by ncohen

That's not quite possible because of https://groups.google.com/forum/#!topic/sage-devel/limDRGuep34

... which was also solved with an 'unpickle' function T_T

comment:58 Changed 4 years ago by jdemeyer

Actually, pickling can be implemented better using __getstate__ and __setstate__.

comment:59 Changed 4 years ago by git

  • Commit changed from 9582a7a5248f609a2d2de02497b4b1d36d9dd96e to c9408574da27b31030a8d2eb5c800dc838ae4132

Branch pushed to git repo; I updated commit sha1. New commits:

c940857Simplify pickling for IntegerListsImpl

comment:60 Changed 4 years ago by jdemeyer

For Envelope, this is harder since __init__ really does some non-trivial stuff like the sign handling.

comment:61 Changed 4 years ago by git

  • Commit changed from c9408574da27b31030a8d2eb5c800dc838ae4132 to d46541fb29ca682163ba62e66e48fc8128f2c825

Branch pushed to git repo; I updated commit sha1. New commits:

d46541fFix __richcmp__ doctests

comment:62 in reply to: ↑ 43 Changed 4 years ago by jdemeyer

Replying to aschilling:

One question I have though is about the comparison functions in base.py and lists.py. The tests look almost identical.

Right, I fixed the tests.

comment:63 in reply to: ↑ 44 ; follow-up: Changed 4 years ago by jdemeyer

Replying to nthiery:

Maybe slightly more systematic class names could help; say: IntegerLists, IntegerListsLex, IntegerListsBackend, IntegerListsLexBackend.

Do you find ...Backend more clear than ...Impl? If you care a lot, I can change it, but it smells like bike-shedding.

comment:64 Changed 4 years ago by git

  • Commit changed from d46541fb29ca682163ba62e66e48fc8128f2c825 to 5f7a883c07534a0d7861dd4dc46e0b45ae4ff1ed

Branch pushed to git repo; I updated commit sha1. New commits:

5f7a883Rename attribute "_parent" to "impl"

comment:65 in reply to: ↑ 63 ; follow-up: Changed 4 years ago by nthiery

Replying to jdemeyer:

Replying to nthiery:

Maybe slightly more systematic class names could help; say: IntegerLists, IntegerListsLex, IntegerListsBackend, IntegerListsLexBackend.

Do you find ...Backend more clear than ...Impl? If you care a lot, I can change it, but it smells like bike-shedding.

I have a preference for Backend since we use this vocable elsewhere in Sage (e.g. in the graph library), but don't really care otherwise; you choose. My comment was more about having a consistent naming scheme. It could be enough to e.g. rename IntegerListsImpl_invlex to IntegerListsLexImpl.

Thanks by the way for investigating how to better do the pickling!

comment:66 in reply to: ↑ 65 ; follow-up: Changed 4 years ago by jdemeyer

Replying to nthiery:

Replying to jdemeyer:

Replying to nthiery:

Maybe slightly more systematic class names could help; say: IntegerLists, IntegerListsLex, IntegerListsBackend, IntegerListsLexBackend.

Do you find ...Backend more clear than ...Impl? If you care a lot, I can change it, but it smells like bike-shedding.

I have a preference for Backend since we use this vocable elsewhere in Sage (e.g. in the graph library), but don't really care otherwise; you choose. My comment was more about having a consistent naming scheme. It could be enough to e.g. rename IntegerListsImpl_invlex to IntegerListsLexImpl.

Well, I actually prefer IntegerListsImpl_invlex over IntegerListsLexImpl.

I intentionally do not want IntegerListsImpl_invlex to be associated too much with IntegerListsLex: people should not think that there must be a 1-to-1 correspondence between IntegerLists classes and IntegerListsImpl classes.

comment:67 Changed 4 years ago by jdemeyer

After replacing Impl by Backend:

Error compiling Cython file:
------------------------------------------------------------
...
        IntegerListsBackend.__init__(self, *args, **kwds)

        self.check = check

        if self.min_part < 0:
            raise NotBackendementedError("strictly negative min_part")
                                       ^
------------------------------------------------------------

sage/combinat/integer_lists/invlex.pyx:848:40: undeclared name not builtin: NotBackendementedError

comment:68 Changed 4 years ago by vdelecroix

What is your strategy, defining NotBackendementedError?

comment:69 Changed 4 years ago by git

  • Commit changed from 5f7a883c07534a0d7861dd4dc46e0b45ae4ff1ed to 434a15236a27b2d6aef0ab95877aa396e86b1cf8

Branch pushed to git repo; I updated commit sha1. New commits:

434a152Rename Impl -> Backend

comment:70 in reply to: ↑ 66 ; follow-up: Changed 4 years ago by nthiery

Replying to jdemeyer:

Well, I actually prefer IntegerListsImpl_invlex over IntegerListsLexImpl.

I intentionally do not want IntegerListsImpl_invlex to be associated too much with IntegerListsLex: people should not think that there must be a 1-to-1 correspondence between IntegerLists classes and IntegerListsImpl classes.

I see your point. Still I find useful to give the reader a hint that, in this case, there is such a correspondence. When reading the code, I had to dig around to make sure I was getting it right.

So I'd rather have a consistent naming scheme, and a note to the developers, in the overview documentation of the module, that there need not be in general such a 1-to-1 correspondence.

Cheers,

Nicolas

PS: fun error message :-)

comment:71 in reply to: ↑ 70 Changed 4 years ago by jdemeyer

Replying to nthiery:

I see your point. Still I find useful to give the reader a hint that, in this case, there is such a correspondence.

There is not, even in this case. Note that IntegerListsLex is just IntegerLists with a classcall method and a default backend. I can easily create an IntegerLists instance with a IntegerListsBackend_invlex backend or an IntegerListsLex instance with a different backend.

comment:72 Changed 4 years ago by git

  • Commit changed from 434a15236a27b2d6aef0ab95877aa396e86b1cf8 to 6ae2e2264bf769ca276413d2c1d73d597a93cbbb

Branch pushed to git repo; I updated commit sha1. New commits:

6ae2e22Some doc improvement

comment:73 Changed 4 years ago by chapoton

  • Status changed from needs_review to needs_work

needs rebase, does not apply

comment:74 Changed 4 years ago by jdemeyer

Do you have an intention to review the ticket once I rebase it? I'm just asking since there is no point in rebasing if it just continues to bitrot.

comment:75 Changed 4 years ago by nthiery

Thanks for asking. The review is on my todo list for the last week of August.

comment:76 Changed 4 years ago by git

  • Commit changed from 6ae2e2264bf769ca276413d2c1d73d597a93cbbb to 35e2e50d85c85d6a71674d78f41ce2ba52f4c286

Branch pushed to git repo; I updated commit sha1. This was a forced push. New commits:

35e2e50Restructure IntegerListsLex code

comment:77 Changed 4 years ago by jdemeyer

  • Dependencies #18181, #18184 deleted

Rebased and squashed, testing now...

comment:78 Changed 4 years ago by git

  • Commit changed from 35e2e50d85c85d6a71674d78f41ce2ba52f4c286 to 3ebbb659f034dd17610a6afba03d307ffc7b483e

Branch pushed to git repo; I updated commit sha1. New commits:

3ebbb65Workaround for Trac #18192

comment:79 Changed 4 years ago by git

  • Commit changed from 3ebbb659f034dd17610a6afba03d307ffc7b483e to 88a100a762bad34400aca66c52d22255737072a7

Branch pushed to git repo; I updated commit sha1. New commits:

88a100aFix one more import of IntegerListsLex

comment:80 Changed 4 years ago by jdemeyer

  • Status changed from needs_work to needs_review

comment:81 Changed 4 years ago by kdilks

  • Milestone changed from sage-6.6 to sage-6.10

Merge conflict.

comment:82 Changed 4 years ago by jdemeyer

Do you actually plan to review this ticket? If you do, then I'll rebase immediately.

comment:83 Changed 4 years ago by tscrim

I will review this ticket once rebased.

comment:84 Changed 4 years ago by git

  • Commit changed from 88a100a762bad34400aca66c52d22255737072a7 to 6fcd10a8384e07df3d3302e9ae1e7c6f6f99217e

Branch pushed to git repo; I updated commit sha1. New commits:

6fcd10aMerge tag '6.9' into t/18109/ticket/18109

comment:85 follow-up: Changed 4 years ago by tscrim

  • Branch changed from u/jdemeyer/ticket/18109 to u/tscrim/refactor_integer_lists_lex-18109
  • Commit changed from 6fcd10a8384e07df3d3302e9ae1e7c6f6f99217e to 2e5e9691da0c2003d7eb3ddcf6c5c5010281f4dc
  • Dependencies set to #15525
  • Reviewers set to Travis Scrimshaw

I made some doc tweaks while we are moving everything around. Everything works and since we aren't actually changing the functionality, I'm not looking at that closely. In short, it looks goods.

In writing this comment, I've convinced myself that having a frontend class for each backend will likely be a good thing in practice as it can be used to distinguish different iteration orders without having to look at the backend (which may not be python visible at the end of the day). It also makes it easier to define string representations and natural extensions to subclasses.

In particular, #15525, which we will probably need to tweak some of those changes to support this new framework (and at least correct the imports, which is why I listed it as a dependency).


New commits:

2e5e969Some minor reviewer tweaks and doc changes.

comment:86 follow-up: Changed 4 years ago by jdemeyer

Two details:

  1. an set should be a set
  1. Why this change? I find the old version more readable.
    -        good_sum = (nu >= p.min_sum and nu <= p.max_sum)
    -        good_length = (l >= p.min_length and l <= p.max_length)
    -        no_trailing_zeros = (l <= max(p.min_length,0) or mu[-1] != 0)
    -        return good_sum and good_length and no_trailing_zeros
    +        return (nu >= p.min_sum and nu <= p.max_sum # Good sum
    +                 and l >= p.min_length and l <= p.max_length # Good length
    +                 and (l <= max(p.min_length,0) or mu[-1] != 0)) # No trailing zeros
    

If this is for performance reasons, you can instead cdef bint the variables good_sum, good_length and no_trailing_zeros.

comment:87 in reply to: ↑ 85 ; follow-up: Changed 4 years ago by jdemeyer

Replying to tscrim:

In writing this comment, I've convinced myself that having a frontend class for each backend will likely be a good thing in practice as it can be used to distinguish different iteration orders without having to look at the backend

But why would you need to "distinguish different iteration orders" in the first place?

In the few cases (maybe invlex is the only one?) where the iteration algorithm really matters, I understand that you want a separate front-end. However, it should be possible for a backend to iterate in an arbitrary order.

I really want to see the front-end and back-end as orthogonal: I can see use cases for different front-ends with the same back-end and use cases for different back-ends for the same front-end.

comment:88 in reply to: ↑ 86 Changed 4 years ago by tscrim

Replying to jdemeyer:

Two details:

  1. Why this change? I find the old version more readable.
    -        good_sum = (nu >= p.min_sum and nu <= p.max_sum)
    -        good_length = (l >= p.min_length and l <= p.max_length)
    -        no_trailing_zeros = (l <= max(p.min_length,0) or mu[-1] != 0)
    -        return good_sum and good_length and no_trailing_zeros
    +        return (nu >= p.min_sum and nu <= p.max_sum # Good sum
    +                 and l >= p.min_length and l <= p.max_length # Good length
    +                 and (l <= max(p.min_length,0) or mu[-1] != 0)) # No trailing zeros
    

If this is for performance reasons, you can instead cdef bint the variables good_sum, good_length and no_trailing_zeros.

It was for performance reasonings, but not the reason you're thinking of: short circuiting.

comment:89 in reply to: ↑ 87 ; follow-up: Changed 4 years ago by tscrim

  • Reviewers changed from Travis Scrimshaw to Anne Schilling, Travis Scrimshaw

Replying to jdemeyer:

Replying to tscrim:

In writing this comment, I've convinced myself that having a frontend class for each backend will likely be a good thing in practice as it can be used to distinguish different iteration orders without having to look at the backend

But why would you need to "distinguish different iteration orders" in the first place?

In the few cases (maybe invlex is the only one?) where the iteration algorithm really matters, I understand that you want a separate front-end. However, it should be possible for a backend to iterate in an arbitrary order.

I don't need this flexibility, but it comes as an added bonus. Right now we can check backends explicitly, but this allows us to separate that off into a completely separate package and still retain a differentiation.

I really want to see the front-end and back-end as orthogonal: I can see use cases for different front-ends with the same back-end and use cases for different back-ends for the same front-end.

I can see both use cases as well, but chances are there will be a coupling between base classes and backend. However I don't think it is really useful to continue this discussion because we both have our reasons for liking it. We just need to take care of the additions from #15525 (and the doc tweak above). Do you want me to do that (since that was my ticket) or will you?

comment:90 in reply to: ↑ 89 ; follow-up: Changed 4 years ago by jdemeyer

Replying to tscrim:

Do you want me to do that (since that was my ticket) or will you?

Go ahead. But please don't rewrite history, just merge #15525 and add extra commits.

comment:91 Changed 4 years ago by git

  • Commit changed from 2e5e9691da0c2003d7eb3ddcf6c5c5010281f4dc to 8b96bb5cca7f3a76a35b75e99406b7cb1c4cb0e5

Branch pushed to git repo; I updated commit sha1. New commits:

7982d3aIniital fix and added regular partitions.
4143c96Merge branch 'develop' into public/combinat/partitions_constraints-15525
1adb368Added deprecation to IntegerListLex global_options arg. Fixed doctests.
c477d6cMerge branch 'public/combinat/partitions_constraints-15525' of trac.sagemath.org:sage into public/combinat/partitions_constraints-15525
03dcda9Merge branch 'public/combinat/partitions_constraints-15525' of trac.sagemath.org:sage into public/combinat/partitions_constraints-15525
82f4fd5Merge branch 'public/combinat/partitions_constraints-15525' of trac.sagemath.org:sage into public/combinat/partitions_constraints-15525
0cc28a0Fixing some things.
1ec481dAddressing Kevin's comments.
7760c71Merge branch 'public/combinat/partitions_constraints-15525' of trac.sagemath.org:sage into u/tscrim/refactor_integer_lists_lex-18109
8b96bb5Adding changes from #15525.

comment:92 in reply to: ↑ 90 Changed 4 years ago by tscrim

Replying to jdemeyer:

Replying to tscrim:

Do you want me to do that (since that was my ticket) or will you?

Go ahead. But please don't rewrite history, just merge #15525 and add extra commits.

Done. So if you're okay with my changes, then we can set a positive review.

comment:93 Changed 4 years ago by jdemeyer

  • Status changed from needs_review to positive_review

comment:94 Changed 4 years ago by vbraun

  • Status changed from positive_review to needs_work

Doctests fail, try with the next beta

comment:95 Changed 4 years ago by jdemeyer

  • Branch changed from u/tscrim/refactor_integer_lists_lex-18109 to u/jdemeyer/refactor_integer_lists_lex-18109

comment:96 Changed 4 years ago by jdemeyer

  • Commit changed from 8b96bb5cca7f3a76a35b75e99406b7cb1c4cb0e5 to 824498345a881a9ebc21be694c3522d8687e275e

I'm testing this now...


New commits:

8244983Merge tag '6.10.beta0' into t/18109/refactor_integer_lists_lex-18109

comment:97 Changed 4 years ago by git

  • Commit changed from 824498345a881a9ebc21be694c3522d8687e275e to 65025ab68743a1dc6c30eb73b6db4e4283f49d4a

Branch pushed to git repo; I updated commit sha1. New commits:

65025abFix IntegerListsLex import

comment:98 Changed 4 years ago by jdemeyer

  • Status changed from needs_work to needs_review

comment:99 Changed 4 years ago by tscrim

  • Status changed from needs_review to positive_review

LGTM.

comment:100 Changed 4 years ago by vbraun

  • Branch changed from u/jdemeyer/refactor_integer_lists_lex-18109 to 65025ab68743a1dc6c30eb73b6db4e4283f49d4a
  • Resolution set to fixed
  • Status changed from positive_review to closed
Note: See TracTickets for help on using tickets.