Opened 20 months ago

Last modified 4 months ago

#24815 needs_work enhancement

Register FreeModuleElement to the Sequence ABC

Reported by: embray Owned by:
Priority: minor Milestone: sage-8.9
Component: refactoring Keywords:
Cc: Merged in:
Authors: Erik Bray Reviewers:
Report Upstream: N/A Work issues:
Branch: u/embray/refactoring/free-module-sequence (Commits) Commit: 9660547f4707ffc85edec2c10d1d0e7e58c0b9dc
Dependencies: Stopgaps:

Description

For most intents and purposes instances of FreeModuleElement can be used equivalently to list or tuple, in that they all meet the requirements of the Sequence protocol (which on Python 3 also includes range).

This simplifies some code that treats Vector distinctly from list and tuple. To be clear, in this case we are explicitly treating the FreeModuleElement type, and not the abstract Vector type which does not meet these requirements, but that's okay since FreeModuleElement implements any concrete vector space elements.

By convention, this always imports collections.Sequence as SequenceABC so as to not confuse it with Sage's Sequence class.

Change History (24)

comment:1 Changed 20 months ago by git

  • Commit set to fcda724addfec8d664d0c2b149a2c17f0cc1ca52

Branch pushed to git repo; I updated commit sha1. New commits:

a8d1c0eRegister FreeModuleElement to the Sequence ABC
a73178dSome simple examples where isinstance(..., (list, tuple, Vector)) could be replaced with SequenceABC
da6b45aA slightly more involved replacement with SequenceABC
fcda724More misc PEP-8 cleanup in the affected modules

comment:2 Changed 20 months ago by embray

One small concern I have with this is that str/bytes are also considered Sequences by Python. I think this is okay though, because under this treatment it's really no different than if passed, say, a list of one character strings. Then, if such as list is not valid input, a string is no less invalid. It just means that if you intend a function to accept string arguments to be processed separately from list/tuple-like arguments, they should always be handled first. This is what I would probably normally do anyways.

Last edited 20 months ago by embray (previous) (diff)

comment:3 Changed 20 months ago by git

  • Commit changed from fcda724addfec8d664d0c2b149a2c17f0cc1ca52 to 6644dbdc443203170f8dc5e35368ce3d8eaf1956

Branch pushed to git repo; I updated commit sha1. This was a forced push. New commits:

dc25e5fSome simple examples where isinstance(..., (list, tuple, Vector)) could be replaced with SequenceABC
16c2af9A slightly more involved replacement with SequenceABC
6644dbdMore misc PEP-8 cleanup in the affected modules

comment:4 Changed 20 months ago by embray

This still has some failing tests--I think mostly due to minor mistakes...

comment:5 Changed 20 months ago by git

  • Commit changed from 6644dbdc443203170f8dc5e35368ce3d8eaf1956 to 9660547f4707ffc85edec2c10d1d0e7e58c0b9dc

Branch pushed to git repo; I updated commit sha1. This was a forced push. New commits:

6af1167A slightly more involved replacement with SequenceABC
876b4d4More misc PEP-8 cleanup in the affected modules
d0ec617Added a line here
9660547depending on what the base ring is, the arguments may be strings, which doesn't make sense for the code that follows this condition

comment:6 Changed 20 months ago by embray

  • Status changed from new to needs_review

In fact, the main bug I had was due to the aforementioned concern about str, where there was one case where it didn't make sense to treat lists of strings equivalently to lists of lists. This was the only such case, however (a slightly tricky one since it was in the matrix constructor, which has a lot of tricky logic pertaining to flexibility in its arguments).

comment:7 Changed 20 months ago by jdemeyer

Can you please not make changes to src/sage/matrix because I'm dealing with that in #24742.

comment:8 Changed 20 months ago by jdemeyer

I'm not entirely convinced by using the Python ABC stuff. Of course we should register the Sage types in the ABC, but using the ABC is a different matter.

First of all, why Sequence? It excludes stuff that you typically want to include such as generators and the itertools stuff like zip(...).

Second, the ABC is horribly slow. It's about a factor 200 slower than checking isinstance(foo, (tuple, list)), a factor 40 slower than is_Vector() and a factor 25 slower than checking for iterability. I tested this in Cython, when given a FreeModuleElement as input.

So, why not check for iterability instead (and maybe exclude dicts)? That's what I plan to do in #24742.

comment:9 Changed 20 months ago by jdemeyer

  • Status changed from needs_review to needs_work

comment:10 follow-up: Changed 20 months ago by embray

I could incorporate #24742 if it even makes sense to, or just drop the changes there since it's not that important.

There are several problems with checking just for iterability. In most cases (if not all) that I updated here the code also uses indexing, so you really need that aspect of Sequence as well. The other problem is merely checking for iterability is that it can allow through infinite iterables (whereas Sequence--and perhaps this a misnomer on its part--must be finite). I discussed this problem more in #24757. As I also mentioned there, while on some level it's the user's responsibility not to pass in infinite generators, for example, where they don't belong, it's nice to prevent it where possible. zip, map, filter, etc. are all problematic here.

The performance issue is a real concern though. Just note that in addition to list, tuple, and Vector this is also adding support for range on Python 3 (and by extension any other custom type that would work just as well here, including possibly some other types in Sage).

I guess it's a question of context and what the relative overhead is. 200 x (eps < 1ns) is nothing if the overall method is on the order of 10ms, say, and what the likelihood is of using that call in a loop (or even its frequency in the test suite). For the matrix constructor most of all that's certainly worth looking at closely.

comment:11 follow-up: Changed 20 months ago by embray

On the performance question, again, perhaps for the sake of Cython optimization one could first check list/tuple explicitly since those are going to be the most common anyways, and then fall back on the more generic check.

comment:12 in reply to: ↑ 11 ; follow-up: Changed 20 months ago by embray

Replying to embray:

On the performance question, again, perhaps for the sake of Cython optimization one could first check list/tuple explicitly since those are going to be the most common anyways, and then fall back on the more generic check.

Come to think of it, this would be an optimization worth building into Cython, if possible. Common ABC from Python's stdlib --> exact type checks for common built-in types that implement that ABC, followed by generic isinstance(...).

comment:13 in reply to: ↑ 10 Changed 20 months ago by jdemeyer

Replying to embray:

I could incorporate #24742

Better make this ticket independent from #24742 and just remove the changes to src/sage/matrix from this ticket.

comment:14 follow-up: Changed 20 months ago by embray

I don't mind, but then #24742 should do something about this, or a follow-up to it. There are cases where it should accept vectors and ranges at the very least, but currently doesn't.

comment:15 in reply to: ↑ 12 ; follow-up: Changed 20 months ago by jdemeyer

Replying to embray:

Come to think of it, this would be an optimization worth building into Cython, if possible. Common ABC from Python's stdlib --> exact type checks for common built-in types that implement that ABC, followed by generic isinstance(...).

But then the case where it's not an instance of the ABC will still be slow.

I have sometimes been thinking about adding a fast Cython function caching calls to isinstance(..., some_type). If implemented well, this could be very fast. It could be used for ABCs as well as for other common type checks in Sage like isinstance(..., Element). We could take inspiration from src/sage/structure/coerce_dict.pyx.

comment:16 in reply to: ↑ 14 Changed 20 months ago by jdemeyer

Replying to embray:

I don't mind, but then #24742 should do something about this

It will.

comment:17 Changed 20 months ago by embray

I think though, that the case where it does the isinstance check would be relatively rare if the common cases are handled first. Before worrying extensively about it one would have to prove that there's an important common case that it actually substantially impacts.

comment:18 Changed 18 months ago by embray

  • Milestone changed from sage-8.2 to sage-8.3

comment:19 Changed 15 months ago by embray

  • Milestone changed from sage-8.3 to sage-8.4

I believe this issue can reasonably be addressed for Sage 8.4.

comment:20 in reply to: ↑ 15 Changed 14 months ago by embray

Replying to jdemeyer:

Replying to embray:

Come to think of it, this would be an optimization worth building into Cython, if possible. Common ABC from Python's stdlib --> exact type checks for common built-in types that implement that ABC, followed by generic isinstance(...).

But then the case where it's not an instance of the ABC will still be slow.

I have sometimes been thinking about adding a fast Cython function caching calls to isinstance(..., some_type). If implemented well, this could be very fast. It could be used for ABCs as well as for other common type checks in Sage like isinstance(..., Element). We could take inspiration from src/sage/structure/coerce_dict.pyx.

I'm thinking about this again, and wondering what exactly you mean. I think we really need some convenient isiterable and perhaps issequence functions, perhaps in sage.cpython. It doesn't necessarily have to work with the ABC framework. Perhaps we can just explicitly enumerate some specific types we care about for these cases (list, tuple, vector, (x)range, ...) which are going to be by far the most common. On the other hand, this is pretty inflexible, and that's where the ABC system is quite nice as it is very flexible, but that comes at a small cost.

There is also already caching built into ABCs, albeit implemented mostly in pure Python. Each ABC has a _abc_cache attribute, which is a WeakSet of all types known to implement that ABC (e.g. Sequence._abc_cache on Python 2 contains, by default, list, tuple, basestring, buffer, and xrange). There is also a Sequence._abc_negative_cache for negative lookups, also WeakSet. This gets invalidated any time a new ABC is registered.

Some interesting news is that Python 3.7 added a new C implementation of ABCMeta, which makes everything a bit faster. On Python 2.7:

python2.7 -m timeit -s 'from collections import Sequence; l=[]' 'isinstance(l, Sequence)'

and on 3.7:

python3.7 -m timeit -s 'from collections.abc import Sequence; l=[]' 'isinstance(l, Sequence)'
500000 loops, best of 5: 543 nsec per loop

So a little more than twice as fast?

Negative lookups are improved even more:

python2.7 -m timeit -s 'from collections import Sequence; l=0' 'isinstance(l, Sequence)'
100000 loops, best of 3: 2.17 usec per loop
python3.7 -m timeit -s 'from collections.abc import Sequence; l=0' 'isinstance(l, Sequence)'
500000 loops, best of 5: 534 nsec per loop

Of course, isinstance on a non-ABC is still much faster:

python3.7 -m timeit -s 'from collections.abc import Sequence; l=0' 'isinstance(l, tuple)'
2000000 loops, best of 5: 168 nsec per loop

but naturally, it slows down if the argument is a tuple, the more elements you add. This is probably significantly optimized in Cython too.

Going back to Python 2, I think nearly half the time is just spent in the lookup into the WeakSet cache:

python2.7 -m timeit -s 'from collections import Sequence; l=[]' 't = getattr(l, "__class__", None); t is not None and t in Sequence._abc_cache'
1000000 loops, best of 3: 0.663 usec per loop

(I can't make this test as easily on Python 3.7 since the _abc_cache is no longer exposed as Python attribute). So the C implementation from Python 3.7 is probably spending the majority of its time in this cache lookup, since it's doing away with some overhead from making a Python method call, the getattr call, etc.

I wonder if we couldn't do better. The use of a WeakSet makes sense for the case of classes that are created and later destroyed (more commonly these will be in the negative cache, since you'd be even less likely to register them to an ABC). That said, it's still pretty uncommon even for classes to be deleted. For builtin types, and for most classes in Sage, classes will never be deleted, and we could just cache them in a fixed-size array (if a new class is registered to the ABC it would have to be realloc'd, but this is rare).

Though I have doubts whether it's worth micro-optimizing this further. So I'd lean instead toward just backporting the C implementation of ABCMeta to Python 2, if anything at all.

comment:21 Changed 12 months ago by embray

  • Milestone changed from sage-8.4 to sage-8.5

comment:22 Changed 10 months ago by embray

  • Milestone changed from sage-8.5 to sage-8.7

Retargeting some of my tickets (somewhat optimistically for now).

comment:23 Changed 7 months ago by embray

  • Milestone changed from sage-8.7 to sage-8.8

Moving all my in-progress tickets to 8.8 milestone.

comment:24 Changed 4 months ago by embray

  • Milestone changed from sage-8.8 to sage-8.9

Tickets still needing working or clarification should be moved to the next release milestone at the soonest (please feel free to revert if you think the ticket is close to being resolved).

Note: See TracTickets for help on using tickets.