Opened 2 years ago
Last modified 2 months ago
#24815 needs_work enhancement
Register FreeModuleElement to the Sequence ABC
Reported by:  embray  Owned by:  

Priority:  minor  Milestone:  sage9.1 
Component:  refactoring  Keywords:  
Cc:  Merged in:  
Authors:  Erik Bray  Reviewers:  
Report Upstream:  N/A  Work issues:  
Branch:  u/embray/refactoring/freemodulesequence (Commits)  Commit:  9660547f4707ffc85edec2c10d1d0e7e58c0b9dc 
Dependencies:  Stopgaps: 
Description
For most intents and purposes instances of FreeModuleElement
can be used equivalently to list
or tuple
, in that they all meet the requirements of the Sequence protocol (which on Python 3 also includes range
).
This simplifies some code that treats Vector
distinctly from list
and tuple
. To be clear, in this case we are explicitly treating the FreeModuleElement
type, and not the abstract Vector
type which does not meet these requirements, but that's okay since FreeModuleElement
implements any concrete vector space elements.
By convention, this always imports collections.Sequence
as SequenceABC
so as to not confuse it with Sage's Sequence
class.
Change History (25)
comment:1 Changed 2 years ago by
 Commit set to fcda724addfec8d664d0c2b149a2c17f0cc1ca52
comment:2 Changed 2 years ago by
One small concern I have with this is that str
/bytes
are also considered Sequence
s by Python. I think this is okay though, because under this treatment it's really no different than if passed, say, a list of one character strings. Then, if such as list is not valid input, a string is no less invalid. It just means that if you intend a function to accept string arguments to be processed separately from list/tuplelike arguments, they should always be handled first. This is what I would probably normally do anyways.
comment:3 Changed 2 years ago by
 Commit changed from fcda724addfec8d664d0c2b149a2c17f0cc1ca52 to 6644dbdc443203170f8dc5e35368ce3d8eaf1956
Branch pushed to git repo; I updated commit sha1. This was a forced push. New commits:
dc25e5f  Some simple examples where isinstance(..., (list, tuple, Vector)) could be replaced with SequenceABC

16c2af9  A slightly more involved replacement with SequenceABC

6644dbd  More misc PEP8 cleanup in the affected modules

comment:4 Changed 2 years ago by
This still has some failing testsI think mostly due to minor mistakes...
comment:5 Changed 2 years ago by
 Commit changed from 6644dbdc443203170f8dc5e35368ce3d8eaf1956 to 9660547f4707ffc85edec2c10d1d0e7e58c0b9dc
Branch pushed to git repo; I updated commit sha1. This was a forced push. New commits:
6af1167  A slightly more involved replacement with SequenceABC

876b4d4  More misc PEP8 cleanup in the affected modules

d0ec617  Added a line here

9660547  depending on what the base ring is, the arguments may be strings, which doesn't make sense for the code that follows this condition

comment:6 Changed 2 years ago by
 Status changed from new to needs_review
In fact, the main bug I had was due to the aforementioned concern about str
, where there was one case where it didn't make sense to treat lists of strings equivalently to lists of lists. This was the only such case, however (a slightly tricky one since it was in the matrix constructor, which has a lot of tricky logic pertaining to flexibility in its arguments).
comment:7 Changed 2 years ago by
Can you please not make changes to src/sage/matrix
because I'm dealing with that in #24742.
comment:8 Changed 2 years ago by
I'm not entirely convinced by using the Python ABC stuff. Of course we should register the Sage types in the ABC, but using the ABC is a different matter.
First of all, why Sequence
? It excludes stuff that you typically want to include such as generators and the itertools stuff like zip(...)
.
Second, the ABC is horribly slow. It's about a factor 200 slower than checking isinstance(foo, (tuple, list))
, a factor 40 slower than is_Vector()
and a factor 25 slower than checking for iterability. I tested this in Cython, when given a FreeModuleElement
as input.
So, why not check for iterability instead (and maybe exclude dicts)? That's what I plan to do in #24742.
comment:9 Changed 2 years ago by
 Status changed from needs_review to needs_work
comment:10 followup: ↓ 13 Changed 2 years ago by
I could incorporate #24742 if it even makes sense to, or just drop the changes there since it's not that important.
There are several problems with checking just for iterability. In most cases (if not all) that I updated here the code also uses indexing, so you really need that aspect of Sequence
as well. The other problem is merely checking for iterability is that it can allow through infinite iterables (whereas Sequence
and perhaps this a misnomer on its partmust be finite). I discussed this problem more in #24757. As I also mentioned there, while on some level it's the user's responsibility not to pass in infinite generators, for example, where they don't belong, it's nice to prevent it where possible. zip, map, filter, etc. are all problematic here.
The performance issue is a real concern though. Just note that in addition to list
, tuple
, and Vector
this is also adding support for range
on Python 3 (and by extension any other custom type that would work just as well here, including possibly some other types in Sage).
I guess it's a question of context and what the relative overhead is. 200 x (eps < 1ns) is nothing if the overall method is on the order of 10ms, say, and what the likelihood is of using that call in a loop (or even its frequency in the test suite). For the matrix constructor most of all that's certainly worth looking at closely.
comment:11 followup: ↓ 12 Changed 2 years ago by
On the performance question, again, perhaps for the sake of Cython optimization one could first check list
/tuple
explicitly since those are going to be the most common anyways, and then fall back on the more generic check.
comment:12 in reply to: ↑ 11 ; followup: ↓ 15 Changed 2 years ago by
Replying to embray:
On the performance question, again, perhaps for the sake of Cython optimization one could first check
list
/tuple
explicitly since those are going to be the most common anyways, and then fall back on the more generic check.
Come to think of it, this would be an optimization worth building into Cython, if possible. Common ABC from Python's stdlib > exact type checks for common builtin types that implement that ABC, followed by generic isinstance(...)
.
comment:13 in reply to: ↑ 10 Changed 2 years ago by
comment:14 followup: ↓ 16 Changed 2 years ago by
I don't mind, but then #24742 should do something about this, or a followup to it. There are cases where it should accept vectors and ranges at the very least, but currently doesn't.
comment:15 in reply to: ↑ 12 ; followup: ↓ 20 Changed 2 years ago by
Replying to embray:
Come to think of it, this would be an optimization worth building into Cython, if possible. Common ABC from Python's stdlib > exact type checks for common builtin types that implement that ABC, followed by generic
isinstance(...)
.
But then the case where it's not an instance of the ABC will still be slow.
I have sometimes been thinking about adding a fast Cython function caching calls to isinstance(..., some_type)
. If implemented well, this could be very fast. It could be used for ABCs as well as for other common type checks in Sage like isinstance(..., Element)
. We could take inspiration from src/sage/structure/coerce_dict.pyx
.
comment:16 in reply to: ↑ 14 Changed 2 years ago by
comment:17 Changed 2 years ago by
I think though, that the case where it does the isinstance
check would be relatively rare if the common cases are handled first. Before worrying extensively about it one would have to prove that there's an important common case that it actually substantially impacts.
comment:18 Changed 22 months ago by
 Milestone changed from sage8.2 to sage8.3
comment:19 Changed 20 months ago by
 Milestone changed from sage8.3 to sage8.4
I believe this issue can reasonably be addressed for Sage 8.4.
comment:20 in reply to: ↑ 15 Changed 18 months ago by
Replying to jdemeyer:
Replying to embray:
Come to think of it, this would be an optimization worth building into Cython, if possible. Common ABC from Python's stdlib > exact type checks for common builtin types that implement that ABC, followed by generic
isinstance(...)
.But then the case where it's not an instance of the ABC will still be slow.
I have sometimes been thinking about adding a fast Cython function caching calls to
isinstance(..., some_type)
. If implemented well, this could be very fast. It could be used for ABCs as well as for other common type checks in Sage likeisinstance(..., Element)
. We could take inspiration fromsrc/sage/structure/coerce_dict.pyx
.
I'm thinking about this again, and wondering what exactly you mean. I think we really need some convenient isiterable
and perhaps issequence
functions, perhaps in sage.cpython
. It doesn't necessarily have to work with the ABC framework. Perhaps we can just explicitly enumerate some specific types we care about for these cases (list, tuple, vector, (x)range, ...) which are going to be by far the most common. On the other hand, this is pretty inflexible, and that's where the ABC system is quite nice as it is very flexible, but that comes at a small cost.
There is also already caching built into ABCs, albeit implemented mostly in pure Python. Each ABC has a _abc_cache
attribute, which is a WeakSet
of all types known to implement that ABC (e.g. Sequence._abc_cache
on Python 2 contains, by default, list
, tuple
, basestring
, buffer
, and xrange
). There is also a Sequence._abc_negative_cache
for negative lookups, also WeakSet
. This gets invalidated any time a new ABC is registered.
Some interesting news is that Python 3.7 added a new C implementation of ABCMeta, which makes everything a bit faster. On Python 2.7:
python2.7 m timeit s 'from collections import Sequence; l=[]' 'isinstance(l, Sequence)'
and on 3.7:
python3.7 m timeit s 'from collections.abc import Sequence; l=[]' 'isinstance(l, Sequence)' 500000 loops, best of 5: 543 nsec per loop
So a little more than twice as fast?
Negative lookups are improved even more:
python2.7 m timeit s 'from collections import Sequence; l=0' 'isinstance(l, Sequence)' 100000 loops, best of 3: 2.17 usec per loop
python3.7 m timeit s 'from collections.abc import Sequence; l=0' 'isinstance(l, Sequence)' 500000 loops, best of 5: 534 nsec per loop
Of course, isinstance on a nonABC is still much faster:
python3.7 m timeit s 'from collections.abc import Sequence; l=0' 'isinstance(l, tuple)' 2000000 loops, best of 5: 168 nsec per loop
but naturally, it slows down if the argument is a tuple, the more elements you add. This is probably significantly optimized in Cython too.
Going back to Python 2, I think nearly half the time is just spent in the lookup into the WeakSet
cache:
python2.7 m timeit s 'from collections import Sequence; l=[]' 't = getattr(l, "__class__", None); t is not None and t in Sequence._abc_cache' 1000000 loops, best of 3: 0.663 usec per loop
(I can't make this test as easily on Python 3.7 since the _abc_cache
is no longer exposed as Python attribute). So the C implementation from Python 3.7 is probably spending the majority of its time in this cache lookup, since it's doing away with some overhead from making a Python method call, the getattr
call, etc.
I wonder if we couldn't do better. The use of a WeakSet
makes sense for the case of classes that are created and later destroyed (more commonly these will be in the negative cache, since you'd be even less likely to register them to an ABC). That said, it's still pretty uncommon even for classes to be deleted. For builtin types, and for most classes in Sage, classes will never be deleted, and we could just cache them in a fixedsize array (if a new class is registered to the ABC it would have to be realloc'd, but this is rare).
Though I have doubts whether it's worth microoptimizing this further. So I'd lean instead toward just backporting the C implementation of ABCMeta to Python 2, if anything at all.
comment:21 Changed 16 months ago by
 Milestone changed from sage8.4 to sage8.5
comment:22 Changed 14 months ago by
 Milestone changed from sage8.5 to sage8.7
Retargeting some of my tickets (somewhat optimistically for now).
comment:23 Changed 11 months ago by
 Milestone changed from sage8.7 to sage8.8
Moving all my inprogress tickets to 8.8 milestone.
comment:24 Changed 9 months ago by
 Milestone changed from sage8.8 to sage8.9
Tickets still needing working or clarification should be moved to the next release milestone at the soonest (please feel free to revert if you think the ticket is close to being resolved).
comment:25 Changed 2 months ago by
 Milestone changed from sage8.9 to sage9.1
Ticket retargeted after milestone closed
Branch pushed to git repo; I updated commit sha1. New commits:
Register FreeModuleElement to the Sequence ABC
Some simple examples where isinstance(..., (list, tuple, Vector)) could be replaced with SequenceABC
A slightly more involved replacement with SequenceABC
More misc PEP8 cleanup in the affected modules