Opened 12 months ago

Last modified 5 months ago

#26769 needs_review enhancement

Add an issequence utility to check for list, tuple, and other compatible objects

Reported by: embray Owned by:
Priority: major Milestone: sage-8.9
Component: misc Keywords: python3
Cc: chapoton, jdemeyer Merged in:
Authors: Erik Bray Reviewers:
Report Upstream: N/A Work issues:
Branch: u/embray/misc/issequence (Commits) Commit: 1d7066c898c4885ff33e5385b20d5892993cbc98
Dependencies: Stopgaps:

Description (last modified by embray)

Adds an issequence() function that can work as a more generic replacement for isinstance(x, (list, tuple)) but will also work with a broader range of similar types (e.g. xrange, Sage vectors, Numpy arrays, etc). Note issequence(x) is also True for str and bytes. So when using this to replace (list, tuple) care should be taken to make sure other sequence-like types are handled first, in cases where they require separate handling in the first place.

For the very common case of (list, tuple) this is faster than isinstance, but also has the benefit of being more generic, while not quite as generic as the much slower isinstance(x, Sequence).

#24804 demonstrates an example usage which allows the constructor in question to also accept a Python 3 range object (an example which appears in the doctests that did not otherwise work). This will likely be useful elsewhere but we can find those examples as we come to them.

Change History (17)

comment:1 Changed 12 months ago by embray

  • Status changed from new to needs_review

comment:2 follow-ups: Changed 12 months ago by jdemeyer

  1. What's the difference between this and isinstance(x, Sequence)? I feel like that should be explained better.
  1. There is no point in adding an empty .pyx file (unless I'm missing something)

comment:3 follow-up: Changed 12 months ago by jdemeyer

  1. Is it possible for a subclass of list/tuple not to be a sequence? Just asking because you use FOO_CheckExact as opposed to FOO_Check.

comment:4 in reply to: ↑ 2 Changed 12 months ago by embray

Replying to jdemeyer:

  1. What's the difference between this and isinstance(x, Sequence)? I feel like that should be explained better.

These are technically completely different things. The C-API level "sequence protocol" is defined by implementing some or all of the PySequenceMethods (the way PySequence_Check is implemented all it cares about is that sq_item by implemented at a minimum).

This is different from the Python level collections.abc "Sequence" ABC which requires that both __getitem__ and __len__ are implemented.

This inconsistent overloading of terms is irritating and confusing, but it is what it is. So this interface defines my own sort of middle-ground which is closer to the ABC in that it requires the C-level sequence interface (which Python classes that implement __getitem__ have). Unfortunately, at the Python level there is no way to explicitly distinguish between a "sequence" and a "mapping".

A class that implements a custom __getitem__ may work as one or the other, or both, so the best we can do is check for __getitem__ and __len__ and hope it works "like a list", which is how I'm defining "sequence" in this case.

We special case tuple and list here since they are going to be quite common and are known to be the two most fundamental sequence types built into Python (as opposed to dict, which is not considered a sequence).

  1. There is no point in adding an empty .pyx file (unless I'm missing something)

If you don't have the .pyx file Cython won't actually generate a Python module that can be used by Python. You can still cimport cpdef functions defined in a .pxd file from other Cython code, but you can't use it in Python :/

Version 0, edited 12 months ago by embray (next)

comment:5 in reply to: ↑ 3 Changed 12 months ago by embray

Replying to jdemeyer:

  1. Is it possible for a subclass of list/tuple not to be a sequence? Just asking because you use FOO_CheckExact as opposed to FOO_Check.

From the C API perspective a subclass will still technically be a sequence, but there's no guarantee that it isn't broken in some way, and it's still necessary in those cases to go through the slower but more generic PySequence_ITEM calls, as opposed to PyList_GET_ITEM when accessing items, for example.

Likewise in issequence, for subclasses it's kind of necessary to go through the slightly slower path of calling PySequence_Length, and you only add overhead to the more common "exact" case, while gaining nothing.

comment:6 Changed 12 months ago by embray

  • Description modified (diff)

comment:7 Changed 12 months ago by embray

It would be better if Python had, say, a __getindex__ and related magic methods that could explicitly distinguish a type as a sequence as opposed to a more general mapping. Perhaps I will propose that...

comment:8 in reply to: ↑ 2 Changed 12 months ago by embray

Replying to jdemeyer:

  1. What's the difference between this and isinstance(x, Sequence)? I feel like that should be explained better.

Now that I've had lunch, let me give a slightly clearer answer to this: This is effectively the same as isinstance(x, Sequence), but it is much faster, both in the positive and negative cases, in that it bypasses all of the slow isinstance machinery entirely and does not result in calls to Sequence.__subclasshook__ for types that do not obviously already implement the Sequence protocol.

The only case where it can be a slightly slow is for the cases where it has to call PySequence_Length, but that itself is only slow on classes that already implement __getitem__ and __len__.

comment:9 Changed 12 months ago by chapoton

What about point 2 in comment:2 ?

EDIT: OK, I missed that you answered that already.

Can we move on ? I hope that Jeroen can finish the review.

Last edited 12 months ago by chapoton (previous) (diff)

comment:10 Changed 12 months ago by embray

One other comment worth adding about this, is that it needs to be used carefully. Almost any custom class in Python (except those explicitly inheriting from dict, I believe) that implements a __getitem__ can be interpreted by Python as a "sequence" type. Per my note in comment:7, I think this is a fundamental shortcoming in Python though, and something we just have to deal with on some level...

comment:11 Changed 11 months ago by embray

  • Milestone changed from sage-8.5 to sage-8.7

Retargeting some of my tickets (somewhat optimistically for now).

comment:12 in reply to: ↑ description ; follow-up: Changed 10 months ago by jdemeyer

Replying to embray:

Adds an issequence() function that can work as a more generic replacement for isinstance(x, (list, tuple))

I'm not really convinced that this issequence(x) is the correct replacement for isinstance(x, (list, tuple)). In the majority of cases, you probably want to be more general and accept any iterable. In the example #24804 that you refer to, you only iterate over the object so why restrict to sequences?

And the way: I also agree that the mapping vs sequence thing is annoying.

comment:13 Changed 10 months ago by embray

This is specifically intended for cases where the iterable is treated sequentially: That is, it is assumed that it can be indexed with monotonically increasing integer indices. There are many cases like this where you do not want to be more general and do not want to accept an arbitrary iterable, at least, without eagerly consuming the iterable in order to convert it to a sequence (e.g. list(iterable)) which is obviously wasteful if you already have a sequence-like object.

comment:14 Changed 10 months ago by embray

For example, there are many cases where you have isinstance(x, (list tuple)) where it should also be able to accept something like a vector or a matrix, and just arbitrarily doesn't because of this less general check.

comment:15 Changed 8 months ago by embray

  • Milestone changed from sage-8.7 to sage-8.8

Moving all my in-progress tickets to 8.8 milestone.

comment:16 in reply to: ↑ 12 Changed 8 months ago by embray

Replying to jdemeyer:

Replying to embray:

Adds an issequence() function that can work as a more generic replacement for isinstance(x, (list, tuple))

I'm not really convinced that this issequence(x) is the correct replacement for isinstance(x, (list, tuple)). In the majority of cases, you probably want to be more general and accept any iterable. In the example #24804 that you refer to, you only iterate over the object so why restrict to sequences?

In many cases you don't just want an arbitrary iterable and that's what this is for.

Anything more general and you wind up back mired in the performance problems of using isinstance and ABCs.

If there's one tweak to this I would make it would be to explicitly exclude str and bytes, since although they are technically sequences under this definition, most of the time we want to treat them separately.

I think this problem badly needs a solution so I'd be happy to consider an alternative, but I think this is pretty useful in the vast majority of cases where it's needed.

comment:17 Changed 5 months ago by embray

  • Milestone changed from sage-8.8 to sage-8.9

Moving tickets from the Sage 8.8 milestone that have been actively worked on in the last six months to the next release milestone (optimistically).

Note: See TracTickets for help on using tickets.