Opened 3 years ago

Last modified 3 months ago

#25981 needs_work enhancement

SafeSortable wrapper for ordering heterogeneous types on Python 3

Reported by: embray Owned by:
Priority: major Milestone: sage-9.4
Component: python3 Keywords:
Cc: jdemeyer, chapoton Merged in:
Authors: Erik Bray Reviewers:
Report Upstream: N/A Work issues:
Branch: u/embray/misc/safe-sortable (Commits, GitHub, GitLab) Commit: 667f15bb146965dc76ddd989ca498d626c83a03a
Dependencies: Stopgaps:

Status badges

Description

There are several problems on Python 3 involving ordering elements of heterogeneous collections, particularly in objects such as sets and graphs.

This issue has already come up several times in the Python 3 port usually with some ad-hoc fix. For sorting, sometimes it's straightforward to come up with a reasonable sort key. Other times it's not obvious, or leads to ugly code (lots of try/excepts, etc.)

One possibility I've brought up before is inspired by the pprint implementation on Python 3: pprint by default orders the output of collections such as dicts and sets. On Python 3 they had to deal with this issue of arbitrary types no longer being ordered, so pprint has an internal helper called _safe_key on which this SafeSortable class is inspired, which serves as a wrapper around objects giving them Python 2-style sorting semantics.

This is just a proof-of-concept, and I think there are various issues to work out:

  1. Performance comparison, especially on Python 2. Adding key=SafeSortable to sorts is unnecessary on Python 2 and probably only needlessly slows things down, though I'm not sure by how much. This could also be avoided by building a Python 2 exception into the SafeSortable.sort helper method, and/or possibly making type(SafeSortable).__call__ effectively a no-op.
  1. Performance improvements on Python 3. Even on Python 3 we can probably speed this up a lot, in particular by pooling SafeSortable instances similarly to what we do for Integers.
  1. Do we want to make the interface more flexible? For example, do we always want Python 2 sorting semantics? Would it also be useful to provide an optional custom fallback for two types that can't be compared naturally?
  1. What about comparing compound objects, specifically tuples? Do we want an API for applying SafeSortable recursively to the elements of tuples? And what about other comparing other heterogeneous collections such as lists? I think this would probably be useful (e.g. sorting key/value pairs from a dict), but I haven't settled on the API for that.

Change History (31)

comment:1 Changed 3 years ago by embray

  • Cc jdemeyer chapoton added

comment:2 Changed 3 years ago by embray

  • Status changed from new to needs_review

comment:3 Changed 3 years ago by embray

On sage-devel the idea was raise of sorting otherwise unordered objects also based on their hash (as opposed to their id). This might be a nice semantic to add, where possible, since it will lead to a more deterministic (at within a given platform) sort.

comment:4 Changed 3 years ago by embray

  • Milestone changed from sage-8.4 to sage-8.5

comment:5 follow-up: Changed 2 years ago by zerline

  • Status changed from needs_review to positive_review

This obviously is a useful change.

Python2 compatibility is a wise behavior.

I only deplore that we need to import a new module, even a small one ..

comment:6 follow-up: Changed 2 years ago by jhpalmieri

  • Status changed from positive_review to needs_work

Documentation doesn't build. I don't have time to prepare a proper branch, but I think this will fix it:

  • src/sage/structure/misc.pyx

    diff --git a/src/sage/structure/misc.pyx b/src/sage/structure/misc.pyx
    index c7e9808279..459dd4f244 100644
    a b cdef class SafeSortable: 
    88    Wrapper for arbitrary objects allowing them to always be ordered in the
    99    manner of Python 2.x objects.
    1010
    11     When comparing a `SafeSortable` to another `SafeSortable`, it first
     11    When comparing a ``SafeSortable`` to another ``SafeSortable``, it first
    1212    attempts comparing their wrapped objects using the standard ``__lt__``
    1313    comparison.  If that fails, it falls back to comparing them by their type
    1414    name, then by their id, which is consistent with what Python 2.x does with
    1515    a pair of otherwise unorderable types.
    1616
    17     `SafeSortable`s may only be compared with other `SafeSortable`s.
     17    ``SafeSortable`` objects may only be compared with other
     18    ``SafeSortable`` objects.
    1819
    1920    EXAMPLES::
    2021

comment:7 in reply to: ↑ 5 Changed 2 years ago by embray

Replying to zerline:

I only deplore that we need to import a new module, even a small one ..

That isn't really a problem though... I'm not sure why you think it is. Are you saying it's a problem for the module to be imported at all (because sage.structure.misc is already used in sage.structure.parent for some utilities)? Or just the fact that it means adding an import statement to places where it's used? I'm not sure how else you expect to use utilities defined in other modules. The only alternative is shoving them into globals and that's not how Sage (or any large Python package) is developed internally. It's only for the REPL interface that we import a bunch of stuff into the global namespace for the convenience of user-level code.

comment:8 in reply to: ↑ 6 Changed 2 years ago by embray

Replying to jhpalmieri:

Documentation doesn't build. I don't have time to prepare a proper branch, but I think this will fix it:

I'm not sure why just `SafeSortable` wouldn't work. Normally Sphinx is good about generating links to objects defined in the module being documented without, for example, writing out the fully-qualified name like :class:`~sage.structure.misc.SafeSortable`. Is there a known problem with that in Cython modules?

comment:9 Changed 2 years ago by embray

I see--for some reason I never quite got it through my head that the default role throughout Sage's documentation is :math:, and not the normal default in Sphinx for Python projects which is :py:obj:. I suppose that makes a certain sense. So at the very least :class:`SafeSortable` is needed to make a proper cross-reference.

comment:10 Changed 2 years ago by jhpalmieri

`SafeSortable` doesn't cause failures to build, just typesetting in math mode as you note, so that proposed change is cosmetic. The failures come from ``SafeSortable``s: the second `` is not recognized by Sphinx as ending a literal block: I'm guessing that Sphinx wants `` to either be preceded by or followed by a space.

By the way, I also see a doctest failure:

Running doctests with ID 2018-12-14-08-47-08-3b058240.
Git branch: t/25981/misc/safe-sortable
Using --optional=dochtml,mpir,python2,sage
Doctesting 1 file.
sage -t --warn-long 51.0 src/sage/structure/misc.pyx
**********************************************************************
File "src/sage/structure/misc.pyx", line 74, in sage.structure.misc.SafeSortable.lt
Failed example:
    SafeSortable.lt(1, 'a')
Expected:
    True
Got:
    False
**********************************************************************
1 item had failures:
   1 of   3 in sage.structure.misc.SafeSortable.lt
    [17 tests, 1 failure, 0.01 s]
----------------------------------------------------------------------
sage -t --warn-long 51.0 src/sage/structure/misc.pyx  # 1 doctest failed
----------------------------------------------------------------------
Total time for all tests: 0.0 seconds
    cpu time: 0.0 seconds

comment:11 Changed 2 years ago by jhpalmieri

If you want to create the cross-references, you can use this change instead of my previous suggestion:

  • src/sage/structure/misc.pyx

    diff --git a/src/sage/structure/misc.pyx b/src/sage/structure/misc.pyx
    index c7e9808279..8217005fa6 100644
    a b cdef class SafeSortable: 
    88    Wrapper for arbitrary objects allowing them to always be ordered in the
    99    manner of Python 2.x objects.
    1010
    11     When comparing a `SafeSortable` to another `SafeSortable`, it first
     11    When comparing a :class:`SafeSortable` to another :class:`SafeSortable`, it first
    1212    attempts comparing their wrapped objects using the standard ``__lt__``
    1313    comparison.  If that fails, it falls back to comparing them by their type
    1414    name, then by their id, which is consistent with what Python 2.x does with
    1515    a pair of otherwise unorderable types.
    1616
    17     `SafeSortable`s may only be compared with other `SafeSortable`s.
     17    :class:`SafeSortable` objects may only be compared with other
     18    :class:`SafeSortable` objects.
    1819
    1920    EXAMPLES::
    2021

comment:12 follow-up: Changed 2 years ago by jhpalmieri

Regarding the doctest failure, I get

sage: 1 < 'a'
False
sage: int(1) < 'a'
True

Did you mean to use int(1) instead of 1?

comment:13 follow-up: Changed 2 years ago by jhpalmieri

@zerline: could you clarify why you gave this a positive review if the docs don't build and tests don't pass?

comment:14 follow-up: Changed 2 years ago by jdemeyer

It's not really clear to me what the use case of this SafeSortable would be. It seems to me that we should just avoid to compare uncomparable objects instead of making uncomparable objects comparable anyway.

comment:15 in reply to: ↑ 13 ; follow-up: Changed 2 years ago by embray

Replying to jhpalmieri:

@zerline: could you clarify why you gave this a positive review if the docs don't build and tests don't pass?

They gave it a positive review based on its merits, which most people do, without necessarily catching every minute issue (this is what we have patchbots for, supposedly). They aren't as familiar as you are with these nitpicks and it's not a big deal.

comment:16 in reply to: ↑ 14 Changed 2 years ago by embray

Replying to jdemeyer:

It's not really clear to me what the use case of this SafeSortable would be. It seems to me that we should just avoid to compare uncomparable objects instead of making uncomparable objects comparable anyway.

My python3 branch is replete with examples of its use. I can add more if you don't believe me. It fixes a good many issues where lists of mixed-type objects are being sorted, and there are plenty of cases like that which are valid. One of the common examples is ordering lists of edge labels which can be strings or ints, but there are plenty of other examples.

There are dozens of bugs that I've delayed fixing for months on end due to not having a utility like this. I'm just going to fix the fussy documentation issue and set this back to positive review.

comment:17 in reply to: ↑ 12 ; follow-up: Changed 2 years ago by embray

Replying to jhpalmieri:

Regarding the doctest failure, I get

sage: 1 < 'a'
False
sage: int(1) < 'a'
True

Did you mean to use int(1) instead of 1?

I get

sage: 1 < 'a'
True

Initially I wasn't sure why you would get False here, since the default Python 2 comparison actually sorts instances of numeric types (which I believe Sage's Integer is for this purpose), so both Integer(1) and int(1) should come before 'a' and I'm not sure how you'd get any other result. However, it turns out that Integer.__richcmp__ given a RHS that is not obviously comparable to an int will fall back on the coercion model, which itself in this case has a default richcmp implementation that as a last resort compares the two elements just by their ids, and that's what's happening in the case of comparing an Integer to a str. This default comparison is rather unfortunate since it's rather non-deterministic.

I'm not sure what the best thing would be to do about this. I'm tempted to take the Python 3 approach and remove this fallback nonsense comparison in favor of raising a TypeError if two objects don't have a sensible comparison implemented.

Last edited 2 years ago by embray (previous) (diff)

comment:18 Changed 2 years ago by embray

One other thing no one has pointed out though is that the "Python 2 style comparison" implemented here isn't actually quite correct. I took the implementation from Python's pprint module which implements a similar utility, but in pure Python, and not public API (as I explained in the ticket description). However, it turns out the actual default comparison is a little more complex: https://github.com/python/cpython/blob/3752bc96c0ea1ecf28903cc34cdcd75c658e92ce/Objects/object.c#L767

I think I may update this to something a bit closer to that. It has the advantage of being deterministic, whereas the current implementation still relies on instance ids which is not ideal.

comment:19 in reply to: ↑ 15 ; follow-ups: Changed 2 years ago by jhpalmieri

Replying to embray:

Replying to jhpalmieri:

@zerline: could you clarify why you gave this a positive review if the docs don't build and tests don't pass?

They gave it a positive review based on its merits, which most people do, without necessarily catching every minute issue (this is what we have patchbots for, supposedly). They aren't as familiar as you are with these nitpicks and it's not a big deal.

There is (at least) one thing the patchbots won't do: look at the built documentation to make sure it looks okay, doesn't use `code` instead of ``code`` or other reST errors which won't stop the build but will make the output look bad. This should actually be on the reviewer's checklist: look at the documentation. If reviewers don't look at the built documentation, it is likely that the quality of the documentation will decrease over time because these errors won't get caught. I'd prefer that this did not happen.

The doctest may be system dependent. In unpatched Sage running Python 2, 1 < 'a' returns False on several OS X machines.

comment:20 in reply to: ↑ 19 Changed 2 years ago by jhpalmieri

Replying to jhpalmieri:

This should actually be on the reviewer's checklist: look at the documentation.

See #26926.

comment:21 Changed 2 years ago by jdemeyer

First a minor comment about the code: in comparison methods, you don't need to check the type of the first argument: it's self (and should be called self by PEP 8) and you know that it has the right class. Cython knows it too, so you don't need a <SafeSortable> cast.

comment:22 Changed 2 years ago by jdemeyer

Now about using SafeSortable: I'm not convinced that it actually makes sense. If you just want to sort something for the user's convenience and nothing really depends on it, then it's fine: in the best case, it helps and in the worst case, you won't break anything.

But I wouldn't use SafeSortable for anything where the order really matters for an algorithm: the try/except style means that it's non-transitive (in that sense, it's worse than simply using str as sorting key).

Falling back on things like type() and id() is also trying to force Python 2 sorting semantics in Python 3. I'm not sure that this is the right thing to do.

comment:23 Changed 2 years ago by embray

  • Milestone changed from sage-8.5 to sage-8.7

Retargeting some of my tickets (somewhat optimistically for now).

comment:24 in reply to: ↑ 19 Changed 2 years ago by embray

Replying to jhpalmieri:

The doctest may be system dependent. In unpatched Sage running Python 2, 1 < 'a' returns False on several OS X machines.

I already explained that it is, as well as the exact reason.

comment:25 in reply to: ↑ 17 Changed 2 years ago by jdemeyer

Replying to embray:

I'm tempted to take the Python 3 approach and remove this fallback nonsense comparison in favor of raising a TypeError if two objects don't have a sensible comparison implemented.

This is exactly what #22029 does.

comment:26 Changed 2 years ago by embray

  • Milestone changed from sage-8.7 to sage-8.8

Moving all my in-progress tickets to 8.8 milestone.

comment:27 Changed 2 years ago by embray

  • Milestone changed from sage-8.8 to sage-8.9

Tickets still needing working or clarification should be moved to the next release milestone at the soonest (please feel free to revert if you think the ticket is close to being resolved).

comment:28 Changed 17 months ago by embray

  • Milestone changed from sage-8.9 to sage-9.1

Ticket retargeted after milestone closed

comment:29 Changed 13 months ago by mkoeppe

  • Milestone changed from sage-9.1 to sage-9.2

Batch modifying tickets that will likely not be ready for 9.1, based on a review of the ticket title, branch/review status, and last modification date.

comment:30 Changed 8 months ago by mkoeppe

  • Milestone changed from sage-9.2 to sage-9.3

comment:31 Changed 3 months ago by mkoeppe

  • Milestone changed from sage-9.3 to sage-9.4

Setting new milestone based on a cursory review of ticket status, priority, and last modification date.

Note: See TracTickets for help on using tickets.