Opened 4 years ago

Last modified 4 months ago

#25388 needs_work enhancement

InheritComparisonUniqueRepresentation

Reported by: embray Owned by:
Priority: major Milestone: sage-9.7
Component: refactoring Keywords:
Cc: jdemeyer Merged in:
Authors: Erik Bray Reviewers:
Report Upstream: N/A Work issues:
Branch: u/embray/misc/inherit-comp-unique-repr (Commits, GitHub, GitLab) Commit: 8d665aa799cecdcbf762148d11c625e0bb30bf82
Dependencies: Stopgaps:

Status badges

Description

This is a bit ugly, but it seems to be necessary if you want to use UniqueRepresentation with subclasses of Element (among others) which has the InheritComparison metaclass. This otherwise creates a conflict with ClasscallMetaclass.

This class resolves that by making a UniqueRepresentation with the InheritComparisonClasscallMetaclass. This can then be used as a mix-in class with Element subclasses.

There's at least some prior precedence for this, and I've updated those examples to use InheritComparisonUniqueRepresentation to demonstrate that it works. I'm open to better ideas.

Change History (34)

comment:1 Changed 4 years ago by embray

  • Status changed from new to needs_review

comment:2 Changed 4 years ago by jdemeyer

My first reaction is that it doesn't really make sense to use UniqueRepresentation for elements. I wonder what the reason for that is.

comment:3 Changed 4 years ago by embray

Perhaps the entire concept is wrong somehow? That said as you can see there are cases of this (albeit with classes that only inherit Element indirectly).

Why doesn't it make sense? Just as 1 is 1 it does make sense to have this for other objects. Granted 1 is 1 is a consequence of caching small integers, but I don't see the harm of having it for other elements.

Alternatively, it would be nice to have something that behaves somewhat like UniqueRepresentation just for __eq__. That is, something like CachedRepresentation that compares instances according to their cache keys. The idea is to avoid implementing a bunch of __eq__ that compare two objects according to some set of attributes that are the same attributes used to initialize a "unique" instance of that class. Does that make sense?

comment:4 Changed 4 years ago by embray

In any case, it does make sense, as long as we have a complicated metaclass hierarchy, to have a metaclass that combines InheritComparison with UniqueRepresentation.

comment:5 follow-up: Changed 4 years ago by tscrim

I can think of certain morphisms that I could see benefiting from being UniqueRepresentation, but I wouldn't do non-equality comparisons on them. The other would be for small sets of key algebraic objects. We do this in combinat/crystals/letters.pyx, but we do it by @cached_method on _element_constructor_ to help improve the construction time and to tie it directly with the parent.

comment:6 Changed 4 years ago by nbruin

There are a couple of reasons why UniqueRepresentation should be used sparingly:

  • It makes instantiation of an object *expensive*: The construction parameters need to be processed and made into a dictionary key for lookup
  • It is very prone to creating memory leaks if the construction parameters consist of anything more complicated than integers and strings. That's because the construction parameters, being put in the key of a global WeakValueDict, have their life lower bounded by the object. So if those parameters end up carrying (indirectly) a reference to the object, it's now immortal.
  • it means that one has to be very careful caching UniqueRepresentation objects: they're already being globally cached (under expensive keys) and the places where one would like to cache them are generally places where that would create the kind of reference loops that cause memory leaks.

The reason why we have UniqueRepresentation is because it allows the coercion framework to do lightning-fast comparisons (because we can make do with an "is" test rather than a "=="). It's a compromise that makes memory management with *parents* much more complicated, for the benefit of making some operations with *elements* much faster.

That's why people think it's strange to make elements UniqueRepresentation.

comment:7 follow-up: Changed 4 years ago by embray

That's weird about the memory leaks since I thought the point of using weak reference caching was that you could construct an object once--it would be cached--but then if it isn't used again the cached keys would (potentially) go away.

As for this specific ticket, perhaps the use case of using UniqueRepresentation with Elements is not good, though as the diff shows there were still at least a few places where it was useful to define this (unfortunately verbose) InheritComparisonUniqueRepresentation class for reuse.

However, my primary motivation to using more UniqueRepresentation was actually as a possible solution, in many cases, to #24551. The reason it appealed to me is that it gives free hashing and equality comparison implementation for classes that have a property that identity of instances of that class is uniquely determined by some attributes of that instance that are provided at its time of construction (particularly via __init__).

If UniqueRepresentation is not right for this in all cases (something I can believe) then perhaps it would be good to add a new base class that has a similar property, but without caching (and thus without the guaranteed equality-by-id property, which for my use case is not actually so important). In other words, it would basically generate the same cache key that would be generated by CachedRepresentation, but only use that to implement __eq__ and __hash__, but not do any caching.

comment:8 in reply to: ↑ 5 ; follow-up: Changed 4 years ago by embray

Replying to tscrim:

I can think of certain morphisms that I could see benefiting from being UniqueRepresentation, but I wouldn't do non-equality comparisons on them. The other would be for small sets of key algebraic objects. We do this in combinat/crystals/letters.pyx, but we do it by @cached_method on _element_constructor_ to help improve the construction time and to tie it directly with the parent.

On that note, could you have a look at #25389 and tell me if the use of UniqueRepresentation doesn't make sense in any of those cases. I felt that it did make sense but clearly there's a lot of hard-earned experience I don't have as to when it does or doesn't make sense to use. (Note: The branch in that ticket has this one as a dependency, so it's really only interesting to look at the changes in quaternion algebras: https://git.sagemath.org/sage.git/diff/src/sage/algebras/quatalg/quaternion_algebra.py?id=9ea55d9933a3cd2bedd391068276f804ff868e1f&id2=6fc1e20c666283a301b4ff3f855013de8d206b35)

Last edited 4 years ago by embray (previous) (diff)

comment:9 in reply to: ↑ 7 ; follow-up: Changed 4 years ago by nbruin

Replying to embray:

That's weird about the memory leaks since I thought the point of using weak reference caching was that you could construct an object once--it would be cached--but then if it isn't used again the cached keys would (potentially) go away.

That's true, but the caching is not the primary design feature. It's a cost more than a benefit (but necessary for what it does). The global cache brings you very close to creating a global anchor for reference cycles: The global weakvaluedict that does the caching has strong references to the keys used (the construction parameters). As has been shown time and again, these keys have a tendency of carrying references to the cached object. Now your memory cycle is globally anchored and will be immune to cyclic garbage collection: a memory leak.

This kind of leak keeps occurring. See https://trac.sagemath.org/ticket/23807#comment:14 for a recent example.

The EqualityAndHashingBasedOnConstructionParameters that you suggest makes a lot more sense: here you get to store the key on the object itself, so memory leaks should be much less of a problem. I suspect it will not be a very memory-efficient class, because the key objects you're storing are likely preserved elsewhere on the object as well.

comment:10 in reply to: ↑ 8 ; follow-up: Changed 4 years ago by nbruin

Replying to embray:

On that note, could you have a look at #25389 and tell me if the use of UniqueRepresentation doesn't make sense in any of those cases.

The construction parameters of quaternionalgebras are sufficiently light that I would not expect memory leaks (as long as people don't take to caching quaternion algebras on the base ring!, which now that I thought of it, people will undoubtedly start doing).

In principle, QuaternionAlgebra is the kind of parent that could participate in coercion discovery, where UniqueRepresentation might be required (in particular, if people consider polynomial ring over a ring and expect coercion discovery, and in particular common covering parent construction, to work with it, then UniqueRepresentation might be required.

I would find it surprising if people are actually doing that with quaternion algebras, so I don't think we're getting the designed benefit from quaternion algebras.

By turning quaternion algebras into globally unique objects, you are requiring the algebras to be immutable in a very strong sense: different parts could end up having references to the same algebra, so ANY change in state (even little representation/caching details) must be assumed to affect completely unrelated code. It's not possible to make your own little local algebra and scribble in it anymore.

More precisely. The kind of scenario in the coercion framework where I know that UniqueRepresentation is really required is to make this assertion hold:

sage: Zxy.<x,y>=ZZ[]
sage: Qx.<x>=QQ[]
sage: Qy.<y>=QQ[]
sage: Qxy=QQ['x','y']
sage: x=Qx.0
sage: y=Zxy.1
sage: (x+y).parent() is Qxy
True
sage: Qy.<y>=QQ[]
sage: (Qy.0+Zxy.0).parent() is Qxy
True

Note that the coercion framework needs to construct a common covering parent for Zxy and Qx and one for Zxy and Qy itself. It would be very surprising if the results were not identical. (the fact that the resulting parent is equal to Qxy constructed by the user is convenient but less important). I don't think people would be doing that with QuaternionAlgebra so I think you could safely leave off the UniqueRep for now.

Last edited 4 years ago by nbruin (previous) (diff)

comment:11 in reply to: ↑ 9 Changed 4 years ago by embray

Replying to nbruin:

The EqualityAndHashingBasedOnConstructionParameters that you suggest makes a lot more sense: here you get to store the key on the object itself, so memory leaks should be much less of a problem. I suspect it will not be a very memory-efficient class, because the key objects you're storing are likely preserved elsewhere on the object as well.

While I like the feature of CachedRepresentation that bases the cache key on the construction parameters, I don't consider it a critical feature either. This could just as easily (more easily in fact) be implemented simply by defining a tuple of attribute names (or possibly methods that should be called) that should be used for testing equality and hashing. It's just such a common pattern it really could save a lot of code and effort if it could be coded once for a lot of classes.

Having an additional, usually small, tuple of references is not generally a major memory concern but of course as with anything it depends on how many of these objects you expect to construct.

comment:12 in reply to: ↑ 10 Changed 4 years ago by embray

Replying to nbruin:

Note that the coercion framework needs to construct a common covering parent for Zxy and Qx and one for Zxy and Qy itself. It would be very surprising if the results were not identical. (the fact that the resulting parent is equal to Qxy constructed by the user is convenient but less important). I don't think people would be doing that with QuaternionAlgebra so I think you could safely leave off the UniqueRep for now.

Thanks for looking at it and for the explanation. I agree. I guess what I really just want there is the aforementioned EqualityAndHashingBasedOnConstructionParameters, and I was using UniqueRepresentation more for that feature than anything (although it does seem useful to me for preventing construction of multiple copies of otherwise identical objects, but unless that's actually a problem for, say, QuaternionOrders then it's a premature optimization).

comment:13 Changed 4 years ago by embray

  • Milestone changed from sage-8.3 to sage-8.4

I believe this issue can reasonably be addressed for Sage 8.4.

comment:14 Changed 4 years ago by embray

I think this utility class is still useful as it clearly makes a few minor simplifications.

comment:15 Changed 4 years ago by jdemeyer

The problem is that I'm still not convinced that combining UniqueRepresentation and InheritComparisonMetaclass makes sense at all.

comment:16 Changed 4 years ago by embray

Oh what basis? On its face there's no reason not to and there are clearly classes that do use that combination. Now, if you disagreed that those classes should be using that combination in the first place that's another story: If those were fixed then there would be no use for adding this class. But first you'd have to justify why those classes shouldn't use UniqueRepresentation in particular and then fix that.

comment:17 Changed 4 years ago by nbruin

Looking at the code referenced in your commits: Indeed, it looks to me these objects have no business being UniqueRepresentation. They are differentials: elements; morphisms. There's no need for those to have their equality determined by their ID, which is what UniqueRepresentation is for. Indeed, needing comparison inheritance shows that they are not UniqueRepresentation. The right way to "fix" this (if any fix is necessary) is to change the inheritance of these objects, not to introduce an oxymoronic new class.

It could be that those pieces of code are from before UniqueRepresentation and CachedRepresentation were split off, or perhaps they were written by someone who didn't understand the fine difference between the two. In this case, I don't think you would even want to do it as a transition, since you're causing a net increase in code, so it's not making it shorter either.

Last edited 4 years ago by nbruin (previous) (diff)

comment:18 follow-up: Changed 4 years ago by tscrim

So I can answer for sage.algebras.commutative_dga.Differential why it is useful to have it inherit from UniqueRepresentation: These morphisms are used to define a differential graded commutative algebra (DGCA), so they need to be hashable and easily comparable. They also contain computationally difficult results that we want to cache and not duplicate, so they should be a cached representation. Hence, it makes sense to have them be UniqueRepresentation. From a quick thought, I believe it removes code duplication between the Z and multigraded cases.

This does not quite hold as much water for sage.algebras.clifford_algebra.ExteriorAlgebraDifferential as those are less likely to be compared or used to construct other objects. However, it does make the comparisons easier to work with in case someone does have a use-case for that.

comment:19 in reply to: ↑ 18 Changed 4 years ago by nbruin

Replying to tscrim:

So I can answer for sage.algebras.commutative_dga.Differential why it is useful to have it inherit from UniqueRepresentation

I see you make an argument for why CachedRepresentation is good for them via the cached bit. I also see you make an argument for UniqueRepresentation because you want them easily hashed and compared. Indeed, UniqueRepresentation makes those operations very fast because it's a composition of CachedRepresentation and EqualityById.

However, it seems that the comparison and hashing you want is NOT the one from UniqueRepresentation, because otherwise you wouldn't need the InheritComparison thing. The whole difference between UniqueRepresentation and CachedRepresentation is the mixing in of EqualityById. If you don't want that, then obviously you need to be inheriting from CachedRepresentation instead. So InheritComparisonCachedRepresentation might make sense. InheritComparisonUniqueRepresention never does.

Be careful with thinking that CachedRepresentation is actually more efficient: Creating objects the first time with them is actually MORE expensive, because the construction parameters need to be processed and hashed. You'll only see a benefit if you're actually recreating the same object from construction parameters multiple times. Often that's a silly thing to do; just keep the object! The other penalty to pay, that you're now sharing global objects so that you REALLY have to take immutability seriously, has been mentioned many times already.

comment:20 follow-up: Changed 4 years ago by embray

So should sage.algebras.commutative_dga.Differential be using WithEqualityById? Or InheritComparison? Because I agree it doesn't make sense to use both (in which case I see now why InheritComparisonUniqueRepresentation doesn't make sense, but InheritComparisonCachedRepresentation might.

On another note, I'm not even convinced InheritComparison needs to be a metaclass. ISTM it would work fine as a decorator. Maybe the only argument for it being a metaclass is that then subclasses can inherit its behavior, but I don't know if that's really needed or not...

comment:21 in reply to: ↑ 20 Changed 4 years ago by nbruin

Replying to embray:

On another note, I'm not even convinced InheritComparison needs to be a metaclass. ISTM it would work fine as a decorator. Maybe the only argument for it being a metaclass is that then subclasses can inherit its behavior, but I don't know if that's really needed or not...

I think that is because it actually pokes around in the class data structure, to "hard" inherit the slot value for the comparison functions, the way it would happen normally. The difference is that normally, comparison is not inherited if the hash is changed. The InheritComparisonMetaClass makes this happen even if hash is not inherited. I don't see at which point a decorator would have the opportunity to scribble in the relevant slots.

comment:22 Changed 4 years ago by embray

Ah, I just tried to implement it as a decorator and the problem is actually that Cython will not allow you to use "arbitrary" class decorators on cdef classes. Which is a shame because I don't see any reason that couldn't work. Yes, it is dangerous, but consenting adults and all...

What's worse is, it still works just fine if you call it not as a decorator, but just at module-level after the class definition (note: I defined a Python function called inherit_comparison that implements the functionality as a class decorator):

sage: cython('''
....: from sage.misc.inherit_comparison import inherit_comparison
....:
....: cdef class Base(object):
....:     def __richcmp__(left, right, int op):
....:         print("Base.__richcmp__")
....:         return left is right
....:
....: cdef class Derived(Base):
....:     def __hash__(self): return 1
....:
....: inherit_comparison(Derived)
....: ''')

So I don't think it needs to be a decorator. At the same time, as I wrote previously, I suppose the advantage of using a metaclass is that the behavior will automatically be inherited by subclasses, whereas with a decorator (whether used as above, or with the actual decorator syntax) one has to remember to use it when subclassing.

So in light of that, having the metaclass makes sense, unfortunately.

One bit of good news is that Python 3 (specifically 3.6+) has added a number of new dunder methods for customizing class creation that ease many of the most common use cases for metaclasses without explicitly using a metaclass. For example, I think __init_subclass__ might be able to serve as a replacement for InheritComparisonMetaclass. I don't know if Cython supports __init_subclass__ yet, but if/when it does we might be able to drop the metaclass even on Python 2.

comment:23 Changed 4 years ago by embray

  • Milestone changed from sage-8.4 to sage-8.5

comment:24 Changed 4 years ago by embray

  • Milestone changed from sage-8.5 to sage-8.7

Retargeting some of my tickets (somewhat optimistically for now).

comment:25 Changed 3 years ago by embray

  • Milestone changed from sage-8.7 to sage-8.8

Moving all my in-progress tickets to 8.8 milestone.

comment:26 Changed 3 years ago by embray

  • Milestone changed from sage-8.8 to sage-8.9

Moving tickets from the Sage 8.8 milestone that have been actively worked on in the last six months to the next release milestone (optimistically).

comment:27 Changed 3 years ago by embray

  • Milestone changed from sage-8.9 to sage-9.1

Ticket retargeted after milestone closed

comment:28 Changed 2 years ago by mkoeppe

  • Milestone changed from sage-9.1 to sage-9.2

Batch modifying tickets that will likely not be ready for 9.1, based on a review of the ticket title, branch/review status, and last modification date.

comment:29 Changed 2 years ago by mkoeppe

  • Milestone changed from sage-9.2 to sage-9.3

comment:30 Changed 19 months ago by chapoton

  • Status changed from needs_review to needs_work

red branch => needs work

comment:31 Changed 17 months ago by mkoeppe

  • Milestone changed from sage-9.3 to sage-9.4

Setting new milestone based on a cursory review of ticket status, priority, and last modification date.

comment:32 Changed 13 months ago by mkoeppe

  • Milestone changed from sage-9.4 to sage-9.5

Setting a new milestone for this ticket based on a cursory review.

comment:33 Changed 8 months ago by mkoeppe

  • Milestone changed from sage-9.5 to sage-9.6

comment:34 Changed 4 months ago by mkoeppe

  • Milestone changed from sage-9.6 to sage-9.7
Note: See TracTickets for help on using tickets.