Opened 3 years ago

Last modified 3 months ago

#25393 needs_info defect

Removed unused and potentially misleading Morphism.__hash__

Reported by: embray Owned by: egourgoulhon
Priority: major Milestone: sage-9.5
Component: geometry Keywords: python3
Cc: chapoton, ​egourgoulhon Merged in:
Authors: Erik Bray Reviewers:
Report Upstream: N/A Work issues:
Branch: u/embray/misc/ticket-25393 (Commits, GitHub, GitLab) Commit: 1a990578b1b8f0826fa0408aaac424b205f5dffb
Dependencies: Stopgaps:

Status badges

Description (last modified by embray)

As far as I can tell there is no code using the ability to hash Morphisms, and having this in the base class is misleading since there are subclasses which are not immutable and should not be hashable anyways. It would be better to implement __hash__ on an as-needed basis for those classes that can guarantee immutability.

Original description

ContinuousMap is impacted by #24551 in that it defines an __eq__, so its default hash is not inherited from its base class, Morphism (which does define __hash__).

This is one of the classes impacted by the issue raised in this comment.

I'm not sure what the best solution is. Right now I feel like it's somewhat ill-defined, because it's possible to have two ContinuousMaps that compare as equal, but have different hashes. That's not necessarily wrong, but it isn't obviously right either. If x == y, then hash(x) == hash(y) for a correct implementation. Technically I believe you can get away with the inverse--having hash(x) == hash(y) but x != y--but it is advisable for performance to avoid this.

Change History (37)

comment:1 Changed 3 years ago by embray

  • Description modified (diff)

comment:2 Changed 3 years ago by chapoton

  • Cc ​egourgoulhon added
  • Status changed from new to needs_info

comment:3 Changed 3 years ago by tscrim

If two instances of the same class have the same hash, they must also be equal to each other with ==

That is true iff a perfect hash function exists, which it generally does not. In fact, that is not mandated by Python. For example, a valid hash function is 0 for all objects, but you loose performance benefits from dict because it becomes a list search.

The converse is really what you want: == implies same hash. (Also to be hashable, the hash should be immutable.) You can get away with it, but it causes problems when you try to put it into a set/dictionary as you get multiple instances of equal objects.

The issue with this class is that we have essentially a mutable element class. So I think the proper thing to do is just disable hashing. We do something similar for vectors and matrices.

comment:4 follow-up: Changed 3 years ago by embray

I think you must have misread me or something. I didn't mean that if two instances have the same hash it implies they are ==. I mean that in order for hashing to be implemented correctly, then as you said if they are == they must have the same hash.

It actually wasn't immediately obvious to me that this class is not immutable. Where can it be mutated? All I noticed is that you define two ContinuousMaps that are the same except that one is constructed with is_isomorphism=True and the other with is_isomorphism=False. In this case they are == but have different hashes.

comment:5 in reply to: ↑ 4 Changed 3 years ago by tscrim

  • Description modified (diff)

Replying to embray:

I think you must have misread me or something. I didn't mean that if two instances have the same hash it implies they are ==. I mean that in order for hashing to be implemented correctly, then as you said if they are == they must have the same hash.

Paraphrasing what you wrote: if x.__class__ == y.__class__ and hash(x) == hash(y), then x == y. It is a bad smell for x == y with hash(x) == hash(y). We're on the same page with what needs to be done, but I just saying the ticket description needed an update, which I have just done.

IMO, it is not a code smell to have hash(x) == hash(y) but x != y. However, I do agree that it is better to have a more perfect hash function, but there are only so many hash values since hashes are limited to 64 bit integers (unless I am misunderstanding something in Python).

It actually wasn't immediately obvious to me that this class is not immutable. Where can it be mutated? All I noticed is that you define two ContinuousMaps that are the same except that one is constructed with is_isomorphism=True and the other with is_isomorphism=False. In this case they are == but have different hashes.

See add_expr and set_expr.

comment:6 follow-up: Changed 3 years ago by embray

Well in that case this is already fixed for free on Python 3 since it sets __hash__ = None automatically for any class that defines __eq__ but not __hash__, and this should just be done explicitly for this class.

In general should Morphism even have a default hash? It doesn't seem that immediately useful to me.

comment:7 in reply to: ↑ 6 Changed 3 years ago by tscrim

Replying to embray:

Well in that case this is already fixed for free on Python 3 since it sets __hash__ = None automatically for any class that defines __eq__ but not __hash__, and this should just be done explicitly for this class.

I agree. That is probably the correct thing to for this class.

In general should Morphism even have a default hash? It doesn't seem that immediately useful to me.

I cannot think of a general mechanism that would require morphisms to be keys to a cache. Ideally matrices should be a subclass of Morphism, which would break the implicit Morphism is meant to be immutable. Although the fact that a Morphism can be stored via a coercion might lead to subtle bugs when it is mutable.

comment:8 Changed 3 years ago by embray

Okay then, I think I'll just try removing Morphism.__hash__. Neither the commit that added it, nor the associated ticket (#13214) provide any explanation for why it was added. It just was.

I don't think we should be adding __hash__ implementations anywhere that it isn't explicitly needed (or at least a specific use case is being imagined). Especially not for base classes that could have mutable subclasses (one could make exceptions of course but in that case it would also be good to have an explicit mechanism for marking whether or not instances of some class are expected to be immutable).

comment:9 Changed 3 years ago by embray

Oh fun. Deleting Morphism.__hash__ causes segfaults :) No, I was just misreading the test output. It wasn't segfaults, just timeouts (the stack traces threw me off). Still I wonder if that was a regression...

Last edited 3 years ago by embray (previous) (diff)

comment:10 Changed 3 years ago by embray

  • Authors set to Erik Bray
  • Branch set to u/embray/misc/ticket-25393
  • Commit set to 1a990578b1b8f0826fa0408aaac424b205f5dffb
  • Description modified (diff)
  • Status changed from needs_info to needs_review
  • Summary changed from Fix hash for ContinuousMap to Removed unused and potentially misleading Morphism.__hash__

New commits:

1a99057remove unused and potentially misleading __hash__ implementation for Morphisms

comment:11 Changed 3 years ago by embray

  • Status changed from needs_review to needs_work

Looks like this has some test failures. It's possible that in some of these cases the real problem is they shouldn't use UniqueRepresentation in the first place, but I'm not sure.

comment:12 Changed 3 years ago by chapoton

bot says "no failing doctest"..

comment:13 Changed 3 years ago by embray

I'll have to try again. I got some failures when I ran it, but it might have been a matter of running the tests out of their "normal" order, and hence weird things happening with the coercion framework or something...

comment:14 Changed 3 years ago by egourgoulhon

Sorry for showing up late in the discussion, but for some reason, I did not received the Trac notification that I've been put in the Cc list of this ticket; I actually discovered the existence of the ticket only today, while reading the "python3 status" post on sage-devel...

Let me summarize a few things about ContinuousMap:

  • ContinousMap objects are "in spirit" immutable: they represent continuous maps between topological manifolds and once such a map is defined, there is no meaning in modifying it. However, in practice, such maps cannot be fully defined at their creation by the user, because some new charts may be introduced on the fly on the manifold and a given map should be able to have coordinate expressions in terms of these charts, hence the methods add_expr and set_expr.
  • ContinousMap (more precisely the derived class DiffMap) needs to be hashable because DiffMap objects serve as keys in the dictionary of vector field modules on a manifold (see here for details).
  • I understand that the __hash__ method inherited from Morphism relies on the repr of the continuous map and in some cases (maps f and g having the same domain, codomain, but different names given by the user), one may have f == g with hash(f) != hash(g), which is very bad!

I propose to implement a __hash__ method in ContinuousMap that ensures that maps that compare equal have the same hash. It would make sense that I do this atop commit 1a9907 (i.e. the commit in the branch currently attached to this ticket), since the latter is suppressing Morphism.__hash__. Do you agree?

Last edited 3 years ago by egourgoulhon (previous) (diff)

comment:15 follow-up: Changed 3 years ago by embray

  • Owner changed from (none) to egourgoulhon

Thanks Eric for the explanation. I'm a bit iffy on the first point--it's a bit unfortunate, but also an implementation detail--I don't feel qualified to fuss about it right now.

Based on your explanation though, maybe I should back off a bit on my argument that Morphism.__hash__ should be removed (unless someone can point out other subclasses of it that are definitely not in any sense immutable). Instead, I might just change the implementation a bit to do away with its use of repr(). If the domain doesn't have generators we can just ignore that and use None or something. IMO it's the repr() that's the real problem because that's just flakey and surprising. So maybe I'll just change that for now and see what that affects.

Meanwhile, yes, if you think you know the best way to implement a ContinuousMap.__hash__ please do. Please also make sure there is a test demonstrating not only that the hash works, but why it should work, since my experiment in removing it didn't appear to break any test after all...

comment:16 in reply to: ↑ 15 ; follow-ups: Changed 3 years ago by egourgoulhon

Replying to embray:

Thanks Eric for the explanation. I'm a bit iffy on the first point--it's a bit unfortunate, but also an implementation detail--I don't feel qualified to fuss about it right now.

Actually the type of immutatibility involved for ContinuousMap is that defined by Simon King in this sage-devel post: methods, like add_expr, may change the internal representation of the object, but in any case they must preserve the equivalence class with respect to == to which the object belongs at its creation.

Based on your explanation though, maybe I should back off a bit on my argument that Morphism.__hash__ should be removed (unless someone can point out other subclasses of it that are definitely not in any sense immutable). Instead, I might just change the implementation a bit to do away with its use of repr(). If the domain doesn't have generators we can just ignore that and use None or something. IMO it's the repr() that's the real problem because that's just flakey and surprising. So maybe I'll just change that for now and see what that affects.

Indeed, for ContinousMap, which don't have any "generator", the issue is the default back to the hash of repr().

Meanwhile, yes, if you think you know the best way to implement a ContinuousMap.__hash__ please do.

OK I will do it.

Please also make sure there is a test demonstrating not only that the hash works, but why it should work, since my experiment in removing it didn't appear to break any test after all...

I am quite surprised about this: how could ContinuousMap objects be used as dictionary keys without any __hash__ method?

Version 1, edited 3 years ago by egourgoulhon (previous) (next) (diff)

comment:17 in reply to: ↑ 16 Changed 3 years ago by embray

Replying to egourgoulhon:

Replying to embray:

Please also make sure there is a test demonstrating not only that the hash works, but why it should work, since my experiment in removing it didn't appear to break any test after all...

I am quite surprised about this: how could ContinuousMap objects be used as dictionary keys without any __hash__ method?

I don't know. Are there definitely examples of this in the tests? I did get some test failures when I tried this the first time, but the patchbot shows tests passing. May it's just in some --long tests?

comment:18 in reply to: ↑ 16 Changed 3 years ago by egourgoulhon

Replying to egourgoulhon:

Please also make sure there is a test demonstrating not only that the hash works, but why it should work, since my experiment in removing it didn't appear to break any test after all...

I am quite surprised about this: how could ContinuousMap objects be used as dictionary keys without any __hash__ method?

I did the experiment, i.e. remove Morphism.__hash__, and discovered that the hash of ContinousMap is then inherited from Map.__hash__, which does (cf. line 1292 o f src/sage/categories/map.pyx):

return hash((self.domain(), self._codomain))

This simple hash seems very satisfactory to me, except maybe for self._codomain that may be replaced by self.codomain(). Actually this is what I had in mind when proposing to implement ContinousMap.__hash__. So if you remove Morphism.__hash__, there is no need to write a proper ContinousMap.__hash__: that of Map is sufficient.

comment:19 follow-ups: Changed 3 years ago by tscrim

I would probably also consider removing Map.__hash__ (and reimplementing that in ContinuousMap) for the same reasons as removing Morphism.__hash__.

comment:20 in reply to: ↑ 19 Changed 3 years ago by egourgoulhon

Replying to tscrim:

I would probably also consider removing Map.__hash__ (and reimplementing that in ContinuousMap) for the same reasons as removing Morphism.__hash__.

Yes. Anyway, if I understand correctly, for Python3, one needs to implement ContinuousMap.__hash__ because there is a ContinuousMap.__eq__, so that __hash__ cannot be inherited. Correct?

Btw, other manifolds classes that have to be hashable because they serve as dictionary keys are TopologicalManifold, Chart and VectorFrame. However, they all inherit from UniqueRepresentation, from which they get __hash__ and __eq__. So it should be OK for Python3 with them, shouldn't it?

comment:21 in reply to: ↑ 19 Changed 3 years ago by egourgoulhon

Replying to tscrim:

I would probably also consider removing Map.__hash__ (and reimplementing that in ContinuousMap) for the same reasons as removing Morphism.__hash__.

Removing Map.__hash__, FormalCompositeMap.__hash__ (to be coherent with the removal of Map.__hash__) and Morphism.__hash__, while implementing ContinuousMap.__hash__, leads to (only) two doctest errors in all Sage 8.3.beta3, as reported by make ptestlong:

sage -t --long --warn-long 53.5 src/sage/combinat/root_system/hecke_algebra_representation.py
**********************************************************************
File "src/sage/combinat/root_system/hecke_algebra_representation.py", line 683, in sage.combinat.root_system.hecke_algebra_representation.HeckeAlgebraRepresentation._test_Y
Failed example:
    rho._test_Y()   # long time (4s)
...
TypeError: <class 'sage.modules.with_basis.morphism.ModuleMorphismByLinearity_with_category'> is not hashable

and

sage -t --long --warn-long 53.5 src/sage/groups/semimonomial_transformations/semimonomial_transformation.pyx
**********************************************************************
File "src/sage/groups/semimonomial_transformations/semimonomial_transformation.pyx", line 170, in sage.groups.semimonomial_transformations.semimonomial_transformation.SemimonomialTransformation.__hash__
Failed example:
    hash( SemimonomialTransformationGroup(F, 4).an_element() )  #random #indirect doctest
...
TypeError: <type 'sage.rings.finite_rings.hom_finite_field.FiniteFieldHomomorphism_generic'> is not hashable
Last edited 3 years ago by egourgoulhon (previous) (diff)

comment:22 Changed 3 years ago by embray

Those tests would just have to be removed I guess. Though I'm still not 100% sure now that we even want to outright remove Map.__hash__. But maybe you actually agree with my initial thought to remove it.

comment:23 Changed 3 years ago by nbruin

Slightly more elegant than removing tests might be by installing a reasonable hash. For: 'sage.rings.finite_rings.hom_finite_field.FiniteFieldHomomorphism_generic' I would think that hash((f.domain(),f.codomain())+tuple(f.im_gens())) might be a reasonable option.

and for the second one hash((phi.domain(),phi.codomain(),tuple(phi(x) for x in X.basis())))

is probably OK. These are both homomorphisms on finitely generated structures, so equality and hashing are relatively straightforward (if they are implemented for elements of the codomain).

comment:24 Changed 3 years ago by egourgoulhon

Finally, I've implemented ContinousMap.__hash__ in a separate ticket: #25502, in order to have a quick fix for the Python3 issue.

comment:25 Changed 3 years ago by embray

  • Milestone changed from sage-8.3 to sage-8.4

I believe this issue can reasonably be addressed for Sage 8.4.

comment:26 follow-up: Changed 3 years ago by embray

  • Status changed from needs_work to needs_info

Looking back at #25502, and looking the comments in this ticket again, was the hash added in #25502 really necessary? If we still go with removing the existing Morphism.__hash__ (should we?) then ContinuousMap inherits Map.__hash__ which is identical to the ContinuousMap.__hash__ added in #25502. Should Map.__hash__ also be removed?

comment:27 in reply to: ↑ 26 ; follow-up: Changed 3 years ago by egourgoulhon

Replying to embray:

Looking back at #25502, and looking the comments in this ticket again, was the hash added in #25502 really necessary? If we still go with removing the existing Morphism.__hash__ (should we?) then ContinuousMap inherits Map.__hash__ which is identical to the ContinuousMap.__hash__ added in #25502. Should Map.__hash__ also be removed?

Indeed ContinuousMap.__hash__ and Map.__hash__ are essentially identical, but I guess the logic of removing Morphism.__hash__ applies to Map.__hash__ as well. From the ticket description: As far as I can tell there is no code using the ability to hash Morphisms, and having this in the base class is misleading since there are subclasses which are not immutable and should not be hashable anyways. It would be better to implement hash on an as-needed basis for those classes that can guarantee immutability. If only Morphism.__hash__ is removed and Map.__hash__ is kept, then all subclasses will still have some __hash__ inherited from their base class (Map) and therefore will look hashable to Python3, even if they are not immutable, which is truly dangerous...

PS: I realize that the removing of Map.__hash__ was already advocated by Travis for the very same reasons in comment:19. See also comment:21 for the consequences of the removal of Map.__hash__ (only 2 failed doctests).

Last edited 3 years ago by egourgoulhon (previous) (diff)

comment:28 in reply to: ↑ 27 Changed 3 years ago by egourgoulhon

Replying to egourgoulhon:

If only Morphism.__hash__ is removed and Map.__hash__ is kept, then all subclasses will still have some __hash__ inherited from their base class (Map) and therefore will look hashable to Python3, even if they are not immutable, which is truly dangerous...

Actually I was wrong: there is no danger since in Python3, a class that defines some __eq__ does no longer inherit __hash__ from any base class. From this part of Python 3 reference: If a class that overrides __eq__() needs to retain the implementation of __hash__() from a parent class, the interpreter must be told this explicitly by setting __hash__ = <ParentClass>.__hash__. In our case, this means that if one does not write explicitly

   __hash__ = Map.__hash__

(and of course does not reimplement __hash__), then the subclass is not hashable.

Probably things will be more clear if Map.__hash__ is removed.

comment:29 Changed 3 years ago by embray

  • Milestone changed from sage-8.4 to sage-8.5

comment:30 Changed 3 years ago by embray

  • Milestone changed from sage-8.5 to sage-8.7

Retargeting some of my tickets (somewhat optimistically for now).

comment:31 Changed 3 years ago by embray

  • Milestone changed from sage-8.7 to sage-8.8

Moving all my in-progress tickets to 8.8 milestone.

comment:32 Changed 2 years ago by embray

  • Milestone changed from sage-8.8 to sage-8.9

Tickets still needing working or clarification should be moved to the next release milestone at the soonest (please feel free to revert if you think the ticket is close to being resolved).

comment:33 Changed 22 months ago by embray

  • Milestone changed from sage-8.9 to sage-9.1

Ticket retargeted after milestone closed

comment:34 Changed 18 months ago by mkoeppe

  • Milestone changed from sage-9.1 to sage-9.2

Moving tickets to milestone sage-9.2 based on a review of last modification date, branch status, and severity.

comment:35 Changed 12 months ago by mkoeppe

  • Milestone changed from sage-9.2 to sage-9.3

comment:36 Changed 5 months ago by mkoeppe

  • Milestone changed from sage-9.3 to sage-9.4

Moving to 9.4, as 9.3 has been released.

comment:37 Changed 3 months ago by mkoeppe

  • Milestone changed from sage-9.4 to sage-9.5

Setting a new milestone for this ticket based on a cursory review.

Note: See TracTickets for help on using tickets.