Opened 4 years ago
Last modified 5 weeks ago
#25393 needs_info defect
Removed unused and potentially misleading Morphism.__hash__
Reported by:  embray  Owned by:  egourgoulhon 

Priority:  major  Milestone:  sage9.6 
Component:  geometry  Keywords:  python3 
Cc:  chapoton, egourgoulhon  Merged in:  
Authors:  Erik Bray  Reviewers:  
Report Upstream:  N/A  Work issues:  
Branch:  u/embray/misc/ticket25393 (Commits, GitHub, GitLab)  Commit:  1a990578b1b8f0826fa0408aaac424b205f5dffb 
Dependencies:  Stopgaps: 
Description (last modified by )
As far as I can tell there is no code using the ability to hash Morphisms
, and having this in the base class is misleading since there are subclasses which are not immutable and should not be hashable anyways. It would be better to implement __hash__
on an asneeded basis for those classes that can guarantee immutability.
Original description
ContinuousMap
is impacted by #24551 in that it defines an __eq__
, so its default hash is not inherited from its base class, Morphism
(which does define __hash__
).
This is one of the classes impacted by the issue raised in this comment.
I'm not sure what the best solution is. Right now I feel like it's somewhat illdefined, because it's possible to have two ContinuousMaps
that compare as equal, but have different hashes. That's not necessarily wrong, but it isn't obviously right either. If x == y
, then hash(x) == hash(y)
for a correct implementation. Technically I believe you can get away with the inversehaving hash(x) == hash(y)
but x != y
but it is advisable for performance to avoid this.
Change History (38)
comment:1 Changed 4 years ago by
 Description modified (diff)
comment:2 Changed 4 years ago by
 Cc egourgoulhon added
 Status changed from new to needs_info
comment:3 Changed 4 years ago by
comment:4 followup: ↓ 5 Changed 4 years ago by
I think you must have misread me or something. I didn't mean that if two instances have the same hash it implies they are ==
. I mean that in order for hashing to be implemented correctly, then as you said if they are ==
they must have the same hash.
It actually wasn't immediately obvious to me that this class is not immutable. Where can it be mutated? All I noticed is that you define two ContinuousMaps
that are the same except that one is constructed with is_isomorphism=True
and the other with is_isomorphism=False
. In this case they are ==
but have different hashes.
comment:5 in reply to: ↑ 4 Changed 4 years ago by
 Description modified (diff)
Replying to embray:
I think you must have misread me or something. I didn't mean that if two instances have the same hash it implies they are
==
. I mean that in order for hashing to be implemented correctly, then as you said if they are==
they must have the same hash.
Paraphrasing what you wrote: if x.__class__ == y.__class__
and hash(x) == hash(y)
, then x == y
. It is a bad smell for x == y
with hash(x) == hash(y)
. We're on the same page with what needs to be done, but I just saying the ticket description needed an update, which I have just done.
IMO, it is not a code smell to have hash(x) == hash(y)
but x != y
. However, I do agree that it is better to have a more perfect hash function, but there are only so many hash values since hashes are limited to 64 bit integers (unless I am misunderstanding something in Python).
It actually wasn't immediately obvious to me that this class is not immutable. Where can it be mutated? All I noticed is that you define two
ContinuousMaps
that are the same except that one is constructed withis_isomorphism=True
and the other withis_isomorphism=False
. In this case they are==
but have different hashes.
See add_expr
and set_expr
.
comment:6 followup: ↓ 7 Changed 4 years ago by
Well in that case this is already fixed for free on Python 3 since it sets __hash__ = None
automatically for any class that defines __eq__
but not __hash__
, and this should just be done explicitly for this class.
In general should Morphism
even have a default hash? It doesn't seem that immediately useful to me.
comment:7 in reply to: ↑ 6 Changed 4 years ago by
Replying to embray:
Well in that case this is already fixed for free on Python 3 since it sets
__hash__ = None
automatically for any class that defines__eq__
but not__hash__
, and this should just be done explicitly for this class.
I agree. That is probably the correct thing to for this class.
In general should
Morphism
even have a default hash? It doesn't seem that immediately useful to me.
I cannot think of a general mechanism that would require morphisms to be keys to a cache. Ideally matrices should be a subclass of Morphism
, which would break the implicit Morphism
is meant to be immutable. Although the fact that a Morphism
can be stored via a coercion might lead to subtle bugs when it is mutable.
comment:8 Changed 4 years ago by
Okay then, I think I'll just try removing Morphism.__hash__
. Neither the commit that added it, nor the associated ticket (#13214) provide any explanation for why it was added. It just was.
I don't think we should be adding __hash__
implementations anywhere that it isn't explicitly needed (or at least a specific use case is being imagined). Especially not for base classes that could have mutable subclasses (one could make exceptions of course but in that case it would also be good to have an explicit mechanism for marking whether or not instances of some class are expected to be immutable).
comment:9 Changed 4 years ago by
Oh fun. Deleting No, I was just misreading the test output. It wasn't segfaults, just timeouts (the stack traces threw me off). Still I wonder if that was a regression...
Morphism.__hash__
causes segfaults :)
comment:10 Changed 4 years ago by
 Branch set to u/embray/misc/ticket25393
 Commit set to 1a990578b1b8f0826fa0408aaac424b205f5dffb
 Description modified (diff)
 Status changed from needs_info to needs_review
 Summary changed from Fix hash for ContinuousMap to Removed unused and potentially misleading Morphism.__hash__
New commits:
1a99057  remove unused and potentially misleading __hash__ implementation for Morphisms

comment:11 Changed 4 years ago by
 Status changed from needs_review to needs_work
Looks like this has some test failures. It's possible that in some of these cases the real problem is they shouldn't use UniqueRepresentation
in the first place, but I'm not sure.
comment:12 Changed 4 years ago by
bot says "no failing doctest"..
comment:13 Changed 4 years ago by
I'll have to try again. I got some failures when I ran it, but it might have been a matter of running the tests out of their "normal" order, and hence weird things happening with the coercion framework or something...
comment:14 Changed 4 years ago by
Sorry for showing up late in the discussion, but for some reason, I did not received the Trac notification that I've been put in the Cc list of this ticket; I actually discovered the existence of the ticket only today, while reading the "python3 status" post on sagedevel...
Let me summarize a few things about ContinuousMap
:
ContinousMap
objects are "in spirit" immutable: they represent continuous maps between topological manifolds and once such a map is defined, there is no meaning in modifying it. However, in practice, such maps cannot be fully defined at their creation by the user, because some new charts may be introduced on the fly on the manifold and a given map should be able to have coordinate expressions in terms of these charts, hence the methodsadd_expr
andset_expr
.
ContinousMap
(more precisely the derived classDiffMap
) needs to be hashable becauseDiffMap
objects serve as keys in the dictionary of vector field modules on a manifold (see here for details).
 I understand that the
__hash__
method inherited fromMorphism
relies on therepr
of the continuous map and in some cases (mapsf
andg
having the same domain, codomain, but different names given by the user), one may havef == g
withhash(f) != hash(g)
, which is very bad!
I propose to implement a __hash__
method in ContinuousMap
that ensures that maps that compare equal have the same hash. It would make sense that I do this atop commit 1a9907 (i.e. the commit in the branch currently attached to this ticket), since the latter is suppressing Morphism.__hash__
. Do you agree?
comment:15 followup: ↓ 16 Changed 4 years ago by
 Owner changed from (none) to egourgoulhon
Thanks Eric for the explanation. I'm a bit iffy on the first pointit's a bit unfortunate, but also an implementation detailI don't feel qualified to fuss about it right now.
Based on your explanation though, maybe I should back off a bit on my argument that Morphism.__hash__
should be removed (unless someone can point out other subclasses of it that are definitely not in any sense immutable). Instead, I might just change the implementation a bit to do away with its use of repr()
. If the domain doesn't have generators we can just ignore that and use None
or something. IMO it's the repr()
that's the real problem because that's just flakey and surprising. So maybe I'll just change that for now and see what that affects.
Meanwhile, yes, if you think you know the best way to implement a ContinuousMap.__hash__
please do. Please also make sure there is a test demonstrating not only that the hash works, but why it should work, since my experiment in removing it didn't appear to break any test after all...
comment:16 in reply to: ↑ 15 ; followups: ↓ 17 ↓ 18 Changed 4 years ago by
Replying to embray:
Thanks Eric for the explanation. I'm a bit iffy on the first pointit's a bit unfortunate, but also an implementation detailI don't feel qualified to fuss about it right now.
Actually the type of immutatibility involved for ContinuousMap
is that defined by Simon King in this sagedevel post: methods, like add_expr
, may change the internal representation of the object, but in any case they must preserve the equivalence class with respect to ==
to which the object belongs at its creation.
Based on your explanation though, maybe I should back off a bit on my argument that
Morphism.__hash__
should be removed (unless someone can point out other subclasses of it that are definitely not in any sense immutable). Instead, I might just change the implementation a bit to do away with its use ofrepr()
. If the domain doesn't have generators we can just ignore that and useNone
or something. IMO it's therepr()
that's the real problem because that's just flakey and surprising. So maybe I'll just change that for now and see what that affects.
Indeed, for ContinousMap
, which don't have any "generator", the issue is the default back to the hash of repr()
.
Meanwhile, yes, if you think you know the best way to implement a
ContinuousMap.__hash__
please do.
OK I will do it.
Please also make sure there is a test demonstrating not only that the hash works, but why it should work, since my experiment in removing it didn't appear to break any test after all...
I am quite surprised about this: how could ContinuousMap
objects be used as dictionary keys without any __hash__
method?
comment:17 in reply to: ↑ 16 Changed 4 years ago by
Replying to egourgoulhon:
Replying to embray:
Please also make sure there is a test demonstrating not only that the hash works, but why it should work, since my experiment in removing it didn't appear to break any test after all...
I am quite surprised about this: how could
ContinuousMap
objects be used as dictionary keys without any__hash__
method?
I don't know. Are there definitely examples of this in the tests? I did get some test failures when I tried this the first time, but the patchbot shows tests passing. May it's just in some long
tests?
comment:18 in reply to: ↑ 16 Changed 4 years ago by
Replying to egourgoulhon:
Please also make sure there is a test demonstrating not only that the hash works, but why it should work, since my experiment in removing it didn't appear to break any test after all...
I am quite surprised about this: how could
ContinuousMap
objects be used as dictionary keys without any__hash__
method?
I did the experiment, i.e. remove Morphism.__hash__
, and discovered that the hash of ContinousMap
is then inherited from Map.__hash__
, which does (cf. line 1292 o f src/sage/categories/map.pyx
):
return hash((self.domain(), self._codomain))
This simple hash seems very satisfactory to me, except maybe for self._codomain
that may be replaced by self.codomain()
. Actually this is what I had in mind when proposing to implement ContinousMap.__hash__
. So if you remove Morphism.__hash__
, there is no need to write a proper ContinousMap.__hash__
: that of Map
is sufficient.
comment:19 followups: ↓ 20 ↓ 21 Changed 4 years ago by
I would probably also consider removing Map.__hash__
(and reimplementing that in ContinuousMap
) for the same reasons as removing Morphism.__hash__
.
comment:20 in reply to: ↑ 19 Changed 4 years ago by
Replying to tscrim:
I would probably also consider removing
Map.__hash__
(and reimplementing that inContinuousMap
) for the same reasons as removingMorphism.__hash__
.
Yes. Anyway, if I understand correctly, for Python3, one needs to implement ContinuousMap.__hash__
because there is a ContinuousMap.__eq__
, so that __hash__
cannot be inherited. Correct?
Btw, other manifolds classes that have to be hashable because they serve as dictionary keys are TopologicalManifold
, Chart
and VectorFrame
. However, they all inherit from UniqueRepresentation
, from which they get __hash__
and __eq__
. So it should be OK for Python3 with them, shouldn't it?
comment:21 in reply to: ↑ 19 Changed 4 years ago by
Replying to tscrim:
I would probably also consider removing
Map.__hash__
(and reimplementing that inContinuousMap
) for the same reasons as removingMorphism.__hash__
.
Removing Map.__hash__
, FormalCompositeMap.__hash__
(to be coherent with the removal of Map.__hash__
) and Morphism.__hash__
, while implementing ContinuousMap.__hash__
, leads to (only) two doctest errors in all Sage 8.3.beta3, as reported by make ptestlong
:
sage t long warnlong 53.5 src/sage/combinat/root_system/hecke_algebra_representation.py ********************************************************************** File "src/sage/combinat/root_system/hecke_algebra_representation.py", line 683, in sage.combinat.root_system.hecke_algebra_representation.HeckeAlgebraRepresentation._test_Y Failed example: rho._test_Y() # long time (4s) ... TypeError: <class 'sage.modules.with_basis.morphism.ModuleMorphismByLinearity_with_category'> is not hashable
and
sage t long warnlong 53.5 src/sage/groups/semimonomial_transformations/semimonomial_transformation.pyx ********************************************************************** File "src/sage/groups/semimonomial_transformations/semimonomial_transformation.pyx", line 170, in sage.groups.semimonomial_transformations.semimonomial_transformation.SemimonomialTransformation.__hash__ Failed example: hash( SemimonomialTransformationGroup(F, 4).an_element() ) #random #indirect doctest ... TypeError: <type 'sage.rings.finite_rings.hom_finite_field.FiniteFieldHomomorphism_generic'> is not hashable
comment:22 Changed 4 years ago by
Those tests would just have to be removed I guess. Though I'm still not 100% sure now that we even want to outright remove Map.__hash__
. But maybe you actually agree with my initial thought to remove it.
comment:23 Changed 4 years ago by
Slightly more elegant than removing tests might be by installing a reasonable hash. For:
'sage.rings.finite_rings.hom_finite_field.FiniteFieldHomomorphism_generic'
I would think that
hash((f.domain(),f.codomain())+tuple(f.im_gens()))
might be a reasonable option.
and for the second one
hash((phi.domain(),phi.codomain(),tuple(phi(x) for x in X.basis())))
is probably OK. These are both homomorphisms on finitely generated structures, so equality and hashing are relatively straightforward (if they are implemented for elements of the codomain).
comment:24 Changed 4 years ago by
Finally, I've implemented ContinousMap.__hash__
in a separate ticket: #25502, in order to have a quick fix for the Python3 issue.
comment:25 Changed 4 years ago by
 Milestone changed from sage8.3 to sage8.4
I believe this issue can reasonably be addressed for Sage 8.4.
comment:26 followup: ↓ 27 Changed 3 years ago by
 Status changed from needs_work to needs_info
Looking back at #25502, and looking the comments in this ticket again, was the hash added in #25502 really necessary? If we still go with removing the existing Morphism.__hash__
(should we?) then ContinuousMap
inherits Map.__hash__
which is identical to the ContinuousMap.__hash__
added in #25502. Should Map.__hash__
also be removed?
comment:27 in reply to: ↑ 26 ; followup: ↓ 28 Changed 3 years ago by
Replying to embray:
Looking back at #25502, and looking the comments in this ticket again, was the hash added in #25502 really necessary? If we still go with removing the existing
Morphism.__hash__
(should we?) thenContinuousMap
inheritsMap.__hash__
which is identical to theContinuousMap.__hash__
added in #25502. ShouldMap.__hash__
also be removed?
Indeed ContinuousMap.__hash__
and Map.__hash__
are essentially identical, but I guess the logic of removing Morphism.__hash__
applies to Map.__hash__
as well. From the ticket description: As far as I can tell there is no code using the ability to hash Morphisms, and having this in the base class is misleading since there are subclasses which are not immutable and should not be hashable anyways. It would be better to implement hash on an asneeded basis for those classes that can guarantee immutability. If only Morphism.__hash__
is removed and Map.__hash__
is kept, then all subclasses will still have some __hash__
inherited from their base class (Map
) and therefore will look hashable to Python3, even if they are not immutable, which is truly dangerous...
PS: I realize that the removing of Map.__hash__
was already advocated by Travis for the very same reasons in comment:19. See also comment:21 for the consequences of the removal of Map.__hash__
(only 2 failed doctests).
comment:28 in reply to: ↑ 27 Changed 3 years ago by
Replying to egourgoulhon:
If only
Morphism.__hash__
is removed andMap.__hash__
is kept, then all subclasses will still have some__hash__
inherited from their base class (Map
) and therefore will look hashable to Python3, even if they are not immutable, which is truly dangerous...
Actually I was wrong: there is no danger since in Python3, a class that defines some __eq__
does no longer inherit __hash__
from any base class. From this part of Python 3 reference: If a class that overrides __eq__()
needs to retain the implementation of __hash__()
from a parent class, the interpreter must be told this explicitly by setting __hash__ = <ParentClass>.__hash__
. In our case, this means that if one does not write explicitly
__hash__ = Map.__hash__
(and of course does not reimplement __hash__
), then the subclass is not hashable.
Probably things will be more clear if Map.__hash__
is removed.
comment:29 Changed 3 years ago by
 Milestone changed from sage8.4 to sage8.5
comment:30 Changed 3 years ago by
 Milestone changed from sage8.5 to sage8.7
Retargeting some of my tickets (somewhat optimistically for now).
comment:31 Changed 3 years ago by
 Milestone changed from sage8.7 to sage8.8
Moving all my inprogress tickets to 8.8 milestone.
comment:32 Changed 3 years ago by
 Milestone changed from sage8.8 to sage8.9
Tickets still needing working or clarification should be moved to the next release milestone at the soonest (please feel free to revert if you think the ticket is close to being resolved).
comment:33 Changed 2 years ago by
 Milestone changed from sage8.9 to sage9.1
Ticket retargeted after milestone closed
comment:34 Changed 21 months ago by
 Milestone changed from sage9.1 to sage9.2
Moving tickets to milestone sage9.2 based on a review of last modification date, branch status, and severity.
comment:35 Changed 15 months ago by
 Milestone changed from sage9.2 to sage9.3
comment:36 Changed 9 months ago by
 Milestone changed from sage9.3 to sage9.4
Moving to 9.4, as 9.3 has been released.
comment:37 Changed 6 months ago by
 Milestone changed from sage9.4 to sage9.5
Setting a new milestone for this ticket based on a cursory review.
comment:38 Changed 5 weeks ago by
 Milestone changed from sage9.5 to sage9.6
Stalled in needs_review
or needs_info
; likely won't make it into Sage 9.5.
That is true iff a perfect hash function exists, which it generally does not. In fact, that is not mandated by Python. For example, a valid hash function is 0 for all objects, but you loose performance benefits from
dict
because it becomes a list search.The converse is really what you want:
==
implies same hash. (Also to be hashable, the hash should be immutable.) You can get away with it, but it causes problems when you try to put it into a set/dictionary as you get multiple instances of equal objects.The issue with this class is that we have essentially a mutable element class. So I think the proper thing to do is just disable hashing. We do something similar for vectors and matrices.