Opened 5 years ago
Last modified 5 years ago
#19016 closed defect
A more naive sage.structure.element.__hash__ — at Version 31
Reported by: | ncohen | Owned by: | |
---|---|---|---|
Priority: | critical | Milestone: | sage-6.10 |
Component: | misc | Keywords: | |
Cc: | Merged in: | ||
Authors: | Nathann Cohen | Reviewers: | |
Report Upstream: | N/A | Work issues: | |
Branch: | u/ncohen/19016 (Commits) | Commit: | 754dc5794a1a7004c8844cf7cfb64220957c36a5 |
Dependencies: | Stopgaps: |
Description (last modified by )
As reported on sage-devel [1], the default hash function implemented in Element
(as return hash(str(self)))
) causes a lot of troubles
- it breaks the
equality => same hash
assumption for finitely presented groupssage: G = groups.presentation.Cyclic(4) sage: G.cayley_graph().vertices() [1, a, a^2, a^-2, a^3, a^-1]
and symbolic expressionssage: f=sin(x)^2 sage: g=1-cos(x)^2 sage: bool(f == g) True sage: hash(f) == hash(g) False
and possibly many others
- it is highly incompatible with the
rename
feature (see also #8119)sage: from sage.structure.element import Element sage: class E(Element): ....: def __init__(self): ....: Element.__init__(self, Parent()) sage: e = E() sage: hash(e) -4965357552728411610 sage: e.rename('hey') sage: hash(e) -6429308858210906323
and similarly, hashing should not be available on any mutable object.
There are several possibilities that are currently being discussed:
- make it return
0
by default (original proposition of the ticket) - make it raise a
TypeError
by default (as it the case forSageObject
, see #18246) - let it as it is but raise a Warning
[1] https://groups.google.com/d/topic/sage-devel/6rXKkF87Gtc/discussion
Change History (31)
comment:1 Changed 5 years ago by
- Branch set to u/ncohen/19016
- Description modified (diff)
- Status changed from new to needs_review
comment:2 Changed 5 years ago by
- Commit set to 754dc5794a1a7004c8844cf7cfb64220957c36a5
comment:3 Changed 5 years ago by
Hi Nathann,
What about #18246? Just returning 0
will create many collisions that will be hard to track. I think that raising an error is more appropriate because you know what is to be fixed.
Vincent
comment:4 Changed 5 years ago by
If you see a way to turn #18246 into a needs_review
ticket soon, then I do not mind. But I don't want to fight against 800 screaming guys again, and I want this to be solved.
What about a middle ground: raise a warning, which would appear only once, whenever this 0 hash function is used?
This is a very bad bug and it needs to be fixed. Creating tickets that we forget is nto much more responsible than doing nothing.
Nathann
comment:5 Changed 5 years ago by
By the way, it is not even the same hash function. Here it is Element.__hash__
and it is SageObject.__hash__
in #18246. Both need to be solved of course.
comment:6 follow-up: ↓ 7 Changed 5 years ago by
Replying to ncohen:
If you see a way to turn #18246 into a
needs_review
ticket soon, then I do not mind. But I don't want to fight against 800 screaming guys again, and I want this to be solved.
Putting in needs_review
just need some efforts. I can dig into it if you are up for a review.
What about a middle ground: raise a warning, which would appear only once, whenever this 0 hash function is used?
It is already a bit better.
This is a very bad bug and it needs to be fixed.
It is.
Creating tickets that we forget is nto much more responsible than doing nothing.
I did not forget as you see ;-)
With your implementation, I am pretty sure that the running time of the tests will be much slower. Sage very often put elements in dictionaries or sets. Your solution does not help users to see that __hash__
is important, because you will not run into bugs, just into very slow code. I agree that you fix the error but you open a much more annoying feature.
comment:7 in reply to: ↑ 6 Changed 5 years ago by
Putting in
needs_review
just need some efforts. I can dig into it if you are up for a review.
Remember that #18246 is not even about the hash function that I change here.
With your implementation, I am pretty sure that the running time of the tests will be much slower. Sage very often put elements in dictionaries or sets. Your solution does not help users to see that
__hash__
is important, because you will not run into bugs, just into very slow code. I agree that you fix the error but you open a much more annoying feature.
If you think that we can fix the bug *quickly* by removing the __hash__
function from Element, then let us do that. If the result is that my ticket will be put "on hold" waiting for something that never happens (like it happened 4 months ago on #18246), then this ticket is better than an imaginary one.
Concerns about efficiency do not weight much compares to wrong results.
Nathann
comment:8 follow-up: ↓ 10 Changed 5 years ago by
-1 to a constant hash. That leads to such bad performance that it amounts to "wrong results".
+1 to removing silly default hashes. Hashing is simply not something that can be done on a very general level.
For your example, unless you come up with a canonical form for group elements that you can take the hash of, you really shouldn't have a hash function, by the way. These elements should be unhashable, not hashable with a constant hash value.
comment:9 Changed 5 years ago by
#18246 needs review (it removes the __hash__
from SageObject
).
comment:10 in reply to: ↑ 8 ; follow-up: ↓ 11 Changed 5 years ago by
-1 to a constant hash. That leads to such bad performance that it amounts to "wrong results".
+1 to removing silly default hashes. Hashing is simply not something that can be done on a very general level.
What is your position for "constant hash" Vs "we change nothing and the bug stays"? How many classes do you think would be broken (and would have to be fixed) if this hash is removed? Are you willing to help?
Nathann
comment:11 in reply to: ↑ 10 ; follow-up: ↓ 12 Changed 5 years ago by
Replying to ncohen:
What is your position for "constant hash" Vs "we change nothing and the bug stays"?
"Change nothing". I think it's unfortunate that hashing by repr will usually work quite well in cases where a hash is possible: a hash usually requires a canonical form to compute the hash from and that canonical form will usually be used for printing, because that's nice. So in practice, the default hash (although much too slow to be a reasonable hash) will probably work when it should.
In cases where the current default hash *doesn't* work, I think that having it as it is now will lead to better detection than when we change it to a default hash of 0. When people find this in particular cases, the solution is clear: actively disable the hash.
Having to *disable* something is of course the wrong way around: the default hash shouldn't be there in the first place. But putting a constant 0 hash there instead doesn't make transitioning to the desired situation any easier.
comment:12 in reply to: ↑ 11 Changed 5 years ago by
- Milestone changed from sage-6.9 to sage-duplicate/invalid/wontfix
- Status changed from needs_review to positive_review
What is your position for "constant hash" Vs "we change nothing and the bug stays"?
"Change nothing".
I love those guys. "You fix a bug, but because it is not done the way I like I will block your tickets, and the bug will stay unsolved. And no, I will not help do it the way I want to see it done. You have to do it by yourself, while I watch you sweat".
Free software.
See, I wouldn't feel well with myself if after something like that I got to work and started implementing what you ask me to. So the bug will stay, thanks to you. Today, you contributed to the common effort for a better mathematical software.
Nathann
comment:13 follow-up: ↓ 14 Changed 5 years ago by
- Status changed from positive_review to needs_review
As I said in 9 I did remove the stupid __hash__
from SageObject
(this is #18246). Not setting it to 0
but making it raise a TypeError
. It was not a tremendous effort even if I had to dig in things that I do not understand.
I guess that the same strategy would also be good for Element
and I agree with Nils on this. The warning option proposed in 4 by Nathann looks also good to me because each time you program something you will notice that you are doing something wrong. Could we try to found a solution? This is annoying and I am willing to help to fix it.
comment:14 in reply to: ↑ 13 Changed 5 years ago by
Could we try to found a solution? This is annoying and I am willing to help to fix it.
I do not intend to touch this ticket again. I'm starting a new trend: when people tell me that my idea is bad, I stop fighting and leave (see #18904, though it's not the first). So that if I ever find something more interesting to do during my holidays than contribute to Sage, I won't hesitate to delete my account and move on to other things.
Nathann
comment:15 follow-up: ↓ 16 Changed 5 years ago by
The default __hash__
could be moved from Element to a CanonicalRepresentation (or so) category. After injecting that into a few basic categories there are probably not that many parents where it has to be added by hand... thoughts?
comment:16 in reply to: ↑ 15 Changed 5 years ago by
Replying to vbraun:
The default
__hash__
could be moved from Element to a CanonicalRepresentation (or so) category. After injecting that into a few basic categories there are probably not that many parents where it has to be added by hand... thoughts?
That might be an easy move since we would just have to change the category to something like PreviousCategory().WithCanonicalRepresentation()
. For me this has to be done on a case by case basis for each Parent
. The advantage would be the simplicity. But I found that this would mostly move the problem. But it would be better, because explicit in the parent implementation.
But in the long run, do we really want to use only once the string representation for the hash? Is that how we want the user to implement their own Parent/Element
? I like Nathann's idea of having a warning if the default hash is called (and the default could be either hash(repr(P))
or 0
I do not really care).
comment:17 Changed 5 years ago by
- Status changed from needs_review to needs_work
comment:18 Changed 5 years ago by
Often you can have normal forms on general grounds, so IMHO would be reasonable to just make CanonicalRepresentation
a part of vector spaces and such. Thats why I said we shouldn't have to do it for every parent by hand. For most parents the hash(repr(element))
is a good enough hash function. Sure you could make it faster by an overall constant factor, but its still gives you O(1) associative containers.
comment:19 Changed 5 years ago by
I definitely don't like to have 0 hashes everywhere. I like the solution proposed by Volker: having hash only on those parents where we actually have a canonical form for each element.
On the other hand, it will take some time to be implemented, and in the meantime, the problem stated by Nathan would persist. So my proposal for a temporary quick fix for that is the following: for finitely presented group elements, just hash its representation in the abelianization (or any other invariant of the element that can be computed quickly). It would work as a hash should (that is, it will distinguish elements in some cases, but will not distnguish different representations of the same element), even if there would still be some collisions.
comment:20 Changed 5 years ago by
I am getting the following errors:
mmarco@mountain ~/sage $ ./sage -t src/sage/groups/finitely_presented.py Running doctests with ID 2015-08-15-00-27-39-750876a9. Git branch: t/19016/19016 Using --optional=gdb,mpir,python2,sage,scons,tides Doctesting 1 file. sage -t --warn-long 59.1 src/sage/groups/finitely_presented.py ********************************************************************** File "src/sage/groups/finitely_presented.py", line 326, in sage.groups.finitely_presented.FinitelyPresentedGroupElement.__call__ Failed example: w(1, 2, check=False) # result depends on presentation of the group element Expected: 2 Got: 8 ********************************************************************** 1 item had failures: 1 of 10 in sage.groups.finitely_presented.FinitelyPresentedGroupElement.__call__ [310 tests, 1 failure, 6.06 s] ---------------------------------------------------------------------- sage -t --warn-long 59.1 src/sage/groups/finitely_presented.py # 1 doctest failed ---------------------------------------------------------------------- Total time for all tests: 6.1 seconds cpu time: 5.8 seconds cumulative wall time: 6.1 seconds
comment:21 Changed 5 years ago by
A little digging in Gap's FP compare functions source shows that for finite finitely presented groups, our hash, given these definitions:
FamilyObj=libgap.function_factory("FamilyObj") ElementsFamily=libgap.function_factory("ElementsFamily") FPFaithHom=libgap.function_factory("FPFaithHom") Image=libgap.function_factory("Image")
should probably look something like this:
sage: G=groups.presentation.Cyclic(4) sage: f=FPFaithHom(ElementsFamily(FamilyObj(G.gap()))) sage: newhash=lambda a: hash(Image(f,a.gap()))
A little test:
sage: import collections sage: L=[G.0^i for i in [-5..5]] sage: collections.Counter(hash(a) == hash(b) for a in L for b in L if a==b) Counter({False: 20, True: 11}) sage: collections.Counter(newhash(a) == newhash(b) for a in L for b in L if a==b) Counter({True: 31}) sage: collections.Counter(newhash(a) == newhash(b) for a in L for b in L if a!=b) Counter({False: 90})
comment:22 follow-up: ↓ 23 Changed 5 years ago by
The problem is that we don't know, a priory, if a finitely presented group is finite or not. And just trying to check it automatically can take forever or exhaust the memory. That is why I proposed to go for the abelianization, which is an invariant of the element that can be always computed (and in a reasonably fast way). It still has some collisions, but it is way better than having always a collision.
comment:23 in reply to: ↑ 22 Changed 5 years ago by
Replying to mmarco:
The problem is that we don't know, a priory, if a finitely presented group is finite or not. And just trying to check it automatically can take forever or exhaust the memory. That is why I proposed to go for the abelianization, which is an invariant of the element that can be always computed (and in a reasonably fast way). It still has some collisions, but it is way better than having always a collision.
For finite simple groups I think "some collisions" is a bit of an understatement, but that's not the main point. The thing is: for pretty much any application of hashing, you will need equality testing too (this is definitely the case in Python's dicts), and then Gap will just go off and compute this stuff anyway, unless we go with Volker's suggestion of inheriting equality (and hashing) from the free covering presentation.
So we should definitely only compute the hash function lazily, but if it's asked for, I think we would incur a very small penalty in practice if we replicate the logic from Gap's MakeFpGroupCompMethod.
comment:24 Changed 5 years ago by
Just a small paranthesis: there are two very different things
- Changing the default behavior of the hash function
- Having a better hash for FP group elements
These deserve two tickets, don't you think? The initial purpose of the ticket was 1 and now you are mainly discussing 2. Could you open a ticket "better hash for FP Group element"?
Vincent
comment:25 follow-up: ↓ 26 Changed 5 years ago by
You are right, but i think they are very related, since FP groups is AFAIK the only parent where the default hash is problematic. So, maybe a solution for 2. would make 1. pointless (assuming there are no other places where the actual default hash is problematic in the same sense)
comment:26 in reply to: ↑ 25 ; follow-up: ↓ 29 Changed 5 years ago by
Replying to mmarco:
You are right, but i think they are very related, since FP groups is AFAIK the only parent where the default hash is problematic. So, maybe a solution for 2. would make 1. pointless (assuming there are no other places where the actual default hash is problematic in the same sense)
I agree that they are related. But __hash__
is problematic in other situations and two of them are mentioned in #18246. Solving the FP group problem will not solve everything as far as I understand it.
comment:27 Changed 5 years ago by
And note that in the ticket description it is written "Incidentally, it fixes the following bug:"...
comment:28 Changed 5 years ago by
Ok then.
In that case, i think it makes more sense to not have a hash by default, than having a constant one.
comment:29 in reply to: ↑ 26 ; follow-up: ↓ 30 Changed 5 years ago by
Replying to vdelecroix:
I agree that they are related. But
__hash__
is problematic in other situations and two of them are mentioned in #18246. Solving the FP group problem will not solve everything as far as I understand it.
Indeed, absolutely. And #18246 is now marked fixed. Another example is (of course) SR -- another ring where equality and normal form are impossible to resolve in general
sage: f=sin(x)^2 sage: g=1-cos(x)^2 sage: bool(f == g) True sage: hash(f) == hash(g) False
so, hopefully, with #18246 , the last command will lead to an error?
comment:30 in reply to: ↑ 29 Changed 5 years ago by
Replying to nbruin:
Replying to vdelecroix:
I agree that they are related. But
__hash__
is problematic in other situations and two of them are mentioned in #18246. Solving the FP group problem will not solve everything as far as I understand it.Indeed, absolutely. And #18246 is now marked fixed. Another example is (of course) SR -- another ring where equality and normal form are impossible to resolve in general
sage: f=sin(x)^2 sage: g=1-cos(x)^2 sage: bool(f == g) True sage: hash(f) == hash(g) Falseso, hopefully, with #18246 , the last command will lead to an error?
Sadly not
sage: f=sin(x)^2 sage: isinstance(f, sage.structure.element.Element) True
Element
does implement the return hash(str(self))
behavior. So this will incidentally be fixed with this ticket...
comment:31 Changed 5 years ago by
- Description modified (diff)
- Milestone changed from sage-duplicate/invalid/wontfix to sage-6.9
I wrote some more in the ticket description...
Branch pushed to git repo; I updated commit sha1. New commits:
trac 19016: A more naive sage.structure.element.__hash__