Opened 5 years ago

Last modified 5 years ago

#19016 closed defect

A more naive sage.structure.element.__hash__ — at Version 31

Reported by: ncohen Owned by:
Priority: critical Milestone: sage-6.10
Component: misc Keywords:
Cc: Merged in:
Authors: Nathann Cohen Reviewers:
Report Upstream: N/A Work issues:
Branch: u/ncohen/19016 (Commits) Commit: 754dc5794a1a7004c8844cf7cfb64220957c36a5
Dependencies: Stopgaps:

Description (last modified by vdelecroix)

As reported on sage-devel [1], the default hash function implemented in Element (as return hash(str(self)))) causes a lot of troubles

  • it breaks the equality => same hash assumption for finitely presented groups
        sage: G = groups.presentation.Cyclic(4)
        sage: G.cayley_graph().vertices()
        [1, a, a^2, a^-2, a^3, a^-1]
    
    and symbolic expressions
    sage: f=sin(x)^2
    sage: g=1-cos(x)^2
    sage: bool(f == g)
    True
    sage: hash(f) == hash(g)
    False
    

and possibly many others

  • it is highly incompatible with the rename feature (see also #8119)
    sage: from sage.structure.element import Element
    sage: class E(Element):
    ....:     def __init__(self):
    ....:         Element.__init__(self, Parent())
    sage: e = E()
    sage: hash(e)
    -4965357552728411610
    sage: e.rename('hey')
    sage: hash(e)
    -6429308858210906323
    
    and similarly, hashing should not be available on any mutable object.
  • it might be very bad for performance: see #18215 and #18239 for examples

There are several possibilities that are currently being discussed:

  • make it return 0 by default (original proposition of the ticket)
  • make it raise a TypeError by default (as it the case for SageObject, see #18246)
  • let it as it is but raise a Warning

[1] https://groups.google.com/d/topic/sage-devel/6rXKkF87Gtc/discussion

Change History (31)

comment:1 Changed 5 years ago by ncohen

  • Branch set to u/ncohen/19016
  • Description modified (diff)
  • Status changed from new to needs_review

comment:2 Changed 5 years ago by git

  • Commit set to 754dc5794a1a7004c8844cf7cfb64220957c36a5

Branch pushed to git repo; I updated commit sha1. New commits:

754dc57trac 19016: A more naive sage.structure.element.__hash__

comment:3 Changed 5 years ago by vdelecroix

Hi Nathann,

What about #18246? Just returning 0 will create many collisions that will be hard to track. I think that raising an error is more appropriate because you know what is to be fixed.

Vincent

comment:4 Changed 5 years ago by ncohen

If you see a way to turn #18246 into a needs_review ticket soon, then I do not mind. But I don't want to fight against 800 screaming guys again, and I want this to be solved.

What about a middle ground: raise a warning, which would appear only once, whenever this 0 hash function is used?

This is a very bad bug and it needs to be fixed. Creating tickets that we forget is nto much more responsible than doing nothing.

Nathann

comment:5 Changed 5 years ago by ncohen

By the way, it is not even the same hash function. Here it is Element.__hash__ and it is SageObject.__hash__ in #18246. Both need to be solved of course.

comment:6 follow-up: Changed 5 years ago by vdelecroix

Replying to ncohen:

If you see a way to turn #18246 into a needs_review ticket soon, then I do not mind. But I don't want to fight against 800 screaming guys again, and I want this to be solved.

Putting in needs_review just need some efforts. I can dig into it if you are up for a review.

What about a middle ground: raise a warning, which would appear only once, whenever this 0 hash function is used?

It is already a bit better.

This is a very bad bug and it needs to be fixed.

It is.

Creating tickets that we forget is nto much more responsible than doing nothing.

I did not forget as you see ;-)

With your implementation, I am pretty sure that the running time of the tests will be much slower. Sage very often put elements in dictionaries or sets. Your solution does not help users to see that __hash__ is important, because you will not run into bugs, just into very slow code. I agree that you fix the error but you open a much more annoying feature.

comment:7 in reply to: ↑ 6 Changed 5 years ago by ncohen

Putting in needs_review just need some efforts. I can dig into it if you are up for a review.

Remember that #18246 is not even about the hash function that I change here.

With your implementation, I am pretty sure that the running time of the tests will be much slower. Sage very often put elements in dictionaries or sets. Your solution does not help users to see that __hash__ is important, because you will not run into bugs, just into very slow code. I agree that you fix the error but you open a much more annoying feature.

If you think that we can fix the bug *quickly* by removing the __hash__ function from Element, then let us do that. If the result is that my ticket will be put "on hold" waiting for something that never happens (like it happened 4 months ago on #18246), then this ticket is better than an imaginary one.

Concerns about efficiency do not weight much compares to wrong results.

Nathann

comment:8 follow-up: Changed 5 years ago by nbruin

-1 to a constant hash. That leads to such bad performance that it amounts to "wrong results".

+1 to removing silly default hashes. Hashing is simply not something that can be done on a very general level.

For your example, unless you come up with a canonical form for group elements that you can take the hash of, you really shouldn't have a hash function, by the way. These elements should be unhashable, not hashable with a constant hash value.

comment:9 Changed 5 years ago by vdelecroix

#18246 needs review (it removes the __hash__ from SageObject).

comment:10 in reply to: ↑ 8 ; follow-up: Changed 5 years ago by ncohen

-1 to a constant hash. That leads to such bad performance that it amounts to "wrong results".

+1 to removing silly default hashes. Hashing is simply not something that can be done on a very general level.

What is your position for "constant hash" Vs "we change nothing and the bug stays"? How many classes do you think would be broken (and would have to be fixed) if this hash is removed? Are you willing to help?

Nathann

comment:11 in reply to: ↑ 10 ; follow-up: Changed 5 years ago by nbruin

Replying to ncohen:

What is your position for "constant hash" Vs "we change nothing and the bug stays"?

"Change nothing". I think it's unfortunate that hashing by repr will usually work quite well in cases where a hash is possible: a hash usually requires a canonical form to compute the hash from and that canonical form will usually be used for printing, because that's nice. So in practice, the default hash (although much too slow to be a reasonable hash) will probably work when it should.

In cases where the current default hash *doesn't* work, I think that having it as it is now will lead to better detection than when we change it to a default hash of 0. When people find this in particular cases, the solution is clear: actively disable the hash.

Having to *disable* something is of course the wrong way around: the default hash shouldn't be there in the first place. But putting a constant 0 hash there instead doesn't make transitioning to the desired situation any easier.

Last edited 5 years ago by nbruin (previous) (diff)

comment:12 in reply to: ↑ 11 Changed 5 years ago by ncohen

  • Milestone changed from sage-6.9 to sage-duplicate/invalid/wontfix
  • Status changed from needs_review to positive_review

What is your position for "constant hash" Vs "we change nothing and the bug stays"?

"Change nothing".

I love those guys. "You fix a bug, but because it is not done the way I like I will block your tickets, and the bug will stay unsolved. And no, I will not help do it the way I want to see it done. You have to do it by yourself, while I watch you sweat".

Free software.

See, I wouldn't feel well with myself if after something like that I got to work and started implementing what you ask me to. So the bug will stay, thanks to you. Today, you contributed to the common effort for a better mathematical software.

Nathann

comment:13 follow-up: Changed 5 years ago by vdelecroix

  • Status changed from positive_review to needs_review

As I said in 9 I did remove the stupid __hash__ from SageObject (this is #18246). Not setting it to 0 but making it raise a TypeError. It was not a tremendous effort even if I had to dig in things that I do not understand.

I guess that the same strategy would also be good for Element and I agree with Nils on this. The warning option proposed in 4 by Nathann looks also good to me because each time you program something you will notice that you are doing something wrong. Could we try to found a solution? This is annoying and I am willing to help to fix it.

comment:14 in reply to: ↑ 13 Changed 5 years ago by ncohen

Could we try to found a solution? This is annoying and I am willing to help to fix it.

I do not intend to touch this ticket again. I'm starting a new trend: when people tell me that my idea is bad, I stop fighting and leave (see #18904, though it's not the first). So that if I ever find something more interesting to do during my holidays than contribute to Sage, I won't hesitate to delete my account and move on to other things.

Nathann

comment:15 follow-up: Changed 5 years ago by vbraun

The default __hash__ could be moved from Element to a CanonicalRepresentation (or so) category. After injecting that into a few basic categories there are probably not that many parents where it has to be added by hand... thoughts?

comment:16 in reply to: ↑ 15 Changed 5 years ago by vdelecroix

Replying to vbraun:

The default __hash__ could be moved from Element to a CanonicalRepresentation (or so) category. After injecting that into a few basic categories there are probably not that many parents where it has to be added by hand... thoughts?

That might be an easy move since we would just have to change the category to something like PreviousCategory().WithCanonicalRepresentation(). For me this has to be done on a case by case basis for each Parent. The advantage would be the simplicity. But I found that this would mostly move the problem. But it would be better, because explicit in the parent implementation.

But in the long run, do we really want to use only once the string representation for the hash? Is that how we want the user to implement their own Parent/Element? I like Nathann's idea of having a warning if the default hash is called (and the default could be either hash(repr(P)) or 0 I do not really care).

comment:17 Changed 5 years ago by vdelecroix

  • Status changed from needs_review to needs_work

comment:18 Changed 5 years ago by vbraun

Often you can have normal forms on general grounds, so IMHO would be reasonable to just make CanonicalRepresentation a part of vector spaces and such. Thats why I said we shouldn't have to do it for every parent by hand. For most parents the hash(repr(element)) is a good enough hash function. Sure you could make it faster by an overall constant factor, but its still gives you O(1) associative containers.

comment:19 Changed 5 years ago by mmarco

I definitely don't like to have 0 hashes everywhere. I like the solution proposed by Volker: having hash only on those parents where we actually have a canonical form for each element.

On the other hand, it will take some time to be implemented, and in the meantime, the problem stated by Nathan would persist. So my proposal for a temporary quick fix for that is the following: for finitely presented group elements, just hash its representation in the abelianization (or any other invariant of the element that can be computed quickly). It would work as a hash should (that is, it will distinguish elements in some cases, but will not distnguish different representations of the same element), even if there would still be some collisions.

comment:20 Changed 5 years ago by mmarco

I am getting the following errors:

mmarco@mountain ~/sage $ ./sage -t src/sage/groups/finitely_presented.py 
Running doctests with ID 2015-08-15-00-27-39-750876a9.
Git branch: t/19016/19016
Using --optional=gdb,mpir,python2,sage,scons,tides
Doctesting 1 file.
sage -t --warn-long 59.1 src/sage/groups/finitely_presented.py
**********************************************************************
File "src/sage/groups/finitely_presented.py", line 326, in sage.groups.finitely_presented.FinitelyPresentedGroupElement.__call__
Failed example:
    w(1, 2, check=False)    # result depends on presentation of the group element
Expected:
    2
Got:
    8
**********************************************************************
1 item had failures:
   1 of  10 in sage.groups.finitely_presented.FinitelyPresentedGroupElement.__call__
    [310 tests, 1 failure, 6.06 s]
----------------------------------------------------------------------
sage -t --warn-long 59.1 src/sage/groups/finitely_presented.py  # 1 doctest failed
----------------------------------------------------------------------
Total time for all tests: 6.1 seconds
    cpu time: 5.8 seconds
    cumulative wall time: 6.1 seconds

comment:21 Changed 5 years ago by nbruin

A little digging in Gap's FP compare functions source shows that for finite finitely presented groups, our hash, given these definitions:

FamilyObj=libgap.function_factory("FamilyObj")
ElementsFamily=libgap.function_factory("ElementsFamily")
FPFaithHom=libgap.function_factory("FPFaithHom")
Image=libgap.function_factory("Image")

should probably look something like this:

sage: G=groups.presentation.Cyclic(4)
sage: f=FPFaithHom(ElementsFamily(FamilyObj(G.gap())))
sage: newhash=lambda a: hash(Image(f,a.gap()))

A little test:

sage: import collections
sage: L=[G.0^i for i in [-5..5]]
sage: collections.Counter(hash(a) == hash(b) for a in L for b in L if a==b)
Counter({False: 20, True: 11})
sage: collections.Counter(newhash(a) == newhash(b) for a in L for b in L if a==b)
Counter({True: 31})
sage: collections.Counter(newhash(a) == newhash(b) for a in L for b in L if a!=b)
Counter({False: 90})

comment:22 follow-up: Changed 5 years ago by mmarco

The problem is that we don't know, a priory, if a finitely presented group is finite or not. And just trying to check it automatically can take forever or exhaust the memory. That is why I proposed to go for the abelianization, which is an invariant of the element that can be always computed (and in a reasonably fast way). It still has some collisions, but it is way better than having always a collision.

comment:23 in reply to: ↑ 22 Changed 5 years ago by nbruin

Replying to mmarco:

The problem is that we don't know, a priory, if a finitely presented group is finite or not. And just trying to check it automatically can take forever or exhaust the memory. That is why I proposed to go for the abelianization, which is an invariant of the element that can be always computed (and in a reasonably fast way). It still has some collisions, but it is way better than having always a collision.

For finite simple groups I think "some collisions" is a bit of an understatement, but that's not the main point. The thing is: for pretty much any application of hashing, you will need equality testing too (this is definitely the case in Python's dicts), and then Gap will just go off and compute this stuff anyway, unless we go with Volker's suggestion of inheriting equality (and hashing) from the free covering presentation.

So we should definitely only compute the hash function lazily, but if it's asked for, I think we would incur a very small penalty in practice if we replicate the logic from Gap's MakeFpGroupCompMethod.

comment:24 Changed 5 years ago by vdelecroix

Just a small paranthesis: there are two very different things

  1. Changing the default behavior of the hash function
  1. Having a better hash for FP group elements

These deserve two tickets, don't you think? The initial purpose of the ticket was 1 and now you are mainly discussing 2. Could you open a ticket "better hash for FP Group element"?

Vincent

comment:25 follow-up: Changed 5 years ago by mmarco

You are right, but i think they are very related, since FP groups is AFAIK the only parent where the default hash is problematic. So, maybe a solution for 2. would make 1. pointless (assuming there are no other places where the actual default hash is problematic in the same sense)

comment:26 in reply to: ↑ 25 ; follow-up: Changed 5 years ago by vdelecroix

Replying to mmarco:

You are right, but i think they are very related, since FP groups is AFAIK the only parent where the default hash is problematic. So, maybe a solution for 2. would make 1. pointless (assuming there are no other places where the actual default hash is problematic in the same sense)

I agree that they are related. But __hash__ is problematic in other situations and two of them are mentioned in #18246. Solving the FP group problem will not solve everything as far as I understand it.

comment:27 Changed 5 years ago by vdelecroix

And note that in the ticket description it is written "Incidentally, it fixes the following bug:"...

Last edited 5 years ago by vdelecroix (previous) (diff)

comment:28 Changed 5 years ago by mmarco

Ok then.

In that case, i think it makes more sense to not have a hash by default, than having a constant one.

comment:29 in reply to: ↑ 26 ; follow-up: Changed 5 years ago by nbruin

Replying to vdelecroix:

I agree that they are related. But __hash__ is problematic in other situations and two of them are mentioned in #18246. Solving the FP group problem will not solve everything as far as I understand it.

Indeed, absolutely. And #18246 is now marked fixed. Another example is (of course) SR -- another ring where equality and normal form are impossible to resolve in general

sage: f=sin(x)^2
sage: g=1-cos(x)^2
sage: bool(f == g)
True
sage: hash(f) == hash(g)
False

so, hopefully, with #18246 , the last command will lead to an error?

comment:30 in reply to: ↑ 29 Changed 5 years ago by vdelecroix

Replying to nbruin:

Replying to vdelecroix:

I agree that they are related. But __hash__ is problematic in other situations and two of them are mentioned in #18246. Solving the FP group problem will not solve everything as far as I understand it.

Indeed, absolutely. And #18246 is now marked fixed. Another example is (of course) SR -- another ring where equality and normal form are impossible to resolve in general

sage: f=sin(x)^2
sage: g=1-cos(x)^2
sage: bool(f == g)
True
sage: hash(f) == hash(g)
False

so, hopefully, with #18246 , the last command will lead to an error?

Sadly not

sage: f=sin(x)^2
sage: isinstance(f, sage.structure.element.Element)
True

Element does implement the return hash(str(self)) behavior. So this will incidentally be fixed with this ticket...

comment:31 Changed 5 years ago by vdelecroix

  • Description modified (diff)
  • Milestone changed from sage-duplicate/invalid/wontfix to sage-6.9

I wrote some more in the ticket description...

Note: See TracTickets for help on using tickets.