Opened 4 years ago

Closed 4 years ago

#24501 closed defect (fixed)

Hash of Integer broken on Python 3

Reported by: embray Owned by:
Priority: major Milestone: sage-8.2
Component: python3 Keywords:
Cc: Merged in:
Authors: Jeroen Demeyer Reviewers: Erik Bray
Report Upstream: N/A Work issues:
Branch: 248cdd8 (Commits, GitHub, GitLab) Commit: 248cdd8d0c021056c33c62814cbd04e382ec8f3e
Dependencies: Stopgaps:

Status badges

Description

Currently Integer.__hash__ is implemented with mpz_pythonhash, which is designed to return the same hash value for built-in int and longs.

In Python 3 something changed subtly about the hash algorithm such that this no longer holds:

sage: hash(int(2^61))
1
sage: hash(2^61)
2305843009213693952

I haven't looked into what the difference is yet. But I suspect this is the source of a number of errors I've seen.

Change History (11)

comment:2 Changed 4 years ago by embray

I see--essentially instead of reduction modulo ULONG_MAX it's modulo a prime _PyHASH_MODULUS.

comment:3 Changed 4 years ago by jdemeyer

  • Authors set to Jeroen Demeyer

I'll have a look

comment:4 Changed 4 years ago by jdemeyer

  • Branch set to u/jdemeyer/hash_of_integer_broken_on_python_3

comment:5 Changed 4 years ago by jdemeyer

  • Commit set to 19efd2db979069166ced3cf57c497d2fdd07858f
  • Status changed from new to needs_review

New commits:

19efd2dFix hashing of Integer on Python 3

comment:6 follow-up: Changed 4 years ago by embray

Ah, you beat me to it. I was doing roughly the same thing but with separate Python 2 and 3 branches, so your version is more "generic" I suppose.

I wonder if we an find some way to get away with not having to calculate modulus and limb_bits every time since they are constant, but it's just some bit shifts and maybe the compiler can optimize that out anyways.

comment:7 in reply to: ↑ 6 Changed 4 years ago by jdemeyer

Replying to embray:

I wonder if we an find some way to get away with not having to calculate modulus and limb_bits every time since they are constant, but it's just some bit shifts and maybe the compiler can optimize that out anyways.

Constant folding is so trivial that it is safe to assume that the compiler knows what to do.

comment:8 Changed 4 years ago by embray

It seems like you're defining limb_bits twice--once using the trick in cdef extern and then again in the function body.

comment:9 Changed 4 years ago by git

  • Commit changed from 19efd2db979069166ced3cf57c497d2fdd07858f to 248cdd8d0c021056c33c62814cbd04e382ec8f3e

Branch pushed to git repo; I updated commit sha1. This was a forced push. New commits:

248cdd8Fix hashing of Integer on Python 3

comment:10 Changed 4 years ago by embray

  • Reviewers set to Erik Bray
  • Status changed from needs_review to positive_review

Thanks for the quick work on this--looks good to me!

comment:11 Changed 4 years ago by vbraun

  • Branch changed from u/jdemeyer/hash_of_integer_broken_on_python_3 to 248cdd8d0c021056c33c62814cbd04e382ec8f3e
  • Resolution set to fixed
  • Status changed from positive_review to closed
Note: See TracTickets for help on using tickets.