Opened 3 years ago
Closed 3 years ago
#28112 closed defect (fixed)
py3: Fix hash function of Integer
Reported by:  Markus Wageringel  Owned by:  

Priority:  major  Milestone:  sage8.9 
Component:  python3  Keywords:  
Cc:  Merged in:  
Authors:  Markus Wageringel  Reviewers:  Jeroen Demeyer, John Palmieri, Frédéric Chapoton 
Report Upstream:  N/A  Work issues:  
Branch:  b13f7f8 (Commits, GitHub, GitLab)  Commit:  b13f7f812bd02ae02a2a2549c860cc20cbc1bcd6 
Dependencies:  Stopgaps: 
Description
This ticket fixes an issue in sage.libs.gmp.pylong.mpz_pythonhash
, the hash function for Integer, and resolves the Python 3 doctest failures in sage/rings/integer.pyx
.
More precisely, the implementation of mpz_pythonhash
is using the fact that y <= modulus
, as explained in the comment
# Safely compute h = (h + y) % modulus knowing that h < modulus # and y <= modulus
However, this assumption is not always correct, as y
is the sum of up to three numbers. For example, this fails in
sage: hash(2^64  1), hash(int(2^64  1)) (2305843009213693958, 7)
Change History (20)
comment:1 followup: 3 Changed 3 years ago by
Authors:  → Markus Wageringel 

Branch:  → u/ghmwageringel/28112 
Commit:  → ba487ec15e63bff0e51f3d6afd5fb632f2b63d01 
Status:  new → needs_review 
Summary:  py3: Fix for Integer hash → py3: Fix hash function of Integer 
comment:2 Changed 3 years ago by
Reviewers:  → Jeroen Demeyer 

comment:3 followup: 6 Changed 3 years ago by
Replying to ghmwageringel:
I have not had the chance to test this with Python 3 on a 32bit machine.
Do we really need the exact values in the docstring? Wouldn't it be sufficient to mark those tests with # random
? The only thing really that matters is that hash(n) == hash(int(n))
.
comment:4 Changed 3 years ago by
Status:  needs_review → needs_work 

Given that y
is less than 2 * modulus
, it is more efficient to do
if y > modulus: y = modulus
(Note that the condition y > modulus
instead of y >= modulus
is intentional as it generates better code).
comment:5 Changed 3 years ago by
Commit:  ba487ec15e63bff0e51f3d6afd5fb632f2b63d01 → 87702411d84e713195bff3d8828d26f01ec9aa38 

Branch pushed to git repo; I updated commit sha1. This was a forced push. New commits:
8770241  28112: py3: fix Integer hash and doctests

comment:6 Changed 3 years ago by
Status:  needs_work → needs_review 

Replying to jdemeyer:
Do we really need the exact values in the docstring? Wouldn't it be sufficient to mark those tests with
# random
? The only thing really that matters is thathash(n) == hash(int(n))
.
I agree. I updated the tests accordingly, keeping just the 64bit Python 3 output. At some point, this should still be tested with 32bit, though.
I also applied the other suggested change. Thanks for the feedback.
comment:7 followup: 8 Changed 3 years ago by
I would prefer not to use long
in doctests, since it is not supported in Python 3. The only reason it works in Sage now is because of a hack that converts long
to int
when using Python 3. Maybe instead separate doctests?
sage: n.__hash__() == hash(long(n)) # py2 True sage: n.__hash__() == hash(n) # py3 True
comment:8 Changed 3 years ago by
Replying to jhpalmieri:
I would prefer not to use
long
in doctests, since it is not supported in Python 3. The only reason it works in Sage now is because of a hack that convertslong
toint
when using Python 3. Maybe instead separate doctests?
Good point. Instead of having separate doctests, I think it is preferable to replace long(n)
by int(n)
which should work in both Python 2 and 3. In Python 2, it will still return a long
if the number is too large.
comment:9 Changed 3 years ago by
Commit:  87702411d84e713195bff3d8828d26f01ec9aa38 → b1efe822fc4df0770f451dd5c44c71025329f0bb 

Branch pushed to git repo; I updated commit sha1. New commits:
b1efe82  28112: py3: replace long by int in Integer doctests

comment:10 followup: 11 Changed 3 years ago by
Reviewers:  Jeroen Demeyer → Jeroen Demeyer, John Palmieri 

One final nitpick: this comment # This is safe, since y <= 2 * modulus < 2 ^ limb_bits.
is not clear, I don't know what you're trying to say with it. I would say something like
# At this point, y < 2 * modulus but we need y <= modulus # We use > instead of >= on the line below because it # generates more efficient code
comment:11 followup: 15 Changed 3 years ago by
Replying to jdemeyer:
One final nitpick: this comment
# This is safe, since y <= 2 * modulus < 2 ^ limb_bits.
is not clear, I don't know what you're trying to say with it.
What I was trying to emphasize is that y
did not overflow which is important for the comparison to work. I will rephrase it to make this clearer.
Before adding a comment about efficiency, I would like to understand better why this is more efficient and why it is important. Could you explain this please? Is it because it saves a subtraction occasionally, or are >
comparisons generally faster than >=
, or maybe would it help the compiler do branch elimination?
comment:12 Changed 3 years ago by
Is it easy to also address this? In Sage + Python 2:
sage: all(hash(2^n1)==hash(SR(2^n1)) for n in range(60, 100)) True
In Sage + Python 3:
sage: hash(2^601) == hash(SR(2^601)) True sage: hash(2^611) == hash(SR(2^611)) False
comment:13 followup: 16 Changed 3 years ago by
Let us keep the SR issue for elsewhere. The branch looks good to me.
Jeroen, do you approve ?
comment:15 followup: 18 Changed 3 years ago by
Replying to ghmwageringel:
I would like to understand better why this is more efficient and why it is important.
It's certainly not important. It's just that, out of two possible alternatives, the >
alternative is slightly faster.
Could you explain this please?
I don't think that there is a fundamental reason. It's just that the compiler generates more efficient instructions for the >
variant.
comment:16 Changed 3 years ago by
Replying to chapoton:
Jeroen, do you approve ?
Yes, I approve. I'm just waiting for the rewording of the comment.
comment:17 Changed 3 years ago by
Commit:  b1efe822fc4df0770f451dd5c44c71025329f0bb → b13f7f812bd02ae02a2a2549c860cc20cbc1bcd6 

Branch pushed to git repo; I updated commit sha1. New commits:
b13f7f8  28112: reword a comment in hash function

comment:18 Changed 3 years ago by
Replying to jdemeyer:
Replying to ghmwageringel:
I would like to understand better why this is more efficient and why it is important.
It's certainly not important. It's just that, out of two possible alternatives, the
>
alternative is slightly faster.Could you explain this please?
I don't think that there is a fundamental reason. It's just that the compiler generates more efficient instructions for the
>
variant.
Thank you for explaining. I changed the comment. It should be fine now.
comment:19 Changed 3 years ago by
Reviewers:  Jeroen Demeyer, John Palmieri → Jeroen Demeyer, John Palmieri, Frédéric Chapoton 

Status:  needs_review → positive_review 
I am setting to positive. Jeroen, if you do not agree, then undo
comment:20 Changed 3 years ago by
Branch:  u/ghmwageringel/28112 → b13f7f812bd02ae02a2a2549c860cc20cbc1bcd6 

Resolution:  → fixed 
Status:  positive_review → closed 
I have not had the chance to test this with Python 3 on a 32bit machine. Instead, I just computed the 32bit hashes in the doctests using a modulus of
2^31  1
. Could someone with access to a 32bit machine confirm or correct these, please?New commits:
28112: py3: fix Integer hash and doctests