Opened 3 years ago
Closed 3 years ago
#28112 closed defect (fixed)
py3: Fix hash function of Integer
Reported by: | gh-mwageringel | Owned by: | |
---|---|---|---|
Priority: | major | Milestone: | sage-8.9 |
Component: | python3 | Keywords: | |
Cc: | Merged in: | ||
Authors: | Markus Wageringel | Reviewers: | Jeroen Demeyer, John Palmieri, Frédéric Chapoton |
Report Upstream: | N/A | Work issues: | |
Branch: | b13f7f8 (Commits, GitHub, GitLab) | Commit: | b13f7f812bd02ae02a2a2549c860cc20cbc1bcd6 |
Dependencies: | Stopgaps: |
Description
This ticket fixes an issue in sage.libs.gmp.pylong.mpz_pythonhash
, the hash function for Integer, and resolves the Python 3 doctest failures in sage/rings/integer.pyx
.
More precisely, the implementation of mpz_pythonhash
is using the fact that y <= modulus
, as explained in the comment
# Safely compute h = (h + y) % modulus knowing that h < modulus # and y <= modulus
However, this assumption is not always correct, as y
is the sum of up to three numbers. For example, this fails in
sage: hash(2^64 - 1), hash(int(2^64 - 1)) (2305843009213693958, 7)
Change History (20)
comment:1 follow-up: ↓ 3 Changed 3 years ago by
- Branch set to u/gh-mwageringel/28112
- Commit set to ba487ec15e63bff0e51f3d6afd5fb632f2b63d01
- Status changed from new to needs_review
- Summary changed from py3: Fix for Integer hash to py3: Fix hash function of Integer
comment:2 Changed 3 years ago by
- Reviewers set to Jeroen Demeyer
comment:3 in reply to: ↑ 1 ; follow-up: ↓ 6 Changed 3 years ago by
Replying to gh-mwageringel:
I have not had the chance to test this with Python 3 on a 32-bit machine.
Do we really need the exact values in the docstring? Wouldn't it be sufficient to mark those tests with # random
? The only thing really that matters is that hash(n) == hash(int(n))
.
comment:4 Changed 3 years ago by
- Status changed from needs_review to needs_work
Given that y
is less than 2 * modulus
, it is more efficient to do
if y > modulus: y -= modulus
(Note that the condition y > modulus
instead of y >= modulus
is intentional as it generates better code).
comment:5 Changed 3 years ago by
- Commit changed from ba487ec15e63bff0e51f3d6afd5fb632f2b63d01 to 87702411d84e713195bff3d8828d26f01ec9aa38
Branch pushed to git repo; I updated commit sha1. This was a forced push. New commits:
8770241 | 28112: py3: fix Integer hash and doctests
|
comment:6 in reply to: ↑ 3 Changed 3 years ago by
- Status changed from needs_work to needs_review
Replying to jdemeyer:
Do we really need the exact values in the docstring? Wouldn't it be sufficient to mark those tests with
# random
? The only thing really that matters is thathash(n) == hash(int(n))
.
I agree. I updated the tests accordingly, keeping just the 64-bit Python 3 output. At some point, this should still be tested with 32-bit, though.
I also applied the other suggested change. Thanks for the feedback.
comment:7 follow-up: ↓ 8 Changed 3 years ago by
I would prefer not to use long
in doctests, since it is not supported in Python 3. The only reason it works in Sage now is because of a hack that converts long
to int
when using Python 3. Maybe instead separate doctests?
sage: n.__hash__() == hash(long(n)) # py2 True sage: n.__hash__() == hash(n) # py3 True
comment:8 in reply to: ↑ 7 Changed 3 years ago by
Replying to jhpalmieri:
I would prefer not to use
long
in doctests, since it is not supported in Python 3. The only reason it works in Sage now is because of a hack that convertslong
toint
when using Python 3. Maybe instead separate doctests?
Good point. Instead of having separate doctests, I think it is preferable to replace long(n)
by int(n)
which should work in both Python 2 and 3. In Python 2, it will still return a long
if the number is too large.
comment:9 Changed 3 years ago by
- Commit changed from 87702411d84e713195bff3d8828d26f01ec9aa38 to b1efe822fc4df0770f451dd5c44c71025329f0bb
Branch pushed to git repo; I updated commit sha1. New commits:
b1efe82 | 28112: py3: replace long by int in Integer doctests
|
comment:10 follow-up: ↓ 11 Changed 3 years ago by
- Reviewers changed from Jeroen Demeyer to Jeroen Demeyer, John Palmieri
One final nitpick: this comment # This is safe, since y <= 2 * modulus < 2 ^ limb_bits.
is not clear, I don't know what you're trying to say with it. I would say something like
# At this point, y < 2 * modulus but we need y <= modulus # We use > instead of >= on the line below because it # generates more efficient code
comment:11 in reply to: ↑ 10 ; follow-up: ↓ 15 Changed 3 years ago by
Replying to jdemeyer:
One final nitpick: this comment
# This is safe, since y <= 2 * modulus < 2 ^ limb_bits.
is not clear, I don't know what you're trying to say with it.
What I was trying to emphasize is that y
did not overflow which is important for the comparison to work. I will rephrase it to make this clearer.
Before adding a comment about efficiency, I would like to understand better why this is more efficient and why it is important. Could you explain this please? Is it because it saves a subtraction occasionally, or are <
comparisons generally faster than <=
, or maybe would it help the compiler do branch elimination?
comment:12 Changed 3 years ago by
Is it easy to also address this? In Sage + Python 2:
sage: all(hash(2^n-1)==hash(SR(2^n-1)) for n in range(60, 100)) True
In Sage + Python 3:
sage: hash(2^60-1) == hash(SR(2^60-1)) True sage: hash(2^61-1) == hash(SR(2^61-1)) False
comment:13 follow-up: ↓ 16 Changed 3 years ago by
Let us keep the SR issue for elsewhere. The branch looks good to me.
Jeroen, do you approve ?
comment:14 Changed 3 years ago by
The SR
issue might be upstream in Pynac.
comment:15 in reply to: ↑ 11 ; follow-up: ↓ 18 Changed 3 years ago by
Replying to gh-mwageringel:
I would like to understand better why this is more efficient and why it is important.
It's certainly not important. It's just that, out of two possible alternatives, the >
alternative is slightly faster.
Could you explain this please?
I don't think that there is a fundamental reason. It's just that the compiler generates more efficient instructions for the >
variant.
comment:16 in reply to: ↑ 13 Changed 3 years ago by
Replying to chapoton:
Jeroen, do you approve ?
Yes, I approve. I'm just waiting for the rewording of the comment.
comment:17 Changed 3 years ago by
- Commit changed from b1efe822fc4df0770f451dd5c44c71025329f0bb to b13f7f812bd02ae02a2a2549c860cc20cbc1bcd6
Branch pushed to git repo; I updated commit sha1. New commits:
b13f7f8 | 28112: reword a comment in hash function
|
comment:18 in reply to: ↑ 15 Changed 3 years ago by
Replying to jdemeyer:
Replying to gh-mwageringel:
I would like to understand better why this is more efficient and why it is important.
It's certainly not important. It's just that, out of two possible alternatives, the
>
alternative is slightly faster.Could you explain this please?
I don't think that there is a fundamental reason. It's just that the compiler generates more efficient instructions for the
>
variant.
Thank you for explaining. I changed the comment. It should be fine now.
comment:19 Changed 3 years ago by
- Reviewers changed from Jeroen Demeyer, John Palmieri to Jeroen Demeyer, John Palmieri, Frédéric Chapoton
- Status changed from needs_review to positive_review
I am setting to positive. Jeroen, if you do not agree, then undo
comment:20 Changed 3 years ago by
- Branch changed from u/gh-mwageringel/28112 to b13f7f812bd02ae02a2a2549c860cc20cbc1bcd6
- Resolution set to fixed
- Status changed from positive_review to closed
I have not had the chance to test this with Python 3 on a 32-bit machine. Instead, I just computed the 32-bit hashes in the doctests using a modulus of
2^31 - 1
. Could someone with access to a 32-bit machine confirm or correct these, please?New commits:
28112: py3: fix Integer hash and doctests