Opened 5 months ago

Closed 4 months ago

#28112 closed defect (fixed)

py3: Fix hash function of Integer

Reported by: gh-mwageringel Owned by:
Priority: major Milestone: sage-8.9
Component: python3 Keywords:
Cc: Merged in:
Authors: Markus Wageringel Reviewers: Jeroen Demeyer, John Palmieri, Frédéric Chapoton
Report Upstream: N/A Work issues:
Branch: b13f7f8 (Commits) Commit: b13f7f812bd02ae02a2a2549c860cc20cbc1bcd6
Dependencies: Stopgaps:

Description

This ticket fixes an issue in sage.libs.gmp.pylong.mpz_pythonhash, the hash function for Integer, and resolves the Python 3 doctest failures in sage/rings/integer.pyx.

More precisely, the implementation of mpz_pythonhash is using the fact that y <= modulus, as explained in the comment

        # Safely compute h = (h + y) % modulus knowing that h < modulus
        # and y <= modulus

However, this assumption is not always correct, as y is the sum of up to three numbers. For example, this fails in

sage: hash(2^64 - 1), hash(int(2^64 - 1))
(2305843009213693958, 7)

Change History (20)

comment:1 follow-up: Changed 5 months ago by gh-mwageringel

  • Authors set to Markus Wageringel
  • Branch set to u/gh-mwageringel/28112
  • Commit set to ba487ec15e63bff0e51f3d6afd5fb632f2b63d01
  • Status changed from new to needs_review
  • Summary changed from py3: Fix for Integer hash to py3: Fix hash function of Integer

I have not had the chance to test this with Python 3 on a 32-bit machine. Instead, I just computed the 32-bit hashes in the doctests using a modulus of 2^31 - 1. Could someone with access to a 32-bit machine confirm or correct these, please?


New commits:

ba487ec28112: py3: fix Integer hash and doctests

comment:2 Changed 5 months ago by jdemeyer

  • Reviewers set to Jeroen Demeyer

comment:3 in reply to: ↑ 1 ; follow-up: Changed 5 months ago by jdemeyer

Replying to gh-mwageringel:

I have not had the chance to test this with Python 3 on a 32-bit machine.

Do we really need the exact values in the docstring? Wouldn't it be sufficient to mark those tests with # random? The only thing really that matters is that hash(n) == hash(int(n)).

comment:4 Changed 5 months ago by jdemeyer

  • Status changed from needs_review to needs_work

Given that y is less than 2 * modulus, it is more efficient to do

if y > modulus:
    y -= modulus

(Note that the condition y > modulus instead of y >= modulus is intentional as it generates better code).

comment:5 Changed 5 months ago by git

  • Commit changed from ba487ec15e63bff0e51f3d6afd5fb632f2b63d01 to 87702411d84e713195bff3d8828d26f01ec9aa38

Branch pushed to git repo; I updated commit sha1. This was a forced push. New commits:

877024128112: py3: fix Integer hash and doctests

comment:6 in reply to: ↑ 3 Changed 5 months ago by gh-mwageringel

  • Status changed from needs_work to needs_review

Replying to jdemeyer:

Do we really need the exact values in the docstring? Wouldn't it be sufficient to mark those tests with # random? The only thing really that matters is that hash(n) == hash(int(n)).

I agree. I updated the tests accordingly, keeping just the 64-bit Python 3 output. At some point, this should still be tested with 32-bit, though.

I also applied the other suggested change. Thanks for the feedback.

comment:7 follow-up: Changed 5 months ago by jhpalmieri

I would prefer not to use long in doctests, since it is not supported in Python 3. The only reason it works in Sage now is because of a hack that converts long to int when using Python 3. Maybe instead separate doctests?

sage: n.__hash__() == hash(long(n)) # py2
True
sage: n.__hash__() == hash(n) # py3
True

comment:8 in reply to: ↑ 7 Changed 5 months ago by gh-mwageringel

Replying to jhpalmieri:

I would prefer not to use long in doctests, since it is not supported in Python 3. The only reason it works in Sage now is because of a hack that converts long to int when using Python 3. Maybe instead separate doctests?

Good point. Instead of having separate doctests, I think it is preferable to replace long(n) by int(n) which should work in both Python 2 and 3. In Python 2, it will still return a long if the number is too large.

Last edited 5 months ago by gh-mwageringel (previous) (diff)

comment:9 Changed 5 months ago by git

  • Commit changed from 87702411d84e713195bff3d8828d26f01ec9aa38 to b1efe822fc4df0770f451dd5c44c71025329f0bb

Branch pushed to git repo; I updated commit sha1. New commits:

b1efe8228112: py3: replace long by int in Integer doctests

comment:10 follow-up: Changed 5 months ago by jdemeyer

  • Reviewers changed from Jeroen Demeyer to Jeroen Demeyer, John Palmieri

One final nitpick: this comment # This is safe, since y <= 2 * modulus < 2 ^ limb_bits. is not clear, I don't know what you're trying to say with it. I would say something like

# At this point, y < 2 * modulus but we need y <= modulus
# We use > instead of >= on the line below because it
# generates more efficient code

comment:11 in reply to: ↑ 10 ; follow-up: Changed 4 months ago by gh-mwageringel

Replying to jdemeyer:

One final nitpick: this comment # This is safe, since y <= 2 * modulus < 2 ^ limb_bits. is not clear, I don't know what you're trying to say with it.

What I was trying to emphasize is that y did not overflow which is important for the comparison to work. I will rephrase it to make this clearer.

Before adding a comment about efficiency, I would like to understand better why this is more efficient and why it is important. Could you explain this please? Is it because it saves a subtraction occasionally, or are > comparisons generally faster than >=, or maybe would it help the compiler do branch elimination?

Last edited 4 months ago by gh-mwageringel (previous) (diff)

comment:12 Changed 4 months ago by jhpalmieri

Is it easy to also address this? In Sage + Python 2:

sage: all(hash(2^n-1)==hash(SR(2^n-1)) for n in range(60, 100))
True

In Sage + Python 3:

sage: hash(2^60-1) == hash(SR(2^60-1))
True
sage: hash(2^61-1) == hash(SR(2^61-1))
False

comment:13 follow-up: Changed 4 months ago by chapoton

Let us keep the SR issue for elsewhere. The branch looks good to me.

Jeroen, do you approve ?

comment:14 Changed 4 months ago by jdemeyer

The SR issue might be upstream in Pynac.

comment:15 in reply to: ↑ 11 ; follow-up: Changed 4 months ago by jdemeyer

Replying to gh-mwageringel:

I would like to understand better why this is more efficient and why it is important.

It's certainly not important. It's just that, out of two possible alternatives, the > alternative is slightly faster.

Could you explain this please?

I don't think that there is a fundamental reason. It's just that the compiler generates more efficient instructions for the > variant.

comment:16 in reply to: ↑ 13 Changed 4 months ago by jdemeyer

Replying to chapoton:

Jeroen, do you approve ?

Yes, I approve. I'm just waiting for the rewording of the comment.

comment:17 Changed 4 months ago by git

  • Commit changed from b1efe822fc4df0770f451dd5c44c71025329f0bb to b13f7f812bd02ae02a2a2549c860cc20cbc1bcd6

Branch pushed to git repo; I updated commit sha1. New commits:

b13f7f828112: reword a comment in hash function

comment:18 in reply to: ↑ 15 Changed 4 months ago by gh-mwageringel

Replying to jdemeyer:

Replying to gh-mwageringel:

I would like to understand better why this is more efficient and why it is important.

It's certainly not important. It's just that, out of two possible alternatives, the > alternative is slightly faster.

Could you explain this please?

I don't think that there is a fundamental reason. It's just that the compiler generates more efficient instructions for the > variant.

Thank you for explaining. I changed the comment. It should be fine now.

comment:19 Changed 4 months ago by chapoton

  • Reviewers changed from Jeroen Demeyer, John Palmieri to Jeroen Demeyer, John Palmieri, Frédéric Chapoton
  • Status changed from needs_review to positive_review

I am setting to positive. Jeroen, if you do not agree, then undo

comment:20 Changed 4 months ago by vbraun

  • Branch changed from u/gh-mwageringel/28112 to b13f7f812bd02ae02a2a2549c860cc20cbc1bcd6
  • Resolution set to fixed
  • Status changed from positive_review to closed
Note: See TracTickets for help on using tickets.