Opened 4 years ago
Last modified 2 months ago
#21284 new defect
hash of polynomials is very bad
Reported by: | vdelecroix | Owned by: | |
---|---|---|---|
Priority: | major | Milestone: | sage-9.3 |
Component: | algebra | Keywords: | |
Cc: | jdemeyer | Merged in: | |
Authors: | Reviewers: | ||
Report Upstream: | N/A | Work issues: | |
Branch: | Commit: | ||
Dependencies: | Stopgaps: |
Description (last modified by )
A lot of conflicts:
sage: S.<t> = QQ[] sage: from itertools import product sage: values = [S(t) for t in product((0,1/2,1,2), repeat=5)] sage: len(set(map(hash,values))) 565 sage: len(values) 1024
Though hashing in this situation is delicate since we want to preserve the compatibility with:
- sparse polynomials
- multivariate polynomials
Change History (20)
comment:1 Changed 4 years ago by
- Dependencies set to #21272
comment:2 Changed 4 years ago by
comment:3 Changed 4 years ago by
- Description modified (diff)
Right product
comes from the itertools
module.
comment:4 Changed 3 years ago by
To make this more specific, in case this helps to chase this down:
sage: R.<t> = QQ[] sage: u = 2*t^3 + t^2; v = t^3 + 2*t + 1 sage: hash(u), hash(v) (29698519856292282, 29698519856292282)
The key bit of code is in the function hash_c
(omitting some comments):
for i from 0<= i <= self.degree(): if i == 1: # we delay the hashing until now to not waste it on a constant poly var_name_hash = hash(self._parent._names[0]) c_hash = hash(self[i]) if c_hash != 0: if i == 0: result += c_hash else: # Hash (self[i], generator, i) as a tuple according to the algorithm. result_mon = c_hash result_mon = (1000003 * result_mon) ^ var_name_hash result_mon = (1000003 * result_mon) ^ i result += result_mon
which seems just not to work at all. Probably a complete rethink is in order here.
comment:5 Changed 3 years ago by
(related: I did something for nf elements at #23402)
comment:6 Changed 3 years ago by
Once #23402 is positively reviewed I will port the corresponding code here
comment:7 Changed 3 years ago by
- Dependencies changed from #21272 to #21272, #23402
Let's make #23402 a dependency then.
comment:8 Changed 3 years ago by
Actually what I did in #23402 has to be rethought since it does not apply to sparse polynomials... the hashing function needs to involve the degree of the monomials in some way.
comment:9 Changed 3 years ago by
So, is #23402 then a dependency or not?
comment:10 Changed 3 years ago by
- Dependencies changed from #21272, #23402 to #23402
- Description modified (diff)
comment:11 Changed 3 years ago by
- Dependencies #23402 deleted
Let us remove the dependency as there is no coercion/hash compatibility involved between polynomial rings and number fields.
comment:12 Changed 3 years ago by
- Milestone changed from sage-7.4 to sage-8.1
comment:13 follow-up: ↓ 14 Changed 11 months ago by
- Milestone changed from sage-8.1 to sage-9.1
Ping. Something has changed in the interim, as my example from comment:4 is no longer applicable:
sage: R.<t> = QQ[] ....: u = 2*t^3 + t^2; v = t^3 + 2*t + 1 ....: hash(u), hash(v) (-7745487741690187698, -7745487741690187694)
but there is still an overall issue:
sage: S.<t> = QQ[] sage: from itertools import product sage: values = [S(t) for t in product((0,1/2,1,2), repeat=5)] sage: len(set(map(hash,values))) 491 sage: len(values) 1024
comment:14 in reply to: ↑ 13 Changed 11 months ago by
- Cc jdemeyer added
Replying to kedlaya:
Ping. Something has changed in the interim, as my example from comment:4 is no longer applicable:
This is because of the switch to Python 3, as strings (of the variable names) are hashed differently. With a Python 2 based Sage, it still behaves as before.
Also note that this
# Hash (self[i], generator, i) as a tuple according to the algorithm. result_mon = c_hash result_mon = (1000003 * result_mon) ^ var_name_hash result_mon = (1000003 * result_mon) ^ i result += result_mon
essentially copies the implementation of hashing of tuples in Python 2. This was found to have many collisions on simple input and was replaced by a better hash function (xxHash) in Python 3.8; see https://bugs.python.org/issue34751. I guess we could just adopt this new tuple hash here to get better collision resistance.
comment:15 follow-up: ↓ 16 Changed 11 months ago by
Sounds reasonable. Does it also fix
sage: hash(-1) == hash(-2) True
comment:16 in reply to: ↑ 15 Changed 11 months ago by
No, that is still the same. IIRC, it was mentioned somewhere that this pattern is baked too deeply into Python as well as third-party code to change it at this point.
comment:17 Changed 11 months ago by
Then using the tuple hash is probably not what you want to do (below with Python 3.8.0)
>>> from itertools import product >>> for p in product([-1,-2], repeat=3): print(hash(p)) -6133384102531166807 -6133384102531166807 -6133384102531166807 -6133384102531166807 -6133384102531166807 -6133384102531166807 -6133384102531166807 -6133384102531166807
comment:18 Changed 11 months ago by
This is probably difficult to avoid in general, since the hashing function for polynomials should incorporate the hash values of its coefficients. As the hash values of -1 and -2 are the same, all polynomials that differ only in terms of coefficients of -1 or -2 will necessarily have the same hash.
Cython automatically returns -2 if a hashing function returns -1, so to avoid this, one would have to add a separate hashing function (which Cython does not know about) to the elements of all coefficient rings – or find some other way of obtaining hash values of -1.
Nevertheless, the problem in the ticket description is independent from this -1/-2 issue, so hashing could still be improved a lot for polynomials not involving coefficients of -1.
comment:19 Changed 7 months ago by
- Milestone changed from sage-9.1 to sage-9.2
Batch modifying tickets that will likely not be ready for 9.1, based on a review of the ticket title, branch/review status, and last modification date.
comment:20 Changed 2 months ago by
- Milestone changed from sage-9.2 to sage-9.3
I don't have
product
in my version of Sage, but by replacing that line withI was able to reproduce the issue. I need a fix for this in order to deal with hashes of ideals; see #21297.