Opened 6 years ago
Last modified 4 months ago
#21284 new defect
hash of polynomials is very bad
Reported by: | vdelecroix | Owned by: | |
---|---|---|---|
Priority: | major | Milestone: | sage-9.7 |
Component: | algebra | Keywords: | |
Cc: | jdemeyer | Merged in: | |
Authors: | Reviewers: | ||
Report Upstream: | N/A | Work issues: | |
Branch: | Commit: | ||
Dependencies: | Stopgaps: |
Description (last modified by )
A lot of conflicts:
sage: S.<t> = QQ[] sage: from itertools import product sage: values = [S(t) for t in product((0,1/2,1,2), repeat=5)] sage: len(set(map(hash,values))) 565 sage: len(values) 1024
Though hashing in this situation is delicate since we want to preserve the compatibility with:
- sparse polynomials
- multivariate polynomials
Change History (24)
comment:1 Changed 6 years ago by
- Dependencies set to #21272
comment:2 Changed 6 years ago by
comment:3 Changed 6 years ago by
- Description modified (diff)
Right product
comes from the itertools
module.
comment:4 Changed 5 years ago by
To make this more specific, in case this helps to chase this down:
sage: R.<t> = QQ[] sage: u = 2*t^3 + t^2; v = t^3 + 2*t + 1 sage: hash(u), hash(v) (29698519856292282, 29698519856292282)
The key bit of code is in the function hash_c
(omitting some comments):
for i from 0<= i <= self.degree(): if i == 1: # we delay the hashing until now to not waste it on a constant poly var_name_hash = hash(self._parent._names[0]) c_hash = hash(self[i]) if c_hash != 0: if i == 0: result += c_hash else: # Hash (self[i], generator, i) as a tuple according to the algorithm. result_mon = c_hash result_mon = (1000003 * result_mon) ^ var_name_hash result_mon = (1000003 * result_mon) ^ i result += result_mon
which seems just not to work at all. Probably a complete rethink is in order here.
comment:5 Changed 5 years ago by
(related: I did something for nf elements at #23402)
comment:6 Changed 5 years ago by
Once #23402 is positively reviewed I will port the corresponding code here
comment:7 Changed 5 years ago by
- Dependencies changed from #21272 to #21272, #23402
Let's make #23402 a dependency then.
comment:8 Changed 5 years ago by
Actually what I did in #23402 has to be rethought since it does not apply to sparse polynomials... the hashing function needs to involve the degree of the monomials in some way.
comment:9 Changed 5 years ago by
So, is #23402 then a dependency or not?
comment:10 Changed 5 years ago by
- Dependencies changed from #21272, #23402 to #23402
- Description modified (diff)
comment:11 Changed 5 years ago by
- Dependencies #23402 deleted
Let us remove the dependency as there is no coercion/hash compatibility involved between polynomial rings and number fields.
comment:12 Changed 5 years ago by
- Milestone changed from sage-7.4 to sage-8.1
comment:13 follow-up: ↓ 14 Changed 3 years ago by
- Milestone changed from sage-8.1 to sage-9.1
Ping. Something has changed in the interim, as my example from comment:4 is no longer applicable:
sage: R.<t> = QQ[] ....: u = 2*t^3 + t^2; v = t^3 + 2*t + 1 ....: hash(u), hash(v) (-7745487741690187698, -7745487741690187694)
but there is still an overall issue:
sage: S.<t> = QQ[] sage: from itertools import product sage: values = [S(t) for t in product((0,1/2,1,2), repeat=5)] sage: len(set(map(hash,values))) 491 sage: len(values) 1024
comment:14 in reply to: ↑ 13 Changed 3 years ago by
- Cc jdemeyer added
Replying to kedlaya:
Ping. Something has changed in the interim, as my example from comment:4 is no longer applicable:
This is because of the switch to Python 3, as strings (of the variable names) are hashed differently. With a Python 2 based Sage, it still behaves as before.
Also note that this
# Hash (self[i], generator, i) as a tuple according to the algorithm. result_mon = c_hash result_mon = (1000003 * result_mon) ^ var_name_hash result_mon = (1000003 * result_mon) ^ i result += result_mon
essentially copies the implementation of hashing of tuples in Python 2. This was found to have many collisions on simple input and was replaced by a better hash function (xxHash) in Python 3.8; see https://bugs.python.org/issue34751. I guess we could just adopt this new tuple hash here to get better collision resistance.
comment:15 follow-up: ↓ 16 Changed 3 years ago by
Sounds reasonable. Does it also fix
sage: hash(-1) == hash(-2) True
comment:16 in reply to: ↑ 15 Changed 3 years ago by
No, that is still the same. IIRC, it was mentioned somewhere that this pattern is baked too deeply into Python as well as third-party code to change it at this point.
comment:17 Changed 3 years ago by
Then using the tuple hash is probably not what you want to do (below with Python 3.8.0)
>>> from itertools import product >>> for p in product([-1,-2], repeat=3): print(hash(p)) -6133384102531166807 -6133384102531166807 -6133384102531166807 -6133384102531166807 -6133384102531166807 -6133384102531166807 -6133384102531166807 -6133384102531166807
comment:18 Changed 3 years ago by
This is probably difficult to avoid in general, since the hashing function for polynomials should incorporate the hash values of its coefficients. As the hash values of -1 and -2 are the same, all polynomials that differ only in terms of coefficients of -1 or -2 will necessarily have the same hash.
Cython automatically returns -2 if a hashing function returns -1, so to avoid this, one would have to add a separate hashing function (which Cython does not know about) to the elements of all coefficient rings – or find some other way of obtaining hash values of -1.
Nevertheless, the problem in the ticket description is independent from this -1/-2 issue, so hashing could still be improved a lot for polynomials not involving coefficients of -1.
comment:19 Changed 2 years ago by
- Milestone changed from sage-9.1 to sage-9.2
Batch modifying tickets that will likely not be ready for 9.1, based on a review of the ticket title, branch/review status, and last modification date.
comment:20 Changed 2 years ago by
- Milestone changed from sage-9.2 to sage-9.3
comment:21 Changed 18 months ago by
- Milestone changed from sage-9.3 to sage-9.4
Setting new milestone based on a cursory review of ticket status, priority, and last modification date.
comment:22 Changed 13 months ago by
- Milestone changed from sage-9.4 to sage-9.5
comment:23 Changed 8 months ago by
- Milestone changed from sage-9.5 to sage-9.6
comment:24 Changed 4 months ago by
- Milestone changed from sage-9.6 to sage-9.7
I don't have
product
in my version of Sage, but by replacing that line withI was able to reproduce the issue. I need a fix for this in order to deal with hashes of ideals; see #21297.