Opened 6 years ago

# hash of polynomials is very bad

Reported by: Owned by: vdelecroix major sage-9.7 algebra jdemeyer N/A

A lot of conflicts:

```sage: S.<t> = QQ[]
sage: from itertools import product
sage: values = [S(t) for t in product((0,1/2,1,2), repeat=5)]
sage: len(set(map(hash,values)))
565
sage: len(values)
1024
```

Though hashing in this situation is delicate since we want to preserve the compatibility with:

• sparse polynomials
• multivariate polynomials

### comment:1 Changed 6 years ago by vdelecroix

• Dependencies set to #21272

### comment:2 Changed 6 years ago by kedlaya

I don't have `product` in my version of Sage, but by replacing that line with

```sage: values = [S(list(t)) for t in cartesian_product([[0,1/2,1,2],] * 5)]
```

I was able to reproduce the issue. I need a fix for this in order to deal with hashes of ideals; see #21297.

### comment:3 Changed 6 years ago by vdelecroix

• Description modified (diff)

Right `product` comes from the `itertools` module.

### comment:4 Changed 5 years ago by kedlaya

To make this more specific, in case this helps to chase this down:

```sage: R.<t> = QQ[]
sage: u = 2*t^3 + t^2; v = t^3 + 2*t + 1
sage: hash(u), hash(v)
(29698519856292282, 29698519856292282)
```

The key bit of code is in the function `hash_c` (omitting some comments):

```for i from 0<= i <= self.degree():
if i == 1:
# we delay the hashing until now to not waste it on a constant poly
var_name_hash = hash(self._parent._names[0])
c_hash = hash(self[i])
if c_hash != 0:
if i == 0:
result += c_hash
else:
# Hash (self[i], generator, i) as a tuple according to the algorithm.
result_mon = c_hash
result_mon = (1000003 * result_mon) ^ var_name_hash
result_mon = (1000003 * result_mon) ^ i
result += result_mon
```

which seems just not to work at all. Probably a complete rethink is in order here.

### comment:5 Changed 5 years ago by vdelecroix

(related: I did something for nf elements at #23402)

### comment:6 Changed 5 years ago by vdelecroix

Once #23402 is positively reviewed I will port the corresponding code here

### comment:7 Changed 5 years ago by kedlaya

• Dependencies changed from #21272 to #21272, #23402

Let's make #23402 a dependency then.

### comment:8 Changed 5 years ago by vdelecroix

Actually what I did in #23402 has to be rethought since it does not apply to sparse polynomials... the hashing function needs to involve the degree of the monomials in some way.

### comment:9 Changed 5 years ago by jdemeyer

So, is #23402 then a dependency or not?

### comment:10 Changed 5 years ago by vdelecroix

• Dependencies changed from #21272, #23402 to #23402
• Description modified (diff)

### comment:11 Changed 5 years ago by vdelecroix

• Dependencies #23402 deleted

Let us remove the dependency as there is no coercion/hash compatibility involved between polynomial rings and number fields.

### comment:12 Changed 5 years ago by vdelecroix

• Milestone changed from sage-7.4 to sage-8.1

### comment:13 follow-up: ↓ 14 Changed 3 years ago by kedlaya

• Milestone changed from sage-8.1 to sage-9.1

Ping. Something has changed in the interim, as my example from comment:4 is no longer applicable:

```sage: R.<t> = QQ[]
....: u = 2*t^3 + t^2; v = t^3 + 2*t + 1
....: hash(u), hash(v)
(-7745487741690187698, -7745487741690187694)
```

but there is still an overall issue:

```sage: S.<t> = QQ[]
sage: from itertools import product
sage: values = [S(t) for t in product((0,1/2,1,2), repeat=5)]
sage: len(set(map(hash,values)))
491
sage: len(values)
1024
```

### comment:14 in reply to: ↑ 13 Changed 3 years ago by gh-mwageringel

• Cc jdemeyer added

Ping. Something has changed in the interim, as my example from comment:4 is no longer applicable:

This is because of the switch to Python 3, as strings (of the variable names) are hashed differently. With a Python 2 based Sage, it still behaves as before.

Also note that this

```            # Hash (self[i], generator, i) as a tuple according to the algorithm.
result_mon = c_hash
result_mon = (1000003 * result_mon) ^ var_name_hash
result_mon = (1000003 * result_mon) ^ i
result += result_mon
```

essentially copies the implementation of hashing of tuples in Python 2. This was found to have many collisions on simple input and was replaced by a better hash function (xxHash) in Python 3.8; see https://bugs.python.org/issue34751. I guess we could just adopt this new tuple hash here to get better collision resistance.

### comment:15 follow-up: ↓ 16 Changed 3 years ago by vdelecroix

Sounds reasonable. Does it also fix

```sage: hash(-1) == hash(-2)
True
```

### comment:16 in reply to: ↑ 15 Changed 3 years ago by gh-mwageringel

No, that is still the same. IIRC, it was mentioned somewhere that this pattern is baked too deeply into Python as well as third-party code to change it at this point.

### comment:17 Changed 3 years ago by vdelecroix

Then using the tuple hash is probably not what you want to do (below with Python 3.8.0)

```>>> from itertools import product
>>> for p in product([-1,-2], repeat=3): print(hash(p))
-6133384102531166807
-6133384102531166807
-6133384102531166807
-6133384102531166807
-6133384102531166807
-6133384102531166807
-6133384102531166807
-6133384102531166807
```

### comment:18 Changed 3 years ago by gh-mwageringel

This is probably difficult to avoid in general, since the hashing function for polynomials should incorporate the hash values of its coefficients. As the hash values of -1 and -2 are the same, all polynomials that differ only in terms of coefficients of -1 or -2 will necessarily have the same hash.

Cython automatically returns -2 if a hashing function returns -1, so to avoid this, one would have to add a separate hashing function (which Cython does not know about) to the elements of all coefficient rings – or find some other way of obtaining hash values of -1.

Nevertheless, the problem in the ticket description is independent from this -1/-2 issue, so hashing could still be improved a lot for polynomials not involving coefficients of -1.

### comment:19 Changed 2 years ago by mkoeppe

• Milestone changed from sage-9.1 to sage-9.2

Batch modifying tickets that will likely not be ready for 9.1, based on a review of the ticket title, branch/review status, and last modification date.

### comment:20 Changed 2 years ago by mkoeppe

• Milestone changed from sage-9.2 to sage-9.3

### comment:21 Changed 18 months ago by mkoeppe

• Milestone changed from sage-9.3 to sage-9.4

Setting new milestone based on a cursory review of ticket status, priority, and last modification date.

### comment:22 Changed 13 months ago by mkoeppe

• Milestone changed from sage-9.4 to sage-9.5

### comment:23 Changed 8 months ago by mkoeppe

• Milestone changed from sage-9.5 to sage-9.6

### comment:24 Changed 4 months ago by mkoeppe

• Milestone changed from sage-9.6 to sage-9.7
Note: See TracTickets for help on using tickets.