Opened 4 years ago

Last modified 2 months ago

#21284 new defect

hash of polynomials is very bad

Reported by: vdelecroix Owned by:
Priority: major Milestone: sage-9.3
Component: algebra Keywords:
Cc: jdemeyer Merged in:
Authors: Reviewers:
Report Upstream: N/A Work issues:
Branch: Commit:
Dependencies: Stopgaps:

Description (last modified by vdelecroix)

A lot of conflicts:

sage: S.<t> = QQ[]
sage: from itertools import product
sage: values = [S(t) for t in product((0,1/2,1,2), repeat=5)]
sage: len(set(map(hash,values)))
565
sage: len(values)
1024

Though hashing in this situation is delicate since we want to preserve the compatibility with:

  • sparse polynomials
  • multivariate polynomials

Change History (20)

comment:1 Changed 4 years ago by vdelecroix

  • Dependencies set to #21272

comment:2 Changed 4 years ago by kedlaya

I don't have product in my version of Sage, but by replacing that line with

sage: values = [S(list(t)) for t in cartesian_product([[0,1/2,1,2],] * 5)]

I was able to reproduce the issue. I need a fix for this in order to deal with hashes of ideals; see #21297.

comment:3 Changed 4 years ago by vdelecroix

  • Description modified (diff)

Right product comes from the itertools module.

comment:4 Changed 3 years ago by kedlaya

To make this more specific, in case this helps to chase this down:

sage: R.<t> = QQ[]
sage: u = 2*t^3 + t^2; v = t^3 + 2*t + 1
sage: hash(u), hash(v)
(29698519856292282, 29698519856292282)

The key bit of code is in the function hash_c (omitting some comments):

for i from 0<= i <= self.degree():
    if i == 1:
        # we delay the hashing until now to not waste it on a constant poly
        var_name_hash = hash(self._parent._names[0])
    c_hash = hash(self[i])
    if c_hash != 0:
        if i == 0:
            result += c_hash
        else:
            # Hash (self[i], generator, i) as a tuple according to the algorithm.
            result_mon = c_hash
            result_mon = (1000003 * result_mon) ^ var_name_hash
            result_mon = (1000003 * result_mon) ^ i
            result += result_mon

which seems just not to work at all. Probably a complete rethink is in order here.

comment:5 Changed 3 years ago by vdelecroix

(related: I did something for nf elements at #23402)

comment:6 Changed 3 years ago by vdelecroix

Once #23402 is positively reviewed I will port the corresponding code here

comment:7 Changed 3 years ago by kedlaya

  • Dependencies changed from #21272 to #21272, #23402

Let's make #23402 a dependency then.

comment:8 Changed 3 years ago by vdelecroix

Actually what I did in #23402 has to be rethought since it does not apply to sparse polynomials... the hashing function needs to involve the degree of the monomials in some way.

comment:9 Changed 3 years ago by jdemeyer

So, is #23402 then a dependency or not?

comment:10 Changed 3 years ago by vdelecroix

  • Dependencies changed from #21272, #23402 to #23402
  • Description modified (diff)

comment:11 Changed 3 years ago by vdelecroix

  • Dependencies #23402 deleted

Let us remove the dependency as there is no coercion/hash compatibility involved between polynomial rings and number fields.

comment:12 Changed 3 years ago by vdelecroix

  • Milestone changed from sage-7.4 to sage-8.1

comment:13 follow-up: Changed 11 months ago by kedlaya

  • Milestone changed from sage-8.1 to sage-9.1

Ping. Something has changed in the interim, as my example from comment:4 is no longer applicable:

sage: R.<t> = QQ[]
....: u = 2*t^3 + t^2; v = t^3 + 2*t + 1
....: hash(u), hash(v)
(-7745487741690187698, -7745487741690187694)

but there is still an overall issue:

sage: S.<t> = QQ[]
sage: from itertools import product
sage: values = [S(t) for t in product((0,1/2,1,2), repeat=5)]
sage: len(set(map(hash,values)))
491
sage: len(values)
1024

comment:14 in reply to: ↑ 13 Changed 11 months ago by gh-mwageringel

  • Cc jdemeyer added

Replying to kedlaya:

Ping. Something has changed in the interim, as my example from comment:4 is no longer applicable:

This is because of the switch to Python 3, as strings (of the variable names) are hashed differently. With a Python 2 based Sage, it still behaves as before.


Also note that this

            # Hash (self[i], generator, i) as a tuple according to the algorithm.
            result_mon = c_hash
            result_mon = (1000003 * result_mon) ^ var_name_hash
            result_mon = (1000003 * result_mon) ^ i
            result += result_mon

essentially copies the implementation of hashing of tuples in Python 2. This was found to have many collisions on simple input and was replaced by a better hash function (xxHash) in Python 3.8; see https://bugs.python.org/issue34751. I guess we could just adopt this new tuple hash here to get better collision resistance.

comment:15 follow-up: Changed 11 months ago by vdelecroix

Sounds reasonable. Does it also fix

sage: hash(-1) == hash(-2)
True

comment:16 in reply to: ↑ 15 Changed 11 months ago by gh-mwageringel

No, that is still the same. IIRC, it was mentioned somewhere that this pattern is baked too deeply into Python as well as third-party code to change it at this point.

comment:17 Changed 11 months ago by vdelecroix

Then using the tuple hash is probably not what you want to do (below with Python 3.8.0)

>>> from itertools import product
>>> for p in product([-1,-2], repeat=3): print(hash(p)) 
-6133384102531166807
-6133384102531166807
-6133384102531166807
-6133384102531166807
-6133384102531166807
-6133384102531166807
-6133384102531166807
-6133384102531166807

comment:18 Changed 11 months ago by gh-mwageringel

This is probably difficult to avoid in general, since the hashing function for polynomials should incorporate the hash values of its coefficients. As the hash values of -1 and -2 are the same, all polynomials that differ only in terms of coefficients of -1 or -2 will necessarily have the same hash.

Cython automatically returns -2 if a hashing function returns -1, so to avoid this, one would have to add a separate hashing function (which Cython does not know about) to the elements of all coefficient rings – or find some other way of obtaining hash values of -1.

Nevertheless, the problem in the ticket description is independent from this -1/-2 issue, so hashing could still be improved a lot for polynomials not involving coefficients of -1.

comment:19 Changed 7 months ago by mkoeppe

  • Milestone changed from sage-9.1 to sage-9.2

Batch modifying tickets that will likely not be ready for 9.1, based on a review of the ticket title, branch/review status, and last modification date.

comment:20 Changed 2 months ago by mkoeppe

  • Milestone changed from sage-9.2 to sage-9.3
Note: See TracTickets for help on using tickets.