Opened 5 years ago

Closed 3 years ago

#21316 closed defect (duplicate)

fix hash of (dense) matrices

Reported by: dkrenn Owned by:
Priority: major Milestone: sage-duplicate/invalid/wontfix
Component: misc Keywords:
Cc: Merged in:
Authors: Reviewers:
Report Upstream: N/A Work issues: adapt doctests
Branch: u/dkrenn/hash-matrix (Commits, GitHub, GitLab) Commit: 1e1263f56137d00f56eb828a7cf592b4f9649d2f
Dependencies: Stopgaps:

Status badges


It took me only 4 guesses to find two matrices with the same hash:

sage: N = Matrix([[1,1],[2,2]]); N.set_immutable(); hash(N)
sage: N = Matrix([[0,0],[0,1]]); N.set_immutable(); hash(N)

This should not be that easy.

In particular (looking at the code in sage.matrix.matric_dense), the hash value does not depend on entry 1,1 of the matrix (multiplication by zero in the hash code):

sage: N = Matrix([[3,2],[0,0]]); N.set_immutable(); hash(N)
sage: N = Matrix([[443,2],[0,0]]); N.set_immutable(); hash(N)
sage: N = Matrix([[2,2],[0,0]]); N.set_immutable(); hash(N)

Change History (7)

comment:1 Changed 5 years ago by dkrenn

From sage.matrix.matrix_dense:

        v = self._list()
        cdef Py_ssize_t i
        cdef long h = 0

        for i from 0 <= i < len(v):
            h = h ^ (i * hash(v[i]))

What is a good caching strategy? Simply h = hash(tuple(self._list()))?

comment:2 Changed 4 years ago by ibookstein

Non-cryptographic hashes don't have to be resilient to collision attacks, but should indeed give good hash value spread for the input space - which this implementation doesn't.

I can confirm that this implementation causes massive performance degradation when matrices are used as dictionary keys, for instance.

See tuple hash algorithm and explanations why XORing hash values doesn't perform very well.

While the current implementation does try to 'save' itself from the disadvantages of XORing hash values by multiplying by the index, it still performs very poorly.

I tried the solution dkrenn suggested and it does indeed work much better. If we don't care about the performance cost of creating a tuple (and a list), seeing as the hash value is cached, it's a nice solution.

comment:3 Changed 4 years ago by ibookstein

  • Milestone changed from sage-7.4 to sage-8.1

comment:4 Changed 4 years ago by dkrenn

  • Branch set to u/dkrenn/hash-matrix

comment:5 Changed 4 years ago by dkrenn

  • Commit set to 1e1263f56137d00f56eb828a7cf592b4f9649d2f
  • Work issues set to adapt doctests

New commits:

1e1263fnew hash function for matrix_dense

comment:6 Changed 3 years ago by ibookstein

Looks like major work was put into this issue in #19050, rendering this ticket a duplicate.

comment:7 Changed 3 years ago by jdemeyer

  • Milestone changed from sage-8.1 to sage-duplicate/invalid/wontfix
  • Resolution set to duplicate
  • Status changed from new to closed

Duplicate of #10950.

Note: See TracTickets for help on using tickets.