Opened 5 years ago

Closed 3 years ago

# fix hash of (dense) matrices

Reported by: Owned by: dkrenn major sage-duplicate/invalid/wontfix misc N/A adapt doctests u/dkrenn/hash-matrix 1e1263f56137d00f56eb828a7cf592b4f9649d2f

### Description

It took me only 4 guesses to find two matrices with the same hash:

```sage: N = Matrix([[1,1],[2,2]]); N.set_immutable(); hash(N)
3
sage: N = Matrix([[0,0],[0,1]]); N.set_immutable(); hash(N)
3
```

This should not be that easy.

In particular (looking at the code in `sage.matrix.matric_dense`), the hash value does not depend on entry 1,1 of the matrix (multiplication by zero in the hash code):

```sage: N = Matrix([[3,2],[0,0]]); N.set_immutable(); hash(N)
2
sage: N = Matrix([[443,2],[0,0]]); N.set_immutable(); hash(N)
2
sage: N = Matrix([[2,2],[0,0]]); N.set_immutable(); hash(N)
2
```

### comment:1 Changed 5 years ago by dkrenn

From `sage.matrix.matrix_dense`:

```        v = self._list()
cdef Py_ssize_t i
cdef long h = 0

for i from 0 <= i < len(v):
h = h ^ (i * hash(v[i]))
```

What is a good caching strategy? Simply `h = hash(tuple(self._list()))`?

### comment:2 Changed 4 years ago by ibookstein

Non-cryptographic hashes don't have to be resilient to collision attacks, but should indeed give good hash value spread for the input space - which this implementation doesn't.

I can confirm that this implementation causes massive performance degradation when matrices are used as dictionary keys, for instance.

While the current implementation does try to 'save' itself from the disadvantages of XORing hash values by multiplying by the index, it still performs very poorly.

I tried the solution dkrenn suggested and it does indeed work much better. If we don't care about the performance cost of creating a tuple (and a list), seeing as the hash value is cached, it's a nice solution.

### comment:3 Changed 4 years ago by ibookstein

• Milestone changed from sage-7.4 to sage-8.1

### comment:4 Changed 4 years ago by dkrenn

• Branch set to u/dkrenn/hash-matrix

### comment:5 Changed 4 years ago by dkrenn

• Commit set to 1e1263f56137d00f56eb828a7cf592b4f9649d2f
• Work issues set to adapt doctests

New commits:

 ​1e1263f `new hash function for matrix_dense`

### comment:6 Changed 3 years ago by ibookstein

Looks like major work was put into this issue in #19050, rendering this ticket a duplicate.

### comment:7 Changed 3 years ago by jdemeyer

• Milestone changed from sage-8.1 to sage-duplicate/invalid/wontfix
• Resolution set to duplicate
• Status changed from new to closed

Duplicate of #10950.

Note: See TracTickets for help on using tickets.