Opened 11 years ago
Last modified 11 years ago
#12290 closed defect
Fix the hash of matrix spaces and improve its performance — at Version 5
Reported by: | Simon King | Owned by: | jason, was |
---|---|---|---|
Priority: | critical | Milestone: | sage-5.0 |
Component: | linear algebra | Keywords: | hash matrix space unique parent |
Cc: | Merged in: | ||
Authors: | Simon King | Reviewers: | |
Report Upstream: | N/A | Work issues: | |
Branch: | Commit: | ||
Dependencies: | Stopgaps: |
Description (last modified by )
The central assumptions for any hash function is: If two objects are equal then they must have the same hash. That assumption is violated for matrix spaces:
sage: M = MatrixSpace(ZZ, 10) sage: N = MatrixSpace(ZZ, 10,sparse=True) sage: N Full MatrixSpace of 10 by 10 sparse matrices over Integer Ring sage: M Full MatrixSpace of 10 by 10 dense matrices over Integer Ring sage: M == N True sage: hash(M)==hash(N) False
That has to be fixed. Moreover, the hash of matrix spaces is rather sluggish and should thus be improved speed-wise:
sage: %timeit hash(M) 625 loops, best of 3: 13.8 µs per loop
The root of both evils is the generic __hash__
method inherited from SageObject
:
def __hash__(self): return hash(self.__repr__())
Apply
Change History (7)
Changed 11 years ago by
Attachment: | trac12290_matrixspace_hash.patch added |
---|
comment:1 Changed 11 years ago by
Authors: | → Simon King |
---|---|
Status: | new → needs_review |
With my patch, the __hash__
method returns the hash of the same data that are used for comparison of matrix spaces. Hence, we have the new doc test
sage: M = MatrixSpace(ZZ, 10) sage: N = MatrixSpace(ZZ, 10,sparse=True) sage: M == N True sage: hash(M)==hash(N) True
Now about speed. If one computes the hash value over and over again by
def __hash__(self): return hash((self.base(),self.__nrows, self.__ncols))
then the timing is
sage: timeit("hash(M)", number=10^6) 1000000 loops, best of 3: 1.72 µs per loop
If the hash value is stored in a single underscore attribute, such as
def __hash__(self): try: return self._hash except AttributeError: self._hash = h = hash((self.base(),self.__nrows, self.__ncols)) return h
then one gets
sage: timeit("hash(M)", number=10^6) 1000000 loops, best of 3: 801 ns per loop
With a double-underscore __hash
instead of _hash
, one has
sage: timeit("hash(M)", number=10^6) 1000000 loops, best of 3: 712 ns per loop
With directly accessing the __dict__
such as
def __hash__(self): try: return self.__dict__['_hash'] except KeyError: self.__dict__['_hash'] = h = hash((self.base(),self.__nrows, self.__ncols)) return h
one has
sage: timeit("hash(M)", number=10^6) 1000000 loops, best of 3: 611 ns per loop
and with the patch, one has
sage: timeit("hash(M)", number=10^6) 1000000 loops, best of 3: 547 ns per loop
How is that possible? Apparently a "try-except" block has some overhead. Hence, simply returning a lazy attribute (which is solution of my patch) is fastest. Note that it is not possible to use @cached_method on the __hash__
method.
Needs review (although I still need to run full doctests)!
comment:2 Changed 11 years ago by
Status: | needs_review → needs_work |
---|
make ptest results in one error:
sage -t -force_lib "devel/sage/sage/matrix/matrix2.pyx" ********************************************************************** File "/home/simon/SAGE/sage-4.8.alpha3/devel/sage/sage/matrix/matrix2.pyx", line 581: sage: B.elementwise_product(C).is_sparse() Exception raised: Traceback (most recent call last): File "/home/simon/SAGE/sage-4.8.alpha3/local/bin/ncadoctest.py", line 1231, in run_one_test self.run_one_example(test, example, filename, compileflags) File "/home/simon/SAGE/sage-4.8.alpha3/local/bin/sagedoctest.py", line 38, in run_one_example OrigDocTestRunner.run_one_example(self, test, example, filename, compileflags) File "/home/simon/SAGE/sage-4.8.alpha3/local/bin/ncadoctest.py", line 1172, in run_one_example compileflags, 1) in test.globs File "<doctest __main__.example_9[34]>", line 1, in <module> B.elementwise_product(C).is_sparse()###line 581: sage: B.elementwise_product(C).is_sparse() File "matrix2.pyx", line 626, in sage.matrix.matrix2.Matrix.elementwise_product (sage/matrix/matrix2.c:5345) File "element.pyx", line 2994, in sage.structure.element.canonical_coercion (sage/structure/element.c:20329) File "element.pyx", line 3011, in sage.structure.element.canonical_coercion (sage/structure/element.c:20245) File "coerce.pyx", line 939, in sage.structure.coerce.CoercionModel_cache_maps.canonical_coercion (sage/structure/coerce.c:9043) TypeError: no common canonical parent for objects with parents: 'Full MatrixSpace of 5 by 6 sparse matrices over Integer Ring' and 'Full MatrixSpace of 5 by 6 sparse matrices over Rational Field' ********************************************************************** 1 items had failures: 1 of 50 in __main__.example_9 ***Test Failed*** 1 failures. For whitespace errors, see the file /home/simon/.sage//tmp/matrix2_8852.py [19.4 s]
Quite an interesting error, I think!
comment:3 Changed 11 years ago by
Interesting indeed.
Without the patch, one has
sage: M1 = MatrixSpace(ZZ, 5,6, sparse=True) sage: M2 = MatrixSpace(ZZ, 5,6, sparse=False) sage: M1==M2 True sage: D = {M1:1,M2:2} sage: len(D) 2
Obvious reason: M1 and M2 are equal (so then the length of D should be one, not two!), but they have different hash and are thus in different buckets of the dictionary.
With my patch, they have the same hash, and by consequence they yield the same dictionary item - and that is bad for coercion! Hence, non-unique parents strike again...
comment:4 Changed 11 years ago by
The problem could be fixed by turning matrix spaces into unique parents (which would be the straight-forward thing to do).
I just asked sage-devel for permission to make matrix spaces unique. Adding a dense and a sparse matrix would not be a problem for the coercion model.
Changed 11 years ago by
Attachment: | trac12290_unique_matrix_space.patch added |
---|
Use UniqueRepresentation
as a base class for matrix spaces.
comment:5 Changed 11 years ago by
Description: | modified (diff) |
---|---|
Keywords: | unique parent added |
Status: | needs_work → needs_review |
I have attached a patch that follows a totally different approach: Use UniqueRepresentation
as a base class for matrix spaces!
Advantage: One gets __hash__
, __cmp__
and __reduce__
for free, and the hash is even faster than with my previous patch.
sage: M = MatrixSpace(ZZ, 10) sage: N = MatrixSpace(ZZ, 10,sparse=True) sage: M == N False sage: timeit("hash(M)", number=10^6) 1000000 loops, best of 3: 511 ns per loop
The price to pay (as one can see in the example): The spaces of dense versus sparse matrices are not considered equal anymore. For applications, this shouldn't matter, since the coercion model can easily deal with it. In fact, I like the new behaviour a lot better than the old behaviour!
Old:
sage: M = MatrixSpace(ZZ, 10) sage: N = MatrixSpace(ZZ, 10,sparse=True) sage: a = M.random_element() sage: b = N.random_element() sage: (a+b).parent() Full MatrixSpace of 10 by 10 dense matrices over Integer Ring sage: (b+a).parent() Full MatrixSpace of 10 by 10 sparse matrices over Integer Ring
The parent of the sum depends on the order of summands!!
But with the new patch, one has
sage: M = MatrixSpace(ZZ, 10) sage: N = MatrixSpace(ZZ, 10,sparse=True) sage: a = M.random_element() sage: b = N.random_element() sage: (a+b).parent() Full MatrixSpace of 10 by 10 dense matrices over Integer Ring sage: (b+a).parent() Full MatrixSpace of 10 by 10 dense matrices over Integer Ring
independent of the summation order.
I had to change some existing doctests in a trivial way, and then the whole test suite passes. Ready for review!
Apply trac12290_unique_matrix_space.patch
Make the hashes of two equal matrix spaces equal. Improve the performance