Opened 10 years ago

Last modified 6 years ago

#11794 needs_work enhancement

Optional cythonised cached hash for Python classes

Reported by: SimonKing Owned by: jason
Priority: major Milestone: sage-6.4
Component: misc Keywords:
Cc: robertwb Merged in:
Authors: Simon King Reviewers:
Report Upstream: N/A Work issues:
Branch: Commit:
Dependencies: #11115, #11791 Stopgaps:

Status badges

Description (last modified by chapoton)

By default, the hash of an instance is obtained from the string representation. It would be computed over and over again, which is slow:

sage: from sage.structure.parent import Parent
sage: P = Parent()
sage: P.__hash__??
Type:method-wrapper
Base Class:<type 'method-wrapper'>
String Form:<method-wrapper '__hash__' of sage.structure.parent.Parent object at 0x53809b0>
Namespace:Interactive
Definition:P.__hash__(self)
Source:
    def __hash__(self):
        return hash(self.__repr__())
sage: timeit("hash(P)")
625 loops, best of 3: 19.4 µs per loop

Of course, a solution is to cache the hash once it is computed. This is done, for instance, in polynomial rings, and it is much faster:

sage: P.<x> = QQ[]
sage: P.__hash__??
Type:instancemethod
Base Class:<type 'instancemethod'>
String Form:<bound method PolynomialRing_field_with_category.__hash__ of Univariate Polynomial Ring in x over Rational Field>
Namespace:Interactive
File:/mnt/local/king/SAGE/sandbox/sage-4.7.2.alpha2/local/lib/python2.6/site-packages/sage/rings/polynomial/polynomial_ring.py
Definition:P.__hash__(self)
Source:
    def __hash__(self):
        # should be faster than just relying on the string representation
        try:
            return self._cached_hash
        except AttributeError:
            pass
        h = self._cached_hash = hash((self.base_ring(),self.variable_name()))
        return h
sage: timeit("hash(P)")
625 loops, best of 3: 3.08 µs per loop

Even more speed is possible if the hash method is written in Cython.

However, it would probably be unfeasible to rewrite all of Sage such that the hash is inherited from a cdef'd class, and also it may not be wanted to cache the hash.

What I am proposing here is a metaclass, that provides the option to add a cached cythonised hash method to any class.

Apply trac11794_fast_hash_metaclass.patch

Attachments (2)

trac11791_fast_hash_metaclass.patch (23.7 KB) - added by SimonKing 10 years ago.
A metaclass that provides any class with a cythonised cached hash method
trac11794_fast_hash_metaclass.patch (24.9 KB) - added by SimonKing 10 years ago.
A metaclass that provides any class with a cythonised cached hash method (disregard the previous patch)

Download all attachments as: .zip

Change History (22)

Changed 10 years ago by SimonKing

A metaclass that provides any class with a cythonised cached hash method

comment:1 follow-up: Changed 10 years ago by SimonKing

  • Authors set to Simon King
  • Dependencies set to #11115, #11791
  • Status changed from new to needs_review

The attached patch provides a new metaclass that overrides the hash method of an existing class with a fast cached method.

1. Providing existing classes with a fast hash

One way of using it is to take an existing class and apply the metaclass to it:

sage: P = FastHashMetaclass(Parent)()
sage: P
<class '__main__.Parent_with_hash'>
sage: timeit("hash(P)")
625 loops, best of 3: 464 ns per loop

sage: C = FastHashMetaclass(QQ['x'].__class__)
sage: C
<class 'sage.rings.polynomial.polynomial_ring.PolynomialRing_field_with_category_with_hash'>
sage: P = C(QQ,'x')
sage: P
Univariate Polynomial Ring in x over Rational Field
sage: timeit("hash(P)")
625 loops, best of 3: 458 ns per loop

2. Class definition with metaclass

The second way of using it is to provide it as __metaclass__ in a class definition. It allows to demonstrate (by printing status information) that the hash really is cached:

sage: class Bar:
....:     __metaclass__ = FastHashMetaclass
....:     def __init__(self,R):
....:         self.R = R
....:     def __repr__(self):
....:         return "[%s]"%self.R
....:     def __hash__(self):
....:         print "base hash"
....:         return hash(repr(self))
....:     
sage: P = Bar(9)
sage: P
[9]
sage: hash(P)
base hash
8209412804338245734
sage: hash(P)
8209412804338245734
sage: hash(repr(P))
8209412804338245734
sage: timeit("hash(P)")
625 loops, best of 3: 467 ns per loop

3. Dynamic metaclasses

Pickling of classes requires (as usual) that the class definition can be imported from a module. Thus, in the next example, I give the definition in interactive Cython. It demonstrates that the new metaclass can be combined with other metaclasses:

sage: cython_code = [
... 'from sage.misc.fast_hash import FastHashMetaclass',
... 'class Bar(UniqueRepresentation):',
... '    __metaclass__ = FastHashMetaclass',
... '    def __init__(self,R):',
... '        self.R = R',
... '    def __repr__(self):',
... '        return "[%s]"%self.R',
... '    def __hash__(self):',
... '        print "base hash"',
... '        return hash(repr(self))']
sage: cython('\n'.join(cython_code))

The metaclass associated with UniqueRepresentation makes the Bar instances unique parents:

sage: b = Bar(5)
sage: b is Bar(5)
True
sage: loads(dumps(b)) is b    # indirect doctest
True

WithFastHashMetaclass additionally makes the hash cached and fast:

sage: hash(b)
base hash
8209412804334245746
sage: hash(b)
8209412804334245746
sage: timeit("hash(b)")
625 loops, best of 3: 475 ns per loop

Technical detail: In order to mix the metaclasses, I turn the metaclasses into dynamic classes. In other words, the metaclasses have a metaclass, themselves:

sage: type(b)
<class '_mnt_local_king__sage_temp_mpc622_31222_tmp_0_spyx_0.Bar'>
sage: type(type(b))
<class 'sage.misc.fast_hash.FastHashAndClasscallMetaclass'>
sage: type(type(b)).__bases__
(<class 'sage.misc.fast_hash.FastHashMetaclass'>, <class 'sage.misc.classcall_metaclass.ClasscallMetaclass'>)
sage: type(type(type(b)))
<class 'sage.structure.dynamic_class.DynamicMetaclass'>

I think that this approach could be scalable, and might be useful in other places.

4. Speeding up existing instances

Another way of using it is to provide single instances of a class with the fast hash. This is only possible if the class is a "heap type":

sage: F = CombinatorialFreeModule(QQ, ['a','b'])
sage: timeit("hash(F)")
625 loops, best of 3: 1.09 µs per loop
sage: F.__class__ = FastHashMetaclass(F.__class__)
sage: timeit("hash(F)")
625 loops, best of 3: 467 ns per loop
sage: P.<x> = QQ[]
sage: P.__class__ = FastHashMetaclass(P.__class__)
sage: timeit("hash(P)")
625 loops, best of 3: 483 ns per loop
sage: x.__class__ = FastHashMetaclass(x.__class__)
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)

/mnt/local/king/SAGE/sandbox/sage-4.7.2.alpha2/devel/sage-main/<ipython console> in <module>()

TypeError: __class__ assignment: only for heap types

5. The price to pay

There is no free lunch. Thus, one might expect that the fast hash comes with a price to pay. Unfortunately, this is indeed the case:

sage: F = CombinatorialFreeModule(QQ, ['a','b'])
sage: D = dir(F)
sage: timeit('for s in D: a = getattr(F,s)', number=10000)
10000 loops, best of 3: 84.9 µs per loop
sage: from sage.misc.fast_hash import FastHashMetaclass
sage: F.__class__ = FastHashMetaclass(F.__class__)
sage: timeit('for s in D: a = getattr(F,s)', number=10000)
10000 loops, best of 3: 96.2 µs per loop
sage: (96.2-84.9)/84.9
0.133097762073027

Hence, I am not suggesting to use the new metaclass everywhere. But when a fast hash is needed, it may be a useful option. Note that the regression in the previous example is temporary:

sage: F.__class__
<class 'sage.combinat.free_module.CombinatorialFreeModule_with_category_with_hash'>
sage: F.__class__.__base__
<class 'sage.combinat.free_module.CombinatorialFreeModule_with_category'>
sage: F.__class__ = F.__class__.__base__
sage: timeit('for s in D: a = getattr(F,s)', number=10000)
10000 loops, best of 3: 84.6 µs per loop

Hence, it could even be an option to temporarily get a fast hash and later bring all other methods back to speed.

Dependencies

#11115 makes cached functions picklable, and that is what we need here.

#11791 provides introspection for classes with dynamic metaclasses. As we have seen above, we have dynamic metaclasses.

I'll try to milden the speed regression exposed above. But I think it is OK to make it "needs review".

comment:2 in reply to: ↑ 1 Changed 10 years ago by SimonKing

Replying to SimonKing:

I'll try to milden the speed regression exposed above. But I think it is OK to make it "needs review".

I have just posted a new patch (also correcting the patch name - please disregard the first patch!). It is not only mildening the regression: It actually provides a general speedup!

sage: from sage.misc.fast_hash import FastHashMetaclass
sage: F = CombinatorialFreeModule(QQ, ['a','b'])
sage: D = dir(F)
sage: timeit('for s in D: a = getattr(F,s)', number=10000)
10000 loops, best of 3: 84.1 µs per loop
sage: timeit("hash(F)")
625 loops, best of 3: 1.09 µs per loop
sage: F.__class__ = FastHashMetaclass(F.__class__)
sage: timeit('for s in D: a = getattr(F,s)', number=10000)
10000 loops, best of 3: 64.4 µs per loop
sage: timeit("hash(F)")
625 loops, best of 3: 443 ns per loop
sage: TestSuite(F).run()

comment:3 Changed 10 years ago by SimonKing

  • Description modified (diff)

For the patchbot:

Apply trac11794_fast_hash_metaclass.patch

comment:4 Changed 10 years ago by SimonKing

I slightly modified the patch: One doc string needed an additional :: for making the html reference look nice.

For the patchbot:

Apply trac11794_fast_hash_metaclass.patch

comment:5 Changed 10 years ago by SimonKing

Not all is good:

sage: C = Objects().parent_class
sage: type('tmp',(FastHashMetaclass(C),),{})

yields an infinite recursion.

Changed 10 years ago by SimonKing

A metaclass that provides any class with a cythonised cached hash method (disregard the previous patch)

comment:6 Changed 10 years ago by SimonKing

The problem is solved with the latest patch version:

sage: from sage.misc.fast_hash import FastHashMetaclass, HashWithCache
sage: from sage.structure.element import Element
sage: C = Objects().parent_class
sage: M = FastHashMetaclass('tmp',(FastHashMetaclass(C),FastHashMetaclass(Element)),{})
sage: M.mro()
[<class '__main__.tmp'>, <type 'sage.misc.fast_hash.HashWithCache'>, <class '__main__.tmp'>, <class 'sage.categories.objects.Objects.parent_class'>, <type 'sage.structure.element.Element'>, <type 'sage.structure.sage_object.SageObject'>, <type 'object'>]

comment:7 Changed 10 years ago by SimonKing

Again for the patchbot:

Apply trac11794_fast_hash_metaclass.patch

comment:8 follow-ups: Changed 10 years ago by hivert

Hi Simon,

Looks like you have done a lot of hard work here ! Thanks ! I might review it more or less soon.

I've a question: how does it interact with #8119 ? Does it solve the problem there ?

Cheers,

Florent

comment:9 in reply to: ↑ 8 ; follow-up: Changed 10 years ago by SimonKing

Hi Florent!

Replying to hivert:

I've a question: how does it interact with #8119 ? Does it solve the problem there ?

One could argue that my patch does not change any existing implementation of hash in Sage -- it is a new option to get a fast hash in a comfortable way, but that new option would (at least in the beginning) not be applied.

Of course one could also argue that my patch could provide a tool to solve #8119. But I think the situation is not so easy. With my patch, the hash would be cached, so that renaming an object would not change the hash. However, consider the following two scenarios

A)

  1. Create P
  2. hash(P)
  3. P.rename("bla")
  4. hash(P)

versus

B)

  1. Create P
  2. P.rename("bla")
  3. hash(P)

With my patch, you would get the same result in A)2. and A)4 - which solves #8119 in scenario A). However, you would get a different result in A).2 and B).3.

Hence, #8119 is no duplicate, but weakly related.

comment:10 in reply to: ↑ 9 Changed 10 years ago by SimonKing

Replying to SimonKing:

Hence, #8119 is no duplicate, but weakly related.

On the other hand: If one would use the metaclass provided by my patch ALL OVER THE PLACE in Sage, then one could solve #8119 by simply starting the rename() method by calling hash(self). Then, the hash would be computed and cached before renaming, and thus the cached hash value would not change when renaming.

However, applying the metaclass everywhere is probably a very bad idea. The intention of this ticket really is to just provide a new optional feature.

comment:11 in reply to: ↑ 8 Changed 10 years ago by SimonKing

  • Cc robertwb added

Replying to hivert:

Hi Simon,

Looks like you have done a lot of hard work here ! Thanks ! I might review it more or less soon.

I've a question: how does it interact with #8119 ? Does it solve the problem there ?

Now (after a look at Robert's patch from #8119) I understand your question: Robert suggests to cache the hash value of CategoryObject, and he removes a couple of custom hash methods that actually do nothing but replicate the default hash method.

Then, I'd say the relation of the two tickets is as follows:

  • With Robert's patch, the DEFAULT hash of CategoryObject would be cached. If a sub-class has its own hash method, then this would not be automatically cached. And if your sub-class is actually just a Python class, then its hash would not be automatically fast.
  • With my patch, if you write a Python class with a custom hash method, then you can additionally provide the new metaclass, and would turn the slow uncached custom hash into a fast cached custom hash.

I am not sure what would happen in the following situation: Start with a class C that uses the FastHash metaclass. Create a subclass D of C that has a custom hash. Would the hash of D be automatically fast and cached, because it derives from a class with the FastHash metaclass?

Need to test it - that would be nice to have, I think.

comment:12 Changed 8 years ago by jdemeyer

  • Milestone changed from sage-5.11 to sage-5.12

comment:13 Changed 8 years ago by vbraun_spam

  • Milestone changed from sage-6.1 to sage-6.2

comment:14 Changed 7 years ago by vbraun_spam

  • Milestone changed from sage-6.2 to sage-6.3

comment:15 Changed 7 years ago by vbraun_spam

  • Milestone changed from sage-6.3 to sage-6.4

comment:16 in reply to: ↑ 8 ; follow-up: Changed 6 years ago by ncohen

  • Status changed from needs_review to needs_work

Looks like you have done a lot of hard work here ! Thanks ! I might review it more or less soon.

That old joke still makes me laugh.

Nathann

comment:17 in reply to: ↑ 16 ; follow-up: Changed 6 years ago by SimonKing

Replying to ncohen:

Looks like you have done a lot of hard work here ! Thanks ! I might review it more or less soon.

That old joke still makes me laugh.

Well, it had no branch anyway, and it depends on a ticket that needs work.

comment:18 in reply to: ↑ 17 Changed 6 years ago by ncohen

Well, it had no branch anyway, and it depends on a ticket that needs work.

Yep.

I hate to see work wasted. Sorry :-/

Nathann

comment:19 Changed 6 years ago by vdelecroix

Hello,

We intend to remove the stupid hash implementation from Element and CategoryObject in #19016 (already done for SageObject in #18246). For some classes, it would be good to have a generic mechanism for caching hash!

Vincent

comment:20 Changed 6 years ago by chapoton

  • Description modified (diff)
Note: See TracTickets for help on using tickets.