Opened 8 years ago
Last modified 5 years ago
#12941 new defect
Cache should be cleared on cloning
Reported by: | hivert | Owned by: | hivert
---|---|---|---
Priority: | major | Milestone: | sage-6.4
Component: | combinatorics | Keywords: | Clone, cache, days38
Cc: | sdenton, sage-combinat, mjo, SimonKing | Merged in: |
Authors: | Florent Hivert | Reviewers: |
Report Upstream: | N/A | Work issues: |
Branch: | | Commit: |
Dependencies: | | Stopgaps: |
Description
With
```python
class Foo(ClonableIntArray):
    @cached_method
    def sum(self):
        return add(i for i in self)
    def check(self):
        pass
```
One gets:
```
sage: Foo(Parent(), [1,2,3])
[1, 2, 3]
sage: f = Foo(Parent(), [1,2,3])
sage: f.sum()
6
sage: with f.clone() as f1:
...     f1[1] = 4
sage: f1
[1, 4, 3]
sage: f1.sum()
6
```
The last result should be 8. This happens because the cache of `sum` isn't cleared for the cloned object `f1`.
Florent
Change History (19)
comment:1 follow-up: ↓ 3 Changed 8 years ago by
comment:2 Changed 8 years ago by
- Cc mjo added
comment:3 in reply to: ↑ 1 ; follow-ups: ↓ 5 ↓ 9 Changed 8 years ago by
Replying to mjo:
What cache? There are no cached methods in `ClonableIntArray`, so the behavior within that class appears correct at first glance. Isn't the problem really that you're saying `sum()` is cacheable, when in reality it isn't? At least under the inherited implementation of `__setitem__()`. Since `Foo` is the class with the cached method, `Foo` should invalidate the cache on `__setitem__()`.
In the previous code, `f1` is supposed to be a mutable copy (clone) of `f`. Therefore I think we should clear the cache during the `__copy__` method called by `clone`. Actually, the same problem occurs with `Element`:
Using the following definition:
```python
from sage.structure.element import Element

class Bar(Element):
    def __init__(self, x):
        self.x = x
        Element.__init__(self, Parent())
    @cached_method
    def y(self):
        return self.x
```
One gets
```
sage: b = Bar(4)
sage: b.y()
4
sage: c = copy(b)
sage: c.x = 25
sage: b.y()
4
sage: c.y()
4
```
I think that in `Element`'s `__copy__` method, line 427 (sage 5.0rc0), the following line
```python
try:
    res.__dict__ = self.__dict__.copy()
except AttributeError:
    pass
```
should be fixed not to copy cached methods, with something like:
```python
dct = self.__dict__.copy()
for s in self.__dict__:
    if isinstance(dct[s], (CachedFunction, CachedMethod)):
        del dct[s]
res.__dict__ = dct
```
I'm CCing Simon King, who designed the last version of the cache. He may have some ideas.
Cheers,
Florent
comment:4 Changed 8 years ago by
- Cc SimonKing added
comment:5 in reply to: ↑ 3 Changed 8 years ago by
Replying to hivert:
I think that in `Element`'s `__copy__` method, line 427 (sage 5.0rc0), the following line ... should be fixed not to copy cached methods
But this affects all of the mutable classes in which caching is handled correctly. If instances of `Bar` are mutable and invalidate the cache when modified, why shouldn't `b = Bar(); copy(b)` also copy the cache?
This only helps in a very specific case (immutable superclass with no cached methods and mutable copies), and even then only when a bug actually exists in the subclass. Trying to predict everything that subclasses might do wrong within the superclass is the path to unhappiness (and breaks encapsulation) =)
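mjo's point, that the caching subclass should invalidate its own cache, can be illustrated with a small pure-Python analogue. The `IntArray` base class and the `_sum_cache` attribute below are hypothetical stand-ins, not Sage's actual `ClonableIntArray` or `@cached_method` machinery:

```python
# Hypothetical analogue: an "immutable-ish" base class with no caching,
# and a subclass that both caches sum() and clears that cache itself
# whenever an item is set -- the division of labor mjo argues for.
class IntArray:
    def __init__(self, data):
        self._data = list(data)
    def __getitem__(self, i):
        return self._data[i]
    def __setitem__(self, i, v):
        self._data[i] = v

class Foo(IntArray):
    def __init__(self, data):
        super().__init__(data)
        self._sum_cache = None          # cache lives in the subclass
    def __setitem__(self, i, v):
        self._sum_cache = None          # subclass invalidates its own cache
        super().__setitem__(i, v)
    def sum(self):
        if self._sum_cache is None:
            self._sum_cache = sum(self._data)
        return self._sum_cache

f = Foo([1, 2, 3])
print(f.sum())   # 6
f[1] = 4
print(f.sum())   # 8: the subclass cleared its stale cache
```

With this arrangement the superclass never needs to know that any caching exists.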
comment:6 Changed 8 years ago by
Oh, and even if we did clear the cache on `copy()`, there would still be a bug in `Foo`:
```
sage: with f.clone() as f1:
...     f1[1] = 4
sage: f1
[1, 4, 3]
sage: f1.sum()
8
sage: f1[1] = 100000
sage: f1.sum()
8
```
comment:7 follow-up: ↓ 8 Changed 8 years ago by
mjo: The idea of the cloneable array is that you make a clone, modify it, and then at the end of that modification the array should fit the parameters defined by the class, though it is itself a new object. The array is only mutable during this cloning process; I think the `f1[1] = 100000` line should not be allowed, if my understanding is correct.
My understanding is that there's a push to switch from using the CombinatorialObject class to the ClonableArray class, which would make this cloning behavior a rather common occurrence.
comment:8 in reply to: ↑ 7 Changed 8 years ago by
Replying to sdenton:
mjo: The idea of the cloneable array is that you make a clone, modify it, and then at the end of that modification the array should fit the parameters defined by the class, though it is itself a new object. The array is only mutable during this cloning process; I think the `f1[1] = 100000` line should not be allowed, if my understanding is correct.
Sorry, you're right. I botched my test. This does work:
```
sage: f = Foo(Parent(), [1,2,3])
sage: f1 = f.clone()
sage: f1[1] = 10000
sage: f1
[1, 10000, 3]
```
but doesn't do what I think it does. (I hit the up arrow a bunch of times, and apparently that's what I typed earlier, when I should have done `with f.clone()...`)
My understanding is that there's a push to switch from using the CombinatorialObject class to the ClonableArray class, which would make this cloning behavior a rather common occurrence.
I still think the bug is in `Foo`. I could instead cache `sum()` in `self._cache`; would the superclass still be responsible for that?
Another philosophical argument: `copy()` should make a copy.
comment:9 in reply to: ↑ 3 Changed 8 years ago by
Some comments, even though I don't know the exact purpose of ClonableIntArray. I guess I'll divide my reply into two posts. Here is the first, in which I explain why Florent's approach won't work.
Replying to hivert:
Should be fixed not to copy cached methods, with something like:

```python
dct = self.__dict__.copy()
for s in self.__dict__:
    if isinstance(dct[s], (CachedFunction, CachedMethod)):
        del dct[s]
res.__dict__ = dct
```
That would not be enough to clear the cache for cached methods with arguments.
Let's go through an example that should illustrate different aspects:
```
sage: class A:
....:     def __init__(self, x):
....:         self.x = x
....:     @cached_method
....:     def f(self, y):
....:         return self.x + y
....:     @cached_method
....:     def g(self):
....:         return self.x * 2
....:
```
First, we copy an instance of class A before calling the cached method, and mutate it:
```
sage: a = A(5)
sage: b = copy(a)
sage: b.x = 7
sage: a.f(7)
12
sage: a.g()
10
sage: b.f(7)
14
sage: b.g()
14
```
No surprise, everything is fine. Now, we copy the first instance of A again, and mutate:
```
sage: c = copy(a)
sage: c.x = 7
sage: c.f(7)
12
sage: c.g()
10
```
No surprise, the cache is (wrongly) preserved.
Next, and that illustrates why Florent's suggestion is not enough to solve the problem, we copy the instance, kill the cached methods, and then mutate:
```
sage: d = copy(a)
sage: del d.f, d.g
sage: d.x = 7
sage: d.f(7)
12
sage: d.g()
14
```
Observation: The cached method without arguments is now fine, but the cached method with arguments is still caching the wrong result. Indeed, cached methods with and without arguments are different in how they store their cache values:
```
sage: type(d.f)
<type 'sage.misc.cachefunc.CachedMethodCaller'>
sage: type(d.g)
<type 'sage.misc.cachefunc.CachedMethodCallerNoArgs'>
sage: d.f.cache
{((7,), ()): 12}
sage: d.g.cache
14
```
So, a cached method stores its values in a dictionary, while a cached method without arguments just stores its single value. And now comes the catch: the dictionary is stored as an attribute of the object and is thus copied! `d.f.cache` is just a shortcut for faster access:
```
sage: d.f.cache is d._cache__f
True
```
Summary
Deleting a cached method in the copy is enough for cached methods with no arguments, but won't suffice for general cached methods.
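Simon's summary can be reproduced without Sage. The sketch below mimics the (assumed) `_cache__<name>` storage convention with a plain method: the stale dictionary sitting in the instance's `__dict__` is what survives the copy, and clearing it is what actually helps:

```python
# Pure-Python analogue (hypothetical; not Sage's CachedMethodCaller):
# the cache dict lives in the instance __dict__ under '_cache__f',
# so a shallow copy of the instance carries the stale entries along.
import copy

class A:
    def __init__(self, x):
        self.x = x
    def f(self, y):
        cache = self.__dict__.setdefault('_cache__f', {})
        if y not in cache:
            cache[y] = self.x + y
        return cache[y]

a = A(5)
a.f(7)               # populates a._cache__f with {7: 12}
d = copy.copy(a)     # shallow copy: __dict__ entries are shared
d.x = 7
print(d.f(7))        # 12 -- stale value from the copied cache dict
del d._cache__f      # clearing the stored dict is the real fix
print(d.f(7))        # 14 -- recomputed with the new self.x
```

Deleting only a bound caller attribute would leave `_cache__f` untouched, which is exactly why the `del d.f, d.g` experiment above fails for the method with arguments.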
comment:10 follow-up: ↓ 11 Changed 8 years ago by
I just observed another difficulty: if you copy a `CachedMethodCaller` (and that is what you find in the dictionary of the to-be-copied object, after calling a cached method), then the copy will still refer to the old instance. Hence, deleting the cached method caller from the dictionary (as suggested by Florent) is indeed needed. But in addition, one ought to clear the cache:
Define
```python
from sage.misc.cachefunc import CachedMethodCaller, CachedMethodCallerNoArgs, cached_method

class A(object):
    def __init__(self, x):
        self.x = x
    def __copy__(self):
        res = self.__class__.__new__(self.__class__)
        d = copy(self.__dict__)
        CM = []
        for s, v in self.__dict__.iteritems():
            if isinstance(v, CachedMethodCaller):
                d.__delitem__(s)
                CM.append(s)
            elif isinstance(v, CachedMethodCallerNoArgs):
                d.__delitem__(s)
        res.__dict__ = d
        for s in CM:
            getattr(res, s).clear_cache()
        return res
    @cached_method
    def f(self, i):
        return self.x + i
    @cached_method
    def g(self):
        return self.x * 2
```
and then
```
sage: a = A(5)
sage: a.f(7)
12
sage: a.g()
10
sage: b = copy(a)
sage: b.x = 7
sage: b.f(7)
14
sage: b.g()
14
```
comment:11 in reply to: ↑ 10 Changed 8 years ago by
Replying to SimonKing:
I just observed another difficulty: if you copy a `CachedMethodCaller` (and that is what you find in the dictionary of the to-be-copied object, after calling a cached method), then the copy will still refer to the old instance.
Or actually: if you copy (but do not deep-copy) the dictionary, then you will have that problem. Anyway, see the solution above...
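The shallow-versus-deep distinction Simon mentions is plain Python behavior, independent of Sage:

```python
# A shallow copy of a dict shares the nested cache dict; a deep copy
# does not. This is the mechanism behind the stale-cache problem.
import copy

d1 = {'cache': {7: 12}}

d2 = copy.copy(d1)        # shallow: d2['cache'] is d1['cache']
d2['cache'][7] = 99
print(d1['cache'][7])     # 99 -- mutation visible through the original

d3 = copy.deepcopy(d1)    # deep: independent nested dict
d3['cache'][7] = 0
print(d1['cache'][7])     # still 99
```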
comment:12 Changed 8 years ago by
Here is a second potential solution: ClearCacheOnPickle, which existed prior to #11115 but was totally broken. Now it works, as follows.
You need to define some class with cached methods:
```python
from sage.misc.cachefunc import CachedMethodCaller, CachedMethodCallerNoArgs, cached_method

class B(object):
    def __init__(self, x):
        self.x = x
    def __getstate__(self):
        return (self.__dict__,)
    def __setstate__(self, L):
        self.__dict__ = L[0]
    @cached_method
    def f(self, i):
        return self.x + i
    @cached_method
    def g(self):
        return self.x * 2
```
Then you derive a new class from it that also inherits from ClearCacheOnPickle; note that it is important that B has (or inherits) a `__getstate__` method, because ClearCacheOnPickle calls it when pickling:
```python
from sage.misc.cachefunc import ClearCacheOnPickle

class A(ClearCacheOnPickle, B):
    @cached_method
    def h(self, i):
        return self.x**i
```
Note that the derived class has an additional cached method. Let us see what happens:
```
sage: a = A(5)
sage: a.f(7)
12
sage: x = a.f(7)
sage: a.g()
10
sage: a.h(7)
78125
sage: b = copy(a)
sage: b.x = 7
sage: b.f(7)
14
sage: b.g()
14
sage: b.h(7)
823543
sage: x is a.f(7)
True
```
So, all cached methods have automatically been cleared (but only in the copy, not in the original instance).
I don't know if it makes sense to use ClearCacheOnPickle in your setting, but perhaps it helps.
comment:13 follow-up: ↓ 14 Changed 8 years ago by
The following does not work in terms of "clearing the cache":
```python
def unpickle(cl, data):
    res = cl.__new__(cl)
    res.__dict__ = data
    return res

class C(object):
    def __init__(self, x):
        self.x = x
    def __reduce__(self):
        return unpickle, (self.__class__, self.__dict__)
    @cached_method
    def f(self, i):
        return self.x + i
    @cached_method
    def g(self):
        return self.x * 2

from sage.misc.cachefunc import ClearCacheOnPickle

# This doesn't work as it should:
class D(ClearCacheOnPickle, C):
    pass
```
The reason is that D inherits the `__reduce__` method from C and the `__getstate__` method from ClearCacheOnPickle; but if both methods exist, `__reduce__` is chosen for pickling, and thus the caches are not cleared.
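That precedence is ordinary pickle behavior, and can be checked without Sage. The class below is a simplified, hypothetical stand-in (no caching) that defines both hooks and records which one pickle actually uses:

```python
# When a class defines __reduce__, pickle uses it and never consults
# __getstate__ -- which is why ClearCacheOnPickle's __getstate__ is
# bypassed in class D above.
import pickle

calls = []

class C:
    def __init__(self, x):
        self.x = x
    def __reduce__(self):
        calls.append('reduce')
        return (C, (self.x,))
    def __getstate__(self):
        calls.append('getstate')
        return self.__dict__

obj = pickle.loads(pickle.dumps(C(5)))
print(calls)     # ['reduce'] -- __getstate__ was never called
```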
In a better world, ClearCacheOnPickle would work with both pickling methods, not just with the `__getstate__`/`__setstate__` protocol. I guess that would be possible somehow (the question is whether we want that new feature).
Moreover, ClearCacheOnPickle only works if you take an existing class with `__getstate__` pickling and form a new class that inherits from ClearCacheOnPickle (which must come first!) and the other class. In an ideal world, ClearCacheOnPickle would work similarly to UniqueRepresentation: it would suffice to provide ClearCacheOnPickle as a base class, but then one could implement pickling directly in the class, rather than in another base class.
That could be done using metaclasses, changing how instances of the to-be-created class are pickled. Note that this is different from DynamicClassMetaclass, which changes how the to-be-created class itself (rather than its instances) is pickled.
But then, in order to seriously use such a new metaclass, one would like to be able to create a class that combines the new metaclass with others (classcall etc). Hence, we are back at the discussion that I recently started at sage-combinat-devel: Ceterum censeo we should have meta-metaclasses!
comment:14 in reply to: ↑ 13 ; follow-up: ↓ 15 Changed 8 years ago by
Hi Simon,

Thanks for your clear and comprehensive explanations!

Cloning
~~~~~~~

I think I need to explain a little what we want to achieve with the clone protocol. In Sage, we mostly model mathematical objects, which are by essence immutable. However, on many occasions we want to construct a data structure encoding a new mathematical object by small modifications of the data structure encoding an already-built object. For the resulting data structure to correctly encode the mathematical object, some structural invariants must hold. One problem is that, in many cases, there is no way to perform the modification without temporarily breaking the invariants. Think about applying a transposition to a permutation without atomic parallel assignment: in the middle of the various assignments, the object is not a permutation.

So, to summarize:

- I'm starting with a valid immutable object ``a``;
- I'm cloning ``a`` to a new, temporarily mutable object ``b``;
- I'm modifying ``b``; during the modification ``b`` is likely to be **broken**. It's the user's responsibility to make sure that method calls in that state are correct;
- At the end of the modification, ``b`` is restored to an immutable state and is semantically checked. Any method should now work as for object ``a``.

Syntactically speaking, you write the following::

    with a.clone() as b:
        ...  # lots of invariant-breaking modifications on b
        ...
    # b is checked and immutable

Note that this is equivalent to::

    b = a.__copy__()
    ...  # lots of invariant-breaking modifications on b
    ...
    b.set_immutable()
    b.check()

Copying for Cloning
~~~~~~~~~~~~~~~~~~~

Since code using clones can involve a lot of copying of small objects, I don't want copy to be slow. In particular, in most current use cases (admittedly, there are currently not that many of them), I'm performing the copy at the Cython/C level. Dumping the object into a string (or any temporary Python structure) and creating a new object from that is, in my opinion, not an option, as the whole purpose of clone is to avoid unnecessary copies. So this rules out copying by pickling, and therefore your nice ``ClearCacheOnPickle`` feature.

Caching and mutability
~~~~~~~~~~~~~~~~~~~~~~

I would rather have the following protocol:

- I'm OK to implement a ``copyAndClearTheCache`` at the level of ``Element`` or ``ClonableElement``. Any support from the cache mechanism which makes it as fast as possible is welcome.
- In a second stage, I don't want to clear the cache upon every modification, as modifications are supposed to be elementary (e.g., modifying an array entry). As a consequence, I'd like to have a hook which **deactivates any caching on mutable objects**. Caching will be reactivated once the object is set immutable. Do you think that's feasible on the ``CachedMethod`` side while keeping the overhead as low as possible?

Currently all Sage mutable objects, including not only clones but also matrices and vectors, have a method ``_is_immutable``. Vectors and matrices implement their own caching mechanism, but I think having cached methods work with them would be good.

What do you think?

Cheers,

Florent
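For reference, the clone protocol described above can be sketched in a few lines of pure Python. The `Clonable` class and its attributes are hypothetical; Sage's real `ClonableElement` is implemented in Cython and does considerably more:

```python
# Minimal sketch of the clone protocol: copy, open a mutable window,
# then restore immutability and run check(). A production version
# would use try/finally so a failing body cannot leak a mutable object.
import copy
from contextlib import contextmanager

class Clonable:
    def __init__(self, data):
        self._data = list(data)
        self._mutable = False
    def check(self):
        pass  # subclasses verify structural invariants here
    @contextmanager
    def clone(self):
        b = copy.copy(self)
        b._data = list(self._data)   # give the clone independent storage
        b._mutable = True
        yield b
        b._mutable = False           # restore immutability...
        b.check()                    # ...then validate the invariants
    def __getitem__(self, i):
        return self._data[i]
    def __setitem__(self, i, v):
        if not self._mutable:
            raise ValueError("object is immutable")
        self._data[i] = v

a = Clonable([1, 2, 3])
with a.clone() as b:
    b[1] = 4
print(b[1])   # 4; a is untouched, and b is immutable again
```

Assigning to `b` outside the `with` block now raises `ValueError`, matching the `f1[1] = 100000` discussion earlier in the ticket.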
comment:15 in reply to: ↑ 14 Changed 8 years ago by
Replying to hivert:
- I'm OK to implement a ``copyAndClearTheCache`` at the level of ``Element`` or ``ClonableElement``. Any support from the cache mechanism which makes it as fast as possible is welcome.
I hope that the `clear_cache` method of cached methods is quick enough:
```
sage: class A:
....:     @cached_method
....:     def f(self):
....:         return 5
....:     @cached_method
....:     def g(self, i, n=4):
....:         return i**n
....:
sage: a = A()
sage: %timeit a.f.clear_cache()
625 loops, best of 3: 182 ns per loop
sage: %timeit a.g.clear_cache()
625 loops, best of 3: 328 ns per loop
```
- In a second stage, I don't want to clear the cache upon every modification, as modifications are supposed to be elementary (e.g., modifying an array entry). As a consequence, I'd like to have a hook which deactivates any caching on mutable objects. Caching will be reactivated once the object is set immutable. Do you think that's feasible on the ``CachedMethod`` side while keeping the overhead as low as possible?
One could certainly create a new decorator, say, @cache_if_immutable, that only writes to or reads from the cache if the object is immutable. I have no idea of the overhead (at least, one would have one call to _is_immutable() per access).
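A minimal sketch of such a hypothetical `@cache_if_immutable` decorator (all names assumed; this is not Sage's API, and it ignores keyword arguments for brevity):

```python
import functools

def cache_if_immutable(method):
    # Reads from and writes to the cache only while the object reports
    # itself immutable; mutable objects always recompute. The cost is
    # one _is_immutable() call per access, as Simon notes.
    cache_name = '_cache__' + method.__name__
    @functools.wraps(method)
    def wrapper(self, *args):
        if not self._is_immutable():
            return method(self, *args)      # mutable: bypass the cache
        cache = self.__dict__.setdefault(cache_name, {})
        if args not in cache:
            cache[args] = method(self, *args)
        return cache[args]
    return wrapper

class Elt:
    def __init__(self, x):
        self.x = x
        self._immutable = False
    def _is_immutable(self):
        return self._immutable
    def set_immutable(self):
        self._immutable = True
    @cache_if_immutable
    def f(self, y):
        return self.x + y

e = Elt(5)
print(e.f(7))        # 12, recomputed every call while mutable
e.x = 6
print(e.f(7))        # 13, no stale value
e.set_immutable()
print(e.f(7))        # 13, now cached
```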
Currently all Sage mutable objects, including not only clones but also matrices and vectors, have a method `_is_immutable`. Vectors and matrices implement their own caching mechanism, but I think having cached methods work with them would be good.
Technical difficulty: cached_method requires that either the object allows setting new attributes, or it has an attribute `__cached_methods` which is a dictionary. The latter exists for sage.structure.parent.Parent, by #11115.
Now the problem is that vectors (such as those created by `vector(CC, [1,2,3])`) do not support adding attributes, and they do not have `__cached_methods`. Consequently, @cached_method will not work for them.
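The restriction can be mimicked in pure Python with `__slots__`, which, like a cdef class without a `__dict__`, rejects new attributes. The second class sketches the assumed `__cached_methods`-style workaround (names here are illustrative, not Sage's):

```python
class Vec:
    __slots__ = ('data',)       # like a cdef class: no instance __dict__
    def __init__(self, data):
        self.data = data

v = Vec([1, 2, 3])
try:
    v._cache__norm = {}         # where a cached method would stash its cache
except AttributeError:
    print("no __dict__: cannot attach a cache")

class CVec(Vec):
    # Workaround analogue: reserve one slot holding a dict that all
    # cached methods on the instance can share.
    __slots__ = ('cached_methods',)
    def __init__(self, data):
        super().__init__(data)
        self.cached_methods = {}

c = CVec([1, 2, 3])
c.cached_methods['norm'] = 14   # now there is a place to store results
```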
A possible solution would be to add `cpdef dict __cached_methods` to vectors. But then the next complication arises:
```
sage: cython("""
....: from sage.structure.parent cimport Parent
....: cdef class MyParent(Parent):
....:     def f(self):
....:         return 5
....: """)
sage: MyParent.f.__module__
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
/home/simon/SAGE/work/pathalgebras/<ipython console> in <module>()
AttributeError: 'method_descriptor' object has no attribute '__module__'
```
Unfortunately, the `__module__` attribute is read when a cached method is defined. Hence, although cached_method does work for a Python class defined in a Cython module, it does not (yet) work on a cdef class.
But one could try to work around:
```
sage: MyParent.f.__objclass__.__module__
'_home_simon__sage_temp_linux_sqwp_site_4700_tmp_2_spyx_0'
```
So, I guess I'll open another cached_method ticket!
comment:16 Changed 6 years ago by
- Milestone changed from sage-5.11 to sage-5.12
comment:17 Changed 6 years ago by
- Milestone changed from sage-6.1 to sage-6.2
comment:18 Changed 6 years ago by
- Milestone changed from sage-6.2 to sage-6.3
comment:19 Changed 5 years ago by
- Milestone changed from sage-6.3 to sage-6.4