Opened 7 years ago

Last modified 5 years ago

#12941 new defect

Cache should be cleared on cloning

Reported by: hivert
Owned by: hivert
Priority: major
Milestone: sage-6.4
Component: combinatorics
Keywords: Clone, cache, days38
Cc: sdenton, sage-combinat, mjo, SimonKing
Merged in:
Authors: Florent Hivert
Reviewers:
Report Upstream: N/A
Work issues:
Branch:
Commit:
Dependencies:
Stopgaps:

Description

With

from sage.structure.list_clone import ClonableIntArray

class Foo(ClonableIntArray):
    @cached_method
    def sum(self):
        return add(i for i in self)
    def check(self):
        pass

One gets:

    sage: Foo(Parent(), [1,2,3])
    [1, 2, 3]
    sage: f = Foo(Parent(), [1,2,3])
    sage: f.sum()
    6
    sage: with f.clone() as f1:
    ...     f1[1] = 4
    sage: f1
    [1, 4, 3]
    sage: f1.sum()
    6

The last result should be 8: the cache of sum() isn't cleared for the cloned object f1.
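The effect is easy to reproduce outside Sage. Here is a minimal plain-Python sketch (the cached_method stand-in below is a hypothetical simplification of Sage's decorator, not its actual implementation) showing a copied cache surviving mutation:

```python
import copy

def cached_method(f):
    # Hypothetical stand-in for Sage's @cached_method: store the result
    # in the instance __dict__ under a per-method key.
    key = '_cache__' + f.__name__
    def wrapper(self):
        if key not in self.__dict__:
            self.__dict__[key] = f(self)
        return self.__dict__[key]
    return wrapper

class Foo:
    # Plain-Python analogue of the ClonableIntArray example above.
    def __init__(self, data):
        self._data = list(data)
    def __setitem__(self, i, v):
        self._data[i] = v
    @cached_method
    def sum(self):
        return sum(self._data)

f = Foo([1, 2, 3])
assert f.sum() == 6
f1 = copy.deepcopy(f)   # even a deep copy carries the stale cache along
f1[1] = 4
assert f1.sum() == 6    # stale cached value; mathematically it should be 8
```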

Florent

Change History (19)

comment:1 follow-up: Changed 7 years ago by mjo

What cache? There are no cached methods in ClonableIntArray, so the behavior within that class appears correct at first glance.

Isn't the problem really that you're saying sum() is cacheable, when in reality it isn't? At least under the inherited implementation of __setitem__().

Since Foo is the class with the cached method, Foo should invalidate the cache on __setitem__().
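For illustration, a plain-Python sketch of this suggestion (with the same hypothetical minimal cached_method stand-in as above), where the subclass's __setitem__ invalidates its own cache entries:

```python
def cached_method(f):
    # hypothetical minimal stand-in for Sage's @cached_method
    key = '_cache__' + f.__name__
    def wrapper(self):
        if key not in self.__dict__:
            self.__dict__[key] = f(self)
        return self.__dict__[key]
    return wrapper

class Foo:
    def __init__(self, data):
        self._data = list(data)
    def __setitem__(self, i, v):
        self._data[i] = v
        # the subclass knows its own cached methods, so it invalidates
        # its own cache entries on every mutation
        for k in [k for k in self.__dict__ if k.startswith('_cache__')]:
            del self.__dict__[k]
    @cached_method
    def sum(self):
        return sum(self._data)

f = Foo([1, 2, 3])
assert f.sum() == 6
f[1] = 4
assert f.sum() == 8   # __setitem__ invalidated the stale entry
```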

comment:2 Changed 7 years ago by mjo

  • Cc mjo added

comment:3 in reply to: ↑ 1 ; follow-ups: Changed 7 years ago by hivert

Replying to mjo:

What cache? There are no cached methods in ClonableIntArray, so the behavior within that class appears correct at first glance.

Isn't the problem really that you're saying sum() is cacheable, when in reality it isn't? At least under the inherited implementation of __setitem__().

Since Foo is the class with the cached method, Foo should invalidate the cache on __setitem__().

In the previous code, f1 is supposed to be a mutable copy (clone) of f. Therefore I think we should clear the cache during the __copy__ method called by clone. Actually, the same problem occurs with Element. Using the following definition:

from sage.structure.element import Element
class Bar(Element):
    def __init__(self, x):
        self.x = x
        Element.__init__(self, Parent())

    @cached_method
    def y(self):
        return self.x

One gets

sage: b = Bar(4)
sage: b.y()
4
sage: c = copy(b)
sage: c.x = 25
sage: b.y()
4
sage: c.y()
4

I think that in Element's __copy__ method, line 427 (sage 5.0rc0), the following lines

        try:
            res.__dict__ = self.__dict__.copy()
        except AttributeError:
            pass

should be fixed not to copy cached methods, with something like

            dct = self.__dict__.copy()
            for s in self.__dict__:
                if isinstance(dct[s], (CachedFunction, CachedMethod)):
                    del dct[s]
            res.__dict__ = dct
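In plain Python, this filtering idea might be sketched as follows (using a hypothetical cached_method stand-in that stores values under _cache__<name> keys, rather than Sage's actual CachedFunction/CachedMethod objects):

```python
import copy

def cached_method(f):
    # hypothetical stand-in: stash results under '_cache__<name>'
    key = '_cache__' + f.__name__
    def wrapper(self):
        if key not in self.__dict__:
            self.__dict__[key] = f(self)
        return self.__dict__[key]
    return wrapper

class Bar:
    def __init__(self, x):
        self.x = x
    def __copy__(self):
        # copy __dict__, but drop the cache entries
        res = self.__class__.__new__(self.__class__)
        res.__dict__ = {k: v for k, v in self.__dict__.items()
                        if not k.startswith('_cache__')}
        return res
    @cached_method
    def y(self):
        return self.x

b = Bar(4)
assert b.y() == 4
c = copy.copy(b)
c.x = 25
assert c.y() == 25   # fresh cache in the copy
assert b.y() == 4    # original cache untouched
```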

I'm CCing Simon King, who designed the latest version of the cache. He may have some ideas.

Cheers,

Florent

comment:4 Changed 7 years ago by hivert

  • Cc SimonKing added

comment:5 in reply to: ↑ 3 Changed 7 years ago by mjo

Replying to hivert:

I think that in Element's __copy__ method, line 427 (sage 5.0rc0), the following lines... should be fixed not to copy cached methods


But this affects all of the mutable classes in which caching is handled correctly. If instances of Bar are mutable and invalidate the cache when modified, why shouldn't a b = Bar(); copy(b) also copy the cache?

This only helps in a very specific case (immutable superclass with no cached methods and mutable copies), and even then only when a bug actually exists in the subclass. Trying to predict everything that subclasses might do wrong within the superclass is the path to unhappiness (and breaks encapsulation) =)

comment:6 Changed 7 years ago by mjo

Oh, and even if we did clear the cache on copy(), there would still be a bug in Foo:

sage: with f.clone() as f1:
...     f1[1] = 4
sage: f1
[1, 4, 3]
sage: f1.sum()
8
sage: f1[1] = 100000
sage: f1.sum()
8

comment:7 follow-up: Changed 7 years ago by sdenton

mjo: The idea of the cloneable array is that you make a clone, modify it, and then at the end of that modification the array should fit the parameters defined by the class, though it is itself a new object. The array is only mutable during this cloning process; I think the f1[1]=100000 line should not be allowed, if my understanding is correct.

My understanding is that there's a push to switch from using the CombinatorialObject class to the ClonableArray class, which would make this cloning behavior a rather common occurrence.

comment:8 in reply to: ↑ 7 Changed 7 years ago by mjo

Replying to sdenton:

mjo: The idea of the cloneable array is that you make a clone, modify it, and then at the end of that modification the array should fit the parameters defined by the class, though it is itself a new object. The array is only mutable during this cloning process; I think the f1[1]=100000 line should not be allowed, if my understanding is correct.


Sorry, you're right. I botched my test. This does work:

sage: f = Foo(Parent(), [1,2,3])
sage: f1 = f.clone()
sage: f1[1] = 10000
sage: f1
[1, 10000, 3]

but doesn't do what I think it does. (I hit up a bunch of times, and apparently that's what I typed earlier when I should have done with f.clone()...)

My understanding is that there's a push to switch from using the CombinatorialObject class to the ClonableArray class, which would make this cloning behavior a rather common occurrence.


I still think the bug is in Foo. I could instead cache sum() in self._cache; would the superclass still be responsible for that?

Another philosophical argument: copy() should make a copy.

comment:9 in reply to: ↑ 3 Changed 7 years ago by SimonKing

Some comments, even though I don't know the exact purpose of ClonableIntArray. I guess I'll divide my reply into two posts. Here is the first, in which I explain why Florent's approach won't work.

Replying to hivert:

Should be fixed not to copy cached methods, with something as

            dct = self.__dict__.copy()
            for s in self.__dict__:
                if isinstance(dct[s], (CachedFunction, CachedMethod)):
                    del dct[s]
            res.__dict__ = dct

That would not be enough to clear the cache for cached methods with arguments.

Let's go through an example that should illustrate different aspects:

sage: class A:
....:     def __init__(self,x):
....:         self.x = x
....:     @cached_method
....:     def f(self,y):
....:         return self.x+y
....:     @cached_method
....:     def g(self):
....:         return self.x*2
....:     

First, we copy an instance of class A before calling the cached method, and mutate it:

sage: a = A(5)
sage: b = copy(a)
sage: b.x = 7
sage: a.f(7)
12
sage: a.g()
10
sage: b.f(7)
14
sage: b.g()
14

No surprise, everything is fine. Now, we copy the first instance of A again, and mutate:

sage: c = copy(a)
sage: c.x = 7
sage: c.f(7)
12
sage: c.g()
10

No surprise, the cache is (wrongly) preserved.

Next, and that illustrates why Florent's suggestion is not enough to solve the problem, we copy the instance, kill the cached methods, and then mutate:

sage: d = copy(a)
sage: del d.f, d.g
sage: d.x = 7
sage: d.f(7)
12
sage: d.g()
14

Observation: The cached method without arguments is now fine, but the cached method with arguments is still caching the wrong result. Indeed, cached methods with and without arguments are different in how they store their cache values:

sage: type(d.f)
<type 'sage.misc.cachefunc.CachedMethodCaller'>
sage: type(d.g)
<type 'sage.misc.cachefunc.CachedMethodCallerNoArgs'>
sage: d.f.cache
{((7,), ()): 12}
sage: d.g.cache
14

So, a cached method with arguments stores its values in a dictionary, while the cached method without arguments just stores its single value. And now comes the catch: the dictionary is stored as an attribute of the object and is thus copied! d.f.cache is just a shortcut for faster access:

sage: d.f.cache is d._cache__f
True

Summary

Deleting a cached method in the copy is enough for cached methods with no arguments, but won't suffice for general cached methods.
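The same effect can be reproduced in plain Python with a hypothetical CachedCaller class that, loosely like Sage's CachedMethodCaller, keeps its cache dictionary as an attribute of the instance and only a shortcut on the caller:

```python
import copy

class CachedCaller:
    # Mimics (loosely) Sage's CachedMethodCaller: the cache dict lives
    # as an attribute of the instance; the caller keeps only a shortcut.
    def __init__(self, inst, f):
        self.f, self.inst = f, inst
        self.cache = inst.__dict__.setdefault('_cache__' + f.__name__, {})
    def __call__(self, *args):
        if args not in self.cache:
            self.cache[args] = self.f(self.inst, *args)
        return self.cache[args]

class A:
    def __init__(self, x):
        self.x = x
        self.f = CachedCaller(self, A._f)
    def _f(self, y):
        return self.x + y

a = A(5)
assert a.f(7) == 12
d = copy.copy(a)
del d.__dict__['f']            # drop the caller, as the suggestion does
d.x = 7
d.f = CachedCaller(d, A._f)    # a fresh caller (Sage recreates it lazily)
assert d.f(7) == 12            # still stale: the cache dict was copied too
```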

comment:10 follow-up: Changed 7 years ago by SimonKing

I just observed another difficulty: If you copy a CachedMethodCaller (and that is what you find in the dictionary of the to-be-copied object, after calling a cached method) then the copy will still refer to the old instance. Hence, deleting the cached method caller from the dictionary (as suggested by Florent) is indeed needed. But in addition, one ought to clear the cache:

Define

from sage.misc.cachefunc import CachedMethodCaller, CachedMethodCallerNoArgs, cached_method

class A(object):
    def __init__(self,x):
        self.x = x
    def __copy__(self):
        res = self.__class__.__new__(self.__class__)
        d = copy(self.__dict__)
        CM = []
        for s,v in self.__dict__.iteritems():
            if isinstance(v, CachedMethodCaller):
                d.__delitem__(s)
                CM.append(s)
            elif isinstance(v, CachedMethodCallerNoArgs):
                d.__delitem__(s)
        res.__dict__ = d
        for s in CM:
            getattr(res,s).clear_cache()
        return res
    @cached_method
    def f(self,i):
        return self.x+i
    @cached_method
    def g(self):
        return self.x*2

and then

sage: a = A(5)
sage: a.f(7)
12
sage: a.g()
10
sage: b = copy(a)
sage: b.x = 7
sage: b.f(7)
14
sage: b.g()
14
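A Python 3 analogue of this __copy__ (with hypothetical CachedCaller and _raw_f names, and a variant that gives the copy fresh cache dictionaries rather than clearing possibly shared ones) might read:

```python
import copy

class CachedCaller:
    # as above: the cache dict is an instance attribute; the caller
    # object only holds a shortcut to it
    def __init__(self, inst, f):
        self.f, self.inst = f, inst
        self.cache = inst.__dict__.setdefault('_cache__' + f.__name__, {})
    def __call__(self, *args):
        if args not in self.cache:
            self.cache[args] = self.f(self.inst, *args)
        return self.cache[args]

class A:
    def __init__(self, x):
        self.x = x
    def __copy__(self):
        res = self.__class__.__new__(self.__class__)
        d = dict(self.__dict__)
        for k in list(d):
            if isinstance(d[k], CachedCaller):
                del d[k]             # drop callers bound to the original
            elif k.startswith('_cache__'):
                d[k] = {}            # fresh cache dict, not the shared one
        res.__dict__ = d
        return res
    def f(self, y):
        caller = self.__dict__.get('f_caller')
        if caller is None:
            caller = self.__dict__['f_caller'] = CachedCaller(self, A._raw_f)
        return caller(y)
    def _raw_f(self, y):
        return self.x + y

a = A(5)
assert a.f(7) == 12
b = copy.copy(a)
b.x = 7
assert b.f(7) == 14   # the copy starts from an empty cache
assert a.f(7) == 12   # the original cache is untouched
```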

comment:11 in reply to: ↑ 10 Changed 7 years ago by SimonKing

Replying to SimonKing:

I just observed another difficulty: If you copy a CachedMethodCaller (and that is what you find in the dictionary of the to-be-copied object, after calling a cached method) then the copy will still refer to the old instance.

Or actually: if you copy (but not deep-copy) the dictionary, then you will have that problem. Anyway, see the solution above...

comment:12 Changed 7 years ago by SimonKing

Here is a second potential solution: ClearCacheOnPickle, that existed prior to #11115, but was totally broken. Now it works, as follows.

You need to define some class with cached methods:

from sage.misc.cachefunc import CachedMethodCaller, CachedMethodCallerNoArgs, cached_method

class B(object):
    def __init__(self,x):
        self.x = x
    def __getstate__(self):
        return (self.__dict__,)
    def __setstate__(self,L):
        self.__dict__ = L[0]
    @cached_method
    def f(self,i):
        return self.x+i
    @cached_method
    def g(self):
        return self.x*2

Then, you derive a new class from it that also inherits from ClearCacheOnPickle; note that it is important that B has (or inherits) a __getstate__ method, because ClearCacheOnPickle calls it when pickling:

from sage.misc.cachefunc import ClearCacheOnPickle
class A(ClearCacheOnPickle,B):
    @cached_method
    def h(self,i):
        return self.x**i

Note that the derived class has an additional cached method. Let us see what happens:

sage: a = A(5)
sage: a.f(7)
12
sage: x = a.f(7)
sage: a.g()
10
sage: a.h(7)
78125
sage: b = copy(a)
sage: b.x = 7
sage: b.f(7)
14
sage: b.g()
14
sage: b.h(7)
823543
sage: x is a.f(7)
True

So, all cached methods have automatically been cleared (but only in the copy, not in the original instance).

I don't know if it makes sense to use ClearCacheOnPickle in your setting, but perhaps it helps.
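The mechanism can be imitated in plain Python: since copy() falls back to the pickle protocol, a mixin whose __getstate__ strips the cache entries (a hypothetical stand-in for ClearCacheOnPickle, using the same '_cache__' key convention as the sketches above) yields copies with a clean cache:

```python
import copy

class ClearCacheOnCopy:
    # Hypothetical mixin in the spirit of Sage's ClearCacheOnPickle:
    # strip cache entries from the pickled/copied state.
    def __getstate__(self):
        return {k: v for k, v in self.__dict__.items()
                if not k.startswith('_cache__')}

class B:
    def __init__(self, x):
        self.x = x
    def f(self, i):
        # manual per-instance cache, stored under a '_cache__' key
        cache = self.__dict__.setdefault('_cache__f', {})
        if i not in cache:
            cache[i] = self.x + i
        return cache[i]

class A(ClearCacheOnCopy, B):
    pass

a = A(5)
assert a.f(7) == 12
b = copy.copy(a)      # copy uses the pickle protocol -> __getstate__
b.x = 7
assert b.f(7) == 14   # the cache was not carried over
assert a.f(7) == 12   # only the copy was affected
```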

comment:13 follow-up: Changed 7 years ago by SimonKing

The following does not work in terms of "clearing the cache":

def unpickle(cl,data):
    res = cl.__new__(cl)
    res.__dict__ = data
    return res

class C(object):
    def __init__(self,x):
        self.x = x
    def __reduce__(self):
        return unpickle, (self.__class__,self.__dict__)
    @cached_method
    def f(self,i):
        return self.x+i
    @cached_method
    def g(self):
        return self.x*2

from sage.misc.cachefunc import ClearCacheOnPickle
# This doesn't work as it should:
class D(ClearCacheOnPickle,C):
    pass

The reason is that D inherits the __reduce__ method from C and the __getstate__ method from ClearCacheOnPickle - but if both methods exist, __reduce__ is chosen for pickling, and thus the caches are not cleared.

In a better world, ClearCacheOnPickle would work with both pickling methods, not just with the __getstate__/__setstate__ method. I guess that would be possible somehow (the question is whether we want that new feature).

Moreover, ClearCacheOnPickle only works if you take an existing class with __getstate__ pickling and form a new class that inherits from ClearCacheOnPickle (which must come first among the bases!) and the other class. In an ideal world, ClearCacheOnPickle would work similarly to UniqueRepresentation: it would suffice to provide ClearCacheOnPickle as a base class, but then one could implement pickling directly in the class, rather than in another base class.

That could be done using metaclasses, changing how instances of the to-be-created class are pickled. Note that this is different from DynamicClassMetaclass, which changes how the to-be-created class itself (rather than its instances) is pickled.

But then, in order to seriously use such a new metaclass, one would like to be able to create a class that combines the new metaclass with others (classcall etc). Hence, we are back at the discussion that I recently started at sage-combinat-devel: Ceterum censeo we should have meta-metaclasses!
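The precedence issue is easy to demonstrate in plain Python: when both hooks are defined, __reduce__ wins and __getstate__ is never consulted (the call log below is just instrumentation for the demo):

```python
import copy

calls = []   # record which pickling hook actually runs

def unpickle(cls, data):
    res = cls.__new__(cls)
    res.__dict__ = data
    return res

class C:
    def __init__(self, x):
        self.x = x
    def __reduce__(self):
        calls.append('__reduce__')
        return unpickle, (self.__class__, dict(self.__dict__))
    def __getstate__(self):
        # a ClearCacheOnPickle-style hook would live here, but...
        calls.append('__getstate__')
        return dict(self.__dict__)

c = copy.copy(C(5))
assert c.x == 5
assert calls == ['__reduce__']   # ...it is never called: __reduce__ wins
```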

comment:14 in reply to: ↑ 13 ; follow-up: Changed 7 years ago by hivert

Hi Simon,

Thanks for your clear and comprehensive explanations!

Cloning
~~~~~~~

I think I need to explain a little what we want to achieve with the clone
protocol. In Sage, we mostly model mathematical objects, which are in essence
immutable. However, on many occasions, we want to construct a data structure
encoding a new mathematical object by small modifications of the data
structure encoding some already-built object. For the resulting data structure
to correctly encode the mathematical object, some structural invariants must
hold. One problem is that, in many cases, there is no way to avoid breaking
the invariants during the modification process.

Think about applying a transposition to a permutation without atomic parallel
assignment. In the middle of the various assignments, the object is not a
permutation.

So to summarize:

- I'm starting with a valid immutable object ``a``;

- I'm cloning ``a`` to a new temporarily mutable object ``b``.

- I'm modifying ``b``; during the modification ``b`` is likely to be
  **broken**. It's the user's responsibility to make sure that method calls in
  that state are correct.

- At the end of the modification, ``b`` is restored to an immutable state and
  it is semantically checked. All methods should now work as for object ``a``.

Syntactically speaking you write the following::

      with a.clone() as b:
           ...
           # lot of invariant breaking modifications on b
           ...
      # b is checked and immutable

Note that this is equivalent to::

      b = a.__copy__()
      ...
      # lot of invariant breaking modifications on b
      ...
      b.set_immutable()
      b.check()
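For readers unfamiliar with the protocol, here is a hypothetical plain-Python sketch of such a clone context manager (names and structure are illustrative, not Sage's actual ClonableArray implementation):

```python
import copy

class ClonableList:
    # Minimal sketch of the clone protocol described above.
    def __init__(self, data):
        self._data = list(data)
        self._mutable = False
    def __getitem__(self, i):
        return self._data[i]
    def __setitem__(self, i, v):
        if not self._mutable:
            raise ValueError("object is immutable")
        self._data[i] = v
    def check(self):
        pass  # subclasses verify their structural invariants here
    def clone(self):
        # return a temporarily mutable copy, usable as a context manager
        res = copy.copy(self)
        res._data = list(self._data)
        res._mutable = True
        return res
    def __enter__(self):
        return self
    def __exit__(self, exc_type, exc, tb):
        if exc_type is None:
            self._mutable = False   # restore immutability...
            self.check()            # ...and verify the invariants
        return False

a = ClonableList([1, 2, 3])
with a.clone() as b:
    b[1] = 4          # invariant-breaking modifications allowed here
assert b._data == [1, 4, 3]
try:
    b[1] = 5          # b is checked and immutable again
    raise AssertionError("should be immutable now")
except ValueError:
    pass
```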


Copying for Cloning
~~~~~~~~~~~~~~~~~~~

Since code with clones may involve a lot of copying of small objects, I don't
want copy to be slow. In particular, in most current use cases (there are
admittedly not that many of them), I'm performing the copy at the Cython/C
level. Dumping the object into a string (or any temporary Python structure)
and creating a new object from that is in my opinion not an option, as the
whole purpose of clone is to avoid unnecessary copies. So this rules out
copying by pickling, and therefore your nice ``ClearCacheOnPickle`` feature.


Caching and mutability
~~~~~~~~~~~~~~~~~~~~~~

I would rather have the following protocol:

- I'm OK to implement a ``copyAndClearTheCache`` at the level of ``Element``
  or ``ClonableElement``. Any support from the cache mechanism which makes it
  as fast as possible is welcome.

- In a second stage, I don't want to clear the cache upon every modification,
  as modifications are supposed to be elementary (e.g. modifying an array
  entry). As a consequence, I'd like to have a hook which **deactivates any
  caching on mutable objects**. Caching would be reactivated once the object
  is set immutable. Do you think that's feasible on the ``CachedMethod`` side
  while keeping the overhead as low as possible?

Currently, all Sage mutable objects, including not only clones but also
matrices and vectors, have a method ``_is_immutable``. Vectors and matrices
implement their own caching mechanism, but I think having cached methods work
with them would be good.

What do you think ?

Cheers,

Florent

comment:15 in reply to: ↑ 14 Changed 7 years ago by SimonKing

Replying to hivert:

  • I'm OK to implement a copyAndClearTheCache at the level of Element or ClonableElement. Any support from the cache mechanism which makes it as fast as possible is welcome.

I hope that the clear_cache method of cached methods is quick enough:

sage: class A:
....:     @cached_method
....:     def f(self):
....:         return 5
....:     @cached_method
....:     def g(self,i,n=4):
....:         return i**n
....:     
sage: a = A()
sage: %timeit a.f.clear_cache()
625 loops, best of 3: 182 ns per loop
sage: %timeit a.g.clear_cache()
625 loops, best of 3: 328 ns per loop

  • In a second stage, I don't want to clear the cache upon every modification, as modifications are supposed to be elementary (e.g. modifying an array entry). As a consequence, I'd like to have a hook which deactivates any caching on mutable objects. Caching would be reactivated once the object is set immutable. Do you think that's feasible on the CachedMethod side while keeping the overhead as low as possible?

One could certainly create a new decorator, say, @cache_if_immutable, that only writes to or reads from the cache if the object is immutable. I have no idea of the overhead (at least, one would have one call to _is_immutable() per access).
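A sketch of what such a decorator could look like in plain Python (the name cache_if_immutable comes from the suggestion above; treating ``_is_immutable`` as a plain attribute is a simplifying assumption):

```python
def cache_if_immutable(f):
    # hypothetical decorator: read/write the cache only when the
    # object declares itself immutable
    key = '_cache__' + f.__name__
    def wrapper(self, *args):
        if not getattr(self, '_is_immutable', False):
            return f(self, *args)          # never cache while mutable
        cache = self.__dict__.setdefault(key, {})
        if args not in cache:
            cache[args] = f(self, *args)
        return cache[args]
    return wrapper

class Elt:
    def __init__(self, x):
        self.x = x
        self._is_immutable = False
    def set_immutable(self):
        self._is_immutable = True
    @cache_if_immutable
    def double(self):
        return self.x * 2

e = Elt(5)
assert e.double() == 10
e.x = 6
assert e.double() == 12   # recomputed: nothing was cached while mutable
e.set_immutable()
assert e.double() == 12   # first immutable call fills the cache
```

The cost while mutable is one attribute lookup per call, which matches the "one call to _is_immutable() per access" estimate above.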

Currently, all Sage mutable objects, including not only clones but also matrices and vectors, have a method _is_immutable. Vectors and matrices implement their own caching mechanism, but I think having cached methods work with them would be good.

Technical difficulty: cached_method requires that either the object allows to set new attributes, or it has an attribute __cached_methods which is a dictionary. The latter exists for sage.structure.parent.Parent, by #11115.

Now the problem is that vectors (such as created by vector(CC, [1,2,3])) do not support adding attributes, and they do not have __cached_methods. By consequence, @cached_method will not work for them.

A possible solution would be to add cpdef dict __cached_methods to vectors. But then, the next complication arises:

sage: cython("""
....: from sage.structure.parent cimport Parent
....: cdef class MyParent(Parent):
....:     def f(self):
....:         return 5                                                                             
....: """)                                                                                        
sage: MyParent.f.__module__
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)                                                                                                                               

/home/simon/SAGE/work/pathalgebras/<ipython console> in <module>()

AttributeError: 'method_descriptor' object has no attribute '__module__'

Unfortunately, the __module__ attribute is read when a cached method is defined. Hence, although cached_method does work for a python class defined in a cython module, it does not (yet) work on a cdef class.

But one could try to work around:

sage: MyParent.f.__objclass__.__module__
'_home_simon__sage_temp_linux_sqwp_site_4700_tmp_2_spyx_0'

So, I guess I'll open another cached_method ticket!

comment:16 Changed 6 years ago by jdemeyer

  • Milestone changed from sage-5.11 to sage-5.12

comment:17 Changed 6 years ago by vbraun_spam

  • Milestone changed from sage-6.1 to sage-6.2

comment:18 Changed 5 years ago by vbraun_spam

  • Milestone changed from sage-6.2 to sage-6.3

comment:19 Changed 5 years ago by vbraun_spam

  • Milestone changed from sage-6.3 to sage-6.4