Opened 3 years ago

Closed 3 years ago

#24954 closed enhancement (fixed)

Stronger references in CachedRepresentation

Reported by: jdemeyer
Owned by:
Priority: major
Milestone: sage-8.2
Component: misc
Keywords:
Cc:
Merged in:
Authors: Jeroen Demeyer
Reviewers: Marc Mezzarobba
Report Upstream: N/A
Work issues:
Branch: a802f12 (Commits, GitHub, GitLab)
Commit: a802f12aee11f3b100f82108aacd151e6aad3c80
Dependencies:
Stopgaps:

Description

There is some very bad behaviour related to CachedRepresentation caching that I'm observing on #24742 (but this is otherwise unrelated to that ticket):

sage: timeit('MatrixSpace(ZZ,3,3)')
625 loops, best of 3: 117 µs per loop

Now we try again, but first we create a strong reference:

sage: S = MatrixSpace(ZZ,3,3)
sage: timeit('MatrixSpace(ZZ,3,3)')
625 loops, best of 3: 4.13 µs per loop

This is much faster the second time! In the first example, the caching done by CachedRepresentation.__classcall__ is pointless: since there is no strong reference to the entry in the cache, the entry is dropped as soon as the temporary MatrixSpace(ZZ,3,3) object is deleted.

This is just the usual Py_DECREF of Python objects; it has nothing to do with the cyclic garbage collector: the behaviour remains the same even with gc.disable().

This makes me think that we might need to change CachedRepresentation to use semi-strong references: these would only be deleted by the cyclic garbage collector but not by a simple Py_DECREF().
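To see the effect with nothing but the standard library (a minimal sketch, not Sage code; the names Value and cache are made up here): with a plain weakref.WeakValueDictionary, the cached value vanishes the moment its last strong reference is dropped, even while the cyclic collector is disabled.

import gc
import weakref

class Value(object):
    pass

cache = weakref.WeakValueDictionary()

gc.disable()                  # the cyclic collector plays no role here
v = cache["key"] = Value()
assert "key" in cache         # alive as long as a strong reference exists
del v                         # a plain Py_DECREF drops the last strong reference
assert "key" not in cache     # the cache entry is gone immediately
gc.enable()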

Change History (22)

comment:1 Changed 3 years ago by SimonKing

Idea: SemiWeakValueDictionary will initially keep a strong reference to each value (say, by means of a class attribute holding a list of the values), and gc.collect() will be monkey-patched so that it first drops all strong references kept by SemiWeakValueDictionary before doing the cyclic collection.

That way, the values of SemiWeakValueDictionary would persist until the next cyclic garbage collection occurs.

Would that work?

comment:2 Changed 3 years ago by SimonKing

Proof of concept:

sage: from weakref import WeakValueDictionary
sage: from gc import collect
sage: import gc
sage: class MyThing(object):
....:     def __del__(self):
....:         print "<{}> gone".format(id(self))
....:     def __init__(self, n):
....:         print "new thing for {}".format(n)
....:         
sage: class MyWVD(object):
....:     refs = []
....:     def __init__(self):
....:         self.D = WeakValueDictionary()
....:     def __getitem__(self, k):
....:         return self.D[k]
....:     def __setitem__(self, k, v):
....:         self.D[k] = v
....:         MyWVD.refs.append(v)
....:         
sage: D = MyWVD()
sage: def my_collect(generation=None):
....:     MyWVD.refs = []
....:     if generation is not None:
....:         collect(generation)
....:     else:
....:         collect()
....:         
sage: gc.collect = my_collect
sage: a = D[1] = MyThing(1); del a
new thing for 1
sage: a = D[2] = MyThing(2); del a
new thing for 2
sage: gc.collect()
<140253944715856> gone
<140253944736848> gone

comment:3 Changed 3 years ago by SimonKing

Timing:

sage: class MyThing(object):
....:     pass
....:         
sage: D1 = MyWVD()
sage: D2 = WeakValueDictionary()
sage: def test1(n):
....:     try:
....:         return D1[n]
....:     except KeyError:
....:         a = D1[n] = MyThing()
....:         return a
....:     
sage: def test2(n):
....:     try:
....:         return D2[n]
....:     except KeyError:
....:         a = D2[n] = MyThing()
....:         return a
....:     
sage: %timeit test1(5)
The slowest run took 54.57 times longer than the fastest. This could mean that an intermediate result is being cached.
1000000 loops, best of 3: 1.14 µs per loop
sage: %timeit test2(5)
The slowest run took 15.23 times longer than the fastest. This could mean that an intermediate result is being cached.
100000 loops, best of 3: 4.87 µs per loop

vs.

sage: %timeit test1(5); gc.collect()
10 loops, best of 3: 60 ms per loop
sage: %timeit test2(5); gc.collect()
10 loops, best of 3: 60 ms per loop

comment:4 follow-up: Changed 3 years ago by embray

You can't "just" monkey-patch the gc module, since what you can reach from Python are just Python-level wrappers for lower-level C functions used internally by the Python interpreter. This only changes what happens if one manually calls gc.collect() (in which case there's not much point in monkey-patching).

comment:5 in reply to: ↑ 4 Changed 3 years ago by SimonKing

Replying to embray:

You can't "just" monkey-patch the gc module, since what you can reach from Python are just Python-level wrappers for lower-level C functions used internally by the Python interpreter. This only changes what happens if one manually calls gc.collect() (in which case there's not much point in monkey-patching).

You're right. Hm. Then one could perhaps construct a guardian that will not be collected unless a cyclic garbage collection happens, with a __del__ method that triggers weakening of the SemiWeakValueDictionary. I'll try to construct it...

comment:6 follow-up: Changed 3 years ago by jdemeyer

I see two possible implementations, both quite simple:

(A) Keep strong references to the last 100 (or whatever number) values inserted in a WeakValueDictionary. Basically, a Python list lastvalues with a wrap-around counter i such that, whenever an item is inserted in the dictionary, it is also stored as lastvalues[i]. This requires changing only WeakValueDictionary.

(B) Keep a strong reference from the value to itself (self.__self = self) forcing the object to be part of a reference cycle. This way, it can be deleted only by the garbage collector. This requires changing the values, i.e. CachedRepresentation.

I'll try to implement (A).
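
A rough sketch of what (A) could look like (illustrative only; the class name and attributes are made up here, this is not the branch's actual code): a WeakValueDictionary subclass that additionally pins the last cache_size inserted values in a fixed-size list with a wrap-around index.

import weakref

class WeakValueDictWithStrongTail(weakref.WeakValueDictionary):
    """WeakValueDictionary keeping strong references to the last
    ``cache_size`` values inserted, so that short-lived callers
    still find them in the cache (sketch only)."""
    def __init__(self, cache_size=64):
        weakref.WeakValueDictionary.__init__(self)
        self._strong = [None] * cache_size  # fixed-size ring of strong references
        self._next = 0                      # wrap-around insertion counter

    def __setitem__(self, key, value):
        weakref.WeakValueDictionary.__setitem__(self, key, value)
        # Overwriting a slot implicitly drops the strong reference to the
        # value that was inserted cache_size insertions ago.
        self._strong[self._next] = value
        self._next = (self._next + 1) % len(self._strong)

With such a dictionary backing the CachedRepresentation cache, a value would stay alive at least until cache_size further insertions have happened.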

comment:7 in reply to: ↑ 6 Changed 3 years ago by SimonKing

Replying to jdemeyer:

I see two possible implementations, both quite simple:

(A) Keep strong references to the last 100 (or whatever number) values inserted in a WeakValueDictionary. Basically, a Python list lastvalues with a wrap-around counter i such that, whenever an item is inserted in the dictionary, it is also stored as lastvalues[i]. This requires changing only WeakValueDictionary.

(B) Keep a strong reference from the value to itself (self.__self = self) forcing the object to be part of a reference cycle. This way, it can be deleted only by the garbage collector. This requires changing the values, i.e. CachedRepresentation.

I'll try to implement (A).

I will try to implement (C), which is:

(C) Create a "guardian" with a reference to itself (so that it will only be deleted during cyclic garbage collection) and create a weak reference with a callback from the class SemiWeakValueDictionary to the guardian. The callback will both restore the weak reference to the guardian and clear the strong references to the values -- and apparently the callback is only called when a cyclic collection happens.

comment:8 Changed 3 years ago by jdemeyer

So, in practice, the main difference between (A) and (C) is when the strong references are cleared. With (A), it would happen after 100 further objects have been inserted into the WeakValueDictionary; with (C), it would happen upon garbage collection. I have no idea which works best.

comment:9 Changed 3 years ago by SimonKing

I thought (C) would work as follows:

sage: from weakref import WeakValueDictionary, ref
sage: class Guardian(object):
....:     def __init__(self):
....:         self._self = self
....:         
sage: class SWVD(object):
....:     @staticmethod
....:     def collect(*foo):
....:         print "collect"
....:         SWVD.guard = ref(Guardian(), cls.collect)
....:         SWVD.refs = []
....:     refs = []
....:     guard = ref(Guardian(), collect)
....:     def __init__(self):
....:         self.D = WeakValueDictionary()
....:     def __getitem__(self, k):
....:         return self.D[k]
....:     def __setitem__(self, k, v):
....:         self.D[k] = v
....:         SWVD.refs.append(v)
....:         
sage: class MyThing(object):
....:     def __del__(self):
....:         print "<{}> gone".format(id(self))
....:     def __init__(self, n):
....:         print "new thing for {}".format(n)
....:         
sage: D = SWVD()
sage: a = D[2] = MyThing(2); del a
new thing for 2
sage: a = D[2] = MyThing(2); del a
new thing for 2
sage: a = D[2] = MyThing(2); del a
new thing for 2

That's fine. But something went wrong:

sage: import gc
sage: gc.collect()
Exception TypeError: "'staticmethod' object is not callable" in <staticmethod object at 0x7f8cda18b2f0> ignored
58

So, why is the static method not callable? What are the requirements on the callback of a weak reference?
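
For reference, a minimal reproduction of that message, assuming Python 2 semantics as in the session above (Guardian and Demo are just illustrative names): inside the class body, collect still refers to the raw staticmethod descriptor rather than to the underlying function, and such a descriptor is not callable (that changed only in Python 3.10). weakref.ref accepts it as callback anyway, and the TypeError only surfaces, printed and ignored, when the callback actually fires during collection.

import gc
import weakref

class Guardian(object):
    def __init__(self):
        self._self = self   # reference cycle: only the cyclic GC frees it

class Demo(object):
    @staticmethod
    def collect(wr):
        print("collect")
    # ``collect`` here is the staticmethod descriptor, not a plain function,
    # so (before Python 3.10) it cannot be called when the Guardian is
    # eventually collected.
    guard = weakref.ref(Guardian(), collect)

gc.collect()   # on Python 2: "'staticmethod' object is not callable" is printed and ignored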

comment:10 Changed 3 years ago by SimonKing

Ouch, I found the problem: an undefined variable "cls" in the callback function that shouldn't be there...

Nonetheless, so far, I don't get it to work.

comment:11 Changed 3 years ago by SimonKing

Got it.

Problem: During creation of the class SWVD, I wanted to refer to an attribute of that class. Apparently it didn't work. So, now, I am setting the class attribute during initialisation of the first instance of that class.

sage: from weakref import WeakValueDictionary, ref
sage: import gc
sage: class Guardian(object):
....:     def __init__(self):
....:         self._self = self
....:         
sage: class SWVD(object):
....:     @classmethod
....:     def collect(cls, *foo):
....:         print "collect"
....:         cls.guard = ref(Guardian(), cls.collect)
....:         cls.refs = []
....:     refs = []
....:     guard = None
....:     def __init__(self):
....:         if self.__class__.guard is None:
....:             self.__class__.guard = ref(Guardian(), self.__class__.collect)
....:         self.D = WeakValueDictionary()
....:     def __getitem__(self, k):
....:         return self.D[k]
....:     def __setitem__(self, k, v):
....:         self.D[k] = v
....:         SWVD.refs.append(v)
....:         
sage: class MyThing(object):
....:     def __del__(self):
....:         print "<{}> gone".format(id(self))
....:     def __init__(self, n):
....:         print "new thing for {}".format(n)
....:         
sage: D = SWVD()

Elements of a SemiWeakValueDictionary aren't immediately deleted:

sage: a = D[2] = MyThing(2); del a
new thing for 2
sage: a = D[2] = MyThing(2); del a
new thing for 2
sage: a = D[2] = MyThing(2); del a
new thing for 2

But they are deleted when a garbage collection happens:

sage: gc.collect()
collect
<139776980263696> gone
<139776980313808> gone
<139776980313104> gone
71

I guess the above would basically work, although I would prefer to define SWVD.guard during creation of the class and not during creation of its first instance.

Edit: I also tested a = D[2] = MyThing(2); del a in a loop, and indeed a "spontaneous" garbage collection does result in the deletion of the dictionary values, just as it should. So, this time I am not just touching the Python layer of the gc module...

Last edited 3 years ago by SimonKing

comment:12 Changed 3 years ago by jdemeyer

  • Branch set to u/jdemeyer/stronger_references_in_cachedrepresentation

comment:13 Changed 3 years ago by jdemeyer

  • Commit set to 99717eb55a7833b630e5fd0732dc45b2c0b98d8c

This is my proof-of-concept implementation. It simply keeps strong references to the last 64 objects inserted into the CachedWeakValueDictionary.


New commits:

99717eb  CachedWeakValueDictionary with strong references to last values

comment:14 follow-up: Changed 3 years ago by vbraun

Thats the obvious solution, keep a finite-length lru cache around.

comment:15 Changed 3 years ago by git

  • Commit changed from 99717eb55a7833b630e5fd0732dc45b2c0b98d8c to bd9e3685f6c34c1d58cc330a97c3f77be1a4cfe9

Branch pushed to git repo; I updated commit sha1. This was a forced push. New commits:

bd9e368  CachedWeakValueDictionary with strong references to last values

comment:16 Changed 3 years ago by jdemeyer

  • Status changed from new to needs_review

comment:17 in reply to: ↑ 14 Changed 3 years ago by jdemeyer

Replying to vbraun:

That's the obvious solution: keep a finite-length LRU cache around.

I'm not using an LRU cache, but a much simpler FIFO.
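
For contrast, a sketch of what the LRU variant would have to do (illustrative code, not from the branch): an LRU cache must update its bookkeeping on every cache hit so that it can evict the least recently used entry, whereas the FIFO only records insertions, so a hit in the weak dictionary costs nothing extra.

from collections import OrderedDict

class LRUStrongRefs(object):
    """Pin the ``size`` most recently *used* values (LRU policy)."""
    def __init__(self, size=64):
        self._refs = OrderedDict()
        self._size = size

    def touch(self, key, value):
        # Must be called on every cache hit as well as on every insertion.
        self._refs.pop(key, None)
        self._refs[key] = value             # (re)insert at the most-recent end
        if len(self._refs) > self._size:
            self._refs.popitem(last=False)  # evict the least recently used

The FIFO keeps only an insertion ring (as in the sketch under comment 6), so lookups that hit the weak dictionary never pay for any eviction bookkeeping.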

comment:18 Changed 3 years ago by jdemeyer

  • Status changed from needs_review to needs_work

comment:19 Changed 3 years ago by git

  • Commit changed from bd9e3685f6c34c1d58cc330a97c3f77be1a4cfe9 to a802f12aee11f3b100f82108aacd151e6aad3c80

Branch pushed to git repo; I updated commit sha1. This was a forced push. New commits:

a802f12  CachedWeakValueDictionary with strong references to last values

comment:20 Changed 3 years ago by jdemeyer

  • Status changed from needs_work to needs_review

Fixed a few test failures.

comment:21 Changed 3 years ago by mmezzarobba

  • Reviewers set to Marc Mezzarobba
  • Status changed from needs_review to positive_review

This solves the issue (random slowdowns in the test suite of ore_algebra) that initially led me to the examples mentioned on #24742. And a way to cache the results of the last few calls to a function is something I was missing for other reasons as well.

comment:22 Changed 3 years ago by vbraun

  • Branch changed from u/jdemeyer/stronger_references_in_cachedrepresentation to a802f12aee11f3b100f82108aacd151e6aad3c80
  • Resolution set to fixed
  • Status changed from positive_review to closed