Opened 7 years ago

Closed 7 years ago

#15069 closed defect (fixed)

Make `MonoDictEraser` and `TripleDictEraser` safe against "recursion depth exceeded"

Reported by: SimonKing
Owned by:
Priority: major
Milestone: sage-5.12
Component: performance
Keywords:
Cc: nthiery, nbruin, vbraun
Merged in: sage-5.12.beta4
Authors: Simon King
Reviewers: Volker Braun
Report Upstream: N/A
Work issues:
Branch:
Commit:
Dependencies:
Stopgaps:

Description

The following happens both with Sage's MonoDict and with Python's weakref.WeakKeyDictionary:

from sage.structure.coerce_dict import MonoDict
M = MonoDict(11)

class A: pass
a = A()
prev = a

for i in range(1000):
    newA = A()
    M[prev] = newA
    prev = newA

len(M)
del a
Exception RuntimeError: 'maximum recursion depth exceeded while calling a Python object' in <sage.structure.coerce_dict.MonoDictEraser object at 0x5a13788> ignored

Replace M = MonoDict(11) with M = weakref.WeakKeyDictionary() and you get essentially the same error:

sage: del a
Exception RuntimeError: 'maximum recursion depth exceeded while calling a Python object' in <function remove at 0x5f9d578> ignored

However, a weakref.WeakValueDictionary seems safe:

sage: class A: pass
sage: M = weakref.WeakValueDictionary()
sage: a = A()
....: prev = a
....: for i in range(1000):
....:     newA = A()
....:     M[newA] = prev
....:     prev = newA
....:     
sage: len(M)
1000
sage: del a
sage: len(M)
0

even though the recursive deletion of dictionary items seems similar.

Aim of this ticket: Make it so that the erasers of MonoDict and TripleDict are not recursively called and thus the dreaded 'maximum recursion depth exceeded ... ignored' finally vanishes.
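The nesting that causes the overflow can be reproduced with plain Python weak references. The following is an illustrative toy (all names such as store, make_callback and slot are invented here, this is not Sage code): each callback pops a dictionary entry, and discarding the popped value deallocates the next object in the chain while the current callback is still running, so the callbacks pile up on the stack:

```python
import weakref

# store maps a slot index to (weakref-to-key, value); popping an entry
# drops the only strong reference to the next object in the chain.
store = {}
depth = 0        # current callback nesting level
max_depth = 0    # deepest nesting observed

class A(object):
    pass

def make_callback(slot):
    def callback(ref):
        global depth, max_depth
        depth += 1
        max_depth = max(max_depth, depth)
        # Discarding the popped entry deallocates the value right here,
        # firing the next callback while this one is still running.
        store.pop(slot, None)
        depth -= 1
    return callback

head = A()
prev = head
for i in range(15):
    nxt = A()
    store[i] = (weakref.ref(prev, make_callback(i)), nxt)
    prev = nxt

del head, prev, nxt
print(max_depth)   # equals the chain length in CPython: the callbacks nest
```

With a chain of 15 the counter reaches 15; this is the pattern that, at chain length 1000, runs into the recursion limit.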

Attachments (1)

trac15069-recursion_safe_callback.patch (5.5 KB) - added by SimonKing 7 years ago.


Change History (11)

comment:1 Changed 7 years ago by SimonKing

Comments 97--102 at #10963 provide some ideas on how to solve the problem.

Actually, I originally thought that the solution would be easy: when the eraser of a MonoDict key is called, the to-be-deleted value is assigned to a temporary variable and is thus prevented from being deallocated while the corresponding items are removed from the MonoDict's bucket; the value can only be deallocated (triggering a new invocation of the eraser) once the original call to the eraser is done and the temporary variable vanishes.

Inserting print statements shows that in unpatched Sage the eraser calls are nested, which explains the "recursion depth exceeded". And in fact, the trick with the temporary variable has the effect that the eraser calls are now neatly ordered and no longer nested. And now comes the punch line: the "recursion depth exceeded" error does not vanish!

The comments at #10963 seem to show that the error behaves differently depending on whether the class A from the example in the ticket description is a Python or a Cython class. But there is a recursion depth problem in either case.
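For what it's worth, the temporary-variable experiment can be imitated in plain Python (again an invented toy, not the Cython eraser). A Python-level nesting counter then reports no nesting at all, because the local variable dies only during frame teardown, after the counter has already been decremented; the calls still nest inside the C call that tears the frame down, which would explain why the prints look neatly ordered while the recursion problem survives:

```python
import weakref

store = {}
depth = 0        # current callback nesting level
max_depth = 0    # deepest nesting observed
order = []       # sequence in which the callbacks run

class A(object):
    pass

def make_callback(slot):
    def callback(ref):
        global depth, max_depth
        depth += 1
        max_depth = max(max_depth, depth)
        order.append(slot)
        # Keep the value alive in a local: it now dies only when this
        # frame is torn down, i.e. after depth has been decremented.
        val = store.pop(slot, (None, None))[1]
        depth -= 1
    return callback

head = A()
prev = head
for i in range(15):
    nxt = A()
    store[i] = (weakref.ref(prev, make_callback(i)), nxt)
    prev = nxt

del head, prev, nxt
print(max_depth)   # 1: the Python-level counter no longer sees any nesting
print(order)       # the callbacks still run in chain order
```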

comment:2 follow-up: Changed 7 years ago by SimonKing

Defining

class A:                  
    def __del__(self):
        print "__del__",id(self)

and keeping track of the order in which the elements are created by

L = []
a = A()
prev = a
for i in range(1000):
    L.append(id(prev))
    newA = A()
    M[prev] = newA
    prev = newA
del a

I see that the n-th element created (the first one is a) is the "last but n"-th element for which __del__ is called. This comes as a surprise. One would think that del a first makes a disappear, which then triggers a call to the "eraser" callback, which then makes M[a] disappear, which in turn triggers the next "eraser" callback.

comment:3 in reply to: ↑ 2 Changed 7 years ago by SimonKing

Replying to SimonKing:

I see that the n-th element created (the first one is a) is the "last but n"-th element for which __del__ is called. This comes as a surprise.

Or not? The weakref module says:

Weak references to an object are cleared before the object's __del__() is called, to ensure that the weak reference callback (if any) finds the object still alive.

Hm. I understand that a.__del__() is executed only after the callback of the weak reference to a has finished. But print statements show that the callback for a finishes before the callback for M[a] is invoked! So why is a.__del__() not executed directly after the first callback?

comment:4 Changed 7 years ago by SimonKing

It seems that it is not enough to keep a reference to the value of the key-value pair being deleted by the eraser. But if the eraser returns the value, the recursion depth error disappears. Hence, with this patch

  • sage/structure/coerce_dict.pyx

    diff --git a/sage/structure/coerce_dict.pyx b/sage/structure/coerce_dict.pyx
    --- a/sage/structure/coerce_dict.pyx
    +++ b/sage/structure/coerce_dict.pyx
    @@ -187,11 +187,14 @@
             h,offset = r.key
             cdef list bucket = <object>PyList_GET_ITEM(buckets, (<size_t>h) % PyList_GET_SIZE(buckets))
             cdef Py_ssize_t i
    +        cdef object val
             for i from 0 <= i < PyList_GET_SIZE(bucket) by 3:
                 if PyInt_AsSsize_t(PyList_GET_ITEM(bucket,i))==h:
                     if PyList_GET_ITEM(bucket,i+offset)==<void *>r:
    +                    val = <object>PyList_GET_ITEM(bucket,i+2)
                         del bucket[i:i+3]
                         D._size -= 1
                         break
                     else:
                         break
    +        return val

seems to work. Doing tests now...
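In pure Python, the effect of returning the value can be sketched with a toy weak-key dictionary. The class and method names below are invented; this is not Sage's actual Cython code, and it relies on the CPython behaviour that a callback's return value is discarded by the C caller only after the callback's frame is gone:

```python
import weakref

class MiniMonoDict(object):
    """Toy weak-key dictionary illustrating the trick from the patch:
    the eraser callback *returns* the removed value, so (as a CPython
    implementation detail) its final decref happens after the callback's
    frame no longer exists, instead of nesting a further callback
    inside it. No collision or deletion handling: illustration only."""

    def __init__(self):
        self._data = {}   # id(key) -> (weakref-to-key, value)

    def __setitem__(self, key, value):
        k = id(key)
        self._data[k] = (weakref.ref(key, self._make_eraser(k)), value)

    def __getitem__(self, key):
        ref, value = self._data[id(key)]
        if ref() is not key:
            raise KeyError(key)
        return value

    def __len__(self):
        return len(self._data)

    def _make_eraser(self, k):
        def eraser(ref):
            # Returning the value hands its (possibly last) reference
            # back to the code that invoked the callback, deferring the
            # deallocation until this frame is gone.
            return self._data.pop(k, (None, None))[1]
        return eraser

class A(object):
    pass

M = MiniMonoDict()
head = A()
prev = head
for i in range(1000):
    nxt = A()
    M[prev] = nxt
    prev = nxt

del head, prev, nxt
print(len(M))   # 0: all entries were erased, one callback at a time
```

Here the erasers run one after another instead of nesting, so even the chain of 1000 objects from the ticket description collapses without tripping the recursion limit.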

Changed 7 years ago by SimonKing

comment:5 Changed 7 years ago by SimonKing

  • Authors set to Simon King

With the patch that I have attached, the "recursion depth exceeded" problem seems to vanish. Let's see if the full test suite passes.

comment:6 Changed 7 years ago by SimonKing

  • Status changed from new to needs_review

The tests pass for me. Needs review, then!

comment:7 follow-up: Changed 7 years ago by vbraun

I don't think Python makes any guarantees that this is safe. The issue is precisely where the object is deallocated. As you noted, a local variable doesn't cut it, as it gets destroyed while we are still in the "remove" function frame. A return value apparently works, but that is just an implementation detail of CPython. A different Python version might very well destroy returned-but-discarded values while still in the function frame.

It's still better than before, so if you want to run with this workaround then that is fine with me.

comment:8 in reply to: ↑ 7 Changed 7 years ago by nthiery

Replying to vbraun:

I don't think Python makes any guarantees that this is safe. The issue is precisely where the object is deallocated. As you noted, a local variable doesn't cut it, as it gets destroyed while we are still in the "remove" function frame. A return value apparently works, but that is just an implementation detail of CPython. A different Python version might very well destroy returned-but-discarded values while still in the function frame.

It's still better than before, so if you want to run with this workaround then that is fine with me.

In the worst case we will get an explicit recursion error, some objects won't get deleted, and the calculation will proceed safely, right?

If yes, then I agree it's not yet perfect but safe enough to go ahead. Thanks Simon!

I don't feel quite qualified for the fine technical details. Volker, if you give me a green light, I am happy to finish the rest of the review.

Cheers,

Nicolas

comment:9 Changed 7 years ago by vbraun

  • Reviewers set to Volker Braun
  • Status changed from needs_review to positive_review

At least we'll have a doctest, so if it breaks in the future we'll be notified ;-)

comment:10 Changed 7 years ago by jdemeyer

  • Merged in set to sage-5.12.beta4
  • Resolution set to fixed
  • Status changed from positive_review to closed