Opened 8 years ago
Closed 8 years ago
#13904 closed defect (duplicate)
Better deletion of items of TripleDict
Reported by: | SimonKing | Owned by: | rlm |
---|---|---|---|
Priority: | major | Milestone: | sage-duplicate/invalid/wontfix |
Component: | memleak | Keywords: | |
Cc: | nbruin, vbraun, jpflori | Merged in: | |
Authors: | Reviewers: | Simon King, Jean-Pierre Flori | |
Report Upstream: | N/A | Work issues: | |
Branch: | Commit: | ||
Dependencies: | #13896, #12313 | Stopgaps: |
Description (last modified by )
In #715, TripleDict
has been modified, so that the key items are compared by identity rather than equality, and so that weak keys are used when possible.
However, as it turns out, the deletion of the items tracked in a TripleDict
was done improperly. The aim of this ticket is to fix that.
Superseded by #13387.
Attachments (4)
Change History (38)
Changed 8 years ago by
comment:1 follow-ups: ↓ 4 ↓ 7 ↓ 9 Changed 8 years ago by
There's the more severe problem that during GC cleanup, a weakref callback could lead to the deletion of a non-weakreffable key, which could retrigger the dealloc of the object that has just generated the weakref. That would lead to a double dealloc.
The following might be a solution:
- equip TripleDict? with an attribute
self.deathrow=[]
. - when TripleDictEraser? finds itself removing a non-weakreffed key (other than None, int, float, etc), it appends the key to
self.deathrow
instead.
This way, we don't trigger the deletion (and the whole cascade that may follow) in the callback.
Deathrow will be cleaned up eventually, as part of the regular cleanup code for deallocation of TripleDict?.
The problem is that weakref callback operates in a strange environment: There is a dealloc call higher up the stack somewhere. Normally python code doesn't run under those circumstances. It shouldn't make much difference, but you should absolutely not delete that object again. Hence, I think the default should be: Don't delete objects in callback. You can modify this rule, but only on a case-by-case basis, by carefully reasoning that the deletions that you do allow won't lead to nasty unpredictable deletion cascades.
comment:2 Changed 8 years ago by
I don't think we need this ticket for the __delitem__
optimization. That can be wrapped into #13387.
comment:3 Changed 8 years ago by
Nils, indeed it might be an idea to use the ideas from #13387. However, the ticket here is not about optimization.
Anyway.
I am starting with Sage's debug version as in #13864. With the first patch applied, one gets a crash in modules/module.pyx
With both patches applied, one gets
sage -t "devel/sage-main/sage/modules/module.pyx" [3.5 s]
So, that looks like a progress.
However, at least with a slightly different version of the first patch, make ptest still results in
sage -t -force_lib devel/sage/sage/schemes/elliptic_curves/ell_point.py # Killed/crashed sage -t -force_lib devel/sage/sage/algebras/free_algebra_quotient.py # Killed/crashed sage -t -force_lib devel/sage/sage/algebras/free_algebra_quotient_element.py # Killed/crashed sage -t -force_lib devel/sage/sage/categories/modules_with_basis.py # Killed/crashed sage -t -force_lib devel/sage/sage/categories/homset.py # 1 doctests failed
when both patches are applied. Note, however, that I can not reproduce these crashes when I run them independently.
Hence, one needs to dig deeper. See comment:1.
comment:4 in reply to: ↑ 1 ; follow-up: ↓ 6 Changed 8 years ago by
Replying to nbruin:
There's the more severe problem that during GC cleanup, a weakref callback could lead to the deletion of a non-weakreffable key, which could retrigger the dealloc of the object that has just generated the weakref. That would lead to a double dealloc.
Ah, you say that the problem actually isn't so much the __delitem__
but the callback? OK, I'll have a look into that. And of course we should try to rebase #13387.
comment:5 Changed 8 years ago by
Oops, I just notice that the patch attempts to fix TripleDictEraser.__call__
, hence, the callback, but not TripleDict.__delitem__
. So, I did have a look into the callback function...
comment:6 in reply to: ↑ 4 Changed 8 years ago by
Replying to SimonKing:
Ah, you say that the problem actually isn't so much the
__delitem__
but the callback? OK, I'll have a look into that. And of course we should try to rebase #13387.
The __delitem__
isn't wrong. It just happens to be slightly more memory intensive that the straight del
, so it happened to trigger a GC. The crash was due to Cython not handling GC during deallocation properly. That's getting fixed with #13896. The potential issue I described above hasn't been observed. I suspect that one could engineer an example. Perhaps one should (Robert's cython test case for #13896 can serve as a template). It's just that memory management is such a sensitive business that one should really (informally) prove bad scenarios can't happen rather than wait until they do. I think the scenario I describe can happen and I don't see what measures in the python code would prevent it.
One way to solve this would be to forbid non-weakreffable container types to be used as keys. The problem is that I can't think of a reliable test for that. Since the consequences of violating might lead to incredibly insidious memory corruptions, I think we should guard against it. A consequence is that any non-weakreffable (non-whitelisted such as None or int) key automatically has its life span bounded below by the lifetime of the TripleDict
, even if the entry in which it served got removed due to a weakref dying (explicit key deletions would be fine). I don't see how we could get hold of a hook where clearing deathrow
would be safe.
comment:7 in reply to: ↑ 1 ; follow-up: ↓ 8 Changed 8 years ago by
Replying to nbruin:
There's the more severe problem that during GC cleanup, a weakref callback could lead to the deletion of a non-weakreffable key, which could retrigger the dealloc of the object that has just generated the weakref. That would lead to a double dealloc.
Is what you describe as follows?
Let's say we have a key (A,B,C)
, where A
is weakrefed and got deleted, so that the key (A,B,C)
gets removed by means of A
's callback function. Now let B
be non-weakrefable. Assume further that the key triple is the last reference to B, and assume that B
has an attribute that constitutes a (strong? weak?) reference to A
. Then, deletion of B
would trigger a second deletion of A
.
Or did I misunderstand? Can you elaborate a bit more, please?
comment:8 in reply to: ↑ 7 Changed 8 years ago by
Replying to SimonKing:
Let's say we have a key
(A,B,C)
, whereA
is weakrefed and got deleted, so that the key(A,B,C)
gets removed by means ofA
's callback function. Now letB
be non-weakrefable. Assume further that the key triple is the last reference to B, and assume thatB
has an attribute that constitutes a (strong? weak?) reference toA
. Then, deletion ofB
would trigger a second deletion ofA
.
Yes, that's correct. It would have to be a strong reference from B to A, because deletion of weakrefs should not trigger deletion cascades. Of course, that means the reference count to A isn't 0, so normally A would not be eligible for deletion ... but if A and B are found to be cyclic garbage, deallocation could still occur. We'd be at the mercy of GC for how the chain gets broken up and deleted.
In fact, moving B to deathrow
could be tricky, because that would resurrect B
(generally frowned upon), albeit only within the cyclic garbage. OK, I'll try and see if I can make a testcase that triggers this behaviour. At this point I don't even know how to fix this. Perhaps we should just forbid non-weakreffable keys that refer to another key component. Mind you, that forbids
(None, ZZ, ZZ(1) )
as a key, which doesn't immediately look completely unlikely. We don't generate keys like that, but someone else might ...
comment:9 in reply to: ↑ 1 ; follow-up: ↓ 10 Changed 8 years ago by
Replying to nbruin:
There's the more severe problem that during GC cleanup, a weakref callback could lead to the deletion of a non-weakreffable key, which could retrigger the dealloc of the object that has just generated the weakref. That would lead to a double dealloc.
PS: Why can this only happen with a non-weakreffable key? In particular, what are the keys in our crashing example? Namely, in its current use, I don't think that there are any non-weakreffable keys other than None and ints.
The following might be a solution:
- equip TripleDict? with an attribute
self.deathrow=[]
.- when TripleDictEraser? finds itself removing a non-weakreffed key (other than None, int, float, etc), it appends the key to
self.deathrow
instead.This way, we don't trigger the deletion (and the whole cascade that may follow) in the callback.
Deathrow will be cleaned up eventually, as part of the regular cleanup code for deallocation of TripleDict?.
The problem is that weakref callback operates in a strange environment: There is a dealloc call higher up the stack somewhere. Normally python code doesn't run under those circumstances. It shouldn't make much difference, but you should absolutely not delete that object again. Hence, I think the default should be: Don't delete objects in callback.
One could provide a new method TripleDict.kill_deathrow
, that will be called in any other method of TripleDict
. Hence, the deletions would occur when the TripleDict
will be used another time, and in particular this will only happen after the "dealloc call higher up the stack" has been completed.
Perhaps kill_deathrow
could also be called at the beginning of the callback function - so that it performs the deletions that were scheduled by the preceding callback function.
Problem could be a severe slow-down.
comment:10 in reply to: ↑ 9 ; follow-up: ↓ 11 Changed 8 years ago by
Replying to SimonKing:
PS: Why can this only happen with a non-weakreffable key? In particular, what are the keys in our crashing example? Namely, in its current use, I don't think that there are any non-weakreffable keys other than None and ints.
The cython-induced crash has only weakreffable keys. The hypothetical behaviour has not been observed.
One could provide a new method
TripleDict.kill_deathrow
, that will be called in any other method ofTripleDict
. Hence, the deletions would occur when theTripleDict
will be used another time, and in particular this will only happen after the "dealloc call higher up the stack" has been completed.
But arbitrary code can be executed during weakref callbacks, including arbitrary TripleDict
operations. So you'd never be sure if it's safe to kill deathrow
.
Really, this is starting to look more like a design deficiency in CPython: it should really mark objects on which dealloc is running to prevent GC, decref, whatever, from reentering dealloc. Preventing this by code protocol just seems to unduly restrict allowed data structures.
Perhaps
kill_deathrow
could also be called at the beginning of the callback function - so that it performs the deletions that were scheduled by the preceding callback function.
You wouldn't be sure that deathrow entries are now safe to delete ...
EXECUTIVE DECISION: we mandate that key components to TripleDict
be either weakreffable or do not hold references to other objects (i.e., are non container-types). Perhaps we can relax this rule once we come up with a smart mechanism.
comment:11 in reply to: ↑ 10 Changed 8 years ago by
Replying to nbruin:
Replying to SimonKing:
PS: Why can this only happen with a non-weakreffable key? In particular, what are the keys in our crashing example? Namely, in its current use, I don't think that there are any non-weakreffable keys other than None and ints.
The cython-induced crash has only weakreffable keys...
... and is an upstream bug in Cython?
One could provide a new method
TripleDict.kill_deathrow
, that will be called in any other method ofTripleDict
. Hence, the deletions would occur when theTripleDict
will be used another time, and in particular this will only happen after the "dealloc call higher up the stack" has been completed.But arbitrary code can be executed during weakref callbacks, including arbitrary
TripleDict
operations. So you'd never be sure if it's safe to killdeathrow
.
Not quite, because we Know exactly what is happening in our weakref callbacks (namely: In the __call__
method of TripleDictEraser
). The callback function will not call any method of TripleDict
.
Perhaps
kill_deathrow
could also be called at the beginning of the callback function - so that it performs the deletions that were scheduled by the preceding callback function.You wouldn't be sure that deathrow entries are now safe to delete ...
Agreed.
EXECUTIVE DECISION: we mandate that key components to
TripleDict
be either weakreffable or do not hold references to other objects (i.e., are non container-types). Perhaps we can relax this rule once we come up with a smart mechanism.
I guess this would be the easiest solution on the short run. Do you think of a whitelist, i.e., explicitly allowing NoneType
, int
, long
and so on?
comment:12 Changed 8 years ago by
By the way, my first attachment is wrong anyway, because I mispelled gc.isenabled()
.
Changed 8 years ago by
For debugging only. Increase the likelyhood that a gc occurs during a weakref callback
comment:13 follow-ups: ↓ 14 ↓ 16 Changed 8 years ago by
The "debug_only" patch is updated. With it, I tried the following:
sage: cython(""" ....: cdef class A: ....: cdef object x ....: def __init__(self, x): ....: self.x = x ....: """) sage: import gc sage: from sage.structure.coerce_dict import TripleDict sage: T = TripleDict(31) sage: while(1): ....: b = B() ....: a = A(b) ....: T[b,a,None] = a ....: del a,b ....: foo = gc.collect() ....:
That should be exactly the theoretical situation you were describing, isn't it?
But there is no crash. When I interrupt it after a while, with Ctrl-C, I get
sage: len(T) 1681
This I don't understand. Is TripleDict
broken, in the sense of it doesn't allow its weak keys be cleared??
comment:14 in reply to: ↑ 13 Changed 8 years ago by
Replying to SimonKing:
This I don't understand. Is
TripleDict
broken, in the sense of it doesn't allow its weak keys be cleared??
Well, certainly the weak keys can be cleared. But apparently they can not be cleared, if there is a non-weakreffable item in the key that holds a reference to a weakreffable item in the key.
Hence, on the one hand, it seems that the situation you constructed is a non-issue, because the callback function for the weak reference will not be called.
On the other hand, it is yet another reason to not allow non-weakreffable containers as key.
Changed 8 years ago by
Script that tries to cause a double dealloc but fails because python is mature
comment:15 follow-up: ↓ 17 Changed 8 years ago by
HOORAY!
As Modules/gc_weakref.txt shows, Zope has already put very good stresstests on python's clearing of weakrefs in cyclic garbage. I think we're safe with TripleDict
as it is, because python takes special precautions when running weakref callbacks in cleaning up cyclic garbage. In our case the weakref itself is part of the cyclic garbage. That means Python always stacks the deck such that the weakref appears to be cleared prior to the deletion of what it references. Hence, the callback doesn't happen! I think we're safe with TripleDict? as it is.
comment:16 in reply to: ↑ 13 Changed 8 years ago by
Replying to SimonKing:
But there is no crash. When I interrupt it after a while, with Ctrl-C, I get
sage: len(T) 1681This I don't understand. Is
TripleDict
broken, in the sense of it doesn't allow its weak keys be cleared??
No, nothing is broken. You haven't included the definition of B
, but I assume it's a weakreffable type.
Although you delete a,b
, your dictionary T
is still holding a strong reference to a
(because it can't weakref it) and a
is holding a strong reference to b
, so b
is reachable. Nothing gets collected.
Things are collectible once you throw away T
but thanks to the special handling of callbacks in cyclic garbage this is always handled properly.
comment:17 in reply to: ↑ 15 ; follow-up: ↓ 19 Changed 8 years ago by
Replying to nbruin:
I think we're safe with
TripleDict
as it is.
OK. But I wouldn't like to resolve this ticket as "wontfix" or "duplicate".
Still, we should use "del", not "__delitem__
", because it gives shorter C-code. And in addition to that, we should see whether the new Cython from #13896 fixes the problem.
comment:18 Changed 8 years ago by
- Dependencies set to #13896
comment:19 in reply to: ↑ 17 Changed 8 years ago by
Replying to SimonKing:
Still, we should use "del", not "
__delitem__
", because it gives shorter C-code. And in addition to that, we should see whether the new Cython from #13896 fixes the problem.
Yes, go ahead. Since there is no urgent issue anymore, use this ticket for whatever you want (and yes, that cython does solve the problem, if you compile the whole library with it. I'm now pretty confident I know exactly what went wrong, after seeing it happen on bit-level ...). The gc_weakref.txt
is a fascinating read and should be required for anybody intending to meddle with sage's memory management.
comment:20 follow-up: ↓ 22 Changed 8 years ago by
comment:21 Changed 8 years ago by
- Cc jpflori added; jpflory removed
comment:22 in reply to: ↑ 20 Changed 8 years ago by
Can we get the change from delitem to del quickly in? It is trivial, produce better C code, and potentially avoid horribly complicated gc problems?
From what I've really quickly read, Nils pointed out another problem with weakrefs and callbacks, but let's go one step at a time and deal with that elsewhere (or move the trivial fix on a separate ticket).
comment:23 follow-up: ↓ 24 Changed 8 years ago by
- Summary changed from Proper deletion of items of TripleDict to Better deletion of items of TripleDict
Oh in fact, I've read a little less quickly and the problem Nils thought about does not occur? So let's just get the trivial improvement here. I agree it is not a fix, but an imporvement, so I've changed the ticket title to reflect that as well.
And in case one really wants to let garbage collection occurs during weakref callback or whenever, there are always ways to do that.
comment:24 in reply to: ↑ 23 Changed 8 years ago by
Replying to jpflori:
Oh in fact, I've read a little less quickly and the problem Nils thought about does not occur?
Yes, because "Python is more mature than we thought"...
So let's just get the trivial improvement here. I agree it is not a fix, but an imporvement
But, for the record, it is fixed with the new Cython spkg from #13896.
comment:25 Changed 8 years ago by
Note that #12313 comprises a patch that makes the callback of a TripleDictEraser
(hence, the deletion of dead items of a TripleDict
) more secure. But I forgot to include the little improvement from here. Hence, I still think this ticket here is valid.
comment:26 Changed 8 years ago by
- Dependencies changed from #13896 to #13896, #12313
- Status changed from new to needs_review
I think using del
rather than calling __delitem__
makes sense even though the segfaults from #715 seem to be fixed by the Cython upgrade: Apparently del
results in shorter C code.
The new patch uses del
both on TripleDict
and the new MonoDict
, hence, it depends on #12313, where the latter was introduced.
Apply trac13904_use_del.patch
comment:27 Changed 8 years ago by
- Description modified (diff)
- Reviewers set to Jean-Pierre Flori
- Status changed from needs_review to positive_review
comment:28 Changed 8 years ago by
- Milestone changed from sage-5.6 to sage-5.7
comment:29 Changed 8 years ago by
- Milestone changed from sage-5.7 to sage-5.8
comment:30 Changed 8 years ago by
- Merged in set to sage-5.8.beta0
- Resolution set to fixed
- Status changed from positive_review to closed
comment:31 Changed 8 years ago by
- Merged in sage-5.8.beta0 deleted
- Resolution fixed deleted
- Status changed from closed to new
Unmerging because of trouble with #12313.
comment:32 Changed 8 years ago by
Perhaps try #13387 instead? With the solution there, this ticket is superseded there. It improves the performance of these dictionaries in some other ways as well.
comment:33 Changed 8 years ago by
- Status changed from new to needs_review
comment:34 Changed 8 years ago by
- Description modified (diff)
- Milestone changed from sage-5.8 to sage-duplicate/invalid/wontfix
- Resolution set to duplicate
- Reviewers changed from Jean-Pierre Flori to Simon King, Jean-Pierre Flori
- Status changed from needs_review to closed
Attempt to fix
TripleDict.__delitem__