Opened 10 years ago
Last modified 10 years ago
#13896 closed defect
Fix cython's gc_track and gc_untrack — at Version 6
Reported by: | nbruin | Owned by: | rlm |
---|---|---|---|
Priority: | blocker | Milestone: | sage-5.6 |
Component: | memleak | Keywords: | |
Cc: | SimonKing, jpflori | Merged in: | |
Authors: | Reviewers: | ||
Report Upstream: | Completely fixed; Fix reported upstream | Work issues: | |
Branch: | Commit: | ||
Dependencies: | Stopgaps: |
Description (last modified by )
In a long sage-devel thread we eventually found in this message that a GC during a weakref callback on a Cython class can lead to double deallocation of that class. In Python's Objects/typeobject.c, line 1024 and onwards, there are some comments that indicate that earlier version of Python were bitten by this problem too. The solution is to insert the appropriate PyObject_GC_Untrack
and PyObject_GC_Track
in cython's deallocation code. This is best fixed in cython itself.
Change History (7)
Changed 10 years ago by
comment:1 Changed 10 years ago by
With attached patch applied to 5.6.beta2 (and probably also other versions close to it),
sage -t devel/sage/sage/modules/module.pyx
will crash relatively reliably on several machines (including sage.math
)
comment:2 Changed 10 years ago by
- Cc SimonKing added
comment:3 follow-up: ↓ 4 Changed 10 years ago by
- Cc jpflori added
I'd like to see this ticket as a blocker, anyone against this idea?
comment:4 in reply to: ↑ 3 Changed 10 years ago by
Replying to jpflori:
I'd like to see this ticket as a blocker, anyone against this idea?
Since this is the ultimate "can generate segfaults anywhere", it's a prime candidate for blocker status. However, we're fully at the mercy of cython developers as to when this gets fixed. Also, if we release with this bug unfixed, we might as well leave #715 in too, since this one has a much wider possible impact :-).
comment:5 Changed 10 years ago by
- Priority changed from major to blocker
Ok, Ive put it as blocker.
For those who want to play while waiting for upstream, I've posted a p0 Cython spkg which does "something" with PyObject_GC_[Un]Track. Not sure it makes any sense, but it seems to make our bug disappear. It's at http://boxen.math.washington.edu/home/jpflori/cython-0.17.3.p0.spkg
comment:6 Changed 10 years ago by
- Description modified (diff)
Apologies. I saw I linked to the wrong file. Include/object.h also has some interesting information, but it looks like it is a bit out-of-date on some bits. In particular, if you look at the actual use of the TRASHCAN macros:
PyObject_GC_UnTrack(self); ++_PyTrash_delete_nesting; Py_TRASHCAN_SAFE_BEGIN(self); --_PyTrash_delete_nesting; ... endlabel: ++_PyTrash_delete_nesting; Py_TRASHCAN_SAFE_END(self); --_PyTrash_delete_nesting;
with the explanation a little lower:
Q. Why the bizarre (net-zero) manipulation of _PyTrash_delete_nesting around the trashcan macros? A. Some base classes (e.g. list) also use the trashcan mechanism. The following scenario used to be possible: - suppose the trashcan level is one below the trashcan limit - subtype_dealloc() is called - the trashcan limit is not yet reached, so the trashcan level is incremented and the code between trashcan begin and end is executed - this destroys much of the object's contents, including its slots and __dict__ - basedealloc() is called; this is really list_dealloc(), or some other type which also uses the trashcan macros - the trashcan limit is now reached, so the object is put on the trashcan's to-be-deleted-later list - basedealloc() returns - subtype_dealloc() decrefs the object's type - subtype_dealloc() returns - later, the trashcan code starts deleting the objects from its to-be-deleted-later list - subtype_dealloc() is called *AGAIN* for the same object - at the very least (if the destroyed slots and __dict__ don't cause problems) the object's type gets decref'ed a second time, which is *BAD*!!! The remedy is to make sure that if the code between trashcan begin and end in subtype_dealloc() is called, the code between trashcan begin and end in basedealloc() will also be called. This is done by decrementing the level after passing into the trashcan block, and incrementing it just before leaving the block. But now it's possible that a chain of objects consisting solely of objects whose deallocator is subtype_dealloc() will defeat the trashcan mechanism completely: the decremented level means that the effective level never reaches the limit. Therefore, we *increment* the level *before* entering the trashcan block, and matchingly decrement it after leaving. This means the trashcan code will trigger a little early, but that's no big deal.
It's probably better to leave out the trashcan for now. It seems like rather tricky code and I'm not sure it's part of the official Python C-API (it might be something internal, just like they use some macros themselves they find unsafe for use in extension modules)
Patch to more reliably produce crash