#13387 closed defect (fixed)
Improve MonoDict and TripleDict data structures
Reported by: | nbruin | Owned by: | Nils Bruin
---|---|---|---
Priority: | major | Milestone: | sage-5.8
Component: | memleak | Keywords: |
Cc: | SimonKing, jpflori | Merged in: | sage-5.8.beta2
Authors: | Nils Bruin | Reviewers: | Simon King
Report Upstream: | N/A | Work issues: |
Branch: | | Commit: |
Dependencies: | #11521, #12313 | Stopgaps: |
Description
On #715 and #12313, some variants of weak-key-reference data structures were introduced, used in caching for, among other things, uniqueness of parents and coercions. #12313 comment 162 contains some data on how these data structures can be improved.
Apply:
Attachments (9)
Change History (63)
Changed 9 years ago by
comment:1 Changed 9 years ago by
- Status changed from new to needs_review
Initial tests show only a doctest failure due to a doctest change introduced on #12313 in sage/categories/modules_with_basis.py. We'll see what the bots think.
comment:2 Changed 9 years ago by
- Cc SimonKing added
comment:3 Changed 9 years ago by
Excellent work! Looks promising. I tested IdKeyDict against the revamped classes here with the same tests you posted there. The MonoDict wins of course:

sage: %timeit K in W
625 loops, best of 3: 38.1 µs per loop
sage: %timeit K in M
625 loops, best of 3: 112 ns per loop
sage: %timeit K in I
625 loops, best of 3: 160 ns per loop
sage: %timeit K2 in W
625 loops, best of 3: 680 ns per loop
sage: %timeit K2 in M
625 loops, best of 3: 64.1 ns per loop
sage: %timeit K2 in I
625 loops, best of 3: 104 ns per loop
sage: %timeit W[K]
625 loops, best of 3: 36.7 µs per loop
sage: %timeit M[K]
625 loops, best of 3: 99.2 ns per loop
sage: %timeit I[K]
625 loops, best of 3: 146 ns per loop
but already for the TripleDict it seems a reasonable choice:

sage: %timeit (K,K,True) in I
625 loops, best of 3: 392 ns per loop
sage: %timeit (K,K,True) in T
625 loops, best of 3: 451 ns per loop
sage: %timeit (K2,K,True) in I
625 loops, best of 3: 133 ns per loop
sage: %timeit (K2,K,True) in T
625 loops, best of 3: 541 ns per loop
sage: %timeit I[K,K,True]
625 loops, best of 3: 366 ns per loop
sage: %timeit T[K,K,True]
625 loops, best of 3: 432 ns per loop
I don't think I'll ever understand modern CPUs ... (that, or TripleDict can be further tuned. It was a first try.) I think I did use some ideas that might be useful for you, though.
Anyway, at least the iteration needs some work (nice that cython generators now work!):

sage: from sage.structure.idkey_dict import IdKeyDict
sage: I = IdKeyDict(3,53, threshold=0.7)
sage: L = []
sage: for p in prime_range(10000):
....:     L.append(GF(p)['x','y'])
....:
sage: for i,K in enumerate(L):
....:     I[K,K,True] = i
....:
sage: from collections import Counter
sage: Counter([k in I for k,v in I.iteritems()])
Counter({False: 1229})
sage: Counter([(k,k,True) in I for k in L])
Counter({True: 1229})
sage: Counter([(k[3](),k[4](),k[5]) in I for k,v in I.iteritems()])
Counter({True: 1229})
but as you see, that's easily fixed. You just have to extract the actual keys. Doing it this way might report errors a bit better than the unsafe casting of k[:3] would (dead weakrefs etc.):

sage: Counter([(id(k[3]()),id(k[4]()),id(k[5])) == k[:3] for k,v in I.iteritems()])
Counter({True: 1229})
Pitfall, by the way:

sage: import weakref
sage: W=weakref.ref(ZZ)
sage: weakref.ref(W)
TypeError: cannot create weak reference to 'weakref' object
So:

sage: from sage.structure.idkey_dict import IdKeyDict
sage: import weakref
sage: class D(object): pass
....:
sage: d = D()
sage: W = weakref.KeyedRef(d,None,None)
sage: I = IdKeyDict(1,53, threshold=0.7)
sage: I[W] = 10
sage: I[W]
10
sage: del d
sage: len(I)
1
sage: I[W]
KeyError: <weakref at 0x5f6c4d0; dead>
sage: len(I)
0
sage: W
<weakref at 0x5f6c4d0; dead>
Note that my key, W, still exists! Also note that an IdKeyDict would accept mutable (unhashable) objects. That might be a selling point beyond its "weak" and speed properties.
So we should document that one should not use KeyedRefs as key elements, and probably forbid them. For key construction, after weakreffing has failed, doing a little isinstance(k,KeyedRef) check shouldn't be so bad (that's not the code path we really care about anyway).
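A minimal sketch of that guard in plain Python (the helper name and callback argument are illustrative, not code from the patch):

import weakref

def make_key_component(k, callback):
    # Prefer a weak reference to the key component; fall back to a strong
    # reference only for objects that do not support weak references.
    try:
        return weakref.KeyedRef(k, callback, id(k))
    except TypeError:
        # Weakreffing failed; forbid KeyedRef objects themselves, since
        # storing one strongly gives the "dead key that still exists"
        # pitfall shown above.
        if isinstance(k, weakref.KeyedRef):
            raise TypeError("KeyedRef objects are not allowed as key components")
        return k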
comment:4 Changed 8 years ago by
I noticed this python-ideas thread. So there seems to be some interest in dictionaries keyed by id elsewhere too (although they're not mentioning a weak key dict).
comment:5 follow-up: ↓ 6 Changed 8 years ago by
- Milestone changed from sage-5.6 to sage-duplicate/invalid/wontfix
I see a logistic problem: The aim of #13904 is to fix some problems introduced with #715. It should be a quick solution, since otherwise #715 will be unmerged. However, the ticket here also depends on #12313, which has a positive review, but isn't merged yet. So, that's too slow.
Therefore, I suggest that #13904 shall use the ideas from here, but only applied to TripleDict, so that #715 can remain in Sage.
I think it is worth while to provide the IdKeyDict code in its full generality (that's an attachment I made in #12313), but I am not sure, perhaps that's for a different ticket. Then, #12313 should be tackled, and probably again done as in IdKeyDict (but perhaps with specialised code).
In other words: I suggest to split this ticket (one part done in #13904 ASAP, the other part later in #12313), so that this ticket shall be marked as a duplicate.
Do you agree?
comment:6 in reply to: ↑ 5 Changed 8 years ago by
Replying to SimonKing:
I see a logistic problem: The aim of #13904 is to fix some problems introduced with #715. It should be a quick solution, since otherwise #715 will be unmerged. However, the ticket here also depends on #12313, which has a positive review, but isn't merged yet. So, that's too slow.
Therefore, I suggest that #13904 shall use the ideas from here, but only applied to TripleDict, so that #715 can remain in Sage. I think it is worth while to provide the IdKeyDict code in its full generality (that's an attachment I made in #12313), but I am not sure, perhaps that's for a different ticket. Then, #12313 should be tackled, and probably again done as in IdKeyDict (but perhaps with specialised code). In other words: I suggest to split this ticket (one part done in #13904 ASAP, the other part later in #12313), so that this ticket shall be marked as a duplicate.
Do you agree?
There is nothing wrong with TripleDict.
EDIT: It used to say "except for an eraser problem we don't know how to fix. We should solve that for now by forbidding the problematic keys in the documentation. In other words: 'Don't use TripleDict unless you know what you're doing'. Then TripleDict doesn't have any known bugs and we can take our time to think about how to make TripleDict more generally applicable." here. There is no such problem as far as we know now, though (required reading: Modules/gc_weakref.txt).
Memory management, deallocation, and weakref callbacks are genuinely hard ...
comment:7 Changed 8 years ago by
- Cc jpflori added
comment:8 Changed 8 years ago by
Concerning the timing reported on #12313:300, with the patch here we get
sage: M=sage.structure.coerce_dict.MonoDict(23)
sage: M[ZZ]=1
sage: %timeit _=M[ZZ]
625 loops, best of 3: 240 ns per loop
which is an improvement of 91 ns over the 331 ns reported there. Since a normal dictionary takes 110 ns for this operation, and this patch avoids the internal use of another dictionary lookup in _refcache, it seems like the gain is indeed basically purely the removal of _refcache.
We now have
sage: x=-20
sage: def test():
....:     for n in xrange(10**7):
....:         _=QQ(x)
....:
sage: time test()
Time: CPU 7.92 s, Wall: 7.95 s
That is a bit of a gain over the Time: CPU 8.53 s, Wall: 8.57 s we have without #13387, but still quite a bit worse than the Time: CPU 2.97 s, Wall: 2.98 s we had prior to #12313. The difference is that the dictionary that stores the conversion and coercion maps turned into a weak-key-reference dict (MonoDict), which is necessarily slower in key lookup than a normal dict, because there's an extra indirection level in the keys.
Concerning Jeroen's original report, we now have (#12313 + #13387)
sage: def test(RR):
....:     for d in range(-20,0):
....:         if abs(RR(quadratic_L_function__numerical(1, d, 10000) - quadratic_L_function__exact(1, d))) > 0.001:
....:             print "Oops! We have a problem at d = ", d, " exact = ", RR(quadratic_L_function__exact(1, d)), " numerical = ", RR(quadratic_L_function__numerical(1, d))
....:
sage: time test(RealField(50))
Time: CPU 1.81 s, Wall: 1.82 s
versus reference timing
Time: CPU 1.63 s, Wall: 1.64 s
To compare, on this machine, the same test with just #12313 takes
Time: CPU 2.07 s, Wall: 2.08 s
so the improvements here do significantly reduce the regression originally reported by Jeroen.
comment:9 Changed 8 years ago by
Incidentally, doing a del on a bucket slice is indeed safe: Python defers DECREF activity until the slice has been removed from the list. Also, since the slices we delete are not longer than 8 entries, the temporary references are kept on the stack (faster!). See Objects/listobject.c:614. That means the deletion routines as proposed should be safe, also in view of callbacks.
The only scenario that would lead to undue deletions due to a callback would be if a callback on a key with id=I gets delayed until a new object with the same id I gets stored in the same dictionary. However, that would need an object to get deallocated before its callbacks get executed or scrapped. That seems like a real violation of protocol.
For the truly paranoid, we could check in the Eraser classes that the weakref corresponding to the callback is indeed NULL-ed. However, I'd be shocked if that were necessary.
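A hedged illustration of that check in plain Python, assuming a flat bucket layout of [id, keyref, value] triples (illustrative only, not the patch's actual code):

def eraser_callback(bucket, h):
    # Remove the entry whose stored id equals h, but only if its weakref is
    # really dead; a live weakref would mean the callback is stale and the
    # slot now belongs to a new object that happens to reuse the same id.
    for i in range(0, len(bucket), 3):
        if bucket[i] == h:
            if bucket[i + 1]() is None:
                del bucket[i:i + 3]
            break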
comment:10 follow-up: ↓ 11 Changed 8 years ago by
Somewhere I had posted a general IdKeyDict, with a fixed key length that can be chosen at creation time, weak keys (where possible) and comparison by identity. But I can't find it. Can you help me?
comment:11 in reply to: ↑ 10 Changed 8 years ago by
It's where everything is: #12313 (idkey_dict attachment)
comment:12 Changed 8 years ago by
Yippee!! Using PyWeakref_GetObject instead of calling weakrefs is indeed a little faster:

sage: M=sage.structure.coerce_dict.MonoDict(23)
sage: M[ZZ]=1
sage: %timeit _=M[ZZ]
625 loops, best of 3: 211 ns per loop
sage: x=-20
sage: def test():
....:     for n in xrange(10**7):
....:         _=QQ(x)
....:
sage: time test()
Time: CPU 7.76 s, Wall: 7.79 s
sage: def test(RR):
....:     for d in range(-20,0):
....:         if abs(RR(quadratic_L_function__numerical(1, d, 10000) - quadratic_L_function__exact(1, d))) > 0.001:
....:             print "Oops! We have a problem at d = ", d, " exact = ", RR(quadratic_L_function__exact(1, d)), " numerical = ", RR(quadratic_L_function__numerical(1, d))
....:
sage: time test(RealField(50))
Time: CPU 1.76 s, Wall: 1.77 s
So we shave a little more off (something like 30 ns, i.e., some 15% on a simple MonoDict lookup).
Apply trac_13387-rebased.patch trac_13387-fast-weakref-calls.patch
TODO: Further improvement: in the coercion framework, hardwire the use of MonoDict, provide get and set cdef methods and call those to eliminate further method lookup overhead.
comment:13 Changed 8 years ago by
- Milestone changed from sage-duplicate/invalid/wontfix to sage-feature
I think the patches on this ticket have value already, so I'm refiling under "sage-feature". We'd already benefit from merging these as-is. That said, there is probably room for some easy further improvements, so we can wait with merging unless someone deems this ticket needs to be a dependency for another one that needs merging.
comment:14 Changed 8 years ago by
TripleDict has cdef get and set methods, which are directly called in coercion. This is faster than relying on the slotted __getitem__ and __setitem__ calls (which are still a bit better than completely generic method lookup). Implementing this for MonoDict and ensuring that it's used for _convert_from_hash and _coerce_from_hash speeds things up to an extent that we seem to lose any speed regression wrt prior #12313:
sage: def test(RR):
....:     for d in range(-20,0):
....:         if abs(RR(quadratic_L_function__numerical(1, d, 10000) - quadratic_L_function__exact(1, d))) > 0.001:
....:             print "Oops! We have a problem at d = ", d, " exact = ", RR(quadratic_L_function__exact(1, d)), " numerical = ", RR(quadratic_L_function__numerical(1, d))
....:
sage: %time test(RealField(50))
CPU times: user 1.50 s, sys: 0.01 s, total: 1.51 s
Wall time: 1.51 s
and
sage: x=-20
sage: def test():
....:     for n in xrange(10**7):
....:         _=QQ(x)
....:
sage: %time test()
CPU times: user 3.70 s, sys: 0.01 s, total: 3.71 s
Wall time: 3.71 s
I have noticed that these timings can vary quite a bit between compiles and branch names. I guess loops this tight get sensitive to cache alignment and fortunate code locations.
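For context, here is a minimal Cython sketch of the cdef-accessor pattern described above (class and attribute names are illustrative; the real MonoDict stores buckets, not a dict keyed by id):

cdef class MonoDictSketch:
    cdef dict _data          # stand-in for the real bucket structure

    def __init__(self):
        self._data = {}

    cdef get(self, object k):
        # direct C-level call: no Python method lookup, no argument tuple
        return self._data[id(k)]

    cdef set(self, object k, value):
        self._data[id(k)] = value

    def __getitem__(self, k):
        # the slotted Python entry point just delegates to the cdef method
        return self.get(k)

    def __setitem__(self, k, value):
        self.set(k, value)

Callers in the coercion code that know the static type can then call get/set directly and skip the Python-level dispatch entirely.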
Apply trac_13387-rebased.patch trac_13387-fast-weakref-calls.patch trac_13387-cdef_monodict_access.patch
comment:15 follow-up: ↓ 16 Changed 8 years ago by
Hm. Seems MonoDict had a _get_buckets method with a doctest referring to TripleDict. Since both MonoDict and TripleDict now have cdef get and set methods with incompatible signatures, we cannot have TripleDict inherit from MonoDict anymore, so I've just copied the definition and adapted the doctest. _get_buckets is only for debugging anyway.
comment:16 in reply to: ↑ 15 ; follow-up: ↓ 18 Changed 8 years ago by
Replying to nbruin:
Hm. Seems MonoDict had a _get_buckets method with a doctest referring to TripleDict. Since both MonoDict and TripleDict now have cdef get and set methods with incompatible signatures, we cannot have TripleDict inherit from MonoDict anymore,
I think this is why I originally dropped get/set for MonoDict.
What about having different names for the getters and setters (mget/mset versus tget/tset)? It would avoid the duplication, but would certainly not be very clean.
comment:17 Changed 8 years ago by
By the way, what patches are to be applied? The following?
Apply trac_13387-rebased.patch trac_13387-fast-weakref-calls.patch trac_13387-cdef_monodict_access.patch
comment:18 in reply to: ↑ 16 Changed 8 years ago by
Replying to SimonKing:
What about having different names for the getters and setters (mget/mset versus tget/tset)? It would avoid the duplication, but would certainly not be very clean.
I think it's worse than the current situation. The reality is that MonoDict and TripleDict only share some data attribute names and a debug routine. They are each implemented separately. Code sharing here is more an invitation to bugs, due to unintended side-effects of code changes that appear local (as demonstrated here), than it is a saving of work. I think having the classes independent is better. If it were up to me I'd also make the Eraser classes independent, but that's just a small issue. In the end, both are just unrolled special cases of IdKeyDict (which, as we found, might be a contender for TripleDict but not for MonoDict), so we don't have them for code cleanliness but for speed.
The patch order is correct, as with the Apply statement before. Note that the patchbot results page sorts records by the time local to the bot, so reports are not necessarily in reverse chronological order. It did get a "tests pass" with the current patch sequence.
comment:19 Changed 8 years ago by
- Description modified (diff)
comment:20 Changed 8 years ago by
- Description modified (diff)
comment:21 follow-up: ↓ 22 Changed 8 years ago by
comment:22 in reply to: ↑ 21 ; follow-up: ↓ 24 Changed 8 years ago by
Replying to jdemeyer:
Does this supersede #13904 (i.e. should #13904 be closed as duplicate)?
I would say yes, in the sense that if we merge this ticket then #13904 is irrelevant (the lines amended there wouldn't exist anymore). In light of the new findings on #12313 it looks like we have to merge something like the changes proposed here (at least the first patch), but Simon should probably chime in as well. He's most qualified to review this ticket.
comment:23 Changed 8 years ago by
It looks like this patch does indeed fix the KeyError I posted on #12313. At least, I can no longer reproduce the problem with this patch applied.
comment:24 in reply to: ↑ 22 Changed 8 years ago by
Replying to nbruin:
Replying to jdemeyer:
Does this supersede #13904 (i.e. should #13904 be closed as duplicate)?
I would say yes, in the sense that if we merge this ticket then #13904 is irrelevant (the lines amended there wouldn't exist anymore). In light of the new findings on #12313 it looks like we have to merge something like the changes proposed here (at least the first patch), but Simon should probably chime in as well. He's most qualified to review this ticket.
OK. Is this supposed to be for 5.7 or 5.8?
comment:25 Changed 8 years ago by
Certainly not sage-5.7.
comment:26 Changed 8 years ago by
- Milestone changed from sage-feature to sage-5.8
- Status changed from needs_review to needs_work
On hawk (OpenSolaris i386), Sage fails to start:

File "./spkg-install", line 4, in <module>
    from sage.all import save
File "/export/home/buildbot/build/sage/hawk-1/hawk_full/build/sage-5.8.beta1/local/lib/python2.7/site-packages/sage/all.py", line 63, in <module>
    from sage.misc.all import *       # takes a while
File "/export/home/buildbot/build/sage/hawk-1/hawk_full/build/sage-5.8.beta1/local/lib/python2.7/site-packages/sage/misc/all.py", line 83, in <module>
    from functional import (additive_order,
File "/export/home/buildbot/build/sage/hawk-1/hawk_full/build/sage-5.8.beta1/local/lib/python2.7/site-packages/sage/misc/functional.py", line 36, in <module>
    from sage.rings.complex_double import CDF
File "complex_double.pyx", line 96, in init sage.rings.complex_double (sage/rings/complex_double.c:16857)
OverflowError: long int too large to convert to int
Line 96 of "complex_double.pyx" is import real_mpfr, see also the C code:

/* "sage/rings/complex_double.pyx":96
 * CC = complex_field.ComplexField()
 *
 * import real_mpfr             # <<<<<<<<<<<<<<
 * RR = real_mpfr.RealField()
 *
 */
__pyx_t_2 = __Pyx_Import(((PyObject *)__pyx_n_s__real_mpfr), 0, -1); if (unlikely(!__pyx_t_2)) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 96; __pyx_clineno = __LINE__; goto __pyx_L1_error;}
__Pyx_GOTREF(__pyx_t_2);
if (PyObject_SetAttr(__pyx_m, __pyx_n_s__real_mpfr, __pyx_t_2) < 0) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 96; __pyx_clineno = __LINE__; goto __pyx_L1_error;}
__Pyx_DECREF(__pyx_t_2); __pyx_t_2 = 0;
comment:27 follow-up: ↓ 28 Changed 8 years ago by
You're not giving me much to go on here ... Clearly, on that machine "long int != int". It being an i386, I'd assume "long int" is 32 bits. Is "int" 16 bits there? That's just going to be unworkable! Hopefully the "int" is a variable under our control that we can simply change to a "long int".
Anyway, on pristine 5.8b0 and 5.8b1 I'm finding that this is line 94. So are you sure the problem comes from the patch here?
Anyway, FWIW, #715 comment 179 also addressed 32 bit problems, although there it seems to deal more with sign problems for "%". The reported behaviour here doesn't seem to be of that kind, unless the "%" gets executed in a signed way and then interpreted as an unsigned value (then you could easily get a "very big" value)
If you give me access to something where I can reproduce the problematic behaviour, I can diagnose a little better. That might be preferable to me making random changes to the patch here in the hope that you get more favourable results. Or better yet, perhaps JP can put his acquired skills to use, since he has solved similar problems before.
comment:28 in reply to: ↑ 27 Changed 8 years ago by
Replying to nbruin:
You're not giving me much to go on here ... Clearly on that machine "long int != int", It being an i386, I'd assume "long int" is 32 bits. Is "int" 16 bits there?
I'm fairly certain both "long int" and "int" are 32 bits. But any of those words could refer to the Python type "long" or "int".
Anyway, on pristine 5.8b0 and 5.8b1 I'm finding that this is line 94. So are you sure the problem comes from the patch here?
Yes, unmerging the patch fixes the problem. In case you wonder, I also applied #13618 but that's really only about doctests.
comment:29 follow-up: ↓ 31 Changed 8 years ago by
To me, the expressions

PyInt_AsSsize_t(PyList_GET_ITEM(bucket, i))

look the most suspicious. What is PyList_GET_ITEM(bucket, i) and why are we sure it fits in a Py_ssize_t?
comment:30 Changed 8 years ago by
Looking at the Python sources now. The error message is precisely what one would get if PyInt_AsSsize_t() overflows (I do agree that the error text is confusing).
Anyway, you know much better than me where the values PyList_GET_ITEM(bucket, i) come from, so I'll let you think about it.
comment:31 in reply to: ↑ 29 ; follow-up: ↓ 32 Changed 8 years ago by
Replying to jdemeyer:
To me, the expressions PyInt_AsSsize_t(PyList_GET_ITEM(bucket, i)) look the most suspicious. What is PyList_GET_ITEM(bucket, i) and why are we sure it fits in a Py_ssize_t?
Ah yes! Good catch. This could very well be the source. The stored values here are ids of python objects. They are stored via a <size_t><void *> cast (which subsequently gets converted to a python int). When we try to retrieve that via a PyInt_AsSsize_t we're trying to retrieve it into a signed size_t (that's the purpose of the Ssize_t, right?), so indeed this might not fit.
What's the best fix, though? Should we be storing via a <Py_ssize_t><void *> cast or should we do a <size_t><void *>PyList_GET_ITEM(...) instead? I think I found PyInt_AsSsize_t by looking at the code generated by that statement.
It depends a bit: I would assume that python has relatively efficient storage for both small positive and negative integers, so storing as a signed quantity would make better use of the bits python offers for its "small" (as in not PyLong) integers.
Previously, the code prevented this whole issue by constructions such as

tmp = <object>PyList_GET_ITEM(bucket, i)
if <size_t>tmp == h1:

which of course works, but is less efficient: the cast to <object> will introduce refcounting.
So, my hunch is: store keys as <Py_ssize_t> and cast to <size_t> before doing modulo operations. I suppose that <size_t><Py_ssize_t>(-1) equals 256^(size_of(size_t))-1 (and probably doesn't generate code).
Advice welcome.
comment:32 in reply to: ↑ 31 ; follow-up: ↓ 33 Changed 8 years ago by
Replying to nbruin:
I would assume that python has relatively efficient storage for both small positive and negative integers, so storing as a signed quantity would make better use of the bits python offers for its "small" (as in not PyLong) integers.

That is completely true, so going the Py_ssize_t way is probably the most efficient.
If N is of type Py_ssize_t, then <size_t>N % k isn't even slow, it should actually be at least as fast as N % k and certainly a lot faster than

h = N % k
if h < 0:
    h = -h
comment:33 in reply to: ↑ 32 Changed 8 years ago by
Replying to jdemeyer:
That is completely true, so going the Py_ssize_t way is probably the most efficient.
If N is of type Py_ssize_t, then <size_t>N % k isn't even slow, it should actually be at least as fast as N % k.
Cool. I'll try and prepare a patch that goes that route then. Hopefully that solves the problem.
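For reference, the agreed idiom could look roughly like this in Cython (a sketch with an illustrative helper name, not the actual patch):

cdef inline Py_ssize_t bucket_index(object obj, Py_ssize_t nbuckets):
    # the key is the object's address, stored as a *signed* Py_ssize_t so
    # that it still fits in a plain Python "int" when boxed
    cdef Py_ssize_t h = <Py_ssize_t><void *>obj
    # the reduction is done on the unsigned reinterpretation, so negative
    # "addresses" still yield a valid non-negative bucket index
    return <Py_ssize_t>((<size_t>h) % (<size_t>nbuckets))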
comment:34 Changed 8 years ago by
- Description modified (diff)
- Status changed from needs_work to needs_review
OK! Hopefully this does the trick. All doctests pass for me when applied to 5.8beta0. We'll see what the bot thinks.
I have uploaded the incremental changes wrt previous work here, in case that helps for review. For convenience, there's also a combined patch that has everything in one.
Apply trac_13387-combined.patch
comment:35 Changed 8 years ago by
Oh bother. Neither the iteration code nor the resize code is likely to be safe. The resize code does memory allocations throughout its inner loop, so callbacks (and bucket modifications!) can happen there. The loop is not robust against that in a rather fundamental way: If a callback happens on an entry that has already been copied over to the new buckets, but before the new buckets have been commissioned, then the entry will get removed from its old bucket (possibly causing mayhem), but the entry in the new bucket will happily remain!
I'm not sure how to formulate resize otherwise. Should we trigger a garbage collection, to ensure all collectable entries have already been removed? If we start rearranging the buckets right after that, we shouldn't be getting any callbacks. Unless GC isn't always completely thorough. Can it ever be that a GC call issued right after another GC can still find something? Given how difficult it can be to break cycles, I can imagine this would happen. For instance, if a weakref callback causes something else to become garbage, will that still be collected in the same call?
Another approach might be to "lock" the dictionary so that callbacks get queued rather than executed. When we unlock the dictionary we could process the queue (or purge the dict so that we're sure any outstanding callback has become irrelevant)
The iterator is less inherently unsafe. It's just that in between "yield"s there is likely allocation activity of some sort, so GCs could modify the dict.
In any case, I don't think the work on this ticket makes the situation worse than it already is, which apparently is not so bad: resize operations are very rare. It may well be they don't happen in the test suite at all.
Furthermore, the problem only arises if a new bucket needs to be extended during copying. Since buckets are supposed to be really small, this would be even rarer (python lists are initialized with room for a certain number of entries, right?). We don't know how big the buckets are going to have to be. Otherwise we could just preallocate the buckets with sufficient length.
In fact, I wonder how robust Python's own WeakValueDict and WeakKeyDict are in the face of these issues. Perhaps they are OK because their memory allocations are (near) atomic.
That would be an option too: program the hash tables of TripleDict and MonoDict via open addressing rather than the list-of-lists we have now. I'm not the right person to make such a change, though. I'm sure implementing good hash tables has all kinds of pitfalls I am not aware of.
Conclusion: This ticket should still be merged because it's likely strictly an improvement over the status-quo, but there are some fundamental design problems due to an extremely remote possibility for unfortunate GC occurrences in the middle of dictionary update operations.
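For comparison, a toy sketch of the open-addressing alternative mentioned above (pure Python, linear probing, no resizing or deletion handling, purely illustrative):

class OpenAddressingSketch:
    def __init__(self, nslots=53):
        # one flat table of (key_id, value) slots; None marks an empty slot,
        # so a lookup never allocates while walking the table
        self.slots = [None] * nslots

    def _probe(self, key_id):
        i = key_id % len(self.slots)
        while self.slots[i] is not None and self.slots[i][0] != key_id:
            i = (i + 1) % len(self.slots)   # linear probing; assumes the table never fills
        return i

    def set(self, key, value):
        self.slots[self._probe(id(key))] = (id(key), value)

    def get(self, key):
        entry = self.slots[self._probe(id(key))]
        if entry is None:
            raise KeyError(key)
        return entry[1]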
comment:36 Changed 8 years ago by
- Description modified (diff)
Therapeutic despair. The solution is obvious! Just disable garbage collection for the duration of the critical block. That way Python can "queue" the callbacks for us. This should ensure that MonoDict.resize and TripleDict.resize are safe.
Iteration comes "buyer beware" and I don't think any code uses it presently anyway.
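Schematically, the guarded critical section looks like this (a sketch, not the actual resize code):

import gc

def guarded_resize(rebuild_buckets):
    # Switch off cyclic garbage collection so that collection-triggered
    # weakref callbacks cannot mutate the buckets while they are rebuilt;
    # the pending collections effectively wait until gc is re-enabled.
    gc_was_enabled = gc.isenabled()
    gc.disable()
    try:
        rebuild_buckets()
    finally:
        if gc_was_enabled:
            gc.enable()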
Apply trac_13387-combined.patch, trac_13387-guard_GC.patch
comment:37 follow-up: ↓ 38 Changed 8 years ago by
Some remarks about the comment (lines 48--62 of sage/structure/coerce_dict.pyx):
Nobody guarantees that size_t is a "pointer sized integer". In practice, the number of bits in a size_t does equal the bitness of the machine on modern systems, but there is no guarantee for this. In C99, there is a type uintptr_t, which is defined as being a pointer-sized unsigned integer. See also http://stackoverflow.com/questions/1464174/size-t-vs-intptr-t for some good points.
The next sentence (This has an advantage...) could be made more concrete: Assuming that Py_ssize_t is the same as a C long (which is true on most Unix-like systems), this also has the advantage that these Py_ssize_t values are stored as a Python "int" (as opposed to "long"), which allows for fast conversion to/from C types.
Typo:
(<size_t) h)% modulus
comment:38 in reply to: ↑ 37 ; follow-up: ↓ 39 Changed 8 years ago by
Replying to jdemeyer:
Nobody guarantees that size_t is a "pointer sized integer".
Didn't know that. Thank you for pointing this out.
In C99, there is a type uintptr_t, which is defined as being a pointer-sized unsigned integer.
OK. Can we use this instead?
comment:39 in reply to: ↑ 38 ; follow-up: ↓ 40 Changed 8 years ago by
Replying to SimonKing:
OK. Can we use this instead?
Not really, since Python doesn't define a function like PyInt_AsIntptr_t.
But, like I said, for all current systems, we have ssize_t == intptr_t.
comment:40 in reply to: ↑ 39 ; follow-up: ↓ 41 Changed 8 years ago by
Replying to jdemeyer:
Not really, since Python doesn't define a function like PyInt_AsIntptr_t. But, like I said, for all current systems, we have ssize_t == intptr_t.
There is something that comes close, though:
Python's id is implemented via PyLong_FromVoidPtr. This routine only compiles if sizeof(void *) <= sizeof(long) or if sizeof(void *) <= sizeof(long long) (if that type is available).
So CPython itself is taking reasonable but not extreme measures to support odd pointer sizes (I guess sizeof(void *) != sizeof(long) is quite possible).
The definition is available:

cdef extern from "stdint.h":
    ctypedef long intptr_t    # doesn't need to match exactly
In order to find conversion routines in Python that are guaranteed to have the right length, we'd have to go with PyLong_FromVoidPtr and PyLong_AsVoidPtr. So basically we'd be working with void * and convert to intptr_t if we need to do arithmetic.
In fact, at that time we can afford to lose bits, so we wouldn't particularly need intptr_t as a type at all. We can just cast to <long> once we do arithmetic to compute the hash value or the bucket number.
So the question really is: do we want to let our keys be <void *> rather than Py_ssize_t?
We do need to pack and unpack those keys into the bucket PyList objects.
There is a slight problem there: PyLong_FromVoidPtr goes through extra trouble to interpret void * as an unsigned quantity (if only it didn't do that! what's wrong with a negative pointer?). Thus even on systems where a PyInt and a void * have the same number of bits available, half of the possible values will be stored in a PyLong anyway (because of sign). [Note that PyLong_FromVoidPtr is happy to produce a PyInt if it thinks it fits.]
So yes, we could do that and guard against future platforms where sizeof(void *) > sizeof(size_t), but we'd do so at a storage and performance penalty on our platforms now (probably very rare on 64-bit, because pointers tend to have their top bits zero there, but on 32-bit it's very common to have pointers with their top bit set).
I'm not sure it's worth it.
comment:41 in reply to: ↑ 40 Changed 8 years ago by
Replying to nbruin:
I'm not sure it's worth it.
I agree, especially also given the fact that other components of Sage make more serious assumptions.
comment:42 Changed 8 years ago by
- Status changed from needs_review to needs_work
- Work issues set to rebase
It seems that #12313 is merged in sage-5.8.beta0. But the first patch does not apply, 2 out of 30 hunks fail.
The failing hunks are in sage/structure/coerce_dict.pyx:

 cdef class MonoDictEraser:
     ...
         # and it knows the stored key of the unique singleton r() had been part
         # of.
         # We remove that unique tuple from self.D -- if self.D is still there!
-        cdef MonoDict D = self.D()
+
+        #WARNING! These callbacks can happen during the addition of items to the same
+        #dictionary. The addition code may then be in the process of adding a new entry
+        #(one PyList_Append call at a time) to the relevant bucket.
+        #The "PyList_GET_SIZE(bucket) by 3" call should mean that we round down and hence not look
+        #at incomplete entries. Furthermore, deleting a slice out of a buck should still be OK.
+        #this callback code should absolutely not resize the dictionary, because that would wreak
+        #complete havoc.
+
+        cdef MonoDict D = <object> PyWeakref_GetObject(self.D)
         if D is None:
             return
         cdef list buckets = D.buckets
         if buckets is None:
             return
-        cdef size_t k = r.key
-        cdef size_t h = k % PyList_GET_SIZE(buckets)
-        cdef list bucket = <object>PyList_GET_ITEM(buckets,h)
+        cdef Py_ssize_t h = r.key
+        cdef list bucket = <object>PyList_GET_ITEM(buckets, (<size_t>h) % PyList_GET_SIZE(buckets))
         cdef Py_ssize_t i
-        for i from 0 <= i < PyList_GET_SIZE(bucket) by 2:
-            if <size_t><object>PyList_GET_ITEM(bucket,i)==k:
-                del bucket[i:i+2]
+        for i from 0 <= i < PyList_GET_SIZE(bucket) by 3:
+            if PyInt_AsSsize_t(PyList_GET_ITEM(bucket,i))==h:
+                del bucket[i:i+3]
                 D._size -= 1
                 break
-        try:
-            D._refcache.__delitem__(k)
-        except KeyError:
-            pass

-cdef class TripleDictEraser(MonoDictEraser):
+cdef class TripleDictEraser:
     """
     Erases items from a :class:`TripleDict` when a weak reference becomes
     invalid.
     ...
         # r is a (weak) reference (typically to a parent), and it knows the
         # stored key of the unique triple r() had been part of.
         # We remove that unique triple from self.D
-        cdef size_t k1,k2,k3
+        cdef Py_ssize_t k1,k2,k3
         k1,k2,k3 = r.key
-        cdef size_t h = (k1 + 13*k2 ^ 503*k3)
-        cdef list bucket = <object>PyList_GET_ITEM(buckets, h% PyList_GET_SIZE(buckets))
+        cdef Py_ssize_t h = (k1 + 13*k2 ^ 503*k3)
+        cdef list bucket = <object>PyList_GET_ITEM(buckets, (<size_t>h) % PyList_GET_SIZE(buckets))
         cdef Py_ssize_t i
-        for i from 0 <= i < PyList_GET_SIZE(bucket) by 4:
-            if <size_t><object>PyList_GET_ITEM(bucket, i)==k1 and \
-               <size_t><object>PyList_GET_ITEM(bucket, i+1)==k2 and \
-               <size_t><object>PyList_GET_ITEM(bucket, i+2)==k3:
-                del bucket[i:i+4]
+        for i from 0 <= i < PyList_GET_SIZE(bucket) by 7:
+            if PyInt_AsSsize_t(PyList_GET_ITEM(bucket, i))==k1 and \
+               PyInt_AsSsize_t(PyList_GET_ITEM(bucket, i+1))==k2 and \
+               PyInt_AsSsize_t(PyList_GET_ITEM(bucket, i+2))==k3:
+                del bucket[i:i+7]
                 D._size -= 1
                 break
-        try:
-            D._refcache.__delitem__((k1,k2,k3))
-        except KeyError:
-            pass

 cdef class MonoDict:
     """
     This is a hashtable specifically designed for (read) speed in
     the coercion model.

-    It differs from a python dict in the following important way:
+    It differs from a python WeakKeyDictionary in the following important way:

     - Comparison is done using the 'is' rather than '==' operator.
-    - There are only weak references (if possible) to the keys.
+    - Only weak references to the keys are stored if at all possible.
+      Keys that do not allow for weak references are stored with a normal
+      refcounted reference.

     There are special cdef set/get methods for faster access.
     It is bare-bones in the sense that not all dictionary methods are
comment:43 Changed 8 years ago by
- Description modified (diff)
- Status changed from needs_work to needs_review
If patchbot starts with sage-5.8.beta0, it should first apply one patch from #12313 (namely the one that reverts the usage of a callback for the weak reference to homsets), and then
Apply trac_13387-combined-rel5.8.b0.patch trac_13387-guard_GC.patch
comment:44 Changed 8 years ago by
I am puzzled. The patchbot is applying the correct patches in the correct order to the correct sage version, but almost all patches fail to apply.
For the record, I have

trac_12313-revert_callback_from_11521.patch
trac_13387-combined-rel5.8.b0.patch
trac_13387-guard_GC.patch
on top of sage-5.8.beta0.
comment:45 Changed 8 years ago by
To be on the safe side, I deleted the three patches and downloaded them again from trac. They applied fine. So, I think the error is on the side of the patchbot.
comment:46 Changed 8 years ago by
Argh. The patch keeps failing to apply! Could it be that the "official" version of 5.8.beta0 is different from the one that I had downloaded?
comment:47 Changed 8 years ago by
- Description modified (diff)
- Type changed from enhancement to defect
- Work issues rebase deleted
I don't see why you think rebasing is necessary. The previous patchbot logs already show that the ticket here applies cleanly on 5.8beta0 and that tests pass.
The fact that #12313 fails to apply with its currently stated dependencies is expected: The patchbot doesn't understand "merge with". However, if patches apply here and tests pass, that's an implicit validation of #12313 (with the understanding that this ticket needs to be merged after it).
I'm also refiling this as a "defect" rather than an enhancement: Some of the fixes here fix real bugs (although we might not have seen them triggered).
Let's try again, since the "guard_GC" patch didn't get picked up by the bot before:
Apply trac_13387-combined.patch trac_13387-guard_GC.patch
comment:48 Changed 8 years ago by
- Description modified (diff)
IIRC, patchbot won't apply patches that have an uppercase letter in the name.
Hence, I renamed your second patch.
Apply trac_13387-combined.patch trac_13387-guard_gc.patch
comment:49 follow-up: ↓ 50 Changed 8 years ago by
What is the startup_modules script complaining about?
comment:50 in reply to: ↑ 49 Changed 8 years ago by
Replying to SimonKing:
What is the startup_modules script complaining about?
It's complaining that there seems to be a new module now, sage.structures.gc. I don't know why that's a bad thing. It's a direct consequence of the statement import gc that I added, which I need because I need to do gc.enable() and gc.disable(). This is in a routine that should rarely execute and has an expensive inner loop that doesn't contain these statements, so I don't think the python method calls are going to make a difference. Furthermore, I wasn't able to find a C-level interface to this functionality. I've checked the source of the gc module and the underlying C routines are explicitly declared static, so our only access seems to be via method lookup.
If someone has a recommended way of accessing this functionality without importing gc, that's fine with me. Moving the import gc statement into runtime, say into one of the init methods, is a bad idea: several of these dictionaries get instantiated every time a parent's coercion framework is initialized (which can be pretty often). A documented C-level API for turning gc on and off would be fine, though.
comment:51 Changed 8 years ago by
- Reviewers set to Simon King
- Status changed from needs_review to positive_review
With sage-5.8.beta0 plus #14159 plus #12313 plus #13387, make ptest works fine for me. The ideas behind the patches from here are well-documented in comments in the code. The precautions taken to prevent trouble with garbage collection happening in the wrong moment seem reasonable to me. And we have seen that the performance is fine as well.
Hence, it is a positive review.
comment:52 Changed 8 years ago by
PS: From my perspective, it is no problem that gc is imported during Sage startup.
comment:53 Changed 8 years ago by
- Merged in set to sage-5.8.beta2
- Resolution set to fixed
- Status changed from positive_review to closed
comment:54 Changed 8 years ago by
Please see #14254 for a blocker follow-up.
First shot at improvements of MonoDict and TripleDict data structures