Opened 7 years ago

Closed 7 years ago

Last modified 7 years ago

#13387 closed defect (fixed)

Improve MonoDict and TripleDict data structures

Reported by: nbruin Owned by: Nils Bruin
Priority: major Milestone: sage-5.8
Component: memleak Keywords:
Cc: SimonKing, jpflori Merged in: sage-5.8.beta2
Authors: Nils Bruin Reviewers: Simon King
Report Upstream: N/A Work issues:
Branch: Commit:
Dependencies: #11521, #12313 Stopgaps:

Description (last modified by SimonKing)

On #715 and #12313 some variants of WeakKeyRef data structures are introduced, used in caching for, among other things, uniqueness of parents and coercions. In #12313 comment 162 there is some data on how these data structures can be improved.

Apply:

Attachments (9)

trac_13387.patch (31.1 KB) - added by nbruin 7 years ago.
First shot at improvements of MonoDict and TripleDict data structures
trac_13387-rebased.patch (28.5 KB) - added by nbruin 7 years ago.
rebased to apply cleanly over #12313 (28 jan 2013)
trac_13387-fast-weakref-calls.patch (7.3 KB) - added by nbruin 7 years ago.
Use PyWeakref_GetObject
trac_13387-cdef_monodict_access.patch (6.6 KB) - added by nbruin 7 years ago.
cdef get and set access to MonoDict to speed up coercion lookups
trac_13387-fix_signs.patch (19.9 KB) - added by nbruin 7 years ago.
ensure consistent treatment of signs of hash values
trac_13387-combined.patch (42.9 KB) - added by nbruin 7 years ago.
Combined patch of previous work
trac_13387-guard_GC.patch (3.2 KB) - added by nbruin 7 years ago.
Guard against GC during dict resize
trac_13387-combined-rel5.8.b0.patch (41.9 KB) - added by SimonKing 7 years ago.
Combined patch of previous work, rel 5.8.beta0
trac_13387-guard_gc.patch (3.2 KB) - added by SimonKing 7 years ago.
Guard against GC during dict resize


Change History (63)

Changed 7 years ago by nbruin

First shot at improvements of MonoDict and TripleDict data structures

comment:1 Changed 7 years ago by nbruin

  • Status changed from new to needs_review

Initial tests show only a doctest failure due to a doctest change introduced on #12313 on sage/categories/modules_with_basis.py. We'll see what the bots think.

comment:2 Changed 7 years ago by SimonKing

  • Cc SimonKing added

comment:3 Changed 7 years ago by nbruin

Excellent work! Looks promising. I tested IdKeyDict against the revamped classes here with the same tests you posted there. The MonoDict wins of course:

sage: %timeit K in W
625 loops, best of 3: 38.1 µs per loop
sage: %timeit K in M
625 loops, best of 3: 112 ns per loop
sage: %timeit K in I
625 loops, best of 3: 160 ns per loop
sage: %timeit K2 in W
625 loops, best of 3: 680 ns per loop
sage: %timeit K2 in M
625 loops, best of 3: 64.1 ns per loop
sage: %timeit K2 in I
625 loops, best of 3: 104 ns per loop
sage: %timeit W[K]
625 loops, best of 3: 36.7 µs per loop
sage: %timeit M[K]
625 loops, best of 3: 99.2 ns per loop
sage: %timeit I[K]
625 loops, best of 3: 146 ns per loop

but already for the TripleDict it seems a reasonable choice:

sage: %timeit (K,K,True) in I
625 loops, best of 3: 392 ns per loop
sage: %timeit (K,K,True) in T
625 loops, best of 3: 451 ns per loop
sage: %timeit (K2,K,True) in I
625 loops, best of 3: 133 ns per loop
sage: %timeit (K2,K,True) in T
625 loops, best of 3: 541 ns per loop
sage: 
sage: %timeit I[K,K,True]
625 loops, best of 3: 366 ns per loop
sage: %timeit T[K,K,True]
625 loops, best of 3: 432 ns per loop

I don't think I'll ever understand modern CPUs ... (that, or TripleDict can be further tuned; it was a first try). I think I did use some ideas that might be useful for you, though.

Anyway, at least the iteration needs some work (nice that cython generators now work!)

sage: from sage.structure.idkey_dict import IdKeyDict
sage: I = IdKeyDict(3,53, threshold=0.7)
sage: L = []
sage: for p in prime_range(10000):
....:         L.append(GF(p)['x','y'])
....: 
sage: for i,K in enumerate(L):
....:         I[K,K,True] = i
....: 
sage: from collections import Counter
sage: Counter([k in I for k,v in I.iteritems()])
Counter({False: 1229})
sage: Counter([(k,k,True) in I for k in L])
Counter({True: 1229})
sage: Counter([(k[3](),k[4](),k[5]) in I for k,v in I.iteritems()])
Counter({True: 1229})

but as you see, that's easily fixed. You just have to extract the actual keys. Doing it this way might report errors a bit better than with the unsafe casting of k[:3] (dead weakrefs etc.)

sage: Counter([(id(k[3]()),id(k[4]()),id(k[5])) == k[:3] for k,v in I.iteritems()])
Counter({True: 1229})

Pitfall, by the way:

sage: import weakref
sage: W=weakref.ref(ZZ)
sage: weakref.ref(W)
TypeError: cannot create weak reference to 'weakref' object

So:

sage: from sage.structure.idkey_dict import IdKeyDict
sage: import weakref
sage: class D(object): pass
....: 
sage: d = D()
sage: W = weakref.KeyedRef(d,None,None)
sage: I = IdKeyDict(1,53, threshold=0.7)
sage: I[W]= 10
sage: I[W]
10
sage: del d
sage: len(I)
1
sage: I[W]
KeyError: <weakref at 0x5f6c4d0; dead>
sage: len(I)
0
sage: W
<weakref at 0x5f6c4d0; dead>

Note that my key, W, still exists! Also note that an IdKeyDict would accept mutable (unhashable) objects. That might be a selling point beyond its "weak" and speed properties.

So we should document that one should not use KeyedRefs as key elements, and probably forbid them. For key construction, after weakreffing has failed, a quick isinstance(k, KeyedRef) check shouldn't be so bad (that's not the code path we really care about anyway).
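Just to illustrate the kind of guard I have in mind for key construction (a sketch only; wrap_key is a hypothetical helper, not something in the patches):

import weakref

def wrap_key(k):
    # Sketch: produce the reference that would be stored for the key element k.
    try:
        return weakref.ref(k)      # the normal case: store a weak reference
    except TypeError:
        # weakreffing failed; this is the rare code path, so an extra
        # isinstance check costs nothing where it matters
        if isinstance(k, weakref.KeyedRef):
            raise TypeError("KeyedRef objects are not allowed as key elements")
        return k                   # fall back to a normal (strong) reference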

comment:4 Changed 7 years ago by nbruin

I noticed this python-ideas thread. So there seems to be some interest in dictionaries keyed by id elsewhere too (although they're not mentioning a weak key dict).

comment:5 follow-up: Changed 7 years ago by SimonKing

  • Milestone changed from sage-5.6 to sage-duplicate/invalid/wontfix

I see a logistic problem: The aim of #13904 is to fix some problems introduced with #715. It should be a quick solution, since otherwise #715 will be unmerged. However, the ticket here also depends on #12313, which has a positive review, but isn't merged yet. So, that's too slow.

Therefore, I suggest that #13904 shall use the ideas from here, but only applied to TripleDict, so that #715 can remain in Sage.

I think it is worth while to provide the IdKeyDict code in its full generality (that's an attachment I made in #12313), but I am not sure, perhaps that's for a different ticket. Then, #12313 should be tackled, and probably again done as in IdKeyDict (but perhaps with specialised code).

In other words: I suggest to split this ticket (one part done in #13904 ASAP, the other part later in #12313), so that this ticket shall be marked as a duplicate.

Do you agree?

comment:6 in reply to: ↑ 5 Changed 7 years ago by nbruin

Replying to SimonKing:

I see a logistic problem: The aim of #13904 is to fix some problems introduced with #715. It should be a quick solution, since otherwise #715 will be unmerged. However, the ticket here also depends on #12313, which has a positive review, but isn't merged yet. So, that's too slow.

Therefore, I suggest that #13904 shall use the ideas from here, but only applied to TripleDict, so that #715 can remain in Sage.

I think it is worth while to provide the IdKeyDict code in its full generality (that's an attachment I made in #12313), but I am not sure, perhaps that's for a different ticket. Then, #12313 should be tackled, and probably again done as in IdKeyDict (but perhaps with specialised code).

In other words: I suggest to split this ticket (one part done in #13904 ASAP, the other part later in #12313), so that this ticket shall be marked as a duplicate.

Do you agree?

There is nothing wrong with TripleDict.

EDIT: It used to say "except for an eraser problem we don't know how to fix. We should solve that for now by forbidding the problematic keys in the documentation. In other words: 'Don't use TripleDict unless you know what you're doing'. Then TripleDict doesn't have any known bugs and we can take our time to think about how to make TripleDict more generally applicable here." There is no such problem as far as we know now, though (required reading: Modules/gc_weakref.txt).

Memory management, deallocation, and weakref callbacks are genuinely hard ...

Last edited 7 years ago by nbruin

comment:7 Changed 7 years ago by jpflori

  • Cc jpflori added

I'd prefer to get #13904 in quickly, then #12313 as it is (or with minimal changes to avoid bugs), and then introduce this one, which deals with improvements rather than fixes.

Changed 7 years ago by nbruin

rebased to apply cleanly over #12313 (28 jan 2013)

comment:8 Changed 7 years ago by nbruin

Concerning the timing reported on #12313:300, with the patch here we get

sage: M=sage.structure.coerce_dict.MonoDict(23)
sage: M[ZZ]=1
sage: %timeit _=M[ZZ]
625 loops, best of 3: 240 ns per loop

which is an improvement of 91 ns over the 331 ns reported there. Since a normal dictionary takes about 110 ns for this operation, and this patch avoids the internal lookup in the extra _refcache dictionary, it seems the gain is indeed basically just the removal of _refcache.

We now have

sage: x=-20
sage: def test():
....:            for n in xrange(10**7):
....:              _=QQ(x)
....: 
sage: time test()
Time: CPU 7.92 s, Wall: 7.95 s

That is a bit of a gain over Time: CPU 8.53 s, Wall: 8.57 s we have without #13387, but still quite a bit worse than the Time: CPU 2.97 s, Wall: 2.98 s we have prior to #12313. The difference is that the dictionary that stores the conversion and coercion maps turned into a weakkeyref dict (MonoDict), which is necessarily slower in key lookup than a normal dict, because there's an extra indirection level in the keys.
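To make the extra indirection concrete, here is a pure-Python caricature of an identity-keyed lookup (ToyMonoDict is hypothetical and far simpler than the real Cython class): a normal dict compares keys directly, while a MonoDict-style lookup goes from id(key) to a stored weakref and back to the object.

import weakref

class ToyMonoDict(object):
    def __init__(self):
        self._data = {}
    def __setitem__(self, key, value):
        # keyed by identity; only a weak reference to the key is kept
        self._data[id(key)] = (weakref.ref(key), value)
    def __getitem__(self, key):
        ref, value = self._data[id(key)]   # first hop: lookup by id
        if ref() is not key:               # second hop: dereference the weakref
            raise KeyError(key)
        return value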

Concerning Jeroen's original report, we now have (#12313 + #13387)

sage: def test(RR):
....:         for d in range(-20,0):
....:             if abs(RR(quadratic_L_function__numerical(1, d, 10000) - quadratic_L_function__exact(1, d))) > 0.001:
....:                 print "Oops!  We have a problem at d = ", d, "    exact = ", RR(quadratic_L_function__exact(1, d)), "    numerical = ", RR(quadratic_L_function__numerical(1, d))
....: 
sage: time test(RealField(50))
Time: CPU 1.81 s, Wall: 1.82 s

versus reference timing

Time: CPU 1.63 s, Wall: 1.64 s

To compare, on this machine, the same test with just #12313 takes

Time: CPU 2.07 s, Wall: 2.08 s

so the improvements here do significantly reduce the regression originally reported by Jeroen.

comment:9 Changed 7 years ago by nbruin

Incidentally, doing a del on a bucket slice is indeed safe: Python defers DECREF activity until the slice has been removed from the list. Also, since the slices we delete are not longer than 8 entries, the temporary references are kept on the stack (faster!). See Objects/listobject.c:614. That means the deletion routines as proposed should be safe, also in view of callbacks.

The only scenario that would lead to undue deletions due to callback would be if a callback on a key with id=I gets delayed until a new object with the same id I gets stored in the same dictionary. However, that would need an object to get deallocated before its callbacks get executed or scrapped. That seems like a real violation of protocol.

For the truly paranoid, we could check in the Eraser classes that the weakref corresponding to the callback is indeed NULL-ed. However, I'd be shocked if that were necessary.
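For what it's worth, such a paranoia check would only be a couple of lines at the top of the callback (a Python-level sketch of an Eraser __call__, not the actual patch):

    def __call__(self, r):
        # r is the KeyedRef whose callback fired; if its referent is somehow
        # still alive, the callback would be premature, so do nothing.
        if r() is not None:
            return
        # ... otherwise locate r.key in its bucket and delete the entry ...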

comment:10 follow-up: Changed 7 years ago by SimonKing

Somewhere I had posted a general IdKeyDict, with a fixed key length that can be chosen at creation time, weak keys (where possible) and comparison by identity. But I can't find it. Can you help me?

comment:11 in reply to: ↑ 10 Changed 7 years ago by nbruin

It's where everything is: #12313 (idkey_dict attachment)

Changed 7 years ago by nbruin

Use PyWeakref_GetObject

comment:12 Changed 7 years ago by nbruin

Yippee!! using PyWeakref_GetObject instead of calling weakrefs is indeed a little faster:

sage: M=sage.structure.coerce_dict.MonoDict(23)
sage: M[ZZ]=1
sage: %timeit _=M[ZZ]
625 loops, best of 3: 211 ns per loop
sage: x=-20
sage: def test():
....:         for n in xrange(10**7):
....:             _=QQ(x)
....:  
sage: time test()
Time: CPU 7.76 s, Wall: 7.79 s
sage: def test(RR):
....:             for d in range(-20,0):
....:                 if abs(RR(quadratic_L_function__numerical(1, d, 10000) - quadratic_L_function__exact(1, d))) > 0.001:
....:                     print "Oops!  We have a problem at d = ", d, "    exact = ", RR(quadratic_L_function__exact(1, d)), "    numerical = ", RR(quadratic_L_function__numerical(1, d))
....: 
sage: time test(RealField(50))
Time: CPU 1.76 s, Wall: 1.77 s

So we shave a little more off (something like 30 ns, i.e., some 15% on a simple MonoDict lookup).

Apply trac_13387-rebased.patch trac_13387-fast-weakref-calls.patch

TODO: Further improvement: in the coercion framework, hardwire the use of MonoDict, provide get and set cdef methods and call those to eliminate further method lookup overhead.
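To sketch what that cdef access buys (a toy example, not the real coerce_dict code): once the attribute is statically typed, D.get(k) compiles to a direct C call with no slot or method lookup.

cdef class ToyMonoDict:
    # toy stand-in: a flat dict keyed by id(); the real MonoDict uses buckets
    cdef dict _data
    def __cinit__(self):
        self._data = {}
    cdef get(self, object k):
        return self._data[id(k)]
    cdef set(self, object k, value):
        self._data[id(k)] = value
    def __getitem__(self, k):
        return self.get(k)
    def __setitem__(self, k, value):
        self.set(k, value)

cdef coerce_lookup(ToyMonoDict cache, object S):
    # cache.get(S) here is a plain C call, not a slotted __getitem__
    return cache.get(S)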

comment:13 Changed 7 years ago by nbruin

  • Milestone changed from sage-duplicate/invalid/wontfix to sage-feature

I think the patches on this ticket have value already, so I'm refiling under "sage-feature". We'd already benefit from merging these tickets as-is. That said, there is probably room for some easy further improvements, so we can hold off on merging unless someone deems this ticket needs to be a dependency for another one that needs merging.

comment:14 Changed 7 years ago by nbruin

TripleDict has cdef get and set, which are directly called in coercion. This is faster than relying on the slotted __getitem__ and __setitem__ calls (which are still a bit better than completely generic method lookup). Implementing this for MonoDict and ensuring that it's used for _convert_from_hash and _coerce_from_hash speeds things up to the extent that we no longer seem to have any speed regression relative to the state before #12313:

sage: def test(RR):
....:         for d in range(-20,0):
....:                 if abs(RR(quadratic_L_function__numerical(1, d, 10000) - quadratic_L_function__exact(1, d))) > 0.001:
....:                         print "Oops!  We have a problem at d = ", d, "    exact = ", RR(quadratic_L_function__exact(1, d)), "    numerical = ", RR(quadratic_L_function__numerical(1, d))
....:
sage: %time test(RealField(50))
CPU times: user 1.50 s, sys: 0.01 s, total: 1.51 s
Wall time: 1.51 s

and

sage: x=-20
sage: def test():
....:            for n in xrange(10**7):
....:                  _=QQ(x)
....:          
sage: %time test()
CPU times: user 3.70 s, sys: 0.01 s, total: 3.71 s
Wall time: 3.71 s

I have noticed that these timings can vary quite a bit between compiles and branch names. I guess loops this tight get sensitive to cache alignment and fortunate code locations.

Apply trac_13387-rebased.patch trac_13387-fast-weakref-calls.patch trac_13387-cdef_monodict_access.patch

Changed 7 years ago by nbruin

cdef get and set access to MonoDict to speed up coercion lookups

comment:15 follow-up: Changed 7 years ago by nbruin

Hm. Seems MonoDict had a _get_buckets method with a doctest referring to TripleDict. Since both MonoDict and TripleDict now have cdef get and set methods with incompatible signatures, we cannot have TripleDict inherit from MonoDict anymore, so I've just copied the definition and adapted the doctest. _get_buckets is only for debugging anyway.

comment:16 in reply to: ↑ 15 ; follow-up: Changed 7 years ago by SimonKing

Replying to nbruin:

Hm. Seems MonoDict had a _get_buckets method with a doctest referring to TripleDict. Since both MonoDict and TripleDict now have cdef get and set methods with incompatible signatures, we cannot have TripleDict inherit from MonoDict anymore,

I think this is why I originally dropped get/set for MonoDict.

What about having different names for the getters and setters (mget/mset versus tget/tset)? It would avoid the duplication, but would certainly not be very clean.

comment:17 Changed 7 years ago by SimonKing

By the way, what patches are to be applied? The following?

Apply trac_13387-rebased.patch trac_13387-fast-weakref-calls.patch trac_13387-cdef_monodict_access.patch

comment:18 in reply to: ↑ 16 Changed 7 years ago by nbruin

Replying to SimonKing:

What about having different names for the getters and setters (mget/mset versus tget/tset)? It would avoid the duplication, but would certainly not be very clean.

I think it's worse than the current situation. The reality is that MonoDict and TripleDict only share some data attribute names and a debug routine. They are each implemented separately. Code sharing here invites bugs from unintended side effects of changes that look local (as demonstrated here) more than it saves work. I think having the classes independent is better. If it were up to me I'd also make the Eraser classes independent, but that's just a small issue. In the end, both are just unrolled special cases of IdKeyDict (which, as we found, might be a contender for TripleDict but not for MonoDict), so we don't have them for code cleanliness but for speed.

The patch order is correct, as with the Apply statement before. Note that the patchbot results page sorts records by the time local to the bot, so reports are not necessarily in reverse chronological order. It did get a "tests pass" with the current patch sequence.

comment:19 Changed 7 years ago by nbruin

  • Description modified (diff)

comment:20 Changed 7 years ago by nbruin

  • Description modified (diff)

comment:21 follow-up: Changed 7 years ago by jdemeyer

Does this supersede #13904 (i.e. should #13904 be closed as duplicate)?

comment:22 in reply to: ↑ 21 ; follow-up: Changed 7 years ago by nbruin

Replying to jdemeyer:

Does this supersede #13904 (i.e. should #13904 be closed as duplicate)?

I would say yes, in the sense that if we merge this ticket then #13904 is irrelevant (the lines amended there wouldn't exist anymore). In light of the new findings on #12313 it looks like we have to merge something like the changes proposed here (at least the first patch), but Simon should probably chime in as well. He's most qualified to review this ticket.

comment:23 Changed 7 years ago by jdemeyer

It looks like this patch does indeed fix the KeyError I posted on #12313. At least, I can no longer reproduce the problem with this patch applied.

comment:24 in reply to: ↑ 22 Changed 7 years ago by SimonKing

Replying to nbruin:

Replying to jdemeyer:

Does this supersede #13904 (i.e. should #13904 be closed as duplicate)?

I would say yes, in the sense that if we merge this ticket then #13904 is irrelevant (the lines amended there wouldn't exist anymore). In light of the new findings on #12313 it looks like we have to merge something like the changes proposed here (at least the first patch), but Simon should probably chime in as well. He's most qualified to review this ticket.

OK. Is this supposed to be for 5.7 or 5.8?

comment:25 Changed 7 years ago by jdemeyer

Certainly not sage-5.7.

comment:26 Changed 7 years ago by jdemeyer

  • Milestone changed from sage-feature to sage-5.8
  • Status changed from needs_review to needs_work

On hawk (OpenSolaris i386), Sage fails to start:

  File "./spkg-install", line 4, in <module>
    from sage.all import save
  File "/export/home/buildbot/build/sage/hawk-1/hawk_full/build/sage-5.8.beta1/local/lib/python2.7/site-packages/sage/all.py", line 63, in <module>
    from sage.misc.all       import *         # takes a while
  File "/export/home/buildbot/build/sage/hawk-1/hawk_full/build/sage-5.8.beta1/local/lib/python2.7/site-packages/sage/misc/all.py", line 83, in <module>
    from functional import (additive_order,
  File "/export/home/buildbot/build/sage/hawk-1/hawk_full/build/sage-5.8.beta1/local/lib/python2.7/site-packages/sage/misc/functional.py", line 36, in <module>
    from sage.rings.complex_double import CDF
  File "complex_double.pyx", line 96, in init sage.rings.complex_double (sage/rings/complex_double.c:16857)
OverflowError: long int too large to convert to int

Line 96 of "complex_double.pyx" is import real_mpfr, see also the C code:

  /* "sage/rings/complex_double.pyx":96
 * CC = complex_field.ComplexField()
 *
 * import real_mpfr             # <<<<<<<<<<<<<<
 * RR = real_mpfr.RealField()
 *
 */
  __pyx_t_2 = __Pyx_Import(((PyObject *)__pyx_n_s__real_mpfr), 0, -1); if (unlikely(!__pyx_t_2)) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 96; __pyx_clineno = __LINE__; goto __pyx_L1_error;}
  __Pyx_GOTREF(__pyx_t_2);
  if (PyObject_SetAttr(__pyx_m, __pyx_n_s__real_mpfr, __pyx_t_2) < 0) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 96; __pyx_clineno = __LINE__; goto __pyx_L1_error;}
  __Pyx_DECREF(__pyx_t_2); __pyx_t_2 = 0;
Last edited 7 years ago by jdemeyer

comment:27 follow-up: Changed 7 years ago by nbruin

You're not giving me much to go on here ... Clearly on that machine "long int" != "int". It being an i386, I'd assume "long int" is 32 bits. Is "int" 16 bits there? That's just going to be unworkable! Hopefully the "int" is a variable under our control that we can simply change to a "long int".

Anyway, on pristine 5.8b0 and 5.8b1 I'm finding that this is line 94. So are you sure the problem comes from the patch here?

Anyway, FWIW, #715 comment 179 also addressed 32 bit problems, although there it seems to deal more with sign problems for "%". The reported behaviour here doesn't seem to be of that kind, unless the "%" gets executed in a signed way and then interpreted as an unsigned value (then you could easily get a "very big" value)

If you give me access to something where I can reproduce the problematic behaviour I can diagnose a little better. That might be preferable to me making random changes to the patch here in the hope that you get more favourable results. Or better yet, perhaps JP can put his acquired skills to use, since he has solved similar problems before.

comment:28 in reply to: ↑ 27 Changed 7 years ago by jdemeyer

Replying to nbruin:

You're not giving me much to go on here ... Clearly on that machine "long int != int", It being an i386, I'd assume "long int" is 32 bits. Is "int" 16 bits there?

I'm fairly certain both "long int" and "int" are 32 bits. But any of those words could refer to the Python type "long" or "int".

Anyway, on pristine 5.8b0 and 5.8b1 I'm finding that this is line 94. So are you sure the problem comes from the patch here?

Yes, unmerging the patch fixes the problem. In case you wonder, I also applied #13618 but that's really only about doctests.

comment:29 follow-up: Changed 7 years ago by jdemeyer

To me, the expressions

PyInt_AsSsize_t(PyList_GET_ITEM(bucket, i))

look the most suspicious. What is PyList_GET_ITEM(bucket, i) and why are we sure it fits in a Py_ssize_t?

comment:30 Changed 7 years ago by jdemeyer

Looking at the Python sources now. The error message is precisely what one would get if PyInt_AsSsize_t() overflows (I do agree that the error text is confusing).

Anyway, you know much better than me where the values PyList_GET_ITEM(bucket, i) come from, so I'll let you think about it.

Last edited 7 years ago by jdemeyer

comment:31 in reply to: ↑ 29 ; follow-up: Changed 7 years ago by nbruin

Replying to jdemeyer:

To me, the expressions

PyInt_AsSsize_t(PyList_GET_ITEM(bucket, i))

look the most suspicious. What is PyList_GET_ITEM(bucket, i) and why are we sure it fits in a Py_ssize_t?

Ah yes! Good catch. This could very well be the source. The stored values here are ids of python objects. They are stored via a <size_t><void *> cast (which subsequently gets converted to a python int). When we try to retrieve that via a PyInt_AsSsize_t we're trying to retrieve it into a signed size_t (that's the purpose of the Ssize_t, right?), so indeed this might not fit.

What's the best fix, though? Should we be storing via a <Py_ssize_t><void *> cast or should we do a <size_t><void *>PyList_GET_ITEM(...) instead? I think I found PyInt_AsSsize_t by looking at the code generated by that statement.

It depends a bit: I would assume that python has relatively efficient storage for both small positive and negative integers, so storing as a signed quantity would make better use of the bits python offers for its "small" (as in not PyLong) integers.

Previously, the code prevented this whole issue by constructions such as

            tmp = <object>PyList_GET_ITEM(bucket, i)
            if <size_t>tmp == h1:

which of course works, but is less efficient: The cast to <object> will introduce refcounting.

So, my hunch is: store keys as <Py_ssize_t> and cast to <size_t> before doing modulo operations. I suppose that <size_t><Py_ssize_t>(-1) equals 256^sizeof(size_t) - 1 (and probably doesn't generate code).
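In code, the convention I have in mind would look roughly like this (a sketch with hypothetical helper names):

cdef inline object store_id(object k):
    # store the address as a signed Py_ssize_t so it fits a PyInt where possible
    return <Py_ssize_t><void *>k

cdef inline size_t bucket_index(Py_ssize_t h, size_t modulus):
    # reinterpret as unsigned only for the modulo, so the index is never negative
    return (<size_t>h) % modulus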

Advice welcome.

comment:32 in reply to: ↑ 31 ; follow-up: Changed 7 years ago by jdemeyer

Replying to nbruin:

I would assume that python has relatively efficient storage for both small positive and negative integers, so storing as a signed quantity would make better use of the bits python offers for its "small" (as in not PyLong) integers.

That is completely true, so going the Py_ssize_t way is probably the most efficient.

If N is of type Py_ssize_t, then <size_t>N % k isn't even slow, it should actually be at least as fast as N % k and certainly a lot faster than

h = N % k
if h < 0:
    h = -h
Last edited 7 years ago by jdemeyer

comment:33 in reply to: ↑ 32 Changed 7 years ago by nbruin

Replying to jdemeyer:

That is completely true, so going the Py_ssize_t way is probably the most efficient.

If N is of type Py_ssize_t, then <size_t>N % k isn't even slow, it should actually be at least as fast as N % k.

Cool. I'll try and prepare a patch that goes that route then. Hopefully that solves the problem.

Changed 7 years ago by nbruin

ensure consistent treatment of signs of hash values

Changed 7 years ago by nbruin

Combined patch of previous work

comment:34 Changed 7 years ago by nbruin

  • Description modified (diff)
  • Status changed from needs_work to needs_review

OK! Hopefully this does the trick. All doctests pass for me when applied to 5.8beta0. We'll see what the bot thinks.

I have uploaded the incremental changes wrt previous work here, in case that helps for review. For convenience, there's also a combined path, that has everything in one.

Apply trac_13387-combined.patch

comment:35 Changed 7 years ago by nbruin

Oh bother. Neither the iteration code nor the resize code is likely to be safe. The resize code does memory allocations throughout its inner loop, so callbacks (and bucket modifications!) can happen there. The loop is not robust against that in a rather fundamental way: If a callback happens on an entry that has already been copied over to the new buckets, but before the new buckets have been commissioned, then the entry will get removed from its old bucket (possibly causing mayhem), but the entry in the new bucket will happily remain!

I'm not sure how to formulate resize otherwise. Should we trigger a garbage collection, to ensure all collectable entries have already been removed? If we start rearranging the buckets right after that, we shouldn't be getting any callbacks. Unless GC isn't always completely thorough. Can it ever be that a GC call issued right after another GC can still find something? Given how difficult it can be to break cycles, I can imagine this would happen. For instance, if a weakref callback causes something else to become garbage, will that still be collected in the same call?

Another approach might be to "lock" the dictionary so that callbacks get queued rather than executed. When we unlock the dictionary we could process the queue (or purge the dict so that we're sure any outstanding callback has become irrelevant)

The iterator is less inherently unsafe. It's just that in between "yield"s there is likely allocation activity of some sort, so GCs could modify the dict.

In any case, I don't think the work on this ticket makes the situation worse than it already is, which apparently is not so bad: resize operations are very rare. It may well be they don't happen in the test suite at all.

Furthermore, the problem only arises if a new bucket needs to be extended during copying. Since buckets are supposed to be really small, this would be even rarer (python lists are initialized with room for a certain number of entries, right?). We don't know how big the buckets are going to have to be. Otherwise we could just preallocate the buckets with sufficient length.

In fact, I wonder how robust Python's own WeakValueDict and WeakKeyDict are in the face of these issues. Perhaps they are OK because their memory allocations are (near) atomic.

That would be an option too: program the hash tables of TripleDict and MonoDict via open addressing rather than the list-of-lists we have now. I'm not the right person to make such a change, though. I'm sure implementing good hash tables has all kinds of pitfalls I am not aware of.

Conclusion: This ticket should still be merged because it's likely strictly an improvement over the status-quo, but there are some fundamental design problems due to an extremely remote possibility for unfortunate GC occurrences in the middle of dictionary update operations.

Changed 7 years ago by nbruin

Guard against GC during dict resize

comment:36 Changed 7 years ago by nbruin

  • Description modified (diff)

Therapeutic despair. The solution is obvious! Just disable garbage collection for the duration of the critical block. That way Python can "queue" the callbacks for us. This should ensure that MonoDict.resize and TripleDict.resize are safe.

Iteration comes "buyer beware" and I don't think any code uses it presently anyway.
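A minimal sketch of the guard (plain Python; the guard_GC patch presumably wraps the resize loop along these lines):

import gc

def resize(self):
    # While cyclic GC is disabled, allocations inside the loop cannot trigger
    # a collection, so no weakref callbacks fire mid-resize; Python effectively
    # queues the collection until GC is re-enabled.
    gc_was_enabled = gc.isenabled()
    gc.disable()
    try:
        pass  # rebuild self.buckets here
    finally:
        if gc_was_enabled:
            gc.enable()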

Apply trac_13387-combined.patch, trac_13387-guard_GC.patch

comment:37 follow-up: Changed 7 years ago by jdemeyer

Some remarks about the comment (lines 48--62 of sage/structure/coerce_dict.pyx):

Nobody guarantees that size_t is a "pointer sized integer". In practice, the number of bits in a size_t does equal the bitness of the machine on modern systems, but there is no guarantee for this. In C99, there is a type uintptr_t, which is defined as being a pointer-sized unsigned integer. See also http://stackoverflow.com/questions/1464174/size-t-vs-intptr-t for some good points.

The next sentence (This has an advantage...) could be made more concrete: Assuming that Py_ssize_t is the same as a C long (which is true on most Unix-like systems), this also has the advantage that these Py_ssize_t values are stored as a Python "int" (as opposed to "long"), which allow for fast conversion to/from C types.

Typo:

(<size_t) h)% modulus

comment:38 in reply to: ↑ 37 ; follow-up: Changed 7 years ago by SimonKing

Replying to jdemeyer:

Nobody guarantees that size_t is a "pointer sized integer".

Didn't know that. Thank you for pointing this out.

In C99, there is a type uintptr_t, which is defined as being a pointer-sized unsigned integer.

OK. Can we use this instead?

comment:39 in reply to: ↑ 38 ; follow-up: Changed 7 years ago by jdemeyer

Replying to SimonKing:

OK. Can we use this instead?

Not really, since Python doesn't define a function like PyInt_AsIntptr_t. But, like I said, for all current systems, we have ssize_t == intptr_t.

comment:40 in reply to: ↑ 39 ; follow-up: Changed 7 years ago by nbruin

Replying to jdemeyer:

Not really, since Python doesn't define a function like PyInt_AsIntptr_t. But, like I said, for all current systems, we have ssize_t == intptr_t.

There is something that comes close, though: Python's id is implemented via PyLong_FromVoidPtr. This routine only compiles if sizeof(void *) <= sizeof(long) or if sizeof(void *) <= sizeof(long long) (if that type is available).

So CPython itself is taking reasonable but not extreme measures to support odd pointer sizes (I guess sizeof(void *) !=sizeof(long) is quite possible)

The definition is available:

cdef extern from "stdint.h":
    ctypedef long intptr_t #doesn't need to match exactly

In order to find conversion routines in Python that are guaranteed of the right length, we'd have to go with PyLong_FromVoidPtr and PyLong_AsVoidPtr. So basically we'd be working with void * and convert to intptr_t if we need to do arithmetic.

In fact, at that time we can afford to lose bits, so we wouldn't particularly need intptr_t as a type at all. We can just cast to <long> once we do arithmetic to compute the hash value or the bucket number.

So the question really is: Do we want to let our keys be <void *> rather than Py_Ssize_t? We do need to pack and unpack those keys into the bucket PyList objects.
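For reference, packing and unpacking through the pointer-based API would look roughly like this (a sketch, not what the current patches do):

cdef extern from "Python.h":
    object PyLong_FromVoidPtr(void *p)
    void* PyLong_AsVoidPtr(object n)

cdef inline object pack_key(object k):
    # store the address of k as a Python integer (a PyInt if it happens to fit)
    return PyLong_FromVoidPtr(<void *>k)

cdef inline void* unpack_key(object stored) except? NULL:
    # recover the address; comparing against <void *>k is exact on any platform
    return PyLong_AsVoidPtr(stored)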

There is a slight problem there: PyLong_FromVoidPtr goes through extra trouble to interpret void * as an unsigned quantity (if only it didn't do that! what's wrong with a negative pointer?). Thus even on systems where a PyInt and a void * have the same number of bits available, half of the possible values will be stored in a PyLong anyway (because of sign). (Note that PyLong_FromVoidPtr is happy to produce a PyInt if it thinks it fits.)

So yes, we could do that and guard against future platforms where sizeof(void *) > sizeof(size_t), but we'd do so at a storage and performance penalty on our current platforms (probably very rare on 64-bit, because pointers tend to have their top bits zero there, but on 32-bit it's very common to have pointers with their top bit set).

I'm not sure it's worth it.

comment:41 in reply to: ↑ 40 Changed 7 years ago by jdemeyer

Replying to nbruin:

I'm not sure it's worth it.

I agree, especially also given the fact that other components of Sage make more serious assumptions.

comment:42 Changed 7 years ago by SimonKing

  • Status changed from needs_review to needs_work
  • Work issues set to rebase

It seems that #12313 is merged in sage-5.8.beta0. But the first patch does not apply, 2 out of 30 hunks fail.

  • coerce_dict.pyx

    cdef class MonoDictEraser:
             # and it knows the stored key of the unique singleton r() had been part
             # of.
             # We remove that unique tuple from self.D -- if self.D is still there!
    -        cdef MonoDict D = self.D()
    +
    +        #WARNING! These callbacks can happen during the addition of items to the same
    +        #dictionary. The addition code may then be in the process of adding a new entry
    +        #(one PyList_Append call at a time) to the relevant bucket.
    +        #The "PyList_GET_SIZE(bucket) by 3" call should mean that we round down and hence not look
    +        #at incomplete entries. Furthermore, deleting a slice out of a buck should still be OK.
    +        #this callback code should absolutely not resize the dictionary, because that would wreak
    +        #complete havoc.
    +
    +        cdef MonoDict D = <object> PyWeakref_GetObject(self.D)
             if D is None:
                 return
             cdef list buckets = D.buckets
             if buckets is None:
                 return
    -        cdef size_t k = r.key
    -        cdef size_t h = k % PyList_GET_SIZE(buckets)
    -        cdef list bucket = <object>PyList_GET_ITEM(buckets,h)
    +        cdef Py_ssize_t h = r.key
    +        cdef list bucket = <object>PyList_GET_ITEM(buckets, (<size_t>h) % PyList_GET_SIZE(buckets))
             cdef Py_ssize_t i
    -        for i from 0 <= i < PyList_GET_SIZE(bucket) by 2:
    -            if <size_t><object>PyList_GET_ITEM(bucket,i)==k:
    -                del bucket[i:i+2]
    +        for i from 0 <= i < PyList_GET_SIZE(bucket) by 3:
    +            if PyInt_AsSsize_t(PyList_GET_ITEM(bucket,i))==h:
    +                del bucket[i:i+3]
                     D._size -= 1
                     break
    -        try:
    -            D._refcache.__delitem__(k)
    -        except KeyError:
    -            pass

    -cdef class TripleDictEraser(MonoDictEraser):
    +cdef class TripleDictEraser:
         """
         Erases items from a :class:`TripleDict` when a weak reference becomes
         invalid.

    cdef class TripleDictEraser(MonoDictEraser):
             # r is a (weak) reference (typically to a parent), and it knows the
             # stored key of the unique triple r() had been part of.
             # We remove that unique triple from self.D
    -        cdef size_t k1,k2,k3
    +        cdef Py_ssize_t k1,k2,k3
             k1,k2,k3 = r.key
    -        cdef size_t h = (k1 + 13*k2 ^ 503*k3)
    -        cdef list bucket = <object>PyList_GET_ITEM(buckets, h % PyList_GET_SIZE(buckets))
    +        cdef Py_ssize_t h = (k1 + 13*k2 ^ 503*k3)
    +        cdef list bucket = <object>PyList_GET_ITEM(buckets, (<size_t>h) % PyList_GET_SIZE(buckets))
             cdef Py_ssize_t i
    -        for i from 0 <= i < PyList_GET_SIZE(bucket) by 4:
    -            if <size_t><object>PyList_GET_ITEM(bucket, i)==k1 and \
    -               <size_t><object>PyList_GET_ITEM(bucket, i+1)==k2 and \
    -               <size_t><object>PyList_GET_ITEM(bucket, i+2)==k3:
    -                del bucket[i:i+4]
    +        for i from 0 <= i < PyList_GET_SIZE(bucket) by 7:
    +            if PyInt_AsSsize_t(PyList_GET_ITEM(bucket, i))==k1 and \
    +               PyInt_AsSsize_t(PyList_GET_ITEM(bucket, i+1))==k2 and \
    +               PyInt_AsSsize_t(PyList_GET_ITEM(bucket, i+2))==k3:
    +                del bucket[i:i+7]
                     D._size -= 1
                     break
    -        try:
    -            D._refcache.__delitem__((k1,k2,k3))
    -        except KeyError:
    -            pass

     cdef class MonoDict:
         """
         This is a hashtable specifically designed for (read) speed in
         the coercion model.

    -    It differs from a python dict in the following important way:
    +    It differs from a python WeakKeyDictionary in the following important way:

            - Comparison is done using the 'is' rather than '==' operator.
    -       - There are only weak references (if possible) to the keys.
    +       - Only weak references to the keys are stored if at all possible.
    +         Keys that do not allow for weak references are stored with a normal
    +         refcounted reference.

         There are special cdef set/get methods for faster access.
         It is bare-bones in the sense that not all dictionary methods are

Changed 7 years ago by SimonKing

Combined patch of previous work, rel 5.8.beta0

comment:43 Changed 7 years ago by SimonKing

  • Description modified (diff)
  • Status changed from needs_work to needs_review

If patchbot starts with sage-5.8.beta0, it should first apply one patch from #12313 (namely the one that reverts the usage of a callback for the weak reference to homsets), and then

Apply trac_13387-combined-rel5.8.b0.patch trac_13387-guard_GC.patch

comment:44 Changed 7 years ago by SimonKing

I am puzzled. The patchbot is applying the correct patches in the correct order to the correct sage version, but almost all patches fail to apply.

For the record, I have

trac_12313-revert_callback_from_11521.patch
trac_13387-combined-rel5.8.b0.patch
trac_13387-guard_GC.patch

on top of sage-5.8.beta0.

comment:45 Changed 7 years ago by SimonKing

To be on the safe side, I deleted the three patches and downloaded them again from trac. They applied fine. So, I think the error is on the side of the patchbot.

comment:46 Changed 7 years ago by SimonKing

Argh. The patch keeps failing to apply! Could it be that the "official" version of 5.8.beta0 is different from the one that I had downloaded?

comment:47 Changed 7 years ago by nbruin

  • Description modified (diff)
  • Type changed from enhancement to defect
  • Work issues rebase deleted

I don't see why you think rebasing is necessary. The previous patchbot logs already show that the ticket here applies cleanly on 5.8beta0 and that tests pass.

The fact that #12313 fails to apply with its currently stated dependencies is expected: The patchbot doesn't understand "merge with". However, if patches apply here and tests pass, that's an implicit validation of #12313 (with the understanding that this ticket needs to be merged after it).

I'm also refiling this as a "defect" rather than an enhancement: Some of the fixes here fix real bugs (although we might not have seen them triggered).

Let's try again, since the "guard_GC" patch didn't get picked up by the bot before:

Apply trac_13387-combined.patch trac_13387-guard_GC.patch

Changed 7 years ago by SimonKing

Guard against GC during dict resize

comment:48 Changed 7 years ago by SimonKing

  • Description modified (diff)

IIRC, patchbot won't apply patches that have an uppercase letter in the name.

Hence, I renamed your second patch.

Apply trac_13387-combined.patch trac_13387-guard_gc.patch

comment:49 follow-up: Changed 7 years ago by SimonKing

What is the startup_modules script complaining about?

comment:50 in reply to: ↑ 49 Changed 7 years ago by nbruin

Replying to SimonKing:

What is the startup_modules script complaining about?

It's complaining that there seems to be a new module now, sage.structures.gc. I don't know why that's a bad thing. It's a direct consequence of the statement import gc that I added, which I need because I need to do gc.enable() and gc.disable(). This is in a routine that should rarely execute and has an expensive inner loop that doesn't contain these statements, so I don't think the python method calls are going to make a difference. Furthermore, I wasn't able to find a C-level interface to this functionality. I've checked the source of the GC module and the underlying C routines are explicitly declared static, so our only access seems to be via method lookup.

If someone has a recommended way of accessing this functionality without importing gc, that's fine with me. Moving the import gc statement into runtime, say into one of the init methods, is a bad idea: several of these dictionaries get instantiated every time a parent's coercion framework is initialized (which can be pretty often). A documented C-level API for turning gc on and off would be fine, though.

comment:51 Changed 7 years ago by SimonKing

  • Reviewers set to Simon King
  • Status changed from needs_review to positive_review

With sage-5.8.beta0 plus #14159 plus #12313 plus #13387, make ptest works fine for me. The ideas behind the patches from here are well-documented in comments in the code. The precautions taken to prevent trouble with garbage collection happening in the wrong moment seem reasonable to me. And we have seen that the performance is fine as well.

Hence, it is a positive review.

comment:52 Changed 7 years ago by SimonKing

PS: From my perspective, it is no problem that gc is imported during Sage startup.

comment:53 Changed 7 years ago by jdemeyer

  • Merged in set to sage-5.8.beta2
  • Resolution set to fixed
  • Status changed from positive_review to closed

comment:54 Changed 7 years ago by jdemeyer

Please see #14254 for a blocker follow-up.
