Opened 4 years ago

Closed 4 years ago

#14159 closed defect (fixed)

Don't install callbacks on values of TripleDict, MonoDict

Reported by: nbruin Owned by: tbd
Priority: major Milestone: sage-5.10
Component: memleak Keywords:
Cc: SimonKing, jpflori Merged in: sage-5.10.beta0
Authors: Simon King Reviewers: Nils Bruin
Report Upstream: N/A Work issues:
Branch: Commit:
Dependencies: #13387, #14254 Stopgaps:

Description (last modified by SimonKing)

In #11521 trac_11521_callback.patch a callback was installed on a weakref *value* of a TripleDict:

 _cache[key] = KeyedRef(H, _cache.eraser, (id(X),id(Y),id(category)))

That's not safe: If the value under that key gets changed while the value remains in memory, the callback will be executed and remove an entry that now has an unrelated value!

So: Either prove that the value under this key will not change for the lifetime of H (keep in mind that cyclic references can extend the lifetime of an otherwise unreachable object essentially indefinitely, so the proof needs to include that all key components survive H, otherwise those ids could get reused) or provide a more selective callback (for instance, ensure that the value is still as expected before deleting).
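The hazard can be mimicked in plain Python, with an ordinary dict standing in for the TripleDict and an unconditional eraser like the one installed in #11521 (names here are illustrative, not the actual coerce_dict API):

```python
import gc
import weakref

class Value:
    pass

cache = {}

def eraser(ref):
    # Unsafe callback: unconditionally removes whatever currently
    # sits under ref.key, even an unrelated newer value.
    cache.pop(ref.key, None)

key = ("X", "Y", "category")
v = Value()
cache[key] = weakref.KeyedRef(v, eraser, key)

stray = cache[key]   # someone else keeps the old KeyedRef alive
w = Value()
cache[key] = w       # the entry is overwritten with an unrelated value
del v                # v dies, its callback fires ...
gc.collect()
print(key in cache)  # ... and w's entry is gone: False
```

The stray reference to the old KeyedRef is what keeps the callback armed after the entry has been overwritten.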

Note: The patch trac_14159_safer_callback.patch follows the second approach, so that a memory leak remains fixed.

Another point is that the API of _cache.eraser isn't really published, so this behaviour is probably better encapsulated in a method on the dict itself.

See #12313 comment 317 for a situation that likely implicates these callbacks (someone has to hold strong references to these keys in order to set the dict, so the absence of the keys suggests a spurious execution of this kind of callback)

Apply

Attachments (5)

trac_12313-revert_callback_from_11521.patch (1015 bytes) - added by nbruin 4 years ago.
patch that got misplaced on #12313
trac_14159-document_callback_issue.patch (2.7 KB) - added by nbruin 4 years ago.
Documentation to explain the problem (and possible future problems)
trac_14159_safer_callback.patch (7.2 KB) - added by SimonKing 4 years ago.
A safe callback for weak values of a TripleDict
trac_14159_weak_value_triple_dict.patch (18.0 KB) - added by SimonKing 4 years ago.
Optional weak values for mono- and tripledict
trac_14159_use_cdef_get.patch (2.4 KB) - added by SimonKing 4 years ago.
Use cdef get/set methods of MonoDict and TripleDict in cython modules


Change History (65)

comment:1 follow-up: Changed 4 years ago by SimonKing

What do you mean by "value under that key"?

Ah! You mean this situation: T[a,b,c]=v and later T[a,b,c]=w. Then v might become garbage collected and its callback would remove w from the dictionary.

Indeed, this would be bad.

Potential solutions (two alternatives):

  1. The callback shall test whether the value in the entry to be deleted coincides with the value that is being garbage collected.
  2. If an existing entry is overwritten by __setitem__ then its callback will be removed.
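The first option can be sketched in plain Python (a toy dict model, not the actual coerce_dict code): the callback deletes the entry only if it still holds the very weak reference being finalized.

```python
import gc
import weakref

class Value:
    pass

cache = {}

def safe_eraser(ref):
    # Delete the entry only if it still holds this exact weak reference.
    if cache.get(ref.key) is ref:
        del cache[ref.key]

key = (1, 2, 3)
v = Value()
cache[key] = weakref.KeyedRef(v, safe_eraser, key)

stray = cache[key]   # the old KeyedRef survives outside the dict
w = Value()
cache[key] = w       # T[a,b,c] = w after T[a,b,c] = v
del v                # v's callback fires, but the identity check fails
gc.collect()
print(cache[key] is w)  # True: the new entry survives
```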

comment:2 in reply to: ↑ 1 ; follow-up: Changed 4 years ago by nbruin

Replying to SimonKing:

  2. If an existing entry is overwritten by __setitem__ then its callback will be removed.

I guess you could reach into the KeyedRef object and invalidate the key to prevent action on callback. Doing so would complicate all our dict setting and deleting, so I think that is not an attractive option.

Just deleting the KeyedRef object won't guarantee it will go away: someone else may be holding a reference to the KeyedRef object as well.

So probably there needs to be an additional option on ...DictEraser objects to check that the value is as expected (or write yet another callback class, but that sounds like a waste).

And then ...Dict would grow an extra set_weak_value method to install the appropriate KeyedRef on the value, because letting other code instantiate eraser objects is scary.
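Such an encapsulating helper could look roughly like this (set_weak_value is the hypothetical name floated above; the class is a toy stand-in, not the real TripleDict):

```python
import gc
import weakref

class TripleDictModel:
    """Toy stand-in for TripleDict with a set_weak_value helper
    (a hypothetical name suggested in this discussion)."""

    def __init__(self):
        self._data = {}

    def _erase(self, ref):
        # Selective callback: only remove a still-matching entry.
        if self._data.get(ref.key) is ref:
            del self._data[ref.key]

    def set_weak_value(self, k1, k2, k3, value):
        key = (id(k1), id(k2), id(k3))
        # The dict builds the KeyedRef itself, so callers can no longer
        # mistype the (id(X), id(Y), id(category)) triple.
        # (Note: the bound method self._erase keeps the dict alive as
        # long as an entry exists; a toy model can ignore that.)
        self._data[key] = weakref.KeyedRef(value, self._erase, key)

class Obj:
    pass

T = TripleDictModel()
X, Y, C, H = Obj(), Obj(), Obj(), Obj()
T.set_weak_value(X, Y, C, H)
print(len(T._data))  # 1
del H                # the value dies, the entry is erased
gc.collect()
print(len(T._data))  # 0
```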

It feels like horrible feature creep, but I guess you have a good use case (can you come up with an alternative solution?)

comment:3 in reply to: ↑ 2 ; follow-up: Changed 4 years ago by SimonKing

Replying to nbruin:

Replying to SimonKing:

  2. If an existing entry is overwritten by __setitem__ then its callback will be removed.

I guess you could reach into the KeyedRef object and invalidate the key to prevent action on callback. Doing so would complicate all our dict setting and deleting, so I think that is not an attractive option.

So, what about the first option? It would make sure that an entry will only be deleted if the value is identical with the object whose callback is being called.

comment:4 in reply to: ↑ 3 ; follow-up: Changed 4 years ago by nbruin

Replying to SimonKing:

So, what about the first option? It would make sure that an entry will only be deleted if the value is identical with the object whose callback is being called.

Yes, sorry for not being clear about it. That's what the second part of the reply expands on.

Last edited 4 years ago by nbruin (previous) (diff)

comment:5 in reply to: ↑ 4 Changed 4 years ago by SimonKing

Replying to nbruin:

Replying to SimonKing:

So, what about the first option? It would make sure that an entry will only be deleted if the value is identical with the object whose callback is being called.

Yes, sorry for not being clear about it. That's what the second part of the reply expands on.

Sorry, I don't see the relation with suggestion 1.

So probably there needs to be an additional option on ...DictEraser objects to check that the value is as expected

Why? The callback would be checking that the value is identical with the object pointed to by the weak reference. This should be cheap enough to be done by default.

comment:6 Changed 4 years ago by SimonKing

I thought that the following would expose the problem, but it doesn't:

sage: from sage.structure.coerce_dict import TripleDict
sage: import gc
sage: class Foo: pass
sage: 
sage: a = 1
sage: b = 2
sage: c = 3
sage: v = Foo()
sage: w = Foo()
sage: T = TripleDict(13)
sage: T[a,b,c] = v
sage: T[a,b,c] = w
sage: del v
sage: _ = gc.collect()
sage: len([x for x in gc.get_objects() if isinstance(x,Foo)])
1

So, v got collected and thus the callback was executed. However:

sage: id(T[a,b,c])
85762632
sage: id(w)
85762632

The entry that was previously holding v did not get deleted when v's callback was called.

So, are we really in trouble?

comment:7 follow-up: Changed 4 years ago by nbruin

You have to make sure that the weakref object lives to see the death of v (and ensure that you make such a weakref in the first place!)

sage: from sage.structure.coerce_dict import TripleDict
sage: import gc
sage: import weakref
sage: class Foo: pass
sage: 
sage: a = 1
sage: b = 2
sage: c = 3
sage: v = Foo()
sage: w = Foo()
sage: T = TripleDict(13)
sage: T[a,b,c] = weakref.KeyedRef(v,T.eraser,(id(a),id(b),id(c)))
sage: h=T[a,b,c]
sage: T[a,b,c] = w
sage: del v
sage: _ = gc.collect()
sage: len([x for x in gc.get_objects() if isinstance(x,Foo)])
1
sage: T[a,b,c] == w
KeyError: (1, 2, 3)

I admit, the weakref surviving outside the dict is not a terribly likely event, but from your code at #11521 it wasn't entirely clear to me that you can absolutely rule it out.

comment:8 in reply to: ↑ 7 ; follow-up: Changed 4 years ago by SimonKing

Replying to nbruin:

You have to make sure that the weakref object lives to see the death of v (and ensure that you make such a weakref in the first place!)

Apparently I had wrong memories about my code, then. I had in mind that the TripleDict has a weak reference with callback to the value.

sage: from sage.structure.coerce_dict import TripleDict
sage: import gc
sage: import weakref
sage: class Foo: pass
sage: 
sage: a = 1
sage: b = 2
sage: c = 3
sage: v = Foo()
sage: w = Foo()
sage: T = TripleDict(13)
sage: T[a,b,c] = weakref.KeyedRef(v,T.eraser,(id(a),id(b),id(c)))
sage: h=T[a,b,c]
sage: T[a,b,c] = w
sage: del v
sage: _ = gc.collect()
sage: len([x for x in gc.get_objects() if isinstance(x,Foo)])
1
sage: T[a,b,c] == w
KeyError: (1, 2, 3)

I admit, the weakref surviving outside the dict is not a terribly likely event, but from your code at #11521 it wasn't entirely clear to me that you can absolutely rule it out.

Hm. This particular TripleDict is only used by hom, and hom does not overwrite an existing entry. So, this particular case should be safe. However, it still seems reasonable to me to let the callback check that the to-be-deleted value has not changed.

comment:9 in reply to: ↑ 8 Changed 4 years ago by nbruin

Replying to SimonKing:

Hm. This particular TripleDict is only used by hom, and hom does not overwrite an existing entry. So, this particular case should be safe. However, it still seems reasonable to me to let the callback check that the to-be-deleted value has not changed.

I was fully expecting that to be true, but I could not come up with an explanation for the bug Jeroen reported at #12313, where the line in

 _cache[key] = KeyedRef(H, _cache.eraser, (id(X),id(Y),id(category)))

leads to a key error in TripleDict.set in the line:

r1,r2,r3 = <tuple>(self._refcache[h1,h2,h3])

The only thing I could think of for that line to fail is:

  • This KeyedRef object W is constructed
  • _cache.set(k1,k2,k3,W) is called with k1=id(X),k2=id(Y),k3=id(category). Note that the caller is holding strong references to (X,Y,category).
  • In order to end up in the line r1,r2,r3 = <tuple>(self._refcache[h1,h2,h3]), the key triple (k1,k2,k3) must be present in this TripleDict and hence in self._refcache
  • Yet, we get a KeyError. Calling self._refcache[h1,h2,h3] requires the construction of a new tuple (h1,h2,h3), which could trigger a GC and hence a callback that would remove entries. So the only explanation I see is that we're getting a callback with key (k1,k2,k3) due to a GC triggered by this tuple allocation.
  • However, this callback cannot result from any of X,Y,category getting collected, because someone is holding strong references to them. Hence such a callback must be coming from somewhere else.
  • The code pointed to (ironically the same code that triggers this TripleDict operation) is an example of code that installs such callbacks. So a possible scenario is that a statement _cache.set(k1,k2,k3,Wold) was executed before, that Wold is still hanging around in memory, that it finally gets removed due to this poorly timed garbage collection, and consequently that the key triple (k1,k2,k3) receives a callback, even though it is pointing to (or is about to be pointing to -- we don't know what happened in between) another, unrelated value.

In this scenario, values in this Hom dictionary do get changed.

We need to ensure that TripleDict cannot trigger GC at that point anyway (and that is what #13387 does as a side-effect), because all kinds of other bad things could happen as well. However, given that it needs such a complicated explanation, I would hope we can verify my assumptions, because otherwise my fix might not be sufficient.

Hence the formulation of the ticket: If you can prove that installing this callback is safe, I have no problem with it.

comment:10 Changed 4 years ago by nbruin

Ouch, I looked at that code and you're correct: That doesn't overwrite the value at all (it's really a cache). Incidentally, at that code:

    try:
        H = _cache[key]()
    except KeyError:
        H = None
    if H is not None:
        # Are domain or codomain breaking the unique parent condition?
        if H.domain() is X and H.codomain() is Y:
            return H

Since X,Y are part of key, which is looked up by identity, the "domain" and "codomain" test should really be superfluous. So perhaps we want

if H is not None:
   assert H.domain() is X and H.codomain() is Y
   return H

since something is going very wrong if this is not the case! (H would not be allowed to sit there in the dict).

Furthermore, H has strong references to X,Y,category in the form of H.domain(), H.codomain() and H.homset_category(), so if H is alive then so must the keys. Which means that the callback on H must happen (or be purged) before those on X,Y etc., or at least in the same GC. so their ids can't get reused until their callbacks have happened or have been discarded.
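That lifetime argument can be checked in miniature: as long as a stand-in for H is alive, its strong reference keeps the key object alive, so the key's id cannot be reused (class names here are illustrative only).

```python
import gc
import weakref

class Domain:
    pass

class Hom:
    def __init__(self, domain):
        self.domain = domain  # strong reference, like H.domain()

X = Domain()
H = Hom(X)
wX = weakref.ref(X)

del X            # X survives: H.domain still references it,
gc.collect()     # so id(X) cannot be reused while H is alive
print(wX() is None)  # False

del H            # now X becomes collectable as well
gc.collect()
print(wX() is None)  # True
```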

I think that only leaves the possibility that _cache got silently corrupted due to a GC at an inopportune time before this particular error. We cannot rule that out, but it's extremely unlikely: It requires a bucket to be rather full and buckets are typically not full at all (ideally only one entry).

On the other hand, since this is a global _cache, it does get to hold quite a few keys, so it's quite likely that it underwent a resize operation, where a corruption was much more likely to happen.

It's rather difficult to do an accurate postmortem with so little data.

Changed 4 years ago by nbruin

patch that got misplaced on #12313

comment:11 Changed 4 years ago by SimonKing

  • Authors set to Simon King
  • Status changed from new to needs_review

Thank you for moving the patch to the right location (I forgot about the existence of this ticket...).

Changed 4 years ago by nbruin

Documentation to explain the problem (and possible future problems)

comment:12 Changed 4 years ago by nbruin

"Reviewer" patch to provide some doc and pointers if this ever proves to be a problem. An example of the kind of situation where this could lead to quadratic memory use (relative to what is required):

R=[ZZ.quo(3^n) for n in [1..100]]
for i in range(100):
    for j in range(i,100):
        H = Hom(R[j],R[i])

This leaves 100*101/2 entries with (eventually) dead weakrefs in the dictionary.
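The shape of that leak can be mimicked in plain Python: weakrefs stored without callbacks are never removed, so dead entries accumulate quadratically (a toy model, not the Hom cache itself).

```python
import gc
import weakref

class Hom:
    pass

cache = {}
n = 100
for i in range(n):
    for j in range(i, n):
        H = Hom()
        # weakref without callback: the entry is never removed,
        # so a dead reference stays behind once H is collected
        cache[(i, j)] = weakref.ref(H)
del H
gc.collect()

dead = sum(1 for r in cache.values() if r() is None)
print(dead == n * (n + 1) // 2)  # True: all entries remain, all dead
```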

I think we should leave ample documentation of this leak in the code, including the strategy to fix it, should it become necessary.

(for posterity: The KeyedRef alternative is probably OK, because it seems that H can never outlive its domain and codomain X and Y, so I don't see how their id's could get reused before the callback on H has happened or got purged. I'm just very cautious with anything that has to do with GC and callback orders after the problems we encountered with it elsewhere. In any case, the incantation would have to be documented "DO NOT COPY THIS WITHOUT UNDERSTANDING THE CONSEQUENCES FULLY", because it's a bad pattern to use elsewhere, unless you prove a whole bunch of extra conditions.)

Simon, are you OK with the doc? If so we can put this to positive review (doctests passed for me)

comment:13 Changed 4 years ago by nbruin

  • Component changed from PLEASE CHANGE to memleak
  • Reviewers set to Nils Bruin
  • Type changed from PLEASE CHANGE to defect

comment:14 Changed 4 years ago by SimonKing

Since your patch changes code (by inserting assert statements) I better run the doc tests with #14159, #12313 and #13387, before changing it to positive review.

Note, however, that the error reported by the patchbot probably is unrelated:

sage -t  -force_lib devel/sage-14159/doc/en/constructions/calculus.rst
**********************************************************************
File "/mnt/storage2TB/patchbot/Sage/sage-5.8.beta0/devel/sage-14159/doc/en/constructions/calculus.rst", line 102:
    sage: maxima(g).powerseries('x',0)
Expected:
    16*f0*('sum((2^(2*i1-1)-1)*bern(2*i1)*k^(2*i1-1)*x^(2*i1-1)/(2*i1)!,i1,0,inf))^4
Got:
    Maxima crashed -- automatically restarting.
    16*f0*('sum((2^(2*i1-1)-1)*bern(2*i1)*k^(2*i1-1)*x^(2*i1-1)/(2*i1)!,i1,0,inf))^4
**********************************************************************
1 items had failures:
   1 of   7 in __main__.example_3
***Test Failed*** 1 failures.

comment:15 Changed 4 years ago by nbruin

A little data point for how this change might adversely affect memory and speed performance. Running

import resource
R=[ZZ.quo(3^n) for n in [1..1000]]
def test(R):
    for i in range(len(R)):
        print i
        for j in range(i,len(R)):
            H=Hom(R[j],R[i])

%time test(R)
resource.getrusage(resource.RUSAGE_SELF)

Reference:

resource.struct_rusage(ru_utime=50.020395, ru_stime=0.16497399999999998,
ru_maxrss=130108, ru_ixrss=0, ru_idrss=0, ru_isrss=0, ru_minflt=42660,
ru_majflt=1, ru_nswap=0, ru_inblock=600, ru_oublock=240, ru_msgsnd=0,
ru_msgrcv=0, ru_nsignals=0, ru_nvcsw=8669, ru_nivcsw=7514)

With this patch:

resource.struct_rusage(ru_utime=54.220757, ru_stime=0.35794499999999996,
ru_maxrss=703824, ru_ixrss=0, ru_idrss=0, ru_isrss=0, ru_minflt=187662,
ru_majflt=1, ru_nswap=0, ru_inblock=584, ru_oublock=240, ru_msgsnd=0,
ru_msgrcv=0, ru_nsignals=0, ru_nvcsw=12759, ru_nivcsw=7940)

This of course is an example that tries to expose exactly the flaw. We're finding:

  • about 10% runtime penalty. That seems to be the garbage collections that are happening.
  • a lot more memory usage (without the patch memory usage is flat. With the patch you can run out of memory)

I'm not sure how serious this is. It's making me a little hesitant about whether we should be "fixing" this at all, though!

comment:16 Changed 4 years ago by SimonKing

Idea: We could add a callback---but a new one. Namely a callback, that will only remove the entry from the TripleDict if the value is the weak reference for which this callback is called.

Changed 4 years ago by SimonKing

A safe callback for weak values of a TripleDict

comment:17 Changed 4 years ago by SimonKing

  • Description modified (diff)

I did not run the full tests yet, but I think the new patch can be reviewed.

I introduce a variant of TripleDictEraser dedicated for use on the values of a TripleDict (if we use a callback for the values of MonoDict then we need to do the same). It will only erase an item if the value is the originally stored weak reference.

The patch contains tests demonstrating that TripleDictEraser is unsafe but TripleDictEraserOnValue is safe on values, and it contains a test demonstrating that homsets can still be garbage collected.

Apply trac_14159_safer_callback.patch

comment:18 Changed 4 years ago by SimonKing

  • Dependencies set to #13387

PS: I worked on top of #13387, so, that's a dependency.

Apply trac_14159_safer_callback.patch

comment:19 follow-up: Changed 4 years ago by nbruin

tests OK for me. One tiny nitpick:

  • You're claiming that this fixes an extremely rare race condition with respect to GC. I'm actually not sure that it CAN occur for our particular use (because our value has a strong reference to a key component). I think your example on TripleDictEraserOnValue illustrates why we should prefer checking this anyway. I hope we don't have to be this paranoid in the future.

There's another one, that the _cache[X,Y,category]=KeyedRef(...) statement is very fragile: If you make a mistake in typing the (id(X),id(Y),id(category)) you'll have a very hard to trace (and possibly disastrous!) bug in your program. One solution would be to encapsulate it in TripleDict, but the instantiation of the TripleDictEraserOnValue would complicate the logic in TripleDict, so let's just keep it this way.

comment:20 Changed 4 years ago by SimonKing

  • Status changed from needs_review to needs_work
  • Work issues set to Make the coverage script happy

The coverage script rightfully complains that I didn't add a test to the __call__ method of the new TripleDictEraserOnValue.

Do we need to worry about startup time? How to interpret the data?

comment:21 Changed 4 years ago by SimonKing

The startup_modules script complains that sage.structure.gc was added. But that has happened in #13387! So, why is the patchbot complaining here, too?

comment:22 in reply to: ↑ 19 ; follow-up: Changed 4 years ago by SimonKing

Replying to nbruin:

tests OK for me. One tiny nitpick:

  • You're claiming that this fixes an extremely rare race condition with respect to GC. I'm actually not sure that it CAN occur for our particular use (because our value has a strong reference to a key component). I think your example on TripleDictEraserOnValue illustrates why we should prefer checking this anyway. I hope we don't have to be this paranoid in the future.

I thought you diagnosed that this was the cause of one problem we saw? Such as:

  1. Some parent X, Y give rise to a homset H, in a category C.
  2. H and X get deleted, but the callbacks are not performed yet. A new parent X2 is created, and by mere coincidence we have id(X)==id(X2).
  3. The homset H2 with respect to X2, Y and C is created. It thus overrides H in the cache.
  4. Finally, the callback of H is performed, erasing H2 from the cache. Boom

This is what I meant by a "rare race condition".

Do you prefer that I erase this statement?

Thinking of it: Suppose we are now considering the callback of X, which, in the above scenario, has not been called prior to the creation of X2. Would it not delete the new item, too? So, shouldn't we add a test in TripleDictEraser.__call__(self, r), so that an item is deleted only if one of the references from the keys of this item is identical with r?

There's another one, that the _cache[X,Y,category]=KeyedRef(...) statement is very fragile: If you make a mistake in typing the (id(X),id(Y),id(category)) you'll have a very hard to trace (and possibly disastrous!) bug in your program. One solution would be to encapsulate it in TripleDict, but the instantiation of the TripleDictEraserOnValue would complicate the logic in TripleDict,

I wouldn't like to put this into TripleDict. After all, it is impossible to prevent all thinkable typing mistakes.

comment:23 Changed 4 years ago by SimonKing

I am just thinking: Perhaps one should enrich the key of the KeyedRef, so that it states where we are supposed to find the reference that caused the callback. I am thinking of an additional int, that is 1, 3 or 5 if the reference is in the first, second or third part of the key (1, 3, 5 give the relative position in the bucket), and 6 if the reference belongs to the value of the item.

In that way, we would avoid the race condition I described in the previous comment, *and* we would not need to create a separate class TripleDictEraserOnValue.
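A sketch of such a unified positional callback, in a simplified model (flat slot indices 0-3 instead of the in-bucket offsets 1, 3, 5, 6; all names are assumptions, not the real coerce_dict layout):

```python
import gc
import weakref

# Simplified model: each entry is a list [r1, r2, r3, rvalue], and the
# extra key component pos says which slot the callback should check.
store = {}

def eraser(ref):
    k1, k2, k3, pos = ref.key
    entry = store.get((k1, k2, k3))
    # Erase only if this exact weakref still sits in its slot.
    if entry is not None and entry[pos] is ref:
        del store[(k1, k2, k3)]

class Obj:
    pass

a, b, c, v, w = Obj(), Obj(), Obj(), Obj(), Obj()
ids = (id(a), id(b), id(c))
entry = [weakref.KeyedRef(x, eraser, ids + (pos,))
         for pos, x in enumerate((a, b, c, v))]
store[ids] = entry

stale = entry[3]                                    # old value-ref kept alive
entry[3] = weakref.KeyedRef(w, eraser, ids + (3,))  # value overwritten
del v                                               # stale callback fires
gc.collect()
print(ids in store)  # True: stale no longer matches slot 3

del a                                               # a key component dies
gc.collect()
print(ids in store)  # False: its ref still matched slot 0
```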

comment:24 Changed 4 years ago by SimonKing

Additional idea: We could create TripleDict with an optional parameter weak_values. If this parameter is True, then the set/get methods would automatically use a weak reference on the value, with the correct key for the callback. Hence, we do not need to exploit implementation details when we apply a triple dict with weak values in sage.categories.homset

Does that sound like a good idea?

comment:25 Changed 4 years ago by SimonKing

What I suggest is the following:

        sage: from sage.structure.coerce_dict import MonoDict
        sage: import gc
        sage: M = MonoDict(13)
        sage: MW = MonoDict(13, weak_values=True)
        sage: class Foo: pass
        sage: a = Foo()
        sage: b = Foo()
        sage: k = 1
        sage: M[k] = a
        sage: MW[k] = b
        sage: M[k] is a
        True
        sage: MW[k] is b
        True
        sage: k in M
        True
        sage: k in MW
        True
        sage: import gc
        sage: del a,b
        sage: _ = gc.collect()
        sage: k in M
        True
        sage: k in MW
        False
        sage: len(MW)
        0
        sage: len(M)
        1
        sage: MW[k] = int(5)
        sage: MW[k]
        5

Hence, we have the option to use weak values, but even in that case we would accept non-weakrefable values, too.

And, similarly, for TripleDict, with using weak_values for the homset cache.
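That fallback behaviour (weak where possible, strong for non-weakrefable values such as ints) can be sketched as follows; the class name and layout are assumptions for illustration, not the actual coerce_dict implementation:

```python
import gc
import weakref

class WeakValueMonoDict:
    """Toy model: weak values where possible, strong values otherwise."""

    def __init__(self):
        self._data = {}

    def __setitem__(self, key, value):
        def erase(ref, key=key, selfref=weakref.ref(self)):
            d = selfref()
            # Remove the entry only if it still holds this exact ref.
            if d is not None and d._data.get(key) is ref:
                del d._data[key]
        try:
            self._data[key] = weakref.ref(value, erase)
        except TypeError:            # e.g. int: not weakref-able
            self._data[key] = value  # fall back to a strong reference

    def __getitem__(self, key):
        stored = self._data[key]
        if isinstance(stored, weakref.ref):
            value = stored()
            if value is None:
                raise KeyError(key)
            return value
        return stored

    def __contains__(self, key):
        try:
            self[key]
            return True
        except KeyError:
            return False

class Foo:
    pass

MW = WeakValueMonoDict()
b = Foo()
MW[1] = b
print(MW[1] is b)  # True
del b
gc.collect()
print(1 in MW)     # False: the entry was removed by the callback
MW[1] = 5          # ints are accepted as strong values
print(MW[1])       # 5
```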

comment:26 in reply to: ↑ 22 ; follow-up: Changed 4 years ago by nbruin

Replying to SimonKing:

(just saw your new post. Definitely an option. If you see merit I'm fine with that. Below response I wrote to your earlier message)

I thought you diagnosed that this was the cause of one problem we saw?

No, what I'm pretty sure we saw at some point was GC happening during dictionary value setting. While looking at the code I noticed this callback without proof that it's safe.

  1. Some parent X, Y give rise to a homset H, in a category C.
  2. H and X get deleted, but the callbacks are not performed yet. A new parent X2 is created, and by mere coincidence we have id(X)==id(X2).
  3. The homset H2 with respect to X2, Y and C is created. It thus overrides H in the cache.
  4. Finally, the callback of H is performed, erasing H2 from the cache. Boom

Yes, that's exactly the scenario I envisaged. I don't have a proof that this is theoretically possible, though.

The weakref to X is referenced by this dictionary that is still in use, so it's not garbage. I think that means the callback to X is guaranteed to happen before deallocation of X (and hence before the creation of X2). Furthermore X can only be found to be garbage if H is found to be garbage (because of the strong domain reference), so the callback on H has to happen in the same GC as the deallocation of X. So as long as we're not creating parents during callbacks, I think that also means X2 cannot be created in the same memory location as X while the callback on H hasn't happened yet.

So we have a choice:

  • make a very involved argument, carefully reasoning about the specifics of python's GC system (for which we don't actually have a formal axiom system)
  • be on the safe side and just put in a cheap test.

Do you prefer that I erase this statement?

perhaps "race condition we couldn't quite rule out"

Thinking of it: Suppose we are now considering the callback of X, which, in the above scenario, has not been called prior to the creation of X2. Would it not delete the new item, too? So, shouldn't we add a test in TripleDictEraser.__call__(self, r), so that an item is deleted only if one of the references from the keys of this item is identical with r?

No, in order to set the value, the caller must be holding strong references to the keys. As far as I know, weakref callback and deallocation only get separated inside GC. So as long as we're not calling set in callbacks, I think we're prevented from it (note that for H we needed, in addition, that H holds a reference to X AND that key triples do not get reassigned.)

I wouldn't like to put this into TripleDict. After all, it is impossible to prevent all thinkable typing mistakes.

Well, this is one where you have to type the same data twice. But I agree: We have only one use case. Should we get more I think it's worth reconsidering.

Additional idea: We could create TripleDict with an optional parameter weak_values. If this parameter is True, then the set/get methods would automatically use a weak reference on the value, with the correct key for the callback. Hence, we do not need to exploit implementation details when we apply a triple dict with weak values in sage.categories.homset

Well, that's conceptually a different thing. Whether we want to implement it in the same structure (advantage: code reuse; disadvantage: increased complexity, possibly efficiency loss) is an engineering call. Again, for one use case I think it's not worth it. Should we find the need more often for such "fully weak associations" (we found the coercion system has tables that are a bit like this) we can reconsider.

At the moment I'm getting a feeling we're entering the realm of overengineering.

comment:27 in reply to: ↑ 26 Changed 4 years ago by SimonKing

  • Description modified (diff)
  • Status changed from needs_work to needs_review
  • Work issues Make the coverage script happy deleted

Replying to nbruin:

At the moment I'm getting a feeling we're entering the realm of overengineering.

That's quite possible. But the changes were rather mild, and so I am submitting my patch anyway.

Do you think it is overengineered? We should of course test if the performance drops too much.

And we need to run the test suite (I didn't, yet).

Apply trac_14159_weak_value_triple_dict.patch

comment:28 follow-up: Changed 4 years ago by SimonKing

I was just wondering whether a MonoDict with weak values really makes sense. After all, it means: If you do M[a]=b, then this item will be deleted, unless you keep a strong reference to both a and b. But then you probably don't need a dictionary.

Anyway, I think the implementation for the TripleDict case makes sense.

comment:29 in reply to: ↑ 28 Changed 4 years ago by nbruin

Replying to SimonKing:

I was just wondering whether a MonoDict with weak values really makes sense. After all, it means: If you do M[a]=b, then this item will be deleted, unless you keep a strong reference to both a and b.

Not so much because of the requirement for strong references (with weak caching, the idea is that unknown entities are still holding references), but more because in that case you might as well spell it

a.M=weakref(b)

Incidentally, see #14058 for another use case where weakrefs (without callbacks there!) get stored in a TripleDict. So that seems to be a strong indication that supporting weak values is indeed a reasonable thing for it. I would propose to amend the solution there to use the facilities here and make #14058 depend on #14159. Provided the code here all checks out to work as expected and perform well, of course.

Last edited 4 years ago by nbruin (previous) (diff)

comment:30 follow-up: Changed 4 years ago by SimonKing

I still have not the faintest idea how to interpret the results of the startup_time plugin. Can someone please give me a pointer?

comment:31 in reply to: ↑ 30 Changed 4 years ago by nbruin

Replying to SimonKing:

I still have not the faintest idea how to interpret the results of the startup_time plugin. Can someone please give me a pointer?

While it's probably measuring something significant, I don't think the process properly controls for other factors. I've seen significant (consistent) timing differences between different compiles of exactly the same sage (i.e., after hg qpop; sage -b; hg qpush; sage -b) on the same computer. The only thing I can think of for something like that is getting lucky/unlucky how certain libraries get laid out in memory. The differences it's reporting are so small that I think one cannot rule out such factors (the reference build being "lucky" and the patch build being "unlucky").

comment:32 Changed 4 years ago by SimonKing

I could get the meaning if it would say: "With 98% confidence, the startup time of sage-5.8.beta1+#14159 is at least 0.1% bigger than the startup time of sage-5.8.beta1." But the plugin does not seem to make such statements.

Instead, looking at the diff, one sees

--- 5.8.beta1 

+++ 5.8.beta1 + #14159 
...
-Main:   0.70703 sec (30 samples, std_dev=0.0174)
-Ticket: 0.71392 sec (30 samples, std_dev=0.0296)
+Main:   0.66585 sec (30 samples, std_dev=0.00636)
+Ticket: 0.67404 sec (30 samples, std_dev=0.0354)

-Average increase of 0.0069 secs or 0.97%.
+Average increase of 0.0082 secs or 1.2%.

-With 52% confidence, startup time increased by at least 0.5%
-With 72% confidence, startup time increased by at least 0.25%
-With 83% confidence, startup time increased by at least 0.1%
+With 36% confidence, startup time increased by at least 0.5%
+With 86% confidence, startup time increased by at least 0.25%
+With 98% confidence, startup time increased by at least 0.1%

So, in vanilla sage, one has 83% confidence of getting an increase of startup time by at least 0.1%---but increase with respect to what? This is what I don't understand.

comment:33 Changed 4 years ago by SimonKing

I have run sage -startuptime both with and without the patch. The strange result:

  • Without the patch, the import of _ssl is listed with 3.397ms. But with the patch, _ssl is not listed.
  • With the patch, the import of sage.plot.graphics is listed with 21.036ms. But without the patch, sage.plot.graphics is not listed.

I can neither believe that sage.plot.graphics is not imported during startup in vanilla sage, nor can I believe that sage.plot.graphics is imported just because of changing the intestines of coercion dictionaries.

Bug in the startuptime script?

comment:34 Changed 4 years ago by SimonKing

PS: The time spent on importing sage.plot.graphics is nearly equal to the difference of the total startup times that I get with and without the patch. Hence, it could explain why the plugin complains.

comment:35 follow-up: Changed 4 years ago by vbraun

For the record, the startuptime patchbot plugin compares ticket vs. no ticket. So the 0.1% increase is relative to not applying the patch on the ticket.

comment:36 in reply to: ↑ 35 Changed 4 years ago by SimonKing

Replying to vbraun:

For the record, the startuptime patchbot plugin compares ticket vs. no ticket. So the 0.1% increase is relative to not applying the patch on the ticket.

Sure, that should be the case. But, as you see, the plugin provides a diff of "no ticket" versus "ticket". And according to the diff, "With 83% confidence, startup time increased by at least 0.1%" already without the ticket. So, is the point that 83% is not significant?

Hence, does the plugin state the following?

"Without the patch, there is no statistical evidence for a 0.1% increase of startup time. With the patch, there is statistical evidence for a 0.1% increase."?

comment:37 Changed 4 years ago by jpflori

  • Cc jpflori added

comment:38 follow-up: Changed 4 years ago by SimonKing

Note that in the most recent run, the startup time plugin actually did not complain!

Nevertheless, I think we should care about speed. I see two approaches to improving the startup time; the second of them would also improve general performance.

  1. I found that a dozen homsets are created and deleted during startup. One should find where they are created, and see whether their creation is really needed or whether one can avoid that they get deleted.
  2. MonoDict and TripleDict have fast cdef set and get methods, but they are not always used. At least in Cython modules, we should try to find all locations that could benefit from replacing T[a,b,c] by T.get(a,b,c). And even for Python modules, I think it would make sense to turn the cdef get and set methods into cpdef methods: I guess it would still be faster than T[a,b,c], because one avoids the test
         try:
             k1, k2, k3 = k
         except (TypeError,ValueError):
             raise KeyError, k
    

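As a reminder of where that test comes from: in Python, the subscript T[a,b,c] packs the three indices into a single tuple and passes it to __getitem__, which is why the unpacking above is needed at all. A minimal illustration in plain Python, independent of Sage:

```python
class Probe:
    def __getitem__(self, key):
        return key  # hand back whatever the subscription protocol passed in

p = Probe()
assert p[1, 2, 3] == (1, 2, 3)  # three indices arrive packed as one tuple
assert p["a"] == "a"            # a single index arrives as-is
```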
comment:39 in reply to: ↑ 38 ; follow-up: Changed 4 years ago by nbruin

Replying to SimonKing:

  1. I found that a dozen homsets are created and deleted during startup. One should find where they are created, and see whether their creation is really needed or whether one can avoid that they get deleted.

"Premature optimization is the root of all evil" [Knuth]. Does their creation and deletion add considerably to the startup time? I would expect that this gets dwarfed by all the work involved in constructing class instances and resizing the globals dictionary.

  1. MonoDict and TripleDict have fast cdef set and get methods, but they are not always used. At least in Cython modules, we should try to find all locations that could benefit from replacing T[a,b,c] by T.get(a,b,c). And even for Python modules, I think it would make sense to turn the cdef get and set methods into cpdef methods: I guess it would still be faster than T[a,b,c], because one avoids the test

Probably not: the __setitem__ and __getitem__ lookups in Python likely happen faster than normal method lookups, because the special methods are stored in slots, whereas normal method lookups have to resort to dictionary lookups. That might well cancel out the packing/unpacking that needs to happen for the key (which isn't necessary for MonoDict at all, by the way).
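The point about slots can be seen in plain Python: subscript syntax looks up __getitem__ on the type (via its slot), never in the instance dictionary, so it skips the per-instance attribute lookup that an ordinary method call has to perform. A small illustration:

```python
class Box:
    def __getitem__(self, key):
        return "type"

b = Box()
b.__getitem__ = lambda key: "instance"  # shadow the method on the instance

assert b[0] == "type"                  # b[0] goes straight through the type slot
assert b.__getitem__(0) == "instance"  # explicit lookup sees the instance attribute
```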

I think the most productive way to do optimizations is:

  • Locate scenarios where MonoDict and TripleDict access takes a significant share of the time.
  • Find what is creating the worst overhead.

I'm not sure that the Sage startup routine is the most representative example of that behaviour. If the goal is improving Sage startup time, the right way is probably to import only a "stub" notebook that offers the limited functionality needed by other modules (basically capable of answering "we're not in the notebook"), and make the stub so that starting the notebook, or any other non-trivial access, results in loading the real notebook and replacing the stub with it.
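The "stub" idea can be sketched in plain Python with a proxy that defers the real import until the first attribute access. This is an illustration only; the names are made up and this is not Sage's actual notebook machinery.

```python
import importlib

class LazyModule:
    """Stand-in that loads the real module only on first attribute access."""
    def __init__(self, name):
        self._name = name
        self._module = None

    def __getattr__(self, attr):
        # Called only when normal lookup fails, i.e. for the real module's names.
        if self._module is None:
            self._module = importlib.import_module(self._name)
        return getattr(self._module, attr)

# At startup one would bind something like `notebook = LazyModule(...)`;
# here we demonstrate with a standard-library module instead:
lazy_json = LazyModule("json")
assert lazy_json._module is None              # nothing imported yet
assert lazy_json.dumps([1, 2]) == "[1, 2]"    # first use triggers the import
```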

Last edited 4 years ago by nbruin (previous) (diff)

comment:40 in reply to: ↑ 39 ; follow-up: Changed 4 years ago by SimonKing

Replying to nbruin:

"Premature optimization is the root of all evil" [Knuth]. Does their creation and deletion add considerably to the startup time?

Probably not.

Probably not: The __setitem__ and __getitem__ lookups in python likely happen faster than normal method lookups in python, because the special methods are stored in slots, whereas normal method lookups have to resort to dictionary lookups.

OK. That's calling overhead versus avoiding an additional test. That would be easy to test.

But at least in Cython modules, when one uses cdef'd variables of type MonoDict or TripleDict, using a cdef or cpdef set/get method directly should be faster than calling __set/getitem__, shouldn't it?

comment:41 in reply to: ↑ 40 ; follow-up: Changed 4 years ago by nbruin

Replying to SimonKing:

But at least in Cython modules, when one uses cdef'd variables of type MonoDict or TripleDict, using a cdef or cpdef set/get method directly should be faster than calling __set/getitem__, shouldn't it?

Only because it's using the cdef part then, I think.

comment:42 in reply to: ↑ 41 Changed 4 years ago by SimonKing

Replying to nbruin:

Replying to SimonKing:

But at least in Cython modules, when one uses cdef'd variables of type MonoDict or TripleDict, using a cdef or cpdef set/get method directly should be faster than calling __set/getitem__, shouldn't it?

Only because it's using the cdef part then, I think.

"Only" or not: if it is faster, then it should be used.

And I don't think it is "only" the cdef part. It is also that set/get expects a fixed number of arguments, while with __set/getitem__ the given arguments are packed into a tuple, which needs to be unpacked and whose length must be checked. So, set/get could be faster anyway.

I am now testing---as soon as half of the Sage library is rebuilt. I hate patches touching coerce_dict.pxd!!!

comment:43 Changed 4 years ago by nbruin

%cython
cdef class dictest(object):
    cpdef get(self,int k1,int k2,int k3):
        if k1==1 and k2==1 and k3==1:
            return True
        else:
            return False
    def __getitem__(self,k):
        cdef int k1,k2,k3
        try:
            k1,k2,k3=k
        except (TypeError, ValueError):
            raise KeyError
        return self.get(k1,k2,k3)

The regular method call protocol is pretty expensive:

sage: C=dictest()
sage: timeit("C.get(1,1,1)")
625 loops, best of 3: 795 ns per loop
sage: timeit("C[1,1,1]")
625 loops, best of 3: 717 ns per loop

(but of course, from cython, one should absolutely use the cdef form)

comment:44 Changed 4 years ago by SimonKing

Some other timings, making set/get cpdef:

sage: from sage.structure.coerce_dict import TripleDict, MonoDict
sage: R = [Integers(n) for n in range(1,10^4)]
sage: T = TripleDict(53)
sage: M = MonoDict(53)
sage: def test_py_getitem(T,R):
    for x in R:
        T[R,R,R] = 1
    for x in R:
        _ = T[R,R,R]
....:         
sage: def test_py_get(T,R):
    for x in R:         
        T.set(R,R,R, 1)
    for x in R:
        _ = T.get(R,R,R)
....:         
sage: cython("""
from sage.structure.coerce_dict cimport TripleDict
def test_cy_getitem(TripleDict T,R):
    for x in R:
        T[R,R,R] = 1
    for x in R:
        _ = T[R,R,R]
""")
....: 
sage: cython("""
from sage.structure.coerce_dict cimport TripleDict
def test_cy_get(TripleDict T,R):
    for x in R:
        T.set(R,R,R, 1)
    for x in R:
        _ = T.get(R,R,R)
""")
....: 
sage: %time test_py_getitem(T,R)
CPU times: user 0.03 s, sys: 0.01 s, total: 0.04 s
Wall time: 0.04 s
sage: %timeit test_py_getitem(T,R)
10 loops, best of 3: 23.9 ms per loop
sage: %timeit test_py_getitem(T,R)
10 loops, best of 3: 23.5 ms per loop
sage: %timeit test_py_getitem(T,R)
10 loops, best of 3: 23.6 ms per loop
sage: %timeit test_py_get(T,R)
10 loops, best of 3: 24.6 ms per loop
sage: %timeit test_py_get(T,R)
10 loops, best of 3: 24.3 ms per loop
sage: %timeit test_py_get(T,R)
10 loops, best of 3: 24.3 ms per loop
sage: %timeit test_cy_getitem(T,R)
10 loops, best of 3: 19.9 ms per loop
sage: %timeit test_cy_getitem(T,R)
100 loops, best of 3: 19.9 ms per loop
sage: %timeit test_cy_getitem(T,R)
100 loops, best of 3: 19.6 ms per loop
sage: %timeit test_cy_get(T,R)
100 loops, best of 3: 17.9 ms per loop
sage: %timeit test_cy_get(T,R)
100 loops, best of 3: 18 ms per loop
sage: %timeit test_cy_get(T,R)
100 loops, best of 3: 18.3 ms per loop

Hence, I can confirm that in Python one is better off relying on the magic methods, while in Cython it is better (though not much better!) to use the cpdef methods.

But in fact, I found some spots in parent_old.pyx that *are* used and do call the magic methods.

comment:45 Changed 4 years ago by SimonKing

  • Description modified (diff)

I have added a new patch that will hopefully make Sage a little faster, by using the cdef set/get methods everywhere in Cython. The only spot still using the magic methods is sage/categories/homset, which currently is Python. Perhaps it should be cythonised? Since homsets are in the background of every conversion and coercion, this could be of some benefit.

Apply trac_14159_weak_value_triple_dict.patch trac_14159_use_cdef_get.patch

comment:46 Changed 4 years ago by SimonKing

PS: In any case, cythoning sage.categories.homset should be on a different ticket.

comment:47 follow-up: Changed 4 years ago by SimonKing

Note that different patchbots show different results for the plugins: one patchbot, running sage-5.8.beta2, does not complain at all. Another one, running sage-5.8.beta1, complains about both startup_modules and startup_time.

comment:48 in reply to: ↑ 47 Changed 4 years ago by nbruin

Replying to SimonKing:

Note that different patchbots show different results for the plugins: One patchbot, running sage-5.8.beta2, does not complain at all.

For the startup modules plugin, I think that's because 5.8.b2 has #13387 merged (for now at least).

comment:49 Changed 4 years ago by SimonKing

  • Dependencies changed from #13387 to #13387, #14254

#14254 is a blocker that will likely be merged before this ticket is positively reviewed. Hence, it will be a dependency.

Question: Shall we remove the new function signed_id introduced in #14254? It is not needed with the approach that we take here. Alternatively, we could use it. How much is the overhead of calling a cpdef inline function that does <Py_ssize_t><void *>(x), compared with doing it directly?

comment:50 Changed 4 years ago by SimonKing

Apparently there is no significant difference:

sage: cython("""
cdef inline Py_ssize_t signed_id(object X):
    return <Py_ssize_t><void *>(X)
from sage.all import srange
def testcall():
    cdef object i
    for i in srange(10^6):
        a = signed_id(i)
def testdirect():
    cdef object i
    for i in srange(10^6):
        a = <Py_ssize_t><void *>(i)
""")
sage: %timeit testcall()
10000 loops, best of 3: 87.2 us per loop
sage: %timeit testdirect()
10000 loops, best of 3: 87 us per loop
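For illustration, the effect of that cast can be mimicked from plain Python with ctypes. This is only a sketch of what signed_id computes (the object's address reinterpreted as a signed machine word), not how #14254 implements it.

```python
import ctypes

def signed_id(obj):
    # Reinterpret the object's address (CPython's id()) as a signed
    # machine-word integer, like the Cython cast <Py_ssize_t><void *>obj.
    return ctypes.c_ssize_t(id(obj)).value

x = object()
bits = 8 * ctypes.sizeof(ctypes.c_ssize_t)
# Same pointer value, just a different signedness convention:
assert signed_id(x) % (1 << bits) == id(x)
```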

comment:51 follow-ups: Changed 4 years ago by nbruin

Replying to SimonKing:

Question: Shall we remove the new function signed_id introduced in #14254? It is not needed with the approach that we take here. Alternatively, we could use it. How much is the overhead of calling a cpdef inline function that does <Py_ssize_t><void *>(x), compared with doing it directly?

It's inline, so using it in a place where the cdef part is used should be 0 overhead. The inline should get expanded before the optimizer even looks at the code. The whole point of inline is to get the performance of #define macros without the headaches.

If the function is not really in the way, it may be worth keeping around. I have found it immensely useful for debugging Cython code when as many features as possible are also exposed in Python. In particular, the cdef-only data attributes can be a real pain.

comment:52 in reply to: ↑ 51 Changed 4 years ago by SimonKing

Replying to nbruin:

If the function is not really in the way, it may be worth keeping around. I have found it immensely useful for debugging Cython code when as many features as possible are also exposed in Python. In particular, the cdef-only data attributes can be a real pain.

+5

So, let's keep it (it is cpdef anyway, hence it can be used from Python as well), and apparently it is fast enough (in the example above I had "cdef inline", but "cpdef inline" gives the same timings).

comment:53 Changed 4 years ago by SimonKing

The first patch needed to be rebased, the second was still fine.

Apply trac_14159_weak_value_triple_dict.patch trac_14159_use_cdef_get.patch

Changed 4 years ago by SimonKing

Optional weak values for mono- and tripledict

comment:54 Changed 4 years ago by SimonKing

I needed another update of the patch: #14254 has introduced signed_id, which is imported in homset.py, but with the approach taken here we do not need it. Hence, I removed the import statement in the current version of the patch.

Apply trac_14159_weak_value_triple_dict.patch trac_14159_use_cdef_get.patch

comment:55 Changed 4 years ago by nbruin

  • Status changed from needs_review to positive_review

Good stuff! Much cleaner. The patchbot is happy, and effectively this patch consolidates some use of TripleDict that was already in ad-hoc use, so there is not too much chance of unexpected failures. The solution here does away with the "manual" installation of callbacks that was dangerous before, and it includes a sanity check that the object on which the callback fires is indeed still referring to the (now dead) weakref that caused the callback.

comment:56 in reply to: ↑ 51 Changed 4 years ago by jdemeyer

Replying to nbruin:

The inline should get expanded before the optimizer even looks at the code.

That's not entirely true, at least not with GCC. The inline keyword is just one of the things that GCC considers when deciding whether or not to inline a function. Very simple functions like signed_id() might very well always be inlined at higher optimization levels, while complicated functions might never be inlined (even if marked inline).

comment:57 Changed 4 years ago by jdemeyer

  • Milestone changed from sage-5.9 to sage-5.10

comment:58 Changed 4 years ago by jdemeyer

trac_14159_use_cdef_get.patch needs a proper commit message

Changed 4 years ago by SimonKing

Use cdef get/set methods of MonoDict and TripleDict in cython modules

comment:59 Changed 4 years ago by SimonKing

Message added.

comment:60 Changed 4 years ago by jdemeyer

  • Merged in set to sage-5.10.beta0
  • Resolution set to fixed
  • Status changed from positive_review to closed
Note: See TracTickets for help on using tickets.