Opened 5 years ago
Closed 5 years ago
#14159 closed defect (fixed)
Don't install callbacks on values of TripleDict, MonoDict
Reported by: | nbruin | Owned by: | tbd |
---|---|---|---|
Priority: | major | Milestone: | sage-5.10 |
Component: | memleak | Keywords: | |
Cc: | SimonKing, jpflori | Merged in: | sage-5.10.beta0 |
Authors: | Simon King | Reviewers: | Nils Bruin |
Report Upstream: | N/A | Work issues: | |
Branch: | Commit: | ||
Dependencies: | #13387, #14254 | Stopgaps: |
Description (last modified by )
In #11521 trac_11521_callback.patch a callback was installed on a weakref *value* of a TripleDict:
_cache[key] = KeyedRef(H, _cache.eraser, (id(X),id(Y),id(category)))
That's not safe: If the value under that key gets changed while the value remains in memory, the callback will be executed and remove an entry that now has an unrelated value!
So: Either prove that the value under this key will not change for the lifetime of H (keep in mind that cyclic references can extend the lifetime of an otherwise unreachable object essentially indefinitely, so the proof needs to include that all key components survive H, otherwise those ids could get reused) or provide a more selective callback (for instance, ensure that the value is still as expected before deleting).
Note: The patch trac_14159_safer_callback.patch follows the second approach, so that a memory leak remains fixed.
Another point is that the API of _cache.eraser
isn't really published, so this behaviour is probably better encapsulated in a method on the dict itself.
See #12313 comment 317 for a situation that likely implicates these callbacks (someone has to hold strong references to these keys in order to set the dict, so the absence of the keys suggests a spurious execution of this kind of callback)
Apply
Attachments (5)
Change History (65)
comment:1 follow-up: ↓ 2 Changed 5 years ago by
comment:2 in reply to: ↑ 1 ; follow-up: ↓ 3 Changed 5 years ago by
Replying to SimonKing:
- If an existing entry is overwritten by
__setitem__
then its callback will be removed.
I guess you could reach into the KeyedRef
object and invalidate the key to prevent action on callback. Doing so would complicate all our dict setting and deleting, so I think that is not an attractive option.
Just deleting the KeyedRef
object won't guarantee it will go away: someone else may be holding a reference to the KeyedRef
object as well.
So probably there needs to be an additional option on ...DictEraser
objects to check that the value is as expected (or write yet another callback class, but that sounds like a waste).
And then ...Dict
would grow an extra set_weak_value
to install the appropriate KeyedRef? on the value, because letting other code instantiate eraser objects is scary.
It feels like horrible feature creep, but I guess you have a good use case (can you come up with an alternative solution?)
comment:3 in reply to: ↑ 2 ; follow-up: ↓ 4 Changed 5 years ago by
Replying to nbruin:
Replying to SimonKing:
- If an existing entry is overwritten by
__setitem__
then its callback will be removed.I guess you could reach into the
KeyedRef
object and invalidate the key to prevent action on callback. Doing so would complicate all our dict setting and deleting, so I think that is not an attractive option.
So, what about the first option? It would make sure that an entry will only be deleted if the value is identical with the object whose callback is being called.
comment:4 in reply to: ↑ 3 ; follow-up: ↓ 5 Changed 5 years ago by
Replying to SimonKing:
So, what about the first option? It would make sure that an entry will only be deleted if the value is identical with the object whose callback is being called.
Yes, sorry for not being clear about it. That's what the second part of the reply expands on.
comment:5 in reply to: ↑ 4 Changed 5 years ago by
Replying to nbruin:
Replying to SimonKing:
So, what about the first option? It would make sure that an entry will only be deleted if the value is identical with the object whose callback is being called.
Yes, sorry for not being clear about it. That's what the second part of the reply expands on.
Sorry, I don't see the relation with suggestion 1.
So probably there needs to be an additional option on ...DictEraser? objects to check that the value is as expected
Why? The callback would be checking that the value is identical with the object pointed to by the weak reference. This should be cheap enough to be done by default.
comment:6 Changed 5 years ago by
I thought that the following would expose the problem, but it doesn't:
sage: from sage.structure.coerce_dict import TripleDict sage: import gc sage: class Foo: pass sage: sage: a = 1 sage: b = 2 sage: c = 3 sage: v = Foo() sage: w = Foo() sage: T = TripleDict(13) sage: T[a,b,c] = v sage: T[a,b,c] = w sage: del v sage: _ = gc.collect() sage: len([x for x in gc.get_objects() if isinstance(x,Foo)]) 1
So, v got collected and thus the callback was executed. However:
sage: id(T[a,b,c]) 85762632 sage: id(w) 85762632
The entry that was previously holding v did not get deleted when v's callback was called.
So, are we really in trouble?
comment:7 follow-up: ↓ 8 Changed 5 years ago by
You have to make sure that the weakref object lives to see the death of v (and ensure that you make such a weakref in the first place!)
sage: from sage.structure.coerce_dict import TripleDict sage: import gc sage: import weakref sage: class Foo: pass sage: sage: a = 1 sage: b = 2 sage: c = 3 sage: v = Foo() sage: w = Foo() sage: T = TripleDict(13) sage: T[a,b,c] = weakref.KeyedRef(v,T.eraser,(id(a),id(b),id(c))) sage: h=T[a,b,c] sage: T[a,b,c] = w sage: del v sage: _ = gc.collect() sage: len([x for x in gc.get_objects() if isinstance(x,Foo)]) 1 sage: T[a,b,c] == w KeyError: (1, 2, 3)
I admit, the weakref surviving outside the dict is not a terribly likely event, but from your code at #11521 it wasn't entirely clear to me that you can absolutely rule it out.
comment:8 in reply to: ↑ 7 ; follow-up: ↓ 9 Changed 5 years ago by
Replying to nbruin:
You have to make sure that the weakref object lives to see the death of v (and ensure that you make such a weakref in the first place!)
Apparently I had wrong memories about my code, then. I had in mind that the TripleDict
has a weak reference with callback to the value.
sage: from sage.structure.coerce_dict import TripleDict sage: import gc sage: import weakref sage: class Foo: pass sage: sage: a = 1 sage: b = 2 sage: c = 3 sage: v = Foo() sage: w = Foo() sage: T = TripleDict(13) sage: T[a,b,c] = weakref.KeyedRef(v,T.eraser,(id(a),id(b),id(c))) sage: h=T[a,b,c] sage: T[a,b,c] = w sage: del v sage: _ = gc.collect() sage: len([x for x in gc.get_objects() if isinstance(x,Foo)]) 1 sage: T[a,b,c] == w KeyError: (1, 2, 3)I admit, the weakref surviving outside the dict is not a terribly likely event, but from your code at #11521 it wasn't entirely clear to me that you can absolutely rule it out.
Hm. This particular TripleDict
is only used by hom, and hom doe not override an existing entry. So, this particular case should be safe. However, it still seems reasonable to me to let the callback check whether the to-be-deleted value did not change.
comment:9 in reply to: ↑ 8 Changed 5 years ago by
Replying to SimonKing:
Hm. This particular
TripleDict
is only used by hom, and hom doe not override an existing entry. So, this particular case should be safe. However, it still seems reasonable to me to let the callback check whether the to-be-deleted value did not change.
I was fully expecting that to be true, but I could not come up with an explanation for the bug Jeroen reported at #12313, where the line in
_cache[key] = KeyedRef(H, _cache.eraser, (id(X),id(Y),id(category)))
leads to a key error in TripleDict.set
in the line:
r1,r2,r3 = <tuple>(self._refcache[h1,h2,h3])
The only thing I could think of for that line to fail is:
- This
KeyedRef
objectW
is constructed _cache.set(k1,k2,k3,W)
is called withk1=id(X),k2=id(Y),k3=id(category)
. Note that the caller is holding strong references to(X,Y,category)
.- In order to end up in the line
r1,r2,r3 = <tuple>(self._refcache[h1,h2,h3])
, the key triple(k1,k2,k3)
must be present in thisTripleDict
and hence inself._refcache
- Yet, we get a
KeyError
. Callingself._refcache[h1,h2,h3]
requires the construction of a new tuple(h1,h2,h3)
, which could trigger aGC
and hence a callback that would remove entries. So the only explanation I see is that we're getting a callback with key(k1,k2,k3)
due to a GC triggered by this tuple allocation. - However, this callback cannot result from any of
X,Y,category
getting collected, because someone is holding strong references to them. Hence such a callback must be coming from somewhere else. - The code pointed to (ironically the same code that triggers this
TripleDict
operation) is an example of code that installs such callbacks. So a possible scenario is that a statement
_cache.set(k1,k2,k3,Wold)
was executed before, that Wold is still hanging around in memory and that it finally gets removed due to this poorly timed garbage collection and consequently that the keytriple (k1,k2,k3)
receives a callback, even though they are pointing to (or are about to be pointing to -- we don't know what happened in between) another, unrelated value.
In this scenario, values in this Hom
dictionary do get changed.
We need to ensure that TripleDict
cannot trigger GC
at that point anyway (and that is what #13387 does as a side-effect), because all kinds of other bad things could happen as well. However, given that it needs such a complicated explanation, I would hope we can verify my assumptions, because otherwise my fix might not be sufficient.
Hence the formulation of the ticket: If you can prove that installing this callback is safe, I have no problem with it.
comment:10 Changed 5 years ago by
Ouch, I looked at that code and you're correct: That doesn't overwrite the value at all (it's really a cache). Incidentally, at that code:
try: H = _cache[key]() except KeyError: H = None if H is not None: # Are domain or codomain breaking the unique parent condition? if H.domain() is X and H.codomain() is Y: return H
Since X,Y
are part of key, which is looked up by identity, the "domain" and "codomain" test should really be superfluous. So perhaps we want
if H is not None: assert H.domain() is X and H.codomain() is Y return H
since something is going very wrong if this is not the case! (H would not be allowed to sit there in the dict).
Furthermore, H
has strong references to X,Y,category
in the form of H.domain()
, H.codomain()
and H.homset_category()
, so if H is alive then so must the keys. Which means that the callback on H must happen (or be purged) before those on X,Y
etc., or at least in the same GC.
so their id
s can't get reused until their callbacks have happened or have been discarded.
I think that only leaves the possibility that _cache
got silently corrupted due to a GC at an inopportune time before this particular error. We cannot rule that out, but it's extremely unlikely: It requires a bucket to be rather full and buckets are typically not full at all (ideally only one entry).
On the other hand, since this is a global _cache, it does get to hold quite a few keys, so it's quite likely that it underwent a resize
operation, where a corruption was much more likely to happen.
It's rather difficult to do an accurate postmortem with so little data.
comment:11 Changed 5 years ago by
- Status changed from new to needs_review
Thank you for moving the patch to the right location (I forgot about the existence of this ticket...).
comment:12 Changed 5 years ago by
"Reviewer" patch to provide some doc and pointers if this ever proves to be a problem. An example of the kind of situation where this could lead to quadratic memory use (relative to what is required):
R=[ZZ.quo(3^n) for n in [1..100]] for i in range(100): for j in range(i,100): H = Hom(R[j],R[i])
This leaves 100*101/2
entries with (eventually) dead weakrefs in the dictionary.
I think we should leave ample documentation of this leak in the code, including the strategy to fix it, should it become necessary.
(for posterity: The KeyedRef? alternative is probably OK, because it seems that H can never outlive its domain and codomain X and Y, so I don't see how their id's could get reused before the callback on H has happened or got purged. I'm just very cautious with anything that has to do with GC and callback orders after the problems we encountered with it elsewhere. In any case, the incantation would have to documented "DO NOT COPY THIS WITHOUT UNDERSTANDING THE CONSEQUENCES FULLY", because it's a bad pattern to use elsewhere, unless you prove a whole bunch of extra conditions.)
Simon, are you OK with the doc? If so we can put this to positive review (doctests passed for me)
comment:13 Changed 5 years ago by
- Component changed from PLEASE CHANGE to memleak
- Reviewers set to Nils Bruin
- Type changed from PLEASE CHANGE to defect
comment:14 Changed 5 years ago by
Since your patch changes code (by inserting assert statements) I better run the doc tests with #14159, #12313 and #13387, before changing it to positive review.
Note, however, that the error reported by the patchbot probably is unrelated:
sage -t -force_lib devel/sage-14159/doc/en/constructions/calculus.rst ********************************************************************** File "/mnt/storage2TB/patchbot/Sage/sage-5.8.beta0/devel/sage-14159/doc/en/constructions/calculus.rst", line 102: sage: maxima(g).powerseries('x',0) Expected: 16*f0*('sum((2^(2*i1-1)-1)*bern(2*i1)*k^(2*i1-1)*x^(2*i1-1)/(2*i1)!,i1,0,inf))^4 Got: Maxima crashed -- automatically restarting. 16*f0*('sum((2^(2*i1-1)-1)*bern(2*i1)*k^(2*i1-1)*x^(2*i1-1)/(2*i1)!,i1,0,inf))^4 ********************************************************************** 1 items had failures: 1 of 7 in __main__.example_3 ***Test Failed*** 1 failures.
comment:15 Changed 5 years ago by
A little data point for how this change might adversely affect memory and speed performance. Running
import resource R=[ZZ.quo(3^n) for n in [1..1000]] def test(R): for i in range(len(R)): print i for j in range(i,len(R)): H=Hom(R[j],R[i]) %time test(R) resource.getrusage(resource.RUSAGE_SELF)
Reference:
resource.struct_rusage(ru_utime=50.020395, ru_stime=0.16497399999999998, ru_maxrss=130108, ru_ixrss=0, ru_idrss=0, ru_isrss=0, ru_minflt=42660, ru_majflt=1, ru_nswap=0, ru_inblock=600, ru_oublock=240, ru_msgsnd=0, ru_msgrcv=0, ru_nsignals=0, ru_nvcsw=8669, ru_nivcsw=7514)
With this patch:
resource.struct_rusage(ru_utime=54.220757, ru_stime=0.35794499999999996, ru_maxrss=703824, ru_ixrss=0, ru_idrss=0, ru_isrss=0, ru_minflt=187662, ru_majflt=1, ru_nswap=0, ru_inblock=584, ru_oublock=240, ru_msgsnd=0, ru_msgrcv=0, ru_nsignals=0, ru_nvcsw=12759, ru_nivcsw=7940)
This of course is an example that tries to expose exactly the flaw. We're finding:
- about 10% runtime penalty. That seems to be the garbage collections that are happening.
- a lot more memory usage (without the patch memory usage is flat. With the patch you can run out of memory)
I'm not sure how serious this is. It's making me a little hesitant on whether we should be "fixing" this at all, though!
comment:16 Changed 5 years ago by
Idea: We could add a callback---but a new one. Namely a callback, that will only remove the entry from the TripleDict
if the value is the weak reference for which this callback is called.
comment:17 Changed 5 years ago by
- Description modified (diff)
I did not run the full tests yet, but I think the new patch can be reviewed.
I introduce a variant of TripleDictEraser
dedicated for use on the values of a TripleDict
(if we use a callback for the values of MonoDict
then we need to do the same). It will only erase an item if the value is the originally stored weak reference.
The patch contains tests demonstrating that TripleDictEraser
is unsafe but TripleDictEraserOnValue
is safe on values, and it contains a test demonstrating that homsets can still be garbage collected.
Apply trac_14159_safer_callback.patch
comment:18 Changed 5 years ago by
- Dependencies set to #13387
PS: I worked on top of #13387, so, that's a dependency.
Apply trac_14159_safer_callback.patch
comment:19 follow-up: ↓ 22 Changed 5 years ago by
tests OK for me. One tiny nitpick:
- You're claiming that this fixes an extremely rare race condition wrt. to GC. I'm actually not sure that it CAN occur for our particular use (because our value has a strong reference to a key component). I think your example on
TripleDictEraserOnValue
illustrates why we should prefer checking this anyway. I hope we don't have to be this paranoid in the future.
There's another one, that the _cache[X,Y,category]=KeyedRef(...)
statement is very fragile: If you make a mistake in typing the (id(X),id(Y),id(category))
you'll have a very hard to trace (and possibly disastrous!) bug in your program. One solution would be to encapsulate it in TripleDict
, but the instantiation of the TripleDictEraserOnValue
would complicate the logic in TripleDict
,
so let's just keep it this way.
comment:20 Changed 5 years ago by
- Status changed from needs_review to needs_work
- Work issues set to Make part the coverage script happy
The coverage script rightfully complains that I didn't add a test to the __call__
method of the new TripleDictEraserOnValue
.
Do we need to worry about startup time? How to interpret the data?
comment:21 Changed 5 years ago by
The startup_modules script complains that sage.structure.gc was added. But that has happened in #13387! So, why is the patchbot complaining here, too?
comment:22 in reply to: ↑ 19 ; follow-up: ↓ 26 Changed 5 years ago by
Replying to nbruin:
tests OK for me. One tiny nitpick:
- You're claiming that this fixes an extremely rare race condition wrt. to GC. I'm actually not sure that it CAN occur for our particular use (because our value has a strong reference to a key component). I think your example on
TripleDictEraserOnValue
illustrates why we should prefer checking this anyway. I hope we don't have to be this paranoid in the future.
I thought you diagnosed that this was the cause of one problem we saw? Such as:
- Some parent X, Y give rise to a homset H, in a category C.
- H and X get deleted, but the callbacks are not performed yet. A new parent X2 is created, and by mere coincidence we have id(X)==id(X2).
- The homset H2 with respect to X2, Y and C is created. It thus overrides H in the cache.
- Finally, the callback of H is performed, erasing H2 from the cache. Boom
This is what I meant by a "rare racing condition".
Do you prefer that I erase this statement?
Thinking of it: Suppose we are now considering the callback of X, which, in the above scenario, has not been called prior to the creation of X2. Would it not delete the new item, too? So, shouldn't we add a test in TripleDictEraser.__call__(self, r)
, so that an item is deleted only if one of the references from the keys of this item is identical with r?
There's another one, that the
_cache[X,Y,category]=KeyedRef(...)
statement is very fragile: If you make a mistake in typing the(id(X),id(Y),id(category))
you'll have a very hard to trace (and possibly disastrous!) bug in your program. One solution would be to encapsulate it inTripleDict
, but the instantiation of theTripleDictEraserOnValue
would complicate the logic inTripleDict
,
I wouldn't like to put this into TripleDict
. After all, it is impossible to prevent all thinkable typing mistakes.
comment:23 Changed 5 years ago by
I am just thinking: Perhaps one should enrich the key of the KeyedReference
, so that it states where we are supposed to find the reference that caused the callback. I am thinking of an additional int, that is 1, 3 or 5 if the reference is in the first, second or third part of the key (1,3,5 give the relative position in the bucket), and 6 if the reference belongs to the value of the item.
In that way, we would avoid the racing condition I described in the previous comment, *and* we would not need to create a separate class TripleDictEraserOnValue
.
comment:24 Changed 5 years ago by
Additional idea: We could create TripleDict
with an optional parameter weak_values
. If this parameter is True, then the set/get methods would automatically use a weak reference on the value, with the correct key for the callback. Hence, we do not need to exploit implementation details when we apply a triple dict with weak values in sage.categories.homset
Does that sound like a good idea?
comment:25 Changed 5 years ago by
What I suggest is the following:
sage: from sage.structure.coerce_dict import MonoDict sage: import gc sage: M = MonoDict(13) sage: MW = MonoDict(13, weak_values=True) sage: class Foo: pass sage: a = Foo() sage: b = Foo() sage: k = 1 sage: M[k] = a sage: MW[k] = b sage: M[k] is a True sage: MW[k] is b True sage: k in M True sage: k in MW True sage: import gc sage: del a,b sage: _ = gc.collect() sage: k in M True sage: k in MW False sage: len(MW) 0 sage: len(M) 1 sage: MW[k] = int(5) sage: MW[k] 5
Hence, we have the option to use weak values, but even in that case we would accept non-weakrefable values, too.
And, similarly, for TripleDict
, with using weak_values
for the homset cache.
comment:26 in reply to: ↑ 22 ; follow-up: ↓ 27 Changed 5 years ago by
Replying to SimonKing:
(just saw your new post. Definitely an option. If you see merit I'm fine with that. Below response I wrote to your earlier message)
I thought you diagnosed that this was the cause of one problem we saw?
No, what I'm pretty sure we saw at some point was GC happening during dictionary value setting. While looking at the code I noticed this callback without proof that it's safe.
- Some parent X, Y give rise to a homset H, in a category C.
- H and X get deleted, but the callbacks are not performed yet. A new parent X2 is created, and by mere coincidence we have id(X)==id(X2).
- The homset H2 with respect to X2, Y and C is created. It thus overrides H in the cache.
- Finally, the callback of H is performed, erasing H2 from the cache. Boom
Yes, that's exactly the scenario I envisaged. I don't have a proof that this is theoretically possible, though.
The weakref to X is referenced by this dictionary that is still in use, so it's not garbage. I think that means the callback to X is guaranteed to happen before deallocation of X (and hence before the creation of X2). Furthermore X can only be found to be garbage if H is found to be garbage (because of the strong domain
reference), so the callback on H has to happen in the same GC as the deallocation of X. So as long as we're not creating parents during callbacks, I think that also means X2 cannot be created in the same memory location as X while the callback on H hasn't happened yet.
So we have a choice:
- make a very involved argument, carefully reasoning about the specifics of python's GC system (for which we don't actually have a formal axiom system)
- be on the safe side and just put in a cheap test.
Do you prefer that I erase this statement?
perhaps "race condition we couldn't quite rule out"
Thinking of it: Suppose we are now considering the callback of X, which, in the above scenario, has not been called prior to the creation of X2. Would it not delete the new item, too? So, shouldn't we add a test in
TripleDictEraser.__call__(self, r)
, so that an item is deleted only if one of the references from the keys of this item is identical with r?
No, in order to set the value, the caller must be holding strong references to the keys. As far as I know, weakref callback and deallocation only get separated inside GC. So as long as we're not calling set in callbacks, I think we're prevented from it (note that for H we needed, in addition, that H holds a reference to X AND that key triples do not get reassigned.)
I wouldn't like to put this into
TripleDict
. After all, it is impossible to prevent all thinkable typing mistakes.
Well, this is one where you have to type the same data twice. But I agree: We have only one use case. Should we get more I think it's worth reconsidering.
Additional idea: We could create TripleDict? with an optional parameter weak_values. If this parameter is True, then the set/get methods would automatically use a weak reference on the value, with the correct key for the callback. Hence, we do not need to exploit implementation details when we apply a triple dict with weak values in sage.categories.homset
Well, conceptually that's a different concept. Whether we want to implement that in the same structure (advantage: code reuse, disadvantage: increased complexity, possibly efficiency loss) is an engineering call. Again, for one use case I think it's not worth it. Should we find the need more often for such "fully weak associations" (we found the coercion system has tables that are a bit like this) we can reconsider.
At the moment I'm getting a feeling we're entering the realm of overengineering.
comment:27 in reply to: ↑ 26 Changed 5 years ago by
- Description modified (diff)
- Status changed from needs_work to needs_review
- Work issues Make part the coverage script happy deleted
Replying to nbruin:
At the moment I'm getting a feeling we're entering the realm of overengineering.
That's quite possible. But the changes were rather mild, and so I am submitting my patch anyway.
Do you think it is overengineered? We should of course test if the performance drops too much.
And we need to run the test suite (I didn't, yet).
Apply trac_14159_weak_value_triple_dict.patch
comment:28 follow-up: ↓ 29 Changed 5 years ago by
I was just wondering whether a MonoDict with weak values really makes sense. After all, it means: If you do
M[a]=b`, then this item will be deleted, unless you keep a strong reference to both a and b. But then you probably don't need a dictionary.
Anyway, I think the implementation for the TripleDict
case makes sense.
comment:29 in reply to: ↑ 28 Changed 5 years ago by
Replying to SimonKing:
I was just wondering whether a
MonoDict with weak values really makes sense. After all, it means: If you do
M[a]=b`, then this item will be deleted, unless you keep a strong reference to both a and b.
Not so much because of the requirement for strong references (with weak caching, the idea is that unknown entities are still holding references), but more because in that case you might as well spell it
a.M=weakref(b)
Incidentally, see #14058 for another use case where weakrefs (without callbacks there!) get stored in a TripleDict
. So that seems to be a strong indication that supporting weak values is indeed a reasonable think for it. I would propose to amend the solution there to use the facilities here and make #14058 depend on #14159. Provided the code here all checks out to work as expected and perform well, of course.
comment:30 follow-up: ↓ 31 Changed 5 years ago by
I still have not the faintest idea how to interpret the results of the startup_time plugin. Can someone please give me a pointer?
comment:31 in reply to: ↑ 30 Changed 5 years ago by
Replying to SimonKing:
I still have not the faintest idea how to interpret the results of the startup_time plugin. Can someone please give me a pointer?
While it's probably measuring something significant, I don't think the process properly controls for other factors. I've seen significant (consistent) timing differences between different compiles of exactly the same sage (i.e., after hg qpop; sage -b; hg qpush; sage -b
) on the same computer. The only thing I can think of for something like that is getting lucky/unlucky how certain libraries get laid out in memory. The differences it's reporting are so small that I think one cannot rule out such factors (the reference build being "lucky" and the patch build being "unlucky").
comment:32 Changed 5 years ago by
I could get the meaning if it would say: "With 98% confidence, the startup time of sage-5.8.beta1+#14159 is at least 0.1% bigger than the startup time of sage-5.8.beta1." But the plugin does not seem to make such statements.
Instead, looking at the diff, one sees
--- 5.8.beta1 +++ 5.8.beta1 + #14159 ... -Main: 0.70703 sec (30 samples, std_dev=0.0174) -Ticket: 0.71392 sec (30 samples, std_dev=0.0296) +Main: 0.66585 sec (30 samples, std_dev=0.00636) +Ticket: 0.67404 sec (30 samples, std_dev=0.0354) -Average increase of 0.0069 secs or 0.97%. +Average increase of 0.0082 secs or 1.2%. -With 52% confidence, startup time increased by at least 0.5% -With 72% confidence, startup time increased by at least 0.25% -With 83% confidence, startup time increased by at least 0.1% +With 36% confidence, startup time increased by at least 0.5% +With 86% confidence, startup time increased by at least 0.25% +With 98% confidence, startup time increased by at least 0.1%
So, in vanilla sage, one has 83% confidence of getting an increase of startup time by at least 0.1%---but increase with respect to what? This is what I don't understand.
comment:33 Changed 5 years ago by
I have run sage -startuptime
both with and without the patch. The strange result:
- Without the patch, the import of
_ssl
is listed with 3.397ms. But with the patch,_ssl
is not listed. - With the patch, the import of
sage.plot.graphics
is listed with 21.036ms. But without the patch,sage.plot.graphics
is not listed.
I can neither believe that sage.plot.graphics
is not imported during startup in vanilla sage, nor can I believe that sage.plot.graphics
is imported just because of changing the intestines of coercion dictionaries.
Bug in the startuptime script?
comment:34 Changed 5 years ago by
PS: The time spent for including sage.plot.graphics is nearly equal to the difference of the total startup times that I get with and without the patch. Hence, it could explain why the plugin complains.
comment:35 follow-up: ↓ 36 Changed 5 years ago by
For the record, the startuptime patchbot plugin compares ticket vs. no ticket. So the 0.1% increase is relative to not applying the patch on the ticket.
comment:36 in reply to: ↑ 35 Changed 5 years ago by
Replying to vbraun:
For the record, the startuptime patchbot plugin compares ticket vs. no ticket. So the 0.1% increase is relative to not applying the patch on the ticket.
Sure that should be the case. But, as you see, the plugin provides a diff "no ticket" versus "ticket". And according to the diff, "With 83% confidence, startup time increased by at least 0.1%" without ticket. So, is the point that 83% is not significant?
Hence, does the plugin state the following?
"Without the patch, there is no statistical evidence for a 0.1% increase of startup time. With the patch, there is statistical evidence for a 0.1% increase."?
comment:37 Changed 5 years ago by
- Cc jpflori added
comment:38 follow-up: ↓ 39 Changed 5 years ago by
Note that in the most recent run, the startup time plugin did actually not complain!
Nevertheless, I think we should care about speed. I see two approaches to improve the startup time, the second of them would improve the general performance.
- I found that a dozen of homsets are created and deleted during startup. One should find where they are created, and see whether their creation is really needed or whether one can avoid that the get deleted.
MonoDict
andTripleDict
have fast cdef set and get methods, but they are not always used. At least in Cython modules, we should try to find all locations which could benefit from replacingT[a,b,c]
byT.get(a,b,c)
. And even for Python modules, I think it would make sense to turn the cdef get and set methods into cpdef get and set methods: I guess it would still be faster thenT[a,b,c]
, because one avoids the testtry: k1, k2, k3 = k except (TypeError,ValueError): raise KeyError, k
comment:39 in reply to: ↑ 38 ; follow-up: ↓ 40 Changed 5 years ago by
Replying to SimonKing:
- I found that a dozen of homsets are created and deleted during startup. One should find where they are created, and see whether their creation is really needed or whether one can avoid that the get deleted.
"Premature optimization is the root of all evil" [Knuth]. Does their creation and deletion add considerably to the startup time? I would expect that this gets dwarfed by all the work involved in constructing class instances and resizing the globals dictionary.
MonoDict
andTripleDict
have fast cdef set and get methods, but they are not always used. At least in Cython modules, we should try to find all locations which could benefit from replacingT[a,b,c]
byT.get(a,b,c)
. And even for Python modules, I think it would make sense to turn the cdef get and set methods into cpdef get and set methods: I guess it would still be faster thenT[a,b,c]
, because one avoids the test
Probably not: The __setitem__
and __getitem__
lookups in python likely happen faster than normal method lookups in python, because the special methods are stored in slots, whereas normal method lookups have to resort to dictionary lookups. That might well cancel the packing/unpacking that needs to happen for the key (which isn't necessary for MonoDict
at all, by the way).
I think to do optimizations with best productivity:
- locate scenarios where
MonoDict
andTripleDict
access is significant in time usage - Find what is creating the worst overhead.
I'm not sure that the sage startup routine is most representative of that behaviour. If the goal is improving sage startup time, the right way is probably to only import a "stub" notebook that offers the limited functionality from other modules (basically capable of answering "we're not in the notebook") and make the stub so that starting the notebook, or any other non-trivial access, results in loading and replacing the stub with the real notebook.
comment:40 in reply to: ↑ 39 ; follow-up: ↓ 41 Changed 5 years ago by
Replying to nbruin:
"Premature optimization is the root of all evil" [Knuth]. Does their creation and deletion add considerably to the startup time?
Probably not.
Probably not: The
__setitem__
and__getitem__
lookups in python likely happen faster than normal method lookups in python, because the special methods are stored in slots, whereas normal method lookups have to resort to dictionary lookups.
OK. That's calling overhead versus avoiding an additional test. That would be easy to test.
But at least in cython modules, when one uses cdef'd variables of type MonoDict
or TripleDict
, using a cdef or cpdef set/get method directly should be faster than calling __set/getitem__
, isn't it?
comment:41 in reply to: ↑ 40 ; follow-up: ↓ 42 Changed 5 years ago by
Replying to SimonKing:
But at least in cython modules, when one uses cdef'd variables of type
MonoDict
orTripleDict
, using a cdef or cpdef set/get method directly should be faster than calling__set/getitem__
, isn't it?
Only because it's using the cdef
part then, I think.
comment:42 in reply to: ↑ 41 Changed 5 years ago by
Replying to nbruin:
Replying to SimonKing:
But at least in cython modules, when one uses cdef'd variables of type
MonoDict
orTripleDict
, using a cdef or cpdef set/get method directly should be faster than calling__set/getitem__
, isn't it?Only because it's using the
cdef
part then, I think.
"Only" or not: If it is faster then it should be used.
And I don't think it is "only" cdef. It is also the set/get
expects a fixed number of arguments, while in __set/getitem__
the given arguments are put into a tuple, need to be unpacked, the length of the tuple must be checked. So, set/get could be faster anyway.
I am now testing---as soon as half of the Sage library is rebuilt. I hate patches touching coerce_dict.pxd!!!
comment:43 Changed 5 years ago by
%cython cdef class dictest(object): cpdef get(self,int k1,int k2,int k3): if k1==1 and k2==1 and k3==1: return True else: return False def __getitem__(self,k): cdef int k1,k2,k3 try: k1,k2,k3=k except: raise KeyError return self.get(k1,k2,k3)
regular method call protocol is pretty expensive:
sage: C=dictest() sage: timeit("C.get(1,1,1)") 625 loops, best of 3: 795 ns per loop sage: timeit("C[1,1,1]") 625 loops, best of 3: 717 ns per loop
(but of course, from cython, one should absolutely use the cdef form)
comment:44 Changed 5 years ago by
Some other timings, making set/get cpdef:
sage: from sage.structure.coerce_dict import TripleDict, MonoDict sage: R = [Integers(n) for n in range(1,10^4)] sage: T = TripleDict(53) sage: M = MonoDict(53) sage: def test_py_getitem(T,R): for x in R: T[R,R,R] = 1 for x in R: _ = T[R,R,R] ....: sage: def test_py_get(T,R): for x in R: T.set(R,R,R, 1) for x in R: _ = T.get(R,R,R) ....: sage: cython(""" from sage.structure.coerce_dict cimport TripleDict def test_cy_getitem(TripleDict T,R): for x in R: T[R,R,R] = 1 for x in R: _ = T[R,R,R] """) ....: sage: cython(""" from sage.structure.coerce_dict cimport TripleDict def test_cy_get(TripleDict T,R): for x in R: T.set(R,R,R, 1) for x in R: _ = T.get(R,R,R) """) ....: sage: %time test_py_getitem(T,R) CPU times: user 0.03 s, sys: 0.01 s, total: 0.04 s Wall time: 0.04 s sage: %timeit test_py_getitem(T,R) 10 loops, best of 3: 23.9 ms per loop sage: %timeit test_py_getitem(T,R) 10 loops, best of 3: 23.5 ms per loop sage: %timeit test_py_getitem(T,R) 10 loops, best of 3: 23.6 ms per loop sage: %timeit test_py_get(T,R) 10 loops, best of 3: 24.6 ms per loop sage: %timeit test_py_get(T,R) 10 loops, best of 3: 24.3 ms per loop sage: %timeit test_py_get(T,R) 10 loops, best of 3: 24.3 ms per loop sage: %timeit test_cy_getitem(T,R) 10 loops, best of 3: 19.9 ms per loop sage: %timeit test_cy_getitem(T,R) 100 loops, best of 3: 19.9 ms per loop sage: %timeit test_cy_getitem(T,R) 100 loops, best of 3: 19.6 ms per loop sage: %timeit test_cy_get(T,R) 100 loops, best of 3: 17.9 ms per loop sage: %timeit test_cy_get(T,R) 100 loops, best of 3: 18 ms per loop sage: %timeit test_cy_get(T,R) 100 loops, best of 3: 18.3 ms per loop
Hence, I can confirm that in Python one better relies on the magical methods, while in cython it is better (but not much better!) to use the cpdef methods.
But in fact, I found some spots in parent_old.pyx that *are* used and call the magical methods.
comment:45 Changed 5 years ago by
- Description modified (diff)
I have added a new patch, that will hopefully make Sage a little faster. This is by using the cdef set/get methods everywhere in cython. The only spot that is using the magical methods i sage/categories/homset, which currently is python. Perhaps it should be cythoned? Since homsets are in the background for every conversion and coercion, this could be of some benefit.
Apply trac_14159_weak_value_triple_dict.patch trac_14159_use_cdef_get.patch
comment:46 Changed 5 years ago by
PS: In any case, cythoning sage.categories.homset should be on a different ticket.
comment:47 follow-up: ↓ 48 Changed 5 years ago by
Note that different patchbots show different results for the plugins: One patchbot, running sage-5.8.beta2, does not complain at all. Another one, running sage-5.8.beta1, complains both for startup_modules and startup_time.
comment:48 in reply to: ↑ 47 Changed 5 years ago by
comment:49 Changed 5 years ago by
- Dependencies changed from #13387 to #13387, #14254
#14254 is a blocker that will likely to be merged before this ticket here is positively reviewed. Hence, it will be a dependency.
Question: Shall we remove the new function signed_id introduced in #14254? It is not needed with the approach that we take here. Alternatively, we could apply it. How much is the overhead of calling a cpdef inline function doing <Py_ssize_t><void *>(x)
, compared without doing it directly?
comment:50 Changed 5 years ago by
Apparently there is no significant difference:
sage: cython(""" cdef inline Py_ssize_t signed_id(object X): <Py_ssize_t><void *>(X) from sage.all import srange def testcall(): cdef object i for i in srange(10^6): a = signed_id(i) def testdirect(): cdef object i for i in srange(10^6): a = <Py_ssize_t><void *>(i) """) sage: %timeit testcall() 10000 loops, best of 3: 87.2 us per loop sage: %timeit testdirect() 10000 loops, best of 3: 87 us per loop
comment:51 follow-ups: ↓ 52 ↓ 56 Changed 5 years ago by
Replying to SimonKing:
Question: Shall we remove the new function signed_id introduced in #14254? It is not needed with the approach that we take here. Alternatively, we could apply it. How much is the overhead of calling a cpdef inline function doing
<Py_ssize_t><void *>(x)
, compared without doing it directly?
It's inline
, so using it in a place where the cdef
part is used should be 0 overhead. The inline
should get expanded before the optimizer even looks at the code. The whole point of inline
is to get the performance of #define
macros without the headaches.
If the function is not really in the way, it may be worth keeping around. I have found it immensely useful for debugging cython
code if as much features as possible are also exposed in python. In particular, the cdef
only data attributes can be a real pain.
comment:52 in reply to: ↑ 51 Changed 5 years ago by
Replying to nbruin:
If the function is not really in the way, it may be worth keeping around. I have found it immensely useful for debugging
cython
code if as much features as possible are also exposed in python. In particular, thecdef
only data attributes can be a real pain.
+5
So, let's keep it (it is cpdef anyway, hence, can be used from Python as well), and apparently it is fast enough (in the example above I had "cdef inline", but "cpdef inline" gives the same timings).
comment:53 Changed 5 years ago by
The first patch needed to be rebased, the second was still fine.
Apply trac_14159_weak_value_triple_dict.patch trac_14159_use_cdef_get.patch
comment:54 Changed 5 years ago by
I needed another update of the patch: #14254 has introduced signed_id, which is imported in homset.py, but with the approach from here, we do not need to import signed_id. Hence, I removed the import statement in the current version of the patch.
Apply trac_14159_weak_value_triple_dict.patch trac_14159_use_cdef_get.patch
comment:55 Changed 5 years ago by
- Status changed from needs_review to positive_review
Good stuff! Much cleaner. Patchbot is happy and effectively this patch consolidates some use of TripleDict
that was already in ad-hoc use, so there is not too much chance for unexpected failures. The solution here does solve the "manual" installation of callbacks that was dangerous before and in includes a sanity check that the object on which the callback is happening is indeed still referring to the (now dead) weakref that is causing the callback.
comment:56 in reply to: ↑ 51 Changed 5 years ago by
Replying to nbruin:
The
inline
should get expanded before the optimizer even looks at the code.
That's not entirely true, at least not with GCC. The inline
keyword is just one of the things that GCC considers to decide whether or not to inline a function. Very simple functions like signed_id()
might very will be always inlined at higher optimization levels, while complicated functions might never be inlined (even if marked inline
).
comment:57 Changed 5 years ago by
- Milestone changed from sage-5.9 to sage-5.10
comment:58 Changed 5 years ago by
trac_14159_use_cdef_get.patch needs a proper commit message
comment:59 Changed 5 years ago by
Message added.
comment:60 Changed 5 years ago by
- Merged in set to sage-5.10.beta0
- Resolution set to fixed
- Status changed from positive_review to closed
What do you mean by "value under that key"?
Ah! You mean this situation:
T[a,b,c]=v
and laterT[a,b,c]=w
. Then v might become garbage collected and its callback would remove w from the dictionary.Indeed, this would be bad.
Potential solutions (two alternatives):
__setitem__
then its callback will be removed.