Opened 8 years ago
Closed 8 years ago
#14187 closed enhancement (fixed)
Check that lazy imports are not resolved during startup
Reported by: | vbraun | Owned by: | tbd |
---|---|---|---|
Priority: | major | Milestone: | sage-5.10 |
Component: | performance | Keywords: | |
Cc: | nthiery, robertwb | Merged in: | sage-5.10.beta1 |
Authors: | Volker Braun | Reviewers: | Nicolas M. Thiéry, Travis Scrimshaw |
Report Upstream: | N/A | Work issues: | |
Branch: | | Commit: | |
Dependencies: | #12543 | Stopgaps: | |
Description (last modified by )
Print a warning if, during startup, any lazy imports are being resolved. This defeats the purpose of lazy imports, so we should have a mechanism to prevent us from stepping into that trap.
Attachments (2)
Change History (46)
comment:1 Changed 8 years ago by
- Component changed from PLEASE CHANGE to performance
- Description modified (diff)
- Type changed from PLEASE CHANGE to enhancement
comment:2 Changed 8 years ago by
- Cc nthiery added
comment:3 Changed 8 years ago by
comment:4 follow-up: ↓ 5 Changed 8 years ago by
We still need to fix the warning about the FacadeSets; I thought you'd know what goes wrong.
comment:5 in reply to: ↑ 4 Changed 8 years ago by
Replying to vbraun:
We still need to fix the warning about the FacadeSets; I thought you'd know what goes wrong.
I must have missed something. Which ticket is this about?
Never mind; I missed the ticket description ...
comment:6 Changed 8 years ago by
Ok, I investigated that, and the issue is as follows:
- On Sage's startup, one creates NN, which is a facade set (in Sets().Facade())
- Sets lazily imports FacadeSets to avoid an import loop: FacadeSets uses Sets, but we want a reference Sets.Facades -> FacadeSets
Note that with the upcoming functorial construction, this will be a very common idiom even for low-level categories like Magma().Commutative(), which we will certainly load on startup.
So the problem we are having is that LazyImport serves two purposes:
- avoiding loading stuff on Sage's startup
- breaking import loops.
Granted, it's not so nice to set up a safety guard and immediately provide a way to work around it ... but what about adding an explicit option so that one could explicitly disable the warning when we know that the lazy import is really just about breaking an import loop?
LazyImport('sage.categories.facade_sets', 'FacadeSets', dont_warn_on_startup=True)
or
LazyImport('sage.categories.facade_sets', 'FacadeSets', can_be_evaluated_on_startup=True)
Cheers,
Nicolas
comment:7 follow-up: ↓ 8 Changed 8 years ago by
How about we just use lazy import for NN?
comment:8 in reply to: ↑ 7 Changed 8 years ago by
comment:9 Changed 8 years ago by
- Reviewers set to Nicolas M. Thiéry
- Status changed from new to needs_review
I've fixed NN so that it is only lazily imported on startup. Also, I changed the user feedback from a warning to printing a traceback; otherwise it's hard to figure out where exactly the undesirable import comes from.
comment:10 Changed 8 years ago by
- Description modified (diff)
comment:11 Changed 8 years ago by
All doctests pass now...
comment:12 Changed 8 years ago by
I've added a Python function to access the value of the startup_guard
Cython variable.
comment:13 Changed 8 years ago by
Hi Volker!
Thanks for the updated patch! I am almost ready to give it a positive review. Just one thing about the import of NN in sage/combinat/integer_vectors_mod_permgroup.py: I think one should repeat the lazy import there rather than importing from the all module where it's lazily imported. Otherwise, lazy import misses the information needed to complete its job.
Quick example: in a file all.py, put:
from sage.misc.lazy_import import lazy_import
lazy_import('sage.rings.semirings.non_negative_integer_semiring', 'NN')
and in blah.py:
import all
from all import NN
print type(all.NN)
print type(NN)
NN(1)
print type(all.NN)
print type(NN)
With those we get:
sage: import blah
<type 'sage.misc.lazy_import.LazyImport'>
<type 'sage.misc.lazy_import.LazyImport'>
<class 'sage.rings.semirings.non_negative_integer_semiring.NonNegativeIntegerSemiring_with_category'>
<type 'sage.misc.lazy_import.LazyImport'>
Interpretation: upon using NN, lazy import replaces the lazy import object with NN in the module where it was lazily imported. But not in blah! And that won't happen either if we explicitly use the NN from blah:
sage: blah.NN
Non negative integer semiring
sage: type(blah.NN)
sage.misc.lazy_import.LazyImport
sage: blah.NN(1)
1
sage: type(blah.NN)
sage.misc.lazy_import.LazyImport
Btw, you may want to write the output description in is_during_startup as "OUTPUT: a boolean", but that's more a question of taste.
Cheers,
comment:14 follow-up: ↓ 16 Changed 8 years ago by
Isn't that just a bug in lazy_import? It should figure out all modules where it is currently imported and replace all occurrences.
comment:15 Changed 8 years ago by
- Cc robertwb added
comment:16 in reply to: ↑ 14 Changed 8 years ago by
Replying to vbraun:
Isn't that just a bug in lazy_import? It should figure out all modules where it is currently imported and replace all occurrences.
I agree, this certainly is a missing feature. But I would not be surprised if figuring out all modules where it's imported could get tricky. I don't remember seeing any hook for an object to take some action when it's imported somewhere.
Anyway, for the problem at hand, let's just do the lazy import explicitly.
Cheers,
Nicolas
comment:17 follow-ups: ↓ 18 ↓ 19 Changed 8 years ago by
I disagree, imports are much more legible without the lazy_import call. And being easy to read is more important than a nanosecond saved by avoiding the lazy_import redirect. IMHO only the imports from all.py should use lazy imports, and not the internal implementation of the Sage library.
comment:18 in reply to: ↑ 17 Changed 8 years ago by
Replying to vbraun:
I disagree, imports are much more legible without the lazy_import call. And being easy to read is more important than a nanosecond saved by avoiding the lazy_import redirect.
A nanosecond multiplied by the number of times the object is used. Here it's for example used in is_canonical, which is a fairly low-level method. Also, as much as I value readability of code, I don't care so much about the readability of imports (but it's a pain that pyflakes does not handle lazy imports).
Well, just my personal opinion; go ahead with whichever way you prefer.
comment:19 in reply to: ↑ 17 Changed 8 years ago by
Replying to vbraun:
I disagree, imports are much more legible without the lazy_import call.
Would we even have a hook to do this? The problem is that lazy_import must register the globals dictionary in which to change the binding. It can do that when lazy_import gets called explicitly, because globals() gives you the appropriate dictionary then.
When you do from all import NN, you get that blah.NN gets bound to the then-current binding of all.NN, which is a LazyImport object. I don't think that at any point that LazyImport object gets notified that it is getting bound to an entry in a different globals dictionary, though. So how would LazyImport even know which other dictionaries to mess with in order to resolve the lazy loading? You'd have to reach quite deep into Python's import mechanism.
The "hack-free" solution is to avoid from ... import and, in time-critical pieces of code, load all.NN into a local variable at runtime (which is faster anyway). Of course, that doesn't solve the readability problem ...
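A minimal sketch of that local-variable idiom (with math.sqrt standing in for the lazily imported all.NN of the discussion): binding the attribute to a local inside the hot function means the attribute lookup, and hence any lazy-import redirect, happens once per call rather than once per use.

```python
import math  # stands in for the `all` module of the discussion

def column_norms(rows):
    """Return the Euclidean norm of each (x, y) pair."""
    sqrt = math.sqrt  # bind once: local lookups beat repeated attribute access
    return [sqrt(x * x + y * y) for (x, y) in rows]
```

The same pattern applies to any module-level name that might be behind a lazy-import proxy.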
comment:20 follow-up: ↓ 21 Changed 8 years ago by
Just query the referrers to the lazy import, should be easy.
comment:21 in reply to: ↑ 20 Changed 8 years ago by
Replying to vbraun:
Just query the referrers to the lazy import, should be easy.
Do you mean gc.get_referrers? That should mainly get you a list of dictionaries, indeed. You'd propose to just assume that those are globals dictionaries and change the bindings? That sounds horribly hacky and dangerous to me. Robert can probably comment on the wisdom of this.
There's also the question of how to change the referrers once you've found them. This is not even straightforward for a dict. The reference you're finding should be as a value in the dict. You don't know under which key. You'll have to query the dict for that. That's not going to be cheap (but at least it only happens once).
What if you're finding a tuple referring to it? Or a homomorphism instance? What if you're finding a module that has decided to customize itself and has a frozendict as namespace?
comment:22 Changed 8 years ago by
Just do the replacement for dicts; if your customized module wants to freeze the lazy_import in place then it should be allowed to.
comment:23 Changed 8 years ago by
gc.get_referrers, clever. I don't see any reason why this shouldn't be fine to use (for dicts, which is the common and important case). More powerful and less hacky than what's there.
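A minimal sketch of the replacement being discussed (hypothetical class and names; the real LazyImport is written in Cython and is more careful): on first use, the stub asks the garbage collector who refers to it and rebinds itself in every plain dict, which covers module globals.

```python
import gc

class LazyStub(object):
    """Placeholder that swaps itself for the real object in referring dicts."""

    def __init__(self, value):
        self._value = value  # stands in for the deferred import

    def resolve(self):
        obj = self._value
        gc.disable()  # avoid a collection while we mutate referrers
        try:
            for ref in gc.get_referrers(self):
                if type(ref) is dict:  # only plain, globals-style dicts
                    # Collect keys first; never mutate a dict mid-iteration.
                    for key in [k for k, v in ref.items() if v is self]:
                        ref[key] = obj
        finally:
            gc.enable()
        return obj
```

As the later comments note, get_referrers requires a full GC sweep and is officially meant for debugging only, so this is a best-effort hack rather than a robust design.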
comment:24 Changed 8 years ago by
Wow, it actually works! When I quickly hacked together a trial it resulted in a failure to start up. Some funny dicts showed up, which made me a little worried about this replacement. I see you're disabling gc during the replacement, which I didn't do. Perhaps that's the key.
Anyway, the dictionaries you get back as referrers can be quite big, so ref.items() can be a rather costly construction. If you make it
for k in [k for k, v in ref.iteritems() if v is self]:
    ref[k] = obj
you should end up allocating a much shorter list (note that the key list needs to be made before iterating over it, because you should be done iterating over the dict itself prior to modifying it, as you did).
Anyway, if Robert thinks this is less evil than what was happening before, I won't complain either.
comment:25 follow-up: ↓ 26 Changed 8 years ago by
Just being curious: what's the complexity of the gc.get_referrers operation?
Also, should we be put off by the following comment on http://docs.python.org/2/library/gc.html:
"Avoid using get_referrers() for any purpose other than debugging"?
comment:26 in reply to: ↑ 25 Changed 8 years ago by
Replying to nthiery:
"Avoid using get_referrers() for any purpose other than debugging"?
Good catch! I think we should take that very seriously. I think I read that at some point and that's probably why I felt uneasy about it. I think that idea can go out the window then.
comment:27 follow-up: ↓ 29 Changed 8 years ago by
For the record, we are using gc.get_referrers() already in the Sage library.
The alternative is to walk sys.modules for lazy imports into modules and fall back to what we have for class-level lazy imports.
But at the end of the day lazy imports are a hack, so why not have the garbage collector do the work for us?
comment:28 follow-up: ↓ 30 Changed 8 years ago by
See this python bug. The following code segfaults:
from gc import get_referrers

def iter():
    tag = object()
    yield tag  # 'tag' gets stored in the result tuple
    lst = [x for x in get_referrers(tag) if isinstance(x, tuple)]
    t = lst[0]  # this *is* the result tuple
    print t[3]  # full of nulls!

tuple(iter())
Before we use get_referrers in production code you'd need to understand what goes wrong in the above example and guarantee this won't happen for the dictionaries we do replacements on.
I'd think that looming segfaults are a high price to pay for the convenience of writing NN = module.NN rather than lazy_import("other_module", "NN").
comment:29 in reply to: ↑ 27 Changed 8 years ago by
Replying to vbraun:
For the record, we are using gc.get_referrers() already in the Sage library. The alternative is to walk sys.modules for lazy imports into modules and fall back to what we have for class-level lazy imports. But at the end of the day lazy imports are a hack, so why not have the garbage collector do the work for us?
+1, I see this as a "best effort" kind of a thing anyways.
In terms of efficiency, I think it might make sense to try to attempt a lookup before iterating over the (potentially very large) dict.
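A sketch of that efficiency suggestion (hypothetical helper; "best effort" as described above): if the stub remembers the name it was originally bound under, a direct key lookup handles the common case and the full scan only runs for aliases.

```python
def replace_binding(ref, name, stub, obj):
    """Rebind stub to obj in dict ref, trying the known name before scanning."""
    if ref.get(name) is stub:
        ref[name] = obj  # cheap path: still bound under its original name
        return True
    # Fallback: scan for bindings under other keys (aliases).
    found = False
    for key in [k for k, v in ref.items() if v is stub]:
        ref[key] = obj
        found = True
    return found
```

For a large module globals dict this avoids walking every entry whenever the import was never renamed.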
comment:30 in reply to: ↑ 28 ; follow-up: ↓ 31 Changed 8 years ago by
Replying to nbruin:
See this python bug. The following code segfaults:
from gc import get_referrers

def iter():
    tag = object()
    yield tag  # 'tag' gets stored in the result tuple
    lst = [x for x in get_referrers(tag) if isinstance(x, tuple)]
    t = lst[0]  # this *is* the result tuple
    print t[3]  # full of nulls!

tuple(iter())
Before we use get_referrers in production code you'd need to understand what goes wrong in the above example and guarantee this won't happen for the dictionaries we do replacements on.
This is intentionally creating a partially-constructed tuple and using the fact that get_referrers() can refer to it. "partially-constructed" dicts are safe, if we ever run across one.
I'd think that looming segfaults are a high price to pay for the convenience of writing NN = module.NN rather than lazy_import("other_module", "NN").
The convenience is in being able to write "from sage.foo.all import X" and not worry about whether X was (possibly transitively) lazily imported. I think that's a big win.
comment:31 in reply to: ↑ 30 ; follow-up: ↓ 32 Changed 8 years ago by
Replying to robertwb:
This is intentionally creating a partially-constructed tuple and using the fact that get_referrers() can refer to it.
Agreed. It's intentionally exposing one of the dangers. That's the point of an example. We need to convince ourselves that this will never happen with a dictionary. And then probably only do replacements when type(ref) is dict rather than isinstance(ref, dict), and document that lazy imports only work in a conventional globals() (one never knows what comes of subclassing).
"partially-constructed" dicts are safe, if we ever run across one.
Do you have a reference that they are? Are you sure it's just partially constructed ones? Note that as soon as a value in a dict we iterate over is NULL, we're done for, since the value fetch will cause an INCREF on that NULL pointer. Lazy imports can be triggered to resolve at funny times, quite possibly when a class definition is being processed. At that time, the dict of that class has not been given out yet, so it would be under construction, and I don't immediately see why it should abstain from putting a key with a null value in there already.
For infrastructure like this I'd be rather reluctant to just rely on "it seems to work in all the cases we've tried", especially because it would be nice if this code were robust in the face of python upgrades.
(Concerning cost: the call basically requires a full GC sweep, so it's very expensive. If you can make sure that there are very few LazyImport instances then the cost may be acceptable, because the resolution of any single such instance will trigger a GC.)
Note that in Sage, a GC can trigger the callback of a WeakValueDict, which can trigger an __eq__ on an arbitrary object, which can possibly trigger the resolution of a LazyImport. Are you still sure dicts are safe to examine under those conditions?
comment:32 in reply to: ↑ 31 Changed 8 years ago by
Replying to nbruin:
Replying to robertwb:
This is intentionally creating a partially-constructed tuple and using the fact that get_referrers() can refer to it.
Agreed. It's intentionally exposing one of the dangers. That's the point of an example. We need to convince ourselves that this will never happen with a dictionary. And then probably only do replacements when type(ref) is dict rather than isinstance(ref, dict), and document that lazy imports only work in a conventional globals() (one never knows what comes of subclassing).
Totally fine with that.
"partially-constructed" dicts are safe, if we ever run across one.
Do you have a reference that they are? Are you sure it's just partially constructed ones? Note that as soon as a value in a dict we iterate over is NULL, we're done for, since the value fetch will cause an INCREF on that NULL pointer. Lazy imports can be triggered to resolve at funny times, quite possibly when a class definition is being processed. At that time, the dict of that class has not been given out yet, so it would be under construction, and I don't immediately see why it should abstain from putting a key with a null value in there already.
There's neither an API nor a motivation to insert a key with a NULL value.
For infrastructure like this I'd be rather reluctant to just rely on "it seems to work in all the cases we've tried", especially because it would be nice if this code were robust in the face of python upgrades.
Agreed.
(Concerning cost: the call basically requires a full GC sweep, so it's very expensive. If you can make sure that there are very few LazyImport instances then the cost may be acceptable, because the resolution of any single such instance will trigger a GC.)
Fair enough. I don't think this is a huge problem; it's basically deferred startup. It could be interesting to have the ability to force all lazy imports to be resolved (perhaps even in the background...), but that could be trickier.
Fair enough. I don't think this is a huge problem, it's basically deferred startup. It could be interesting to have the ability to force all lazy imports to be resolved (perhaps even in the background...) but that could be trickier.
Note that in Sage, a GC can trigger the callback of a WeakValueDict, which can trigger an __eq__ on an arbitrary object, which can possibly trigger the resolution of a LazyImport. Are you still sure dicts are safe to examine under those conditions?
I don't see why not. For instance, dicts follow the GC protocol of removing themselves from the loop when they're being destructed.
comment:33 Changed 8 years ago by
- Dependencies set to #12543
I've tightened the dictionary type check and implemented the optimization where first a list of all relevant keys is created. Also rebased the patch on sage-5.8.beta4.
comment:34 Changed 8 years ago by
Performance information? In particular, it would be interesting to see:
sage: lazy_import("lazily_imported_module", "A", _as="A1")
sage: lazy_import("lazily_imported_module", "A", _as="A2")
sage: %time A1
sage: %time A2
It's the second lazy import resolution that's interesting: the module is already imported, so in principle it's cheap, but we trigger a GC, so it's not. This is something that could happen a lot with objects that are used all around the place and yet in each place it's decided to do a lazy import (essentially the approach required now). Of course, with the proposed approach it would be better to do A2 = A1 instead, or in general, only lazily import in one eagerly imported module, from where other modules import the lazy import eagerly.
It's important to know what kind of speed penalties are involved with using lazy_import, because it's a speed optimization to begin with.
comment:35 Changed 8 years ago by
As you already said, time is about the time for a gc run:
sage: def lazy_test():
....:     ns = dict()
....:     lazy_import("sage.rings.semirings.all", "NN", "foo", namespace=ns)
....:     ns['foo']._get_object()
sage: %timeit lazy_test()
10 loops, best of 3: 54.1 ms per loop
I don't think we actually need the gc call, I just stuck it in there for reproducibility.
comment:36 Changed 8 years ago by
Without the gc.collect() call:
sage: %timeit lazy_test()
100 loops, best of 3: 13.8 ms per loop
So the time to figure out the referrers is non-negligible compared to the full GC run.
comment:37 Changed 8 years ago by
- Description modified (diff)
The second patch doesn't work with the new doctesting framework that was recently introduced. I'm moving it into #14357 so we can at least resolve the original issue.
comment:38 Changed 8 years ago by
The updated patch adds support for copy() and deepcopy() to lazy imports, which is used in the backtrack.py doctests and caused a failure in Family(NN, ...). This was not triggered before because NN was accidentally resolved during startup.
comment:39 Changed 8 years ago by
Still needs review...
comment:40 Changed 8 years ago by
- Description modified (diff)
- Reviewers changed from Nicolas M. Thiéry to Nicolas M. Thiéry, Travis Scrimshaw
- Status changed from needs_review to positive_review
Looks good to me. I also think this is a good feature to have.
(For those who don't want to read/understand the above discussion: most of it pertains to the now separate ticket #14357.)
comment:41 Changed 8 years ago by
- Milestone changed from sage-5.9 to sage-5.10
comment:42 Changed 8 years ago by
- Status changed from positive_review to needs_work
sage -t devel/sage/sage/misc/dev_tools.py
**********************************************************************
File "devel/sage/sage/misc/dev_tools.py", line 142, in sage.misc.dev_tools.import_statements
Failed example:
    import_statements(NN)
Expected:
    from sage.rings.semirings.non_negative_integer_semiring import NN
Got:
    ** Warning **: several modules for that object: sage.all, sage.all_cmdline, sage.combinat.integer_vectors_mod_permgroup, sage.combinat.partition ...
    from sage.rings.semirings.all import NN
**********************************************************************
comment:43 Changed 8 years ago by
- Status changed from needs_work to positive_review
Oops, that was part of the other patch that I split off into #14357. Fixed.
comment:44 Changed 8 years ago by
- Merged in set to sage-5.10.beta1
- Resolution set to fixed
- Status changed from positive_review to closed
Oh, I like this feature!!!
Could we add a quick test to it? Maybe something like (not tested):
Once done, and assuming that the patch bot goes green, you can put a positive review on my behalf!
Cheers,