Opened 8 years ago

Closed 8 years ago

#14187 closed enhancement (fixed)

Check that lazy imports are not resolved during startup

Reported by: vbraun Owned by: tbd
Priority: major Milestone: sage-5.10
Component: performance Keywords:
Cc: nthiery, robertwb Merged in: sage-5.10.beta1
Authors: Volker Braun Reviewers: Nicolas M. Thiéry, Travis Scrimshaw
Report Upstream: N/A Work issues:
Branch: Commit:
Dependencies: #12543 Stopgaps:

Status badges

Description (last modified by tscrim)

Print a warning if, during startup, any lazy imports are being resolved. This beats the purpose of lazy imports so we should have a mechanism to prevent us from stepping into that trap.

Apply: trac_14187_lazy_import_test.patch

Attachments (2)

trac_14187_lazy_everywhere.patch (7.0 KB) - added by vbraun 8 years ago.
Version without the gc.collect() call
trac_14187_lazy_import_test.patch (6.0 KB) - added by vbraun 8 years ago.
Updated patch

Download all attachments as: .zip

Change History (46)

comment:1 Changed 8 years ago by vbraun

  • Authors set to Volker Braun
  • Component changed from PLEASE CHANGE to performance
  • Description modified (diff)
  • Type changed from PLEASE CHANGE to enhancement

comment:2 Changed 8 years ago by vbraun

  • Cc nthiery added

comment:3 Changed 8 years ago by nthiery

Oh, I like this feature!!!

Could we add a quick test to it? Maybe something like (not tested):

    sage: sage.misc.lazy_import.startup_guard = True
    sage: lazy_import(something)
    sage: something
    warning ...
    sage: sage.misc.lazy_import.finish_startup()
    sage: sage.misc.lazy_import.finish_startup()
    Assertion failed finish_startup() must be called exactly once

Once done, and assuming that the patch bot goes green, you can put a positive review on my behalf!

Cheers,

Nicolas

comment:4 follow-up: Changed 8 years ago by vbraun

We still need to fix the warning about the FacadeSets?, I thought you'd know what goes wrong.

comment:5 in reply to: ↑ 4 Changed 8 years ago by nthiery

Replying to vbraun:

We still need to fix the warning about the FacadeSets?, I thought you'd know what goes wrong.

I must have missed something. Which ticket is this about?

Never mind; I missed the ticket description ...

Last edited 8 years ago by nthiery (previous) (diff)

comment:6 Changed 8 years ago by nthiery

Ok, I investigated that, and the issue is as follow:

  • On Sage's startup, one create NN which is a facade set (in Sets().Facade())
  • Sets lazily imports FacadeSets? to avoid an import loop: FacadeSets? uses Sets, but we want a reference Sets.Facades -> FacadeSets?

Note that with the upcoming functorial construction that will be a very common idiom even for low level categories like Magma().Commutative() that we certainly will load on startup.

So the problem we are having is that LazyImport? serves to purposes:

  • avoiding to load stuff on Sage's startup
  • break import loops.

Granted, it's not soo nice to setup a safety guard and immediately provide a way to work around it ... but what about adding an explicit option so that one could explicitly disable the warning when we know that the lazy import is really just about breaking an import loop?

LazyImport?('sage.categories.facade_sets', 'FacadeSets?', dont_warn_on_startup=True)

or

LazyImport?('sage.categories.facade_sets', 'FacadeSets?', can_be_evaluated_on_startup=True)

Cheers,

Nicolas

comment:7 follow-up: Changed 8 years ago by vbraun

How about we just use lazy import for NN?

comment:8 in reply to: ↑ 7 Changed 8 years ago by nthiery

Replying to vbraun:

How about we just use lazy import for NN?

Fair enough. Yeah, if that's easily done, go ahead. And then I'll add this option for #10963 if I really need it.

Cheers,

comment:9 Changed 8 years ago by vbraun

  • Reviewers set to Nicolas M. Thiéry
  • Status changed from new to needs_review

I've fixed NN to only lazily import on startup. Also, I changed the user feedback from a warning to printing a traceback, otherwise its hard to figure out where exactly the undesirable import comes from.

comment:10 Changed 8 years ago by vbraun

  • Description modified (diff)

comment:11 Changed 8 years ago by vbraun

All doctests pass now...

comment:12 Changed 8 years ago by vbraun

I've added a Python function to access the value of the startup_guard Cython variable.

comment:13 Changed 8 years ago by nthiery

Hi Volker!

Thanks for the updated patch! I am almost ready to put it a positive review. Just one thing about the import of NN in sage/combinat/integer_vectors_mod_permgroup.py: I think one should repeat the lazy import there rather than importing from all where it's lazy imported. Otherwise, lazy import misses information to complete its job.

Quick example: in a file all.py, put:

from sage.misc.lazy_import import lazy_import
lazy_import('sage.rings.semirings.non_negative_integer_semiring', 'NN')

and in blah.py:

import all
from all import NN

print type(all.NN)
print type(NN)

NN(1)

print type(all.NN)
print type(NN)

With those we get:

sage: import blah
<type 'sage.misc.lazy_import.LazyImport'>
<type 'sage.misc.lazy_import.LazyImport'>
<class 'sage.rings.semirings.non_negative_integer_semiring.NonNegativeIntegerSemiring_with_category'>
<type 'sage.misc.lazy_import.LazyImport'>

Interpretation: upon using NN, lazy import substitutes the lazy import object by NN in the module where it was lazy imported. But not in blah! And that won't happen either if use explicitly the NN from blah:

sage: blah.NN
Non negative integer semiring
sage: type(blah.NN)
sage.misc.lazy_import.LazyImport
sage: blah.NN(1)
1
sage: type(blah.NN)
sage.misc.lazy_import.LazyImport

Btw, you may want to write the output description in is_during_startup as OUTPUT: a boolean, but that's more a question of taste.

Cheers,

comment:14 follow-up: Changed 8 years ago by vbraun

Isn't that just a bug in lazy_import? It should figure out all modules where it is currently imported and replace all occurrences.

comment:15 Changed 8 years ago by vbraun

  • Cc robertwb added

comment:16 in reply to: ↑ 14 Changed 8 years ago by nthiery

Replying to vbraun:

Isn't that just a bug in lazy_import? It should figure out all modules where it is currently imported and replace all occurrences.

I agree, this certainly is a missing feature. But I would not be surprised if figuring out all modules where it's imported could get tricky. I don't remember seeing any hook for an object to take some action when it's imported somewhere.

Anyway, for the problem at hand, let's just do the lazy import explicitly.

Cheers,

Nicolas

comment:17 follow-ups: Changed 8 years ago by vbraun

I disagree, imports are much more legible without the lazy_import call. And being easy to read is more important than a nanosecond saved by avoiding the lazy_import redirect. IMHO only the imports from all.py should use lazy imports, and the not the internal implementation of the Sage library.

comment:18 in reply to: ↑ 17 Changed 8 years ago by nthiery

Replying to vbraun:

I disagree, imports are much more legible without the lazy_import call. And being easy to read is more important than a nanosecond saved by avoiding the lazy_import redirect.

A nanosecond multiplied by the number of times the object is used. Here it's for example used in is_canonical which is a fairly low level method. Also as much as I highly value readibility of code, as much I don't care so much about the readibility of imports (but it's a pain that pyflakes does not handle lazy imports).

Well, just my personal opinion; go ahead in either way you feel better.

comment:19 in reply to: ↑ 17 Changed 8 years ago by nbruin

Replying to vbraun:

I disagree, imports are much more legible without the lazy_import call.

Would we even have a hook to do this? The problem is that lazy_import must register the globals dictionary in which to change the binding. It can do that when lazy_import gets called explicitly, because globals() gives you the appropriate dictionary then.

When you do from all import NN, you get that blah.NN gets bound to the then current binding of all.NN, which is a LazyImport object. I don't think that at any point that LazyImport object gets notified that it is getting bound to an entry in a different globals dictionary, though. So how would LazyImport even know which other dictionaries to mess with in order to resolve the lazy loading? You'd have to reach quite deep into Python's import mechanism.

The "hack-free" solution is to avoid from ... import and, in time critical pieces of code, load "all.NN" into a local variable at runtime (which is faster anyway). Of course, that doesn't solve the readability problem ...

comment:20 follow-up: Changed 8 years ago by vbraun

Just query the referrers to the lazy import, should be easy.

comment:21 in reply to: ↑ 20 Changed 8 years ago by nbruin

Replying to vbraun:

Just query the referrers to the lazy import, should be easy.

Do you mean gc.get_referrers? That should mainly get you a list of dictionaries, indeed. You'd propose to just assume that those are globals dictionaries and change the bindings? That sounds horribly hacky and dangerous to me. Robert can probably comment on the wisdom of this.

There's also the question of how to change the referrers once you've found them. This is not even straightforward for a dict. The reference you're finding should be as a value in the dict. You don't know under which key. You'll have to query the dict for that. That's not going to be cheap (but at least only happens once).

What if you're finding a tuple referring to it? Or a homomorphism instance? What if you're finding a module that has decided to customize itself and has a frozendict as namespace?

Last edited 8 years ago by nbruin (previous) (diff)

comment:22 Changed 8 years ago by vbraun

Just do the replacement for dicts; if your customized module wants to freeze the lazy_import in place then it should be allowed to.

comment:23 Changed 8 years ago by robertwb

gc.get_referrers, clever. I don't see any reason why this shouldn't be fine to use (for dicts, which is the common and important case). More powerful and less hacky than what's there.

comment:24 Changed 8 years ago by nbruin

Wow, it actually works! When I quickly hacked together a trial it resulted in a failure to start up. Some funny dicts showed up, which made me a little worried about this replacement. I see you're disabling gc during the replacement, which I didn't do. Perhaps that's the key.

Anyway, the dictionaries you get back on referrers can be quite big, so ref.items() can be a rather costly construction. If you make it

for k in [k for k,v in ref.iteritems() if v is self]:
    ref[k]=obj

you should end up allocating a much shorter list (note that the key list needs to be made before iterating over it because you should be done iterating over the dict itself prior to modifying it, as you did).

Anyway, if Robert thinks this is less evil than what was happening before, I won't complain either.

comment:25 follow-up: Changed 8 years ago by nthiery

Just being curious: what's the complexity of the gc.getreferers operation?

Also, should we be put of by the following comment on http://docs.python.org/2/library/gc.html:

"Avoid using get_referrers() for any purpose other than debugging"?

comment:26 in reply to: ↑ 25 Changed 8 years ago by nbruin

Replying to nthiery:

"Avoid using get_referrers() for any purpose other than debugging"?

Good catch! I think we should take that very seriously. I think I read that at some point and that's probably why I felt uneasy about it. I think that idea can go out the window then.

comment:27 follow-up: Changed 8 years ago by vbraun

For the record, we are using gc.get_referrers() already in the Sage library.

The alternative is to walk sys.modules for lazy imports into modules and fall back to what we have for class-level lazy imports.

But at the end of the day lazy imports are a hack so why not have the garbage collector do the work for us?

comment:28 follow-up: Changed 8 years ago by nbruin

See this python bug. The following code segfaults:

from gc import get_referrers

def iter():
        tag = object()
        yield tag   # 'tag' gets stored in the result tuple
        lst = [x for x in get_referrers(tag)
               if isinstance(x, tuple)]
        t = lst[0]  # this *is* the result tuple
        print t[3]  # full of nulls !
    
tuple(iter())

Before we use get_referrers in production code you'd need to understand what goes wrong in the above example and guarantee this won't happen for the dictionaries we do replacements on.

I'd think that looming segfaults are a high price to pay for the convenience of writing NN=module.NN rather than lazy_import("other_module",NN).

comment:29 in reply to: ↑ 27 Changed 8 years ago by robertwb

Replying to vbraun:

For the record, we are using gc.get_referrers() already in the Sage library.

The alternative is to walk sys.modules for lazy imports into modules and fall back to what we have for class-level lazy imports.

But at the end of the day lazy imports are a hack so why not have the garbage collector do the work for us?

+1, I see this as a "best effort" kind of a thing anyways.

In terms of efficiency, I think it might make sense to try to attempt a lookup before iterating over the (potentially very large) dict.

comment:30 in reply to: ↑ 28 ; follow-up: Changed 8 years ago by robertwb

Replying to nbruin:

See this python bug. The following code segfaults:

from gc import get_referrers

def iter():
        tag = object()
        yield tag   # 'tag' gets stored in the result tuple
        lst = [x for x in get_referrers(tag)
               if isinstance(x, tuple)]
        t = lst[0]  # this *is* the result tuple
        print t[3]  # full of nulls !
    
tuple(iter())

Before we use get_referrers in production code you'd need to understand what goes wrong in the above example and guarantee this won't happen for the dictionaries we do replacements on.

This is intentionally creating a partially-constructed tuple and using the fact that get_referrers() can refer to it. "partially-constructed" dicts are safe, if we ever run across one.

I'd think that looming segfaults are a high price to pay for the convenience of writing NN=module.NN rather than lazy_import("other_module",NN).

The convenience is in being abel to write "from sage.foo.all import X" and not worry about whether X was (possibly transitively) lazily imported. I think that's a big win.

comment:31 in reply to: ↑ 30 ; follow-up: Changed 8 years ago by nbruin

Replying to robertwb:

This is intentionally creating a partially-constructed tuple and using the fact that get_referrers() can refer to it.

Agreed. It's intentionally exposing one of the dangers. That's the point of an example. We need to convince ourselves that this will never happen with a dictionary. And then probably only do replacements when type(ref) is dict rather than isinstance(ref,dict) and document that lazy imports only work in conventional globals() (one never knows what comes of subclassing).

"partially-constructed" dicts are safe, if we ever run across one.

Do you have a reference that they are? Are you sure it's just partially constructed ones? Note that as soon as a value in a dict we iterate over is NULL, we're done for, since the value fetch will cause an INCREF on that NULL pointer. Lazy imports can be triggered to resolve at funny times, quite possibly when a class definition is being processed. At that time, the dict of that class has not been given out yet, so it would be under construction and I don't immediately see why it should abstain from putting a key with a null value in there already.

For infrastructure like this I'd be rather reluctant to just rely on "it seems to work in all the cases we've tried", especially because it would be nice if this code were robust in the face of python upgrades.

(Concerning cost: the call basically requires a full GC sweep, so it's very expensive. If you can make sure that there are very few lazyimport instances then the cost may be acceptable, because the resolution of any single such instance will trigger a GC)

Note that in sage, a GC can trigger the callback of a WeakValueDict, which can trigger an __eq__ on an arbitrary object, which can possibly trigger the resolution of a LazyImport. Are you still sure dicts are safe to examine under those conditions?

comment:32 in reply to: ↑ 31 Changed 8 years ago by robertwb

Replying to nbruin:

Replying to robertwb:

This is intentionally creating a partially-constructed tuple and using the fact that get_referrers() can refer to it.

Agreed. It's intentionally exposing one of the dangers. That's the point of an example. We need to convince ourselves that this will never happen with a dictionary. And then probably only do replacements when type(ref) is dict rather than isinstance(ref,dict) and document that lazy imports only work in conventional globals() (one never knows what comes of subclassing).

Totally fine with that.

"partially-constructed" dicts are safe, if we ever run across one.

Do you have a reference that they are? Are you sure it's just partially constructed ones? Note that as soon as a value in a dict we iterate over is NULL, we're done for, since the value fetch will cause an INCREF on that NULL pointer. Lazy imports can be triggered to resolve at funny times, quite possibly when a class definition is being processed. At that time, the dict of that class has not been given out yet, so it would be under construction and I don't immediately see why it should abstain from putting a key with a null value in there already.

There's neither an API nor motivation to insert key with a NULL value.

For infrastructure like this I'd be rather reluctant to just rely on "it seems to work in all the cases we've tried", especially because it would be nice if this code were robust in the face of python upgrades.

Agreed.

(Concerning cost: the call basically requires a full GC sweep, so it's very expensive. If you can make sure that there are very few lazyimport instances then the cost may be acceptable, because the resolution of any single such instance will trigger a GC)

Fair enough. I don't think this is a huge problem, it's basically deferred startup. It could be interesting to have the ability to force all lazy imports to be resolved (perhaps even in the background...) but that could be trickier.

Note that in sage, a GC can trigger the callback of a WeakValueDict, which can trigger an __eq__ on an arbitrary object, which can possibly trigger the resolution of a LazyImport. Are you still sure dicts are safe to examine under those conditions?

I don't see why not. For instance, dicts follow the GC protocol of removing themselves from the loop when they're being destructed.

comment:33 Changed 8 years ago by vbraun

  • Dependencies set to #12543

I've tightened the dictionary type check and implemented the optimization where first a list of all relevant keys is created. Also rebased the patch on sage-5.8.beta4.

comment:34 Changed 8 years ago by nbruin

Performance information? In particular, it would be interesting to see:

sage: lazy_import("lazily_imported_module","A",_as="A1")
sage: lazy_import("lazily_imported_module","A",_as="A2")
sage: %time A1
sage: %time A2

It's the second lazy import resolution that's interesting: The module is already imported so in principle it's cheap, but we trigger a GC, so it's not. This is something that could happen a lot with objects that are used all around the place and yet in each place it's decided to do a lazy import (essentially the approach required now). Of course with the proposed approach it would be better to do A2=A1 instead, or in general, only lazily import in one eagerly imported module, from where other modules import the lazy import eagerly.

It's important to know what kind of speed penalties are involved with using lazy_import because it's a speed optimization to begin with.

comment:35 Changed 8 years ago by vbraun

As you already said, time is about the time for a gc run:

sage: def lazy_test():
....:         ns = dict()
....:         lazy_import("sage.rings.semirings.all", "NN", "foo", namespace=ns)
....:         ns['foo']._get_object()
sage: 
sage: %timeit lazy_test()
10 loops, best of 3: 54.1 ms per loop

I don't think we actually need the gc call, I just stuck it in there for reproducability.

comment:36 Changed 8 years ago by vbraun

Without the gc.collect() call:

sage: %timeit lazy_test()
100 loops, best of 3: 13.8 ms per loop

So the time to figure out the referrers is non-negligible compared to the full GC run.

Changed 8 years ago by vbraun

Version without the gc.collect() call

comment:37 Changed 8 years ago by vbraun

  • Description modified (diff)

The second patch doesn't work with the new doctesting framework that was recently introduced. I'm moving it into #14357 so we can at least resolve the original issue.

comment:38 Changed 8 years ago by vbraun

The updated patch adds support for copy() and deepcopy() to lazy imports, which is used in backtrack.py doctests and causes it to fail in Family(NN, ...). This was not triggered before because NN was accidentally resolved during startup.

comment:39 Changed 8 years ago by vbraun

Still needs review...

comment:40 Changed 8 years ago by tscrim

  • Description modified (diff)
  • Reviewers changed from Nicolas M. Thiéry to Nicolas M. Thiéry, Travis Scrimshaw
  • Status changed from needs_review to positive_review

Looks good to me. I also think this is a good feature to have.

(For those who don't want to read/understand the above discussion: most of it pertains to the now separate ticket #14357.)

comment:41 Changed 8 years ago by jdemeyer

  • Milestone changed from sage-5.9 to sage-5.10

comment:42 Changed 8 years ago by jdemeyer

  • Status changed from positive_review to needs_work
sage -t devel/sage/sage/misc/dev_tools.py
**********************************************************************
File "devel/sage/sage/misc/dev_tools.py", line 142, in sage.misc.dev_tools.import_statements
Failed example:
    import_statements(NN)
Expected:
    from sage.rings.semirings.non_negative_integer_semiring import NN
Got:
      ** Warning **: several modules for that object: sage.all, sage.all_cmdline, sage.combinat.integer_vectors_mod_permgroup, sage.combinat.partition ...
    from sage.rings.semirings.all import NN
**********************************************************************

Changed 8 years ago by vbraun

Updated patch

comment:43 Changed 8 years ago by vbraun

  • Status changed from needs_work to positive_review

Oops, that was part of the other patch that I split off into #14357. Fixed.

comment:44 Changed 8 years ago by jdemeyer

  • Merged in set to sage-5.10.beta1
  • Resolution set to fixed
  • Status changed from positive_review to closed
Note: See TracTickets for help on using tickets.