Opened 3 years ago
Closed 3 years ago
#24681 closed defect (fixed)
Fix Cython tracebacks on Python 3
Reported by: | embray | Owned by: | |
---|---|---|---|
Priority: | major | Milestone: | sage-8.3 |
Component: | cython | Keywords: | |
Cc: | Merged in: | ||
Authors: | Jeroen Demeyer, Erik Bray | Reviewers: | Frédéric Chapoton |
Report Upstream: | Reported upstream. No feedback yet. | Work issues: | |
Branch: | 1a9225f (Commits, GitHub, GitLab) | Commit: | 1a9225f70f8a49428d1874a47252a46791d3746d |
Dependencies: | Stopgaps: |
Description (last modified by )
I've been getting to the bottom of why tracebacks aren't displaying Cython sources on Python 3. The problem turns out to be multi-faceted, and fixing it right would also mean fixing some things in general, since the way it works now is a hack.
Basically, the only reason it works at all on Python 2, is that the linecache
module on Python 2 has some code which, given a relative filename (with an extension unrecognized by the import system) like sage/rings/integer.pyx
, it will search for this file under all sys.path
entries and, if found, read the source lines from that file.
This, in turn, only works for Sage because we actually install the .pyx
sources in the sage package.
This does not work in Python 3. In linecache.updatecache
, before it tries the sys.path
search, it checks if the module object has a __loader__
, and calls its get_source()
method if it exists. On Python 2 this isn't a problem since modules don't necessarily have a __loader__
, and in particular extension modules don't. But on the reworked import system in Python 3, pretty much every module has a __loader__
--in the case of extension modules an ExtensionFileLoader
. But the built-in ExtensionFileLoader
of course knows nothing about Cython so its get_source()
method just returns None
. linecache.updatecache
assumes this is correct (why would the loader lie?) and returns.
The simplest way to fix this is to remove the get_source()
method from the ExtensionFileLoader
class. This way, Python 3 works the same way as Python 2.
Upstream:
Change History (46)
comment:1 Changed 3 years ago by
- Description modified (diff)
comment:2 Changed 3 years ago by
- Branch set to u/embray/python3/cython-source-prototype
- Commit set to 3388b6606e30c3d3c0fad1566b74c790772dfe1d
comment:3 Changed 3 years ago by
- Commit changed from 3388b6606e30c3d3c0fad1566b74c790772dfe1d to 8b63abe731c510f0de9ef0e3ab9a0bda3669dce1
Branch pushed to git repo; I updated commit sha1. New commits:
8b63abe | Add get_source for Cython modules. This is the last bit needed to make tracebacks work properly again.
|
comment:4 follow-up: ↓ 6 Changed 3 years ago by
I have pretty good contacts with Cython upstream (according to github, I am the number 6 Cython contributor). So if you want to propose something, there is a good chance that it will be accepted.
From your description, it seems that the simplest solution would be to add a custom __loader__
for Cython modules.
New commits:
8b63abe | Add get_source for Cython modules. This is the last bit needed to make tracebacks work properly again.
|
comment:5 in reply to: ↑ description ; follow-up: ↓ 7 Changed 3 years ago by
Replying to embray:
In
linecache.updatecache
, before it tries thesys.path
search, it checks if the module object has a__loader__
Which module object? How does the mapping traceback -> module work?
comment:6 in reply to: ↑ 4 ; follow-ups: ↓ 8 ↓ 11 Changed 3 years ago by
Replying to jdemeyer:
I have pretty good contacts with Cython upstream (according to github, I am the number 6 Cython contributor). So if you want to propose something, there is a good chance that it will be accepted.
From your description, it seems that the simplest solution would be to add a custom
__loader__
for Cython modules.
I haven't tried it yet, but that might work if the __loader__
were just pre-baked into the extension module. It would definitely be preferable to the import system hacks here. I'll probably give that a try, but I wanted to see if we could make it work without patching Cython first.
comment:7 in reply to: ↑ 5 Changed 3 years ago by
Replying to jdemeyer:
Replying to embray:
In
linecache.updatecache
, before it tries thesys.path
search, it checks if the module object has a__loader__
Which module object? How does the mapping traceback -> module work?
Ultimately inspect.findsource
, which is used in formatting the traceback, goes through linecache
to get the source lines for a file. Well, if you look at how linecache.updatecache is implemented, it's funny--if it's passed a real filename then it will just read the source code directly out of the filename. The filename, in this case, is coming from the traceback frame's code object. However, as in our old friend #24097, since the Cython module's hard-coded path is a relative path, this actually fails unless you happen to be in the right directory. It then goes on to try going through the module's __loader__
if it has one...
comment:8 in reply to: ↑ 6 ; follow-up: ↓ 10 Changed 3 years ago by
Replying to embray:
I wanted to see if we could make it work without patching Cython first.
Why? If we can do it with just a Cython patch, that would be ideal because then everybody benefits, not just Sage.
comment:9 follow-up: ↓ 12 Changed 3 years ago by
FYI: the traceback frame for Cython code is manually constructed by Cython itself (exceptions from C code typically have no "frame"). So if anything needs to be changed there, that can easily be done.
comment:10 in reply to: ↑ 8 Changed 3 years ago by
Replying to jdemeyer:
Replying to embray:
I wanted to see if we could make it work without patching Cython first.
Why? If we can do it with just a Cython patch, that would be ideal because then everybody benefits, not just Sage.
Oh, agree completely. I just didn't want to, as a first pass, depend on one without knowing when such a patch would be available in an official release. Or, at the time of writing, I wasn't even sure if it was a good idea (just because it surprised me that such a feature didn't already exit).
comment:11 in reply to: ↑ 6 ; follow-up: ↓ 13 Changed 3 years ago by
Replying to embray:
Replying to jdemeyer:
I have pretty good contacts with Cython upstream (according to github, I am the number 6 Cython contributor). So if you want to propose something, there is a good chance that it will be accepted.
From your description, it seems that the simplest solution would be to add a custom
__loader__
for Cython modules.I haven't tried it yet, but that might work if the
__loader__
were just pre-baked into the extension module. It would definitely be preferable to the import system hacks here. I'll probably give that a try, but I wanted to see if we could make it work without patching Cython first.
In retrospect, I don't know if this will work after all. It's the import system itself that constructs module
objects and assigns to their __loader__
attribute (with the loader that was used to load the module code in the first place). So I do't know that it's possible for a module to come with __loader__
already in its namespace and not just have it overridden. So it might be necessary to have some import hooks after all if we wanted to provide a custom loader for Cython modules.
That said, it would still be possible to provide better introspection of Cython modules if the Cython sources were (optionally) baked into the module somewhere, and that's a separate issue and easy to add to Cython.
comment:12 in reply to: ↑ 9 Changed 3 years ago by
Replying to jdemeyer:
FYI: the traceback frame for Cython code is manually constructed by Cython itself (exceptions from C code typically have no "frame"). So if anything needs to be changed there, that can easily be done.
No, I don't think anything needs to be done there (modulo the issue about filename paths, but that's somewhat orthogonal).
comment:13 in reply to: ↑ 11 Changed 3 years ago by
Replying to embray:
In retrospect, I don't know if this will work after all. It's the import system itself that constructs
module
objects and assigns to their__loader__
attribute (with the loader that was used to load the module code in the first place). So I do't know that it's possible for a module to come with__loader__
already in its namespace and not just have it overridden.
I'll have a look. Isn't the __loader__
just an entry in the module dict? Surely that dict can be changed.
comment:14 Changed 3 years ago by
Simple fix which works for me on a sample project:
del globals()["__loader__"]
It still requires that the sources can be found in sys.path
.
I'm not saying that this is the perfect solution, but it gives a starting point that playing with __loader__
works. Now I'll try to implement a custom __loader__
.
comment:15 Changed 3 years ago by
To my surprise, the import system actually does not, by default override __loader__
if it's already in the module: https://github.com/python/cpython/blob/3.6/Lib/importlib/_bootstrap.py#L504
So I guess you could just put whatever you want in a module's __loader__
attribute. That is, unless that function is called with override=True
, and it's not entirely clear to me in what context that would happen, or if this behavior is even documented. In other words, I'm not sure if there's well-defined behavior for this.
comment:16 Changed 3 years ago by
Got it to work!
def error(): raise RuntimeError("traceback testing") from importlib.machinery import ExtensionFileLoader class CythonExtensionLoader(ExtensionFileLoader): def __init__(self, name): self.name = name @property def path(self): return __file__ def get_source(self, fullname): print("get_source({}) called".format(fullname)) with open("/usr/local/src/sage-config/local/src/cyloader/cyloader/loader.pyx") as f: return f.read() __loader__ = CythonExtensionLoader(__name__)
By setting a Cython-aware __loader__
when the module is initialized, we can fix its get_source
method to do what we want. Calling error()
does display the source in the traceback.
comment:17 follow-up: ↓ 19 Changed 3 years ago by
I see, there is actually some better documentation on this here: https://docs.python.org/3/reference/import.html?highlight=_init_module_attrs#loading
All the attributes set by the import system are set before the module is actually executed. When the module is executed it can, of course, put whatever it wants in its own namespace, including overriding anything the import system put there.
comment:18 Changed 3 years ago by
Nice! I wouldn't have thought that would work, but I'm glad it does. A real implementation might also consider wrapping whatever __loader__
is already there.
Now it would be nice to add a flag to Cython to include the source code in the module, but in the meantime my prototype, which just installs the .pyx files alongside the extension modules, could also work.
comment:19 in reply to: ↑ 17 Changed 3 years ago by
Replying to embray:
All the attributes set by the import system are set before the module is actually executed.
No, that's not the case. In my first attempt, I tried to simply change the __class__
of the existing __loader__
. But there is no existing __loader__
when the extension module is being initialized.
PEP 489 is certainly relevant here, but Cython does not fully support that yet.
comment:20 Changed 3 years ago by
Hmm, my solution only works in IPython so far, not in plain Python.
comment:21 Changed 3 years ago by
The Python REPL uses the default excepthook
to display exceptions. When it cannot find the file, it does something strange: it essentially tries os.path.join(path, basename)
where path
loops over the sys.path
entries and basename
is the last part(!) of the filename in the traceback. This goes back to
commit 7169dbb76dcd1d25f2989eb00da33d05c3d6279b Author: Guido van Rossum <guido@python.org> Date: Wed Feb 26 15:17:59 1992 +0000 Move printing of filename and lineno to tb_displayline. Search sys.path if the filename isn't found. Include osdefs.h.
comment:22 Changed 3 years ago by
But I guess we care more IPython than Python, so I'm just going to ignore the plain Python REPL for now.
comment:23 Changed 3 years ago by
And the overriding of __loader__
won't work with pxd files: there is no way for the get_source()
method to know the filename that it should return the sources for.
comment:24 Changed 3 years ago by
OK, new and super-simple solution which just reverts the Python 3 behaviour to the Python 2 behaviour:
class DeletedAttribute(object): def __get__(self, instance, owner): raise AttributeError from importlib.machinery import ExtensionFileLoader ExtensionFileLoader.get_source = DeletedAttribute()
It effectively removes the get_source
attribute from ExtensionFileLoader
. This causes the linecache
module to continuing searching for the source file as in Python 2.
comment:25 Changed 3 years ago by
- Branch changed from u/embray/python3/cython-source-prototype to u/jdemeyer/python3/cython-source-prototype
comment:26 Changed 3 years ago by
- Commit changed from 8b63abe731c510f0de9ef0e3ab9a0bda3669dce1 to 1a9225f70f8a49428d1874a47252a46791d3746d
Branch pushed to git repo; I updated commit sha1. This was a forced push. New commits:
1a9225f | Fix Cython tracebacks on Python 3
|
comment:27 Changed 3 years ago by
- Description modified (diff)
- Status changed from new to needs_review
comment:28 Changed 3 years ago by
The approach with the custom __loader__
can never work because it cannot differentiate between the .pyx
file and other files. It assumes that one module has one source file, which is generally not the case for Cython.
So I propose this simple but effective solution of monkey-patching ExtensionFileLoader
.
comment:29 Changed 3 years ago by
- Description modified (diff)
- Report Upstream changed from N/A to Reported upstream. No feedback yet.
comment:30 Changed 3 years ago by
Regardless of the upstream Python or Cython bug, could you accept this as a working-but-improvable solution?
comment:31 follow-ups: ↓ 33 ↓ 38 Changed 3 years ago by
No, I'm -1 on this, I don't think it's a good solution even if it happens to work. If this were urgent I'd say "sure", but we have time and flexibility here to come up with a better solution.
comment:32 Changed 3 years ago by
It is an interesting question concerning the split of some modules between a .pyx and a .pxd. In most cases only the .pyx is relevant or interesting, but some functions are implemented in the .pxd file. This is more rare, however.
comment:33 in reply to: ↑ 31 Changed 3 years ago by
- Milestone changed from sage-wishlist to sage-8.2
comment:34 follow-up: ↓ 35 Changed 3 years ago by
I mean it's obviously a hack, and an ugly one at that. This also doesn't solve the issue of multiple source files any better. I think we really need to think about how to handle that case well--I think it will require rethinking some things in Cython.
That said, having even just a hack for this is could still be very useful for debugging issues in Python 3, so I've changed my mind and don't really have a problem giving this a try for now as long as we open a new ticket to track this issue (or leave this ticket open and create a new ticket for the workaround).
comment:35 in reply to: ↑ 34 Changed 3 years ago by
Replying to embray:
This also doesn't solve the issue of multiple source files any better.
In what way does it not solve that problem? The whole reason not to use __loader__.get_source()
is precisely to support multiple source files from the same module.
comment:36 follow-up: ↓ 37 Changed 3 years ago by
I must be missing something, but how do we support that case currently? E.g. if some module has a .pxd
and a .pyx
, and some function in that module is implemented in one of those source files but not the other, how do we currently load the source for that function from the correct source file?
comment:37 in reply to: ↑ 36 ; follow-up: ↓ 39 Changed 3 years ago by
Replying to embray:
how do we currently load the source for that function from the correct source file?
The traceback contains the source filename:
sage: try: ....: 1/0 ....: except ZeroDivisionError as E: ....: tb = sys.exc_info()[2] sage: tb.tb_next.tb_frame.f_code.co_filename 'sage/rings/integer.pyx'
comment:38 in reply to: ↑ 31 Changed 3 years ago by
Replying to embray:
No, I'm -1 on this, I don't think it's a good solution even if it happens to work.
An ugly/hackish solution is still better than no solution at all.
I simply don't see a better solution, not even in theory.
comment:39 in reply to: ↑ 37 ; follow-up: ↓ 40 Changed 3 years ago by
Replying to jdemeyer:
Replying to embray:
how do we currently load the source for that function from the correct source file?
The traceback contains the source filename:
I see--that's those fake Cython hacked frames with fake code objects. I didn't realize that would include the correct filename even for functions defined in .pxd
file but apparently it does.
An ugly/hackish solution is still better than no solution at all.
I agree, as long as we open a new ticket--either a separate ticket from this one for the hack solution, or a new ticket for tracking the issue (I'd prefer the former since closing this ticket means closing some valuable discussion).
I simply don't see a better solution, not even in theory.
I still have some ideas about this, but it would require modification to Cython for sure.
Additionally (and much longer term) a very large part of the problem is the Python Loader API's assumption that a module has a single source file. It seems the assumption is that "for extension modules sure there may be many source files, but it's not useful for Python-level introspection to be able to return them". Sure, for normal extension modules. But the Cython case obviously disproves that in general. I think this is worth taking up but that's not going to help anytime soon.
comment:40 in reply to: ↑ 39 Changed 3 years ago by
Replying to embray:
I still have some ideas about this, but it would require modification to Cython for sure.
It is possible to explain what you are thinking of?
comment:41 Changed 3 years ago by
I could try to explain but it's still a bit hand-wavey at the moment. It also comes back around to the issue of subclassing the function type.
comment:42 Changed 3 years ago by
- Description modified (diff)
- Milestone changed from sage-8.2 to sage-8.3
comment:43 Changed 3 years ago by
Ping? Having this in Sage would help with the Python 3 porting effort...
comment:44 Changed 3 years ago by
- Description modified (diff)
If you prefer instead to add the Python patch from https://github.com/python/cpython/pull/6653 that would obviously be fine for me too.
comment:45 Changed 3 years ago by
- Reviewers set to Frédéric Chapoton
- Status changed from needs_review to positive_review
Bot is green.. ok, let it be.
comment:46 Changed 3 years ago by
- Branch changed from u/jdemeyer/python3/cython-source-prototype to 1a9225f70f8a49428d1874a47252a46791d3746d
- Resolution set to fixed
- Status changed from positive_review to closed
Here's a Python 3 only rough prototype of how this might work, in this case via simply installing the
.pyx
files alongside the compiled modules. Doing anything like inlining the source code in the compiled modules looks like it will be trickier than I thought without some modification to Cython.This instead recognizes Cython modules by looking for
.pyx
files associated with extension modules. It then wraps such modules with aCythonModule
class which has a__cython_source__
property which can read in the sources. Unfortunately this doesn't yet handle reading sources from.pxd
files, but that could probably be added with a bit of elbow grease.Unfortunately, about 80% of this patch is some very general machinery that I needed just to get Python's import system to handle a new module suffix (in fact, this would even be necessary to override the handling of already known suffixes). This used to be easier to do in Python 2--I've done it before relatively painlessly. But the new import system made that otherwise simple use case much more difficult--I've raised this issue with the python-ideas mailing list. In the meantime my workaround is fine, but an added complication I would have wished to have avoided...
New commits:
Rough sketch of supporting Cython file sources by way of installing the