Opened 3 years ago

Closed 3 years ago

#24681 closed defect (fixed)

Fix Cython tracebacks on Python 3

Reported by: embray Owned by:
Priority: major Milestone: sage-8.3
Component: cython Keywords:
Cc: Merged in:
Authors: Jeroen Demeyer, Erik Bray Reviewers: Frédéric Chapoton
Report Upstream: Reported upstream. No feedback yet. Work issues:
Branch: 1a9225f (Commits, GitHub, GitLab) Commit: 1a9225f70f8a49428d1874a47252a46791d3746d
Dependencies: Stopgaps:

Status badges

Description (last modified by jdemeyer)

I've been getting to the bottom of why tracebacks aren't displaying Cython sources on Python 3. The problem turns out to be multi-faceted, and fixing it right would also mean fixing some things in general, since the way it works now is a hack.

Basically, the only reason it works at all on Python 2, is that the linecache module on Python 2 has some code which, given a relative filename (with an extension unrecognized by the import system) like sage/rings/integer.pyx, it will search for this file under all sys.path entries and, if found, read the source lines from that file.

This, in turn, only works for Sage because we actually install the .pyx sources in the sage package.

This does not work in Python 3. In linecache.updatecache, before it tries the sys.path search, it checks if the module object has a __loader__, and calls its get_source() method if it exists. On Python 2 this isn't a problem since modules don't necessarily have a __loader__, and in particular extension modules don't. But on the reworked import system in Python 3, pretty much every module has a __loader__--in the case of extension modules an ExtensionFileLoader. But the built-in ExtensionFileLoader of course knows nothing about Cython so its get_source() method just returns None. linecache.updatecache assumes this is correct (why would the loader lie?) and returns.

The simplest way to fix this is to remove the get_source() method from the ExtensionFileLoader class. This way, Python 3 works the same way as Python 2.

Upstream:

Change History (46)

comment:1 Changed 3 years ago by embray

  • Description modified (diff)

comment:2 Changed 3 years ago by embray

  • Branch set to u/embray/python3/cython-source-prototype
  • Commit set to 3388b6606e30c3d3c0fad1566b74c790772dfe1d

Here's a Python 3 only rough prototype of how this might work, in this case via simply installing the .pyx files alongside the compiled modules. Doing anything like inlining the source code in the compiled modules looks like it will be trickier than I thought without some modification to Cython.

This instead recognizes Cython modules by looking for .pyx files associated with extension modules. It then wraps such modules with a CythonModule class which has a __cython_source__ property which can read in the sources. Unfortunately this doesn't yet handle reading sources from .pxd files, but that could probably be added with a bit of elbow grease.

Unfortunately, about 80% of this patch is some very general machinery that I needed just to get Python's import system to handle a new module suffix (in fact, this would even be necessary to override the handling of already known suffixes). This used to be easier to do in Python 2--I've done it before relatively painlessly. But the new import system made that otherwise simple use case much more difficult--I've raised this issue with the python-ideas mailing list. In the meantime my workaround is fine, but an added complication I would have wished to have avoided...


New commits:

3388b66Rough sketch of supporting Cython file sources by way of installing the

comment:3 Changed 3 years ago by git

  • Commit changed from 3388b6606e30c3d3c0fad1566b74c790772dfe1d to 8b63abe731c510f0de9ef0e3ab9a0bda3669dce1

Branch pushed to git repo; I updated commit sha1. New commits:

8b63abeAdd get_source for Cython modules. This is the last bit needed to make tracebacks work properly again.

comment:4 follow-up: Changed 3 years ago by jdemeyer

I have pretty good contacts with Cython upstream (according to github, I am the number 6 Cython contributor). So if you want to propose something, there is a good chance that it will be accepted.

From your description, it seems that the simplest solution would be to add a custom __loader__ for Cython modules.


New commits:

8b63abeAdd get_source for Cython modules. This is the last bit needed to make tracebacks work properly again.

comment:5 in reply to: ↑ description ; follow-up: Changed 3 years ago by jdemeyer

Replying to embray:

In linecache.updatecache, before it tries the sys.path search, it checks if the module object has a __loader__

Which module object? How does the mapping traceback -> module work?

comment:6 in reply to: ↑ 4 ; follow-ups: Changed 3 years ago by embray

Replying to jdemeyer:

I have pretty good contacts with Cython upstream (according to github, I am the number 6 Cython contributor). So if you want to propose something, there is a good chance that it will be accepted.

From your description, it seems that the simplest solution would be to add a custom __loader__ for Cython modules.

I haven't tried it yet, but that might work if the __loader__ were just pre-baked into the extension module. It would definitely be preferable to the import system hacks here. I'll probably give that a try, but I wanted to see if we could make it work without patching Cython first.

comment:7 in reply to: ↑ 5 Changed 3 years ago by embray

Replying to jdemeyer:

Replying to embray:

In linecache.updatecache, before it tries the sys.path search, it checks if the module object has a __loader__

Which module object? How does the mapping traceback -> module work?

Ultimately inspect.findsource, which is used in formatting the traceback, goes through linecache to get the source lines for a file. Well, if you look at how linecache.updatecache is implemented, it's funny--if it's passed a real filename then it will just read the source code directly out of the filename. The filename, in this case, is coming from the traceback frame's code object. However, as in our old friend #24097, since the Cython module's hard-coded path is a relative path, this actually fails unless you happen to be in the right directory. It then goes on to try going through the module's __loader__ if it has one...

comment:8 in reply to: ↑ 6 ; follow-up: Changed 3 years ago by jdemeyer

Replying to embray:

I wanted to see if we could make it work without patching Cython first.

Why? If we can do it with just a Cython patch, that would be ideal because then everybody benefits, not just Sage.

comment:9 follow-up: Changed 3 years ago by jdemeyer

FYI: the traceback frame for Cython code is manually constructed by Cython itself (exceptions from C code typically have no "frame"). So if anything needs to be changed there, that can easily be done.

comment:10 in reply to: ↑ 8 Changed 3 years ago by embray

Replying to jdemeyer:

Replying to embray:

I wanted to see if we could make it work without patching Cython first.

Why? If we can do it with just a Cython patch, that would be ideal because then everybody benefits, not just Sage.

Oh, agree completely. I just didn't want to, as a first pass, depend on one without knowing when such a patch would be available in an official release. Or, at the time of writing, I wasn't even sure if it was a good idea (just because it surprised me that such a feature didn't already exit).

comment:11 in reply to: ↑ 6 ; follow-up: Changed 3 years ago by embray

Replying to embray:

Replying to jdemeyer:

I have pretty good contacts with Cython upstream (according to github, I am the number 6 Cython contributor). So if you want to propose something, there is a good chance that it will be accepted.

From your description, it seems that the simplest solution would be to add a custom __loader__ for Cython modules.

I haven't tried it yet, but that might work if the __loader__ were just pre-baked into the extension module. It would definitely be preferable to the import system hacks here. I'll probably give that a try, but I wanted to see if we could make it work without patching Cython first.

In retrospect, I don't know if this will work after all. It's the import system itself that constructs module objects and assigns to their __loader__ attribute (with the loader that was used to load the module code in the first place). So I do't know that it's possible for a module to come with __loader__ already in its namespace and not just have it overridden. So it might be necessary to have some import hooks after all if we wanted to provide a custom loader for Cython modules.

That said, it would still be possible to provide better introspection of Cython modules if the Cython sources were (optionally) baked into the module somewhere, and that's a separate issue and easy to add to Cython.

comment:12 in reply to: ↑ 9 Changed 3 years ago by embray

Replying to jdemeyer:

FYI: the traceback frame for Cython code is manually constructed by Cython itself (exceptions from C code typically have no "frame"). So if anything needs to be changed there, that can easily be done.

No, I don't think anything needs to be done there (modulo the issue about filename paths, but that's somewhat orthogonal).

comment:13 in reply to: ↑ 11 Changed 3 years ago by jdemeyer

Replying to embray:

In retrospect, I don't know if this will work after all. It's the import system itself that constructs module objects and assigns to their __loader__ attribute (with the loader that was used to load the module code in the first place). So I do't know that it's possible for a module to come with __loader__ already in its namespace and not just have it overridden.

I'll have a look. Isn't the __loader__ just an entry in the module dict? Surely that dict can be changed.

comment:14 Changed 3 years ago by jdemeyer

Simple fix which works for me on a sample project:

del globals()["__loader__"]

It still requires that the sources can be found in sys.path.

I'm not saying that this is the perfect solution, but it gives a starting point that playing with __loader__ works. Now I'll try to implement a custom __loader__.

comment:15 Changed 3 years ago by embray

To my surprise, the import system actually does not, by default override __loader__ if it's already in the module: https://github.com/python/cpython/blob/3.6/Lib/importlib/_bootstrap.py#L504

So I guess you could just put whatever you want in a module's __loader__ attribute. That is, unless that function is called with override=True, and it's not entirely clear to me in what context that would happen, or if this behavior is even documented. In other words, I'm not sure if there's well-defined behavior for this.

comment:16 Changed 3 years ago by jdemeyer

Got it to work!

def error():
    raise RuntimeError("traceback testing")


from importlib.machinery import ExtensionFileLoader

class CythonExtensionLoader(ExtensionFileLoader):
    def __init__(self, name):
        self.name = name

    @property
    def path(self):
        return __file__

    def get_source(self, fullname):
        print("get_source({}) called".format(fullname))
        with open("/usr/local/src/sage-config/local/src/cyloader/cyloader/loader.pyx") as f:
            return f.read()


__loader__ = CythonExtensionLoader(__name__)

By setting a Cython-aware __loader__ when the module is initialized, we can fix its get_source method to do what we want. Calling error() does display the source in the traceback.

comment:17 follow-up: Changed 3 years ago by embray

I see, there is actually some better documentation on this here: https://docs.python.org/3/reference/import.html?highlight=_init_module_attrs#loading

All the attributes set by the import system are set before the module is actually executed. When the module is executed it can, of course, put whatever it wants in its own namespace, including overriding anything the import system put there.

comment:18 Changed 3 years ago by embray

Nice! I wouldn't have thought that would work, but I'm glad it does. A real implementation might also consider wrapping whatever __loader__ is already there.

Now it would be nice to add a flag to Cython to include the source code in the module, but in the meantime my prototype, which just installs the .pyx files alongside the extension modules, could also work.

comment:19 in reply to: ↑ 17 Changed 3 years ago by jdemeyer

Replying to embray:

All the attributes set by the import system are set before the module is actually executed.

No, that's not the case. In my first attempt, I tried to simply change the __class__ of the existing __loader__. But there is no existing __loader__ when the extension module is being initialized.

PEP 489 is certainly relevant here, but Cython does not fully support that yet.

comment:20 Changed 3 years ago by jdemeyer

Hmm, my solution only works in IPython so far, not in plain Python.

comment:21 Changed 3 years ago by jdemeyer

The Python REPL uses the default excepthook to display exceptions. When it cannot find the file, it does something strange: it essentially tries os.path.join(path, basename) where path loops over the sys.path entries and basename is the last part(!) of the filename in the traceback. This goes back to

commit 7169dbb76dcd1d25f2989eb00da33d05c3d6279b
Author: Guido van Rossum <guido@python.org>
Date:   Wed Feb 26 15:17:59 1992 +0000

    Move printing of filename and lineno to tb_displayline.
    Search sys.path if the filename isn't found.
    Include osdefs.h.

comment:22 Changed 3 years ago by jdemeyer

But I guess we care more IPython than Python, so I'm just going to ignore the plain Python REPL for now.

comment:23 Changed 3 years ago by jdemeyer

And the overriding of __loader__ won't work with pxd files: there is no way for the get_source() method to know the filename that it should return the sources for.

comment:24 Changed 3 years ago by jdemeyer

OK, new and super-simple solution which just reverts the Python 3 behaviour to the Python 2 behaviour:

class DeletedAttribute(object):
    def __get__(self, instance, owner):
        raise AttributeError


from importlib.machinery import ExtensionFileLoader
ExtensionFileLoader.get_source = DeletedAttribute()

It effectively removes the get_source attribute from ExtensionFileLoader. This causes the linecache module to continuing searching for the source file as in Python 2.

comment:25 Changed 3 years ago by jdemeyer

  • Branch changed from u/embray/python3/cython-source-prototype to u/jdemeyer/python3/cython-source-prototype

comment:26 Changed 3 years ago by git

  • Commit changed from 8b63abe731c510f0de9ef0e3ab9a0bda3669dce1 to 1a9225f70f8a49428d1874a47252a46791d3746d

Branch pushed to git repo; I updated commit sha1. This was a forced push. New commits:

1a9225fFix Cython tracebacks on Python 3

comment:27 Changed 3 years ago by jdemeyer

  • Authors set to Jeroen Demeyer, Erik Bray
  • Description modified (diff)
  • Status changed from new to needs_review

comment:28 Changed 3 years ago by jdemeyer

The approach with the custom __loader__ can never work because it cannot differentiate between the .pyx file and other files. It assumes that one module has one source file, which is generally not the case for Cython.

So I propose this simple but effective solution of monkey-patching ExtensionFileLoader.

comment:29 Changed 3 years ago by jdemeyer

  • Description modified (diff)
  • Report Upstream changed from N/A to Reported upstream. No feedback yet.

comment:30 Changed 3 years ago by jdemeyer

Regardless of the upstream Python or Cython bug, could you accept this as a working-but-improvable solution?

comment:31 follow-ups: Changed 3 years ago by embray

No, I'm -1 on this, I don't think it's a good solution even if it happens to work. If this were urgent I'd say "sure", but we have time and flexibility here to come up with a better solution.

comment:32 Changed 3 years ago by embray

It is an interesting question concerning the split of some modules between a .pyx and a .pxd. In most cases only the .pyx is relevant or interesting, but some functions are implemented in the .pxd file. This is more rare, however.

comment:33 in reply to: ↑ 31 Changed 3 years ago by jdemeyer

  • Milestone changed from sage-wishlist to sage-8.2

Replying to embray:

No, I'm -1 on this, I don't think it's a good solution even if it happens to work.

Can you at least say why you think that it's not a good solution?

I guess that you would like to use __loader__.get_source() but that won't work, see 28.

comment:34 follow-up: Changed 3 years ago by embray

I mean it's obviously a hack, and an ugly one at that. This also doesn't solve the issue of multiple source files any better. I think we really need to think about how to handle that case well--I think it will require rethinking some things in Cython.

That said, having even just a hack for this is could still be very useful for debugging issues in Python 3, so I've changed my mind and don't really have a problem giving this a try for now as long as we open a new ticket to track this issue (or leave this ticket open and create a new ticket for the workaround).

comment:35 in reply to: ↑ 34 Changed 3 years ago by jdemeyer

Replying to embray:

This also doesn't solve the issue of multiple source files any better.

In what way does it not solve that problem? The whole reason not to use __loader__.get_source() is precisely to support multiple source files from the same module.

comment:36 follow-up: Changed 3 years ago by embray

I must be missing something, but how do we support that case currently? E.g. if some module has a .pxd and a .pyx, and some function in that module is implemented in one of those source files but not the other, how do we currently load the source for that function from the correct source file?

comment:37 in reply to: ↑ 36 ; follow-up: Changed 3 years ago by jdemeyer

Replying to embray:

how do we currently load the source for that function from the correct source file?

The traceback contains the source filename:

sage: try:
....:     1/0
....: except ZeroDivisionError as E:
....:     tb = sys.exc_info()[2]
sage: tb.tb_next.tb_frame.f_code.co_filename
'sage/rings/integer.pyx'

comment:38 in reply to: ↑ 31 Changed 3 years ago by jdemeyer

Replying to embray:

No, I'm -1 on this, I don't think it's a good solution even if it happens to work.

An ugly/hackish solution is still better than no solution at all.

I simply don't see a better solution, not even in theory.

comment:39 in reply to: ↑ 37 ; follow-up: Changed 3 years ago by embray

Replying to jdemeyer:

Replying to embray:

how do we currently load the source for that function from the correct source file?

The traceback contains the source filename:

I see--that's those fake Cython hacked frames with fake code objects. I didn't realize that would include the correct filename even for functions defined in .pxd file but apparently it does.

An ugly/hackish solution is still better than no solution at all.

I agree, as long as we open a new ticket--either a separate ticket from this one for the hack solution, or a new ticket for tracking the issue (I'd prefer the former since closing this ticket means closing some valuable discussion).

I simply don't see a better solution, not even in theory.

I still have some ideas about this, but it would require modification to Cython for sure.

Additionally (and much longer term) a very large part of the problem is the Python Loader API's assumption that a module has a single source file. It seems the assumption is that "for extension modules sure there may be many source files, but it's not useful for Python-level introspection to be able to return them". Sure, for normal extension modules. But the Cython case obviously disproves that in general. I think this is worth taking up but that's not going to help anytime soon.

comment:40 in reply to: ↑ 39 Changed 3 years ago by jdemeyer

Replying to embray:

I still have some ideas about this, but it would require modification to Cython for sure.

It is possible to explain what you are thinking of?

comment:41 Changed 3 years ago by embray

I could try to explain but it's still a bit hand-wavey at the moment. It also comes back around to the issue of subclassing the function type.

comment:42 Changed 3 years ago by jdemeyer

  • Description modified (diff)
  • Milestone changed from sage-8.2 to sage-8.3

comment:43 Changed 3 years ago by jdemeyer

Ping? Having this in Sage would help with the Python 3 porting effort...

comment:44 Changed 3 years ago by jdemeyer

  • Description modified (diff)

If you prefer instead to add the Python patch from https://github.com/python/cpython/pull/6653 that would obviously be fine for me too.

comment:45 Changed 3 years ago by chapoton

  • Reviewers set to Frédéric Chapoton
  • Status changed from needs_review to positive_review

Bot is green.. ok, let it be.

comment:46 Changed 3 years ago by vbraun

  • Branch changed from u/jdemeyer/python3/cython-source-prototype to 1a9225f70f8a49428d1874a47252a46791d3746d
  • Resolution set to fixed
  • Status changed from positive_review to closed
Note: See TracTickets for help on using tickets.