Opened 4 years ago

Closed 4 years ago

#24097 closed defect (fixed)

Broken paths in cython()'d modules

Reported by: embray Owned by:
Priority: critical Milestone: sage-8.1
Component: misc Keywords: sageinspect cython cythonize
Cc: jdemeyer, fbissey Merged in:
Authors: Erik Bray Reviewers: Jeroen Demeyer
Report Upstream: N/A Work issues:
Branch: 1cf56a5 (Commits, GitHub, GitLab) Commit: 1cf56a587a15c89b9b759fa327fc255e7779eb6c
Dependencies: #24088 Stopgaps:

Status badges

Description

As discussed on the mailing list, when cythonizing a module, the hard-coded path to that file saved in the module is written as a relative path if the current working directory is somewhere along the path that the module file resides.

That is, if the cython module is at

/home/sage/.sage/temp/cython/mymodule.pyx

and the current working directory is /home/sage, then the module's path is written as

.sage/temp/cython/mymodule.pyx

.

This breaks sage.misc.sageinspect._extract_embedded_position which assumes that files generated with cython() are relative to the SPYX_TMP variable.

This could be considered a defect in Cython, but it's also easy to work around for now by chdir()-ing into the SPYX_TMP directory before running cythonize().

Change History (38)

comment:1 Changed 4 years ago by embray

  • Authors set to Erik Bray
  • Branch set to u/embray/misc/cython-path-bug
  • Commit set to f6dd73f77a132d790f7a0092b25403eb69f50805
  • Priority changed from major to critical
  • Status changed from new to needs_review

Marked as critical just because this regression prevents new builds of the docker container from being pushed, since the script that does so requires all tests to pass (though perhaps this requirement could be relaxed somewhat for pre-releases...)


New commits:

c90fe7fAdd some dependencies for ipykernel
65fa4c0Add a restore_cwd context manager
f6dd73fFixes https://trac.sagemath.org/ticket/24097 and includes a regression test

comment:2 Changed 4 years ago by embray

  • Dependencies set to #24088

Depends on #24088 only because I can't build without it.

comment:3 follow-up: Changed 4 years ago by chapoton

missing :: in the doc in sage inspect

comment:4 Changed 4 years ago by jdemeyer

What does the docker container do differently that causes this bug to only appear there? Or in other words, how could I reproduce this bug?

comment:5 Changed 4 years ago by jdemeyer

OK, after reading your mailing list post again, I understand the problem. It happens when you rundoctest from $HOME.

But I think the bug is really in _extract_embedded_position: it shouldn't assume that the path is relative to SPYX_TMP. In other words, the following patch also fixes the problem for me and I think it is a better solution:

  • src/sage/misc/sageinspect.py

    diff --git a/src/sage/misc/sageinspect.py b/src/sage/misc/sageinspect.py
    index c8e79e0..2628e07 100644
    a b def _extract_embedded_position(docstring): 
    222222        return None
    223223
    224224    raw_filename = res.group('FILENAME')
    225     filename = os.path.join(SAGE_SRC, raw_filename)
    226     if not os.path.isfile(filename):
    227         from sage.misc.misc import SPYX_TMP
    228         filename = os.path.join(SPYX_TMP, '_'.join(raw_filename.split('_')[:-1]), raw_filename)
     225    if os.path.isfile(raw_filename):
     226        filename = os.path.abspath(raw_filename)
     227    else:
     228        # Maybe the path is relative to SAGE_SRC
     229        sagesrcfile = os.path.join(SAGE_SRC, raw_filename)
     230        if os.path.isfile(sagesrcfile):
     231            filename = sagesrcfile
     232        else:
     233            # No idea, keep raw path
     234            filename = raw_filename
    229235    lineno = int(res.group('LINENO'))
    230236    original = res.group('ORIGINAL')
    231237    return (original, filename, lineno)

comment:6 follow-ups: Changed 4 years ago by embray

Hmm--I don't particularly like this either. If I understand your point, it's that this code in _extract_embedded_position should be able to work on arbitrary modules. But then, if it's a relative path, there's really know way of knowing where it's relative to. In fact it's making a wild guess when it checks if it's relative to SAGE_SRC. How would this handle the case where it's relative to SPYX_TMP?

This if os.path.isfile(raw_filename): filename = os.path.abspath(raw_filename) works if the filename is already an absolute path or if the file is a relative path and we just happen to be in the directory it's relative to. For example, you could cython() a function, then change directories, then try to inspect the function and this would fail.

I think for now my solution is better because it explicitly addresses correct handling of modules that were compiled by Sage's cython() function.

comment:7 in reply to: ↑ 3 Changed 4 years ago by embray

Replying to chapoton:

missing :: in the doc in sage inspect

Oh I see what you're saying. I'll fix that.

comment:8 Changed 4 years ago by git

  • Commit changed from f6dd73f77a132d790f7a0092b25403eb69f50805 to a43134ce20db0ba1463efdd3ff539a7dddcca5af

Branch pushed to git repo; I updated commit sha1. New commits:

a43134cFix docstring

comment:9 in reply to: ↑ 6 ; follow-up: Changed 4 years ago by jdemeyer

Replying to embray:

In fact it's making a wild guess when it checks if it's relative to SAGE_SRC.

That is not a wild guess. This is correct for functions in the Sage library, which is the most important use case.

comment:10 in reply to: ↑ 6 ; follow-up: Changed 4 years ago by jdemeyer

Replying to embray:

For example, you could cython() a function, then change directories, then try to inspect the function and this would fail.

OK, I see your point and I agree.

But then I have some further comments:

  1. Instead of changing the directory to target_dir and then using some dubious code in _extract_embedded_position to reconstruct the path, why not change directory to some other directory (say SAGE_LOCAL) to force Cython to write an absolute path in the first place?
  1. Only the cythonize() call should be inside the restore_cwd() context, not the whole try/except block.

comment:11 in reply to: ↑ 9 Changed 4 years ago by embray

Replying to jdemeyer:

Replying to embray:

In fact it's making a wild guess when it checks if it's relative to SAGE_SRC.

That is not a wild guess. This is correct for functions in the Sage library, which is the most important use case.

Sorry, I was unclear. It's not a "wild guess" and I agree that part of the code is actually fine. All I meant was that it's still a heuristic approach that can in theory be wrong (but probably isn't).

comment:12 in reply to: ↑ 10 ; follow-ups: Changed 4 years ago by embray

Replying to jdemeyer:

Replying to embray:

For example, you could cython() a function, then change directories, then try to inspect the function and this would fail.

OK, I see your point and I agree.

But then I have some further comments:

  1. Instead of changing the directory to target_dir and then using some dubious code in _extract_embedded_position to reconstruct the path, why not change directory to some other directory (say SAGE_LOCAL) to force Cython to write an absolute path in the first place?

I considered that, but it seemed like more of a hack around Cython. The current approach feels more explicit, along the lines of how we guess a path might be relative to SAGE_SRC. I'd be fine with either approach if you have strong feelings about it but I like this better and I don't think it's all that dubious.

  1. Only the cythonize() call should be inside the restore_cwd() context, not the whole try/except block.

Heh, I actually had a small debate with myself about this. I'm not saying I disagree but why do you think it should be inverted?

comment:13 in reply to: ↑ 12 ; follow-up: Changed 4 years ago by jdemeyer

Replying to embray:

I'm not saying I disagree but why do you think it should be inverted?

Because it is easier to understand if the minimal amount of code is inside the context. Otherwise, you give the impression that the except: block should be executed also in a specific directory.

comment:14 in reply to: ↑ 12 ; follow-up: Changed 4 years ago by jdemeyer

Replying to embray:

I considered that, but it seemed like more of a hack around Cython. The current approach feels more explicit, along the lines of how we guess a path might be relative to SAGE_SRC. I'd be fine with either approach if you have strong feelings about it but I like this better and I don't think it's all that dubious.

Ideally, Cython should just use absolute paths. That was, no hacks in _extract_embedded_position are needed.

comment:15 in reply to: ↑ 14 Changed 4 years ago by embray

Replying to jdemeyer:

Replying to embray:

I considered that, but it seemed like more of a hack around Cython. The current approach feels more explicit, along the lines of how we guess a path might be relative to SAGE_SRC. I'd be fine with either approach if you have strong feelings about it but I like this better and I don't think it's all that dubious.

Ideally, Cython should just use absolute paths. That was, no hacks in _extract_embedded_position are needed.

Well, yes, but we need this worked around now and I don't want to wait for/depend on a fix to Cython.

comment:16 in reply to: ↑ 13 ; follow-up: Changed 4 years ago by embray

Replying to jdemeyer:

Replying to embray:

I'm not saying I disagree but why do you think it should be inverted?

Because it is easier to understand if the minimal amount of code is inside the context. Otherwise, you give the impression that the except: block should be executed also in a specific directory.

True, but you could also argue the other way around it makes less clear exactly what the exception handling pertains to. In this case it's clear enough, but it's also just as clear that there's nothing in the exception handling that's dependent on the current directory.

Perhaps in a case like this the context manager is almost more of a distraction and deserves to be simply a finally: statement. I just added it because I've written the same context manager almost every time I've wanted some code that temporarily changes directories.

comment:17 in reply to: ↑ 16 ; follow-up: Changed 4 years ago by jdemeyer

Replying to embray:

Perhaps in a case like this the context manager is almost more of a distraction and deserves to be simply a finally: statement.

Actually, that makes a lot of sense here, especially because we already have a try block.

comment:18 Changed 4 years ago by fbissey

  • Cc fbissey added

comment:19 in reply to: ↑ 17 Changed 4 years ago by embray

Replying to jdemeyer:

Replying to embray:

Perhaps in a case like this the context manager is almost more of a distraction and deserves to be simply a finally: statement.

Actually, that makes a lot of sense here, especially because we already have a try block.

Indeed; I'll just drop the context manager for now and maybe it add it back in later.

comment:20 Changed 4 years ago by embray

On second thought, I ended up keeping the context manager as a utility, as it's useful for implementing the regression test (indeed most of the times I find myself in need of such a utility it's for testing). So I only used it in the doctest for now.

comment:21 Changed 4 years ago by embray

  • Branch changed from u/embray/misc/cython-path-bug to u/embray/misc/cython-path-bug-2
  • Commit changed from a43134ce20db0ba1463efdd3ff539a7dddcca5af to e03cf202160df6a8d619363ad7405e9dd443eb1c

Kept basically the same logic in sageinspect for now, but I think clarified the approach somewhat. Still open to other ideas, including just forcing Cython to write an absolute path.


New commits:

3662dbdFixes https://trac.sagemath.org/ticket/24097 and includes a regression test
e03cf20Make the logic for how this funcion handles relative paths a bit clearer

comment:22 Changed 4 years ago by git

  • Commit changed from e03cf202160df6a8d619363ad7405e9dd443eb1c to 0a1259e4237f72feae65b6118116f3defa0b5052

Branch pushed to git repo; I updated commit sha1. New commits:

0a1259eThis didn't work as advertised if the filename relative to SPYX_TMP was not

comment:23 follow-up: Changed 4 years ago by jdemeyer

Funny that you put os.chdir(target_dir) inside the try. I remember a long discussion about that :-)

comment:24 in reply to: ↑ 23 Changed 4 years ago by embray

Replying to jdemeyer:

Funny that you put os.chdir(target_dir) inside the try. I remember a long discussion about that :-)

I thought about that.... I did it because I didn't want to have the discussion again.

comment:25 follow-up: Changed 4 years ago by embray

So is this fine for now?

comment:26 in reply to: ↑ 25 Changed 4 years ago by jdemeyer

Replying to embray:

So is this fine for now?

I still think that this would be better fixed upstream in Cython (I just wrote to cython-devel about it). How important/urgent is this ticket?

comment:27 Changed 4 years ago by embray

It is urgent because it's a regression in the tests. With the fix the overall approach is no better/worse than it was before except that it doesn't break due to the user's current directory so it's an obvious improvement.

Last edited 4 years ago by embray (previous) (diff)

comment:28 Changed 4 years ago by embray

In other words, I'm going to change the default behavior of the docker images just to work around a minor regression in the test suite that is demonstrably fixed.

comment:29 Changed 4 years ago by jdemeyer

  • Reviewers set to Jeroen Demeyer

I can live with this as work-around. I still think that it should be fixed in a better way in the future.

Did you verify that your docker image works with the latest version of this ticket?

comment:30 Changed 4 years ago by embray

Yes, in fact that was the first place I tested it. Feel free to open a follow-up ticket if you think it should be updated later.

comment:31 Changed 4 years ago by jdemeyer

  • Status changed from needs_review to positive_review

comment:32 Changed 4 years ago by jdemeyer

  • Status changed from positive_review to needs_work

Sorry, one final detail: I don't like to add more stuff to sage.misc.misc. Would you agree to add restore_cwd to sage.misc.sage_ostools (or some other more specific module)?

comment:33 Changed 4 years ago by embray

I certainly don't mind moving it (I would suggest breaking up sage.misc.misc even further eventually). I just wasn't sure where would be better. I don't know about sage_ostools (do we really need "sage" in the name of that module? It seems superfluous...)

Instead ISTM there's a case for some kind of pathutils module for filename/path-related utilities.

comment:34 Changed 4 years ago by jdemeyer

  • Branch changed from u/embray/misc/cython-path-bug-2 to u/jdemeyer/misc/cython-path-bug-2

comment:35 Changed 4 years ago by jdemeyer

  • Commit changed from 0a1259e4237f72feae65b6118116f3defa0b5052 to 1cf56a587a15c89b9b759fa327fc255e7779eb6c
  • Status changed from needs_work to positive_review

New commits:

fa989e3Add a restore_cwd context manager
1cf56a5Fixes https://trac.sagemath.org/ticket/24097 and includes a regression test

comment:36 Changed 4 years ago by jdemeyer

Moved to sage_ostools.

comment:37 Changed 4 years ago by embray

I'm not crazy about "sage_ostools", but whatever works.

comment:38 Changed 4 years ago by vbraun

  • Branch changed from u/jdemeyer/misc/cython-path-bug-2 to 1cf56a587a15c89b9b759fa327fc255e7779eb6c
  • Resolution set to fixed
  • Status changed from positive_review to closed
Note: See TracTickets for help on using tickets.