Opened 3 years ago

Last modified 2 years ago

#22655 needs_work enhancement

Support package_data-like of non-Python resource files in Python packages

Reported by: embray Owned by:
Priority: major Milestone: sage-8.0
Component: build Keywords:
Cc: mkoeppe Merged in:
Authors: Erik Bray Reviewers:
Report Upstream: N/A Work issues:
Branch: u/embray/build/package-data (Commits) Commit: 5d3065295426415c4d267bb2dae91741f078dcc3
Dependencies: Stopgaps:

Description (last modified by embray)

This enhances sage_setup.find.find_extra_files to better support non-Python resources found in Python package sources. Previously it could include extra files found under build/cythonized, but now it can also include arbitrary files (including support for fnmatch patterns) that should be installed alongside a Python package. For example,

find_extra_files('sage.foo', '.', special_filenames=['*.html'])

will find any .html files under the ./sage/foo/ package.

This will be useful for #21732 and possibly other tickets.

This also adds a package_data variable in module_list.py that can list all non-Python resources on a per-package basis (using the same format as the standard `package_data` option to setup.py. This includes handling of package data in a way that's compatible with sage_setup.clean.clean_install_dir.

Change History (19)

comment:1 Changed 3 years ago by embray

  • Component changed from PLEASE CHANGE to build
  • Description modified (diff)
  • Status changed from new to needs_review

Hopefully this makes sense. Let me know if there's any more I can clarify. Note that, unlike my original attempt at this in #21732, I avoided moving module_list.py for now.

comment:2 Changed 3 years ago by mkoeppe

  • Cc mkoeppe added

comment:3 Changed 3 years ago by jdemeyer

First of all, there is an easy conflict with #22662.

The function find_extra_files was obviously meant to find extra Cython files since it looks for .pxd and .pxi files.

It makes sense to generalise this, but then it should no longer handle .pxd and .pxi files by default: those should be passed as *.pxd and *.pxi arguments by sage_build_ext.copy_extra_files.

comment:4 Changed 3 years ago by jdemeyer

I think it's a bit silly to duplicate copy_extra_files. There are two reasons for this:

  1. The build_py command already contains lots of code to deal with "data files". It might be better to hook into this, for example by overriding build_py.get_data_files(). The only reason that I didn't do this in #21600 is because build_ext is run after build_py.
  1. If you really want to copy the files manually, I don't see much reason to treat the Cython data files differently from the package_data data files. The current code makes a distinction between cmd_build_ext.cythonized_files and cmd_build_py.extra_files. There is no reason to do that.

See also https://trac.sagemath.org/ticket/21682#comment:34

comment:5 Changed 3 years ago by jdemeyer

There is also an easy conflict with #22106.

comment:6 follow-up: Changed 3 years ago by embray

build_ext is run after build_py

Only if you run ./setup.py build. The two commands can be run independently as well and serve different purposes. In particular, it makes sense to make a distinction here because one case is for installing files that are found in the Python source tree, while the other is for installing files found under build/cythonized (which normally would actually be the Python source tree but we build this separate hierarchy--I never liked that, but whatever).

comment:7 in reply to: ↑ 6 Changed 3 years ago by jdemeyer

Replying to embray:

build_ext is run after build_py

Only if you run ./setup.py build.

What do you mean? If I run pip install ... or ./setup.py install, it would still run build_py before build_ext, right? Or am I missing something?

comment:8 follow-up: Changed 3 years ago by embray

setup.py install calls setup.py build, yes. All of these commands can be run individually, and should work individually, in principle. It's just that some of them of them (namely "install" and "build") work primarily by running a number of simpler sub-commands in some order.

comment:9 in reply to: ↑ 8 Changed 3 years ago by jdemeyer

Replying to embray:

setup.py install calls setup.py build, yes. All of these commands can be run individually, and should work individually, in principle.

For what value of "work"? Do you mean "work" as in: doesn't raise an exception? Or "work" as in: actually does something meaningful?

It makes little sense to run just build_ext without anything else. In the real world, build scripts have a natural order and dependencies. You cannot cherry-pick pieces of a build system and expect those pieces to be meaningful individually. Especially if you want something like https://github.com/cython/cython/issues/1514 (which I think is a good idea) you cannot run build_ext individually.

Splitting the installation of data files in two, just because it adheres to some theoretical principle that nobody cares about, looks like a bad idea to me.

comment:10 follow-up: Changed 3 years ago by jdemeyer

Together with #21682, I would like to move to the following build steps:

  1. Run build_cython to run Cython, which generates .c and .h files.
  1. Run build_py which "builds" Python files and which copies all package data, from Cython and non-Cython. Preferably, copying this package data can be done by using some existing machinery from build_py.
  1. Run build_ext as usual.

Erik, what's your opinion on this?

comment:11 follow-up: Changed 3 years ago by embray

It makes little sense to run just build_ext without anything else.

I do that almost every day on 'normal' projects. ./setup.py build_ext --inplace is the best thing to doing in-source development of a project that uses extension modules.

In any case, by overthinking it you're actually making it more complicated and less reliable. It works fine as is.

comment:12 in reply to: ↑ 11 Changed 3 years ago by jdemeyer

Replying to embray:

It works fine as is.

...at the cost of extra complexity. I am a strong believer in the https://en.wikipedia.org/wiki/KISS_principle

comment:13 Changed 3 years ago by embray

You don't need to link me to a Wikipedia article on "KISS".

In fact complexity is sometimes subjective. You're upset because you see what looks like a little bit of code duplication (actually, I don't see it as duplication; it's doing two somewhat different things that nonetheless look similar due to using the same API), but without it in fact you're relying on a fragile, poorly documented/specified state machine to do things in the right order or else everything breaks. That to me is complexity.

comment:14 Changed 3 years ago by embray

(I do agree about rebasing on / integrating with #21682 except that's been stalled for months for no apparent reason)

comment:15 Changed 3 years ago by embray

Alright, I rebased #21682, so I'll at least see about integrating this ticket with it properly. If there's any opportunity I can find to reduce code duplication I'll take it, but I do think in this case files generated from Cython should be handled separately from static files.

comment:16 Changed 3 years ago by chapoton

  • Status changed from needs_review to needs_work

does not apply

comment:17 Changed 3 years ago by jdemeyer

Erik: I'm confused how you see the bigger picture with this ticket and #21682. In your ideal world, how would the build process work? In other words: what is your alternative to 10?

comment:18 Changed 2 years ago by jdemeyer

TODO: discuss next week if possible.

comment:19 in reply to: ↑ 10 Changed 2 years ago by embray

Replying to jdemeyer:

Together with #21682, I would like to move to the following build steps:

  1. Run build_cython to run Cython, which generates .c and .h files.
  1. Run build_py which "builds" Python files and which copies all package data, from Cython and non-Cython. Preferably, copying this package data can be done by using some existing machinery from build_py.
  1. Run build_ext as usual.

Erik, what's your opinion on this?

I think this makes sense, actually. The only thing is that build_cython needs to be responsible for providing a list of resource files it generates (.h files, etc.), but we could free it from responsibility for actually copying them. I don't think it makes a difference either way really but I don't care much at this point either.

Note: See TracTickets for help on using tickets.