Opened 5 years ago
Closed 4 years ago
#21600 closed enhancement (fixed)
Use custom build_ext to compile Cython code
Reported by: | jdemeyer | Owned by: | |
---|---|---|---|
Priority: | major | Milestone: | sage-7.4 |
Component: | build | Keywords: | |
Cc: | mkoeppe, embray | Merged in: | |
Authors: | Jeroen Demeyer | Reviewers: | Matthias Koeppe, Erik Bray |
Report Upstream: | N/A | Work issues: | |
Branch: | dfcd817 (Commits, GitHub, GitLab) | Commit: | dfcd8177447c7eab491569f08777f585deda93ee |
Dependencies: | #21604 | Stopgaps: |
Description (last modified by )
In src/setup.py
, we should not run cythonize()
manually. It would be better to use a custom build_ext
command for that.
This is inspired by the build_ext
from Cython 0.25: https://github.com/cython/cython/blob/master/Cython/Distutils/build_ext.py
Unfortunately, we cannot really use that since it doesn't allow to pass options to cythonize()
.
Due to a Cython bug, this warning gets displayed even though the cache
option is valid: UserWarning: got unknown compilation option, please remove: cache
.
Change History (47)
comment:1 Changed 5 years ago by
- Description modified (diff)
comment:2 Changed 5 years ago by
- Summary changed from Use Cython's build_ext to Use custom build_ext to compile Cython code
comment:3 Changed 5 years ago by
- Description modified (diff)
comment:4 Changed 5 years ago by
- Branch set to u/jdemeyer/ticket/21600
comment:5 Changed 5 years ago by
- Commit set to 6d6d3490d1aa434acfdb9ebed4730df38aae6a37
- Dependencies changed from #20596 to #21604
comment:6 Changed 5 years ago by
- Commit changed from 6d6d3490d1aa434acfdb9ebed4730df38aae6a37 to 35ecf4b0fca0c13dad3e38826fc33211bbd66c81
Branch pushed to git repo; I updated commit sha1. This was a forced push. Last 10 new commits:
3a8cc0e | Fix typo in comment
|
0dd9c50 | Respect environment variable MAKE
|
17f90d8 | beautification
|
e5f9065 | More comments
|
7791cd9 | Remove --buildbase code
|
74169e7 | Pass SAGE_SRC to generate_py_source.mk
|
0394333 | Add new file to MANIFEST.in
|
fdedb02 | Install SAGE_LOCAL/bin/sage instead of SAGE_ROOT/sage as Jupyter kernel
|
5ba95ed | Clean up stale installed files in install command
|
35ecf4b | Run cythonize() inside build_ext command
|
comment:7 Changed 5 years ago by
- Cc embray added
comment:8 follow-up: ↓ 9 Changed 5 years ago by
If I'm to understand correctly, this is mostly just taking the existing Cython build code and moving into the build_ext command, right? Or was anything substantive changed in the build code too (beyond what was already from #21604)?
comment:9 in reply to: ↑ 8 Changed 5 years ago by
Replying to embray:
If I'm to understand correctly, this is mostly just taking the existing Cython build code and moving into the build_ext command, right?
Exactly. But it interacts in non-trivial ways with other parts of setup.py
(in particular, finding the data files), so this isn't ready yet.
comment:10 Changed 5 years ago by
Could you be more specific? I can probably help.
comment:11 Changed 5 years ago by
Well, I haven't figured out precisely what to do with this part from src/setup.py
:
from sage_setup.find import find_python_sources, find_extra_files python_packages, python_modules = find_python_sources( SAGE_SRC, ['sage', 'sage_setup']) python_data_files = find_extra_files(python_packages, ".", SAGE_CYTHONIZED, SAGE_LIB, ["ntlwrap.cpp"])
This code determines some inputs to the setup()
call, so logically it should come before calling setup()
.
On the other hand, the last line assumes that Cython has run because it looks for files in SAGE_CYTHONIZED
. So, if we run Cython in the build_ext
command of setup()
(what this ticket already implements), then we can no longer determine the python_data_files
before calling setup()
.
comment:12 Changed 5 years ago by
One secret about distutils, which is both a bug and a feature--and the fact that it's often exploited is why it's so hard to change anything about distutils--is that in many cases it doesn't really matter what you pass to setup()
. It's mostly important for static metadata, but most other details about building the package can be tweaked in various places along the line (albeit sometimes requiring care).
So for example, the data_files
passed to setup()
is stored on the Distribution
object in the .data_files
attribute. So this can be accessed from any command via self.distribution.data_files
. If a command, like build_py
or build_ext
is also responsible for generating files, some of which might result in installable resource files (for Cython modules this is mainly header files), it can also append those generated files, as appropriate, to self.distribution.data_files
for later use.
Where this requires care is it's sometimes important to make sure you're using those attributes in ways that are compatible with how other commands use them, for which the best documentation is the source code (another misfeature). data_files
is an easy one because it's only used by a couple other commands, mainly install
/install_data
, and sdist
. This is easy because the build
command is normally run automatically before install
, so as long as the build_ext
command updates the data_files
appropriately they will be handled correctly.
sdist
is a little trickier--it does not normally run build first, though if you invoke setup.py
with multiple commands it will work correctly (like ./setup.py build sdist
). I normally subclass sdist
to have it automatically run build
first if I have Cython modules, because I also want to include the C sources in my source dist. But I'm not too worried about that right now since ./setup.py sdist
doesn't work right for Sage for other reasons too. That will be a separate issue.
comment:13 Changed 5 years ago by
Thanks a lot! I think I have enough information now to work on this.
comment:14 Changed 5 years ago by
- Commit changed from 35ecf4b0fca0c13dad3e38826fc33211bbd66c81 to 2b5fa98244c5d9f3481c7d15061e3e0f4c4b0ee6
Branch pushed to git repo; I updated commit sha1. New commits:
2b5fa98 | Copy extra files to build directory instead of using data_files
|
comment:15 follow-up: ↓ 16 Changed 5 years ago by
I'm not quite sure why you're not not using data_files
at all, though for files installed with the Python modules themselves it's more customary to use package_data
but that's neither here nor there. I suspect the problem might have to do in part with oddity in how find_extra_files
is expected to work.
I'm not sure why the dist.extra_files =
either. There's no reason now to go assigning arbitrary new attributes to the Distribution instance.
comment:16 in reply to: ↑ 15 ; follow-up: ↓ 19 Changed 5 years ago by
Replying to embray:
I'm not quite sure why you're not not using
data_files
at all
I could have done that, but the usage of data_files
in Sage has always been a bit fishy, see #20108. So given that I'm changing the data file handling anyway, I decided to stop using data_files
.
though for files installed with the Python modules themselves it's more customary to use
package_data
That doesn't work either: package_data
assumes that the package data files are in the same directory as the Python sources. This is not the case for the Cython-generated files, so I cannot use package_data
. But I am processing the data files similarly as what distutils does with package_data
: copying them to the build directory. Since the data files appear in the build directory, they are automatically installed in site-packages
.
I suspect the problem might have to do in part with oddity in how
find_extra_files
is expected to work.
No, we can always change find_extra_files
to do whatever we want it to do.
I'm not sure why the
dist.extra_files =
either. There's no reason now to go assigning arbitrary new attributes to the Distribution instance.
These extra_files
are also needed for cleaning in the sage_install
class. So I need to pass them somehow and assigning an attribute to the Distribution instance seemed like the most natural way.
comment:17 Changed 5 years ago by
- Commit changed from 2b5fa98244c5d9f3481c7d15061e3e0f4c4b0ee6 to 7e3a492edaa54fc9c724d183534d27d171c8cf60
Branch pushed to git repo; I updated commit sha1. New commits:
7e3a492 | Rename extra_files -> sage_extra_files
|
comment:18 Changed 5 years ago by
- Status changed from new to needs_review
comment:19 in reply to: ↑ 16 ; follow-up: ↓ 21 Changed 5 years ago by
Replying to jdemeyer:
Replying to embray:
though for files installed with the Python modules themselves it's more customary to use
package_data
That doesn't work either:
package_data
assumes that the package data files are in the same directory as the Python sources. This is not the case for the Cython-generated files, so I cannot usepackage_data
. But I am processing the data files similarly as what distutils does withpackage_data
: copying them to the build directory. Since the data files appear in the build directory, they are automatically installed insite-packages
.
Since the purpose of package_data
is specifically to list data files that should be installed in the Python packages, I think it makes sense to use anyways. Normally this wouldn't be a problem even for Cython generated files, and the only reason it is is because we're handling those in a "weird" (to distutils) and unexpected way. That's fine--I mean, I don't agree with it but it's not abhorrent or anything--we just need to make some tweaks to how package_data
is handled so that it searches additional file hierarchies.
But for coherency I would rather use the existing option that is specifically for this purpose, and use our command subclasses to enhance its functionality to work for our purposes, rather than make up something entirely new.
comment:20 Changed 5 years ago by
- Description modified (diff)
comment:21 in reply to: ↑ 19 Changed 5 years ago by
Replying to embray:
we just need to make some tweaks to how package_data is handled so that it searches additional file hierarchies.
There is another issue: package_data
is handled in the build_py
command, which happens before build_ext
, which runs Cython. So even if you manage to handle the Cython-generated files using package_data
, you either need to reorder the build
subcommands or you need to run Cython in build_py
.
But for coherency I would rather use the existing option that is specifically for this purpose
You mean: going back to use data_files
?
comment:22 follow-ups: ↓ 23 ↓ 24 Changed 5 years ago by
I see, that's a bit annoying. I'm surprised I haven't run into that before. If I have I don't know what solution I used--maybe just manually copying the generated Cython files to the appropriate build path. It would still not be a bad idea to add relevant entries to package_data
though since that's also used by the sdist
command to ensure that all those files are included in the source tarball (though this can also be achieved simply with the correct MANIFEST.in rule).
Looking at the source again, I think the simplest thing to do would be after Cythonization to copy the appropriate generated files to the appropriate build/ directory (basically emulating what build_py.build_package_data
is doing, which you basically already have. But then also add entries for either of:
- `self.get_finalized_command('build_py').data_files or
- `self.distribution.data_files
The former is the same as if those files were included in package_data
in the first place.
An alternative I kind of like, but that's more complicated, is to make cythonize and entirely separate command (build_cython
or something) and insert that into the front of the build
command's sub-commands. But that's more complicated and probably not worth it for now.
My only real problem at this point is with the new attribute you're tacking on. It seems easily avoidable.
comment:23 in reply to: ↑ 22 ; follow-up: ↓ 25 Changed 5 years ago by
Replying to embray:
My only real problem at this point is with the new attribute you're tacking on.
Seriously, why is that a problem? I even explicitly added a sage_
prefix.
comment:24 in reply to: ↑ 22 Changed 5 years ago by
Replying to embray:
I see, that's a bit annoying. I'm surprised I haven't run into that before. If I have I don't know what solution I used--maybe just manually copying the generated Cython files to the appropriate build path. It would still not be a bad idea to add relevant entries to
package_data
though since that's also used by thesdist
command to ensure that all those files are included in the source tarball (though this can also be achieved simply with the correct MANIFEST.in rule).Looking at the source again, I think the simplest thing to do would be after Cythonization to copy the appropriate generated files to the appropriate build/ directory (basically emulating what
build_py.build_package_data
is doing, which you basically already have. But then also add entries for either of:
- `self.get_finalized_command('build_py').data_files or
- `self.distribution.data_files
The former is the same as if those files were included in
package_data
in the first place.
Do you want me to make those changes here? Or do you accept the current code on this branch and should we leave that for another ticket?
comment:25 in reply to: ↑ 23 ; follow-ups: ↓ 26 ↓ 30 Changed 5 years ago by
Replying to jdemeyer:
Replying to embray:
My only real problem at this point is with the new attribute you're tacking on.
Seriously, why is that a problem? I even explicitly added a
sage_
prefix.
1) Because it's not at all clear why it's even necessary, and it only confuses things.
2) Just because distutils is "meant to be" tinkered with in a flexible manner, monkeypatching the Distribution
option is definitely an anti-pattern. More commonly, if some command has some data it would like to share with other commands, the pattern is to add an attribute to that command instead (and usually at least initialize that attribute with a default value in the finalize_command
method). Then, if other commands need that data, they would first ensure that the command it comes from is finalized, and may also run that command if necessary (using self.distribution.run_command
, which doesn't run the command a second time if it has already been run). That way if command B
uses A.data
, it ensures that A.data
has been generated first.
I'm just looking out for distutils being used the way it's intended. This is unfortunately very undocumented, but if you spend enough time in it you'll see that there are various conventions that are best to follow. That said, I don't see why you need to add this in the first place.
comment:26 in reply to: ↑ 25 Changed 5 years ago by
Replying to embray:
Because it's not at all clear why it's even necessary, and it only confuses things.
It's necessary because I need that list of files in more than one place.
comment:27 follow-up: ↓ 28 Changed 5 years ago by
Hmmm...I think I can appreciate what the clean_stale_files
stuff is for. In most projects I would just blow away the build
directory, or just individual files in it if applicable. But sage is so large this is a real pain to do--that's something which itself could use a better approach but that's well beyond this ticket.
I don't think that should just happen when you "install" though, but really prior to any build.
comment:28 in reply to: ↑ 27 Changed 5 years ago by
comment:29 Changed 5 years ago by
- Reviewers set to Matthias Koeppe
I've tested it (on top of 7.4.beta6), and it seems to work for me. Erik, objections to setting this ticket to "positive review"?
comment:30 in reply to: ↑ 25 Changed 5 years ago by
Replying to embray:
2) Just because distutils is "meant to be" tinkered with in a flexible manner, monkeypatching the
Distribution
option is definitely an anti-pattern.
Adding attributes to a class is a completely normal thing when subclassing. In this case, I could make a trivial subclass
class SageDistribution(Distribution): pass
and define that SageDistribution
supports a sage_extra_files
attribute. Because of the way Python works though, there is no need for such a subclass.
So, I still don't get why it's perfectly fine to have dist.data_files
(what distutils
itself uses) but not dist.sage_extra_files
. They serve very analogous purposes, so why not implement them analogously?
comment:31 follow-up: ↓ 32 Changed 5 years ago by
Please--you don't have to explain to me how Python works.
The point I'm trying to make is that the way you're doing this is antithetical to the common programming patterns used specifically in the context of writing distutils commands. I'm still not convinced that you need this extra attribute at all. But for now I'd be satisfied if it were implemented as I described above. If that's not clear I'll just do it.
comment:32 in reply to: ↑ 31 Changed 5 years ago by
Replying to embray:
Please--you don't have to explain to me how Python works.
Of course I don't need to explain Python. I wanted to explain the philosophy behind what I was doing.
I'm still not convinced that you need this extra attribute at all.
And I'm still not convinced that what I'm doing is somehow wrong.
But for now I'd be satisfied if it were implemented as I described above.
I'll try it.
comment:33 Changed 5 years ago by
BTW I should add--if what I suggested is not clear that's of course my fault for explaining poorly. My point is that it's not good practice in general to use classes as dumping grounds for arbitrary attributes. Python allows it, sure, but it's almost always confusing and error-prone. For example it's possible to run the "install" command without running "build_py" first, in which case you'll get an attribute error. There are myriad little corner cases like that in distutils.
I think instead of "sage_extra_files" it might be fine to call this "cythonized_files" or something like that, and have it be an attribute of the custom "build_ext". That way it's clearer exactly what this is about (and that the only reason it needs to exist at all is due to how we're putting the cythonized files in a separate, parallel directory structure).
comment:34 Changed 5 years ago by
- Commit changed from 7e3a492edaa54fc9c724d183534d27d171c8cf60 to dfcd8177447c7eab491569f08777f585deda93ee
Branch pushed to git repo; I updated commit sha1. This was a forced push. New commits:
dfcd817 | extra_files -> sage_build_ext.cythonized_files
|
comment:35 Changed 5 years ago by
Needs review.
comment:36 follow-up: ↓ 38 Changed 5 years ago by
This looks along the right lines. It occurs to me there's something kind of problematic here, which is that it runs run_cython
in finalize_command
. finalize_command
is basically supposed to be like an __init__
--it should just set default values for things and not actually have much in the way of side-effects.
run_cython()
should happen at the beginning of the run()
. finalize_command
should just set cythonized_files
to a default value (such as an empty list or None
). I think I see why you did things the way you did--you wanted to make sure one way or another that find_extra_files
was run. I might instead move the find_extra_files
call into a method on build_ext
(such as find_cythonized_files()
or something). When called after run_cython()
that could update the cythonized_files
attribute, and then be used to copy the files. But from the install command you would just called find_cythonized_files()
if cythonized_files
isn't already set.
comment:37 Changed 5 years ago by
Something like
class sage_build_ext(build_ext): def finalize_options(self): ... self.cythonized_files = None def run(self): self.run_cython() # The following two lines, or the equivalent thereof, could be called # directly from run_cython() too self.find_cythonized_files() self.copy_cythonized_files() build_ext.run(self) def find_cythonized_files(self): if self.cythonized_files is None: ... self.cythonized_files = ... class sage_install(install): def clean_stale_files(self): ... cmd_build_ext = self.get_finalized_command('build_ext') cmd.find_cythonized_files() # Do stuff with cmd.find_cythonized_files()
And the rest pretty much exactly as you have it. Would that work?
It's not pretty but with distutils you have to write code like you're writing for Python 2.1 or something :)
comment:38 in reply to: ↑ 36 Changed 5 years ago by
Replying to embray:
This looks along the right lines. It occurs to me there's something kind of problematic here, which is that it runs
run_cython
infinalize_command
.
That's exactly what upstream Cython 0.25 does (which was the inspiration for this ticket), so I'm not inclined to change this.
comment:39 Changed 5 years ago by
Erik, on this ticket (and elsewhere) you have mentioned various subtle issues of setup.py
commands and their interaction. I've opened ticket #21678 (Testsuite for src/setup.py) to formalize this, and to make sure that everything that is implemented correctly on the present ticket, according to your comments, will not be lost by regressions in the future.
comment:40 follow-up: ↓ 42 Changed 5 years ago by
Then I'll have to bother upstream Cython about this. There may be a reason they're doing it that way, but it doesn't make a lot of sense on its face.
The point of finalize_command
is to initialize any options / set up defaults, not actually do anything. "But they do it" is not a good reason if we don't know why they're doing it.
comment:41 Changed 5 years ago by
If I had to guess, the reason they're doing it that way is because of how the "sdist" command uses the "build_ext" command to get a list of source files to include in the sdist. I think the right way to do that is not really to run cythonize()
in the finalize_command()
method, but rather to modify the sdist
command so that it runs cythonize()
as a sub-command.
comment:42 in reply to: ↑ 40 ; follow-up: ↓ 43 Changed 5 years ago by
comment:43 in reply to: ↑ 42 Changed 5 years ago by
Replying to jdemeyer:
Replying to embray:
Then I'll have to bother upstream Cython about this.
Please do that.
I will try to work with them more closely on that.
In the interest of harmony and in making progress, if this is working for you as is, then I give it a positive review. I'll open a separate ticket with my ideas for how to build on this to make it a bit cleaner, but I don't think it's bad as is as long as it works. I just know we can do better.
comment:44 Changed 5 years ago by
- Reviewers changed from Matthias Koeppe to Matthias Koeppe, Erik Bray
I'll let the other reviewer (Matthias Koeppe) decide on giving positive review.
comment:45 Changed 5 years ago by
See #21682
comment:46 Changed 5 years ago by
- Status changed from needs_review to positive_review
Tested on top of 7.4.rc0.
comment:47 Changed 4 years ago by
- Branch changed from u/jdemeyer/ticket/21600 to dfcd8177447c7eab491569f08777f585deda93ee
- Resolution set to fixed
- Status changed from positive_review to closed
New commits:
Fix build of pyzmq with Cython 0.25
Upgrade Cython to version 0.25
Move old_style_globals to modules; other directives to cythonize() call
Run cythonize() inside build_ext command