#26018 closed defect (fixed)
Race conditions because of DESTDIR
Reported by: | jdemeyer | Owned by: | |
---|---|---|---|
Priority: | blocker | Milestone: | sage-9.1 |
Component: | build | Keywords: | |
Cc: | embray, dimpase, jhpalmieri | Merged in: | |
Authors: | Erik Bray, John Palmieri | Reviewers: | John Palmieri, Matthias Koeppe |
Report Upstream: | N/A | Work issues: | |
Branch: | 6b20b84 (Commits, GitHub, GitLab) | Commit: | |
Dependencies: | Stopgaps: |
Description (last modified by )
Using a staging directory for package install can introduce race conditions. For example, when installing a Python package, the package is installed first and then the egg-info
directory is installed. The latter is what pkg_resources
uses to consider a package "installed".
However, when a DESTDIR
install is used, the order that files appear in the final installation directory might be different.
If the egg-info
directory is copied to site-packages
before the actual Python package, then this can happen for example:
Installing alabaster-0.7.10 Installing package alabaster using pip2 Ignoring indexes: https://pypi.python.org/simple Processing /home/jdemeyer/sage-test/local/var/tmp/sage/build/alabaster-0.7.10/src Running setup.py (path:/tmp/pip-0ac_4C-build/setup.py) egg_info for package from file:///home/jdemeyer/sage-test/local/var/tmp/sage/build/alabaster-0.7.10/src Running command python setup.py egg_info running egg_info creating pip-egg-info/alabaster.egg-info writing pip-egg-info/alabaster.egg-info/PKG-INFO writing top-level names to pip-egg-info/alabaster.egg-info/top_level.txt writing dependency_links to pip-egg-info/alabaster.egg-info/dependency_links.txt writing manifest file 'pip-egg-info/alabaster.egg-info/SOURCES.txt' Traceback (most recent call last): File "<string>", line 1, in <module> File "/tmp/pip-0ac_4C-build/setup.py", line 37, in <module> 'Programming Language :: Python :: 3.4', File "/home/jdemeyer/sage-test/local/lib/python2.7/site-packages/setuptools/__init__.py", line 129, in setup return distutils.core.setup(**attrs) File "/home/jdemeyer/sage-test/local/lib/python2.7/distutils/core.py", line 151, in setup dist.run_commands() File "/home/jdemeyer/sage-test/local/lib/python2.7/distutils/dist.py", line 953, in run_commands self.run_command(cmd) File "/home/jdemeyer/sage-test/local/lib/python2.7/distutils/dist.py", line 972, in run_command cmd_obj.run() File "/home/jdemeyer/sage-test/local/lib/python2.7/site-packages/setuptools/command/egg_info.py", line 278, in run self.find_sources() File "/home/jdemeyer/sage-test/local/lib/python2.7/site-packages/setuptools/command/egg_info.py", line 293, in find_sources mm.run() File "/home/jdemeyer/sage-test/local/lib/python2.7/site-packages/setuptools/command/egg_info.py", line 524, in run self.add_defaults() File "/home/jdemeyer/sage-test/local/lib/python2.7/site-packages/setuptools/command/egg_info.py", line 563, in add_defaults rcfiles = list(walk_revctrl()) File "/home/jdemeyer/sage-test/local/lib/python2.7/site-packages/setuptools/command/sdist.py", line 20, in walk_revctrl for item in ep.load()(dirname): File "/home/jdemeyer/sage-test/local/lib/python2.7/site-packages/pkg_resources/__init__.py", line 2405, in load return self.resolve() File "/home/jdemeyer/sage-test/local/lib/python2.7/site-packages/pkg_resources/__init__.py", line 2411, in resolve module = __import__(self.module_name, fromlist=['__name__'], level=0) File "/home/jdemeyer/sage-test/local/lib/python2.7/site-packages/setuptools_scm/__init__.py", line 9, in <module> from .utils import trace ImportError: No module named utils
What you see here is that setuptools_scm
is used by pkg_resources
even though the package setuptools_scm
has not fully been installed.
See also: #26160 (py3: race condition with .pth files)
Change History (45)
comment:1 Changed 4 years ago by
comment:2 Changed 4 years ago by
- Description modified (diff)
comment:3 Changed 4 years ago by
Thanks for figuring out what was going on there. Do you know a reliable way to reproduce this? I wonder why I've never seen it before.
comment:4 follow-up: ↓ 9 Changed 4 years ago by
Another thing to note: Previously we worked around this kind of race condition by putting a lock around pip
, so only one package can be pip-installed at a time. Now we still have that lock, but I think it's not actually necessary (though maybe still worth keeping?) since we in general do not pip-install packages to the same site-packages at the same time (when using DESTDIR).
However, similar problems arise if packages can be copied into site-packages by sage-spkg while other python packages are simultaneously being installed. So maybe we keep the exiting pip lock, but also require sage-spkg to obey it somehow.
comment:5 Changed 4 years ago by
I just had this happen trying to rebuild the Windows patchbot, which broke again, for some reason.
While I believe this is a more general problem in principle, it's happening specifically because installing setuptools_scm
can alter the behavior of pip (in particular, it installs some setuptools.file_finder
entrypoints). It's also pulled in as a setup_requires
from a couple of packages that have it listed as a prerequisite.
This is relatively unusual, and I think among packages installed by Sage, setuptools_scm
is probably the only one like this. So I wonder if we shouldn't just list it as a prerequisite to almost all Python packages, like we currently do with pip.
comment:6 Changed 4 years ago by
- Branch set to u/embray/build/python_toolchain
- Commit set to 6c16368a6416d1e3655e026c63811fc9fbe23d2f
- Status changed from new to needs_review
Just as one possible idea, what about this?
Technically, what I called $(PYTHON_TOOLCHAIN)
, could just be spelled "setuptools_scm", since there's already a direct dependency sequence of setuptools -> pip -> setuptools_scm
. But I like that having a PYTHON_TOOLCHAIN
variable makes this relationship to other packages explicit, and can be modified in one place.
New commits:
6c16368 | Add a PYTHON_TOOLCHIN variable to the makefile, which makes explicit the prerequisites to installing other Python packages with pip
|
comment:7 Changed 4 years ago by
- Commit changed from 6c16368a6416d1e3655e026c63811fc9fbe23d2f to b6268ef708d10573b283cb4dea1c691e90a83872
Branch pushed to git repo; I updated commit sha1. This was a forced push. New commits:
b6268ef | Add a PYTHON_TOOLCHIN variable to the makefile, which makes explicit the prerequisites to installing other Python packages with pip
|
comment:8 Changed 4 years ago by
(had to fix one case, build/pkgs/simplegeneric/dependencies
, where my sed
call was a little over-aggressive; I should have made it just apply to the first line :)
comment:9 in reply to: ↑ 4 Changed 4 years ago by
Replying to embray:
Another thing to note: Previously we worked around this kind of race condition by putting a lock around
pip
, so only one package can be pip-installed at a time. Now we still have that lock, but I think it's not actually necessary (though maybe still worth keeping?) since we in general do not pip-install packages to the same site-packages at the same time (when using DESTDIR).
Good thinking! The problem is that the part which copies files from DESTDIR
to the final installation directory is not locked. Adding an exclusive lock there would also mitigate this and maybe also #18438.
comment:10 Changed 4 years ago by
I have in the past considered putting a lock around any kind of writing to $SAGE_LOCAL
--not for any specific reason but it just seemed like it would probably be a good idea. I had a feeling problems like this one would come up less frequently with such a lock. However, for that to be practical, while still keeping parallel builds reasonably fast, we need as many packages as possible using DESTDIR
builds.
I can experiment with that, if you think it's a good idea. In the meantime what do you think about the workaround I attached? I think it makes a certain amount of sense independently of this issue, but I'm not sure.
comment:11 Changed 4 years ago by
I don't understand the branch: it looks like pip
need not be installed before all of the packages that (without the branch) have it as a dependency. In any case, with a Python 3 build, I get
No record that 'setuptools_scm' was ever installed; skipping uninstall Installing setuptools_scm-1.15.6 your setuptools is too old (<12) setuptools_scm functionality is degraded /Users/palmieri/Desktop/Sage_stuff/sage_builds/PYTHON3/sage-8.4.beta5/local/bin/python3: No module named pip
Perhaps pip
should not have $(PYTHON_TOOLCHAIN)
as a dependency.
comment:12 follow-up: ↓ 13 Changed 4 years ago by
- Status changed from needs_review to needs_work
Similar error with a Python 2 build. I don't think it's too much to ask for you to actually try your own branch before marking it as "needs review". Please do better in the future.
comment:13 in reply to: ↑ 12 Changed 4 years ago by
Replying to jhpalmieri:
Similar error with a Python 2 build. I don't think it's too much to ask for you to actually try your own branch before marking it as "needs review". Please do better in the future.
With respect, this is extremely condescending and unnecessary. I'm a professional software engineer and you are not my employer or my mentor--you don't have to tell me how to do my job. Obviously I tested it, being that it's a fairly non-trivial change. I see what the problem is about pip and I'm not sure why I didn't hit on it, though I only would have if I were fully rebuilding from scratch (I believe, instead, I was doing some ./sage -f
calls that were designed to trigger the race condition).
Sage-the-distribution is a very large, complex pile of software, and it's difficult to test all possible scenarios, especially for builds. I'm not sure what you think the point of "needs review" is, if not to provide peer-review, which includes peer-testing. If you feel your time has been wasted you were also welcome to check the current results of the patchbots (which also show a build failure) and point that out rather than lecturing me needlessly.
comment:14 Changed 4 years ago by
- Commit changed from b6268ef708d10573b283cb4dea1c691e90a83872 to e7efb49659ca300b0f894a6624bb6e7ab3466415
Branch pushed to git repo; I updated commit sha1. This was a forced push. New commits:
e7efb49 | Add a PYTHON_TOOLCHIN variable to the makefile, which makes explicit the prerequisites to installing other Python packages with pip
|
comment:15 Changed 4 years ago by
- Status changed from needs_work to needs_review
Updated to ensure that $(PYTHON_TOOLCHAIN)
is not used in the dependencies of the PYTHON_TOOLCHAIN
members themselves.
IIRC (it was several weeks ago) I did do this in the first place, but I think when I ran sed
over the remaining packages I might have forgotten to exclude pip
and setuptools_scm
in particular.
Of course, let's wait and see what my patchbot says.
comment:16 follow-up: ↓ 18 Changed 4 years ago by
My point is exactly that this is a ticket about building Sage from scratch, so running make distclean; make
is one of the first things to test, it seems to me.
Edit: let me tone this down a little.
So here is a serious question: if I think you have not done a good job on this ticket, should I not say anything? I think it's not bad to provide constructive criticism, like saying "you should have run make distclean; make
before marking this as needs_review
."
comment:17 follow-up: ↓ 19 Changed 4 years ago by
Let me put this another way. As you have no doubt experienced, in the Sage community there are different styles of reviewers. Some are very thorough, while some will take a glance at the code and say "looks good to me, positive review". One can get away with that superficial kind of review some of the time, for example a Python 3 ticket which just replaces some iteritems
by items
, but on a ticket like this, it needs thorough testing. I believe that the burden of that thorough testing falls on both the author and the reviewer. In this particular case, the problem gets caught by the patchbots, but that does not always happen.
So especially on a ticket involving Sage's build process, I think it is important for both the authors and the reviewers to be more careful and thorough than they might otherwise be. It's better to get it right the first time than to merge a ticket with problems and have to put out a bunch of fires later.
comment:18 in reply to: ↑ 16 Changed 4 years ago by
Replying to jhpalmieri:
My point is exactly that this is a ticket about building Sage from scratch, so running
make distclean; make
is one of the first things to test, it seems to me.
It isn't necessarily though. The issue at hand can occur in any situation where, at the most, the python2 package needs to be rebuilt, as do its dependencies. A full build from scratch was not necessary to test it--I had found it sufficient for reliably reproducing the problem to do ./sage -f <something>
where I forget exactly what <something>
was but it was one of python's early dependencies. Possibly something fast, that wouldn't require a whole lot of other packages to be rebuilt. Now that I look, it I think it was probably sqlite.
Edit: let me tone this down a little.
So here is a serious question: if I think you have not done a good job on this ticket, should I not say anything? I think it's not bad to provide constructive criticism, like saying "you should have run
make distclean; make
before marking this asneeds_review
."
I think that would have been completely fair, though I have mixed feelings about "before marking this as needs_review". What I find insulting is the assumption that I (or anyone, if it were someone else) just didn't test it at all. It might be a little more reasonable if were something that obviously would never have worked at all under any circumstances, like having a syntax error in some code such that it would not have even compiled.
Even then there are cases I've seen, both from myself and others, where I had something working fine, but accidentally hit a key in my editor and inserted a stray symbol somewhere just before committing the change. I almost always review my diff before committing, but something like that can be easy to fix (and of course if I did skip diff review entirely, that's on me). In fact, as I pointed out, this was in some ways such a case, since I had it "right" in the first place but then made a mistake before committing. I did do a diff review but it was the sort that was easy to glaze over since it was dozens of almost equivalent changes.
But I don't think anyone should be lectured to about setting a ticket to "needs review" if there happens to be a mistake. That's the whole point of asking for review--it's saying "I've done everything I can or have time for to work on this; please take a look".
comment:19 in reply to: ↑ 17 Changed 4 years ago by
Replying to jhpalmieri:
So especially on a ticket involving Sage's build process, I think it is important for both the authors and the reviewers to be more careful and thorough than they might otherwise be. It's better to get it right the first time than to merge a ticket with problems and have to put out a bunch of fires later.
I don't disagree of course. Part of the problem is that we do not have a QA department to set review policy. But you will never, ever get everything "right the first time" and that's why we have peer review and continuous integration in the first place (and even with those things some things will be wrong and that's OK).
In the meantime, I think there would be less of a problem here if there were a test policy and review policy for build issues. Lacking that, you can't point fingers if someone didn't follow a policy that doesn't exist. (Though as I've noted, I don't think it's fair to "point fingers" at all on code that is still under review in the first place).
I appreciate your code reviews so I hope you'll keep doing them as they are very valuable; but please don't assume that if something is broken it's not because the contributor was lazy or careless or doesn't know what they're doing.
comment:20 Changed 3 years ago by
- Milestone changed from sage-8.4 to sage-8.6
comment:21 Changed 3 years ago by
- Status changed from needs_review to needs_work
comment:22 Changed 3 years ago by
Per #26990, the future
package also inserts itself at a fairly "low level" in that it ends up getting imported, once it's installed, by other Python build tools like setuptools and/or pip. So it should also be included in the list of toolchain packages.
comment:23 Changed 3 years ago by
- Commit changed from e7efb49659ca300b0f894a6624bb6e7ab3466415 to cce3a23d86197f4d9603d3640d4196cbafa8a7fe
comment:24 Changed 3 years ago by
- Commit changed from cce3a23d86197f4d9603d3640d4196cbafa8a7fe to a06c851550132fb137aafa09b2f749a52dd5fede
Branch pushed to git repo; I updated commit sha1. This was a forced push. New commits:
a06c851 | add the "future" package to PYTHON_TOOLCHAIN
|
comment:25 Changed 3 years ago by
- Commit changed from a06c851550132fb137aafa09b2f749a52dd5fede to 6c9482b6d65a153cb87f1c07801f503e4c7e7437
comment:26 Changed 3 years ago by
- Status changed from needs_work to needs_review
comment:27 Changed 3 years ago by
- Milestone changed from sage-8.6 to sage-8.7
Retarging tickets optimistically to the next milestone. If you are responsible for this ticket (either its reporter or owner) and don't believe you are likely to complete this ticket before the next release (8.7) please retarget this ticket's milestone to sage-pending or sage-wishlist.
comment:29 Changed 3 years ago by
- Milestone changed from sage-8.7 to sage-8.8
Ticket retargeted after milestone closed (if you don't believe this ticket is appropriate for the Sage 8.8 release please retarget manually)
comment:30 Changed 3 years ago by
- Milestone changed from sage-8.8 to sage-8.9
Tickets still needing working or clarification should be moved to the next release milestone at the soonest (please feel free to revert if you think the ticket is close to being resolved).
comment:31 Changed 3 years ago by
- Milestone changed from sage-8.9 to sage-9.1
Ticket retargeted after milestone closed
comment:32 Changed 2 years ago by
- Priority changed from major to critical
I have started to see this race conditions in automatic builds (for which I use high parallelization) now quite regularly. For example https://github.com/mkoeppe/sage/runs/581024663?check_suite_focus=true
comment:33 Changed 2 years ago by
- Cc dimpase jhpalmieri added
comment:34 Changed 2 years ago by
- Description modified (diff)
comment:35 Changed 2 years ago by
- Branch changed from u/embray/build/python_toolchain to u/jhpalmieri/build/python_toolchain
comment:36 Changed 2 years ago by
- Commit changed from 6c9482b6d65a153cb87f1c07801f503e4c7e7437 to 6b20b84fa42303d8d96373f6e8f75546b97a079c
comment:37 Changed 2 years ago by
- Status changed from needs_work to needs_review
comment:38 Changed 2 years ago by
Looks good to me, I'll try this out on one of my next GH Action runs
comment:39 Changed 2 years ago by
- Reviewers set to John Palmieri, Matthias Koeppe
comment:40 Changed 2 years ago by
By the way, I ran "git grep pip build/pkgs/*/dependencies" and similarly searched for "setuptools" and "future". I think I caught everything that needed changing. There are a few packages (I think notedown and pandoc_attributes are the only ones) with dependencies like
$(PYTHON) $(PYTHON_TOOLCHAIN) | pip nbformat nbconvert six pandoc_attributes
and I left those as is.
comment:41 Changed 2 years ago by
- Priority changed from critical to blocker
comment:42 Changed 2 years ago by
comment:43 Changed 2 years ago by
- Status changed from needs_review to positive_review
This seems to be working well. Thanks!
comment:44 Changed 2 years ago by
- Branch changed from u/jhpalmieri/build/python_toolchain to 6b20b84fa42303d8d96373f6e8f75546b97a079c
- Resolution set to fixed
- Status changed from positive_review to closed
comment:45 Changed 2 years ago by
- Commit 6b20b84fa42303d8d96373f6e8f75546b97a079c deleted
As was to be expected, this helped but did not resolve the issue completely. Follow-up:
- #29585 More race conditions with Python package installations because of DESTDIR
Very naive fix at least for Python packages:
build/bin/sage-spkg
); do