#29585 closed defect (fixed)
Remove DESTDIR staging for Python packages to eliminate race conditions during Python package installations
Reported by: | mkoeppe | Owned by: | |
---|---|---|---|
Priority: | critical | Milestone: | sage-9.4 |
Component: | build | Keywords: | sd111 |
Cc: | jhpalmieri, dimpase, embray, gh-kliem | Merged in: | |
Authors: | Matthias Koeppe | Reviewers: | John Palmieri |
Report Upstream: | N/A | Work issues: | |
Branch: | c3eeb69 (Commits, GitHub, GitLab) | Commit: | |
Dependencies: | Stopgaps: |
Description (last modified by )
Follow-up from #26018:
Declaring some Python packages as PYTHON_TOOLCHAIN helped somewhat, but there are more race conditions with the same failure mechanism: Some Python package byte-compiles modules from a half-installed package, and then finishing the installation of that fails.
Of course, the high parallelization in my automatic runs on GitHub provoke these kinds of errors. But this should be fixed nevertheless.
See https://github.com/mkoeppe/sage/runs/618645992
[backports_shutil_get_terminal_size-1.0.0.p1] cp: cannot create regular file '/sage/local/./lib/python2.7/site-packages/backports/__init__.py': File exists [backports_shutil_get_terminal_size-1.0.0.p1] ************************************************************************ [backports_shutil_get_terminal_size-1.0.0.p1] Error copying files for backports_shutil_get_terminal_size-1.0.0.p1. [backports_shutil_get_terminal_size-1.0.0.p1] ************************************************************************ [backports_ssl_match_hostname-3.5.0.1.p0] error installing, exit status 1. End of log file: [backports_shutil_get_terminal_size-1.0.0.p1] Full log file: /sage/logs/pkgs/backports_shutil_get_terminal_size-1.0.0.p1.log [backports_ssl_match_hostname-3.5.0.1.p0] Copying package files from temporary location /sage/local/var/tmp/sage/build/backports_ssl_match_hostname-3.5.0.1.p0/inst to /sage/local [backports_ssl_match_hostname-3.5.0.1.p0] cp: cannot create regular file '/sage/local/./lib/python2.7/site-packages/backports/__init__.pyc': File exists [backports_ssl_match_hostname-3.5.0.1.p0] ************************************************************************ [backports_ssl_match_hostname-3.5.0.1.p0] Error copying files for backports_ssl_match_hostname-3.5.0.1.p0. [backports_ssl_match_hostname-3.5.0.1.p0] ************************************************************************ [backports_ssl_match_hostname-3.5.0.1.p0] Full log file: /sage/logs/pkgs/backports_ssl_match_hostname-3.5.0.1.p0.log
Another occurrence on Cygwin: https://github.com/mkoeppe/sage/runs/638920131?check_suite_focus=true
Another occurrence: #30831
A new one thanks to recent pip
's opportunistic use of tornado
:
[tornado-6.0.4] real 0m11.342s [tornado-6.0.4] user 0m2.041s [tornado-6.0.4] sys 0m0.380s [tornado-6.0.4] Copying package files from temporary location /sage/local/var/tmp/sage/build/tornado-6.0.4/inst to /sage/local [tornado-6.0.4] cp: cannot create directory '/sage/local/./lib/python3.9/site-packages/tornado/__pycache__': File exists [tornado-6.0.4] ************************************************************************
https://github.com/sagemath/sage/runs/2879682001
In this ticket, we remove the DESTDIR
staging for Python packages. It has been redundant since we moved to installing packages via wheels.
After this change, the only file that is tracked in $SAGE_LOCAL/var/lib/sage/installed/pynormaliz-2.14
is the wheel file:
"files": [ "var/lib/sage/wheels/PyNormaliz-2.14-cp39-cp39-macosx_11_0_x86_64.whl" ] }
Uninstalling Python packages is now done via pip
, by means of a new removal script called spkg-piprm
installed in $SAGE_LOCAL/var/lib/sage/scripts/SPKG/
.
Change History (42)
comment:1 Changed 2 years ago by
comment:2 follow-up: ↓ 4 Changed 2 years ago by
In general, I think that package installation can't be atomic, since we need to move things to SAGE_LOCAL/lib and SAGE_LOCAL/include, and we can't just create a directory temp/lib and move it to SAGE_LOCAL without overwriting the existing one. But maybe this could work with Python packages. Rewrite sdh_pip_install
?
comment:3 Changed 2 years ago by
Yes, I mean specifically for Python packages.
comment:4 in reply to: ↑ 2 Changed 2 years ago by
Replying to jhpalmieri:
Rewrite
sdh_pip_install
?
I think the copy-from-staging-dir code could just deal with the Python directories specially. We are working around another problem (lib64) with that code already anyway.
This would solve it even for Python packages that are not installed using pip currently (#29500).
comment:5 Changed 2 years ago by
In fact (when SAGE_SUDO
is unset), the whole code should actually recursively move, instead of copy, directories. This will be much faster too (the staged directory is already on the same filesystem)
comment:6 follow-up: ↓ 7 Changed 2 years ago by
mv
vs cp
was discussed at #26011.
Something like the following (rough draft, untested)?
-
build/bin/sage-spkg
diff --git a/build/bin/sage-spkg b/build/bin/sage-spkg index 5cdd55db3f..0f8acf9b97 100755
a b if [ -d "$SAGE_DESTDIR" ]; then 946 946 # Generate installed file manifest 947 947 FILE_LIST="$(cd "$PREFIX" && find . -type f -o -type l | sed 's|^\./||' | sort)" 948 948 949 if [ -d "$PREFIX"/lib/python*/site-packages/ ]; then 950 echo "This is a Python package: moving site-package directories atomically." 951 SITE_PACKAGES=$(python -c 'import site; print(site.getsitepackages()[0])') 952 $SAGE_SUDO mv "$PREFIX"/lib/python*/site-packages/* "$SITE_PACKAGES" 953 fi 954 949 955 # Copy files into $SAGE_LOCAL 950 956 $SAGE_SUDO cp -Rp "$PREFIX/." "$SAGE_LOCAL" 951 957 if [ $? -ne 0 ]; then
comment:7 in reply to: ↑ 6 Changed 2 years ago by
Replying to jhpalmieri:
mv
vscp
was discussed at #26011.
Thanks for the pointer. The mv --recursive
that Erik is asking about in that thread, is exactly what would need to be implemented - best in Python (as also discussed in that thread).
Something like the following (rough draft, untested)?
build/bin/sage-spkg
diff --git a/build/bin/sage-spkg b/build/bin/sage-spkg index 5cdd55db3f..0f8acf9b97 100755
a b if [ -d "$SAGE_DESTDIR" ]; then 946 946 # Generate installed file manifest 947 947 FILE_LIST="$(cd "$PREFIX" && find . -type f -o -type l | sed 's|^\./||' | sort)" 948 948 949 if [ -d "$PREFIX"/lib/python*/site-packages/ ]; then 950 echo "This is a Python package: moving site-package directories atomically." 951 SITE_PACKAGES=$(python -c 'import site; print(site.getsitepackages()[0])') 952 $SAGE_SUDO mv "$PREFIX"/lib/python*/site-packages/* "$SITE_PACKAGES" 953 fi 954 949 955 # Copy files into $SAGE_LOCAL 950 956 $SAGE_SUDO cp -Rp "$PREFIX/." "$SAGE_LOCAL" 951 957 if [ $? -ne 0 ]; then
That could work, but I think I would prefer the more general solution.
comment:8 Changed 2 years ago by
This could be done in some 20 lines of code I guess that would take care of getting the intended concurrency semantics right. Basically try to move a source directory to the target, handling EEXIST and in that case recursing instead.
comment:9 Changed 2 years ago by
The new script would also
- make sure that the permissions are correct (avoiding the problems with readonly files that we encountered in #29082);
- handle the intended semantics of copying symlinks to directory over symlinks (as needed for
lib64
) and - issue warnings when concurrent writes to the same file (not directory) / writes to existing files are detected.
comment:10 Changed 2 years ago by
comment:11 Changed 2 years ago by
Ha, yes, I stumbled on that website, too.
comment:12 Changed 2 years ago by
- Description modified (diff)
comment:13 Changed 2 years ago by
- Milestone changed from sage-9.1 to sage-9.2
comment:14 Changed 22 months ago by
I think this sort of issue was solved a long time ago with sage-pip-install
, and its use of sage-flock
to prevent pip installing more than one package simultaneously.
Unfortunately, with the switch to the DESTDIR system, while this is probably still helpful, it doesn't help during copying of files into SAGE_LOCAL.
Perhaps the pip-lock should also be held when copying packages into $SAGE_LOCAL (don't even bother checking if it's an actual Python package; just prevent copying anything into $SAGE_LOCAL without holding the pip-lock).
comment:15 Changed 22 months ago by
In 9.3 I hope to switch installation of all Python packages from DESTDIR to use pip wheel - see #29500.
comment:16 Changed 21 months ago by
- Milestone changed from sage-9.2 to sage-9.3
comment:17 Changed 20 months ago by
- Description modified (diff)
comment:18 Changed 19 months ago by
- Keywords sd111 added
Hoping we can make progress on this ticket this week - https://wiki.sagemath.org/days111
comment:19 Changed 17 months ago by
- Milestone changed from sage-9.3 to sage-9.4
Setting new milestone based on a cursory review of ticket status, priority, and last modification date.
comment:20 Changed 13 months ago by
- Description modified (diff)
comment:21 Changed 13 months ago by
- Description modified (diff)
comment:22 Changed 13 months ago by
- Branch set to u/mkoeppe/more_race_conditions_with_python_package_installations_because_of_destdir
comment:23 Changed 13 months ago by
- Commit set to 49af8aa45b40ae471e4748a7f343f8050df6383e
Branch pushed to git repo; I updated commit sha1. New commits:
3b435d9 | build/bin/sage-dist-helpers (sdh_store_and_pip_install_wheel): Record name of installed distribution name
|
34bd445 | sdh_pip_uninstall (sdh_pip_uninstall): New helper functtion; use same flags in 'make SPKG-clean' for pip packages
|
49af8aa | build/bin/sage-spkg, build/sage_bootstrap/uninstall.py: Prepare/install/use the spkg-piprm script
|
comment:24 Changed 13 months ago by
- Description modified (diff)
- Summary changed from More race conditions with Python package installations because of DESTDIR to Remove DESTDIR staging for Python packages to eliminate race conditions during Python package installations
comment:25 Changed 13 months ago by
- Status changed from new to needs_review
comment:26 Changed 12 months ago by
- Cc gh-kliem added
comment:27 follow-ups: ↓ 30 ↓ 31 Changed 12 months ago by
- It would be good to document
sdh_pip_uninstall
, both at the top of the file and in the developer's guide.
- Unindent the block in
sage-spkg
starting on line 449? The block right after the comment "Extract the package" — you deleted the enclosingif
andfi
but left the block indented.
- Typo: "pacakges" on line 538 of
sage-spkg
.
- Was the variable
USE_LOCAL_SCRIPTS
essentially used to distinguish between old-stye and new-style packages?
The changes make sense, but it's hard for me to test them because I can't reproduce the race conditions. I've tried make -j20
on a pretty fast computer, but no luck so far.
comment:28 Changed 12 months ago by
- Commit changed from 49af8aa45b40ae471e4748a7f343f8050df6383e to ecd49fb5900e8cc8139efc3c50b6f8aed14c16df
Branch pushed to git repo; I updated commit sha1. New commits:
ecd49fb | build/bin/sage-dist-helpers, src/doc/en/developer/packaging.rst: Document sdh_pip_uninstall
|
comment:29 Changed 12 months ago by
- Commit changed from ecd49fb5900e8cc8139efc3c50b6f8aed14c16df to c3eeb69f78412ed0d8ad4c7aebdb834b512aad3c
Branch pushed to git repo; I updated commit sha1. New commits:
c3eeb69 | build/bin/sage-spkg: Fix typo in comment, unindent a block
|
comment:30 in reply to: ↑ 27 Changed 12 months ago by
Replying to jhpalmieri:
- Was the variable
USE_LOCAL_SCRIPTS
essentially used to distinguish between old-stye and new-style packages?
Yes, exactly
comment:31 in reply to: ↑ 27 Changed 12 months ago by
Replying to jhpalmieri:
it's hard for me to test them because I can't reproduce the race conditions. I've tried
make -j20
on a pretty fast computer, but no luck so far.
Based on the failing runs that I have seen on GH Actions, I would estimate 1 in 30 to 1 in 200 runs fail...
In any case, there's another reason why we should make this change: Whenever a user uses sage -pip install
to install any package, this can update various other Python packages as a side effect. So the installation records that the DESTDIR staging uses easily get out of sync with what's actually installed, making uninstallation by this method basically a gamble.
comment:32 follow-ups: ↓ 33 ↓ 34 ↓ 35 Changed 12 months ago by
Not for this ticket, but I really don't like the warning message here (when running make numpy-clean
):
Uninstalling existing 'numpy' Running pip-uninstall script for 'numpy' WARNING: Skipping numpy as it is not installed. Removing stamp file '/Users/palmieri/Desktop/Sage/sage_builds/TESTING/sage-9.4.beta4/local/var/lib/sage/installed/numpy-1.20.3'
For this ticket: if I do make numpy-clean
, should it also uninstall the packages that depend on numpy?
I also tried make jupyterlab_widgets
, but it didn't write anything in local/var/lib/sage/scripts
. This installed the packages jupyterlab
, nodejs
, and jupyterlab_widgets
, at least according to the files in logs/pkgs
, but there is no corresponding jupyterlab
file in local/var/lib/sage/installed
. The files for nodejs
and jupyterlab_widgets
are empty. make jupyterlab-clean
did not remove jupyterlab_widgets
. Is this the expected behavior? Maybe I don't understand your last post about how uninstallation is supposed to work.
comment:33 in reply to: ↑ 32 Changed 12 months ago by
Replying to jhpalmieri:
I also tried
make jupyterlab_widgets
, but it didn't write anything inlocal/var/lib/sage/scripts
. This installed the packagesjupyterlab
,nodejs
, andjupyterlab_widgets
, at least according to the files inlogs/pkgs
, but there is no correspondingjupyterlab
file inlocal/var/lib/sage/installed
. The files fornodejs
andjupyterlab_widgets
are empty.make jupyterlab-clean
did not removejupyterlab_widgets
. Is this the expected behavior?
jupyterlab_widgets
and nodejs
are script packages; we keep track of their installation by creating an empty stamp file in ...installed
. The present ticket makes no changes to script packages. There is no uninstallation procedure for script packages at all - only the stamp file is removed. (Only sagelib-clean
and sage_docbuild-clean
are defined directly in build/make/Makefile.in
.)
jupyterlab
is a pip package; we do not keep installation records for them. The present ticket makes no changes to pip packages either. Uninstallation is done using sage --pip uninstall
(invoked by the makefile).
comment:34 in reply to: ↑ 32 Changed 12 months ago by
Replying to jhpalmieri:
if I do
make numpy-clean
, should it also uninstall the packages that depend on numpy?
No, I don't think so. Neither our current build system (before the present ticket) nor pip
do this.
(Reinstallation of packages depending on numpy
is instead triggered by a reinstallation of numpy
.)
comment:35 in reply to: ↑ 32 Changed 12 months ago by
Replying to jhpalmieri:
Not for this ticket, but I really don't like the warning message here (when running
make numpy-clean
):Uninstalling existing 'numpy' Running pip-uninstall script for 'numpy' WARNING: Skipping numpy as it is not installed. Removing stamp file '/Users/palmieri/Desktop/Sage/sage_builds/TESTING/sage-9.4.beta4/local/var/lib/sage/installed/numpy-1.20.3'
I think this warning will only show up if you switch between branches with and without this ticket.
comment:36 follow-up: ↓ 37 Changed 12 months ago by
Is pip
better at atomic actions?
comment:37 in reply to: ↑ 36 Changed 12 months ago by
Replying to jhpalmieri:
Is
pip
better at atomic actions?
I don't know, but we do not depend on it to be because we are holding a lock while installing a wheel, in build/bin/sage-pip-install
. This was added in #21672.
comment:38 Changed 12 months ago by
- Reviewers set to John Palmieri
- Status changed from needs_review to positive_review
Okay, let's merge it so it gets more testing.
comment:39 Changed 12 months ago by
Thanks!
comment:40 Changed 12 months ago by
- Branch changed from u/mkoeppe/more_race_conditions_with_python_package_installations_because_of_destdir to c3eeb69f78412ed0d8ad4c7aebdb834b512aad3c
- Resolution set to fixed
- Status changed from positive_review to closed
comment:41 Changed 11 months ago by
- Commit c3eeb69f78412ed0d8ad4c7aebdb834b512aad3c deleted
A crucial commit was missing on this branch. Follow-up in #32361.
comment:42 Changed 11 months ago by
I'm at peace with this solution :)
Various fixes are possible here: