Opened 2 years ago

Closed 12 months ago

Last modified 11 months ago

#29585 closed defect (fixed)

Remove DESTDIR staging for Python packages to eliminate race conditions during Python package installations

Reported by: mkoeppe Owned by:
Priority: critical Milestone: sage-9.4
Component: build Keywords: sd111
Cc: jhpalmieri, dimpase, embray, gh-kliem Merged in:
Authors: Matthias Koeppe Reviewers: John Palmieri
Report Upstream: N/A Work issues:
Branch: c3eeb69 (Commits, GitHub, GitLab) Commit:
Dependencies: Stopgaps:

Status badges

Description (last modified by mkoeppe)

Follow-up from #26018:

Declaring some Python packages as PYTHON_TOOLCHAIN helped somewhat, but there are more race conditions with the same failure mechanism: Some Python package byte-compiles modules from a half-installed package, and then finishing the installation of that fails.

Of course, the high parallelization in my automatic runs on GitHub provoke these kinds of errors. But this should be fixed nevertheless.

See https://github.com/mkoeppe/sage/runs/618645992

  [backports_shutil_get_terminal_size-1.0.0.p1]   cp: cannot create regular file '/sage/local/./lib/python2.7/site-packages/backports/__init__.py': File exists
  [backports_shutil_get_terminal_size-1.0.0.p1]   ************************************************************************
  [backports_shutil_get_terminal_size-1.0.0.p1]   Error copying files for backports_shutil_get_terminal_size-1.0.0.p1.
  [backports_shutil_get_terminal_size-1.0.0.p1]   ************************************************************************
  [backports_ssl_match_hostname-3.5.0.1.p0] error installing, exit status 1. End of log file:
  [backports_shutil_get_terminal_size-1.0.0.p1] Full log file: /sage/logs/pkgs/backports_shutil_get_terminal_size-1.0.0.p1.log


  [backports_ssl_match_hostname-3.5.0.1.p0]   Copying package files from temporary location /sage/local/var/tmp/sage/build/backports_ssl_match_hostname-3.5.0.1.p0/inst to /sage/local
  [backports_ssl_match_hostname-3.5.0.1.p0]   cp: cannot create regular file '/sage/local/./lib/python2.7/site-packages/backports/__init__.pyc': File exists
  [backports_ssl_match_hostname-3.5.0.1.p0]   ************************************************************************
  [backports_ssl_match_hostname-3.5.0.1.p0]   Error copying files for backports_ssl_match_hostname-3.5.0.1.p0.
  [backports_ssl_match_hostname-3.5.0.1.p0]   ************************************************************************
  [backports_ssl_match_hostname-3.5.0.1.p0] Full log file: /sage/logs/pkgs/backports_ssl_match_hostname-3.5.0.1.p0.log

Another occurrence on Cygwin: https://github.com/mkoeppe/sage/runs/638920131?check_suite_focus=true

Another occurrence: #30831

A new one thanks to recent pip's opportunistic use of tornado:

  [tornado-6.0.4]   real	0m11.342s
  [tornado-6.0.4]   user	0m2.041s
  [tornado-6.0.4]   sys	0m0.380s
  [tornado-6.0.4]   Copying package files from temporary location /sage/local/var/tmp/sage/build/tornado-6.0.4/inst to /sage/local
  [tornado-6.0.4]   cp: cannot create directory '/sage/local/./lib/python3.9/site-packages/tornado/__pycache__': File exists
  [tornado-6.0.4]   ************************************************************************

https://github.com/sagemath/sage/runs/2879682001

In this ticket, we remove the DESTDIR staging for Python packages. It has been redundant since we moved to installing packages via wheels.

After this change, the only file that is tracked in $SAGE_LOCAL/var/lib/sage/installed/pynormaliz-2.14 is the wheel file:

    "files": [
        "var/lib/sage/wheels/PyNormaliz-2.14-cp39-cp39-macosx_11_0_x86_64.whl"
    ]
}

Uninstalling Python packages is now done via pip, by means of a new removal script called spkg-piprm installed in $SAGE_LOCAL/var/lib/sage/scripts/SPKG/.

Change History (42)

comment:1 Changed 2 years ago by mkoeppe

Various fixes are possible here:

  • More dependencies need to be added
  • Copying from the staging dir needs to make Python package directory creation atomic (by copying to a temp name in the destination directory, followed by an (atomic) mv of the directory)
  • It should be checked if current versions of pip are safe for concurrent use so we don't have to reinvent the wheel

comment:2 follow-up: Changed 2 years ago by jhpalmieri

In general, I think that package installation can't be atomic, since we need to move things to SAGE_LOCAL/lib and SAGE_LOCAL/include, and we can't just create a directory temp/lib and move it to SAGE_LOCAL without overwriting the existing one. But maybe this could work with Python packages. Rewrite sdh_pip_install?

comment:3 Changed 2 years ago by mkoeppe

Yes, I mean specifically for Python packages.

comment:4 in reply to: ↑ 2 Changed 2 years ago by mkoeppe

Replying to jhpalmieri:

Rewrite sdh_pip_install?

I think the copy-from-staging-dir code could just deal with the Python directories specially. We are working around another problem (lib64) with that code already anyway.

This would solve it even for Python packages that are not installed using pip currently (#29500).

comment:5 Changed 2 years ago by mkoeppe

In fact (when SAGE_SUDO is unset), the whole code should actually recursively move, instead of copy, directories. This will be much faster too (the staged directory is already on the same filesystem)

comment:6 follow-up: Changed 2 years ago by jhpalmieri

mv vs cp was discussed at #26011.

Something like the following (rough draft, untested)?

  • build/bin/sage-spkg

    diff --git a/build/bin/sage-spkg b/build/bin/sage-spkg
    index 5cdd55db3f..0f8acf9b97 100755
    a b if [ -d "$SAGE_DESTDIR" ]; then 
    946946    # Generate installed file manifest
    947947    FILE_LIST="$(cd "$PREFIX" && find . -type f -o -type l | sed 's|^\./||' | sort)"
    948948
     949    if [ -d "$PREFIX"/lib/python*/site-packages/ ]; then
     950        echo "This is a Python package: moving site-package directories atomically."
     951        SITE_PACKAGES=$(python -c 'import site; print(site.getsitepackages()[0])')
     952        $SAGE_SUDO mv "$PREFIX"/lib/python*/site-packages/* "$SITE_PACKAGES"
     953    fi
     954
    949955    # Copy files into $SAGE_LOCAL
    950956    $SAGE_SUDO cp -Rp "$PREFIX/." "$SAGE_LOCAL"
    951957    if [ $? -ne 0 ]; then
Last edited 2 years ago by jhpalmieri (previous) (diff)

comment:7 in reply to: ↑ 6 Changed 2 years ago by mkoeppe

Replying to jhpalmieri:

mv vs cp was discussed at #26011.

Thanks for the pointer. The mv --recursive that Erik is asking about in that thread, is exactly what would need to be implemented - best in Python (as also discussed in that thread).

Something like the following (rough draft, untested)?

  • build/bin/sage-spkg

    diff --git a/build/bin/sage-spkg b/build/bin/sage-spkg
    index 5cdd55db3f..0f8acf9b97 100755
    a b if [ -d "$SAGE_DESTDIR" ]; then 
    946946    # Generate installed file manifest
    947947    FILE_LIST="$(cd "$PREFIX" && find . -type f -o -type l | sed 's|^\./||' | sort)"
    948948
     949    if [ -d "$PREFIX"/lib/python*/site-packages/ ]; then
     950        echo "This is a Python package: moving site-package directories atomically."
     951        SITE_PACKAGES=$(python -c 'import site; print(site.getsitepackages()[0])')
     952        $SAGE_SUDO mv "$PREFIX"/lib/python*/site-packages/* "$SITE_PACKAGES"
     953    fi
     954
    949955    # Copy files into $SAGE_LOCAL
    950956    $SAGE_SUDO cp -Rp "$PREFIX/." "$SAGE_LOCAL"
    951957    if [ $? -ne 0 ]; then

That could work, but I think I would prefer the more general solution.

comment:8 Changed 2 years ago by mkoeppe

This could be done in some 20 lines of code I guess that would take care of getting the intended concurrency semantics right. Basically try to move a source directory to the target, handling EEXIST and in that case recursing instead.

comment:9 Changed 2 years ago by mkoeppe

The new script would also

  • make sure that the permissions are correct (avoiding the problems with readonly files that we encountered in #29082);
  • handle the intended semantics of copying symlinks to directory over symlinks (as needed for lib64) and
  • issue warnings when concurrent writes to the same file (not directory) / writes to existing files are detected.

comment:11 Changed 2 years ago by jhpalmieri

Ha, yes, I stumbled on that website, too.

comment:12 Changed 2 years ago by mkoeppe

  • Description modified (diff)

comment:13 Changed 2 years ago by mkoeppe

  • Milestone changed from sage-9.1 to sage-9.2

comment:14 Changed 22 months ago by embray

I think this sort of issue was solved a long time ago with sage-pip-install, and its use of sage-flock to prevent pip installing more than one package simultaneously.

Unfortunately, with the switch to the DESTDIR system, while this is probably still helpful, it doesn't help during copying of files into SAGE_LOCAL.

Perhaps the pip-lock should also be held when copying packages into $SAGE_LOCAL (don't even bother checking if it's an actual Python package; just prevent copying anything into $SAGE_LOCAL without holding the pip-lock).

comment:15 Changed 22 months ago by mkoeppe

In 9.3 I hope to switch installation of all Python packages from DESTDIR to use pip wheel - see #29500.

comment:16 Changed 21 months ago by mkoeppe

  • Milestone changed from sage-9.2 to sage-9.3

comment:17 Changed 20 months ago by mkoeppe

  • Description modified (diff)

comment:18 Changed 19 months ago by mkoeppe

  • Keywords sd111 added

Hoping we can make progress on this ticket this week - https://wiki.sagemath.org/days111

comment:19 Changed 17 months ago by mkoeppe

  • Milestone changed from sage-9.3 to sage-9.4

Setting new milestone based on a cursory review of ticket status, priority, and last modification date.

comment:20 Changed 13 months ago by mkoeppe

  • Description modified (diff)

comment:21 Changed 13 months ago by mkoeppe

  • Description modified (diff)

comment:22 Changed 13 months ago by mkoeppe

  • Branch set to u/mkoeppe/more_race_conditions_with_python_package_installations_because_of_destdir

comment:23 Changed 13 months ago by git

  • Commit set to 49af8aa45b40ae471e4748a7f343f8050df6383e

Branch pushed to git repo; I updated commit sha1. New commits:

3b435d9build/bin/sage-dist-helpers (sdh_store_and_pip_install_wheel): Record name of installed distribution name
34bd445sdh_pip_uninstall (sdh_pip_uninstall): New helper functtion; use same flags in 'make SPKG-clean' for pip packages
49af8aabuild/bin/sage-spkg, build/sage_bootstrap/uninstall.py: Prepare/install/use the spkg-piprm script

comment:24 Changed 13 months ago by mkoeppe

  • Authors set to Matthias Koeppe
  • Description modified (diff)
  • Summary changed from More race conditions with Python package installations because of DESTDIR to Remove DESTDIR staging for Python packages to eliminate race conditions during Python package installations

comment:25 Changed 13 months ago by mkoeppe

  • Status changed from new to needs_review

comment:26 Changed 12 months ago by mkoeppe

  • Cc gh-kliem added

comment:27 follow-ups: Changed 12 months ago by jhpalmieri

  • It would be good to document sdh_pip_uninstall, both at the top of the file and in the developer's guide.
  • Unindent the block in sage-spkg starting on line 449? The block right after the comment "Extract the package" — you deleted the enclosing if and fi but left the block indented.
  • Typo: "pacakges" on line 538 of sage-spkg.
  • Was the variable USE_LOCAL_SCRIPTS essentially used to distinguish between old-stye and new-style packages?

The changes make sense, but it's hard for me to test them because I can't reproduce the race conditions. I've tried make -j20 on a pretty fast computer, but no luck so far.

comment:28 Changed 12 months ago by git

  • Commit changed from 49af8aa45b40ae471e4748a7f343f8050df6383e to ecd49fb5900e8cc8139efc3c50b6f8aed14c16df

Branch pushed to git repo; I updated commit sha1. New commits:

ecd49fbbuild/bin/sage-dist-helpers, src/doc/en/developer/packaging.rst: Document sdh_pip_uninstall

comment:29 Changed 12 months ago by git

  • Commit changed from ecd49fb5900e8cc8139efc3c50b6f8aed14c16df to c3eeb69f78412ed0d8ad4c7aebdb834b512aad3c

Branch pushed to git repo; I updated commit sha1. New commits:

c3eeb69build/bin/sage-spkg: Fix typo in comment, unindent a block

comment:30 in reply to: ↑ 27 Changed 12 months ago by mkoeppe

Replying to jhpalmieri:

  • Was the variable USE_LOCAL_SCRIPTS essentially used to distinguish between old-stye and new-style packages?

Yes, exactly

comment:31 in reply to: ↑ 27 Changed 12 months ago by mkoeppe

Replying to jhpalmieri:

it's hard for me to test them because I can't reproduce the race conditions. I've tried make -j20 on a pretty fast computer, but no luck so far.

Based on the failing runs that I have seen on GH Actions, I would estimate 1 in 30 to 1 in 200 runs fail...

In any case, there's another reason why we should make this change: Whenever a user uses sage -pip install to install any package, this can update various other Python packages as a side effect. So the installation records that the DESTDIR staging uses easily get out of sync with what's actually installed, making uninstallation by this method basically a gamble.

comment:32 follow-ups: Changed 12 months ago by jhpalmieri

Not for this ticket, but I really don't like the warning message here (when running make numpy-clean):

Uninstalling existing 'numpy'
Running pip-uninstall script for 'numpy'
WARNING: Skipping numpy as it is not installed.
Removing stamp file '/Users/palmieri/Desktop/Sage/sage_builds/TESTING/sage-9.4.beta4/local/var/lib/sage/installed/numpy-1.20.3'

For this ticket: if I do make numpy-clean, should it also uninstall the packages that depend on numpy?

I also tried make jupyterlab_widgets, but it didn't write anything in local/var/lib/sage/scripts. This installed the packages jupyterlab, nodejs, and jupyterlab_widgets, at least according to the files in logs/pkgs, but there is no corresponding jupyterlab file in local/var/lib/sage/installed. The files for nodejs and jupyterlab_widgets are empty. make jupyterlab-clean did not remove jupyterlab_widgets. Is this the expected behavior? Maybe I don't understand your last post about how uninstallation is supposed to work.

comment:33 in reply to: ↑ 32 Changed 12 months ago by mkoeppe

Replying to jhpalmieri:

I also tried make jupyterlab_widgets, but it didn't write anything in local/var/lib/sage/scripts. This installed the packages jupyterlab, nodejs, and jupyterlab_widgets, at least according to the files in logs/pkgs, but there is no corresponding jupyterlab file in local/var/lib/sage/installed. The files for nodejs and jupyterlab_widgets are empty. make jupyterlab-clean did not remove jupyterlab_widgets. Is this the expected behavior?

jupyterlab_widgets and nodejs are script packages; we keep track of their installation by creating an empty stamp file in ...installed. The present ticket makes no changes to script packages. There is no uninstallation procedure for script packages at all - only the stamp file is removed. (Only sagelib-clean and sage_docbuild-clean are defined directly in build/make/Makefile.in.)

jupyterlab is a pip package; we do not keep installation records for them. The present ticket makes no changes to pip packages either. Uninstallation is done using sage --pip uninstall (invoked by the makefile).

Last edited 12 months ago by mkoeppe (previous) (diff)

comment:34 in reply to: ↑ 32 Changed 12 months ago by mkoeppe

Replying to jhpalmieri:

if I do make numpy-clean, should it also uninstall the packages that depend on numpy?

No, I don't think so. Neither our current build system (before the present ticket) nor pip do this.

(Reinstallation of packages depending on numpy is instead triggered by a reinstallation of numpy.)

comment:35 in reply to: ↑ 32 Changed 12 months ago by mkoeppe

Replying to jhpalmieri:

Not for this ticket, but I really don't like the warning message here (when running make numpy-clean):

Uninstalling existing 'numpy'
Running pip-uninstall script for 'numpy'
WARNING: Skipping numpy as it is not installed.
Removing stamp file '/Users/palmieri/Desktop/Sage/sage_builds/TESTING/sage-9.4.beta4/local/var/lib/sage/installed/numpy-1.20.3'

I think this warning will only show up if you switch between branches with and without this ticket.

comment:36 follow-up: Changed 12 months ago by jhpalmieri

Is pip better at atomic actions?

comment:37 in reply to: ↑ 36 Changed 12 months ago by mkoeppe

Replying to jhpalmieri:

Is pip better at atomic actions?

I don't know, but we do not depend on it to be because we are holding a lock while installing a wheel, in build/bin/sage-pip-install. This was added in #21672.

comment:38 Changed 12 months ago by jhpalmieri

  • Reviewers set to John Palmieri
  • Status changed from needs_review to positive_review

Okay, let's merge it so it gets more testing.

comment:39 Changed 12 months ago by mkoeppe

Thanks!

comment:40 Changed 12 months ago by vbraun

  • Branch changed from u/mkoeppe/more_race_conditions_with_python_package_installations_because_of_destdir to c3eeb69f78412ed0d8ad4c7aebdb834b512aad3c
  • Resolution set to fixed
  • Status changed from positive_review to closed

comment:41 Changed 11 months ago by mkoeppe

  • Commit c3eeb69f78412ed0d8ad4c7aebdb834b512aad3c deleted

A crucial commit was missing on this branch. Follow-up in #32361.

comment:42 Changed 11 months ago by embray

I'm at peace with this solution :)

Note: See TracTickets for help on using tickets.