Opened 22 months ago

Closed 15 months ago

Last modified 13 months ago

#24655 closed enhancement (fixed)

Automatically build docker images with CircleCI/GitLab CI

Reported by: saraedum Owned by:
Priority: minor Milestone: sage-8.3
Component: distribution Keywords: docker, CI
Cc: roed, embray, nthiery, mkoeppe, jdemeyer, hivert Merged in:
Authors: Julian Rüth Reviewers: Erik Bray
Report Upstream: N/A Work issues:
Branch: ac6201a (Commits) Commit:
Dependencies: #25161, #25160 Stopgaps:

Description (last modified by saraedum)

It would be nice to update our docker images automatically through continuous integration services. Of course it's good to have these images up-to-date without manual intervention, but this also provides a convenient starting point for people who want to use CI for their own branches of Sage (besides the patchbot).¹

This ticket proposes recipes for GitLab CI and CircleCI to build our docker images automatically. On the respective websites, the CI can be configured to push automatically to the Docker Hub. A webhook (on github) updates the README on Docker Hub automatically.

I implemented this for both GitLab CI and CircleCI. I think GitLab CI is more relevant in the long run, as it is open source and people can provision their own machines as runners. CircleCI, on the other hand, works out of the box for Sage without private test runners, and it also allows for easier debugging since you can log on to the machine running your tests with SSH. I tried to share most of the code between the two implementations.

See also https://github.com/sagemath/docker-images/issues/13 and https://github.com/sagemath/sage-binder-env/issues/3 for a followup (automatically providing Jupyter notebooks for easier review).


Here are some numbers and screenshots (click on the screenshots to go to the real pages):

GitLab CI

If I provision my own runner from Google Cloud with two threads, it takes about 5 hours to build Sage from scratch, run rudimentary tests on the docker images, and upload them to Docker Hub and GitLab's registry.

screenshot

Recycling the build artifacts from the last run on the develop branch brings this down to about ?? minutes (on GitLab's free shared runners with two threads). This roughly breaks down as:

  • 32 minutes for `build-from-latest` (the incremental rebuild; see the sketch after this list):
    • 10 minutes for the actual build (most of which is spent in the docbuild; a known Sphinx bug is partly to blame)
    • ?? minutes spent pulling the sagemath-dev image from Docker Hub (this usually goes away if you provision your own runners and expose the host's docker daemon by setting DOCKER_HOST)
    • a few minutes running through all the fast stages of the Dockerfile
    • a few minutes to push the resulting images to GitLab's registry (using GitLab's cache, this could probably be improved, at least for runners that we provision ourselves)
  • 5-15 minutes for each test (run in parallel); the relevant test is test-dev.sh, which spends 6 minutes in the actual docbuild (just as in build-from-latest) and some 5 minutes pulling the sagemath-dev image from the GitLab registry. (That part should go away with a provisioned runner that exposes the host's docker daemon.)
  • ?? minutes for publishing to Docker Hub, roughly half of which is spent pulling the images from the GitLab registry and the other half pushing them to Docker Hub. (Again, exposing the host's docker daemon would probably cut that time in half.)
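Roughly, "building from latest" means that the Dockerfile does not start from a clean source tree but from the previous sagemath-dev image. A minimal sketch of the idea (the image names and the ARTIFACT_BASE build argument as written here are assumptions based on this description, not necessarily the exact invocation the CI uses):

# Sketch only: reuse the last develop build as the base so that only the
# changed parts of Sage have to be rebuilt.
docker pull sagemath/sagemath-dev:develop
docker build -f docker/Dockerfile \
    --build-arg ARTIFACT_BASE=sagemath/sagemath-dev:develop \
    -t sagemath/sagemath-dev:my-branch .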

With some tricks we could probably bring this down to 25 minutes (see CircleCI below), but we won't get there without giving up on the CI being split into different stages (as is necessary for technical reasons on CircleCI). To go well below that, we would need to pull binary packages from somewhere… I don't see a sustainable way of doing this with the current SPKG system.

screenshot

CircleCI

It typically takes almost 5 hours to build Sage from scratch on CircleCI, run rudimentary tests on the docker images, and upload them to Docker Hub.

screenshot

Recycling the build artifacts from the last run on the develop branch usually brings this down to about 30 minutes. 5 minutes could be saved by not testing the sagemath-dev image, and probably another minute or two if we do not build it at all. Going significantly below 15 minutes is probably hard with the huge sage-the-distribution (7GB uncompressed/2GB compressed) that we currently have to pull every time.

screenshot

Docker Hub

A push to github updates the README on the Docker Hub page. The current sizes are https://img.shields.io/microbadger/image-size/sagemath/sagemath/latest.svg and https://img.shields.io/microbadger/image-size/sagemath/sagemath-dev/latest.svg; unfortunately MicroBadger is somewhat unstable, so these numbers are sometimes incorrectly reported as 0.

screenshot


Here are some things that we need to test before merging this:


After this ticket has been merged, the following steps are necessary:

  • Set up an account for sagemath on Circle CI.
  • Add Docker Hub credentials on Circle CI or GitLab.

To see a demo of what the result looks like, go to https://hub.docker.com/r/sagemath/sagemath/. The CircleCI runs can be seen here https://circleci.com/gh/saraedum/sage, and the GitLab CI runs are here https://gitlab.com/saraedum/sage/pipelines.


¹: I want to run unit tests of an external Sage package, https://github.com/swewers/MCLF. Being able to build a custom docker image which contains some not-yet-merged tickets makes this much easier.

PS: Long-term one could imagine this to be the first step to replace the patchbot with a solution that we do not have to maintain so much ourselves, such as gitlab-runners. This is of course outside of the scope of this ticket but having a bunch of working CI files in our repository might inspire people to script some other tasks in a reproducible and standardized way.

Attachments (5)

gitlab.png (27.8 KB) - added by saraedum 21 months ago.
screenshot
gitlab-rebuild.png (28.7 KB) - added by saraedum 21 months ago.
screenshot
circleci.png (49.3 KB) - added by saraedum 21 months ago.
screenshot
circleci-rebuild.png (15.6 KB) - added by saraedum 21 months ago.
screenshot
dockerhub.png (90.1 KB) - added by saraedum 21 months ago.
screenshot

Download all attachments as: .zip

Change History (362)

comment:1 Changed 22 months ago by saraedum

  • Cc roed embray nthiery added
  • Description modified (diff)

comment:2 Changed 22 months ago by saraedum

  • Branch set to u/saraedum/gitlabci

comment:3 Changed 22 months ago by saraedum

  • Commit set to 91e4ed1fa743786c25b505859faefa516f1e6983
  • Description modified (diff)

New commits:

91e4ed1 Enablec GitLab CI and CircleCI

comment:4 Changed 22 months ago by nthiery

I haven't played much with GitLab CI and CircleCI, so can't really comment on the details for now. But +1 for the general idea. That would be very helpful.

comment:5 Changed 22 months ago by nthiery

  • Cc mkoeppe added

Adding Matthias who did setup continuous integration for various Sage packages (including https://github.com/sagemath/sage_sample).

comment:6 Changed 22 months ago by git

  • Commit changed from 91e4ed1fa743786c25b505859faefa516f1e6983 to 18970ac8cce0c0ced6f5efffb8a63699a18c827e

Branch pushed to git repo; I updated commit sha1. This was a forced push. New commits:

18970ac Enable GitLab CI and CircleCI

comment:7 Changed 22 months ago by git

  • Commit changed from 18970ac8cce0c0ced6f5efffb8a63699a18c827e to 3c855571a4a17738116011cdf9962f675248f129

Branch pushed to git repo; I updated commit sha1. New commits:

e6d2703 Make sure we do not docker build from a stale cache
3bbe45f Distribute sagemath-cli from GitLab
3c85557 There is no guarantee that we are going to see the same docker daemon

comment:8 Changed 21 months ago by git

  • Commit changed from 3c855571a4a17738116011cdf9962f675248f129 to 5f1a0e634a37bfc7b42a31a52a1f3b1e76c27470

Branch pushed to git repo; I updated commit sha1. This was a forced push. New commits:

5f1a0e6 Build docker images automatically

comment:9 Changed 21 months ago by git

  • Commit changed from 5f1a0e634a37bfc7b42a31a52a1f3b1e76c27470 to 69752edeec2006713580d5c20a3c95db03636c43

Branch pushed to git repo; I updated commit sha1. New commits:

69752ed Fix script name

comment:10 Changed 21 months ago by git

  • Commit changed from 69752edeec2006713580d5c20a3c95db03636c43 to 6642d994bbe0d8552578d2902f91b7d03fb37938

Branch pushed to git repo; I updated commit sha1. New commits:

c0514d5 Comment on limitations on CircleCI
1035022 Set workdir to the Sage directory
4ccc256 Fix entrypoints
84c12ed Simplify push/pull scripts
202273c A single entrypoint does not need a directory on its own
a8fcb94 Add tests
cfe2d14 Fix CircleCI config
6642d99 Speed up my build experiments

comment:11 Changed 21 months ago by git

  • Commit changed from 6642d994bbe0d8552578d2902f91b7d03fb37938 to 062d37bec3145981b4ee6b1b1b74b1943c348520

Branch pushed to git repo; I updated commit sha1. New commits:

4e179cc no TTY/STDIN for automated tests
a5b2749 Fix names of docker containers for GitLab CI
295ffd8 Trying to make tests POSIX compliant
e9bacb8 Fixing test for CI
1b72d5b Update docker/README to incorporate latest changes
062d37b Add debug output

comment:12 Changed 21 months ago by git

  • Commit changed from 062d37bec3145981b4ee6b1b1b74b1943c348520 to e4010d8f91082ac1b4d9c2cd2955edda30c638f8

Branch pushed to git repo; I updated commit sha1. This was a forced push. New commits:

4c023d5 Enable CI backed by GitLab CI and CircleCI
6ccbaa3 Be more agressive about requiring the latest develop
7f90d86 be more precise about failed latest merge
8efe3a9 more and more specific badges
15f3ae7 fix "build your own" instructions
595be9c POSIX scripts
e4010d8 Fix branding of CI files

comment:13 Changed 21 months ago by saraedum

  • Description modified (diff)

Changed 21 months ago by saraedum

screenshot

Changed 21 months ago by saraedum

screenshot

Changed 21 months ago by saraedum

screenshot

Changed 21 months ago by saraedum

screenshot

Changed 21 months ago by saraedum

screenshot

comment:14 Changed 21 months ago by saraedum

  • Description modified (diff)
  • Status changed from new to needs_review

comment:15 Changed 21 months ago by saraedum

  • Description modified (diff)

comment:16 Changed 21 months ago by git

  • Commit changed from e4010d8f91082ac1b4d9c2cd2955edda30c638f8 to 10d2093e21b637f31f2f6fef0a994dd14f3cd48f

Branch pushed to git repo; I updated commit sha1. New commits:

10d2093 Comment on timings and possible speedups

comment:17 Changed 21 months ago by saraedum

  • Description modified (diff)

comment:18 Changed 21 months ago by git

  • Commit changed from 10d2093e21b637f31f2f6fef0a994dd14f3cd48f to d99696262a3fd3d50e4ad46bb6c768266dd76cd0

Branch pushed to git repo; I updated commit sha1. New commits:

f1d33ec Explain how docker can be used for local development
d996962 Merge branch 'gitlabci' into ci

comment:19 Changed 21 months ago by git

  • Commit changed from d99696262a3fd3d50e4ad46bb6c768266dd76cd0 to 01e3fc95ea7418371ba8a67d47ca80762f59ef3f

Branch pushed to git repo; I updated commit sha1. Last 10 new commits:

744cc24 Merge remote-tracking branch 'gitlab/binder' into binder
93696a6 git needs ssh to push
ae0feb4 ssh is called openssh
6c2b29c Record github's SSH key
062c40b Merge branch 'ci' into binder
da69969 Allow user to override ARTIFACT_BASE on CircleCI
7a06f9a Create a directory for known_hosts
83a3593 Create environment for binder
0f4796c Merge branch 'ci' into binder
01e3fc9 Merge branch 'binder' into HEAD

comment:20 Changed 21 months ago by git

  • Commit changed from 01e3fc95ea7418371ba8a67d47ca80762f59ef3f to d99696262a3fd3d50e4ad46bb6c768266dd76cd0

Branch pushed to git repo; I updated commit sha1. This was a forced push. New commits:

comment:21 Changed 21 months ago by git

  • Commit changed from d99696262a3fd3d50e4ad46bb6c768266dd76cd0 to 069e722679c5f5e612a7214e51d98cf89b07d137

Branch pushed to git repo; I updated commit sha1. New commits:

d0b3d7f To make this truly a micro-release
069e722 Allow user to override ARTIFACT_BASE on CircleCI

comment:22 Changed 21 months ago by saraedum

  • Description modified (diff)

comment:23 Changed 21 months ago by saraedum

  • Work issues set to run doctests

Maybe my microrelease image is missing something essential? I get a lot of inspection related doctest errors: https://gitlab.com/saraedum/sage/-/jobs/54609045

comment:24 Changed 21 months ago by saraedum

  • Work issues changed from run doctests to fix doctests
sage -t src/doc/common/conf.py  # 1 doctest failed
sage -t src/sage/arith/long.pxd  # 9 doctests failed
sage -t src/sage/cpython/cython_metaclass.pyx  # 4 doctests failed
sage -t src/sage/cpython/getattr.pyx  # 4 doctests failed
sage -t src/sage/cpython/wrapperdescr.pxd  # 6 doctests failed
sage -t src/sage/docs/instancedoc.pyx  # 4 doctests failed
sage -t src/sage/ext/cdefs.pxi  # 1 doctest failed
sage -t src/sage/ext/memory_allocator.pyx  # 2 doctests failed
sage -t src/sage/ext/stdsage.pxi  # 1 doctest failed
sage -t src/sage/libs/gap/util.pyx  # 2 doctests failed
sage -t src/sage/libs/glpk/error.pyx  # 1 doctest failed
sage -t src/sage/misc/cachefunc.pyx  # 54 doctests failed
sage -t src/sage/misc/cython.py  # 17 doctests failed
sage -t src/sage/misc/cython_c.pyx  # 1 doctest failed
sage -t src/sage/misc/inherit_comparison.pyx  # 5 doctests failed
sage -t src/sage/misc/lazy_attribute.pyx  # 3 doctests failed
sage -t src/sage/misc/nested_class.pyx  # 6 doctests failed
sage -t src/sage/misc/sagedoc.py  # 6 doctests failed
sage -t src/sage/misc/sageinspect.py  # 31 doctests failed
sage -t src/sage/misc/session.pyx  # 2 doctests failed
sage -t src/sage/misc/superseded.py  # 2 doctests failed
sage -t src/sage/parallel/decorate.py  # 2 doctests failed
sage -t src/sage/repl/ipython_extension.py  # 1 doctest failed
sage -t src/sage/repl/load.py  # 2 doctests failed
sage -t src/sage/rings/integer_fake.pxd  # 1 doctest failed
sage -t src/sage/structure/element.pxd  # 1 doctest failed
sage -t src/sage/structure/element.pyx  # 38 doctests failed
sage -t src/sage/structure/factory.pyx  # 8 doctests failed
sage -t src/sage/tests/cmdline.py  # 4 doctests failed
Last edited 21 months ago by saraedum (previous) (diff)

comment:25 Changed 21 months ago by saraedum

Making good progress…

sage -t src/doc/common/conf.py  # 1 doctest failed
sage -t src/sage/misc/cython.py  # 5 doctests failed
sage -t src/sage/misc/sagedoc.py  # 4 doctests failed

comment:26 Changed 21 months ago by git

  • Commit changed from 069e722679c5f5e612a7214e51d98cf89b07d137 to 7ad5449612c66cac78312e703ddabbd3c49027ec

Branch pushed to git repo; I updated commit sha1. Last 10 new commits:

2fc2da6 Add rdfind properly to the dependencies
e4d5e4e fix doctest errors
8ce8f41 add more debug info about the host we are running on
ce23502 Merge branch 'ci' of gitlab.com:saraedum/sage into ci
bbe279e same log info on CircleCI
9aaca79 sh commands in Makefile do not modify the cwd
fa93281 Fix introspection
5e38736 Make tests requiring documentation optional
fb22aa2 Fix cython compilation errors during doctesting
7ad5449 Merge remote-tracking branch 'trac/develop' into ci

comment:27 Changed 21 months ago by embray

Julian walked me through this last week and it is very good stuff and I'm +1 on it.

My one concern is that by including the CI configuration files directly in the main Sage repo, it may be difficult to update things quickly if/when they break. In the long term I'd much prefer it if Sage's 'master' branch were used the way other projects typically use theirs: as a fast-moving development branch where we can push changes quickly if need be, particularly for the CI config, for which perhaps we could make a special dispensation.

In the meantime, perhaps the CI builds should be based on a "ci" branch of Sage which, most of the time, would just be kept up to date with develop, but that we could also commit to more frequently if needed.

Is this still needs_review, or should it be needs_work?

comment:28 Changed 21 months ago by git

  • Commit changed from 7ad5449612c66cac78312e703ddabbd3c49027ec to 4818f9bc54c2d71c12097f5b0ff10335718429e0

Branch pushed to git repo; I updated commit sha1. New commits:

6aeebe6 simplify environment variables
4818f9b An attempt at incremental builds

comment:29 follow-up: Changed 21 months ago by saraedum

  • Work issues changed from fix doctests to make doc takes a bit too much time in sagemath-dev, make incremental build work on CircleCI

I see your concerns but I think it has to be in the main branch to be useful. If we have it on a separate branch, how would I use it if I want it on my branch? Merge it in and then in the end force push a version without CI?

This actually needs review. I am tweaking this in a few places to make it run faster and produce smaller images but I am not planning on making any substantial changes anymore.

comment:30 Changed 21 months ago by saraedum

  • Work issues changed from make doc takes a bit too much time in sagemath-dev, make incremental build work on CircleCI to make doc takes a bit too much time in sagemath-dev, make incremental build work on CircleCI, update timings in ticket description

comment:31 Changed 21 months ago by git

  • Commit changed from 4818f9bc54c2d71c12097f5b0ff10335718429e0 to 96a20eaa0cef2ce4a26867b86d16f1b813ec2f18

Branch pushed to git repo; I updated commit sha1. New commits:

c821ba0 Fix build on CircleCI
07602e5 fix CirceCI tag selection
3f088bf enforce valid docker tags
96a20ea explain why "make" is not as fast as it could be

comment:32 Changed 21 months ago by saraedum

  • Work issues changed from make doc takes a bit too much time in sagemath-dev, make incremental build work on CircleCI, update timings in ticket description to update timings in ticket description

New commits:

c821ba0 Fix build on CircleCI
07602e5 fix CirceCI tag selection
3f088bf enforce valid docker tags
96a20ea explain why "make" is not as fast as it could be

comment:33 Changed 21 months ago by saraedum

  • Work issues changed from update timings in ticket description to update timings in ticket description, explain how the incremental build works

comment:34 Changed 21 months ago by git

  • Commit changed from 96a20eaa0cef2ce4a26867b86d16f1b813ec2f18 to 3f9a0193369fe11dbe194a302e444b825a1de4a9

Branch pushed to git repo; I updated commit sha1. New commits:

3f9a019 minor documentation improvements

comment:35 Changed 21 months ago by saraedum

  • Description modified (diff)
  • Work issues update timings in ticket description, explain how the incremental build works deleted

comment:36 Changed 21 months ago by git

  • Commit changed from 3f9a0193369fe11dbe194a302e444b825a1de4a9 to 2a57dd007a662286ec6de6c88770c67f72ed102b

Branch pushed to git repo; I updated commit sha1. New commits:

2a57dd0 update timings

comment:37 Changed 21 months ago by saraedum

  • Description modified (diff)

comment:38 in reply to: ↑ 29 ; follow-up: Changed 21 months ago by embray

Replying to saraedum:

I see your concerns but I think it has to be in the main branch to be useful. If we have it on a separate branch, how would I use it if I want it on my branch? Merge it in and then in the end force push a version without CI?

My point was that Sage's "develop" branch would keep moving at its current slow pace, but that we would have a separate branch (synced regularly to the current "develop") that is allowed to move faster, more like how most projects use "master".

So it wouldn't just be for continuous integration. That is, the CI stuff would still be in the main branch, but we would have a separate development branch from "develop" that can be a bit more agile, accept merge requests, etc.

comment:39 follow-up: Changed 21 months ago by embray

  • Makefile

    diff --git a/Makefile b/Makefile
    index 9245745..468302b 100644
    --- a/Makefile
    +++ b/Makefile
    @@ misc-clean:
             rm -f build/make/Makefile build/make/Makefile-auto
             rm -f .BUILDSTART
    
    +# TODO: What is a "bdist"? A binary distribution?
     bdist-clean: clean
             $(MAKE) misc-clean

Yes--this just does misc-clean but also cleans up $SAGE_LOCAL/var/tmp/sage/build. I think you can just remove this TODO note unless you had some specific intentions here.

comment:40 follow-up: Changed 21 months ago by embray

+       @# We need src/sage/, src/doc/common, src/doc/en/introspect for introspection with "??"

I believe since #24690 this is no longer true, at least with regard to src/sage. Actually you might be able to get away with completely deleting src/sage since it's basically duplicated in $SAGE_LOCAL/lib/python2.7/site-packages/sage. Though it's possible there's some small tests that will still fail without it--worth a try.

That said, it might not be much of an issue if the rdfind stuff works.

comment:41 in reply to: ↑ 38 Changed 21 months ago by saraedum

Replying to embray:

Replying to saraedum:

I see your concerns but I think it has to be in the main branch to be useful. If we have it on a separate branch, how would I use it if I want it on my branch? Merge it in and then in the end force push a version without CI?

My point was that Sage's "develop" branch moves at a slow pace as it currently does, but we have a separate branch (synced regularly to the current "develop") that is allowed to move faster, more like how most projects use "master".

So it wouldn't just be for continuous integration. That is, the CI stuff would still be in the main branch, but we would have a separate development branch from "develop" that can be a bit more agile, accept merge requests, etc.

Ok. I am not exactly sure what you are proposing with regards to this ticket. I would not mind a develop branch that moves at a faster pace however.

comment:42 in reply to: ↑ 39 ; follow-up: Changed 21 months ago by saraedum

Replying to embray:

  • Makefile

    diff --git a/Makefile b/Makefile
    index 9245745..468302b 100644
    --- a/Makefile
    +++ b/Makefile
    @@ misc-clean:
             rm -f build/make/Makefile build/make/Makefile-auto
             rm -f .BUILDSTART
    
    +# TODO: What is a "bdist"? A binary distribution?
     bdist-clean: clean
             $(MAKE) misc-clean

Yes--this just does misc-clean but also cleans up $SAGE_LOCAL/var/tmp/sage/build. I think you can just remove this TODO note unless you had some specific intentions here.

I would like to document what this is good for. So…what is the use case for this target?

comment:43 in reply to: ↑ 42 Changed 21 months ago by embray

Replying to saraedum:

I would like to document what this is good for. So…what is the use case for this target?

It seems clear to me that it cleans up all build artifacts in preparation for a binary release. But you might want to ask Volker.

comment:44 in reply to: ↑ 40 Changed 21 months ago by saraedum

  • Work issues set to document bdist target, clarify why we keep src/ in micro_release

Replying to embray:

+       @# We need src/sage/, src/doc/common, src/doc/en/introspect for introspection with "??"

I believe since #24690 this is no longer true, at least with regard to src/sage. Actually you might be able to get away with completely deleting src/sage since it's basically duplicated in $SAGE_LOCAL/lib/python2.7/site-packages/sage. Though it's possible there's some small tests that will still fail without it--worth a try.

That said, it might not be much of an issue if the rdfind stuff works.

It does not make a real difference with rdfind. Yes, it really seems that it is not needed for introspection anymore. I'll adapt the comment.

comment:45 Changed 21 months ago by saraedum

Is somebody using micro_release at the moment? It is way more aggressive now, so this might break some workflows. Who should I ask about this? … I'll write an email to sage-devel.

Last edited 21 months ago by saraedum (previous) (diff)

comment:46 Changed 21 months ago by git

  • Commit changed from 2a57dd007a662286ec6de6c88770c67f72ed102b to 28445f7107762401af831bf3e018fbbfdf6e6fd3

Branch pushed to git repo; I updated commit sha1. New commits:

28445f7 Fix comments in Makefile

comment:47 Changed 21 months ago by saraedum

  • Work issues document bdist target, clarify why we keep src/ in micro_release deleted

comment:48 Changed 20 months ago by git

  • Commit changed from 28445f7107762401af831bf3e018fbbfdf6e6fd3 to e0f56c4ca8cc8a9a0f2cc802625986cb34741cae

Branch pushed to git repo; I updated commit sha1. New commits:

e0f56c4 Merge remote-tracking branch 'trac/develop' into t/24655/gitlabci

comment:49 Changed 20 months ago by saraedum

I fixed the conflicts with develop. This needs review again.

comment:50 follow-up: Changed 20 months ago by embray

Maybe just a minor nitpick, but I noticed that you have all these shared scripts like describe-system.sh and so on, and that in both the circle-ci and gitlab configurations you source a lot of these scripts instead of just running them. A few of them export some environment variables, so in that case the sourcing makes sense, but why for the others that don't modify the environment?

comment:51 follow-up: Changed 20 months ago by embray

Perhaps you can clarify something for me. I'm reading the documentation at https://docs.gitlab.com/ce/ci/docker/using_docker_build.html and at https://docs.gitlab.com/ce/ci/docker/using_docker_images.html (just to better understand some aspects of this), and the documentation starts out saying "Install GitLab Runner, etc...". But just to be clear, this will work on GitLab-provided runners as well, right? We don't necessarily have to be running our own in order for docker-in-docker to work? (Not that we wouldn't want to be running our own runners anyways, but it would be disconcerting if that were necessary for this configuration to work properly at all).

Relatedly, I don't understand this comment at all:

# We use docker-in-docker to build our docker images.  It can be faster to
# expose your outer docker daemon by mounting /var/run/docker.sock to
# /var/run/docker.sock and setting DOCKER_HOST in Settings -> CI/CD -> Secret
# variable to unix:///var/run/docker.sock

Why would you set DOCKER_HOST as a secret?

Forgive me if these are annoying questions--I'm just trying to understand how this all works so that I can continue to help with it in the future.

comment:52 follow-up: Changed 20 months ago by embray

I don't understand why you made this change:

  • deleted file src/.gitignore

    diff --git a/src/.gitignore b/src/.gitignore
    deleted file mode 100644
    index e85aa7f..00000000
    --- a/src/.gitignore
    +++ /dev/null
    @@ -1,4 +0,0 @@
    -/.cython_version
    -/build
    -/Makefile
    -/bin/sage-env-config

You simultaneously added the same ignores to the top-level .gitignore but it's not clear to me why. I think it's pretty normal to have directory-specific .gitignores and I think that one was fine as it was, unless there was some specific reason.

comment:53 follow-up: Changed 20 months ago by embray

One more annoying nitpick, but in docker/README.md can we wrap the lines, please?

comment:54 follow-up: Changed 20 months ago by embray

It looks like in the Dockerfile you've done away with the build for the sagemath-jupyter image. I've always been somewhat split on my feelings about it, but I think in general it has been useful. It's nice having an image whose default command is just the correct thing to start up the Notebook, because I've had users mess that up before and get confused (e.g. you have to make sure to pass --ip='*' and get the quoting right as well).

On a similar note, I was confused by the documentation for .ci/test-jupyter.sh where it reads:

# Usage: ./test-jupyter.sh sage-jupyter-image [host]

This gave me the impression that it should be used verbatim like ./test-jupyter.sh sage-jupyter-image--as if there were actually an image called "sage-jupyter-image". This might be clearer if it were documented in a more GNU-like style just as ./test-jupyter.sh IMAGE-NAME [HOST], making it clear that the first argument is the name (or hash for that matter) of some Docker image, not necessarily a specific image.

I haven't looked at how all the other scripts are documented yet, but I would do them similarly.
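Concretely, the kind of header I have in mind would be something like this (just a suggestion for the wording, not what the script currently says):

# Usage: ./test-jupyter.sh IMAGE-NAME [HOST]
#
# IMAGE-NAME is the name (or hash) of the Docker image to test; there is no
# actual image called "sage-jupyter-image".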

comment:55 follow-up: Changed 20 months ago by embray

Since we already use the name sagemath-develop on DockerHub for the development images, could we go ahead and do the same here (so we don't end up with sagemath-develop and sagemath-dev)? I don't see the advantage of dropping four letters here, unless for some reason it would be really annoying to change this name at this point?

comment:56 follow-up: Changed 20 months ago by embray

(Since I'm asking for lots of little things, to be clear, I'm happy to make some of the requested changes myself--I just want to clarify some of them with you first.)

comment:57 follow-up: Changed 20 months ago by embray

Can we rename the little commit stamp file generated by the Dockerfile to .commit, so it's hidden by default? I can't imagine we'd normally care about seeing that.

comment:58 in reply to: ↑ 50 ; follow-up: Changed 20 months ago by saraedum

Replying to embray:

Maybe just a minor nitpick, but I noticed that you have all these shared scripts like describe-system.sh and so on, and that in both the circle-ci and gitlab configurations you source a lot of these scripts instead of just running them. A few of them export some environment variables, so it that case the sourcing makes sense, but why for the others, that don't modify the environment?

Ok. Fixed.

comment:59 in reply to: ↑ 51 ; follow-up: Changed 20 months ago by saraedum

Replying to embray:

Perhaps you can clarify something for me. I'm reading the documentation at https://docs.gitlab.com/ce/ci/docker/using_docker_build.html and at https://docs.gitlab.com/ce/ci/docker/using_docker_images.html (just to better understand some aspects of this), and the documentation starts out saying "Install GitLab Runner, etc...". But just to be clear, this will work on GitLab-provided runners as well, right? We don't necessarily have to be running our own in order for docker-in-docker to work? (Not that we wouldn't want to be running our own runners anyways, but it would be disconcerting if that were necessary for this configuration to work properly at all).

Yes, it works nicely with the provisioned runners out of the box.

Relatedly, I don't understand this comment at all:

# We use docker-in-docker to build our docker images.  It can be faster to
# expose your outer docker daemon by mounting /var/run/docker.sock to
# /var/run/docker.sock and setting DOCKER_HOST in Settings -> CI/CD -> Secret
# variable to unix:///var/run/docker.sock

Why would you set DOCKER_HOST as a secret?

Environment variables that are injected into the scripts described in .gitlab-ci.yml are called secrets in GitLab. They are not necessarily "secret". The official docker image (which the jobs run in) detects whether there is a host called docker and then assumes that this host is running docker:dind (or something similar). You can disable this behaviour by explicitly setting DOCKER_HOST.
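To illustrate what that detection and the override look like from inside a job (a sketch, not code from this branch):

# Inside the job container, the docker client decides which daemon to talk to
# via DOCKER_HOST.
echo "${DOCKER_HOST:-unset}"   # often tcp://docker:2375 with the docker:dind service,
                               # or unset if the detection happens internally
# Setting it explicitly overrides the detection, e.g. to use the host's
# daemon if its socket has been mounted into the container:
export DOCKER_HOST=unix:///var/run/docker.sock
docker info --format '{{.Name}}'   # shows which daemon actually answered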

Forgive me if these are annoying questions--I'm just trying to understand how this all works so that I can continue to help with it in the future.

No worries at all. This is all a bit confusing.

comment:60 in reply to: ↑ 52 ; follow-up: Changed 20 months ago by saraedum

Replying to embray:

I don't understand why you made this change:

  • deleted file src/.gitignore

    diff --git a/src/.gitignore b/src/.gitignore
    deleted file mode 100644
    index e85aa7f..00000000
    --- a/src/.gitignore
    +++ /dev/null
    @@ -1,4 +0,0 @@
    -/.cython_version
    -/build
    -/Makefile
    -/bin/sage-env-config

You simultaneously added the same ignores to the top-level .gitignore but it's not clear to me why. I think it's pretty normal to have directory-specific .gitignores and I think that one was fine as it was, unless there was some specific reason.

I want these files to be part of .dockerignore. I symlinked .dockerignore to .gitignore in the root directory. However, docker does not support .dockerignore files in subdirectories, so I had to move these rules into the top-level .gitignore.
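For concreteness, the setup boils down to something like this (a sketch of the idea; the actual commit may differ in the details):

# docker only reads a single .dockerignore at the root of the build context,
# so the top-level .gitignore doubles as the docker ignore file and the rules
# from src/.gitignore were folded into it.
ln -s .gitignore .dockerignore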

comment:61 in reply to: ↑ 53 ; follow-up: Changed 20 months ago by saraedum

Replying to embray:

One more annoying nitpick, but in docker/README.md can we wrap the lines, please?

No :) I added a comment to the README about this.

comment:62 in reply to: ↑ 54 ; follow-up: Changed 20 months ago by saraedum

Replying to embray:

It looks like in the Dockerfile you've done away with the build for the sagemath-jupyter image. I've always been somewhat split on my feelings about it, but I think in general it has been useful. It's nice having an image whose default command is just the correct thing to start up the Notebook, because I've had users mess that up before and get confused (e.g. you have to make sure to pass --ip='*' and get the quoting right as well.

I had this originally but then removed it. It's a bit annoying to build both images because you need to upload the full image to dockerhub (dockerhub does not seem to detect shared layers between repositories). The "docker run" string is more complicated anyway, since you need the port forwarding. Do you think what it says in the README is not sufficient for people to just copy and paste it? The command should work on all POSIX shells, I think.

On a similar note, I was confused by the documentation for .ci/test-jupyter.sh where it reads:

# Usage: ./test-jupyter.sh sage-jupyter-image [host]

This gave me the impression that it should be used verbatim like ./test-jupyter.sh sage-jupyter-image--as if there were actually an image called "sage-jupyter-image". This might be clearer if it were documented in a more GNU-like style just as ./test-jupyter.sh IMAGE-NAME [HOST], making it clear that the first argument is the name (or hash for that matter) of some Docker image, not necessarily a specific image.

I haven't looked at how all the other scripts are documented yet, but I would do them similarly.

Ok. Fixed.

comment:63 in reply to: ↑ 55 ; follow-ups: Changed 20 months ago by saraedum

Replying to embray:

Since we already use the name sagemath-develop on DockerHub for the development images, could we go ahead and do the same here (so we don't end up with sagemath-develop and sagemath-dev)? I don't see the advantage of dropping four letters here, unless for some reason it would be really annoying to change this name at this point?

If I understood correctly, sagemath-develop is referring to Sage's develop branch. My thinking was to use something that can not get confused with an existing term in Sage. Also, my sagemath images are continuing what the existing sagemath images do. But the sagemath-dev images are not really related to the sagemath-develop images. I am open to replacing dev with something else.

comment:64 in reply to: ↑ 57 Changed 20 months ago by saraedum

Replying to embray:

Can we rename the little commit stamp file generated by the Dockerfile to .commit, so it's hidden by default? I can't imagine we'd normally care about seeing that.

It's inside the docker/ directory so you usually won't see it. It's also .gitignore-d (but not .dockerignore-d!). I don't have a very strong opinion on this but I would need to change a bunch of things in my local setup here (not part of this ticket) if I changed this.

comment:65 in reply to: ↑ 56 Changed 20 months ago by saraedum

Replying to embray:

(Since I'm asking for lots of little things, to be clear, I'm happy to make some of the requested changes myself--I just want to clarify some of them with you first.)

Your comments are very much appreciated. I think it's easier for me to fix things as I work with these files all the time. But feel free to push your own changes of course.

comment:66 Changed 20 months ago by git

  • Commit changed from e0f56c4ca8cc8a9a0f2cc802625986cb34741cae to 4ee63131bad067e53df4369dacb30461a58c97fa

Branch pushed to git repo; I updated commit sha1. New commits:

2767ade Source only the scripts that set environment variables
834f66b We need long lines in the README because dockerhub is not very smart
4ee6313 make usage easier to understand

comment:67 Changed 20 months ago by saraedum

  • Reviewers set to Erik Bray

comment:68 Changed 20 months ago by saraedum

  • Work issues set to check that 4ee63131bad067e53df4369dacb30461a58c97fa still works on CircleCI/GitLab CI

comment:69 follow-up: Changed 20 months ago by saraedum

I am now getting

Step 40/42 : RUN make build
 ---> Running in 9605e7f7ecf4
build/make/Makefile --stop
make: build/make/Makefile: Command not found
Makefile:20: recipe for target 'all-build' failed

Has anything changed in Sage recently that could cause this?

comment:70 in reply to: ↑ 59 ; follow-up: Changed 20 months ago by embray

Replying to saraedum:

Replying to embray:

Relatedly, I don't understand this comment at all:

# We use docker-in-docker to build our docker images.  It can be faster to
# expose your outer docker daemon by mounting /var/run/docker.sock to
# /var/run/docker.sock and setting DOCKER_HOST in Settings -> CI/CD -> Secret
# variable to unix:///var/run/docker.sock

Why would you set DOCKER_HOST as a secret?

Environment variables that are injected into the scripts that are described in .gitlab-ci.yml are called secrets in GitLab. They are not necessarily "secret". The docker docker image detects whether there is a host called docker and then assumes that this host is running docker:dind (or something similar.) You can disable this behaviour by explicitly setting DOCKER_HOST.

Okay, thanks for clarifying the "secrets" thing. That was definitely confusing me.

I still had to stare at this for a while though to understand what was meant. In particular I didn't understand exactly what services: meant (now I do), and it was also unclear to me that docker:dind was just the name of a Docker image (where dind is the tag indicating the Docker-in-Docker image). I also didn't fully understand the implications of what "Docker-in-Docker" meant. I mean, I've installed the Docker client inside a container and linked it to my outer Docker engine via a socket before, and I thought that's all that was meant by "Docker-in-Docker".

It's clearer to me now what this means, but maybe this should be documented somewhere more clearly, like in the Readme, instead of burying it here. In fact, is there any advantage to using docker-in-docker specifically over setting DOCKER_HOST like you explain here?

comment:71 in reply to: ↑ 61 Changed 20 months ago by embray

Replying to saraedum:

Replying to embray:

One more annoying nitpick, but in docker/README.md can we wrap the lines, please?

No :) I added a comment to the README about this.

Ugh, that sucks, but okay.

comment:72 in reply to: ↑ 60 ; follow-up: Changed 20 months ago by embray

Replying to saraedum:

I want these files to be part of .dockerignore. I symlinked .dockerignore to .gitignore in the root directory. However, docker does not support .dockerignore files in subdirectories, so I had to move these rules down.

I see; that's too bad. Something about it doesn't sit right with me. For example I feel like, although it's convenient to symlink .gitignore to .dockerignore (as there are many patterns in the former we would also want to apply to the latter), I could envision a scenario where we don't want them to be exactly the same.

We can try it for now though, since I don't have an immediate argument against it.

comment:73 in reply to: ↑ 62 ; follow-up: Changed 20 months ago by embray

Replying to saraedum:

Replying to embray:

It looks like in the Dockerfile you've done away with the build for the sagemath-jupyter image. I've always been somewhat split on my feelings about it, but I think in general it has been useful. It's nice having an image whose default command is just the correct thing to start up the Notebook, because I've had users mess that up before and get confused (e.g. you have to make sure to pass --ip='*' and get the quoting right as well.

I had this originally but then removed it. It's a bit annoying to build both images because you need to upload the full image to dockerhub (dockerhub does not detect shared layers between repositories it seems.) The "docker run" string is more complicated anyway, since you need the port forwarding. Do you think what it says in the README is not sufficient for people to just copy and paste it? he command should work on all POSIX shells I think.

Nicolas might be interested in chiming in here, but even as an "expert" I've personally found the sagemath-jupyter image to be useful, as a way to quickly fire up a notebook (since I don't use the notebook much personally, but I do sometimes need to test things in it). I don't want to have to look up the correct command line every time I do.

It's weird what you say about Docker having to re-upload the intermediate images. I don't think I've observed that before myself, but maybe I just wasn't paying attention.

I don't think we need to make the sagemath-jupyter for every build though--only for releases should be sufficient. So what if we at least kept a build target for it in the Dockerfile (or even in a separate Dockerfile if it's more convenient?) and only build/push those images for final release versions?

comment:74 in reply to: ↑ 63 ; follow-up: Changed 20 months ago by embray

Replying to saraedum:

Replying to embray:

Since we already use the name sagemath-develop on DockerHub for the development images, could we go ahead and do the same here (so we don't end up with sagemath-develop and sagemath-dev)? I don't see the advantage of dropping four letters here, unless for some reason it would be really annoying to change this name at this point?

If I understood correctly, sagemath-develop is referring to Sage's develop branch. My thinking was to use something that can not get confused with an existing term in Sage. Also, my sagemath images are continuing what the existing sagemath images do. But the sagemath-dev images are not really related to the sagemath-develop images. I am open to replacing dev with something else.

I must be confused about the nomenclature then. I'll re-review everything but I assumed sagemath-dev was equivalent to builds from the develop branch. Clearly, if I'm wrong, then I must be missing something crucial. If they're not really related though a different name (maybe even a different Docker repository for them--e.g. a sagemath-dev repository) might be good.

comment:75 in reply to: ↑ 63 ; follow-up: Changed 20 months ago by embray

Replying to saraedum:

Replying to embray:

Can we rename the little commit stamp file generated by the Dockerfile to .commit, so it's hidden by default? I can't imagine we'd normally care about seeing that.

It's inside the docker/ directory so you usually won't see it. It's also .gitignore-d (but not .dockerignore-d!). I don't have a very strong opinion on this but I would need to change a bunch of things in my local setup here (not part of this ticket) if I changed this.

I think personally I would basically never want to see it (e.g. with ls). If I didn't know what it was for it'd just confuse me. I'm not sure what else you would need to change? I'm not going to insist on it if it's a huge pain but I'm confused as to why it would be.

Last edited 20 months ago by embray (previous) (diff)

comment:76 in reply to: ↑ 69 Changed 20 months ago by embray

Replying to saraedum:

I am now getting

Step 40/42 : RUN make build
 ---> Running in 9605e7f7ecf4
build/make/Makefile --stop
make: build/make/Makefile: Command not found
Makefile:20: recipe for target 'all-build' failed

Has anything changed in Sage recently that could cause this?

Not that I can think of. It might be something to do with your environment. The relevant line from the top-level Makefile is:

     $(MAKE) build/make/Makefile --stop

so you would get this error if the $(MAKE) variable were somehow set to empty.
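If it helps to narrow it down, here is one way to reproduce that symptom locally, assuming the cause really is an empty $(MAKE) (the actual trigger inside the CI image may of course be something else):

# With MAKE forced to be empty, the recipe "$(MAKE) build/make/Makefile --stop"
# degenerates into running "build/make/Makefile --stop" as a shell command,
# which gives the "Command not found" error quoted above.
make MAKE= build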

comment:77 follow-up: Changed 20 months ago by embray

Still confused about sagemath-dev even after re-reviewing. The point of sagemath-develop, at least originally, was supposed to be more or less the same thing--it would be built from the latest develop branch and contain most/all build artifacts (and is hence a bit fatter than the normal sagemath image, by ~2GB).

comment:78 Changed 20 months ago by embray

+# 1) Restore the git checkout ARTIFACT_BASE was built from, recorded in
+#    docker/commit. (Or go directly to FETCH_HEAD if there is no history to
+#    restore.)

Maybe here explicitly mention something like "unless ARTIFACT_BASE=source-clean". I think that's what the parenthetical statement is referring to, but I had to go around in circles a few times to understand that (since by default ARTIFACT_BASE=source-clean :) It says that in the message that gets echo'd later on, but the connection isn't obvious at first.

Last edited 20 months ago by embray (previous) (diff)

comment:79 follow-up: Changed 20 months ago by embray

In the Dockerfile section for sagemath-dev it has

+CMD ["bash"]

I've usually found it convenient (even for the -devel image) to just start straight into the sage shell like /home/sage/sage/sage -sh or something. But maybe for debugging purposes it's just as well not to. I'm not sure.

comment:80 follow-up: Changed 20 months ago by embray

In pull-gitlab.sh you have:

+# The variable $DOCKER_IMAGE is set to the full name of the pulled image;
+# source this script to use it.

that's rather awkward to me. Perhaps this script could output the image name at the end (perhaps silencing the other commands, or piping them to stderr or something). Or better still, maybe DOCKER_IMAGE could be set somewhere else, since the way it's pieced together in that script doesn't seem all that directly related to the script itself.

(The general principle I'm going on with these scripts is that even though they're written for and used primarily by the CI systems, for testing and development purposes I'd rather they work in some "normal", unsurprising way as much as possible.)
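Just to make the suggestion concrete, something along these lines is what I have in mind (a sketch only; the variable names are guesses, not what the script actually uses):

#!/bin/sh
# Hypothetical ending for pull-gitlab.sh: print only the image name on stdout,
# so callers can capture it instead of sourcing the script, e.g.
#   DOCKER_IMAGE=$(.ci/pull-gitlab.sh sagemath-dev)
DOCKER_IMAGE="$CI_REGISTRY_IMAGE/$1:$DOCKER_TAG"   # guessed variable names
docker pull "$DOCKER_IMAGE" 1>&2                   # keep diagnostics off stdout
echo "$DOCKER_IMAGE"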

comment:81 in reply to: ↑ 58 ; follow-up: Changed 20 months ago by embray

Replying to saraedum:

Replying to embray:

Maybe just a minor nitpick, but I noticed that you have all these shared scripts like describe-system.sh and so on, and that in both the circle-ci and gitlab configurations you source a lot of these scripts instead of just running them. A few of them export some environment variables, so it that case the sourcing makes sense, but why for the others, that don't modify the environment?

Ok. Fixed.

I think you missed one here:

+            # Build docker images
+            . .ci/build-docker.sh

comment:82 Changed 20 months ago by embray

Okay--I think I've finally mostly understood all of this :) The walkthrough you gave me earlier certainly helped, but when it came to the details I still had a bit to go through.

Thank you again for your Herculean effort on this. Other than the numerous, but minor comments above (plus figuring out why $(MAKE) broke for you) I think we're pretty close... I'm excited to launch this.

comment:83 Changed 20 months ago by git

  • Commit changed from 4ee63131bad067e53df4369dacb30461a58c97fa to 0892d35ba0c1a6dee2df8faa74cd1155e9964c4b

Branch pushed to git repo; I updated commit sha1. New commits:

0892d35 Explain the pros and cons of docker:dind

comment:84 in reply to: ↑ 70 ; follow-up: Changed 20 months ago by saraedum

Replying to embray:

Replying to saraedum:

Replying to embray:

Relatedly, I don't understand this comment at all:

# We use docker-in-docker to build our docker images.  It can be faster to
# expose your outer docker daemon by mounting /var/run/docker.sock to
# /var/run/docker.sock and setting DOCKER_HOST in Settings -> CI/CD -> Secret
# variable to unix:///var/run/docker.sock

Why would you set DOCKER_HOST as a secret?

Environment variables that are injected into the scripts that are described in .gitlab-ci.yml are called secrets in GitLab. They are not necessarily "secret". The docker docker image detects whether there is a host called docker and then assumes that this host is running docker:dind (or something similar.) You can disable this behaviour by explicitly setting DOCKER_HOST.

Okay, thanks for clarifying the "secrets" thing. That was definitely confusing me.

I still had to stare at this for a while though to understand what was meant. In particular I didn't understand exactly what services: meant (now I do), and it was also unclear to me that docker:dind was just the name of a Docker image (where dind is the tag indicating the Docker-in-Docker image). I also didn't fully understand the implications of what "Docker-in-Docker" meant. I mean, I've installed the Docker client inside a container before and linked it to my outer Docker engine via a socket before, and I thought that's all that was meant by "Docker-in-Docker".

It's clearer to me now what this means, but maybe this should be documented somewhere more clearly, like in the Readme, instead of burying it here. In fact, is there any advantage to using docker-in-docker specifically over setting DOCKER_HOST like you explain here?

I extended the section about docker:dind. Do you think that's enough? The section could go into the README but then again it's absolutely standard in the GitLab CI world.

comment:85 in reply to: ↑ 72 Changed 20 months ago by saraedum

Replying to embray:

Replying to saraedum:

I want these files to be part of .dockerignore. I symlinked .dockerignore to .gitignore in the root directory. However, docker does not support .dockerignore files in subdirectories, so I had to move these rules down.

I see; that's too bad. Something about it doesn't sit right with me. For example I feel like, although it's convenient to symlink .gitignore to .dockerignore (as there are many patterns in the former we would also want to apply to the latter), I could envision a scenario where we don't want them to be exactly the same.

We can try it for now though, since I don't have an immediate argument against it.

I am also not perfectly happy with it. But I also don't want to maintain a .dockerignore unless people widely use and care about this.

comment:86 in reply to: ↑ 73 ; follow-up: Changed 20 months ago by saraedum

Replying to embray:

Replying to saraedum:

Replying to embray:

It looks like in the Dockerfile you've done away with the build for the sagemath-jupyter image. I've always been somewhat split on my feelings about it, but I think in general it has been useful. It's nice having an image whose default command is just the correct thing to start up the Notebook, because I've had users mess that up before and get confused (e.g. you have to make sure to pass --ip='*' and get the quoting right as well.

I had this originally but then removed it. It's a bit annoying to build both images because you need to upload the full image to dockerhub (dockerhub does not detect shared layers between repositories it seems.) The "docker run" string is more complicated anyway, since you need the port forwarding. Do you think what it says in the README is not sufficient for people to just copy and paste it? he command should work on all POSIX shells I think.

Nicolas might be interested in chiming in here, but even as an "expert" I've personally found the sagemath-jupyter image to be useful, as a way to quickly fire up a notebook (since I don't use the notebook much personally, but I do sometimes need to test things in it). I don't want to have to look up the correct command line every time I do.

It's weird what you say about Docker having to re-upload the intermediate images. I don't think I've observed that before myself, but maybe I just wasn't paying attention.

I don't think we need to make the sagemath-jupyter for every build though--only for releases should be sufficient. So what if we at least kept a build target for it in the Dockerfile (or even in a separate Dockerfile if it's more convenient?) and only build/push those images for final release versions?

I would like to have releases and non-releases produce the same output if possible. It makes breaking things much harder. If people want to try something out, they should click on a binder link (see #24842). I believe that's much more convenient anyway. I don't see having to look up a command as a problem, but I might be missing something here: you need to do something about binding the ports anyway, so the casual docker user (me too, actually) needs to look up the docker run command anyway?

comment:87 in reply to: ↑ 84 ; follow-up: Changed 20 months ago by embray

Replying to saraedum:

I extended the section about docker:dind. Do you think that's enough? The section could go into the README but then again it's absolutely standard in the GitLab CI world.

Thank you--that's much clearer now, I think. I would still lean towards explaining this in the Readme, because what's bog standard in GitLab CI, I would think, is at this point virtually unknown to almost everyone else who works on Sage (including me). Maybe with a link to https://docs.gitlab.com/ee/ci/docker/using_docker_build.html#use-docker-in-docker-executor which helped me understand this.

I say that because this actually impacts how one configures their runner and/or their CI settings, and they're more likely to find it sooner in the README--most people aren't going to want to dig into the details of the .gitlab-ci.yaml if they don't have to.

comment:88 in reply to: ↑ 74 Changed 20 months ago by saraedum

Replying to embray:

Replying to saraedum:

Replying to embray:

Since we already use the name sagemath-develop on DockerHub for the development images, could we go ahead and do the same here (so we don't end up with sagemath-develop and sagemath-dev)? I don't see the advantage of dropping four letters here, unless for some reason it would be really annoying to change this name at this point?

If I understood correctly, sagemath-develop is referring to Sage's develop branch. My thinking was to use something that can not get confused with an existing term in Sage. Also, my sagemath images are continuing what the existing sagemath images do. But the sagemath-dev images are not really related to the sagemath-develop images. I am open to replacing dev with something else.

I must be confused about the nomenclature then. I'll re-review everything but I assumed sagemath-dev was equivalent to builds from the develop branch. Clearly, if I'm wrong, then I must be missing something crucial. If they're not really related though a different name (maybe even a different Docker repository for them--e.g. a sagemath-dev repository) might be good.

Yes, I think you misunderstood that part. The dev images are about the build artifacts being kept intact. I don't discriminate between branches (except that master/develop are built from scratch, while everything else is built from develop's sagemath-dev image).

comment:89 in reply to: ↑ 75 Changed 20 months ago by saraedum

Replying to embray:

Replying to saraedum:

Replying to embray:

Can we rename the little commit stamp file generated by the Dockerfile to .commit, so it's hidden by default? I can't imagine we'd normally care about seeing that.

It's inside the docker/ directory so you usually won't see it. It's also .gitignore-d (but not .dockerignore-d!). I don't have a very strong opinion on this but I would need to change a bunch of things in my local setup here (not part of this ticket) if I changed this.

I think personally I would basically never want to see it (e.g. with ls). If I didn't know what it was for it'd just confuse me. I'm not sure what else you would need to change? I'm not going to insist on it if it's a huge pain but I'm confused as to why it would be.

It breaks my existing incremental builds on other branches because they expect it to be called commit. I really don't mind renaming it though.

comment:90 in reply to: ↑ 86 ; follow-up: Changed 20 months ago by embray

Replying to saraedum:

I would like to have the releases and non-releases produce the same output if possible. It makes breaking things much harder. If people want to try something out, they should click on a binder link (see #24842.) I believe that's much more convenient anyway. I don't see having to lookup a command as a problem but I might be missing something here: You need to do something about binding the ports anyway, so the casual docker user (me too actually) needs to lookup the docker run command anyway?

That's a fair point, but I still think docker run -p <local-port>:<container-port> <image-name> is easier to remember (I used to have to look up how -p works but not usually anymore) than docker run -p 127.0.0.1:8888:8888 sagemath/sagemath sage -notebook=jupyter --no-browser --ip='*' --port=8888, which is definitely more than I can remember.

I agree about releases and non-releases having much of the same output, but in this case it's really just a convenience layer for casual users, who are mostly only going to be concerned about final releases.

comment:91 Changed 20 months ago by embray

I remember when I first took over maintenance of the sage Docker images I was skeptical about sagemath-jupyter as well (it was already there before me), with pretty much the same reservations. But now that I've personally found it useful, my mind has changed.

comment:92 in reply to: ↑ 77 ; follow-up: Changed 20 months ago by saraedum

Replying to embray:

Still confused about sagemath-dev even after re-reviewing. The point of sagemath-develop, at least originally, was supposed to be more or less the same thing--it would be built from the latest develop branch and contain most/all build artifacts (and is hence a bit fatter than the normal sagemath image, by ~2GB).

I don't really understand the confusion. But I should totally improve the documentation if it is unclear. I build sagemath and sagemath-dev for every branch, not only for develop. The sagemath-dev image has the build artifacts, the sagemath image does not. So, sagemath-dev is not related to the develop branch. But it includes the build artifacts in the same way that sagemath-develop apparently did.

comment:93 in reply to: ↑ 79 Changed 20 months ago by saraedum

Replying to embray:

In the Dockerfile section for sagemath-dev it has

+CMD ["bash"]

I've usually found it convenient (even for the -devel image) to just start straight into the sage shell like /home/sage/sage/sage -sh or something. But maybe for debugging purposes it's just as well not to. I'm not sure.

Maybe it has improved, but I really dislike how sage -sh screws with the environment. I find it really confusing unless you understand well what it is doing exactly. I'd rather keep the bash as it is.

comment:94 in reply to: ↑ 81 Changed 20 months ago by saraedum

Replying to embray:

Replying to saraedum:

Replying to embray:

Maybe just a minor nitpick, but I noticed that you have all these shared scripts like describe-system.sh and so on, and that in both the circle-ci and gitlab configurations you source a lot of these scripts instead of just running them. A few of them export some environment variables, so in that case the sourcing makes sense, but why for the others that don't modify the environment?

Ok. Fixed.

I think you missed one here:

+            # Build docker images
+            . .ci/build-docker.sh

Thanks. Fixed :)

comment:95 in reply to: ↑ 80 Changed 20 months ago by saraedum

Replying to embray:

In pull-gitlab.sh you have:

+# The variable $DOCKER_IMAGE is set to the full name of the pulled image;
+# source this script to use it.

that's rather awkward to me. Perhaps this script could output the image name at the end (perhaps silencing the other commands, or piping them to stderr or something). Or better still, maybe DOCKER_IMAGE could be set somewhere else, since the way it's pieced together in that script doesn't seem all that directly related to the script itself.

(The general principle I'm going on with these scripts is that even though they're written for and used primarily by the CI systems, for testing and development purposes I'd rather they work in some "normal", unsurprising way as much as possible.)

I just tried to build it outside and did not like the result. Would you want to try to come up with something here?
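To make the idea concrete, here is a rough sketch of the interface being discussed -- a hypothetical script (not the actual .ci/pull-gitlab.sh) that prints the image name on stdout and sends everything else to stderr, so callers could capture it instead of sourcing:

#!/bin/sh
# Hypothetical sketch only: pull an image from the GitLab registry and print
# its full name on stdout; diagnostics go to stderr.
set -e
NAME="$1"
# CI_REGISTRY_IMAGE is provided by GitLab CI; DOCKER_TAG is assumed to be set
# elsewhere in the CI configuration.
DOCKER_IMAGE="$CI_REGISTRY_IMAGE/$NAME:$DOCKER_TAG"
docker pull "$DOCKER_IMAGE" 1>&2
echo "$DOCKER_IMAGE"

A caller could then write DOCKER_IMAGE=$(.ci/pull-gitlab.sh sagemath-dev) instead of sourcing the script.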

comment:96 in reply to: ↑ 90 Changed 20 months ago by saraedum

  • Work issues changed from check that 4ee63131bad067e53df4369dacb30461a58c97fa still works on CircleCI/GitLab CI to check that this still works on CircleCI/GitLab CI, make jupyter more accessible

Replying to embray:

Replying to saraedum:

I would like to have the releases and non-releases produce the same output if possible. It makes breaking things much harder. If people want to try something out, they should click on a binder link (see #24842.) I believe that's much more convenient anyway. I don't see having to look up a command as a problem but I might be missing something here: You need to do something about binding the ports anyway, so the casual docker user (me too actually) needs to look up the docker run command anyway?

That's a fair point, but I still think docker run -p <local-port>:<container-port> <image-name> is easier to remember (I used to have to look up how -p works but not usually anymore) than docker run -p 127.0.0.1:8888:8888 sagemath/sagemath sage -notebook=jupyter --no-browser --ip='*' --port=8888, which is definitely more than I can remember.

I agree about releases and non-releases having much of the same output, but in this case it's really just a convenience layer for casual users, who are mostly only going to be concerned about final releases.

Let me try to come up with an easier solution for this without creating an extra repository.

comment:97 follow-up: Changed 20 months ago by saraedum

Ok. It's getting hard to keep track of all the threads here. I think I addressed everything you mentioned. Let me know if you think that something is missing.

comment:98 Changed 20 months ago by git

  • Commit changed from 0892d35ba0c1a6dee2df8faa74cd1155e9964c4b to c3eeb65d4adb6f77ecdb76f5c016c70973c935ae

Branch pushed to git repo; I updated commit sha1. New commits:

290bec4 clarify comment
c3eeb65 do not source scripts without side effects

comment:99 Changed 20 months ago by git

  • Commit changed from c3eeb65d4adb6f77ecdb76f5c016c70973c935ae to bf56bcae1c28545666ed7ed489c8bcba3e3422ee

Branch pushed to git repo; I updated commit sha1. New commits:

bf56bca howto provision your own runners

comment:100 in reply to: ↑ 87 Changed 20 months ago by saraedum

Replying to embray:

Replying to saraedum:

I extended the section about docker:dind. Do you think that's enough? The section could go into the README but then again it's absolutely standard in the GitLab CI world.

Thank you--that's much clearer now, I think. I would still lean towards explaining this in the Readme, because what's bog standard in GitLab CI, I would think, is at this point virtually unknown to almost everyone else who works on Sage (including me). Maybe with a link to https://docs.gitlab.com/ee/ci/docker/using_docker_build.html#use-docker-in-docker-executor which helped me understand this.

I say that because this actually impacts how one configures their runner and/or their CI settings, and they're more likely to find it sooner in the README--most people aren't going to want to dig into the details of the .gitlab-ci.yml if they don't have to.

Oh, missed that one. So, the thing here is really that it does not affect you at all if you just use gitlab.com. If you provision your own runners you need to make them --privileged. If you care a lot about caching and speed and don't use a cloud service, then you might want to do the DOCKER_HOST thing. I only mention it there to say that there's a way to disable docker:dind if you don't want it for some reason.
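For anyone provisioning their own runner, the registration boils down to something like the following sketch (URL, token and images are placeholders):

# Register a GitLab runner whose docker executor can run docker:dind builds.
# The registration token comes from the project's CI/CD settings.
gitlab-runner register \
  --non-interactive \
  --url "https://gitlab.com/" \
  --registration-token "$REGISTRATION_TOKEN" \
  --executor docker \
  --docker-image docker:latest \
  --docker-privileged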


New commits:

bf56bca howto provision your own runners

comment:101 in reply to: ↑ 97 ; follow-up: Changed 20 months ago by embray

Replying to saraedum:

Ok. It's getting hard to keep track of all the threads here. I think I addressed everything you mentioned. Let me know if you think that something is missing.

Heh, yeah, sorry. I thought of maybe putting a TODO list in the ticket description or something. I'll touch back on this tomorrow and see if I can summarize whatever remaining issues there are.

comment:102 in reply to: ↑ 101 Changed 20 months ago by saraedum

Replying to embray:

Replying to saraedum:

Ok. It's getting hard to keep track of all the threads here. I think I addressed everything you mentioned. Let me know if you think that something is missing.

Heh, yeah, sorry. I thought of maybe putting a TODO list in the ticket description or something. I'll touch back on this tomorrow and see if I can summarize whatever remaining issues there are.

I find the "work issues" to be a good place to collect such things.

comment:103 in reply to: ↑ 92 ; follow-up: Changed 20 months ago by nthiery

Replying to saraedum:

I don't really understand the confusion. But I should totally improve the documentation if it is unclear. I build sagemath and sagemath-dev for every branch, not only for develop. The sagemath-dev image has the build artifacts, the sagemath image does not. So, sagemath-dev is not related to the develop branch. But it includes the build artifacts in the same way that sagemath-develop apparently did.

Just throwing a random thought: what about sagemath-build? But sagemath-dev may be the right thing if you see it as the closest analogs of xxx-dev packages in debian&co.

comment:104 Changed 20 months ago by nthiery

No strong opinion on sagemath-jupyter. The docker command to start it (with port forwarding) can be reconstructed without a doc lookup by someone who has some familiarity with docker. The full command requires both sagemath and jupyter familiarity.

Having a jupyter image explicitly brings attention to the fact that our image is "jupyter ready"; on the other hand, it may be confusing by suggesting that our base image is not.

comment:105 follow-up: Changed 20 months ago by nthiery

No strong opinion on sage -sh either. The only important feature is that, for binder use but also beyond, the commands jupyter, gap, singular, ... be in the path and work.

Thanks for all the work!

comment:106 in reply to: ↑ 103 Changed 20 months ago by saraedum

Replying to nthiery:

Replying to saraedum:

I don't really understand the confusion. But I should totally improve the documentation if it is unclear. I build sagemath and sagemath-dev for every branch, not only for develop. The sagemath-dev image has the build artifacts, the sagemath image does not. So, sagemath-dev is not related to the develop branch. But it includes the build artifacts in the same way that sagemath-develop apparently did.

Just throwing a random thought: what about sagemath-build? But sagemath-dev may be the right thing if you see it as the closest analogs of xxx-dev packages in debian&co.

It's only an analogue in the sense that it contains the things that developers usually care about. I think that -dev is good for people who are randomly looking for a sagemath image on Docker Hub (and know the nomenclature in Debian & co.): with the -dev they should know immediately that that's not what they need. I don't mind -build but I don't find it as obvious in that regard.

comment:107 Changed 20 months ago by saraedum

  • Work issues changed from check that this still works on CircleCI/GitLab CI, make jupyter more accessible to check that this still works on CircleCI/GitLab CI, make jupyter more accessible, check what's in the PATH (comment:105)

comment:108 Changed 20 months ago by embray

Then maybe we should just delete the old sagemath-develop image from Dockerhub so that there's no confusion, though I don't know why we can't just reuse the name for the new images as well. I'm still not convinced it doesn't already serve the same purpose. I'm fine either way as long as Dockerhub will let me delete the old one. Whatever the outcome, all I don't want is for there to be a "sagemath-dev" and a "sagemath-develop" on there.

Last edited 20 months ago by embray (previous) (diff)

comment:109 Changed 20 months ago by git

  • Commit changed from bf56bcae1c28545666ed7ed489c8bcba3e3422ee to 183ddd40b2ebd9f53d242a112325bda119b8298c

Branch pushed to git repo; I updated commit sha1. New commits:

fa4599a Rename commit to .commit
71d9ce7 export variables in sourced scripts
329765e Merge remote-tracking branch 'trac/develop' into HEAD
2493bb3 Fixup for fa4599a6c6849e5a2ac2a1f738eed437843454fa
b3c6d14 Make docker-build.sh POSIX compliant
183ddd4 Make jupyter invocation easier to remember

comment:110 Changed 20 months ago by git

  • Commit changed from 183ddd40b2ebd9f53d242a112325bda119b8298c to 2e74da9f1e408d8f569151198b4c739b2208749f

Branch pushed to git repo; I updated commit sha1. New commits:

59952f3 Drop pipefail
a727ec5 Compilation sometimes takes a bit longer
2e74da9 POSIX complaint tests

comment:111 in reply to: ↑ 105 ; follow-up: Changed 20 months ago by saraedum

Replying to nthiery:

No strong opinion on sage -sh either. The only important feature is that, for binder use but also beyond, the commands jupyter, gap, singular, ... be in the path and work.

nthiery: I would like to automatically test that this is the case. So actually what's the requirement here? How does binder run this image?
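Such a check could be as simple as the following sketch (the image name and the list of commands are only illustrative, not what the CI actually runs):

# Smoke test: check that the commands binder and casual users rely on are on
# the PATH inside the image.
for cmd in sage jupyter gap singular; do
    docker run --rm sagemath/sagemath:latest sh -c "command -v $cmd" \
        || { echo "$cmd is not on the PATH" 1>&2; exit 1; }
done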

comment:112 follow-up: Changed 20 months ago by embray

It's nothing special about binder per se--rather, it's what the current Docker image does. It sets the default environment so that, for example, $SAGE_LOCAL/bin is on the PATH, so all commands installed as part of Sage "just work".

I had to remind myself exactly how this works though. It does not start the container with sage -sh like I suggested previously. I think I ran into some problems with that as well. However, my solution does take many of the environment variables that result from running sage -sh and sets them by default in the container:

https://github.com/sagemath/docker-images/commit/acbfc7d54d435cf11b6325e93eab2c3dfe9ee48c#diff-9cf7721d3a01b10049e0d9f6e29e4058R45

I think it would be good to keep something like this. Actually, it's very important for binder to work correctly since it also ensures that it's running the Jupyter installed with Sage.

Last edited 20 months ago by embray (previous) (diff)

comment:113 Changed 20 months ago by embray

I think there might still be some little details I'm not personally happy with and I have some doubts still about some naming issues. However, if/when this is working for you I'd like to get it merged so we can go ahead and start playing with it, and then fuss about the details later. I think we should also coordinate with Volker on this--I think it would be nice if he can allow us to push changes to develop just on the CI stuff on an as-needed basis without having to necessarily go through a full CI cycle with the buildbots, since this obviously doesn't affect any of the existing continuous integration process.

comment:114 in reply to: ↑ 112 ; follow-ups: Changed 20 months ago by saraedum

Replying to embray:

It's nothing special about binder per se--rather, it's what the current Docker image does. It sets the default environment so that, for example, $SAGE_LOCAL/bin is on the PATH, so all commands installed as part of Sage "just work".

I had to remind myself exactly how this works though. It does not start the container with sage -sh like I suggested previously. I think I ran into some problems with that as well. However, my solution does take many of the environment variables that result from running sage -sh and sets them by default in the container:

https://github.com/sagemath/docker-images/commit/acbfc7d54d435cf11b6325e93eab2c3dfe9ee48c#diff-9cf7721d3a01b10049e0d9f6e29e4058R45

I think it would be good to keep something like this. Actually, it's very important for binder to work correctly since it also ensures that it's running the Jupyter installed with Sage.

Ok. But how is that different from just setting the entrypoint to sage -sh like I do currently?

comment:115 Changed 20 months ago by saraedum

  • Work issues changed from check that this still works on CircleCI/GitLab CI, make jupyter more accesible, check what's in the PATH (comment:105) to check that this still works on CircleCI/GitLab CI

comment:116 Changed 20 months ago by saraedum

embray, there is now a shortcut to start jupyter with the usual arguments: docker run -p8888:8888 sagemath/sagemath:latest sage-jupyter

Do you think that's good enough?

comment:117 in reply to: ↑ 114 Changed 20 months ago by nthiery

Ok. But how is that different from just setting the entrypoint to sage -sh like I do currently?

Will it work if someone does something like docker run -ti gap ?

comment:118 in reply to: ↑ 111 Changed 20 months ago by nthiery

nthiery: I would like to automatically test that this is the case. So actually what's the requirement here? How does binder run this image?

Binder just runs the docker image with appropriate port forwarding, and then launches in the container the command 'jupyter notebook' from the path, with the port flag and a few others. At some stage, it was instead calling the command jupyter-notebook directly, but presumably this is no longer the case.

The specific requirements are there:

https://mybinder.readthedocs.io/en/latest/dockerfile.html#preparing-your-dockerfile

Notes:

  • We enforced the notebook=5 with a pip install in our previous docker image. Since #24168, merged in 8.1, the notebook has been upgraded to v5, so the pip install should not be needed anymore.
  • Items 4 and 5 do not apply to Sage's docker image; instead they are about the (essentially trivial) Dockerfile's that authors of binder-enabled repositories will write.

Btw: Item 5 is new in the documentation; and seems optional at this stage.

comment:119 in reply to: ↑ 114 Changed 20 months ago by embray

Replying to saraedum:

Replying to embray:

It's nothing special about binder per se--rather, it's what the current Docker image does. It sets the default environment so that, for example, $SAGE_LOCAL/bin is on the PATH, so all commands installed as part of Sage "just work".

I had to remind myself exactly how this works though. It does not start the container with sage -sh like I suggested previously. I think I ran into some problems with that as well. However, my solution does take many of the environment variables that result from running sage -sh and sets them by default in the container:

https://github.com/sagemath/docker-images/commit/acbfc7d54d435cf11b6325e93eab2c3dfe9ee48c#diff-9cf7721d3a01b10049e0d9f6e29e4058R45

I think it would be good to keep something like this. Actually, it's very important for binder to work correctly since it also ensures that it's running the Jupyter installed with Sage.

Ok. But how is that different from just setting the entrypoint to sage -sh like I do currently?

IIRC it's a little less invasive and works for more use cases. I can't remember though exactly why I didn't put sage -sh in the entrypoint. I know I tried that and it had various problems.

comment:120 Changed 20 months ago by nthiery

Just a random thought: this might be calling for an equivalent of conda's source activate to set up sage's environment variables without having to run a subshell as in sage -sh.

comment:121 follow-ups: Changed 20 months ago by embray

I agree--I think that was the main problem here--that sage -sh ran a subshell.

comment:122 in reply to: ↑ 121 Changed 20 months ago by saraedum

Replying to embray:

I agree--I think that was the main problem here--that sage -sh ran a subshell.

Why is that a problem?

comment:123 in reply to: ↑ 121 Changed 20 months ago by saraedum

Replying to embray:

I agree--I think that was the main problem here--that sage -sh ran a subshell.

So this runs a shell but then spawns the argument with exec so there is no additional process created if that's what you would like to avoid.

comment:124 Changed 20 months ago by embray

I think my main motivation was just to ensure that the environment by default has Sage on the $PATH, as well as any other environment variables needed at runtime for software installed by Sage. But I admit I can't remember why I also didn't just essentially make sage -sh the entrypoint. I think it was partly because of what you yourself said in terms of it changing too much of the default environment. I'm pretty sure I tried that at first and something didn't work right, but since I can't remember what it was I'd say go for it and give it a try anyways.

But I do agree with Nicolas that something like a source sage-activate would be good to have in Sage, to just modify the existing environment without going into a subshell (and a deactivate would be nice to have as well). We could handle that in another issue though...

comment:125 Changed 20 months ago by saraedum

  • Work issues changed from check that this still works on CircleCI/GitLab CI to check that this still works on CircleCI/GitLab CI; investigate why builds on GitLab hang sometimes

Sometimes my builds on GitLab hang, apparently during the step of uploading build artifacts. The problem is that the log has already exceeded 3MB at that point, so I can't see what the actual build error was. It would be nice to somehow drop the boring parts from the log in GitLab.

comment:126 Changed 20 months ago by saraedum

  • Dependencies set to #25161

Since the build hangs during docbuild, I have a feeling that #25161 might be to blame.

comment:127 Changed 20 months ago by embray

Could I see a link to one of the builds that hung? Also how much of the logs are we outputting? We could certainly add a less verbose log option. Most of the logs are written to files anyways, so if we needed to do some post-mortem debugging we can just look in the files, and avoid writing so much to stdout.

comment:128 Changed 20 months ago by saraedum

Here's an example of a build that timed out: https://gitlab.com/saraedum/sage/-/jobs/62689653 (everything interesting is missing from this log). GitLab limits logs to about 4MB, so I had to cut the build log after 3MB. We can only download the log files when there was no timeout…

I'll try to make this work better in case of timeouts. I'll report back when I know what's the last thing that happens before the timeout.
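A crude sketch of one way to keep the job log small (not what the branch does): write the full output to a file and only show its tail on failure.

# Keep the full log in a file (which could be uploaded as a build artifact)
# and only print the last lines to the job log if the build fails.
make build > /tmp/make.log 2>&1 || { tail -n 200 /tmp/make.log; exit 1; }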

comment:129 Changed 20 months ago by git

  • Commit changed from 2e74da9f1e408d8f569151198b4c739b2208749f to 3c78351fdcbfe93c2c353d0aef23bc2e976d7e17

Branch pushed to git repo; I updated commit sha1. Last 10 new commits:

0936a5d Merge remote-tracking branch 'trac/u/saraedum/sphinxbuild-stack' into u/saraedum/gitlabci
3f0c2f0 Add doctest for #25160
0477b3c Check for errors before ignoring output
6b80d72 Merge remote-tracking branch 'trac/u/saraedum/sphinxbuild-stack' into u/saraedum/gitlabci
ca66c03 Revert "Check for errors before ignoring output"
42993c9 Merge remote-tracking branch 'trac/u/saraedum/sphinxbuild-stack' into u/saraedum/gitlabci
6aade39 Silence docker pull
93a7ba6 Silence apt-get
158af15 Silence setup.py
3c78351 Ignore much more useless chatter in sphinx build

comment:130 Changed 20 months ago by git

  • Commit changed from 3c78351fdcbfe93c2c353d0aef23bc2e976d7e17 to 7d5f3898f3bff423efe3c3e97997de6af5ccd573

Branch pushed to git repo; I updated commit sha1. New commits:

7d5f389 Smarter truncation of output

comment:131 Changed 20 months ago by saraedum

  • Dependencies changed from #25161 to #25161, #25160
  • Work issues changed from check that this still works on CircleCI/GitLab CI; investigate why builds on GitLab hang sometimes to investigate why builds on GitLab hang sometimes

Last 10 new commits:

0936a5d Merge remote-tracking branch 'trac/u/saraedum/sphinxbuild-stack' into u/saraedum/gitlabci
3f0c2f0 Add doctest for #25160
0477b3c Check for errors before ignoring output
6b80d72 Merge remote-tracking branch 'trac/u/saraedum/sphinxbuild-stack' into u/saraedum/gitlabci
ca66c03 Revert "Check for errors before ignoring output"
42993c9 Merge remote-tracking branch 'trac/u/saraedum/sphinxbuild-stack' into u/saraedum/gitlabci
6aade39 Silence docker pull
93a7ba6 Silence apt-get
158af15 Silence setup.py
3c78351 Ignore much more useless chatter in sphinx build

New commits:

7d5f389 Smarter truncation of output

comment:132 Changed 20 months ago by git

  • Commit changed from 7d5f3898f3bff423efe3c3e97997de6af5ccd573 to cf80b0a90555b0b456efdbd5c3784139519af243

Branch pushed to git repo; I updated commit sha1. New commits:

321658e head-tail.sh needs stdbuf from coreutils
7735a2c POSIX compliant push-dockerhub.sh script
3112236 Try to get rid of the explicit sh invocation
05328a3 Fix typo in 3c78351fdcbfe93c2c353d0aef23bc2e976d7e17
77eeea9 Do not print any of these ignored warnings
5881359 Remove raise exception logic
cf80b0a Merge remote-tracking branch 'trac/u/saraedum/sphinxbuild-stack' into u/saraedum/gitlabci

comment:133 Changed 20 months ago by git

  • Commit changed from cf80b0a90555b0b456efdbd5c3784139519af243 to f3bfa5fe9971fa2a04190b92da501cbf9724905b

Branch pushed to git repo; I updated commit sha1. New commits:

aa13c84 Ignore more chatter in docbuild
8884d9b Print available disk space
090c25a Trying to find out which of the shared runners actually provide enough power
a788043 Make NTHREADS respect the available RAM
9e317e6 Allow at least one thread
d7e743a Fixed comments about the do tag
49b21e1 Properly ignore undefined labels on first pass
bb1a407 Fix incorrect @echo command
52c2136 ignoring more chatter during docbuild
f3bfa5f Upgraded .gitlab-ci.yml to provide proper tags that actually work

comment:134 follow-up: Changed 20 months ago by saraedum

  • Work issues investigate why builds on GitLab hang sometimes deleted

Apparently, GitLab has downgraded their digitalocean runners from 4GB of RAM to only 2GB. That's not enough to build the documentation. I did not manage to create swap space inside docker, so we can only use (single core) GCE runners at the moment. Also we cannot build from scratch (i.e., master & develop) on GitLab anymore with shared runners.

We should maybe create a sage group on gitlab where developers get access to more powerful runners. Maybe something we can talk about in Cernay.
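(For self-provisioned runners, the usual workaround would be to add swap on the host rather than inside the container, since swapon is normally not permitted in a container; roughly:)

# On the runner host, not inside docker; the size is arbitrary.
sudo fallocate -l 4G /swapfile
sudo chmod 600 /swapfile
sudo mkswap /swapfile
sudo swapon /swapfile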

comment:135 Changed 20 months ago by saraedum

Anyway, this needs review again. It seems to work fine on GitLab CI & CircleCI at the moment.

You can see recent GitLab pipelines here: https://gitlab.com/saraedum/sage/pipelines (note the "stuck" tasks because there are no shared runners that provide enough HDD & RAM for build-from-clean.)

You can also see the CircleCI runs here: https://circleci.com/gh/saraedum/sage (build-from-clean works fine here.)

comment:136 follow-up: Changed 19 months ago by embray

I'd be hesitant to move forward on this if it can't build the docs from scratch anymore (though maybe it's not actually that big a problem...?)

That said, I don't want to hold this up any longer. I just wonder exactly what the plan is for "bootstrapping" the whole system if it can't build a working Sage image from scratch.

Also, have you tried using serial doc build? I seem to recall that ends up using less RAM, but takes longer of course...

comment:137 in reply to: ↑ 134 ; follow-up: Changed 19 months ago by embray

Replying to saraedum:

Apparently, GitLab has downgraded their digitalocean runners from 4GB of RAM to only 2GB. That's not enough to build the documentation.

Ah, I see what you're saying. If the docs are built in serial that might be just enough, but even with just 2 processes 2GB is not currently enough. I'm doing a little analysis of the doc build memory usage, and in the reference docs the manifolds docs take by far the longest, and by themselves chew up ~800MB for the docbuild process alone, not to mention the couple of instances of maxima it spins up. Meanwhile the combinat docs are some of the most memory-hungry and consume nearly 1.5 GB, so that alone would be enough to run afoul of a 2GB limit.

I know this has been looked at a lot before, but I must believe there's something we can do to more tightly limit memory usage of the docbuilds...

comment:138 in reply to: ↑ 136 Changed 19 months ago by saraedum

Replying to embray:

I'd be hesitant to move forward on this if it can't build the docs from scratch anymore (though maybe it's not actually that big a problem...?)

Yes, I also don't really like to go ahead like that. It works on Circle, and also on GitLab if you provide your own runners. But let me try a bit harder to make this work on GitLab's own shared runners again…

That said, I don't want to hold this up any longer. I just wonder exactly what the plan is for "bootstrapping" the whole system if it can't build a working Sage image from scratch.

We can bootstrap from CircleCI, or from my GitLab runners that run on Google Compute Engine.

Also, have you tried using serial doc build? I seem to recall that ends up using less RAM, but takes longer of course...

I am already building serially (MAKE=make -j1).
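For reference, the serial setup amounts to roughly the following (a sketch, not the exact CI invocation):

# Serial build: one make job and a single-threaded docbuild.
export MAKE="make -j1"
export SAGE_NUM_THREADS=1
make   # builds Sage and the documentation serially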

comment:139 in reply to: ↑ 137 ; follow-up: Changed 19 months ago by saraedum

Replying to embray:

Replying to saraedum:

Apparently, GitLab has downgraded their digitalocean runners from 4GB of RAM to only 2GB. That's not enough to build the documentation.

Ah, I see what you're saying. If the docs are built in serial that might be just enough, but even with just 2 processes 2GB is not currently enough. I'm doing a little analysis of the doc build memory usage, and in the reference docs the manifolds docs take by far the longest, and by themselves chew up ~800MB for the docbuild process alone, not to mention the couple of instances of maxima it spins up. Meanwhile the combinat docs are some of the most memory-hungry and consume nearly 1.5 GB, so that alone would be enough to run afoul of a 2GB limit.

I know this has been looked at a lot before, but I must believe there's something we can do to more tightly limit memory usage of the docbuilds...

So the actual problem seems to come from tachyon invocations.

comment:140 in reply to: ↑ 139 Changed 19 months ago by saraedum

Replying to saraedum:

Replying to embray:

Replying to saraedum:

Apparently, GitLab has downgraded their digitalocean runners from 4GB of RAM to only 2GB. That's not enough to build the documentation.

Ah, I see what you're saying. If the docs are built in serial that might be just enough, but even with just 2 processes 2GB is not currently enough. I'm doing a little analysis of the doc build memory usage, and in the reference docs the manifolds docs take by far the longest, and by themselves chew up ~800MB for the docbuild process alone, not to mention the couple of instances of maxima it spins up. Meanwhile the combinat docs are some of the most memory-hungry and consume nearly 1.5 GB, so that alone would be enough to run afoul of a 2GB limit.

I know this has been looked at a lot before, but I must believe there's something we can do to more tightly limit memory usage of the docbuilds...

So the actual problem seems to come from tachyon invocations.

So, I guess that's probably not true. The problem surfaces when the docbuild runs os.fork to spawn the tachyon process. The fork itself should be quite lightweight, so how can it be that the fork fails rather than something inside tachyon after the fork?

comment:141 follow-ups: Changed 19 months ago by saraedum

Though the fork is cheap, there could be an overcommit policy that's causing the trouble… https://stackoverflow.com/a/13329386/812379

comment:142 follow-ups: Changed 19 months ago by saraedum

The serial runner might curiously be the culprit here. Somewhere in the sphinx build we seem to be leaking memory. Here I am plotting the memory usage in a serial build just before the call to sphinxbuild():

[manifolds] process.memory_info()[0]): 280907776
[dynamics ] process.memory_info()[0]): 349028352
[polynomia] process.memory_info()[0]): 365514752
[repl     ] process.memory_info()[0]): 375050240
[tensor_fr] process.memory_info()[0]): 375050240
[combinat ] process.memory_info()[0]): 382959616
[plot3d   ] process.memory_info()[0]): 1632706560

And here is the same if I run this through the pool but with just one process:

[manifolds] process.memory_info()[0]): 230350848
[dynamics ] process.memory_info()[0]): 191688704
[polynomia] process.memory_info()[0]): 210935808
[repl     ] process.memory_info()[0]): 176205824
[tensor_fr] process.memory_info()[0]): 163934208
[combinat ] process.memory_info()[0]): 517341184
[plot3d   ] process.memory_info()[0]): 170418176

Btw, I am running ./sage -docbuild en/reference inventory here but that's probably not that important.

Last edited 19 months ago by saraedum (previous) (diff)

comment:143 Changed 19 months ago by saraedum

  • Work issues set to make docs build from scratch on GitLab

comment:144 in reply to: ↑ 141 Changed 19 months ago by saraedum

Replying to saraedum:

Though the fork is cheap, there could be an overcommit policy that's causing the trouble… https://stackoverflow.com/a/13329386/812379

The overcommit policy is set to 50% overcommit which isn't that much if you only have 2GB and no swap.
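(For reference, one can inspect a runner's overcommit settings like this; the interpretation in the comments is the standard kernel semantics:)

cat /proc/sys/vm/overcommit_memory   # 0 = heuristic, 1 = always allow, 2 = strict accounting
cat /proc/sys/vm/overcommit_ratio    # percentage of RAM counted towards the limit in mode 2
grep -i commit /proc/meminfo         # CommitLimit vs. Committed_AS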

comment:145 in reply to: ↑ 141 Changed 19 months ago by embray

Replying to saraedum:

Though the fork is cheap, there could be an overcommit policy that's causing the trouble… https://stackoverflow.com/a/13329386/812379

That's certainly one thing I was wondering: Is the limit for the runners on virtual memory, or actual resident memory?

comment:146 in reply to: ↑ 142 Changed 19 months ago by embray

Replying to saraedum:

The serial runner might curiously be the culprit here. Somewhere in the sphinx build we seem to be leaking memory. Here I am plotting the memory usage in a serial build just before the call to sphinxbuild():

Yes, there are definitely some big memory leaks in the doc build. I think Florent spent a lot of time trying to track some of those down, but I don't know if we ever made any progress. I'm going to start looking into it now because, well, I personally haven't tried and I'm curious.

I know it's not during the individual docbuilds: For example, some of the docs still use a lot of memory for their builds--mostly for holding their huge doctrees (e.g. manifolds uses >1GB for this). I would also like to try to reduce this if possible (maybe even splitting up some of the huger docs like manifolds into smaller subdocs). But either way, the individual docbuilds release most of the memory they eat up when they're done, so that's not where the leak is. But after the inventories are merged, by which I mean this part in ReferenceBuilder._wrapper:

 506             # The html refman must be build at the end to ensure correct
 507             # merging of indexes and inventories.
 508             # Sphinx is run here in the current process (not in a
 509             # subprocess) and the IntersphinxCache gets populated to be
 510             # used for the second pass of the reference manual and for
 511             # the other documents.
 512             getattr(DocBuilder(self.name, lang), format)(*args, **kwds)

it chews up about a gigabyte alone, and never releases it. I thought at first this might be the InventoryCache itself, but no, that only appears to be ~50MB in size by my accounting. So there are some other big objects that hang around when they really shouldn't anymore.

comment:147 in reply to: ↑ 142 Changed 19 months ago by embray

Replying to saraedum:

The serial runner might curiously be the culprit here. Somewhere in the sphinx build we seem to be leaking memory. Here I am plotting the memory usage in a serial build just before the call to sphinxbuild():

Interesting--that might be a different leak than the one I'm looking at then. When the individual doc inventories are built in subprocesses they manage not to leak too much (this despite the fact that subprocesses are being reused for each task with maxtasksperchild=None... oops, I take that back, it uses maxtasksperchild=1).

Last edited 19 months ago by embray (previous) (diff)

comment:148 Changed 19 months ago by embray

A few of the larger objects hanging around attached to app.env after the reference doc inventories build:

  • app.env.todo_all_todos (~600 MB)
  • app.env.intersphinx_inventory (~50 MB)
  • app.env.intersphinx_cache (~50 MB)

that is, assuming my recursive getsizeof function is even right (though I think it has a bug somewhere because it actually goes haywire in app.env.longtitles--I don't even know what that object is so it requires some exploration...)

Last edited 19 months ago by embray (previous) (diff)

comment:149 follow-up: Changed 19 months ago by embray

I forget--is there a reason we don't run runsphinx() in a subprocess? Because sphinx.cmdline.main() wasn't really ever meant to be invoked as an API in a larger script, and it leaves around all kinds of global state. Just for example:

  • the logging module's root logger hangs on to a reference to the Sphinx app
  • the matplotlib.sphinxext.plot_directive.setup function holds on to a reference to the app for some reason

and there might be other examples.

Last edited 19 months ago by embray (previous) (diff)

comment:150 Changed 19 months ago by embray

There's also quite a lot of Sage-related objects left in memory after the docbuild.

comment:151 Changed 19 months ago by embray

Not right now, but next week I think I will refactor the docbuild process a bit to ensure that runsphinx() is always run in a subprocess. I think that will have a big impact at least in terms of keeping memory leaks under control.

I have some other thoughts I want to try out--that might go quite deep--in terms of keeping doctree sizes under better control too. But that will probably be a longer effort.

comment:152 Changed 19 months ago by nthiery

  • Cc jdemeyer hivert added

Thanks for investigating improvements to our Sphinx build! I am adding Florent and Jeroen in CC.

At Sage Days 77, Florent and a sphinx developer had done some extensive analysis of the memory footprint of Sphinx. IIRC, there was a lot of useless information that was kept in the doctrees, and also some memory abuse that could not be explained, pointing to a plausible bug/leak in Sphinx itself. Also, IIRC, they had come to the conclusion that using Sphinx's built-in parallelisation was not out of reach. An aspect of it is that, instead of manually splitting our reference manual into smaller and smaller chunks, we could tentatively revert back to a single document and let Sphinx work its magic.

But that's off the top of my head from two years ago; I'll let Florent comment and dig up notes from SD77.

comment:153 in reply to: ↑ 149 ; follow-up: Changed 19 months ago by saraedum

Replying to embray:

I forget--is there a reason we don't run runsphinx() in a subprocess? Because sphinx.cmdline.main() wasn't really ever meant to be invoked as an API in a larger script, and it leaves around all kinds of global state. [...]

That's what Debian does btw. https://salsa.debian.org/science-team/sagemath/blob/master/debian/patches/df-subprocess-sphinx.patch

I think this behaviour was originally introduced in https://trac.sagemath.org/ticket/14204. I am not completely sure, but it seems to me that the idea was to initialize sage.all once and then use multiprocessing's Pool to fork the individual builds. I think that it's fine to call the main of sphinx in that way. The problem I think is that this broke when (embray I suppose?) introduced the special case for single-threaded builds. So, I guess we need to fork once to control the RAM the docbuild uses. Is the serial build still necessary on cygwin?

comment:154 Changed 19 months ago by git

  • Commit changed from f3bfa5fe9971fa2a04190b92da501cbf9724905b to 95c6275adac26be537c11a084114d8a8dc576392

Branch pushed to git repo; I updated commit sha1. New commits:

5719a72 Improve caching during build
e4887e9 Fix caching during build
be3a0f7 The docbuild crashes happen during an os.fork to spawn tachyon
7d85dc7 Something related to the sphinxbuild seems to be leaking memory
50898fe Try to build-from-scratch on GitLab's underpowered CI machines
95c6275 Comment on fork logic in docbuilding

comment:155 Changed 19 months ago by saraedum

The build from scratch on GitLab CI works again btw. It takes a bit more than 10h…

comment:156 Changed 19 months ago by saraedum

  • Work issues changed from make docs build from scratch on GitLab to does this break docbuild on cygwin now?

comment:157 Changed 19 months ago by git

  • Commit changed from 95c6275adac26be537c11a084114d8a8dc576392 to c0574ca897321c56927e9ce5004fd2f75693785a

Branch pushed to git repo; I updated commit sha1. New commits:

c0574ca Merge remote-tracking branch 'trac/develop' into u/saraedum/gitlabci

comment:158 in reply to: ↑ 153 Changed 19 months ago by embray

Replying to saraedum:

Replying to embray:

I forget--is there a reason we don't run runsphinx() in a subprocess? Because sphinx.cmdline.main() wasn't really ever meant to be invoked as an API in a larger script, and it leaves around all kinds of global state. [...]

That's what Debian does btw. https://salsa.debian.org/science-team/sagemath/blob/master/debian/patches/df-subprocess-sphinx.patch

I think this behaviour was originally introduced in https://trac.sagemath.org/ticket/14204. I am not completely sure, but it seems to me that the idea was to initialize sage.all once and then use multiprocessing's Pool to fork the individual builds. I think that it's fine to call the main of sphinx in that way. The problem I think is that this broke when (embray I suppose?) introduced the special case for single-threaded builds. So, I guess we need to fork once to control the RAM the docbuild uses. Is the serial build still necessary on cygwin?

Yes, in #21389 I changed things so that SAGE_NUM_THREADS=1 meant that no subprocesses were created during the docbuild. Originally that was motivated by the needs of Cygwin, but I don't believe that's necessary anymore (all the fork-related bugs that I know of appear to be fixed). I also now recognize that this is not good behavior. As I wrote above, there's an enormous amount of global state that runsphinx() leaves around, and trying to clean it all up manually would be very difficult, so it should always be run in a subprocess.

Likewise, simply in the process of building the docs, there is also a lot of global state left in Sage (cached values and the like) which it turns out can eat up a lot of memory, especially after some of the larger docs (like manifolds). This is likewise difficult to manually clean up.

Even a reversion of #21389 wouldn't fully fix the issue though since there are other parts of the docbuild that call runsphinx() without calling it in a subprocess, and those are responsible for quite a bit of unnecessary memory leakage currently (even in parallel doc builds). So I intend to fix that.

We can follow up on the docbuild discussion elsewhere though, since other than its impact on CI the details of that are orthogonal to this ticket.

comment:159 Changed 19 months ago by saraedum

  • Work issues does this break docbuild on cygwin now? deleted

comment:160 Changed 19 months ago by git

  • Commit changed from c0574ca897321c56927e9ce5004fd2f75693785a to 329fdab86a1a3edf03c88780104a1471105fbcd6

Branch pushed to git repo; I updated commit sha1. New commits:

329fdab fix tags for build-from-latest

comment:161 Changed 19 months ago by saraedum

So, I think this is really ready for review again.


New commits:

329fdab fix tags for build-from-latest

comment:162 Changed 19 months ago by saraedum

The patchbot complains about "print" but that one is actually in an awk script and not in Python, so it's fine.

comment:163 Changed 19 months ago by git

  • Commit changed from 329fdab86a1a3edf03c88780104a1471105fbcd6 to 048e2d8f0a25bada68693f9b22a8e0426eab2fb2

Branch pushed to git repo; I updated commit sha1. New commits:

048e2d8 silence unicode warnings from the patchbot

comment:164 Changed 19 months ago by saraedum

  • Status changed from needs_review to needs_work
  • Work issues set to patchbot errors

comment:165 Changed 19 months ago by git

  • Commit changed from 048e2d8f0a25bada68693f9b22a8e0426eab2fb2 to f9c772e5003fda874e82edd5c785eab2657cee1b

Branch pushed to git repo; I updated commit sha1. New commits:

f9c772e Fix filtering logic in sphinx logger

comment:166 Changed 19 months ago by saraedum

  • Status changed from needs_work to needs_review
  • Work issues patchbot errors deleted

New commits:

f9c772e Fix filtering logic in sphinx logger

comment:167 Changed 19 months ago by git

  • Commit changed from f9c772e5003fda874e82edd5c785eab2657cee1b to f0431637a02819116781d87825523a068bf88ee2

Branch pushed to git repo; I updated commit sha1. New commits:

f043163 Fixed some minor doctest issues

comment:168 Changed 19 months ago by embray

  • Status changed from needs_review to positive_review

LGTM. Let's move forward.

comment:169 Changed 19 months ago by git

  • Commit changed from f0431637a02819116781d87825523a068bf88ee2 to a34a3e393128f30efae547fdf36f84d6feb34b6e
  • Status changed from positive_review to needs_review

Branch pushed to git repo; I updated commit sha1 and set ticket back to needs_review. New commits:

a34a3e3 It seems that MAKEOPTS/SAGE_NUM_THREADS parsing has been fixed now

comment:170 Changed 19 months ago by git

  • Commit changed from a34a3e393128f30efae547fdf36f84d6feb34b6e to 9c0015ff7e186e9d34caef28d3c1b962dbde75d5

Branch pushed to git repo; I updated commit sha1. New commits:

9c0015f Fixup a34a3e393128f30efae547fdf36f84d6feb34b6e

comment:171 Changed 19 months ago by saraedum

erik: I pushed a minor cleanup. Please have a look :)

comment:172 Changed 19 months ago by embray

Can you just squash the last commit into the previous one?

comment:173 Changed 19 months ago by saraedum

The problem is that I had already merged this into another branch. Let's see how unhappy git is about me squashing this.

comment:174 Changed 19 months ago by git

  • Commit changed from 9c0015ff7e186e9d34caef28d3c1b962dbde75d5 to 8f3480e71cafcef96b1c21aaf3219e8c81e5e9e1

Branch pushed to git repo; I updated commit sha1. This was a forced push. New commits:

8f3480e It seems that MAKEOPTS/SAGE_NUM_THREADS parsing has been fixed now

comment:175 Changed 19 months ago by saraedum

Ok. Seems to handle that fine by now.


New commits:

8f3480e It seems that MAKEOPTS/SAGE_NUM_THREADS parsing has been fixed now

comment:176 Changed 19 months ago by saraedum

  • Status changed from needs_review to needs_work
  • Work issues set to caching of build-time-dependencies does not work

comment:177 Changed 19 months ago by git

  • Commit changed from 8f3480e71cafcef96b1c21aaf3219e8c81e5e9e1 to ad39cc9de9dd8a3f6092c7153278982393da57b8

Branch pushed to git repo; I updated commit sha1. New commits:

99355ad Build images in the same order in the Dockerfile and in build-docker.sh
81babb8 Simplify caching during docker build
ad39cc9 Complicate parallelization quite a bit

comment:178 Changed 19 months ago by saraedum

  • Status changed from needs_work to needs_review
  • Work issues caching of build-time-dependencies does not work deleted

Ok. Tested that everything works on GitLab CI and CircleCI. The build (from scratch) on CircleCI barely finished a few minutes before it hit the time limit of five hours, but I don't think there's much that I can do to speed things up. (I could try to seed a ccache with files from a previous build, but I have a feeling that to help with a build from scratch the ccache would need to be huge. I am not sure whether CircleCI lets me do that much network I/O and whether it would be fast enough.)

comment:179 Changed 19 months ago by git

  • Commit changed from ad39cc9de9dd8a3f6092c7153278982393da57b8 to ae119f23c62921e7e01e5169b52124a2243a8ab1

Branch pushed to git repo; I updated commit sha1. New commits:

ae119f2 Do not overcommit that much

comment:180 Changed 19 months ago by embray

Perhaps at some point we could experiment with setting up sccache for Sage

comment:181 follow-up: Changed 19 months ago by saraedum

Interesting. I did not know sccache.

Since we did not manage to get this into 8.2, I'll manually update our docker images to 8.2 and 8.3beta0.

comment:182 Changed 19 months ago by git

  • Commit changed from ae119f23c62921e7e01e5169b52124a2243a8ab1 to a5914a4917e6dbbbc99c8424250f010ee54de809

Branch pushed to git repo; I updated commit sha1. New commits:

a5914a4 Build tags from clean

comment:183 Changed 19 months ago by saraedum

  • Status changed from needs_review to needs_work
  • Work issues set to CircleCI does not correctly build tagged branches

comment:184 Changed 19 months ago by git

  • Commit changed from a5914a4917e6dbbbc99c8424250f010ee54de809 to 77f1f3c478159da59f069f4a0d4644cc5770fdf5

Branch pushed to git repo; I updated commit sha1. New commits:

77f1f3c Merge remote-tracking branch 'trac/develop' into HEAD

comment:185 Changed 19 months ago by saraedum

  • Milestone changed from sage-8.2 to sage-8.3

New commits:

77f1f3c Merge remote-tracking branch 'trac/develop' into HEAD

comment:186 Changed 19 months ago by embray

It looks like the Docker images for 8.2 haven't been built yet, or are you doing that now? If not I will just build them from my existing scripts (that should have been done already...)

comment:187 Changed 19 months ago by embray

Great, the Docker build is failing now due to #23519. This explains why my automated builds stopped working yet again (I hadn't had a chance to investigate). Fortunately there's a workaround that I can apply easily enough...

comment:188 in reply to: ↑ 181 ; follow-up: Changed 18 months ago by saraedum

Replying to saraedum:

Since we did not manage to get this into 8.2, I'll manually update our docker images to 8.2 and 8.3beta0.

I'm building both actually.

comment:189 Changed 18 months ago by saraedum

  • Description modified (diff)

comment:190 Changed 18 months ago by git

  • Commit changed from 77f1f3c478159da59f069f4a0d4644cc5770fdf5 to 989501f837a8efbbee8f943c886c7985bc6f6572

Branch pushed to git repo; I updated commit sha1. New commits:

49bb769 Work around CircleCI escaping
f9c68a6 Tell CircleCI to build for all tags
b1508cf adapt to microbadger URL change
989501f Merge remote-tracking branch 'trac/u/saraedum/gitlabci' into gitlabci

comment:191 Changed 18 months ago by git

  • Commit changed from 989501f837a8efbbee8f943c886c7985bc6f6572 to a4487bf1fa369edb3084bcb9ba37755156eea770

Branch pushed to git repo; I updated commit sha1. New commits:

a4487bf Fix format of stable releases and mention betas

comment:192 Changed 18 months ago by git

  • Commit changed from a4487bf1fa369edb3084bcb9ba37755156eea770 to 2d33a626429f5ae07dbe3dd7791b238341e37ab7

Branch pushed to git repo; I updated commit sha1. New commits:

2d33a62 fix typo

comment:193 Changed 18 months ago by git

  • Commit changed from 2d33a626429f5ae07dbe3dd7791b238341e37ab7 to 481855ff1ab88f258b97fb4d5896d3e7c614e41f

Branch pushed to git repo; I updated commit sha1. New commits:

481855f Merge remote-tracking branch 'trac/develop' into gitlabci

comment:194 in reply to: ↑ 188 ; follow-up: Changed 18 months ago by saraedum

Replying to saraedum:

Replying to saraedum:

Since we did not manage to get this into 8.2, I'll manually update our docker images to 8.2 and 8.3beta0.

I'm building both actually.

I pushed both (they are much smaller than the one from the old setup, so I hope it's ok that I replaced yours.) gitlab is building 8.3.beta1 currently.

comment:195 in reply to: ↑ 194 ; follow-up: Changed 18 months ago by embray

Replying to saraedum:

Replying to saraedum:

Replying to saraedum:

Since we did not manage to get this into 8.2, I'll manually update our docker images to 8.2 and 8.3beta0.

I'm building both actually.

I pushed both (they are much smaller than the one from the old setup, so I hope it's ok that I replaced yours.) gitlab is building 8.3.beta1 currently.

I'm a little concerned about that. Does that include the sagemath-jupyter image? People are using that.

comment:196 in reply to: ↑ 195 Changed 18 months ago by saraedum

Replying to embray:

Replying to saraedum:

Replying to saraedum:

Replying to saraedum:

Since we did not manage to get this into 8.2, I'll manually update our docker images to 8.2 and 8.3beta0.

I'm building both actually.

I pushed both (they are much smaller than the one from the old setup, so I hope it's ok that I replaced yours.) gitlab is building 8.3.beta1 currently.

I'm a little concerned about that. Does that include the sagemath-jupyter image? People are using that.

The sagemath/sagemath image does contain jupyter. I did not touch the sagemath-jupyter images but I'll create a README there to point people to the main repository.

comment:197 Changed 18 months ago by git

  • Commit changed from 481855ff1ab88f258b97fb4d5896d3e7c614e41f to 2632d32509c5f64c67d7bbd13d214dddb3821e2c

Branch pushed to git repo; I updated commit sha1. New commits:

90ab8f6 Put URL for anynomous access in the docker image
2632d32 Merge branch 'u/saraedum/gitlabci' of git://trac.sagemath.org/sage into u/saraedum/gitlabci

comment:198 Changed 18 months ago by git

  • Commit changed from 2632d32509c5f64c67d7bbd13d214dddb3821e2c to bf0b9ff9dd125bb575a0b0111c30779e1ed81425

Branch pushed to git repo; I updated commit sha1. New commits:

a532fce Merge remote-tracking branch 'trac/develop' into u/saraedum/gitlabci
686fe86 An attempt to use workflows on CircleCI
79003ab Do not deduce CPUs/RAM by looking at the local machine
5ba2f8a Backport https://trac.sagemath.org/ticket/24655
24a627f Merge commit '5ba2f8a419f11dd51ae95ed44f748fe85fbe0bf8' into u/saraedum/gitlabci
bf0b9ff Collect system information on the docker host

comment:199 Changed 18 months ago by saraedum

  • Status changed from needs_work to needs_review
  • Work issues changed from CircleCI does not correctly build tagged branches to test all variations of branches and tags on CircleCI & GitLab CI

comment:200 Changed 18 months ago by git

  • Commit changed from bf0b9ff9dd125bb575a0b0111c30779e1ed81425 to a0fdc3c60f0415963bf4e945f2694dee66ebb8eb

Branch pushed to git repo; I updated commit sha1. This was a forced push. New commits:

906ebea Collect system information on the docker host
60ddb4b simplify apk add command for GitLab CI
25f934e Build from the latest develop image
aa49196 Backport https://trac.sagemath.org/ticket/24655 for automated docker builds
a0fdc3c Merge commit 'aa49196c4f0ed33da7a45b80c7a4a01bd46b7d61' into u/saraedum/gitlabci

comment:201 Changed 18 months ago by saraedum

  • Branch u/saraedum/gitlabci deleted
  • Commit a0fdc3c60f0415963bf4e945f2694dee66ebb8eb deleted

comment:202 Changed 18 months ago by saraedum

  • Branch set to u/saraedum/24655

comment:203 Changed 18 months ago by git

  • Commit set to 11f8a6cc24e9a1853a85f45d48210bdfb35254ff

Branch pushed to git repo; I updated commit sha1. New commits:

11f8a6c no port exposure needed for jupyter anymore

comment:204 Changed 18 months ago by git

  • Commit changed from 11f8a6cc24e9a1853a85f45d48210bdfb35254ff to 1960ba1e0674435f222af242df6252d2b7ea50c8

Branch pushed to git repo; I updated commit sha1. New commits:

1960ba1 add openssl/libssl-dev to the docker images

comment:205 Changed 18 months ago by saraedum

  • Description modified (diff)

comment:206 Changed 18 months ago by saraedum

  • Description modified (diff)

comment:207 Changed 18 months ago by git

  • Commit changed from 1960ba1e0674435f222af242df6252d2b7ea50c8 to f2c641a9d3a99fd0a1dc4f95c801c600e226f1f5

Branch pushed to git repo; I updated commit sha1. New commits:

14926cf Revert "add openssl/libssl-dev to the docker images"
f2c641a Add debug information on patch size

comment:208 Changed 18 months ago by git

  • Commit changed from f2c641a9d3a99fd0a1dc4f95c801c600e226f1f5 to 83f71e7948c8eb8a72e63d4d1451f9f20779bb8b

Branch pushed to git repo; I updated commit sha1. New commits:

83f71e7 Don't pipe "echo" into "rm"

comment:209 Changed 18 months ago by saraedum

  • Description modified (diff)

comment:210 Changed 18 months ago by git

  • Commit changed from 83f71e7948c8eb8a72e63d4d1451f9f20779bb8b to d28f601f53ddae2cecadc95854fbd91a525ac8f7

Branch pushed to git repo; I updated commit sha1. New commits:

d28f601 Make sage -pip work

comment:211 Changed 18 months ago by git

  • Commit changed from d28f601f53ddae2cecadc95854fbd91a525ac8f7 to 7f44254f417f2cf19a50cd4bbe1e387f2084744b

Branch pushed to git repo; I updated commit sha1. New commits:

b8e6077 Do not install libssl-dev twice
7f44254 Merge remote-tracking branch 'trac/develop' into 24655

comment:212 Changed 18 months ago by saraedum

  • Work issues test all variations of branches and tags on CircleCI & GitLab CI deleted

I created the 8.2 and 8.3.beta* releases with this and it seems to work fine at the moment.

comment:213 Changed 18 months ago by embray

So where are we at with this now? I've seen there's been a lot of continued activity from you on the ticket but I haven't followed what it was all about. Just minor fixes or anything major to know?

comment:214 Changed 18 months ago by saraedum

There's nothing really major that has changed. The CircleCI setup looks a bit different now, but on the inside it's mostly the same. Here's a changelog of sorts since you gave it positive_review last:

  • I had set SAGE_NUM_THREADS/MAKEOPTS from the host that ran the docker build commands, not from inside the actual docker containers; the specs of these two machines are sometimes very different on CircleCI. That's fixed now.
  • I worked around some docker build bug related to caching and multi-stage builds. I don't really understand what the problem is there, but it works now.
  • Most of the changes are to make builds for git tags work on CircleCI.
  • I started using these images for actual Sage development and found some minor issues in the git setup inside the sagemath-dev image.
  • I fixed the pip/openssl issue.
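
A minimal sketch of the idea behind that first fix (the actual .ci/setup-make-parallelity.sh does a bit more; nproc is just for illustration here):

 # determine the parallelism from the machine that actually runs the build,
 # i.e. from inside the container, not from the docker host
 NTHREADS="$(nproc)"
 export SAGE_NUM_THREADS="$NTHREADS"
 export MAKEOPTS="-j$NTHREADS"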

So…does it work? It works fine for building from scratch. I've done that for 8.2 and all the betas since then. It also worked for building on top of the latest beta but I have not tested that in all the possible combinations (say, changing SPKGs and things like that) this time. As nobody knows about that latter feature yet, nobody is going to be upset if it doesn't work ;) I certainly want to use this heavily so I'll fix problems right away.

comment:215 Changed 18 months ago by saraedum

Let me check if I changed anything since your last review that is not exclusively docker related…

comment:216 Changed 18 months ago by saraedum

This is what has changed since you reviewed it last, so nothing "dangerous" I'd say:

 .ci/build-docker.sh           |  26 +++++++++++---------------
 .ci/describe-system.sh        |  14 ++++++--------
 .ci/setup-make-parallelity.sh |  66 ++++++++++++++++++++++--------------------------------------------
 .ci/test-dev.sh               |   4 ++--
 .ci/test-jupyter.sh           |   5 +++--
 .ci/update-env.sh             |   6 +-----
 .circleci/before-script.sh    |  26 --------------------------
 .circleci/config.yml          | 100 +++++++++++++++++++++++++++++++++++++++-------------------------------------------------------------
 .gitlab-ci.yml                |  19 ++++++++++++-------
 docker/Dockerfile             |  83 ++++++++++++++++++++++++++++++++++++-----------------------------------------------
 docker/README.md              |   7 +++----
 11 files changed, 135 insertions(+), 221 deletions(-)

comment:217 follow-up: Changed 18 months ago by embray

Thanks for the summary.

Though the amount of minor tweaks this has continued to need seems to confirm my feeling that there should really be a way to get these changes into develop faster, somehow, or use a branch other than develop for CI purposes (but that is regularly reset to the latest develop as soon as any necessary changes are merged into develop...) I don't know if you have any ideas about that.

Like, what if CircleCI, or GitLab changes something, or Docker changes something that has to be worked around (something that has happened to me plenty of times already)? We don't want to have all CI break until the next beta in develop...

comment:218 in reply to: ↑ 217 Changed 18 months ago by saraedum

Replying to embray:

Thanks for the summary.

Though the amount of minor tweaks this has continued to need seems to confirm my feeling that there should really be a way to get these changes into develop faster, somehow, or use a branch other than develop for CI purposes (but that is regularly reset to the latest develop as soon as any necessary changes are merged into develop...) I don't know if you have any ideas about that.

Like, what if CircleCI, or GitLab changes something, or Docker changes something that has to be worked around (something that has happened to me plenty of times already)? We don't want to have all CI break until the next beta in develop...

I see your point but I don't have a good answer. As long as nobody is relying on this, we can just stop the builds on CircleCI/GitLab until the fix is in develop. Then it is going to work again once people merge in the latest develop. That's not great, sure.

I think this whole problem is a bit similar to the git trac issue that we had when we migrated to git. We decided to put the predecessor of the git trac command into the sage repository, which was a bad idea: it wasn't stable, and when you went back to an old branch you got the old broken tooling. Now that git trac is stable and hardly ever changes we could put it back into the main repository (I am not proposing that, just saying.) So the hope here could be that once people start to use this actively, the main problems will have been sorted out already.

The fundamental problem is that I don't see how to take this out of the repository in any way that is even remotely supported by the CIs that we are planning to use. conda-forge has the tooling (conda-smithy) outside of the individual repositories, but you have to say "@conda-forge-admin please rerender" all the time to make the tooling upgrade itself in your pull request. It's annoying; it would mean that we would need a trac bot to do this rerendering of the CI-related files, and it makes the diffs hard to read as they are filled with CI-related changes all the time.

The other option would be to only have a minimal hook call in the repository and then have that trigger actual pipelines somewhere else on that commit and report back through bots. It's a lot of work and I don't think we would want to maintain that kind of infrastructure.

Long story short. I fear we have to use the CIs the way they're intended to be used.

The other issue that you're sort of raising for me is whether develop should be pushed to more frequently. I would love to see a marge-bot or something similar that merges in things automatically; but this goes beyond the focus of this ticket I think.

comment:219 Changed 18 months ago by git

  • Commit changed from 7f44254f417f2cf19a50cd4bbe1e387f2084744b to ab71cfa8a74651587bb84fabffe39bad35fcfb4b

Branch pushed to git repo; I updated commit sha1. New commits:

4ddd346  Backport #24655 to build docker images
ab71cfa  Merge branch 'docker-8.3.beta3' into 24655

comment:220 Changed 18 months ago by saraedum

Hm…we could run a script that is contained in an environment variable. That way we could inject whatever behaviour into every CI run from the outside, e.g., to display a warning that something is broken at the moment together with instructions on how to work around it.
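
Roughly, what I have in mind is something like this (just a sketch; CI_MONKEY_PATCH is a made-up variable name, the real one may differ):

 # run an optional script provided through a CI variable, e.g. to print a
 # warning or apply a temporary workaround without waiting for a new beta
 if [ -n "$CI_MONKEY_PATCH" ]; then
     sh -c "$CI_MONKEY_PATCH"
 fi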

comment:221 Changed 18 months ago by git

  • Commit changed from ab71cfa8a74651587bb84fabffe39bad35fcfb4b to 511bf3ebfc3c5fef8320c5116592aca875971b85

Branch pushed to git repo; I updated commit sha1. New commits:

511bf3e  Introduce monkey-patching in CI

comment:222 follow-up: Changed 18 months ago by saraedum

What about something like this (last commit)? I have not tested it yet, but do you think it's a good idea?

comment:223 Changed 18 months ago by embray

I see your point but I don't have a good answer. As long as nobody is relying on this, we can just stop the builds on CircleCI/GitLab until the fix is in develop. Then it is going to work again once people merge in the latest develop. That's not great, sure.

"as long as nobody is relying on this" yeah, but we want them to rely on it; perhaps even exclusively long term :)

Fortunately, I think once this is in more use it will also force, or at least encourage a change in Sage's development process to be more like a "normal" project.

comment:224 in reply to: ↑ 222 Changed 18 months ago by embray

Replying to saraedum:

What about something like this (last commit)? I have not tested it yet, but do you think it's a good idea?

Definitely makes me uh, a bit nervous but okay :) It could be useful to have and is at least better than nothing as a short-term solution to this problem (if it even does become more than a hypothetical problem).

comment:225 Changed 18 months ago by nthiery

I am not sure this is the right location to report on this. Maybe the README about our Docker image on dockerhub should contain some information on where to report issues?

Anyway, I am having trouble with using binder with the docker image for SageMath 8.2 as currently posted on dockerhub compared to that for 8.1.

Namely, when binder launches Jupyter, its working directory is /home/sage/sage, instead of /home/sage as before. This means in particular that all the notebooks and files that are copied over from the github repository with the usual COPY . ${HOME} are not accessible anymore.

Should WORKDIR be reset to ${HOME} at the end of the Dockerfile?

comment:226 follow-up: Changed 18 months ago by saraedum

nthiery: You are right, we should mention that bugs should be reported on trac as usual. I am not sure whether COPY . ${HOME} is the usual thing; I usually do COPY --chown=sage:sage . . instead. I don't know, should we reset WORKDIR to HOME? Or should it be the place where Sage is installed? Let's reset it to HOME, so we don't break scripts that used to work with the 8.1 image. (This breaks compatibility with 8.2 images, but anyway…)
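
Concretely, I mean something like this at the very end of the Dockerfile (a sketch; it assumes HOME is already set via ENV earlier in the file):

 # reset the working directory to the sage user's home, as in the 8.1 images,
 # so that files COPYed into ${HOME} by downstream Dockerfiles are found again
 WORKDIR ${HOME}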

comment:227 Changed 18 months ago by git

  • Commit changed from 511bf3ebfc3c5fef8320c5116592aca875971b85 to 4ef4e1830f78d45e9faf0ac48fa44f9d57da33ac

Branch pushed to git repo; I updated commit sha1. New commits:

162fe52  Reset WORKDIR to HOME
4ef4e18  Where to report bugs for the docker images

comment:228 Changed 18 months ago by git

  • Commit changed from 4ef4e1830f78d45e9faf0ac48fa44f9d57da33ac to b8a94e2fdbe3c8b7db3b06be0a707373ce85c1d9

Branch pushed to git repo; I updated commit sha1. New commits:

b8a94e2  Merge remote-tracking branch 'trac/develop' into 24655

comment:229 Changed 18 months ago by git

  • Commit changed from b8a94e2fdbe3c8b7db3b06be0a707373ce85c1d9 to 6a3d5d0c70310ddcf28ac684c56bf279d189179c

Branch pushed to git repo; I updated commit sha1. New commits:

6a3d5d0  drop trailing space

comment:230 Changed 18 months ago by git

  • Commit changed from 6a3d5d0c70310ddcf28ac684c56bf279d189179c to f60cebb458d21265afa3bb6f2f7d4701cc3e9085

Branch pushed to git repo; I updated commit sha1. New commits:

f60cebb  Merge branch '24655' into 8.3.beta4.docker

comment:231 Changed 18 months ago by git

  • Commit changed from f60cebb458d21265afa3bb6f2f7d4701cc3e9085 to 8ff2bb4ae935a2b58820908174fd5ad33f7fa01b

Branch pushed to git repo; I updated commit sha1. New commits:

8ff2bb4  Make update-env posix compliant again

comment:232 Changed 18 months ago by git

  • Commit changed from 8ff2bb4ae935a2b58820908174fd5ad33f7fa01b to f7ea0f3c7a10baf475e96bd0bacd09f06039b14b

Branch pushed to git repo; I updated commit sha1. This was a forced push. New commits:

f7ea0f3  Make update-env posix compliant again

comment:233 follow-up: Changed 18 months ago by saraedum

I pushed a new 8.2 image with the WORKDIR set to /home/sage. Both sage and sage-jupyter still work, so I hope that this doesn't break anything.


New commits:

f7ea0f3  Make update-env posix compliant again

comment:234 follow-up: Changed 18 months ago by saraedum

embray: The monkey patching seems to work. So this needs review again.

comment:235 in reply to: ↑ 233 ; follow-up: Changed 18 months ago by nthiery

Replying to saraedum:

I pushed a new 8.2 image with the WORKDIR set to /home/sage. Both sage and sage-jupyter still work, so I hope that this doesn't break anything.

Thanks!

I have a problem though: since binder already downloaded an image with tag 8.2, it won't redownload the new version. Is there a way to add an alias tag such as 8.2-1 ?

Thanks!

comment:236 in reply to: ↑ 226 Changed 18 months ago by nthiery

Replying to saraedum:

nthiery: You are right, we should mention that bugs should be reported on trac as usual. I am not sure whether COPY . ${HOME} is the usual thing; I usually do COPY --chown=sage:sage . . instead.

IIRC, the binder doc suggests COPY . ${HOME}; in sage-binder-env, I am indeed adding --chown=sage:sage.

I don't know, should we reset WORKDIR to HOME? Or should it be the place where Sage is installed? Let's reset it to HOME, so we don't break scripts that used to work with the 8.1 image. (This breaks compatibility with 8.2 images, but anyway…)

I would tend to see the location of the sage install (and in general wherever software is installed) as an implementation detail; but this may be a bias from my use cases of the image!

comment:237 in reply to: ↑ 235 Changed 18 months ago by saraedum

Replying to nthiery:

Replying to saraedum:

I pushed a new 8.2 image with the WORKDIR set to /home/sage. Both sage and sage-jupyter still work, so I hope that this doesn't break anything.

Thanks!

I have a problem though: since binder already downloaded an image with tag 8.2, it won't redownload the new version. Is there a way to add an alias tag such as 8.2-1 ?

Strange, I had been curious about this a while back and had tried whether binder would check for image upgrades, and it actually did. Have you changed something in the repository? I think you need to make some change so it rebuilds your Dockerfile; if it rebuilds, it should pull the latest version. Anyway, to work around your specific problem, you could also just use latest instead of 8.2; latest points to the most recently released stable version.

comment:238 Changed 18 months ago by nthiery

Alas, tried again, and it did not work. Also binder explicitly forbids using "latest". No luck!

Last edited 18 months ago by nthiery (previous) (diff)

comment:239 Changed 18 months ago by saraedum

Just tried: putting a hash into the Dockerfile worked for me: FROM sagemath/sagemath@sha256:e933509b105f36b9b7de892af847ade7753e058c5d9e0c0f280f591b85ad996d
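
So a minimal binder Dockerfile along those lines could look like this (a sketch; it assumes the notebooks sit at the top of the binder repository and that HOME is set in the base image):

 # pin the base image by digest so binder is forced to pick up the rebuilt 8.2 image
 FROM sagemath/sagemath@sha256:e933509b105f36b9b7de892af847ade7753e058c5d9e0c0f280f591b85ad996d
 COPY --chown=sage:sage . ${HOME}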

comment:240 Changed 18 months ago by nthiery

Ah yes, good point. Thanks for the tip, that worked!

comment:241 Changed 18 months ago by git

  • Commit changed from f7ea0f3c7a10baf475e96bd0bacd09f06039b14b to e8dadc8fc9e64968f66ff9953115203b6b0603df

Branch pushed to git repo; I updated commit sha1. New commits:

e8dadc8  Merge remote-tracking branch 'trac/develop' into 24655

comment:242 Changed 17 months ago by git

  • Commit changed from e8dadc8fc9e64968f66ff9953115203b6b0603df to 569d492f83e15b7abfd79c5ddbdda51928187fe0

Branch pushed to git repo; I updated commit sha1. New commits:

569d492  Merge remote-tracking branch 'trac/develop' into 24655

comment:243 in reply to: ↑ 234 ; follow-up: Changed 17 months ago by embray

Replying to saraedum:

embray: The monkey patching seems to work. So this needs review again.

Wait, what monkey patching? I've been AFK for 2 weeks so I'm not sure what you're referring to anymore :)

comment:244 in reply to: ↑ 243 Changed 17 months ago by saraedum

Replying to embray:

Replying to saraedum:

embray: The monkey patching seems to work. So this needs review again.

Wait, what monkey patching? I've been AFK for 2 weeks so I'm not sure what you're referring to anymore :)

I was referring to the injection of a bash script into the CI run through environment variables. Sorry, the term is probably not appropriate here.

comment:245 Changed 17 months ago by embray

Ah right, that. Well, I've never *seen* "monkey-patch" used that way before, nor have I even seen that trick used before, but I suppose "monkey-patch" isn't a bad term for it :)

comment:246 Changed 17 months ago by git

  • Commit changed from 569d492f83e15b7abfd79c5ddbdda51928187fe0 to baf71bdf605e916630e97bee88194401d5288584

Branch pushed to git repo; I updated commit sha1. New commits:

baf71bd  Merge remote-tracking branch develop into 24655

comment:247 Changed 17 months ago by git

  • Commit changed from baf71bdf605e916630e97bee88194401d5288584 to 89115fa2d0767256ce62e0a15d1263d0fc30f392

Branch pushed to git repo; I updated commit sha1. New commits:

89115fa  Merge remote-tracking branch 'trac/develop' into 24655

comment:248 Changed 17 months ago by git

  • Commit changed from 89115fa2d0767256ce62e0a15d1263d0fc30f392 to 6319a7728304d4bffbc477989a7d182067ff5c56

Branch pushed to git repo; I updated commit sha1. New commits:

6319a77  Allow to push to another's user's namespace

comment:249 Changed 17 months ago by git

  • Commit changed from 6319a7728304d4bffbc477989a7d182067ff5c56 to 5d9b1a5c9ec4588567a8f34bdb895082a6c1c26d

Branch pushed to git repo; I updated commit sha1. This was a forced push. New commits:

5d9b1a5  Allow to push to another's user's namespace

comment:250 Changed 17 months ago by embray

How is this coming along? Should I review this again and/or set positive review?

I have the gitlab/trac bot more or less ready to start work too.

comment:251 Changed 17 months ago by git

  • Commit changed from 5d9b1a5c9ec4588567a8f34bdb895082a6c1c26d to 4bdd32a4c7a9a434bc992be8d4e84ef7f2c89748

Branch pushed to git repo; I updated commit sha1. New commits:

4bdd32a  Retry build-from-clean on github

comment:252 Changed 17 months ago by saraedum

Yes, this is definitely ready for review. If you are happy with it, I would run a full set of tests before we really set it to positive review. But I am using this all the time myself and it seems to be sufficiently stable to be useful.

comment:253 Changed 17 months ago by git

  • Commit changed from 4bdd32a4c7a9a434bc992be8d4e84ef7f2c89748 to 754c52bcf2ded4813b440e8f15e4ee2f295c290a

Branch pushed to git repo; I updated commit sha1. New commits:

754c52b  Skip publication steps if SECRET_DOCKER_PASS has not been set

comment:254 Changed 17 months ago by git

  • Commit changed from 754c52bcf2ded4813b440e8f15e4ee2f295c290a to 894c5b8b5c55324015e1d3c0664409bc60c8b992

Branch pushed to git repo; I updated commit sha1. New commits:

894c5b8  Fixup for 5d9b1a5c9ec4588567a8f34bdb895082a6c1c26d

comment:255 Changed 17 months ago by git

  • Commit changed from 894c5b8b5c55324015e1d3c0664409bc60c8b992 to d29058e7950b177f6d0c48c88248790ec6a86a3b

Branch pushed to git repo; I updated commit sha1. New commits:

d29058e  Merge remote-tracking branch 'trac/develop' into 24655

comment:256 follow-up: Changed 17 months ago by embray

What's the status of the Circle-CI builds? Looking at https://circleci.com/gh/saraedum/sage it hasn't updated in a month and the last 2 builds were failures.

comment:257 follow-up: Changed 17 months ago by embray

Also, what's up with the latest 3 builds on https://gitlab.com/saraedum/sage/pipelines ? It just says they got stuck/timed out, but with no log to show for it.

comment:258 in reply to: ↑ 256 Changed 17 months ago by saraedum

Replying to embray:

What's the status of the Circle-CI builds? Looking at https://circleci.com/gh/saraedum/sage it hasn't updated in a month and the last 2 builds were failures.

The builds failed because "build-from-clean" does not work anymore (Sage now takes too long to build). I have not tried the build-from-latest in a while. If everything else looks good to you, I would run a test of every scenario that I can think of, including these CircleCI builds.

comment:259 in reply to: ↑ 257 Changed 17 months ago by saraedum

Replying to embray:

Also, what's up with the latest 3 builds on https://gitlab.com/saraedum/sage/pipelines ? It just says they got stuck/timed out, but with no log to show for it.

The runners are currently down because the credit card linked to the google cloud account expired. The owner is working on fixing that. The build-from-clean only works with non-shared runners because Sage takes too long to build. The build-from-latest still works with shared runners.

comment:260 follow-up: Changed 17 months ago by embray

I wish there was some way to see exactly which runner a pipeline ran on. I can't find anything to that effect, but you'd think one would want to know, especially if there were failures (especially since some failures can depend on the runner).

comment:261 in reply to: ↑ 260 Changed 17 months ago by saraedum

Replying to embray:

I wish there was some way to see exactly which runner a pipeline ran on. I can't find anything to that effect, but you'd think one would want to know, especially if there were failures (especially since some failures can depend on the runner).

The first line of the log says which runner it ran on exactly. I also print the specs of the runner in the logs.

Last edited 17 months ago by saraedum (previous) (diff)

comment:262 Changed 17 months ago by embray

Right, I see that now, and it works fine on successful builds. It's just odd that there's no output for the recent failures. Is that just because it can't even contact the runner?

comment:263 follow-up: Changed 17 months ago by embray

Yes, here's an example where build-from-clean failed but still provided the runner info: https://gitlab.com/saraedum/sage/-/jobs/79424547

Is slelievre providing this runner?

comment:264 follow-up: Changed 17 months ago by embray

Instead of having build-from-clean rely on runners tagged "do" (which I understand stands for Digital Ocean, though out of context I just read it as the word "do"), maybe let's have a tag like "big" or "fat" indicating "big enough to build sage from scratch", since in principle such a runner could be hosted anywhere (in fact I will probably set one up on our OpenStack infrastructure ASAP).

comment:265 in reply to: ↑ 264 Changed 17 months ago by embray

Replying to embray:

Instead of having build-from-clean rely on runners tagged "do" (which I understand stands for Digital Ocean, though out of context I just read it as the word "do"), maybe let's have a tag like "big" or "fat" indicating "big enough to build sage from scratch", since in principle such a runner could be hosted anywhere (in fact I will probably set one up on our OpenStack infrastructure ASAP).

Amusingly, moments after I posted this, said OpenStack infrastructure where I am currently working on multiple VMs, went down completely. As if their internet uplink was killed or something. Completely unreachable.

comment:266 Changed 17 months ago by saraedum

I think you're right. The original idea was that "build-from-clean" worked for the free shared runners provided by gitlab.com that were tagged as "do". This is not reliably the case anymore so we should change that tag.
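
In .gitlab-ci.yml the change would just be something like this (a sketch showing only the relevant tags key; the job keeps its script and the rest of its configuration):

 # require a runner that is big enough to build Sage from scratch
 build-from-clean:
   tags:
     - big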

comment:267 Changed 17 months ago by saraedum

  • Work issues set to replace "do" with something else

comment:268 Changed 17 months ago by git

  • Commit changed from d29058e7950b177f6d0c48c88248790ec6a86a3b to 2cec8ac704cddc2c65cbc439a7b07c7bcc953f1b

Branch pushed to git repo; I updated commit sha1. New commits:

2cec8ac  Replace do tag with big

comment:269 Changed 17 months ago by git

  • Commit changed from 2cec8ac704cddc2c65cbc439a7b07c7bcc953f1b to 624101115b9af149f8bf0d7cfe2f63d9ccf5201d

Branch pushed to git repo; I updated commit sha1. New commits:

6241011  Fix namespacing for CircleCI

comment:270 follow-up: Changed 17 months ago by saraedum

embray: I am not sure how we want to do the review on this one. I could do full tests of all the relevant scenarios today and tomorrow. The problem is: If during your review you find things that you want to see changed, then I have to go through the whole testing process again.

The alternative is: You review the current state, and I do the full test just before we set this to positive review.

The first option is probably more straightforward. But it only works if you think that there are not going to be relevant changes necessary. (I am not sure how familiar you are with the current state of this ticket.)

comment:271 in reply to: ↑ 263 Changed 17 months ago by saraedum

Replying to embray:

Yes, here's an example where build-from-clean failed but still provided the runner info: https://gitlab.com/saraedum/sage/-/jobs/79424547

Is slelievre providing this runner?

Yes, the one he's running says "slelievre" in the first line; mine say "jrueth".

comment:272 in reply to: ↑ 270 ; follow-up: Changed 17 months ago by embray

Replying to saraedum:

embray: I am not sure how we want to do the review on this one. I could do full tests of all the relevant scenarios today and tomorrow. The problem is: If during your review you find things that you want to see changed, then I have to go through the whole testing process again.

The alternative is: You review the current state, and I do the full test just before we set this to positive review.

The first option is probably more straightforward. But it only works if you think that there are not going to be relevant changes necessary. (I am not sure how familiar you are with the current state of this ticket.)

I'm mostly caught up on it--it is a lot though. As far as I'm concerned it's "positive review". While I believe there are still areas we'll want to improve a lot of it comes down to computing resources (and other related issues like continuing to improve the resource usage of the doc build, for which I have ideas but haven't had a chance to try them yet).

Otherwise I believe you that it works, and I'd like to be able to start using it right away, so it's a provisional Positive Review from me. My only remaining question is: Do we want to go ahead now and bring this work up on sage-devel and ask others to weigh in? Or would that be too disruptive overall? (If nothing else we'll need to get Volker to at least agree to merge it.)

comment:273 follow-up: Changed 17 months ago by embray

I think we should also approach companies like Google, Digital Ocean, Microsoft, etc. (and what about for OSX??) to see if we can get some donated resources from them for this purpose. I don't think it's outside the realm of possibility, especially with a decent proposal (this would definitely fall under OpenDreamKit work I think). But that's something we can work on in parallel.

comment:274 in reply to: ↑ 272 Changed 17 months ago by saraedum

  • Work issues changed from replace "do" with something else to check that everything works ⇒ positive review

Replying to embray:

Replying to saraedum:

embray: I am not sure how we want to do the review on this one. I could do full tests of all the relevant scenarios today and tomorrow. The problem is: If during your review you find things that you want to see changed, then I have to go through the whole testing process again.

The alternative is: You review the current state, and I do the full test just before we set this to positive review.

The first option is probably more straightforward. But it only works if you think that there are not going to be relevant changes necessary. (I am not sure how familiar you are with the current state of this ticket.)

I'm mostly caught up on it--it is a lot though. As far as I'm concerned it's "positive review". While I believe there are still areas we'll want to improve a lot of it comes down to computing resources (and other related issues like continuing to improve the resource usage of the doc build, for which I have ideas but haven't had a chance to try them yet).

Otherwise I believe you that it works, and I'd like to be able to start using it right away, so it's a provisional Positive Review from me.

Great. Let me test that everything works then. And improving resource usage of the build would be quite amazing. Let me know if I can support you in any way there.

My only remaining question is: Do we want to go ahead now and bring this work up on sage-devel and ask others to weigh in? Or would that be too disruptive overall? (If nothing else we'll need to get Volker to at least agree to merge it.)

I don't think we need general consent from or discussion on sage-devel. For me this ticket is only about automatically building docker images to publish them on our Docker Hub account more conveniently. It's not about replacing the patchbot, migrating to GitLab, or anything like that.

At the same time this can of course be a first step towards more important changes, most prominently for me the binder ticket #24842. I'll put that one up for discussion on sage-devel once it's ready.

comment:275 in reply to: ↑ 273 ; follow-up: Changed 17 months ago by saraedum

Replying to embray:

I think we should also approach companies like Google, Digital Ocean, Microsoft, etc. (and what about for OSX??) to see if we can get some donated resources from them for this purpose. I don't think it's outside the realm of possibility, especially with a decent proposal (this would definitely fall under OpenDreamKit work I think). But that's something we can work on in parallel.

Could you create a trac ticket to keep track of this? Or an issue on the ODK github account?

comment:276 Changed 17 months ago by saraedum

  • Work issues changed from check that everything works ⇒ positive review to check that everything works ⇒ (provisional) positive review

comment:277 in reply to: ↑ 275 ; follow-up: Changed 17 months ago by embray

Replying to saraedum:

Replying to embray:

I think we should also approach companies like Google, Digital Ocean, Microsoft, etc. (and what about for OSX??) to see if we can get some donated resources from them for this purpose. I don't think it's outside the realm of possibility, especially with a decent proposal (this would definitely fall under OpenDreamKit work I think). But that's something we can work on in parallel.

Could you create a trac ticket to keep track of this? Or an issue on the ODK github account?

Speaking of which, I was going to ask if you saw my messages on Zulip. But it appears to be down (and I'm afraid I might have broken it, somehow, by trying to log in): https://zulip.sagemath.org/

But I believe this could fall under D3.8

comment:278 in reply to: ↑ 277 Changed 17 months ago by roed

Replying to embray:

Speaking of which, I was going to ask if you saw my messages on Zulip. But it appears to be down (and I'm afraid I might have broken it, somehow, by trying to log in): https://zulip.sagemath.org/

I've upgraded zulip to the most recent version and restarted the server; it seems to be working now.

comment:279 Changed 17 months ago by saraedum

  • Description modified (diff)

comment:280 Changed 16 months ago by git

  • Commit changed from 624101115b9af149f8bf0d7cfe2f63d9ccf5201d to 54cfea3463c4815df3c42a0c4468efc9511f3f0b

Branch pushed to git repo; I updated commit sha1. New commits:

6e267b6  Fix namespacing for GitLab CI
54cfea3  Merge remote-tracking branch 'trac/develop' into 24655

comment:281 Changed 16 months ago by saraedum

  • Description modified (diff)

comment:282 Changed 16 months ago by saraedum

  • Description modified (diff)

comment:283 Changed 16 months ago by saraedum

  • Description modified (diff)

comment:284 Changed 16 months ago by saraedum

  • Description modified (diff)

comment:285 Changed 16 months ago by saraedum

  • Description modified (diff)

comment:286 follow-up: Changed 16 months ago by embray

[ ] build-from-clean works in a user namespace on CircleCI, ​https://circleci.com/workflow-run/4ae6af8c-2212-4724-a865-a401be4bd8b7

[ ] build-from-latest works and is fast in a user namespace on GitLab?

[ ] build-from-latest works and is fast in a user namespace on CircleCI

In this case what do you mean by "user namespace"? You mean like, a personal fork? I could try that. Would I need to set up a self-provided runner? I can do that too (and should practice that anyways).

comment:287 Changed 16 months ago by saraedum

  • Description modified (diff)

comment:288 in reply to: ↑ 286 ; follow-up: Changed 16 months ago by saraedum

Replying to embray:

[ ] build-from-clean works in a user namespace on CircleCI, ​https://circleci.com/workflow-run/4ae6af8c-2212-4724-a865-a401be4bd8b7

[ ] build-from-latest works and is fast in a user namespace on GitLab?

[ ] build-from-latest works and is fast in a user namespace on CircleCI

In this case what do you mean by "user namespace"? You mean like, a personal fork? I could try that.

Yes, I mean the typical use case for personal forks. More specifically, I meant the case where we are uploading to a docker hub account that is not the sagemath account. You might want to wait until https://gitlab.com/saraedum/sage/pipelines/25831675 has completed (so the sagemath/sagemath-dev:develop image has been upgraded), otherwise it won't be fast. Once that has happened, a branch that contains the changes in this ticket should build automatically on any GitLab fork without further setup.

Would I need to set up a self-provided runner? I can do that too (and should practice that anyways).

No, you don't need private runners for this. The free shared runners should be sufficient.

comment:289 in reply to: ↑ 288 Changed 16 months ago by embray

Replying to saraedum:

Replying to embray:

[ ] build-from-clean works in a user namespace on CircleCI, ​https://circleci.com/workflow-run/4ae6af8c-2212-4724-a865-a401be4bd8b7

[ ] build-from-latest works and is fast in a user namespace on GitLab?

[ ] build-from-latest works and is fast in a user namespace on CircleCI

In this case what do you mean by "user namespace"? You mean like, a personal fork? I could try that.

Yes, I mean the typical use case for personal forks. More specifically, I meant the case where we are uploading to a docker hub account that is not the sagemath account. You might want to wait until https://gitlab.com/saraedum/sage/pipelines/25831675 has completed (so the sagemath/sagemath-dev:develop image has been upgraded), otherwise it won't be fast. Once that has happened, a branch that contains the changes in this ticket should build automatically on any GitLab fork without further setup.

What about circle-ci? For that surely I'd at least have to enable it in my repository?

comment:290 Changed 16 months ago by saraedum

Yes, once the sagemath/sagemath-dev:develop image has been upgraded, it should work by simply enabling it in your repository with no further configuration.

comment:291 Changed 16 months ago by saraedum

  • Description modified (diff)

comment:292 Changed 16 months ago by saraedum

  • Description modified (diff)

comment:293 Changed 16 months ago by saraedum

  • Description modified (diff)

comment:294 Changed 16 months ago by saraedum

  • Description modified (diff)

comment:295 Changed 16 months ago by saraedum

  • Description modified (diff)

comment:296 follow-up: Changed 16 months ago by saraedum

Everything relevant works. However, strangely, when "building from latest", the resulting images have different sizes on CircleCI and GitLab CI:

sagemath: 611MB vs 635MB

sagemath-dev: 4GB vs 2GB

The dev images are not relevant in this case, but it is still unfortunate. More importantly, I would like to understand why they differ in size, as the docker builds should be the same on both platforms.

comment:297 Changed 16 months ago by saraedum

  • Description modified (diff)

comment:298 Changed 16 months ago by git

  • Commit changed from 54cfea3463c4815df3c42a0c4468efc9511f3f0b to 220d948c93048bab2a5a909b377f012374c01b06

Branch pushed to git repo; I updated commit sha1. New commits:

b12cd9f  Do not use broken docker version on CircleCI
220d948  Warn about aufs

comment:299 Changed 16 months ago by git

  • Commit changed from 220d948c93048bab2a5a909b377f012374c01b06 to c40bb81d324fc08f1ea6179626e0a5ddcddd5fbb

Branch pushed to git repo; I updated commit sha1. New commits:

c40bb81  Add reference to known chown bug in docker/aufs

comment:300 in reply to: ↑ 296 ; follow-up: Changed 16 months ago by saraedum

Replying to saraedum:

Everything relevant works. However, strangely, when "building from latest", the resulting images have different sizes on CircleCI and GitLab CI:

sagemath: 611MB vs 635MB

This is due to a difference in docker daemon version. I don't know what the difference between the two images is, but I now tell CircleCI to build using the latest stable version and both are 635MB in size.

sagemath-dev: 4GB vs 2GB

This is a known bug in docker on aufs (which CircleCI is using). There isn't really anything we can do about it.

comment:301 follow-up: Changed 16 months ago by saraedum

  • Work issues changed from check that everything works ⇒ (provisional) positive review to 8.3rc2 release ⇒ positive review

embray: I'll set this to positive review once I released 8.3rc2 as a final check.

comment:302 Changed 16 months ago by saraedum

  • Description modified (diff)

comment:303 follow-up: Changed 16 months ago by embray

There's a problem with these Docker images, which is that it's impossible to install optional packages, which is a thing people are going to want to be able to do. This is a problem, of course, because being able to install optional packages means at least having some minimal build toolchain available in the image.

I suppose they could apt-get install that after the fact, if desired (though this needs to be documented somehow). But then there's the problem that /home/sage/sage is thoroughly stripped down, and does not have configure, build/make/Makefile, or anything else required for sage -i to work at a minimum. This leads to the unhelpful error message

/home/sage/sage/src/bin/sage: line 353: make: command not found

or even if you sudo apt-get install make:

make: *** No rule to make target 'all-toolchain'.  Stop.

I'm not sure what we do about this.

comment:304 in reply to: ↑ 300 Changed 16 months ago by embray

Replying to saraedum:

Replying to saraedum:

Everything relevant works. However, strangely, when "building from latest", the resulting images have different sizes on CircleCI and GitLab CI:

sagemath: 611MB vs 635MB

This is due to a difference in docker daemon version. I don't know what the difference between the two images is, but I now tell CircleCI to build using the latest stable version and both are 635MB in size.

sagemath-dev: 4GB vs 2GB

This is a known bug in docker on aufs (which CircleCI is using). There isn't really anything we can do about it.

Thanks for researching and making note of it. What a pain. I wish Docker would like, stabilize :)

Do you know if this issue has been brought up with CircleCI?

comment:305 in reply to: ↑ 303 ; follow-up: Changed 16 months ago by saraedum

Replying to embray:

There's a problem with these Docker images, which is that it's impossible to install optional packages, which is a thing people are going to want to be able to do. This is a problem, of course, because being able to install optional packages means at least having some minimal build toolchain available in the image.

I suppose they could apt-get install that after the fact, if desired (though this needs to be documented somehow). But then there's the problem that /home/sage/sage is thoroughly stripped down, and does not have configure, build/make/Makefile, or anything else required for sage -i to work at a minimum. This leads to the unhelpful error message

/home/sage/sage/src/bin/sage: line 353: make: command not found

or even if you sudo apt-get install make:

make: *** No rule to make target 'all-toolchain'.  Stop.

I'm not sure what we do about this.

No clue really. sage -pip works afaik, so that's good enough for some I think.

I am not sure who would want to install more packages. Since you need to add a bunch of build dependencies, we could consider pointing people to the dev images instead maybe? Actually many optional dependencies should already be detected by the features framework so they could be installed through apt-get?

comment:306 in reply to: ↑ 305 Changed 16 months ago by embray

Replying to saraedum:

Replying to embray:

There's a problem with these Docker images, which is that it's impossible to install optional packages, which is a thing people are going to want to be able to do. This is a problem, of course, because being able to install optional packages means at least having some minimal build toolchain available in the image.

I suppose they could apt-get install that after the fact, if desired (though this needs to be documented somehow). But then there's the problem that /home/sage/sage is thoroughly stripped down, and does not have configure, build/make/Makefile, or anything else required for sage -i to work at a minimum. This leads to the unhelpful error message

/home/sage/sage/src/bin/sage: line 353: make: command not found

or even if you sudo apt-get install make:

make: *** No rule to make target 'all-toolchain'.  Stop.

I'm not sure what we do about this.

No clue really. sage -pip works afaik, so that's good enough for some I think.

Not for anything that's installed with sage -i

I am not sure who would want to install more packages. Since you need to add a bunch of build dependencies, we could consider pointing people to the dev images instead maybe?

For now that's what I've done, and what I suppose we can continue to do for now. Installing optional packages works for the dev images.
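
For reference, that workflow is roughly the following (a sketch; the exact invocation depends on the image's entrypoint, and gap_packages is only an example package name):

 # the -dev image keeps the build toolchain and the full SAGE_ROOT,
 # so optional SPKGs can be installed there
 docker run -it --entrypoint bash sagemath/sagemath-dev:develop
 sage -i gap_packages   # run this inside the container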

Actually many optional dependencies should already be detected by the features framework so they could be installed through apt-get?

Yes, that's something we should work on separately. There should be a framework for optional packages such that we can easily supply the distro-specific commands to install any dependencies needed for optional packages. For the docker images (or any other Sage installed on debian or ubuntu) this would of course be the appropriate apt-get command. For other distros it might supply the correct package names to pass to emerge, yum, etc. This info could come from a distro-specific config file, that we can either maintain in the Sage repo directly, or that could be provided as a patch by downstream distributors.

For this to work we also need to include, at a minimum, the build/make/Makefile in the installed Sage as well. Anyways, I don't think that's something we should do for this ticket. Just something to consider.

comment:307 in reply to: ↑ 301 Changed 16 months ago by embray

Replying to saraedum:

embray: I'll set this to positive review once I released 8.3rc2 as a final check.

+1

comment:308 Changed 16 months ago by git

  • Commit changed from c40bb81d324fc08f1ea6179626e0a5ddcddd5fbb to 8b104295c4bb8f8a8256aa0ccf1cd6bd6b332588

Branch pushed to git repo; I updated commit sha1. New commits:

8b10429  Merge remote-tracking branch 'trac/develop' into 24655

comment:309 Changed 16 months ago by saraedum

  • Status changed from needs_review to positive_review
  • Work issues 8.3rc2 release ⇒ positive review deleted

The 8.3rc2 release worked fine, so we can finally set this to positive review.

comment:310 Changed 16 months ago by git

  • Commit changed from 8b104295c4bb8f8a8256aa0ccf1cd6bd6b332588 to d313c11a504baf2c6476357d345f121e0fb855e9
  • Status changed from positive_review to needs_review

Branch pushed to git repo; I updated commit sha1 and set ticket back to needs_review. New commits:

d313c11  Merge remote-tracking branch 'trac/develop' into 24655

comment:311 Changed 16 months ago by saraedum

  • Status changed from needs_review to positive_review

No need to set this back to needs review after a trivial merge with trac/develop.

comment:312 Changed 16 months ago by embray

+1

comment:313 Changed 16 months ago by git

  • Commit changed from d313c11a504baf2c6476357d345f121e0fb855e9 to 4152838ef924933eca5ee54447826c803d2dff00
  • Status changed from positive_review to needs_review

Branch pushed to git repo; I updated commit sha1 and set ticket back to needs_review. New commits:

4152838  Merge remote-tracking branch 'trac/develop' into 24655

comment:314 Changed 16 months ago by saraedum

  • Status changed from needs_review to positive_review

(trivial merge)

comment:315 Changed 16 months ago by saraedum

Btw., I just published 8.3 on sagemath/sagemath and sagemath/sagemath-dev.

comment:316 Changed 16 months ago by git

  • Commit changed from 4152838ef924933eca5ee54447826c803d2dff00 to a85023cef50c4cfb99a2a2bdd3504cd78b4c0327
  • Status changed from positive_review to needs_review

Branch pushed to git repo; I updated commit sha1 and set ticket back to needs_review. New commits:

a85023c  Merge remote-tracking branch 'trac/develop' into 24655

comment:317 Changed 16 months ago by saraedum

  • Status changed from needs_review to positive_review

Back to positive review after a trivial merge. (vbraun: I am merging in develop as this triggers automatic builds of our docker images on gitlab.com.)

comment:318 Changed 16 months ago by git

  • Commit changed from a85023cef50c4cfb99a2a2bdd3504cd78b4c0327 to 6fa8bc6a16cfb7ce61598f38052729775cf898c4
  • Status changed from positive_review to needs_review

Branch pushed to git repo; I updated commit sha1 and set ticket back to needs_review. New commits:

6fa8bc6  Builds fail with 8.4.beta0 but the output is truncated and it is not clear what's the issue.

comment:319 Changed 15 months ago by git

  • Commit changed from 6fa8bc6a16cfb7ce61598f38052729775cf898c4 to 3af7c5006c0a9b3583f2bb2041c815c607b2f3f2

Branch pushed to git repo; I updated commit sha1. New commits:

3af7c50  Merge remote-tracking branch 'trac/develop' into HEAD

comment:320 Changed 15 months ago by saraedum

embray, I have a feeling that the build now hangs more frequently than it used to. I have now seen this twice locally and I had CI machines time out after 12 hours of trying to build Sage. (I checked locally that nothing was happening anymore in the background.)

comment:321 follow-up: Changed 15 months ago by embray

  • Status changed from needs_review to positive_review

I can't imagine why. Nobody else has reported builds hanging. I'm not sure what you mean in this case by "the build", because there's nothing special about the way sage is being built here, is there?

This ticket has gone on too long. Let's see if we can get Volker to merge it and then we can continue tweaking from there. It might also help if you could squash down some of the massive commit history into something a little easier to digest, but I know that can be daunting in its own right...

comment:322 follow-up: Changed 15 months ago by embray

Also, can you tell where the builds have been hanging?

There's been some tinkering with Docker Hub lately, and I know they have some planned downtime coming up.

comment:323 follow-up: Changed 15 months ago by embray

I'm a little confused by the latest jobs on https://gitlab.com/saraedum/sage/pipelines

The last three jobs were all building the exact same commit: https://gitlab.com/saraedum/sage/commit/3af7c5006c0a9b3583f2bb2041c815c607b2f3f2

But one was triggered on an update to the "develop" branch, one was triggered on an update to a "build-from-latest" branch (what's that about?--apologies if I've already asked), and one was triggered by the new 8.4.beta1 tag. Although, the way Sage is currently developed, updates to the "develop" branch always coincide with new tags as well. I don't like that, but for now it's superfluous to trigger builds on new tags.

Also, with the two that failed, it looks like I was right about it being something to do with Docker Hub, but I could be misreading the logs. I'm not sure if that's what you're referring to though.

Last edited 15 months ago by embray (previous) (diff)

comment:324 follow-up: Changed 15 months ago by embray

The latest Circle-CI build, however, timed out at

 ---> 3753d48ddd74
Step 68/76 : RUN if [ x"$ARTIFACT_BASE" != x"source-clean" ]; then         mkdir -p $HOME/patch         && find . -type f > $HOME/make-fast-rebuild-clean.manifest         && cat $HOME/make-fast-rebuild-clean.manifest $HOME/artifact-base.manifest | sort | uniq -u > $HOME/obsolete         && find . -type f -cnewer $HOME/artifact-base.manifest > $HOME/modified         && tar -cJf $HOME/patch/modified.tar.xz -T $HOME/modified         && cat $HOME/obsolete $HOME/modified | xz > $HOME/patch/modified.xz         && rm -rf $SAGE_ROOT         && mkdir -p $SAGE_ROOT         && mv $HOME/patch $SAGE_ROOT/;     fi
 ---> Running in ea5237ee3bbd
Build timedout

I have no idea what any of that means.

(As an aside, I find it annoying that Circle-CI will show you the beginning of the log, but you have to download the full log to see the end. On GitLab it will also truncate the log, but it shows the end of the log first, which is almost always more useful when it comes to diagnosing problems. I wonder if that's at all configurable on Circle...)

comment:325 in reply to: ↑ 322 Changed 15 months ago by saraedum

Replying to embray:

Also, can you tell where the builds have been hanging?

Unfortunately I don't really know. My top showed a python2 process that did not do anything anymore but that apparently was the only thing that make was waiting for. I did not investigate further.

On GitLab I could not see where it hung because we truncate the logs ourselves and we were already in that portion.

There's been some tinkering with Docker Hub lately, and I know they have some planned downtime coming up.

Yes, that explains some of the errors when pushing images in the end. But that's not what I have been referring to here.

comment:326 in reply to: ↑ 323 ; follow-up: Changed 15 months ago by saraedum

Replying to embray:

I'm a little confused by the latest jobs on https://gitlab.com/saraedum/sage/pipelines

The last three jobs were all building the exact same commit: https://gitlab.com/saraedum/sage/commit/3af7c5006c0a9b3583f2bb2041c815c607b2f3f2

Yes, that's fine. I pushed one to a branch (that's not called develop and not called master) to see if the build-from-latest still works. However, due to quite a few SPKG changes, the build hit the 3h time limit that is in place for build-from-latest (we use the public shared gitlab runners, and they enforce that time limit).

The other two runs are one build-from-clean for the develop branch and one build-from-clean for the 8.4.beta1 tag. That's intentional and just how GitLab CI works. The rules don't detect that several triggers come from the same commit. CircleCI does, and it is imho very annoying to work around.

As a result, the develop and the 8.4.beta1 tag on docker hub do not come from the same run of "docker build". But since "docker build" should be reproducible anyway (in particular if we build-from-clean) I don't see this as a problem apart from the wasted resources.

comment:327 in reply to: ↑ 324 ; follow-up: Changed 15 months ago by saraedum

Replying to embray:

The latest Circle-CI build, however, timed out at

 ---> 3753d48ddd74
Step 68/76 : RUN if [ x"$ARTIFACT_BASE" != x"source-clean" ]; then         mkdir -p $HOME/patch         && find . -type f > $HOME/make-fast-rebuild-clean.manifest         && cat $HOME/make-fast-rebuild-clean.manifest $HOME/artifact-base.manifest | sort | uniq -u > $HOME/obsolete         && find . -type f -cnewer $HOME/artifact-base.manifest > $HOME/modified         && tar -cJf $HOME/patch/modified.tar.xz -T $HOME/modified         && cat $HOME/obsolete $HOME/modified | xz > $HOME/patch/modified.xz         && rm -rf $SAGE_ROOT         && mkdir -p $SAGE_ROOT         && mv $HOME/patch $SAGE_ROOT/;     fi
 ---> Running in ea5237ee3bbd
Build timedout

Yes, build-from-clean times out on CircleCI. That's a known and documented issue.

(As an aside, I find it annoying that Circle-CI will show you the beginning of the log, but you have to download the full log to see the end. On GitLab it will also truncate the log, but it shows the end of the log first, which is almost always more useful when it comes to diagnosing problems. I wonder if that's at all configurable on Circle...)

No, it can't be configured and it's a common complaint about CircleCI.

comment:328 in reply to: ↑ 321 Changed 15 months ago by saraedum

Replying to embray:

I can't imagine why. Nobody else has reported builds hanging. I'm not sure what you mean in this case by "the build", because there's nothing special about the way sage is being built ist here?

This ticket has gone on too long. Let's see if we can get Volker to merge it and then we can continue tweaking from there. It might also help if you could squash down some of the massive commit history into something a little easier to digest, but I know that can be daunting in its own right...

I prefer not to squash the commits just because it's a lot of work. I am trying to write meaningful commit messages so if somebody really looks at the commit history some day it should be possible to make sense out of it.

But, yeah, let's get this back to positive review.

comment:329 Changed 15 months ago by embray

I spent some time reading through the log of the recent Circle-CI build that timed out, and that was at least edifying, to understand better step-by-step what the process is. You've walked me through it a few times but it's still...complicated, to say the least.

I'm still not clear on what these timeouts you're referring to are, then, if it's not anything I could see on the recent builds...

comment:330 in reply to: ↑ 327 ; follow-up: Changed 15 months ago by embray

Replying to saraedum:

Replying to embray:

The latest Circle-CI build, however, timed out at

 ---> 3753d48ddd74
Step 68/76 : RUN if [ x"$ARTIFACT_BASE" != x"source-clean" ]; then         mkdir -p $HOME/patch         && find . -type f > $HOME/make-fast-rebuild-clean.manifest         && cat $HOME/make-fast-rebuild-clean.manifest $HOME/artifact-base.manifest | sort | uniq -u > $HOME/obsolete         && find . -type f -cnewer $HOME/artifact-base.manifest > $HOME/modified         && tar -cJf $HOME/patch/modified.tar.xz -T $HOME/modified         && cat $HOME/obsolete $HOME/modified | xz > $HOME/patch/modified.xz         && rm -rf $SAGE_ROOT         && mkdir -p $SAGE_ROOT         && mv $HOME/patch $SAGE_ROOT/;     fi
 ---> Running in ea5237ee3bbd
Build timedout

Yes, build-from-clean times out on CircleCI. That's a known and documented issue.

Is it? I see your note in the Dockerfile about it making the images too big, but not about it timing out. If it's a known issue then shouldn't we just disable it for now?

comment:331 in reply to: ↑ 326 ; follow-up: Changed 15 months ago by embray

Replying to saraedum:

The other two runs are one build-from-clean for the develop branch and one build-from-clean for the 8.4.beta1 tag. That's intentional and just how GitLab CI works. The rules don't detect that several triggers come from the same commit. CircleCI does, and it is imho very annoying to work around.

As a result, the develop and the 8.4.beta1 tag on docker hub do not come from the same run of "docker build". But since "docker build" should be reproducible anyway (in particular if we build-from-clean) I don't see this as a problem apart from the wasted resources.

I still feel like we ought to be able to work around that somehow. I don't see why we would want to do two complete build-from-cleans of the same exact commit just so they wind up with different tags on Docker Hub.

I know you've thought about this more than I have so I'm probably missing something. But my thinking is that until and unless Sage's "develop" branch is decoupled from its tags (something I think should happen but is not the case for now), we should edit the .gitlab-ci.yml and .circleci/config.yml to really only do build-from-clean on the develop branch (not even master). If we want to then tag an image with a version number, we can take the current commit ID on develop, do git describe --tags <commit id>, and get the associated tag name. Then use that to tag the image.
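
Concretely, something along these lines in the publish step (just a sketch; the image name and variables stand in for whatever the scripts actually use):

 # derive the release tag, if any, from the commit that develop points at,
 # and add it as an extra tag on the image that was just built
 TAG="$(git describe --tags --exact-match "$CI_COMMIT_SHA" 2>/dev/null || true)"
 if [ -n "$TAG" ]; then
     docker tag sagemath/sagemath:develop "sagemath/sagemath:$TAG"
     docker push "sagemath/sagemath:$TAG"
 fi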

comment:332 in reply to: ↑ 331 ; follow-up: Changed 15 months ago by saraedum

Replying to embray:

Replying to saraedum:

The other two runs are one build-from-clean for the develop branch and one build-from-clean for the 8.4.beta1 tag. That's intentional and just how GitLab CI works. The rules don't detect that several triggers come from the same commit. CircleCI does, and it is imho very annoying to work around.

As a result, the develop and the 8.4.beta1 tag on docker hub do not come from the same run of "docker build". But since "docker build" should be reproducible anyway (in particular if we build-from-clean) I don't see this as a problem apart from the wasted resources.

I still feel like we ought to be able to work around that somehow. I don't see why we would want to do two complete build-from-cleans of the same exact commit just so they wind up with different tags on Docker Hub.

It's a bit weird, I agree. But that's how people do that on GitLab CI. And usually builds don't take five hours, so it doesn't matter.

I know you've thought about this more than I have so I'm probably missing something. But my thinking is that until and unless Sage's "develop" branch is decoupled from its tags (something I think should happen but is not the case for now), we should edit the .gitlab-ci.yml and .circleci/config.yml to really only do build-from-clean on the develop branch (not even master). If we want to then tag an image with a version number, we can take the current commit ID on develop, do git describe --tags <commit id>, and get the associated tag name. Then use that to tag the image.

This would break if the branch and the tag are not pushed at the same time. Also, it's reasonable that you don't want to push all tags but just protected tags, e.g., I don't want to push the tags that I just use to check that building from tags actually works.

In short, I see your point but I'd rather leave it the way it is.

comment:333 in reply to: ↑ 330 ; follow-up: Changed 15 months ago by saraedum

Replying to embray:

Replying to saraedum:

Replying to embray:

The latest Circle-CI build, however, timed out at

 ---> 3753d48ddd74
Step 68/76 : RUN if [ x"$ARTIFACT_BASE" != x"source-clean" ]; then         mkdir -p $HOME/patch         && find . -type f > $HOME/make-fast-rebuild-clean.manifest         && cat $HOME/make-fast-rebuild-clean.manifest $HOME/artifact-base.manifest | sort | uniq -u > $HOME/obsolete         && find . -type f -cnewer $HOME/artifact-base.manifest > $HOME/modified         && tar -cJf $HOME/patch/modified.tar.xz -T $HOME/modified         && cat $HOME/obsolete $HOME/modified | xz > $HOME/patch/modified.xz         && rm -rf $SAGE_ROOT         && mkdir -p $SAGE_ROOT         && mv $HOME/patch $SAGE_ROOT/;     fi
 ---> Running in ea5237ee3bbd
Build timedout

Yes, build-from-clean times out on CircleCI. That's a known and documented issue.

Is it? I see your note in the Dockerfile about it making the images too big, but not about it timing out. If it's a known issue then shouldn't we just disable it for now?

It's in the CircleCI configuration file:

# To make the build time not too excessive, we seed the build cache with
# sagemath/sagemath-dev:develop. When basic SPKGs change, this does not help much,
# still the full build does usually not exceed CircleCI's limits for open
# source projects (five hours on 2 vCPUs as of early 2018.)
# You might want to try to build locally or with GitLab CI, see
# `.gitlab-ci.yml` for more options.
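
Concretely, the seeding is essentially the usual pull + --cache-from pattern, roughly like this (not the literal commands from the config, just the idea; image name and paths are illustrative):

# pull the latest dev image so its layers can serve as a local build cache
docker pull sagemath/sagemath-dev:develop || true
# reuse unchanged layers from that image instead of rebuilding them
docker build --cache-from sagemath/sagemath-dev:develop -t sagemath/sagemath-dev:develop .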

comment:334 in reply to: ↑ 333 Changed 15 months ago by saraedum

Replying to saraedum:

Replying to embray:

Replying to saraedum:

Replying to embray:

The latest Circle-CI build, however, timed out at

 ---> 3753d48ddd74
Step 68/76 : RUN if [ x"$ARTIFACT_BASE" != x"source-clean" ]; then         mkdir -p $HOME/patch         && find . -type f > $HOME/make-fast-rebuild-clean.manifest         && cat $HOME/make-fast-rebuild-clean.manifest $HOME/artifact-base.manifest | sort | uniq -u > $HOME/obsolete         && find . -type f -cnewer $HOME/artifact-base.manifest > $HOME/modified         && tar -cJf $HOME/patch/modified.tar.xz -T $HOME/modified         && cat $HOME/obsolete $HOME/modified | xz > $HOME/patch/modified.xz         && rm -rf $SAGE_ROOT         && mkdir -p $SAGE_ROOT         && mv $HOME/patch $SAGE_ROOT/;     fi
 ---> Running in ea5237ee3bbd
Build timedout

Yes, build-from-clean times out on CircleCI. That's a known and documented issue.

Is it? I see your note in the Dockerfile about it making the images too big, but not about it timing out. If it's a known issue then shouldn't we just disable it for now?

It's in the CircleCI configuration file:

# To make the build time not too excessive, we seed the build cache with
# sagemath/sagemath-dev:develop. When basic SPKGs change, this does not help much,
# still the full build does usually not exceed CircleCI's limits for open
# source projects (five hours on 2 vCPUs as of early 2018.)
# You might want to try to build locally or with GitLab CI, see
# `.gitlab-ci.yml` for more options.

But it says the wrong thing, ouch.

comment:335 Changed 15 months ago by git

  • Commit changed from 3af7c5006c0a9b3583f2bb2041c815c607b2f3f2 to ac6201a942aa19b6181c463cac8e9588e92b3d8a
  • Status changed from positive_review to needs_review

Branch pushed to git repo; I updated commit sha1 and set ticket back to needs_review. New commits:

ac6201a  Timeouts DO happen on CircleCI

comment:336 in reply to: ↑ 332 ; follow-up: Changed 15 months ago by embray

Replying to saraedum:

Replying to embray:

Replying to saraedum:

The other two runs are one build-from-clean for the develop branch and one build-from-clean for the 8.4.beta1 tag. That's intentional and how GitLab CI works. The rules don't determine that there are several triggers that come from the same commit. CircleCI does that and it is imho very annoying to work around.

As a result, the develop and the 8.4.beta1 tag on docker hub do not come from the same run of "docker build". But since "docker build" should be reproducible anyway (in particular if we build-from-clean) I don't see this as a problem apart from the wasted resources.

I still feel like we ought to be able to work around that somehow. I don't see why we would want to do two complete build-from-cleans of the same exact commit just so they wind up with different tags on Docker Hub.

It's a bit weird, I agree. But that's how people do that on GitLab CI. And usually builds don't take five hours, so it doesn't matter.

Usually...

I know you've thought about this more than I have so I'm probably missing something. But my thinking is that until and unless Sage's "develop" branch is decoupled from its tags (something I think should happen but is not the case for now), we should edit the .gitlab-ci.yml and .circleci/config.yml to really only do build-from-clean on the develop branch (not even master). If we want to then tag an image with a version number, we can take the current commit ID on develop, do git describe --tags <commit id>, and get the associated tag name. Then use that to tag the image.

This would break if the branch and the tag are not pushed at the same time. Also, it's reasonable that you don't want to push all tags but just protected tags, e.g., I don't want to push the tags that I just use to check that building from tags actually works.

In short, I see your point but I'd rather leave it the way it is.

Why not build just the tags then?

comment:337 Changed 15 months ago by embray

  • Status changed from needs_review to positive_review

comment:338 Changed 15 months ago by saraedum

Back to positive review. Only a minor documentation change.

So, I would not disable the CircleCI builds as they do not do any harm:

  • We won't configure CircleCI on our sagemath github repository (because it does not work reliably and we are building on GitLab CI anyway), so there are not going to be confusing messages there.
  • If somebody has a clone of our repo and CircleCI enabled, then these builds only happen if they push to master or develop or if they push a tag (something people usually don't do in forks).
  • Finally, it sometimes works. It really seems to depend on how much load is on the actual machine that runs your docker container; I've seen builds complete after 4:59 and sometimes after 4:45. If we dropped a standard SPKG (python 2 eventually?), or if Cython 0.29.0 or the next GCC compiles faster for some reason, the builds are going to work reliably.

That said, #26085 should communicate such things more clearly in the actual CI output.

comment:339 in reply to: ↑ 336 ; follow-up: Changed 15 months ago by saraedum

Replying to embray:

Why not build just the tags then?

I would need some logic that decides whether this tag is a "develop" or a "latest" tag. And maybe even have to decide whether this is really "later" than what's currently there. It's certainly doable but I don't see enough of an advantage really.

comment:340 in reply to: ↑ 339 ; follow-up: Changed 15 months ago by embray

Replying to saraedum:

Replying to embray:

Why not build just the tags then?

I would need some logic that decides whether this tag is a "develop" or a "latest" tag. And maybe even have to decide whether this is really "later" than what's currently there. It's certainly doable but I don't see enough of an advantage really.

I'm not sure what you mean by this. Don't we have to do that anyways, somewhere, based on the version?

comment:341 in reply to: ↑ 340 Changed 15 months ago by saraedum

Replying to embray:

Replying to saraedum:

Replying to embray:

Why not build just the tags then?

I would need some logic that decides whether this tag is a "develop" or a "latest" tag. And maybe even have to decide whether this is really "later" than what's currently there. It's certainly doable but I don't see enough of an advantage really.

I'm not sure what you mean by this. Don't we have to do that anyways, somewhere, based on the version?

No, the master branch pushes to the "latest" tag, the develop branch pushes to the "develop" tag. Or was that not your question, maybe?
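
Concretely, the mapping is just something like the following (a sketch; CI_COMMIT_REF_NAME is GitLab CI's built-in name for the branch or git-tag being built, the variable names in our scripts may differ):

case "$CI_COMMIT_REF_NAME" in
  master)  DOCKER_TAG=latest ;;
  develop) DOCKER_TAG=develop ;;
  *)       DOCKER_TAG="$CI_COMMIT_REF_NAME" ;;   # git tags keep their own name, e.g. 8.4.beta1
esac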

comment:342 follow-up: Changed 15 months ago by embray

I just don't see why we should build the exact same thing up to three times every time there's a release. I looked at what happened when 8.3 was released and that added up to more than 15 hours' worth of building (some of it was because push-dockerhub-dev took 2.5 hours??), all of the exact same thing, three times.

I don't know why we need to use the git branch/tag that happens to be being built to determine which docker-tag to push.

Every time 'develop' is updated a tag is created, and every time the tag is for a final release, 'master' also gets updated. I don't like this process or the branch names but for now it is what it is. Thus, it's only necessary to trigger a build-from-clean when a tag is created. With docker you can re-tag the same image as many times as you want and push it to Docker Hub, and that can be done depending on what the version number in the tag is.

So what I'm proposing is to have one build, when a tag is created, with some <version> then,

  1. Tag the image sagemath:<version>; push
  2. If the version ends with .betaN or .rcN, re-tag the image sagemath:develop; push
  3. Else re-tag the image sagemath:latest; push

One build resulting in 2 docker-tags for the same image, each time.

I think, theoretically, the way you do things currently makes sense, and might make sense to revert to at some point, but practically-speaking, right now, I don't see why we should do build-from-clean for anything but tags.
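
In shell terms, roughly (just a sketch of the three steps above; $IMAGE and $VERSION stand for whatever the build script already has at hand):

docker tag "$IMAGE" "sagemath/sagemath:$VERSION"
docker push "sagemath/sagemath:$VERSION"
case "$VERSION" in
  *.beta*|*.rc*) extra=develop ;;
  *)             extra=latest ;;
esac
docker tag "$IMAGE" "sagemath/sagemath:$extra"
docker push "sagemath/sagemath:$extra"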

Last edited 15 months ago by embray (previous) (diff)

comment:343 in reply to: ↑ 342 Changed 15 months ago by saraedum

Replying to embray:

So what I'm proposing is to have one build, when a tag is created, with some <version> then,

  1. Tag the image sagemath:<version>; push
  2. If the version ends with .betaN or .rcN, re-tag the image sagemath:develop; push

That's not correct; all tags should go to sagemath:develop.

  3. Else re-tag the image sagemath:latest; push

One build resulting in 2 docker-tags for the same image, each time.

I think, theoretically, the way you do things currently makes sense, and might make sense to revert to at some point, but practically-speaking, right now, I don't see why we should do build-from-clean for anything but tags.

Is it really an issue? It's a couple of hours of build time every week, but it would be hardcoding knowledge about our release process, which I don't like. Also, I'd like to add the release manager's integration branch to the list of build-from-clean builds at some point (though those should maybe not go to Docker Hub), and then we'd need that one again anyway.

comment:344 Changed 15 months ago by saraedum

Also, there are some annoying races here (the following situation has happened to me when we worked with preemptible runners a lot). With your proposed model the following could happen:

  1. Create git-tag 9.0.
  2. The CI fails for some reason, say a timeout pushing to docker hub.
  3. Create git-tag 9.1.beta0
  4. The CI pushes to sagemath:9.1.beta0 and sagemath:develop
  5. Somebody realizes that the sagemath:9.0 build is missing and presses "retry" in the CI.
  6. The CI pushes to sagemath:9.0, sagemath:latest, and sagemath:develop. Now sagemath:develop is not the latest beta.

This can't happen if you have develop branch → develop docker-tag; git-tag → docker-tag.

comment:345 Changed 15 months ago by vbraun

  • Branch changed from u/saraedum/24655 to ac6201a942aa19b6181c463cac8e9588e92b3d8a
  • Resolution set to fixed
  • Status changed from positive_review to closed

comment:346 Changed 15 months ago by gh-timokau

  • Commit ac6201a942aa19b6181c463cac8e9588e92b3d8a deleted

Somehow 7d85dc796 ("Something related to the sphinxbuild seems to be leaking memory") causes the docbuild to segfault on nix. Since the docbuild is running on 4 cores, as far as I can see the commit shouldn't actually change anything. Any idea?

[algebras ] loading pickled environment... not yet created
[algebras ] building [inventory]: targets for 67 source files that are out of date
[algebras ] updating environment: 67 added, 0 changed, 0 removed
[algebras ] The HTML pages are in share/doc/sage/inventory/en/reference/algebras.
Build finished. The built documents can be found in /build/share/doc/sage/inventory/en/reference/algebras
[combinat ] loading pickled environment... not yet created
[combinat ] building [inventory]: targets for 367 source files that are out of date
[combinat ] updating environment: 367 added, 0 changed, 0 removed
Error building the documentation.
Traceback (most recent call last):
  File "/nix/store/9lvgqr20r7j7b6a4fmhw6n82spd9rafq-python-2.7.15-env/lib/python2.7/runpy.py", line 174, in _run_module_as_main
    "__main__", fname, loader, pkg_name)
  File "/nix/store/9lvgqr20r7j7b6a4fmhw6n82spd9rafq-python-2.7.15-env/lib/python2.7/runpy.py", line 72, in _run_code
    exec code in run_globals
  File "/nix/store/yyarqmdxj1dmp65qv7mwm3sfrdlwmg8v-python2.7-sagelib-8.4.beta2/lib/python2.7/site-packages/sage_setup/docbuild/__main__.py", line 2, in <module>
    main()
  File "/nix/store/yyarqmdxj1dmp65qv7mwm3sfrdlwmg8v-python2.7-sagelib-8.4.beta2/lib/python2.7/site-packages/sage_setup/docbuild/__init__.py", line 1712, in main
    builder()
  File "/nix/store/yyarqmdxj1dmp65qv7mwm3sfrdlwmg8v-python2.7-sagelib-8.4.beta2/lib/python2.7/site-packages/sage_setup/docbuild/__init__.py", line 334, in _wrapper
    getattr(get_builder(document), 'inventory')(*args, **kwds)
  File "/nix/store/yyarqmdxj1dmp65qv7mwm3sfrdlwmg8v-python2.7-sagelib-8.4.beta2/lib/python2.7/site-packages/sage_setup/docbuild/__init__.py", line 529, in _wrapper
    build_many(build_ref_doc, L)
  File "/nix/store/yyarqmdxj1dmp65qv7mwm3sfrdlwmg8v-python2.7-sagelib-8.4.beta2/lib/python2.7/site-packages/sage_setup/docbuild/__init__.py", line 283, in build_many
    ret = x.get(99999)
  File "/nix/store/9lvgqr20r7j7b6a4fmhw6n82spd9rafq-python-2.7.15-env/lib/python2.7/multiprocessing/pool.py", line 572, in get
    raise self._value
Exception: ('Non-exception during docbuild: Segmentation fault', SignalError('Segmentation fault',))

comment:347 follow-up: Changed 15 months ago by saraedum

I agree that it shouldn't change anything. Can you get more details on the segfault in sphinx?

comment:348 in reply to: ↑ 347 Changed 15 months ago by gh-timokau

Replying to saraedum:

I agree that it shouldn't change anything. Can you get more details on the segfault in sphinx?

Yet for some reason the documentation built fine on 8.2.beta1 and still builds if I revert that commit. Really weird.

I don't know; what kind of detail are you looking for? Do you happen to know how I can convince python/sphinx to give me a useful stacktrace?

comment:349 Changed 13 months ago by jdemeyer

What's the motivation for all these changes to the docbuilder?

Please see https://groups.google.com/forum/#!topic/sage-packaging/VU4h8IWGFLA

comment:350 follow-up: Changed 13 months ago by saraedum

We were running out of RAM on CI machines while building the docs IIRC.

comment:351 in reply to: ↑ 350 ; follow-up: Changed 13 months ago by jdemeyer

Replying to saraedum:

We were running out of RAM on CI machines while building the docs IIRC.

Is that related to logging somehow? See also #26667.

comment:352 Changed 13 months ago by jdemeyer

Seriously, yet another thing in Sage broken by this ticket...

comment:353 Changed 13 months ago by embray

That's not really helpful.

comment:354 in reply to: ↑ 351 Changed 13 months ago by saraedum

Replying to jdemeyer:

Replying to saraedum:

We were running out of RAM on CI machines while building the docs IIRC.

Is that related to logging somehow? See also #26667.

No, the logging issues made us exceed stdout limits on CI systems. We are now truncating the output there. It would be nice if we could try to keep the output to stdout as small as reasonably possible, as this makes the CI output much easier to work with, but I see that this one was too extreme.
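
(The truncation is nothing fancy, essentially just capping what a step may write to stdout, along these lines; the script name and size limit are only illustrative, not the actual code:)

./build-step.sh 2>&1 | tail -c 1000000   # keep only the last ~1MB of the step's output on stdout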

comment:355 Changed 13 months ago by embray

I don't think it was "too extreme". I think, perhaps more reasonably, there should be an option for the verbosity level. My general preference is either zero output or a small status/progress message until and unless something goes wrong. Detailed logs can go to files, which are more useful to have for debugging anyway (on pipelines the only inconvenience is that I can't see the log "at-a-glance" on the web UI, but we have artifact downloads for that...)

comment:356 Changed 13 months ago by gh-timokau

I like the idea of just logging the more verbose stuff into a file. It should be printed to stdout on failure though; otherwise debugging a CI failure is annoying.

comment:357 Changed 13 months ago by mkoeppe

You probably know about this, but try "make V=0" in the sage build. Everything will go into logs only and you can print them on error.
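
In a CI job that could look roughly like this (the log path is just an example; exact paths depend on the build):

make V=0 || { tail -n 200 logs/pkgs/*.log; exit 1; }   # quiet build; dump the package logs only if it failed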
