#24655 closed enhancement (fixed)
Automatically build docker images with CircleCI/GitLab CI
Reported by: | saraedum | Owned by: | |
---|---|---|---|
Priority: | minor | Milestone: | sage-8.3 |
Component: | distribution | Keywords: | docker, CI |
Cc: | roed, embray, nthiery, mkoeppe, jdemeyer, hivert | Merged in: | |
Authors: | Julian Rüth | Reviewers: | Erik Bray |
Report Upstream: | N/A | Work issues: | |
Branch: | ac6201a (Commits) | Commit: | |
Dependencies: | #25161, #25160 | Stopgaps: |
Description (last modified by )
It would be nice to update our docker images automatically through continuous integration services. Of course it's good to have these images up-to-date without manual intervention, but this is also convenient as a starting point for people who want to use CI for their own branches of Sage (besides the patchbot).¹
This ticket proposes recipes for GitLab CI and CircleCI to build our docker images automatically. On the respective websites, the CI can be configured to push automatically to the Docker Hub. A webhook (on github) updates the README on Docker Hub automatically.
I implemented this for both GitLab CI and CircleCI. I think GitLab CI is more relevant in the long run, as it's open source and people can provision their own machines as runners. CircleCI, at the same time, works out of the box for Sage without private test runners, and it also allows for easier debugging as you can log on to the machine running your tests with SSH. I tried to share most code between the two implementations.
See also https://github.com/sagemath/docker-images/issues/13 and https://github.com/sagemath/sage-binder-env/issues/3 for a followup (automatically provide jupyter notebooks for easier review.)
Here are some numbers and screenshots (click on the screenshots to go to the real pages):
GitLab CI
If I provision my own runner from Google Cloud with two threads, it takes about 5 hours to build Sage from scratch, run rudimentary tests on the docker images, and upload them to Docker Hub and GitLab's registry.
Recycling the build artifacts from the last run on the develop branch brings this down to about ?? minutes (on GitLab's free shared runners with two threads.) This roughly breaks down as:
- 32 minutes for `build-from-latest`:
  - 10 minutes for the actual build (most of which is spent in the docbuild; caused by a known Sphinx bug to some extent)
  - ?? minutes spent pulling the sagemath-dev image from Docker Hub (this usually goes away if you provision your own runners and expose the host's docker daemon by setting `DOCKER_HOST`.)
  - a few minutes running through all the fast stages of the Dockerfile.
  - a few minutes to push the resulting images to GitLab's registry. (Using GitLab's `cache`, this could probably be improved, at least for runners that we provision ourselves.)
- 5 - 15 minutes for each test (run in parallel); the relevant test is `test-dev.sh`, which spends 6 minutes in the actual docbuild (just as in `build-from-latest`) and some 5 minutes pulling the sagemath-dev image from the GitLab registry. (That part should go away with a provisioned runner that exposes the host's docker daemon.)
- ?? minutes for publishing to Docker Hub, roughly half of which is spent pulling the images from the GitLab registry and the other half pushing them to Docker Hub. (Again, exposing the host's docker daemon would probably cut that time in half.)
With some tricks we could probably bring this down to 25 minutes (see CircleCI below), but we won't get below that without giving up on the CI being split into different stages (as is necessary for technical reasons on CircleCI). To go well below that, we would need to pull binary packages from somewhere…I don't see a sustainable way of doing this with the current SPKG system.
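For anyone who wants to reproduce the incremental ("build-from-latest") behaviour locally, here is a minimal sketch. It assumes the Dockerfile lives at docker/Dockerfile and accepts an ARTIFACT_BASE build argument naming the image whose build artifacts are recycled; the exact argument name and the sagemath/sagemath-dev:develop tag are assumptions based on the discussion in this ticket, not verified names.

```sh
# Hypothetical local equivalent of build-from-latest: recycle the build
# artifacts of a published development image instead of building from scratch.
docker build -f docker/Dockerfile \
    --build-arg ARTIFACT_BASE=sagemath/sagemath-dev:develop \
    -t sagemath-local .
```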
CircleCI
It typically takes almost 5 hours to build Sage from scratch on CircleCI, run rudimentary tests on the docker images, and upload them to Docker Hub.
Recycling the build artifacts from the last run on the develop branch brings this down to about 30 minutes usually. 5 minutes could be saved by not testing the sagemath-dev image, and probably another minute or two if we did not build it at all. To go significantly below 15 minutes is probably hard with the huge sage-the-distribution (7GB uncompressed/2GB compressed) that we have to pull every time at the moment.
Docker Hub
A push to github updates the README on the Docker Hub page. The current sizes are shown by MicroBadger image-size badges (not reproduced here); unfortunately MicroBadger is somewhat unstable so these numbers are incorrectly reported as 0 sometimes.
Here are some things that we need to test before merging this:
- [x] build-from-clean works in the sagemath namespace, building a tag on GitLab, https://gitlab.com/saraedum/sage/pipelines/25831229
- [x] build-from-clean works in the sagemath namespace, building from develop on GitLab, https://gitlab.com/saraedum/sage/pipelines/25831675
- [x] build-from-clean works in a user namespace on CircleCI, https://circleci.com/workflow-run/4ae6af8c-2212-4724-a865-a401be4bd8b7; this does not work reliably as it often times out after 5 hours. If we can manage to use more packages from the system, then we should be able to move this below CircleCI's time limit.
- [x] build-from-latest works and is fast in a user namespace on GitLab, https://gitlab.com/saraedum/sage/pipelines/25894653
- [x] build-from-latest works and is fast in a user namespace on CircleCI, https://circleci.com/workflow-run/5bad5fe0-f817-4174-b0b4-de7d1be3b01c
After this ticket has been merged, the following steps are necessary:
- Set up an account for sagemath on CircleCI.
- Add Docker Hub credentials on CircleCI or GitLab.
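For reference, the Docker Hub push that these credentials enable boils down to something like the following sketch; the variable names DOCKER_USER and DOCKER_PASS are placeholders for whatever names the CI configuration ends up using, and the tag is illustrative.

```sh
# Illustration only: log in with credentials injected as CI variables
# (names are placeholders) and push the built image to Docker Hub.
printf '%s' "$DOCKER_PASS" | docker login -u "$DOCKER_USER" --password-stdin
docker push sagemath/sagemath:latest
```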
To see a demo of what the result looks like, go to https://hub.docker.com/r/sagemath/sagemath/. The CircleCI runs can be seen here https://circleci.com/gh/saraedum/sage, and the GitLab CI runs are here https://gitlab.com/saraedum/sage/pipelines.
¹: I want to run unit tests of an external Sage package, https://github.com/swewers/MCLF. Being able to build a custom docker image which contains some not-yet-merged tickets makes this much easier.
PS: Long-term one could imagine this to be the first step to replace the patchbot with a solution that we do not have to maintain so much ourselves, such as gitlab-runners. This is of course outside of the scope of this ticket but having a bunch of working CI files in our repository might inspire people to script some other tasks in a reproducible and standardized way.
Attachments (5)
Change History (362)
comment:1 Changed 3 years ago by
- Cc roed embray nthiery added
- Description modified (diff)
comment:2 Changed 3 years ago by
- Branch set to u/saraedum/gitlabci
comment:3 Changed 3 years ago by
- Commit set to 91e4ed1fa743786c25b505859faefa516f1e6983
- Description modified (diff)
comment:4 Changed 3 years ago by
I haven't played much with GitLab CI and CircleCI, so can't really comment on the details for now. But +1 for the general idea. That would be very helpful.
comment:5 Changed 3 years ago by
- Cc mkoeppe added
Adding Matthias who did setup continuous integration for various Sage packages (including https://github.com/sagemath/sage_sample).
comment:6 Changed 3 years ago by
- Commit changed from 91e4ed1fa743786c25b505859faefa516f1e6983 to 18970ac8cce0c0ced6f5efffb8a63699a18c827e
Branch pushed to git repo; I updated commit sha1. This was a forced push. New commits:
18970ac | Enable GitLab CI and CircleCI
comment:7 Changed 3 years ago by
- Commit changed from 18970ac8cce0c0ced6f5efffb8a63699a18c827e to 3c855571a4a17738116011cdf9962f675248f129
comment:8 Changed 3 years ago by
- Commit changed from 3c855571a4a17738116011cdf9962f675248f129 to 5f1a0e634a37bfc7b42a31a52a1f3b1e76c27470
Branch pushed to git repo; I updated commit sha1. This was a forced push. New commits:
5f1a0e6 | Build docker images automatically
comment:9 Changed 3 years ago by
- Commit changed from 5f1a0e634a37bfc7b42a31a52a1f3b1e76c27470 to 69752edeec2006713580d5c20a3c95db03636c43
Branch pushed to git repo; I updated commit sha1. New commits:
69752ed | Fix script name
comment:10 Changed 3 years ago by
- Commit changed from 69752edeec2006713580d5c20a3c95db03636c43 to 6642d994bbe0d8552578d2902f91b7d03fb37938
Branch pushed to git repo; I updated commit sha1. New commits:
c0514d5 | Comment on limitations on CircleCI
1035022 | Set workdir to the Sage directory
4ccc256 | Fix entrypoints
84c12ed | Simplify push/pull scripts
202273c | A single entrypoint does not need a directory on its own
a8fcb94 | Add tests
cfe2d14 | Fix CircleCI config
6642d99 | Speed up my build experiments
comment:11 Changed 3 years ago by
- Commit changed from 6642d994bbe0d8552578d2902f91b7d03fb37938 to 062d37bec3145981b4ee6b1b1b74b1943c348520
Branch pushed to git repo; I updated commit sha1. New commits:
4e179cc | no TTY/STDIN for automated tests
a5b2749 | Fix names of docker containers for GitLab CI
295ffd8 | Trying to make tests POSIX compliant
e9bacb8 | Fixing test for CI
1b72d5b | Update docker/README to incorporate latest changes
062d37b | Add debug output
comment:12 Changed 3 years ago by
- Commit changed from 062d37bec3145981b4ee6b1b1b74b1943c348520 to e4010d8f91082ac1b4d9c2cd2955edda30c638f8
Branch pushed to git repo; I updated commit sha1. This was a forced push. New commits:
4c023d5 | Enable CI backed by GitLab CI and CircleCI
6ccbaa3 | Be more agressive about requiring the latest develop
7f90d86 | be more precise about failed latest merge
8efe3a9 | more and more specific badges
15f3ae7 | fix "build your own" instructions
595be9c | POSIX scripts
e4010d8 | Fix branding of CI files
comment:13 Changed 3 years ago by
- Description modified (diff)
comment:14 Changed 3 years ago by
- Description modified (diff)
- Status changed from new to needs_review
comment:15 Changed 3 years ago by
- Description modified (diff)
comment:16 Changed 3 years ago by
- Commit changed from e4010d8f91082ac1b4d9c2cd2955edda30c638f8 to 10d2093e21b637f31f2f6fef0a994dd14f3cd48f
Branch pushed to git repo; I updated commit sha1. New commits:
10d2093 | Comment on timings and possible speedups
comment:17 Changed 3 years ago by
- Description modified (diff)
comment:18 Changed 3 years ago by
- Commit changed from 10d2093e21b637f31f2f6fef0a994dd14f3cd48f to d99696262a3fd3d50e4ad46bb6c768266dd76cd0
comment:19 Changed 3 years ago by
- Commit changed from d99696262a3fd3d50e4ad46bb6c768266dd76cd0 to 01e3fc95ea7418371ba8a67d47ca80762f59ef3f
Branch pushed to git repo; I updated commit sha1. Last 10 new commits:
744cc24 | Merge remote-tracking branch 'gitlab/binder' into binder
93696a6 | git needs ssh to push
ae0feb4 | ssh is called openssh
6c2b29c | Record github's SSH key
062c40b | Merge branch 'ci' into binder
da69969 | Allow user to override ARTIFACT_BASE on CircleCI
7a06f9a | Create a directory for known_hosts
83a3593 | Create environment for binder
0f4796c | Merge branch 'ci' into binder
01e3fc9 | Merge branch 'binder' into HEAD
comment:20 Changed 3 years ago by
- Commit changed from 01e3fc95ea7418371ba8a67d47ca80762f59ef3f to d99696262a3fd3d50e4ad46bb6c768266dd76cd0
Branch pushed to git repo; I updated commit sha1. This was a forced push. New commits:
comment:21 Changed 3 years ago by
- Commit changed from d99696262a3fd3d50e4ad46bb6c768266dd76cd0 to 069e722679c5f5e612a7214e51d98cf89b07d137
comment:22 Changed 3 years ago by
- Description modified (diff)
comment:23 Changed 3 years ago by
- Work issues set to run doctests
Maybe my microrelease image is missing something essential? I get a lot of inspection related doctest errors: https://gitlab.com/saraedum/sage/-/jobs/54609045
comment:24 Changed 3 years ago by
- Work issues changed from run doctests to fix doctests
sage -t src/doc/common/conf.py  # 1 doctest failed
sage -t src/sage/arith/long.pxd  # 9 doctests failed
sage -t src/sage/cpython/cython_metaclass.pyx  # 4 doctests failed
sage -t src/sage/cpython/getattr.pyx  # 4 doctests failed
sage -t src/sage/cpython/wrapperdescr.pxd  # 6 doctests failed
sage -t src/sage/docs/instancedoc.pyx  # 4 doctests failed
sage -t src/sage/ext/cdefs.pxi  # 1 doctest failed
sage -t src/sage/ext/memory_allocator.pyx  # 2 doctests failed
sage -t src/sage/ext/stdsage.pxi  # 1 doctest failed
sage -t src/sage/libs/gap/util.pyx  # 2 doctests failed
sage -t src/sage/libs/glpk/error.pyx  # 1 doctest failed
sage -t src/sage/misc/cachefunc.pyx  # 54 doctests failed
sage -t src/sage/misc/cython.py  # 17 doctests failed
sage -t src/sage/misc/cython_c.pyx  # 1 doctest failed
sage -t src/sage/misc/inherit_comparison.pyx  # 5 doctests failed
sage -t src/sage/misc/lazy_attribute.pyx  # 3 doctests failed
sage -t src/sage/misc/nested_class.pyx  # 6 doctests failed
sage -t src/sage/misc/sagedoc.py  # 6 doctests failed
sage -t src/sage/misc/sageinspect.py  # 31 doctests failed
sage -t src/sage/misc/session.pyx  # 2 doctests failed
sage -t src/sage/misc/superseded.py  # 2 doctests failed
sage -t src/sage/parallel/decorate.py  # 2 doctests failed
sage -t src/sage/repl/ipython_extension.py  # 1 doctest failed
sage -t src/sage/repl/load.py  # 2 doctests failed
sage -t src/sage/rings/integer_fake.pxd  # 1 doctest failed
sage -t src/sage/structure/element.pxd  # 1 doctest failed
sage -t src/sage/structure/element.pyx  # 38 doctests failed
sage -t src/sage/structure/factory.pyx  # 8 doctests failed
sage -t src/sage/tests/cmdline.py  # 4 doctests failed
comment:25 Changed 3 years ago by
Making good progress…
sage -t src/doc/common/conf.py  # 1 doctest failed
sage -t src/sage/misc/cython.py  # 5 doctests failed
sage -t src/sage/misc/sagedoc.py  # 4 doctests failed
comment:26 Changed 3 years ago by
- Commit changed from 069e722679c5f5e612a7214e51d98cf89b07d137 to 7ad5449612c66cac78312e703ddabbd3c49027ec
Branch pushed to git repo; I updated commit sha1. Last 10 new commits:
2fc2da6 | Add rdfind properly to the dependencies
e4d5e4e | fix doctest errors
8ce8f41 | add more debug info about the host we are running on
ce23502 | Merge branch 'ci' of gitlab.com:saraedum/sage into ci
bbe279e | same log info on CircleCI
9aaca79 | sh commands in Makefile do not modify the cwd
fa93281 | Fix introspection
5e38736 | Make tests requiring documentation optional
fb22aa2 | Fix cython compilation errors during doctesting
7ad5449 | Merge remote-tracking branch 'trac/develop' into ci
comment:27 Changed 3 years ago by
Julian walked me through this last week and it is very good stuff and I'm +1 on it.
My one concern is that by including the CI configuration files directly in the main Sage repo, it may be difficult to update things quickly if/when they break. In the long term I'd much prefer it if Sage's 'master' branch were used more like other projects typically do, as a fast-moving development branch where we can easily push changes quickly if need be, particularly for the CI config, for which perhaps we could make a special dispensation.
In the meantime, perhaps the CI builds should be based on a "ci" branch of Sage which, most of the time, would just be kept up to date with `develop`, but that we could also commit to more frequently if needed.
Is this still needs_review, or should it be needs_work?
comment:28 Changed 3 years ago by
- Commit changed from 7ad5449612c66cac78312e703ddabbd3c49027ec to 4818f9bc54c2d71c12097f5b0ff10335718429e0
comment:29 follow-up: ↓ 38 Changed 3 years ago by
- Work issues changed from fix doctests to make doc takes a bit too much time in sagemath-dev, make incremental build work on CircleCI
I see your concerns but I think it has to be in the main branch to be useful. If we have it on a separate branch, how would I use it if I want it on my branch? Merge it in and then in the end force push a version without CI?
This actually needs review. I am tweaking this in a few places to make it run faster and produce smaller images but I am not planning on making any substantial changes anymore.
comment:30 Changed 3 years ago by
- Work issues changed from make doc takes a bit too much time in sagemath-dev, make incremental build work on CircleCI to make doc takes a bit too much time in sagemath-dev, make incremental build work on CircleCI, update timings in ticket description
comment:31 Changed 3 years ago by
- Commit changed from 4818f9bc54c2d71c12097f5b0ff10335718429e0 to 96a20eaa0cef2ce4a26867b86d16f1b813ec2f18
comment:32 Changed 3 years ago by
- Work issues changed from make doc takes a bit too much time in sagemath-dev, make incremental build work on CircleCI, update timings in ticket description to update timings in ticket description
comment:33 Changed 3 years ago by
- Work issues changed from update timings in ticket description to update timings in ticket description, explain how the incremental build works
comment:34 Changed 3 years ago by
- Commit changed from 96a20eaa0cef2ce4a26867b86d16f1b813ec2f18 to 3f9a0193369fe11dbe194a302e444b825a1de4a9
Branch pushed to git repo; I updated commit sha1. New commits:
3f9a019 | minor documentation improvements
comment:35 Changed 3 years ago by
- Description modified (diff)
- Work issues update timings in ticket description, explain how the incremental build works deleted
comment:36 Changed 3 years ago by
- Commit changed from 3f9a0193369fe11dbe194a302e444b825a1de4a9 to 2a57dd007a662286ec6de6c88770c67f72ed102b
Branch pushed to git repo; I updated commit sha1. New commits:
2a57dd0 | update timings
comment:37 Changed 3 years ago by
- Description modified (diff)
comment:38 in reply to: ↑ 29 ; follow-up: ↓ 41 Changed 3 years ago by
Replying to saraedum:
I see your concerns but I think it has to be in the main branch to be useful. If we have it on a separate branch, how would I use it if I want it on my branch? Merge it in and then in the end force push a version without CI?
My point was that Sage's "develop" branch moves at a slow pace as it currently does, but we have a separate branch (synced regularly to the current "develop") that is allowed to move faster, more like how most projects use "master".
So it wouldn't just be for continuous integration. That is, the CI stuff would still be in the main branch, but we would have a separate development branch from "develop" that can be a bit more agile, accept merge requests, etc.
comment:39 follow-up: ↓ 42 Changed 3 years ago by
Makefile

diff --git a/Makefile b/Makefile
index 9245745..468302b 100644
@@ misc-clean:
 	rm -f build/make/Makefile build/make/Makefile-auto
 	rm -f .BUILDSTART

+# TODO: What is a "bdist"? A binary distribution?
 bdist-clean: clean
 	$(MAKE) misc-clean

Yes--this just does `misc-clean` but also cleans up `$SAGE_LOCAL/var/tmp/sage/build`. I think you can just remove this TODO note unless you had some specific intentions here.
comment:40 follow-up: ↓ 44 Changed 3 years ago by
+ @# We need src/sage/, src/doc/common, src/doc/en/introspect for introspection with "??"
I believe since #24690 this is no longer true, at least with regard to `src/sage`. Actually you might be able to get away with completely deleting `src/sage` since it's basically duplicated in `$SAGE_LOCAL/lib/python2.7/site-packages/sage`. Though it's possible there's some small tests that will still fail without it--worth a try.
That said, it might not be much of an issue if the `rdfind` stuff works.
comment:41 in reply to: ↑ 38 Changed 3 years ago by
Replying to embray:
Replying to saraedum:
I see your concerns but I think it has to be in the main branch to be useful. If we have it on a separate branch, how would I use it if I want it on my branch? Merge it in and then in the end force push a version without CI?
My point was that Sage's "develop" branch moves at a slow pace as it currently does, but we have a separate branch (synced regularly to the current "develop") that is allowed to move faster, more like how most projects use "master".
So it wouldn't just be for continuous integration. That is, the CI stuff would still be in the main branch, but we would have a separate development branch from "develop" that can be a bit more agile, accept merge requests, etc.
Ok. I am not exactly sure what you are proposing with regards to this ticket. I would not mind a develop branch that moves at a faster pace however.
comment:42 in reply to: ↑ 39 ; follow-up: ↓ 43 Changed 3 years ago by
Replying to embray:
Makefile

diff --git a/Makefile b/Makefile
index 9245745..468302b 100644
@@ misc-clean:
 	rm -f build/make/Makefile build/make/Makefile-auto
 	rm -f .BUILDSTART

+# TODO: What is a "bdist"? A binary distribution?
 bdist-clean: clean
 	$(MAKE) misc-clean

Yes--this just does `misc-clean` but also cleans up `$SAGE_LOCAL/var/tmp/sage/build`. I think you can just remove this TODO note unless you had some specific intentions here.
I would like to document what this is good for. So…what is the use case for this target?
comment:43 in reply to: ↑ 42 Changed 3 years ago by
Replying to saraedum:
I would like to document what this is good for. So…what is the use case for this target?
It seems clear to me that it cleans up all build artifacts in preparation for a binary release. But you might want to ask Volker.
comment:44 in reply to: ↑ 40 Changed 3 years ago by
- Work issues set to document bdist target, clarify why we keep src/ in micro_release
Replying to embray:
+ @# We need src/sage/, src/doc/common, src/doc/en/introspect for introspection with "??"
I believe since #24690 this is no longer true, at least with regard to `src/sage`. Actually you might be able to get away with completely deleting `src/sage` since it's basically duplicated in `$SAGE_LOCAL/lib/python2.7/site-packages/sage`. Though it's possible there's some small tests that will still fail without it--worth a try.
That said, it might not be much of an issue if the `rdfind` stuff works.
It does not make a real difference with rdfind. Yes, it really seems that it is not needed for introspection anymore. I'll adapt the comment.
comment:45 Changed 3 years ago by
Is somebody using `micro_release` at the moment? It is way more aggressive now, so this might break some workflows. Who should I ask about this? … I'll write an email to sage-devel.
comment:46 Changed 3 years ago by
- Commit changed from 2a57dd007a662286ec6de6c88770c67f72ed102b to 28445f7107762401af831bf3e018fbbfdf6e6fd3
Branch pushed to git repo; I updated commit sha1. New commits:
28445f7 | Fix comments in Makefile
comment:47 Changed 3 years ago by
- Work issues document bdist target, clarify why we keep src/ in micro_release deleted
comment:48 Changed 3 years ago by
- Commit changed from 28445f7107762401af831bf3e018fbbfdf6e6fd3 to e0f56c4ca8cc8a9a0f2cc802625986cb34741cae
Branch pushed to git repo; I updated commit sha1. New commits:
e0f56c4 | Merge remote-tracking branch 'trac/develop' into t/24655/gitlabci
comment:49 Changed 3 years ago by
I fixed the conflicts with develop. This needs review again.
comment:50 follow-up: ↓ 58 Changed 3 years ago by
Maybe just a minor nitpick, but I noticed that you have all these shared scripts like `describe-system.sh` and so on, and that in both the circle-ci and gitlab configurations you source a lot of these scripts instead of just running them. A few of them export some environment variables, so in that case the sourcing makes sense, but why for the others, that don't modify the environment?
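To make the distinction being discussed concrete: sourcing a script runs it in the current shell, so exported variables persist afterwards, while executing it runs a child process whose environment changes are lost. A small self-contained sketch (the helper file and variable name are made up for illustration):

```sh
# Create a tiny helper that exports a variable (illustration only).
printf 'export NTHREADS=2\n' > setup-env.sh

. ./setup-env.sh   # sourced: runs in the current shell, so NTHREADS persists
echo "$NTHREADS"   # prints 2

sh ./setup-env.sh  # executed: runs in a child shell, so nothing persists here
```

Scripts that only print information (like describe-system.sh) gain nothing from being sourced, which is the point of the question above.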
comment:51 follow-up: ↓ 59 Changed 3 years ago by
Perhaps you can clarify something for me. I'm reading the documentation at https://docs.gitlab.com/ce/ci/docker/using_docker_build.html and at https://docs.gitlab.com/ce/ci/docker/using_docker_images.html (just to better understand some aspects of this), and the documentation starts out saying "Install GitLab Runner, etc...". But just to be clear, this will work on GitLab-provided runners as well, right? We don't necessarily have to be running our own in order for docker-in-docker to work? (Not that we wouldn't want to be running our own runners anyways, but it would be disconcerting if that were necessary for this configuration to work properly at all).
Relatedly, I don't understand this comment at all:
# We use docker-in-docker to build our docker images. It can be faster to
# expose your outer docker daemon by mounting /var/run/docker.sock to
# /var/run/docker.sock and setting DOCKER_HOST in Settings -> CI/CD -> Secret
# variable to unix:///var/run/docker.sock

Why would you set `DOCKER_HOST` as a secret?
Forgive me if these are annoying questions--I'm just trying to understand how this all works so that I can continue to help with it in the future.
comment:52 follow-up: ↓ 60 Changed 3 years ago by
I don't understand why you made this change:
deleted file src/.gitignore

diff --git a/src/.gitignore b/src/.gitignore
deleted file mode 100644
index e85aa7f..00000000
-/.cython_version
-/build
-/Makefile
-/bin/sage-env-config

You simultaneously added the same ignores to the top-level `.gitignore` but it's not clear to me why. I think it's pretty normal to have directory-specific `.gitignore`s and I think that one was fine as it was, unless there was some specific reason.
comment:53 follow-up: ↓ 61 Changed 3 years ago by
One more annoying nitpick, but in `docker/README.md` can we wrap the lines, please?
comment:54 follow-up: ↓ 62 Changed 3 years ago by
It looks like in the Dockerfile you've done away with the build for the `sagemath-jupyter` image. I've always been somewhat split on my feelings about it, but I think in general it has been useful. It's nice having an image whose default command is just the correct thing to start up the Notebook, because I've had users mess that up before and get confused (e.g. you have to make sure to pass `--ip='*'` and get the quoting right as well).
On a similar note, I was confused by the documentation for `.ci/test-jupyter.sh` where it reads:

# Usage: ./test-jupyter.sh sage-jupyter-image [host]

This gave me the impression that it should be used verbatim like `./test-jupyter.sh sage-jupyter-image`--as if there were actually an image called "sage-jupyter-image". This might be clearer if it were documented in a more GNU-like style just as `./test-jupyter.sh IMAGE-NAME [HOST]`, making it clear that the first argument is the name (or hash for that matter) of some Docker image, not necessarily a specific image.
I haven't looked at how all the other scripts are documented yet, but I would do them similarly.
comment:55 follow-up: ↓ 63 Changed 3 years ago by
Since we already use the name `sagemath-develop` on DockerHub for the development images, could we go ahead and do the same here (so we don't end up with `sagemath-develop` and `sagemath-dev`)? I don't see the advantage of dropping four letters here, unless for some reason it would be really annoying to change this name at this point?
comment:56 follow-up: ↓ 65 Changed 3 years ago by
(Since I'm asking for lots of little things, to be clear, I'm happy to make some of the requested changes myself--I just want to clarify some of them with you first.)
comment:57 follow-up: ↓ 64 Changed 3 years ago by
Can we rename the little `commit` stamp file generated by the `Dockerfile` to `.commit`, so it's hidden by default? I can't imagine we'd normally care about seeing that.
comment:58 in reply to: ↑ 50 ; follow-up: ↓ 81 Changed 3 years ago by
Replying to embray:
Maybe just a minor nitpick, but I noticed that you have all these shared scripts like `describe-system.sh` and so on, and that in both the circle-ci and gitlab configurations you source a lot of these scripts instead of just running them. A few of them export some environment variables, so in that case the sourcing makes sense, but why for the others, that don't modify the environment?
Ok. Fixed.
comment:59 in reply to: ↑ 51 ; follow-up: ↓ 70 Changed 3 years ago by
Replying to embray:
Perhaps you can clarify something for me. I'm reading the documentation at https://docs.gitlab.com/ce/ci/docker/using_docker_build.html and at https://docs.gitlab.com/ce/ci/docker/using_docker_images.html (just to better understand some aspects of this), and the documentation starts out saying "Install GitLab Runner, etc...". But just to be clear, this will work on GitLab-provided runners as well, right? We don't necessarily have to be running our own in order for docker-in-docker to work? (Not that we wouldn't want to be running our own runners anyways, but it would be disconcerting if that were necessary for this configuration to work properly at all).
Yes, it works nicely with the provisioned runners out of the box.
Relatedly, I don't understand this comment at all:
# We use docker-in-docker to build our docker images. It can be faster to
# expose your outer docker daemon by mounting /var/run/docker.sock to
# /var/run/docker.sock and setting DOCKER_HOST in Settings -> CI/CD -> Secret
# variable to unix:///var/run/docker.sock

Why would you set `DOCKER_HOST` as a secret?

Environment variables that are injected into the scripts that are described in `.gitlab-ci.yml` are called secrets in GitLab. They are not necessarily "secret". The `docker` docker image detects whether there is a host called `docker` and then assumes that this host is running `docker:dind` (or something similar.) You can disable this behaviour by explicitly setting `DOCKER_HOST`.
Forgive me if these are annoying questions--I'm just trying to understand how this all works so that I can continue to help with it in the future.
No worries at all. This is all a bit confusing.
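To illustrate the alternative mentioned in the explanation above: instead of relying on the docker:dind service, a self-provisioned runner can mount the host's Docker socket into the job container and point DOCKER_HOST at it, so the docker client inside talks to the outer daemon (and reuses its layer cache). Roughly, at the docker level, this amounts to the following sketch; the runner itself is configured through GitLab's runner/CI settings as described above.

```sh
# Rough equivalent of "exposing the host's docker daemon": mount the host's
# socket and point DOCKER_HOST at it, so no docker:dind service is needed.
docker run --rm \
    -v /var/run/docker.sock:/var/run/docker.sock \
    -e DOCKER_HOST=unix:///var/run/docker.sock \
    docker:latest docker info
```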
comment:60 in reply to: ↑ 52 ; follow-up: ↓ 72 Changed 3 years ago by
Replying to embray:
I don't understand why you made this change:
deleted file src/.gitignore

diff --git a/src/.gitignore b/src/.gitignore
deleted file mode 100644
index e85aa7f..00000000
-/.cython_version
-/build
-/Makefile
-/bin/sage-env-config

You simultaneously added the same ignores to the top-level `.gitignore` but it's not clear to me why. I think it's pretty normal to have directory-specific `.gitignore`s and I think that one was fine as it was, unless there was some specific reason.
I want these files to be part of `.dockerignore`. I symlinked `.dockerignore` to `.gitignore` in the root directory. However, docker does not support `.dockerignore` files in subdirectories, so I had to move these rules down.
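For concreteness, the symlink arrangement described here amounts to the following, run from the repository root (a minimal sketch of the same idea, not a command taken from the branch):

```sh
# Reuse the git ignore rules as docker's build-context exclusions instead of
# maintaining a second ignore file.
ln -s .gitignore .dockerignore
```

The trade-off discussed below is exactly this coupling: the two files can never diverge as long as one is a symlink to the other.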
comment:61 in reply to: ↑ 53 ; follow-up: ↓ 71 Changed 3 years ago by
Replying to embray:
One more annoying nitpick, but in `docker/README.md` can we wrap the lines, please?
No :) I added a comment to the README about this.
comment:62 in reply to: ↑ 54 ; follow-up: ↓ 73 Changed 3 years ago by
Replying to embray:
It looks like in the Dockerfile you've done away with the build for the `sagemath-jupyter` image. I've always been somewhat split on my feelings about it, but I think in general it has been useful. It's nice having an image whose default command is just the correct thing to start up the Notebook, because I've had users mess that up before and get confused (e.g. you have to make sure to pass `--ip='*'` and get the quoting right as well).

I had this originally but then removed it. It's a bit annoying to build both images because you need to upload the full image to dockerhub (dockerhub does not detect shared layers between repositories it seems.) The "docker run" string is more complicated anyway, since you need the port forwarding. Do you think what it says in the README is not sufficient for people to just copy and paste it? The command should work on all POSIX shells I think.
On a similar note, I was confused by the documentation for `.ci/test-jupyter.sh` where it reads:

# Usage: ./test-jupyter.sh sage-jupyter-image [host]

This gave me the impression that it should be used verbatim like `./test-jupyter.sh sage-jupyter-image`--as if there were actually an image called "sage-jupyter-image". This might be clearer if it were documented in a more GNU-like style just as `./test-jupyter.sh IMAGE-NAME [HOST]`, making it clear that the first argument is the name (or hash for that matter) of some Docker image, not necessarily a specific image.

I haven't looked at how all the other scripts are documented yet, but I would do them similarly.
Ok. Fixed.
comment:63 in reply to: ↑ 55 ; follow-ups: ↓ 74 ↓ 75 Changed 3 years ago by
Replying to embray:
Since we already use the name `sagemath-develop` on DockerHub for the development images, could we go ahead and do the same here (so we don't end up with `sagemath-develop` and `sagemath-dev`)? I don't see the advantage of dropping four letters here, unless for some reason it would be really annoying to change this name at this point?

If I understood correctly, `sagemath-develop` is referring to Sage's `develop` branch. My thinking was to use something that can not get confused with an existing term in Sage. Also, my `sagemath` images are continuing what the existing `sagemath` images do. But the `sagemath-dev` images are not really related to the `sagemath-develop` images. I am open to replacing `dev` with something else.
comment:64 in reply to: ↑ 57 Changed 3 years ago by
Replying to embray:
Can we rename the little `commit` stamp file generated by the `Dockerfile` to `.commit`, so it's hidden by default? I can't imagine we'd normally care about seeing that.

It's inside the `docker/` directory so you usually won't see it. It's also .gitignore-d (but not .dockerignore-d!). I don't have a very strong opinion on this but I would need to change a bunch of things in my local setup here (not part of this ticket) if I changed this.
comment:65 in reply to: ↑ 56 Changed 3 years ago by
Replying to embray:
(Since I'm asking for lots of little things, to be clear, I'm happy to make some of the requested changes myself--I just want to clarify some of them with you first.)
Your comments are very much appreciated. I think it's easier for me to fix things as I work with these files all the time. But feel free to push your own changes of course.
comment:66 Changed 3 years ago by
- Commit changed from e0f56c4ca8cc8a9a0f2cc802625986cb34741cae to 4ee63131bad067e53df4369dacb30461a58c97fa
comment:67 Changed 3 years ago by
- Reviewers set to Erik Bray
comment:68 Changed 3 years ago by
- Work issues set to check that 4ee63131bad067e53df4369dacb30461a58c97fa still works on CircleCI/GitLab CI
comment:69 follow-up: ↓ 76 Changed 3 years ago by
I am now getting
Step 40/42 : RUN make build
 ---> Running in 9605e7f7ecf4
build/make/Makefile --stop
make: build/make/Makefile: Command not found
Makefile:20: recipe for target 'all-build' failed
Has anything changed in Sage recently that could cause this?
comment:70 in reply to: ↑ 59 ; follow-up: ↓ 84 Changed 3 years ago by
Replying to saraedum:
Replying to embray:
Relatedly, I don't understand this comment at all:
# We use docker-in-docker to build our docker images. It can be faster to
# expose your outer docker daemon by mounting /var/run/docker.sock to
# /var/run/docker.sock and setting DOCKER_HOST in Settings -> CI/CD -> Secret
# variable to unix:///var/run/docker.sock

Why would you set `DOCKER_HOST` as a secret?

Environment variables that are injected into the scripts that are described in `.gitlab-ci.yml` are called secrets in GitLab. They are not necessarily "secret". The `docker` docker image detects whether there is a host called `docker` and then assumes that this host is running `docker:dind` (or something similar.) You can disable this behaviour by explicitly setting `DOCKER_HOST`.

Okay, thanks for clarifying the "secrets" thing. That was definitely confusing me.

I still had to stare at this for a while though to understand what was meant. In particular I didn't understand exactly what `services:` meant (now I do), and it was also unclear to me that `docker:dind` was just the name of a Docker image (where `dind` is the tag indicating the Docker-in-Docker image). I also didn't fully understand the implications of what "Docker-in-Docker" meant. I mean, I've installed the Docker client inside a container before and linked it to my outer Docker engine via a socket before, and I thought that's all that was meant by "Docker-in-Docker".

It's clearer to me now what this means, but maybe this should be documented somewhere more clearly, like in the Readme, instead of burying it here. In fact, is there any advantage to using docker-in-docker specifically over setting `DOCKER_HOST` like you explain here?
comment:71 in reply to: ↑ 61 Changed 3 years ago by
comment:72 in reply to: ↑ 60 ; follow-up: ↓ 85 Changed 3 years ago by
Replying to saraedum:
I want these files to be part of `.dockerignore`. I symlinked `.dockerignore` to `.gitignore` in the root directory. However, docker does not support `.dockerignore` files in subdirectories, so I had to move these rules down.

I see; that's too bad. Something about it doesn't sit right with me. For example I feel like, although it's convenient to symlink `.gitignore` to `.dockerignore` (as there are many patterns in the former we would also want to apply to the latter), I could envision a scenario where we don't want them to be exactly the same.
We can try it for now though, since I don't have an immediate argument against it.
comment:73 in reply to: ↑ 62 ; follow-up: ↓ 86 Changed 3 years ago by
Replying to saraedum:
Replying to embray:
It looks like in the Dockerfile you've done away with the build for the `sagemath-jupyter` image. I've always been somewhat split on my feelings about it, but I think in general it has been useful. It's nice having an image whose default command is just the correct thing to start up the Notebook, because I've had users mess that up before and get confused (e.g. you have to make sure to pass `--ip='*'` and get the quoting right as well).

I had this originally but then removed it. It's a bit annoying to build both images because you need to upload the full image to dockerhub (dockerhub does not detect shared layers between repositories it seems.) The "docker run" string is more complicated anyway, since you need the port forwarding. Do you think what it says in the README is not sufficient for people to just copy and paste it? The command should work on all POSIX shells I think.

Nicolas might be interested in chiming in here, but even as an "expert" I've personally found the `sagemath-jupyter` image to be useful, as a way to quickly fire up a notebook (since I don't use the notebook much personally, but I do sometimes need to test things in it). I don't want to have to look up the correct command line every time I do.

It's weird what you say about Docker having to re-upload the intermediate images. I don't think I've observed that before myself, but maybe I just wasn't paying attention.

I don't think we need to make the `sagemath-jupyter` for every build though--only for releases should be sufficient. So what if we at least kept a build target for it in the Dockerfile (or even in a separate Dockerfile if it's more convenient?) and only build/push those images for final release versions?
comment:74 in reply to: ↑ 63 ; follow-up: ↓ 88 Changed 3 years ago by
Replying to saraedum:
Replying to embray:
Since we already use the name `sagemath-develop` on DockerHub for the development images, could we go ahead and do the same here (so we don't end up with `sagemath-develop` and `sagemath-dev`)? I don't see the advantage of dropping four letters here, unless for some reason it would be really annoying to change this name at this point?

If I understood correctly, `sagemath-develop` is referring to Sage's `develop` branch. My thinking was to use something that can not get confused with an existing term in Sage. Also, my `sagemath` images are continuing what the existing `sagemath` images do. But the `sagemath-dev` images are not really related to the `sagemath-develop` images. I am open to replacing `dev` with something else.

I must be confused about the nomenclature then. I'll re-review everything but I assumed `sagemath-dev` was equivalent to builds from the `develop` branch. Clearly, if I'm wrong, then I must be missing something crucial. If they're not really related though a different name (maybe even a different Docker repository for them--e.g. a `sagemath-dev` repository) might be good.
comment:75 in reply to: ↑ 63 ; follow-up: ↓ 89 Changed 3 years ago by
Replying to saraedum:
Replying to embray:
Can we rename the little `commit` stamp file generated by the `Dockerfile` to `.commit`, so it's hidden by default? I can't imagine we'd normally care about seeing that.

It's inside the `docker/` directory so you usually won't see it. It's also .gitignore-d (but not .dockerignore-d!). I don't have a very strong opinion on this but I would need to change a bunch of things in my local setup here (not part of this ticket) if I changed this.
I think personally I would basically never want to see it (e.g. with ls). If I didn't know what it was for it'd just confuse me. I'm not sure what else you would need to change? I'm not going to insist on it if it's a huge pain but I'm confused as to why it would be.
comment:76 in reply to: ↑ 69 Changed 3 years ago by
Replying to saraedum:
I am now getting
Step 40/42 : RUN make build
 ---> Running in 9605e7f7ecf4
build/make/Makefile --stop
make: build/make/Makefile: Command not found
Makefile:20: recipe for target 'all-build' failed

Has anything changed in Sage recently that could cause this?

Not that I can think of. It might be something to do with your environment. The relevant line from the top-level `Makefile` is:

$(MAKE) build/make/Makefile --stop

so you would get this error if the `$(MAKE)` variable were somehow set to empty.
comment:77 follow-up: ↓ 92 Changed 3 years ago by
Still confused about `sagemath-dev` even after re-reviewing. The point of `sagemath-develop`, at least originally, was supposed to be more or less the same thing--it would be built from the latest `develop` branch and contain most/all build artifacts (and is hence a bit fatter than the normal `sagemath` image, by ~2GB).
comment:78 Changed 3 years ago by
+# 1) Restore the git checkout ARTIFACT_BASE was built from, recorded in
+# docker/commit. (Or go directly to FETCH_HEAD if there is no history to
+# restore.)

Maybe here explicitly mention something like "unless ARTIFACT_BASE=source-clean". I think that's what the parenthetical statement is referring to, but I had to go around in circles a few times to understand that (since by default `ARTIFACT_BASE=source-clean` :) It says that in the message that gets echo'd later on, but the connection isn't obvious at first.
comment:79 follow-up: ↓ 93 Changed 3 years ago by
In the Dockerfile section for `sagemath-dev` it has

+CMD ["bash"]

I've usually found it convenient (even for the -devel image) to just start straight into the sage shell like `/home/sage/sage/sage -sh` or something. But maybe for debugging purposes it's just as well not to. I'm not sure.
comment:80 follow-up: ↓ 95 Changed 3 years ago by
In `pull-gitlab.sh` you have:

+# The variable $DOCKER_IMAGE is set to the full name of the pulled image;
+# source this script to use it.

that's rather awkward to me. Perhaps this script could output the image name at the end (perhaps silencing the other commands, or piping them to stderr or something). Or better still, maybe `DOCKER_IMAGE` could be set somewhere else, since the way it's pieced together in that script doesn't seem all that directly tied to the script itself.
(The general principle I'm going on with these scripts is that even though they're written for and used primarily by the CI systems, for testing and development purposes I'd rather they work in some "normal", unsurprising way as much as possible.)
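A sketch of the suggestion above, in case it helps: the script could keep the pull noise on stderr and print only the image name on stdout, so a caller can write `DOCKER_IMAGE=$(.ci/pull-gitlab.sh sagemath-dev)` instead of sourcing it. The script name and registry path below are placeholders; the real name is assembled from GitLab CI variables in the actual script.

```sh
#!/bin/sh
# pull-image.sh (illustrative name, not the actual .ci script): pull an image
# and print its full name on stdout; docker's output goes to stderr so the
# name can be captured with command substitution.
set -e
IMAGE="registry.gitlab.com/$CI_PROJECT_PATH/$1:latest"  # placeholder path
docker pull "$IMAGE" 1>&2
echo "$IMAGE"
```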
comment:81 in reply to: ↑ 58 ; follow-up: ↓ 94 Changed 3 years ago by
Replying to saraedum:
Replying to embray:
Maybe just a minor nitpick, but I noticed that you have all these shared scripts like `describe-system.sh` and so on, and that in both the circle-ci and gitlab configurations you source a lot of these scripts instead of just running them. A few of them export some environment variables, so in that case the sourcing makes sense, but why for the others, that don't modify the environment?

Ok. Fixed.

I think you missed one here:

+ # Build docker images
+ . .ci/build-docker.sh
comment:82 Changed 3 years ago by
Okay--I think I've finally mostly understood all of this :) The walkthrough you gave me earlier certainly helped, but when it came to the details I still had a bit to go through.
Thank you again for your Herculean effort on this. Other than the numerous, but minor comments above (plus figuring out why `$(MAKE)` broke for you) I think we're pretty close... I'm excited to launch this.
comment:83 Changed 3 years ago by
- Commit changed from 4ee63131bad067e53df4369dacb30461a58c97fa to 0892d35ba0c1a6dee2df8faa74cd1155e9964c4b
Branch pushed to git repo; I updated commit sha1. New commits:
0892d35 | Explain the pros and cons of docker:dind
comment:84 in reply to: ↑ 70 ; follow-up: ↓ 87 Changed 3 years ago by
Replying to embray:
Replying to saraedum:
Replying to embray:
Relatedly, I don't understand this comment at all:
# We use docker-in-docker to build our docker images. It can be faster to
# expose your outer docker daemon by mounting /var/run/docker.sock to
# /var/run/docker.sock and setting DOCKER_HOST in Settings -> CI/CD -> Secret
# variable to unix:///var/run/docker.sock

Why would you set `DOCKER_HOST` as a secret?

Environment variables that are injected into the scripts that are described in `.gitlab-ci.yml` are called secrets in GitLab. They are not necessarily "secret". The `docker` docker image detects whether there is a host called `docker` and then assumes that this host is running `docker:dind` (or something similar.) You can disable this behaviour by explicitly setting `DOCKER_HOST`.

Okay, thanks for clarifying the "secrets" thing. That was definitely confusing me.

I still had to stare at this for a while though to understand what was meant. In particular I didn't understand exactly what `services:` meant (now I do), and it was also unclear to me that `docker:dind` was just the name of a Docker image (where `dind` is the tag indicating the Docker-in-Docker image). I also didn't fully understand the implications of what "Docker-in-Docker" meant. I mean, I've installed the Docker client inside a container before and linked it to my outer Docker engine via a socket before, and I thought that's all that was meant by "Docker-in-Docker".

It's clearer to me now what this means, but maybe this should be documented somewhere more clearly, like in the Readme, instead of burying it here. In fact, is there any advantage to using docker-in-docker specifically over setting `DOCKER_HOST` like you explain here?
I extended the section about docker:dind. Do you think that's enough? The section could go into the README but then again it's absolutely standard in the GitLab CI world.
comment:85 in reply to: ↑ 72 Changed 3 years ago by
Replying to embray:
Replying to saraedum:
I want these files to be part of
.dockerignore
. I symlinked.dockerignore
to.gitignore
in the root directory. However, docker does not support.dockerignore
files in subdirectories, so I had to move these rules down.I see; that's too bad. Something about it doesn't sit right with me. For example I feel like, although it's convenient to symlink
.gitignore
to.dockerignore
(as there are many patterns in the former we would also want to apply to the latter), I could envision a scenario where we don't want them to be exactly the same.We can try it for now though, since I don't have an immediate argument against it.
I am also not perfectly happy with it. But I also don't want to maintain a .dockerignore unless people widely use and care about this.
comment:86 in reply to: ↑ 73 ; follow-up: ↓ 90 Changed 3 years ago by
Replying to embray:
Replying to saraedum:
Replying to embray:
It looks like in the Dockerfile you've done away with the build for the
sagemath-jupyter
image. I've always been somewhat split on my feelings about it, but I think in general it has been useful. It's nice having an image whose default command is just the correct thing to start up the Notebook, because I've had users mess that up before and get confused (e.g. you have to make sure to pass--ip='*'
and get the quoting right as well.I had this originally but then removed it. It's a bit annoying to build both images because you need to upload the full image to dockerhub (dockerhub does not detect shared layers between repositories it seems.) The "docker run" string is more complicated anyway, since you need the port forwarding. Do you think what it says in the README is not sufficient for people to just copy and paste it? he command should work on all POSIX shells I think.
Nicolas might be interested in chiming in here, but even as an "expert" I've personally found the
sagemath-jupyter
image to be useful, as a way to quickly fire up a notebook (since I don't use the notebook much personally, but I do sometimes need to test things in it). I don't want to have to look up the correct command line every time I do.It's weird what you say about Docker having to re-upload the intermediate images. I don't think I've observed that before myself, but maybe I just wasn't paying attention.
I don't think we need to make the
sagemath-jupyter
for every build though--only for releases should be sufficient. So what if we at least kept a build target for it in the Dockerfile (or even in a separate Dockerfile if it's more convenient?) and only build/push those images for final release versions?
I would like to have the releases and non-releases produce the same output if possible. It makes breaking things much harder. If people want to try something out, they should click on a binder link (see #24842.) I believe that's much more convenient anyway. I don't see having to lookup a command as a problem but I might be missing something here: You need to do something about binding the ports anyway, so the casual docker user (me too actually) needs to lookup the docker run command anyway?
comment:87 in reply to: ↑ 84 ; follow-up: ↓ 100 Changed 3 years ago by
Replying to saraedum:
I extended the section about docker:dind. Do you think that's enough? The section could go into the README but then again it's absolutely standard in the GitLab CI world.
Thank you--that's much clearer now, I think. I would still lean towards explaining this in the Readme, because what's bog standard in GitLab CI, I would think, is at this point virtually unknown to almost everyone else who works on Sage (including me). Maybe with a link to https://docs.gitlab.com/ee/ci/docker/using_docker_build.html#use-docker-in-docker-executor which helped me understand this.
I say that because this actually impacts how one configures their runner and/or their CI settings, and they're more likely to find it sooner in the README--most people aren't going to want to dig into the details of the `.gitlab-ci.yaml` if they don't have to.
comment:88 in reply to: ↑ 74 Changed 3 years ago by
Replying to embray:
Replying to saraedum:
Replying to embray:
Since we already use the name
sagemath-develop
on DockerHub for the development images, could we go ahead and do the same here (so we don't end up withsagemath-develop
andsagemath-dev
)? I don't see the advantage of dropping four letters here, unless for some reason it would be really annoying to change this name at this point?If I understood correctly,
sagemath-develop
is referring to Sage'sdevelop
branch. My thinking was to use something that can not get confused with an existing term in Sage. Also, mysagemath
images are continuing what the existingsagemath
images do. But thesagemath-dev
images are not really related to thesagemath-develop
images. I am open to replacingdev
with something else.I must be confused about the nomenclature then. I'll re-review everything but I assumed
sagemath-dev
was equivalent to builds from thedevelop
branch. Clearly, if I'm wrong, then I must be missing something crucial. If they're not really related though a different name (maybe even a different Docker repository for them--e.g. asagemath-dev
repository) might be good.
Yes, I think you misunderstood that part. The dev images are about the build artifacts being intact. I don't discriminate branches (except for building from scratch for master/develop and building from develop's sagemath-dev image for everyone else.)
comment:89 in reply to: ↑ 75 Changed 3 years ago by
Replying to embray:
Replying to saraedum:
Replying to embray:
Can we rename the little
commit
stamp file generated by theDockerfile
to.commit
, so it's hidden by default? I can't imagine we'd normally care about seeing that.It's inside the
docker/
directory so you usually won't see it. It's also .gitignore-d (but not .dockerignore-d!). I don't have a very strong opinion on this but I would need to change a bunch of things in my local setup here (not part of this ticket) if I changed this.I think personally I would basically never want to see it (e.g. with ls). If I didn't know what it was for it'd just confuse me. I'm not sure what else you would need to change? I'm not going to insist on it if it's a huge pain but I'm confused as to why it would be.
It breaks my existing incremental builds on other branches because they expect it to be called `commit`. I really don't mind renaming it though.
comment:90 in reply to: ↑ 86 ; follow-up: ↓ 96 Changed 3 years ago by
Replying to saraedum:
I would like to have the releases and non-releases produce the same output if possible. It makes breaking things much harder. If people want to try something out, they should click on a binder link (see #24842.) I believe that's much more convenient anyway. I don't see having to lookup a command as a problem but I might be missing something here: You need to do something about binding the ports anyway, so the casual docker user (me too actually) needs to lookup the docker run command anyway?
That's a fair point, but I still think `docker run -p <local-port>:<container-port> <image-name>` is easier to remember (I used to have to look up how `-p` works but not usually anymore) than `docker run -p 127.0.0.1:8888:8888 sagemath/sagemath sage -notebook=jupyter --no-browser --ip='*' --port=8888`, which is definitely more than I can remember.
I agree about releases and non-releases having much of the same output, but in this case it's really just a convenience layer for casual users, who are mostly only going to be concerned about final releases.
comment:91 Changed 3 years ago by
I remember when I first took over maintenance of the sage Docker images I was skeptical about `sagemath-jupyter` as well, which was already there before me. I had pretty much the same skepticism about it. But now that I've personally found it useful my mind is changed.
comment:92 in reply to: ↑ 77 ; follow-up: ↓ 103 Changed 3 years ago by
Replying to embray:
Still confused about `sagemath-dev` even after re-reviewing. The point of `sagemath-develop`, at least originally, was supposed to be more or less the same thing--it would be built from the latest `develop` branch and contain most/all build artifacts (and is hence a bit fatter than the normal `sagemath` image, by ~2GB).

I don't really understand the confusion. But I should totally improve the documentation if it is unclear. I build `sagemath` and `sagemath-dev` for every branch, not only for `develop`. The `sagemath-dev` image has the build artifacts, the `sagemath` image does not. So, `sagemath-dev` is not related to the `develop` branch. But it includes the build artifacts in the same way that `sagemath-develop` apparently did.
comment:93 in reply to: ↑ 79 Changed 3 years ago by
Replying to embray:
In the Dockerfile section for `sagemath-dev` it has

+CMD ["bash"]

I've usually found it convenient (even for the -devel image) to just start straight into the sage shell like `/home/sage/sage/sage -sh` or something. But maybe for debugging purposes it's just as well not to. I'm not sure.

Maybe it has improved, but I really dislike how `sage -sh` screws with the environment. I find it really confusing unless you understand well what it is doing exactly. I'd rather keep the bash as it is.
comment:94 in reply to: ↑ 81 Changed 3 years ago by
Replying to embray:
Replying to saraedum:
Replying to embray:
Maybe just a minor nitpick, but I noticed that you have all these shared scripts like `describe-system.sh` and so on, and that in both the circle-ci and gitlab configurations you source a lot of these scripts instead of just running them. A few of them export some environment variables, so in that case the sourcing makes sense, but why for the others, that don't modify the environment?

Ok. Fixed.

I think you missed one here:

+ # Build docker images
+ . .ci/build-docker.sh
Thanks. Fixed :)
comment:95 in reply to: ↑ 80 Changed 3 years ago by
Replying to embray:
In
pull-gitlab.sh
you have:+# The variable $DOCKER_IMAGE is set to the full name of the pulled image; +# source this script to use it.that's rather awkward to me. Perhaps this script could output the image name at the end (perhaps silencing the other commands, or piping them to stderr or something). Or better still, maybe
DOCKER_IMAGE
could be set somewhere else, since the way it's pieced together in that script doesn't seem all that directly to the script itself.(The general principle I'm going on with these scripts is that even though they're written for and used primarily by the CI systems, for testing and development purposes I'd rather they work in some "normal", unsurprising way as much as possible.)
I just tried to build it outside and did not like the result. Would you want to try to come up with something here?
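For what it's worth, the variant being discussed could look roughly like this; a sketch only, and the variable names are assumptions rather than what the branch actually uses:

    #!/bin/sh
    # sketch of a pull-gitlab.sh that logs to stderr and prints only the
    # image name on stdout; $CI_REGISTRY_IMAGE is provided by GitLab CI,
    # $DOCKER_TAG is assumed to be computed elsewhere
    set -e
    DOCKER_IMAGE="$CI_REGISTRY_IMAGE/$1:$DOCKER_TAG"
    docker pull "$DOCKER_IMAGE" 1>&2
    echo "$DOCKER_IMAGE"

Callers could then write DOCKER_IMAGE=$(.ci/pull-gitlab.sh sagemath-dev) instead of sourcing the script.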
comment:96 in reply to: ↑ 90 Changed 3 years ago by
- Work issues changed from check that 4ee63131bad067e53df4369dacb30461a58c97fa still works on CircleCI/GitLab CI to check that this still works on CircleCI/GitLab CI, make jupyter more accessible
Replying to embray:
Replying to saraedum:
I would like to have the releases and non-releases produce the same output if possible. It makes breaking things much harder. If people want to try something out, they should click on a binder link (see #24842.) I believe that's much more convenient anyway. I don't see having to lookup a command as a problem but I might be missing something here: You need to do something about binding the ports anyway, so the casual docker user (me too actually) needs to lookup the docker run command anyway?
That's a fair point, but I still think
docker run -p <local-port>:<container-port> <image-name>
is easier to remember (I used to have to look up how-p
works but not usually anymore) thandocker run -p 127.0.0.1:8888:8888 sagemath/sagemath sage -notebook=jupyter --no-browser --ip='*' --port=8888
, which is definitely more than I can remember.I agree about releases and non-releases having much of the same output, but in this case it's really just a convenience layer for casual users, who are mostly only going to be concerned about final releases.
Let me try to come up with an easier solution for this without creating an extra repository.
comment:97 follow-up: ↓ 101 Changed 3 years ago by
Ok. It's getting hard to keep track of all the threads here. I think I addressed everything you mentioned. Let me know if you think that something is missing.
comment:98 Changed 3 years ago by
- Commit changed from 0892d35ba0c1a6dee2df8faa74cd1155e9964c4b to c3eeb65d4adb6f77ecdb76f5c016c70973c935ae
comment:99 Changed 3 years ago by
- Commit changed from c3eeb65d4adb6f77ecdb76f5c016c70973c935ae to bf56bcae1c28545666ed7ed489c8bcba3e3422ee
Branch pushed to git repo; I updated commit sha1. New commits:
bf56bca | howto provision your own runners
|
comment:100 in reply to: ↑ 87 Changed 3 years ago by
Replying to embray:
Replying to saraedum:
I extended the section about docker:dind. Do you think that's enough? The section could go into the README but then again it's absolutely standard in the GitLab CI world.
Thank you--that's much clearer now, I think. I would still lean towards explaining this in the Readme, because what's bog standard in GitLab CI, I would think, is at this point virtually unknown to almost everyone else who works on Sage (including me). Maybe with a link to https://docs.gitlab.com/ee/ci/docker/using_docker_build.html#use-docker-in-docker-executor which helped me understand this.
I say that because this actually impacts how one configures their runner and/or their CI settings, and they're more likely to find it sooner in the README--most people aren't going to want to dig into the details of the
.gitlab-ci.yaml
if they don't have to.
Oh, missed that one. So, the thing here is really that it does not affect you at all if you just use gitlab.com. If you provision your own runners you need to make them --privileged
. If you care a lot about caching and speed and don't use a cloud service, then you might want to do the DOCKER_HOST thing. I only mention it there to say that there's a way to disable docker:dind if you don't want it for some reason.
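For reference, registering such a runner amounts to something along these lines (a sketch; the URL and token are placeholders and the exact flags depend on your setup):

    # register a runner with the docker executor; --docker-privileged is
    # what makes the docker:dind service usable from within CI jobs
    gitlab-runner register \
      --non-interactive \
      --url https://gitlab.com/ \
      --registration-token "$REGISTRATION_TOKEN" \
      --executor docker \
      --docker-image docker:latest \
      --docker-privileged
    # to skip docker:dind and use the host's daemon instead, one would
    # rather mount /var/run/docker.sock into the jobs and point
    # DOCKER_HOST at it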
New commits:
bf56bca | howto provision your own runners
|
comment:101 in reply to: ↑ 97 ; follow-up: ↓ 102 Changed 3 years ago by
Replying to saraedum:
Ok. It's getting hard to keep track of all the threads here. I think I addressed everything you mentioned. Let me know if you think that something is missing.
Heh, yeah, sorry. I thought of maybe putting a TODO list in the ticket description or something. I'll touch back on this tomorrow and see if I can summarize whatever remaining issues there are.
comment:102 in reply to: ↑ 101 Changed 3 years ago by
Replying to embray:
Replying to saraedum:
Ok. It's getting hard to keep track of all the threads here. I think I addressed everything you mentioned. Let me know if you think that something is missing.
Heh, yeah, sorry. I thought of maybe putting a TODO list in the ticket description or something. I'll touch back on this tomorrow and see if I can summarize whatever remaining issues there are.
I find the "work issues" to be a good place to collect such things.
comment:103 in reply to: ↑ 92 ; follow-up: ↓ 106 Changed 3 years ago by
Replying to saraedum:
I don't really understand the confusion. But I should totally improve the documentation if it is unclear. I build
sagemath
andsagemath-dev
for every branch, not only fordevelop
. Thesagemath-dev
image has the build artifacts, thesagemath
image does not. So,sagemath-dev
is not related to thedevelop
branch. But it includes the build artifacts in the same way thatsagemath-develop
apparently did.
Just throwing a random thought: what about sagemath-build
? But sagemath-dev
may be the right thing if you see it as the closest analogs of xxx-dev
packages in debian&co.
comment:104 Changed 3 years ago by
No strong opinion for sagemath-jupyter
. The docker command to start it (with port forwarding) can be reconstructed without a doc lookup by someone who has some familiarity with docker. The full command requires familiarity with both sagemath and jupyter.
Having a jupyter image explicitly brings attention to the fact that our image is "jupyter ready"; on the other hand, it may confusingly suggest that our base image is not.
comment:105 follow-up: ↓ 111 Changed 3 years ago by
No strong opinion on sage -sh
either. The only important feature is that, for binder use but also beyond, the commands jupyter
, gap
, singular
, ... be in the path and work.
Thanks for all the work!
comment:106 in reply to: ↑ 103 Changed 3 years ago by
Replying to nthiery:
Replying to saraedum:
I don't really understand the confusion. But I should totally improve the documentation if it is unclear. I build
sagemath
andsagemath-dev
for every branch, not only fordevelop
. Thesagemath-dev
image has the build artifacts, thesagemath
image does not. So,sagemath-dev
is not related to thedevelop
branch. But it includes the build artifacts in the same way thatsagemath-develop
apparently did.Just throwing a random thought: what about
sagemath-build
? Butsagemath-dev
may be the right thing if you see it as the closest analogs ofxxx-dev
packages in debian&co.
It's only an analogue in the sense that it contains the things that developers usually care about. I think that -dev
is good for people who are randomly looking for a sagemath image on docker (and know about the nomenclature in Debian,…): with the -dev
they should know immediately that that's not what they need. I don't mind -build
but I don't find it as obvious in that regard.
comment:107 Changed 3 years ago by
- Work issues changed from check that this still works on CircleCI/GitLab CI, make jupyter more accessible to check that this still works on CircleCI/GitLab CI, make jupyter more accessible, check what's in the PATH (comment:105)
comment:108 Changed 3 years ago by
Then maybe we should just delete the old sagemath-develop
image from Dockerhub so that there's no confusion, though I don't know why we can't just reuse the name for the new images as well. I'm still not convinced it doesn't already serve the same purpose. I'm fine either way as long as Dockerhub will let me delete the old one. Whatever the outcome, what I don't want is for there to be both a "sagemath-dev" and a "sagemath-develop" on there.
comment:109 Changed 3 years ago by
- Commit changed from bf56bcae1c28545666ed7ed489c8bcba3e3422ee to 183ddd40b2ebd9f53d242a112325bda119b8298c
Branch pushed to git repo; I updated commit sha1. New commits:
fa4599a | Rename commit to .commit
|
71d9ce7 | export variables in sourced scripts
|
329765e | Merge remote-tracking branch 'trac/develop' into HEAD
|
2493bb3 | Fixup for fa4599a6c6849e5a2ac2a1f738eed437843454fa
|
b3c6d14 | Make docker-build.sh POSIX compliant
|
183ddd4 | Make jupyter invocation easier to remember
|
comment:110 Changed 3 years ago by
- Commit changed from 183ddd40b2ebd9f53d242a112325bda119b8298c to 2e74da9f1e408d8f569151198b4c739b2208749f
comment:111 in reply to: ↑ 105 ; follow-up: ↓ 118 Changed 3 years ago by
Replying to nthiery:
No strong opinion on
sage -sh
either. The only important feature is that, for binder use but also beyond, the commandsjupyter
,gap
,singular
, ... be in the path and work.
nthiery: I would like to automatically test that this is the case. So actually what's the requirement here? How does binder run this image?
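(For a rough idea of what such an automatic test could look like, assuming the published image name and that the entrypoint passes arbitrary commands through, something like this would already catch a missing tool:)

    # each command should resolve to the copy installed by Sage
    docker run --rm sagemath/sagemath:latest bash -c 'for cmd in jupyter gap singular; do command -v "$cmd" || exit 1; done'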
comment:112 follow-up: ↓ 114 Changed 3 years ago by
It's nothing special about binder per se--rather, it's what the current Docker image does. It sets the default environment so that, for example, $SAGE_LOCAL/bin
is on the PATH, so all commands installed as part of Sage "just work".
I had to remind myself exactly how this works though. It does not start the container with sage -sh
like I suggested previously. I think I ran into some problems with that as well. However, my solution does take many of the environment variables that result from running sage -sh
and sets them by default in the container:
I think it would be good to keep something like this. Actually, it's very important for binder to work correctly since it also ensures that it's running the Jupyter installed with Sage.
comment:113 Changed 3 years ago by
I think there might still be some little details I'm not personally happy with and I have some doubts still about some naming issues. However, if/when this is working for you I'd like to get it merged so we can go ahead and start playing with it, and then fuss about the details later. I think we should also coordinate with Volker on this--I think it would be nice if he can allow us to push changes to develop just on the CI stuff on an as-needed basis without having to necessarily go through a full CI cycle with the buildbots, since this obviously doesn't affect any of the existing continuous integration process.
comment:114 in reply to: ↑ 112 ; follow-ups: ↓ 117 ↓ 119 Changed 3 years ago by
Replying to embray:
It's nothing special about binder per se--rather, it's what the current Docker image does. It sets the default environment so that, for example,
$SAGE_LOCAL/bin
is on the PATH, so all commands installed as part of Sage "just work".I had to remind myself exactly how this works though. It does not start the container with
sage -sh
like I suggested previously. I think I ran into some problems with that as well. However, my solution does take many of the environment variables that result from runningsage -sh
and sets them by default in the container:I think it would be good to keep something like this. Actually, it's very important for binder to work correctly since it also ensures that it's running the Jupyter installed with Sage.
Ok. But how is that different from just setting the entrypoint to sage -sh
like I do currently?
comment:115 Changed 3 years ago by
- Work issues changed from check that this still works on CircleCI/GitLab CI, make jupyter more accessible, check what's in the PATH (comment:105) to check that this still works on CircleCI/GitLab CI
comment:116 Changed 3 years ago by
embray, there is now a shortcut to start jupyter with the usual arguments: docker run -p8888:8888 sagemath/sagemath:latest sage-jupyter
Do you think that's good enough?
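For the common case of only exposing the notebook locally, the same shortcut combines with an explicit loopback binding:

    docker run -p 127.0.0.1:8888:8888 sagemath/sagemath:latest sage-jupyter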
comment:117 in reply to: ↑ 114 Changed 3 years ago by
Ok. But how is that different from just setting the entrypoint to
sage -sh
like I do currently?
Will it work if someone does something like docker run -ti gap
?
comment:118 in reply to: ↑ 111 Changed 3 years ago by
nthiery: I would like to automatically test that this is the case. So actually what's the requirement here? How does binder run this image?
Binder just runs the docker image with appropriate port forwarding, and then launches in the container the command 'jupyter notebook' from the path, with the port flag and a few others. At some stage, it was instead calling the command jupyter-notebook directly, but presumably this is no longer the case.
The specific requirements are there:
https://mybinder.readthedocs.io/en/latest/dockerfile.html#preparing-your-dockerfile
Notes:
- We enforced notebook=5 with a pip install in our previous docker image. Since #24168, merged in 8.1, the notebook has been upgraded to v5, so the pip install should not be needed anymore.
- Items 4 and 5 do not apply to Sage's docker image; instead they are about the (essentially trivial) Dockerfile's that authors of binder-enabled repositories will write.
Btw: Item 5 is new in the documentation; and seems optional at this stage.
comment:119 in reply to: ↑ 114 Changed 3 years ago by
Replying to saraedum:
Replying to embray:
It's nothing special about binder per se--rather, it's what the current Docker image does. It sets the default environment so that, for example,
$SAGE_LOCAL/bin
is on the PATH, so all commands installed as part of Sage "just work".I had to remind myself exactly how this works though. It does not start the container with
sage -sh
like I suggested previously. I think I ran into some problems with that as well. However, my solution does take many of the environment variables that result from runningsage -sh
and sets them by default in the container:I think it would be good to keep something like this. Actually, it's very important for binder to work correctly since it also ensures that it's running the Jupyter installed with Sage.
Ok. But how is that different from just setting the entrypoint to
sage -sh
like I do currently?
IIRC it's a little less invasive and works for more use cases. I can't remember though exactly why I didn't put sage -sh
in the entrypoint. I know I tried that and it had various problems.
comment:120 Changed 3 years ago by
Just a random thought: this might be calling for an equivalent of conda's source activate
to set up sage's environment variables without having to run a subshell as in sage -sh
.
comment:121 follow-ups: ↓ 122 ↓ 123 Changed 3 years ago by
I agree--I think that was the main problem here--that sage -sh
ran a subshell.
comment:122 in reply to: ↑ 121 Changed 3 years ago by
Replying to embray:
I agree--I think that was the main problem here--that
sage -sh
ran a subshell.
Why is that a problem?
comment:123 in reply to: ↑ 121 Changed 3 years ago by
Replying to embray:
I agree--I think that was the main problem here--that
sage -sh
ran a subshell.
So this runs a shell but then spawns the argument with exec
so there is no additional process created if that's what you would like to avoid.
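In other words, the entrypoint behaves roughly like the following hypothetical wrapper (a sketch only, not the literal file on the branch):

    #!/bin/sh
    # exec replaces this wrapper with sage -sh, which in turn (as described
    # above) execs the given command, so no extra shell process lingers;
    # with no arguments this drops into an interactive Sage subshell
    exec /home/sage/sage/sage -sh "$@"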
comment:124 Changed 3 years ago by
I think my main motivation was just to ensure that the environment by default has Sage on the $PATH
, as well as any other environment variables needed at runtime for software installed by Sage. But I admit I can't remember why I also didn't just essentially make sage -sh
the entrypoint. I think it was partly because of what you yourself said in terms of it changing too much of the default environment. I'm pretty sure I tried that at first and something didn't work right, but since I can't remember what it was I'd say go for it and give it a try anyways.
But I do agree with Nicolas that something like a source sage-activate
would be good to have in Sage, to just modify the existing environment without going into a subshell (and a deactivate
would be nice to have as well). We could handle that in another issue though...
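For the record, such a script could start out as simple as the following sketch (all names and paths here are assumptions; nothing like this ships with Sage, and a real version would need the full set of variables that sage-env sets):

    # hypothetical sage-activate, meant to be sourced rather than executed
    export SAGE_ROOT=/home/sage/sage
    export SAGE_LOCAL="$SAGE_ROOT/local"
    export PATH="$SAGE_LOCAL/bin:$PATH"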
comment:125 Changed 3 years ago by
- Work issues changed from check that this still works on CircleCI/GitLab CI to check that this still works on CircleCI/GitLab CI; investigate why builds on GitLab hang sometimes
comment:126 Changed 3 years ago by
- Dependencies set to #25161
Since the build hangs during docbuild, I have a feeling that #25161 might be to blame.
comment:127 Changed 3 years ago by
Could I see a link to one of the builds that hanged? Also how much of the logs are we outputting? We could certainly add a less verbose log option. Most of the logs are written to files anyways, so if we needed to do some post-mortem debugging we can just look in the files, and avoid writing so much to stdout.
comment:128 Changed 3 years ago by
Here's an example of a build that timed out: https://gitlab.com/saraedum/sage/-/jobs/62689653 Everything interesting is missing in this log. GitLab limits logs to about 4MB, so I had to cut the build log after 3MB. We can only download the log files when there was no timeout…
I'll try to make this work better in case of timeouts. I'll report back when I know what's the last thing that happens before the timeout.
comment:129 Changed 3 years ago by
- Commit changed from 2e74da9f1e408d8f569151198b4c739b2208749f to 3c78351fdcbfe93c2c353d0aef23bc2e976d7e17
Branch pushed to git repo; I updated commit sha1. Last 10 new commits:
0936a5d | Merge remote-tracking branch 'trac/u/saraedum/sphinxbuild-stack' into u/saraedum/gitlabci
|
3f0c2f0 | Add doctest for #25160
|
0477b3c | Check for errors before ignoring output
|
6b80d72 | Merge remote-tracking branch 'trac/u/saraedum/sphinxbuild-stack' into u/saraedum/gitlabci
|
ca66c03 | Revert "Check for errors before ignoring output"
|
42993c9 | Merge remote-tracking branch 'trac/u/saraedum/sphinxbuild-stack' into u/saraedum/gitlabci
|
6aade39 | Silence docker pull
|
93a7ba6 | Silence apt-get
|
158af15 | Silence setup.py
|
3c78351 | Ignore much more useless chatter in sphinx build
|
comment:130 Changed 3 years ago by
- Commit changed from 3c78351fdcbfe93c2c353d0aef23bc2e976d7e17 to 7d5f3898f3bff423efe3c3e97997de6af5ccd573
Branch pushed to git repo; I updated commit sha1. New commits:
7d5f389 | Smarter truncation of output
|
comment:131 Changed 3 years ago by
- Dependencies changed from #25161 to #25161, #25160
- Work issues changed from check that this still works on CircleCI/GitLab CI; investigate why builds on GitLab hang sometimes to investigate why builds on GitLab hang sometimes
Last 10 new commits:
0936a5d | Merge remote-tracking branch 'trac/u/saraedum/sphinxbuild-stack' into u/saraedum/gitlabci
|
3f0c2f0 | Add doctest for #25160
|
0477b3c | Check for errors before ignoring output
|
6b80d72 | Merge remote-tracking branch 'trac/u/saraedum/sphinxbuild-stack' into u/saraedum/gitlabci
|
ca66c03 | Revert "Check for errors before ignoring output"
|
42993c9 | Merge remote-tracking branch 'trac/u/saraedum/sphinxbuild-stack' into u/saraedum/gitlabci
|
6aade39 | Silence docker pull
|
93a7ba6 | Silence apt-get
|
158af15 | Silence setup.py
|
3c78351 | Ignore much more useless chatter in sphinx build
|
New commits:
7d5f389 | Smarter truncation of output
|
comment:132 Changed 3 years ago by
- Commit changed from 7d5f3898f3bff423efe3c3e97997de6af5ccd573 to cf80b0a90555b0b456efdbd5c3784139519af243
Branch pushed to git repo; I updated commit sha1. New commits:
321658e | head-tail.sh needs stdbuf from coreutils
|
7735a2c | POSIX compliant push-dockerhub.sh script
|
3112236 | Try to get rid of the explicit sh invocation
|
05328a3 | Fix typo in 3c78351fdcbfe93c2c353d0aef23bc2e976d7e17
|
77eeea9 | Do not print any of these ignored warnings
|
5881359 | Remove raise exception logic
|
cf80b0a | Merge remote-tracking branch 'trac/u/saraedum/sphinxbuild-stack' into u/saraedum/gitlabci
|
comment:133 Changed 3 years ago by
- Commit changed from cf80b0a90555b0b456efdbd5c3784139519af243 to f3bfa5fe9971fa2a04190b92da501cbf9724905b
Branch pushed to git repo; I updated commit sha1. New commits:
aa13c84 | Ignore more chatter in docbuild
|
8884d9b | Print available disk space
|
090c25a | Trying to find out which of the shared runners actually provide enough power
|
a788043 | Make NTHREADS respect the available RAM
|
9e317e6 | Allow at least one thread
|
d7e743a | Fixed comments about the do tag
|
49b21e1 | Properly ignore undefined labels on first pass
|
bb1a407 | Fix incorrect @echo command
|
52c2136 | ignoring more chatter during docbuild
|
f3bfa5f | Upgraded .gitlab-ci.yml to provide proper tags that actually work
|
comment:134 follow-up: ↓ 137 Changed 3 years ago by
- Work issues investigate why builds on GitLab hang sometimes deleted
Apparently, GitLab has downgraded their digitalocean runners from 4GB of RAM to only 2GB. That's not enough to build the documentation. I did not manage to create swap space inside docker, so we can only use (single core) GCE runners at the moment. Also we can not build from scratch (i.e., master & develop) on GitLab anymore with shared runners.
We should maybe create a sage group on gitlab where developers get access to more powerful runners. Maybe something we can talk about in Cernay.
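(On runner hosts that we provision ourselves the 2GB can at least be padded with swap; that doesn't help with GitLab's shared runners, which is the problem here. A sketch, to be run on the host, not inside the container:)

    sudo fallocate -l 4G /swapfile
    sudo chmod 600 /swapfile
    sudo mkswap /swapfile
    sudo swapon /swapfile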
comment:135 Changed 3 years ago by
Anyway, this needs review again. It seems to work fine on GitLab CI & CircleCI at the moment.
You can see recent GitLab pipelines here: https://gitlab.com/saraedum/sage/pipelines (note the "stuck" tasks because there are no shared runners that provide enough HDD & RAM for build-from-clean.)
You can also see the CircleCI runs here: https://circleci.com/gh/saraedum/sage (build-from-clean works fine here.)
comment:136 follow-up: ↓ 138 Changed 3 years ago by
I'd be hesitant to move forward on this if it can't build the docs from scratch anymore (though maybe it's not actually that big a problem...?)
That said, I don't want to hold this up any longer. I just wonder exactly what the plan is for "bootstrapping" the whole system if it can't build a working Sage image from scratch.
Also, have you tried using serial doc build? I seem to recall that ends up using less RAM, but takes longer of course...
comment:137 in reply to: ↑ 134 ; follow-up: ↓ 139 Changed 3 years ago by
Replying to saraedum:
Apparently, GitLab has downgraded their digitalocean runners from 4GB of RAM to only 2GB. That's not enough to build the documentation.
Ah, I see what you're saying. If the docs are built in serial that might be just enough, but even with just 2 processes 2GB is not currently enough. I'm doing a little analysis of the doc build memory usage, and in the reference docs the manifolds docs take by far the longest, and by themselves chew up ~800MB for the docbuild process alone, not to mention the couple of instances of maxima it spins up. Meanwhile the combinat docs are some of the most memory-hungry and consume nearly 1.5 GB, so that alone would be enough to run afoul of a 2GB limit.
I know this has been looked at a lot before, but I must believe there's something we can do to more tightly limit memory usage of the docbuilds...
comment:138 in reply to: ↑ 136 Changed 3 years ago by
Replying to embray:
I'd be hesitant to move forward on this if it can't build the docs from scratch anymore (though maybe it's not actually that big a problem...?)
Yes, I also don't really like to go ahead like that. It works on Circle and if you provide your own runners also on GitLab. But let me try a bit harder to make this work on GitLab's own shared runners again…
That said, I don't want to hold this up any longer. I just wonder exactly what the plan is for "bootstrapping" the whole system if it can't build a working Sage image from scratch.
We can bootstrap from CircleCI, or from my GitLab runners that run on Google Compute Engine.
Also, have you tried using serial doc build? I seem to recall that ends up using less RAM, but takes longer of course...
I am already building serially (MAKE="make -j1").
comment:139 in reply to: ↑ 137 ; follow-up: ↓ 140 Changed 3 years ago by
Replying to embray:
Replying to saraedum:
Apparently, GitLab has downgraded their digitalocean runners from 4GB of RAM to only 2GB. That's not enough to build the documentation.
Ah, I see what you're saying. If the docs are built in serial that might be just enough, but even with just 2 process 2GB is not currently enough. I'm doing a little analysis of the doc build memory usage, and in the reference docs the manifolds docs take by far the longest, and by themselves chew up to ~800MB for the docbuild process alone, not to mention the couple of instances of maxima it spins up. Meanwhile the combinat docs are some of the most memory-hungry and consume nearly 1.5 GB, so that alone would be enough to run afoul of a 2GB limit.
I know this has been looked at a lot before, but I must believe there's something we can do to more tightly limit memory usage of the docbuilds...
So the actual problem seems to come from tachyon invocations.
comment:140 in reply to: ↑ 139 Changed 3 years ago by
Replying to saraedum:
Replying to embray:
Replying to saraedum:
Apparently, GitLab has downgraded their digitalocean runners from 4GB of RAM to only 2GB. That's not enough to build the documentation.
Ah, I see what you're saying. If the docs are built in serial that might be just enough, but even with just 2 process 2GB is not currently enough. I'm doing a little analysis of the doc build memory usage, and in the reference docs the manifolds docs take by far the longest, and by themselves chew up to ~800MB for the docbuild process alone, not to mention the couple of instances of maxima it spins up. Meanwhile the combinat docs are some of the most memory-hungry and consume nearly 1.5 GB, so that alone would be enough to run afoul of a 2GB limit.
I know this has been looked at a lot before, but I must believe there's something we can do to more tightly limit memory usage of the docbuilds...
So the actual problem seems to come from tachyon invocations.
So, I guess that's probably not true. The problem surfaces when the docbuild runs os.fork
to spawn the tachyon process. The fork itself should be quite lightweight, so how can it be that fork itself fails, and not something inside tachyon after the fork?
comment:141 follow-ups: ↓ 144 ↓ 145 Changed 3 years ago by
Though the fork is cheap, there could be an overcommit policy that's causing the trouble… https://stackoverflow.com/a/13329386/812379
comment:142 follow-ups: ↓ 146 ↓ 147 Changed 3 years ago by
The serial runner might curiously be the culprit here. Somewhere in the sphinx build we seem to be leaking memory. Here I am printing the memory usage in a serial build just before the call to sphinxbuild()
:
    [manifolds] process.memory_info()[0]): 280907776
    [dynamics ] process.memory_info()[0]): 349028352
    [polynomia] process.memory_info()[0]): 365514752
    [repl     ] process.memory_info()[0]): 375050240
    [tensor_fr] process.memory_info()[0]): 375050240
    [combinat ] process.memory_info()[0]): 382959616
    [plot3d   ] process.memory_info()[0]): 1632706560
And here is the same if I run this through the pool but with just one process:
    [manifolds] process.memory_info()[0]): 230350848
    [dynamics ] process.memory_info()[0]): 191688704
    [polynomia] process.memory_info()[0]): 210935808
    [repl     ] process.memory_info()[0]): 176205824
    [tensor_fr] process.memory_info()[0]): 163934208
    [combinat ] process.memory_info()[0]): 517341184
    [plot3d   ] process.memory_info()[0]): 170418176
Btw, I am running ./sage -docbuild en/reference inventory
here but that's probably not that important.
comment:143 Changed 3 years ago by
- Work issues set to make docs build from scratch on GitLab
comment:144 in reply to: ↑ 141 Changed 3 years ago by
Replying to saraedum:
Though the fork is cheap, there could be an overcommit policy that's causing the trouble… https://stackoverflow.com/a/13329386/812379
The overcommit policy is set to 50% overcommit which isn't that much if you only have 2GB and no swap.
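For concreteness, this is how the policy can be inspected and, on runners we control, relaxed; the comments describe the kernel's semantics, not necessarily what GitLab's machines are set to:

    cat /proc/sys/vm/overcommit_memory   # 2 = strict accounting, commits are capped
    cat /proc/sys/vm/overcommit_ratio    # in mode 2 the cap is swap + ratio% of RAM
    # on self-provisioned runners one could relax this, e.g. switch to
    # "always overcommit":
    sudo sysctl vm.overcommit_memory=1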
comment:145 in reply to: ↑ 141 Changed 3 years ago by
Replying to saraedum:
Though the fork is cheap, there could be an overcommit policy that's causing the trouble… https://stackoverflow.com/a/13329386/812379
That's certainly one thing I was wondering: Is the limit for the runners on virtual memory, or actual resident memory?
comment:146 in reply to: ↑ 142 Changed 3 years ago by
Replying to saraedum:
The serial runner might curiously be the culprit here. Somewhere in the sphinx build we seem to be leaking memory. Here I am plotting the memory usage in a serial build just before the call to
sphinxbuild()
:
Yes, there are definitely some big memory leaks in the doc build. I think Florent spent a lot of time trying to track some of those down, but I don't know if we ever made any progress. I'm going to start looking into it now because, well, I personally haven't tried and I'm curious.
I know it's not during the individual docbuilds: For example, some of the docs still use a lot of memory for their builds--mostly for holding their huge doctrees (e.g. manifolds uses >1GB for this). I would also like to try to reduce this if possible (maybe even splitting up some of the huger docs like manifolds into smaller subdocs). But either way, the individual docbuilds release most of the memory they eat up when they're done, so that's not where the leak is. But after the inventories are merged, by which I mean this part in ReferenceBuilder._wrapper
:
    # The html refman must be build at the end to ensure correct
    # merging of indexes and inventories.
    # Sphinx is run here in the current process (not in a
    # subprocess) and the IntersphinxCache gets populated to be
    # used for the second pass of the reference manual and for
    # the other documents.
    getattr(DocBuilder(self.name, lang), format)(*args, **kwds)
it chews up about a gigabyte alone, and never releases it. I thought at first this might be the InventoryCache
itself, but no, that only appears to be about ~50MB in size by my accounting. So there are some other big objects that hang around when they really shouldn't anymore.
comment:147 in reply to: ↑ 142 Changed 3 years ago by
Replying to saraedum:
The serial runner might curiously be the culprit here. Somewhere in the sphinx build we seem to be leaking memory. Here I am plotting the memory usage in a serial build just before the call to
sphinxbuild()
:
Interesting--that might be a different leak than the one I'm looking at then. When the individual doc inventories are built in subprocesses they manage not to leak too much (this despite the fact that subprocesses are being reused for each task (maxtasksperchild=None)).
comment:148 Changed 3 years ago by
A few of the larger objects hanging around attached to app.env
after the reference doc inventories build:
- app.env.todo_all_todos (~600 MB)
- app.env.intersphinx_inventory (~50 MB)
- app.env.intersphinx_cache (~50 MB)
that is, assuming my recursive getsizeof function is even right (though I think it has a bug somewhere because it actually goes haywire in app.env.longtitles
--I don't even know what that object is so it requires some exploration...)
comment:149 follow-up: ↓ 153 Changed 3 years ago by
I forget--is there a reason we don't run runsphinx()
in a subprocess? Because sphinx.cmdline.main()
wasn't really ever meant to be invoked as an API in a larger script, and it leaves around all kinds of global state. Just for example:
- the
logging
module's root logger hangs on to a reference to the Sphinx app - the
matplotlib.sphinxext.plot_directive.setup
function holds on to a reference to the app for some reason
and there might be other examples.
comment:150 Changed 3 years ago by
There's also quite a lot of Sage-related objects left in memory after the docbuild.
comment:151 Changed 3 years ago by
Not right now, but next week I think I will do some refactoring of the docbuild process a bit to ensure that runsphinx()
is always run in a subprocess. I think that will have a big impact at least in terms of keeping memory leaks under control.
I have some other thoughts I want to try out--that might go quite deep--in terms of keeping doctree sizes under better control too. But that will probably be a longer effort.
comment:152 Changed 3 years ago by
- Cc jdemeyer hivert added
Thanks for investigating improvements to our Sphinx build! I am adding Florent and Jeroen in CC.
At Sage Days 77, Florent and a sphinx developer had done some extensive analysis of the memory footprint of Sphinx. IIRC, there was a lot of useless information that was kept in the doctrees; and also some memory abuse that could not be explained, pointing to some plausible bug/leak in Sphinx itself. Also, IIRC, they had come to the conclusion that using Sphinx's built-in parallelisation was not out of reach. An aspect of it is that, instead of manually splitting our reference manual into smaller and smaller chunks, we could tentatively revert back to a single document and let Sphinx work its magic.
But that's off the top of my head from two years ago; I'll let Florent comment and dig up notes from SD77.
comment:153 in reply to: ↑ 149 ; follow-up: ↓ 158 Changed 3 years ago by
Replying to embray:
I forget--is there a reason we don't run
runsphinx()
in a subprocess? Becausesphinx.cmdline.main()
wasn't really ever meant to be invoked as an API in a larger script, and it leaves around all kinds of global state. [...]
That's what Debian does btw. https://salsa.debian.org/science-team/sagemath/blob/master/debian/patches/df-subprocess-sphinx.patch
I think this behaviour was originally introduced in https://trac.sagemath.org/ticket/14204. I am not completely sure, but it seems to me that the idea was to initialize sage.all
once and then use multiprocessing's Pool to fork the individual builds. I think that it's fine to call the main
of sphinx in that way. The problem I think is that this broke when (embray I suppose?) introduced the special case for single-threaded builds. So, I guess we need to fork once to control the RAM the docbuild uses.
Is the serial build still necessary on cygwin?
comment:154 Changed 3 years ago by
- Commit changed from f3bfa5fe9971fa2a04190b92da501cbf9724905b to 95c6275adac26be537c11a084114d8a8dc576392
Branch pushed to git repo; I updated commit sha1. New commits:
5719a72 | Improve caching during build
|
e4887e9 | Fix caching during build
|
be3a0f7 | The docbuild crashes happen during an os.fork to spawn tachyon
|
7d85dc7 | Something related to the sphinxbuild seems to be leaking memory
|
50898fe | Try to build-from-scratch on GitLab's underpowered CI machines
|
95c6275 | Comment on fork logic in docbuilding
|
comment:155 Changed 3 years ago by
The build from scratch on GitLab CI works again btw. It takes a bit more than 10h…
comment:156 Changed 3 years ago by
- Work issues changed from make docs build from scratch on GitLab to does this break docbuild on cygwin now?
comment:157 Changed 3 years ago by
- Commit changed from 95c6275adac26be537c11a084114d8a8dc576392 to c0574ca897321c56927e9ce5004fd2f75693785a
Branch pushed to git repo; I updated commit sha1. New commits:
c0574ca | Merge remote-tracking branch 'trac/develop' into u/saraedum/gitlabci
|
comment:158 in reply to: ↑ 153 Changed 3 years ago by
Replying to saraedum:
Replying to embray:
I forget--is there a reason we don't run
runsphinx()
in a subprocess? Becausesphinx.cmdline.main()
wasn't really ever meant to be invoked as an API in a larger script, and it leaves around all kinds of global state. [...]That's what Debian does btw. https://salsa.debian.org/science-team/sagemath/blob/master/debian/patches/df-subprocess-sphinx.patch
I think this behaviour was originally introduced in https://trac.sagemath.org/ticket/14204. I am not completely sure, but it seems to me that the idea was to initialize
sage.all
once and then use multiprocessing's Pool to fork the individual builds. I think that it's fine to call themain
of sphinx in that way. The problem I think is that this broke when (embray I suppose?) introduced the special case for single-threaded builds. So, I guess we need to fork once to control the RAM the docbuild uses. Is the serial build still necessary on cygwin?
Yes, in #21389 I changed things so that SAGE_NUM_THREADS=1
meant that no subprocesses were created during the docbuild. Originally that was motivated by the needs of Cygwin, but I don't believe that's necessary anymore (all the fork-related bugs that I know of appear to be fixed). I also now recognize that this is not good behavior. As I wrote above, there's an enormous amount of global state that runsphinx()
leaves around, and trying to clean it all up manually would be very difficult, so it should always be run in a subprocess.
Likewise, simply in the process of building the docs, there is also a lot of global state left in Sage (cached values and the like) which it turns out can eat up a lot of memory, especially after some of the larger docs (like manifolds). This is likewise difficult to manually clean up.
Even a reversion of #21389 wouldn't fully fix the issue though since there are other parts of the docbuild that call runsphinx()
without calling it in a subprocess, and those are responsible for quite a bit of unnecessary memory leakage currently (even in parallel doc builds). So I intend to fix that.
We can follow up on the docbuild discussion elsewhere though, since other than its impact on CI the details of that are orthogonal to this ticket.
comment:159 Changed 3 years ago by
- Work issues does this break docbuild on cygwin now? deleted
comment:160 Changed 3 years ago by
- Commit changed from c0574ca897321c56927e9ce5004fd2f75693785a to 329fdab86a1a3edf03c88780104a1471105fbcd6
Branch pushed to git repo; I updated commit sha1. New commits:
329fdab | fix tags for build-from-latest
|
comment:161 Changed 3 years ago by
So, I think this is really ready for review again.
New commits:
329fdab | fix tags for build-from-latest
|
comment:162 Changed 3 years ago by
The patchbot complains about "print" but that one is actually in an awk script and not in Python, so it's fine.
comment:163 Changed 3 years ago by
- Commit changed from 329fdab86a1a3edf03c88780104a1471105fbcd6 to 048e2d8f0a25bada68693f9b22a8e0426eab2fb2
Branch pushed to git repo; I updated commit sha1. New commits:
048e2d8 | silence unicode warnings from the patchbot
|
comment:164 Changed 3 years ago by
- Status changed from needs_review to needs_work
- Work issues set to patchbot errors
comment:165 Changed 3 years ago by
- Commit changed from 048e2d8f0a25bada68693f9b22a8e0426eab2fb2 to f9c772e5003fda874e82edd5c785eab2657cee1b
Branch pushed to git repo; I updated commit sha1. New commits:
f9c772e | Fix filtering logic in sphinx logger
|
comment:166 Changed 3 years ago by
- Status changed from needs_work to needs_review
- Work issues patchbot errors deleted
New commits:
f9c772e | Fix filtering logic in sphinx logger
|
comment:167 Changed 3 years ago by
- Commit changed from f9c772e5003fda874e82edd5c785eab2657cee1b to f0431637a02819116781d87825523a068bf88ee2
Branch pushed to git repo; I updated commit sha1. New commits:
f043163 | Fixed some minor doctest issues
|
comment:168 Changed 3 years ago by
- Status changed from needs_review to positive_review
LGTM. Let's move forward.
comment:169 Changed 3 years ago by
- Commit changed from f0431637a02819116781d87825523a068bf88ee2 to a34a3e393128f30efae547fdf36f84d6feb34b6e
- Status changed from positive_review to needs_review
Branch pushed to git repo; I updated commit sha1 and set ticket back to needs_review. New commits:
a34a3e3 | It seems that MAKEOPTS/SAGE_NUM_THREADS parsing has been fixed now
|
comment:170 Changed 3 years ago by
- Commit changed from a34a3e393128f30efae547fdf36f84d6feb34b6e to 9c0015ff7e186e9d34caef28d3c1b962dbde75d5
Branch pushed to git repo; I updated commit sha1. New commits:
9c0015f | Fixup a34a3e393128f30efae547fdf36f84d6feb34b6e
|
comment:171 Changed 3 years ago by
erik: I pushed a minor cleanup. Please have a look :)
comment:172 Changed 3 years ago by
Can you just squash the last commit into the previous one?
comment:173 Changed 3 years ago by
The problem is that I had already merged this into another branch. Let's see how unhappy git is about me squashing this.
comment:174 Changed 3 years ago by
- Commit changed from 9c0015ff7e186e9d34caef28d3c1b962dbde75d5 to 8f3480e71cafcef96b1c21aaf3219e8c81e5e9e1
Branch pushed to git repo; I updated commit sha1. This was a forced push. New commits:
8f3480e | It seems that MAKEOPTS/SAGE_NUM_THREADS parsing has been fixed now
|
comment:175 Changed 3 years ago by
Ok. Seems to handle that fine by now.
New commits:
8f3480e | It seems that MAKEOPTS/SAGE_NUM_THREADS parsing has been fixed now
|
comment:176 Changed 3 years ago by
- Status changed from needs_review to needs_work
- Work issues set to caching of build-time-dependencies does not work
comment:177 Changed 3 years ago by
- Commit changed from 8f3480e71cafcef96b1c21aaf3219e8c81e5e9e1 to ad39cc9de9dd8a3f6092c7153278982393da57b8
comment:178 Changed 3 years ago by
- Status changed from needs_work to needs_review
- Work issues caching of build-time-dependencies does not work deleted
Ok. Tested that everything works on GitLab CI and CircleCI. The build (from scratch) on CircleCI barely finished a few minutes before hitting the time limit of five hours, but I don't think there's much that I can do to speed things up. (I could try to seed a ccache with files from a previous build, but I have a feeling that to help with a build from scratch the ccache would need to be huge. I am not sure whether CircleCI lets me do that much network I/O and whether it would be fast enough.)
comment:179 Changed 3 years ago by
- Commit changed from ad39cc9de9dd8a3f6092c7153278982393da57b8 to ae119f23c62921e7e01e5169b52124a2243a8ab1
Branch pushed to git repo; I updated commit sha1. New commits:
ae119f2 | Do not overcommit that much
|
comment:180 Changed 3 years ago by
Perhaps at some point we could experiment with setting up sccache for Sage
comment:181 follow-up: ↓ 188 Changed 3 years ago by
Interesting. I did not know sccache.
Since we did not manage to get this into 8.2, I'll manually update our docker images to 8.2 and 8.3beta0.
comment:182 Changed 3 years ago by
- Commit changed from ae119f23c62921e7e01e5169b52124a2243a8ab1 to a5914a4917e6dbbbc99c8424250f010ee54de809
Branch pushed to git repo; I updated commit sha1. New commits:
a5914a4 | Build tags from clean
|
comment:183 Changed 3 years ago by
- Status changed from needs_review to needs_work
- Work issues set to CircleCI does not correctly build tagged branches
comment:184 Changed 3 years ago by
- Commit changed from a5914a4917e6dbbbc99c8424250f010ee54de809 to 77f1f3c478159da59f069f4a0d4644cc5770fdf5
Branch pushed to git repo; I updated commit sha1. New commits:
77f1f3c | Merge remote-tracking branch 'trac/develop' into HEAD
|
comment:185 Changed 3 years ago by
- Milestone changed from sage-8.2 to sage-8.3
New commits:
77f1f3c | Merge remote-tracking branch 'trac/develop' into HEAD
|
comment:186 Changed 3 years ago by
It looks like the Docker images for 8.2 haven't been built yet, or are you doing that now? If not I will just build them from my existing scripts (that should have been done already...)
comment:187 Changed 3 years ago by
Great, the Docker build is failing now due to #23519. This explains why my automated builds stopped working yet again (I hadn't had a chance to investigate). Fortunately there's a workaround that I can apply easily enough...
comment:188 in reply to: ↑ 181 ; follow-up: ↓ 194 Changed 3 years ago by
Replying to saraedum:
Since we did not manage to get this into 8.2, I'll manually update our docker images to 8.2 and 8.3beta0.
I'm building both actually.
comment:189 Changed 3 years ago by
- Description modified (diff)
comment:190 Changed 3 years ago by
- Commit changed from 77f1f3c478159da59f069f4a0d4644cc5770fdf5 to 989501f837a8efbbee8f943c886c7985bc6f6572
comment:191 Changed 3 years ago by
- Commit changed from 989501f837a8efbbee8f943c886c7985bc6f6572 to a4487bf1fa369edb3084bcb9ba37755156eea770
Branch pushed to git repo; I updated commit sha1. New commits:
a4487bf | Fix format of stable releases and mention betas
|
comment:192 Changed 3 years ago by
- Commit changed from a4487bf1fa369edb3084bcb9ba37755156eea770 to 2d33a626429f5ae07dbe3dd7791b238341e37ab7
Branch pushed to git repo; I updated commit sha1. New commits:
2d33a62 | fix typo
|
comment:193 Changed 3 years ago by
- Commit changed from 2d33a626429f5ae07dbe3dd7791b238341e37ab7 to 481855ff1ab88f258b97fb4d5896d3e7c614e41f
Branch pushed to git repo; I updated commit sha1. New commits:
481855f | Merge remote-tracking branch 'trac/develop' into gitlabci
|
comment:194 in reply to: ↑ 188 ; follow-up: ↓ 195 Changed 3 years ago by
Replying to saraedum:
Replying to saraedum:
Since we did not manage to get this into 8.2, I'll manually update our docker images to 8.2 and 8.3beta0.
I'm building both actually.
I pushed both (they are much smaller than the ones from the old setup, so I hope it's ok that I replaced yours). GitLab is building 8.3.beta1 currently.
comment:195 in reply to: ↑ 194 ; follow-up: ↓ 196 Changed 3 years ago by
Replying to saraedum:
Replying to saraedum:
Replying to saraedum:
Since we did not manage to get this into 8.2, I'll manually update our docker images to 8.2 and 8.3beta0.
I'm building both actually.
I pushed both (they are much smaller than the one from the old setup, so I hope it's ok that I replaced yours.) gitlab is building 8.3.beta1 currently.
I'm a little concerned about that. Does that include the sagemath-jupyter image? People are using that.
comment:196 in reply to: ↑ 195 Changed 3 years ago by
Replying to embray:
Replying to saraedum:
Replying to saraedum:
Replying to saraedum:
Since we did not manage to get this into 8.2, I'll manually update our docker images to 8.2 and 8.3beta0.
I'm building both actually.
I pushed both (they are much smaller than the one from the old setup, so I hope it's ok that I replaced yours.) gitlab is building 8.3.beta1 currently.
I'm a little concerned about that. Does that include the sagemath-jupyter image? People are using that.
The sagemath/sagemath image does contain jupyter. I did not touch the sagemath-jupyter images but I'll create a README there to point people to the main repository.
comment:197 Changed 3 years ago by
- Commit changed from 481855ff1ab88f258b97fb4d5896d3e7c614e41f to 2632d32509c5f64c67d7bbd13d214dddb3821e2c
comment:198 Changed 3 years ago by
- Commit changed from 2632d32509c5f64c67d7bbd13d214dddb3821e2c to bf0b9ff9dd125bb575a0b0111c30779e1ed81425
Branch pushed to git repo; I updated commit sha1. New commits:
a532fce | Merge remote-tracking branch 'trac/develop' into u/saraedum/gitlabci
|
686fe86 | An attempt to use workflows on CircleCI
|
79003ab | Do not deduce CPUs/RAM by looking at the local machine
|
5ba2f8a | Backport https://trac.sagemath.org/ticket/24655
|
24a627f | Merge commit '5ba2f8a419f11dd51ae95ed44f748fe85fbe0bf8' into u/saraedum/gitlabci
|
bf0b9ff | Collect system information on the docker host
|
comment:199 Changed 3 years ago by
- Status changed from needs_work to needs_review
- Work issues changed from CircleCI does not correctly build tagged branches to test all variations of branches and tags on CircleCI & GitLab CI
comment:200 Changed 3 years ago by
- Commit changed from bf0b9ff9dd125bb575a0b0111c30779e1ed81425 to a0fdc3c60f0415963bf4e945f2694dee66ebb8eb
Branch pushed to git repo; I updated commit sha1. This was a forced push. New commits:
906ebea | Collect system information on the docker host
|
60ddb4b | simplify apk add command for GitLab CI
|
25f934e | Build from the latest develop image
|
aa49196 | Backport https://trac.sagemath.org/ticket/24655 for automated docker builds
|
a0fdc3c | Merge commit 'aa49196c4f0ed33da7a45b80c7a4a01bd46b7d61' into u/saraedum/gitlabci
|
comment:201 Changed 3 years ago by
- Branch u/saraedum/gitlabci deleted
- Commit a0fdc3c60f0415963bf4e945f2694dee66ebb8eb deleted
comment:202 Changed 3 years ago by
- Branch set to u/saraedum/24655
comment:203 Changed 3 years ago by
- Commit set to 11f8a6cc24e9a1853a85f45d48210bdfb35254ff
Branch pushed to git repo; I updated commit sha1. New commits:
11f8a6c | no port exposure needed for jupyter anymore
|
comment:204 Changed 3 years ago by
- Commit changed from 11f8a6cc24e9a1853a85f45d48210bdfb35254ff to 1960ba1e0674435f222af242df6252d2b7ea50c8
Branch pushed to git repo; I updated commit sha1. New commits:
1960ba1 | add openssl/libssl-dev to the docker images
|
comment:205 Changed 3 years ago by
- Description modified (diff)
comment:206 Changed 3 years ago by
- Description modified (diff)
comment:207 Changed 3 years ago by
- Commit changed from 1960ba1e0674435f222af242df6252d2b7ea50c8 to f2c641a9d3a99fd0a1dc4f95c801c600e226f1f5
comment:208 Changed 3 years ago by
- Commit changed from f2c641a9d3a99fd0a1dc4f95c801c600e226f1f5 to 83f71e7948c8eb8a72e63d4d1451f9f20779bb8b
Branch pushed to git repo; I updated commit sha1. New commits:
83f71e7 | Don't pipe "echo" into "rm"
|
comment:209 Changed 3 years ago by
- Description modified (diff)
comment:210 Changed 3 years ago by
- Commit changed from 83f71e7948c8eb8a72e63d4d1451f9f20779bb8b to d28f601f53ddae2cecadc95854fbd91a525ac8f7
Branch pushed to git repo; I updated commit sha1. New commits:
d28f601 | Make sage -pip work
|
comment:211 Changed 3 years ago by
- Commit changed from d28f601f53ddae2cecadc95854fbd91a525ac8f7 to 7f44254f417f2cf19a50cd4bbe1e387f2084744b
comment:212 Changed 3 years ago by
- Work issues test all variations of branches and tags on CircleCI & GitLab CI deleted
I created the 8.2 and 8.3.beta* releases with this and it seems to work fine at the moment.
comment:213 Changed 3 years ago by
So where are we at with this now? I've seen there's been a lot of continued activity from you on the ticket but I haven't followed what it was all about. Just minor fixes or anything major to know?
comment:214 Changed 3 years ago by
There's nothing really major that has changed. The CircleCI setup looks a bit different now but on the inside its mostly the same. Here's a changelog of sorts since you gave it positive_review last:
- I had set SAGE_NUM_THREADS/MAKEOPTS from the host that ran the
docker build
commands, not from inside the actual docker containers; the specs of these two machines are sometimes very different on CircleCI. That's fixed now.
- I worked around some docker build bug related to caching and multi-stage builds. I don't really understand what the problem is there, but it works now.
- Most of the changes are to make builds for git tags work on CircleCI.
- I started using these images for actual Sage development and found some minor issues in the git setup inside the sagemath-dev image.
- I fixed the pip/openssl issue.
So…does it work? It works fine for building from scratch. I've done that for 8.2 and all the betas since then. It also worked for building on top of the latest beta but I have not tested that in all the possible combinations (say, changing SPKGs and things like that) this time. As nobody knows about that latter feature yet, nobody is going to be upset if it doesn't work ;) I certainly want to use this heavily so I'll fix problems right away.
comment:215 Changed 3 years ago by
Let me check if I changed anything since your last review that is not exclusively docker related…
comment:216 Changed 3 years ago by
This has changed since you reviewed it last, so nothing "dangerous" I'd say:
    .ci/build-docker.sh           |  26 +++++++++++---------------
    .ci/describe-system.sh        |  14 ++++++--------
    .ci/setup-make-parallelity.sh |  66 ++++++++++++++++++++++--------------------------------------------
    .ci/test-dev.sh               |   4 ++--
    .ci/test-jupyter.sh           |   5 +++--
    .ci/update-env.sh             |   6 +-----
    .circleci/before-script.sh    |  26 --------------------------
    .circleci/config.yml          | 100 +++++++++++++++++++++++++++++++++++++++-------------------------------------------------------------
    .gitlab-ci.yml                |  19 ++++++++++++-------
    docker/Dockerfile             |  83 ++++++++++++++++++++++++++++++++++++-----------------------------------------------
    docker/README.md              |   7 +++----
    11 files changed, 135 insertions(+), 221 deletions(-)
comment:217 follow-up: ↓ 218 Changed 3 years ago by
Thanks for the summary.
Though the amount of minor tweaks this has continued to need seems to confirm my feeling that there should really be a way to get these changes into develop
faster, somehow, or use a branch other than develop
for CI purposes (but that is regularly reset to the latest develop
as soon as any necessary changes are merged into develop
...) I don't know if you have any ideas about that.
Like, what if CircleCI, or GitLab changes something, or Docker changes something that has to be worked around (something that has happened to me plenty of times already)? We don't want to have all CI break until the next beta in develop
...
comment:218 in reply to: ↑ 217 Changed 3 years ago by
Replying to embray:
Thanks for the summary.
Though the amount of minor tweaks this has continued to need seems to confirm my feeling that there should really be a way to get these changes into
develop
faster, somehow, or use a branch other thandevelop
for CI purposes (but that is regularly reset to the latestdevelop
as soon as any necessary changes are merged intodevelop
...) I don't know if you have any ideas about that.Like, what if CircleCI, or GitLab changes something, or Docker changes something that has to be worked around (something that has happened to me plenty of times already)? We don't want to have all CI break until the next beta in
develop
...
I see your point but I don't have a good answer. As long as nobody is relying on this, we can just stop the builds on CircleCI/GitLab until the fix is in develop. Then it is going to work again once people merge in the latest develop. That's not great, sure.
I think this whole problem is a bit similar to the git trac
issue that we had when we migrated to git
. We decided to put the predecessor of the git trac
command into the sage repository which was a bad idea as it wasn't stable and when you went back to an old branch you got the old broken tooling. Now that git trac
is stable and hardly ever changes we could put it back into the main repository (I am not proposing that, just saying.) So the hope here could be that once people start to use this actively, the main problems have been sorted out already.
The fundamental problem is that I don't see how to take this out of the repository in any way that is even remotely supported by the CIs that we are planning to use. conda-forge has the tooling (conda-smithy) outside of the individual repositories, but you have to say "@conda-forge-admin please rerender" all the time to make the tooling upgrade itself in your pull request. It's annoying: it would mean that we would need a trac bot to do this rerendering of the CI-related files, and it makes the diffs hard to read as they are filled with CI-related changes all the time.
The other option would be to only have a minimal hook call in the repository and then have that trigger actual pipelines somewhere else on that commit and report back through bots. It's a lot of work and I don't think we would want to maintain that kind of infrastructure.
Long story short. I fear we have to use the CIs the way they're intended to be used.
The other issue that you're sort of raising for me is whether develop should be pushed to more frequently. I would love to see a marge-bot or something similar that merges in things automatically; but this goes beyond the focus of this ticket I think.
comment:219 Changed 3 years ago by
- Commit changed from 7f44254f417f2cf19a50cd4bbe1e387f2084744b to ab71cfa8a74651587bb84fabffe39bad35fcfb4b
comment:220 Changed 3 years ago by
Hm…we could run a script that is contained in an environment variable. That way we could inject whatever behaviour into every CI run from the outside, e.g., to display a warning that something is broken at the moment and instructions on how to work around it.
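Concretely, the idea is something like the following sketch; the variable name is made up here and not necessarily what the branch ends up using:

    # every CI job would evaluate a (normally empty) CI variable early on,
    # so a maintainer can inject a notice or a temporary workaround from
    # the project's CI settings without touching the repository
    sh -c "${CI_MONKEY_PATCH:-}"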
comment:221 Changed 3 years ago by
- Commit changed from ab71cfa8a74651587bb84fabffe39bad35fcfb4b to 511bf3ebfc3c5fef8320c5116592aca875971b85
Branch pushed to git repo; I updated commit sha1. New commits:
511bf3e | Introduce monkey-patching in CI
|
comment:222 follow-up: ↓ 224 Changed 3 years ago by
What about something like this (last commit). I have not tested it yet but do you think that's a good idea?
comment:223 Changed 3 years ago by
I see your point but I don't have a good answer. As long as nobody is relying on this, we can just stop the builds on CircleCI/GitLab until the fix is in develop. Then it is going to work again once people merge in the latest develop. That's not great, sure.
"as long as nobody is relying on this" yeah, but we want them to rely on it; perhaps even exclusively long term :)
Fortunately, I think once this is in more use it will also force, or at least encourage a change in Sage's development process to be more like a "normal" project.
comment:224 in reply to: ↑ 222 Changed 3 years ago by
Replying to saraedum:
What about something like this (last commit). I have not tested it yet but do you think that's a good idea?
Definitely makes me uh, a bit nervous but okay :) It could be useful to have and is at least better than nothing as a short-term solution to this problem (if it even does become more than a hypothetical problem).
comment:225 Changed 3 years ago by
I am not sure this is the right location to report on this. Maybe the README about our Docker image on dockerhub should contain some information on where to report issues?
Anyway, I am having trouble with using binder with the docker image for SageMath 8.2 as currently posted on dockerhub compared to that for 8.1.
Namely, when binder launches Jupyter, its working directory is /home/sage/sage
, instead of /home/sage
as before. This means in particular that all the notebooks and files that are copied over from the github repository with the usual COPY . ${HOME}
are not accessible anymore.
Should WORKDIR be reset to ${HOME} at the end of the Dockerfile?
comment:226 follow-up: ↓ 236 Changed 3 years ago by
nthiery: You are right, we should mention that bugs should be reported on trac as usual. I am not sure whether COPY . ${HOME} is the usual thing; I usually do COPY --chown=sage:sage . . instead. I don't know, should we reset WORKDIR to HOME? Or should it be the place where Sage is installed? Let's reset it to HOME, so we don't break scripts that used to work with the 8.1 image. (This breaks compatibility with 8.2 images but anyway…)
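As a quick sanity check (assuming the published tags behave as described above), the effective working directory of an image can be printed with:
docker run --rm sagemath/sagemath:8.2 pwd
# should print /home/sage once WORKDIR is reset; the current 8.2 image reportedly prints /home/sage/sage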
comment:227 Changed 3 years ago by
- Commit changed from 511bf3ebfc3c5fef8320c5116592aca875971b85 to 4ef4e1830f78d45e9faf0ac48fa44f9d57da33ac
comment:228 Changed 3 years ago by
- Commit changed from 4ef4e1830f78d45e9faf0ac48fa44f9d57da33ac to b8a94e2fdbe3c8b7db3b06be0a707373ce85c1d9
Branch pushed to git repo; I updated commit sha1. New commits:
b8a94e2 | Merge remote-tracking branch 'trac/develop' into 24655
|
comment:229 Changed 3 years ago by
- Commit changed from b8a94e2fdbe3c8b7db3b06be0a707373ce85c1d9 to 6a3d5d0c70310ddcf28ac684c56bf279d189179c
Branch pushed to git repo; I updated commit sha1. New commits:
6a3d5d0 | drop trailing space
|
comment:230 Changed 3 years ago by
- Commit changed from 6a3d5d0c70310ddcf28ac684c56bf279d189179c to f60cebb458d21265afa3bb6f2f7d4701cc3e9085
Branch pushed to git repo; I updated commit sha1. New commits:
f60cebb | Merge branch '24655' into 8.3.beta4.docker
|
comment:231 Changed 3 years ago by
- Commit changed from f60cebb458d21265afa3bb6f2f7d4701cc3e9085 to 8ff2bb4ae935a2b58820908174fd5ad33f7fa01b
Branch pushed to git repo; I updated commit sha1. New commits:
8ff2bb4 | Make update-env posix compliant again
|
comment:232 Changed 3 years ago by
- Commit changed from 8ff2bb4ae935a2b58820908174fd5ad33f7fa01b to f7ea0f3c7a10baf475e96bd0bacd09f06039b14b
Branch pushed to git repo; I updated commit sha1. This was a forced push. New commits:
f7ea0f3 | Make update-env posix compliant again
|
comment:233 follow-up: ↓ 235 Changed 3 years ago by
I pushed a new 8.2 image with the WORKDIR set to /home/sage
. Both sage
and sage-jupyter
still work, so I hope that this doesn't break anything.
New commits:
f7ea0f3 | Make update-env posix compliant again
|
comment:234 follow-up: ↓ 243 Changed 3 years ago by
embray: The monkey patching seems to work. So this needs review again.
comment:235 in reply to: ↑ 233 ; follow-up: ↓ 237 Changed 3 years ago by
Replying to saraedum:
I pushed a new 8.2 image with the WORKDIR set to /home/sage. Both sage and sage-jupyter still work, so I hope that this doesn't break anything.
Thanks!
I have a problem though: since binder already downloaded an image with tag 8.2, it won't redownload the new version. Is there a way to add an alias tag such as 8.2-1 ?
Thanks!
comment:236 in reply to: ↑ 226 Changed 3 years ago by
Replying to saraedum:
nthiery: You are right, we should mention that bugs should be reported on trac as usual. I am not sure whether COPY . ${HOME} is the usual thing; I usually do COPY --chown=sage:sage . . instead.
IIRC, the binder doc suggests COPY . ${HOME}; in sage-binder-env, I am indeed adding --chown=sage:sage.
I don't know, should we reset WORKDIR to HOME? Or should it be the place where Sage is installed. Let's reset it to HOME, so we don't break scripts that used to work with the 8.1 image. (This breaks compatibility with 8.2 images but anyway…)
I would tend to see the location of the sage install (and in general which ever software is installed) as an implementation detail; but this may be a bias from my use cases of the image!
comment:237 in reply to: ↑ 235 Changed 3 years ago by
Replying to nthiery:
Replying to saraedum:
I pushed a new 8.2 image with the WORKDIR set to /home/sage. Both sage and sage-jupyter still work, so I hope that this doesn't break anything.
Thanks!
I have a problem though: since binder already downloaded an image with tag 8.2, it won't redownload the new version. Is there a way to add an alias tag such as 8.2-1 ?
Strange, I had been curious about this a while back and had tested whether binder would check for image upgrades, and it actually did. Have you changed something in the repository? I think you need to make some change so it rebuilds your Dockerfile. If it rebuilds, it should pull the latest version. Anyway, to work around your specific problem, you could also just use the latest tag instead of 8.2; latest points to the latest released stable version.
comment:238 Changed 3 years ago by
Alas, tried again, and it did not work. Also binder explicitly forbids using "latest". No luck!
comment:239 Changed 3 years ago by
Just tried; putting a hash into the Dockerfile worked for me: FROM sagemath/sagemath@sha256:e933509b105f36b9b7de892af847ade7753e058c5d9e0c0f280f591b85ad996d
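For reference, the digest to pin against can be looked up locally (assuming the image has been pulled):
docker pull sagemath/sagemath:8.2
docker inspect --format '{{index .RepoDigests 0}}' sagemath/sagemath:8.2
# prints sagemath/sagemath@sha256:… which can then be used in the FROM line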
comment:240 Changed 3 years ago by
Ah yes, good point. Thanks for the tip, that worked!
comment:241 Changed 3 years ago by
- Commit changed from f7ea0f3c7a10baf475e96bd0bacd09f06039b14b to e8dadc8fc9e64968f66ff9953115203b6b0603df
Branch pushed to git repo; I updated commit sha1. New commits:
e8dadc8 | Merge remote-tracking branch 'trac/develop' into 24655
|
comment:242 Changed 3 years ago by
- Commit changed from e8dadc8fc9e64968f66ff9953115203b6b0603df to 569d492f83e15b7abfd79c5ddbdda51928187fe0
Branch pushed to git repo; I updated commit sha1. New commits:
569d492 | Merge remote-tracking branch 'trac/develop' into 24655
|
comment:243 in reply to: ↑ 234 ; follow-up: ↓ 244 Changed 3 years ago by
Replying to saraedum:
embray: The monkey patching seems to work. So this needs review again.
Wait, what monkey patching? I've been AFK for 2 weeks so I'm not sure what you're referring to anymore :)
comment:244 in reply to: ↑ 243 Changed 3 years ago by
Replying to embray:
Replying to saraedum:
embray: The monkey patching seems to work. So this needs review again.
Wait, what monkey patching? I've been AFK for 2 weeks so I'm not sure what you're referring to anymore :)
I was referring to the injection of a bash script into the CI run through environment variables when I said that. Sorry, the term is probably not appropriate here.
comment:245 Changed 3 years ago by
Ah right, that. Well, I've never *seen* "monkey-patch" used that way before, nor have I even seen that trick used before, but I suppose "monkey-patch" isn't a bad term for it :)
comment:246 Changed 3 years ago by
- Commit changed from 569d492f83e15b7abfd79c5ddbdda51928187fe0 to baf71bdf605e916630e97bee88194401d5288584
Branch pushed to git repo; I updated commit sha1. New commits:
baf71bd | Merge remote-tracking branch develop into 24655
|
comment:247 Changed 3 years ago by
- Commit changed from baf71bdf605e916630e97bee88194401d5288584 to 89115fa2d0767256ce62e0a15d1263d0fc30f392
Branch pushed to git repo; I updated commit sha1. New commits:
89115fa | Merge remote-tracking branch 'trac/develop' into 24655
|
comment:248 Changed 3 years ago by
- Commit changed from 89115fa2d0767256ce62e0a15d1263d0fc30f392 to 6319a7728304d4bffbc477989a7d182067ff5c56
Branch pushed to git repo; I updated commit sha1. New commits:
6319a77 | Allow to push to another's user's namespace
|
comment:249 Changed 3 years ago by
- Commit changed from 6319a7728304d4bffbc477989a7d182067ff5c56 to 5d9b1a5c9ec4588567a8f34bdb895082a6c1c26d
Branch pushed to git repo; I updated commit sha1. This was a forced push. New commits:
5d9b1a5 | Allow to push to another's user's namespace
|
comment:250 Changed 3 years ago by
How is this coming along? Should I review this again and/or set positive review?
I have the gitlab/trac bot more or less ready to start work too.
comment:251 Changed 3 years ago by
- Commit changed from 5d9b1a5c9ec4588567a8f34bdb895082a6c1c26d to 4bdd32a4c7a9a434bc992be8d4e84ef7f2c89748
Branch pushed to git repo; I updated commit sha1. New commits:
4bdd32a | Retry build-from-clean on github
|
comment:252 Changed 3 years ago by
Yes, this is definitely ready for review. If you are happy with it, I would run a full set of tests before we really set it to positive review. But I am using this all the time myself and it seems to be sufficiently stable to be useful.
comment:253 Changed 3 years ago by
- Commit changed from 4bdd32a4c7a9a434bc992be8d4e84ef7f2c89748 to 754c52bcf2ded4813b440e8f15e4ee2f295c290a
Branch pushed to git repo; I updated commit sha1. New commits:
754c52b | Skip publication steps if SECRET_DOCKER_PASS has not been set
|
comment:254 Changed 3 years ago by
- Commit changed from 754c52bcf2ded4813b440e8f15e4ee2f295c290a to 894c5b8b5c55324015e1d3c0664409bc60c8b992
Branch pushed to git repo; I updated commit sha1. New commits:
894c5b8 | Fixup for 5d9b1a5c9ec4588567a8f34bdb895082a6c1c26d
|
comment:255 Changed 3 years ago by
- Commit changed from 894c5b8b5c55324015e1d3c0664409bc60c8b992 to d29058e7950b177f6d0c48c88248790ec6a86a3b
Branch pushed to git repo; I updated commit sha1. New commits:
d29058e | Merge remote-tracking branch 'trac/develop' into 24655
|
comment:256 follow-up: ↓ 258 Changed 3 years ago by
What's the status of the Circle-CI builds? Looking at https://circleci.com/gh/saraedum/sage it hasn't updated in a month and the last 2 builds were failures.
comment:257 follow-up: ↓ 259 Changed 3 years ago by
Also, what's up with the latest 3 builds on https://gitlab.com/saraedum/sage/pipelines ? It just says they got stuck/timed out, but with no log to show for it.
comment:258 in reply to: ↑ 256 Changed 3 years ago by
Replying to embray:
What's the status of the Circle-CI builds? Looking at https://circleci.com/gh/saraedum/sage it hasn't updated in a month and the last 2 builds were failures.
The builds failed because "build-from-clean" does not work anymore (Sage takes too much time now to build.) I have not tried the build-from-latest for a bit. If everything else looks good to you I would run a test of every scenario that I can think of, including these CircleCI builds.
comment:259 in reply to: ↑ 257 Changed 3 years ago by
Replying to embray:
Also, what's up with the latest 3 builds on https://gitlab.com/saraedum/sage/pipelines ? It just says they got stuck/timed out, but with no log to show for it.
The runners are currently down because the credit card linked to the google cloud account expired. The owner is working on fixing that. The build-from-clean only works with non-shared runners because Sage takes too long to build. The build-from-latest still works with shared runners.
comment:260 follow-up: ↓ 261 Changed 3 years ago by
I wish there was some way to see exactly which runner a pipeline ran on. I can't find anything to that effect, but you'd think one would want to know, especially if there were failures (especially since some failures can depend on the runner).
comment:261 in reply to: ↑ 260 Changed 3 years ago by
Replying to embray:
I wish there was some way to see exactly which runner a pipeline ran on. I can't find anything to that effect, but you'd think one would want to know, especially if there were failures (especially since some failures can depend on the runner).
The first line of the log says which runner it ran on exactly. I also print the specs of the runner in the logs.
comment:262 Changed 3 years ago by
Right, I see that now, and it works fine on successful builds. It's just odd that there's no output for the recent failures. Is that just because it can't even contact the runner?
comment:263 follow-up: ↓ 271 Changed 3 years ago by
Yes, here's an example where build-from-clean failed but still provided the runner info: https://gitlab.com/saraedum/sage/-/jobs/79424547
Is slelievre providing this runner?
comment:264 follow-up: ↓ 265 Changed 3 years ago by
Instead of having build-from-clean
rely on runners tagged "do" (which I understand stands for Digital Ocean, though out of context I just read it as the word "do"), maybe let's have a tag like "big" or "fat" indicating "big enough to build sage from scratch", since in principle such a runner could be hosted anywhere (in fact I will probably set one up on our OpenStack infrastructure ASAP).
comment:265 in reply to: ↑ 264 Changed 3 years ago by
Replying to embray:
Instead of having
build-from-clean
rely on runners tagged "do" (which I understand stands for Digital Ocean, though out of context I just read it as the word "do"), maybe let's have a tag like "big" or "fat" indicating "big enough to build sage from scratch", since in principle such a runner could be hosted anywhere (in fact I will probably set one up on our OpenStack infrastructure ASAP).
Amusingly, moments after I posted this, said OpenStack infrastructure where I am currently working on multiple VMs, went down completely. As if their internet uplink was killed or something. Completely unreachable.
comment:266 Changed 3 years ago by
I think you're right. The original idea was that "build-from-clean" worked for the free shared runners provided by gitlab.com that were tagged as "do". This is not reliably the case anymore so we should change that tag.
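For anyone provisioning such a runner, registration could look roughly like this (a sketch; URL, token variable, executor and description are illustrative, only the tag matters here):
gitlab-runner register \
  --non-interactive \
  --url https://gitlab.com/ \
  --registration-token "$REGISTRATION_TOKEN" \
  --executor docker \
  --docker-image docker:stable \
  --tag-list big \
  --description "runner big enough to build Sage from scratch"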
comment:267 Changed 3 years ago by
- Work issues set to replace "do" with something else
comment:268 Changed 3 years ago by
- Commit changed from d29058e7950b177f6d0c48c88248790ec6a86a3b to 2cec8ac704cddc2c65cbc439a7b07c7bcc953f1b
Branch pushed to git repo; I updated commit sha1. New commits:
2cec8ac | Replace do tag with big
|
comment:269 Changed 3 years ago by
- Commit changed from 2cec8ac704cddc2c65cbc439a7b07c7bcc953f1b to 624101115b9af149f8bf0d7cfe2f63d9ccf5201d
Branch pushed to git repo; I updated commit sha1. New commits:
6241011 | Fix namespacing for CircleCI
|
comment:270 follow-up: ↓ 272 Changed 3 years ago by
embray: I am not sure how we want to do the review on this one. I could do full tests of all the relevant scenarios today and tomorrow. The problem is: If during your review you find things that you want to see changed, then I have to go through the whole testing process again.
The alternative is: You review the current state, and I do the full test just before we set this to positive review.
The first option is more straightforward probably. But it only works if you think that there are not going to be relevant changes necessary. (I am not sure how familiar you are with the current state of this ticket.)
comment:271 in reply to: ↑ 263 Changed 3 years ago by
Replying to embray:
Yes, here's an example where build-from-clean failed but still provided the runner info: https://gitlab.com/saraedum/sage/-/jobs/79424547
Is slelievre providing this runner?
Yes, the ones he's running say "slelievre" in the first line. Mine say "jrueth".
comment:272 in reply to: ↑ 270 ; follow-up: ↓ 274 Changed 3 years ago by
Replying to saraedum:
embray: I am not sure how we want to do the review on this one. I could do full tests of all the relevant scenarios today and tomorrow. The problem is: If during your review you find things that you want to see changed, then I have to go through the whole testing process again.
The alternative is: You review the current state, and I do the full test just before we set this to positive review.
The first option is more straightforward probably. But it only works if you think that there ane not going to be relevant changes necessary. (I am not sure how familiar you are with the current state of this ticket.)
I'm mostly caught up on it--it is a lot though. As far as I'm concerned it's "positive review". While I believe there are still areas we'll want to improve, a lot of it comes down to computing resources (and other related issues like continuing to improve the resource usage of the doc build, for which I have ideas but haven't had a chance to try them yet).
Otherwise I believe you that it works, and I'd like to be able to start using it right away, so it's a provisional Positive Review from me. My only remaining question is: Do we want to go ahead now and bring this work up on sage-devel and ask others to weigh in? Or would that be too disruptive overall? (If nothing else we'll need to get Volker to at least agree to merge it.)
comment:273 follow-up: ↓ 275 Changed 3 years ago by
I think we should also approach companies like Google, Digital Ocean, Microsoft, etc. (and what about for OSX??) to see if we can get some donated resources from them for this purpose. I don't think it's outside the realm of possibility, especially with a decent proposal (this would definitely fall under OpenDreamKit work I think). But that's something we can work on in parallel.
comment:274 in reply to: ↑ 272 Changed 3 years ago by
- Work issues changed from replace "do" with something else to check that everything works ⇒ positive review
Replying to embray:
Replying to saraedum:
embray: I am not sure how we want to do the review on this one. I could do full tests of all the relevant scenarios today and tomorrow. The problem is: If during your review you find things that you want to see changed, then I have to go through the whole testing process again.
The alternative is: You review the current state, and I do the full test just before we set this to positive review.
The first option is more straightforward probably. But it only works if you think that there ane not going to be relevant changes necessary. (I am not sure how familiar you are with the current state of this ticket.)
I'm mostly caught up on it--it is a lot though. As far as I'm concerned it's "positive review". While I believe there are still areas we'll want to improve a lot of it comes down to computing resources (and other related issues like continuing to improve the resource usage of the doc build, for which I have ideas but haven't had a chance to try them yet).
Otherwise I believe you that it works, and I'd like to be able to start using it right away, so it's a provisional Positive Review from me.
Great. Let me test that everything works then. And improving resource usage of the build would be quite amazing. Let me know if I can support you in any way there.
My only remaining question is: Do we want to go ahead now and bring this work up on sage-devel and ask others to weight in? Or would that be too disruptive overall? (If nothing else we'll need to get Volker to at least agree to merge it.)
I don't think we need general consent from or discussion on sage-devel. For me this ticket is only about automatically building docker images to publish them on our Docker Hub account more conveniently. It's not about replacing the patchbot, migrating to GitLab, or anything like that.
At the same time this can of course be a first step towards more important changes, most prominently for me the binder ticket #24842. I'll put that one up for discussion on sage-devel once it's ready.
comment:275 in reply to: ↑ 273 ; follow-up: ↓ 277 Changed 3 years ago by
Replying to embray:
I think we should also approach companies like Google, Digital Ocean, Microsoft, etc. (and what about for OSX??) to see if we can get some donated resources from them for this purpose. I don't think it's outside the realm of possibility, especially with a decent proposal (this would definitely fall under OpenDreamKit? work I think). But that's something we can work on in parallel.
Could you create a trac ticket to keep track of this? Or an issue on the ODK github account?
comment:276 Changed 3 years ago by
- Work issues changed from check that everything works ⇒ positive review to check that everything works ⇒ (provisional) positive review
comment:277 in reply to: ↑ 275 ; follow-up: ↓ 278 Changed 3 years ago by
Replying to saraedum:
Replying to embray:
I think we should also approach companies like Google, Digital Ocean, Microsoft, etc. (and what about for OSX??) to see if we can get some donated resources from them for this purpose. I don't think it's outside the realm of possibility, especially with a decent proposal (this would definitely fall under OpenDreamKit? work I think). But that's something we can work on in parallel.
Could you create a trac ticket to keep track of this? Or an issue on the ODK github account?
Speaking of which, I was going to ask if you saw my messages on Zulip. But it appears to be down (and I'm afraid I might have broken it, somehow, by trying to log in): https://zulip.sagemath.org/
But I believe this could fall under D3.8
comment:278 in reply to: ↑ 277 Changed 3 years ago by
Replying to embray:
Speaking of which, I was going to ask if you saw my messages on Zulip. But it appears to be down (and I'm afraid I might have broken it, somehow, by trying to log in): https://zulip.sagemath.org/
I've upgraded zulip to the most recent version and restarted the server; it seems to be working now.
comment:279 Changed 3 years ago by
- Description modified (diff)
comment:280 Changed 3 years ago by
- Commit changed from 624101115b9af149f8bf0d7cfe2f63d9ccf5201d to 54cfea3463c4815df3c42a0c4468efc9511f3f0b
comment:281 Changed 3 years ago by
- Description modified (diff)
comment:282 Changed 3 years ago by
- Description modified (diff)
comment:283 Changed 3 years ago by
- Description modified (diff)
comment:284 Changed 3 years ago by
- Description modified (diff)
comment:285 Changed 3 years ago by
- Description modified (diff)
comment:286 follow-up: ↓ 288 Changed 3 years ago by
[ ] build-from-clean works in a user namespace on CircleCI, https://circleci.com/workflow-run/4ae6af8c-2212-4724-a865-a401be4bd8b7
[ ] build-from-latest works and is fast in a user namespace on GitLab?
[ ] build-from-latest works and is fast in a user namespace on CircleCI
In this case what do you mean by "user namespace"? You mean like, a personal fork? I could try that. Would I need to set up a self-provided runner? I can do that too (and should practice that anyways).
comment:287 Changed 3 years ago by
- Description modified (diff)
comment:288 in reply to: ↑ 286 ; follow-up: ↓ 289 Changed 3 years ago by
Replying to embray:
[ ] build-from-clean works in a user namespace on CircleCI, https://circleci.com/workflow-run/4ae6af8c-2212-4724-a865-a401be4bd8b7
[ ] build-from-latest works and is fast in a user namespace on GitLab?
[ ] build-from-latest works and is fast in a user namespace on CircleCI
In this case what do you mean by "user namespace"? You mean like, a personal fork? I could try that.
Yes, I mean the typical use case for personal forks. More specifically, I meant the case where we are uploading to a docker hub account that is not the sagemath account. You might want to wait until https://gitlab.com/saraedum/sage/pipelines/25831675 has completed (so the sagemath/sagemath-dev:develop image has been upgraded), otherwise it won't be fast. Once that has happened, a branch that contains the changes in this ticket should build automatically on any GitLab fork without further setup.
Would I need to set up a self-provided runner? I can do that too (and should practice that anyways).
No, you don't need private runners for this. The free shared runners should be sufficient.
comment:289 in reply to: ↑ 288 Changed 3 years ago by
Replying to saraedum:
Replying to embray:
[ ] build-from-clean works in a user namespace on CircleCI, https://circleci.com/workflow-run/4ae6af8c-2212-4724-a865-a401be4bd8b7
[ ] build-from-latest works and is fast in a user namespace on GitLab?
[ ] build-from-latest works and is fast in a user namespace on CircleCI
In this case what do you mean by "user namespace"? You mean like, a personal fork? I could try that.
Yes, I mean the typical use case for personal forks. More specifically, I meant the case where we are uploading to a docker hub account that is not the sagemath account. You might want to wait until https://gitlab.com/saraedum/sage/pipelines/25831675 has completed (so the sagemath/sagemath-dev:develop image has been upgraded), otherwise it won't be fast. Once that has happened, a branch that contains the changes in this ticket should build automatically on any GitLab fork without further setup.
What about circle-ci? For that surely I'd at least have to enable it in my repository?
comment:290 Changed 3 years ago by
Yes, once the sagemath/sagemath-dev:develop
has been upgraded, it should work by simply enabling it in your repository with no further configuration.
comment:291 Changed 3 years ago by
- Description modified (diff)
comment:292 Changed 3 years ago by
- Description modified (diff)
comment:293 Changed 3 years ago by
- Description modified (diff)
comment:294 Changed 3 years ago by
- Description modified (diff)
comment:295 Changed 3 years ago by
- Description modified (diff)
comment:296 follow-up: ↓ 300 Changed 3 years ago by
Everything relevant works. However, strangely, when "building from latest", the resulting images have different sizes on CircleCI and GitLab CI:
sagemath: 611MB vs 635MB
sagemath-dev: 4GB vs 2GB
The dev images are not relevant in this case but still this is unfortunate. More importantly I would like to understand why these are different in size as the docker builds should be the same on both platforms.
comment:297 Changed 3 years ago by
- Description modified (diff)
comment:298 Changed 3 years ago by
- Commit changed from 54cfea3463c4815df3c42a0c4468efc9511f3f0b to 220d948c93048bab2a5a909b377f012374c01b06
comment:299 Changed 3 years ago by
- Commit changed from 220d948c93048bab2a5a909b377f012374c01b06 to c40bb81d324fc08f1ea6179626e0a5ddcddd5fbb
Branch pushed to git repo; I updated commit sha1. New commits:
c40bb81 | Add reference to known chown bug in docker/aufs
|
comment:300 in reply to: ↑ 296 ; follow-up: ↓ 304 Changed 3 years ago by
Replying to saraedum:
Everything relevant works. However, strangely, when "building from latest", the resulting images have different sizes on CircleCI and GitLab CI:
sagemath: 611MB vs 635MB
This is due to a difference in docker daemon versions. I don't know what the difference between the two images is, but I now tell CircleCI to build using the latest stable version and both are 635MB in size.
sagemath-dev: 4GB vs 2GB
This is a known bug in docker on aufs (which CircleCI is using.) There isn't really anything we can do about it.
comment:301 follow-up: ↓ 307 Changed 3 years ago by
- Work issues changed from check that everything works ⇒ (provisional) positive review to 8.3rc2 release ⇒ positive review
embray: I'll set this to positive review once I released 8.3rc2 as a final check.
comment:302 Changed 3 years ago by
- Description modified (diff)
comment:303 follow-up: ↓ 305 Changed 2 years ago by
There's a problem with these Docker images, which is that it's impossible to install optional packages, which is a thing people are going to want to be able to do. This is a problem, of course, because being able to install optional packages means at least having some minimal build toolchain available in the image.
I suppose they could apt-get install
that after the fact, if desired (though this needs to be documented somehow). But then there's the problem that /home/sage/sage
is thoroughly stripped down, and does not have configure
, build/make/Makefile
, or anything else required for sage -i
to work at a minimum. This leads to the unhelpful error message
/home/sage/sage/src/bin/sage: line 353: make: command not found
or even if you sudo apt-get install make
:
make: *** No rule to make target 'all-toolchain'. Stop.
I'm not sure what we do about this.
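For concreteness, one way to reproduce the problem described above (a sketch; the tag and the optional package name are just examples):
docker run --rm -it sagemath/sagemath:latest sage -i gap_packages
# fails with: /home/sage/sage/src/bin/sage: line 353: make: command not found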
comment:304 in reply to: ↑ 300 Changed 2 years ago by
Replying to saraedum:
Replying to saraedum:
Everything relevant works. However, strangely, when "building from latest", the resulting images have different sizes on CircleCI and GitLab CI:
sagemath: 611MB vs 635MB
This is due to a difference in docker daemon version. I don't know what's the difference between the two images but I know tell CircleCI to build using the latest stable version and both are 635MB in size.
sagemath-dev: 4GB vs 2GB
This is a known bug in docker on aufs (that CircleCI is using.) There's isn't really anything we can do about it.
Thanks for researching and making note of it. What a pain. I wish Docker would like, stabilize :)
Do you know if this issue has been brought up with CircleCI?
comment:305 in reply to: ↑ 303 ; follow-up: ↓ 306 Changed 2 years ago by
Replying to embray:
There's a problem with these Docker images, which is that it's impossible to install optional packages, which is a thing people are going to want to be able to do. This is a problem, of course, because being able to install optional packages means at least having some minimal build toolchain available in the image.
I suppose they could apt-get install that after the fact, if desired (though this needs to be documented somehow). But then there's the problem that /home/sage/sage is thoroughly stripped down, and does not have configure, build/make/Makefile, or anything else required for sage -i to work at a minimum. This leads to the unhelpful error message
/home/sage/sage/src/bin/sage: line 353: make: command not found
or even if you sudo apt-get install make:
make: *** No rule to make target 'all-toolchain'. Stop.
I'm not sure what we do about this.
No clue really. sage -pip
works afaik, so that's good enough for some I think.
I am not sure who would want to install more packages. Since you need to add a bunch of build dependencies, we could consider pointing people to the dev images instead maybe? Actually, many optional dependencies should already be detected by the features framework, so they could be installed through apt-get?
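A rough sketch of that route (the package names are illustrative; which system packages the feature framework actually picks up at runtime varies):
# start a container from the published image
docker run --rm -it sagemath/sagemath:latest bash
# then, inside the container, add system packages that Sage only detects at runtime
sudo apt-get update && sudo apt-get install -y ffmpeg graphviz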
comment:306 in reply to: ↑ 305 Changed 2 years ago by
Replying to saraedum:
Replying to embray:
There's a problem with these Docker images, which is that it's impossible to install optional packages, which is a thing people are going to want to be able to do. This is a problem, of course, because being able to install optional packages means at least having some minimal build toolchain available in the image.
I suppose they could apt-get install that after the fact, if desired (though this needs to be documented somehow). But then there's the problem that /home/sage/sage is thoroughly stripped down, and does not have configure, build/make/Makefile, or anything else required for sage -i to work at a minimum. This leads to the unhelpful error message
/home/sage/sage/src/bin/sage: line 353: make: command not found
or even if you sudo apt-get install make:
make: *** No rule to make target 'all-toolchain'. Stop.
I'm not sure what we do about this.
No clue really.
sage -pip
works afaik, so that's good enough for some I think.
Not for anything that's installed with sage -i
I am not sure who would want to install more packages. Since you need to add a bunch of build dependencies, we could consider pointing people to the dev images instead maybe?
For now that's what I've done, and what I suppose we can continue to do for now. Installing optional packages works for the dev images.
Actually many optional dependencies should already be detected by the features framework so they could be installed through
apt-get
?
Yes, that's something we should work on separately. There should be a framework for optional packages such that we can easily supply the distro-specific commands to install any dependencies needed for optional packages. For the docker images (or any other Sage installed on debian or ubuntu) this would of course be the appropriate apt-get
command. For other distros it might supply the correct package names to pass to emerge, yum, etc. This info could come from a distro-specific config file, that we can either maintain in the Sage repo directly, or that could be provided as a patch by downstream distributors.
For this to work we also need to include, at a minimum, the build/make/Makefile
in the installed Sage as well. Anyways, I don't think that's something we should do for this ticket. Just something to consider.
comment:307 in reply to: ↑ 301 Changed 2 years ago by
Replying to saraedum:
embray: I'll set this to positive review once I released 8.3rc2 as a final check.
+1
comment:308 Changed 2 years ago by
- Commit changed from c40bb81d324fc08f1ea6179626e0a5ddcddd5fbb to 8b104295c4bb8f8a8256aa0ccf1cd6bd6b332588
Branch pushed to git repo; I updated commit sha1. New commits:
8b10429 | Merge remote-tracking branch 'trac/develop' into 24655
|
comment:309 Changed 2 years ago by
- Status changed from needs_review to positive_review
- Work issues 8.3rc2 release ⇒ positive review deleted
The 8.3rc2 release worked fine, so we can finally set this to positive review.
comment:310 Changed 2 years ago by
- Commit changed from 8b104295c4bb8f8a8256aa0ccf1cd6bd6b332588 to d313c11a504baf2c6476357d345f121e0fb855e9
- Status changed from positive_review to needs_review
Branch pushed to git repo; I updated commit sha1 and set ticket back to needs_review. New commits:
d313c11 | Merge remote-tracking branch 'trac/develop' into 24655
|
comment:311 Changed 2 years ago by
- Status changed from needs_review to positive_review
No need to set this back to needs review after a trivial merge with trac/develop.
comment:312 Changed 2 years ago by
+1
comment:313 Changed 2 years ago by
- Commit changed from d313c11a504baf2c6476357d345f121e0fb855e9 to 4152838ef924933eca5ee54447826c803d2dff00
- Status changed from positive_review to needs_review
Branch pushed to git repo; I updated commit sha1 and set ticket back to needs_review. New commits:
4152838 | Merge remote-tracking branch 'trac/develop' into 24655
|
comment:314 Changed 2 years ago by
- Status changed from needs_review to positive_review
(trivial merge)
comment:315 Changed 2 years ago by
Btw., I just published 8.3 on sagemath/sagemath and sagemath/sagemath-dev.
comment:316 Changed 2 years ago by
- Commit changed from 4152838ef924933eca5ee54447826c803d2dff00 to a85023cef50c4cfb99a2a2bdd3504cd78b4c0327
- Status changed from positive_review to needs_review
Branch pushed to git repo; I updated commit sha1 and set ticket back to needs_review. New commits:
a85023c | Merge remote-tracking branch 'trac/develop' into 24655
|
comment:317 Changed 2 years ago by
- Status changed from needs_review to positive_review
Back to positive review after a trivial merge. (vbraun: I am merging in develop as this triggers automatic builds of our docker images on gitlab.com.)
comment:318 Changed 2 years ago by
- Commit changed from a85023cef50c4cfb99a2a2bdd3504cd78b4c0327 to 6fa8bc6a16cfb7ce61598f38052729775cf898c4
- Status changed from positive_review to needs_review
Branch pushed to git repo; I updated commit sha1 and set ticket back to needs_review. New commits:
6fa8bc6 | Builds fail with 8.4.beta0 but the output is truncated and it is not clear what's the issue.
|
comment:319 Changed 2 years ago by
- Commit changed from 6fa8bc6a16cfb7ce61598f38052729775cf898c4 to 3af7c5006c0a9b3583f2bb2041c815c607b2f3f2
Branch pushed to git repo; I updated commit sha1. New commits:
3af7c50 | Merge remote-tracking branch 'trac/develop' into HEAD
|
comment:320 Changed 2 years ago by
embray, I have a feeling that the build now hangs more frequently than it used to. I have now seen this twice locally and I had CI machines time out after 12 hours of trying to build Sage. (I checked locally that nothing was happening anymore in the background.)
comment:321 follow-up: ↓ 328 Changed 2 years ago by
- Status changed from needs_review to positive_review
I can't imagine why. Nobody else has reported builds hanging. I'm not sure what you mean in this case by "the build", because there's nothing special about the way Sage is being built here, is there?
This ticket has gone on too long. Let's see if we can get Volker to merge it and then we can continue tweaking from there. It might also help if you could squash down some of the massive commit history into something a little easier to digest, but I know that can be daunting in its own right...
comment:322 follow-up: ↓ 325 Changed 2 years ago by
Also, can you tell where the builds have been hanging?
There's been some tinkering with Docker Hub lately, and I know they have some planned downtime coming up.
comment:323 follow-up: ↓ 326 Changed 2 years ago by
I'm a little confused by the latest jobs on https://gitlab.com/saraedum/sage/pipelines
The last three jobs were all building the exact same commit: https://gitlab.com/saraedum/sage/commit/3af7c5006c0a9b3583f2bb2041c815c607b2f3f2
But one was triggered on an update to the "develop" branch, one was triggered on an update to a "build-from-latest" branch (what's that about?--apologies if I've already asked), and one was triggered by the new 8.4.beta1 tag. Although, the way Sage is currently developed, updates to the "develop" branch always coincide with new tags as well. I don't like that, but for now it's superfluous to trigger builds on new tags.
Also, with the two that failed, it looks like I was right about it being something to do with Docker Hub, but I could be misreading the logs. I'm not sure if that's what you're referring to though.
comment:324 follow-up: ↓ 327 Changed 2 years ago by
The latest Circle-CI build, however, timed out at
 ---> 3753d48ddd74
Step 68/76 : RUN if [ x"$ARTIFACT_BASE" != x"source-clean" ]; then mkdir -p $HOME/patch && find . -type f > $HOME/make-fast-rebuild-clean.manifest && cat $HOME/make-fast-rebuild-clean.manifest $HOME/artifact-base.manifest | sort | uniq -u > $HOME/obsolete && find . -type f -cnewer $HOME/artifact-base.manifest > $HOME/modified && tar -cJf $HOME/patch/modified.tar.xz -T $HOME/modified && cat $HOME/obsolete $HOME/modified | xz > $HOME/patch/modified.xz && rm -rf $SAGE_ROOT && mkdir -p $SAGE_ROOT && mv $HOME/patch $SAGE_ROOT/; fi
 ---> Running in ea5237ee3bbd
Build timedout
I have no idea what any of that means.
(As an aside, I find it annoying that Circle-CI will show you the beginning of the log, but you have to download the full log to see the end. On GitLab it will also truncate the log, but it shows the end of the log first, which is almost always more useful when it comes to diagnosing problems. I wonder if that's at all configurable on Circle...)
comment:325 in reply to: ↑ 322 Changed 2 years ago by
Replying to embray:
Also, can you tell where the builds have been hanging?
Unfortunately I don't really know. My top showed a python2 process that did not do anything anymore but that apparently was the only thing that make was waiting for. I did not investigate further.
On GitLab I could not see where it hung because we truncate the logs ourselves and we were already in that portion.
They've been some tinkering with Docker Hub lately, and I know they have some planned downtime coming up.
Yes, that explains some of the errors when pushing images in the end. But that's not what I have been referring to here.
comment:326 in reply to: ↑ 323 ; follow-up: ↓ 331 Changed 2 years ago by
Replying to embray:
I'm a little confused by the latest jobs on https://gitlab.com/saraedum/sage/pipelines
The last three jobs were all building the exact same commit: https://gitlab.com/saraedum/sage/commit/3af7c5006c0a9b3583f2bb2041c815c607b2f3f2
Yes, that's fine. I pushed one to a branch (that's not called develop and not called master) to see if the build-from-latest still works. However, due to quite a few SPKG changes, the build hit the 3h time limit that is in place for build-from-latest (we use the public shared GitLab runners, and they enforce that time limit.)
The other two runs are one build-from-clean for the develop branch and one build-from-clean for the 8.4.beta1 tag. That's intentional and how GitLab CI works. The rules don't determine that there are several triggers that come from the same commit. CircleCI does that and it is imho very annoying to work around.
As a result, the develop and the 8.4.beta1 tag on docker hub do not come from the same run of "docker build". But since "docker build" should be reproducible anyway (in particular if we build-from-clean) I don't see this as a problem apart from the wasted resources.
comment:327 in reply to: ↑ 324 ; follow-up: ↓ 330 Changed 2 years ago by
Replying to embray:
The latest Circle-CI build, however, timed out at
---> 3753d48ddd74 Step 68/76 : RUN if [ x"$ARTIFACT_BASE" != x"source-clean" ]; then mkdir -p $HOME/patch && find . -type f > $HOME/make-fast-rebuild-clean.manifest && cat $HOME/make-fast-rebuild-clean.manifest $HOME/artifact-base.manifest | sort | uniq -u > $HOME/obsolete && find . -type f -cnewer $HOME/artifact-base.manifest > $HOME/modified && tar -cJf $HOME/patch/modified.tar.xz -T $HOME/modified && cat $HOME/obsolete $HOME/modified | xz > $HOME/patch/modified.xz && rm -rf $SAGE_ROOT && mkdir -p $SAGE_ROOT && mv $HOME/patch $SAGE_ROOT/; fi ---> Running in ea5237ee3bbd Build timedout
Yes, build-from-clean times out on CircleCI. That's a known and documented issue.
(As an aside, I find it annoying that Circle-CI will show you the beginning of the log, but you have to download the full log to see the end. On GitLab it will also truncate the log, but it shows the end of the log first, which is almost always more useful when it comes to diagnosing problems. I wonder if that's at all configurable on Circle...)
No, it can't be configured and it's a common complaint about CircleCI.
comment:328 in reply to: ↑ 321 Changed 2 years ago by
Replying to embray:
I can't imagine why. Nobody else has reported builds hanging. I'm not sure what you mean in this case by "the build", because there's nothing special about the way sage is being built ist here?
This ticket has gone on too long. Let's see if we can get Volker to merge it and then we can continue tweaking from there. It might also help if you could squash down some of the massive commit history into something a little easier to digest, but I know that can be daunting in its own right...
I prefer not to squash the commits just because it's a lot of work. I am trying to write meaningful commit messages so if somebody really looks at the commit history some day it should be possible to make sense out of it.
But, yeah, let's get this back to positive review.
comment:329 Changed 2 years ago by
I spent some time reading through the log of the recent Circle-CI build that timed out, and that was at least edifying, to understand better step-by-step what the process is. You've walked me through it a few times but it's still...complicated, to say the least.
I'm still not clear on what these timeouts you're referring to are, then, if it's not anything I could see on the recent builds...
comment:330 in reply to: ↑ 327 ; follow-up: ↓ 333 Changed 2 years ago by
Replying to saraedum:
Replying to embray:
The latest Circle-CI build, however, timed out at
---> 3753d48ddd74 Step 68/76 : RUN if [ x"$ARTIFACT_BASE" != x"source-clean" ]; then mkdir -p $HOME/patch && find . -type f > $HOME/make-fast-rebuild-clean.manifest && cat $HOME/make-fast-rebuild-clean.manifest $HOME/artifact-base.manifest | sort | uniq -u > $HOME/obsolete && find . -type f -cnewer $HOME/artifact-base.manifest > $HOME/modified && tar -cJf $HOME/patch/modified.tar.xz -T $HOME/modified && cat $HOME/obsolete $HOME/modified | xz > $HOME/patch/modified.xz && rm -rf $SAGE_ROOT && mkdir -p $SAGE_ROOT && mv $HOME/patch $SAGE_ROOT/; fi ---> Running in ea5237ee3bbd Build timedoutYes, build-from-clean times out on CircleCI. That's a known and documented issue.
Is it? I see your note in the Dockerfile about it making the images too big, but not about it timing out. If it's a known issue then shouldn't we just disable it for now?
comment:331 in reply to: ↑ 326 ; follow-up: ↓ 332 Changed 2 years ago by
Replying to saraedum:
The other two runs are one build-from-clean for the develop branch and one build-from-clean for the 8.4.beta1 tag. That's intentional and how GitLab CI works. The rules don't determine that there are several triggers that come from the same commit. CircleCI does that and it is imho very annoying to work around.
As a result, the develop and the 8.4.beta1 tag on docker hub do not come from the same run of "docker build". But since "docker build" should be reproducible anyway (in particular if we build-from-clean) I don't see this as a problem apart from the wasted resources.
I still feel like we ought to be able to work around that somehow. I don't see why we would want to do two complete build-from-cleans of the same exact commit just so they wind up with different tags on Docker Hub.
I know you've thought about this more than I have so I'm probably missing something. But my thinking is that until and unless Sage's "develop" branch is decoupled from its tags (something I think should happen but is not the case for now), we should edit the .gitlab-ci.yml
and .circleci/config.yml
to really only do build-from-clean on the develop
branch (not even master). If we want to then tag an image with a version number, we can take the current commit ID on develop, do git describe --tags <commit id>
, and get the associated tag name. Then use that to tag the image.
comment:332 in reply to: ↑ 331 ; follow-up: ↓ 336 Changed 2 years ago by
Replying to embray:
Replying to saraedum:
The other two runs are one build-from-clean for the develop branch and one build-from-clean for the 8.4.beta1 tag. That's intentional and how GitLab CI works. The rules don't determine that there are several triggers that come from the same commit. CircleCI does that and it is imho very annoying to work around.
As a result, the develop and the 8.4.beta1 tag on docker hub do not come from the same run of "docker build". But since "docker build" should be reproducible anyway (in particular if we build-from-clean) I don't see this as a problem apart from the wasted resources.
I still feel like we ought to be able to work around that somehow. I don't see why we would want to do two complete build-from-cleans of the same exact commit just so they wind up with different tags on Docker Hub.
It's a bit weird, I agree. But that's how people do that on GitLab CI. And usually builds don't take five hours, so it doesn't matter.
I know you've thought about this more than I have so I'm probably missing something. But my thinking is that until and unless Sage's "develop" branch is decoupled from its tags (something I think should happen but is not the case for now), we should edit the
.gitlab-ci.yml
and.circleci/config.yml
to really only do build-from-clean on thedevelop
branch (not even master). If we want to then tag an image with a version number, we can take the current commit ID on develop, dogit describe --tags <commit id>
, and get the associated tag name. Then use that to tag the image.
This would break if the branch and the tag are not pushed at the same time. Also, it's reasonable that you don't want to push all tags but just protected tags, e.g., I don't want to push the tags that I just use to check that building from tags actually works.
In short, I see your point but I'd rather leave it the way it is.
comment:333 in reply to: ↑ 330 ; follow-up: ↓ 334 Changed 2 years ago by
Replying to embray:
Replying to saraedum:
Replying to embray:
The latest Circle-CI build, however, timed out at
---> 3753d48ddd74 Step 68/76 : RUN if [ x"$ARTIFACT_BASE" != x"source-clean" ]; then mkdir -p $HOME/patch && find . -type f > $HOME/make-fast-rebuild-clean.manifest && cat $HOME/make-fast-rebuild-clean.manifest $HOME/artifact-base.manifest | sort | uniq -u > $HOME/obsolete && find . -type f -cnewer $HOME/artifact-base.manifest > $HOME/modified && tar -cJf $HOME/patch/modified.tar.xz -T $HOME/modified && cat $HOME/obsolete $HOME/modified | xz > $HOME/patch/modified.xz && rm -rf $SAGE_ROOT && mkdir -p $SAGE_ROOT && mv $HOME/patch $SAGE_ROOT/; fi ---> Running in ea5237ee3bbd Build timedoutYes, build-from-clean times out on CircleCI. That's a known and documented issue.
Is it? I see your note in the Dockerfile about it making the images too big, but not about it timing out. If it's a known issue then shouldn't we just disable it for now?
It's in the CircleCI configuration file:
# To make the build time not too excessive, we seed the build cache with
# sagemath/sagemath-dev:develop. When basic SPKGs change, this does not help much,
# still the full build does usually not exceed CircleCI's limits for open
# source projcets (five hours on 2 vCPUs as of early 2018.)
# You might want to try to build locally or with GitLab CI, see
# `.gitlab-ci.yml` for more options.
comment:334 in reply to: ↑ 333 Changed 2 years ago by
Replying to saraedum:
Replying to embray:
Replying to saraedum:
Replying to embray:
The latest Circle-CI build, however, timed out at
---> 3753d48ddd74 Step 68/76 : RUN if [ x"$ARTIFACT_BASE" != x"source-clean" ]; then mkdir -p $HOME/patch && find . -type f > $HOME/make-fast-rebuild-clean.manifest && cat $HOME/make-fast-rebuild-clean.manifest $HOME/artifact-base.manifest | sort | uniq -u > $HOME/obsolete && find . -type f -cnewer $HOME/artifact-base.manifest > $HOME/modified && tar -cJf $HOME/patch/modified.tar.xz -T $HOME/modified && cat $HOME/obsolete $HOME/modified | xz > $HOME/patch/modified.xz && rm -rf $SAGE_ROOT && mkdir -p $SAGE_ROOT && mv $HOME/patch $SAGE_ROOT/; fi ---> Running in ea5237ee3bbd Build timedoutYes, build-from-clean times out on CircleCI. That's a known and documented issue.
Is it? I see your note in the Dockerfile about it making the images too big, but not about it timing out. If it's a known issue then shouldn't we just disable it for now?
It's in the CircleCI configuration file:
# To make the build time not too excessive, we seed the build cache with # sagemath/sagemath-dev:develop. When basic SPKGs change, this does not help much, # still the full build does usually not exceed CircleCI's limits for open # source projcets (five hours on 2 vCPUs as of early 2018.) # You might want to try to build locally or with GitLab CI, see # `.gitlab-ci.yml` for more options.
But it says the wrong thing, ouch.
comment:335 Changed 2 years ago by
- Commit changed from 3af7c5006c0a9b3583f2bb2041c815c607b2f3f2 to ac6201a942aa19b6181c463cac8e9588e92b3d8a
- Status changed from positive_review to needs_review
Branch pushed to git repo; I updated commit sha1 and set ticket back to needs_review. New commits:
ac6201a | Timeouts DO happen on CircleCI
|
comment:336 in reply to: ↑ 332 ; follow-up: ↓ 339 Changed 2 years ago by
Replying to saraedum:
Replying to embray:
Replying to saraedum:
The other two runs are one build-from-clean for the develop branch and one build-from-clean for the 8.4.beta1 tag. That's intentional and how GitLab CI works. The rules don't determine that there are several triggers that come from the same commit. CircleCI does that and it is imho very annoying to work around.
As a result, the develop and the 8.4.beta1 tag on docker hub do not come from the same run of "docker build". But since "docker build" should be reproducible anyway (in particular if we build-from-clean) I don't see this as a problem apart from the wasted resources.
I still feel like we ought to be able to work around that somehow. I don't see why we would want to do two complete build-from-cleans of the same exact commit just so they wind up with different tags on Docker Hub.
It's a bit weird, I agree. But that's how people do that on GitLab CI. And usually builds don't take five hours, so it doesn't matter.
Usually...
I know you've thought about this more than I have so I'm probably missing something. But my thinking is that until and unless Sage's "develop" branch is decoupled from its tags (something I think should happen but is not the case for now), we should edit the
.gitlab-ci.yml
and.circleci/config.yml
to really only do build-from-clean on thedevelop
branch (not even master). If we want to then tag an image with a version number, we can take the current commit ID on develop, dogit describe --tags <commit id>
, and get the associated tag name. Then use that to tag the image.This would break if the branch and the tag are not pushed at the same time. Also, it's reasonable that you don't want to push all tags but just protected tags, e.g., I don't want to push the tags that I just use to check that building from tags actually works.
In short, I see your point but I'd rather leave it the way it is.
Why not build just the tags then?
comment:337 Changed 2 years ago by
- Status changed from needs_review to positive_review
comment:338 Changed 2 years ago by
Back to positive review. Only a minor documentation change.
So, I would not disable the CircleCI builds as it does not do any harm:
We won't configure CircleCI on our sagemath GitHub repository (because it does not work reliably and we are building on GitLab CI anyway), so there are not going to be confusing messages there.
- If somebody has a clone of our repo and CircleCI enabled, then these builds only happen if they push to master or develop or if they push a tag (something people usually don't do in forks).
Finally, it sometimes works. It really seems to be about how much load is on the actual machine that runs your docker container. I've seen builds complete after 4:59 or sometimes after 4:45. If we drop a standard SPKG (Python 2 eventually?) or if Cython 2.9.0 or the next GCC compiles faster for some reason, the builds are going to work reliably.
That said, #26085 should communicate such things more clearly in the actual CI output.
comment:339 in reply to: ↑ 336 ; follow-up: ↓ 340 Changed 2 years ago by
Replying to embray:
Why not build just the tags then?
I would need some logic that decides whether this tag is a "develop" or a "latest" tag. And maybe even have to decide whether this is really "later" than what's currently there. It's certainly doable but I don't see enough of an advantage really.
comment:340 in reply to: ↑ 339 ; follow-up: ↓ 341 Changed 2 years ago by
Replying to saraedum:
Replying to embray:
Why not build just the tags then?
I would need some logic that decides whether this tag is a "develop" or a "latest" tag. And maybe even have to decide whether this is really "later" than what's currently there. It's certainly doable but I don't see enough of an advantage really.
I'm not sure what you mean by this. Don't we have to do that anyways, somewhere, based on the version?
comment:341 in reply to: ↑ 340 Changed 2 years ago by
Replying to embray:
Replying to saraedum:
Replying to embray:
Why not build just the tags then?
I would need some logic that decides whether this tag is a "develop" or a "latest" tag. And maybe even have to decide whether this is really "later" than what's currently there. It's certainly doable but I don't see enough of an advantage really.
I'm not sure what you mean by this. Don't we have to do that anyways, somewhere, based on the version?
No, the master branch pushes to the "latest" tag, the develop branch pushes to the "develop" tag. Or, was that not your question maybe?
comment:342 follow-up: ↓ 343 Changed 2 years ago by
I just don't see why we should build the exact same thing up to three times every time there's a release. I looked at what happened when 8.3 was released and that added up to more than 15 hours worth of building (some of it was because push-dockerhub-dev took 2.5 hours??)--all the exact same thing--three times.
I don't know why we need to use the git branch/tag that happens to be being built to determine which docker-tag to push.
Every time 'develop' is updated a tag is created, and every time the tag is for a final release, 'master' also gets updated. I don't like this process or the branch names but for now it is what it is. Thus, it's only necessary to trigger a build-from-clean when a tag is created. With docker you can re-tag the same image as many times as you want and push it to Docker Hub, and that can be done depending on what the version number in the tag is.
So what I'm proposing is to have one build, when a tag is created, with some <version>, then:
- Tag the image sagemath:<version>; push
- If the version ends with .betaN or .rcN, re-tag the image sagemath:develop; push
- Else re-tag the image sagemath:latest; push
One build resulting in 2 docker-tags for the same image, each time.
I think, theoretically, the way you do things currently makes sense, and might make sense to revert to at some point, but practically-speaking, right now, I don't see why we should do build-from-clean for anything but tags.
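A sketch of what such a publish step could look like (this follows the proposal above, not the current implementation; BUILT and VERSION are placeholders for the image produced by the build job and the git tag that triggered it):
BUILT=sagemath-built                    # placeholder: image produced by the build job
VERSION=8.4.beta1                       # placeholder: the git tag that triggered the build
docker tag "$BUILT" "sagemath/sagemath:$VERSION"
docker push "sagemath/sagemath:$VERSION"
case "$VERSION" in
  *.beta*|*.rc*) EXTRA_TAG=develop ;;   # pre-releases update the develop tag
  *)             EXTRA_TAG=latest ;;    # final releases update the latest tag
esac
docker tag "$BUILT" "sagemath/sagemath:$EXTRA_TAG"
docker push "sagemath/sagemath:$EXTRA_TAG"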
comment:343 in reply to: ↑ 342 Changed 2 years ago by
Replying to embray:
So what I'm proposing is to have one build, when a tag is created, with some <version>, then:
- Tag the image sagemath:<version>; push
- If the version ends with .betaN or .rcN, re-tag the image sagemath:develop; push

That's not correct, all tags should go to sagemath:develop.

- Else re-tag the image sagemath:latest; push

One build resulting in 2 docker-tags for the same image, each time.
I think, theoretically, the way you do things currently makes sense, and might make sense to revert to at some point, but practically-speaking, right now, I don't see why we should do build-from-clean for anything but tags.
I think, theoretically, the way you do things currently makes sense, and might make sense to revert to at some point, but practically-speaking, right now, I don't see why we should do build-from-clean for anything but tags.
Is it really an issue? It's a couple of hours of build time every week, but your proposal would hardcode knowledge about our release process, which I don't like. Also, I'd like to add the release manager's integration branch to the list of build-from-clean branches at some point (though those builds should perhaps not go to Docker Hub), and then we'd need that logic again anyway.
comment:344 Changed 2 years ago by
Also, there are some annoying races here (the following situation has happened to me when we worked with preemptible runners a lot.) With your proposed model the following could happen:
- Create git-tag 9.0.
- The CI fails for some reason, say a timeout pushing to docker hub.
- Create git-tag 9.1.beta0
- The CI pushes to sagemath:9.1.beta0 and sagemath:develop
- Somebody realizes that the sagemath:9.0 build is missing and presses "retry" in the CI.
- The CI pushes to sagemath:9.0, sagemath:latest, and sagemath:develop. Now sagemath:develop is not the latest beta.
This can't happen if you have develop branch → develop docker-tag; git-tag → docker-tag.
comment:345 Changed 2 years ago by
- Branch changed from u/saraedum/24655 to ac6201a942aa19b6181c463cac8e9588e92b3d8a
- Resolution set to fixed
- Status changed from positive_review to closed
comment:346 Changed 2 years ago by
- Commit ac6201a942aa19b6181c463cac8e9588e92b3d8a deleted
Somehow 7d85dc796 ("Something related to the sphinxbuild seems to be leaking memory") causes the docbuild to segfault on nix. Since the docbuild is running on 4 cores, as far as I can see the commit shouldn't actually change anything. Any idea?
```
[algebras ] loading pickled environment... not yet created
[algebras ] building [inventory]: targets for 67 source files that are out of date
[algebras ] updating environment: 67 added, 0 changed, 0 removed
[algebras ] The HTML pages are in share/doc/sage/inventory/en/reference/algebras.
Build finished. The built documents can be found in /build/share/doc/sage/inventory/en/reference/algebras
[combinat ] loading pickled environment... not yet created
[combinat ] building [inventory]: targets for 367 source files that are out of date
[combinat ] updating environment: 367 added, 0 changed, 0 removed
Error building the documentation.
Traceback (most recent call last):
  File "/nix/store/9lvgqr20r7j7b6a4fmhw6n82spd9rafq-python-2.7.15-env/lib/python2.7/runpy.py", line 174, in _run_module_as_main
    "__main__", fname, loader, pkg_name)
  File "/nix/store/9lvgqr20r7j7b6a4fmhw6n82spd9rafq-python-2.7.15-env/lib/python2.7/runpy.py", line 72, in _run_code
    exec code in run_globals
  File "/nix/store/yyarqmdxj1dmp65qv7mwm3sfrdlwmg8v-python2.7-sagelib-8.4.beta2/lib/python2.7/site-packages/sage_setup/docbuild/__main__.py", line 2, in <module>
    main()
  File "/nix/store/yyarqmdxj1dmp65qv7mwm3sfrdlwmg8v-python2.7-sagelib-8.4.beta2/lib/python2.7/site-packages/sage_setup/docbuild/__init__.py", line 1712, in main
    builder()
  File "/nix/store/yyarqmdxj1dmp65qv7mwm3sfrdlwmg8v-python2.7-sagelib-8.4.beta2/lib/python2.7/site-packages/sage_setup/docbuild/__init__.py", line 334, in _wrapper
    getattr(get_builder(document), 'inventory')(*args, **kwds)
  File "/nix/store/yyarqmdxj1dmp65qv7mwm3sfrdlwmg8v-python2.7-sagelib-8.4.beta2/lib/python2.7/site-packages/sage_setup/docbuild/__init__.py", line 529, in _wrapper
    build_many(build_ref_doc, L)
  File "/nix/store/yyarqmdxj1dmp65qv7mwm3sfrdlwmg8v-python2.7-sagelib-8.4.beta2/lib/python2.7/site-packages/sage_setup/docbuild/__init__.py", line 283, in build_many
    ret = x.get(99999)
  File "/nix/store/9lvgqr20r7j7b6a4fmhw6n82spd9rafq-python-2.7.15-env/lib/python2.7/multiprocessing/pool.py", line 572, in get
    raise self._value
Exception: ('Non-exception during docbuild: Segmentation fault', SignalError('Segmentation fault',))
```
comment:347 follow-up: ↓ 348 Changed 2 years ago by
I agree that it shouldn't change anything. Can you get more details on the segfault in sphinx?
comment:348 in reply to: ↑ 347 Changed 2 years ago by
Replying to saraedum:
I agree that it shouldn't change anything. Can you get more details on the segfault in sphinx?
Yet for some reason the documentation built fine on 8.2.beta1 and still builds if I revert that commit. Really weird.
I don't know; what kind of detail are you looking for? Do you happen to know how I can convince python/sphinx to give me a useful stacktrace?
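Not from the ticket, but one common way to get a Python-level traceback out of a segfaulting docbuild worker is the `faulthandler` module. This is only a sketch: on the Python 2.7 shown in the traceback above it would need the third-party `faulthandler` backport from PyPI, whereas on Python 3 it is in the standard library.

```python
# Sketch (an assumption, not the ticket's actual fix): enable faulthandler as
# early as possible in the docbuild entry point so that a SIGSEGV in a worker
# process dumps the Python stack of every thread to stderr.
import faulthandler
faulthandler.enable(all_threads=True)
```

On Python 3 the same effect can be had without code changes, e.g. by setting `PYTHONFAULTHANDLER=1` or running `python -X faulthandler -m sage_setup.docbuild ...`.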
comment:349 Changed 2 years ago by
What's the motivation for all these changes to the docbuilder?
Please see https://groups.google.com/forum/#!topic/sage-packaging/VU4h8IWGFLA
comment:350 follow-up: ↓ 351 Changed 2 years ago by
We were running out of RAM on CI machines while building the docs IIRC.
comment:351 in reply to: ↑ 350 ; follow-up: ↓ 354 Changed 2 years ago by
Replying to saraedum:
We were running out of RAM on CI machines while building the docs IIRC.
Is that related to logging somehow? See also #26667.
comment:352 Changed 2 years ago by
Seriously, yet another thing in Sage broken by this ticket...
comment:353 Changed 2 years ago by
That's not really helpful.
comment:354 in reply to: ↑ 351 Changed 2 years ago by
Replying to jdemeyer:
Replying to saraedum:
We were running out of RAM on CI machines while building the docs IIRC.
Is that related to logging somehow? See also #26667.
No, the logging issues made us exceed stdout limits on CI systems; we are now truncating the output there. It would be nice to keep the output to stdout as small as reasonably possible, since that makes the CI output much easier to work with, but I see that this one was too extreme.
comment:355 Changed 2 years ago by
I don't think it was "too extreme". I think, perhaps more reasonably, there should be an option for the verbosity level. My general preference is somewhere between zero output and a small status/progress message until and unless something goes wrong. Detailed logs can go to files, which are more useful to have for debugging anyway (on pipelines the only inconvenience is that I can't see the log "at-a-glance" on the web UI, but we have artifact downloads for that...)
comment:356 Changed 2 years ago by
I like the idea of just logging the more verbose stuff into a file. It should be printed to stdout on failure though, otherwise debugging a CI failure is annoying.
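Not part of the ticket, but as an illustration of the "verbose stuff into a file, printed on failure" idea, here is a minimal sketch; the command and log path are hypothetical placeholders, not what the CI scripts actually run.

```python
# Hypothetical sketch: run a noisy build step with its output captured to a
# log file, keep the CI console quiet, and replay the log only when it fails.
import subprocess
import sys

CMD = ["make", "doc"]          # placeholder for the noisy step
LOGFILE = "docbuild.log"       # placeholder log location (could be a CI artifact)

with open(LOGFILE, "w") as log:
    returncode = subprocess.call(CMD, stdout=log, stderr=subprocess.STDOUT)

if returncode != 0:
    # Dump the captured log to stdout so the failure is visible on the CI page.
    with open(LOGFILE) as log:
        sys.stdout.write(log.read())
    sys.exit(returncode)
```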
comment:357 Changed 2 years ago by
You probably know about this, but try "make V=0" in the sage build. Everything will go into logs only and you can print them on error.
New commits:
Enable GitLab CI and CircleCI