#31344 closed defect (fixed)

homebrew: docbuild crashes, libtcl AtForkPrepare - from sage.misc.cython globals / multiprocessing

Reported by: mkoeppe
Owned by:
Priority: blocker
Milestone: sage-9.3
Component: build
Keywords:
Cc: jhpalmieri, gh-zlscherr, fbissey, gh-kliem
Merged in:
Authors: Matthias Koeppe, John Palmieri
Reviewers: John Palmieri
Report Upstream: N/A
Work issues:
Branch: b4ceee5
Commit: b4ceee5dae4caa7dac7a6fe7ca29538aca2d381a
Dependencies:
Stopgaps:

Change History (36)

comment:1 Changed 17 months ago by mkoeppe

Bisecting src/doc/en/reference/misc/index.rst (running ./sage -docbuild --keep-going all html) reveals that the crash is coming from sage.misc.cython

comment:2 Changed 17 months ago by mkoeppe

For some reason, this line: cblas_pc = pkgconfig.parse(get_cblas_pc_module_name()) seems to cause the trouble.

comment:3 Changed 17 months ago by mkoeppe

  • Branch set to u/mkoeppe/homebrew__docbuild_crashes__libtcl_atforkprepare

comment:4 Changed 17 months ago by mkoeppe

  • Authors set to Matthias Koeppe
  • Commit set to 80720d7bcc0b1de4b92984fa62d52dad5855b9b0
  • Status changed from new to needs_review

New commits:

80720d7 src/sage/misc/cython.py: Do not run pkgconfig at import time
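
The general idea of moving a query out of import time can be sketched as follows. This is only an illustration, not the code from the commit: `expensive_query`, `cblas_config`, `build_flags` and `CALLS` are hypothetical names, and `expensive_query` stands in for the `pkgconfig.parse(...)` call mentioned in comment:2.

```python
import functools

# Records every time the (hypothetical) expensive query actually runs.
CALLS = []


def expensive_query(name):
    """Hypothetical stand-in for a call such as pkgconfig.parse(name)."""
    CALLS.append(name)
    return {"libraries": [name]}


# Before: a module-level global would run the query as soon as the
# module is imported, even if nobody ever needs the result:
#
#     cblas_pc = expensive_query("cblas")
#
# After: wrap the query in a cached function, so it runs only on first use.
@functools.lru_cache(maxsize=None)
def cblas_config():
    return expensive_query("cblas")


def build_flags():
    """Example consumer: the query happens here, not at import time."""
    return cblas_config()["libraries"]
```

With this layout, importing the module has no side effects; the first call to `build_flags()` runs the query once and later calls reuse the cached result.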

comment:5 Changed 17 months ago by mkoeppe

Easiest to test on #31335, which merges this branch

comment:6 Changed 17 months ago by jhpalmieri

With #31335, I still see a failure during docbuilding when using homebrew's Python with Big Sur. The failure now appears when building thematic_tutorials instead of the reference manual.

------------------------------------------------------------------------
0   signals.cpython-39-darwin.so        0x00000001047e2542 print_backtrace + 66
1   signals.cpython-39-darwin.so        0x00000001047e6167 sigdie + 39
2   signals.cpython-39-darwin.so        0x00000001047e606a cysigs_signal_handler + 282
3   libsystem_platform.dylib            0x00007fff20486d7d _sigtramp + 29
4   Python                              0x00000001029edcf1 _PyArg_ParseTuple_SizeT + 158
5   libtcl8.6.dylib                     0x000000034143972e AtForkPrepare + 38
6   libsystem_pthread.dylib             0x00007fff204421a3 _pthread_atfork_prepare_handlers + 90
7   libSystem.B.dylib                   0x00007fff2a645934 libSystem_atfork_prepare + 11
8   libsystem_c.dylib                   0x00007fff20325b1b fork + 12
9   _posixsubprocess.cpython-39-darwin. 0x00000001030d77f3 subprocess_fork_exec + 860
10  Python                              0x000000010291c2da cfunction_call + 90
11  Python                              0x00000001028d1b56 _PyObject_MakeTpCall + 129
12  Python                              0x00000001029ca625 call_function + 278
13  Python                              0x00000001029c7e86 _PyEval_EvalFrameDefault + 45416
14  Python                              0x00000001029bbbd6 _PyEval_EvalCode + 403
...
272 Python                              0x00000001028d2774 _PyFunction_Vectorcall + 376
273 Python                              0x0000000102a3ade0 pymain_run_module + 212
274 Python                              0x0000000102a3a8aa pymain_run_python + 433
275 Python                              0x0000000102a3a6bd Py_RunMain + 23
276 Python                              0x0000000102a3b9da pymain_main + 35
277 Python                              0x0000000102a3bcb0 Py_BytesMain + 42
278 libdyld.dylib                       0x00007fff2045d621 start + 1
------------------------------------------------------------------------
Unhandled SIGILL: An illegal instruction occurred.
This probably occurred because a *compiled* module has a bug
in it and is not properly wrapped with sig_on(), sig_off().
Python will now terminate.
------------------------------------------------------------------------

comment:7 Changed 17 months ago by mkoeppe

Thanks for testing! I'll try a clean rebuild of the documentation and see if I can reproduce it on Catalina as well.

For reference, the trick for bisecting was to use

make build && ./sage -docbuild --keep-going all html ; ./sage -docbuild all html

The --keep-going in the first command was necessary so that the message "WARNING: document isn't included in any toctree" does not stop the whole process.

comment:8 Changed 17 months ago by mkoeppe

OK, I can reproduce it

comment:9 Changed 17 months ago by mkoeppe

reducing thematic_tutorials/index.rst to the following still reproduces the crash:

.. Sage documentation master file, created by sphinx-quickstart on Thu
.. Aug 21 20:15:55 2008. You can adapt this file completely to your
.. liking, but it should at least contain the root `toctree` directive.

.. _thematic-tutorials:

Welcome to the Sage Thematic Tutorials!
=======================================


* `Tutorial: Symbolics and Plotting (PREP) <../prep/Symbolics-and-Basic-Plotting.html>`_

comment:10 Changed 17 months ago by mkoeppe

That's in an incremental docbuild - so something bad must have been saved in the inventory.

comment:11 Changed 17 months ago by jhpalmieri

When I saw the original problem, I only saw it on the second pass through the ref manual build, which is consistent with seeing problems based on something in the inventory.

comment:12 Changed 17 months ago by gh-zlscherr

Does anyone know why

./sage --docbuild all html

fails at thematic_tutorial but

./sage --docbuild thematic_tutorial html

works?

comment:13 Changed 17 months ago by gh-zlscherr

In fact, after

./sage -docbuild --keep-going all html

failed, I tried building thematic_tutorial by itself. That worked, and then make doc says it was successful.

comment:14 Changed 17 months ago by mkoeppe

I think the bug is triggered by the parallelization code in sage_setup.docbuild.AllBuilder.

comment:15 Changed 17 months ago by mkoeppe

We previously had trouble with this code (build_many - from #28356, #27514, #27490) in #30351, #28483, ...

comment:16 Changed 17 months ago by mkoeppe

see also #31289

comment:17 Changed 17 months ago by mkoeppe

In any case, I think this ticket is an improvement by itself, as it removes some accidental globals from the module sage.misc.cython and reduces its load time.

comment:18 Changed 17 months ago by mkoeppe

  • Summary changed from homebrew: docbuild crashes, libtcl AtForkPrepare to homebrew: docbuild crashes, libtcl AtForkPrepare - from sage.misc.cython globals

comment:19 Changed 17 months ago by jhpalmieri

With this change, the documentation builds for me (but of course it is missing a plot):

  • src/doc/en/thematic_tutorials/vector_calculus/vector_calc_cartesian.rst

    diff --git a/src/doc/en/thematic_tutorials/vector_calculus/vector_calc_cartesian.rst b/src/doc/en/thematic_tutorials/vector_calculus/vector_calc_cartesian.rst
    index 9faa9f2375..bc77d72e68 100644
    --- a/src/doc/en/thematic_tutorials/vector_calculus/vector_calc_cartesian.rst
    +++ b/src/doc/en/thematic_tutorials/vector_calculus/vector_calc_cartesian.rst
    @@ -94,7 +94,6 @@ Vector fields can be plotted::
         E = EuclideanSpace(3)
         x, y, z = E.default_chart()[:]
         v = E.vector_field(-y, x, sin(x*y*z), name='v')
    -    sphinx_plot(v.plot(max_range=1.5, scale=0.5))
     
     For customizing the plot, see the list of options in the documentation of
     :meth:`~sage.manifolds.differentiable.vectorfield.VectorField.plot`.

comment:20 Changed 17 months ago by mkoeppe

This does not seem to help on my machine

comment:21 Changed 17 months ago by jhpalmieri

Sorry, it turns out that it doesn't consistently help on mine, either. I think this should stop the non-reference manual docs from being built in parallel:

  • src/sage_setup/docbuild/__init__.py

    diff --git a/src/sage_setup/docbuild/__init__.py b/src/sage_setup/docbuild/__init__.py
    index b07e9c100c..1d4139555e 100644
    --- a/src/sage_setup/docbuild/__init__.py
    +++ b/src/sage_setup/docbuild/__init__.py
    @@ -286,13 +286,15 @@ class DocBuilder(object):
     
     from .utils import build_many as _build_many
     
    -def build_many(target, args):
    +def build_many(target, args, processes=None):
         """
         Thin wrapper around `sage_setup.docbuild.utils.build_many` which uses the
         docbuild settings ``NUM_THREADS`` and ``ABORT_ON_ERROR``.
         """
    +    if processes is None:
    +        processes = NUM_THREADS
         try:
    -        _build_many(target, args, processes=NUM_THREADS)
    +        _build_many(target, args, processes=processes)
         except BaseException as exc:
             if ABORT_ON_ERROR:
                 raise
    @@ -349,6 +351,6 @@ class AllBuilder(object):
     
             # build the other documents in parallel
             L = [(doc, name, kwds) + args for doc in others]
    -        build_many(build_other_doc, L)
    +        build_many(build_other_doc, L, 1)
             logger.warning("Elapsed time: %.1f seconds."%(time.time()-start))
             logger.warning("Done building the documentation!")
Last edited 17 months ago by jhpalmieri

comment:22 Changed 17 months ago by jhpalmieri

#31289 doesn't seem to help, by the way.

comment:23 Changed 17 months ago by mkoeppe

Perhaps conditionalize this change on macOS?

comment:24 Changed 17 months ago by mkoeppe

I've pushed this change to the branch of #31335, but it does not actually fix the problem for me. Next I'll try whether replacing the build_many call with a for loop helps.

comment:25 Changed 17 months ago by mkoeppe

(retracted)

Last edited 17 months ago by mkoeppe

comment:26 Changed 17 months ago by mkoeppe

  • Summary changed from homebrew: docbuild crashes, libtcl AtForkPrepare - from sage.misc.cython globals to homebrew: docbuild crashes, libtcl AtForkPrepare - from sage.misc.cython globals / multiprocessing

comment:27 Changed 17 months ago by git

  • Commit changed from 80720d7bcc0b1de4b92984fa62d52dad5855b9b0 to b4ceee5dae4caa7dac7a6fe7ca29538aca2d381a

Branch pushed to git repo; I updated commit sha1. New commits:

515f899 sage_setup.docbuild.AllBuilder: stop the non-reference manual docs from being built in parallel
804ebd7 sage_setup.dpcbuild.AllBuilder: Restrict workaround to macOS
b4ceee5 sage_setup.docbuild: In the workaround, do not go through build_many to build serially
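
The shape of the workaround described by these three commit messages can be sketched roughly as below. This is a minimal illustration under stated assumptions, not the code on the branch: `build_doc`, `build_all`, and the `serial` parameter are hypothetical names, and a plain `multiprocessing.Pool` stands in for Sage's own `build_many` helper.

```python
import sys
from multiprocessing import Pool


def build_doc(name, fmt):
    """Hypothetical stand-in for building one document."""
    return "%s:%s" % (name, fmt)


def build_all(target, arg_list, processes=2, serial=None):
    """
    Call target(*args) for each args tuple in arg_list.

    On macOS (sys.platform == "darwin") the work is done in a plain
    for loop in the current process, so no worker processes are
    started at all. Elsewhere a process pool is used.
    """
    if serial is None:
        serial = sys.platform == "darwin"
    if serial:
        results = []
        for args in arg_list:
            results.append(target(*args))
        return results
    with Pool(processes) as pool:
        return pool.starmap(target, arg_list)
```

Passing `serial=True` forces the loop on any platform, which is convenient for checking that both paths produce the same results.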

comment:28 Changed 17 months ago by mkoeppe

This fixes the problem on my machine. Please test on Big Sur

comment:29 Changed 17 months ago by mkoeppe

  • Priority changed from critical to blocker

comment:30 Changed 17 months ago by mkoeppe

  • Authors changed from Matthias Koeppe to Matthias Koeppe, John Palmieri

comment:31 Changed 17 months ago by gh-zlscherr

Worked for me on Big Sur

comment:32 Changed 17 months ago by jhpalmieri

  • Reviewers set to John Palmieri
  • Status changed from needs_review to positive_review

This works for me, too. It would be nice to know what the actual problem is beyond "some murky issue with parallel docbuilding on OS X," but it's good enough to merge. @gh-zlscherr, feel free to add your real name to the reviewers field (and also to the wiki page, if you want).

comment:33 Changed 17 months ago by mkoeppe

Thanks!

comment:34 Changed 17 months ago by dimpase

this fixed the docbuild crash on my Big Sur box too

comment:35 Changed 16 months ago by slelievre

On macOS 10.14.6: dochtml builds with this, while it does not with 9.3.beta7 or #31419.

comment:36 Changed 16 months ago by vbraun

  • Branch changed from u/mkoeppe/homebrew__docbuild_crashes__libtcl_atforkprepare to b4ceee5dae4caa7dac7a6fe7ca29538aca2d381a
  • Resolution set to fixed
  • Status changed from positive_review to closed