Opened 3 years ago

Closed 2 years ago

# Docbuild segfaults when pari is compiled with threading

Reported by: gh-timokau
Owned by: embray
Priority: major
Milestone: sage-pending
Component: documentation
Keywords: docbuild, pari
Cc: arojas, jdemeyer, fbissey, dimpase, saraedum, embray
Report Upstream: N/A

This ticket is a follow-up to this sage-packaging discussion. To summarize:

However, since #24655 (unconditionally enabling the threaded docbuild), the docbuild segfaults when PARI is compiled with threading support. Apparently Sage somehow uses PARI while ignoring the nthreads option. We get the following backtrace (provided by Antonio with an older version of Sage):

```
Traceback (most recent call last):
  File "/usr/lib/python2.7/multiprocessing/process.py", line 267, in _bootstrap
    self.run()
  File "/usr/lib/python2.7/multiprocessing/process.py", line 114, in run
    self._target(*self._args, **self._kwargs)
  File "/usr/lib/python2.7/multiprocessing/pool.py", line 113, in worker
    result = (True, func(*args, **kwds))
  File "/usr/lib/python2.7/multiprocessing/pool.py", line 65, in mapstar
    return map(*args)
  File "/build/sagemath-doc/src/sage-8.0/local-python/sage_setup/docbuild/__init__.py", line 70, in build_ref_doc
    getattr(ReferenceSubBuilder(doc, lang), format)(*args, **kwds)
  File "/build/sagemath-doc/src/sage-8.0/local-python/sage_setup/docbuild/__init__.py", line 720, in _wrapper
    getattr(DocBuilder, build_type)(self, *args, **kwds)
  File "/build/sagemath-doc/src/sage-8.0/local-python/sage_setup/docbuild/__init__.py", line 104, in f
    runsphinx()
  File "/build/sagemath-doc/src/sage-8.0/local-python/sage_setup/docbuild/sphinxbuild.py", line 207, in runsphinx
    sphinx.cmdline.main(sys.argv)
  File "/usr/lib/python2.7/site-packages/sphinx/cmdline.py", line 296, in main
    app.build(opts.force_all, filenames)
  File "/usr/lib/python2.7/site-packages/sphinx/application.py", line 333, in build
    self.builder.build_update()
  File "/usr/lib/python2.7/site-packages/sphinx/builders/__init__.py", line 251, in build_update
    'out of date' % len(to_build))
  File "/usr/lib/python2.7/site-packages/sphinx/builders/__init__.py", line 265, in build
    self.doctreedir, self.app))
  File "/usr/lib/python2.7/site-packages/sphinx/environment/__init__.py", line 549, in update
  File "/usr/lib/python2.7/site-packages/sphinx/environment/__init__.py", line 569, in _read_serial
  File "/usr/lib/python2.7/site-packages/sphinx/environment/__init__.py", line 677, in read_doc
    pub.publish()
  File "/usr/lib/python2.7/site-packages/docutils/core.py", line 217, in publish
    self.settings)
  File "/usr/lib/python2.7/site-packages/sphinx/io.py", line 55, in read
    self.parse()
  File "/usr/lib/python2.7/site-packages/docutils/readers/__init__.py", line 78, in parse
    self.parser.parse(self.input, document)
  File "/usr/lib/python2.7/site-packages/docutils/parsers/rst/__init__.py", line 191, in parse
    self.statemachine.run(inputlines, document, inliner=self.inliner)
  File "/usr/lib/python2.7/site-packages/docutils/parsers/rst/states.py", line 171, in run
    input_source=document['source'])
  File "/usr/lib/python2.7/site-packages/docutils/statemachine.py", line 239, in run
    context, state, transitions)
  File "/usr/lib/python2.7/site-packages/docutils/statemachine.py", line 460, in check_line
    return method(match, context, next_state)
  File "/usr/lib/python2.7/site-packages/docutils/parsers/rst/states.py", line 2753, in underline
    self.section(title, source, style, lineno - 1, messages)
  File "/usr/lib/python2.7/site-packages/docutils/parsers/rst/states.py", line 327, in section
    self.new_subsection(title, lineno, messages)
  File "/usr/lib/python2.7/site-packages/docutils/parsers/rst/states.py", line 395, in new_subsection
    node=section_node, match_titles=True)
  File "/usr/lib/python2.7/site-packages/docutils/parsers/rst/states.py", line 282, in nested_parse
    node=node, match_titles=match_titles)
  File "/usr/lib/python2.7/site-packages/docutils/parsers/rst/states.py", line 196, in run
    results = StateMachineWS.run(self, input_lines, input_offset)
  File "/usr/lib/python2.7/site-packages/docutils/statemachine.py", line 239, in run
    context, state, transitions)
  File "/usr/lib/python2.7/site-packages/docutils/statemachine.py", line 460, in check_line
    return method(match, context, next_state)
  File "/usr/lib/python2.7/site-packages/docutils/parsers/rst/states.py", line 2328, in explicit_markup
    self.explicit_list(blank_finish)
  File "/usr/lib/python2.7/site-packages/docutils/parsers/rst/states.py", line 2358, in explicit_list
    match_titles=self.state_machine.match_titles)
  File "/usr/lib/python2.7/site-packages/docutils/parsers/rst/states.py", line 319, in nested_list_parse
    node=node, match_titles=match_titles)
  File "/usr/lib/python2.7/site-packages/docutils/parsers/rst/states.py", line 196, in run
    results = StateMachineWS.run(self, input_lines, input_offset)
  File "/usr/lib/python2.7/site-packages/docutils/statemachine.py", line 239, in run
    context, state, transitions)
  File "/usr/lib/python2.7/site-packages/docutils/statemachine.py", line 460, in check_line
    return method(match, context, next_state)
  File "/usr/lib/python2.7/site-packages/docutils/parsers/rst/states.py", line 2631, in explicit_markup
    nodelist, blank_finish = self.explicit_construct(match)
  File "/usr/lib/python2.7/site-packages/docutils/parsers/rst/states.py", line 2338, in explicit_construct
    return method(self, expmatch)
  File "/usr/lib/python2.7/site-packages/docutils/parsers/rst/states.py", line 2081, in directive
    directive_class, match, type_name, option_presets)
  File "/usr/lib/python2.7/site-packages/docutils/parsers/rst/states.py", line 2130, in run_directive
    result = directive_instance.run()
  File "/build/sagemath-doc/src/sage-8.0/src/sage_setup/docbuild/ext/sage_autodoc.py", line 1749, in run
    nested_parse_with_titles(self.state, self.result, node)
  File "/usr/lib/python2.7/site-packages/sphinx/util/nodes.py", line 208, in nested_parse_with_titles
    return state.nested_parse(content, 0, node, match_titles=1)
  File "/usr/lib/python2.7/site-packages/docutils/parsers/rst/states.py", line 282, in nested_parse
    node=node, match_titles=match_titles)
  File "/usr/lib/python2.7/site-packages/docutils/parsers/rst/states.py", line 196, in run
    results = StateMachineWS.run(self, input_lines, input_offset)
  File "/usr/lib/python2.7/site-packages/docutils/statemachine.py", line 239, in run
    context, state, transitions)
  File "/usr/lib/python2.7/site-packages/docutils/statemachine.py", line 460, in check_line
    return method(match, context, next_state)
  File "/usr/lib/python2.7/site-packages/docutils/parsers/rst/states.py", line 2326, in explicit_markup
    nodelist, blank_finish = self.explicit_construct(match)
  File "/usr/lib/python2.7/site-packages/docutils/parsers/rst/states.py", line 2338, in explicit_construct
    return method(self, expmatch)
  File "/usr/lib/python2.7/site-packages/docutils/parsers/rst/states.py", line 2081, in directive
    directive_class, match, type_name, option_presets)
  File "/usr/lib/python2.7/site-packages/docutils/parsers/rst/states.py", line 2130, in run_directive
    result = directive_instance.run()
  File "/usr/lib/python2.7/site-packages/docutils/parsers/rst/__init__.py", line 410, in run
    self.state, self.state_machine)
  File "/usr/lib/python2.7/site-packages/matplotlib/sphinxext/plot_directive.py", line 189, in plot_directive
    return run(arguments, content, options, state_machine, state, lineno)
  File "/usr/lib/python2.7/site-packages/matplotlib/sphinxext/plot_directive.py", line 779, in run
    close_figs=context_opt == 'close-figs')
  File "/usr/lib/python2.7/site-packages/matplotlib/sphinxext/plot_directive.py", line 644, in render_figures
    run_code(code_piece, code_path, ns, function_name)
  File "/usr/lib/python2.7/site-packages/matplotlib/sphinxext/plot_directive.py", line 524, in run_code
    six.exec_(code, ns)
  File "/usr/lib/python2.7/site-packages/six.py", line 709, in exec_
    exec("""exec _code_ in _globs_, _locs_""")
  File "<string>", line 1, in <module>
  File "<string>", line 1, in <module>
  File "sage/misc/classcall_metaclass.pyx", line 329, in sage.misc.classcall_metaclass.ClasscallMetaclass.__call__ (build/cythonized/sage/misc/classcall_metaclass.c:1698)
    if cls.classcall is not None:
  File "/usr/lib/python2.7/site-packages/sage/geometry/triangulation/point_configuration.py", line 331, in __classcall__
    .__classcall__(cls, points, connected, fine, regular, star, defined_affine)
  File "sage/misc/cachefunc.pyx", line 1005, in sage.misc.cachefunc.CachedFunction.__call__ (build/cythonized/sage/misc/cachefunc.c:6065)
    ArgSpec(args=['self', 'algorithm', 'deg_bound', 'mult_bound', 'prot'],
  File "/usr/lib/python2.7/site-packages/sage/structure/unique_representation.py", line 1027, in __classcall__
    instance = typecall(cls, *args, **options)
  File "sage/misc/classcall_metaclass.pyx", line 496, in sage.misc.classcall_metaclass.typecall (build/cythonized/sage/misc/classcall_metaclass.c:2148)
    """
  File "/usr/lib/python2.7/site-packages/sage/geometry/triangulation/point_configuration.py", line 367, in __init__
    PointConfiguration_base.__init__(self, points, defined_affine)
  File "sage/geometry/triangulation/base.pyx", line 398, in sage.geometry.triangulation.base.PointConfiguration_base.__init__ (build/cythonized/sage/geometry/triangulation/base.cpp:4135)
    self._init_points(points)
  File "sage/geometry/triangulation/base.pyx", line 456, in sage.geometry.triangulation.base.PointConfiguration_base._init_points (build/cythonized/sage/geometry/triangulation/base.cpp:4982)
    red = matrix([ red.row(i) for i in red.pivot_rows()])
  File "sage/matrix/matrix2.pyx", line 517, in sage.matrix.matrix2.Matrix.pivot_rows (build/cythonized/sage/matrix/matrix2.c:8414)
    """
  File "sage/matrix/matrix_integer_dense.pyx", line 2217, in sage.matrix.matrix_integer_dense.Matrix_integer_dense.pivots (build/cythonized/sage/matrix/matrix_integer_dense.c:19162)
    sage: matrix(3, range(9)).elementary_divisors()
  File "sage/matrix/matrix_integer_dense.pyx", line 2019, in sage.matrix.matrix_integer_dense.Matrix_integer_dense.echelon_form (build/cythonized/sage/matrix/matrix_integer_dense.c:17749)

  File "sage/matrix/matrix_integer_dense.pyx", line 5719, in sage.matrix.matrix_integer_dense.Matrix_integer_dense._hnf_pari (build/cythonized/sage/matrix/matrix_integer_dense.c:46635)
    most \max\mathcal{S} where \mathcal{S} denotes the full
SignalError: Segmentation fault
```

That shows us that src/sage/matrix/matrix_integer_dense.pyx is involved. Apparently that file directly uses the cypari C bindings instead of the libs/pari.py interface (where the nthreads option is added). For example:

```python
def LLL_gram(self, flag = 0):
    if self._nrows != self._ncols:
        raise ArithmeticError("self must be a square matrix")

    n = self.nrows()
    # maybe should be /unimodular/ matrices ?
    P = self.__pari__()
    try:
        U = P.qflllgram(flag)
    except (RuntimeError, ArithmeticError) as msg:
        raise ValueError("qflllgram failed, "
                         "perhaps the matrix is not positive definite")
    if U.matsize() != [n, n]:
        raise ValueError("qflllgram did not return a square matrix, "
                         "perhaps the matrix is not positive definite")
    MS = matrix_space.MatrixSpace(ZZ, n)
    U = MS(U.sage())
    # Fix last column so that det = +1
    if U.det() == -1:
        for i in range(n):
            U[i, n-1] = -U[i, n-1]
    return U
```

Can someone more familiar with cython and cypari tell if the options defined in libs/pari.py would apply here? Why isn't libs/pari.py used?

### comment:1 Changed 3 years ago by embray

I think there's a little bit of misinformation / misconception here.

There's nothing about Sage's docbuild program that uses multi-threading. It uses a process pool and builds each sub-document in separate processes.

(There are some cases where it does not run builds in subprocesses when it probably should, and I think that is contributing somewhat to the explosion of memory usage in the docbuild, but that's a separate issue).

### comment:2 follow-up: ↓ 5 Changed 3 years ago by embray

I don't know if PARI uses openblas in its multi-threaded mode but I wonder if this is related to #26585

### comment:3 Changed 3 years ago by arojas

Note that the code excerpts in the last lines of the backtrace are nonsense since I was compiling an older version of the docs. Here's the "translated" version:

```
  File "sage/misc/cachefunc.pyx", line 1005, in sage.misc.cachefunc.CachedFunction.__call__ (build/cythonized/sage/misc/cachefunc.c:6065)
    w = self.f(*args, **kwds)
  File "/usr/lib/python2.7/site-packages/sage/structure/unique_representation.py", line 1027, in __classcall__
    instance = typecall(cls, *args, **options)
  File "sage/misc/classcall_metaclass.pyx", line 496, in sage.misc.classcall_metaclass.typecall (build/cythonized/sage/misc/classcall_metaclass.c:2148)
    return (<PyTypeObject*>type).tp_call(cls, args, kwds)
  File "/usr/lib/python2.7/site-packages/sage/geometry/triangulation/point_configuration.py", line 367, in __init__
    PointConfiguration_base.__init__(self, points, defined_affine)
  File "sage/geometry/triangulation/base.pyx", line 398, in sage.geometry.triangulation.base.PointConfiguration_base.__init__ (build/cythonized/sage/geometry/triangulation/base.cpp:4135)
    self._init_points(points)
  File "sage/geometry/triangulation/base.pyx", line 456, in sage.geometry.triangulation.base.PointConfiguration_base._init_points (build/cythonized/sage/geometry/triangulation/base.cpp:4982)
    red = matrix([ red.row(i) for i in red.pivot_rows()])
  File "sage/matrix/matrix2.pyx", line 517, in sage.matrix.matrix2.Matrix.pivot_rows (build/cythonized/sage/matrix/matrix2.c:8414)
    v = self.transpose().pivots()
  File "sage/matrix/matrix_integer_dense.pyx", line 2217, in sage.matrix.matrix_integer_dense.Matrix_integer_dense.pivots (build/cythonized/sage/matrix/matrix_integer_dense.c:19162)
    E = self.echelon_form()
  File "sage/matrix/matrix_integer_dense.pyx", line 2019, in sage.matrix.matrix_integer_dense.Matrix_integer_dense.echelon_form (build/cythonized/sage/matrix/matrix_integer_dense.c:17749)
    H_m = self._hnf_pari(flag, include_zero_rows=include_zero_rows)
  File "sage/matrix/matrix_integer_dense.pyx", line 5719, in sage.matrix.matrix_integer_dense.Matrix_integer_dense._hnf_pari (build/cythonized/sage/matrix/matrix_integer_dense.c:46635)
    sig_on()
SignalError: Segmentation fault
```

### comment:4 Changed 3 years ago by gh-timokau

• Description modified (diff)

### comment:5 in reply to: ↑ 2 ; follow-up: ↓ 10 Changed 3 years ago by gh-timokau

> I don't know if PARI uses openblas in its multi-threaded mode but I wonder if this is related to #26585

I'll test if that openblas patch fixes it. Very interesting ticket, I wonder if that also causes #26130 (I've heard darwin is somewhat more prone to threading bugs).

Last edited 3 years ago by gh-timokau (previous) (diff)

### comment:6 follow-up: ↓ 12 Changed 3 years ago by embray

I'm not sure if it does, but it might. I grepped the pari/gp source and it doesn't use openblas_set_num_threads directly, but something else might be. I did see some reference to omp_set_num_threads but I don't think we compile with OpenMP by default in Sage.

### comment:7 Changed 3 years ago by embray

There's also some multi-threading support in FLINT which could be problematic, but I have no idea if that's relevant in this case.

### comment:8 Changed 3 years ago by embray

I read on the mailing list post "It is called indirectly via matplotlib when rendering plots, see full backtrace below (btw, I had to downgrade to an old version of Sage to get a meaningful backtrace - I really dislike this trend of hiding build output, it makes it very hard to debug stuff)"

I had a very similar problem to this; it actually came from the BLAS library by way of a NumPy ufunc (I think for the "dot" product of a matrix and a vector, or of two matrices). I feel like I actually fixed this, but now I can't remember.

### comment:9 Changed 3 years ago by embray

Do you know some direct, specific way to reproduce this so that I can try it?

### comment:10 in reply to: ↑ 5 ; follow-up: ↓ 14 Changed 3 years ago by embray

> > I don't know if PARI uses openblas in its multi-threaded mode but I wonder if this is related to #26585
>
> I'll test if that openblas patch fixes it. Very interesting ticket, I wonder if that also causes #26130 (I've heard darwin is somewhat more prone to threading bugs).

I don't think it's related, because this only showed up when we patched fflas-ffpack to allow configuring the number of threads to use with openblas (by default it just sets it to 1). But conceivably there's a similar bug elsewhere. Possibly related to fork(). I have found many bugs in different projects related to threads/fork interaction.

### comment:11 Changed 3 years ago by embray

You know what though--I'm looking at the relevant code in openblas, and openblas_set_num_threads(1) might actually cause it to spin up a single thread (beyond the main thread) which is actually good enough to invoke the bug I fixed. I'm going to try to confirm that though.

### comment:12 in reply to: ↑ 6 Changed 3 years ago by jdemeyer

> I'm not sure if it does, but it might. I grepped the pari/gp source and it doesn't use openblas_set_num_threads directly, but something else might be.

I don't think that PARI uses BLAS in any way.

### comment:13 Changed 3 years ago by jdemeyer

• Description modified (diff)

### comment:14 in reply to: ↑ 10 ; follow-up: ↓ 16 Changed 3 years ago by jdemeyer

> I have found many bugs in different projects related to threads/fork interaction.

I'm also betting on this. PARI might set up some data structures related to threading (when compiled with threading support) which are invalid when running in a forked child process.

### comment:15 Changed 3 years ago by embray

• Description modified (diff)

Nevermind; that does not appear to be the case, I don't think.

### comment:16 in reply to: ↑ 14 Changed 3 years ago by embray

> > I have found many bugs in different projects related to threads/fork interaction.
>
> I'm also betting on this. PARI might set up some data structures related to threading (when compiled with threading support) which are invalid when running in a forked child process.

Yes, I think you must be right. PARI has its own thread management, and it does not implement any pthread_atfork handler that I can find, which is strong cause to suspect it...

### comment:17 follow-up: ↓ 19 Changed 3 years ago by embray

ISTM PARI/GP is not even built with multi-threading enabled unless you run its Configure with either --mt=pthread or --mt=mpi. On my system anyways this is not done by default...

Last edited 3 years ago by embray (previous) (diff)

### comment:18 follow-up: ↓ 29 Changed 3 years ago by embray

Now if I build PARI with --mt=pthread I can make a child process segfault if it tries to do some multi-threaded work. Granted, this is partly with my own bad code which does some things improperly. Now I'd be curious if

a) Anyone having this problem is building PARI with --mt=pthread, and

b) Exactly what code is being run in Sage that invokes multi-threading in PARI.

Last edited 3 years ago by embray (previous) (diff)

### comment:19 in reply to: ↑ 17 Changed 3 years ago by gh-timokau

> ISTM PARI/GP is not even built with multi-threading enabled unless you run its Configure with either --mt=pthread or --mt=mpi. On my system anyways this is not done by default...

Yes, that is why this issue came up on sage-packaging. Some distros ship pari with threading enabled. Sage does not. My effort in #26002 was not to change that, but to make sage compatible with a system pari.

### comment:20 Changed 3 years ago by embray

I can also make it deadlock with the right combination of evil calls.

### comment:21 Changed 3 years ago by gh-timokau

Even when you explicitly set nthreads to 1 before going forth with that evilness?

### comment:22 Changed 3 years ago by embray

I don't know. I've been sick for the last week, so I completely forget where I left this. Nevertheless, now I know that unless I compile PARI with --mt=pthread the problem won't occur at all. So I'll try doing that again, and then see if I can figure out exactly what code in Sage exhibits the bug. That will help me pinpoint it.

### comment:23 Changed 3 years ago by gh-timokau

Okay, glad you're feeling better :)

Let me know if I can help.

### comment:24 Changed 3 years ago by embray

• Milestone changed from sage-8.5 to sage-8.7
• Owner changed from (none) to embray

### comment:25 Changed 3 years ago by jdemeyer

Too bad, I also forgot about this. I'm literally right now returning from a week-long PARI/GP workshop where I could have discussed this.

### comment:26 Changed 3 years ago by arojas

The alternative multiprocess doc build introduced in #27490 works as a temporary workaround. I just replaced the revert with a one-line patch to enable it unconditionally for 8.7

### comment:27 Changed 3 years ago by embray

• Milestone changed from sage-8.7 to sage-pending

Removing most of the rest of my open tickets out of the 8.7 milestone, which should be closed.

### comment:29 in reply to: ↑ 18 Changed 2 years ago by embray

> Now if I build PARI with --mt=pthread I can make a child process segfault if it tries to do some multi-threaded work. Granted, this is partly with my own bad code which does some things improperly. Now I'd be curious if

I wish I knew what I meant by this, because I want to investigate this again but I don't have a clear way yet to even reproduce the issue, and I can't find whatever example code this might be referring to... :(

### comment:30 Changed 2 years ago by embray

• Setting --mt=pthread in PARI's Configure script results in it compiling a test file called config/pthread.c; if that succeeds, it sets the variables thread_engine=pthread and enable_tls=yes, and should print "Using mt engine pthread" (similar result if you use mpi, but for now I'm just looking at pthread).
• This also outputs the macro #define PARI_MT_ENGINE "pthread" in the generated paricfg.h.
• In src/language/paricfg.c this sets a global constant const char *paricfg_mt_engine = PARI_MT_ENGINE;. This is only used when printing the version info, like:

```
$ gp --version
GP/PARI CALCULATOR Version 2.11.1 (released)
amd64 running linux (x86-64/GMP-6.0.0 kernel) 64-bit version
compiled: Aug  7 2019, gcc version 4.8.4 (Ubuntu 4.8.4-2ubuntu1~14.04.4)
(readline v6.3 disabled, extended help enabled)
```

• In the generated Makefile this also adds some bits, such as MT_LIBS=-lpthread; the file parimt.h contains the contents of src/mt/pthread.h; and the module src/mt/pthread.c, which implements a standard interface (including the pari_mt_init function), is built alongside some generic code in src/mt/mt.c.
• If instead we had used --mt=single (the default), it would compile src/mt/single.c, which implements the same interfaces mostly as no-ops (its pari_mt_init just sets the global variable pari_mt_nbthreads = 1).
Version 0, edited 2 years ago by embray (next)

### comment:31 Changed 2 years ago by dimpase

and Sage installs paricfg.h with

```c
#define PARI_MT_ENGINE "single"
```

Well, I can use this on #28242 to test whether we got single-threaded libpari.

### comment:32 Changed 2 years ago by embray

Managed to reproduce the problem again by doing a parallel docbuild after building PARI with --mt=pthread, which eventually gives a segfault.

Unfortunately the build_many function doesn't give us any info on the traceback from the original exception, so I added the following patch to the docbuild code so it would at least print the traceback:

```diff
diff --git a/src/sage_setup/docbuild/__init__.py b/src/sage_setup/docbuild/__init__.py
index e406bca..4912b5c 100644
--- a/src/sage_setup/docbuild/__init__.py
+++ b/src/sage_setup/docbuild/__init__.py
@@
 import shutil
 import subprocess
 import sys
 import time
+import traceback
 import warnings

 logger = logging.getLogger(__name__)
@@ def builder_helper(type):
             if ABORT_ON_ERROR:
                 raise
         except BaseException as e:
+            exc_type, exc_value, exc_traceback = sys.exc_info()
+            traceback.print_tb(exc_traceback)
             # We need to wrap a BaseException that is not an Exception in a
             # regular Exception. Otherwise multiprocessing.Pool.get hangs, see
             # #25161
```

With that, I get a long traceback (much of which is stuff in Sphinx that isn't interesting). But as previously reported--and unsurprisingly--the segfault originates from some code for a plot:

```
  File "/home/embray/src/sagemath/sage/local/lib/python2.7/site-packages/matplotlib/sphinxext/plot_directive.py", line 644, in render_figures
    run_code(code_piece, code_path, ns, function_name)
  File "/home/embray/src/sagemath/sage/local/lib/python2.7/site-packages/matplotlib/sphinxext/plot_directive.py", line 524, in run_code
    six.exec_(code, ns)
  File "/home/embray/src/sagemath/sage/local/lib/python2.7/site-packages/six.py", line 709, in exec_
    exec("""exec _code_ in _globs_, _locs_""")
  File "<string>", line 1, in <module>
  File "<string>", line 2, in <module>
  File "/home/embray/src/sagemath/sage/local/lib/python2.7/site-packages/sage/categories/finite_coxeter_groups.py", line 750, in permutahedron
    vertices = [v.change_ring(AA) for v in vertices]
  File "sage/modules/free_module_element.pyx", line 1495, in sage.modules.free_module_element.FreeModuleElement.change_ring (build/cythonized/sage/modules/free_module_element.c:11182)
    return M(self.list(), coerce=True)
  File "sage/structure/parent.pyx", line 902, in sage.structure.parent.Parent.__call__ (build/cythonized/sage/structure/parent.c:9225)
    return mor._call_with_args(x, args, kwds)
  File "sage/structure/coerce_maps.pyx", line 171, in sage.structure.coerce_maps.DefaultConvertMap_unique._call_with_args (build/cythonized/sage/structure/coerce_maps.c:4872)
    return C._element_constructor(x, **kwds)
  File "/home/embray/src/sagemath/sage/local/lib/python2.7/site-packages/sage/modules/free_module.py", line 5601, in _element_constructor_
    return FreeModule_generic_field._element_constructor_(self, e, *args, **kwds)
  File "/home/embray/src/sagemath/sage/local/lib/python2.7/site-packages/sage/modules/free_module.py", line 1028, in _element_constructor_
    return self.element_class(self, x, coerce, copy)
  File "sage/modules/free_module_element.pyx", line 4119, in sage.modules.free_module_element.FreeModuleElement_generic_dense.__init__ (build/cythonized/sage/modules/free_module_element.c:28564)
    entries = [coefficient_ring(x) for x in entries]
  File "sage/structure/parent.pyx", line 900, in sage.structure.parent.Parent.__call__ (build/cythonized/sage/structure/parent.c:9198)
    return mor._call_(x)
  File "sage/structure/coerce_maps.pyx", line 157, in sage.structure.coerce_maps.DefaultConvertMap_unique._call_ (build/cythonized/sage/structure/coerce_maps.c:4449)
    return C._element_constructor(x)
  File "/home/embray/src/sagemath/sage/local/lib/python2.7/site-packages/sage/rings/qqbar.py", line 759, in _element_constructor_
    return x._algebraic_(AA)
  File "/home/embray/src/sagemath/sage/local/lib/python2.7/site-packages/sage/rings/universal_cyclotomic_field.py", line 606, in _algebraic_
    return R(QQbar(self))
  File "sage/structure/parent.pyx", line 900, in sage.structure.parent.Parent.__call__ (build/cythonized/sage/structure/parent.c:9198)
    return mor._call_(x)
  File "sage/categories/map.pyx", line 789, in sage.categories.map.Map._call_ (build/cythonized/sage/categories/map.c:6953)
    cpdef Element _call_(self, x):
  File "/home/embray/src/sagemath/sage/local/lib/python2.7/site-packages/sage/rings/universal_cyclotomic_field.py", line 260, in _call_
    zeta = QQbar.zeta(k)
  File "sage/misc/cachefunc.pyx", line 1949, in sage.misc.cachefunc.CachedMethodCaller.__call__ (build/cythonized/sage/misc/cachefunc.c:10274)
    w = self._instance_call(*args, **kwds)
  File "sage/misc/cachefunc.pyx", line 1825, in sage.misc.cachefunc.CachedMethodCaller._instance_call (build/cythonized/sage/misc/cachefunc.c:9759)
    return self.f(self._instance, *args, **kwds)
  File "/home/embray/src/sagemath/sage/local/lib/python2.7/site-packages/sage/rings/qqbar.py", line 1366, in zeta
    nf = CyclotomicField(n)
  File "sage/structure/factory.pyx", line 369, in sage.structure.factory.UniqueFactory.__call__ (build/cythonized/sage/structure/factory.c:2146)
    return self.get_object(version, key, kwds)
  File "sage/structure/factory.pyx", line 406, in sage.structure.factory.UniqueFactory.get_object (build/cythonized/sage/structure/factory.c:2350)
    return self._cache[version, cache_key]
  File "sage/misc/weak_dict.pyx", line 704, in sage.misc.weak_dict.WeakValueDictionary.__getitem__ (build/cythonized/sage/misc/weak_dict.c:3653)
    cdef PyObject* wr = PyDict_GetItemWithError(self, k)
  File "sage/cpython/dict_del_by_value.pyx", line 58, in sage.cpython.dict_del_by_value.PyDict_GetItemWithError (build/cythonized/sage/cpython/dict_del_by_value.c:1261)
    ep = mp.ma_lookup(mp, <PyObject*><void*>key, PyObject_Hash(key))
  File "sage/rings/real_lazy.pyx", line 1398, in sage.rings.real_lazy.LazyNamedUnop.__hash__ (build/cythonized/sage/rings/real_lazy.c:15533)
    return hash(complex(self))
  File "sage/rings/real_lazy.pyx", line 822, in sage.rings.real_lazy.LazyFieldElement.__complex__ (build/cythonized/sage/rings/real_lazy.c:9872)
    return self.eval(complex)
  File "sage/rings/real_lazy.pyx", line 1352, in sage.rings.real_lazy.LazyNamedUnop.eval (build/cythonized/sage/rings/real_lazy.c:14982)
    arg = self._arg.eval(R)
  File "sage/rings/real_lazy.pyx", line 1129, in sage.rings.real_lazy.LazyBinop.eval (build/cythonized/sage/rings/real_lazy.c:12722)
    left = self._left.eval(R)
  File "sage/rings/real_lazy.pyx", line 1130, in sage.rings.real_lazy.LazyBinop.eval (build/cythonized/sage/rings/real_lazy.c:12734)
    right = self._right.eval(R)
  File "sage/rings/real_lazy.pyx", line 1647, in sage.rings.real_lazy.LazyAlgebraic.eval (build/cythonized/sage/rings/real_lazy.c:17915)
    self.eval(self.parent().interval_field(64)) # up the prec
  File "sage/rings/real_lazy.pyx", line 1673, in sage.rings.real_lazy.LazyAlgebraic.eval (build/cythonized/sage/rings/real_lazy.c:18066)
    roots = self._poly.roots(ring = AA if isinstance(self._parent, RealLazyField_class) else QQbar)
  File "sage/rings/polynomial/polynomial_element.pyx", line 7721, in sage.rings.polynomial.polynomial_element.Polynomial.roots (build/cythonized/sage/rings/polynomial/polynomial_element.c:61844)
    rts = complex_roots(self, retval='algebraic')
  File "/home/embray/src/sagemath/sage/local/lib/python2.7/site-packages/sage/rings/polynomial/complex_roots.py", line 258, in complex_roots
    rts = cfac.roots(multiplicities=False)
  File "sage/rings/polynomial/polynomial_element.pyx", line 7629, in sage.rings.polynomial.polynomial_element.Polynomial.roots (build/cythonized/sage/rings/polynomial/polynomial_element.c:59297)
    ext_rts = self.__pari__().polroots(precision=L.prec())
  File "sage/rings/polynomial/polynomial_element.pyx", line 6021, in sage.rings.polynomial.polynomial_element.Polynomial.__pari__ (build/cythonized/sage/rings/polynomial/polynomial_element.c:49003)
    return self._pari_with_name(self._parent.variable_name())
  File "sage/rings/polynomial/polynomial_element.pyx", line 6074, in sage.rings.polynomial.polynomial_element.Polynomial._pari_with_name (build/cythonized/sage/rings/polynomial/polynomial_element.c:49395)
    vals = [x.__pari__() for x in self.list()]
  File "sage/rings/complex_number.pyx", line 593, in sage.rings.complex_number.ComplexNumber.__pari__ (build/cythonized/sage/rings/complex_number.c:7107)
    return self.real().__pari__()
  File "sage/rings/real_mpfr.pyx", line 3248, in sage.rings.real_mpfr.RealNumber.__pari__ (build/cythonized/sage/rings/real_mpfr.c:22355)
    sig_on()
```

Here it's probably building the reference docs for sage.categories.finite_coxeter_groups, which contains a plot using CoxeterGroup.permutahedron(); much further down the stack, that happens to use PARI to compute some complex roots of a polynomial.

It looks like it's not actually reaching the polroots() call before crashing somewhere down in converting the polynomial coefficients to PARI values...?

Last edited 2 years ago by embray (previous) (diff)

### comment:33 Changed 2 years ago by embray

The nffactor function in PARI ("Factorization of the univariate polynomial (or rational function) T over the number field nf") uses a parallel Chinese remainder routine, the internal function polint_chinese. This results in a thread pool being launched. This is being called at some point during the categories reference doc build.

### comment:34 Changed 2 years ago by dimpase

If #26002 has any effect, the pool should have just a single thread, no?

### comment:35 Changed 2 years ago by embray

Upon further digging: yes, I just found #26002, and indeed pari has pari_mt_nbthreads = 1, so maybe I'm barking up the wrong tree. The polint_chinese call indeed does not use any threads in this case.

### comment:36 Changed 2 years ago by dimpase

hmm, maybe it still does something with thread local variables?

### comment:37 follow-up: ↓ 39 Changed 2 years ago by embray

On a wild guess, I tried switching the docbuild to use my multiprocessing.Pool replacement from #27490 (this is on Linux), and the crashes no longer occur.

The major difference is that in multiprocessing.Pool, each docbuild subprocess is forked from a separate thread from the main thread, whereas in my replacement they're just forked directly from the main thread.

Somehow this alone is enough to leave some structures in PARI in a bad state, and only if it was built with multithreading support in the first place (apparently).

### comment:38 follow-up: ↓ 40 Changed 2 years ago by embray

Going back to comment 30: if --mt=pthread, then a variable in PARI's Configure, enable_tls="yes", is set. This in turn leads to defining a macro in paricfg.h called ENABLE_TLS.

The only effect of ENABLE_TLS is in src/headers/parisys.h:

#ifdef ENABLE_TLS
#  define THREAD __thread
#else
#  define THREAD
#endif

So variables in PARI declared THREAD use TLS in this case. This alone could be enough to suspect strange behavior in PARI when it's forked from a thread...

### comment:39 in reply to: ↑ 37 Changed 2 years ago by dimpase

On a wild guess, I tried switching the docbuild to use my multiprocessing.Pool replacement from #27490 (this is on Linux), and the crashes no longer occur.

This matches what I observed on #28242 (and the workaround I added to that branch) -- so we're in the same wilderness :-)

### comment:40 in reply to: ↑ 38 Changed 2 years ago by embray

So variables in PARI declared THREAD use TLS in this case. This alone could be enough to suspect strange behavior in PARI when it's forked from a thread...

Yup. That's really all it is. PARI has tons of global variables declared __thread, not least pari_mainstack. When multiprocessing.Pool spins up a new thread, all of those __thread variables are suddenly reset to their initialization values (typically 0x0).

This isn't so unusual in PARI's case: it assumes it is the only one starting new threads, and that it manages all of them. It does not anticipate ever being used in someone else's multi-threaded application.

### comment:41 Changed 2 years ago by embray

#28356 proposes a workaround for this issue.

It won't solve the issue in general (PARI is not safe to use in arbitrary multi-threaded code), but at least it won't crash when building the docs.

### comment:42 follow-ups: ↓ 43 ↓ 44 Changed 2 years ago by jdemeyer

Did somebody check that this problem is limited to docbuilds? Does the Sage testsuite pass (apart from doc-related tests of course)?

### comment:43 in reply to: ↑ 42 Changed 2 years ago by arojas

Did somebody check that this problem is limited to docbuilds? Does the Sage testsuite pass (apart from doc-related tests of course)?

Yes, there are no test suite failures related to this.

### comment:44 in reply to: ↑ 42 Changed 2 years ago by embray

Did somebody check that this problem is limited to docbuilds? Does the Sage testsuite pass (apart from doc-related tests of course)?

In principle it's not limited to docbuilds: the broader problem, for which I would like to find a better resolution, is that cypari2's Pari instances are simply not thread-safe. A more general approach would be to make Sage's single Pari instance actually thread-local.

As it is, Sage doesn't use threads for much of anything, whether in the tests, or in general, so the problem arises primarily in the docbuild.

But this can cause problems for anyone who carelessly tries to use multiprocessing.Pool, or threads in general, in Sage* :(

* If they do anything in those threads that happens to use PARI.

Last edited 2 years ago by embray

### comment:45 Changed 2 years ago by embray

• Resolution set to fixed
• Status changed from new to closed

Closing this ticket for now, since the immediate problem (Sage's docbuild) has been worked around by #28356. The broader issue still should have a ticket to track it, even if it won't be resolved any time soon, so I have opened #28800.
