Opened 3 years ago

Last modified 2 years ago

#26608 closed defect

Docbuild segfaults when pari is compiled with threading — at Version 15

Reported by: gh-timokau Owned by:
Priority: major Milestone: sage-pending
Component: documentation Keywords: docbuild, pari
Cc: arojas, jdemeyer, fbissey, dimpase, saraedum, embray Merged in:
Authors: Reviewers:
Report Upstream: N/A Work issues:
Branch: Commit:
Dependencies: Stopgaps:

Status badges

Description (last modified by embray)

This ticket is a followup to this sage-packaging discussion. To summarize:

sage does not work together with pari's threading. Instead of relying on it being compiled without threading, I made use of the "nthreads" option to disable threading at runtime in #26002.

However since #24655 (unconditionally enabling threaded docbuild), the docbuild segfaults when pari is compiled with threading support. Apparently sage somehow uses pari while ignoring the nthread option. We get the following backtrace (provided by Antonio with an older version of sage):

Traceback (most recent call last):
  File "/usr/lib/python2.7/multiprocessing/process.py", line 267, in _bootstrap
    self.run()
  File "/usr/lib/python2.7/multiprocessing/process.py", line 114, in run
    self._target(*self._args, **self._kwargs)
  File "/usr/lib/python2.7/multiprocessing/pool.py", line 113, in worker
    result = (True, func(*args, **kwds))
  File "/usr/lib/python2.7/multiprocessing/pool.py", line 65, in mapstar
    return map(*args)
  File "/build/sagemath-doc/src/sage-8.0/local-python/sage_setup/docbuild/__init__.py", line 70, in build_ref_doc
    getattr(ReferenceSubBuilder(doc, lang), format)(*args, **kwds)
  File "/build/sagemath-doc/src/sage-8.0/local-python/sage_setup/docbuild/__init__.py", line 720, in _wrapper
    getattr(DocBuilder, build_type)(self, *args, **kwds)
  File "/build/sagemath-doc/src/sage-8.0/local-python/sage_setup/docbuild/__init__.py", line 104, in f
    runsphinx()
  File "/build/sagemath-doc/src/sage-8.0/local-python/sage_setup/docbuild/sphinxbuild.py", line 207, in runsphinx
    sphinx.cmdline.main(sys.argv)
  File "/usr/lib/python2.7/site-packages/sphinx/cmdline.py", line 296, in main
    app.build(opts.force_all, filenames)
  File "/usr/lib/python2.7/site-packages/sphinx/application.py", line 333, in build
    self.builder.build_update()
  File "/usr/lib/python2.7/site-packages/sphinx/builders/__init__.py", line 251, in build_update
    'out of date' % len(to_build))
  File "/usr/lib/python2.7/site-packages/sphinx/builders/__init__.py", line 265, in build
    self.doctreedir, self.app))
  File "/usr/lib/python2.7/site-packages/sphinx/environment/__init__.py", line 549, in update
    self._read_serial(docnames, app)
  File "/usr/lib/python2.7/site-packages/sphinx/environment/__init__.py", line 569, in _read_serial
    self.read_doc(docname, app)
  File "/usr/lib/python2.7/site-packages/sphinx/environment/__init__.py", line 677, in read_doc
    pub.publish()
  File "/usr/lib/python2.7/site-packages/docutils/core.py", line 217, in publish
    self.settings)
  File "/usr/lib/python2.7/site-packages/sphinx/io.py", line 55, in read
    self.parse()
  File "/usr/lib/python2.7/site-packages/docutils/readers/__init__.py", line 78, in parse
    self.parser.parse(self.input, document)
  File "/usr/lib/python2.7/site-packages/docutils/parsers/rst/__init__.py", line 191, in parse
    self.statemachine.run(inputlines, document, inliner=self.inliner)
  File "/usr/lib/python2.7/site-packages/docutils/parsers/rst/states.py", line 171, in run
    input_source=document['source'])
  File "/usr/lib/python2.7/site-packages/docutils/statemachine.py", line 239, in run
    context, state, transitions)
  File "/usr/lib/python2.7/site-packages/docutils/statemachine.py", line 460, in check_line
    return method(match, context, next_state)
  File "/usr/lib/python2.7/site-packages/docutils/parsers/rst/states.py", line 2753, in underline
    self.section(title, source, style, lineno - 1, messages)
  File "/usr/lib/python2.7/site-packages/docutils/parsers/rst/states.py", line 327, in section
    self.new_subsection(title, lineno, messages)
  File "/usr/lib/python2.7/site-packages/docutils/parsers/rst/states.py", line 395, in new_subsection
    node=section_node, match_titles=True)
  File "/usr/lib/python2.7/site-packages/docutils/parsers/rst/states.py", line 282, in nested_parse
    node=node, match_titles=match_titles)
  File "/usr/lib/python2.7/site-packages/docutils/parsers/rst/states.py", line 196, in run
    results = StateMachineWS.run(self, input_lines, input_offset)
  File "/usr/lib/python2.7/site-packages/docutils/statemachine.py", line 239, in run
    context, state, transitions)
  File "/usr/lib/python2.7/site-packages/docutils/statemachine.py", line 460, in check_line
    return method(match, context, next_state)
  File "/usr/lib/python2.7/site-packages/docutils/parsers/rst/states.py", line 2328, in explicit_markup
    self.explicit_list(blank_finish)
  File "/usr/lib/python2.7/site-packages/docutils/parsers/rst/states.py", line 2358, in explicit_list
    match_titles=self.state_machine.match_titles)
  File "/usr/lib/python2.7/site-packages/docutils/parsers/rst/states.py", line 319, in nested_list_parse
    node=node, match_titles=match_titles)
  File "/usr/lib/python2.7/site-packages/docutils/parsers/rst/states.py", line 196, in run
    results = StateMachineWS.run(self, input_lines, input_offset)
  File "/usr/lib/python2.7/site-packages/docutils/statemachine.py", line 239, in run
    context, state, transitions)
  File "/usr/lib/python2.7/site-packages/docutils/statemachine.py", line 460, in check_line
    return method(match, context, next_state)
  File "/usr/lib/python2.7/site-packages/docutils/parsers/rst/states.py", line 2631, in explicit_markup
    nodelist, blank_finish = self.explicit_construct(match)
  File "/usr/lib/python2.7/site-packages/docutils/parsers/rst/states.py", line 2338, in explicit_construct
    return method(self, expmatch)
  File "/usr/lib/python2.7/site-packages/docutils/parsers/rst/states.py", line 2081, in directive
    directive_class, match, type_name, option_presets)
  File "/usr/lib/python2.7/site-packages/docutils/parsers/rst/states.py", line 2130, in run_directive
    result = directive_instance.run()
  File "/build/sagemath-doc/src/sage-8.0/src/sage_setup/docbuild/ext/sage_autodoc.py", line 1749, in run
    nested_parse_with_titles(self.state, self.result, node)
  File "/usr/lib/python2.7/site-packages/sphinx/util/nodes.py", line 208, in nested_parse_with_titles
    return state.nested_parse(content, 0, node, match_titles=1)
  File "/usr/lib/python2.7/site-packages/docutils/parsers/rst/states.py", line 282, in nested_parse
    node=node, match_titles=match_titles)
  File "/usr/lib/python2.7/site-packages/docutils/parsers/rst/states.py", line 196, in run
    results = StateMachineWS.run(self, input_lines, input_offset)
  File "/usr/lib/python2.7/site-packages/docutils/statemachine.py", line 239, in run
    context, state, transitions)
  File "/usr/lib/python2.7/site-packages/docutils/statemachine.py", line 460, in check_line
    return method(match, context, next_state)
  File "/usr/lib/python2.7/site-packages/docutils/parsers/rst/states.py", line 2326, in explicit_markup
    nodelist, blank_finish = self.explicit_construct(match)
  File "/usr/lib/python2.7/site-packages/docutils/parsers/rst/states.py", line 2338, in explicit_construct
    return method(self, expmatch)
  File "/usr/lib/python2.7/site-packages/docutils/parsers/rst/states.py", line 2081, in directive
    directive_class, match, type_name, option_presets)
  File "/usr/lib/python2.7/site-packages/docutils/parsers/rst/states.py", line 2130, in run_directive
    result = directive_instance.run()
  File "/usr/lib/python2.7/site-packages/docutils/parsers/rst/__init__.py", line 410, in run
    self.state, self.state_machine)
  File "/usr/lib/python2.7/site-packages/matplotlib/sphinxext/plot_directive.py", line 189, in plot_directive
    return run(arguments, content, options, state_machine, state, lineno)
  File "/usr/lib/python2.7/site-packages/matplotlib/sphinxext/plot_directive.py", line 779, in run
    close_figs=context_opt == 'close-figs')
  File "/usr/lib/python2.7/site-packages/matplotlib/sphinxext/plot_directive.py", line 644, in render_figures
    run_code(code_piece, code_path, ns, function_name)
  File "/usr/lib/python2.7/site-packages/matplotlib/sphinxext/plot_directive.py", line 524, in run_code
    six.exec_(code, ns)
  File "/usr/lib/python2.7/site-packages/six.py", line 709, in exec_
    exec("""exec _code_ in _globs_, _locs_""")
  File "<string>", line 1, in <module>
  File "<string>", line 1, in <module>
  File "sage/misc/classcall_metaclass.pyx", line 329, in sage.misc.classcall_metaclass.ClasscallMetaclass.__call__ (build/cythonized/sage/misc/classcall_metaclass.c:1698)
    if cls.classcall is not None:
  File "/usr/lib/python2.7/site-packages/sage/geometry/triangulation/point_configuration.py", line 331, in __classcall__
    .__classcall__(cls, points, connected, fine, regular, star, defined_affine)
  File "sage/misc/cachefunc.pyx", line 1005, in sage.misc.cachefunc.CachedFunction.__call__ (build/cythonized/sage/misc/cachefunc.c:6065)
    ArgSpec(args=['self', 'algorithm', 'deg_bound', 'mult_bound', 'prot'],
  File "/usr/lib/python2.7/site-packages/sage/structure/unique_representation.py", line 1027, in __classcall__
    instance = typecall(cls, *args, **options)
  File "sage/misc/classcall_metaclass.pyx", line 496, in sage.misc.classcall_metaclass.typecall (build/cythonized/sage/misc/classcall_metaclass.c:2148)
    """
  File "/usr/lib/python2.7/site-packages/sage/geometry/triangulation/point_configuration.py", line 367, in __init__
    PointConfiguration_base.__init__(self, points, defined_affine)
  File "sage/geometry/triangulation/base.pyx", line 398, in sage.geometry.triangulation.base.PointConfiguration_base.__init__ (build/cythonized/sage/geometry/triangulation/base.cpp:4135)
    self._init_points(points)
  File "sage/geometry/triangulation/base.pyx", line 456, in sage.geometry.triangulation.base.PointConfiguration_base._init_points (build/cythonized/sage/geometry/triangulation/base.cpp:4982)
    red = matrix([ red.row(i) for i in red.pivot_rows()])
  File "sage/matrix/matrix2.pyx", line 517, in sage.matrix.matrix2.Matrix.pivot_rows (build/cythonized/sage/matrix/matrix2.c:8414)
    """
  File "sage/matrix/matrix_integer_dense.pyx", line 2217, in sage.matrix.matrix_integer_dense.Matrix_integer_dense.pivots (build/cythonized/sage/matrix/matrix_integer_dense.c:19162)
    sage: matrix(3, range(9)).elementary_divisors()
  File "sage/matrix/matrix_integer_dense.pyx", line 2019, in sage.matrix.matrix_integer_dense.Matrix_integer_dense.echelon_form (build/cythonized/sage/matrix/matrix_integer_dense.c:17749)
   
  File "sage/matrix/matrix_integer_dense.pyx", line 5719, in sage.matrix.matrix_integer_dense.Matrix_integer_dense._hnf_pari (build/cythonized/sage/matrix/matrix_integer_dense.c:46635)
    most `\max\mathcal{S}` where `\mathcal{S}` denotes the full
SignalError: Segmentation fault 

That shows us that src/sage/matrix/matrix_integer_dense.pyx is involved. Apparently that file directly uses cypari c-bindings instead of the libs/pari.py interface (where the nthreads option is added). For example:

def LLL_gram(self, flag = 0):
    if self._nrows != self._ncols:
        raise ArithmeticError("self must be a square matrix")

    n = self.nrows()
    # maybe should be /unimodular/ matrices ?
    P = self.__pari__()
    try:
        U = P.qflllgram(flag)
    except (RuntimeError, ArithmeticError) as msg:
        raise ValueError("qflllgram failed, "
                         "perhaps the matrix is not positive definite")
    if U.matsize() != [n, n]:
        raise ValueError("qflllgram did not return a square matrix, "
                         "perhaps the matrix is not positive definite");
    MS = matrix_space.MatrixSpace(ZZ,n)
    U = MS(U.sage())
    # Fix last column so that det = +1
    if U.det() == -1:
        for i in range(n):
            U[i,n-1] = - U[i,n-1]
    return U

Can someone more familiar with cython and cypari tell if the options defined in libs/pari.py would apply here? Why isn't libs/pari.py used?

Change History (15)

comment:1 Changed 3 years ago by embray

I think there's a little bit of misinformation / misconception here.

There's nothing about Sage's docbuild program that uses multi-threading. It uses a process pool and builds each sub-document in separate processes.

(There are some cases where it does not run builds in subprocesses when it probably should, and I think that is contributing somewhat to the explosion of memory usage in the docbuild, but that's a separate issue).

comment:2 follow-up: Changed 3 years ago by embray

I don't know if PARI uses openblas in its multi-threaded mode but I wonder if this is related to #26585

comment:3 Changed 3 years ago by arojas

Note that the code excerpts in the last lines of the backtrace are nonsense since I was compiling an older version of the docs. Here's the "translated" version:

  File "sage/misc/cachefunc.pyx", line 1005, in sage.misc.cachefunc.CachedFunction.__call__ (build/cythonized/sage/misc/cachefunc.c:6065)
    w = self.f(*args, **kwds)
  File "/usr/lib/python2.7/site-packages/sage/structure/unique_representation.py", line 1027, in __classcall__
    instance = typecall(cls, *args, **options)
  File "sage/misc/classcall_metaclass.pyx", line 496, in sage.misc.classcall_metaclass.typecall (build/cythonized/sage/misc/classcall_metaclass.c:2148)
    return (<PyTypeObject*>type).tp_call(cls, args, kwds)
  File "/usr/lib/python2.7/site-packages/sage/geometry/triangulation/point_configuration.py", line 367, in __init__
    PointConfiguration_base.__init__(self, points, defined_affine)
  File "sage/geometry/triangulation/base.pyx", line 398, in sage.geometry.triangulation.base.PointConfiguration_base.__init__ (build/cythonized/sage/geometry/triangulation/base.cpp:4135)
    self._init_points(points)
  File "sage/geometry/triangulation/base.pyx", line 456, in sage.geometry.triangulation.base.PointConfiguration_base._init_points (build/cythonized/sage/geometry/triangulation/base.cpp:4982)
    red = matrix([ red.row(i) for i in red.pivot_rows()])
  File "sage/matrix/matrix2.pyx", line 517, in sage.matrix.matrix2.Matrix.pivot_rows (build/cythonized/sage/matrix/matrix2.c:8414)
    v = self.transpose().pivots()
  File "sage/matrix/matrix_integer_dense.pyx", line 2217, in sage.matrix.matrix_integer_dense.Matrix_integer_dense.pivots (build/cythonized/sage/matrix/matrix_integer_dense.c:19162)
    E = self.echelon_form()
  File "sage/matrix/matrix_integer_dense.pyx", line 2019, in sage.matrix.matrix_integer_dense.Matrix_integer_dense.echelon_form (build/cythonized/sage/matrix/matrix_integer_dense.c:17749)
    H_m = self._hnf_pari(flag, include_zero_rows=include_zero_rows)
  File "sage/matrix/matrix_integer_dense.pyx", line 5719, in sage.matrix.matrix_integer_dense.Matrix_integer_dense._hnf_pari (build/cythonized/sage/matrix/matrix_integer_dense.c:46635)
    sig_on()
SignalError: Segmentation fault 

comment:4 Changed 3 years ago by gh-timokau

  • Description modified (diff)

comment:5 in reply to: ↑ 2 ; follow-up: Changed 3 years ago by gh-timokau

Replying to embray:

I don't know if PARI uses openblas in its multi-threaded mode but I wonder if this is related to #26585

I'll test if that openblas patch fixes it. Very interesting ticket, I wonder if that also causes #26130 (I've heard darwin is somewhat more prone to threading bugs).

Last edited 3 years ago by gh-timokau (previous) (diff)

comment:6 follow-up: Changed 3 years ago by embray

I'm not sure if it does, but it might. I grepped the pari/gp source and it doesn't use openblas_set_num_threads directly, but something else might be. I did see some reference to omp_set_num_threads but I don't think we compile with OpenMP by default in Sage.

comment:7 Changed 3 years ago by embray

There's also some multi-threading support in FLINT which could be problematic, but I have no idea if that's relevant in this case.

comment:8 Changed 3 years ago by embray

I read on the mailing list post "It is called indirectly via matplotlib when rendering plots, see full backtrace below (btw, I had to downgrade to an old version of Sage to get a meaningful backtrace - I really dislike this trend of hiding build output, it makes it very hard to debug stuff)"

I had a very similar problem to this; it actually came from the BLAS library by way of a Numpy ufunc (I think for the "dot" product of a matrix in a vector, or two matrices). I feel like I actually fixed this but now I can't remember.

comment:9 Changed 3 years ago by embray

Do you know some direct, specific way to reproduce this so that I can try it?

comment:10 in reply to: ↑ 5 ; follow-up: Changed 3 years ago by embray

Replying to gh-timokau:

Replying to embray:

I don't know if PARI uses openblas in its multi-threaded mode but I wonder if this is related to #26585

I'll test if that openblas patch fixes it. Very interesting ticket, I wonder if that also causes #26130 (I've heard darwin is somewhat more prone to threading bugs).

I don't think it's related, because this only showed up when we patched fflas-ffpack to allow configuring the number of threads to use with openblas (by default it just sets it to 1). But conceivably there's a similar bug elsewhere. Possibly related to fork(). I have found many bugs in different projects related to threads/fork interaction.

comment:11 Changed 3 years ago by embray

You know what though--I'm looking at the relevant code in openblas, and openblas_set_num_threads(1) might actually cause it to spin up a single thread (beyond the main thread) which is actually good enough to invoke the bug I fixed. I'm going to try to confirm that though.

comment:12 in reply to: ↑ 6 Changed 3 years ago by jdemeyer

Replying to embray:

I'm not sure if it does, but it might. I grepped the pari/gp source and it doesn't use openblas_set_num_threads directly, but something else might be.

I don't think that PARI uses BLAS in any way.

comment:13 Changed 3 years ago by jdemeyer

  • Description modified (diff)

comment:14 in reply to: ↑ 10 Changed 3 years ago by jdemeyer

Replying to embray:

I have found many bugs in different projects related to threads/fork interaction.

I'm also betting on this. PARI might setup some data structures related to threading (when compiled with threading support) which are invalid when running in a forked child process.

comment:15 Changed 3 years ago by embray

  • Description modified (diff)

Nevermind; that does not appear to be the case, I don't think.

Note: See TracTickets for help on using tickets.