Opened 5 years ago

Closed 5 years ago

#22694 closed defect (fixed)

Cygwin: test failure in sage.manifolds.differentiable.affine_connection

Reported by: embray Owned by:
Priority: major Milestone: sage-8.0
Component: porting: Cygwin Keywords: windows cygwin manifolds parallel
Cc: mmancini, egourgoulhon, jpflori Merged in:
Authors: Erik Bray Reviewers: Eric Gourgoulhon, Jean-Pierre Flori
Report Upstream: Reported upstream. No feedback yet. Work issues:
Branch: 877a3fd (Commits, GitHub, GitLab) Commit: 877a3fd8c03101edb4e3e91b091372dfd8eb31b2
Dependencies: Stopgaps:

Status badges

Description (last modified by embray)

I'm consistently getting these test failures in this module on my Cygwin branch (just rebased on the latest develop branch, but I've had this for a couple weeks now):

sage -t --long --warn-long 158.6 src/sage/manifolds/differentiable/affine_connection.py
**********************************************************************
File "src/sage/manifolds/differentiable/affine_connection.py", line 1749, in sage.manifolds.differentiable.affine_connection.AffineConnection.riemann
Failed example:
    for i in M.irange():
        for j in M.irange():
            for k in M.irange():
                nab.add_coef(eV)[i,j,k] = nab.coef(eVW)[i,j,k,c_uvW].expr()
Exception raised:
    Traceback (most recent call last):
      File "/home/embray/src/sagemath/sage-cygwin/local/lib/python2.7/site-packages/sage/doctest/forker.py", line 498, in _run
        self.compile_and_execute(example, compiler, test.globs)
      File "/home/embray/src/sagemath/sage-cygwin/local/lib/python2.7/site-packages/sage/doctest/forker.py", line 861, in compile_and_execute
        exec(compiled, globs)
      File "<doctest sage.manifolds.differentiable.affine_connection.AffineConnection.riemann[30]>", line 4, in <module>
        nab.add_coef(eV)[i,j,k] = nab.coef(eVW)[i,j,k,c_uvW].expr()
      File "/home/embray/src/sagemath/sage-cygwin/local/lib/python2.7/site-packages/sage/manifolds/differentiable/affine_connection.py", line 628, in coef
        gam[[k,i,j]] = self(ev[i])(ef[k],ev[j])
      File "/home/embray/src/sagemath/sage-cygwin/local/lib/python2.7/site-packages/sage/manifolds/differentiable/affine_connection.py", line 1348, in __call__
        return self._derive_paral(tensor_r)
      File "/home/embray/src/sagemath/sage-cygwin/local/lib/python2.7/site-packages/sage/manifolds/differentiable/affine_connection.py", line 1470, in _derive_paral
        for ii,val in make_CovDerivative(listParalInput):
      File "/home/embray/src/sagemath/sage-cygwin/local/lib/python2.7/site-packages/sage/parallel/multiprocessing_sage.py", line 75, in parallel_iter
        for res in result:
      File "/home/embray/src/sagemath/sage-cygwin/local/lib/python/multiprocessing/pool.py", line 668, in next
        raise value
    RuntimeError: Aborted
**********************************************************************
File "src/sage/manifolds/differentiable/affine_connection.py", line 1754, in sage.manifolds.differentiable.affine_connection.AffineConnection.riemann
Failed example:
    r = nab.riemann() ; r
Exception raised:
    Traceback (most recent call last):
      File "/home/embray/src/sagemath/sage-cygwin/local/lib/python2.7/site-packages/sage/doctest/forker.py", line 498, in _run
        self.compile_and_execute(example, compiler, test.globs)
      File "/home/embray/src/sagemath/sage-cygwin/local/lib/python2.7/site-packages/sage/doctest/forker.py", line 861, in compile_and_execute
        exec(compiled, globs)
      File "<doctest sage.manifolds.differentiable.affine_connection.AffineConnection.riemann[31]>", line 1, in <module>
        r = nab.riemann() ; r
      File "/home/embray/src/sagemath/sage-cygwin/local/lib/python2.7/site-packages/sage/manifolds/differentiable/affine_connection.py", line 1776, in riemann
        gam_gam = gam.contract(1, gam, 0)
      File "/home/embray/src/sagemath/sage-cygwin/local/lib/python2.7/site-packages/sage/tensor/modules/comp.py", line 2316, in contract
        for ii, val in make_Contraction(listParalInput):
      File "/home/embray/src/sagemath/sage-cygwin/local/lib/python2.7/site-packages/sage/parallel/multiprocessing_sage.py", line 75, in parallel_iter
        for res in result:
      File "/home/embray/src/sagemath/sage-cygwin/local/lib/python/multiprocessing/pool.py", line 668, in next
        raise value
    TypeError: Aborted
**********************************************************************
1 item had failures:
   2 of  36 in sage.manifolds.differentiable.affine_connection.AffineConnection.riemann
    [418 tests, 2 failures, 229.98 s]

I can reproduce this outside the test suite as well. The same code works fine without parallel processing, so my guess is a problem in the parallel processing itself (though the tests for sage.parallel itself all pass).

Upstream report: https://lists.opendylan.org/pipermail/bdwgc/2017-March/006266.html

Attachments (1)

stacktrace.log (26.3 KB) - added by embray 5 years ago.

Download all attachments as: .zip

Change History (16)

comment:1 Changed 5 years ago by embray

The RuntimeError: Aborted is from Cysignals. Something is causing a SIGABRT somewhere, though I'm not sure why.

comment:2 Changed 5 years ago by egourgoulhon

  • Cc mmancini egourgoulhon added

Changed 5 years ago by embray

comment:3 follow-up: Changed 5 years ago by embray

By the way, I've made significant progress in getting to the bottom of this. The full traceback leading up to the SIGABRT is attached.

But the main issue here seems to have to do with libgc's handling of threads in the child process after a fork, on Cygwin. In principle libgc explicitly supports Cygwin, but there still seems to be a bug somewhere related to this (or possibly just a build issue though I haven't found that to be the case yet...). I'm trying to boil it down to a simpler test case, but the issue seems to be that for one reason or another the post-fork handler is either not running at all, or not working properly, because in the forked process it's leaving behind a reference to a thread from the parent process, which has a handle to a native (Windows) thread that is no longer valid in the child process. The pthread_atfork handler for the child process should have removed that thread but it doesn't.

A possible short-term workaround might be to compile libgc with threads disabled on Cygwin, but I don't know what other impacts that would have.

comment:4 Changed 5 years ago by embray

Confirmed that building libgc with ./configure --enable-threads=none works around this particular issue.

But I'd still like to get to the bottom of this; ISTM like it might be a simple bug at the end of the day.

comment:5 in reply to: ↑ 3 ; follow-up: Changed 5 years ago by egourgoulhon

Replying to embray:

By the way, I've made significant progress in getting to the bottom of this. The full traceback leading up to the SIGABRT is attached.

Thanks for this update. Another place where doctests involve parallel processing is src/sage/tensor/modules/comp.py. Do you also experience any trouble with this file?

comment:6 follow-up: Changed 5 years ago by embray

Okay, it turns out the pthread_atexit handler doesn't actually get installed unless I explicitly build with ./configure --enable-handle-fork (it can also be enabled at runtime, incidentally).

That said, the fact that it outright crashes on Windows without this option is also a bug, IMO, albeit fixable I think. But for Sage's purposes it will be easy enough to add the appropriate flags when building on Cygwin \o/

comment:7 in reply to: ↑ 5 ; follow-up: Changed 5 years ago by embray

Replying to egourgoulhon:

Another place where doctests involve parallel processing is src/sage/tensor/modules/comp.py. Do you also experience any trouble with this file?

Actually this doctest passes fine as is. This issue only arises in certain cases if ECL is allocating a lot of small objects, and gc needs to suspend all the threads whose stacks it's monitoring so that it can make more room in its heap for more small objects (at least, that's how I understand the issue).

comment:8 in reply to: ↑ 6 Changed 5 years ago by egourgoulhon

Replying to embray:

But for Sage's purposes it will be easy enough to add the appropriate flags when building on Cygwin \o/

Very good!

comment:9 in reply to: ↑ 7 Changed 5 years ago by egourgoulhon

Replying to embray:

Replying to egourgoulhon:

Actually this doctest passes fine as is. This issue only arises in certain cases if ECL is allocating a lot of small objects, and gc needs to suspend all the threads whose stacks it's monitoring so that it can make more room in its heap for more small objects (at least, that's how I understand the issue).

OK I see. Indeed the doctests in src/sage/tensor/modules/comp.py do not involve ECL (no symbolic calculus, hence no Maxima simplifications).

comment:10 Changed 5 years ago by embray

  • Authors set to Erik Bray
  • Branch set to u/embray/cygwin/ticket-22694
  • Commit set to 877a3fd8c03101edb4e3e91b091372dfd8eb31b2
  • Description modified (diff)
  • Report Upstream changed from N/A to Reported upstream. No feedback yet.
  • Status changed from new to needs_review

I've attached a patch to Sage to configure GC to work properly on Cygwin. I've also made an upstream report, but only to suggest that the default configuration for Cygwin should incorporate this (since it's broken otherwise). I don't think there's a great way to work around the issue in the code without using pthread_atfork that doesn't incur too much overhead.


New commits:

877a3fdSet the configure flags required to function correctly w.r.t. fork/threads on Cygwin

comment:11 Changed 5 years ago by egourgoulhon

  • Cc jpflori added
  • Reviewers set to Eric Gourgoulhon
  • Status changed from needs_review to positive_review

LGTM. Just putting Jean-Pierre in Cc for a cross-check, since he knows far better than me Sage build process and Cygwin.

comment:12 follow-up: Changed 5 years ago by jpflori

Looks good to me.

comment:13 Changed 5 years ago by jpflori

  • Reviewers changed from Eric Gourgoulhon to Eric Gourgoulhon, Jean-Pierre Flori

comment:14 in reply to: ↑ 12 Changed 5 years ago by egourgoulhon

Replying to jpflori:

Looks good to me.

Thanks!

comment:15 Changed 5 years ago by vbraun

  • Branch changed from u/embray/cygwin/ticket-22694 to 877a3fd8c03101edb4e3e91b091372dfd8eb31b2
  • Resolution set to fixed
  • Status changed from positive_review to closed
Note: See TracTickets for help on using tickets.