Opened 5 years ago
Closed 5 years ago
#22694 closed defect (fixed)
Cygwin: test failure in sage.manifolds.differentiable.affine_connection
Reported by: | embray | Owned by: | |
---|---|---|---|
Priority: | major | Milestone: | sage-8.0 |
Component: | porting: Cygwin | Keywords: | windows cygwin manifolds parallel |
Cc: | mmancini, egourgoulhon, jpflori | Merged in: | |
Authors: | Erik Bray | Reviewers: | Eric Gourgoulhon, Jean-Pierre Flori |
Report Upstream: | Reported upstream. No feedback yet. | Work issues: | |
Branch: | 877a3fd (Commits, GitHub, GitLab) | Commit: | 877a3fd8c03101edb4e3e91b091372dfd8eb31b2 |
Dependencies: | Stopgaps: |
Description (last modified by )
I'm consistently getting these test failures in this module on my Cygwin branch (just rebased on the latest develop branch, but I've had this for a couple weeks now):
sage -t --long --warn-long 158.6 src/sage/manifolds/differentiable/affine_connection.py ********************************************************************** File "src/sage/manifolds/differentiable/affine_connection.py", line 1749, in sage.manifolds.differentiable.affine_connection.AffineConnection.riemann Failed example: for i in M.irange(): for j in M.irange(): for k in M.irange(): nab.add_coef(eV)[i,j,k] = nab.coef(eVW)[i,j,k,c_uvW].expr() Exception raised: Traceback (most recent call last): File "/home/embray/src/sagemath/sage-cygwin/local/lib/python2.7/site-packages/sage/doctest/forker.py", line 498, in _run self.compile_and_execute(example, compiler, test.globs) File "/home/embray/src/sagemath/sage-cygwin/local/lib/python2.7/site-packages/sage/doctest/forker.py", line 861, in compile_and_execute exec(compiled, globs) File "<doctest sage.manifolds.differentiable.affine_connection.AffineConnection.riemann[30]>", line 4, in <module> nab.add_coef(eV)[i,j,k] = nab.coef(eVW)[i,j,k,c_uvW].expr() File "/home/embray/src/sagemath/sage-cygwin/local/lib/python2.7/site-packages/sage/manifolds/differentiable/affine_connection.py", line 628, in coef gam[[k,i,j]] = self(ev[i])(ef[k],ev[j]) File "/home/embray/src/sagemath/sage-cygwin/local/lib/python2.7/site-packages/sage/manifolds/differentiable/affine_connection.py", line 1348, in __call__ return self._derive_paral(tensor_r) File "/home/embray/src/sagemath/sage-cygwin/local/lib/python2.7/site-packages/sage/manifolds/differentiable/affine_connection.py", line 1470, in _derive_paral for ii,val in make_CovDerivative(listParalInput): File "/home/embray/src/sagemath/sage-cygwin/local/lib/python2.7/site-packages/sage/parallel/multiprocessing_sage.py", line 75, in parallel_iter for res in result: File "/home/embray/src/sagemath/sage-cygwin/local/lib/python/multiprocessing/pool.py", line 668, in next raise value RuntimeError: Aborted ********************************************************************** File "src/sage/manifolds/differentiable/affine_connection.py", line 1754, in sage.manifolds.differentiable.affine_connection.AffineConnection.riemann Failed example: r = nab.riemann() ; r Exception raised: Traceback (most recent call last): File "/home/embray/src/sagemath/sage-cygwin/local/lib/python2.7/site-packages/sage/doctest/forker.py", line 498, in _run self.compile_and_execute(example, compiler, test.globs) File "/home/embray/src/sagemath/sage-cygwin/local/lib/python2.7/site-packages/sage/doctest/forker.py", line 861, in compile_and_execute exec(compiled, globs) File "<doctest sage.manifolds.differentiable.affine_connection.AffineConnection.riemann[31]>", line 1, in <module> r = nab.riemann() ; r File "/home/embray/src/sagemath/sage-cygwin/local/lib/python2.7/site-packages/sage/manifolds/differentiable/affine_connection.py", line 1776, in riemann gam_gam = gam.contract(1, gam, 0) File "/home/embray/src/sagemath/sage-cygwin/local/lib/python2.7/site-packages/sage/tensor/modules/comp.py", line 2316, in contract for ii, val in make_Contraction(listParalInput): File "/home/embray/src/sagemath/sage-cygwin/local/lib/python2.7/site-packages/sage/parallel/multiprocessing_sage.py", line 75, in parallel_iter for res in result: File "/home/embray/src/sagemath/sage-cygwin/local/lib/python/multiprocessing/pool.py", line 668, in next raise value TypeError: Aborted ********************************************************************** 1 item had failures: 2 of 36 in sage.manifolds.differentiable.affine_connection.AffineConnection.riemann [418 tests, 2 failures, 229.98 s]
I can reproduce this outside the test suite as well. The same code works fine without parallel processing, so my guess is a problem in the parallel processing itself (though the tests for sage.parallel
itself all pass).
Upstream report: https://lists.opendylan.org/pipermail/bdwgc/2017-March/006266.html
Attachments (1)
Change History (16)
comment:1 Changed 5 years ago by
comment:2 Changed 5 years ago by
- Cc mmancini egourgoulhon added
Changed 5 years ago by
comment:3 follow-up: ↓ 5 Changed 5 years ago by
By the way, I've made significant progress in getting to the bottom of this. The full traceback leading up to the SIGABRT is attached.
But the main issue here seems to have to do with libgc's handling of threads in the child process after a fork, on Cygwin. In principle libgc explicitly supports Cygwin, but there still seems to be a bug somewhere related to this (or possibly just a build issue though I haven't found that to be the case yet...). I'm trying to boil it down to a simpler test case, but the issue seems to be that for one reason or another the post-fork handler is either not running at all, or not working properly, because in the forked process it's leaving behind a reference to a thread from the parent process, which has a handle to a native (Windows) thread that is no longer valid in the child process. The pthread_atfork
handler for the child process should have removed that thread but it doesn't.
A possible short-term workaround might be to compile libgc with threads disabled on Cygwin, but I don't know what other impacts that would have.
comment:4 Changed 5 years ago by
Confirmed that building libgc with ./configure --enable-threads=none
works around this particular issue.
But I'd still like to get to the bottom of this; ISTM like it might be a simple bug at the end of the day.
comment:5 in reply to: ↑ 3 ; follow-up: ↓ 7 Changed 5 years ago by
Replying to embray:
By the way, I've made significant progress in getting to the bottom of this. The full traceback leading up to the SIGABRT is attached.
Thanks for this update.
Another place where doctests involve parallel processing is src/sage/tensor/modules/comp.py
. Do you also experience any trouble with this file?
comment:6 follow-up: ↓ 8 Changed 5 years ago by
Okay, it turns out the pthread_atexit
handler doesn't actually get installed unless I explicitly build with ./configure --enable-handle-fork
(it can also be enabled at runtime, incidentally).
That said, the fact that it outright crashes on Windows without this option is also a bug, IMO, albeit fixable I think. But for Sage's purposes it will be easy enough to add the appropriate flags when building on Cygwin \o/
comment:7 in reply to: ↑ 5 ; follow-up: ↓ 9 Changed 5 years ago by
Replying to egourgoulhon:
Another place where doctests involve parallel processing is
src/sage/tensor/modules/comp.py
. Do you also experience any trouble with this file?
Actually this doctest passes fine as is. This issue only arises in certain cases if ECL is allocating a lot of small objects, and gc needs to suspend all the threads whose stacks it's monitoring so that it can make more room in its heap for more small objects (at least, that's how I understand the issue).
comment:8 in reply to: ↑ 6 Changed 5 years ago by
Replying to embray:
But for Sage's purposes it will be easy enough to add the appropriate flags when building on Cygwin \o/
Very good!
comment:9 in reply to: ↑ 7 Changed 5 years ago by
Replying to embray:
Replying to egourgoulhon:
Actually this doctest passes fine as is. This issue only arises in certain cases if ECL is allocating a lot of small objects, and gc needs to suspend all the threads whose stacks it's monitoring so that it can make more room in its heap for more small objects (at least, that's how I understand the issue).
OK I see. Indeed the doctests in src/sage/tensor/modules/comp.py
do not involve ECL (no symbolic calculus, hence no Maxima simplifications).
comment:10 Changed 5 years ago by
- Branch set to u/embray/cygwin/ticket-22694
- Commit set to 877a3fd8c03101edb4e3e91b091372dfd8eb31b2
- Description modified (diff)
- Report Upstream changed from N/A to Reported upstream. No feedback yet.
- Status changed from new to needs_review
I've attached a patch to Sage to configure GC to work properly on Cygwin. I've also made an upstream report, but only to suggest that the default configuration for Cygwin should incorporate this (since it's broken otherwise). I don't think there's a great way to work around the issue in the code without using pthread_atfork
that doesn't incur too much overhead.
New commits:
877a3fd | Set the configure flags required to function correctly w.r.t. fork/threads on Cygwin
|
comment:11 Changed 5 years ago by
- Cc jpflori added
- Reviewers set to Eric Gourgoulhon
- Status changed from needs_review to positive_review
LGTM. Just putting Jean-Pierre in Cc for a cross-check, since he knows far better than me Sage build process and Cygwin.
comment:12 follow-up: ↓ 14 Changed 5 years ago by
Looks good to me.
comment:13 Changed 5 years ago by
- Reviewers changed from Eric Gourgoulhon to Eric Gourgoulhon, Jean-Pierre Flori
comment:14 in reply to: ↑ 12 Changed 5 years ago by
comment:15 Changed 5 years ago by
- Branch changed from u/embray/cygwin/ticket-22694 to 877a3fd8c03101edb4e3e91b091372dfd8eb31b2
- Resolution set to fixed
- Status changed from positive_review to closed
The
RuntimeError: Aborted
is from Cysignals. Something is causing aSIGABRT
somewhere, though I'm not sure why.