#14426 closed defect (fixed)
Runaway/Segfaulting ECL processes
Reported by: | jdemeyer | Owned by: | jdemeyer |
---|---|---|---|
Priority: | blocker | Milestone: | sage-5.9 |
Component: | packages: standard | Keywords: | |
Cc: | leif, jpflori | Merged in: | sage-5.9.rc0 |
Authors: | Jeroen Demeyer | Reviewers: | Volker Braun, John Cremona |
Report Upstream: | Reported upstream. Developers acknowledge bug. | Work issues: | |
Branch: | Commit: | ||
Dependencies: | Stopgaps: |
Description
On some systems, when executing
./sage -tp --long devel/sage/sage/interfaces/lisp.py
there are two ECL processes which do (strace log)
read(0, "(setq sage0 2)\n", 1024) = 15
write(1, "\n", 1) = 1
write(1, "2", 1) = 1
write(1, "\n", 1) = 1
write(1, ">", 1) = 1
write(1, " ", 1) = 1
read(0, 0x7f2c263b1000, 1024) = -1 EIO (Input/output error)
--- SIGHUP (Hangup) @ 0 (0) ---
--- SIGCONT (Continued) @ 0 (0) ---
select(1, [0], NULL, NULL, {0, 0}) = 1 (in [0], left {0, 0})
select(1, [0], NULL, NULL, {0, 0}) = 1 (in [0], left {0, 0})
read(0, "", 1024) = 0
write(2, "\n", 1) = -1 EIO (Input/output error)
write(2, "\n", 1) = -1 EIO (Input/output error)
write(2, "\n", 1) = -1 EIO (Input/output error)
write(2, "\n", 1) = -1 EIO (Input/output error)
[...]
after which they either segfault or keep running forever.
A different way to see this problem:
jdemeyer@boxen:/release/merger/sage-5.9.beta2$ ./sage --sh -c 'echo syntax error |ecl 2>/dev/full'
ECL (Embeddable Common-Lisp) 12.12.1 (git:UNKNOWN)
Copyright (C) 1984 Taiichi Yuasa and Masami Hagiya
Copyright (C) 1993 Giuseppe Attardi
Copyright (C) 2000 Juan J. Garcia-Ripoll
ECL is free software, and you are welcome to redistribute it
under certain conditions; see file 'Copyright' for details.
Type :h for Help.
Top level.
> /bin/bash: line 1: 11264 Done echo syntax error
11265 Segmentation fault | ecl 2> /dev/full
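The reproduction above works because every write to /dev/full fails, so ECL hits a write error as soon as it tries to print to stderr. A minimal Python sketch of that behaviour follows; the function name write_to_dev_full is made up for illustration, and the sketch assumes a system where /dev/full fails writes with ENOSPC (as on Linux). Note that the strace log shows EIO instead, because there the failing descriptor was a hung-up terminal rather than /dev/full.

```python
import errno
import os


def write_to_dev_full(data=b"\n"):
    """Try one write to /dev/full and return the errno name, or None.

    Returns None when /dev/full does not exist (it is not available on
    every system) or when the write unexpectedly succeeds.
    """
    if not os.path.exists("/dev/full"):
        return None
    fd = os.open("/dev/full", os.O_WRONLY)
    try:
        os.write(fd, data)
    except OSError as exc:
        # Translate the numeric errno into its symbolic name.
        return errno.errorcode.get(exc.errno)
    finally:
        os.close(fd)
    return None


print(write_to_dev_full())
```

Any program whose stderr points at /dev/full sees this failure on each write, which makes it a cheap stand-in for the broken terminal in the doctest run.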
upstream bug: https://gitlab.com/embeddable-common-lisp/ecl/issues/43
spkg: http://boxen.math.washington.edu/home/jdemeyer/spkg/ecl-12.12.1.p2.spkg (diff)
apply: 14426_doctest.patch
ecl-12.12.1.p2 (Jeroen Demeyer, 9 April 2013)
- #14426: write_error.patch: avoid an infinite loop when reporting an error while writing to stderr.
- Rename spkg-make to spkg-src.
- Don't unset MAKEFLAGS (it was not clear why this was needed).
- It seems no longer needed to disable Altivec.
- Support ECL_CONFIGURE environment variable for options to ./configure.
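The idea behind the write_error.patch entry can be sketched as follows. This is not the actual patch (which is C code inside the ECL spkg and is not reproduced on this ticket); it is a Python illustration of the general technique, assuming the loop arises because a failed write to the error stream is itself reported on the same error stream. The names FullStream and ErrorReporter are hypothetical.

```python
import errno
import io


class FullStream(io.TextIOBase):
    """Hypothetical stream whose writes always fail, like /dev/full."""

    def writable(self):
        return True

    def write(self, text):
        raise OSError(errno.ENOSPC, "No space left on device")


class ErrorReporter:
    """Report messages on a stream without retrying on write failure."""

    def __init__(self, stream):
        self.stream = stream
        self.dropped = 0  # number of messages that could not be written

    def report(self, message):
        try:
            self.stream.write(message + "\n")
            return True
        except OSError:
            # Reporting this failure would mean writing to the same
            # broken stream again, so count it and give up instead.
            self.dropped += 1
            return False


reporter = ErrorReporter(FullStream())
print(reporter.report("syntax error"), reporter.dropped)
```

The design choice is simply that the error path for the error stream must terminate: it either drops the message or exits, but never re-enters the reporting code.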
Attachments (2)
Change History (46)
comment:1 Changed 9 years ago by
- Description modified (diff)
- Summary changed from Runaway ECL processes to Runaway/Segfaulting ECL processes
comment:2 Changed 9 years ago by
- Description modified (diff)
comment:3 Changed 9 years ago by
- Description modified (diff)
comment:4 Changed 9 years ago by
- Component changed from doctest framework to packages: standard
- Description modified (diff)
- Owner changed from roed to jdemeyer
comment:5 Changed 9 years ago by
- Description modified (diff)
comment:6 Changed 9 years ago by
- Description modified (diff)
- Report Upstream changed from N/A to Reported upstream. No feedback yet.
- Status changed from new to needs_review
comment:7 Changed 9 years ago by
- Description modified (diff)
comment:8 Changed 9 years ago by
comment:9 Changed 9 years ago by
- Cc leif added
comment:10 Changed 9 years ago by
Isn't the problem also that PExpect interfaces apparently do not properly get shut down?
The bug / patch should be (better) documented in the spkg; AFAICS there's not even a link to the upstream report there.
comment:11 Changed 9 years ago by
Fixes the problem for me!
comment:12 follow-up: ↓ 13 Changed 9 years ago by
I installed the spkg and patch and now almost no file can be doctested successfully. For example
File "/home/jec/sage-5.9.beta4/local/lib/python2.7/site-packages/sage/interfaces/maxima_lib.py", line 80, in <module>
    ecl_eval("(require 'maxima)")
File "ecl.pyx", line 1225, in sage.libs.ecl.ecl_eval (sage/libs/ecl.c:7102)
File "ecl.pyx", line 1240, in sage.libs.ecl.ecl_eval (sage/libs/ecl.c:7039)
File "ecl.pyx", line 246, in sage.libs.ecl.ecl_safe_eval (sage/libs/ecl.c:2901)
RuntimeError: ECL says: Module error: Don't know how to REQUIRE MAXIMA.
comment:13 in reply to: ↑ 12 Changed 9 years ago by
Replying to cremona:
I installed the spkg and patch and now almost no file can be doctested successfully. For example
File "/home/jec/sage-5.9.beta4/local/lib/python2.7/site-packages/sage/interfaces/maxima_lib.py", line 80, in <module>
    ecl_eval("(require 'maxima)")
File "ecl.pyx", line 1225, in sage.libs.ecl.ecl_eval (sage/libs/ecl.c:7102)
File "ecl.pyx", line 1240, in sage.libs.ecl.ecl_eval (sage/libs/ecl.c:7039)
File "ecl.pyx", line 246, in sage.libs.ecl.ecl_safe_eval (sage/libs/ecl.c:2901)
RuntimeError: ECL says: Module error: Don't know how to REQUIRE MAXIMA.
You of course have to rebuild the spkgs that depend on ECL as well, i.e., Maxima, and do sage -b afterwards.
comment:14 Changed 9 years ago by
FWIW, I think we met that "double-fault" problem with stderr before, quite a while ago, and IIRC discussed it with upstream, so it's a bit astonishing it's still in. (Although the circumstances were probably slightly different.)
comment:15 Changed 9 years ago by
SPKG.txt lacks a "Patches" section, and the following "Special Update/Build Instructions" should get corrected:
* Note: the way we configure Sage, CXX and CXXFLAGS are unused.
* Note: for the time being, ECL is built single threaded library as it seems to interact badly with the pexpect interface and Sage's signal handling when built multithreaded.
(Related to the first, printing the settings of CXX and CXXFLAGS in spkg-install then makes no sense.)
comment:16 Changed 9 years ago by
As expected, this solves the issues for me with ECL and (Ubuntu's) GNU Make 3.81 and the new doctesting framework on Ubuntu 10.04.4 LTS x86_64. (Haven't tested on x86 yet, but I assume it will fix the specific ECL issue there as well.)
Still, a working cleaner should have properly killed the processes, and it's not obvious what actually caused ECL to run amok (i.e., why writing to stderr fails in the first place).
comment:17 Changed 9 years ago by
- Description modified (diff)
I made some further small changes to the spkg-install file.
comment:18 Changed 9 years ago by
OK, it worked for me after both rebuilding maxima and also the whole Sage library (sage -ba) after applying the patch and new spkg.
comment:19 Changed 9 years ago by
- Reviewers set to Volker Braun, John Cremona
- Status changed from needs_review to positive_review
Looks good to me
comment:20 Changed 9 years ago by
On various systems, this causes ECL-related doctest failures. I have no idea why...
comment:21 Changed 9 years ago by
Also: /dev/full doesn't exist on all systems.
Changed 9 years ago by
Changed 9 years ago by
comment:22 Changed 9 years ago by
- Status changed from positive_review to needs_work
comment:23 Changed 9 years ago by
In particular, the doctest
sage: var('a,b,c') ## line 416 ##
(a, b, c)
sage: eqn = [a+b*c==1, b-a*c==0, a+b==5] ## line 418 ##
sage: s = solve(eqn, a,b,c); s ## line 419 ##
in devel/sage/doc/en/constructions/linear_algebra.rst seems problematic for ECL.
comment:24 follow-up: ↓ 25 Changed 9 years ago by
I guess /dev/full is Linux only.
I don't get any doctest failures from linear_algebra.rst, for the record.
comment:25 in reply to: ↑ 24 ; follow-up: ↓ 39 Changed 9 years ago by
Replying to vbraun:
I don't get any doctest failures from linear_algebra.rst, for the record.
Well, the error isn't reproducible. When it fails, it usually fails like
sage: var('a,b,c') ## line 416 ##
(a, b, c)
sage: eqn = [a+b*c==1, b-a*c==0, a+b==5] ## line 418 ##
sage: s = solve(eqn, a,b,c); s ## line 419 ##
;;; Unhandled lisp initialization error
;;; Message: UNBOUND-VARIABLE
;;; Arguments:
Internal or unrecoverable error in: Lisp initialization error. [2: No such file or directory]
;;; ECL C Backtrace
;;; /home/buildbot/build/sage/eno-1/eno_full/build/sage-5.9.rc0/local/lib/libecl.so.12.12(si_dump_c_backtrace+0x28) [0x7f6a15678208]
;;; /home/buildbot/build/sage/eno-1/eno_full/build/sage-5.9.rc0/local/lib/libecl.so.12.12(ecl_internal_error+0x3f) [0x7f6a156631df]
;;; /home/buildbot/build/sage/eno-1/eno_full/build/sage-5.9.rc0/local/lib/libecl.so.12.12(+0x124324) [0x7f6a15663324]
;;; /home/buildbot/build/sage/eno-1/eno_full/build/sage-5.9.rc0/local/lib/libecl.so.12.12(cl_funcall+0x70) [0x7f6a15646410]
;;; /home/buildbot/build/sage/eno-1/eno_full/build/sage-5.9.rc0/local/lib/libecl.so.12.12(cl_error+0xdb) [0x7f6a1566416b]
;;; /home/buildbot/build/sage/eno-1/eno_full/build/sage-5.9.rc0/local/lib/libecl.so.12.12(+0x1254b2) [0x7f6a156644b2]
;;; /home/buildbot/build/sage/eno-1/eno_full/build/sage-5.9.rc0/local/lib/libecl.so.12.12(FEwrong_type_argument+0x1e) [0x7f6a156644de]
;;; /home/buildbot/build/sage/eno-1/eno_full/build/sage-5.9.rc0/local/lib/libecl.so.12.12(stream_dispatch_table+0x17) [0x7f6a15656e47]
;;; /home/buildbot/build/sage/eno-1/eno_full/build/sage-5.9.rc0/local/lib/libecl.so.12.12(ecl_write_char+0x1b) [0x7f6a156576db]
;;; /home/buildbot/build/sage/eno-1/eno_full/build/sage-5.9.rc0/local/lib/libecl.so.12.12(+0x13769b) [0x7f6a1567669b]
;;; /home/buildbot/build/sage/eno-1/eno_full/build/sage-5.9.rc0/local/lib/libecl.so.12.12(_ecl_write_symbol+0x156) [0x7f6a15676bf6]
;;; /home/buildbot/build/sage/eno-1/eno_full/build/sage-5.9.rc0/local/lib/libecl.so.12.12(si_write_ugly_object+0x26) [0x7f6a15675cf6]
;;; /home/buildbot/build/sage/eno-1/eno_full/build/sage-5.9.rc0/local/lib/libecl.so.12.12(+0x12430b) [0x7f6a1566330b]
;;; /home/buildbot/build/sage/eno-1/eno_full/build/sage-5.9.rc0/local/lib/libecl.so.12.12(cl_funcall+0x70) [0x7f6a15646410]
;;; /home/buildbot/build/sage/eno-1/eno_full/build/sage-5.9.rc0/local/lib/libecl.so.12.12(cl_error+0xdb) [0x7f6a1566416b]
;;; /home/buildbot/build/sage/eno-1/eno_full/build/sage-5.9.rc0/local/lib/libecl.so.12.12(+0x125308) [0x7f6a15664308]
;;; /home/buildbot/build/sage/eno-1/eno_full/build/sage-5.9.rc0/local/lib/libecl.so.12.12(ecl_interpret+0x19cd) [0x7f6a1564869d]
;;; /home/buildbot/build/sage/eno-1/eno_full/build/sage-5.9.rc0/local/lib/libecl.so.12.12(+0x10e36f) [0x7f6a1564d36f]
;;; /home/buildbot/build/sage/eno-1/eno_full/build/sage-5.9.rc0/local/lib/libecl.so.12.12(si_eval_with_env+0x2eb) [0x7f6a1564ef2b]
;;; /home/buildbot/build/sage/eno-1/eno_full/build/sage-5.9.rc0/local/lib/libecl.so.12.12(si_signal_simple_error+0x26d) [0x7f6a15613e6d]
;;; /home/buildbot/build/sage/eno-1/eno_full/build/sage-5.9.rc0/local/lib/libecl.so.12.12(FEwrong_type_nth_arg+0x109) [0x7f6a15663d29]
;;; /home/buildbot/build/sage/eno-1/eno_full/build/sage-5.9.rc0/local/lib/libecl.so.12.12(_ecl_sethash+0) [0x7f6a156901a0]
;;; /home/buildbot/build/sage/eno-1/eno_full/build/sage-5.9.rc0/local/lib/libecl.so.12.12(+0x14df58) [0x7f6a1568cf58]
;;; /lib64/libpthread.so.0() [0x36e9e0f500]
;;; /home/buildbot/build/sage/eno-1/eno_full/build/sage-5.9.rc0/local/lib/libecl.so.12.12(+0xb621e) [0x7f6a155f521e]
;;; /home/buildbot/build/sage/eno-1/eno_full/build/sage-5.9.rc0/local/lib/libecl.so.12.12(+0xbce17) [0x7f6a155fbe17]
;;; /home/buildbot/build/sage/eno-1/eno_full/build/sage-5.9.rc0/local/lib/libecl.so.12.12(+0xbd368) [0x7f6a155fc368]
;;; /home/buildbot/build/sage/eno-1/eno_full/build/sage-5.9.rc0/local/lib/libecl.so.12.12(+0xbd882) [0x7f6a155fc882]
;;; /home/buildbot/build/sage/eno-1/eno_full/build/sage-5.9.rc0/local/lib/libecl.so.12.12(+0xbd8bd) [0x7f6a155fc8bd]
;;; /home/buildbot/build/sage/eno-1/eno_full/build/sage-5.9.rc0/local/lib/libecl.so.12.12(+0xbd8bd) [0x7f6a155fc8bd]
;;; /home/buildbot/build/sage/eno-1/eno_full/build/sage-5.9.rc0/local/lib/libecl.so.12.12(+0xbd8bd) [0x7f6a155fc8bd]
;;; /home/buildbot/build/sage/eno-1/eno_full/build/sage-5.9.rc0/local/lib/libecl.so.12.12(+0xbd8bd) [0x7f6a155fc8bd]
**********************************************************************
----------------------------------------------------------------------
sage -t --long devel/sage/doc/en/constructions/linear_algebra.rst # Killed due to abort
----------------------------------------------------------------------
comment:26 Changed 9 years ago by
- Cc jpflori added
comment:27 Changed 9 years ago by
- Status changed from needs_work to needs_review
New version of the patch seems to work fine.
comment:28 follow-up: ↓ 29 Changed 9 years ago by
I guess you mean the version where you check /dev/full exists?
comment:29 in reply to: ↑ 28 Changed 9 years ago by
Replying to jpflori:
I guess you mean the version where you check /dev/full exists?
And the new version of patches/write_error.patch inside the ECL spkg.
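For readers who want to rerun the check by hand, a guarded version of the /dev/full test might look like the sketch below. It is not the content of 14426_doctest.patch; the helper name run_with_full_stderr and the 30-second timeout are assumptions made for illustration. It skips the check when /dev/full is missing and uses a timeout so that a runaway child is killed instead of running forever.

```python
import os
import subprocess
import sys


def run_with_full_stderr(argv, stdin_data=b"", timeout=30):
    """Run argv with stderr on /dev/full; return exit status or None.

    None means the check was skipped because /dev/full is missing.
    subprocess.TimeoutExpired is raised (after the child has been
    killed) if the child is still running when the timeout expires.
    """
    if not os.path.exists("/dev/full"):
        return None
    with open("/dev/full", "wb") as full:
        proc = subprocess.run(
            argv,
            input=stdin_data,
            stdout=subprocess.DEVNULL,
            stderr=full,
            timeout=timeout,
        )
    return proc.returncode


# Harmless demonstration with the Python interpreter itself.
print(run_with_full_stderr([sys.executable, "-c", "pass"]))
```

Assuming an ecl binary is on the PATH, run_with_full_stderr(["ecl"], b"syntax error\n") would mirror the shell reproduction in the description; a negative return value (the child died from a signal, such as a segmentation fault) or a timeout would indicate the bug.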
comment:30 follow-up: ↓ 31 Changed 9 years ago by
Could you post the old version so that I can spot the differences?
comment:31 in reply to: ↑ 30 Changed 9 years ago by
Replying to jpflori:
Could you post the old version so that I can spot the differences?
I don't have the old version anymore. But that doesn't matter, could you perhaps review it as if there never was a previous version?
comment:32 Changed 9 years ago by
As the patch is quite simple, I was wondering what was failing before and caused the random failures, but of course I can pretend this previous version did not exist.
comment:33 Changed 9 years ago by
The previous version patched restartable_io_error() but that was called from different places, possibly causing the problems.
comment:34 Changed 9 years ago by
- Status changed from needs_review to positive_review
Looks good to me.
comment:35 Changed 9 years ago by
- Merged in set to sage-5.9.rc0
- Resolution set to fixed
- Status changed from positive_review to closed
comment:36 follow-ups: ↓ 37 ↓ 40 Changed 9 years ago by
Was the patch forwarded to upstream?
comment:37 in reply to: ↑ 36 ; follow-up: ↓ 38 Changed 9 years ago by
comment:38 in reply to: ↑ 37 Changed 9 years ago by
comment:39 in reply to: ↑ 25 Changed 9 years ago by
Replying to jdemeyer:
Replying to vbraun:
I don't get any doctest failures from linear_algebra.rst, for the record.
Well, the error isn't reproducible. When it fails, it usually fails like
Are you sure this is related to this ticket? I only see this after applying the patches at #14055, and I see this whether I have applied the patches here or not. This is happening on both mark and taurus. (I also mentioned it on #14055.)
comment:40 in reply to: ↑ 36 ; follow-up: ↓ 41 Changed 9 years ago by
comment:41 in reply to: ↑ 40 Changed 7 years ago by
Replying to jdemeyer:
Replying to Snark:
Was the patch forwarded to upstream?
Yes, but upstream is totally ignoring it...
Here is another try. Upstream points out, correctly, that the patch does not work if ECL is configured without disabling threads.
https://gitlab.com/embeddable-common-lisp/ecl/merge_requests/1 and https://gitlab.com/embeddable-common-lisp/ecl/issues/43
comment:42 Changed 7 years ago by
- Report Upstream changed from Reported upstream. No feedback yet. to Reported upstream. Developers acknowledge bug.
comment:43 Changed 7 years ago by
- Description modified (diff)
comment:44 Changed 7 years ago by
- Description modified (diff)
I am testing this now, on a machine which showed the problem up to now. I expect it to work since it's the machine on which Jeroen diagnosed the problem, so anyone else who saw the problem should test it too.