Opened 10 years ago

Closed 10 years ago

Last modified 7 years ago

#14426 closed defect (fixed)

Runaway/Segfaulting ECL processes

Reported by: Jeroen Demeyer
Owned by: Jeroen Demeyer
Priority: blocker
Milestone: sage-5.9
Component: packages: standard
Keywords:
Cc: Leif Leonhardy, Jean-Pierre Flori
Merged in: sage-5.9.rc0
Authors: Jeroen Demeyer
Reviewers: Volker Braun, John Cremona
Report Upstream: Reported upstream. Developers acknowledge bug.
Work issues:
Branch:
Commit:
Dependencies:
Stopgaps:

Description (last modified by Jeroen Demeyer)

On some systems, when executing

./sage -tp --long devel/sage/sage/interfaces/lisp.py

there are two ECL processes whose strace log shows the following:

read(0, "(setq sage0 2)\n", 1024)       = 15
write(1, "\n", 1)                       = 1
write(1, "2", 1)                        = 1
write(1, "\n", 1)                       = 1
write(1, ">", 1)                        = 1
write(1, " ", 1)                        = 1
read(0, 0x7f2c263b1000, 1024)           = -1 EIO (Input/output error)
--- SIGHUP (Hangup) @ 0 (0) ---
--- SIGCONT (Continued) @ 0 (0) ---
select(1, [0], NULL, NULL, {0, 0})      = 1 (in [0], left {0, 0})
select(1, [0], NULL, NULL, {0, 0})      = 1 (in [0], left {0, 0})
read(0, "", 1024)                       = 0
write(2, "\n", 1)                       = -1 EIO (Input/output error)
write(2, "\n", 1)                       = -1 EIO (Input/output error)
write(2, "\n", 1)                       = -1 EIO (Input/output error)
write(2, "\n", 1)                       = -1 EIO (Input/output error)
[...]

after which they either segfault or keep running forever.

A different way to see this problem:

jdemeyer@boxen:/release/merger/sage-5.9.beta2$ ./sage --sh -c 'echo syntax error |ecl 2>/dev/full'
ECL (Embeddable Common-Lisp) 12.12.1 (git:UNKNOWN)
Copyright (C) 1984 Taiichi Yuasa and Masami Hagiya
Copyright (C) 1993 Giuseppe Attardi
Copyright (C) 2000 Juan J. Garcia-Ripoll
ECL is free software, and you are welcome to redistribute it
under certain conditions; see file 'Copyright' for details.
Type :h for Help.  
Top level.
> /bin/bash: line 1: 11264 Done                    echo syntax error
     11265 Segmentation fault      | ecl 2> /dev/full
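
The redirection to /dev/full works as a reproduction because, on Linux, every write to that device fails with ENOSPC, so any program whose stderr points there gets an error each time it tries to print a message. A minimal Python sketch of that behaviour, independent of ECL (the helper name write_errno is hypothetical and not part of Sage or ECL):

```python
import errno
import os


def write_errno(path, data=b"\n"):
    """Attempt one write to `path`.

    Returns None if the write succeeded, otherwise the errno of the
    failure (including a failure to open the file at all).
    """
    try:
        fd = os.open(path, os.O_WRONLY)
    except OSError as exc:
        return exc.errno
    try:
        os.write(fd, data)
    except OSError as exc:
        return exc.errno
    finally:
        os.close(fd)
    return None


if __name__ == "__main__":
    result = write_errno("/dev/full")
    if result == errno.ENOSPC:
        print("write to /dev/full failed with ENOSPC")
    elif result == errno.ENOENT:
        print("/dev/full does not exist on this system")
    else:
        print("unexpected result:", result)
```

Note that the strace log above shows EIO rather than ENOSPC: there the failing descriptor is a terminal that has been hung up, not /dev/full. In both cases the point is the same, namely that stderr itself has become unwritable.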

upstream bug: https://gitlab.com/embeddable-common-lisp/ecl/issues/43

spkg: http://boxen.math.washington.edu/home/jdemeyer/spkg/ecl-12.12.1.p2.spkg (diff)

apply: 14426_doctest.patch

ecl-12.12.1.p2 (Jeroen Demeyer, 9 April 2013)

  • #14426: write_error.patch: avoid an infinite loop when reporting an error while writing to stderr.
  • Rename spkg-make to spkg-src.
  • Don't unset MAKEFLAGS (it was not clear why this was needed).
  • It seems no longer needed to disable Altivec.
  • Support ECL_CONFIGURE environment variable for options to ./configure.
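
The first item describes a general pattern: if reporting an error requires writing to stderr, and that write itself fails, the failure must not be reported through the same path again. The sketch below illustrates the idea in Python under that assumption. It is not the content of write_error.patch (which is C code inside ECL and is not reproduced in this ticket), and the class name ErrorReporter is hypothetical.

```python
import sys


class ErrorReporter:
    """Write error messages to a stream, giving up if the stream fails."""

    def __init__(self, stream=None):
        self.stream = stream if stream is not None else sys.stderr
        self.reporting = False  # True while a report is in progress
        self.dropped = 0        # number of reports that were abandoned

    def report(self, message):
        """Return True if the message was written, False if it was dropped."""
        if self.reporting:
            # Re-entered while already reporting: refuse to recurse.
            self.dropped += 1
            return False
        self.reporting = True
        try:
            self.stream.write(message + "\n")
            self.stream.flush()
            return True
        except OSError:
            # A naive handler would report this failure on the same
            # stream, fail again, and loop forever. Drop it instead.
            self.dropped += 1
            return False
        finally:
            self.reporting = False
```

With a stream whose write always raises OSError, report() returns False once and the caller continues, rather than spinning on repeated failed writes as in the strace log above.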

Attachments (2)

14426_doctest.patch (2.3 KB) - added by Jeroen Demeyer 10 years ago.
ecl-12.12.1.p2.diff (5.0 KB) - added by Jeroen Demeyer 10 years ago.

Change History (46)

comment:1 Changed 10 years ago by Jeroen Demeyer

Description: modified (diff)
Summary: changed from "Runaway ECL processes" to "Runaway/Segfaulting ECL processes"

comment:2 Changed 10 years ago by Jeroen Demeyer

Description: modified (diff)

comment:3 Changed 10 years ago by Jeroen Demeyer

Description: modified (diff)

comment:4 Changed 10 years ago by Jeroen Demeyer

Component: changed from "doctest framework" to "packages: standard"
Description: modified (diff)
Owner: changed from David Roe to Jeroen Demeyer

comment:5 Changed 10 years ago by Jeroen Demeyer

Authors: Jeroen Demeyer
Description: modified (diff)

comment:6 Changed 10 years ago by Jeroen Demeyer

Description: modified (diff)
Report Upstream: changed from "N/A" to "Reported upstream. No feedback yet."
Status: changed from new to needs_review

comment:7 Changed 10 years ago by Jeroen Demeyer

Description: modified (diff)

comment:8 Changed 10 years ago by John Cremona

I am testing this now, on a machine which showed the problem up to now. I expect it to work since it's the machine on which Jeroen diagnosed the problem, so anyone else who saw the problem should test it too.

comment:9 Changed 10 years ago by Leif Leonhardy

Cc: Leif Leonhardy added

comment:10 Changed 10 years ago by Leif Leonhardy

Isn't the problem also that PExpect interfaces apparently do not properly get shut down?


The bug / patch should be (better) documented in the spkg; AFAICS there's not even a link to the upstream report there.

comment:11 Changed 10 years ago by Andrey Novoseltsev

Fixes the problem for me!

comment:12 Changed 10 years ago by John Cremona

I installed the spkg and patch and now almost no file can be doctested successfully. For example

      File "/home/jec/sage-5.9.beta4/local/lib/python2.7/site-packages/sage/interfaces/maxima_lib.py", line 80, in <module>
        ecl_eval("(require 'maxima)")
      File "ecl.pyx", line 1225, in sage.libs.ecl.ecl_eval (sage/libs/ecl.c:7102)
      File "ecl.pyx", line 1240, in sage.libs.ecl.ecl_eval (sage/libs/ecl.c:7039)
      File "ecl.pyx", line 246, in sage.libs.ecl.ecl_safe_eval (sage/libs/ecl.c:2901)
    RuntimeError: ECL says: Module error: Don't know how to REQUIRE MAXIMA.

comment:13 in reply to:  12 Changed 10 years ago by Leif Leonhardy

Replying to cremona:

I installed the spkg and patch and now almost no file can be doctested successfully. For example

      File "/home/jec/sage-5.9.beta4/local/lib/python2.7/site-packages/sage/interfaces/maxima_lib.py", line 80, in <module>
        ecl_eval("(require 'maxima)")
      File "ecl.pyx", line 1225, in sage.libs.ecl.ecl_eval (sage/libs/ecl.c:7102)
      File "ecl.pyx", line 1240, in sage.libs.ecl.ecl_eval (sage/libs/ecl.c:7039)
      File "ecl.pyx", line 246, in sage.libs.ecl.ecl_safe_eval (sage/libs/ecl.c:2901)
    RuntimeError: ECL says: Module error: Don't know how to REQUIRE MAXIMA.

You of course have to rebuild the spkgs that depend on ECL as well, i.e., Maxima, and do sage -b afterwards.

comment:14 Changed 10 years ago by Leif Leonhardy

FWIW, I think we met that "double-fault" problem with stderr before, quite a while ago, and IIRC discussed it with upstream, so it's a bit astonishing it's still in. (Although the circumstances were probably slightly different.)

comment:15 Changed 10 years ago by Leif Leonhardy

SPKG.txt lacks a "Patches" section, and the following "Special Update/Build? Instructions" should get corrected:

 * Note: the way we configure Sage, CXX and CXXFLAGS are unused.
 * Note: for the time being, ECL is built single threaded library as it
   seems to interact badly with the pexpect interface and Sage's signal
   handling when built multithreaded.

(Related to the first, printing the settings of CXX and CXXFLAGS in spkg-install then makes no sense.)

comment:16 Changed 10 years ago by Leif Leonhardy

As expected, this solves the issues for me with ECL and (Ubuntu's) GNU Make 3.81 and the new doctesting framework on Ubuntu 10.04.4 LTS x86_64. (Haven't tested on x86 yet, but I assume it will fix the specific ECL issue there as well.)

Still, a working cleaner should have properly killed the processes, and it's not obvious what actually caused ECL to run amok (i.e., why writing to stderr fails in the first place).
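
The expectation in the last paragraph, that a supervising process kills children which never exit, can be sketched as follows. This is only an illustration of the idea in Python; it does not show how Sage's cleaner or the doctest framework is implemented, and the function name run_with_deadline is hypothetical.

```python
import subprocess
import sys


def run_with_deadline(argv, timeout):
    """Run `argv`; return its exit status, or None if it had to be killed."""
    proc = subprocess.Popen(
        argv,
        stdin=subprocess.DEVNULL,
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    try:
        return proc.wait(timeout=timeout)
    except subprocess.TimeoutExpired:
        # The child is still running after the deadline: kill it and
        # reap it so that no runaway process is left behind.
        proc.kill()
        proc.wait()
        return None


if __name__ == "__main__":
    # A child that loops forever stands in for a runaway process.
    looping = [sys.executable, "-c", "while True: pass"]
    print(run_with_deadline(looping, timeout=1.0))
```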

comment:17 Changed 10 years ago by Jeroen Demeyer

Description: modified (diff)

I made some further small changes to the spkg-install file.

comment:18 Changed 10 years ago by John Cremona

OK, it worked for me after both rebuilding maxima and also the whole Sage library (sage -ba) after applying the patch and new spkg.

comment:19 Changed 10 years ago by Volker Braun

Reviewers: Volker Braun, John Cremona
Status: changed from needs_review to positive_review

Looks good to me

comment:20 Changed 10 years ago by Jeroen Demeyer

On various systems, this causes ECL-related doctest failures. I have no idea why...

comment:21 Changed 10 years ago by Jeroen Demeyer

Also: /dev/full doesn't exist on all systems.
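
A test that relies on /dev/full therefore needs a guard that skips it where the device is missing. A minimal sketch of such a guard, assuming a Python test helper (the name run_with_stderr_on is hypothetical, and this is not the code in 14426_doctest.patch):

```python
import os
import subprocess
import sys


def run_with_stderr_on(argv, device="/dev/full", timeout=30):
    """Run `argv` with stderr redirected to `device`.

    Returns the exit status, or None if `device` does not exist, in
    which case the caller should skip the test. If the command does
    not finish within `timeout` seconds, subprocess.run kills it and
    raises subprocess.TimeoutExpired.
    """
    if not os.path.exists(device):
        return None
    with open(device, "wb", buffering=0) as err:
        proc = subprocess.run(
            argv,
            stdin=subprocess.DEVNULL,
            stdout=subprocess.DEVNULL,
            stderr=err,
            timeout=timeout,
        )
    return proc.returncode


if __name__ == "__main__":
    # A trivial Python child stands in for the real command under test.
    status = run_with_stderr_on([sys.executable, "-c", "pass"])
    print("skipped" if status is None else "exit status %d" % status)
```

The timeout matters as much as the existence check: with the unpatched ECL, the command under test might never return.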

Changed 10 years ago by Jeroen Demeyer

Attachment: 14426_doctest.patch added

Changed 10 years ago by Jeroen Demeyer

Attachment: ecl-12.12.1.p2.diff added

comment:22 Changed 10 years ago by Jeroen Demeyer

Status: changed from positive_review to needs_work

comment:23 Changed 10 years ago by Jeroen Demeyer

In particular, the doctest

sage: var('a,b,c') ## line 416 ##
(a, b, c)
sage: eqn = [a+b*c==1, b-a*c==0, a+b==5] ## line 418 ##
sage: s = solve(eqn, a,b,c); s ## line 419 ##

in devel/sage/doc/en/constructions/linear_algebra.rst seems problematic for ECL.

comment:24 Changed 10 years ago by Volker Braun

I guess /dev/full is linux only.

I don't get any doctest failures from linear_algebra.rst, for the record.

comment:25 in reply to:  24 ; Changed 10 years ago by Jeroen Demeyer

Replying to vbraun:

I don't get any doctest failures from linear_algebra.rst, for the record.

Well, the error isn't reproducible. When it fails, it usually fails like

sage: var('a,b,c') ## line 416 ##
(a, b, c)
sage: eqn = [a+b*c==1, b-a*c==0, a+b==5] ## line 418 ##
sage: s = solve(eqn, a,b,c); s ## line 419 ##

;;; Unhandled lisp initialization error
;;; Message:
UNBOUND-VARIABLE
;;; Arguments:

Internal or unrecoverable error in:

Lisp initialization error.

  [2: No such file or directory]

;;; ECL C Backtrace
;;; /home/buildbot/build/sage/eno-1/eno_full/build/sage-5.9.rc0/local/lib/libecl.so.12.12(si_dump_c_backtrace+0x28) [0x7f6a15678208]
;;; /home/buildbot/build/sage/eno-1/eno_full/build/sage-5.9.rc0/local/lib/libecl.so.12.12(ecl_internal_error+0x3f) [0x7f6a156631df]
;;; /home/buildbot/build/sage/eno-1/eno_full/build/sage-5.9.rc0/local/lib/libecl.so.12.12(+0x124324) [0x7f6a15663324]
;;; /home/buildbot/build/sage/eno-1/eno_full/build/sage-5.9.rc0/local/lib/libecl.so.12.12(cl_funcall+0x70) [0x7f6a15646410]
;;; /home/buildbot/build/sage/eno-1/eno_full/build/sage-5.9.rc0/local/lib/libecl.so.12.12(cl_error+0xdb) [0x7f6a1566416b]
;;; /home/buildbot/build/sage/eno-1/eno_full/build/sage-5.9.rc0/local/lib/libecl.so.12.12(+0x1254b2) [0x7f6a156644b2]
;;; /home/buildbot/build/sage/eno-1/eno_full/build/sage-5.9.rc0/local/lib/libecl.so.12.12(FEwrong_type_argument+0x1e) [0x7f6a156644de]
;;; /home/buildbot/build/sage/eno-1/eno_full/build/sage-5.9.rc0/local/lib/libecl.so.12.12(stream_dispatch_table+0x17) [0x7f6a15656e47]
;;; /home/buildbot/build/sage/eno-1/eno_full/build/sage-5.9.rc0/local/lib/libecl.so.12.12(ecl_write_char+0x1b) [0x7f6a156576db]
;;; /home/buildbot/build/sage/eno-1/eno_full/build/sage-5.9.rc0/local/lib/libecl.so.12.12(+0x13769b) [0x7f6a1567669b]
;;; /home/buildbot/build/sage/eno-1/eno_full/build/sage-5.9.rc0/local/lib/libecl.so.12.12(_ecl_write_symbol+0x156) [0x7f6a15676bf6]
;;; /home/buildbot/build/sage/eno-1/eno_full/build/sage-5.9.rc0/local/lib/libecl.so.12.12(si_write_ugly_object+0x26) [0x7f6a15675cf6]
;;; /home/buildbot/build/sage/eno-1/eno_full/build/sage-5.9.rc0/local/lib/libecl.so.12.12(+0x12430b) [0x7f6a1566330b]
;;; /home/buildbot/build/sage/eno-1/eno_full/build/sage-5.9.rc0/local/lib/libecl.so.12.12(cl_funcall+0x70) [0x7f6a15646410]
;;; /home/buildbot/build/sage/eno-1/eno_full/build/sage-5.9.rc0/local/lib/libecl.so.12.12(cl_error+0xdb) [0x7f6a1566416b]
;;; /home/buildbot/build/sage/eno-1/eno_full/build/sage-5.9.rc0/local/lib/libecl.so.12.12(+0x125308) [0x7f6a15664308]
;;; /home/buildbot/build/sage/eno-1/eno_full/build/sage-5.9.rc0/local/lib/libecl.so.12.12(ecl_interpret+0x19cd) [0x7f6a1564869d]
;;; /home/buildbot/build/sage/eno-1/eno_full/build/sage-5.9.rc0/local/lib/libecl.so.12.12(+0x10e36f) [0x7f6a1564d36f]
;;; /home/buildbot/build/sage/eno-1/eno_full/build/sage-5.9.rc0/local/lib/libecl.so.12.12(si_eval_with_env+0x2eb) [0x7f6a1564ef2b]
;;; /home/buildbot/build/sage/eno-1/eno_full/build/sage-5.9.rc0/local/lib/libecl.so.12.12(si_signal_simple_error+0x26d) [0x7f6a15613e6d]
;;; /home/buildbot/build/sage/eno-1/eno_full/build/sage-5.9.rc0/local/lib/libecl.so.12.12(FEwrong_type_nth_arg+0x109) [0x7f6a15663d29]
;;; /home/buildbot/build/sage/eno-1/eno_full/build/sage-5.9.rc0/local/lib/libecl.so.12.12(_ecl_sethash+0) [0x7f6a156901a0]
;;; /home/buildbot/build/sage/eno-1/eno_full/build/sage-5.9.rc0/local/lib/libecl.so.12.12(+0x14df58) [0x7f6a1568cf58]
;;; /lib64/libpthread.so.0() [0x36e9e0f500]
;;; /home/buildbot/build/sage/eno-1/eno_full/build/sage-5.9.rc0/local/lib/libecl.so.12.12(+0xb621e) [0x7f6a155f521e]
;;; /home/buildbot/build/sage/eno-1/eno_full/build/sage-5.9.rc0/local/lib/libecl.so.12.12(+0xbce17) [0x7f6a155fbe17]
;;; /home/buildbot/build/sage/eno-1/eno_full/build/sage-5.9.rc0/local/lib/libecl.so.12.12(+0xbd368) [0x7f6a155fc368]
;;; /home/buildbot/build/sage/eno-1/eno_full/build/sage-5.9.rc0/local/lib/libecl.so.12.12(+0xbd882) [0x7f6a155fc882]
;;; /home/buildbot/build/sage/eno-1/eno_full/build/sage-5.9.rc0/local/lib/libecl.so.12.12(+0xbd8bd) [0x7f6a155fc8bd]
;;; /home/buildbot/build/sage/eno-1/eno_full/build/sage-5.9.rc0/local/lib/libecl.so.12.12(+0xbd8bd) [0x7f6a155fc8bd]
;;; /home/buildbot/build/sage/eno-1/eno_full/build/sage-5.9.rc0/local/lib/libecl.so.12.12(+0xbd8bd) [0x7f6a155fc8bd]
;;; /home/buildbot/build/sage/eno-1/eno_full/build/sage-5.9.rc0/local/lib/libecl.so.12.12(+0xbd8bd) [0x7f6a155fc8bd]

**********************************************************************
----------------------------------------------------------------------
sage -t --long devel/sage/doc/en/constructions/linear_algebra.rst  # Killed due to abort
----------------------------------------------------------------------

comment:26 Changed 10 years ago by Jean-Pierre Flori

Cc: Jean-Pierre Flori added

comment:27 Changed 10 years ago by Jeroen Demeyer

Status: changed from needs_work to needs_review

New version of the patch seems to work fine.

comment:28 Changed 10 years ago by Jean-Pierre Flori

I guess you mean the version where you check /dev/full exists?

comment:29 in reply to:  28 Changed 10 years ago by Jeroen Demeyer

Replying to jpflori:

I guess you mean the version where you check /dev/full exists?

And the new version of patches/write_error.patch inside the ECL spkg.

comment:30 Changed 10 years ago by Jean-Pierre Flori

Could you post the old version so that I can spot the differences?

comment:31 in reply to:  30 Changed 10 years ago by Jeroen Demeyer

Replying to jpflori:

Could you post the old version so that I can spot the differences?

I don't have the old version anymore. But that doesn't matter, could you perhaps review it as if there never was a previous version?

comment:32 Changed 10 years ago by Jean-Pierre Flori

As the patch is quite simple, I was wondering what was failing before and caused the random failures, but of course I can pretend this previous version did not exist.

comment:33 Changed 10 years ago by Jeroen Demeyer

The previous version patched restartable_io_error() but that was called from different places, possibly causing the problems.

comment:34 Changed 10 years ago by Volker Braun

Status: changed from needs_review to positive_review

Looks good to me.

comment:35 Changed 10 years ago by Jeroen Demeyer

Merged in: sage-5.9.rc0
Resolution: fixed
Status: changed from positive_review to closed

comment:36 Changed 10 years ago by Julien Puydt

Was the patch forwarded to upstream?

comment:37 in reply to:  36 ; Changed 10 years ago by Jeroen Demeyer

Replying to Snark:

Was the patch forwarded to upstream?

Yes.

comment:38 in reply to:  37 Changed 10 years ago by Julien Puydt

Replying to jdemeyer:

Replying to Snark:

Was the patch forwarded to upstream?

Yes.

Perfect!

comment:39 in reply to:  25 Changed 10 years ago by John Palmieri

Replying to jdemeyer:

Replying to vbraun:

I don't get any doctest failures from linear_algebra.rst, for the record.

Well, the error isn't reproducible. When it fails, it usually fails like

Are you sure this is related to this ticket? I only see this after applying the patches at #14055, and I see this whether I have applied the patches here or not. This is happening on both mark and taurus. (I also mentioned it on #14055.)

comment:40 in reply to:  36 ; Changed 10 years ago by Jeroen Demeyer

Replying to Snark:

Was the patch forwarded to upstream?

Yes, but upstream is totally ignoring it...

comment:41 in reply to:  40 Changed 7 years ago by Dima Pasechnik

Replying to jdemeyer:

Replying to Snark:

Was the patch forwarded to upstream?

Yes, but upstream is totally ignoring it...

Here is another try. Upstream points out, correctly, that the patch does not work if ECL is configured without disabling threads.

https://gitlab.com/embeddable-common-lisp/ecl/merge_requests/1 and https://gitlab.com/embeddable-common-lisp/ecl/issues/43

comment:42 Changed 7 years ago by Dima Pasechnik

Report Upstream: changed from "Reported upstream. No feedback yet." to "Reported upstream. Developers acknowledge bug."

comment:43 Changed 7 years ago by Jeroen Demeyer

Description: modified (diff)

comment:44 Changed 7 years ago by Jeroen Demeyer

Description: modified (diff)