Opened 5 years ago

Last modified 13 months ago

#22766 closed defect

Trying completion list of maxima_lib crashes Sage — at Version 19

Reported by: rws Owned by:
Priority: critical Milestone: sage-duplicate/invalid/wontfix
Component: interfaces Keywords:
Cc: jdemeyer Merged in:
Authors: Reviewers:
Report Upstream: N/A Work issues:
Branch: Commit:
Dependencies: #23956 Stopgaps:

Status badges

Description (last modified by jdemeyer)

If Maxima's commands list is not stored, then initialising Maxima/ECL and then hitting TAB after maxima_lib crashes Sage, as shown below. Other similar crashes may be triggered, see e.g. #23956.

The reason for these crashes is the design of tab completion in IPython 5+ using prompt_toolkit, which uses Python threading, and does tab completion in a separate thread.

$ rm -f ~/.sage/maxima_commandlist_cache.sobj
$ sage
┌────────────────────────────────────────────────────────────────────┐
│ SageMath version 8.0.beta0, Release Date: 2017-03-30               │
│ Type "notebook()" for the browser-based notebook interface.        │
│ Type "help()" for help.                                            │
└────────────────────────────────────────────────────────────────────┘
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃ Warning: this is a prerelease version, and it may be unstable.     ┃
┗━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┛
sage: from sage.interfaces.maxima_lib import maxima_lib
sage: maxima_lib.
Building Maxima command completion list (this takes
a few seconds only the first time you do it).
To force rebuild later, delete /home/ralf/.sage//maxima_commandlist_cache.sobj.
A
;;;
;;; Stack overflow.
;;; Jumping to the outermost toplevel prompt
;;;


Internal or unrecoverable error in:

;;;
;;; No frame to jump to
;;; Aborting ECL
;;;

;;; ECL C Backtrace
;;; /home/ralf/sage/local/lib/libecl.so.16.1(si_dump_c_backtrace+0x26) [0x7f012]
;;; /home/ralf/sage/local/lib/libecl.so.16.1(ecl_internal_error+0x3f) [0x7f0127]
;;; /home/ralf/sage/local/lib/libecl.so.16.1(FEerror+0) [0x7f0127706f20]
;;; /home/ralf/sage/local/lib/libecl.so.16.1(+0x1adb1a) [0x7f012772eb1a]
;;; /home/ralf/sage/local/lib/libecl.so.16.1(+0x12b3c5) [0x7f01276ac3c5]
;;; /home/ralf/sage/local/lib/libecl.so.16.1(cl_funcall+0x70) [0x7f01276e7c90]
;;; /home/ralf/sage/local/lib/libecl.so.16.1(si_serror+0xd9) [0x7f0127708299]
;;; /home/ralf/sage/local/lib/libecl.so.16.1(ecl_cs_overflow+0xac) [0x7f012772e]
;;; /home/ralf/sage/local/lib/libecl.so.16.1(+0x12b3c5) [0x7f01276ac3c5]
;;; /home/ralf/sage/local/lib/libecl.so.16.1(cl_funcall+0x70) [0x7f01276e7c90]
;;; /home/ralf/sage/local/lib/libecl.so.16.1(si_serror+0xd9) [0x7f0127708299]
;;; /home/ralf/sage/local/lib/libecl.so.16.1(ecl_cs_overflow+0xac) [0x7f012772e]
;;; /home/ralf/sage/local/lib/libecl.so.16.1(ecl_interpret+0x1d67) [0x7f01276ea]
;;; /home/ralf/sage/local/lib/libecl.so.16.1(cl_apply+0x145) [0x7f01276e7e95]
;;; /home/ralf/sage/local/lib/python2.7/site-packages/sage/libs/ecl.so(+0xcf1d)]
;;; /home/ralf/sage/local/lib/python2.7/site-packages/sage/libs/ecl.so(+0x15511]
;;; /home/ralf/sage/local/lib/python2.7/site-packages/sage/libs/ecl.so(+0x15d6b]
;;; /home/ralf/sage/local/lib/python2.7/site-packages/sage/libs/ecl.so(+0x16d47]
;;; /home/ralf/sage/local/lib/libpython2.7.so.1.0(+0xc0c7f) [0x7f038ea92c7f]
;;; /home/ralf/sage/local/lib/python2.7/site-packages/sage/libs/ecl.so(+0xcad8)]
;;; /home/ralf/sage/local/lib/libpython2.7.so.1.0(PyObject_Call+0x43) [0x7f038e]
;;; /home/ralf/sage/local/lib/libpython2.7.so.1.0(PyEval_EvalFrameEx+0x56da) [0]
;;; /home/ralf/sage/local/lib/libpython2.7.so.1.0(PyEval_EvalCodeEx+0x81c) [0x7]
;;; /home/ralf/sage/local/lib/libpython2.7.so.1.0(PyEval_EvalFrameEx+0x8020) [0]
;;; /home/ralf/sage/local/lib/libpython2.7.so.1.0(PyEval_EvalCodeEx+0x81c) [0x7]
;;; /home/ralf/sage/local/lib/libpython2.7.so.1.0(PyEval_EvalFrameEx+0x8020) [0]
;;; /home/ralf/sage/local/lib/libpython2.7.so.1.0(PyEval_EvalCodeEx+0x81c) [0x7]
;;; /home/ralf/sage/local/lib/libpython2.7.so.1.0(PyEval_EvalFrameEx+0x8020) [0]
;;; /home/ralf/sage/local/lib/libpython2.7.so.1.0(PyEval_EvalCodeEx+0x81c) [0x7]
;;; /home/ralf/sage/local/lib/libpython2.7.so.1.0(PyEval_EvalFrameEx+0x8020) [0]
;;; /home/ralf/sage/local/lib/libpython2.7.so.1.0(PyEval_EvalCodeEx+0x81c) [0x7]
;;; /home/ralf/sage/local/lib/libpython2.7.so.1.0(+0x87ecc) [0x7f038ea59ecc]
Aborted (core dumped)

Change History (19)

comment:1 Changed 5 years ago by rws

I think the crash happens in line 281 in /home/ralf/sage/local/var/tmp/sage/build/ecl-16.1.2.p2/src/src/c/interpreter.d.

comment:2 Changed 5 years ago by dimpase

It is because since IPython 5.*, tab completion happens in a different thread. And here you initialise Maxima in the main thread, but run Maxima (via triggering tab completion) in a separate thread.

See also #23700 and #23956 for a different way to trigger the same bug.

comment:3 follow-up: Changed 5 years ago by jhpalmieri

On OS X (10.12.6, Xcode 8.3.3, Sage 8.1.beta6), I don't get this crash, and I also don't see the message "Building Maxima command completion list ...": I just immediately get a list of completions.

comment:4 in reply to: ↑ 3 Changed 5 years ago by dimpase

Replying to jhpalmieri:

On OS X (10.12.6, Xcode 8.3.3, Sage 8.1.beta6), I don't get this crash, and I also don't see the message "Building Maxima command completion list ...": I just immediately get a list of completions.

It's because you already have this list built and stored in somewhere in ~/.sage, I suppose. Try moving it out of the way 1st.

comment:5 follow-up: Changed 5 years ago by jhpalmieri

Okay, I get the crash after deleting .sage/maxima_commandlist_cache.sobj. Do you know why starting Sage and then doing maxima.<TAB> doesn't trigger the crash, but instead rebuilds this file?

comment:6 in reply to: ↑ 5 Changed 5 years ago by dimpase

Replying to jhpalmieri:

Okay, I get the crash after deleting .sage/maxima_commandlist_cache.sobj. Do you know why starting Sage and then doing maxima.<TAB> doesn't trigger the crash, but instead rebuilds this file?

Yes, it is because in this case everything happens in the same thread (the one of the tab completion).

comment:7 follow-up: Changed 5 years ago by nbruin

Making maxima_lib "thread-safe" would consist of *locking* it to one thread. Due to the signal management switching that happens upon entering/exiting ecllib makes it fundamentally incompatible with multi-threading, because signal handlers are process-specific; not thread-specific.

Sage installs special signal handlers (for SIGINT, for instance), and so does ECL. If ECL runs with multi-threading, ECL even goes further with signal handling (it makes a dedicated signal handling thread), and it uses signals to synchronize threads for critical GC operations.

If you want to get ecllib to a state where it can safely be used in a multi-threaded environment, I think one would have to unify the signal management of sage and ecl.

The result would not actually make maxima_lib threadsafe, because maxima itself is rather fundamentally not thread-safe.

comment:8 Changed 5 years ago by rws

Removing dependencies seems so much more promising, see https://trac.sagemath.org/wiki/symbolics/maxima

comment:9 Changed 5 years ago by dimpase

Here is another scenario with two threads, only leading to an abort, not to a segfault. Here I tab-complete from sage.libs.ecl import to force initialisation in non-main thread.

sage: from sage.libs.ecl import                        
 at  init_ecl
 thread id  <Thread(Thread-32, started 139989416191744)> 
 active threads  [<_MainThread(MainThread, started 140000333952768)>, <HistorySavingThread(IPythonHistorySavingThread, started 140000088241920)>, <Thread(Thread-32, started 139989416191744)>] 
sage: from sage.libs.ecl import *
sage: from sage.interfaces.maxima_lib import *
 at  ecl_eval
 thread id  <_MainThread(MainThread, started 140000333952768)> 
 active threads  [<_MainThread(MainThread, started 140000333952768)>, <HistorySavingThread(IPythonHistorySavingThread, started 140000088241920)>] 
 at  ecl_safe_eval
 thread id  <_MainThread(MainThread, started 140000333952768)> 
 active threads  [<_MainThread(MainThread, started 140000333952768)>, <HistorySavingThread(IPythonHistorySavingThread, started 140000088241920)>] 
 at  ecl_eval
 thread id  <_MainThread(MainThread, started 140000333952768)> 
 active threads  [<_MainThread(MainThread, started 140000333952768)>, <HistorySavingThread(IPythonHistorySavingThread, started 140000088241920)>] 
 at  ecl_safe_eval
 thread id  <_MainThread(MainThread, started 140000333952768)> 
 active threads  [<_MainThread(MainThread, started 140000333952768)>, <HistorySavingThread(IPythonHistorySavingThread, started 140000088241920)>] 
Collecting from unknown thread
---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
<ipython-input-2-5e6d4a068396> in <module>()
----> 1 from sage.interfaces.maxima_lib import *

/home/dima/Sage/sage-dev/local/lib/python2.7/site-packages/sage/interfaces/maxima_lib.py in <module>()
    102 ## i.e. loading it into ECL
    103 ecl_eval("(setf *load-verbose* NIL)")
--> 104 ecl_eval("(require 'maxima)")
    105 ecl_eval("(in-package :maxima)")
    106 ecl_eval("(setq $nolabels t))")

/home/dima/Sage/sage-dev/src/sage/libs/ecl.pyx in sage.libs.ecl.ecl_eval (build/cythonized/sage/libs/ecl.c:10977)()
   1328 
   1329 #convenience routine to more easily evaluate strings
-> 1330 cpdef EclObject ecl_eval(bytes s):
   1331     """
   1332     Read and evaluate string in Lisp and return the result

/home/dima/Sage/sage-dev/src/sage/libs/ecl.pyx in sage.libs.ecl.ecl_eval (build/cythonized/sage/libs/ecl.c:10916)()
   1344     cdef cl_object o
   1345     o=ecl_safe_read_string(s)
-> 1346     o=ecl_safe_eval(o)
   1347     return ecl_wrap(o)
   1348 

/home/dima/Sage/sage-dev/src/sage/libs/ecl.pyx in sage.libs.ecl.ecl_safe_eval (build/cythonized/sage/libs/ecl.c:5710)()
    343     report_threading_status("ecl_safe_eval")
    344     cdef cl_object s
--> 345     ecl_sig_on()
    346     cl_funcall(2,safe_eval_clobj,form)
    347     ecl_sig_off()

RuntimeError: Aborted
sage: 

with the following prints inserted into ecl.pyx:

diff --git a/src/sage/libs/ecl.pyx b/src/sage/libs/ecl.pyx
index 20e937876d..879d405d78 100644
--- a/src/sage/libs/ecl.pyx
+++ b/src/sage/libs/ecl.pyx
@@ -240,6 +240,7 @@ def init_ecl():
     cdef sigaction_t sage_action[32]
     cdef int i
 
+    report_threading_status("init_ecl")
     if ecl_has_booted:
         raise RuntimeError("ECL is already initialized")
 
@@ -339,6 +340,7 @@ cdef cl_object ecl_safe_eval(cl_object form) except NULL:
         ...
         RuntimeError: ECL says: Console interrupt.
     """
+    report_threading_status("ecl_safe_eval")
     cdef cl_object s
     ecl_sig_on()
     cl_funcall(2,safe_eval_clobj,form)
@@ -1318,6 +1320,12 @@ cdef EclObject ecl_wrap(cl_object o):
     obj.set_obj(o)
     return obj
 
+cpdef report_threading_status(s):
+    import threading
+    print("\n at ", s)
+    print("\n thread id ", threading.current_thread(), "\n")
+    print(" active threads ", threading.enumerate(), "\n")
+
 #convenience routine to more easily evaluate strings
 cpdef EclObject ecl_eval(bytes s):
     """
@@ -1332,6 +1340,7 @@ cpdef EclObject ecl_eval(bytes s):
         <ECL: (1 1 2 3 5 8 13)>
 
     """
+    report_threading_status("ecl_eval")
     cdef cl_object o
     o=ecl_safe_read_string(s)
     o=ecl_safe_eval(o)

comment:10 Changed 5 years ago by dimpase

I think what we see here is ECL being initialised in a thread (number 32) that later is shut down, and then maxima_lib import breaks, as ECL isn't available to run.

It seems that indeed we must make sure that ECL is always started in the main thread, which does not disappear.

comment:11 Changed 5 years ago by dimpase

  • Milestone changed from sage-8.0 to sage-8.1
  • Priority changed from critical to blocker

A part of the relevant discussion is on #23956, which I'll close as duplicate.

comment:12 Changed 5 years ago by dimpase

  • Dependencies set to #23956

comment:13 Changed 5 years ago by dimpase

  • Description modified (diff)

comment:14 Changed 5 years ago by dimpase

  • Cc jdemeyer added

I wonder whether any other extension (apart from ECL/Maxima) is affected by this issue.

comment:15 follow-up: Changed 5 years ago by nbruin

Reiterating from 23956:

The effect of ecl_sig_on and ecl_sig_off is NOT thread-local. Thus during the clock time that ecl_sig_on is active (i.e., that ecl code is being executed), signals that are supposed to be handled by the sage signal handler will be handled in the wrong way.

That means it is NOT safe to execute sage code in a thread parallel to a thread that is executing ecl code (properly).

So, if we allow for multiple threads in sage, we'd strictly have to halt all the other threads upon executing ecl_sig_on, and start them again when the corresponding ecl_sig_off is entered. That, or cross your fingers no signals destined for python arrive during that time period.

In addition, we're running ECL with threading support on their end *disabled*. I would be surprised if, with that configuration, it is still possible to have multiple threads configured to be able to execute ECL (ECL cares a lot about knowing which threads might be executing ECL code, because they need to be stopped during critical GC events. I expect that all of that is turned off when threading support is turned off).

Given that IPython apparently runs tab completion in a separate thread, I think the most straightforward way of solving the immediate problem here is to avoid that ecl code will be run upon tab completion. That can be done by building the completion cache upon build time, rather than on-demand.

comment:16 in reply to: ↑ 15 ; follow-up: Changed 5 years ago by dimpase

Replying to nbruin:

The effect of ecl_sig_on and ecl_sig_off is NOT thread-local. Thus during the clock time that ecl_sig_on is active (i.e., that ecl code is being executed), signals that are supposed to be handled by the sage signal handler will be handled in the wrong way.

That means it is NOT safe to execute sage code in a thread parallel to a thread that is executing ecl code (properly).

Right, I think I finally understand your point about signals---sorry for being thick.

It's even worse, I think - apart from signals, ecllib does non-thread-safe things to global variables... It's known that in such a case GIL does not suffice, you also need a lock from Python threading

lock=threading.Lock()
with lock:
    <do unsafe (non-atomic) stuff here>

That is we potentially might still get hit by many threads here, even if something seemingly innocent happens.

To me it looks that to disable threads in tab completion is a more robust solution, and it will also make sure that other extensions are safe and sound in this respect, not only ECL/Maxima.

comment:17 in reply to: ↑ 16 Changed 5 years ago by nbruin

Replying to dimpase:

It's even worse, I think - apart from signals, ecllib does non-thread-safe things to global variables...

After initialization that should pretty much be limited to the modifications that are made to the ECL doubly linked list *SAGE-LIST-OF-OBJECTS*. The modifications run in ECL whenever an EclObject is made or deleted (so that should lock, probably). Otherwise I think the signal stuff is the main obstruction to thread-safety.

maximalib is a different issue: maxima is just not thread-safe in its design at all. So I don't think it's worth investing in making ecllib thread-safe (and the signals are a real obstruction), because our main application doesn't allow it anyway.

comment:18 in reply to: ↑ 7 Changed 5 years ago by jdemeyer

Replying to nbruin:

It uses signals to synchronize threads for critical GC operations.

It seems that ecl_sig_on() changes SIGINT, SIGBUS and SIGSEGV. Does it really use one of those standard signals to deal with GC operations? Because if none of those 3 signals are involved, the issue can't be signal handlers.

It is true that signals and threads generally do not mix well. Signal handlers are set on the level of the process, not threads.

comment:19 Changed 5 years ago by jdemeyer

  • Description modified (diff)
Note: See TracTickets for help on using tickets.