Opened 8 years ago

Closed 8 years ago

#10818 closed defect (fixed)

EclLib should allow signals to make LISP code interruptable

Reported by: nbruin Owned by: was
Priority: major Milestone: sage-4.7.1
Component: interfaces Keywords: lisp ecl signal interrupt
Cc: fbissey, jdemeyer, kcrisman Merged in: sage-4.7.1.alpha0
Authors: Nils Bruin, Jeroen Demeyer Reviewers: Jean-Pierre Flori
Report Upstream: N/A Work issues:
Branch: Commit:
Dependencies: Stopgaps:

Description (last modified by jdemeyer)

Presently, ecllib does not enable signals when executing ecl code. This makes LISP code uninterruptable:

sage: from sage.libs.ecl import *
sage: ecl_eval("(setf i 0)")
<ECL: 0>
sage: inf_loop=ecl_eval("(defun infinite() (loop (incf i)))")
sage: inf_loop() #DON'T DO THIS! (bye bye)

The signal handling in ECL should be studied a bit more to ensure that we are safely interacting with it.

When testing this patch, absolutely test it to sage-4.7.alpha1 or later since ECL was updated (#10766) and interrupts were completely rewritten (#9678).

Apply 10818_handlerswap.p4.patch

Attachments (6)

signal-handling.patch (1.7 KB) - added by nbruin 8 years ago.
rudimentary approach to signal handling in Ecllib
handlerswap.patch (2.7 KB) - added by nbruin 8 years ago.
Try to swap signal handlers
siginttest.c (1.6 KB) - added by nbruin 8 years ago.
handlerswap.p2.patch (4.0 KB) - added by nbruin 8 years ago.
Swap signal handlers and enable signals (this one seems to work!)
10818_handlerswap.p3.patch (7.3 KB) - added by jdemeyer 8 years ago.
10818_handlerswap.p4.patch (7.5 KB) - added by jdemeyer 8 years ago.

Download all attachments as: .zip

Change History (48)

Changed 8 years ago by nbruin

rudimentary approach to signal handling in Ecllib

comment:1 Changed 8 years ago by nbruin

  • Status changed from new to needs_work

With the patch, we can indeed interrupt ECL. We should try to do some more critical stuff and see if ECL can get in an inconsistent state due to this (I expect it can). Example session:

sage: from sage.libs.ecl import *
sage: ecl_eval("(setf i 0)")
<ECL: 0>
sage: inf_loop=ecl_eval("(defun infinite() (loop (incf i)))")
sage: inf_loop()
KeyboardInterrupt: 
sage: ecl_eval("i")
<ECL: 5337027>
sage: ecl_eval("(infinite)")
KeyboardInterrupt: 
sage: ecl_eval("i")
<ECL: 15518182>

comment:2 Changed 8 years ago by fbissey

  • Cc fbissey added

Changed 8 years ago by nbruin

Try to swap signal handlers

Changed 8 years ago by nbruin

comment:3 Changed 8 years ago by nbruin

A simple approach would be to swap in ECL's signal handler for SIGINT upon entry of ECL code and swap back to the sage signal handler afterwards. The patch handlerswap.patch takes this approach. Unfortunately, it doesn't work. For some reason, the SIGINT handler we capture from ECL seems to just exit upon interrupt. This is not ECL's normal behaviour and the attached stand-alone program siginttest.c (also attached) shows that this approach works in principle.

I have no idea why the same approach leads to different results in a sage and cython context.

comment:4 Changed 8 years ago by nbruin

  • Milestone changed from sage-feature to sage-4.7
  • Status changed from needs_work to needs_review
  • Work issues investigate ECL interrupt handling deleted

Apply handlerswap.p2.patch

Another take on swapping handlers and this one seems to work:

sage: from sage.interfaces.maxima_lib import maxima as max
sage: max('sum(1/x^2, x, 1, 100000)')
^C---------------------------------------------------------------------------
[...]
TypeError: ECL says: Console interrupt
sage: #we press Ctrl-C again
KeyboardInterrupt

For reference, ECL prefers to use sigaction (in fact, probably sage should be using sigaction as well. Certain race conditions with signal are avoidable with sigaction). In glibc, signal is implemented via sigaction anyway. sigaction has a richer interface, so signal cannot always properly report what sigaction has done. By using sigaction to record and swap the different SIGINT handlers, it seems to work.

Subtle race conditions still exist: the ECL handler is instated just before entering lisp code and setting up the "unwind-protect stack frame", so an interrupt that gets delivered in that window would cause unspecified behaviour. By running

while True: os.kill(<PID>,2)

it is easy to trigger such behaviour, but sage in its present state already cannot handle this. Therefore, I think the present patch establishes acceptable "best effort" behaviour.

Changed 8 years ago by nbruin

Swap signal handlers and enable signals (this one seems to work!)

comment:5 Changed 8 years ago by nbruin

Small update:

  • Juanjo explains that unwelcome signals in ECL get queued. To mark this fact, a memory block gets mprotected to readonly. The next time a write occurs, a SIGSEGV is received and the queued signal is processed if it is now safe to do so. This means, however, that if ECL is allowed to process *any* signals then it should be allowed to handle SIGSEGV. So that handler needs to be swapped as well.
  • constants SIGINT and SIGSEGV are now actually imported.

Race conditions still exist. If you run a sage process with pid PID with something along the lines of

from sage.libs.ecl import *
lp = ecl_eval("(defun lp (a) (loop))")
while True:
    try:
        lp(1)
    except RuntimeError as e:
        print e

and then run in another process:

while True:
    os.kill(PID,2)
    sleep(0.01)

you may eventually see strange behaviour. To avoid this we could disable signals prior to swapping the handlers and only enable them once we're inside lisp (and do the reverse on the way out). However, the signal handlers in sage are already not completely bullet proof (I don't think there are many programs where they are). Improvements welcome of course, but I think the present gives a workable compromise.

comment:6 Changed 8 years ago by nbruin

patchbot apply handlerswap.p2.patch

comment:7 follow-up: Changed 8 years ago by drkirkby

If this can leave ECL in an undefined state, and so potentially Maxima generate incorrect results, would it not be better left out? Not being able to interrupt code is annoying, but potentially getting incorrect results is more dangerous.

Dave

comment:8 in reply to: ↑ 7 Changed 8 years ago by nbruin

Replying to drkirkby:

If this can leave ECL in an undefined state, and so potentially Maxima generate incorrect results

I don't know if maxima is robust in the face of interrupts, but we're already exposed to that by letting pexpect propagate SIGINT.

As far as I know, the present patch does not contribute to possible corruption of ECL. You might end up in the ECL debugger if the interrupt arrives between teardown of the Lisp version of try/expect and the swap of the interrupt handlers. This is easily distinguished from a wrong answer.

See http://sourceforge.net/mailarchive/message.php?msg_id=27127709 for JJ G-P's suggestion for dealing with this.

comment:9 Changed 8 years ago by jdemeyer

  • Cc jdemeyer added
  • Description modified (diff)
  • Reviewers set to Jeroen Demeyer

comment:10 Changed 8 years ago by jdemeyer

  • Description modified (diff)

comment:11 Changed 8 years ago by jdemeyer

  • Authors changed from Nils Bruin to Nils Bruin, Jeroen Demeyer
  • Description modified (diff)
  • Reviewers Jeroen Demeyer deleted

comment:12 follow-up: Changed 8 years ago by jdemeyer

With my patch, I introduce new functions ecl_sig_on() and ecl_sig_off() which is a cleaner solution. I also added two tests to check that interrupts actually work.

comment:13 in reply to: ↑ 12 ; follow-ups: Changed 8 years ago by nbruin

Nice cleanup! Two questions:

  • in ecl_sig_on(), you do a sig_on() BEFORE switching to the ecl signal handler. That provides a window for an interrupt to still be handled by the SAGE handler. Is that intentional? (It might even be a good idea if that "clears out" the signal queue before switching to ecl handling, because ecl will not be aware of any signals queued by SAGE, which might cause them to grow stale)
  • Juanjo's suggestion http://sourceforge.net/mailarchive/message.php?msg_id=27127709 includes using some macros from ecl's header files. Can that be integrated in ecl_sig_on() etc? Since you are moving things to a .c file, using those macros will be much easier. I gave some thought on how to import those into cython but failed to come up with a solution.

comment:14 in reply to: ↑ 13 Changed 8 years ago by jdemeyer

Replying to nbruin:

in ecl_sig_on(), you do a sig_on() BEFORE switching to the ecl signal handler. That provides a window for an interrupt to still be handled by the SAGE handler. Is that intentional?

Yes, it is intentional. It fixes the problem of an interrupt arriving *before* ecl_sig_on() is called, I added a test to prove that it works.

comment:15 in reply to: ↑ 13 Changed 8 years ago by jdemeyer

Replying to nbruin:

Juanjo's suggestion http://sourceforge.net/mailarchive/message.php?msg_id=27127709 includes using some macros from ecl's header files. Can that be integrated in ecl_sig_on() etc? Since you are moving things to a .c file, using those macros will be much easier.

Maybe it's a good idea, maybe not. I do not know how ECL's interrupt handling works (and to be honest, I don't feel like studying it). So personally, I am not going to do this now.

comment:16 follow-up: Changed 8 years ago by jdemeyer

It would be good to include this patch before #7377. Nils, are you able to review this? (if no, just say so).

comment:17 Changed 8 years ago by jdemeyer

  • Description modified (diff)

comment:18 in reply to: ↑ 16 ; follow-up: Changed 8 years ago by nbruin

Replying to jdemeyer:

It would be good to include this patch before #7377. Nils, are you able to review this?

No, sorry. I'm currently not set up to apply and test patches. I gave your code a quick look and it looks great

  • The fact that ecl_sig_on() is a macro will work great with Juanjo's refinement later
  • I'm a little puzzled that ecl_sig_off() is not a macro. Doesn't the same argument apply there?

(certainly if we apply Juanjo's suggestion it will have to be, because I think there is a macro pair that involves straddling "{" and "}"). However, if the current code works that that is fine. We can always change it later.

  • Jeroen is almost certainly more of an expert in Posix signal handling than I am. Take that as a recommendation.

comment:19 in reply to: ↑ 18 Changed 8 years ago by jdemeyer

Replying to nbruin:

  • I'm a little puzzled that ecl_sig_off() is not a macro. Doesn't the same argument apply there?

(certainly if we apply Juanjo's suggestion it will have to be, because I think there is a macro pair that involves straddling "{" and "}"). However, if the current code works that that is fine. We can always change it later.

The main reason for not making it a macro is that there is currently no reason why it should be a macro. Between macros and inline functions, I much prefer inline functions. So I only use macros when they are really needed.

Let me state again that I'm not against Juanjo's suggestion, it's just that I don't really know how the ECL signal handling works. The code which is currently on this ticket makes few assumptions on the inner workings of the ECL signal handler. The more we start "interfering" with the ECL signal handler, the more important it becomes to really understand what is going on.

comment:20 follow-up: Changed 8 years ago by jpflori

  • Status changed from needs_review to needs_work

I've looked through the code and (after understanding lots of things about signal handling in Sage) it seems great.

I would be ready to give it positive review if the test_sigint_... was a little bit more explicit.

Right now, letting sig_on(), sig_off() in ecl_sig_on() and ecl_sig_off() does not look really different (ok, there is one line more).

For example adding an infinite loop between ecl_sig_on and ecl_sig_off:

    global safe_funcall_clobj
    cl_eval(string_to_object("(setf i 0)"))
    cl_eval(string_to_object("(defun infinite() (loop (incf i)))"))
    inf_loop=cl_eval(string_to_object("(symbol-function 'infinite)"))
    signal_raise(SIGINT)
    ecl_sig_on() # KeyboardInterrupt should be raised because ecl_sig_on() calls sig_on() first
    cl_funcall(1,inf_loop)
    ecl_sig_off()

comment:21 in reply to: ↑ 20 Changed 8 years ago by jdemeyer

  • Reviewers set to Jean-Pierre Flori
  • Status changed from needs_work to needs_review

Replying to jpflori:

For example adding an infinite loop between ecl_sig_on and ecl_sig_off:

    global safe_funcall_clobj
    cl_eval(string_to_object("(setf i 0)"))
    cl_eval(string_to_object("(defun infinite() (loop (incf i)))"))
    inf_loop=cl_eval(string_to_object("(symbol-function 'infinite)"))
    signal_raise(SIGINT)
    ecl_sig_on() # KeyboardInterrupt should be raised because ecl_sig_on() calls sig_on() first
    cl_funcall(1,inf_loop)
    ecl_sig_off()

I decided not to do this, in order to keep the code simple and make it more clear what is happening. I did add more explanation to the test though, I hope this helps also.

comment:22 follow-up: Changed 8 years ago by jpflori

  • Status changed from needs_review to needs_work

I'm sorry to complain a little more about the test you put, but I think putting something python-ish at that place is a bad idea.

Of course because you will actually leave "sig_on()" in "ecl_sig_on()", that won't cause any problems, but if we were to leave out "sig_on()" we won't see "Failure" printed (the SIGINT will get caught before, and that is the main reason which makes it a bad test for me) and that will completely break interrupt handling (pressing Ctrl-C afterwards will lead you into Lisp debugger, I don't even know how to get out of there without pressing Ctrl+\). So IMHO that is not a good example to clearly understand what is going on with interrupt handling for someone who is not acquainted to it (just as me !).

Therefore I put something C-ish (but too complicated I agree) in there. Without "sig_on()", at least it should not lead to a broken situation because the interrupt only gets caught after the C-ish part is finished (but my example is even worse than yours because the C-ish part never ends...).

So you could just add a line stating that even with sig_on(), "Failure !" will not be printed, or a little explanatory paragraph in a note rather than just comments inside sage code.

Changed 8 years ago by jdemeyer

comment:23 in reply to: ↑ 22 Changed 8 years ago by jdemeyer

  • Status changed from needs_work to needs_review

Replying to jpflori:

Therefore I put something C-ish (but too complicated I agree) in there.

Now I added an abort() which is certainly C-ish.

comment:24 Changed 8 years ago by jpflori

  • Status changed from needs_review to positive_review

Thanks, I'm happy with that.

comment:25 Changed 8 years ago by jdemeyer

  • Merged in set to sage-4.7.alpha5
  • Resolution set to fixed
  • Status changed from positive_review to closed

comment:26 follow-up: Changed 8 years ago by jdemeyer

  • Merged in sage-4.7.alpha5 deleted
  • Resolution fixed deleted
  • Status changed from closed to new

On OSX 10.6-32 (bsd):

sage -t -long  -force_lib devel/sage/sage/libs/ecl.pyx
The doctested process was killed by signal 10

comment:27 Changed 8 years ago by jdemeyer

  • Status changed from new to needs_work

comment:28 Changed 8 years ago by jdemeyer

  • Milestone changed from sage-4.7 to sage-4.7.1

comment:29 Changed 8 years ago by kcrisman

  • Cc kcrisman added

comment:30 in reply to: ↑ 26 Changed 8 years ago by fbissey

Replying to jdemeyer:

On OSX 10.6-32 (bsd):

sage -t -long  -force_lib devel/sage/sage/libs/ecl.pyx
The doctested process was killed by signal 10

Would it be at all possible to have the output of

sage -t -long -verbose -force_lib devel/sage/sage/libs/ecl.pyx

on that machine?

comment:31 Changed 8 years ago by jpflori

I'll try to test that on different OSX versions (I guess this is an Intel processor).

comment:32 Changed 8 years ago by jdemeyer

We need to handle SIGBUS in addition to SIGINT and SIGSEGV.

comment:33 Changed 8 years ago by jdemeyer

  • Description modified (diff)
  • Status changed from needs_work to needs_review

Changed 8 years ago by jdemeyer

comment:34 follow-ups: Changed 8 years ago by kcrisman

I don't have access to bsd, but on my OS X 10.6 Intel laptop and my OS X 10.4 PPC desktop I get, using the command fbissey suggests,

----------------------------------------------------------------------
All tests passed!
Total time for all tests: 4.0 seconds

Well, it was 36.1 seconds on the older machine! So for what it's worth, the latest patch seems to be ok. Since this is Jeroen's patch, presumably he tested it works on bsd?

comment:35 in reply to: ↑ 34 Changed 8 years ago by jdemeyer

Replying to kcrisman:

Since this is Jeroen's patch, presumably he tested it works on bsd?

No, bsd is totally completely utterly unusably slow these days so I can't test it on that machine. I have sent an email to William about this slowness.

comment:36 in reply to: ↑ 34 Changed 8 years ago by jdemeyer

Replying to kcrisman:

Since this is Jeroen's patch, presumably he tested it works on bsd?

bsd is fixed now and yes it works on that machine. Can anybody look at the patch and give a review?

comment:37 Changed 8 years ago by kcrisman

Can you post a diff from previous versions if no one reviews it soon? It looks ok but it takes me a long time to review this sort of code properly so I don't feel comfortable giving review until I have time to sit down and really parse it.

comment:38 follow-ups: Changed 8 years ago by jpflori

The only difference with the previous version is that SIGBUS is handled as SIGINT and SIGSEGV now (so basically a line added everytime they are involved).

I think that the code its testcases and its doc are fine, but I don't have the time and the means to test it on any exotic architecture, so I don't really feel like giving it a positive review (last time I let the bsd problem go through).

I'm not an ECL interrupt expert as well, so some other things could be wrong...

comment:39 in reply to: ↑ 38 Changed 8 years ago by jdemeyer

Replying to jpflori:

I don't have the time and the means to test it on any exotic architecture, so I don't really feel like giving it a positive review

(speaking as release manager) I don't think that testing on exotic architectures should be a requirement for a patch to get positive_review. It is much more important to actually look at the patch and check that it makes sense rather than test it on all possible architectures. Some minimal testing should of course be done (say -t -long in directories obviously affected by the patch on at least one system).

comment:40 in reply to: ↑ 38 Changed 8 years ago by jdemeyer

Replying to jpflori:

I'm not an ECL interrupt expert as well, so some other things could be wrong...

Unfortunately, even I am not an ECL interrupt expert. However, I tried to write the code in such a way that it doesn't really matter what ECL does. With the current patch, there is no interaction between Sage interrupt handling and ECL interrupt handling. We just let ECL deal with all relevant signals.

comment:41 Changed 8 years ago by jpflori

  • Status changed from needs_review to positive_review

Ok, then I'll put it back as positive review.

comment:42 Changed 8 years ago by jdemeyer

  • Merged in set to sage-4.7.1.alpha0
  • Resolution set to fixed
  • Status changed from positive_review to closed
Note: See TracTickets for help on using tickets.