Opened 5 months ago
Closed 3 months ago
#33027 closed defect (fixed)
zombie maxima process - if invoked from a script
Reported by: | dimpase | Owned by: | |
---|---|---|---|
Priority: | critical | Milestone: | sage-9.6 |
Component: | interfaces | Keywords: | |
Cc: | nbruin, vbraun, mkoeppe, gh-spaghettisalat | Merged in: | |
Authors: | Dima Pasechnik | Reviewers: | Michael Orlitzky |
Report Upstream: | N/A | Work issues: | |
Branch: | abafddc (Commits, GitHub, GitLab) | Commit: | abafddc33ff85ad50eb67c6532984985a856e882 |
Dependencies: | Stopgaps: |
Description (last modified by )
Invoking maxima in a Sage script leads to a zombie process. E.g. run the following in a terminal
echo "t=maxima('2+2')" > /tmp/foo.sage && ./sage /tmp/foo.sage
and observe a zombie maxima process after this terminates.
Don't forget to killall maxima now and then.
A slightly shorter
echo "t=maxima('2+2')" | ./sage
does not lead to a zombie - in fact, it prints on exit:
sage: sage: Exiting Sage (CPU time 0m0.08s, Wall time 0m0.67s).
Exiting Maxima with PID 2318 running <SAGEROOT>/local/bin/maxima -p <SAGEROOT>/local/var/lib/sage/venv-python3.9/lib/python3.9/site-packages/sage/interfaces/sage-maxima.lisp
This was observed while testing sagetex spkg with SAGE_CHECK=yes make sagetex on #32887.
Using maxima_calculus() (the library interface) rather than maxima() (pexpect interface) does not lead to zombies.
Apparently, this only happens with ecl installed in Sage rather than with system-wide ecl.
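To check for leftover processes after running the reproducer, something like the following Python sketch could be used. It is not part of the ticket or of Sage: the helper names find_processes and leftover_processes are made up for illustration, and it assumes a ps that accepts -A and repeated -o options with the pid, stat and comm columns (as procps on Linux does). The stat column shows whether a leftover process is still running or is a true zombie (Z).

```python
import os
import subprocess


def find_processes(ps_output, name):
    """Return (pid, stat) pairs for rows whose command basename equals name.

    ps_output is expected to hold one "pid stat comm" row per line,
    without a header line.
    """
    found = []
    for line in ps_output.splitlines():
        parts = line.split(None, 2)  # comm may itself contain spaces
        if len(parts) != 3:
            continue  # skip blank or malformed rows
        pid, stat, comm = parts
        if os.path.basename(comm.strip()) == name:
            found.append((int(pid), stat))
    return found


def leftover_processes(name="maxima"):
    """Run ps (assumed flags, see the text above) and filter its output."""
    out = subprocess.run(
        ["ps", "-A", "-o", "pid=", "-o", "stat=", "-o", "comm="],
        capture_output=True,
        text=True,
        check=True,
    ).stdout
    return find_processes(out, name)
```

Calling leftover_processes() right after ./sage /tmp/foo.sage returns should give an empty list on an unaffected setup.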
Attachments (1)
Change History (57)
comment:1 Changed 5 months ago by
- Priority changed from major to critical
comment:2 Changed 5 months ago by
- Cc nbruin added
comment:3 Changed 4 months ago by
This might be related to #32167.
comment:4 follow-up: ↓ 8 Changed 4 months ago by
I think it was observed without fricas installed, too.
comment:5 Changed 4 months ago by
- Description modified (diff)
comment:6 Changed 4 months ago by
- Description modified (diff)
I've noticed that one gets .sage/maxima/binary/5_45_0/ecl/21_2_1/ created in DOTSAGE (whether with the reproducer or with the non-reproducer echo "t=maxima('2+2')" | ./sage).
comment:7 follow-up: ↓ 9 Changed 4 months ago by
- Cc vbraun mkoeppe added
- Priority changed from critical to blocker
This effectively kills the patchbots, thus a blocker, IMHO. https://groups.google.com/d/msgid/sage-devel/997e9f75-8e92-4a67-b43a-1777074cbe45n%40googlegroups.com
comment:8 in reply to: ↑ 4 Changed 4 months ago by
what I meant to say is that it might be a similar underlying problem.
comment:9 in reply to: ↑ 7 Changed 4 months ago by
Replying to dimpase:
This effectively kills the patchbots, thus a blocker, IMHO. https://groups.google.com/d/msgid/sage-devel/997e9f75-8e92-4a67-b43a-1777074cbe45n%40googlegroups.com
Apparently a ticket merged in 9.5.beta8 (https://groups.google.com/g/sage-release/c/vo_m79EHAVc/m/mMlNPz5sBAAJ) introduced this problem. My patchbot worked fine until then, and I got contacted by IT on December 13th (it is a virtual machine, and I'm guessing this also affected other people).
comment:10 follow-up: ↓ 11 Changed 4 months ago by
- Never execute code from a predictable filename under /tmp =)
- FWIW, I can't reproduce this on rc2.
comment:11 in reply to: ↑ 10 ; follow-up: ↓ 12 Changed 4 months ago by
Replying to mjo:
- Never execute code from a predictable filename under /tmp =)
- FWIW, I can't reproduce this on rc2.
Perhaps it's due to ecl from the system? I see this on Debian 11 with ecl built by Sage.
comment:12 in reply to: ↑ 11 ; follow-up: ↓ 13 Changed 4 months ago by
comment:13 in reply to: ↑ 12 Changed 4 months ago by
Replying to gh-kliem:
Replying to dimpase:
Replying to mjo:
- Never execute code from a predictable filename under /tmp =)
- FWIW, I can't reproduce this on rc2.
Perhaps it's due to ecl from the system? I see this on Debian 11 with ecl built by Sage.
No, my config.log states, already installed as an SPKG.
I meant that a system ecl is the reason that Michael cannot reproduce this.
comment:14 follow-up: ↓ 15 Changed 4 months ago by
Ah yes, this might be the reason. I can also observe this on ubuntu focal with ecl installed as SPKG.
comment:15 in reply to: ↑ 14 ; follow-ups: ↓ 18 ↓ 20 Changed 4 months ago by
Replying to gh-kliem:
Ah yes, this might be the reason. I can also observe this on ubuntu focal with ecl installed as SPKG.
Yes, I can reproduce this - on a Gentoo machine with systemwide ecl there is no zombie, but installing ecl+maxima in Sage leads to zombies.
comment:16 Changed 4 months ago by
- Description modified (diff)
comment:17 Changed 4 months ago by
- Cc gh-spaghettisalat added
comment:18 in reply to: ↑ 15 ; follow-up: ↓ 21 Changed 4 months ago by
Replying to dimpase:
Replying to gh-kliem:
Ah yes, this might be the reason. I can also observe this on ubuntu focal with ecl installed as SPKG.
Yes, I can reproduce this - on a Gentoo machine with systemwide ecl there is no zombie, but installing ecl+maxima in Sage leads to zombies.
Maybe delete that write_error.patch that the SPKG applies, and try again?
comment:19 follow-up: ↓ 23 Changed 4 months ago by
The other obvious difference is that the SPKG is built with --disable-threads. But the patch is more suspicious to me.
comment:20 in reply to: ↑ 15 Changed 4 months ago by
Could you check whether using spkg ecl / system ecl also makes the difference for #32167?
comment:21 in reply to: ↑ 18 Changed 4 months ago by
Replying to mjo:
Replying to dimpase:
Replying to gh-kliem:
Ah yes, this might be the reason. I can also observe this on ubuntu focal with ecl installed as SPKG.
Yes, I can reproduce this - on a Gentoo machine with systemwide ecl there is no zombie, but installing ecl+maxima in Sage leads to zombies.
Maybe delete that write_error.patch that the SPKG applies, and try again?
patch or no patch, the same picture.
comment:22 Changed 4 months ago by
According to comment:9, something happened in 9.5.beta8 which caused this. Time for git bisect, I suppose...
comment:23 in reply to: ↑ 19 Changed 4 months ago by
Replying to mjo:
The other obvious difference is that the SPKG is built with --disable-threads. But the patch is more suspicious to me.
This was the correct guess.
If I remove --disable-threads then no zombies pop up.
Can this also be done on macOS?
comment:24 Changed 4 months ago by
- Status changed from new to needs_review
posting a branch in a moment
comment:25 Changed 4 months ago by
- Status changed from needs_review to needs_work
comment:26 Changed 4 months ago by
- Branch set to u/dimpase/packages/ecl/nozombies
- Commit set to abafddc33ff85ad50eb67c6532984985a856e882
- Status changed from needs_work to needs_review
New commits:
abafddc | disable-threads led to zombies
comment:27 follow-up: ↓ 29 Changed 4 months ago by
On OS X, after running ./configure --with-system-ecl=no, I don't see any zombie maxima processes no matter how many times I run ./sage /path/to/foo.sage.
comment:28 follow-up: ↓ 30 Changed 4 months ago by
Is there any way this is connected to #31796? That's the only obvious ticket merged in 9.5.beta8 which is connected to maxima, but it doesn't look problematic to me.
comment:29 in reply to: ↑ 27 Changed 4 months ago by
Replying to jhpalmieri:
On OS X, after running ./configure --with-system-ecl=no, I don't see any zombie maxima processes no matter how many times I run ./sage /path/to/foo.sage.
macOS is sufficiently different from Linux, I'm not surprised. I noticed that Homebrew builds ecl with threads enabled, so the change proposed here should be fine on macOS too.
Not sure about cygwin though.
comment:30 in reply to: ↑ 28 Changed 4 months ago by
Replying to jhpalmieri:
Is there any way this is connected to #31796? That's the only obvious ticket merged in 9.5.beta8 which is connected to maxima, but it doesn't look problematic to me.
I even tried reverting it, to no avail. Perhaps the behaviour was always there, it's just a test introduced in beta8 that led to these zombie processes?
comment:31 follow-up: ↓ 32 Changed 4 months ago by
I am fairly certain that this is a race condition and enabling threads will not help in general.
Attaching a debugger to the maxima zombie process shows that the process hangs in a loop where it tries to flush the standard output, gets an error, tries to report that error to the standard error stream, then gets an error for that and so on. The following script
import atexit
atexit.register(lambda : maxima.quit())
t=maxima('2+2')
works fine and leaves no maxima zombies.
I don't have time to dig into the maxima interface code, but I am pretty sure that sage just doesn't shut down maxima properly.
comment:32 in reply to: ↑ 31 ; follow-ups: ↓ 34 ↓ 39 Changed 4 months ago by
Replying to gh-spaghettisalat:
I am fairly certain that this is a race condition and enabling threads will not help in general.
Attaching a debugger to the maxima zombie process shows that the process hangs in a loop where it tries to flush the standard output, gets an error, tries to report that error to the standard error stream, then gets an error for that and so on.
Sounds related to https://gitlab.com/embeddable-common-lisp/ecl/-/issues/634?
The fact that it works with threads enabled could be due to https://gitlab.com/embeddable-common-lisp/ecl/-/issues/129. If that's the case then indeed we shouldn't rely on it.
The following script
import atexit
atexit.register(lambda : maxima.quit())
t=maxima('2+2')
works fine and leaves no maxima zombies.
I don't have time to dig into the maxima interface code, but I am pretty sure that sage just doesn't shut down maxima properly.
The pexpect interface has been deprecated forever, so it's not unlikely. And I don't think adding a quit() hook could hurt in any case.
comment:33 Changed 4 months ago by
- Priority changed from blocker to critical
The patchbots will use the develop branch so there is little to be gained and a lot to potentially screw up by pushing the proposed patch into 9.5
comment:34 in reply to: ↑ 32 ; follow-ups: ↓ 35 ↓ 36 Changed 4 months ago by
Replying to mjo:
The pexpect interface has been deprecated forever
No, it's not. We have 2 maxima interfaces. maxima_calculus is the unique instance of the library-based interface and is for internal use by symbolics. maxima is the user-facing interface and is pexpect-based.
comment:35 in reply to: ↑ 34 Changed 4 months ago by
Replying to mkoeppe:
Replying to mjo:
The pexpect interface has been deprecated forever
No, it's not. We have 2 maxima interfaces. maxima_calculus is the unique instance of the library-based interface and is for internal use by symbolics. maxima is the user-facing interface and is pexpect-based.
I agree that the pexpect interface is not deprecated, but I don't think it's the only interface that's meant to be user-facing. In fact, I expect (no pun intended) that most uses for maxima in sage *are* through the internal one, because the integration with symbolics makes it easier to get the things in/out. For instance SR(x)._maxima_() gets you an object through the maxima_lib interface.
The pexpect interface is convenient if you want multiple sessions (managed through different processes) and if you want "vanilla" maxima: the maxima_lib one has some tweaks and configurations motivated by its use for SR.
comment:36 in reply to: ↑ 34 ; follow-up: ↓ 38 Changed 4 months ago by
Replying to mkoeppe:
Replying to mjo:
The pexpect interface has been deprecated forever
No, it's not. We have 2 maxima interfaces. maxima_calculus is the unique instance of the library-based interface and is for internal use by symbolics. maxima is the user-facing interface and is pexpect-based.
Semantics? The name maxima() isn't deprecated, but the library interface is faster and more robust than pexpect ever can be. There are several tickets open to replace pexpect calls and documentation with the library interface. You yourself started to make maxima() use the library in #30097.
comment:37 Changed 4 months ago by
Yes, that's correct, #30097 would be a solution.
comment:38 in reply to: ↑ 36 Changed 4 months ago by
Replying to mjo:
Semantics? The name maxima() isn't deprecated, but the library interface is faster and more robust than pexpect ever can be.
That would depend. The pexpect interface could in principle work with, say, maxima on SBCL, which probably would run faster than on ECL. So jobs with maxima that aren't particularly I/O-bound need not be faster through maxima_lib.
Stability: sure. pexpect interfaces have shown themselves to be quite fragile, particularly with LISP, which tends to have I/O that isn't particularly compatible with C-type I/O (character devices may be driven really one-character-at-a-time).
comment:39 in reply to: ↑ 32 ; follow-up: ↓ 41 Changed 4 months ago by
Replying to mjo:
And I don't think adding a quit() hook could hurt in any case.
it's there, no?
sage: maxima._quit_string()
'quit();'
comment:40 Changed 4 months ago by
I think I am noticing zombie processes also with the gap3 (optional) interface.
comment:41 in reply to: ↑ 39 Changed 4 months ago by
Replying to dimpase:
Replying to mjo:
And I don't think adding a quit() hook could hurt in any case.
it's there, no?
sage: maxima._quit_string()
'quit();'
But when is that string sent to the running maxima process? I meant something like comment:31.
I see now that there's a function called quit_sage() in src/sage/all.py,
def quit_sage(verbose=True):
    """
    If you use Sage in library mode, you should call this function
    when your application quits.
    It makes sure any child processes are also killed, etc.
    """
    ...
    from sage.interfaces.quit import expect_quitall
    expect_quitall(verbose=verbose)
    ...
which kills all running pexpect processes. And apparently you are supposed to call this function yourself. So, I guess we can blame sagetex for not calling quit_sage() at the end of the scripts it generates? =)
comment:42 follow-up: ↓ 43 Changed 4 months ago by
shouldn't EOF in a .sage script trigger quit_sage()?
comment:43 in reply to: ↑ 42 Changed 4 months ago by
Replying to dimpase:
shouldn't EOF in a .sage script trigger quit_sage()?
Yes, obviously. (Or something like that). I only use sage as a library, and I just learned about that function 20 minutes ago. It's outrageous to expect the user to manually clean up after the library.
However I think an atexit hook is a cleaner approach, at least for pexpect processes. They all support quit(), and could register their own atexit hook upon being initialized. It won't help with crashes or when killed by a signal, but then, neither does calling quit_sage() at EOF.
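A minimal sketch of that idea, for illustration only: a wrapper that registers its own atexit hook when it starts its child. This is not Sage's interface code. ChildInterface and CHILD_CODE are hypothetical names, and a small Python child driven through subprocess stands in for a real pexpect-managed program so that the example is self-contained.

```python
import atexit
import subprocess
import sys

# Stand-in for an external program that understands a "quit" command.
CHILD_CODE = (
    "import sys\n"
    "for line in sys.stdin:\n"
    "    if line.strip() == 'quit':\n"
    "        break\n"
)


class ChildInterface:
    """Hypothetical pexpect-style wrapper around one child process."""

    def __init__(self):
        self._proc = subprocess.Popen(
            [sys.executable, "-c", CHILD_CODE],
            stdin=subprocess.PIPE,
            text=True,
        )
        # Register cleanup as soon as the child exists, so that the caller
        # does not have to remember to do it.
        atexit.register(self.quit)

    def quit(self, timeout=5):
        """Ask the child to exit; kill it if it does not comply in time."""
        if self._proc.poll() is not None:
            return  # already gone, nothing to do
        try:
            self._proc.stdin.write("quit\n")
            self._proc.stdin.flush()
            self._proc.wait(timeout=timeout)
        except (BrokenPipeError, subprocess.TimeoutExpired):
            self._proc.kill()
            self._proc.wait()
```

Because quit() returns early once the child has exited, calling it by hand and again from the atexit hook is harmless. Registering the bound method keeps the instance alive until interpreter exit, which may or may not be wanted.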
comment:44 Changed 4 months ago by
I'm going to experiment with a cleanup hook for symmetrica on another ticket. Our interface to that library has start() and end() functions that are supposed to be called manually. The start() function gets called when you import sage.libs.symmetrica.all, but end() never does unless you call quit_sage(). So I'm going to have start() register a hook that calls end(), and remove the corresponding bits from quit_sage(). The body of quit_sage() isn't very long so we may be able to obsolete it rather quickly.
comment:45 Changed 4 months ago by
I posted a branch on #8784 that eliminates quit_sage() and puts all of the cleanup chores (including the termination of pexpect processes) into atexit hooks.
It's a more-invasive change, but the right thing to do if it doesn't cause any subtle new bugs.
comment:46 Changed 4 months ago by
- Reviewers set to Michael Orlitzky
- Status changed from needs_review to positive_review
Let's take the easy route for the 9.5 release and worry about quit_sage() afterwards.
comment:47 Changed 4 months ago by
- Milestone changed from sage-9.5 to sage-9.6
Setting milestone to 9.6 now that 9.5 is out.
comment:48 Changed 4 months ago by
- Status changed from positive_review to needs_work
With this ticket I'm seeing a lot of flakiness with maxima-related tests, e.g.
sage -t --long --warn-long 50.7 --random-seed=36889336955867588730901666695941730921 src/sage/interfaces/maxima_abstract.py
**********************************************************************
File "src/sage/interfaces/maxima_abstract.py", line 314, in sage.interfaces.maxima_abstract.MaximaAbstract._commands
Failed example:
    sorted(maxima._commands(verbose=False))
Exception raised:
    Traceback (most recent call last):
      File "/home/release/Sage/local/var/lib/sage/venv-python3.9.9/lib/python3.9/site-packages/sage/doctest/forker.py", line 694, in _run
        self.compile_and_execute(example, compiler, test.globs)
      File "/home/release/Sage/local/var/lib/sage/venv-python3.9.9/lib/python3.9/site-packages/sage/doctest/forker.py", line 1088, in compile_and_execute
        exec(compiled, globs)
      File "<doctest sage.interfaces.maxima_abstract.MaximaAbstract._commands[0]>", line 1, in <module>
        sorted(maxima._commands(verbose=False))
      File "/home/release/Sage/local/var/lib/sage/venv-python3.9.9/lib/python3.9/site-packages/sage/interfaces/maxima_abstract.py", line 327, in _commands
        [self.completions(chr(65+n), verbose=verbose)+
      File "/home/release/Sage/local/var/lib/sage/venv-python3.9.9/lib/python3.9/site-packages/sage/interfaces/maxima_abstract.py", line 327, in <listcomp>
        [self.completions(chr(65+n), verbose=verbose)+
      File "/home/release/Sage/local/var/lib/sage/venv-python3.9.9/lib/python3.9/site-packages/sage/interfaces/maxima_abstract.py", line 296, in completions
        cmd_list = self._eval_line('apropos("%s")'%s, error_check=False)
      File "/home/release/Sage/local/var/lib/sage/venv-python3.9.9/lib/python3.9/site-packages/sage/interfaces/maxima.py", line 814, in _eval_line
        self._expect_expr(self._display_prompt)
      File "/home/release/Sage/local/var/lib/sage/venv-python3.9.9/lib/python3.9/site-packages/sage/interfaces/maxima.py", line 731, in _expect_expr
        i = self._expect.expect(expr)
      File "/home/release/Sage/local/var/lib/sage/venv-python3.9.9/lib/python3.9/site-packages/pexpect/spawnbase.py", line 343, in expect
        return self.expect_list(compiled_pattern_list,
      File "/home/release/Sage/local/var/lib/sage/venv-python3.9.9/lib/python3.9/site-packages/pexpect/spawnbase.py", line 372, in expect_list
        return exp.expect_loop(timeout)
      File "/home/release/Sage/local/var/lib/sage/venv-python3.9.9/lib/python3.9/site-packages/pexpect/expect.py", line 179, in expect_loop
        return self.eof(e)
      File "/home/release/Sage/local/var/lib/sage/venv-python3.9.9/lib/python3.9/site-packages/pexpect/expect.py", line 122, in eof
        raise exc
    pexpect.exceptions.EOF: End Of File (EOF). Exception style platform.
    Maxima with PID 1795009 running /home/release/Sage/local/bin/maxima -p /home/release/Sage/local/var/lib/sage/venv-python3.9.9/lib/python3.9/site-packages/sage/interfaces/sage-maxima.lisp
    command: /home/release/Sage/local/bin/maxima
    args: ['/home/release/Sage/local/bin/maxima', '-p', '/home/release/Sage/local/var/lib/sage/venv-python3.9.9/lib/python3.9/site-packages/sage/interfaces/sage-maxima.lisp']
    buffer (last 100 chars): b''
    before (last 100 chars): ''
    after: <class 'pexpect.exceptions.EOF'>
    match: None
    match_index: None
    exitstatus: None
    flag_eof: True
    pid: 1795009
    child_fd: 20
    closed: False
    timeout: None
    delimiter: <class 'pexpect.exceptions.EOF'>
    logfile: None
    logfile_read: None
    logfile_send: None
    maxread: 4194304
    ignorecase: False
    searchwindowsize: None
    delaybeforesend: None
    delayafterclose: 0.1
    delayafterterminate: 0.1
    searcher: searcher_re:
        0: re.compile(b'<sage-display>')
**********************************************************************
Always disappears when testing individually, but running multiple maxima-using tests in parallel triggers them quite reliably. E.g.
./sage -t -p 8 --long src/sage/interfaces/maxima_abstract.py src/sage/dynamics/complex_dynamics/mandel_julia.py src/sage/interfaces/maxima.py src/sage/tests/books/computational-mathematics-with-sagemath/sol/mpoly_doctest.py src/sage/coding/kasami_codes.pyx
comment:49 Changed 4 months ago by
Actually seems to be due to #32986
comment:50 Changed 4 months ago by
- Status changed from needs_work to needs_review
comment:51 Changed 4 months ago by
I have just been hit by this problem on the patchbot as well. For me, running ptestlong now leaves about 47 maxima processes running at around 130% cpu usage on average, as well as two ecl and several fricas processes.
With the current branch, the problem seems to go away.
comment:52 Changed 4 months ago by
feel free to give it positive review, again...
comment:53 Changed 4 months ago by
- Status changed from needs_review to positive_review
Longer term we will probably want threads enabled anyway (the upstream and most distros default), even if #8784 goes in, so.
comment:54 Changed 4 months ago by
At first, I thought the branch would disable threads, but actually it is the other way around. If threads had been disabled before, I am surprised the zombie processes use more than 100% cpu.
comment:55 Changed 4 months ago by
The patch resulted from an observation that a system-wide ecl with threads enabled does not produce zombie processes. Probably sage-cleaner works better on them.
comment:56 Changed 3 months ago by
- Branch changed from u/dimpase/packages/ecl/nozombies to abafddc33ff85ad50eb67c6532984985a856e882
- Resolution set to fixed
- Status changed from positive_review to closed
Moving this to critical, as it seems to efficiently kill the patchbots.