Opened 5 months ago

Closed 3 months ago

#33027 closed defect (fixed)

zombie maxima process - if invoked from a script

Reported by: dimpase
Owned by:
Priority: critical
Milestone: sage-9.6
Component: interfaces
Keywords:
Cc: nbruin, vbraun, mkoeppe, gh-spaghettisalat
Merged in:
Authors: Dima Pasechnik
Reviewers: Michael Orlitzky
Report Upstream: N/A
Work issues:
Branch: abafddc
Commit: abafddc33ff85ad50eb67c6532984985a856e882
Dependencies:
Stopgaps:

Description (last modified by dimpase)

Invoking maxima in a Sage script leads to a zombie process. E.g., run the following in a terminal:

echo "t=maxima('2+2')" > /tmp/foo.sage && ./sage /tmp/foo.sage

and observe a zombie maxima process after this terminates. Don't forget to

killall maxima

now and then.

A slightly shorter

echo "t=maxima('2+2')" | ./sage

does not lead to a zombie - in fact, it prints on exit:

sage: sage: Exiting Sage (CPU time 0m0.08s, Wall time 0m0.67s).
Exiting Maxima with PID 2318 running <SAGEROOT>/local/bin/maxima -p <SAGEROOT>/local/var/lib/sage/venv-python3.9/lib/python3.9/site-packages/sage/interfaces/sage-maxima.lisp

This was observed while testing sagetex spkg with SAGE_CHECK=yes make sagetex on #32887.

Using maxima_calculus() (the library interface) rather than maxima() (pexpect interface) does not lead to zombies.


Apparently, this only happens with ecl installed in Sage rather than with system-wide ecl.
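
To check for leftover maxima processes after running the reproducer, something like the following sketch may help. It assumes a `ps` that accepts `-A -o pid,stat,comm` (true for common Linux and macOS versions, but not guaranteed everywhere); the helper names `parse_ps` and `leftover_processes` are made up for this illustration and are not part of Sage.

```python
import os
import subprocess


def parse_ps(ps_output, name="maxima"):
    """Return (pid, state) pairs for processes whose command name is `name`.

    Expects the text produced by `ps -A -o pid,stat,comm`, header included.
    """
    matches = []
    for line in ps_output.splitlines()[1:]:  # skip the header row
        fields = line.split(None, 2)
        if len(fields) < 3:
            continue
        # comm may be a full path or carry a "<defunct>" suffix
        command = os.path.basename(fields[2].split()[0])
        if command == name:
            matches.append((int(fields[0]), fields[1]))
    return matches


def leftover_processes(name="maxima"):
    """Run ps and return the matching (pid, state) pairs (assumes ps exists)."""
    result = subprocess.run(
        ["ps", "-A", "-o", "pid,stat,comm"],
        capture_output=True, text=True, check=True,
    )
    return parse_ps(result.stdout, name)
```

Calling `leftover_processes()` after the script exits lists the surviving PIDs. The state column shows whether a leftover process is still running (e.g. `R` or `S`) or is defunct (`Z`).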

Attachments (1)

config.log (301.9 KB) - added by gh-kliem 4 months ago.
A config.log of some debian buster with this problem

Change History (57)

comment:1 Changed 5 months ago by gh-kliem

  • Priority changed from major to critical

Moving this to critical, as it seems to efficiently kill the patchbots.

comment:2 Changed 5 months ago by dimpase

  • Cc nbruin added

comment:3 Changed 4 months ago by mantepse

This might be related to #32167.

comment:4 follow-up: Changed 4 months ago by dimpase

I think it was observed without fricas installed, too.

comment:5 Changed 4 months ago by dimpase

  • Description modified (diff)

comment:6 Changed 4 months ago by dimpase

  • Description modified (diff)

I've noticed that one gets .sage/maxima/binary/5_45_0/ecl/21_2_1/ created in DOTSAGE (no matter whether with the reproducer or with the non-reproducer echo "t=maxima('2+2')" | ./sage).

comment:7 follow-up: Changed 4 months ago by dimpase

  • Cc vbraun mkoeppe added
  • Priority changed from critical to blocker

comment:8 in reply to: ↑ 4 Changed 4 months ago by mantepse

What I meant to say is that it might be a similar underlying problem.

comment:9 in reply to: ↑ 7 Changed 4 months ago by gh-kliem

Replying to dimpase:

This effectively kills the patchbots, thus a blocker, IMHO. https://groups.google.com/d/msgid/sage-devel/997e9f75-8e92-4a67-b43a-1777074cbe45n%40googlegroups.com

Apparently a ticket included in 9.5.beta8 https://groups.google.com/g/sage-release/c/vo_m79EHAVc/m/mMlNPz5sBAAJ introduced this problem. My patchbot worked fine until then, and I got contacted by IT on December 13th (it is a virtual machine and I'm guessing this also affected other people).

comment:10 follow-up: Changed 4 months ago by mjo

  1. Never execute code from a predictable filename under /tmp =)
  2. FWIW, I can't reproduce this on rc2.

comment:11 in reply to: ↑ 10 ; follow-up: Changed 4 months ago by dimpase

Replying to mjo:

  1. Never execute code from a predictable filename under /tmp =)
  2. FWIW, I can't reproduce this on rc2.

Perhaps it's due to ecl from the system? I see this on Debian 11 with ecl built by Sage.

comment:12 in reply to: ↑ 11 ; follow-up: Changed 4 months ago by gh-kliem

Replying to dimpase:

Replying to mjo:

  1. Never execute code from a predictable filename under /tmp =)
  2. FWIW, I can't reproduce this on rc2.

Perhaps it's due to ecl from the system? I see this on Debian 11 with ecl built by Sage.

No, my config.log states, already installed as an SPKG.

Changed 4 months ago by gh-kliem

A config.log of some debian buster with this problem

comment:13 in reply to: ↑ 12 Changed 4 months ago by dimpase

Replying to gh-kliem:

Replying to dimpase:

Replying to mjo:

  1. Never execute code from a predictable filename under /tmp =)
  2. FWIW, I can't reproduce this on rc2.

Perhaps it's due to ecl from the system? I see this on Debian 11 with ecl built by Sage.

No, my config.log states, already installed as an SPKG.

I meant that a system ecl is the reason that Michael cannot reproduce this.

comment:14 follow-up: Changed 4 months ago by gh-kliem

Ah yes, this might be the reason. I can also observe this on ubuntu focal with ecl installed as SPKG.

comment:15 in reply to: ↑ 14 ; follow-ups: Changed 4 months ago by dimpase

Replying to gh-kliem:

Ah yes, this might be the reason. I can also observe this on ubuntu focal with ecl installed as SPKG.

Yes, I can reproduce this - on a Gentoo machine with systemwide ecl there is no zombie, but installing ecl+maxima in Sage leads to zombies.

comment:16 Changed 4 months ago by dimpase

  • Description modified (diff)

comment:17 Changed 4 months ago by mkoeppe

  • Cc gh-spaghettisalat added

comment:18 in reply to: ↑ 15 ; follow-up: Changed 4 months ago by mjo

Replying to dimpase:

Replying to gh-kliem:

Ah yes, this might be the reason. I can also observe this on ubuntu focal with ecl installed as SPKG.

Yes, I can reproduce this - on a Gentoo machine with systemwide ecl there is no zombie, but installing ecl+maxima in Sage leads to zombies.

Maybe delete that write_error.patch that the SPKG applies, and try again?

comment:19 follow-up: Changed 4 months ago by mjo

The other obvious difference is that the SPKG is built with --disable-threads. But the patch is more suspicious to me.

comment:20 in reply to: ↑ 15 Changed 4 months ago by mantepse

Could you check whether using spkg ecl / system ecl also makes the difference for #32167?

comment:21 in reply to: ↑ 18 Changed 4 months ago by dimpase

Replying to mjo:

Replying to dimpase:

Replying to gh-kliem:

Ah yes, this might be the reason. I can also observe this on ubuntu focal with ecl installed as SPKG.

Yes, I can reproduce this - on a Gentoo machine with systemwide ecl there is no zombie, but installing ecl+maxima in Sage leads to zombies.

Maybe delete that write_error.patch that the SPKG applies, and try again?

patch or no patch, the same picture.

comment:22 Changed 4 months ago by dimpase

According to comment:9, something happened in 9.5.beta8 which caused this. Time for git bisect, I suppose...

comment:23 in reply to: ↑ 19 Changed 4 months ago by dimpase

Replying to mjo:

The other obvious difference is that the SPKG is built with --disable-threads. But the patch is more suspicious to me.

This was the correct guess. If I remove --disable-threads then no zombies pop up.

Can this also be done on macOS?

comment:24 Changed 4 months ago by dimpase

  • Authors set to Dima Pasechnik
  • Status changed from new to needs_review

posting a branch in a moment

comment:25 Changed 4 months ago by dimpase

  • Status changed from needs_review to needs_work

comment:26 Changed 4 months ago by dimpase

  • Branch set to u/dimpase/packages/ecl/nozombies
  • Commit set to abafddc33ff85ad50eb67c6532984985a856e882
  • Status changed from needs_work to needs_review

New commits:

abafddc disable-threads led to zombies

comment:27 follow-up: Changed 4 months ago by jhpalmieri

On OS X, after running ./configure --with-system-ecl=no, I don't see any zombie maxima processes no matter how many times I run ./sage /path/to/foo.sage.

comment:28 follow-up: Changed 4 months ago by jhpalmieri

Is there any way this is connected to #31796? That's the only obvious ticket merged in 9.5.beta8 which is connected to maxima, but it doesn't look problematic to me.

comment:29 in reply to: ↑ 27 Changed 4 months ago by dimpase

Replying to jhpalmieri:

On OS X, after running ./configure --with-system-ecl=no, I don't see any zombie maxima processes no matter how many times I run ./sage /path/to/foo.sage.

macOS is sufficiently different from Linux, I'm not surprised. I noticed that Homebrew builds ecl with threads enabled, so the change proposed here should be fine on macOS too.

Not sure about cygwin though.

comment:30 in reply to: ↑ 28 Changed 4 months ago by dimpase

Replying to jhpalmieri:

Is there any way this is connected to #31796? That's the only obvious ticket merged in 9.5.beta8 which is connected to maxima, but it doesn't look problematic to me.

I even tried reverting it, to no avail. Perhaps the behaviour was always there, it's just a test introduced in beta8 that led to these zombie processes?

comment:31 follow-up: Changed 4 months ago by gh-spaghettisalat

I am fairly certain that this is a race condition and enabling threads will not help in general.

Attaching a debugger to the maxima zombie process shows that the process hangs in a loop where it tries to flush the standard output, gets an error, tries to report that error to the standard error stream, then gets an error for that and so on. The following script

import atexit
atexit.register(lambda : maxima.quit())
t=maxima('2+2')

works fine and leaves no maxima zombies.

I don't have time to dig into the maxima interface code, but I am pretty sure that sage just doesn't shut down maxima properly.

comment:32 in reply to: ↑ 31 ; follow-ups: Changed 4 months ago by mjo

Replying to gh-spaghettisalat:

I am fairly certain that this is a race condition and enabling threads will not help in general.

Attaching a debugger to the maxima zombie process shows that the process hangs in a loop where it tries to flush the standard output, gets an error, tries to report that error to the standard error stream, then gets an error for that and so on.

Sounds related to https://gitlab.com/embeddable-common-lisp/ecl/-/issues/634?

The fact that it works with threads enabled could be due to https://gitlab.com/embeddable-common-lisp/ecl/-/issues/129. If that's the case then indeed we shouldn't rely on it.

The following script

import atexit
atexit.register(lambda : maxima.quit())
t=maxima('2+2')

works fine and leaves no maxima zombies.

I don't have time to dig into the maxima interface code, but I am pretty sure that sage just doesn't shut down maxima properly.

The pexpect interface has been deprecated forever, so it's not unlikely. And I don't think adding a quit() hook could hurt in any case.

comment:33 Changed 4 months ago by vbraun

  • Priority changed from blocker to critical

The patchbots will use the develop branch so there is little to be gained and a lot to potentially screw up by pushing the proposed patch into 9.5

comment:34 in reply to: ↑ 32 ; follow-ups: Changed 4 months ago by mkoeppe

Replying to mjo:

The pexpect interface has been deprecated forever

No, it's not. We have 2 maxima interfaces. maxima_calculus is the unique instance of the library-based interface and is for internal use by symbolics. maxima is the user-facing interface and is pexpect-based.

comment:35 in reply to: ↑ 34 Changed 4 months ago by nbruin

Replying to mkoeppe:

Replying to mjo:

The pexpect interface has been deprecated forever

No, it's not. We have 2 maxima interfaces. maxima_calculus is the unique instance of the library-based interface and is for internal use by symbolics. maxima is the user-facing interface and is pexpect-based.

I agree that the pexpect interface is not deprecated, but I don't think it's the only interface that's meant to be user-facing. In fact, I expect (no pun intended) that most uses for maxima in sage *are* through the internal one, because the integration with symbolics makes it easier to get the things in/out. For instance SR(x)._maxima_() gets you an object through the maxima_lib interface.

The pexpect interface is convenient if you want multiple sessions (managed through different processes) and if you want "vanilla" maxima: the maxima_lib one has some tweaks and configurations motivated by its use for SR.

comment:36 in reply to: ↑ 34 ; follow-up: Changed 4 months ago by mjo

Replying to mkoeppe:

Replying to mjo:

The pexpect interface has been deprecated forever

No, it's not. We have 2 maxima interfaces. maxima_calculus is the unique instance of the library-based interface and is for internal use by symbolics. maxima is the user-facing interface and is pexpect-based.

Semantics? The name maxima() isn't deprecated, but the library interface is faster and more robust than pexpect ever can be. There are several tickets open to replace pexpect calls and documentation with the library interface. You yourself started to make maxima() use the library in #30097.

comment:37 Changed 4 months ago by mkoeppe

Yes, that's correct, #30097 would be a solution.

comment:38 in reply to: ↑ 36 Changed 4 months ago by nbruin

Replying to mjo:

Semantics? The name maxima() isn't deprecated, but the library interface is faster and more robust than pexpect ever can be.

That would depend. The pexpect interface could in principle work with, say, maxima on SBCL, which probably would run faster than on ECL. So jobs with maxima that aren't particularly I/O-bound need not be faster through maxima_lib.

Stability: sure. pexpect interfaces have shown themselves to be quite fragile, particularly with LISP, which tends to have I/O that isn't particularly compatible with C-type I/O (character devices may really be driven one character at a time).

comment:39 in reply to: ↑ 32 ; follow-up: Changed 4 months ago by dimpase

Replying to mjo:

And I don't think adding a quit() hook could hurt in any case.

it's there, no?

sage: maxima._quit_string()
'quit();'

comment:40 Changed 4 months ago by mantepse

I think I am noticing zombie processes also with the gap3 (optional) interface.

comment:41 in reply to: ↑ 39 Changed 4 months ago by mjo

Replying to dimpase:

Replying to mjo:

And I don't think adding a quit() hook could hurt in any case.

it's there, no?

sage: maxima._quit_string()
'quit();'

But when is that string sent to the running maxima process? I meant something like comment:31.

I see now that there's a function called quit_sage() in src/sage/all.py,

def quit_sage(verbose=True):
    """
    If you use Sage in library mode, you should call this function
    when your application quits.

    It makes sure any child processes are also killed, etc.
    """
    ...
    from sage.interfaces.quit import expect_quitall
    expect_quitall(verbose=verbose)
    ...

which kills all running pexpect processes. And apparently you are supposed to call this function yourself. So, I guess we can blame sagetex for not calling quit_sage() at the end of the scripts it generates? =)

comment:42 follow-up: Changed 4 months ago by dimpase

Shouldn't EOF in a .sage script trigger quit_sage()?

comment:43 in reply to: ↑ 42 Changed 4 months ago by mjo

Replying to dimpase:

shouldn't EOF in a .sage script trigger quit_sage()?

Yes, obviously. (Or something like that). I only use sage as a library, and I just learned about that function 20 minutes ago. It's outrageous to expect the user to manually clean up after the library.

However I think an atexit hook is a cleaner approach, at least for pexpect processes. They all support quit(), and could register their own atexit hook upon being initialized. It won't help with crashes or when killed by a signal, but then, neither does calling quit_sage() at EOF.
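
Roughly what I have in mind, as a sketch only: `ChildInterface` is a hypothetical stand-in for a pexpect-style interface (it is not Sage's Expect class), and the child here is just a sleeping Python process rather than maxima.

```python
import atexit
import subprocess
import sys


class ChildInterface:
    """Hypothetical interface that cleans up its own child process at exit."""

    def __init__(self):
        self._proc = subprocess.Popen(
            [sys.executable, "-c", "import time; time.sleep(60)"])
        # Register cleanup as soon as the child exists, so the user
        # never has to remember to call anything.
        atexit.register(self.quit)

    def is_running(self):
        return self._proc.poll() is None

    def quit(self):
        """Terminate and reap the child; safe to call more than once."""
        if self._proc.poll() is None:
            self._proc.terminate()
            self._proc.wait()
```

Note that registering a bound method keeps the object alive until interpreter exit, so a real implementation might prefer a weak reference. As said above, the hook runs only on normal interpreter shutdown.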

comment:44 Changed 4 months ago by mjo

I'm going to experiment with a cleanup hook for symmetrica on another ticket. Our interface to that library has start() and end() functions that are supposed to be called manually. The start() function gets called when you import sage.libs.symmetrica.all, but end() never does unless you call quit_sage(). So I'm going to have start() register a hook that calls end(), and remove the corresponding bits from quit_sage(). The body of quit_sage() isn't very long so we may be able to obsolete it rather quickly.

comment:45 Changed 4 months ago by mjo

I posted a branch on #8784 that eliminates quit_sage() and puts all of the cleanup chores (including the termination of pexpect processes) into atexit hooks.

It's a more-invasive change, but the right thing to do if it doesn't cause any subtle new bugs.

comment:46 Changed 4 months ago by mjo

  • Reviewers set to Michael Orlitzky
  • Status changed from needs_review to positive_review

Let's take the easy route for the 9.5 release and worry about quit_sage() afterwards.

comment:47 Changed 4 months ago by slelievre

  • Milestone changed from sage-9.5 to sage-9.6

Setting milestone to 9.6 now that 9.5 is out.

comment:48 Changed 4 months ago by vbraun

  • Status changed from positive_review to needs_work

With this ticket I'm seeing a lot of flakiness with maxima-related tests, e.g.

sage -t --long --warn-long 50.7 --random-seed=36889336955867588730901666695941730921 src/sage/interfaces/maxima_abstract.py
**********************************************************************
File "src/sage/interfaces/maxima_abstract.py", line 314, in sage.interfaces.maxima_abstract.MaximaAbstract._commands
Failed example:
    sorted(maxima._commands(verbose=False))
Exception raised:
    Traceback (most recent call last):
      File "/home/release/Sage/local/var/lib/sage/venv-python3.9.9/lib/python3.9/site-packages/sage/doctest/forker.py", line 694, in _run
        self.compile_and_execute(example, compiler, test.globs)
      File "/home/release/Sage/local/var/lib/sage/venv-python3.9.9/lib/python3.9/site-packages/sage/doctest/forker.py", line 1088, in compile_and_execute
        exec(compiled, globs)
      File "<doctest sage.interfaces.maxima_abstract.MaximaAbstract._commands[0]>", line 1, in <module>
        sorted(maxima._commands(verbose=False))
      File "/home/release/Sage/local/var/lib/sage/venv-python3.9.9/lib/python3.9/site-packages/sage/interfaces/maxima_abstract.py", line 327, in _commands
        [self.completions(chr(65+n), verbose=verbose)+
      File "/home/release/Sage/local/var/lib/sage/venv-python3.9.9/lib/python3.9/site-packages/sage/interfaces/maxima_abstract.py", line 327, in <listcomp>
        [self.completions(chr(65+n), verbose=verbose)+
      File "/home/release/Sage/local/var/lib/sage/venv-python3.9.9/lib/python3.9/site-packages/sage/interfaces/maxima_abstract.py", line 296, in completions
        cmd_list = self._eval_line('apropos("%s")'%s, error_check=False)
      File "/home/release/Sage/local/var/lib/sage/venv-python3.9.9/lib/python3.9/site-packages/sage/interfaces/maxima.py", line 814, in _eval_line
        self._expect_expr(self._display_prompt)
      File "/home/release/Sage/local/var/lib/sage/venv-python3.9.9/lib/python3.9/site-packages/sage/interfaces/maxima.py", line 731, in _expect_expr
        i = self._expect.expect(expr)
      File "/home/release/Sage/local/var/lib/sage/venv-python3.9.9/lib/python3.9/site-packages/pexpect/spawnbase.py", line 343, in expect
        return self.expect_list(compiled_pattern_list,
      File "/home/release/Sage/local/var/lib/sage/venv-python3.9.9/lib/python3.9/site-packages/pexpect/spawnbase.py", line 372, in expect_list
        return exp.expect_loop(timeout)
      File "/home/release/Sage/local/var/lib/sage/venv-python3.9.9/lib/python3.9/site-packages/pexpect/expect.py", line 179, in expect_loop
        return self.eof(e)
      File "/home/release/Sage/local/var/lib/sage/venv-python3.9.9/lib/python3.9/site-packages/pexpect/expect.py", line 122, in eof
        raise exc
    pexpect.exceptions.EOF: End Of File (EOF). Exception style platform.
    Maxima with PID 1795009 running /home/release/Sage/local/bin/maxima -p /home/release/Sage/local/var/lib/sage/venv-python3.9.9/lib/python3.9/site-packages/sage/interfaces/sage-maxima.lisp
    command: /home/release/Sage/local/bin/maxima
    args: ['/home/release/Sage/local/bin/maxima', '-p', '/home/release/Sage/local/var/lib/sage/venv-python3.9.9/lib/python3.9/site-packages/sage/interfaces/sage-maxima.lisp']
    buffer (last 100 chars): b''
    before (last 100 chars): ''
    after: <class 'pexpect.exceptions.EOF'>
    match: None
    match_index: None
    exitstatus: None
    flag_eof: True
    pid: 1795009
    child_fd: 20
    closed: False
    timeout: None
    delimiter: <class 'pexpect.exceptions.EOF'>
    logfile: None
    logfile_read: None
    logfile_send: None
    maxread: 4194304
    ignorecase: False
    searchwindowsize: None
    delaybeforesend: None
    delayafterclose: 0.1
    delayafterterminate: 0.1
    searcher: searcher_re:
        0: re.compile(b'<sage-display>')
**********************************************************************

Always disappears when testing individually, but running multiple maxima-using tests in parallel triggers them quite reliably. E.g.

./sage -t -p 8 --long src/sage/interfaces/maxima_abstract.py src/sage/dynamics/complex_dynamics/mandel_julia.py src/sage/interfaces/maxima.py src/sage/tests/books/computational-mathematics-with-sagemath/sol/mpoly_doctest.py src/sage/coding/kasami_codes.pyx

comment:49 Changed 4 months ago by vbraun

Actually seems to be due to #32986

comment:50 Changed 4 months ago by dimpase

  • Status changed from needs_work to needs_review

comment:51 Changed 4 months ago by gh-mwageringel

I have just been hit by this problem on the patchbot as well. For me, running ptestlong now leaves about 47 maxima processes running at around 130% cpu usage on average, as well as two ecl and several fricas processes.

With the current branch, the problem seems to go away.

comment:52 Changed 4 months ago by dimpase

feel free to give it positive review, again...

comment:53 Changed 4 months ago by mjo

  • Status changed from needs_review to positive_review

Longer term we will probably want threads enabled anyway (the upstream and most distros default), even if #8784 goes in, so.

comment:54 Changed 4 months ago by gh-mwageringel

At first, I thought the branch would disable threads, but actually it is the other way around. If threads had been disabled before, I am surprised the zombie processes use more than 100% cpu.

comment:55 Changed 4 months ago by dimpase

The patch resulted from an observation that a system-wide ecl with threads enabled does not produce zombie processes. Probably sage-cleaner works better on them.

comment:56 Changed 3 months ago by vbraun

  • Branch changed from u/dimpase/packages/ecl/nozombies to abafddc33ff85ad50eb67c6532984985a856e882
  • Resolution set to fixed
  • Status changed from positive_review to closed