Opened 8 years ago
Closed 8 years ago
#18741 closed defect (fixed)
Random failure in sagespawn.pyx
Reported by: | vbraun | Owned by: | |
---|---|---|---|
Priority: | major | Milestone: | sage-6.8 |
Component: | interfaces | Keywords: | random_fail |
Cc: | jdemeyer | Merged in: | |
Authors: | Jeroen Demeyer | Reviewers: | Volker Braun |
Report Upstream: | N/A | Work issues: | |
Branch: | 659a1f5 | Commit: | 659a1f549535ddf1f08b36a33d71cd37b6971bee |
Dependencies: | Stopgaps: |
Description (last modified by )
There are at least two races left, see comments for details.
Change History (13)
comment:1 Changed 8 years ago by
Keywords: | random_fail added |
---|
comment:2 Changed 8 years ago by
Cc: | jdemeyer added |
---|
On OSX:
sage -t --long src/sage/interfaces/sagespawn.pyx
Bad exit: 1
**********************************************************************
Tests run before process (pid=73714) failed:
sage: from sage.interfaces.sagespawn import SageSpawn ## line 45 ##
sage: SageSpawn("sleep 1", name="Sleeping Beauty") ## line 46 ##
Sleeping Beauty with PID 73715 running /bin/sleep 1
sage: sig_on_count() ## line 48 ##
0
sage: from sage.interfaces.sagespawn import SageSpawn ## line 67 ##
sage: s = SageSpawn("true", name="stupid process") ## line 68 ##
sage: s # indirect doctest ## line 69 ##
stupid process with PID 73717 running /usr/bin/true
sage: while s.isalive(): # Wait until the process finishes
    sleep(0.1) ## line 71 ##
sage: s # indirect doctest ## line 73 ##
stupid process finished running /usr/bin/true
sage: sig_on_count() ## line 75 ##
0
sage: from sage.interfaces.sagespawn import SageSpawn ## line 96 ##
sage: s = SageSpawn("sh", ["-c", "while true; do sleep 1; done"]) ## line 97 ##
sage: s._keep_alive() ## line 98 ##
sage: pid = s.pid ## line 99 ##
sage: del s ## line 100 ##
sage: import gc ## line 101 ##
sage: _ = gc.collect() ## line 102 ##
sage: from signal import SIGTERM ## line 107 ##
sage: os.kill(pid, SIGTERM) ## line 108 ##
sage: sig_on_count() ## line 109 ##
0
sage: from sage.interfaces.sagespawn import SageSpawn ## line 122 ##
sage: s = SageSpawn("sleep 1000") ## line 123 ##
sage: s.close() ## line 124 ##
sage: while s.isalive(): # long time (5 seconds)
    sleep(0.1) ## line 125 ##
sage: sig_on_count() ## line 127 ##
0
sage: from sage.interfaces.sagespawn import SageSpawn ## line 157 ##
sage: s = SageSpawn("sh", ["-c", "while true; do sleep 1; done"]) ## line 158 ##
sage: s.terminate_async(interval=0.2) ## line 163 ##
sage: while True:
    try:
        os.kill(s.pid, 0)
    except OSError:
        sleep(0.1)
    else:
        break # process got killed ## line 164 ##
sage: sig_on_count() ## line 171 ##
0
comment:3 Changed 8 years ago by
Same on Linux (Snapperkob):
sage -t --long src/sage/interfaces/sagespawn.pyx
Killed due to terminate
**********************************************************************
Tests run before process (pid=10311) failed:
sage: from sage.interfaces.sagespawn import SageSpawn ## line 45 ##
sage: SageSpawn("sleep 1", name="Sleeping Beauty") ## line 46 ##
Sleeping Beauty with PID 10312 running /bin/sleep 1
sage: sig_on_count() ## line 48 ##
0
sage: from sage.interfaces.sagespawn import SageSpawn ## line 67 ##
sage: s = SageSpawn("true", name="stupid process") ## line 68 ##
sage: s # indirect doctest ## line 69 ##
stupid process with PID 10316 running /bin/true
sage: while s.isalive(): # Wait until the process finishes
    sleep(0.1) ## line 71 ##
sage: s # indirect doctest ## line 73 ##
stupid process finished running /bin/true
sage: sig_on_count() ## line 75 ##
0
sage: from sage.interfaces.sagespawn import SageSpawn ## line 96 ##
sage: s = SageSpawn("sh", ["-c", "while true; do sleep 1; done"]) ## line 97 ##
sage: s._keep_alive() ## line 98 ##
sage: pid = s.pid ## line 99 ##
sage: del s ## line 100 ##
sage: import gc ## line 101 ##
sage: _ = gc.collect() ## line 102 ##
sage: from signal import SIGTERM ## line 107 ##
sage: os.kill(pid, SIGTERM) ## line 108 ##
sage: sig_on_count() ## line 109 ##
0
sage: from sage.interfaces.sagespawn import SageSpawn ## line 122 ##
sage: s = SageSpawn("sleep 1000") ## line 123 ##
sage: s.close() ## line 124 ##
sage: while s.isalive(): # long time (5 seconds)
    sleep(0.1) ## line 125 ##
sage: sig_on_count() ## line 127 ##
0
sage: from sage.interfaces.sagespawn import SageSpawn ## line 157 ##
sage: s = SageSpawn("sh", ["-c", "while true; do sleep 1; done"]) ## line 158 ##
sage: s.terminate_async(interval=0.2) ## line 163 ##
sage: while True:
    try:
        os.kill(s.pid, 0)
    except OSError:
        sleep(0.1)
    else:
        break # process got killed ## line 164 ##
sage: sig_on_count() ## line 171 ##
0
comment:4 Changed 8 years ago by
Authors: | → Jeroen Demeyer |
---|
comment:5 Changed 8 years ago by
On my laptop, I can reproduce the first two, not the third. Those first two seem to be different issues. The first is the process-group-was-not-yet-changed race I mentioned in #17686. But I still need to figure out the "bad exit".
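For readers unfamiliar with this kind of race: if a child process moves itself into its own process group after being spawned, a parent that signals the group too early may find that the group does not exist yet. The following is a minimal sketch of one way to guard against that, using only the Python standard library. It is not the code from the branch on this ticket; the helper name `wait_for_own_process_group` and its default parameters are hypothetical.

```python
import os
import signal
import subprocess
import time


def wait_for_own_process_group(pid, interval=0.125, attempts=20):
    """Poll until ``pid`` is the leader of its own process group.

    Hypothetical helper for illustration. Returns True as soon as
    os.getpgid(pid) == pid, and False if that does not happen within
    ``attempts`` polls or if the process no longer exists.
    """
    for _ in range(attempts):
        try:
            if os.getpgid(pid) == pid:
                return True
        except ProcessLookupError:
            return False  # the process is already gone
        time.sleep(interval)
    return False


# Illustration only: start_new_session=True makes the child call setsid(),
# which also places it in a new process group whose id equals its pid.
proc = subprocess.Popen(["sleep", "30"], start_new_session=True)

if wait_for_own_process_group(proc.pid):
    # The group id now equals the child's pid, so signalling the group
    # cannot hit the parent's own process group by accident.
    os.killpg(proc.pid, signal.SIGTERM)
else:
    # Fall back to signalling only the child itself.
    proc.terminate()

proc.wait()
```

Checking `os.getpgid(pid) == pid` before calling `os.killpg` is the key step: without it, a group signal sent too early either fails or targets the wrong group.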
comment:6 Changed 8 years ago by
Branch: | → u/jdemeyer/random_failure_in_sagespawn_pyx |
---|
comment:7 Changed 8 years ago by
Commit: | → 659a1f549535ddf1f08b36a33d71cd37b6971bee |
---|---|
Status: | new → needs_review |
New commits:
659a1f5 | Fix some race conditions in SageSpawn |
comment:8 Changed 8 years ago by
Description: | modified (diff) |
---|
comment:9 follow-up: 10 Changed 8 years ago by
Looks good but the 0.125s is IMHO too slow in the poll; If used with a quick-running executable in @parallel then we might spend most of the time waiting. IMHO we should just sleep(0) (yield to the thread scheduler). Also, the total timeout should probably be larger than 20*0.125 = 2.5s, which you could well hit in case of low memory / swapping.
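The trade-off raised in this comment (poll interval versus total timeout) can be sketched with a deadline-based polling loop. This is an illustration only, not the code under review; the function `poll_until` and its default values are assumptions made for the example.

```python
import time


def poll_until(predicate, interval=0.0, timeout=10.0):
    """Call ``predicate`` until it returns True or ``timeout`` seconds pass.

    Hypothetical helper for illustration. With interval=0.0 the loop calls
    time.sleep(0), which only gives up the rest of the current time slice,
    so a fast-finishing condition is noticed almost immediately at the
    cost of more CPU use while waiting. The timeout is tracked with a
    monotonic clock, independent of how many polls were made.
    """
    deadline = time.monotonic() + timeout
    while True:
        if predicate():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)


# Example: a condition that becomes true on the third check.
calls = []


def ready():
    calls.append(None)
    return len(calls) >= 3


became_ready = poll_until(ready, interval=0.0, timeout=5.0)

# Example: a condition that never becomes true hits the timeout instead.
timed_out = poll_until(lambda: False, interval=0.01, timeout=0.05)
```

Separating the interval from the deadline means the total waiting time no longer depends on a fixed number of iterations, so a slow machine (for example, one that is swapping) gets the full timeout regardless of how short each poll is.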
comment:10 Changed 8 years ago by
Replying to vbraun:
Looks good but the 0.125s is IMHO too slow in the poll; If used with a quick-running executable in @parallel then we might spend most of the time waiting.
That code is only for the case where you spawn a pexpect subprocess (not for @parallel) and where the process is started and immediately killed. If you're doing that too often, you're doing something wrong anyway.
comment:11 Changed 8 years ago by
Status: | needs_review → positive_review |
---|
comment:12 Changed 8 years ago by
Reviewers: | → Volker Braun |
---|
comment:13 Changed 8 years ago by
Branch: | u/jdemeyer/random_failure_in_sagespawn_pyx → 659a1f549535ddf1f08b36a33d71cd37b6971bee |
---|---|
Resolution: | → fixed |
Status: | positive_review → closed |
On Arando (already reported at #17686):