Opened 4 years ago
Closed 3 years ago
#21258 closed defect (fixed)
Random failure in trace.py
Reported by:  vbraun  Owned by:  

Priority:  blocker  Milestone:  sage8.0 
Component:  misc  Keywords:  random_fail 
Cc:  tmonteil, jdemeyer, slelievre  Merged in:  
Authors:  Volker Braun  Reviewers:  Thierry Monteil 
Report Upstream:  N/A  Work issues:  
Branch:  c1062b6 (Commits)  Commit:  c1062b631b529c2755abcce53924db472ee5360b 
Dependencies:  Stopgaps: 
Description (last modified by )
Reported by Emmanuel Charpentier, Daniel Krenn, Steven Trogdon in the sagerelease discussion about Sage 7.4.beta1.
sage t long src/sage/misc/trace.py ********************************************************************** File "src/sage/misc/trace.py", line 66, in sage.misc.trace.trace Failed example: print(s.before[s.before.find(''):]) Expected: ... ipdb> c 2 * 5 Got: Return None > /home/buildbot/slave/sage_git/build/build/bdist.linuxx86_64/egg/prompt_toolkit/terminal/vt ********************************************************************** 1 item had failures: 1 of 10 in sage.misc.trace.trace
Change History (35)
comment:2 Changed 4 years ago by
Note that in my case, the failure does not appear randomly, but at each test.
comment:3 Changed 4 years ago by
A few additional remarks:
 Changing the
TERM
variable (vt100
,linux
,screen
,''
, ...) has no effect.  When i reninstall
prompt_toolkit
, the doctest pass once. Then further tests fail.  The directory mentionned in the failed doctest (
$SAGE_ROOT/build/build/bdist.linux${ARCH}/egg/prompt_toolkit/terminal/vt
) does not exist.
comment:4 Changed 4 years ago by
Further investigation:
if i remove the following .pyc
files, the test pass again (only once, of course):
$SAGE_ROOT/local/lib/python2.7/sitepackages/IPython/core/completer.pyc $SAGE_ROOT/local/lib/python2.7/sitepackages/IPython/core/interactiveshell.pyc $SAGE_ROOT/local/lib/python2.7/sitepackages/IPython/lib/display.pyc
If i run the culprit test in a .sage
file, i only have to remove the first two files.
comment:5 Changed 4 years ago by
Volker, on which kind of machine was the error found ? Was it within a VM ?
comment:6 Changed 4 years ago by
Its obviously the test before
sage: _ = s.expect('100', timeout=90)
triggering at the ...egg/prompt_toolkit/terminal/vt100 instead of the output of print(3+97)
comment:7 Changed 4 years ago by
Can you reproduce it with SAGE_PEXPECT_LOG=yes
(log will be in ~/.sage/pexpect_logs)
comment:8 Changed 4 years ago by
Nothing appears in the ~/.sage/pexpect_logs
directory (even after starting with a fresh .sage directory).
comment:9 followup: ↓ 11 Changed 4 years ago by
If i replace 3+97
with 3+98
and s.expect('100', timeout=90)
with s.expect('101', timeout=90)
, i got a pexpect.exceptions.TIMEOUT: Timeout exceeded.
. But i guess that 90s is more than enough.
If it replace the last expect
command with a sleep(10)
, and if i print the whole s
, i got:
<pexpect.pty_spawn.spawn object at 0x8c45258c> command: /opt/patchbot/sage7.3/src/bin/sage args: ['/opt/patchbot/sage7.3/src/bin/sage'] buffer (last 100 chars): ' ' before (last 100 chars): '/build/bdist.linuxi686/egg/prompt_toolkit/terminal/vt100_input.py(266)_input_parser_generator()\r\n\r\n' after: 'ipdb>' match: <_sre.SRE_Match object at 0x8c4ca950> match_index: 0 exitstatus: None flag_eof: False pid: 21633 child_fd: 5 closed: False timeout: 30 delimiter: <class 'pexpect.exceptions.EOF'> logfile: None logfile_read: None logfile_send: None maxread: 2000 ignorecase: False searchwindowsize: None delaybeforesend: 0.05 delayafterclose: 0.1 delayafterterminate: 0.1
In the "normal" situation (i.e. after removing the .pyc
files, and with the last expect
command), the whole s
is
<pexpect.pty_spawn.spawn object at 0x8c40268c> command: /opt/patchbot/sage7.3/src/bin/sage args: ['/opt/patchbot/sage7.3/src/bin/sage'] buffer (last 100 chars): '\r\n' before (last 100 chars): ' """\r\n 2321 Returns the factorization of ``n``. The result depends on the\r\n\r\nipdb> c\r\n2 * 5\r\n' after: '101' match: <_sre.SRE_Match object at 0x8c47a988> match_index: 0 exitstatus: None flag_eof: False pid: 21962 child_fd: 5 closed: False timeout: 30 delimiter: <class 'pexpect.exceptions.EOF'> logfile: None logfile_read: None logfile_send: None maxread: 2000 ignorecase: False searchwindowsize: None delaybeforesend: 0.05 delayafterclose: 0.1 delayafterterminate: 0.1
comment:10 followups: ↓ 18 ↓ 20 Changed 4 years ago by
 Cc jdemeyer added
In both cases there is trash in s.before that seems to come from a different process, the vt100 thing is just a symptom. E.g. the "Returns the factorization of n" part is from src/sage/arith/misc.py and not src/sage/misc/trace.py. The failing doctest is literally the first one in src/sage/misc/trace.py.
Its as if pexpect does not flush stdout before forking, or something with STDOUT gets messed up. Jeroen, since you are the pexpect expert, any idea?
comment:11 in reply to: ↑ 9 Changed 4 years ago by
Note that in the logs of comment 9, i ran only the culprit doctest in a separate ./sage script.sage
, where script.sage is:
import pexpect s = pexpect.spawn('sage') _ = s.sendline("trace('print(factor(10))'); print(3+98)") _ = s.expect('ipdb>', timeout=90) _ = s.sendline("s"); _ = s.sendline("c") _ = s.expect('101', timeout=90) # print(s.before[s.before.find(''):]) print(s)
comment:12 Changed 4 years ago by
 Cc slelievre added
 Description modified (diff)
Expand ticket description with link to sagerelease discussion.
comment:13 Changed 4 years ago by
Here the trace.py
doctest passes if I make the change
@@ 53,7 +53,7 @@ def trace(code, preparse=True): sage: import pexpect sage: s = pexpect.spawn('sage')  sage: _ = s.sendline("trace('print(factor(10))'); print(3+97)") + sage: _ = s.sendline("trace('print(factor(10))'); print(100)") sage: _ = s.expect('ipdb>', timeout=90) sage: _ = s.sendline("s"); _ = s.sendline("c"); sage: _ = s.expect('100', timeout=90)
or is this cheating?
comment:14 Changed 4 years ago by
I should note that I have a sageongentoo install and the trace.py
doctest passes without modification. And I don't believe anything special is done with pexpect
. And it is pexpect4.1.0
that's installed.
comment:15 Changed 4 years ago by
I think the test feeds in print(3+97) to avoid matching the "100" from the terminal echo.
comment:16 Changed 4 years ago by
Curiously, this does not now fail for me with sage7.4.beta2
.
comment:17 Changed 4 years ago by
Let me relaunch my patchbot, it might pass now :P
comment:18 in reply to: ↑ 10 Changed 4 years ago by
Replying to vbraun:
Its as if pexpect does not flush stdout before forking, or something with STDOUT gets messed up. Jeroen, since you are the pexpect expert, any idea?
I cannot reproduce the problem.
comment:19 Changed 4 years ago by
To somebody who can reproduce the problem: what happens when you do the following in an interactive session:
jdemeyer@tamiyo:/usr/local/src/sagegit$ ./sage ┌────────────────────────────────────────────────────────────────────┐ │ SageMath version 7.4.beta2, Release Date: 20160826 │ │ Type "notebook()" for the browserbased notebook interface. │ │ Type "help()" for help. │ └────────────────────────────────────────────────────────────────────┘ ┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓ ┃ Warning: this is a prerelease version, and it may be unstable. ┃ ┗━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┛ sage: trace('print(factor(10))'); print(3+97) > <string>(1)<module>() ipdb>
Now type s
:
ipdb> s Call > /usr/local/src/sagegit/local/lib/python2.7/sitepackages/sage/arith/misc.py(2319)factor() 2317 return ZZ(n).trial_division(bound) 2318 > 2319 def factor(n, proof=None, int_=False, algorithm='pari', verbose=0, **kwds): 2320 """ 2321 Returns the factorization of ``n``. The result depends on the
Now type c
:
ipdb> c 2 * 5 100
comment:20 in reply to: ↑ 10 Changed 4 years ago by
Replying to vbraun:
the vt100 thing is just a symptom
I doubt that, I think it really is the cause. Something must print the vt100
string and I wonder what does that.
comment:21 followup: ↓ 22 Changed 4 years ago by
Thats what I meant: The test failure is entirely accidental, the real problem is that there is unrelated stuff in the pexpect input.
comment:22 in reply to: ↑ 21 Changed 4 years ago by
Replying to vbraun:
the real problem is that there is unrelated stuff in the pexpect input.
But not because of a pexpect
problem. The strange input is real, the question is: where does it come from?
comment:23 Changed 4 years ago by
Tests pass for me on 7.4.beta2
. Let me recompile for 7.4.beta1
to reproduce the thing.
comment:24 followup: ↓ 25 Changed 4 years ago by
Interactively, i got nothing weird:
┌────────────────────────────────────────────────────────────────────┐ │ SageMath version 7.4.beta1, Release Date: 20160817 │ │ Type "notebook()" for the browserbased notebook interface. │ │ Type "help()" for help. │ └────────────────────────────────────────────────────────────────────┘ ┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓ ┃ Warning: this is a prerelease version, and it may be unstable. ┃ ┗━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┛ sage: trace('print(factor(10))'); print(3+97) > <string>(1)<module>() ipdb> s Call > /opt/patchbot/sage7.3/local/lib/python2.7/sitepackages/sage/arith/misc.py(2319)factor() 2317 return ZZ(n).trial_division(bound) 2318 > 2319 def factor(n, proof=None, int_=False, algorithm='pari', verbose=0, **kwds): 2320 """ 2321 Returns the factorization of ``n``. The result depends on the ipdb> c 2 * 5 100
See my previous comments for behaviour within a sage script.
comment:25 in reply to: ↑ 24 Changed 4 years ago by
Replying to tmonteil:
Interactively, i got nothing weird:
If you still have your 7.4.beta1
could you try:
import pexpect s = pexpect.spawn('sage') _ = s.sendline("trace('print(factor(10))'); print(3+97)") _ = s.expect('ipdb>', timeout=90) print(s.before)
from a .sage
script? I don't have a 7.4.beta1
but with a sageongentoo (7.4.beta2) if I run the script 20+ times successively I occasionally get this failure
┌────────────────────────────────────────────────────────────────────┐ │ SageMath version 7.4.beta2, Release Date: 20160826 │ │ Type "notebook()" for the browserbased notebook interface. │ │ Type "help()" for help. │ └────────────────────────────────────────────────────────────────────┘ ┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓ ┃ Warning: this is a prerelease version, and it may be unstable. ┃ ┗━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┛ trace('print(factor(10))'); print(3+97) sage: trace('print(factor(10))'); print(3+97) GeneratorExit: None > /storage/strogdon/gentoorap/usr/lib64/python2.7/sitepackages/prompt_toolkit/terminal/vt100_input.py(266)_input_parser_generator() 264 return None 265 > 266 def _input_parser_generator(self): 267 """ 268 Coroutine (state machine) for the input parser.
which I believe looks
like the failure I was getting in vanilla Sage, modulo the path to the indicated file. I don't know about the GeneratorExit:
?
comment:26 Changed 4 years ago by
I've reverted my vanilla 7.4.beta2
to 7.4.beta1
and I seem to be unable to now get trace.py
to fail. And it failed consistently when moving from 7.4.beta0
to 7.4.beta1
.
comment:27 Changed 4 years ago by
When I add a log file pexpect.spawn(..., logfile=...)
I get the following for a successful run, together with the 4+97=101 trick to avoid matching the vt100:
trace('print(factor(10))'); print(4+97) ┌────────────────────────────────────────────────────────────────────┐ │ SageMath version 7.6, Release Date: 20170325 │ │ Type "notebook()" for the browserbased notebook interface. │ │ Type "help()" for help. │ └────────────────────────────────────────────────────────────────────┘ trace('print(factor(10))'); print(4+97) sage: ^[sage: trace('print(factor(10))'); print(4+97) > <string>(1)<module>() ipdb> s c s Call > /mnt/disk/home/release/Sage/local/lib/python2.7/sitepackages/sage/arith/misc.py(2281)factor() 2279 return ZZ(n).trial_division(bound) 2280 > 2281 def factor(n, proof=None, int_=False, algorithm='pari', verbose=0, **kwds): 2282 """ 2283 Returns the factorization of ``n``. The result depends on the ipdb> c 2 * 5 101
and this in the random error case:
trace('print(factor(10))'); print(4+97) ┌────────────────────────────────────────────────────────────────────┐ │ SageMath version 7.6, Release Date: 20170325 │ │ Type "notebook()" for the browserbased notebook interface. │ │ Type "help()" for help. │ └────────────────────────────────────────────────────────────────────┘ trace('print(factor(10))'); print(4+97) sage: ^[sage: trace('print(factor(10))'); print(4+97) GeneratorExit: None > /mnt/disk/home/release/Sage/local/lib/python2.7/sitepackages/prompt_toolkit/terminal/vt100_input.py(266)_input_parser_generator() 264 return None 265 > 266 def _input_parser_generator(self): 267 """ 268 Coroutine (state machine) for the input parser. ipdb> s c s Return None > /mnt/disk/home/release/Sage/local/lib/python2.7/sitepackages/prompt_toolkit/terminal/vt100_input.py(266)_input_parser_generator() 264 return None 265 > 266 def _input_parser_generator(self): 267 """ 268 Coroutine (state machine) for the input parser. ipdb> c u'\n'
comment:28 Changed 4 years ago by
The GeneratorExit is part of the mechanism of a generator being garbage collected; Looks like ipdb accidentally catches that exception by lucky timing?
comment:29 Changed 3 years ago by
 Branch set to u/vbraun/random_failure_in_trace_py
comment:30 Changed 3 years ago by
 Commit set to c1062b631b529c2755abcce53924db472ee5360b
 Priority changed from major to blocker
 Status changed from new to needs_review
So the hypothesis is that the debugger can (though unlikely) jump into an internal GeneratorExit that is part of garbage collection.
In the attached patch I disabled garbage collection for the test, and I wasn't able to trigger the bug again. Either its fixed or timings changed so that I can't trigger it any more...
New commits:
c1062b6  Disable garbage collection while testing ipdb

comment:31 Changed 3 years ago by
 Milestone changed from sage7.4 to sage8.0
comment:32 followup: ↓ 33 Changed 3 years ago by
ping
comment:33 in reply to: ↑ 32 Changed 3 years ago by
comment:34 Changed 3 years ago by
 Reviewers set to Thierry Monteil
 Status changed from needs_review to positive_review
Same on my side: the issue disapeared both on this branch and on 8.0.beta7
. So, i am not sure whether the disabling of the garbage collection is of any help. Let it be like this.
comment:35 Changed 3 years ago by
 Branch changed from u/vbraun/random_failure_in_trace_py to c1062b631b529c2755abcce53924db472ee5360b
 Resolution set to fixed
 Status changed from positive_review to closed
I got the same error (within a VM).