Opened 4 years ago

Closed 3 years ago

#21258 closed defect (fixed)

Random failure in trace.py

Reported by: vbraun Owned by:
Priority: blocker Milestone: sage-8.0
Component: misc Keywords: random_fail
Cc: tmonteil, jdemeyer, slelievre Merged in:
Authors: Volker Braun Reviewers: Thierry Monteil
Report Upstream: N/A Work issues:
Branch: c1062b6 (Commits) Commit: c1062b631b529c2755abcce53924db472ee5360b
Dependencies: Stopgaps:

Description (last modified by slelievre)

Reported by Emmanuel Charpentier, Daniel Krenn, Steven Trogdon in the sage-release discussion about Sage 7.4.beta1.

sage -t --long src/sage/misc/trace.py
**********************************************************************
File "src/sage/misc/trace.py", line 66, in sage.misc.trace.trace
Failed example:
    print(s.before[s.before.find('--'):])
Expected:
    --...
    ipdb> c
    2 * 5
Got:
    --Return--

    None

    > /home/buildbot/slave/sage_git/build/build/bdist.linux-x86_64/egg/prompt_toolkit/terminal/vt
**********************************************************************
1 item had failures:
   1 of  10 in sage.misc.trace.trace

Note: previous discussions of trace.py on sage-release.

Change History (35)

comment:1 Changed 4 years ago by tmonteil

  • Cc tmonteil added

I got the same error (within a VM).

Last edited 4 years ago by tmonteil (previous) (diff)

comment:2 Changed 4 years ago by tmonteil

Note that in my case, the failure does not appear randomly, but at each test.

comment:3 Changed 4 years ago by tmonteil

A few additional remarks:

  • Changing the TERM variable (vt100, linux, screen, '', ...) has no effect.
  • When i reninstall prompt_toolkit, the doctest pass once. Then further tests fail.
  • The directory mentionned in the failed doctest ($SAGE_ROOT/build/build/bdist.linux-${ARCH}/egg/prompt_toolkit/terminal/vt) does not exist.

comment:4 Changed 4 years ago by tmonteil

Further investigation:

if i remove the following .pyc files, the test pass again (only once, of course):

$SAGE_ROOT/local/lib/python2.7/site-packages/IPython/core/completer.pyc
$SAGE_ROOT/local/lib/python2.7/site-packages/IPython/core/interactiveshell.pyc
$SAGE_ROOT/local/lib/python2.7/site-packages/IPython/lib/display.pyc

If i run the culprit test in a .sage file, i only have to remove the first two files.

comment:5 Changed 4 years ago by tmonteil

Volker, on which kind of machine was the error found ? Was it within a VM ?

comment:6 Changed 4 years ago by vbraun

Its obviously the test before

        sage: _ = s.expect('100', timeout=90)                                                                                                           

triggering at the ...egg/prompt_toolkit/terminal/vt100 instead of the output of print(3+97)

comment:7 Changed 4 years ago by vbraun

Can you reproduce it with SAGE_PEXPECT_LOG=yes (log will be in ~/.sage/pexpect_logs)

comment:8 Changed 4 years ago by tmonteil

Nothing appears in the ~/.sage/pexpect_logs directory (even after starting with a fresh .sage directory).

comment:9 follow-up: Changed 4 years ago by tmonteil

If i replace 3+97 with 3+98 and s.expect('100', timeout=90) with s.expect('101', timeout=90), i got a pexpect.exceptions.TIMEOUT: Timeout exceeded.. But i guess that 90s is more than enough.

If it replace the last expect command with a sleep(10), and if i print the whole s, i got:

<pexpect.pty_spawn.spawn object at 0x8c45258c>
command: /opt/patchbot/sage-7.3/src/bin/sage
args: ['/opt/patchbot/sage-7.3/src/bin/sage']
buffer (last 100 chars): ' '
before (last 100 chars): '/build/bdist.linux-i686/egg/prompt_toolkit/terminal/vt100_input.py(266)_input_parser_generator()\r\n\r\n'
after: 'ipdb>'
match: <_sre.SRE_Match object at 0x8c4ca950>
match_index: 0
exitstatus: None
flag_eof: False
pid: 21633
child_fd: 5
closed: False
timeout: 30
delimiter: <class 'pexpect.exceptions.EOF'>
logfile: None
logfile_read: None
logfile_send: None
maxread: 2000
ignorecase: False
searchwindowsize: None
delaybeforesend: 0.05
delayafterclose: 0.1
delayafterterminate: 0.1

In the "normal" situation (i.e. after removing the .pyc files, and with the last expect command), the whole s is

<pexpect.pty_spawn.spawn object at 0x8c40268c>
command: /opt/patchbot/sage-7.3/src/bin/sage
args: ['/opt/patchbot/sage-7.3/src/bin/sage']
buffer (last 100 chars): '\r\n'
before (last 100 chars): ' """\r\n   2321     Returns the factorization of ``n``.  The result depends on the\r\n\r\nipdb> c\r\n2 * 5\r\n'
after: '101'
match: <_sre.SRE_Match object at 0x8c47a988>
match_index: 0
exitstatus: None
flag_eof: False
pid: 21962
child_fd: 5
closed: False
timeout: 30
delimiter: <class 'pexpect.exceptions.EOF'>
logfile: None
logfile_read: None
logfile_send: None
maxread: 2000
ignorecase: False
searchwindowsize: None
delaybeforesend: 0.05
delayafterclose: 0.1
delayafterterminate: 0.1

comment:10 follow-ups: Changed 4 years ago by vbraun

  • Cc jdemeyer added

In both cases there is trash in s.before that seems to come from a different process, the vt100 thing is just a symptom. E.g. the "Returns the factorization of n" part is from src/sage/arith/misc.py and not src/sage/misc/trace.py. The failing doctest is literally the first one in src/sage/misc/trace.py.

Its as if pexpect does not flush stdout before forking, or something with STDOUT gets messed up. Jeroen, since you are the pexpect expert, any idea?

comment:11 in reply to: ↑ 9 Changed 4 years ago by tmonteil

Note that in the logs of comment 9, i ran only the culprit doctest in a separate ./sage script.sage, where script.sage is:

import pexpect
s = pexpect.spawn('sage')
_ = s.sendline("trace('print(factor(10))'); print(3+98)")
_ = s.expect('ipdb>', timeout=90)
_ = s.sendline("s"); _ = s.sendline("c")
_ = s.expect('101', timeout=90)
# print(s.before[s.before.find('--'):])
print(s)

comment:12 Changed 4 years ago by slelievre

  • Cc slelievre added
  • Description modified (diff)

Expand ticket description with link to sage-release discussion.

comment:13 Changed 4 years ago by strogdon

Here the trace.py doctest passes if I make the change

@@ -53,7 +53,7 @@ def trace(code, preparse=True):
 
         sage: import pexpect
         sage: s = pexpect.spawn('sage')
-        sage: _ = s.sendline("trace('print(factor(10))'); print(3+97)")
+        sage: _ = s.sendline("trace('print(factor(10))'); print(100)")
         sage: _ = s.expect('ipdb>', timeout=90)
         sage: _ = s.sendline("s"); _ = s.sendline("c");
         sage: _ = s.expect('100', timeout=90)

or is this cheating?

comment:14 Changed 4 years ago by strogdon

I should note that I have a sage-on-gentoo install and the trace.py doctest passes without modification. And I don't believe anything special is done with pexpect. And it is pexpect-4.1.0 that's installed.

comment:15 Changed 4 years ago by vbraun

I think the test feeds in print(3+97) to avoid matching the "100" from the terminal echo.

comment:16 Changed 4 years ago by strogdon

Curiously, this does not now fail for me with sage-7.4.beta2.

comment:17 Changed 4 years ago by tmonteil

Let me relaunch my patchbot, it might pass now :P

comment:18 in reply to: ↑ 10 Changed 4 years ago by jdemeyer

Replying to vbraun:

Its as if pexpect does not flush stdout before forking, or something with STDOUT gets messed up. Jeroen, since you are the pexpect expert, any idea?

I cannot reproduce the problem.

comment:19 Changed 4 years ago by jdemeyer

To somebody who can reproduce the problem: what happens when you do the following in an interactive session:

jdemeyer@tamiyo:/usr/local/src/sage-git$ ./sage
┌────────────────────────────────────────────────────────────────────┐
│ SageMath version 7.4.beta2, Release Date: 2016-08-26               │
│ Type "notebook()" for the browser-based notebook interface.        │
│ Type "help()" for help.                                            │
└────────────────────────────────────────────────────────────────────┘
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃ Warning: this is a prerelease version, and it may be unstable.     ┃
┗━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┛
sage: trace('print(factor(10))'); print(3+97)
> <string>(1)<module>()

ipdb> 

Now type s:

ipdb> s
--Call--
> /usr/local/src/sage-git/local/lib/python2.7/site-packages/sage/arith/misc.py(2319)factor()
   2317         return ZZ(n).trial_division(bound)
   2318 
-> 2319 def factor(n, proof=None, int_=False, algorithm='pari', verbose=0, **kwds):
   2320     """
   2321     Returns the factorization of ``n``.  The result depends on the

Now type c:

ipdb> c
2 * 5
100
Last edited 4 years ago by jdemeyer (previous) (diff)

comment:20 in reply to: ↑ 10 Changed 4 years ago by jdemeyer

Replying to vbraun:

the vt100 thing is just a symptom

I doubt that, I think it really is the cause. Something must print the vt100 string and I wonder what does that.

comment:21 follow-up: Changed 4 years ago by vbraun

Thats what I meant: The test failure is entirely accidental, the real problem is that there is unrelated stuff in the pexpect input.

comment:22 in reply to: ↑ 21 Changed 4 years ago by jdemeyer

Replying to vbraun:

the real problem is that there is unrelated stuff in the pexpect input.

But not because of a pexpect problem. The strange input is real, the question is: where does it come from?

comment:23 Changed 4 years ago by tmonteil

Tests pass for me on 7.4.beta2. Let me recompile for 7.4.beta1 to reproduce the thing.

comment:24 follow-up: Changed 4 years ago by tmonteil

Interactively, i got nothing weird:

┌────────────────────────────────────────────────────────────────────┐
│ SageMath version 7.4.beta1, Release Date: 2016-08-17               │
│ Type "notebook()" for the browser-based notebook interface.        │
│ Type "help()" for help.                                            │
└────────────────────────────────────────────────────────────────────┘
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃ Warning: this is a prerelease version, and it may be unstable.     ┃
┗━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┛
sage: trace('print(factor(10))'); print(3+97)
> <string>(1)<module>()

ipdb> s
--Call--
> /opt/patchbot/sage-7.3/local/lib/python2.7/site-packages/sage/arith/misc.py(2319)factor()
   2317         return ZZ(n).trial_division(bound)
   2318 
-> 2319 def factor(n, proof=None, int_=False, algorithm='pari', verbose=0, **kwds):
   2320     """
   2321     Returns the factorization of ``n``.  The result depends on the

ipdb> c
2 * 5
100

See my previous comments for behaviour within a sage script.

comment:25 in reply to: ↑ 24 Changed 4 years ago by strogdon

Replying to tmonteil:

Interactively, i got nothing weird:

If you still have your 7.4.beta1 could you try:

import pexpect
s = pexpect.spawn('sage')
_ = s.sendline("trace('print(factor(10))'); print(3+97)")
_ = s.expect('ipdb>', timeout=90)
print(s.before)

from a .sage script? I don't have a 7.4.beta1 but with a sage-on-gentoo (7.4.beta2) if I run the script 20+ times successively I occasionally get this failure

┌────────────────────────────────────────────────────────────────────┐
│ SageMath version 7.4.beta2, Release Date: 2016-08-26               │
│ Type "notebook()" for the browser-based notebook interface.        │
│ Type "help()" for help.                                            │
└────────────────────────────────────────────────────────────────────┘
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃ Warning: this is a prerelease version, and it may be unstable.     ┃
┗━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┛
trace('print(factor(10))'); print(3+97)
sage: trace('print(factor(10))'); print(3+97)
GeneratorExit: None
> /storage/strogdon/gentoo-rap/usr/lib64/python2.7/site-packages/prompt_toolkit/terminal/vt100_input.py(266)_input_parser_generator()
    264             return None
    265 
--> 266     def _input_parser_generator(self):
    267         """
    268         Coroutine (state machine) for the input parser.

which I believe looks like the failure I was getting in vanilla Sage, modulo the path to the indicated file. I don't know about the GeneratorExit:?

comment:26 Changed 4 years ago by strogdon

I've reverted my vanilla 7.4.beta2 to 7.4.beta1 and I seem to be unable to now get trace.py to fail. And it failed consistently when moving from 7.4.beta0 to 7.4.beta1.

comment:27 Changed 4 years ago by vbraun

When I add a log file pexpect.spawn(..., logfile=...) I get the following for a successful run, together with the 4+97=101 trick to avoid matching the vt100:

trace('print(factor(10))'); print(4+97)
┌────────────────────────────────────────────────────────────────────┐
│ SageMath version 7.6, Release Date: 2017-03-25                     │
│ Type "notebook()" for the browser-based notebook interface.        │
│ Type "help()" for help.                                            │
└────────────────────────────────────────────────────────────────────┘
trace('print(factor(10))'); print(4+97)
sage: ^[sage: trace('print(factor(10))'); print(4+97)
> <string>(1)<module>()

ipdb> s
c
s
--Call--
> /mnt/disk/home/release/Sage/local/lib/python2.7/site-packages/sage/arith/misc.py(2281)factor()
   2279         return ZZ(n).trial_division(bound)
   2280 
-> 2281 def factor(n, proof=None, int_=False, algorithm='pari', verbose=0, **kwds):
   2282     """
   2283     Returns the factorization of ``n``.  The result depends on the

ipdb> c
2 * 5
101

and this in the random error case:

trace('print(factor(10))'); print(4+97)
┌────────────────────────────────────────────────────────────────────┐
│ SageMath version 7.6, Release Date: 2017-03-25                     │
│ Type "notebook()" for the browser-based notebook interface.        │
│ Type "help()" for help.                                            │
└────────────────────────────────────────────────────────────────────┘
trace('print(factor(10))'); print(4+97)
sage: ^[sage: trace('print(factor(10))'); print(4+97)
GeneratorExit: None
> /mnt/disk/home/release/Sage/local/lib/python2.7/site-packages/prompt_toolkit/terminal/vt100_input.py(266)_input_parser_generator()
    264             return None
    265 
--> 266     def _input_parser_generator(self):
    267         """
    268         Coroutine (state machine) for the input parser.

ipdb> s
c
s
--Return--
None
> /mnt/disk/home/release/Sage/local/lib/python2.7/site-packages/prompt_toolkit/terminal/vt100_input.py(266)_input_parser_generator()
    264             return None
    265 
--> 266     def _input_parser_generator(self):
    267         """
    268         Coroutine (state machine) for the input parser.

ipdb> c
u'\n'

comment:28 Changed 4 years ago by vbraun

The GeneratorExit is part of the mechanism of a generator being garbage collected; Looks like ipdb accidentally catches that exception by lucky timing?

comment:29 Changed 3 years ago by vbraun

  • Branch set to u/vbraun/random_failure_in_trace_py

comment:30 Changed 3 years ago by vbraun

  • Authors set to Volker Braun
  • Commit set to c1062b631b529c2755abcce53924db472ee5360b
  • Priority changed from major to blocker
  • Status changed from new to needs_review

So the hypothesis is that the debugger can (though unlikely) jump into an internal GeneratorExit that is part of garbage collection.

In the attached patch I disabled garbage collection for the test, and I wasn't able to trigger the bug again. Either its fixed or timings changed so that I can't trigger it any more...


New commits:

c1062b6Disable garbage collection while testing ipdb

comment:31 Changed 3 years ago by kcrisman

  • Milestone changed from sage-7.4 to sage-8.0

comment:32 follow-up: Changed 3 years ago by vbraun

ping

comment:33 in reply to: ↑ 32 Changed 3 years ago by tmonteil

Replying to vbraun:

ping

Trying the fix right now, compilation might take time.

comment:34 Changed 3 years ago by tmonteil

  • Reviewers set to Thierry Monteil
  • Status changed from needs_review to positive_review

Same on my side: the issue disapeared both on this branch and on 8.0.beta7. So, i am not sure whether the disabling of the garbage collection is of any help. Let it be like this.

comment:35 Changed 3 years ago by vbraun

  • Branch changed from u/vbraun/random_failure_in_trace_py to c1062b631b529c2755abcce53924db472ee5360b
  • Resolution set to fixed
  • Status changed from positive_review to closed
Note: See TracTickets for help on using tickets.