Opened 10 years ago

Last modified 10 years ago

#13889 closed enhancement

Better automatic backtrace — at Version 17

Reported by: Volker Braun Owned by: Georg S. Weber
Priority: major Milestone: sage-5.7
Component: build Keywords:
Cc: Simon King Merged in:
Authors: Volker Braun Reviewers:
Report Upstream: N/A Work issues:
Branch: Commit:
Dependencies: #13881 Stopgaps:

Description (last modified by Jeroen Demeyer)

When Sage encounters a SIGSEGV, it uses the glibc backtrace. Such crashes sometimes show up in doctests and are not reproducible, which makes them hard to debug. The aim of this ticket is to return better automatic stack traces by automatically attaching gdb and running scripts in gdb's Python interpreter. This will make it much easier to pinpoint the cause, especially in a debug build.

The log is printed to stdout and automatically saved to a log file in $DOT_SAGE/crash_logs; logs older than a week are automatically deleted. A sample log is sage_crash_wKcUKj.log
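
For illustration only, the one-week cleanup described above could look roughly like the following Python sketch. The function name `prune_crash_logs`, the `*.log` filter and the use of file modification time are assumptions made for this example, not the code attached to the ticket.

```python
import os
import time

ONE_WEEK = 7 * 24 * 60 * 60  # seconds


def prune_crash_logs(log_dir, max_age=ONE_WEEK, now=None):
    """Delete *.log files in log_dir older than max_age seconds.

    Returns the sorted list of file names that were removed.
    (Hypothetical helper; the real cleanup code may differ.)
    """
    now = time.time() if now is None else now
    removed = []
    if not os.path.isdir(log_dir):
        return removed  # nothing to clean up yet
    for name in sorted(os.listdir(log_dir)):
        path = os.path.join(log_dir, name)
        if not (name.endswith(".log") and os.path.isfile(path)):
            continue
        # Age is judged by last modification time (an assumption).
        if now - os.path.getmtime(path) > max_age:
            os.remove(path)
            removed.append(name)
    return removed
```

A caller would pass the crash log directory, for example `os.path.join(os.environ["DOT_SAGE"], "crash_logs")`, before writing a new log.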

Apply

Change History (17)

comment:1 Changed 10 years ago by Volker Braun

Description: modified (diff)

comment:2 Changed 10 years ago by Volker Braun

Description: modified (diff)

comment:3 Changed 10 years ago by Volker Braun

Dependencies: #13881

comment:4 Changed 10 years ago by Volker Braun

Authors: Volker Braun
Status: new → needs_review

comment:5 Changed 10 years ago by Simon King

Cc: Simon King added

comment:6 Changed 10 years ago by Jeroen Demeyer

I don't think I know enough about gdb to review this completely, but some remarks anyway:

  1. There is a double "and" in
    "Sage will now try to attach the debugger to get more information and\n"
    "and then terminate.\n"

  2. Why should the CSI script kill the Sage process? Couldn't you use a different IPC mechanism, such as a simple wait() (or a different signal or a pipe if you want)? The sentence "The target process is frozen while this script runs and resumes when it is finished." in sage-CSI is currently wrong.
  3. I preferred how the backtrace currently appears before the "Unhandled SIGSEGV: ..." message. When something goes wrong, that message stands out at the end of the console output. I think all those backtraces might confuse less experienced Sage users.
  4. In run_gdb(), stderr will always be None since you didn't redirect it. That's fine, so remove
    if stderr != None:
        result.append('An error ocurred:')
        result.append(stderr)

  5. The Popen command should probably be wrapped in a
    try:
        cmd = Popen(...)
    except OSError:
        return ""

     block to handle the case that gdb isn't installed (and I guess empty backtraces should not be saved to a file).
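
As a rough sketch of the last two remarks (not the patch itself), a `run_gdb()` that tolerates a missing debugger and a saver that skips empty traces might look like this. The signatures and the helper name `save_trace_if_nonempty` are hypothetical, and the sketch is written for Python 3.

```python
from subprocess import Popen, PIPE


def run_gdb(argv):
    """Run a debugger command line and return its stdout as text.

    Returns "" if the program cannot be started, e.g. gdb is not installed.
    (Illustrative signature; the real run_gdb() may take other arguments.)
    """
    try:
        cmd = Popen(argv, stdout=PIPE)
    except OSError:
        return ""
    # stderr was not redirected, so communicate() returns None for it.
    stdout, _ = cmd.communicate()
    return stdout.decode("utf-8", "replace")


def save_trace_if_nonempty(trace, path):
    """Write trace to path unless it is empty; return True if written."""
    if not trace.strip():
        return False
    with open(path, "w") as f:
        f.write(trace)
    return True
```

Because a missing executable raises `OSError` inside `Popen`, the caller only ever sees an empty string and can decide not to create a log file.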

Last edited 10 years ago by Jeroen Demeyer (previous) (diff)

comment:7 in reply to:  6 ; Changed 10 years ago by Simon King

Replying to jdemeyer:

I don't think I know enough about gdb

At least you certainly know more than I do. So, sorry for my unqualified comments...

  1. I preferred how the backtrace currently appears before the "Unhandled SIGSEGV: ..." message. When something goes wrong, that message stands out at the end of the console output.

Yes, when an error in Python is raised, one first sees the backtrace and then the error message. If Sage crashes, it makes sense to show first the backtrace and then the error message as well.

I think all those backtraces might confuse less experienced Sage users.

Well, a less experienced Sage user will hopefully never see such backtraces.

What I mean is: Sage is supposed to run without crashes when doing low-level stuff. Hence, crashes are supposed to only occur if the user creates new buggy Cython programs in Sage, or tries to tweak Sage until it exposes its existing bugs. And I think a user who does such "real" programming and developing is experienced enough to not be confused by a backtrace.

comment:8 in reply to:  7 Changed 10 years ago by Jeroen Demeyer

Replying to SimonKing:

Well, a less experienced Sage user will hopefully never see such backtraces.

What I mean is: Sage is supposed to run without crashes when doing low-level stuff.

Unfortunately, "hopefully" and "should" aren't always applicable. There are plenty of cases where seemingly innocent calculations produce the dreaded "Unhandled SIGSEGV" message.

comment:9 Changed 10 years ago by Volker Braun

It makes sense to have the "Unhandled SIGSEGV..." message at the end. I don't think trying to get back from the debugger into the main program is a good idea. Sage is currently in a signal handler after something bad happened, there are no guarantees that anything is still working. In fact, just calling printf() from the signal handler is, strictly speaking, illegal. But the sage-CSI script can easily print the message.

The sentence "The target process is frozen while this script runs and resumes when it is finished." is correct (unless you add --kill, what did you expect). In fact, if you don't pass --kill, you can use sage-CSI to see the status of a long-running computation without disturbing it.

comment:10 in reply to:  9 Changed 10 years ago by Jeroen Demeyer

Replying to vbraun:

It makes sense to have the "Unhandled SIGSEGV..." message at the end. I don't think trying to get back from the debugger into the main program is a good idea. Sage is currently in a signal handler after something bad happened, there are no guarantees that anything is still working.

True, but what's your point? Your execlp() might also fail for this reason, whereas fork() and wait() are basically pure system calls and so much less likely to fail.

So I stand by my proposal to remove the sleep(10) kludge and replace it by a wait(). Then you don't need to kill the parent process.
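
The control flow being proposed can be sketched as follows. This is only a Python illustration of the fork-then-wait pattern on a POSIX system; the actual handler in Sage is C code running inside a signal handler, and the name `run_and_wait` is made up for this example.

```python
import os


def run_and_wait(argv):
    """Fork, exec argv in the child, and block until the child finishes.

    Returns the child's exit status, or the negated signal number if the
    child was killed by a signal. No sleep() and no kill of the parent is
    needed: the parent simply resumes once waitpid() returns.
    """
    pid = os.fork()
    if pid == 0:
        # Child: replace this process with the helper (e.g. a debugger script).
        try:
            os.execvp(argv[0], argv)
        finally:
            # Only reached if exec failed; never fall back into parent code.
            os._exit(127)
    # Parent: wait for exactly this child.
    _, status = os.waitpid(pid, 0)
    if os.WIFEXITED(status):
        return os.WEXITSTATUS(status)
    return -os.WTERMSIG(status)
```

With this shape the parent decides what to do after the helper is done, for instance printing the final "Unhandled SIGSEGV" message and terminating itself.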

comment:11 Changed 10 years ago by Volker Braun

I don't mind going the wait() route. It's still illegal to use printf() in the signal handler, but it seems to work more often than not.

comment:12 Changed 10 years ago by Volker Braun

I've addressed all issues. Also, I don't see why you need to be an expert in GDB to review this ticket.. ;-)

comment:13 Changed 10 years ago by Jeroen Demeyer

The GDB thing doesn't quite work for me, don't know why:

------------------------------------------------------------------------
Attaching to process id 8267.
GNU gdb 6.7.1
Copyright (C) 2007 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-pc-linux-gnu".
(gdb) Hangup detected on fd 0
error detected on stdin
Saved trace to /home/jdemeyer/.sage/crash_logs/sage_crash_WM1Ga9.log

------------------------------------------------------------------------

comment:14 Changed 10 years ago by Volker Braun

Gdb 6.7.1 is ancient (vintage 2007). I doubt it has any Python support...
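
A cheap guard against this situation would be to look at the version banner before relying on gdb's Python interpreter. The sketch below assumes the banner format shown in the log above and uses GDB 7.0 as the cutoff, the release in which Python scripting was introduced; the function names are hypothetical, and a new enough gdb can still be built without Python, so this is a necessary check rather than a sufficient one.

```python
import re


def gdb_version(banner):
    """Extract (major, minor) from the first line of a gdb version banner.

    Returns None if no version number is found. The last version-like
    token on the line is used, since distributions may put their own
    package version in parentheses before the gdb version.
    """
    first_line = banner.splitlines()[0] if banner else ""
    matches = re.findall(r"(\d+)\.(\d+)(?:\.\d+)*", first_line)
    if not matches:
        return None
    major, minor = matches[-1]
    return int(major), int(minor)


def may_have_python(banner):
    """True if the banner reports gdb 7.0 or later."""
    version = gdb_version(banner)
    return version is not None and version >= (7, 0)
```

For the banner in comment:13, `gdb_version("GNU gdb 6.7.1")` yields `(6, 7)`, so the check fails and the caller could fall back to the plain glibc backtrace.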

comment:15 Changed 10 years ago by Jeroen Demeyer

Fair enough, compiling gdb-7.5 now...

comment:16 Changed 10 years ago by Jeroen Demeyer

Please have a look at this logfile, the last part looks broken.

It was generated by

./sage -c "from sage.tests.interrupt import *; unguarded_abort()"

comment:17 Changed 10 years ago by Jeroen Demeyer

Description: modified (diff)
Status: needs_review → needs_work