Opened 12 years ago
Closed 12 years ago
#9616 closed defect (fixed)
Errno 16 / NFS problems with parallel/decorate.py
Reported by: | Dan Drake | Owned by: | Minh Van Nguyen |
---|---|---|---|
Priority: | blocker | Milestone: | sage-4.5.2 |
Component: | doctest coverage | Keywords: | fork nfs device resource busy |
Cc: | Leif Leonhardy, John Palmieri, Karl-Dieter Crisman, Martin Albrecht, Minh Van Nguyen, Simon King, William Stein | Merged in: | sage-4.5.2.rc0 |
Authors: | Mitesh Patel | Reviewers: | John Palmieri |
Report Upstream: | N/A | Work issues: | |
Branch: | | Commit: | |
Dependencies: | | Stopgaps: | |
Description
In 4.5.2.alpha1, we have for many people:
sage -t -long "devel/sage/sage/parallel/decorate.py"
------------------------------------------------------------
Unhandled SIGSEGV: A segmentation fault occurred in Sage.
This probably occurred because a *compiled* component
of Sage has a bug in it (typically accessing invalid memory)
or is not properly wrapped with _sig_on, _sig_off.
You might want to run Sage under gdb with 'sage -gdb' to debug this.
Sage will now terminate (sorry).
------------------------------------------------------------
**********************************************************************
File "/mnt/usb1/scratch/drake/release/tmp/sage-4.5.2.alpha1/devel/sage/sage/parallel/decorate.py", line 300:
    sage: g()
Expected:
    '10'
Got:
    [Errno 16] Device or resource busy: '/home/drake/.sage/temp/sage.math.washington.edu/30336/dir_0/.nfs0000000000591f8700069d5c'
    '10'
**********************************************************************
File "/mnt/usb1/scratch/drake/release/tmp/sage-4.5.2.alpha1/devel/sage/sage/parallel/decorate.py", line 311:
    sage: g()
Expected:
    'a'
Got:
    [Errno 16] Device or resource busy: '/home/drake/.sage/temp/sage.math.washington.edu/30336/dir_1/.nfs0000000000591f8d00069d5d'
    'a'
**********************************************************************
and so on. See https://groups.google.com/group/sage-release/msg/88b030aa31926459 and that thread.
This seems related to #9501.
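For context: names of the form .nfsXXXX, as in the output above, are placeholders that an NFS client typically creates when a file is deleted while a process still holds it open, and trying to remove such a placeholder can fail with EBUSY (errno 16). The following is only a minimal sketch of a cleanup routine that tolerates that error. The helper name remove_tree_tolerating_busy is hypothetical and is not part of Sage or of the patch under discussion; the sketch also assumes the tree contains no symbolic links.

```python
import errno
import os


def remove_tree_tolerating_busy(top):
    """Delete ``top`` recursively and return the paths that had to be skipped.

    Hypothetical helper for illustration only; not Sage's actual cleanup code.
    Assumes the tree contains no symbolic links.
    """
    skipped = []
    # Walk bottom-up so that files are removed before their directories.
    for dirpath, dirnames, filenames in os.walk(top, topdown=False):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                os.unlink(path)
            except OSError as exc:
                # EBUSY is what the ticket reports for the .nfs* placeholders.
                if exc.errno != errno.EBUSY:
                    raise
                skipped.append(path)
        try:
            os.rmdir(dirpath)
        except OSError as exc:
            # The directory is busy itself, or non-empty because a busy
            # file was left behind in it.
            if exc.errno not in (errno.EBUSY, errno.ENOTEMPTY, errno.EEXIST):
                raise
            skipped.append(dirpath)
    return skipped
```

A caller could log the returned paths and retry later, once the processes holding the files open have exited, instead of printing the error into the doctest output.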
Attachments (2)
Change History (17)
comment:1 follow-up: 3 Changed 12 years ago by
comment:2 follow-up: 5 Changed 12 years ago by
Cc: | Leif Leonhardy John Palmieri Karl-Dieter Crisman Martin Albrecht Minh Van Nguyen Simon King William Stein added |
---|
For now, should we tag the relevant tests with # not tested or back out the whole patch? Other options?
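To illustrate the first option: Sage's doctest runner skips any example line marked # not tested, so the line still appears in the documentation but is never executed. In this sketch the function g is a hypothetical stand-in, not the decorated function from parallel/decorate.py.

```python
def g():
    """
    Return a fixed string.

    EXAMPLES::

        sage: g()  # not tested
        '10'
    """
    # Hypothetical body; the real doctests exercise forked subprocesses.
    return '10'
```

The drawback of tagging is that the skipped lines no longer guard against regressions, which is part of the trade-off being discussed here.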
comment:3 Changed 12 years ago by
By the way, here are the latest doctesting exit codes (cf. #9243), from the top of sage-doctest:
# Return value in process exit code:
# 0: all tests passed
# 1: file not found
# 2: KeyboardInterrupt
# 4: doctest process was terminated by a signal
# 8: the doctesting framework raised an exception
# 16: script called with bad options
# 32: (used internally in sage-ptest)
# 64: time out
# 128: failed doctests
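Since the nonzero values are all distinct powers of two, they look like bit flags that can be combined in one exit status. That is an assumption here, not something the header comment states. Under that assumption, a status could be decoded roughly as follows; the names EXIT_FLAGS and describe_exit_code are made up for this sketch.

```python
# Flag values copied from the sage-doctest header comment quoted above.
EXIT_FLAGS = {
    1: "file not found",
    2: "KeyboardInterrupt",
    4: "doctest process was terminated by a signal",
    8: "the doctesting framework raised an exception",
    16: "script called with bad options",
    32: "(used internally in sage-ptest)",
    64: "time out",
    128: "failed doctests",
}


def describe_exit_code(code):
    """Return the descriptions of all flags set in ``code``.

    Assumes the flags may be OR-ed together (hypothetical helper).
    """
    if code == 0:
        return ["all tests passed"]
    return [text for bit, text in sorted(EXIT_FLAGS.items()) if code & bit]
```

For example, describe_exit_code(128) reports only failed doctests, while a status with both the 64 and 128 bits set would report a timeout as well.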
comment:4 Changed 12 years ago by
Keywords: | device resource busy added; segfault removed |
---|---|
Summary: | segfault / NFS problems with parallel/decorate.py → Errno 16 / NFS problems with parallel/decorate.py |
According to William on sage-release, the segfault is an intentional part of a doctest, so I've changed the ticket's title.
comment:5 follow-up: 6 Changed 12 years ago by
Replying to mpatel:
For now, should we tag the relevant tests with # not tested or back out the whole patch? Other options?
If we back out the whole patch, I have more confidence that the doctests will get fixed quickly.
comment:6 Changed 12 years ago by
Authors: | → Mitesh Patel |
---|---|
Status: | new → needs_review |
Replying to jhpalmieri:
Replying to mpatel:
For now, should we tag the relevant tests with # not tested or back out the whole patch? Other options?
If we back out the whole patch, I have more confidence that the doctests will get fixed quickly.
Adapting the procedure in this comment at #9583, I've attached a patch that undoes (or should undo) all of #9501. If the patch gets a positive review, we can open a new ticket for re-merging #9501.
comment:7 follow-up: 8 Changed 12 years ago by
Hmmm, of course a simple procedure, but we'd back out too much in my opinion...
But I can live with that. (And I'm currently too (laz|bus)y to sort out the desirable parts of the original patch.)
Changed 12 years ago by
Attachment: | trac_9616-backout_only_some_of_9501.patch added |
---|
Backs out only the ticket-relevant parts of #9501 (a subset of Mitesh's patch)
comment:8 Changed 12 years ago by
Replying to leif:
(And I'm currently too (laz|bus)y to sort out the desirable parts of the original patch.)
Couldn't resist though (simpler than expected).
Not tested much; I only ran ./sage -t -long devel/sage/sage/parallel/ successfully and rebuilt the documentation without errors or warnings.
(Ubuntu 9.04 x86_64, Core2, gcc 4.3.3)
So now two concurrent patches to review... ;-)
comment:9 follow-up: 11 Changed 12 years ago by
I've tested mpatel's patch on 5 machines: 4 on which the problem originally occurred (sage.math and skynet machines eno, iras, and taurus) and one machine (running OS X) which didn't have the original problem. After applying the patch, all tests pass for the directory "parallel" on all 5 machines. Long doctests for the whole Sage library pass on sage.math and taurus except for previously known, unrelated, failures.
I don't know if I'll get to leif's patch.
Since this is a rollback to a previous situation, I think this is good enough for a positive review for mpatel's patch, though. Opinions?
comment:11 follow-up: 12 Changed 12 years ago by
Reviewers: | → John Palmieri |
---|---|
Status: | needs_review → positive_review |
Replying to jhpalmieri:
Since this is a rollback to a previous situation, I think this is good enough for a positive review for mpatel's patch, though. Opinions?
You've done some good testing, and since the original patch was an enhancement, and didn't fix any bugs or failing doctests (right?), I think a positive review is warranted here.
comment:12 Changed 12 years ago by
Replying to ddrake:
... since the original patch was an enhancement, and didn't fix any bugs or failing doctests (right?)
Well, I pushed back in mostly fixes (and improvements) to the documentation (which one might consider bugfixes, too).
comment:13 Changed 12 years ago by
Besides the above, this would be missing completely, too:
- ``reset_interface`` -- if True (the default), all the pexpect interfaces are reset in the forked off subprocesses. You definitely want this, since not doing this can lead to weird issues.
comment:14 Changed 12 years ago by
Oops, forget my last comment: the reset is simply performed unconditionally in Mitesh's version.
comment:15 Changed 12 years ago by
Merged in: | → sage-4.5.2.rc0 |
---|---|
Resolution: | → fixed |
Status: | positive_review → closed |
A partial backout, since it retains only some of the changes from #9501, needs a new review, which currently, at least, we don't have. Given the need to press forward with the 4.5.2 release cycle, I'm merging trac_9616-backout_9501_fork_deco.patch into 4.5.2.rc0.
This may not be an ideal resolution, but it seems reasonable given the circumstances. Absolutely no offense is intended.
I've opened #9631 for re-merging #9501 after we fix the NFS/doctest problem.
For what they're worth, tests on sage.math with variations on
end with
for the default DOT_SAGE, /scratch/$USER/.sage, and /dev/shm/$USER/.sage, respectively.