Opened 11 years ago
Closed 11 years ago
#9616 closed defect (fixed)
Errno 16 / NFS problems with parallel/decorate.py
Reported by: | ddrake | Owned by: | mvngu |
---|---|---|---|
Priority: | blocker | Milestone: | sage-4.5.2 |
Component: | doctest coverage | Keywords: | fork nfs device resource busy |
Cc: | leif, jhpalmieri, kcrisman, malb, mvngu, SimonKing, was | Merged in: | sage-4.5.2.rc0 |
Authors: | Mitesh Patel | Reviewers: | John Palmieri |
Report Upstream: | N/A | Work issues: | |
Branch: | | Commit: | |
Dependencies: | | Stopgaps: | |
Description
In 4.5.2.alpha1, we have for many people:
    sage -t -long "devel/sage/sage/parallel/decorate.py"
    ------------------------------------------------------------
    Unhandled SIGSEGV: A segmentation fault occurred in Sage.
    This probably occurred because a *compiled* component
    of Sage has a bug in it (typically accessing invalid memory)
    or is not properly wrapped with _sig_on, _sig_off.
    You might want to run Sage under gdb with 'sage -gdb' to debug this.
    Sage will now terminate (sorry).
    ------------------------------------------------------------
    **********************************************************************
    File "/mnt/usb1/scratch/drake/release/tmp/sage-4.5.2.alpha1/devel/sage/sage/parallel/decorate.py", line 300:
        sage: g()
    Expected:
        '10'
    Got:
        [Errno 16] Device or resource busy: '/home/drake/.sage/temp/sage.math.washington.edu/30336/dir_0/.nfs0000000000591f8700069d5c'
        '10'
    **********************************************************************
    File "/mnt/usb1/scratch/drake/release/tmp/sage-4.5.2.alpha1/devel/sage/sage/parallel/decorate.py", line 311:
        sage: g()
    Expected:
        'a'
    Got:
        [Errno 16] Device or resource busy: '/home/drake/.sage/temp/sage.math.washington.edu/30336/dir_1/.nfs0000000000591f8d00069d5d'
        'a'
    **********************************************************************
and so on. See https://groups.google.com/group/sage-release/msg/88b030aa31926459 and that thread.
This seems related to #9501.
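For context: the .nfs... names in the output above look like the placeholder files an NFS client leaves behind when a file is deleted while some process still has it open. Removing such a file, or the directory that contains it, can then fail with errno 16 (EBUSY). Below is a minimal Python sketch of temporary-directory cleanup that tolerates this case. It is not the code in decorate.py; the function name and its unlink/rmdir parameters are hypothetical and exist only for illustration.

```python
import errno
import os
import tempfile


def remove_tree_tolerating_busy(root, unlink=os.unlink, rmdir=os.rmdir):
    """Remove ``root`` bottom-up and return the paths that could not be removed.

    Files that fail with EBUSY (for example NFS placeholder files) are
    skipped instead of aborting the cleanup; any other error is re-raised.
    Hypothetical helper, not part of Sage.
    """
    skipped = []
    # topdown=False visits the deepest directories first, so each directory
    # is (normally) empty by the time we try to remove it.
    for dirpath, dirnames, filenames in os.walk(root, topdown=False):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                unlink(path)
            except OSError as exc:
                if exc.errno != errno.EBUSY:
                    raise
                skipped.append(path)
        try:
            rmdir(dirpath)
        except OSError as exc:
            # A directory that still holds a skipped file is not empty.
            if exc.errno not in (errno.EBUSY, errno.ENOTEMPTY):
                raise
            skipped.append(dirpath)
    return skipped
```

A caller could log the returned paths and retry later, once the processes holding the files open have exited. Symbolic links to directories are not handled in this sketch.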
Attachments (2)
Change History (17)
comment:1 follow-up: ↓ 3 Changed 11 years ago by
comment:2 follow-up: ↓ 5 Changed 11 years ago by
- Cc leif jhpalmieri kcrisman malb mvngu SimonKing was added
For now, should we tag the relevant tests with # not tested or back out the whole patch? Other options?
comment:3 in reply to: ↑ 1 Changed 11 years ago by
By the way, here are the latest doctesting exit codes (cf. #9243), from the top of sage-doctest:
# Return value in process exit code:
#   0: all tests passed
#   1: file not found
#   2: KeyboardInterrupt
#   4: doctest process was terminated by a signal
#   8: the doctesting framework raised an exception
#  16: script called with bad options
#  32: (used internally in sage-ptest)
#  64: time out
# 128: failed doctests
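Since the nonzero codes listed above are distinct powers of two, a combined exit status can presumably be decoded bit by bit. Here is a small Python sketch of that idea. The function name is hypothetical, and treating the values as OR-able flags is an assumption drawn from the numbering, not something confirmed by the sage-doctest source.

```python
# Values copied from the list above. Treating them as bit flags that can be
# OR-ed together is an assumption based on their being powers of two.
EXIT_FLAGS = {
    1: "file not found",
    2: "KeyboardInterrupt",
    4: "doctest process was terminated by a signal",
    8: "the doctesting framework raised an exception",
    16: "script called with bad options",
    32: "(used internally in sage-ptest)",
    64: "time out",
    128: "failed doctests",
}


def describe_exit_code(code):
    """Return the list of conditions encoded in a doctest exit code."""
    if code == 0:
        return ["all tests passed"]
    # Check each flag in increasing order and keep those whose bit is set.
    return [text for bit, text in sorted(EXIT_FLAGS.items()) if code & bit]
```

For example, under this assumption a status of 192 would decode to both "time out" and "failed doctests".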
comment:4 Changed 11 years ago by
- Keywords device resource busy added; segfault removed
- Summary changed from segfault / NFS problems with parallel/decorate.py to Errno 16 / NFS problems with parallel/decorate.py
According to William on sage-release, the segfault is an intentional part of a doctest, so I've changed the ticket's title.
comment:5 in reply to: ↑ 2 ; follow-up: ↓ 6 Changed 11 years ago by
Replying to mpatel:
For now, should we tag the relevant tests with # not tested or back out the whole patch? Other options?
If we back out the whole patch, I have more confidence that the doctests will get fixed quickly.
comment:6 in reply to: ↑ 5 Changed 11 years ago by
- Status changed from new to needs_review
Replying to jhpalmieri:
Replying to mpatel:
For now, should we tag the relevant tests with # not tested or back out the whole patch? Other options?
If we back out the whole patch, I have more confidence that the doctests will get fixed quickly.
Adapting the procedure in this comment at #9583, I've attached a patch that undoes (or should undo) all of #9501. If the patch gets a positive review, we can open a new ticket for re-merging #9501.
comment:7 follow-up: ↓ 8 Changed 11 years ago by
Hmmm, of course a simple procedure, but we'd back out too much in my opinion...
But I can live with that. (And I'm currently too (laz|bus)y to sort out the desirable parts of the original patch.)
comment:8 in reply to: ↑ 7 Changed 11 years ago by
Replying to leif:
(And I'm currently too (laz|bus)y to sort out the desirable parts of the original patch.)
Couldn't resist though (simpler than expected).
Not very tested: I only ran ./sage -t -long devel/sage/sage/parallel/ successfully and rebuilt the documentation without errors or warnings.
(Ubuntu 9.04 x86_64, Core2, gcc 4.3.3)
So now two concurrent patches to review... ;-)
comment:9 follow-up: ↓ 11 Changed 11 years ago by
I've tested mpatel's patch on 5 machines: 4 on which the problem originally occurred (sage.math and skynet machines eno, iras, and taurus) and one machine (running OS X) which didn't have the original problem. After applying the patch, all tests pass for the directory "parallel" on all 5 machines. Long doctests for the whole Sage library pass on sage.math and taurus except for previously known, unrelated, failures.
I don't know if I'll get to leif's patch.
Since this is a rollback to a previous situation, I think this is good enough for a positive review for mpatel's patch, though. Opinions?
comment:10 Changed 11 years ago by
Let the release managers decide... ;-)
comment:11 in reply to: ↑ 9 ; follow-up: ↓ 12 Changed 11 years ago by
- Reviewers set to John Palmieri
- Status changed from needs_review to positive_review
Replying to jhpalmieri:
Since this is a rollback to a previous situation, I think this is good enough for a positive review for mpatel's patch, though. Opinions?
You've done some good testing, and since the original patch was an enhancement, and didn't fix any bugs or failing doctests (right?), I think a positive review is warranted here.
comment:12 in reply to: ↑ 11 Changed 11 years ago by
Replying to ddrake:
... since the original patch was an enhancement, and didn't fix any bugs or failing doctests (right?)
Well, I pushed back in mostly fixes (and improvements) to the documentation (one might consider bugfixes, too).
comment:13 Changed 11 years ago by
Besides the above, this would also be missing entirely:
- ``reset_interface`` -- if True (the default), all the pexpect interfaces are reset in the forked off subprocesses. You definitely want this, since not doing this can lead to weird issues.
comment:14 Changed 11 years ago by
Ooops, forget my last comment: The reset is performed just unconditionally in Mitesh's version.
comment:15 Changed 11 years ago by
- Merged in set to sage-4.5.2.rc0
- Resolution set to fixed
- Status changed from positive_review to closed
A partial backout, since it retains only some of the changes from #9501, needs a new review, which currently, at least, we don't have. Given the need to press forward with the 4.5.2 release cycle, I'm merging trac_9616-backout_9501_fork_deco.patch into 4.5.2.rc0.
This may not be an ideal resolution, but it seems reasonable given the circumstances. Absolutely no offense is intended.
I've opened #9631 for re-merging #9501 after we fix the NFS/doctest problem.
For what they're worth, tests on sage.math with variations on
end with
for the default DOT_SAGE, /scratch/$USER/.sage, and /dev/shm/$USER/.sage, respectively.