Opened 11 years ago

Closed 11 years ago

#9616 closed defect (fixed)

Errno 16 / NFS problems with parallel/decorate.py

Reported by: ddrake Owned by: mvngu
Priority: blocker Milestone: sage-4.5.2
Component: doctest coverage Keywords: fork nfs device resource busy
Cc: leif, jhpalmieri, kcrisman, malb, mvngu, SimonKing, was Merged in: sage-4.5.2.rc0
Authors: Mitesh Patel Reviewers: John Palmieri
Report Upstream: N/A Work issues:
Branch: Commit:
Dependencies: Stopgaps:


Description

In 4.5.2.alpha1, we have for many people:

sage -t -long "devel/sage/sage/parallel/decorate.py"        


------------------------------------------------------------
Unhandled SIGSEGV: A segmentation fault occurred in Sage.
This probably occurred because a *compiled* component
of Sage has a bug in it (typically accessing invalid memory)
or is not properly wrapped with _sig_on, _sig_off.
You might want to run Sage under gdb with 'sage -gdb' to debug this.
Sage will now terminate (sorry).
------------------------------------------------------------

**********************************************************************
File "/mnt/usb1/scratch/drake/release/tmp/sage-4.5.2.alpha1/devel/sage/sage/parallel/decorate.py", line 300:
    sage: g()
Expected:
    '10'
Got:
    [Errno 16] Device or resource busy: '/home/drake/.sage/temp/sage.math.washington.edu/30336/dir_0/.nfs0000000000591f8700069d5c'
    '10'
**********************************************************************
File "/mnt/usb1/scratch/drake/release/tmp/sage-4.5.2.alpha1/devel/sage/sage/parallel/decorate.py", line 311:
    sage: g()
Expected:
    'a'
Got:
    [Errno 16] Device or resource busy: '/home/drake/.sage/temp/sage.math.washington.edu/30336/dir_1/.nfs0000000000591f8d00069d5d'
    'a'
**********************************************************************

and so on. See https://groups.google.com/group/sage-release/msg/88b030aa31926459 and the rest of that thread.

This seems related to #9501.
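The failing paths (`dir_0/.nfs...`, `dir_1/.nfs...`) point at temporary directories that could not be removed. One plausible mechanism, stated here as an assumption rather than a diagnosis: when a file that some process still holds open is deleted on an NFS client, the client renames it to a hidden `.nfsXXXX` file instead. Deleting that file, or the directory containing it, then fails with `EBUSY`, which is errno 16 on Linux. The sketch below is a hypothetical fork-and-pickle helper, `fork_call`, and not Sage's actual `@fork` code. The child writes its result into a fresh temporary directory, and the parent reads the result and removes the directory. That removal is the step where such an error would surface.

```python
import os
import pickle
import shutil
import tempfile


def fork_call(func, *args, **kwargs):
    """Run func(*args, **kwargs) in a forked child and return its result.

    Hypothetical helper: the child pickles its return value into a file
    inside a fresh temporary directory; the parent waits for the child,
    unpickles the file, and then deletes the whole directory.
    Returns None if the child fails or dies.
    """
    tmpdir = tempfile.mkdtemp(prefix="dir_")
    result_path = os.path.join(tmpdir, "result.pickle")
    pid = os.fork()
    if pid == 0:
        # Child: compute, write the result, and exit immediately without
        # running any of the parent's cleanup handlers.
        status = 0
        try:
            with open(result_path, "wb") as f:
                pickle.dump(func(*args, **kwargs), f)
        except BaseException:
            status = 1
        finally:
            os._exit(status)
    try:
        _, wait_status = os.waitpid(pid, 0)
        if os.WIFEXITED(wait_status) and os.WEXITSTATUS(wait_status) == 0:
            with open(result_path, "rb") as f:
                return pickle.load(f)
        return None
    finally:
        # On a local filesystem this succeeds. On NFS, if some process still
        # holds a file in tmpdir open, the client keeps it as a hidden
        # ".nfsXXXX" file and this removal can raise OSError with EBUSY.
        shutil.rmtree(tmpdir)
```

For example, `fork_call(str, 10)` returns `'10'`, computed in the child and passed back through the pickle file.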

Attachments (2)

trac_9616-backout_9501_fork_deco.patch (10.1 KB) - added by mpatel 11 years ago.
Backout #9501
trac_9616-backout_only_some_of_9501.patch (4.8 KB) - added by leif 11 years ago.
Backs out only the ticket-relevant parts of #9501 (a subset of Mitesh's patch)


Change History (17)

comment:1 follow-up: Changed 11 years ago by mpatel

For what they're worth, tests on sage.math with variations on

#!/bin/bash

# This does not keep overall statistics:
# env SAGE_TEST_GLOBAL_ITER=100 ./sage -tp 1 -long /path/to/file.py

SAGE_TEST="./sage -t -long"
# Uncomment one of these to move DOT_SAGE off its default location:
#SAGE_TEST="env DOT_SAGE=/dev/shm/$USER/.sage $SAGE_TEST"
#SAGE_TEST="env DOT_SAGE=/scratch/$USER/.sage $SAGE_TEST"
RUNS=100
for I in $(seq 1 "$RUNS"); do
    # $SAGE_TEST is deliberately unquoted so it splits into command and arguments.
    $SAGE_TEST devel/sage/sage/parallel/decorate.py
    CODE[$I]=$?

    # Print "count code" for every exit code seen so far.
    echo "Results after $I of $RUNS runs:"
    echo "${CODE[*]}" | tr ' ' '\n' | sort -n | uniq -c
done

end with

Results after 100 of 100 runs:                                          
     1 0                                                                      
    99 128                                                                    
Results after 100 of 100 runs:                                       
   100 0                                                                      
Results after 100 of 100 runs:                                       
   100 0                                                                      

for the default DOT_SAGE (under the NFS-mounted home directory), /scratch/$USER/.sage, and /dev/shm/$USER/.sage, respectively.
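These runs fail only when DOT_SAGE is left at its default under the home directory, which the `.nfs...` file names suggest is NFS-mounted. Before pointing DOT_SAGE elsewhere, one might check which filesystem a candidate directory lives on. The helper below, `fs_type_for`, is hypothetical. It assumes the Linux `/proc/mounts` layout (device, mount point, type, options, ...) and picks the longest mount point that contains the path, with later entries winning ties because later mounts shadow earlier ones.

```python
import os


def fs_type_for(path, mounts_text):
    """Return the filesystem type of the mount that contains `path`.

    `mounts_text` uses the /proc/mounts layout: device, mount point,
    fs type, options, dump, pass (whitespace separated, one per line).
    Mount points with escaped characters (e.g. spaces as \\040) are not decoded.
    """
    path = os.path.abspath(path)
    best_mount, best_type = "", None
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue
        mount_point, fs_type = fields[1], fields[2]
        # A mount point contains `path` if it is "/" or a prefix of `path`
        # that ends at a path-component boundary.
        inside = (
            mount_point == "/"
            or path == mount_point
            or path.startswith(mount_point.rstrip("/") + "/")
        )
        # ">=" lets a later entry for the same mount point shadow an earlier one.
        if inside and len(mount_point) >= len(best_mount):
            best_mount, best_type = mount_point, fs_type
    return best_type
```

On Linux, `fs_type_for(os.path.expanduser("~/.sage"), open("/proc/mounts").read())` would report `nfs` or `nfs4` on a machine whose home directories are NFS-mounted.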

comment:2 follow-up: Changed 11 years ago by mpatel

  • Cc leif jhpalmieri kcrisman malb mvngu SimonKing was added

For now, should we tag the relevant tests with # not tested or back out the whole patch? Other options?

comment:3 in reply to: ↑ 1 Changed 11 years ago by mpatel

By the way, here are the latest doctesting exit codes (cf. #9243), from the top of sage-doctest:

# Return value in process exit code:
# 0: all tests passed
# 1: file not found
# 2: KeyboardInterrupt
# 4: doctest process was terminated by a signal
# 8: the doctesting framework raised an exception
# 16: script called with bad options
# 32: (used internally in sage-ptest)
# 64: time out
# 128: failed doctests
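The nonzero codes are distinct powers of two, which suggests they can be OR'ed together into a single exit status. The header does not say so, so the decoder below treats that as an assumption. `decode_exit_code` is a hypothetical helper, not part of Sage.

```python
# Meanings copied from the sage-doctest header comment above.
EXIT_CODE_MEANINGS = {
    1: "file not found",
    2: "KeyboardInterrupt",
    4: "doctest process was terminated by a signal",
    8: "the doctesting framework raised an exception",
    16: "script called with bad options",
    32: "(used internally in sage-ptest)",
    64: "time out",
    128: "failed doctests",
}


def decode_exit_code(code):
    """List the meanings of the bits set in a sage-doctest exit code.

    Assumes (unconfirmed) that the values are bit flags that may be
    combined; 0 means every test passed.
    """
    if code == 0:
        return ["all tests passed"]
    meanings = [
        text for bit, text in sorted(EXIT_CODE_MEANINGS.items()) if code & bit
    ]
    # Any bits outside the documented ones (1..128) are reported as unknown.
    unknown = code & ~sum(EXIT_CODE_MEANINGS)
    if unknown:
        meanings.append("unknown bits: %d" % unknown)
    return meanings
```

Under this reading, the 128 reported in 99 of the 100 default-DOT_SAGE runs in comment 1 decodes to `["failed doctests"]`.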

comment:4 Changed 11 years ago by leif

  • Keywords device resource busy added; segfault removed
  • Summary changed from segfault / NFS problems with parallel/decorate.py to Errno 16 / NFS problems with parallel/decorate.py

According to William on sage-release, the segfault is an intentional part of a doctest, so I've changed the ticket's title.
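Presumably the doctest crashes a function on purpose to check that a segfault in a forked child does not take down the parent Sage process; that purpose is an inference, not something stated on this ticket. A parent can tell a crash from a normal exit by inspecting the wait status. `run_in_child` and `crash` are hypothetical names, and `crash` only imitates a segfault by sending `SIGSEGV` to its own process.

```python
import os
import resource
import signal


def run_in_child(func):
    """Run func() in a forked child and report how the child ended.

    Hypothetical helper: returns ("exited", status) or ("signaled", signum),
    so a crash in the child never reaches the parent.
    """
    pid = os.fork()
    if pid == 0:
        # Child: disable core dumps so a deliberate crash leaves no core file.
        resource.setrlimit(resource.RLIMIT_CORE, (0, 0))
        try:
            func()
            os._exit(0)
        except BaseException:
            os._exit(1)
    _, status = os.waitpid(pid, 0)
    if os.WIFSIGNALED(status):
        return ("signaled", os.WTERMSIG(status))
    return ("exited", os.WEXITSTATUS(status))


def crash():
    # Restore the default action, then send SIGSEGV to this process,
    # which terminates it much like a real segmentation fault would.
    signal.signal(signal.SIGSEGV, signal.SIG_DFL)
    os.kill(os.getpid(), signal.SIGSEGV)
```

`run_in_child(crash)` returns `("signaled", signal.SIGSEGV)`, while the calling process keeps running.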

comment:5 in reply to: ↑ 2 ; follow-up: Changed 11 years ago by jhpalmieri

Replying to mpatel:

For now, should we tag the relevant tests with # not tested or back out the whole patch? Other options?

If we back out the whole patch, I have more confidence that the doctests will get fixed quickly.

Changed 11 years ago by mpatel

Backout #9501

comment:6 in reply to: ↑ 5 Changed 11 years ago by mpatel

  • Authors set to Mitesh Patel
  • Status changed from new to needs_review

Replying to jhpalmieri:

Replying to mpatel:

For now, should we tag the relevant tests with # not tested or back out the whole patch? Other options?

If we back out the whole patch, I have more confidence that the doctests will get fixed quickly.

Adapting the procedure in this comment at #9583, I've attached a patch that undoes (or should undo) all of #9501. If the patch gets a positive review, we can open a new ticket for re-merging #9501.

comment:7 follow-up: Changed 11 years ago by leif

Hmm, it's a simple procedure, of course, but in my opinion we'd back out too much...

But I can live with that. (And I'm currently too (laz|bus)y to sort out the desirable parts of the original patch.)

Changed 11 years ago by leif

Backs out only the ticket-relevant parts of #9501 (a subset of Mitesh's patch)

comment:8 in reply to: ↑ 7 Changed 11 years ago by leif

Replying to leif:

(And I'm currently too (laz|bus)y to sort out the desirable parts of the original patch.)

Couldn't resist, though (simpler than expected).

Not thoroughly tested: I only ran ./sage -t -long devel/sage/sage/parallel/ (all tests passed) and rebuilt the documentation without errors or warnings. (Ubuntu 9.04 x86_64, Core2, gcc 4.3.3)

So now two concurrent patches to review... ;-)

comment:9 follow-up: Changed 11 years ago by jhpalmieri

I've tested mpatel's patch on 5 machines: 4 on which the problem originally occurred (sage.math and skynet machines eno, iras, and taurus) and one machine (running OS X) which didn't have the original problem. After applying the patch, all tests pass for the directory "parallel" on all 5 machines. Long doctests for the whole Sage library pass on sage.math and taurus except for previously known, unrelated, failures.

I don't know if I'll get to leif's patch.

Since this is a rollback to a previous situation, I think this is good enough for a positive review for mpatel's patch, though. Opinions?

comment:10 Changed 11 years ago by leif

Let the release managers decide... ;-)

comment:11 in reply to: ↑ 9 ; follow-up: Changed 11 years ago by ddrake

  • Reviewers set to John Palmieri
  • Status changed from needs_review to positive_review

Replying to jhpalmieri:

Since this is a rollback to a previous situation, I think this is good enough for a positive review for mpatel's patch, though. Opinions?

You've done some good testing, and since the original patch was an enhancement, and didn't fix any bugs or failing doctests (right?), I think a positive review is warranted here.

comment:12 in reply to: ↑ 11 Changed 11 years ago by leif

Replying to ddrake:

... since the original patch was an enhancement, and didn't fix any bugs or failing doctests (right?)

Well, what I pushed back in are mostly fixes (and improvements) to the documentation (which one might consider bug fixes, too).

comment:13 Changed 11 years ago by leif

Besides the documentation changes mentioned above, the full backout would also drop this entirely:

       - ``reset_interface`` -- if True (the default), all the 
         pexpect interfaces are reset in the forked off 
         subprocesses.  You definitely want this, since not doing 
         this can lead to weird issues.

comment:14 Changed 11 years ago by leif

Oops, forget my last comment: Mitesh's version simply performs the reset unconditionally.

comment:15 Changed 11 years ago by mpatel

  • Merged in set to sage-4.5.2.rc0
  • Resolution set to fixed
  • Status changed from positive_review to closed

A partial backout, since it retains only some of the changes from #9501, needs a new review, which currently, at least, we don't have. Given the need to press forward with the 4.5.2 release cycle, I'm merging trac_9616-backout_9501_fork_deco.patch into 4.5.2.rc0.

This may not be an ideal resolution, but it seems reasonable given the circumstances. Absolutely no offense is intended.

I've opened #9631 for re-merging #9501 after we fix the NFS/doctest problem.
