Opened 3 years ago

Closed 3 years ago

#25107 closed defect (fixed)

Ignored OSErrors in test output on Cygwin

Reported by: embray Owned by:
Priority: blocker Milestone: sage-8.2
Component: porting: Cygwin Keywords:
Cc: Merged in:
Authors: Erik Bray Reviewers: Jeroen Demeyer
Report Upstream: N/A Work issues:
Branch: 85576e9 (Commits, GitHub, GitLab) Commit: 85576e961271144e698c07af810c4d3da465c137
Dependencies: Stopgaps:

Status badges

Description

When running the test suite on Cygwin, whether a test passes or fails the test log is full (for each test module of output like):

sage -t --long src/sage/interfaces/gap.py
Exception OSError: (2, 'No such file or directory', '/tmp/tmpwVkVtd') in <bound method _TemporaryFileWrapper.__del__ of <closed file '<fdopen>', mode 'w+b' at 0x6f8032f1e40>> ignored
    [224 tests, 49.25 s]

in particular this of course causes tests of the doctest runner itself to fail, and it's also making all the patchbot runs fail.

Change History (19)

comment:1 Changed 3 years ago by embray

I suspect this probably started with #24343 somehow.

comment:2 Changed 3 years ago by jdemeyer

Obvious question: is it a Sage or Python bug? I don't have a high opinion of Python's tempfile module, it had caused problems in the past.

comment:3 Changed 3 years ago by embray

It's kind of a Python bug. I've actually been bitten by this issue before, but before #24343 it was more random and somewhat uncommon. The changes in #24343 (e.g. explicitly closing files more often) made the problem guaranteed to occur.

comment:4 Changed 3 years ago by embray

  • Authors set to Erik Bray
  • Branch set to u/embray/cygwin/temp-file-wrapper-del
  • Commit set to 75720fd44a14b7da594be41a411eb901d0ba4ff9
  • Status changed from new to needs_review

This should resolve the issue for now--considering that this is a blocker AFAIC this workaround should be good enough for now. In the meantime I would like to continue to rethink this code and how it handles files.


New commits:

f0d297cUpdate this version check to support newer JRE versions (>=1.9)
db5ab23In the Jmol interface, support image outputs that contain spaces in their paths
75720fdFix overly aggressive deleting of output temp file on Cygwin

comment:5 Changed 3 years ago by git

  • Commit changed from 75720fd44a14b7da594be41a411eb901d0ba4ff9 to 03d4f0f074f6491f35abe580d8833779c7fa0089

Branch pushed to git repo; I updated commit sha1. This was a forced push. New commits:

03d4f0fFix overly aggressive deleting of output temp file on Cygwin

comment:6 Changed 3 years ago by embray

Oops, rebased on correct branch.

comment:7 Changed 3 years ago by vbraun

Can you wrap it in a if-cygwin branch instead of making assumptions about which temp file implementation is used?

comment:8 Changed 3 years ago by embray

It's not making assumptions--as written it's making a feature check that isn't necessarily platform specific, which is usually preferable.

comment:9 Changed 3 years ago by jdemeyer

Do we really need to close the file there? Maybe a different and simpler solution would be to just remove that self.outtmpfile.close().

comment:10 Changed 3 years ago by embray

I alluded to this in 3 but the file will be deleted when the object is closed anyways, whether that close is explicit, or comes sometime later when the object goes out of scope and its __del__ is called. Explicitly closing ensures things happen predictably, and not doing so is not a solution to the actual problem.

Anyways, that was added for avoiding ResourceWarnings on Python 3.

comment:11 Changed 3 years ago by jdemeyer

OK, I think I understand the reasons now, but it wasn't obvious from reading the code (including the comments). So I agree with the fix, but the comment should be improved. I'll think about that tomorrow.

comment:12 follow-up: Changed 3 years ago by jdemeyer

The real problem is that the file is closed twice: once in the master process and once in the child. It can only be deleted once.

comment:13 in reply to: ↑ 12 Changed 3 years ago by embray

Replying to jdemeyer:

The real problem is that the file is closed twice: once in the master process and once in the child. It can only be deleted once.

Right, that's whole point. I more or less wrote that in the comment but I guess it should have been clearer...

comment:14 Changed 3 years ago by embray

Perhaps to give slightly more detail as well, regarding the usage/implementation of the tempfile module (which I agree with you is a bit crummy):

In Sage here we are using the TemporaryFile factory (which despite the CamelCase is actually function). This basically uses mkstemp to create a unique filename, then returns a file-like object around the file descriptor and immediately calls os.unlink on the filename (effectively deleting the file as soon as it's no longer needed.

However, there is also NamedTemporaryFile which does not immediately unlink the name, so that the file can have a name that can be used in other processes, for example. In this case it's up to the user to delete the file, although it has an auto-delete functionality delete=True enabled by default in fact, that deletes the file after it is closed in __del__. Unfortunately the implementation is a bit buggy, I think, since there's an unhandled exception if the file was already deleted when it gets to this point :(

The problem arises here because the tempfile quietly replaces TemporaryFile with NamedTemporaryFile on non-POSIX systems and on Cygwin, since the ability to delete a file while it has an open file descriptor is not guaranteed on other platforms. This is actually annoying for a couple reasons:

1) NamedTemporaryFile and TemporaryFile have different APIs, and this switch is done rather implicitly. You also get the aforementioned buggy behavior of NamedTemporaryFile when it comes to double-deletes.

2) At least on Cygwin this is unnecessary. It might have been a concession to an older version of Cygwin (this hack seems to have been in place for a while). But it's not necessary at all on more modern versions of Cygwin which do allow unlinking open files.

So the problem is that on Cygwin the TemporaryFile gets replaced with a NamedTemporaryFile with delete=True (which can be detected through presence of the delete attribute, though an explicit isinstance check could also be possible). And, as Jeroen wrote, this means that because that NamedTemporaryFile is instantiated before forking, it exists in two processes, and whichever one happens to close the file first deletes it. Then the process that closes the file second produces this annoying unhandled exception in __del__ message.

This has actually been a problem for a long time, and I've seen it show up in the tests before rather randomly. But it was relatively uncommon since most of the time the file was closed first in the parent process and we didn't see the message from the child process.

comment:15 Changed 3 years ago by embray

Also, while I'm calling the behavior of NamedTemporaryFile "buggy" with respect to double deletions, it may in fact be intentional, as this could point toward a programming error of some kind (which I believe is in fact the case here).

comment:16 Changed 3 years ago by git

  • Commit changed from 03d4f0f074f6491f35abe580d8833779c7fa0089 to 85576e961271144e698c07af810c4d3da465c137

Branch pushed to git repo; I updated commit sha1. New commits:

85576e9add a more detailed comment

comment:17 Changed 3 years ago by embray

I improved (I think) the comment and also linked to my comment above for more detail. I don't think it's such a deep problem though (albeit annoying)...


New commits:

85576e9add a more detailed comment

comment:18 Changed 3 years ago by jdemeyer

  • Reviewers set to Jeroen Demeyer
  • Status changed from needs_review to positive_review

comment:19 Changed 3 years ago by vbraun

  • Branch changed from u/embray/cygwin/temp-file-wrapper-del to 85576e961271144e698c07af810c4d3da465c137
  • Resolution set to fixed
  • Status changed from positive_review to closed
Note: See TracTickets for help on using tickets.