Opened 10 years ago

Closed 10 years ago

#14292 closed defect (fixed)

Race conditions in doctester

Reported by: Jeroen Demeyer Owned by: David Roe
Priority: blocker Milestone: sage-5.9
Component: doctest framework Keywords:
Cc: Merged in: sage-5.9.beta4
Authors: Jeroen Demeyer Reviewers: Volker Braun
Report Upstream: N/A Work issues:
Branch: Commit:
Dependencies: Stopgaps:

Status badges

Description (last modified by Jeroen Demeyer)

The writing of the JSON file is prone to race conditions.

Attachments (1)

14292_doctest_race.patch (10.0 KB) - added by Jeroen Demeyer 10 years ago.

Download all attachments as: .zip

Change History (22)

comment:1 Changed 10 years ago by Jeroen Demeyer

Description: modified (diff)

comment:2 Changed 10 years ago by David Roe

What exactly goes wrong? Loading and saving stats is done in DocTestController, so there aren't multiple processes trying to write to the same file...

comment:3 in reply to:  2 Changed 10 years ago by Jeroen Demeyer

Replying to roed:

so there aren't multiple processes trying to write to the same file...

There could be when testing the doctester in parallel. Anyway I have a patch ready but it's on sage.math which is currently down.

comment:4 Changed 10 years ago by David Roe

Component: doctestdoctest framework
Owner: changed from Minh Van Nguyen to David Roe

comment:5 Changed 10 years ago by Jeroen Demeyer

OK, I managed to recover the data from sage.math including this patch. It's not the final version, but it shows the main idea.

comment:6 Changed 10 years ago by Jeroen Demeyer

Status: newneeds_review

comment:7 Changed 10 years ago by Jeroen Demeyer

Status: needs_reviewneeds_work

comment:8 Changed 10 years ago by Volker Braun

Its good to have a general solution to this recurring problem. But writing via a temporary file is an ugly workaround for file locking, which is what we really want to do. It is also likely to hide race conditions. Also, on Win32 (and possibly other non-posix platforms) there is no atomic replace. File locking is also useful as a simple synchronization tool for clusters with a shared file system.

comment:9 in reply to:  8 Changed 10 years ago by Jeroen Demeyer

Replying to vbraun:

But writing via a temporary file is an ugly workaround for file locking

Not always. Locking has different goals. Locks should be held as short as possible, which cannot be guaranteed (for example when writing the GAP workspace). My solution also solves the problem of exceptions/crashes during the write and it doesn't need any changes to the reading of files.

comment:10 Changed 10 years ago by Jeroen Demeyer

Status: needs_workneeds_review

comment:11 Changed 10 years ago by Volker Braun

But we really do want locking when creating the gap workspace instead of 10 processes creating the workspace 10 times and then fighting over the workspace file (last process standing wins).

comment:12 in reply to:  8 Changed 10 years ago by Jeroen Demeyer

Does Python have portable file locking commands?

comment:13 Changed 10 years ago by Volker Braun

Not natively, but e.g. https://github.com/smontanaro/pylockfile looks quite sane to me.

comment:14 Changed 10 years ago by Jeroen Demeyer

Volker, I don't object to using that, but I still think this patch here is ready for review. I would rather use file locking in addition to the current patch such that we can have all advantages of both approaches.

comment:15 Changed 10 years ago by Volker Braun

I agree that using a temp file is good enough for now. But it would be nice if we could rename write_via_temp to something that embodies the underlying idea instead of the implementation. Then we could improve the implementation later without having to change the usage. Maybe write_safely or write_threadsafe.

comment:16 in reply to:  15 Changed 10 years ago by Jeroen Demeyer

If you prefer write_safely, that's fine for me. However, I do think that using locking and using a temporary file are two different solutions which are sufficiently different to warrant different names. Given this, perhaps it is better to be explicit in the implementation.

Version 0, edited 10 years ago by Jeroen Demeyer (next)

comment:17 Changed 10 years ago by Volker Braun

Well if your goal is to eventually have both proper locking and atomic writing separately then thats fine. But the name should still convey why we are doing that, not how.

Obviously many Sage developers are not familiar with which file system operations are atomic, so they will rightfully be confused when they see code that writes a cache with write_via_temp. Clearly there is an optimization where you take out that extra step ;-)

How about atomic_write?

comment:18 Changed 10 years ago by Jeroen Demeyer

Do you agree with the patch, modulo the naming of write_via_temp?

comment:19 Changed 10 years ago by Volker Braun

Yes, I think its great to use a with block and the actual implementation seems fine. With the caveat that Windows can't do atomic replace, but then there is nothing we can do about it.

Changed 10 years ago by Jeroen Demeyer

Attachment: 14292_doctest_race.patch added

comment:20 Changed 10 years ago by Jeroen Demeyer

Reviewers: Volker Braun
Status: needs_reviewpositive_review

OK, renamed to atomic_write.

comment:21 Changed 10 years ago by Jeroen Demeyer

Merged in: sage-5.9.beta4
Resolution: fixed
Status: positive_reviewclosed
Note: See TracTickets for help on using tickets.