Opened 8 years ago

Closed 8 years ago

#14292 closed defect (fixed)

Race conditions in doctester

Reported by: jdemeyer Owned by: roed
Priority: blocker Milestone: sage-5.9
Component: doctest framework Keywords:
Cc: Merged in: sage-5.9.beta4
Authors: Jeroen Demeyer Reviewers: Volker Braun
Report Upstream: N/A Work issues:
Branch: Commit:
Dependencies: Stopgaps:

Description (last modified by jdemeyer)

The writing of the JSON file is prone to race conditions.

Attachments (1)

14292_doctest_race.patch (10.0 KB) - added by jdemeyer 8 years ago.

Download all attachments as: .zip

Change History (22)

comment:1 Changed 8 years ago by jdemeyer

  • Description modified (diff)

comment:2 follow-up: Changed 8 years ago by roed

What exactly goes wrong? Loading and saving stats is done in DocTestController, so there aren't multiple processes trying to write to the same file...

comment:3 in reply to: ↑ 2 Changed 8 years ago by jdemeyer

Replying to roed:

so there aren't multiple processes trying to write to the same file...

There could be when testing the doctester in parallel. Anyway I have a patch ready but it's on sage.math which is currently down.

comment:4 Changed 8 years ago by roed

  • Component changed from doctest to doctest framework
  • Owner changed from mvngu to roed

comment:5 Changed 8 years ago by jdemeyer

OK, I managed to recover the data from sage.math including this patch. It's not the final version, but it shows the main idea.

comment:6 Changed 8 years ago by jdemeyer

  • Status changed from new to needs_review

comment:7 Changed 8 years ago by jdemeyer

  • Status changed from needs_review to needs_work

comment:8 follow-ups: Changed 8 years ago by vbraun

Its good to have a general solution to this recurring problem. But writing via a temporary file is an ugly workaround for file locking, which is what we really want to do. It is also likely to hide race conditions. Also, on Win32 (and possibly other non-posix platforms) there is no atomic replace. File locking is also useful as a simple synchronization tool for clusters with a shared file system.

comment:9 in reply to: ↑ 8 Changed 8 years ago by jdemeyer

Replying to vbraun:

But writing via a temporary file is an ugly workaround for file locking

Not always. Locking has different goals. Locks should be held as short as possible, which cannot be guaranteed (for example when writing the GAP workspace). My solution also solves the problem of exceptions/crashes during the write and it doesn't need any changes to the reading of files.

comment:10 Changed 8 years ago by jdemeyer

  • Status changed from needs_work to needs_review

comment:11 Changed 8 years ago by vbraun

But we really do want locking when creating the gap workspace instead of 10 processes creating the workspace 10 times and then fighting over the workspace file (last process standing wins).

comment:12 in reply to: ↑ 8 Changed 8 years ago by jdemeyer

Does Python have portable file locking commands?

comment:13 Changed 8 years ago by vbraun

Not natively, but e.g. https://github.com/smontanaro/pylockfile looks quite sane to me.

comment:14 Changed 8 years ago by jdemeyer

Volker, I don't object to using that, but I still think this patch here is ready for review. I would rather use file locking in addition to the current patch such that we can have all advantages of both approaches.

comment:15 follow-up: Changed 8 years ago by vbraun

I agree that using a temp file is good enough for now. But it would be nice if we could rename write_via_temp to something that embodies the underlying idea instead of the implementation. Then we could improve the implementation later without having to change the usage. Maybe write_safely or write_threadsafe.

comment:16 in reply to: ↑ 15 Changed 8 years ago by jdemeyer

If you prefer write_safely, that's fine for me. However, I do think that using locking and using a temporary file are two different solutions which are sufficiently different to warrant different names. Given this, perhaps it is better to be explicit about the implementation.

Last edited 8 years ago by jdemeyer (previous) (diff)

comment:17 Changed 8 years ago by vbraun

Well if your goal is to eventually have both proper locking and atomic writing separately then thats fine. But the name should still convey why we are doing that, not how.

Obviously many Sage developers are not familiar with which file system operations are atomic, so they will rightfully be confused when they see code that writes a cache with write_via_temp. Clearly there is an optimization where you take out that extra step ;-)

How about atomic_write?

comment:18 Changed 8 years ago by jdemeyer

Do you agree with the patch, modulo the naming of write_via_temp?

comment:19 Changed 8 years ago by vbraun

Yes, I think its great to use a with block and the actual implementation seems fine. With the caveat that Windows can't do atomic replace, but then there is nothing we can do about it.

Changed 8 years ago by jdemeyer

comment:20 Changed 8 years ago by jdemeyer

  • Reviewers set to Volker Braun
  • Status changed from needs_review to positive_review

OK, renamed to atomic_write.

comment:21 Changed 8 years ago by jdemeyer

  • Merged in set to sage-5.9.beta4
  • Resolution set to fixed
  • Status changed from positive_review to closed
Note: See TracTickets for help on using tickets.