Opened 11 years ago

Closed 11 years ago

#10004 closed defect (fixed)

The gap instances in a parallelised function do not always have distinct _local_tmpfile

Reported by: SimonKing Owned by: was
Priority: blocker Milestone: sage-4.6
Component: interfaces Keywords: _local_tmpfile, parallel
Cc: Merged in: sage-4.6.alpha3
Authors: Simon King Reviewers: Mitesh Patel
Report Upstream: N/A Work issues:
Branch: Commit:
Dependencies: Stopgaps:

Status badges

Description (last modified by SimonKing)

Using the @parallel decorator on a function, the Gap instances in the different branches of the function should have distinct files for I/O.

Indeed, under normal circumstances, they are distinct:

sage: @parallel
....: def f(n):
....:     return gap._local_tmpfile()
....:
sage: L = [t[1] for t in f(range(5))]
sage: len(set(L))  # this is how it should be
5

However, it suffices to call gap._local_tmpfile() in the main sage branch in order to break diversity:

sage: print gap._local_tmpfile()
/home/king/.sage//temp/gauss/16041//interface//tmp16041
sage: L = [t[1] for t in f(range(5))]
sage: len(set(L))  # this is how it must not be
1

The reason for this behaviour lies in the quit() method of Expect interfaces: If _local_tmpfile() is called then the resulting name is stored in __local_tmpfile. The @parallel decorator needs the interfaces in a clean state, so, it calls quit(). However, quit does not unset __local_tmpfile, and thus all interface instances in the different branches of the parallel computation will use the same previously stored file name.

The consequences are obvious: Having different processes use the same file for passing command lines must end in a disaster.

I hope it is OK to mark this as blocker. After all, parallelisation and interfaces are both important.

Attachments (1)

trac_10004_local_tmpfile.patch (3.4 KB) - added by SimonKing 11 years ago.
Gives any instance of an Expect interface its own temporary file

Download all attachments as: .zip

Change History (14)

comment:1 Changed 11 years ago by SimonKing

Addendum:

sage: print gap._local_tmpfile()
/home/king/.sage//temp/gauss/16746//interface//tmp16746
sage: print singular._local_tmpfile()
/home/king/.sage//temp/gauss/16746//interface//tmp16746

Gap and Singular sharing one _local_tmpfile asks for trouble, IMO.

comment:2 Changed 11 years ago by SimonKing

Sorry, I thought that the patch would solve the problem, but this was due to a misprint in my test. So, please disregard the patch.

comment:3 follow-up: Changed 11 years ago by SimonKing

Meanwhile, it seems to me that the problem is ultimately caused by setattr/getattr.

I work with the definition

    def _local_tmpfile(self):
        try:
            return self.__local_tmpfile
        except AttributeError:
            self.__local_tmpfile = '%s/tmp'%SAGE_TMP_INTERFACE + str(self.pid())
            return self.__local_tmpfile

The following is rather irritating:

  1. The attribute __local_tmpfile can be accessed from within the method, but not from outside.
    sage: gap._local_tmpfile() is gap._local_tmpfile()
    True
    sage: gap.__local_tmpfile
    ---------------------------------------------------------------------------
    AttributeError                            Traceback (most recent call last)
    ...
    AttributeError:
    
  1. When setting the attribute __local_tmpfile from outside, it can not be retrieved from inside the _local_tmpfile method:
    sage: gap.__local_tmpfile = 'foobar'
    sage: gap._local_tmpfile()
    '/home/king/.sage//temp/gauss/21283//interface//tmp21285'
    sage: gap.__local_tmpfile
    'foobar'
    

comment:4 Changed 11 years ago by SimonKing

  • Authors set to Simon King
  • Status changed from new to needs_review

Now it seems to work! With the patch, I get

  1. Caching works
      sage: gap._local_tmpfile() is gap._local_tmpfile()
      True
    
  1. Different interfaces have different tmpfiles:
      sage: gap._local_tmpfile() != singular._local_tmpfile()
      True
    
  1. Parallelisation works:
      sage: @parallel
      ....: def f(n):
      ....:     return gap._local_tmpfile()
      ....:
      sage: L = [t[1] for t in f(range(5))]
      sage: len(set(L))
      5
      sage: print gap._local_tmpfile()
      /home/king/.sage//temp/gauss/24281//interface//tmp24283
      sage: L = [t[1] for t in f(range(5))]
      sage: len(set(L))
      5 
    

The ideas for my patch are:

  1. As before, the name of the file is cached in an attribute __local_tmpfile.
  1. If __local_tmpfile is not there or is None, then a new filename is created, which depends on pid(). This is different for different interface instances, including different forks of a parallelised function.
  1. It must be ensured that the cache is emptied before forking. This is implemented in Expect.quit(): __local_tmpfile is set to None (I tested that __delattr__ did not do the job).

comment:5 Changed 11 years ago by mpatel

  • Milestone changed from sage-4.6.1 to sage-4.6

comment:6 in reply to: ↑ 3 ; follow-up: Changed 11 years ago by mpatel

Replying to SimonKing:

  1. The attribute __local_tmpfile can be accessed from within the method, but not from outside.

I think this is because of Python's name-mangling:

sage: gap._local_tmpfile()
'/scratch/mpatel/.sage/temp/sage.math.washington.edu/28400//interface//tmp28400'
sage: gap._Expect__local_tmpfile
'/scratch/mpatel/.sage/temp/sage.math.washington.edu/28400//interface//tmp28400'

comment:7 in reply to: ↑ 6 Changed 11 years ago by SimonKing

Replying to mpatel:

I think this is because of Python's name-mangling:

Thank you! I did not know about this.

Do you think that in the quit() method I should now do

self.__delattr__('_Expect__local_tmpfile')

instead of

self.__local_tmpfile=None

?

Or is testing

self.__local_tmpfile is None

quick enough, so that the patch can stay as it is?

comment:8 follow-up: Changed 11 years ago by mpatel

I think it's OK to set to and "is-compare" with None. But does

        try:
            del self.__local_tmpfile
        except AttributeError:
            pass

work in Expect.quit? Then you could simplify the try block in Expect._local_tmpfile. It seems there's a tradeoff.

Could you explain where Expect.quit is called (indirectly) in your example in the description?

comment:9 in reply to: ↑ 8 ; follow-up: Changed 11 years ago by SimonKing

  • Description modified (diff)

Replying to mpatel:

... work in Expect.quit? Then you could simplify the try block in Expect._local_tmpfile. It seems there's a tradeoff.

That was my question: Is there a tradeoff? So, now that you taught me how to deal with the double underscore attribute, I'll try to change my patch accordingly.

Could you explain where Expect.quit is called (indirectly) in your example in the description?

OK, done. quit is called in @parallel, because that decorator needs to wipe all interfaces -- but it was not rigorous enough, as it did not unset the name of the temporary file.

Changed 11 years ago by SimonKing

Gives any instance of an Expect interface its own temporary file

comment:10 Changed 11 years ago by SimonKing

The patch is now updated.

comment:11 in reply to: ↑ 9 Changed 11 years ago by mpatel

Replying to SimonKing:

Replying to mpatel:

... work in Expect.quit? Then you could simplify the try block in Expect._local_tmpfile. It seems there's a tradeoff.

That was my question: Is there a tradeoff? So, now that you taught me how to deal with the double underscore attribute, I'll try to change my patch accordingly.

I apologize for not being clear. I just meant there's a tradeoff in which method (quit or _local_tmpfile) might have to do more work. Does one typically get called much more often than the other? There may be a small performance penalty in using try-except vs. hasattr vs. None. But this probably only matters if we're making many calls to the methods.

The patch looks good and I'll try to review it soon, if someone else doesn't do it first.

Could you explain where Expect.quit is called (indirectly) in your example in the description?

OK, done. quit is called in @parallel, because that decorator needs to wipe all interfaces -- but it was not rigorous enough, as it did not unset the name of the temporary file.

Thanks! I wasn't aware of the quit call in parallel (or use_fork) itself.

comment:12 Changed 11 years ago by mpatel

  • Reviewers set to Mitesh Patel
  • Status changed from needs_review to positive_review

comment:13 Changed 11 years ago by mpatel

  • Merged in set to sage-4.6.alpha3
  • Resolution set to fixed
  • Status changed from positive_review to closed
Note: See TracTickets for help on using tickets.