Opened 6 years ago

Closed 6 years ago

#17814 closed defect (fixed)

Make calling a cached method independent of source code inspection

Reported by: SimonKing Owned by:
Priority: major Milestone: sage-6.6
Component: distribution Keywords:
Cc: Merged in:
Authors: Simon King Reviewers: Volker Braun
Report Upstream: N/A Work issues:
Branch: 0e8ef00 (Commits) Commit: 0e8ef000e434b654c76299b2ea694ca15be965a3
Dependencies: Stopgaps:

Description (last modified by SimonKing)

If a pyx file in the sage library uses cached methods and then the source file is removed after compiling Sage, then accessing the cached methods in this file becomes impossible. Example: Move the file src/sage/rings/finite_rings/finite_field_base.pyx away. Then:

sage: K=GF(5)
sage: K.factored_order
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-2-488672e9c19b> in <module>()
----> 1 K.factored_order()

/home/king/Sage/git/sage/src/sage/structure/parent.pyx in sage.structure.parent.Parent.__getattr__ (build/cythonized/sage/structure/parent.c:7863)()
    838             return self.__cached_methods[name]
    839         except KeyError:
--> 840             attr = getattr_from_other_class(self, self._category.parent_class, name)
    841             self.__cached_methods[name] = attr
    842             return attr

/home/king/Sage/git/sage/src/sage/structure/misc.pyx in sage.structure.misc.getattr_from_other_class (build/cythonized/sage/structure/misc.c:1582)()
    249         dummy_error_message.cls = type(self)
    250         dummy_error_message.name = name
--> 251         raise dummy_attribute_error
    252     try:
    253         attribute = getattr(cls, name)

AttributeError: 'FiniteField_prime_modn_with_category' object has no attribute 'factored_order'

The same does work in attached pyx files, even after removing the temporary copy of the pyx file. Conjecture: The cached method's __get__ method relies on source code inspection. That should of course not be the case. Source code inspection should only come in play if the user wants to see the source code or at least the doc string.

I chose the component "distribution", since it was reported on sage-devel as something that happened on certain sage distributions that do not provide sage sources.

Change History (74)

comment:1 Changed 6 years ago by SimonKing

  • Component changed from PLEASE CHANGE to distribution
  • Description modified (diff)

comment:2 Changed 6 years ago by SimonKing

  • Type changed from PLEASE CHANGE to defect

comment:3 Changed 6 years ago by SimonKing

Here one can see how introspection enters:

sage: K=GF(5)
sage: C=type(K)
sage: C.factored_order
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-4-d955bd892332> in <module>()
----> 1 C.factored_order

/home/king/Sage/git/sage/src/sage/misc/cachefunc.pyx in sage.misc.cachefunc.CachedMethod.__get__ (build/cythonized/sage/misc/cachefunc.c:15452)()
   2673         f = (<CachedFunction>self._cachedfunc).f
   2674         if self.nargs==0:
-> 2675             args, varargs, keywords, defaults = sage_getargspec(f)                                                                                                        
   2676             if varargs is None and keywords is None and len(args)<=1:                                                                                                     
   2677                 self.nargs = 1

/home/king/Sage/git/sage/local/lib/python2.7/site-packages/sage/misc/sageinspect.pyc in sage_getargspec(obj)
   1394         except TypeError: # arg is not a code object
   1395         # The above "hopefully" was wishful thinking:
-> 1396             return inspect.ArgSpec(*_sage_getargspec_cython(sage_getsource(obj)))
   1397             #return _sage_getargspec_from_ast(sage_getsource(obj))
   1398     try:

/home/king/Sage/git/sage/local/lib/python2.7/site-packages/sage/misc/sageinspect.pyc in _sage_getargspec_cython(source)
   1006 
   1007     """
-> 1008     defpos = source.find('def ')
   1009     assert defpos > -1, "The given source does not contain 'def'"
   1010     s = source[defpos:].strip()

AttributeError: 'NoneType' object has no attribute 'find'

So: cached_method tries to determine the number of arguments. This is because there is a special implementation for functions with no argument (except self).

In other words, the wrong behaviour could be fixed by finding a way to get the number of arguments (or better the argspec) of a cython function without looking at its source code. I'll see if that's possible. So, the fix will likely be in sage.misc.sageinspect, not in sage.misc.cachefunc.

comment:4 follow-up: Changed 6 years ago by SimonKing

Fixing sageinspect won't be easy. I currently have absolutely no clue how to read off the number of arguments (or even the names of the arguments) of a function that is defined in a Cython file. Those functions don't seem to have any attributes holding useful information.

Has someone else an idea? If not, then I don't see what we can do. We do want a special cached method implementation for methods without arguments, and thus we need to determine the number of arguments of the to-be-wrapped method. If that doesn't work without reading the sources, what could we possibly do?

comment:5 in reply to: ↑ 4 Changed 6 years ago by jdemeyer

Replying to SimonKing:

Fixing sageinspect won't be easy. I currently have absolutely no clue how to read off the number of arguments (or even the names of the arguments)

Do you really require the number and names of arguments or do you only want to know whether a function takes any *args and/or **kwds at all (apart from the self argument)?

Technically, Python always uses *args and **kwds to pass arguments (except for some special methods like __new__). It's the Cython method implementation which interprets these *args and **kwds.

Last edited 6 years ago by jdemeyer (previous) (diff)

comment:6 follow-up: Changed 6 years ago by jdemeyer

The question is also: if there is a method

def f(self, arg=1)

do we require that obj.f() and obj.f(1) and obj.f(arg=1) all use the same cache entry? If you don't require that, there might be a solution by working on the level of *args and **kwds.

Last edited 6 years ago by jdemeyer (previous) (diff)

comment:7 in reply to: ↑ 6 ; follow-ups: Changed 6 years ago by SimonKing

Replying to jdemeyer:

The question is also: if there is a method

def f(self, arg=1)

do we require that obj.f() and obj.f(1) and obj.f(arg=1) all use the same cache entry? If you don't require that, there might be a solution by working on the level of *args and **kwds.

Yes, we do require that. Equivalent inputs should result in identical output.

The problem occurs when the cached method is bound to an instance. At that point, we want to know if the method accepts any argument in addition to self. If it does, then a CachedMethodCaller is set as an attribute of the instance. If it does not, then a CachedMethodCallerNoArgs is bound to the method. If sage can't answer the question, then an AttributeError is raised. We don't need to know all argument names and all defaults, but we do need to know their number.

comment:8 in reply to: ↑ 7 ; follow-up: Changed 6 years ago by jdemeyer

The two statements below seem like a contraction: using the example above, if you want to ensure that obj.f() and obj.f(1) and obj.f(arg=1) use the same cache entry, you do need to know the names and defaults of the arguments.

Replying to SimonKing:

Yes, we do require that. Equivalent inputs should result in identical output.

We don't need to know all argument names and all defaults, but we do need to know their number.

comment:9 in reply to: ↑ 7 Changed 6 years ago by jdemeyer

Replying to SimonKing:

we want to know if the method accepts any argument in addition to self.

That's possible using the Python API function PyCFunction_GetFlags. When applied to a bound method, it returns the flags as defined in https://docs.python.org/2/c-api/structures.html#c.PyMethodDef

comment:10 in reply to: ↑ 8 ; follow-ups: Changed 6 years ago by SimonKing

Replying to jdemeyer:

The two statements below seem like a contraction: using the example above, if you want to ensure that obj.f() and obj.f(1) and obj.f(arg=1) use the same cache entry, you do need to know the names and defaults of the arguments.

For calling the function, you of course need it (so, I admit that the title of this ticket is misleading). However, for determining whether it ought to be a CachedMethodCaller or a CachedMethodCallerNoArgs, it suffices to know a little less.

And thank you for the link to PyCFunction_GetFlags. So, it seems that sage.misc.sageinspect should get a little addition sage.misc.sage_inspect_cython written in Cython that provides helpers for inspection of Cython methods.

comment:11 in reply to: ↑ 10 Changed 6 years ago by jdemeyer

Replying to SimonKing:

For calling the function, you of course need it (so, I admit that the title of this ticket is misleading). However, for determining whether it ought to be a CachedMethodCaller or a CachedMethodCallerNoArgs, it suffices to know a little less.

Sure, but that doesn't solve the problem really (unless you want to fix the issue of this ticket only for methods taking no arguments). You still need to call the function eventually.

comment:12 follow-up: Changed 6 years ago by jdemeyer

Replying to SimonKing:

Replying to jdemeyer:

The question is also: if there is a method

def f(self, arg=1)

do we require that obj.f() and obj.f(1) and obj.f(arg=1) all use the same cache entry? If you don't require that, there might be a solution by working on the level of *args and **kwds.

Yes, we do require that.

Is that really so important? Does it happen a lot in practice that people or the Sage library call a function in different but equivalent ways?

comment:13 in reply to: ↑ 12 Changed 6 years ago by SimonKing

Replying to jdemeyer:

Replying to SimonKing:

Yes, we do require that.

Is that really so important? Does it happen a lot in practice that people or the Sage library call a function in different but equivalent ways?

I believe it is important, and I think it is specified somewhere in the docs of sage.misc.cachefunc. There are places where it is used for UniqueRepresentation. Indeed, if you have a unique parent that depends on arguments which (partially) have a default, then you simply don't know if the user will provide these arguments explicitly or implicitly. But when pickling/unpickling it, the arguments will be provided in a uniform way (always explicitly, if I recall correctly).

Hence, if we changed that, we would break uniqueness of UniqueRepresentation under pickling.

comment:14 Changed 6 years ago by jdemeyer

I see. This is important where the semantics of cached_function are really important.

I always had in mind only the application of speeding up the function by not computing the same thing multiple times. For the latter application, you need to consider also the overhead of parsing *args and **kwds to compute the correct cache entry. Where speed is important, we might just use *args and **kwds instead.

comment:15 Changed 6 years ago by jdemeyer

Proposal: have 2 different implementations of cached_function:

  • one for Python methods, which works just like the current implementation.
  • one for Cython methods, which works on the level of *args and **kwds.

comment:16 Changed 6 years ago by jdemeyer

This proposal would solve the issue on the ticket. Specializing for Cython, it might also be possible to speed-up things more.

comment:17 in reply to: ↑ 10 ; follow-up: Changed 6 years ago by SimonKing

Replying to SimonKing:

And thank you for the link to PyCFunction_GetFlags. So, it seems that sage.misc.sageinspect should get a little addition sage.misc.sage_inspect_cython written in Cython that provides helpers for inspection of Cython methods.

It is puzzling. When I call PyCFunction_GetFlags on the function/method being wrapped, no crash occurs in that function. However, importing lazy_attribute then fails, and so Sage won't start. No idea how the two things (a function that gives me information on other functions without raising an error, and an import statement) could possibly interfere.

comment:18 in reply to: ↑ 17 Changed 6 years ago by jdemeyer

Replying to SimonKing:

Replying to SimonKing:

And thank you for the link to PyCFunction_GetFlags. So, it seems that sage.misc.sageinspect should get a little addition sage.misc.sage_inspect_cython written in Cython that provides helpers for inspection of Cython methods.

It is puzzling. When I call PyCFunction_GetFlags on the function/method being wrapped, no crash occurs in that function. However, importing lazy_attribute then fails, and so Sage won't start. No idea how the two things (a function that gives me information on other functions without raising an error, and an import statement) could possibly interfere.

It's hard to tell without seeing the code. Are you sure the PyCFunction_GetFlags call is what creates the problem and not for example a changed import or the mere fact of accessing obj.method?

Anyway, I'm still very curious what you think about 11 and 15

Last edited 6 years ago by jdemeyer (previous) (diff)

comment:19 follow-up: Changed 6 years ago by SimonKing

The proposal from comment:15 would of course solve the problem of the introspection that is needed to wrap a cython function in a cached method. However, it would mean that the specification of uniqueness would be violated for those cython functions for which the argument spectrum is not available.

I'll try to show you a diff file shortly, so that you can see how calling PyCFunction_GetFlags makes the import of lazy_attribute fail. Currently, I am rebuilding Sage, so, it may take a while.

comment:20 in reply to: ↑ 19 Changed 6 years ago by jdemeyer

Replying to SimonKing:

The proposal from comment:15 would of course solve the problem of the introspection that is needed to wrap a cython function in a cached method. However, it would mean that the specification of uniqueness would be violated for those cython functions for which the argument spectrum is not available.

True, but I think the uniqueness is only important for certain very specific functions. In any case, if we want to solve this ticket, I think it's the only way to go.

comment:21 Changed 6 years ago by SimonKing

Try this:

  • src/sage/misc/cachefunc.pyx

    diff --git a/src/sage/misc/cachefunc.pyx b/src/sage/misc/cachefunc.pyx
    index a4276b9..515c74a 100644
    a b import sage.misc.weak_dict 
    481481from sage.misc.weak_dict import WeakValueDictionary
    482482from sage.misc.decorators import decorator_keywords
    483483
     484from cpython cimport PyObject
     485
     486cdef extern from "methodobject.h":
     487    cdef int PyCFunction_GetFlags(PyObject *op)
     488
     489
    484490cdef frozenset special_method_names = frozenset(['__abs__', '__add__',
    485491            '__and__', '__call__', '__cmp__', '__coerce__', '__complex__', '__contains__', '__del__',
    486492            '__delattr__', '__delete__', '__delitem__', '__delslice__', '__dir__', '__div__',
    cdef class CachedMethod(object): 
    26722678        # we need to analyse the argspec
    26732679        f = (<CachedFunction>self._cachedfunc).f
    26742680        if self.nargs==0:
     2681            bla = PyCFunction_GetFlags(<PyObject *>f)
    26752682            args, varargs, keywords, defaults = sage_getargspec(f)
    26762683            if varargs is None and keywords is None and len(args)<=1:
    26772684                self.nargs = 1

According to the crash report that I get, the error occurs in the line args, varargs, keywords, defaults = sage_getargspec(f), thus, after PyCFunction_GetFlags has returned something.

comment:22 Changed 6 years ago by jdemeyer

Replace

cdef int PyCFunction_GetFlags(PyObject *op)

by

cdef int PyCFunction_GetFlags(object op) except -1

comment:23 follow-up: Changed 6 years ago by jdemeyer

With the correct except declaration, there is an error if the argument is not a "built-in" (Cython) method:

SystemError: Objects/methodobject.c:64: bad argument to internal function

comment:24 in reply to: ↑ 23 Changed 6 years ago by SimonKing

Replying to jdemeyer:

With the correct except declaration, there is an error if the argument is not a "built-in" (Cython) method:

SystemError: Objects/methodobject.c:64: bad argument to internal function

OK, we then have an error whose traceback points to calling the function.

comment:25 follow-up: Changed 6 years ago by SimonKing

Apparently PyCFunction_GetFlags is the wrong tool (or we first need to extract the cfunction from the method), since always an error is raised.

comment:26 in reply to: ↑ 25 ; follow-up: Changed 6 years ago by jdemeyer

Replying to SimonKing:

Apparently PyCFunction_GetFlags is the wrong tool (or we first need to extract the cfunction from the method), since always an error is raised.

It is the right tool for bound Cython methods.

comment:27 Changed 6 years ago by jdemeyer

I am currently testing something else, but for example

sage: K = GF(5)
sage: m = K.factored_unit_order
sage: m
<built-in method factored_unit_order of FiniteField_prime_modn_with_category object at 0x7f357853c1e0>

This m object should be suitable as argument for PyCFunction_GetFlags.

comment:28 in reply to: ↑ 26 Changed 6 years ago by SimonKing

Replying to jdemeyer:

Replying to SimonKing:

Apparently PyCFunction_GetFlags is the wrong tool (or we first need to extract the cfunction from the method), since always an error is raised.

It is the right tool for bound Cython methods.

Exactly. And what is wrapped by a cached method is not a bound method, but an unbound method (or even a function that is not a method at all?). Would it make sense to take the time to create a bound copy, after making sure that it is not a Python thingy?

comment:29 Changed 6 years ago by SimonKing

  • Branch set to u/SimonKing/make_calling_a_cached_method_independent_of_source_code_inspection

comment:30 follow-up: Changed 6 years ago by SimonKing

  • Authors set to Simon King
  • Branch u/SimonKing/make_calling_a_cached_method_independent_of_source_code_inspection deleted
  • Status changed from new to needs_review

I have attached a branch. If you have a cached method without arguments in a cython file and remove the sources before starting sage, then both accessing and calling the method should work. If you have a cached method with arguments in a cython file and remove the sources before starting sage, then accessing the method works, but calling would still not work. Methods in Python files should not be affected.

I don't know how to write a meaningful doctest for it (and also I didn't run the testsuite). Anyway, perhaps you can comment whether you think that it is a solution to the problem, or at least a step in the right direction.

comment:31 Changed 6 years ago by SimonKing

  • Branch set to u/SimonKing/make_calling_a_cached_method_independent_of_source_code_inspection
  • Commit set to 30543e7b283c52cad2fbeb060efe78a74c7b42ba

Why has the branch field automatically been deleted?

comment:32 in reply to: ↑ 30 ; follow-up: Changed 6 years ago by jdemeyer

Replying to SimonKing:

Anyway, perhaps you can comment whether you think that it is a solution to the problem

Given that you want to call the method also, no I don't think it's a solution.

Personally, I still think that 15 is the best solution (but I'm still open to other suggestions).

comment:33 in reply to: ↑ 32 ; follow-ups: Changed 6 years ago by nbruin

Replying to jdemeyer:

Personally, I still think that 15 is the best solution (but I'm still open to other suggestions).

This information is cheaply available when cython does its work, We could ask the cython devs if they would be willing to equip their function objects with attributes that play the role of __code__.co_argcount, __code__.co_flags (to see if *args or **kwds is present), __code__.varnames, func_defaults.

Cython writes the code that splits up the *args and **kwargs that python calls at C-API get, and generates errors if it doesn't fit some internal pattern. For inspection and dynamical code, the required pattern information should really be made available.

They could either write it into the docstring (as they already do with source file information) or they could put some attributes on their cython function objects.

Example:

sage: cython("""def f(a,b=0): pass""")
sage: f(a=1)
sage: f(b=1)
TypeError: f() takes at least 1 positional argument (0 given)
sage: f(1,c=1)
TypeError: f() got an unexpected keyword argument 'c'

I think it's quite reasonable if cython would make available what the number of positional arguments can be and which keyword arguments are accepted.

comment:34 in reply to: ↑ 33 Changed 6 years ago by SimonKing

Replying to nbruin:

Replying to jdemeyer:

Personally, I still think that 15 is the best solution (but I'm still open to other suggestions).

This information is cheaply available when cython does its work, We could ask the cython devs if they would be willing to equip their function objects with attributes that play the role of __code__.co_argcount, __code__.co_flags (to see if *args or **kwds is present), __code__.varnames, func_defaults.

If I recall correctly, there is a compile option to do so.

In any case, I believe that for now we should proceed as follows:

  • Avoid source code inspection in the cached method's __get__ (as done by the currently attached branch)
  • When creating the ArgumentFixer, we need to know the argspec. If it can not be determined, then no error should be raised. Instead, a generic argument fixer (or no argument fixer at all) shall be used, meaning that the argspec is assumed to be formed by self, *args and **kwds. This I can implement shortly, as soon as I have reinstalled ccache on my current Sage version (now it takes hours to switch branches :-/).
  • In the long run, ask the cython devs to provide functions that can access the relevant information.

comment:35 in reply to: ↑ 33 Changed 6 years ago by jdemeyer

Replying to nbruin:

They could either write it into the docstring (as they already do with source file information) or they could put some attributes on their cython function objects.

I think the docstring is the only option: look at the fields of PyMethodDef

Even then, I would propose to use just *args and **kwds for Cython methods, which will surely be faster than messing with the list of arguments. In the case where uniquess really matters for Cython methods, a wrapper could be added to fix the arguments.

comment:36 follow-up: Changed 6 years ago by SimonKing

Now I am totally puzzled. There are the Cython modules sage.misc.function_mangling and sage.misc.cachefunc, but they don't appear in src/module_list.py! So, why are they built at all? Is that a recent change? E.g.., is it not needed anymore to mention a .pyx file in module_list.py if it has no .pxd and no compile options?

comment:37 in reply to: ↑ 36 ; follow-up: Changed 6 years ago by jdemeyer

Replying to SimonKing:

Now I am totally puzzled. There are the Cython modules sage.misc.function_mangling and sage.misc.cachefunc, but they don't appear in src/module_list.py! So, why are they built at all? Is that a recent change? E.g.., is it not needed anymore to mention a .pyx file in module_list.py if it has no .pxd and no compile options?

See #17767

comment:38 follow-ups: Changed 6 years ago by SimonKing

Soon I will submit another commit. With it, I can do the following:

I created a file src/sage/misc/cache_test.pyx and rebuilt Sage. In fact, the new pyx file got compiled even without mentioning it in module_list.py. Content of the file:

from sage.misc.cachefunc import cached_method, cached_function
from sage.structure.parent cimport Parent

@cached_function
def bla(x,y,z=2):
    return (x+y)*z

cdef class Foo(Parent):
    @cached_method
    def bar(self, a,b, c=1):
        return (a+b)*c

I started a Sage session and got:

sage: from sage.misc.cache_test import Foo, bla
sage: bla(2,3,2) is bla(2,3,z=2)
True
sage: F = Foo()
sage: F.bar(2,3,2) is F.bar(2,3,c=2)
True

as expected.

Then I left Sage, removed src/sage/misc/cache_test.pyx. Restarting (but not recompiling!) Sage, I now get:

sage: from sage.misc.cache_test import Foo, bla
sage: bla(2,3,2) is bla(2,3,z=2)
False
sage: bla(2,3,2) == bla(2,3,z=2)
True
sage: F = Foo()
sage: F.bar(2,3,2) is F.bar(2,3,c=2)
False
sage: F.bar(2,3,2) == F.bar(2,3,c=2)
True
sage: bla(2,3,2) is bla(2,3,2)
True
sage: F.bar(2,3,2) is F.bar(2,3,2)
True

So, the argument mangling fails, but at least one can call the wrapped functions and methods in the expected way, with a basic level of caching.

Now the questions are: Would that be enough to declare the problem as "fixed for now"? And how to doctest it? Is it possible/reasonable to have a source file that is temporarily moved away by its own doctest?

Branch will soon be updated.

comment:39 Changed 6 years ago by git

  • Commit changed from 30543e7b283c52cad2fbeb060efe78a74c7b42ba to a0b829dc10b4ccb00b99e073de8a1c0006aa1e51

Branch pushed to git repo; I updated commit sha1. New commits:

a0b829dMake a cached function/method work with default argspec when introspection fails

comment:40 in reply to: ↑ 37 Changed 6 years ago by SimonKing

Replying to jdemeyer:

See #17767

Special-casing for various modules? Ouch, I don't like that.

comment:41 follow-up: Changed 6 years ago by nbruin

If we need to access argument information on cython functions/methods, we might consider using

cython("""
import cython
@cython.binding(True)
def f(a,b=0): pass

def g(a,b=0): pass

def c(F,n_in):
  cdef int n=n_in
  for i in range(n):
    F(1)
""")

With this we have

sage: f.func_defaults
(0,)
sage: f.__code__.co_varnames
('a', 'b')

There might be a penalty to pay, though:

sage: %time c(f,1000000000)
CPU times: user 7.7 s, sys: 10 ms, total: 7.71 s
Wall time: 8.06 s
sage: %time c(g,1000000000)
CPU times: user 7.43 s, sys: 13 ms, total: 7.45 s
Wall time: 7.78 s

If we can do away with source file inspection during startup (as would now happen for cython modules), we should see a significant improvement of startup time too! Finding line numbers in a text file is not a fast operation.

comment:42 in reply to: ↑ 38 ; follow-up: Changed 6 years ago by jdemeyer

Replying to SimonKing:

Is it possible/reasonable to have a source file that is temporarily moved away by its own doctest?

Certainly not :-)

comment:43 in reply to: ↑ 41 ; follow-up: Changed 6 years ago by jdemeyer

Replying to nbruin:

If we need to access argument information on cython functions/methods, we might consider using

cython("""
import cython
@cython.binding(True)
def f(a,b=0): pass

def g(a,b=0): pass

def c(F,n_in):
  cdef int n=n_in
  for i in range(n):
    F(1)
""")

I think this trick only works with module-level functions, not method (but I need to recheck).

comment:44 in reply to: ↑ 42 ; follow-up: Changed 6 years ago by SimonKing

Replying to jdemeyer:

Replying to SimonKing:

Is it possible/reasonable to have a source file that is temporarily moved away by its own doctest?

Certainly not :-)

Why not? The test itself is obtained from a temporary copy of the file, if I recall correctly. One might think of writing a test that renames the original source file, then does the test, and moves the source file back. Of course, it will only work if the user has write permission.

comment:45 in reply to: ↑ 38 Changed 6 years ago by jdemeyer

Replying to SimonKing:

So, the argument mangling fails, but at least one can call the wrapped functions and methods in the expected way, with a basic level of caching.

Honestly, I absolutely don't like it. It's much better if stuff obviously breaks (i.e. the current situation) instead of breaking in a very subtle way (some method relies on uniquess, which is not guaranteed if the source file is missing).

If this issue is fixed, it should be fixed in such a way that it doesn't matter at all whether the source file is present.

comment:46 in reply to: ↑ 44 Changed 6 years ago by jdemeyer

Replying to SimonKing:

The test itself is obtained from a temporary copy of the file, if I recall correctly.

That's no longer true (but the test is read before it is executed, so that's not the issue).

One might think of writing a test that renames the original source file, then does the test, and moves the source file back. Of course, it will only work if the user has write permission.

Doctests should never write anything below SAGE_ROOT and your proposal will break badly if the test is interrupted at the wrong moment (you could try something with try/finally but still it's dangerous). Also, you will get race conditions (imagine you're editing the file while the test is running).

comment:47 in reply to: ↑ 43 ; follow-up: Changed 6 years ago by nbruin

Replying to jdemeyer:

I think this trick only works with module-level functions, not method (but I need to recheck).

It does seem to work for class in cython, but sadly not for cdef class:

cython("""
my_global=[]
def mydec(a):
  global my_global
  my_global.append(a)
  return(a)

import cython
class A(object):
  @mydec
  @cython.binding(True)
  def b(x,y=0):
    pass

cdef class B(object):
  @mydec
  @cython.binding(True)
  def b(x,y=0):
    pass
""")
sage: my_global
[<cyfunction A.b at 0x7f1b0e947590>,
 <method 'b' of '_home_nbruin__sage_temp_art_17765_tmp_DC65IM_spyx_0.B' objects>]

The first object does have the func_* attributes, but the second doesn't. Perhaps that is something that the cython devs can do something about.

comment:48 in reply to: ↑ 47 Changed 6 years ago by jdemeyer

Replying to nbruin:

The first object does have the func_* attributes, but the second doesn't. Perhaps that is something that the cython devs can do something about.

Not without a performance penalty. If you want the method to be a fast PyCFunction, you are limited in which attributes it has.

comment:49 follow-up: Changed 6 years ago by jdemeyer

In other words, I think it needs a Python patch first to add the hooks which then Cython could use.

comment:50 in reply to: ↑ 49 ; follow-up: Changed 6 years ago by nbruin

Replying to jdemeyer:

In other words, I think it needs a Python patch first to add the hooks which then Cython could use.

well ... I think the more reasonable solution is to write the signature in the docstring, which is available. That's what cython already does to make source location available.

It's unfortunate: all the required objects already exist in the cython code, there are just no hooks to attach them to.

comment:51 in reply to: ↑ 50 ; follow-ups: Changed 6 years ago by SimonKing

Replying to nbruin:

It's unfortunate: all the required objects already exist in the cython code, there are just no hooks to attach them to.

That's why I suggest to have a practical fix that works for now. Recall that the problem from the ticket description came from a report on sage-devel, where people actually work with flavours of Sage that come without the source code.

I think that the current branch does not have a high danger of creating subtle caching problems. First of all, with the branch, a change in behaviour can only occur when we have a cached method-with-default-arguments-and/or-args/kwds of a cdef class whose source file is not available. Methods without arguments will be totally fine if the source file is gone (so, the branch does fix a problem).

A change in behaviour would most likely be a problem where coercion is involved, i.e., in UniqueRepresentation. But there, we are safe, since (a) sub-classes of UniqueRepresentation must be Python classes anyway (and Python classes aren't affected by the bug), and (b) UniqueRepresentation.__classcall__ has generic arguments *args,**kwds.

If you think that it is too dangerous to silently change behaviour rather than raise a straight forward error, we could instead raise a warning (only once, similar to deprecation warnings) when a cached method is called on a method whose argspec can't be determined.

comment:52 in reply to: ↑ 51 Changed 6 years ago by nbruin

Replying to SimonKing:

I think that the current branch does not have a high danger of creating subtle caching problems. First of all, with the branch, a change in behaviour can only occur when we have a cached method-with-default-arguments-and/or-args/kwds of a cdef class whose source file is not available. Methods without arguments will be totally fine if the source file is gone (so, the branch does fix a problem).

I agree the danger is "not high" but it's not zero either. Subtle deviations like this are the source of incredibly difficult-to-replicate issues later on, because the reporter will of course forget to report that this is a sourceless distribution. (Perhaps it's not so bad: the bug will just be ignored because we can't replicate it)

I'm not so sure a warning is an option: I'd expect that the first occurrence would be during system startup, so the warning might get lost.

Anyway, I think the fact that we look at the source at all in order to figure out how to preprocess the arguments is a deficiency. I'd prefer if we can use this ticket to track the resolution of that, but if this issue is really urgent for some users, I guess we can merge something along the lines of what you propose.

I expect that we'll find a cython-based solution, possibly docstring based. It might be a configure option to select if you want this extra data in your docstrings, but in sage we probably would, since we process the docstrings for display anyway.

comment:53 in reply to: ↑ 51 Changed 6 years ago by jdemeyer

Replying to SimonKing:

I think that the current branch does not have a high danger of creating subtle caching problems. First of all, with the branch, a change in behaviour can only occur when we have a cached method-with-default-arguments-and/or-args/kwds of a cdef class whose source file is not available. Methods without arguments will be totally fine if the source file is gone (so, the branch does fix a problem).

Proposal 15 also has that property.

A change in behaviour would most likely be a problem where coercion is involved, i.e., in UniqueRepresentation. But there, we are safe, since (a) sub-classes of UniqueRepresentation must be Python classes anyway (and Python classes aren't affected by the bug), and (b) UniqueRepresentation.__classcall__ has generic arguments *args,**kwds.

Python classes would be unaffected by proposal 15.

If you think that it is too dangerous to silently change behaviour rather than raise a straight forward error, we could instead raise a warning (only once, similar to deprecation warnings) when a cached method is called on a method whose argspec can't be determined.

I think we need to agree first on whether the uniquess property for Cython cached methods is important.

  1. If uniqueness is important, then it is indeed "too dangerous to silently change behaviour". ==> Go with the solution of Nils to somehow put the argspec information in the docstring and use that.
  1. If uniqueness is not important, then why bother reading the argspec anyway? ==> Proposal 15

comment:54 Changed 6 years ago by nbruin

Can we perhaps instrument the caching machinery to log any cache hits that would have been misses if *arg, **kwarg would be processed into a cache key without transporting over keywords to positional parameters? Then we can see the extent of the problem.

The main one I expect there is from cases where people specify a default parameter, but with the default value, and that's a serious one:

@cached_function
def a(x,y=1)
  return set([x,y])

Then [a(1),a(1,1)] should contain two identical sets, but only caching on *arg and **kwarg would return result in two non-identical sets, because the second would have an implicit keyword parameter y rather than a second positional parameter.

I think this is potentially quite a serious issue, especially because in sage, much caching happens with the purpose of making things unique (but perhaps currently never in cython applications ...)

comment:55 follow-up: Changed 6 years ago by nbruin

I think this embedsignature=True is what we need:

sage: cython("""
....: import cython
....: cdef class A(object):
....:   @cython.embedsignature(True)
....:   def b(x,y=0): pass
....: """)
sage: print A.b.__doc__
A.b(x, y=0)
File: _home_nbruin__sage_temp_art_5476_tmp_LOYsNA_spyx_0.pyx (starting at line 10)

i.e., if we ensure that we have embedsignature=True (probably globally throughout the library), then we just need to change how cython docstrings are used to accommodate for the fact that the first line is the signature. Then sage_argspec can just look in the docstring to see if it can find a signature there. No source inspection required anymore.

comment:56 in reply to: ↑ 55 Changed 6 years ago by SimonKing

Replying to nbruin:

sage: cython("""
....: import cython
....: cdef class A(object):
....:   @cython.embedsignature(True)
....:   def b(x,y=0): pass
....: """)
sage: print A.b.__doc__
A.b(x, y=0)
File: _home_nbruin__sage_temp_art_5476_tmp_LOYsNA_spyx_0.pyx (starting at line 10)

Cool! Would that option be easy to enable globally? If not, then I still think assuming a generic argspec and printing a warning (once for each affected cached method) would be better than nothing.

comment:57 Changed 6 years ago by jdemeyer

See #17847

comment:58 follow-up: Changed 6 years ago by SimonKing

In other words: Because of Cython caching, we can not use the embedded argspec information, even if we could enable "embedsignature(True)" globally, and additionally, if "embedsignature(True)" was enabled globally, some valid Cython code wouldn't compile.

comment:59 Changed 6 years ago by jdemeyer

  • Status changed from needs_review to needs_work

comment:60 in reply to: ↑ 58 Changed 6 years ago by nbruin

Replying to SimonKing:

Because of Cython caching, we can not use the embedded argspec information

This is a one-off hitch. It just means that when we enable embedsignature globally, people should clear their cython cache (once).

even if we could enable "embedsignature(True)" globally

We can.

if "embedsignature(True)" was enabled globally, some valid Cython code wouldn't compile.

but it's easy to reformulate those cases.

comment:61 Changed 6 years ago by jdemeyer

Let me mention that #17847 is now in Sage, which makes this ticket feasible.

comment:62 Changed 6 years ago by SimonKing

Would it be ok to rebase this branch on top of the current develop branch (and then force-push)? Or would it be ok if I merged this branch into the current develop and start with it? Or merge the current develop into this branch? I still don't get what is considered best praxis.

comment:63 Changed 6 years ago by git

  • Commit changed from a0b829dc10b4ccb00b99e073de8a1c0006aa1e51 to b340eebb3cec1d5f273b9c382599f23736045e98

Branch pushed to git repo; I updated commit sha1. New commits:

831173bMerge branch 't/17814/make_calling_a_cached_method_independent_of_source_code_inspection' into t/17814/rebased-make_calling_a_cached_method_independent_of_source_code_inspection
b340eebUse embedded signature for introspection

comment:64 Changed 6 years ago by SimonKing

  • Status changed from needs_work to needs_review

I am now using the embedded signatures (due to #17847) for introspection. I preserved the previous commits by merging the previous ticket branch into the current develop to get the new ticket branch; hopefully the correct procedure.

In order to demonstrate what is happening, I added a dummy method to a nested test class provided in sage.misc.nested_class. As it turns out, there has been a bug in the previous sage_getargspec that is now fixed. I suppose that the bug actually is in Cython:

sage: from sage.misc.nested_class import MainClass
sage: print MainClass.NestedClass.NestedSubClass.dummy.func_defaults
None

Actually the default is not None but is a tuple, which is correctly determined by the new version of sage_getargspec:

sage: sage_getargspec(MainClass.NestedClass.NestedSubClass.dummy)
ArgSpec(args=['self', 'x', 'r'], varargs='args', keywords='kwds', defaults=((1, 2, 3.4),))

Moreover, the above does not involve reading the source code. I wrote a new function that determines the argspec from the embedded signature, and strips the signature at the same time:

sage: print MainClass.NestedClass.NestedSubClass.dummy.__doc__
NestedSubClass.dummy(self, x, *args, r=(1, 2, 3.4), **kwds)
File: sage/misc/nested_class.pyx (starting at line 314)

                A dummy method to demonstrate the embedding of
                method signature for nested classes.
...
sage: from sage.misc.sageinspect import _extract_embedded_signature
sage: print _extract_embedded_signature(MainClass.NestedClass.NestedSubClass.dummy.__doc__, 'dummy')[0]
File: sage/misc/nested_class.pyx (starting at line 314)

                A dummy method to demonstrate the embedding of
                method signature for nested classes.
...
sage: _extract_embedded_signature(MainClass.NestedClass.NestedSubClass.dummy.__doc__, 'dummy')[1]
ArgSpec(args=['self', 'x', 'r'], varargs='args', keywords='kwds', defaults=((1, 2, 3.4),))

The tests in sage.misc pass.

comment:65 Changed 6 years ago by git

  • Commit changed from b340eebb3cec1d5f273b9c382599f23736045e98 to 0e8ef000e434b654c76299b2ea694ca15be965a3

Branch pushed to git repo; I updated commit sha1. New commits:

0e8ef00Catch syntax error when extraction of signature fails

comment:66 follow-up: Changed 6 years ago by vdelecroix

On 6.7-beta2 I got

sage: K=GF(5)
sage: K.factored_order
Cached version of <method 'factored_order' of
'sage.rings.finite_rings.finite_field_base.FiniteField' objects>

comment:67 follow-up: Changed 6 years ago by vdelecroix

  • Status changed from needs_review to needs_info

sorry already in comment:27... could you update the description then?

comment:68 in reply to: ↑ 66 ; follow-up: Changed 6 years ago by SimonKing

Replying to vdelecroix:

On 6.7-beta2 I got

sage: K=GF(5)
sage: K.factored_order
Cached version of <method 'factored_order' of
'sage.rings.finite_rings.finite_field_base.FiniteField' objects>

You mean, you got it after removing src/sage/rings/finite_rings/finite_field_base.pyx? Then you say the problem is fixed.

comment:69 in reply to: ↑ 67 Changed 6 years ago by SimonKing

Replying to vdelecroix:

sorry already in comment:27... could you update the description then?

I don't understand what info you need.

comment:70 in reply to: ↑ 68 ; follow-up: Changed 6 years ago by vdelecroix

Replying to SimonKing:

Replying to vdelecroix:

On 6.7-beta2 I got

sage: K=GF(5)
sage: K.factored_order
Cached version of <method 'factored_order' of
'sage.rings.finite_rings.finite_field_base.FiniteField' objects>

You mean, you got it after removing src/sage/rings/finite_rings/finite_field_base.pyx? Then you say the problem is fixed.

Nope. Just starting from a fresh sage-6.7.beta2.

comment:71 in reply to: ↑ 70 Changed 6 years ago by vdelecroix

Replying to vdelecroix:

Replying to SimonKing:

Replying to vdelecroix:

On 6.7-beta2 I got

sage: K=GF(5)
sage: K.factored_order
Cached version of <method 'factored_order' of
'sage.rings.finite_rings.finite_field_base.FiniteField' objects>

You mean, you got it after removing src/sage/rings/finite_rings/finite_field_base.pyx? Then you say the problem is fixed.

Nope. Just starting from a fresh sage-6.7.beta2.

My mistake... I read too fast!!

comment:72 Changed 6 years ago by SimonKing

  • Status changed from needs_info to needs_review

I still don't see what info was requested. Hence, I change it back to "needs review".

comment:73 Changed 6 years ago by vbraun

  • Reviewers set to Volker Braun
  • Status changed from needs_review to positive_review

comment:74 Changed 6 years ago by vbraun

  • Branch changed from u/SimonKing/make_calling_a_cached_method_independent_of_source_code_inspection to 0e8ef000e434b654c76299b2ea694ca15be965a3
  • Resolution set to fixed
  • Status changed from positive_review to closed
Note: See TracTickets for help on using tickets.