Opened 8 years ago

Last modified 7 years ago

#15207 new defect

SageObject breaks pickling circular structures

Reported by: nbruin Owned by:
Priority: major Milestone: sage-6.4
Component: misc Keywords:
Cc: SimonKing, vbraun, jkeitel Merged in:
Authors: Reviewers:
Report Upstream: N/A Work issues:
Branch: Commit:
Dependencies: Stopgaps:

Status badges

Description

See this sage-devel thread. There's a problem where SageObject.__hash__ requires __repr__ to be operational, which with standard pickling may not be the case, because attributes only set after __setstate__ may be required.

The solution is to change the way objects are constructed during unpickling. Python offers several hooks for this. The purpose of this ticket is to find a way that is easy to adapt.

Attachments (1)

richpickle.py (3.0 KB) - added by nbruin 8 years ago.
a possibly light interface

Download all attachments as: .zip

Change History (10)

comment:1 Changed 8 years ago by nbruin

class Node(base):
    def __init__(self,i):
        self.i=i
        self.neighbours=set()
    def __repr__(self):
        return "Node %s with neighbours %s"%(self.i,sorted([c.i for c in self.neighbours]))
    def connected_component(self):
        nodelist=[]
        todo=[self]
        visited=set()
        while todo:
            n=todo.pop()
            if n.i not in visited:
                nodelist.append(n)
                todo.extend(n.neighbours)
                visited.add(n.i)
        nodelist.sort(key=lambda n:n.i)
        return nodelist

with base=object this works:

from richpickle import Node
n=4
N=[Node(i) for i in range(n)]
for j in range(n):
        N[j].neighbours=set(N[i] for i in range(n) if i != j)

loads(dumps(N[0])).connected_component()

with base=SageObject it fails, because now the attribute i in required for __hash__ via __repr__.

Changed 8 years ago by nbruin

a possibly light interface

comment:2 Changed 8 years ago by nbruin

The attached file is a rough draft. The idea is that just adding the line

    _init_attributes = ('i',)

to the class definition of Node, together with setting base=RichPickle,SageObject, restores pickling circular structures, because the initialization of i is moved forward.

Just specifying which attributes need to be reinstated earlier may feel a lot lighter to people implementing classes than having to write a full __reduce__.

comment:3 Changed 8 years ago by jkeitel

  • Cc jkeitel added

comment:4 follow-up: Changed 8 years ago by SimonKing

So, you suggest to introduce a base class RichPickle, which manages certain attributes that absolutely must be present before __setstate__. These attributes are indicated by a list of strings stored as a class attribute.

This makes sense to me. I guess this new base class will save me a lot of trouble when attempting to implement proper pickling on all homsets. So, thumbs up from me...

comment:5 in reply to: ↑ 4 Changed 8 years ago by nbruin

Replying to SimonKing:

Indeed, that's the idea. The next step is interface and performance:

  • Should we try to also implement the __getnewargs__ protocol to dredge up the arguments needed for the new call?
  • should we just depend on super.__reduce__ and modify its return value to include the actions we want to take?
  • the base on which __new__ gets called upon reconstruction needs more attention: When inheriting from SageObject, RichPickle (rather than the other way around) upickling triggers an error about an unsafe __new__.
  • I think it's a good idea to walk the MRO to collect all the attributes that need special treatment (this is already implemented), so that RickPickle? also works properly when multiple classes in the MRO make use of it. Do we need to do that every time, though or should we store what we found?
  • Should this be a separate class or should we integrate this behaviour into some existing class (logically, I think it should be separate, but perhaps it's better for performance to arrange it otherwise)
  • We need a better name. RichPickle really doesn't tell you what's so rich about it -- and really, it's only a modest extension of the protocol followed for object, so it's not so rich anyway.

comment:6 Changed 8 years ago by nbruin

A slightly different approach for CategoryObject: it caches its hash already, so all we need to do is pickle it as well and ensure the cache is reinitialized by the constructor, not by setstate. If we add the following to sage.structure.category_object.CategoryObject:

    def __reduce__(self):
        if not isinstance(self,CategoryObject):
            raise TypeError("this only works for CategoryObject")
        cdef CategoryObject Cco=self
        try:
            getstate = self.__getstate__
        except AttributeError:
            if getattr(self, "__slots__", None):
                raise TypeError("a class that defines __slots__ without "
                                "defining __getstate__ cannot be pickled")
            try:
                dict = self.__dict__
            except AttributeError:
                dict = None
        else:
            dict = getstate()
        if Cco._hash_value == -1:
            reconstructor = reconstruct_CategoryObject_without_hash
            args=(self.__class__)
        else:
            reconstructor = reconstruct_CategoryObject_with_hash
            args=(self.__class__,Cco._hash_value,)
        if dict:
            return reconstructor, args, dict
        else:
            return reconstructor, args

as well as:

def reconstruct_CategoryObject_with_hash(cls,hash,*args):
    C=cls.__new__(cls,*args)
    if not isinstance(C,CategoryObject):
        raise TypeError("this only works for CategoryObject")
    cdef CategoryObject Cco=C
    Cco._hash_value=hash
    return C
    
def reconstruct_CategoryObject_without_hash(cls,*args):
    return cls.__new__(cls,*args)

we can do:

sage: L=LaurentPolynomialRing(QQ,'t')
sage: L.d={L:1}
sage: C=loads(dumps(L))

which otherwise would fail. There are some doctests in src/sage/structure/factory.pyx that fail, so this approach needs a bit of adjustment, but I think something along these lines would help a lot.

Of course, we just kick the can a little further. It's easy to create two non-identical rings with the same hash, so that equality testing really becomes important for the dictionary lookup:

sage: L1=LaurentPolynomialRing(QQ,'u,v',order="deglex")
sage: L2=LaurentPolynomialRing(QQ,'u,v',order="lex")
sage: L1.d={L1:1,L2:2}
sage: L2.d={L1:1,L2:2}
sage: loads(dumps(L1))
AttributeError: 'LaurentPolynomialRing_mpair_with_category' object has no attribute '_R'

because although the rings hash OK, their _R attribute, required in the __cmp__ implementation, is not initialized.

comment:7 Changed 8 years ago by vbraun_spam

  • Milestone changed from sage-6.1 to sage-6.2

comment:8 Changed 7 years ago by vbraun_spam

  • Milestone changed from sage-6.2 to sage-6.3

comment:9 Changed 7 years ago by vbraun_spam

  • Milestone changed from sage-6.3 to sage-6.4
Note: See TracTickets for help on using tickets.