Opened 4 years ago

Closed 4 years ago

#18848 closed defect (fixed)

do not save() objects we cannot load()

Reported by: dimpase Owned by:
Priority: major Milestone: sage-6.8
Component: interfaces Keywords:
Cc: ncohen, vbraun Merged in:
Authors: Simon King Reviewers: Volker Braun
Report Upstream: N/A Work issues:
Branch: bb63cd2 (Commits) Commit: bb63cd205f7150776d04f5e1e7ec02438b8dc384
Dependencies: Stopgaps:

Description

currently

sage: x=singular(2)
sage: x.save("blah")

results in blah.sobj we cannot load().

cf. sage-devel post

Change History (35)

comment:1 follow-up: Changed 4 years ago by dimpase

Blindly doing save(x.sage(),"blah") is not quite right, as this would e.g. break saving/loading Python objects (something that works just fine).

Ideas?

comment:2 Changed 4 years ago by dimpase

  • Cc SimonKing added

comment:3 in reply to: ↑ 1 Changed 4 years ago by SimonKing

  • Cc SimonKing removed

Replying to dimpase:

Blindly doing save(x.sage(),"blah") is not quite right, as this would e.g. break saving/loading Python objects (something that works just fine).

Well, each object has a __reduce__ method. In the case of sage.interfaces.interface.InterfaceElement.__reduce__, this is

    def __reduce__(self):
        return reduce_load, (self.parent(), self._reduce())

and

    def _reduce(self):
        return repr(self)

The latter should change (by default) to

    def _reduce(self):
        return self.sage()

But then, it is still needed to fix something, since SingularElement has its own __reduce__ method, that explicitly makes sure that the result will be invalid:

    def __reduce__(self):
        """
        Note that the result of the returned reduce_load is an invalid
        Singular object.

        EXAMPLES::

            sage: singular(2).__reduce__()
            (<function reduce_load at 0x...>, ())
        """
        return reduce_load, ()  # default is an invalid object

And finally, one should remove the stupid custom reduce_load methods. There is a default one:

def reduce_load(parent, x):
    return parent(x)

but the Singular and GAP interfaces override it to, for example,

def reduce_load():
    """
    Returns an invalid GAP element. Note that this is the object
    returned when a GAP element is unpickled.

    EXAMPLES::

        sage: from sage.interfaces.gap import reduce_load
        sage: reduce_load()
        <repr(<sage.interfaces.gap.GapElement at 0x...>) failed: ValueError: The session in which this object was defined is no longer running.>
        sage: loads(dumps(gap(2)))
        <repr(<sage.interfaces.gap.GapElement at 0x...>) failed: ValueError: The session in which this object was defined is no longer running.>
    """
    return GapElement(None, None)

So, summary of the status quo: Several interfaces explicitly override the default to make sure that pickling does NOT work. And the default currently is not good enough to ensure pickling, since the default relies on string representations.

Summary of the proposed solution: Interfaces should rely on the default pickling, as long as they don't have anything substantial to say about pickling. And the default should rely on the .sage() method, whose functionality should be improved in some cases.

comment:4 Changed 4 years ago by SimonKing

Or perhaps one should do a compromise: If self.sage() throws an error, then we should try if the string representation is good enough to reconstruct the object. If it does then use it, otherwise throw an error upon pickling.

It would be better than nothing. But it could be quite costly to first check, for each interface element to be pickled, if the string representation is good enough.

comment:5 Changed 4 years ago by SimonKing

PPS: In the case of GAP elements, string representation sometimes is not good enough, and GAP does not support pickling by default. But there was some thread on sage-devel concerning that point. It was mentioned that with the IO package of GAP, which is GPL3, pickling of individual objects can be achieved. Perhaps we can make it a standard package?

Last edited 4 years ago by SimonKing (previous) (diff)

comment:6 follow-up: Changed 4 years ago by SimonKing

PPPS: Both Singular and GAP can dump a complete session. I don't know if that has been wrapped in Sage's interface. But it could be useful for some. Not for pickling individual objects, though.

comment:7 in reply to: ↑ 6 ; follow-up: Changed 4 years ago by dimpase

Replying to SimonKing:

PPPS: Both Singular and GAP can dump a complete session. I don't know if that has been wrapped in Sage's interface. But it could be useful for some. Not for pickling individual objects, though.

one can save GAP's workspace via gap.save_workspace(...), but there is no mechanism in Sage to load GAP with such a custom-made workspace; although this is doable.

No idea about Singular though.

comment:8 in reply to: ↑ 7 ; follow-up: Changed 4 years ago by SimonKing

Replying to dimpase:

Replying to SimonKing:

PPPS: Both Singular and GAP can dump a complete session. I don't know if that has been wrapped in Sage's interface. But it could be useful for some. Not for pickling individual objects, though.

one can save GAP's workspace via gap.save_workspace(...), but there is no mechanism in Sage to load GAP with such a custom-made workspace; although this is doable.

Haha! So, it is "write only memory"...

No idea about Singular though.

See dump.

comment:9 in reply to: ↑ 8 Changed 4 years ago by SimonKing

Replying to SimonKing:

Replying to dimpase:

one can save GAP's workspace via gap.save_workspace(...), but there is no mechanism in Sage to load GAP with such a custom-made workspace; although this is doable.

Haha! So, it is "write only memory"...

I had a look, and it seems that really it is impossible to load a workspace into a running session (overriding all previously assigned variables of the session). One can only start a new session that is using the saved workspace. Doh...

comment:10 follow-up: Changed 4 years ago by SimonKing

Something totally different: Why is there a .save() and .dumps() method? Since __reduce__ is implemented, save(Foo, "bar") would be the recommended way to store Foo in "bar.sobj", and Foo.save("bar") isn't really adding anything. Can we deprecate these methods?

Last edited 4 years ago by SimonKing (previous) (diff)

comment:11 in reply to: ↑ 10 Changed 4 years ago by SimonKing

Replying to SimonKing:

Something totally different: Why is there a .save() and .dumps() method? Since __reduce__ is implemented, save(Foo, "bar") would be the recommended way to store Foo in "bar.sobj", and Foo.save("bar") isn't really adding anything. Can we deprecate these methods?

These are not defined on the level of sage.interfaces, but sage.structure.sage_object. So, I guess I better don't touch it...

comment:12 Changed 4 years ago by SimonKing

  • Branch set to u/SimonKing/do_not_save___objects_we_cannot_load__

comment:13 Changed 4 years ago by SimonKing

  • Authors set to Simon King
  • Commit set to 547f511ff186696ac2617c6324852c59b079c807
  • Status changed from new to needs_review

I think the attached branch is a step to the right direction. It implements most of what I announced above:

  • sage.interfaces.gap and sage.interfaces.singular do not override the pickling protocol from sage.interfaces.interface.

-. The default pickling tries to convert the interface element to a Sage object. Unpickling converts this element back to an interface element. If conversion to a Sage object fails with a NotImplementedError, then the string representation is used, as a last resort.

  • I added to SingularElement._sage_() the forgotten case of integers.

I am not sure if it solves the problem to the extent possible. For example, one still can not unpickle a string in gap or singular:

sage: loads(dumps(singular('"a"')))
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
...
TypeError: (TypeError(SingularError('Singular error:\n   ? `a` is undefined\n   ? error occurred in or before STDIN line 14: ``',),), <function reduce_load at 0x7f08e40a1500>, (Singular, 'a'))

Reason is: The corresponding Sage object of singular('"a"') is the string "a", but the evaluation singular("a") fails.

It has failed before. So, I am not sure if it is absolutely critical to fix this. Possible semi-fixes are:

  • singular('"a"').sage() should return the string "'a'", not just "a". But this wouldn't be the correct answer.
  • Alternatively, singular("a") could first try if there is an object in singular whose name is "a", and return singular("'a'") if that fails. That's bad, because think of what happens if the name a is defined in Singular...

A real fix would be to modify the _reduce() method so that it recognises whether self is a string in the respective interface. I.e., there would be a new method is_string(), and we would do

def _reduce(self):
    if self.is_string():
        return repr(repr(self))
    try:
        return self.sage()
    except NotImplementedError:
        return repr(self)

And is_string() would of course be implemented in each interface, such as testing self.type()=="string" in the case of SingularElement or bool(self.IsString()) in the case of GapElement.

Do you think we should care about strings?


New commits:

547f511Improve pickling for InterfaceElement

comment:14 Changed 4 years ago by git

  • Commit changed from 547f511ff186696ac2617c6324852c59b079c807 to 2445264ab24d28344b7bb850c2bdaef446f1e65c

Branch pushed to git repo; I updated commit sha1. New commits:

2445264Allow pickling for more types of interface elements, and preserve backwards compatibility

comment:15 Changed 4 years ago by SimonKing

With the new commit, many interface elements became picklable that haven't been before. Previously, elements of the r or singular or gap or pari or maxima interface would unpickle as invalid. But, as demonstrated in the new tests, now they can be pickled and unpickled.

One missing case are strings in maxima. In fact, a string in maxima is not converted to a string in Sage, but it is attempted to transform it into an expression, which for what ever reason fails.

Do you know how to recognise a string in maxima? I.e., is there a command in maxima that tells whether a variable is a string?

In any case, I believe the current branch extends the pickling capabilities of interface elements considerably.

comment:16 Changed 4 years ago by dimpase

What I don't understand is why Lisp interface does not suffer from this issue:

sage: x=lisp(1)
sage: x
1
sage: type(x)
<class 'sage.interfaces.lisp.LispElement'>
sage: x.save("/tmp/foo")
sage: load("/tmp/foo")
1
sage: y=load("/tmp/foo")
sage: y==x
True

And in fact /tmp/foo.sobj is loadable in a fresh Sage session.

comment:17 follow-up: Changed 4 years ago by nbruin

That works because it actually has something implemented:

sage: explain_pickle(lisp(1).dumps())
pg_reduce_load = unpickle_global('sage.interfaces.interface', 'reduce_load')
pg_reduce_load_Lisp = unpickle_global('sage.interfaces.lisp', 'reduce_load_Lisp')
pg = unpickle_instantiate(pg_reduce_load_Lisp, ())
pg_reduce_load(pg, '1')

And if you look at reduce_load_Lisp, it simply returns lisp so the pickle consists of the instructions "execute lisp(1)", which indeed reconstructs the object. If you try to do this with more complicated objects, you'll see it go wrong for whatever reason (mainly because the lisp interface is very rudimentary):

sage: a=lisp("'(1 2)")
sage: a
(1 2)
sage: loads(a.dumps())
1

which is explained by the fact that lisp('(1 2)') doesn't produce something that equals a.

comment:18 in reply to: ↑ 17 Changed 4 years ago by SimonKing

Dima, what do you mean by "this issue"?

Replying to nbruin:

That works because it actually has something implemented:

sage: explain_pickle(lisp(1).dumps())
pg_reduce_load = unpickle_global('sage.interfaces.interface', 'reduce_load')
pg_reduce_load_Lisp = unpickle_global('sage.interfaces.lisp', 'reduce_load_Lisp')
pg = unpickle_instantiate(pg_reduce_load_Lisp, ())
pg_reduce_load(pg, '1')

And if you look at reduce_load_Lisp, it simply returns lisp so the pickle consists of the instructions "execute lisp(1)", which indeed reconstructs the object.

Well, this is how it used to work all the time. Each interface "Foo" has a __reduce__ method that returns reduce_load_Foo, and reduce_load_Foo() returns Foo(). And each ELEMENT (!) of Foo (not Foo itself) also used to have a __reduce__ method. The default was that the __reduce__ method returns reduce_load, (self.parent(), self._reduce()), and reduce_load(P, R) did then return P(R), thus, unpickling the element.

The issue was that

  1. often the string representation of an element is not valid input to the interface and
  2. several sub-classes of InterfaceElement overrode __reduce__ and reduce_load, making absolutely sure that under no circumstances pickling could work (not even when the string representation happened to be suitable for pickling).

That's what my branch changed (plus some improvements for conversion from InterfaceElement to Sage, which is now used for pickling instead of string representation).

comment:19 Changed 4 years ago by git

  • Commit changed from 2445264ab24d28344b7bb850c2bdaef446f1e65c to 123f29504557ae471ac9d55b0c5a6fbad587c30f

Branch pushed to git repo; I updated commit sha1. New commits:

123f295Fix doctests

comment:20 Changed 4 years ago by SimonKing

There were tests to be fixed. Actually one of the failing tests tried to make sure that the unpickled element is invalid...


New commits:

123f295Fix doctests

comment:21 Changed 4 years ago by SimonKing

I am just wondering: Should we perhaps get rid of reduce_load? After all, an interface is callable, and hence there is no need to have a function that calls the interface when we can do it directly.

comment:22 Changed 4 years ago by git

  • Commit changed from 123f29504557ae471ac9d55b0c5a6fbad587c30f to 595df9606f14cffa7f8131aedb6daeaf1b08c1ce

Branch pushed to git repo; I updated commit sha1. New commits:

595df96Remove reduce_load from sage.interfaces.interface

comment:23 Changed 4 years ago by git

  • Commit changed from 595df9606f14cffa7f8131aedb6daeaf1b08c1ce to d123d33c89ff4524ecf960d894ea07a1f0e96196

Branch pushed to git repo; I updated commit sha1. New commits:

d123d33Add one doctest

comment:24 Changed 4 years ago by vbraun

  • Reviewers set to Volker Braun
  • Status changed from needs_review to positive_review

comment:25 follow-up: Changed 4 years ago by vbraun

  • Status changed from positive_review to needs_work

Fails on 32-bit linux:

sage -t --long src/sage/interfaces/gp.py
**********************************************************************
File "src/sage/interfaces/gp.py", line 878, in sage.interfaces.gp.GpElement._reduce
Failed example:
    loads(dumps(E))  # indirect doctest
Expected:
    [1, 2, 3, 4, 5, 9, 11, 29, 35, -183, -3429, -10351, 6128487/10351, Vecsmall([1]),
    [Vecsmall([128, -1])], [0, 0, 0, 0, 0, 0, 0, 0]]
Got:
    [1, 2, 3, 4, 5, 9, 11, 29, 35, -183, -3429, -10351, 6128487/10351, Vecsmall([1]), [Vecsmall([96, -1])], [0, 0, 0, 0, 0, 0, 0, 0]]
**********************************************************************
File "src/sage/interfaces/gp.py", line 881, in sage.interfaces.gp.GpElement._reduce
Failed example:
    E.sage()
Expected:
    [1,
     2,
     3,
     4,
     5,
     9,
     11,
     29,
     35,
     -183,
     -3429,
     -10351,
     6128487/10351,
     [1],
     [[128, -1]],
     [0, 0, 0, 0, 0, 0, 0, 0]]
Got:
    [1,
     2,
     3,
     4,
     5,
     9,
     11,
     29,
     35,
     -183,
     -3429,
     -10351,
     6128487/10351,
     [1],
     [[96, -1]],
     [0, 0, 0, 0, 0, 0, 0, 0]]
**********************************************************************
1 item had failures:
   2 of   4 in sage.interfaces.gp.GpElement._reduce
    [155 tests, 2 failures, 7.53 s]

comment:26 in reply to: ↑ 25 Changed 4 years ago by SimonKing

Replying to vbraun:

Fails on 32-bit linux:

I did not change the _sage_() method for !GPElement, if I recall correctly. Thus, the fact that things changed from yielding [128, -1] to [96, -1] has been there before. And now I wonder if that's a bug, or if the different answer just tells something about internal gp data structures that depend on the architecture. In the latter case, we should replace [128, -1] by [..., -1] in the expected answer.

comment:27 Changed 4 years ago by SimonKing

I guess it would be better to ask on sage-nt or sage-algebra, since it appears to be about elliptic curves.

comment:28 follow-up: Changed 4 years ago by SimonKing

Volker, before I ask sage-nt, can you show me the result of

sage: gp('ellinit([1,2,3,4,5])')
[1, 2, 3, 4, 5, 9, 11, 29, 35, -183, -3429, -10351, 6128487/10351, Vecsmall([1]), [Vecsmall([128, -1])], [0, 0, 0, 0, 0, 0, 0, 0]]

on 32-bit linux?

If it gives [..., [Vecsmall([96, -1])], ...], then the problem is very likely a different internal representation, and we should just replace [128, -1] by [..., -1]. If it isn't, and the different answer only occurs upon pickling, then we have a serious problem.

Last edited 4 years ago by SimonKing (previous) (diff)

comment:29 Changed 4 years ago by SimonKing

PS: Or even better, we should just replace the test by

sage: E = gp('ellinit([1,2,3,4,5])')
sage: loads(dumps(E)) == E
True

I'll commit and push it now.

comment:30 Changed 4 years ago by git

  • Commit changed from d123d33c89ff4524ecf960d894ea07a1f0e96196 to bb63cd205f7150776d04f5e1e7ec02438b8dc384

Branch pushed to git repo; I updated commit sha1. New commits:

bb63cd2Make a test architecture independent

comment:31 Changed 4 years ago by SimonKing

  • Status changed from needs_work to needs_review

comment:32 in reply to: ↑ 28 ; follow-up: Changed 4 years ago by dimpase

Replying to SimonKing:

Volker, before I ask sage-nt, can you show me the result of

sage: gp('ellinit([1,2,3,4,5])')
[1, 2, 3, 4, 5, 9, 11, 29, 35, -183, -3429, -10351, 6128487/10351, Vecsmall([1]), [Vecsmall([128, -1])], [0, 0, 0, 0, 0, 0, 0, 0]]

on 32-bit linux?

If it gives [..., [Vecsmall([96, -1])], ...], then the problem is very likely a different internal representation, and we should just replace [128, -1] by [..., -1]. If it isn't, and the different answer only occurs upon pickling, then we have a serious problem.

here you are:

sage: gp('ellinit([1,2,3,4,5])')
[1, 2, 3, 4, 5, 9, 11, 29, 35, -183, -3429, -10351, 6128487/10351, Vecsmall([1]), [Vecsmall([96, -1])], [0, 0, 0, 0, 0, 0, 0, 0]]

on

dima@arando:~/sage/dev/sage$ uname -a
Linux arando 3.13.0-40-generic #69-Ubuntu SMP Thu Nov 13 17:56:26 UTC 2014 i686 i686 i686 GNU/Linux

Indeed, you see [Vecsmall([96, -1])] there!

(most probably the same machine Volker was referring to).

comment:33 in reply to: ↑ 32 Changed 4 years ago by SimonKing

Replying to dimpase:

Indeed, you see [Vecsmall([96, -1])] there!

OK. In that case, the string representation is machine dependent, and we have to use a

 sage: loads(dumps(E)) == E
 True

test (which is cleaner anyway).

The main point of my test was to show that pickling of the gp interface does not rely on the sage-representation of a gp-element, but on the string representation (whereas by this ticket all other interfaces use the sage-representation when available). So, in the most recent commit, I change the test into

 sage: E == gp(E.sage())
 False

which hopefully is the same on 32-bit.

In other words: Needs review again...

comment:34 Changed 4 years ago by vbraun

  • Status changed from needs_review to positive_review

comment:35 Changed 4 years ago by vbraun

  • Branch changed from u/SimonKing/do_not_save___objects_we_cannot_load__ to bb63cd205f7150776d04f5e1e7ec02438b8dc384
  • Resolution set to fixed
  • Status changed from positive_review to closed
Note: See TracTickets for help on using tickets.