Opened 3 years ago
Last modified 3 years ago
#28302 new defect
stopgap for save/load that is not working accross different Sage versions
Reported by: | Vincent Delecroix | Owned by: | |
---|---|---|---|
Priority: | major | Milestone: | sage-wishlist |
Component: | misc | Keywords: | |
Cc: | Merged in: | ||
Authors: | Reviewers: | ||
Report Upstream: | N/A | Work issues: | |
Branch: | Commit: | ||
Dependencies: | Stopgaps: |
Description
According to the manual the commands loads/save are supposed to give an easy way to save and retrieve data on a hard drive. Sadly this is not safe accross different Sage versions.
This should either be widely advertized in the documentation, emitting a warning or a stopgap.
Attachments (2)
Change History (19)
comment:1 Changed 3 years ago by
comment:2 Changed 3 years ago by
Milestone: | sage-8.9 → sage-wishlist |
---|
comment:3 follow-ups: 4 10 Changed 3 years ago by
See also #24337.
I believe the only solution here is better documentation emphasizing that save()
/load()
use pickle, and are not guaranteed to work between different versions of Sage, or even different versions of Python, and should only be used for temporary storage, not long-term portable serialization.
comment:4 Changed 3 years ago by
Replying to embray:
See also #24337.
I believe the only solution here is better documentation emphasizing that
save()
/load()
use pickle, and are not guaranteed to work between different versions of Sage, or even different versions of Python, and should only be used for temporary storage, not long-term portable serialization.
Indeed, the fact that we should advertise to not use save/load
was even discussed this sage-devel thread.
Do you have any suggestion for "long-term portable serialization"?
comment:5 follow-up: 6 Changed 3 years ago by
I agree that we should avoid this misunderstanding about save
and load
. I had it one and a half year ago, too. But surely, it is much more annoying if you have published sobj
-files that run out date as in the case from the sage-devel-thread!
On the other hand, if these functions will be disabled I would appreciate to have an option to force their use, even so. Similarly, if there will be a warning, an option to suppress it would be nice. The reason is, that I found a way, which works pretty good in my case where the Sage objects are matrices with entries in
Multivariate Polynomial Ring in u, v over Univariate Laurent Polynomial Ring in w over Integer Ring
respectively
Multivariate Laurent Polynomial Ring in a, b, c over Univariate Quotient Polynomial Ring in e3 over Integer Ring with modulus e3^2 + e3 + 1
I converted them (recursively) into python dictionaries
(over python dictionaries
...) ending up with values which are nothing but python int
. There is very good support by Sage to convert to and from such dictionaries (but this may be different in other cases, as for example with number fields).
Thus, my sobj
-files have been robust against version upgrades since than (and the conversion saved some disk space, as well)! Sure, there is no guarantee that compatibility will keep forever. But the chance is better since this is closer to kernel functionality.
Maybe there is a way, to have such a solution being supported more systematically.
comment:6 follow-up: 7 Changed 3 years ago by
Replying to soehms:
On the other hand, if these functions will be disabled I would appreciate to have an option to force their use, even so. Similarly, if there will be a warning, an option to suppress it would be nice. The reason is, that I found a way, which works pretty good in my case where the Sage objects are matrices with entries in
Certainly no one is proposing disabling or removing them. Just better clarifying what it is and is not appropriate for (I would say, short-term caching, mainly).
We could also maybe do better about storing with .sobj
files the Sage version with which they were created (if we don't already; I think maybe we do?). Still allow them to be loaded with newer Sage versions if possible, but provide a better error message if not, including the Sage version that should be used to load them.
comment:7 Changed 3 years ago by
Replying to embray:
Replying to soehms:
On the other hand, if these functions will be disabled I would appreciate to have an option to force their use, even so. Similarly, if there will be a warning, an option to suppress it would be nice. The reason is, that I found a way, which works pretty good in my case where the Sage objects are matrices with entries in
Certainly no one is proposing disabling or removing them. Just better clarifying what it is and is not appropriate for (I would say, short-term caching, mainly).
Maybe I misunderstood this. But, is there really a (generic) way, to suppress the warning of a deprecated function or method?
I had trouble with a deprecation (ticket #16613), which is not implemented completely, that means: there is still library-code left using the deprecated method (I implemented a fix for this in #26421, but better would have done this in a separate ticket). So, how can you get rid of the warning in such a case (apart from waiting until the fix is reviewed)?
We could also maybe do better about storing with
.sobj
files the Sage version with which they were created (if we don't already; I think maybe we do?). Still allow them to be loaded with newer Sage versions if possible, but provide a better error message if not, including the Sage version that should be used to load them.
To enlarge transparency about the version an sobj-file is created for is the one thing. But in addition, wouldn't it be worth to improve the stability against version upgrades as I was claiming above?
I think most of the objects, people want to save to disk are instances of element classes. Many of these classes have methods to convert to a dictionary, whose values are instances of element classes again. I think the latter should be more stable against version upgrades as the former, since usually it is closer to kernel functionality. In the cases I explained in my former post, the recursion ends up with integers.
Furthermore, many parent classes have an _element_constructor_
doing the opposite conversion. So in all such cases, the stability against version upgrades could be improved similarly as I have done it for the matrices over some types of polynomial rings.
comment:8 follow-up: 9 Changed 3 years ago by
Of course, the problem has two sides:
- A Sage user may have old data and wants to use them in a new Sage version.
- A Sage user may have new data and wants to use them in future Sage versions.
If I understand correctly what you are saying:
Sage doesn't even attempt to offer a way to permanently and reliably store user data. If a pickle of user data created by some months of computation time in Sage-x.y.z can not be read in Sage-x.y.(z+1), then it is just tough luck for the user, but isn't considered a bug. And all is good if we simply write in the documentation that a user shouldn't use Sage to store mathematical data.
If that understanding of your statements is correct then I very strongly disagree with you. And that's also something that I find very disturbing in Python-3: Apparently, Python-2 str basically is the same as Python-3 bytes, and Python-2 unicode basically is the same as Python-3 str. But, surprisesurprise: Python-3 attempts to unpickle the pickle of a Python-2 str as Python-3 str (not as Python-3 bytes) -- which isn't exactly consistent (and fails in some cases).
Granted, it can happen that a Sage-x.y.z pickle cannot be unpickled in Sage-x.y.(z+1). However, if that happens then it clearly is a bug and it is our duty to attempt to fix it. Within reason, of course. But in the first place it is our duty. And since the default way of storing Sage data used to be pickling, we should acknowledge that it is a bug if an old pickle cannot be unpickled by a new Sage version.
I believe that any self-respecting software project should provide at least one way of storing data with the promise that the stored data will still be useable in future versions of the software. Of course, there may be different approaches to keep that promise. Such us: Provide a conversion function that turns the data format of version x.y into the data format of version x.(y+1), so that the user can access old stored data version x.y by repeated application of such functions, and then use them in version x.(y+15). However, it is not ok to tell the user: "Well, you may have data from version x.y, but you will not be able to use the old data together with the cool new features of version x.(y+15)".
comment:9 follow-ups: 11 16 Changed 3 years ago by
Replying to SimonKing:
Provide a conversion function that turns the data format of version x.y into the data format of version x.(y+1), so that the user can access old stored data version x.y by repeated application of such functions, and then use them in version x.(y+15).
So, in the beginning of this thread, I have proposed an upgrade mechanism of sage objects from a version to another version of sage. I modified it further. The idea is to add into sage_object.pyx
something like
def __getstate__(self): if hasattr(self, '_getstate_'): state = self._getstate_() elif hasattr(self, '__dict__'): state = self.__dict__.copy() else: state = dict() from sage.env import SAGE_VERSION v = SAGE_VERSION.split('.') major = int(v[0]) minor = int(v[1]) state['sage_version'] = major * 100 + minor return state def __setstate__(self, state): if 'sage_version' in state: version = state['sage_version'] del state['sage_version'] else: version = None if hasattr(self, '_setstate_'): self._setstate_(state) elif hasattr(self, '__dict__'): self.__dict__.update(state) else: self.__dict__ = state if version is not None and hasattr(self, '_upgrade'): self._upgrade(from_version=version) def _upgrade(self, from_version): version = from_version if version < 800: # upgrade to version 801 self._upgradable = True # now self is an object of Sage version 8.1 version = 800 if version < 809: # upgrade to version 902 del self._upgradable # now self is an object of Sage version 9.2 version = 809
I wonder why no one makes a comment on the idea... Is it a bad idea?
comment:10 follow-up: 12 Changed 3 years ago by
Replying to embray:
See also #24337.
I believe the only solution here is better documentation emphasizing that
save()
/load()
use pickle, and are not guaranteed to work between different versions of Sage, or even different versions of Python, and should only be used for temporary storage, not long-term portable serialization.
Erik, this is actually a wrong statement. From the pickle documentation
The pickle serialization format is guaranteed to be backwards compatible across Python releases provided a compatible pickle protocol is chosen and pickling and unpickling code deals with Python 2 to Python 3 type differences if your data is crossing that unique breaking change language boundary.
comment:11 follow-up: 15 Changed 3 years ago by
Replying to klee:
Replying to SimonKing:
Provide a conversion function that turns the data format of version x.y into the data format of version x.(y+1), so that the user can access old stored data version x.y by repeated application of such functions, and then use them in version x.(y+15).
So, in the beginning of this thread, I have proposed an upgrade mechanism of sage objects from a version to another version of sage. ... I wonder why no one makes a comment on the idea... Is it a bad idea?
Sorry, I first wanted to let off some steam, as there recently were some comments in sage-support that I find rather discouraging. I understood these comments as: "Don't use Sage if you want to create a mathematical database", and I believe that's a very unhealthy attitude towards users. Also I suffered, too, from a backwards incompatibility between pickling strings in Python-2 and Python-3.
Concerning your proposal: If I understand correctly, it fits into the often-used approach to provide a reasonable default for double-underscore magical methods, and implement the details by a corresponding single-underscore method. So, that's sound. And it would help in those cases in which __getstate__/__setstate__
is used for pickling.
Is your suggestion that the _upgrade
function has no default implementation, i.e., is custom? Another approach would be that, whenever in future a new incompatibility arises in version x.y, add a new method _upgrade_to_xy
, whose job is to upgrade from x.(y-1) to x.y, and have a default _upgrade
method, that would automatically call an appropriate sequence of the _upgrade_to_ab
methods to come from version foo.bar to version x.y
But it seems to me that the __reduce__
protocol is also very commonly used. It would perhaps make sense to automatically add version information in a default implementation of __reduce__
. Of course, something is different: The first return value of __reduce__
is an unpickling function, which is custom. And to solve incompatibilities between different versions, there probably is no way around fixing each individual custom unpickling function. Or do you see a way?
comment:12 follow-up: 13 Changed 3 years ago by
Replying to vdelecroix:
Erik, this is actually a wrong statement. From the pickle documentation
The pickle serialization format is guaranteed to be backwards compatible across Python releases provided a compatible pickle protocol is chosen and pickling and unpickling code deals with Python 2 to Python 3 type differences if your data is crossing that unique breaking change language boundary.
If Python makes that promise then I believe it really is a bug (and not just an oddity) that Python-3 cannot unpickle some strings that were pickled with Python-2. See #28444.
EDIT: "cannot" means that one needs to add the new keyword encoding="bytes"
to successfully unpickle. But without the keyword, it fails.
comment:13 follow-up: 14 Changed 3 years ago by
Replying to SimonKing:
Replying to vdelecroix:
Erik, this is actually a wrong statement. From the pickle documentation
The pickle serialization format is guaranteed to be backwards compatible across Python releases provided a compatible pickle protocol is chosen and pickling and unpickling code deals with Python 2 to Python 3 type differences if your data is crossing that unique breaking change language boundary.If Python makes that promise then I believe it really is a bug (and not just an oddity) that Python-3 cannot unpickle some strings that were pickled with Python-2. See #28444.
EDIT: "cannot" means that one needs to add the new keyword
encoding="bytes"
to successfully unpickle. But without the keyword, it fails.
It is indeed not a bug. The promise is "up to pickling protocol" and "Python2/Python3 datastructure differences".
comment:14 Changed 3 years ago by
Replying to vdelecroix:
It is indeed not a bug. The promise is "up to pickling protocol" and "Python2/Python3 datastructure differences".
Well, I was told that the Python 2 str datastructure corresponds to the Python 3 bytes datastructure. The problem I try to solve in #28444 is exactly that Python 3 CHANGES the datastructure found in a pickle and turns it into a different datastructure that happens to have the same name (namely str, which in Python 3 is something else than in Python 2).
comment:15 Changed 3 years ago by
Replying to SimonKing:
Sorry, I first wanted to let off some steam, as there recently were some comments in sage-support that I find rather discouraging. I understood these comments as: "Don't use Sage if you want to create a mathematical database", and I believe that's a very unhealthy attitude towards users. Also I suffered, too, from a backwards incompatibility between pickling strings in Python-2 and Python-3.
As it is, the current save/load
based on pickling is not provisioned to work across versions. But I think we should and could make efforts to make it so.
Concerning your proposal: If I understand correctly, it fits into the often-used approach to provide a reasonable default for double-underscore magical methods, and implement the details by a corresponding single-underscore method. So, that's sound. And it would help in those cases in which
__getstate__/__setstate__
is used for pickling.Is your suggestion that the
_upgrade
function has no default implementation, i.e., is custom?
_upgrade
would be provided for a class of objects only when there is a change in the internal data structure of the class and thus unpickling breaks. The developer who is responsible of the breakage should implement the _upgrade
function.
What could be "default" for _upgrade
? I don't understand.
Another approach would be that, whenever in future a new incompatibility arises in version x.y, add a new method
_upgrade_to_xy
, whose job is to upgrade from x.(y-1) to x.y, and have a default_upgrade
method, that would automatically call an appropriate sequence of the_upgrade_to_ab
methods to come from version foo.bar to version x.y
That is done in the _upgrade
function. See if version < ...: version ...
statements. The sequence of if
statements would upgrade the object in steps ultimately to the present version.
But it seems to me that the
__reduce__
protocol is also very commonly used. It would perhaps make sense to automatically add version information in a default implementation of__reduce__
. Of course, something is different: The first return value of__reduce__
is an unpickling function, which is custom. And to solve incompatibilities between different versions, there probably is no way around fixing each individual custom unpickling function. Or do you see a way?
For those classes of objects with custom __reduce__
, perhaps we should customize the __reduce__
method further to incorporate the upgrading mechanism. But I didn't think about these details. Also I don't think classes with custom __reduce__
is common. What is common is those UniqueRepresentation
classes. But for them, there is just one implementation of __reduce__
.
Changed 3 years ago by
Attachment: | demo_yaml.ipynb added |
---|
How can YAML be used for long-term data storage with Sage?
comment:16 Changed 3 years ago by
Replying to klee:
Replying to SimonKing:
Provide a conversion function that turns the data format of version x.y into the data format of version x.(y+1), so that the user can access old stored data version x.y by repeated application of such functions, and then use them in version x.(y+15).
So, in the beginning of this thread, I have proposed an upgrade mechanism of sage objects from a version to another version of sage. I modified it further. The idea is to add into
sage_object.pyx
something likedef __getstate__(self): if hasattr(self, '_getstate_'): state = self._getstate_() elif hasattr(self, '__dict__'): state = self.__dict__.copy() else: state = dict() from sage.env import SAGE_VERSION v = SAGE_VERSION.split('.') major = int(v[0]) minor = int(v[1]) state['sage_version'] = major * 100 + minor return state def __setstate__(self, state): if 'sage_version' in state: version = state['sage_version'] del state['sage_version'] else: version = None if hasattr(self, '_setstate_'): self._setstate_(state) elif hasattr(self, '__dict__'): self.__dict__.update(state) else: self.__dict__ = state if version is not None and hasattr(self, '_upgrade'): self._upgrade(from_version=version) def _upgrade(self, from_version): version = from_version if version < 800: # upgrade to version 801 self._upgradable = True # now self is an object of Sage version 8.1 version = 800 if version < 809: # upgrade to version 902 del self._upgradable # now self is an object of Sage version 9.2 version = 809I wonder why no one makes a comment on the idea... Is it a bad idea?
I'm sorry for that, as well. It is in no way a bad idea. But I have some doubts, that pickling could become robust against version upgrades in principle. In my point of view it looks like an raw egg and the question is whether to just increase its protection or take into account the option to boil it before transportation.
One problem I see is that pickling serializes all related instances independent whether they are simple structured or not. Thus the chance that changes in data structure will harm pickling is really high. Your suggestion would heal such violations, if they where detected. But how can you guarantee that developers and reviewers of all such harmful changes will notice that they should implement or adapt an _upgrade
method?
I made some tests on how an option to boil the egg could look like (using your idea, as well) and documented this in a jupyter notebook I've attached to that ticket.
I think we should do both: Make pickling more save and think about another format!
Changed 3 years ago by
Attachment: | demo_yaml.html added |
---|
This is demo_yaml in html-format (produced with older Sage version 8.1)
comment:17 Changed 3 years ago by
For convenience I added the demo_yaml in a directly readable format
The link to the manual does not work.
I doubt save/load is supposed to work across different Sage versions.
The gist of the problem is whether a pickled object from an old Sage version can be unpickled and work in current Sage version. Python's pickling protocol does not guarantee this.
I think this problem is not directly related with whether we maintain pickle jar or not.
A sage object is tied to a specific Sage version. Hence if we want an object pickled in old Sage can be unpickled and working in current Sage, the object should be pickled with Sage version information and in unpickling, should be upgraded incrementally to current Sage.
I think what Sage lacks is this upgrading mechanism for objects pickled in old Sage version. Pickle jar would be necessary to check whether this upgrading mechanism is functioning.
Let me show my idea. The upgrading mechanism is coded to
sage_object
class:and this is written to the class of objects that needs to be upgraded from old Sage:
But I guess the real solution would be far more complex, but this would be start.