Opened 5 months ago

Closed 4 months ago

Last modified 3 months ago

#28444 closed defect (fixed)

Fix backwards incompatibility of unpickling in Python 3

Reported by: SimonKing Owned by:
Priority: blocker Milestone: sage-8.9
Component: python3 Keywords: unpickling UnicodeError backwards compatibility
Cc: Merged in:
Authors: Simon King Reviewers: Nils Bruin
Report Upstream: N/A Work issues:
Branch: d7f170f (Commits) Commit:
Dependencies: Stopgaps:

Description (last modified by SimonKing)

EDIT: In the original ticket description, I stated: "I believe that a backwards incompatible change of pickling is a blocker for Python-3 support." In that (and ONLY in that) sense I believe this ticket is a blocker. I replaced the original ticket description by something that I wrote in a comment, because now I have a much smaller example, and moreover pickles of the same object created with Python-3 and with Python-2, so that one can compare.

The following examples require the optional meataxe package, but I am not sure yet if meataxe is to blame or Python-3 (I hope it is the former, because I guess it would be more easy to fix).

attachment:Py2.sobj​ and attachment:Py3.sobj​ result in the following behaviour in Python-3

sage: load('/home/king/Projekte/coho/tests/Py2.sobj')
---------------------------------------------------------------------------
UnicodeDecodeError                        Traceback (most recent call last)
<ipython-input-3-5705b555470a> in <module>()
----> 1 load('/home/king/Projekte/coho/tests/Py2.sobj')

/home/king/Sage/git/py3/local/lib/python3.7/site-packages/sage/misc/persist.pyx in sage.misc.persist.load (build/cythonized/sage/misc/persist.c:2824)()
    149 
    150     ## Load file by absolute filename
--> 151     with open(filename, 'rb') as fobj:
    152         X = loads(fobj.read(), compress=compress)
    153     try:

/home/king/Sage/git/py3/local/lib/python3.7/site-packages/sage/misc/persist.pyx in sage.misc.persist.load (build/cythonized/sage/misc/persist.c:2774)()
    150     ## Load file by absolute filename
    151     with open(filename, 'rb') as fobj:
--> 152         X = loads(fobj.read(), compress=compress)
    153     try:
    154         X._default_filename = os.path.abspath(filename)

/home/king/Sage/git/py3/local/lib/python3.7/site-packages/sage/misc/persist.pyx in sage.misc.persist.loads (build/cythonized/sage/misc/persist.c:7270)()
    967 
    968     unpickler = SageUnpickler(io.BytesIO(s))
--> 969     return unpickler.load()
    970 
    971 

UnicodeDecodeError: 'ascii' codec can't decode byte 0x80 in position 0: ordinal not in range(128)
sage: load('/home/king/Projekte/coho/tests/Py3.sobj')
[1 0 0 0 0 0 0 0]
[0 0 0 1 1 1 1 1]

and in Python-2

sage: load('/home/king/Projekte/coho/tests/Py2.sobj')
[1 0 0 0 0 0 0 0]
[0 0 0 1 1 1 1 1]
sage: load('/home/king/Projekte/coho/tests/Py3.sobj')
[1 0 0 0 0 0 0 0]
[0 0 0 1 1 1 1 1]
sage: __ == _
True

So, the Python-3 pickle can be unpickled in Python-2, but not the other way around. What is the problem?

Attachments (3)

State.sobj (1.1 KB) - added by SimonKing 5 months ago.
File that cannot be unpickled in Python-3
Py2.sobj (259 bytes) - added by SimonKing 5 months ago.
Pickle of MeatAxe? in Python-2
Py3.sobj (324 bytes) - added by SimonKing 5 months ago.
Pickle of MeaAxe? matrix created with Python-3

Download all attachments as: .zip

Change History (91)

Changed 5 months ago by SimonKing

File that cannot be unpickled in Python-3

comment:1 follow-ups: Changed 5 months ago by nbruin

Can Sage/Py3 produce the pickle? In that case, you could compare the produced pickles to see how far apart they are. Of course, if an ASCII decoder encounters 0x80 it's justified to not decode it, so it might be interesting to see what py3 makes from it itself. My guess would be that the bytestring should NOT be decoded by ascii, but something else. Perhaps unpickle can be configured to use a different decoder. But it would be good to see what generates the non-ascii symbol and what its meaning is.

comment:2 in reply to: ↑ 1 Changed 5 months ago by SimonKing

Replying to nbruin:

Of course, if an ASCII decoder encounters 0x80 it's justified to not decode it

Then the same should hold for Python-2. It doesn't. Hence, it shouldn't hold for Python-3 either.

Last edited 5 months ago by SimonKing (previous) (diff)

comment:3 in reply to: ↑ 1 Changed 5 months ago by SimonKing

Replying to nbruin:

Can Sage/Py3 produce the pickle?

The problem is that the pickle comes from an old version of an optional Sage package. That's why I use the "unpickle override". But I'll see what I can do.

comment:4 Changed 5 months ago by SimonKing

The new attachment was created with Python-2 and can be used without "unpickle override", but it requires the optional meataxe spkg.

Changed 5 months ago by SimonKing

Pickle of MeatAxe? in Python-2

Changed 5 months ago by SimonKing

Pickle of MeaAxe? matrix created with Python-3

comment:5 Changed 5 months ago by SimonKing

I think now I have a very small example. It uses the optional meataxe package. It would be a good news if actually the meataxe wrapper was to blame for the unpickling problem --- but I am not expert enough to tell whether (1) it is the case and (2) how it could be fixed (if it was the case).

Anyway. I have a new version of attachment:Py2.sobj, and a new attachment:Py3.sobj. It results in the following behaviour in Python-3

sage: load('/home/king/Projekte/coho/tests/Py2.sobj')
---------------------------------------------------------------------------
UnicodeDecodeError                        Traceback (most recent call last)
<ipython-input-3-5705b555470a> in <module>()
----> 1 load('/home/king/Projekte/coho/tests/Py2.sobj')

/home/king/Sage/git/py3/local/lib/python3.7/site-packages/sage/misc/persist.pyx in sage.misc.persist.load (build/cythonized/sage/misc/persist.c:2824)()
    149 
    150     ## Load file by absolute filename
--> 151     with open(filename, 'rb') as fobj:
    152         X = loads(fobj.read(), compress=compress)
    153     try:

/home/king/Sage/git/py3/local/lib/python3.7/site-packages/sage/misc/persist.pyx in sage.misc.persist.load (build/cythonized/sage/misc/persist.c:2774)()
    150     ## Load file by absolute filename
    151     with open(filename, 'rb') as fobj:
--> 152         X = loads(fobj.read(), compress=compress)
    153     try:
    154         X._default_filename = os.path.abspath(filename)

/home/king/Sage/git/py3/local/lib/python3.7/site-packages/sage/misc/persist.pyx in sage.misc.persist.loads (build/cythonized/sage/misc/persist.c:7270)()
    967 
    968     unpickler = SageUnpickler(io.BytesIO(s))
--> 969     return unpickler.load()
    970 
    971 

UnicodeDecodeError: 'ascii' codec can't decode byte 0x80 in position 0: ordinal not in range(128)
sage: load('/home/king/Projekte/coho/tests/Py3.sobj')
[1 0 0 0 0 0 0 0]
[0 0 0 1 1 1 1 1]

and in Python-2

sage: load('/home/king/Projekte/coho/tests/Py2.sobj')
[1 0 0 0 0 0 0 0]
[0 0 0 1 1 1 1 1]
sage: load('/home/king/Projekte/coho/tests/Py3.sobj')
[1 0 0 0 0 0 0 0]
[0 0 0 1 1 1 1 1]
sage: __ == _
True

So, the Python-3 pickle can be unpickled in Python-2, but not the other way around. What is the problem?

Last edited 5 months ago by SimonKing (previous) (diff)

comment:6 Changed 5 months ago by SimonKing

Note that the Python-3 pickle is as much as 25% larger than the Python-2 pickle. Is that regression typical?

comment:7 Changed 5 months ago by SimonKing

  • Description modified (diff)
  • Keywords meataxe added

comment:8 follow-up: Changed 5 months ago by chapoton

This is in no way a blocker, IMHO.

comment:9 in reply to: ↑ 8 Changed 5 months ago by SimonKing

Replying to chapoton:

This is in no way a blocker, IMHO.

If it is due to meataxe (an optional package), then it is not a blocker. If it is due to the upcoming switch to Python-3, then IMHO it is a blocker to that switch (not to a Python-2 version of Sage, though). Since currently it isn't clear if the example reveals a problem in Python-3 or not, I'd say better safe than sorry.

comment:10 follow-up: Changed 5 months ago by chapoton

So we agree that this is not a blocker for the upcoming 8.9 release, still py2. This is the usual meaning of blocker. But in this time of transition, we must be clearer about what blocker means.

comment:11 in reply to: ↑ 10 Changed 5 months ago by SimonKing

  • Description modified (diff)

Replying to chapoton:

So we agree that this is not a blocker for the upcoming 8.9 release, still py2. This is the usual meaning of blocker. But in this time of transition, we must be clearer about what blocker means.

In the original ticket description, I told in what sense I believe it was a blocker, but I somehow deleted that clarification when I changed the ticket description. Now, the statement is back, at the top of the ticket description.

comment:12 follow-up: Changed 5 months ago by chapoton

could you provide the needed register unpickle override ?

Trying to load with py3, I stumble on

ModuleNotFoundError: No module named 'sage.matrix.matrix_gfpn_dense'

comment:13 in reply to: ↑ 12 Changed 5 months ago by SimonKing

Replying to chapoton:

could you provide the needed register unpickle override ?

Trying to load with py3, I stumble on

ModuleNotFoundError: No module named 'sage.matrix.matrix_gfpn_dense'

As stated in the ticket description, the pickle is supposed to be loadable with the optional meataxe package (followed by sage -b) installed. However, I believe (but cannot test it, as I do not have Sage without meataxe) that the following register unpickle override would work:

sage: def unpickler(*args, **kwds):
....:     return None
....: 
sage: register_unpickle_override('sage.matrix.matrix_gfpn_dense', 'mtx_unpickle', unpickler)
sage: load('/home/king/Projekte/coho/tests/Py2.sobj')

I suppose the "load()" command will return None in Python-2 but result in an error with Python-3 (even without meataxe installed).

Last edited 5 months ago by SimonKing (previous) (diff)

comment:14 follow-up: Changed 5 months ago by chapoton

using your empty unpickler, I still get the same ModuleNotFoundError

Last edited 5 months ago by chapoton (previous) (diff)

comment:15 in reply to: ↑ 14 Changed 5 months ago by SimonKing

Replying to chapoton:

using you empty unpickler, I still get the same ModuleNotFoundError

I guess the matrix space is a matrix space with implementation=meataxe, and the parent of a matrix is also part of the pickle.

So, we'll override unpickling the matrix space as well:

sage: def MS_unpickler(*args, **kwds):
....:     return MatrixSpace(*(args[:4]),**kwds)
....: 
sage: def mtx_unpickler(*args, **kwds):
....:     return None
....: 
sage: register_unpickle_override('sage.matrix.matrix_space', 'MatrixSpace', MS_unpickler)
sage: register_unpickle_override('sage.matrix.matrix_gfpn_dense', 'mtx_unpickle', mtx_unpickler)
sage: load('/home/king/Projekte/coho/tests/Py2.sobj')

comment:16 follow-ups: Changed 5 months ago by chapoton

Same thing with the new unpicklers proposal. I tried something else:

sage: explain_pickle(open('Py2.sobj', 'rb').read())
pg_mtx_unpickle = unpickle_global('sage.matrix.matrix_gfpn_dense', 'mtx_unpickle')
pg_unreduce = unpickle_global('sage.structure.unique_representation', 'unreduce')
pg_MatrixSpace = unpickle_global('sage.matrix.matrix_space', 'MatrixSpace')
pg_generic_factory_unpickle = unpickle_global('sage.structure.factory', 'generic_factory_unpickle')
pg_lookup_global = unpickle_global('sage.structure.factory', 'lookup_global')
pg_make_integer = unpickle_global('sage.rings.integer', 'make_integer')
si = pg_make_integer('2')
pg_Matrix_gfpn_dense = unpickle_global('sage.matrix.matrix_gfpn_dense', 'Matrix_gfpn_dense')
pg_mtx_unpickle(pg_unreduce(pg_MatrixSpace, (pg_generic_factory_unpickle(pg_lookup_global('FiniteField'), (8r, 9r, 'beta8'), (si, ('x',), None, 'modn', si, pg_make_integer('1'), True, None, None, None), {}), 2r, 8r, False, pg_Matrix_gfpn_dense), {}), 2r, 8r, '\x80\x1f', True)

sage: explain_pickle(open('Py3.sobj', 'rb').read())
pg_mtx_unpickle = unpickle_global('sage.matrix.matrix_gfpn_dense', 'mtx_unpickle')
pg_unreduce = unpickle_global('sage.structure.unique_representation', 'unreduce')
pg_MatrixSpace = unpickle_global('sage.matrix.matrix_space', 'MatrixSpace')
pg_generic_factory_unpickle = unpickle_global('sage.structure.factory', 'generic_factory_unpickle')
pg_lookup_global = unpickle_global('sage.structure.factory', 'lookup_global')
pg_make_integer = unpickle_global('sage.rings.integer', 'make_integer')
si = pg_make_integer('2')
pg_Matrix_gfpn_dense = unpickle_global('sage.matrix.matrix_gfpn_dense', 'Matrix_gfpn_dense')
pg_encode = unpickle_global('_codecs', 'encode')
pg_mtx_unpickle(pg_unreduce(pg_MatrixSpace, (pg_generic_factory_unpickle(pg_lookup_global('FiniteField'), (8r, 9r, 'beta8'), (si, ('x',), None, 'modn', si, pg_make_integer('1'), True, None, None, None), {}), 2r, 8r, False, pg_Matrix_gfpn_dense), {}), 2r, 8r, pg_encode('\x80\x1f', 'latin1'), True)
Last edited 5 months ago by chapoton (previous) (diff)

comment:17 in reply to: ↑ 16 Changed 5 months ago by SimonKing

Replying to chapoton:

Same thing with the new unpicklers proposal.

Can you try to add another line:

sage: register_unpickle_override('sage.matrix.matrix_gfpn_dense', 'Matrix_gfpn_dense', type)

comment:18 in reply to: ↑ 16 Changed 5 months ago by SimonKing

Replying to chapoton:

I tried something else:

sage: explain_pickle(open('Py2.sobj', 'rb').read())
pg_mtx_unpickle = unpickle_global('sage.matrix.matrix_gfpn_dense', 'mtx_unpickle')
pg_unreduce = unpickle_global('sage.structure.unique_representation', 'unreduce')
pg_MatrixSpace = unpickle_global('sage.matrix.matrix_space', 'MatrixSpace')
pg_generic_factory_unpickle = unpickle_global('sage.structure.factory', 'generic_factory_unpickle')
pg_lookup_global = unpickle_global('sage.structure.factory', 'lookup_global')
pg_make_integer = unpickle_global('sage.rings.integer', 'make_integer')
si = pg_make_integer('2')
pg_Matrix_gfpn_dense = unpickle_global('sage.matrix.matrix_gfpn_dense', 'Matrix_gfpn_dense')
pg_mtx_unpickle(pg_unreduce(pg_MatrixSpace, (pg_generic_factory_unpickle(pg_lookup_global('FiniteField'), (8r, 9r, 'beta8'), (si, ('x',), None, 'modn', si, pg_make_integer('1'), True, None, None, None), {}), 2r, 8r, False, pg_Matrix_gfpn_dense), {}), 2r, 8r, '\x80\x1f', True)

sage: explain_pickle(open('Py3.sobj', 'rb').read())
pg_mtx_unpickle = unpickle_global('sage.matrix.matrix_gfpn_dense', 'mtx_unpickle')
pg_unreduce = unpickle_global('sage.structure.unique_representation', 'unreduce')
pg_MatrixSpace = unpickle_global('sage.matrix.matrix_space', 'MatrixSpace')
pg_generic_factory_unpickle = unpickle_global('sage.structure.factory', 'generic_factory_unpickle')
pg_lookup_global = unpickle_global('sage.structure.factory', 'lookup_global')
pg_make_integer = unpickle_global('sage.rings.integer', 'make_integer')
si = pg_make_integer('2')
pg_Matrix_gfpn_dense = unpickle_global('sage.matrix.matrix_gfpn_dense', 'Matrix_gfpn_dense')
pg_encode = unpickle_global('_codecs', 'encode')
pg_mtx_unpickle(pg_unreduce(pg_MatrixSpace, (pg_generic_factory_unpickle(pg_lookup_global('FiniteField'), (8r, 9r, 'beta8'), (si, ('x',), None, 'modn', si, pg_make_integer('1'), True, None, None, None), {}), 2r, 8r, False, pg_Matrix_gfpn_dense), {}), 2r, 8r, pg_encode('\x80\x1f', 'latin1'), True)

Do I see correctly that the only difference is the line

pg_encode = unpickle_global('_codecs', 'encode')

that is in Python-3 but not in Python-2? What are the implications?

comment:19 follow-up: Changed 5 months ago by chapoton

There is another difference at the end of the last line of the explain pickles, that contains 0x80..

With the third "register_unpickle, I now get

sage: load('Py3.sobj')

works ie returns None. And

sage: load('Py2.sobj')
---------------------------------------------------------------------------
UnicodeDecodeError                        Traceback (most recent call last)
<ipython-input-38-45fcfa858fc2> in <module>()
----> 1 load('Py2.sobj')

/home/chapoton/sage3/local/lib/python3.7/site-packages/sage/misc/persist.pyx in sage.misc.persist.load (build/cythonized/sage/misc/persist.c:2824)()
    149 
    150     ## Load file by absolute filename
--> 151     with open(filename, 'rb') as fobj:
    152         X = loads(fobj.read(), compress=compress)
    153     try:

/home/chapoton/sage3/local/lib/python3.7/site-packages/sage/misc/persist.pyx in sage.misc.persist.load (build/cythonized/sage/misc/persist.c:2774)()
    150     ## Load file by absolute filename
    151     with open(filename, 'rb') as fobj:
--> 152         X = loads(fobj.read(), compress=compress)
    153     try:
    154         X._default_filename = os.path.abspath(filename)

/home/chapoton/sage3/local/lib/python3.7/site-packages/sage/misc/persist.pyx in sage.misc.persist.loads (build/cythonized/sage/misc/persist.c:7270)()
    967 
    968     unpickler = SageUnpickler(io.BytesIO(s))
--> 969     return unpickler.load()
    970 
    971 

UnicodeDecodeError: 'ascii' codec can't decode byte 0x80 in position 0: ordinal not in range(128)

comment:20 in reply to: ↑ 19 Changed 5 months ago by SimonKing

Replying to chapoton:

There is another difference at the end of the last line of the explain pickles, that contains 0x80..

Right.

Do I understand correctly, that the string with 0x80 is swallowed without problem by Python-2, both with and without explicit encoding, while Python-3 will swallow it ONLY with an explicit encoding?

Then: How to make it so that temporarily the unpickler assumes a default encoding?

With the third "register_unpickle, I now get ...

Hooray! So, it is now possible to investigate the core problem.

comment:21 Changed 5 months ago by SimonKing

Aha! I guess the problem is indeed that the string passed to the matrix constructor is supposed to be interpreted as bytes. Namely:

sage: pg_mtx_unpickle = unpickle_global('sage.matrix.matrix_gfpn_dense', 'mtx_unpickle')
....: pg_unreduce = unpickle_global('sage.structure.unique_representation', 'unreduce')
....: pg_MatrixSpace = unpickle_global('sage.matrix.matrix_space', 'MatrixSpace')
....: pg_generic_factory_unpickle = unpickle_global('sage.structure.factory', 'generic_factory_unpickle')
....: pg_lookup_global = unpickle_global('sage.structure.factory', 'lookup_global')
....: pg_make_integer = unpickle_global('sage.rings.integer', 'make_integer')
....: si = pg_make_integer('2')
....: pg_Matrix_gfpn_dense = unpickle_global('sage.matrix.matrix_gfpn_dense', 'Matrix_gfpn_dense')
sage: pg_mtx_unpickle(pg_unreduce(pg_MatrixSpace, (pg_generic_factory_unpickle(pg_lookup_global('FiniteField
....: '), (8r, 9r, 'beta8'), (si, ('x',), None, 'modn', si, pg_make_integer('1'), True, None, None, None), {
....: }), 2r, 8r, False, pg_Matrix_gfpn_dense), {}), 2r, 8r, '\x80\x1f', True)
....: 
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-12-5506f597954b> in <module>()
----> 1 pg_mtx_unpickle(pg_unreduce(pg_MatrixSpace, (pg_generic_factory_unpickle(pg_lookup_global('FiniteField'), (8, 9, 'beta8'), (si, ('x',), None, 'modn', si, pg_make_integer('1'), True, None, None, None), {}), 2, 8, False, pg_Matrix_gfpn_dense), {}), 2, 8, '\x80\x1f', True)

TypeError: Argument 'Data' has incorrect type (expected bytes, got str)
sage: pg_mtx_unpickle(pg_unreduce(pg_MatrixSpace, (pg_generic_factory_unpickle(pg_lookup_global('FiniteField
....: '), (8r, 9r, 'beta8'), (si, ('x',), None, 'modn', si, pg_make_integer('1'), True, None, None, None), {
....: }), 2r, 8r, False, pg_Matrix_gfpn_dense), {}), 2r, 8r, b'\x80\x1f', True)
....: 
[1 0 0 0 0 0 0 0]
[0 0 0 1 1 1 1 1]

(of course, this is with meataxe installed).

So, is it possible to automatically load a Python-2 string as a Python-3 bytes, as if one simply puts a b in front of the string (that's what I did in the last line of the above example)?

comment:22 Changed 5 months ago by chapoton

Happy to see progress. But sorry, I will now turn offline..

comment:23 follow-up: Changed 5 months ago by SimonKing

Everything boils down to the following:

  • In Python-2, do
    sage: X = '\x80\x1f'
    sage: save(X, 'Py2_string.sobj')
    
  • In Python-3, you'll get
    sage: load('/home/king/Projekte/coho/tests/Py2_string.sobj')
    Traceback (most recent call last):
    ...
    UnicodeDecodeError: 'ascii' codec can't decode byte 0x80 in position 0: ordinal not in range(128)
    

So, it is really the question: How can I (temporarily) force Python-3 to interpret a pickled string as bytes?

comment:24 in reply to: ↑ 23 ; follow-ups: Changed 5 months ago by nbruin

Replying to SimonKing:

So, it is really the question: How can I (temporarily) force Python-3 to interpret a pickled string as bytes?

It looks like other people have run into this problem:

https://stackoverflow.com/questions/28218466/unpickling-a-python-2-object-with-python-3

comment:25 in reply to: ↑ 24 ; follow-up: Changed 5 months ago by jhpalmieri

Replying to nbruin:

Replying to SimonKing:

So, it is really the question: How can I (temporarily) force Python-3 to interpret a pickled string as bytes?

It looks like other people have run into this problem:

https://stackoverflow.com/questions/28218466/unpickling-a-python-2-object-with-python-3

I was wondering about one of the approaches mentioned there: within Python 2, unpickle the data, then save it in a format which can be read by Python 3, or ideally, read by both Python 2 and 3.

comment:26 in reply to: ↑ 25 ; follow-up: Changed 5 months ago by nbruin

Replying to jhpalmieri:

I was wondering about one of the approaches mentioned there: within Python 2, unpickle the data, then save it in a format which can be read by Python 3, or ideally, read by both Python 2 and 3.

I think the main point here is to be able to solve it on the Py3 side: have a *BACKWARD* compatible solution. If we can work on both sides, there are plenty of work-arounds, for instance, write the data as JSON and parse that on the other side.

That said, I think this is another example that shows that we should be serious about keeping VMs of a reasonable spread of sage versions -- people might end up needing them for data archaeology. (do we do this already? I know VMs get produced for a lot of releases; archiving them would just be a matter of resources.

comment:27 in reply to: ↑ 26 ; follow-up: Changed 5 months ago by jhpalmieri

Replying to nbruin:

Replying to jhpalmieri:

I was wondering about one of the approaches mentioned there: within Python 2, unpickle the data, then save it in a format which can be read by Python 3, or ideally, read by both Python 2 and 3.

I think the main point here is to be able to solve it on the Py3 side: have a *BACKWARD* compatible solution.

Right, sorry, I was thinking about the immediate problem of the data in the p-group cohomology package. Simon, maybe it's the right time to change the format of the saved data?

comment:28 in reply to: ↑ 27 ; follow-up: Changed 5 months ago by SimonKing

Replying to jhpalmieri:

Replying to nbruin:

I think the main point here is to be able to solve it on the Py3 side: have a *BACKWARD* compatible solution.

Indeed, that's what I want to achieve here.

Right, sorry, I was thinking about the immediate problem of the data in the p-group cohomology package. Simon, maybe it's the right time to change the format of the saved data?

As I have demonstrated above, a pickle created with Python-3 can be read both with Python-2 and Python-3. So, that side of the problem isn't really urgent for the p_group_cohomology package, I think.

However, I'd like to understand

  • why the Python-3 pickle is 25% larger than the Python-2 pickle, and
  • why pickling bytes in Python-3 is apparently more involved than pickling a string, even though to my understanding a string is something more complicated, as it needs to be interpreted by some encoding. That's to say: Why is b'\x80\x1f' not pickled as b'\x80\x1f' but as
    pg_encode = unpickle_global('_codecs', 'encode')
    pg_encode('\x80\x1f', 'latin1')
    
    My expectation actually was that bytes is a more basic data type than str, as the latter needs to know how it is interpreted (utf-8? isolatin-1? etc.)

comment:29 in reply to: ↑ 24 Changed 5 months ago by SimonKing

Replying to nbruin:

It looks like other people have run into this problem:

https://stackoverflow.com/questions/28218466/unpickling-a-python-2-object-with-python-3

Thank you! The recommended solution is

with open(mshelffile, 'rb') as f:
    d = pickle.load(f, encoding='bytes')

However, Sage's load apparently doesn't know about the encoding keyword. So, I suggest the solution to add an encoding keyword to appropriate places in sage.misc.persist; that keyword should just be ignored in Python-2 (as apparently Python-2 doesn't support it) and passed to the pickle module in Python-3.

If you agree that the approach makes sense, I'll change the ticket description accordingly.

comment:30 Changed 5 months ago by SimonKing

  • Keywords backwards compatibility added; meataxe removed

I just checked: When I pickle.dump('\x80\x1f', <file>) with either Python-2 or Python-3, then I can pickle.load(<file>, encoding=bytes) in Python-3 and can pickle.load(<file>) in Python-2. So, that looks like backwards and forwards compatibility to me.

Also we see in the stackoverflow thread that the problem really isn't related with our optional meataxe or p_groupcohomology modules. Therefore I remove the keyword from the ticket description.

comment:31 follow-up: Changed 5 months ago by chapoton

I would also suggest to understand where this non-ascii string comes from in your example pickles, and fix the responsible of that to use instead unicode strings. This '\x80\x1f' may be related to the empty-set symbol, but I am not sure.

comment:32 in reply to: ↑ 31 Changed 5 months ago by SimonKing

Replying to chapoton:

I would also suggest to understand where this non-ascii string comes from in your example pickles, and fix the responsible of that to use instead unicode strings.

This already is understood. '\x80\x1f' is bytes corresponding to the data of some meataxe matrix. It really *is* a bytes (see line 528 in src/sage/matrix/matrix_gfpn_dense.pyx! It SHOULDN'T be a unicode string. And the problem is that Python-3 tries to erroneously read it as unicode string, when it encounters it in a Python-2 pickle.

Ostensibly, it is the case that Python-2 str corresponds to Python-3 bytes, and Python-2 unicode corresponds to Python-3 str. But apparently Python-3 tries to unpickle a Python-2 str as unicode, and that's a bug (in Python, not in Sage), IMHO. Example:

king@klap:~$ ~/Sage/git/sage/sage -python
Python 2.7.15 (default, Jul 26 2019, 11:49:43) 
[GCC 5.4.0 20160609] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import pickle
>>> with open('bla', 'wb') as f:
...     pickle.dump(('\x80\x1f', u'\x80\x1f'), f)
... 
>>> 
king@klap:~$ ~/Sage/git/py3/sage -python
Python 3.7.3 (default, Aug 27 2019, 23:22:23) 
[GCC 5.4.0 20160609] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import pickle
>>> with open('bla', 'rb') as f:
...     pickle.load(f)
... 
Traceback (most recent call last):
  File "<stdin>", line 2, in <module>
UnicodeDecodeError: 'ascii' codec can't decode byte 0x80 in position 0: ordinal not in range(128)
>>> with open('bla', 'rb') as f:
...     X = pickle.load(f, encoding='bytes')
... 
>>> X
(b'\x80\x1f', '\x80\x1f')

Note that according to the stackoverflow thread, the same problem would also arise when you pickle a Python-2 dictionary whose keys are Python-2 strings, and then unpickle in Python-3. It isn't an exotic problem that is in any way caused by a Sage (optional) package.

comment:33 Changed 5 months ago by SimonKing

PS:

king@klap:~$ ~/Sage/git/py3/sage -python
Python 3.7.3 (default, Aug 27 2019, 23:22:23) 
[GCC 5.4.0 20160609] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import pickle
>>> with open('bla', 'wb') as f:
...     pickle.dump(('\x80\x1f', b'\x80\x1f'), f)
... 
>>> 
king@klap:~$ ~/Sage/git/py3/sage -python
Python 3.7.3 (default, Aug 27 2019, 23:22:23) 
[GCC 5.4.0 20160609] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import pickle
>>> with open('bla', 'rb') as f:
...     X = pickle.load(f)
... 
>>> X
('\x80\x1f', b'\x80\x1f')
>>> with open('bla', 'rb') as f:
...     X = pickle.load(f, encoding='bytes')
... 
>>> X
('\x80\x1f', b'\x80\x1f')

So, the types of a Python-3 pickle are still correct when unpickling them with the encode='bytes' keyword.

comment:34 Changed 5 months ago by SimonKing

Argh. It is of course possible to pass the encoding keyword to the pickle module. But then, things fail because of things like

    if '.' in name:
        module, name = name.rsplit('.', 1)
        all = __import__(module, fromlist=[name])

in sage.structure.factory.unpickle_global: name should be a string, but is bytes when using encoding='bytes' (whereas '.' still is a string).

comment:35 follow-up: Changed 5 months ago by jhpalmieri

Does the function bytes_to_str (from sage.cpython.string) help? The docstring:

    Convert ``bytes`` to ``str``.

    On Python 2 this is a no-op since ``bytes is str``.  On Python 3
    this decodes the given ``bytes`` to a Python 3 unicode ``str`` using
    the specified encoding.

You could apply that to name.

comment:36 in reply to: ↑ 35 ; follow-up: Changed 5 months ago by SimonKing

Replying to jhpalmieri:

Does the function bytes_to_str (from sage.cpython.string) help?

Yes, but currently it seems that I am just opening a can of worms. The error that I'm getting next:

/home/king/Sage/git/py3/local/lib/python3.7/site-packages/sage/rings/integer.pyx in sage.rings.integer.make_integer (build/cythonized/sage/rings/integer.c:43277)()
   7220     """
   7221     cdef Integer r = PY_NEW(Integer)
-> 7222     mpz_set_str(r.value, str_to_bytes(s), 32)
   7223     return r
   7224 

/home/king/Sage/git/py3/local/lib/python3.7/site-packages/sage/cpython/string.pxd in sage.cpython.string.str_to_bytes (build/cythonized/sage/rings/integer.c:47648)()
     90     # compile-time variable. We keep the Cython wrapper to deal with
     91     # the default arguments.
---> 92     return _str_to_bytes(s, encoding, errors)

TypeError: expected str, bytes found

So, summary: In many places, python-3 really uses str in the same way as python-2 was using str. Hence, it does make sense to unpickle a python-2 str as a python-3 string. However, in other situations, we really want and need that a python-2 str unpickles as a python-3 bytes, because in python-2 str is bytes.

And that really makes everything complicated.

Last edited 5 months ago by SimonKing (previous) (diff)

comment:37 in reply to: ↑ 36 ; follow-up: Changed 5 months ago by nbruin

Replying to SimonKing:

So, summary: In many places, python-3 really uses str in the same way as python-2 was using str. Hence, it does make sense to unpickle a python-2 str as a python-3 string. However, in other situations, we really want and need that a python-2 str unpickles as a python-3 bytes, because in python-2 str is bytes.

That would indicate we need a heuristic. We might have to patch pickle._decode_string: perhaps have another pseudo-encoding "ascii_or_bytes" which decodes via ascii to a unicode string if possible and otherwise returns a bytes object.

Another option is to decode with 'latin-1' to always get a unicode string, but it's already documented that that breaks stuff as well. So it seems we would need to try and break things in as few cases as possible ...

comment:38 in reply to: ↑ 37 ; follow-ups: Changed 5 months ago by SimonKing

Replying to nbruin:

That would indicate we need a heuristic. We might have to patch pickle._decode_string: perhaps have another pseudo-encoding "ascii_or_bytes" which decodes via ascii to a unicode string if possible and otherwise returns a bytes object.

Another option is to decode with 'latin-1' to always get a unicode string, but it's already documented that that breaks stuff as well.

"Break" in what sense?

There are two types of problems we encounter when pickling:

  1. Unpickling of objects of basic python types results in an error, thus, unpickling aborts before a call to code from the Sage library is involved.
  2. Unpickling of objects of basic python types is faulty but without raising an error, thus, code from the Sage library is called with incorrect input.

Problems of type 1. are very bad, because unpickling fails before our Sage library code even has a chance to get things right. Therefore we should try to make it so that problems of type 1. are strictly avoided.

Problems of type 2. can at least in principal be fixed by changes to the Sage library. So, they are less critical. Let us focus on the bytes versus str problem.

If we use encoding='bytes', then all python-2 str are unpickled as bytes. And that breaks lots and lots of things, such as looking up functions in the global namespace by their function name (function names must be str, not bytes). So, I believe this is no option.

What if we use encoding='latin1'? Is it guaranteed that if b is a bytes then b.decode('latin1') yields a str (without raising an error) and that b.decode('latin1').encode('latin1') == b? I tested that it is the case at least in some example, in which other encodings fail:

sage: b = b'\x80\x1f'
sage: b.decode('utf-16').encode('utf-16') == b
False
sage: b.decode('utf-8').encode('utf-8') == b
---------------------------------------------------------------------------
UnicodeDecodeError                        Traceback (most recent call last)
<ipython-input-130-7e0e96195e1e> in <module>()
----> 1 b.decode('utf-8').encode('utf-8') == b

UnicodeDecodeError: 'utf-8' codec can't decode byte 0x80 in position 0: invalid start byte
sage: b.decode('latin1').encode('latin1') == b
True

If the answer to the question generally is "yes, it is guaranteed", then the natural approach would be: Whenever a Sage function or method expects an input of type bytes but encounters str, then the str should be encoded to a bytes with encoding 'latin1'.

The price to pay: The automatic conversion may be fine during unpickling, but may be unwanted in all other situations.

I wonder how feasible this would be.

comment:39 in reply to: ↑ 38 Changed 5 months ago by SimonKing

Replying to SimonKing:

What if we use encoding='latin1'? Is it guaranteed that if b is a bytes then b.decode('latin1') yields a str (without raising an error) and that b.decode('latin1').encode('latin1') == b?

Yes!!

sage: all_bytes = bytearray(range(0x100)); all_bytes.decode('latin1').encode('latin1') == all_bytes
True

Good news.

Is there a way to temporarily (i.e., during unpickling) set the default encoding to latin1, so that cdef bytes x = s where s is a string will automatically use latin1-encoding? This may already fix some cases.

comment:40 in reply to: ↑ 38 ; follow-up: Changed 5 months ago by nbruin

Replying to SimonKing:

Another option is to decode with 'latin-1' to always get a unicode string, but it's already documented that that breaks stuff as well.

"Break" in what sense?

at the stackoverflow mentioned before it's mentioned that encoding='latin1' breaks unpickling of numpy arrays, which is used as motivation that encoding='bytes' would be better. That's obviously not uniformly the case, as you're encountering.

Roundtrip invariance is definitely guaranteed by latin1: it's just a 1-1 mapping between byte values and characters (in fact, the first 256 unicode code-points. If 0 can count as a code point).

Bad news: the cython documentation suggests that cython's string conversion encodings are determined (from cython 0.19) by compile-time directives.

comment:41 in reply to: ↑ 40 ; follow-up: Changed 5 months ago by SimonKing

Replying to nbruin:

at the stackoverflow mentioned before it's mentioned that encoding='latin1' breaks unpickling of numpy arrays, which is used as motivation that encoding='bytes' would be better. That's obviously not uniformly the case, as you're encountering.

In the thread, I see mentioned that "Using encoding = 'latin1' causes some issues when your object contains numpy arrays in it." However, I don't see an explicit example or an explanation what these issues are.

comment:42 in reply to: ↑ 41 ; follow-up: Changed 5 months ago by nbruin

Replying to SimonKing:

In the thread, I see mentioned that "Using encoding = 'latin1' causes some issues when your object contains numpy arrays in it." However, I don't see an explicit example or an explanation what these issues are.

One quote:

Note that up to Python versions before 3.6.8, 3.7.2 and 3.8.0, unpickling of Python 2 datetime object data is broken unless you use encoding='bytes'.

(linking to Python issue 22005. The issue is an interesting read, displaying a similar spread of attitudes among core Python developers as we have seen on sage-devel. In the end a fix was accepted in Python, so some willingness to consider compatibility for pickles between python versions (especially 2/3) does exist.)

This could be what the advice was about (since numpy arrays in pandas applications are quite prone to containing datetimes)

The numpy.load command accepts an encoding (for passing to pickle) and enforces it to be ascii, latin1, or bytes to avoid corrupting numerical data.

There is a numpy issue about this as well, which seems to be resolved.

The main thing I seem to find here: datetime had a problem with pickle encoding=latin1 prior to 3.9. Otherwise, bytes seems to cause various problems. So from this search it seems to me going with latin1 might indeed be the better default.

comment:43 in reply to: ↑ 42 Changed 5 months ago by SimonKing

Replying to nbruin:

One quote:

Note that up to Python versions before 3.6.8, 3.7.2 and 3.8.0, unpickling of Python 2 datetime object data is broken unless you use encoding='bytes'. ... This could be what the advice was about (since numpy arrays in pandas applications are quite prone to containing datetimes)

The numpy.load command accepts an encoding (for passing to pickle) and enforces it to be ascii, latin1, or bytes to avoid corrupting numerical data.

There is a numpy issue about this as well, which seems to be resolved.

In a version that is in Sage?

The main thing I seem to find here: datetime had a problem with pickle encoding=latin1 prior to 3.9.

We are prior to 3.9 (the current Py-3 version is 3.8 in Sage).

Otherwise, bytes seems to cause various problems. So from this search it seems to me going with latin1 might indeed be the better default.

Of course we should have a default, but most importantly is that we allow the user to specify all optional arguments to load/dump that are accepted by pickle.load/pickle.dump. Namely, in Python-2:

Python 2.7.15 (default, Jul 26 2019, 11:49:43) 
[GCC 5.4.0 20160609] on linux2
>>> import pickle, inspect
>>> inspect.getargspec(pickle.load)
ArgSpec(args=['file'], varargs=None, keywords=None, defaults=None)
>>> inspect.getargspec(pickle.dump)
ArgSpec(args=['obj', 'file', 'protocol'], varargs=None, keywords=None, defaults=(None,))

and in Python-3:

Python 3.7.3 (default, Aug 27 2019, 23:22:23) 
[GCC 5.4.0 20160609] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import pickle, inspect
>>> inspect.getfullargspec(pickle.load)
FullArgSpec(args=['file'], varargs=None, varkw=None, defaults=None, kwonlyargs=['fix_imports', 'encoding', 'errors'], kwonlydefaults={'fix_imports': True, 'encoding': 'ASCII', 'errors': 'strict'}, annotations={})
>>> inspect.getfullargspec(pickle.dump)
FullArgSpec(args=['obj', 'file', 'protocol'], varargs=None, varkw=None, defaults=(None,), kwonlyargs=['fix_imports'], kwonlydefaults={'fix_imports': True}, annotations={})

I wonder if we should be explicit (i.e., def load(file, fix_imports=True, encoding='ASCII', errors='strict')), which means that the code would depend on the Python language level, or implicit (i.e., def load(file, **kwargs))?

comment:44 follow-up: Changed 5 months ago by SimonKing

In this comment, I'm trying to summarise the different approaches. Please add if I omit or misrepresent something.

  • Sage's pickling and unpickling functions (save, load, dumps, loads) should accept optional arguments. These should be passed to SageUnpickler.__init__ and SagePickler.__init__; I think this is uncontroversial. Only question: explicit or implicit, as asked in comment:43.
  • Should the encoding keyword get a default in Sage? Actually I am not sure if it should! But probably it should, and we should consider it a bug if a Sage object pickled with Python-2 cannot be unpickled with Python-3.

If we decide to have encoding with a default:

  • encoding='bytes' means that unpickling will fail as soon as a function is looked up in some module, because the look-up is by function name, and the name is a str, not bytes. So, bytes is a very bad default, unless we modify sage.structure.factory.lookup_global so that it also accepts bytes in lieu of str. When this is fixed, each Sage function expecting a str needs to be changed so that it also accepts bytes and transforms it to str with encoding "latin1" (?), case-by-case.
  • encoding='latin1' will make it so that always the unpickler will be in a position to look up globals. However, each Sage function expecting a bytes needs to be changed so that it also accepts str and transforms it to bytes with encoding "latin1".

Which scenario would be the worse can of worms?I guess using strings to pickle objects is more common than using bytes (sage.matrix.matrix_gfpn_dense use bytes, but what else?). So, it seems the approach to use latin1 will be easier in the Sage library.

However, we also need to worry about non-Sage library objects that might occur as attributes of Sage library objects. Such as numpy arrays/datetime. Is there a hack to make unpickling of numpy arrays work even when latin1 is used? Or is bytes the only way out for them?

comment:45 Changed 5 months ago by SimonKing

PS: Another approach was mentioned, namely to patch the pickle module, so that it is guessed whether a pickled Py-2 str is unpickled as Py-3 str or bytes. But each heuristics to do so can fail on specific data, and in such cases we would still need to add appropriate conversion from str to bytes and from bytes to str to Sage library functions, to catch the cases in which the heuristics fails.

comment:46 Changed 5 months ago by SimonKing

Here is another aspect that we could consider:

sage: all_bytes = bytes(range(0x100))
sage: len(dumps(all_bytes))
410
sage: len(dumps(all_bytes.decode(encoding='latin')))
361

So, it might actually be a good idea to NOT use bytes for pickles, but use str instead, when Python 3 is involved (I am slightly surprised, because I thought bytes objects are simpler than str objects and thus their pickle should be smaller, not larger).

When we change pickle functions to switch from using bytes to using str, it would be the corresponding unpickling functions that should be changed to accept str instead of bytes to make Py-2 pickles useable in Py-3 with encoding='latin1'. That's another reason to choose the default encoding='latin1'.

comment:47 in reply to: ↑ 44 ; follow-up: Changed 5 months ago by nbruin

Replying to SimonKing:

  • encoding='latin1' will make it so that always the unpickler will be in a position to look up globals. However, each Sage function expecting a bytes needs to be changed so that it also accepts str and transforms it to bytes with encoding "latin1".

We only need to do so at the time of unpickling, for backwards compatibility reasons (i.e., if we move to a python-3 only protocol in the future, the workaround can be turned off when unpickling at that protocol level). So instead of actually changing the (hopefully few) objects that have a "bytes" object in them to accept strings and interpret them with latin1 encoding, can we just register little wrappers in "copyreg" to do that decoding for us?

However, we also need to worry about non-Sage library objects that might occur as attributes of Sage library objects. Such as numpy arrays/datetime. Is there a hack to make unpickling of numpy arrays work even when latin1 is used? Or is bytes the only way out for them?

Backport Issue 22005 from Python 3.9?

If the copyreg option is available, perhaps the following would be a workable, extendable solutions:

  • make sure encoding is available for sage's unpickler access
  • aim for "latin1" being the default, but let 'bytes' be an option for the (rare?) pickles that need it
  • solve a few cases needing 'bytes', using copyreg registered constructors, so that they also work with 'latin1'
  • document the process of writing these copyreg constructors so that it's relatively straightforward to do it when we discover other data types that need it.

I think that gets us at least to a state where the "blocker" status of this ticket can be lifted, because there's a clear path to resolving this pickle incompatibility.

Truth is: any data type that is not in a pickle jar that gets tested regularly will almost certainly not be supported for unpickling in the future. Backwards compatibility of pickle requires real and constant work as sage develops. Major overhauls particularly will be a huge problem, and in some cases (if we don't have a version attribute) there might already be pickles that are genuinely ambiguous in how they should be unpickled.

Last edited 5 months ago by nbruin (previous) (diff)

comment:48 Changed 5 months ago by nbruin

Can we do something like the following? Suppose we have an object as below. It's written without particular considerations for pickling and it uses both bytes and str, so on the Py3 side, we're hooped:

class MixedObject(object):
    def __init__(self,B,S):
        if not(isinstance(B,bytes)):
            raise TypeError("Parameter B must be of type bytes")
        if not(isinstance(S,str)):
            raise TypeError("Parameter S must be ot type str")
        self.B=B
        self.S=S
    def validate(self):
        if not(isinstance(self.B,bytes)):
            raise TypeError("Parameter B must be of type bytes")
        if not(isinstance(self.S,str)):
            raise TypeError("Parameter S must be ot type str")
        return True

X=MixedObject(b"hi","there")

The pickle we get with pickle.dumps(X) from Py2 unpacks into Py3 into an object that doesn't validate. If we use pickle.loads(X,encoding="bytes") we end up with an object that doesn't even have the required attributes (because the dictionary gets reconstructed with bytes keys).

However, if we equip the object on the Py3 side with a __setstate__:

    def __setstate__(self,D):
        if isinstance(D['B'],str):
            self.B=D['B'].encode('latin1')
        else:
            self.B=D['B']
        self.S=D['S']

we can use loads(...,encoding="latin1") and get an object that validates. So with a bit of luck, I think we can deal with most cases where we do need a bytes attribute with either use copyreg to register a custom reconstructor (when an object uses a __reduce__) that can preprocess construction parameters that are to be turned into bytes or by amending/adding __setstate__ to use encode('latin1') on values that are supposed to be bytes.

It's a fairly painful process, but it should be doable entirely on the py3 side.

comment:49 in reply to: ↑ 47 ; follow-up: Changed 5 months ago by SimonKing

Replying to nbruin:

Replying to SimonKing:

  • encoding='latin1' will make it so that always the unpickler will be in a position to look up globals. However, each Sage function expecting a bytes needs to be changed so that it also accepts str and transforms it to bytes with encoding "latin1".

We only need to do so at the time of unpickling, for backwards compatibility reasons (i.e., if we move to a python-3 only protocol in the future, the workaround can be turned off when unpickling at that protocol level). So instead of actually changing the (hopefully few) objects that have a "bytes" object in them to accept strings and interpret them with latin1 encoding, can we just register little wrappers in "copyreg" to do that decoding for us?

I don't know about copyreg. What and how can it do?

That said: We currently know only one type (namely Matrix_gfpn_dense) in the Sage library whose unpickling would fail if stored Python-2 str are interpreted as Python-3 str. And since a pickle with bytes is larger than a pickle with str, I'm inclined to change the pickling of Matrix_gfpn_dense anyway.

Suggestion: In a new ticket that will be a dependency for here, I'll change Matrix_gfpn_dense so that it will in future use str for pickling and will be able to use both bytes and str for unpickling- Does that make sense to you?

Afterwards, we can focus here on encoding defaults, copyreg...

Backport Issue 22005 from Python 3.9?

... and the backport.

Sorry, I just notice that my above suggestions largely coincide with your suggestion in the rest of your post. So:

If the copyreg option is available, perhaps the following would be a workable, extendable solutions:

  • make sure encoding is available for sage's unpickler access

That's for the ticket here, and easy.

  • aim for "latin1" being the default, but let 'bytes' be an option for the (rare?) pickles that need it

We should give the user the option to use options, regardless whether such options are needed for pickles of Sage library objects or not. That's easy to do.

  • solve a few cases needing 'bytes', using copyreg registered constructors, so that they also work with 'latin1'

I would like to solve one case in a new ticket (namely Matrix_gfpn_dense) without copyreg. If further cases pop up (perhaps in the pickle jar?), we may tackle them here with copyreg. I don't know how difficult it is.

  • document the process of writing these copyreg constructors so that it's relatively straightforward to do it when we discover other data types that need it.

Good idea.

I think that gets us at least to a state where the "blocker" status of this ticket can be lifted, because there's a clear path to resolving this pickle incompatibility.

I think the example from the ticket description could be easily fixed.

Truth is: any data type that is not in a pickle jar that gets tested regularly will almost certainly not be supported for unpickling in the future.

If meataxe was a standard package (there are reasons to make it standard, btw.), Matrix_gfpn_dense should certainly become part of the pickle jar.

comment:50 in reply to: ↑ 49 Changed 5 months ago by nbruin

Replying to SimonKing:

I don't know about copyreg. What and how can it do?

From what I understand, it manages the dictionary in which "constructors" are looked up. A pickle of an object with a __reduce__ would usually amount to instructions to call a certain named constructor with arguments (encoded in the pickle as well). The name of that constructor needs to be looked up somewhere. If I'm not mistaken, copyreg manages the dictionary for that. One can also instantiate Unpickler objects (or subclass them) to customize the constructor lookup. In cases where the arguments as pickled would need preprocessing before calling the normal constructor (just class names are generally used as constructors), then it makes sense to register a factory function in place of that class name to do the preprocessing (and then call the relevant class).

EDIT: I see now that register_unpickle_override seems to hook into the same place where copyreg for the normal python unpickler goes (sage subclasses the normal unpickler and exposes that via load and loads)

Suggestion: In a new ticket that will be a dependency for here, I'll change Matrix_gfpn_dense so that it will in future use str for pickling and will be able to use both bytes and str for unpickling- Does that make sense to you?

I'm rather strongly against misusing str for binary data. That's not what it's for! And what you can see in this ticket already, is that signalling intent properly in pickle files is of utmost importance for forward/backward compatibility option. The observation that it ends up shorter as a str could be something stupid as the sequence having a UTF-8 representation that happens to be surprisingly short. If you think the pickle format for binary can be improved, then patch pickle. I think Matrix_gfpn_dense should just have the interface that makes most sense for it. It can also have a custom __reduce__ that encodes its defining data in a sensible way (it probably already has); possible zipping or otherwise compressing the data if that is appropriate (I don't think it is -- compression can always be done in a separate round)

Matrix_gfpn_dense is very easy to fix: they are ALREADY unpickled using a helper function mtx_unpickle. If that would accept as Data both bytes and str (which it can then transform in bytes using the latin1 encoder), you'd be done. That doesn't affect the normal API of the class at all.

If meataxe was a standard package (there are reasons to make it standard, btw.), Matrix_gfpn_dense should certainly become part of the pickle jar.

Isn't the problem that the pickle jar isn't really maintained/tested anymore? With the corollary that pickling in sage is doomed to suffer from compatibility problems.

Last edited 5 months ago by nbruin (previous) (diff)

comment:51 Changed 5 months ago by SimonKing

  • Branch set to u/SimonKing/fix_backwards_incompatibility_of_unpickling_in_python_3

comment:52 Changed 5 months ago by SimonKing

  • Commit set to 2e0d11e758f4565b718f4cdf8f957fa6ef6b8d7d

With the new commit, one can unpickle a previously un-unpicklable meataxe matrix:

sage: X = load('/home/king/Projekte/coho/tests/Py2.sobj')
sage: X
[1 0 0 0 0 0 0 0]
[0 0 0 1 1 1 1 1]

However, we need doctests to demonstrate that this ticket fixes what it was supposed to fix.

Also, I wonder if we should strive to make it not only "legal" to use encoding='bytes' but to actually make it possible: Currently, one gets

sage: X = load('/home/king/Projekte/coho/tests/Py2.sobj', encoding='bytes')
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-1-1bf7b48d4b86> in <module>()
----> 1 X = load('/home/king/Projekte/coho/tests/Py2.sobj', encoding='bytes')

/home/king/Sage/git/py3/local/lib/python3.7/site-packages/sage/misc/persist.pyx in sage.misc.persist.load (build/cythonized/sage/misc/persist.c:2863)()
    154 
    155     ## Load file by absolute filename
--> 156     with open(filename, 'rb') as fobj:
    157         X = loads(fobj.read(), compress=compress, **kwargs)
    158     try:

/home/king/Sage/git/py3/local/lib/python3.7/site-packages/sage/misc/persist.pyx in sage.misc.persist.load (build/cythonized/sage/misc/persist.c:2813)()
    155     ## Load file by absolute filename
    156     with open(filename, 'rb') as fobj:
--> 157         X = loads(fobj.read(), compress=compress, **kwargs)
    158     try:
    159         X._default_filename = os.path.abspath(filename)

/home/king/Sage/git/py3/local/lib/python3.7/site-packages/sage/misc/persist.pyx in sage.misc.persist.loads (build/cythonized/sage/misc/persist.c:7318)()
    974 
    975     unpickler = SageUnpickler(io.BytesIO(s), **kwargs)
--> 976     return unpickler.load()
    977 
    978 

/home/king/Sage/git/py3/local/lib/python3.7/site-packages/sage/structure/factory.pyx in sage.structure.factory.lookup_global (build/cythonized/sage/structure/factory.c:4602)()
    752         pass
    753 
--> 754     if '.' in name:
    755         module, name = name.rsplit('.', 1)
    756         all = __import__(module, fromlist=[name])

TypeError: a bytes-like object is required, not 'str'

We could of course make lookup_global to accept bytes (with encoding ascii, I guess) to look up functions given by names. It will, however, be very likely that that will not be the only change needed to make encoding='bytes' work.


New commits:

2e0d11ePass unpickling options to pickle.load, default encoding 'latin1'. Accept both str and bytes in mtx_unpickle

comment:53 Changed 5 months ago by SimonKing

Warning: Trying to make bytes encoding work is likely to open a can of worms. Such as:

sage: X = load('/home/king/Projekte/coho/tests/Py2.sobj', encoding='bytes')
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-1-1bf7b48d4b86> in <module>()
----> 1 X = load('/home/king/Projekte/coho/tests/Py2.sobj', encoding='bytes')

/home/king/Sage/git/py3/local/lib/python3.7/site-packages/sage/misc/persist.pyx in sage.misc.persist.load (build/cythonized/sage/misc/persist.c:2863)()
    157 
    158     ## Load file by absolute filename
--> 159     with open(filename, 'rb') as fobj:
    160         X = loads(fobj.read(), compress=compress, **kwargs)
    161     try:

/home/king/Sage/git/py3/local/lib/python3.7/site-packages/sage/misc/persist.pyx in sage.misc.persist.load (build/cythonized/sage/misc/persist.c:2813)()
    158     ## Load file by absolute filename
    159     with open(filename, 'rb') as fobj:
--> 160         X = loads(fobj.read(), compress=compress, **kwargs)
    161     try:
    162         X._default_filename = os.path.abspath(filename)

/home/king/Sage/git/py3/local/lib/python3.7/site-packages/sage/misc/persist.pyx in sage.misc.persist.loads (build/cythonized/sage/misc/persist.c:7318)()
    977 
    978     unpickler = SageUnpickler(io.BytesIO(s), **kwargs)
--> 979     return unpickler.load()
    980 
    981 

/home/king/Sage/git/py3/local/lib/python3.7/site-packages/sage/rings/integer.pyx in sage.rings.integer.make_integer (build/cythonized/sage/rings/integer.c:43277)()
   7220     """
   7221     cdef Integer r = PY_NEW(Integer)
-> 7222     mpz_set_str(r.value, str_to_bytes(s), 32)
   7223     return r
   7224 

/home/king/Sage/git/py3/local/lib/python3.7/site-packages/sage/cpython/string.pxd in sage.cpython.string.str_to_bytes (build/cythonized/sage/rings/integer.c:47648)()
     90     # compile-time variable. We keep the Cython wrapper to deal with
     91     # the default arguments.
---> 92     return _str_to_bytes(s, encoding, errors)

TypeError: expected str, bytes found

So, it would mean to change further constructors to be flexible with regards to str and bytes.

Thus, question: Shall we try to do this? Shall we just keep the new commit, and add doctests? Further suggestion for the way to proceed?

comment:54 Changed 5 months ago by SimonKing

I just notice that the function str_to_bytes does not require that the input is a str. Hence, it may be reasonable to change it so that, if the input is a bytes, it is returned without change.

comment:55 Changed 5 months ago by git

  • Commit changed from 2e0d11e758f4565b718f4cdf8f957fa6ef6b8d7d to 1984bc5f2e1d5bd50be697ddad39866b683d5f76

Branch pushed to git repo; I updated commit sha1. New commits:

1984bc5Make str_to_bytes accept str input, and bytes_to_str accept bytes input. Use it in finite field and integer constructor

comment:56 Changed 5 months ago by SimonKing

As a proof of concept, I provide a commit which makes it possible to unpickle my example with encoding='bytes'. It works by making bytes_to_str accept str input (returning it unchanged) and str_to_bytes accept bytes input (returning it unchanged. Also, I used these conversion functions in some integer constructor, finite field constructor, and in mtx_unpickle.

However, I guess it would be needed to use bytes_to_str and str_to_bytes in a lot more places, therefore it is not more than a proof of concept.

Anyway, with the new commit, the following things work in Python 3:

sage: load('/home/king/Projekte/coho/tests/Py2.sobj')
[1 0 0 0 0 0 0 0]
[0 0 0 1 1 1 1 1]
sage: load('/home/king/Projekte/coho/tests/Py2.sobj', encoding='bytes')
[1 0 0 0 0 0 0 0]
[0 0 0 1 1 1 1 1]
sage: load('/home/king/Projekte/coho/tests/Py3.sobj', encoding='bytes')
[1 0 0 0 0 0 0 0]
[0 0 0 1 1 1 1 1]
sage: from sage.matrix.matrix_gfpn_dense import mtx_unpickle
sage: class unpickle_old_mtx:
....:     def __call__(self, *args, **kwds):
....:         return mtx_unpickle(*args, **kwds)
....:     
sage: register_unpickle_override('pGroupCohomology.mtx', 'MTX_unpickle_class', unpickle_old_mtx)
sage: X = load('/home/king/Projekte/coho/tests/State.sobj')

Of course, all these examples are using the optional meataxe package.

comment:57 follow-up: Changed 5 months ago by nbruin

Why try to make encoding='bytes' work in so many places? The problem is that pickles travelling over the Py2/Py3 boundary have an issue that the str/bytes distinction gets squashed (because str in py2 wasn't proper unicode yet). It's pretty clear that str objects are WAY more common in python than bytes objects, so encoding with bytes is going to need a LOT of post-processing. It's much more reasonable to assume these objects are in fact (ascii) strings. However, that breaks hard if it obviously isn't. Hence the latin1 workaround. But then getting bytes objects where they should be of course requires domain-specific knowledge. So the unpickling routines must be made more tolerant and accept a str where it expects a bytes object. But preferably only the unpickling routines! Interpreting a str object as a bytes object via latin1 encoding remains a hack that could mask errors and typos, and make them harder to detect.

If you want to make things work with 'bytes' encoding *as well as* 'latin1' encoding, I think you're just making extra work for yourself, and if you're making the infrastructure more lax about input types, I think you can end up with masking more errors in the end. Note that the problem you're solving here is very limited: it's JUST to help pickles cross the Py2-to-Py3 boundary (the other direction is not supported at all).

With the experience we have here now, I think it's pretty clear that with enough work, you can make any pickle decode with latin1 in Py3. You'll just be registering a lot of pickle overrides ...

Last edited 5 months ago by nbruin (previous) (diff)

comment:58 in reply to: ↑ 57 ; follow-up: Changed 5 months ago by SimonKing

Replying to nbruin:

Why try to make encoding='bytes' work in so many places?

Because an important datatype, namely numpy arrays with datetime objects, apparently can only be unpickled using encoding='bytes'. And of course, a numpy array can be part of a larger structure that also comprises further Sage objects. Therefore it makes sense that these can be unpickled with encoding='bytes', too.

So the unpickling routines must be made more tolerant and accept a str where it expects a bytes object. But preferably only the unpickling routines!

Sure. That's what I do in the first commit (in the only place in the Sage library that really uses bytes that I am aware of).

Interpreting a str object as a bytes object via latin1 encoding remains a hack that could mask errors and typos, and make them harder to detect.

You mean the other way around.

If you want to make things work with 'bytes' encoding *as well as* 'latin1' encoding, I think you're just making extra work for yourself, and if you're making the infrastructure more lax about input types, I think you can end up with masking more errors in the end.

Sure. That's why I said it is a proof of concept. I show that making bytes_to_str and str_to_bytes more tolerant about input and using it more commonly in fact is a way to unpickle with encoding='bytes', showing that with some work there is a chance to unpickle complex data structures involving numpy arrays, without patching numpy.

But probably, it would be easier to just drop the new commit, and then provide some unpickle override for numpy arrays only.

Note that the problem you're solving here is very limited: it's JUST to help pickles cross the Py2-to-Py3 boundary (the other direction is not supported at all).

The other direction apparently is less critical, as I have shown that the Py-3 pickle corresponding to the ticket description can be unpickled both in Py-2 and Py-3.

With the experience we have here now, I think it's pretty clear that with enough work, you can make any pickle decode with latin1 in Py3. You'll just be registering a lot of pickle overrides ...

No. I am pretty sure that with latin1 it would be few pickle overrides. With bytes it would be lots of pickle overrides.

Last edited 5 months ago by SimonKing (previous) (diff)

comment:59 in reply to: ↑ 58 Changed 5 months ago by SimonKing

Replying to SimonKing:

With the experience we have here now, I think it's pretty clear that with enough work, you can make any pickle decode with latin1 in Py3. You'll just be registering a lot of pickle overrides ...

No. I am pretty sure that with latin1 it would be few pickle overrides. With bytes it would be lots of pickle overrides.

Perhaps I was confusing "pickle overrides" with "changes in unpickling functions".

Restating what I mean:

  • If 'latin1' is default encoding, it could very well be that the first commit already is enough to fix the Sage library. With 'bytes', a lot more changes to the Sage library would be needed.
  • If 'latin1' is default encoding, there will of course be places in which we should register a pickle override. So far, we know one case from hearsay, namely numpy.array. If 'bytes' is default encoding, I simply don't know how many pickle overrides we'd need to register for non-Sage data types. If I recall correctly, the links discussed above only mention numpy datetime, with the corollary of numpy arrays. But is there more?

From the above, I think we can conclude that 'latin1' is the most reasonable default encoding for unpickling. But we should decide to what extent we want to support the 'bytes' encoding for unpickling. From my perspective, it wouldn't really be needed, thus, I'm totally fine with discarding the proof-of-concept commit.

comment:60 follow-up: Changed 5 months ago by SimonKing

Even if we decide to not support 'bytes' encoding, we might want to change str_to_bytes and bytes_to_str so that the former accepts bytes input and the latter accepts str input, returning the input unaltered in that case. I think it would be useful.

comment:61 in reply to: ↑ 60 Changed 5 months ago by nbruin

Replying to SimonKing:

Even if we decide to not support 'bytes' encoding, we might want to change str_to_bytes and bytes_to_str so that the former accepts bytes input and the latter accepts str input, returning the input unaltered in that case. I think it would be useful.

Yes, that doesn't look too bad. Do we know for certain that datetime is still a problem with latin1? I don't have a problem with it when I do

pickle.dumps(datetime(1977,10,10))

and then load it in Py3 with pickle.loads(...,encoding="latin1"). With encoding="bytes" it also seems to work. When I look in Python3 Lib/datetime.py, I see reduce and __new__ methods that very explicitly have comments about "pickle support" and calls to latin1 encoding to deal with string input parameters. So I'm pretty sure this has been fixed in recent python version; not just 3.9.

So I don't think we have a convincing case where encoding=bytes is required, and we do have convincing evidence that encoding=bytes is very painful. Even in a py3 where latin1 does not work out of the box, one could provide a pickle override to do the preprocessing. Therefore, I don't think you should put extra effort in supporting encoding=bytes at the moment.

comment:62 Changed 5 months ago by git

  • Commit changed from 1984bc5f2e1d5bd50be697ddad39866b683d5f76 to 8c105cdcada53eed019bba4f4d26cf7fe7e72c22

Branch pushed to git repo; I updated commit sha1. This was a forced push. New commits:

8c105cdMake str_to_bytes/bytes_to_str accept both str and bytes input.

comment:63 Changed 5 months ago by SimonKing

I replaced the old second commit by a commit that is less invasive: It changes the conversion functions so that they return the input unaltered if its type fits, it uses the conversion functions in matrix_gfpn_dense for unpickling (I think that's cleaner than doing a type check there), and it also uses them in lookup_global, because if we want to support encoding='bytes' at all, we have to make lookup work with that option.

So, from my perspective, the code itself is ready to be used. But we should add some tests. Ideas how they could look? I could imagine to use a string that was created by dumps in Python-2 and demonstrate that the string can be interpreted by loads, regardless whether it is Py-2 or Py-3.

Last edited 5 months ago by SimonKing (previous) (diff)

comment:64 Changed 5 months ago by SimonKing

Currently I try to create a doctest as follows: Have a Cython class Foo such that its constructor expects a str. Create a Foo instance f in Python 2 with a non-ascii string, and create a pickle string s with dumps(f). Then, without this ticket, loads(s) would fail in Python 3, because the non-ascii string in the pickle cannot be unpickled as a string with the default encoding chosen by Python 3. But WITH this ticket, it can be unpickled, because our default encoding is latin1.

Problem: When I do

sage: cython("""
....: from sage.structure.richcmp import richcmp
....: cdef class Foo:
....:     cdef str bar
....:     def __init__(self, data):
....:         self.bar = data
....:     def __richcmp__(self, other, op):
....:         cdef Foo Other = other
....:         return richcmp(self.bar, Other.bar, op)
....:     def __reduce__(self):
....:         return Foo, (self.bar,)
....: """)

then the module name of Foo is a temporary name that is essentially random. How can I prescribe that the module of Foo is __main__?

comment:65 Changed 5 months ago by git

  • Commit changed from 8c105cdcada53eed019bba4f4d26cf7fe7e72c22 to 712e8dd8109e1c6784f405eec86433ae9c1a711d

Branch pushed to git repo; I updated commit sha1. New commits:

712e8ddAdd tests for #28444

comment:66 Changed 5 months ago by git

  • Commit changed from 712e8dd8109e1c6784f405eec86433ae9c1a711d to c8a074883997bfdd5afde3065c4a33002914999b

Branch pushed to git repo; I updated commit sha1. This was a forced push. New commits:

c8a0748Add tests for #28444

comment:67 follow-up: Changed 5 months ago by SimonKing

  • Authors set to Simon King
  • Status changed from new to needs_review

The new commit adds tests. There is also one test that should only be executed in Python-3, and therefore I marked it # optional: python3. The tests in sage.misc.persist do pass in Sage-with-py3 and Sage-with-py2. However, oddly enough, the Py3 version of Sage adds python2, not python3, to the doctester. Is that a bug?

Anyway, I now think the ticket can be reviewed.

comment:68 in reply to: ↑ 67 Changed 5 months ago by SimonKing

Replying to SimonKing:

However, oddly enough, the Py3 version of Sage adds python2, not python3, to the doctester.

I just tested: With sage -t --optional=build,ccache,dochtml,frobby,gap_packages,gdb,gfortran,libsemigroups,meataxe,memlimit,mpir,p_group_cohomology,python2,python3,sage src/sage/misc/persist.pyx, the optional py3-test passes in Sage-with-py3, but fails in Sage-with-py2, which is of course expected, as the encoding keyword doesn't exist in Python-2.

comment:69 follow-up: Changed 4 months ago by chapoton

The python3-only test tag is #py3

comment:70 Changed 4 months ago by git

  • Commit changed from c8a074883997bfdd5afde3065c4a33002914999b to 89ac77aa44fd3bab4399791fa72e088a52058a8e

Branch pushed to git repo; I updated commit sha1. New commits:

89ac77aFix keyword for py3-only test

comment:71 in reply to: ↑ 69 Changed 4 months ago by SimonKing

Replying to chapoton:

The python3-only test tag is #py3

Thank you! The latest commit fixes it. I verified that the tests of persist.pyx pass both with python-2 and python-3.

comment:72 Changed 4 months ago by SimonKing

  • Status changed from needs_review to needs_work
  • Work issues set to Fix doc strings

Oops. The doc-builder complains:

[dochtml] [misc     ] docstring of sage.misc.persist.SagePickler:21: WARNING: Inline literal start-string without end-string.
[dochtml] [misc     ] docstring of sage.misc.persist.loads:52: WARNING: Block quote ends without a blank line; unexpected unindent.
[dochtml] [misc     ] docstring of sage.misc.persist.loads:52: WARNING: Inline emphasis start-string without end-string.

comment:73 Changed 4 months ago by git

  • Commit changed from 89ac77aa44fd3bab4399791fa72e088a52058a8e to 36930248d6df66a94fe8a86d411e9105595eb8e9

Branch pushed to git repo; I updated commit sha1. New commits:

3693024Fix doc strings in sage.misc.persist

comment:74 Changed 4 months ago by SimonKing

  • Status changed from needs_work to needs_review
  • Work issues Fix doc strings deleted

Fixed, I hope.

comment:75 Changed 4 months ago by jhpalmieri

On OS X, all tests pass with both Python 2 and Python 3. I don't have any comments on the actual code.

comment:76 follow-up: Changed 4 months ago by nbruin

Perhaps to improve maintainability for the future, document the uses in unpickling routines by explicitly mentioning something along the lines of

# Legacy pickles from Python 2 do not distinguish between str and bytes type objects.
# While we expect a "bytes" type object here, it could arrive here as a "str" object, for instance
# if unpickle was passed the argument "encoding='latin1'". In that case we convert the "str" to
# a "bytes" object using "encoding='latin1'". If the object is already a "bytes" object, then
# str_to_bytes will return it unchanged.

because the use of "str_to_bytes" would indicate for the uninitiated reader that a "str" object is expected, whereas the main code path is actually through "bytes". The "str" case only arises in legacy cases.

comment:77 Changed 4 months ago by git

  • Commit changed from 36930248d6df66a94fe8a86d411e9105595eb8e9 to f0828eea650db44b6a5fd6c8f16a8382ff82f04a

Branch pushed to git repo; I updated commit sha1. New commits:

f0828eeAdd a comment regarding the expected data type for an unpickle helper

comment:78 in reply to: ↑ 76 ; follow-up: Changed 4 months ago by SimonKing

Replying to nbruin:

Perhaps to improve maintainability for the future, document the uses in unpickling routines...

Is the new commit ok?

comment:79 in reply to: ↑ 78 ; follow-up: Changed 4 months ago by nbruin

Replying to SimonKing:

Replying to nbruin:

Perhaps to improve maintainability for the future, document the uses in unpickling routines...

Is the new commit ok?

There are some language errors in the text, but I think the gist of it is clear.

comment:80 Changed 4 months ago by git

  • Commit changed from f0828eea650db44b6a5fd6c8f16a8382ff82f04a to ba41ebee5b46fb2fb72eae988678ab0579309dc5

Branch pushed to git repo; I updated commit sha1. New commits:

ba41ebeFix two typos in a comment

comment:81 in reply to: ↑ 79 Changed 4 months ago by SimonKing

Replying to nbruin:

There are some language errors in the text, but I think the gist of it is clear.

Oops, you are right. I tried to fix it. Is it good now?

comment:82 Changed 4 months ago by jhpalmieri

  • Branch changed from u/SimonKing/fix_backwards_incompatibility_of_unpickling_in_python_3 to u/jhpalmieri/fix_backwards_incompatibility_of_unpickling_in_python_3

comment:83 Changed 4 months ago by jhpalmieri

  • Commit changed from ba41ebee5b46fb2fb72eae988678ab0579309dc5 to d7f170f2a2b8371da13d5577770b80868b21a726

Here are two more minor fixes.


New commits:

a293427Pass unpickling options to pickle.load, default encoding 'latin1'. Accept both str and bytes in mtx_unpickle
5973292Make str_to_bytes/bytes_to_str accept both str and bytes input.
9646948Add tests for #28444
bfd64bcFix keyword for py3-only test
7e1de27Fix doc strings in sage.misc.persist
f4581b6Add a comment regarding the expected data type for an unpickle helper
c4db899Fix two typos in a comment
d7f170ftrac 28444: fix a few typos.

comment:84 Changed 4 months ago by vbraun

looks good to me...

comment:85 Changed 4 months ago by nbruin

  • Reviewers set to Nils Bruin
  • Status changed from needs_review to positive_review

Let's get this out of the door then

comment:86 Changed 4 months ago by vbraun

  • Branch changed from u/jhpalmieri/fix_backwards_incompatibility_of_unpickling_in_python_3 to d7f170f2a2b8371da13d5577770b80868b21a726
  • Resolution set to fixed
  • Status changed from positive_review to closed

comment:87 in reply to: ↑ 28 ; follow-up: Changed 3 months ago by soehms

  • Commit d7f170f2a2b8371da13d5577770b80868b21a726 deleted

Replying to SimonKing:

Simon, maybe it's the right time to change the format of the saved data?

As I have demonstrated above, a pickle created with Python-3 can be read both with Python-2 and Python-3. So, that side of the problem isn't really urgent for the p_group_cohomology package, I think.

Hi Simon,

recently I've experimented with data storage using YAML. I also used your examples for that and documented this in an jupyter notebook attached to #28302. It would be interesting to hear your comment about that.

I tested with Python 2 and 3 on 8.9.rc0 and 8.9.rc1. As said in the description of the ticket I could read Py3.sobj with Python 2 and 8.9.rc0. This seems to be broken now with 8.9.rc1 (this also happens with newly saved sobj-file from Python 3):

sage: load('Py3.sobj')
Traceback (most recent call last):
...
TypeError: expected bytes, unicode found

Should this be acceptable, right now?

comment:88 in reply to: ↑ 87 Changed 3 months ago by SimonKing

Replying to soehms:

I tested with Python 2 and 3 on 8.9.rc0 and 8.9.rc1. As said in the description of the ticket I could read Py3.sobj with Python 2 and 8.9.rc0. This seems to be broken now with 8.9.rc1 (this also happens with newly saved sobj-file from Python 3):

sage: load('Py3.sobj')
Traceback (most recent call last):
...
TypeError: expected bytes, unicode found

How did this suddenly pop up?? Sigh.

Should this be acceptable, right now?

Personally I don't think so. So, I'd appreciate opening a new ticket.

Note: See TracTickets for help on using tickets.