Ticket #28302: demo_yaml.ipynb

File demo_yaml.ipynb, 39.2 KB (added by soehms, 2 years ago)

How can YAML be used for long-term data storage with Sage?

1{
2 "cells": [
3  {
4   "cell_type": "markdown",
5   "metadata": {},
6   "source": [
7    "# How can YAML be used for long-term data storage with Sage?\n",
8    "\n",
9    "In the sage-devel thread [save/loads and the pickle jar](https://groups.google.com/forum/#!topic/sage-devel/JuKzzgxDlmA) and in [#28302](https://trac.sagemath.org/ticket/28302) there has been a discussion about data storage for long-term purposes. Since the `sobj` format doesn't seem to be suitable for this task (see also [getting rid of the pickle_jar](https://groups.google.com/forum/#!msg/sage-devel/dZwxUCNEZWk/joIzOT0aBAAJ) and [#24337](https://trac.sagemath.org/ticket/24337)), suggestions were made to use a *human readable* data format instead. This worksheet tries to illustrate what this could look like using `YAML`, giving some examples concerning the storage of matrices.\n",
10    "\n",
11    "To reproduce this code you need to install *ruamel.yaml* (`sage -pip install ruamel.yaml`). I chose YAML since it is the markup language that focuses most on being *human readable*. The methods `add_multi_representer` and `add_multi_constructor` are not the way *ruamel.yaml* recommends to register custom conversion methods (they are not even documented and are inherited from *PyYAML*). But for this demonstration I didn't want to change class structures (which would mean adding methods `to_yaml` and `from_yaml` (unfortunately not `_to_yaml` and `_from_yaml`) to Sage classes).\n",
12    "\n",
13    "In general, the code is just intended to illustrate how YAML could be used with Sage. It does not claim to be complete, structurally sound or well tested. Its only purpose is to make the examples given below work. These examples ran on stable 8.1 (Python 2), 8.9.rc0 (Python 3) and 8.9.rc1 (Python 2 and 3). The output differs only in the dumped version information.\n",
14    "\n",
15    "Furthermore, I've freely chosen the names of the `tags` only for this test (but systematically, using names from Sage's global namespace). Surely, it would need an extensive process of agreement to find an appropriate scheme (taking into account *OpenMath* and other human readable CAS formats).\n",
16    "\n",
17    "### Why think about another serialization format?\n",
18    "\n",
19    "The advantage of pickling lies in performance, file size and the fact that you get back objects that are very close to what you saved. This is achieved by serializing all dependent objects exactly as they are, regardless of whether they have a simple structure or not.\n",
20    "\n",
21    "But if you want your data files to remain valid after some version upgrades, this accuracy becomes a disadvantage, since the chance that the data structure of one of the dependent objects has changed is high. You can't guarantee that developers and reviewers of all such changes will always notice that they have broken pickling.\n",
22    "\n",
23    "Now, Sage has its own frameworks for constructing elements and parents. Python's pickling doesn't know anything about them. You can customize the `__reduce__` or `__getstate__`/`__setstate__` methods to tell Python a little bit more. But in any case, pickling will always construct Python objects and not parents and elements.\n",
24    "\n",
25    "Thus, in order to store data for long-term usage, it is better to base their reconstruction on intrinsic Sage construction methods such as `_element_constructor_` and the various construction methods for parents.\n",
26    "\n",
27    "Sebastian Oehms, October 2019"
28   ]
29  },
30  {
31   "cell_type": "code",
32   "execution_count": 1,
33   "metadata": {},
34   "outputs": [],
35   "source": [
36    "from ruamel.yaml import YAML\n",
37    "yaml = YAML(typ='safe')\n",
38    "\n",
39    "from sys import version_info\n",
40    "from sage.misc.banner import version_dict\n",
41    "from sage.structure.element import Matrix\n",
42    "from sage.structure.factory import lookup_global\n",
43    "from sage.matrix.matrix_space import MatrixSpace\n",
44    "from sage.matrix.matrix_gfpn_dense import Matrix_gfpn_dense, mtx_unpickle\n",
45    "from sage.matrix.matrix_generic_dense import Matrix_generic_dense\n",
46    "from sage.rings.integer_ring import IntegerRing_class, ZZ\n",
47    "from sage.rings.rational_field import QQ\n",
48    "from sage.rings.finite_rings.finite_field_prime_modn import FiniteField_prime_modn\n",
49    "from sage.rings.finite_rings.finite_field_base import FiniteField\n",
50    "from sage.rings.polynomial.laurent_polynomial_ring import LaurentPolynomialRing_univariate\n",
51    "from sage.rings.polynomial.multi_polynomial_ring import MPolynomialRing_polydict\n",
52    "from sage.rings.polynomial.polydict import ETuple\n",
53    "\n",
54    "\n",
55    "# -------------------------------------------------------------------------------------------------------------------\n",
56    "# dictionaries to recover implementation variants (these would better be shared with the corresponding library files)\n",
57    "# -------------------------------------------------------------------------------------------------------------------\n",
58    "matrix_space_implementations={'meataxe':Matrix_gfpn_dense, 'generic':Matrix_generic_dense}\n",
59    "finite_field_implementations={'modn':FiniteField_prime_modn}\n",
60    "\n",
61    "\n",
62    "# -----------------------------------------------------------------------------------------------------------\n",
63    "# helper functions\n",
64    "# -----------------------------------------------------------------------------------------------------------\n",
65    "def implementation_to_str(impl_dict, cls):\n",
66    "    implementation = None\n",
67    "    for k, v in impl_dict.items():\n",
68    "        if v in cls.__mro__:\n",
69    "            implementation = k\n",
70    "            break\n",
71    "    if implementation is None:\n",
72    "        raise NotImplementedError('unhandled implementation %s' % cls)\n",
73    "    return implementation\n",
74    "\n",
75    "def str_to_implementation(parent, impl_str):\n",
76    "    if parent == 'MatrixSpace':\n",
77    "        impl_dict = matrix_space_implementations\n",
78    "    elif parent == 'FiniteField':\n",
79    "        impl_dict = finite_field_implementations\n",
80    "    else:\n",
81    "        return impl_str        \n",
82    "    if impl_str in impl_dict.keys():\n",
83    "        return impl_dict[impl_str]\n",
84    "    return impl_str\n",
85    "\n",
86    "def poly_to_dict_rec(p):\n",
87    "    dict_res = {}\n",
88    "    if hasattr(p, 'dict'):\n",
89    "        p_dict = p.dict()\n",
90    "        for k in p_dict.keys():\n",
91    "            if isinstance(k, ETuple):\n",
92    "                dict_res[tuple(k)] = poly_to_dict_rec(p_dict[k])\n",
93    "            else:\n",
94    "                dict_res[k] = poly_to_dict_rec(p_dict[k])\n",
95    "    else:\n",
96    "        if p in ZZ:\n",
97    "            return int(p)\n",
98    "        return p\n",
99    "    return dict_res\n",
100    "\n",
101    "# -----------------------------------------------------------------------------------------------------------\n",
102    "# For Future use (in case arguments of construction calls did change)\n",
103    "# -----------------------------------------------------------------------------------------------------------\n",
104    "def upgrade(construction_call, arguments, version):\n",
105    "    this = version_dict()\n",
106    "    if version['major'] == this['major'] and version['minor'] == this['minor']:\n",
107    "        return arguments\n",
108    "\n",
109    "    if version['major'] < 8  or version['major'] == 8 and version['minor'] < 4:\n",
110    "        if construction_call == 'MatrixSpace':\n",
111    "            if arguments.get('implementation') == 'flint':\n",
112    "                # this has been the default implementation for MatrixSpace before version 8.4\n",
113    "                # (:trac:`23719`). We check if this is the case here, and remove it in case of True\n",
114    "                base_ring = arguments['base_ring']\n",
115    "                if base_ring is not ZZ and base_ring is not QQ:\n",
116    "                    del arguments['implementation']\n",
117    "    return arguments\n",
118    "\n",
119    "\n",
120    "# -----------------------------------------------------------------------------------------------------------\n",
121    "# declaration of representers needed for the examples\n",
122    "# -----------------------------------------------------------------------------------------------------------\n",
123    "def yaml_matrix_representer(representer, data):\n",
124    "    if isinstance(data, Matrix_gfpn_dense):  # to use the mtx_unpickle function\n",
125    "        mtx_args = data.__reduce__()[1][1:]\n",
126    "        dict_dump = {'parent':data.parent(), 'mtx_args':mtx_args}\n",
127    "    else:\n",
128    "        entries = {key:poly_to_dict_rec(value) for key, value in data.dict().items()}\n",
129    "        dict_dump = {'parent':data.parent(), 'data':entries}\n",
130    "    return representer.represent_mapping('!ElementMatrix', dict_dump)\n",
131    "\n",
132    "def yaml_matrix_space_representer(representer, data):\n",
133    "    base_ring = data.base_ring()\n",
134    "    implementation = None\n",
135    "    if hasattr(data, 'Element'): # to have this work with version 8.1, too\n",
136    "        if not data._has_default_implementation():\n",
137    "            implementation = implementation_to_str(matrix_space_implementations, data.Element)\n",
138    "    else: # needed for versions before 8.4 (:trac:`23719`)\n",
139    "        if data._implementation != 'flint':  # flint has been default before 8.4\n",
140    "            if isinstance(data._implementation, type):                \n",
141    "                implementation = implementation_to_str(matrix_space_implementations, data._implementation)\n",
142    "            else:\n",
143    "                implementation = implementation_to_str(matrix_space_implementations, data._get_matrix_class())\n",
144    "    nrows, ncols = data.dims()\n",
145    "    dict_dump = {'base_ring':base_ring, 'nrows':nrows, 'ncols':ncols, 'sparse':data.is_sparse(), 'version':version_dict()}\n",
146    "    if implementation is not None:\n",
147    "        dict_dump['implementation'] = implementation\n",
148    "    return representer.represent_mapping('!ParentMatrixSpace', dict_dump)\n",
149    "\n",
150    "def yaml_integer_ring_representer(representer, data):\n",
151    "    return representer.represent_mapping('!ParentIntegerRing', {})\n",
152    "\n",
153    "def yaml_finite_field_representer(representer, data):\n",
154    "    implementation = implementation_to_str(finite_field_implementations, data.__class__)\n",
155    "    dict_dump = {'order':int(data.cardinality()), 'names':data.variable_name(), 'impl':implementation, 'version':version_dict()}\n",
156    "    return representer.represent_mapping('!ParentFiniteField', dict_dump)\n",
157    "\n",
158    "def yaml_laurent_polynomial_ring_representer(representer, data):\n",
159    "    dict_dump = {'base_ring':data.base_ring(), 'names':data.variable_names(), 'version':version_dict()}\n",
160    "    return representer.represent_mapping('!ParentLaurentPolynomialRing', dict_dump)\n",
161    "\n",
162    "def yaml_polynomial_ring_representer(representer, data):\n",
163    "    dict_dump = {'base_ring':data.base_ring(), 'names':data.variable_names(), 'version':version_dict()}\n",
164    "    return representer.represent_mapping('!ParentPolynomialRing', dict_dump)\n",
165    "\n",
166    "# -----------------------------------------------------------------------------------------------------------\n",
167    "# declaration of constructors needed for the examples\n",
168    "# -----------------------------------------------------------------------------------------------------------\n",
169    "def yaml_element_constructor(constructor, tag_suffix, node):\n",
170    "    inp = constructor.construct_mapping(node, deep=True)\n",
171    "    parent  = inp['parent']\n",
172    "    if 'mtx_args' in inp.keys():\n",
173    "        # special case: use mtx_unpickle\n",
174    "        mtx_args = inp['mtx_args']\n",
175    "        return mtx_unpickle(parent, *mtx_args)\n",
176    "    data    = inp['data']\n",
177    "    return parent(data)\n",
178    "\n",
179    "def yaml_parent_constructor(constructor, tag_suffix, node):\n",
180    "    args = constructor.construct_mapping(node, deep=True)\n",
181    "    constructor_name = tag_suffix\n",
182    "\n",
183    "    if 'version' in args.keys():\n",
184    "        # check for version upgrades\n",
185    "        version = args['version']\n",
186    "        del args['version']\n",
187    "        args = upgrade(tag_suffix, args, version)\n",
188    "    if 'implementation' in args.keys():\n",
189    "        # set non-default implementation\n",
190    "        args['implementation'] = str_to_implementation(constructor_name, args['implementation'])\n",
191    "\n",
192    "    if version_info.major == 2:\n",
193    "        constructor_name = bytes(tag_suffix)        \n",
194    "    return lookup_global(constructor_name)(**args)\n",
195    "\n",
196    "\n",
197    "# -----------------------------------------------------------------------------------------------------------\n",
198    "# registration of representers needed for the examples\n",
199    "# -----------------------------------------------------------------------------------------------------------\n",
200    "yaml.representer.add_multi_representer(Matrix,             yaml_matrix_representer)\n",
201    "\n",
202    "yaml.representer.add_multi_representer(MatrixSpace,        yaml_matrix_space_representer)\n",
203    "yaml.representer.add_multi_representer(FiniteField,        yaml_finite_field_representer)\n",
204    "yaml.representer.add_multi_representer(LaurentPolynomialRing_univariate, yaml_laurent_polynomial_ring_representer)\n",
205    "yaml.representer.add_multi_representer(MPolynomialRing_polydict,   yaml_polynomial_ring_representer)\n",
206    "yaml.representer.add_multi_representer(IntegerRing_class,  yaml_integer_ring_representer)\n",
207    "\n",
208    "# -----------------------------------------------------------------------------------------------------------\n",
209    "# registration of constructors needed for the examples\n",
210    "# -----------------------------------------------------------------------------------------------------------\n",
211    "yaml.constructor.add_multi_constructor('!Element',         yaml_element_constructor)\n",
212    "yaml.constructor.add_multi_constructor('!Parent',          yaml_parent_constructor)"
213   ]
214  },
215  {
216   "cell_type": "markdown",
217   "metadata": {},
218   "source": [
219    "### Simon King's example\n",
220    "\n",
221    "The first example is taken from the Trac ticket [#28444](https://trac.sagemath.org/ticket/28444), in which an incompatibility between Python 2 and Python 3 `sobj` files was fixed. To reproduce the example you have to download the corresponding file from that ticket into your current directory."
222   ]
223  },
224  {
225   "cell_type": "code",
226   "execution_count": 2,
227   "metadata": {},
228   "outputs": [],
229   "source": [
230    "if version_info.major == 2:\n",
231    "    M1 = load('Py2.sobj')  # if you use Python 2 on a version newer than 8.9.rc0 then Py3.sobj will not work\n",
232    "else:\n",
233    "    M1 = load('Py3.sobj')\n",
234    "stream = open('M1.yaml', 'w')\n",
235    "yaml.dump(M1,stream)\n",
236    "stream.close()"
237   ]
238  },
239  {
240   "cell_type": "markdown",
241   "metadata": {},
242   "source": [
243    "This yields:\n",
244    "\n",
245    "```yaml\n",
246    "!ElementMatrix\n",
247    "mtx_args:\n",
248    "- 2\n",
249    "- 8\n",
250    "- !!binary |\n",
251    "  gB8=\n",
252    "- true\n",
253    "parent: !ParentMatrixSpace\n",
254    "  base_ring: !ParentFiniteField\n",
255    "    impl: modn\n",
256    "    names: x\n",
257    "    order: 2\n",
258    "    version: {major: 8, minor: 9, prerelease: true, tiny: 0}\n",
259    "  implementation: meataxe\n",
260    "  ncols: 8\n",
261    "  nrows: 2\n",
262    "  sparse: false\n",
263    "  version: {major: 8, minor: 9, prerelease: true, tiny: 0}\n",
264    "```"
265   ]
266  },
267  {
268   "cell_type": "markdown",
269   "metadata": {},
270   "source": [
271    "The result is the same whether you use Python 2 or Python 3 (`diff` shows no differences). Loading this file back also works independently of the Python version:"
272   ]
273  },
274  {
275   "cell_type": "code",
276   "execution_count": 3,
277   "metadata": {},
278   "outputs": [
279    {
280     "data": {
281      "text/plain": [
282       "True"
283      ]
284     },
285     "execution_count": 3,
286     "metadata": {},
287     "output_type": "execute_result"
288    }
289   ],
290   "source": [
291    "stream = open('M1.yaml', 'r')\n",
292    "M1back = yaml.load(stream)\n",
293    "stream.close()\n",
294    "M1 == M1back"
295   ]
296  },
297  {
298   "cell_type": "markdown",
299   "metadata": {},
300   "source": [
301    "Here we have used the same `mtx_unpickle` function that is used in the `sobj` case, so the entries of the matrix are stored in binary form. If you want a more readable result, you have to change the matrix representer:"
302   ]
303  },
304  {
305   "cell_type": "code",
306   "execution_count": 4,
307   "metadata": {},
308   "outputs": [],
309   "source": [
310    "def yaml_matrix_representer(representer, data):\n",
311    "    if isinstance(data, Matrix_gfpn_dense) and False:  # deactivating mtx_unpickle\n",
312    "        mtx_args = data.__reduce__()[1][1:]\n",
313    "        dict_dump = {'parent':data.parent(), 'mtx_args':mtx_args}\n",
314    "    else:\n",
315    "        entries = {key:poly_to_dict_rec(value) for key, value in data.dict().items()}\n",
316    "        dict_dump = {'parent':data.parent(), 'data':entries}\n",
317    "    return representer.represent_mapping('!ElementMatrix', dict_dump)\n",
318    "\n",
319    "yaml.representer.add_multi_representer(Matrix,      yaml_matrix_representer)"
320   ]
321  },
322  {
323   "cell_type": "code",
324   "execution_count": 5,
325   "metadata": {},
326   "outputs": [],
327   "source": [
328    "stream = open('M1entries.yaml', 'w')\n",
329    "yaml.dump(M1, stream)\n",
330    "stream.close()"
331   ]
332  },
333  {
334   "cell_type": "markdown",
335   "metadata": {},
336   "source": [
337    "The result now is this:\n",
338    "\n",
339    "```yaml\n",
340    "!ElementMatrix\n",
341    "data:\n",
342    "  ? [0, 0]\n",
343    "  : 1\n",
344    "  ? [1, 3]\n",
345    "  : 1\n",
346    "  ? [1, 4]\n",
347    "  : 1\n",
348    "  ? [1, 5]\n",
349    "  : 1\n",
350    "  ? [1, 6]\n",
351    "  : 1\n",
352    "  ? [1, 7]\n",
353    "  : 1\n",
354    "parent: !ParentMatrixSpace\n",
355    "  base_ring: !ParentFiniteField\n",
356    "    impl: modn\n",
357    "    names: x\n",
358    "    order: 2\n",
359    "    version: {major: 8, minor: 9, prerelease: true, tiny: 0}\n",
360    "  implementation: meataxe\n",
361    "  ncols: 8\n",
362    "  nrows: 2\n",
363    "  sparse: false\n",
364    "  version: {major: 8, minor: 9, prerelease: true, tiny: 0}\n",
365    "```\n",
366    "Reading it back:"
367   ]
368  },
369  {
370   "cell_type": "code",
371   "execution_count": 6,
372   "metadata": {},
373   "outputs": [
374    {
375     "data": {
376      "text/plain": [
377       "True"
378      ]
379     },
380     "execution_count": 6,
381     "metadata": {},
382     "output_type": "execute_result"
383    }
384   ],
385   "source": [
386    "stream = open('M1entries.yaml', 'r')\n",
387    "M1entries = yaml.load(stream)\n",
388    "stream.close()\n",
389    "M1 == M1entries"
390   ]
391  },
392  {
393   "cell_type": "markdown",
394   "metadata": {},
395   "source": [
396    "But one thing is annoying:"
397   ]
398  },
399  {
400   "cell_type": "code",
401   "execution_count": 7,
402   "metadata": {},
403   "outputs": [
404    {
405     "data": {
406      "text/plain": [
407       "False"
408      ]
409     },
410     "execution_count": 7,
411     "metadata": {},
412     "output_type": "execute_result"
413    }
414   ],
415   "source": [
416    "M1back.parent() == M1.parent()"
417   ]
418  },
419  {
420   "cell_type": "code",
421   "execution_count": 8,
422   "metadata": {},
423   "outputs": [
424    {
425     "data": {
426      "text/plain": [
427       "False"
428      ]
429     },
430     "execution_count": 8,
431     "metadata": {},
432     "output_type": "execute_result"
433    }
434   ],
435   "source": [
436    "M1entries.parent() == M1.parent()"
437   ]
438  },
439  {
440   "cell_type": "markdown",
441   "metadata": {},
442   "source": [
443    "Even though:"
444   ]
445  },
446  {
447   "cell_type": "code",
448   "execution_count": 9,
449   "metadata": {},
450   "outputs": [
451    {
452     "data": {
453      "text/plain": [
454       "Full MatrixSpace of 2 by 8 dense matrices over Finite Field of size 2 (using Matrix_gfpn_dense)"
455      ]
456     },
457     "execution_count": 9,
458     "metadata": {},
459     "output_type": "execute_result"
460    }
461   ],
462   "source": [
463    "M1back.parent()"
464   ]
465  },
466  {
467   "cell_type": "code",
468   "execution_count": 10,
469   "metadata": {},
470   "outputs": [
471    {
472     "data": {
473      "text/plain": [
474       "Full MatrixSpace of 2 by 8 dense matrices over Finite Field of size 2 (using Matrix_gfpn_dense)"
475      ]
476     },
477     "execution_count": 10,
478     "metadata": {},
479     "output_type": "execute_result"
480    }
481   ],
482   "source": [
483    "M1entries.parent()"
484   ]
485  },
486  {
487   "cell_type": "code",
488   "execution_count": 11,
489   "metadata": {},
490   "outputs": [
491    {
492     "data": {
493      "text/plain": [
494       "Full MatrixSpace of 2 by 8 dense matrices over Finite Field of size 2 (using Matrix_gfpn_dense)"
495      ]
496     },
497     "execution_count": 11,
498     "metadata": {},
499     "output_type": "execute_result"
500    }
501   ],
502   "source": [
503    "M1.parent()"
504   ]
505  },
506  {
507   "cell_type": "markdown",
508   "metadata": {},
509   "source": [
510    "The reason for this is that the `_pyx_order` attributes of the corresponding base rings do not compare as equal:"
511   ]
512  },
513  {
514   "cell_type": "code",
515   "execution_count": 12,
516   "metadata": {},
517   "outputs": [
518    {
519     "name": "stdout",
520     "output_type": "stream",
521     "text": [
522      "('different item', '_pyx_order', NativeIntStruct(2))\n"
523     ]
524    }
525   ],
526   "source": [
527    "for k, v in M1.base_ring().__dict__.items():\n",
528    "    if M1back.base_ring().__dict__[k] != v:\n",
529    "        print(\"different item\", k, v)"
530   ]
531  },
532  {
533   "cell_type": "markdown",
534   "metadata": {},
535   "source": [
536    "I guess that this isn't an essential problem!\n",
537    "\n",
538    "### Loading from externally generated yaml-code\n",
539    "\n",
540    "The above implementation of the constructors can also be used for classes that don't have a corresponding dump method:"
541   ]
542  },
543  {
544   "cell_type": "code",
545   "execution_count": 13,
546   "metadata": {},
547   "outputs": [
548    {
549     "data": {
550      "text/plain": [
551       "s2*s1^-1*s2*s0^-1"
552      ]
553     },
554     "execution_count": 13,
555     "metadata": {},
556     "output_type": "execute_result"
557    }
558   ],
559   "source": [
560    "sage: braid=\"\"\"\n",
561    "....: !ElementBraid\n",
562    "....: data:\n",
563    "....:   [3,-2,3,-1]\n",
564    "....: parent: !ParentBraidGroup\n",
565    "....:     n: 5\n",
566    "....: \"\"\"\n",
567    "sage: yaml.load(braid)"
568   ]
569  },
570  {
571   "cell_type": "code",
572   "execution_count": 14,
573   "metadata": {},
574   "outputs": [
575    {
576     "data": {
577      "text/plain": [
578       "Braid group on 5 strands"
579      ]
580     },
581     "execution_count": 14,
582     "metadata": {},
583     "output_type": "execute_result"
584    }
585   ],
586   "source": [
587    "_.parent()"
588   ]
589  },
590  {
591   "cell_type": "markdown",
592   "metadata": {},
593   "source": [
594    "As you can see, there is no mystery about element construction: it works just as in a Sage session. In case of an incompatibility you can adapt the file manually or with simple conversion tools.\n",
595    "\n",
596    "### A larger matrix with polynomial entries\n",
597    "\n",
598    "As an example with a larger matrix I use data describing a 648 x 648 matrix over a bivariate polynomial ring over a univariate Laurent polynomial ring over `ZZ` (this is a regular representation matrix of one of the generators of the *cubic Hecke algebra on 4 strands*, which can be downloaded from [Ivan Marin's data file](http://www.lamfa.u-picardie.fr/marin/softs/H4/MatricesRegH4.maple) as a human readable Maple file).\n",
599    "\n",
600    "Note: Loading the full data file (containing six such matrices) may take more than a minute (on an `i5` about 70 - 80 seconds with Python 2 and 110 - 120 seconds with Python 3)!"
601   ]
602  },
603  {
604   "cell_type": "code",
605   "execution_count": 15,
606   "metadata": {},
607   "outputs": [
608    {
609     "name": "stdout",
610     "output_type": "stream",
611     "text": [
612      "Start loading modified maple-file\n",
613      "Finished loading modified maple-file after 77.316933 seconds\n"
614     ]
615    }
616   ],
617   "source": [
618    "from six.moves.urllib.request import urlopen\n",
619    "url_data = urlopen('http://www.lamfa.u-picardie.fr/marin/softs/H4/MatricesRegH4.maple').read().decode()\n",
620    "preparsed_data = url_data.replace(':=', '=').replace(';', '').replace('^', '**').replace('Matrix', 'matrix')\n",
621    "stream = open('MatricesRegH4.py',  'w')\n",
622    "stream.write(preparsed_data)\n",
623    "stream.close()\n",
624    "\n",
625    "L.<w>   = LaurentPolynomialRing(ZZ)\n",
626    "R.<u,v> = PolynomialRing(L)\n",
627    "\n",
628    "from sage.misc.misc import cputime\n",
629    "\n",
630    "print('Start loading modified maple-file')\n",
631    "start = cputime()\n",
632    "load('MatricesRegH4.py')\n",
633    "end = cputime(start)\n",
634    "print('Finished loading modified maple-file after %s seconds' %end)"
635   ]
636  },
637  {
638   "cell_type": "markdown",
639   "metadata": {},
640   "source": [
641    "For the test we use the matrix `mm1`:"
642   ]
643  },
644  {
645   "cell_type": "code",
646   "execution_count": 16,
647   "metadata": {},
648   "outputs": [
649    {
650     "data": {
651      "text/plain": [
652       "Full MatrixSpace of 648 by 648 dense matrices over Multivariate Polynomial Ring in u, v over Univariate Laurent Polynomial Ring in w over Integer Ring"
653      ]
654     },
655     "execution_count": 16,
656     "metadata": {},
657     "output_type": "execute_result"
658    }
659   ],
660   "source": [
661    "mm1.parent()"
662   ]
663  },
664  {
665   "cell_type": "code",
666   "execution_count": 17,
667   "metadata": {},
668   "outputs": [
669    {
670     "name": "stdout",
671     "output_type": "stream",
672     "text": [
673      "Start dumping YAML mm1\n",
674      "Finished dumping  YAML mm1 after 0.620348 seconds\n"
675     ]
676    }
677   ],
678   "source": [
679    "stream = open('mm1.yaml', 'w')\n",
680    "print('Start dumping YAML mm1')\n",
681    "start = cputime()\n",
682    "yaml.dump(mm1, stream=stream)\n",
683    "end = cputime(start)\n",
684    "print('Finished dumping  YAML mm1 after %s seconds' %end)\n",
685    "stream.close()"
686   ]
687  },
688  {
689   "cell_type": "code",
690   "execution_count": 18,
691   "metadata": {},
692   "outputs": [
693    {
694     "name": "stdout",
695     "output_type": "stream",
696     "text": [
697      "Start loading YAML mm1\n",
698      "Finished loading YAML mm1 after 0.438457 seconds\n"
699     ]
700    }
701   ],
702   "source": [
703    "stream=open('mm1.yaml', 'r')\n",
704    "print('Start loading YAML mm1')\n",
705    "start = cputime()\n",
706    "mm1_yaml = yaml.load(stream)\n",
707    "end = cputime(start)\n",
708    "print('Finished loading YAML mm1 after %s seconds' %end)\n",
709    "stream.close()"
710   ]
711  },
712  {
713   "cell_type": "code",
714   "execution_count": 19,
715   "metadata": {},
716   "outputs": [
717    {
718     "data": {
719      "text/plain": [
720       "True"
721      ]
722     },
723     "execution_count": 19,
724     "metadata": {},
725     "output_type": "execute_result"
726    }
727   ],
728   "source": [
729    "mm1 == mm1_yaml"
730   ]
731  },
732  {
733   "cell_type": "code",
734   "execution_count": 20,
735   "metadata": {},
736   "outputs": [
737    {
738     "data": {
739      "text/plain": [
740       "True"
741      ]
742     },
743     "execution_count": 20,
744     "metadata": {},
745     "output_type": "execute_result"
746    }
747   ],
748   "source": [
749    "mm1.parent() == mm1_yaml.parent()"
750   ]
751  },
752  {
753   "cell_type": "markdown",
754   "metadata": {},
755   "source": [
756    "The output looks similar to the second version of the first example. But the entries are a bit more complicated, since they are represented as nested dictionaries. This comes from the conversion (helper function `poly_to_dict_rec` in the code above) applied before serializing. That is what I've tried to indicate in [comment 5](https://trac.sagemath.org/ticket/28302#comment:5) and [comment 7](https://trac.sagemath.org/ticket/28302#comment:7) of the ticket.\n",
757    "\n",
758    "```yaml\n",
759    "!ElementMatrix\n",
760    "data:\n",
761    "  ? [0, 27]\n",
762    "  : ? [0, 1]\n",
763    "    : {0: -1}\n",
764    "  ? [0, 54]\n",
765    "  : ? [0, 0]\n",
766    "    : {0: 1}\n",
767    "..............\n",
768    "  ? [646, 619]\n",
769    "  : ? [0, 0]\n",
770    "    : {1: 1}\n",
771    "  ? [647, 620]\n",
772    "  : ? [0, 0]\n",
773    "    : {1: 1}\n",
774    "parent: !ParentMatrixSpace\n",
775    "  base_ring: !ParentPolynomialRing\n",
776    "    base_ring: !ParentLaurentPolynomialRing\n",
777    "      base_ring: !ParentIntegerRing {}\n",
778    "      names: [w]\n",
779    "      version: {major: 8, minor: 9, prerelease: true, tiny: 0}\n",
780    "    names: [u, v]\n",
781    "    version: {major: 8, minor: 9, prerelease: true, tiny: 0}\n",
782    "  ncols: 648\n",
783    "  nrows: 648\n",
784    "  sparse: false\n",
785    "  version: {major: 8, minor: 9, prerelease: true, tiny: 0}\n",
786    "```\n",
787    "\n",
788    "#### Now let's compare this with `sobj`:"
789   ]
790  },
791  {
792   "cell_type": "code",
793   "execution_count": 21,
794   "metadata": {},
795   "outputs": [
796    {
797     "name": "stdout",
798     "output_type": "stream",
799     "text": [
800      "Start dumping SOBJ mm1\n",
801      "Finished dumping  SOBJ mm1 after 5.115259 seconds\n"
802     ]
803    }
804   ],
805   "source": [
806    "print('Start dumping SOBJ mm1')\n",
807    "start = cputime()\n",
808    "save(mm1, 'mm1.sobj')\n",
809    "end = cputime(start)\n",
810    "print('Finished dumping  SOBJ mm1 after %s seconds' %end)\n",
811    "stream.close()"
812   ]
813  },
814  {
815   "cell_type": "code",
816   "execution_count": 22,
817   "metadata": {},
818   "outputs": [
819    {
820     "name": "stdout",
821     "output_type": "stream",
822     "text": [
823      "Start loading SOBJ mm1\n",
824      "Finished loading SOBJ mm1 after 3.667154 seconds\n"
825     ]
826    }
827   ],
828   "source": [
829    "print('Start loading SOBJ mm1')\n",
830    "start = cputime()\n",
831    "mm1_sobj = load('mm1.sobj')\n",
832    "end = cputime(start)\n",
833    "print('Finished loading SOBJ mm1 after %s seconds' %end)\n",
834    "stream.close()"
835   ]
836  },
837  {
838   "cell_type": "code",
839   "execution_count": 23,
840   "metadata": {},
841   "outputs": [
842    {
843     "data": {
844      "text/plain": [
845       "True"
846      ]
847     },
848     "execution_count": 23,
849     "metadata": {},
850     "output_type": "execute_result"
851    }
852   ],
853   "source": [
854    "mm1 == mm1_sobj"
855   ]
856  },
857  {
858   "cell_type": "code",
859   "execution_count": 24,
860   "metadata": {},
861   "outputs": [
862    {
863     "data": {
864      "text/plain": [
865       "True"
866      ]
867     },
868     "execution_count": 24,
869     "metadata": {},
870     "output_type": "execute_result"
871    }
872   ],
873   "source": [
874    "mm1.parent() == mm1_sobj.parent()"
875   ]
876  },
877  {
878   "cell_type": "markdown",
879   "metadata": {},
880   "source": [
881    "Difference in file-size:\n",
882    "\n",
883    "```bash\n",
884    "-rw-r--r-- 1 sebastian sebastian   44565 Okt 14 07:59 mm1.yaml\n",
885    "-rw-r--r-- 1 sebastian sebastian 5475008 Okt 14 07:59 mm1.sobj\n",
886    "```"
887   ]
888  },
889  {
890   "cell_type": "markdown",
891   "metadata": {},
892   "source": [
893    "The reason for this difference comes from:"
894   ]
895  },
896  {
897   "cell_type": "code",
898   "execution_count": 25,
899   "metadata": {},
900   "outputs": [
901    {
902     "data": {
903      "text/plain": [
904       "True"
905      ]
906     },
907     "execution_count": 25,
908     "metadata": {},
909     "output_type": "execute_result"
910    }
911   ],
912   "source": [
913    "mm1.is_dense()"
914   ]
915  },
916  {
917   "cell_type": "markdown",
918   "metadata": {},
919   "source": [
920    "Even though the matrix is not really dense, it was constructed as a `dense` matrix because the Maple file contains the full list of entries. Thus, `sobj` seems to serialize the full list as well. In any case, the matrix read back from YAML is `dense` too:"
921   ]
922  },
923  {
924   "cell_type": "code",
925   "execution_count": 26,
926   "metadata": {},
927   "outputs": [
928    {
929     "data": {
930      "text/plain": [
931       "True"
932      ]
933     },
934     "execution_count": 26,
935     "metadata": {},
936     "output_type": "execute_result"
937    }
938   ],
939   "source": [
940    "mm1_yaml.is_dense()"
941   ]
942  },
943  {
944   "cell_type": "markdown",
945   "metadata": {},
946   "source": [
947    "So we may serialize a `dense` matrix as a `dict` and a `sparse` matrix as a `list`, depending on which is more efficient with respect to file size!\n",
948    "\n",
949    "Converting `mm1` to a `sparse` matrix restores some of the reputation of `sobj`:"
950   ]
951  },
952  {
953   "cell_type": "code",
954   "execution_count": 27,
955   "metadata": {},
956   "outputs": [
957    {
958     "name": "stdout",
959     "output_type": "stream",
960     "text": [
961      "Start dumping SOBJ mm1s\n",
962      "Finished dumping  SOBJ mm1s after 0.024181 seconds\n"
963     ]
964    }
965   ],
966   "source": [
967    "MSsparse = MatrixSpace(R, 648, 648, sparse=True)\n",
968    "mm1s = MSsparse(mm1)\n",
969    "print('Start dumping SOBJ mm1s')\n",
970    "start = cputime()\n",
971    "save(mm1s, 'mm1s.sobj')\n",
972    "end = cputime(start)\n",
973    "print('Finished dumping  SOBJ mm1s after %s seconds' %end)\n",
974    "stream.close()"
975   ]
976  },
977  {
978   "cell_type": "code",
979   "execution_count": 28,
980   "metadata": {},
981   "outputs": [
982    {
983     "name": "stdout",
984     "output_type": "stream",
985     "text": [
986      "Start loading SOBJ mm1s\n",
987      "Finished loading SOBJ mm1s after 0.020051 seconds\n"
988     ]
989    }
990   ],
991   "source": [
992    "print('Start loading SOBJ mm1s')\n",
993    "start = cputime()\n",
994    "mm1s_sobj = load('mm1s.sobj')\n",
995    "end = cputime(start)\n",
996    "print('Finished loading SOBJ mm1s after %s seconds' %end)\n",
997    "stream.close()"
998   ]
999  },
1000  {
1001   "cell_type": "markdown",
1002   "metadata": {},
1003   "source": [
1004    "Difference in file-size now:\n",
1005    "\n",
1006    "Python 2:\n",
1007    "```bash\n",
1008    "-rw-rw-r-- 1 sebastian sebastian   44565 Okt 15 18:08 mm1.yaml\n",
1009    "-rw-rw-r-- 1 sebastian sebastian 3673860 Okt 15 18:09 mm1.sobj\n",
1010    "-rw-rw-r-- 1 sebastian sebastian   24612 Okt 15 18:09 mm1s.sobj\n",
1011    "```\n",
1012    "Python 3:\n",
1013    "```bash\n",
1014    "-rw-r--r-- 1 sebastian sebastian   44565 Okt 14 07:59 mm1.yaml\n",
1015    "-rw-r--r-- 1 sebastian sebastian 5475008 Okt 14 07:59 mm1.sobj\n",
1016    "-rw-r--r-- 1 sebastian sebastian   39150 Okt 14 07:59 mm1s.sobj\n",
1017    "```"
1018   ]
1019  },
1020  {
1021   "cell_type": "markdown",
1022   "metadata": {},
1023   "source": [
1024    "Thus, as one would expect, stable long-term serialization has to be paid for with larger file size and more CPU time. But I think users should be able to decide for themselves which feature is preferable.\n",
1025    "\n",
1026    "### Handling of version incompatibility\n",
1027    "\n",
1028    "(following [comment 1](https://trac.sagemath.org/ticket/28302#comment:1), [comment 9](https://trac.sagemath.org/ticket/28302#comment:9) and [comment 15](https://trac.sagemath.org/ticket/28302#comment:15) of the ticket)\n",
1029    "\n",
1030    "Finally, to explain the purpose of the `upgrade` function of cell 2, suppose that `yaml_matrix_space_representer` had been implemented differently before version 8.4 ([#23719](https://trac.sagemath.org/ticket/23719)), for example like this:"
1031   ]
1032  },
1033  {
1034   "cell_type": "code",
1035   "execution_count": 29,
1036   "metadata": {},
1037   "outputs": [],
1038   "source": [
1039    "def yaml_matrix_space_representer(representer, data):\n",
1040    "    nrows, ncols = data.dims()\n",
1041    "    dict_dump = {'base_ring':data.base_ring(), 'nrows':nrows, 'ncols':ncols, 'sparse':data.is_sparse(), 'implementation':data._implementation, 'version':version_dict()}\n",
1042    "    return representer.represent_mapping('!ParentMatrixSpace', dict_dump)\n",
1043    "\n",
1044    "yaml.representer.add_multi_representer(MatrixSpace,        yaml_matrix_space_representer)"
1045   ]
1046  },
1047  {
1048   "cell_type": "markdown",
1049   "metadata": {},
1050   "source": [
1051    "and that `mm1` had been dumped with a sufficiently old version:"
1052   ]
1053  },
1054  {
1055   "cell_type": "code",
1056   "execution_count": 30,
1057   "metadata": {},
1058   "outputs": [],
1059   "source": [
1060    "if not hasattr(mm1.parent(), 'Element'):\n",
1061    "    stream = open('mm1-old.yaml', 'w')\n",
1062    "    print('Start dumping YAML mm1-old')\n",
1063    "    start = cputime()\n",
1064    "    yaml.dump(mm1, stream=stream)\n",
1065    "    end = cputime(start)\n",
1066    "    print('Finished dumping  YAML mm1-old after %s seconds' %end)\n",
1067    "    stream.close()"
1068   ]
1069  },
1070  {
1071   "cell_type": "markdown",
1072   "metadata": {},
1073   "source": [
1074    "Then the difference between `mm1.yaml` and the corresponding file `mm1-old.yaml` will be (here using 8.1):\n",
1075    "\n",
1076    "```diff\n",
1077    "diff mm1.yaml mm1-old.yaml \n",
1078    "3248c3248\n",
1079    "<       version: {major: 8, minor: 9, prerelease: true, tiny: 0}\n",
1080    "---\n",
1081    ">       version: {major: 8, minor: 1, prerelease: false, tiny: 0}\n",
1082    "3250c3250,3251\n",
1083    "<     version: {major: 8, minor: 9, prerelease: true, tiny: 0}\n",
1084    "---\n",
1085    ">     version: {major: 8, minor: 1, prerelease: false, tiny: 0}\n",
1086    ">   implementation: flint\n",
1087    "3254c3255\n",
1088    "<   version: {major: 8, minor: 9, prerelease: true, tiny: 0}\n",
1089    "---\n",
1090    ">   version: {major: 8, minor: 1, prerelease: false, tiny: 0}\n",
1091    "```\n",
1092    "\n",
1093    "since `flint` was the default value for `implementation` at that time. The advantage of the **human readable file** is that an advanced Sage user can easily delete the line `implementation: flint` from the file and will then be able to read it into the recent version.\n",
1094    "\n",
1095    "The task of the `upgrade` function is to do exactly the same implicitly, so that all Sage users can use the file in recent versions (the function could also be used in a conversion tool). With the above implementation of `upgrade` this works fine:"
1096   ]
1097  },
1098  {
1099   "cell_type": "code",
1100   "execution_count": 31,
1101   "metadata": {},
1102   "outputs": [
1103    {
1104     "name": "stdout",
1105     "output_type": "stream",
1106     "text": [
1107      "mm1-old successfully loaded\n"
1108     ]
1109    }
1110   ],
1111   "source": [
1112    "if version_info.major == 2:\n",
1113    "    from exceptions import IOError as FileNotFoundError \n",
1114    "try:\n",
1115    "    stream = open('mm1-old.yaml', 'r')\n",
1116    "    mm1_old = yaml.load(stream)\n",
1117    "    print('mm1-old successfully loaded')\n",
1118    "except FileNotFoundError:\n",
1119    "    mm1_old = mm1_yaml\n",
1120    "    print('mm1-old not found')"
1121   ]
1122  },
1123  {
1124   "cell_type": "code",
1125   "execution_count": 32,
1126   "metadata": {},
1127   "outputs": [
1128    {
1129     "data": {
1130      "text/plain": [
1131       "True"
1132      ]
1133     },
1134     "execution_count": 32,
1135     "metadata": {},
1136     "output_type": "execute_result"
1137    }
1138   ],
1139   "source": [
1140    "mm1_old == mm1"
1141   ]
1142  },
1143  {
1144   "cell_type": "code",
1145   "execution_count": 33,
1146   "metadata": {},
1147   "outputs": [
1148    {
1149     "data": {
1150      "text/plain": [
1151       "True"
1152      ]
1153     },
1154     "execution_count": 33,
1155     "metadata": {},
1156     "output_type": "execute_result"
1157    }
1158   ],
1159   "source": [
1160    "mm1_old.parent() == mm1.parent()"
1161   ]
1162  },
1163  {
1164   "cell_type": "markdown",
1165   "metadata": {},
1166   "source": [
1167    "On the other hand, deactivating the `upgrade` function:"
1168   ]
1169  },
1170  {
1171   "cell_type": "code",
1172   "execution_count": 34,
1173   "metadata": {},
1174   "outputs": [],
1175   "source": [
1176    "def upgrade(construction_call, arguments, version):\n",
1177    "    return arguments"
1178   ]
1179  },
1180  {
1181   "cell_type": "markdown",
1182   "metadata": {},
1183   "source": [
1184    "will cause an error when reading the old file into a recent Sage-version:"
1185   ]
1186  },
1187  {
1188   "cell_type": "code",
1189   "execution_count": 35,
1190   "metadata": {},
1191   "outputs": [],
1192   "source": [
1193    "#stream=open('mm1-old.yaml', 'r')\n",
1194    "#mm1_old=yaml.load(stream)"
1195   ]
1196  },
1197  {
1198   "cell_type": "markdown",
1199   "metadata": {},
1200   "source": [
1201    "if `mm1-old.yaml` was created successfully as above, the commented lines give:\n",
1202    "\n",
1203    "```py\n",
1204    "sage: stream = open('mm1-old.yaml', 'r')\n",
1205    "sage: mm1_old = yaml.load(stream)\n",
1206    "Traceback (most recent call last):\n",
1207    "...\n",
1208    "ValueError: unknown matrix implementation 'flint' over Multivariate Polynomial Ring in u, v over Univariate Laurent Polynomial Ring in w over Integer Ring\n",
1209    "```"
1210   ]
1211  }
1212 ],
1213 "metadata": {
1214  "kernelspec": {
1215   "display_name": "SageMath 8.9.rc1",
1216   "language": "sage",
1217   "name": "sagemath"
1218  },
1219  "language_info": {
1220   "codemirror_mode": {
1221    "name": "ipython",
1222    "version": 2
1223   },
1224   "file_extension": ".py",
1225   "mimetype": "text/x-python",
1226   "name": "python",
1227   "nbconvert_exporter": "python",
1228   "pygments_lexer": "ipython2",
1229   "version": "2.7.15"
1230  }
1231 },
1232 "nbformat": 4,
1233 "nbformat_minor": 2
1234}
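
The notebook's own `upgrade` function is defined in cell 2 and is not repeated in the section on version incompatibility. The following is only a minimal sketch of how such a function could drop the outdated `implementation` entry. It keeps the signature `upgrade(construction_call, arguments, version)` used above, but it assumes that `construction_call` is passed as a plain name string and that `version` is a dict shaped like the dumped `version` entries. Both are assumptions for illustration and may differ from the real implementation.

```python
def upgrade(construction_call, arguments, version):
    """Hypothetical sketch: adapt keyword arguments read from an old YAML file.

    construction_call -- name of the constructor as a string (assumption)
    arguments         -- dict of keyword arguments read from the file
    version           -- dict like {'major': 8, 'minor': 1, 'prerelease': False, 'tiny': 0}
    """
    dumped = (version['major'], version['minor'])
    upgraded = dict(arguments)  # work on a copy, leave the caller's dict untouched
    if construction_call == 'MatrixSpace' and dumped < (8, 4):
        # Files dumped before 8.4 carry the old default (e.g. 'flint'),
        # which newer versions reject; fall back to the current default.
        upgraded.pop('implementation', None)
    return upgraded


# Usage with plain Python values standing in for the Sage objects
old_args = {'base_ring': 'R', 'nrows': 648, 'ncols': 648,
            'sparse': False, 'implementation': 'flint'}
old_version = {'major': 8, 'minor': 1, 'prerelease': False, 'tiny': 0}
new_version = {'major': 8, 'minor': 9, 'prerelease': True, 'tiny': 0}

from_old_file = upgrade('MatrixSpace', old_args, old_version)
from_new_file = upgrade('MatrixSpace', old_args, new_version)
```

In this sketch only the `(major, minor)` pair decides whether the entry is removed, so an explicitly chosen `implementation` in a file written by 8.4 or later is passed through unchanged.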