Opened 7 months ago
Closed 6 months ago
#30106 closed defect (fixed)
sage.libs.ecl: Fix unicode handling
Reported by:  mkoeppe  Owned by:  

Priority:  major  Milestone:  sage9.2 
Component:  symbolics  Keywords:  
Cc:  ghmwageringel, nbruin, dimpase, ghspaghettisalat  Merged in:  
Authors:  Matthias Koeppe  Reviewers:  Markus Wageringel 
Report Upstream:  N/A  Work issues:  
Branch:  59dd62b (Commits)  Commit:  59dd62b301bb487452c99d97fabb2f4b180a7c1b 
Dependencies:  #22191  Stopgaps: 
Description (last modified by )
As a followup to #29278, #29280: If we use Unicode variable names in SR
, declaring a domain gives an error:
sage: SR.var('Ο', domain='real') RuntimeError: ECL says: THROW: The catch MACSYMAQUIT is undefined. SystemError: <builtin method var of sage.symbolic.ring.SymbolicRing object at 0x334506908> returned a result with an error set
This comes from our ECL interface:
sage: from sage.libs.ecl import * sage: u_symbol = EclObject('π₯') sage: u_symbol <repr(<sage.libs.ecl.EclObject at 0x337e7b3c8>) failed: UnicodeDecodeError: 'utf8' codec can't decode byte 0x94 in position 2: invalid start byte> sage: u_symbol.python() UnicodeDecodeError: 'utf8' codec can't decode byte 0x94 in position 2: invalid start byte
Also note:
sage: b_symbol = EclObject(bytes([166])) sage: b_symbol <repr(<sage.libs.ecl.EclObject at 0x337e7b058>) failed: UnicodeDecodeError: 'utf8' codec can't decode byte 0xa6 in position 0: invalid start byte>
Change History (32)
comment:1 Changed 7 months ago by
 Description modified (diff)
comment:2 Changed 7 months ago by
 Cc ghmwageringel added; mwageringel removed
comment:3 Changed 7 months ago by
comment:4 Changed 7 months ago by
 Description modified (diff)
 Milestone changed from sagewishlist to sage9.2
comment:5 Changed 7 months ago by
 Branch set to u/mkoeppe/sage_libs_ecl__fix_unicode_handling
comment:6 Changed 7 months ago by
 Commit set to 6ce8f9e2cb46b5fc58c5c329201d6f74301946c3
Branch pushed to git repo; I updated commit sha1. New commits:
βb6ebb52  src/sage/cpython/*string*: Update documentation

β6ce8f9e  python_to_ecl: Handle unicode strings

comment:7 Changed 7 months ago by
Now
sage: SR.var('Ο', domain='real') Ο
but more work is needed.
comment:8 Changed 7 months ago by
 Commit changed from 6ce8f9e2cb46b5fc58c5c329201d6f74301946c3 to c98352349e1cf8989e3f4ba758788e5004c63abe
Branch pushed to git repo; I updated commit sha1. New commits:
βc983523  EclObject.str: Handle unicode

comment:9 Changed 7 months ago by
sage: from sage.libs.ecl import * sage: u_symbol = EclObject('π₯') sage: u_symbol <ECL: π₯>
comment:10 Changed 7 months ago by
 Commit changed from c98352349e1cf8989e3f4ba758788e5004c63abe to f2379353686eeecfaa0ae1b49936a212e4df146a
Branch pushed to git repo; I updated commit sha1. New commits:
βf237935  ecl_to_python: Handle unicode strings

comment:11 Changed 7 months ago by
sage: u_string = EclObject('"π"') sage: u_string <ECL: "π"> sage: _.python() '"π"'
comment:12 Changed 7 months ago by
 Commit changed from f2379353686eeecfaa0ae1b49936a212e4df146a to 6f740eb71d017088d90d891379cc826a0f4a3577
Branch pushed to git repo; I updated commit sha1. New commits:
β6f740eb  ecl_eval: Handle unicode strings

comment:13 Changed 7 months ago by
sage: clock = ecl_eval('''(defun clock (h) (string (elt "ππππππππππππ" (mod h 12))))''') sage: clock(3).python() '"π"'
comment:14 Changed 7 months ago by
 Status changed from new to needs_review
comment:15 followup: βΒ 17 Changed 7 months ago by
Thank you for solving this.
Should our Maxima interface support unicode with this branch now, or do you know what else is needed to make this work? The following used to give an error, but results in a crash now:
sage: var('ΞΎ')._maxima_() Condition of type: SIMPLEERROR Cannot coerce string Cannot coerce string $_SAGE_VAR_ΞΎ to a basestring to a basestring No restarts available. ... Excessive debugger depth! Probable infinite recursion! Quitting.
Also, please add a doctest for the new functionality.
comment:16 followup: βΒ 18 Changed 7 months ago by
Are you on ecl 20.4 ?
comment:17 in reply to: βΒ 15 Changed 7 months ago by
 Status changed from needs_review to needs_work
Replying to ghmwageringel:
The following used to give an error, but results in a crash now:
sage: var('ΞΎ')._maxima_() Condition of type: SIMPLEERROR Cannot coerce string Cannot coerce string $_SAGE_VAR_ΞΎ to a basestring to a basestring No restarts available. ... Excessive debugger depth! Probable infinite recursion! Quitting.
Yes, I can confirm this here. I'll look into this.
Also, please add a doctest for the new functionality.
Sure, will do.
comment:18 in reply to: βΒ 16 Changed 7 months ago by
comment:19 Changed 7 months ago by
 Commit changed from 6f740eb71d017088d90d891379cc826a0f4a3577 to ee97a6d8617974f8ed2538d3d85c9ede91fa89db
Branch pushed to git repo; I updated commit sha1. New commits:
βee97a6d  Fix infinite recursion when error messages are not basestrings

comment:20 Changed 7 months ago by
I suggest we work on ecl 20.4, for uniformity
comment:21 followup: βΒ 24 Changed 7 months ago by
The above error is due to a bug in ECL 16.1.2:
> (princtostring 'ΞΎ) "Ξ" > (setf localtable (copyreadtable nil)) > (setf (readtablecase localtable) :invert) > :INVERT > (let ((*readtable* localtable) (*printcase* :upcase)) (princtostring 'ΞΎ)) Cannot coerce string Ξ to a basestring
I will try with the ECL upgrade now.
comment:22 Changed 7 months ago by
 Commit changed from ee97a6d8617974f8ed2538d3d85c9ede91fa89db to 2d87ee362ed1ea6c7833e1e8509456927c82c23b
Branch pushed to git repo; I updated commit sha1. Last 10 new commits:
β0076bc3  backport Maxima fix for bug #3629

β1d074e8  doctest fixes

β266d8c1  backport ECL PR #210

β12447bc  reject old makeinfo

βa3e0eca  add upstream fix from MR 215

β89b006b  add the patch from upstream MR 214

β0b77737  add upstream MR 216 (to fix cygwin fork)

β8ca1c0e  Merge tag '9.2.beta2' into t/22191/public/packages/ecl20

βf82c716  Commit 75877dd8 from upstream

β2d87ee3  Merge commit 'f82c716fdf9c6e91a07166d36b6329a15ecfb41d' of git://trac.sagemath.org/sage into t/30106/sage_libs_ecl__fix_unicode_handling

comment:23 Changed 7 months ago by
 Commit changed from 2d87ee362ed1ea6c7833e1e8509456927c82c23b to 828a727915e1920964c7bdbdd52e50b84e731a62
Branch pushed to git repo; I updated commit sha1. New commits:
β828a727  Add doctests

comment:24 in reply to: βΒ 21 Changed 7 months ago by
Replying to mkoeppe:
The above error is due to a bug in ECL 16.1.2:
> (princtostring 'ΞΎ) "Ξ" > (setf localtable (copyreadtable nil)) > (setf (readtablecase localtable) :invert) > :INVERT > (let ((*readtable* localtable) (*printcase* :upcase)) (princtostring 'ΞΎ)) Cannot coerce string Ξ to a basestringI will try with the ECL upgrade now.
This bug is still present in ECL 20.4.24. The above code is from Maxima's PRINTINVERTCASE
function in commac.lisp
.
comment:25 Changed 7 months ago by
 Status changed from needs_work to needs_review
comment:26 Changed 7 months ago by
 Dependencies set to #22191
comment:27 Changed 7 months ago by
Note that after commit ee97a6d, the test case from comment 17 now gives a proper error message, no longer a crash:
sage: var('ΞΎ')._maxima_() ....: <repr(<sage.interfaces.maxima_lib.MaximaLibElement at 0x32c1d4828>) failed: RuntimeError: ECL says: Cannot coerce string $_SAGE_VAR_ΞΎ to a basestring>
Fixing this ECL bug or working around it is outside of the scope of this ticket.
comment:28 Changed 7 months ago by
 Reviewers set to Markus Wageringel
Ok, thanks for the fix. There is one more place where ecl_string_to_python
should be used β in print_objects
:
sage: from sage.libs.ecl import * sage: u_symbol = EclObject('π₯') sage: print_objects() # crashes
Moreover, the docstring of ecl_eval
should be changed into a raw string, as the examples contain backslashes.
Once that is fixed, you can set a positive review on my behalf.
comment:29 Changed 7 months ago by
 Commit changed from 828a727915e1920964c7bdbdd52e50b84e731a62 to 59dd62b301bb487452c99d97fabb2f4b180a7c1b
Branch pushed to git repo; I updated commit sha1. New commits:
β99b894a  print_objects: Handle unicode

β59dd62b  ecl_eval: Make docstring raw

comment:31 Changed 7 months ago by
Followup ticket to keep track of the ECL issue: #30122
comment:32 Changed 6 months ago by
 Branch changed from u/mkoeppe/sage_libs_ecl__fix_unicode_handling to 59dd62b301bb487452c99d97fabb2f4b180a7c1b
 Resolution set to fixed
 Status changed from positive_review to closed
Not sure if there is a function in the ECL C API that constructs a Unicode Lisp string. So I would just be going through
CODECHAR
as in