Opened 7 months ago

Closed 6 months ago

#30106 closed defect (fixed)

sage.libs.ecl: Fix unicode handling

Reported by: mkoeppe Owned by:
Priority: major Milestone: sage-9.2
Component: symbolics Keywords:
Cc: gh-mwageringel, nbruin, dimpase, gh-spaghettisalat Merged in:
Authors: Matthias Koeppe Reviewers: Markus Wageringel
Report Upstream: N/A Work issues:
Branch: 59dd62b (Commits) Commit: 59dd62b301bb487452c99d97fabb2f4b180a7c1b
Dependencies: #22191 Stopgaps:

Description (last modified by mkoeppe)

As a follow-up to #29278, #29280: If we use Unicode variable names in SR, declaring a domain gives an error:

sage: SR.var('Ο€', domain='real')
RuntimeError: ECL says: THROW: The catch MACSYMA-QUIT is undefined.
SystemError: <built-in method var of sage.symbolic.ring.SymbolicRing object at 0x334506908> returned a result with an error set

This comes from our ECL interface:

sage: from sage.libs.ecl import *
sage: u_symbol = EclObject('πŸ”₯')
sage: u_symbol
<repr(<sage.libs.ecl.EclObject at 0x337e7b3c8>) failed: UnicodeDecodeError: 'utf-8' codec can't decode byte 0x94 in position 2: invalid start byte>
sage: u_symbol.python()
UnicodeDecodeError: 'utf-8' codec can't decode byte 0x94 in position 2: invalid start byte

Also note:

sage: b_symbol = EclObject(bytes([166]))
sage: b_symbol
<repr(<sage.libs.ecl.EclObject at 0x337e7b058>) failed: UnicodeDecodeError: 'utf-8' codec can't decode byte 0xa6 in position 0: invalid start byte>

Change History (32)

comment:1 Changed 7 months ago by mkoeppe

  • Description modified (diff)

comment:2 Changed 7 months ago by mkoeppe

  • Cc gh-mwageringel added; mwageringel removed

comment:3 Changed 7 months ago by mkoeppe

Not sure if there is a function in the ECL C API that constructs a Unicode Lisp string. So I would just be going through CODE-CHAR as in

sage: ecl_eval('''(type-of (map 'string #'code-char '(128293))))''')
<ECL: (SIMPLE-ARRAY CHARACTER (1))>

comment:4 Changed 7 months ago by mkoeppe

  • Description modified (diff)
  • Milestone changed from sage-wishlist to sage-9.2

comment:5 Changed 7 months ago by mkoeppe

  • Branch set to u/mkoeppe/sage_libs_ecl__fix_unicode_handling

comment:6 Changed 7 months ago by git

  • Commit set to 6ce8f9e2cb46b5fc58c5c329201d6f74301946c3

Branch pushed to git repo; I updated commit sha1. New commits:

​b6ebb52src/sage/cpython/*string*: Update documentation
​6ce8f9epython_to_ecl: Handle unicode strings

comment:7 Changed 7 months ago by mkoeppe

Now

sage: SR.var('Ο€', domain='real')
Ο€

but more work is needed.

comment:8 Changed 7 months ago by git

  • Commit changed from 6ce8f9e2cb46b5fc58c5c329201d6f74301946c3 to c98352349e1cf8989e3f4ba758788e5004c63abe

Branch pushed to git repo; I updated commit sha1. New commits:

​c983523EclObject.str: Handle unicode

comment:9 Changed 7 months ago by mkoeppe

sage: from sage.libs.ecl import *
sage: u_symbol = EclObject('πŸ”₯')
sage: u_symbol
<ECL: πŸ”₯>

comment:10 Changed 7 months ago by git

  • Commit changed from c98352349e1cf8989e3f4ba758788e5004c63abe to f2379353686eeecfaa0ae1b49936a212e4df146a

Branch pushed to git repo; I updated commit sha1. New commits:

​f237935ecl_to_python: Handle unicode strings

comment:11 Changed 7 months ago by mkoeppe

sage: u_string = EclObject('"πŸ™"')
sage: u_string
<ECL: "πŸ™">
sage: _.python()
'"πŸ™"'

comment:12 Changed 7 months ago by git

  • Commit changed from f2379353686eeecfaa0ae1b49936a212e4df146a to 6f740eb71d017088d90d891379cc826a0f4a3577

Branch pushed to git repo; I updated commit sha1. New commits:

​6f740ebecl_eval: Handle unicode strings

comment:13 Changed 7 months ago by mkoeppe

sage: clock = ecl_eval('''(defun clock (h) (string (elt "πŸ•›πŸ•πŸ•‘πŸ•’πŸ•“πŸ•”πŸ••πŸ•–πŸ•—πŸ•˜πŸ•™πŸ•š" (mod h 12))))''')
sage: clock(3).python()
'"πŸ•’"'

comment:14 Changed 7 months ago by mkoeppe

  • Authors set to Matthias Koeppe
  • Status changed from new to needs_review

comment:15 follow-up: Changed 7 months ago by gh-mwageringel

Thank you for solving this.

Should our Maxima interface support unicode with this branch now, or do you know what else is needed to make this work? The following used to give an error, but results in a crash now:

sage: var('ΞΎ')._maxima_()

Condition of type: SIMPLE-ERROR
Cannot coerce string Cannot coerce string $_SAGE_VAR_ΞΎ to a base-string to a base-string
No restarts available.
...
Excessive debugger depth! Probable infinite recursion!
Quitting.

Also, please add a doctest for the new functionality.

comment:16 follow-up: Changed 7 months ago by dimpase

Are you on ecl 20.4 ?

comment:17 in reply to: ↑ 15 Changed 7 months ago by mkoeppe

  • Status changed from needs_review to needs_work

Replying to gh-mwageringel:

The following used to give an error, but results in a crash now:

sage: var('ΞΎ')._maxima_()

Condition of type: SIMPLE-ERROR
Cannot coerce string Cannot coerce string $_SAGE_VAR_ΞΎ to a base-string to a base-string
No restarts available.
...
Excessive debugger depth! Probable infinite recursion!
Quitting.

Yes, I can confirm this here. I'll look into this.

Also, please add a doctest for the new functionality.

Sure, will do.

comment:18 in reply to: ↑ 16 Changed 7 months ago by mkoeppe

Replying to dimpase:

Are you on ecl 20.4 ?

I'm still on 16.1.2.

comment:19 Changed 7 months ago by git

  • Commit changed from 6f740eb71d017088d90d891379cc826a0f4a3577 to ee97a6d8617974f8ed2538d3d85c9ede91fa89db

Branch pushed to git repo; I updated commit sha1. New commits:

​ee97a6dFix infinite recursion when error messages are not base-strings

comment:20 Changed 7 months ago by dimpase

I suggest we work on ecl 20.4, for uniformity

comment:21 follow-up: Changed 7 months ago by mkoeppe

The above error is due to a bug in ECL 16.1.2:

> (princ-to-string 'ΞΎ)
"Ξ"
> (setf local-table (copy-readtable nil))
> (setf (readtable-case local-table) :invert)
> :INVERT
> (let ((*readtable* local-table) (*print-case* :upcase)) (princ-to-string 'ΞΎ))
Cannot coerce string Ξ to a base-string

I will try with the ECL upgrade now.

comment:22 Changed 7 months ago by git

  • Commit changed from ee97a6d8617974f8ed2538d3d85c9ede91fa89db to 2d87ee362ed1ea6c7833e1e8509456927c82c23b

Branch pushed to git repo; I updated commit sha1. Last 10 new commits:

​0076bc3backport Maxima fix for bug #3629
​1d074e8doctest fixes
​266d8c1backport ECL PR #210
​12447bcreject old makeinfo
​a3e0ecaadd upstream fix from MR 215
​89b006badd the patch from upstream MR 214
​0b77737add upstream MR 216 (to fix cygwin fork)
​8ca1c0eMerge tag '9.2.beta2' into t/22191/public/packages/ecl20
​f82c716Commit 75877dd8 from upstream
​2d87ee3Merge commit 'f82c716fdf9c6e91a07166d36b6329a15ecfb41d' of git://trac.sagemath.org/sage into t/30106/sage_libs_ecl__fix_unicode_handling

comment:23 Changed 7 months ago by git

  • Commit changed from 2d87ee362ed1ea6c7833e1e8509456927c82c23b to 828a727915e1920964c7bdbdd52e50b84e731a62

Branch pushed to git repo; I updated commit sha1. New commits:

​828a727Add doctests

comment:24 in reply to: ↑ 21 Changed 7 months ago by mkoeppe

Replying to mkoeppe:

The above error is due to a bug in ECL 16.1.2:

> (princ-to-string 'ΞΎ)
"Ξ"
> (setf local-table (copy-readtable nil))
> (setf (readtable-case local-table) :invert)
> :INVERT
> (let ((*readtable* local-table) (*print-case* :upcase)) (princ-to-string 'ΞΎ))
Cannot coerce string Ξ to a base-string

I will try with the ECL upgrade now.

This bug is still present in ECL 20.4.24. The above code is from Maxima's PRINT-INVERT-CASE function in commac.lisp.

comment:25 Changed 7 months ago by mkoeppe

  • Status changed from needs_work to needs_review

comment:26 Changed 7 months ago by mkoeppe

  • Dependencies set to #22191

comment:27 Changed 7 months ago by mkoeppe

Note that after commit ee97a6d, the test case from comment 17 now gives a proper error message, no longer a crash:

sage: var('ΞΎ')._maxima_()
....: 
<repr(<sage.interfaces.maxima_lib.MaximaLibElement at 0x32c1d4828>) failed: RuntimeError: ECL says: Cannot coerce string $_SAGE_VAR_ΞΎ to a base-string>

Fixing this ECL bug or working around it is outside of the scope of this ticket.

comment:28 Changed 7 months ago by gh-mwageringel

  • Reviewers set to Markus Wageringel

Ok, thanks for the fix. There is one more place where ecl_string_to_python should be used – in print_objects:

sage: from sage.libs.ecl import *
sage: u_symbol = EclObject('πŸ”₯')
sage: print_objects()  # crashes

Moreover, the docstring of ecl_eval should be changed into a raw string, as the examples contain backslashes.

Once that is fixed, you can set a positive review on my behalf.

comment:29 Changed 7 months ago by git

  • Commit changed from 828a727915e1920964c7bdbdd52e50b84e731a62 to 59dd62b301bb487452c99d97fabb2f4b180a7c1b

Branch pushed to git repo; I updated commit sha1. New commits:

​99b894aprint_objects: Handle unicode
​59dd62becl_eval: Make docstring raw

comment:30 Changed 7 months ago by mkoeppe

  • Status changed from needs_review to positive_review

Thank you!

comment:31 Changed 7 months ago by mkoeppe

Follow-up ticket to keep track of the ECL issue: #30122

comment:32 Changed 6 months ago by vbraun

  • Branch changed from u/mkoeppe/sage_libs_ecl__fix_unicode_handling to 59dd62b301bb487452c99d97fabb2f4b180a7c1b
  • Resolution set to fixed
  • Status changed from positive_review to closed
Note: See TracTickets for help on using tickets.