Opened 5 years ago

Closed 5 years ago

#21874 closed defect (fixed)

Make autogen/pari Python 3 compatible

Reported by: jdemeyer Owned by:
Priority: major Milestone: sage-7.5
Component: python3 Keywords:
Cc: defeo, chapoton Merged in:
Authors: Frédéric Chapoton Reviewers: Jeroen Demeyer
Report Upstream: N/A Work issues:
Branch: 07d262a (Commits, GitHub, GitLab) Commit: 07d262ad2bef1bfe9387ebb885ff773bcab219a0
Dependencies: Stopgaps:

Status badges

Description


Change History (24)

comment:1 Changed 5 years ago by chapoton

  • Authors set to Frédéric Chapoton
  • Branch set to u/chapoton/21874
  • Commit set to c36156ebea1662aad4584f9d954e0fa20b22a0b4
  • Status changed from new to needs_review

New commits:

c36156e21874 py3 compatible autogen/pari (bytes for communicate)

comment:2 Changed 5 years ago by git

  • Commit changed from c36156ebea1662aad4584f9d954e0fa20b22a0b4 to 6d3d2c7468e783b737736180045a2898644f9ea5

Branch pushed to git repo; I updated commit sha1. New commits:

6d3d2c7trac 21874 fixing

comment:3 Changed 5 years ago by chapoton

should be good now. Works for python3, not checked for python2.

comment:4 Changed 5 years ago by jdemeyer

  • Status changed from needs_review to needs_work

Doctest failure, see patchbot.

comment:5 follow-up: Changed 5 years ago by jdemeyer

The "native" format for filenames is bytes. So it seems silly to convert bytes to unicode and then let Python convert it back to bytes. Just drop the .decode().

comment:6 follow-up: Changed 5 years ago by chapoton

elsewhere in the code, one takes the sum (+) of datadir and a string, so one needs the decode()

comment:7 Changed 5 years ago by git

  • Commit changed from 6d3d2c7468e783b737736180045a2898644f9ea5 to 169eb2107367e9ddd1a8bdee43c434d71abc08f7

Branch pushed to git repo; I updated commit sha1. New commits:

169eb21trac 21874 correct doctest

comment:8 Changed 5 years ago by chapoton

  • Status changed from needs_work to needs_review

comment:9 in reply to: ↑ 6 Changed 5 years ago by jdemeyer

  • Status changed from needs_review to needs_work

Replying to chapoton:

elsewhere in the code, one takes the sum (+) of datadir and a string, so one needs the decode()

That string should then also be bytes.

comment:10 in reply to: ↑ 5 ; follow-up: Changed 5 years ago by defeo

The "native" format for filenames is bytes.

Are you sure about this? In Python 3.5

>>> import os
>>> type(os.listdir('.')[0])
str

comment:11 in reply to: ↑ 10 Changed 5 years ago by jdemeyer

listdir outputs str when given str and bytes when gives bytes:

>>> import os
>>> type(os.listdir(b'.')[0])
<class 'bytes'>
>>> import os
>>> type(os.listdir('.')[0])
<class 'str'>
Version 1, edited 5 years ago by jdemeyer (previous) (next) (diff)

comment:12 follow-up: Changed 5 years ago by jdemeyer

See https://docs.python.org/3/howto/unicode.html#unicode-filenames

Note the sentence "When opening a file for reading or writing, you can usually just provide the Unicode string as the filename, and it will be automatically converted to the right encoding for you". This means that Python does a conversion of encoding, so it makes sense to use bytes for filenames to avoid conversions.

comment:13 in reply to: ↑ 12 ; follow-up: Changed 5 years ago by defeo

Replying to jdemeyer:

See https://docs.python.org/3/howto/unicode.html#unicode-filenames

Note the sentence "When opening a file for reading or writing, you can usually just provide the Unicode string as the filename, and it will be automatically converted to the right encoding for you". This means that Python does a conversion of encoding, so it makes sense to use bytes for filenames to avoid conversions.

But handling the locale/filesystem encoding seems best left to Python. What if datastring is bytes, string is python's utf-8, and the user's locale is ISO-8859? What is safest among open(datastring.decode() + string) and open(datastring + string.encode())? Probably neither.

I do not even dare imagine what would happen with a WIN-1252-encoded fat filesystem mounted inside a ISO-8859 locale (these things can happen with usb keys...).

comment:14 in reply to: ↑ 13 ; follow-up: Changed 5 years ago by jdemeyer

Replying to defeo:

But handling the locale/filesystem encoding seems best left to Python. What if datastring is bytes, string is python's utf-8, and the user's locale is ISO-8859? What is safest among open(datastring.decode() + string) and open(datastring + string.encode())? Probably neither.

If string is ASCII, then string.encode() should be pretty safe since most encodings map ASCII to ASCII bytes.

comment:15 in reply to: ↑ 14 ; follow-up: Changed 5 years ago by defeo

Replying to jdemeyer:

Replying to defeo:

But handling the locale/filesystem encoding seems best left to Python. What if datastring is bytes, string is python's utf-8, and the user's locale is ISO-8859? What is safest among open(datastring.decode() + string) and open(datastring + string.encode())? Probably neither.

If string is ASCII, then string.encode() should be pretty safe since most encodings map ASCII to ASCII bytes.

Same goes for datastring.decode(), then. I was obviously thinking of the case where they contain non-ASCII characters.

comment:16 in reply to: ↑ 15 ; follow-up: Changed 5 years ago by jdemeyer

Replying to defeo:

Same goes for datastring.decode(), then.

Not if datastring refers to a user-chosen path. Then we shouldn't make too many assumptions.

comment:17 in reply to: ↑ 16 Changed 5 years ago by defeo

Replying to jdemeyer:

Replying to defeo:

Same goes for datastring.decode(), then.

Not if datastring refers to a user-chosen path. Then we shouldn't make too many assumptions.

Right. I thought it was the other way around. So where are these datadir + string? I see one on line 83 of parser.py, and there string is pari.desc, so I agree with you (assuming that gp correctly handles locale encodings). Any other places?

comment:18 Changed 5 years ago by chapoton

I think this is the only place. Could you please handle all this without me ?

comment:19 Changed 5 years ago by git

  • Commit changed from 169eb2107367e9ddd1a8bdee43c434d71abc08f7 to 07d262ad2bef1bfe9387ebb885ff773bcab219a0

Branch pushed to git repo; I updated commit sha1. New commits:

07d262atrac 21874 using bytes everywhere

comment:20 Changed 5 years ago by chapoton

  • Status changed from needs_work to needs_review

ok, here is another try, following Jeroen's suggestion

comment:21 Changed 5 years ago by chapoton

patchbot now gives a green light

comment:22 Changed 5 years ago by jdemeyer

  • Reviewers set to Jeroen Demeyer

Good for me then.

comment:23 Changed 5 years ago by jdemeyer

  • Status changed from needs_review to positive_review

comment:24 Changed 5 years ago by vbraun

  • Branch changed from u/chapoton/21874 to 07d262ad2bef1bfe9387ebb885ff773bcab219a0
  • Resolution set to fixed
  • Status changed from positive_review to closed
Note: See TracTickets for help on using tickets.