Opened 3 years ago
Closed 3 years ago
#28490 closed enhancement (fixed)
Let Sage honour PEP 515 (py3)
Reported by:  Thierry Monteil  Owned by:  

Priority:  major  Milestone:  sage9.1 
Component:  python3  Keywords:  preparser, py3 
Cc:  Samuel Lelièvre  Merged in:  
Authors:  Vincent Delecroix, John Palmieri  Reviewers:  Thierry Monteil, Vincent Delecroix, John Palmieri 
Report Upstream:  N/A  Work issues:  
Branch:  4c0787d (Commits, GitHub, GitLab)  Commit:  4c0787ddabd93c2b79d652131503685c98d12888 
Dependencies:  Stopgaps: 
Description (last modified by )
In Python 3, underscored numbers like 10_000_000.0
are allowed, but the preparser, as well as integer and real number constructors, do not handle them yet, see this Ask Sage question and PEP 515.
Change History (45)
comment:1 Changed 3 years ago by
Cc:  Samuel Lelièvre added 

Description:  modified (diff) 
Keywords:  preparser py3 added 
comment:2 Changed 3 years ago by
Branch:  → u/jhpalmieri/preparse_underscore 

comment:3 Changed 3 years ago by
Authors:  → John Palmieri 

Commit:  → 3601e61b65ce7c67e272e4d16be6a555318a6351 
Status:  new → needs_review 
comment:4 Changed 3 years ago by
Commit:  3601e61b65ce7c67e272e4d16be6a555318a6351 → cfcdc57ecce11491d8428c50a5ecfc4beea5663a 

Branch pushed to git repo; I updated commit sha1. This was a forced push. New commits:
cfcdc57  trac 28490: preparse underscores in numeric literals.

comment:5 Changed 3 years ago by
Reviewers:  → Thierry Monteil 

Status:  needs_review → needs_work 
In my understanding of the PEP (confirmed by testing on a ipython3 console), trailing underscores are not allowed, however we currently have:
sage: 100_000_ 100000
Also, perhaps could you show the feature in action with a concrete example, something like:
The preparser can handle PEP 515 (see :trac:`28490`):: sage: 1_000_000 + 3_000 1003000
comment:6 Changed 3 years ago by
Commit:  cfcdc57ecce11491d8428c50a5ecfc4beea5663a → 61f06181e04f3c1266b22cd502bf223850df0ac0 

Branch pushed to git repo; I updated commit sha1. New commits:
61f0618  trac 28490: do not allow trailing underscores

comment:7 Changed 3 years ago by
Commit:  61f06181e04f3c1266b22cd502bf223850df0ac0 → a26c28960fd2c67b2415ed3bd75f9acac42d5a76 

Branch pushed to git repo; I updated commit sha1. This was a forced push. New commits:
a26c289  trac 28490: do not allow trailing underscores

comment:8 Changed 3 years ago by
Description:  modified (diff) 

Status:  needs_work → needs_review 
Okay, trailing underscores now should fail, and I added the doctest.
comment:9 Changed 3 years ago by
The patchbots seem unhappy, apparently the doctests do not behave the same there.
comment:10 Changed 3 years ago by
The doctests pass with Python 3, not with Python 2. I guess there is a question: using underscores is only valid starting in Python. Do we want to allow this in Sage built with Python 2?
Also, a leading zero is valid in Python 2 but not in Python 3. What should we do about that?
comment:11 Changed 3 years ago by
Commit:  a26c28960fd2c67b2415ed3bd75f9acac42d5a76 → 2fe807bad14c00b3c27421c8765893f0de56e1f2 

Branch pushed to git repo; I updated commit sha1. New commits:
2fe807b  trac 28490: fix py2 behavior and doctests

comment:13 Changed 3 years ago by
My personal opinion:
 write a simple code with Python 3 in mind only, to avoid cumbersome code, since Python2 will be dropped soon anyway (an i guess nobody using Python 2 will complain that 1000_000 should not be accepted)
 use
# py2
and# py3
in doctests to make patchbots happy.
comment:14 Changed 3 years ago by
That's mostly what I did, except that with my code and Sage + Python 2, leading zeroes are still allowed. Aside from the # py2
and # py3
doctest tags, the only thing I've added to support Python 2 is to insert if (six.PY3 and ...)
instead of if (...)
. (This is to deal with leading zeroes.)
comment:16 Changed 3 years ago by
Milestone:  sage8.9 → sage9.1 

Ticket retargeted after milestone closed
comment:17 followup: 19 Changed 3 years ago by
Could you factorize
# Multiple consecutive underscores is invalid Python # syntax. Do not preparse. if num.find('__') != 1: continue # Trailing underscore is invalid Python syntax. Do not # preparse. if num.endswith('_'): continue
into
# Multiple consecutive underscores or trailing underscore # is invalid Python syntax. Do not preparse. if num.find('__') != 1 or num.endswith('_'): continue
comment:18 Changed 3 years ago by
Commit:  2fe807bad14c00b3c27421c8765893f0de56e1f2 → 8d22780cadd555599c2f50ec74c92274f668ccbc 

Branch pushed to git repo; I updated commit sha1. This was a forced push. New commits:
9c3ba0c  trac 28490: preparse underscores in numeric literals.

ab0c26e  trac 28490: do not allow trailing underscores

172ea24  trac 28490: fix py2 behavior and doctests

8d22780  trac 28490: combine two "if" statements into one.

comment:19 Changed 3 years ago by
Replying to vdelecroix:
Could you factorize
# Multiple consecutive underscores is invalid Python # syntax. Do not preparse. if num.find('__') != 1: continue # Trailing underscore is invalid Python syntax. Do not # preparse. if num.endswith('_'): continueinto
# Multiple consecutive underscores or trailing underscore # is invalid Python syntax. Do not preparse. if num.find('__') != 1 or num.endswith('_'): continue
Done.
comment:20 followup: 25 Changed 3 years ago by
Why would you change 123_456
to 123456
? This works perfectly fine
sage: preparser(False) sage: Integer(123_456) 123456 sage: preparser(True)
The only problem seems to be that .0
is too often converted to .gen(0)
sage: preparse("100_000.0") '100_000.gen(0)'
comment:21 Changed 3 years ago by
Couldn't you simply change dec_num
to something like dec_num = r"\d+(_\d+)*"
?
comment:22 Changed 3 years ago by
Status:  needs_review → needs_work 

And it is not the role of the Sage preparser to implement PEP requests.
comment:23 Changed 3 years ago by
In other words: it is good to make strings with underscored be recognized as numbers. But do not modify the input string beyond what is strictly necessary.
comment:24 Changed 3 years ago by
I put a branch at public/28490
which contains (I hope) the correct regexps. I also added more doctests for floating point numbers.
Of course Integer('1_23')
and RealNumber('1_1.')
are broken. But this is IMHO a distinct problem.
comment:25 followup: 27 Changed 3 years ago by
Replying to vdelecroix:
Why would you change
123_456
to123456
? This works perfectly fine
5 months ago, when the ticket was opened, Python 2 was the default for Sage, and all of these things were completely broken with Python 2. Should this ticket provide support for Python 2? I haven't looked at your changes, so I don't know if they do.
And it is not the role of the Sage preparser to implement PEP requests.
It's not a request anymore, it's been merged into Python. Whose role is it to make sure that valid Python 3 syntax, like 100_000.0
, is interpreted properly in Sage?
comment:26 Changed 3 years ago by
By the way, I removed dec_num
because it was redundant: covered (I think) by the other regexps.
comment:27 Changed 3 years ago by
Replying to jhpalmieri:
Replying to vdelecroix:
Why would you change
123_456
to123456
? This works perfectly fine5 months ago, when the ticket was opened, Python 2 was the default for Sage, and all of these things were completely broken with Python 2. Should this ticket provide support for Python 2? I haven't looked at your changes, so I don't know if they do.
Python 2 is supposed to be dropped in sage9.2. It should be supported for the next 9.1 release.
And it is not the role of the Sage preparser to implement PEP requests.
It's not a request anymore, it's been merged into Python. Whose role is it to make sure that valid Python 3 syntax, like
100_000.0
, is interpreted properly in Sage?
Agreed. Did you have a look at public/28490
? It does not touch to the interpreter beyond the update of regexp.
comment:28 followup: 30 Changed 3 years ago by
I looked at your branch. It doesn't work with Python 2, for what that's worth:
sage: 123_456 File "<ipythoninput1d4d70e3ee054>", line 1 Integer(123_456) ^ SyntaxError: invalid syntax
I also get a few doctest failures with Python 3, but they should be easy to fix.
You can delete dec_num
since it is redundant.

src/sage/repl/preparse.py
diff git a/src/sage/repl/preparse.py b/src/sage/repl/preparse.py index a699394d1b..9b877285e9 100644
a b def preparse_numeric_literals(code, extract=False): 830 830 831 831 global all_num_regex 832 832 if all_num_regex is None: 833 dec_num = r"\b\d+(_\d+)*"834 833 hex_num = r"\b0x[09af]+(_[09af]+)*" 835 834 oct_num = r"\b0o[07]+(_[07]+)*" 836 835 bin_num = r"\b0b[01]+(_[01]+)*" 837 836 # This is slightly annoying as floating point numbers may start 838 837 # with a decimal point, but if they do the \b will not match. 839 838 float_num = r"((\b\d+(_\d+)*([.](\d+(_\d+)*)?)?)([.]\d+(_\d+)*))(e[+]?\d+(_\d+)*)?" 840 all_num = r"((%s)(%s)(%s)(%s) (%s))(rjrLjrLrjLr)\b" % (hex_num, oct_num, bin_num, float_num, dec_num)839 all_num = r"((%s)(%s)(%s)(%s))(rjrLjrLrjLr)\b" % (hex_num, oct_num, bin_num, float_num) 841 840 all_num_regex = re.compile(all_num, re.I) 842 841 843 842 for m in all_num_regex.finditer(code):
The preparser is still broken with your branch:
sage: preparse('100_00.0') "RealNumber('100_00.gen(0)')"
We need to fix this and add a doctest.
comment:29 followup: 31 Changed 3 years ago by
First, with this change:

src/sage/repl/preparse.py
diff git a/src/sage/repl/preparse.py b/src/sage/repl/preparse.py index a699394d1b..dd798f06b7 100644
a b def preparse(line, reset=True, do_time=False, ignore_prompts=False, 1300 1300 1301 1301 # Generators 1302 1302 # R.0 > R.gen(0) 1303 L = re.sub(r'([ _azAZ]\w*[)\]])\.(\d+)', r'\1.gen(\2)', L)1303 L = re.sub(r'([azAZ]\w*[)\]])\.(\d+)', r'\1.gen(\2)', L) 1304 1304 1305 1305 # Use ^ for exponentiation and ^^ for xor 1306 1306 # (A side effect is that **** becomes xor as well.)
preparse(100_000.0)
works, but 100_000.0
does not:
sage: preparse('100_000.0') "RealNumber('100_000.0')" sage: 100_000.0  TypeError Traceback (most recent call last) ... TypeError: unable to convert '100_000.0' to a real number
So that's some progress. Then this change fixes the conversion to RealNumber
:

src/sage/rings/real_mpfr.pyx
diff git a/src/sage/rings/real_mpfr.pyx b/src/sage/rings/real_mpfr.pyx index 26cfddc0a2..8ec11ae38f 100644
a b cdef class RealNumber(sage.structure.element.RingElement): 1482 1482 mpfr_set_z(self.value, (<gmpy2.mpz>x).z, parent.rnd) 1483 1483 else: 1484 1484 s = str(x).replace(' ','') 1485 s = str(x).replace('_','') 1485 1486 s_lower = s.lower() 1486 1487 if s_lower == 'infinity': 1487 1488 raise ValueError('can only convert signed infinity to RR')
comment:30 Changed 3 years ago by
Replying to jhpalmieri:
I looked at your branch. It doesn't work with Python 2, for what that's worth:
sage: 123_456 File "<ipythoninput1d4d70e3ee054>", line 1 Integer(123_456) ^ SyntaxError: invalid syntax
It does work with Python2 as the above SyntaxError
is the expected behavior
$ python2 c '123_456' File "<string>", line 1 123_456 ^ SyntaxError: invalid syntax
I don't see why we would change that in Sage.
comment:31 followup: 32 Changed 3 years ago by
Replying to jhpalmieri:
First, with this change:
<SNIP>
preparse(100_000.0)
works, but100_000.0
does not:<SNIP>So that's some progress.
Nice! To my mind, this is the end. RealNumber
should accept this kind of strings as float
does.
sage: float('1_1.1_1e1_1') 1111000000000.0
It is not the job of the preparser to modify the input string.
Then this change fixes the conversion to
RealNumber
:<SNIP>
Which is precisely what I don't wnat to do here!
comment:32 followup: 33 Changed 3 years ago by
Replying to vdelecroix:
Then this change fixes the conversion to
RealNumber
:<SNIP>Which is precisely what I don't wnat to do here!
Why not? This is in the __init__
method for RealNumber
, where the string is already modified, as you can see from the surrounding lines. It's not in the preparser. Where else would you want to fix it?
comment:33 Changed 3 years ago by
Replying to jhpalmieri:
Replying to vdelecroix:
Then this change fixes the conversion to
RealNumber
:<SNIP>Which is precisely what I don't wnat to do here!
Why not? This is in the
__init__
method forRealNumber
, where the string is already modified, as you can see from the surrounding lines. It's not in the preparser. Where else would you want to fix it?
Oups. I misread. It is fine modifying RealNumber
. But the title and description has to be modified as it is not only about touching the preparser. Also Integer
and Rational
constructors have to be fixed.
comment:34 Changed 3 years ago by
Your 'fix' from the first part of comment:29 was not one... I will update the branch in a second.
comment:35 Changed 3 years ago by
Authors:  John Palmieri → John Palmieri, Vincent Delecroix 

Branch:  u/jhpalmieri/preparse_underscore → public/28490 
Commit:  8d22780cadd555599c2f50ec74c92274f668ccbc → 604d18debfdbb45e69258e61f847b15d01d0440e 
New commits:
604d18d  28490: let preparser handle correctly _ in numbers

comment:36 Changed 3 years ago by
Commit:  604d18debfdbb45e69258e61f847b15d01d0440e → d40ba4464cb4e8e2f202985f7cfe6623556fe99a 

comment:37 Changed 3 years ago by
Commit:  d40ba4464cb4e8e2f202985f7cfe6623556fe99a → 70496a9218e87daf75e25f77375760023991456e 

Branch pushed to git repo; I updated commit sha1. New commits:
70496a9  28490: integer and real number constructor

comment:38 Changed 3 years ago by
Status:  needs_work → needs_review 

comment:39 followup: 42 Changed 3 years ago by
With your branch in Python 2, Integer('1_3')
returns 13 but plain 1_3
raises a SyntaxError
. Is that what you intended? It's not consistent behavior.
comment:40 Changed 3 years ago by
Summary:  Let Sage preparser honour PEP 515 (py3) → Let Sage honour PEP 515 (py3) 

comment:41 Changed 3 years ago by
Description:  modified (diff) 

comment:42 Changed 3 years ago by
Replying to jhpalmieri:
With your branch in Python 2,
Integer('1_3')
returns 13 but plain1_3
raises aSyntaxError
. Is that what you intended? It's not consistent behavior.
It is not nice I agree. But I do not quite see inconsistency. Some constructors (like Integer.__init__
) can perfectly support additional syntax. The problem to me is that on Python2, 1_3.
does not raise an error. But I do not want to worry about Python2. If you find a non invasive solution, feel free to add to the branch.
Related comment: it would be interesting to benchmark the two options for the preparser 123 > Integer(123)
to 123 > Integer('123')
. See the noticeable slowdown mentioned in this sagedevel thread.
comment:43 Changed 3 years ago by
Commit:  70496a9218e87daf75e25f77375760023991456e → 4c0787ddabd93c2b79d652131503685c98d12888 

Branch pushed to git repo; I updated commit sha1. New commits:
4c0787d  trac 28490: tag one doctest "py3"

comment:44 Changed 3 years ago by
Authors:  John Palmieri, Vincent Delecroix → Vincent Delecroix, John Palmieri 

Reviewers:  Thierry Monteil → Thierry Monteil, Vincent Delecroix, John Palmieri 
Status:  needs_review → positive_review 
Okay, good enough, I think.
comment:45 Changed 3 years ago by
Branch:  public/28490 → 4c0787ddabd93c2b79d652131503685c98d12888 

Resolution:  → fixed 
Status:  positive_review → closed 
Here is an attempt. This does at least one other thing: it no longer allows leading zeroes in integers. This is not valid Python syntax, so Sage should not allow it, either.