Opened 3 years ago

Closed 3 years ago

#28490 closed enhancement (fixed)

Let Sage honour PEP 515 (py3)

Reported by: Thierry Monteil Owned by:
Priority: major Milestone: sage-9.1
Component: python3 Keywords: preparser, py3
Cc: Samuel Lelièvre Merged in:
Authors: Vincent Delecroix, John Palmieri Reviewers: Thierry Monteil, Vincent Delecroix, John Palmieri
Report Upstream: N/A Work issues:
Branch: 4c0787d (Commits, GitHub, GitLab) Commit: 4c0787ddabd93c2b79d652131503685c98d12888
Dependencies: Stopgaps:

Status badges

Description (last modified by John Palmieri)

In Python 3, underscored numbers like 10_000_000.0 are allowed, but the preparser, as well as integer and real number constructors, do not handle them yet, see this Ask Sage question and PEP 515.

Change History (45)

comment:1 Changed 3 years ago by Samuel Lelièvre

Cc: Samuel Lelièvre added
Description: modified (diff)
Keywords: preparser py3 added

comment:2 Changed 3 years ago by John Palmieri

Branch: u/jhpalmieri/preparse_underscore

comment:3 Changed 3 years ago by John Palmieri

Authors: John Palmieri
Commit: 3601e61b65ce7c67e272e4d16be6a555318a6351
Status: newneeds_review

Here is an attempt. This does at least one other thing: it no longer allows leading zeroes in integers. This is not valid Python syntax, so Sage should not allow it, either.

comment:4 Changed 3 years ago by git

Commit: 3601e61b65ce7c67e272e4d16be6a555318a6351cfcdc57ecce11491d8428c50a5ecfc4beea5663a

Branch pushed to git repo; I updated commit sha1. This was a forced push. New commits:

cfcdc57trac 28490: preparse underscores in numeric literals.

comment:5 Changed 3 years ago by Thierry Monteil

Reviewers: Thierry Monteil
Status: needs_reviewneeds_work

In my understanding of the PEP (confirmed by testing on a ipython3 console), trailing underscores are not allowed, however we currently have:

sage: 100_000_
100000

Also, perhaps could you show the feature in action with a concrete example, something like:

The preparser can handle PEP 515 (see :trac:`28490`)::

    sage: 1_000_000 + 3_000
    1003000

comment:6 Changed 3 years ago by git

Commit: cfcdc57ecce11491d8428c50a5ecfc4beea5663a61f06181e04f3c1266b22cd502bf223850df0ac0

Branch pushed to git repo; I updated commit sha1. New commits:

61f0618trac 28490: do not allow trailing underscores

comment:7 Changed 3 years ago by git

Commit: 61f06181e04f3c1266b22cd502bf223850df0ac0a26c28960fd2c67b2415ed3bd75f9acac42d5a76

Branch pushed to git repo; I updated commit sha1. This was a forced push. New commits:

a26c289trac 28490: do not allow trailing underscores

comment:8 Changed 3 years ago by John Palmieri

Description: modified (diff)
Status: needs_workneeds_review

Okay, trailing underscores now should fail, and I added the doctest.

comment:9 Changed 3 years ago by Thierry Monteil

The patchbots seem unhappy, apparently the doctests do not behave the same there.

comment:10 Changed 3 years ago by John Palmieri

The doctests pass with Python 3, not with Python 2. I guess there is a question: using underscores is only valid starting in Python. Do we want to allow this in Sage built with Python 2?

Also, a leading zero is valid in Python 2 but not in Python 3. What should we do about that?

comment:11 Changed 3 years ago by git

Commit: a26c28960fd2c67b2415ed3bd75f9acac42d5a762fe807bad14c00b3c27421c8765893f0de56e1f2

Branch pushed to git repo; I updated commit sha1. New commits:

2fe807btrac 28490: fix py2 behavior and doctests

comment:12 Changed 3 years ago by John Palmieri

I propose to allow underscores with Python 2.

comment:13 Changed 3 years ago by Thierry Monteil

My personal opinion:

  • write a simple code with Python 3 in mind only, to avoid cumbersome code, since Python2 will be dropped soon anyway (an i guess nobody using Python 2 will complain that 1000_000 should not be accepted)
  • use # py2 and # py3 in doctests to make patchbots happy.

comment:14 Changed 3 years ago by John Palmieri

That's mostly what I did, except that with my code and Sage + Python 2, leading zeroes are still allowed. Aside from the # py2 and # py3 doctest tags, the only thing I've added to support Python 2 is to insert if (six.PY3 and ...) instead of if (...). (This is to deal with leading zeroes.)

comment:15 Changed 3 years ago by John Palmieri

ping?

comment:16 Changed 3 years ago by Erik Bray

Milestone: sage-8.9sage-9.1

Ticket retargeted after milestone closed

comment:17 Changed 3 years ago by Vincent Delecroix

Could you factorize

        # Multiple consecutive underscores is invalid Python
        # syntax. Do not preparse.
        if num.find('__') != -1:
            continue

        # Trailing underscore is invalid Python syntax. Do not
        # preparse.
        if num.endswith('_'):
            continue

into

        # Multiple consecutive underscores or trailing underscore
        # is invalid Python syntax. Do not preparse.
        if num.find('__') != -1 or num.endswith('_'):
            continue

comment:18 Changed 3 years ago by git

Commit: 2fe807bad14c00b3c27421c8765893f0de56e1f28d22780cadd555599c2f50ec74c92274f668ccbc

Branch pushed to git repo; I updated commit sha1. This was a forced push. New commits:

9c3ba0ctrac 28490: preparse underscores in numeric literals.
ab0c26etrac 28490: do not allow trailing underscores
172ea24trac 28490: fix py2 behavior and doctests
8d22780trac 28490: combine two "if" statements into one.

comment:19 in reply to:  17 Changed 3 years ago by John Palmieri

Replying to vdelecroix:

Could you factorize

        # Multiple consecutive underscores is invalid Python
        # syntax. Do not preparse.
        if num.find('__') != -1:
            continue

        # Trailing underscore is invalid Python syntax. Do not
        # preparse.
        if num.endswith('_'):
            continue

into

        # Multiple consecutive underscores or trailing underscore
        # is invalid Python syntax. Do not preparse.
        if num.find('__') != -1 or num.endswith('_'):
            continue

Done.

comment:20 Changed 3 years ago by Vincent Delecroix

Why would you change 123_456 to 123456? This works perfectly fine

sage: preparser(False)
sage: Integer(123_456)
123456
sage: preparser(True)

The only problem seems to be that .0 is too often converted to .gen(0)

sage: preparse("100_000.0")
'100_000.gen(0)'

comment:21 Changed 3 years ago by Vincent Delecroix

Couldn't you simply change dec_num to something like dec_num = r"\d+(_\d+)*" ?

comment:22 Changed 3 years ago by Vincent Delecroix

Status: needs_reviewneeds_work

And it is not the role of the Sage preparser to implement PEP requests.

comment:23 Changed 3 years ago by Vincent Delecroix

In other words: it is good to make strings with underscored be recognized as numbers. But do not modify the input string beyond what is strictly necessary.

comment:24 Changed 3 years ago by Vincent Delecroix

I put a branch at public/28490 which contains (I hope) the correct regexps. I also added more doctests for floating point numbers.

Of course Integer('1_23') and RealNumber('1_1.') are broken. But this is IMHO a distinct problem.

comment:25 in reply to:  20 ; Changed 3 years ago by John Palmieri

Replying to vdelecroix:

Why would you change 123_456 to 123456? This works perfectly fine

5 months ago, when the ticket was opened, Python 2 was the default for Sage, and all of these things were completely broken with Python 2. Should this ticket provide support for Python 2? I haven't looked at your changes, so I don't know if they do.

And it is not the role of the Sage preparser to implement PEP requests.

It's not a request anymore, it's been merged into Python. Whose role is it to make sure that valid Python 3 syntax, like 100_000.0, is interpreted properly in Sage?

comment:26 Changed 3 years ago by John Palmieri

By the way, I removed dec_num because it was redundant: covered (I think) by the other regexps.

comment:27 in reply to:  25 Changed 3 years ago by Vincent Delecroix

Replying to jhpalmieri:

Replying to vdelecroix:

Why would you change 123_456 to 123456? This works perfectly fine

5 months ago, when the ticket was opened, Python 2 was the default for Sage, and all of these things were completely broken with Python 2. Should this ticket provide support for Python 2? I haven't looked at your changes, so I don't know if they do.

Python 2 is supposed to be dropped in sage-9.2. It should be supported for the next 9.1 release.

And it is not the role of the Sage preparser to implement PEP requests.

It's not a request anymore, it's been merged into Python. Whose role is it to make sure that valid Python 3 syntax, like 100_000.0, is interpreted properly in Sage?

Agreed. Did you have a look at public/28490? It does not touch to the interpreter beyond the update of regexp.

comment:28 Changed 3 years ago by John Palmieri

I looked at your branch. It doesn't work with Python 2, for what that's worth:

sage: 123_456
  File "<ipython-input-1-d4d70e3ee054>", line 1
    Integer(123_456)
                  ^
SyntaxError: invalid syntax

I also get a few doctest failures with Python 3, but they should be easy to fix.

You can delete dec_num since it is redundant.

  • src/sage/repl/preparse.py

    diff --git a/src/sage/repl/preparse.py b/src/sage/repl/preparse.py
    index a699394d1b..9b877285e9 100644
    a b def preparse_numeric_literals(code, extract=False): 
    830830
    831831    global all_num_regex
    832832    if all_num_regex is None:
    833         dec_num = r"\b\d+(_\d+)*"
    834833        hex_num = r"\b0x[0-9a-f]+(_[0-9a-f]+)*"
    835834        oct_num = r"\b0o[0-7]+(_[0-7]+)*"
    836835        bin_num = r"\b0b[01]+(_[01]+)*"
    837836        # This is slightly annoying as floating point numbers may start
    838837        # with a decimal point, but if they do the \b will not match.
    839838        float_num = r"((\b\d+(_\d+)*([.](\d+(_\d+)*)?)?)|([.]\d+(_\d+)*))(e[-+]?\d+(_\d+)*)?"
    840         all_num = r"((%s)|(%s)|(%s)|(%s)|(%s))(rj|rL|jr|Lr|j|L|r|)\b" % (hex_num, oct_num, bin_num, float_num, dec_num)
     839        all_num = r"((%s)|(%s)|(%s)|(%s))(rj|rL|jr|Lr|j|L|r|)\b" % (hex_num, oct_num, bin_num, float_num)
    841840        all_num_regex = re.compile(all_num, re.I)
    842841
    843842    for m in all_num_regex.finditer(code):

The preparser is still broken with your branch:

sage: preparse('100_00.0')
"RealNumber('100_00.gen(0)')"

We need to fix this and add a doctest.

Last edited 3 years ago by John Palmieri (previous) (diff)

comment:29 Changed 3 years ago by John Palmieri

First, with this change:

  • src/sage/repl/preparse.py

    diff --git a/src/sage/repl/preparse.py b/src/sage/repl/preparse.py
    index a699394d1b..dd798f06b7 100644
    a b def preparse(line, reset=True, do_time=False, ignore_prompts=False, 
    13001300
    13011301    # Generators
    13021302    # R.0 -> R.gen(0)
    1303     L = re.sub(r'([_a-zA-Z]\w*|[)\]])\.(\d+)', r'\1.gen(\2)', L)
     1303    L = re.sub(r'([a-zA-Z]\w*|[)\]])\.(\d+)', r'\1.gen(\2)', L)
    13041304
    13051305    # Use ^ for exponentiation and ^^ for xor
    13061306    # (A side effect is that **** becomes xor as well.)

preparse(100_000.0) works, but 100_000.0 does not:

sage: preparse('100_000.0')
"RealNumber('100_000.0')"
sage: 100_000.0
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
...
TypeError: unable to convert '100_000.0' to a real number

So that's some progress. Then this change fixes the conversion to RealNumber:

  • src/sage/rings/real_mpfr.pyx

    diff --git a/src/sage/rings/real_mpfr.pyx b/src/sage/rings/real_mpfr.pyx
    index 26cfddc0a2..8ec11ae38f 100644
    a b cdef class RealNumber(sage.structure.element.RingElement): 
    14821482            mpfr_set_z(self.value, (<gmpy2.mpz>x).z, parent.rnd)
    14831483        else:
    14841484            s = str(x).replace(' ','')
     1485            s = str(x).replace('_','')
    14851486            s_lower = s.lower()
    14861487            if s_lower == 'infinity':
    14871488                raise ValueError('can only convert signed infinity to RR')

comment:30 in reply to:  28 Changed 3 years ago by Vincent Delecroix

Replying to jhpalmieri:

I looked at your branch. It doesn't work with Python 2, for what that's worth:

sage: 123_456
  File "<ipython-input-1-d4d70e3ee054>", line 1
    Integer(123_456)
                  ^
SyntaxError: invalid syntax

It does work with Python2 as the above SyntaxError is the expected behavior

$ python2 -c '123_456'
  File "<string>", line 1
    123_456
          ^
SyntaxError: invalid syntax

I don't see why we would change that in Sage.

comment:31 in reply to:  29 ; Changed 3 years ago by Vincent Delecroix

Replying to jhpalmieri:

First, with this change:

<SNIP>

preparse(100_000.0) works, but 100_000.0 does not:

<SNIP>

So that's some progress.

Nice! To my mind, this is the end. RealNumber should accept this kind of strings as float does.

sage: float('1_1.1_1e1_1')
1111000000000.0

It is not the job of the preparser to modify the input string.

Then this change fixes the conversion to RealNumber:

<SNIP>

Which is precisely what I don't wnat to do here!

comment:32 in reply to:  31 ; Changed 3 years ago by John Palmieri

Replying to vdelecroix:

Then this change fixes the conversion to RealNumber:

<SNIP>

Which is precisely what I don't wnat to do here!

Why not? This is in the __init__ method for RealNumber, where the string is already modified, as you can see from the surrounding lines. It's not in the preparser. Where else would you want to fix it?

comment:33 in reply to:  32 Changed 3 years ago by Vincent Delecroix

Replying to jhpalmieri:

Replying to vdelecroix:

Then this change fixes the conversion to RealNumber:

<SNIP>

Which is precisely what I don't wnat to do here!

Why not? This is in the __init__ method for RealNumber, where the string is already modified, as you can see from the surrounding lines. It's not in the preparser. Where else would you want to fix it?

Oups. I misread. It is fine modifying RealNumber. But the title and description has to be modified as it is not only about touching the preparser. Also Integer and Rational constructors have to be fixed.

comment:34 Changed 3 years ago by Vincent Delecroix

Your 'fix' from the first part of comment:29 was not one... I will update the branch in a second.

comment:35 Changed 3 years ago by Vincent Delecroix

Authors: John PalmieriJohn Palmieri, Vincent Delecroix
Branch: u/jhpalmieri/preparse_underscorepublic/28490
Commit: 8d22780cadd555599c2f50ec74c92274f668ccbc604d18debfdbb45e69258e61f847b15d01d0440e

New commits:

604d18d28490: let preparser handle correctly _ in numbers

comment:36 Changed 3 years ago by git

Commit: 604d18debfdbb45e69258e61f847b15d01d0440ed40ba4464cb4e8e2f202985f7cfe6623556fe99a

Branch pushed to git repo; I updated commit sha1. New commits:

f235d0228490: remove leading 0 doctest
0148e7a28490: more real numbers parser doctests
ca9809328490: remove dec_num
d40ba4428490: fix X.0 -> X.gen(0)

comment:37 Changed 3 years ago by git

Commit: d40ba4464cb4e8e2f202985f7cfe6623556fe99a70496a9218e87daf75e25f77375760023991456e

Branch pushed to git repo; I updated commit sha1. New commits:

70496a928490: integer and real number constructor

comment:38 Changed 3 years ago by Vincent Delecroix

Status: needs_workneeds_review

comment:39 Changed 3 years ago by John Palmieri

With your branch in Python 2, Integer('1_3') returns 13 but plain 1_3 raises a SyntaxError. Is that what you intended? It's not consistent behavior.

comment:40 Changed 3 years ago by John Palmieri

Summary: Let Sage preparser honour PEP 515 (py3)Let Sage honour PEP 515 (py3)

comment:41 Changed 3 years ago by John Palmieri

Description: modified (diff)

comment:42 in reply to:  39 Changed 3 years ago by Vincent Delecroix

Replying to jhpalmieri:

With your branch in Python 2, Integer('1_3') returns 13 but plain 1_3 raises a SyntaxError. Is that what you intended? It's not consistent behavior.

It is not nice I agree. But I do not quite see inconsistency. Some constructors (like Integer.__init__) can perfectly support additional syntax. The problem to me is that on Python2, 1_3. does not raise an error. But I do not want to worry about Python2. If you find a non invasive solution, feel free to add to the branch.

Related comment: it would be interesting to benchmark the two options for the preparser 123 -> Integer(123) to 123 -> Integer('123'). See the noticeable slowdown mentioned in this sage-devel thread.

Last edited 3 years ago by Vincent Delecroix (previous) (diff)

comment:43 Changed 3 years ago by git

Commit: 70496a9218e87daf75e25f77375760023991456e4c0787ddabd93c2b79d652131503685c98d12888

Branch pushed to git repo; I updated commit sha1. New commits:

4c0787dtrac 28490: tag one doctest "py3"

comment:44 Changed 3 years ago by John Palmieri

Authors: John Palmieri, Vincent DelecroixVincent Delecroix, John Palmieri
Reviewers: Thierry MonteilThierry Monteil, Vincent Delecroix, John Palmieri
Status: needs_reviewpositive_review

Okay, good enough, I think.

comment:45 Changed 3 years ago by Volker Braun

Branch: public/284904c0787ddabd93c2b79d652131503685c98d12888
Resolution: fixed
Status: positive_reviewclosed
Note: See TracTickets for help on using tickets.