#28490 closed enhancement (fixed)

Let Sage honour PEP 515 (py3)

Reported by: tmonteil Owned by:
Priority: major Milestone: sage-9.1
Component: python3 Keywords: preparser, py3
Cc: slelievre Merged in:
Authors: Vincent Delecroix, John Palmieri Reviewers: Thierry Monteil, Vincent Delecroix, John Palmieri
Report Upstream: N/A Work issues:
Branch: 4c0787d (Commits, GitHub, GitLab) Commit: 4c0787ddabd93c2b79d652131503685c98d12888
Dependencies: Stopgaps:

Status badges

Description (last modified by jhpalmieri)

In Python 3, underscored numbers like 10_000_000.0 are allowed, but the preparser, as well as integer and real number constructors, do not handle them yet, see this Ask Sage question and PEP 515.

Change History (45)

comment:1 Changed 19 months ago by slelievre

  • Cc slelievre added
  • Description modified (diff)
  • Keywords preparser py3 added

comment:2 Changed 18 months ago by jhpalmieri

  • Branch set to u/jhpalmieri/preparse_underscore

comment:3 Changed 18 months ago by jhpalmieri

  • Authors set to John Palmieri
  • Commit set to 3601e61b65ce7c67e272e4d16be6a555318a6351
  • Status changed from new to needs_review

Here is an attempt. This does at least one other thing: it no longer allows leading zeroes in integers. This is not valid Python syntax, so Sage should not allow it, either.

comment:4 Changed 18 months ago by git

  • Commit changed from 3601e61b65ce7c67e272e4d16be6a555318a6351 to cfcdc57ecce11491d8428c50a5ecfc4beea5663a

Branch pushed to git repo; I updated commit sha1. This was a forced push. New commits:

cfcdc57trac 28490: preparse underscores in numeric literals.

comment:5 Changed 18 months ago by tmonteil

  • Reviewers set to Thierry Monteil
  • Status changed from needs_review to needs_work

In my understanding of the PEP (confirmed by testing on a ipython3 console), trailing underscores are not allowed, however we currently have:

sage: 100_000_
100000

Also, perhaps could you show the feature in action with a concrete example, something like:

The preparser can handle PEP 515 (see :trac:`28490`)::

    sage: 1_000_000 + 3_000
    1003000

comment:6 Changed 18 months ago by git

  • Commit changed from cfcdc57ecce11491d8428c50a5ecfc4beea5663a to 61f06181e04f3c1266b22cd502bf223850df0ac0

Branch pushed to git repo; I updated commit sha1. New commits:

61f0618trac 28490: do not allow trailing underscores

comment:7 Changed 18 months ago by git

  • Commit changed from 61f06181e04f3c1266b22cd502bf223850df0ac0 to a26c28960fd2c67b2415ed3bd75f9acac42d5a76

Branch pushed to git repo; I updated commit sha1. This was a forced push. New commits:

a26c289trac 28490: do not allow trailing underscores

comment:8 Changed 18 months ago by jhpalmieri

  • Description modified (diff)
  • Status changed from needs_work to needs_review

Okay, trailing underscores now should fail, and I added the doctest.

comment:9 Changed 18 months ago by tmonteil

The patchbots seem unhappy, apparently the doctests do not behave the same there.

comment:10 Changed 18 months ago by jhpalmieri

The doctests pass with Python 3, not with Python 2. I guess there is a question: using underscores is only valid starting in Python. Do we want to allow this in Sage built with Python 2?

Also, a leading zero is valid in Python 2 but not in Python 3. What should we do about that?

comment:11 Changed 18 months ago by git

  • Commit changed from a26c28960fd2c67b2415ed3bd75f9acac42d5a76 to 2fe807bad14c00b3c27421c8765893f0de56e1f2

Branch pushed to git repo; I updated commit sha1. New commits:

2fe807btrac 28490: fix py2 behavior and doctests

comment:12 Changed 18 months ago by jhpalmieri

I propose to allow underscores with Python 2.

comment:13 Changed 18 months ago by tmonteil

My personal opinion:

  • write a simple code with Python 3 in mind only, to avoid cumbersome code, since Python2 will be dropped soon anyway (an i guess nobody using Python 2 will complain that 1000_000 should not be accepted)
  • use # py2 and # py3 in doctests to make patchbots happy.

comment:14 Changed 18 months ago by jhpalmieri

That's mostly what I did, except that with my code and Sage + Python 2, leading zeroes are still allowed. Aside from the # py2 and # py3 doctest tags, the only thing I've added to support Python 2 is to insert if (six.PY3 and ...) instead of if (...). (This is to deal with leading zeroes.)

comment:15 Changed 16 months ago by jhpalmieri

ping?

comment:16 Changed 16 months ago by embray

  • Milestone changed from sage-8.9 to sage-9.1

Ticket retargeted after milestone closed

comment:17 follow-up: Changed 14 months ago by vdelecroix

Could you factorize

        # Multiple consecutive underscores is invalid Python
        # syntax. Do not preparse.
        if num.find('__') != -1:
            continue

        # Trailing underscore is invalid Python syntax. Do not
        # preparse.
        if num.endswith('_'):
            continue

into

        # Multiple consecutive underscores or trailing underscore
        # is invalid Python syntax. Do not preparse.
        if num.find('__') != -1 or num.endswith('_'):
            continue

comment:18 Changed 14 months ago by git

  • Commit changed from 2fe807bad14c00b3c27421c8765893f0de56e1f2 to 8d22780cadd555599c2f50ec74c92274f668ccbc

Branch pushed to git repo; I updated commit sha1. This was a forced push. New commits:

9c3ba0ctrac 28490: preparse underscores in numeric literals.
ab0c26etrac 28490: do not allow trailing underscores
172ea24trac 28490: fix py2 behavior and doctests
8d22780trac 28490: combine two "if" statements into one.

comment:19 in reply to: ↑ 17 Changed 14 months ago by jhpalmieri

Replying to vdelecroix:

Could you factorize

        # Multiple consecutive underscores is invalid Python
        # syntax. Do not preparse.
        if num.find('__') != -1:
            continue

        # Trailing underscore is invalid Python syntax. Do not
        # preparse.
        if num.endswith('_'):
            continue

into

        # Multiple consecutive underscores or trailing underscore
        # is invalid Python syntax. Do not preparse.
        if num.find('__') != -1 or num.endswith('_'):
            continue

Done.

comment:20 follow-up: Changed 14 months ago by vdelecroix

Why would you change 123_456 to 123456? This works perfectly fine

sage: preparser(False)
sage: Integer(123_456)
123456
sage: preparser(True)

The only problem seems to be that .0 is too often converted to .gen(0)

sage: preparse("100_000.0")
'100_000.gen(0)'

comment:21 Changed 14 months ago by vdelecroix

Couldn't you simply change dec_num to something like dec_num = r"\d+(_\d+)*" ?

comment:22 Changed 14 months ago by vdelecroix

  • Status changed from needs_review to needs_work

And it is not the role of the Sage preparser to implement PEP requests.

comment:23 Changed 14 months ago by vdelecroix

In other words: it is good to make strings with underscored be recognized as numbers. But do not modify the input string beyond what is strictly necessary.

comment:24 Changed 14 months ago by vdelecroix

I put a branch at public/28490 which contains (I hope) the correct regexps. I also added more doctests for floating point numbers.

Of course Integer('1_23') and RealNumber('1_1.') are broken. But this is IMHO a distinct problem.

comment:25 in reply to: ↑ 20 ; follow-up: Changed 14 months ago by jhpalmieri

Replying to vdelecroix:

Why would you change 123_456 to 123456? This works perfectly fine

5 months ago, when the ticket was opened, Python 2 was the default for Sage, and all of these things were completely broken with Python 2. Should this ticket provide support for Python 2? I haven't looked at your changes, so I don't know if they do.

And it is not the role of the Sage preparser to implement PEP requests.

It's not a request anymore, it's been merged into Python. Whose role is it to make sure that valid Python 3 syntax, like 100_000.0, is interpreted properly in Sage?

comment:26 Changed 14 months ago by jhpalmieri

By the way, I removed dec_num because it was redundant: covered (I think) by the other regexps.

comment:27 in reply to: ↑ 25 Changed 14 months ago by vdelecroix

Replying to jhpalmieri:

Replying to vdelecroix:

Why would you change 123_456 to 123456? This works perfectly fine

5 months ago, when the ticket was opened, Python 2 was the default for Sage, and all of these things were completely broken with Python 2. Should this ticket provide support for Python 2? I haven't looked at your changes, so I don't know if they do.

Python 2 is supposed to be dropped in sage-9.2. It should be supported for the next 9.1 release.

And it is not the role of the Sage preparser to implement PEP requests.

It's not a request anymore, it's been merged into Python. Whose role is it to make sure that valid Python 3 syntax, like 100_000.0, is interpreted properly in Sage?

Agreed. Did you have a look at public/28490? It does not touch to the interpreter beyond the update of regexp.

comment:28 follow-up: Changed 14 months ago by jhpalmieri

I looked at your branch. It doesn't work with Python 2, for what that's worth:

sage: 123_456
  File "<ipython-input-1-d4d70e3ee054>", line 1
    Integer(123_456)
                  ^
SyntaxError: invalid syntax

I also get a few doctest failures with Python 3, but they should be easy to fix.

You can delete dec_num since it is redundant.

diff --git a/src/sage/repl/preparse.py b/src/sage/repl/preparse.py
index a699394d1b..9b877285e9 100644
--- a/src/sage/repl/preparse.py
+++ b/src/sage/repl/preparse.py
@@ -830,14 +830,13 @@ def preparse_numeric_literals(code, extract=False):
 
     global all_num_regex
     if all_num_regex is None:
-        dec_num = r"\b\d+(_\d+)*"
         hex_num = r"\b0x[0-9a-f]+(_[0-9a-f]+)*"
         oct_num = r"\b0o[0-7]+(_[0-7]+)*"
         bin_num = r"\b0b[01]+(_[01]+)*"
         # This is slightly annoying as floating point numbers may start
         # with a decimal point, but if they do the \b will not match.
         float_num = r"((\b\d+(_\d+)*([.](\d+(_\d+)*)?)?)|([.]\d+(_\d+)*))(e[-+]?\d+(_\d+)*)?"
-        all_num = r"((%s)|(%s)|(%s)|(%s)|(%s))(rj|rL|jr|Lr|j|L|r|)\b" % (hex_num, oct_num, bin_num, float_num, dec_num)
+        all_num = r"((%s)|(%s)|(%s)|(%s))(rj|rL|jr|Lr|j|L|r|)\b" % (hex_num, oct_num, bin_num, float_num)
         all_num_regex = re.compile(all_num, re.I)
 
     for m in all_num_regex.finditer(code):

The preparser is still broken with your branch:

sage: preparse('100_00.0')
"RealNumber('100_00.gen(0)')"

We need to fix this and add a doctest.

Version 0, edited 14 months ago by jhpalmieri (next)

comment:29 follow-up: Changed 14 months ago by jhpalmieri

First, with this change:

  • src/sage/repl/preparse.py

    diff --git a/src/sage/repl/preparse.py b/src/sage/repl/preparse.py
    index a699394d1b..dd798f06b7 100644
    a b def preparse(line, reset=True, do_time=False, ignore_prompts=False, 
    13001300
    13011301    # Generators
    13021302    # R.0 -> R.gen(0)
    1303     L = re.sub(r'([_a-zA-Z]\w*|[)\]])\.(\d+)', r'\1.gen(\2)', L)
     1303    L = re.sub(r'([a-zA-Z]\w*|[)\]])\.(\d+)', r'\1.gen(\2)', L)
    13041304
    13051305    # Use ^ for exponentiation and ^^ for xor
    13061306    # (A side effect is that **** becomes xor as well.)

preparse(100_000.0) works, but 100_000.0 does not:

sage: preparse('100_000.0')
"RealNumber('100_000.0')"
sage: 100_000.0
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
...
TypeError: unable to convert '100_000.0' to a real number

So that's some progress. Then this change fixes the conversion to RealNumber:

  • src/sage/rings/real_mpfr.pyx

    diff --git a/src/sage/rings/real_mpfr.pyx b/src/sage/rings/real_mpfr.pyx
    index 26cfddc0a2..8ec11ae38f 100644
    a b cdef class RealNumber(sage.structure.element.RingElement): 
    14821482            mpfr_set_z(self.value, (<gmpy2.mpz>x).z, parent.rnd)
    14831483        else:
    14841484            s = str(x).replace(' ','')
     1485            s = str(x).replace('_','')
    14851486            s_lower = s.lower()
    14861487            if s_lower == 'infinity':
    14871488                raise ValueError('can only convert signed infinity to RR')

comment:30 in reply to: ↑ 28 Changed 14 months ago by vdelecroix

Replying to jhpalmieri:

I looked at your branch. It doesn't work with Python 2, for what that's worth:

sage: 123_456
  File "<ipython-input-1-d4d70e3ee054>", line 1
    Integer(123_456)
                  ^
SyntaxError: invalid syntax

It does work with Python2 as the above SyntaxError is the expected behavior

$ python2 -c '123_456'
  File "<string>", line 1
    123_456
          ^
SyntaxError: invalid syntax

I don't see why we would change that in Sage.

comment:31 in reply to: ↑ 29 ; follow-up: Changed 14 months ago by vdelecroix

Replying to jhpalmieri:

First, with this change:

<SNIP>

preparse(100_000.0) works, but 100_000.0 does not:

<SNIP>

So that's some progress.

Nice! To my mind, this is the end. RealNumber should accept this kind of strings as float does.

sage: float('1_1.1_1e1_1')
1111000000000.0

It is not the job of the preparser to modify the input string.

Then this change fixes the conversion to RealNumber:

<SNIP>

Which is precisely what I don't wnat to do here!

comment:32 in reply to: ↑ 31 ; follow-up: Changed 14 months ago by jhpalmieri

Replying to vdelecroix:

Then this change fixes the conversion to RealNumber:

<SNIP>

Which is precisely what I don't wnat to do here!

Why not? This is in the __init__ method for RealNumber, where the string is already modified, as you can see from the surrounding lines. It's not in the preparser. Where else would you want to fix it?

comment:33 in reply to: ↑ 32 Changed 14 months ago by vdelecroix

Replying to jhpalmieri:

Replying to vdelecroix:

Then this change fixes the conversion to RealNumber:

<SNIP>

Which is precisely what I don't wnat to do here!

Why not? This is in the __init__ method for RealNumber, where the string is already modified, as you can see from the surrounding lines. It's not in the preparser. Where else would you want to fix it?

Oups. I misread. It is fine modifying RealNumber. But the title and description has to be modified as it is not only about touching the preparser. Also Integer and Rational constructors have to be fixed.

comment:34 Changed 14 months ago by vdelecroix

Your 'fix' from the first part of comment:29 was not one... I will update the branch in a second.

comment:35 Changed 14 months ago by vdelecroix

  • Authors changed from John Palmieri to John Palmieri, Vincent Delecroix
  • Branch changed from u/jhpalmieri/preparse_underscore to public/28490
  • Commit changed from 8d22780cadd555599c2f50ec74c92274f668ccbc to 604d18debfdbb45e69258e61f847b15d01d0440e

New commits:

604d18d28490: let preparser handle correctly _ in numbers

comment:36 Changed 14 months ago by git

  • Commit changed from 604d18debfdbb45e69258e61f847b15d01d0440e to d40ba4464cb4e8e2f202985f7cfe6623556fe99a

Branch pushed to git repo; I updated commit sha1. New commits:

f235d0228490: remove leading 0 doctest
0148e7a28490: more real numbers parser doctests
ca9809328490: remove dec_num
d40ba4428490: fix X.0 -> X.gen(0)

comment:37 Changed 14 months ago by git

  • Commit changed from d40ba4464cb4e8e2f202985f7cfe6623556fe99a to 70496a9218e87daf75e25f77375760023991456e

Branch pushed to git repo; I updated commit sha1. New commits:

70496a928490: integer and real number constructor

comment:38 Changed 14 months ago by vdelecroix

  • Status changed from needs_work to needs_review

comment:39 follow-up: Changed 14 months ago by jhpalmieri

With your branch in Python 2, Integer('1_3') returns 13 but plain 1_3 raises a SyntaxError. Is that what you intended? It's not consistent behavior.

comment:40 Changed 14 months ago by jhpalmieri

  • Summary changed from Let Sage preparser honour PEP 515 (py3) to Let Sage honour PEP 515 (py3)

comment:41 Changed 14 months ago by jhpalmieri

  • Description modified (diff)

comment:42 in reply to: ↑ 39 Changed 14 months ago by vdelecroix

Replying to jhpalmieri:

With your branch in Python 2, Integer('1_3') returns 13 but plain 1_3 raises a SyntaxError. Is that what you intended? It's not consistent behavior.

It is not nice I agree. But I do not quite see inconsistency. Some constructors (like Integer.__init__) can perfectly support additional syntax. The problem to me is that on Python2, 1_3. does not raise an error. But I do not want to worry about Python2. If you find a non invasive solution, feel free to add to the branch.

Related comment: it would be interesting to benchmark the two options for the preparser 123 -> Integer(123) to 123 -> Integer('123'). See the noticeable slowdown mentioned in this sage-devel thread.

Last edited 14 months ago by vdelecroix (previous) (diff)

comment:43 Changed 14 months ago by git

  • Commit changed from 70496a9218e87daf75e25f77375760023991456e to 4c0787ddabd93c2b79d652131503685c98d12888

Branch pushed to git repo; I updated commit sha1. New commits:

4c0787dtrac 28490: tag one doctest "py3"

comment:44 Changed 14 months ago by jhpalmieri

  • Authors changed from John Palmieri, Vincent Delecroix to Vincent Delecroix, John Palmieri
  • Reviewers changed from Thierry Monteil to Thierry Monteil, Vincent Delecroix, John Palmieri
  • Status changed from needs_review to positive_review

Okay, good enough, I think.

comment:45 Changed 14 months ago by vbraun

  • Branch changed from public/28490 to 4c0787ddabd93c2b79d652131503685c98d12888
  • Resolution set to fixed
  • Status changed from positive_review to closed
Note: See TracTickets for help on using tickets.