Opened 6 years ago

Closed 6 years ago

Preparser gets lost with iterated ellipsis_range

Reported by: Owned by: tmonteil major sage-6.5 user interface Nils Bruin Thierry Monteil, Marc Mezzarobba N/A 7748c7c (Commits) 7748c7c62cdd946fbcec5647faeec9ded0db8573

Description

Here is a bug reported during a workshop at Villetaneuse today:

sage: [1..10]
[1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
sage: len([1..10])
10
sage: [1..len([1..10])]
[1, 2, 3]

After a quick look, it seems that there is a preparsing problem, [1..len([1..10])] should be preparsed as something like:

(ellipsis_range(Integer(1),Ellipsis,len(ellipsis_range(Integer(1),Ellipsis,Integer(10)))))

but it is currently preparsed as:

sage: preparse('[1..len([1..10])]')
'(ellipsis_range(Integer(1),Ellipsis,len([Integer(1),Ellipsis,Integer(10)])))'

(the second ellipsis_range disappeared).

comment:1 Changed 6 years ago by nbruin

I think we're really out of our depth here. ellipsis_range can actually take multiple .. in its arguments, so we need to distinguish between

[1,2..10,12..20]

and

[1,2..len([10,12..20])]

The code in question is presently (see sage.misc.preparser line 520)

ix = code.find('..')
while ix != -1:
if ix == 0:
raise SyntaxError("Cannot start line with ellipsis.")
elif code[ix-1]=='.':
# '...' be valid Python in index slices
code = code[:ix-1] + "Ellipsis" + code[ix+2:]
elif len(code) >= ix+3 and code[ix+2]=='.':
# '...' be valid Python in index slices
code = code[:ix] + "Ellipsis" + code[ix+3:]
else:
start_list, end_list = containing_block(code, ix, ['()','[]'])
(*)         arguments = code[start_list+1:end_list-1].replace('...', ',Ellipsis,').replace('..', ',Ellipsis,')
arguments = re.sub(r',\s*,', ',', arguments)
if preparse_step:
arguments = arguments.replace(';', ', step=')
range_or_iter = 'range' if code[start_list]=='[' else 'iter'
code = "%s(ellipsis_%s(%s))%s" %  (code[:start_list],
range_or_iter,
arguments,
code[end_list:])
ix = code.find('..')

The problem is in the line marked (*). It really does need to be prepared to make multiple substitutions, but it should leave ".." that are enclosed in some kind of bracket for later substitutions.

This is really one of those cases where string-substitutions aren't really powerful enough. It we'd start with a '..' that has a containing block that isn't properly contained in a containing block of any other '..', rather than the first '..' we find, then we'd be ok. (because inside that containing block all the '..' do belong to the same ellipsis_range).

I'm a little worried about cost with that approach: what happens if a whole ".sage" file gets preparsed that has a whole lot of '..' in it? does this routine then execute on that huge string? execution time quadratic (or worse) in the number of '..' might be problematic.

EDIT: correction: what we should do is: after we have found our initial ix and our start_list,end_list, we should do

while True:
new_ix = code.find('..',ix+1,end_list)
if new_ix == -1:
break #EXIT HERE if there are no other '..' to consider
else:
ix = new_ix
new_start_list,new_end_list = containing_block(code,ix,['(),'[]'])
if (new_start_list != start_list) or (new_end_list != end_list):
start_list = new_start_list
end_list = new_end_list

This should be pretty quick because it simply zooms in on the block that's under consideration. Those blocks should be pretty small, even in a big file.

This code should probably be edited a bit because there are apparently some occurrences of '...' that need to be treated differently and I'm not doing that here at the moment.

Last edited 6 years ago by nbruin (previous) (diff)

comment:2 Changed 6 years ago by nbruin

Incidentally, the comma handling is perhaps a little too permissive:

sage: [1,,2..3]
[1, 2, 3]
sage: [1,,2,3]

Running preparse_ellipsis has the side effect for turning something that should be a syntax error into valid code.

comment:3 Changed 6 years ago by nbruin

• Branch set to u/nbruin/preparser_gets_lost_with_iterated_ellipsis_range

comment:4 Changed 6 years ago by nbruin

• Commit set to 0c0ddd9935c9907906c7627e9d20538f443c6b23

OK, I did not deal with the extra comma replacements (which seem relatively harmless). i think the "..." detection is simply a matter that those "..." can be replaced by Ellipsis but should not trigger the generation of an ellipsis_range/ellipsis_iter. That's my current interpretation.

New commits:

 ​0c0ddd9 trac 17378: make sure preparse_ellipsis deals with nested ranges properly

comment:5 Changed 6 years ago by nbruin

• Authors set to nbruin
• Status changed from new to needs_review

comment:6 Changed 6 years ago by git

• Commit changed from 0c0ddd9935c9907906c7627e9d20538f443c6b23 to 7b4b9ec8d1df91b7c4659deb1d3d5fc8901ce339

Branch pushed to git repo; I updated commit sha1. This was a forced push. New commits:

 ​7b4b9ec trac 17378: make sure preparse_ellipsis deals with nested ranges properly

comment:7 Changed 6 years ago by git

• Commit changed from 7b4b9ec8d1df91b7c4659deb1d3d5fc8901ce339 to a0f5daec2c09514cb9981479d93018a353d3d6a4

Branch pushed to git repo; I updated commit sha1. This was a forced push. New commits:

 ​a0f5dae trac 17378: make sure preparse_ellipsis deals with nested ranges properly

comment:8 Changed 6 years ago by tmonteil

• Authors changed from nbruin to Nils Bruin
• Reviewers set to Thierry Monteil
• Status changed from needs_review to needs_work

This makes sense and fixes the problem.

Could you just replace #17378 by :trac:`17378` in the doctest so that sphinx can handle it ?

comment:9 Changed 6 years ago by tmonteil

• Branch changed from u/nbruin/preparser_gets_lost_with_iterated_ellipsis_range to u/tmonteil/preparser_gets_lost_with_iterated_ellipsis_range

comment:10 Changed 6 years ago by tmonteil

• Commit changed from a0f5daec2c09514cb9981479d93018a353d3d6a4 to 0f8764fc79a47000fea74e01736a4728a0e895a1
• Status changed from needs_work to positive_review

New commits:

 ​0f8764f #17378 let sphinx know about trac ticket (reviewer commit).

comment:11 Changed 6 years ago by vbraun

• Status changed from positive_review to needs_work

Conflicts with #17396

comment:12 Changed 6 years ago by nbruin

• Branch changed from u/tmonteil/preparser_gets_lost_with_iterated_ellipsis_range to u/nbruin/preparser_gets_lost_with_iterated_ellipsis_range

comment:13 Changed 6 years ago by nbruin

• Commit changed from 0f8764fc79a47000fea74e01736a4728a0e895a1 to 7748c7c62cdd946fbcec5647faeec9ded0db8573
• Status changed from needs_work to needs_review

hard rebase to resolve conflict

New commits:

 ​7748c7c trac 17378: make sure preparse_ellipsis deals with nested ranges properly

comment:14 Changed 6 years ago by mmezzarobba

• Reviewers changed from Thierry Monteil to Thierry Monteil, Marc Mezzarobba
• Status changed from needs_review to positive_review

comment:15 Changed 6 years ago by vbraun

• Branch changed from u/nbruin/preparser_gets_lost_with_iterated_ellipsis_range to 7748c7c62cdd946fbcec5647faeec9ded0db8573
• Resolution set to fixed
• Status changed from positive_review to closed
Note: See TracTickets for help on using tickets.