#25535 closed defect (fixed)

failing internet doctest in sage/repl/load.py

Reported by: slabbe Owned by:
Priority: major Milestone: sage-8.7
Component: doctest coverage Keywords: thursdaysbdx
Cc: Merged in:
Authors: Sébastien Labbé Reviewers: Vincent Klein
Report Upstream: N/A Work issues:
Branch: adeedb8 (Commits) Commit: adeedb8cacfa38f4f4ad3580aa103aed646b0e0f
Dependencies: Stopgaps:

Description

As reported on sage-release 8.3.beta3,

sage -tp --optional=sage,internet src/sage/repl/load.py

gives

sage -t src/sage/repl/load.py
**********************************************************************
File "src/sage/repl/load.py", line 142, in sage.repl.load.load
Failed example:
    sage.repl.load.load('http://wstein.org/loadtest.py', globals())  # optional - internet
Exception raised:
    Traceback (most recent call last):
      File "/home/slabbe/GitBox/sage/local/lib/python2.7/site-packages/sage/doctest/forker.py", line 572, in _run
        self.compile_and_execute(example, compiler, test.globs)
      File "/home/slabbe/GitBox/sage/local/lib/python2.7/site-packages/sage/doctest/forker.py", line 982, in compile_and_execute
        exec(compiled, globs)
      File "<doctest sage.repl.load.load[16]>", line 1, in <module>
        sage.repl.load.load('http://wstein.org/loadtest.py', globals())  # optional - internet
      File "/home/slabbe/GitBox/sage/local/lib/python2.7/site-packages/sage/repl/load.py", line 242, in load
        code = compile(f.read(), fpath, 'exec')
      File "/home/slabbe/.sage/temp/labbe-laptop/4725/tmp_aLSpZn.py", line 1
        <!DOCTYPE html>
        ^
    SyntaxError: invalid syntax
**********************************************************************
1 item had failures:
   1 of  37 in sage.repl.load.load
    [45 tests, 1 failure, 3.81 s]
----------------------------------------------------------------------
sage -t src/sage/repl/load.py  # 1 doctest failed
----------------------------------------------------------------------

Change History (24)

comment:1 Changed 16 months ago by slabbe

This works:

$ wget http://wstein.org/loadtest.py
$ cat loadtest.py 
print "hi from the net"

print 2+3

But it does not work from Sage:

sage: sage.repl.load.load('http://wstein.org/loadtest.py', globals())
  File "/home/slabbe/.sage/temp/labbe-laptop/25148/tmp_GGzNox.py", line 1
    <!DOCTYPE html>
    ^
SyntaxError: invalid syntax

This allows to understand the problem:

sage: !cat /home/slabbe/.sage/temp/labbe-laptop/25148/tmp_GGzNox.py
<!DOCTYPE html>
...
<head>
<title>Access denied | wstein.org used Cloudflare to restrict access</title>
...
        <h2 class="cf-subheadline">Access denied</h2>
...
            <h2 data-translate="what_happened">What happened?</h2>
            <p>The owner of this website (wstein.org) has banned your access based on your browser's signature. ...</p>
          </div>
...
</script>
</body>
</html>
sage: 

comment:2 Changed 16 months ago by gh-timokau

Is the point of that doctest to fetch a random python file from a webpage over http and execute it? If so, we should probably remove the doctest instead or amend it to use a local file / check the hash of the file first.

comment:3 Changed 16 months ago by nbruin

The issue that Cloudflare blocks the default python useragent was already encountered at #25222. We have a workaround there as well.

Indeed the wisdom in executing a file fetched from the internet relatively blindly is questionable.

comment:4 Changed 16 months ago by slabbe

  • Branch set to u/slabbe/25535
  • Commit set to d4eab4b746f3df347dc31b6277b17b58218a816d
  • Status changed from new to needs_review

As suggested by nbruin, I added some User-Agent in get_remote_file. The problem is that I now use urlopen instead of urlretrieve. Thus, I can not specify the report_hook anymore which was used by the verbose.

I set it to needs_review to get some feedback by the patchbots.


New commits:

d4eab4b25535: specifying a User-Agent

comment:5 Changed 16 months ago by gh-timokau

In my opinion the security issue of fetching an executing a file from the internet (especially over HTTP) should also be addressed. Although you could argue that it is out of scope in this ticket, maybe it can be combined.

comment:6 follow-up: Changed 16 months ago by slabbe

Indeed, I wanted to say that my first commit was just fixing the doctest with the new url http://www.sagemath.org/files/loadtest.py. I am not a specialist on security issues.

I know that fixing this doctest means that the loadtest.py file will be downloaded and executed each time someone runs optional doctests including those with tag internet. Is it possible that someone replaces the loadtest.py file by something else more malecious one day? I don't know. I let other discuss the security issue and possibly propose other commits, solutions...

Meanwhile, I know that the doctest is broken which means that something is currently broken and keeping the doctest will guarrentees that the feature continue to be tested.

comment:7 in reply to: ↑ 6 Changed 16 months ago by gh-timokau

Replying to slabbe:

Is it possible that someone replaces the loadtest.py file by something else more malecious one day? I don't know. I let other discuss the security issue and possibly propose other commits, solutions...

Yes, that is one possibility. It could be replaced by someone on the sage team for various motivations (though rather unlikely) or be replaced as part of a server compromise. Targeted attacks that go unnoticed would also be possible. But worse is the possibility of the request jost being MITM'd by *anybody* on the network since it is fetched over HTTP. So if you're running the sage doctests in a public wifi, anybody else in that wifi could execute code on your computer.

Possible solutions include

  • first downloading the file, then verifying the contents match a hash, then loading it
  • just loading a local file

Do we really need to test that we can fetch a file from the internet? Thats more or less a python primitive. And if we need to test that we can and should test it seperately from the load mechanism.

comment:8 Changed 16 months ago by chapoton

the loadtest.py file should also be made python3-compatible

EDIT: DONE

Last edited 16 months ago by chapoton (previous) (diff)

comment:9 Changed 15 months ago by git

  • Commit changed from d4eab4b746f3df347dc31b6277b17b58218a816d to 05e53d0c041d17ab7921c4ce950cb57c25a72767

Branch pushed to git repo; I updated commit sha1. This was a forced push. New commits:

5f01ebe25535: specifying a User-Agent
05e53d025535: fixing doctest of get_remote_file function

comment:10 Changed 15 months ago by slabbe

I rebased on top of the most recent beta and I added a commit.

There are three issues begin dealt in this ticket:

  • get_remote_file was broken
  • get_remote_file was not properly tested (even broken, the doctests were passing)
  • load was loading a file from the http which is not secure

I believe the proposed two commits solves the above 3 issues (needs review).

comment:11 follow-up: Changed 15 months ago by slabbe

There is one small thing that I am still not able to preserve. It is the loading bar of dots ........ as in:

sage: get_remote_file('http://www.sagemath.org/files/loadtest.py', verbose=True)
Attempting to load remote file: http://www.sagemath.org/files/loadtest.py
Loading: [.................]    

which used to be printed with the help of the report_hook which I can not use anymore since I replaced the call to urlretrieve by a call to urlopen:

-        urlretrieve(filename, temp_name, report_hook)
+        content = urlopen(req, timeout=1)
+        with open(temp_name, 'w') as f:
+            f.write(content.read())

Somebody has a solution? (... one of them being getting rid of the report_hook thing)

comment:12 follow-up: Changed 15 months ago by chapoton

le plugin "pyflakes" est pas content, car un import est en trop..

comment:13 Changed 14 months ago by git

  • Commit changed from 05e53d0c041d17ab7921c4ce950cb57c25a72767 to 9e3c4f9ec5189c85561974072777d42b7c9b6fe7

Branch pushed to git repo; I updated commit sha1. This was a forced push. New commits:

68a656625535: specifying a User-Agent
bf53dd825535: fixing doctest of get_remote_file function
9e3c4f925535: removing urlretrieve command in comments

comment:14 in reply to: ↑ 12 Changed 14 months ago by slabbe

Replying to chapoton:

le plugin "pyflakes" est pas content, car un import est en trop..

yes, this was because of the urlretrieve that I am replacing by urlopen. I was waiting for an answer to my previous comment before removing the import urlretrieve...

I removed it now and rebased the branch on top of 8.4.rc0

comment:15 Changed 11 months ago by vklein

  • Milestone changed from sage-8.3 to sage-8.6

comment:16 Changed 11 months ago by vklein

  • Owner changed from (none) to vklein

comment:17 in reply to: ↑ 11 Changed 11 months ago by vklein

Replying to slabbe:

Somebody has a solution? (... one of them being getting rid of the report_hook thing)

Others urlopen usage in sage don't seems to display a kind of "Download progress bar" and in my opinion something like "Loading started" and "Loading ended" is enough for the verbose mode. It's still possible to implement a read chunk by chunk if it's really needed.

Either way the remaining report_hook function and cur global variable should be removed from the code.

comment:18 Changed 11 months ago by slabbe

  • Status changed from needs_review to needs_work

ok

comment:19 Changed 11 months ago by vklein

  • Owner changed from vklein to (none)

I am ok with the rest of this ticket, i have no other remark.

comment:20 Changed 11 months ago by embray

  • Milestone changed from sage-8.6 to sage-8.7

Retarging tickets optimistically to the next milestone. If you are responsible for this ticket (either its reporter or owner) and don't believe you are likely to complete this ticket before the next release (8.7) please retarget this ticket's milestone to sage-pending or sage-wishlist.

comment:21 Changed 11 months ago by git

  • Commit changed from 9e3c4f9ec5189c85561974072777d42b7c9b6fe7 to adeedb8cacfa38f4f4ad3580aa103aed646b0e0f

Branch pushed to git repo; I updated commit sha1. This was a forced push. New commits:

abcdf8725535: specifying a User-Agent
0a105e025535: fixing doctest of get_remote_file function
b4bfc4a25535: removing urlretrieve command in comments
adeedb825535: cleaning report_hook

comment:22 Changed 11 months ago by slabbe

  • Authors set to Sébastien Labbé
  • Keywords thursdaysbdx added
  • Status changed from needs_work to needs_review

comment:23 Changed 11 months ago by vklein

  • Reviewers set to Vincent Klein
  • Status changed from needs_review to positive_review

Looks good to me.

comment:24 Changed 10 months ago by vbraun

  • Branch changed from u/slabbe/25535 to adeedb8cacfa38f4f4ad3580aa103aed646b0e0f
  • Resolution set to fixed
  • Status changed from positive_review to closed
Note: See TracTickets for help on using tickets.