Opened 9 years ago

Closed 9 years ago

Last modified 9 years ago

#11749 closed enhancement (fixed)

Remove unneeded imports

Reported by: robertwb Owned by: tbd
Priority: major Milestone: sage-4.7.2
Component: performance Keywords:
Cc: leif, mderickx Merged in: sage-4.7.2.alpha3
Authors: Robert Bradshaw Reviewers: Keshav Kini, Leif Leonhardy
Report Upstream: N/A Work issues:
Branch: Commit:
Dependencies: Stopgaps:

Description (last modified by leif)

As a first step in simplifying the import hierarchy, remove obviously unused imports.


The patch below is the result of applying Robert's original patch (which was based on Sage 4.7.1) to the (presumably) final version of Sage 4.7.2.alpha3, ignoring rejects, and with some manual corrections due to afterwards occurring import errors when trying to start up Sage or running ptestlong.

The current omissions will be dealt with on follow-up tickets.


Apply only trac_11749-remove_unneeded_imports-rebased_on_4.7.2.alpha3.reviewer.patch to the Sage library.

Attachments (2)

11749-unneeded-imports.patch (249.7 KB) - added by robertwb 9 years ago.
trac_11749-remove_unneeded_imports-rebased_on_4.7.2.alpha3.reviewer.patch (227.1 KB) - added by leif 9 years ago.
Reviewer patch. Robert's patch rebased on the "final preliminary" Sage 4.7.2.alpha3. Apply only this one.

Download all attachments as: .zip

Change History (28)

Changed 9 years ago by robertwb

comment:1 Changed 9 years ago by robertwb

  • Status changed from new to needs_review

comment:2 Changed 9 years ago by kini

I'm ptestlonging this on sage.math right now.

comment:3 Changed 9 years ago by kini

Incidentally how did you generate this patch?

comment:4 Changed 9 years ago by kini

  • Authors set to Robert Bradshaw
  • Reviewers set to Keshav Kini

ptestlong passes! Before I mark this positive review I would like to know how you generated the patch, though. Just tracing imports and finding those which are not used?

comment:5 Changed 9 years ago by leif

  • Cc leif added

comment:6 Changed 9 years ago by leif

Really nice! This patch will save a lot of disk space! :D

comment:7 Changed 9 years ago by robertwb

I generated the patch with this code, plus a handful (less than a dozen) manual touch-ups where, e.g., symbols were used in eval() strings but not directly.

import symtable
import ast
import re

class GlobalImports(ast.NodeVisitor):
    
    func_depth = 0
    
    def __init__(self):
        self.modules = {}
        self.names = {}
        self.used = {}
    
    def visit_FunctionDef(self, node):
        self.func_depth += 1
        self.generic_visit(node)
        self.func_depth -= 1
        
    def visit_Import(self, node):
        if self.func_depth == 0:
            for alias in node.names:
                self.modules[alias.asname or alias.name] = alias.name
    
    def visit_ImportFrom(self, node):
        if self.func_depth == 0:
            for alias in node.names:
                if alias.name != '*':
                    self.names[alias.asname or alias.name] = (node.module, alias.name)
    
    def visit_Name(self, node):
        self.used[node.id] = True

def list_unused(file, source=None):
    source = source or open(file).read()
    tree = ast.parse(source)
    visitor = GlobalImports()
    visitor.visit(tree)
#    print visitor.modules
#    print visitor.names
#    print visitor.used
    all = dict(visitor.modules)
    all.update(visitor.names)
    for name in all:
        if name not in visitor.used:
            if '.' in name:
                if source.count(name) == 1:
                    print "Unused", name
            else:
                print "Unused", name
#    symtab = symtable. symtable(source, file, 'exec')
#    return symtab

def extract_used(source):
    visitor = GlobalImports()
    tree = ast.parse(source)
    visitor.visit(tree)
    return visitor.used

def prune_unused_one(file, source=None):
    if source is None:
        source = open(file).read()
    used = extract_used(source)
    import_matcher = re.compile(r'(from (\S+) import (.*))|(import (.*))').match
    
    bad_name = re.compile('[()#]').search
    lines = source.split('\n')
    for lineno, line in enumerate(lines):
        m = import_matcher(line)
        if m:
            g = m.groups()
            if g[0]:
                if g[1] == '__future__':
                    continue
                names = g[2].split(',')
            else:
                names = g[4].split(',')
            #print names
            for ix, name in enumerate(names):
                name = name.strip()
                if bad_name(name) or name in ('', '*', '\\'):
                    continue
                if ' as ' in name:
                    name, alias = name.split(" as ")
                    name = name.strip()
                    alias = alias.strip()
                else:
                    alias = name
                if alias not in used:
                    if "." in alias:
                        unused = source.count(alias) == 1
                    else:
                        unused = True
                    if unused:
                        print "Unused %s%s" % (g[1] + "." if g[1] else "", name)
                        names[ix] = None
            if None in names:
                new_names = ", ".join(name.strip() for name in names if name is not None)
                #print names, "->", new_names
                if new_names:
                    if g[0]:
                        lines[lineno] = "from %s import %s" % (g[1], new_names)
                    else:
                        lines[lineno] = "import %s" % new_names
                else:
                    lines[lineno] = None
                
    return "\n".join(line for line in lines if line is not None)

def prune_unused_all(sage_root):
    for root, dirs, files in os.walk('%s/devel/sage/sage/' % sage_root):
        for file in files:
            file = os.path.join(root, file)
            if file.endswith('.py'):
                if os.path.basename(file) in ('__init__.py', 'all.py', 'interpreter.py'):
                    continue
                source = open(file).read()
                try:
                    new_source = prune_unused_one(file, source)
                    if new_source != source:
                        ast.parse(new_source)
                        print file
                        open(file, 'w').write(new_source)
                        r = os.system("%s/sage -b > /dev/null 2> /dev/null" % sage_root)
                        if r == 0:
                            r = os.system("%s/sage -python -c 'import sage.all'" % sage_root)
                        if r != 0:
                            print "Failed!"
                            open(file, 'w').write(source)
                        else:
                            print "Good!"
                except Exception, exn:
                    print file, exn

Then

prune_unused_all('/Users/robertwb/sage/sage-4.7.1')

comment:8 Changed 9 years ago by kini

  • Status changed from needs_review to positive_review

Nice. Positive review from me.

BTW for future reference you can syntax-highlight python code in trac comments by putting #!python on the first line of the raw content inside triple braces (see the main page of Sage trac which I added some examples of this to).

comment:9 Changed 9 years ago by leif

  • Reviewers changed from Keshav Kini to Keshav Kini, Leif Leonhardy

Ooops. I wrote this a few hours ago, but forgot to click the submit button.

Replying to robertwb:

I generated the patch with this code, plus a handful (less than a dozen) manual touch-ups where, e.g., symbols were used in eval() strings but not directly.

Ok, so it is rather conservative with the exception of eval() and friends.

We should perhaps especially inspect those files modified that have a low "coverage"(though I strongly doubt the doctests of others will cover most execution paths...).

I think more sophisticated detection of superfluous imports can be left for follow-up tickets, just like the inclusion of Cython files.

The patch is certainly a big step forwards as is.


P.S.: I'm pretty sure Robert knows about trac's wiki markup for source code etc. ;)

comment:10 Changed 9 years ago by kini

I'm pretty sure Robert knows whatever I'm likely to say on trac, but hey :P someone reading might not know.

comment:11 Changed 9 years ago by mderickx

  • Cc mderickx added

comment:12 Changed 9 years ago by leif

  • Description modified (diff)

Attachment comments shouldn't hurt either... ;-)

comment:13 Changed 9 years ago by robertwb

Well, they do take time and screen real estate, and are 100% redundant when there's a single attached file to apply to the Sage library, but I won't complain too much :-).

comment:14 Changed 9 years ago by leif

  • Milestone changed from sage-4.7.2 to sage-pending

Of courseTM this patch bomb doesn't apply after only a few other patches have been applied.

Should we just merge those parts that do (changing the patch here later), and put the others onto a follow-up?

comment:15 Changed 9 years ago by robertwb

  • Milestone changed from sage-pending to sage-4.7.2

Yes, simply ignore the rejects when you merge it and post then to a follow-up. 99% of the changes are completely independent of each other, and otherwise this will never actually get in.

comment:16 Changed 9 years ago by kini

As the patch is more or less automatically generated, why not just merge it last? Generate a patch using Robert's code snippet above, inspect for less than a dozen manual fixes, and done. (Make sure of course to hg qrefresh -U "Robert Bradshaw <robertwb@math.washington.edu>" first :) ) Not sure how time consuming this manual inspection is, but maybe it's worth a shot? As opposed to losing some of the changes, I mean.

comment:17 follow-up: Changed 9 years ago by robertwb

Manual inspection takes quite a while, and the patch is not entirely automatically generated. I'd rather see (the bulk of) this patch merged in sooner rather than later, and especially avoid playing some right-before-the-release gymnastics. The changes won't be lost--we can record what the rejects were and even re-run the script at a later date (which should be much faster and produce a much smaller, easier to inspect output).

comment:18 Changed 9 years ago by kini

Oh, OK then. Sounds good!

comment:19 in reply to: ↑ 17 Changed 9 years ago by leif

Replying to robertwb:

Manual inspection takes quite a while, and the patch is not entirely automatically generated. I'd rather see (the bulk of) this patch merged in sooner rather than later, and especially avoid playing some right-before-the-release gymnastics. The changes won't be lost--we can record what the rejects were and even re-run the script at a later date (which should be much faster and produce a much smaller, easier to inspect output).

That's what I intended; i.e., merge the patch as soon as I've sorted out which tickets are really ready to get merged into Sage 4.7.2.alpha3, replace the patch on this ticket by what actually applied, and put the rejects on a follow-up for the next (devel) release. There we can rerun the script on the then current release.

I've only set it to "sage-pending" to back it out from my merge attempts, not to postpone it to another release.

Changed 9 years ago by leif

Reviewer patch. Robert's patch rebased on the "final preliminary" Sage 4.7.2.alpha3. Apply only this one.

comment:20 Changed 9 years ago by leif

  • Description modified (diff)

Fixed. :)

comment:21 Changed 9 years ago by leif

  • Merged in set to sage-4.7.2.alpha3
  • Resolution set to fixed
  • Status changed from positive_review to closed

We shouldn't forget to open a follow-up.

I'll attach the logs later, either here or to another ticket in case someone opened one then...

comment:22 Changed 9 years ago by kini

Followup ticket is at #11811.

comment:23 Changed 9 years ago by nthiery

I understand the rationale for that patch, but I hope you realize that such patch bombs inflict a lot of rebasing work. I just spent half a day, if not more, rebasing the sage-combinat patch queue. Please, please let us know in advance about it next time.

Cheers,

Nicolas

comment:24 follow-up: Changed 9 years ago by kini

Sorry Nicolas, I didn't know about your guys' hundreds of patches queue when I reviewed this ticket :( Next time I'll cc you to tickets I see with broad patches.

comment:25 in reply to: ↑ 24 Changed 9 years ago by nthiery

Replying to kini:

Sorry Nicolas, I didn't know about your guys' hundreds of patches queue when I reviewed this ticket :( Next time I'll cc you to tickets I see with broad patches.

Thanks!

comment:26 Changed 9 years ago by mderickx

Did you see the follow up of this ticket also Nicolas? It's #11762 it also touches quite a lot, but it are just init.py files so it should not conflict a lot with your patch queue.

Note: See TracTickets for help on using tickets.