Opened 13 years ago

Closed 13 years ago

#6578 closed defect (fixed)

[with patch, positive review] fast subgraphs by building the graph instead of deleting things

Reported by: Jason Grout Owned by: Robert Miller
Priority: major Milestone: sage-4.1.1
Component: graph theory Keywords:
Cc: Robert Miller Merged in: Sage 4.1.1.alpha1
Authors: Jason Grout Reviewers: Robert Miller, Minh Van Nguyen
Report Upstream: N/A Work issues:
Branch: Commit:
Dependencies: Stopgaps:

Status badges

Description

Currently, to create a subgraph, Sage copies the graph and then deletes everything not specified. This is very slow if you just want a small part of a large graph.

Attachments (2)

trac-6578-subgraph-refactoring.patch (16.7 KB) - added by Jason Grout 13 years ago.
trac_6578-referee.patch (1.2 KB) - added by Minh Van Nguyen 13 years ago.
referee patch

Download all attachments as: .zip

Change History (10)

comment:1 Changed 13 years ago by Jason Grout

This patch refactors the subgraph code into two functions, and then tries to intelligently choose between building the graph from scratch and deleting vertices and edges from a copy of the graph.

Changed 13 years ago by Jason Grout

comment:2 Changed 13 years ago by Jason Grout

Cc: Robert Miller added
Summary: fast subgraphs by building the graph instead of deleting things[with patch, needs review] fast subgraphs by building the graph instead of deleting things

Timing comparison:

sage: g=graphs.PathGraph(100000)                     
sage: %time g.subgraph(range(20), algorithm='add')   
CPU times: user 0.61 s, sys: 0.01 s, total: 0.62 s
Wall time: 0.68 s
Subgraph of (Path Graph): Graph on 20 vertices
sage: %time g.subgraph(range(20), algorithm='delete') # the old algorithm
CPU times: user 3.96 s, sys: 0.04 s, total: 4.00 s
Wall time: 4.15 s
Subgraph of (Path Graph): Graph on 20 vertices

comment:3 Changed 13 years ago by Robert Miller

Jason,

Do you have any examples where delete is actually faster than add? I ask because what this does is add every edge, and then delete a bunch of them. I doubt it's ever faster.

comment:4 Changed 13 years ago by Jason Grout

Most of the time delete is faster than add; that's why I set the factor so that if we want more than 5% of the vertices, it does delete.

I think delete copies the graph, which doesn't *add* every edge, but probably just copies the edge dictionary, which should be *way* faster.

The tests I was taking my data from was doing

sage: g=graphs.PathGraph(1000)

and then doing g.subgraph(range(i) using each of the algorithms. The breakeven point seemed to be between 50 and 100. I also did this with the complete graph, and got similar results.

comment:5 in reply to:  4 Changed 13 years ago by Robert Miller

Reviewers: Robert Miller
Summary: [with patch, needs review] fast subgraphs by building the graph instead of deleting things[with patch, positive review] fast subgraphs by building the graph instead of deleting things

Replying to jason:

I think delete copies the graph, which doesn't *add* every edge, but probably just copies the edge dictionary, which should be *way* faster.

What is this opinion based on? If it's not an inplace subgraph then it creates a copy, which calls networkx_graph, which calls NX's copy, which has the following lines:

        for e in self.edges_iter():
            H.add_edge(e)

So we're definitely not using the adjacency dictionary there. The crossover between add and delete you observed above is most likely due to overhead from the fact that calling Sage's add_edge is using several layers of Python functions.

I expect this to change drastically once the graph backends are optimized. This may be a while in the future, but you should keep this in mind for when that does eventually happen. In general, this patch is an improvement, since it provides both options, but thinking in terms of optimization at this stage is going to result in wasted effort, since the floor is about to drop out from underneath any current timings.

Anyway, everything looks good here (especially, applies to 4.1.1.alpha0 and passes tests), so let's go ahead and merge it!

comment:6 Changed 13 years ago by Jason Grout

The opinion was based on thinking of the most logical way to implement it (hence the "I think") :). Thanks for looking into the code.

I'm dreaming of the day that this optimization and choice is obsolete! It would be *so* nice!

Changed 13 years ago by Minh Van Nguyen

Attachment: trac_6578-referee.patch added

referee patch

comment:7 Changed 13 years ago by Minh Van Nguyen

Reviewers: Robert MillerRobert Miller, Minh Van Nguyen

The patch trac_6578-referee.patch adds a missing double colon "::". It also deletes two sets of redundant double colons.

comment:8 Changed 13 years ago by Minh Van Nguyen

Merged in: Sage 4.1.1.alpha1
Resolution: fixed
Status: newclosed
Note: See TracTickets for help on using tickets.