Opened 7 years ago
Closed 7 years ago
#15978 closed enhancement (fixed)
Waste of time in g.edges() (acually in iterator_edges)
Reported by: | ncohen | Owned by: | |
---|---|---|---|
Priority: | major | Milestone: | sage-6.2 |
Component: | graph theory | Keywords: | |
Cc: | azi | Merged in: | |
Authors: | Nathann Cohen | Reviewers: | Jernej Azarija |
Report Upstream: | N/A | Work issues: | |
Branch: | 77c4df2 (Commits) | Commit: | 77c4df2fe5eda40cd86c083d8d2e3e510efbb667 |
Dependencies: | Stopgaps: |
Description (last modified by )
Jernej reported that Sage took <a lifetime> to enumerate the edges of an important graph of his. There was actually a comment in the source code which I left then forgot, saying that this particular block of code was totally awful.
Was.
With the latest release :
sage: g = graphs.CompleteGraph(3000) sage: %time _ = g.to_dictionary() CPU times: user 1.38 s, sys: 0 ns, total: 1.38 s Wall time: 1.37 s sage: %time _ = g.edges() CPU times: user 1min 24s, sys: 8 ms, total: 1min 24s Wall time: 1min 24s
With this branch :
sage: g = graphs.CompleteGraph(3000) sage: %time _ = g.to_dictionary() CPU times: user 1.38 s, sys: 0 ns, total: 1.38 s Wall time: 1.37 s sage: %time _ = g.edges() CPU times: user 6.88 s, sys: 0 ns, total: 6.88 s Wall time: 6.88 s
Which still does not make any sense.
Nathann
Change History (26)
comment:1 Changed 7 years ago by
- Branch set to u/ncohen/15978
- Commit set to 768a63ea1d24e569a254f99b01b1f4c568198bc9
- Description modified (diff)
- Status changed from new to needs_review
comment:2 Changed 7 years ago by
- Description modified (diff)
comment:3 Changed 7 years ago by
Nathan,
the patch seems to implement the fix in the way you described yesterday in the email right?
comment:4 Changed 7 years ago by
Yepyepyepyepyep. Except I replaced the calls to "sorted" by plain python.
sage: def a(u,v): ....: return tuple(sorted((u,v))) sage: %timeit a(5,6) 1000000 loops, best of 3: 1.58 µs per loop sage: def a(u,v): ....: return (u,v) if u<=v else (v,u) sage: %timeit a(5,6) 1000000 loops, best of 3: 527 ns per loop
Nathann
comment:5 follow-up: ↓ 6 Changed 7 years ago by
Probably a stupid question, but: Why is CompleteGraph(3000)
a SparseGraph
?
I don't know anything about the graph code, but I did a bit of profiling out of curiosity. And it looks like nested loops such as
# SparseGraph.iterator_edges for u_int in self._cg.out_neighbors(v_int): if u_int >= v_int: for l_int in self._cg.all_arcs(v_int, u_int): ...
are a pretty redundant and not too cache-friendly way of enumerating the edges. In particular:
all_arcs_unsafe(u, v, ...)
seems to spend LOTS of time looking fortemp
such thattemp.vertex == v
, while (if I understand correctly)out_neighbors_unsafe
already had to visit the correspondingSparseGraphBTNode
. But perhaps the graph is just too dense for the hash table to do its job, and it would work just fine with a sparse graph?- The test
u_int <= v_int
can also be a bit expensive, perhaps due to branch mispredictions or something like that.
comment:6 in reply to: ↑ 5 ; follow-up: ↓ 7 Changed 7 years ago by
Yoooooo !
Probably a stupid question, but: Why is
CompleteGraph(3000)
aSparseGraph
?
No good answer to that. There is a Dense Graph backend which is never used unless you ask for it, and as it is never tested I expect it to be quite buggy.
I don't know anything about the graph code, but I did a bit of profiling out of curiosity. And it looks like nested loops such as
# SparseGraph.iterator_edges for u_int in self._cg.out_neighbors(v_int): if u_int >= v_int: for l_int in self._cg.all_arcs(v_int, u_int): ...are a pretty redundant and not too cache-friendly way of enumerating the edges. In particular:
all_arcs_unsafe(u, v, ...)
seems to spend LOTS of time looking fortemp
such thattemp.vertex == v
, while (if I understand correctly)out_neighbors_unsafe
already had to visit the correspondingSparseGraphBTNode
. But perhaps the graph is just too dense for the hash table to do its job, and it would work just fine with a sparse graph?
There is a waste of time there indeed. I just never re-wrote these parts... The backend code is Robert Miller's, I never touched the data structures. I wrote some backends I needed for efficient implementations, and in those implementations you will not find such waste...
I should either rewrite the default backend for graphs, or write lower-level code for functions like that.
I just hate labels and multiple edges.....
- The test
u_int <= v_int
can also be a bit expensive, perhaps due to branch mispredictions or something like that.
Hmmmm... I understand, but how can you do without ?
Nathann
comment:7 in reply to: ↑ 6 Changed 7 years ago by
Replying to ncohen:
- The test
u_int <= v_int
can also be a bit expensive, perhaps due to branch mispredictions or something like that.Hmmmm... I understand, but how can you do without ?
I was thinking of having out_neighbors
(or a new variant of it that would return SparseGraphBTNode
s) filter out the vertices with u_int < v_int
itslef, or seeing if by chance it happens to return a sorted list... But I have no idea (i) if it is feasible, (ii) if would be worth the pain!
comment:8 Changed 7 years ago by
Yoooo !!
Well. What I should do is try again to read and document, and rewrite this at lower level. Definitely.
I mean... You are right in what you say, it is just that I have to read and understand very undocumented code, and that to be honest I don't have performance problems myself. But I will do that next week, definitely.
Thanks.
Sorry, I'm a bit exhausted right now. Anyway have fun, good evening to you ;-)
Nathann
comment:9 Changed 7 years ago by
It took me a while, but I understood that what Marc reported above is actually responsible for 99% of our worries. I will create a ticket and write a patch for this tomorrow asap. It will probably depend on this very ticket.
Nathann
comment:10 Changed 7 years ago by
Done in #16005.
Nathann
comment:11 Changed 7 years ago by
I've looked at the patch and tested it and it looks good (the doctest somehow fails but I believe its not related to this change but something unrelated). Hence I give this patch a go.
Thanks Nathann for fixing this.
comment:12 Changed 7 years ago by
- Status changed from needs_review to positive_review
comment:13 Changed 7 years ago by
Please fill in reviewer name
comment:14 Changed 7 years ago by
- Reviewers set to Jernej Azarija
comment:15 Changed 7 years ago by
- Status changed from positive_review to needs_work
Traceback (most recent call last): File "/home/release/Sage/src/doc/common/builder.py", line 1474, in <module> getattr(get_builder(name), type)() File "/home/release/Sage/src/doc/common/builder.py", line 269, in _wrapper getattr(get_builder(document), 'inventory')(*args, **kwds) File "/home/release/Sage/src/doc/common/builder.py", line 485, in _wrapper x.get(99999) File "/home/release/Sage/local/lib/python/multiprocessing/pool.py", line 554, in get raise self._value OSError: [graphs ] docstring of sage.graphs.base.sparse_graph.SparseGraphBackend.iterator_edges:20: WARNING: Literal block expected; none found.
comment:16 Changed 7 years ago by
NOOOOooooooooooooooooooooooooooooooo... It missed the release :-P
Okay, fixed T_T
Nathann
comment:17 Changed 7 years ago by
- Commit changed from 768a63ea1d24e569a254f99b01b1f4c568198bc9 to 83f22e77e04b47fba65ae55adb4d80ec1c7d0a5e
comment:18 Changed 7 years ago by
- Status changed from needs_work to positive_review
comment:19 Changed 7 years ago by
- Status changed from positive_review to needs_work
Doctests fail in some crystal stuff
comment:20 Changed 7 years ago by
I hate parents/elements and categories. I am not even allowed to test that two elements are equal, for if I do I get inside of their f... recursive loops.
comment:21 Changed 7 years ago by
not to test equality but COMPARE.
comment:22 Changed 7 years ago by
sage: Tab = CrystalOfTableaux(['A', 3], shape = [2,1,1]) sage: g=Tab.digraph() sage: u,v = g.vertices()[:2] sage: u <= v ... RuntimeError: maximum recursion depth exceeded in cmp
What the hell can I do with that ?
Before, the following code was called :
sage: cmp(u,v) -1
comment:23 Changed 7 years ago by
Okay, here it is. I replaced the <=
with <
which changes nothing for us and does not change any doctest either, except that the crystal doctests works now.
And I wrote to sage-combinat-devel about that.
I set it back to positive_review
after having checked that all long doctests in graph/ and combinat/crystal pass.
By the way the doctests in combinat/crystals/ take a lifetime.
Nathann
comment:24 Changed 7 years ago by
- Commit changed from 83f22e77e04b47fba65ae55adb4d80ec1c7d0a5e to 77c4df2fe5eda40cd86c083d8d2e3e510efbb667
Branch pushed to git repo; I updated commit sha1. New commits:
77c4df2 | trac #15978: Broken doctest in combinat/crystals/
|
comment:25 Changed 7 years ago by
- Status changed from needs_work to positive_review
comment:26 Changed 7 years ago by
- Branch changed from u/ncohen/15978 to 77c4df2fe5eda40cd86c083d8d2e3e510efbb667
- Resolution set to fixed
- Status changed from positive_review to closed
New commits:
trac #15978: Waste of time in g.edges()