Opened 5 years ago

Closed 5 years ago

#15978 closed enhancement (fixed)

Waste of time in g.edges() (actually in iterator_edges)

Reported by: ncohen
Owned by:
Priority: major
Milestone: sage-6.2
Component: graph theory
Keywords:
Cc: azi
Merged in:
Authors: Nathann Cohen
Reviewers: Jernej Azarija
Report Upstream: N/A
Work issues:
Branch: 77c4df2
Commit: 77c4df2fe5eda40cd86c083d8d2e3e510efbb667
Dependencies:
Stopgaps:

Description (last modified by ncohen)

Jernej reported that Sage took <a lifetime> to enumerate the edges of an important graph of his. There was actually a comment in the source code, which I left and then forgot about, saying that this particular block of code was totally awful.

Was.

With the latest release:

sage: g = graphs.CompleteGraph(3000)
sage: %time _ = g.to_dictionary()
CPU times: user 1.38 s, sys: 0 ns, total: 1.38 s
Wall time: 1.37 s
sage: %time _ = g.edges()
CPU times: user 1min 24s, sys: 8 ms, total: 1min 24s
Wall time: 1min 24s

With this branch:

sage: g = graphs.CompleteGraph(3000)
sage: %time _ = g.to_dictionary()
CPU times: user 1.38 s, sys: 0 ns, total: 1.38 s
Wall time: 1.37 s
sage: %time _ = g.edges()
CPU times: user 6.88 s, sys: 0 ns, total: 6.88 s
Wall time: 6.88 s

Which still does not make any sense.

Nathann

Change History (26)

comment:1 Changed 5 years ago by ncohen

  • Branch set to u/ncohen/15978
  • Commit set to 768a63ea1d24e569a254f99b01b1f4c568198bc9
  • Description modified (diff)
  • Status changed from new to needs_review

New commits:

768a63e  trac #15978: Waste of time in g.edges()

comment:2 Changed 5 years ago by ncohen

  • Description modified (diff)

comment:3 Changed 5 years ago by azi

Nathann,

the patch seems to implement the fix in the way you described yesterday in the email, right?

comment:4 Changed 5 years ago by ncohen

Yepyepyepyepyep. Except I replaced the calls to "sorted" with plain Python.

sage: def a(u,v):
....:     return tuple(sorted((u,v)))
sage: %timeit a(5,6)
1000000 loops, best of 3: 1.58 µs per loop
sage: def a(u,v):                                 
....:     return (u,v) if u<=v else (v,u)
sage: %timeit a(5,6)                                  
1000000 loops, best of 3: 527 ns per loop

Nathann

comment:5 follow-up: Changed 5 years ago by mmezzarobba

Probably a stupid question, but: Why is CompleteGraph(3000) a SparseGraph?

I don't know anything about the graph code, but I did a bit of profiling out of curiosity. And it looks like nested loops such as

# SparseGraph.iterator_edges
for u_int in self._cg.out_neighbors(v_int):
    if u_int >= v_int:
        for l_int in self._cg.all_arcs(v_int, u_int):
            ...

are a pretty redundant and not too cache-friendly way of enumerating the edges. In particular:

  • all_arcs_unsafe(u, v, ...) seems to spend LOTS of time looking for temp such that temp.vertex == v, while (if I understand correctly) out_neighbors_unsafe already had to visit the corresponding SparseGraphBTNode. But perhaps the graph is just too dense for the hash table to do its job, and it would work just fine with a sparse graph?
  • The test u_int <= v_int can also be a bit expensive, perhaps due to branch mispredictions or something like that.
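To make the redundancy concrete, here is a minimal pure-Python sketch of a single-pass enumeration; the dict-of-dicts layout and the name iterator_edges_sketch are made up for illustration and are not the actual Cython backend code, but the point is that the labels are read during the same neighbor walk, with no second all_arcs(v, u) lookup:

def iterator_edges_sketch(adj):
    # adj maps each vertex to {neighbor: [labels]} -- roughly the data the
    # sparse backend's BT nodes already hold for the out-neighbors.
    for v, neighbors in adj.items():
        for u, labels in neighbors.items():
            # Emit each undirected edge exactly once, from its smaller endpoint.
            if v <= u:
                for l in labels:
                    yield (v, u, l)

adj = {0: {1: ['a', 'b'], 2: [None]},
       1: {0: ['a', 'b']},
       2: {0: [None]}}
list(iterator_edges_sketch(adj))   # [(0, 1, 'a'), (0, 1, 'b'), (0, 2, None)]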

comment:6 in reply to: ↑ 5 ; follow-up: Changed 5 years ago by ncohen

Yoooooo !

Probably a stupid question, but: Why is CompleteGraph(3000) a SparseGraph?

No good answer to that. There is a Dense Graph backend which is never used unless you ask for it, and as it is never tested I expect it to be quite buggy.

I don't know anything about the graph code, but I did a bit of profiling out of curiosity. And it looks like nested loops such as

# SparseGraph.iterator_edges
for u_int in self._cg.out_neighbors(v_int):
    if u_int >= v_int:
        for l_int in self._cg.all_arcs(v_int, u_int):
            ...

are a pretty redundant and not too cache-friendly way of enumerating the edges. In particular:

  • all_arcs_unsafe(u, v, ...) seems to spend LOTS of time looking for temp such that temp.vertex == v, while (if I understand correctly) out_neighbors_unsafe already had to visit the corresponding SparseGraphBTNode. But perhaps the graph is just too dense for the hash table to do its job, and it would work just fine with a sparse graph?

There is a waste of time there indeed. I just never rewrote these parts... The backend code is Robert Miller's; I never touched the data structures. I wrote some backends I needed for efficient implementations, and in those implementations you will not find such waste...

I should either rewrite the default backend for graphs, or write lower-level code for functions like that.

I just hate labels and multiple edges.....

  • The test u_int <= v_int can also be a bit expensive, perhaps due to branch mispredictions or something like that.

Hmmmm... I understand, but how can you do without?

Nathann

comment:7 in reply to: ↑ 6 Changed 5 years ago by mmezzarobba

Replying to ncohen:

  • The test u_int <= v_int can also be a bit expensive, perhaps due to branch mispredictions or something like that.

Hmmmm... I understand, but how can you do without?

I was thinking of having out_neighbors (or a new variant of it that would return SparseGraphBTNodes) filter out the vertices with u_int < v_int itself, or seeing if by chance it happens to return a sorted list... But I have no idea (i) if it is feasible, (ii) if it would be worth the pain!
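A rough plain-Python sketch of both ideas (the names out_neighbors_ge and out_neighbors_from, and the list-based storage, are hypothetical and only for illustration, not the real Cython API):

from bisect import bisect_left

def out_neighbors_ge(adj, v):
    # Variant that filters u_int < v_int inside the neighbor enumeration,
    # so the caller no longer needs its own per-edge test.
    return [u for u in adj[v] if u >= v]

def out_neighbors_from(sorted_neighbors, v):
    # If the neighbor list happened to be sorted, a single binary search
    # would replace the per-neighbor comparison entirely.
    return sorted_neighbors[bisect_left(sorted_neighbors, v):]

Whether either variant pays off would of course depend on how the BT nodes are actually laid out in memory.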

comment:8 Changed 5 years ago by ncohen

Yoooo !!

Well. What I should do is try again to read and document this code, and rewrite it at a lower level. Definitely.

I mean... You are right in what you say, it is just that I have to read and understand very undocumented code, and, to be honest, I don't have performance problems myself. But I will do that next week, definitely.

Thanks.

Sorry, I'm a bit exhausted right now. Anyway have fun, good evening to you ;-)

Nathann

comment:9 Changed 5 years ago by ncohen

It took me a while, but I understood that what Marc reported above is actually responsible for 99% of our worries. I will create a ticket and write a patch for this tomorrow, as soon as possible. It will probably depend on this very ticket.

Nathann

comment:10 Changed 5 years ago by ncohen

Done in #16005.

Nathann

comment:11 Changed 5 years ago by azi

I've looked at the patch and tested it, and it looks good (a doctest somehow fails, but I believe that is caused by something unrelated to this change). Hence I give this patch a go.

Thanks Nathann for fixing this.

Last edited 5 years ago by azi

comment:12 Changed 5 years ago by azi

  • Status changed from needs_review to positive_review

comment:13 Changed 5 years ago by vbraun

Please fill in the reviewer name.

comment:14 Changed 5 years ago by ncohen

  • Reviewers set to Jernej Azarija

comment:15 Changed 5 years ago by vbraun

  • Status changed from positive_review to needs_work
Traceback (most recent call last):
  File "/home/release/Sage/src/doc/common/builder.py", line 1474, in <module>
    getattr(get_builder(name), type)()
  File "/home/release/Sage/src/doc/common/builder.py", line 269, in _wrapper
    getattr(get_builder(document), 'inventory')(*args, **kwds)
  File "/home/release/Sage/src/doc/common/builder.py", line 485, in _wrapper
    x.get(99999)
  File "/home/release/Sage/local/lib/python/multiprocessing/pool.py", line 554, in get
    raise self._value
OSError: [graphs   ] docstring of sage.graphs.base.sparse_graph.SparseGraphBackend.iterator_edges:20: WARNING: Literal block expected; none found.
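For context, that Sphinx warning is emitted when a docstring line ends with "::" but no indented block follows it. A docstring shaped roughly like this keeps the doc builder happy (the body here is only illustrative, not the actual fix pushed to the branch):

def iterator_edges(self, vertices, labels):
    """
    Iterate over the edges incident to a sequence of vertices.

    EXAMPLES::

        sage: # an indented doctest (or any literal block) must follow the "::"
    """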

comment:16 Changed 5 years ago by ncohen

NOOOOooooooooooooooooooooooooooooooo... It missed the release :-P

Okay, fixed T_T

Nathann

comment:17 Changed 5 years ago by git

  • Commit changed from 768a63ea1d24e569a254f99b01b1f4c568198bc9 to 83f22e77e04b47fba65ae55adb4d80ec1c7d0a5e

Branch pushed to git repo; I updated commit sha1. New commits:

c8c46ca  trac #15978: Rebase on 6.2.beta5
83f22e7  trac #15978: Broken doc

comment:18 Changed 5 years ago by ncohen

  • Status changed from needs_work to positive_review

comment:19 Changed 5 years ago by vbraun

  • Status changed from positive_review to needs_work

Doctests fail in some crystal stuff

comment:20 Changed 5 years ago by ncohen

I hate parents/elements and categories. I am not even allowed to test that two elements are equal, for if I do I end up inside their f... recursive loops.

Last edited 5 years ago by ncohen

comment:21 Changed 5 years ago by ncohen

Not to test equality, but to COMPARE.

comment:22 Changed 5 years ago by ncohen

sage: Tab = CrystalOfTableaux(['A', 3], shape=[2,1,1])
sage: g = Tab.digraph()
sage: u, v = g.vertices()[:2]
sage: u <= v
...
RuntimeError: maximum recursion depth exceeded in cmp

What the hell can I do with that?

Before, the following code was called:

sage: cmp(u,v)              
-1

comment:23 Changed 5 years ago by ncohen

Okay, here it is. I replaced the <= with <, which changes nothing for us and does not change any doctest either, except that the crystal doctests work now.
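A tiny sketch of why swapping <= for < is harmless for the endpoint ordering itself (the helper names are illustrative, not the exact patched code); the only practical difference is which comparison method gets invoked, which is apparently what the crystal elements choke on with <=:

def ordered_old(u, v):
    return (u, v) if u <= v else (v, u)

def ordered_new(u, v):
    # When u == v, both branches return (u, u), so "<" and "<=" agree.
    return (u, v) if u < v else (v, u)

assert all(ordered_old(a, b) == ordered_new(a, b)
           for a in range(3) for b in range(3))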

And I wrote to sage-combinat-devel about that.

I set it back to positive_review after having checked that all long doctests in graph/ and combinat/crystal pass.

By the way, the doctests in combinat/crystals/ take a lifetime.

Nathann

comment:24 Changed 5 years ago by git

  • Commit changed from 83f22e77e04b47fba65ae55adb4d80ec1c7d0a5e to 77c4df2fe5eda40cd86c083d8d2e3e510efbb667

Branch pushed to git repo; I updated commit sha1. New commits:

77c4df2  trac #15978: Broken doctest in combinat/crystals/

comment:25 Changed 5 years ago by ncohen

  • Status changed from needs_work to positive_review
Last edited 5 years ago by ncohen

comment:26 Changed 5 years ago by vbraun

  • Branch changed from u/ncohen/15978 to 77c4df2fe5eda40cd86c083d8d2e3e510efbb667
  • Resolution set to fixed
  • Status changed from positive_review to closed