Opened 2 years ago
Closed 23 months ago
#31154 closed enhancement (fixed)
Edges and report distances in Breadth First Search in c_graph.pyx
Reported by:  ghkliem  Owned by:  

Priority:  major  Milestone:  sage9.3 
Component:  graph theory  Keywords:  graphs, BFS, distances 
Cc:  David Coudert  Merged in:  
Authors:  Jonathan Kliem  Reviewers:  David Coudert 
Report Upstream:  N/A  Work issues:  
Branch:  025a4f9 (Commits, GitHub, GitLab)  Commit:  025a4f919e897677fa259819a8569c539e53b7fd 
Dependencies:  #31129  Stopgaps: 
Description (last modified by )
Currently, reporting of distances is reported in the python implementation only. We enable this in the cython implementation in c_graph.pyx
.
Likewise for returning the edges of the BFS tree.
To compare:
sage: def comp(): ....: for n in [5, 10, 50, 100, 500, 1000]: ....: G = graphs.Grid2dGraph(n, n) ....: print(G) ....: %timeit _ = list(G.breadth_first_search(start=(0, 0), report_distance=False)) ....: %timeit _ = list(G.breadth_first_search(start=(0, 0), report_distance=True)) ....: %timeit _ = list(G.breadth_first_search(start=(0, 0), edges=True))
Before:
sage: comp() 2D Grid Graph for [5, 5] 5.69 µs ± 67 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each) 36.2 µs ± 411 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each) 35.8 µs ± 465 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each) 2D Grid Graph for [10, 10] 16.2 µs ± 22.8 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each) 140 µs ± 639 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each) 138 µs ± 326 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each) 2D Grid Graph for [50, 50] 492 µs ± 5.91 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each) 4.15 ms ± 26 µs per loop (mean ± std. dev. of 7 runs, 100 loops each) 4.07 ms ± 18 µs per loop (mean ± std. dev. of 7 runs, 100 loops each) 2D Grid Graph for [100, 100] 2.68 ms ± 24.7 µs per loop (mean ± std. dev. of 7 runs, 100 loops each) 18.1 ms ± 41.2 µs per loop (mean ± std. dev. of 7 runs, 100 loops each) 17.9 ms ± 13.1 µs per loop (mean ± std. dev. of 7 runs, 100 loops each) 2D Grid Graph for [500, 500] 89.4 ms ± 665 µs per loop (mean ± std. dev. of 7 runs, 10 loops each) 649 ms ± 5.74 ms per loop (mean ± std. dev. of 7 runs, 1 loop each) 618 ms ± 7.9 ms per loop (mean ± std. dev. of 7 runs, 1 loop each) 2D Grid Graph for [1000, 1000] 452 ms ± 5.52 ms per loop (mean ± std. dev. of 7 runs, 1 loop each) 3.17 s ± 11.5 ms per loop (mean ± std. dev. of 7 runs, 1 loop each) 3.12 s ± 91 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
After:
sage: comp() 2D Grid Graph for [5, 5] 5.76 µs ± 7.01 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each) 6.3 µs ± 52.8 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each) 7.25 µs ± 19.7 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each) 2D Grid Graph for [10, 10] 15.5 µs ± 17.6 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each) 18.2 µs ± 138 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each) 21.8 µs ± 13.4 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each) 2D Grid Graph for [50, 50] 461 µs ± 1.87 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each) 535 µs ± 1.61 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each) 703 µs ± 3.01 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each) 2D Grid Graph for [100, 100] 2.55 ms ± 12.9 µs per loop (mean ± std. dev. of 7 runs, 100 loops each) 3.19 ms ± 10.3 µs per loop (mean ± std. dev. of 7 runs, 100 loops each) 4.11 ms ± 10.7 µs per loop (mean ± std. dev. of 7 runs, 100 loops each) 2D Grid Graph for [500, 500] 86.3 ms ± 441 µs per loop (mean ± std. dev. of 7 runs, 10 loops each) 159 ms ± 973 µs per loop (mean ± std. dev. of 7 runs, 10 loops each) 139 ms ± 337 µs per loop (mean ± std. dev. of 7 runs, 10 loops each) 2D Grid Graph for [1000, 1000] 456 ms ± 3.09 ms per loop (mean ± std. dev. of 7 runs, 1 loop each) 694 ms ± 4.35 ms per loop (mean ± std. dev. of 7 runs, 1 loop each) 703 ms ± 1.79 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
The small improvement in the first case is due to using yield from
.
Change History (22)
comment:1 Changed 2 years ago by
Status:  new → needs_review 

comment:2 Changed 2 years ago by
Reviewers:  → David Coudert 

comment:3 followup: 5 Changed 2 years ago by
Why are you using smallInteger
for distances ? does it really make a difference ?
comment:4 followup: 10 Changed 2 years ago by
Actually, I can't measure a difference for pushing pairs or not. It doesn't seem to make a difference at all. After all it's super easy C stuff (if you try to push a tuple as a pair and hope for cython to do the right thing, that is going to make a difference in the wrong way).
yield from
is a bit fast that's it.
comment:5 Changed 2 years ago by
Replying to dcoudert:
Why are you using
smallInteger
for distances ? does it really make a difference ?
Integer
is lots slower. smallInteger
is exactly for this use case. Creating an integer from an int/long
.
Timings with replacing smallInteger
by Integer
:
sage: comp() 2D Grid Graph for [5, 5] 5.67 µs ± 254 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each) 8.34 µs ± 114 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each) 2D Grid Graph for [10, 10] 15.9 µs ± 73.5 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each) 26.1 µs ± 98.9 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each) 2D Grid Graph for [50, 50] 470 µs ± 6.86 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each) 1.09 ms ± 8.04 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each) 2D Grid Graph for [100, 100] 2.59 ms ± 20.5 µs per loop (mean ± std. dev. of 7 runs, 100 loops each) 5.8 ms ± 8.76 µs per loop (mean ± std. dev. of 7 runs, 100 loops each) 2D Grid Graph for [500, 500] 91.1 ms ± 1.26 ms per loop (mean ± std. dev. of 7 runs, 10 loops each) 182 ms ± 638 µs per loop (mean ± std. dev. of 7 runs, 10 loops each) 2D Grid Graph for [1000, 1000] 445 ms ± 9.32 ms per loop (mean ± std. dev. of 7 runs, 1 loop each) 734 ms ± 9.12 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
comment:6 Changed 2 years ago by
The distance assume weight 1 for each edge.
Is there any use case in having the distances and the edges? Currently report_distance
is ignored when edges
is true.
comment:7 followup: 8 Changed 2 years ago by
smallInteger
: this is good to know
report distances and edges: we can certainly find a use case where it is useful to return both. Is it slower to push tuples instead of pairs ?
comment:8 Changed 2 years ago by
Replying to dcoudert:
smallInteger
: this is good to know
I was told on my very first ticket about it.
report distances and edges: we can certainly find a use case where it is useful to return both. Is it slower to push tuples instead of pairs ?
I'll try.
comment:9 Changed 2 years ago by
Description:  modified (diff) 

Apparently, I took the timings in some intermediate step where report_distances=True
did not do anything. Things are slower now, but smallInteger
is still better than Integer
.
comment:10 Changed 2 years ago by
Replying to ghkliem:
Actually, I can't measure a difference for pushing pairs or not. It doesn't seem to make a difference at all. After all it's super easy C stuff (if you try to push a tuple as a pair and hope for cython to do the right thing, that is going to make a difference in the wrong way).
yield from
is a bit fast that's it.
Actually you are right, there is a small penalty for pushing pairs. Something like 5 percent.
comment:11 Changed 2 years ago by
Commit:  890ed51eb3f334ba4363c36bbb4c2ea2c2676ffe → d0113bb2b8fd60f0d4f32e801db9a239b7998e91 

comment:12 Changed 2 years ago by
Description:  modified (diff) 

comment:13 followup: 16 Changed 2 years ago by
I like the way you count maintain the distance. it's smart.
What is surprising is that reporting distance takes roughly 30% more time. The only difference in the code is in the return statement.
comment:14 Changed 2 years ago by
Description:  modified (diff) 

Summary:  Report distances in Breadth First Search in c_graph.pyx → Edges and report distances in Breadth First Search in c_graph.pyx 
comment:15 Changed 2 years ago by
Commit:  d0113bb2b8fd60f0d4f32e801db9a239b7998e91 → 58f3405be1f4f42b27497e9fa0450e174752a4e8 

Branch pushed to git repo; I updated commit sha1. New commits:
58f3405  edges in BFS in c_graph.pyx

comment:16 Changed 2 years ago by
Replying to dcoudert:
I like the way you count maintain the distance. it's smart.
What is surprising is that reporting distance takes roughly 30% more time. The only difference in the code is in the return statement.
Now that you point this out, I find it surprisingly to.
However, I believe this is out of reach:
sage: cython(''' ....: def tester(int n): ....: cdef int i ....: for i in range(n): ....: yield i ....: ....: def tester2(int n): ....: cdef int i ....: for i in range(n): ....: yield (i, 1) ....: ''') sage: sage: %timeit list(tester(100)) 2.3 µs ± 15.9 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each) sage: %timeit list(tester2(100)) 3.97 µs ± 5.07 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each) sage: %timeit list(tester(200)) 4.15 µs ± 29.6 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each) sage: %timeit list(tester2(200)) 7.94 µs ± 52.7 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
comment:17 followup: 19 Changed 2 years ago by
I agree. We already have a huge speedup.
Some details, missing :
(several times):
  ``report_distance``  boolean (default ``False``); if ``True``, +  ``report_distance``  boolean (default: ``False``); if ``True``,
  ``edges``  boolean (default ``False``); whether to return the edges +  ``edges``  boolean (default: ``False``); whether to return the edges
comment:18 Changed 2 years ago by
Commit:  58f3405be1f4f42b27497e9fa0450e174752a4e8 → 025a4f919e897677fa259819a8569c539e53b7fd 

Branch pushed to git repo; I updated commit sha1. New commits:
025a4f9  missing colons when specifying defaults

comment:19 Changed 2 years ago by
Replying to dcoudert:
I agree. We already have a huge speedup.
Some details, missing
:
(several times):  ``report_distance``  boolean (default ``False``); if ``True``, +  ``report_distance``  boolean (default: ``False``); if ``True``,  ``edges``  boolean (default ``False``); whether to return the edges +  ``edges``  boolean (default: ``False``); whether to return the edges
Copy/paste. Changed it for every instance in the touched files.
comment:22 Changed 23 months ago by
Branch:  u/ghkliem/improved_distances_in_breadth_first_search → 025a4f919e897677fa259819a8569c539e53b7fd 

Resolution:  → fixed 
Status:  positive_review → closed 
It's nice that the extra cost of pushing pairs even when
report_distance
isFalse
is compensated by theyield from...
.We could also report edges. It suffices to store in second the parent in the BFS instead of the distance. The extra cost of testing if
edges
is true before pushing to the queue is rather limited.