Opened 2 years ago

Closed 23 months ago

#31154 closed enhancement (fixed)

Edges and report distances in Breadth First Search in c_graph.pyx

Reported by: gh-kliem Owned by:
Priority: major Milestone: sage-9.3
Component: graph theory Keywords: graphs, BFS, distances
Cc: David Coudert Merged in:
Authors: Jonathan Kliem Reviewers: David Coudert
Report Upstream: N/A Work issues:
Branch: 025a4f9 (Commits, GitHub, GitLab) Commit: 025a4f919e897677fa259819a8569c539e53b7fd
Dependencies: #31129 Stopgaps:

Status badges

Description (last modified by gh-kliem)

Currently, reporting of distances is reported in the python implementation only. We enable this in the cython implementation in c_graph.pyx.

Likewise for returning the edges of the BFS tree.

To compare:

sage: def comp(): 
....:     for n in [5, 10, 50, 100, 500, 1000]:  
....:         G = graphs.Grid2dGraph(n, n)  
....:         print(G) 
....:         %timeit _ = list(G.breadth_first_search(start=(0, 0), report_distance=False)) 
....:         %timeit _ = list(G.breadth_first_search(start=(0, 0), report_distance=True)) 
....:         %timeit _ = list(G.breadth_first_search(start=(0, 0), edges=True)) 

Before:

sage: comp()                                                                                                                                                                                                                                                                                                                                                               
2D Grid Graph for [5, 5]
5.69 µs ± 67 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
36.2 µs ± 411 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
35.8 µs ± 465 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
2D Grid Graph for [10, 10]
16.2 µs ± 22.8 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
140 µs ± 639 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
138 µs ± 326 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
2D Grid Graph for [50, 50]
492 µs ± 5.91 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
4.15 ms ± 26 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
4.07 ms ± 18 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
2D Grid Graph for [100, 100]
2.68 ms ± 24.7 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
18.1 ms ± 41.2 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
17.9 ms ± 13.1 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
2D Grid Graph for [500, 500]
89.4 ms ± 665 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
649 ms ± 5.74 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
618 ms ± 7.9 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
2D Grid Graph for [1000, 1000]
452 ms ± 5.52 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
3.17 s ± 11.5 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
3.12 s ± 91 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

After:

sage: comp()
2D Grid Graph for [5, 5]
5.76 µs ± 7.01 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
6.3 µs ± 52.8 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
7.25 µs ± 19.7 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
2D Grid Graph for [10, 10]
15.5 µs ± 17.6 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
18.2 µs ± 138 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
21.8 µs ± 13.4 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
2D Grid Graph for [50, 50]
461 µs ± 1.87 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
535 µs ± 1.61 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
703 µs ± 3.01 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
2D Grid Graph for [100, 100]
2.55 ms ± 12.9 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
3.19 ms ± 10.3 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
4.11 ms ± 10.7 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
2D Grid Graph for [500, 500]
86.3 ms ± 441 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
159 ms ± 973 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
139 ms ± 337 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
2D Grid Graph for [1000, 1000]
456 ms ± 3.09 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
694 ms ± 4.35 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
703 ms ± 1.79 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

The small improvement in the first case is due to using yield from.

Change History (22)

comment:1 Changed 2 years ago by gh-kliem

Status: newneeds_review

comment:2 Changed 2 years ago by David Coudert

Reviewers: David Coudert

It's nice that the extra cost of pushing pairs even when report_distance is False is compensated by the yield from....

We could also report edges. It suffices to store in second the parent in the BFS instead of the distance. The extra cost of testing if edges is true before pushing to the queue is rather limited.

comment:3 Changed 2 years ago by David Coudert

Why are you using smallInteger for distances ? does it really make a difference ?

comment:4 Changed 2 years ago by gh-kliem

Actually, I can't measure a difference for pushing pairs or not. It doesn't seem to make a difference at all. After all it's super easy C stuff (if you try to push a tuple as a pair and hope for cython to do the right thing, that is going to make a difference in the wrong way).

yield from is a bit fast that's it.

comment:5 in reply to:  3 Changed 2 years ago by gh-kliem

Replying to dcoudert:

Why are you using smallInteger for distances ? does it really make a difference ?

Integer is lots slower. smallInteger is exactly for this use case. Creating an integer from an int/long.

Timings with replacing smallInteger by Integer:

sage: comp()                                                                                                                                                                                                                                                                                                                                                               
2D Grid Graph for [5, 5]
5.67 µs ± 254 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
8.34 µs ± 114 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
2D Grid Graph for [10, 10]
15.9 µs ± 73.5 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
26.1 µs ± 98.9 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
2D Grid Graph for [50, 50]
470 µs ± 6.86 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
1.09 ms ± 8.04 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
2D Grid Graph for [100, 100]
2.59 ms ± 20.5 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
5.8 ms ± 8.76 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
2D Grid Graph for [500, 500]
91.1 ms ± 1.26 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
182 ms ± 638 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
2D Grid Graph for [1000, 1000]
445 ms ± 9.32 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
734 ms ± 9.12 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

comment:6 Changed 2 years ago by gh-kliem

The distance assume weight 1 for each edge.

Is there any use case in having the distances and the edges? Currently report_distance is ignored when edges is true.

comment:7 Changed 2 years ago by David Coudert

smallInteger: this is good to know

report distances and edges: we can certainly find a use case where it is useful to return both. Is it slower to push tuples instead of pairs ?

comment:8 in reply to:  7 Changed 2 years ago by gh-kliem

Replying to dcoudert:

smallInteger: this is good to know

I was told on my very first ticket about it.

report distances and edges: we can certainly find a use case where it is useful to return both. Is it slower to push tuples instead of pairs ?

I'll try.

comment:9 Changed 2 years ago by gh-kliem

Description: modified (diff)

Apparently, I took the timings in some intermediate step where report_distances=True did not do anything. Things are slower now, but smallInteger is still better than Integer.

comment:10 in reply to:  4 Changed 2 years ago by gh-kliem

Replying to gh-kliem:

Actually, I can't measure a difference for pushing pairs or not. It doesn't seem to make a difference at all. After all it's super easy C stuff (if you try to push a tuple as a pair and hope for cython to do the right thing, that is going to make a difference in the wrong way).

yield from is a bit fast that's it.

Actually you are right, there is a small penalty for pushing pairs. Something like 5 percent.

comment:11 Changed 2 years ago by git

Commit: 890ed51eb3f334ba4363c36bbb4c2ea2c2676ffed0113bb2b8fd60f0d4f32e801db9a239b7998e91

Branch pushed to git repo; I updated commit sha1. New commits:

d71b874use two fifos
d0113bbeasier structure

comment:12 Changed 2 years ago by gh-kliem

Description: modified (diff)

Slighly better by simplifying things.


New commits:

d71b874use two fifos
d0113bbeasier structure

comment:13 Changed 2 years ago by David Coudert

I like the way you count maintain the distance. it's smart.

What is surprising is that reporting distance takes roughly 30% more time. The only difference in the code is in the return statement.

comment:14 Changed 2 years ago by gh-kliem

Description: modified (diff)
Summary: Report distances in Breadth First Search in c_graph.pyxEdges and report distances in Breadth First Search in c_graph.pyx

comment:15 Changed 2 years ago by git

Commit: d0113bb2b8fd60f0d4f32e801db9a239b7998e9158f3405be1f4f42b27497e9fa0450e174752a4e8

Branch pushed to git repo; I updated commit sha1. New commits:

58f3405edges in BFS in c_graph.pyx

comment:16 in reply to:  13 Changed 2 years ago by gh-kliem

Replying to dcoudert:

I like the way you count maintain the distance. it's smart.

What is surprising is that reporting distance takes roughly 30% more time. The only difference in the code is in the return statement.

Now that you point this out, I find it surprisingly to.

However, I believe this is out of reach:

sage: cython(''' 
....: def tester(int n): 
....:     cdef int i 
....:     for i in range(n): 
....:         yield i 
....:          
....: def tester2(int n): 
....:     cdef int i 
....:     for i in range(n): 
....:         yield (i, 1) 
....: ''')                                                                                                                                                                                                                                                                                                                                                                 

sage:                                                                                                                                                                                                                                                                                                                                                                      
sage: %timeit list(tester(100))                                                                                                                                                                                                                                                                                                                                            
2.3 µs ± 15.9 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
sage: %timeit list(tester2(100))                                                                                                                                                                                                                                                                                                                                           
3.97 µs ± 5.07 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
sage: %timeit list(tester(200))                                                                                                                                                                                                                                                                                                                                            
4.15 µs ± 29.6 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
sage: %timeit list(tester2(200))                                                                                                                                                                                                                                                                                                                                           
7.94 µs ± 52.7 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)

comment:17 Changed 2 years ago by David Coudert

I agree. We already have a huge speedup.

Some details, missing : (several times):

-        - ``report_distance`` -- boolean (default ``False``); if ``True``,
+        - ``report_distance`` -- boolean (default: ``False``); if ``True``,
-        - ``edges`` -- boolean (default ``False``); whether to return the edges
+        - ``edges`` -- boolean (default: ``False``); whether to return the edges

comment:18 Changed 2 years ago by git

Commit: 58f3405be1f4f42b27497e9fa0450e174752a4e8025a4f919e897677fa259819a8569c539e53b7fd

Branch pushed to git repo; I updated commit sha1. New commits:

025a4f9missing colons when specifying defaults

comment:19 in reply to:  17 Changed 2 years ago by gh-kliem

Replying to dcoudert:

I agree. We already have a huge speedup.

Some details, missing : (several times):

-        - ``report_distance`` -- boolean (default ``False``); if ``True``,
+        - ``report_distance`` -- boolean (default: ``False``); if ``True``,
-        - ``edges`` -- boolean (default ``False``); whether to return the edges
+        - ``edges`` -- boolean (default: ``False``); whether to return the edges

Copy/paste. Changed it for every instance in the touched files.

comment:20 Changed 2 years ago by David Coudert

Status: needs_reviewpositive_review

LGTM. Thank you !

comment:21 Changed 2 years ago by gh-kliem

Thanks for reviewing.

comment:22 Changed 23 months ago by Volker Braun

Branch: u/gh-kliem/improved_distances_in_breadth_first_search025a4f919e897677fa259819a8569c539e53b7fd
Resolution: fixed
Status: positive_reviewclosed
Note: See TracTickets for help on using tickets.