Opened 8 years ago
Closed 8 years ago
#13073 closed enhancement (fixed)
recognition of weakly chordal graphs
Reported by: | eisermbi | Owned by: | jason, ncohen, rlm |
---|---|---|---|
Priority: | major | Milestone: | sage-5.2 |
Component: | graph theory | Keywords: | weakly chordal, hole, antihole |
Cc: | dimpase, kini, ncohen | Merged in: | sage-5.2.rc0 |
Authors: | Birk Eisermann | Reviewers: | Nathann Cohen,Birk Eisermann |
Report Upstream: | N/A | Work issues: | |
Branch: | Commit: | ||
Dependencies: | #12917, #12965, #12544 | Stopgaps: |
Description (last modified by )
Adding functionality to recognize weakly chordal graphs; as part of that, procedures for finding holes (= induced cycle of length >=5) and antiholes (complements of holes) in graphs are included.
Apply:
Attachments (10)
Change History (33)
comment:1 Changed 8 years ago by
comment:2 Changed 8 years ago by
Hello Nathann!!
Thank you for your comments! I have revised the patch... Now to answer the questions:
Location / Visibility:
- The weakly chordal test (checking for `C_k`, `k \geq 5`) provides similar functionality as the chordal test (checking for `C_k`, `k \geq 4`). Since the function is_chordal() is contained in the file generic_graph.py, I chose this file for the new functions. You are right that the weakly chordal test does not consider directed edges, but just its underlying undirected graph. My previous version made an 'undirected' copy of the (Di)Graph and thus, it yield a correct result. To gain speed, I removed this part and because of this moved the algorithm into graph.py.
- Shall the functions be moved to an extra file?
- I think since the functions is_even_hole_free() and is_odd_hole_free() are available, it might be useful to have the function is_hole_free()/is_antihole_free() available as well (and not make them private)...
Run time / speed:
- It is true that g.complement().is_long_hole_free() and g.is_long_antihole_free() are doing the same job. But for the first function, the complement of the graph has to be calculated while this is not the case for the second one. Actually both algorithms should be used in is_weakly_chordal(). That was an oversight in the first version (though the function was still correct). Now I have done a speed comparison between above two options. The result of
comparison1()
of attached "comparison.py" shows that in average it is better to use both.
- Okay, I was focused on a fast algorithm. Now I wrote a function using the subgraph_search of C_k, for k = 5,6,...,graph.order() (what does actually the same as is_even_hole_free() and is_odd_hole_free() together except for C_4's). I did some speed comparison. After some rounds of profiling and optimization, the Python code runs in average several times faster than the subgraph search - as you said, subgraph search is algorithmically bad in general. Hence, the is_long_(anti)hole_free algorithm should be preferred. See result of
comparison2()
of attached "comparison.py".
- Profiling the code shows that 99% of the time is spend in the edge_iterator. Whether it is possible to improve on that further, it is beyond my knowledge. Any improvements are welcome!
Comments:
- Actually, the algorithms are described in the mentioned paper and I just implemented them. The description of each function is two pages long (+ one page for the certificate) and I doubt that I can write down all as comments. I added at least some comments...
Birk
Changed 8 years ago by
comment:3 Changed 8 years ago by
- Status changed from new to needs_review
comment:4 follow-up: ↓ 21 Changed 8 years ago by
Hellooooo Birk !!
I am currently reviewing your patch. As you code is somewhat long, and because you convinced me that both functions were useful I thought it would be better to move the code *out* of graph.py in a new weakly_chordal module (we already have some modules for specific graph classes). This way graph.py does not grow too much, and we will be able to add specific documentation in the module's head. I do it very often these days, and you can add very interesting things there on what the code actually does.
For instance :
http://www.sagemath.org/doc/reference/sage/graphs/graph_decompositions/rankwidth.html http://www.sagemath.org/doc/reference/sage/graphs/graph_decompositions/vertex_separation.html http://www.sagemath.org/doc/reference/sage/graphs/pq_trees.html
This patch is *not* a review. I changed nothing to your code, unless what I could not avoid:
- The occurrences of
self
have been replaced by
g
- I changed several instances of "from sage.graphs.all import Graph" into the more proper "from sage.graphs.graph import Graph"
- The new module is imported in sage.graphs.all.py
- There are shortcuts from graph.py to your methods so that g.is_long_hole_free still works. And quite honestly we should do this for many other methods to shorten graph.py !
I will try to send a complete review by today. It will probably take time as I will have to read/understand the paper to mention. If I can't make it, you will have it by tomorrow, and unless something really really bad happens this ticket should be reviewed by the end of the month :-D
Nathann
comment:5 Changed 8 years ago by
- Description modified (diff)
- Reviewers set to Nathann Cohen
Hellooooooooo Birk !!!
Here is the promised reviewer's patch ! I originally intended to do a bit more, but some of the optimizations I had in mind were actually worsening the situation on random graphs, and I got stuck with *another* optimization because some things are possible in Python files that are not in Cython ones... *sigh* :-)
Anyway I changed the code a little since the previous version :
- Added the new module to the documentation
- Added a copyright header for the new file (and for an old file which had none)
- Added some documentation at the module's head
- Edited the doc where it mentionned the "P_4-structure" of the graph. The algorithm actually is not very complicated, so it's better to give its main idea in this short description
:-)
- Added a link toward the pdf, and edited some trivial things too
- I really could not make sense of the way you built the induced cycle back, at the end of the algorithm. It did not appear to be very efficient either (you enumerated many edges, then checked that you had to consider them)... I wrote a new version of it, which also works for antiholes (though I have to call a .complement() on a (hopefully small) subgraph), and I hope that it is clearer this way. There are some explanations around in the code.
Well, that's more or less all I did. Please tell me what you think of it !
The usual procedure in these situations is that I review your changes while you review mine, and we can set the ticket to positive_review when we are both confident that the other made no catastrophic blunder :-)
Thank you again !!
Nathann
comment:6 Changed 8 years ago by
Oh yeah, and the "ActivePath?" variable disappears. That's actually the inverse of InPath?, which you kept updating during the algorithm... It was as efficient to just build it when you need it, that is when returning the cycle. That's what the C_index variable I added actually does :-)
Nathann
comment:7 Changed 8 years ago by
- Description modified (diff)
- Status changed from needs_review to needs_work
comment:8 Changed 8 years ago by
Hellooooooo Birk !!!
I just uploaded another patch that makes a Cython file from your module, and caches the graph as a dense bitset for faster has_edge queries. It turns out that the resulting algorithm is actually *slower* on random GNP
Before :
sage: g = graphs.RandomGNP(200,.2) sage: %timeit g.is_weakly_chordal() 25 loops, best of 3: 17.6 ms per loop sage: g = graphs.RandomGNP(200,.2) sage: %timeit g.is_weakly_chordal() 25 loops, best of 3: 17.3 ms per loop
After :
sage: g = graphs.RandomGNP(200,.2) sage: %timeit g.is_weakly_chordal() 25 loops, best of 3: 35.3 ms per loop sage: g = graphs.RandomGNP(200,.2) sage: %timeit g.is_weakly_chordal() 25 loops, best of 3: 37.2 ms per loop
I was quite surprised at that, and then noticed that the RandomGNP graphs have so many long holes that the algorithm is almost linear in this case. Hence I looked for graphs in which long holes may be hard to find (for instance if there are none), but which is not a clique as they should contain many induced P4. Well. Chordal graphs, then :-)
Hence I tried both implementations with our RandomIntervalGraph? generator.
Before:
sage: g = graphs.RandomInterval(50) sage: %timeit g.is_weakly_chordal() 5 loops, best of 3: 2.67 s per loop sage: g = graphs.RandomInterval(50) sage: %timeit g.is_weakly_chordal() 5 loops, best of 3: 3.12 s per loop sage: g = graphs.RandomInterval(100) sage: time g.is_weakly_chordal() True Time: CPU 39.91 s, Wall: 40.03 s
After
sage: g = graphs.RandomInterval(50) sage: %timeit g.is_weakly_chordal() 5 loops, best of 3: 715 ms per loop sage: g = graphs.RandomInterval(50) sage: %timeit g.is_weakly_chordal() 5 loops, best of 3: 775 ms per loop sage: g = graphs.RandomInterval(100) sage: time g.is_weakly_chordal() True Time: CPU 8.75 s, Wall: 8.76 s
Which makes sense, as in this case the algorithm will enumerate all P4 --that takes times and requires many has_edge queries.
Of course, it is easy to fix this. I will upload in a split second yet another patch that first checks whether the graph is chordal, in which case there is no point in running a (potentially very expensive) search. The runtime on interval graphs should be almost instantaneous afterwards.
Well... I feel a bit awkward because I think this optimization (and caching) patch is a good thing, but of course it is slower on the graphs we used as examples until now. as I think I understand why, and as it is much faster on other instances, I think that it should be merged anyway. What do you think ?
Nathann
comment:9 follow-up: ↓ 10 Changed 8 years ago by
- Status changed from needs_work to needs_review
Hmmm.... Well, here's how long it takes to check that random graphs are not chordal :
sage: g = graphs.RandomGNP(200,.2) sage: %time g.is_chordal() CPU times: user 0.31 s, sys: 0.00 s, total: 0.31 s Wall time: 0.31 s False
That's not negligible at all compared to the runtime of your is_weakly_chordal function for random graphs. What do you think we should do ?
Nathann
comment:10 in reply to: ↑ 9 Changed 8 years ago by
Thanks for your comments! But first of all I need to get sage updated. Sorry for that :-(
Birk
comment:11 Changed 8 years ago by
NOoooooooo prob ! ;-)
Nathann
comment:12 Changed 8 years ago by
(just for the joke) :-PPP
sage: g = graphs.RandomTree(400) sage: %time g.is_weakly_chordal() CPU times: user 9.42 s, sys: 0.02 s, total: 9.44 s Wall time: 9.48 s True
Yeahhhhhhhhhhh !! That's efficiency baby ! :-D
Nathann
Changed 8 years ago by
Changed 8 years ago by
Changed 8 years ago by
Changed 8 years ago by
comment:13 follow-up: ↓ 14 Changed 8 years ago by
Hello Nathann!
I agree with most of the patch set. Especially the removal of the variable ActivePath? is nice and the hole/antihole extraction is easier to understand (and does not need to be highly effficient). The techniques in the optim patch are kind of what I could imagine. Since I am new to sage and python it could not do that. So you did good work and the speed is much better!
On the other side, I want to point out that the algorithm became less readable to an outsider. Now it uses some kind of 'low level' programming. If one looks at bitset_in(dense_graph,u*n+v)
it takes time to realize that it stands for g.has_edge(u,v)
. I think this should be somehow hidden in the graph class instead of showing up in a 'high level' algorithm... (Of course, my statement about hidden is very blurry. I will write something about that and the graphs class hierarchy in general on sage-devel soon.) For the moment, I added a small modification (trac_13073-optim-review.patch) with which I could live. :-)
Finally some what- and why-questions:
- What is the difference between '
del InPath[c]
' and 'InPath.pop(c)
' ? You changed the former to the latter but only in some places...
- I find the change
for u in g.vertices()
to
for u in g
makes the code harder to read for an outsider. Why making it more cryptic?
- Why is the reference deleted in is_longanti_hole_free (but not in is_long_hole_free)?
- There is an upwards shift arrow appended at the end of the line
# C[i]...C[j] is necessarily an induced cycle.⇧
in process() of is_long_hole_free(). Similar for is_long_antihole_free(). What is the use of that symbol?
Thank you!
Birk
comment:14 in reply to: ↑ 13 ; follow-up: ↓ 15 Changed 8 years ago by
Helloooooo Birk !!!
On the other side, I want to point out that the algorithm became less readable to an outsider. Now it uses some kind of 'low level' programming. If one looks at
bitset_in(dense_graph,u*n+v)
it takes time to realize that it stands forg.has_edge(u,v)
. I think this should be somehow hidden in the graph class instead of showing up in a 'high level' algorithm... (Of course, my statement about hidden is very blurry. I will write something about that and the graphs class hierarchy in general on sage-devel soon.) For the moment, I added a small modification (trac_13073-optim-review.patch) with which I could live. :-)
Oh ! Nice. And you are right of course. We really need low-level backends for such things.. Could you change your patch a little bit though ? You define the _has_edge function inside of the other functions, and as you are in your own module I guess nobody would mind if you define a has_edge method. The other thing is that you could define it as a C method, something like that :
cdef inline int has_edge(bitset_t bs, int u, int v, int n): return bitset_in(bs, u*n+v)
This way you avoid a function call... But that's not very costly anyway :-)
Finally some what- and why-questions:
- What is the difference between '
del InPath[c]
' and 'InPath.pop(c)
' ? You changed the former to the latter but only in some places...
Hmmmm... I believed that that the following code would also destroy the content of a
, though it still exists.
sage: a = set([1,2,3]) sage: a set([1, 2, 3]) sage: d = {8:a} sage: d {8: set([1, 2, 3])} sage: del d[8] sage: a set([1, 2, 3])
I believed that a call to "del" was a direct order to destroy the given object, but it only removes the value from the dictionary. Looks good : sorry for this (partial!) modification, please update the code however you like :-)
- I find the change
for u in g.vertices()
to
for u in g
makes the code harder to read for an outsider. Why making it more cryptic?
O_o
Well, there is a small difference between the two lines, but I only changed it to make the code more... readable. You think "for u in g" is worse that "for u in g.vertices()" ?
We can write \forall u \in G
in papers, and I usually write "for u in g" because the line is shorter, and (to me) easier to read.
The difference between the two lines is that g.vertices() returns the *sorted* list of vertices, while "for i in g" returns the list of vertices in any order -- hence there is no call to *sort* and is *theoretically* faster, though I do not think that there is any significant difference in practice. It's up to you too if you don't like it !
- Why is the reference deleted in is_longanti_hole_free (but not in is_long_hole_free)?
Oh. I believe Sage complained when I built the documentation (sage -docbuild reference html
) because of the double entry (with the same reference code). The link works when you click on them in the HTML documentation though.
- There is an upwards shift arrow appended at the end of the line
# C[i]...C[j] is necessarily an induced cycle.⇧
in process() of is_long_hole_free(). Similar for is_long_antihole_free(). What is the use of that symbol?
It expresses my right to make typos in patches, but I understand from your comment that you believe I do not deserve that right :-P
Sorry about that, it is just a tyo ^^;
Nathann
comment:15 in reply to: ↑ 14 Changed 8 years ago by
Hello Nathann !!
I incorporated your suggestion. The _has_edge is moved and changed.
Finally some what- and why-questions:
- What is the difference between '
del InPath[c]
' and 'InPath.pop(c)
' ? You changed the former to the latter but only in some places...Hmmmm... I believed that that the following code would also destroy the content of
a
, though it still exists.
sage: a = set([1,2,3]) sage: a set([1, 2, 3]) sage: d = {8:a} sage: d {8: set([1, 2, 3])} sage: del d[8] sage: a set([1, 2, 3])I believed that a call to "del" was a direct order to destroy the given object, but it only removes the value from the dictionary. Looks good : sorry for this (partial!) modification, please update the code however you like
:-)
Okay, since I am new to python I did not know about this pop method before. I did some more research and found they are the same expect that the pop-method returns the value while del does not. As a post on stackoverflow shows (and I also check myself) this difference has impact on the run time. So if the value of the removed element is not needed 'del' is a bit faster. Hence, I go for 'del'...
- I find the change
for u in g.vertices()
to
for u in g
makes the code harder to read for an outsider. Why making it more cryptic?
O_o
Well, there is a small difference between the two lines, but I only changed it to make the code more... readable. You think "for u in g" is worse that "for u in g.vertices()" ?
We can write
\forall u \in G
in papers, and I usually write "for u in g" because the line is shorter, and (to me) easier to read.
Well, then let us write "for e in g" to iterate over all edges, alright? (joke). I am used to "for u in V(G)" or if we have G=(V,E) then "for u in V", and similar "for e in E(G)", "for e in E".
The difference between the two lines is that g.vertices() returns the *sorted* list of vertices, while "for i in g" returns the list of vertices in any order -- hence there is no call to *sort* and is *theoretically* faster, though I do not think that there is any significant difference in practice. It's up to you too if you don't like it !
Okay, we can use 'g' to avoid "sort" and since the comment makes it clear that u is a vertex. But I am not so convinced that it is well readable for everyone.
- Why is the reference deleted in is_longanti_hole_free (but not in is_long_hole_free)?
Oh. I believe Sage complained when I built the documentation (
sage -docbuild reference html
) because of the double entry (with the same reference code). The link works when you click on them in the HTML documentation though.
I see. There was no error but warnings about duplicate entries, hence removed.
- There is an upwards shift arrow appended at the end of the line
# C[i]...C[j] is necessarily an induced cycle.⇧
in process() of is_long_hole_free(). Similar for is_long_antihole_free(). What is the use of that symbol?
It expresses my right to make typos in patches, but I understand from your comment that you believe I do not deserve that right
:-P
I was just curious. And for a 'master of sage', sure, no typos allowed... ;-)
Thanks for all of your suggestions! :-)
Birk
Changed 8 years ago by
Changed 8 years ago by
comment:16 Changed 8 years ago by
- Reviewers changed from Nathann Cohen to Nathann Cohen,Birk Eisermann
Helloooooo Birk !!
Well, the code looks nice to me, it is tested, documented, sligthly optimized... I would say that we are done with it !
I just built the doc again and saw that there remained a conflict with the references... Of course : the paper is cited in one of the module's functions, and this function is also loaded inside of Graph, so it appears twice in Sphinx's view of the code.. I add a small patch that moves it inside of the module's doc, and the warning disappears as a result.
I also uploaded a patch that contains all the other ones, to make it easier for Jeroen.
As I agree with all your code and changes, you can set this ticket to "positive_review" if you also agree with my part :-)
Thaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaanksss !!
Nathann
The following patch:
Contains all of the following :
comment:17 Changed 8 years ago by
- Description modified (diff)
comment:18 Changed 8 years ago by
- Status changed from needs_review to positive_review
Good. Thank you!!
comment:19 Changed 8 years ago by
- Milestone changed from sage-5.1 to sage-5.2
comment:20 Changed 8 years ago by
- Dependencies set to #12917, #12965, #12544
- Status changed from positive_review to needs_work
comment:21 in reply to: ↑ 4 ; follow-up: ↓ 22 Changed 8 years ago by
Changed 8 years ago by
Changed 8 years ago by
comment:22 in reply to: ↑ 21 Changed 8 years ago by
- Status changed from needs_work to positive_review
Why?
........ Because I *HAD TO* when I wrote the patch, it wouldn't compile otherwise.
Though I just reverted those changes back to the "from sage.graphs.all [...]
" and nothing went wrong.
Big mystery to me, but trac_13073-all2.patch is a new rebased patch that compiles and does not make those changes anymore ^^;
Nathann
comment:23 Changed 8 years ago by
- Merged in set to sage-5.2.rc0
- Resolution set to fixed
- Status changed from positive_review to closed
Helloooooo Birk !!!
I do not know whether you want this patch to be reviewed already but I could not help giving it a look anyway, and I should make some remarks before I forget them
:-)
:-)
Thank you again !!!
Nathann