Opened 8 years ago

Closed 8 years ago

recognition of weakly chordal graphs

Reported by: Owned by: eisermbi jason, ncohen, rlm major sage-5.2 graph theory weakly chordal, hole, antihole dimpase, kini, ncohen sage-5.2.rc0 Birk Eisermann Nathann Cohen,Birk Eisermann N/A #12917, #12965, #12544

Adding functionality to recognize weakly chordal graphs; as part of that, procedures for finding holes (= induced cycle of length >=5) and antiholes (complements of holes) in graphs are included.

Apply:

comment:1 Changed 8 years ago by ncohen

Helloooooo Birk !!!

I do not know whether you want this patch to be reviewed already but I could not help giving it a look anyway, and I should make some remarks before I forget them :-)

• The first one is *thank you very much* for adding this kind of features, I added some recognition algorithms to Sage recently and I had no idea that there was a nice way to recognize these graphs !
• Your patch modifies the file generic_graph.py, which means that the patch would add methods to BOTH Graph and DiGraph? objects. As I guess that the algorithm does not answer anything sensible when given a DiGraph? as an input, I guess the best would be to add the methods to graph.py instead. BUT
• Your patch adds quite long functions. Hence perhaps the best is to write them inside of another file, while still exposing your functions in the Graph class.
• I do not understand why you have both a isHoleFree and isAntiholeFree method. Aren't they doing the same job ? You only use the first one yourself in is_weakly_chordal, so why write both in the patch ? At the very least the second one should call the first one on the graph's complement ?.. Did I say something wrong ?
• If this isAntiHoleFree function can be removed, perhaps the code will be short enough so that it will become possible to add it direcly inside of graph.py without writing it in a diffrent file first...
• We already have a is_even_hole_free method, as well as an is_odd_hole_free method. You can see by looking at their code that they are quite straightforward functions -- they immediately call the subgraph_search method. While it is algorithmically a very bad way to solve the problem, the search for subgraph is implemented in C and can be expected to be quite fast. Is such a method totally left behind by your specific code ? I expect that it is but I just wondered..
• Oh, and also... Could you add some comments to our code ? As it is, it is pretty hard for somebody who is not you to understand what it does :-)

Thank you again !!!

Nathann

comment:2 Changed 8 years ago by eisermbi

Hello Nathann!!

Location / Visibility:

• The weakly chordal test (checking for C_k, k \geq 5) provides similar functionality as the chordal test (checking for C_k, k \geq 4). Since the function is_chordal() is contained in the file generic_graph.py, I chose this file for the new functions. You are right that the weakly chordal test does not consider directed edges, but just its underlying undirected graph. My previous version made an 'undirected' copy of the (Di)Graph and thus, it yield a correct result. To gain speed, I removed this part and because of this moved the algorithm into graph.py.
• Shall the functions be moved to an extra file?
• I think since the functions is_even_hole_free() and is_odd_hole_free() are available, it might be useful to have the function is_hole_free()/is_antihole_free() available as well (and not make them private)...

Run time / speed:

• It is true that g.complement().is_long_hole_free() and g.is_long_antihole_free() are doing the same job. But for the first function, the complement of the graph has to be calculated while this is not the case for the second one. Actually both algorithms should be used in is_weakly_chordal(). That was an oversight in the first version (though the function was still correct). Now I have done a speed comparison between above two options. The result of comparison1() of attached "comparison.py" shows that in average it is better to use both.
• Okay, I was focused on a fast algorithm. Now I wrote a function using the subgraph_search of C_k, for k = 5,6,...,graph.order() (what does actually the same as is_even_hole_free() and is_odd_hole_free() together except for C_4's). I did some speed comparison. After some rounds of profiling and optimization, the Python code runs in average several times faster than the subgraph search - as you said, subgraph search is algorithmically bad in general. Hence, the is_long_(anti)hole_free algorithm should be preferred. See result of comparison2() of attached "comparison.py".
• Profiling the code shows that 99% of the time is spend in the edge_iterator. Whether it is possible to improve on that further, it is beyond my knowledge. Any improvements are welcome!

• Actually, the algorithms are described in the mentioned paper and I just implemented them. The description of each function is two pages long (+ one page for the certificate) and I doubt that I can write down all as comments. I added at least some comments...

Birk

Changed 8 years ago by eisermbi

only for speed comparison

comment:3 Changed 8 years ago by eisermbi

• Status changed from new to needs_review

comment:4 follow-up: ↓ 21 Changed 8 years ago by ncohen

Hellooooo Birk !!

I am currently reviewing your patch. As you code is somewhat long, and because you convinced me that both functions were useful I thought it would be better to move the code *out* of graph.py in a new weakly_chordal module (we already have some modules for specific graph classes). This way graph.py does not grow too much, and we will be able to add specific documentation in the module's head. I do it very often these days, and you can add very interesting things there on what the code actually does.

For instance :

This patch is *not* a review. I changed nothing to your code, unless what I could not avoid:

• The occurrences of self have been replaced by g
• I changed several instances of "from sage.graphs.all import Graph" into the more proper "from sage.graphs.graph import Graph"
• The new module is imported in sage.graphs.all.py
• There are shortcuts from graph.py to your methods so that g.is_long_hole_free still works. And quite honestly we should do this for many other methods to shorten graph.py !

I will try to send a complete review by today. It will probably take time as I will have to read/understand the paper to mention. If I can't make it, you will have it by tomorrow, and unless something really really bad happens this ticket should be reviewed by the end of the month :-D

Nathann

comment:5 Changed 8 years ago by ncohen

• Authors set to Birk Eisermann
• Description modified (diff)
• Reviewers set to Nathann Cohen

Hellooooooooo Birk !!!

Here is the promised reviewer's patch ! I originally intended to do a bit more, but some of the optimizations I had in mind were actually worsening the situation on random graphs, and I got stuck with *another* optimization because some things are possible in Python files that are not in Cython ones... *sigh* :-)

Anyway I changed the code a little since the previous version :

• Added the new module to the documentation
• Edited the doc where it mentionned the "P_4-structure" of the graph. The algorithm actually is not very complicated, so it's better to give its main idea in this short description :-)
• Added a link toward the pdf, and edited some trivial things too
• I really could not make sense of the way you built the induced cycle back, at the end of the algorithm. It did not appear to be very efficient either (you enumerated many edges, then checked that you had to consider them)... I wrote a new version of it, which also works for antiholes (though I have to call a .complement() on a (hopefully small) subgraph), and I hope that it is clearer this way. There are some explanations around in the code.

Well, that's more or less all I did. Please tell me what you think of it !

The usual procedure in these situations is that I review your changes while you review mine, and we can set the ticket to positive_review when we are both confident that the other made no catastrophic blunder :-)

Thank you again !!

Nathann

comment:6 Changed 8 years ago by ncohen

Oh yeah, and the "ActivePath?" variable disappears. That's actually the inverse of InPath?, which you kept updating during the algorithm... It was as efficient to just build it when you need it, that is when returning the cycle. That's what the C_index variable I added actually does :-)

Nathann

comment:7 Changed 8 years ago by ncohen

• Description modified (diff)
• Status changed from needs_review to needs_work

comment:8 Changed 8 years ago by ncohen

Hellooooooo Birk !!!

I just uploaded another patch that makes a Cython file from your module, and caches the graph as a dense bitset for faster has_edge queries. It turns out that the resulting algorithm is actually *slower* on random GNP

Before :

sage: g = graphs.RandomGNP(200,.2)
sage: %timeit g.is_weakly_chordal()
25 loops, best of 3: 17.6 ms per loop
sage: g = graphs.RandomGNP(200,.2)
sage: %timeit g.is_weakly_chordal()
25 loops, best of 3: 17.3 ms per loop


After :

sage: g = graphs.RandomGNP(200,.2)
sage: %timeit g.is_weakly_chordal()
25 loops, best of 3: 35.3 ms per loop
sage: g = graphs.RandomGNP(200,.2)
sage: %timeit g.is_weakly_chordal()
25 loops, best of 3: 37.2 ms per loop


I was quite surprised at that, and then noticed that the RandomGNP graphs have so many long holes that the algorithm is almost linear in this case. Hence I looked for graphs in which long holes may be hard to find (for instance if there are none), but which is not a clique as they should contain many induced P4. Well. Chordal graphs, then :-)

Hence I tried both implementations with our RandomIntervalGraph? generator.

Before:

sage: g = graphs.RandomInterval(50)
sage: %timeit g.is_weakly_chordal()
5 loops, best of 3: 2.67 s per loop
sage: g = graphs.RandomInterval(50)
sage: %timeit g.is_weakly_chordal()
5 loops, best of 3: 3.12 s per loop

sage: g = graphs.RandomInterval(100)
sage: time g.is_weakly_chordal()
True
Time: CPU 39.91 s, Wall: 40.03 s


After

sage: g = graphs.RandomInterval(50)
sage: %timeit g.is_weakly_chordal()
5 loops, best of 3: 715 ms per loop
sage: g = graphs.RandomInterval(50)
sage: %timeit g.is_weakly_chordal()
5 loops, best of 3: 775 ms per loop

sage: g = graphs.RandomInterval(100)
sage: time g.is_weakly_chordal()
True
Time: CPU 8.75 s, Wall: 8.76 s


Which makes sense, as in this case the algorithm will enumerate all P4 --that takes times and requires many has_edge queries.

Of course, it is easy to fix this. I will upload in a split second yet another patch that first checks whether the graph is chordal, in which case there is no point in running a (potentially very expensive) search. The runtime on interval graphs should be almost instantaneous afterwards.

Well... I feel a bit awkward because I think this optimization (and caching) patch is a good thing, but of course it is slower on the graphs we used as examples until now. as I think I understand why, and as it is much faster on other instances, I think that it should be merged anyway. What do you think ?

Nathann

comment:9 follow-up: ↓ 10 Changed 8 years ago by ncohen

• Status changed from needs_work to needs_review

Hmmm.... Well, here's how long it takes to check that random graphs are not chordal :

sage: g = graphs.RandomGNP(200,.2)
sage: %time g.is_chordal()
CPU times: user 0.31 s, sys: 0.00 s, total: 0.31 s
Wall time: 0.31 s
False


That's not negligible at all compared to the runtime of your is_weakly_chordal function for random graphs. What do you think we should do ?

Nathann

comment:10 in reply to: ↑ 9 Changed 8 years ago by eisermbi

Thanks for your comments! But first of all I need to get sage updated. Sorry for that :-(

Birk

comment:11 Changed 8 years ago by ncohen

NOoooooooo prob ! ;-)

Nathann

comment:12 Changed 8 years ago by ncohen

(just for the joke) :-PPP

sage: g = graphs.RandomTree(400)
sage: %time g.is_weakly_chordal()
CPU times: user 9.42 s, sys: 0.02 s, total: 9.44 s
Wall time: 9.48 s
True


Yeahhhhhhhhhhh !! That's efficiency baby ! :-D

Nathann

comment:13 follow-up: ↓ 14 Changed 8 years ago by eisermbi

Hello Nathann!

I agree with most of the patch set. Especially the removal of the variable ActivePath? is nice and the hole/antihole extraction is easier to understand (and does not need to be highly effficient). The techniques in the optim patch are kind of what I could imagine. Since I am new to sage and python it could not do that. So you did good work and the speed is much better!

On the other side, I want to point out that the algorithm became less readable to an outsider. Now it uses some kind of 'low level' programming. If one looks at bitset_in(dense_graph,u*n+v) it takes time to realize that it stands for g.has_edge(u,v). I think this should be somehow hidden in the graph class instead of showing up in a 'high level' algorithm... (Of course, my statement about hidden is very blurry. I will write something about that and the graphs class hierarchy in general on sage-devel soon.) For the moment, I added a small modification (trac_13073-optim-review.patch) with which I could live. :-)

Finally some what- and why-questions:

• What is the difference between 'del InPath[c]' and 'InPath.pop(c)' ? You changed the former to the latter but only in some places...
• I find the change

for u in g.vertices()

to

for u in g

makes the code harder to read for an outsider. Why making it more cryptic?

• Why is the reference deleted in is_longanti_hole_free (but not in is_long_hole_free)?
• There is an upwards shift arrow appended at the end of the line

# C[i]...C[j] is necessarily an induced cycle.⇧

in process() of is_long_hole_free(). Similar for is_long_antihole_free(). What is the use of that symbol?

Thank you!

Birk

comment:14 in reply to: ↑ 13 ; follow-up: ↓ 15 Changed 8 years ago by ncohen

Helloooooo Birk !!!

On the other side, I want to point out that the algorithm became less readable to an outsider. Now it uses some kind of 'low level' programming. If one looks at bitset_in(dense_graph,u*n+v) it takes time to realize that it stands for g.has_edge(u,v). I think this should be somehow hidden in the graph class instead of showing up in a 'high level' algorithm... (Of course, my statement about hidden is very blurry. I will write something about that and the graphs class hierarchy in general on sage-devel soon.) For the moment, I added a small modification (trac_13073-optim-review.patch) with which I could live. :-)

Oh ! Nice. And you are right of course. We really need low-level backends for such things.. Could you change your patch a little bit though ? You define the _has_edge function inside of the other functions, and as you are in your own module I guess nobody would mind if you define a has_edge method. The other thing is that you could define it as a C method, something like that :

cdef inline int has_edge(bitset_t bs, int u, int v, int n):
return bitset_in(bs, u*n+v)


This way you avoid a function call... But that's not very costly anyway :-)

Finally some what- and why-questions:

• What is the difference between 'del InPath[c]' and 'InPath.pop(c)' ? You changed the former to the latter but only in some places...

Hmmmm... I believed that that the following code would also destroy the content of a, though it still exists.

sage: a = set([1,2,3])
sage: a
set([1, 2, 3])
sage: d = {8:a}
sage: d
{8: set([1, 2, 3])}
sage: del d[8]
sage: a
set([1, 2, 3])


I believed that a call to "del" was a direct order to destroy the given object, but it only removes the value from the dictionary. Looks good : sorry for this (partial!) modification, please update the code however you like :-)

• I find the change

for u in g.vertices()

to

for u in g

makes the code harder to read for an outsider. Why making it more cryptic?

O_o

Well, there is a small difference between the two lines, but I only changed it to make the code more... readable. You think "for u in g" is worse that "for u in g.vertices()" ?

We can write \forall u \in G in papers, and I usually write "for u in g" because the line is shorter, and (to me) easier to read. The difference between the two lines is that g.vertices() returns the *sorted* list of vertices, while "for i in g" returns the list of vertices in any order -- hence there is no call to *sort* and is *theoretically* faster, though I do not think that there is any significant difference in practice. It's up to you too if you don't like it !

• Why is the reference deleted in is_longanti_hole_free (but not in is_long_hole_free)?

Oh. I believe Sage complained when I built the documentation (sage -docbuild reference html) because of the double entry (with the same reference code). The link works when you click on them in the HTML documentation though.

• There is an upwards shift arrow appended at the end of the line

# C[i]...C[j] is necessarily an induced cycle.⇧

in process() of is_long_hole_free(). Similar for is_long_antihole_free(). What is the use of that symbol?

It expresses my right to make typos in patches, but I understand from your comment that you believe I do not deserve that right :-P

Sorry about that, it is just a tyo ^^;

Nathann

comment:15 in reply to: ↑ 14 Changed 8 years ago by eisermbi

Hello Nathann !!

I incorporated your suggestion. The _has_edge is moved and changed.

Finally some what- and why-questions:

• What is the difference between 'del InPath[c]' and 'InPath.pop(c)' ? You changed the former to the latter but only in some places...

Hmmmm... I believed that that the following code would also destroy the content of a, though it still exists.

sage: a = set([1,2,3])
sage: a
set([1, 2, 3])
sage: d = {8:a}
sage: d
{8: set([1, 2, 3])}
sage: del d[8]
sage: a
set([1, 2, 3])


I believed that a call to "del" was a direct order to destroy the given object, but it only removes the value from the dictionary. Looks good : sorry for this (partial!) modification, please update the code however you like :-)

Okay, since I am new to python I did not know about this pop method before. I did some more research and found they are the same expect that the pop-method returns the value while del does not. As a post on stackoverflow shows (and I also check myself) this difference has impact on the run time. So if the value of the removed element is not needed 'del' is a bit faster. Hence, I go for 'del'...

• I find the change

for u in g.vertices()

to

for u in g

makes the code harder to read for an outsider. Why making it more cryptic?

O_o

Well, there is a small difference between the two lines, but I only changed it to make the code more... readable. You think "for u in g" is worse that "for u in g.vertices()" ?

We can write \forall u \in G in papers, and I usually write "for u in g" because the line is shorter, and (to me) easier to read.

Well, then let us write "for e in g" to iterate over all edges, alright? (joke). I am used to "for u in V(G)" or if we have G=(V,E) then "for u in V", and similar "for e in E(G)", "for e in E".

The difference between the two lines is that g.vertices() returns the *sorted* list of vertices, while "for i in g" returns the list of vertices in any order -- hence there is no call to *sort* and is *theoretically* faster, though I do not think that there is any significant difference in practice. It's up to you too if you don't like it !

Okay, we can use 'g' to avoid "sort" and since the comment makes it clear that u is a vertex. But I am not so convinced that it is well readable for everyone.

• Why is the reference deleted in is_longanti_hole_free (but not in is_long_hole_free)?

Oh. I believe Sage complained when I built the documentation (sage -docbuild reference html) because of the double entry (with the same reference code). The link works when you click on them in the HTML documentation though.

I see. There was no error but warnings about duplicate entries, hence removed.

• There is an upwards shift arrow appended at the end of the line

# C[i]...C[j] is necessarily an induced cycle.⇧

in process() of is_long_hole_free(). Similar for is_long_antihole_free(). What is the use of that symbol?

It expresses my right to make typos in patches, but I understand from your comment that you believe I do not deserve that right :-P

I was just curious. And for a 'master of sage', sure, no typos allowed... ;-)

Thanks for all of your suggestions! :-)

Birk

comment:16 Changed 8 years ago by ncohen

• Reviewers changed from Nathann Cohen to Nathann Cohen,Birk Eisermann

Helloooooo Birk !!

Well, the code looks nice to me, it is tested, documented, sligthly optimized... I would say that we are done with it !

I just built the doc again and saw that there remained a conflict with the references... Of course : the paper is cited in one of the module's functions, and this function is also loaded inside of Graph, so it appears twice in Sphinx's view of the code.. I add a small patch that moves it inside of the module's doc, and the warning disappears as a result.

I also uploaded a patch that contains all the other ones, to make it easier for Jeroen.

As I agree with all your code and changes, you can set this ticket to "positive_review" if you also agree with my part :-)

Thaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaanksss !!

Nathann

The following patch:

Contains all of the following :

comment:17 Changed 8 years ago by ncohen

• Description modified (diff)

comment:18 Changed 8 years ago by eisermbi

• Status changed from needs_review to positive_review

Good. Thank you!!

comment:19 Changed 8 years ago by jdemeyer

• Milestone changed from sage-5.1 to sage-5.2

comment:20 Changed 8 years ago by jdemeyer

• Dependencies set to #12917, #12965, #12544
• Status changed from positive_review to needs_work

This should be rebased on top of #12917 + #12965 + #12544.

comment:21 in reply to: ↑ 4 ; follow-up: ↓ 22 Changed 8 years ago by jdemeyer

• I changed several instances of "from sage.graphs.all import Graph" into the more proper "from sage.graphs.graph import Graph"

Why? Importing from .all makes it easier if files are moved around later, see for example #12857.

Last edited 8 years ago by jdemeyer (previous) (diff)

comment:22 in reply to: ↑ 21 Changed 8 years ago by ncohen

• Status changed from needs_work to positive_review

Why?

........ Because I *HAD TO* when I wrote the patch, it wouldn't compile otherwise.

Though I just reverted those changes back to the "from sage.graphs.all [...]" and nothing went wrong.

Big mystery to me, but trac_13073-all2.patch is a new rebased patch that compiles and does not make those changes anymore ^^;

Nathann

comment:23 Changed 8 years ago by jdemeyer

• Merged in set to sage-5.2.rc0
• Resolution set to fixed
• Status changed from positive_review to closed
Note: See TracTickets for help on using tickets.