Opened 7 years ago
Closed 7 years ago
#17712 closed enhancement (fixed)
Adds memoization to the branch and bound for vertex separation
Reported by: | dcoudert | Owned by: | |
---|---|---|---|
Priority: | minor | Milestone: | sage-6.5 |
Component: | graph theory | Keywords: | |
Cc: | ncohen | Merged in: | |
Authors: | David Coudert | Reviewers: | Nathann Cohen |
Report Upstream: | N/A | Work issues: | |
Branch: | 93e9a0a (Commits, GitHub, GitLab) | Commit: | 93e9a0ab8ffae7ae695e51f8a89ca72994c3a9de |
Dependencies: | #17711 | Stopgaps: |
Description (last modified by )
This patch adds to the branch and bound algorithm for the vertex separation (#17647) a kind of memoization tool for storing prefixes as proposed in [CMN14]. This can significantly reduce the number of nodes of the branch-and-bound tree to consider.
Reference: [CMN14] D. Coudert, D. Mazauric, N. Nisse. *Experimental Evaluation of a Branch and Bound Algorithm for computing Pathwidth*. In SEA'13, LNCS vol. 8504, pp.46-58, 2014.
Change History (31)
comment:1 Changed 7 years ago by
- Branch set to public/17112
- Dependencies set to #17711
- Description modified (diff)
comment:2 Changed 7 years ago by
- Commit set to 8dd34cd88b0463af0a1c2ad47560ea77a1543bf0
comment:3 Changed 7 years ago by
- Commit changed from 8dd34cd88b0463af0a1c2ad47560ea77a1543bf0 to 1a33d044aa565ef4ea574d33d8010c634a17699a
Branch pushed to git repo; I updated commit sha1. New commits:
1a33d04 | trac #17712: some cleaning
|
comment:4 Changed 7 years ago by
- Cc ncohen added
- Status changed from new to needs_review
I have pushed a first draft for storing prefixes. We can certainly reduce further the computation time, but I don't know how. Also, I tried to be as simple as possible.
David.
comment:5 follow-up: ↓ 7 Changed 7 years ago by
- Status changed from needs_review to needs_work
Hello David,
A short review before going to sleep:
1) You do not seem to do anything at all with max_prefix_number
.
2) You add parameters in path_decomposition
but you don't do
anything with them.
3) Given that you seem to only store sets of integers, I am
tempted to think that you should prefer FrozenBitset
objects
over frozenset
. They should give you a faster hash/equality
test.
4) There are many empty docstrings in the methods of
PrefixStorage
. And to be honest, considering the amount of doc
it takes to implement all of that, I wonder if it would not be
better to leave it all in the code, without creating a new
class. "Just a dictionary whose keys are bitsets".
In order to not have too many things in the actual branch-and-bound, it may be enough to have just a couple of functions (your update/is_known_prefix) taking a dictionary as an additional argument.
Good night,
Nathann
comment:6 Changed 7 years ago by
- Commit changed from 1a33d044aa565ef4ea574d33d8010c634a17699a to 1181090c4d48f10b1bc20daa517c5396b6d27083
Branch pushed to git repo; I updated commit sha1. New commits:
1181090 | trac #17712: improvements
|
comment:7 in reply to: ↑ 5 Changed 7 years ago by
Replying to ncohen:
Hello David,
A short review before going to sleep:
1) You do not seem to do anything at all with
max_prefix_number
.
Now used. I forgot some parts. Surprizingly you are less efficient when the heating system of your building is broken during winter time...
Also I'm now only recording 1 boolean per prefix. We don't need to store the cost.
2) You add parameters in
path_decomposition
but you don't do anything with them.
Fixed
3) Given that you seem to only store sets of integers, I am tempted to think that you should prefer
FrozenBitset
objects overfrozenset
. They should give you a faster hash/equality test.
My tests gives better performances with frozenset
. But if you have better ideas...
4) There are many empty docstrings in the methods of
PrefixStorage
. And to be honest, considering the amount of doc it takes to implement all of that, I wonder if it would not be better to leave it all in the code, without creating a new class. "Just a dictionary whose keys are bitsets".In order to not have too many things in the actual branch-and-bound, it may be enough to have just a couple of functions (your update/is_known_prefix) taking a dictionary as an additional argument.
The main advantage of the class is to hide all the parameters. It is very useful to be able to tune these parameters according the amount of memory you have. Passing the dict as parameter, we would also have to pass other parameters. I'm not convinced that it would really be shorter to have 2 functions instead of this class.
We could however save the enable_prefix_storage
parameter since we can obtain the same behavior setting max_prefix_length=0
.
Good night,
Nathann
If you want to know the kind of speedup we get for a quite hard instance:
sage: G = graphs.MycielskiGraph(5) sage: %timeit vs, seq = VS.vertex_separation_BAB(G, enable_prefix_storage=True) 1000 loops, best of 3: 1.3 ms per loop sage: %timeit vs, seq = VS.vertex_separation_BAB(G, enable_prefix_storage=False) 100 loops, best of 3: 2.19 ms per loop sage: G = graphs.MycielskiGraph(6) sage: %timeit vs, seq = VS.vertex_separation_BAB(G, max_prefix_length=20) 1 loops, best of 3: 1.05 s per loop sage: %timeit vs, seq = VS.vertex_separation_BAB(G, max_prefix_length=10) 1 loops, best of 3: 1.84 s per loop sage: %timeit vs, seq = VS.vertex_separation_BAB(G, max_prefix_length=8) 1 loops, best of 3: 12 s per loop sage: %timeit vs, seq = VS.vertex_separation_BAB(G, max_prefix_length=7) 1 loops, best of 3: 40.8 s per loop
I have not tested VS.vertex_separation_BAB(G, enable_prefix_storage=False)
with this version of the prefix storage. With a former version (using trees and so much more ugly code) it was around 40min, so with dict should be more than 1h
David.
comment:8 Changed 7 years ago by
- Status changed from needs_work to needs_review
comment:9 Changed 7 years ago by
- Commit changed from 1181090c4d48f10b1bc20daa517c5396b6d27083 to 6e39f78fbdfcf22ca89ba23ac96ce88431feaa6e
Branch pushed to git repo; I updated commit sha1. New commits:
6e39f78 | trac #17712: remove parameter enable_prefix_storage
|
comment:10 Changed 7 years ago by
Hello again,
I am working again on this patch, and I cannot say that I like this PrefixStorage
class very much. To make it simple it is just a dictionary, in which you seem to store all values that you do not want to carry around as parameters of the BAB_C
function.
Some parameters are almost impossible to document in the PrefixStorage
class and can only be understood if you know exactly where they are called from the inside of the BAB_C
function.
I removed some undocumented functions from PrefixStorage
that you did not call (as well as the undocumented __len__
). Please make sure that sage -coverage <file>
does not return anything wrong before you submit a patch for review.
....
Okay, and I just noticed that you pushed a new commit on this needs_review
patch. So now I will see whether it breaks all my modifications.
Today is not a good day.
Nathann
New commits:
6e39f78 | trac #17712: remove parameter enable_prefix_storage
|
comment:11 Changed 7 years ago by
Sorry, sorry, sorry, sorry, I removed the enable prefix storage parameter to shorten the doc. Really sorry if I breaks your edits...
comment:12 Changed 7 years ago by
If you believe it is better to add extra parameters to BAB_C
(dict, max length, max number, etc.) and to remove this class, we can try.
comment:13 Changed 7 years ago by
- Status changed from needs_review to needs_work
Hello again,
This commit does what was said above, and also rephrases the short description of the caching mechanism in the module's doc. I hope that it is clearer.
As you just removed a keyword I also wondered about the use of "is_on".
Two remarks:
1) Right now you 'store' the size of the dictionary in an integer variable. To do this, every time you add a new element in the dictionary, you check whether the element is there already. Thus, each of the +1
costs you a lookup, theoretically a log(n)
operation. I have not run any timings, but I would be a bit troubled if calling len(the_dict)
was not faster.
2) I still do not understand the point of the dictionary: what are "False" values useful for ? Why isn't it just a set of frozensets ?
3)
~/sage/graphs/graph_decompositions$ sage -coverage vertex_separation.pyx ------------------------------------------------------------------------ SCORE vertex_separation.pyx: 88.9% (8 of 9) Missing doctests: * line 1663: def __init__(self, int max_prefix_length=20, int max_prefix_number=10**6) ------------------------------------------------------------------------
Nathann
P.S. : About removing this PrefixStorage
: let us first see what happens with this dict/set question. My intuition is that if this thing becomes a set you can already remove cost/upper_bound/is_improved
from update()
(and only call 'update' when appropriate) which may already simplify matters.
comment:14 Changed 7 years ago by
- Commit changed from 6e39f78fbdfcf22ca89ba23ac96ce88431feaa6e to 4abd2bacff0b06ae13a4fb4d046c7ecee4716d28
Branch pushed to git repo; I updated commit sha1. New commits:
4abd2ba | trac #17712: Reviewer's commit
|
comment:15 Changed 7 years ago by
- Branch changed from public/17112 to public/17712
comment:16 follow-up: ↓ 17 Changed 7 years ago by
Hello Nathann,
You are right, we could use a set instead of a dictionary and add the prefix P
to the set only if c(P)<\min_{L\in{\cal L}_P(V)} c(L)
.
However, it is apparently faster to test if a frozenset is a key of a dictionary than if it is in a set.
sage: d = dict() sage: s = set() sage: for i in range(1, 11): ....: for z in Combinations(range(2*i), i): ....: d[frozenset(z)] = True ....: s.add(frozenset(z)) ....: sage: len(d) 250952 sage: %timeit len(d) 10000000 loops, best of 3: 63 ns per loop sage: %timeit len(s) 10000000 loops, best of 3: 59.9 ns per loop sage: sage: elt = d.keys()[randint(0, len(d)-1)] sage: %timeit elt in d 10000000 loops, best of 3: 60.9 ns per loop sage: %timeit elt in s 10000000 loops, best of 3: 163 ns per loop sage: sage: elt = d.keys()[randint(0, len(d)-1)] sage: %timeit elt in d 10000000 loops, best of 3: 58.8 ns per loop sage: %timeit elt in s 10000000 loops, best of 3: 160 ns per loop
What we can do is to change is_known_prefix
as follows:
if size<1 or size>self.max_prefix_length: return False cdef int i cdef frozenset my_prefix = frozenset([prefix[i] for i in range(size)]) return my_prefix in self.PT
and update
as:
cdef int i if size>0 and size<=self.max_prefix_length \ and not (is_improved and cost==upper_bound) \ and len(self.PT)< self.max_prefix_number: my_prefix = frozenset(prefix[i] for i in range(size)) self.PT[my_prefix] = True
Furthermore, if we do that inside BAB_C
, we do only once my_prefix = frozenset(prefix[i] for i in range(size))
.
If you agree, I will implement the changes.
David.
comment:17 in reply to: ↑ 16 Changed 7 years ago by
Yo !
You are right, we could use a set instead of a dictionary and add the prefix
P
to the set only ifc(P)<\min_{L\in{\cal L}_P(V)} c(L)
.
Cool. This should sipmlify both code and doc.
However, it is apparently faster to test if a frozenset is a key of a dictionary than if it is in a set.
WHaaaaaaaaaaaaaaat ?.... -_-
Okay. Dict with 'True' keys. Really... -_-
I will write to sage-devel with your timings.
If you agree, I will implement the changes.
+1
Nathann
comment:18 Changed 7 years ago by
- Commit changed from 4abd2bacff0b06ae13a4fb4d046c7ecee4716d28 to 4cc8d7fc0550aba5fd9e03f3263ede4172ff9b73
Branch pushed to git repo; I updated commit sha1. New commits:
4cc8d7f | trac #17712: remove class PrefixStorage and simplify code
|
comment:19 Changed 7 years ago by
With this modification, the code is significantly faster. Cool! I'm really happy to have this timing on my MBA, knowing that it is a hard instance for this problem.
sage: from sage.graphs.graph_decompositions import vertex_separation as VS sage: G = graphs.MycielskiGraph(6) sage: %timeit vs, seq = VS.vertex_separation(G) 1 loops, best of 3: 965 ms per loop
We can easily change from dict
to any other relevant data structure if it is faster.
David.
comment:20 Changed 7 years ago by
HMmmmm.. Less code, more speed. Love when it happens :-P
Nathann
comment:21 Changed 7 years ago by
Since we learned that "internally sets are implemented just as dictionaries with no values", there is no reason for using dictionary. The timing was certainly due to the order in which values were inserted in the set and in the dictionary. Moreover, we can get different results:
sage: L = [] sage: for i in range(1, 11): ....: for z in Combinations(range(2*i), i): ....: L.append(frozenset(z)) ....: sage: shuffle(L) sage: d = dict( (k, True) for k in L ) sage: s = set(L) sage: elt = L[randint(0, len(L)-1)] sage: %timeit elt in s 10000000 loops, best of 3: 66.8 ns per loop sage: %timeit elt in d 10000000 loops, best of 3: 64.1 ns per loop sage: d = dict() sage: for i in range(1, 11): for z in Combinations(range(2*i), i): d[frozenset(z)] = True ....: sage: %timeit elt in d 10000000 loops, best of 3: 157 ns per loop
I will change to set.
David.
comment:22 Changed 7 years ago by
- Commit changed from 4cc8d7fc0550aba5fd9e03f3263ede4172ff9b73 to f6a71a03edd5e21712ae76268e1cdd1a00b8b845
Branch pushed to git repo; I updated commit sha1. New commits:
f6a71a0 | trac #17712: use set instead of dict to store prefixes
|
comment:23 Changed 7 years ago by
- Status changed from needs_work to needs_review
Same speed and bit less doc.
comment:24 Changed 7 years ago by
- Commit changed from f6a71a03edd5e21712ae76268e1cdd1a00b8b845 to 8bcfa17987a8561d459703334615dec1b7edfa3d
Branch pushed to git repo; I updated commit sha1. New commits:
8bcfa17 | trac #17712: store prefixes generated by greedy steps
|
comment:25 Changed 7 years ago by
I have added a few lines to store the prefixes generated by the greedy steps. Although it takes some time locally, it should help reducing the total number of visited branches.
comment:26 Changed 7 years ago by
- Commit changed from 8bcfa17987a8561d459703334615dec1b7edfa3d to 919df3da5d54ddc1531581c225d8dfe7a21b0f76
Branch pushed to git repo; I updated commit sha1. New commits:
919df3d | trac #17712: move tests on prefixes
|
comment:27 Changed 7 years ago by
- Commit changed from 919df3da5d54ddc1531581c225d8dfe7a21b0f76 to 93e9a0ab8ffae7ae695e51f8a89ca72994c3a9de
Branch pushed to git repo; I updated commit sha1. New commits:
93e9a0a | trac #17712: Update doc
|
comment:28 Changed 7 years ago by
- Reviewers set to Nathann Cohen
Yoooooooooo !
If you don't see anything wrong in this last commit you can set the ticket to positive_review
. Thanks for you work,
Nathann
comment:29 Changed 7 years ago by
Youhou!!!
Thank you once again Nathann for your very constructive help. The final solution is very nice.
Best, David.
PS: I have other possible inputs for this module (algorithm for trees, pre-processing for reducing the size of input, algorithm for testing vs=1 and vs=2, heuristics, etc.), but not right now.
comment:30 Changed 7 years ago by
- Status changed from needs_review to positive_review
comment:31 Changed 7 years ago by
- Branch changed from public/17712 to 93e9a0ab8ffae7ae695e51f8a89ca72994c3a9de
- Resolution set to fixed
- Status changed from positive_review to closed
Branch pushed to git repo; I updated commit sha1. Last 10 new commits:
Merge branch 'public/17647b' of git://trac.sagemath.org/sage into tmp
trac#17711: expose BAB in vertex_separation
trac#17711: Let MILP work with Graph as well
trac#17711: Let exp work with Graph as well
trac#17711: Let lower_bound to work with Graph as well and clean lots of tests
trac#17711: adds decomposition into strongly connected components
trac #17711: Mostly removing trailing whitespaces
trac #17711: handle connected components
trac #17711: minor edit
trac #17712: add prefix storage to BAB