Opened 5 years ago

Closed 5 years ago

#17712 closed enhancement (fixed)

Adds memoization to the branch and bound for vertex separation

Reported by: dcoudert Owned by:
Priority: minor Milestone: sage-6.5
Component: graph theory Keywords:
Cc: ncohen Merged in:
Authors: David Coudert Reviewers: Nathann Cohen
Report Upstream: N/A Work issues:
Branch: 93e9a0a (Commits) Commit: 93e9a0ab8ffae7ae695e51f8a89ca72994c3a9de
Dependencies: #17711 Stopgaps:

Description (last modified by dcoudert)

This patch adds to the branch and bound algorithm for the vertex separation (#17647) a kind of memoization tool for storing prefixes as proposed in [CMN14]. This can significantly reduce the number of nodes of the branch-and-bound tree to consider.

Reference: [CMN14] D. Coudert, D. Mazauric, N. Nisse. *Experimental Evaluation of a Branch and Bound Algorithm for computing Pathwidth*. In SEA'13, LNCS vol. 8504, pp.46-58, 2014.

Change History (31)

comment:1 Changed 5 years ago by dcoudert

  • Branch set to public/17112
  • Dependencies set to #17711
  • Description modified (diff)

comment:2 Changed 5 years ago by git

  • Commit set to 8dd34cd88b0463af0a1c2ad47560ea77a1543bf0

Branch pushed to git repo; I updated commit sha1. Last 10 new commits:

cfc96a0Merge branch 'public/17647b' of git://trac.sagemath.org/sage into tmp
8c666aetrac#17711: expose BAB in vertex_separation
62dd7c8trac#17711: Let MILP work with Graph as well
f6f4975trac#17711: Let exp work with Graph as well
d1b63batrac#17711: Let lower_bound to work with Graph as well and clean lots of tests
f627165trac#17711: adds decomposition into strongly connected components
085a3cdtrac #17711: Mostly removing trailing whitespaces
0d9da78trac #17711: handle connected components
6baf83etrac #17711: minor edit
8dd34cdtrac #17712: add prefix storage to BAB

comment:3 Changed 5 years ago by git

  • Commit changed from 8dd34cd88b0463af0a1c2ad47560ea77a1543bf0 to 1a33d044aa565ef4ea574d33d8010c634a17699a

Branch pushed to git repo; I updated commit sha1. New commits:

1a33d04trac #17712: some cleaning

comment:4 Changed 5 years ago by dcoudert

  • Cc ncohen added
  • Status changed from new to needs_review

I have pushed a first draft for storing prefixes. We can certainly reduce further the computation time, but I don't know how. Also, I tried to be as simple as possible.

David.

comment:5 follow-up: Changed 5 years ago by ncohen

  • Status changed from needs_review to needs_work

Hello David,

A short review before going to sleep:

1) You do not seem to do anything at all with max_prefix_number.

2) You add parameters in path_decomposition but you don't do anything with them.

3) Given that you seem to only store sets of integers, I am tempted to think that you should prefer FrozenBitset objects over frozenset. They should give you a faster hash/equality test.

4) There are many empty docstrings in the methods of PrefixStorage. And to be honest, considering the amount of doc it takes to implement all of that, I wonder if it would not be better to leave it all in the code, without creating a new class. "Just a dictionary whose keys are bitsets".

In order to not have too many things in the actual branch-and-bound, it may be enough to have just a couple of functions (your update/is_known_prefix) taking a dictionary as an additional argument.

Good night,

Nathann

comment:6 Changed 5 years ago by git

  • Commit changed from 1a33d044aa565ef4ea574d33d8010c634a17699a to 1181090c4d48f10b1bc20daa517c5396b6d27083

Branch pushed to git repo; I updated commit sha1. New commits:

1181090trac #17712: improvements

comment:7 in reply to: ↑ 5 Changed 5 years ago by dcoudert

Replying to ncohen:

Hello David,

A short review before going to sleep:

1) You do not seem to do anything at all with max_prefix_number.

Now used. I forgot some parts. Surprizingly you are less efficient when the heating system of your building is broken during winter time...

Also I'm now only recording 1 boolean per prefix. We don't need to store the cost.

2) You add parameters in path_decomposition but you don't do anything with them.

Fixed

3) Given that you seem to only store sets of integers, I am tempted to think that you should prefer FrozenBitset objects over frozenset. They should give you a faster hash/equality test.

My tests gives better performances with frozenset. But if you have better ideas...

4) There are many empty docstrings in the methods of PrefixStorage. And to be honest, considering the amount of doc it takes to implement all of that, I wonder if it would not be better to leave it all in the code, without creating a new class. "Just a dictionary whose keys are bitsets".

In order to not have too many things in the actual branch-and-bound, it may be enough to have just a couple of functions (your update/is_known_prefix) taking a dictionary as an additional argument.

The main advantage of the class is to hide all the parameters. It is very useful to be able to tune these parameters according the amount of memory you have. Passing the dict as parameter, we would also have to pass other parameters. I'm not convinced that it would really be shorter to have 2 functions instead of this class.

We could however save the enable_prefix_storage parameter since we can obtain the same behavior setting max_prefix_length=0.

Good night,

Nathann

If you want to know the kind of speedup we get for a quite hard instance:

sage: G = graphs.MycielskiGraph(5)
sage: %timeit vs, seq = VS.vertex_separation_BAB(G, enable_prefix_storage=True)
1000 loops, best of 3: 1.3 ms per loop
sage: %timeit vs, seq = VS.vertex_separation_BAB(G, enable_prefix_storage=False)
100 loops, best of 3: 2.19 ms per loop

sage: G = graphs.MycielskiGraph(6)
sage: %timeit vs, seq = VS.vertex_separation_BAB(G, max_prefix_length=20)
1 loops, best of 3: 1.05 s per loop
sage: %timeit vs, seq = VS.vertex_separation_BAB(G, max_prefix_length=10)
1 loops, best of 3: 1.84 s per loop
sage: %timeit vs, seq = VS.vertex_separation_BAB(G, max_prefix_length=8)
1 loops, best of 3: 12 s per loop
sage: %timeit vs, seq = VS.vertex_separation_BAB(G, max_prefix_length=7)
1 loops, best of 3: 40.8 s per loop

I have not tested VS.vertex_separation_BAB(G, enable_prefix_storage=False) with this version of the prefix storage. With a former version (using trees and so much more ugly code) it was around 40min, so with dict should be more than 1h

David.

comment:8 Changed 5 years ago by dcoudert

  • Status changed from needs_work to needs_review

comment:9 Changed 5 years ago by git

  • Commit changed from 1181090c4d48f10b1bc20daa517c5396b6d27083 to 6e39f78fbdfcf22ca89ba23ac96ce88431feaa6e

Branch pushed to git repo; I updated commit sha1. New commits:

6e39f78trac #17712: remove parameter enable_prefix_storage

comment:10 Changed 5 years ago by ncohen

Hello again,

I am working again on this patch, and I cannot say that I like this PrefixStorage class very much. To make it simple it is just a dictionary, in which you seem to store all values that you do not want to carry around as parameters of the BAB_C function.

Some parameters are almost impossible to document in the PrefixStorage class and can only be understood if you know exactly where they are called from the inside of the BAB_C function.

I removed some undocumented functions from PrefixStorage that you did not call (as well as the undocumented __len__). Please make sure that sage -coverage <file> does not return anything wrong before you submit a patch for review.

....

Okay, and I just noticed that you pushed a new commit on this needs_review patch. So now I will see whether it breaks all my modifications.

Today is not a good day.

Nathann


New commits:

6e39f78trac #17712: remove parameter enable_prefix_storage

comment:11 Changed 5 years ago by dcoudert

Sorry, sorry, sorry, sorry, I removed the enable prefix storage parameter to shorten the doc. Really sorry if I breaks your edits...

comment:12 Changed 5 years ago by dcoudert

If you believe it is better to add extra parameters to BAB_C (dict, max length, max number, etc.) and to remove this class, we can try.

comment:13 Changed 5 years ago by ncohen

  • Status changed from needs_review to needs_work

Hello again,

This commit does what was said above, and also rephrases the short description of the caching mechanism in the module's doc. I hope that it is clearer.

As you just removed a keyword I also wondered about the use of "is_on".

Two remarks:

1) Right now you 'store' the size of the dictionary in an integer variable. To do this, every time you add a new element in the dictionary, you check whether the element is there already. Thus, each of the +1 costs you a lookup, theoretically a log(n) operation. I have not run any timings, but I would be a bit troubled if calling len(the_dict) was not faster.

2) I still do not understand the point of the dictionary: what are "False" values useful for ? Why isn't it just a set of frozensets ?

3)

~/sage/graphs/graph_decompositions$ sage -coverage vertex_separation.pyx 
------------------------------------------------------------------------
SCORE vertex_separation.pyx: 88.9% (8 of 9)

Missing doctests:
     * line 1663: def __init__(self, int max_prefix_length=20, int max_prefix_number=10**6)
------------------------------------------------------------------------

Nathann

P.S. : About removing this PrefixStorage: let us first see what happens with this dict/set question. My intuition is that if this thing becomes a set you can already remove cost/upper_bound/is_improved from update() (and only call 'update' when appropriate) which may already simplify matters.

comment:14 Changed 5 years ago by git

  • Commit changed from 6e39f78fbdfcf22ca89ba23ac96ce88431feaa6e to 4abd2bacff0b06ae13a4fb4d046c7ecee4716d28

Branch pushed to git repo; I updated commit sha1. New commits:

4abd2batrac #17712: Reviewer's commit

comment:15 Changed 5 years ago by ncohen

  • Branch changed from public/17112 to public/17712

comment:16 follow-up: Changed 5 years ago by dcoudert

Hello Nathann,

You are right, we could use a set instead of a dictionary and add the prefix P to the set only if c(P)<\min_{L\in{\cal L}_P(V)} c(L).

However, it is apparently faster to test if a frozenset is a key of a dictionary than if it is in a set.

sage: d = dict()
sage: s = set()
sage: for i in range(1, 11):
....:     for z in Combinations(range(2*i), i):
....:         d[frozenset(z)] = True
....:         s.add(frozenset(z))
....:         
sage: len(d)
250952
sage: %timeit len(d)
10000000 loops, best of 3: 63 ns per loop
sage: %timeit len(s)
10000000 loops, best of 3: 59.9 ns per loop
sage:
sage: elt = d.keys()[randint(0, len(d)-1)]
sage: %timeit elt in d
10000000 loops, best of 3: 60.9 ns per loop
sage: %timeit elt in s
10000000 loops, best of 3: 163 ns per loop
sage:
sage: elt = d.keys()[randint(0, len(d)-1)]
sage: %timeit elt in d
10000000 loops, best of 3: 58.8 ns per loop
sage: %timeit elt in s
10000000 loops, best of 3: 160 ns per loop

What we can do is to change is_known_prefix as follows:

        if size<1 or size>self.max_prefix_length:
            return False

        cdef int i
        cdef frozenset my_prefix = frozenset([prefix[i] for i in range(size)])

        return my_prefix in self.PT

and update as:

        cdef int i
        if size>0 and size<=self.max_prefix_length \
           and not (is_improved and cost==upper_bound) \
           and len(self.PT)< self.max_prefix_number:

            my_prefix = frozenset(prefix[i] for i in range(size))
            self.PT[my_prefix] = True

Furthermore, if we do that inside BAB_C, we do only once my_prefix = frozenset(prefix[i] for i in range(size)).

If you agree, I will implement the changes.

David.

comment:17 in reply to: ↑ 16 Changed 5 years ago by ncohen

Yo !

You are right, we could use a set instead of a dictionary and add the prefix P to the set only if c(P)<\min_{L\in{\cal L}_P(V)} c(L).

Cool. This should sipmlify both code and doc.

However, it is apparently faster to test if a frozenset is a key of a dictionary than if it is in a set.

WHaaaaaaaaaaaaaaat ?.... -_-

Okay. Dict with 'True' keys. Really... -_-

I will write to sage-devel with your timings.

If you agree, I will implement the changes.

+1

Nathann

comment:18 Changed 5 years ago by git

  • Commit changed from 4abd2bacff0b06ae13a4fb4d046c7ecee4716d28 to 4cc8d7fc0550aba5fd9e03f3263ede4172ff9b73

Branch pushed to git repo; I updated commit sha1. New commits:

4cc8d7ftrac #17712: remove class PrefixStorage and simplify code

comment:19 Changed 5 years ago by dcoudert

With this modification, the code is significantly faster. Cool! I'm really happy to have this timing on my MBA, knowing that it is a hard instance for this problem.

sage: from sage.graphs.graph_decompositions import vertex_separation as VS
sage: G = graphs.MycielskiGraph(6)
sage: %timeit vs, seq = VS.vertex_separation(G)
1 loops, best of 3: 965 ms per loop

We can easily change from dict to any other relevant data structure if it is faster.

David.

comment:20 Changed 5 years ago by ncohen

HMmmmm.. Less code, more speed. Love when it happens :-P

Nathann

comment:21 Changed 5 years ago by dcoudert

Since we learned that "internally sets are implemented just as dictionaries with no values", there is no reason for using dictionary. The timing was certainly due to the order in which values were inserted in the set and in the dictionary. Moreover, we can get different results:

sage: L = []
sage: for i in range(1, 11):
....:     for z in Combinations(range(2*i), i):
....:         L.append(frozenset(z))
....:         
sage: shuffle(L)
sage: d = dict( (k, True) for k in L )
sage: s = set(L)
sage: elt = L[randint(0, len(L)-1)]
sage: %timeit elt in s
10000000 loops, best of 3: 66.8 ns per loop
sage: %timeit elt in d
10000000 loops, best of 3: 64.1 ns per loop
sage: d = dict()
sage: for i in range(1, 11):
    for z in Combinations(range(2*i), i):
        d[frozenset(z)] = True
....:
sage: %timeit elt in d
10000000 loops, best of 3: 157 ns per loop

I will change to set.

David.

comment:22 Changed 5 years ago by git

  • Commit changed from 4cc8d7fc0550aba5fd9e03f3263ede4172ff9b73 to f6a71a03edd5e21712ae76268e1cdd1a00b8b845

Branch pushed to git repo; I updated commit sha1. New commits:

f6a71a0trac #17712: use set instead of dict to store prefixes

comment:23 Changed 5 years ago by dcoudert

  • Status changed from needs_work to needs_review

Same speed and bit less doc.

comment:24 Changed 5 years ago by git

  • Commit changed from f6a71a03edd5e21712ae76268e1cdd1a00b8b845 to 8bcfa17987a8561d459703334615dec1b7edfa3d

Branch pushed to git repo; I updated commit sha1. New commits:

8bcfa17trac #17712: store prefixes generated by greedy steps

comment:25 Changed 5 years ago by dcoudert

I have added a few lines to store the prefixes generated by the greedy steps. Although it takes some time locally, it should help reducing the total number of visited branches.

comment:26 Changed 5 years ago by git

  • Commit changed from 8bcfa17987a8561d459703334615dec1b7edfa3d to 919df3da5d54ddc1531581c225d8dfe7a21b0f76

Branch pushed to git repo; I updated commit sha1. New commits:

919df3dtrac #17712: move tests on prefixes

comment:27 Changed 5 years ago by git

  • Commit changed from 919df3da5d54ddc1531581c225d8dfe7a21b0f76 to 93e9a0ab8ffae7ae695e51f8a89ca72994c3a9de

Branch pushed to git repo; I updated commit sha1. New commits:

93e9a0atrac #17712: Update doc

comment:28 Changed 5 years ago by ncohen

  • Reviewers set to Nathann Cohen

Yoooooooooo !

If you don't see anything wrong in this last commit you can set the ticket to positive_review. Thanks for you work,

Nathann

comment:29 Changed 5 years ago by dcoudert

Youhou!!!

Thank you once again Nathann for your very constructive help. The final solution is very nice.

Best, David.

PS: I have other possible inputs for this module (algorithm for trees, pre-processing for reducing the size of input, algorithm for testing vs=1 and vs=2, heuristics, etc.), but not right now.

comment:30 Changed 5 years ago by dcoudert

  • Status changed from needs_review to positive_review

comment:31 Changed 5 years ago by vbraun

  • Branch changed from public/17712 to 93e9a0ab8ffae7ae695e51f8a89ca72994c3a9de
  • Resolution set to fixed
  • Status changed from positive_review to closed
Note: See TracTickets for help on using tickets.