Opened 2 years ago
Closed 2 years ago
#18987 closed enhancement (fixed)
Parallel computation of number of solutions in dancing links
Reported by:  slabbe  Owned by:  

Priority:  major  Milestone:  sage6.9 
Component:  combinatorics  Keywords:  
Cc:  Merged in:  
Authors:  Sébastien Labbé  Reviewers:  Vincent Delecroix 
Report Upstream:  N/A  Work issues:  
Branch:  29ff986 (Commits)  Commit:  29ff98625316f95c48fd704115a209dce38deda8 
Dependencies:  Stopgaps: 
Description (last modified by )
The following computation takes a lot of time:
sage: from sage.games.quantumino import QuantuminoSolver sage: QuantuminoSolver(0).number_of_solutions() # long time (several days)
but we can make it faster by doing the computation in parallel... This ticket does this (directly in the dancing links code).
Change History (69)
comment:1 Changed 2 years ago by
 Branch set to u/slabbe/18987
 Commit set to a86b42d048567484cf30c7a6f7838704615083e3
comment:2 Changed 2 years ago by
 Commit changed from a86b42d048567484cf30c7a6f7838704615083e3 to 6fd4a8759641415b121b4a12aa6ecb711490b533
comment:3 Changed 2 years ago by
 Description modified (diff)
comment:4 Changed 2 years ago by
 Branch u/slabbe/18987 deleted
 Commit 6fd4a8759641415b121b4a12aa6ecb711490b533 deleted
comment:5 Changed 2 years ago by
 Branch set to u/slabbe/18987
 Commit set to 3c11961ebae4953711c490f7ac57679b99c59b52
 Status changed from new to needs_review
comment:6 Changed 2 years ago by
 Status changed from needs_review to needs_work
Still other stuff to improve...
comment:7 Changed 2 years ago by
 Commit changed from 3c11961ebae4953711c490f7ac57679b99c59b52 to b370fd04a3efb7393d5516936958c552934faa7f
comment:8 Changed 2 years ago by
 Status changed from needs_work to needs_review
Ok, now needs reviews. I won't rebase my branch anymore.
comment:9 followups: ↓ 10 ↓ 12 ↓ 14 ↓ 19 ↓ 26 Changed 2 years ago by
 Reviewers set to Vincent Delecroix
 Status changed from needs_review to needs_work
Salut Sebastien,
Do you mind if I rebase over 6.9.beta1? Note that I also have a waiting commit that does some tiny optimization to dancing_links.pyx
.
I am not sure the overall strategy is the good one. If parallelization is needed I guess that it should better be implemented at the level of dancing links. Googling "parallelization dancing links" already gives a lot of things.
Less importantly:
 I do not understand the name of the function
orthogonal_transformation
. Are these the orthogonal transformations ofR^3
with integer coordinates? If so, please write more precise specifications.
 As far as I see, you do not test all cases of the function
orthogonal_transformation
.
 What is the
modpi
arguments. What is a rotation of anglepi
for you? Is it a linear transformation that is a pirotation restricted on a plane and leaves invariant the orthogonal complement? (I guess it should also have integer coordinates) If this is, then it is of course not a group... but of course you might consider the group generated by these.
Vincent
comment:10 in reply to: ↑ 9 ; followup: ↓ 11 Changed 2 years ago by
Replying to vdelecroix:
Salut Sebastien,
Do you mind if I rebase over 6.9.beta1?
Merge or rebase? I prefer if you merge. Or I can rebase on 6.9.beta1 to keep the authorship (right?).
Note that I also have a waiting commit that does some tiny optimization to
dancing_links.pyx
.
Where?
I am not sure the overall strategy is the good one. If parallelization is needed I guess that it should better be implemented at the level of dancing links. Googling "parallelization dancing links" already gives a lot of things.
Indeed, I am using a parallelization strategy that applies to a tiling problem where each piece is used only once. This strategy obviously does not apply to the general problem that is the Exact cover problem.
Also, I prefered to cut the (tiling) problem into subproblems that takes at most 23 hours each so that I can more easily follow the process of the computation and stop and restart the computation more easily. Even with a parallel implementation of dancing links, the computation would take days to finish.
comment:11 in reply to: ↑ 10 ; followup: ↓ 13 Changed 2 years ago by
Replying to slabbe:
Replying to vdelecroix:
Salut Sebastien,
Do you mind if I rebase over 6.9.beta1?
Merge or rebase? I prefer if you merge. Or I can rebase on 6.9.beta1 to keep the authorship (right?).
The autorship of what? If I do a rebase, you keep the autorship of commits. But of course it will be my branch (or a public one).
I do prefer rebase over merge because:
 history is much cleaner afterwards (fewer commits that follow each other)
 you can hide whatever you want in a merge commit
Note that I also have a waiting commit that does some tiny optimization to
dancing_links.pyx
.Where?
On my computer ;) It is waiting for the rebase or merge.
I am not sure the overall strategy is the good one. If parallelization is needed I guess that it should better be implemented at the level of dancing links. Googling "parallelization dancing links" already gives a lot of things.
Indeed, I am using a parallelization strategy that applies to a tiling problem where each piece is used only once. This strategy obviously does not apply to the general problem that is the Exact cover problem.
In the exact cover problem, each subset is used at most once. In your tiling formulation a piece count for several subsets. You can naively apply the same thing for dancing links: look at one position i0
and consider the set S0
of subsets that cover it. For each subset in S0
, launch a thread where you only run through the subproblem that consists of having fixed this subset.
But I guess that there exists less naive parallelization.
Also, I prefered to cut the (tiling) problem into subproblems that takes at most 23 hours each so that I can more easily follow the process of the computation and stop and restart the computation more easily. Even with a parallel implementation of dancing links, the computation would take days to finish.
Why? If your parallization ends with a time better than total_time / nb_cpus
then you should parallelize more often ;)
Vincent
comment:12 in reply to: ↑ 9 Changed 2 years ago by
Less importantly:
 I do not understand the name of the function
orthogonal_transformation
.
I chose that name 3 years ago. I agree it was not the best function name.
Your comments are very good. The reason that things are not clearly written is that it is not clear enough in my head. Let me sleep this night and I will come back on this tomorrow or later this week.
When modpi=True
, I (think! I) want to quotient the result by the group generated by the diagonal matrices of 1's and 1's with exactly two 1 on the diagonal. Only, when the dimension is 3, I was able to generate representative of the classes easily.
comment:13 in reply to: ↑ 11 ; followup: ↓ 15 Changed 2 years ago by
Why? If your parallization ends with a time better than
total_time / nb_cpus
then you should parallelize more often ;)
I have 240 subproblems each of them taking between 20 minutes and 10 hours of computation. But my machine at work only have 4 cores. So one way or the other, the computation takes days to finish since I do not have access to a super machine.
comment:14 in reply to: ↑ 9 Changed 2 years ago by
I am not sure the overall strategy is the good one. If parallelization is needed I guess that it should better be implemented at the level of dancing links. Googling "parallelization dancing links" already gives a lot of things.
If you agree, I suggest to move the discussion of "parallelization dancing links" in another ticket for which I am willing to be a reviewer.
comment:15 in reply to: ↑ 13 ; followup: ↓ 18 Changed 2 years ago by
Replying to slabbe:
Why? If your parallization ends with a time better than
total_time / nb_cpus
then you should parallelize more often ;)I have 240 subproblems each of them taking between 20 minutes and 10 hours of computation. But my machine at work only have 4 cores. So one way or the other, the computation takes days to finish since I do not have access to a super machine.
This looks very bad. At the end, you might end up with only one core working on the biggest subinstance. And it can lasts several days even with 200 cores. Ideally, you should slice the problem in such way that each subinstance will not take longer than 1 hour (let say). This is why adopting a less naive strategy at the level of dancing links seems to me the best option since people already worked on it.
There is no super computer in Liege? I can set an invitation to use the very powerful Plafrim in Bordeaux. But it is some work to learn how to use it (e.g. you need to tell in advance for how long you request the processors).
Vincent
comment:16 Changed 2 years ago by
 Commit changed from b370fd04a3efb7393d5516936958c552934faa7f to cbbc26242b1883f8a51065ccd7e8e97e6154025b
comment:17 Changed 2 years ago by
Quickly reading [1] (which looks serious but not very deep) it seems that it is hard to find subinstance of the problems that are well scaled for a given cluster.
[1] S. M. Ashraful Kadir, "A Parallel Programming Approach to Solve the Exact Cover Problem"
comment:18 in reply to: ↑ 15 Changed 2 years ago by
This looks very bad. At the end, you might end up with only one core working on the biggest subinstance. And it can lasts several days even with 200 cores. Ideally, you should slice the problem in such way that each subinstance will not take longer than 1 hour (let say).
It is not very bad as most of the 240 computations takes the same amount of time (about 2 to 3 hours). Maybe 5 of time takes more (10 hours). So I am using the four cores at least 95% of the time. In my case, I have very good subinstances to reuse your term.
There is no super computer in Liege?
Well maybe there is something. It is the first time in Liège that I need computation power, but it is the vacation now.
I can set an invitation to use the very powerful Plafrim in Bordeaux. But it is some work to learn how to use it (e.g. you need to tell in advance for how long you request the processors).
That would be nice!
comment:19 in reply to: ↑ 9 Changed 2 years ago by
Do you mind if I rebase over 6.9.beta1?
I just did that (in case you did not notice).
comment:20 Changed 2 years ago by
 Branch changed from u/slabbe/18987 to public/18987
 Commit changed from cbbc26242b1883f8a51065ccd7e8e97e6154025b to 9d8d0a142b58c2ab0c3ab2e18eef093295c0c54e
New commits:
9d8d0a1  doc + optim in dancing_links

comment:21 Changed 2 years ago by
I found the organization of the dancing links code very confusing. Do you know why there are both a file dancing_links.pyx
(that wraps the C++ class as a Cython class) and a file dlxcpp.py
(that uses the Cython class in a functional style)? And there is also a combinat/dlx.py
!! I guess this is mostly historical but it needs a serious cleanup.
comment:22 Changed 2 years ago by
Even just inside dancing_links.pyx
, the class is called a Wrapper for the C++ code and dlx_solver
is a function that calls the constructor of the wrapper...
Also I would not put this in combnat/matrices
just because the exact cover problem can be represented as a 01 matrix...
comment:23 Changed 2 years ago by
 Commit changed from 9d8d0a142b58c2ab0c3ab2e18eef093295c0c54e to 91de8636683e1c8cec7d9d33e645721572de3b1c
Branch pushed to git repo; I updated commit sha1. New commits:
91de863  rename orthogonal_transformation function

comment:24 Changed 2 years ago by
Is there a way to quotient two groups in Sage, maybe using gap or something ?
sage: L = [w.matrix() for w in WeylGroup(['B',3]) if w.matrix().det()==1] sage: G = MatrixGroup(L) sage: H = MatrixGroup(L[:4]) sage: len(G) 24 sage: len(H) 4 sage: H Matrix group over Rational Field with 4 generators ( [1 0 0] [ 1 0 0] [1 0 0] [1 0 0] [0 1 0] [ 0 1 0] [ 0 1 0] [ 0 1 0] [0 0 1], [ 0 0 1], [ 0 0 1], [ 0 0 1] ) sage: G.quotient(H) Traceback (most recent call last): ... NotImplementedError:
comment:25 Changed 2 years ago by
 Commit changed from 91de8636683e1c8cec7d9d33e645721572de3b1c to 0c752d4038a7419285a1c1fa9b0c21842b593a2e
Branch pushed to git repo; I updated commit sha1. New commits:
0c752d4  Improve explanation of the modpi argument

comment:26 in reply to: ↑ 9 ; followup: ↓ 27 Changed 2 years ago by
 Status changed from needs_work to needs_review
 What is the
modpi
arguments. What is a rotation of anglepi
for you? Is it a linear transformation that is a pirotation restricted on a plane and leaves invariant the orthogonal complement? (I guess it should also have integer coordinates) If this is, then it is of course not a group... but of course you might consider the group generated by these.
Ok, so tell me if it is better now.
comment:27 in reply to: ↑ 26 ; followup: ↓ 29 Changed 2 years ago by
Replying to slabbe:
 What is the
modpi
arguments. What is a rotation of anglepi
for you? Is it a linear transformation that is a pirotation restricted on a plane and leaves invariant the orthogonal complement? (I guess it should also have integer coordinates) If this is, then it is of course not a group... but of course you might consider the group generated by these.Ok, so tell me if it is better now.
A bit. What do you mean by the rectangular parallelepiped
? There is only one? As far as I understand it is the isometry group that preserves any rectangular parallelepiped. You can also say differently: the group of orientable isometry that preserve (globally) each axis. Is that right?
And as before, this is the group *generated* by these rotation and not only thes rotations themselves (except in dim 2 and 3).
Isn't this group exactly the subgroup of matrices with an even number of 1
on the diagonal? So the quotient is just the signed permutation matrices with either 0
or 1
coefficient 1
(all others being 1
). Isn't it?
When orientation_preserving
is False
I guess this group is the subgroup of matrices with any number of 1
on the diagonal. In that case, the quotient would just be the permutation matrices. No?
comment:28 Changed 2 years ago by
And when you say that is by rotations of angle pi
this is confusing since there are infinitely many of them. You can say by rotations of angle pi on the plane generated by two axes
.
comment:29 in reply to: ↑ 27 ; followups: ↓ 30 ↓ 31 Changed 2 years ago by
A bit. What do you mean by
the rectangular parallelepiped
? There is only one?
Of course the length of the side can change, so there is more than one. But, of course, I consider the case where the length of the sides are all distinct.
As far as I understand it is the isometry group that preserves any rectangular parallelepiped.
You understand well.
You can also say differently: the group of orientable isometry that preserve (globally) each axis. Is that right?
Yes.
And as before, this is the group *generated* by these rotation and not only thes rotations themselves (except in dim 2 and 3).
Of course. "modpi" is in the sense of modulo a group.
Isn't this group exactly the subgroup of matrices with an even number of
1
on the diagonal?
Yes.
So the quotient is just the signed permutation matrices with either
0
or1
coefficient1
(all others being1
). Isn't it?
Maybe. When the dimension is odd, the quotient is the signed permutation matrices where the signs is either all positive or all negative with determinant 1. That is the formula that I was using.
When
orientation_preserving
isFalse
I guess this group is the subgroup of matrices with any number of1
on the diagonal. In that case, the quotient would just be the permutation matrices. No?
Yes! You are right.
comment:30 in reply to: ↑ 29 Changed 2 years ago by
So the quotient is just the signed permutation matrices with either
0
or1
coefficient1
(all others being1
). Isn't it?
When orientation_preserving=True
, the determinant of every returned matrix must be one. Therefore, I believe the quotient is the positive permutation matrices of determinant one + the other permutation matrices where one 1 is replaced by 1.
comment:31 in reply to: ↑ 29 ; followup: ↓ 33 Changed 2 years ago by
So the quotient is just the signed permutation matrices with either
0
or1
coefficient1
(all others being1
). Isn't it?Maybe. When the dimension is odd, the quotient is the signed permutation matrices where the signs is either all positive or all negative with determinant 1. That is the formula that I was using.
Ok, now I understand why I needed it that way. I need wellchosen representatives for the quotient. By this, I mean that the chosen representatives of the cosets form a group itself.
For example, these representatives do not form a group:
sage: n = 3 sage: c = identity_matrix(n) sage: c[0,0] = 1 sage: L = [w.matrix() for w in WeylGroup(['A', n1])] sage: L = [(w if w.det() == 1 else c*w) for w in L] [ [1 0 0] [ 0 0 1] [0 0 1] [ 0 1 0] [0 1 0] [1 0 0] [0 1 0] [ 0 1 0] [1 0 0] [ 1 0 0] [0 0 1] [ 0 0 1] [0 0 1], [ 1 0 0], [0 1 0], [ 0 0 1], [1 0 0], [ 0 1 0] ] sage: MatrixGroup(L).cardinality() 24
But these representatives forms a group:
sage: L = [w.matrix() for w in WeylGroup(['A', n1])] sage: L = [m.det() * m for m in L] sage: L [ [1 0 0] [ 0 0 1] [0 0 1] [ 0 1 0] [0 1 0] [1 0 0] [0 1 0] [ 0 1 0] [1 0 0] [1 0 0] [0 0 1] [ 0 0 1] [0 0 1], [1 0 0], [0 1 0], [ 0 0 1], [1 0 0], [ 0 1 0] ] sage: MatrixGroup(L).cardinality() 6
And I still don't know how to construct the quotient when n is even.
comment:32 Changed 2 years ago by
Okay, so it is a chance that it works for when n is odd because in general:
http://groupprops.subwiki.org/wiki/Quotient_group_need_not_be_isomorphic_to_any_subgroup
comment:33 in reply to: ↑ 31 Changed 2 years ago by
Ok, now I understand why I needed it that way.
Wait. I really need the quotient itself finally. I was just lucky that the transformation keeping some the pentamino invariant was in the group.
sage: from sage.combinat.tiling import ncube_isometry_group sage: from sage.games.quantumino import pentaminos sage: L = ncube_isometry_group(3) sage: f = lambda p : [m for m in L[1:] if (m*p).canonical() == p.canonical()] sage: [(i, f(p)) for i,p in enumerate(pentaminos) if f(p)] [(6, [ [ 0 0 1] [ 0 1 0] [1 0 0] ]), (7, [ [ 0 0 1] [ 0 1 0] [ 1 0 0] ]), (12, [ [1 0 0] [ 0 0 1] [ 0 1 0] ]), (13, [ [ 0 0 1] [ 0 1 0] [1 0 0] ]), (16, [ [ 0 0 1] [ 0 1 0] [1 0 0] ])]
Above, I get a problem with pentamino number 7 because it is invariant under a transformation that is not in the subgroup isomorphic to the quotient. So I really need to consider the quotient with all of the elements in each coset. Chosing a representative won't work even if it is well chosen.
Give me more time. I'll update my branch.
comment:34 Changed 2 years ago by
 Commit changed from 0c752d4038a7419285a1c1fa9b0c21842b593a2e to dc3797bd98509562635b7ef940c7cbd48fc71e31
comment:35 Changed 2 years ago by
Reneeds review.
comment:36 Changed 2 years ago by
Note that the patchbot was not able to build the doc.
comment:37 Changed 2 years ago by
 Commit changed from dc3797bd98509562635b7ef940c7cbd48fc71e31 to 1e01ba27547f9a7f8ec2e35418f95b00e2ffb405
Branch pushed to git repo; I updated commit sha1. New commits:
ca84fcf  Trac #18987: Parallel computation of the nb of solutions for tilings

3d4402b  Trac #18987: Tilings of polyominos modulo 180 degrees rotations

65f6512  Trac #18987: Add a transparent gray box to QuantuminoState.show3d

555f0a9  Trac #18987: doc + optim in dancing_links

481d96e  Trac #18987: rename orthogonal_transformation function

5f66e1d  Trac #18987: Improve explanation of the modpi argument

d359115  Trac #18987: Using cosets for when modpi=True

1e01ba2  Trac #18987: Change pentaminos to there canonical form

comment:38 Changed 2 years ago by
There was a TESTS::
with text following. Fixed that (and squashed it the the last commit). Also added the prefix Trac #18987:
to all commit messages.
comment:39 Changed 2 years ago by
 Status changed from needs_review to needs_work
There is one failing doctest on the patchbot that is not failing on my machine...
comment:40 Changed 2 years ago by
 Commit changed from 1e01ba27547f9a7f8ec2e35418f95b00e2ffb405 to 2133ecdcffb1b993049874c8be6cad2b1224a73a
Branch pushed to git repo; I updated commit sha1. New commits:
2133ecd  Trac #18987: fixing intermittent doctest failure

comment:41 Changed 2 years ago by
 Status changed from needs_work to needs_review
comment:42 followups: ↓ 44 ↓ 45 Changed 2 years ago by
Are you sure ncube_isometry_group
is worth a @cached_function
?
In ncube_isometry_group_modpi
why are you building MatrixGroup
?
In
P_coset = set(frozenset((m.matrix() * self).canonical() for m in coset) for coset in L) return set(next(iter(s)) for s in P_coset)
You are building a lot of images to consider matrix x polyomino
to use just one at the end. Why not
return set((L[0].matrix() * self).canonical() for coset in L)
And if you were not using matrix groups you can even do
return set((L[0] * self).canonical() for coset in L)
comment:43 Changed 2 years ago by
You are using L
instead of coset
I think. Can you edit your previous comment to make sure I understand what you mean?
comment:44 in reply to: ↑ 42 ; followup: ↓ 46 Changed 2 years ago by
Replying to vdelecroix:
Are you sure
ncube_isometry_group
is worth a@cached_function
?
Calling this function takes 2.8s on my machine. And it is called once for each polyomino. That is 16 times for the Quantumino puzzle. With the cache, I gain about 40s to construct the rows to give to the dlx solver.
In
ncube_isometry_group_modpi
why are you buildingMatrixGroup
?
Because otherwise, this
G = ncube_isometry_group(n, orientation_preserving) H = [h for h in G if all(i==j for (i,j) in h.nonzero_positions())] left_cosets = set(tuple(sorted(h*g for h in H)) for g in G)
throws a TypeError: mutable matrices are unhashable
and I find it more fun to read like this instead of the .set_immutable()
on every matrices.
comment:45 in reply to: ↑ 42 Changed 2 years ago by
You are building a lot of images to consider
matrix x polyomino
to use just one at the end. Why not
I know but I need all of them. That is what I understood yesterday. Let me try to explain. In general it is okay to take only one matrix in the coset. The problem comes when the polyomino is invariant under some of the 24 orientation preserving isometries of the cube. For example, consider:
sage: from sage.games.quantumino import pentaminos sage: from sage.combinat.tiling import ncube_isometry_group sage: p = pentaminos[7] sage: m = ncube_isometry_group(3)[3] sage: p Polyomino: [(0, 0, 0), (0, 1, 0), (0, 2, 0), (0, 2, 1), (1, 0, 0)], Color: orange sage: m [ 0 0 1] [ 0 1 0] [ 1 0 0] sage: (m*p).canonical() == p True
The polyomino p
has 12 distinct rotation images instead of 24. And among the 12 ways of placing that polyomino into a box, there are 3 distinct ways up to rotation of the box keeping the box invariant. To compute this, we need to consider the whole coset:
sage: cosets = ncube_isometry_group_modpi(3) sage: P_coset = set(frozenset((m.matrix() * p).canonical() for m in coset) for coset in cosets) sage: len(P_coset) 3 sage: len(set(next(iter(s)) for s in P_coset)) 3
Otherwise, you obtain too many polyominos (see below).
sage: set((coset[0].matrix() * p).canonical() for coset in cosets) {Polyomino: [(0, 0, 1), (1, 0, 1), (1, 1, 1), (1, 2, 0), (1, 2, 1)], Color: orange, Polyomino: [(0, 1, 2), (1, 0, 0), (1, 1, 0), (1, 1, 1), (1, 1, 2)], Color: orange, Polyomino: [(0, 0, 0), (0, 1, 0), (1, 1, 0), (2, 1, 0), (2, 1, 1)], Color: orange, Polyomino: [(0, 1, 0), (0, 1, 1), (1, 1, 1), (2, 0, 1), (2, 1, 1)], Color: orange, Polyomino: [(0, 2, 0), (1, 0, 0), (1, 0, 1), (1, 1, 0), (1, 2, 0)], Color: orange}
comment:46 in reply to: ↑ 44 ; followups: ↓ 48 ↓ 53 Changed 2 years ago by
Replying to slabbe:
Replying to vdelecroix:
Are you sure
ncube_isometry_group
is worth a@cached_function
?Calling this function takes 2.8s on my machine. And it is called once for each polyomino. That is 16 times for the Quantumino puzzle. With the cache, I gain about 40s to construct the rows to give to the dlx solver.
Perhaps the way it is implemented is wrong. For example
sage: %timeit P = [s.matrix() for s in SymmetricGroup(4)] 1 loops, best of 3: 1.27 ms per loop
and with the signs
sage: from itertools import product sage: %timeit M = [diagonal_matrix(p) for p in product([1,1], repeat=4)] 1 loops, best of 3: 2.9 ms per loop sage: M = [diagonal_matrix(p) for p in product([1,1], repeat=4)] sage: %timeit S = [m*s.matrix() for m in M for s in SymmetricGroup(4)] 100 loops, best of 3: 18.3 ms per loop
So it is likely to be ~20ms
.
comment:47 Changed 2 years ago by
 Commit changed from 2133ecdcffb1b993049874c8be6cad2b1224a73a to 0d68ecec28cb92d9e78a92d6e733d42b3e8941c1
Branch pushed to git repo; I updated commit sha1. New commits:
0d68ece  Trac #18987: removed a cached_function

comment:48 in reply to: ↑ 46 ; followup: ↓ 49 Changed 2 years ago by
Replying to vdelecroix:
Are you sure
ncube_isometry_group
is worth a@cached_function
?
In fact, you make me realize that there is already caching involved in WeylGroup. Therefore caching that function is not so necessary. Without caching, I get:
sage: from sage.combinat.tiling import ncube_isometry_group sage: time L = ncube_isometry_group(4) CPU times: user 1.14 s, sys: 19.7 ms, total: 1.16 s Wall time: 1.3 s sage: time L = ncube_isometry_group(4) CPU times: user 358 ms, sys: 4.01 ms, total: 362 ms Wall time: 448 ms
sage: from itertools import product sage: %timeit M = [diagonal_matrix(p) for p in product([1,1], repeat=4)] 1 loops, best of 3: 2.9 ms per loop sage: M = [diagonal_matrix(p) for p in product([1,1], repeat=4)] sage: %timeit S = [m*s.matrix() for m in M for s in SymmetricGroup(4)] 100 loops, best of 3: 18.3 ms per loopSo it is likely to be
~20ms
.
Note that you can't use timeit above since there is caching involved in SymmetricGroup.
For comparison, your solution on my machine gives the following timings:
sage: from itertools import product sage: # first call sage: time M = [diagonal_matrix(p) for p in product([1,1], repeat=4)] CPU times: user 11.4 ms, sys: 92 µs, total: 11.5 ms Wall time: 11.9 ms sage: time S = [m*s.matrix() for m in M for s in SymmetricGroup(4)] CPU times: user 191 ms, sys: 25.2 ms, total: 216 ms Wall time: 667 ms sage: # second call sage: time M = [diagonal_matrix(p) for p in product([1,1], repeat=4)] CPU times: user 11.7 ms, sys: 1.34 ms, total: 13 ms Wall time: 16.6 ms sage: time S = [m*s.matrix() for m in M for s in SymmetricGroup(4)] CPU times: user 108 ms, sys: 1.92 ms, total: 110 ms Wall time: 114 ms
So it seems using WeylGroup(['B',4])
is about 4 times slower than your solution. But I don't know if I will change it. I mean that improvement could be done directly in WeylGroup
code...
comment:49 in reply to: ↑ 48 ; followup: ↓ 50 Changed 2 years ago by
Replying to slabbe:
sage: from itertools import product sage: %timeit M = [diagonal_matrix(p) for p in product([1,1], repeat=4)] 1 loops, best of 3: 2.9 ms per loop sage: M = [diagonal_matrix(p) for p in product([1,1], repeat=4)] sage: %timeit S = [m*s.matrix() for m in M for s in SymmetricGroup(4)] 100 loops, best of 3: 18.3 ms per loopSo it is likely to be
~20ms
.Note that you can't use timeit above since there is caching involved in SymmetricGroup.
Where?! Beyond the construction of the group (SymmetricGroup(4) is SymmetricGroup(4)
gives True
) nothing is cached.
comment:50 in reply to: ↑ 49 Changed 2 years ago by
Where?! Beyond the construction of the group (
SymmetricGroup(4) is SymmetricGroup(4)
givesTrue
) nothing is cached.
Don't you get that the first execution of
sage: time S = [m*s.matrix() for m in M for s in SymmetricGroup(4)]
is slower than the second? And that the second takes about the same time than the third, the fourth, etc. ? Like me:
sage: from itertools import product sage: M = [diagonal_matrix(p) for p in product([1,1], repeat=4)] sage: time S = [m*s.matrix() for m in M for s in SymmetricGroup(4)] CPU times: user 184 ms, sys: 18 ms, total: 202 ms Wall time: 710 ms sage: time S = [m*s.matrix() for m in M for s in SymmetricGroup(4)] CPU times: user 109 ms, sys: 2.05 ms, total: 111 ms Wall time: 166 ms sage: time S = [m*s.matrix() for m in M for s in SymmetricGroup(4)] CPU times: user 109 ms, sys: 2.6 ms, total: 111 ms Wall time: 113 ms sage: time S = [m*s.matrix() for m in M for s in SymmetricGroup(4)] CPU times: user 110 ms, sys: 2.12 ms, total: 112 ms Wall time: 169 ms sage: time S = [m*s.matrix() for m in M for s in SymmetricGroup(4)] CPU times: user 108 ms, sys: 2.44 ms, total: 111 ms Wall time: 114 ms
comment:51 Changed 2 years ago by
Yes for the timing. But this does not implies that something is cached. Actually, the reason is because gap is launched (I do not know why)
sage: S = SymmetricGroup(4) sage: time gap(3) CPU times: user 16 ms, sys: 12 ms, total: 28 ms Wall time: 295 ms 3 sage: from itertools import product sage: M = [diagonal_matrix(p) for p in product([1,1], repeat=4)] sage: time S = [m*s.matrix() for m in M for s in SymmetricGroup(4)] CPU times: user 56 ms, sys: 0 ns, total: 56 ms Wall time: 56.9 ms sage: time S = [m*s.matrix() for m in M for s in SymmetricGroup(4)] CPU times: user 56 ms, sys: 0 ns, total: 56 ms Wall time: 53 ms
comment:52 Changed 2 years ago by
caching ncube_isometry_group
gives:
sage: from sage.games.quantumino import QuantuminoSolver sage: q = QuantuminoSolver(0) sage: t = q.tiling_solver() sage: time rows = t.rows() CPU times: user 18.8 s, sys: 168 ms, total: 19 s Wall time: 19 s
not caching ncube_isometry_group
gives:
sage: from sage.games.quantumino import QuantuminoSolver sage: q = QuantuminoSolver(0) sage: t = q.tiling_solver() sage: time rows = t.rows() CPU times: user 19.4 s, sys: 288 ms, total: 19.7 s Wall time: 20 s
So I confirm that I don't lose relatively much time by not caching.
comment:53 in reply to: ↑ 46 ; followup: ↓ 54 Changed 2 years ago by
Replying to vdelecroix:
Perhaps the way it is implemented is wrong. For example
sage: %timeit P = [s.matrix() for s in SymmetricGroup(4)] 1 loops, best of 3: 1.27 ms per loop
I removed the cached_method
. Do you want me to replace the code based on WeylGroup(['B',n])
by the SymmetricGroup
+ diagonal_matrix of signs
code you propose? I think that this improvement should be done in WeylGroup
in another ticket.
Also, for timing improvements in that file tiling.py
, there is a another more important thing to spend time on because a lot of time is spent creating vectors from tuple (because coordinates are stored as tuple). This should be done in another ticket.
Needs review!
comment:54 in reply to: ↑ 53 ; followup: ↓ 56 Changed 2 years ago by
Replying to slabbe:
Replying to vdelecroix:
Perhaps the way it is implemented is wrong. For example
sage: %timeit P = [s.matrix() for s in SymmetricGroup(4)] 1 loops, best of 3: 1.27 ms per loopI removed the
cached_method
. Do you want me to replace the code based onWeylGroup(['B',n])
by theSymmetricGroup
+diagonal_matrix of signs
code you propose? I think that this improvement should be done inWeylGroup
in another ticket.
Nope. I also the think that the WeylGroup
code has to be improved.
Also, for timing improvements in that file
tiling.py
, there is a another more important thing to spend time on because a lot of time is spent creating vectors from tuple (because coordinates are stored as tuple). This should be done in another ticket.
Did you notice that
sage: t = map(vector, t)
is much slower than
sage: V = FreeModule(ZZ,12) sage: t = map(V,t)
(I measure a factor x8 in __sub__
and __add__
for example)
Moreover, everything would be faster if you would store integer vectors instead of tuples (and an attribute self._free_module
). Why aren't you doing that?
comment:55 followup: ↓ 57 Changed 2 years ago by
I still do not understand why you are parallelizing the polyomino solver code and not the DLX one. Isn't your strategy exactly equivalent to this one: pick a column, for each row that does have a 1 in this column remove all the columns occuppied by the piece and launch an independent process?
comment:56 in reply to: ↑ 54 ; followup: ↓ 58 Changed 2 years ago by
Moreover, everything would be faster if you would store integer vectors instead of tuples (and an attribute
self._free_module
). Why aren't you doing that?
 Because I think I wanted to use the most basic hashable container (tuple) for the need. I realized only recently that it was slow because there are many matrix operations and additions involved. So indeed, storing vectors would be better.
 Because it is not the purpose of this ticket.
comment:57 in reply to: ↑ 55 Changed 2 years ago by
 Status changed from needs_review to needs_work
Replying to vdelecroix:
I still do not understand why you are parallelizing the polyomino solver code and not the DLX one. Isn't your strategy exactly equivalent to this one: pick a column, for each row that does have a 1 in this column remove all the columns occuppied by the piece and launch an independent process?
Okay, I will work on a new commit for that.
comment:58 in reply to: ↑ 56 Changed 2 years ago by
Replying to slabbe:
Moreover, everything would be faster if you would store integer vectors instead of tuples (and an attribute
self._free_module
). Why aren't you doing that?
 Because I think I wanted to use the most basic hashable container (tuple) for the need. I realized only recently that it was slow because there are many matrix operations and additions involved. So indeed, storing vectors would be better.
 Because it is not the purpose of this ticket.
I opened #19036
comment:59 Changed 2 years ago by
 Commit changed from 0d68ecec28cb92d9e78a92d6e733d42b3e8941c1 to df655874d79c75f3d5b18adda01cedb620a1856b
Branch pushed to git repo; I updated commit sha1. New commits:
df65587  Trac #18987: Moved the paralel computation of nb of sol in dancing links module

comment:60 Changed 2 years ago by
 Status changed from needs_work to needs_review
(I will not have access to a machine with Sage for the next 7 days).
comment:61 Changed 2 years ago by
Ok, Vincent. I am now back and available to do the follow up on this ticket.
With the new proposal you made, this ticket now consist of two independant things. Do you prefer me to split the code into two tickets (easier to review) or is it ok like this?
Sébastien
comment:62 Changed 2 years ago by
 Description modified (diff)
 Summary changed from Parallel computation for TilingSolver.number_of_solutions to Parallel computation of number of solutions in dancing links
I finally decided to split this ticket into two to ease the review. The second part is now available at #19107.
comment:63 Changed 2 years ago by
 Description modified (diff)
comment:64 Changed 2 years ago by
 Commit changed from df655874d79c75f3d5b18adda01cedb620a1856b to c01612691e55baf745fc710fc8c8dfc97b885375
comment:65 followup: ↓ 67 Changed 2 years ago by
 Status changed from needs_review to needs_work
(solving) independant (cases)
>independent
 I do not understand this comment
If possible, a good choice of column gives a partition of solutions where each part has about the same number of solutions.
 could you make a complete sentence for the
OUTPUT
section instead ofdict, row number > list of rows
 this is not quite accurate
After the split each subproblem has the same number of columns and rows and the same solutions as above::
it is the union of solutions of the subproblems that form the solution of the initial one.  I do not see the point of having
number_of_solutions_iterator
andnumber_of_solutions
. You should just get rid of the first one and allow parallelization in the second.  Why not also make available parallelization for getting the list of solutions?
comment:66 Changed 2 years ago by
 Commit changed from c01612691e55baf745fc710fc8c8dfc97b885375 to 29ff98625316f95c48fd704115a209dce38deda8
Branch pushed to git repo; I updated commit sha1. New commits:
29ff986  Trac #18987: Fixed reviewer comment 65

comment:67 in reply to: ↑ 65 Changed 2 years ago by
 Status changed from needs_work to needs_review
Thanks for your review! I fixed the first comments.
 I do not see the point of having
number_of_solutions_iterator
andnumber_of_solutions
. You should just get rid of the first one and allow parallelization in the second.
number_of_solutions_iterator
is now_number_of_solutions_iterator
number_of_solutions
now allows parallel computation I kept
_number_of_solutions_iterator
because for the problem I am currently looking at,number_of_solutions
takes days while_number_of_solutions_iterator
yield something once every hours. So it allows me to follow the computation, making sure it is not stuck and evaluate the duration left to do. I prefer this way rather than removing method_number_of_solutions_iterator
and adding some verbose thing innumber_of_solutions
.
 Why not also make available parallelization for getting the list of solutions?
Because I don't know how or if it is possible to use the parallel decorator to merge iterators.
comment:68 Changed 2 years ago by
 Status changed from needs_review to positive_review
Salut Sebastien,
Thanks for your patience. I am sure that the choice of splitting is suboptimal but at least the design is much cleaner than in the first version.
Vincent
comment:69 Changed 2 years ago by
 Branch changed from public/18987 to 29ff98625316f95c48fd704115a209dce38deda8
 Resolution set to fixed
 Status changed from positive_review to closed
Preliminary version. Not ready for review.
New commits:
Parallel computation of the nb of solutions for tilings
Tilings polyominos modulo 180 degrees rotations