Opened 2 years ago

Closed 2 years ago

#18987 closed enhancement (fixed)

Parallel computation of number of solutions in dancing links

Reported by: slabbe Owned by:
Priority: major Milestone: sage-6.9
Component: combinatorics Keywords:
Cc: Merged in:
Authors: Sébastien Labbé Reviewers: Vincent Delecroix
Report Upstream: N/A Work issues:
Branch: 29ff986 (Commits) Commit: 29ff98625316f95c48fd704115a209dce38deda8
Dependencies: Stopgaps:

Description (last modified by slabbe)

The following computation takes a lot of time:

sage: from sage.games.quantumino import QuantuminoSolver
sage: QuantuminoSolver(0).number_of_solutions()  # long time (several days)

but we can make it faster by doing the computation in parallel... This ticket does this (directly in the dancing links code).
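
As a rough sketch of the intended usage once the branch is merged (the ncpus keyword is an assumption here, not a confirmed part of the final interface):

sage: from sage.combinat.matrices.dancing_links import dlx_solver
sage: rows = [[0, 1, 2], [3, 4, 5], [0, 1], [2, 3, 4, 5], [0], [1, 2, 3, 4, 5]]
sage: dlx_solver(rows).number_of_solutions(ncpus=2)  # hypothetical call: split the search, sum the partial counts
3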

Change History (69)

comment:1 Changed 2 years ago by slabbe

  • Branch set to u/slabbe/18987
  • Commit set to a86b42d048567484cf30c7a6f7838704615083e3

Preliminary version. Not ready for review.


New commits:

ee430a4  Parallel computation of the nb of solutions for tilings
a86b42d  Tilings polyominos modulo 180 degrees rotations

comment:2 Changed 2 years ago by git

  • Commit changed from a86b42d048567484cf30c7a6f7838704615083e3 to 6fd4a8759641415b121b4a12aa6ecb711490b533

Branch pushed to git repo; I updated commit sha1. New commits:

406d876  Parallel computation of the nb of solutions for tilings
6fd4a87  Tilings polyominos modulo 180 degrees rotations

comment:3 Changed 2 years ago by slabbe

  • Description modified (diff)

comment:4 Changed 2 years ago by slabbe

  • Branch u/slabbe/18987 deleted
  • Commit 6fd4a8759641415b121b4a12aa6ecb711490b533 deleted

comment:5 Changed 2 years ago by slabbe

  • Branch set to u/slabbe/18987
  • Commit set to 3c11961ebae4953711c490f7ac57679b99c59b52
  • Status changed from new to needs_review

New commits:

06915dc  Parallel computation of the nb of solutions for tilings
520cf93  Tilings of polyominos modulo 180 degrees rotations
3c11961  Add a transparent gray box to QuantuminoState.show3d

comment:6 Changed 2 years ago by slabbe

  • Status changed from needs_review to needs_work

Still other stuff to improve...

comment:7 Changed 2 years ago by git

  • Commit changed from 3c11961ebae4953711c490f7ac57679b99c59b52 to b370fd04a3efb7393d5516936958c552934faa7f

Branch pushed to git repo; I updated commit sha1. New commits:

a75f8e3  Parallel computation of the nb of solutions for tilings
21038b0  Tilings of polyominos modulo 180 degrees rotations
b370fd0  Add a transparent gray box to QuantuminoState.show3d

comment:8 Changed 2 years ago by slabbe

  • Status changed from needs_work to needs_review

Ok, it now needs review. I won't rebase my branch anymore.

comment:9 follow-ups: Changed 2 years ago by vdelecroix

  • Authors set to Sébastien Labbé
  • Reviewers set to Vincent Delecroix
  • Status changed from needs_review to needs_work

Hi Sébastien,

Do you mind if I rebase over 6.9.beta1? Note that I also have a waiting commit that does some tiny optimization to dancing_links.pyx.

I am not sure the overall strategy is the right one. If parallelization is needed, I guess it would be better implemented at the level of dancing links. Googling "parallelization dancing links" already turns up a lot of material.

Less importantly:

  • I do not understand the name of the function orthogonal_transformation. Are these the orthogonal transformations of R^3 with integer coordinates? If so, please write a more precise specification.
  • As far as I can see, you do not test all cases of the function orthogonal_transformation.
  • What is the modpi argument? What is a rotation of angle pi for you? Is it a linear transformation that is a pi-rotation restricted to a plane and leaves the orthogonal complement invariant? (I guess it should also have integer coordinates.) If so, then it is of course not a group... but of course you might consider the group generated by these.

Vincent

comment:10 in reply to: ↑ 9 ; follow-up: Changed 2 years ago by slabbe

Replying to vdelecroix:

Salut Sebastien,

Do you mind if I rebase over 6.9.beta1?

Merge or rebase? I would prefer that you merge. Or I can rebase on 6.9.beta1 to keep the authorship (right?).

Note that I also have a waiting commit that does some tiny optimization to dancing_links.pyx.

Where?

I am not sure the overall strategy is the right one. If parallelization is needed, I guess it would be better implemented at the level of dancing links. Googling "parallelization dancing links" already turns up a lot of material.

Indeed, I am using a parallelization strategy that applies to a tiling problem where each piece is used only once. This strategy obviously does not apply to the general exact cover problem.

Also, I preferred to cut the (tiling) problem into subproblems that take at most 2-3 hours each, so that I can follow the progress of the computation and stop and restart it more easily. Even with a parallel implementation of dancing links, the computation would take days to finish.

comment:11 in reply to: ↑ 10 ; follow-up: Changed 2 years ago by vdelecroix

Replying to slabbe:

Replying to vdelecroix:

Salut Sebastien,

Do you mind if I rebase over 6.9.beta1?

Merge or rebase? I would prefer that you merge. Or I can rebase on 6.9.beta1 to keep the authorship (right?).

The authorship of what? If I do a rebase, you keep the authorship of the commits. But of course it will be my branch (or a public one).

I do prefer rebase over merge because:

  • history is much cleaner afterwards (fewer commits that follow each other)
  • you can hide whatever you want in a merge commit

Note that I also have a waiting commit that does some tiny optimization to dancing_links.pyx.

Where?

On my computer ;-) It is waiting for the rebase or merge.

I am not sure the overall strategy is the right one. If parallelization is needed, I guess it would be better implemented at the level of dancing links. Googling "parallelization dancing links" already turns up a lot of material.

Indeed, I am using a parallelization strategy that applies to a tiling problem where each piece is used only once. This strategy obviously does not apply to the general exact cover problem.

In the exact cover problem, each subset is used at most once. In your tiling formulation, a piece accounts for several subsets. You can naively apply the same idea to dancing links: look at one position i0 and consider the set S0 of subsets that cover it. For each subset in S0, launch a thread that only runs through the subproblem obtained by fixing this subset.

But I guess that less naive parallelizations exist.
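
To make this naive split concrete, here is a small self-contained Python sketch (my own illustration, not the ticket's code; the recursive counter is a brute-force stand-in for dancing links): fix one column, force each row that covers it, and count the resulting subproblems in parallel.

from multiprocessing import Pool

def count_exact_covers(rows, columns):
    # Brute-force exact cover counter (stand-in for DLX): cover every
    # column with pairwise disjoint rows.
    if not columns:
        return 1
    col = min(columns, key=lambda c: sum(c in r for r in rows))
    total = 0
    for r in rows:
        if col in r:
            reduced = [s for s in rows if not (set(s) & set(r))]
            total += count_exact_covers(reduced, columns - set(r))
    return total

def count_with_forced_row(task):
    rows, columns, forced = task
    reduced = [s for s in rows if not (set(s) & set(forced))]
    return count_exact_covers(reduced, columns - set(forced))

def parallel_count(rows, ncpus=4):
    columns = set().union(*map(set, rows))
    col = next(iter(columns))                  # the position i0 used to split
    tasks = [(rows, columns, r) for r in rows if col in r]
    with Pool(ncpus) as pool:                  # one subproblem per row covering i0
        return sum(pool.map(count_with_forced_row, tasks))

if __name__ == "__main__":
    rows = [[0, 1, 2], [3, 4, 5], [0, 1], [2, 3, 4, 5], [0], [1, 2, 3, 4, 5]]
    print(parallel_count(rows, ncpus=2))       # 3 exact covers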

Also, I preferred to cut the (tiling) problem into subproblems that take at most 2-3 hours each, so that I can follow the progress of the computation and stop and restart it more easily. Even with a parallel implementation of dancing links, the computation would take days to finish.

Why? If your parallelization ends with a time better than total_time / nb_cpus then you should parallelize more often ;-)

Vincent

comment:12 in reply to: ↑ 9 Changed 2 years ago by slabbe

Less importantly:

  • I do not understand the name of the function orthogonal_transformation.

I chose that name 3 years ago. I agree it was not the best function name.

Your comments are very good. The reason that things are not clearly written is that it is not clear enough in my head. Let me sleep on it and I will come back to this tomorrow or later this week.

When modpi=True, I (think! I) want to quotient the result by the group generated by the diagonal matrices of 1's and -1's with exactly two -1's on the diagonal. Only when the dimension is 3 was I able to generate representatives of the classes easily.
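
For concreteness (my own check, not part of the branch): in dimension 3, the group generated by the diagonal matrices with exactly two -1's on the diagonal has order 4.

sage: from itertools import product
sage: gens = [diagonal_matrix(QQ, list(v)) for v in product([1, -1], repeat=3) if list(v).count(-1) == 2]
sage: MatrixGroup(gens).cardinality()
4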

comment:13 in reply to: ↑ 11 ; follow-up: Changed 2 years ago by slabbe

Why? If your parallelization ends with a time better than total_time / nb_cpus then you should parallelize more often ;-)

I have 240 subproblems, each of them taking between 20 minutes and 10 hours of computation. But my machine at work only has 4 cores. So one way or the other, the computation takes days to finish since I do not have access to a super machine.

comment:14 in reply to: ↑ 9 Changed 2 years ago by slabbe

I am not sure the overall strategy is the right one. If parallelization is needed, I guess it would be better implemented at the level of dancing links. Googling "parallelization dancing links" already turns up a lot of material.

If you agree, I suggest to move the discussion of "parallelization dancing links" in another ticket for which I am willing to be a reviewer.

comment:15 in reply to: ↑ 13 ; follow-up: Changed 2 years ago by vdelecroix

Replying to slabbe:

Why? If your parallelization ends with a time better than total_time / nb_cpus then you should parallelize more often ;-)

I have 240 subproblems, each of them taking between 20 minutes and 10 hours of computation. But my machine at work only has 4 cores. So one way or the other, the computation takes days to finish since I do not have access to a super machine.

This looks very bad. At the end, you might end up with only one core working on the biggest subinstance. And it can last several days even with 200 cores. Ideally, you should slice the problem in such a way that each subinstance takes no longer than 1 hour (say). This is why adopting a less naive strategy at the level of dancing links seems to me the best option, since people have already worked on it.

Is there no supercomputer in Liège? I can set up an invitation to use the very powerful Plafrim in Bordeaux. But it takes some work to learn how to use it (e.g. you need to say in advance for how long you request the processors).

Vincent

comment:16 Changed 2 years ago by git

  • Commit changed from b370fd04a3efb7393d5516936958c552934faa7f to cbbc26242b1883f8a51065ccd7e8e97e6154025b

Branch pushed to git repo; I updated commit sha1. New commits:

55fb3b7  Parallel computation of the nb of solutions for tilings
ef31007  Tilings of polyominos modulo 180 degrees rotations
cbbc262  Add a transparent gray box to QuantuminoState.show3d

comment:17 Changed 2 years ago by vdelecroix

Quickly reading [1] (which looks serious but not very deep), it seems that it is hard to find subinstances of the problem that are well scaled for a given cluster.

[1] S. M. Ashraful Kadir, "A Parallel Programming Approach to Solve the Exact Cover Problem"

comment:18 in reply to: ↑ 15 Changed 2 years ago by slabbe

This looks very bad. At the end, you might end up with only one core working on the biggest subinstance. And it can last several days even with 200 cores. Ideally, you should slice the problem in such a way that each subinstance takes no longer than 1 hour (say).

It is not so bad, as most of the 240 computations take about the same amount of time (2 to 3 hours). Maybe 5 of them take more (10 hours). So I am using the four cores at least 95% of the time. In my case, I have very good subinstances, to reuse your term.

Is there no supercomputer in Liège?

Well, maybe there is something. It is the first time in Liège that I have needed computing power, but it is vacation time now.

I can set up an invitation to use the very powerful Plafrim in Bordeaux. But it takes some work to learn how to use it (e.g. you need to say in advance for how long you request the processors).

That would be nice!


comment:19 in reply to: ↑ 9 Changed 2 years ago by slabbe

Do you mind if I rebase over 6.9.beta1?

I just did that (in case you did not notice).

comment:20 Changed 2 years ago by vdelecroix

  • Branch changed from u/slabbe/18987 to public/18987
  • Commit changed from cbbc26242b1883f8a51065ccd7e8e97e6154025b to 9d8d0a142b58c2ab0c3ab2e18eef093295c0c54e

New commits:

9d8d0a1  doc + optim in dancing_links

comment:21 Changed 2 years ago by vdelecroix

I found the organization of the dancing links code very confusing. Do you know why there are both a file dancing_links.pyx (that wraps the C++ class as a Cython class) and a file dlxcpp.py (that uses the Cython class in a functional style)? And there is also a combinat/dlx.py!! I guess this is mostly historical but it needs a serious cleanup.

comment:22 Changed 2 years ago by slabbe

Even just inside dancing_links.pyx, the class is called a Wrapper for the C++ code and dlx_solver is a function that calls the constructor of the wrapper...

Also, I would not put this in combinat/matrices just because the exact cover problem can be represented as a 0-1 matrix...


comment:23 Changed 2 years ago by git

  • Commit changed from 9d8d0a142b58c2ab0c3ab2e18eef093295c0c54e to 91de8636683e1c8cec7d9d33e645721572de3b1c

Branch pushed to git repo; I updated commit sha1. New commits:

91de863  rename orthogonal_transformation function

comment:24 Changed 2 years ago by slabbe

Is there a way to quotient two groups in Sage, maybe using GAP or something?

sage: L = [w.matrix() for w in WeylGroup(['B',3]) if w.matrix().det()==1]
sage: G = MatrixGroup(L)
sage: H = MatrixGroup(L[:4])
sage: len(G)
24
sage: len(H)
4
sage: H
Matrix group over Rational Field with 4 generators (
[1 0 0]  [ 1  0  0]  [-1  0  0]  [-1  0  0]
[0 1 0]  [ 0 -1  0]  [ 0  1  0]  [ 0 -1  0]
[0 0 1], [ 0  0 -1], [ 0  0 -1], [ 0  0  1]
)
sage: G.quotient(H)
Traceback (most recent call last):
...
NotImplementedError:

comment:25 Changed 2 years ago by git

  • Commit changed from 91de8636683e1c8cec7d9d33e645721572de3b1c to 0c752d4038a7419285a1c1fa9b0c21842b593a2e

Branch pushed to git repo; I updated commit sha1. New commits:

0c752d4Improve explanation of the modpi argument

comment:26 in reply to: ↑ 9 ; follow-up: Changed 2 years ago by slabbe

  • Status changed from needs_work to needs_review

  • What is the modpi argument? What is a rotation of angle pi for you? Is it a linear transformation that is a pi-rotation restricted to a plane and leaves the orthogonal complement invariant? (I guess it should also have integer coordinates.) If so, then it is of course not a group... but of course you might consider the group generated by these.

Ok, so tell me if it is better now.

comment:27 in reply to: ↑ 26 ; follow-up: Changed 2 years ago by vdelecroix

Replying to slabbe:

  • What is the modpi argument? What is a rotation of angle pi for you? Is it a linear transformation that is a pi-rotation restricted to a plane and leaves the orthogonal complement invariant? (I guess it should also have integer coordinates.) If so, then it is of course not a group... but of course you might consider the group generated by these.

Ok, so tell me if it is better now.

A bit. What do you mean by the rectangular parallelepiped? There is only one? As far as I understand, it is the isometry group that preserves any rectangular parallelepiped. You can also say it differently: the group of orientation-preserving isometries that preserve (globally) each axis. Is that right?

And as before, this is the group *generated* by these rotations and not only the rotations themselves (except in dim 2 and 3).

Isn't this group exactly the subgroup of matrices with an even number of -1 on the diagonal? So the quotient is just the signed permutation matrices with either 0 or 1 coefficient -1 (all others being 1). Isn't it?

When orientation_preserving is False, I guess this group is the subgroup of diagonal matrices with any number of -1's on the diagonal. In that case, the quotient would just be the permutation matrices. No?

comment:28 Changed 2 years ago by vdelecroix

And when you say that it is by rotations of angle pi, this is confusing, since there are infinitely many of them. You could say: by rotations of angle pi in the plane generated by two axes.

comment:29 in reply to: ↑ 27 ; follow-ups: Changed 2 years ago by slabbe

A bit. What do you mean by the rectangular parallelepiped? There is only one?

Of course the side lengths can change, so there is more than one. But, of course, I consider the case where the side lengths are all distinct.

As far as I understand it is the isometry group that preserves any rectangular parallelepiped.

You understand well.

You can also say differently: the group of orientable isometry that preserve (globally) each axis. Is that right?

Yes.

And as before, this is the group *generated* by these rotations and not only the rotations themselves (except in dim 2 and 3).

Of course. "modpi" is in the sense of modulo a group.

Isn't this group exactly the subgroup of matrices with an even number of -1 on the diagonal?

Yes.

So the quotient is just the signed permutation matrices with either 0 or 1 coefficient -1 (all others being 1). Isn't it?

Maybe. When the dimension is odd, the quotient is the signed permutation matrices where the signs are either all positive or all negative, with determinant 1. That is the formula I was using.

When orientation_preserving is False, I guess this group is the subgroup of diagonal matrices with any number of -1's on the diagonal. In that case, the quotient would just be the permutation matrices. No?

Yes! You are right.
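
A quick Sage check of the claims above in dimension 3 (my own verification, not code from the branch): the subgroup of diagonal matrices with an even number of -1's has order 4, and it has 6 cosets in the 24-element orientation-preserving group, matching the permutation-like representatives discussed here.

sage: from itertools import product
sage: G = [w.matrix() for w in WeylGroup(['B', 3]) if w.matrix().det() == 1]
sage: H = [diagonal_matrix(QQ, list(v)) for v in product([1, -1], repeat=3) if list(v).count(-1) % 2 == 0]
sage: len(G), len(H)
(24, 4)
sage: cosets = set(frozenset(tuple((h*g).list()) for h in H) for g in G)
sage: len(cosets)
6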


comment:30 in reply to: ↑ 29 Changed 2 years ago by slabbe

So the quotient is just the signed permutation matrices with either 0 or 1 coefficient -1 (all others being 1). Isn't it?

When orientation_preserving=True, the determinant of every returned matrix must be one. Therefore, I believe the quotient is the permutation matrices of determinant one plus the other permutation matrices with one 1 replaced by -1.

comment:31 in reply to: ↑ 29 ; follow-up: Changed 2 years ago by slabbe

So the quotient is just the signed permutation matrices with either 0 or 1 coefficient -1 (all others being 1). Isn't it?

Maybe. When the dimension is odd, the quotient is the signed permutation matrices where the signs are either all positive or all negative, with determinant 1. That is the formula I was using.

Ok, now I understand why I needed it that way. I need well-chosen representatives for the quotient. By this, I mean that the chosen representatives of the cosets should form a group themselves.

For example, these representatives do not form a group:

sage: n = 3
sage: c = identity_matrix(n)
sage: c[0,0] = -1
sage: L = [w.matrix() for w in WeylGroup(['A', n-1])]
sage: L = [(w if w.det() == 1 else c*w) for w in L]
sage: L
[
[1 0 0]  [ 0  0 -1]  [0 0 1]  [ 0 -1  0]  [0 1 0]  [-1  0  0]
[0 1 0]  [ 0  1  0]  [1 0 0]  [ 1  0  0]  [0 0 1]  [ 0  0  1]
[0 0 1], [ 1  0  0], [0 1 0], [ 0  0  1], [1 0 0], [ 0  1  0]
]
sage: MatrixGroup(L).cardinality()
24

But these representatives form a group:

sage: L = [w.matrix() for w in WeylGroup(['A', n-1])]
sage: L = [m.det() * m for m in L]
sage: L
[
[1 0 0]  [ 0  0 -1]  [0 0 1]  [ 0 -1  0]  [0 1 0]  [-1  0  0]
[0 1 0]  [ 0 -1  0]  [1 0 0]  [-1  0  0]  [0 0 1]  [ 0  0 -1]
[0 0 1], [-1  0  0], [0 1 0], [ 0  0 -1], [1 0 0], [ 0 -1  0]
]
sage: MatrixGroup(L).cardinality()
6

And I still don't know how to construct the quotient when n is even.

comment:32 Changed 2 years ago by slabbe

Okay, so it was lucky that it works when n is odd, because in general:

http://groupprops.subwiki.org/wiki/Quotient_group_need_not_be_isomorphic_to_any_subgroup

comment:33 in reply to: ↑ 31 Changed 2 years ago by slabbe

Ok, now I understand why I needed it that way.

Wait. I really do need the quotient itself after all. I was just lucky that the transformations keeping some of the pentaminos invariant were in the group.

    sage: from sage.combinat.tiling import ncube_isometry_group
    sage: from sage.games.quantumino import pentaminos
    sage: L = ncube_isometry_group(3)
    sage: f = lambda p : [m for m in L[1:] if (m*p).canonical() == p.canonical()]
    sage: [(i, f(p)) for i,p in enumerate(pentaminos) if f(p)]
    [(6, [
    [ 0  0 -1]
    [ 0 -1  0]
    [-1  0  0]
    ]),
     (7, [
    [ 0  0  1]
    [ 0 -1  0]
    [ 1  0  0]
    ]),
     (12, [
    [-1  0  0]
    [ 0  0 -1]
    [ 0 -1  0]
    ]),
     (13, [
    [ 0  0 -1]
    [ 0 -1  0]
    [-1  0  0]
    ]),
     (16, [
    [ 0  0 -1]
    [ 0 -1  0]
    [-1  0  0]
    ])]

Above, I get a problem with pentamino number 7 because it is invariant under a transformation that is not in the subgroup isomorphic to the quotient. So I really need to consider the quotient with all of the elements in each coset. Choosing a representative won't work, even if it is well chosen.

Give me more time. I'll update my branch.

comment:34 Changed 2 years ago by git

  • Commit changed from 0c752d4038a7419285a1c1fa9b0c21842b593a2e to dc3797bd98509562635b7ef940c7cbd48fc71e31

Branch pushed to git repo; I updated commit sha1. New commits:

a51e4d8  Using cosets for when modpi=True
dc3797b  Change pentaminos to there canonical form

comment:35 Changed 2 years ago by slabbe

Re-needs review.

comment:36 Changed 2 years ago by vdelecroix

Note that the patchbot was not able to build the doc.

comment:37 Changed 2 years ago by git

  • Commit changed from dc3797bd98509562635b7ef940c7cbd48fc71e31 to 1e01ba27547f9a7f8ec2e35418f95b00e2ffb405

Branch pushed to git repo; I updated commit sha1. New commits:

ca84fcf  Trac #18987: Parallel computation of the nb of solutions for tilings
3d4402b  Trac #18987: Tilings of polyominos modulo 180 degrees rotations
65f6512  Trac #18987: Add a transparent gray box to QuantuminoState.show3d
555f0a9  Trac #18987: doc + optim in dancing_links
481d96e  Trac #18987: rename orthogonal_transformation function
5f66e1d  Trac #18987: Improve explanation of the modpi argument
d359115  Trac #18987: Using cosets for when modpi=True
1e01ba2  Trac #18987: Change pentaminos to there canonical form

comment:38 Changed 2 years ago by slabbe

There was a TESTS:: with text following. Fixed that (and squashed it into the last commit). Also added the prefix Trac #18987: to all commit messages.

comment:39 Changed 2 years ago by slabbe

  • Status changed from needs_review to needs_work

There is one failing doctest on the patchbot that is not failing on my machine...

comment:40 Changed 2 years ago by git

  • Commit changed from 1e01ba27547f9a7f8ec2e35418f95b00e2ffb405 to 2133ecdcffb1b993049874c8be6cad2b1224a73a

Branch pushed to git repo; I updated commit sha1. New commits:

2133ecd  Trac #18987: fixing intermittent doctest failure

comment:41 Changed 2 years ago by slabbe

  • Status changed from needs_work to needs_review

comment:42 follow-ups: Changed 2 years ago by vdelecroix

Are you sure ncube_isometry_group is worth a @cached_function?

In ncube_isometry_group_modpi why are you building MatrixGroup?

In

P_coset = set(frozenset((m.matrix() * self).canonical() for m in coset) for coset in L)
return set(next(iter(s)) for s in P_coset)

You are building a lot of images (matrix times polyomino) only to use just one at the end. Why not

return set((L[0].matrix() * self).canonical() for coset in L)

And if you were not using matrix groups you can even do

return set((L[0] * self).canonical() for coset in L)

comment:43 Changed 2 years ago by slabbe

You are using L instead of coset I think. Can you edit your previous comment to make sure I understand what you mean?

comment:44 in reply to: ↑ 42 ; follow-up: Changed 2 years ago by slabbe

Replying to vdelecroix:

Are you sure ncube_isometry_group is worth a @cached_function?

Calling this function takes 2.8s on my machine. And it is called once for each polyomino, that is, 16 times for the Quantumino puzzle. With the cache, I save about 40s when constructing the rows to give to the dlx solver.

In ncube_isometry_group_modpi why are you building MatrixGroup?

Because otherwise, this

G = ncube_isometry_group(n, orientation_preserving)
H = [h for h in G if all(i==j for (i,j) in h.nonzero_positions())]
left_cosets = set(tuple(sorted(h*g for h in H)) for g in G)

throws a TypeError: mutable matrices are unhashable, and I find it more fun to read like this instead of calling .set_immutable() on every matrix.


comment:45 in reply to: ↑ 42 Changed 2 years ago by slabbe

You are building a lot of images (matrix times polyomino) only to use just one at the end. Why not

I know, but I need all of them. That is what I understood yesterday. Let me try to explain. In general it is okay to take only one matrix in each coset. The problem arises when the polyomino is invariant under some of the 24 orientation-preserving isometries of the cube. For example, consider:

sage: from sage.games.quantumino import pentaminos                                  
sage: from sage.combinat.tiling import ncube_isometry_group, ncube_isometry_group_modpi
sage: p = pentaminos[7]                                                             
sage: m = ncube_isometry_group(3)[-3]                                               
sage: p                                                                             
Polyomino: [(0, 0, 0), (0, 1, 0), (0, 2, 0), (0, 2, 1), (1, 0, 0)], Color: orange   
sage: m                                                                             
[ 0  0  1]                                                                          
[ 0 -1  0]                                                                          
[ 1  0  0]                                                                          
sage: (m*p).canonical() == p                                                        
True                                                                                

The polyomino p has 12 distinct rotation images instead of 24. And among the 12 ways of placing that polyomino into a box, there are 3 distinct ways up to rotation of the box keeping the box invariant. To compute this, we need to consider the whole coset:

sage: cosets = ncube_isometry_group_modpi(3)
sage: P_coset = set(frozenset((m.matrix() * p).canonical() for m in coset) for coset in cosets)
sage: len(P_coset)                         
3   
sage: len(set(next(iter(s)) for s in P_coset))
3                                                                                          

Otherwise, you obtain too many polyominos (see below).

sage: set((coset[0].matrix() * p).canonical() for coset in cosets)                 
{Polyomino: [(0, 0, 1), (1, 0, 1), (1, 1, 1), (1, 2, 0), (1, 2, 1)], Color: orange,
 Polyomino: [(0, 1, 2), (1, 0, 0), (1, 1, 0), (1, 1, 1), (1, 1, 2)], Color: orange,
 Polyomino: [(0, 0, 0), (0, 1, 0), (1, 1, 0), (2, 1, 0), (2, 1, 1)], Color: orange,
 Polyomino: [(0, 1, 0), (0, 1, 1), (1, 1, 1), (2, 0, 1), (2, 1, 1)], Color: orange,
 Polyomino: [(0, 2, 0), (1, 0, 0), (1, 0, 1), (1, 1, 0), (1, 2, 0)], Color: orange}

comment:46 in reply to: ↑ 44 ; follow-ups: Changed 2 years ago by vdelecroix

Replying to slabbe:

Replying to vdelecroix:

Are you sure ncube_isometry_group is worth a @cached_function?

Calling this function takes 2.8s on my machine. And it is called once for each polyomino, that is, 16 times for the Quantumino puzzle. With the cache, I save about 40s when constructing the rows to give to the dlx solver.

Perhaps the way it is implemented is wrong. For example

sage: %timeit P = [s.matrix() for s in SymmetricGroup(4)]
1 loops, best of 3: 1.27 ms per loop

and with the signs

sage: from itertools import product
sage: %timeit M = [diagonal_matrix(p) for p in product([1,-1], repeat=4)]
1 loops, best of 3: 2.9 ms per loop
sage: M = [diagonal_matrix(p) for p in product([1,-1], repeat=4)]
sage: %timeit S = [m*s.matrix() for m in M for s in SymmetricGroup(4)]
100 loops, best of 3: 18.3 ms per loop

So it is likely to be ~20ms.

comment:47 Changed 2 years ago by git

  • Commit changed from 2133ecdcffb1b993049874c8be6cad2b1224a73a to 0d68ecec28cb92d9e78a92d6e733d42b3e8941c1

Branch pushed to git repo; I updated commit sha1. New commits:

0d68ece  Trac #18987: removed a cached_function

comment:48 in reply to: ↑ 46 ; follow-up: Changed 2 years ago by slabbe

Replying to vdelecroix:

Are you sure ncube_isometry_group is worth a @cached_function?

In fact, you make me realize that there is already caching involved in WeylGroup. Therefore caching that function is not so necessary. Without caching, I get:

sage: from sage.combinat.tiling import ncube_isometry_group
sage: time L = ncube_isometry_group(4)             
CPU times: user 1.14 s, sys: 19.7 ms, total: 1.16 s
Wall time: 1.3 s                                   
sage: time L = ncube_isometry_group(4)             
CPU times: user 358 ms, sys: 4.01 ms, total: 362 ms
Wall time: 448 ms                                      
sage: from itertools import product
sage: %timeit M = [diagonal_matrix(p) for p in product([1,-1], repeat=4)]
1 loops, best of 3: 2.9 ms per loop
sage: M = [diagonal_matrix(p) for p in product([1,-1], repeat=4)]
sage: %timeit S = [m*s.matrix() for m in M for s in SymmetricGroup(4)]
100 loops, best of 3: 18.3 ms per loop

So it is likely to be ~20ms.

Note that you can't use timeit above since there is caching involved in SymmetricGroup.

For comparison, your solution on my machine gives the following timings:

sage: from itertools import product   
sage: # first call                                 
sage: time M = [diagonal_matrix(p) for p in product([1,-1], repeat=4)] 
CPU times: user 11.4 ms, sys: 92 µs, total: 11.5 ms                    
Wall time: 11.9 ms                                                     
sage: time S = [m*s.matrix() for m in M for s in SymmetricGroup(4)]    
CPU times: user 191 ms, sys: 25.2 ms, total: 216 ms                    
Wall time: 667 ms                                                      
sage: # second call
sage: time M = [diagonal_matrix(p) for p in product([1,-1], repeat=4)]
CPU times: user 11.7 ms, sys: 1.34 ms, total: 13 ms                   
Wall time: 16.6 ms                                                    
sage: time S = [m*s.matrix() for m in M for s in SymmetricGroup(4)]   
CPU times: user 108 ms, sys: 1.92 ms, total: 110 ms                   
Wall time: 114 ms                                                     

So it seems using WeylGroup(['B',4]) is about 4 times slower than your solution. But I don't know if I will change it. I mean that improvement could be done directly in WeylGroup code...

comment:49 in reply to: ↑ 48 ; follow-up: Changed 2 years ago by vdelecroix

Replying to slabbe:

sage: from itertools import product
sage: %timeit M = [diagonal_matrix(p) for p in product([1,-1], repeat=4)]
1 loops, best of 3: 2.9 ms per loop
sage: M = [diagonal_matrix(p) for p in product([1,-1], repeat=4)]
sage: %timeit S = [m*s.matrix() for m in M for s in SymmetricGroup(4)]
100 loops, best of 3: 18.3 ms per loop

So it is likely to be ~20ms.

Note that you can't use timeit above since there is caching involved in SymmetricGroup.

Where?! Beyond the construction of the group (SymmetricGroup(4) is SymmetricGroup(4) gives True) nothing is cached.

comment:50 in reply to: ↑ 49 Changed 2 years ago by slabbe

Where?! Beyond the construction of the group (SymmetricGroup(4) is SymmetricGroup(4) gives True) nothing is cached.

Don't you get that the first execution of

sage: time S = [m*s.matrix() for m in M for s in SymmetricGroup(4)]

is slower than the second? And that the second takes about the same time as the third, the fourth, etc.? Like this on my machine:

sage: from itertools import product                                
sage: M = [diagonal_matrix(p) for p in product([1,-1], repeat=4)]  
sage: time S = [m*s.matrix() for m in M for s in SymmetricGroup(4)]
CPU times: user 184 ms, sys: 18 ms, total: 202 ms                  
Wall time: 710 ms                                                  
sage: time S = [m*s.matrix() for m in M for s in SymmetricGroup(4)]
CPU times: user 109 ms, sys: 2.05 ms, total: 111 ms                
Wall time: 166 ms                                                  
sage: time S = [m*s.matrix() for m in M for s in SymmetricGroup(4)]
CPU times: user 109 ms, sys: 2.6 ms, total: 111 ms                 
Wall time: 113 ms                                                  
sage: time S = [m*s.matrix() for m in M for s in SymmetricGroup(4)]
CPU times: user 110 ms, sys: 2.12 ms, total: 112 ms                
Wall time: 169 ms                                                  
sage: time S = [m*s.matrix() for m in M for s in SymmetricGroup(4)]
CPU times: user 108 ms, sys: 2.44 ms, total: 111 ms                
Wall time: 114 ms                                                  

comment:51 Changed 2 years ago by vdelecroix

Yes for the timing. But this does not imply that something is cached. Actually, the reason is that gap is launched (I do not know why):

sage: S = SymmetricGroup(4)
sage: time gap(3)
CPU times: user 16 ms, sys: 12 ms, total: 28 ms
Wall time: 295 ms
3
sage: from itertools import product
sage: M = [diagonal_matrix(p) for p in product([1,-1], repeat=4)]
sage: time S = [m*s.matrix() for m in M for s in SymmetricGroup(4)]
CPU times: user 56 ms, sys: 0 ns, total: 56 ms
Wall time: 56.9 ms
sage: time S = [m*s.matrix() for m in M for s in SymmetricGroup(4)]
CPU times: user 56 ms, sys: 0 ns, total: 56 ms
Wall time: 53 ms

comment:52 Changed 2 years ago by slabbe

caching ncube_isometry_group gives:

sage: from sage.games.quantumino import QuantuminoSolver 
sage: q = QuantuminoSolver(0)                            
sage: t = q.tiling_solver()                              
sage: time rows = t.rows()                               
CPU times: user 18.8 s, sys: 168 ms, total: 19 s         
Wall time: 19 s                                          

not caching ncube_isometry_group gives:

sage: from sage.games.quantumino import QuantuminoSolver  
sage: q = QuantuminoSolver(0)                             
sage: t = q.tiling_solver()                               
sage: time rows = t.rows()                                
CPU times: user 19.4 s, sys: 288 ms, total: 19.7 s        
Wall time: 20 s                                                                      

So I confirm that I do not lose much time by not caching.

comment:53 in reply to: ↑ 46 ; follow-up: Changed 2 years ago by slabbe

Replying to vdelecroix:

Perhaps the way it is implemented is wrong. For example

sage: %timeit P = [s.matrix() for s in SymmetricGroup(4)]
1 loops, best of 3: 1.27 ms per loop

I removed the cached_function. Do you want me to replace the code based on WeylGroup(['B',n]) with the SymmetricGroup + diagonal_matrix of signs code you propose? I think that this improvement should be done in WeylGroup in another ticket.

Also, for timing improvements in that file tiling.py, there is another, more important thing to spend time on, because a lot of time is spent creating vectors from tuples (coordinates are stored as tuples). This should be done in another ticket.

Needs review!

comment:54 in reply to: ↑ 53 ; follow-up: Changed 2 years ago by vdelecroix

Replying to slabbe:

Replying to vdelecroix:

Perhaps the way it is implemented is wrong. For example

sage: %timeit P = [s.matrix() for s in SymmetricGroup(4)]
1 loops, best of 3: 1.27 ms per loop

I removed the cached_function. Do you want me to replace the code based on WeylGroup(['B',n]) with the SymmetricGroup + diagonal_matrix of signs code you propose? I think that this improvement should be done in WeylGroup in another ticket.

Nope. I also think that the WeylGroup code has to be improved.

Also, for timing improvements in that file tiling.py, there is another, more important thing to spend time on, because a lot of time is spent creating vectors from tuples (coordinates are stored as tuples). This should be done in another ticket.

Did you notice that

sage: t = map(vector, t)

is much slower than

sage: V = FreeModule(ZZ,12)
sage: t = map(V,t)

(I measure a factor x8 in __sub__ and __add__ for example)

Moreover, everything would be faster if you would store integer vectors instead of tuples (and an attribute self._free_module). Why aren't you doing that?

comment:55 follow-up: Changed 2 years ago by vdelecroix

I still do not understand why you are parallelizing the polyomino solver code and not the DLX one. Isn't your strategy exactly equivalent to this one: pick a column, for each row that has a 1 in this column remove all the columns occupied by the piece and launch an independent process?
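
For illustration, a rough sketch of that split on top of the dlx_solver wrapper, using Sage's @parallel decorator (my own sketch; restrict_to_row is a hypothetical helper, not part of the module, and the counting loop simply calls search() until the instance is exhausted):

sage: from sage.combinat.matrices.dancing_links import dlx_solver
sage: from sage.parallel.decorate import parallel
sage: def restrict_to_row(rows, i):
....:     # keep row i and the rows disjoint from it; in the reduced instance
....:     # the columns of row i can only be covered by row i itself
....:     cols = set(rows[i])
....:     return [r for j, r in enumerate(rows) if j == i or not (cols & set(r))]
sage: @parallel(ncpus=4)
....: def count_solutions_containing_row(rows, i):
....:     d = dlx_solver(restrict_to_row(rows, i))
....:     n = 0
....:     while d.search():
....:         n += 1
....:     return n
sage: rows = [[0, 1, 2], [3, 4, 5], [0, 1], [2, 3, 4, 5], [0], [1, 2, 3, 4, 5]]
sage: jobs = [(rows, i) for i, r in enumerate(rows) if 0 in r]   # split on column 0
sage: sum(value for (job, value) in count_solutions_containing_row(jobs))
3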

comment:56 in reply to: ↑ 54 ; follow-up: Changed 2 years ago by slabbe

Moreover, everything would be faster if you would store integer vectors instead of tuples (and an attribute self._free_module). Why aren't you doing that?

  1. Because I think I wanted to use the most basic hashable container (tuple) for the purpose. I realized only recently that it was slow because there are many matrix operations and additions involved. So indeed, storing vectors would be better.
  2. Because it is not the purpose of this ticket.

comment:57 in reply to: ↑ 55 Changed 2 years ago by slabbe

  • Status changed from needs_review to needs_work

Replying to vdelecroix:

I still do not understand why you are parallelizing the polyomino solver code and not the DLX one. Isn't your strategy exactly equivalent to this one: pick a column, for each row that has a 1 in this column remove all the columns occupied by the piece and launch an independent process?

Okay, I will work on a new commit for that.

comment:58 in reply to: ↑ 56 Changed 2 years ago by vdelecroix

Replying to slabbe:

Moreover, everything would be faster if you would store integer vectors instead of tuples (and an attribute self._free_module). Why aren't you doing that?

  1. Because I think I wanted to use the most basic hashable container (tuple) for the purpose. I realized only recently that it was slow because there are many matrix operations and additions involved. So indeed, storing vectors would be better.
  2. Because it is not the purpose of this ticket.

I opened #19036

comment:59 Changed 2 years ago by git

  • Commit changed from 0d68ecec28cb92d9e78a92d6e733d42b3e8941c1 to df655874d79c75f3d5b18adda01cedb620a1856b

Branch pushed to git repo; I updated commit sha1. New commits:

df65587  Trac #18987: Moved the paralel computation of nb of sol in dancing links module

comment:60 Changed 2 years ago by slabbe

  • Status changed from needs_work to needs_review

(I will not have access to a machine with Sage for the next 7 days).


comment:61 Changed 2 years ago by slabbe

Ok, Vincent. I am now back and available to do the follow up on this ticket.

With the new proposal you made, this ticket now consists of two independent things. Would you prefer that I split the code into two tickets (easier to review), or is it ok like this?

Sébastien

comment:62 Changed 2 years ago by slabbe

  • Description modified (diff)
  • Summary changed from Parallel computation for TilingSolver.number_of_solutions to Parallel computation of number of solutions in dancing links

I finally decided to split this ticket into two to ease the review. The second part is now available at #19107.

comment:63 Changed 2 years ago by slabbe

  • Description modified (diff)

comment:64 Changed 2 years ago by git

  • Commit changed from df655874d79c75f3d5b18adda01cedb620a1856b to c01612691e55baf745fc710fc8c8dfc97b885375

Branch pushed to git repo; I updated commit sha1. New commits:

8db8f72  Trac #18987: Number of solutions method for dancing links
9d25f7c  Trac #18987: doc + optim in dancing_links
c016126  Trac #18987: parallel computations in dancing links

comment:65 follow-up: Changed 2 years ago by vdelecroix

  • Status changed from needs_review to needs_work
  • (solving) independant (cases) -> independent
  • I do not understand this comment
    If possible, a good choice of column gives a partition
    of solutions where each part has about the same number
    of solutions.
    
  • could you make a complete sentence for the OUTPUT section instead of dict, row number -> list of rows
  • this is not quite accurate
    After the split each subproblem has the same number of columns and
    rows and the same solutions as above::
    
    it is the union of the solutions of the subproblems that forms the solution of the initial one.
  • I do not see the point of having number_of_solutions_iterator and number_of_solutions. You should just get rid of the first one and allow parallelization in the second.
  • Why not also make parallelization available for getting the list of solutions?

comment:66 Changed 2 years ago by git

  • Commit changed from c01612691e55baf745fc710fc8c8dfc97b885375 to 29ff98625316f95c48fd704115a209dce38deda8

Branch pushed to git repo; I updated commit sha1. New commits:

29ff986  Trac #18987: Fixed reviewer comment 65

comment:67 in reply to: ↑ 65 Changed 2 years ago by slabbe

  • Status changed from needs_work to needs_review

Thanks for your review! I fixed the first comments.

  • I do not see the point of having number_of_solutions_iterator and number_of_solutions. You should just get rid of the first one and allow parallelization in the second.
  • number_of_solutions_iterator is now _number_of_solutions_iterator
  • number_of_solutions now allows parallel computation
  • I kept _number_of_solutions_iterator because, for the problem I am currently looking at, number_of_solutions takes days while _number_of_solutions_iterator yields something every hour or so. This lets me follow the computation, make sure it is not stuck, and estimate the time left (see the sketch at the end of this comment). I prefer this to removing _number_of_solutions_iterator and adding some verbose option to number_of_solutions.
  • Why not also make parallelization available for getting the list of solutions?

Because I don't know how or if it is possible to use the parallel decorator to merge iterators.
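
For reference, the parallel decorator does yield each (input, result) pair as soon as a worker finishes, which is enough to accumulate counts incrementally and follow progress, even though it does not by itself merge iterators of solutions. A toy illustration (my own example, unrelated to the branch's code):

sage: from sage.parallel.decorate import parallel
sage: @parallel(ncpus=2)
....: def partial_count(k):
....:     return k * k          # stand-in for the number of solutions of one subproblem
sage: total = 0
sage: for (job, value) in partial_count(list(range(6))):  # pairs arrive as workers finish
....:     total += value
sage: total
55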

comment:68 Changed 2 years ago by vdelecroix

  • Status changed from needs_review to positive_review

Hi Sébastien,

Thanks for your patience. I am sure that the choice of splitting is suboptimal but at least the design is much cleaner than in the first version.

Vincent

comment:69 Changed 2 years ago by vbraun

  • Branch changed from public/18987 to 29ff98625316f95c48fd704115a209dce38deda8
  • Resolution set to fixed
  • Status changed from positive_review to closed