Opened 3 years ago

Last modified 2 years ago

#25662 needs_work defect

Wrong contains in PartitionDiagrams

Reported by: mantepse Owned by:
Priority: major Milestone: sage-8.3
Component: combinatorics Keywords:
Cc: alauve, tscrim, zabrocki Merged in:
Authors: Reviewers: Daniel Krenn
Report Upstream: N/A Work issues:
Branch: u/mantepse/wrong_contains_in_partitiondiagrams (Commits) Commit: ab58ad8dbc68c29df337f19fbfba4544f2237f46
Dependencies: #25659, #25462, #25642 Stopgaps:

Description

In 8.3 beta 7:

sage: import sage.combinat.diagram_algebras as da
sage: PD = da.PartitionDiagrams(3/2)
sage: PD.list()
[{{-2, -1, 1, 2}},
 {{-2, 1, 2}, {-1}},
 {{-2, 2}, {-1, 1}},
 {{-2, -1, 2}, {1}},
 {{-2, 2}, {-1}, {1}}]
sage: [[-2,-1],[2,1]] in PD
True

The fix is to remove the individual __contains__ methods, and correct the generic one, and optimize afterwards. However, to avoid a mess, I'd like to wait for the dependencies.

Change History (24)

comment:1 Changed 3 years ago by mantepse

I do not understand these doctests:

 sage: import sage.combinat.diagram_algebras as da
            sage: R.<x> = QQ[]
            sage: PA = PartitionAlgebra(2, x, R, 'P')
            sage: PA([]) == PA.one()
            True
            sage: D = da.DiagramAlgebra(2, x, R, 'P', da.PartitionDiagrams(2))
            sage: D([]) == D.one()
            Traceback (most recent call last):
            ...
            ValueError: invalid input of []

Why is PA([]) supposed to give the identity, but D([]) throw an error? Is it because of UnitDiagramMixin magic?

Put differently, the check method of AbstractPartitionDiagram explicitely allows the empty diagram, irrespective of order. Should [] in PartitionDiagrams(2) return true or false?

comment:2 Changed 3 years ago by mantepse

  • Branch set to u/mantepse/wrong_contains_in_partitiondiagrams

comment:3 Changed 2 years ago by git

  • Commit set to c32cbed705a207cf13c246e8bdda87f4a90bf680

Branch pushed to git repo; I updated commit sha1. Last 10 new commits:

3d5f553Merge branch 'public/combinat/speedup_set_partitions-25462' of git://trac.sagemath.org/sage into t/25659/make_braueralgebra_faster
c7fdf9bInitial Cython version of the PerfectMatchings iterator.
dd4441aAdded fixed point free involution generator and convert to SP. Fixing bug in PerfectMatchingsIterator.
034589dOptimizing generation using fixed-point-free involutions.
7942eabUsing FPF involution iterator and skipping sorting.
80b13f4Use the new backend iterators in diagram_algebras.py.
b486103add reference
77d6902Minor tweaks.
14ed798Merge branch 'public/combinat/speedup_brauer_algebra-25659' of git://trac.sagemath.org/sage into t/25662/wrong_contains_in_partitiondiagrams
c32cbedfix doctests destroyed by merge

comment:4 Changed 2 years ago by mantepse

  • Status changed from new to needs_review

comment:5 Changed 2 years ago by mantepse

ping?

comment:6 Changed 2 years ago by mantepse

ping ping?

comment:7 Changed 2 years ago by dkrenn

  • Reviewers set to Daniel Krenn
  • Status changed from needs_review to needs_work

Started reviewing:

  1. +            obj = self._element_constructor_(obj)
    

I think here self(obj) is the correct way to do as this includes the shortcuts for elements with the "correct" parent.

2.

+        if self.order not in ZZ:
+            # check that k and -k are in the same block
+            k = ZZ(self.order + ZZ(1)/ZZ(2))

I have no glue when the order is not in ZZ (apperently it can have half values, but is this always the case? (I am a bit scared about this code...)

So, either someone tells me, where this is explained or someone else takes a look (I'd prefer the latter).

Rest looks good AFAICS.

comment:8 Changed 2 years ago by git

  • Commit changed from c32cbed705a207cf13c246e8bdda87f4a90bf680 to fee85a739034b72b7fa39ac4d9be2a77bbfa5648

Branch pushed to git repo; I updated commit sha1. New commits:

e7d2738Merge branch 'u/mantepse/wrong_contains_in_partitiondiagrams' of git://trac.sagemath.org/sage into HEAD
fee85a7replace self._element_constructor_(obj) with self(obj)

comment:9 follow-up: Changed 2 years ago by mantepse

Wow, that was quick - thank you!

OK, I admit I don't know about the first one, so I simply tried and it looks OK. If I understand correctly, self(obj) will invoke __call__, which in turn calls _element_constructor_, right? I did not check whether there is a performance penality.

The second one is by design of this class (which is not myself). For the partition algebra, it makes algebraically a lot of sense to have half-integer order. Combinatorially, this simply implies that the final elements of the top and the bottom row, by convention k and -k, are in the same block.

By design, all other diagram algebras are currently regarded as sub-algebras of the partition algebra. This may be up to debate, but is certainly outside the scope of this ticket.

Martin


New commits:

e7d2738Merge branch 'u/mantepse/wrong_contains_in_partitiondiagrams' of git://trac.sagemath.org/sage into HEAD
fee85a7replace self._element_constructor_(obj) with self(obj)

comment:10 in reply to: ↑ 9 Changed 2 years ago by dkrenn

Replying to mantepse:

OK, I admit I don't know about the first one, so I simply tried and it looks OK. If I understand correctly, self(obj) will invoke __call__, which in turn calls _element_constructor_, right? I did not check whether there is a performance penality.

The thing is that _element_constructor_ is not called if the element has already the parent self (maybe there are some additional shortcuts if the element_class already fits). So, therefore this is the way to convert an element. I do not think that the performance is here an issue (and I am very certain that one could construct an example, where your code does not work). So therefore, I am in favor of changing it.

The second one is by design of this class (which is not myself). For the partition algebra, it makes algebraically a lot of sense to have half-integer order. Combinatorially, this simply implies that the final elements of the top and the bottom row, by convention k and -k, are in the same block.

Ok. So then the code is somehow fine for me (because if for whatever reason the order is not half-integer, then there will be an error and not a false result).

It might be worth considering restructuring the code so that if self.order in ZZ: return True so that it is clear that the code ends here in this case and only otherwise we do something. (This is also the style on the lines before).

By design, all other diagram algebras are currently regarded as sub-algebras of the partition algebra. This may be up to debate, but is certainly outside the scope of this ticket.

Clear, this is not for discussion here.

comment:11 Changed 2 years ago by mantepse

  • Status changed from needs_work to needs_review

comment:12 Changed 2 years ago by git

  • Commit changed from fee85a739034b72b7fa39ac4d9be2a77bbfa5648 to ab58ad8dbc68c29df337f19fbfba4544f2237f46

Branch pushed to git repo; I updated commit sha1. New commits:

ab58ad8restructure flow

comment:13 Changed 2 years ago by mantepse

Done!

comment:14 Changed 2 years ago by dkrenn

  • Status changed from needs_review to positive_review

Thanks, LGTM.

comment:15 follow-up: Changed 2 years ago by tscrim

  • Status changed from positive_review to needs_work

There are a number of things wrong or misleading. First, self(foo) redirects to self._element_constructor_(foo) unless foo is already an element of the parent self. I generally disagree with testing self(foo) and catching an error first. This is very slow for things that inherit from AbstractPartitionDiagram. I believe you also get None in PartitionDiagrams(0) returning True whereas the current version, this is False.

I think what you should do is actually properly fix the __contains__ that are there. This might include a better check method too.

Last edited 2 years ago by tscrim (previous) (diff)

comment:16 follow-up: Changed 2 years ago by mantepse

As with your comment on the permutation code, I am completely lost.

Maybe you could provide a patch yourself, so I can see what you mean?

Patching the individual __contains__ is not the way to go in my opinion, because it is quite error prone (as visible in the original code). As I wrote above, I think it is better to have a working version first, and optimise afterwards. Note that there are many other diagram algebras that would be good to have, so it is important that the generic method works well.

comment:17 in reply to: ↑ 16 Changed 2 years ago by tscrim

Replying to mantepse:

As with your comment on the permutation code, I am completely lost.

Maybe you could provide a patch yourself, so I can see what you mean?

No because there is no patch to provide. What I am saying is that I do not think your proposed change is the way forward with reasoning why. Although I did see one typo above that I have fixed.

Patching the individual __contains__ is not the way to go in my opinion, because it is quite error prone (as visible in the original code). As I wrote above, I think it is better to have a working version first, and optimise afterwards. Note that there are many other diagram algebras that would be good to have, so it is important that the generic method works well.

It is definitely the way to go because it has better separation-of-concerns (checks are localized to each part), it is faster, and it should not be hard to do. If you want to make it work, then fix it properly.

comment:18 Changed 2 years ago by mantepse

  • Authors Martin Rubey deleted

OK, I'll abandon this for the moment, I don't understand what you want me to do (except for implementing a method for each diagram algebra separately, which I believe is wrong). I'll do the quick fix in the permutations file, but this one is too much work right now. Exams coming up.

comment:19 in reply to: ↑ 15 ; follow-up: Changed 2 years ago by dkrenn

Replying to tscrim:

There are a number of things wrong or misleading. First, self(foo) redirects to self._element_constructor_(foo) unless foo is already an element of the parent self. I generally disagree with testing self(foo) and catching an error first. This is very slow for things that inherit from AbstractPartitionDiagram. I believe you also get None in PartitionDiagrams(0) returning True whereas the current version, this is False.

I think what you should do is actually properly fix the __contains__ that are there. This might include a better check method too.

I am not sure either, why you think the proposed code is so bad. Isn't having a quite general implementation favored over individual implementations for each class. (But it might also be that I do not really understand your concerns; so, could we elaborate on this...?)

comment:20 in reply to: ↑ 19 Changed 2 years ago by tscrim

Replying to dkrenn:

Replying to tscrim:

There are a number of things wrong or misleading. First, self(foo) redirects to self._element_constructor_(foo) unless foo is already an element of the parent self. I generally disagree with testing self(foo) and catching an error first. This is very slow for things that inherit from AbstractPartitionDiagram. I believe you also get None in PartitionDiagrams(0) returning True whereas the current version, this is False.

I think what you should do is actually properly fix the __contains__ that are there. This might include a better check method too.

I am not sure either, why you think the proposed code is so bad. Isn't having a quite general implementation favored over individual implementations for each class. (But it might also be that I do not really understand your concerns; so, could we elaborate on this...?)

The __contains__ method should be fast as a general rule. Moreover, general code is usually more complicated as it has poorer separation-of-conerns. This is why we have subclasses. We also already have implementations that mostly work. I see this proposal as 1 step forward, 2 steps back.

comment:21 follow-up: Changed 2 years ago by mantepse

A few questions, comments and a clarification:

1.) I actually did propose to implement efficient methods once the generic check works, and as need arises. As you know I spent quite a bit of time speeding up the code for diagram algebras, because it was too slow for my computations. Thus, I think you could trust me that I care about speed and thought about how to achieve it.

Fun fact: removing the "specialised" TemperleyLiebDiagrams.__contains__ results in a factor 0.8 speedup of the following test:

sage: r=5; td = da.TemperleyLiebDiagrams(r); l = da.PartitionDiagrams(r).list()
sage: %timeit sum(1 for d in l if d in td)

I am guessing that this is because it first checks whether the candidate is in BrauerDiagrams, thus duplicating a lot of work.

2.) I do think the general rule is to go for correctness first, and then optimize. (This is not saying that my patch is perfect, of course. In particular, I think I should have removed the __contains__ from TemperleyLiebDiagrams and PlanarDiagrams, too.)

In the case at hand I think it is particularly important to have a well working generic check, to make it less work to implement new diagram algebras. In particular, #25637 moves three from partition_algebra.py to this module, using the same generic __contains__ method.

I do not see what "separation-of-concerns" has to do with my approach: I am simply determining containment by trying to construct the diagram.

3.) I understand that you think "it is definitively the way to go", but I do not see why. I think your argument about speed is invalid, because the current implementation is buggy, and duplication of code is often a source of bugs.

comment:22 in reply to: ↑ 21 ; follow-up: Changed 2 years ago by tscrim

Replying to mantepse:

A few questions, comments and a clarification:

1.) I actually did propose to implement efficient methods once the generic check works, and as need arises. As you know I spent quite a bit of time speeding up the code for diagram algebras, because it was too slow for my computations. Thus, I think you could trust me that I care about speed and thought about how to achieve it.

Martin, I know you care about speed. Patronizing me is not useful to the discussion.

Fun fact: removing the "specialised" TemperleyLiebDiagrams.__contains__ results in a factor 0.8 speedup of the following test:

sage: r=5; td = da.TemperleyLiebDiagrams(r); l = da.PartitionDiagrams(r).list()
sage: %timeit sum(1 for d in l if d in td)

I am guessing that this is because it first checks whether the candidate is in BrauerDiagrams, thus duplicating a lot of work.

Probably. That is an interesting data point.

2.) I do think the general rule is to go for correctness first, and then optimize. (This is not saying that my patch is perfect, of course. In particular, I think I should have removed the __contains__ from TemperleyLiebDiagrams and PlanarDiagrams, too.)

IMO that is even more a step in the wrong direction.

In the case at hand I think it is particularly important to have a well working generic check, to make it less work to implement new diagram algebras. In particular, #25637 moves three from partition_algebra.py to this module, using the same generic __contains__ method.

I feel that is a bit of a misrepresentation of those __contains__ as we are deprecating those classes.

I do not see what "separation-of-concerns" has to do with my approach: I am simply determining containment by trying to construct the diagram.

You are taking things that should be in separate classes and pulling them to one class. The containment check for being a Brauer diagram should be done by the Brauer diagram parent class after it has been determined that it is a partition diagram by the partition diagram class. Each class tests what it is suppose to be.

3.) I understand that you think "it is definitively the way to go", but I do not see why. I think your argument about speed is invalid, because the current implementation is buggy, and duplication of code is often a source of bugs.

That does not make it invalid. It just means we need to fix a bug. We are only arguing about how to fix the bug. IMO, your fix makes the code less clear, harder to optimize in the future (as any optimization would undo your change), and likely generically makes the code slower (but I have not run timings yet).

There is also very little duplication of code. The only part that is duplicated is the part that converts a non (abstract) partition diagram to an object of the parent. If you are worried about that, you can make that into a function. However, the meat of the issue has no duplication.

I wonder if the only thing that is going wrong is this part:

# check that k and -k are in the same block

comment:23 in reply to: ↑ 22 ; follow-up: Changed 2 years ago by mantepse

Replying to tscrim:

Martin, I know you care about speed. Patronizing me is not useful to the discussion.

reciprocally.

In the case at hand I think it is particularly important to have a well working generic check, to make it less work to implement new diagram algebras. In particular, #25637 moves three from partition_algebra.py to this module, using the same generic __contains__ method.

I feel that is a bit of a misrepresentation of those __contains__ as we are deprecating those classes.

No. Those classes (rooks, planar rooks and permutations) are migrated to the better framework in diagram_algebra.py, only the old implementations are deprecated.

You are taking things that should be in separate classes and pulling them to one class.

No. This is absolutely not what the patch does. I am only removing code.

Besides:

I believe you also get None in PartitionDiagrams(0) returning True [...]

No. I agree it is a good idea to check this corner case!

Anyway. My offer is as follows:

  • remove the __contains__ here
  • do #25637, so that all algebras are in this class
  • discuss and implement a better __contains__ machinery - but after having migrated the other classes from partition_algebra.py to diagram_algebra.py.

A final question:

I wonder if the only thing that is going wrong is this part:

# check that k and -k are in the same block

I don't really know. The logic is (for my brain) scattered over too many places.

The containment check for being a Brauer diagram should be done by the Brauer diagram parent class after it has been determined that it is a partition diagram by the partition diagram class. Each class tests what it is suppose to be.

However, I think it would make sense to swap the __contains__ and check methods: the meat of the check could be in __contains__ of the parent classes, and the check method could be generic in AbstractPartitionDiagram.

The advantage is speed, the cost is that one probably looses some information for the error message. For example, TemperleyLieb is the intersection of Brauer and Planar. Currently the user is told whether the diagram has bad blocks, and if all blocks are OK, whether the diagram is planar. A generic check could probably only say "This is not a Temperley-Lieb diagram".

The other way to gain performance and remove duplication of code is probably to remove AbstractSetPartition.

comment:24 in reply to: ↑ 23 Changed 2 years ago by tscrim

Replying to mantepse:

Replying to tscrim:

You are taking things that should be in separate classes and pulling them to one class.

No. This is absolutely not what the patch does. I am only removing code.

Yes it is. That is precisely what you are doing. Everything goes into the base class's __contains__, so that is the only method that is being called instead of the subclasses' __contains__.

Anyway. My offer is as follows:

  • remove the __contains__ here

This is the sticking point and what I disagree with doing. How about we fix the actual bug?

  • do #25637, so that all algebras are in this class

#25637 is basically independent of this and those old classes are meant to be deprecated. So that is irrelevant to this discussion.

  • discuss and implement a better __contains__ machinery - but after having migrated the other classes from partition_algebra.py to diagram_algebra.py.

This is not a good strategy. Fix the problem at hand. It sounds like you are suggesting to remove code just to add it back in. If you are not sure and feel it warrants a full design discussion, then let us fix the bug using the current framework, which should be easy, and then have the discussion.

A final question:

I wonder if the only thing that is going wrong is this part:

# check that k and -k are in the same block

I don't really know. The logic is (for my brain) scattered over too many places.

Are you saying you are unable to debug this with the current implementation?

The containment check for being a Brauer diagram should be done by the Brauer diagram parent class after it has been determined that it is a partition diagram by the partition diagram class. Each class tests what it is suppose to be.

However, I think it would make sense to swap the __contains__ and check methods: the meat of the check could be in __contains__ of the parent classes, and the check method could be generic in AbstractPartitionDiagram.

I generally agree with this, but your proposed change actually goes the opposite direction of this.

The advantage is speed, the cost is that one probably looses some information for the error message. For example, TemperleyLieb is the intersection of Brauer and Planar. Currently the user is told whether the diagram has bad blocks, and if all blocks are OK, whether the diagram is planar. A generic check could probably only say "This is not a Temperley-Lieb diagram".

In properly optimized code, there would be no speed advantage to having a generic method. Generic code needs to work generically, and so it cannot take advantage of special structure.

The other way to gain performance and remove duplication of code is probably to remove AbstractSetPartition.

Removing ABCs is usually independent of performance, and it generally involves more code duplication because there is less commonality with related classes.

Note: See TracTickets for help on using tickets.