Opened 8 years ago
Last modified 7 years ago
#13814 needs_work defect
LazyFamily.__eq__ gives false positives.
Reported by: | cnassau | Owned by: | sage-combinat |
---|---|---|---|
Priority: | major | Milestone: | sage-6.4 |
Component: | combinatorics | Keywords: | |
Cc: | Merged in: | ||
Authors: | Christian Nassau | Reviewers: | |
Report Upstream: | N/A | Work issues: | |
Branch: | Commit: | ||
Dependencies: | Stopgaps: |
Description (last modified by )
LazyFamily.__eq__
occasionally returns false positives, because it only compares function names, not values. This can lead to subtle bugs later, like this one:
fun = lambda i:i fam1 = LazyFamily((0,1),fun) fun = lambda i:i+6 fam2 = LazyFamily((0,1),fun) fam3 = LazyFamily((2,3),fun) d1 = DisjointUnionEnumeratedSets((fam1,fam3)) d2 = DisjointUnionEnumeratedSets((fam2,fam3)) for u in (fam1,fam2,fam3,d1,d2): print list(u)
This gives
[0, 1] [6, 7] [8, 9] [0, 1, 8, 9] [0, 1, 8, 9]
because Sage thinks fam1 == fam2
. The behaviour can be fixed by setting
def noteq(self,other): return False LazyFamily.__eq__ = noteq
I think __eq__
should _never_ give false positives for classes that might be hashed. In this case LazyFamily.__eq__
should
- test function equality if the index sets are finite of the same size
- return
False
if one of the indexing sets is infinite
Attachments (1)
Change History (29)
comment:1 Changed 8 years ago by
- Description modified (diff)
comment:2 follow-up: ↓ 6 Changed 8 years ago by
comment:3 Changed 8 years ago by
- Status changed from new to needs_info
comment:4 Changed 8 years ago by
- Status changed from needs_info to needs_review
comment:5 in reply to: ↑ description ; follow-up: ↓ 7 Changed 8 years ago by
Hi!
Replying to cnassau:
LazyFamily.__eq__
occasionally returns false positives, because it only compares function names, not values.
Yikes. This is indeed definitely a bug!
Thanks for catching it, and sorry for commenting on it after you have implemented a first proposal of solution.
I think
__eq__
should _never_ give false positives for classes that might be hashed.
+1!
In this case
LazyFamily.__eq__
should
- test function equality if the index sets are finite of the same size
- return
False
if one of the indexing sets is infinite
I would much prefer to stick as much as possible to the default
equality rule for composite objects: namely to compare their
components for equality. A Family is composed of an indexing set I
and a mapping f
. Thus, ideally, two families should be equal if
their indexing sets are equal and their mappings are equal. How two
sets (infinite or not) compare for equality is not Family's decision
to take.
Now, we have the problem that comparing mappings is delicate; first because two functions with the same code are not considered as equal, and anyway because two functions with different code might implement the same mapping, but that's undecidable.
So at this point it seems to me that the best approximation would be as follow:
- For comparing two FiniteFamily?'s, compare the underlying indexing set and dictionary for equality (the mapping has already been computed on all the indexing set, so we might as well use it, even though this comparison will have a cost).
- For comparing other families: test whether the two indexing sets and the two mappings compare to equal.
Potential issue: that upon pickling unpickling we get back an equal object.
Thanks again for investigating this,
Best,
Nicolas
comment:6 in reply to: ↑ 2 ; follow-up: ↓ 8 Changed 8 years ago by
Replying to cnassau:
The TestSuite? failures would probably be fixed if
_test_pickling
would check fornot (f != loads(dumps(f)))rather than
f == loads(dumps(f))I'll suggest this in a different ticket.
eq
and
ne
are supposed to be implemented consistently; so
_test_pickling is allowed to rely on this assumption. If the
assumption fails, that's an equality bug which should be fixed in
eq/ne
and be tested in
_test_equal
(whenever possible).
Cheers,
Nicolas
comment:7 in reply to: ↑ 5 ; follow-up: ↓ 9 Changed 8 years ago by
Replying to nthiery:
So at this point it seems to me that the best approximation would be as follow:
- For comparing two FiniteFamily?'s, compare the underlying indexing set and dictionary for equality (the mapping has already been computed on all the indexing set, so we might as well use it, even though this comparison will have a cost).
- For comparing other families: test whether the two indexing sets and the two mappings compare to equal.
I think this is also how my patch implements this (which does not quite follow the description in my original submission).
In my patch function equality is tested like this:
- finite family: brute force check of all values
- infinite family: check if function objects are identical
It occured to me that we should also try this:
- comparison of the pickle-string of the function objects
- testing an initial finite slice of the values
I believe functions with the same pickle-representation are guaranteed to be equal, even if their pickled-function might not be usable (see failure of U3 doctests in my patch).
Potential issue: that upon pickling unpickling we get back an equal object.
comment:8 in reply to: ↑ 6 ; follow-up: ↓ 14 Changed 8 years ago by
Replying to nthiery:
Replying to cnassau:
The TestSuite? failures would probably be fixed if
_test_pickling
would check fornot (f != loads(dumps(f)))rather than
f == loads(dumps(f))I'll suggest this in a different ticket.
eq
and
ne
are supposed to be implemented consistently; so _test_pickling is allowed to rely on this assumption. If the assumption fails, that's an equality bug which should be fixed in
eq/ne
and be tested in
_test_equal
(whenever possible).
Well, I think it is consistent to have
sage: LazyFamily(Integers(),lambda i:i) == LazyFamily(Integers(),lambda i:1*i) False sage: LazyFamily(Integers(),lambda i:i) != LazyFamily(Integers(),lambda i:1*i) False
In my reading this conforms to the standard python requirements:
hashing only requires
__hash__
and__eq__
inhttp://docs.python.org/2/reference/datamodel.html#object.__hash__
The document also says
There are no implied relationships among the comparison operators. The truth of x==y does not imply that x!=y is false.
Cheers, Christian
comment:9 in reply to: ↑ 7 ; follow-up: ↓ 10 Changed 8 years ago by
Replying to cnassau:
I think this is also how my patch implements this (which does not quite follow the description in my original submission).
In my patch function equality is tested like this:
- finite family: brute force check of all values
- infinite family: check if function objects are identical
There is a slight difference: I prefer the distinction to be based on the class, namely whether we are manipulating two FiniteFamily?'s or not.
By the way, please use:
for i in self.set:
rather than:
lst = list(self.set) for i in lst
It occured to me that we should also try this:
- comparison of the pickle-string of the function objects
This sounds reasonable, but needs to be heavily tested for all sorts of function objects.
- testing an initial finite slice of the values
I can see the point! Still, I'd rather stick to a behavior that depends only on the class, rather than on some magic constant. Otherwise, this is likely to cause bugs that won't be caught by testing since they will only show up for large objects.
Cheers,
Nicolas
comment:10 in reply to: ↑ 9 ; follow-up: ↓ 11 Changed 8 years ago by
Replying to nthiery:
Replying to cnassau:
I think this is also how my patch implements this (which does not quite follow the description in my original submission).
In my patch function equality is tested like this:
- finite family: brute force check of all values
- infinite family: check if function objects are identical
There is a slight difference: I prefer the distinction to be based on the class, namely whether we are manipulating two FiniteFamily?'s or not.
This has some really bad consequences if (as I believe) we're not able to reliably test functions for equality. We would then have to live with
sage: LazyFamily([1,2,3], lambda i:i) == LazyFamily([1,2,3], lambda i:i) False
I would find it better to use a brute force check for those finite index sets that implement a list()
method:
try: for i in self.set.list(): if self.function(i) != other.function(i): return False return True except NotImplementedError: pass
As noted in src/categories/enumerated_sets.py
a set can choose to not implement list
if it is probably too large to be useful.
By the way, please use:
for i in self.set:rather than:
lst = list(self.set) for i in lst
The idea was to use self.set.list()
as explained above, because then the exhaustive check would be skipped for sets that are too big and know about it.
It occured to me that we should also try this:
- comparison of the pickle-string of the function objects
This sounds reasonable, but needs to be heavily tested for all sorts of function objects.
I've changed my mind: this is not at all a good idea: in my tests an unpickle_function(lambda i:i)
includes a reference to the doctest where it was defined. That's not good at all if we want to test for function equality... for similar reasons it seems bad to use the bytecode disassembly from the dis module...
Could you live with the self.set.list()
suggestion?
comment:11 in reply to: ↑ 10 Changed 8 years ago by
Replying to cnassau:
This has some really bad consequences if (as I believe) we're not able to reliably test functions for equality. We would then have to live with
sage: LazyFamily([1,2,3], lambda i:i) == LazyFamily([1,2,3], lambda i:i) False
That is perfectly fine for me: if the caller asked explicitly for a lazy family, then (s)he should expect the comparison to be lazy, and thus cheap but incomplete. If she is willing to pay the price of computing the function on all indices, then a FiniteFamily? is what she wants.
As noted in
src/categories/enumerated_sets.py
a set can choose to not implementlist
if it is probably too large to be useful.
Yup. But there are a lot of sets, like Permutations(n) that can be very small or very large, and such sets cannot make a good decision. At the end of the day, it's better to take the decision upon building the family (by asking explicitly for a lazy family or not).
I've changed my mind: this is not at all a good idea: in my tests an
unpickle_function(lambda i:i)
includes a reference to the doctest where it was defined. That's not good at all if we want to test for function equality... for similar reasons it seems bad to use the bytecode disassembly from the dis module...
Ok!
Kind regards,
Nicolas
comment:12 follow-up: ↓ 15 Changed 8 years ago by
Ok, so I'll prepare a new patch which
- does not compare function values if the indexing set is finite
- recognizes identical function objects
- uses
self.function.__eq__
if this is available - otherwise treats functions as distinct
We will then have, for example,
sage: LazyFamily([1,2,3], lambda i:i) == LazyFamily([1,2,3], lambda i:i) False sage: LazyFamily([1,2,3], lambda i:i) != LazyFamily([1,2,3], lambda i:i) False
I predict that quite a number of doctests will have to be modified. There will be new pickling failures, because Sage will often not be able to verify f == load(dumps(f))
. If we switched to not(f!=load(dumps(f)))
the doctests could probably remain as they are. I'll post more on this when I have the patch and some numbers.
Cheers, Christian
comment:13 Changed 8 years ago by
By the way, would you mind if I also included the change proposed in ticket #13811 in this patch?
comment:14 in reply to: ↑ 8 Changed 8 years ago by
Hi Christian,
Replying to cnassau:
eq
and
ne
are supposed to be implemented consistently; so _test_pickling is allowed to rely on this assumption. If the assumption fails, that's an equality bug which should be fixed in
eq/ne
and be tested in
_test_equal
(whenever possible).
Well, I think it is consistent to have
sage: LazyFamily(Integers(),lambda i:i) == LazyFamily(Integers(),lambda i:1*i) False sage: LazyFamily(Integers(),lambda i:i) != LazyFamily(Integers(),lambda i:1*i) FalseIn my reading this conforms to the standard python requirements:
hashing only requires
__hash__
and__eq__
inhttp://docs.python.org/2/reference/datamodel.html#object.__hash__
The document also says
There are no implied relationships among the comparison operators. The truth of x==y does not imply that x!=y is false.
That's right for Python. However Sage is more stringent about this and requires == and != to be consistent. Hmm, I am not sure this is documented anywhere though. Basically the idea is that we already have trouble maintaining a consistent specification and implementation for ==, so we can't really afford to also maintain a possibly different specification for != ...
Cheers,
Nicolas
comment:15 in reply to: ↑ 12 Changed 8 years ago by
Replying to cnassau:
Ok, so I'll prepare a new patch which
- does not compare function values if the indexing set is finite
- recognizes identical function objects
- uses
self.function.__eq__
if this is available- otherwise treats functions as distinct
Excellent. Thanks!
We will then have, for example,
sage: LazyFamily([1,2,3], lambda i:i) == LazyFamily([1,2,3], lambda i:i) False sage: LazyFamily([1,2,3], lambda i:i) != LazyFamily([1,2,3], lambda i:i) False
I really vote for == and != being consistent. Comments anyone else?
I predict that quite a number of doctests will have to be modified. There will be new pickling failures, because Sage will often not be able to verify
f == load(dumps(f))
.
Do you mind post here a quick selection of typical doctests / pickling failures? That will be a good source for me and others to think about what the right approach shall be.
Thanks for your ongoing work!
Nicolas
PS: for the other ticket, since the changes are fairly independent, please keep them separate!
comment:16 follow-up: ↓ 17 Changed 8 years ago by
I have implemented a new patch, based on Sage 5.5.rc0. Here are the key features:
The new LazyFamily.__eq__
- does not compare function values (even if the indexing set is finite)
- recognizes identical function objects
- uses
self.function.__eq__
if this is available - otherwise treats functions as distinct
Furthermore
- "
f != g
" is exactly equivalent to "not(f==g)
"
Contrary to my expectations there were not that many doctest failures. The only affectes files were family.py
and
disjoint_union_enumerated_sets.py
.
Originally I was hoping to present two patches, plus numbers/statistics about doctest failures and the like.... but I'm a little bit too exhausted right now. And I think the new patch is probably good enough as it stands ;-)
Cheers,
Christian
comment:17 in reply to: ↑ 16 ; follow-up: ↓ 18 Changed 8 years ago by
Replying to cnassau:
I have implemented a new patch, based on Sage 5.5.rc0. Here are the key features:
The new
LazyFamily.__eq__
- does not compare function values (even if the indexing set is finite)
- recognizes identical function objects
- uses
self.function.__eq__
if this is available- otherwise treats functions as distinct
Furthermore
- "
f != g
" is exactly equivalent to "not(f==g)
"
Great, thanks!
Contrary to my expectations there were not that many doctest failures. The only affectes files were
family.py
and
disjoint_union_enumerated_sets.py
.
That's definitely good news.
Should I remove the previous patch? Then we won't have to specify which patch to apply.
A couple questions:
Would there be a problem to just compare the functions with:
self.function == other.function
Rather than doing the check for is and then for eq?
I am surprised about your comment about Permutations not being picklable. This works for me:
sage: P = Permutations(3) sage: P == Permutations(3) True sage: P == loads(dumps(P)) True
Could the hash be calculated using both the hash of the indexing set and the function?
Please take the occasion to rewrite WARNING
into
.. WARNING::
(see the developers manual for the exact syntax).
Where is !=
implemented?
Cheers,
Nicolas
comment:18 in reply to: ↑ 17 ; follow-up: ↓ 19 Changed 8 years ago by
Should I remove the previous patch? Then we won't have to specify which patch to apply.
Yes, please! I would do it myself if I knew how to do it ;-)
A couple questions:
Would there be a problem to just compare the functions with:
self.function == other.functionRather than doing the check for is and then for eq?
No problem, that's definitly better.
I am surprised about your comment about Permutations not being picklable. This works for me:
sage: P = Permutations(3) sage: P == Permutations(3) True sage: P == loads(dumps(P)) True
This is really an issue with sage.misc.fpickle
: the function "n -> Permutations(n)
" can be pickled, but the unpickled version has forgotten the default value of the second argument of
Permutations(n,k)
:
sage: f = Permutations sage: from sage.misc.fpickle import pickle_function, unpickle_function sage: g = unpickle_function(pickle_function(f)) sage: print f(5) Standard permutations of 5 sage: print g(5) Traceback (most recent call last): ... TypeError: Permutations() takes exactly 2 arguments (1 given)
Could the hash be calculated using both the hash of the indexing set and the function?
I'm now using the sum of both hashes.
Please take the occasion to rewrite
WARNING
into.. WARNING::
(see the developers manual for the exact syntax).
Fixed.
Where is
!=
implemented?
That's a good question: grep
could find a handful of
__ne__
in the Sage code base, but none that would be a candidate for LazyFamily.__ne__
. The python manual clearly says that one should define __ne__
if __eq__
is defined; that appears not to be the case in class SageObject
, so my guess is that Python is using a default implementation.
By the way, both Parent
and SageObject
seem to test the consistency of __eq__
and __ne__
in the test suite.
Cheers,
Christian
comment:19 in reply to: ↑ 18 ; follow-ups: ↓ 20 ↓ 21 Changed 8 years ago by
Hi Christian,
Thanks for your changes!
Replying to cnassau:
Should I remove the previous patch? Then we won't have to specify which patch to apply.
Done.
This is really an issue with
sage.misc.fpickle
: the function "n -> Permutations(n)
" can be pickled, but the unpickled version has forgotten the default value of the second argument of
Permutations(n,k)
: ...
I see!
Yes, this is definitely a misfeature with sage.misc.fpickle. It should be fixed to use standard pickling when it can (e.g. for a function defined in a library). And it should return something that can be unpickled with a plain loads. And, as mentioned in the code of family it should be registered to copy_reg so that it would be called automatically by dumps without needing to pollute one's code (like family) with it. Would you be so kind as to open a ticket about this if there is not already one?
In order to separate the two issues, what about changing the failing examples to use Partitions (which pickles smoothly), rather than making it more complicated with some workaround that won't be needed hopefully sometime soon, and is not vital for most users?
Where is
!=
implemented?
That's a good question:
grep
could find a handful of
__ne__
in the Sage code base, but none that would be a candidate forLazyFamily.__ne__
. The python manual clearly says that one should define__ne__
if__eq__
is defined; that appears not to be the case in classSageObject
, so my guess is that Python is using a default implementation.
Ok. Then you may want to explicitly implement ne as not eq, just to be sure.
By the way, both
Parent
andSageObject
seem to test the consistency of__eq__
and__ne__
in the test suite.
That's good :-)
Cheers,
Nicolas
comment:20 in reply to: ↑ 19 ; follow-up: ↓ 22 Changed 8 years ago by
In order to separate the two issues, what about changing the failing examples to use Partitions (which pickles smoothly), rather than making it more complicated with some workaround that won't be needed hopefully sometime soon, and is not vital for most users?
Actually, Partitions
doesn't really pickle very well either:
sage: # first attempt sage: f=Family(NonNegativeIntegers(),Partitions) ; f Lazy family (Partitions(i))_{i in Non negative integers} sage: loads(dumps(f)) Traceback (most recent call last) ... ValueError: Cannot pickle code objects from closures sage: # second attempt sage: f=Family(NonNegativeIntegers(),lambda n:Partitions(n)) sage: g=loads(dumps(f)) sage: f[5] Partitions of the integer 5 sage: g[5] Traceback (most recent call last) ... NameError: global name 'Partitions' is not defined
I can use Simplex
for a smooth example:
sage: f=Family(NonNegativeIntegers(),Simplex) sage: g=loads(dumps(f)) sage: f[5] (0, 1, 2, 3, 4, 5) sage: g[5] (0, 1, 2, 3, 4, 5)
I'll try to prepare a new patch tomorrow.
Cheers,
Christian
comment:21 in reply to: ↑ 19 Changed 8 years ago by
comment:22 in reply to: ↑ 20 Changed 8 years ago by
Replying to cnassau:
In order to separate the two issues, what about changing the failing examples to use Partitions (which pickles smoothly), rather than making it more complicated with some workaround that won't be needed hopefully sometime soon, and is not vital for most users?
Actually,
Partitions
doesn't really pickle very well either:
Oops, sorry; I had forgotten that Partitions was fixed, but only in the Sage-Combinat queue for the moment. Simplex will do.
Cheers,
comment:23 Changed 8 years ago by
Here's a new patch. This implements a __ne__
on the AbstractFamily
base class and uses SymmetricGroups
in place of Permutations
in the disjoint union doctests.
Actually, it wasn' easy to find an example of a Family where the pickling tests worked smoothly: Simplex
failed (Simplex(n)
is not a set), GF
failed (elements not disjoint because of a common 0
), Subsets
failed (same defect as the original Partitions
example)...
Cheers,
Christian
Changed 8 years ago by
comment:24 Changed 8 years ago by
- Milestone changed from sage-5.11 to sage-5.12
comment:26 Changed 7 years ago by
- Milestone changed from sage-6.1 to sage-6.2
comment:27 Changed 7 years ago by
- Milestone changed from sage-6.2 to sage-6.3
comment:28 Changed 7 years ago by
- Milestone changed from sage-6.3 to sage-6.4
I've attached a patch that fixes the equality test for lazy families.
It leads to testsuite failures in
disjoint_union_enumerated_sets.py
: these failures are due to pickling problems that were hidden by the deficiency of the old__eq__
implementation.The TestSuite? failures would probably be fixed if
_test_pickling
would check forrather than
I'll suggest this in a different ticket.