Opened 8 years ago

Last modified 7 years ago

#13814 needs_work defect

LazyFamily.__eq__ gives false positives.

Reported by: cnassau Owned by: sage-combinat
Priority: major Milestone: sage-6.4
Component: combinatorics Keywords:
Cc: Merged in:
Authors: Christian Nassau Reviewers:
Report Upstream: N/A Work issues:
Branch: Commit:
Dependencies: Stopgaps:

Status badges

Description (last modified by cnassau)

LazyFamily.__eq__ occasionally returns false positives, because it only compares function names, not values. This can lead to subtle bugs later, like this one:

fun = lambda i:i
fam1 = LazyFamily((0,1),fun)
fun = lambda i:i+6
fam2 = LazyFamily((0,1),fun)
fam3 = LazyFamily((2,3),fun)
d1 = DisjointUnionEnumeratedSets((fam1,fam3))
d2 = DisjointUnionEnumeratedSets((fam2,fam3))
for u in (fam1,fam2,fam3,d1,d2):
    print list(u)

This gives

[0, 1]
[6, 7]
[8, 9]
[0, 1, 8, 9]
[0, 1, 8, 9]

because Sage thinks fam1 == fam2. The behaviour can be fixed by setting

def noteq(self,other):
    return False
LazyFamily.__eq__ = noteq

I think __eq__ should _never_ give false positives for classes that might be hashed. In this case LazyFamily.__eq__ should

  • test function equality if the index sets are finite of the same size
  • return False if one of the indexing sets is infinite

Attachments (1)

trac13814_cna.patch (8.1 KB) - added by cnassau 8 years ago.

Download all attachments as: .zip

Change History (29)

comment:1 Changed 8 years ago by cnassau

  • Description modified (diff)

comment:2 follow-up: Changed 8 years ago by cnassau

I've attached a patch that fixes the equality test for lazy families.

It leads to testsuite failures in disjoint_union_enumerated_sets.py: these failures are due to pickling problems that were hidden by the deficiency of the old __eq__ implementation.

The TestSuite? failures would probably be fixed if _test_pickling would check for

not (f != loads(dumps(f)))

rather than

f == loads(dumps(f))

I'll suggest this in a different ticket.

comment:3 Changed 8 years ago by cnassau

  • Status changed from new to needs_info

comment:4 Changed 8 years ago by cnassau

  • Status changed from needs_info to needs_review

comment:5 in reply to: ↑ description ; follow-up: Changed 8 years ago by nthiery

Hi!

Replying to cnassau:

LazyFamily.__eq__ occasionally returns false positives, because it only compares function names, not values.

Yikes. This is indeed definitely a bug!

Thanks for catching it, and sorry for commenting on it after you have implemented a first proposal of solution.

I think __eq__ should _never_ give false positives for classes that might be hashed.

+1!

In this case LazyFamily.__eq__ should

  • test function equality if the index sets are finite of the same size
  • return False if one of the indexing sets is infinite

I would much prefer to stick as much as possible to the default equality rule for composite objects: namely to compare their components for equality. A Family is composed of an indexing set I and a mapping f. Thus, ideally, two families should be equal if their indexing sets are equal and their mappings are equal. How two sets (infinite or not) compare for equality is not Family's decision to take.

Now, we have the problem that comparing mappings is delicate; first because two functions with the same code are not considered as equal, and anyway because two functions with different code might implement the same mapping, but that's undecidable.

So at this point it seems to me that the best approximation would be as follow:

  • For comparing two FiniteFamily?'s, compare the underlying indexing set and dictionary for equality (the mapping has already been computed on all the indexing set, so we might as well use it, even though this comparison will have a cost).
  • For comparing other families: test whether the two indexing sets and the two mappings compare to equal.

Potential issue: that upon pickling unpickling we get back an equal object.

Thanks again for investigating this,

Best,

Nicolas

comment:6 in reply to: ↑ 2 ; follow-up: Changed 8 years ago by nthiery

Replying to cnassau:

The TestSuite? failures would probably be fixed if _test_pickling would check for

not (f != loads(dumps(f)))

rather than

f == loads(dumps(f))

I'll suggest this in a different ticket.

eq and ne are supposed to be implemented consistently; so _test_pickling is allowed to rely on this assumption. If the assumption fails, that's an equality bug which should be fixed in eq/ne and be tested in _test_equal (whenever possible).

Cheers,

Nicolas

comment:7 in reply to: ↑ 5 ; follow-up: Changed 8 years ago by cnassau

Replying to nthiery:

So at this point it seems to me that the best approximation would be as follow:

  • For comparing two FiniteFamily?'s, compare the underlying indexing set and dictionary for equality (the mapping has already been computed on all the indexing set, so we might as well use it, even though this comparison will have a cost).
  • For comparing other families: test whether the two indexing sets and the two mappings compare to equal.

I think this is also how my patch implements this (which does not quite follow the description in my original submission).

In my patch function equality is tested like this:

  • finite family: brute force check of all values
  • infinite family: check if function objects are identical

It occured to me that we should also try this:

  • comparison of the pickle-string of the function objects
  • testing an initial finite slice of the values

I believe functions with the same pickle-representation are guaranteed to be equal, even if their pickled-function might not be usable (see failure of U3 doctests in my patch).

Potential issue: that upon pickling unpickling we get back an equal object.

comment:8 in reply to: ↑ 6 ; follow-up: Changed 8 years ago by cnassau

Replying to nthiery:

Replying to cnassau:

The TestSuite? failures would probably be fixed if _test_pickling would check for

not (f != loads(dumps(f)))

rather than

f == loads(dumps(f))

I'll suggest this in a different ticket.

eq and ne are supposed to be implemented consistently; so _test_pickling is allowed to rely on this assumption. If the assumption fails, that's an equality bug which should be fixed in eq/ne and be tested in _test_equal (whenever possible).

Well, I think it is consistent to have

   sage: LazyFamily(Integers(),lambda i:i) == LazyFamily(Integers(),lambda i:1*i)
   False
   sage: LazyFamily(Integers(),lambda i:i) != LazyFamily(Integers(),lambda i:1*i)
   False

In my reading this conforms to the standard python requirements:

hashing only requires __hash__ and __eq__ in http://docs.python.org/2/reference/datamodel.html#object.__hash__

The document also says

There are no implied relationships among the comparison operators. The truth of x==y does not imply that x!=y is false.

Cheers, Christian

comment:9 in reply to: ↑ 7 ; follow-up: Changed 8 years ago by nthiery

Replying to cnassau:

I think this is also how my patch implements this (which does not quite follow the description in my original submission).

In my patch function equality is tested like this:

  • finite family: brute force check of all values
  • infinite family: check if function objects are identical

There is a slight difference: I prefer the distinction to be based on the class, namely whether we are manipulating two FiniteFamily?'s or not.

By the way, please use:

    for i in self.set:

rather than:

    lst = list(self.set)
    for i in lst

It occured to me that we should also try this:

  • comparison of the pickle-string of the function objects

This sounds reasonable, but needs to be heavily tested for all sorts of function objects.

  • testing an initial finite slice of the values

I can see the point! Still, I'd rather stick to a behavior that depends only on the class, rather than on some magic constant. Otherwise, this is likely to cause bugs that won't be caught by testing since they will only show up for large objects.

Cheers,

Nicolas

comment:10 in reply to: ↑ 9 ; follow-up: Changed 8 years ago by cnassau

Replying to nthiery:

Replying to cnassau:

I think this is also how my patch implements this (which does not quite follow the description in my original submission).

In my patch function equality is tested like this:

  • finite family: brute force check of all values
  • infinite family: check if function objects are identical

There is a slight difference: I prefer the distinction to be based on the class, namely whether we are manipulating two FiniteFamily?'s or not.

This has some really bad consequences if (as I believe) we're not able to reliably test functions for equality. We would then have to live with

sage: LazyFamily([1,2,3], lambda i:i) == LazyFamily([1,2,3], lambda i:i)
False

I would find it better to use a brute force check for those finite index sets that implement a list() method:

try:
    for i in self.set.list():
        if self.function(i) != other.function(i):
            return False
    return True
except NotImplementedError:
    pass

As noted in src/categories/enumerated_sets.py a set can choose to not implement list if it is probably too large to be useful.

By the way, please use:

    for i in self.set:

rather than:

    lst = list(self.set)
    for i in lst

The idea was to use self.set.list() as explained above, because then the exhaustive check would be skipped for sets that are too big and know about it.

It occured to me that we should also try this:

  • comparison of the pickle-string of the function objects

This sounds reasonable, but needs to be heavily tested for all sorts of function objects.

I've changed my mind: this is not at all a good idea: in my tests an unpickle_function(lambda i:i) includes a reference to the doctest where it was defined. That's not good at all if we want to test for function equality... for similar reasons it seems bad to use the bytecode disassembly from the dis module...

Could you live with the self.set.list() suggestion?

comment:11 in reply to: ↑ 10 Changed 8 years ago by nthiery

Replying to cnassau:

This has some really bad consequences if (as I believe) we're not able to reliably test functions for equality. We would then have to live with

sage: LazyFamily([1,2,3], lambda i:i) == LazyFamily([1,2,3], lambda i:i)
False

That is perfectly fine for me: if the caller asked explicitly for a lazy family, then (s)he should expect the comparison to be lazy, and thus cheap but incomplete. If she is willing to pay the price of computing the function on all indices, then a FiniteFamily? is what she wants.

As noted in src/categories/enumerated_sets.py a set can choose to not implement list if it is probably too large to be useful.

Yup. But there are a lot of sets, like Permutations(n) that can be very small or very large, and such sets cannot make a good decision. At the end of the day, it's better to take the decision upon building the family (by asking explicitly for a lazy family or not).

I've changed my mind: this is not at all a good idea: in my tests an unpickle_function(lambda i:i) includes a reference to the doctest where it was defined. That's not good at all if we want to test for function equality... for similar reasons it seems bad to use the bytecode disassembly from the dis module...

Ok!

Kind regards,

Nicolas

comment:12 follow-up: Changed 8 years ago by cnassau

Ok, so I'll prepare a new patch which

  • does not compare function values if the indexing set is finite
  • recognizes identical function objects
  • uses self.function.__eq__ if this is available
  • otherwise treats functions as distinct

We will then have, for example,

sage: LazyFamily([1,2,3], lambda i:i) == LazyFamily([1,2,3], lambda i:i)
False
sage: LazyFamily([1,2,3], lambda i:i) != LazyFamily([1,2,3], lambda i:i)
False

I predict that quite a number of doctests will have to be modified. There will be new pickling failures, because Sage will often not be able to verify f == load(dumps(f)). If we switched to not(f!=load(dumps(f))) the doctests could probably remain as they are. I'll post more on this when I have the patch and some numbers.

Cheers, Christian

comment:13 Changed 8 years ago by cnassau

By the way, would you mind if I also included the change proposed in ticket #13811 in this patch?

comment:14 in reply to: ↑ 8 Changed 8 years ago by nthiery

Hi Christian,

Replying to cnassau:

eq and ne are supposed to be implemented consistently; so _test_pickling is allowed to rely on this assumption. If the assumption fails, that's an equality bug which should be fixed in eq/ne and be tested in _test_equal (whenever possible).

Well, I think it is consistent to have

   sage: LazyFamily(Integers(),lambda i:i) == LazyFamily(Integers(),lambda i:1*i)
   False
   sage: LazyFamily(Integers(),lambda i:i) != LazyFamily(Integers(),lambda i:1*i)
   False

In my reading this conforms to the standard python requirements:

hashing only requires __hash__ and __eq__ in http://docs.python.org/2/reference/datamodel.html#object.__hash__

The document also says

There are no implied relationships among the comparison operators. The truth of x==y does not imply that x!=y is false.

That's right for Python. However Sage is more stringent about this and requires == and != to be consistent. Hmm, I am not sure this is documented anywhere though. Basically the idea is that we already have trouble maintaining a consistent specification and implementation for ==, so we can't really afford to also maintain a possibly different specification for != ...

Cheers,

Nicolas

comment:15 in reply to: ↑ 12 Changed 8 years ago by nthiery

Replying to cnassau:

Ok, so I'll prepare a new patch which

  • does not compare function values if the indexing set is finite
  • recognizes identical function objects
  • uses self.function.__eq__ if this is available
  • otherwise treats functions as distinct

Excellent. Thanks!

We will then have, for example,

sage: LazyFamily([1,2,3], lambda i:i) == LazyFamily([1,2,3], lambda i:i)
False
sage: LazyFamily([1,2,3], lambda i:i) != LazyFamily([1,2,3], lambda i:i)
False

I really vote for == and != being consistent. Comments anyone else?

I predict that quite a number of doctests will have to be modified. There will be new pickling failures, because Sage will often not be able to verify f == load(dumps(f)).

Do you mind post here a quick selection of typical doctests / pickling failures? That will be a good source for me and others to think about what the right approach shall be.

Thanks for your ongoing work!

Nicolas

PS: for the other ticket, since the changes are fairly independent, please keep them separate!

comment:16 follow-up: Changed 8 years ago by cnassau

I have implemented a new patch, based on Sage 5.5.rc0. Here are the key features:

The new LazyFamily.__eq__

  • does not compare function values (even if the indexing set is finite)
  • recognizes identical function objects
  • uses self.function.__eq__ if this is available
  • otherwise treats functions as distinct

Furthermore

  • "f != g" is exactly equivalent to "not(f==g)"

Contrary to my expectations there were not that many doctest failures. The only affectes files were family.py and disjoint_union_enumerated_sets.py.

Originally I was hoping to present two patches, plus numbers/statistics about doctest failures and the like.... but I'm a little bit too exhausted right now. And I think the new patch is probably good enough as it stands ;-)

Cheers,
Christian

comment:17 in reply to: ↑ 16 ; follow-up: Changed 8 years ago by nthiery

Replying to cnassau:

I have implemented a new patch, based on Sage 5.5.rc0. Here are the key features:

The new LazyFamily.__eq__

  • does not compare function values (even if the indexing set is finite)
  • recognizes identical function objects
  • uses self.function.__eq__ if this is available
  • otherwise treats functions as distinct

Furthermore

  • "f != g" is exactly equivalent to "not(f==g)"

Great, thanks!

Contrary to my expectations there were not that many doctest failures. The only affectes files were family.py and disjoint_union_enumerated_sets.py.

That's definitely good news.

Should I remove the previous patch? Then we won't have to specify which patch to apply.

A couple questions:

Would there be a problem to just compare the functions with:

   self.function == other.function

Rather than doing the check for is and then for eq?

I am surprised about your comment about Permutations not being picklable. This works for me:

  sage: P = Permutations(3) 
  sage: P == Permutations(3)
  True
  sage: P == loads(dumps(P))
  True

Could the hash be calculated using both the hash of the indexing set and the function?

Please take the occasion to rewrite WARNING into .. WARNING:: (see the developers manual for the exact syntax).

Where is != implemented?

Cheers,

Nicolas

comment:18 in reply to: ↑ 17 ; follow-up: Changed 8 years ago by cnassau

Should I remove the previous patch? Then we won't have to specify which patch to apply.

Yes, please! I would do it myself if I knew how to do it ;-)

A couple questions:

Would there be a problem to just compare the functions with:

   self.function == other.function

Rather than doing the check for is and then for eq?

No problem, that's definitly better.

I am surprised about your comment about Permutations not being picklable. This works for me:

  sage: P = Permutations(3) 
  sage: P == Permutations(3)
  True
  sage: P == loads(dumps(P))
  True

This is really an issue with sage.misc.fpickle: the function "n -> Permutations(n)" can be pickled, but the unpickled version has forgotten the default value of the second argument of Permutations(n,k):

   sage: f = Permutations 
   sage: from sage.misc.fpickle import pickle_function, unpickle_function 
   sage: g = unpickle_function(pickle_function(f)) 
   sage: print f(5) 
   Standard permutations of 5
   sage: print g(5) 
   Traceback (most recent call last):
   ...
   TypeError: Permutations() takes exactly 2 arguments (1 given)

Could the hash be calculated using both the hash of the indexing set and the function?

I'm now using the sum of both hashes.

Please take the occasion to rewrite WARNING into .. WARNING:: (see the developers manual for the exact syntax).

Fixed.

Where is != implemented?

That's a good question: grep could find a handful of __ne__ in the Sage code base, but none that would be a candidate for LazyFamily.__ne__. The python manual clearly says that one should define __ne__ if __eq__ is defined; that appears not to be the case in class SageObject, so my guess is that Python is using a default implementation.

By the way, both Parent and SageObject seem to test the consistency of __eq__ and __ne__ in the test suite.

Cheers,
Christian

comment:19 in reply to: ↑ 18 ; follow-ups: Changed 8 years ago by nthiery

Hi Christian,

Thanks for your changes!

Replying to cnassau:

Should I remove the previous patch? Then we won't have to specify which patch to apply.

Done.

This is really an issue with sage.misc.fpickle: the function "n -> Permutations(n)" can be pickled, but the unpickled version has forgotten the default value of the second argument of Permutations(n,k): ...

I see!

Yes, this is definitely a misfeature with sage.misc.fpickle. It should be fixed to use standard pickling when it can (e.g. for a function defined in a library). And it should return something that can be unpickled with a plain loads. And, as mentioned in the code of family it should be registered to copy_reg so that it would be called automatically by dumps without needing to pollute one's code (like family) with it. Would you be so kind as to open a ticket about this if there is not already one?

In order to separate the two issues, what about changing the failing examples to use Partitions (which pickles smoothly), rather than making it more complicated with some workaround that won't be needed hopefully sometime soon, and is not vital for most users?

Where is != implemented?

That's a good question: grep could find a handful of __ne__ in the Sage code base, but none that would be a candidate for LazyFamily.__ne__. The python manual clearly says that one should define __ne__ if __eq__ is defined; that appears not to be the case in class SageObject, so my guess is that Python is using a default implementation.

Ok. Then you may want to explicitly implement ne as not eq, just to be sure.

By the way, both Parent and SageObject seem to test the consistency of __eq__ and __ne__ in the test suite.

That's good :-)

Cheers,

Nicolas

comment:20 in reply to: ↑ 19 ; follow-up: Changed 8 years ago by cnassau

In order to separate the two issues, what about changing the failing examples to use Partitions (which pickles smoothly), rather than making it more complicated with some workaround that won't be needed hopefully sometime soon, and is not vital for most users?

Actually, Partitions doesn't really pickle very well either:

sage: # first attempt
sage: f=Family(NonNegativeIntegers(),Partitions) ; f
Lazy family (Partitions(i))_{i in Non negative integers}
sage: loads(dumps(f))
Traceback (most recent call last)
...
ValueError: Cannot pickle code objects from closures

sage: # second attempt
sage: f=Family(NonNegativeIntegers(),lambda n:Partitions(n))
sage: g=loads(dumps(f))
sage: f[5]
Partitions of the integer 5
sage: g[5]
Traceback (most recent call last)
...
NameError: global name 'Partitions' is not defined

I can use Simplex for a smooth example:

sage: f=Family(NonNegativeIntegers(),Simplex)
sage: g=loads(dumps(f))
sage: f[5]
(0, 1, 2, 3, 4, 5)
sage: g[5]
(0, 1, 2, 3, 4, 5)

I'll try to prepare a new patch tomorrow.

Cheers,
Christian

comment:21 in reply to: ↑ 19 Changed 8 years ago by cnassau

Replying to nthiery:

Would you be so kind as to open a ticket about this if there is not already one?

Done: this is now #13835

Cheers,
Christian

comment:22 in reply to: ↑ 20 Changed 8 years ago by nthiery

Replying to cnassau:

In order to separate the two issues, what about changing the failing examples to use Partitions (which pickles smoothly), rather than making it more complicated with some workaround that won't be needed hopefully sometime soon, and is not vital for most users?

Actually, Partitions doesn't really pickle very well either:

Oops, sorry; I had forgotten that Partitions was fixed, but only in the Sage-Combinat queue for the moment. Simplex will do.

Cheers,

comment:23 Changed 8 years ago by cnassau

Here's a new patch. This implements a __ne__ on the AbstractFamily base class and uses SymmetricGroups in place of Permutations in the disjoint union doctests.

Actually, it wasn' easy to find an example of a Family where the pickling tests worked smoothly: Simplex failed (Simplex(n) is not a set), GF failed (elements not disjoint because of a common 0), Subsets failed (same defect as the original Partitions example)...

Cheers,
Christian

Changed 8 years ago by cnassau

comment:24 Changed 8 years ago by jdemeyer

  • Milestone changed from sage-5.11 to sage-5.12

comment:25 Changed 8 years ago by chapoton

  • Status changed from needs_review to needs_work

needs rebase

comment:26 Changed 7 years ago by vbraun_spam

  • Milestone changed from sage-6.1 to sage-6.2

comment:27 Changed 7 years ago by vbraun_spam

  • Milestone changed from sage-6.2 to sage-6.3

comment:28 Changed 7 years ago by vbraun_spam

  • Milestone changed from sage-6.3 to sage-6.4
Note: See TracTickets for help on using tickets.