Opened 4 years ago

Last modified 3 years ago

#23324 needs_work defect

surprising behaviour of Set

Reported by: mantepse Owned by:
Priority: major Milestone: sage-8.0
Component: misc Keywords:
Cc: Merged in:
Authors: Martin Rubey Reviewers:
Report Upstream: N/A Work issues:
Branch: u/mantepse/surprising_behaviour_of_set (Commits, GitHub, GitLab) Commit: ba613e9de4136b386c156e6c1cb1c8fceebc9ecf
Dependencies: Stopgaps:

Status badges

Description (last modified by jdemeyer)

sage: Set([[], []]).cardinality()
2

In earlier versions (before #22597), a TypeError: unhashable type: 'list' was raised.

cardinality should raise an error if the elements of the set are unhashable.

Change History (21)

comment:1 Changed 4 years ago by vdelecroix

The Set should not even be created.

comment:2 Changed 4 years ago by jdemeyer

The problem is that we do want to support certain infinite sets which cannot be listed, so simply listing all elements is not a solution.

comment:3 follow-up: Changed 4 years ago by vdelecroix

I don't understand your point, the problematic code is sage/sets/set.py line 206-211

    try:
        X = frozenset(X)
    except TypeError:
        return Set_object(X)
    else:
        return Set_object_enumerated(X)

It does expand X whatever it is. If the input X contains some non hashable object, a kind of lazy set is returned (Set_object) and not a proper finite set

sage: S = Set([1,1,1,[]])
sage: S
Set of elements of [1, 1, 1, []]
sage: S.cardinality()
4

And I think it should just raise a TypeError as it used to be. Support for non hashable elements should not be dealt with in the main constructor.

Last edited 4 years ago by vdelecroix (previous) (diff)

comment:4 in reply to: ↑ 3 Changed 4 years ago by jdemeyer

  • Description modified (diff)

Replying to vdelecroix:

it should just raise a TypeError as it used to be.

This "used to be" the case only for certain specific types (which included list). It really makes no sense to say "for these specific types, we require that the elements of the set are hashable, but for other types not". The only sensible options are to have such requirement never (which is the current implementation) or always.

comment:5 follow-up: Changed 4 years ago by vdelecroix

Does the following make sense!?

sage: Set([1, 1, 2])
{1, 2}
sage: Set([[], 1, 1, 2])
Set of elements of [[], 1, 1, 2]

or

sage: Set([1,2]) == Set([2,1])
True
sage: Set([[],1]) == Set([1,[]])
False

or

sage: l1 = []
sage: l2 = []
sage: S1 = Set([l1])
sage: S2 = Set([l2])
sage: S1 == S2
True
sage: l1.append('hello')
sage: S1 == S2
False

The Set function is a globally available factory for constructing mathematical sets. It is useful and possible to have mathematical sets made of Python lists. However, I do think that the Set constructor should not be used for that purpose.

comment:6 in reply to: ↑ 5 ; follow-up: Changed 4 years ago by jdemeyer

Replying to vdelecroix:

Does the following make sense!?

Of course not. I never said that the current behaviour makes sense. I just wanted to defend #22597 and to point out that it's more complicated than you think, that's all.

The current behaviour is because of the need to support infinite sets:

sage: Set(RR)
Set of elements of Real Field with 53 bits of precision

We could of course disallow that and simply always call frozenset(X).

comment:7 in reply to: ↑ 6 Changed 4 years ago by vdelecroix

Replying to jdemeyer:

Replying to vdelecroix: The current behaviour is because of the need to support infinite sets:

In your example, I don't even understand why RR is not itself returned.

A more sensible behavior would be

  • if X is a Parent whose category is a subcategory of Set then returns it (or possibly a wrapper around it)
  • otherwise return a finite mathematical set corresponding to frozenset(X)

It just boils down to replace the code metioned in comment:3 into

    return Set_object_enumerated(frozenset(X))

(the part for infinite stuff is already done at this stage)

comment:8 Changed 3 years ago by mantepse

  • Branch set to u/mantepse/surprising_behaviour_of_set

comment:9 Changed 3 years ago by mantepse

  • Authors set to Martin Rubey
  • Commit set to ba613e9de4136b386c156e6c1cb1c8fceebc9ecf
  • Status changed from new to needs_review

New commits:

ba613e9refuse to make Sets with unhashable objects

comment:10 Changed 3 years ago by tscrim

  • Status changed from needs_review to needs_work

-1 from me. There should be a natural object in Sage that behaves like a set but that the (casual) user doesn't have to learn what "hashable" means.

comment:11 Changed 3 years ago by mantepse

Hm, I don't see how unhashable (more precisely: mutable) objects can possibly be turned into a set. Just to make sure: the current Set does not behave like a mathematical set at all:

sage: s = Set([[1,2],[1,2]]); s
Set of elements of [[1, 2], [1, 2]]
sage: s.cardinality()
2
sage: s == Set([[1,2]])
False

How about telling the user explicitely what to do in the error message? Something like

sage: Set([QQ, [3, 1], 5])
ValueError: the elements of a Set must be immutable, use tuples instead of lists.

comment:12 Changed 3 years ago by tscrim

Note that unhashable does not necessarily mean mutable in the sense that == changes. For instance, it may not be possible to devise a (reasonable) hash function such that a == b implies hash(a) == hash(b).

Note that I am not saying that it does not have bugs (and some features should be used sparingly), but removing something that is useful to the causal user (in a way, python breaks its own conventions with its builtin classes) and could potentially have use-cases in the wild is why this is a -1.

comment:13 follow-up: Changed 3 years ago by mantepse

Well, I reported this, because it hit me as a casual user. I did a (research) experiment, where I was counting the number of elements in a set. I then found that the number that Set reported did not agree with my conjecture.

Luckily, I found out that Set was wrong, not my conjecture.

I think it would be good to have a use case where one really wants to make a Set of unhashable objects, so that we can incorporate it.

I hope you agree that elements of a set should be immutable?

comment:14 in reply to: ↑ 13 Changed 3 years ago by tscrim

Replying to mantepse:

I hope you agree that elements of a set should be immutable?

That is far too loaded of a question to get a fair answer.

I am somewhat leaning against it as it would be too difficult to enforce (e.g., a tuple with a list is mutable). Plus there is no way (at least AFAIK in Python) to distinguish hashable and mutable.

comment:15 Changed 3 years ago by mantepse

I'm afraid I have a communication problem (on my part), sorry for that. What I meant to ask is: what should the output of the following be:

sage: l = [1,2]; s = Set([l,[1,2]]); s

sage: s.cardinality()

sage: s == Set([[1,2]])

sage: l.append(3)

sage: s

Actually, I think that it makes sense to ask for hashable objects, and I do not think that this is a great burden on the user. In case of doubt, we could make the error message more explict.

Even more so, because I cannot see any set-like behaviour in Set with non-hashable objects. To me, it looks like a list. Am I missing something?

comment:16 Changed 3 years ago by tscrim

You are equating two separate things:

  1. The incorrect behavior of Set not giving unique objects.
  2. The hashability of objects in Set.

I agree that the first one is a bug and should be fixed so that you at least have to start doing something much more evil to break it. This might mean a new class with extra overhead for trying to enforce the uniqueness of the elements.

I do not think it makes sense to force a generic user to care about hashability.Try explaining that to a math undergrad without saying (im)mutable who just wants to create a finite set of vectors. I think it is a much bigger burden than you realize.

Remember, I am thinking of this as a convenience class and not something meant for doing real computations. At that point you should learn/know about hashablity and be using Python's (frozen)set directly.

comment:17 Changed 3 years ago by mantepse

You are equating two separate things:

  1. The incorrect behavior of Set not giving unique objects.
  2. The hashability of objects in Set.

Indeed, I do not see how one could separate these two items. Please explain!

Perhaps more importantly: is there anything one currently can do with s below?

sage: s = Set([vector(range(i)) for i in range(5)]); s
Set of elements of [(), (0), (0, 1), (0, 1, 2), (0, 1, 2, 3)]

(I tried: subsets, which raises an error about unhashability, and difference and union which actually don't do anything.)

Concerning your (great!) example I think it would be more practical to have an easy way to create vectors immutable, other than

sage: l = [vector(range(i)) for i in range(5)]; map(lambda v: v.set_immutable(), l); set(l)
[None, None, None, None, None]
{(0, 1, 2, 3), (0, 1), (0, 1, 2), (), (0)}

For example (maybe the error message could be better):

sage: set([Graph(i) for i in range(5)])
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
...
TypeError: This graph is mutable, and thus not hashable. Create an immutable copy by `g.copy(immutable=True)`
sage: set([Graph(i, immutable=True) for i in range(5)])
{Graph on 0 vertices,
 Graph on 1 vertex,
 Graph on 2 vertices,
 Graph on 3 vertices,
 Graph on 4 vertices}

The fundamental problem I see with requiring immutability (instead of hashability) is that there seems to be no way to check whether an object is immutable. So, the only thing I can think of is to wrap every non-hashable object with

class SetElement(SageObject):
    def __init__(self, x):
        self._elt = copy(x)
    def _repr_(self):
        return repr(self._elt)

I don't know how generic copy is.

However, I think that it is much safer to point to the documentation, for example, https://stackoverflow.com/questions/14084317/python-mutable-vectors-are-unhashable-error

comment:18 Changed 3 years ago by tscrim

You can work towards solving 1 by doing things like removing duplicates on construction, adding elements, etc. That's not to say that enforcing the hashability does not solve 1, but they are two different discussions.

If you wrap objects, you're likely in for a whole host of other problems, e.g., "why cannot I add two vectors in this set?"

I do not think you can get around the non-hashability issue out by simply documenting hashability/immutability either. There is both too much to document/enforce/explain. Again, IMSO, such a user should not have to know/care about these things.

+1 for having a better way to construct immutable vectors, matrices, etc. Separate ticket.

comment:19 follow-up: Changed 3 years ago by mantepse

+1 for having a better way to construct immutable vectors, matrices, etc. Separate ticket.

I created #25509.

You can work towards solving 1 by doing things like removing duplicates on construction, adding elements, etc.

I don't see how this could work. You might get:

sage: l = [1]; s = Set([l, [1,2]]); s
{[1], [1,2]}
sage: l.append(2); s
{[1,2], [1,2]}
sage: l.add([1,2])
{[1,2]}

I think that this would be even more confusing.

That's not to say that enforcing the hashability does not solve 1, but they are two different discussions.

If you wrap objects, you're likely in for a whole host of other problems, e.g., "why cannot I add two vectors in this set?"

I agree.

I do not think you can get around the non-hashability issue out by simply documenting hashability/immutability either. There is both too much to document/enforce/explain. Again, IMSO, such a user should not have to know/care about these things.

I don't really understand this paragraph. Currently, non-hashable objects in 'Set' do not raise an error, but do not work at all. There is not a single operation which does something sensible in this case. Yes, the casual user could create a set of vectors, but what could she do then with it thereafter?

comment:20 in reply to: ↑ 19 ; follow-up: Changed 3 years ago by tscrim

Replying to mantepse:

You can work towards solving 1 by doing things like removing duplicates on construction, adding elements, etc.

I don't see how this could work. You might get:

sage: l = [1]; s = Set([l, [1,2]]); s
{[1], [1,2]}
sage: l.append(2); s
{[1,2], [1,2]}
sage: l.add([1,2])
{[1,2]}

I think that this would be even more confusing.

That is a deliberate misuse, something that is generally not likely to happen, and essentially impossible to prevent. Now you are going to see that as reason to enforce hashability, but I do not think that is the utility of having this class handle finite sets.

I do not think you can get around the non-hashability issue out by simply documenting hashability/immutability either. There is both too much to document/enforce/explain. Again, IMSO, such a user should not have to know/care about these things.

I don't really understand this paragraph. Currently, non-hashable objects in 'Set' do not raise an error, but do not work at all. There is not a single operation which does something sensible in this case. Yes, the casual user could create a set of vectors, but what could she do then with it thereafter?

Say you have two bases and you want to compute their common vectors (think matroids):

sage: X = Set([[1,2],[3,4]])
sage: X
Set of elements of [[1, 2], [3, 4]]
sage: X.difference(X)
{}
sage: X.difference(Set([[1,2]]))
Set-theoretic difference of Set of elements of [[1, 2], [3, 4]] and Set of elements of [[1, 2]]
sage: list(_)
[[3, 4]]

So things work much more than you say.

I am tired of arguing about this. I am getting far too angry and frustrated. Do whatever you want.

comment:21 in reply to: ↑ 20 Changed 3 years ago by mantepse

Replying to tscrim:

Say you have two bases and you want to compute their common vectors (think matroids):

sage: X = Set([[1,2],[3,4]])
sage: X
Set of elements of [[1, 2], [3, 4]]
sage: X.difference(X)
{}
sage: X.difference(Set([[1,2]]))
Set-theoretic difference of Set of elements of [[1, 2], [3, 4]] and Set of elements of [[1, 2]]
sage: list(_)
[[3, 4]]

So things work much more than you say.

OK, thanks, I didn't know that this works.

I am tired of arguing about this. I am getting far too angry and frustrated. Do whatever you want.

I guess we should have a beer or something. I didn't realize that I'm making you angry, sorry about that.

Note: See TracTickets for help on using tickets.