Opened 4 years ago
Last modified 3 years ago
#23324 needs_work defect
surprising behaviour of Set
Reported by:  mantepse  Owned by:  

Priority:  major  Milestone:  sage8.0 
Component:  misc  Keywords:  
Cc:  Merged in:  
Authors:  Martin Rubey  Reviewers:  
Report Upstream:  N/A  Work issues:  
Branch:  u/mantepse/surprising_behaviour_of_set (Commits, GitHub, GitLab)  Commit:  ba613e9de4136b386c156e6c1cb1c8fceebc9ecf 
Dependencies:  Stopgaps: 
Description (last modified by )
sage: Set([[], []]).cardinality() 2
In earlier versions (before #22597), a TypeError: unhashable type: 'list'
was raised.
cardinality
should raise an error if the elements of the set are unhashable.
Change History (21)
comment:1 Changed 4 years ago by
comment:2 Changed 4 years ago by
The problem is that we do want to support certain infinite sets which cannot be listed, so simply listing all elements is not a solution.
comment:3 followup: ↓ 4 Changed 4 years ago by
I don't understand your point, the problematic code is sage/sets/set.py
line 206211
try: X = frozenset(X) except TypeError: return Set_object(X) else: return Set_object_enumerated(X)
It does expand X
whatever it is. If the input X
contains some non hashable object, a kind of lazy set is returned (Set_object
) and not a proper finite set
sage: S = Set([1,1,1,[]]) sage: S Set of elements of [1, 1, 1, []] sage: S.cardinality() 4
And I think it should just raise a TypeError
as it used to be. Support for non hashable elements should not be dealt with in the main constructor.
comment:4 in reply to: ↑ 3 Changed 4 years ago by
 Description modified (diff)
Replying to vdelecroix:
it should just raise a
TypeError
as it used to be.
This "used to be" the case only for certain specific types (which included list
). It really makes no sense to say "for these specific types, we require that the elements of the set are hashable, but for other types not". The only sensible options are to have such requirement never (which is the current implementation) or always.
comment:5 followup: ↓ 6 Changed 4 years ago by
Does the following make sense!?
sage: Set([1, 1, 2]) {1, 2} sage: Set([[], 1, 1, 2]) Set of elements of [[], 1, 1, 2]
or
sage: Set([1,2]) == Set([2,1]) True sage: Set([[],1]) == Set([1,[]]) False
or
sage: l1 = [] sage: l2 = [] sage: S1 = Set([l1]) sage: S2 = Set([l2]) sage: S1 == S2 True sage: l1.append('hello') sage: S1 == S2 False
The Set
function is a globally available factory for constructing mathematical sets. It is useful and possible to have mathematical sets made of Python lists. However, I do think that the Set
constructor should not be used for that purpose.
comment:6 in reply to: ↑ 5 ; followup: ↓ 7 Changed 4 years ago by
Replying to vdelecroix:
Does the following make sense!?
Of course not. I never said that the current behaviour makes sense. I just wanted to defend #22597 and to point out that it's more complicated than you think, that's all.
The current behaviour is because of the need to support infinite sets:
sage: Set(RR) Set of elements of Real Field with 53 bits of precision
We could of course disallow that and simply always call frozenset(X)
.
comment:7 in reply to: ↑ 6 Changed 4 years ago by
Replying to jdemeyer:
Replying to vdelecroix: The current behaviour is because of the need to support infinite sets:
In your example, I don't even understand why RR
is not itself returned.
A more sensible behavior would be
 if
X
is aParent
whose category is a subcategory ofSet
then returns it (or possibly a wrapper around it)  otherwise return a finite mathematical set corresponding to
frozenset(X)
It just boils down to replace the code metioned in comment:3 into
return Set_object_enumerated(frozenset(X))
(the part for infinite stuff is already done at this stage)
comment:8 Changed 3 years ago by
 Branch set to u/mantepse/surprising_behaviour_of_set
comment:9 Changed 3 years ago by
 Commit set to ba613e9de4136b386c156e6c1cb1c8fceebc9ecf
 Status changed from new to needs_review
New commits:
ba613e9  refuse to make Sets with unhashable objects

comment:10 Changed 3 years ago by
 Status changed from needs_review to needs_work
1 from me. There should be a natural object in Sage that behaves like a set but that the (casual) user doesn't have to learn what "hashable" means.
comment:11 Changed 3 years ago by
Hm, I don't see how unhashable (more precisely: mutable) objects can possibly be turned into a set. Just to make sure: the current Set
does not behave like a mathematical set at all:
sage: s = Set([[1,2],[1,2]]); s Set of elements of [[1, 2], [1, 2]] sage: s.cardinality() 2 sage: s == Set([[1,2]]) False
How about telling the user explicitely what to do in the error message? Something like
sage: Set([QQ, [3, 1], 5]) ValueError: the elements of a Set must be immutable, use tuples instead of lists.
comment:12 Changed 3 years ago by
Note that unhashable does not necessarily mean mutable in the sense that ==
changes. For instance, it may not be possible to devise a (reasonable) hash function such that a == b
implies hash(a) == hash(b)
.
Note that I am not saying that it does not have bugs (and some features should be used sparingly), but removing something that is useful to the causal user (in a way, python breaks its own conventions with its builtin classes) and could potentially have usecases in the wild is why this is a 1.
comment:13 followup: ↓ 14 Changed 3 years ago by
Well, I reported this, because it hit me as a casual user. I did a (research) experiment, where I was counting the number of elements in a set. I then found that the number that Set
reported did not agree with my conjecture.
Luckily, I found out that Set
was wrong, not my conjecture.
I think it would be good to have a use case where one really wants to make a Set
of unhashable objects, so that we can incorporate it.
I hope you agree that elements of a set should be immutable?
comment:14 in reply to: ↑ 13 Changed 3 years ago by
Replying to mantepse:
I hope you agree that elements of a set should be immutable?
That is far too loaded of a question to get a fair answer.
I am somewhat leaning against it as it would be too difficult to enforce (e.g., a tuple with a list is mutable). Plus there is no way (at least AFAIK in Python) to distinguish hashable and mutable.
comment:15 Changed 3 years ago by
I'm afraid I have a communication problem (on my part), sorry for that. What I meant to ask is: what should the output of the following be:
sage: l = [1,2]; s = Set([l,[1,2]]); s sage: s.cardinality() sage: s == Set([[1,2]]) sage: l.append(3) sage: s
Actually, I think that it makes sense to ask for hashable objects, and I do not think that this is a great burden on the user. In case of doubt, we could make the error message more explict.
Even more so, because I cannot see any setlike behaviour in Set
with nonhashable objects. To me, it looks like a list. Am I missing something?
comment:16 Changed 3 years ago by
You are equating two separate things:
 The incorrect behavior of
Set
not giving unique objects.  The hashability of objects in
Set
.
I agree that the first one is a bug and should be fixed so that you at least have to start doing something much more evil to break it. This might mean a new class with extra overhead for trying to enforce the uniqueness of the elements.
I do not think it makes sense to force a generic user to care about hashability.Try explaining that to a math undergrad without saying (im)mutable who just wants to create a finite set of vectors. I think it is a much bigger burden than you realize.
Remember, I am thinking of this as a convenience class and not something meant for doing real computations. At that point you should learn/know about hashablity and be using Python's (frozen)set
directly.
comment:17 Changed 3 years ago by
You are equating two separate things:
 The incorrect behavior of Set not giving unique objects.
 The hashability of objects in Set.
Indeed, I do not see how one could separate these two items. Please explain!
Perhaps more importantly: is there anything one currently can do with s
below?
sage: s = Set([vector(range(i)) for i in range(5)]); s Set of elements of [(), (0), (0, 1), (0, 1, 2), (0, 1, 2, 3)]
(I tried: subsets
, which raises an error about unhashability, and difference
and union
which actually don't do anything.)
Concerning your (great!) example I think it would be more practical to have an easy way to create vectors immutable, other than
sage: l = [vector(range(i)) for i in range(5)]; map(lambda v: v.set_immutable(), l); set(l) [None, None, None, None, None] {(0, 1, 2, 3), (0, 1), (0, 1, 2), (), (0)}
For example (maybe the error message could be better):
sage: set([Graph(i) for i in range(5)])  TypeError Traceback (most recent call last) ... TypeError: This graph is mutable, and thus not hashable. Create an immutable copy by `g.copy(immutable=True)` sage: set([Graph(i, immutable=True) for i in range(5)]) {Graph on 0 vertices, Graph on 1 vertex, Graph on 2 vertices, Graph on 3 vertices, Graph on 4 vertices}
The fundamental problem I see with requiring immutability (instead of hashability) is that there seems to be no way to check whether an object is immutable. So, the only thing I can think of is to wrap every nonhashable object with
class SetElement(SageObject): def __init__(self, x): self._elt = copy(x) def _repr_(self): return repr(self._elt)
I don't know how generic copy
is.
However, I think that it is much safer to point to the documentation, for example, https://stackoverflow.com/questions/14084317/pythonmutablevectorsareunhashableerror
comment:18 Changed 3 years ago by
You can work towards solving 1 by doing things like removing duplicates on construction, adding elements, etc. That's not to say that enforcing the hashability does not solve 1, but they are two different discussions.
If you wrap objects, you're likely in for a whole host of other problems, e.g., "why cannot I add two vectors in this set?"
I do not think you can get around the nonhashability issue out by simply documenting hashability/immutability either. There is both too much to document/enforce/explain. Again, IMSO, such a user should not have to know/care about these things.
+1 for having a better way to construct immutable vectors, matrices, etc. Separate ticket.
comment:19 followup: ↓ 20 Changed 3 years ago by
+1 for having a better way to construct immutable vectors, matrices, etc. Separate ticket.
I created #25509.
You can work towards solving 1 by doing things like removing duplicates on construction, adding elements, etc.
I don't see how this could work. You might get:
sage: l = [1]; s = Set([l, [1,2]]); s {[1], [1,2]} sage: l.append(2); s {[1,2], [1,2]} sage: l.add([1,2]) {[1,2]}
I think that this would be even more confusing.
That's not to say that enforcing the hashability does not solve 1, but they are two different discussions.
If you wrap objects, you're likely in for a whole host of other problems, e.g., "why cannot I add two vectors in this set?"
I agree.
I do not think you can get around the nonhashability issue out by simply documenting hashability/immutability either. There is both too much to document/enforce/explain. Again, IMSO, such a user should not have to know/care about these things.
I don't really understand this paragraph. Currently, nonhashable objects in 'Set' do not raise an error, but do not work at all. There is not a single operation which does something sensible in this case. Yes, the casual user could create a set of vectors, but what could she do then with it thereafter?
comment:20 in reply to: ↑ 19 ; followup: ↓ 21 Changed 3 years ago by
Replying to mantepse:
You can work towards solving 1 by doing things like removing duplicates on construction, adding elements, etc.
I don't see how this could work. You might get:
sage: l = [1]; s = Set([l, [1,2]]); s {[1], [1,2]} sage: l.append(2); s {[1,2], [1,2]} sage: l.add([1,2]) {[1,2]}I think that this would be even more confusing.
That is a deliberate misuse, something that is generally not likely to happen, and essentially impossible to prevent. Now you are going to see that as reason to enforce hashability, but I do not think that is the utility of having this class handle finite sets.
I do not think you can get around the nonhashability issue out by simply documenting hashability/immutability either. There is both too much to document/enforce/explain. Again, IMSO, such a user should not have to know/care about these things.
I don't really understand this paragraph. Currently, nonhashable objects in 'Set' do not raise an error, but do not work at all. There is not a single operation which does something sensible in this case. Yes, the casual user could create a set of vectors, but what could she do then with it thereafter?
Say you have two bases and you want to compute their common vectors (think matroids):
sage: X = Set([[1,2],[3,4]]) sage: X Set of elements of [[1, 2], [3, 4]] sage: X.difference(X) {} sage: X.difference(Set([[1,2]])) Settheoretic difference of Set of elements of [[1, 2], [3, 4]] and Set of elements of [[1, 2]] sage: list(_) [[3, 4]]
So things work much more than you say.
I am tired of arguing about this. I am getting far too angry and frustrated. Do whatever you want.
comment:21 in reply to: ↑ 20 Changed 3 years ago by
Replying to tscrim:
Say you have two bases and you want to compute their common vectors (think matroids):
sage: X = Set([[1,2],[3,4]]) sage: X Set of elements of [[1, 2], [3, 4]] sage: X.difference(X) {} sage: X.difference(Set([[1,2]])) Settheoretic difference of Set of elements of [[1, 2], [3, 4]] and Set of elements of [[1, 2]] sage: list(_) [[3, 4]]So things work much more than you say.
OK, thanks, I didn't know that this works.
I am tired of arguing about this. I am getting far too angry and frustrated. Do whatever you want.
I guess we should have a beer or something. I didn't realize that I'm making you angry, sorry about that.
The Set should not even be created.