Opened 3 years ago
Closed 3 years ago
#15956 closed defect (fixed)
WeakValueDictionary does not handle unhashable keys correctly
Reported by:  saraedum  Owned by:  

Priority:  minor  Milestone:  sage6.2 
Component:  misc  Keywords:  hash 
Cc:  nbruin, simonking  Merged in:  
Authors:  Julian Rueth, Nils Bruin  Reviewers:  Nils Bruin, Julian Rueth 
Report Upstream:  N/A  Work issues:  
Branch:  52e02b1 (Commits)  Commit:  52e02b14977e34bf31f4302cea433f1d825a7698 
Dependencies:  Stopgaps: 
Description
The WeakValueDictionary
behaves differently from regular dicts for unhashable keys:
sage: {}[[]] TypeError: unhashable type: 'list' sage: sage.misc.weak_dict.WeakValueDictionary()[[]] KeyError: []
This is because http://docs.python.org/2/capi/dict.html#PyDict_GetItem does not throw a TypeError
.
Change History (12)
comment:1 Changed 3 years ago by
comment:2 Changed 3 years ago by
You are right. It is annoying to fix this in python2. I ran into it while working on #11895 (disable hashing of padics) btw.
comment:3 Changed 3 years ago by
 Branch set to u/saraedum/ticket/15956
 Created changed from 03/16/14 22:32:06 to 03/16/14 22:32:06
 Modified changed from 03/17/14 03:56:12 to 03/17/14 03:56:12
comment:4 Changed 3 years ago by
 Status changed from new to needs_review
comment:5 Changed 3 years ago by
 Commit set to 3a05a1aec6459e59f0b7069af4bc1100a71a9502
The proposed fix is very elegant and minimal, but it does noticeably affect a code path that deserves to be very fast: testing that a key does not occur in the dictionary. Timings:
%cpaste cython(""" def test(N,D): cdef long n=N for i in range(n): 1 in D """) 
sage: D=sage.misc.weak_dict.WeakValueDictionary() sage: D[2]=ZZ sage: timeit("test(10^7,D)") 5 loops, best of 3: 122 ms per loop //with the extra hash call: 5 loops, best of 3: 151 ms per loop sage: D[1]=ZZ sage: timeit("test(10^7,D)") 5 loops, best of 3: 466 ms per loop //this of course times the same with the extra hash call
Compared with a normal dictionary:
sage: N={} sage: N[2]=ZZ sage: timeit("test(10^7,N)") 5 loops, best of 3: 90.6 ms per loop sage: N[1]=ZZ sage: timeit("test(10^7,N)") 5 loops, best of 3: 377 ms per loop
The change from 122ms to 151ms is an increase of 25% in time taken (and this is for integers, which hash extremely quickly. I've also tried it with a string key '1', where I got 101ms vs. 127ms, so a penalty of around 30ms/10^{7 seems to be expected). }
Of course, it's nice if sage.misc.weak_dict.WeakValueDictionary
is a perfect dropin replacement for weakref.WeakValueDictionary
, but having better performance at the expense of different edgecase error reporting (and it's just reporting "key not here" in cases where that's actually true, and where some other dictionaries raise "key not hashable", so it's hard to think of an edge case where that seriously affects legitimate program logic) shouldn't just be dismissed.
We could of course patch python and backport PyDict_GetItemWithError
and get a proper fix, without performance degradation.
For the current proposal, I'd like to see some realworld motivation to bring the error reporting in line with other dictionaries before I'd consider significantly degrading performance.
A further problem with the current proposal: it only fixes one symptom. The nastier example http://hg.python.org/cpython/file/5cab0ada97b2/Objects/dictobject.c#l1046 (stack overflow during key comparison) isn't addressed by verifying that the key hashes. Going all the way with the PyDict_GetItemWithError
solution (or equivalent) would also solve that.
New commits:
3a05a1a  Improved handling of unhashable keys in weak dictionary

comment:6 Changed 3 years ago by
 Branch changed from u/saraedum/ticket/15956 to u/nbruin/ticket/15956
 Modified changed from 03/20/14 18:05:19 to 03/20/14 18:05:19
comment:7 Changed 3 years ago by
 Commit changed from 3a05a1aec6459e59f0b7069af4bc1100a71a9502 to c60d3c6caefd6626ce5881252e66f71524461651
Turns out backporting PyDict_GetItemWithError
isn't nearly as bad as it seems in cython. We are relying on details about the PyDict
implementation that are not officially part of the CAPI, but they do occur in Python.h
. We're already doing worse here by digging up the nonexported Dummy
value for dict tables.
The current implementation is actually a little faster than PyDict_GetItem
(because we don't need to clear errors, probably). We could unwrap the use of these even further into the various places where PyDict_GetItemWithError
is used, but for now this should be good enough.
New commits:
c60d3c6  trac #15956: Provide PyDict_GetItemWithError and use it to obtain appropriate errors when looking up weakdict entries

comment:8 Changed 3 years ago by
 Commit changed from c60d3c6caefd6626ce5881252e66f71524461651 to 109c161ca077a7d8fc6fc91fc1dce48f439234e6
Branch pushed to git repo; I updated commit sha1. New commits:
109c161  move a comment to its proper place

comment:9 Changed 3 years ago by
 Reviewers set to Nils Bruin, ???
*PING* I'm happy with Julian's doctests. The current use of PyDict_GetItemWithError
actually gives slightly better performance than we had before AND we get fully compliant error reporting. So I'd be happy with the current solution. If Julian can give my change a positive review, we can get this tiny, unimportant improvement merged. Please complete the reviewer field when you're done.
comment:10 Changed 3 years ago by
 Commit changed from 109c161ca077a7d8fc6fc91fc1dce48f439234e6 to 52e02b14977e34bf31f4302cea433f1d825a7698
Branch pushed to git repo; I updated commit sha1. New commits:
52e02b1  trac #15956: improve error detection by adding "except ..." to some type declarations and inline some things in "__contains__" to obtain better performance than standard dict on nonoccurring keys

comment:11 Changed 3 years ago by
 Reviewers changed from Nils Bruin, ??? to Nils Bruin, Julian Rueth
 Status changed from needs_review to positive_review
Thanks for taking care of this :)
comment:12 Changed 3 years ago by
 Branch changed from u/nbruin/ticket/15956 to 52e02b14977e34bf31f4302cea433f1d825a7698
 Resolution set to fixed
 Status changed from positive_review to closed
Hm, unfortunate situation. In python 3 we'd be able to use
PyDict_GetItemWithError
: http://hg.python.org/cpython/file/5cab0ada97b2/Objects/dictobject.c#l1109 I'd be tempted to let this sit for now. We'd basically have to write our own.