Opened 9 years ago

Closed 9 years ago

#15956 closed defect (fixed)

WeakValueDictionary does not handle unhashable keys correctly

Reported by: Julian Rüth Owned by:
Priority: minor Milestone: sage-6.2
Component: misc Keywords: hash
Cc: Nils Bruin, simonking Merged in:
Authors: Julian Rueth, Nils Bruin Reviewers: Nils Bruin, Julian Rueth
Report Upstream: N/A Work issues:
Branch: 52e02b1 (Commits, GitHub, GitLab) Commit: 52e02b14977e34bf31f4302cea433f1d825a7698
Dependencies: Stopgaps:

Status badges

Description

The WeakValueDictionary behaves differently from regular dicts for unhashable keys:

sage: {}[[]]
TypeError: unhashable type: 'list'
sage: sage.misc.weak_dict.WeakValueDictionary()[[]]
KeyError: []

This is because http://docs.python.org/2/c-api/dict.html#PyDict_GetItem does not throw a TypeError.

Change History (12)

comment:1 Changed 9 years ago by Nils Bruin

Hm, unfortunate situation. In python 3 we'd be able to use PyDict_GetItemWithError: http://hg.python.org/cpython/file/5cab0ada97b2/Objects/dictobject.c#l1109 I'd be tempted to let this sit for now. We'd basically have to write our own.

comment:2 Changed 9 years ago by Julian Rüth

You are right. It is annoying to fix this in python2. I ran into it while working on #11895 (disable hashing of padics) btw.

comment:3 Changed 9 years ago by Julian Rüth

Branch: u/saraedum/ticket/15956
Created: Mar 16, 2014, 10:32:06 PMMar 16, 2014, 10:32:06 PM
Modified: Mar 17, 2014, 3:56:12 AMMar 17, 2014, 3:56:12 AM

comment:4 Changed 9 years ago by Julian Rüth

Status: newneeds_review

comment:5 Changed 9 years ago by Nils Bruin

Commit: 3a05a1aec6459e59f0b7069af4bc1100a71a9502

The proposed fix is very elegant and minimal, but it does noticeably affect a code path that deserves to be very fast: testing that a key does not occur in the dictionary. Timings:

%cpaste
cython("""
def test(N,D):
    cdef long n=N
    for i in range(n):
        1 in D
""")
--
sage: D=sage.misc.weak_dict.WeakValueDictionary()
sage: D[2]=ZZ
sage: timeit("test(10^7,D)")
5 loops, best of 3: 122 ms per loop
//with the extra hash call:
5 loops, best of 3: 151 ms per loop
sage: D[1]=ZZ
sage: timeit("test(10^7,D)")
5 loops, best of 3: 466 ms per loop
//this of course times the same with the extra hash call

Compared with a normal dictionary:

sage: N={}
sage: N[2]=ZZ
sage: timeit("test(10^7,N)")
5 loops, best of 3: 90.6 ms per loop
sage: N[1]=ZZ
sage: timeit("test(10^7,N)")
5 loops, best of 3: 377 ms per loop

The change from 122ms to 151ms is an increase of 25% in time taken (and this is for integers, which hash extremely quickly. I've also tried it with a string key '1', where I got 101ms vs. 127ms, so a penalty of around 30ms/107 seems to be expected).

Of course, it's nice if sage.misc.weak_dict.WeakValueDictionary is a perfect drop-in replacement for weakref.WeakValueDictionary, but having better performance at the expense of different edge-case error reporting (and it's just reporting "key not here" in cases where that's actually true, and where some other dictionaries raise "key not hashable", so it's hard to think of an edge case where that seriously affects legitimate program logic) shouldn't just be dismissed.

We could of course patch python and backport PyDict_GetItemWithError and get a proper fix, without performance degradation.

For the current proposal, I'd like to see some real-world motivation to bring the error reporting in line with other dictionaries before I'd consider significantly degrading performance.

A further problem with the current proposal: it only fixes one symptom. The nastier example http://hg.python.org/cpython/file/5cab0ada97b2/Objects/dictobject.c#l1046 (stack overflow during key comparison) isn't addressed by verifying that the key hashes. Going all the way with the PyDict_GetItemWithError solution (or equivalent) would also solve that.


New commits:

3a05a1aImproved handling of unhashable keys in weak dictionary

comment:6 Changed 9 years ago by Nils Bruin

Branch: u/saraedum/ticket/15956u/nbruin/ticket/15956
Modified: Mar 20, 2014, 6:05:19 PMMar 20, 2014, 6:05:19 PM

comment:7 Changed 9 years ago by Nils Bruin

Commit: 3a05a1aec6459e59f0b7069af4bc1100a71a9502c60d3c6caefd6626ce5881252e66f71524461651

Turns out backporting PyDict_GetItemWithError isn't nearly as bad as it seems in cython. We are relying on details about the PyDict implementation that are not officially part of the CAPI, but they do occur in Python.h. We're already doing worse here by digging up the non-exported Dummy value for dict tables.

The current implementation is actually a little faster than PyDict_GetItem (because we don't need to clear errors, probably). We could unwrap the use of these even further into the various places where PyDict_GetItemWithError is used, but for now this should be good enough.


New commits:

c60d3c6trac #15956: Provide PyDict_GetItemWithError and use it to obtain appropriate errors when looking up weakdict entries

comment:8 Changed 9 years ago by git

Commit: c60d3c6caefd6626ce5881252e66f71524461651109c161ca077a7d8fc6fc91fc1dce48f439234e6

Branch pushed to git repo; I updated commit sha1. New commits:

109c161move a comment to its proper place

comment:9 Changed 9 years ago by Nils Bruin

Authors: Julian, Nils Bruin
Reviewers: Nils Bruin, ???

*PING* I'm happy with Julian's doctests. The current use of PyDict_GetItemWithError actually gives slightly better performance than we had before AND we get fully compliant error reporting. So I'd be happy with the current solution. If Julian can give my change a positive review, we can get this tiny, unimportant improvement merged. Please complete the reviewer field when you're done.

comment:10 Changed 9 years ago by git

Commit: 109c161ca077a7d8fc6fc91fc1dce48f439234e652e02b14977e34bf31f4302cea433f1d825a7698

Branch pushed to git repo; I updated commit sha1. New commits:

52e02b1trac #15956: improve error detection by adding "except ..." to some type declarations and inline some things in "__contains__" to obtain better performance than standard dict on non-occurring keys

comment:11 Changed 9 years ago by Julian Rüth

Authors: Julian, Nils BruinJulian Rueth, Nils Bruin
Reviewers: Nils Bruin, ???Nils Bruin, Julian Rueth
Status: needs_reviewpositive_review

Thanks for taking care of this :)

comment:12 Changed 9 years ago by Volker Braun

Branch: u/nbruin/ticket/1595652e02b14977e34bf31f4302cea433f1d825a7698
Resolution: fixed
Status: positive_reviewclosed
Note: See TracTickets for help on using tickets.