Opened 6 months ago

Closed 6 months ago

#15956 closed defect (fixed)

WeakValueDictionary does not handle unhashable keys correctly

Reported by: saraedum Owned by:
Priority: minor Milestone: sage-6.2
Component: misc Keywords: hash
Cc: nbruin, simonking Merged in:
Authors: Julian Rueth, Nils Bruin Reviewers: Nils Bruin, Julian Rueth
Report Upstream: N/A Work issues:
Branch: 52e02b1 (Commits) Commit: 52e02b14977e34bf31f4302cea433f1d825a7698
Dependencies: Stopgaps:

Description

The WeakValueDictionary behaves differently from regular dicts for unhashable keys:

sage: {}[[]]
TypeError: unhashable type: 'list'
sage: sage.misc.weak_dict.WeakValueDictionary()[[]]
KeyError: []

This is because http://docs.python.org/2/c-api/dict.html#PyDict_GetItem does not throw a TypeError.

Change History (12)

comment:1 Changed 6 months ago by nbruin

Hm, unfortunate situation. In python 3 we'd be able to use PyDict_GetItemWithError:
http://hg.python.org/cpython/file/5cab0ada97b2/Objects/dictobject.c#l1109
I'd be tempted to let this sit for now. We'd basically have to write our own.

comment:2 Changed 6 months ago by saraedum

You are right. It is annoying to fix this in python2. I ran into it while working on #11895 (disable hashing of padics) btw.

comment:3 Changed 6 months ago by saraedum

  • Branch set to u/saraedum/ticket/15956
  • Created changed from 03/16/14 15:32:06 to 03/16/14 15:32:06
  • Modified changed from 03/16/14 20:56:12 to 03/16/14 20:56:12

comment:4 Changed 6 months ago by saraedum

  • Status changed from new to needs_review

comment:5 Changed 6 months ago by nbruin

  • Commit set to 3a05a1aec6459e59f0b7069af4bc1100a71a9502

The proposed fix is very elegant and minimal, but it does noticeably affect a code path that deserves to be very fast: testing that a key does not occur in the dictionary. Timings:

%cpaste
cython("""
def test(N,D):
    cdef long n=N
    for i in range(n):
        1 in D
""")
--
sage: D=sage.misc.weak_dict.WeakValueDictionary()
sage: D[2]=ZZ
sage: timeit("test(10^7,D)")
5 loops, best of 3: 122 ms per loop
//with the extra hash call:
5 loops, best of 3: 151 ms per loop
sage: D[1]=ZZ
sage: timeit("test(10^7,D)")
5 loops, best of 3: 466 ms per loop
//this of course times the same with the extra hash call

Compared with a normal dictionary:

sage: N={}
sage: N[2]=ZZ
sage: timeit("test(10^7,N)")
5 loops, best of 3: 90.6 ms per loop
sage: N[1]=ZZ
sage: timeit("test(10^7,N)")
5 loops, best of 3: 377 ms per loop

The change from 122ms to 151ms is an increase of 25% in time taken (and this is for integers, which hash extremely quickly. I've also tried it with a string key '1', where I got 101ms vs. 127ms, so a penalty of around 30ms/107 seems to be expected).

Of course, it's nice if sage.misc.weak_dict.WeakValueDictionary is a perfect drop-in replacement for weakref.WeakValueDictionary, but having better performance at the expense of different edge-case error reporting (and it's just reporting "key not here" in cases where that's actually true, and where some other dictionaries raise "key not hashable", so it's hard to think of an edge case where that seriously affects legitimate program logic) shouldn't just be dismissed.

We could of course patch python and backport PyDict_GetItemWithError and get a proper fix, without performance degradation.

For the current proposal, I'd like to see some real-world motivation to bring the error reporting in line with other dictionaries before I'd consider significantly degrading performance.

A further problem with the current proposal: it only fixes one symptom. The nastier example http://hg.python.org/cpython/file/5cab0ada97b2/Objects/dictobject.c#l1046 (stack overflow during key comparison) isn't addressed by verifying that the key hashes. Going all the way with the PyDict_GetItemWithError solution (or equivalent) would also solve that.


New commits:

3a05a1aImproved handling of unhashable keys in weak dictionary

comment:6 Changed 6 months ago by nbruin

  • Branch changed from u/saraedum/ticket/15956 to u/nbruin/ticket/15956
  • Modified changed from 03/20/14 11:05:19 to 03/20/14 11:05:19

comment:7 Changed 6 months ago by nbruin

  • Commit changed from 3a05a1aec6459e59f0b7069af4bc1100a71a9502 to c60d3c6caefd6626ce5881252e66f71524461651

Turns out backporting PyDict_GetItemWithError isn't nearly as bad as it seems in cython. We are relying on details about the PyDict implementation that are not officially part of the CAPI, but they do occur in Python.h. We're already doing worse here by digging up the non-exported Dummy value for dict tables.

The current implementation is actually a little faster than PyDict_GetItem (because we don't need to clear errors, probably). We could unwrap the use of these even further into the various places where PyDict_GetItemWithError is used, but for now this should be good enough.


New commits:

c60d3c6trac #15956: Provide PyDict_GetItemWithError and use it to obtain appropriate errors when looking up weakdict entries

comment:8 Changed 6 months ago by git

  • Commit changed from c60d3c6caefd6626ce5881252e66f71524461651 to 109c161ca077a7d8fc6fc91fc1dce48f439234e6

Branch pushed to git repo; I updated commit sha1. New commits:

109c161move a comment to its proper place

comment:9 Changed 6 months ago by nbruin

  • Authors set to Julian, Nils Bruin
  • Reviewers set to Nils Bruin, ???

*PING* I'm happy with Julian's doctests. The current use of PyDict_GetItemWithError actually gives slightly better performance than we had before AND we get fully compliant error reporting. So I'd be happy with the current solution. If Julian can give my change a positive review, we can get this tiny, unimportant improvement merged. Please complete the reviewer field when you're done.

comment:10 Changed 6 months ago by git

  • Commit changed from 109c161ca077a7d8fc6fc91fc1dce48f439234e6 to 52e02b14977e34bf31f4302cea433f1d825a7698

Branch pushed to git repo; I updated commit sha1. New commits:

52e02b1trac #15956: improve error detection by adding "except ..." to some type declarations and inline some things in "__contains__" to obtain better performance than standard dict on non-occurring keys

comment:11 Changed 6 months ago by saraedum

  • Authors changed from Julian, Nils Bruin to Julian Rueth, Nils Bruin
  • Reviewers changed from Nils Bruin, ??? to Nils Bruin, Julian Rueth
  • Status changed from needs_review to positive_review

Thanks for taking care of this :)

comment:12 Changed 6 months ago by vbraun

  • Branch changed from u/nbruin/ticket/15956 to 52e02b14977e34bf31f4302cea433f1d825a7698
  • Resolution set to fixed
  • Status changed from positive_review to closed
Note: See TracTickets for help on using tickets.