Opened 3 years ago

Closed 3 years ago

#24135 closed enhancement (fixed)

Clean up in coerce_dict

Reported by: jdemeyer Owned by:
Priority: major Milestone: sage-8.1
Component: coercion Keywords:
Cc: SimonKing, nbruin, tscrim, chapoton Merged in:
Authors: Jeroen Demeyer Reviewers: Travis Scrimshaw
Report Upstream: N/A Work issues:
Branch: 7c83e05 (Commits) Commit: 7c83e05a8df5e3bad8c909b33835a2bc5b1920e3
Dependencies: Stopgaps:

Description (last modified by jdemeyer)

  1. Deprecate various arguments which should have been deprecated in #15367.
  1. Use a tuple instead of a list to throw away. This is very slightly faster.
  1. Use safe memory functions from cysignals instead of PyMem functions.
  1. Split __init__ into __cinit__ and __init__.
  1. Rename the iteritems() method to items().
  1. Introduce a new inline function valid(ptr) to replace the very common ptr != NULL and ptr != dummy.
  1. Change type of key_id from void* to PyObject*. This avoids a lot of casts.
  1. Use a custom class with @cython.freelist instead of a PyCapsule to wrap a PyObject*.
  1. In tp_clear, only delete references to elements contained in the dict. Move deallocation of the data structure to __dealloc__.
  1. Use random 31-bit multipliers (instead of 13 and 503) in the hash function for TripleDict. This might give better mixing for large sizes (in any case, it can't hurt).
  1. Rename dummy -> deleted_key to make it more clear what it means. Also, there was no reason that this was of type bytes. Now it is simply created by object().
  1. Change the logic of the lookup() methods a bit to make them easier to understand.
  1. Generic code cleanup: PEP 8, is instead of == for pointers, ...

Timings:

All the changes above lead to a modest speed-up:

MonoDict lookup:

sage: from sage.structure.coerce_dict import MonoDict; D = MonoDict()
sage: L = [Integer(x) for x in range(1000)]
sage: for k in L: D[k] = k
sage: timeit('[D[k] for k in L]', repeat=20, number=20000)

Before: 20000 loops, best of 20: 72.7 µs per loop

After: 20000 loops, best of 20: 64.5 µs per loop

TripleDict lookup:

sage: from sage.structure.coerce_dict import TripleDict; D = TripleDict()
sage: L = [(None, None, Integer(x)) for x in range(1000)]
sage: for k in L: D[k] = None
sage: timeit('[D[k] for k in L]', repeat=20, number=20000)

Before: 20000 loops, best of 20: 97.3 µs per loop

After: 20000 loops, best of 20: 87.3 µs per loop

Change History (32)

comment:1 Changed 3 years ago by jdemeyer

  • Description modified (diff)

comment:2 Changed 3 years ago by jdemeyer

  • Branch set to u/jdemeyer/clean_up_in_coerce_dict

comment:3 Changed 3 years ago by tscrim

  • Cc tscrim added
  • Commit set to 5abd7b8d5f12fd2d1b516d17feed7407c87381e1

New commits:

5abd7b8Clean up in coerce_dict

comment:4 Changed 3 years ago by jdemeyer

  • Description modified (diff)

comment:5 Changed 3 years ago by jdemeyer

  • Description modified (diff)

comment:6 Changed 3 years ago by jdemeyer

  • Description modified (diff)

comment:7 Changed 3 years ago by git

  • Commit changed from 5abd7b8d5f12fd2d1b516d17feed7407c87381e1 to dbf3831736b3ae1e20adc65d334da8764ebcf74c

Branch pushed to git repo; I updated commit sha1. This was a forced push. New commits:

dbf3831Clean up in coerce_dict

comment:8 Changed 3 years ago by jdemeyer

  • Description modified (diff)
  • Status changed from new to needs_review

comment:9 Changed 3 years ago by git

  • Commit changed from dbf3831736b3ae1e20adc65d334da8764ebcf74c to 15fe12d7f8b74204c1704ad860810df874294b4d

Branch pushed to git repo; I updated commit sha1. This was a forced push. New commits:

15fe12dClean up in coerce_dict

comment:10 Changed 3 years ago by jdemeyer

  • Cc chapoton added

cc chapoton because this is also relevant for Python 3.

comment:11 follow-ups: Changed 3 years ago by tscrim

Can you also justify 8 a little too? I was only able to get a loose understanding from a quick search.

Maybe I am too traditional, but why remove the explicit return values?

  • src/sage/structure/coerce_dict.pyx

    diff --git a/src/sage/structure/coerce_dict.pyx b/src/sage/structure/coerce_dict.pyx
    index bbd1ca7..cc2d04f 100644
    a b cdef class MonoDict: (this hunk was shorter than expected) 
    821869#on cyclic GC)
    822870
    823871cdef int MonoDict_traverse(MonoDict op, visitproc visit, void *arg):
    824     if op.table == NULL:
     872    if op.table is NULL:
    825873        return 0
    826874    Py_VISIT3(<PyObject*>op.eraser, visit, arg)
    827875    cdef size_t i
    828876    for i in range(op.mask + 1):
    829877        cursor = &op.table[i]
    830         if cursor.key_id != NULL and cursor.key_id != dummy:
     878        if valid(cursor.key_id):
    831879            Py_VISIT3(cursor.key_weakref, visit, arg)
    832880            Py_VISIT3(cursor.value, visit, arg)
    833     return 0
    cdef int MonoDict_clear(MonoDict op): 
    859906    del tmp
    860907    for i in range(mask+1):
    861908        cursor = &(table[i])
    862         if cursor.key_id != NULL and cursor.key_id != dummy:
     909        if valid(cursor.key_id):
    863910            cursor.key_id = dummy
    864911            Py_XDECREF(cursor.key_weakref)
    865912            Py_XDECREF(cursor.value)
    866     PyMem_Free(table)
    867     return 0
     913    sig_free(table)
     914
    868915
    869916(<PyTypeObject*>MonoDict).tp_traverse = <traverseproc>(&MonoDict_traverse)
    870917(<PyTypeObject*>MonoDict).tp_clear = <inquiry>(&MonoDict_clear)
    871918
     919
    872920cdef class TripleDictEraser:
    873921    """
    874922    Erases items from a :class:`TripleDict` when a weak reference becomes

I am assuming 5 is for better consistency with Python3. Although I think that items() should have a least a little one-line description saying that it is an iterator (and not a list/view).

Trivial and I know you essentially simply moved it, but only 2 spaces instead of 4:

    sage: K.<t> = GF(2^55)
    sage: for i in range(50):
    ....:   a = K.random_element()
    ....:   E = EllipticCurve(j=a)
    ....:   P = E.random_point()
    ....:   Q = 2*P

Also with its new placement, it does not need an #indirect doctest.

comment:12 in reply to: ↑ 11 Changed 3 years ago by jdemeyer

Replying to tscrim:

Maybe I am too traditional, but why remove the explicit return values?

Conceptually, these functions do not return anything. It is really analogous to a plain Python function which always returns None. The int return value exists only for a technical reason, namely exception handling.

comment:13 Changed 3 years ago by jdemeyer

  • Description modified (diff)

comment:14 Changed 3 years ago by jdemeyer

  • Description modified (diff)

comment:15 Changed 3 years ago by jdemeyer

  • Description modified (diff)

comment:16 Changed 3 years ago by git

  • Commit changed from 15fe12d7f8b74204c1704ad860810df874294b4d to 347a7c544f589d5306b14f24d51ba8da41cfa535

Branch pushed to git repo; I updated commit sha1. This was a forced push. New commits:

347a7c5Clean up in coerce_dict

comment:17 in reply to: ↑ 11 Changed 3 years ago by jdemeyer

Replying to tscrim:

Can you also justify 8 a little too?

The new class ObjectWrapper is a simple replacement for a PyCapsule. It's simpler because it doesn't need a name, a context or a destructor. It literally just stores a pointer and nothing else. This simplicity and the @cython.freelist() should make it extremely fast too.

comment:18 in reply to: ↑ 11 Changed 3 years ago by jdemeyer

Replying to tscrim:

I am assuming 5 is for better consistency with Python3.

Obviously. If we are implementing a kind of dict, we should use items() instead of iteritems().

Although I think that items() should have a least a little one-line description saying that it is an iterator (and not a list/view).

Done.

comment:19 Changed 3 years ago by jdemeyer

  • Description modified (diff)

I ended up doing a lot more cleanup in this version. Please review.

comment:20 Changed 3 years ago by git

  • Commit changed from 347a7c544f589d5306b14f24d51ba8da41cfa535 to 87a01932994209c9581c90e21de2e8c9e3573970

Branch pushed to git repo; I updated commit sha1. This was a forced push. New commits:

87a0193Clean up in coerce_dict

comment:21 follow-up: Changed 3 years ago by tscrim

  • Status changed from needs_review to needs_work

Thank you for the explanations. All of the changes make sense to me.

However, multiple patchbots are reporting multiple random seg faults. So something seems to be subtly breaking.

comment:22 in reply to: ↑ 21 ; follow-up: Changed 3 years ago by jdemeyer

  • Description modified (diff)
  • Status changed from needs_work to needs_review

Replying to tscrim:

However, multiple patchbots are reporting multiple random seg faults. So something seems to be subtly breaking.

I know, that was because I made a mistake in __dealloc__. I fixed that now, so we need to wait for new patchbot reports.

By the way, I realize that I'm making quite a lot of changes in this ticket. What started as a plan to fix the iteritems() calls ended up rewriting most of the coerce_dict.pyx file. However, most changes are pretty simple and do not change the functionality at all.

If there are certain changes that you cannot accept (either because you disagree or you don't understand), I can try to revert those.

comment:23 in reply to: ↑ 22 ; follow-up: Changed 3 years ago by tscrim

Replying to jdemeyer:

Replying to tscrim:

However, multiple patchbots are reporting multiple random seg faults. So something seems to be subtly breaking.

I know, that was because I made a mistake in __dealloc__. I fixed that now, so we need to wait for new patchbot reports.

Ah, I see. I couldn't quite tell if the patchbot reports were current or not.

By the way, I realize that I'm making quite a lot of changes in this ticket. What started as a plan to fix the iteritems() calls ended up rewriting most of the coerce_dict.pyx file. However, most changes are pretty simple and do not change the functionality at all.

It's good to get things done.

If there are certain changes that you cannot accept (either because you disagree or you don't understand), I can try to revert those.

I think I understand all of the changes and agree with them (or at least do not have any reason to disagree). So if the latest patchbots come back clean, then it will be a positive review.

comment:24 in reply to: ↑ 23 ; follow-up: Changed 3 years ago by jdemeyer

Replying to tscrim:

Ah, I see. I couldn't quite tell if the patchbot reports were current or not.

I believe that the patchbot icon which is displayed on the Trac page (i.e. this page) only considers the up-to-date reports. As I'm writing this comment, I see a status of "pending", meaning that no patchbot has tested this latest version.

comment:25 in reply to: ↑ 24 Changed 3 years ago by tscrim

Replying to jdemeyer:

Replying to tscrim:

Ah, I see. I couldn't quite tell if the patchbot reports were current or not.

I believe that the patchbot icon which is displayed on the Trac page (i.e. this page) only considers the up-to-date reports. As I'm writing this comment, I see a status of "pending", meaning that no patchbot has tested this latest version.

I've seen it as pending before when one of them is still working but another has finished and came back green. Although that might have been on a much older version of the patchbot.

comment:26 Changed 3 years ago by tscrim

  • Reviewers set to Travis Scrimshaw
  • Status changed from needs_review to positive_review

Greenbot.

comment:27 Changed 3 years ago by jdemeyer

  • Description modified (diff)

comment:28 Changed 3 years ago by jdemeyer

  • Description modified (diff)

comment:29 Changed 3 years ago by jdemeyer

  • Description modified (diff)

comment:30 Changed 3 years ago by git

  • Commit changed from 87a01932994209c9581c90e21de2e8c9e3573970 to 7c83e05a8df5e3bad8c909b33835a2bc5b1920e3
  • Status changed from positive_review to needs_review

Branch pushed to git repo; I updated commit sha1 and set ticket back to needs_review. New commits:

7c83e05Add comments on MonoDict/TripleDict attributes

comment:31 Changed 3 years ago by jdemeyer

  • Status changed from needs_review to positive_review

Added some harmless comments.

comment:32 Changed 3 years ago by vbraun

  • Branch changed from u/jdemeyer/clean_up_in_coerce_dict to 7c83e05a8df5e3bad8c909b33835a2bc5b1920e3
  • Resolution set to fixed
  • Status changed from positive_review to closed
Note: See TracTickets for help on using tickets.