Opened 6 years ago

Closed 5 years ago

Last modified 20 months ago

#15367 closed enhancement (fixed)

Empty lists while creating parents

Reported by: roed Owned by:
Priority: major Milestone: sage-6.1
Component: coercion Keywords:
Cc: nbruin, SimonKing, vbraun Merged in:
Authors: Nils Bruin Reviewers: Simon King
Report Upstream: N/A Work issues:
Branch: u/nbruin/ticket/15367 (Commits) Commit: 719cdec176875685142039dce297a7fd8ae4143b
Dependencies: Stopgaps:

Description

This is crazy:

sage: import gc
sage: pre = gc.get_objects()
sage: K = Qp(7,2)
sage: post = gc.get_objects()
sage: len(post) - len(pre)
747

Much of that is empty lists:

sage: M = []
sage: for a in post:
....:     for b in pre:
....:         if a is b:
....:             break
....:     else:
....:         M.append(a)
sage: from collections import defaultdict
sage: types = defaultdict(int)
sage: for a in M:
....:     types[type(a)] += 1
sage: print '\n'.join([str(a) + ' ' + str(types[a]) for a in types.keys() if types[a] > 1])
<type 'list'> 554
<type 'tuple'> 57
<class 'weakref.KeyedRef'> 28
<type 'weakref'> 22
<type 'dict'> 18
<type 'sage.structure.coerce_dict.MonoDictEraser'> 10
<type 'sage.structure.coerce_dict.MonoDict'> 10
<type 'cell'> 8
<type 'instancemethod'> 6
<type 'frame'> 6
<type 'sage.structure.coerce_dict.TripleDict'> 5
<type 'sage.structure.coerce_dict.TripleDictEraser'> 5
<type 'function'> 4
<type 'staticmethod'> 4
<class 'sage.structure.dynamic_class.DynamicMetaclass'> 4
<class 'sage.rings.homset.RingHomset_generic_with_category'> 2
<class '_ast.Name'> 2

sage: listlens = defaultdict(int)
sage: for a in M:
....:     if isinstance(a, list):
....:         listlens[len(a)] += 1
sage: sorted(listlens.items())
[(0, 525), (1, 9), (2, 2), (3, 2), (23, 10), (53, 5), (116583, 1)]

I'm not sure where these 525 empty lists are created, but it seems a bit excessive.

It's not just p-adic fields:

sage: pre = gc.get_objects()
sage: K = QuadraticField(-11)
sage: post = gc.get_objects()
sage: len(post) - len(pre)
1152

Attachments (2)

coerce_dict.pyx (52.3 KB) - added by nbruin 6 years ago.
MonoDict and TripleDict implemented by open addressing
coerce_dict.pxd (780 bytes) - added by nbruin 6 years ago.
MonoDict and TripleDict implemented by open addressing


Change History (63)

comment:1 Changed 6 years ago by nbruin

These are the empty buckets of the various TripleDict and MonoDict data structures:

sage: import gc
sage: pre = gc.get_objects()
sage: K = Qp(7,2)
sage: post = gc.get_objects()
sage: len(post) - len(pre)
682
sage: pre_id=set( id(p) for p in pre )
sage: new=[p for p in post if id(p) not in pre_id and isinstance(p,list) and len(p) ==0]

OK, we now have all the new lists of length 0. Let's see who's referring to them

sage: refs=[]
sage: for p in new: refs.extend(gc.get_referrers(p))

and count, among the referrers that are lists themselves, what the lengths are.

sage: from collections import Counter
sage: Counter(len(l) for l in refs if isinstance(l,list))
Counter({525: 525, 117431: 525, 53: 265, 23: 228})

The first one must be new itself and the one of length 117431 also seems to contain all 525 of them, so that must be post. The other ones have conspicuous lengths: 53 and 23:

sage: search_src("TripleDict\(23\)")
structure/parent.pyx:684:            self._action_hash = TripleDict(23)
structure/parent_old.pyx:87:        self._action_hash = TripleDict(23)
sage: search_src("MonoDict\(23\)")
structure/parent_gens.pyx:254:        self._has_coerce_map_from = MonoDict(23)
structure/parent.pyx:682:            self._coerce_from_hash = MonoDict(23)
structure/parent_old.pyx:85:        self._coerce_from_hash = MonoDict(23)
structure/parent_old.pyx:100:        self._has_coerce_map_from = MonoDict(23)
structure/parent_old.pyx:343:            self._has_coerce_map_from = MonoDict(23)
sage: search_src("MonoDict\(53\)")
structure/parent.pyx:686:            self._convert_from_hash = MonoDict(53)

so it looks like we have 265/53=5 (empty) self._convert_from_hash dictionaries and 228/23=9.9, so probably 10 (mostly) empty self._coerce_from_hash and self._action_hash dictionaries. Did we create 5 parents, perhaps?

If you think this is a real issue, we could open-code the hash tables for MonoDict and TripleDict in the same way that dict is implemented. That's quite a bit of work to program correctly, though! In the meantime, we could reduce the default sizes of the dictionaries a bit. Perhaps 53 and 23 are a bit on the large side for dictionaries that tend not to fill up much?

Last edited 6 years ago by nbruin

comment:2 Changed 6 years ago by afiori

The code:

import gc
memus = get_memory_usage()
for i in primes_first_n(1000):
    K = Qp(i,2)
    y = 2 - K(2)
    del K; del y
    if i%100 == 1:
        gc.collect()
gc.collect()
print get_memory_usage(memus)

This allocates about 64MB of RAM, which suggests that we are using about 64KB of RAM per p-adic field. There is a similar phenomenon for quadratic fields (about 50KB each). For normal use, no one is likely to care about a single allocation of 64KB; however, if you want to use, say, 20000 p-adic fields (or perhaps 20000 things with parents), you might care about the resulting 1.2GB of RAM.

The side issue that it is very easy to make p-adic fields or number fields un-garbage-collectable (hopefully fixed forever someday soon, see http://trac.sagemath.org/ticket/14711) makes it easy for a user to accidentally run out of RAM.

comment:3 Changed 6 years ago by afiori

There is a comment at parent.pyx:2196:

if self._coerce_from_hash is None: # this is because parent.__init__() does not always get called
    self.init_coerce(False)

that suggests that we can't rely on __init__ actually being responsible for allocating anything. Consequently, throughout parent.pyx and parent_old.pyx, most of the time we use any of

_action_hash, _has_coerce_map_from, _coerce_from_hash, _convert_from_hash

we check if they have been allocated yet. (There may be places where we don't check... which I suppose would be a bug).

This suggests the following fix:

Never allocate any of these unless we intend to put something in them. We already need to check whether they are unallocated; if they are, we can safely assume they are empty, and only create them when we are about to insert something. We can also make the eventual starting size smaller.

This may even speed up execution if these caches are typically empty, since looking whether an empty dictionary contains something is slower than checking that the dictionary doesn't even exist yet (especially as we already perform that existence check).
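To make the proposal concrete, here is a minimal sketch of the pattern (a plain dict stands in for MonoDict, and the class and method names are hypothetical; the real attributes live in parent.pyx):

class LazyCoerceCache(object):
    def __init__(self):
        # Not allocated until first use; None doubles as "certainly empty".
        self._coerce_from_hash = None

    def lookup_coercion(self, domain):
        cache = self._coerce_from_hash
        if cache is None:      # cheaper than probing an empty dictionary
            return None
        return cache.get(domain)

    def store_coercion(self, domain, morphism):
        if self._coerce_from_hash is None:
            self._coerce_from_hash = {}   # allocate only on first insertion
        self._coerce_from_hash[domain] = morphism

Since the None check is exactly the check the coercion code already performs, the lookup path does not get any slower.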

comment:4 follow-up: Changed 6 years ago by nbruin

Memory footprint:

sage: from sage.structure.coerce_dict import MonoDict
sage: memus= get_memory_usage();
sage: L=[MonoDict(23) for i in xrange(2*10^5)]
sage: print get_memory_usage(memus);
430.5078125
sage: memus= get_memory_usage();
sage: L2=[MonoDict(53) for i in xrange(2*10^5)]
sage: print get_memory_usage(memus);
908.0078125
sage: memus= get_memory_usage();
sage: L3=[sage.structure.coerce_dict.TripleDict(23) for i in xrange(2*10^5)]
sage: print get_memory_usage(memus);
430.5546875
sage: L4=[sage.structure.coerce_dict.TripleDict(11) for i in xrange(2*10^5)]
sage: print get_memory_usage(memus);
248.15234375

So we see that, assuming 1MB = 10^6 B, the memory footprint of an empty TripleDict or MonoDict is about 112, 93, and 86 bytes per initialized bucket for sizes 11, 23, and 53 respectively (empty TripleDict and MonoDict should have about the same footprint; this would not be true for an open-coded hash table). So scaling back all these dictionaries to a starting size of 11 is probably quite reasonable: that allows 6 entries in them before resizing would be triggered.

It also makes a difference in initialization time:

sage: from sage.structure.coerce_dict import MonoDict, TripleDict
sage: %time _=[sage.structure.coerce_dict.TripleDict(11) for i in xrange(2*10^5)]
CPU times: user 1.30 s, sys: 0.07 s, total: 1.37 s
Wall time: 1.37 s
sage: %time _=[sage.structure.coerce_dict.TripleDict(23) for i in xrange(2*10^5)]
CPU times: user 2.14 s, sys: 0.11 s, total: 2.25 s
Wall time: 2.26 s
sage: %time _=[sage.structure.coerce_dict.TripleDict(53) for i in xrange(2*10^5)]
CPU times: user 3.33 s, sys: 0.16 s, total: 3.49 s
Wall time: 3.50 s

As you can see, allocating the empty lists is a pretty significant cost as well. For an open-coded hash table we would expect, on a 64-bit machine, 16 bytes per entry in MonoDict and 32 bytes per entry in TripleDict (compared to 24 bytes for a python dict).

Furthermore, allocation time would be fairly independent of size: the store would simply be a slab of memory that can be initialized to null by a straight memory-zeroing call.

So I'm pretty sure we could make MonoDict and TripleDict quite a bit faster and more memory-efficient by basically just copying the open hash-table design of python's dict. We'd be working with table sizes that are a power of two, so we'd probably want to shift the addresses that we use as keys (given that the bottom 3 bits probably carry no information).
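As a rough illustration of the indexing scheme (a sketch under the stated assumption of 8-byte alignment of CPython objects, not the eventual Cython implementation):

def slot_index(obj, table_size):
    # table_size must be a power of two, so masking replaces the modulus.
    assert (table_size & (table_size - 1)) == 0
    h = id(obj) >> 3              # drop the alignment bits, which carry no information
    return h & (table_size - 1)   # slot in range(table_size)

Collisions would then be resolved by a probing sequence, as in CPython's dictobject.c.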

It's quite a bit of work to get right, so we should probably first establish that this is actually a bottleneck somewhere. I don't think the runtime characteristics of the implementations would be hugely different (but I cannot exclude something like a factor of 2 in runtime)

comment:5 in reply to: ↑ 4 ; follow-up: Changed 6 years ago by SimonKing

Replying to nbruin:

So I'm pretty sure we could make MonoDict and TripleDict quite a bit faster and more memory-efficient by basically just copying the open hash-table design for python's dict. We'd be working with table sizes that are a power of two, so we'd probably want to shift the addresses that we use as keys (given that the bottom 3 bits probably carry no information)

It's quite a bit of work to get right, ...

Indeed. Thus, I think it would be a good idea to start with small steps, that can easily be done and may improve the situation.

Namely:

  • Let the default size of coercion caches be smaller. Why don't we test how big the caches actually become during doctests? If it turns out that most of the time we only store about 20 coercions, it does not make sense to create 53 buckets; 23 would be more than enough.
  • Only create the dictionaries when we actually need them. I should add: Do not initialise all coercion caches of a parent at once.

comment:6 in reply to: ↑ 5 ; follow-up: Changed 6 years ago by nbruin

Replying to SimonKing:

  • Let the default size of coercion caches be smaller. Why don't we test how big the caches actually become during doctests? If it turns out that most of the time we only store about 20 coercions, it does not make sense to create 53 buckets; 23 would be more than enough.

Given that we should bound the fill ratio by about 60%-70%, that would not be enough. I expect, however, that many caches have very few entries, so going with 11 and resizing when required is likely the best option.

  • Only create the dictionaries when we actually need them. I should add: Do not initialise all coercion caches of a parent at once.

Hm, I wonder how the cost of that works out. Another option: make MonoDict etc. lazy in actually allocating the buckets (i.e., upon creation, just set self._buckets=None).

comment:7 follow-up: Changed 6 years ago by SimonKing

To test whether the initial size of the dicts is too big, it makes sense to see how often a resize of MonoDict and TripleDict actually happens.

First findings:

  • During startup of Sage, there is no resize of a MonoDict. There is exactly one TripleDict that is resized, and it is actually resized three times.
  • The size of a MonoDict or TripleDict is always (by automatic resize) smaller than the number of buckets. The first resize of the TripleDict mentioned above happens when it has 53 buckets but 38 items (and after resize, it has thus 107 buckets for 38 items!)

I will now see what happens during doctests.

Question: Is it really a good idea to have 107 buckets for 38 items?

comment:8 in reply to: ↑ 6 ; follow-up: Changed 6 years ago by SimonKing

Replying to nbruin:

Given that we should bound the fill ratio by about 60%-70%, that would not be enough.

What is the theory behind this choice?

  • Only create the dictionaries when we actually need them. I should add: Do not initialise all coercion caches of a parent at once.

Hm, I wonder how the cost of that works out. Another option: make MonoDict etc. lazy in actually allocating the buckets (i.e., upon creation, just set self._buckets=None).

Good idea! So, do not be lazy in coercion initialisation of Parent, but be lazy in the underlying data structure.

comment:9 in reply to: ↑ 7 Changed 6 years ago by nbruin

Replying to SimonKing:

Question: Is it really a good idea to have 107 buckets for 38 items?

The answer to this question is basically the same as "Is it really worth using a dictionary with lightning fast lookup and bad memory density rather than a linear search in a dense list?", except that we get to tune between the two.

Given cache locality, a linear search in a list of 38 items could be quite competitive!

Python dicts are even more aggressive in resizing: for small dictionary sizes, they aim for the next power of two that is at least FOUR times the number of items (and trigger it at 2/3 occupation rate). But then, their price per entry is 24 bytes.

comment:10 follow-up: Changed 6 years ago by SimonKing

Next findings:

  • MonoDict does not get resized often; during the doctests of sage.schemes, the largest size at which a resize happens is 1118 items.
  • TripleDict is resized more often. A size of 1277 is not uncommon, and the largest resize I see happens at 5125 items.

Anyway, I am not sure whether these figures hint at anything.

Do I understand correctly: Your suggestion is to start with a list (an actual initialised list) of buckets, but initially each bucket will be None rather than []. Hence, if the hash makes us choose a bucket that is None, then we immediately know that an item with this key hash does not exist yet, and we may either create a new bucket (if we want to add a new item) or immediately raise a KeyError (if we want to get an item).
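In code, that interpretation would look roughly like this (a sketch only, with bucket entries simplified to (key_id, value) pairs):

def bucket_get(buckets, h, key_id):
    bucket = buckets[h % len(buckets)]
    if bucket is None:                  # bucket never created: key certainly absent
        raise KeyError(key_id)
    for stored_id, value in bucket:     # short linear scan within the bucket
        if stored_id == key_id:
            return value
    raise KeyError(key_id)

def bucket_set(buckets, h, key_id, value):
    i = h % len(buckets)
    if buckets[i] is None:
        buckets[i] = []                 # create the bucket only on first write
    buckets[i].append((key_id, value))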

comment:11 in reply to: ↑ 10 Changed 6 years ago by nbruin

Replying to SimonKing:

Do I understand correctly: Your suggestion is to start with a list (an actual initialised list) of buckets, but initially each bucket will be None rather than [].

Uh, no. I was thinking more of starting out with the bucket list itself being None. Your interpretation is also possible, but addresses a slightly different issue. It could be worth doing both, one of the two, or neither.

For your proposal: I don't know what the cost difference is between making an empty list and putting another reference to None in, i.e., the cost of

   cdef list buckets = PyList_New(nbuckets)
   for i in range(nbuckets):
        Py_INCREF(None)   # PyList_SetItem steals a reference, so incref first
        PyList_SetItem(buckets, i, None)

versus

   cdef list buckets = PyList_New(nbuckets)
   for i in range(nbuckets):
        PyList_SetItem(buckets, i, PyList_New(0))  # the new list's reference is stolen

If the difference is significant (and it may well be, because increffing doesn't involve memory allocation) then your interpretation may well have great benefits (and surely reduced memory footprint).

If you're feeling gutsy, you could just leave the entries in buckets initialized with NULL and save on all the incref/decref activity on None.

Come to think of it, I expect that your interpretation would have the biggest impact.

comment:12 in reply to: ↑ 8 Changed 6 years ago by nbruin

Replying to SimonKing:

Replying to nbruin:

Given that we should bound the fill ratio by about 60%-70%, that would not be enough.

What is the theory behind this choice?

A reasonable start is the wikipedia article on hash tables. We're currently using "separate chaining", which degrades fairly gracefully with increasing load factor, so perhaps the 60%-70% load factor bound isn't as strict as the phase transition that open addressing undergoes around that point. You could experiment and tune a reasonable trade-off (the idea is that any time you don't hit what you're looking for on the first try, you've basically lost; hash conflict resolution should rarely be necessary).

The other design limitation is that reshaping is expensive (O(size of dict)), so that should happen rarely. That means that if you increase the number of bins, you have to do so by a lot.
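For intuition on the 60%-70% figure: under idealized uniform probing, each probe of a table with load factor alpha independently hits an occupied slot with probability alpha, so an unsuccessful search needs about 1/(1-alpha) probes on average:

for alpha in (0.5, 0.6, 0.7, 0.8, 0.9):
    print "load %.1f -> about %.1f probes" % (alpha, 1.0/(1.0 - alpha))

This prints 2.0, 2.5, 3.3, 5.0 and 10.0 probes respectively: the cost is still modest at 70% occupation but deteriorates quickly beyond it.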

comment:13 Changed 6 years ago by nbruin

After figuring out how python dicts work in order to improve WeakValueDictionary, it wasn't so complicated to replicate the design for MonoDict and TripleDict either. Things of note:

  • I tried to be drop-in compatible. I haven't run doctests yet, but adapting should be straightforward
  • Performance is noticeably better than the previous implementation. Given how much overhead a per-line timeit has due to the python interpreter, I expect that this means the performance is considerably better (but in applications it probably doesn't matter that much, since our previous implementation was already pretty good)
  • regular deletions are done by placing the items to be deleted in a list (yes, that means allocating a list, but weakref callbacks etc. are allowed memory allocations; it would be virtually impossible to write python interpreter code that doesn't do that) and then throwing away the list; see the sketch after these lists. Advantage: we borrow python's trashcan, which is already implemented for container types like lists. Of course there's the slight disadvantage of having some memory allocation activity, but that should be pretty fast (every python function call involves constructing tuples etc.)
  • storage should be much more compact, which should make code more cache-friendly as well. We don't have per-bucket overhead anymore: open addressing shares all space.
  • the design sizes are irrelevant: the table size is just a power of 2, but we use different hashing, so we should still get a good spread.

Some little problems:

  • rolling your own dynamically allocated store for python object links means that cython's generated GC traverse and clear code isn't sufficient. I wrote my own and hacked them into the right slots.
  • we need to do some extensive testing to check no silly mistakes are left.
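As a sketch of the deletion strategy mentioned in the first list above (illustrative Python, not the attached Cython code): the point is that the final teardown of a stored item happens when a temporary list dies, outside the table mutation, so python's trashcan machinery protects against recursion depth errors:

def delete_slot(table, i):
    pending = [table[i]]   # a small allocation, which is fine here: weakref
                           # callbacks are allowed to allocate memory anyway
    table[i] = None        # the table is consistent from this point on
    del pending            # the entry is torn down here, via the list
                           # deallocator and hence python's trashcan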

Changed 6 years ago by nbruin

MonoDict and TripleDict implemented by open addressing

Changed 6 years ago by nbruin

MonoDict and TripleDict implemented by open addressing

comment:14 Changed 6 years ago by nbruin

With the attached files the doctests succeed (modulo some trivial failures).

comment:15 follow-up: Changed 6 years ago by SimonKing

With the old code, it would have been relatively easy to create a dictionary with a fixed key size (e.g. a dictionary whose keys are 5-tuples of objects that are weakly referenced if possible and compared by identity).

One question we might address: How difficult would this be with the new code you are suggesting? You are defining cdef struct mono_cell and cdef struct triple_cell. One might think of this definition

cdef struct tuple_cell:
    void** key_id_list
    PyObject** key_weakref_list
    PyObject* value

and then a KeytupleDict (or whatever we call it) would, for each item it stores, allocate enough memory for n pointers (n being the key size of the dictionary) for both key_id_list and key_weakref_list. At first glance, your code does not look difficult to generalise.

So, do you think we should have MonoDict and TripleDict for hard-coded key length and then (on a different ticket?) KeytupleDict for arbitrary fixed key length?

comment:16 in reply to: ↑ 15 Changed 6 years ago by nbruin

Replying to SimonKing:

So, do you think we should have MonoDict and TripleDict for hard-coded key length and then (on a different ticket?) KeytupleDict for arbitrary fixed key length?

I do not think so. With the previous code you had a prototype that performed well and offered this feature and we didn't put it in. That suggests to me we don't have a need for it.

In any case, if you did do this, I don't think you'd want to add an extra layer of indirection in the key fields. Hence, you'd have a dynamic-length structure:

cdef struct tuple_cell:
    void* key_ids[length]
    PyObject* key_weakrefs[length]
    PyObject* value

which of course you cannot define literally, but you only need its sizeof, which should fairly safely be sizeofcell = (2*length+1)*sizeof(void*). Then you can have your table allocated as

table=<void**>PyMem_Malloc(newsize * sizeofcell)

and entry i would be addressed as

cursor=<void**>&(table[(i&mask)*(2*length+1)])
key_id1=cursor[0]
key_id2=cursor[1]
key_wr1=<PyObject*> cursor[length]
key_wr2=<PyObject*> cursor[length+1]
value  =<PyObject*> cursor[2*length]

One problem with open addressing is that it has no cache locality at all. That's not a huge problem if the table is compact (and your hits are almost always immediate). Since the weakrefs and value are only used for validation, not searching, you could consider storing those under an indirection.

I think we should only generalize the design if we know we have a need for it. I think first we should establish if this implementation is really an advantage, if the hash functions perform well, and if the resizing amount is reasonable.

One way would be to let lookup keep statistics:

  • number of iterations needed
  • mask,fill,used statistics

and perhaps also for the dictionaries themselves:

  • calls to resize
  • calls to iteritems (especially with what mask:used:fill ratio that gets called)

See: http://code.activestate.com/recipes/578375/ for a proposal that makes a much more compact search table with compact iteration.

And as well: profiling data on how much time is typically spent in this code. If that's 2%, even doubling the speed would not have much effect. We may be hitting this code fairly often, but it's still pretty specialized: not like a dict, of which several get queried for any method lookup.

comment:17 follow-up: Changed 6 years ago by SimonKing

I can confirm that all doctests pass, except some in sage.structure.coerce, which explicitly use a method that shows the distribution of values in the buckets and is thus not relevant.

Suggestion: I turn your two files into a git branch, remove the useless stuff in coerce.pyx, and then we can start reviewing (and testing performance).

comment:18 Changed 6 years ago by SimonKing

  • Branch set to u/SimonKing/ticket/15367

comment:19 in reply to: ↑ 17 Changed 6 years ago by SimonKing

  • Authors set to Nils Bruin
  • Commit set to a2852e9610a76e6df392544b01b8e3c319b1cdd4
  • Reviewers set to Simon King

Replying to SimonKing:

Suggestion: I turn your two files into a git branch, remove the useless stuff in coerce.pyx, and then we can start reviewing (and testing performance).

Well, I simply did...

The first commit is the result of your drop-in replacement; the second commit removes the get_stats() method, which is a debug method that is no longer functional. I think the second commit is nothing but a review patch. Filling the Author and Reviewer fields accordingly---but the review is not completed yet!


New commits:

a2852e9  Remove unused debug method CoercionModel_cache_maps.get_stats()
1724e0e  An improved implementation of MonoDict and TripleDict

comment:20 Changed 6 years ago by SimonKing

With the attached branch, I repeated the example from the ticket description:

sage: import gc
sage: pre = gc.get_objects()
sage: K = Qp(7,2)
sage: post = gc.get_objects()
sage: len(post) - len(pre)
211

And from comment:2 (starting a new session):

sage: M = []
sage: memus= get_memory_usage()
sage: for i in primes_first_n(1000):
....:     K = Qp(i,2)
....:     y=2-K(2)
....:     del K,y
....:     if i%100 == 1:
....:         gc.collect()
....: gc.collect()
....: print get_memory_usage(memus)
....: 
30
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
12.48828125

With sage-5.12.beta5+#13394, I get only "0" for each gc.collect(), and in the end get_memory_usage(memus) returns 30.2265625.

I have to leave now, but will tell something about memory footprint later.

comment:21 Changed 6 years ago by SimonKing

  • Work issues set to Improve timings

I have good and bad news. The good news first.

Repeating the tests from comment:4, with the attached branch:

sage: from sage.structure.coerce_dict import MonoDict
sage: memus= get_memory_usage()
sage: L=[MonoDict(23) for i in xrange(2*10^5)]
sage: print get_memory_usage(memus)
54.984375
sage: memus= get_memory_usage()
sage: L2=[MonoDict(53) for i in xrange(2*10^5)]
sage: print get_memory_usage(memus)
55.2109375
sage: memus= get_memory_usage()
sage: L3=[sage.structure.coerce_dict.TripleDict(23) for i in xrange(2*10^5)]
sage: print get_memory_usage(memus)
79.83203125
sage: memus= get_memory_usage()
sage: L4=[sage.structure.coerce_dict.TripleDict(11) for i in xrange(2*10^5)]
sage: print get_memory_usage(memus)
79.5429687500000

So, that's quite an improvement.

More good news. I compare sage-5.12.beta5+#13394 with the branch from here. First, without the branch:

sage: %time _=[sage.structure.coerce_dict.TripleDict(11) for i in xrange(2*10^5)]
CPU times: user 2.50 s, sys: 0.12 s, total: 2.61 s
Wall time: 2.62 s
sage: %time _=[sage.structure.coerce_dict.TripleDict(23) for i in xrange(2*10^5)]
CPU times: user 4.04 s, sys: 0.16 s, total: 4.21 s
Wall time: 4.21 s
sage: %time _=[sage.structure.coerce_dict.TripleDict(53) for i in xrange(2*10^5)]
CPU times: user 6.10 s, sys: 0.27 s, total: 6.36 s
Wall time: 6.38 s

With the branch:

sage: %time _=[sage.structure.coerce_dict.TripleDict(11) for i in xrange(2*10^5)]
CPU times: user 2.48 s, sys: 0.10 s, total: 2.57 s
Wall time: 2.59 s
sage: %time _=[sage.structure.coerce_dict.TripleDict(23) for i in xrange(2*10^5)]
CPU times: user 2.19 s, sys: 0.07 s, total: 2.26 s
Wall time: 2.27 s
sage: %time _=[sage.structure.coerce_dict.TripleDict(53) for i in xrange(2*10^5)]
CPU times: user 2.51 s, sys: 0.02 s, total: 2.53 s
Wall time: 2.53 s

Thus, there is clear progress for larger empty dictionaries.

Now for the bad news. I am testing how efficient it is to fill an empty TripleDict, triggering many resizes, and reading and deleting items. First, without the branch:

sage: cython("""
....: def test(T, list L, v):
....:     for i in L:
....:         for j in L:
....:             for k in L:
....:                 T[i,j,k] = v
....: """)
....: 
sage: T = sage.structure.coerce_dict.TripleDict(11)
sage: L = srange(100)
sage: %time test(T, L, ZZ)
CPU times: user 15.23 s, sys: 0.28 s, total: 15.51 s
Wall time: 15.54 s
sage: cython("""
....: def test2(T, list L):
....:     for i in L:
....:         for j in L:
....:             for k in L:
....:                 v = T[i,j,k]
....:                 del T[i,j,k]
....: """)
....: 
sage: %time test2(T, L)
CPU times: user 2.74 s, sys: 0.06 s, total: 2.81 s
Wall time: 2.81 s

Now, the same with the branch:

sage: T = sage.structure.coerce_dict.TripleDict(11)
sage: L = srange(100)
sage: %time test(T, L, ZZ)
CPU times: user 131.21 s, sys: 1.36 s, total: 132.57 s
Wall time: 133.11 s
sage: %time test2(T, L)
CPU times: user 46.98 s, sys: 0.58 s, total: 47.56 s
Wall time: 47.97 s

Conclusion

The memory footprint and the creation time of an empty dictionary improved considerably. However, the timings for setting, getting and deleting items became very bad. These operations are of vital importance for the coercion framework, so this needs to be fixed.

comment:22 Changed 6 years ago by SimonKing

For completeness (with the same definition of test(T,L,v) and test2(T,L) as in the previous post):

sage: T = {}
sage: L = srange(100)
sage: %time test(T, L, ZZ)
CPU times: user 0.53 s, sys: 0.05 s, total: 0.58 s
Wall time: 0.58 s
sage: %time test2(T, L)
CPU times: user 0.67 s, sys: 0.01 s, total: 0.68 s
Wall time: 0.68 s

Do I understand correctly that your version of TripleDict and MonoDict mimics Python's implementation of a dict? Then why is it so much slower?

comment:23 follow-ups: Changed 6 years ago by afiori

Doesn't your code only test the setting and deleting of items? One hopes that getting is by far the most common action; otherwise caching with a hash table is the wrong choice.

Even so, obviously it would be best to improve the speed of the new code to match the old code, but if this isn't possible one could add parameters to parent constructors of the form:

expect_lots_of_coercion = True, expect_lots_of_actions=True

and let either the user or the class implementer decide if it is likely there will be a lot of coercions or actions and pass the appropriate parameter.

The following is a bit off topic. Implicit coercion is clearly important (and incredibly useful), but wouldn't it always be strictly faster for a caller to explicitly coerce with the correct function rather than letting things get to the lookup code? So couldn't a user get more of a speed boost by being warned that an implicit coercion is happening and being encouraged to do the correct cast manually? (For example, in the code I posted above, I assume K(2)-K(2) would be strictly faster than 2-K(2).) I know various databases have settings to either warn on implicit coercion or to disable it completely; can this be done in Sage?

comment:24 follow-up: Changed 6 years ago by nbruin

Ouch, that is indeed not such good news and with such a degradation in performance I would definitely not recommend merging it. Things I can think of:

  • We might have to improve the initial seed of the hash function, in case we're getting many collisions. For dict, the old MonoDict and the new MonoDict, we're using three different hash functions. In the test above you never timed pure data item retrieval (both existing and non-existing entries); if we can have that data too, we might be able to see whether the problem is there.
  • We are probably hitting resizing more often (since dict starts with quadrupling, it resizes less), so if that's a problem we could be more aggressive there.
  • I am allocating memory on delete as well, for well-documented reasons. That's more expensive than straight dereffing. Maybe much more expensive?
  • perhaps the GC code around the new dicts is less than optimal? I checked the C-code that Cython generated for the _traverse and _clear methods and was pretty happy with them, but perhaps there's overhead there as well.

Finally, note that MonoDict and TripleDict are optimized for read speed, not particularly the other operations. I think that's the frequent operation on them in the coercion framework; discovery much less so (otherwise parents must be disappearing/reappearing all the time and the creation overhead of the parent is likely much higher).

comment:25 in reply to: ↑ 23 Changed 6 years ago by SimonKing

Replying to afiori:

Doesn't your code only test the setting and deleting of items?

No. test2() also involves v=T[i,j,k], hence, getting. I did not bother to write separate tests for the two.

comment:26 in reply to: ↑ 24 Changed 6 years ago by SimonKing

Replying to nbruin:

Finally, note that MonoDict and TripleDict are optimized for read speed, not particularly the other operations.

OK, then we should indeed better separate the test v=T[i,j,k] (both existing and non-existing items), del T[i,j,k] and T[i,j,k]=v (both adding new items and overriding an existing item).

comment:27 in reply to: ↑ 23 Changed 6 years ago by nbruin

Replying to afiori:

Even so, obviously it would be best to improve the speed of the new code to match the old code, but if this isn't possible one could add parameters to parent constructors of the form

expect_lots_of_coercion = True, expect_lots_of_actions=True

and let either the user or the class implementer decide if it is likely there will be a lot of coercions or actions and pass the appropriate parameter.

No, sticking with the old implementation (possibly made lazy) would probably be a far better choice. I should stress this code is an experiment. Perhaps open addressing isn't an appropriate choice for our application or perhaps I just made an error in my implementation. I didn't replicate all the tuning choices made in python's dict but went for a "straight" implementation first.

The following is a bit off topic, implicit coercion is clearly important (and incredibly useful), but wouldn't it always be strictly faster for a caller to explicitly coerce with the correct function rather than letting things get to the lookup code?

Yes, but probably hardly noticeably so. The conversions are going to be the expensive bit. Far better: make sure that your elements are all in the right parent anyway (for actions this doesn't apply of course).

comment:28 Changed 6 years ago by nbruin

Hm, I'm getting on vanilla 5.10:

sage: %time test(T,L,ZZ)
CPU times: user 6.54 s, sys: 0.02 s, total: 6.56 s
Wall time: 6.58 s
sage: %time test(T,L,ZZ)
CPU times: user 0.79 s, sys: 0.00 s, total: 0.79 s
Wall time: 0.79 s

and on 5.13 with patch:

sage: %time test(T, L, ZZ)
CPU times: user 19.53 s, sys: 0.02 s, total: 19.56 s
Wall time: 19.67 s
sage: %time test(T, L, ZZ)
CPU times: user 6.46 s, sys: 0.00 s, total: 6.46 s
Wall time: 6.49 s

so the gap for me is a little smaller, but definitely there. (Running a second time gets rid of resizing in both cases; you're then basically timing reassignment.) I think the new hash might be bad in the face of permutations.

When I instrument lookup to print the length of the probe sequence until a result, together with (mask,used,fill), I get the impression the probe sequences are a bit on the long side, indicating poor hashing. We may have to choose a slightly more convoluted way of combining the bits of the addresses. In particular, if I keep track of:

summed_expected += (<double> j) - (<double> self.mask+1)/(<double> self.mask+1-self.fill)

where j is the number of iterations needed by lookup, I think I should get something that averages 0 (since I'm subtracting the expected number of samples needed to obtain a NULL), but I'm getting way larger numbers. That indicates to me poorly randomized probing sequences.
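The subtracted term is presumably the mean of a geometric distribution: with mask+1 slots of which fill are occupied, each independent probe finds a free slot with probability (mask+1-fill)/(mask+1), so on average (mask+1)/(mask+1-fill) probes are needed to obtain a NULL. A quick simulation, assuming uniform probing, confirms the term:

import random

def mean_probes(N, fill, trials=10**5):
    # Average number of uniformly random probes until a free slot is hit.
    total = 0
    for _ in range(trials):
        total += 1                               # the first probe
        while random.random() < float(fill)/N:   # probe hit an occupied slot
            total += 1                           # ...so probe again
    return total / float(trials)

# mean_probes(128, 80) comes out near 128.0/(128 - 80) = 2.666...,
# matching the (mask+1)/(mask+1-fill) term above.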

Deletion is indeed also slower (as expected)

EDIT: I think the hash indeed is very bad. If instead we do

        cdef size_t key = <size_t>key1 + 13*(<size_t>key2) ^ 503*(<size_t>key3)
        cdef size_t i = key>>8 + key
        cursor = &(table[i & mask])
        perturb = (key)>>3

(i.e., borrowing the function originally used) I still see summed_expected slowly drift upward, but nowhere near as quickly. I didn't do the statistics carefully, so that may even be expected. My preliminary timings show that on your test script, the new implementation now solidly outperforms the old implementation. Perhaps you want to check as well?

Last edited 6 years ago by nbruin

comment:29 Changed 6 years ago by nbruin

To compare, with the new hash function we get for the script

%cpaste
cython("""
def test(T, list L1, list L2, list L3, v):
    for i in L1:
        for j in L2:
            for k in L3:
                T[i,j,k] = v
def test2(T, list L1, list L2, list L3):
    for i in L1:
        for j in L2:
            for k in L3:
                v = T[i,j,k]
def test3(T, list L1, list L2, list L3):
    for i in L1:
        for j in L2:
            for k in L3:
                (i,j,k) in T
def test4(T, list L1, list L2, list L3):
    for i in L1:
        for j in L2:
            for k in L3:
                (i,j+1,k) in T
def test5(T, list L1, list L2, list L3):
    for i in L1:
        for j in L2:
            for k in L3:
                del T[i,j,k]
""")
T = sage.structure.coerce_dict.TripleDict(11)
L1 = srange(100)

With vanilla 5.10:

sage: %time test(T,L1,L1,L1,ZZ);
CPU times: user 6.61 s, sys: 0.08 s, total: 6.69 s
Wall time: 6.72 s
sage: %time test2(T,L1,L1,L1);
CPU times: user 0.82 s, sys: 0.00 s, total: 0.82 s
Wall time: 0.83 s
sage: %time test3(T,L1,L1,L1);
CPU times: user 0.83 s, sys: 0.00 s, total: 0.83 s
Wall time: 0.83 s
sage: %time test4(T,L1,L1,L1);
CPU times: user 0.68 s, sys: 0.00 s, total: 0.68 s
Wall time: 0.68 s
sage: %time test5(T,L1,L1,L1);
CPU times: user 0.81 s, sys: 0.00 s, total: 0.81 s
Wall time: 0.81 s

On 5.13beta(something) with the patch:

sage: %time test(T,L1,L1,L1,ZZ);
CPU times: user 3.28 s, sys: 0.06 s, total: 3.34 s
Wall time: 3.35 s
sage: %time test2(T,L1,L1,L1);
CPU times: user 0.61 s, sys: 0.00 s, total: 0.61 s
Wall time: 0.61 s
sage: %time test3(T,L1,L1,L1);
CPU times: user 0.54 s, sys: 0.00 s, total: 0.54 s
Wall time: 0.54 s
sage: %time test4(T,L1,L1,L1);
CPU times: user 0.08 s, sys: 0.00 s, total: 0.08 s
Wall time: 0.08 s
sage: %time test5(T,L1,L1,L1);
CPU times: user 0.28 s, sys: 0.00 s, total: 0.28 s
Wall time: 0.29 s

so creation, finding elements that are not in there (though that claim would need more testing; it may be that our keys/dict layout is skewed in this example), and deletion are very much faster, and retrieving elements is quite a bit faster. It seems like a win to me (plus: #15069 issues should now be properly fixed in most situations, thanks to the borrowed trashcan, though we could use that in the other design too).

Last edited 6 years ago by nbruin

comment:30 Changed 6 years ago by nbruin

  • Branch changed from u/SimonKing/ticket/15367 to u/nbruin/ticket/15367

Please pull this for further review. It has an updated hash function in TripleDict and some other minor improvements.

Last edited 6 years ago by nbruin

comment:31 Changed 6 years ago by nbruin

  • Commit changed from a2852e9610a76e6df392544b01b8e3c319b1cdd4 to cce61195a94ce70f27c0a3efeafbccf4f190f5f0
  • Component changed from memleak to coercion
  • Status changed from new to needs_review
  • Type changed from defect to enhancement
  • Work issues Improve timings deleted

New commits:

cce6119  better hashing for TripleDict, use PyCapsule for callbacks, and fix KeyedRef

comment:32 follow-up: Changed 6 years ago by SimonKing

Can you elaborate: What is PyCapsule? I've never heard of it.

Would it make sense to have cdef type fixed_KeyedRef=KeyedRef instead of just cdef fixed_KeyedRef=KeyedRef? After all, we do "isinstance" for a couple of times, so, it might help Cython if it knows that KeyedRef actually is a type.

Concerning the hash function: Didn't we have the current hash function <size_t>key1 + 13*(<size_t>key2) ^ 503*(<size_t>key3) in the original implementation? If I recall correctly, I chose it because it was recommended as a good hash function for tuples. Why did you (temporarily?) change it to <size_t>key1 + (<size_t>key2)<<2 + (<size_t>key3)<<4?

comment:33 in reply to: ↑ 32 ; follow-ups: Changed 6 years ago by nbruin

Replying to SimonKing:

Can you elaborate: What is PyCapsule? I've never heard of it.

It's a python object wrapping a void *: http://docs.python.org/2/c-api/capsule.html , so it does exactly what we need. In cython it would be something along the lines of:

cdef class myPyCapsule:
  cdef void* pointer

I don't think doing that would be much faster, though: we'd be receiving an <object> that we have to cast to a myPyCapsule, so cython would be generating the type-testing code anyway, which is probably what PyCapsule_GetPointer spends most of its time on. So going with the standard python solution seems appropriate.

Previously, we were casting <void*> to <Py_ssize_t>, which we then wrapped as a PyInt object. PyCapsule seems more appropriate. (In our previous implementation it wasn't, because our buckets were PyLists, so our keys weren't stored as bare void* anyway. I don't think PyCapsule objects have sensible comparison semantics, so packaging them as PyInt made more sense there.)

Would it make sense to have cdef type fixed_KeyedRef=KeyedRef instead of just cdef fixed_KeyedRef=KeyedRef? After all, we do "isinstance" for a couple of times, so, it might help Cython if it knows that KeyedRef actually is a type.

I suspect that for creation cython generates a PyObject_CallObject anyway, so there it shouldn't matter. KeyedRef is implemented in python, so there's no sense in trying to do anything smarter there.

For the typecheck it might make a difference, but I expect we have to nudge cython on that by explicitly using PyObject_TypeCheck. I'm pretty sure cython's isinstance will just translate into PyObject_IsInstance (or perhaps it really just goes off and looks up the function in the builtin dict). And that actually does happen in the more critical bits of the code! It's an easy change to make. I'll push it later.

Concerning the hash function: Didn't we have the current hash function <size_t>key1 + 13*(<size_t>key2) ^ 503*(<size_t>key3) in the original implementation? If I recall correctly, I chose it because it was recommended as a good hash function for tuples.

Why did you (temporarily?) change it to <size_t>key1 + (<size_t>key2)<<2 + (<size_t>key3)<<4?

It is indeed a good hash function, particularly for moduli that are a power of 2. Where did you get it from? Or was it already in the code?

My change was just misguided. I thought the other hash was based on "prime modulus", so I wasn't assuming it was particularly good for my situation. And I thought the new code wouldn't be so sensitive, because it still has the "perturb" as well. I was wrong.

comment:34 follow-up: Changed 6 years ago by git

  • Commit changed from cce61195a94ce70f27c0a3efeafbccf4f190f5f0 to b95fa73c3fca342851dde858028a12629c6147ee

Branch pushed to git repo; I updated commit sha1. New commits:

b95fa73  Faster type checking for KeyedRef

comment:35 in reply to: ↑ 34 Changed 6 years ago by nbruin

Using PyObject_TypeCheck instead of isinstance significantly improves performance of lookup (where we need to check if keys are weakrefs and if they are still alive), and that's actually a scenario we care about. Thanks for the suggestion (I'm pretty sure any further performance improvements will not have a measurable effect on sage performance in general, but this one was silly not to include).

comment:36 in reply to: ↑ 33 Changed 6 years ago by SimonKing

Replying to nbruin:

Replying to SimonKing:

Can you elaborate: What is PyCapsule? I've never heard of it.

It's a python object wrapping a void *: http://docs.python.org/2/c-api/capsule.html , so it does exactly what we need.

Thank you for the pointer!

Concerning the hash function: Didn't we have the current hash function <size_t>key1 + 13*(<size_t>key2) ^ 503*(<size_t>key3) in the original implementation? If I recall correctly, I chose it because it was recommended as a good hash function for tuples.

It is indeed a good hash function, particularly for moduli that are a power of 2. Where did you get it from? Or was it already in the code?

It has been in the code (as I have just checked by looking at the patch from #715).

When we tried to create a generalisation of TripleDict towards a dictionary with fixed key length at #12313, I did some googling for hash functions for tuples. I don't recall the web page, but it resulted in this code; see the attachment "idkey_dict" at #12313:

cdef list find_bucket(self, tuple keytuple):
    # Determine the appropriate bucket
    cdef size_t h = 0x345678
    cdef size_t i
    for i from 0<=i<self._keylen:
        h *= 1000003
        h ^= <size_t>PyTuple_GET_ITEM(keytuple,i)
    cdef list all_buckets = self.buckets
    return <object>PyList_GET_ITEM(all_buckets, h % PyList_GET_SIZE(all_buckets))

comment:37 in reply to: ↑ 33 Changed 6 years ago by nbruin

Replying to nbruin:

I suspect that for creation cython generates a PyObject_CallObject

I checked and it does (also when it is declared a type)

I'm pretty sure cython's isinstance will just translate into PyObject_IsInstance

Cython is amazing: If the second argument is a type it does NOT! It does generate the more efficient PyObject_TypeCheck. So the latest patch really only needed cdef type fixed_KeyedRef = KeyedRef.

comment:38 Changed 6 years ago by git

  • Commit changed from b95fa73c3fca342851dde858028a12629c6147ee to 9022b3226b0439fdb8d48944c3eb175f5499e49b

Branch pushed to git repo; I updated commit sha1. This was a forced push. Recent commits:

9022b32  trac #15367: faster type checking by declaring fixed_KeyedRef to be type
cce6119  better hashing for TripleDict, use PyCapsule for callbacks, and fix KeyedRef
a2852e9  Remove unused debug method CoercionModel_cache_maps.get_stats()
1724e0e  An improved implementation of MonoDict and TripleDict

comment:39 Changed 6 years ago by nbruin

Hm, perhaps this is a good argument for trying to avoid circular structures, and hence for avoiding encoding erasers via closures; instead we can do it via an eraser class that has a weak reference to the dictionary:

sage: import gc
sage: 
sage: T=weakref.WeakKeyDictionary()
sage: 
sage: L=[GF(p) for p in prime_range(1,1000)]
sage: for i in range(len(L)-1): T[L[i+1]]=L[i]
sage: del L
sage: print len(T)
167
sage: C=0
sage: while len(T) > 0:
....:         C+=1
....:         _=gc.collect()
....:     
sage: print C
167

As you can see, every GC pass only collects one finite field. This is what one should expect: when GC does its analysis, it finds there is only one field that is unreachable, so GC collects that. As a knock-on effect, that causes another field to be decreffed. However, since that field is part of a reference cycle, that doesn't cause its refcount to hit 0. It takes the next GC pass to find that, thanks to the decref, the field is now unreachable and hence gets collected.

In order for GC to be more effective, we should avoid circular references if we easily can. So, in our own WeakValueDictionary from #13394 as well as in MonoDict and in TripleDict, I think we should have our erasers as a class (as we had). So where the argument on http://bugs.python.org/issue417795 didn't immediately convince me, now it does. We can fix this on a separate ticket if we want to, since #13394 is already merged and the slower GC is not really a bug.

comment:40 Changed 6 years ago by git

  • Commit changed from 9022b3226b0439fdb8d48944c3eb175f5499e49b to b8d5a5a2442bfe718424b888a68e44b421ac89d1

Branch pushed to git repo; I updated commit sha1. This was a forced push. Recent commits:

b8d5a5a  Remove unused debug method CoercionModel_cache_maps.get_stats()
8144996  trac #15367: Improved implementation of TripleDict and MonoDict, based on open addressing. This implementation should be faster for all operations and have a smaller memory footprint.

comment:41 Changed 6 years ago by nbruin

Reverted to an implementation that uses an eraser class keeping a weak reference to the dictionary it erases on, rather than a closure for an eraser, which would keep a strong reference. Rationale: there are some adverse effects to reference cycles. For instance, if a GC clean-up has the side-effect (due to a callback, for instance) of making a cycle unreachable, that cycle will only be picked up during the next GC pass. Non-cyclic garbage, on the other hand, has refcounts hitting 0 and gets cleaned up immediately. So, if one can easily avoid cycles (and we can here), then it is better to do so.
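A minimal sketch of the pattern (hypothetical class and attribute names, not the committed code):

import weakref

class Eraser(object):
    # Erasure callback holding only a *weak* reference to its dictionary,
    # so no dict -> eraser -> dict strong cycle is created.
    def __init__(self, D):
        self.D = weakref.ref(D)

    def __call__(self, r):
        D = self.D()
        if D is not None:              # the dictionary may itself be gone already
            D._data.pop(r.key, None)   # r.key was attached via KeyedRef below

class WeakKeyedDictSketch(object):
    def __init__(self):
        self._eraser = Eraser(self)
        self._data = {}                # id(key) -> (KeyedRef to key, value)

    def set(self, key, value):
        # key must be weak-referenceable for this to work
        r = weakref.KeyedRef(key, self._eraser, id(key))
        self._data[id(key)] = (r, value)

When a key dies, the interpreter calls the eraser immediately via refcounting, and the entry disappears without waiting for a cyclic GC pass.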

comment:42 Changed 6 years ago by git

  • Commit changed from b8d5a5a2442bfe718424b888a68e44b421ac89d1 to 9aeb4f5793ae93e4e0e8dbdea726fb2d498abc8c

Branch pushed to git repo; I updated commit sha1. New commits:

9aeb4f5  trac #15367: fix whitespace

comment:43 Changed 6 years ago by git

  • Commit changed from 9aeb4f5793ae93e4e0e8dbdea726fb2d498abc8c to 719cdec176875685142039dce297a7fd8ae4143b

Branch pushed to git repo; I updated commit sha1. This was a forced push. Recent commits:

719cdec  Remove unused debug method CoercionModel_cache_maps.get_stats()
0550ecb  trac #15367: Improved implementation of TripleDict and MonoDict, based on open addressing. This implementation should be faster for all operations and have a smaller memory footprint.

comment:44 Changed 6 years ago by nbruin

*ping*

I think this straightforward replacement is worth merging:

  • memory layout is more straightforward: if you want to trace a memory leak/reference chain, you immediately get to the MonoDict/TripleDict rather than having to dig through 2 levels of lists
  • memory footprint is much more compact
  • performance should be quite a bit better (although it seems that these dicts rarely get accessed in tight loops: optimizing such a loop usually removes the access to these dicts)
  • the way items get deleted makes sure that the Python trashcan is involved, which prevents recursion depth errors from occurring when deleting long chains of objects.

So, if someone can finish the review ... It's pretty straightforward code (apart from being rather performance sensitive).

comment:45 follow-up: Changed 6 years ago by SimonKing

Once more, my impression is that the git workflow helps us solve problems that we didn't have without git.

That's to say: sage --dev checkout --ticket=15367 only brought me to my local branch for this ticket. sage --dev pull resulted in

Merging the remote branch "u/nbruin/ticket/15367" into the local branch "ticket/15367".
Automatic merge failed, there are conflicting commits.

Auto-merging src/sage/structure/coerce_dict.pyx
CONFLICT (content): Merge conflict in src/sage/structure/coerce_dict.pyx

Please edit the affected files to resolve the conflicts. When you are finished, your resolution will be commited.
Finished? [ok/Abort] A

So, how can I get your code?

comment:46 in reply to: ↑ 45 ; follow-up: Changed 6 years ago by vbraun

Nils rewrote history with the forced pushes (bad), so some of the code in Simon's branch conflicts, as it is on a different history.

Replying to SimonKing:

Once more, my impression is that the git workflow helps us solve problems that we didn't have without git

Noticing conflicts is not a problem when you manually copy patches, correct. However, not noticing conflicts is a problem ;-)

So, how can I get your code?

If you haven't made any commits that you want to keep: delete your current branch (git branch -D my_branch) and check out a fresh copy. If you have made commits, you have to rebase them (with git rebase or git cherry-pick, for example) on top of Nils' rewritten history.

comment:47 in reply to: ↑ 46 ; follow-ups: Changed 6 years ago by SimonKing

Replying to vbraun:

Replying to SimonKing:

Once more, my impression is that the git workflow helps us solve problems that we didn't have without git

Noticing conflicts is not a problem when you manually copy patches, correct. However, not noticing conflicts is a problem ;-)

As far as I know, my local repository contains nothing that depends on this ticket. So, how can there be a conflict? I am not talking about conflicts that are just artefacts of how git works.

So, how can I get your code?

If you haven't made any commits that you want to keep: delete your current branch (git branch -D my_branch) and check out a fresh copy. If you have made commits, you have to rebase them (with git rebase or git cherry-pick, for example) on top of Nils' rewritten history.

If I have a local branch associated with this ticket, and if I pull from this ticket and there are merge conflicts with my local branch, I'd expect that my old local branch (ticket/15367) be preserved and a new local branch (say, ticket/15367-1) be created that is equal to the branch forced-pushed by Nils. This should be automatically done by either the dev script or by git. And then, I can still decide whether I want to delete my old branch or merge the new and the old branch, or I can explain to Nils why I don't like his new branch and switch back to the old one.

In contrast, you say that in the new workflow I have to decide whether or not to delete my old branch before I pull Nils' changes. Why? This approach seems plain wrong to me.

I guess this is a theoretical discussion that belongs to the sage-git list. Since I don't see stuff depend on this ticket (yet), I did as you suggested.

comment:48 follow-up: Changed 6 years ago by nbruin

It seems that sage --dev has a concept of linking a branch with a trac ticket, which is convenient. In your case, you already had a branch linked with this ticket, so naturally it tried to merge my branch with your local one and due to the rebase there really WAS a conflict. Probably there is a way to "unlink" your local branch from the ticket, so that sage --dev checkout --ticket=15367 gets you a fresh branch. Similarly I'd hope there is a way to locally "reassociate" a branch with a ticket.

Sorry about the rebase, but in this case it seemed warranted because

  • nothing else depended on the branch
  • There was a big back-and-forth on removing and reintroducing the Eraser classes, which would have been silly to archive in the sage history (tripling the modification size, or something like that)

Keeping the trivial edits to CoercionModel_cache_maps.get_stats() attributed to Simon was an interesting exercise, though.

comment:49 in reply to: ↑ 47 Changed 6 years ago by vbraun

Replying to SimonKing:

If I have a local branch associated with this ticket, and if I pull from this ticket and there are merge conflicts with my local branch, I'd expect that my old local branch (ticket/15367) be preserved and a new local branch (say, ticket/15367-1) be created that is equal to the branch forced-pushed by Nils.

The commits from all of those branches are of course preserved in git. If you'd like another local branch to give a name to the different histories then that is fine, but don't force your naming convention on others. To uniquely reference any of the past branch heads you only need the commit SHA1, whose historical values are conveniently listed in the ticket comments.

comment:50 in reply to: ↑ 47 Changed 6 years ago by nbruin

Replying to SimonKing:

If I have a local branch associated with this ticket, and if I pull from this ticket and there are merge conflicts with my local branch, I'd expect that my old local branch (ticket/15367) be preserved and a new local branch (say, ticket/15367-1) be created that is equal to the branch forced-pushed by Nils.

I think you can "unlink" by doing something along the lines of ./sage --dev set-remote None no/remote/branch. Because your local branch is now associated with a different remote branch (is this a git feature or is "associated remote branch" something that sage-dev invents?) I would expect that checking out the ticket would now cause a new, fresh branch to be created.

comment:51 in reply to: ↑ 48 Changed 6 years ago by SimonKing

Replying to nbruin:

It seems that sage --dev has a concept of linking a branch with a trac ticket, which is convenient. In your case, you already had a branch linked with this ticket, so naturally it tried to merge my branch with your local one and due to the rebase there really WAS a conflict.

AFAIK, git knows that this was a forced push. So, the dev script should at least offer a "forced pull".

Anyway, this ticket is not about annoyances and hold-ups caused by the switch to the new workflow.

comment:52 Changed 6 years ago by SimonKing

To start reviewing, I merged master (freshly pulled from trac) into the branch, which succeeded.

Last edited 6 years ago by SimonKing

comment:53 Changed 6 years ago by SimonKing

Second step of the review: I checked that the patchbot is right and all tests pass (after merging master).

comment:54 Changed 6 years ago by SimonKing

Third step: Re-test timings. I do "branch versus sage-5.13.b4".

%cpaste
cython("""
def test(T, list L1, list L2, list L3, v):
    for i in L1:
        for j in L2:
            for k in L3:
                T[i,j,k] = v
def test2(T, list L1, list L2, list L3):
    for i in L1:
        for j in L2:
            for k in L3:
                v = T[i,j,k]
def test3(T, list L1, list L2, list L3):
    for i in L1:
        for j in L2:
            for k in L3:
                (i,j,k) in T
def test4(T, list L1, list L2, list L3):
    for i in L1:
        for j in L2:
            for k in L3:
                (i,j+1,k) in T
def test5(T, list L1, list L2, list L3):
    for i in L1:
        for j in L2:
            for k in L3:
                del T[i,j,k]
""")

In the branch:

sage: T = sage.structure.coerce_dict.TripleDict(11)
sage: L1 = srange(100)
sage: %time test(T,L1,L1,L1,ZZ)
CPU times: user 9.70 s, sys: 0.05 s, total: 9.75 s
Wall time: 9.77 s
sage: %time test2(T,L1,L1,L1)
CPU times: user 0.64 s, sys: 0.01 s, total: 0.65 s
Wall time: 0.66 s
sage: %time test3(T,L1,L1,L1)
CPU times: user 0.61 s, sys: 0.00 s, total: 0.61 s
Wall time: 0.61 s
sage: %time test4(T,L1,L1,L1)
CPU times: user 0.23 s, sys: 0.00 s, total: 0.23 s
Wall time: 0.23 s
sage: %time test5(T,L1,L1,L1)
CPU times: user 0.84 s, sys: 0.01 s, total: 0.85 s
Wall time: 0.86 s
sage: get_memory_usage()
231.640625

In sage-5.13.b4 (plus #14912, but this doesn't touch coerce dict):

sage: T = sage.structure.coerce_dict.TripleDict(11)
sage: L1 = srange(100)
sage: %time test(T,L1,L1,L1,ZZ)
CPU times: user 15.94 s, sys: 0.11 s, total: 16.05 s
Wall time: 16.08 s
sage: %time test2(T,L1,L1,L1)
CPU times: user 1.86 s, sys: 0.00 s, total: 1.86 s
Wall time: 1.87 s
sage: %time test3(T,L1,L1,L1)
CPU times: user 1.86 s, sys: 0.00 s, total: 1.86 s
Wall time: 1.86 s
sage: %time test4(T,L1,L1,L1)
CPU times: user 1.79 s, sys: 0.02 s, total: 1.81 s
Wall time: 1.82 s
sage: %time test5(T,L1,L1,L1)
CPU times: user 1.72 s, sys: 0.03 s, total: 1.75 s
Wall time: 1.75 s
sage: get_memory_usage()
318.078125

So, everything became faster, and the memory consumption is a bit less (but the latter could very well be because of other tickets).

The effect on startup time is probably not significant, but I get (one run only) 1586.843ms with the branch versus 1600.406ms with 5.13.b4+#14912.

So, the winner is: THIS BRANCH.

Last step of the review: Read the new code.

comment:55 follow-up: Changed 6 years ago by SimonKing

  • Status changed from needs_review to needs_info

In MonoDict.set, there is

        cursor = self.lookup(<PyObject*><void*>k)
        if cursor.key_id == NULL or cursor.key_id == dummy:
            ...

without saying that cursor is cdef mono_cell*. Similar things happen in MonoDict.iteritems, MonoDict_traverse and MonoDict_clear, and also in TripleDict.set, TripleDict.iteritems, TripleDict_traverse and TripleDict_clear.

First question: Why does it not crash? Second question, even if there is a good answer to the first: Shouldn't it be cdef mono_cell* cursor resp. cdef triple_cell* cursor in order to improve performance?

Last edited 6 years ago by SimonKing

comment:56 in reply to: ↑ 55 ; follow-up: Changed 6 years ago by nbruin

Replying to SimonKing:

First question: Why does it not crash?

Cython has grown type inference.

Second question, even if there is a good answer to the first: Shouldn't it be cdef mono_cell* cursor resp. cdef triple_cell* cursor in order to improve performance?

No, because cython infers the correct type by itself.

I'm not commenting on whether it's good style to rely on said type inference, but it is very convenient when you're programming.

comment:57 in reply to: ↑ 56 ; follow-up: Changed 6 years ago by SimonKing

  • Status changed from needs_info to needs_review

Replying to nbruin:

Replying to SimonKing:

First question: Why does it not crash?

Cython has grown type inference.

Since when?

Amazing.

Anyway, back to needs review then...

comment:58 in reply to: ↑ 57 ; follow-up: Changed 6 years ago by nbruin

Replying to SimonKing:

Since when?

According to https://github.com/cython/cython/wiki/ReleaseNotes-0.13 since August 25, 2010 :-).

comment:59 in reply to: ↑ 58 Changed 6 years ago by SimonKing

  • Status changed from needs_review to positive_review

Replying to nbruin:

Replying to SimonKing:

Since when?

According to https://github.com/cython/cython/wiki/ReleaseNotes-0.13 since August 25, 2010 :-).

So far I have always tried to tell Cython which type to expect.

Anyway.

The code looks good to me, tests pass, the patchbot has nothing to complain, and as we have seen there is an improved performance. Positive review!

comment:60 Changed 5 years ago by vbraun

  • Resolution set to fixed
  • Status changed from positive_review to closed

comment:61 Changed 20 months ago by jdemeyer

While looking at this code, I noticed that the size argument is never used anywhere (except to interpret it as data):

def __init__(self, size=11, data=None, threshold=0.7, weak_values=False)

Is this intentional? If yes, can we deprecate size? If no, what should we do?
