Sage: Ticket #715: Parents probably not reclaimed due to too much caching
https://trac.sagemath.org/ticket/715
<p>
Here is a small example illustrating the issue.
</p>
<p>
The memory footprint of the following piece of code grows indefinitely.
</p>
<pre class="wiki">sage: K = GF(1<<55,'t')
sage: a = K.random_element()
sage: while 1:
....:     E = EllipticCurve(j=a); P = E.random_point(); 2*P; del E, P
</pre><p>
E and P get deleted, but when 2*P is computed, the action of the integers on A, the abelian group of rational points of the elliptic curve, gets cached in the coercion model.
</p>
<p>
A key-value pair is left in coercion_model._action_maps dict:
</p>
<p>
(ZZ,A,*) : <code>IntegerMulAction</code>
</p>
<p>
Moreover, there are references to A in at least two other places: in the <code>IntegerMulAction</code> itself and in ZZ._action_hash.
</p>
<p>
So weak references should be used in all these places, provided this does not slow things down too much.
</p>
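<p>
The retention problem can be reproduced without Sage. The following is a minimal pure-Python sketch, assuming a hypothetical <code>Parent</code> class standing in for the point group A and a plain dict standing in for the coercion model's cache: a strong dict key keeps the parent alive after <code>del</code>, whereas a <code>weakref.WeakKeyDictionary</code> lets it be reclaimed. It only illustrates the idea and is not Sage's actual cache.
</p>

```python
import gc
import weakref


class Parent:
    """Hypothetical stand-in for a Sage parent such as the point group A."""


strong_cache = {}                          # like a plain dict in the coercion model
weak_cache = weakref.WeakKeyDictionary()   # entries vanish together with their key


def cache_action(cache, parent):
    # Store a dummy "action" keyed by the parent, as the coercion model does.
    cache[parent] = "IntegerMulAction"


def survives(cache):
    """Create a parent, cache an action for it, drop our own reference,
    and report whether the parent is still alive afterwards."""
    p = Parent()
    probe = weakref.ref(p)   # lets us observe whether p was freed
    cache_action(cache, p)
    del p
    gc.collect()
    return probe() is not None


print(survives(strong_cache))  # True: the dict key keeps the parent alive
print(survives(weak_cache))    # False: the weak key lets it be reclaimed
```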
mabshoff (Sun, 23 Sep 2007 10:33:39 GMT): milestone set
https://trac.sagemath.org/ticket/715#comment:1
<ul>
<li><strong>milestone</strong>
set to <em>sage-feature</em>
</li>
</ul>
mabshoff (Sun, 03 Feb 2008 02:19:15 GMT): milestone changed
https://trac.sagemath.org/ticket/715#comment:2
<ul>
<li><strong>milestone</strong>
changed from <em>sage-feature</em> to <em>sage-2.10.2</em>
</li>
</ul>
mhansen (Fri, 14 Nov 2008 08:59:41 GMT)
https://trac.sagemath.org/ticket/715#comment:3
<p>
I think this is a bit too vague for a ticket. Robert, could you be more specific or close this?
</p>
robertwb (Fri, 14 Nov 2008 18:25:26 GMT)
https://trac.sagemath.org/ticket/715#comment:4
<p>
The coercion model needs to use weakrefs so that parents aren't needlessly referenced when they're discarded. It is nontrivial to see where the weakrefs need to go, and how to do so without slowing the code down.
</p>
<p>
The ticket is still valid.
</p>
davidloeffler (Thu, 20 Jan 2011 11:29:18 GMT): component, description changed; upstream set
https://trac.sagemath.org/ticket/715#comment:5
<ul>
<li><strong>component</strong>
changed from <em>basic arithmetic</em> to <em>coercion</em>
</li>
<li><strong>description</strong>
modified (<a href="/ticket/715?action=diff&version=5">diff</a>)
</li>
<li><strong>upstream</strong>
set to <em>N/A</em>
</li>
</ul>
jpflori (Fri, 01 Jul 2011 13:29:40 GMT): cc set
https://trac.sagemath.org/ticket/715#comment:6
<ul>
<li><strong>cc</strong>
<em>jpflori</em> added
</li>
</ul>
jpflori (Mon, 04 Jul 2011 08:38:12 GMT): description changed
https://trac.sagemath.org/ticket/715#comment:7
<ul>
<li><strong>description</strong>
modified (<a href="/ticket/715?action=diff&version=7">diff</a>)
</li>
</ul>
<p>
With the piece of code in the description, only one reference to these objects ends up in the ZZ._hash_actions dictionary. This is because, to build it, we test whether A1 == A2 rather than whether A1 is A2 (as is done in coercion_model._action_maps); because of the current implementation of elliptic curves (see <a class="ext-link" href="http://groups.google.com/group/sage-nt/browse_thread/thread/ec8d0ad14a819082"><span class="icon"></span>http://groups.google.com/group/sage-nt/browse_thread/thread/ec8d0ad14a819082</a> and <a class="ext-link" href="http://trac.sagemath.org/sage_trac/search/opensearch?q=ticket%3A11474"><span class="icon"></span>#11474</a>); and because the above code uses only one j-invariant, so only one entry is finally stored.
</p>
<p>
However, with random curves, I guess all of them would be stored there.
</p>
<p>
As for weak references, the idea should rather be to build something like a <code>WeakKeyDictionary</code>, provided it does not slow down coercion too much...
</p>
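<p>
A plain <code>WeakKeyDictionary</code> does not map directly onto the coercion cache, because its keys are triples. The short sketch below (hypothetical <code>Parent</code> class, CPython semantics assumed) shows that a single parent works as a weak key, while the tuple <code>(ZZ, A, "*")</code> is rejected because tuples cannot be weakly referenced; the components would have to be referenced individually.
</p>

```python
import weakref


class Parent:
    """Hypothetical stand-in for a Sage parent."""


ZZ, A = Parent(), Parent()

# A single weak-referenceable key works fine.
d = weakref.WeakKeyDictionary()
d[A] = "action of ZZ on A"

# The coercion model's keys are triples such as (ZZ, A, op), and a tuple
# cannot be the target of a weak reference, so this raises TypeError.
try:
    d[(ZZ, A, "*")] = "IntegerMulAction"
    tuple_keys_ok = True
except TypeError:
    tuple_keys_ok = False

print(tuple_keys_ok)  # False
```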
nbruin (Mon, 04 Jul 2011 16:05:59 GMT)
https://trac.sagemath.org/ticket/715#comment:8
<p>
The following example also exhibits a suspicious, steady growth in memory use. The only reason I can think of why that would happen is that references to the created finite field remain lying around somewhere, preventing deallocation:
</p>
<pre class="wiki">sage: L=prime_range(10^8)
sage: for p in L: k=GF(p)
</pre><p>
If you change it to the creation of a polynomial ring, the memory use rises much faster:
</p>
<pre class="wiki">sage: L=prime_range(10^8)
sage: for p in L: k=GF(p)['t']
</pre><p>
Are "unique" parents simply <em>never</em> deallocated?
</p>
jpflori (Mon, 04 Jul 2011 16:13:40 GMT)
https://trac.sagemath.org/ticket/715#comment:9
<p>
Be aware that polynomial rings are also cached because parents are unique, which partly explains the memory consumption in your second example; see <a class="closed ticket" href="https://trac.sagemath.org/ticket/5970" title="defect: Weak references in Polynomial Ring cache (closed: duplicate)">#5970</a> for example.
</p>
<p>
For finite fields I did not check.
</p>
zimmerma (Thu, 13 Oct 2011 11:14:18 GMT): cc changed
https://trac.sagemath.org/ticket/715#comment:10
<ul>
<li><strong>cc</strong>
<em>zimmerma</em> added
</li>
</ul>
jpflori (Mon, 24 Oct 2011 13:19:39 GMT)
https://trac.sagemath.org/ticket/715#comment:11
<p>
See <a class="closed ticket" href="https://trac.sagemath.org/ticket/11521" title="defect: Use weak references to cache homsets (closed: fixed)">#11521</a> for some concrete instances of this problem and some advice on how to investigate it.
</p>
SimonKing (Wed, 21 Dec 2011 10:11:11 GMT)
https://trac.sagemath.org/ticket/715#comment:12
<p>
In my code for the computation of Ext algebras of basic algebras, I use letterplace algebras (see <a class="closed ticket" href="https://trac.sagemath.org/ticket/7797" title="enhancement: Full interface to letterplace from singular (closed: fixed)">#7797</a>), and they involve the creation of many polynomial rings. Only one of them is used at a time, so the others could be garbage collected. But they aren't, and I suspect this is because of the strong references in the coercion cache.
</p>
<p>
See the following example (using <a class="closed ticket" href="https://trac.sagemath.org/ticket/7797" title="enhancement: Full interface to letterplace from singular (closed: fixed)">#7797</a>)
</p>
<pre class="wiki">sage: F.&lt;a,b,c&gt; = FreeAlgebra(GF(4,'z'), implementation='letterplace')
sage: import gc
sage: len(gc.get_objects())
170947
sage: a*b*c*b*c*a*b*c
a*b*c*b*c*a*b*c
sage: len(gc.get_objects())
171556
sage: del F,a,b,c
sage: gc.collect()
81
sage: len(gc.get_objects())
171448
sage: cm = sage.structure.element.get_coercion_model()
sage: cm.reset_cache()
sage: gc.collect()
273
sage: len(gc.get_objects())
171108
</pre><p>
That is certainly not a proof of my claim, but it indicates that it might be worthwhile to investigate.
</p>
<p>
To facilitate work, here are some other tickets that may be related:
</p>
<ul><li><a class="closed ticket" href="https://trac.sagemath.org/ticket/11521" title="defect: Use weak references to cache homsets (closed: fixed)">#11521</a> (that might be solved, but needs review)
</li><li><a class="closed ticket" href="https://trac.sagemath.org/ticket/10262" title="enhancement: memory leak in scalar*vector multiplication (closed: duplicate)">#10262</a>
</li><li><a class="closed ticket" href="https://trac.sagemath.org/ticket/8905" title="defect: Memory leak in echelon over QQ (closed: fixed)">#8905</a>
</li><li><a class="closed ticket" href="https://trac.sagemath.org/ticket/5970" title="defect: Weak references in Polynomial Ring cache (closed: duplicate)">#5970</a> (this might already be fixed)
</li></ul><p>
I guess that one should use a cache model similar to the one I used in <a class="closed ticket" href="https://trac.sagemath.org/ticket/11521" title="defect: Use weak references to cache homsets (closed: fixed)">#11521</a>: the key for the cache should not just be <code>(domain,codomain)</code>, because we want the cache item to become collectable as soon as either the domain or the codomain is collectable.
</p>
SimonKing (Wed, 21 Dec 2011 10:41:10 GMT)
https://trac.sagemath.org/ticket/715#comment:13
<p>
I am trying to wrap my mind around weak references. I found that, when creating a weak reference, one can also provide a callback that is called when the weak reference becomes invalid. I propose to use such a callback to erase the deleted object from the cache, regardless of whether it appears as domain or codomain.
</p>
<p>
Here is a proof of concept:
</p>
<pre class="wiki">sage: import weakref
sage: ref = weakref.ref
sage: D = {}
sage: def remove(x):
....:     for a,b,c in D.keys():
....:         if a is x or b is x or c is x:
....:             D.__delitem__((a,b,c))
....:
sage: class A:
....:     def __init__(self,x):
....:         self.x = x
....:     def __repr__(self):
....:         return str(self.x)
....:     def __del__(self):
....:         print "deleting",self.x
....:
sage: a = A(5)
sage: b = A(6)
sage: r = ref(a,remove)
sage: s = ref(b,remove)
sage: D[r,r,s] = 1
sage: D[s,r,s] = 2
sage: D[s,s,s] = 3
sage: D[s,s,1] = 4
sage: D[r,s,1] = 5
sage: D.values()
[5, 3, 1, 4, 2]
sage: del a
deleting 5
sage: D.values()
[4, 3]
sage: del b
deleting 6
sage: D.values()
[]
</pre>
SimonKing (Wed, 21 Dec 2011 14:09:23 GMT)
https://trac.sagemath.org/ticket/715#comment:14
<p>
It turns out that using weak references in the coercion cache will not be enough. Apparently there are other direct references that have to be dealt with.
</p>
SimonKing (Wed, 21 Dec 2011 17:10:06 GMT)
https://trac.sagemath.org/ticket/715#comment:15
<p>
I wonder whether the problem has already been solved. I just tested the example from the ticket description and got the following (at least with <a class="closed ticket" href="https://trac.sagemath.org/ticket/11900" title="defect: Serious regression caused by #9138 (closed: fixed)">#11900</a>, <a class="closed ticket" href="https://trac.sagemath.org/ticket/11521" title="defect: Use weak references to cache homsets (closed: fixed)">#11521</a> and <a class="closed ticket" href="https://trac.sagemath.org/ticket/11115" title="enhancement: Rewrite cached_method in Cython (closed: fixed)">#11115</a> applied):
</p>
<pre class="wiki">sage: K = GF(1<<55,'t')
sage: a = K.random_element()
sage: m0 = get_memory_usage()
sage: for i in range(1000):
....:     E = EllipticCurve(j=a); P = E.random_point(); PP = 2*P
....:
sage: get_memory_usage() - m0
15.22265625
</pre><p>
I think that this is not particularly scary. I'll repeat the test with vanilla sage-4.8.alpha3, but this will take a while to rebuild.
</p>
SimonKing (Wed, 21 Dec 2011 19:39:46 GMT)
https://trac.sagemath.org/ticket/715#comment:16
<p>
No, even in vanilla sage-4.8.alpha3 I don't find a scary memory leak in this example.
</p>
<p>
Do we have a better example? One could, of course, argue that one should use weak references for caching even if we do not find an apparent memory leak. I am preparing a patch for it now.
</p>
SimonKing (Wed, 21 Dec 2011 22:45:52 GMT): cc changed
https://trac.sagemath.org/ticket/715#comment:17
<ul>
<li><strong>cc</strong>
<em>vbraun</em> added
</li>
</ul>
<p>
Here is an experimental patch.
</p>
<p>
A new test shows that the weak caching actually works.
</p>
<p>
Note that the patch also introduces a weak cache for polynomial rings, which might be better to put into <a class="closed ticket" href="https://trac.sagemath.org/ticket/5970" title="defect: Weak references in Polynomial Ring cache (closed: duplicate)">#5970</a>. Well, we can sort things out later...
</p>
SimonKing (Thu, 22 Dec 2011 18:29:24 GMT)
https://trac.sagemath.org/ticket/715#comment:18
<p>
It needs work, though. Some tests in sage/structure fail, partly because of pickling and partly because some tests do not follow the new specification of <code>TripleDict</code> (namely, that the first two parts of each key triple and the associated value must be weakly referenceable).
</p>
SimonKing (Fri, 23 Dec 2011 06:53:36 GMT)
https://trac.sagemath.org/ticket/715#comment:19
<p>
Now I wonder: Should I try to use weak references <em>and</em> make it accept stuff that does not allow for weak references?
</p>
<p>
In the intended applications, weak references are possible. But in some tests and in the pickle jar, the "wrong" type of keys (namely strings and ints) are used.
</p>
SimonKing (Fri, 23 Dec 2011 09:28:35 GMT)
https://trac.sagemath.org/ticket/715#comment:20
<p>
The only place where the weak references are created is the <code>set(...)</code> method of <code>TripleDict</code>. I suggest simply catching the error that may occur when creating a weak reference, and then using a different way of storing the key. I am now running tests, and I hope that this ticket will be "needs review" in a few hours.
</p>
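<p>
The fallback suggested above might look like the following sketch. <code>key_ref</code> is a hypothetical helper, not the function in the patch: it returns a weak reference when the object supports one and otherwise a zero-argument callable that holds the object strongly, so lookups can call the result the same way in both cases.
</p>

```python
import weakref


def key_ref(obj):
    """Return something that, when called, gives back obj.

    A weak reference if obj supports one; otherwise a closure holding a
    strong reference (ints, strings, tuples, ... are not weakly referenceable).
    """
    try:
        return weakref.ref(obj)
    except TypeError:
        return lambda: obj


class Parent:
    """Hypothetical stand-in for a Sage parent."""


P = Parent()
keys = [P, 42, "abc"]
refs = [key_ref(k) for k in keys]
print([isinstance(r, weakref.ref) for r in refs])   # [True, False, False]
print(all(r() is k for r, k in zip(refs, keys)))    # True
```

<p>
The price is that such keys are never released, which seems acceptable for the ints and strings that only occur in tests and in the pickle jar.
</p>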
SimonKing (Fri, 23 Dec 2011 09:53:49 GMT): status changed; keywords, author set
https://trac.sagemath.org/ticket/715#comment:21
<ul>
<li><strong>keywords</strong>
<em>weak</em> <em>cache</em> <em>coercion</em> added
</li>
<li><strong>status</strong>
changed from <em>new</em> to <em>needs_review</em>
</li>
<li><strong>author</strong>
set to <em>Simon King</em>
</li>
</ul>
<p>
With the attached patch, all tests pass for me, and the new features are doctested. Needs review!
</p>
SimonKing (Fri, 23 Dec 2011 10:04:26 GMT): dependencies set
https://trac.sagemath.org/ticket/715#comment:22
<ul>
<li><strong>dependencies</strong>
set to <em>#11900</em>
</li>
</ul>
<p>
It turns out that this patch only cleanly applies after <a class="closed ticket" href="https://trac.sagemath.org/ticket/11900" title="defect: Serious regression caused by #9138 (closed: fixed)">#11900</a>. So, I introduce <a class="closed ticket" href="https://trac.sagemath.org/ticket/11900" title="defect: Serious regression caused by #9138 (closed: fixed)">#11900</a> as a dependency. My statement on "doctests passing" was with <a class="closed ticket" href="https://trac.sagemath.org/ticket/11900" title="defect: Serious regression caused by #9138 (closed: fixed)">#11900</a> anyway.
</p>
zimmerma (Fri, 23 Dec 2011 10:18:18 GMT)
https://trac.sagemath.org/ticket/715#comment:23
<p>
I was able to apply this patch to vanilla 4.7.2. Should I continue reviewing it like this?
</p>
<p>
Paul
</p>
zimmerma (Fri, 23 Dec 2011 11:12:33 GMT)
https://trac.sagemath.org/ticket/715#comment:24
<p>
On top of vanilla 4.7.2, several doctests fail:
</p>
<pre class="wiki">
sage -t 4.7-linux-64bit-ubuntu_10.04.1_lts-x86_64-Linux/devel/sage-715/sage/calculus/interpolators.pyx # 0 doctests failed
sage -t 4.7-linux-64bit-ubuntu_10.04.1_lts-x86_64-Linux/devel/sage-715/sage/databases/database.py # 15 doctests failed
sage -t 4.7-linux-64bit-ubuntu_10.04.1_lts-x86_64-Linux/devel/sage-715/sage/finance/time_series.pyx # 0 doctests failed
sage -t 4.7-linux-64bit-ubuntu_10.04.1_lts-x86_64-Linux/devel/sage-715/sage/graphs/graph_list.py # 4 doctests failed
sage -t 4.7-linux-64bit-ubuntu_10.04.1_lts-x86_64-Linux/devel/sage-715/sage/graphs/graph_database.py # 28 doctests failed
sage -t 4.7-linux-64bit-ubuntu_10.04.1_lts-x86_64-Linux/devel/sage-715/sage/graphs/graph.py # 6 doctests failed
sage -t 4.7-linux-64bit-ubuntu_10.04.1_lts-x86_64-Linux/devel/sage-715/sage/graphs/generic_graph.py # 4 doctests failed
sage -t 4.7-linux-64bit-ubuntu_10.04.1_lts-x86_64-Linux/devel/sage-715/sage/matrix/matrix2.pyx # 3 doctests failed
sage -t 4.7-linux-64bit-ubuntu_10.04.1_lts-x86_64-Linux/devel/sage-715/sage/modular/hecke/hecke_operator.py # 1 doctests failed
sage -t 4.7-linux-64bit-ubuntu_10.04.1_lts-x86_64-Linux/devel/sage-715/sage/modular/hecke/ambient_module.py # 2 doctests failed
sage -t 4.7-linux-64bit-ubuntu_10.04.1_lts-x86_64-Linux/devel/sage-715/sage/modular/modsym/subspace.py # 6 doctests failed
sage -t 4.7-linux-64bit-ubuntu_10.04.1_lts-x86_64-Linux/devel/sage-715/sage/modular/modsym/boundary.py # 3 doctests failed
sage -t 4.7-linux-64bit-ubuntu_10.04.1_lts-x86_64-Linux/devel/sage-715/sage/modular/modsym/space.py # 3 doctests failed
sage -t 4.7-linux-64bit-ubuntu_10.04.1_lts-x86_64-Linux/devel/sage-715/sage/modular/modsym/modsym.py # 1 doctests failed
sage -t 4.7-linux-64bit-ubuntu_10.04.1_lts-x86_64-Linux/devel/sage-715/sage/modular/modsym/ambient.py # 11 doctests failed
sage -t 4.7-linux-64bit-ubuntu_10.04.1_lts-x86_64-Linux/devel/sage-715/sage/modular/abvar/abvar.py # 0 doctests failed
sage -t 4.7-linux-64bit-ubuntu_10.04.1_lts-x86_64-Linux/devel/sage-715/sage/schemes/elliptic_curves/heegner.py # 9 doctests failed
sage -t 4.7-linux-64bit-ubuntu_10.04.1_lts-x86_64-Linux/devel/sage-715/sage/sandpiles/sandpile.py # Time out
</pre><p>
Paul
</p>
SimonKing (Fri, 23 Dec 2011 16:21:55 GMT)
https://trac.sagemath.org/ticket/715#comment:25
<p>
I'll try again on top of vanilla sage-4.8.alpha3. You are right, the patch does apply (almost) cleanly even without <a class="closed ticket" href="https://trac.sagemath.org/ticket/11900" title="defect: Serious regression caused by #9138 (closed: fixed)">#11900</a>. That surprises me, because at some point there was an inconsistency.
</p>
<p>
Hopefully I can see later today whether I get the same errors as you.
</p>
SimonKing (Fri, 23 Dec 2011 17:28:36 GMT): dependencies deleted
https://trac.sagemath.org/ticket/715#comment:26
<ul>
<li><strong>dependencies</strong>
<em>#11900</em> deleted
</li>
</ul>
<p>
It turns out that <a class="closed ticket" href="https://trac.sagemath.org/ticket/11900" title="defect: Serious regression caused by #9138 (closed: fixed)">#11900</a> is indeed not needed.
</p>
<p>
I cannot reproduce any of the errors you mention.
</p>
<p>
Moreover, the file "sage/devel/sage/databases/database.py", for which you reported an error, does not exist in vanilla sage (not in 4.7.2 and not in 4.8.alpha3).
</p>
<p>
Did you test other patches before returning to vanilla 4.7.2? When a patch changes a module from Python to Cython and one wants to remove the patch again, it is often necessary to also remove any reference to the Cython module in <code>build/sage/...</code> and in <code>build/*/sage/...</code>. For example, if I had <a class="closed ticket" href="https://trac.sagemath.org/ticket/11115" title="enhancement: Rewrite cached_method in Cython (closed: fixed)">#11115</a> applied and wanted to remove it again, I would do <code>rm build/sage/misc/cachefunc.*</code> and <code>rm build/*/sage/misc/cachefunc.*</code>.
</p>
zimmerma (Fri, 23 Dec 2011 18:09:08 GMT)
https://trac.sagemath.org/ticket/715#comment:27
<p>
yes I tried other patches (<a class="closed ticket" href="https://trac.sagemath.org/ticket/10983" title="enhancement: new doctest for french book about Sage (closed: fixed)">#10983</a>, <a class="closed ticket" href="https://trac.sagemath.org/ticket/8720" title="defect: CC and CDF do not display numeric 0 (closed: fixed)">#8720</a>, <a class="closed ticket" href="https://trac.sagemath.org/ticket/10596" title="enhancement: Misc improvements to integer.pyx (closed: fixed)">#10596</a>) before <a class="closed ticket" href="https://trac.sagemath.org/ticket/715" title="defect: Parents probably not reclaimed due to too much caching (closed: fixed)">#715</a>, but each one with a
different clone.
</p>
<p>
Paul
</p>
SimonKing (Fri, 23 Dec 2011 18:15:15 GMT)
https://trac.sagemath.org/ticket/715#comment:28
<p>
Replying to <a class="ticket" href="https://trac.sagemath.org/ticket/715#comment:27" title="Comment 27">zimmerma</a>:
</p>
<blockquote class="citation">
<p>
yes I tried other patches (<a class="closed ticket" href="https://trac.sagemath.org/ticket/10983" title="enhancement: new doctest for french book about Sage (closed: fixed)">#10983</a>, <a class="closed ticket" href="https://trac.sagemath.org/ticket/8720" title="defect: CC and CDF do not display numeric 0 (closed: fixed)">#8720</a>, <a class="closed ticket" href="https://trac.sagemath.org/ticket/10596" title="enhancement: Misc improvements to integer.pyx (closed: fixed)">#10596</a>) before <a class="closed ticket" href="https://trac.sagemath.org/ticket/715" title="defect: Parents probably not reclaimed due to too much caching (closed: fixed)">#715</a>, but each one with a
different clone.
</p>
</blockquote>
<p>
But where does the databases/database.py file come from?
</p>
<p>
And could you post one or two examples for the errors you are getting (i.e. not just which files are problematic, but what commands exactly fail)?
</p>
SimonKing (Sat, 24 Dec 2011 18:16:31 GMT)
https://trac.sagemath.org/ticket/715#comment:29
<p>
FWIW: I started with sage-4.8.alpha3, have <a class="closed ticket" href="https://trac.sagemath.org/ticket/9138" title="defect: Categories for all rings (closed: fixed)">#9138</a>, <a class="closed ticket" href="https://trac.sagemath.org/ticket/11900" title="defect: Serious regression caused by #9138 (closed: fixed)">#11900</a> and <a class="closed ticket" href="https://trac.sagemath.org/ticket/715" title="defect: Parents probably not reclaimed due to too much caching (closed: fixed)">#715</a> applied, and all doctests pass. I don't know why the patchbot isn't even trying (although it says "retry: True"), but from my point of view, everything is alright.
</p>
SimonKing (Mon, 26 Dec 2011 16:18:16 GMT)
https://trac.sagemath.org/ticket/715#comment:30
<p>
I have simplified the routine that removes cache items when a weak reference becomes invalid. All tests pass for me.
</p>
<p>
Apply trac715_weak_coercion_cache.patch
</p>
vbraun (Tue, 27 Dec 2011 12:17:16 GMT): dependencies set
https://trac.sagemath.org/ticket/715#comment:31
<ul>
<li><strong>dependencies</strong>
set to <em>#9138, #11900</em>
</li>
</ul>
SimonKing (Wed, 28 Dec 2011 23:18:27 GMT)
https://trac.sagemath.org/ticket/715#comment:32
<p>
One question: Currently, my patch uses weak references only for the first two parts of the key. Should it also use weak references to the value, when possible?
</p>
<p>
By "when possible", I mean that not all values allow weak references - if it is possible then a weak reference is used, otherwise a strong reference is used. This might contribute to fixing the memory leak in <a class="closed ticket" href="https://trac.sagemath.org/ticket/11521" title="defect: Use weak references to cache homsets (closed: fixed)">#11521</a>, but it might have a speed penalty.
</p>
<p>
Concerning <a class="closed ticket" href="https://trac.sagemath.org/ticket/11521" title="defect: Use weak references to cache homsets (closed: fixed)">#11521</a>: The point is that an action (which currently does not allow weak references, but that might change) has a strong reference to the objects that are used for storing it in the cache. Hence, an action is not collectable with the current patch.
</p>
<p>
Thoughts?
</p>
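<p>
The concern about actions can be made concrete: when a cached value holds a strong reference to its own key, weak keys alone do not help. In this sketch, <code>Parent</code> and <code>Action</code> are hypothetical stand-ins (the latter only mimics the fact, stated above, that an action references the objects it is stored under), and a <code>weakref.WeakKeyDictionary</code> plays the role of the weak cache.
</p>

```python
import gc
import weakref


class Parent:
    """Hypothetical stand-in for a Sage parent."""


class Action:
    """Hypothetical stand-in for an action that keeps a strong
    reference to the parent it acts on."""

    def __init__(self, parent):
        self.parent = parent


cache = weakref.WeakKeyDictionary()
A = Parent()
probe = weakref.ref(A)
cache[A] = Action(A)   # the value strongly references its own key
del A
gc.collect()
print(probe() is not None)  # True: the cached value keeps its key alive
```

<p>
Because the dictionary is reachable from a global, the value, and through it the key, is never collected. Holding the value weakly as well would break this chain, although the action would then only survive as long as something else references it.
</p>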
SimonKing (Thu, 29 Dec 2011 22:26:20 GMT): attachment set
https://trac.sagemath.org/ticket/715
<ul>
<li><strong>attachment</strong>
set to <em>trac715_weak_coercion_cache.patch</em>
</li>
</ul>
<p>
Use weak references in the coercion cache
</p>
SimonKing (Thu, 29 Dec 2011 22:28:31 GMT)
https://trac.sagemath.org/ticket/715#comment:33
<p>
I have slightly updated some of the new examples: in the old patch version, I had created <code>TripleDict(10)</code>, but in the meantime I learnt that the parameter should preferably be odd (ideally a prime). So, in the new patch version, it is <code>TripleDict(11)</code>.
</p>
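<p>
A plausible reason for preferring an odd or prime size, sketched under the simplifying assumption that the bucket is <code>address % size</code> (the real <code>TripleDict</code> hash combines three pointers): object addresses are aligned, typically to multiples of 8 or 16, so an even size leaves many buckets unused. The helper <code>buckets_used</code> is hypothetical and works on synthetic addresses.
</p>

```python
def buckets_used(size, alignment=16, n_objects=1000):
    """Number of distinct buckets hit by aligned addresses under addr % size."""
    addresses = [alignment * i for i in range(n_objects)]
    return len({addr % size for addr in addresses})


for size in (10, 11, 16):
    print(size, buckets_used(size))
# 10 5
# 11 11
# 16 1
```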
SimonKing (Fri, 30 Dec 2011 09:24:05 GMT): status changed; work_issues set
https://trac.sagemath.org/ticket/715#comment:34
<ul>
<li><strong>status</strong>
changed from <em>needs_review</em> to <em>needs_work</em>
</li>
<li><strong>work_issues</strong>
set to <em>Comparison of the third key items</em>
</li>
</ul>
<p>
I think I need to modify one detail:
</p>
<p>
For efficiency, and since the domain/codomain of a map must be identical to (and not just equal to) the given keys, my patch compares them by "is" rather than "==". But I think one should still compare the third item of a key via "==" and not "is". I need to do some tests first...
</p>
SimonKing (Fri, 30 Dec 2011 13:59:50 GMT)
https://trac.sagemath.org/ticket/715#comment:35
<p>
It really is not an easy question whether or not we should have "is" or "==".
</p>
<p>
On the one hand, we have the lines
</p>
<pre class="wiki">if y_mor is not None:
    all.append("Coercion on right operand via")
    all.append(y_mor)
    if res is not None and res is not y_mor.codomain():
        raise RuntimeError, ("BUG in coercion model: codomains not equal!", x_mor, y_mor)
</pre><p>
in sage/structure/coerce.pyx, which seem to imply that comparison via "is" is the right thing to do.
</p>
<p>
But in the same file, the coercion model copes with the fact that some parents are not unique:
</p>
<pre class="wiki"># Make sure the domains are correct
if R_map.domain() is not R:
    if fix:
        connecting = R_map.domain().coerce_map_from(R)
        if connecting is not None:
            R_map = R_map * connecting
    if R_map.domain() is not R:
        raise RuntimeError, ("BUG in coercion model, left domain must be original parent", R, R_map)
if S_map is not None and S_map.domain() is not S:
    if fix:
        connecting = S_map.domain().coerce_map_from(S)
        if connecting is not None:
            S_map = S_map * connecting
    if S_map.domain() is not S:
        raise RuntimeError, ("BUG in coercion model, right domain must be original parent", S, S_map)
</pre><p>
That would suggest that comparison by "==" (the old behaviour of <code>TripleDict</code>) is fine.
</p>
<p>
Perhaps we should actually have two variants of <code>TripleDict</code>, one using "is" and one using "==".
</p>
<p>
Note another detail of sage/structure/coerce.pyx: We have
</p>
<pre class="wiki"> cpdef verify_action(self, action, R, S, op, bint fix=True):
</pre><p>
but
</p>
<pre class="wiki"> cpdef verify_coercion_maps(self, R, S, homs, bint fix=False):
</pre><p>
Note the different default value for "fix". If "fix" is True then the coercion model tries to cope with non-unique parents by prepending a conversion between the two equal copies of a parent.
</p>
<p>
Since the default is to fix non-unique parents for actions, but not for coercion maps, I suggest that a "=="-<code>TripleDict</code> should be used for actions and an "is"-<code>TripleDict</code> for coercions.
</p>
jpflori (Fri, 30 Dec 2011 14:07:01 GMT)
https://trac.sagemath.org/ticket/715#comment:36
<p>
I guess a choice has to be made, and it should at least be as consistent as possible.
What you propose makes sense to me; it is not too far from the current model and gives a little more consistency.
Moreover, once both <code>TripleDict</code> variants have been implemented, changing our mind later will be trivial.
</p>
SimonKing (Fri, 30 Dec 2011 14:49:32 GMT)
https://trac.sagemath.org/ticket/715#comment:37
<p>
There is another detail. Even in the old version of <code>TripleDict</code>, we have
</p>
<pre class="wiki"> It is implemented as a list of lists (hereafter called buckets). The bucket
is chosen according to a very simple hash based on the object pointer.
and each bucket is of the form [k1, k2, k3, value, k1, k2, k3, value, ...]
on which a linear search is performed.
</pre><p>
So, the choice of a bucket is based on the object pointer, but then it is not consistent to compare by "==".
</p>
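<p>
The inconsistency can be illustrated with a toy bucketed dictionary. In this hypothetical sketch (<code>Field</code>, <code>set_item</code> and <code>get_item</code> are illustrations, not the <code>TripleDict</code> code, and single keys replace triples for brevity), the bucket comes from <code>id()</code> but entries are compared with <code>==</code>, so an equal but distinct key is found only if its address happens to fall into the same bucket.
</p>

```python
class Field:
    """Hypothetical non-unique parent: equal copies are distinct objects."""

    def __init__(self, p):
        self.p = p

    def __eq__(self, other):
        return isinstance(other, Field) and self.p == other.p

    def __hash__(self):
        return hash(self.p)


NBUCKETS = 11
buckets = [[] for _ in range(NBUCKETS)]


def set_item(key, value):
    # The bucket is chosen from the object's identity ...
    buckets[id(key) % NBUCKETS].append((key, value))


def get_item(key):
    # ... but keys inside the bucket are compared with ==.
    for k, v in buckets[id(key) % NBUCKETS]:
        if k == key:
            return v
    raise KeyError(key)


def lookup_succeeds(key):
    try:
        get_item(key)
        return True
    except KeyError:
        return False


K1, K2 = Field(5), Field(5)
set_item(K1, "coercion map")
print(K1 == K2, K1 is K2)       # True False
print(lookup_succeeds(K1))      # True
# For K2 the answer depends only on where the two addresses happen to fall:
same_bucket = id(K1) % NBUCKETS == id(K2) % NBUCKETS
print(lookup_succeeds(K2) == same_bucket)   # True
```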
SimonKing (Fri, 30 Dec 2011 14:55:16 GMT)
https://trac.sagemath.org/ticket/715#comment:38
<p>
To be precise: the <em>old</em> behaviour was not consistent. The bucket depended on <code>id(k1),id(k2),id(k3)</code>, but the comparison was by "==" rather than by "is".
</p>
<p>
Experimentally, I will provide two versions of <code>TripleDict</code>: one using "hash" to determine the bucket and comparing by "==", the other using "id" to determine the bucket and comparing by "is".
</p>
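<p>
A sketch of the two consistent variants, with hypothetical class names and strong references only (the weak-reference machinery is omitted to keep it short): the identity version picks the bucket from the ids and matches with <code>is</code>, while the equality version picks it from <code>hash()</code> and matches with <code>==</code>. <code>Field</code> is again a hypothetical non-unique parent.
</p>

```python
class _TripleDictBase:
    """Bucketed dict keyed by triples (k1, k2, k3).

    Subclasses pick how a bucket is chosen (_hash) and how keys are
    matched (_match); the two must agree, or equal keys may end up in
    different buckets.
    """

    def __init__(self, nbuckets=11):
        self._buckets = [[] for _ in range(nbuckets)]

    def _bucket(self, key):
        return self._buckets[self._hash(key) % len(self._buckets)]

    def set(self, k1, k2, k3, value):
        key = (k1, k2, k3)
        bucket = self._bucket(key)
        for i, (old_key, _) in enumerate(bucket):
            if self._match(old_key, key):
                bucket[i] = (key, value)   # overwrite the existing entry
                return
        bucket.append((key, value))

    def get(self, k1, k2, k3, default=None):
        key = (k1, k2, k3)
        for old_key, value in self._bucket(key):
            if self._match(old_key, key):
                return value
        return default


class IdentityTripleDict(_TripleDictBase):
    """Bucket from id(), keys matched with `is`."""

    def _hash(self, key):
        return hash(tuple(id(k) for k in key))

    def _match(self, a, b):
        return all(x is y for x, y in zip(a, b))


class EqualityTripleDict(_TripleDictBase):
    """Bucket from hash(), keys matched with ==."""

    def _hash(self, key):
        return hash(key)

    def _match(self, a, b):
        return a == b


class Field:
    """Hypothetical non-unique parent: equal copies are distinct objects."""

    def __init__(self, p):
        self.p = p

    def __eq__(self, other):
        return isinstance(other, Field) and self.p == other.p

    def __hash__(self):
        return hash(self.p)


K1, K2 = Field(5), Field(5)
eq_dict, id_dict = EqualityTripleDict(), IdentityTripleDict()
for d in (eq_dict, id_dict):
    d.set(K1, K1, "*", "action")

print(eq_dict.get(K2, K2, "*"))   # action  (an equal copy is found)
print(id_dict.get(K2, K2, "*"))   # None    (only the identical object matches)
```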
SimonKing (Fri, 30 Dec 2011 23:30:08 GMT): work_issues changed
https://trac.sagemath.org/ticket/715#comment:39
<ul>
<li><strong>work_issues</strong>
changed from <em>Comparison of the third key items</em> to <em>fix doctests</em>
</li>
</ul>
<p>
As announced, I have attached an experimental patch. It provides two variants of <code>TripleDict</code>, namely using "==" or "is" for comparison, respectively. Both are used, namely for caching coerce maps or actions, respectively.
</p>
<p>
It could be that a last-minute change was interfering, but I am confident that all but the following three tests pass:
</p>
<pre class="wiki"> sage -t devel/sage-main/doc/en/bordeaux_2008/nf_introduction.rst # 1 doctests failed
sage -t devel/sage-main/sage/modular/modsym/space.py # Killed/crashed
sage -t devel/sage-main/sage/structure/coerce_dict.pyx # 3 doctests failed
</pre><p>
The memory leak exposed in the ticket description is fixed (more or less):
</p>
<pre class="wiki">sage: K = GF(1<<55,'t')
sage: a = K.random_element()
sage: for i in range(500):
....:     E = EllipticCurve(j=a)
....:     P = E.random_point()
....:     Q = 2*P
....:
sage: import gc
sage: gc.collect()
862
sage: from sage.schemes.generic.homset import SchemeHomsetModule_abelian_variety_coordinates_field
sage: LE = [x for x in gc.get_objects() if isinstance(x,SchemeHomsetModule_abelian_variety_coordinates_field)]
sage: len(LE)
2
</pre><p>
I am not sure whether this makes <a class="closed ticket" href="https://trac.sagemath.org/ticket/11521" title="defect: Use weak references to cache homsets (closed: fixed)">#11521</a> redundant.
</p>
<p>
For now, it is "needs work" because of the doctests. But you can already play with the patch.
</p>
SimonKing (Fri, 30 Dec 2011 23:32:31 GMT)
https://trac.sagemath.org/ticket/715#comment:40
<p>
Sorry, only TWO doctests should fail: The tests of sage/structure/coerce_dict.pyx are, of course, fixed.
</p>
SimonKing (Sat, 31 Dec 2011 13:07:47 GMT)
https://trac.sagemath.org/ticket/715#comment:41
<p>
The segfault in <code>sage -t devel/sage-main/sage/modular/modsym/space.py</code> seems difficult to debug.
</p>
<p>
Inspecting a core dump with gdb did not help at all:
</p>
<pre class="wiki">(gdb) bt
#0 0x00007f61d12ca097 in kill () from /lib64/libc.so.6
#1 0x00007f61d0044a40 in sigdie () from /home/simon/SAGE/sage-4.8.alpha3/local/lib/libcsage.so
#2 0x00007f61d0044646 in sage_signal_handler () from /home/simon/SAGE/sage-4.8.alpha3/local/lib/libcsage.so
#3 <signal handler called>
#4 0x00007f61cf080520 in mpn_submul_1 () from /home/simon/SAGE/sage-4.8.alpha3/local/lib/libgmp.so.8
#5 0x00007f61cf0b4f0f in __gmpn_sb_bdiv_q () from /home/simon/SAGE/sage-4.8.alpha3/local/lib/libgmp.so.8
#6 0x00007f61cf0b6428 in __gmpn_divexact () from /home/simon/SAGE/sage-4.8.alpha3/local/lib/libgmp.so.8
#7 0x00007f61ccbf4d64 in ?? ()
...
#191 0x55c0ade81d9aeecf in ?? ()
#192 0xffffe4b8b6920b7b in ?? ()
#193 0x000000000ac854cf in ?? ()
#194 0x0000000000000000 in ?? ()
</pre><p>
How could one proceed? What other debugging techniques can you recommend?
</p>
TicketvbraunSat, 31 Dec 2011 13:21:37 GMT
https://trac.sagemath.org/ticket/715#comment:42
https://trac.sagemath.org/ticket/715#comment:42
<p>
Looks like you did not tell gdb about the executable you were running. You should run
</p>
<pre class="wiki">gdb --core=<corefile> $SAGE_LOCAL/bin/python
</pre>
TicketSimonKingSat, 31 Dec 2011 14:11:30 GMT
https://trac.sagemath.org/ticket/715#comment:43
https://trac.sagemath.org/ticket/715#comment:43
<p>
Replying to <a class="ticket" href="https://trac.sagemath.org/ticket/715#comment:42" title="Comment 42">vbraun</a>:
</p>
<blockquote class="citation">
<p>
Looks like you did not tell gdb about the executable you were running.
</p>
</blockquote>
<p>
No, I did tell it. I ran
</p>
<pre class="wiki">gdb --core=715doublecore ~/SAGE/sage-4.8.alpha3/local/bin/python
</pre><p>
Should I do it inside a Sage shell?
</p>
TicketSimonKingSat, 31 Dec 2011 14:15:59 GMT
https://trac.sagemath.org/ticket/715#comment:44
https://trac.sagemath.org/ticket/715#comment:44
<p>
No, doing the same inside a sage shell did not help either.
</p>
TicketSimonKingSat, 31 Dec 2011 14:59:43 GMT
https://trac.sagemath.org/ticket/715#comment:45
https://trac.sagemath.org/ticket/715#comment:45
<p>
I am now printing some debugging information into a file, which hopefully means that I am coming closer to the source of the problem. The segfault arises in line 2165 of sage/modular/modsym/space.py
</p>
TicketSimonKingSat, 31 Dec 2011 15:18:40 GMT
https://trac.sagemath.org/ticket/715#comment:46
https://trac.sagemath.org/ticket/715#comment:46
<p>
Sorry, it was the wrong line number.
</p>
TicketSimonKingSun, 01 Jan 2012 19:18:25 GMT
https://trac.sagemath.org/ticket/715#comment:47
https://trac.sagemath.org/ticket/715#comment:47
<p>
Meanwhile I am rather desperate: I have not the faintest idea how the segfault occurs.
</p>
<p>
Therefore I registered a debugging function with <code>sys.settrace(...)</code>, so that every Python line executed in the critical example is written into a file.
</p>
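<p>
For anyone who wants to produce a similar log, here is a minimal sketch of a line tracer registered with the standard-library function <code>sys.settrace</code>. The helpers <code>make_line_tracer</code> and <code>run_traced</code> and the log format are hypothetical; the debugging function actually used for the logs above was not posted. Note that compiled extension code (Cython, GMP/MPIR) generally produces no Python line events, so it will not show up in such a log.
</p>

```python
import sys


def make_line_tracer(logfile):
    """Return a trace function that writes every executed Python line to logfile."""
    def tracer(frame, event, arg):
        if event == "line":
            code = frame.f_code
            logfile.write("%s:%d in %s\n" % (code.co_filename, frame.f_lineno, code.co_name))
        # Returning the tracer enables line events in every newly entered frame.
        return tracer
    return tracer


def run_traced(func, logfile, *args, **kwargs):
    """Call func(*args, **kwargs) with line tracing on; the log goes to logfile."""
    sys.settrace(make_line_tracer(logfile))
    try:
        return func(*args, **kwargs)
    finally:
        sys.settrace(None)  # switch tracing off again, even if func raises
```

<p>
Diffing two such logs (unpatched versus patched) is what reveals differences like the extra hash calls mentioned below.
</p>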
<p>
I posted logs for the <a class="ext-link" href="http://sage.math.washington.edu/home/SimonKing/SAGE/tickets/715/fulltrace"><span class="icon"></span>unpatched</a> and the <a class="ext-link" href="http://sage.math.washington.edu/home/SimonKing/SAGE/tickets/715/fulltracePatched"><span class="icon"></span>patched</a> version.
</p>
<p>
There is one obvious difference between the two logs: the hash is called more often in the patched version. Calling the hash is rather inefficient for matrix spaces: each time the hash of a matrix space is called, the matrix space's string representation is created, which is slow. I suggest caching the hash value (as I did for polynomial rings in <a class="closed ticket" href="https://trac.sagemath.org/ticket/9944" title="defect: categories for polynomial rings (closed: fixed)">#9944</a>), but this should be done on a different ticket.
</p>
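<p>
The suggested hash caching could look roughly like this in plain Python. <code>MatrixSpaceSketch</code> is a hypothetical stand-in, not Sage's matrix space class; the only assumption taken from the comment above is that the hash is derived from the string representation. The <code>repr_calls</code> counter exists only to show that the string is built once.
</p>

```python
class MatrixSpaceSketch:
    """Hypothetical parent whose hash is derived from its (slow) string representation."""

    def __init__(self, base, nrows, ncols):
        self.base, self.nrows, self.ncols = base, nrows, ncols
        self._hash = None      # filled in on the first call to __hash__
        self.repr_calls = 0    # instrumentation only: counts string constructions

    def __repr__(self):
        self.repr_calls += 1
        return "Full MatrixSpace of %d by %d dense matrices over %s" % (
            self.nrows, self.ncols, self.base)

    def __eq__(self, other):
        return (isinstance(other, MatrixSpaceSketch)
                and (self.base, self.nrows, self.ncols)
                == (other.base, other.nrows, other.ncols))

    def __hash__(self):
        # Build the string only once; every later call reuses the cached value.
        # This is safe only because the object is treated as immutable after __init__.
        if self._hash is None:
            self._hash = hash(repr(self))
        return self._hash
```
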
<p>
Apart from that, I can't spot an obvious difference. Do you have any clue?
</p>
TicketSimonKingMon, 02 Jan 2012 07:58:05 GMT
https://trac.sagemath.org/ticket/715#comment:48
https://trac.sagemath.org/ticket/715#comment:48
<p>
It turns out that using <code>TripleDictById</code> for the _action_maps cache makes the segfault disappear.
</p>
<p>
If one uses <code>TripleDict</code> for _coercion_maps then
</p>
<pre class="wiki">sage -t devel/sage-main/sage/modular/modsym/space.py
</pre><p>
takes 30 seconds, but if one also uses <code>TripleDictById</code> then it only takes 23 seconds.
</p>
<p>
My conclusion:
</p>
<ul><li>The old version of <code>TripleDict</code> was buggy: It uses <code>id(...)</code> for the hash table, but <code>==</code> for comparison. I think that had to be fixed.
</li><li>The new version of <code>TripleDict</code> uses <code>hash(...)</code> for the hash table and <code>==</code> for comparison. That should be fine, but (1) it leads to a segfault and (2) it leads to a slowdown. After all, calling <code>hash(...)</code> is a lot slower than determining the address.
</li><li>The new <code>TripleDictById</code> uses <code>id(...)</code> for the hash table and <code>... is ...</code> for comparison. Problem: It would probably not fix the memory leak.
</li></ul><p>
However, the fact that using <code>TripleDictById</code> fixes the segfault makes me wonder: perhaps the segfault occurs when calling <code>hash(...)</code> on a parent? Namely, in some cases, an action will already be constructed during initialisation of a parent. But if the hash is determined based on cdef data that aren't initialised yet, a segfault can easily occur.
</p>
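<p>
The practical difference between the lookup strategies in the list above can be illustrated with ordinary Python dictionaries standing in for <code>TripleDict</code> and <code>TripleDictById</code>; the class <code>SpaceSketch</code> is hypothetical. With hash/<code>==</code> lookup, an entry cached for one parent is also returned for an equal but non-identical parent; with id/<code>is</code> lookup it is not.
</p>

```python
class SpaceSketch:
    """Hypothetical parent: two instances with the same dimension compare equal."""

    def __init__(self, dim):
        self.dim = dim

    def __eq__(self, other):
        return isinstance(other, SpaceSketch) and self.dim == other.dim

    def __hash__(self):
        return hash(("SpaceSketch", self.dim))


V1 = SpaceSketch(5)
V2 = SpaceSketch(5)   # equal to V1, but a different object

# Keyed by hash(...) and compared with ==, as in the new TripleDict:
by_eq = {(V1, "*"): "action cached for V1"}
# Keyed by id(...), i.e. effectively compared with `is`, as in TripleDictById.
# (An id-keyed cache must keep its keys alive or watch them with weak
# references, because the id of a dead object can be reused.)
by_id = {(id(V1), "*"): "action cached for V1"}

print(by_eq.get((V2, "*")))       # action cached for V1  (also returned for V2)
print(by_id.get((id(V2), "*")))   # None  (V2 is not V1)
```
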
<p>
I'll investigate that further. In any case, we need to keep an eye on the potential slow-down.
</p>
TicketSimonKingMon, 02 Jan 2012 11:48:25 GMT
https://trac.sagemath.org/ticket/715#comment:49
https://trac.sagemath.org/ticket/715#comment:49
<p>
The segfault does not occur while computing a hash. It occurs in line 468 of sage/matrix/matrix_rational_dense.pyx, namely
</p>
<pre class="wiki"> mpq_mul(y, w._entries[j], self._matrix[j][i])
</pre><p>
I also tested, just before that line, that <code>w[j]</code> and <code>self.get_unsafe(j,i)</code> (which access <code>w._entries[j]</code> and <code>self._matrix[j][i]</code>) work.
</p>
<p>
At this point, I am at my wits' end. To me, it looks as if a change in the way dictionary keys are compared modifies internals of mpir (IIRC, this is where mpq_mul is defined). gdb cannot decipher the core file, and I don't know how valgrind can be used.
</p>
<p>
What else?
</p>
TicketvbraunMon, 02 Jan 2012 17:03:38 GMT
https://trac.sagemath.org/ticket/715#comment:50
https://trac.sagemath.org/ticket/715#comment:50
<p>
Which patches did you apply? With only <code>trac715_two_tripledicts.patch</code> applied sage doesn't start.
</p>
TicketSimonKingMon, 02 Jan 2012 18:39:54 GMT
https://trac.sagemath.org/ticket/715#comment:51
https://trac.sagemath.org/ticket/715#comment:51
<p>
Replying to <a class="ticket" href="https://trac.sagemath.org/ticket/715#comment:50" title="Comment 50">vbraun</a>:
</p>
<blockquote class="citation">
<p>
Which patches did you apply? With only <code>trac715_two_tripledicts.patch</code> applied sage doesn't start.
</p>
</blockquote>
<p>
What???
</p>
<p>
According to <code>hg qapplied</code>, I have
</p>
<pre class="wiki">trac_12057_fix_doctests.patch
9138_flat.patch
trac_11319_prime_field_coercion.patch
trac_11319_number_field_example.patch
trac11900_category_speedup_combined.patch
11115_flat.patch
trac_11115_docfix.patch
trac715_two_tripledicts.patch
</pre><p>
Remark: I work on <code>openSUSE</code>, hence, I had to apply <a class="closed ticket" href="https://trac.sagemath.org/ticket/12131" title="defect: $SAGE_LOCAL/lib and lib64 (closed: fixed)">#12131</a> and thus also its dependency <a class="closed ticket" href="https://trac.sagemath.org/ticket/12057" title="enhancement: Upgrade R (r-project.org) (closed: fixed)">#12057</a>. I doubt that the absence of <a class="closed ticket" href="https://trac.sagemath.org/ticket/11115" title="enhancement: Rewrite cached_method in Cython (closed: fixed)">#11115</a> is responsible for Sage not starting. And all other patches are dependencies.
</p>
<p>
What error occurs when you start Sage with my patch? If we are lucky, it gives some clue why the segfault in the one doctest occurs.
</p>
<p>
Best regards,
</p>
<p>
Simon
</p>
TicketSimonKingMon, 02 Jan 2012 18:40:38 GMT
https://trac.sagemath.org/ticket/715#comment:52
https://trac.sagemath.org/ticket/715#comment:52
<p>
PS: I started on top of sage-4.8.alpha3.
</p>
TicketSimonKingTue, 03 Jan 2012 06:25:17 GMT
https://trac.sagemath.org/ticket/715#comment:53
https://trac.sagemath.org/ticket/715#comment:53
<p>
Meanwhile I built sage-5.0.prealpha0 and applied <a class="closed ticket" href="https://trac.sagemath.org/ticket/11780" title="defect: Creating a polynomial ring over a number field results in a non-unique ... (closed: fixed)">#11780</a> and <a class="attachment" href="https://trac.sagemath.org/attachment/ticket/715/trac715_two_tripledicts.patch" title="Attachment 'trac715_two_tripledicts.patch' in Ticket #715">trac715_two_tripledicts.patch</a><a class="trac-rawlink" href="https://trac.sagemath.org/raw-attachment/ticket/715/trac715_two_tripledicts.patch" title="Download"></a>. Sage starts fine.
</p>
<p>
So, Volker, what did you have applied when Sage didn't start?
</p>
TicketSimonKingTue, 03 Jan 2012 09:33:46 GMT
https://trac.sagemath.org/ticket/715#comment:54
https://trac.sagemath.org/ticket/715#comment:54
<p>
I think I made progress: I found that the vector space that is part of the crash is <em>not unique</em>! So, the <code>VectorMatrixAction</code> is defined for a vector space that is equal to, but not identical with, the vector space it is acting on!
</p>
<p>
The natural solution is to try and find out why the vector space is not unique. Vector spaces should be created using the <code>VectorSpace</code> constructor, that relies on a <code>UniqueFactory</code>. But apparently some very old code is constructing a vector space directly - it wouldn't be the first time that this is causing trouble.
</p>
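<p>
The difference between going through a unique constructor and instantiating the class directly can be sketched in plain Python. All names here are hypothetical, and this is not Sage's actual <code>UniqueFactory</code> implementation; the <code>WeakValueDictionary</code> is an assumption chosen so that unused instances can still be freed, in line with the aim of this ticket.
</p>

```python
import weakref


class VectorSpaceSketch:
    """Hypothetical vector space; equality compares base ring and dimension only."""

    def __init__(self, base, dim):
        self.base, self.dim = base, dim

    def __eq__(self, other):
        return (isinstance(other, VectorSpaceSketch)
                and (self.base, self.dim) == (other.base, other.dim))

    def __hash__(self):
        return hash((self.base, self.dim))


# One instance per key while it is alive; weak values let unused spaces be freed.
_cache = weakref.WeakValueDictionary()


def vector_space(base, dim):
    """Hypothetical unique constructor in the spirit of a UniqueFactory."""
    key = (base, dim)
    space = _cache.get(key)
    if space is None:
        space = VectorSpaceSketch(base, dim)
        _cache[key] = space
    return space
```

<p>
Code that calls <code>VectorSpaceSketch(...)</code> directly bypasses the cache and produces an object that is equal to, but not identical with, the cached one, which is exactly the situation described above.
</p>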
TicketSimonKingTue, 03 Jan 2012 09:43:10 GMT
https://trac.sagemath.org/ticket/715#comment:55
https://trac.sagemath.org/ticket/715#comment:55
<p>
PS: Note that vector spaces with different inner products are considered equal.
</p>
<pre class="wiki">sage: V = QQ^5
sage: M = random_matrix(QQ,5,5)
sage: M.set_immutable()
sage: W = VectorSpace(QQ,5,inner_product_matrix=M)
sage: V
Vector space of dimension 5 over Rational Field
sage: W
Ambient quadratic space of dimension 5 over Rational Field
Inner product matrix:
[ 0 1/2 1 -1 -1]
[ 0 0 0 1 -1/2]
[ -2 0 0 0 0]
[ 1 0 2 0 0]
[ 0 -2 0 1 0]
sage: V==W
True
sage: type(V)==type(W)
False
</pre><p>
But this is not the problem here: The two equal vector spaces involved in the crash have default inner product.
</p>
<p>
The non-uniqueness makes me think of another potential solution: The coercion model has a method "verify_action". This is <em>only</em> called when a new action is found, but not when an action is taken from the cache.
</p>
<p>
So, in addition to fixing the non-unique vector space in the modular symbols code, one could <em>always</em> verify the action. Probably this would be too slow, though.
</p>
TicketSimonKingTue, 03 Jan 2012 09:59:31 GMT
https://trac.sagemath.org/ticket/715#comment:56
https://trac.sagemath.org/ticket/715#comment:56
<p>
Aha! We have a sparse versus a dense vector space! Here is our problem!
</p>
TicketvbraunTue, 03 Jan 2012 10:08:35 GMT
https://trac.sagemath.org/ticket/715#comment:57
https://trac.sagemath.org/ticket/715#comment:57
<p>
I did manage to install it and reproduce the crash. The core dump shows that the stack is completely corrupted before we called into gmp code.
</p>
TicketSimonKingTue, 03 Jan 2012 10:13:09 GMT
https://trac.sagemath.org/ticket/715#comment:58
https://trac.sagemath.org/ticket/715#comment:58
<p>
Hi Volker,
</p>
<p>
good that you managed to install it. Meanwhile I think I can debug it without the core dump - I think mistaking a sparse with a dense vector space is a pretty convincing reason for a segfault.
</p>
<p>
However, I hate that old code!!
</p>
<p>
I tried <code>verify_action</code>, but then hundreds of tests fail in sage/modular/modsym/space.py. So, apparently it is very common to have non-unique parents in such a way that the action can <em>not</em> be fixed!
</p>
<p>
For example, I see errors like
</p>
<pre class="wiki"> TypeError: Coercion of [Infinity] - [0] (of type <class 'sage.modular.modsym.boundary.BoundarySpaceElement'>) into Space of Boundary Modular Symbols for Congruence Subgroup Gamma0(43) of weight 2 and over Rational Field not (yet) defined.
</pre><p>
Anyway, <code>verify_action</code> is no solution.
</p>
TicketjpfloriTue, 03 Jan 2012 10:18:06 GMT
https://trac.sagemath.org/ticket/715#comment:59
https://trac.sagemath.org/ticket/715#comment:59
<p>
Hi all,
</p>
<p>
Just wanted to say I had no problem installing the new patch on top of sage-4.8.alpha5 with tickets <a class="closed ticket" href="https://trac.sagemath.org/ticket/9138" title="defect: Categories for all rings (closed: fixed)">#9138</a> <a class="closed ticket" href="https://trac.sagemath.org/ticket/11900" title="defect: Serious regression caused by #9138 (closed: fixed)">#11900</a> <a class="closed ticket" href="https://trac.sagemath.org/ticket/1115" title="enhancement: [with new patch, positive review] Sha_an either fails or lies when ... (closed: fixed)">#1115</a> <a class="closed ticket" href="https://trac.sagemath.org/ticket/715" title="defect: Parents probably not reclaimed due to too much caching (closed: fixed)">#715</a> and <a class="closed ticket" href="https://trac.sagemath.org/ticket/11521" title="defect: Use weak references to cache homsets (closed: fixed)">#11521</a> in that order, and Sage launches.
I'll start a make ptestlong now.
</p>
TicketSimonKingTue, 03 Jan 2012 10:32:09 GMT
https://trac.sagemath.org/ticket/715#comment:60
https://trac.sagemath.org/ticket/715#comment:60
<p>
Hi Jean-Pierre,
</p>
<p>
don't start ptestlong - I am about to update the new patch such that the segfault does not occur and the time for executing the test is fine and the memleak is gone!
</p>
TicketSimonKingTue, 03 Jan 2012 10:33:19 GMTattachment set
https://trac.sagemath.org/ticket/715
https://trac.sagemath.org/ticket/715
<ul>
<li><strong>attachment</strong>
set to <em>trac715_two_tripledicts.patch</em>
</li>
</ul>
<p>
Use weak references to the keys of <code>TripleDict</code>. Compare by "==" or by "is", depending on the application. Use weak references for storing actions.
</p>
TicketSimonKingTue, 03 Jan 2012 10:34:29 GMTstatus, description changed; work_issues deleted
https://trac.sagemath.org/ticket/715#comment:61
https://trac.sagemath.org/ticket/715#comment:61
<ul>
<li><strong>status</strong>
changed from <em>needs_work</em> to <em>needs_review</em>
</li>
<li><strong>description</strong>
modified (<a href="/ticket/715?action=diff&version=61">diff</a>)
</li>
<li><strong>work_issues</strong>
<em>fix doctests</em> deleted
</li>
</ul>
<p>
See the updated patch:
</p>
<p>
Apply trac715_two_tripledicts.patch
</p>
TicketSimonKingTue, 03 Jan 2012 10:42:40 GMT
https://trac.sagemath.org/ticket/715#comment:62
https://trac.sagemath.org/ticket/715#comment:62
<p>
Here are some remarks on the new patch:
</p>
<p>
I use <code>TripleDictById</code> for storing actions, since otherwise we have trouble with non-unique parents and get segfaults.
</p>
<p>
In addition, I do not directly store the action but only a weak reference to it, since otherwise I couldn't fix the memory leak.
</p>
<p>
Sometimes, the stored action is in fact <code>None</code>, for which we can't use a weak reference. Instead, I use a constant function. For technical reasons it returns False and not None (namely, to avoid confusion with a weak reference that has become invalid).
</p>
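<p>
The storage scheme just described can be sketched in plain Python. <code>WeakActionCacheSketch</code> and <code>_no_action</code> are hypothetical names; the patch itself works on <code>TripleDict</code>, which is not reproduced here. The sketch only shows why the placeholder returns False: calling a dead weak reference yields None, so a cached "no action" must be distinguishable from that.
</p>

```python
import weakref


def _no_action():
    """Placeholder stored for a cached None. It returns False rather than None, so
    it cannot be mistaken for a dead weak reference (which returns None)."""
    return False


class WeakActionCacheSketch:
    """Hypothetical cache that stores only weak references to actions."""

    def __init__(self):
        self._data = {}

    def set(self, key, action):
        self._data[key] = _no_action if action is None else weakref.ref(action)

    def get(self, key):
        """Return (found, action); found is False on a miss or a dead reference."""
        ref = self._data.get(key)
        if ref is None:
            return False, None        # never cached
        action = ref()
        if action is False:
            return True, None         # cached: "there is no action"
        if action is None:
            del self._data[key]       # the action was garbage collected
            return False, None
        return True, action
```
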
<p>
<strong><span class="underline">Features</span></strong>
</p>
<ul><li>The segfault in <code>sage -t sage/modular/modsym/space.py</code> is gone.
</li><li>The time for executing that test remains fine, namely 20.7 seconds (unpatched sage-5.0.prealpha0) versus 21.4 seconds (with patch).
</li><li>The example from the ticket description does not leak anymore!
</li></ul><p>
Thus, needs review.
</p>
TicketjpfloriTue, 03 Jan 2012 10:48:46 GMT
https://trac.sagemath.org/ticket/715#comment:63
https://trac.sagemath.org/ticket/715#comment:63
<p>
Ok, I'll give the new patch a go and report after make ptestlong and checking for the memleaks.
</p>
TicketSimonKingTue, 03 Jan 2012 11:06:49 GMT
https://trac.sagemath.org/ticket/715#comment:64
https://trac.sagemath.org/ticket/715#comment:64
<p>
Replying to <a class="ticket" href="https://trac.sagemath.org/ticket/715#comment:63" title="Comment 63">jpflori</a>:
</p>
<blockquote class="citation">
<p>
Ok, I'll give the new patch a go and report after make ptestlong
</p>
</blockquote>
<p>
So do I.
</p>
<p>
I guess at least one thing is needed: Provide a doc test that demonstrates the fix of the memory leak. This should be similar to the example for the patch that I have posted at <a class="closed ticket" href="https://trac.sagemath.org/ticket/11521" title="defect: Use weak references to cache homsets (closed: fixed)">#11521</a>. Note that <a class="closed ticket" href="https://trac.sagemath.org/ticket/11521" title="defect: Use weak references to cache homsets (closed: fixed)">#11521</a> is in fact a duplicate: The examples from the two ticket descriptions are almost identical.
</p>
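<p>
Besides counting instances with <code>gc.get_objects()</code>, as in the example at the top of this section, a leak doctest can keep only a weak reference to the object under test and check that it disappears after a collection. A minimal plain-Python sketch of that pattern, with the hypothetical helper <code>is_reclaimed</code>:
</p>

```python
import gc
import weakref


def is_reclaimed(factory):
    """Create an object with factory(), drop the only strong reference that this
    function holds, run the cycle collector, and report whether the object is gone."""
    obj = factory()
    ref = weakref.ref(obj)
    del obj
    gc.collect()
    return ref() is None
```

<p>
If some cache still holds a strong reference to the object, <code>is_reclaimed</code> returns False, which is exactly the symptom of the leak described in this ticket.
</p>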
TicketvbraunTue, 03 Jan 2012 11:22:52 GMT
https://trac.sagemath.org/ticket/715#comment:65
https://trac.sagemath.org/ticket/715#comment:65
<p>
Actions have strong references to domain and codomain, so it's no surprise that they keep their coercion cache entry alive. But I don't understand how storing a weak reference to the action can work: nothing else keeps the action alive unless it happens to be in use while the garbage collector is running. So actions are essentially not cached any more. It seems that either actions should only store weak references to domain/codomain, or we should implement some ring buffer that keeps the last N coerce maps unconditionally alive.
</p>
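<p>
The ring-buffer idea can be sketched in plain Python: values are held weakly, but a bounded <code>collections.deque</code> keeps strong references to the most recently stored or looked-up ones. <code>RecentlyUsedCache</code> and its <code>keep</code> parameter (the N above) are hypothetical; in the coercion model the keys would themselves be parents held weakly, which this sketch does not attempt.
</p>

```python
import weakref
from collections import deque


class RecentlyUsedCache:
    """Hypothetical cache: values are held weakly, but the `keep` most recently
    stored or looked-up values are also held strongly by a bounded deque."""

    def __init__(self, keep=64):
        self._values = weakref.WeakValueDictionary()
        self._recent = deque(maxlen=keep)  # the oldest strong reference falls off

    def set(self, key, value):
        self._values[key] = value
        self._recent.append(value)

    def get(self, key, default=None):
        value = self._values.get(key)
        if value is None:
            return default
        self._recent.append(value)  # a cache hit refreshes the strong reference
        return value
```

<p>
The design trade-off is that frequently used actions stay cached regardless of the garbage collector, while at most <code>keep</code> otherwise unreferenced actions (and, through them, their domains and codomains) are kept alive.
</p>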
<p>
In fact, the action's references to domain and codomain seem to be for convenience only. After all, you know domain and codomain when you construct the action and when you pick it from the cache, so there shouldn't be much incentive to look them up. Perhaps it would be easy to make them weak refs; did you look into that?
</p>
TicketjpfloriTue, 03 Jan 2012 11:26:31 GMT
https://trac.sagemath.org/ticket/715#comment:66
https://trac.sagemath.org/ticket/715#comment:66
<p>
I agree with Volker and would like to test putting weak refs to domain and codomain in Functor, as I suggested in <a class="closed ticket" href="https://trac.sagemath.org/ticket/11521" title="defect: Use weak references to cache homsets (closed: fixed)">#11521</a>, with an option to use strong refs by default, so that a user who builds an action but does not keep its domain elsewhere won't see it disappear magically.
</p>
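<p>
The proposal (weak references to domain and codomain, with strong references as the default) could look roughly like this in plain Python. <code>ActionSketch</code> and its <code>strong</code> parameter are hypothetical and are not Sage's <code>Functor</code> or <code>Action</code> API; the sketch only makes the idea concrete, including what happens when a weakly held domain or codomain has already been collected.
</p>

```python
import weakref


class ActionSketch:
    """Hypothetical action holding its domain and codomain strongly (default)
    or, with strong=False, only through weak references."""

    def __init__(self, domain, codomain, strong=True):
        if strong:
            # Zero-argument callables holding strong references, so that both
            # branches expose the same call interface as weakref.ref.
            self._domain = lambda: domain
            self._codomain = lambda: codomain
        else:
            self._domain = weakref.ref(domain)
            self._codomain = weakref.ref(codomain)

    @staticmethod
    def _deref(ref, what):
        obj = ref()
        if obj is None:
            raise RuntimeError("the %s of this action has been garbage collected" % what)
        return obj

    def domain(self):
        return self._deref(self._domain, "domain")

    def codomain(self):
        return self._deref(self._codomain, "codomain")
```
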
<p>
Unfortunately I do not have much time to do anything more than testing till the end of the week.
</p>
TicketSimonKingTue, 03 Jan 2012 11:42:41 GMT
https://trac.sagemath.org/ticket/715#comment:67
https://trac.sagemath.org/ticket/715#comment:67
<p>
I wouldn't use weak references for anything but caching. In particular, having a weak reference from a functor to its domain or codomain seems a no-go to me.
</p>
<p>
In one point I agree: There should be a mechanism to keep an action alive as long as domain and codomain exist. But perhaps this is already the case? Isn't there an action cache as an attribute of any parent? And isn't the action stored there (and not only in the cache of the coercion model) when an action is discovered?
</p>
<p>
So, before thinking of a weak reference from the functor to domain and codomain, I would first test whether the problem you describe actually occurs.
</p>
TicketSimonKingTue, 03 Jan 2012 13:16:53 GMT
https://trac.sagemath.org/ticket/715#comment:68
https://trac.sagemath.org/ticket/715#comment:68
<p>
Just two mental notes:
</p>
<p>
One test in sage/structure/coerce.pyx fails, because it explicitly uses the action cache (ignoring the fact that it now contains weak references and not actions).
</p>
<p>
And: The long tests of these two files
</p>
<pre class="wiki">devel/sage/sage/graphs/graph_generators.py
devel/sage/sage/graphs/generic_graph.py
</pre><p>
take 10 minutes each. Is my patch to blame, or has it been like that before?
</p>
TicketjpfloriTue, 03 Jan 2012 14:16:44 GMTstatus changed
https://trac.sagemath.org/ticket/715#comment:69
https://trac.sagemath.org/ticket/715#comment:69
<ul>
<li><strong>status</strong>
changed from <em>needs_review</em> to <em>needs_work</em>
</li>
</ul>
<p>
With sage-4.8.alpha5 plus <a class="closed ticket" href="https://trac.sagemath.org/ticket/9138" title="defect: Categories for all rings (closed: fixed)">#9138</a> <a class="closed ticket" href="https://trac.sagemath.org/ticket/11900" title="defect: Serious regression caused by #9138 (closed: fixed)">#11900</a> <a class="closed ticket" href="https://trac.sagemath.org/ticket/1115" title="enhancement: [with new patch, positive review] Sha_an either fails or lies when ... (closed: fixed)">#1115</a> <a class="closed ticket" href="https://trac.sagemath.org/ticket/715" title="defect: Parents probably not reclaimed due to too much caching (closed: fixed)">#715</a> and <a class="closed ticket" href="https://trac.sagemath.org/ticket/11521" title="defect: Use weak references to cache homsets (closed: fixed)">#11521</a> applied, some tests fail, namely in the files:
</p>
<ul><li>sage.rings.padic.padic_base_generic_element.pyx (1 test failed) but did NOT when rerun and did again the next time,
</li><li>sage.rings.number_field.number_field.py (1) and did NOT again,
</li><li>sage.structure.coerce.pyx (5) and did again,
</li><li>sage.algebras.quatalg.quaternion_algebra.py (1) and did again once and then did NOT again,
</li><li>lots of them in sage.homology.* (20+25+50+93+1) and did again.
</li></ul><p>
The randomly failing tests among the above fail with:
</p>
<ul><li>IndexError: list index out of range (padic)
</li><li>AttributeError: QuaternionAlgebra_abstract_with_category object has no attribute _a (quatalg)
</li></ul><p>
and at some point in the stack trace, the TripleDicts of the coercion model are involved.
</p>
TicketjpfloriTue, 03 Jan 2012 14:18:08 GMT
https://trac.sagemath.org/ticket/715#comment:70
https://trac.sagemath.org/ticket/715#comment:70
<p>
Oops, this should be more readable:
</p>
<p>
With sage-4.8.alpha5 plus <a class="ext-link" href="http://trac.sagemath.org/sage_trac/ticket/9138"><span class="icon"></span>#9138</a> <a class="ext-link" href="http://trac.sagemath.org/sage_trac/ticket/11900"><span class="icon"></span>#11900</a> <a class="ext-link" href="http://trac.sagemath.org/sage_trac/ticket/1115"><span class="icon"></span>#1115</a> <a class="ext-link" href="http://trac.sagemath.org/sage_trac/ticket/715"><span class="icon"></span>#715</a> and <a class="ext-link" href="http://trac.sagemath.org/sage_trac/ticket/11521"><span class="icon"></span>#11521</a> applied, some tests fail, namely in the files:
</p>
<ul><li>sage.rings.padic.padic_base_generic_element.pyx (1 test failed) but did NOT when rerun and did again the next time,
</li><li>sage.rings.number_field.number_field.py (1) and did NOT again,
</li><li>sage.structure.coerce.pyx (5) and did again,
</li><li>sage.algebras.quatalg.quaternion_algebra.py (1) and did again once and then did NOT again,
</li><li>lots of them in sage.homology.* (20+25+50+93+1) and did again.
</li></ul><p>
The randomly failing tests among the above fail with:
</p>
<ul><li>IndexError: list index out of range (padic)
</li><li>AttributeError: QuaternionAlgebra_abstract_with_category object has no attribute _a (quatalg)
</li></ul><p>
and at some point in the stack trace, the TripleDicts of the coercion model are involved.
</p>
TicketjpfloriTue, 03 Jan 2012 14:21:47 GMT
https://trac.sagemath.org/ticket/715#comment:71
https://trac.sagemath.org/ticket/715#comment:71
<p>
For info, the number_field test also fails with an "IndexError: list index out of range".
</p>
TicketSimonKingTue, 03 Jan 2012 14:52:54 GMTwork_issues set
https://trac.sagemath.org/ticket/715#comment:72
https://trac.sagemath.org/ticket/715#comment:72
<ul>
<li><strong>work_issues</strong>
set to <em>avoid regression</em>
</li>
</ul>
<p>
The flaky behaviour probably means that sometimes something gets garbage collected when it shouldn't.
</p>
<p>
But why do you have the patch from <a class="closed ticket" href="https://trac.sagemath.org/ticket/11521" title="defect: Use weak references to cache homsets (closed: fixed)">#11521</a> applied?
</p>
<p>
Note that with sage-5.0.prealpha0 + <a class="closed ticket" href="https://trac.sagemath.org/ticket/11780" title="defect: Creating a polynomial ring over a number field results in a non-unique ... (closed: fixed)">#11780</a> + the new patch from here, I get two tests with errors, namely
</p>
<pre class="wiki"> sage -t --long -force_lib devel/sage/sage/structure/coerce_dict.pyx # 1 doctests failed
sage -t --long -force_lib devel/sage/sage/structure/coerce.pyx # 5 doctests failed
</pre><p>
However, the tests took rather long in total: 12100 seconds with the new patch versus 4569 seconds unpatched.
</p>
<p>
I think the regression is not acceptable.
</p>
<p>
Well, perhaps you are right and we should experiment with weak references on domain and codomain.
</p>
TicketjpfloriTue, 03 Jan 2012 15:01:12 GMT
https://trac.sagemath.org/ticket/715#comment:73
https://trac.sagemath.org/ticket/715#comment:73
<p>
Good point about <a class="closed ticket" href="https://trac.sagemath.org/ticket/11521" title="defect: Use weak references to cache homsets (closed: fixed)">#11521</a>; I'd say I applied it because that's what I was initially interested in.
</p>
<p>
Without it applied, the flaky behavior seems to disappear.
</p>
<p>
I'll post timings with all patches, with all patches except for <a class="closed ticket" href="https://trac.sagemath.org/ticket/11521" title="defect: Use weak references to cache homsets (closed: fixed)">#11521</a>, and with no patches in a few hours.
</p>
<p>
Anyway, I guess Volker is right: even with just <a class="closed ticket" href="https://trac.sagemath.org/ticket/715" title="defect: Parents probably not reclaimed due to too much caching (closed: fixed)">#715</a> applied, we should check that actions do not get garbage collected continuously, as your timings suggest.
</p>
TicketSimonKingTue, 03 Jan 2012 15:07:15 GMT
https://trac.sagemath.org/ticket/715#comment:74
https://trac.sagemath.org/ticket/715#comment:74
<p>
Replying to <a class="ticket" href="https://trac.sagemath.org/ticket/715#comment:73" title="Comment 73">jpflori</a>:
</p>
<blockquote class="citation">
<p>
Good point about <a class="closed ticket" href="https://trac.sagemath.org/ticket/11521" title="defect: Use weak references to cache homsets (closed: fixed)">#11521</a>, I'd say because that what I was firstly interested in.
</p>
<p>
Without it applied, the flaky behavior seem to disappear.
</p>
</blockquote>
<p>
Good!
</p>
<blockquote class="citation">
<p>
I'll post timings with all patches, with all patches except for <a class="closed ticket" href="https://trac.sagemath.org/ticket/11521" title="defect: Use weak references to cache homsets (closed: fixed)">#11521</a>, and with no patches in a few hours.
</p>
</blockquote>
<p>
OK, but I guess the timings I provided should be enough to show that the patch cannot remain as it is now.
</p>
<blockquote class="citation">
<p>
Anyway I guess Volker is right and even with just <a class="closed ticket" href="https://trac.sagemath.org/ticket/715" title="defect: Parents probably not reclaimed due to too much caching (closed: fixed)">#715</a> applied we should check that actions do not get garbage collected continuously as your timings suggest.
</p>
</blockquote>
<p>
Yep. Two potential solutions:
</p>
<ol><li>Find out why apparently not all actions are registered in the parent (because then we would have a strong reference as long as at least the domain is alive).
</li><li>Play with the idea to have a strong reference on the action but a weak reference from a functor to its domain and codomain.
</li></ol><p>
I'm trying the latter now.
</p>
TicketjpfloriTue, 03 Jan 2012 15:21:41 GMT
https://trac.sagemath.org/ticket/715#comment:75
https://trac.sagemath.org/ticket/715#comment:75
<p>
Replying to <a class="ticket" href="https://trac.sagemath.org/ticket/715#comment:74" title="Comment 74">SimonKing</a>:
</p>
<blockquote class="citation">
<p>
Replying to <a class="ticket" href="https://trac.sagemath.org/ticket/715#comment:73" title="Comment 73">jpflori</a>:
</p>
<blockquote class="citation">
<p>
Good point about <a class="closed ticket" href="https://trac.sagemath.org/ticket/11521" title="defect: Use weak references to cache homsets (closed: fixed)">#11521</a>, I'd say because that what I was firstly interested in. Without it applied, the flaky behavior seem to disappear.
</p>
</blockquote>
<p>
Good!
</p>
<blockquote class="citation">
<p>
I'll post timings with all patches, with all patches except for <a class="closed ticket" href="https://trac.sagemath.org/ticket/11521" title="defect: Use weak references to cache homsets (closed: fixed)">#11521</a>, and with no patches in a few hours.
</p>
</blockquote>
<p>
OK, but I guess the timings I provided should be enough to show that the patch can not remain as it is now.
</p>
<blockquote class="citation">
<p>
Anyway I guess Volker is right and even with just <a class="closed ticket" href="https://trac.sagemath.org/ticket/715" title="defect: Parents probably not reclaimed due to too much caching (closed: fixed)">#715</a> applied we should check that actions do not get garbage collected continuously as your timings suggest.
</p>
</blockquote>
</blockquote>
<blockquote class="citation">
<p>
Yep. Two potential solutions: 1. Find out why apparently not all actions are registered in the parent (because then we would have a strong reference as long as at least the domain is alive). 2. Play with the idea to have a strong reference on the action but a weak reference from a functor to its domain and codomain. I'm trying the latter now.
</p>
</blockquote>
<p>
Just to summarize the current problem (please correct me if any of the following is wrong): we want a codomain (resp. domain) to be garbage collected when it is only weakly referenced outside of the coercion model.
</p>
<p>
Before the current patch the situation is as follows for actions:
</p>
<ul><li>when an action is resolved, it is cached in a triple dict in the coercion model, with the domain and codomain as keys;
</li><li>the action is also cached in the dictionaries of the domain and the codomain, in much the same way;
</li><li>there is also a similar cache for homsets.
</li></ul><p>
The current patch uses weak references for the keys of the above dictionaries and stores only a weak reference to the corresponding value (the action).
</p>
<p>
The problem is that as the action is only weak reffed everywhere now, it gets garbage collected all the time (to be confirmed).
</p>
<p>
If it is not, then the codomain (resp. domain) will in turn not be garbage collected, because the action holds a strong reference to it, and the action itself is strongly referenced from the domain (resp. codomain) (to be confirmed).
</p>
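<p>
Both suspected failure modes in this summary can be reproduced with plain Python objects. <code>Parent</code>, <code>Action</code> and <code>action_cache</code> are hypothetical stand-ins, and the string key "B" replaces what would be a weakly keyed entry in Sage; the sketch illustrates reference semantics only, not the behaviour of the actual patch.
</p>

```python
import gc
import weakref


class Parent:
    """Hypothetical parent with a per-parent action cache (strong values)."""

    def __init__(self, name):
        self.name = name
        self.action_cache = {}


class Action:
    """Hypothetical action with strong references to domain and codomain."""

    def __init__(self, domain, codomain):
        self.domain = domain
        self.codomain = codomain


def weakly_cached_action_survives():
    """Case 1: the action is referenced only weakly after creation."""
    A, B = Parent("A"), Parent("B")
    ref = weakref.ref(Action(A, B))
    gc.collect()
    return ref() is not None


def codomain_freed_despite_strong_cache():
    """Case 2: the domain caches the action strongly; can the codomain be freed?"""
    A, B = Parent("A"), Parent("B")
    A.action_cache["B"] = Action(A, B)  # A references the action, which references B
    ref_B = weakref.ref(B)
    del B
    gc.collect()
    return ref_B() is None


print(weakly_cached_action_survives())        # False: the action is gone at once
print(codomain_freed_despite_strong_cache())  # False: B is kept alive through A
```
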
<p>
The problem for the homset patch is slightly different and is being discussed in <a class="closed ticket" href="https://trac.sagemath.org/ticket/11521" title="defect: Use weak references to cache homsets (closed: fixed)">#11521</a>.
</p>
TicketSimonKingTue, 03 Jan 2012 15:25:08 GMT
https://trac.sagemath.org/ticket/715#comment:76
https://trac.sagemath.org/ticket/715#comment:76
<p>
Replying to <a class="ticket" href="https://trac.sagemath.org/ticket/715#comment:74" title="Comment 74">SimonKing</a>:
</p>
<blockquote class="citation">
<ol><li>Find out why apparently not all actions are registered in the parent (because then we would have a strong reference as long as at least the domain is alive).
</li></ol></blockquote>
<p>
That's why:
</p>
<pre class="wiki">sage: search_src("register_action")
structure/parent.pyx:1698: self.register_action(action)
structure/parent.pyx:1791: cpdef register_action(self, action):
structure/parent.pyx:1841: sage: R.register_action(act)
structure/parent.pxd:29: cpdef register_action(self, action)
</pre><p>
So <code>register_action</code> is hardly used at all (the only call is in parent.pyx itself), which makes me wonder why some actions <em>are</em> stored in the parent.
</p>
TicketSimonKingTue, 03 Jan 2012 15:29:14 GMT
https://trac.sagemath.org/ticket/715#comment:77
https://trac.sagemath.org/ticket/715#comment:77
<p>
I see: <code>register_action</code> must not be used after any coercion has been established.
</p>
TicketvbraunTue, 03 Jan 2012 15:35:54 GMT
https://trac.sagemath.org/ticket/715#comment:78
https://trac.sagemath.org/ticket/715#comment:78
<p>
Just as a remark from the sidelines, it seems that consistently storing a reference in the parent would be the cleanest solution. Perhaps the testsuite stuff can be used to verify that all parents do that?
</p>
TicketSimonKingTue, 03 Jan 2012 15:46:29 GMT
https://trac.sagemath.org/ticket/715#comment:79
https://trac.sagemath.org/ticket/715#comment:79
<p>
Replying to <a class="ticket" href="https://trac.sagemath.org/ticket/715#comment:78" title="Comment 78">vbraun</a>:
</p>
<blockquote class="citation">
<p>
Just as a remark from the sidelines, it seems that consistently storing a reference in the parent would be the cleanest solution.
</p>
</blockquote>
<p>
But perhaps a difficult one. The condition that <code>register_action</code> must not be used after defining any coercion is probably there for a reason.
</p>
<blockquote class="citation">
<p>
Perhaps the testsuite stuff can be used to verify that all parents do that?
</p>
</blockquote>
<p>
How could it? By hooking into the coercion model, looking up every action stored there, and verifying that each one is registered in its parent?
</p>
TicketSimonKingTue, 03 Jan 2012 15:50:17 GMTattachment set
https://trac.sagemath.org/ticket/715
https://trac.sagemath.org/ticket/715
<ul>
<li><strong>attachment</strong>
set to <em>test_orphan_functor</em>
</li>
</ul>
<p>
Experimental patch using weak references on domain and codomain of functors
</p>
TicketSimonKingTue, 03 Jan 2012 15:56:40 GMT
https://trac.sagemath.org/ticket/715#comment:80
https://trac.sagemath.org/ticket/715#comment:80
<p>
I have posted an <a class="attachment" href="https://trac.sagemath.org/attachment/ticket/715/test_orphan_functor" title="Attachment 'test_orphan_functor' in Ticket #715">experimental patch</a><a class="trac-rawlink" href="https://trac.sagemath.org/raw-attachment/ticket/715/test_orphan_functor" title="Download"></a>, that has to be applied on top of <a class="attachment" href="https://trac.sagemath.org/attachment/ticket/715/trac715_two_tripledicts.patch" title="Attachment 'trac715_two_tripledicts.patch' in Ticket #715">trac715_two_tripledicts.patch</a><a class="trac-rawlink" href="https://trac.sagemath.org/raw-attachment/ticket/715/trac715_two_tripledicts.patch" title="Download"></a>.
</p>
<p>
With the experimental patch, the coercion model stores strong references to the actions (hence, it restores the original behaviour), but functors will only store weak references to their domains and codomains.
</p>
<p>
Unfortunately, this does <em>not</em> fix the memory leak. But perhaps you want to play with it...
</p>
<p>
Ah! And I just noticed that <code>sage.categories.functor</code> was the wrong place to make the change.
</p>
TicketSimonKingTue, 03 Jan 2012 15:58:39 GMT
https://trac.sagemath.org/ticket/715#comment:81
https://trac.sagemath.org/ticket/715#comment:81
<p>
To be precise: <code>Action.__domain</code> is <em>not</em> the set the action acts on; it is a groupoid, and it is not used. So, forget the experimental patch.
</p>
TicketSimonKingTue, 03 Jan 2012 16:18:11 GMT
https://trac.sagemath.org/ticket/715#comment:82
https://trac.sagemath.org/ticket/715#comment:82
<p>
An action of G on S stores direct references to G and to S.
</p>
<p>
The action is a functor, and as a functor it additionally stores references to its domain <code>Groupoid(G)</code> (which holds another reference to G) and to its codomain, the category of S.
</p>
<p>
In some cases, the category of S will store references to the base ring of S (for example, if S is an algebra), which might have a pointer back to S (for example if the action of <code>S.base_ring()</code> on S was registered during initialisation). In this case, we are lost, since categories are unique parents and thus strongly cached (unless we apply <a class="closed ticket" href="https://trac.sagemath.org/ticket/12215" title="defect: Memleak in UniqueRepresentation, @cached_method (closed: fixed)">#12215</a>, which poses some problems).
</p>
<p>
For the same reason, creating the groupoid of G will result in an eternal reference on G (<code>Groupoid(G)</code> is strongly cached and it points to G). So, the best that we can hope for is that we can free S at some point, but we will never be able to free G.
</p>
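<p>
A minimal plain-Python sketch of why a strongly cached unique object pins G. The <code>Groupoid</code> class below is a hypothetical stand-in for a strongly cached <code>UniqueRepresentation</code>, not Sage's implementation: once it has been created for G, the class-level cache references it and it references G, so G can never be freed.
</p>
<pre class="wiki">import gc
import weakref


class Parent:
    """Hypothetical stand-in for the parent G."""


class Groupoid:
    """Hypothetical stand-in for a strongly cached UniqueRepresentation:
    one instance per argument, stored forever in a class-level dict."""

    _cache = {}

    def __new__(cls, G):
        key = id(G)
        if key not in cls._cache:
            obj = super().__new__(cls)
            obj.G = G  # the groupoid has to reference G
            cls._cache[key] = obj
        return cls._cache[key]


def groupoid_pins_parent():
    G = Parent()
    Groupoid(G)  # creating the groupoid once is enough
    G_ref = weakref.ref(G)
    del G
    gc.collect()
    return G_ref() is not None  # True: cache -> Groupoid(G) -> G
</pre>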
<p>
It starts to be complicated. Time to call it a day...
</p>
<p>
Perhaps the idea to register actions in the parents (in addition to a weak cache in the coercion model) is better?
</p>
TicketjpfloriTue, 03 Jan 2012 16:25:12 GMT
https://trac.sagemath.org/ticket/715#comment:83
https://trac.sagemath.org/ticket/715#comment:83
<p>
Replying to <a class="ticket" href="https://trac.sagemath.org/ticket/715#comment:82" title="Comment 82">SimonKing</a>:
</p>
<blockquote class="citation">
<p>
An action of G on S stores direct references to G and to S. The action is a functor, and as a functor, it additionally stores a reference to <code>Groupoid(G)</code>, which stores another reference to G, and to the category of S. In some cases, the category of S will store references to the base ring of S (for example, if S is an algebra), which might have a pointer back to S (for example if the action of <code>S.base_ring()</code> on S was registered during initialisation). In this case, we are lost, since categories are unique parents and thus strongly cached (unless we apply <a class="closed ticket" href="https://trac.sagemath.org/ticket/12215" title="defect: Memleak in UniqueRepresentation, @cached_method (closed: fixed)">#12215</a>, which poses some problems). For the same reason, creating the groupoid of G will result in an eternal reference on G (<code>Groupoid(G)</code> is strongly cached and it points to G). So, the best that we can hope for is that we can free S at some point, but we will never be able to free G. It starts to be complicated. Time to call it a day... Perhaps the idea to register actions in the parents (in addition to a weak cache in the coercion model) is better?
</p>
</blockquote>
<p>
But if you store the actions in both parents (with strong references), you will never be able to free either the domain or the codomain.
</p>
<p>
In the ticket example, for instance, you would get a strong reference to the action in the cache of ZZ (which will hopefully never get deleted). In fact, that is what already happens with the current Sage version; isn't that strange given what you posted, since I guess ZZ is already initialized once the for loop is executed? So the elliptic curves will stay strongly referenced forever as well. (In the ticket example you only get one curve stored in that cache, because comparison is made with "==", but if you let the j-invariant change within the for loop, you would get a growing number of curves in that cache.)
</p>
TicketjpfloriTue, 03 Jan 2012 16:56:12 GMT
https://trac.sagemath.org/ticket/715#comment:84
https://trac.sagemath.org/ticket/715#comment:84
<p>
My timings
</p>
<ul><li>with everything up to <a class="ext-link" href="http://trac.sagemath.org/sage_trac/search/opensearch?q=ticket%3A11521"><span class="icon"></span>#11521</a>: about 3650 sec (errors as above)<br />
</li></ul><ul><li>with everything up to <a class="ext-link" href="http://trac.sagemath.org/sage_trac/search/opensearch?q=ticket%3A715"><span class="icon"></span>#715</a>: about 3350 sec (only errors in coerce.pyx)<br />
</li></ul><ul><li>with everything up to <a class="ext-link" href="http://trac.sagemath.org/sage_trac/search/opensearch?q=ticket%3A11115"><span class="icon"></span>#11115</a>: about 3350 sec as well (no errors)
</li></ul><p>
So, unless I did something wrong, I do not get any significant slowdown...
</p>
TicketjpfloriTue, 03 Jan 2012 18:35:39 GMT
https://trac.sagemath.org/ticket/715#comment:85
https://trac.sagemath.org/ticket/715#comment:85
<p>
I got about 3350 sec on top of vanilla sage-4.8.alpha5.
</p>
TicketSimonKingTue, 03 Jan 2012 19:47:15 GMT
https://trac.sagemath.org/ticket/715#comment:86
https://trac.sagemath.org/ticket/715#comment:86
<p>
Hi Jean-Pierre,
</p>
<p>
Replying to <a class="ticket" href="https://trac.sagemath.org/ticket/715#comment:83" title="Comment 83">jpflori</a>:
</p>
<blockquote class="citation">
<p>
But if you store the actions in both parents (with strong references), you will never be able to free either the domain or the codomain.
</p>
</blockquote>
<p>
This is not necessarily the case. You would merely get circular references, and they would not obstruct garbage collection, unless one item in the cycle has a <code>__del__</code> method.
</p>
<p>
One problem, however, is that many actions start with <code>ZZ</code>. And if <code>ZZ</code> is contained in the cycle, then the cycle cannot be collected, since <code>ZZ</code> will live forever -- but you know that.
</p>
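<p>
A small plain-Python illustration, assuming CPython's cyclic garbage collector: a pure reference cycle is freed by <code>gc.collect()</code>, but a cycle that passes through an object referenced from module scope (here the hypothetical <code>IMMORTAL</code>, playing the role of <code>ZZ</code>) stays reachable and is kept. The <code>__del__</code> restriction mentioned above applies to Python 2 and to Python 3 before 3.4; PEP 442 lifted it.
</p>
<pre class="wiki">import gc
import weakref


class Node:
    """Minimal object that can take part in a reference cycle."""


def cycle_is_collected():
    a, b = Node(), Node()
    a.other, b.other = b, a  # a -> b -> a: a pure reference cycle
    a_ref = weakref.ref(a)
    del a, b
    gc.collect()  # the cycle detector frees unreachable cycles
    return a_ref() is None


IMMORTAL = Node()  # plays the role of ZZ: referenced from module scope


def cycle_through_immortal_is_kept():
    s = Node()
    s.action = IMMORTAL  # S -> ZZ
    IMMORTAL.cached = s  # ZZ -> S (e.g. an action cached in ZZ)
    s_ref = weakref.ref(s)
    del s
    gc.collect()
    return s_ref() is not None  # still reachable from a module-level name
</pre>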
<blockquote class="citation">
<p>
In the ticket example for example you would get a strong reference to the action in the ZZ cache (which will hopefully never get deleted) (in fact that is what is happening with the current Sage version anyway, isn't that strange according to what you posted, because I guess ZZ is already initialized once the for loop is executed?)
</p>
</blockquote>
<p>
Yes. But are we really sure that the actions are stored in <code>ZZ</code>?
</p>
<p>
Anyway, they are cached by equality (<code>==</code>), so only one copy remains alive.
</p>
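<p>
The effect of caching by equality can be sketched in plain Python with a hypothetical <code>Curve</code> class that compares by j-invariant: a strong dictionary keyed by <code>==</code> keeps one entry (the first curve) when the j-invariant is fixed, but one entry per curve when it varies.
</p>
<pre class="wiki">class Curve:
    """Hypothetical stand-in for an elliptic curve, compared by j-invariant."""

    def __init__(self, j):
        self.j = j

    def __eq__(self, other):
        return isinstance(other, Curve) and self.j == other.j

    def __hash__(self):
        return hash(self.j)


def cache_size(j_values):
    cache = {}  # strong cache whose keys are compared with ==
    for j in j_values:
        E = Curve(j)  # a new, distinct object on every iteration
        cache.setdefault(E, "action")  # equal keys collapse onto the first one
    return len(cache)
</pre>
<p>
For example, <code>cache_size([5] * 100)</code> returns 1, whereas <code>cache_size(range(100))</code> returns 100.
</p>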
<blockquote class="citation">
<p>
so the elliptic curves (in the ticket example you only get one stored in that cache because comparison was made with "==", if you let the j invariant change within the for loop you would get a growing number of curves in that cache) will stay strongly referenced forever as well...
</p>
</blockquote>
<p>
Yes. And that is a problem that, again, might be solved using weak references.
</p>
<p>
Namely:
</p>
<p>
Consider an action A of G on S. Typically, G is immortal (like <code>ZZ</code>), but we are willing to let A and S die if we do not have any "external" strong reference to S. In particular, the existence of A should not be enough to keep S alive.
</p>
<p>
I think this can be accomplished as follows:
</p>
<ul><li>For quick access and for backwards compatibility, we want actions to remain stored in the coercion model. We use weak references to the keys (G,S), but a strong reference to the action (this is what the previous version of <a class="attachment" href="https://trac.sagemath.org/attachment/ticket/715/trac715_two_tripledicts.patch" title="Attachment 'trac715_two_tripledicts.patch' in Ticket #715">trac715_two_tripledicts.patch</a><a class="trac-rawlink" href="https://trac.sagemath.org/raw-attachment/ticket/715/trac715_two_tripledicts.patch" title="Download"></a> did).
</li><li>In addition to that, A should only have a weak reference to S; I think it doesn't matter whether the reference from A to G is strong or weak.
</li></ul><p>
Let us analyse what happens with G, S and A:
</p>
<ol><li>G will remain alive forever, even without an external reference. Namely, the coercion cache has a strong reference to A; as a functor, A points to <code>Groupoid(G)</code>; <code>Groupoid(G)</code> is strongly cached (unless we use weak caching for <code>UniqueRepresentation</code>) and must have a reference to G. If we decide to use weak caching for <code>UniqueRepresentation</code>, then we would only have a strong reference from G to A and a weak or strong reference from A to G. That would be fine for garbage collection. Anyway, I think keeping G alive will not hurt.
</li><li>Assume that there is no external reference to S. There is a weak reference to S from the cache in the coercion model, namely as key of the cache. Moreover, there is another <em>weak</em> reference from A to S. Hence, S could be garbage collected.
</li><li>Assume that there is no external reference to A. If S is garbage collected (see the previous point), then it will remove itself from the coercion cache, and thus the strong reference to A would vanish - it could be collected. But if S is alive, then A will remain alive as well.
</li></ol><p>
However, this is how the experimental patch is supposed to work, and it does <em>not</em> fix the leak. Perhaps this is, again, due to caching the homsets? So, we would need the patch from <a class="closed ticket" href="https://trac.sagemath.org/ticket/12215" title="defect: Memleak in UniqueRepresentation, @cached_method (closed: fixed)">#12215</a> as well. Difficult topic.
</p>
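<p>
The design sketched in the two bullet points can be modelled in plain Python under simplifying assumptions: a <code>weakref.WeakKeyDictionary</code> keyed by S stands in for the coercion cache (G is assumed immortal, like <code>ZZ</code>, so it is not held weakly), and the hypothetical <code>Action</code> class keeps a strong reference to G but only a weak reference to S. This illustrates points 2 and 3 of the analysis; it is not the code of the experimental patch.
</p>
<pre class="wiki">import gc
import weakref


class Parent:
    """Hypothetical stand-in for a parent (hashed by identity by default)."""


class Action:
    """Hypothetical action of G on S: strong reference to G, weak one to S."""

    def __init__(self, G, S):
        self.G = G
        self._S = weakref.ref(S)

    def underlying_set(self):
        S = self._S()
        if S is None:
            raise RuntimeError("the underlying set has been garbage collected")
        return S


# Stand-in for the coercion cache: weak reference to the key S,
# strong reference to the action (G is assumed immortal, like ZZ).
action_cache = weakref.WeakKeyDictionary()


def analyse():
    G, S = Parent(), Parent()
    action_cache[S] = {id(G): Action(G, S)}
    action_ref = weakref.ref(action_cache[S][id(G)])
    S_ref = weakref.ref(S)

    alive_with_S = action_ref() is not None  # point 3: A lives while S lives
    acts_on_S = action_ref().underlying_set() is S
    del S  # no external reference to S is left
    gc.collect()
    return (alive_with_S, acts_on_S,
            S_ref() is None,       # point 2: S was collected
            action_ref() is None,  # ... and A went with it
            len(action_cache))
</pre>
<p>
Calling <code>analyse()</code> returns <code>(True, True, True, True, 0)</code>: the action survives while S is alive, and both disappear together once the last external reference to S is dropped.
</p>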
TicketSimonKingTue, 03 Jan 2012 19:49:09 GMT
https://trac.sagemath.org/ticket/715#comment:87
https://trac.sagemath.org/ticket/715#comment:87
<p>
Sorry, I meant "we would need the patch from <a class="closed ticket" href="https://trac.sagemath.org/ticket/11521" title="defect: Use weak references to cache homsets (closed: fixed)">#11521</a> as well".
</p>
TicketSimonKingTue, 03 Jan 2012 22:42:28 GMTdescription changed
https://trac.sagemath.org/ticket/715#comment:88
https://trac.sagemath.org/ticket/715#comment:88
<ul>
<li><strong>description</strong>
modified (<a href="/ticket/715?action=diff&version=88">diff</a>)
</li>
</ul>
<p>
I have attached another patch, which implements the ideas sketched above. I think it corresponds to what you suggested ("use a weak reference from the action to the domain").
</p>
<p>
One detail: We have to distinguish between the underlying set, the domain and the codomain of an action. In fact, the new patch only uses a weak reference to the underlying set, and introduces a cdef function (hence, hopefully with little overhead) returning it.
</p>
<p>
I tested with sage-5.0.prealpha0 plus trac11780_unique_auxiliar_polyring.patch (probably not needed) plus <a class="attachment" href="https://trac.sagemath.org/attachment/ticket/715/trac715_two_tripledicts.patch" title="Attachment 'trac715_two_tripledicts.patch' in Ticket #715">trac715_two_tripledicts.patch</a><a class="trac-rawlink" href="https://trac.sagemath.org/raw-attachment/ticket/715/trac715_two_tripledicts.patch" title="Download"></a> plus <a class="attachment" href="https://trac.sagemath.org/attachment/ticket/715/trac715_weak_action.patch" title="Attachment 'trac715_weak_action.patch' in Ticket #715">trac715_weak_action.patch</a><a class="trac-rawlink" href="https://trac.sagemath.org/raw-attachment/ticket/715/trac715_weak_action.patch" title="Download"></a>.
</p>
<p>
At least <code>sage -t sage/modular/modsym/space.py</code> passes, but I need to run the whole test suite.
</p>
<p>
The example from the ticket description does not leak. However, if the j-invariant varies, it seems that for each elliptic curve one copy is preserved:
</p>
<pre class="wiki">sage: K = GF(1<<55,'t')
sage: for i in range(500):
....: a = K.random_element()
....: E = EllipticCurve(j=a)
....: P = E.random_point()
....: Q = 2*P
....:
sage: import gc
sage: gc.collect()
2124
sage: from sage.schemes.generic.homset import SchemeHomsetModule_abelian_variety_coordinates_field
sage: LE = [x for x in gc.get_objects() if isinstance(x,SchemeHomsetModule_abelian_variety_coordinates_field)]
sage: len(LE)
500
</pre><p>
In any case, the original leak is fixed with the two patches. The question is whether the second patch suffices to keep actions alive, whether it avoids a regression, and whether all tests pass.
</p>
<p>
If everything is alright, we may still try to find out where the remaining strong reference to an elliptic curve comes from.
</p>
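<p>
One way to hunt for the remaining strong reference is to combine a <code>gc.get_objects()</code> filter like the one in the session above with <code>gc.get_referrers()</code>. A minimal plain-Python sketch, with a hypothetical <code>Homset</code> class and a hypothetical <code>_hidden_cache</code> standing in for whatever still holds the survivor:
</p>
<pre class="wiki">import gc


def live_instances(cls):
    """Objects tracked by the garbage collector whose type is cls or a subclass."""
    gc.collect()
    return [obj for obj in gc.get_objects() if issubclass(type(obj), cls)]


def referrer_types(obj):
    """Sorted type names of the objects that directly refer to obj."""
    return sorted({type(r).__name__ for r in gc.get_referrers(obj)})


class Homset:
    """Hypothetical stand-in for the leaked homset class."""


_hidden_cache = {}  # hypothetical cache that still holds one strong reference


def run_loop(n=500):
    for i in range(n):
        H = Homset()
        if i == 0:
            _hidden_cache["first"] = H  # only the first object is retained
    H = None
</pre>
<p>
After <code>run_loop()</code>, <code>live_instances(Homset)</code> contains a single object, and <code>referrer_types</code> on it lists <code>'dict'</code> among the referrers, pointing at the cache that keeps it alive.
</p>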
TicketSimonKingTue, 03 Jan 2012 22:48:33 GMT
https://trac.sagemath.org/ticket/715#comment:89
https://trac.sagemath.org/ticket/715#comment:89
<p>
PS: The additional application of <a class="closed ticket" href="https://trac.sagemath.org/ticket/11521" title="defect: Use weak references to cache homsets (closed: fixed)">#11521</a> does not suffice to avoid the remaining strong reference to an elliptic curve.
</p>
TicketSimonKingWed, 04 Jan 2012 06:09:03 GMTstatus changed
https://trac.sagemath.org/ticket/715#comment:90
https://trac.sagemath.org/ticket/715#comment:90
<ul>
<li><strong>status</strong>
changed from <em>needs_work</em> to <em>needs_info</em>
</li>
</ul>
<p>
With the two patches applied, I get some doctest errors that seem trivial to fix, and the tests take 10905 seconds in total. Now, I am not sure: originally, the tests took much less time with unpatched Sage.
</p>
<p>
But perhaps my computer was in a different state at that time? Jean-Pierre, if I understood correctly, you did not find any significant slowdown, right?
</p>
<p>
The first (i.e., the "official") patch is enough to fix the leak for the original example. According to Jean-Pierre, the timings are fine: it does not matter whether we have no patch, the official patch only, or the first experimental patch. And according to my own test, it does not matter whether we have the first or the second experimental patch.
</p>
<p>
So, the further proceeding depends on the following questions:
</p>
<ul><li>The experimental patches provide two different approaches to fix a potential problem, namely actions being deallocated when they are still needed. However, is this potential problem a <em>real</em> problem? Only then would it make sense to consider the experimental patches!
</li></ul><ul><li>Do we also want to fix the leak in the more difficult example, namely when the j-invariant varies? In this case, we need to find out why the actions are registered in ZZ. It is not clear yet whether one really needs one of the experimental patches to get rid of it.
</li></ul><p>
What is your answer to the questions?
</p>
TicketSimonKingWed, 04 Jan 2012 09:20:28 GMT
https://trac.sagemath.org/ticket/715#comment:91
https://trac.sagemath.org/ticket/715#comment:91
<p>
There are two occasions on which something is written into <code>Parent._action_hash</code>: during initialisation, via register_action, and whenever get_action finds a new action.
</p>
<p>
Perhaps we should distinguish the two cases: the actions that are stored during initialisation should probably be "immortal", but the actions that are stored on the fly should only be weakly cached.
</p>
<p>
I think this can be solved by changing <code>Parent._action_hash</code> into a dictionary that uses weak references to both the keys and the values. There is one difference between register_action and get_action: register_action additionally stores the actions in a list, but get_action doesn't. Hence, indeed the actions registered during initialisation will survive, but the stuff stored by get_action could become collectable.
</p>
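<p>
A plain-Python sketch of this idea. The <code>Parent</code> and <code>Action</code> classes are hypothetical, and for brevity only the values are held weakly: actions passed to <code>register_action</code> are also kept in a list and survive, while actions cached by <code>get_action</code> vanish once nothing else references them.
</p>
<pre class="wiki">import weakref


class Action:
    """Hypothetical stand-in for an action object."""

    def __init__(self, name):
        self.name = name


class Parent:
    """Hypothetical sketch of the idea above, not Sage's Parent class."""

    def __init__(self):
        self._action_hash = weakref.WeakValueDictionary()  # weak values
        self._action_list = []  # strong references to registered actions

    def register_action(self, key, action):
        # Used during initialisation: the list keeps the action alive.
        self._action_list.append(action)
        self._action_hash[key] = action

    def get_action(self, key, discover):
        action = self._action_hash.get(key)
        if action is None:
            action = discover()  # e.g. found by the coercion model
            self._action_hash[key] = action  # cached weakly only
        return action
</pre>
<p>
If <code>"init"</code> is registered and <code>"fly"</code> is only looked up via <code>get_action</code> (and the returned action is discarded), then under CPython only <code>"init"</code> remains in <code>_action_hash</code>.
</p>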
TicketSimonKingWed, 04 Jan 2012 10:10:59 GMTstatus changed; work_issues deleted
https://trac.sagemath.org/ticket/715#comment:92
https://trac.sagemath.org/ticket/715#comment:92
<ul>
<li><strong>status</strong>
changed from <em>needs_info</em> to <em>needs_review</em>
</li>
<li><strong>work_issues</strong>
<em>avoid regression</em> deleted
</li>
</ul>
<p>
Yesss!! It suffices (in addition to what I did before) to use a <code>TripleDictById</code> (which uses weak references to the keys, but strong references to the values) for <code>Parent._action_hash</code>!!!
</p>
<p>
The leak is now completely gone:
</p>
<pre class="wiki">sage: K = GF(1<<55,'t')
sage: for i in range(50):
....: a = K.random_element()
....: E = EllipticCurve(j=a)
....: P = E.random_point()
....: Q = 2*P
....:
sage: from sage.schemes.generic.homset import SchemeHomsetModule_abelian_variety_coordinates_field
sage: import gc
sage: gc.collect()
882
sage: LE = [x for x in gc.get_objects() if isinstance(x,SchemeHomsetModule_abelian_variety_coordinates_field)]
sage: len(LE)
1
</pre><p>
I need to add a test (or better: modify the test introduced by the first patch), demonstrating that the "big" leak is fixed, and I need to add tests for the new code I wrote, and of course I need to run ptestlong.
</p>
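<p>
For reviewers, the behaviour described for <code>TripleDictById</code> (weak references to the three keys, compared by identity, and strong references to the values) can be approximated in pure Python. The class below is a hypothetical sketch, not Sage's Cython implementation; keys must support weak references, and the leak check assumes the cached value does not point back to E strongly (which is why the action should hold only a weak reference to its underlying set).
</p>
<pre class="wiki">import gc
import weakref


class TripleDictByIdSketch:
    """Hypothetical approximation of a TripleDictById: keys (k1, k2, k3) are
    compared by identity and held weakly; values are held strongly."""

    def __init__(self):
        self._values = {}  # (id(k1), id(k2), id(k3)) -> value (strong)
        self._guards = {}  # same key -> weak references to k1, k2, k3

    def __setitem__(self, keys, value):
        key = tuple(id(k) for k in keys)

        def forget(_ref, key=key):
            # One of the key objects died: drop the entry and its value.
            self._values.pop(key, None)
            self._guards.pop(key, None)

        self._guards[key] = tuple(weakref.ref(k, forget) for k in keys)
        self._values[key] = value

    def __getitem__(self, keys):
        return self._values[tuple(id(k) for k in keys)]

    def __len__(self):
        return len(self._values)


class Parent:
    """Hypothetical stand-in for a parent (ZZ, an elliptic curve, ...)."""


class Token:
    """Hypothetical stand-in for the operator key (e.g. multiplication)."""


ZZ, MUL = Parent(), Token()  # long-lived keys


def leaked_entries(n=50):
    cache = TripleDictByIdSketch()
    for _ in range(n):
        E = Parent()  # a fresh parent on every iteration
        cache[ZZ, E, MUL] = "IntegerMulAction"  # value does not point back to E
    E = None
    gc.collect()
    return len(cache)
</pre>
<p>
With this sketch, <code>leaked_entries()</code> returns 0: each entry is removed as soon as its E is freed, while entries whose keys are all alive stay available.
</p>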
<p>
Nevertheless, I think you can start reviewing. And please store the doc test times, so that we can detect any regression.
</p>
<p>
Apply trac715_two_tripledicts.patch trac715_weak_action.patch
</p>
TicketjpfloriWed, 04 Jan 2012 10:12:46 GMT
https://trac.sagemath.org/ticket/715#comment:93
https://trac.sagemath.org/ticket/715#comment:93
<p>
Good!
</p>
<p>
I've just built the latest sage prealpha and am quite busy today, but I'll at least run ptestlong with and without patches to get timings in the afternoon.
</p>
TicketSimonKingWed, 04 Jan 2012 10:20:21 GMT
https://trac.sagemath.org/ticket/715#comment:94
https://trac.sagemath.org/ticket/715#comment:94
<p>
Replying to <a class="ticket" href="https://trac.sagemath.org/ticket/715#comment:93" title="Comment 93">jpflori</a>:
</p>
<blockquote class="citation">
<p>
I've just built the latest sage prealpha and am quite busy today, but I'll at least run ptestlong with and without patches to get timings in the afternoon.
</p>
</blockquote>
<p>
Good! I am sure that there will be errors (for example, the current test against the leak expects <code>len(LE)==2</code>), but the timings will certainly be interesting. And of course, it would be interesting to see whether an action can die prematurely.
</p>
TicketSimonKingWed, 04 Jan 2012 12:23:44 GMT
https://trac.sagemath.org/ticket/715#comment:95
https://trac.sagemath.org/ticket/715#comment:95
<p>
I am sure that there is a regression compared with vanilla sage-5.0.prealpha0. For example, when I originally ran sage -ptestlong, sage/schemes/hyperelliptic_curves/hyperelliptic_padic_field.py took 57 seconds, but with the patches it takes 160 seconds.
</p>
TicketSimonKingWed, 04 Jan 2012 13:54:19 GMT
https://trac.sagemath.org/ticket/715#comment:96
https://trac.sagemath.org/ticket/715#comment:96
<p>
Meanwhile I took some timings on a different machine. Based on the experience that the schemes code tends to slow down a lot when one does fancy stuff (see <a class="closed ticket" href="https://trac.sagemath.org/ticket/11900" title="defect: Serious regression caused by #9138 (closed: fixed)">#11900</a> and <a class="closed ticket" href="https://trac.sagemath.org/ticket/11935" title="enhancement: Make parent/element classes independent of base rings (closed: fixed)">#11935</a>), I use "sage/schemes" as a test bed.
</p>
<p>
I find (based on sage-4.8.alpha3):
</p>
<p>
With patch
</p>
<pre class="wiki">king@mpc622:/mnt/local/king/SAGE/rebase/sage-4.8.alpha3/devel/sage$ hg qapplied
trac_12149.3.patch
9138_flat.patch
trac11900_category_speedup_combined.patch
11115_flat.patch
trac_11115_docfix.patch
trac715_two_tripledicts.patch
trac715_weak_action.patch
king@mpc622:/mnt/local/king/SAGE/rebase/sage-4.8.alpha3/devel/sage$ ../../sage -t sage/schemes/
...
----------------------------------------------------------------------
All tests passed!
Total time for all tests: 625.1 seconds
</pre><p>
Here are the five worst:
</p>
<pre class="wiki">sage -t "devel/sage-main/sage/schemes/elliptic_curves/ell_rational_field.py"
[58.1 s]
sage -t "devel/sage-main/sage/schemes/elliptic_curves/heegner.py"
[51.1 s]
sage -t "devel/sage-main/sage/schemes/elliptic_curves/ell_number_field.py"
[35.2 s]
sage -t "devel/sage-main/sage/schemes/elliptic_curves/padic_lseries.py"
[26.9 s]
sage -t "devel/sage-main/sage/schemes/elliptic_curves/sha_tate.py"
[25.7 s]
</pre><p>
Now, the same without the two patches from here:
</p>
<pre class="wiki">king@mpc622:/mnt/local/king/SAGE/rebase/sage-4.8.alpha3/devel/sage$ hg qapplied
trac_12149.3.patch
9138_flat.patch
trac11900_category_speedup_combined.patch
11115_flat.patch
trac_11115_docfix.patch
king@mpc622:/mnt/local/king/SAGE/rebase/sage-4.8.alpha3/devel/sage$ ../../sage -t sage/schemes/
...
----------------------------------------------------------------------
All tests passed!
Total time for all tests: 597.0 seconds
</pre><p>
And the five worst, comparing with the times from above, are:
</p>
<pre class="wiki">sage -t "devel/sage-main/sage/schemes/elliptic_curves/ell_rational_field.py"
[55.4 s] (was: [58.1 s])
sage -t "devel/sage-main/sage/schemes/elliptic_curves/heegner.py"
[47.2 s] (was: [51.1 s])
sage -t "devel/sage-main/sage/schemes/elliptic_curves/ell_number_field.py"
[34.1 s] (was: [35.2 s])
sage -t "devel/sage-main/sage/schemes/elliptic_curves/padic_lseries.py"
[26.1 s] (was: [26.9 s])
sage -t "devel/sage-main/sage/schemes/elliptic_curves/sha_tate.py"
[24.9 s] (was: [25.7 s])
</pre><p>
Hence, relative to the unpatched baseline, we have an overall slow-down of <code>(625.1 - 597.0)/597.0 ≈ 4.7%</code>; the slow-down seems to be systematic (you will hardly find an example that became faster), and in some cases it reaches 10%.
</p>
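<p>
The relative slowdown with respect to the unpatched baseline can be checked quickly in Python, using the timings above (the file names are shortened for readability):
</p>
<pre class="wiki"># Total time for sage/schemes, in seconds (from the runs above).
with_patches, without_patches = 625.1, 597.0


def relative_slowdown(new, old):
    """Slowdown of `new` relative to the baseline `old`, as a fraction."""
    return (new - old) / old


per_file = {  # (with patches, without patches), in seconds
    "ell_rational_field.py": (58.1, 55.4),
    "heegner.py": (51.1, 47.2),
    "ell_number_field.py": (35.2, 34.1),
    "padic_lseries.py": (26.9, 26.1),
    "sha_tate.py": (25.7, 24.9),
}

total = relative_slowdown(with_patches, without_patches)
worst = max(per_file, key=lambda f: relative_slowdown(*per_file[f]))
</pre>
<p>
This gives about 4.7% overall, and between about 3% and 8% for the five files listed, the largest being heegner.py.
</p>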
<p>
I expected it to be worse (after all, coercion affects everything). But still, the question is: Can the slow-down be avoided?
</p>
TicketjpfloriWed, 04 Jan 2012 14:05:37 GMT
https://trac.sagemath.org/ticket/715#comment:97
https://trac.sagemath.org/ticket/715#comment:97
<p>
So here are my global timings for ptestlong on sage.5.0.prealpha0:
</p>
<ul><li>3092.9 seconds with no errors on vanilla
</li><li>3097.8 seconds with 3 errors in sage.matrix.action.pyx and 1 in sage.structure.coerce_dict.pyx on vanilla + the two patches from the description of the current ticket (<a class="ext-link" href="http://trac.sagemath.org/sage_trac/ticket/715"><span class="icon"></span>#715</a>).
</li></ul><p>
That's kind of strange. Maybe the slowdown is absorbed by the fact that the tests are running in parallel?
</p>
<p>
I'll just test sage.schemes with a single core and report.
</p>
TicketSimonKingWed, 04 Jan 2012 14:19:29 GMT
https://trac.sagemath.org/ticket/715#comment:98
https://trac.sagemath.org/ticket/715#comment:98
<p>
Replying to <a class="ticket" href="https://trac.sagemath.org/ticket/715#comment:97" title="Comment 97">jpflori</a>:
</p>
<blockquote class="citation">
<ul><li>3092.9 seconds with no errors on vanilla
</li><li>3097.8 seconds with 3 errors in sage.matrix.action.pyx and 1 in sage.structure.coerce_dict.pyx on vanilla + the two patches from the description of the current ticket (<a class="ext-link" href="http://trac.sagemath.org/sage_trac/ticket/715"><span class="icon"></span>#715</a>).
</li></ul></blockquote>
<p>
Cool!
</p>
<p>
Note that the tests in sage/schemes only take 597.5 seconds when I apply <a class="closed ticket" href="https://trac.sagemath.org/ticket/11943" title="enhancement: The category graph should comply with Python's method resolution order (closed: fixed)">#11943</a> and <a class="closed ticket" href="https://trac.sagemath.org/ticket/11935" title="enhancement: Make parent/element classes independent of base rings (closed: fixed)">#11935</a> on top. Hence, if there really is a slow-down then it can be countered.
</p>
<p>
One detail about trac: If you want to link to a ticket, you can just provide the number after the "hash symbol", hence, <code>#715</code> and not <code>[http://.../715 #715]</code>.
</p>
<blockquote class="citation">
<p>
That's kind of strange. Maybe the slowdown is absorbed by the fact that the tests are running in parallel?
</p>
</blockquote>
<p>
I don't know the typical standard deviation of the timings.
</p>
TicketSimonKingWed, 04 Jan 2012 14:24:22 GMT
https://trac.sagemath.org/ticket/715#comment:99
https://trac.sagemath.org/ticket/715#comment:99
<p>
Interestingly, I get slightly different errors:
</p>
<pre class="wiki"> sage -t --long -force_lib devel/sage/sage/structure/coerce_dict.pyx # 1 doctests failed
sage -t --long -force_lib devel/sage/sage/structure/parent.pyx # 1 doctests failed
sage -t --long -force_lib devel/sage/sage/matrix/action.pyx # 5 doctests failed
</pre><p>
Anyway, this should be easy to fix...
</p>
TicketSimonKingWed, 04 Jan 2012 14:25:18 GMT
https://trac.sagemath.org/ticket/715#comment:100
https://trac.sagemath.org/ticket/715#comment:100
<p>
... and it took me 12187.2 seconds.
</p>
TicketSimonKingWed, 04 Jan 2012 15:07:31 GMT
https://trac.sagemath.org/ticket/715#comment:101
https://trac.sagemath.org/ticket/715#comment:101
<p>
The second patch is updated, more examples are added (in particular, it is demonstrated that the memory leak is fixed even when the j-invariant varies), and the errors that I had with the previous version are gone.
</p>
<p>
Hence, it can now be reviewed. Please try to find regressions!
</p>
<p>
Apply trac715_two_tripledicts.patch trac715_weak_action.patch
</p>
TicketjpfloriWed, 04 Jan 2012 15:24:26 GMT
https://trac.sagemath.org/ticket/715#comment:102
https://trac.sagemath.org/ticket/715#comment:102
<p>
Testing sage.schemes with only one core gave me:
</p>
<ul><li>1526.0 sec on vanilla
</li></ul><ul><li>1538.8 sec on vanilla + <a class="ext-link" href="http://trac.sagemath.org/sage_trac/search/opensearch?q=ticket%3A715"><span class="icon"></span>#715</a>
</li></ul><p>
In my opinion this is more than acceptable (if it reflects anything at all; it might just be random noise).
</p>
<p>
Running the tests of sage.schemes.elliptic_curves.padic_lseries five times gave me:
</p>
<ul><li>51.0, 48.7, 47.0, 47.0, 47.1 on vanilla
</li></ul><ul><li>49.0, 47.2, 48.4, 47.4, 47.7 on vanilla + <a class="ext-link" href="http://trac.sagemath.org/sage_trac/search/opensearch?q=ticket%3A715"><span class="icon"></span>#715</a>
</li></ul><p>
Still, it is surprising that I don't find any slow-down as you did, but it might also be good news :)
</p>
<p>
My next step is to check for the memory leaks (same j-invariant, different j-invariants, and Paul's finite field example in <a class="ext-link" href="http://trac.sagemath.org/sage_trac/search/opensearch?q=ticket%3A11521"><span class="icon"></span>#11521</a>; or does that last one need a patch for the <a class="ext-link" href="http://trac.sagemath.org/sage_trac/search/opensearch?q=wiki%3AHomSet"><span class="icon"></span>HomSet</a> cache? If so, that won't prevent this ticket from being closed, but it should be treated in <a class="ext-link" href="http://trac.sagemath.org/sage_trac/search/opensearch?q=ticket%3A11521"><span class="icon"></span>#11521</a>; otherwise <a class="ext-link" href="http://trac.sagemath.org/sage_trac/search/opensearch?q=ticket%3A11521"><span class="icon"></span>#11521</a> can be closed as a duplicate) and to check that actions do not get continuously deleted.
</p>
<p>
Afterward, I'll properly review your code and examples (that I've already seen many times obviously :)).
</p>
TicketSimonKingWed, 04 Jan 2012 16:06:29 GMT
https://trac.sagemath.org/ticket/715#comment:103
https://trac.sagemath.org/ticket/715#comment:103
<p>
Replying to <a class="ticket" href="https://trac.sagemath.org/ticket/715#comment:102" title="Comment 102">jpflori</a>:
</p>
<blockquote class="citation">
<p>
Testing sage.schemes with only one core gave me:
</p>
<ul><li>1526.0 sec on vanilla
</li></ul><ul><li>1538.8 sec on vanilla + <a class="ext-link" href="http://trac.sagemath.org/sage_trac/search/opensearch?q=ticket%3A715"><span class="icon"></span>#715</a>
</li></ul></blockquote>
<p>
That's very good news indeed!
</p>
<blockquote class="citation">
<p>
Running the tests of sage.schemes.elliptic_curves.padic_lseries five times gave me:
</p>
<ul><li>51.0, 48.7, 47.0, 47.0, 47.1 on vanilla
</li></ul><ul><li>49.0, 47.2, 48.4, 47.4, 47.7 on vanilla + <a class="ext-link" href="http://trac.sagemath.org/sage_trac/search/opensearch?q=ticket%3A715"><span class="icon"></span>#715</a>
</li></ul></blockquote>
<p>
That looks like quite some randomness.
</p>
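<p>
A quick way to quantify that randomness, using Python's <code>statistics</code> module on the ten padic_lseries timings above:
</p>
<pre class="wiki">import statistics

# Five timings (seconds) of the padic_lseries doctests, copied from above.
vanilla = [51.0, 48.7, 47.0, 47.0, 47.1]
patched = [49.0, 47.2, 48.4, 47.4, 47.7]

mean_diff = statistics.mean(patched) - statistics.mean(vanilla)
# Sample standard deviations (n - 1 in the denominator).
spread = max(statistics.stdev(vanilla), statistics.stdev(patched))
</pre>
<p>
The patched mean (47.94 s) is 0.22 s lower than the vanilla mean (48.16 s), while the vanilla runs have a sample standard deviation of about 1.7 s, so the difference is well within the noise.
</p>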
<blockquote class="citation">
<p>
My next step is to check for the memory leaks (same j-invariant, different j-invariants, and the finite field example of Paul in <a class="ext-link" href="http://trac.sagemath.org/sage_trac/search/opensearch?q=ticket%3A11521"><span class="icon"></span>#11521</a>; or does that last one need a patch for the <a class="ext-link" href="http://trac.sagemath.org/sage_trac/search/opensearch?q=wiki%3AHomSet"><span class="icon"></span>HomSet</a> cache?
</p>
</blockquote>
<p>
Yes, the finite field example is not solved:
</p>
<pre class="wiki">sage: for p in prime_range(10^5):
....:     K = GF(p)
....:     a = K(0)
....:
sage: import gc
sage: gc.collect()
0
</pre><p>
So, I am going to modify the ticket description of <a class="closed ticket" href="https://trac.sagemath.org/ticket/11521" title="defect: Use weak references to cache homsets (closed: fixed)">#11521</a>, indicating that the original elliptic curve example has been tackled here, but that there remains an orthogonal problem.
</p>
<blockquote class="citation">
<p>
Afterward, I'll properly review your code and examples (that I've already seen many times obviously :)).
</p>
</blockquote>
<p>
Not that many times: some examples are only in the very latest version of the second patch.
</p>
TicketjpfloriWed, 04 Jan 2012 16:55:43 GMT
https://trac.sagemath.org/ticket/715#comment:104
<p>
Please treat the absence of a slowdown that I reported above with caution.
</p>
<p>
Something must have gone wrong with my installation (sorry about that): I realized that the leak was not actually fixed.
</p>
<p>
I'll investigate all of this more carefully ASAP.
</p>
TicketSimonKingWed, 04 Jan 2012 20:03:40 GMT
https://trac.sagemath.org/ticket/715#comment:105
<p>
There is one thing related to regressions that I haven't done yet: <code>TripleDict</code> is cimported in sage/structure/coerce.pyx, so I could use its cdef'd methods "set" and "get". Instead, I'm using the usual Python <code>__getitem__</code> and <code>__setitem__</code>, so there is some overhead I could avoid. I will test it a bit later.
</p>
TicketSimonKingWed, 04 Jan 2012 20:18:15 GMTattachment set
https://trac.sagemath.org/ticket/715
<ul>
<li><strong>attachment</strong>
set to <em>trac715_weak_action.patch</em>
</li>
</ul>
<p>
Use weak references to the underlying set of an action. Use <code>TripleDictById</code> to store actions in parents. Disregard the orphan_functor patch!
</p>
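<p>
As an illustration of what weak references to the underlying set of an action buy, here is a minimal pure-Python sketch. The class <code>WeakSetAction</code> and its <code>underlying_set()</code> accessor are hypothetical names, not Sage's actual <code>Action</code> API; the point is only that caching such an object in a long-lived dictionary no longer keeps the set alive.
</p>

```python
import gc
import weakref


class WeakSetAction:
    """Hypothetical action G x S -> S that holds S only through a weak reference."""

    def __init__(self, G, S):
        self.G = G                   # acting parent (e.g. ZZ), held strongly
        self._S = weakref.ref(S)     # underlying set, held weakly

    def underlying_set(self):
        S = self._S()
        if S is None:
            raise RuntimeError("the underlying set has been garbage collected")
        return S


class Parent:
    """Stand-in for a parent such as the group of points of an elliptic curve."""


action_cache = {}                    # plays the role of a long-lived coercion cache
A = Parent()
action_cache["ZZ acting on A"] = WeakSetAction("ZZ", A)
probe = weakref.ref(A)
del A
gc.collect()
print(probe() is None)               # True: the cached action did not keep A alive
```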
TicketSimonKingWed, 04 Jan 2012 20:20:10 GMT
https://trac.sagemath.org/ticket/715#comment:106
<p>
I have updated the second patch, so please replace it with the new version. I am sorry that this came too late for "ptestlong", but perhaps the timings with the old patch version indicate what might be worth looking at with the new version.
</p>
<p>
Apply trac715_two_tripledicts.patch trac715_weak_action.patch
</p>
TicketSimonKingWed, 04 Jan 2012 21:26:38 GMT
https://trac.sagemath.org/ticket/715#comment:107
<p>
By the way, here is an example showing that a <code>TripleDictById</code> finds its items <em>faster</em> than a usual dict, even though it has the additional advantage of weak keys. Using Cython, one can save even more, which is what I did in the preceding change of the second patch.
</p>
<p>
I create a list of pairs of rings:
</p>
<pre class="wiki">sage: L = []
sage: for p in prime_range(10^3):
....:     K = GF(p)
....:     P = K['x','y']
....:     L.append((K,P))
....:
sage: len(L)
168
</pre><p>
I create a <code>TripleDictById</code> and a usual dictionary, and fill both with the same data:
</p>
<pre class="wiki">sage: from sage.structure.coerce_dict import TripleDictById
sage: D = TripleDictById(113)
sage: for i,(K,P) in enumerate(L):
....:     D[K,P,True] = i
....:
sage: E = {}
sage: for i,(K,P) in enumerate(L):
....:     E[K,P,True] = i
....:
sage: len(D)
168
sage: len(E)
168
</pre><p>
I create cython functions that know about the types. In the first, I use the Python way of accessing data from <code>TripleDictById</code>, in the second, I use the special cdefed <code>get()</code> method, and the third is for usual dictionaries.
</p>
<pre class="wiki">sage: cython("""
....: from sage.structure.coerce_dict cimport TripleDictById
....: def testD(TripleDictById D, list L):
....:     for K,P in L:
....:         n = D[K,P,True]
....: def testDget(TripleDictById D, list L):
....:     for K,P in L:
....:         n = D.get(K,P,True)
....: def testE(dict D, list L):
....:     for K,P in L:
....:         n = D[K,P,True]
....: """)
</pre><p>
Even though Cython is supposed to be quite good at optimising dictionary access (mind that <code>testE(...)</code> knows that it will receive a dictionary!), I was surprised by how much faster the <code>TripleDictById</code> is:
</p>
<pre class="wiki">sage: %timeit testD(D,L)
625 loops, best of 3: 67.8 µs per loop
sage: %timeit testDget(D,L)
625 loops, best of 3: 52.1 µs per loop
sage: %timeit testE(E,L)
125 loops, best of 3: 3.26 ms per loop
</pre><p>
Roughly 48 to 63 times faster (3.26 ms versus 67.8 µs and 52.1 µs)! So I think it was a good idea to use <code>TripleDictById</code> not only in the coercion model, but also as an attribute of Parent.
</p>
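<p>
The timings above do not say where the difference comes from. One factor that is certain in CPython (whether it dominates here is only an assumption) is that looking up a tuple key in a plain <code>dict</code> hashes a freshly built tuple on every lookup, which calls <code>__hash__</code> on every parent in it, whereas an identity-based table only needs the objects' addresses. The counting <code>Parent</code> class is a hypothetical stand-in, and <code>id()</code> plays the role of the address:
</p>

```python
class Parent:
    """Hypothetical parent whose __hash__ counts how often it is called."""
    hash_calls = 0

    def __hash__(self):
        Parent.hash_calls += 1
        return object.__hash__(self)


L = [(Parent(), Parent()) for _ in range(168)]
E = {(K, P, True): i for i, (K, P) in enumerate(L)}
by_id = {(id(K), id(P), True): i for i, (K, P) in enumerate(L)}

Parent.hash_calls = 0
for K, P in L:
    n = E[K, P, True]              # a new tuple each time: hashes K and P
dict_calls = Parent.hash_calls
print(dict_calls)                  # 336

Parent.hash_calls = 0
for K, P in L:
    n = by_id[id(K), id(P), True]  # only integer addresses are hashed
id_calls = Parent.hash_calls
print(id_calls)                    # 0
```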
TicketSimonKingThu, 05 Jan 2012 17:02:43 GMTstatus changed; work_issues set
https://trac.sagemath.org/ticket/715#comment:108
<ul>
<li><strong>status</strong>
changed from <em>needs_review</em> to <em>needs_work</em>
</li>
<li><strong>work_issues</strong>
set to <em>Avoid a regression</em>
</li>
</ul>
<p>
We have a big regression.
</p>
<p>
I considered the doctests of sage/modules/free_module.py and took each timing twice, in order to be on the safe side.
</p>
<p>
Vanilla 5.0.prealpha0
</p>
<pre class="wiki">sage -t "devel/sage-main/sage/modules/free_module.py"
[11.9 s]
sage -t "devel/sage-main/sage/modules/free_module.py"
[10.3 s]
</pre><p>
With the first patch from here:
</p>
<pre class="wiki">sage -t "devel/sage-main/sage/modules/free_module.py"
[24.1 s]
sage -t "devel/sage-main/sage/modules/free_module.py"
[25.7 s]
</pre><p>
With both patches from here:
</p>
<pre class="wiki">sage -t "devel/sage-main/sage/modules/free_module.py"
[26.0 s]
sage -t "devel/sage-main/sage/modules/free_module.py"
[25.8 s]
</pre><p>
I think such a huge regression can't be accepted. Thus, it is "needs work".
</p>
TicketjpfloriThu, 05 Jan 2012 17:14:15 GMTstatus, work_issues changed
https://trac.sagemath.org/ticket/715#comment:109
<ul>
<li><strong>status</strong>
changed from <em>needs_work</em> to <em>needs_info</em>
</li>
<li><strong>work_issues</strong>
changed from <em>Avoid a regression</em> to <em>Get some timings</em>
</li>
</ul>
<p>
Your timings look strange compared with what I get.
</p>
<p>
I was about to post the following, which I double-checked:
</p>
<p>
I can finally confirm I do not get any serious speed regression with the last couple of patches.
</p>
<p>
ptestlong takes between 3100 and 3250 seconds, both with vanilla 5.0.prealpha0 and with #715 applied.
</p>
<p>
Testing only sage.schemes on one core takes between 1550 and 1600 seconds, again in both cases.
</p>
<p>
And this time I can confirm that the leak in the test with random j-invariants is fixed with #715 applied and not fixed without it (I'm getting paranoid now), and the same holds for the test with a fixed j-invariant.
</p>
<p>
I'll review the code and example next.
</p>
<p>
So when I saw your post, I ran "sage -t devel/sage-main/sage/modules/free_module.py" with both my installations (vanilla and vanilla + #715) and repeatedly got about 13 seconds for both, perhaps with a slightly lower mean for vanilla (at most 0.5 seconds less).
</p>
<p>
I should mention that I also see quite a large variance. I'm not sure why, since my system is not heavily loaded; maybe it is because the disk is on NFS.
</p>
<p>
For example, I got between 12.3 and 20.0 seconds (the latter just once) for vanilla, and between 12.7 and 19 seconds with #715 (the slow runs happened at the same time, so maybe the network was loaded then).
</p>
<p>
For reference, I have a fairly recent multicore Xeon running an outdated 64-bit version of Ubuntu.
</p>
TicketjpfloriThu, 05 Jan 2012 17:15:49 GMT
https://trac.sagemath.org/ticket/715#comment:110
<p>
Groumpf, Trac did not like my blank lines.
</p>
<p>
So the original part I was about to post is everything in between "I can finally confirm..." and "example next."
</p>
TicketSimonKingFri, 06 Jan 2012 07:04:01 GMTstatus changed
https://trac.sagemath.org/ticket/715#comment:111
<ul>
<li><strong>status</strong>
changed from <em>needs_info</em> to <em>needs_review</em>
</li>
</ul>
<p>
It is always possible that there is a regression on some hardware, but not on all.
</p>
<p>
I made an extensive log, i.e., I logged all Python commands. It turns out that there are only small differences with and without the patch. Hence, I am sure that the regression does not come from excessive garbage collection (otherwise, I would have seen some objects being created repeatedly). So I guess the regression comes from the C level.
</p>
<p>
There is one thing that could make my code too slow: I use weak references in my version of <code>TripleDict</code> and <code>TripleDictById</code>, and when getting dictionary items, I call the weak reference in order to get the object it points to, and only then compare. That is slow.
</p>
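<p>
To make the cost concrete, here is a minimal pure-Python sketch of that lookup pattern, assuming a bucket is a flat list <code>[ref(k1), ref(k2), k3, value, ...]</code> with the first two key items held through <code>weakref.ref</code>. The layout and the helper <code>bucket_lookup</code> are hypothetical, not the actual code in <code>coerce_dict.pyx</code>:
</p>

```python
import weakref


class Parent:
    """Stand-in for a Sage parent; plain Python classes support weak references."""


def bucket_lookup(bucket, k1, k2, k3):
    """Hypothetical flat bucket layout: [ref(k1), ref(k2), k3, value, ...]."""
    for i in range(0, len(bucket), 4):
        # Every probe calls the weak references before comparing by identity;
        # this dereferencing is the cost described above.
        if k1 is bucket[i]() and k2 is bucket[i + 1]() and k3 is bucket[i + 2]:
            return bucket[i + 3]
    raise KeyError((k1, k2, k3))


A, B, C = Parent(), Parent(), Parent()
bucket = [weakref.ref(A), weakref.ref(B), True, "A*B",
          weakref.ref(C), weakref.ref(B), True, "C*B"]
print(bucket_lookup(bucket, C, B, True))   # C*B
```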
<p>
I was thinking: Perhaps I could manage to use <code>id(X)</code> as key of <code>TripleDictById</code>, rather than a weak reference to <code>X</code>. The weak reference to <code>X</code> could be stored elsewhere.
</p>
<p>
Anyway, here is a data point:
Unpatched (there is only <code>TripleDict</code>, no <code>TripleDictById</code>):
</p>
<pre class="wiki">sage: from sage.structure.coerce_dict import TripleDict
sage: D = TripleDict(113)
sage: L = []
sage: for p in prime_range(10^3):
....:     K = GF(p)
....:     P = K['x','y']
....:     L.append((K,P))
....:
sage: for i,(K,P) in enumerate(L):
....:     D[K,P,True] = i
....:
sage: cython("""
....: def testD(D, list L):
....:     for K,P in L:
....:         n = D[K,P,True]
....: """)
sage: %timeit testD(D,L)
625 loops, best of 3: 30.6 µs per loop
</pre><p>
Patched (comparing <code>TripleDict</code> and <code>TripleDictById</code>):
</p>
<pre class="wiki">sage: from sage.structure.coerce_dict import TripleDict, TripleDictById
sage: D = TripleDict(113)
sage: E = TripleDictById(113)
sage: L = []
sage: for p in prime_range(10^3):
....:     K = GF(p)
....:     P = K['x','y']
....:     L.append((K,P))
....:
sage: for i,(K,P) in enumerate(L):
....:     D[K,P,True] = i
....:     E[K,P,True] = i
....:
sage: cython("""
....: def testD(D, list L):
....:     for K,P in L:
....:         n = D[K,P,True]
....: """)
sage: %timeit testD(D,L)
25 loops, best of 3: 21 ms per loop
sage: %timeit testD(E,L)
625 loops, best of 3: 61.9 µs per loop
</pre><p>
In the applications, I am mainly using <code>TripleDictById</code>. Nevertheless, it is only half as fast as the old <code>TripleDict</code> (61.9 µs versus 30.6 µs). So this is what I have to work on!
</p>
TicketSimonKingFri, 06 Jan 2012 07:04:13 GMTstatus changed
https://trac.sagemath.org/ticket/715#comment:112
<ul>
<li><strong>status</strong>
changed from <em>needs_review</em> to <em>needs_work</em>
</li>
</ul>
TicketvbraunFri, 06 Jan 2012 15:20:45 GMT
https://trac.sagemath.org/ticket/715#comment:113
<p>
This is a bit hackish, but we could also store a strong reference as before but manually <code>Py_DECREF</code> it by one. The eraser then has to make sure that cache entries are removed when they fall out of use, or we'll segfault....
</p>
TicketSimonKingFri, 06 Jan 2012 16:50:04 GMT
https://trac.sagemath.org/ticket/715#comment:114
<p>
Replying to <a class="ticket" href="https://trac.sagemath.org/ticket/715#comment:113" title="Comment 113">vbraun</a>:
</p>
<blockquote class="citation">
<p>
This is a bit hackish, but we could also store a strong reference as before but manually <code>Py_DECREF</code> it by one. The eraser then has to make sure that cache entries are removed when they fall out of use, or we'll segfault....
</p>
</blockquote>
<p>
How should the eraser know which entry is to be removed? I wouldn't like to re-implement the weakref module...
</p>
<p>
At least on my machine, I have a regression. In order to avoid it, I am now experimenting with some ideas to speed up access to dictionary items: with my current patch, I do something like
</p>
<pre class="wiki"> if k1 is bucket[i]()
</pre><p>
where <code>bucket[i]</code> is a weak reference. But calling the reference takes a lot of time.
</p>
<p>
For example, since k1 is compared by identity (not equality), it might make sense to store <code>id(bucket[i]())</code> as an attribute of the weak reference. This is possible by <code>weakref.KeyedRef</code>. And <code>bucket[i].key</code> is a bit faster than <code>bucket[i]()</code>.
</p>
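<p>
A minimal sketch of the <code>weakref.KeyedRef</code> idea, assuming <code>id()</code> stands in for the object's address. <code>KeyedRef</code> is a subclass of <code>weakref.ref</code> with an extra <code>key</code> attribute, and its callback still sees that key after the referent is gone; <code>Parent</code> and <code>on_death</code> are placeholders. In pure Python this does not by itself make the comparison cheaper; it only shows what can be compared without dereferencing.
</p>

```python
import gc
import weakref


class Parent:
    pass


dead_keys = []


def on_death(ref):
    # Called after the referent is gone; ref.key still holds the recorded id.
    dead_keys.append(ref.key)


X = Parent()
r = weakref.KeyedRef(X, on_death, id(X))   # weakref.ref subclass with a .key slot

by_call = X is r()          # dereference the weak reference, then compare
by_key = id(X) == r.key     # compare the stored address, no dereference
print(by_call, by_key)      # True True

expected = id(X)
del X
gc.collect()
print(dead_keys == [expected])  # True
```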
TicketSimonKingFri, 06 Jan 2012 18:20:49 GMT
https://trac.sagemath.org/ticket/715#comment:115
<p>
Replying to <a class="ticket" href="https://trac.sagemath.org/ticket/715#comment:114" title="Comment 114">SimonKing</a>:
</p>
<blockquote class="citation">
<p>
For example, since k1 is compared by identity (not equality), it might make sense to store <code>id(bucket[i]())</code> as an attribute of the weak reference. This is possible by <code>weakref.KeyedRef</code>. And <code>bucket[i].key</code> is a bit faster than <code>bucket[i]()</code>.
</p>
</blockquote>
<p>
... but <code>k1 is bucket[i]()</code> is a lot faster than <code>id(k1)==bucket[i].key</code>. Too bad.
</p>
TicketvbraunFri, 06 Jan 2012 18:37:09 GMT
https://trac.sagemath.org/ticket/715#comment:116
<p>
Replying to <a class="ticket" href="https://trac.sagemath.org/ticket/715#comment:114" title="Comment 114">SimonKing</a>:
</p>
<blockquote class="citation">
<p>
How should the eraser know which entry is to be removed? I wouldn't like to re-implement the weakref module...
</p>
</blockquote>
<p>
As you said in the preceding comment, you'd have to store a weak reference elsewhere. The only advantage is that the comparison could be done on the actual reference.
</p>
TicketSimonKingFri, 06 Jan 2012 22:27:04 GMT
https://trac.sagemath.org/ticket/715#comment:117
<p>
Perhaps as follows: we currently have one ensemble of buckets. Instead, we could have two ensembles, say, <code>self.keys</code> and <code>self.refs</code>. Each bucket in both ensembles is a list of length <code>3*n</code>. Let <code>X,Y,Z</code> be the key, let h be the hash of that triple, and let V be the value associated with <code>X,Y,Z</code>.
</p>
<p>
Then, one could store <code>X,Y,Z</code> as <code>self.keys[h][i:i+3]</code>, artificially decrementing the reference count for X and Y (but not for Z, which usually is True, False, None, operator.mul and so on), as suggested by Volker. And <code>self.refs[h][i:i+3]</code> would be formed by a weak reference to X, a weak reference to Y, and V. The two weak references have a callback function that, when a reference becomes invalid, finds it as <code>self.refs[h][j]</code> and deletes the corresponding triple both in <code>self.refs[h]</code> and in <code>self.keys[h]</code>.
</p>
<p>
Two weak references with callback functions pointing to the same object are distinct (they are only the same if they don't have a callback function). Hence, each reference occurs in the <code>TripleDict</code> exactly once. So it makes sense to store the hash value of the <em>triple</em> <code>X,Y,Z</code> as additional data both in the reference to X and in the reference to Y, which is possible with <code>weakref.KeyedRef</code>. In that way, deleting an entry when a reference becomes invalid would be much faster than with my current patch, since there is no need to search <em>all</em> buckets.
</p>
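<p>
The two facts about weak references used above can be checked directly in CPython (the sharing of callback-less references is a CPython implementation detail). In this minimal sketch, <code>Parent</code> is a placeholder and the value <code>42</code> is an arbitrary stand-in for the hash of a triple:
</p>

```python
import weakref


class Parent:
    pass


def callback(ref):
    pass


X = Parent()

plain1 = weakref.ref(X)
plain2 = weakref.ref(X)
print(plain1 is plain2)          # True on CPython: callback-less refs are shared

cb1 = weakref.ref(X, callback)
cb2 = weakref.ref(X, callback)
print(cb1 is cb2)                # False: each ref with a callback is a new object

h = 42                           # placeholder for the hash of the triple (X, Y, Z)
k1 = weakref.KeyedRef(X, callback, h)
k2 = weakref.KeyedRef(X, callback, h)
print(k1 is k2, k1.key)          # False 42
```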
<p>
I will try it tomorrow.
</p>
TicketSimonKingSat, 07 Jan 2012 12:13:22 GMTdescription changed
https://trac.sagemath.org/ticket/715#comment:118
<ul>
<li><strong>description</strong>
modified (<a href="/ticket/715?action=diff&version=118">diff</a>)
</li>
</ul>
<p>
Here is a preliminary combined patch, implementing the ideas sketched in the previous post -- except that I forgot to explicitly decref stuff... Trying that now. I hope it doesn't segfault.
</p>
TicketSimonKingSat, 07 Jan 2012 12:27:25 GMT
https://trac.sagemath.org/ticket/715#comment:119
<p>
Hm. When adding a "Py_DECREF", some doctest segfaults, and also the memory leak is <em>not</em> completely fixed: In the test where one creates 50 elliptic curves with random j-invariant, 12 of them survive garbage collection. That's better than 50, but worse than 1.
</p>
TicketSimonKingSat, 07 Jan 2012 12:36:03 GMT
https://trac.sagemath.org/ticket/715#comment:120
<p>
First of all, in my current patch, I forgot the case <code>k1 is k2</code>: In that case, I would decrement the reference count twice for the same object. However, even when I avoid it, I get a double free.
</p>
<p>
I wonder: Could the double free result from the fact that I do <code>del self.key_buckets[h][i:i+3]</code> when the reference count for <code>self.key_buckets[h][i]</code> is already zero? Or would that be no problem?
</p>
TicketSimonKingSat, 07 Jan 2012 12:51:23 GMT
https://trac.sagemath.org/ticket/715#comment:121
<p>
I made some progress by using <code>Py_CLEAR</code> instead of <code>Py_DECREF</code>. Now, it is "only" signal 11, not a double-free.
</p>
TicketSimonKingSat, 07 Jan 2012 18:09:03 GMT
https://trac.sagemath.org/ticket/715#comment:122
<p>
Sorry, it seems that I have no idea whatsoever of reference counting. I made experiments with <code>Py_DECREF</code> resp. <code>Py_CLEAR</code> applied to list elements, but in all cases I get a segfault when the next garbage collection occurs.
</p>
TicketSimonKingSat, 07 Jan 2012 20:06:04 GMTwork_issues changed
https://trac.sagemath.org/ticket/715#comment:123
<ul>
<li><strong>work_issues</strong>
changed from <em>Get some timings</em> to <em>Improve timing and provid documentation</em>
</li>
</ul>
<p>
I have updated the patch. Instead of storing the original key and using <code>Py_DECREF</code>, I store its address (for <code>TripleDictById</code>) or use another weak reference (for <code>TripleDict</code>).
</p>
<p>
With the new patch, <code>sage -t "devel/sage-main/sage/modules/free_module.py"</code> works and is about as fast as in vanilla sage.
</p>
<p>
Moreover, the memleak is fixed.
</p>
<p>
However, the patch isn't fully tested or documented yet. And still <code>TripleDictById</code> is only half as fast as the old <code>TripleDict</code> (but recall: The old is buggy and uses strong references). So, it isn't ready for review, but of course I'd appreciate preliminary comments.
</p>
<p>
Apply trac715_tripledict_combined.patch
</p>
TicketSimonKingSun, 08 Jan 2012 17:21:37 GMTwork_issues changed
https://trac.sagemath.org/ticket/715#comment:124
<ul>
<li><strong>work_issues</strong>
changed from <em>Improve timing and provid documentation</em> to <em>Rename `TripleDictById` into `TripleDict`. Improve timing and update documentation</em>
</li>
</ul>
<p>
OMG!!
</p>
<p>
I totally misinterpreted how the keys were compared in the original version of <code>TripleDict</code>. When I saw the line <code>if PyList_GET_ITEM(bucket, i) == <PyObject*>k1</code> in the old code, I thought that this means to compare the objects by equality.
</p>
<p>
But now I learnt that this is comparison by identity. Arrgh! The behaviour that I provide with <code>TripleDictById</code> was there all along!
</p>
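<p>
The distinction can be shown in pure Python: the old code compared two <code>PyObject*</code> pointers, which corresponds to <code>is</code> (or to comparing <code>id()</code> values), not to <code>==</code>. The class <code>Ring</code> below is a hypothetical stand-in for a parent with structural equality:
</p>

```python
class Ring:
    """Hypothetical parent with structural equality (same p means equal)."""

    def __init__(self, p):
        self.p = p

    def __eq__(self, other):
        return isinstance(other, Ring) and self.p == other.p

    def __hash__(self):
        return hash(self.p)


R1, R2 = Ring(5), Ring(5)
print(R1 == R2, R1 is R2)            # True False

by_equality = {R1: "cached"}
by_identity = {id(R1): "cached"}     # what a pointer comparison amounts to
print(R2 in by_equality)             # True
print(id(R2) in by_identity)         # False
```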
<p>
Conclusion: I should erase my version of <code>TripleDict</code> (which really compares by equality, not identity), rename my <code>TripleDictById</code> into <code>TripleDict</code>, and then finally try to get things up to speed.
</p>
TicketSimonKingSun, 08 Jan 2012 23:56:58 GMTattachment set
https://trac.sagemath.org/ticket/715
<ul>
<li><strong>attachment</strong>
set to <em>trac715_tripledict_combined.patch</em>
</li>
</ul>
<p>
Introduce weak references to coercion dicts, and refactor the hashtables.
</p>
TicketSimonKingMon, 09 Jan 2012 00:23:01 GMTstatus, work_issues changed
https://trac.sagemath.org/ticket/715#comment:125
<ul>
<li><strong>status</strong>
changed from <em>needs_work</em> to <em>needs_info</em>
</li>
<li><strong>work_issues</strong>
changed from <em>Rename `TripleDictById` into `TripleDict`. Improve timing and update documentation</em> to <em>Should we rename `TripleDictById` into `TripleDict`, or do we want a "weak triple dictionary with comparison by equality"?</em>
</li>
</ul>
<p>
I have posted a new patch version.
</p>
<p>
Recall that we want a dictionary whose keys are triples; we want to compare all three key items by identity, and we want to hold only weak references to the first two key items (the third may have a strong reference).
</p>
<p>
The <code>TripleDictById</code> is now based on the following idea:
</p>
<ul><li>There is one list that stores the memory addresses of the first two key items and the third key item. In particular, I don't need to decref the key items, since we only store their addresses.
</li><li>There is another list that stores the value corresponding to the key triple, and stores weak references with a callback function to the first two key items.
</li><li>When accessing the dictionary, the address of the first two key items is compared with the stored address, and the third is compared by identity with the stored data.
</li><li>The weak references are only called when iterating over the <code>TripleDictById</code> (of course: iteritems is supposed to return the keys, not just the addresses of the keys).
</li><li>There are two reasons for storing the weak references (and not only the addresses): The callback function of the weak references removes unused entries of the dictionary, and we also need it for iteration over the dictionary.
</li></ul><p>
<span class="underline">Status of the patch</span>
</p>
<ul><li>The "raw" speed seems to be almost as good as in the unpatched version, the speed of doctests seems to be OK, and I don't observe segfaults.
</li><li>The memleak is fixed.
</li><li>The documentation of sage/structure/coerce_dict.pyx needs more polishing, and last but not least I did not run the doctests yet.
</li></ul><p>
The patch still contains both <code>TripleDict</code> (which compares weak keys by equality) and <code>TripleDictById</code> (which compares keys by identity, similar to what <code>TripleDict</code> does in unpatched Sage, but using weak references).
</p>
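<p>
The design listed above, with addresses in one list and weak references with callbacks (plus values) in another, can be modelled in pure Python as a rough sketch. It assumes <code>id()</code> stands in for the memory address and that each <code>weakref.KeyedRef</code> carries the bucket index of its triple; the class name and layout are hypothetical, not the Cython code in the patch:
</p>

```python
import gc
import weakref


class TripleDictByIdSketch:
    """Hypothetical pure-Python model of the design described above.

    Keys are triples (k1, k2, k3): k1 and k2 are held only weakly and compared
    by address (id), k3 is held strongly and compared with `is`.
    """

    def __init__(self, nbuckets=53):
        self.n = nbuckets
        # addr[h] is a flat list [id(k1), id(k2), k3, id(k1'), ...]
        self.addr = [[] for _ in range(nbuckets)]
        # data[h] is a flat list [ref(k1), ref(k2), value, ...]
        self.data = [[] for _ in range(nbuckets)]

    def _bucket(self, k1, k2, k3):
        return hash((id(k1), id(k2), id(k3))) % self.n

    def _find(self, h, k1, k2, k3):
        a = self.addr[h]
        for i in range(0, len(a), 3):
            # Addresses for k1 and k2 (no weakref call), identity for k3.
            if a[i] == id(k1) and a[i + 1] == id(k2) and a[i + 2] is k3:
                return i
        return -1

    def __getitem__(self, key):
        k1, k2, k3 = key
        h = self._bucket(k1, k2, k3)
        i = self._find(h, k1, k2, k3)
        if i == -1:
            raise KeyError(key)
        return self.data[h][i + 2]

    def __setitem__(self, key, value):
        k1, k2, k3 = key
        h = self._bucket(k1, k2, k3)
        i = self._find(h, k1, k2, k3)
        if i != -1:
            self.data[h][i + 2] = value
            return
        self.addr[h].extend([id(k1), id(k2), k3])
        # Each KeyedRef remembers its bucket index, so the callback scans one bucket.
        self.data[h].extend([weakref.KeyedRef(k1, self._erase, h),
                             weakref.KeyedRef(k2, self._erase, h),
                             value])

    def _erase(self, ref):
        # Weakref callback: drop the triple that contained the dead reference.
        h = ref.key
        d = self.data[h]
        for i in range(0, len(d), 3):
            if d[i] is ref or d[i + 1] is ref:
                del d[i:i + 3]
                del self.addr[h][i:i + 3]
                return

    def __len__(self):
        return sum(len(a) for a in self.addr) // 3

    def items(self):
        # Only iteration dereferences the weak references (to return real keys).
        for a, d in zip(self.addr, self.data):
            for i in range(0, len(d), 3):
                k1, k2 = d[i](), d[i + 1]()
                if k1 is not None and k2 is not None:
                    yield (k1, k2, a[i + 2]), d[i + 2]


class Parent:
    """Stand-in for a Sage parent such as GF(p)."""


D = TripleDictByIdSketch()
K, P = Parent(), Parent()
D[K, P, True] = 1
D[K, P, True] = 2                # same key: the value is overwritten
print(D[K, P, True], len(D))     # 2 1
del K
gc.collect()
print(len(D))                    # 0: the callback erased the entry
```

This sketch does not guard against a callback mutating a bucket while <code>items()</code> is iterating over it; a real implementation would have to handle that case.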
<p>
What do you think: Should comparison by equality be provided in the patch?
</p>
<p>
Contra:
</p>
<blockquote>
<p>
We don't use it in the rest of Sage, so, why should we add it?
</p>
</blockquote>
<p>
Pro:
</p>
<blockquote>
<p>
A "triple dict by comparison" is slower than a usual (strong) dictionary, but on the other hand <code>weakref.WeakKeyDictionary</code> does not work if the keys are tuples - hence, "triple dict by comparison" adds a new feature.
</p>
</blockquote>
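<p>
The remark about <code>weakref.WeakKeyDictionary</code> is easy to check: a single parent works as a weak key, but a triple does not, because the dictionary tries to create a weak reference to the tuple itself. <code>Parent</code> is a placeholder:
</p>

```python
import weakref


class Parent:
    pass


K, P = Parent(), Parent()
wkd = weakref.WeakKeyDictionary()
wkd[K] = "single parent as key: fine"

try:
    wkd[K, P, True] = "triple as key"
    tuple_key_rejected = False
except TypeError:
    # A tuple cannot be the target of a weak reference.
    tuple_key_rejected = True
print(tuple_key_rejected)            # True
```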
TicketSimonKingMon, 09 Jan 2012 16:43:06 GMTstatus, description changed; work_issues deleted
https://trac.sagemath.org/ticket/715#comment:126
<ul>
<li><strong>status</strong>
changed from <em>needs_info</em> to <em>needs_review</em>
</li>
<li><strong>work_issues</strong>
<em>Should we rename `TripleDictById` into `TripleDict`, or do we want a "weak triple dictionary with comparison by equality"?</em> deleted
</li>
<li><strong>description</strong>
modified (<a href="/ticket/715?action=diff&version=126">diff</a>)
</li>
</ul>
<p>
To answer my own question: I believe that comparison by equality does not make sense (yet), since it isn't used in the coercion model.
</p>
<p>
Therefore, I have produced a new patch. Idea: The <code>TripleDict</code> stores the addresses of the keys. In addition, there is a dictionary of weak references with callback function. The <em>only</em> purpose of these references is that their callback functions are erasing invalid dictionary items.
</p>
<p>
<span class="underline">"Raw" speed</span>
</p>
<p>
Patched:
</p>
<pre class="wiki">sage: from sage.structure.coerce_dict import TripleDict
sage: D = TripleDict(113)
sage: L = []
sage: for p in prime_range(10^3):
....:     K = GF(p)
....:     P = K['x','y']
....:     L.append((K,P))
....:
sage: for i,(K,P) in enumerate(L):
....:     D[K,P,True] = i
....:
sage: cython("""
....: from sage.structure.coerce_dict cimport TripleDict
....: def testTriple(TripleDict D, list L):
....:     for K,P in L:
....:         n = D[K,P,True]
....: def testTripleGet(TripleDict D, list L):
....:     for K,P in L:
....:         n = D.get(K,P,True)
....: def testTripleSet(list L):
....:     cdef TripleDict D = TripleDict(113)
....:     for i,(K,P) in enumerate(L):
....:         D.set(K,P,True, i)
....: """)
sage: %timeit testTriple(D,L)
625 loops, best of 3: 42.4 µs per loop
sage: %timeit testTripleGet(D,L)
625 loops, best of 3: 28.3 µs per loop
sage: %timeit testTripleSet(L)
125 loops, best of 3: 2.66 ms per loop
</pre><p>
Unpatched:
</p>
<pre class="wiki">sage: %timeit testTriple(D,L)
625 loops, best of 3: 31.2 µs per loop
sage: %timeit testTripleGet(D,L)
625 loops, best of 3: 17.5 µs per loop
sage: %timeit testTripleSet(L)
625 loops, best of 3: 79.2 µs per loop
</pre><p>
<span class="underline">Doctest speed</span>
</p>
<p>
Patched
</p>
<pre class="wiki">sage -t "devel/sage-main/sage/modules/free_module.py"
[11.4 s]
sage -t "devel/sage-main/sage/modules/free_module.py"
[11.7 s]
</pre><p>
Unpatched
</p>
<pre class="wiki">sage -t "devel/sage-main/sage/modules/free_module.py"
[11.7 s]
sage -t "devel/sage-main/sage/modules/free_module.py"
[11.5 s]
</pre><p>
<span class="underline">Conclusion</span>
</p>
<p>
Using weak references, things become a bit slower, but it is a lot better than with the previous patches. According to the timing of the doc tests, the regression doesn't matter in applications.
</p>
<p>
I guess there is no free lunch, and thus the regression is small enough, given that the memory leak is fixed (which is checked in a new test).
</p>
<p>
I have not run the full test suite yet, but I think it can be reviewed.
</p>
<p>
Apply trac715_one_triple_dict.patch
</p>
TicketSimonKingMon, 09 Jan 2012 20:13:06 GMTattachment set
https://trac.sagemath.org/ticket/715
<ul>
<li><strong>attachment</strong>
set to <em>trac715_one_triple_dict.patch</em>
</li>
</ul>
<p>
Drop the distinction of <code>TripleDict</code> versus <code>TripleDictById</code>. Use the memory addresses as dictionary keys
</p>
TicketSimonKingMon, 09 Jan 2012 20:14:35 GMT
https://trac.sagemath.org/ticket/715#comment:127
<p>
<code>make ptest</code> reported only one error, and the error was in fact a misprint. Hence, I have updated my patch, and with it, all tests should pass.
</p>
TicketjpfloriThu, 12 Jan 2012 10:37:42 GMT
https://trac.sagemath.org/ticket/715#comment:128
<p>
Here are finally some first timings for 'make ptest' (this time I first checked the memory leak is actually fixed...)
</p>
<ul><li>sage-5.0.prealpha1 vanilla: 937.4 sec
</li><li>sage-5.0.prealpha1 + 715: 948.8 sec
</li></ul><p>
No errors for both. I'll report on make ptestlong tomorrow, try to check that actions do not get continuously deleted and finally review the code.
</p>
TicketjpfloriFri, 13 Jan 2012 08:18:50 GMT
https://trac.sagemath.org/ticket/715#comment:129
<p>
Running "make ptestlong" gave me:
</p>
<ul><li>sage-5.0.prealpha1 vanilla: 1397.9 sec
</li><li>sage-5.0.prealpha1 + 715: 1415.0 sec
</li></ul><p>
with no errors for both (I noticed that testing sandpile.py was horribly slow with my previous install of sage-5.0.prealpha0, something like 1350 sec for it alone; in the meantime I've updated my Ubuntu and recompiled everything for prealpha1, which might explain why my new timings are so much faster).
</p>
<p>
Hence no regression!
</p>
TicketSimonKingFri, 13 Jan 2012 09:00:52 GMT
https://trac.sagemath.org/ticket/715#comment:130
<p>
Replying to <a class="ticket" href="https://trac.sagemath.org/ticket/715#comment:129" title="Comment 129">jpflori</a>:
</p>
<blockquote class="citation">
<p>
Running "make ptestlong" gave me:
</p>
<ul><li>sage-5.0.prealpha1 vanilla: 1397.9 sec
</li><li>sage-5.0.prealpha1 + 715: 1415.0 sec
</li></ul></blockquote>
<p>
I'm glad that this time (in contrast to what I did in <a class="closed ticket" href="https://trac.sagemath.org/ticket/9138" title="defect: Categories for all rings (closed: fixed)">#9138</a> and was fixing in <a class="closed ticket" href="https://trac.sagemath.org/ticket/11900" title="defect: Serious regression caused by #9138 (closed: fixed)">#11900</a>) it seems that I am not creating a terrible slow-down!
</p>
TicketSimonKingSun, 15 Jan 2012 09:39:04 GMT
https://trac.sagemath.org/ticket/715#comment:131
<p>
I found another memory leak:
</p>
<pre class="wiki">sage: K = GF(1<<55,'t')
sage: for i in range(50):
....:     a = K.random_element()
....:     E = EllipticCurve(j=a)
....:     b = K.has_coerce_map_from(E)
....:
sage: import gc
sage: gc.collect()
0
</pre><p>
Namely, <code>K.coerce_map_from(E)</code> stores the resulting map (or None) in a strong dictionary.
</p>
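<p>
The mechanism can be reproduced in pure Python: an entry in a strong dictionary keeps its key alive indefinitely, while <code>weakref.WeakKeyDictionary</code> lets it go. In this sketch, <code>Parent</code> stands in for the elliptic curve <code>E</code> and the two dictionaries stand in for the coercion cache; these names are placeholders:
</p>

```python
import gc
import weakref


class Parent:
    pass


strong_cache = {}
weak_cache = weakref.WeakKeyDictionary()


def make_and_cache(cache):
    E = Parent()
    cache[E] = None          # e.g. "there is no coerce map from E"
    return weakref.ref(E)    # probe to see whether E survives


r_strong = make_and_cache(strong_cache)
r_weak = make_and_cache(weak_cache)
gc.collect()
print(r_strong() is None, r_weak() is None)  # False True
```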
<p>
Several questions: Would it suffice to change the dictionary into a <code>WeakKeyDictionary</code>? If it would: Would it cause a regression? I guess the answer to the second question is "yes", since getting an item out of a weak key dictionary is quite slow and requesting a coerce map is a very frequent operation.
</p>
<p>
So, I suppose one could introduce another type of dictionary, analogous to <code>TripleDict</code>, which would not test for equality but for identity.
</p>
<p>
But should this be here or on a new ticket? I think the patch here is big enough, so I'd rather do it on a different ticket, but you can try to convince me to do it here.
</p>
<p>
Since the patchbot tried to use the wrong patches:
</p>
<p>
Apply trac715_one_tripledict.patch
</p>
TicketSimonKingSun, 15 Jan 2012 11:37:48 GMT
https://trac.sagemath.org/ticket/715#comment:132
<p>
I don't know why the patchbot keeps trying to apply <em>all</em> patches.
</p>
<p>
Anyway. First experiments show that a <code>MonoDict</code> (which would be my name for a dictionary that uses weak keys, compares the keys by identity and expects a single item as key) is a lot faster than a usual dictionary if the keys are frequently used parents such as finite fields. "A lot" means: more than 20 times faster.
</p>
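<p>
A minimal sketch of what such a <code>MonoDict</code> could look like in pure Python, assuming <code>id()</code> stands in for the memory address. <code>MonoDictSketch</code> is a hypothetical name, not the eventual Sage class. Since keys are compared by identity, the parents' <code>__hash__</code> and <code>__eq__</code> are never called, which is presumably part of why this can beat a usual dictionary:
</p>

```python
import gc
import weakref


class MonoDictSketch:
    """Hypothetical sketch: single weak key, compared by identity via id()."""

    def __init__(self):
        self._data = {}          # id(key) -> (weak reference to key, value)

    def __setitem__(self, key, value):
        k = id(key)
        data = self._data        # the callback closes over the dict, not over self

        def erase(ref):
            # Remove the entry only if it still belongs to this dead reference.
            entry = data.get(k)
            if entry is not None and entry[0] is ref:
                del data[k]

        data[k] = (weakref.ref(key, erase), value)

    def __getitem__(self, key):
        entry = self._data.get(id(key))
        # The extra dereference guards against a reused id() in this sketch.
        if entry is None or entry[0]() is not key:
            raise KeyError(key)
        return entry[1]

    def __len__(self):
        return len(self._data)


class Parent:
    pass


M = MonoDictSketch()
a, b = Parent(), Parent()
M[a] = "coerce map from a"
M[b] = None                      # "no coercion" is cached too
del a
gc.collect()
print(len(M), M[b])              # 1 None
```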
<p>
I will simply try whether things still work when I replace dictionaries by <code>MonoDict</code> in the coercion model. If they do, I'll post here. If there are difficult problems, I'll move it to a different ticket.
</p>
TicketjpfloriSun, 15 Jan 2012 14:52:37 GMT
https://trac.sagemath.org/ticket/715#comment:133
<p>
I'd say we'd better put your <a class="missing wiki">MonoDict?</a> fix in another ticket, even if it no difficult problems arise, to keep the patch readable enough and the problems clearly separated.
</p>
<p>
And close this one asap... sorry I should be the one finally reviewing your code (I already checked for speed regression and actual fix of the leak as mentioned above), but I do not have much time these days.
</p>
<p>
I'll do that on Thursday (at the latest, I hope), as there is a Sage meeting in Paris that day.
</p>
TicketSimonKingSun, 15 Jan 2012 17:04:46 GMT
https://trac.sagemath.org/ticket/715#comment:134
https://trac.sagemath.org/ticket/715#comment:134
<p>
See <a class="closed ticket" href="https://trac.sagemath.org/ticket/12313" title="defect: Fix yet another memory leak caused by caching of coercion data (closed: fixed)">#12313</a> for the other memleak.
</p>
TicketSimonKingTue, 17 Jan 2012 07:21:54 GMT
https://trac.sagemath.org/ticket/715#comment:135
https://trac.sagemath.org/ticket/715#comment:135
<p>
You raised the question whether actions (and perhaps maps as well) are garbage collected too often. I inserted some lines of code into the init methods of <code>sage.categories.map.Map</code> and <code>sage.categories.action.Action</code> that count how often the init method is called (by appending one character to a file on my disk). Then, I ran the doctests in <code>sage.schemes</code>. Result:
</p>
<p>
<span class="underline">With <a class="closed ticket" href="https://trac.sagemath.org/ticket/11780" title="defect: Creating a polynomial ring over a number field results in a non-unique ... (closed: fixed)">#11780</a> only</span>
</p>
<ul><li>76102 maps
</li><li>41381 actions
</li><li>647.3 seconds
</li></ul><p>
<span class="underline">With <a class="closed ticket" href="https://trac.sagemath.org/ticket/11780" title="defect: Creating a polynomial ring over a number field results in a non-unique ... (closed: fixed)">#11780</a> and <a class="closed ticket" href="https://trac.sagemath.org/ticket/715" title="defect: Parents probably not reclaimed due to too much caching (closed: fixed)">#715</a></span>
</p>
<ul><li>76192 maps
</li><li>46157 actions
</li><li>658 seconds
</li></ul><p>
So, actions are created roughly 10% (more precisely 11.5%) more often than without the patch, while the number of maps is almost unchanged and the total running time grows by less than 2%.
</p>
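<p>
The percentages above can be recomputed directly from the counts and timings in this comment. This small Python 3 snippet does only that arithmetic; the helper name <code>relative_change</code> is made up for illustration.
</p>

```python
def relative_change(before, after):
    """Relative change from `before` to `after`, in percent."""
    return 100.0 * (after - before) / before


# Counts and timings from the sage.schemes doctest run above:
# first value with #11780 only, second with #11780 and #715.
for label, before, after in [("maps", 76102, 76192),
                             ("actions", 41381, 46157),
                             ("seconds", 647.3, 658)]:
    print(f"{label:8s} {relative_change(before, after):+.2f} %")
```

<p>
This gives about +0.12 % for maps, +11.5 % for actions and +1.65 % for the running time.
</p>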
<p>
Two explanations:
</p>
<ol><li>These 10% of actions would have been needed, and it is <em>bad</em> that they were garbage collected.
</li></ol><ol start="2"><li>One file contains many tests, and often these tests are quite similar. In particular, many actions will occur in many different tests. Without the patch, the actions created by the first test are strongly cached and are thus still available for the second, third, ... test. But with the patch, the actions created by the first test will be garbage collected when the first test is done. Hence, it is <em>good</em> that they were garbage collected.
</li></ol><p>
In order to find out whether 1. or 2. is the correct explanation, I'll determine the number of maps and actions created in "single" computations, namely in the benchmarks discussed at <a class="closed ticket" href="https://trac.sagemath.org/ticket/11900" title="defect: Serious regression caused by #9138 (closed: fixed)">#11900</a>.
</p>
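<p>
The instrumentation described at the start of this comment (counting how often the init methods of Map and Action run) can be sketched in pure Python 3 as follows. Instead of appending a character to a file, the hypothetical <code>count_inits</code> decorator wraps <code>__init__</code> and increments an in-memory <code>collections.Counter</code>. The toy <code>Map</code> and <code>Action</code> classes only stand in for the Sage classes; those are Cython extension types, so in Sage the counting code has to go into their source, as was done here.
</p>

```python
import collections
import functools

INIT_COUNTS = collections.Counter()


def count_inits(cls):
    """Class decorator: count every call of cls.__init__ under cls.__name__."""
    original_init = cls.__init__

    @functools.wraps(original_init)
    def wrapped_init(self, *args, **kwargs):
        INIT_COUNTS[cls.__name__] += 1
        original_init(self, *args, **kwargs)

    cls.__init__ = wrapped_init
    return cls


@count_inits
class Map:
    def __init__(self, domain, codomain):
        self.domain, self.codomain = domain, codomain


@count_inits
class Action:
    def __init__(self, G, S):
        self.G, self.S = G, S


for n in range(3):
    Map(n, n + 1)
Action("ZZ", "A")
print(dict(INIT_COUNTS))  # {'Map': 3, 'Action': 1}
```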
TicketSimonKingTue, 17 Jan 2012 08:34:08 GMT
https://trac.sagemath.org/ticket/715#comment:136
https://trac.sagemath.org/ticket/715#comment:136
<p>
Here is some more data. In all cases, I give the number of maps and actions created, first with <a class="closed ticket" href="https://trac.sagemath.org/ticket/11780" title="defect: Creating a polynomial ring over a number field results in a non-unique ... (closed: fixed)">#11780</a> only and then with <a class="closed ticket" href="https://trac.sagemath.org/ticket/11780" title="defect: Creating a polynomial ring over a number field results in a non-unique ... (closed: fixed)">#11780</a>+<a class="closed ticket" href="https://trac.sagemath.org/ticket/715" title="defect: Parents probably not reclaimed due to too much caching (closed: fixed)">#715</a>.
</p>
<p>
First test: Start Sage!
</p>
<blockquote>
<p>
-> 191 maps, 44 actions versus 191 maps, 44 actions. Fine!
</p>
</blockquote>
<p>
Second test:
</p>
<pre class="wiki">E = J0(46).endomorphism_ring()
g = E.gens()
</pre><blockquote>
<p>
-> 597 maps, 320 actions versus 611 maps, 481 actions. That's about 50% more actions and is thus not good.
</p>
</blockquote>
<p>
Third test:
</p>
<pre class="wiki">L = EllipticCurve('960d1').prove_BSD()
</pre><blockquote>
<p>
-> 3550 maps, 97 actions versus 3550 maps, 97 actions. Fine!
</p>
</blockquote>
<p>
Fourth test:
</p>
<pre class="wiki">E = EllipticCurve('389a')
for p in prime_range(10000):
    if p != 389:  # E has conductor 389, so it has bad reduction at p = 389
        G = E.change_ring(GF(p)).abelian_group()
</pre><blockquote>
<p>
-> 14969 maps, 9884 actions versus 14969 maps, 9885 actions. Fine!
</p>
</blockquote>
<p>
Question to the reviewer: How bad do you think the "missing" actions in the second example are? Would it be worthwhile to fix this in the method <code>E.gens</code>?
</p>
<p>
Do you even think I should try to modify <code>TripleDict</code> so that a list of <em>strong</em> references is preserved, but the list can only have a maximal length (thus popping the first references on the list when new references are appended)? In that way, one could extend the lifetime of the cache, while still avoiding unbounded memory growth.
</p>
<p>
It is a shame that Python only has strong and weak references, but no soft references!
</p>
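<p>
The ring-buffer idea above can be sketched in pure Python 3: keep the cache itself weak, but also hold strong references to the most recently used keys in a <code>collections.deque(maxlen=...)</code>, which discards its oldest item automatically when a new one is appended. <code>BufferedWeakKeyCache</code> and <code>Key</code> are hypothetical names; this is not how <code>TripleDict</code> is implemented, and for simplicity the sketch uses one equality-based weak key instead of the three identity-compared keys of a <code>TripleDict</code>.
</p>

```python
import collections
import gc
import weakref


class BufferedWeakKeyCache:
    """Hypothetical weak-key cache that keeps the most recent keys alive.

    Entries normally disappear with their key, but the last `maxlen` keys
    used are also held strongly in a ring buffer, which extends their
    lifetime without letting memory grow without bound.
    """

    def __init__(self, maxlen=3):
        self._cache = weakref.WeakKeyDictionary()
        # deque(maxlen=...) drops its oldest item when a new one is appended.
        self._recent = collections.deque(maxlen=maxlen)

    def __setitem__(self, key, value):
        self._cache[key] = value
        self._recent.append(key)  # strong reference, kept for a while

    def __getitem__(self, key):
        value = self._cache[key]
        self._recent.append(key)  # a hit also refreshes the key's lifetime
        return value

    def __len__(self):
        return len(self._cache)


class Key:
    """Weak-referenceable toy key (plain ints cannot be weakly referenced)."""


cache = BufferedWeakKeyCache(maxlen=2)
for i in range(5):
    cache[Key()] = i  # the Key object has no other strong reference
gc.collect()
print(len(cache))  # 2: only the keys still in the ring buffer survive
```

<p>
One design consequence: a key that is looked up repeatedly may occupy several slots of the buffer, so <code>maxlen</code> bounds the number of strong references rather than the number of distinct keys kept alive.
</p>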
TicketSimonKingWed, 18 Jan 2012 12:02:16 GMTcc changed
https://trac.sagemath.org/ticket/715#comment:137
https://trac.sagemath.org/ticket/715#comment:137
<ul>
<li><strong>cc</strong>
<em>robertwb</em> added
</li>
</ul>
<p>
At <a class="ext-link" href="http://groups.google.com/group/sage-devel/browse_thread/thread/8b2fba49fe1ee69e"><span class="icon"></span>sage-devel</a>, Robert Bradshaw suggested the following benchmark, measuring the impact of the new <code>TripleDict</code> on multiplication of integers with <code>RDF</code> (which does involve actions and thus does involve lookup in <code>TripleDict</code>):
</p>
<pre class="wiki">sage: def test(n):
....:     a = Integer(10)
....:     b = QQ(20)
....:     s = RDF(30)
....:     for x in xrange(10**n):
....:         s += a*b*x
....:
</pre><p>
With Sage-5.0.prealpha0+<a class="closed ticket" href="https://trac.sagemath.org/ticket/11780" title="defect: Creating a polynomial ring over a number field results in a non-unique ... (closed: fixed)">#11780</a>:
</p>
<pre class="wiki">sage: %time test(6)
CPU times: user 7.25 s, sys: 0.04 s, total: 7.29 s
Wall time: 7.31 s
</pre><p>
and with the patch from here added
</p>
<pre class="wiki">sage: %time test(6)
CPU times: user 7.29 s, sys: 0.01 s, total: 7.31 s
Wall time: 7.31 s
</pre><p>
So there is no measurable slowdown, which is yet another supporting data point!
</p>
TicketSimonKingWed, 18 Jan 2012 14:19:00 GMT
https://trac.sagemath.org/ticket/715#comment:138
https://trac.sagemath.org/ticket/715#comment:138
<p>
Question: How urgent do you consider implementing a ring buffer for <code>TripleDict</code>? Namely, right now, I'd prefer to work on <a class="closed ticket" href="https://trac.sagemath.org/ticket/12313" title="defect: Fix yet another memory leak caused by caching of coercion data (closed: fixed)">#12313</a>. Since <a class="closed ticket" href="https://trac.sagemath.org/ticket/12313" title="defect: Fix yet another memory leak caused by caching of coercion data (closed: fixed)">#12313</a> changes sage/structure/coerce_dict.pxd, it would probably be easier for me to coordinate work by postponing the ring buffer to a different ticket (or perhaps introduce it at <a class="closed ticket" href="https://trac.sagemath.org/ticket/12313" title="defect: Fix yet another memory leak caused by caching of coercion data (closed: fixed)">#12313</a>?).
</p>
<p>
What do you think?
</p>
TicketjpfloriWed, 18 Jan 2012 14:25:43 GMT
https://trac.sagemath.org/ticket/715#comment:139
https://trac.sagemath.org/ticket/715#comment:139
<p>
Replying to <a class="ticket" href="https://trac.sagemath.org/ticket/715#comment:138" title="Comment 138">SimonKing</a>:
</p>
<blockquote class="citation">
<p>
Question: How urgent do you consider implementing a ring buffer for <code>TripleDict</code>? Namely, right now, I'd prefer to work on <a class="closed ticket" href="https://trac.sagemath.org/ticket/12313" title="defect: Fix yet another memory leak caused by caching of coercion data (closed: fixed)">#12313</a>. Since <a class="closed ticket" href="https://trac.sagemath.org/ticket/12313" title="defect: Fix yet another memory leak caused by caching of coercion data (closed: fixed)">#12313</a> changes sage/structure/coerce_dict.pxd, it would probably be easier for me to coordinate work by postponing the ring buffer to a different ticket (or perhaps introduce it at <a class="closed ticket" href="https://trac.sagemath.org/ticket/12313" title="defect: Fix yet another memory leak caused by caching of coercion data (closed: fixed)">#12313</a>?). What do you think?
</p>
</blockquote>
<p>
I think we'd better close this one asap, especially now that it seems that no speed regression occurs, and provide speed-ups in a subsequent ticket (as you did for <a class="closed ticket" href="https://trac.sagemath.org/ticket/9138" title="defect: Categories for all rings (closed: fixed)">#9138</a> and <a class="closed ticket" href="https://trac.sagemath.org/ticket/11900" title="defect: Serious regression caused by #9138 (closed: fixed)">#11900</a>, and one or two other ones).
</p>
<p>
Of course, one could argue that we get no speed regression because we are faster when accessing the dicts but delete actions more often, so the situation for object creation is not exactly as before; but I do not think anybody or any function relied (or should rely...) on the lifetime of these objects.
</p>
<p>
If you agree, I'll review the ticket tomorrow, as I had already planned and mentioned a few comments above.
</p>
TicketSimonKingWed, 18 Jan 2012 14:34:30 GMT
https://trac.sagemath.org/ticket/715#comment:140
https://trac.sagemath.org/ticket/715#comment:140
<p>
Replying to <a class="ticket" href="https://trac.sagemath.org/ticket/715#comment:139" title="Comment 139">jpflori</a>:
</p>
<blockquote class="citation">
<p>
If you do agree, I'll review the ticket tomorrow
</p>
</blockquote>
<p>
Thank you! Yes, I'd prefer it that way. Having the ring buffer means modifying coerce_dict.pxd, which essentially means recompiling almost the whole Sage library, and that takes almost an hour on my laptop. So, it is better for me to not switch back and forth between <a class="closed ticket" href="https://trac.sagemath.org/ticket/715" title="defect: Parents probably not reclaimed due to too much caching (closed: fixed)">#715</a> and <a class="closed ticket" href="https://trac.sagemath.org/ticket/12313" title="defect: Fix yet another memory leak caused by caching of coercion data (closed: fixed)">#12313</a>.
</p>
TicketjpfloriTue, 24 Jan 2012 12:41:26 GMTdescription changed
https://trac.sagemath.org/ticket/715#comment:141
https://trac.sagemath.org/ticket/715#comment:141
<ul>
<li><strong>description</strong>
modified (<a href="/ticket/715?action=diff&version=141">diff</a>)
</li>
</ul>
TicketjpfloriTue, 24 Jan 2012 13:10:20 GMTstatus changed
https://trac.sagemath.org/ticket/715#comment:142
https://trac.sagemath.org/ticket/715#comment:142
<ul>
<li><strong>status</strong>
changed from <em>needs_review</em> to <em>needs_info</em>
</li>
</ul>
<p>
I've finally read your code and have to say: bravo!
</p>
<p>
However I've got one request, or rather one question.
</p>
<p>
With the current implementation, Actions always use a weak ref for the underlying set, so that it can and will be garbage collected if it is not strongly referenced elsewhere.
</p>
<p>
You illustrate and mention that in some examples in action.pyx.
</p>
<p>
You also modify an example involving Action and <code>MatrixSpace</code> to make sure that no gc occurs.
</p>
<p>
I do not think this is the right solution: the user should be able to use Action as before (and anyway, it does not feel right to me that you can create something that can magically disappear).
</p>
<p>
You could also argue that nobody actually uses Actions directly (I don't, for example :) ), and those who do will have to be careful.
</p>
<p>
I see two solutions:
</p>
<ul><li>Add a big fat warning to the Action documentation (red, in a block, at the start, etc.)
</li><li>Implement an option to choose whether to use weak refs (which would be set for the coercion model) or strong ones (set by default, so that the "normal", previous behaviour remains the default). It basically means passing an additional boolean that determines how the underlying set is stored, is saved, and modifies the behaviour of underlying_set() (i.e. whether to add () or not)
</li></ul><p>
What does everybody think?
</p>
TicketjpfloriTue, 24 Jan 2012 13:48:33 GMT
https://trac.sagemath.org/ticket/715#comment:143
https://trac.sagemath.org/ticket/715#comment:143
<p>
Note to myself: I could use type(E) rather than importing the <code>AbelianGroupSoLong...</code> type, as Simon did in <a class="closed ticket" href="https://trac.sagemath.org/ticket/12313" title="defect: Fix yet another memory leak caused by caching of coercion data (closed: fixed)">#12313</a>.
</p>
<p>
(type(E) is not the Abelian... type, but the memory leak can be tested with it as well)
</p>
<p>
This should make the example more understandable.
</p>
<p>
The same remark applies to the ticket about homsets.
</p>
TicketSimonKingTue, 24 Jan 2012 15:31:23 GMT
https://trac.sagemath.org/ticket/715#comment:144
https://trac.sagemath.org/ticket/715#comment:144
<p>
Replying to <a class="ticket" href="https://trac.sagemath.org/ticket/715#comment:142" title="Comment 142">jpflori</a>:
</p>
<blockquote class="citation">
<p>
With the current implementation, Actions always use a weak ref for the underlying set, so that it can and will be garbage collected if it is not strongly referenced elsewhere.
</p>
<p>
You illustrate and mention that in some examples in action.pyx.
</p>
<p>
You also modify an example involving Action and <code>MatrixSpace</code> to make sure that no gc occurs.
</p>
</blockquote>
<p>
Or rather: That it does not occur too late.
</p>
<blockquote class="citation">
<p>
I do not think this is the right solution: the user should be able to use Action as before (and anyway, it does not feel right to me that you can create something that can magically disappear).
</p>
</blockquote>
<p>
I believe that it is fine. Namely, what use would an action have if you do not have any other strong reference to the underlying set S?
</p>
<p>
That's to say: you forgot S and <em>all</em> of its elements. But what use would an action on S be if you cannot even provide a single element of S?
</p>
<blockquote class="citation">
<p>
You could also argue that nobody actually uses Actions directly (I don't, for example :) ), and those who do will have to be careful.
</p>
</blockquote>
<p>
I think so.
</p>
<blockquote class="citation">
<ul><li>Add a big fat warning to the Action documentation (red, in a block, at the start, etc.)
</li></ul></blockquote>
<p>
OK, that would need more than the short remarks in my added examples.
</p>
<blockquote class="citation">
<ul><li>Implement an option to choose whether to use weak refs (which would be set for the coercion model) or strong ones (set by default, so that the "normal", previous behaviour remains the default). It basically means passing an additional boolean that determines how the underlying set is stored, is saved, and modifies the behaviour of underlying_set() (i.e. whether to add () or not)
</li></ul></blockquote>
<p>
One could store the underlying set S either by
</p>
<pre class="wiki"> self.S = weakref.ref(S)
</pre><p>
resulting in a weak reference, or by
</p>
<pre class="wiki"> self.S = ConstantFunction(S)
</pre><p>
resulting in a strong reference.
</p>
<p>
The advantage is that <code>underlying_set()</code> could remain as it is. In particular, we don't need to make the syntax (<code>return self.S</code> versus <code>return self.S()</code>) depend on any parameter used during initialisation. Note that calling a <code>ConstantFunction</code> takes almost no time.
</p>
<p>
However, it might even be faster to do
</p>
<pre class="wiki">    if self.use_weak_references:
        return self.S()
    else:
        return self.S
</pre><p>
where <code>self.use_weak_references</code> is a <code>cdef bint</code> parameter assigned during initialisation.
</p>
<p>
I can't test it right now.
</p>
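<p>
The flag-based variant discussed above can be tried out on a toy class. This is a minimal pure-Python 3 sketch, not Sage's <code>Action</code>: <code>ToyAction</code> and <code>ToySet</code> are hypothetical, the flag name <code>use_weak_references</code> is taken from the comment above, and raising an error on a dead reference (instead of returning <code>None</code>) is my own choice for the sketch. It shows that in weak mode the underlying set survives only as long as something else holds a strong reference to it.
</p>

```python
import gc
import weakref


class ToyAction:
    """Hypothetical action that may hold its underlying set S weakly."""

    def __init__(self, G, S, use_weak_references=False):
        self.G = G
        self.use_weak_references = use_weak_references
        if use_weak_references:
            self._S = weakref.ref(S)  # calling it returns S, or None once S is gone
        else:
            self._S = S               # ordinary strong reference

    def underlying_set(self):
        if self.use_weak_references:
            S = self._S()
            if S is None:
                raise RuntimeError("the underlying set has been garbage collected")
            return S
        return self._S


class ToySet:
    """Weak-referenceable stand-in for a parent."""


S = ToySet()
strong = ToyAction("ZZ", S)
weak = ToyAction("ZZ", ToySet(), use_weak_references=True)
gc.collect()
print(strong.underlying_set() is S)  # True
try:
    weak.underlying_set()
except RuntimeError as err:
    print(err)  # the underlying set has been garbage collected
```

<p>
The <code>ConstantFunction</code> alternative would instead store a zero-argument callable returning S in the strong case, so that <code>underlying_set()</code> could always just call <code>self._S()</code> without checking a flag.
</p>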
TicketjpfloriWed, 08 Feb 2012 14:39:45 GMT
https://trac.sagemath.org/ticket/715#comment:145
https://trac.sagemath.org/ticket/715#comment:145
<p>
I'll have some time to work on this today or Friday.
</p>
<p>
Any progress on your side?
</p>
<p>
For example, implementing my preferred solution with the "use_weak_references"? :)
</p>
TicketSimonKingWed, 08 Feb 2012 16:53:22 GMT
https://trac.sagemath.org/ticket/715#comment:146
https://trac.sagemath.org/ticket/715#comment:146
<p>
Replying to <a class="ticket" href="https://trac.sagemath.org/ticket/715#comment:145" title="Comment 145">jpflori</a>:
</p>
<blockquote class="citation">
<p>
I'll have some time to work on this today or Friday.
</p>
<p>
Any progress on your side?
</p>
<p>
For example, implementing my preferred solution with the "use_weak_references"? :)
</p>
</blockquote>
<p>
No. Currently, I am focusing on computing Ext algebras of finite-dimensional path algebra quotients (that's what I get paid for), and on fixing my old group cohomology spkg (which wouldn't work with the most recent version of Sage for at least three independent reasons).
</p>
TicketjpfloriFri, 10 Feb 2012 15:00:49 GMTkeywords changed
https://trac.sagemath.org/ticket/715#comment:147
https://trac.sagemath.org/ticket/715#comment:147
<ul>
<li><strong>keywords</strong>
<em>Cernay2012</em> added
</li>
</ul>
<p>
I've posted a first draft of a patch that makes the use of weakrefs optional (I haven't added documentation yet, nor changed the tests added or modified by Simon).
</p>
<p>
I've surely forgotten some places where actions are defined, etc.
</p>
<p>
After doing that, I've begun thinking that Simon is right and that Actions are too closely tied to the coercion system for this approach to be valid.
</p>
<p>
Maybe using weakrefs all the time is good enough, even though objects can become unusable.
</p>
TicketjpfloriFri, 10 Feb 2012 16:58:07 GMTattachment set
https://trac.sagemath.org/ticket/715
https://trac.sagemath.org/ticket/715
<ul>
<li><strong>attachment</strong>
set to <em>trac715_optional_weakref.patch</em>
</li>
</ul>
<p>
Make use of weakrefs optional: off by default, on for coercion
</p>
TicketjpfloriFri, 10 Feb 2012 17:15:39 GMT
https://trac.sagemath.org/ticket/715#comment:148
https://trac.sagemath.org/ticket/715#comment:148
<p>
Some further thoughts:
</p>
<ul><li>Currently my piece of code does not take into account classes overriding get_action
</li></ul><ul><li>For this approach to be consistent, I guess that the get and discover action methods should return strongly referenced actions by default, so we would also have to add optional arguments to all the get and discover action methods...
</li></ul>
TicketjpfloriFri, 10 Feb 2012 17:28:15 GMT
https://trac.sagemath.org/ticket/715#comment:149
https://trac.sagemath.org/ticket/715#comment:149
<p>
This last idea won't really be consistent anyway, because the get_action function caches its result in _action_hash...
</p>
<p>
So I'm now quite convinced that one should use weak refs all the time and that documenting this is sufficient.
</p>
TicketjpfloriWed, 07 Mar 2012 13:11:29 GMT
https://trac.sagemath.org/ticket/715#comment:150
https://trac.sagemath.org/ticket/715#comment:150
<p>
I'm finally trying to add some documentation to this ticket and realized that in the matrix.action file you give the usual spiel about underlying sets eventually getting garbage collected.
</p>
<p>
However, this is not the case in your examples, for the good reason that matrix spaces are cached.
</p>
<p>
I'll try to provide an example where matrices act on something that is not cached, and that earns us a new ticket where your constructions should be used to cache objects :)
</p>
TicketjpfloriFri, 09 Mar 2012 12:17:23 GMTattachment set
https://trac.sagemath.org/ticket/715
https://trac.sagemath.org/ticket/715
<ul>
<li><strong>attachment</strong>
set to <em>trac_715-reviewer.patch</em>
</li>
</ul>
<p>
Reviewer patch; added doc
</p>
TicketjpfloriFri, 09 Mar 2012 12:19:46 GMTstatus changed
https://trac.sagemath.org/ticket/715#comment:151
https://trac.sagemath.org/ticket/715#comment:151
<ul>
<li><strong>status</strong>
changed from <em>needs_info</em> to <em>needs_review</em>
</li>
</ul>
TicketjpfloriFri, 09 Mar 2012 12:22:46 GMTdescription changed
https://trac.sagemath.org/ticket/715#comment:152
https://trac.sagemath.org/ticket/715#comment:152
<ul>
<li><strong>description</strong>
modified (<a href="/ticket/715?action=diff&version=152">diff</a>)
</li>
</ul>
<p>
I've added warning blocks at the top of the files modified by Simon (and fixed minor typos, hopefully without introducing new ones). The generated doc looks OK.
</p>
<p>
All tests pass on my computer, and the numerical evidence we've gathered so far indicates that there is no speed regression.
</p>
<p>
If Simon or someone else could have a look at my "reviewer patch", this can be put to positive review.
</p>
<p>
Personally, I'm happy with Simon's patches.
</p>
TicketSimonKingFri, 09 Mar 2012 12:39:11 GMTstatus changed; reviewer set
https://trac.sagemath.org/ticket/715#comment:153
https://trac.sagemath.org/ticket/715#comment:153
<ul>
<li><strong>status</strong>
changed from <em>needs_review</em> to <em>positive_review</em>
</li>
<li><strong>reviewer</strong>
set to <em>Jean-Pierre Flori</em>
</li>
</ul>
<p>
Hi Jean-Pierre,
</p>
<p>
your reviewer patch looks fine to me! Thank you for fixing the typos and explaining things a bit clearer!
</p>
<p>
So, I change it into "positive review", naming you as a reviewer.
</p>
TicketjpfloriFri, 09 Mar 2012 12:41:49 GMT
https://trac.sagemath.org/ticket/715#comment:154
https://trac.sagemath.org/ticket/715#comment:154
<p>
Great!
</p>
<p>
And sorry for the delay. I'll try to tackle the related tickets this afternoon.
</p>
TicketjdemeyerSat, 10 Mar 2012 09:12:10 GMTstatus, dependencies changed
https://trac.sagemath.org/ticket/715#comment:155
https://trac.sagemath.org/ticket/715#comment:155
<ul>
<li><strong>status</strong>
changed from <em>positive_review</em> to <em>needs_work</em>
</li>
<li><strong>dependencies</strong>
changed from <em>#9138, #11900</em> to <em>#9138, #11900, #11599</em>
</li>
</ul>
<p>
This seems to conflict with <a class="closed ticket" href="https://trac.sagemath.org/ticket/11599" title="enhancement: Wrap fan morphism in toric morphism (closed: fixed)">#11599</a>. With <a class="closed ticket" href="https://trac.sagemath.org/ticket/11599" title="enhancement: Wrap fan morphism in toric morphism (closed: fixed)">#11599</a> applied, I get doctest errors:
</p>
<pre class="wiki">sage -t -force_lib devel/sage/sage/structure/coerce_dict.pyx
**********************************************************************
File "/mnt/usb1/scratch/jdemeyer/merger/sage-5.0.beta8/devel/sage-main/sage/structure/coerce_dict.pyx", line 210:
sage: from sage.schemes.generic.homset import SchemeHomsetModule_abelian_variety_coordinates_field
Exception raised:
Traceback (most recent call last):
File "/mnt/usb1/scratch/jdemeyer/merger/sage-5.0.beta8/local/bin/ncadoctest.py", line 1231, in run_one_test
self.run_one_example(test, example, filename, compileflags)
File "/mnt/usb1/scratch/jdemeyer/merger/sage-5.0.beta8/local/bin/sagedoctest.py", line 38, in run_one_example
OrigDocTestRunner.run_one_example(self, test, example, filename, compileflags)
File "/mnt/usb1/scratch/jdemeyer/merger/sage-5.0.beta8/local/bin/ncadoctest.py", line 1172, in run_one_example
compileflags, 1) in test.globs
File "<doctest __main__.example_3[33]>", line 1, in <module>
from sage.schemes.generic.homset import SchemeHomsetModule_abelian_variety_coordinates_field###line 210:
sage: from sage.schemes.generic.homset import SchemeHomsetModule_abelian_variety_coordinates_field
ImportError: cannot import name SchemeHomsetModule_abelian_variety_coordinates_field
**********************************************************************
File "/mnt/usb1/scratch/jdemeyer/merger/sage-5.0.beta8/devel/sage-main/sage/structure/coerce_dict.pyx", line 211:
sage: LE = [x for x in gc.get_objects() if isinstance(x,SchemeHomsetModule_abelian_variety_coordinates_field)]
Exception raised:
Traceback (most recent call last):
File "/mnt/usb1/scratch/jdemeyer/merger/sage-5.0.beta8/local/bin/ncadoctest.py", line 1231, in run_one_test
self.run_one_example(test, example, filename, compileflags)
File "/mnt/usb1/scratch/jdemeyer/merger/sage-5.0.beta8/local/bin/sagedoctest.py", line 38, in run_one_example
OrigDocTestRunner.run_one_example(self, test, example, filename, compileflags)
File "/mnt/usb1/scratch/jdemeyer/merger/sage-5.0.beta8/local/bin/ncadoctest.py", line 1172, in run_one_example
compileflags, 1) in test.globs
File "<doctest __main__.example_3[34]>", line 1, in <module>
LE = [x for x in gc.get_objects() if isinstance(x,SchemeHomsetModule_abelian_variety_coordinates_field)]###line 211:
sage: LE = [x for x in gc.get_objects() if isinstance(x,SchemeHomsetModule_abelian_variety_coordinates_field)]
NameError: name 'SchemeHomsetModule_abelian_variety_coordinates_field' is not defined
**********************************************************************
File "/mnt/usb1/scratch/jdemeyer/merger/sage-5.0.beta8/devel/sage-main/sage/structure/coerce_dict.pyx", line 212:
sage: len(LE) # indirect doctest
Exception raised:
Traceback (most recent call last):
File "/mnt/usb1/scratch/jdemeyer/merger/sage-5.0.beta8/local/bin/ncadoctest.py", line 1231, in run_one_test
self.run_one_example(test, example, filename, compileflags)
File "/mnt/usb1/scratch/jdemeyer/merger/sage-5.0.beta8/local/bin/sagedoctest.py", line 38, in run_one_example
OrigDocTestRunner.run_one_example(self, test, example, filename, compileflags)
File "/mnt/usb1/scratch/jdemeyer/merger/sage-5.0.beta8/local/bin/ncadoctest.py", line 1172, in run_one_example
compileflags, 1) in test.globs
File "<doctest __main__.example_3[35]>", line 1, in <module>
len(LE) # indirect doctest###line 212:
sage: len(LE) # indirect doctest
NameError: name 'LE' is not defined
**********************************************************************
</pre>
TicketjpfloriTue, 20 Mar 2012 17:25:15 GMT
https://trac.sagemath.org/ticket/715#comment:156
https://trac.sagemath.org/ticket/715#comment:156
<p>
<code>SchemeHomsetModule_abelian_variety_coordinates_field</code> was indeed renamed to <code>SchemeHomset_points_abelian_variety_field</code> in <a class="closed ticket" href="https://trac.sagemath.org/ticket/11599" title="enhancement: Wrap fan morphism in toric morphism (closed: fixed)">#11599</a>.
</p>
<p>
We have two solutions:
</p>
<ul><li>do the same renaming in the doctests here
</li><li>use the <code>EllipticCurve</code> class, which provides basically the same test (that's the one I originally pointed out) and which I find more explicit.
</li></ul><p>
I'll provide a patch for this second solution.
</p>
TicketjpfloriTue, 20 Mar 2012 17:33:23 GMT
https://trac.sagemath.org/ticket/715#comment:157
https://trac.sagemath.org/ticket/715#comment:157
<p>
Except that the changes introduced in <a class="closed ticket" href="https://trac.sagemath.org/ticket/11599" title="enhancement: Wrap fan morphism in toric morphism (closed: fixed)">#11599</a> seem to break the work done here by reintroducing some caching...
</p>
TicketjpfloriTue, 20 Mar 2012 17:34:20 GMT
https://trac.sagemath.org/ticket/715#comment:158
https://trac.sagemath.org/ticket/715#comment:158
<p>
More precisely, both of my proposed solutions fix the import error (with EllipticCurve_finite_field for the second one), but then LE is still of length 50, so no garbage collection occurred.
</p>
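<p>
The failing doctest follows a common leak-check pattern: create many objects, drop all references to them, run the garbage collector, and count how many instances <code>gc.get_objects()</code> still finds; with the leak, all 50 survive. The pure-Python 3 sketch below reproduces that pattern. <code>ToyCurve</code> and <code>make_curves</code> are hypothetical, and a plain dict versus a <code>weakref.WeakValueDictionary</code> merely plays the role of the coercion and homset caches; this is not the actual doctest from coerce_dict.pyx.
</p>

```python
import gc
import weakref


class ToyCurve:
    """Hypothetical stand-in for a parent such as an elliptic curve."""


def make_curves(n, cache):
    """Create n curves, remember each in `cache`, keep no other reference."""
    for i in range(n):
        cache[i] = ToyCurve()


def count_live_curves():
    gc.collect()
    return len([x for x in gc.get_objects() if isinstance(x, ToyCurve)])


strong_cache = {}                            # like the old, strong caches
make_curves(50, strong_cache)
print(count_live_curves())                   # 50: the cache keeps them all alive

weak_cache = weakref.WeakValueDictionary()   # weak references instead
strong_cache.clear()
make_curves(50, weak_cache)
print(count_live_curves())                   # 0: nothing else holds them
```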
TicketjpfloriTue, 20 Mar 2012 17:39:38 GMT
https://trac.sagemath.org/ticket/715#comment:159
https://trac.sagemath.org/ticket/715#comment:159
<p>
Applying <a class="closed ticket" href="https://trac.sagemath.org/ticket/11521" title="defect: Use weak references to cache homsets (closed: fixed)">#11521</a> fixes the problem again, so I guess that this ticket and <a class="closed ticket" href="https://trac.sagemath.org/ticket/11521" title="defect: Use weak references to cache homsets (closed: fixed)">#11521</a> should be merged together, as they depend on each other.
</p>
TicketjpfloriTue, 20 Mar 2012 17:44:22 GMT
https://trac.sagemath.org/ticket/715#comment:160
https://trac.sagemath.org/ticket/715#comment:160
<p>
Here comes a patch.
</p>
TicketjpfloriTue, 20 Mar 2012 17:45:04 GMTattachment set
https://trac.sagemath.org/ticket/715
https://trac.sagemath.org/ticket/715
<ul>
<li><strong>attachment</strong>
set to <em>trac_715-rebase_11599.patch</em>
</li>
</ul>
<p>
Rebase on top of <a class="closed ticket" href="https://trac.sagemath.org/ticket/11599" title="enhancement: Wrap fan morphism in toric morphism (closed: fixed)">#11599</a>, now circularly depends on <a class="closed ticket" href="https://trac.sagemath.org/ticket/11521" title="defect: Use weak references to cache homsets (closed: fixed)">#11521</a>
</p>
TicketjpfloriTue, 20 Mar 2012 17:46:24 GMTstatus, dependencies, description changed
https://trac.sagemath.org/ticket/715#comment:161
https://trac.sagemath.org/ticket/715#comment:161
<ul>
<li><strong>status</strong>
changed from <em>needs_work</em> to <em>needs_review</em>
</li>
<li><strong>dependencies</strong>
changed from <em>#9138, #11900, #11599</em> to <em>#9138, #11900, #11599, #11521</em>
</li>
<li><strong>description</strong>
modified (<a href="/ticket/715?action=diff&version=161">diff</a>)
</li>
</ul>
TicketdavidloefflerMon, 26 Mar 2012 13:23:16 GMT
https://trac.sagemath.org/ticket/715#comment:162
https://trac.sagemath.org/ticket/715#comment:162
<p>
A data point that might be helpful: all doctests pass on 5.0.beta10 on 64-bit Linux with qseries
</p>
<pre class="wiki">trac715_one_triple_dict.patch
trac_715-reviewer.patch
trac_715-rebase_11599.patch
trac11521_triple_homset.patch
trac_11521-reviewer.patch
</pre><p>
What is there here that still needs review? I can confirm that the change in jpflori's reviewer patch does not affect the doctest, in the sense that the new patched doctest fails without this ticket applied but succeeds with it. Is this ready to go in?
</p>
TicketSimonKingMon, 26 Mar 2012 13:32:07 GMT
https://trac.sagemath.org/ticket/715#comment:163
https://trac.sagemath.org/ticket/715#comment:163
<p>
Replying to <a class="ticket" href="https://trac.sagemath.org/ticket/715#comment:162" title="Comment 162">davidloeffler</a>:
</p>
<blockquote class="citation">
<p>
What is there here that still needs review? I can confirm that the change in jpflori's reviewer patch does not affect the doctest, in the sense that the new patched doctest fails without this ticket applied but succeeds with it. Is this ready to go in?
</p>
</blockquote>
<p>
From my perspective, it is. But I think I am not entitled to set it to positive review, since Jean-Pierre did not explicitly state that he gives his OK.
</p>
TicketjpfloriMon, 26 Mar 2012 13:34:44 GMT
https://trac.sagemath.org/ticket/715#comment:164
https://trac.sagemath.org/ticket/715#comment:164
<p>
Oh, that's my bad, I just wanted to be sure that Simon was ok with my rebase... (and did not want to set it back to positive review because I did the rebase myself)
</p>
<p>
Sorry about that!
</p>
TicketjpfloriMon, 26 Mar 2012 13:35:54 GMTstatus changed
https://trac.sagemath.org/ticket/715#comment:165
https://trac.sagemath.org/ticket/715#comment:165
<ul>
<li><strong>status</strong>
changed from <em>needs_review</em> to <em>positive_review</em>
</li>
</ul>
<p>
And I'm putting the ticket back to positive review because the three of us seem happy with it.
</p>
TicketdavidloefflerMon, 26 Mar 2012 13:41:13 GMTdescription changed
https://trac.sagemath.org/ticket/715#comment:166
https://trac.sagemath.org/ticket/715#comment:166
<ul>
<li><strong>description</strong>
modified (<a href="/ticket/715?action=diff&version=166">diff</a>)
</li>
</ul>
TicketjdemeyerSun, 01 Apr 2012 19:20:58 GMT
https://trac.sagemath.org/ticket/715#comment:167
https://trac.sagemath.org/ticket/715#comment:167
<p>
Bad news...
</p>
<p>
Applying <a class="attachment" href="https://trac.sagemath.org/attachment/ticket/715/trac715_one_triple_dict.patch" title="Attachment 'trac715_one_triple_dict.patch' in Ticket #715">trac715_one_triple_dict.patch</a><a class="trac-rawlink" href="https://trac.sagemath.org/raw-attachment/ticket/715/trac715_one_triple_dict.patch" title="Download"></a> causes Segmentation Faults on startup on 32-bit systems.
</p>
<p>
$ ./sage --python -v -c 'import sage.all'
</p>
<pre class="wiki">[...]
import sage.libs.singular.function_factory # precompiled from /home/jdemeyer/silius/sage-5.0.beta12-gcc-32/local/lib/python2.7/site-packages/sage/libs/singular/function_factory.pyc
import sage.rings.polynomial.multi_polynomial_libsingular # dynamically loaded from /home/jdemeyer/silius/sage-5.0.beta12-gcc-32/local/lib/python2.7/site-packages/sage/rings/polynomial/multi_polynomial_libsingular.so
/home/jdemeyer/silius/sage-5.0.beta12-gcc-32/local/lib/libcsage.so(print_backtrace+0x4c)[0xf9f7c74]
/home/jdemeyer/silius/sage-5.0.beta12-gcc-32/local/lib/libcsage.so(sigdie+0x34)[0xf9f7ce0]
/home/jdemeyer/silius/sage-5.0.beta12-gcc-32/local/lib/libcsage.so(sage_signal_handler+0x20c)[0xf9f77d4]
[0x100364]
[0x10bb81d0]
/home/jdemeyer/silius/sage-5.0.beta12-gcc-32/local/lib/python2.7/site-packages/sage/structure/coerce.so(+0xb994)[0xe6ab994]
/home/jdemeyer/silius/sage-5.0.beta12-gcc-32/local/lib/python2.7/site-packages/sage/structure/coerce.so(+0x16654)[0xe6b6654]
/home/jdemeyer/silius/sage-5.0.beta12-gcc-32/local/lib/python2.7/site-packages/sage/structure/element.so(__pyx_f_4sage_9structure_7element_7Element__richcmp+0x42c)[0xe735e80]
/home/jdemeyer/silius/sage-5.0.beta12-gcc-32/local/lib/python2.7/site-packages/sage/rings/real_mpfr.so(+0xcf70)[0xd6bcf70]
/home/jdemeyer/silius/sage-5.0.beta12-gcc-32/local/lib/libpython2.7.so.1.0(+0x90794)[0xfeb0794]
/home/jdemeyer/silius/sage-5.0.beta12-gcc-32/local/lib/libpython2.7.so.1.0(PyObject_RichCompare+0x84)[0xfeb2d8c]
/home/jdemeyer/silius/sage-5.0.beta12-gcc-32/local/lib/libpython2.7.so.1.0(PyEval_EvalFrameEx+0x2b60)[0xff1cd10]
/home/jdemeyer/silius/sage-5.0.beta12-gcc-32/local/lib/libpython2.7.so.1.0(PyEval_EvalFrameEx+0x78a4)[0xff21a54]
/home/jdemeyer/silius/sage-5.0.beta12-gcc-32/local/lib/libpython2.7.so.1.0(PyEval_EvalCodeEx+0x964)[0xff2240c]
/home/jdemeyer/silius/sage-5.0.beta12-gcc-32/local/lib/libpython2.7.so.1.0(+0x73138)[0xfe93138]
/home/jdemeyer/silius/sage-5.0.beta12-gcc-32/local/lib/libpython2.7.so.1.0(PyObject_Call+0x74)[0xfe63900]
/home/jdemeyer/silius/sage-5.0.beta12-gcc-32/local/lib/libpython2.7.so.1.0(+0x52dac)[0xfe72dac]
/home/jdemeyer/silius/sage-5.0.beta12-gcc-32/local/lib/libpython2.7.so.1.0(PyObject_Call+0x74)[0xfe63900]
[...]
------------------------------------------------------------------------
Unhandled SIGSEGV: A segmentation fault occurred in Sage.
This probably occurred because a *compiled* component of Sage has a bug
in it and is not properly wrapped with sig_on(), sig_off(). You might
want to run Sage under gdb with 'sage -gdb' to debug this.
Sage will now terminate.
------------------------------------------------------------------------
/home/jdemeyer/silius/sage-5.0.beta12-gcc-32/spkg/bin/sage: line 464: 16347 Segmentation fault python "$@"
</pre>
TicketjdemeyerSun, 01 Apr 2012 19:21:30 GMTstatus changed
https://trac.sagemath.org/ticket/715#comment:168
https://trac.sagemath.org/ticket/715#comment:168
<ul>
<li><strong>status</strong>
changed from <em>positive_review</em> to <em>needs_work</em>
</li>
</ul>
TicketjpfloriMon, 02 Apr 2012 07:31:18 GMT
https://trac.sagemath.org/ticket/715#comment:169
https://trac.sagemath.org/ticket/715#comment:169
<p>
Too bad...
I don't have access to 32-bit CPUs, but I'll try to set up a VirtualBox installation.
I'll also ask William for an account on skynet.
</p>
TicketjpfloriWed, 11 Apr 2012 12:39:31 GMT
https://trac.sagemath.org/ticket/715#comment:170
https://trac.sagemath.org/ticket/715#comment:170
<p>
No response from William yet, but I've finally managed to set up a Sage installation on a 32-bit install of Ubuntu 12.04 beta 2 within a virtual machine, and I could reproduce the crash.
Let's now investigate it.
</p>
TicketjpfloriWed, 11 Apr 2012 12:48:22 GMT
https://trac.sagemath.org/ticket/715#comment:171
https://trac.sagemath.org/ticket/715#comment:171
<p>
The segfault is raised in a call to <code>TripleDict.get</code>.
</p>
TicketjpfloriWed, 11 Apr 2012 13:09:40 GMT
https://trac.sagemath.org/ticket/715#comment:172
https://trac.sagemath.org/ticket/715#comment:172
<p>
More precisely, in the line:
</p>
<pre class="wiki">cdef list bucket = <object>PyList_GET_ITEM(all_buckets, h % PyList_GET_SIZE(all_buckets))
</pre>
TicketjpfloriWed, 11 Apr 2012 13:17:43 GMT
https://trac.sagemath.org/ticket/715#comment:173
https://trac.sagemath.org/ticket/715#comment:173
<p>
Putting back the <code>if h<0: h=-h</code> (without really thinking about it) seems to solve the problem.
</p>
TicketjpfloriWed, 11 Apr 2012 13:50:19 GMT
https://trac.sagemath.org/ticket/715#comment:174
https://trac.sagemath.org/ticket/715#comment:174
<p>
The problem seems to be that the C "%" operator returns a result with the same sign as its left operand (the dividend).
</p>
<p>
That is: 14%15 -> 14, but -1%15 -> -1 (whereas Python's % gives -1 % 15 == 14).
</p>
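<p>
For illustration, here is a small Python sketch (the helper <code>c_remainder</code> is hypothetical, not code from the patches) that emulates C99's truncating remainder alongside Python's floored <code>%</code>. With a negative <code>h</code>, the C-style result is negative too; used as an index into the bucket list through <code>PyList_GET_ITEM</code>, which does no bounds checking, such a value would read memory outside the list.
</p>

```python
def c_remainder(a: int, b: int) -> int:
    """Emulate C99 `a % b`: the result takes the sign of the dividend `a`."""
    r = abs(a) % abs(b)
    return -r if a < 0 else r


for h in (14, -1, -16):
    # Python's % is floored (sign follows the divisor); C's is truncated.
    print(f"h={h}  C-style: {c_remainder(h, 15)}  Python: {h % 15}")
```

<p>
For <code>h = -1</code> the C-style remainder is -1 while Python's is 14, so code checked only with Python-level integers can behave differently once the operands are C integers.
</p>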
TicketjpfloriWed, 11 Apr 2012 14:38:20 GMTattachment set
https://trac.sagemath.org/ticket/715
https://trac.sagemath.org/ticket/715
<ul>
<li><strong>attachment</strong>
set to <em>trac_715-modulo.patch</em>
</li>
</ul>
<p>
C modulo operator
</p>
TicketjpfloriWed, 11 Apr 2012 14:38:59 GMTstatus, description changed
https://trac.sagemath.org/ticket/715#comment:175
https://trac.sagemath.org/ticket/715#comment:175
<ul>
<li><strong>status</strong>
changed from <em>needs_work</em> to <em>needs_review</em>
</li>
<li><strong>description</strong>
modified (<a href="/ticket/715?action=diff&version=175">diff</a>)
</li>
</ul>
TicketjpfloriWed, 11 Apr 2012 17:02:43 GMTstatus changed
https://trac.sagemath.org/ticket/715#comment:176
https://trac.sagemath.org/ticket/715#comment:176
<ul>
<li><strong>status</strong>
changed from <em>needs_review</em> to <em>needs_work</em>
</li>
</ul>
<p>
Even though the current patches should be OK, I'll provide a slightly different patch after my monologue at <a class="closed ticket" href="https://trac.sagemath.org/ticket/12313" title="defect: Fix yet another memory leak caused by caching of coercion data (closed: fixed)">#12313</a> to be more consistent.
</p>
TicketSimonKingWed, 11 Apr 2012 19:18:29 GMT
https://trac.sagemath.org/ticket/715#comment:177
https://trac.sagemath.org/ticket/715#comment:177
<p>
Replying to <a class="ticket" href="https://trac.sagemath.org/ticket/715#comment:173" title="Comment 173">jpflori</a>:
</p>
<blockquote class="citation">
<p>
Putting back the if h<0: h=-h (without really thinking about it) seems to solve the problem.
</p>
</blockquote>
<p>
Thank you for tracking that down! I had checked that the Cython modulo operator behaves like the Python one, but apparently my mistake was that I only tested it on Sage integers, not on C types.
</p>
<p>
I wonder whether there is a better way to get rid of the problem. For example: the number h is determined by converting the memory address of an object into <code>Py_ssize_t</code>, which is signed. Isn't there an unsigned <code>Py_size_t</code> (size_t, not ssize_t) as well? Perhaps one should try to use the unsigned type instead? That way one would avoid the problem of a negative remainder while still avoiding the slow-down resulting from the test "<code>if h<0</code>".
</p>
<p>
I would like to test whether that works (next week, though).
</p>
TicketjpfloriWed, 11 Apr 2012 19:42:23 GMT
https://trac.sagemath.org/ticket/715#comment:178
https://trac.sagemath.org/ticket/715#comment:178
<p>
Good idea about the unsigned type.
</p>
<p>
Don't worry about doing it next week; I should be able to test that tomorrow.
</p>
TicketjpfloriThu, 12 Apr 2012 08:09:59 GMT
https://trac.sagemath.org/ticket/715#comment:179
https://trac.sagemath.org/ticket/715#comment:179
<p>
For info,
<code>Py_ssize_t</code> was introduced by PEP 353:
<a class="ext-link" href="http://www.python.org/dev/peps/pep-0353/"><span class="icon"></span>http://www.python.org/dev/peps/pep-0353/</a>
and adopted in Python 2.5.
</p>
<p>
There is no <code>Py_size_t</code>, but I guess that using plain C <code>size_t</code> is OK (the point of <code>Py_ssize_t</code> is to be a signed type of the same size as <code>size_t</code>).
</p>
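<p>
A rough Python model of the <code>size_t</code> idea, under the assumption of a 32-bit platform: a C cast from <code>Py_ssize_t</code> to <code>size_t</code> reinterprets the same bits as an unsigned word (two's complement), so the result of <code>%</code> is always a valid, non-negative bucket index and no explicit <code>if h<0</code> test is needed. The helper names and the bucket count 53 are illustrative only, and the sketch makes no claim about how <code>TripleDict</code> actually computes <code>h</code>.
</p>

```python
WORD_BITS = 32                      # assumption: a 32-bit platform
WORD_MASK = (1 << WORD_BITS) - 1


def as_ssize_t(x: int) -> int:
    """Reinterpret the low WORD_BITS bits of x as a signed (Py_ssize_t-like) value."""
    x &= WORD_MASK
    return x - (1 << WORD_BITS) if x >> (WORD_BITS - 1) else x


def as_size_t(x: int) -> int:
    """Reinterpret x as an unsigned (size_t-like) value, as a C cast would."""
    return x & WORD_MASK


def c_remainder(a: int, b: int) -> int:
    """C99 remainder: the result takes the sign of the dividend."""
    r = abs(a) % abs(b)
    return -r if a < 0 else r


nbuckets = 53                       # illustrative bucket count
h = as_ssize_t(0xF0000000)          # a "hash" whose top bit is set
print(h)                            # negative when viewed as signed
print(c_remainder(h, nbuckets))     # signed C remainder: a negative index
print(as_size_t(h) % nbuckets)      # unsigned remainder: a valid index
```

<p>
The unsigned index differs from Python's floored <code>h % nbuckets</code>; that is harmless, since any deterministic in-range bucket choice works as long as insertion and lookup use the same formula.
</p>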
TicketjpfloriThu, 12 Apr 2012 13:47:31 GMTattachment set
https://trac.sagemath.org/ticket/715
https://trac.sagemath.org/ticket/715
<ul>
<li><strong>attachment</strong>
set to <em>trac_715-one_triple_dict-take2.patch</em>
</li>
</ul>
<p>
Version without fuzz
</p>
TicketjpfloriThu, 12 Apr 2012 13:48:01 GMTattachment set
https://trac.sagemath.org/ticket/715
https://trac.sagemath.org/ticket/715
<ul>
<li><strong>attachment</strong>
set to <em>trac_715-reviewer-take2.patch</em>
</li>
</ul>
TicketjpfloriThu, 12 Apr 2012 13:48:16 GMTattachment set
https://trac.sagemath.org/ticket/715
https://trac.sagemath.org/ticket/715
<ul>
<li><strong>attachment</strong>
set to <em>trac_715-rebase_11599-take2.patch</em>
</li>
</ul>
TicketjpfloriThu, 12 Apr 2012 13:49:07 GMTattachment set
https://trac.sagemath.org/ticket/715
https://trac.sagemath.org/ticket/715
<ul>
<li><strong>attachment</strong>
set to <em>trac_715-size_t-take2.patch</em>
</li>
</ul>
<p>
Use size_t instead of Py_ssize_t for indices used by PyList_GET_ITEM
</p>
TicketjpfloriThu, 12 Apr 2012 13:51:55 GMTstatus, description changed
https://trac.sagemath.org/ticket/715#comment:180
https://trac.sagemath.org/ticket/715#comment:180
<ul>
<li><strong>status</strong>
changed from <em>needs_work</em> to <em>needs_review</em>
</li>
<li><strong>description</strong>
modified (<a href="/ticket/715?action=diff&version=180">diff</a>)
</li>
</ul>
<p>
The current patches seem OK both on my 64-bit system and on the virtual 32-bit system running within it.
At least Sage starts and correctly computes 1+1.
I'm currently running "make ptest" on both systems.
On the latter, this will take an awfully long time.
</p>
<p>
I've also taken the liberty to modify the "reviewer" patch to fix formatting issues (and rebase patches of <a class="closed ticket" href="https://trac.sagemath.org/ticket/12313" title="defect: Fix yet another memory leak caused by caching of coercion data (closed: fixed)">#12313</a> on top of that).
</p>
TicketjpfloriThu, 12 Apr 2012 14:00:17 GMT
https://trac.sagemath.org/ticket/715#comment:181
https://trac.sagemath.org/ticket/715#comment:181
<p>
As I reported in <a class="closed ticket" href="https://trac.sagemath.org/ticket/12313" title="defect: Fix yet another memory leak caused by caching of coercion data (closed: fixed)">#12313</a>:
With the new patches of <a class="closed ticket" href="https://trac.sagemath.org/ticket/715" title="defect: Parents probably not reclaimed due to too much caching (closed: fixed)">#715</a> + <a class="closed ticket" href="https://trac.sagemath.org/ticket/11521" title="defect: Use weak references to cache homsets (closed: fixed)">#11521</a> + <a class="closed ticket" href="https://trac.sagemath.org/ticket/12313" title="defect: Fix yet another memory leak caused by caching of coercion data (closed: fixed)">#12313</a>, Sage starts on both 32 and 64 bits, and "make test" passes on 64 bits (32 bits not finished yet).
</p>
TicketSimonKingFri, 13 Apr 2012 06:02:57 GMT
https://trac.sagemath.org/ticket/715#comment:182
https://trac.sagemath.org/ticket/715#comment:182
<p>
Is Jean-Pierre just a reviewer, or an author as well?
</p>
<p>
Anyway, I am now testing whether the stuff from here plus <a class="closed ticket" href="https://trac.sagemath.org/ticket/11521" title="defect: Use weak references to cache homsets (closed: fixed)">#11521</a> plus <a class="closed ticket" href="https://trac.sagemath.org/ticket/12313" title="defect: Fix yet another memory leak caused by caching of coercion data (closed: fixed)">#12313</a> works for me as well, with size_t. And perhaps I'll also do some timings tomorrow. If Jean-Pierre is an author as well, we could cross-review.
</p>
<p>
And I think I'll also create a combined patch, for each of the three tickets.
</p>
TicketjpfloriFri, 13 Apr 2012 06:47:55 GMT
https://trac.sagemath.org/ticket/715#comment:183
https://trac.sagemath.org/ticket/715#comment:183
<p>
I don't mind being one of the authors as I spent some time on the ticket as well, although you clearly produced most of the code.
And as you point out, it will make you more "legitimate" to set the ticket back to positive review after my last changes.
</p>
<p>
The test run finished in my 32-bit virtual machine and I got 4 failures.
I'm not sure they are related to the tickets here.
They could just be timeouts and issues related to GAP.
I'm rerunning make test, or rather a working equivalent command, with proper logging to check that.
</p>
<p>
Of course, if someone has access to a real 32-bit system, testing would be easier.
</p>
TicketjpfloriFri, 13 Apr 2012 09:14:38 GMT
https://trac.sagemath.org/ticket/715#comment:184
https://trac.sagemath.org/ticket/715#comment:184
<p>
Rerunning the tests within the virtual machine produced (fewer) errors in the same files.
</p>
<p>
Namely:
</p>
<ol><li>A segfault in sage/parallel/decorate.py, instead of a computation being killed for taking too long
</li><li>An error in sage/misc/sagedoc.py caused by a failing search_src_or_doc (?!?)
</li><li>An error in sage/misc/misc.py about an alarm not going off
</li><li>0 errors in sage/structure/parent.pyx, which got killed
</li></ol>
TicketjpfloriFri, 13 Apr 2012 09:44:28 GMT
https://trac.sagemath.org/ticket/715#comment:185
https://trac.sagemath.org/ticket/715#comment:185
<p>
I could reproduce the previous errors in parent.pyx: they stem from a <code>MemoryError</code> and from a failure to evaluate the <code>cython(...)</code> code defining classes because of some <code>IOError</code>, so I'm not sure they are related to the tickets here. It might be because of the environment the tests run within.
</p>
TicketSimonKingFri, 13 Apr 2012 14:16:34 GMT
https://trac.sagemath.org/ticket/715#comment:186
https://trac.sagemath.org/ticket/715#comment:186
<p>
For the record: With the current patches from <a class="closed ticket" href="https://trac.sagemath.org/ticket/715" title="defect: Parents probably not reclaimed due to too much caching (closed: fixed)">#715</a> + <a class="closed ticket" href="https://trac.sagemath.org/ticket/11521" title="defect: Use weak references to cache homsets (closed: fixed)">#11521</a> + <a class="closed ticket" href="https://trac.sagemath.org/ticket/12313" title="defect: Fix yet another memory leak caused by caching of coercion data (closed: fixed)">#12313</a>, all tests pass. But I cannot test on 32 bit, I'm afraid.
</p>
<p>
But the MemoryError looks strange to me. I hope it is unrelated to these patches. Anyway, I'm certainly going to use them for my own work.
</p>
<p>
I still think it would be good to have a combined patch. Anyway, I give a positive review to Jean-Pierre's contribution.
</p>
TicketSimonKingSat, 14 Apr 2012 12:01:23 GMTdescription changed
https://trac.sagemath.org/ticket/715#comment:187
https://trac.sagemath.org/ticket/715#comment:187
<ul>
<li><strong>description</strong>
modified (<a href="/ticket/715?action=diff&version=187">diff</a>)
</li>
</ul>
<p>
I have just attached a combined patch, created by simply folding all patches that were previously to be applied.
</p>
<p>
With <em>only</em> that patch, I obtain a single doctest error:
</p>
<pre class="wiki">sage -t -force_lib "devel/sage/sage/structure/coerce_dict.pyx"
**********************************************************************
File "/mnt/local/king/SAGE/stable/sage-5.0.beta13/devel/sage/sage/structure/coerce_dict.pyx", line 210:
sage: len(LE) # indirect doctest
Expected:
1
Got:
50
</pre><p>
However, this is to be merged together with <a class="closed ticket" href="https://trac.sagemath.org/ticket/11521" title="defect: Use weak references to cache homsets (closed: fixed)">#11521</a>, and with both tickets together the error vanishes (at least on 64 bit). So, from my point of view, it is a positive review, but we should wait for Jean-Pierre's results on 32 bit.
</p>
<p>
For the patchbot:
</p>
<p>
Apply trac_715_combined.patch
</p>
TicketjpfloriTue, 17 Apr 2012 11:33:44 GMTstatus, reviewer, author changed
https://trac.sagemath.org/ticket/715#comment:188
https://trac.sagemath.org/ticket/715#comment:188
<ul>
<li><strong>status</strong>
changed from <em>needs_review</em> to <em>positive_review</em>
</li>
<li><strong>reviewer</strong>
changed from <em>Jean-Pierre Flori</em> to <em>Jean-Pierre Flori, Simon King</em>
</li>
<li><strong>author</strong>
changed from <em>Simon King</em> to <em>Simon King, Jean-Pierre Flori</em>
</li>
</ul>
<p>
The tests we've run on 32 bits seem conclusive, so I'm putting this back to positive review.
The errors I got were due to memory shortage within my virtual machine and were not reproduced on real systems.
</p>
TicketcremonaTue, 17 Apr 2012 19:07:37 GMT
https://trac.sagemath.org/ticket/715#comment:189
https://trac.sagemath.org/ticket/715#comment:189
<p>
I'm just confirming that after applying this patch and the one at <a class="closed ticket" href="https://trac.sagemath.org/ticket/11521" title="defect: Use weak references to cache homsets (closed: fixed)">#11521</a> to 5.0-beta13 on a 32-bit Linux machine, all tests pass.
</p>
TicketjdemeyerTue, 17 Apr 2012 22:23:09 GMTmilestone changed
https://trac.sagemath.org/ticket/715#comment:190
https://trac.sagemath.org/ticket/715#comment:190
<ul>
<li><strong>milestone</strong>
changed from <em>sage-5.0</em> to <em>sage-5.1</em>
</li>
</ul>
TicketjdemeyerSun, 06 May 2012 12:12:31 GMTstatus changed; resolution, merged set
https://trac.sagemath.org/ticket/715#comment:191
https://trac.sagemath.org/ticket/715#comment:191
<ul>
<li><strong>status</strong>
changed from <em>positive_review</em> to <em>closed</em>
</li>
<li><strong>resolution</strong>
set to <em>fixed</em>
</li>
<li><strong>merged</strong>
set to <em>sage-5.1.beta0</em>
</li>
</ul>
TicketjdemeyerThu, 05 Jul 2012 08:40:06 GMTstatus, milestone changed; resolution, merged deleted
https://trac.sagemath.org/ticket/715#comment:192
https://trac.sagemath.org/ticket/715#comment:192
<ul>
<li><strong>status</strong>
changed from <em>closed</em> to <em>new</em>
</li>
<li><strong>resolution</strong>
<em>fixed</em> deleted
</li>
<li><strong>merged</strong>
<em>sage-5.1.beta0</em> deleted
</li>
<li><strong>milestone</strong>
changed from <em>sage-5.1</em> to <em>sage-5.2</em>
</li>
</ul>
<p>
Unmerging this due to unmerging the dependency <a class="closed ticket" href="https://trac.sagemath.org/ticket/11521" title="defect: Use weak references to cache homsets (closed: fixed)">#11521</a>.
</p>
TicketjdemeyerThu, 05 Jul 2012 09:04:24 GMTdependencies changed
https://trac.sagemath.org/ticket/715#comment:193
https://trac.sagemath.org/ticket/715#comment:193
<ul>
<li><strong>dependencies</strong>
changed from <em>#9138, #11900, #11599, #11521</em> to <em>#9138, #11900, #11599, to be merged with #11521</em>
</li>
</ul>
TicketjdemeyerFri, 13 Jul 2012 11:51:52 GMTstatus changed
https://trac.sagemath.org/ticket/715#comment:194
https://trac.sagemath.org/ticket/715#comment:194
<ul>
<li><strong>status</strong>
changed from <em>new</em> to <em>needs_review</em>
</li>
</ul>
TicketjdemeyerFri, 13 Jul 2012 11:52:03 GMTstatus, milestone changed
https://trac.sagemath.org/ticket/715#comment:195
https://trac.sagemath.org/ticket/715#comment:195
<ul>
<li><strong>status</strong>
changed from <em>needs_review</em> to <em>positive_review</em>
</li>
<li><strong>milestone</strong>
changed from <em>sage-5.2</em> to <em>sage-pending</em>
</li>
</ul>
TicketSimonKingWed, 15 Aug 2012 15:18:17 GMTcc changed
https://trac.sagemath.org/ticket/715#comment:196
https://trac.sagemath.org/ticket/715#comment:196
<ul>
<li><strong>cc</strong>
<em>nbruin</em> added
</li>
</ul>
<p>
Nils has stated on sage-devel that he was not (immediately) able to apply the patch to sage-5.3.beta2. Indeed, there was fuzz 2. So I rebased the patch; it should now apply fine.
</p>
TicketSimonKingWed, 15 Aug 2012 15:18:44 GMT
https://trac.sagemath.org/ticket/715#comment:197
https://trac.sagemath.org/ticket/715#comment:197
<p>
I forgot:
</p>
<p>
Apply trac_715_combined.patch
</p>
TicketnbruinWed, 15 Aug 2012 23:51:23 GMTstatus changed
https://trac.sagemath.org/ticket/715#comment:198
https://trac.sagemath.org/ticket/715#comment:198
<ul>
<li><strong>status</strong>
changed from <em>positive_review</em> to <em>needs_work</em>
</li>
</ul>
<p>
When reviewing <a class="closed ticket" href="https://trac.sagemath.org/ticket/12313" title="defect: Fix yet another memory leak caused by caching of coercion data (closed: fixed)">#12313</a>, I observed a possible source of slight leaking (see <a class="ext-link" href="http://trac.sagemath.org/sage_trac/ticket/12313#comment:125"><span class="icon"></span>comment 125</a>):
</p>
<p>
When all <code>KeyRef</code> objects under a certain key in <code>_refcache</code> get deleted, I think you're left with a <code>{<key> : []}</code> entry in <code>_refcache</code>. So I think in <code>TripleDictEraser.__call__</code> you need an extra line:
</p>
<pre class="wiki"> cdef list L = _refcache[k1,k2,k3]
del L[L.index(r)]
if len(L)==0:
del _refcache[k1,k2,k3]
</pre><p>
or whatever is the best way to remove such things.
</p>
<p>
The same applies to <code>MonoDictEraser.__call__</code> on <a class="closed ticket" href="https://trac.sagemath.org/ticket/12313" title="defect: Fix yet another memory leak caused by caching of coercion data (closed: fixed)">#12313</a>, of course.
</p>
<p>
By all means, if you have a good argument why this is not necessary, revert to Positive Review (and I'd be interested in seeing the argument).
</p>
<p>
<strong>unweakreffable keys</strong>
</p>
<p>
Note that currently, any key that doesn't allow weakreffing gets a (permanent, global) strong ref in <code>_refcache</code> in the value list, keyed by their <code>id</code>. That's worse than a normal <code>dict</code>. A possible solution is to have a <code>strongrefcache</code> on the <code>MonoDict</code> or <code>TripleDict</code> itself. Then at least the references disappear when the Dict itself goes.
</p>
<p>
You'd have to ensure that whenever an entry gets deleted from the <code>MonoDict</code> or the <code>TripleDict</code>, any references in <code>strongrefcache</code> to relevant key components get removed too. Especially for <code>TripleDict</code>, this needs to happen in <code>TripleDictEraser</code> too, because if any weakreffable key component gets GCd, the whole entry gets removed, so strong refs to other key components should be released.
</p>
<p>
Of course, it would be better to insist that for <code>TripleDict</code>s, there should be
at least one weakreffable key component and that for <code>MonoDict</code>s only
weakreffable keys are allowed. You might investigate where the offending keys
arise. One place is sage.rings.Ring.ideal (line 495):
</p>
<pre class="wiki"> gens = args
...
first = gens[0]
...
elif self.has_coerce_map_from(first):
gens = first.gens() # we have a ring as argument
</pre><p>
so if you do <code>4*ZZ</code>, this gets called with <code>self=ZZ</code> and <code>first=4</code>. This is
how bare integers end up being used as keys into a <code>MonoDict</code>. Since this gets
stored in <code>ZZ._coerce_from_hash</code>, it's as bad as a permanent reference (we cannot
put a weakref on 4).
</p>
<hr />
<p>
<strong>[EDIT] OBSERVATION:</strong>
Really, it looks like this is trying to detect the rare case of
</p>
<pre class="wiki">R.ideal(S)
</pre><p>
where <code>S</code> is a ring/ideal coercible into <code>R</code> and we're computing <code>S*R</code>, the
extension of <code>S</code> to an ideal of <code>R</code>. Isn't it a little expensive to abuse the
coercion framework for this, expecting it to fail? Can't we use the category
framework for this and do something like
</p>
<pre class="wiki"> elif first in Magmas and self.has_coerce_map_from(first):
gens = first.gens() # we have a ring as argument
</pre><p>
or whatever is an appropriate test to see if first is even a parent that has a
chance of having a coerce map to self?
</p>
<p>
YEP it is. In vanilla 5.0 (so that's even WITH caching)
</p>
<pre class="wiki">sage: R=Rings()
sage: timeit('ZZ.has_coerce_map_from(3)')
625 loops, best of 3: 15.9 µs per loop
sage: timeit('3 in R')
625 loops, best of 3: 6.55 µs per loop
</pre><p>
so we should definitely test the category of the element. The question is: which
category? Ideals are not in <code>Rings()</code> (which are unitary rings), but they are in
<code>CommutativeAdditiveMonoids()</code>. Creation of ideals still works if this works,
though:
</p>
<pre class="wiki">sage: ZZ.has_coerce_map_from(3*ZZ)
False
</pre><p>
so I'm not so sure if that branch ever essentially gets used.
</p>
<p>
This is a separate issue, continued on <a class="needs_work ticket" href="https://trac.sagemath.org/ticket/13374" title="defect: Improve identification of arguments to Ring.ideal (needs_work)">#13374</a>.
</p>
<hr />
<p>
The storing happens in sage.structure.parent (line 1990):
</p>
<pre class="wiki"> if (mor is not None) or _may_cache_none(self, S, "coerce"):
self._coerce_from_hash[S] = mor
</pre><p>
perhaps we should also disallow caching None if S is not weakreffable. Since
valid parents should always be weakreffable, we could perhaps just return None
for <code>has_coerce_map_from</code> for non-weakreffable <code>S</code>.
</p>
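<p>
The suggestion to skip caching for non-weakreffable <code>S</code> can be sketched in plain Python with <code>weakref.WeakKeyDictionary</code>, which raises <code>TypeError</code> for keys such as the integer 4. This is an illustration only: <code>lookup_coerce_map</code>, <code>compute</code> and <code>FakeParent</code> are hypothetical names, and unlike Sage's <code>MonoDict</code> a <code>WeakKeyDictionary</code> compares keys by equality rather than by identity.
</p>

```python
import gc
import weakref


def lookup_coerce_map(cache, S, compute):
    """Return compute(S), caching the result (even None) only if S is weakreffable."""
    try:
        return cache[S]
    except KeyError:
        pass                         # weakreffable, but not cached yet
    except TypeError:
        return compute(S)            # e.g. S == 4: never cache, recompute each time
    mor = compute(S)
    cache[S] = mor                   # the entry disappears together with S
    return mor


class FakeParent:                    # hypothetical weakreffable "parent"
    pass


calls = []


def compute(S):
    calls.append(type(S).__name__)   # record only the type, so S is not kept alive
    return None                      # "no coercion map"


cache = weakref.WeakKeyDictionary()
P = FakeParent()
for S in (P, P, 4, 4):
    lookup_coerce_map(cache, S, compute)
print(calls)                         # FakeParent computed once, int twice
del P
gc.collect()
print(len(cache))                    # 0: the cached None died with P
```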
TicketSimonKingThu, 16 Aug 2012 07:48:52 GMT
https://trac.sagemath.org/ticket/715#comment:199
https://trac.sagemath.org/ticket/715#comment:199
<p>
Replying to <a class="ticket" href="https://trac.sagemath.org/ticket/715#comment:198" title="Comment 198">nbruin</a>:
</p>
<blockquote class="citation">
<p>
When reviewing <a class="closed ticket" href="https://trac.sagemath.org/ticket/12313" title="defect: Fix yet another memory leak caused by caching of coercion data (closed: fixed)">#12313</a> I observed a possible problem for slight leaking
...
By all means, if you have a good argument why this is not necessary, revert to Positive Review (and I'd be interested in seeing the argument).
</p>
</blockquote>
<p>
Yes, it is a potential leak. The argument would be:
</p>
<ul><li><em>If</em> all items indexed by a certain key triple are gone, we are left with three size_t and with one pointer to an empty list, which will not be collected; that's just a few bytes.
</li><li>It is (I believe) quite likely that the same key triple will be used again. Hence, the few bytes will actually be used again.
</li><li>As long as it is not noticeable in a practical computation, I am not sure if it is a good idea to slow deallocation down with a test "if len(L)==0".
</li></ul><p>
OK, that is no more than a heuristic argument. The patch would allow a (I believe) very small leak, for the sake of a (probably) very small speed-up.
</p>
<blockquote class="citation">
<p>
<strong>unweakreffable keys</strong>
</p>
<p>
Note that currently, any key that doesn't allow weakreffing, gets a (permanent, global) strong ref in <code>_refcache</code> in the value list, keyed by their <code>id</code>. That's worse than a normal <code>dict</code>. A possible solution is to have a <code>strongrefcache</code> on the <code>MonoDict</code> or <code>TripleDict</code> itself. Then at least the references disappear when the Dict itself goes.
</p>
</blockquote>
<p>
Hm. I wrote the code quite a long time ago, so I need some time to reconstruct what I was thinking.
</p>
<p>
The data of a <code>TripleDict</code> are stored in buckets. The buckets just provide memory locations of the keys. This makes access to the data very fast; otherwise, one would have to special-case keys that are weak-refable and those that are not. Consequently, the weak references (with callback function) to the keys need to be stored somewhere else: in _refcache. That way, items whose keys got garbage collected can be removed from the cache.
</p>
<p>
But why did I put <em>strong</em> references in _refcache as well? Let (k1,k2,k3) be a key, and assume that k1 is not weak-refable. Assume further that no external reference to k1 is left, but there are external strong references to k2 and k3. If I did not store a strong reference to k1 in _refcache, then k1 would be garbage collected. Since we do not have a weak reference with callback for k1 and since k2 and k3 cannot be collected, the item for (k1,k2,k3) would remain in the <code>TripleDict</code>. Hence, when iterating over the items (and there is existing code that does iterate over the items!), we would meet a reference to k1 <em>after</em> it was garbage collected, which means a segfault.
</p>
<p>
In other words: If k2 and k3 are not collectable and k1 can not be weak-refed, then we must ensure that k1 stays alive. The solution is to keep a strong reference to k1 in _refcache.
</p>
<p>
But now I wonder: Wouldn't it be better to have _refcache not as a global dictionary, but have a separate _refcache for each <code>TripleDict</code>, so that it gets collected if the <code>TripleDict</code> gets collected? Is that your suggestion?
</p>
<p>
I think this would be worth trying.
</p>
<blockquote class="citation">
<p>
You'd have to ensure that whenever an entry gets deleted from the <code>MonoDict</code> or the <code>TripleDict</code>, that any references in <code>strongrefcache</code> to relevant key components get removed too. Especially for <code>TripleDict</code>, this needs to happen in <code>TripleDictEraser</code> too, because if any weakreffable key component gets GCd, the whole entry gets removed, so strong refs to other key components should be released.
</p>
</blockquote>
<p>
As I have pointed out, it is important that weak or strong references are stored in _refcache. But perhaps the items in _refcache should be triples of weak or strong references? If I am not mistaken, if (k1,k2,k3) is a key, then it is uniquely determined by (id(k1),id(k2),id(k3)). We store weak references (if possible) that provide (id(k1),id(k2),id(k3)). Hence, the callback function of the weak reference can simply delete this entry.
</p>
<p>
<span class="underline">Conclusion</span>
</p>
<ul><li>I will check whether the <code>"if len(L)==0"</code> test leads to a slow-down
</li><li>I will try to replace the global _refcache by a dictionary that is local to each <code>TripleDict</code>.
</li><li>I will store the references provided by _refcache in a different form, so that they can more easily be deleted.
</li></ul>
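<p>
The last two points of this plan might look roughly like the following pure-Python sketch. <code>ToyTripleDict</code> is a hypothetical, heavily simplified stand-in (no buckets, no Cython, a plain <code>dict</code> keyed by the id-triple); it only shows a <code>_refcache</code> that is local to each instance and holds, per entry, a triple of references: weak ones with an erasing callback where possible, strong ones otherwise.
</p>

```python
import gc
import weakref


class ToyTripleDict:
    """Hypothetical simplification, not Sage's TripleDict."""

    def __init__(self):
        self._data = {}       # (id(k1), id(k2), id(k3)) -> value
        self._refcache = {}   # same id-triple -> triple of weak or strong refs

    def __setitem__(self, key, value):
        idkey = tuple(id(k) for k in key)
        eraser = self._make_eraser(idkey)
        refs = []
        for k in key:
            try:
                refs.append(weakref.ref(k, eraser))
            except TypeError:             # e.g. an int: keep it alive instead
                refs.append(k)
        self._data[idkey] = value
        self._refcache[idkey] = tuple(refs)

    def __getitem__(self, key):
        return self._data[tuple(id(k) for k in key)]

    def __len__(self):
        return len(self._data)

    def _make_eraser(self, idkey):
        selfref = weakref.ref(self)       # the callback must not keep self alive
        def erase(_ref):
            d = selfref()
            if d is not None:             # drop the entry and all of its refs
                d._data.pop(idkey, None)
                d._refcache.pop(idkey, None)
        return erase


class Key:                                # any weakreffable object
    pass


a, b = Key(), Key()
D = ToyTripleDict()
D[a, b, 4] = "action"
print(D[a, b, 4], len(D))                 # action 1
del a                                     # a weakreffable component dies...
gc.collect()
print(len(D), D._refcache)                # 0 {}: ...so the entry is erased
```

<p>
Because the callbacks only hold a weak reference to the dictionary, dropping a <code>ToyTripleDict</code> also releases the strong references it kept for non-weakreffable key components.
</p>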
TicketSimonKingThu, 16 Aug 2012 10:14:54 GMT
https://trac.sagemath.org/ticket/715#comment:200
https://trac.sagemath.org/ticket/715#comment:200
<p>
Too bad. I tried to add the following to the old patch:
</p>
<div class="wiki-code"><div xmlns="http://www.w3.org/1999/xhtml" class="diff">
<ul class="entries">
<li class="entry">
<h2>
<a>sage/structure/coerce_dict.pxd</a>
</h2>
<pre>diff --git a/sage/structure/coerce_dict.pxd b/sage/structure/coerce_dict.pxd</pre>
<table class="trac-diff inline" summary="Differences" cellspacing="0">
<colgroup><col class="lineno" /><col class="lineno" /><col class="content" /></colgroup>
<thead>
<tr>
<th title="File a/sage/structure/coerce_dict.pxd">
a
</th>
<th title="File b/sage/structure/coerce_dict.pxd">
b
</th>
<td><em></em> </td>
</tr>
</thead>
<tbody class="unmod">
<tr>
<th>1</th><th>1</th><td class="l"><span>cdef class TripleDict:</span></td>
</tr><tr>
<th>2</th><th>2</th><td class="l"><span> cdef Py_ssize_t _size</span></td>
</tr><tr>
<th>3</th><th>3</th><td class="l"><span> cdef buckets</span></td>
</tr>
</tbody><tbody class="add">
<tr class="last first">
<th> </th><th>4</th><td class="r"><ins> cdef dict _refcache</ins></td>
</tr>
</tbody><tbody class="unmod">
<tr>
<th>4</th><th>5</th><td class="l"><span> cdef double threshold</span></td>
</tr><tr>
<th>5</th><th>6</th><td class="l"><span> cdef TripleDictEraser eraser</span></td>
</tr><tr>
<th>6</th><th>7</th><td class="l"><span> cdef get(self, object k1, object k2, object k3)</span></td>
</tr>
</tbody>
</table>
</li>
<li class="entry">
<h2>
<a>sage/structure/coerce_dict.pyx</a>
</h2>
<pre>diff --git a/sage/structure/coerce_dict.pyx b/sage/structure/coerce_dict.pyx</pre>
<table class="trac-diff inline" summary="Differences" cellspacing="0">
<colgroup><col class="lineno" /><col class="lineno" /><col class="content" /></colgroup>
<thead>
<tr>
<th title="File a/sage/structure/coerce_dict.pyx">
a
</th>
<th title="File b/sage/structure/coerce_dict.pyx">
b
</th>
<td><em></em> </td>
</tr>
</thead>
<tbody class="unmod">
<tr>
<th>18</th><th>18</th><td class="l"><span># removing dead references from the cache</span></td>
</tr><tr>
<th>19</th><th>19</th><td class="l"><span>############################################</span></td>
</tr><tr>
<th>20</th><th>20</th><td class="l"><span></span></td>
</tr>
</tbody><tbody class="rem">
<tr class="first">
<th>21</th><th> </th><td class="l"><del>cdef dict _refcache = {}</del></td>
</tr><tr class="last">
<th>22</th><th> </th><td class="l"><del></del></td>
</tr>
</tbody><tbody class="unmod">
<tr>
<th>23</th><th>21</th><td class="l"><span>cdef class TripleDictEraser:</span></td>
</tr><tr>
<th>24</th><th>22</th><td class="l"><span> """</span></td>
</tr><tr>
<th>25</th><th>23</th><td class="l"><span> Erases items from a :class:`TripleDict` when a weak reference becomes</span></td>
</tr>
</tbody>
<tbody class="skipped">
<tr>
<th><a href="#L108">…</a></th>
<th><a href="#L106">…</a></th>
<td><em></em> </td>
</tr>
</tbody>
<tbody class="unmod">
<tr>
<th>108</th><th>106</th><td class="l"><span> del bucket[i:i+4]</span></td>
</tr><tr>
<th>109</th><th>107</th><td class="l"><span> self.D._size -= 1</span></td>
</tr><tr>
<th>110</th><th>108</th><td class="l"><span> break</span></td>
</tr>
</tbody><tbody class="mod">
<tr class="first">
<th>111</th><th> </th><td class="l"><span> cdef list L = _refcache[k1,k2,k3]</span></td>
</tr><tr>
<th>112</th><th> </th><td class="l"><span> del L[L.index(r)]</span></td>
</tr>
<tr>
<th> </th><th>109</th><td class="r"><span> try:</span></td>
</tr><tr>
<th> </th><th>110</th><td class="r"><span> self.D._refcache.__delitem__((k1,k2,k3))</span></td>
</tr><tr>
<th> </th><th>111</th><td class="r"><span> except KeyError:</span></td>
</tr><tr class="last">
<th> </th><th>112</th><td class="r"><span> pass</span></td>
</tr>
</tbody><tbody class="unmod">
<tr>
<th>113</th><th>113</th><td class="l"><span></span></td>
</tr><tr>
<th>114</th><th>114</th><td class="l"><span>cdef class TripleDict:</span></td>
</tr><tr>
<th>115</th><th>115</th><td class="l"><span> """</span></td>
</tr>
</tbody>
<tbody class="skipped">
<tr>
<th><a href="#L432">…</a></th>
<th><a href="#L432">…</a></th>
<td><em></em> </td>
</tr>
</tbody>
<tbody class="unmod">
<tr>
<th>432</th><th>432</th><td class="l"><span> PyList_Append(bucket, h3)</span></td>
</tr><tr>
<th>433</th><th>433</th><td class="l"><span> PyList_Append(bucket, value)</span></td>
</tr><tr>
<th>434</th><th>434</th><td class="l"><span> try:</span></td>
</tr>
</tbody><tbody class="mod">
<tr class="first">
<th>435</th><th> </th><td class="l"><span> PyList_Append(_refcache.setdefault((h1 , h2, h3), []),</span></td>
</tr><tr>
<th>436</th><th> </th><td class="l"><span> KeyedRef(k1,self.eraser,(h1, h2, h3)))</span></td>
</tr>
<tr class="last">
<th> </th><th>435</th><td class="r"><span> ref1 = KeyedRef(k1,self.eraser,(h1, h2, h3))</span></td>
</tr>
</tbody><tbody class="unmod">
<tr>
<th>437</th><th>436</th><td class="l"><span> except TypeError:</span></td>
</tr>
</tbody><tbody class="mod">
<tr class="first">
<th>438</th><th> </th><td class="l"><span> <del>PyList_Append(_refcache.setdefault((h1, h2, h3), []), k1)</del></span></td>
</tr>
<tr class="last">
<th> </th><th>437</th><td class="r"><span> <ins>ref1 = k1</ins></span></td>
</tr>
</tbody><tbody class="unmod">
<tr>
<th>439</th><th>438</th><td class="l"><span> if k2 is not k1:</span></td>
</tr><tr>
<th>440</th><th>439</th><td class="l"><span> try:</span></td>
</tr>
</tbody><tbody class="mod">
<tr class="first">
<th>441</th><th> </th><td class="l"><span> PyList_Append(_refcache.setdefault((h1 , h2, h3), []),</span></td>
</tr><tr>
<th>442</th><th> </th><td class="l"><span> KeyedRef(k2,self.eraser,(h1, h2, h3)))</span></td>
</tr>
<tr class="last">
<th> </th><th>440</th><td class="r"><span> ref2 = KeyedRef(k2,self.eraser,(h1, h2, h3))</span></td>
</tr>
</tbody><tbody class="unmod">
<tr>
<th>443</th><th>441</th><td class="l"><span> except TypeError:</span></td>
</tr>
</tbody><tbody class="mod">
<tr class="first">
<th>444</th><th> </th><td class="l"><span> PyList_Append(_refcache.setdefault((h1, h2, h3), []), k2)</span></td>
</tr><tr>
<th>445</th><th> </th><td class="l"><span> if k3 is not k1 and k3 is not k2:</span></td>
</tr>
<tr>
<th> </th><th>442</th><td class="r"><span> ref2 = k2</span></td>
</tr><tr>
<th> </th><th>443</th><td class="r"><span> else:</span></td>
</tr><tr>
<th> </th><th>444</th><td class="r"><span> ref2 = None</span></td>
</tr><tr class="last">
<th> </th><th>445</th><td class="r"><span> if k3 is not k2 and k3 is not k1:</span></td>
</tr>
</tbody><tbody class="unmod">
<tr>
<th>446</th><th>446</th><td class="l"><span> try:</span></td>
</tr>
</tbody><tbody class="mod">
<tr class="first">
<th>447</th><th> </th><td class="l"><span> PyList_Append(_refcache.setdefault((h1 , h2, h3), []),</span></td>
</tr><tr>
<th>448</th><th> </th><td class="l"><span> KeyedRef(k3,self.eraser,(h1, h2, h3)))</span></td>
</tr>
<tr class="last">
<th> </th><th>447</th><td class="r"><span> ref3 = KeyedRef(k3,self.eraser,(h1, h2, h3))</span></td>
</tr>
</tbody><tbody class="unmod">
<tr>
<th>449</th><th>448</th><td class="l"><span> except TypeError:</span></td>
</tr>
</tbody><tbody class="mod">
<tr class="first">
<th>450</th><th> </th><td class="l"><span> PyList_Append(_refcache.setdefault((h1, h2, h3), []),k3)</span></td>
</tr>
<tr>
<th> </th><th>449</th><td class="r"><span> ref3 = k3</span></td>
</tr><tr>
<th> </th><th>450</th><td class="r"><span> else:</span></td>
</tr><tr>
<th> </th><th>451</th><td class="r"><span> ref3 = None</span></td>
</tr><tr class="last">
<th> </th><th>452</th><td class="r"><span> self._refcache[h1,h2,h3] = (ref1,ref2,ref3)</span></td>
</tr>
</tbody><tbody class="unmod">
<tr>
<th>451</th><th>453</th><td class="l"><span> self._size += 1</span></td>
</tr><tr>
<th>452</th><th>454</th><td class="l"><span></span></td>
</tr><tr>
<th>453</th><th>455</th><td class="l"><span> def __delitem__(self, k):</span></td>
</tr>
</tbody>
</table>
</li>
</ul>
</div></div><p>
However, with the resulting code, the memory leak discussed here reappears!
</p>
<p>
So far, I can only speculate why that has happened. It could be that moving _refcache into the <code>TripleDict</code> created a reference cycle (namely, a <code>TripleDict</code> occurs as an attribute of parents, and the parents occur as references in its _refcache). If a <code>__del__</code> method is involved, the objects in the reference cycle can't be collected, because Python 2's cyclic garbage collector leaves such cycles in <code>gc.garbage</code>.
</p>
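<p>
The following small Python experiment builds a cycle of the shape parent -> TripleDict -> _refcache -> parent. All class names are hypothetical. Reference counting alone cannot free this cycle, but <code>gc.collect()</code> can. Caveat: this reflects Python 3.4+, where PEP 442 lets the cyclic collector also finalize objects that define <code>__del__</code>. Under Python 2, which Sage used at the time, such a cycle would instead end up in <code>gc.garbage</code>.
</p>

```python
import gc
import weakref


class FakeTripleDict:
    def __init__(self):
        self._refcache = {}


class FakeParent:
    def __init__(self):
        self.td = FakeTripleDict()
        # Store the parent strongly in its own dict's refcache, creating
        # the cycle FakeParent -> FakeTripleDict -> _refcache -> FakeParent.
        self.td._refcache[(id(self),)] = self

    def __del__(self):
        pass  # in Python 2, this alone made the cycle uncollectable


def cycle_is_collected():
    """Return (survived refcounting, freed by gc.collect())."""
    was_enabled = gc.isenabled()
    gc.disable()  # keep automatic collections from interfering
    try:
        p = FakeParent()
        alive = weakref.ref(p)
        del p
        survived_refcounting = alive() is not None
        gc.collect()  # an explicit collection still runs while disabled
        return survived_refcounting, alive() is None
    finally:
        if was_enabled:
            gc.enable()
```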
<p>
If this holds true, then the necessary references have to be kept in an <em>external</em> dictionary. But perhaps one can still ensure that the data associated with a <code>TripleDict</code> is removed as soon as that <code>TripleDict</code> gets garbage collected.
</p>
TicketSimonKingThu, 16 Aug 2012 10:34:36 GMT
https://trac.sagemath.org/ticket/715#comment:201
https://trac.sagemath.org/ticket/715#comment:201
<p>
Hm. I tried the alternative idea sketched in the previous post, but the leak is still there. Very strange.
</p>
TicketSimonKingThu, 16 Aug 2012 10:57:49 GMT
https://trac.sagemath.org/ticket/715#comment:202
https://trac.sagemath.org/ticket/715#comment:202
<p>
Aaaah! Now I see! The doc test I am struggling with also fails if only the old patch is applied. Apparently I forgot that <a class="closed ticket" href="https://trac.sagemath.org/ticket/11521" title="defect: Use weak references to cache homsets (closed: fixed)">#11521</a> needs to be applied as well.
</p>
TicketSimonKingThu, 16 Aug 2012 11:15:56 GMT
https://trac.sagemath.org/ticket/715#comment:203
https://trac.sagemath.org/ticket/715#comment:203
<p>
Yessss! <a class="closed ticket" href="https://trac.sagemath.org/ticket/11521" title="defect: Use weak references to cache homsets (closed: fixed)">#11521</a> was missing.
</p>
<p>
OK. I am now testing the new patch, and will hopefully be able to post it soon.
</p>
TicketSimonKingThu, 16 Aug 2012 14:07:02 GMTstatus, description changed
https://trac.sagemath.org/ticket/715#comment:204
https://trac.sagemath.org/ticket/715#comment:204
<ul>
<li><strong>status</strong>
changed from <em>needs_work</em> to <em>needs_review</em>
</li>
<li><strong>description</strong>
modified (<a href="/ticket/715?action=diff&version=204">diff</a>)
</li>
</ul>
<p>
I have attached a new patch that changes how references are kept track of.
</p>
<p>
First of all, as I have explained in my long post today, it is important for speed that the buckets of <code>TripleDict</code> only keep track of the memory locations of the keys. Hence, references (weak or strong, depending on the type of keys) need to be stored somewhere else.
</p>
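<p>
To illustrate why the buckets can get away with storing only addresses, here is a toy table. All names are hypothetical, and Python's <code>id()</code> stands in for the pointer casts used in the Cython code. Lookups compare integers only, so keys are matched by identity. Even unhashable objects such as lists can therefore serve as key components.
</p>

```python
class AddressBuckets:
    """Toy hash table: buckets hold [h1, h2, h3, value, ...] with h = id(key)."""

    def __init__(self, nbuckets=53):
        self.buckets = [[] for _ in range(nbuckets)]

    def _bucket(self, h1, h2, h3):
        # Arbitrary mixing of the three addresses.
        return self.buckets[(h1 + 13 * h2 ^ 503 * h3) % len(self.buckets)]

    def _find(self, bucket, h1, h2, h3):
        for i in range(0, len(bucket), 4):
            # Plain integer comparisons: the keys' __eq__/__hash__ never run.
            if bucket[i] == h1 and bucket[i + 1] == h2 and bucket[i + 2] == h3:
                return i
        return -1

    def set(self, k1, k2, k3, value):
        h1, h2, h3 = id(k1), id(k2), id(k3)
        bucket = self._bucket(h1, h2, h3)
        i = self._find(bucket, h1, h2, h3)
        if i == -1:
            bucket.extend([h1, h2, h3, value])
        else:
            bucket[i + 3] = value

    def get(self, k1, k2, k3):
        h1, h2, h3 = id(k1), id(k2), id(k3)
        bucket = self._bucket(h1, h2, h3)
        i = self._find(bucket, h1, h2, h3)
        if i == -1:
            raise KeyError((k1, k2, k3))
        return bucket[i + 3]
```

<p>
Nothing in this table keeps the keys alive. That is exactly why the weak or strong references have to be kept somewhere else.
</p>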
<p>
Previously, there was a global dictionary that was shared by all <code>TripleDicts</code>. That probably was a bad idea, for the reasons you pointed out. Now, the references are stored in a dictionary that is an attribute of each <code>TripleDict</code>.
</p>
<p>
That has several advantages: In a single <code>TripleDict</code>, each key triple occurs only once. Hence, we don't need to store the references in lists addressed by triples of memory locations, from which they are popped when their referents are garbage collected.
</p>
<p>
Instead, each triple of memory locations points to exactly one triple of references. The triple of references is popped off the dictionary as soon as any weakly referenced member of the key triple is garbage collected. Note that the <code>if len(L)==0:</code> check is therefore no longer needed.
</p>
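<p>
Below is a compact Python model of the scheme described above. Class and attribute names are hypothetical, and a plain <code>dict</code> keyed by id triples stands in for the Cython hash buckets. Each address triple maps to exactly one reference triple in a per-instance <code>_refcache</code>. A <code>KeyedRef</code> callback drops both the value and the reference triple as soon as any weakly referenced key component dies. The callback reaches the dict through a weak reference, so it does not keep the dict alive.
</p>

```python
import weakref


class ToyTripleDict:
    """Toy model: a plain dict keyed by id triples stands in for the buckets."""

    def __init__(self):
        self._data = {}      # (id1, id2, id3) -> value
        self._refcache = {}  # (id1, id2, id3) -> (r1, r2, r3), local to this dict
        selfref = weakref.ref(self)  # the callback must not keep self alive

        def eraser(r):
            d = selfref()
            if d is not None:
                d._data.pop(r.key, None)
                d._refcache.pop(r.key, None)  # also drops strong components

        self._eraser = eraser

    def _ref(self, k, key):
        try:
            return weakref.KeyedRef(k, self._eraser, key)
        except TypeError:  # not weakly referenceable: keep a strong reference
            return k

    def __setitem__(self, k, value):
        k1, k2, k3 = k
        key = (id(k1), id(k2), id(k3))
        self._refcache[key] = (self._ref(k1, key), self._ref(k2, key),
                               self._ref(k3, key))
        self._data[key] = value

    def __getitem__(self, k):
        k1, k2, k3 = k
        try:
            return self._data[id(k1), id(k2), id(k3)]
        except KeyError:
            raise KeyError(k) from None

    def __len__(self):
        return len(self._data)
```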
<p>
Another advantage: If the <code>TripleDict</code> is deallocated, then the strong references associated with the <code>TripleDict</code> will vanish as well, which wouldn't have been the case with the old code.
</p>
<p>
Currently, there is only one bad situation I can think of: Let P be an object that cannot be weakly referenced, has a <code>TripleDict</code> T as an attribute, is used as a key in T, and has a <code>__del__</code> method. Then the reference cycle P->T->T._refcache->P will keep P alive, because Python 2's cyclic garbage collector does not free cycles containing objects with a <code>__del__</code> method. However, if any of these four conditions does not hold, then P can be garbage collected. I think we can take that risk.
</p>
<p>
Is there any question of yours that I forgot to address?
</p>
<p>
I didn't do timings, but I've successfully run the doc tests.
</p>
<p>
Apply trac_715_combined.patch trac_715_local_refcache.patch <a class="closed ticket" href="https://trac.sagemath.org/ticket/11521" title="defect: Use weak references to cache homsets (closed: fixed)">#11521</a>
</p>
TicketSimonKingThu, 16 Aug 2012 14:26:25 GMT
https://trac.sagemath.org/ticket/715#comment:205
https://trac.sagemath.org/ticket/715#comment:205
<p>
The patch bot seems to have a problem. It already times out when testing whether the dependencies are applied!
</p>
TicketnbruinThu, 16 Aug 2012 17:35:00 GMT
https://trac.sagemath.org/ticket/715#comment:206
https://trac.sagemath.org/ticket/715#comment:206
<p>
Excellent! Thank you for the great work. This is incredibly important for so
many parts of Sage.
</p>
<blockquote class="citation">
<p>
Previously, there was a global dictionary, that was shared by all
<code>TripleDicts</code>. That probably was a bad idea, for the reasons you pointed out.
Now, the references are stored in a dictionary that is an attribute of each
<code>TripleDict</code>.
</p>
</blockquote>
<p>
Excellent! I agree with your assessment. I think this addresses all my concerns.
I think this is a useful data structure in general, so can we formalize its
behaviour in the documentation? (rewrite as you see fit)
</p>
<pre class="wiki">TripleDict is a structure like WeakKeyDictionary, optimized for lookup speed.
Keys consist of a triple (k1,k2,k3) and are looked up by identity rather than
equality. The keys are stored by weakrefs if possible. If any one of the
components k1,k2,k3 gets garbage collected, then the entry is removed from the
TripleDict. Key components that do not allow for weakrefs are stored via a
normal refcounted reference. That means that any entry stored using a triple
(k1,k2,k3) with none of the k1,k2,k3 weakreffable behaves as an entry in a
normal dictionary, so its existence in TripleDict prevents it from being garbage
collected.
</pre><blockquote class="citation">
<p>
Another advantage: If the <code>TripleDict</code> is deallocated, then the strong
references associated with the <code>TripleDict</code> will vanish as well, which wouldn't
have been the case with the old code.
</p>
</blockquote>
<p>
AND if an entry gets deleted/garbage collected due to a weakreffed key component
disappearing, we also deref any strongly reffed key components! So I think we
never behave worse than a normal dict in terms of keeping objects alive.
</p>
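<p>
This is easy to check in plain Python (all names hypothetical). Give a non-weakreffable key component a <code>__del__</code> that records when it is freed. Store it strongly next to a <code>KeyedRef</code> to a weakreffable component, and let the callback drop the entry. Under CPython's reference counting, the strong component is freed as soon as the weak one dies.
</p>

```python
import weakref

freed = []
refcache = {}


class Weakable:
    pass


class Strong:
    __slots__ = ("name",)  # no __weakref__ slot, so it cannot be weakly referenced

    def __init__(self, name):
        self.name = name

    def __del__(self):
        freed.append(self.name)


def erase(r):
    # Dropping the entry also drops the strong reference stored next to r.
    refcache.pop(r.key, None)


def store(w, s):
    key = (id(w), id(s))
    refcache[key] = (weakref.KeyedRef(w, erase, key), s)
    return key
```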
<blockquote class="citation">
<p>
Currently, there is only one bad situation I can think of: Let P be an object
that can not be weak-refed, has a <code>TripleDict</code> T as an attribute, is used as a
key in T, and has a <code>__del__</code> method. Then the reference cycle
P->T->T._refcache->P will keep P alive.
</p>
</blockquote>
<p>
I think we're safe there. There are very few <code>__del__</code> definitions in the
Sage library, and they're all associated with interface-type objects. Those
are plain Python classes anyway, so they are weakreffable.
</p>
TicketSimonKingThu, 16 Aug 2012 18:24:18 GMTattachment set
https://trac.sagemath.org/ticket/715
https://trac.sagemath.org/ticket/715
<ul>
<li><strong>attachment</strong>
set to <em>trac_715_specification.patch</em>
</li>
</ul>
<p>
Document the specifications of <code>TripleDict</code>
</p>
TicketSimonKingThu, 16 Aug 2012 18:31:15 GMTdescription changed
https://trac.sagemath.org/ticket/715#comment:207
https://trac.sagemath.org/ticket/715#comment:207
<ul>
<li><strong>description</strong>
modified (<a href="/ticket/715?action=diff&version=207">diff</a>)
</li>
</ul>
<p>
I added sage.structure.coerce_dict to the reference manual, using a slightly modified version of the text you suggested to document the purpose of that module. Note that the text also refers to <a class="closed ticket" href="https://trac.sagemath.org/ticket/11521" title="defect: Use weak references to cache homsets (closed: fixed)">#11521</a>, but this should be fine, as <a class="closed ticket" href="https://trac.sagemath.org/ticket/715" title="defect: Parents probably not reclaimed due to too much caching (closed: fixed)">#715</a> and <a class="closed ticket" href="https://trac.sagemath.org/ticket/11521" title="defect: Use weak references to cache homsets (closed: fixed)">#11521</a> are mutually dependent and will therefore be merged together.
</p>
<p>
Apply trac_715_combined.patch trac_715_local_refcache.patch trac_715_specification.patch <a class="closed ticket" href="https://trac.sagemath.org/ticket/11521" title="defect: Use weak references to cache homsets (closed: fixed)">#11521</a>
</p>
TicketnbruinThu, 16 Aug 2012 22:06:25 GMTstatus, reviewer changed
https://trac.sagemath.org/ticket/715#comment:208
https://trac.sagemath.org/ticket/715#comment:208
<ul>
<li><strong>status</strong>
changed from <em>needs_review</em> to <em>positive_review</em>
</li>
<li><strong>reviewer</strong>
changed from <em>Jean-Pierre Flori, Simon King</em> to <em>Jean-Pierre Flori, Simon King, Nils Bruin</em>
</li>
</ul>
<p>
I'm happy. Positive review. I think the bot gets confused and tries to apply the patches in the wrong order.
</p>
TicketjdemeyerFri, 17 Aug 2012 09:32:45 GMTmilestone changed
https://trac.sagemath.org/ticket/715#comment:209
https://trac.sagemath.org/ticket/715#comment:209
<ul>
<li><strong>milestone</strong>
changed from <em>sage-pending</em> to <em>sage-5.4</em>
</li>
</ul>
TicketSimonKingMon, 20 Aug 2012 10:16:24 GMTstatus changed; work_issues set
https://trac.sagemath.org/ticket/715#comment:210
https://trac.sagemath.org/ticket/715#comment:210
<ul>
<li><strong>status</strong>
changed from <em>positive_review</em> to <em>needs_work</em>
</li>
<li><strong>work_issues</strong>
set to <em>Fix __delitem__</em>
</li>
</ul>
<p>
On <a class="closed ticket" href="https://trac.sagemath.org/ticket/12313" title="defect: Fix yet another memory leak caused by caching of coercion data (closed: fixed)">#12313</a>, I found that the <code>TripleDict.__delitem__</code> method needs to be fixed, because it currently does not update the <code>_refcache</code> dictionary. I'd rather fix this here, which means that this ticket needs work and another review.
</p>
TicketSimonKingMon, 20 Aug 2012 10:33:02 GMTstatus changed; work_issues deleted
https://trac.sagemath.org/ticket/715#comment:211
https://trac.sagemath.org/ticket/715#comment:211
<ul>
<li><strong>status</strong>
changed from <em>needs_work</em> to <em>needs_review</em>
</li>
<li><strong>work_issues</strong>
<em>Fix __delitem__</em> deleted
</li>
</ul>
<p>
Fixed. Keep in mind that the patchbot wouldn't be able to understand that <a class="closed ticket" href="https://trac.sagemath.org/ticket/11521" title="defect: Use weak references to cache homsets (closed: fixed)">#11521</a> needs to be merged as well.
</p>
<p>
Apply trac_715_combined.patch trac_715_local_refcache.patch trac_715_specification.patch <a class="closed ticket" href="https://trac.sagemath.org/ticket/11521" title="defect: Use weak references to cache homsets (closed: fixed)">#11521</a>
</p>
TicketSimonKingMon, 20 Aug 2012 11:05:48 GMT
https://trac.sagemath.org/ticket/715#comment:212
https://trac.sagemath.org/ticket/715#comment:212
<p>
I had to change one detail: with the old patch version, deleting a non-existent item raised a <code>KeyError</code> that named the memory addresses of the key components rather than the key itself. Fixed.
</p>
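<p>
Here is a sketch of a <code>__delitem__</code> that keeps the value store and <code>_refcache</code> in sync. It reports the caller's key instead of its addresses. This is a hypothetical toy class: two plain dicts stand in for the buckets and the reference cache, and strong references are used for brevity.
</p>

```python
class ToyTripleDict:
    def __init__(self):
        self._data = {}      # (id1, id2, id3) -> value
        self._refcache = {}  # (id1, id2, id3) -> references to the keys

    def __setitem__(self, k, value):
        addr = tuple(id(x) for x in k)
        self._refcache[addr] = tuple(k)  # strong refs suffice for this sketch
        self._data[addr] = value

    def __delitem__(self, k):
        addr = tuple(id(x) for x in k)
        try:
            del self._data[addr]
        except KeyError:
            # Name the key the caller passed, not its memory addresses.
            raise KeyError(k) from None
        # Forgetting this step would leave stale references behind.
        del self._refcache[addr]
```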
<p>
Apply trac_715_combined.patch trac_715_local_refcache.patch trac_715_specification.patch <a class="closed ticket" href="https://trac.sagemath.org/ticket/11521" title="defect: Use weak references to cache homsets (closed: fixed)">#11521</a>
</p>
TicketSimonKingTue, 21 Aug 2012 08:19:23 GMTstatus changed; work_issues set
https://trac.sagemath.org/ticket/715#comment:213
https://trac.sagemath.org/ticket/715#comment:213
<ul>
<li><strong>status</strong>
changed from <em>needs_review</em> to <em>needs_work</em>
</li>
<li><strong>work_issues</strong>
set to <em>test activity of weak references if addresses coincide</em>
</li>
</ul>
TicketSimonKingTue, 21 Aug 2012 08:54:12 GMTattachment set
https://trac.sagemath.org/ticket/715
https://trac.sagemath.org/ticket/715
<ul>
<li><strong>attachment</strong>
set to <em>trac_715_local_refcache.patch</em>
</li>
</ul>
<p>
Keep track of references in a local dictionary
</p>
TicketSimonKingTue, 21 Aug 2012 08:55:31 GMTstatus changed; work_issues deleted
https://trac.sagemath.org/ticket/715#comment:214
https://trac.sagemath.org/ticket/715#comment:214
<ul>
<li><strong>status</strong>
changed from <em>needs_work</em> to <em>needs_review</em>
</li>
<li><strong>work_issues</strong>
<em>test activity of weak references if addresses coincide</em> deleted
</li>
</ul>
<p>
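OK, the second patch is updated again. Change: the <code>get()</code> method now tests whether the stored weak references to the keys are still alive before returning a value. This way, a new object that happens to occupy the memory address of a deceased key is not mistaken for that key.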
</p>
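<p>
Why this check matters: once a key component has died, a new object may be allocated at the same address. An address match alone therefore does not prove that the stored key is the one being looked up. Below is a minimal validity test. It assumes each stored reference is either a weak reference or a strongly held object, and the helper name is hypothetical.
</p>

```python
import weakref


def refs_match(refs, keys):
    """Return True if every stored reference still points at the given key."""
    for r, k in zip(refs, keys):
        if isinstance(r, weakref.ref):  # KeyedRef is a subclass of weakref.ref
            if r() is not k:            # dead (None) or a different object
                return False
        elif r is not k:                # strongly stored component
            return False
    return True
```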
<p>
Apply trac_715_combined.patch trac_715_local_refcache.patch trac_715_specification.patch <a class="closed ticket" href="https://trac.sagemath.org/ticket/11521" title="defect: Use weak references to cache homsets (closed: fixed)">#11521</a>
</p>
TicketSimonKingThu, 23 Aug 2012 07:55:18 GMT
https://trac.sagemath.org/ticket/715#comment:215
https://trac.sagemath.org/ticket/715#comment:215
<p>
I have added a patch that fixes a couple of problems discussed at <a class="closed ticket" href="https://trac.sagemath.org/ticket/12313" title="defect: Fix yet another memory leak caused by caching of coercion data (closed: fixed)">#12313</a>. In particular:
</p>
<ul><li>Remove <code>TripleDictIter</code> and replace it with a generator, using the new "yield" statement in Cython.
</li><li>Test whether the references are valid before setting an item.
</li></ul><p>
Apply trac_715_combined.patch trac_715_local_refcache.patch trac_715_safer.patch trac_715_specification.patch
</p>
<p>
And then <a class="closed ticket" href="https://trac.sagemath.org/ticket/11521" title="defect: Use weak references to cache homsets (closed: fixed)">#11521</a>
</p>
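<p>
Iteration via a generator, in place of a separate iterator class, might look roughly like this in plain Python. The sketch assumes (hypothetically) flat buckets <code>[h1, h2, h3, value, ...]</code> and a <code>refcache</code> mapping address triples to reference triples. It skips entries whose weakly referenced keys have already died.
</p>

```python
import weakref


def iteritems(buckets, refcache):
    """Yield ((k1, k2, k3), value) for each entry whose keys are all alive."""
    for bucket in buckets:
        for i in range(0, len(bucket), 4):
            refs = refcache[bucket[i], bucket[i + 1], bucket[i + 2]]
            keys = []
            for r in refs:
                if isinstance(r, weakref.ref):
                    r = r()
                    if r is None:  # key already collected: skip this entry
                        break
                keys.append(r)
            else:  # only reached if no key was dead
                yield tuple(keys), bucket[i + 3]
```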
TicketSimonKingThu, 23 Aug 2012 07:56:46 GMT
https://trac.sagemath.org/ticket/715#comment:216
https://trac.sagemath.org/ticket/715#comment:216
<p>
... and in addition, set 0.7 as the default threshold.
</p>
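<p>
Assume that <code>threshold</code> acts as the maximum load factor (average number of entries per bucket) above which the table is resized. Under that assumption, the check amounts to the following (the function name is hypothetical):
</p>

```python
def needs_resize(size, nbuckets, threshold=0.7):
    """True once the average number of entries per bucket exceeds threshold."""
    return size > threshold * nbuckets
```

<p>
A lower threshold trades memory for shorter buckets, that is, fewer integer comparisons per lookup.
</p>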
TicketSimonKingThu, 23 Aug 2012 07:57:37 GMTdescription changed
https://trac.sagemath.org/ticket/715#comment:217
https://trac.sagemath.org/ticket/715#comment:217
<ul>
<li><strong>description</strong>
modified (<a href="/ticket/715?action=diff&version=217">diff</a>)
</li>
</ul>
TicketSimonKingThu, 23 Aug 2012 10:34:24 GMTstatus changed
https://trac.sagemath.org/ticket/715#comment:218
https://trac.sagemath.org/ticket/715#comment:218
<ul>
<li><strong>status</strong>
changed from <em>needs_review</em> to <em>needs_work</em>
</li>
</ul>
<p>
<code>make ptest</code> only resulted in a few errors, and no segfault! So it needs work for now, but it is close to success.
</p>
TicketSimonKingThu, 23 Aug 2012 10:48:19 GMTattachment set
https://trac.sagemath.org/ticket/715
https://trac.sagemath.org/ticket/715
<ul>
<li><strong>attachment</strong>
set to <em>trac_715_safer.patch</em>
</li>
</ul>
<p>
Fix some issues: Test validity of references when setting items; use the new "yield" statement in Cython for iteration.
</p>
TicketSimonKingThu, 23 Aug 2012 10:49:05 GMTstatus changed
https://trac.sagemath.org/ticket/715#comment:219
https://trac.sagemath.org/ticket/715#comment:219
<ul>
<li><strong>status</strong>
changed from <em>needs_work</em> to <em>needs_review</em>
</li>
</ul>
<p>
Now it should work! Needs review - this time for real...
</p>
<p>
Apply trac_715_combined.patch trac_715_local_refcache.patch trac_715_safer.patch trac_715_specification.patch
</p>
<p>
And then <a class="closed ticket" href="https://trac.sagemath.org/ticket/11521" title="defect: Use weak references to cache homsets (closed: fixed)">#11521</a>
</p>
TicketnbruinFri, 24 Aug 2012 23:08:31 GMTstatus changed
https://trac.sagemath.org/ticket/715#comment:220
https://trac.sagemath.org/ticket/715#comment:220
<ul>
<li><strong>status</strong>
changed from <em>needs_review</em> to <em>positive_review</em>
</li>
</ul>
<p>
<strong>safer.patch</strong>: <code>sage/structure/coerce_dict.pyx</code>
</p>
<pre class="wiki"> # This is to cope with a potential racing condition - if garbage
# collection and weakref callback happens right between the
# "if (isinstance(r1,..." and the "del", then the previously
# existing entry might already be gone.
</pre><p>
No, there is no such racing condition. You are holding references
<code>k1,k2,k3</code>. You have just looked up the weakreference <code>r1,r2,r3</code> to these keys
and checked that the weakrefs are still alive (and hence that it's not the case
that one of the <code>ki</code> is just a new element that happens to have the same id as a
now-deceased previous key element in the dict).
Since you are holding references, they cannot die in between, so I don't think
the
</p>
<pre class="wiki"> del self._refcache[<size_t><void *>k1,<size_t><void *>k2,<size_t><void *>k3]
</pre><p>
needs to be guarded.
</p>
<p>
It won't hurt, though, and this code will be optimized anyway, so no effect on
the review.
</p>
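<p>
The argument can be checked directly. As long as a local name holds the key, its weak reference cannot die, so the callback cannot remove the entry between the liveness check and the deletion. Here is a small demonstration (all names hypothetical):
</p>

```python
import gc
import weakref


class Key:
    pass


def check_then_delete(cache, k):
    """Validate the weak reference to k, then delete its entry unguarded."""
    r = cache[id(k)][0]
    if r() is not k:      # liveness/identity check
        raise KeyError(k)
    gc.collect()          # even a full collection cannot free k while we hold it,
    del cache[id(k)]      # so the entry is still present and no KeyError occurs
```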
<p>
Concerning <code>next_odd_prime</code>: I'm pretty sure we're keeping a list of primes
somewhere in Sage. We might want to look there rather than have this snippet
here. Again, it does no harm to do it this way.
</p>
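<p>
For comparison, a self-contained helper of that kind can be as simple as trial division. The name mirrors the comment, but the body is a hypothetical sketch, not the code from the patch. Inside Sage, the library's own <code>next_prime</code> would be the obvious alternative.
</p>

```python
def next_odd_prime(n):
    """Smallest odd prime strictly greater than n (hypothetical helper)."""
    m = n + 1
    if m <= 3:
        return 3
    if m % 2 == 0:
        m += 1
    while True:
        d = 3
        while d * d <= m:
            if m % d == 0:
                break
            d += 2
        else:  # no odd divisor found: m is prime
            return m
        m += 2
```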
<p>
<strong>trac_715_specification.patch</strong>: Applies with one line of fuzz. Do we care?
</p>
TicketSimonKingMon, 03 Sep 2012 09:25:05 GMTstatus changed
https://trac.sagemath.org/ticket/715#comment:221
https://trac.sagemath.org/ticket/715#comment:221
<ul>
<li><strong>status</strong>
changed from <em>positive_review</em> to <em>needs_work</em>
</li>
</ul>
<p>
There are sporadic segfaults found by some (not all) patchbots on <a class="closed ticket" href="https://trac.sagemath.org/ticket/11370" title="defect: permutation.to_standard() breaks on empty permutations (closed: fixed)">#11370</a> and <a class="closed ticket" href="https://trac.sagemath.org/ticket/12876" title="enhancement: Fix element and parent classes of Hom categories to be abstract, and ... (closed: fixed)">#12876</a> that seem to be related to the weak caching bits.
</p>
<p>
In this ticket, the cdef attribute <code>sage.categories.action.Action.S</code> (which holds the underlying set of the action) is turned into a weak reference to the underlying set. Today, I found that this might interfere with old code in sage/rings/morphism.pyx, namely:
</p>
<pre class="wiki">cdef class RingHomomorphism(RingMap):
    def __init__(self, R, S):
        """
        Create a lifting ring map.

        EXAMPLES::

            sage: f = Zmod(8).lift() # indirect doctest
            sage: f(3)
            3
            sage: type(f(3))
            <type 'sage.rings.integer.Integer'>
            sage: type(f)
            <type 'sage.rings.morphism.RingMap_lift'>
        """
        from sage.categories.sets_cat import Sets
        H = R.Hom(S, Sets())
        RingMap.__init__(self, H)
        self.S = S # for efficiency
        try:
            S._coerce_(R(0).lift())
        except TypeError:
            raise TypeError, "No natural lift map"

    cdef _update_slots(self, _slots):
        self.S = _slots['S']
        Morphism._update_slots(self, _slots)

    cdef _extra_slots(self, _slots):
        _slots['S'] = self.S
        return Morphism._extra_slots(self, _slots)
</pre><p>
Hence, <code>RingHomomorphism</code> uses the S attribute as well, but differently. And aren't there actions that are ring homomorphisms?
</p>
<p>
I think it is worth trying to rename the <code>S</code> attribute of Action. I am setting this to "needs work", because I doubt that flaky segfaults are acceptable.
</p>
TicketSimonKingMon, 03 Sep 2012 09:30:10 GMTattachment set
https://trac.sagemath.org/ticket/715
https://trac.sagemath.org/ticket/715
<ul>
<li><strong>attachment</strong>
set to <em>trac_715_combined.patch</em>
</li>
</ul>
<p>
Introduce weak references to coercion dicts, and refactor the hashtables.
</p>
TicketSimonKingMon, 03 Sep 2012 09:31:52 GMT
https://trac.sagemath.org/ticket/715#comment:222
https://trac.sagemath.org/ticket/715#comment:222
<p>
The main patch is updated, renaming S to US (for Underlying Set; I certainly don't want to blame the US if it doesn't work). Let us see whether the stuff at <a class="closed ticket" href="https://trac.sagemath.org/ticket/13370" title="defect: Do not cache the result of is_Field externally (closed: fixed)">#13370</a> and <a class="closed ticket" href="https://trac.sagemath.org/ticket/12876" title="enhancement: Fix element and parent classes of Hom categories to be abstract, and ... (closed: fixed)">#12876</a> works now...
</p>
<p>
Apply trac_715_combined.patch trac_715_local_refcache.patch trac_715_safer.patch trac_715_specification.patch
</p>
<p>
And then <a class="closed ticket" href="https://trac.sagemath.org/ticket/11521" title="defect: Use weak references to cache homsets (closed: fixed)">#11521</a>
</p>
TicketSimonKingMon, 03 Sep 2012 09:32:02 GMTstatus changed
https://trac.sagemath.org/ticket/715#comment:223
https://trac.sagemath.org/ticket/715#comment:223
<ul>
<li><strong>status</strong>
changed from <em>needs_work</em> to <em>needs_review</em>
</li>
</ul>
TicketjdemeyerWed, 05 Sep 2012 12:09:48 GMT
https://trac.sagemath.org/ticket/715#comment:224
https://trac.sagemath.org/ticket/715#comment:224
<p>
Applying <a class="closed ticket" href="https://trac.sagemath.org/ticket/715" title="defect: Parents probably not reclaimed due to too much caching (closed: fixed)">#715</a> and <a class="closed ticket" href="https://trac.sagemath.org/ticket/11521" title="defect: Use weak references to cache homsets (closed: fixed)">#11521</a> gives on OS X 10.6 x86_64:
</p>
<pre class="wiki">bsd:sage-5.4.beta0 jdemeyer$ ./sage -t devel/sage/sage/misc/cachefunc.pyx
sage -t "devel/sage/sage/misc/cachefunc.pyx"
The doctested process was killed by signal 11
[14.3 s]
----------------------------------------------------------------------
The following tests failed:
sage -t "devel/sage/sage/misc/cachefunc.pyx" # Killed/crashed
Total time for all tests: 14.3 seconds
</pre><p>
This is the only system where this happens. When running the test with <code>--verbose</code>, the test actually passes.
</p>
TicketSimonKingWed, 05 Sep 2012 12:17:38 GMT
https://trac.sagemath.org/ticket/715#comment:225
https://trac.sagemath.org/ticket/715#comment:225
<p>
Replying to <a class="ticket" href="https://trac.sagemath.org/ticket/715#comment:224" title="Comment 224">jdemeyer</a>:
</p>
<blockquote class="citation">
<p>
Applying <a class="closed ticket" href="https://trac.sagemath.org/ticket/715" title="defect: Parents probably not reclaimed due to too much caching (closed: fixed)">#715</a> and <a class="closed ticket" href="https://trac.sagemath.org/ticket/11521" title="defect: Use weak references to cache homsets (closed: fixed)">#11521</a> gives on OS X 10.6 x86_64:
</p>
<pre class="wiki">bsd:sage-5.4.beta0 jdemeyer$ ./sage -t devel/sage/sage/misc/cachefunc.pyx
sage -t "devel/sage/sage/misc/cachefunc.pyx"
The doctested process was killed by signal 11
[14.3 s]
----------------------------------------------------------------------
The following tests failed:
sage -t "devel/sage/sage/misc/cachefunc.pyx" # Killed/crashed
Total time for all tests: 14.3 seconds
</pre><p>
This is the only system where this happens.
</p>
</blockquote>
<p>
But that means: We finally have a system where it happens <em>with <a class="closed ticket" href="https://trac.sagemath.org/ticket/715" title="defect: Parents probably not reclaimed due to too much caching (closed: fixed)">#715</a>+<a class="closed ticket" href="https://trac.sagemath.org/ticket/11521" title="defect: Use weak references to cache homsets (closed: fixed)">#11521</a> only</em>! So far, we only had Volker's patchbot, which produced segfaults when other patches were applied on top of <a class="closed ticket" href="https://trac.sagemath.org/ticket/11521" title="defect: Use weak references to cache homsets (closed: fixed)">#11521</a>.
</p>
<p>
Hence, hope increases.
</p>
<blockquote class="citation">
<blockquote>
<p>
When running the test with <code>--verbose</code>, the test actually passes.
</p>
</blockquote>
</blockquote>
<p>
Did it really fully pass and you came back to your shell prompt, or did the tests pass and there was a segfault when Sage shut down?
</p>
<p>
Can you produce a backtrace, say, by using gdb? Or can you give me access to the machine, so that I can do some experiments?
</p>
<p>
Best regards,
Simon
</p>
TicketjdemeyerWed, 05 Sep 2012 12:28:57 GMT
https://trac.sagemath.org/ticket/715#comment:226
https://trac.sagemath.org/ticket/715#comment:226
<p>
Replying to <a class="ticket" href="https://trac.sagemath.org/ticket/715#comment:225" title="Comment 225">SimonKing</a>:
</p>
<blockquote class="citation">
<p>
Did it really fully pass and you came back to your shell prompt, or did the tests pass and there was a segfault when Sage shuts down?
</p>
</blockquote>
<p>
It really worked:
</p>
<pre class="wiki">715 tests in 72 items.
715 passed and 0 failed.
Test passed.
[13.6 s]
----------------------------------------------------------------------
All tests passed!
Total time for all tests: 13.6 seconds
</pre>
<blockquote class="citation">
<p>
Can you produce a backtrace, say, by using gdb?
</p>
</blockquote>
<p>
Under gdb, there is no crash. There is a doctest failure, though:
</p>
<pre class="wiki">**********************************************************************
File "/Users/jdemeyer/sage-5.4.beta0/devel/sage/sage/misc/cachefunc.pyx", line 799, in __main__.example_17
Failed example:
oddprime_factors.precompute(range(Integer(1),Integer(100)), Integer(4))###line 704:_sage_ >>> oddprime_factors.precompute(range(1,100), 4)
Expected nothing
Got:
[Errno 4] Interrupted system call
Killing any remaining workers...
**********************************************************************
File "/Users/jdemeyer/sage-5.4.beta0/devel/sage/sage/misc/cachefunc.pyx", line 800, in __main__.example_17
Failed example:
oddprime_factors.cache[(Integer(25),),()]###line 705:_sage_ >>> oddprime_factors.cache[(25,),()]
Exception raised:
Traceback (most recent call last):
File "/Users/jdemeyer/sage-5.4.beta0/local/bin/ncadoctest.py", line 1231, in run_one_test
self.run_one_example(test, example, filename, compileflags)
File "/Users/jdemeyer/sage-5.4.beta0/local/bin/sagedoctest.py", line 38, in run_one_example
OrigDocTestRunner.run_one_example(self, test, example, filename, compileflags)
File "/Users/jdemeyer/sage-5.4.beta0/local/bin/ncadoctest.py", line 1172, in run_one_example
compileflags, 1) in test.globs
File "<doctest __main__.example_17[4]>", line 1, in <module>
oddprime_factors.cache[(Integer(25),),()]###line 705:_sage_ >>> oddprime_factors.cache[(25,),()]
KeyError: ((25,), ())
</pre><blockquote class="citation">
<p>
Or can you give me access to the machine, so that I can do some experiments?
</p>
</blockquote>
<p>
This is William's <code>bsd.math</code> machine, ask him.
</p>
TicketSimonKingWed, 05 Sep 2012 12:37:41 GMT
https://trac.sagemath.org/ticket/715#comment:227
<p>
Replying to <a class="ticket" href="https://trac.sagemath.org/ticket/715#comment:226" title="Comment 226">jdemeyer</a>:
</p>
<blockquote class="citation">
<p>
Under gdb, there is no crash. There is a doctest failure, though:
</p>
</blockquote>
<p>
Interesting. Isn't gdb supposed to just watch, and not interfere with, the computations?
</p>
<blockquote class="citation">
<p>
This is William's <code>bsd.math</code> machine, ask him.
</p>
</blockquote>
<p>
Too bad. I already tested 5.3.rc1 on bsd.math, and it worked fine. No segfault. Perhaps I should retry with 5.4.beta0, then?
</p>
TicketjdemeyerWed, 05 Sep 2012 12:44:50 GMT
https://trac.sagemath.org/ticket/715#comment:228
<p>
Replying to <a class="ticket" href="https://trac.sagemath.org/ticket/715#comment:227" title="Comment 227">SimonKing</a>:
</p>
<blockquote class="citation">
<p>
Perhaps I should retry with 5.4.beta0, then?
</p>
</blockquote>
<p>
Yes, you could. sage-5.4.beta0 is more or less stable now (the main uncertainty being <a class="closed ticket" href="https://trac.sagemath.org/ticket/13121" title="enhancement: Upgrade sagenb to 0.10.x (closed: fixed)">#13121</a> and related tickets).
</p>
TicketjdemeyerWed, 05 Sep 2012 12:46:30 GMT
https://trac.sagemath.org/ticket/715#comment:229
<p>
Replying to <a class="ticket" href="https://trac.sagemath.org/ticket/715#comment:227" title="Comment 227">SimonKing</a>:
</p>
<blockquote class="citation">
<p>
Interesting. Isn't gdb supposed to just watch, and not interfere with, the computations?
</p>
</blockquote>
<p>
I honestly don't know how gdb works, and certainly not how it works within doctesting (<code>sage -t --gdb</code>). Note that this is on OS X, and <code>gdb</code> might work slightly differently there than on Linux.
</p>
TicketjdemeyerWed, 05 Sep 2012 12:47:46 GMT
https://trac.sagemath.org/ticket/715#comment:230
<p>
I should also clarify that the doctest crash is completely reproducible: it really happens every time.
</p>
TicketSimonKingWed, 05 Sep 2012 12:54:55 GMT
https://trac.sagemath.org/ticket/715#comment:231
<p>
Replying to <a class="ticket" href="https://trac.sagemath.org/ticket/715#comment:228" title="Comment 228">jdemeyer</a>:
</p>
<blockquote class="citation">
<p>
Replying to <a class="ticket" href="https://trac.sagemath.org/ticket/715#comment:227" title="Comment 227">SimonKing</a>:
</p>
<blockquote class="citation">
<p>
Perhaps I should retry with 5.4.beta0, then?
</p>
</blockquote>
<p>
Yes, you could.
</p>
</blockquote>
<p>
Building it now...
</p>
TicketvbraunWed, 05 Sep 2012 13:12:27 GMT
https://trac.sagemath.org/ticket/715#comment:232
<p>
Replying to <a class="ticket" href="https://trac.sagemath.org/ticket/715#comment:227" title="Comment 227">SimonKing</a>:
</p>
<blockquote class="citation">
<p>
Interesting. Isn't gdb supposed to just watch, and not interfere with, the computations?
</p>
</blockquote>
<p>
Yes, but:
</p>
<ul><li>gdb installs a bag full of signal handlers (so you can press Ctrl-C and get to the gdb prompt, e.g.)
</li><li>gdb disables ASLR by default, so all memory locations are reproducible (but different from running without gdb).
</li></ul>
TicketSimonKingWed, 05 Sep 2012 13:16:01 GMT
https://trac.sagemath.org/ticket/715#comment:233
<p>
Replying to <a class="ticket" href="https://trac.sagemath.org/ticket/715#comment:232" title="Comment 232">vbraun</a>:
</p>
<blockquote class="citation">
<ul><li>gdb disables ASLR by default, so all memory locations are reproducible (but different from running without gdb).
</li></ul></blockquote>
<p>
OK, that is likely to be a problem here.
</p>
TicketSimonKingWed, 05 Sep 2012 15:36:03 GMT
https://trac.sagemath.org/ticket/715#comment:234
<p>
Hooray! Finally I get
</p>
<pre class="wiki">bash-3.2$ ../../sage -t sage/misc/cachefunc.pyx
sage -t "devel/sage-main/sage/misc/cachefunc.pyx"
The doctested process was killed by signal 11
[44.1 s]
----------------------------------------------------------------------
The following tests failed:
sage -t "devel/sage-main/sage/misc/cachefunc.pyx" # Killed/crashed
Total time for all tests: 44.1 seconds
</pre><p>
Why is it so much faster for you, Jeroen?
</p>
TicketSimonKingWed, 05 Sep 2012 15:41:31 GMT
https://trac.sagemath.org/ticket/715#comment:235
<p>
What I don't understand: With gdb, one gets an error, reportedly in line 800. But line 800 is
</p>
<pre class="wiki"> sage: J.groebner_basis.clear_cache()
</pre><p>
Nothing like
</p>
<pre class="wiki">Failed example:
oddprime_factors.cache[(Integer(25),),()]###line 705:_sage_ >>> oddprime_factors.cache[(25,),()]
Exception raised:
</pre>
TicketSimonKingWed, 05 Sep 2012 15:51:56 GMT
https://trac.sagemath.org/ticket/715#comment:236
<p>
I tried to install some valgrind spkg on bsd.math, but it failed.
</p>
TicketjpfloriWed, 05 Sep 2012 15:54:47 GMT
https://trac.sagemath.org/ticket/715#comment:237
<p>
Valgrind cannot be built with the FSF GCC on OS X (see the ticket I pointed to recently about Valgrind, I don't remember where).
IIRC, that's what Sage tries to do, so it fails...
So you should use a system-wide valgrind or force the use of the system-wide compiler to build the spkg.
</p>
TicketSimonKingWed, 05 Sep 2012 16:09:46 GMT
https://trac.sagemath.org/ticket/715#comment:238
<p>
Replying to <a class="ticket" href="https://trac.sagemath.org/ticket/715#comment:237" title="Comment 237">jpflori</a>:
</p>
<blockquote class="citation">
<p>
Valgrind cannot be built with the FSF GCC on OS X (see the ticket I pointed to recently about Valgrind, I don't remember where).
IIRC, that's what Sage tries to do, so it fails...
So you should use a system-wide valgrind or force the use of the system-wide compiler to build the spkg.
</p>
</blockquote>
<p>
Thank you!
</p>
<p>
Next attempt: <code>ulimit -c unlimited</code>.
</p>
<p>
However, even though there still was a signal 11, no core dump was written. So, question to the experts: How can I make bsd.math write a core dump of the failing test?
</p>
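<p>
For reference, a minimal sketch of what <code>ulimit -c unlimited</code> amounts to, written with Python's standard <code>resource</code> module (the helper name <code>enable_core_dumps</code> is made up for illustration). It only raises the soft core-size limit up to the hard limit; child processes started afterwards inherit it. Whether a core file actually appears still depends on OS settings (on OS X, as far as I know, cores are written to <code>/cores</code>, which must exist and be writable), so this alone may not explain the missing dump.
</p>

```python
import resource


def enable_core_dumps():
    """Raise the soft core-file size limit up to the hard limit.

    Roughly what `ulimit -c unlimited` does in the shell when the hard
    limit is already unlimited. The new limit is inherited by child
    processes (e.g. a doctested process) started after this call.
    Hypothetical helper, for illustration only.
    """
    _, hard = resource.getrlimit(resource.RLIMIT_CORE)
    # An unprivileged process may always raise its soft limit to the hard one.
    resource.setrlimit(resource.RLIMIT_CORE, (hard, hard))
    return resource.getrlimit(resource.RLIMIT_CORE)


if __name__ == "__main__":
    soft, hard = enable_core_dumps()
    if soft == resource.RLIM_INFINITY:
        print("core file size: unlimited")
    else:
        print("core file size limit: %d bytes" % soft)
```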
TicketjdemeyerWed, 05 Sep 2012 17:56:09 GMT
https://trac.sagemath.org/ticket/715#comment:239
<p>
Replying to <a class="ticket" href="https://trac.sagemath.org/ticket/715#comment:234" title="Comment 234">SimonKing</a>:
</p>
<blockquote class="citation">
<p>
Hooray!
</p>
</blockquote>
<p>
Hooray because you get a segmentation fault? Then there are a lot of tickets that should make you happy :-)
</p>
<blockquote class="citation">
<p>
Why is it so much faster for you, Jeroen?
</p>
</blockquote>
<p>
Caching (the disk kind) perhaps?
</p>
TicketSimonKingWed, 05 Sep 2012 19:00:32 GMT
https://trac.sagemath.org/ticket/715#comment:240
<p>
Gosh, it is so frustrating to hunt that Heisenbug!!
</p>
<ul><li>Test the file - there is a segfault, but it doesn't give any clue as to what is happening.
</li><li>Test the file with gdb - the segfault is gone, but a "normal" error occurs, which is rather odd because it is reported in the wrong line of the file.
</li><li>Try <code>ulimit -c unlimited</code> - there is a segfault, but no core dump is written.
</li><li>Try verbose tests - all tests pass.
</li><li>What I just did: Start each test with a few lines that write something into a log file. Hence, it isn't exactly verbose, but it should give some idea of which test the segfault occurs in. But alas - all tests pass.
</li><li>Valgrind doesn't seem to be available on bsd.math.
</li></ul><p>
Anything else I could try? So far, I only see the option to try to understand why using gdb results in an error in a very innocent-looking test that should actually use a strong cache.
</p>
TicketnbruinWed, 05 Sep 2012 20:07:08 GMT
https://trac.sagemath.org/ticket/715#comment:241
<p>
Replying to <a class="ticket" href="https://trac.sagemath.org/ticket/715#comment:240" title="Comment 240">SimonKing</a>:
</p>
<blockquote class="citation">
<ul><li>Test the file - there is a segfault, but it doesn't give any clue of what is happening.
</li></ul></blockquote>
<p>
I'm pretty sure that's because sage-doctest redirects the output somewhere: the
SIGSEGV surely causes the usual traceback that Sage prints when it crashes, but you
never get to see it. So I'd start digging into the sage-doctest script and changing
little things there, hoping not to upset the subtle conditions required to trigger
the fault. Indeed, this is where the output gets redirected:
</p>
<p>
<strong>local/bin/sage-doctest</strong>:801
</p>
<div class="wiki-code"><div class="code"><pre> <span class="k">if</span> verbose <span class="ow">or</span> gdb <span class="ow">or</span> memcheck <span class="ow">or</span> massif <span class="ow">or</span> cachegrind<span class="p">:</span>
<span class="kn">import</span> <span class="nn">subprocess</span>
proc <span class="o">=</span> subprocess<span class="o">.</span>Popen<span class="p">(</span>cmd<span class="p">,</span> shell<span class="o">=</span><span class="bp">True</span><span class="p">)</span>
<span class="k">while</span> time<span class="o">.</span>time<span class="p">()</span><span class="o">-</span>tm <span class="o"><=</span> TIMEOUT <span class="ow">and</span> proc<span class="o">.</span>poll<span class="p">()</span> <span class="o">==</span> <span class="bp">None</span><span class="p">:</span>
time<span class="o">.</span>sleep<span class="p">(</span><span class="mf">0.1</span><span class="p">)</span>
<span class="k">if</span> time<span class="o">.</span>time<span class="p">()</span><span class="o">-</span>tm <span class="o">>=</span>TIMEOUT<span class="p">:</span>
os<span class="o">.</span>kill<span class="p">(</span>proc<span class="o">.</span>pid<span class="p">,</span> <span class="mi">9</span><span class="p">)</span>
<span class="k">print</span> <span class="s">"*** *** Error: TIMED OUT! PROCESS KILLED! *** ***"</span>
e <span class="o">=</span> proc<span class="o">.</span>poll<span class="p">()</span>
<span class="k">else</span><span class="p">:</span>
outf <span class="o">=</span> tempfile<span class="o">.</span>NamedTemporaryFile<span class="p">()</span>
<span class="kn">import</span> <span class="nn">subprocess</span>
proc <span class="o">=</span> subprocess<span class="o">.</span>Popen<span class="p">(</span>cmd<span class="p">,</span> shell<span class="o">=</span><span class="bp">True</span><span class="p">,</span> \
stdout<span class="o">=</span>outf<span class="o">.</span>file<span class="o">.</span>fileno<span class="p">(),</span> stderr <span class="o">=</span> outf<span class="o">.</span>file<span class="o">.</span>fileno<span class="p">())</span>
<span class="k">while</span> time<span class="o">.</span>time<span class="p">()</span><span class="o">-</span>tm <span class="o"><=</span> TIMEOUT <span class="ow">and</span> proc<span class="o">.</span>poll<span class="p">()</span> <span class="o">==</span> <span class="bp">None</span><span class="p">:</span>
time<span class="o">.</span>sleep<span class="p">(</span><span class="mf">0.1</span><span class="p">)</span>
<span class="k">if</span> time<span class="o">.</span>time<span class="p">()</span><span class="o">-</span>tm <span class="o">>=</span>TIMEOUT<span class="p">:</span>
os<span class="o">.</span>kill<span class="p">(</span>proc<span class="o">.</span>pid<span class="p">,</span> <span class="mi">9</span><span class="p">)</span>
<span class="k">print</span> <span class="s">"*** *** Error: TIMED OUT! PROCESS KILLED! *** ***"</span>
outf<span class="o">.</span>file<span class="o">.</span>seek<span class="p">(</span><span class="mi">0</span><span class="p">)</span>
out <span class="o">=</span> outf<span class="o">.</span>read<span class="p">()</span>
e <span class="o">=</span> proc<span class="o">.</span>poll<span class="p">()</span>
</pre></div></div><p>
The verbose parameter does get written into the file that <code>cmd</code> executes, so it
has effect there as well. You could also just extract that temporary file and
hack on that.
</p>
<p>
Of course, all this might only help you figure out where the SEGV occurs,
which may or may not be related to the real culprit.
</p>
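<p>
A self-contained sketch of why such a crash looks so uninformative (this is not sage-doctest code; <code>run_hidden</code> and the crashing child program are hypothetical stand-ins). It mimics the non-verbose branch above: the child's stdout and stderr go to a temporary file, so anything the child prints before dying is only visible if the parent reads that file back and shows it. The parent still learns about the signal, because on POSIX <code>Popen</code> reports "killed by signal N" as a return code of -N.
</p>

```python
import signal
import subprocess
import sys
import tempfile

# Hypothetical child program standing in for a crashing doctest process:
# it prints a line, flushes it, then kills itself with SIGSEGV.
CHILD = (
    "import os, signal, sys\n"
    "sys.stdout.write('about to crash\\n')\n"
    "sys.stdout.flush()\n"
    "os.kill(os.getpid(), signal.SIGSEGV)\n"
)


def run_hidden(code):
    """Run `code` in a child Python whose stdout and stderr both go to a
    temporary file; return (returncode, captured_output)."""
    with tempfile.TemporaryFile() as outf:
        proc = subprocess.Popen([sys.executable, "-c", code],
                                stdout=outf, stderr=outf)
        proc.wait()
        outf.seek(0)
        out = outf.read().decode("utf-8", "replace")
    return proc.returncode, out


if __name__ == "__main__":
    rc, out = run_hidden(CHILD)
    # On POSIX, Popen reports "killed by signal N" as returncode == -N.
    if rc == -signal.SIGSEGV:
        print("child was killed by SIGSEGV")
    print("captured output: %r" % out)
```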
TicketSimonKingThu, 06 Sep 2012 08:46:40 GMT
https://trac.sagemath.org/ticket/715#comment:242
<p>
Concerning the oddity that there is an error (but no segfault) with gdb: It says
</p>
<pre class="wiki">File "/scratch/sking/sage-5.4.beta0/devel/sage-main/sage/misc/cachefunc.pyx", line 799, in __main__.example_17
Failed example:
oddprime_factors.precompute(range(Integer(1),Integer(100)), Integer(4))###line 704:_sage_ >>> oddprime_factors.precompute(range(1,100), 4)
Expected nothing
Got:
[Errno 4] Interrupted system call
Killing any remaining workers...
</pre><p>
What does "interrupted system call" mean? The failing function appears to be the cached version of
</p>
<pre class="wiki"> def oddprime_factors(n):
l = [p for p,e in factor(n) if p != 2]
return len(l)
</pre><p>
What system call is involved here?
</p>
TicketSimonKingThu, 06 Sep 2012 08:59:31 GMT
https://trac.sagemath.org/ticket/715#comment:243
<p>
PS: When I comment out the "oddprime_factors" test, running <code>sage -t -gdb</code> does not report any error - and it also makes the segfault in <code>sage -t</code> go away!
</p>
<p>
Hence, it seems that the problem really is due to the seemingly harmless test of the "precompute" method.
</p>
TicketSimonKingThu, 06 Sep 2012 09:09:01 GMT
https://trac.sagemath.org/ticket/715#comment:244
<p>
If the test is
</p>
<pre class="wiki"> sage: @cached_function
... def oddprime_factors(n):
... l = [p for p,e in factor(n) if p != 2]
... return len(l)
sage: oddprime_factors.precompute(range(1,99), 4)
sage: oddprime_factors.cache[(25,),()]
1
</pre><p>
then <code>sage -t</code> passes. A precomputation in <code>range(1,90)</code> or <code>range(1,50)</code> or <code>range(2,100)</code> works as well. But if the precomputation runs in <code>range(1,100)</code> or <code>range(2,101)</code> or <code>range(1,110)</code>, then there is signal 11.
</p>
<p>
Hence, it really seems that we have located the culprit - although I have no idea whatsoever as to what is happening here. It seems that there is no error if we precompute at most 98 values, but if we have 99 or more precomputed values, then there is signal 11.
</p>
<p>
Any idea what to try next?
</p>
TicketjdemeyerThu, 06 Sep 2012 09:22:27 GMT
https://trac.sagemath.org/ticket/715#comment:245
<p>
Replying to <a class="ticket" href="https://trac.sagemath.org/ticket/715#comment:242" title="Comment 242">SimonKing</a>:
</p>
<blockquote class="citation">
<p>
What does "interrupted system call" mean?
</p>
</blockquote>
<p>
It means, quite literally, that a system call got interrupted by a signal. If a system call (for example some file I/O operation) gets interrupted by a signal, then that system call might fail with "interrupted system call", even if the signal was properly handled by the application. Note the use of "might": there is a long discussion in the <code>signal(7)</code> man page explaining which calls this applies to.
</p>
TicketSimonKingThu, 06 Sep 2012 09:24:51 GMT
https://trac.sagemath.org/ticket/715#comment:246
<p>
By inserting a print statement into <code>weakref.KeyedRef.__init__</code> and running the test from the command line, I found that the test does <em>not</em> involve keyed weak references.
</p>
<p>
Since the second argument to the <code>precompute</code> method gives the number of parallel processes to use, I thought for a moment that parallelism could be the problem, but changing the test into
</p>
<pre class="wiki"> sage: oddprime_factors.precompute(range(1,110), 1)
</pre><p>
did <em>not</em> make signal 11 vanish.
</p>
TicketvbraunThu, 06 Sep 2012 09:25:25 GMT
https://trac.sagemath.org/ticket/715#comment:247
<p>
Replying to <a class="ticket" href="https://trac.sagemath.org/ticket/715#comment:242" title="Comment 242">SimonKing</a>:
</p>
<blockquote class="citation">
<p>
Got:
</p>
<blockquote>
<p>
[Errno 4] Interrupted system call
Killing any remaining workers...
</p>
</blockquote>
</blockquote>
<p>
This sounds more like a bug in the doctest framework. I imagine the worker process segfaults, and the doctesting process is in a blocking system call when the <code>SIGCHLD</code> arrives. The doctesting framework should check the <code>EINTR</code> result and retry but doesn't.
</p>
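<p>
The retry that the framework is missing can be sketched as follows. This is an illustrative helper, not code from the doctest framework, and the name <code>retry_on_eintr</code> is made up. In Python 2, which Sage used at the time, a system call interrupted by a signal such as <code>SIGCHLD</code> surfaces as an <code>OSError</code>/<code>IOError</code> with <code>errno.EINTR</code>, and the caller has to loop; since Python 3.5 (PEP 475) most such calls are retried automatically by the interpreter.
</p>

```python
import errno


def retry_on_eintr(func, *args, **kwargs):
    """Call func(*args, **kwargs), retrying while it fails with EINTR.

    Any other error is re-raised unchanged.
    """
    while True:
        try:
            return func(*args, **kwargs)
        except (OSError, IOError) as e:
            if e.errno != errno.EINTR:
                raise
            # Interrupted by a signal (e.g. SIGCHLD from a dead worker):
            # the signal has already been handled, so simply try again.


if __name__ == "__main__":
    # Simulate a blocking call that is interrupted twice before succeeding.
    attempts = []

    def flaky_read():
        attempts.append(1)
        if len(attempts) < 3:
            raise OSError(errno.EINTR, "Interrupted system call")
        return "data"

    result = retry_on_eintr(flaky_read)
    print("%s after %d attempts" % (result, len(attempts)))
```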
TicketSimonKingThu, 06 Sep 2012 09:31:00 GMT
https://trac.sagemath.org/ticket/715#comment:248
<p>
Replying to <a class="ticket" href="https://trac.sagemath.org/ticket/715#comment:247" title="Comment 247">vbraun</a>:
</p>
<blockquote class="citation">
<p>
Replying to <a class="ticket" href="https://trac.sagemath.org/ticket/715#comment:242" title="Comment 242">SimonKing</a>:
</p>
<blockquote class="citation">
<p>
Got:
</p>
<blockquote>
<p>
[Errno 4] Interrupted system call
Killing any remaining workers...
</p>
</blockquote>
</blockquote>
<p>
This sounds more like a bug in the doctest framework. I imagine the worker process segfaults, and the doctesting process is in a blocking system call when the <code>SIGCHLD</code> arrives. The doctesting framework should check the <code>EINTR</code> result and retry but doesn't.
</p>
</blockquote>
<p>
... which sounds like <a class="ext-link" href="http://bugs.python.org/issue12268"><span class="icon"></span>this known problem</a> or <a class="ext-link" href="http://bugs.python.org/issue9867"><span class="icon"></span>its duplicate</a>.
</p>
TicketSimonKingThu, 06 Sep 2012 11:47:50 GMT
https://trac.sagemath.org/ticket/715#comment:249
<p>
Recall that some of the patchbots report sporadic problems for <a class="closed ticket" href="https://trac.sagemath.org/ticket/13370" title="defect: Do not cache the result of is_Field externally (closed: fixed)">#13370</a> or <a class="closed ticket" href="https://trac.sagemath.org/ticket/12876" title="enhancement: Fix element and parent classes of Hom categories to be abstract, and ... (closed: fixed)">#12876</a> as well - and at least in the case of <a class="closed ticket" href="https://trac.sagemath.org/ticket/12876" title="enhancement: Fix element and parent classes of Hom categories to be abstract, and ... (closed: fixed)">#12876</a>, it is seemingly a similar problem:
</p>
<pre class="wiki">sage -t -force_lib devel/sage-12876/sage/rings/polynomial/infinite_polynomial_ring.py
The doctested process was killed by signal 11
</pre><p>
Signal 11, same signal as here.
</p>
<p>
I wonder: Does the patchbot use some <code>UniqueRepresentation</code> to represent a tester? That might be a problem if it is only weakly cached.
</p>
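<p>
To make that concern concrete: an object that is cached only weakly drops out of the cache as soon as the last strong reference to it goes away, so a later request silently builds a new instance, and any state attached to the old one is lost. A minimal sketch with <code>weakref.WeakValueDictionary</code>; the class <code>WeaklyCachedUnique</code> is a hypothetical stand-in, not Sage's <code>UniqueRepresentation</code> implementation.
</p>

```python
import gc
import weakref


class WeaklyCachedUnique(object):
    """Hypothetical unique-representation class whose instances are cached
    only through weak references (not Sage's actual implementation)."""

    _cache = weakref.WeakValueDictionary()

    def __new__(cls, key):
        obj = cls._cache.get(key)
        if obj is None:
            obj = super(WeaklyCachedUnique, cls).__new__(cls)
            obj.key = key
            cls._cache[key] = obj
        return obj


if __name__ == "__main__":
    a = WeaklyCachedUnique("tester")
    # While a strong reference exists, the same instance is returned.
    print(WeaklyCachedUnique("tester") is a)
    del a
    gc.collect()  # CPython frees it immediately; other Pythons may need this
    # The cache only held a weak reference, so the entry is gone.
    print("tester" in WeaklyCachedUnique._cache)
```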
TicketSimonKingThu, 06 Sep 2012 14:00:43 GMT
https://trac.sagemath.org/ticket/715#comment:250
<p>
Replying to <a class="ticket" href="https://trac.sagemath.org/ticket/715#comment:249" title="Comment 249">SimonKing</a>:
</p>
<blockquote class="citation">
<p>
Recall that some of the patchbots report sporadic problems for <a class="closed ticket" href="https://trac.sagemath.org/ticket/13370" title="defect: Do not cache the result of is_Field externally (closed: fixed)">#13370</a> or <a class="closed ticket" href="https://trac.sagemath.org/ticket/12876" title="enhancement: Fix element and parent classes of Hom categories to be abstract, and ... (closed: fixed)">#12876</a> as well - and at least in the case of <a class="closed ticket" href="https://trac.sagemath.org/ticket/12876" title="enhancement: Fix element and parent classes of Hom categories to be abstract, and ... (closed: fixed)">#12876</a>, it is seemingly a similar problem:
</p>
</blockquote>
<p>
An earlier version of <a class="closed ticket" href="https://trac.sagemath.org/ticket/13370" title="defect: Do not cache the result of is_Field externally (closed: fixed)">#13370</a> produced the same kind of error on <a class="ext-link" href="http://patchbot.sagemath.org/log/13370/Fedora/17/x86_64/3.5.2-3.fc17.x86_64/volker-desktop.stp.dias.ie/2012-08-27%2023:35:38%20+0100"><span class="icon"></span>Volker's patchbot</a>, too:
</p>
<pre class="wiki">sage -t -force_lib devel/sage-13370/sage/rings/polynomial/polynomial_real_mpfr_dense.pyx
The doctested process was killed by signal 11
</pre><p>
The big question is: How can we deal with that problem?
</p>
TicketSimonKingThu, 06 Sep 2012 15:46:44 GMT
https://trac.sagemath.org/ticket/715#comment:251
<p>
It seems to me that it is <em>not</em> a side effect. Namely, I put
</p>
<pre class="wiki">class Foo:
def bar(self):
"""
Cache values for a number of inputs. Do the computation
in parallel, and only bother to compute values that we
haven't already cached.
EXAMPLES::
sage: @cached_function
... def oddprime_factors(n):
... l = [p for p,e in factor(n) if p != 2]
... return len(l)
sage: oddprime_factors.precompute(range(1,100), 4)
sage: oddprime_factors.cache[(25,),()]
1
"""
pass
</pre><p>
into a file and ran <code>sage -t</code> on it. All tests pass -- <strong><span class="underline">BUT</span></strong> running <code>sage -t -gdb</code>, I get the same error as in cachefunc.pyx, where the test above is just one among many other tests:
</p>
<pre class="wiki">Failed example:
oddprime_factors.precompute(range(Integer(1),Integer(100)), Integer(4))###line 14:_sage_ >>> oddprime_factors.precompute(range(1,100), 4)
Expected nothing
Got:
[Errno 4] Interrupted system call
Killing any remaining workers...
**********************************************************************
File "/scratch/sking/sage-5.4.beta0/devel/sage-main/sage/misc/blubb.py", line ?, in __main__.example_0
Failed example:
oddprime_factors.cache[(Integer(25),),()]###line 15:_sage_ >>> oddprime_factors.cache[(25,),()]
Exception raised:
Traceback (most recent call last):
File "/scratch/sking/sage-5.4.beta0/local/bin/ncadoctest.py", line 1231, in run_one_test
self.run_one_example(test, example, filename, compileflags)
File "/scratch/sking/sage-5.4.beta0/local/bin/sagedoctest.py", line 38, in run_one_example
OrigDocTestRunner.run_one_example(self, test, example, filename, compileflags)
File "/scratch/sking/sage-5.4.beta0/local/bin/ncadoctest.py", line 1172, in run_one_example
compileflags, 1) in test.globs
File "<doctest __main__.example_0[4]>", line 1, in <module>
oddprime_factors.cache[(Integer(25),),()]###line 15:_sage_ >>> oddprime_factors.cache[(25,),()]
KeyError: ((25,), ())
</pre><p>
The "Killing any remaining workers" comes from a parallel computation, doesn't it? The <code>precompute()</code> method is parallel. Let us see whether it also fails in an interactive session under gdb!
</p>
TicketSimonKingThu, 06 Sep 2012 15:51:38 GMT
https://trac.sagemath.org/ticket/715#comment:252
<p>
Yes, it even works (i.e. reproduces the error) interactively, provided one runs "sage -gdb"!
</p>
<pre class="wiki">sage: @cached_function
....: def oddprime_factors(n):
....: l = [p for p,e in factor(n) if p != 2]
....: return len(l)
....:
sage: oddprime_factors.precompute(range(1,100), 4)
[Errno 4] Interrupted system call
Killing any remaining workers...
sage: oddprime_factors.precompute(range(1,100), 6)
[Errno 4] Interrupted system call
Killing any remaining workers...
sage: oddprime_factors.precompute(range(1,100))
[Errno 4] Interrupted system call
Killing any remaining workers...
sage: len(oddprime_factors.cache)
0
</pre><p>
Interestingly, using <code>range(1,99)</code> (which made the problem vanish in the doctest) does not make it vanish interactively.
</p>
TicketSimonKingFri, 07 Sep 2012 13:24:08 GMT
https://trac.sagemath.org/ticket/715#comment:253
<p>
I tried inserting print statements into sage.parallel.use_fork.p_iter_fork._subprocess.
</p>
<p>
The print statements <em>are</em> executed when successfully running the example in an interactive session.
</p>
<p>
They are <em>not</em> executed when running it interactively under gdb. Hence, _subprocess (which contains the invalidation of pexpect interfaces) is not involved in the interactive error under gdb.
</p>
<p>
They <em>are</em> executed when running the doctests. The test then fails (because of the unexpected print statements), but there is no signal 11.
</p>
<p>
What could that mean? Perhaps we actually have two independent problems in the same example: One appears in a gdb'ed interactive session and can be fixed with <a class="closed ticket" href="https://trac.sagemath.org/ticket/13437" title="defect: Clean up SIGALRM handling in p_iter_fork (closed: invalid)">#13437</a>. The other appears with <code>sage -t</code> and remains a mystery.
</p>
TicketSimonKingFri, 07 Sep 2012 13:37:23 GMT
https://trac.sagemath.org/ticket/715#comment:254
<p>
Perhaps I stand corrected. I inserted print statements in a different location, one of them directly before calling _subprocess. This statement is printed, then the interrupted system call strikes.
</p>
TicketSimonKingFri, 07 Sep 2012 13:47:30 GMT
https://trac.sagemath.org/ticket/715#comment:255
<p>
I forgot that _subprocess redirects stdout. Sorry for the noise.
</p>
TicketSimonKingFri, 07 Sep 2012 14:00:50 GMT
https://trac.sagemath.org/ticket/715#comment:256
<p>
I think I have now located the problem exposed by a gdb'ed interactive session. When not redirecting stdout, a print statement before the last line of the "finally:" clause of _subprocess is executed. But a print statement inserted right after the call to <code>self._subprocess(f, dir, v[0])</code> in <code>p_iter_fork.__call__</code> is not executed.
</p>
<p>
There is only one line of code between the executed and the not-executed print statements: The last line of <code>_subprocess</code>' "finally:" clause, namely
</p>
<pre class="wiki"> os._exit(0)
</pre><p>
Question to the experts: What could possibly go wrong in <code>os._exit(0)</code>?
</p>
TicketnbruinFri, 07 Sep 2012 15:50:02 GMT
https://trac.sagemath.org/ticket/715#comment:257
<p>
Replying to <a class="ticket" href="https://trac.sagemath.org/ticket/715#comment:256" title="Comment 256">SimonKing</a>:
</p>
<blockquote class="citation">
<p>
There is only one line of code between the executed and the not-executed print statements: The last line of <code>_subprocess</code>' "finally:" clause, namely
</p>
<pre class="wiki"> os._exit(0)
</pre><p>
Question to the experts: What could possibly go wrong in <code>os._exit(0)</code>?
</p>
</blockquote>
<p>
Oh dear. That sounds like <code>_subprocess</code> is not returning at all! Let's see what the documentation says:
</p>
<pre class="wiki">os._exit(n)
Exit the process with status n, without calling cleanup handlers,
flushing stdio buffers, etc.
</pre><p>
Could it be we found a bug in the OSX kernel? A system call that doesn't return?
</p>
<p>
More seriously, it seems rather reassuring that the statement that comes after you tell the process to quit doesn't get executed. It seems to me you've just ruled out that it's the child that is SEGV-ing -- it's the parent.
</p>
<p>
In fact, we could have known that. In the doctest of <code>sage.parallel.decorate.Fork</code> there is an explicit test that shows a child can segfault with no detrimental effect (if you instrument <code>sage-doctest</code> to not hide stderr, it's scary to see the backtrace come by, but the doctest passes without problem). The fact that the doctest framework can get its hands on the "11" exit code shows it's the parent that generates it. Why do you think parallel is involved at all? Under gdb, the test does not segfault, so you're looking at different behaviour. I don't think parallel is implicated in this at all.
</p>
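<p>
The parent/child distinction can be checked in isolation. The sketch below is illustrative only: the helper <code>child_outcome</code> is hypothetical and merely mirrors the pattern described above, not the actual <code>p_iter_fork</code> code. The child runs a function and leaves through <code>os._exit(0)</code>, so nothing after that call ever runs in the child; if the child segfaults instead, the parent just observes the signal in the wait status and carries on. POSIX only, since it uses <code>os.fork</code>.
</p>

```python
import os
import signal


def child_outcome(f):
    """Run f() in a forked child that always leaves via os._exit, and report
    how the child ended, as seen by the parent. Hypothetical helper."""
    pid = os.fork()
    if pid == 0:
        # Child process.
        try:
            f()
        finally:
            # Does not return: no atexit handlers, no stdio buffer flush.
            os._exit(0)
    # Only the parent reaches this point.
    _, status = os.waitpid(pid, 0)
    if os.WIFSIGNALED(status):
        return "killed by signal %d" % os.WTERMSIG(status)
    return "exited with status %d" % os.WEXITSTATUS(status)


if __name__ == "__main__":
    print(child_outcome(lambda: None))
    # A child that segfaults does not take the parent down with it.
    print(child_outcome(lambda: os.kill(os.getpid(), signal.SIGSEGV)))
    print("parent still alive")
```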
<p>
Really, <em>strip away the doctesting layer</em>! If you read <code>sage-doctest</code>, you'll see it produces a plain Python file that it then executes directly with python, with all IO redirected. Get that file and run it directly, without redirecting IO. Setting <code>verbose</code> doesn't just change the IO redirection in <code>sage-doctest</code>; it also gets written into that file and hence can influence behaviour there. So with <code>sage -t</code> and <code>sage -t --verbose</code> you're really running different code. You want the code that <code>sage -t</code> generates with the IO redirection that <code>sage -t -verbose</code> does. At that point you might as well just get <code>sage -t</code> out of the way completely.
</p>
<p>
If you want to help people in the future, patch <code>sage -t</code> to have a flag <code>--keep</code>, to not throw away any of the temporary files it produces, so that you can pick through the remainders.
</p>
<hr />
<p>
Using <code>sys.exit</code> versus <code>os._exit</code>: I can see why one might have thought <code>os._exit</code> is a good idea. We got what we came for (the function got executed and the result is stored in an <code>.sobj</code> -- this should really be communicated via a pipe to the parent, not via a temporary file), so why risk fouling it up by doing more just to exit? However, if someone uses this for side effects (writing to some shared file), the buffers might not get flushed. On the other hand, code is executing in parallel here (that's the point), so one would probably already run into problems.
</p>
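<p>
Both points above can be combined in a few lines: pass the result back to the parent over a pipe instead of an <code>.sobj</code> file, and make sure everything is flushed before calling <code>os._exit</code>. Unlike the normal exit path (<code>sys.exit</code>, which raises <code>SystemExit</code>), <code>os._exit</code> runs no <code>atexit</code> handlers and does not flush open file buffers. A minimal POSIX-only sketch; <code>run_in_child</code> is a hypothetical helper, not how <code>sage.parallel</code> is implemented:
</p>

```python
import os
import pickle


def run_in_child(func, *args):
    """Evaluate func(*args) in a forked child and return the result (POSIX only)."""
    read_fd, write_fd = os.pipe()
    pid = os.fork()
    if pid == 0:                      # child
        status = 1
        try:
            os.close(read_fd)
            with os.fdopen(write_fd, "wb") as out:
                pickle.dump(func(*args), out)
            # leaving the with-block flushed and closed the pipe; os._exit
            # itself would not flush anything that is still buffered
            status = 0
        finally:
            os._exit(status)          # skip atexit handlers inherited from the parent
    os.close(write_fd)                # parent
    with os.fdopen(read_fd, "rb") as inp:
        data = inp.read()             # read to EOF before waiting: no pipe deadlock
    _, wait_status = os.waitpid(pid, 0)
    if not (os.WIFEXITED(wait_status) and os.WEXITSTATUS(wait_status) == 0):
        raise RuntimeError("child failed (wait status %d)" % wait_status)
    return pickle.loads(data)
```

<p>
A child that raises or crashes yields a <code>RuntimeError</code> in the parent instead of a missing or truncated result file.
</p>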
TicketSimonKingFri, 07 Sep 2012 16:22:57 GMTattachment set
https://trac.sagemath.org/ticket/715
<ul>
<li><strong>attachment</strong>
set to <em>failing_test_under_gdb.py</em>
</li>
</ul>
<p>
Temporary file created by sage -t on a test that fails with gdb
</p>
TicketSimonKingFri, 07 Sep 2012 16:26:15 GMT
https://trac.sagemath.org/ticket/715#comment:258
<p>
I am not totally sure if I understand what you mean: You say it would be interesting to see the temporary file that is created by <code>sage -t</code>? Then: see <a class="attachment" href="https://trac.sagemath.org/attachment/ticket/715/failing_test_under_gdb.py" title="Attachment 'failing_test_under_gdb.py' in Ticket #715">failing_test_under_gdb.py</a><a class="trac-rawlink" href="https://trac.sagemath.org/raw-attachment/ticket/715/failing_test_under_gdb.py" title="Download"></a>.
</p>
<p>
The original file was as in <a class="ticket" href="https://trac.sagemath.org/ticket/715#comment:251" title="Comment 251">comment:251</a>. It passes when running <code>sage -t</code>, but fails when running <code>sage -t -gdb</code>.
</p>
TicketSimonKingFri, 07 Sep 2012 16:26:49 GMT
https://trac.sagemath.org/ticket/715#comment:259
<p>
Since the attachments changed, a message for the patchbots:
</p>
<p>
Apply trac_715_combined.patch trac_715_local_refcache.patch trac_715_safer.patch trac_715_specification.patch
</p>
<p>
And then <a class="closed ticket" href="https://trac.sagemath.org/ticket/11521" title="defect: Use weak references to cache homsets (closed: fixed)">#11521</a>
</p>
TicketSimonKingFri, 07 Sep 2012 16:44:29 GMT
https://trac.sagemath.org/ticket/715#comment:260
<p>
Replying to <a class="ticket" href="https://trac.sagemath.org/ticket/715#comment:257" title="Comment 257">nbruin</a>:
</p>
<blockquote class="citation">
<p>
Using <code>os.exit</code> versus <code>os._exit</code>: I can see why one might have thought that's a good idea. We got what we came for (the function got executed and the result is stored in an <code>.sobj</code> -- this should really be communicated via a pipe to the parent, not via a temporary file), so why risk fouling it up by doing more just to exit? However, if someone uses this for side-effects (write to some shared file) it could be the buffers don't get flushed. On the other hand, code is executing in parallel here (that's the point), so one would probably already run into problems.
</p>
</blockquote>
<p>
Changing <code>os._exit</code> into <code>os.exit</code> won't work. The example <code>oddprime_factors.precompute(range(1,99))</code> seems to hang with that change.
</p>
TicketnbruinFri, 07 Sep 2012 16:50:08 GMT
https://trac.sagemath.org/ticket/715#comment:261
<p>
Replying to <a class="ticket" href="https://trac.sagemath.org/ticket/715#comment:258" title="Comment 258">SimonKing</a>:
</p>
<blockquote class="citation">
<p>
I am not totally sure if I understand what you mean: You say it would be interesting to see the temporary file that is created by <code>sage -t</code>? Then: see <a class="attachment" href="https://trac.sagemath.org/attachment/ticket/715/failing_test_under_gdb.py" title="Attachment 'failing_test_under_gdb.py' in Ticket #715">failing_test_under_gdb.py</a><a class="trac-rawlink" href="https://trac.sagemath.org/raw-attachment/ticket/715/failing_test_under_gdb.py" title="Download"></a>.
</p>
<p>
The original file was as in <a class="ticket" href="https://trac.sagemath.org/ticket/715#comment:251" title="Comment 251">comment:251</a>. It passes when running <code>sage -t</code>, but fails when running <code>sage -t -gdb</code>.
</p>
</blockquote>
<p>
But doctesting doesn't run it through <code>sage</code>. It executes <code>python failing_test_under_gdb.py</code>, if I'm not mistaken. If you can run the exact same command and input file that <code>sage -t</code> runs and not get a SEGV where <code>sage -t</code> does, there is something really strange. I guess you might want to control for environment variables as well, but other than that there should really not be a difference.
</p>
TicketvbraunFri, 07 Sep 2012 17:13:17 GMT
https://trac.sagemath.org/ticket/715#comment:262
<p>
The forked children inherit the parent's <code>atexit</code> handlers; this is why we get out with <code>os._exit</code>. Calling the regular <code>os.exit</code> might delete the parent's temp files etc.
</p>
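<p>
The effect described here is easy to observe outside Sage. A minimal POSIX-only sketch (the helper name <code>cleanup_roles</code> is made up for illustration): a tiny script run in a fresh interpreter registers an <code>atexit</code> handler, forks, and lets the child leave either through <code>sys.exit</code> (the normal exit path) or through <code>os._exit</code>:
</p>

```python
import subprocess
import sys
import textwrap

# Script run in a fresh interpreter; MODE is prepended by cleanup_roles().
SCRIPT = textwrap.dedent("""
    import atexit, os, sys

    parent = os.getpid()

    def cleanup():
        # Stand-in for a handler that, e.g., deletes the parent's temp files.
        role = "parent" if os.getpid() == parent else "child"
        sys.stdout.write("cleanup ran in %s\\n" % role)
        sys.stdout.flush()

    atexit.register(cleanup)

    if os.fork() == 0:
        if MODE == "os._exit":
            os._exit(0)    # leave at once: no atexit handlers, no buffer flush
        sys.exit(0)        # normal shutdown: the inherited handler runs here
    os.wait()              # parent waits, so the child's output comes first
""")


def cleanup_roles(mode):
    """Return, in order, the processes in which cleanup() ran."""
    src = "MODE = %r\n%s" % (mode, SCRIPT)
    result = subprocess.run([sys.executable, "-c", src],
                            capture_output=True, text=True, check=True)
    return [line.rsplit(" ", 1)[-1] for line in result.stdout.splitlines()]


print(cleanup_roles("sys.exit"))   # ['child', 'parent']
print(cleanup_roles("os._exit"))   # ['parent']
```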
TicketnbruinFri, 07 Sep 2012 17:32:46 GMT
https://trac.sagemath.org/ticket/715#comment:263
<p>
Replying to <a class="ticket" href="https://trac.sagemath.org/ticket/715#comment:261" title="Comment 261">nbruin</a>:
</p>
<blockquote class="citation">
<p>
Replying to <a class="ticket" href="https://trac.sagemath.org/ticket/715#comment:258" title="Comment 258">SimonKing</a>:
</p>
<blockquote class="citation">
<p>
I am not totally sure if I understand what you mean: You say it would be interesting to see the temporary file that is created by <code>sage -t</code>? Then: see <a class="attachment" href="https://trac.sagemath.org/attachment/ticket/715/failing_test_under_gdb.py" title="Attachment 'failing_test_under_gdb.py' in Ticket #715">failing_test_under_gdb.py</a><a class="trac-rawlink" href="https://trac.sagemath.org/raw-attachment/ticket/715/failing_test_under_gdb.py" title="Download"></a>.
</p>
</blockquote>
</blockquote>
<p>
That's not the entire file, so even if you run this through python and do not get a SEGV, that doesn't show anything. Since the GDB problem and the SEGV are likely independent, you may well have cut out the doctest that generates the SEGV (or changed the memory conditions under which it runs).
</p>
TicketSimonKingFri, 07 Sep 2012 18:00:49 GMT
https://trac.sagemath.org/ticket/715#comment:264
<p>
Replying to <a class="ticket" href="https://trac.sagemath.org/ticket/715#comment:261" title="Comment 261">nbruin</a>:
</p>
<blockquote class="citation">
<p>
But doctesting doesn't run it through <code>sage</code>. It executes <code>python failing_test_under_gdb.py</code>, if I'm not mistaken. If you can run the exact same command and input file that <code>sage -t</code> runs and not get a SEGV where <code>sage -t</code> does, there is something really strange. I guess you might want to control for environment variables as well, but other than that there should really not be a difference.
</p>
</blockquote>
<pre class="wiki">bash-3.2$ ../../sage -python -t ~/SAGE/work/signal11/my_test_86673.py
</pre><p>
So, running it in pure python works, of course.
</p>
<p>
According to the sage-doctest script, I thought that the command to run the test under gdb is as follows:
</p>
<pre class="wiki">bash-3.2$ gdb --args ../../sage -python -t ~/SAGE/work/signal11/my_test_86673.py
GNU gdb 6.3.50-20050815 (Apple version gdb-1515) (Sat Jan 15 08:33:48 UTC 2011)
Copyright 2004 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB. Type "show warranty" for details.
This GDB was configured as "x86_64-apple-darwin"..."/scratch/sking/sage-5.4.beta0/sage": not in executable format: File format not recognized
(gdb) r
Starting program:
No executable file specified.
Use the "file" or "exec-file" command.
</pre><p>
So, it didn't work.
</p>
<p>
What is the command to run the test in python under gdb?
</p>
TicketSimonKingFri, 07 Sep 2012 18:02:36 GMT
https://trac.sagemath.org/ticket/715#comment:265
<p>
Replying to <a class="ticket" href="https://trac.sagemath.org/ticket/715#comment:263" title="Comment 263">nbruin</a>:
</p>
<blockquote class="citation">
<p>
Replying to <a class="ticket" href="https://trac.sagemath.org/ticket/715#comment:261" title="Comment 261">nbruin</a>:
</p>
<blockquote class="citation">
<p>
Replying to <a class="ticket" href="https://trac.sagemath.org/ticket/715#comment:258" title="Comment 258">SimonKing</a>:
</p>
<blockquote class="citation">
<p>
I am not totally sure if I understand what you mean: You say it would be interesting to see the temporary file that is created by <code>sage -t</code>? Then: see <a class="attachment" href="https://trac.sagemath.org/attachment/ticket/715/failing_test_under_gdb.py" title="Attachment 'failing_test_under_gdb.py' in Ticket #715">failing_test_under_gdb.py</a><a class="trac-rawlink" href="https://trac.sagemath.org/raw-attachment/ticket/715/failing_test_under_gdb.py" title="Download"></a>.
</p>
</blockquote>
</blockquote>
<p>
That's not the entire file
</p>
</blockquote>
<p>
Why do you think so? It is the temporary file created by sage-doctest. I had modified sage-doctest so that the location of the temporary file is shown, instead of deleting the file - hence, I could copy it and post it here.
</p>
TicketSimonKingFri, 07 Sep 2012 18:12:36 GMT
https://trac.sagemath.org/ticket/715#comment:266
<p>
Replying to <a class="ticket" href="https://trac.sagemath.org/ticket/715#comment:264" title="Comment 264">SimonKing</a>:
</p>
<blockquote class="citation">
<pre class="wiki">bash-3.2$ gdb --args ../../sage -python -t ~/SAGE/work/signal11/my_test_86673.py
...
</pre></blockquote>
<p>
Should have been
</p>
<pre class="wiki">bash-3.2$ gdb --args ../../local/bin/python -t ~/SAGE/work/signal11/my_test_86673.py
</pre><p>
However, running the test won't work:
</p>
<pre class="wiki">(gdb) r
Starting program: /scratch/sking/sage-5.4.beta0/local/bin/python -t /Users/SimonKing/SAGE/work/signal11/my_test_86673.py
Reading symbols for shared libraries .++++..... done
Traceback (most recent call last):
File "/Users/SimonKing/SAGE/work/signal11/my_test_86673.py", line 6, in <module>
from sage.all_cmdline import *;
File "/scratch/sking/sage-5.4.beta0/local/lib/python2.7/site-packages/sage/all_cmdline.py", line 14, in <module>
from sage.all import *
File "/scratch/sking/sage-5.4.beta0/local/lib/python2.7/site-packages/sage/all.py", line 47, in <module>
raise RuntimeError("To use the Sage libraries, set the environment variable SAGE_ROOT to the Sage build directory and LD_LIBRARY_PATH to $SAGE_ROOT/local/lib")
RuntimeError: To use the Sage libraries, set the environment variable SAGE_ROOT to the Sage build directory and LD_LIBRARY_PATH to $SAGE_ROOT/local/lib
Program exited with code 01.
(gdb)
</pre><p>
So, it needs to be executed inside a Sage shell. But then the test fails in exactly the same way as with <code>sage -t</code>, which shouldn't be a surprise.
</p>
TicketSimonKingFri, 07 Sep 2012 18:16:04 GMT
https://trac.sagemath.org/ticket/715#comment:267
<p>
PS: Setting SAGE_ROOT and LD_LIBRARY_PATH as indicated by the error message did not help.
</p>
TicketnbruinFri, 07 Sep 2012 21:41:50 GMT
https://trac.sagemath.org/ticket/715#comment:268
<p>
Replying to <a class="ticket" href="https://trac.sagemath.org/ticket/715#comment:265" title="Comment 265">SimonKing</a>:
</p>
<blockquote class="citation">
<p>
Why do you think so? It is the temporary file created by sage-doctest. I had modified sage-doctest so that the location of the temporary file is shown, instead of deleting the file - hence, I could copy it and post it here.
</p>
</blockquote>
<p>
I did the same but got a bigger file (I'm not attaching it because with the hardcoded paths it's useless, so you have to extract it yourself anyway)
</p>
<pre class="wiki">duke sage/5.3rc1$ wc failing_test_under_gdb.py
96 283 3714 failing_test_under_gdb.py
duke sage/5.3rc1$ wc cachefunc_3730.py
2592 10019 99307 cachefunc_3730.py
</pre><p>
so I suspect that you edited it. However, if your shorter file is still capable of segfaulting, that's fine, of course.
</p>
<p>
As you remark, it should be run in a sage shell:
</p>
<pre class="wiki">duke sage/5.3rc1$ ./sage -sh
Starting subshell with Sage environment variables set. Don't forget
to exit when you are done. Beware:
* Do not do anything with other copies of Sage on your system.
* Do not use this for installing Sage packages using "sage -i" or for
running "make" at Sage's root directory. These should be done
outside the Sage shell.
Bypassing shell configuration files...
Note: SAGE_ROOT=/usr/local/sage/5.3rc1
> time python cachefunc_3730.py
5.553u 2.243s 0:10.39 74.9% 0+0k 1128+17624io 1pf+0w
</pre><p>
If you do this on the machine where you get the SEGV (i.e., bsd) in the doctest,
you should really get a SEGV from this as well. If you don't, we should probably
start taking cosmic radiation into account as well.
</p>
<p>
For running under gdb:
</p>
<pre class="wiki">> gdb --args python -t cachefunc_3730.py
[...runs fine...]
</pre><p>
we already know that that prevents the SEGV from happening.
</p>
<p>
The key is that you now have a single file (<code>cachefunc_3730.py</code> for me; yours will have a different name) that you can tweak bit by bit. As we've seen,
running <code>sage -t --verbose</code> also prevents the SEGV, so setting
</p>
<pre class="wiki">if __name__ == '__main__':
    verbose = True
</pre><p>
will likely make the SEGV go away. However, you have finer control now. By
tweaking the file bit by bit you can probably zoom in on what goes wrong. Plus,
seeing the traceback from an unredirected stderr might already give you a hint
of what's going wrong.
</p>
<p>
My bet is that all doctests pass and that something goes wrong in <code>quit_sage</code>, where the flurry of deletions is likely to double-free something or to dereference an invalid pointer.
</p>
TicketSimonKingFri, 07 Sep 2012 22:10:51 GMT
https://trac.sagemath.org/ticket/715#comment:269
<p>
Replying to <a class="ticket" href="https://trac.sagemath.org/ticket/715#comment:268" title="Comment 268">nbruin</a>:
</p>
<blockquote class="citation">
<p>
Replying to <a class="ticket" href="https://trac.sagemath.org/ticket/715#comment:265" title="Comment 265">SimonKing</a>:
</p>
<blockquote class="citation">
<p>
Why do you think so? It is the temporary file created by sage-doctest. I had modified sage-doctest so that the location of the temporary file is shown, instead of deleting the file - hence, I could copy it and post it here.
</p>
</blockquote>
<p>
I did the same but got a bigger file (I'm not attaching it because with the hardcoded paths it's useless, so you have to extract it yourself anyway)
</p>
<pre class="wiki">duke sage/5.3rc1$ wc failing_test_under_gdb.py
96 283 3714 failing_test_under_gdb.py
duke sage/5.3rc1$ wc cachefunc_3730.py
2592 10019 99307 cachefunc_3730.py
</pre><p>
so I suspect that you edited it. However, if your shorter file is still capable of segfaulting, that's fine, of course.
</p>
</blockquote>
<p>
As I said: It is the file from <a class="ticket" href="https://trac.sagemath.org/ticket/715#comment:251" title="Comment 251">comment:251</a>, it is <em>not</em> cachefunc.pyx, but just a single test from cachefunc.pyx that suffices to trigger the error (which also demonstrates that it is not a side effect of other tests).
</p>
<blockquote class="citation">
<p>
If you do this on the machine where you get the SEGV (i.e., bsd) in the doctest,
</p>
</blockquote>
<p>
Do I get SEGV? Is that a synonym of signal 11?
</p>
<blockquote class="citation">
<p>
For running under gdb:
</p>
<pre class="wiki">> gdb --args python -t cachefunc_3730.py
[...runs fine...]
</pre><p>
we already know that that prevents the SEGV from happening.
</p>
</blockquote>
<p>
Is it preventing it from happening? I thought we have found that some signal problem is still present for sub-processes created with p_iter_fork.
</p>
TicketnbruinFri, 07 Sep 2012 22:41:21 GMT
https://trac.sagemath.org/ticket/715#comment:270
<p>
Replying to <a class="ticket" href="https://trac.sagemath.org/ticket/715#comment:269" title="Comment 269">SimonKing</a>:
</p>
<blockquote class="citation">
<p>
Do I get SEGV? Is that a synonym of signal 11?
</p>
</blockquote>
<p>
Ah, yes. SIGSEGV (a segmentation fault) gets communicated via a signal 11.
</p>
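<p>
To make the connection concrete: when a child process dies from a segmentation fault, its parent sees termination by signal 11 (<code>SIGSEGV</code> is 11 on Linux, OS X and the BSDs), which Python's <code>subprocess</code> reports as a negative return code. A small sketch in which the child sends <code>SIGSEGV</code> to itself as a stand-in for a real memory fault; <code>describe_exit</code> is a hypothetical helper:
</p>

```python
import resource
import signal
import subprocess
import sys


def describe_exit(returncode):
    """Turn a subprocess return code into text; negative means 'killed by signal'."""
    if returncode < 0:
        sig = signal.Signals(-returncode)
        return "killed by %s (signal %d)" % (sig.name, sig.value)
    return "exited with status %d" % returncode


def _no_core_file():
    # Runs in the child just before exec: keep this demo from writing a core file.
    resource.setrlimit(resource.RLIMIT_CORE, (0, 0))


# A child that delivers SIGSEGV to itself, standing in for a real memory fault.
crash = "import os, signal; os.kill(os.getpid(), signal.SIGSEGV)"
proc = subprocess.run([sys.executable, "-c", crash], preexec_fn=_no_core_file)
print(describe_exit(proc.returncode))   # killed by SIGSEGV (signal 11) on Linux
```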
<blockquote class="citation">
<p>
Is it preventing it from happening? I thought we have found that some signal problem is still present for sub-processes created with p_iter_fork.
</p>
</blockquote>
<p>
Right. But signal handlers and segfaults are only related to the extent that a segmentation fault gets communicated via a signal. So being killed because of a "signal 11" doesn't particularly indicate any problem with stray signals or signal handlers. It's probably just a plain memory fault. At this point I think there is little ground to assume that the SIGALRM issues observed are related to the segmentation fault, in particular because <a class="closed ticket" href="https://trac.sagemath.org/ticket/13437" title="defect: Clean up SIGALRM handling in p_iter_fork (closed: invalid)">#13437</a> fixes one and not the other.
Even if there is a connection, it doesn't seem that exploring a hypothetical one is going to help much in tracing the problem. If
</p>
<pre class="wiki">$ sage -sh
...
> python -t failing_test_under_gdb.py
</pre><p>
is giving you a segfault, perhaps you can get that to dump core? (if I send
<code>kill -11 [python process]</code> I get a core dump if I unset the limit). Alternatively, perhaps
</p>
<pre class="wiki">> sage failing_test_under_gdb.py
</pre><p>
is close enough that it still segfaults. Sage installs a more useful SIGSEGV handler that at least gives you a traceback on stderr. Apparently involving gdb changes things too much to still observe the error, so from that point it's just tweaking the file and/or sage to see where the error is originating.
</p>
TicketSimonKingSat, 08 Sep 2012 06:15:27 GMT
https://trac.sagemath.org/ticket/715#comment:271
<p>
Replying to <a class="ticket" href="https://trac.sagemath.org/ticket/715#comment:270" title="Comment 270">nbruin</a>:
</p>
<blockquote class="citation">
<p>
Replying to <a class="ticket" href="https://trac.sagemath.org/ticket/715#comment:269" title="Comment 269">SimonKing</a>:
If
</p>
<pre class="wiki">$ sage -sh
...
> python -t failing_test_under_gdb.py
</pre><p>
is giving you a segfault,
</p>
</blockquote>
<p>
It isn't. As I stated above, with the short test file written down in <a class="ticket" href="https://trac.sagemath.org/ticket/715#comment:251" title="Comment 251">comment:251</a> I can reproduce the failure occurring with <code>sage -t -gdb</code>, but it passes with <code>sage -t</code>. And so does its pure python version.
</p>
<p>
In other words, I'll now try to get the python version of the full test of cachefunc.pyx.
</p>
<blockquote class="citation">
<p>
perhaps you can get that to dump core? (if I send
<code>kill -11 [python process]</code> I get a core dump if I unset the limit).
</p>
</blockquote>
<p>
Could you elaborate more? By <code>[python process]</code> you mean the pid of the test, right? How can I find out the pid in the few seconds that the test takes before failing?
</p>
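<p>
One way to avoid racing for the pid is to start the test from a small launcher that raises the core-size limit for the child only and prints the child's pid as soon as it starts. A sketch with hypothetical names (<code>run_with_core_dumps</code>, <code>_allow_core_dumps</code>), assuming it is run with Sage's <code>python</code>:
</p>

```python
import resource
import subprocess
import sys


def _allow_core_dumps():
    # Runs in the child before exec: raise the soft core-size limit as far as
    # the hard limit allows (like `ulimit -c unlimited` when the hard limit is
    # unlimited).
    _soft, hard = resource.getrlimit(resource.RLIMIT_CORE)
    resource.setrlimit(resource.RLIMIT_CORE, (hard, hard))


def run_with_core_dumps(script, python=sys.executable):
    """Start `script`, report its pid immediately, and return its return code."""
    proc = subprocess.Popen([python, script], preexec_fn=_allow_core_dumps)
    sys.stderr.write("test process pid: %d\n" % proc.pid)  # name of the core file
    return proc.wait()
```

<p>
For example, <code>run_with_core_dumps("cachefunc_94107.py")</code> from inside <code>sage -sh</code>. Whether a core file is actually written, and where, is decided by the operating system (on OS X typically <code>/cores/core.&lt;pid&gt;</code>, on Linux according to <code>/proc/sys/kernel/core_pattern</code>), and a crash handler installed by the program itself may also interfere.
</p>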
TicketSimonKingSat, 08 Sep 2012 06:19:46 GMT
https://trac.sagemath.org/ticket/715#comment:272
<p>
Replying to <a class="ticket" href="https://trac.sagemath.org/ticket/715#comment:271" title="Comment 271">SimonKing</a>:
</p>
<blockquote class="citation">
<p>
In other words, I'll now try to get the python version of the full test of cachefunc.pyx.
</p>
</blockquote>
<p>
I tried to modify the function delete_tmpfiles() in sage-doctest such that the temporary files are preserved, but apparently the function is not executed. That may indicate that in fact the test framework fails, not the test.
</p>
TicketSimonKingSat, 08 Sep 2012 06:24:58 GMTattachment set
https://trac.sagemath.org/ticket/715
<ul>
<li><strong>attachment</strong>
set to <em>cachefunc_94107.py</em>
</li>
</ul>
<p>
Temporary file created by sage -t that gives signal 11
</p>
TicketSimonKingSat, 08 Sep 2012 06:32:20 GMT
https://trac.sagemath.org/ticket/715#comment:273
<p>
With <a class="attachment" href="https://trac.sagemath.org/attachment/ticket/715/cachefunc_94107.py" title="Attachment 'cachefunc_94107.py' in Ticket #715">cachefunc_94107.py</a><a class="trac-rawlink" href="https://trac.sagemath.org/raw-attachment/ticket/715/cachefunc_94107.py" title="Download"></a>, I get:
</p>
<pre class="wiki">(sage-sh) SimonKing@bsd:sage$ python -t ~/SAGE/work/signal11/cachefunc_94107.py
------------------------------------------------------------------------
Unhandled SIGSEGV: A segmentation fault occurred in Sage.
This probably occurred because a *compiled* component of Sage has a bug
in it and is not properly wrapped with sig_on(), sig_off(). You might
want to run Sage under gdb with 'sage -gdb' to debug this.
Sage will now terminate.
------------------------------------------------------------------------
Segmentation fault
</pre><p>
So, that looks much more expressive than what sage -t reports!
</p>
<p>
However, setting ulimit -c unlimited did not result in a dumped core:
</p>
<pre class="wiki">(sage-sh) SimonKing@bsd:sage$ ulimit -c unlimited
(sage-sh) SimonKing@bsd:sage$ python -t ~/SAGE/work/signal11/cachefunc_94107.py
...
Segmentation fault
(sage-sh) SimonKing@bsd:sage$ ls /cores/
(sage-sh) SimonKing@bsd:sage$
</pre><p>
So, can you explain how I could get a core dump?
</p>
<p>
Apply trac_715_combined.patch trac_715_local_refcache.patch trac_715_safer.patch trac_715_specification.patch
</p>
<p>
And then <a class="closed ticket" href="https://trac.sagemath.org/ticket/11521" title="defect: Use weak references to cache homsets (closed: fixed)">#11521</a>
</p>
TicketSimonKingSat, 08 Sep 2012 06:39:21 GMT
https://trac.sagemath.org/ticket/715#comment:274
<p>
Playing around with <a class="attachment" href="https://trac.sagemath.org/attachment/ticket/715/cachefunc_94107.py" title="Attachment 'cachefunc_94107.py' in Ticket #715">cachefunc_94107.py</a><a class="trac-rawlink" href="https://trac.sagemath.org/raw-attachment/ticket/715/cachefunc_94107.py" title="Download"></a>:
</p>
<p>
<code>sage cachefunc_94107.py</code> also results in that segfault.
</p>
<p>
Starting sage and then attaching cachefunc_94107.py to the interactive session, I get:
</p>
<pre class="wiki">sage: attach ~/SAGE/work/signal11/cachefunc_94107.py
---------------------------------------------------------------------------
SystemExit Traceback (most recent call last)
/scratch/sking/sage-5.4.beta0/devel/sage-main/<ipython console> in <module>()
/scratch/sking/sage-5.4.beta0/local/lib/python2.7/site-packages/sage/misc/preparser.pyc in load(filename, globals, attach)
1646
1647 if fpath.endswith('.py'):
-> 1648 execfile(fpath, globals)
1649 elif fpath.endswith('.sage'):
1650 if (attach and attach_debug_mode) or ((not attach) and load_debug_mode):
/Users/SimonKing/SAGE/work/signal11/cachefunc_94107.py in <module>()
2588 sys.exit(255)
2589 quit_sage(verbose=False)
2590 if runner.failures > 254:
2591 sys.exit(254)
-> 2592 sys.exit(runner.failures)
SystemExit: 0
Type %exit or %quit to exit IPython (%Exit or %Quit do so unconditionally).
sage:
</pre><p>
So, up to here, it more or less looks normal. But when I press Ctrl-D to leave the interactive session, I get:
</p>
<pre class="wiki">Exiting Sage (CPU time 0m0.70s, Wall time 0m30.10s).
/scratch/sking/sage-5.4.beta0/spkg/bin/sage: line 336: 97357 Segmentation fault sage-ipython "$@" -i
</pre><p>
The "Exiting Sage ..." is printed at the beginning of sage.all.quit_sage. Hence, it now seems that (again) leaving Sage is the problem.
</p>
TicketSimonKingSat, 08 Sep 2012 06:49:12 GMT
https://trac.sagemath.org/ticket/715#comment:275
<p>
Nils, I have absolutely no idea how you made messages show up in my screen session on bsd.math, and I also have no idea how to answer.
</p>
<p>
What I did now: I edited the test file, so that it ends with
</p>
<div class="wiki-code"><div class="code"><pre><span class="k">except</span> <span class="ne">BaseException</span><span class="p">,</span> msg<span class="p">:</span>
    <span class="k">print</span> <span class="s">"an exception has occured"</span>
    <span class="k">print</span> msg
    <span class="c">#quit_sage(verbose=False)</span>
    <span class="kn">import</span> <span class="nn">traceback</span>
    traceback<span class="o">.</span>print_exc<span class="p">(</span><span class="nb">file</span><span class="o">=</span>sys<span class="o">.</span>stdout<span class="p">)</span>
    <span class="c">#sys.exit(255)</span>
<span class="k">print</span> <span class="s">"we would now quit, but we don't"</span>
<span class="c">#quit_sage(verbose=False)</span>
<span class="c">#if runner.failures > 254:</span>
<span class="c"># sys.exit(254)</span>
<span class="c">#sys.exit(runner.failures)</span>
</pre></div></div><p>
Then, attaching the file and quitting sage works fine, and an exception does not occur. No idea what that means, though.
</p>
TicketSimonKingSat, 08 Sep 2012 06:55:58 GMT
https://trac.sagemath.org/ticket/715#comment:276
<p>
Sorry for the noise. The segmentation fault already occurs when one calls <code>quit_sage()</code> in an interactive session and then quits sage (which implies executing <code>quit_sage()</code> again).
</p>
TicketSimonKingSat, 08 Sep 2012 07:12:43 GMT
https://trac.sagemath.org/ticket/715#comment:277
<p>
A question, perhaps slightly off-topic (or not?): Do we want to change quit_sage such that using it twice does not result in a segfault but only in a "harmless" error (or better: In no error at all)?
</p>
<p>
I know, one is not supposed to call quit_sage() explicitly (even though it appears in the global name space of interactive sessions), but perhaps it could be made safer.
</p>
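<p>
One way to get the "no error at all" behaviour is to make the shutdown routine idempotent at the Python level. A sketch with a hypothetical <code>run_once</code> decorator and a stand-in <code>quit_library</code> (not Sage's <code>quit_sage</code>). Note that it only guards against calling the Python function twice; each C-level deallocator would still need its own check, e.g. resetting a pointer to NULL after freeing it:
</p>

```python
import functools


def run_once(func):
    """Make a teardown function idempotent: only the first call does any work."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        if wrapper.done:
            return None
        wrapper.done = True   # set before running, so a failing teardown is not retried
        return func(*args, **kwargs)
    wrapper.done = False
    return wrapper


released = []


@run_once
def quit_library(verbose=True):
    # Hypothetical stand-in for quit_sage(): release global C-level state.
    released.append("pari stack")
    released.append("mpz globals")
    if verbose:
        print("Exiting")


quit_library(verbose=False)
quit_library(verbose=False)   # second call is a no-op
print(released)               # ['pari stack', 'mpz globals']
```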
TicketSimonKingSat, 08 Sep 2012 07:21:47 GMT
https://trac.sagemath.org/ticket/715#comment:278
<p>
Ah! If one does quit_sage() in an interactive session and then quits sage, the segfault occurs in _unsafe_deallocate_pari_stack. That problem is fixed in another patch of mine, namely <a class="ext-link" href="http://trac.sagemath.org/sage_trac/attachment/ticket/12215/trac12215_segfault_fixes.patch"><span class="icon"></span>at #12215</a>.
</p>
TicketSimonKingSat, 08 Sep 2012 07:33:35 GMT
https://trac.sagemath.org/ticket/715#comment:279
<p>
If one replaces _unsafe_deallocate_pari_stack by <code>__dealloc__</code> (as suggested by <a class="closed ticket" href="https://trac.sagemath.org/ticket/12215" title="defect: Memleak in UniqueRepresentation, @cached_method (closed: fixed)">#12215</a>), the segfault created by manually using quit_sage() has moved to <code>sage.rings.integer.clear_mpz_globals()</code>. I guess that function has to do some checks before calling free.
</p>
<p>
Anyway. With the change to pari, one still has the signal 11 problem in sage -t sage/misc/cachefunc.pyx, as before.
</p>
TicketnbruinSat, 08 Sep 2012 07:46:53 GMT
https://trac.sagemath.org/ticket/715#comment:280
<p>
I don't think that <code>quit_sage()</code> is the cause. Here's why. When I run <code>python cachefunc*.py</code> I observe that the segfault is happening during the actual doctests. In fact, it can happen in <code>example_27</code>.
I've changed the doctests there to fail, so that non-verbose output tells me what happens:
</p>
<pre class="wiki">...
def example_27(): r""">>> set_random_seed(0L)
>>> change_warning_output(sys.stdout)
Call the cached method without using the cache.
EXAMPLE::
>>> 1
DO WE SEE THIS?
>>> P = QQ['a, b, c, d']; (a, b, c, d,) = P._first_ngens(4)###line 1038:_sage_ >>> P.<a,b,c,d> = QQ[]
WE DO NOT SEE THIS
...
</pre><p>
When I run that, the (added) doctest fails visibly on <code>DO WE SEE THIS?</code> but the next line does not fail visibly anymore. That's not to say that that line has the bug in it. It just happens to get trapped in whatever memory corruption has happened before. So at least we know that whatever causes the corruption, it's executed before that point.
</p>
<p>
The actual trigger point may not carry real information. For instance, if I edit some doctests in e.g. example_17 to fail, I do get the printed failures but no segfault at all. That is consistent with <code>--verbose</code> somehow preventing the segfault.
</p>
<p>
In any case, it seems that in an interactive session the segfault trigger gets postponed even further and only happens in <code>quit_sage()</code>. But that doesn't mean that <code>quit_sage()</code> is to blame.
</p>
TicketnbruinSat, 08 Sep 2012 08:19:27 GMT
https://trac.sagemath.org/ticket/715#comment:281
<p>
Continuing the bisection tour: while the argument above was sound (the memory corruption must happen before the segfault), the following is only a heuristic. We observed that letting a doctest print/fail early in the file prevents the segfault from happening. This could be because such a print changes the memory layout (triggers a GC or something like that), and hence the corruption that is still to come happens in a different place and doesn't lead to a segfault. This idea seems surprisingly robust in practice: if you add failing doctests after a certain point, you still get the segfault in the place pointed out above; if you add failing doctests before that point, no segfault happens. The hypothesis is now that the crossover point is where the corruption happens. It's in example_21:
</p>
<pre class="wiki">...
This class is a pickle. However, sometimes, pickles
need to be pickled another time.
TEST::
>>> PF = WeylGroup(['A',Integer(3)]).pieri_factors()###line 846:_sage_ >>> PF = WeylGroup(['A',3]).pieri_factors()
>>> a = PF.an_element()###line 847:_sage_ >>> a = PF.an_element()
>>> 1
NOT THIS
>>> a.bruhat_lower_covers()###line 848:_sage_ >>> a.bruhat_lower_covers()
...
</pre><p>
With this in place, a segfault still happens. If I move the failing doctest before <code>PF.an_element</code>, we don't get a segfault. So perhaps that routine is to blame? Missing refcount increase perhaps?
</p>
<p>
Once again, this is only heuristic! I have no proof. It's just that around this location, segfaulting seems to react to changes.
</p>
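<p>
The bisection above was done by hand. Assuming the crossover really is monotone (which, as said, is only a heuristic), it could be automated along these lines. All names are hypothetical; the sketch assumes the generated test file marks examples with a leading <code>&gt;&gt;&gt; </code>, that it is run with Sage's <code>python</code> (e.g. from <code>sage -sh</code>), and that a crash shows up as death by <code>SIGSEGV</code>:
</p>

```python
import os
import signal
import subprocess
import sys
import tempfile

# A doctest that always fails visibly, as in the experiments above.
MARKER = [">>> 1", "NOT THIS"]


def example_starts(lines):
    """Indices of lines that begin a doctest example (assumes a '>>> ' prefix)."""
    return [i for i, line in enumerate(lines) if line.lstrip().startswith(">>> ")]


def with_marker(lines, index):
    """Copy of `lines` with MARKER inserted before line `index`, same indentation."""
    line = lines[index]
    indent = line[: len(line) - len(line.lstrip())]
    return lines[:index] + [indent + m for m in MARKER] + lines[index:]


def segfaults(lines, python=sys.executable):
    """Run the modified file once; True if the interpreter was killed by SIGSEGV."""
    with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
        f.write("\n".join(lines) + "\n")
    try:
        result = subprocess.run([python, f.name], capture_output=True)
        return result.returncode == -signal.SIGSEGV
    finally:
        os.remove(f.name)


def first_true(lo, hi, pred):
    """Smallest k in [lo, hi) with pred(k) true, assuming pred is monotone
    (false ... false true ... true); returns hi if pred is never true."""
    while lo < hi:
        mid = (lo + hi) // 2
        if pred(mid):
            hi = mid
        else:
            lo = mid + 1
    return lo


def suspect_line(path):
    """Line index of the last example executed before the crossover, or None."""
    with open(path) as f:
        lines = f.read().splitlines()
    starts = example_starts(lines)
    k = first_true(0, len(starts),
                   lambda j: segfaults(with_marker(lines, starts[j])))
    return starts[k - 1] if 0 < k < len(starts) else None
```

<p>
Each probe reruns the whole file (roughly ten seconds in the timing above), but only about log2(n) probes are needed for n examples.
</p>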
TicketnbruinSat, 08 Sep 2012 08:47:00 GMT
https://trac.sagemath.org/ticket/715#comment:282
<pre class="wiki">sage: PF = WeylGroup(['A',3]).pieri_factors()
sage: %time a = PF.an_element()
</pre><p>
I think <code>an_element</code> is exonerated. If you replace it with <code>a=iter(PF).next()</code> you get the same element and the same segfault.
</p>
<p>
Further desperate facts that may or may not be relevant:
</p>
<ul><li>if you make <code>TripleDict</code> strong on <em>any</em> of its keys, the segfault disappears. That doesn't say no memory corruption happens of course.
</li></ul><ul><li>if you store all key triples fed into <code>TripleDict</code> (keeping strong references), you find 220 keys before the tests run (i.e., just due to sage startup) and 351 after (and, of course, no segfault). A set of the 151 new entries:
<pre class="wiki">set([Full MatrixSpace of 4 by 4 sparse matrices over Integer Ring, Set of Python objects of type 'long', Ring of integers modulo 389, Extended weight space over the Rational Field of the Root system of type ['A', 3, 1], Full MatrixSpace of 4 by 4 dense matrices over Rational Field, Multivariate Polynomial Ring in a, b, c, d over Rational Field, Weight space over the Rational Field of the Root system of type ['A', 3], Coroot lattice of the Root system of type ['A', 3, 1], Ambient space of the Root system of type ['A', 3], Set of Python objects of type 'int', Weight lattice of the Root system of type ['A', 3, 1], Vector space of dimension 4 over Rational Field, Weight lattice of the Root system of type ['A', 3], Extended weight lattice of the Root system of type ['A', 3, 1], Full MatrixSpace of 130 by 390 sparse matrices over Rational Field, <type 'int'>, Root space over the Rational Field of the Root system of type ['A', 3], Interface to the PARI C library, Integer Ring, Root space over the Rational Field of the Root system of type ['A', 3, 1], The Infinity Ring, Rational Field, Weight space over the Rational Field of the Root system of type ['A', 3, 1], Root lattice of the Root system of type ['A', 3, 1], Multivariate Polynomial Ring in x, y, z over Rational Field, Full MatrixSpace of 130 by 390 sparse matrices over Integer Ring, <type 'NoneType'>, Root lattice of the Root system of type ['A', 3], <type 'long'>])
</pre></li></ul><p>
Quite a few entries involve "root systems" etc., so it is not far-fetched to think that a bad deletion of something involving the <code>WeylGroup</code> causes the memory corruption. In that case, the corruption happens on all systems and only triggers a segfault on bsd. Is there someone with good valgrind experience who wants to analyze the memory management of the <code>cachefunc.pyx</code> doctests?
</p>
TicketjdemeyerSat, 08 Sep 2012 09:10:52 GMT
https://trac.sagemath.org/ticket/715#comment:283
<p>
If you guys ever solve this problem, you really deserve some kind of medal for Debugging Excellence.
</p>
TicketSimonKingSat, 08 Sep 2012 11:10:16 GMT
https://trac.sagemath.org/ticket/715#comment:284
<p>
Replying to <a class="ticket" href="https://trac.sagemath.org/ticket/715#comment:282" title="Comment 282">nbruin</a>:
</p>
<blockquote class="citation">
<p>
Implicated in all of this, by the way:
</p>
<pre class="wiki">class WeylGroup_gens(ClearCacheOnPickle, UniqueRepresentation, MatrixGroup_gens)
</pre><p>
Oh this is so cool.
</p>
</blockquote>
<p>
Indeed! Until not so long ago (i.e., before <a class="closed ticket" href="https://trac.sagemath.org/ticket/11115" title="enhancement: Rewrite cached_method in Cython (closed: fixed)">#11115</a>), <code>ClearCacheOnPickle</code> was totally broken. And it was originally designed for strongly cached methods. But with the patches from here, <code>UniqueRepresentation</code> has a weak cache of its <code>__classcall__</code>. That might be worth analysing; I am not at all sure whether this can be a problem, because <code>__classcall__</code> is a <code>cached_function</code>, whereas <code>ClearCacheOnPickle</code> is only supposed to clear a <code>cached_method</code>.
</p>
<blockquote class="citation">
<p>
All our favourites in one place:
</p>
<pre class="wiki">class MatrixGroup_gens(MatrixGroup_gap)
</pre><p>
We're wrapping an interface too!
</p>
</blockquote>
<p>
:)
</p>
TicketSimonKingSat, 08 Sep 2012 15:08:46 GMT
https://trac.sagemath.org/ticket/715#comment:285
<p>
Replying to <a class="ticket" href="https://trac.sagemath.org/ticket/715#comment:282" title="Comment 282">nbruin</a>:
</p>
<blockquote class="citation">
<p>
Oh sigh ... this could be such a red herring. On bsd.math, there is a huge difference between sage versions in how this piece of code behaves:
On <code>sage 5.4.beta0</code> (with patches):
</p>
<pre class="wiki">sage: PF = WeylGroup(['A',3]).pieri_factors()
sage: %time a = PF.an_element()
CPU times: user 0.06 s, sys: 0.05 s, total: 0.11 s
Wall time: 43.57 s
</pre></blockquote>
<p>
That's strange, but I cannot confirm that timing. On bsd.math with patched 5.4.beta0:
</p>
<pre class="wiki">sage: PF = WeylGroup(['A',3]).pieri_factors()
sage: %time a = PF.an_element()
CPU times: user 0.06 s, sys: 0.05 s, total: 0.11 s
Wall time: 0.75 s
</pre><blockquote class="citation">
<p>
We're wrapping an interface too! (That sort of explains the anomalous timing. Apparently the particular 5.4b0 build on bsd has a very bad GAP?)
</p>
</blockquote>
<p>
Works for me.
</p>
TicketSimonKingSat, 08 Sep 2012 16:47:48 GMT
https://trac.sagemath.org/ticket/715#comment:286
<p>
Replying to <a class="ticket" href="https://trac.sagemath.org/ticket/715#comment:282" title="Comment 282">nbruin</a>:
</p>
<blockquote class="citation">
<p>
The method eventually called, <code>PF._an_element_</code>, is a very interesting piece of work.
</p>
</blockquote>
<p>
For the record: it is the generic <code>_an_element_</code>, defined in <code>sage.structure.parent.Parent</code>.
</p>
TicketnbruinSat, 08 Sep 2012 17:21:53 GMT
https://trac.sagemath.org/ticket/715#comment:287
<p>
Replying to <a class="ticket" href="https://trac.sagemath.org/ticket/715#comment:285" title="Comment 285">SimonKing</a>:
</p>
<blockquote class="citation">
<p>
That's strange, but I cannot confirm that timing. On bsd.math with patched 5.4.beta0:
</p>
</blockquote>
<p>
I cannot reproduce it anymore either (I've copied your 5.4b0 on bsd). When I try it now, I get timings similar to yours. When I tried before, I did so repeatedly, with both sage versions.
</p>
<p>
However, the segfault is still triggered as reported: if a doctest fails before <code>PF._an_element</code>, there is no segfault; otherwise, there is.
</p>
TicketnbruinSat, 08 Sep 2012 19:36:24 GMT
https://trac.sagemath.org/ticket/715#comment:288
<p>
OK, in principle it is possible to handle segfaults with code like:
</p>
<pre class="wiki">import signal
import os, sys
import traceback

def handler(signum, frame):
    # Print the Python call stack of the interrupted frame
    traceback.print_stack(frame, file=sys.stderr)
    sys.stderr.flush()
    # Exit immediately, without running any Python cleanup
    os._exit(255)

signal.signal(signal.SIGSEGV, handler)
</pre><p>
Of course, with a serious corruption, it's doubtful that code can run successfully. Indeed, if we equip the doctesting script with it, we don't get useful information: the script just loops forever.
</p>
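<p>
The loop has a known cause: the CPython <code>signal</code> documentation notes that catching synchronous errors such as <code>SIGSEGV</code> in Python makes little sense, because the Python-level handler only runs after the low-level C handler has returned, so the faulting code raises the same signal again and the process appears to hang. The standard <code>faulthandler</code> module (Python 3.3 and later; a separately installed backport exists for Python 2) instead dumps the Python traceback from a C-level handler. The following is a minimal, hedged sketch of that alternative, not something that was tried on bsd. The child process deliberately dereferences a NULL pointer with <code>ctypes.string_at(0)</code>, the same trick the <code>faulthandler</code> documentation uses.
</p>
<pre class="wiki">import faulthandler
import subprocess
import sys

# In-process: install C-level handlers for SIGSEGV, SIGFPE, SIGABRT, SIGBUS
# and SIGILL that print the Python traceback of every thread.  sys.__stderr__
# is used because it keeps a real file descriptor even when sys.stderr has
# been redirected (as doctest runners tend to do).
faulthandler.enable(file=sys.__stderr__, all_threads=True)
print(faulthandler.is_enabled())

# Demonstration in a child process, so that the crash does not kill us;
# "-X faulthandler" enables the module at interpreter startup.
child_code = "import ctypes; ctypes.string_at(0)"
proc = subprocess.run([sys.executable, "-X", "faulthandler", "-c", child_code],
                      capture_output=True, text=True)
crashed = proc.returncode != 0
dumped = "Fatal Python error" in proc.stderr
print(crashed, dumped)
</pre>
<p>
Unlike the Python-level handler, this never returns to the faulting instruction, so it cannot loop; whether the traceback is still informative after a heap corruption is another matter.
</p>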
<p>
I cannot debug on bsd because OS X wants admin credentials. However, I think it is possible to attach gdb to a running process, in which case it might be possible to poke around in the corpse a bit.
</p>
<p>
See
</p>
<pre class="wiki">bsd.math.washington.edu:/scratch/nbruin/sage-5.4.beta0/segv_handle_infinite_loop.py
</pre><p>
I've also tried putting <code>gc.collect()</code> in the doctest. If you put it early enough in the file (either before or a bit after the <code>an_element</code> call), it prevents the segfault. If you put it right before the test where the segfault happens, the collection itself does not segfault, but the segfault still happens afterwards. This is all consistent with a corruption that happens at one point and triggers a fault somewhere else. Is there a way to put a command in the doctest that would drop us into a (Python) debugger or a REPL? Then we could pick through the memory and see if there's anything unsavoury.
</p>
TicketSimonKingSat, 08 Sep 2012 20:15:38 GMT
https://trac.sagemath.org/ticket/715#comment:289
<p>
I really wonder about the use of <code>ClearCacheOnPickle</code> here. Quite simply: <code>ClearCacheOnPickle</code> cannot work together with an object whose pickling relies on a <code>__reduce__</code> method. It only works for objects that are pickled via <code>__getstate__</code>.
</p>
<p>
In particular, since <code>loads(dumps(W))</code> <em>is</em> <code>W</code>, nothing is emptied. And even when the object is saved to disk, the cache is not emptied:
</p>
<pre class="wiki">sage: W = WeylGroup(['A',3])
sage: W.cartan_type
Cached version of <function cartan_type at 0x10aad5398>
sage: W.cartan_type.cache
['A', 3]
sage: save(W,'tmp')
</pre><p>
Start a new session:
</p>
<pre class="wiki">sage: W = load('tmp.sobj')
sage: W.cartan_type.cache
['A', 3]
</pre><p>
Hence, it makes absolutely no sense to me that <code>sage.combinat.root_system.weyl_group.WeylGroup_gens</code> inherits from both <code>ClearCacheOnPickle</code> and <code>UniqueRepresentation</code>: the two bases work against each other. I think <code>ClearCacheOnPickle</code> should be removed here.
</p>
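<p>
The point about <code>__reduce__</code> versus <code>__getstate__</code> can be seen in plain Python. The sketch below is hypothetical: <code>ClearCacheSketch</code> is a stand-in that, like the idea behind <code>ClearCacheOnPickle</code>, drops a <code>_cache</code> attribute from the state returned by <code>__getstate__</code>, and <code>unique_instance</code> imitates a <code>UniqueRepresentation</code>-style factory. Neither is Sage's actual code. When a class pickles itself through a custom <code>__reduce__</code> that only returns a factory and its arguments, <code>pickle</code> never consults <code>__getstate__</code>, so nothing is cleared.
</p>
<pre class="wiki">import pickle

class ClearCacheSketch:
    """Hypothetical stand-in: drop the cache from the pickled state."""
    def __getstate__(self):
        state = dict(self.__dict__)
        state.pop("_cache", None)   # the only place where the cache is cleared
        return state

class PickledViaGetstate(ClearCacheSketch):
    def __init__(self, n):
        self.n = n
        self._cache = {}

_instances = {}

def unique_instance(n):
    """Hypothetical UniqueRepresentation-style factory: one object per n."""
    if n not in _instances:
        _instances[n] = PickledViaReduce(n)
    return _instances[n]

class PickledViaReduce(ClearCacheSketch):
    def __init__(self, n):
        self.n = n
        self._cache = {}

    def __reduce__(self):
        # Pickled as "call unique_instance(n)"; __getstate__ is never used.
        return (unique_instance, (self.n,))

a = PickledViaGetstate(3)
a._cache["cartan_type"] = ("A", 3)
b = pickle.loads(pickle.dumps(a))
print(hasattr(b, "_cache"))        # False: the cache was dropped

w = unique_instance(3)
w._cache["cartan_type"] = ("A", 3)
w2 = pickle.loads(pickle.dumps(w))
print(w2 is w, w2._cache)          # True {'cartan_type': ('A', 3)}
</pre>
<p>
The second round trip mirrors <code>loads(dumps(W)) is W</code>: the very same cached object comes back, so a cache-clearing <code>__getstate__</code> has nothing to act on.
</p>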
<p>
However, as I just tested, dropping <code>ClearCacheOnPickle</code> does not fix the signal 11.
</p>
TicketnbruinSun, 09 Sep 2012 07:05:59 GMT
https://trac.sagemath.org/ticket/715#comment:290
<p>
Oops, wrong button. I meant to reply to <a class="ticket" href="https://trac.sagemath.org/ticket/715#comment:282" title="Comment 282">282</a> but instead I edited the text. You can still read the original under "previous". Here is the reply:
</p>
<pre class="wiki">sage: PF = WeylGroup(['A',3]).pieri_factors()
sage: %time a = PF.an_element()
</pre><p>
I think <code>an_element</code> is exonerated. If you replace it with <code>a=iter(PF).next()</code> you get the same element and the same segfault.
</p>
<p>
Further desperate facts that may or may not be relevant:
</p>
<ul><li>if you make <code>TripleDict</code> strong on <em>any</em> of its keys, the segfault disappears. That doesn't mean that no memory corruption happens, of course.
</li></ul><ul><li>if you store all key triples fed into <code>TripleDict</code> (setting strong refs), you find 220 keys before the tests run (i.e., just due to sage startup) and 351 after (and no segfault of course). A set of the 151 new entries:
<pre class="wiki">set([Full MatrixSpace of 4 by 4 sparse matrices over Integer Ring, Set of Python objects of type 'long', Ring of integers modulo 389, Extended weight space over the Rational Field of the Root system of type ['A', 3, 1], Full MatrixSpace of 4 by 4 dense matrices over Rational Field, Multivariate Polynomial Ring in a, b, c, d over Rational Field, Weight space over the Rational Field of the Root system of type ['A', 3], Coroot lattice of the Root system of type ['A', 3, 1], Ambient space of the Root system of type ['A', 3], Set of Python objects of type 'int', Weight lattice of the Root system of type ['A', 3, 1], Vector space of dimension 4 over Rational Field, Weight lattice of the Root system of type ['A', 3], Extended weight lattice of the Root system of type ['A', 3, 1], Full MatrixSpace of 130 by 390 sparse matrices over Rational Field, <type 'int'>, Root space over the Rational Field of the Root system of type ['A', 3], Interface to the PARI C library, Integer Ring, Root space over the Rational Field of the Root system of type ['A', 3, 1], The Infinity Ring, Rational Field, Weight space over the Rational Field of the Root system of type ['A', 3, 1], Root lattice of the Root system of type ['A', 3, 1], Multivariate Polynomial Ring in x, y, z over Rational Field, Full MatrixSpace of 130 by 390 sparse matrices over Integer Ring, <type 'NoneType'>, Root lattice of the Root system of type ['A', 3], <type 'long'>])
</pre></li></ul><p>
Quite a few entries involve "root systems" etc., so it is not far-fetched to think that a bad deletion of something involving the <code>WeylGroup</code> causes the memory corruption. In that case, the corruption happens on all systems and only triggers a segfault on bsd. Is there someone with good valgrind experience who wants to analyze the memory management of the <code>cachefunc.pyx</code> doctests?
</p>
TicketjpfloriSun, 09 Sep 2012 09:15:11 GMT
https://trac.sagemath.org/ticket/715#comment:291
<p>
Do you mean running the complete cachefunc.pyx doctests, or some stripped-down file?
I could give it a try.
</p>
<p>
IIRC, I tried running valgrind on Simon's example alone (the one involving the cached oddprime thingy), but with the problematic range, the valgrind output was just horrible, above 280MB...
I then used only 1,4 as the range, without parallelism (the last parameter in Simon's example), but did not find anything obvious.
</p>
<p>
I'll try running valgrind on this again today or tomorrow.
</p>
TicketSimonKingSun, 09 Sep 2012 10:43:20 GMT
https://trac.sagemath.org/ticket/715#comment:292
<p>
Replying to <a class="ticket" href="https://trac.sagemath.org/ticket/715#comment:290" title="Comment 290">nbruin</a>:
</p>
<blockquote class="citation">
<p>
Further desperate facts that may or may not be relevant:
</p>
<ul><li>if you make <code>TripleDict</code> strong on <em>any</em> of its keys, the segfault disappears.
</li></ul></blockquote>
<p>
Do you really mean <em>any</em>? I ask because the "classical" application of <code>TripleDict</code> in sage.structure.coerce would have either <code>None</code> (for coercion maps) or an operation (for actions) as third key item.
</p>
<p>
Hence, if a strong reference to the third key items of <code>TripleDict</code> suffices to fix the problem, then I reckon the "non-classical" use of <code>TripleDict</code> in <a class="closed ticket" href="https://trac.sagemath.org/ticket/11521" title="defect: Use weak references to cache homsets (closed: fixed)">#11521</a> is involved in the segfault: The cache for Homsets, which has categories as third key items.
</p>
TicketSimonKingSun, 09 Sep 2012 16:18:16 GMT
https://trac.sagemath.org/ticket/715#comment:293
<p>
Replying to <a class="ticket" href="https://trac.sagemath.org/ticket/715#comment:290" title="Comment 290">nbruin</a>:
</p>
<blockquote class="citation">
<ul><li>if you store all key triples fed into <code>TripleDict</code> (setting strong refs), you find 220 keys before the tests run (i.e., just due to sage startup) and 351 after (and no segfault of course). A set of the 151 new entries:
</li></ul><p>
...
Quite a few entries involve "root systems" etc., so it is not far-fetched to think that a bad deletion of something involving the <code>WeylGroup</code> causes the memory corruption.
</p>
</blockquote>
<p>
I tried to view it from the opposite direction: When feeding a value into a <code>TripleDict</code>, I stored the string representation of the value in a dictionary, indexed by the memory address of the value. And when <code>TripleDictEraser</code> was removing an item of a <code>TripleDict</code>, I wrote the string representation of the current value being deleted and its original string representation into a file.
</p>
<p>
Result: When running the tests of cachefunc.pyx, <code>TripleDictEraser</code> is called 122 times. It is called on precisely two kinds of values:
</p>
<ol><li>The value could be an action. If so, the underlying set of the action has already been garbage collected by the time the action is removed from the <code>TripleDict</code>.
</li><li>The value could be a weak reference to a set of homomorphisms. If so, the weak reference is already dead by the time it is removed from the <code>TripleDict</code>.
</li></ol><p>
Here is the change that I applied:
</p>
<div class="wiki-code"><div xmlns="http://www.w3.org/1999/xhtml" class="diff">
<ul class="entries">
<li class="entry">
<h2>
<a>sage/structure/coerce_dict.pyx</a>
</h2>
<pre>diff --git a/sage/structure/coerce_dict.pyx b/sage/structure/coerce_dict.pyx</pre>
<table class="trac-diff inline" summary="Differences" cellspacing="0">
<colgroup><col class="lineno" /><col class="lineno" /><col class="content" /></colgroup>
<thead>
<tr>
<th title="File a/sage/structure/coerce_dict.pyx">
a
</th>
<th title="File b/sage/structure/coerce_dict.pyx">
b
</th>
<td><em></em> </td>
</tr>
</thead>
<tbody class="unmod">
<tr>
<th>33</th><th>33</th><td class="l"><span>include "../ext/python_list.pxi"</span></td>
</tr><tr>
<th>34</th><th>34</th><td class="l"><span></span></td>
</tr><tr>
<th>35</th><th>35</th><td class="l"><span>from weakref import KeyedRef</span></td>
</tr>
</tbody><tbody class="mod">
<tr class="first">
<th>36</th><th> </th><td class="l"><span></span></td>
</tr>
<tr class="last">
<th> </th><th>36</th><td class="r"><span>tmp_dict = {}</span></td>
</tr>
</tbody><tbody class="unmod">
<tr>
<th>37</th><th>37</th><td class="l"><span>############################################</span></td>
</tr><tr>
<th>38</th><th>38</th><td class="l"><span># The following code is responsible for</span></td>
</tr><tr>
<th>39</th><th>39</th><td class="l"><span># removing dead references from the cache</span></td>
</tr>
</tbody>
<tbody class="skipped">
<tr>
<th><a href="#L120">…</a></th>
<th><a href="#L120">…</a></th>
<td><em></em> </td>
</tr>
</tbody>
<tbody class="unmod">
<tr>
<th>120</th><th>120</th><td class="l"><span> cdef size_t h = (k1 + 13*k2 ^ 503*k3)</span></td>
</tr><tr>
<th>121</th><th>121</th><td class="l"><span> cdef list bucket = <object>PyList_GET_ITEM(self.D.buckets, h % PyList_GET_SIZE(self.D.buckets))</span></td>
</tr><tr>
<th>122</th><th>122</th><td class="l"><span> cdef int i</span></td>
</tr>
</tbody><tbody class="add">
<tr class="last first">
<th> </th><th>123</th><td class="r"><ins> f = file('/Users/SimonKing/SAGE/work/tmp','a')</ins></td>
</tr>
</tbody><tbody class="unmod">
<tr>
<th>123</th><th>124</th><td class="l"><span> for i from 0 <= i < PyList_GET_SIZE(bucket) by 4:</span></td>
</tr><tr>
<th>124</th><th>125</th><td class="l"><span> if <size_t><object>PyList_GET_ITEM(bucket, i)==k1 and \</span></td>
</tr><tr>
<th>125</th><th>126</th><td class="l"><span> <size_t><object>PyList_GET_ITEM(bucket, i+1)==k2 and \</span></td>
</tr><tr>
<th>126</th><th>127</th><td class="l"><span> <size_t><object>PyList_GET_ITEM(bucket, i+2)==k3:</span></td>
</tr>
</tbody><tbody class="add">
<tr class="first">
<th> </th><th>128</th><td class="r"><ins> try:</ins></td>
</tr><tr>
<th> </th><th>129</th><td class="r"><ins> f.write('%s: '%repr(bucket[i+3]))</ins></td>
</tr><tr>
<th> </th><th>130</th><td class="r"><ins> except BaseException,msg:</ins></td>
</tr><tr>
<th> </th><th>131</th><td class="r"><ins> f.write('%s: '%repr(msg))</ins></td>
</tr><tr class="last">
<th> </th><th>132</th><td class="r"><ins> f.write( '%s\n'%tmp_dict[id(bucket[i+3])])</ins></td>
</tr>
</tbody><tbody class="unmod">
<tr>
<th>127</th><th>133</th><td class="l"><span> del bucket[i:i+4]</span></td>
</tr><tr>
<th>128</th><th>134</th><td class="l"><span> self.D._size -= 1</span></td>
</tr><tr>
<th>129</th><th>135</th><td class="l"><span> break</span></td>
</tr><tr>
<th>130</th><th>136</th><td class="l"><span> try:</span></td>
</tr>
</tbody><tbody class="add">
<tr class="last first">
<th> </th><th>137</th><td class="r"><ins> f.close()</ins></td>
</tr>
</tbody><tbody class="unmod">
<tr>
<th>131</th><th>138</th><td class="l"><span> self.D._refcache.__delitem__((k1,k2,k3))</span></td>
</tr><tr>
<th>132</th><th>139</th><td class="l"><span> except KeyError:</span></td>
</tr><tr>
<th>133</th><th>140</th><td class="l"><span> pass</span></td>
</tr>
</tbody>
<tbody class="skipped">
<tr>
<th><a href="#L451">…</a></th>
<th><a href="#L458">…</a></th>
<td><em></em> </td>
</tr>
</tbody>
<tbody class="unmod">
<tr>
<th>451</th><th>458</th><td class="l"><span> self.set(k1, k2, k3, value)</span></td>
</tr><tr>
<th>452</th><th>459</th><td class="l"><span></span></td>
</tr><tr>
<th>453</th><th>460</th><td class="l"><span> cdef set(self, object k1, object k2, object k3, value):</span></td>
</tr>
</tbody><tbody class="add">
<tr class="first">
<th> </th><th>461</th><td class="r"><ins> if getattr(value,'__module__',None)=='weakref':</ins></td>
</tr><tr>
<th> </th><th>462</th><td class="r"><ins> tmp_dict[id(value)] = repr(value())</ins></td>
</tr><tr>
<th> </th><th>463</th><td class="r"><ins> else:</ins></td>
</tr><tr class="last">
<th> </th><th>464</th><td class="r"><ins> tmp_dict[id(value)] = repr(value)</ins></td>
</tr>
</tbody><tbody class="unmod">
<tr>
<th>454</th><th>465</th><td class="l"><span> if self.threshold and self._size > len(self.buckets) * self.threshold:</span></td>
</tr><tr>
<th>455</th><th>466</th><td class="l"><span> self.resize()</span></td>
</tr><tr>
<th>456</th><th>467</th><td class="l"><span> cdef size_t h1 = <size_t><void *>k1</span></td>
</tr>
</tbody>
</table>
</li>
</ul>
</div></div><p>
Unfortunately, with that change, the signal 11 is gone. After all, it is a Heisenbug...
</p>
<p>
The question is: What do we learn from these data?
</p>
<p>
If the underlying set of an action has already been deleted when the action is deleted, then of course it would be a problem if some <code>__dealloc__</code> method tried to do something with the underlying set.
</p>
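<p>
To make the mechanism concrete, here is a minimal pure-Python model of a dictionary keyed weakly on three objects, with an eraser callback that removes an entry once any key object dies and records the value's <code>repr</code> taken at insertion time, roughly like the instrumentation above. All names (<code>TripleDictSketch</code>, <code>erased</code>, and the <code>Parent</code> and <code>Action</code> classes) are hypothetical; Sage's <code>TripleDict</code> in <code>sage/structure/coerce_dict.pyx</code> is a Cython hash table with its own buckets and differs in detail. Keys that cannot be weakly referenced (such as <code>None</code>, which the classical coercion entries use as third key) are simply held strongly here; that is an assumption of the sketch.
</p>
<pre class="wiki">import gc
import weakref

class TripleDictSketch:
    """Hypothetical model of a TripleDict with weak keys (not Sage's code)."""

    def __init__(self):
        # (id(k1), id(k2), id(k3)) -> (value, refs, repr of value at insertion)
        self._data = {}
        self.erased = []            # original reprs of values that were erased

    def _eraser(self, ref):
        # Runs once a key object is already gone; ref.key is the id-triple.
        entry = self._data.pop(ref.key, None)
        if entry is not None:
            self.erased.append(entry[2])

    def __setitem__(self, keys, value):
        k = tuple(id(obj) for obj in keys)
        refs = []
        for obj in keys:
            try:
                refs.append(weakref.KeyedRef(obj, self._eraser, k))
            except TypeError:       # e.g. None cannot be weakly referenced
                refs.append(obj)    # assumption: keep a strong reference instead
        self._data[k] = (value, tuple(refs), repr(value))

    def __getitem__(self, keys):
        value, refs, _ = self._data[tuple(id(obj) for obj in keys)]
        for r, obj in zip(refs, keys):
            target = r() if isinstance(r, weakref.KeyedRef) else r
            if target is not obj:   # the id has been reused by another object
                raise KeyError(keys)
        return value

    def __len__(self):
        return len(self._data)

class Parent:
    def __init__(self, name):
        self.name = name
    def __repr__(self):
        return self.name

class Action:
    def __repr__(self):
        return "IntegerMulAction"

ZZ, A = Parent("ZZ"), Parent("A")
td = TripleDictSketch()
td[ZZ, A, None] = Action()
print(td[ZZ, A, None])          # IntegerMulAction
del A                           # drop the last strong reference to a key
gc.collect()
print(len(td), td.erased)       # 0 ['IntegerMulAction']
</pre>
<p>
As in Simon's log, by the time the eraser runs the key object is already gone (inside the callback, <code>ref()</code> returns <code>None</code>), so any cleanup of the value that still needs the key cannot rely on it.
</p>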
TicketnbruinSun, 09 Sep 2012 16:41:46 GMT
https://trac.sagemath.org/ticket/715#comment:294
<p>
Replying to <a class="ticket" href="https://trac.sagemath.org/ticket/715#comment:292" title="Comment 292">SimonKing</a>:
</p>
<blockquote class="citation">
<p>
Do you really mean <em>any</em>? I ask because the "classical" application of <code>TripleDict</code> in sage.structure.coerce would have either <code>None</code> (for coercion maps) or an operation (for actions) as third key item.
Hence, if a strong reference to the third key items of <code>TripleDict</code> suffices to fix the problem, then I reckon the "non-classical" use of <code>TripleDict</code> in <a class="closed ticket" href="https://trac.sagemath.org/ticket/11521" title="defect: Use weak references to cache homsets (closed: fixed)">#11521</a> is involved in the segfault: The cache for Homsets, which has categories as third key items.
</p>
</blockquote>
<p>
Not quite. Good suggestion! I tried keeping only the strong refs whose third key is a Category, and alternatively only those whose third key is not; in either case, the segfault was prevented. There's a large overlap of the other keys between different entries, and if a deletion is to blame somewhere, ANY reference to that object would prevent the segfault. I concentrated on the classical use, because that involves few third keys. I found the following possible non-category third keys (not counting those already present after sage initialization):
</p>
<pre class="wiki">set([False, True, <built-in function div>, <built-in function mul>,
None, <built-in function eq>, <built-in function add>,
<built-in function iadd>])
</pre><p>
For each of the above, I tried storing only the entries with that third key. Only <code><built-in function mul></code> prevents the segfault. This doesn't prove with absolute certainty that the deletion of one of those key triples causes the problem; it could also be that a subtle change in memory layout prevents the segfault. Anyway, the key triples in question are (only the ones added after sage init):
</p>
<pre class="wiki">(Rational Field, <type 'int'>, <built-in function mul>)
(Univariate Polynomial Ring in x over Rational Field, Integer Ring, <built-in function mul>)
(Rational Field, Rational Field, <built-in function mul>)
(Rational Field, Complex Lazy Field, <built-in function mul>)
(Number Field in I with defining polynomial x^2 + 1, <type 'int'>, <built-in function mul>)
(Integer Ring, Symbolic Ring, <built-in function mul>)
(<type 'int'>, Symbolic Ring, <built-in function mul>)
(Integer Ring, Rational Field, <built-in function mul>)
(Symbolic Ring, <type 'int'>, <built-in function mul>)
(<type 'float'>, Symbolic Ring, <built-in function mul>)
(Real Field with 53 bits of precision, Rational Field, <built-in function mul>)
(<type 'list'>, Integer Ring, <built-in function mul>)
(Rational Field, Real Interval Field with 64 bits of precision, <built-in function mul>)
(Real Interval Field with 64 bits of precision, <type 'int'>, <built-in function mul>)
(Number Field in I with defining polynomial x^2 + 1, Rational Field, <built-in function mul>)
(Rational Field, Complex Interval Field with 64 bits of precision, <built-in function mul>)
(Multivariate Polynomial Ring in a, b, c, d over Rational Field, Rational Field, <built-in function mul>)
(Multivariate Polynomial Ring in x, y, z over Rational Field, Rational Field, <built-in function mul>)
(<type 'int'>, Rational Field, <built-in function mul>)
(Ambient space of the Root system of type ['A', 3], Rational Field, <built-in function mul>)
(Rational Field, Ambient space of the Root system of type ['A', 3], <built-in function mul>)
(Rational Field, Root space over the Rational Field of the Root system of type ['A', 3, 1], <built-in function mul>)
(<type 'int'>, Full MatrixSpace of 4 by 4 sparse matrices over Integer Ring, <built-in function mul>)
(Full MatrixSpace of 4 by 4 sparse matrices over Integer Ring, Integer Ring, <built-in function mul>)
(Integer Ring, Coroot lattice of the Root system of type ['A', 3, 1], <built-in function mul>)
(Rational Field, Weight space over the Rational Field of the Root system of type ['A', 3, 1], <built-in function mul>)
(Integer Ring, Weight space over the Rational Field of the Root system of type ['A', 3, 1], <built-in function mul>)
(Rational Field, Integer Ring, <built-in function mul>)
(Full MatrixSpace of 4 by 4 dense matrices over Rational Field, Vector space of dimension 4 over Rational Field, <built-in function mul>)
(Vector space of dimension 4 over Rational Field, Rational Field, <built-in function mul>)
(Integer Ring, Full MatrixSpace of 130 by 390 sparse matrices over Integer Ring, <built-in function mul>)
(Integer Ring, Full MatrixSpace of 130 by 390 sparse matrices over Rational Field, <built-in function mul>)
(<type 'long'>, Integer Ring, <built-in function mul>)
</pre><p>
By the way, I've checked that the segfault really happens during <code>P = Q['a, b, c, d']</code> in example 27, not in getting the generators.
</p>
<p>
Again, it's only <em>likely</em> that one of these objects is involved, since not keeping strong references to them allows a segfault to happen. It is unlikely, but not impossible, that the mere presence of one of these objects in memory changes the location of an otherwise unrelated memory corruption. I think it's unlikely because the other tests show you can change quite a bit about what you store or not and still get a segfault.
</p>
TicketnbruinSun, 09 Sep 2012 16:51:20 GMT
https://trac.sagemath.org/ticket/715#comment:295
<p>
Replying to <a class="ticket" href="https://trac.sagemath.org/ticket/715#comment:294" title="Comment 294">nbruin</a>:
</p>
<blockquote class="citation">
<p>
By the way, I've checked that the segfault really happens during <code>P = Q['a, b, c, d']</code> in example 27, not in getting the generators.
</p>
</blockquote>
<p>
And indeed, changing the cache in <code>polynomial_ring_constructor.py</code> to be a <code>dict</code> instead of a <code>WeakValueDictionary</code> prevents the segfault.
</p>
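<p>
A hedged illustration of why swapping the cache type matters: a <code>WeakValueDictionary</code> does not keep the ring alive, so once the last user drops it the ring is deallocated (and any deallocation-time bug gets a chance to fire), whereas a plain <code>dict</code> keeps every ring alive forever, so that code path is never exercised. <code>RingSketch</code> and <code>make_constructor</code> are hypothetical; the real cache in <code>polynomial_ring_constructor.py</code> is more involved.
</p>
<pre class="wiki">import gc
import weakref

class RingSketch:
    """Hypothetical stand-in for a polynomial ring parent."""
    def __init__(self, names):
        self.names = names

def make_constructor(cache):
    """Hypothetical constructor returning cached rings keyed by variable names."""
    def polynomial_ring(names):
        key = tuple(names)
        ring = cache.get(key)
        if ring is None:
            ring = RingSketch(key)
            cache[key] = ring
        return ring
    return polynomial_ring

weak_cache = weakref.WeakValueDictionary()
strong_cache = {}
make_constructor(weak_cache)("abcd")      # the result is discarded immediately
make_constructor(strong_cache)("abcd")
gc.collect()
print(len(weak_cache), len(strong_cache))  # 0 1
</pre>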
<p>
Digging a little deeper (putting <code>sys.stderr.write("point 2\n")</code> in the source), the segfault happens in <code>sage.rings.polynomial.polynomial_ring_constructor._multi_variate</code>
</p>
<pre class="wiki"> R = MPolynomialRing_libsingular(base_ring, n, names, order)
</pre><p>
In my experience, this bug is relatively robust against things done in Python (apart from, strangely, letting doctests fail before a certain point). Simon's code above asks for <code>repr</code>. I imagine doing that on a libsingular object calls into libsingular (which has its own <code>omalloc</code>-managed heap, right?).
</p>
<p>
When I was analyzing references, I stored them wholesale in a list. Only later did I ask for string representations. Hence, I probably avoided extra calls into libsingular.
</p>
<p>
Digging a little deeper still, the segfault seems to occur in <code>MPolynomialRing_libsingular.__init__</code> in the line:
</p>
<pre class="wiki"> self._ring = singular_ring_new(base_ring, n, self._names, order)
</pre><p>
which goes into <code>sage/libs/singular/ring.pyx</code>. Instrumenting the code there a bit:
</p>
<pre class="wiki">    sys.stderr.write("before _names allocation\n")
    _names = <char**>omAlloc0(sizeof(char*)*(len(names)))
    sys.stderr.write("after _names allocation\n")
    for i from 0 <= i < n:
        _name = names[i]
        sys.stderr.write("calling omStrDup for i=%s with name=%s\n"%(i,names[i]))
        _names[i] = omStrDup(_name)
        sys.stderr.write("after omStrDup\n")
</pre><p>
gives me (note that the strings to be duplicated are fine for printing!):
</p>
<pre class="wiki">...
after _names allocation
calling omStrDup for i=0 with name=a
after omStrDup
calling omStrDup for i=1 with name=b
<UNHANDLED SIGSEGV>
</pre><p>
I think this strongly implicates a corruption of the omAlloc heap. Other people who know much more about singular hopefully can take over.
</p>
<p>
All my files (including instrumented code) are on <code>bsd:/scratch/nbruin/sage-5.4.beta0</code>. It might be hard to work with directly, but a <code>hg diff</code> might give some useful info regarding which files are involved.
</p>
<p>
One thing that helps a little is to guard the <code>omStrDup</code> loop with <code>sig_on()</code> and <code>sig_off()</code>. Then the segmentation fault gets reported as a <code>RuntimeError</code>. Of course, all kinds of doctests then fail (in particular, all of the other polynomial constructions in subsequent tests).
</p>
TicketSimonKingSun, 09 Sep 2012 22:50:17 GMTcc changed
https://trac.sagemath.org/ticket/715#comment:296
<ul>
<li><strong>cc</strong>
<em>malb</em> added
</li>
</ul>
<p>
Cc to Martin, since I suppose he knows about Singular's <code>omAlloc</code> and can comment on the problems described in <a class="ticket" href="https://trac.sagemath.org/ticket/715#comment:295" title="Comment 295">comment:295</a>.
</p>
<p>
Nils: Kudos for digging to such depth!
</p>
<p>
Since there is a new Singular spkg at <a class="closed ticket" href="https://trac.sagemath.org/ticket/13237" title="enhancement: Upgrade Singular (closed: fixed)">#13237</a> which got merged into sage-5.4.beta0, we may check whether the segfault also occurs with the old version of Singular.
</p>
TicketmalbMon, 10 Sep 2012 09:33:04 GMT
https://trac.sagemath.org/ticket/715#comment:297
<p>
Nothing immediately comes to mind, but you could try and ask [singular-devel] perhaps?
</p>
TicketnbruinMon, 10 Sep 2012 19:28:23 GMT
https://trac.sagemath.org/ticket/715#comment:298
<p>
OK, I've taken out the <code>omStrDup</code> call in <code>sage/libs/singular/ring.pyx</code> and copied the strings over manually:
</p>
<pre class="wiki">    for i from 0 <= i < n:
        _name = names[i]
        sys.stderr.write("calling omStrDup for i=%s with name=%s\n"%(i,names[i]))
        j = 0
        while <bint> _name[j]:
            j+=1
        j+=1 #increment to include the 0
        sys.stderr.write("string length (including 0) seems to be %s\n"%j)
        copiedname = <char*>omAlloc(sizeof(char)*(j+perturb))
        sys.stderr.write("Done reserving memory buffer; got address %x\n"%(<long>copiedname))
        for 0 <= offset < j:
            sys.stderr.write("copying character nr %s\n"%offset)
            copiedname[offset] = _name[offset]
        _names[i] = copiedname
        sys.stderr.write("after omStrDup\n")
</pre><p>
If I run this code with <code>perturb=7</code>, I don't get a segfault. With smaller values I do, and the segfault happens in the <code>omAlloc</code> line. Given that <code>j==2</code> for most of this code, I guess that memory blocks are at least 8 bytes (this is 64-bit OS X).
</p>
<p>
If <code>omAlloc</code> fails, I guess some of the internal omAlloc data structures are corrupted (I think the idea is that memory is managed in equal-sized blocks, with just a free list on a page obtained from the system malloc). If I were to implement that, I'd store the pointers of the free-block linked list in the blocks themselves (hence a minimum of 8-byte blocks), so if anyone omAllocs an 8-byte block and then writes past it, they could ruin the linked list and likely cause a subsequent omAlloc to segfault (because omAlloc would have to dereference the stored pointer to find the next node in the free list). Even more likely: some code "zeroes out" a block after it has already been <code>omFree'd</code>. It could also be a double deallocation.
</p>
<p>
There must be people with vast omAlloc debugging experience who have wonderful tricks to track down this kind of error. A tiny bit of instrumentation should do the trick (frequent verification of free lists, checking that a block is not already in the free list when asked to deallocate it; these are things one could easily do without changing the memory layout).
</p>
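<p>
The free-list hypothesis and the proposed instrumentation can be illustrated with a toy model. This is a hypothetical sketch, not omalloc's real data structures: <code>FreeListHeapSketch</code> manages one "page" of 8-byte blocks and keeps the free list inside the free blocks themselves, as guessed above. It shows how a one-byte overflow clobbers a link, so that a later allocation follows a bad pointer (where C code would segfault), and how walking the free list on each free catches a double free.
</p>
<pre class="wiki">BLOCK = 8                # bytes per block (the 8-byte guess above)
NIL = 2**64 - 1          # end-of-list marker

class FreeListHeapSketch:
    """Toy fixed-size-block allocator with an intrusive free list: each free
    block stores the offset of the next free block in its own first 8 bytes.
    A hypothetical model for illustration, not omalloc's implementation."""

    def __init__(self, nblocks):
        self.page = bytearray(BLOCK * nblocks)
        for i in range(nblocks):              # thread every block onto the list
            nxt = (i + 1) * BLOCK if i + 1 != nblocks else NIL
            self._store(i * BLOCK, nxt)
        self.head = 0

    def _load(self, addr):
        return int.from_bytes(self.page[addr:addr + BLOCK], "little")

    def _store(self, addr, value):
        self.page[addr:addr + BLOCK] = value.to_bytes(BLOCK, "little")

    def _check(self, addr):
        # A C allocator would simply dereference a bad link and segfault.
        if addr % BLOCK or not 0 <= addr < len(self.page):
            raise RuntimeError("corrupt free list: bad pointer 0x%x" % addr)

    def alloc(self):
        if self.head == NIL:
            raise MemoryError("page exhausted")
        addr = self.head
        self._check(addr)
        self.head = self._load(addr)          # follow the link inside the block
        return addr

    def free_blocks(self):
        """Walk and validate the whole free list (cheap debug instrumentation)."""
        seen, addr = [], self.head
        while addr != NIL:
            self._check(addr)
            if addr in seen:
                raise RuntimeError("cycle in free list at 0x%x" % addr)
            seen.append(addr)
            addr = self._load(addr)
        return seen

    def free(self, addr, debug=False):
        if debug and addr in self.free_blocks():
            raise RuntimeError("double free of block 0x%x" % addr)
        self._store(addr, self.head)
        self.head = addr

    def write(self, addr, data):
        # No bounds check, like writing through a raw char* in C.
        self.page[addr:addr + len(data)] = data

# 1. Overflow: writing 9 bytes into an 8-byte block clobbers the neighbour's link.
heap = FreeListHeapSketch(4)
a = heap.alloc()             # block at offset 0
b = heap.alloc()             # block at offset 8
heap.free(b)                 # block 8 now holds a free-list link (to 16)
heap.write(a, b"x" * 9)      # one byte too many: low byte of that link becomes 0x78
c = heap.alloc()             # still returns 8, but loads the damaged link
try:
    heap.alloc()
    msg = "no error"
except RuntimeError as e:
    msg = str(e)
print(msg)                   # corrupt free list: bad pointer 0x78

# 2. Double free, caught by the debug check before the list is damaged.
heap2 = FreeListHeapSketch(4)
p = heap2.alloc()
heap2.free(p, debug=True)
try:
    heap2.free(p, debug=True)
    msg2 = "no error"
except RuntimeError as e:
    msg2 = str(e)
print(msg2)                  # double free of block 0x0
</pre>
<p>
In the same model, zeroing a block after it has been freed turns its link into 0, so the allocator later hands out block 0 a second time while it is still in use. Whether omalloc's internals actually match this model is an assumption.
</p>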
<p>
In the meantime, we can "fix" the segfault on bsd by allocating a little extra space for variable names. At least 9 bytes seems to do the trick. By now it seems likely that the real error is a refcounting error in sage libsingular rings, which didn't become apparent until these rings actually got deallocated.
</p>
<p>
If we insist that libsingular rings behave as specified, then part of their specification is likely that they should not be deallocated. Since Volker has already put in manual refcounting, we can get that behaviour simply by
</p>
<p>
<strong>sage/libs/singular/ring.pyx</strong>:
</p>
<div class="wiki-code"><div class="code"><pre> wrapped_ring = wrap_ring(_ring)
if wrapped_ring in ring_refcount_dict:
raise ValueError('newly created ring already in dictionary??')
<span class="gd">- ring_refcount_dict[wrapped_ring] = 1
</span><span class="gi">+ ring_refcount_dict[wrapped_ring] = 2
</span> return _ring
</pre></div></div><p>
Then one can open another ticket, "make libsingular rings deallocatable". Given that these rings get tied into the coercion framework anyway, I think you'd be hard-pressed to find a memory regression with respect to pre-<a class="closed ticket" href="https://trac.sagemath.org/ticket/715" title="defect: Parents probably not reclaimed due to too much caching (closed: fixed)">#715</a> Sage. (Perhaps one would have to increase a refcount on an object one level higher, since the <code>ring_wrapper_Py</code> objects don't actually live with the <code>_ring</code>; they are only used for an equality test. So with this fix, I think rings would leak in the sense that the <code>UniqueRepresentation</code> parent that wraps them could die without the ring dying.)
</p>
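<p>
Why starting the count at 2 makes rings immortal can be seen in a toy model of manual refcounting. <code>ToyRingRegistry</code> and its methods are hypothetical and only mimic the idea of <code>ring_refcount_dict</code>; <code>initial</code> plays the role of the value assigned in the one-line patch above.
</p>

```python
class ToyRingRegistry:
    """Hypothetical model of manual ring refcounting (not Sage's actual code)."""

    def __init__(self, initial=1):
        self.initial = initial   # value stored when a new ring is registered
        self.refcount = {}
        self.freed = []

    def new_ring(self, ring):
        if ring in self.refcount:
            raise ValueError('newly created ring already in dictionary??')
        self.refcount[ring] = self.initial
        return ring

    def incref(self, ring):
        self.refcount[ring] += 1

    def decref(self, ring):
        self.refcount[ring] -= 1
        if self.refcount[ring] == 0:
            del self.refcount[ring]
            self.freed.append(ring)   # stands in for actually freeing the ring


for initial in (1, 2):
    reg = ToyRingRegistry(initial)
    R = reg.new_ring("R")
    reg.incref(R)      # e.g. an element of R takes a reference
    reg.decref(R)      # ... and later releases it
    reg.decref(R)      # the parent itself goes away
    print(initial, reg.freed, reg.refcount)
```

<p>
With <code>initial=2</code> the extra reference is never released, so the count bottoms out at 1 and the ring is never freed: a stopgap that trades a leak for safety.
</p>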
<p>
I think making parents mortal in the rest of Sage is too important to delay over a hard-to-track-down memory issue in libsingular deallocation.
</p>
TicketnbruinTue, 11 Sep 2012 01:04:11 GMTattachment set
https://trac.sagemath.org/ticket/715
https://trac.sagemath.org/ticket/715
<ul>
<li><strong>attachment</strong>
set to <em>trac_715_osx64-dealloc.patch</em>
</li>
</ul>
<p>
Fix segfault on bsd
</p>
TicketSimonKingTue, 11 Sep 2012 05:58:51 GMT
https://trac.sagemath.org/ticket/715#comment:299
https://trac.sagemath.org/ticket/715#comment:299
<p>
Nils, I find this very interesting! Note that fixing a libsingular refcounting problem was enough to resolve a segfault caused by <a class="closed ticket" href="https://trac.sagemath.org/ticket/13370" title="defect: Do not cache the result of is_Field externally (closed: fixed)">#13370</a>, but in that case the refcounting concerned non-commutative rings; see <a class="closed ticket" href="https://trac.sagemath.org/ticket/13145" title="defect: Sage's noncommutative rings don't always increment a refcount (closed: fixed)">#13145</a>.
</p>
<p>
I lost track: Did we already test whether <a class="closed ticket" href="https://trac.sagemath.org/ticket/13145" title="defect: Sage's noncommutative rings don't always increment a refcount (closed: fixed)">#13145</a> fixes the segfault here as well?
</p>
TicketSimonKingTue, 11 Sep 2012 06:01:16 GMTdependencies changed
https://trac.sagemath.org/ticket/715#comment:300
https://trac.sagemath.org/ticket/715#comment:300
<ul>
<li><strong>dependencies</strong>
changed from <em>#9138, #11900, #11599, to be merged with #11521</em> to <em>#9138, #11900, #11599, #13145, to be merged with #11521</em>
</li>
</ul>
<p>
I'd say: Let's try this again, with <a class="closed ticket" href="https://trac.sagemath.org/ticket/13145" title="defect: Sage's noncommutative rings don't always increment a refcount (closed: fixed)">#13145</a> as a new dependency!
</p>
<p>
Apply trac_715_combined.patch trac_715_local_refcache.patch trac_715_safer.patch trac_715_specification.patch
</p>
<p>
And then <a class="closed ticket" href="https://trac.sagemath.org/ticket/11521" title="defect: Use weak references to cache homsets (closed: fixed)">#11521</a>
</p>
TicketSimonKingTue, 11 Sep 2012 06:07:52 GMT
https://trac.sagemath.org/ticket/715#comment:301
https://trac.sagemath.org/ticket/715#comment:301
<p>
Nope, doesn't help.
</p>
<pre class="wiki">bash-3.2$ ../../sage -t sage/misc/cachefunc.pyx
sage -t "devel/sage-main/sage/misc/cachefunc.pyx"
The doctested process was killed by signal 11
[12.7 s]
----------------------------------------------------------------------
The following tests failed:
sage -t "devel/sage-main/sage/misc/cachefunc.pyx" # Killed/crashed
Total time for all tests: 12.7 seconds
bash-3.2$ hg qa
trac_715_combined.patch
trac_715_local_refcache.patch
trac_715_safer.patch
trac_715_specification.patch
trac_11521_homset_weakcache_combined.patch
trac_11521_callback.patch
13145.patch
</pre><p>
and still, under gdb:
</p>
<pre class="wiki">sage: @cached_function
....: def oddprime_factors(n):
....:     l = [p for p,e in factor(n) if p != 2]
....:     return len(l)
....:
sage: oddprime_factors.precompute(range(1,100))
[Errno 4] Interrupted system call
Killing any remaining workers...
</pre>
TicketnbruinTue, 11 Sep 2012 07:02:39 GMT
https://trac.sagemath.org/ticket/715#comment:302
https://trac.sagemath.org/ticket/715#comment:302
<p>
Replying to <a class="ticket" href="https://trac.sagemath.org/ticket/715#comment:301" title="Comment 301">SimonKing</a>:
</p>
<blockquote class="citation">
<pre class="wiki">sage: @cached_function
....: def oddprime_factors(n):
....:     l = [p for p,e in factor(n) if p != 2]
....:     return len(l)
....:
sage: oddprime_factors.precompute(range(1,100))
[Errno 4] Interrupted system call
Killing any remaining workers...
</pre></blockquote>
<p>
I firmly believe that's an unrelated problem. It's hard to imagine how singular could be involved with that. Furthermore, we have already seen that we can solve this one by setting and handling SIGALRM more cleanly.
</p>
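<p>
For context, <code>[Errno 4] Interrupted system call</code> is EINTR: a blocking system call (for instance, waiting on a forked worker) was interrupted by a signal such as SIGALRM. Under the Python 2 that Sage used at the time this surfaced as an exception; Python 3.5+ retries most such calls automatically (PEP 475). The helper below is a generic sketch of the usual retry pattern, not the <a class="closed ticket" href="https://trac.sagemath.org/ticket/13437" title="defect: Clean up SIGALRM handling in p_iter_fork (closed: invalid)">#13437</a> patch; <code>retry_on_eintr</code> and <code>flaky_wait</code> are hypothetical.
</p>

```python
import errno


def retry_on_eintr(call, *args, max_tries=100):
    """Call call(*args), retrying if a signal interrupts the system call.

    InterruptedError is the OSError subclass for errno.EINTR ("[Errno 4]").
    """
    for attempt in range(max_tries):
        try:
            return call(*args)
        except InterruptedError:
            if attempt == max_tries - 1:
                raise   # give up and let the caller see the EINTR


# Usage with a fake system call that is interrupted twice before succeeding.
calls = []

def flaky_wait(pid):
    calls.append(pid)
    if len(calls) < 3:
        # OSError(errno.EINTR, ...) is constructed as an InterruptedError.
        raise OSError(errno.EINTR, "Interrupted system call")
    return (pid, 0)

print(retry_on_eintr(flaky_wait, 1234))
```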
TicketSimonKingTue, 11 Sep 2012 08:12:11 GMT
https://trac.sagemath.org/ticket/715#comment:303
https://trac.sagemath.org/ticket/715#comment:303
<p>
Replying to <a class="ticket" href="https://trac.sagemath.org/ticket/715#comment:302" title="Comment 302">nbruin</a>:
</p>
<blockquote class="citation">
<p>
I firmly believe that's an unrelated problem. It's hard to imagine how singular could be involved with that.
</p>
</blockquote>
<p>
Sure.
</p>
<p>
So, what shall we do? Do we all agree that the plan is as follows:
</p>
<ul><li>The SIGALRM problem (under gdb) is solved on a different ticket already, namely <a class="closed ticket" href="https://trac.sagemath.org/ticket/13437" title="defect: Clean up SIGALRM handling in p_iter_fork (closed: invalid)">#13437</a>. So, make it a dependency.
</li><li>The refcounting problem that is likely to be behind the signal 11 problem (without gdb) can be temporarily worked around by using a strong cache for libsingular polynomial rings. I only wonder whether the doctests demonstrating the weak cache will still work. But it would be a chance to finally be done with <a class="closed ticket" href="https://trac.sagemath.org/ticket/715" title="defect: Parents probably not reclaimed due to too much caching (closed: fixed)">#715</a> and <a class="closed ticket" href="https://trac.sagemath.org/ticket/11521" title="defect: Use weak references to cache homsets (closed: fixed)">#11521</a> and <a class="closed ticket" href="https://trac.sagemath.org/ticket/12215" title="defect: Memleak in UniqueRepresentation, @cached_method (closed: fixed)">#12215</a> and <a class="closed ticket" href="https://trac.sagemath.org/ticket/13370" title="defect: Do not cache the result of is_Field externally (closed: fixed)">#13370</a> and <a class="closed ticket" href="https://trac.sagemath.org/ticket/12313" title="defect: Fix yet another memory leak caused by caching of coercion data (closed: fixed)">#12313</a> and <a class="closed ticket" href="https://trac.sagemath.org/ticket/12876" title="enhancement: Fix element and parent classes of Hom categories to be abstract, and ... (closed: fixed)">#12876</a> and so on.
</li><li>A proper fix of the refcounting problem should be done on a new ticket. Nils, since you already made a deep analysis of the problem, could you create that new ticket?
</li></ul><p>
Here is another message to the patchbot, since I forgot to include the new patch:
</p>
<p>
Apply trac_715_combined.patch trac_715_local_refcache.patch trac_715_safer.patch trac_715_specification.patch trac_715_osx64-dealloc.patch
</p>
<p>
And then <a class="closed ticket" href="https://trac.sagemath.org/ticket/11521" title="defect: Use weak references to cache homsets (closed: fixed)">#11521</a>
</p>
TicketSimonKingTue, 11 Sep 2012 15:56:06 GMT
https://trac.sagemath.org/ticket/715#comment:304
https://trac.sagemath.org/ticket/715#comment:304
<p>
When running sage-5.2.rc0 with patches on <code>OpenSuse</code> under gdb, I get the following:
</p>
<pre class="wiki">sage: @cached_function
....: def oddprime_factors(n):
....:     l = [p for p,e in factor(n) if p != 2]
....:     return len(l)
....:
sage: oddprime_factors.precompute(range(1,100))
Detaching after fork from child process 20030.
Detaching after fork from child process 20031.
...
Detaching after fork from child process 20127.
Detaching after fork from child process 20128.
sage:
</pre><p>
If I understand correctly, that message comes from gdb and informs that gdb can only follow one of the two processes after forking. So, that is to be expected, right?
</p>
<p>
Anyway, it is another data point suggesting that the two problems we are seeing here are specific to OS X on Intel.
</p>
TicketSimonKingTue, 11 Sep 2012 15:59:16 GMT
https://trac.sagemath.org/ticket/715#comment:305
https://trac.sagemath.org/ticket/715#comment:305
<p>
Replying to <a class="ticket" href="https://trac.sagemath.org/ticket/715#comment:304" title="Comment 304">SimonKing</a>:
</p>
<blockquote class="citation">
<p>
If I understand correctly, that message comes from gdb and informs that gdb can only follow one of the two processes after forking. So, that is to be expected, right?
</p>
</blockquote>
<p>
Or perhaps not:
</p>
<pre class="wiki">simon@linux-sqwp:~/SAGE/prerelease/sage-5.2.rc0/devel/sage> gdb
GNU gdb (GDB) SUSE (7.3-41.1.2)
Copyright (C) 2011 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law. Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-suse-linux".
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>.
(gdb) show print inferior-events
Printing of inferior events is off.
</pre><p>
Anyway, I guess it doesn't help with the two problems we are facing here.
</p>
TicketSimonKingWed, 12 Sep 2012 10:21:04 GMT
https://trac.sagemath.org/ticket/715#comment:306
https://trac.sagemath.org/ticket/715#comment:306
<p>
The new ticket for the libsingular problem is <a class="closed ticket" href="https://trac.sagemath.org/ticket/13450" title="defect: Fix refcounting of libsingular rings (closed: duplicate)">#13450</a>.
</p>
<p>
I suggest that we use a temporary work-around, as proposed by Nils, so that we can move forward with <a class="closed ticket" href="https://trac.sagemath.org/ticket/715" title="defect: Parents probably not reclaimed due to too much caching (closed: fixed)">#715</a> and <a class="closed ticket" href="https://trac.sagemath.org/ticket/11521" title="defect: Use weak references to cache homsets (closed: fixed)">#11521</a>. The work-around should be converted into a proper fix in <a class="closed ticket" href="https://trac.sagemath.org/ticket/13450" title="defect: Fix refcounting of libsingular rings (closed: duplicate)">#13450</a>.
</p>
TicketjdemeyerWed, 12 Sep 2012 15:07:20 GMT
https://trac.sagemath.org/ticket/715#comment:307
https://trac.sagemath.org/ticket/715#comment:307
<p>
I guess the list of patches in the description needs to be updated?
</p>
TicketjdemeyerWed, 12 Sep 2012 15:15:33 GMTdescription changed
https://trac.sagemath.org/ticket/715#comment:308
https://trac.sagemath.org/ticket/715#comment:308
<ul>
<li><strong>description</strong>
modified (<a href="/ticket/715?action=diff&version=308">diff</a>)
</li>
</ul>
<p>
<strong>Please update the ticket description if you add patches.</strong>
</p>
<p>
Thanks.
</p>
TicketnbruinWed, 12 Sep 2012 15:22:17 GMT
https://trac.sagemath.org/ticket/715#comment:309
https://trac.sagemath.org/ticket/715#comment:309
<p>
Replying to <a class="ticket" href="https://trac.sagemath.org/ticket/715#comment:306" title="Comment 306">SimonKing</a>:
</p>
<blockquote class="citation">
<p>
The new ticket for the libsingular problem is <a class="closed ticket" href="https://trac.sagemath.org/ticket/13450" title="defect: Fix refcounting of libsingular rings (closed: duplicate)">#13450</a>.
</p>
</blockquote>
<p>
I think that one can be considered a duplicate of <a class="needs_work ticket" href="https://trac.sagemath.org/ticket/13447" title="defect: Make libsingular multivariate polynomial rings collectable (needs_work)">#13447</a>
</p>
<p>
Something along the lines of <a class="attachment" href="https://trac.sagemath.org/attachment/ticket/715/trac_715_osx64-dealloc.patch" title="Attachment 'trac_715_osx64-dealloc.patch' in Ticket #715">attachment:trac_715_osx64-dealloc.patch</a><a class="trac-rawlink" href="https://trac.sagemath.org/raw-attachment/ticket/715/trac_715_osx64-dealloc.patch" title="Download"></a> should work best, because then at least the permanently stored copy is available for <code>UniqueRepresentation</code> purposes (for which polynomial rings have their own weakvaluedictionary).
</p>
<p>
Solving the SIGALRM issue is optional since it only occurs on one machine with gdb and I don't think we specify that sage is supposed to work perfectly under gdb on all supported platforms. A first go at the problem is at <a class="closed ticket" href="https://trac.sagemath.org/ticket/13437" title="defect: Clean up SIGALRM handling in p_iter_fork (closed: invalid)">#13437</a> (does its job).
</p>
TicketSimonKingWed, 12 Sep 2012 15:40:53 GMT
https://trac.sagemath.org/ticket/715#comment:310
https://trac.sagemath.org/ticket/715#comment:310
<p>
Replying to <a class="ticket" href="https://trac.sagemath.org/ticket/715#comment:307" title="Comment 307">jdemeyer</a>:
</p>
<blockquote class="citation">
<p>
I guess the list of patches in the description needs to be updated?
</p>
</blockquote>
<p>
Not necessarily, since I am not sure whether we all agree that a work-around by a permanent cache for polynomial rings is the right thing to do. Mainly I wanted to let the patchbot test whether other tests break with a permanent cache (e.g., tests that were introduced in <a class="closed ticket" href="https://trac.sagemath.org/ticket/715" title="defect: Parents probably not reclaimed due to too much caching (closed: fixed)">#715</a> or <a class="closed ticket" href="https://trac.sagemath.org/ticket/11521" title="defect: Use weak references to cache homsets (closed: fixed)">#11521</a>). But the patchbots seem to be down, so it doesn't make sense anyway.
</p>
TicketSimonKingWed, 12 Sep 2012 15:44:30 GMT
https://trac.sagemath.org/ticket/715#comment:311
https://trac.sagemath.org/ticket/715#comment:311
<p>
Replying to <a class="ticket" href="https://trac.sagemath.org/ticket/715#comment:309" title="Comment 309">nbruin</a>:
</p>
<blockquote class="citation">
<p>
Replying to <a class="ticket" href="https://trac.sagemath.org/ticket/715#comment:306" title="Comment 306">SimonKing</a>:
</p>
<blockquote class="citation">
<p>
The new ticket for the libsingular problem is <a class="closed ticket" href="https://trac.sagemath.org/ticket/13450" title="defect: Fix refcounting of libsingular rings (closed: duplicate)">#13450</a>.
</p>
</blockquote>
<p>
I think that one can be considered a duplicate of <a class="needs_work ticket" href="https://trac.sagemath.org/ticket/13447" title="defect: Make libsingular multivariate polynomial rings collectable (needs_work)">#13447</a>
</p>
</blockquote>
<p>
Probably. Why didn't you add me to the cc list? Then I wouldn't have opened <a class="closed ticket" href="https://trac.sagemath.org/ticket/13450" title="defect: Fix refcounting of libsingular rings (closed: duplicate)">#13450</a>.
</p>
<blockquote class="citation">
<p>
Something along the lines of <a class="attachment" href="https://trac.sagemath.org/attachment/ticket/715/trac_715_osx64-dealloc.patch" title="Attachment 'trac_715_osx64-dealloc.patch' in Ticket #715">attachment:trac_715_osx64-dealloc.patch</a><a class="trac-rawlink" href="https://trac.sagemath.org/raw-attachment/ticket/715/trac_715_osx64-dealloc.patch" title="Download"></a> should work best,
</p>
</blockquote>
<p>
But only as a temporary workaround. In my applications, it is absolutely essential that polynomial rings can be deallocated, or the memory consumption would explode.
</p>
<blockquote class="citation">
<p>
Solving the SIGALRM issue is optional since it only occurs on one machine with gdb and I don't think we specify that sage is supposed to work perfectly under gdb on all supported platforms. A first go at the problem is at <a class="closed ticket" href="https://trac.sagemath.org/ticket/13437" title="defect: Clean up SIGALRM handling in p_iter_fork (closed: invalid)">#13437</a> (does its job).
</p>
</blockquote>
<p>
Agreed.
</p>
TicketSimonKingWed, 12 Sep 2012 15:49:51 GMT
https://trac.sagemath.org/ticket/715#comment:312
https://trac.sagemath.org/ticket/715#comment:312
<p>
Replying to <a class="ticket" href="https://trac.sagemath.org/ticket/715#comment:310" title="Comment 310">SimonKing</a>:
</p>
<blockquote class="citation">
<p>
Not necessarily, since I am not sure whether we all agree that a work-around by a permanent cache for polynomial rings is the right thing to do.
</p>
</blockquote>
<p>
PS: For my own applications, it is certainly not acceptable to have a permanent cache for polynomial rings. Actually, the only reason why I worked on <a class="closed ticket" href="https://trac.sagemath.org/ticket/715" title="defect: Parents probably not reclaimed due to too much caching (closed: fixed)">#715</a>, <a class="closed ticket" href="https://trac.sagemath.org/ticket/11521" title="defect: Use weak references to cache homsets (closed: fixed)">#11521</a>, <a class="closed ticket" href="https://trac.sagemath.org/ticket/12215" title="defect: Memleak in UniqueRepresentation, @cached_method (closed: fixed)">#12215</a> and <a class="closed ticket" href="https://trac.sagemath.org/ticket/12313" title="defect: Fix yet another memory leak caused by caching of coercion data (closed: fixed)">#12313</a> is to make polynomial rings collectable.
</p>
TicketnbruinWed, 12 Sep 2012 16:31:59 GMT
https://trac.sagemath.org/ticket/715#comment:313
https://trac.sagemath.org/ticket/715#comment:313
<p>
Replying to <a class="ticket" href="https://trac.sagemath.org/ticket/715#comment:312" title="Comment 312">SimonKing</a>:
</p>
<blockquote class="citation">
<p>
PS: For my own applications, it is certainly not acceptable to have a permanent cache for polynomial rings. Actually, the only reason why I worked on <a class="closed ticket" href="https://trac.sagemath.org/ticket/715" title="defect: Parents probably not reclaimed due to too much caching (closed: fixed)">#715</a>, <a class="closed ticket" href="https://trac.sagemath.org/ticket/11521" title="defect: Use weak references to cache homsets (closed: fixed)">#11521</a>, <a class="closed ticket" href="https://trac.sagemath.org/ticket/12215" title="defect: Memleak in UniqueRepresentation, @cached_method (closed: fixed)">#12215</a> and <a class="closed ticket" href="https://trac.sagemath.org/ticket/12313" title="defect: Fix yet another memory leak caused by caching of coercion data (closed: fixed)">#12313</a> is to make polynomial rings collectable.
</p>
</blockquote>
<p>
Excellent! That makes you the perfect person to either find convincing evidence that <a class="needs_work ticket" href="https://trac.sagemath.org/ticket/13447" title="defect: Make libsingular multivariate polynomial rings collectable (needs_work)">#13447</a> is not necessary due to some silly configuration issue on <code>bsd.math</code>, or to push through a proper fix! :-) A side-trip to Kaiserslautern sounds like the best way to make progress.
</p>
TicketjdemeyerWed, 12 Sep 2012 18:59:38 GMT
https://trac.sagemath.org/ticket/715#comment:314
https://trac.sagemath.org/ticket/715#comment:314
<p>
With the current patches of <a class="closed ticket" href="https://trac.sagemath.org/ticket/715" title="defect: Parents probably not reclaimed due to too much caching (closed: fixed)">#715</a> + <a class="closed ticket" href="https://trac.sagemath.org/ticket/11521" title="defect: Use weak references to cache homsets (closed: fixed)">#11521</a> + <a class="closed ticket" href="https://trac.sagemath.org/ticket/12215" title="defect: Memleak in UniqueRepresentation, @cached_method (closed: fixed)">#12215</a> + <a class="closed ticket" href="https://trac.sagemath.org/ticket/12313" title="defect: Fix yet another memory leak caused by caching of coercion data (closed: fixed)">#12313</a>, I get:
</p>
<pre class="wiki">sage -t -force_lib devel/sage/sage/rings/polynomial/multi_polynomial_libsingular.pyx
**********************************************************************
File "/release/merger/sage-5.4.beta2/devel/sage-main/sage/rings/polynomial/multi_polynomial_libsingular.pyx", line 423:
sage: len(ring_refcount_dict) == n
Expected:
True
Got:
False
**********************************************************************
</pre><pre class="wiki">sage -t -force_lib devel/sage/sage/libs/singular/ring.pyx
**********************************************************************
File "/release/merger/sage-5.4.beta2/devel/sage-main/sage/libs/singular/ring.pyx", line 490:
sage: ring_ptr in ring_refcount_dict
Expected:
False
Got:
True
**********************************************************************
</pre>
TicketSimonKingWed, 12 Sep 2012 19:12:30 GMT
https://trac.sagemath.org/ticket/715#comment:315
https://trac.sagemath.org/ticket/715#comment:315
<p>
Replying to <a class="ticket" href="https://trac.sagemath.org/ticket/715#comment:314" title="Comment 314">jdemeyer</a>:
</p>
<blockquote class="citation">
<p>
With the current patches of <a class="closed ticket" href="https://trac.sagemath.org/ticket/715" title="defect: Parents probably not reclaimed due to too much caching (closed: fixed)">#715</a> + <a class="closed ticket" href="https://trac.sagemath.org/ticket/11521" title="defect: Use weak references to cache homsets (closed: fixed)">#11521</a> + <a class="closed ticket" href="https://trac.sagemath.org/ticket/12215" title="defect: Memleak in UniqueRepresentation, @cached_method (closed: fixed)">#12215</a> + <a class="closed ticket" href="https://trac.sagemath.org/ticket/12313" title="defect: Fix yet another memory leak caused by caching of coercion data (closed: fixed)">#12313</a>, I get:
</p>
</blockquote>
<p>
And I guess the failing tests were introduced in one of the four tickets...
</p>
TicketjdemeyerWed, 12 Sep 2012 19:20:18 GMT
https://trac.sagemath.org/ticket/715#comment:316
https://trac.sagemath.org/ticket/715#comment:316
<p>
I'm pretty sure the failing tests are because of <a class="attachment" href="https://trac.sagemath.org/attachment/ticket/715/trac_715_osx64-dealloc.patch" title="Attachment 'trac_715_osx64-dealloc.patch' in Ticket #715">trac_715_osx64-dealloc.patch</a><a class="trac-rawlink" href="https://trac.sagemath.org/raw-attachment/ticket/715/trac_715_osx64-dealloc.patch" title="Download"></a> (since that's the only thing that changed recently).
</p>
TicketSimonKingWed, 12 Sep 2012 20:03:05 GMT
https://trac.sagemath.org/ticket/715#comment:317
https://trac.sagemath.org/ticket/715#comment:317
<p>
Replying to <a class="ticket" href="https://trac.sagemath.org/ticket/715#comment:316" title="Comment 316">jdemeyer</a>:
</p>
<blockquote class="citation">
<p>
I'm pretty sure the failing tests are because of <a class="attachment" href="https://trac.sagemath.org/attachment/ticket/715/trac_715_osx64-dealloc.patch" title="Attachment 'trac_715_osx64-dealloc.patch' in Ticket #715">trac_715_osx64-dealloc.patch</a><a class="trac-rawlink" href="https://trac.sagemath.org/raw-attachment/ticket/715/trac_715_osx64-dealloc.patch" title="Download"></a> (since that's the only thing that changed recently).
</p>
</blockquote>
<p>
In fact, the two tests were introduced in <a class="closed ticket" href="https://trac.sagemath.org/ticket/11339" title="defect: Refcounting for Singular rings (closed: fixed)">#11339</a>.
</p>
<p>
The tests used to work because they deliberately avoid calling the polynomial ring constructor (which has a strong cache in vanilla Sage) and instantiate the class directly. Nils' patch introduces a strong cache not in the polynomial ring constructor but directly in the class's <code>__init__</code> method. That's why the tests break.
</p>
TicketSimonKingThu, 13 Sep 2012 11:20:41 GMT
https://trac.sagemath.org/ticket/715#comment:318
https://trac.sagemath.org/ticket/715#comment:318
<p>
Sigh.
</p>
<p>
Nils, you found that <code>python -t</code> <a class="attachment" href="https://trac.sagemath.org/attachment/ticket/715/cachefunc_94107.py" title="Attachment 'cachefunc_94107.py' in Ticket #715">cachefunc_94107.py</a><a class="trac-rawlink" href="https://trac.sagemath.org/raw-attachment/ticket/715/cachefunc_94107.py" title="Download"></a> segfaults in example_27, right? But if one deletes all tests that come <em>after</em> example_27, the segfault vanishes.
</p>
<p>
In other words, whether there is a segfault or not depends on the presence of tests that will never be executed because of the segfault.
</p>
TicketSimonKingThu, 13 Sep 2012 11:32:07 GMT
https://trac.sagemath.org/ticket/715#comment:319
https://trac.sagemath.org/ticket/715#comment:319
<p>
There is a segfault when deleting example_1. However, if one additionally deletes any of the tests that come after example_61, then the segfault vanishes.
</p>
<p>
It seems we will not get a reasonable minimal example. Let's see if Hans Schönemann has an idea.
</p>
TicketnbruinThu, 13 Sep 2012 22:35:18 GMT
https://trac.sagemath.org/ticket/715#comment:320
https://trac.sagemath.org/ticket/715#comment:320
<p>
Replying to <a class="ticket" href="https://trac.sagemath.org/ticket/715#comment:318" title="Comment 318">SimonKing</a>:
</p>
<blockquote class="citation">
<p>
In other words, whether there is a segfault or not depends on the presence of tests that will never be executed because of the segfault.
</p>
</blockquote>
<p>
That's assuming that the doctesting framework tests all examples <em>in order</em>. I'm not 100% sure that's the case. When I equipped a bunch of tests with lines of the form
</p>
<pre class="wiki"> >>> 1
EXAMPLE 6
</pre><p>
etcetera, the segfault vanished (of course!) but I didn't see all tests appear in numerical order. I guess I was just blocking that out because of the severe cognitive dissonance it was causing, but your remark now makes it impossible to ignore.
</p>
<p>
Indeed, reading the generated <code>.py</code> file:
</p>
<pre class="wiki">...
m = sys.modules[__name__]
...
runner = sagedoctest.testmod_returning_runner(m,
...
</pre><p>
so the doctest runner learns which doctests to run by being passed the <em>module</em> <code>__main__</code>. At that point it can basically only look up the runnable methods in the module dictionary, so the order is not guaranteed. It likely just extracts the doctests with the usual docstring introspection tools.
</p>
<p>
Perhaps if we equip every test with a line
</p>
<pre class="wiki"> >>> sys.stderr.write('testing test 6\n')
</pre><p>
we might be able to see the actual order in which the examples are tested without preventing the doctest from happening. Perhaps once we establish that, we can strip out the (for us) silly doctesting layer and just use a plain Python input file? Or take a guess and hope that <code>__main__.__dict__</code> is listing the tests in the order they are tested.
</p>
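<p>
A sketch of this instrumentation, assuming the runner only compares what an example prints to stdout (true of Python's standard <code>doctest</code> module). The marker goes to stderr, so it does not change the expected output. Under Python 3, <code>write()</code> returns the number of characters written, which would be echoed as output, so the example assigns it to <code>_</code>. <code>example_6</code> is a made-up test.
</p>

```python
import contextlib
import doctest
import io


def example_6():
    r"""
    >>> import sys
    >>> _ = sys.stderr.write('testing test 6\n')
    >>> 1 + 1
    2
    """


finder = doctest.DocTestFinder()
runner = doctest.DocTestRunner(verbose=False)
marker_log = io.StringIO()
with contextlib.redirect_stderr(marker_log):
    results = [runner.run(t) for t in finder.find(example_6, "example_6")]

print(results)                  # the example still passes (failed=0)
print(marker_log.getvalue())    # the marker went to stderr
```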
TicketSimonKingFri, 14 Sep 2012 10:22:26 GMT
https://trac.sagemath.org/ticket/715#comment:321
https://trac.sagemath.org/ticket/715#comment:321
<p>
Replying to <a class="ticket" href="https://trac.sagemath.org/ticket/715#comment:320" title="Comment 320">nbruin</a>:
</p>
<blockquote class="citation">
<p>
Perhaps if we equip every test with a line
</p>
<pre class="wiki"> >>> sys.stderr.write('testing test 6\n')
</pre><p>
we might be able to see the actual order in which the examples are tested without preventing the doctest from happening.
</p>
</blockquote>
<p>
<Deep sigh>
</p>
<p>
I tried, and printing to sys.stderr makes the segfault disappear. However, it shows that the doctests <em>are</em> executed in alphabetical order: ..., example_49, example_5, example_50, example_51, ..., example_67, example_7, example_8, example_9.
</p>
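<p>
Alphabetical order is what Python's standard <code>doctest</code> module produces: <code>DocTestFinder.find</code> sorts the tests it collects by name, so example_5 sorts between example_49 and example_50. Assuming Sage's old runner used <code>DocTestFinder</code> underneath, the sketch below reproduces that order on a throwaway module named <code>toy</code> (hypothetical).
</p>

```python
import doctest
import types

# Build a throwaway module whose functions mimic the generated example_N tests.
toy = types.ModuleType("toy")
src = ""
for n in (5, 49, 50, 7):
    src += (
        "def example_%d():\n"
        "    '''\n"
        "    >>> %d\n"
        "    %d\n"
        "    '''\n" % (n, n, n)
    )
exec(src, toy.__dict__)

# DocTestFinder sorts the collected tests by name (string order).
names = [t.name for t in doctest.DocTestFinder().find(toy)]
print(names)
```

<p>
If numeric order were wanted, zero-padding the generated names (example_05, example_49, ...) would make string order and numeric order agree.
</p>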
<p>
Assuming that the same order is used when <em>not</em> printing to stderr, we still find that the absence of a later doc test will prevent the segfault. Namely, you located the segfault in example_27, but it won't occur when deleting example_62, which comes alphabetically after example_27.
</p>
TicketSimonKingSat, 15 Sep 2012 17:56:55 GMTdependencies changed
https://trac.sagemath.org/ticket/715#comment:322
https://trac.sagemath.org/ticket/715#comment:322
<ul>
<li><strong>dependencies</strong>
changed from <em>#9138, #11900, #11599, #13145, to be merged with #11521</em> to <em>#9138, #11900, #11599, #13145, #13447 to be merged with #11521</em>
</li>
</ul>
<p>
With Nils' work at <a class="needs_work ticket" href="https://trac.sagemath.org/ticket/13447" title="defect: Make libsingular multivariate polynomial rings collectable (needs_work)">#13447</a>, it seems that we have a proper solution to the problem and do not need a permanent cache for polynomial rings.
</p>
<p>
Hence, I am adding <a class="needs_work ticket" href="https://trac.sagemath.org/ticket/13447" title="defect: Make libsingular multivariate polynomial rings collectable (needs_work)">#13447</a> as a dependency.
</p>
<p>
Apply trac_715_combined.patch trac_715_local_refcache.patch trac_715_safer.patch trac_715_specification.patch
</p>
<p>
And then <a class="closed ticket" href="https://trac.sagemath.org/ticket/11521" title="defect: Use weak references to cache homsets (closed: fixed)">#11521</a>
</p>
TicketjdemeyerWed, 19 Sep 2012 06:33:33 GMTdependencies changed
https://trac.sagemath.org/ticket/715#comment:323
https://trac.sagemath.org/ticket/715#comment:323
<ul>
<li><strong>dependencies</strong>
changed from <em>#9138, #11900, #11599, #13145, #13447 to be merged with #11521</em> to <em>#13145, #13447, to be merged with #11521</em>
</li>
</ul>
TicketnbruinSun, 23 Sep 2012 08:42:34 GMTstatus, dependencies changed
https://trac.sagemath.org/ticket/715#comment:324
https://trac.sagemath.org/ticket/715#comment:324
<ul>
<li><strong>status</strong>
changed from <em>needs_review</em> to <em>positive_review</em>
</li>
<li><strong>dependencies</strong>
changed from <em>#13145, #13447, to be merged with #11521</em> to <em>#13145, to be merged with #11521</em>
</li>
</ul>
<p>
Removing dependency <a class="needs_work ticket" href="https://trac.sagemath.org/ticket/13447" title="defect: Make libsingular multivariate polynomial rings collectable (needs_work)">#13447</a> again, because it looks like that ticket is not close to resolution. In the meantime, leaving polynomial rings immortal is not a regression compared to previous behaviour. Note that while the issue was only diagnosed on OS X, deallocation of polynomial rings indeed leads to a potential write-after-free, so the osx64-dealloc patch should be applied universally.
</p>
<p>
That means we're back at <a class="ticket" href="https://trac.sagemath.org/ticket/715#comment:317" title="Comment 317">comment:317</a> and positive review. We should really get this merged. At least the libsingular interface is not worse than it was before. Proper coordination of libsingular and python memory management shouldn't hold up reclaiming of other rings.
</p>
<p>
Apply trac_715_combined.patch trac_715_local_refcache.patch trac_715_safer.patch trac_715_specification.patch trac_715_osx64-dealloc.patch
</p>
TicketjdemeyerSun, 23 Sep 2012 16:15:30 GMTmilestone changed
https://trac.sagemath.org/ticket/715#comment:325
https://trac.sagemath.org/ticket/715#comment:325
<ul>
<li><strong>milestone</strong>
changed from <em>sage-5.4</em> to <em>sage-5.5</em>
</li>
</ul>
<p>
Since these tickets have caused some trouble in the past, I prefer to merge them only in a .beta0 (to maximize the testing), hence the milestone bump.
</p>
TicketjdemeyerFri, 05 Oct 2012 13:57:49 GMTattachment set
https://trac.sagemath.org/ticket/715
https://trac.sagemath.org/ticket/715
<ul>
<li><strong>attachment</strong>
set to <em>715_all.patch</em>
</li>
</ul>
TicketjdemeyerFri, 05 Oct 2012 13:59:05 GMTdescription changed
https://trac.sagemath.org/ticket/715#comment:326
https://trac.sagemath.org/ticket/715#comment:326
<ul>
<li><strong>description</strong>
modified (<a href="/ticket/715?action=diff&version=326">diff</a>)
</li>
</ul>
<p>
I combined all the patches in one patch.
</p>
TicketjdemeyerWed, 17 Oct 2012 20:59:27 GMTstatus changed; resolution, merged set
https://trac.sagemath.org/ticket/715#comment:327
https://trac.sagemath.org/ticket/715#comment:327
<ul>
<li><strong>status</strong>
changed from <em>positive_review</em> to <em>closed</em>
</li>
<li><strong>resolution</strong>
set to <em>fixed</em>
</li>
<li><strong>merged</strong>
set to <em>sage-5.5.beta0</em>
</li>
</ul>
TicketjdemeyerSat, 03 Nov 2012 17:35:41 GMTstatus changed; resolution, merged deleted
https://trac.sagemath.org/ticket/715#comment:328
https://trac.sagemath.org/ticket/715#comment:328
<ul>
<li><strong>status</strong>
changed from <em>closed</em> to <em>new</em>
</li>
<li><strong>resolution</strong>
<em>fixed</em> deleted
</li>
<li><strong>merged</strong>
<em>sage-5.5.beta0</em> deleted
</li>
</ul>
<p>
Sorry to bring bad news, but a trial sage-5.5.beta1 caused a segmentation fault in <code>sage/schemes/elliptic_curves/ell_number_field.py</code> on OS X 10.4 PPC. Removing <a class="closed ticket" href="https://trac.sagemath.org/ticket/715" title="defect: Parents probably not reclaimed due to too much caching (closed: fixed)">#715</a> and <a class="closed ticket" href="https://trac.sagemath.org/ticket/11521" title="defect: Use weak references to cache homsets (closed: fixed)">#11521</a> made the problem go away. This really feels like déjà vu, but I'm afraid I need to remove the patch from sage-5.5.beta0.
</p>
TicketjdemeyerSun, 04 Nov 2012 08:37:09 GMTstatus changed; resolution, merged set
https://trac.sagemath.org/ticket/715#comment:329
https://trac.sagemath.org/ticket/715#comment:329
<ul>
<li><strong>status</strong>
changed from <em>new</em> to <em>closed</em>
</li>
<li><strong>resolution</strong>
set to <em>fixed</em>
</li>
<li><strong>merged</strong>
set to <em>sage-5.5.beta0</em>
</li>
</ul>
TicketjdemeyerMon, 05 Nov 2012 07:51:30 GMT
https://trac.sagemath.org/ticket/715#comment:330
https://trac.sagemath.org/ticket/715#comment:330
<p>
sage-5.5.beta0 + <a class="closed ticket" href="https://trac.sagemath.org/ticket/11593" title="enhancement: `quo_rem` for divisor of leading unit coefficient (closed: fixed)">#11593</a> gives
</p>
<pre class="wiki">sage -t --long "devel/sage/sage/schemes/elliptic_curves/ell_number_field.py"
The doctested process was killed by signal 11
[166.4 s]
</pre><p>
on OS X 10.4 PPC (not on other systems as far as I know).
</p>
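<p>
The "killed by signal 11" message means the doctest process received SIGSEGV (signal 11 on Linux and OS X). A minimal Python 3 sketch, assuming a POSIX system and using only the standard <code>subprocess</code> and <code>signal</code> modules, shows how a harness can tell a crash by signal from an ordinary exit. The helper <code>run_and_classify</code> is hypothetical and is not the Sage doctest runner's code.
</p>

```python
import signal
import subprocess
import sys


def run_and_classify(args):
    """Run a command and describe how it terminated.

    On POSIX, subprocess reports a child killed by a signal as a
    negative return code: -N for signal N.
    """
    rc = subprocess.run(args).returncode
    if rc < 0:
        return "killed by signal %d (%s)" % (-rc, signal.Signals(-rc).name)
    return "exited with status %d" % rc


if __name__ == "__main__":
    # Stand-in for a crashing doctest: a child that sends itself SIGSEGV.
    crash = [sys.executable, "-c",
             "import os, signal; os.kill(os.getpid(), signal.SIGSEGV)"]
    print(run_and_classify(crash))  # on Linux/OS X: killed by signal 11 (SIGSEGV)
```

<p>
A normal failure, such as a child calling <code>sys.exit(3)</code>, is reported as <code>exited with status 3</code> instead, which is why a segfault stands out from ordinary doctest failures.
</p>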
TicketjdemeyerMon, 12 Nov 2012 21:43:16 GMT
https://trac.sagemath.org/ticket/715#comment:331
https://trac.sagemath.org/ticket/715#comment:331
<p>
Sigh. While testing a preliminary sage-5.5.beta2, I got again
</p>
<pre class="wiki">sage -t --long -force_lib devel/sage/sage/schemes/elliptic_curves/ell_number_field.py
Segmentation fault (core dumped)
</pre><p>
on a <em>different</em> system than before (Linux i686) and with <em>different</em> patches.
</p>
TicketSimonKingMon, 12 Nov 2012 22:20:56 GMT
https://trac.sagemath.org/ticket/715#comment:332
https://trac.sagemath.org/ticket/715#comment:332
<p>
Replying to <a class="ticket" href="https://trac.sagemath.org/ticket/715#comment:331" title="Comment 331">jdemeyer</a>:
</p>
<blockquote class="citation">
<p>
Sigh. While testing a preliminary sage-5.5.beta2, I got again
</p>
<pre class="wiki">sage -t --long -force_lib devel/sage/sage/schemes/elliptic_curves/ell_number_field.py
Segmentation fault (core dumped)
</pre><p>
on a <em>different</em> system as before (Linux i686) and with <em>different</em> patches.
</p>
</blockquote>
<p>
What does the core dump say? libsingular again, or something else?
</p>
<p>
I am sorry, but for the time being (at least until the end of this week) I will not be able to do Sage development.
</p>
TicketmjoMon, 19 Nov 2012 03:39:49 GMTcc changed
https://trac.sagemath.org/ticket/715#comment:333
https://trac.sagemath.org/ticket/715#comment:333
<ul>
<li><strong>cc</strong>
<em>mjo</em> added
</li>
</ul>
<p>
FWIW, this happens consistently here with 5.5.rc0. I'm not sure whether this will be useful; I compiled with my default CFLAGS:
</p>
<pre class="wiki">(gdb) bt
#0 convi (x=0x555559c55318, l=0x7ffffffec7d0) at ../src/kernel/gmp/mp.c:1288
#1 0x00007ffff4bec01a in itostr_sign (x=<optimized out>, sx=1,
len=0x7ffffffec858) at ../src/language/es.c:507
#2 0x00007ffff4bf10b6 in str_absint (x=0x555559c55318, S=0x7ffffffecac0)
at ../src/language/es.c:1788
#3 bruti_intern (g=0x555559c55318, T=<optimized out>, S=0x7ffffffecac0,
addsign=1) at ../src/language/es.c:2568
#4 0x00007ffff4bf197e in bruti_intern (g=0x555559c55348, T=0x7ffff4f4bc80,
S=0x7ffffffecac0, addsign=<optimized out>) at ../src/language/es.c:2741
#5 0x00007ffff4bf0ec4 in GENtostr_fun (out=0x7ffff4bf3d10 <bruti>,
T=0x7ffff4f4bc80, x=0x555559c55348) at ../src/language/es.c:1655
#6 GENtostr (x=0x555559c55348) at ../src/language/es.c:1661
#7 0x00007fffeaea9d14 in gcmp_sage (y=0x55555ca84ef8, x=0x555559c55348)
at sage/libs/pari/misc.h:60
#8 __pyx_f_4sage_4libs_4pari_3gen_3gen__cmp_c_impl (
__pyx_v_left=<optimized out>, __pyx_v_right=<optimized out>)
at sage/libs/pari/gen.c:9747
#9 0x00007fffee0d7697 in __pyx_f_4sage_9structure_7element_7Element__richcmp_c_impl (__pyx_v_left=0x55555a963c58, __pyx_v_right=<optimized out>,
__pyx_v_op=2) at sage/structure/element.c:8719
#10 0x00007fffee0f48e4 in __pyx_f_4sage_9structure_7element_7Element__richcmp
(__pyx_v_left=0x55555a963c58, __pyx_v_right=0x55555c6f9db8, __pyx_v_op=2)
at sage/structure/element.c:8418
#11 0x00007fffeaea0d1b in __pyx_pf_4sage_4libs_4pari_3gen_3gen_88__richcmp__ (
__pyx_v_op=<optimized out>, __pyx_v_right=<optimized out>,
__pyx_v_left=<optimized out>) at sage/libs/pari/gen.c:9709
#12 __pyx_pw_4sage_4libs_4pari_3gen_3gen_89__richcmp__ (
__pyx_v_left=<optimized out>, __pyx_v_right=<optimized out>,
__pyx_v_op=<optimized out>) at sage/libs/pari/gen.c:9679
#13 0x00007ffff7a705fa in try_rich_compare (v=0x55555a963c58,
w=0x55555c6f9db8, op=2) at Objects/object.c:617
#14 0x00007ffff7a7318b in try_rich_compare_bool (op=2, w=0x55555c6f9db8,
v=0x55555a963c58) at Objects/object.c:645
...
</pre>
TicketjpfloriMon, 19 Nov 2012 14:22:39 GMT
https://trac.sagemath.org/ticket/715#comment:334
https://trac.sagemath.org/ticket/715#comment:334
<p>
OK, I can reproduce a segfault on an x86_64 system when running the ell_number_field.py tests under gdb.
The end of the backtrace is similar to what mjo posted.
The beginning involves Twisted, so it feels like the segfault happens when Sage quits, somewhere in quit_sage.
The relevant fix might be <a class="ext-link" href="http://trac.sagemath.org/sage_trac/attachment/ticket/12215/trac12215_segfault_fixes.patch"><span class="icon"></span>http://trac.sagemath.org/sage_trac/attachment/ticket/12215/trac12215_segfault_fixes.patch</a>, as the backtrace suggests; removing the offending PARI deallocation suggests the same.
</p>
<p>
I'll retry with the patch linked above and the other fix from <a class="closed ticket" href="https://trac.sagemath.org/ticket/12313" title="defect: Fix yet another memory leak caused by caching of coercion data (closed: fixed)">#12313</a> for polybori.
</p>
TicketjpfloriMon, 19 Nov 2012 14:32:58 GMT
https://trac.sagemath.org/ticket/715#comment:335
https://trac.sagemath.org/ticket/715#comment:335
<p>
ell_number_field.py seems fine with the two fixes above, so I guess the easiest solution is to open new tickets containing only these patches (and not all the hard work originally targeted in <a class="closed ticket" href="https://trac.sagemath.org/ticket/12215" title="defect: Memleak in UniqueRepresentation, @cached_method (closed: fixed)">#12215</a> and <a class="closed ticket" href="https://trac.sagemath.org/ticket/12313" title="defect: Fix yet another memory leak caused by caching of coercion data (closed: fixed)">#12313</a>), make the tickets here (<a class="closed ticket" href="https://trac.sagemath.org/ticket/715" title="defect: Parents probably not reclaimed due to too much caching (closed: fixed)">#715</a> and <a class="closed ticket" href="https://trac.sagemath.org/ticket/11521" title="defect: Use weak references to cache homsets (closed: fixed)">#11521</a>) and there (<a class="closed ticket" href="https://trac.sagemath.org/ticket/12215" title="defect: Memleak in UniqueRepresentation, @cached_method (closed: fixed)">#12215</a> and <a class="closed ticket" href="https://trac.sagemath.org/ticket/12313" title="defect: Fix yet another memory leak caused by caching of coercion data (closed: fixed)">#12313</a>) depend on these "new" tickets, and relaunch the patchbots with the new set of patches.
</p>
<p>
Maybe include Nils' findings as well, as discussed at <a class="ext-link" href="https://groups.google.com/d/topic/sage-devel/hgQLrqnCeyA/discussion"><span class="icon"></span>https://groups.google.com/d/topic/sage-devel/hgQLrqnCeyA/discussion</a>, if a patch is devised, and <a class="closed ticket" href="https://trac.sagemath.org/ticket/13719" title="defect: Illegal free in graph_generators (closed: fixed)">#13719</a>; or keep these two for later...
</p>
TicketmjoMon, 19 Nov 2012 16:03:11 GMT
https://trac.sagemath.org/ticket/715#comment:336
https://trac.sagemath.org/ticket/715#comment:336
<p>
The fix at <a class="closed ticket" href="https://trac.sagemath.org/ticket/12313" title="defect: Fix yet another memory leak caused by caching of coercion data (closed: fixed)">#12313</a> is what does it for me:
</p>
<pre class="wiki">$ sage -hg qapp
trac_12313_quit_sage.patch
$ sage -t -long ./sage/schemes/elliptic_curves/ell_number_field.py
sage -t -long "devel/sage-main/sage/schemes/elliptic_curves/ell_number_field.py"
[43.8 s]
----------------------------------------------------------------------
All tests passed!
Total time for all tests: 43.8 seconds
</pre>
TicketjpfloriWed, 21 Nov 2012 13:49:22 GMT
https://trac.sagemath.org/ticket/715#comment:337
https://trac.sagemath.org/ticket/715#comment:337
<p>
I got a segfault in interrupt.pyx during "make ptestlong" on 5.5.rc0 with the fixes mentioned above and some pynac-related patches.
</p>
<p>
I got
</p>
<pre class="wiki">Fatal Python error: GC object already tracked
</pre><p>
followed by a highly uninteresting backtrace involving Python magic (from libpython itself and, I guess, the Sage process and doctesting environment), interrupt.so, a final call to PyTuple_New in libpython, and boom (through libpthread and libcsage). In particular, nothing in it relates to pynac, so the additional pynac patches can be ruled out.
</p>
TicketjpfloriWed, 21 Nov 2012 13:50:27 GMT
https://trac.sagemath.org/ticket/715#comment:338
https://trac.sagemath.org/ticket/715#comment:338
<p>
I've not managed to reproduce it by testing interrupt.pyx alone.
</p>
TicketjpfloriWed, 21 Nov 2012 15:33:48 GMT
https://trac.sagemath.org/ticket/715#comment:339
https://trac.sagemath.org/ticket/715#comment:339
<p>
In fact, I could, after a few dozen more iterations.
Incidentally, the test also timed out at least once.
I'm not sure whether this is related, or whether it also happens without the patches.
</p>
TicketSimonKingThu, 22 Nov 2012 10:17:16 GMT
https://trac.sagemath.org/ticket/715#comment:340
https://trac.sagemath.org/ticket/715#comment:340
<p>
Replying to <a class="ticket" href="https://trac.sagemath.org/ticket/715#comment:336" title="Comment 336">mjo</a>:
</p>
<blockquote class="citation">
<p>
The fix at <a class="closed ticket" href="https://trac.sagemath.org/ticket/12313" title="defect: Fix yet another memory leak caused by caching of coercion data (closed: fixed)">#12313</a> is what does it for me:
</p>
</blockquote>
<p>
The next step for me is to move the pari-fix from <a class="closed ticket" href="https://trac.sagemath.org/ticket/12313" title="defect: Fix yet another memory leak caused by caching of coercion data (closed: fixed)">#12313</a> to a separate ticket.
</p>
TicketSimonKingThu, 22 Nov 2012 10:26:22 GMT
https://trac.sagemath.org/ticket/715#comment:341
https://trac.sagemath.org/ticket/715#comment:341
<p>
Spoke too soon. I thought you were talking about the fix that makes PARI get properly deallocated. But that's in a patch from <a class="closed ticket" href="https://trac.sagemath.org/ticket/12215" title="defect: Memleak in UniqueRepresentation, @cached_method (closed: fixed)">#12215</a>.
</p>
<p>
So, correcting myself: The next step is to fix pari deallocation as in <a class="closed ticket" href="https://trac.sagemath.org/ticket/12215" title="defect: Memleak in UniqueRepresentation, @cached_method (closed: fixed)">#12215</a> on a separate ticket, and then I think my priority will be <a class="needs_work ticket" href="https://trac.sagemath.org/ticket/13447" title="defect: Make libsingular multivariate polynomial rings collectable (needs_work)">#13447</a>, which is then likely to involve the patch from <a class="closed ticket" href="https://trac.sagemath.org/ticket/12313" title="defect: Fix yet another memory leak caused by caching of coercion data (closed: fixed)">#12313</a>.
</p>
TicketSimonKingThu, 22 Nov 2012 10:38:30 GMT
https://trac.sagemath.org/ticket/715#comment:342
https://trac.sagemath.org/ticket/715#comment:342
<p>
For the record: I created <a class="closed ticket" href="https://trac.sagemath.org/ticket/13741" title="defect: Proper deallocation of the (unique) pari instance (closed: fixed)">#13741</a>, which needs review (and perhaps a doctest).
</p>
TicketjpfloriThu, 22 Nov 2012 17:33:57 GMTdependencies changed
https://trac.sagemath.org/ticket/715#comment:343
https://trac.sagemath.org/ticket/715#comment:343
<ul>
<li><strong>dependencies</strong>
changed from <em>#13145, to be merged with #11521</em> to <em>#13145, #13741, #13746, to be merged with #11521</em>
</li>
</ul>
<p>
I've moved the pbori fix from <a class="closed ticket" href="https://trac.sagemath.org/ticket/12313" title="defect: Fix yet another memory leak caused by caching of coercion data (closed: fixed)">#12313</a> to <a class="closed ticket" href="https://trac.sagemath.org/ticket/13746" title="defect: Do not delete a borrowed reference to reduction strategies in pbori (closed: fixed)">#13746</a>.
I've added it as a dependency here, together with <a class="closed ticket" href="https://trac.sagemath.org/ticket/13741" title="defect: Proper deallocation of the (unique) pari instance (closed: fixed)">#13741</a>, because I feel the latter was the original problem mjo reported, even though applying the other fix was enough to slightly change the order of the universe and make the problem disappear.
</p>
TicketjpfloriThu, 22 Nov 2012 17:34:38 GMT
https://trac.sagemath.org/ticket/715#comment:344
https://trac.sagemath.org/ticket/715#comment:344
<p>
As the ticket was closed, I'm not sure changing the dependencies field was a good idea...
</p>
TicketjdemeyerFri, 23 Nov 2012 13:28:42 GMT
https://trac.sagemath.org/ticket/715#comment:345
https://trac.sagemath.org/ticket/715#comment:345
<p>
Replying to <a class="ticket" href="https://trac.sagemath.org/ticket/715#comment:344" title="Comment 344">jpflori</a>:
</p>
<blockquote class="citation">
<p>
As the ticket was closed, I'm not sure my idea of changing the dependencies field was a good idea...
</p>
</blockquote>
<p>
Changing dependencies is fine for me. Just don't change the patch(es).
</p>
TicketjpfloriThu, 29 Nov 2012 23:06:49 GMT
https://trac.sagemath.org/ticket/715#comment:346
https://trac.sagemath.org/ticket/715#comment:346
<p>
Is anyone still seeing random failures?
Or are we finally approaching the end here?
</p>
TicketjpfloriMon, 24 Dec 2012 21:21:01 GMT
https://trac.sagemath.org/ticket/715#comment:347
https://trac.sagemath.org/ticket/715#comment:347
<p>
Building Python without pymalloc, I got Valgrind output that might point to what is hopefully the last problem we have to face:
</p>
<p>
I'm not sure we got these so clearly before, but using <code>--without-pymalloc</code> and Valgrind (hint: finish and review <a class="closed ticket" href="https://trac.sagemath.org/ticket/13060" title="defect: Update Valgrind spkg to version 3.8.1 (closed: fixed)">#13060</a>), I get lots of
</p>
<pre class="wiki">==28631== Invalid read of size 8
==28631== at 0x10429E50: __pyx_tp_dealloc_4sage_9structure_15category_object_CategoryObject (category_object.c:8990)
==28631== by 0x4ED96C5: subtype_dealloc (typeobject.c:1014)
==28631== by 0x4EBA106: insertdict (dictobject.c:530)
==28631== by 0x4EBCB51: PyDict_SetItem (dictobject.c:775)
==28631== by 0x4EC2517: _PyObject_GenericSetAttrWithDict (object.c:1524)
==28631== by 0x4EC1F5E: PyObject_SetAttr (object.c:1247)
==28631== by 0x4F21600: PyEval_EvalFrameEx (ceval.c:2004)
==28631== by 0x4F26587: PyEval_EvalCodeEx (ceval.c:3253)
==28631== by 0x4EA8F65: function_call (funcobject.c:526)
==28631== by 0x4E7DFED: PyObject_Call (abstract.c:2529)
==28631== by 0x4F1F6A6: PyEval_CallObjectWithKeywords (ceval.c:3890)
==28631== by 0x4F23D5A: PyEval_EvalFrameEx (ceval.c:1739)
==28631== by 0x4F26587: PyEval_EvalCodeEx (ceval.c:3253)
==28631== by 0x4F266C1: PyEval_EvalCode (ceval.c:667)
==28631== by 0x4F24388: PyEval_EvalFrameEx (ceval.c:4718)
==28631== by 0x4F26587: PyEval_EvalCodeEx (ceval.c:3253)
==28631== by 0x4EA8F65: function_call (funcobject.c:526)
==28631== by 0x4E7DFED: PyObject_Call (abstract.c:2529)
==28631== by 0x4E8C46F: instancemethod_call (classobject.c:2578)
==28631== by 0x4E7DFED: PyObject_Call (abstract.c:2529)
==28631== by 0x4F21828: PyEval_EvalFrameEx (ceval.c:4239)
==28631== by 0x4F26587: PyEval_EvalCodeEx (ceval.c:3253)
==28631== by 0x4F2422C: PyEval_EvalFrameEx (ceval.c:4117)
==28631== by 0x4F26587: PyEval_EvalCodeEx (ceval.c:3253)
==28631== by 0x4EA8F65: function_call (funcobject.c:526)
==28631== Address 0xbd30390 is 48 bytes inside a block of size 256 free'd
==28631== at 0x4C28B16: free (vg_replace_malloc.c:446)
==28631== by 0x4ED96C5: subtype_dealloc (typeobject.c:1014)
==28631== by 0x4F5F112: collect (gcmodule.c:770)
==28631== by 0x4F5FB06: _PyObject_GC_Malloc (gcmodule.c:996)
==28631== by 0x4F5FB3C: _PyObject_GC_New (gcmodule.c:1467)
==28631== by 0x4E98B97: PyWrapper_New (descrobject.c:1068)
==28631== by 0x4EC2258: _PyObject_GenericGetAttrWithDict (object.c:1434)
==28631== by 0x10A6CD28: __pyx_pw_4sage_9structure_11coerce_dict_16TripleDictEraser_3__call__ (coerce_dict.c:1225)
==28631== by 0x4E7DFED: PyObject_Call (abstract.c:2529)
==28631== by 0x4E7EB2D: PyObject_CallFunctionObjArgs (abstract.c:2760)
==28631== by 0x4EEA350: PyObject_ClearWeakRefs (weakrefobject.c:881)
==28631== by 0x10429E4F: __pyx_tp_dealloc_4sage_9structure_15category_object_CategoryObject (category_object.c:8989)
==28631== by 0x4ED96C5: subtype_dealloc (typeobject.c:1014)
==28631== by 0x4EBA106: insertdict (dictobject.c:530)
==28631== by 0x4EBCB51: PyDict_SetItem (dictobject.c:775)
==28631== by 0x4EC2517: _PyObject_GenericSetAttrWithDict (object.c:1524)
==28631== by 0x4EC1F5E: PyObject_SetAttr (object.c:1247)
==28631== by 0x4F21600: PyEval_EvalFrameEx (ceval.c:2004)
==28631== by 0x4F26587: PyEval_EvalCodeEx (ceval.c:3253)
==28631== by 0x4EA8F65: function_call (funcobject.c:526)
==28631== by 0x4E7DFED: PyObject_Call (abstract.c:2529)
==28631== by 0x4F1F6A6: PyEval_CallObjectWithKeywords (ceval.c:3890)
==28631== by 0x4F23D5A: PyEval_EvalFrameEx (ceval.c:1739)
==28631== by 0x4F26587: PyEval_EvalCodeEx (ceval.c:3253)
==28631== by 0x4F266C1: PyEval_EvalCode (ceval.c:667)
</pre><p>
and
</p>
<pre class="wiki">==28631== Invalid read of size 8
==28631== at 0x4F5FC1E: PyObject_GC_Del (gcmodule.c:210)
==28631== by 0x4ED96C5: subtype_dealloc (typeobject.c:1014)
==28631== by 0x4EBA106: insertdict (dictobject.c:530)
==28631== by 0x4EBCB51: PyDict_SetItem (dictobject.c:775)
==28631== by 0x4EC2517: _PyObject_GenericSetAttrWithDict (object.c:1524)
==28631== by 0x4EC1F5E: PyObject_SetAttr (object.c:1247)
==28631== by 0x4F21600: PyEval_EvalFrameEx (ceval.c:2004)
==28631== by 0x4F26587: PyEval_EvalCodeEx (ceval.c:3253)
==28631== by 0x4EA8F65: function_call (funcobject.c:526)
==28631== by 0x4E7DFED: PyObject_Call (abstract.c:2529)
==28631== by 0x4F1F6A6: PyEval_CallObjectWithKeywords (ceval.c:3890)
==28631== by 0x4F23D5A: PyEval_EvalFrameEx (ceval.c:1739)
==28631== by 0x4F26587: PyEval_EvalCodeEx (ceval.c:3253)
==28631== by 0x4F266C1: PyEval_EvalCode (ceval.c:667)
==28631== by 0x4F24388: PyEval_EvalFrameEx (ceval.c:4718)
==28631== by 0x4F26587: PyEval_EvalCodeEx (ceval.c:3253)
==28631== by 0x4EA8F65: function_call (funcobject.c:526)
==28631== by 0x4E7DFED: PyObject_Call (abstract.c:2529)
==28631== by 0x4E8C46F: instancemethod_call (classobject.c:2578)
==28631== by 0x4E7DFED: PyObject_Call (abstract.c:2529)
==28631== by 0x4F21828: PyEval_EvalFrameEx (ceval.c:4239)
==28631== by 0x4F26587: PyEval_EvalCodeEx (ceval.c:3253)
==28631== by 0x4F2422C: PyEval_EvalFrameEx (ceval.c:4117)
==28631== by 0x4F26587: PyEval_EvalCodeEx (ceval.c:3253)
==28631== by 0x4EA8F65: function_call (funcobject.c:526)
==28631== Address 0xbd30360 is 0 bytes inside a block of size 256 free'd
==28631== at 0x4C28B16: free (vg_replace_malloc.c:446)
==28631== by 0x4ED96C5: subtype_dealloc (typeobject.c:1014)
==28631== by 0x4F5F112: collect (gcmodule.c:770)
==28631== by 0x4F5FB06: _PyObject_GC_Malloc (gcmodule.c:996)
==28631== by 0x4F5FB3C: _PyObject_GC_New (gcmodule.c:1467)
==28631== by 0x4E98B97: PyWrapper_New (descrobject.c:1068)
==28631== by 0x4EC2258: _PyObject_GenericGetAttrWithDict (object.c:1434)
==28631== by 0x10A6CD28: __pyx_pw_4sage_9structure_11coerce_dict_16TripleDictEraser_3__call__ (coerce_dict.c:1225)
==28631== by 0x4E7DFED: PyObject_Call (abstract.c:2529)
==28631== by 0x4E7EB2D: PyObject_CallFunctionObjArgs (abstract.c:2760)
==28631== by 0x4EEA350: PyObject_ClearWeakRefs (weakrefobject.c:881)
==28631== by 0x10429E4F: __pyx_tp_dealloc_4sage_9structure_15category_object_CategoryObject (category_object.c:8989)
==28631== by 0x4ED96C5: subtype_dealloc (typeobject.c:1014)
==28631== by 0x4EBA106: insertdict (dictobject.c:530)
==28631== by 0x4EBCB51: PyDict_SetItem (dictobject.c:775)
==28631== by 0x4EC2517: _PyObject_GenericSetAttrWithDict (object.c:1524)
==28631== by 0x4EC1F5E: PyObject_SetAttr (object.c:1247)
==28631== by 0x4F21600: PyEval_EvalFrameEx (ceval.c:2004)
==28631== by 0x4F26587: PyEval_EvalCodeEx (ceval.c:3253)
==28631== by 0x4EA8F65: function_call (funcobject.c:526)
==28631== by 0x4E7DFED: PyObject_Call (abstract.c:2529)
==28631== by 0x4F1F6A6: PyEval_CallObjectWithKeywords (ceval.c:3890)
==28631== by 0x4F23D5A: PyEval_EvalFrameEx (ceval.c:1739)
==28631== by 0x4F26587: PyEval_EvalCodeEx (ceval.c:3253)
==28631== by 0x4F266C1: PyEval_EvalCode (ceval.c:667)
</pre>
TicketSimonKingTue, 25 Dec 2012 19:23:15 GMT
https://trac.sagemath.org/ticket/715#comment:348
https://trac.sagemath.org/ticket/715#comment:348
<p>
Sorry, I am not experienced enough with Valgrind. I don't even know what we learn from its output. Does it tell in which function/method the invalid read occurs? Does it tell why the read is invalid? I mean: does the read concern data that have previously been freed, or what happens?
</p>
TicketvbraunTue, 25 Dec 2012 20:55:05 GMT
https://trac.sagemath.org/ticket/715#comment:349
https://trac.sagemath.org/ticket/715#comment:349
<p>
The first line is the error, e.g. "Invalid read of size 8" = the code wants to read 8 bytes from a location that is not on the stack and hasn't been malloc'ed. Then follows the stack backtrace: first the function that caused the error, then the calling function, etc. (just like gdb).
</p>
<p>
Valgrind keeps information about the most recent frees to give you a better diagnostic (this block has been freed previously, and you are this far into the freed space), but it won't track every free that has ever happened (that would require a prohibitive amount of RAM). There are some options to control this, for example
</p>
<pre class="wiki"> --freelist-vol=<number> volume of freed blocks queue [20000000]
--freelist-big-blocks=<number> releases first blocks with size >= [1000000]
</pre><p>
see also <code>valgrind --help</code>
</p>
TicketjdemeyerWed, 26 Dec 2012 13:16:18 GMT
https://trac.sagemath.org/ticket/715#comment:350
https://trac.sagemath.org/ticket/715#comment:350
<p>
With a trial of sage-5.6.beta2, I get the following doctest error on the Skynet machine <code>mark</code> (Solaris SPARC 32-bit):
</p>
<pre class="wiki">sage -t --long -force_lib devel/sage/sage/calculus/wester.py
**********************************************************************
File "/home/buildbot/build/sage/mark-1/mark_full/build/sage-5.6.beta2/devel/sage-main/sage/calculus/wester.py", line 456:
sage: d = m.determinant()
Exception raised:
Traceback (most recent call last):
File "/home/buildbot/build/sage/mark-1/mark_full/build/sage-5.6.beta2/local/bin/ncadoctest.py", line 1231, in run_one_test
self.run_one_example(test, example, filename, compileflags)
File "/home/buildbot/build/sage/mark-1/mark_full/build/sage-5.6.beta2/local/bin/sagedoctest.py", line 38, in run_one_example
OrigDocTestRunner.run_one_example(self, test, example, filename, compileflags)
File "/home/buildbot/build/sage/mark-1/mark_full/build/sage-5.6.beta2/local/bin/ncadoctest.py", line 1172, in run_one_example
compileflags, 1) in test.globs
File "<doctest __main__.example_0[153]>", line 1, in <module>
d = m.determinant()###line 456:
sage: d = m.determinant()
File "matrix2.pyx", line 1167, in sage.matrix.matrix2.Matrix.determinant (sage/matrix/matrix2.c:8553)
File "matrix_symbolic_dense.pyx", line 436, in sage.matrix.matrix_symbolic_dense.Matrix_symbolic_dense.charpoly (sage/matrix/matrix_symbolic_dense.c:3556)
File "expression.pyx", line 4911, in sage.symbolic.expression.Expression.polynomial (sage/symbolic/expression.cpp:23554)
File "/home/buildbot/build/sage/mark-1/mark_full/build/sage-5.6.beta2/local/lib/python/site-packages/sage/symbolic/expression_conversions.py", line 1056, in polynomial
res = converter()
File "/home/buildbot/build/sage/mark-1/mark_full/build/sage-5.6.beta2/local/lib/python/site-packages/sage/symbolic/expression_conversions.py", line 214, in __call__
return self.arithmetic(ex, operator)
File "/home/buildbot/build/sage/mark-1/mark_full/build/sage-5.6.beta2/local/lib/python/site-packages/sage/symbolic/expression_conversions.py", line 1010, in arithmetic
ops = [self(a) for a in ex.operands()]
File "/home/buildbot/build/sage/mark-1/mark_full/build/sage-5.6.beta2/local/lib/python/site-packages/sage/symbolic/expression_conversions.py", line 214, in __call__
return self.arithmetic(ex, operator)
File "/home/buildbot/build/sage/mark-1/mark_full/build/sage-5.6.beta2/local/lib/python/site-packages/sage/symbolic/expression_conversions.py", line 1011, in arithmetic
return reduce(operator, ops)
File "element.pyx", line 1682, in sage.structure.element.RingElement.__mul__ (sage/structure/element.c:14096)
File "polynomial_element.pyx", line 1156, in sage.rings.polynomial.polynomial_element.Polynomial._mul_ (sage/rings/polynomial/polynomial_element.c:11992)
File "polynomial_element.pyx", line 6165, in sage.rings.polynomial.polynomial_element.Polynomial_generic_dense.__richcmp__ (sage/rings/polynomial/polynomial_element.c:42959)
File "element.pyx", line 843, in sage.structure.element.Element._richcmp (sage/structure/element.c:7870)
File "coerce.pyx", line 854, in sage.structure.coerce.CoercionModel_cache_maps.canonical_coercion (sage/structure/coerce.c:7932)
File "coerce.pyx", line 1009, in sage.structure.coerce.CoercionModel_cache_maps.coercion_maps (sage/structure/coerce.c:9483)
File "coerce.pyx", line 1150, in sage.structure.coerce.CoercionModel_cache_maps.discover_coercion (sage/structure/coerce.c:11033)
File "parent.pyx", line 1974, in sage.structure.parent.Parent.coerce_map_from (sage/structure/parent.c:13804)
File "parent.pyx", line 2068, in sage.structure.parent.Parent.discover_coerce_map_from (sage/structure/parent.c:14231)
File "parent_old.pyx", line 507, in sage.structure.parent_old.Parent._coerce_map_from_ (sage/structure/parent_old.c:6428)
File "/home/buildbot/build/sage/mark-1/mark_full/build/sage-5.6.beta2/local/lib/python/site-packages/sage/rings/polynomial/polynomial_ring.py", line 554, in _coerce_map_from_
return self.coerce_map_from(base_ring) * connecting
File "map.pyx", line 649, in sage.categories.map.Map.__mul__ (sage/categories/map.c:4578)
File "map.pyx", line 689, in sage.categories.map.Map._composition (sage/categories/map.c:4696)
File "/home/buildbot/build/sage/mark-1/mark_full/build/sage-5.6.beta2/local/lib/python/site-packages/sage/categories/homset.py", line 261, in Hom
_cache[key] = KeyedRef(H, _cache.eraser, (id(X),id(Y),id(category)))
File "coerce_dict.pyx", line 451, in sage.structure.coerce_dict.TripleDict.__setitem__ (sage/structure/coerce_dict.c:2933)
File "coerce_dict.pyx", line 471, in sage.structure.coerce_dict.TripleDict.set (sage/structure/coerce_dict.c:3199)
KeyError: (26976432, 83278464, 7649040)
**********************************************************************
</pre><p>
The error is reproducible, except that the numbers in the <code>KeyError</code> change.
</p>
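<p>
The last frames of this traceback involve the homset cache: <code>Hom</code> stores its result under a triple of <code>id</code>s and wraps it in a <code>KeyedRef</code> whose callback, <code>_cache.eraser</code>, removes the entry once the homset dies. The pure-Python sketch below shows that pattern with the standard <code>weakref.KeyedRef</code>. The class <code>WeakTripleCache</code> is hypothetical and far simpler than Sage's Cython <code>TripleDict</code>, which (judging by the <code>TripleDictEraser</code> frames in the Valgrind logs above) also attaches erasing callbacks to its key objects.
</p>

```python
import weakref


class WeakTripleCache(object):
    """Map a triple of objects to a value, holding the value only weakly.

    Entries are keyed by the ids of the three key objects, like the
    (id(X), id(Y), id(category)) key in the traceback above. This sketch
    assumes each value keeps its three key objects alive, so their ids
    cannot be reused while the entry exists.
    """

    def __init__(self):
        self._data = {}

    def _erase(self, ref):
        # Callback run when a cached value dies; ref.key is the id triple.
        # Only remove the entry if it still holds this very reference, so
        # a newer value stored under the same ids survives.
        if self._data.get(ref.key) is ref:
            del self._data[ref.key]

    @staticmethod
    def _key(objs):
        a, b, c = objs
        return (id(a), id(b), id(c))

    def __setitem__(self, objs, value):
        key = self._key(objs)
        self._data[key] = weakref.KeyedRef(value, self._erase, key)

    def get(self, objs, default=None):
        ref = self._data.get(self._key(objs))
        value = ref() if ref is not None else None
        return default if value is None else value

    def __len__(self):
        return len(self._data)
```

<p>
Under CPython's reference counting, storing <code>cache[X, Y, C] = h</code> and then dropping the last reference to <code>h</code> empties the cache immediately. The identity check in <code>_erase</code> is a defensive design choice: a stale callback cannot remove a newer entry stored under the same ids.
</p>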
TicketjpfloriWed, 26 Dec 2012 13:23:16 GMT
https://trac.sagemath.org/ticket/715#comment:351
https://trac.sagemath.org/ticket/715#comment:351
<p>
Thanks for the report; there is definitely something wrong with our Python refcounting and use of weakrefs.
I'm currently investigating this using a debug build of Python.
With it, some refcounts become negative very quickly and Sage aborts because of the assertions, which are now checked.
In fact, while importing Sage, Python only has time to:
</p>
<ul><li>create the empty set in sage/structure/parent.pyx
</li><li>create the Mathematica interface in sage/interfaces/mathematica.pyx
</li><li>hit a failing assertion and abort.
</li></ul><p>
Any idea whether Sage ever worked correctly with such a build?
I'm rebuilding Sage 5.2 against such a Python build just to see.
</p>
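<p>
CPython provides <code>sys.gettotalrefcount()</code> only in debug builds (configured with <code>--with-pydebug</code>), so its presence doubles as a check that the interpreter really is one. A minimal sketch with hypothetical helper names that reports the net change in the total reference count around a call; the figure is only a rough hint for spotting leaked or over-released references, not a precise measurement.
</p>

```python
import sys


def is_debug_build():
    """True if this CPython was configured with --with-pydebug."""
    return hasattr(sys, "gettotalrefcount")


def refcount_delta(func, *args, **kwargs):
    """Change in the interpreter-wide refcount caused by func(*args, **kwargs).

    Returns None on non-debug builds, where the counter does not exist.
    Caches, interned objects and the call machinery itself also create or
    drop references, so treat the result as a hint only.
    """
    if not is_debug_build():
        return None
    before = sys.gettotalrefcount()
    func(*args, **kwargs)
    after = sys.gettotalrefcount()
    return after - before
```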
TicketjpfloriWed, 26 Dec 2012 14:04:34 GMT
https://trac.sagemath.org/ticket/715#comment:352
https://trac.sagemath.org/ticket/715#comment:352
<p>
Bad (?) news: Sage 5.2 fails the same way.
I have some FLINT-related patches on top of vanilla 5.2, but I have double-checked that none of them relates to the memleak tickets.
</p>
TicketSimonKingWed, 26 Dec 2012 14:53:45 GMT
https://trac.sagemath.org/ticket/715#comment:353
https://trac.sagemath.org/ticket/715#comment:353
<p>
Replying to <a class="ticket" href="https://trac.sagemath.org/ticket/715#comment:352" title="Comment 352">jpflori</a>:
</p>
<blockquote class="citation">
<p>
Bad (?) news, Sage 5.2 fails the same way.
I've got some FLINT related patches on top of vanilla 5.2 but have double checked I have nothing related to the memleak tickets.
</p>
</blockquote>
<p>
It means that the debug build of Python cannot (yet) be used to debug the problems introduced by this patch.
</p>
<p>
I suggest that we move the fixes for these unrelated problems to a new ticket.
</p>
<p>
However, the Valgrind output suggests that there <em>is</em> something wrong with the new <code>TripleDict</code> implementation. Is there a way to tell from the Valgrind output where (i.e., for instances of which classes) the invalid reads occur? IIRC, you suggested on sage-devel that it could be related to endomorphism rings, along the lines of "domain and codomain are both decref'd, which is bad if they are the same".
</p>
<p>
But is that just a guess, or has it been confirmed?
</p>
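<p>
The endomorphism idea has a pure-Python analogue, with the caveat that the suspected bug concerns C-level reference counts, which Python code cannot mishandle in this way. If an eraser callback is registered for each key position, then for <code>Hom(X, X)</code> the single object <code>X</code> carries two callbacks, so one death triggers two erasures of the same entry. The sketch below (hypothetical names; assumes CPython, where weakref callbacks run as soon as the last reference disappears) counts the callbacks and the <code>KeyError</code>s that a non-idempotent eraser hits.
</p>

```python
import weakref


class Parent(object):
    """Stand-in for a Sage parent (plain classes support weak references)."""


def erase_on_death(endomorphism, strict):
    """Cache one entry keyed by the ids of (domain, codomain), then let the
    domain die. One weakref callback is registered per key position.
    Returns (callbacks run, KeyErrors hit) at the moment the domain dies.
    """
    store, stats = {}, {"calls": 0, "key_errors": 0}
    X = Parent()
    Y = X if endomorphism else Parent()
    key = (id(X), id(Y))
    store[key] = "Hom(X, Y)"

    def eraser(ref):
        stats["calls"] += 1
        if strict:
            try:
                del store[key]          # fails if the entry is already gone
            except KeyError:            # CPython would only print and ignore it
                stats["key_errors"] += 1
        else:
            store.pop(key, None)        # idempotent erase

    # The weakref objects must stay alive, or their callbacks never run.
    refs = [weakref.ref(X, eraser), weakref.ref(Y, eraser)]
    keep_alive = None if endomorphism else Y  # only the domain dies
    del X, Y  # CPython frees X here and runs its callbacks
    return stats["calls"], stats["key_errors"]
```

<p>
Under those assumptions, <code>erase_on_death(False, True)</code> gives <code>(1, 0)</code>, <code>erase_on_death(True, True)</code> gives <code>(2, 1)</code>, and <code>erase_on_death(True, False)</code> gives <code>(2, 0)</code>. An eraser that tolerates a missing entry avoids the error, but it would not repair a genuine over-decref at the C level.
</p>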
TicketjpfloriWed, 26 Dec 2012 15:03:06 GMT
https://trac.sagemath.org/ticket/715#comment:354
https://trac.sagemath.org/ticket/715#comment:354
<p>
Replying to <a class="ticket" href="https://trac.sagemath.org/ticket/715#comment:353" title="Comment 353">SimonKing</a>:
</p>
<blockquote class="citation">
<p>
Replying to <a class="ticket" href="https://trac.sagemath.org/ticket/715#comment:352" title="Comment 352">jpflori</a>:
</p>
<blockquote class="citation">
<p>
Bad (?) news, Sage 5.2 fails the same way.
I've got some FLINT-related patches on top of vanilla 5.2, but I have double-checked that I have nothing related to the memleak tickets.
</p>
</blockquote>
<p>
It means that the debug build of Python cannot (yet) be used to debug the problems introduced by this patch.
</p>
</blockquote>
<p>
If there are indeed additional problems caused by the patches here.
</p>
<blockquote class="citation">
<p>
I suggest that we move fixing these unrelated problems to a new ticket.
</p>
</blockquote>
<p>
Agreed.
</p>
<blockquote class="citation">
<p>
However, the valgrind output suggests that there <em>is</em> something wrong with the new <code>TripleDict</code> implementation. Is there a way to tell from the valgrind output where (i.e., for instances of what classes) the invalid reads occur? IIRC, you suggested on sage-devel that it could be related to endomorphism rings, something like "domain and codomain are both decref'd, which is bad if they are the same".
</p>
</blockquote>
<p>
Kind of. Something wrong is happening and it does involve <code>TripleDict</code>, but maybe it's only a consequence of an earlier problem, possibly the assertion that fails when Sage starts.
Suppose that <code>TripleDict</code> tries to delete some of its elements, but those were already deleted because of a superfluous earlier decref.
When Python then tries to delete them again (because the last valid strong reference has been deleted), it can segfault at random (and it did not until these patches were included!).
</p>
<p>
With the debug build, instead of segfaulting at random, these spurious decrefs make the assertions abort the program.
</p>
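<p>
A small sketch for checking which kind of interpreter a build is running (the function name is illustrative). In CPython, <code>sys.gettotalrefcount()</code> is only defined in builds configured with <code>--with-pydebug</code>, and those are the builds in which a decref that drives a refcount negative is typically caught by an assertion instead of silently corrupting memory.
</p>

```python
import sys


def is_pydebug_build():
    """Return True if this CPython was built with --with-pydebug.

    sys.gettotalrefcount() is only defined in such debug builds, so its
    presence is a common way to detect one.
    """
    return hasattr(sys, "gettotalrefcount")


if is_pydebug_build():
    # Sampling the global refcount total before and after a suspect
    # operation helps spot leaks (total keeps growing) or over-decrefs.
    print("pydebug build; total refcount:", sys.gettotalrefcount())
else:
    print("release build; negative-refcount checks are not compiled in")
```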
<p>
A realistic hypothesis is that the decref problems were already present but went unnoticed.
It is only the patches here, which make much heavier use of weakrefs, that revealed these earlier problems.
And hopefully no other problems are introduced here (frankly, after staring for hours at the new <code>TripleDict</code> code, it does not look that bad, so we can hope it is really correct).
</p>
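<p>
To make the weak-cache idea concrete, here is a minimal pure-Python sketch of a hypothetical <code>WeakTripleDict</code>. It is not Sage's Cython <code>TripleDict</code>; it only illustrates the technique: entries are keyed by a triple (k1, k2, k3), only weak references to k1 and k2 are held, and a weakref callback drops the entry when either key dies. When k1 is k2 (the endomorphism case), both callbacks fire for the same dead object, so the second removal has to be a no-op; an implementation that deleted (or decref'd) unconditionally in that situation could end up doing it twice.
</p>

```python
import weakref


class WeakTripleDict:
    """Hypothetical sketch, not Sage's TripleDict.

    Maps (k1, k2, k3) to a value.  Only weak references to k1 and k2 are
    kept, so caching e.g. an action does not keep its parents alive; a
    weakref callback drops the entry as soon as k1 or k2 is collected.
    k3 (e.g. an operation or a category) is an ordinary key part.
    """

    def __init__(self):
        # (id(k1), id(k2), k3) -> (weakref to k1, weakref to k2, value)
        self._data = {}

    def _remover(self, key):
        def callback(_dead_ref):
            # pop() with a default: if k1 is k2, BOTH weakrefs fire for the
            # same dead object, and the second removal must be harmless.
            self._data.pop(key, None)
        return callback

    def __setitem__(self, triple, value):
        k1, k2, k3 = triple
        key = (id(k1), id(k2), k3)
        remove = self._remover(key)
        self._data[key] = (weakref.ref(k1, remove), weakref.ref(k2, remove), value)

    def __getitem__(self, triple):
        k1, k2, k3 = triple
        ref1, ref2, value = self._data[(id(k1), id(k2), k3)]
        # Defensive check against a recycled id(): the stored refs must
        # still point at exactly these key objects.
        if ref1() is not k1 or ref2() is not k2:
            raise KeyError(triple)
        return value

    def __len__(self):
        return len(self._data)
```

<p>
Keying on <code>id()</code> rather than on the objects themselves is what keeps the keys from being held strongly; the identity check in <code>__getitem__</code> is the price for that.
</p>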
<blockquote class="citation">
<p>
But is that just a guess, or has it been confirmed?
</p>
</blockquote>
<p>
That was just a guess, along the lines of: what could be the most fishy case here? OK, the one where domain and codomain are equal.
</p>
TicketvbraunWed, 26 Dec 2012 15:04:23 GMT
https://trac.sagemath.org/ticket/715#comment:355
<p>
The negative refcounts are possibly related to this bug since this means somebody is touching a dead object to decref it.
</p>
<p>
Do we have a ticket and/or updated spkgs for python/singular somewhere that enable all this debugging if I compile with <code>SAGE_DEBUG=yes</code>? We should push this out into a beta first, then people can actually look at their own code and see if it does something wrong.
</p>
TicketSimonKingWed, 26 Dec 2012 15:12:08 GMT
https://trac.sagemath.org/ticket/715#comment:356
<p>
I remember - but I do not remember the ticket number - that I had to <em>incref</em> something in the <em>deallocation</em> of something homset-related. It could be that it was in groupoids, but I am not sure.
</p>
<p>
The fact that I had to incref before deallocating suggests that something fishy was going on, and perhaps it is related?
</p>
TicketjpfloriWed, 26 Dec 2012 15:14:28 GMT
https://trac.sagemath.org/ticket/715#comment:357
<p>
Replying to <a class="ticket" href="https://trac.sagemath.org/ticket/715#comment:355" title="Comment 355">vbraun</a>:
</p>
<blockquote class="citation">
<p>
The negative refcounts are possibly related to this bug since this means somebody is touching a dead object to decref it.
</p>
</blockquote>
<p>
Not sure what exactly you mean.
What I meant is that Sage was already decref'ing objects too much before the patches here, so maybe we did not add anything wrong here; it is just that random segfaults, which could already have happened before, now actually do.
</p>
<blockquote class="citation">
<p>
Do we have a ticket and/or updated spkgs for python/singular somewhere that enable all this debugging if I compile with <code>SAGE_DEBUG=yes</code>? We should push this out into a beta first, then people can actually look at their own code and see if it does something wrong.
</p>
</blockquote>
<p>
I have one for Python on my computer...
</p>
<p>
I've opened <a class="closed ticket" href="https://trac.sagemath.org/ticket/13864" title="task: Configure Python with pydebug when SAGE_DEBUG is set (closed: fixed)">#13864</a> (the spkg is not there yet, it will be when I attach the diff).
</p>
TicketjpfloriWed, 26 Dec 2012 15:22:43 GMT
https://trac.sagemath.org/ticket/715#comment:358
<p>
Replying to <a class="ticket" href="https://trac.sagemath.org/ticket/715#comment:356" title="Comment 356">SimonKing</a>:
</p>
<blockquote class="citation">
<p>
I remember - but I do not remember the ticket number - that I had to <em>incref</em> something in the <em>deallocation</em> of something homset-related. It could be that it was in groupoids, but I am not sure.
</p>
</blockquote>
<p>
Was it <a class="needs_work ticket" href="https://trac.sagemath.org/ticket/13447" title="defect: Make libsingular multivariate polynomial rings collectable (needs_work)">#13447</a>?
</p>
<blockquote class="citation">
<p>
The fact that I had to incref before deallocating suggests that something fishy was going on, and perhaps it is related?
</p>
</blockquote>
TicketSimonKingWed, 26 Dec 2012 17:18:45 GMT
https://trac.sagemath.org/ticket/715#comment:359
<p>
Replying to <a class="ticket" href="https://trac.sagemath.org/ticket/715#comment:358" title="Comment 358">jpflori</a>:
</p>
<blockquote class="citation">
<p>
Replying to <a class="ticket" href="https://trac.sagemath.org/ticket/715#comment:356" title="Comment 356">SimonKing</a>:
</p>
<blockquote class="citation">
<p>
I remember - but I do not remember the ticket number - that I had to <em>incref</em> something in the <em>deallocation</em> of something homset-related. It could be that it was in groupoids, but I am not sure.
</p>
</blockquote>
<p>
Was it <a class="needs_work ticket" href="https://trac.sagemath.org/ticket/13447" title="defect: Make libsingular multivariate polynomial rings collectable (needs_work)">#13447</a>?
</p>
</blockquote>
<p>
Nope. It did involve groupoids, I think.
</p>
TicketvbraunWed, 26 Dec 2012 18:05:22 GMT
https://trac.sagemath.org/ticket/715#comment:360
<p>
I noticed that the <code>SAGE_DEBUG</code> documentation doesn't quite match what we are doing with it. So I proposed to change it at <a class="closed ticket" href="https://trac.sagemath.org/ticket/13865" title="enhancement: Document that SAGE_DEBUG is three-state (closed: fixed)">#13865</a>.
</p>
TicketjpfloriThu, 27 Dec 2012 09:22:00 GMT
https://trac.sagemath.org/ticket/715#comment:361
<p>
Hopefully the Cython 0.17.3 upgrade at <a class="closed ticket" href="https://trac.sagemath.org/ticket/13832" title="enhancement: Upgrade Cython to 0.17.3 (closed: fixed)">#13832</a> will fix the last bugs we encounter.
It does include a fix concerning the deallocation of weakreferenceable cdef classes; see
<a class="ext-link" href="https://groups.google.com/d/topic/cython-users/4es75DeacRA/discussion"><span class="icon"></span>https://groups.google.com/d/topic/cython-users/4es75DeacRA/discussion</a> for the release announcement and
<a class="ext-link" href="https://groups.google.com/d/topic/cython-users/K5EFvq22UNI/discussion"><span class="icon"></span>https://groups.google.com/d/topic/cython-users/K5EFvq22UNI/discussion</a> for an earlier bug report.
So the end of the story is that the intensive use of weakrefs made here just revealed bugs already present in Sage, which by chance had never produced segfaults.
</p>
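<p>
For readers unfamiliar with "weakreferenceable cdef classes": per the Cython documentation, extension types do not support weak references unless they declare a C attribute <code>cdef object __weakref__</code>. The closest pure-Python analogue is a class with <code>__slots__</code>, which likewise only supports weak references when <code>"__weakref__"</code> is listed. The sketch below shows the analogue in Python; it is an illustration, not Cython code.
</p>

```python
import weakref


class NoWeakrefSlot:
    # __slots__ without "__weakref__": instances have no slot to hold weak
    # references, much like a Cython cdef class that does not declare one.
    __slots__ = ("value",)


class WithWeakrefSlot:
    # Listing "__weakref__" in __slots__ makes instances weakly referenceable,
    # analogous to `cdef object __weakref__` in a Cython cdef class.
    __slots__ = ("value", "__weakref__")


def supports_weakref(obj):
    """Return True if a weak reference to obj can be created."""
    try:
        weakref.ref(obj)
    except TypeError:
        return False
    return True


print(supports_weakref(NoWeakrefSlot()))    # False
print(supports_weakref(WithWeakrefSlot()))  # True
```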
<p>
See also some comments on testing Sage with a pydebug-enabled Python at <a class="closed ticket" href="https://trac.sagemath.org/ticket/13864" title="task: Configure Python with pydebug when SAGE_DEBUG is set (closed: fixed)">#13864</a> and the long thread at
<a class="ext-link" href="https://groups.google.com/d/topic/sage-devel/Wt7uxbDkh_A/discussion"><span class="icon"></span>https://groups.google.com/d/topic/sage-devel/Wt7uxbDkh_A/discussion</a>
</p>
TicketjdemeyerThu, 27 Dec 2012 23:14:55 GMT
https://trac.sagemath.org/ticket/715#comment:362
<p>
Please note <a class="closed ticket" href="https://trac.sagemath.org/ticket/13870" title="enhancement: Undo #715, #11521 (closed: invalid)">#13870</a>.
</p>
TicketjdemeyerMon, 07 Jan 2013 12:36:10 GMT
https://trac.sagemath.org/ticket/715#comment:363
<p>
More bad news: <a class="closed ticket" href="https://trac.sagemath.org/ticket/715" title="defect: Parents probably not reclaimed due to too much caching (closed: fixed)">#715</a> + <a class="closed ticket" href="https://trac.sagemath.org/ticket/11521" title="defect: Use weak references to cache homsets (closed: fixed)">#11521</a> cause a significant slowdown for the command
</p>
<pre class="wiki">sage: time p = polar_plot(lambda t: (100/(100+(t-pi/2)^8))*(2-sin(7*t)-cos(30*t)/2), -pi/4, 3*pi/2, color="red",plot_points=1000)
</pre><p>
from 22 to 33 seconds. See <a class="ext-link" href="https://groups.google.com/forum/?fromgroups#!topic/sage-devel/EzFPIG6EFMI"><span class="icon"></span>https://groups.google.com/forum/?fromgroups#!topic/sage-devel/EzFPIG6EFMI</a>
</p>
TicketSimonKingMon, 07 Jan 2013 12:52:21 GMT
https://trac.sagemath.org/ticket/715#comment:364
<p>
Without <a class="closed ticket" href="https://trac.sagemath.org/ticket/715" title="defect: Parents probably not reclaimed due to too much caching (closed: fixed)">#715</a> and friends,
</p>
<pre class="wiki">sage: %prun p = polar_plot(lambda t: (100/(100+(t-pi/2)^8))*(2-sin(7*t)-cos(30*t)/2), -pi/4, 3*pi/2, color="red",plot_points=1000)
</pre><p>
yields
</p>
<pre class="wiki"> ncalls tottime percall cumtime percall filename:lineno(function)
88368 12.267 0.000 20.873 0.000 arith.py:1439(gcd)
9263 9.309 0.001 32.102 0.003 <string>:1(<lambda>)
79788/39894 2.004 0.000 2.865 0.000 lazy_attribute.py:506(__get__)
39894 1.599 0.000 4.681 0.000 homset.py:296(__init__)
97631 1.145 0.000 1.737 0.000 arith.py:1611(lcm)
19950 0.961 0.000 6.910 0.000 homset.py:40(Hom)
185999 0.879 0.000 0.880 0.000 {method 'canonical_coercion' of 'sage.structure.coerce.CoercionModel_cache_maps' objects}
8263/999 0.824 0.000 29.602 0.030 plot.py:2307(adaptive_refinement)
39895 0.373 0.000 1.783 0.000 {hasattr}
159576 0.328 0.000 0.328 0.000 {getattr}
14601 0.309 0.000 0.510 0.000 quotient_fields.py:55(gcd)
39890 0.304 0.000 5.033 0.000 homset.py:573(__init__)
116811 0.259 0.000 0.259 0.000 weakref.py:55(__getitem__)
</pre><p>
With <a class="closed ticket" href="https://trac.sagemath.org/ticket/715" title="defect: Parents probably not reclaimed due to too much caching (closed: fixed)">#715</a>, it becomes
</p>
<pre class="wiki"> ncalls tottime percall cumtime percall filename:lineno(function)
89840 43.019 0.000 68.180 0.001 arith.py:1489(gcd)
9415 24.524 0.003 97.043 0.010 <string>:1(<lambda>)
82004/41002 5.752 0.000 7.564 0.000 lazy_attribute.py:506(__get__)
41002 4.583 0.000 12.597 0.000 homset.py:353(__init__)
20504 4.108 0.000 19.894 0.001 homset.py:80(Hom)
189095 2.924 0.000 2.925 0.000 {method 'canonical_coercion' of 'sage.structure.coerce.CoercionModel_cache_maps' objects}
99255 2.392 0.000 3.942 0.000 arith.py:1661(lcm)
8415/999 1.517 0.000 88.121 0.088 plot.py:2316(adaptive_refinement)
205132 1.118 0.000 1.699 0.000 weakref.py:223(__new__)
164064 1.088 0.000 1.088 0.000 weakref.py:228(__init__)
41003 0.979 0.000 5.099 0.000 {hasattr}
205132 0.581 0.000 0.581 0.000 {built-in method __new__ of type object at 0x7f9b33e874a0}
164008 0.578 0.000 0.578 0.000 {getattr}
40998 0.546 0.000 13.200 0.000 homset.py:630(__init__)
119635 0.545 0.000 0.545 0.000 weakref.py:55(__getitem__)
14954 0.532 0.000 0.813 0.000 quotient_fields.py:55(gcd)
20499 0.424 0.000 8.366 0.000 rings.py:635(__new__)
133072 0.394 0.000 0.394 0.000 rational_field.py:217(__hash__)
114209 0.370 0.000 0.370 0.000 {method 'lcm' of 'sage.structure.element.PrincipalIdealDomainElement' objects}
40998 0.332 0.000 13.532 0.000 homset.py:30(__init__)
20499 0.330 0.000 0.330 0.000 dynamic_class.py:122(dynamic_class)
20499 0.304 0.000 7.942 0.000 homset.py:23(RingHomset)
119748 0.297 0.000 0.297 0.000 {method 'gcd' of 'sage.rings.integer.Integer' objects}
189095 0.262 0.000 0.262 0.000 {sage.structure.element.get_coercion_model}
61551 0.213 0.000 0.213 0.000 {isinstance}
41002 0.204 0.000 5.303 0.000 sets_cat.py:255(_element_constructor_)
1 0.201 0.201 98.842 98.842 plot.py:2401(generate_plot_points)
</pre><p>
So, it seems to me that the slow-down is in the creation of homsets.
</p>
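<p>
The second profile also shows many calls to <code>weakref.py:223(__new__)</code> and <code>weakref.py:228(__init__)</code>. In CPython's <code>weakref</code> module those are most likely <code>KeyedRef</code>, the reference type that <code>WeakValueDictionary</code> creates for every stored value (an inference from the line numbers, not confirmed here). A rough micro-benchmark of that overhead, with a hypothetical <code>Parent</code> stand-in class, could look like this; absolute numbers depend heavily on the machine and Python version.
</p>

```python
import timeit
import weakref


class Parent:
    """Stand-in for a Sage parent; plain classes support weak references."""


def cache_cost(cache_factory, n=10000):
    """Best-of-5 time to store n fresh (key, value) pairs in a new cache."""
    def run():
        cache = cache_factory()
        keep = []  # strong refs so that weakly cached values stay alive
        for i in range(n):
            value = Parent()
            keep.append(value)
            cache[i] = value
    return min(timeit.repeat(run, number=1, repeat=5))


strong = cache_cost(dict)
weak = cache_cost(weakref.WeakValueDictionary)
print("dict: %.4fs  WeakValueDictionary: %.4fs  ratio: %.1fx"
      % (strong, weak, weak / strong))
```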
<p>
First question: Why are so many homsets needed in this example?
</p>
<p>
Second question: What can we do to make the creation of a homset more efficient?
</p>
TicketSimonKingMon, 07 Jan 2013 12:55:27 GMT
https://trac.sagemath.org/ticket/715#comment:365
<p>
Replying to <a class="ticket" href="https://trac.sagemath.org/ticket/715#comment:364" title="Comment 364">SimonKing</a>:
</p>
<blockquote class="citation">
<p>
First question: Why are so many homsets needed in this example?
</p>
</blockquote>
<p>
By this, I mean "Why so many, even with a strong cache?"
</p>
<p>
Note that <a class="closed ticket" href="https://trac.sagemath.org/ticket/715" title="defect: Parents probably not reclaimed due to too much caching (closed: fixed)">#715</a> only involves a mild increase in the number of homsets created. But the time for creation increases dramatically.
</p>
TicketcremonaMon, 07 Jan 2013 13:07:01 GMT
https://trac.sagemath.org/ticket/715#comment:366
<p>
To me it looks as if most of the extra time is in the symbolic gcd calls. But seriously, why on earth does plotting a simple trig function involve anything as sophisticated as creating homsets? And why are any gcds being computed at all? Is it that for each of the values t being iterated over, which are rational multiples of pi, the evaluation of sin(7*t) and cos(30*t) is much too clever when all that is needed is a low-precision numerical value?
</p>
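<p>
To illustrate the low-precision point: the radius used in the benchmark evaluates to ordinary floats with nothing beyond Python's <code>math</code> module (Sage's <code>^</code> is Python's <code>**</code>). This only shows that a numerical evaluation needs no gcds or homsets; it says nothing about what <code>polar_plot</code> actually passes to the lambda, and the evenly spaced sampling below is a simplification of the adaptive sampling the profile shows.
</p>

```python
import math


def r(t):
    """The benchmark's polar radius, evaluated with plain floats."""
    return (100 / (100 + (t - math.pi / 2) ** 8)) * (
        2 - math.sin(7 * t) - math.cos(30 * t) / 2
    )


# 1000 evenly spaced sample points on [-pi/4, 3*pi/2], cf. plot_points=1000
a, b = -math.pi / 4, 3 * math.pi / 2
samples = [r(a + (b - a) * i / 999) for i in range(1000)]
print(r(math.pi / 2))  # approximately 3.5
```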
TicketSimonKingMon, 07 Jan 2013 13:12:53 GMT
https://trac.sagemath.org/ticket/715#comment:367
<p>
Replying to <a class="ticket" href="https://trac.sagemath.org/ticket/715#comment:366" title="Comment 366">cremona</a>:
</p>
<blockquote class="citation">
<p>
To me it looks as if most of the extra time is in the symbolic gcd calls. But seriously, why on earth does plotting a simple trig function involve anything as sophisticated as creating homsets?
</p>
</blockquote>
<p>
And in particular: Why is <code>Set of Homomorphisms from Integer Ring to Real Interval Field with 64 bits of precision</code> created a couple of thousand times, even with a <em>strong</em> cache?
</p>
<blockquote class="citation">
<p>
And why are any gcds being computed at all? Is it that for each of the values t being iterated over, which are rational multiples of pi, the evaluation of sin(7*t) and cos(30*t) is much too clever when all that is needed is a low-precision numerical value?
</p>
</blockquote>
<p>
I don't know. But in any case, there is a regression in the time for creating a homset. I have opened <a class="closed ticket" href="https://trac.sagemath.org/ticket/13922" title="defect: Avoid a regression in the creation of homsets (closed: fixed)">#13922</a> for this problem.
</p>
TicketSimonKingMon, 07 Jan 2013 13:22:51 GMT
https://trac.sagemath.org/ticket/715#comment:368
<p>
Replying to <a class="ticket" href="https://trac.sagemath.org/ticket/715#comment:367" title="Comment 367">SimonKing</a>:
</p>
<blockquote class="citation">
<p>
And in particular: Why is <code>Set of Homomorphisms from Integer Ring to Real Interval Field with 64 bits of precision</code> created a couple of thousand times, even with a <em>strong</em> cache?
</p>
</blockquote>
<p>
PS: The category for this homset is always the same, namely the category of Euclidean domains. It should definitely not happen that this homset is created more than once, even with a weak cache.
</p>
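<p>
One mechanism by which a weakly cached homset <em>can</em> be rebuilt is sketched below with hypothetical <code>Homset</code> and <code>Hom</code> stand-ins (not Sage's classes): the cache holds each homset only weakly, so as soon as the caller drops its last strong reference the entry vanishes and the next call constructs a new object. This does not explain the strong-cache case raised above; it only shows why a weak cache needs some strong reference to survive between calls.
</p>

```python
import weakref


class Homset:
    """Hypothetical stand-in for a homset parent; counts constructions."""
    created = 0

    def __init__(self, domain, codomain, category):
        Homset.created += 1
        self.domain = domain
        self.codomain = codomain
        self.category = category


# The values (homsets) are held weakly; the keys are held strongly here.
_cache = weakref.WeakValueDictionary()


def Hom(domain, codomain, category):
    """Return the cached homset for this key, building it if it is gone."""
    key = (domain, codomain, category)
    H = _cache.get(key)
    if H is None:
        H = Homset(domain, codomain, category)
        _cache[key] = H
    return H
```

<p>
Calling <code>Hom</code> twice while the first result is still referenced returns the same object; dropping every strong reference and calling again builds a new one. Note that this sketch holds domain and codomain strongly in the key, which is exactly what this ticket tries to avoid; a closer model would also key the cache on weak references to them.
</p>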
Ticket