Opened 9 years ago

Last modified 8 years ago

#11521 closed defect

Use weak references to cache homsets — at Version 126

Reported by: jpflori Owned by: robertwb
Priority: major Milestone: sage-5.5
Component: coercion Keywords: sd35
Cc: jpflori, nthiery Merged in:
Authors: Simon King Reviewers:
Report Upstream: N/A Work issues:
Branch: Commit:
Dependencies: #11900 #715 Stopgaps:

Description (last modified by jpflori)

Originally, this ticket was about the following memory leak when computing with elliptic curves:

sage: K = GF(1<<55,'t')
sage: a = K.random_element()
sage: while 1:
....:     E = EllipticCurve(j=a); P = E.random_point(); 2*P; 

This example is in fact solved by #715. However, while working on that ticket, another leak has been found, namely

sage: for p in prime_range(10^5):
....:     K = GF(p)
....:     a = K(0)
....:     
sage: import gc
sage: gc.collect()
0

So, I suggest to start with #715 and solve the second memory leak on top of it. It seems that a strong cache for homsets is to blame. I suggest to use the weak TripleDict instead, which were introduced in #715.

Apply

Change History (131)

comment:1 Changed 9 years ago by jpflori

After looking at #10548, I might have a better idea of the culprit:

sage: import gc
sage: from sage.schemes.elliptic_curves.ell_finite_field import EllipticCurve_finite_field
sage: K = GF(1<<60,'t')
sage: j = K.random_element()
sage: for i in xrange(100):
....:     E = EllipticCurve(j=j); P = E.random_point(); 2*P;
....:     
sage: gc.collect()
440
sage: len([x for x in gc.get_objects() if  isinstance(x,EllipticCurve_finite_field)])
100

comment:2 Changed 9 years ago by jpflori

  • Component changed from memleak to coercion
  • Owner changed from rlm to robertwb

So this could be just #715 .

comment:3 Changed 9 years ago by jpflori

This is definitely #715.

Resetting the coercion cache and calling gc.collect() deletes the cached elements.

I guess weak refs should be used in the different TripleDict? objects of !CoercionModel_cache_maps.

So this should be closed as duplicate/won't fix.

comment:4 Changed 9 years ago by jpflori

  • Status changed from new to needs_review

comment:5 Changed 9 years ago by zimmerma

Jean-Pierre, why did you change the status to "needs review"? There is no patch to review.

Also, how to you reset the coercion cache? I would be interested if you have a workaround for the memory leak in:

for p in prime_range(10^8):
   k = GF(p)

Paul

comment:6 follow-up: Changed 9 years ago by jpflori

As far as I'm concerned, this is nothing but a concrete example of the vague #715. So I guess I put it to "needs review" so that it could be closed as "duplicate/won't fix". Not sure it was the right way to do it.

I seem to remember that I had some workarounds to delete some parts of the cache (so that I could perform my computations), but not all of them. In fact there are several dictionnaries hidden in different places. It was quite a while ago, but I'll have another look at it at some point. Anyway, using weak references for all these dictionnaries seems to be a quite non trivial task. Moreover it should also not slow things down too much to be viable...

comment:7 Changed 9 years ago by zimmerma

for the computations I need to perform, which need to create about 200000 prime fields, this memory leak makes it impossible to perform it with Sage, which eats all the available memory.

I would be satisfied if I had a magic command to type to explicitly free the memory used by every k=GF(p).

Paul

comment:8 follow-up: Changed 9 years ago by jpflori

I'm having a look at your problem right now. Here are some hints to study the problem, mostly stolen from #10548.

I put it here for the record, and so that i can go faster next time.

First, if I only create the finite fields and do nothing with them, I do not seem to get a memleak. Some fields are not garbage collected immediately, but calling gc.collect() does the trick.

sage: import gc
sage: for p in prime_range(10**5):
....:    k = GF(p)
....:
sage: del p, k
sage: gc.collect()
1265
sage: from sage.rings.finite_rings.finite_field_prime_modn import FiniteField_prime_modn as FF
sage: L = [x for x in gc.get_objects() if isinstance(x, FF)]
sage: len(L)
1
sage: L
[Finite Field of size 2]

Of course I guess you actually do something with your finite fields.

So here is a small example causing the fields to stay cached.

sage: import gc
sage: for p in prime_range(10**5):
....:    k = GF(p)
....:
sage: del p, k
sage: gc.collect()
0

The zero here is bad and indeed

sage: from sage.rings.finite_rings.finite_field_prime_modn import FiniteField_prime_modn as FF
sage: L = [x for x in gc.get_objects() if isinstance(x, FF)]
sage: len(L)
9592 

If you want to know where it comes from you can use the objgraph python module (on my debian I just installed the python-objgraph package, updated sys.path in Sage to include '/usr/lib/python2.6/dist-packages')

sage: sys.path.append('/usr/lib/python2.6/dist-packages')
sage: import inspect, objgraph
sage: objgraph.show_backrefs(L[-1],filename='/home/jp/br.png',extra_ignore=[id(L)])

And look at the png or use

sage: brc = objgraph.find_backref_chain(L[-1],inspect.ismodule,max_depth=15,extra_ignore=[id(L)])
sage: map(type,brc)
[<type 'module'>, <type 'dict'>, <type 'dict'>,...
sage: brc[0]
<module 'sage.categories.homset'...

So the culprit is "_cache" in sage.categories.homset which has a direct reference to the finite fields in its keys.

The clean solution woul be to used weakref in the keys (let's say something as WeakKeyDirectory? in python).

However, resetting cache should be a (potentially partial) workaround (and could kill your Sage?).

sage: sage.categories.homset.cache = {}
sage: gc.collect()
376595

It also seems to be enough if I do "a = k.random_element(); a = a+a" in the for loop, but not if I add "a = 47*a+3".

I'm currently investigating that last case.

comment:9 Changed 9 years ago by jpflori

For info, using "k(47)*a+k(3)" is solved,  so the problem left really comes from coercion and action of integers.

sage: cm = sage.structure. get_coercion_model()
sage: cm.reset_cache()

does not help.

comment:10 Changed 9 years ago by jpflori

First, the second example above is missing the line "k(1);" in the for loop, otherwise it does nothing more than the first example.

Second, I guess the remaining references to the finite fields are in the different lists and dictionnaries of the integer ring named _coerce_from_list, _convert_from_list etc.

You can not directly access them from Python level, but there a function _introspect_coerce() (defined in parent.pyx) which returns them.

comment:11 Changed 9 years ago by jpflori

In fact, everything is in _*_hash.

And to conclude, I'd say that right now you can not directly delete entries in this dictionaries from the Python level.

So for minimum changes, you should eitheir avoid coercion, or make the dictionaries "cdef public" in parent.pxd to be able to explicitely delete every created entries (be aware that it could be written in different places for example in ZZ but also in QQ and CC and I don't know where...).

And I could also have missed other dictionaries used by Sage.

comment:12 Changed 9 years ago by zimmerma

Jean-Pierre, I cannot reproduce that:

sage: sys.path.append('/usr/lib/python2.6/dist-packages')
sage: import inspect, objgraph
---------------------------------------------------------------------------
ImportError                               Traceback (most recent call last)

/users/caramel/zimmerma/Adm/Strass/SED/2011/<ipython console> in <module>()

ImportError: No module named objgraph

Paul

comment:13 Changed 9 years ago by jpflori

Did you "apt-get install python-objgraph" on your system?

If yes, is it a version for python 2.6 ?

comment:14 Changed 9 years ago by jpflori

The path I gave above might also be different on your system...

As the package manager.

comment:15 follow-up: Changed 9 years ago by jpflori

Any progress on your side?

If you found any other dicitonaries leading to cahing problems, it would be great to mention them here for the record.

Hence the day someone will finally decide to tackle ticket #715, it will speed up the search of the culprit.

comment:16 in reply to: ↑ 15 Changed 9 years ago by zimmerma

Replying to jpflori:

Any progress on your side?

no time so far. I will look at this during the SageFlint? days in December, unless someone else has some time before.

Paul

comment:17 in reply to: ↑ 6 Changed 9 years ago by johanbosman

Replying to jpflori:

As far as I'm concerned, this is nothing but a concrete example of the vague #715. So I guess I put it to "needs review" so that it could be closed as "duplicate/won't fix". Not sure it was the right way to do it.

If you think it should be closed, then I think you should set the milestone to duplicate/wontfix. Otherwise, it is probably better to change the status back to 'new'.

comment:18 Changed 9 years ago by zimmerma

with Sage 4.7.2 I get the following:

sage: for p in prime_range(10^5):
....:     K = GF(p)
....:     a = K(0)
....:     
sage: import gc
sage: gc.collect()
0
sage: from sage.rings.finite_rings.finite_field_prime_modn import \
....: FiniteField_prime_modn as FF
sage: L = [x for x in gc.get_objects() if isinstance(x, FF)]
sage: len(L), L[0], L[len(L)-1]
(9592, Finite Field of size 2, Finite Field of size 99767)

thus whenever we use the finite field we get a memleak. (If I remove the a=K(0) line, I get only two elements in L, for p=2 and p=99991.)

Paul

comment:19 Changed 9 years ago by zimmerma

I confirm (cf comment 8) that if I comment out the line

    _cache[(X, Y, category)] = weakref.ref(H)

in categories/homset.py, then the memory leak from comment 18 disappears.

Paul

comment:20 Changed 9 years ago by SimonKing

I think we have a different problem here.

The finite fields themselves should be cached. The cache (see GF._cache) uses weak references, which should be fine.

Also, there are special methods zero_element and one_element which should do caching, because zero and one are frequently used and should not be created over and over again.

However, it seems that all elements of a finite field are cached - and that's bad!

sage: K = GF(5)
sage: K(2) is K(2)
True
sage: K.<a> = GF(17^2)
sage: K(5) is K(5)
True

I see absolutely no reason why all 17^2 elements should be cached.

Fortunately, we have no caching for larger finite fields:

sage: K.<a> = GF(17^10)
sage: K(5) is K(5)
False

comment:21 Changed 9 years ago by SimonKing

  • Status changed from needs_review to needs_work

In the following example, there is no memory leak that would be caused by the sage.categories.homset cache:

sage: len(sage.categories.homset._cache.keys())
100
sage: for p in prime_range(10^5):
....:     K = GF(p)
....:     
sage: len(sage.categories.homset._cache.keys())
100

However, when you do a conversion K(...) then a convert map is created, and apparently is cached:

sage: for p in prime_range(10^5):
....:     K = GF(p)
....:     a = K(0)
....:
sage: len(sage.categories.homset._cache.keys())
9692

The homset cache does use weak references. Hence, it is surprising that the unused stuff can not be garbage collected. Apparently there is some direct reference involved at some point.

I am very stronglyagainst removing the cache of sage.categories.homset. Namely, elliptic curve code uses finite fields and maps involving finite fields a lot, and killing the cache is likely to cause a massive regression.

However, since we actually have coercion maps (not just any odd map), I expect that the direct reference comes from the coercion model. I suggest to look into the coercion code and use weak references there.

By the way, I don't know why the status is "needs review". I think it clearly is "needs work".

comment:22 in reply to: ↑ 8 Changed 9 years ago by SimonKing

Replying to jpflori:

So the culprit is "_cache" in sage.categories.homset which has a direct reference to the finite fields in its keys.

Oops, that could indeed be the problem! Namely, the homset cache uses weak references to its values, but uses direct references to its keys! Perhaps using weak references as keys would work?

comment:23 Changed 9 years ago by zimmerma

it seems the complete caching of field elements only occurs for p < 500:

sage: K=GF(499)
sage: K(5) is K(5)
True
sage: K=GF(503)
sage: K(5) is K(5)
False

A workaround to this memory leak is to free the cache from time to time (thanks Simon):

sage: sage.categories.homset._cache.clear()

Paul

comment:24 Changed 9 years ago by SimonKing

On the other hand, it could be that using weak keys in the homset cache will not work: The keys are triples: domain, codomain and category.

What we want is: If either the domain or the codomain or the category have no strong reference, then the homset can be garbage collected.

Hence: Why don't we use a dictionary of dictionaries of dictionaries?

What I mean is:

  • The keys of sage.categories.homset._cache should be weak references to the domain
  • The values of sage.categories.homset._cache should be dictionaries whose keys are weak references to the codomain.
  • The keys of these dictionaries should be weak references to the category, and the value a weak reference to the homset.

Hence, if there is no strong reference to either domain or codomain or category, then the homset can be deallocated.

comment:25 Changed 9 years ago by SimonKing

The idea sketched in the previous comment seems to work!!!

With it, after running

sage: for p in prime_range(10^5):
....:     K = GF(p)
....:     a = K(0)
....:     print get_memory_usage()

ends with printing 825.45703125

Without it, it ends with printing 902.8125

I don't know if we should ban caching of field elements?

How could fixing that memory leak be demonstrated by a doc test?

Anyway, I will post a preliminary patch in a few minutes (so that you can see if it fixes at least part of the problem for you), while I am running sage -tp 2 devel/sage.

comment:26 Changed 9 years ago by SimonKing

Patch's up!

comment:27 Changed 9 years ago by SimonKing

Hm. There seems to be a problem.

sage -t  devel/sage/doc/en/constructions/linear_codes.rst
The doctested process was killed by signal 11

What is signal 11?

comment:28 Changed 9 years ago by zimmerma

signal 11 is Segmentation Fault

Paul

comment:29 Changed 9 years ago by SimonKing

Indeed it is a segfault. And it is triggered by

sage: w = vector([1,1,-4])

comment:30 Changed 9 years ago by SimonKing

Monique van Beek just pointed me to the fact that the error occurs even earlier:

sage: is_Ring(None)
<BOOOOOM>

comment:31 Changed 9 years ago by SimonKing

Moreover,

sage: None in Rings()
<BOOOOOOM>

comment:32 Changed 9 years ago by SimonKing

That said: I am not working on top of vanilla sage, but I use some patches. In particular, I use #11900, which introduces so called Category_singleton, which has a specialised and very fast containment test.

I would not like to change #11900, but prefer to change my patch from here so that it works on top of #11900.

comment:33 Changed 9 years ago by SimonKing

It turns out that indeed the bug is in #11900. So, I have to fix it there.

comment:34 follow-up: Changed 9 years ago by SimonKing

Cc to Nicolas, since it concerns categories:

Do we want that Hom(1,1) is still supported?

I think it does not make sense at all to talk about the homomorphisms of the number 1 to the number 1. The problem (for my patch as it is posted here) is the fact that one can't create a weak reference to the number 1.

comment:35 Changed 9 years ago by SimonKing

  • Cc nthiery added

Sorry, I forgot to update the Cc field.

Nicolas, please read my previous comment.

comment:36 Changed 9 years ago by SimonKing

With my patch, applied on top of #11900, I get

        sage -t  devel/sage-main/sage/structure/parent.pyx # 2 doctests failed
        sage -t  devel/sage-main/sage/structure/category_object.pyx # 2 doctests failed
        sage -t  devel/sage-main/sage/rings/polynomial/polynomial_singular_interface.py # 1 doctests failed
        sage -t  devel/sage-main/sage/rings/polynomial/multi_polynomial_ring.py # 36 doctests failed
        sage -t  devel/sage-main/sage/structure/parent_base.pyx # 2 doctests failed

At least some of the errors are like

    sage: n = 5; Hom(n,7)
Exception raised:
    Traceback (most recent call last):
      File "/home/simon/SAGE/sage-4.8.alpha3/local/bin/ncadoctest.py", line 1231, in run_one_test
        self.run_one_example(test, example, filename, compileflags)
      File "/home/simon/SAGE/sage-4.8.alpha3/local/bin/sagedoctest.py", line 38, in run_one_example
        OrigDocTestRunner.run_one_example(self, test, example, filename, compileflags)
      File "/home/simon/SAGE/sage-4.8.alpha3/local/bin/ncadoctest.py", line 1172, in run_one_example
        compileflags, 1) in test.globs
      File "<doctest __main__.example_3[4]>", line 1, in <module>
        n = Integer(5); Hom(n,Integer(7))###line 108:
    sage: n = 5; Hom(n,7)
      File "/home/simon/SAGE/sage-4.8.alpha3/local/lib/python/site-packages/sage/categories/homset.py", line 159, in Hom
        cache2 = _cache[X]
      File "/home/simon/SAGE/sage-4.8.alpha3/local/lib/python2.6/weakref.py", line 243, in __getitem__
        return self.data[ref(key)]
    TypeError: cannot create weak reference to 'sage.rings.integer.Integer' object

and I really don't see why this should be considered a bug.

comment:37 Changed 9 years ago by SimonKing

  • Status changed from needs_work to needs_info

At least one of the errors in polynomial rings is due to a wrong order of initialising things: There is some information missing, and by consequence a weak reference can't be created.

I fixed this problem in the new patch version.

I put it as "needs info", because of my question on whether or not we want to consider an integer as object in a category.

comment:38 Changed 9 years ago by SimonKing

I was told by Mike Hansen why weak references to integers and rationals do not work.

I see three options:

#. Drop the support for Hom(1,1) (which I'd prefer) #. Add a cdef'd attribute __weakref__ to sage.structure.element.Element, which would create an overhead for garbage collection for elements, and also a memory overhead. #. Use two category.homset caches in parallel: One (the default) that uses weak references, and another one that uses "normal" references if weak references fail.

comment:39 Changed 9 years ago by SimonKing

FWIW:

With the latest patch, the tests in polynomial_singular_interface and in multi_polynomial_ring pass.

There remain the following problems:

sage -t  "devel/sage-main/sage/structure/parent.pyx"        
**********************************************************************
File "/home/simon/SAGE/sage-4.8.alpha3/devel/sage-main/sage/structure/parent.pyx", line 1410:
    sage: n = 5; Hom(n,7)
Exception raised:
    Traceback (most recent call last):
      File "/home/simon/SAGE/sage-4.8.alpha3/local/bin/ncadoctest.py", line 1231, in run_one_test
        self.run_one_example(test, example, filename, compileflags)
      File "/home/simon/SAGE/sage-4.8.alpha3/local/bin/sagedoctest.py", line 38, in run_one_example
        OrigDocTestRunner.run_one_example(self, test, example, filename, compileflags)
      File "/home/simon/SAGE/sage-4.8.alpha3/local/bin/ncadoctest.py", line 1172, in run_one_example
        compileflags, 1) in test.globs
      File "<doctest __main__.example_33[4]>", line 1, in <module>
        n = Integer(5); Hom(n,Integer(7))###line 1410:
    sage: n = 5; Hom(n,7)
      File "/home/simon/SAGE/sage-4.8.alpha3/local/lib/python/site-packages/sage/categories/homset.py", line 159, in Hom
        cache2 = _cache[X]
      File "/home/simon/SAGE/sage-4.8.alpha3/local/lib/python2.6/weakref.py", line 243, in __getitem__
        return self.data[ref(key)]
    TypeError: cannot create weak reference to 'sage.rings.integer.Integer' object
**********************************************************************
File "/home/simon/SAGE/sage-4.8.alpha3/devel/sage-main/sage/structure/parent.pyx", line 1412:
    sage: z=(2/3); Hom(z,8/1)                                                                            
Exception raised:                                                                                        
    Traceback (most recent call last):                                                                   
      File "/home/simon/SAGE/sage-4.8.alpha3/local/bin/ncadoctest.py", line 1231, in run_one_test        
        self.run_one_example(test, example, filename, compileflags)                                      
      File "/home/simon/SAGE/sage-4.8.alpha3/local/bin/sagedoctest.py", line 38, in run_one_example      
        OrigDocTestRunner.run_one_example(self, test, example, filename, compileflags)
      File "/home/simon/SAGE/sage-4.8.alpha3/local/bin/ncadoctest.py", line 1172, in run_one_example
        compileflags, 1) in test.globs
      File "<doctest __main__.example_33[5]>", line 1, in <module>
        z=(Integer(2)/Integer(3)); Hom(z,Integer(8)/Integer(1))###line 1412:
    sage: z=(2/3); Hom(z,8/1)
      File "/home/simon/SAGE/sage-4.8.alpha3/local/lib/python/site-packages/sage/categories/homset.py", line 159, in Hom
        cache2 = _cache[X]
      File "/home/simon/SAGE/sage-4.8.alpha3/local/lib/python2.6/weakref.py", line 243, in __getitem__
        return self.data[ref(key)]
    TypeError: cannot create weak reference to 'sage.rings.rational.Rational' object
**********************************************************************
1 items had failures:
   2 of   8 in __main__.example_33
***Test Failed*** 2 failures.
For whitespace errors, see the file /home/simon/.sage//tmp/parent_2986.py
         [11.6 s]

and

sage -t  "devel/sage-main/sage/structure/category_object.pyx"
**********************************************************************
File "/home/simon/SAGE/sage-4.8.alpha3/devel/sage-main/sage/structure/category_object.pyx", line 590:
    sage: n = 5; Hom(n,7)
Exception raised:
    Traceback (most recent call last):
      File "/home/simon/SAGE/sage-4.8.alpha3/local/bin/ncadoctest.py", line 1231, in run_one_test
        self.run_one_example(test, example, filename, compileflags)
      File "/home/simon/SAGE/sage-4.8.alpha3/local/bin/sagedoctest.py", line 38, in run_one_example
        OrigDocTestRunner.run_one_example(self, test, example, filename, compileflags)
      File "/home/simon/SAGE/sage-4.8.alpha3/local/bin/ncadoctest.py", line 1172, in run_one_example
        compileflags, 1) in test.globs
      File "<doctest __main__.example_17[4]>", line 1, in <module>
        n = Integer(5); Hom(n,Integer(7))###line 590:
    sage: n = 5; Hom(n,7)
      File "/home/simon/SAGE/sage-4.8.alpha3/local/lib/python/site-packages/sage/categories/homset.py", line 159, in Hom
        cache2 = _cache[X]
      File "/home/simon/SAGE/sage-4.8.alpha3/local/lib/python2.6/weakref.py", line 243, in __getitem__
        return self.data[ref(key)]
    TypeError: cannot create weak reference to 'sage.rings.integer.Integer' object
**********************************************************************
File "/home/simon/SAGE/sage-4.8.alpha3/devel/sage-main/sage/structure/category_object.pyx", line 592:
    sage: z=(2/3); Hom(z,8/1)
Exception raised:
    Traceback (most recent call last):
      File "/home/simon/SAGE/sage-4.8.alpha3/local/bin/ncadoctest.py", line 1231, in run_one_test
        self.run_one_example(test, example, filename, compileflags)
      File "/home/simon/SAGE/sage-4.8.alpha3/local/bin/sagedoctest.py", line 38, in run_one_example
        OrigDocTestRunner.run_one_example(self, test, example, filename, compileflags)
      File "/home/simon/SAGE/sage-4.8.alpha3/local/bin/ncadoctest.py", line 1172, in run_one_example
        compileflags, 1) in test.globs
      File "<doctest __main__.example_17[5]>", line 1, in <module>
        z=(Integer(2)/Integer(3)); Hom(z,Integer(8)/Integer(1))###line 592:
    sage: z=(2/3); Hom(z,8/1)
      File "/home/simon/SAGE/sage-4.8.alpha3/local/lib/python/site-packages/sage/categories/homset.py", line 159, in Hom
        cache2 = _cache[X]
      File "/home/simon/SAGE/sage-4.8.alpha3/local/lib/python2.6/weakref.py", line 243, in __getitem__
        return self.data[ref(key)]
    TypeError: cannot create weak reference to 'sage.rings.rational.Rational' object
**********************************************************************
1 items had failures:
   2 of   8 in __main__.example_17
***Test Failed*** 2 failures.
For whitespace errors, see the file /home/simon/.sage//tmp/category_object_3050.py
         [2.7 s]

and

sage -t  "devel/sage-main/sage/structure/parent_base.pyx"   
**********************************************************************
File "/home/simon/SAGE/sage-4.8.alpha3/devel/sage-main/sage/structure/parent_base.pyx", line 108:
    sage: n = 5; Hom(n,7)
Exception raised:
    Traceback (most recent call last):
      File "/home/simon/SAGE/sage-4.8.alpha3/local/bin/ncadoctest.py", line 1231, in run_one_test
        self.run_one_example(test, example, filename, compileflags)
      File "/home/simon/SAGE/sage-4.8.alpha3/local/bin/sagedoctest.py", line 38, in run_one_example
        OrigDocTestRunner.run_one_example(self, test, example, filename, compileflags)
      File "/home/simon/SAGE/sage-4.8.alpha3/local/bin/ncadoctest.py", line 1172, in run_one_example
        compileflags, 1) in test.globs
      File "<doctest __main__.example_3[4]>", line 1, in <module>
        n = Integer(5); Hom(n,Integer(7))###line 108:
    sage: n = 5; Hom(n,7)
      File "/home/simon/SAGE/sage-4.8.alpha3/local/lib/python/site-packages/sage/categories/homset.py", line 159, in Hom
        cache2 = _cache[X]
      File "/home/simon/SAGE/sage-4.8.alpha3/local/lib/python2.6/weakref.py", line 243, in __getitem__
        return self.data[ref(key)]
    TypeError: cannot create weak reference to 'sage.rings.integer.Integer' object
**********************************************************************
File "/home/simon/SAGE/sage-4.8.alpha3/devel/sage-main/sage/structure/parent_base.pyx", line 110:
    sage: z=(2/3); Hom(z,8/1)
Exception raised:
    Traceback (most recent call last):
      File "/home/simon/SAGE/sage-4.8.alpha3/local/bin/ncadoctest.py", line 1231, in run_one_test
        self.run_one_example(test, example, filename, compileflags)
      File "/home/simon/SAGE/sage-4.8.alpha3/local/bin/sagedoctest.py", line 38, in run_one_example
        OrigDocTestRunner.run_one_example(self, test, example, filename, compileflags)
      File "/home/simon/SAGE/sage-4.8.alpha3/local/bin/ncadoctest.py", line 1172, in run_one_example
        compileflags, 1) in test.globs
      File "<doctest __main__.example_3[5]>", line 1, in <module>
        z=(Integer(2)/Integer(3)); Hom(z,Integer(8)/Integer(1))###line 110:
    sage: z=(2/3); Hom(z,8/1)
      File "/home/simon/SAGE/sage-4.8.alpha3/local/lib/python/site-packages/sage/categories/homset.py", line 159, in Hom
        cache2 = _cache[X]
      File "/home/simon/SAGE/sage-4.8.alpha3/local/lib/python2.6/weakref.py", line 243, in __getitem__
        return self.data[ref(key)]
    TypeError: cannot create weak reference to 'sage.rings.rational.Rational' object
**********************************************************************
1 items had failures:
   2 of   8 in __main__.example_3
***Test Failed*** 2 failures.
For whitespace errors, see the file /home/simon/.sage//tmp/parent_base_3078.py
         [2.6 s]

So, essentially this is just a single test that comes in two versions and is repeated three times - and I would actually say that not raising an error was a bug.

It seems that Hom(1/2,2/3) and similar nonsense is not used in Sage. Hence, I think these tests should be removed. I'll ask sage-devel.

comment:40 Changed 9 years ago by SimonKing

Without the patch:

sage: def test():
....:     for p in prime_range(10^5):
....:         K = GF(p)
....:         a = K(0)
....:         
sage: m0 = get_memory_usage()
sage: %time test()
CPU times: user 7.75 s, sys: 0.08 s, total: 7.83 s
Wall time: 7.84 s
sage: get_memory_usage() - m0
80.234375

With the patch:

sage: def test():
....:     for p in prime_range(10^5):
....:         K = GF(p)
....:         a = K(0)
....:         
sage: m0 = get_memory_usage()
sage: %time test()
CPU times: user 7.59 s, sys: 0.01 s, total: 7.60 s
Wall time: 7.61 s
sage: get_memory_usage() - m0
8.53515625

So, the memory does mildly increase, but it seems that most of the leak is fixed.

I think that a test of the kind

sage: get_memory_usage() - -m0 < 10
True

might be used as a doc test.

comment:41 in reply to: ↑ 34 Changed 9 years ago by nthiery

Replying to SimonKing:

Cc to Nicolas, since it concerns categories:

Do we want that Hom(1,1) is still supported?

I think it does not make sense at all to talk about the homomorphisms of the number 1 to the number 1. The problem (for my patch as it is posted here) is the fact that one can't create a weak reference to the number 1.

I don't see much point either. We had a similar discussion a while ago about whether elements should be objects in a category, and as far as I remember, the answer was no by default (Element does not inherit from CategoryObject?). So +1 on my side to kill this dubious feature. You might want to double check on sage-algebra just to make sure.

comment:42 follow-up: Changed 9 years ago by zimmerma

Simon, you can also use the test suggested by Jean-Pierre Flori (see comment 18 for an example).

Paul

comment:43 in reply to: ↑ 42 Changed 9 years ago by SimonKing

Hi Paul,

Replying to zimmerma:

Simon, you can also use the test suggested by Jean-Pierre Flori (see comment 18 for an example).

Yes, that looks good. With my patch, the test would be like

sage: for p in prime_range(10^5):
....:     K = GF(p)
....:     a = K(0)
....:     
sage: import gc
sage: gc.collect()
1881                                                                                                     
sage: from sage.rings.finite_rings.finite_field_prime_modn import FiniteField_prime_modn as FF           
sage: L = [x for x in gc.get_objects() if isinstance(x, FF)]
sage: len(L), L[0], L[len(L)-1]
(2, Finite Field of size 2, Finite Field of size 99991)

The people at sage-devel somehow seem to agree that objects of a category should be instances of CategoryObject (which elements aren't!), and that we should thus drop the Hom(2/3,8/1) test.

In addition to that, I suggest to provide a better error message, something like

sage: Hom(2/3, 8/1)
Traceback (most recent call last):
...
TypeError: Objects of categories must be instances of <type 'sage.structure.category_object.CategoryObject'>, but 2/3 isn't.

Cheers,

Simon

comment:44 Changed 9 years ago by SimonKing

  • Keywords sd35 added

One bad detail: I'd like to add the test to the documentation of sage.categories.homset. However, if I insert it in the appropriate place, there will be a conflict with both #9138 and #11900.

I could try to insert the test in a less logical place, in order to avoid to have a dependency.

comment:45 Changed 9 years ago by SimonKing

  • Dependencies set to #11900
  • Status changed from needs_info to needs_review

I am very much afraid that I have not been able to make my patch independent of #11900. This is not just because of the documentation, but also because of some details in the choice of the homset's category.

Anyway, it needs review (and so does the most recent version of #11900, by the way)!

comment:46 Changed 9 years ago by SimonKing

  • Status changed from needs_review to needs_work

I did try the new doctests in sage/categories/homset.py. However, with other patches applied, the number returned by gc.collect() changes.

So, for stability, I suggest to simplify the test, so that only the number of finite fields remaining in the cache is tested.

comment:47 Changed 9 years ago by SimonKing

  • Authors set to Simon King
  • Status changed from needs_work to needs_review

I updated the patch.

Difference to the previous patch: The number of objects collect by gc is marked as random (indeed, it will change with #11115 applied). What we are really interested in is the number of finite fields that remains in the cache after garbage collection. This number is two and is not random. Thus, that test is preserved.

comment:48 Changed 9 years ago by SimonKing

  • Status changed from needs_review to needs_work

I think I need help with debugging.

When I have sage-4.8.alpha3 with #9138, #11900, #715 and #11115, then all doctests pass.

When I also have #11521, then the test sage/rings/number_field/number_field_rel.py segfaults. When I run the tests in verbose mode, then all tests seem to pass, but in the very end it says

4 items had no tests:
    __main__
    __main__.change_warning_output
    __main__.check_with_tolerance
    __main__.warning_function
69 items passed all tests:
...
660 tests in 73 items.
660 passed and 0 failed.
Test passed.
The doctested process was killed by signal 11
         [15.0 s]

So, could it be that not one of the tests was killed, but the test process itself?

What is even more confusing: When I run the tests with the option -randorder, then most of the time the tests pass without a problem.

Can you give me any pointer on how those things could possibly be debugged?

comment:49 Changed 9 years ago by fbissey

It sometimes happen that the sage session itself crash on exit. This is probably one of these. Last time I got one it was related to singular I think. It is quite difficult to corner these with gdb. The best you can hope is start a sage session with gdb and then try the last doctest sequence and quit sage, it may lead to the crash in which case you may have some luck with gdb. But this is one of these case where gdb itself may be interfering. I don't think I have time to look into this right now but I'll put it into my "To do" list in case it isn't solved when i have time to spare.

Changed 9 years ago by SimonKing

Use weak references for the keys of the homset cache. If weak references are not supported, then raise an error, pointing out that category objects should be CategoryObjects.

comment:50 Changed 9 years ago by SimonKing

  • Work issues set to Understand why a weak key dictionary is not enough

I have attached a new patch version. It fixes the segfault I mentioned. However, it also does not fix the memory leak.

The difference between the two versions is: The new patch still uses weak references to the key of the cache, but a strong reference to the value (i.e., the homset).

The homset has a reference to domain and codomain, which constitute the cache key. Thus, I expected that it does not make any difference whether one has a strong or a weak reference to the homset. But I stand corrected. That needs to be investigated more deeply.

comment:51 follow-up: Changed 9 years ago by jpflori

Dear Simon,

Thanks a lot for taking care of all of this !

I'm just back from vacation and will have a look at all your patches in the following days.

I must point out that even if the memory leak was small, it did still mater because I used a LOT of them and after several hours of computations it ate all the available memory the piece of code in the ticket description is just a minimal example, in my actual code I used different curves and similar simple computations on them)...

And to make things clear, I must say I put that ticket as need review in order to get it closed as wont fix/duplicate because I thought it could be seen as a concrete example of ticket 715 and all the work could be done there.

Of course youre the one currently doing all the wok, so do as you want :)

Cheers,

JP

comment:52 in reply to: ↑ 51 Changed 9 years ago by SimonKing

Hi Jean-Pierre,

Replying to jpflori:

I must point out that even if the memory leak was small,

It isn't small.

And to make things clear, I must say I put that ticket as need review in order to get it closed as wont fix/duplicate because I thought it could be seen as a concrete example of ticket 715 and all the work could be done there.

I am not sure whether it would be good to do everything on one ticket, as the topics are related, but clearly disting: #715 is about weak "TripleDict" for coercion, #12215 is about a weak version of cached_function, and the ticket here is about the cache of homsets.

On the other hand: I am about to post a new patch here, with #715 as a dependency. It will use the new version of TripleDict from #715. So, one could argue that there is a common tool for both tickets, and they belong together.

Anyway. The new patch will fix the leak, but it will not suffer from the segfaults.

Cheers,

Simon

comment:53 Changed 9 years ago by SimonKing

  • Dependencies changed from #11900 to #11900 #715
  • Description modified (diff)
  • Status changed from needs_work to needs_review
  • Work issues Understand why a weak key dictionary is not enough deleted

I have attached another patch under a new name, using a new approach: The weak TripleDict, that I introduce at #715, is an appropriate tool for the cache of homsets. The key is the triple (domain, codomain, category), and the value is a weak reference to the corresponding homset.

There is a new test (the same as in the other patch), showing that the leak is fixed. And all tests in sage/schemes, sage/rings, sage/categories and sage/structure pass.

Hence: Needs review!

Apply trac11521_triple_homset.patch

comment:54 Changed 9 years ago by SimonKing

In fact all tests pass for me, with #9138, #11900, #11115, #715 and trac11521_triple_homset.patch applied on top of sage-4.8.alpha3.

comment:55 Changed 9 years ago by jpflori

Applied all patches mentionned in the previous comment without problems on sage-4.8.alpha5 and "make ptestlong" passed all tests. I'll try to check that the memory leaks actually disappeared :) today and have a look at your patches to give them positive reviews as well.

comment:56 Changed 9 years ago by SimonKing

It depends on what you call "the" leak.

At least, the patch fixes one leak, namely the one that is doctested against. I am not sure whether it is enough to fix the leak exposed in the ticket description:

sage: K = GF(1<<55,'t')
sage: a = K.random_element()
sage: get_memory_usage()
872.26171875
sage: while 1:
....:     E = EllipticCurve(j=a)
....:     P = E.random_point()
....:     Q = 2*P;
....:     print get_memory_usage()

The memory usage does climb, but it climbs a lot slower than, for example, in sage-4.6.2.

comment:57 Changed 9 years ago by SimonKing

PS: I just tested that even #12215 (which is a lot more aggressive in using weak references and fixes a gaping leak) is not enough to totally fix the example from the ticket description. Hence, I think that, for now, we should be happy with a partial fix and investigate the remaining problems on different tickets.

comment:58 Changed 9 years ago by SimonKing

After running the example for a couple of minutes, interrupting and doing garbage collection, I find:

sage: from sage.schemes.elliptic_curves.ell_finite_field import EllipticCurve_finite_field
sage: LE = [x for x in gc.get_objects() if  isinstance(x,EllipticCurve_finite_field)]
sage: len(LE)
632
sage: import objgraph
sage: from sage.rings.finite_rings.finite_field_prime_modn import FiniteField_prime_modn as FF
sage: LF = [x for x in gc.get_objects() if isinstance(x, FF)]
sage: len(LF)
1

LF shows that one leak is fixed. However, the curves in LE, which are all defined over the same finite field, can not be collected.

comment:59 Changed 9 years ago by SimonKing

Using objgraph, I found that the remaining leak seem to be related with sage.schemes.generic.homset.SchemeHomsetModule_abelian_variety_coordinates_field_with_category. Since this is another homset, it would make sense to try and fix it here.

comment:60 Changed 9 years ago by SimonKing

Aha!!

I found that we have many equal instances of these scheme homsets:

sage: LE = [x for x in gc.get_objects() if isinstance(x,SchemeHomsetModule_abelian_variety_coordinates_field)]
sage: LE[100] == LE[150]
True

So, I guess we should fix the memory leak by using UniqueRepresentation or UniqueFactory!

comment:61 Changed 9 years ago by jpflori

My 2 cents: Isn't it somehow related to the fact that elliptic curves are not unique parents? In some of my original tests, I used different curves and depending on the cache where they were stored "equality" testing was made either with "is" or with "==". In the former case, the "same" elliptic curve would be stored several times making the "leak" even bigger.

comment:62 Changed 9 years ago by SimonKing

OK. Then why not reduce the non-uniqueness? EllipticCurve could easily be turned into a cached function, which would mean that two elliptic curves became identical if they are defined by equal data. That is not enough to be a unique parent (there could be equal elliptic curves defined by different data), but it could be a step forward. And with #12215, it could then actually be turned into a weak cached function.

comment:63 Changed 9 years ago by SimonKing

  • Status changed from needs_review to needs_work

Yes, using a cached function would indeed fix the leak that is cause by the useless creation of creating an elliptic curve with the same basic data over and over again. In particular, it would fix the leak from the ticket description (#12215 is not needed for that).

I am preparing a new patch (it requires do tests and stuff), so, it is "needs work" for now.

comment:64 Changed 9 years ago by SimonKing

It is definitely not easy. It seems that the elliptic curve code relies on the non-uniqueness of elliptic curves: I get loads of errors.

comment:65 Changed 9 years ago by SimonKing

Ah! Tuples of elements of different rings can be equal, but the corresponding elliptic curves would be different. So, the ring needs to be part of the description.

comment:66 Changed 9 years ago by SimonKing

I am undecided what we should do:

We could argue that my patch does fix some memory leak, and leave it like that (modulo comments of the reviewer, of course). In order to fix the memory leak exposed by the example from the ticket description, we have no chance but to have some kind of uniqueness for elliptic curves. But that is a different topic and should thus be dealt with on a different ticket (perhaps such ticket already exists?).

Or we could argue that this ticket is about fixing the memory leak that is exposed in the description. Hence, we should do all necessary steps here.

And then, there is still the question whether the number theorists really want the elliptic curves be "weakly unique" (i.e., identical if the given data are equal). In addition, we might want that the elliptic curve cache is weak - which might imply that we have to wait for #12215.

What do you think?

I guess I'll also ask on sage-nt.

comment:67 Changed 9 years ago by jpflori

I think we can go your way and stop working on this ticket now (or rather once I've taken the time to go through your patches to properly review them). Thus we can already some non negligible improvements merged.

Anyway the problem of uniqueness of elliptic curve is highly non trivial and deserves its own ticket. I guess #11474 would be a right place to treat that problem. An alternative would be to make this ticket depend on #11474.

comment:68 Changed 9 years ago by SimonKing

Hi Jean-Pierre,

indeed, you found the correct ticket.

And there is no need to ask on sage-nt, since I already did before opening #11474: Sage-nt first seemed to agree that uniqueness is good, so I opened the ticket. But then, they became convinced that much of the elliptic curve stuff depends on choices (generators), so that we should consider elliptic curves to be mutable objects, for which we wouldn't like to have uniqueness.

Considering the discussion on sage-nt, #11474 probably is "wontfix".

comment:69 Changed 9 years ago by SimonKing

I am not sure whether we should really stop work right there. After all, it is still not 100% clear to me why the elliptic curve E, that is created in the loop in an increasing number of copies, can not be garbage collected.

First, E is created, and some coercion into it is created. The coercion is cached. By #715, some key of the cache provides a weak reference to E. In addition, the coerce map refers to a homset, and the homset refers to its codomain, which is E. I wonder whether there is a chain of strong references from E to the homset (one could try to find out using objgraph, I guess). If there is, then we would have a reference cycle. If that was the case, then we needed to find a __del__ method that prevents the cycle from being garbage collected.

comment:70 Changed 9 years ago by jpflori

Objgraph only shows a reference left as _codomain in a dict in a SchemeHomsetModule_a_v_c_f (defined in sage.schemes.generic.homset).

comment:71 Changed 9 years ago by jpflori

The homset of points of the ab. group of the curve is itself reference by an IntegerMulAction?, the point at infinity on the curve (no idea when it gets created) and a dict with 11 elements. I guess the problem might be that in addition to the _cache in sage.categories.homset the homset of points is directly link within the Action object ? That could also be nonsense.

comment:72 Changed 9 years ago by SimonKing

The IntegerMulAction should be fine, as it is stored in a TripleDict (hence, weakly, by #715). Where does the dict occur?

comment:73 Changed 9 years ago by SimonKing

I am a bit puzzled.

I see that there are a couple of cycles, involving an elliptic curve, a SchemeHomsetModule..., an IntegerMulAction, and an EllipticPoint_finit_field. However, none of them has a __del__ method, thus, the garbage collection should be able to collect the cycles.

But the backref graph also shows something that I don't understand: Apparently there is a top level dict with three items that points to the elliptic curve. Where is it?

comment:74 follow-up: Changed 9 years ago by SimonKing

I got it!!

By #715, both the keys and the value of sage.structure.coerce_dict.TripleDict are by weak reference -- if possible. If the value does not allow for a weak reference, then (to simplify code) a constant function is used instead.

Actions are stored in such TripleDicts. However, an IntegerMulAction can not be weakly referenced. Hence, there is a strong reference, and no garbage collection can occur (the value points back to the keys, hence, they can't be collected either).

Solution: Make IntegerMulAction weakly referenceable! That should suffice.

comment:75 Changed 9 years ago by SimonKing

It does not suffice. But I think it is a step to the right direction.

comment:76 Changed 9 years ago by SimonKing

Apparently the weak reference to the value is not used in the TripleDict! So, I have to look at #715, perhaps I forgot to implement something there.

comment:77 Changed 9 years ago by SimonKing

Aha, I was mistaken: At #715 I only took care for weak references to the keys of TripleDict. The big question is whether we additionally want weak references to the values. I am somehow against doing this at #715.

comment:78 Changed 9 years ago by jpflori

My point was that it seemed to me that there was some (not weak!) reference in the dictionary ZZ._action_hash pointing to the IntegerMulAction? itself pointing to the curve and I thought it could prevent garbage collection.

So I added a cpdef function to be able to clear that dictionary and see if garbage collection occurs once it is emptied, however it is a the C level so kind of the whole Sage library had to be rebuilt and I had to go back home before rebuilding was finished...

Anyway, I'll have a look at it tomorrow :)

By the way, have you any idea if dictionaries declared at the C level such as ZZ._action_hash are detected by objgraph ?

comment:79 Changed 9 years ago by SimonKing

Hi Jean-Pierre,

The thing with the _action_hash is a good finding. I thought it would be a TripleDict, but it is just a usual dict. And this could indeed be a problem. I don't know if this is visible to objgraph.

But also I think that in addition we have the problem of strong references to the values of TripleDict. In principle, one could use weak references for not only the keys but also to the values -- perhaps this could be done in #715.

Or one could leave TripleDict as it is currently in #715, but explicitly use weak references to functors for coercion (which needs to be enabled first). _action_hash then has to use weak references as well.

There might be a speed problem, though.

comment:80 Changed 9 years ago by jpflori

For info, resetting ZZ._action_hash to {} is not sufficient to let the IntegerMulAction? be garbage collected.

Objgraph shows 3 objects pointing to the abelian group of points of the curve (itself pointing to the curve):

  • the IntegerMulAction?
  • the point at infinity (?)
  • a dict of size 11 which is in fact an attribute (_Schemering_point_homset) of the curve (as a scheme) itself.

I'd be happy to test a weakref for values version of your patch regardless of the spedd impact.

comment:81 follow-up: Changed 9 years ago by jpflori

I guess you're solution should be the right one because resetting the coercion cache solves the problem, so dealing with it should be enough for the example in the ticket description.

Only one copy gets cached in _action_hash because the curves are equal (==) but not identical (is). However, if one uses (really) different curves, one of each gets also cached in _action_hash

Here is  small piece of code to test the effect of clearing different caches (the code to clear the ZZ cache is left as an exercise, anyway we should use weakrefs there):

sage: import gc, inspect
sage: load /usr/share/shared/objgraph.py # or whatever you should type to use objgraph
sage: from sage.schemes.elliptic_curves.ell_finite_field import EllipticCurve_finite_field
sage: K = GF(1<<60, 't')
sage: j = K.random_element()
sage: for i in xrange(100):
....:     E = EllipticCurve(j=j); P = E.random_point(); 2*P; del P; del E;
....:
sage: gc.collect()
68
sage: L = [x for x in gc.get_objects() if isintance(x, EllipticCurve_finite_field)]
sage: len(L)
100
sage: del L
sage: get_coercion_model().reset_cache()
sage: gc.collect()
6172
sage: L = [x for x in gc.get_objects() if isintance(x, EllipticCurve_finite_field)]
sage: len(L)
1
sage: del L
sage: ZZ.del_hash()
sage: gc.collect()
56
sage: L = [x for x in gc.get_objects() if isintance(x, EllipticCurve_finite_field)]
sage: len(L)
0
sage: del L
sage: for i in xrange(100):
....:     E = EllipticCurve(j=K.random_element()); P = E.random_point(); 2*P; del P; del E;
....:
sage: gc.collect()
174
sage: L = [x for x in gc.get_objects() if isintance(x, EllipticCurve_finite_field)]
sage: len(L)
100
sage: del L
sage: get_coercion_model().reset_cache()
sage: gc.collect()
738
sage: L = [x for x in gc.get_objects() if isintance(x, EllipticCurve_finite_field)]
sage: len(L) # _action_hash
100
sage: del L
sage: ZZ.del_hash()
sage: gc.collect()
5742
sage: L = [x for x in gc.get_objects() if isintance(x, EllipticCurve_finite_field)]
sage: len(L) # mmm got one left!!! not sure where it comes from yet...
1

comment:82 in reply to: ↑ 81 ; follow-up: Changed 9 years ago by SimonKing

Hi Jean-Pierre,

Replying to jpflori:

I guess you're solution should be the right one because resetting the coercion cache solves the problem, so dealing with it should be enough for the example in the ticket description.

But it is not so easy because...

Only one copy gets cached in _action_hash because the curves are equal (==) but not identical (is).

... elliptic curves are not unique parents.

I think it would be a mistake to use a weak reference to the value of TripleDict. I tried and got many errors - and I think this was because some important data (homsets, actions, ...) was garbage collected even though there was still a strong reference to domain and codomain. In that situation, the value must not be garbage collected.

sage: len(L) # mmm got one left!!! not sure where it comes from yet...

Don't forget the last copy of E that was defined in the loop!

Since I think that a weak reference to the values of TripleDict won't work: What else could one do? Or perhaps I should try again: It could be that the errors came from the wrong choice of a callback function.

comment:83 in reply to: ↑ 82 Changed 9 years ago by jpflori

Replying to SimonKing:

Hi Jean-Pierre, Replying to jpflori:

I guess you're solution should be the right one because resetting the coercion cache solves the problem, so dealing with it should be enough for the example in the ticket description.

But it is not so easy because...

Only one copy gets cached in _action_hash because the curves are equal (==) but not identical (is).

... elliptic curves are not unique parents.

Yep :)

I think it would be a mistake to use a weak reference to the value of TripleDict. I tried and got many errors - and I think this was because some important data (homsets, actions, ...) was garbage collected even though there was still a strong reference to domain and codomain. In that situation, the value must not be garbage collected.

I see...

sage: len(L) # mmm got one left!!! not sure where it comes from yet...

Don't forget the last copy of E that was defined in the loop!

In the new code I posted I explicitely added "del P; del E;" to the loop so no copy should be left. What's even stranger is that this remaining copy appears after the second loop (with random j invariants so different curves) but not after the first one (with constant j invariant so only one curve)!

Since I think that a weak reference to the values of TripleDict won't work: What else could one do? Or perhaps I should try again: It could be that the errors came from the wrong choice of a callback function.

Mmm, have to think more about all of that...

comment:84 Changed 9 years ago by SimonKing

I think I should describe the problem in more detail.

Currently,

  • There is a strong reference from __main__ to the action and coercion caches that are stored in the coercion model. That's fine.
  • Parents store actions as well. Some of these parents (the integer ring, for example) will always stay in memory.
  • There is a strong reference from any action and homset back to domain and codomain, which are used as keys in the cache. I think that's fine: If a homset is alive then its domain and codomain must remain alive as well.
  • The action and coercion caches have a strong reference to the values.
  • There is no direct reference from domain and codomain to the homset.

Hence, if an action is stored in the action cache, then there will always be a chain of strong references from __main__ via the cache to the action, and further to domain and codomain, so that it can not be collected.

On the other hand, if weak references to the values of the action and coercion caches are used, then an action or a coercion could die even when domain and codomain were still alive. That's probably not good. To the very least, it would imply that actions would be needed to be re-created over and over again.

How could that be solved?

comment:85 Changed 9 years ago by jpflori

If I understand correctly, the keys to all these dictionaries are triples and what you've done is that ift elements in that triple are weakrefs to non strongly refed objects, the key-value pair gets deleted so that garbage collection occur for these only weakrefed objects?

Therefore, if ZZ._action_hash gets deleted, only weakrefs to the abelian groups of points of the curve should be left in the coercion model (in the second triple dict _action_maps) and it should not prevent garbage collection...

comment:86 Changed 9 years ago by jpflori

Maybe the problem is that the value in that last dict corresponding to the triple with a (abelian group of points of a) curve is a (weakref to unless not weakreferrable) the IntegerMulAction? which itself has a strong ref to the curve which would prevent the curve to get collected?

comment:87 in reply to: ↑ 74 Changed 9 years ago by jpflori

This is basically what you concluded in Comment 74, so a solution could be to allow Functors to be weakreferreable or make them use weak refs for their [co]domains? You posted that making making IntegerMulAction? weak referrable is not enough, could you post a patch applying these ideas so that I can play with it?

Changed 9 years ago by SimonKing

Experimental patch using weak references for actions

comment:88 Changed 9 years ago by SimonKing

I was mistaken: Using weak references to the actions (on top of the other patch) does fix the leak - see the experimental patch that I have just posted.

I did not run any doctests on it, yet. But I tested that only one SchemeHomsetModule... survives the garbage collection (and I did not delete E).

comment:89 Changed 9 years ago by jpflori

I confirm the leak is fixed with your last patch, just launched a ptestlong.

comment:90 Changed 9 years ago by SimonKing

My first impression from some tests is that the additional patch causes a massive slow-down.

comment:91 follow-up: Changed 9 years ago by jpflori

I ran the sage test suite on the same pc, on a 4.8.alpha5 with patches and a 4.7.2 without patches, just on the schemes directory and got 1512 vs 1060 sec... some files required between 2 and 3 times more time (e.g. hyperelliptic_padic_field, heegner and ell_number_field which are already quite long, i'd say most of the file already long), others did not change at all.

comment:92 in reply to: ↑ 91 Changed 9 years ago by SimonKing

Replying to jpflori:

I ran the sage test suite on the same pc, on a 4.8.alpha5 with patches and a 4.7.2 without patches, just on the schemes directory and got 1512 vs 1060 sec...

That is not half as bad as I thought! However, it is far from good.

Do you also have the time with only the first patch, resp. with #715 only? After all, #715 uses weak references and may thus be responsible for some slow-down.

comment:93 Changed 9 years ago by SimonKing

I have slightly modifies trac11521_triple_homset.patch: In the old version, I had created a TripleDict(50), but meanwhile I learnt that the parameter of the TripleDict should not be even and should best be a prime number. In the new version, it is prime...

Concerning the weak references: Why exactly is the experimental patch so slow? Is it because the access to the weakly referenced actions is so slow? Or is it because the actions are garbage collected even if their domain/codomain is still alive, so that the actions are uselessly created over and over again? I suspect it is the latter.

comment:94 follow-up: Changed 9 years ago by jpflori

I guess you're right.

Here is a piece of code emphasizing the problem.

sage: K = GF(1<<60, 't')
sage: a = K.random_element()
sage: E = EllipticCurve([1, 0, 0, 0, a])
sage: P = E.random_point()
sage: 2*P
...
sage: ZZ._introspect_coerce()['_action_hash']
{(<type 'list'>, ...} # No Abelian group of points of the curve!!!
sage: get_coercion_model().get_cache()[1]
{(Integer Ring, Abelian group of points on Elliptic Curve defined by ..., <built-in function mul>): <weakref at ...; dead>, ...} # the "dead" are bad

Indeed, has all the dicts for coercion caches have weakrefs to their values, the actions get garbage collected. That becomes kind of tricky...

As pointed before, the problem is that if we let strong reference to the actions in the values of the dict, these action themselves have strong refs to the domain and codomain and so prevent the whole garbage collection to occur. Is it sensible to use weakrefs for [co]domains in Functors? Hence garbage collection can occur in the cache dicts, but if someone actually use functors directly, he must be sure to have some strong references to its domain and codomain somewhere to avoid garbage collection...

Another question: why not use a TripleDict? in the parent class for the _action_hash dict rather that a WeakValueDict? with three keys? That could somehow unify the approach taken here!

comment:95 in reply to: ↑ 94 ; follow-up: Changed 9 years ago by SimonKing

Replying to jpflori:

Indeed, has all the dicts for coercion caches have weakrefs to their values, the actions get garbage collected. That becomes kind of tricky...

Yes, and note that we have two locations for storing the actions:

  • in the coercion model
  • in the attribute _action_hash of parents.

I found that having weak references in the coercion model is enough for fixing the leak - even if one has strong references in _action_hash. That is something I don't fully understand. In the example of the ticket description, we have an action of ZZ. ZZ is not subject to garbage collection, hence, having a strong reference in ZZ._action_hash should keep the action alive, and thus the elliptic curve (namely the different copies of E that are created in the loop).

Anyway, even in that case we would see the drastic slow-down.

As pointed before, the problem is that if we let strong reference to the actions in the values of the dict, these action themselves have strong refs to the domain and codomain and so prevent the whole garbage collection to occur. Is it sensible to use weakrefs for [co]domains in Functors? Hence garbage collection can occur in the cache dicts, but if someone actually use functors directly, he must be sure to have some strong references to its domain and codomain somewhere to avoid garbage collection...

Indeed that would be the consequence. I think that would not be acceptable: If the user keeps an action, then s/he can rightfully expect that domain and codomain are automatically kept alive.

Another question: why not use a TripleDict? in the parent class for the _action_hash dict rather that a WeakValueDict? with three keys? That could somehow unify the approach taken here!

I was thinking of that, too. However, in addition to _action_hash, the parent also has a _action_list. And that might be a bigger problem.

comment:96 in reply to: ↑ 95 ; follow-up: Changed 9 years ago by jpflori

Replying to SimonKing:

Yes, and note that we have two locations for storing the actions: * in the coercion model * in the attribute _action_hash of parents. I found that having weak references in the coercion model is enough for fixing the leak - even if one has strong references in _action_hash. That is something I don't fully understand.

As I posted above in ZZ._action_hash equality is tested with "==" (rather than identity with "is") so in the code of the ticket description where only "one" curve is used, only one curve gets stored in _action_hash. If you try the code posted some comments above where (really) different curves are used, you'll see that the leak is not fixed if you dont use a similar approach for _action_hash as for the coercion model.

In the example of the ticket description, we have an action of ZZ. ZZ is not subject to garbage collection, hence, having a strong reference in ZZ._action_hash should keep the action alive, and thus the elliptic curve (namely the different copies of E that are created in the loop). Anyway, even in that case we would see the drastic slow-down.

Indeed that would be the consequence. I think that would not be acceptable: If the user keeps an action, then s/he can rightfully expect that domain and codomain are automatically kept alive.

The weakref domain and codomain in functors problem could be tackled by adding an optional parameter for the use of weakref.

By default, the behavior of functors would be as before with strong refs (and the option to false).

For the coercion models we would set the option to True and use weakref whenever possible.

comment:97 in reply to: ↑ 96 ; follow-up: Changed 9 years ago by SimonKing

Replying to jpflori:

As I posted above in ZZ._action_hash equality is tested with "==" (rather than identity with "is") so in the code of the ticket description where only "one" curve is used, only one curve gets stored in _action_hash.

Yes, but then I don't understand why there is no error: In some places, the coercion model tests for identity, and raises a big fat "coercion BUG" otherwise.

The weakref domain and codomain in functors problem could be tackled by adding an optional parameter for the use of weakref.

By default, the behavior of functors would be as before with strong refs (and the option to false).

For the coercion models we would set the option to True and use weakref whenever possible.

Again, I believe that it must not be up to the user: "Evidently" (for a human), the different copies of E in the while loop can be garbage collected. It is our job to hammer the "evidence" into Sage. We shouldn't simply say "let the user decide" whether he wants a leak or a potential disaster: I would call it a disaster if one has an action and suddenly its codomain is gone.

comment:98 in reply to: ↑ 97 Changed 9 years ago by jpflori

Replying to SimonKing:

Yes, but then I don't understand why there is no error: In some places, the coercion model tests for identity, and raises a big fat "coercion BUG" otherwise.

I'll have a look at it, I'm not completely up to what the coercion model actually does :) I guess the doc could also be extended on that :)

The weakref domain and codomain in functors problem could be tackled by adding an optional parameter for the use of weakref.

By default, the behavior of functors would be as before with strong refs (and the option to false).

For the coercion models we would set the option to True and use weakref whenever possible.

Again, I believe that it must not be up to the user: "Evidently" (for a human), the different copies of E in the while loop can be garbage collected. It is our job to hammer the "evidence" into Sage. We shouldn't simply say "let the user decide" whether he wants a leak or a potential disaster: I would call it a disaster if one has an action and suddenly its codomain is gone.

That would not really be up to the user if by default the functor does not use weakref, but if we change this behavior for the coercion model by switching the option. It would be fatly documented not to set the option to True for general use, so normal use would not lead to problems. Of course it all depends on what you mean by "up to the user". One could still intentionally set the option to true, and then cry that its codomain disappeared...

comment:99 Changed 9 years ago by jpflori

In the verify_action of the coerce model there is a PY_TYPE_CHECK(action, IntegerMulAction?) that might explain the absence of fat BUG.

comment:100 follow-up: Changed 9 years ago by jpflori

Anyway, it does not make much sense to me that the _action_hash dict in the Parent class uses "==" rather than "is", especially since the TripleDict? of the coercion model do. What do yo think? Have you any good reason to justify such a choice that I might have missed?

comment:101 in reply to: ↑ 100 Changed 9 years ago by SimonKing

Replying to jpflori:

Anyway, it does not make much sense to me that the _action_hash dict in the Parent class uses "==" rather than "is", especially since the TripleDict? of the coercion model do.

The old TripleDict didn't - I changed that only in #715. Perhaps it is debatable whether that change is fine. But:

What do yo think? Have you any good reason to justify such a choice that I might have missed?

See sage/structure/coerce.pyx:

            if y_mor is not None:
                all.append("Coercion on right operand via")
                all.append(y_mor)
                if res is not None and res is not y_mor.codomain():
                    raise RuntimeError, ("BUG in coercion model: codomains not equal!", x_mor, y_mor)

That is why I think one should test for identity, not equality.

On the other hand, we also have

        # Make sure the domains are correct
        if R_map.domain() is not R:
            if fix:
                connecting = R_map.domain().coerce_map_from(R)
                if connecting is not None:
                    R_map = R_map * connecting
            if R_map.domain() is not R:
                raise RuntimeError, ("BUG in coercion model, left domain must be original parent", R, R_map)
        if S_map is not None and S_map.domain() is not S:
            if fix:
                connecting = S_map.domain().coerce_map_from(S)
                if connecting is not None:
                    S_map = S_map * connecting
            if S_map.domain() is not S:
                raise RuntimeError, ("BUG in coercion model, right domain must be original parent", S, S_map)

in the same file. These lines apparently cope with the fact that there are non-unique parents, and they suggest that == is the right thing to do in the coerce caches.

But I think this is a discussion that belongs to #715.

comment:102 Changed 9 years ago by SimonKing

My suggestion at #715 is to use a different way of comparison in the case of actions respectively coerce maps. This complies with existing defaults in sage/structure/coerce.pyx

comment:103 Changed 9 years ago by jpflori

I agree to move the discussion for "==" and "is" to #715 (and also agree with your solution).

So, what about the original problem here and your thoughts about moving the weakref from the action itself to its domains and codomains?

If you really dislike the idea of having an option switched off by default (and a priori only on in the coercion model and not outside it unless one knows what he is doing), we could mimick what you wanna do with the TripleDict? and have the usual Functor and a Functor_weakref version (or you can go the other way around and add an extra arg to the TripleDict? constructor taking the equality operator as argument :) rather than having two classes)?

comment:104 Changed 9 years ago by SimonKing

Interestingly, the experimental patch for #715 that I have not submitted yet (need more tests) seems to be almost enough to fix the memory leak from the ticket description!

Namely:

sage: <the usual loop>
Traceback
...
KeyboardInterrupt: 
sage: import gc
sage: gc.collect()
585
sage: from sage.schemes.generic.homset import SchemeHomsetModule_abelian_variety_coordinates_field
sage: LE = [x for x in gc.get_objects() if  isinstance(x,SchemeHomsetModule_abelian_variety_coordinates_field)]
sage: len(LE)
2

And the experimental patch is not using weak references to the values of the TripleDict - only to the keys! Perhaps this ticket will eventually be a duplicate of #715?

Anyway, I need more tests and will probably submit the patch to #715 later today.

comment:105 Changed 9 years ago by jpflori

If you used "==" for equality testing for actions in the coercion model (as in the parent caches) and let "j=j" in the "usual piece of code", this is not surprising because for "==" all the curves are equal, so there will be no duplication nor memory leak.

To perform a better test, you should replace j=j by j=K.random_element() in the constructor of the elliptic curve (as I did a few tickets up) to get really different curves (i.e. for "is" and for "=="). I fear the memory leak is still there with this second example...

Changed 9 years ago by SimonKing

Experimental: Have to versions of TripleDict, using "==" or "is" for comparison

comment:106 Changed 9 years ago by SimonKing

Sorry, I have just posted a patch to the wrong ticket. The new patch belongs to #715, not to here. Just ignore it.

comment:107 Changed 9 years ago by SimonKing

  • Milestone changed from sage-4.8 to sage-duplicate/invalid/wontfix
  • Status changed from needs_work to needs_info

Concerning "wrong ticket": Shouldn't we consider this ticket a duplicate of #715? After all, the examples from the two ticket descriptions are almost identical.

comment:108 follow-up: Changed 9 years ago by jpflori

That's what I originally suggested :) See the first comments on this ticket.

When I found about ticket #715 I copied the description from here to there (I must have introduced some typographical difference in the way) because I thought the problem belonged there and the original description of ticket #715 was non-existent.

comment:109 in reply to: ↑ 108 Changed 9 years ago by SimonKing

Replying to jpflori:

That's what I originally suggested :) See the first comments on this ticket.

When I found about ticket #715 I copied the description from here to there (I must have introduced some typographical difference in the way) because I thought the problem belonged there and the original description of ticket #715 was non-existent.

I see. And I also found that I was originally responsible for not making it a duplicate.

The reason was that I found another leak: It is the strong cache for the homsets, and that is not addressed in #715.

So (question to the release manager), what shall we do? Mark this as a duplicate and open a different ticket for the cache of homsets? Or change the ticket description such that it is only about homsets, not about elliptic curves?

comment:110 Changed 9 years ago by zimmerma

before tagging that ticket as a duplicate of #715, I'd like to check the leak is indeed fixed with the (upcoming) patch of #715.

Paul

comment:111 Changed 9 years ago by jpflori

I fear the work on #715 is far from being finished...

comment:112 Changed 9 years ago by SimonKing

  • Description modified (diff)
  • Milestone changed from sage-duplicate/invalid/wontfix to sage-5.0
  • Status changed from needs_info to needs_review

comment:113 Changed 9 years ago by SimonKing

The original example of this ticket is in fact fixed with #715, but I think the other problem should be dealt with here, namely by using a weak cache for homsets.

comment:114 Changed 9 years ago by SimonKing

Note that with the patch applied on top of #715, the memleak is fixed:

sage: for p in prime_range(10^5):
....:     K = GF(p)
....:     a = K(0)
....:     
sage: import gc
sage: gc.collect()
8320
sage: LE = [x for x in gc.get_objects() if  isinstance(x,type(K))]
sage: len(LE)
2
sage: LE
[Finite Field of size 2, Finite Field of size 99991]

Later today, I'll run doctests.

comment:115 Changed 9 years ago by SimonKing

Note that the patch was written before the latest version of the patch from #715 was created. In particular, it uses TripleDict, but should better use TripleDictById (which hasn't been there when I wrote the patch).

comment:116 Changed 9 years ago by SimonKing

Not bad: make ptest results in only one error with the patch and its dependencies applied on top of sage-5.0.prealpha0:

sage -t -force_lib "devel/sage/sage/rings/polynomial/multi_polynomial_libsingular.pyx"
Exception AttributeError: 'PolynomialRing_field_with_category' object has no attribute '_modulus' in  ignored
Exception AttributeError: 'PolynomialRing_field_with_category' object has no attribute '_modulus' in  ignored
**********************************************************************
File "/home/simon/SAGE/sage-5.0.prealpha0/devel/sage/sage/rings/polynomial/multi_polynomial_libsingular.pyx", line 418:
    sage: len(ring_refcount_dict) == n
Expected:
    True
Got:
    False
**********************************************************************
1 items had failures:
   1 of  18 in __main__.example_4
***Test Failed*** 1 failures.
For whitespace errors, see the file /home/simon/.sage//tmp/multi_polynomial_libsingular_6755.py
         [3.6 s]
 
----------------------------------------------------------------------
The following tests failed:


        sage -t -force_lib "devel/sage/sage/rings/polynomial/multi_polynomial_libsingular.pyx"
Total time for all tests: 3.6 seconds

comment:117 Changed 9 years ago by SimonKing

It turns out that the failing test had in fact a wrong design. It was:

            sage: import gc
            sage: from sage.rings.polynomial.multi_polynomial_libsingular import MPolynomialRing_libsingular
            sage: from sage.libs.singular.ring import ring_refcount_dict
            sage: n = len(ring_refcount_dict)
            sage: R = MPolynomialRing_libsingular(GF(547), 2, ('x', 'y'), TermOrder('degrevlex', 2))
            sage: len(ring_refcount_dict) == n + 1
            True

            sage: Q = copy(R)   # indirect doctest
            sage: p = R.gen(0) ^2+R.gen(1)^2
            sage: q = copy(p)
            sage: del R
            sage: del Q
            sage: del p
            sage: del q
            sage: gc.collect() # random output
            sage: len(ring_refcount_dict) == n   
            True

Hence, before n is defined, no garbage collection takes place. This is, of course, not correct: The ring_refcount_dict may contain references to a ring created in another doctest, that is only garbage collected in the line before len(ring_refcount_dict)==n.

In other words: The test did not fail because there was a memory leak.

When I insert a garbage collection right before the definition of n, the test works, and in addition the warning about AttributeError being ignored vanishes.

The patch fixes the memory leak caused by a strong homset cache (which was not addressed by #715):

sage: for p in prime_range(10^3):
....:     K = GF(p)
....:     a = K(0)
....:     
sage: import gc
sage: gc.collect()
3128
sage: LE = [x for x in gc.get_objects() if  isinstance(x,type(K))]
sage: LE
[Finite Field of size 2, Finite Field of size 997]

The patch adds a new doctest demonstrating that it is fixed.

Needs review!

comment:118 Changed 9 years ago by jpflori

I'll this patch as soon as 715 is finally closed.

I think the ticket title also needs to be changed to reflect the issue addressed, namely that when homset are created, a strong ref is kept forever in sage.???.homset.

comment:119 Changed 9 years ago by SimonKing

  • Status changed from needs_review to needs_work
  • Summary changed from Memleak when resolving the action of Integers on an Elliptic Curve to Use weak references to cache homsets

When applying the new patch from #715, all tests pass. But when also applying the patch from here, make ptest results in

sage -t  -force_lib devel/sage/sage/libs/singular/ring.pyx # 6 doctests failed

The failing tests concern ring_refcount_dict - and it is perhaps not surprising that a patch enabling garbage collection of rings has an influence on refcount, such as:

   sage: ring_ptr = set(ring_refcount_dict.keys()).difference(prev_rings).pop()
Exception raised:
    Traceback (most recent call last):
      File "/home/simon/SAGE/sage-5.0.prealpha0/local/bin/ncadoctest.py", line 1231, in run_one_test
        self.run_one_example(test, example, filename, compileflags)
      File "/home/simon/SAGE/sage-5.0.prealpha0/local/bin/sagedoctest.py", line 38, in run_one_example
        OrigDocTestRunner.run_one_example(self, test, example, filename, compileflags)
      File "/home/simon/SAGE/sage-5.0.prealpha0/local/bin/ncadoctest.py", line 1172, in run_one_example
        compileflags, 1) in test.globs
      File "<doctest __main__.example_8[9]>", line 1, in <module>
        ring_ptr = set(ring_refcount_dict.keys()).difference(prev_rings).pop()###line 452:
    sage: ring_ptr = set(ring_refcount_dict.keys()).difference(prev_rings).pop()
    KeyError: 'pop from an empty set'

Anyway, I need to analyse what happens, and provide a new patch after catching some sleep.

Jean-Pierre, I hope you like the new ticket title!

comment:120 Changed 9 years ago by jpflori

Yes, better title !

I've not have any time to have a look at these cache tickets in the last few days, but be sure I'll have a look soon.

Changed 9 years ago by SimonKing

Use the weak TripleDict from #715 for the cache of homsets

comment:121 Changed 9 years ago by SimonKing

  • Status changed from needs_work to needs_review

The doctest problem was easy enough to solve: When starting the example with a garbage collection, then things work. Actually I am surprised that the doctest framework does not do a garbage collection at the beginning of each example, in order to bring the system in a defined state - I'll ask on sage-devel.

Anyway, since the problem is solved and the patch updated, stuff is now ready for review!

Apply trac11521_triple_homset.patch

comment:122 follow-up: Changed 9 years ago by jpflori

Ive look at the patch and has nothing to say except for weakref being imported twice, I'll provided a reviewer patch for that.

Ha, maybe we should mention the caching system in the doc as well, I'm ready to do that in a reviewer patch as well.

However, I'll wait for #715 to be closed before positive reviewing this one, just in case the changes there have consequences here (which should not be the case).

comment:123 in reply to: ↑ 122 Changed 9 years ago by SimonKing

Replying to jpflori:

Ha, maybe we should mention the caching system in the doc as well, I'm ready to do that in a reviewer patch as well.

Oops, you are right! I thought I had add to the docs, but I just added a doctest, no elaborate explanation.

comment:124 Changed 8 years ago by jpflori

  • Status changed from needs_review to needs_info

While finally looking at this ticket seriously and adding some doc, I remarked that if you run

sage: V = VectorSpace(QQ,3)
sage: H = Hom(V,V)
sage: H is Hom(V,V)
False

With this tickets patches but also WITHOUT them.

Is this a consequence of #9138 ?

What's strange is that V does not break the unique parent assumption:

sage: V is VectorSpace(QQ, 3)
True

Is that intended because there is a ? It does not look very right to me.

The same "problem" occurs playing for example with finite fields...

With more classical stuff, the cache seems to work as intended

sage: H = Hom(ZZ, QQ)
sage: H is Hom(ZZ, QQ)
True

This last point let me think that we can discuss and fix the VectorSpace? situation elsewhere if this is indeed a problem.

My concern here is that it's hard to provide a "smart" example where the hom set is cached and where it can be garbage collected when its domainand codomain are.

Id on't think deleting ZZ and QQ is a good idea :) (and they would be refed elsewhere anyway)

comment:125 Changed 8 years ago by jpflori

My bad, in fact the "issue" seems to be that X._Hom_(Y, category) succeeds but returns non identical results.

Changed 8 years ago by jpflori

Reviewer patch; added doc

comment:126 Changed 8 years ago by jpflori

  • Description modified (diff)
  • Status changed from needs_info to needs_review

Here is a reviewer patch added a little doc.

I hope you will find it satisfactory.

If so, feel free to set the ticket to positive review, I personnally feel happy with youre code.

My above rant, if it should be addressed, should be elsewhere.

Note: See TracTickets for help on using tickets.