Opened 7 years ago

Last modified 7 years ago

#17360 new defect

memory leak in FiniteDimensionalAlgebra

Reported by: nbruin Owned by:
Priority: major Milestone: sage-6.5
Component: memleak Keywords:
Cc: SimonKing, nthiery, jpflori Merged in:
Authors: Reviewers:
Report Upstream: N/A Work issues:
Branch: u/nbruin/algebra_memoryleak (Commits, GitHub, GitLab) Commit: 2c942ad50c3f9947b7d680d2f6cc708848ac9758
Dependencies: Stopgaps:

Status badges

Description

The following code shows a memory leak:

import gc
from collections import Counter
gc.collect()

pre={id(a) for a in gc.get_objects()}

for p in prime_range(20000):
        A = FiniteDimensionalAlgebra(GF(p), [Matrix(GF(p),[[1, 0], [0, 1]]), Matrix(GF(p),[[0, 1], [0, 0]])]) 
    
gc.collect()

post=Counter(str(type(a)) for a in gc.get_objects() if id(a) not in pre)
post

(the content of "post" shows that all finite fields and a lot of categories are still in memory). The cause is simple: the category is still a parametrized one by default. The fix, therefore, is simple.

Change History (12)

comment:1 Changed 7 years ago by nbruin

  • Branch set to u/nbruin/algebra_memoryleak

comment:2 Changed 7 years ago by nbruin

  • Commit set to d486b71627014ee14955a61bdf8c64572cb49718
  • Component changed from PLEASE CHANGE to memleak

Indeed, with the branch attached things are fine (and run a lot faster too, because there's not the overhead of creating all those categories every time).

Incidentally, running the command with

        A = FiniteDimensionalAlgebra(GF(p), [Matrix([[1, 0], [0, 1]]), Matrix([[0, 1], [0, 0]])]) 

goes horribly wrong again, because of the little pearl in matrix_space.py:

   def change_ring(self, R):
        try:
            return self.__change_ring[R]
        except AttributeError:
            self.__change_ring = {}
        except KeyError:
            pass
        M = MatrixSpace(R, self.__nrows, self.__ncols, self.__is_sparse)
        self.__change_ring[R] = M
        return M

which conveniently undoes all the effort of the rest of the coercion framework to avoid strong references. Note that simply changing the dict to weakly keyed won't help: the matrix spaces will hold a reference to the base ring anyway. Having it weak on both key and value would work, but now we're just replicating the UniqueRepresentation? caching. Is that necessary?


New commits:

d486b71trac 17360: try to avoid referencing base ring in category.

comment:3 Changed 7 years ago by git

  • Commit changed from d486b71627014ee14955a61bdf8c64572cb49718 to 2c942ad50c3f9947b7d680d2f6cc708848ac9758

Branch pushed to git repo; I updated commit sha1. New commits:

2c942adtrac 17360: remove custom MatrixSpace.change_ring cache, because it leaks and doesn't seem to improve performance otherwise anyway.

comment:4 Changed 7 years ago by nbruin

OK! some initial timing gives me with the old code:

sage: M=matrix(ZZ,2,2,[1,0,0,1])
sage: k=GF(3)
sage: %timeit M.change_ring(k)
10000 loops, best of 3: 111 µs per loop

and after removing the custom cache:

sage: %timeit M.change_ring(k)
10000 loops, best of 3: 112 µs per loop

so I think there is no difference (we did improve UniqueRepresentation? a lot), and removing the cache makes the example work fine.

There are plenty of other parent inits that set a base-ring referencing category by default, by the way. Any ideas on how to conveniently detect these?

comment:5 Changed 7 years ago by nbruin

  • Type changed from PLEASE CHANGE to defect

comment:6 follow-up: Changed 7 years ago by tscrim

  • Cc nthiery added; nthierry removed

Do you also want to fix this for other algebras (except Weyl and Clifford, I'll do that in #17096 since I'm editing that part of those files anyways)?

comment:7 Changed 7 years ago by jpflori

  • Cc jpflori added

comment:8 in reply to: ↑ 6 Changed 7 years ago by nbruin

Replying to tscrim:

Do you also want to fix this for other algebras

I would like that it gets fixed (because the pervasive leaking has repeatedly prevented me from using sage on problems of interesting scale); I'm not particularly keen on doing it myself, since I don't have a particularly good idea about how to do that efficiently.

(except Weyl and Clifford, I'll do that in #17096 since I'm editing that part of those files anyways)?

Thanks!

comment:9 Changed 7 years ago by nbruin

OK, now I'm confused. If I do

import gc
from collections import Counter
gc.collect()

pre={id(a) for a in gc.get_objects()}

for p in prime_range(20000):
    C=VectorSpaces(GF(p))

gc.collect()

post=Counter(str(type(a)) for a in gc.get_objects() if id(a) not in pre)
[p for p in post.iteritems() if p[1] > 2000]

I get no hits (i.e, the category VectorSpaces seems to be collectible), but if I do the same with

    C=Algebras(GF(p))

I do end up with a lot of junk. What is it that makes Algebras not collectible? [There's also the performance component: making all these different VectorSpaces categories does take time too, so equipping a vector space object by default with a generic category that doesn't need to be created for every base ring is a good idea anyway, but it would be good to understand why VectorSpaces aren't immortal like Algebras are]

One obvious difference is that Algebras inherits from CategoryWithAxiom_over_base_ring . It may be that the axiom stuff ruins mortality? It definitely ruins performance:

sage: %time for p in prime_range(20000): C=Algebras(GF(p))
CPU times: user 4.17 s, sys: 71 ms, total: 4.24 s
Wall time: 4.23 s

versus (in a fresh session, because the above creates the vector spaces anyway):

sage: %time for p in prime_range(20000): C=VectorSpaces(GF(p))
CPU times: user 660 ms, sys: 14 ms, total: 674 ms
Wall time: 662 ms

Both are noticeable, so should be avoided if possible, but the Algebras one is just VERY slow. It's not using that much memory so I have trouble attributing that just to overhead in allocating memory.

Last edited 7 years ago by nbruin (previous) (diff)

comment:10 Changed 7 years ago by tscrim

Algebras is a subclass of CategoryWithAxiom_over_base_ring whereas Modules (and VectorSpaces) is a subclass of Category_module. I'm guessing that any subclass of CategoryWithAxiom gets nailed into memory because there is a link back to the list of axioms or the call to the strongly cached axiom_of_nested_class.

comment:11 Changed 7 years ago by jpflori

Should we or did someone open a ticket to fix that (general) nasty memory leak caused by WithAxiom?

comment:12 Changed 7 years ago by tscrim

I'm thinking we should change this ticket to take care of the CategoryWithAxiom leak.

I can get rid of the memory leak by removing the caching on Category._with_axiom_as_tuple and Category._with_axiom (although it takes a few passes of gc.collect() to get rid of the finite fields). However I believe this could lead to a slowdown in instantiating categories, and we have the workaround of the memory leak of Algebras(Fields()).

So perhaps what we can do is have the cached part should be at the class level (which should cover the heavy lifting and result in a large speedup in [Nils] creation because the construction won't be done for every new Algebras category) and the caching should return a class. Thoughts?

Note: See TracTickets for help on using tickets.