Opened 2 years ago

Closed 19 months ago

#23851 closed defect (fixed)

Fix memoryleak introduced in #11670

Reported by: nbruin Owned by:
Priority: major Milestone: sage-8.3
Component: memleak Keywords: thursdaysbdx
Cc: Merged in:
Authors: Nils Bruin, Peter Bruin Reviewers: Sébastien Labbé
Report Upstream: N/A Work issues:
Branch: b7e1042 (Commits) Commit: b7e1042ab0f3948aa2416c03ebe7c70701418dc3
Dependencies: Stopgaps:

Description (last modified by jdemeyer)

In #11670 the following leak was introduced:

        sage: u=id(NumberField(x^2-5,'a').absolute_field('b'))
        sage: import gc
        sage: gc.collect() #random
        10
        sage: [id(v) for v in gc.get_objects() if id(v) == u]

See Nils's explanation in comment:14:ticket:23807 and the surrounding discussion, where this bug was found.

Change History (33)

comment:1 Changed 2 years ago by nbruin

  • Branch set to u/nbruin/fix_memoryleak_introduced_in__11670

comment:2 Changed 2 years ago by nbruin

  • Authors set to nbruin, pbruin
  • Cc pbruin added
  • Commit set to bf22de066dacc686296cf17c0f098484dddf1f1e
  • Component changed from PLEASE CHANGE to memleak
  • Description modified (diff)
  • Status changed from new to needs_review
  • Type changed from PLEASE CHANGE to defect

New commits:

bf22de0amend caching of "absolute_field" to avoid memory leaks due to "structure" storage.

comment:3 Changed 2 years ago by pbruin

  • Authors changed from nbruin, pbruin to Nils Bruin, Peter Bruin
  • Branch changed from u/nbruin/fix_memoryleak_introduced_in__11670 to u/pbruin/23851-memory_leak
  • Cc pbruin removed
  • Commit changed from bf22de066dacc686296cf17c0f098484dddf1f1e to 4568007a9d08ccaa744c1f9b1832ea351eb5ccac

Cleaned up the docstrings of absolute_field() a bit.

comment:4 follow-up: Changed 2 years ago by slabbe

  • Keywords thursdaysbdx added
  • Reviewers set to Sébastien Labbé
  • Status changed from needs_review to positive_review

I do not know what is happening with the patchbot. The bug is fixed and I get All tests passed after running make ptestlong on my machine.

comment:5 Changed 2 years ago by vbraun

  • Status changed from positive_review to needs_work

I'm getting this on various buildbots:

File "src/sage/rings/polynomial/polynomial_quotient_ring.py", line 1389, in sage.rings.polynomial.polynomial_quotient_ring.PolynomialQuotientRing_generic.S_units
Failed example:
    u,o = K.S_units([])[0]; u, o
Expected:
    (-1/2*a + 1/2, 6)
Got:
    (1/2*a + 1/2, 6)
**********************************************************************
File "src/sage/rings/polynomial/polynomial_quotient_ring.py", line 1395, in sage.rings.polynomial.polynomial_quotient_ring.PolynomialQuotientRing_generic.S_units
Failed example:
    u^2
Expected:
    -1/2*a - 1/2
Got:
    1/2*a - 1/2
**********************************************************************
File "src/sage/rings/polynomial/polynomial_quotient_ring.py", line 1404, in sage.rings.polynomial.polynomial_quotient_ring.PolynomialQuotientRing_generic.S_units
Failed example:
    L.S_units([])
Expected:
    [(-1/2*a + 1/2, 6),
     ((-1/3*a - 1)*b^2 - 4/3*a*b - 5/6*a + 7/2, +Infinity),
     (2/3*a*b^2 + (2/3*a - 2)*b - 5/6*a - 7/2, +Infinity)]
Got:
    [(1/2*a + 1/2, 6),
     ((-1/3*a - 1)*b^2 - 4/3*a*b - 5/6*a + 7/2, +Infinity),
     (2/3*a*b^2 + (2/3*a - 2)*b - 5/6*a - 7/2, +Infinity)]
**********************************************************************
File "src/sage/rings/polynomial/polynomial_quotient_ring.py", line 1408, in sage.rings.polynomial.polynomial_quotient_ring.PolynomialQuotientRing_generic.S_units
Failed example:
    L.S_units([K.ideal(1/2*a - 3/2)])
Expected:
    [((-1/6*a - 1/2)*b^2 + (1/3*a - 1)*b + 4/3*a, +Infinity),
     (-1/2*a + 1/2, 6),
     ((-1/3*a - 1)*b^2 - 4/3*a*b - 5/6*a + 7/2, +Infinity),
     (2/3*a*b^2 + (2/3*a - 2)*b - 5/6*a - 7/2, +Infinity)]
Got:
    [((-1/6*a - 1/2)*b^2 + (1/3*a - 1)*b + 4/3*a, +Infinity),
     (1/2*a + 1/2, 6),
     ((-1/3*a - 1)*b^2 - 4/3*a*b - 5/6*a + 7/2, +Infinity),
     (2/3*a*b^2 + (2/3*a - 2)*b - 5/6*a - 7/2, +Infinity)]
**********************************************************************
File "src/sage/rings/polynomial/polynomial_quotient_ring.py", line 1413, in sage.rings.polynomial.polynomial_quotient_ring.PolynomialQuotientRing_generic.S_units
Failed example:
    L.S_units([K.ideal(2)])
Expected:
    [((1/2*a - 1/2)*b^2 + (a + 1)*b + 3, +Infinity),
     ((1/6*a + 1/2)*b^2 + (-1/3*a + 1)*b - 5/6*a + 1/2, +Infinity),
     ((1/6*a + 1/2)*b^2 + (-1/3*a + 1)*b - 5/6*a - 1/2, +Infinity),
     (-1/2*a + 1/2, 6),
     ((-1/3*a - 1)*b^2 - 4/3*a*b - 5/6*a + 7/2, +Infinity),
     (2/3*a*b^2 + (2/3*a - 2)*b - 5/6*a - 7/2, +Infinity)]
Got:
    [((1/2*a - 1/2)*b^2 + (a + 1)*b + 3, +Infinity),
     ((1/6*a + 1/2)*b^2 + (-1/3*a + 1)*b - 5/6*a + 1/2, +Infinity),
     ((1/6*a + 1/2)*b^2 + (-1/3*a + 1)*b - 5/6*a - 1/2, +Infinity),
     (1/2*a + 1/2, 6),
     ((-1/3*a - 1)*b^2 - 4/3*a*b - 5/6*a + 7/2, +Infinity),
     (2/3*a*b^2 + (2/3*a - 2)*b - 5/6*a - 7/2, +Infinity)]
**********************************************************************
File "src/sage/rings/polynomial/polynomial_quotient_ring.py", line 1476, in sage.rings.polynomial.polynomial_quotient_ring.PolynomialQuotientRing_generic.units
Failed example:
    u = K.units()[0][0]; u
Expected:
    -1/2*a + 1/2
Got:
    1/2*a + 1/2
**********************************************************************
File "src/sage/rings/polynomial/polynomial_quotient_ring.py", line 1482, in sage.rings.polynomial.polynomial_quotient_ring.PolynomialQuotientRing_generic.units
Failed example:
    u^2
Expected:
    -1/2*a - 1/2
Got:
    1/2*a - 1/2
**********************************************************************
File "src/sage/rings/polynomial/polynomial_quotient_ring.py", line 1494, in sage.rings.polynomial.polynomial_quotient_ring.PolynomialQuotientRing_generic.units
Failed example:
    L.units()
Expected:
    [(-1/2*a + 1/2, 6),
     ((-1/3*a - 1)*b^2 - 4/3*a*b - 5/6*a + 7/2, +Infinity),
     (2/3*a*b^2 + (2/3*a - 2)*b - 5/6*a - 7/2, +Infinity)]
Got:
    [(1/2*a + 1/2, 6),
     ((-1/3*a - 1)*b^2 - 4/3*a*b - 5/6*a + 7/2, +Infinity),
     (2/3*a*b^2 + (2/3*a - 2)*b - 5/6*a - 7/2, +Infinity)]
**********************************************************************
File "src/sage/rings/polynomial/polynomial_quotient_ring.py", line 1503, in sage.rings.polynomial.polynomial_quotient_ring.PolynomialQuotientRing_generic.units
Failed example:
    L.unit_group().gens_values()
Expected:
    [1/2*a + 1/2, (-1/3*a - 1)*b^2 - 4/3*a*b - 5/6*a + 7/2, 2/3*a*b^2 + (2/3*a - 2)*b - 5/6*a - 7/2]
Got:
    [-1/2*a + 1/2, (-1/3*a - 1)*b^2 - 4/3*a*b - 5/6*a + 7/2, 2/3*a*b^2 + (2/3*a - 2)*b - 5/6*a - 7/2]
**********************************************************************
2 items had failures:
   5 of  18 in sage.rings.polynomial.polynomial_quotient_ring.PolynomialQuotientRing_generic.S_units
   4 of  23 in sage.rings.polynomial.polynomial_quotient_ring.PolynomialQuotientRing_generic.units
    [448 tests, 9 failures, 6.99 s]

comment:6 Changed 2 years ago by pbruin

With the following change (which is probably a bad idea) I consistently get the same doctest failures as above:

  • src/sage/rings/polynomial/polynomial_quotient_ring.py

    a b class PolynomialQuotientRing_generic(CommutativeRing): 
    10501050        return self(self.polynomial_ring().random_element( \
    10511051            degree=self.degree()-1, *args, **kwds))
    10521052
    1053     @cached_method
    10541053    def _S_decomposition(self, S):
    10551054        """
    10561055        Compute the decomposition of self into a product of number fields.

comment:7 Changed 2 years ago by git

  • Commit changed from 4568007a9d08ccaa744c1f9b1832ea351eb5ccac to 55a59ff3da2c50c83c23d5021a127026f811f0cd

Branch pushed to git repo; I updated commit sha1. New commits:

55a59ffTrac 23851: force garbage collection to make S-unit computations deterministic

comment:8 Changed 2 years ago by pbruin

The above commit appears to make the computation deterministic again. Let's see if the patchbot agrees.

comment:9 Changed 2 years ago by pbruin

  • Status changed from needs_work to needs_review

comment:10 in reply to: ↑ 4 Changed 2 years ago by slabbe

Replying to slabbe:

I do not know what is happening with the patchbot. The bug is fixed and I get All tests passed after running make ptestlong on my machine.

I confirm that I get the same errors when run alone:

 sage -t src/sage/rings/polynomial/polynomial_quotient_ring.py

I am sorry, I do not know what happened with my run of make ptestlong which said All tests passed. I rechecked the log file of ptestlong.log, and I confirm that I got All tests passed:

Running doctests with ID 2017-09-14-16-16-01-0e639740.
Git branch: 23851
Using --optional=ccache,cmake,cryptominisat,mpir,pandoc_attributes,pandocfilters,python2,rst2ipynb,sage
Doctesting entire Sage library.
Sorting sources by runtime so that slower doctests are run first....
Doctesting 3587 files using 8 threads.
...
sage -t --long --warn-long 58.9 src/sage/rings/polynomial/polynomial_quotient_ring.py
    [448 tests, 3.09 s]
...
----------------------------------------------------------------------
All tests passed!
----------------------------------------------------------------------
Total time for all tests: 1623.2 seconds
    cpu time: 10208.6 seconds
    cumulative wall time: 12523.8 seconds
Last edited 2 years ago by slabbe (previous) (diff)

comment:11 Changed 2 years ago by slabbe

+        First, we force garbage collection to make the `S`-unit
+        computations below deterministic::

Why? Can you add a sentence in this paragraph saying why the computatino of S-unit depends on the existence of objects that were not yet collected by the garbage collector?

Last edited 2 years ago by slabbe (previous) (diff)

comment:12 Changed 2 years ago by slabbe

I confirm that the last commit fixes the sage -t of polynomial_quotient_ring.py when run alone.

comment:13 Changed 2 years ago by pbruin

What is apparently happening is that commit bf22de0 (see comment:2) causes the quadratic field of discriminant -3 to be garbage-collected more often. This is in principle a good thing, but every time the field is constructed again, the computed generator of the unit group may be different than the previous time. This also happens if you run

K = nfinit(y^2 + 3); nfbasistoalg(K, nfrootsof1(K)[2])

several times in GP; the result varies randomly between 1/2*y + 1/2 and -1/2*y + 1/2, the two roots of unity of order 6.

Commit 55a59ff (see comment:7) reduces the randomness in the number of times that K is constructed. Experimentally it seems to work; I ran doctests in polynomial_quotient_ring.py 128 times with this commit applied and got no failures. However, I now realise that it may not be entirely watertight and that we should check if there are other points at which K may be randomly garbage-collected. Alternatively, we could construct K once at the beginning of the file, keeping it alive by binding it to a name that is not used anywhere else in the file.

comment:14 Changed 2 years ago by slabbe

With the recent commit, when running make ptestlong or when running sage -t --long alone on the file, I get problems:

----------------------------------------------------------------------
sage -t --long --warn-long 58.5 src/sage/rings/polynomial/polynomial_quotient_ring.py  # 9 doctests failed
----------------------------------------------------------------------

When running sage -t alone on the file without the option --long , I get All tests passed:

Doctesting 1 file.
sage -t --warn-long 32.7 src/sage/rings/polynomial/polynomial_quotient_ring.py
    [448 tests, 1.81 s]
----------------------------------------------------------------------
All tests passed!
----------------------------------------------------------------------

But the failing tests are not marked with long time ... ??? Why are they not failing in the second case?

Last edited 2 years ago by slabbe (previous) (diff)

comment:15 Changed 2 years ago by pbruin

  • Status changed from needs_review to needs_work
  • Work issues set to fix random doctest failures

This seems to indicate that those tests are still non-deterministic. We should either make them deterministic again or mark them with # random (but then it is harder to notice in case the output actually becomes wrong at some point.)

comment:16 Changed 2 years ago by nbruin

These new changes are just showing that the doctests were fragile to begin with. There's a bunch of things you can do to check the result is correct:

  • look at the norms
  • verify they and their inverses are S-integral (factorize the denominator)
  • hard-code an S-unit basis and check that these occur as a power product of the computed set.

It's a pain, but I don't think we really have another option. (I guess the answers only differ in some sign choices, so you could just construct the set of all sign choices and check your answer lies in there. That would be a reasonably strict test and will probably take a little longer before it breaks again)

Also, if you want to prevent the field from being garbage collected, just bind it explicitly to an identifier. That's how users should prevent garbage collections too. That doesn't help the non-deterministic doctests, though it may hide the problem for a while.

comment:17 Changed 2 years ago by git

  • Commit changed from 55a59ff3da2c50c83c23d5021a127026f811f0cd to dd0594d391a365ff8d6d53c56767178c7d910903

Branch pushed to git repo; I updated commit sha1. This was a forced push. New commits:

dd0594dTrac 23851: accept both roots of unity of order 6 in doctests

comment:18 Changed 2 years ago by pbruin

  • Status changed from needs_work to needs_review

The above commit is a more robust alternative to the gc.collect() solution; we now just avoid distinguishing between the two generators for the unit group of QuadraticField(-3).

comment:19 follow-up: Changed 2 years ago by roed

Do we really want to delete @cached_method on absolute_field?

comment:20 in reply to: ↑ 19 ; follow-up: Changed 20 months ago by slabbe

Replying to roed:

Do we really want to delete @cached_method on absolute_field?

One argument for caching a method is when it takes long time to compute which is not the case here:

sage: %time nf = NumberField(x^2-5,'a')
CPU times: user 1.12 ms, sys: 97 µs, total: 1.21 ms
Wall time: 962 µs
sage: %time nf.absolute_field('b')
CPU times: user 9 µs, sys: 1e+03 ns, total: 10 µs
Wall time: 13.8 µs
Number Field in b with defining polynomial x^2 - 5

Is there another reason why absolute_field should be cached?

comment:21 Changed 20 months ago by slabbe

  • Status changed from needs_review to positive_review
  • Work issues fix random doctest failures deleted

comment:22 Changed 20 months ago by chapoton

  • Milestone changed from sage-8.1 to sage-8.3

comment:23 in reply to: ↑ 20 Changed 20 months ago by roed

Replying to slabbe:

Replying to roed:

Do we really want to delete @cached_method on absolute_field?

One argument for caching a method is when it takes long time to compute which is not the case here:

sage: %time nf = NumberField(x^2-5,'a')
CPU times: user 1.12 ms, sys: 97 µs, total: 1.21 ms
Wall time: 962 µs
sage: %time nf.absolute_field('b')
CPU times: user 9 µs, sys: 1e+03 ns, total: 10 µs
Wall time: 13.8 µs
Number Field in b with defining polynomial x^2 - 5

Your timings, on a quadratic field which is already absolute, aren't particularly convincing that this method is always fast. However, running some other tests on higher degree relative extensions shows that they're quite fast as well. I'll withdraw my objection to removing the cached_method.

Is there another reason why absolute_field should be cached?

comment:24 Changed 20 months ago by vbraun

  • Status changed from positive_review to needs_work

Merge conflict

comment:25 Changed 19 months ago by git

  • Commit changed from dd0594d391a365ff8d6d53c56767178c7d910903 to b7e1042ab0f3948aa2416c03ebe7c70701418dc3

Branch pushed to git repo; I updated commit sha1. New commits:

b7e1042Merge branch 'develop' into ticket/23851-memory_leak

comment:26 Changed 19 months ago by pbruin

  • Status changed from needs_work to positive_review

comment:27 follow-up: Changed 19 months ago by jdemeyer

I don't really understand the memory leak. If the original code leaks, isn't that a deeper problem with @cached_method?

comment:28 Changed 19 months ago by jdemeyer

  • Status changed from positive_review to needs_info

comment:29 Changed 19 months ago by jdemeyer

Does anybody have an idea? If not, please say so. I'm just asking to know whether it's worth my time to investigate it.

comment:30 in reply to: ↑ 27 Changed 19 months ago by pbruin

Replying to jdemeyer:

I don't really understand the memory leak. If the original code leaks, isn't that a deeper problem with @cached_method?

The leak is caused by the combination of @cached_method and UniqueFactory; see Nils's explanation in comment:14:ticket:23807 and the surrounding discussion, where this bug was found.

comment:31 Changed 19 months ago by jdemeyer

  • Description modified (diff)

comment:32 Changed 19 months ago by jdemeyer

  • Status changed from needs_info to positive_review

Thanks for the pointer! It all makes sense to me now.

comment:33 Changed 19 months ago by vbraun

  • Branch changed from u/pbruin/23851-memory_leak to b7e1042ab0f3948aa2416c03ebe7c70701418dc3
  • Resolution set to fixed
  • Status changed from positive_review to closed
Note: See TracTickets for help on using tickets.