#31313 closed defect (fixed)
Memory leak in bipartite_graph (and so in generalised_quadrangle_with_spread)
Reported by:  vbraun  Owned by:  

Priority:  major  Milestone:  sage-9.3 
Component:  combinatorial designs  Keywords:  generalised_quadrangle, bipartite graph, memory leak 
Cc:  dimpase, gh-IvoMaffei, dcoudert  Merged in:  
Authors:  Dave Morris  Reviewers:  Dima Pasechnik 
Report Upstream:  N/A  Work issues:  
Branch:  0be8fbc  Commit:  
Dependencies:  #30517  Stopgaps: 
Description (last modified by )
Reproducer:
sage: from sage.combinat.designs.gen_quadrangles_with_spread import is_GQ_with_spread, dual_GQ_ovoid  # line 270
sage: t = designs.generalised_quadrangle_hermitian_with_ovoid(3)  # line 272
sage: t = dual_GQ_ovoid(*t)  # line 273
sage: is_GQ_with_spread(*t, s=3, t=9)  # line 274
For example, each invocation of the last command (is_GQ_with_spread) eats about 170MB that is never released.
The relevant tests were marked as # known bug in #30517.
Fixed by changing A[i][j] to A[i, j] to access matrix entries. This avoids the memory leak, which turns out to be in accessing matrix rows and is now tracked at #31340.
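The fix hinges on how Sage matrices are indexed: A[i] materialises a whole row object, while A[i, j] reads a single entry directly. The toy class below is a made-up illustration (not Sage's actual implementation) that mimics this difference by counting how many intermediate row objects each access pattern allocates:

```python
# Toy model illustrating why the patch replaced A[i][j] with A[i, j].
# In Sage, A[i] builds a fresh row object each time it is evaluated;
# A[i, j] fetches the single entry without that intermediate allocation.
class ToyMatrix:
    def __init__(self, data):
        self._data = data
        self.rows_built = 0   # counts intermediate row objects allocated

    def __getitem__(self, key):
        if isinstance(key, tuple):        # A[i, j]: direct entry access
            i, j = key
            return self._data[i][j]
        self.rows_built += 1              # A[i]: allocate a fresh row copy
        return list(self._data[key])

A = ToyMatrix([[0, 1], [1, 0]])

slow = sum(A[i][j] for i in range(2) for j in range(2))    # builds 4 row copies
direct = sum(A[i, j] for i in range(2) for j in range(2))  # builds none
```

In Sage the row objects are also reachable from internal caches, which is what turned this allocation pattern into a leak rather than mere churn (see #31340).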
Attachments (1)
Change History (45)
comment:1 Changed 8 months ago by
 Description modified (diff)
comment:2 Changed 8 months ago by
comment:3 Changed 8 months ago by
FWIW, I see the leak. I tested nbruin's code with 9.2 and 9.3b6 on CoCalc, and 9.3b6 on MacOS 10.15.7. The numbers grew steadily in all cases, but the rate differed: about 170MB on CoCalc and about 90MB on MacOS.
comment:4 followup: ↓ 5 Changed 8 months ago by
I apologize if this is noise, but it appears to me that this leak has nothing to do with quadrangles. Instead, the problem seems to be that there is a memory leak in the construction of bipartite graphs:
sage: m = 150
....: n = 200
....: A = Matrix(ZZ, m, n)
....: for j in range(n):
....:     i = randint(0, m-2)
....:     A[i, j] = 1
....:     A[i+1, j] = 1
....: def graph(A):
....:     G = Graph(A)
....: def bipartite(A):
....:     G = BipartiteGraph(A)
....: start_mem = get_memory_usage()
....: for n in [1..10]:
....:     prev_mem = get_memory_usage()
....:     graph(A)
....:     print("graph:", get_memory_usage() - prev_mem)
....:     prev_mem = get_memory_usage()
....:     bipartite(A)
....:     print("bipartite:", get_memory_usage() - prev_mem, get_memory_usage() - start_mem)
graph: 0.14453125
bipartite: 185.1640625 185.30859375
graph: 0.0
bipartite: 47.0 232.30859375
graph: 0.0
bipartite: 47.00390625 279.3125
graph: 0.0
bipartite: 48.0 327.3125
graph: 0.0
bipartite: 67.0078125 394.3203125
graph: 0.0
bipartite: 47.0 441.3203125
graph: 0.0
bipartite: 48.0 489.3203125
graph: 0.0
bipartite: 47.0 536.3203125
graph: 0.0
bipartite: 46.0 582.3203125
graph: 0.0
bipartite: 58.0 640.3203125
This was on MacOS. On CoCalc, I got a consistent bipartite leak of 93.84375.
comment:5 in reply to: ↑ 4 Changed 8 months ago by
As a possibly interesting data point: for this bipartite graph example too, I'm not getting a leak on Sage 9.2, running on CentOS Linux release 7.9.2009 (Core). Extra build info on the underlying Python: Python 3.8.5 (default, Dec 15 2020, 19:39:17) [GCC 4.8.5 20150623 (Red Hat 4.8.5-44)] on linux.
comment:6 Changed 8 months ago by
The problem seems to be a small leak when accessing a row of an integer matrix. The construction of a bipartite graph involves accessing all of the entries of the matrix. If the matrix is large (such as 150 x 200), then the formula A[jj][ii] will be evaluated tens of thousands of times, which amplifies the leak.
sage: A = Matrix(ZZ, 200)
....: def here_is_a_leak(A):
....:     for n in range(10000):
....:         row0 = A[0]
....:
....: start_mem = get_memory_usage()
....: for n in [1..5]:
....:     prev_mem = get_memory_usage()
....:     here_is_a_leak(A)
....:     print("change in memory:", round(get_memory_usage() - prev_mem, 1),
....:           "\n total:", round(get_memory_usage() - start_mem, 1))
change in memory: 162.2
 total: 162.2
change in memory: 15.0
 total: 177.2
change in memory: 16.0
 total: 193.2
change in memory: 24.0
 total: 217.2
change in memory: 16.0
 total: 233.2
This is from 9.3b6 on MacOS, but I also tested 9.2 on CoCalc. The leak seemed to go away (most of the time, at least) when I changed ZZ in the definition of A to QQ, RR, or GF(2^10).
comment:7 Changed 8 months ago by
If my diagnosis is correct, then we might be able to solve the problem on this ticket just by changing A[jj][ii] to A[jj, ii]. I'll write a patch to try that out.
If it works, we can open a different ticket about the memory leak of A[jj].
comment:8 Changed 8 months ago by
 Branch set to public/31313
comment:9 Changed 8 months ago by
 Commit set to 779ebd2c0b1b6f0b209ebb06899a46f173ed44f9
Branch pushed to git repo; I updated commit sha1. New commits:
779ebd2  off by one error in doctest

comment:10 Changed 8 months ago by
 Keywords bipartite graph memory leak added
 Status changed from new to needs_review
Yes, that simple change fixes the problem for me (and dramatically speeds up the construction of the bipartite graph).
So we should be able to remove the # known bug tags from #30517.
comment:11 Changed 8 months ago by
 Commit changed from 779ebd2c0b1b6f0b209ebb06899a46f173ed44f9 to 380d4384dd3e4fd31629c6a6939a17d9059503af
Branch pushed to git repo; I updated commit sha1. New commits:
380d438  style: space after comma

comment:12 Changed 8 months ago by
EDIT: oops sorry, wrong ticket
comment:13 Changed 8 months ago by
 Reviewers set to Dima Pasechnik
 Status changed from needs_review to needs_work
Great catch! Please also base it on #30517, and revert the known bug tags here.
comment:14 Changed 8 months ago by
 Dependencies set to #30517
comment:15 Changed 8 months ago by
 Commit changed from 380d4384dd3e4fd31629c6a6939a17d9059503af to 0be8fbcb7d76a5b1d845a8cca1b4b04fc9b68ac1
Branch pushed to git repo; I updated commit sha1. New commits:
36837a1  added test function

08b8890  Merge branch 'public/combinat/30517' in 9.3.b6

a6841da  adding known bug tags

c40efe9  removed test()

7085634  Merge remote-tracking branch 'trac/public/31313' into public/31313

0be8fbc  Revert "adding known bug tags"  they are fixed by trac:31313

comment:16 Changed 8 months ago by
 Summary changed from Memory leak in generalized quadrangles to Memory leak in bipartite_graph (and so in generalised_quadrangle_with_spread)
comment:18 Changed 8 months ago by
Thanks! I opened #31340 for the memory leak from A[jj].
comment:19 Changed 8 months ago by
 Resolution set to duplicate
 Status changed from positive_review to closed
comment:20 Changed 8 months ago by
@vbraun: This ticket (#31313) is not a duplicate. The PR here gets rid of a memory leak (and therefore removes the # known bug tags that were added in #30517). More work needs to be done (so there is a follow-up ticket), but I think this ticket should be enough to make your buildbot run. If not, please let us know.
comment:21 Changed 6 months ago by
 Resolution duplicate deleted
 Status changed from closed to new
I looked at the code here, and it seems that this code will probably also be faster than the old code, since it gets entries directly from the matrix instead of creating rows. It seems that the new code is not only a workaround for the underlying bug, but that it will also be better than the existing code even after the underlying bug has been fixed. So I think it makes sense to still merge this.
comment:22 Changed 6 months ago by
 Status changed from new to needs_review
needs review? Or good to go?
comment:23 Changed 6 months ago by
 Status changed from needs_review to positive_review
rechecked, all good
comment:24 Changed 6 months ago by
 Branch changed from public/31313 to 0be8fbcb7d76a5b1d845a8cca1b4b04fc9b68ac1
 Resolution set to fixed
 Status changed from positive_review to closed
comment:25 Changed 6 months ago by
 Commit 0be8fbcb7d76a5b1d845a8cca1b4b04fc9b68ac1 deleted
 Description modified (diff)
Thanks for fixing this and opening the follow-up ticket #31340!
Ideally, all, any, sum, prod should be called on generators rather than lists when possible. When coming across cases that use a list, it's good to replace them.
- sage: all([x in t[0] for x in t[1]])
+ sage: all(x in t[0] for x in t[1])
Not the point of this ticket though; will be done elsewhere.
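For the record, the short-circuiting benefit of the generator form can be demonstrated with a small counter. This is a plain-Python sketch; is_ok and the data here are made up for illustration:

```python
# all() over a list comprehension must build the whole list first,
# calling the predicate on every element; all() over a generator can
# stop at the first failing element.
checks = 0

def is_ok(x):
    """Predicate that counts how many times it is evaluated."""
    global checks
    checks += 1
    return x < 5

data = list(range(100))

checks = 0
all([is_ok(x) for x in data])   # list: predicate runs on all 100 elements
list_checks = checks

checks = 0
all(is_ok(x) for x in data)     # generator: stops after the first False
gen_checks = checks
```

The generator form also avoids materialising the intermediate list, which is the memory argument made above; whether it is faster in practice depends on the case, which is exactly the nuance raised in the next comments.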
comment:26 Changed 6 months ago by
Are you sure? I'd be happy to do it either way, but I've been told that using a list is faster. See, for example, ticket:11606#comment:12.
comment:27 Changed 6 months ago by
Oops, here goes my belief that using lists made things slower.
Thanks for telling me and for the reference.
comment:28 Changed 6 months ago by
Using an iterator seems much more pythonic and must use less memory, so maybe it's better even though it may be a bit slower. But I'm no expert and just try to do what I'm told (and often get confused about what I'm supposed to do).
comment:29 followup: ↓ 30 Changed 5 months ago by
I am semi-regularly getting errors with the new doctest:
File "src/sage/graphs/bipartite_graph.py", line 326, in sage.graphs.bipartite_graph.BipartiteGraph.__init__
Failed example:
    print(round(get_memory_usage() - start_mem))
Expected:
    0.0
Got:
    1.0
This is on OS X Big Sur and all recent Sage betas and rcs. I can reproduce with an interactive Sage session: if I run the loop
sage: for _ in range(10):
....:     make_bip_graph(A)
....:
it will sometimes cause an increase in memory usage.
comment:30 in reply to: ↑ 29 Changed 5 months ago by
Replying to jhpalmieri:
I am semiregularly getting errors with the new doctest
File "src/sage/graphs/bipartite_graph.py", line 326, in sage.graphs.bipartite_graph.BipartiteGraph.__init__
Failed example:
    print(round(get_memory_usage() - start_mem))
Expected:
    0.0
Got:
    1.0

This is on OS X Big Sur and all recent Sage betas and rcs. I can reproduce with an interactive Sage session: if I run the loop

sage: for _ in range(10):
....:     make_bip_graph(A)
....:

it will sometimes cause an increase in memory usage.
Can you capture the particular A for which you get this failure, and attach it here? Of course, it might be macOS-specific...
comment:31 Changed 5 months ago by
I have attached one example, but the following is quite repeatable for me:
 start a fresh Sage session and do this:
sage: A = Matrix(ZZ, 100, 125)
sage: for i in range(A.nrows()):
....:     for j in Subsets(A.ncols()).random_element():
....:         A[i, j-1] = 1
....:
sage: def make_bip_graph(A):
....:     G = BipartiteGraph(A)
....:
sage: start_mem = get_memory_usage()
Then I repeat these commands
sage: for _ in range(100):
....:     make_bip_graph(A)
....:
sage: print(round(get_memory_usage() - start_mem))
and the output often, but not always, increases. Every time I've started Sage, I have seen an increase: it doesn't seem to rely on a very specific value of A.
Further, if in the same Sage session I redefine A by A = Matrix(...) and then use the loop for i in ..., then I don't see the memory leak any more, no matter how many times I iterate make_bip_graph(A).
Changed 5 months ago by
comment:32 followups: ↓ 33 ↓ 34 Changed 5 months ago by
Can it increase multiple times per session? Or is it just one time per session that this happens?
Also, what happens if you first do import gc and then call gc.collect() right before every call of get_memory_usage()?
comment:33 in reply to: ↑ 32 Changed 5 months ago by
Replying to mderickx:
Can it increase multiple times per session? Or is it just one time per session that this happens?
Also, what happens if you first do import gc and then call gc.collect() right before every call of get_memory_usage()?
Indeed, what John sees might be merely a bug in this doctest. Without gc.collect() calls, it is not reliable, IMHO.
E.g. in src/sage/geometry/polyhedron/base.py one has
Test that computing the face lattice does not lead to a memory leak::

    sage: import gc
    sage: _ = gc.collect()
    sage: P = polytopes.cube()
    sage: a = P.face_lattice()
    sage: n = get_memory_usage()
    sage: P = polytopes.cube()
    sage: a = P.face_lattice()
    sage: _ = gc.collect()
    sage: n == get_memory_usage()
    True
This is the only other memory leak test in the library which uses get_memory_usage(); other places merely use it for information (or it's just a useless import which can go).
I've opened #31671 to fix this.
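The gc.collect() pattern suggested above can be packaged into a small helper. This is only a sketch: it uses the stdlib tracemalloc as a stand-in for Sage's get_memory_usage(), the function name is made up, and the warm-up call is there so one-time caches and lazy imports are not mistaken for a leak:

```python
import gc
import tracemalloc

def net_growth(fn, repeats=100):
    """Rough leak probe: net traced allocation growth over repeated calls.

    Mirrors the doctest pattern discussed above: gc.collect() before each
    measurement, plus a warm-up call so one-time allocations (caches,
    lazy imports) don't look like a leak.
    """
    tracemalloc.start()
    fn()                 # warm-up: absorb one-time allocations
    gc.collect()
    before, _ = tracemalloc.get_traced_memory()
    for _ in range(repeats):
        fn()
    gc.collect()
    after, _ = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    return after - before

# A non-leaking function should show little or no net growth.
growth = net_growth(lambda: [i * i for i in range(1000)])
```

Even with collection, small fluctuations from allocator behaviour remain possible, which is why the comments below still see occasional nonzero deltas on macOS.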
comment:34 in reply to: ↑ 32 Changed 5 months ago by
Replying to mderickx:
Can it increase multiple times per session? Or is it just one time per session that this happens?
Multiple times per session, as long as I don't change the matrix A.
Also, what happens if you first do import gc and then call gc.collect() right before every call of get_memory_usage()?
Doesn't help:
% sage
┌────────────────────────────────────────────────────────────────────┐
│ SageMath version 9.3.rc3, Release Date: 2021-04-12                 │
│ Using Python 3.9.2. Type "help()" for help.                        │
└────────────────────────────────────────────────────────────────────┘
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃ Warning: this is a pre-release version, and it may be unstable.    ┃
┗━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┛
sage: A = Matrix(ZZ, 100, 125)
sage: for i in range(A.nrows()):
....:     for j in Subsets(A.ncols()).random_element():
....:         A[i, j-1] = 1
....:
sage: def make_bip_graph(A):
....:     G = BipartiteGraph(A)
....:
sage: import gc
sage: gc.collect()
349
sage: start_mem = get_memory_usage()
sage: for _ in range(100):
....:     make_bip_graph(A)
....:
sage: gc.collect()
763
sage: print(round(get_memory_usage() - start_mem))
16.0
comment:35 Changed 5 months ago by
(Un)fortunately, this is platform-dependent: I can't reproduce this on Linux.
comment:36 Changed 5 months ago by
What packages might affect this? I can enable/disable more homebrew packages and see if that makes a difference.
comment:37 Changed 5 months ago by
 Cc dcoudert added
It doesn't use anything external, I think; it's just sagelib and Cython, IMHO.
comment:38 followup: ↓ 39 Changed 5 months ago by
On macOS 10.5.7 I observe a very small memory increase the first time we call the graph constructor. Could it be due to a lazy import or something like that?
sapristi:sage dcoudert$ ./sage
┌────────────────────────────────────────────────────────────────────┐
│ SageMath version 9.3.rc3, Release Date: 2021-04-12                 │
│ Using Python 3.9.2. Type "help()" for help.                        │
└────────────────────────────────────────────────────────────────────┘
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃ Warning: this is a pre-release version, and it may be unstable.    ┃
┗━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┛
sage: m = 150
....: n = 200
....: A = Matrix(ZZ, m, n)
....: for j in range(n):
....:     i = randint(0, m-2)
....:     A[i, j] = 1
....:     A[i+1, j] = 1
....: def graph(A):
....:     G = Graph(A)
....: def bipartite(A):
....:     G = BipartiteGraph(A)
....: start_mem = get_memory_usage()
....: for n in [1..10]:
....:     prev_mem = get_memory_usage()
....:     graph(A)
....:     print("graph:", get_memory_usage() - prev_mem)
....:     prev_mem = get_memory_usage()
....:     bipartite(A)
....:     print("bipartite:", get_memory_usage() - prev_mem, get_memory_usage() - start_mem)
graph: 0.171875
bipartite: 0.0 0.171875
graph: 0.0
bipartite: 0.0 0.171875
graph: 0.0
bipartite: 0.0 0.171875
graph: 0.0
bipartite: 0.0 0.171875
graph: 0.0
bipartite: 0.0 0.171875
graph: 0.0
bipartite: 0.0 0.171875
graph: 0.0
bipartite: 0.0 0.171875
graph: 0.0
bipartite: 0.0 0.171875
graph: 0.0
bipartite: 0.0 0.171875
graph: 0.0
bipartite: 0.0 0.171875
sapristi:sage dcoudert$ ./sage
┌────────────────────────────────────────────────────────────────────┐
│ SageMath version 9.3.rc3, Release Date: 2021-04-12                 │
│ Using Python 3.9.2. Type "help()" for help.                        │
└────────────────────────────────────────────────────────────────────┘
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃ Warning: this is a pre-release version, and it may be unstable.    ┃
┗━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┛
sage: A = Matrix(ZZ, 100, 125)
sage: for i in range(A.nrows()):
....:     for j in Subsets(A.ncols()).random_element():
....:         A[i, j-1] = 1
....:
sage: def make_bip_graph(A):
....:     G = BipartiteGraph(A)
....:
sage: import gc
sage: gc.collect()
271
sage: start_mem = get_memory_usage()
sage: for _ in range(100):
....:     make_bip_graph(A)
....:
sage: gc.collect()
82
sage: print(round(get_memory_usage() - start_mem))
0.0
comment:39 in reply to: ↑ 38 Changed 5 months ago by
Replying to dcoudert:
On macOS 10.5.7 I observe a very small memory increase the first time we call the graph constructor. Could it be due to a lazy import or something like that?
Can you reproduce what's reported by John on macOS, or are we down to an even more selective bug in his case (perhaps CPU matters too)?
comment:40 followup: ↓ 41 Changed 5 months ago by
When I try the branch of this ticket on macOS (10.5.7, intel CPU), I have no doctest errors in bipartite_graph.py.
When I start a session and copy/paste the test, I have a random result:
sapristi:sage dcoudert$ ./sage
┌────────────────────────────────────────────────────────────────────┐
│ SageMath version 9.3.rc4, Release Date: 2021-04-18                 │
│ Using Python 3.9.2. Type "help()" for help.                        │
└────────────────────────────────────────────────────────────────────┘
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃ Warning: this is a pre-release version, and it may be unstable.    ┃
┗━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┛
sage: A = Matrix(ZZ, 100, 125)
....: for i in range(A.nrows()):
....:     for j in Subsets(A.ncols()).random_element():
....:         A[i, j-1] = 1
....: def make_bip_graph(A):
....:     G = BipartiteGraph(A)
....: import gc
....: gc.collect()
....: start_mem = get_memory_usage()
....: for _ in range(100):
....:     make_bip_graph(A)
....: gc.collect()
....: print(round(get_memory_usage() - start_mem))
....:
282
0
0.0
sage:
Exiting Sage (CPU time 0m1.49s, Wall time 0m19.73s).
sapristi:sage dcoudert$ ./sage
┌────────────────────────────────────────────────────────────────────┐
│ SageMath version 9.3.rc4, Release Date: 2021-04-18                 │
│ Using Python 3.9.2. Type "help()" for help.                        │
└────────────────────────────────────────────────────────────────────┘
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃ Warning: this is a pre-release version, and it may be unstable.    ┃
┗━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┛
sage: A = Matrix(ZZ, 100, 125)
....: for i in range(A.nrows()):
....:     for j in Subsets(A.ncols()).random_element():
....:         A[i, j-1] = 1
....: def make_bip_graph(A):
....:     G = BipartiteGraph(A)
....: import gc
....: gc.collect()
....: start_mem = get_memory_usage()
....: for _ in range(100):
....:     make_bip_graph(A)
....: gc.collect()
....: print(round(get_memory_usage() - start_mem))
....:
282
0
1.0
comment:41 in reply to: ↑ 40 Changed 5 months ago by
Replying to dcoudert:
When I try the branch of this ticket on macOS (10.5.7, intel CPU), I have no doctest errors in bipartite_graph.py.
Can you try running the doctest several times? Does it pass 10 times in a row? Passing or failing is random for me.
comment:42 Changed 5 months ago by
Passed 7 times and failed 3 times :(
comment:43 Changed 5 months ago by
Thank you for checking. I've tried with the system Python, homebrew's Python, and Sage's, and I get failures with all three. I don't know what else to do to investigate this.
comment:44 Changed 5 months ago by
I presume it must be a leak in a Cython backend, but which one I don't know (not that I tried looking, though)
For me,
does NOT show increased memory usage. Putting the other commands inside the loop doesn't create a leak either. This is on Sage 9.2.