Opened 9 years ago

Closed 9 years ago

#9863 closed defect (fixed)

Error in sage/graphs/genus.pyx on ia64-Linux-suse

Reported by: mpatel Owned by: mvngu
Priority: blocker Milestone: sage-4.6.1
Component: doctest coverage Keywords:
Cc: jhpalmieri, mderickx Merged in: sage-4.6.1.rc0
Authors: Maarten Derickx, Mitesh Patel Reviewers: Mitesh Patel, Jeroen Demeyer
Report Upstream: N/A Work issues:
Branch: Commit:
Dependencies: Stopgaps:

Description

Reported by John Palmieri on sage-release:

On iras (ia64-Linux-suse), after continuing the build, one failure:

sage -t -long "devel/sage/sage/graphs/genus.pyx"
**********************************************************************
File "/home/palmieri/iras/sage-4.5.2.alpha0/devel/sage/sage/graphs/
genus.pyx", line 129:
    sage: get_memory_usage(t)
Expected:
    0.0
Got:
    -0.28125

Related: #9584, which once mentioned this problem, but ended up fixing another.

Attachments (1)

trac_9863-genus_memory_leak_test.patch (1.3 KB) - added by mpatel 9 years ago.
Tweak flaky test with Maarten Derickx's suggestions

Download all attachments as: .zip

Change History (24)

comment:1 Changed 9 years ago by mpatel

  • What happens if you run the test manually, on the Sage command-line?
    sage: import sage.graphs.genus
    sage: G = graphs.CompleteGraph(100)
    sage: G = Graph(G, implementation='c_graph', sparse=False)
    sage: t = get_memory_usage()
    sage: gb = sage.graphs.genus.simple_connected_genus_backtracker(G._backend._cg)
    sage: gb = None  #indirect doctest
    sage: get_memory_usage(t)
    0.0
    
  • Could this be a bug in sage.misc.getusage.get_memory_usage?
  • Should we really test for equality?

comment:2 Changed 9 years ago by jhpalmieri

What happens if you run the test manually, on the Sage command-line?

Well, right now I can't even get it to fail when running

sage -t -long "devel/sage/sage/graphs/genus.pyx"

by itself. I can only get it to fail reliably if I use parallel testing, say

sage -tp 2 -long "devel/sage/sage/graphs/"

I tried running the testsuite from one window and "sage -t ... genus.pyx" in another, but that didn't fail, either.

Could this be a bug in sage.misc.getusage.get_memory_usage?

Is it reasonable to get negative numbers for this?

Should we change the test to

sage: get_memory_usage(t) <= 0
True

?

comment:3 Changed 9 years ago by mpatel

  • Priority changed from major to blocker

comment:4 Changed 9 years ago by mderickx

I find the test code which causes al the discussion in this ticket broken. The code tries to test if the deallocation of simple_connected_genus_backtracker works so to prevent memory leaks. But when I test it in sage (on os x 10.6 since I don't have acces to a suse installation) I get the following result with measuring how many memory the backtracker actually takes:

sage: import sage.graphs.genus
sage: G = graphs.!CompleteGraph(100)
sage: G = Graph(G, implementation='c_graph', sparse=False)
sage: t = get_memory_usage()
sage: gb = sage.graphs.genus.simple_connected_genus_backtracker(G._backend._cg)
sage: get_memory_usage(t)
0.0
sage: gb = None  #indirect doctest
sage: get_memory_usage(t)
0.0

So the backtracker seems to take an unmeasurable amount of memory in the first place, deallocation working properly is therefore not tested at all. (at least not os os x 10.6, I would love to see this test replicated on the machine on which the paralel doctesting gives errors).

comment:5 Changed 9 years ago by mderickx

  • Cc mderickx added

I know that python uses a garbage collector, I don't really know the detailed workings of the garbage collector, neither do I know all the internals of the implementation of the graphs code used here. But I do know that from first sight there might be a possibility that the garbage collector might collect some garbagage created by the second assignment to G in between the two "get_memory_usage" statements.

There is an easy way to rule out this possibilty by changing the test code to:

sage: import sage.graphs.genus
sage: G = graphs.CompleteGraph(100)
sage: H = Graph(G, implementation='c_graph', sparse=False)
sage: t = get_memory_usage()
sage: gb = sage.graphs.genus.simple_connected_genus_backtracker(H._backend._cg)
sage: gb = None  #indirect doctest
sage: get_memory_usage(t)
0.0

In this way, you don't overwrite the reference to graphs.CompleteGraph?(100) so there is no possibility of accidental garbage collection.

comment:6 Changed 9 years ago by mpatel

What if we use the gc module to wrap some lines in a gc.enable() / gc.disable() block? Or use gc.collect? There's also sage.interfaces.expect.gc_disabled(), a with statement context manager.

comment:7 Changed 9 years ago by mderickx

I think a gc.collect would be a good thing to do right before both of the get_memory_usage statements. I think it's even better then my previous suggestion to just rename change the second asignment to G in H. The first collect would be to make sure the measurements don't get messed up by things that where collectable already at the first get_memory_usage() statement.

The second collect would be to make sure that objects not already freed by the reference counting, but could be garbage collected still get freed. This should be done since these objects also wouldn't account to a memory leak. I suggest a code similar to the following.

sage: import sage.graphs.genus
sage: import gc
sage: G = graphs.CompleteGraph(100)
sage: G = Graph(G, implementation='c_graph', sparse=False)
sage: gc.collect()   #random
54
sage: t = get_memory_usage()
sage: for i in xrange(10000):
....:     gb = sage.graphs.genus.simple_connected_genus_backtracker(G._backend._cg)
....:     gb = None  #indirect doctest
....:     
sage: gc.collect()
0
sage: get_memory_usage(t)
0.0

ps. I added the for loop to make sure that if there really is a memory leak, it will be found. As shown in my first remark, the memory used by the tested class is insignificant.

comment:8 follow-up: Changed 9 years ago by mpatel

I ran mderickx's "gc.collect" test 100 times in the Sage CLI on bsd, redhawk, sage, and t2.math. I always got 0.0 on the first three. On t2, I sometimes got -1.0 or -2.0.

comment:9 Changed 9 years ago by mpatel

Looking at how differently sage.misc.getusage.get_memory_usage works on different types of systems --- in particular, the precision varies --- I think we'll also need to weaken the test to get_memory_usage(t) <= 0, as John suggests. I don't know if there's a good general answer.

Does the iras test, when it fails, always give -0.28125?

comment:10 in reply to: ↑ 8 Changed 9 years ago by drkirkby

Replying to mpatel:

I ran mderickx's "gc.collect" test 100 times in the Sage CLI on bsd, redhawk, sage, and t2.math. I always got 0.0 on the first three. On t2, I sometimes got -1.0 or -2.0.

t2.math will use prstat for this. Other systems use other methods, including system calls and using top, which is very similar to prstat.

comment:11 Changed 9 years ago by jhpalmieri

Does the iras test, when it fails, always give -0.28125?

I just reran the test and got -0.25.

comment:12 follow-up: Changed 9 years ago by mderickx

I also think that testing equality might by to strikt. While reading the source code I read how get_memory_usage() works. And on linux it works by asking top or prstat or the /proc filesystem how much memory the entire python process is using in which sage is running. To fully understand the fluctuations in the memory usage, we would need to know all the details of the implementation of the CPython virtual machine which interprets the python bytecode. I suspect that the virtual machine is not asking memory from the os using malloc for every python object it creates, but instead it uses malloc to sometimes ask for a chuck of memory in which it can fiddle around and only asks for a new peace of memory after it filled the old one (all this for performance reasons). So deallocating memory would also proably go in the same chunks. The scenario described above, and maybe many other scenarios might cause that changes in how many memory is used by python objects might not be in a 1-1 correspodence with changes in the total memory used by the python process.

I'm curious if the observed change in memory usages really has something to do with genus backtracker code. To bad I don't have acces to a linux machine with a sage install, for then I could test certain things myself. I really wonder if the problem also occurs if you change the assignment: gb = sage.graphs.genus.simple_connected_genus_backtracker(G._backend._cg) to something else.

The observed difference in precision is probably because of the way that python prints floats.

sage: 2.132311r-2.132311r
0.0

comment:13 Changed 9 years ago by mderickx

If we really want to do this cleanly, then we should use something like guppy: http://guppy-pe.sourceforge.net/. This frees us from worying about all the things that happen outside the python objects.

comment:14 in reply to: ↑ 12 Changed 9 years ago by drkirkby

Replying to mderickx:

I also think that testing equality might by to strikt. While reading the source code I read how get_memory_usage() works. And on linux it works by asking top or prstat or the /proc filesystem how much memory the entire python process is using in which sage is running.

prstat is never used on Linux - only on Solaris. To my knowledge, prstat does not exist on anything other than Solaris.

top used to be commonly used on SunOS then Solaris systems, though it was never part of the operating system. But it has become increasing less useful with modern systems, so on Solaris at least, it is considered unreliable.

The bit of code in Sage which uses prstat is only run on Solaris (aka SunOS) is:

   elif U == 'sunos':
        # Sun's 'prstat' command appends K, M or G depending on whether 
        # the memory usage is in KB. MB or GB. So we need to strip off 
        # the letter, and convert to a consistent unit of MB. 
        memory_in_KB_MB_or_GB = top().split()[3]
        if memory_in_KB_MB_or_GB.endswith("K"): 
            m = float(memory_in_KB_MB_or_GB.strip("K")) / 1024
        elif memory_in_KB_MB_or_GB.endswith("M"):
            m = float(memory_in_KB_MB_or_GB.strip("M"))
        elif memory_in_KB_MB_or_GB.endswith("G"):
            m = float(memory_in_KB_MB_or_GB.strip("G")) * 1024

I wrote that bit, since top is not standard on Solaris, and even if installed, it is not very useful with modern Solaris versions.

These small numbers however are not going to be coming from the output of the top command either.

Dave

comment:15 Changed 9 years ago by mpatel

  • Milestone changed from sage-4.6 to sage-4.6.1
  • Priority changed from blocker to major

In view of #9897, I'm deferring [merging] this to after 4.6.

comment:16 Changed 9 years ago by mpatel

But I take it that our current plan is to update the doctest as Maarten suggests and not to test for equality?

comment:17 Changed 9 years ago by mpatel

This tests a leaky Cython function:

cython('from stdlib cimport *\ndef leak(m=100000):\n    cdef double *arr = <double *>malloc(m * sizeof(double))')

def g(n=100, m=100000):
    import gc
#    import sage.graphs.genus
#    G = graphs.CompleteGraph(100)
#    G = Graph(G, implementation='c_graph', sparse=False)
    gc.collect()
    t = get_memory_usage()
    for i in xrange(n):
        leak(m)
#        gb = sage.graphs.genus.simple_connected_genus_backtracker(G._backend._cg)
#        gb = None  #indirect doctest
    gc.collect()
    return get_memory_usage(t)

For large enough n and m, g(n,m) > 0.0, but the threshold values will vary from system to system (because of get_memory_usage's implementation, etc.).

comment:18 Changed 9 years ago by mpatel

  • Authors set to Maarten Derickx

I've attached a trial, not much-tested patch that uses Maarten's suggestions.

comment:19 Changed 9 years ago by mpatel

  • Authors changed from Maarten Derickx to Maarten Derickx, Mitesh Patel
  • Status changed from new to needs_review

Changed 9 years ago by mpatel

Tweak flaky test with Maarten Derickx's suggestions

comment:20 Changed 9 years ago by mpatel

The patch causes no problems for me on {bsd,sage,t2}.math and iras.skynet.

comment:21 Changed 9 years ago by jdemeyer

  • Priority changed from major to blocker

comment:22 Changed 9 years ago by jdemeyer

  • Reviewers set to Mitesh Patel, Jeroen Demeyer
  • Status changed from needs_review to positive_review

Patch looks good to me.

comment:23 Changed 9 years ago by jdemeyer

  • Merged in set to sage-4.6.1.rc0
  • Resolution set to fixed
  • Status changed from positive_review to closed
Note: See TracTickets for help on using tickets.