Opened 7 years ago

Closed 7 years ago

#13880 closed defect (fixed)

Respect ulimit -v for GAP memory pool size

Reported by: vbraun Owned by: jason
Priority: blocker Milestone: sage-5.6
Component: misc Keywords:
Cc: Merged in: sage-5.6.beta3
Authors: Volker Braun Reviewers: Dmitrii Pasechnik
Report Upstream: N/A Work issues:
Branch: Commit:
Dependencies: Stopgaps:

Description (last modified by vbraun)

Limit the GAP pool size to 1/10 of the ulimit -v, if the user has set one. Also, allow zero-sized swap in a doctest.

Apply trac_13880_gap_pool_RLIMIT_AS.patch

Attachments (2)

trac_13880_swap_doctest_fix.patch (2.0 KB) - added by vbraun 7 years ago.
Updated patch
trac_13880_gap_pool_RLIMIT_AS.patch (6.6 KB) - added by vbraun 7 years ago.
Updated patch

Download all attachments as: .zip

Change History (40)

Changed 7 years ago by vbraun

Updated patch

comment:1 Changed 7 years ago by vbraun

  • Authors set to Volker Braun
  • Status changed from new to needs_review

comment:2 Changed 7 years ago by vbraun

  • Description modified (diff)
  • Summary changed from Fix doctest for amout of swap to Respect ulimit -v for GAP memory pool size

comment:3 follow-up: Changed 7 years ago by leif

Why available_swap() in ZZ?


I'm not sure whether the fallback value of virtual_memory_limit() (self.available_swap() + self.available_ram()) is appropriate.

It doesn't take into account the memory already used by the process, and the available (or total) swap can be much larger than what's available to a single process (e.g. 16 GB swap space on a 32-bit processor or OS); similar for the RAM "available".

comment:4 in reply to: ↑ 3 ; follow-up: Changed 7 years ago by vbraun

Replying to leif:

Why available_swap() in ZZ?

To test that it returns an integer as promised in the docstring.

I've added a manual 2GB limit for the rare case of 32-bit machines with >> 2GB ram.

comment:5 in reply to: ↑ 4 Changed 7 years ago by leif

Replying to vbraun:

Replying to leif:

Why available_swap() in ZZ?

To test that it returns an integer as promised in the docstring.

Not that obvious, to me at least... ;-)

Still wonder whether we shouldn't also test that the result is actually non-negative.


I've added a manual 2GB limit for the rare case of 32-bit machines with >> 2GB ram.

Ok. (The more common case -- which I meant -- is having >> 2 GB swap space, which is handled now as well.)


Slightly related (memory_info.py probably deserved its own ticket):

available_ram() should probably have an optional boolean parameter to not just return the amount of free memory ("MemFree"), as this can often be pretty close to zero, due to buffering and especially caching. (I.e., the amount of "available" RAM, allocatable by a process without running out of [physical] memory, might in fact be much larger, regardless of whether swap space is available or not.)

As is, although its docstring contains "free" in parentheses, the term "available RAM" is IMHO potentially misleading.

W.r.t. GAP, I think it may currently "run out of memory" simply because [there's no or not enough swap space and] most (or at least much) of the physical memory is eaten up by just caches.

comment:6 Changed 7 years ago by jdemeyer

I disagree with

suggested_size = min(suggested_size, int(mem.virtual_memory_limit()/10))

This virtual_memory_limit is per-process, so the suggested size should be roughly equal to virtual_memory_limit, not much less. Maybe something like

# Respect ulimit -v, use slightly less than maximum allowed amount of memory
suggested_size = min(suggested_size, int(mem.virtual_memory_limit()*7/8))

comment:7 Changed 7 years ago by jdemeyer

  • Priority changed from major to blocker

comment:8 follow-up: Changed 7 years ago by vbraun

I'm sure you can improve on available_ram() but its good enough for starting GAP, I think. The perfect is the enemy of the good.

For libGAP the pool is actually in the Sage process, so what Jeroen suggests would be a bad idea.

Also, I don't think the user's intention when setting ulimit -v was for Sage to spawn a couple of processes that each try to get as close as possible to the limit.

comment:9 in reply to: ↑ 8 ; follow-up: Changed 7 years ago by jdemeyer

Replying to vbraun:

I'm sure you can improve on available_ram() but its good enough for starting GAP, I think. The perfect is the enemy of the good.

But I'd say that this patch actually makes things worse than they are. GAP currently works fine with ulimit -v, it automatically adjusts (remember the "halving pool size" messages).

For libGAP the pool is actually in the Sage process, so what Jeroen suggests would be a bad idea.

No, because this ticket doesn't refer to libGAP.

Also, I don't think the user's intention when setting ulimit -v was for Sage to spawn a couple of processes that each try to get as close as possible to the limit.

I don't think the user intends GAP to take 1/10th of the memory either. If I would set the memory limit to 2GB and GAP doesn't want to allocate 300MB, I would be quite surprised.

I do agree this is all bikeshedding, that's why I intentionally don't set this to needs_work.

comment:10 in reply to: ↑ 9 Changed 7 years ago by vbraun

Replying to jdemeyer:

But I'd say that this patch actually makes things worse than they are. GAP currently works fine with ulimit -v, it automatically adjusts (remember the "halving pool size" messages).

Not if you use libgap. It also adjusts to the point where you don't have any virtual memory left for Sage and then other stuff runs of of memory.

comment:11 Changed 7 years ago by leif

The libGAP discussion IMHO belongs to another ticket (and is IMHO pretty irrelevant at this point). Dima mentioned the GAP developers seem to be working on a new memory manager, probably also because the current one is unsuitable for a library, at least when used together with other libraries or memory managers.


Would it make sense to use a different pool size for doctesting (i.e., a close[r]-to-minimal one)?


I still believe the current implementation will prevent GAP from working if the RAM is filled up with buffers and caches (and there's no or "not enough" swap space), which to me doesn't seem to be too untypical. Haven't tried yet though.

comment:12 follow-ups: Changed 7 years ago by vbraun

The libGAP discussion is relevant because they share the function to determine the pool size.

If you have neither free ram nor any swap then we'll still reserve a pool that is large enough to run all long doctests.

comment:13 in reply to: ↑ 12 ; follow-up: Changed 7 years ago by nbruin

Replying to vbraun:

The libGAP discussion is relevant because they share the function to determine the pool size.

Perhaps the suggested_size routine needs a flag for_libgap=true/false then, to compute appropriate defaults for either situation. Obviously, the two cases ask for separate considerations in the face of per-process limits.

Isn't this ticket dealing with a sage in which libgap is non-existent? patching this upon introduction of libgap is rather straightforward.

comment:14 in reply to: ↑ 13 ; follow-up: Changed 7 years ago by leif

Replying to nbruin:

Isn't this ticket dealing with a sage in which libgap is non-existent? patching this upon introduction of libgap is rather straightforward.

That's what I meant with (currently) "irrelevant". (Besides that libGAP may come with a new / alternate memory manager when Sage is "ready" for it.)

comment:15 in reply to: ↑ 12 Changed 7 years ago by leif

Replying to vbraun:

If you have neither free ram nor any swap then we'll still reserve a pool that is large enough to run all long doctests.

Ok. That's probably enough for "ordinary" Sage sessions as well.

Not sure right now how it behaves when users happen to need a larger pool size. (They should get an appropriate error message, and the pool size should be easily customizable. AFAIK, the latter currently requires modifying $DOT_SAGE/init.sage.)

comment:16 Changed 7 years ago by vbraun

You can always use sage.interfaces.gap.set_gap_memory_pool_size() before starting any gap computation if you want to set it manually. This is not changed in this ticket.

comment:17 in reply to: ↑ 14 ; follow-up: Changed 7 years ago by vbraun

Replying to leif:

Besides that libGAP may come with a new / alternate memory manager when Sage is "ready" for it.

LibGAP is ready. It only needs review for a few obvious improvements over the previously reviewed version. I.e. will probably never be merged considering the glacial process. But ready it is.

comment:18 in reply to: ↑ 17 Changed 7 years ago by dimpase

Replying to vbraun:

Replying to leif:

Besides that libGAP may come with a new / alternate memory manager when Sage is "ready" for it.

LibGAP is ready. It only needs review for a few obvious improvements over the previously reviewed version. I.e. will probably never be merged considering the glacial process.

I'm working on this from a crevasse in a local Singaporean glacier. It helps against hardware overheating...

comment:19 Changed 7 years ago by dimpase

I am testing this on a system where mem.available_swap() comes up negative:

sage: t = mem._parse_proc_meminfo(); t
{'available_ram': 1435963392, 'total_swap': 10909376512, 'available_swap': 8558673920, 'Committed_AS': 16000847872, 'total_ram': 7929249792}
sage: mem.available_swap()
-5091471360

which is not surprising, as mem.available_swap() computes t['total_swap']-t['Committed_AS'], rather than returning t['available_swap'].

What is going on? A bug or a feature?

That's how far I get digging up the cause of

sage -t -long "devel/sage-main/sage/misc/memory_info.py"    
**********************************************************************
File "/usr/local/src/sage/sage-5.5/devel/sage-main/sage/misc/memory_info.py", line 122:
    sage: mem.virtual_memory_limit() > 0
Expected:
    True
Got:
    False
**********************************************************************
1 items had failures:

comment:20 follow-up: Changed 7 years ago by dimpase

How about the following patch:

diff --git a/sage/misc/memory_info.py b/sage/misc/memory_info.py
--- a/sage/misc/memory_info.py
+++ b/sage/misc/memory_info.py
@@ -263,7 +263,10 @@
         """
         info = self._parse_proc_meminfo()
         try:
-            return info['total_swap'] - info['Committed_AS']
+            dif = info['total_swap'] - info['Committed_AS']
+            if dif > 0:
+               return dif
+            return info['available_swap']
         except KeyError:
             return info['available_swap']

at least this fixes the doctests on the system in question

comment:21 in reply to: ↑ 20 Changed 7 years ago by dimpase

Replying to dimpase:

How about the following patch:

diff --git a/sage/misc/memory_info.py b/sage/misc/memory_info.py
--- a/sage/misc/memory_info.py
+++ b/sage/misc/memory_info.py
@@ -263,7 +263,10 @@
         """
         info = self._parse_proc_meminfo()
         try:
-            return info['total_swap'] - info['Committed_AS']
+            dif = info['total_swap'] - info['Committed_AS']
+            if dif > 0:
+               return dif
+            return info['available_swap']
         except KeyError:
             return info['available_swap']

at least this fixes the doctests on the system in question

I actually have no clear idea why 'Committed_AS' gets into the picture here. E.g. I read

Committed_AS: An estimate of how much RAM you would need to make a 99.99% guarantee that there never is OOM (out of memory) for this workload. Normally the kernel will overcommit memory. That means, say you do a 1GB malloc, nothing happens, really. Only when you start USING that malloc memory you will get real memory on demand, and just as much as you use. So you sort of take a mortgage and hope the bank doesn't go bust. Other cases might include when you mmap a file that's shared only when you write to it and you get a private copy of that data. While it normally is shared between processes. The Committed_AS is a guesstimate of how much RAM/swap you would need worst-case.

So this is a worst-case estimate, and info['total_swap'] < info['Committed_AS'] means, from the OS point of view, that you're living on the edge. Well, maybe. And the measurement is taken while make -j4 ptestlong on this 4-processor machine.

Last edited 7 years ago by dimpase (previous) (diff)

comment:22 Changed 7 years ago by jdemeyer

  • Status changed from needs_review to needs_work
  • Work issues set to mem.available_swap() can be negative

comment:23 Changed 7 years ago by jdemeyer

dimpase: the new test simply checks

MemoryInfo().available_swap() in ZZ

so perhaps this is a feature, not a bug? I don't mind that available_swap() can be negative.

comment:24 follow-up: Changed 7 years ago by vbraun

The problem is that I don't know an easy way to report "available swap". It could all be reserved by some other gap session for all we know. So Total-Committed_AS is a good approximation. Yes it can go negative if the kernel is overcommitting, but one could argue that it is then to be expected that a negative number is returned. In any case, it works as I intended it. If you have a better metric for available swap that doesn't involve going through the VM allocations of every process then I'd be happy to implement that. But in its present state it is good enough for starting GAP; if Committed_AS > Total then you probably want the minimial sized pool for GAP.

comment:25 Changed 7 years ago by vbraun

  • Status changed from needs_work to needs_review
  • Work issues mem.available_swap() can be negative deleted

comment:26 in reply to: ↑ 24 Changed 7 years ago by dimpase

Replying to vbraun:

The problem is that I don't know an easy way to report "available swap".

how about self._parse_proc_meminfo()['available_swap'] ?

As it is now, this value can be different from MemoryInfo().available_swap(), and this is rather confusing.

It could all be reserved by some other gap session for all we know.

but we also know that this might be a huge overestimate.

So Total-Committed_AS is a good approximation.

at least it should be named differently, not MemoryInfo().available_swap(), to avoid the clash with self._parse_proc_meminfo()['available_swap']

Yes it can go negative if the kernel is overcommitting, but one could argue that it is then to be expected that a negative number is returned. In any case, it works as I intended it. If you have a better metric for available swap that doesn't involve going through the VM allocations of every process then I'd be happy to implement that. But in its present state it is good enough for starting GAP; if Committed_AS > Total then you probably want the minimial sized pool for GAP.

comment:27 follow-up: Changed 7 years ago by vbraun

/proc/meminfo doesn't exclude reserved yet unused swap in the SwapFree field.

I've renamed the (private) dict key to free_swap. In any case thats not visible to the user.

comment:28 in reply to: ↑ 27 Changed 7 years ago by dimpase

Replying to vbraun:

/proc/meminfo doesn't exclude reserved yet unused swap in the SwapFree field.

I've renamed the (private) dict key to free_swap. In any case thats not visible to the user.

This is not enough. I still get

File "/usr/local/src/sage/sage-5.6.beta2/devel/sage-main/sage/misc/memory_info.py", line 122:
    sage: mem.virtual_memory_limit() > 0
Expected:
    True
Got:
    False

virtual_memory_limit() as implemented is too conservative, I think. Indeed:

sage: from sage.misc.memory_info import MemoryInfo
sage: mem = MemoryInfo()
sage: mem.av
mem.available_ram   mem.available_swap  
sage: mem.available_swap()
-5123952640
sage: mem.available_ram() 
1262751744
sage: mem.virtual_memory_limit()
-3861200896

unless you will allow virtual_memory_limit() be negative, too, but this is IMHO even weirder... Shouldn't an implementation respect the overcommitting settings somehow?

Changed 7 years ago by vbraun

Updated patch

comment:29 follow-up: Changed 7 years ago by vbraun

I've change virtual_memory_limit() to default to total swap+ram instead of available swap+ram. Now its always positive (and quite large unless you set a limit manually).

Last edited 7 years ago by vbraun (previous) (diff)

comment:30 in reply to: ↑ 29 Changed 7 years ago by dimpase

Replying to vbraun:

I've change virtual_memory_limit() to default to total swap+ram instead of available swap+ram. Now its always positive (and quite large unless you set a limit manually).

Why don't you use 'free_swap' instead of 'available_swap' ? This reflects the reality and the system settings (for overcommitting) far better.

With 'total swap'+'total ram' one would potentially end up with too much swap reserved for GAP.

comment:31 follow-up: Changed 7 years ago by vbraun

No, the virtual memory limit should only be used for the pool size if either it was set by hand to a relatively small value or if the address space is small compared to ram size (32-bit PAE).

comment:32 in reply to: ↑ 31 Changed 7 years ago by dimpase

Replying to vbraun:

No, the virtual memory limit should only be used for the pool size if either it was set by hand to a relatively small value or if the address space is small compared to ram size (32-bit PAE).

The docstring for virtual memory limit says:

    This is the value set by ``ulimit -v`` at the command line or
    a practical limit if no limit is set.

This does not describe the current implementation. (And it wasn't really true before, either).

comment:33 follow-up: Changed 7 years ago by vbraun

Why not? For all practical purposes, the limit is either swap+ram or your processor address limit if you don't set it.

comment:34 in reply to: ↑ 33 Changed 7 years ago by dimpase

Replying to vbraun:

Why not? For all practical purposes, the limit is either swap+ram or your processor address limit if you don't set it.

because you neither return the value X set by ulimit -v (you return X+Y, for some nonzero Y), nor anything practical (no single process can reach this limit, ever) to speak about.

comment:35 Changed 7 years ago by vbraun

Are you sure? I do return RLIMIT_AS if set:

[vbraun@volker-desktop ~]$ ulimit -v 12341234
[vbraun@volker-desktop ~]$ sage 
...
sage: from sage.misc.memory_info import MemoryInfo  
sage: MemoryInfo().virtual_memory_limit()
12637423616
sage: _/1024
12341234

And you probably can request total ram+swap thanks to overcommit if not much else is going on or the overcommit ratio is stupidly high.

comment:36 Changed 7 years ago by dimpase

  • Status changed from needs_review to positive_review

comment:37 Changed 7 years ago by jdemeyer

  • Reviewers set to Dmitrii Pasechnik

comment:38 Changed 7 years ago by jdemeyer

  • Merged in set to sage-5.6.beta3
  • Resolution set to fixed
  • Status changed from positive_review to closed
Note: See TracTickets for help on using tickets.