Opened 9 months ago

Closed 8 months ago

#22633 closed defect (fixed)

Pari default stack size results in huge physical memory commit after fork on Cygwin

Reported by: embray Owned by:
Priority: major Milestone: sage-8.0
Component: porting: Cygwin Keywords: days85 windows cygwin pari mmap
Cc: jdemeyer Merged in:
Authors: Erik Bray, Jeroen Demeyer Reviewers: Erik Bray, Jeroen Demeyer
Report Upstream: Fixed upstream, but not in a stable release. Work issues:
Branch: 343f241 (Commits) Commit: 343f241bdd3a262cbff496ca9ec6ad6dfd8fbb4a
Dependencies: Stopgaps:

Description (last modified by jdemeyer)

It turns out #21582 causes big problems for fork on Cygwin. This is due to an implementation detail of how Cygwin handles copy-on-write private mappings when forking. A fact that might, unfortunately, be difficult to avoid. The problem with this on Cygwin is that, while Windows does not commit pages to physical memory until they are accessed, they do become committed upon any access (even reads, when they haven't already been written to).

So when a process with a private mmap is forked, it loops over all pages in the mmap'd region and copies them into the child's copy of the mmap'd region. This results in committing physical memory on both the parent and the child, even for pages that haven't been written to yet.

TL;DR, if there's a huge private|anonymous mmap in a process, and that process is forked, it will commit the full size of the mmap to memory in both the parent and child processes.

This is a big problem in Sage since we set a very large default stack size for Pari. This was not a problem prior to #21582, since Pari used the MAP_NORESERVE flag which circumvents this issue (only dirty pages need to be copied). The changes in #21582 make sense for Linux, but for Cygwin the opposite is true. Different but unfortunate implementation details on both platforms.

The best way forward is to use MAP_NORESERVE anyway, which helps Cygwin and won't hurt other platforms.

Upstream bug: https://pari.math.u-bordeaux.fr/cgi-bin/bugreport.cgi?bug=1912

Change History (27)

comment:1 Changed 9 months ago by embray

  • Authors set to Erik Bray
  • Branch set to u/embray/cygwin/pari-mmap
  • Commit set to 48b762121bc08048dc269732d7c72561b4f34978
  • Report Upstream changed from N/A to Not yet reported upstream; Will do shortly.

The additional patch I've attached seems to fix this, but I would like to do more testing to see if any more work is needed.

This definitely fixes the immediate problem of memory being eaten up after fork(), but I don't know if there are any other implications for these changes on Cygwin (I suspect the rest is fine, however).


New commits:

48b7621Add patch to Pari (on top of prot_none_1.patch) that restores use of MAP_NORESERVE on Cygwin

comment:2 follow-up: Changed 9 months ago by embray

Per jdemeyer it might be fine to simplify this by including MAP_NORESERVE on all platforms that support it. Now that PROT_NONE is used I think the MAP_NORESERVE should be harmless on Linux (though I'd like to know a way to check that...)

comment:3 in reply to: ↑ 2 Changed 9 months ago by jdemeyer

Replying to embray:

Now that PROT_NONE is used I think the MAP_NORESERVE should be harmless on Linux (though I'd like to know a way to check that...)

If you have a patch ready, I can test it.

comment:4 Changed 9 months ago by jdemeyer

Do you want me to report this upstream?

comment:5 Changed 9 months ago by git

  • Commit changed from 48b762121bc08048dc269732d7c72561b4f34978 to a724b2a0fd86f4894de05f3a0db0b9e078c7d36d

Branch pushed to git repo; I updated commit sha1. New commits:

a724b2aUpdate the patch to always apply MAP_NORESERVE if it's defined (not just on Cygwin)

comment:6 Changed 9 months ago by embray

I just updated the patch with your suggestion. I'll report it now.

comment:7 Changed 9 months ago by embray

  • Report Upstream changed from Not yet reported upstream; Will do shortly. to Reported upstream. No feedback yet.

comment:8 Changed 9 months ago by jdemeyer

  • Description modified (diff)

comment:9 follow-up: Changed 9 months ago by jdemeyer

Can you check if it suffices to add MAP_NORESERVE only to the PROT_NONE allocations? That would be more in line with the intent of the current code. See also my answer to the PARI bug.

comment:10 Changed 9 months ago by jdemeyer

Beware of #22675: we might need to synchronize these tickets.

comment:11 Changed 9 months ago by embray

I can update these after #22675.

comment:12 Changed 9 months ago by embray

Hm, for some reason, while I got Bill's initial reply to my report I didn't get your response in my e-mail. Maybe I need to explicitly subscribe to updates on the bug...

Yeah, I can try making that change. I don't see what difference it would make--what harm does it have on the other allocations? But if you think it will help I can try it. It should be fine.

comment:13 in reply to: ↑ 9 Changed 9 months ago by jdemeyer

Replying to jdemeyer:

Can you check if it suffices to add MAP_NORESERVE only to the PROT_NONE allocations? That would be more in line with the intent of the current code. See also my answer to the PARI bug.

ping

comment:14 Changed 9 months ago by embray

Right. Working on it.

comment:15 Changed 9 months ago by embray

A minimal example in C seems to indicate that this would not work. Perhaps it's a limitation of Cygwin and/or Windows, but calling mmap(..., PROT_NONE, MAP_NORESERVE) at the address of a previous mmap(NULL, ..., PROT_READ|PROT_WRITE, ...) without MAP_NORESERVE exhibits the same bug on fork.

Last edited 9 months ago by embray (previous) (diff)

comment:16 Changed 9 months ago by jdemeyer

Thanks for testing.

I will push a new patch here, hang on.

comment:17 Changed 9 months ago by jdemeyer

  • Branch changed from u/embray/cygwin/pari-mmap to u/jdemeyer/cygwin/pari-mmap

comment:18 Changed 9 months ago by jdemeyer

  • Commit changed from a724b2a0fd86f4894de05f3a0db0b9e078c7d36d to 795e3f127240b8138f526369b9548c868b4628f9

Does this work for you?


New commits:

795e3f1mmap() PARI stack with MAP_NORESERVE (for Cygwin)

comment:19 Changed 9 months ago by jdemeyer

Hang on, I will add a check for the return value of mmap().

comment:20 Changed 9 months ago by git

  • Commit changed from 795e3f127240b8138f526369b9548c868b4628f9 to 343f241bdd3a262cbff496ca9ec6ad6dfd8fbb4a

Branch pushed to git repo; I updated commit sha1. This was a forced push. New commits:

343f241mmap() PARI stack with MAP_NORESERVE (for Cygwin)

comment:21 Changed 9 months ago by embray

  • Status changed from new to needs_review

comment:22 Changed 9 months ago by jdemeyer

Does it work on Cygwin?

comment:23 Changed 9 months ago by embray

  • Status changed from needs_review to positive_review

It looks like the explicit munmap before remapping works. I don't exactly see what advantage this has over my original patch but this is fine too.

comment:24 Changed 9 months ago by jdemeyer

  • Report Upstream changed from Reported upstream. No feedback yet. to Fixed upstream, but not in a stable release.

Accepted by upstream.

comment:25 Changed 9 months ago by vbraun

  • Status changed from positive_review to needs_work

Reviewer name missing...

comment:26 Changed 9 months ago by jdemeyer

  • Authors changed from Erik Bray to Erik Bray, Jeroen Demeyer
  • Reviewers set to Erik Bray, Jeroen Demeyer
  • Status changed from needs_work to positive_review

comment:27 Changed 8 months ago by vbraun

  • Branch changed from u/jdemeyer/cygwin/pari-mmap to 343f241bdd3a262cbff496ca9ec6ad6dfd8fbb4a
  • Resolution set to fixed
  • Status changed from positive_review to closed
Note: See TracTickets for help on using tickets.