Opened 3 years ago

Closed 3 years ago

#20463 closed defect (fixed)

Pari segfault on Sage startup in Cygwin (2)

Reported by: embray Owned by:
Priority: major Milestone: sage-7.3
Component: porting: Cygwin Keywords: cygwin windows
Cc: Merged in:
Authors: Erik Bray Reviewers: Jeroen Demeyer
Report Upstream: Fixed upstream, in a later stable release. Work issues:
Branch: 856f128 (Commits) Commit: 856f128777c04a88e4efe6295e98292a2f0cc7e2
Dependencies: Stopgaps:

Description (last modified by jdemeyer)

Due to a Cygwin bug https://cygwin.com/ml/cygwin/2016-05/msg00140.html there is a segmentation fault when the PARI maximum stack size is larger than 4GB.

The issue has also been reported to PARI: http://pari.math.u-bordeaux.fr/archives/pari-dev-1605/threads.html

Change History (34)

comment:1 follow-up: Changed 3 years ago by embray

  • Authors set to Erik Bray
  • Branch set to u/embray/pari-mmap-cygwin
  • Commit set to d0ca39615c9d83ed76e9ab0916925b35d9131183
  • Keywords cygwin windows added
  • Report Upstream changed from N/A to Reported upstream. No feedback yet.
  • Status changed from new to needs_review

I have discovered what I think is the cause of this issue, and have reported it upstream--I believe it to be an issue in Cygwin: https://cygwin.com/ml/cygwin/2016-05/msg00140.html

I've attached a workaround for the issue from the Sage end. I'm not terribly happy with it, because if I understand correctly PARI will automatically grow its stack size as needed, and if it grows beyond 4GB this bug could still occur (though I haven't tested this yet--anyone want to suggest to me a good way to test it?)

I'm not sure how best to work around that though without completely taking over PARI's stack management. Probably fine just to leave it as a known issue until there's a fix in Cygwin. It's more important that we don't immediately segfault as soon as the first PARI number is created.


New commits:

d0ca396Workaround for Segfault during start up due to PARI's stack size being made too large for Cygwin (due to a bug in Cygwin; see http://trac.sagemath.org/ticket/20463)

comment:2 Changed 3 years ago by git

  • Commit changed from d0ca39615c9d83ed76e9ab0916925b35d9131183 to fcc10fc6cebcfb92ac71207099f80fa120afb04d

Branch pushed to git repo; I updated commit sha1. This was a forced push. New commits:

fcc10fcWorkaround for Segfault during start up due to PARI's stack size being made too large for Cygwin (due to a bug in Cygwin; see http://trac.sagemath.org/ticket/20463)

comment:3 Changed 3 years ago by embray

(Oops, had a typo in the original commit)

comment:4 in reply to: ↑ 1 Changed 3 years ago by jdemeyer

Replying to embray:

if I understand correctly PARI will automatically grow its stack size as needed

PARI will automatically grow its stack up to some user-provided maximum. The function is

paristack_setsize(size, vsize)       

where size is the actual size and vsize the maximum size.

comment:5 Changed 3 years ago by jdemeyer

  • Milestone changed from sage-7.2 to sage-7.3
  • Status changed from needs_review to needs_work

...which implies that the correct work-around for Cygwin is simply to limit the second argument of paristack_setsize().

comment:6 Changed 3 years ago by jdemeyer

Note also that not calling paristack_setsize() is not really an option: you would end up with a small stack which cannot be increased. I'm pretty sure that some Sage functionality assumes a PARI stack which is larger than the default size.

comment:7 Changed 3 years ago by jdemeyer

We should probably also report this to upstream PARI (I guess the problem can be reproduced inside PARI/GP itself), such that they might apply the workaround.

comment:8 Changed 3 years ago by jdemeyer

  • Description modified (diff)

comment:9 Changed 3 years ago by embray

I was under the assumption that it would grow automatically as needed, but maybe I shouldn't assume that it works nicely....

comment:10 Changed 3 years ago by jdemeyer

  • Description modified (diff)

comment:11 Changed 3 years ago by embray

  • Report Upstream changed from Reported upstream. No feedback yet. to Fixed upstream, in a later stable release.

I have an update to this patch somewhere, I just haven't gotten around to pushing it yet.

In the meantime Cygwin has acknowledged the bug and pushed the fix. I just confirmed that the fix seems to work (there's no segfault, at least): https://www.cygwin.com/ml/cygwin/2016-05/msg00240.html

Not sure yet when the fix will be in a release. I may update this to include a version check on Cygwin once it makes sense to.

comment:12 Changed 3 years ago by embray

Per this commit, the fix for this should appear in Cygwin v2.5.2.

Last edited 3 years ago by embray (previous) (diff)

comment:13 Changed 3 years ago by embray

If I were to put in a constant or function to get the Cygwin version (where relevant) where would be a good place to put that? sage.env, maybe?

comment:14 Changed 3 years ago by git

  • Commit changed from fcc10fc6cebcfb92ac71207099f80fa120afb04d to d976c6d201a52d6edcfdcd6246a01bbda353a93d

Branch pushed to git repo; I updated commit sha1. New commits:

7223ab0Instead of not setting the stack size at all, set it to a small size that should not trigger the bug.
d976c6dAdd a CYGWIN_VERSION variable to sage.env. Use this to check both whether we're in Cygwin at all, and if its version requires this workaround.

comment:15 Changed 3 years ago by embray

Now always calls paristack_setsize(), but to a smaller value on platforms with this issue.

Also added an explicit check on the cygwin version, and does not invoke the workaround for Cygwin v2.5.2 where this is fixed.

comment:16 Changed 3 years ago by embray

  • Status changed from needs_work to needs_review

comment:17 Changed 3 years ago by git

  • Commit changed from d976c6d201a52d6edcfdcd6246a01bbda353a93d to e06dd9513ff729b4e0f8baff35550512ca08f388

Branch pushed to git repo; I updated commit sha1. New commits:

e06dd95Cygwin version should be a tuple of ints, not strings

comment:18 Changed 3 years ago by jdemeyer

  • Status changed from needs_review to needs_work

I would prefer something along the lines of

sizemax = mem.virtual_memory_limit() // 4

if USING_BROKEN_VERSION_OF_CYGWIN and sizemax > KNOWN_WORKING_SIZE:
    # Cygwin's mmap is broken for large mmaps (>~ 4GB):
    # * https://trac.sagemath.org/ticket/20463
    # * https://cygwin.com/ml/cygwin/2016-05/msg00140.html
    #
    # The underlying issue is fixed in Cygwin v2.5.2
    sizemax = KNOWN_WORKING_SIZE

paristack_setsize(size, sizemax)

comment:19 Changed 3 years ago by embray

That's fine, but you don't literally mean USING_BROKEN_VERSION_OF_CYGWIN do you? The explicit version check that's there is much clearer.

comment:20 Changed 3 years ago by jdemeyer

No, not literal. Replace USING_BROKEN_VERSION_OF_CYGWIN and KNOWN_WORKING_SIZE by whatever you want.

comment:21 Changed 3 years ago by git

  • Commit changed from e06dd9513ff729b4e0f8baff35550512ca08f388 to e89c838f485a03c25aa9b0d8ad532ab9bbfce3a0

Branch pushed to git repo; I updated commit sha1. New commits:

e89c838Slight reordering to make it clearer that the Cygwin workaround is a special case, not the default.

comment:22 Changed 3 years ago by embray

  • Status changed from needs_work to needs_review

comment:23 Changed 3 years ago by jdemeyer

  • Status changed from needs_review to needs_info

Yes, this is better.

I just don't understand why you talk of a limit of 4GB but then use 1GB (actually0x3fffffff) in the code?

comment:24 follow-up: Changed 3 years ago by embray

To be honest I'm not exactly sure why either, though I might have been at one time (it's not like I've really worked on this recently. I know that for starters I was just going by how the default takes mem.virtual_memory_limit() // 4 so I also divided by 4. And then when I went much over that it still crashed so I just kept it at 1GB (minus 1)

comment:25 in reply to: ↑ 24 Changed 3 years ago by jdemeyer

Replying to embray:

I was just going by how the default takes mem.virtual_memory_limit() // 4

That division by 4 is simply to determine how much memory should be used. It is a totally arbitrary number and it should be unrelated to the limit on Cygwin.

Did you run full doctests (make ptestlong) on Cygwin with this patch?

Last edited 3 years ago by jdemeyer (previous) (diff)

comment:26 Changed 3 years ago by embray

make ptestlog is kind of useless right now because many other tests are failing (though I'm closing to having them all fixed now). That said, with this patch I haven't gotten any further segfaults from the Pari interface.

I'll have to double check on the actual limit. IIRC using just under 4 GB still didn't work, although possibly due to page alignment requirements, etc. It's been a while since I experimented with it. Another problem I remember having was there was a heisenbug where the pari interface segfaults only in gdb unless I remove the paristack_setsize call entirely. Outside the debugger though it works.

comment:27 Changed 3 years ago by jdemeyer

Would a limit of, say 0xf0000000 (3.75 GB) work?

comment:28 Changed 3 years ago by git

  • Commit changed from e89c838f485a03c25aa9b0d8ad532ab9bbfce3a0 to ef2f8f1d5344552630c93301d5826d6e715c521e

Branch pushed to git repo; I updated commit sha1. New commits:

ef2f8f1Changed the max size to mmap to a little closer to 4GB. Putting it at *right* under 4 GB (ex: 0xffffffff) still breaks things, possibly due to subtle alignment issues. But this is good enough. In any case the most important thing is it doesn't crash.

comment:29 Changed 3 years ago by embray

  • Status changed from needs_info to needs_review

Confirmed that 0xf0000000 works, so that's close enough.


New commits:

ef2f8f1Changed the max size to mmap to a little closer to 4GB. Putting it at *right* under 4 GB (ex: 0xffffffff) still breaks things, possibly due to subtle alignment issues. But this is good enough. In any case the most important thing is it doesn't crash.

comment:30 Changed 3 years ago by jdemeyer

This comment does not really makes sense, I suggest to delete these 2 lines:

            # So we will just let the pari stack size grow as needed; if
            # it grows beyond 4GB this issue could still be triggered

The stack will only grow "as needed" up to the virtual stack size, which is at most 0xf0000000.

Last edited 3 years ago by jdemeyer (previous) (diff)

comment:31 Changed 3 years ago by embray

Oh, you're right. That's from an older version of the patch where it wasn't setting a max stack size at all.

comment:32 Changed 3 years ago by git

  • Commit changed from ef2f8f1d5344552630c93301d5826d6e715c521e to 856f128777c04a88e4efe6295e98292a2f0cc7e2

Branch pushed to git repo; I updated commit sha1. New commits:

856f128Updated the comment on this to more accurately reflect what's going on

comment:33 Changed 3 years ago by jdemeyer

  • Reviewers set to Jeroen Demeyer
  • Status changed from needs_review to positive_review

comment:34 Changed 3 years ago by vbraun

  • Branch changed from u/embray/pari-mmap-cygwin to 856f128777c04a88e4efe6295e98292a2f0cc7e2
  • Resolution set to fixed
  • Status changed from positive_review to closed
Note: See TracTickets for help on using tickets.