Opened 7 years ago
Closed 7 years ago
#18340 closed defect (fixed)
PARI/GP does not gracefully handle out-of-memory
Reported by: | novoselt | Owned by: | |
---|---|---|---|
Priority: | blocker | Milestone: | sage-6.7 |
Component: | packages: standard | Keywords: | PARI gp ptestlong doctesting lseries_ell.py |
Cc: | jdemeyer | Merged in: | |
Authors: | Jeroen Demeyer | Reviewers: | Vincent Delecroix, John Cremona, Leif Leonhardy |
Report Upstream: | Fixed upstream, but not in a stable release. | Work issues: | |
Branch: | 5249809 (Commits, GitHub, GitLab) | Commit: | 5249809481c4366e0ea5f9efc0f89b27ce7bc7fd |
Dependencies: | Stopgaps: |
Description (last modified by )
The following will eat all your memory:
$ ( ulimit -s unlimited; ./sage -t --long src/sage/schemes/elliptic_curves/lseries_ell.py )
This sets the stack size to "unlimited". The gp
process which is executed by this doctest will then use a very large amount of memory.
This happens in particular with GNU make
< 3.82 due to https://savannah.gnu.org/bugs/?22010
This can be solved by upgrading PARI/GP.
Upstream: http://boxen.math.washington.edu/home/jdemeyer/spkg/pari-2.8-1545-gd04cdd3.tar.gz
See https://groups.google.com/d/topic/sage-devel/ydDfHsCxpAc/discussion
(Older) Thread on sage-release: http://thread.gmane.org/gmane.comp.mathematics.sage.release/2757
Change History (46)
comment:1 Changed 7 years ago by
- Keywords PARI gp ptestlong doctesting added
- Summary changed from Memory issues while testing to Memory issues while testing (gp process)
comment:2 Changed 7 years ago by
- Description modified (diff)
For me, on one machine it happens each time I run make ptestlong
unless I delete ~/.sage/timings*
before doing so. (Deleting these files "disables" the IMHO useless and rather counter-productive sorting of doctests by previous runtime.)
John C. reported it happened to him even with make testlong
, i.e., without testing in parallel.
comment:4 Changed 7 years ago by
- Description modified (diff)
- Keywords lseries_ell.py added
comment:5 Changed 7 years ago by
Why don't we always limit doctesting by some sensible number (4GB per process seems fine to me) and report failed tests if it is triggered? It is one thing when my old laptop get unresponsive with a new version of Sage, it is a completely different story when "my" server goes down which is used by several researches and hundreds of students.
comment:6 follow-up: ↓ 7 Changed 7 years ago by
Well, whether ulimit
works depends on the OS AFAIK.
And as I said, I do not even get a "Killed" from Sage; the doctest passes even though its child process got killed.
I haven't yet been able to reproduce this other than by running make ptestlong
; ./sage -tp --long --warn-long 0.0001 src/sage/schemes/elliptic_curves/
for example doesn't show the behaviour even on an already loaded machine.
comment:7 in reply to: ↑ 6 Changed 7 years ago by
Replying to leif:
I haven't yet been able to reproduce this other than by running
make ptestlong
;./sage -tp --long --warn-long 0.0001 src/sage/schemes/elliptic_curves/
for example doesn't show the behaviour even on an already loaded machine.
Vaguely thought of that already, and we indeed need (GNU) make
, and a sufficiently old one:
echo "WARNING: 'gp' may eat up up to ~16GB with the following" (ulimit -v 16000000; printf "all:\n\t./sage -t --long src/sage/schemes/elliptic_curves/lseries_ell.py\n" | make -f -)
With make
3.81, gp
runs amok; with make
4.1, everything's fine.
(We had a similar issue with ECL and/or GAP quite a while ago IIRC, and the make
version without issues was 3.82 and later I think.)
comment:8 follow-up: ↓ 10 Changed 7 years ago by
novoselt@sagenb:~$ make --version GNU Make 3.81 Copyright (C) 2006 Free Software Foundation, Inc. This is free software; see the source for copying conditions. There is NO warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. This program built for x86_64-pc-linux-gnu
ECL fails some tests on this machine unless I compile with some flags for Bulldozer... I guess updating to jessie may help, but I can't risk breaking things right before the term starts.
comment:9 follow-up: ↓ 11 Changed 7 years ago by
- Cc jdemeyer added
In case it's really GNU make's fault: https://savannah.gnu.org/bugs/?22010
Fixed in 3.82, as I guessed.
Note that not just Ubuntu 10.04, but also 12.04 and even 14.04!!! (just checked and couldn't believe) still come with the dead old and broken version 3.81 of 2006.
comment:10 in reply to: ↑ 8 ; follow-up: ↓ 12 Changed 7 years ago by
Replying to novoselt:
novoselt@sagenb:~$ make --version GNU Make 3.81 Copyright (C) 2006 Free Software Foundation, Inc. This is free software; see the source for copying conditions. There is NO warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. This program built for x86_64-pc-linux-gnuECL fails some tests on this machine unless I compile with some flags for Bulldozer... I guess updating to jessie may help, but I can't risk breaking things right before the term starts.
Well, just grab some newer version (3.82, 4.0 or 4.1) from http://ftp.gnu.org/gnu/make/, untar it, and run configure && make && sudo make install
(by default into /usr/local
).
comment:11 in reply to: ↑ 9 ; follow-up: ↓ 13 Changed 7 years ago by
Replying to leif:
In case it's really GNU make's fault: https://savannah.gnu.org/bugs/?22010
Hmmm, I'm still not sure whether it's really this. Ubuntu apparently patched their make
, as it does not touch the resource limits, while my GNU make 4.1 limits the stack size to 8MB (soft):
for make in /usr/local/bin/make /usr/bin/make; do $make --version | head -1; printf "all:\n\t: ; ulimit -sH\n\t: ; ulimit -sS\n" | $make -f -; done GNU Make 4.1 : ; ulimit -sH unlimited : ; ulimit -sS 8192 GNU Make 3.81 : ; ulimit -sH unlimited : ; ulimit -sS unlimited
Another cause could be GNU make 3.81 messing up file descriptors somehow.
comment:12 in reply to: ↑ 10 ; follow-up: ↓ 14 Changed 7 years ago by
Replying to leif:
Well, just grab some newer version (3.82, 4.0 or 4.1) from http://ftp.gnu.org/gnu/make/, untar it, and run
configure && make && sudo make install
(by default into/usr/local
).
Alternatively, I can just not test Sage. But it does not fix the issue for other users. If it is a bug in make
, but it is a widely adopted version, it still makes sense to work around it. Wheezy is likely to be around for another 3 years.
comment:13 in reply to: ↑ 11 ; follow-up: ↓ 29 Changed 7 years ago by
Replying to leif:
Replying to leif:
In case it's really GNU make's fault: https://savannah.gnu.org/bugs/?22010
Hmmm, I'm still not sure whether it's really this. Ubuntu apparently patched their
make
, as it does not touch the resource limits, while my GNU make 4.1 limits the stack size to 8MB (soft):for make in /usr/local/bin/make /usr/bin/make; do $make --version | head -1; printf "all:\n\t: ; ulimit -sH\n\t: ; ulimit -sS\n" | $make -f -; done GNU Make 4.1 : ; ulimit -sH unlimited : ; ulimit -sS 8192 GNU Make 3.81 : ; ulimit -sH unlimited : ; ulimit -sS unlimited
What is the ulimit -s
outside of make
? I get
jdemeyer@tamiyo:~$ ulimit -s 8192
so if make
changes that to unlimited
, that's https://savannah.gnu.org/bugs/?22010 indeed.
comment:14 in reply to: ↑ 12 Changed 7 years ago by
Replying to novoselt:
If it is a bug in
make
, but it is a widely adopted version, it still makes sense to work around it. Wheezy is likely to be around for another 3 years.
The fixed version of GNU make
has been released almost 5 years ago. There is really no excuse for still using make
3.81. Like leif said, you can easily install that manually outside of the distro.
comment:15 Changed 7 years ago by
- Component changed from PLEASE CHANGE to packages: standard
- Description modified (diff)
- Summary changed from Memory issues while testing (gp process) to If given unlimited stack, gp uses it all
I can confirm the issue.
comment:16 Changed 7 years ago by
I'm now trying to see why gp
uses so much stack memory.
comment:17 Changed 7 years ago by
OK, this is my analysis: at some point, the dokchitser gp
script is really trying to compute a vector of length 348804762364674
. This obviously fails with a PARI stack overflow (this "stack" has nothing to do with the ulimit -s
stack). As a result, Sage tries to increase gp
's memory with the allocatemem()
command. But since this vector is really too big, it fails no matter how much memory is allocated. At some point, allocating more memory from the system fails and it seems that PARI/GP does not gracefully handle this out-of-memory condition.
The solution: upgrade PARI/GP.
comment:18 Changed 7 years ago by
- Description modified (diff)
comment:19 Changed 7 years ago by
- Branch set to u/jdemeyer/if_given_unlimited_stack__gp_uses_it_all
comment:20 Changed 7 years ago by
- Commit set to 84ebf31d352e519253a07247c9fe2d3c3fa85b6a
- Status changed from new to needs_review
comment:21 follow-up: ↓ 24 Changed 7 years ago by
This is not floating point noise
sage: pari('intnum(x=0,13,sin(x)+sin(x^2) + x)') - 85.1885681951527 + 84.1818153922297
The value of the integral computed with gsl is
sage: numerical_integral(sin(x)+sin(x^2)+x,0,13) (85.18856819515268, 1.3572929447036586e-09)
So Pari was right up to 10-11 and now it gets down to 10-4. This is not acceptable.
comment:22 Changed 7 years ago by
- Commit changed from 84ebf31d352e519253a07247c9fe2d3c3fa85b6a to da46c953bd22e80790ec6437cfe529484c858dec
Branch pushed to git repo; I updated commit sha1. New commits:
da46c95 | Auto-generated PARI files should depend on decl.pxi
|
comment:23 Changed 7 years ago by
- Status changed from needs_review to needs_work
comment:24 in reply to: ↑ 21 ; follow-up: ↓ 25 Changed 7 years ago by
- Report Upstream changed from N/A to Not yet reported upstream; Will do shortly.
Replying to vdelecroix:
This is not floating point noise
sage: pari('intnum(x=0,13,sin(x)+sin(x^2) + x)') - 85.1885681951527 + 84.1818153922297The value of the integral computed with gsl is
sage: numerical_integral(sin(x)+sin(x^2)+x,0,13) (85.18856819515268, 1.3572929447036586e-09)So Pari was right up to 10-11 and now it gets down to 10-4. This is not acceptable.
It's much worse. Note the 85
became 84
.
comment:25 in reply to: ↑ 24 ; follow-up: ↓ 26 Changed 7 years ago by
Replying to jdemeyer:
Replying to vdelecroix:
This is not floating point noise
sage: pari('intnum(x=0,13,sin(x)+sin(x^2) + x)') - 85.1885681951527 + 84.1818153922297The value of the integral computed with gsl is
sage: numerical_integral(sin(x)+sin(x^2)+x,0,13) (85.18856819515268, 1.3572929447036586e-09)So Pari was right up to 10-11 and now it gets down to 10-4. This is not acceptable.
It's much worse. Note the
85
became84
.
Indeed! I can confirm the behavior on the freshly compiled pari master development 17795-d04cdd3
.
comment:26 in reply to: ↑ 25 Changed 7 years ago by
Replying to vdelecroix:
Replying to jdemeyer:
Replying to vdelecroix:
This is not floating point noise
sage: pari('intnum(x=0,13,sin(x)+sin(x^2) + x)') - 85.1885681951527 + 84.1818153922297The value of the integral computed with gsl is
sage: numerical_integral(sin(x)+sin(x^2)+x,0,13) (85.18856819515268, 1.3572929447036586e-09)So Pari was right up to 10-11 and now it gets down to 10-4. This is not acceptable.
It's much worse. Note the
85
became84
.Indeed! I can confirm the behavior on the freshly compiled pari master
development 17795-d04cdd3
.
This is rather disjoint from this ticket, I opened #18342 to track the issue.
comment:27 Changed 7 years ago by
There are also doctest failures in src/sage/calculus/calculus.py
due to this.
comment:28 Changed 7 years ago by
- Commit changed from da46c953bd22e80790ec6437cfe529484c858dec to 23a01103840dd8dd37197f7a18c9b1ac8635bd47
Branch pushed to git repo; I updated commit sha1. New commits:
23a0110 | Fix call to PARI qfminim() in shortest_vector()
|
comment:29 in reply to: ↑ 13 Changed 7 years ago by
Replying to jdemeyer:
What is the
ulimit -s
outside ofmake
? I getjdemeyer@tamiyo:~$ ulimit -s 8192so if
make
changes that tounlimited
, that's https://savannah.gnu.org/bugs/?22010 indeed.
Yep, missed that: Outside of make
, the stack size is (already) soft-limited to 8MB, which GNU make 3.81 in the distros resets to unlimited
.
The original bug report was actually about the opposite: Changing it from unlimited
to some huge value.
But the real bug is that (the old) make
touches the resource limits of (non-make
) subprocesses at all.
comment:30 Changed 7 years ago by
- Commit changed from 23a01103840dd8dd37197f7a18c9b1ac8635bd47 to 11ce9814729ea25384715b293ea3711f0354b6a2
Branch pushed to git repo; I updated commit sha1. New commits:
11ce981 | More integration doctest fixes
|
comment:31 Changed 7 years ago by
- Status changed from needs_work to needs_review
comment:32 Changed 7 years ago by
- Summary changed from If given unlimited stack, gp uses it all to PARI/GP does not gracefully handle out-of-memory
comment:33 Changed 7 years ago by
But "PARI/GP does not gracefully handle out-of-memory" is "Fixed upstream, but not in a stable release", right?
comment:34 Changed 7 years ago by
- Report Upstream changed from Not yet reported upstream; Will do shortly. to Fixed upstream, but not in a stable release.
comment:35 Changed 7 years ago by
Using bzip2 would save 0.5 MB... (2.9 MB vs. 3.4 MB)
comment:36 Changed 7 years ago by
With Sage 6.7.beta3 and GCC 5.1.0 on Linux x86_64, PARI and eclib pass their test suites (Lcalc no longer has one), and ptestlong passes modulo one unrelated test (I already reported for 6.7.beta3 on sage-release).
Also
(ulimit -s unlimited; ./sage -t --long src/sage/schemes/elliptic_curves/lseries_ell.py)
works as expected.
Haven't looked very closely at the patches though.
The changed behaviour of PARI's intnum()
should IMHO be documented, on #18342, where also the doctest could be changed (not just its output).
comment:37 Changed 7 years ago by
P.S.:
These aren't new, i.e., no regression:
../src/modules/mpqs.c:376:16: warning: array subscript is above array bounds [-Warray-bounds] ../src/modules/mpqs.c:378:22: warning: array subscript is above array bounds [-Warray-bounds] ../src/modules/mpqs.c:380:24: warning: array subscript is above array bounds [-Warray-bounds] ../src/modules/mpqs.c:380:24: warning: array subscript is above array bounds [-Warray-bounds] ../src/modules/mpqs.c:381:14: warning: array subscript is above array bounds [-Warray-bounds]
comment:38 Changed 7 years ago by
Ok, patches look reasonable as well.
comment:39 follow-up: ↓ 41 Changed 7 years ago by
Just triggered a build on a (slow) x86 machine; this will presumably take at least half a day...
comment:40 Changed 7 years ago by
Looks good to me -- all tests pass (64 bit). If Leif's test passes OK on a 32-bit machine let's approve this. Thanks, Jeroen.
comment:41 in reply to: ↑ 39 ; follow-up: ↓ 42 Changed 7 years ago by
Replying to leif:
Just triggered a build on a (slow) x86 machine; this will presumably take at least half a day...
That was probably an estimate for a few releases ago; the last test run is still in progress...
comment:42 in reply to: ↑ 41 Changed 7 years ago by
- Reviewers set to Vincent Delecroix, John Cremona, Leif Leonhardy
- Status changed from needs_review to positive_review
Replying to leif:
Replying to leif:
Just triggered a build on a (slow) x86 machine; this will presumably take at least half a day...
That was probably an estimate for a few releases ago; the last test run is still in progress...
It finally finished without errors. (This was on a Pentium4 Prescott, Ubuntu 10.04.4, FSF GCC 4.8.2 with -O3
, native build, FWIW.)
I haven't tested building with FLTK, but it looks as if PARI's "configure" would meanwhile work.
comment:43 Changed 7 years ago by
Ooops, just noticed get_fltk.patch
should then also get removed from patches/README.txt
, but that's really a minor issue.
comment:44 Changed 7 years ago by
- Commit changed from 11ce9814729ea25384715b293ea3711f0354b6a2 to 5249809481c4366e0ea5f9efc0f89b27ce7bc7fd
- Status changed from positive_review to needs_review
Branch pushed to git repo; I updated commit sha1 and set ticket back to needs_review. New commits:
5249809 | Remove mention of get_fltk.patch
|
comment:45 Changed 7 years ago by
- Status changed from needs_review to positive_review
comment:46 Changed 7 years ago by
- Branch changed from u/jdemeyer/if_given_unlimited_stack__gp_uses_it_all to 5249809481c4366e0ea5f9efc0f89b27ce7bc7fd
- Resolution set to fixed
- Status changed from positive_review to closed
This has also been reported on sage-release; it happens presumably since the last update of PARI.