Opened 7 years ago
Closed 7 years ago
#18340 closed defect (fixed)
PARI/GP does not gracefully handle outofmemory
Reported by:  Andrey Novoseltsev  Owned by:  

Priority:  blocker  Milestone:  sage6.7 
Component:  packages: standard  Keywords:  PARI gp ptestlong doctesting lseries_ell.py 
Cc:  Jeroen Demeyer  Merged in:  
Authors:  Jeroen Demeyer  Reviewers:  Vincent Delecroix, John Cremona, Leif Leonhardy 
Report Upstream:  Fixed upstream, but not in a stable release.  Work issues:  
Branch:  5249809 (Commits, GitHub, GitLab)  Commit:  5249809481c4366e0ea5f9efc0f89b27ce7bc7fd 
Dependencies:  Stopgaps: 
Description (last modified by )
The following will eat all your memory:
$ ( ulimit s unlimited; ./sage t long src/sage/schemes/elliptic_curves/lseries_ell.py )
This sets the stack size to "unlimited". The gp
process which is executed by this doctest will then use a very large amount of memory.
This happens in particular with GNU make
< 3.82 due to https://savannah.gnu.org/bugs/?22010
This can be solved by upgrading PARI/GP.
Upstream: http://boxen.math.washington.edu/home/jdemeyer/spkg/pari2.81545gd04cdd3.tar.gz
See https://groups.google.com/d/topic/sagedevel/ydDfHsCxpAc/discussion
(Older) Thread on sagerelease: http://thread.gmane.org/gmane.comp.mathematics.sage.release/2757
Change History (46)
comment:1 Changed 7 years ago by
Keywords:  PARI gp ptestlong doctesting added 

Summary:  Memory issues while testing → Memory issues while testing (gp process) 
comment:2 Changed 7 years ago by
Description:  modified (diff) 

For me, on one machine it happens each time I run make ptestlong
unless I delete ~/.sage/timings*
before doing so. (Deleting these files "disables" the IMHO useless and rather counterproductive sorting of doctests by previous runtime.)
John C. reported it happened to him even with make testlong
, i.e., without testing in parallel.
comment:4 Changed 7 years ago by
Description:  modified (diff) 

Keywords:  lseries_ell.py added 
comment:5 Changed 7 years ago by
Why don't we always limit doctesting by some sensible number (4GB per process seems fine to me) and report failed tests if it is triggered? It is one thing when my old laptop get unresponsive with a new version of Sage, it is a completely different story when "my" server goes down which is used by several researches and hundreds of students.
comment:6 followup: 7 Changed 7 years ago by
Well, whether ulimit
works depends on the OS AFAIK.
And as I said, I do not even get a "Killed" from Sage; the doctest passes even though its child process got killed.
I haven't yet been able to reproduce this other than by running make ptestlong
; ./sage tp long warnlong 0.0001 src/sage/schemes/elliptic_curves/
for example doesn't show the behaviour even on an already loaded machine.
comment:7 Changed 7 years ago by
Replying to leif:
I haven't yet been able to reproduce this other than by running
make ptestlong
;./sage tp long warnlong 0.0001 src/sage/schemes/elliptic_curves/
for example doesn't show the behaviour even on an already loaded machine.
Vaguely thought of that already, and we indeed need (GNU) make
, and a sufficiently old one:
echo "WARNING: 'gp' may eat up up to ~16GB with the following" (ulimit v 16000000; printf "all:\n\t./sage t long src/sage/schemes/elliptic_curves/lseries_ell.py\n"  make f )
With make
3.81, gp
runs amok; with make
4.1, everything's fine.
(We had a similar issue with ECL and/or GAP quite a while ago IIRC, and the make
version without issues was 3.82 and later I think.)
comment:8 followup: 10 Changed 7 years ago by
novoselt@sagenb:~$ make version GNU Make 3.81 Copyright (C) 2006 Free Software Foundation, Inc. This is free software; see the source for copying conditions. There is NO warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. This program built for x86_64pclinuxgnu
ECL fails some tests on this machine unless I compile with some flags for Bulldozer... I guess updating to jessie may help, but I can't risk breaking things right before the term starts.
comment:9 followup: 11 Changed 7 years ago by
Cc:  Jeroen Demeyer added 

In case it's really GNU make's fault: https://savannah.gnu.org/bugs/?22010
Fixed in 3.82, as I guessed.
Note that not just Ubuntu 10.04, but also 12.04 and even 14.04!!! (just checked and couldn't believe) still come with the dead old and broken version 3.81 of 2006.
comment:10 followup: 12 Changed 7 years ago by
Replying to novoselt:
novoselt@sagenb:~$ make version GNU Make 3.81 Copyright (C) 2006 Free Software Foundation, Inc. This is free software; see the source for copying conditions. There is NO warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. This program built for x86_64pclinuxgnuECL fails some tests on this machine unless I compile with some flags for Bulldozer... I guess updating to jessie may help, but I can't risk breaking things right before the term starts.
Well, just grab some newer version (3.82, 4.0 or 4.1) from http://ftp.gnu.org/gnu/make/, untar it, and run configure && make && sudo make install
(by default into /usr/local
).
comment:11 followup: 13 Changed 7 years ago by
Replying to leif:
In case it's really GNU make's fault: https://savannah.gnu.org/bugs/?22010
Hmmm, I'm still not sure whether it's really this. Ubuntu apparently patched their make
, as it does not touch the resource limits, while my GNU make 4.1 limits the stack size to 8MB (soft):
for make in /usr/local/bin/make /usr/bin/make; do $make version  head 1; printf "all:\n\t: ; ulimit sH\n\t: ; ulimit sS\n"  $make f ; done GNU Make 4.1 : ; ulimit sH unlimited : ; ulimit sS 8192 GNU Make 3.81 : ; ulimit sH unlimited : ; ulimit sS unlimited
Another cause could be GNU make 3.81 messing up file descriptors somehow.
comment:12 followup: 14 Changed 7 years ago by
Replying to leif:
Well, just grab some newer version (3.82, 4.0 or 4.1) from http://ftp.gnu.org/gnu/make/, untar it, and run
configure && make && sudo make install
(by default into/usr/local
).
Alternatively, I can just not test Sage. But it does not fix the issue for other users. If it is a bug in make
, but it is a widely adopted version, it still makes sense to work around it. Wheezy is likely to be around for another 3 years.
comment:13 followup: 29 Changed 7 years ago by
Replying to leif:
Replying to leif:
In case it's really GNU make's fault: https://savannah.gnu.org/bugs/?22010
Hmmm, I'm still not sure whether it's really this. Ubuntu apparently patched their
make
, as it does not touch the resource limits, while my GNU make 4.1 limits the stack size to 8MB (soft):for make in /usr/local/bin/make /usr/bin/make; do $make version  head 1; printf "all:\n\t: ; ulimit sH\n\t: ; ulimit sS\n"  $make f ; done GNU Make 4.1 : ; ulimit sH unlimited : ; ulimit sS 8192 GNU Make 3.81 : ; ulimit sH unlimited : ; ulimit sS unlimited
What is the ulimit s
outside of make
? I get
jdemeyer@tamiyo:~$ ulimit s 8192
so if make
changes that to unlimited
, that's https://savannah.gnu.org/bugs/?22010 indeed.
comment:14 Changed 7 years ago by
Replying to novoselt:
If it is a bug in
make
, but it is a widely adopted version, it still makes sense to work around it. Wheezy is likely to be around for another 3 years.
The fixed version of GNU make
has been released almost 5 years ago. There is really no excuse for still using make
3.81. Like leif said, you can easily install that manually outside of the distro.
comment:15 Changed 7 years ago by
Component:  PLEASE CHANGE → packages: standard 

Description:  modified (diff) 
Summary:  Memory issues while testing (gp process) → If given unlimited stack, gp uses it all 
I can confirm the issue.
comment:17 Changed 7 years ago by
Authors:  → Jeroen Demeyer 

OK, this is my analysis: at some point, the dokchitser gp
script is really trying to compute a vector of length 348804762364674
. This obviously fails with a PARI stack overflow (this "stack" has nothing to do with the ulimit s
stack). As a result, Sage tries to increase gp
's memory with the allocatemem()
command. But since this vector is really too big, it fails no matter how much memory is allocated. At some point, allocating more memory from the system fails and it seems that PARI/GP does not gracefully handle this outofmemory condition.
The solution: upgrade PARI/GP.
comment:18 Changed 7 years ago by
Description:  modified (diff) 

comment:19 Changed 7 years ago by
Branch:  → u/jdemeyer/if_given_unlimited_stack__gp_uses_it_all 

comment:20 Changed 7 years ago by
Commit:  → 84ebf31d352e519253a07247c9fe2d3c3fa85b6a 

Status:  new → needs_review 
comment:21 followup: 24 Changed 7 years ago by
This is not floating point noise
sage: pari('intnum(x=0,13,sin(x)+sin(x^2) + x)')  85.1885681951527 + 84.1818153922297
The value of the integral computed with gsl is
sage: numerical_integral(sin(x)+sin(x^2)+x,0,13) (85.18856819515268, 1.3572929447036586e09)
So Pari was right up to 10^{11} and now it gets down to 10^{4}. This is not acceptable.
comment:22 Changed 7 years ago by
Commit:  84ebf31d352e519253a07247c9fe2d3c3fa85b6a → da46c953bd22e80790ec6437cfe529484c858dec 

Branch pushed to git repo; I updated commit sha1. New commits:
da46c95  Autogenerated PARI files should depend on decl.pxi

comment:23 Changed 7 years ago by
Status:  needs_review → needs_work 

comment:24 followup: 25 Changed 7 years ago by
Report Upstream:  N/A → Not yet reported upstream; Will do shortly. 

Replying to vdelecroix:
This is not floating point noise
sage: pari('intnum(x=0,13,sin(x)+sin(x^2) + x)')  85.1885681951527 + 84.1818153922297The value of the integral computed with gsl is
sage: numerical_integral(sin(x)+sin(x^2)+x,0,13) (85.18856819515268, 1.3572929447036586e09)So Pari was right up to 10^{11} and now it gets down to 10^{4}. This is not acceptable.
It's much worse. Note the 85
became 84
.
comment:25 followup: 26 Changed 7 years ago by
Replying to jdemeyer:
Replying to vdelecroix:
This is not floating point noise
sage: pari('intnum(x=0,13,sin(x)+sin(x^2) + x)')  85.1885681951527 + 84.1818153922297The value of the integral computed with gsl is
sage: numerical_integral(sin(x)+sin(x^2)+x,0,13) (85.18856819515268, 1.3572929447036586e09)So Pari was right up to 10^{11} and now it gets down to 10^{4}. This is not acceptable.
It's much worse. Note the
85
became84
.
Indeed! I can confirm the behavior on the freshly compiled pari master development 17795d04cdd3
.
comment:26 Changed 7 years ago by
Replying to vdelecroix:
Replying to jdemeyer:
Replying to vdelecroix:
This is not floating point noise
sage: pari('intnum(x=0,13,sin(x)+sin(x^2) + x)')  85.1885681951527 + 84.1818153922297The value of the integral computed with gsl is
sage: numerical_integral(sin(x)+sin(x^2)+x,0,13) (85.18856819515268, 1.3572929447036586e09)So Pari was right up to 10^{11} and now it gets down to 10^{4}. This is not acceptable.
It's much worse. Note the
85
became84
.Indeed! I can confirm the behavior on the freshly compiled pari master
development 17795d04cdd3
.
This is rather disjoint from this ticket, I opened #18342 to track the issue.
comment:27 Changed 7 years ago by
There are also doctest failures in src/sage/calculus/calculus.py
due to this.
comment:28 Changed 7 years ago by
Commit:  da46c953bd22e80790ec6437cfe529484c858dec → 23a01103840dd8dd37197f7a18c9b1ac8635bd47 

Branch pushed to git repo; I updated commit sha1. New commits:
23a0110  Fix call to PARI qfminim() in shortest_vector()

comment:29 Changed 7 years ago by
Replying to jdemeyer:
What is the
ulimit s
outside ofmake
? I getjdemeyer@tamiyo:~$ ulimit s 8192so if
make
changes that tounlimited
, that's https://savannah.gnu.org/bugs/?22010 indeed.
Yep, missed that: Outside of make
, the stack size is (already) softlimited to 8MB, which GNU make 3.81 in the distros resets to unlimited
.
The original bug report was actually about the opposite: Changing it from unlimited
to some huge value.
But the real bug is that (the old) make
touches the resource limits of (nonmake
) subprocesses at all.
comment:30 Changed 7 years ago by
Commit:  23a01103840dd8dd37197f7a18c9b1ac8635bd47 → 11ce9814729ea25384715b293ea3711f0354b6a2 

Branch pushed to git repo; I updated commit sha1. New commits:
11ce981  More integration doctest fixes

comment:31 Changed 7 years ago by
Status:  needs_work → needs_review 

comment:32 Changed 7 years ago by
Summary:  If given unlimited stack, gp uses it all → PARI/GP does not gracefully handle outofmemory 

comment:33 Changed 7 years ago by
But "PARI/GP does not gracefully handle outofmemory" is "Fixed upstream, but not in a stable release", right?
comment:34 Changed 7 years ago by
Report Upstream:  Not yet reported upstream; Will do shortly. → Fixed upstream, but not in a stable release. 

comment:36 Changed 7 years ago by
With Sage 6.7.beta3 and GCC 5.1.0 on Linux x86_64, PARI and eclib pass their test suites (Lcalc no longer has one), and ptestlong passes modulo one unrelated test (I already reported for 6.7.beta3 on sagerelease).
Also
(ulimit s unlimited; ./sage t long src/sage/schemes/elliptic_curves/lseries_ell.py)
works as expected.
Haven't looked very closely at the patches though.
The changed behaviour of PARI's intnum()
should IMHO be documented, on #18342, where also the doctest could be changed (not just its output).
comment:37 Changed 7 years ago by
P.S.:
These aren't new, i.e., no regression:
../src/modules/mpqs.c:376:16: warning: array subscript is above array bounds [Warraybounds] ../src/modules/mpqs.c:378:22: warning: array subscript is above array bounds [Warraybounds] ../src/modules/mpqs.c:380:24: warning: array subscript is above array bounds [Warraybounds] ../src/modules/mpqs.c:380:24: warning: array subscript is above array bounds [Warraybounds] ../src/modules/mpqs.c:381:14: warning: array subscript is above array bounds [Warraybounds]
comment:39 followup: 41 Changed 7 years ago by
Just triggered a build on a (slow) x86 machine; this will presumably take at least half a day...
comment:40 Changed 7 years ago by
Looks good to me  all tests pass (64 bit). If Leif's test passes OK on a 32bit machine let's approve this. Thanks, Jeroen.
comment:41 followup: 42 Changed 7 years ago by
Replying to leif:
Just triggered a build on a (slow) x86 machine; this will presumably take at least half a day...
That was probably an estimate for a few releases ago; the last test run is still in progress...
comment:42 Changed 7 years ago by
Reviewers:  → Vincent Delecroix, John Cremona, Leif Leonhardy 

Status:  needs_review → positive_review 
Replying to leif:
Replying to leif:
Just triggered a build on a (slow) x86 machine; this will presumably take at least half a day...
That was probably an estimate for a few releases ago; the last test run is still in progress...
It finally finished without errors. (This was on a Pentium4 Prescott, Ubuntu 10.04.4, FSF GCC 4.8.2 with O3
, native build, FWIW.)
I haven't tested building with FLTK, but it looks as if PARI's "configure" would meanwhile work.
comment:43 Changed 7 years ago by
Ooops, just noticed get_fltk.patch
should then also get removed from patches/README.txt
, but that's really a minor issue.
comment:44 Changed 7 years ago by
Commit:  11ce9814729ea25384715b293ea3711f0354b6a2 → 5249809481c4366e0ea5f9efc0f89b27ce7bc7fd 

Status:  positive_review → needs_review 
Branch pushed to git repo; I updated commit sha1 and set ticket back to needs_review. New commits:
5249809  Remove mention of get_fltk.patch

comment:45 Changed 7 years ago by
Status:  needs_review → positive_review 

comment:46 Changed 7 years ago by
Branch:  u/jdemeyer/if_given_unlimited_stack__gp_uses_it_all → 5249809481c4366e0ea5f9efc0f89b27ce7bc7fd 

Resolution:  → fixed 
Status:  positive_review → closed 
This has also been reported on sagerelease; it happens presumably since the last update of PARI.