Opened 7 years ago

Closed 7 years ago

#18340 closed defect (fixed)

PARI/GP does not gracefully handle out-of-memory

Reported by: Andrey Novoseltsev Owned by:
Priority: blocker Milestone: sage-6.7
Component: packages: standard Keywords: PARI gp ptestlong doctesting lseries_ell.py
Cc: Jeroen Demeyer Merged in:
Authors: Jeroen Demeyer Reviewers: Vincent Delecroix, John Cremona, Leif Leonhardy
Report Upstream: Fixed upstream, but not in a stable release. Work issues:
Branch: 5249809 (Commits, GitHub, GitLab) Commit: 5249809481c4366e0ea5f9efc0f89b27ce7bc7fd
Dependencies: Stopgaps:

Status badges

Description (last modified by Jeroen Demeyer)

The following will eat all your memory:

$ ( ulimit -s unlimited; ./sage -t --long src/sage/schemes/elliptic_curves/lseries_ell.py )

This sets the stack size to "unlimited". The gp process which is executed by this doctest will then use a very large amount of memory.

This happens in particular with GNU make < 3.82 due to https://savannah.gnu.org/bugs/?22010

This can be solved by upgrading PARI/GP.

Upstream: http://boxen.math.washington.edu/home/jdemeyer/spkg/pari-2.8-1545-gd04cdd3.tar.gz


See https://groups.google.com/d/topic/sage-devel/ydDfHsCxpAc/discussion

(Older) Thread on sage-release: http://thread.gmane.org/gmane.comp.mathematics.sage.release/2757

Change History (46)

comment:1 Changed 7 years ago by Leif Leonhardy

Keywords: PARI gp ptestlong doctesting added
Summary: Memory issues while testingMemory issues while testing (gp process)

This has also been reported on sage-release; it happens presumably since the last update of PARI.

comment:2 Changed 7 years ago by Leif Leonhardy

Description: modified (diff)

For me, on one machine it happens each time I run make ptestlong unless I delete ~/.sage/timings* before doing so. (Deleting these files "disables" the IMHO useless and rather counter-productive sorting of doctests by previous runtime.)

John C. reported it happened to him even with make testlong, i.e., without testing in parallel.

comment:3 Changed 7 years ago by Leif Leonhardy

Description: modified (diff)

Ooops, GB, not MB... 8)

comment:4 Changed 7 years ago by Leif Leonhardy

Description: modified (diff)
Keywords: lseries_ell.py added

comment:5 Changed 7 years ago by Andrey Novoseltsev

Why don't we always limit doctesting by some sensible number (4GB per process seems fine to me) and report failed tests if it is triggered? It is one thing when my old laptop get unresponsive with a new version of Sage, it is a completely different story when "my" server goes down which is used by several researches and hundreds of students.

comment:6 Changed 7 years ago by Leif Leonhardy

Well, whether ulimit works depends on the OS AFAIK.

And as I said, I do not even get a "Killed" from Sage; the doctest passes even though its child process got killed.

I haven't yet been able to reproduce this other than by running make ptestlong; ./sage -tp --long --warn-long 0.0001 src/sage/schemes/elliptic_curves/ for example doesn't show the behaviour even on an already loaded machine.

comment:7 in reply to:  6 Changed 7 years ago by Leif Leonhardy

Replying to leif:

I haven't yet been able to reproduce this other than by running make ptestlong; ./sage -tp --long --warn-long 0.0001 src/sage/schemes/elliptic_curves/ for example doesn't show the behaviour even on an already loaded machine.

Vaguely thought of that already, and we indeed need (GNU) make, and a sufficiently old one:

echo "WARNING:  'gp' may eat up up to ~16GB with the following"
(ulimit -v 16000000; printf "all:\n\t./sage -t --long src/sage/schemes/elliptic_curves/lseries_ell.py\n" | make -f -)

With make 3.81, gp runs amok; with make 4.1, everything's fine.

(We had a similar issue with ECL and/or GAP quite a while ago IIRC, and the make version without issues was 3.82 and later I think.)

comment:8 Changed 7 years ago by Andrey Novoseltsev

novoselt@sagenb:~$ make --version
GNU Make 3.81
Copyright (C) 2006  Free Software Foundation, Inc.
This is free software; see the source for copying conditions.
There is NO warranty; not even for MERCHANTABILITY or FITNESS FOR A
PARTICULAR PURPOSE.

This program built for x86_64-pc-linux-gnu

ECL fails some tests on this machine unless I compile with some flags for Bulldozer... I guess updating to jessie may help, but I can't risk breaking things right before the term starts.

comment:9 Changed 7 years ago by Leif Leonhardy

Cc: Jeroen Demeyer added

In case it's really GNU make's fault: https://savannah.gnu.org/bugs/?22010

Fixed in 3.82, as I guessed.

Note that not just Ubuntu 10.04, but also 12.04 and even 14.04!!! (just checked and couldn't believe) still come with the dead old and broken version 3.81 of 2006.

comment:10 in reply to:  8 ; Changed 7 years ago by Leif Leonhardy

Replying to novoselt:

novoselt@sagenb:~$ make --version
GNU Make 3.81
Copyright (C) 2006  Free Software Foundation, Inc.
This is free software; see the source for copying conditions.
There is NO warranty; not even for MERCHANTABILITY or FITNESS FOR A
PARTICULAR PURPOSE.

This program built for x86_64-pc-linux-gnu

ECL fails some tests on this machine unless I compile with some flags for Bulldozer... I guess updating to jessie may help, but I can't risk breaking things right before the term starts.

Well, just grab some newer version (3.82, 4.0 or 4.1) from http://ftp.gnu.org/gnu/make/, untar it, and run configure && make && sudo make install (by default into /usr/local).

comment:11 in reply to:  9 ; Changed 7 years ago by Leif Leonhardy

Replying to leif:

In case it's really GNU make's fault: https://savannah.gnu.org/bugs/?22010

Hmmm, I'm still not sure whether it's really this. Ubuntu apparently patched their make, as it does not touch the resource limits, while my GNU make 4.1 limits the stack size to 8MB (soft):

for make in /usr/local/bin/make /usr/bin/make; do $make --version | head -1; printf "all:\n\t: ; ulimit -sH\n\t: ; ulimit -sS\n" | $make -f -; done
GNU Make 4.1
: ; ulimit -sH
unlimited
: ; ulimit -sS
8192
GNU Make 3.81
: ; ulimit -sH
unlimited
: ; ulimit -sS
unlimited

Another cause could be GNU make 3.81 messing up file descriptors somehow.

comment:12 in reply to:  10 ; Changed 7 years ago by Andrey Novoseltsev

Replying to leif:

Well, just grab some newer version (3.82, 4.0 or 4.1) from http://ftp.gnu.org/gnu/make/, untar it, and run configure && make && sudo make install (by default into /usr/local).

Alternatively, I can just not test Sage. But it does not fix the issue for other users. If it is a bug in make, but it is a widely adopted version, it still makes sense to work around it. Wheezy is likely to be around for another 3 years.

comment:13 in reply to:  11 ; Changed 7 years ago by Jeroen Demeyer

Replying to leif:

Replying to leif:

In case it's really GNU make's fault: https://savannah.gnu.org/bugs/?22010

Hmmm, I'm still not sure whether it's really this. Ubuntu apparently patched their make, as it does not touch the resource limits, while my GNU make 4.1 limits the stack size to 8MB (soft):

for make in /usr/local/bin/make /usr/bin/make; do $make --version | head -1; printf "all:\n\t: ; ulimit -sH\n\t: ; ulimit -sS\n" | $make -f -; done
GNU Make 4.1
: ; ulimit -sH
unlimited
: ; ulimit -sS
8192
GNU Make 3.81
: ; ulimit -sH
unlimited
: ; ulimit -sS
unlimited

What is the ulimit -s outside of make? I get

jdemeyer@tamiyo:~$ ulimit -s
8192

so if make changes that to unlimited, that's https://savannah.gnu.org/bugs/?22010 indeed.

Last edited 7 years ago by Jeroen Demeyer (previous) (diff)

comment:14 in reply to:  12 Changed 7 years ago by Jeroen Demeyer

Replying to novoselt:

If it is a bug in make, but it is a widely adopted version, it still makes sense to work around it. Wheezy is likely to be around for another 3 years.

The fixed version of GNU make has been released almost 5 years ago. There is really no excuse for still using make 3.81. Like leif said, you can easily install that manually outside of the distro.

comment:15 Changed 7 years ago by Jeroen Demeyer

Component: PLEASE CHANGEpackages: standard
Description: modified (diff)
Summary: Memory issues while testing (gp process)If given unlimited stack, gp uses it all

I can confirm the issue.

comment:16 Changed 7 years ago by Jeroen Demeyer

I'm now trying to see why gp uses so much stack memory.

comment:17 Changed 7 years ago by Jeroen Demeyer

Authors: Jeroen Demeyer

OK, this is my analysis: at some point, the dokchitser gp script is really trying to compute a vector of length 348804762364674. This obviously fails with a PARI stack overflow (this "stack" has nothing to do with the ulimit -s stack). As a result, Sage tries to increase gp's memory with the allocatemem() command. But since this vector is really too big, it fails no matter how much memory is allocated. At some point, allocating more memory from the system fails and it seems that PARI/GP does not gracefully handle this out-of-memory condition.

The solution: upgrade PARI/GP.

comment:18 Changed 7 years ago by Jeroen Demeyer

Description: modified (diff)

comment:19 Changed 7 years ago by Jeroen Demeyer

Branch: u/jdemeyer/if_given_unlimited_stack__gp_uses_it_all

comment:20 Changed 7 years ago by Jeroen Demeyer

Commit: 84ebf31d352e519253a07247c9fe2d3c3fa85b6a
Status: newneeds_review

This fixes the issue, currently running make ptestlong.


New commits:

84ebf31Upgrade PARI/GP

comment:21 Changed 7 years ago by Vincent Delecroix

This is not floating point noise

 sage: pari('intnum(x=0,13,sin(x)+sin(x^2) + x)')
- 85.1885681951527
+ 84.1818153922297

The value of the integral computed with gsl is

sage: numerical_integral(sin(x)+sin(x^2)+x,0,13)
(85.18856819515268, 1.3572929447036586e-09)

So Pari was right up to 10-11 and now it gets down to 10-4. This is not acceptable.

comment:22 Changed 7 years ago by git

Commit: 84ebf31d352e519253a07247c9fe2d3c3fa85b6ada46c953bd22e80790ec6437cfe529484c858dec

Branch pushed to git repo; I updated commit sha1. New commits:

da46c95Auto-generated PARI files should depend on decl.pxi

comment:23 Changed 7 years ago by Jeroen Demeyer

Status: needs_reviewneeds_work

comment:24 in reply to:  21 ; Changed 7 years ago by Jeroen Demeyer

Report Upstream: N/ANot yet reported upstream; Will do shortly.

Replying to vdelecroix:

This is not floating point noise

 sage: pari('intnum(x=0,13,sin(x)+sin(x^2) + x)')
- 85.1885681951527
+ 84.1818153922297

The value of the integral computed with gsl is

sage: numerical_integral(sin(x)+sin(x^2)+x,0,13)
(85.18856819515268, 1.3572929447036586e-09)

So Pari was right up to 10-11 and now it gets down to 10-4. This is not acceptable.

It's much worse. Note the 85 became 84.

comment:25 in reply to:  24 ; Changed 7 years ago by Vincent Delecroix

Replying to jdemeyer:

Replying to vdelecroix:

This is not floating point noise

 sage: pari('intnum(x=0,13,sin(x)+sin(x^2) + x)')
- 85.1885681951527
+ 84.1818153922297

The value of the integral computed with gsl is

sage: numerical_integral(sin(x)+sin(x^2)+x,0,13)
(85.18856819515268, 1.3572929447036586e-09)

So Pari was right up to 10-11 and now it gets down to 10-4. This is not acceptable.

It's much worse. Note the 85 became 84.

Indeed! I can confirm the behavior on the freshly compiled pari master development 17795-d04cdd3.

comment:26 in reply to:  25 Changed 7 years ago by Vincent Delecroix

Replying to vdelecroix:

Replying to jdemeyer:

Replying to vdelecroix:

This is not floating point noise

 sage: pari('intnum(x=0,13,sin(x)+sin(x^2) + x)')
- 85.1885681951527
+ 84.1818153922297

The value of the integral computed with gsl is

sage: numerical_integral(sin(x)+sin(x^2)+x,0,13)
(85.18856819515268, 1.3572929447036586e-09)

So Pari was right up to 10-11 and now it gets down to 10-4. This is not acceptable.

It's much worse. Note the 85 became 84.

Indeed! I can confirm the behavior on the freshly compiled pari master development 17795-d04cdd3.

This is rather disjoint from this ticket, I opened #18342 to track the issue.

comment:27 Changed 7 years ago by Jeroen Demeyer

There are also doctest failures in src/sage/calculus/calculus.py due to this.

comment:28 Changed 7 years ago by git

Commit: da46c953bd22e80790ec6437cfe529484c858dec23a01103840dd8dd37197f7a18c9b1ac8635bd47

Branch pushed to git repo; I updated commit sha1. New commits:

23a0110Fix call to PARI qfminim() in shortest_vector()

comment:29 in reply to:  13 Changed 7 years ago by Leif Leonhardy

Replying to jdemeyer:

What is the ulimit -s outside of make? I get

jdemeyer@tamiyo:~$ ulimit -s
8192

so if make changes that to unlimited, that's https://savannah.gnu.org/bugs/?22010 indeed.

Yep, missed that: Outside of make, the stack size is (already) soft-limited to 8MB, which GNU make 3.81 in the distros resets to unlimited.

The original bug report was actually about the opposite: Changing it from unlimited to some huge value.

But the real bug is that (the old) make touches the resource limits of (non-make) subprocesses at all.

comment:30 Changed 7 years ago by git

Commit: 23a01103840dd8dd37197f7a18c9b1ac8635bd4711ce9814729ea25384715b293ea3711f0354b6a2

Branch pushed to git repo; I updated commit sha1. New commits:

11ce981More integration doctest fixes

comment:31 Changed 7 years ago by Jeroen Demeyer

Status: needs_workneeds_review

comment:32 Changed 7 years ago by Jeroen Demeyer

Summary: If given unlimited stack, gp uses it allPARI/GP does not gracefully handle out-of-memory

comment:33 Changed 7 years ago by Leif Leonhardy

But "PARI/GP does not gracefully handle out-of-memory" is "Fixed upstream, but not in a stable release", right?

comment:34 Changed 7 years ago by Jeroen Demeyer

Report Upstream: Not yet reported upstream; Will do shortly.Fixed upstream, but not in a stable release.

comment:35 Changed 7 years ago by Leif Leonhardy

Using bzip2 would save 0.5 MB... (2.9 MB vs. 3.4 MB)

comment:36 Changed 7 years ago by Leif Leonhardy

With Sage 6.7.beta3 and GCC 5.1.0 on Linux x86_64, PARI and eclib pass their test suites (Lcalc no longer has one), and ptestlong passes modulo one unrelated test (I already reported for 6.7.beta3 on sage-release).

Also

(ulimit -s unlimited; ./sage -t --long src/sage/schemes/elliptic_curves/lseries_ell.py)

works as expected.

Haven't looked very closely at the patches though.

The changed behaviour of PARI's intnum() should IMHO be documented, on #18342, where also the doctest could be changed (not just its output).

comment:37 Changed 7 years ago by Leif Leonhardy

P.S.:

These aren't new, i.e., no regression:

../src/modules/mpqs.c:376:16: warning: array subscript is above array bounds [-Warray-bounds]
../src/modules/mpqs.c:378:22: warning: array subscript is above array bounds [-Warray-bounds]
../src/modules/mpqs.c:380:24: warning: array subscript is above array bounds [-Warray-bounds]
../src/modules/mpqs.c:380:24: warning: array subscript is above array bounds [-Warray-bounds]
../src/modules/mpqs.c:381:14: warning: array subscript is above array bounds [-Warray-bounds]

comment:38 Changed 7 years ago by Leif Leonhardy

Ok, patches look reasonable as well.

comment:39 Changed 7 years ago by Leif Leonhardy

Just triggered a build on a (slow) x86 machine; this will presumably take at least half a day...

comment:40 Changed 7 years ago by John Cremona

Looks good to me -- all tests pass (64 bit). If Leif's test passes OK on a 32-bit machine let's approve this. Thanks, Jeroen.

comment:41 in reply to:  39 ; Changed 7 years ago by Leif Leonhardy

Replying to leif:

Just triggered a build on a (slow) x86 machine; this will presumably take at least half a day...

That was probably an estimate for a few releases ago; the last test run is still in progress...

comment:42 in reply to:  41 Changed 7 years ago by Leif Leonhardy

Reviewers: Vincent Delecroix, John Cremona, Leif Leonhardy
Status: needs_reviewpositive_review

Replying to leif:

Replying to leif:

Just triggered a build on a (slow) x86 machine; this will presumably take at least half a day...

That was probably an estimate for a few releases ago; the last test run is still in progress...

It finally finished without errors. (This was on a Pentium4 Prescott, Ubuntu 10.04.4, FSF GCC 4.8.2 with -O3, native build, FWIW.)


I haven't tested building with FLTK, but it looks as if PARI's "configure" would meanwhile work.

comment:43 Changed 7 years ago by Leif Leonhardy

Ooops, just noticed get_fltk.patch should then also get removed from patches/README.txt, but that's really a minor issue.

comment:44 Changed 7 years ago by git

Commit: 11ce9814729ea25384715b293ea3711f0354b6a25249809481c4366e0ea5f9efc0f89b27ce7bc7fd
Status: positive_reviewneeds_review

Branch pushed to git repo; I updated commit sha1 and set ticket back to needs_review. New commits:

5249809Remove mention of get_fltk.patch

comment:45 Changed 7 years ago by Jeroen Demeyer

Status: needs_reviewpositive_review

comment:46 Changed 7 years ago by Volker Braun

Branch: u/jdemeyer/if_given_unlimited_stack__gp_uses_it_all5249809481c4366e0ea5f9efc0f89b27ce7bc7fd
Resolution: fixed
Status: positive_reviewclosed
Note: See TracTickets for help on using tickets.