Opened 5 years ago

Closed 5 years ago

#18340 closed defect (fixed)

PARI/GP does not gracefully handle out-of-memory

Reported by: novoselt Owned by:
Priority: blocker Milestone: sage-6.7
Component: packages: standard Keywords: PARI gp ptestlong doctesting lseries_ell.py
Cc: jdemeyer Merged in:
Authors: Jeroen Demeyer Reviewers: Vincent Delecroix, John Cremona, Leif Leonhardy
Report Upstream: Fixed upstream, but not in a stable release. Work issues:
Branch: 5249809 (Commits) Commit: 5249809481c4366e0ea5f9efc0f89b27ce7bc7fd
Dependencies: Stopgaps:

Description (last modified by jdemeyer)

The following will eat all your memory:

$ ( ulimit -s unlimited; ./sage -t --long src/sage/schemes/elliptic_curves/lseries_ell.py )

This sets the stack size to "unlimited". The gp process which is executed by this doctest will then use a very large amount of memory.

This happens in particular with GNU make < 3.82 due to https://savannah.gnu.org/bugs/?22010

This can be solved by upgrading PARI/GP.

Upstream: http://boxen.math.washington.edu/home/jdemeyer/spkg/pari-2.8-1545-gd04cdd3.tar.gz


See https://groups.google.com/d/topic/sage-devel/ydDfHsCxpAc/discussion

(Older) Thread on sage-release: http://thread.gmane.org/gmane.comp.mathematics.sage.release/2757

Change History (46)

comment:1 Changed 5 years ago by leif

  • Keywords PARI gp ptestlong doctesting added
  • Summary changed from Memory issues while testing to Memory issues while testing (gp process)

This has also been reported on sage-release; it happens presumably since the last update of PARI.

comment:2 Changed 5 years ago by leif

  • Description modified (diff)

For me, on one machine it happens each time I run make ptestlong unless I delete ~/.sage/timings* before doing so. (Deleting these files "disables" the IMHO useless and rather counter-productive sorting of doctests by previous runtime.)

John C. reported it happened to him even with make testlong, i.e., without testing in parallel.

comment:3 Changed 5 years ago by leif

  • Description modified (diff)

Ooops, GB, not MB... 8)

comment:4 Changed 5 years ago by leif

  • Description modified (diff)
  • Keywords lseries_ell.py added

comment:5 Changed 5 years ago by novoselt

Why don't we always limit doctesting by some sensible number (4GB per process seems fine to me) and report failed tests if it is triggered? It is one thing when my old laptop get unresponsive with a new version of Sage, it is a completely different story when "my" server goes down which is used by several researches and hundreds of students.

comment:6 follow-up: Changed 5 years ago by leif

Well, whether ulimit works depends on the OS AFAIK.

And as I said, I do not even get a "Killed" from Sage; the doctest passes even though its child process got killed.

I haven't yet been able to reproduce this other than by running make ptestlong; ./sage -tp --long --warn-long 0.0001 src/sage/schemes/elliptic_curves/ for example doesn't show the behaviour even on an already loaded machine.

comment:7 in reply to: ↑ 6 Changed 5 years ago by leif

Replying to leif:

I haven't yet been able to reproduce this other than by running make ptestlong; ./sage -tp --long --warn-long 0.0001 src/sage/schemes/elliptic_curves/ for example doesn't show the behaviour even on an already loaded machine.

Vaguely thought of that already, and we indeed need (GNU) make, and a sufficiently old one:

echo "WARNING:  'gp' may eat up up to ~16GB with the following"
(ulimit -v 16000000; printf "all:\n\t./sage -t --long src/sage/schemes/elliptic_curves/lseries_ell.py\n" | make -f -)

With make 3.81, gp runs amok; with make 4.1, everything's fine.

(We had a similar issue with ECL and/or GAP quite a while ago IIRC, and the make version without issues was 3.82 and later I think.)

comment:8 follow-up: Changed 5 years ago by novoselt

novoselt@sagenb:~$ make --version
GNU Make 3.81
Copyright (C) 2006  Free Software Foundation, Inc.
This is free software; see the source for copying conditions.
There is NO warranty; not even for MERCHANTABILITY or FITNESS FOR A
PARTICULAR PURPOSE.

This program built for x86_64-pc-linux-gnu

ECL fails some tests on this machine unless I compile with some flags for Bulldozer... I guess updating to jessie may help, but I can't risk breaking things right before the term starts.

comment:9 follow-up: Changed 5 years ago by leif

  • Cc jdemeyer added

In case it's really GNU make's fault: https://savannah.gnu.org/bugs/?22010

Fixed in 3.82, as I guessed.

Note that not just Ubuntu 10.04, but also 12.04 and even 14.04!!! (just checked and couldn't believe) still come with the dead old and broken version 3.81 of 2006.

comment:10 in reply to: ↑ 8 ; follow-up: Changed 5 years ago by leif

Replying to novoselt:

novoselt@sagenb:~$ make --version
GNU Make 3.81
Copyright (C) 2006  Free Software Foundation, Inc.
This is free software; see the source for copying conditions.
There is NO warranty; not even for MERCHANTABILITY or FITNESS FOR A
PARTICULAR PURPOSE.

This program built for x86_64-pc-linux-gnu

ECL fails some tests on this machine unless I compile with some flags for Bulldozer... I guess updating to jessie may help, but I can't risk breaking things right before the term starts.

Well, just grab some newer version (3.82, 4.0 or 4.1) from http://ftp.gnu.org/gnu/make/, untar it, and run configure && make && sudo make install (by default into /usr/local).

comment:11 in reply to: ↑ 9 ; follow-up: Changed 5 years ago by leif

Replying to leif:

In case it's really GNU make's fault: https://savannah.gnu.org/bugs/?22010

Hmmm, I'm still not sure whether it's really this. Ubuntu apparently patched their make, as it does not touch the resource limits, while my GNU make 4.1 limits the stack size to 8MB (soft):

for make in /usr/local/bin/make /usr/bin/make; do $make --version | head -1; printf "all:\n\t: ; ulimit -sH\n\t: ; ulimit -sS\n" | $make -f -; done
GNU Make 4.1
: ; ulimit -sH
unlimited
: ; ulimit -sS
8192
GNU Make 3.81
: ; ulimit -sH
unlimited
: ; ulimit -sS
unlimited

Another cause could be GNU make 3.81 messing up file descriptors somehow.

comment:12 in reply to: ↑ 10 ; follow-up: Changed 5 years ago by novoselt

Replying to leif:

Well, just grab some newer version (3.82, 4.0 or 4.1) from http://ftp.gnu.org/gnu/make/, untar it, and run configure && make && sudo make install (by default into /usr/local).

Alternatively, I can just not test Sage. But it does not fix the issue for other users. If it is a bug in make, but it is a widely adopted version, it still makes sense to work around it. Wheezy is likely to be around for another 3 years.

comment:13 in reply to: ↑ 11 ; follow-up: Changed 5 years ago by jdemeyer

Replying to leif:

Replying to leif:

In case it's really GNU make's fault: https://savannah.gnu.org/bugs/?22010

Hmmm, I'm still not sure whether it's really this. Ubuntu apparently patched their make, as it does not touch the resource limits, while my GNU make 4.1 limits the stack size to 8MB (soft):

for make in /usr/local/bin/make /usr/bin/make; do $make --version | head -1; printf "all:\n\t: ; ulimit -sH\n\t: ; ulimit -sS\n" | $make -f -; done
GNU Make 4.1
: ; ulimit -sH
unlimited
: ; ulimit -sS
8192
GNU Make 3.81
: ; ulimit -sH
unlimited
: ; ulimit -sS
unlimited

What is the ulimit -s outside of make? I get

jdemeyer@tamiyo:~$ ulimit -s
8192

so if make changes that to unlimited, that's https://savannah.gnu.org/bugs/?22010 indeed.

Last edited 5 years ago by jdemeyer (previous) (diff)

comment:14 in reply to: ↑ 12 Changed 5 years ago by jdemeyer

Replying to novoselt:

If it is a bug in make, but it is a widely adopted version, it still makes sense to work around it. Wheezy is likely to be around for another 3 years.

The fixed version of GNU make has been released almost 5 years ago. There is really no excuse for still using make 3.81. Like leif said, you can easily install that manually outside of the distro.

comment:15 Changed 5 years ago by jdemeyer

  • Component changed from PLEASE CHANGE to packages: standard
  • Description modified (diff)
  • Summary changed from Memory issues while testing (gp process) to If given unlimited stack, gp uses it all

I can confirm the issue.

comment:16 Changed 5 years ago by jdemeyer

I'm now trying to see why gp uses so much stack memory.

comment:17 Changed 5 years ago by jdemeyer

  • Authors set to Jeroen Demeyer

OK, this is my analysis: at some point, the dokchitser gp script is really trying to compute a vector of length 348804762364674. This obviously fails with a PARI stack overflow (this "stack" has nothing to do with the ulimit -s stack). As a result, Sage tries to increase gp's memory with the allocatemem() command. But since this vector is really too big, it fails no matter how much memory is allocated. At some point, allocating more memory from the system fails and it seems that PARI/GP does not gracefully handle this out-of-memory condition.

The solution: upgrade PARI/GP.

comment:18 Changed 5 years ago by jdemeyer

  • Description modified (diff)

comment:19 Changed 5 years ago by jdemeyer

  • Branch set to u/jdemeyer/if_given_unlimited_stack__gp_uses_it_all

comment:20 Changed 5 years ago by jdemeyer

  • Commit set to 84ebf31d352e519253a07247c9fe2d3c3fa85b6a
  • Status changed from new to needs_review

This fixes the issue, currently running make ptestlong.


New commits:

84ebf31Upgrade PARI/GP

comment:21 follow-up: Changed 5 years ago by vdelecroix

This is not floating point noise

 sage: pari('intnum(x=0,13,sin(x)+sin(x^2) + x)')
- 85.1885681951527
+ 84.1818153922297

The value of the integral computed with gsl is

sage: numerical_integral(sin(x)+sin(x^2)+x,0,13)
(85.18856819515268, 1.3572929447036586e-09)

So Pari was right up to 10-11 and now it gets down to 10-4. This is not acceptable.

comment:22 Changed 5 years ago by git

  • Commit changed from 84ebf31d352e519253a07247c9fe2d3c3fa85b6a to da46c953bd22e80790ec6437cfe529484c858dec

Branch pushed to git repo; I updated commit sha1. New commits:

da46c95Auto-generated PARI files should depend on decl.pxi

comment:23 Changed 5 years ago by jdemeyer

  • Status changed from needs_review to needs_work

comment:24 in reply to: ↑ 21 ; follow-up: Changed 5 years ago by jdemeyer

  • Report Upstream changed from N/A to Not yet reported upstream; Will do shortly.

Replying to vdelecroix:

This is not floating point noise

 sage: pari('intnum(x=0,13,sin(x)+sin(x^2) + x)')
- 85.1885681951527
+ 84.1818153922297

The value of the integral computed with gsl is

sage: numerical_integral(sin(x)+sin(x^2)+x,0,13)
(85.18856819515268, 1.3572929447036586e-09)

So Pari was right up to 10-11 and now it gets down to 10-4. This is not acceptable.

It's much worse. Note the 85 became 84.

comment:25 in reply to: ↑ 24 ; follow-up: Changed 5 years ago by vdelecroix

Replying to jdemeyer:

Replying to vdelecroix:

This is not floating point noise

 sage: pari('intnum(x=0,13,sin(x)+sin(x^2) + x)')
- 85.1885681951527
+ 84.1818153922297

The value of the integral computed with gsl is

sage: numerical_integral(sin(x)+sin(x^2)+x,0,13)
(85.18856819515268, 1.3572929447036586e-09)

So Pari was right up to 10-11 and now it gets down to 10-4. This is not acceptable.

It's much worse. Note the 85 became 84.

Indeed! I can confirm the behavior on the freshly compiled pari master development 17795-d04cdd3.

comment:26 in reply to: ↑ 25 Changed 5 years ago by vdelecroix

Replying to vdelecroix:

Replying to jdemeyer:

Replying to vdelecroix:

This is not floating point noise

 sage: pari('intnum(x=0,13,sin(x)+sin(x^2) + x)')
- 85.1885681951527
+ 84.1818153922297

The value of the integral computed with gsl is

sage: numerical_integral(sin(x)+sin(x^2)+x,0,13)
(85.18856819515268, 1.3572929447036586e-09)

So Pari was right up to 10-11 and now it gets down to 10-4. This is not acceptable.

It's much worse. Note the 85 became 84.

Indeed! I can confirm the behavior on the freshly compiled pari master development 17795-d04cdd3.

This is rather disjoint from this ticket, I opened #18342 to track the issue.

comment:27 Changed 5 years ago by jdemeyer

There are also doctest failures in src/sage/calculus/calculus.py due to this.

comment:28 Changed 5 years ago by git

  • Commit changed from da46c953bd22e80790ec6437cfe529484c858dec to 23a01103840dd8dd37197f7a18c9b1ac8635bd47

Branch pushed to git repo; I updated commit sha1. New commits:

23a0110Fix call to PARI qfminim() in shortest_vector()

comment:29 in reply to: ↑ 13 Changed 5 years ago by leif

Replying to jdemeyer:

What is the ulimit -s outside of make? I get

jdemeyer@tamiyo:~$ ulimit -s
8192

so if make changes that to unlimited, that's https://savannah.gnu.org/bugs/?22010 indeed.

Yep, missed that: Outside of make, the stack size is (already) soft-limited to 8MB, which GNU make 3.81 in the distros resets to unlimited.

The original bug report was actually about the opposite: Changing it from unlimited to some huge value.

But the real bug is that (the old) make touches the resource limits of (non-make) subprocesses at all.

comment:30 Changed 5 years ago by git

  • Commit changed from 23a01103840dd8dd37197f7a18c9b1ac8635bd47 to 11ce9814729ea25384715b293ea3711f0354b6a2

Branch pushed to git repo; I updated commit sha1. New commits:

11ce981More integration doctest fixes

comment:31 Changed 5 years ago by jdemeyer

  • Status changed from needs_work to needs_review

comment:32 Changed 5 years ago by jdemeyer

  • Summary changed from If given unlimited stack, gp uses it all to PARI/GP does not gracefully handle out-of-memory

comment:33 Changed 5 years ago by leif

But "PARI/GP does not gracefully handle out-of-memory" is "Fixed upstream, but not in a stable release", right?

comment:34 Changed 5 years ago by jdemeyer

  • Report Upstream changed from Not yet reported upstream; Will do shortly. to Fixed upstream, but not in a stable release.

comment:35 Changed 5 years ago by leif

Using bzip2 would save 0.5 MB... (2.9 MB vs. 3.4 MB)

comment:36 Changed 5 years ago by leif

With Sage 6.7.beta3 and GCC 5.1.0 on Linux x86_64, PARI and eclib pass their test suites (Lcalc no longer has one), and ptestlong passes modulo one unrelated test (I already reported for 6.7.beta3 on sage-release).

Also

(ulimit -s unlimited; ./sage -t --long src/sage/schemes/elliptic_curves/lseries_ell.py)

works as expected.

Haven't looked very closely at the patches though.

The changed behaviour of PARI's intnum() should IMHO be documented, on #18342, where also the doctest could be changed (not just its output).

comment:37 Changed 5 years ago by leif

P.S.:

These aren't new, i.e., no regression:

../src/modules/mpqs.c:376:16: warning: array subscript is above array bounds [-Warray-bounds]
../src/modules/mpqs.c:378:22: warning: array subscript is above array bounds [-Warray-bounds]
../src/modules/mpqs.c:380:24: warning: array subscript is above array bounds [-Warray-bounds]
../src/modules/mpqs.c:380:24: warning: array subscript is above array bounds [-Warray-bounds]
../src/modules/mpqs.c:381:14: warning: array subscript is above array bounds [-Warray-bounds]

comment:38 Changed 5 years ago by leif

Ok, patches look reasonable as well.

comment:39 follow-up: Changed 5 years ago by leif

Just triggered a build on a (slow) x86 machine; this will presumably take at least half a day...

comment:40 Changed 5 years ago by cremona

Looks good to me -- all tests pass (64 bit). If Leif's test passes OK on a 32-bit machine let's approve this. Thanks, Jeroen.

comment:41 in reply to: ↑ 39 ; follow-up: Changed 5 years ago by leif

Replying to leif:

Just triggered a build on a (slow) x86 machine; this will presumably take at least half a day...

That was probably an estimate for a few releases ago; the last test run is still in progress...

comment:42 in reply to: ↑ 41 Changed 5 years ago by leif

  • Reviewers set to Vincent Delecroix, John Cremona, Leif Leonhardy
  • Status changed from needs_review to positive_review

Replying to leif:

Replying to leif:

Just triggered a build on a (slow) x86 machine; this will presumably take at least half a day...

That was probably an estimate for a few releases ago; the last test run is still in progress...

It finally finished without errors. (This was on a Pentium4 Prescott, Ubuntu 10.04.4, FSF GCC 4.8.2 with -O3, native build, FWIW.)


I haven't tested building with FLTK, but it looks as if PARI's "configure" would meanwhile work.

comment:43 Changed 5 years ago by leif

Ooops, just noticed get_fltk.patch should then also get removed from patches/README.txt, but that's really a minor issue.

comment:44 Changed 5 years ago by git

  • Commit changed from 11ce9814729ea25384715b293ea3711f0354b6a2 to 5249809481c4366e0ea5f9efc0f89b27ce7bc7fd
  • Status changed from positive_review to needs_review

Branch pushed to git repo; I updated commit sha1 and set ticket back to needs_review. New commits:

5249809Remove mention of get_fltk.patch

comment:45 Changed 5 years ago by jdemeyer

  • Status changed from needs_review to positive_review

comment:46 Changed 5 years ago by vbraun

  • Branch changed from u/jdemeyer/if_given_unlimited_stack__gp_uses_it_all to 5249809481c4366e0ea5f9efc0f89b27ce7bc7fd
  • Resolution set to fixed
  • Status changed from positive_review to closed
Note: See TracTickets for help on using tickets.