Opened 9 years ago

Closed 9 years ago

Last modified 9 years ago

#13211 closed enhancement (fixed)

Upgrade GAP to 4.5.7

Reported by: kini
Owned by: tbd
Priority: major
Milestone: sage-5.6
Component: packages: standard
Keywords:
Cc: ppurka, dimpase, mmarco, jhpalmieri, rbeezer, vbraun, burcin
Merged in: sage-5.6.beta1
Authors: Volker Braun, Jeroen Demeyer
Reviewers: Dmitrii Pasechnik
Report Upstream: Reported upstream. Developers acknowledge bug.
Work issues:
Branch:
Commit:
Dependencies: #13123, #13579
Stopgaps:

Description (last modified by vbraun)

While we are at it, move the gap install to $SAGE_LOCAL/gap/gap-x.y.z. It's not cool to put anything but libraries into /lib. Also, make a symlink latest -> gap-x.y.z so that not every script has to figure out the current version number. This follows what is usually done with Java, another offender that can't install in a standards-compliant manner:

[vbraun@laptop ~]$ ll /usr/java
total 8
drwxr-xr-x. 3 root root 4096 Jul  9 17:37 jdk1.7.0_03
drwxr-xr-x. 8 root root 4096 Jul  9 17:37 jdk1.7.0_05
lrwxrwxrwx. 1 root root   21 Jul  9 17:37 latest -> /usr/java/jdk1.7.0_05
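
The versioned-directory-plus-symlink layout described above can be sketched in Python. This is only an illustration, not what the spkg's install script does: the function name point_latest_at and the temporary-link approach are assumptions made here, and the directory names are examples.

```python
import os
import tempfile


def point_latest_at(parent_dir, version_dir_name, link_name="latest"):
    """Make parent_dir/link_name a relative symlink to version_dir_name.

    Hypothetical helper for illustration; not part of the GAP spkg.
    """
    link_path = os.path.join(parent_dir, link_name)
    tmp_path = link_path + ".tmp"
    # Create the new link under a temporary name first, then rename it over
    # the old one, so a reader never finds "latest" missing.
    if os.path.lexists(tmp_path):
        os.remove(tmp_path)
    os.symlink(version_dir_name, tmp_path)
    os.replace(tmp_path, link_path)
    return link_path


def demo():
    """Install two fake versions in a scratch directory and return the link target."""
    root = tempfile.mkdtemp()
    for name in ("gap-4.5.6", "gap-4.5.7"):
        os.mkdir(os.path.join(root, name))
        point_latest_at(root, name)
    return os.readlink(os.path.join(root, "latest"))
```

Scripts can then refer to the latest directory and never need to know the version number.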

Updated spkgs:

Apply to SAGE_ROOT

Apply

Attachments (18)

endianness.patch (1.5 KB) - added by dimpase 9 years ago.
fixlinuxmeminfo.patch (636 bytes) - added by dimpase 9 years ago.
fix control flow in misc/memory_info.py
trac_13211_mpir_dep.patch (696 bytes) - added by dimpase 9 years ago.
adding MPIR dependence
trac_13211_itanium_fix.patch (826 bytes) - added by vbraun 9 years ago.
Initial patch
trac_13211_quit_after_workspace.patch (1.1 KB) - added by vbraun 9 years ago.
Initial patch
trac_13211_quit.patch (851 bytes) - added by vbraun 9 years ago.
Initial patch
gap.19589 (114.6 KB) - added by jdemeyer 9 years ago.
First 1500 lines of an infinite strace
sage-test-maxima-EIO.strace (599.1 KB) - added by vbraun 9 years ago.
maxima getting EIO & segfault too
writeandcheck.patch (766 bytes) - added by jdemeyer 9 years ago.
Patch added to GAP in gap-4.5.6.p0
trac_13211_pool_size.patch (19.2 KB) - added by vbraun 9 years ago.
Updated patch
cflags.patch (1.0 KB) - added by jdemeyer 9 years ago.
Fix CFLAGS setting in GAP configure file
siginterrupt.patch (649 bytes) - added by jdemeyer 9 years ago.
Fix interrupts in Solaris
gap-4.5.7.p0.diff (7.2 KB) - added by jdemeyer 9 years ago.
Diff for the GAP spkg 4.5.6 -> 4.5.7.p0. For reference / review only.
gap-4.5.7.p1.diff (731 bytes) - added by vbraun 9 years ago.
spkg diff for review only
trac_13211_fix_gap_doctests.patch (67.7 KB) - added by jdemeyer 9 years ago.
Updated for GAP-4.5.7 (contains pool_size patch)
trac_13211_fix_gap_doctests.2.patch (55.5 KB) - added by vbraun 9 years ago.
Updated patch
trac_13211_fix_gap_doctests.3.patch (67.7 KB) - added by vbraun 9 years ago.
Updated patch
trac_13211_fix_gap_doctests_vb.patch (67.8 KB) - added by vbraun 9 years ago.
Updated patch

Change History (275)

comment:1 Changed 9 years ago by dimpase

  • Cc mmarco added
  • Description modified (diff)
  • Summary changed from Upgrade GAP to 4.5.4 to Upgrade GAP to 4.5

comment:2 Changed 9 years ago by jhpalmieri

  • Cc jhpalmieri added

comment:3 Changed 9 years ago by rbeezer

  • Cc rbeezer added

From the GAP website: "The current version is GAP 4.5.5 released on July 17th, 2012."

comment:4 Changed 9 years ago by vbraun

  • Cc vbraun added

comment:5 Changed 9 years ago by vbraun

  • Authors set to Volker Braun
  • Cc burcin added
  • Description modified (diff)
  • Summary changed from Upgrade GAP to 4.5 to Upgrade GAP to 4.5.5

comment:6 Changed 9 years ago by vbraun

  • Keywords rng added

And of course the random number generator (random group element) in GAP changed, yay!

comment:7 Changed 9 years ago by mmarco

It could also be a good moment to include as many GAP packages as possible in the gap_packages spkg. I made a list with the licenses of some packages [1] and an spkg with those packages that have a GPL license and that I was able to install seamlessly [2].

It was all for version 4.4.12, but should be easily ported to 4.5.

[1] https://docs.google.com/spreadsheet/ccc?key=0AvB7eBQ-W5NGdDN2X1NVQnUyQ24tQ05CQzRlVVh2Rmc
[2] https://docs.google.com/open?id=0B_B7eBQ-W5NGWUUzQzYxdFRqaXM

comment:8 follow-up: Changed 9 years ago by vbraun

No, this is not a good moment to add features to gap. Make a separate ticket.

comment:9 in reply to: ↑ 8 Changed 9 years ago by dimpase

Replying to vbraun:

No, this is not a good moment to add features to gap. Make a separate ticket.

Indeed, that can be done later. Let's upgrade first. By the way, did you take care of #13341?

comment:10 Changed 9 years ago by vbraun

I've based it on #13341 and tried not to touch any Cygwin stuff. Though it's probably safe to say that Cygwin broke, I don't have a Windows machine to test on.

comment:11 Changed 9 years ago by vbraun

  • Description modified (diff)

Also the new gap prints escape codes on Fedora 17, I've communicated this to the GAP devs here:

[vbraun@laptop ~]$ echo '1+1;' | sage -gap -q | od -c
0000000 033   [   ?   1   0   3   4   h   2  \n
0000012
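
For scripts that parse the output in the meantime, the stray bytes shown by od can be filtered out. The following is a minimal sketch, assuming the only unwanted output is a terminal private-mode sequence of the form ESC [ ? digits h (or l); the function name is made up for this example and is not something Sage provides.

```python
import re

# Matches private-mode set/reset sequences such as ESC [ ? 1 0 3 4 h.
_PRIVATE_MODE = re.compile(r"\x1b\[\?[0-9;]*[hl]")


def strip_private_mode_codes(text):
    """Return text with private-mode escape sequences removed (illustrative helper)."""
    return _PRIVATE_MODE.sub("", text)
```

This only hides the symptom on the reading side. It does not stop the sequence from being printed.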

comment:12 Changed 9 years ago by SimonKing

Here are test failures on my OpenSUSE laptop.

Most of the tests fail because of changes in the random generator or because of slightly changed error messages or because of changing documentation in GAP or simply because of testing against the version number.

Here are more serious problems:

File "/home/simon/SAGE/prerelease/sage-5.2.rc0/devel/sage-main/sage/interfaces/gap.py", line 1330:
    sage: 'Centralizer' in s5.trait_names()
Exception raised:
    Traceback (most recent call last):
      File "/home/simon/SAGE/prerelease/sage-5.2.rc0/local/bin/ncadoctest.py", line 1231, in run_one_test
        self.run_one_example(test, example, filename, compileflags)
      File "/home/simon/SAGE/prerelease/sage-5.2.rc0/local/bin/sagedoctest.py", line 38, in run_one_example
        OrigDocTestRunner.run_one_example(self, test, example, filename, compileflags)
      File "/home/simon/SAGE/prerelease/sage-5.2.rc0/local/bin/ncadoctest.py", line 1172, in run_one_example
        compileflags, 1) in test.globs
      File "<doctest __main__.example_46[3]>", line 1, in <module>
        'Centralizer' in s5.trait_names()###line 1330:
    sage: 'Centralizer' in s5.trait_names()
      File "/home/simon/SAGE/prerelease/sage-5.2.rc0/local/lib/python/site-packages/sage/interfaces/gap.py", line 1338, in trait_names
        v = eval(v)
      File "<string>", line 4
        "in", "ShallowCopy", <Attribute "Name",
                             ^
    SyntaxError: invalid syntax
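
The traceback shows eval() choking on an entry that is not a plain string. One way to be robust against this, sketched here under the assumption that such entries are printed as <...> objects and that the wanted names are simple double-quoted strings, is to extract the names with regular expressions instead of evaluating the text. This is an illustration only and not necessarily how the patch on this ticket handles it.

```python
import re

_OBJECT = re.compile(r"<[^<>]*>")      # e.g. <Attribute "Name">
_QUOTED = re.compile(r'"([^"\\]*)"')   # simple double-quoted names


def quoted_names(gap_list_text):
    """Return the double-quoted names in a printed GAP list, skipping <...> objects.

    Hypothetical helper; the input format is an assumption based on the
    traceback above.
    """
    without_objects = _OBJECT.sub("", gap_list_text)
    return _QUOTED.findall(without_objects)
```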

There are a few crashes in sage/coding/linear_code.py, like this:

File "/home/simon/SAGE/prerelease/sage-5.2.rc0/devel/sage-main/sage/coding/linear_code.py", line 2047:
    sage: G.order()
Exception raised:
    Traceback (most recent call last):
...
    RuntimeError: Gap produced error output
    Error, Variable: '$sage16' must have a value

       executing Size($sage16);
**********************************************************************
File "/home/simon/SAGE/prerelease/sage-5.2.rc0/devel/sage-main/sage/coding/linear_code.py", line 2716:
    sage: C.zeta_polynomial()
Exception raised:
    Traceback (most recent call last):
...
    RuntimeError: Gap produced error output
    Error, Variable: '$sage24' must have a value

       executing Print($sage24);
**********************************************************************
File "/home/simon/SAGE/prerelease/sage-5.2.rc0/devel/sage-main/sage/coding/linear_code.py", line 599:
    sage: for B in self_orthogonal_binary_codes(7,3,4):
       print B; print B.gen_mat()
Expected:
    Linear code of length 4, dimension 1 over Finite Field of size 2
    [1 1 1 1]
    Linear code of length 6, dimension 2 over Finite Field of size 2
    [1 1 1 1 0 0]
    [0 1 0 1 1 1]
    Linear code of length 7, dimension 3 over Finite Field of size 2
    [1 0 1 1 0 1 0]
    [0 1 0 1 1 1 0]
    [0 0 1 0 1 1 1]
Got:
    Linear code of length 4, dimension 1 over Finite Field of size 2
    [1 1 1 1]
    ** Gap crashed or quit executing 'Read("/home/simon/.sage//temp/linux_sqwp.site/18857//interface//tmp18907");' **
    Restarting Gap and trying again
    Linear code of length 6, dimension 2 over Finite Field of size 2
    [1 1 1 1 0 0]
    [0 1 0 1 1 1]
    Linear code of length 7, dimension 3 over Finite Field of size 2
    [1 0 1 1 0 1 0]
    [0 1 0 1 1 1 0]
    [0 0 1 0 1 1 1]
**********************************************************************

Note the crash in the last example. A similar crash is:

File "/home/simon/SAGE/prerelease/sage-5.2.rc0/devel/sage-main/sage/coding/code_constructions.py", line 530:
    sage: C.minimum_distance()
Expected:
    4
Got:
    ** Gap crashed or quit executing 'Read("/home/simon/.sage//temp/linux_sqwp.site/18774//interface//tmp18791");' **
    Restarting Gap and trying again
    4

Several tests in latin.py fail, probably because of changes in the random generator. Couldn't one test instead whether the squares really are latin?
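
A property test of that kind could look like the sketch below. It works on a plain list of rows rather than on the classes in latin.py, so the function name and input format are assumptions for illustration.

```python
def is_latin_square(rows):
    """Return True if rows is an n x n array in which every row and every
    column contains each of the same n symbols exactly once."""
    n = len(rows)
    if n == 0 or any(len(row) != n for row in rows):
        return False
    symbols = set(rows[0])
    if len(symbols) != n:
        return False
    columns = list(zip(*rows))
    # Each line has n entries, so equality of sets means "each symbol once".
    return all(set(line) == symbols for line in list(rows) + columns)
```

A doctest written this way does not depend on which particular square the random generator produces.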

A strange one:

File "/home/simon/SAGE/prerelease/sage-5.2.rc0/devel/sage-main/sage/tests/cmdline.py", line 359:
    sage: out
Expected:
    '120\n'
Got:
    '\x1b[?1034h120\n'

comment:13 Changed 9 years ago by vbraun

  • Report Upstream changed from N/A to Reported upstream. No feedback yet.

I've isolated a test case for the gap crashes and sent it to the GAP developers.

comment:14 Changed 9 years ago by vbraun

  • Description modified (diff)
  • Keywords rng removed
  • Report Upstream changed from Reported upstream. No feedback yet. to Reported upstream. Developers acknowledge bug.

The crashes are all due to a GAP garbage collection bug in Z/2Z-specific code. I've fixed all other doctest errors. The only remaining ones are

sage -t  -force_lib devel/sage/sage/coding/code_constructions.py # 1 doctests failed
sage -t  -force_lib devel/sage/sage/coding/linear_code.py # 3 doctests failed

comment:15 Changed 9 years ago by vbraun

  • Report Upstream changed from Reported upstream. Developers acknowledge bug. to Fixed upstream, in a later stable release.
  • Status changed from new to needs_review

I received a patch from the upstream developers, to be included in gap-4.5.6. I've updated the gap spkg with the fix, now the Sage testsuite runs without errors. I also updated the gap_packages spkg with a fix for a function name clash in braid-1.1 that broke the GAP testsuite (i.e. SAGE_CHECK=yes). So as far as I'm concerned we are good to go.

I would appreciate a speedy review of this ticket *nudge* *nudge*

comment:16 Changed 9 years ago by SimonKing

Installing the gap-4.5.5 spkg, the new gap packages and database gap spkgs worked fine on openSuse with SAGE_CHECK=yes. Now I'll start the Sage test suite.

comment:17 Changed 9 years ago by SimonKing

The Sage test suite passes as well. hg status has nothing to complain about in any of the three packages, and SPKG.txt is updated in all cases. Hence, from my perspective, it is a positive review. However, I'll repeat on bsd.math, and perhaps someone else can repeat on openSolaris.

comment:18 Changed 9 years ago by SimonKing

PS: Unfortunately most tests of the to-be-reviewed latest version of my p_group_cohomology spkg fail with the new versions of Singular and GAP. But I guess that's my own problem...

comment:19 Changed 9 years ago by rbeezer

Dear Volker and Simon,

Thanks very much for your work on this one. I've been becoming a lot more familiar with the GAP interface this summer and getting undergraduate students involved, so I really appreciate your work getting this organized and reviewed.

Rob

comment:20 follow-up: Changed 9 years ago by jhpalmieri

On two different OS X 10.7 machines, there is one doctest failure:

sage -t --long "devel/sage/sage/interfaces/gap.py"          
**********************************************************************
File "/Users/palmieri/Desktop/Sage_stuff/sage_builds/sage-5.3.rc0-gap/devel/sage/sage/interfaces/gap.py", line 521:
    sage: a = gap(3)
Exception raised:
    Traceback (most recent call last):
      File "/Users/palmieri/Desktop/Sage_stuff/sage_builds/sage-5.3.rc0-gap/local/bin/ncadoctest.py", line 1231, in run_one_test
        self.run_one_example(test, example, filename, compileflags)
      File "/Users/palmieri/Desktop/Sage_stuff/sage_builds/sage-5.3.rc0-gap/local/bin/sagedoctest.py", line 38, in run_one_example
        OrigDocTestRunner.run_one_example(self, test, example, filename, compileflags)
      File "/Users/palmieri/Desktop/Sage_stuff/sage_builds/sage-5.3.rc0-gap/local/bin/ncadoctest.py", line 1172, in run_one_example
        compileflags, 1) in test.globs
      File "<doctest __main__.example_9[8]>", line 1, in <module>
        a = gap(Integer(3))###line 521:
    sage: a = gap(3)
      File "/Users/palmieri/Desktop/Sage_stuff/sage_builds/sage-5.3.rc0-gap/local/lib/python/site-packages/sage/interfaces/interface.py", line 197, in __call__
        return self._coerce_from_special_method(x)
      File "/Users/palmieri/Desktop/Sage_stuff/sage_builds/sage-5.3.rc0-gap/local/lib/python/site-packages/sage/interfaces/interface.py", line 223, in _coerce_from_special_method
        return (x.__getattribute__(s))(self)
      File "sage_object.pyx", line 463, in sage.structure.sage_object.SageObject._gap_ (sage/structure/sage_object.c:4529)
      File "sage_object.pyx", line 439, in sage.structure.sage_object.SageObject._interface_ (sage/structure/sage_object.c:4129)
      File "/Users/palmieri/Desktop/Sage_stuff/sage_builds/sage-5.3.rc0-gap/local/lib/python/site-packages/sage/interfaces/interface.py", line 195, in __call__
        return cls(self, x, name=name)
      File "/Users/palmieri/Desktop/Sage_stuff/sage_builds/sage-5.3.rc0-gap/local/lib/python/site-packages/sage/interfaces/expect.py", line 1330, in __init__
        raise TypeError, x
    TypeError: Gap produced error output
    Error, user interrupt

       executing $sage4:=3;;
**********************************************************************

I'll try to build on OpenSolaris, too.

comment:21 in reply to: ↑ 20 ; follow-up: Changed 9 years ago by SimonKing

Replying to jhpalmieri:

On two different OS X 10.7 machines, there is one doctest failure:

sage -t --long "devel/sage/sage/interfaces/gap.py"          
**********************************************************************
File "/Users/palmieri/Desktop/Sage_stuff/sage_builds/sage-5.3.rc0-gap/devel/sage/sage/interfaces/gap.py", line 521:
    sage: a = gap(3)
Exception raised:
    Traceback (most recent call last):
...

I'll try to build on OpenSolaris, too.

Wow. That's not good. I suppose that there are a lot of further errors. If gap(3) breaks then I guess most other gap stuff will break, too.

FWIW, make ptest worked on my laptop and make ptestlong worked on bsd.math. The spkgs look fine, in terms of hg status and SPKG.txt.

comment:22 Changed 9 years ago by jhpalmieri

That is the only error, actually. I did make ptestlong and everything else passed.

comment:23 Changed 9 years ago by jhpalmieri

A little more data: I can only get the doctest to fail if the system is somewhat loaded (for example, doing parallel doctests). If the system is idle, the doctest passes consistently; if it's loaded, the doctest fails consistently.

(In more detail: I am remotely logging into the machine in my office at the university, and I know that no one else uses the machine. Running 'make ptestlong' gives a failure, running './sage -tp 2 devel/sage/sage/interfaces/' gives a failure, running the doctest while I'm also building another installation of Sage gives a failure, while it passes when running the doctest by itself. It's quite repeatable.)

comment:24 in reply to: ↑ 21 ; follow-up: Changed 9 years ago by vbraun

Replying to SimonKing:

Wow. That's not good. I suppose that there are a lot of further errors. If gap(3) breaks then I guess most other gap stuff will break, too.

This is where Ctrl-C is tested, for the record.

Edit: where automatic restarting of the GAP interpreter is tested.

comment:25 in reply to: ↑ 24 Changed 9 years ago by SimonKing

Replying to vbraun:

This is where Ctrl-C is tested, for the record.

Edit: where automatic restarting of the GAP interpreter is tested.

You mean this one?

        The following tests against a bug fixed at trac ticket #10296:

            sage: a = gap(3)
            sage: gap.eval('quit;')
            ''
            sage: a = gap(3)
            ** Gap crashed or quit executing '$sage...:=3;;' **
            Restarting Gap and trying again
            sage: a
            3

Does it fail in the first or in the second instance of a = gap(3)? Why does it report a user interrupt, when either nothing has happened at all (first instance) or GAP was not running (second instance)?

comment:26 Changed 9 years ago by vbraun

I could reproduce this in repeated testing on Linux. You actually need to scroll up a bit higher:

            sage: gap.interrupt(timeout=1) is not None
            True
            sage: gap._eval_using_file_cutoff = cutoff

        The following tests against a bug fixed at trac ticket #10296:

            sage: a = gap(3)

and it dies in the first gap(3). Because our interrupt() method is crap, it sends the wrong quit string and GAP/readline interferes with Ctrl-C.

The updated patch fixes this. I still find a ~1% chance of failing sage.interfaces.gap doctests in random places, but that's probably par for any pexpect interface.
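
A failure rate of that size only shows up with many repetitions. A minimal sketch of such a loop follows, assuming that the test command exits with a nonzero status when a doctest fails. The function name count_failures is made up for this example, and the command in the usage note is just the one quoted earlier on this ticket.

```python
import subprocess
import sys


def count_failures(command, runs):
    """Run command (a list of arguments) `runs` times and count nonzero exits."""
    failures = 0
    for _ in range(runs):
        result = subprocess.run(
            command,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
        if result.returncode != 0:
            failures += 1
    return failures


def self_check():
    """Exercise the loop with trivial commands that always pass or always fail."""
    ok = count_failures([sys.executable, "-c", "raise SystemExit(0)"], 3)
    bad = count_failures([sys.executable, "-c", "raise SystemExit(1)"], 3)
    return ok, bad
```

For example, count_failures(["./sage", "-t", "devel/sage/sage/interfaces/gap.py"], 500) would return the number of failing runs out of 500.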

comment:27 Changed 9 years ago by vbraun

Also gap_console() was broken. Fixed it and added a meaningful doctest.

comment:28 Changed 9 years ago by jhpalmieri

On OpenSolaris, I am getting doctest failures in interfaces/expect.py. The failures are not always the same, but here is a sample:

I don't know what these have to do with this ticket, but they certainly seem to be caused by the spkg here.


By the way, the new library patch seems to fix the issues on OS X.

comment:29 follow-up: Changed 9 years ago by vbraun

sage -t --long "devel/sage/sage/interfaces/expect.py"       
**********************************************************************
File "/export/home/palmieri/testing/clean/sage-5.3.rc0/devel/sage/sage/interfaces/expect.py", line 608:
    sage: L = [t[1] for t in f(range(5))]
Expected nothing
Got:
    [Errno 12] Not enough space
    Killing any remaining workers...

Disk is full, it seems.

comment:30 in reply to: ↑ 29 ; follow-up: Changed 9 years ago by jhpalmieri

Replying to vbraun:

Disk is full, it seems.

It looks like there is space to me, but which disk should I be looking at? I don't understand why after installing the spkg here, if I then do ./sage -f spkg/standard/gap-4.4.12.p7.spkg, the doctest passes, and then if I do ./sage -f gap-4.5.5.spkg, it fails again. Does the new version of GAP create temporary files somewhere new, compared to the old version?

comment:31 in reply to: ↑ 30 Changed 9 years ago by vbraun

Replying to jhpalmieri:

It looks like there is space to me, but which disk should I be looking at?

I didn't change anything. All temp files should go to $DOT_SAGE as before.

comment:32 follow-up: Changed 9 years ago by SimonKing

On bsd.math, at least when #12876 and its dependencies are added, I find:

sage -t -force_lib "devel/sage/sage/interfaces/gap.py"      
**********************************************************************
File "/scratch/sking/sage-5.3.rc1/devel/sage/sage/interfaces/gap.py", line 809:
    sage: gap(2)
Expected:
    2
Got:
    <BLANKLINE>
**********************************************************************
1 items had failures:
   1 of   4 in __main__.example_21
***Test Failed*** 1 failures.
For whitespace errors, see the file /Users/SimonKing/.sage//tmp/gap_70056.py
         [29.6 s]
 
----------------------------------------------------------------------
The following tests failed:


        sage -t -force_lib "devel/sage/sage/interfaces/gap.py"
Total time for all tests: 29.7 seconds

I'd need to repeat without all the other patches, though, to be on the safe side.

comment:33 Changed 9 years ago by jhpalmieri

OpenSolaris: I tried setting DOT_SAGE to /tmp/palmieri, and I still see the same failures with the new spkg, no failures with the old one. Any ideas?

comment:34 in reply to: ↑ 32 Changed 9 years ago by SimonKing

Replying to SimonKing:

On bsd.math, at least when #12876 and its dependencies are added, I find:

sage -t -force_lib "devel/sage/sage/interfaces/gap.py"      
**********************************************************************
File "/scratch/sking/sage-5.3.rc1/devel/sage/sage/interfaces/gap.py", line 809:
    sage: gap(2)
Expected:
    2
Got:
    <BLANKLINE>

The error is reproducible, with just the new spkg and the doctest fix patch applied. That must be a really nasty side effect, because in an interactive session gap(2) returns 2.

comment:35 Changed 9 years ago by vbraun

I took some time to understand the gap pexpect interface. Altogether, I think it would be better to base it on the normal interface which seems to be more stable and definitely has more exposure. But we are using the package mode (gap -p) and that is neither documented (apart from the source) nor does it behave particularly well when you send Ctrl-C. But that is definitely for another ticket.

I did add more checks after interrupt() that the pexpect interface is in a sane state, and restart gap if it is not. I ran 500+ iterations of the doctest and do not get any failures anymore. I'm pretty confident that this is at least as good as we had before, so please review.
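
The general shape of such a check can be sketched as follows: send a probe with a known answer and restart if anything else comes back. Everything here is hypothetical, including the FakeSession class, its evaluate() and restart() methods, and the probe string. It is not the code in the patch, just the pattern.

```python
class FakeSession:
    """Stand-in for an interpreter session; purely hypothetical."""

    def __init__(self, replies):
        self.replies = list(replies)
        self.restarts = 0

    def evaluate(self, command):
        # Pretend the interpreter answered with the next canned reply.
        return self.replies.pop(0)

    def restart(self):
        self.restarts += 1


def ensure_synchronized(session, token="sync-12345"):
    """Return True if the session answers a probe as expected.

    Otherwise restart the session and return False.
    """
    try:
        reply = session.evaluate('Print("%s\\n");' % token)
    except Exception:
        reply = None
    if isinstance(reply, str) and reply.strip() == token:
        return True
    session.restart()
    return False
```

The probe token should be something that cannot be left over from earlier output, so that a stale reply is not mistaken for a correct one.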

comment:36 Changed 9 years ago by vbraun

There was one doctest error in sage/interfaces/expect.py where there is a synchronization method for the expect interface. Since it doesn't know about the GAP package mode it actually desynchronizes the gap interface ;-) Fixed in the updated patch.

comment:37 Changed 9 years ago by jhpalmieri

On skynet machine mark (Solaris on sparc, Sage built with SAGE_INSTALL_GCC=yes), this version of Gap doesn't work:

$ ./sage --gap
 *********   GAP, Version 4.5.5 of 16-Jul-2012 (free software, GPL)
 *  GAP  *   http://www.gap-system.org
 *********   Architecture: sparc-sun-solaris2.10-gcc-default32
 Libs used:  gmp
 Loading the library and packages ...
/home/palmieri/mark/sage-5.3.rc1/spkg/bin/sage: line 400:  4079 Bus Error               (core dumped) "$SAGE_LOCAL/bin/gap" "$@"

comment:38 Changed 9 years ago by vbraun

I'm not surprised that it doesn't work on SPARC. I'll send the authors a bug report. But since it's not a primary platform it shouldn't stop us from shipping the new GAP version.

comment:39 Changed 9 years ago by vbraun

I've investigated the SPARC issue and received a patch from the GAP developers. I've updated the spkg; it now builds and tests fine on mark.

comment:40 follow-up: Changed 9 years ago by vbraun

  • Description modified (diff)
  • Report Upstream changed from Fixed upstream, in a later stable release. to Completely fixed; Fix reported upstream
  • Summary changed from Upgrade GAP to 4.5.5 to Upgrade GAP to 4.5.6

Naturally, nobody dared to review this ticket. So GAP released the next version in the meantime. Updated to gap-4.5.6.

comment:41 in reply to: ↑ 40 Changed 9 years ago by dimpase

Replying to vbraun:

Naturally, nobody dared to review this ticket. So GAP released the next version in the meantime. Updated to gap-4.5.6.

challenge accepted :/ expect a review real soon...

comment:42 Changed 9 years ago by dimpase

http://www.stp.dias.ie/~vbraun/Sage/spkg/gap-4.5.6.spkg has uncommitted changes in SPKG.txt and the spkg itself is not compressed, making it over 50MB instead of under 10MB.
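
A quick way to catch the compression problem before uploading is to look at the first bytes of the file. The sketch below assumes the spkg is expected to be a bzip2-compressed tarball. The helper names are invented for this example, and write_temp exists only so that the check can be tried without a real spkg.

```python
import bz2
import os
import tempfile

BZIP2_MAGIC = b"BZh"  # every bzip2 stream starts with these three bytes


def looks_bzip2_compressed(path):
    """Return True if the file at path starts with the bzip2 magic bytes."""
    with open(path, "rb") as handle:
        return handle.read(len(BZIP2_MAGIC)) == BZIP2_MAGIC


def write_temp(data):
    """Demo helper: write bytes to a temporary file and return its path."""
    fd, path = tempfile.mkstemp()
    with os.fdopen(fd, "wb") as handle:
        handle.write(data)
    return path
```

This says nothing about uncommitted changes; hg status inside the unpacked spkg is still needed for that.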

comment:43 Changed 9 years ago by dimpase

  • Description modified (diff)
  • Status changed from needs_review to positive_review

I've created the spkg file with checked-in changes and compressed it. The link is included in the modified ticket description. Otherwise, great stuff! Positive review.

comment:44 Changed 9 years ago by jdemeyer

  • Milestone changed from sage-5.4 to sage-5.5
  • Reviewers set to Dmitrii Pasechnik

comment:45 Changed 9 years ago by jdemeyer

  • Dependencies set to #13123
  • Status changed from positive_review to needs_work

This needs to be rebased to #13123.

comment:46 follow-up: Changed 9 years ago by kcrisman

I get the following errors on Mac OS X 10.4 PPC. I think they are all about the seed for random tests. Notice that they did pass in the past, and they are neither the old nor the new versions of the expected results from the patch here. Could the little/big-endian have any impact on this? I assume not, but otherwise it seems odd that they worked in the past and don't now. Of course, they ARE "random"...

sage -t  "devel/sage-main/sage/algebras/group_algebra_new.py"
**********************************************************************
File "/Users/student/Desktop/sage-5.4.beta1/devel/sage-main/sage/algebras/group_algebra_new.py", line 592:
    sage: GroupAlgebra(DihedralGroup(6), QQ).random_element()
Expected:
    -1/95*(2,6)(3,5) - 1/2*(1,3)(4,6)
Got:
    -1/95*(1,3)(4,6) - 1/2*(1,5,3)(2,6,4)
**********************************************************************
File "/Users/student/Desktop/sage-5.4.beta1/devel/sage-main/sage/algebras/group_algebra_new.py", line 594:
    sage: GroupAlgebra(SU(2, 13), QQ).random_element(1)
Expected:
    1/2*[      1 9*a + 2]
    [9*a + 2      12]
Got:
    1/2*[       4  9*a + 2]
    [6*a + 10        1]
**********************************************************************
1 items had failures:
   2 of   5 in __main__.example_23
***Test Failed*** 2 failures.
For whitespace errors, see the file /Users/student/.sage//tmp/group_algebra_new_23167.py
         [83.0 s]
sage -t  "devel/sage-main/sage/groups/matrix_gps/__init__.py"
         [2.8 s]
sage -t  "devel/sage-main/sage/groups/matrix_gps/all.py"    
         [0.9 s]
sage -t  "devel/sage-main/sage/groups/matrix_gps/general_linear.py"
         [93.5 s]
sage -t  "devel/sage-main/sage/groups/matrix_gps/homset.py" 
         [30.6 s]
sage -t  "devel/sage-main/sage/groups/matrix_gps/linear.py" 
         [21.4 s]
sage -t  "devel/sage-main/sage/groups/matrix_gps/matrix_group.py"
         [129.7 s]
sage -t  "devel/sage-main/sage/groups/matrix_gps/matrix_group_element.py"
         [53.0 s]
sage -t  "devel/sage-main/sage/groups/matrix_gps/matrix_group_morphism.py"
         [50.4 s]
sage -t  "devel/sage-main/sage/groups/matrix_gps/orthogonal.py"
**********************************************************************
File "/Users/student/Desktop/sage-5.4.beta1/devel/sage-main/sage/groups/matrix_gps/orthogonal.py", line 244:
    sage: GO( 3, GF(7), 0).random_element()
Expected:
    [1 0 0]
    [6 1 6]
    [5 0 6]
Got:
    [1 5 3]
    [0 1 0]
    [0 6 6]
**********************************************************************
File "/Users/student/Desktop/sage-5.4.beta1/devel/sage-main/sage/groups/matrix_gps/orthogonal.py", line 142:
    sage: G.random_element()
Expected:
    [4 3 5 2]
    [6 6 4 0]
    [0 4 6 0]
    [4 4 5 1]
Got:
    [0 6 4 6]
    [2 5 0 2]
    [5 5 4 0]
    [1 0 3 4]
**********************************************************************
2 items had failures:
   1 of   6 in __main__.example_10
   1 of   6 in __main__.example_4
***Test Failed*** 2 failures.
For whitespace errors, see the file /Users/student/.sage//tmp/orthogonal_23218.py
         [39.1 s]
sage -t  "devel/sage-main/sage/groups/matrix_gps/special_linear.py"
         [42.5 s]
sage -t  "devel/sage-main/sage/groups/matrix_gps/symplectic.py"
**********************************************************************
File "/Users/student/Desktop/sage-5.4.beta1/devel/sage-main/sage/groups/matrix_gps/symplectic.py", line 16:
    sage: G.random_element()
Expected:
    [5 4 6 0]
    [1 1 6 2]
    [5 5 0 6]
    [5 4 5 1]
Got:
    [2 3 2 1]
    [6 4 6 5]
    [1 2 5 2]
    [6 5 1 0]
**********************************************************************
1 items had failures:
   1 of   8 in __main__.example_0
***Test Failed*** 1 failures.
For whitespace errors, see the file /Users/student/.sage//tmp/symplectic_23230.py
         [30.4 s]
sage -t  "devel/sage-main/sage/groups/matrix_gps/unitary.py"
**********************************************************************
File "/Users/student/Desktop/sage-5.4.beta1/devel/sage-main/sage/groups/matrix_gps/unitary.py", line 26:
    sage: G.random_element()
Expected:
    [4*a + 1 4*a + 4   a + 4]
    [3*a + 3       3       3]
    [  a + 2 4*a + 1 3*a + 3]
Got:
    [2*a + 3       4 3*a + 2]
    [3*a + 4       a 3*a + 1]
    [    4*a 2*a + 2   a + 2]
**********************************************************************
1 items had failures:
   1 of  10 in __main__.example_0
***Test Failed*** 1 failures.
For whitespace errors, see the file /Users/student/.sage//tmp/unitary_23236.py
         [31.7 s]
sage -t  "devel/sage-main/sage/misc/randstate.pyx"          
**********************************************************************
File "/Users/student/Desktop/sage-5.4.beta1/devel/sage-main/sage/misc/randstate.pyx", line 57:
    sage: rtest()
Expected:
    (303, -0.266166246380421, 1/2*x^2 - 1/95*x - 1/2, (1,3,2), [ 0, 0, 0, 0, 1 ], 963229057, 8045, 0.9661911734708414)  
Got:
    (303, -0.266166246380421, 1/2*x^2 - 1/95*x - 1/2, (1,2)(4,5), [ 0, 0, 0, 0, 1 ], 963229057, 8045, 0.9661911734708414)
**********************************************************************
File "/Users/student/Desktop/sage-5.4.beta1/devel/sage-main/sage/misc/randstate.pyx", line 61:
    sage: rtest()
Expected:
    (978, 0.0557699430711638, -3*x^2 - 1/12, (1,3,2), [ 0, 1, 1, 0, 0 ], 1161603091, 60359, 0.8335077654199736)  
Got:
    (978, 0.0557699430711638, -3*x^2 - 1/12, (1,2,3), [ 0, 1, 1, 0, 0 ], 1161603091, 60359, 0.8335077654199736)
**********************************************************************
File "/Users/student/Desktop/sage-5.4.beta1/devel/sage-main/sage/misc/randstate.pyx", line 65:
    sage: rtest()
Expected:
    (207, -0.0141049486533456, 4*x^2 + 1/2, (1,3,2), [ 0, 0, 1, 0, 1 ],  637693405, 27695, 0.19982565117278328)  
Got:
    (207, -0.0141049486533456, 4*x^2 + 1/2, (2,3), [ 0, 0, 1, 0, 1 ], 637693405, 27695, 0.19982565117278328)
**********************************************************************
File "/Users/student/Desktop/sage-5.4.beta1/devel/sage-main/sage/misc/randstate.pyx", line 69:
    sage: rtest()
Expected:
    (303, -0.266166246380421, 1/2*x^2 - 1/95*x - 1/2, (1,3,2), [ 0, 0, 0, 0, 1 ], 963229057, 8045, 0.9661911734708414)  
Got:
    (303, -0.266166246380421, 1/2*x^2 - 1/95*x - 1/2, (1,2)(4,5), [ 0, 0, 0, 0, 1 ], 963229057, 8045, 0.9661911734708414)
**********************************************************************
File "/Users/student/Desktop/sage-5.4.beta1/devel/sage-main/sage/misc/randstate.pyx", line 73:
    sage: rtest()
Expected:
    (978, 0.0557699430711638, -3*x^2 - 1/12, (1,3,2), [ 0, 1, 1, 0, 0 ], 1161603091, 60359, 0.8335077654199736)  
Got:
    (978, 0.0557699430711638, -3*x^2 - 1/12, (1,2,3), [ 0, 1, 1, 0, 0 ], 1161603091, 60359, 0.8335077654199736)
**********************************************************************
File "/Users/student/Desktop/sage-5.4.beta1/devel/sage-main/sage/misc/randstate.pyx", line 77:
    sage: rtest()
Expected:
    (207, -0.0141049486533456, 4*x^2 + 1/2, (1,3,2), [ 0, 0, 1, 0, 1 ],  637693405, 27695, 0.19982565117278328)  
Got:
    (207, -0.0141049486533456, 4*x^2 + 1/2, (2,3), [ 0, 0, 1, 0, 1 ], 637693405, 27695, 0.19982565117278328)
**********************************************************************
File "/Users/student/Desktop/sage-5.4.beta1/devel/sage-main/sage/misc/randstate.pyx", line 88:
    sage: rtest()
Expected:
    (720, -0.612180244315804, x^2 - x, (2,3), [ 1, 0, 0, 0, 0 ], 912534076, 14005, 0.9205331599518184)   
Got:
    (720, -0.612180244315804, x^2 - x, (1,3), [ 1, 0, 0, 0, 0 ], 912534076, 14005, 0.9205331599518184)
**********************************************************************
File "/Users/student/Desktop/sage-5.4.beta1/devel/sage-main/sage/misc/randstate.pyx", line 224:
    sage: r1 = rtest(); r1
Expected:
    (303, -0.266166246380421, 1/2*x^2 - 1/95*x - 1/2, (1,3,2), [ 0, 0, 0, 0, 1 ], 963229057, 8045, 0.9661911734708414)  
Got:
    (303, -0.266166246380421, 1/2*x^2 - 1/95*x - 1/2, (1,2)(4,5), [ 0, 0, 0, 0, 1 ], 963229057, 8045, 0.9661911734708414)
**********************************************************************
File "/Users/student/Desktop/sage-5.4.beta1/devel/sage-main/sage/misc/randstate.pyx", line 227:
    sage: r2 = rtest(); r2
Expected:
    (105, 0.642309615982449, -x^2 - x - 6, (1,2,3), [ 1, 0, 0, 1, 1 ], 14082860, 1271, 0.001767155077382232)  
Got:
    (105, 0.642309615982449, -x^2 - x - 6, (4,5), [ 1, 0, 0, 1, 1 ], 14082860, 1271, 0.001767155077382232)
**********************************************************************
File "/Users/student/Desktop/sage-5.4.beta1/devel/sage-main/sage/misc/randstate.pyx", line 236:
    sage: with seed(1): rtest()
Expected:
    (978, 0.0557699430711638, -3*x^2 - 1/12, (1,3,2), [ 0, 1, 1, 0, 0 ], 1161603091, 60359, 0.8335077654199736)  
Got:
    (978, 0.0557699430711638, -3*x^2 - 1/12, (1,2,3), [ 0, 1, 1, 0, 0 ], 1161603091, 60359, 0.8335077654199736)
**********************************************************************
File "/Users/student/Desktop/sage-5.4.beta1/devel/sage-main/sage/misc/randstate.pyx", line 239:
    sage: r2m = rtest(); r2m
Expected:
    (105, 0.642309615982449, -x^2 - x - 6, (1,2,3), [ 1, 0, 0, 1, 1 ], 14082860, 19769, 0.001767155077382232)  
Got:
    (105, 0.642309615982449, -x^2 - x - 6, (4,5), [ 1, 0, 0, 1, 1 ], 14082860, 19769, 0.001767155077382232)
**********************************************************************
File "/Users/student/Desktop/sage-5.4.beta1/devel/sage-main/sage/misc/randstate.pyx", line 255:
    sage: with seed(1):
          rtest();
          rtest();
Expected:
    (978, 0.0557699430711638, -3*x^2 - 1/12, (1,3,2), [ 0, 1, 1, 0, 0 ], 1161603091, 60359, 0.8335077654199736)  
    (138, -0.0404945051288503, 2*x - 24, (2,3), [ 1, 1, 1, 0, 1 ], 1966097838, 10234, 0.0033332230808060803)      
Got:
    (978, 0.0557699430711638, -3*x^2 - 1/12, (1,2,3), [ 0, 1, 1, 0, 0 ], 1161603091, 60359, 0.8335077654199736)
    (138, -0.0404945051288503, 2*x - 24, (2,3), [ 1, 1, 1, 0, 1 ], 1966097838, 10234, 0.0033332230808060803)
**********************************************************************
File "/Users/student/Desktop/sage-5.4.beta1/devel/sage-main/sage/misc/randstate.pyx", line 274:
    sage: try:
          ctx.__enter__()
          rtest()
    finally:
          ctx.__exit__(None, None, None)
Expected:
    <sage.misc.randstate.randstate object at 0x...>
    (978, 0.0557699430711638, -3*x^2 - 1/12, (1,3,2), [ 0, 1, 1, 0, 0 ], 1161603091, 60359, 0.8335077654199736)  
    False
Got:
    <sage.misc.randstate.randstate object at 0x155950c0>
    (978, 0.0557699430711638, -3*x^2 - 1/12, (1,2,3), [ 0, 1, 1, 0, 0 ], 1161603091, 60359, 0.8335077654199736)
    False
**********************************************************************
File "/Users/student/Desktop/sage-5.4.beta1/devel/sage-main/sage/misc/randstate.pyx", line 703:
    sage: gap.Random(1, 10^50)
Expected:
    1496738263332555434474532297768680634540939580077
Got:
    97144566318213989637952954803537490912828430192472
**********************************************************************

comment:47 follow-up: Changed 9 years ago by vbraun

In sage.misc.randstate.pyx there is a method set_seed_gap() that checks for big endian-ness and flips bytes around in that case. It could be that GAP's new random number code is actually big endian clean. Can you modify set_seed_gap() and see if that fixes things?

comment:48 in reply to: ↑ 47 Changed 9 years ago by kcrisman

In sage.misc.randstate.pyx there is a method set_seed_gap() that checks for big endian-ness and flips bytes around in that case. It could be that GAP's new random number code is actually big endian clean. Can you modify set_seed_gap() and see if that fixes things?

Well, what do you know! I didn't think Sage had any custom code for big endian... I'll try this now. It certainly sounds likely.


It does fix the problem in randstate.pyx, so I think you're right about the others. Testing now.

comment:49 Changed 9 years ago by kcrisman

sage -t  "devel/sage-main/sage/algebras/group_algebra_new.py"
         [81.4 s]
sage -t  "devel/sage-main/sage/groups/matrix_gps/orthogonal.py"
         [31.4 s]
sage -t  "devel/sage-main/sage/groups/matrix_gps/symplectic.py"
         [30.7 s]
sage -t  "devel/sage-main/sage/groups/matrix_gps/unitary.py"
         [31.2 s]
sage -t  "devel/sage-main/sage/misc/randstate.pyx"          
         [109.4 s]

I just gutted that part of the code.

  • sage/misc/randstate.pyx

    # HG changeset patch
    # User Karl-Dieter Crisman <kcrisman@gmail.com>
    # Date 1348512346 14400
    # Node ID aedbdc34982e44f81636a8006f7fc44086604f43
    # Parent  e1f48782037500988a1ea3f3b0764ead26b6232f
    Remove big-endian workaround
    
    diff --git a/sage/misc/randstate.pyx b/sage/misc/randstate.pyx
    --- a/sage/misc/randstate.pyx
    +++ b/sage/misc/randstate.pyx
    @@ -718,23 +718,6 @@
                     seed = ZZ.random_element(long(1)<<128)
                     classic_seed = seed
     
    -                if sys.byteorder == 'big':
    -                    # GAP's random number generator initialization
    -                    # (in integer.c, in FuncInitRandomMT) takes its
    -                    # seed as a string, then converts this string into
    -                    # an array of 32-bit integers just by casting the
    -                    # pointer.  Thus, the result depends on the
    -                    # endianness of the machine.  As a workaround, we
    -                    # swap the bytes in the string ourselves, so that
    -                    # GAP always gets the same array of integers.
    -
    -                    seed = str(seed)
    -                    new_seed = ''
    -                    while len(seed) >= 4:
    -                        new_seed += seed[3::-1]
    -                        seed = seed[4:]
    -                    seed = '"' + new_seed + '"'
    -
                     mersenne_seed = seed
     
                 prev_mersenne_seed = gap.Reset(gap.GlobalMersenneTwister, mersenne_seed)

I don't know why these never format quite right...
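
For readers unfamiliar with the issue, here is a minimal sketch of why a seed string reinterpreted as 32-bit integers depends on byte order, and why swapping each 4-byte group compensates. It is an illustration only, not the Sage or GAP code: the helper names words_from_seed and swap_each_word are hypothetical, and struct is used to emulate the pointer cast described in the removed comment.

```python
import struct

def words_from_seed(seed_bytes, byteorder):
    """Read a byte string as unsigned 32-bit integers in the given byte order.

    This emulates casting a char pointer to an array of 32-bit integers
    on a little-endian or big-endian machine.
    """
    n = len(seed_bytes) // 4
    prefix = '<' if byteorder == 'little' else '>'
    return struct.unpack(prefix + '%dI' % n, seed_bytes[:4 * n])

def swap_each_word(seed_bytes):
    """Reverse the bytes inside every complete 4-byte group.

    A trailing group shorter than 4 bytes is kept unchanged here
    (an assumption of this sketch).
    """
    n = len(seed_bytes) // 4
    swapped = b''.join(seed_bytes[4 * i:4 * i + 4][::-1] for i in range(n))
    return swapped + seed_bytes[4 * n:]

seed = b'12345678'
little = words_from_seed(seed, 'little')
big = words_from_seed(seed, 'big')
# The same bytes give different integers on the two kinds of machine,
# but a big-endian reader of the swapped string sees the little-endian values.
big_after_swap = words_from_seed(swap_each_word(seed), 'big')
```

If the consumer of the seed already reads it in a byte-order-independent way, applying such a swap on big-endian machines changes the result instead of preserving it, which is consistent with the failures reported above.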

comment:50 in reply to: ↑ 46 Changed 9 years ago by dimpase

Replying to kcrisman:

I get the following errors on Mac OS X 10.4 PPC. I think they are all about the seed for random tests. Notice that they did pass in the past, and they are neither the old nor the new versions of the expected results from the patch here. Could the little/big-endian have any impact on this?

Oh dear, now I see why it felt like a good deja vu: see #9867. Nice moldy bitrotten ticket. It was never merged when the current ticket was prepared.

I've put #9867 up for review. Please close it and merge it here somehow...

Changed 9 years ago by dimpase

comment:51 Changed 9 years ago by dimpase

  • Description modified (diff)

I have added the patch from #9867 here.

comment:52 Changed 9 years ago by dimpase

  • Description modified (diff)

comment:53 Changed 9 years ago by vbraun

  • Dependencies changed from #13123 to #13123, #13123
  • Status changed from needs_work to positive_review

I've rebased it on #13123 (that is, on the as-yet unreleased sage-5.4.beta2) and folded in the endianness patch.

comment:54 Changed 9 years ago by jdemeyer

  • Dependencies changed from #13123, #13123 to #13123

I don't think it depends on #13123 twice.

comment:55 Changed 9 years ago by jdemeyer

  • Description modified (diff)

It seems the endianness patch is included in the other patch...

comment:56 Changed 9 years ago by jdemeyer

When testing this in a from-scratch built Sage, I got

sage -t  -force_lib devel/sage/sage/tests/cmdline.py
**********************************************************************
File "/release/merger/sage-5.5.beta0/devel/sage-main/sage/tests/cmdline.py", line 408:
    sage: err
Expected:
    ''
Got:
    'gap: halving pool size.\n'
**********************************************************************

On subsequent tests, I did not get this message.

comment:57 Changed 9 years ago by jdemeyer

Looking at the GAP sources, this seems to be caused by a lack of memory...

comment:58 Changed 9 years ago by jdemeyer

Also, I somehow ended up with orphan GAP processes running:

jdemeyer@sage:/release/merger/sage-5.5.beta0$ ps -ef |grep gap
jdemeyer 10863     1 99 14:26 ?        01:21:46 /release/merger/sage-5.5.beta0/local/gap/latest/bin/x86_64-unknown-linux-gnu-gcc-default64/gap -m 24m -l /release/merger/sage-5.5.beta0/local/gap/latest -r -b -p -T -o 9999G /release/merger/sage-5.5.beta0/local/share/sage/ext/gap/sage.g
jdemeyer 17015 10589  0 15:49 pts/124  00:00:00 grep gap
jdemeyer 22854     1 99 14:52 ?        00:56:56 /release/merger/sage-5.5.beta0/local/gap/latest/bin/x86_64-unknown-linux-gnu-gcc-default64/gap -m 24m -l /release/merger/sage-5.5.beta0/local/gap/latest -r -b -p -T -o 9999G /release/merger/sage-5.5.beta0/local/share/sage/ext/gap/sage.g

comment:59 follow-up: Changed 9 years ago by vbraun

It's not a lack of actual memory, it is a lack of swap space. GAP reserves address space and the corresponding swap space so that the address space could actually be used (potentially by swapping) if GAP were to use that much memory. If the initial workspace is too large, it retries with half the size, iteratively, until it succeeds. The "halving pool size" is harmless but cannot be disabled.
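
The retry strategy described here can be sketched as follows. This is only an illustration in Python under stated assumptions: GAP's real allocator is written in C and is not shown in this ticket, and the names reserve_with_halving, try_reserve and minimum_size are hypothetical.

```python
def reserve_with_halving(try_reserve, initial_size, minimum_size=1):
    """Try to reserve initial_size units, halving the request on each failure.

    try_reserve is a hypothetical callable that returns True when a
    reservation of the given size succeeds. Returns the size that was
    finally reserved, or None if even minimum_size could not be reserved.
    """
    size = initial_size
    while size >= minimum_size:
        if try_reserve(size):
            return size
        # This is the point where GAP would report "halving pool size".
        size //= 2
    return None

# Example: pretend that at most 100 units can be reserved.
granted = reserve_with_halving(lambda n: n <= 100, 1024)
```

With a limit of 100 units and an initial request of 1024, the requests are 1024, 512, 256, 128 and finally 64, which succeeds.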

comment:60 in reply to: ↑ 59 Changed 9 years ago by jdemeyer

Replying to vbraun:

The "halving pool size" is harmless but cannot be disabled.

The printf() could be patched out... if it's harmless.

comment:61 Changed 9 years ago by vbraun

I've written upstream about this issue, hopefully it'll be fixed in a future release.

I'm not entirely happy with our allocation strategy: if you are running multiple Sage processes (e.g. on a server), you quickly reserve most of the swap. I'll add a patch to tweak the pool size.

comment:62 Changed 9 years ago by jdemeyer

In any case, something needs to be done about the cmdline.py doctest failure due to that "error" message which isn't even an error.

Also: after running all doctests again, I ended up again with an orphaned gap process.

comment:63 Changed 9 years ago by jdemeyer

  • Status changed from positive_review to needs_work

comment:64 follow-up: Changed 9 years ago by vbraun

  • Dependencies changed from #13123 to #13123, #13579
  • Description modified (diff)
  • Status changed from needs_work to needs_review

I've fixed the "halving pool size" doctest error and changed the workspace allocation to default to 1/10*(available swap). This should give you enough workspace without reserving all swap when you run a server (or parallel doctests).
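
As a rough sketch of the 1/10 rule, assuming a Linux-style /proc/meminfo and assuming that "available swap" means the SwapTotal field (the patch itself may read a different field or use another source), the default could be computed like this. The function names are hypothetical and are not taken from trac_13211_pool_size.patch.

```python
def swap_total_bytes(meminfo_text):
    """Return SwapTotal in bytes from /proc/meminfo-style text, or None."""
    for line in meminfo_text.splitlines():
        if line.startswith('SwapTotal:'):
            # The line looks like "SwapTotal:  2048000 kB".
            return int(line.split()[1]) * 1024
    return None

def default_pool_size(meminfo_text, divisor=10):
    """Return one tenth of the swap size in bytes, or None if unknown."""
    swap = swap_total_bytes(meminfo_text)
    if swap is None:
        return None
    return swap // divisor

sample = (
    "MemTotal:        8000000 kB\n"
    "SwapTotal:       2048000 kB\n"
    "SwapFree:        1024000 kB\n"
)
pool = default_pool_size(sample)
```

On a real Linux system the text would come from reading /proc/meminfo; the sample string is used here so that the sketch runs anywhere.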

I can't reproduce any gap orphans. I verified that the gap pid is correctly written to spawned_processes so even if Sage gets kill -9'ed the sage-cleaner will get rid of orphans. Maybe sage-cleaner has some issue on your setup, Jeroen?

The additional trac_13211_pool_size.patch needs review... Dima? ;-)

comment:65 in reply to: ↑ 64 Changed 9 years ago by jdemeyer

Replying to vbraun:

I can't reproduce any gap orphans. I verified that the gap pid is correctly written to spawned_processes

I haven't looked at the code, but are you sure gap is started only from one place in Sage? Maybe the command line helps:

/release/merger/sage-5.5.beta0/local/gap/latest/bin/x86_64-unknown-linux-gnu-gcc-default64/gap -m 24m -l /release/merger/sage-5.5.beta0/local/gap/latest -r -b -p -T -o 9999G /release/merger/sage-5.5.beta0/local/share/sage/ext/gap/sage.g

comment:66 Changed 9 years ago by vbraun

I spent quite some time today trying to find any change that might change where or how gap is started, but I didn't find any. I ran various doctests and tried to kill -9 the sage process, but never managed to produce any orphans that weren't cleaned up.

comment:67 Changed 9 years ago by dimpase

On MacOSX 10.6.8 I get

sage -t --long -force_lib "devel/sage/sage/misc/memory_info.py"
**********************************************************************
File "/usr/local/src/sage/sage-5.4.rc0/devel/sage/sage/misc/memory_info.py", line 99:
    sage: print "ignore this";  mem._parse_proc_meminfo()   # random output
Exception raised:
    Traceback (most recent call last):
      File "/usr/local/src/sage/sage-5.4.rc0/local/bin/ncadoctest.py", line 1231, in run_one_test
        self.run_one_example(test, example, filename, compileflags)
      File "/usr/local/src/sage/sage-5.4.rc0/local/bin/sagedoctest.py", line 38, in run_one_example
        OrigDocTestRunner.run_one_example(self, test, example, filename, compileflags)
      File "/usr/local/src/sage/sage-5.4.rc0/local/bin/ncadoctest.py", line 1172, in run_one_example
        compileflags, 1) in test.globs
      File "<doctest __main__.example_3[4]>", line 1, in <module>
        print "ignore this";  mem._parse_proc_meminfo()   # random output###line 99:
    sage: print "ignore this";  mem._parse_proc_meminfo()   # random output
    AttributeError: 'MemoryInfo_guess' object has no attribute '_parse_proc_meminfo'
**********************************************************************
1 items had failures:
   1 of   6 in __main__.example_3
***Test Failed*** 1 failures.
For whitespace errors, see the file /Users/dima/.sage//tmp/memory_info_41483.py
	 [2.6 s]

comment:68 follow-up: Changed 9 years ago by vbraun

I see, raising an exception still makes # random doctests fail. Updated patch fixes this and adds special handling for OSX to get the ram size.

comment:69 in reply to: ↑ 68 ; follow-up: Changed 9 years ago by dimpase

Replying to vbraun:

I see, raising an exception still makes # random doctests fail. Updated patch fixes this and adds special handling for OSX to get the ram size.

OK, this works on OSX now, good. Let me check on Linux...

comment:70 in reply to: ↑ 69 Changed 9 years ago by dimpase

Replying to dimpase:

Replying to vbraun:

I see, raising an exception still makes # random doctests fail. Updated patch fixes this and adds special handling for OSX to get the ram size.

OK, this works on OSX now, good. Let me check on Linux...

with #13579 and #13211 patches applied, I have the following weirdness on Debian:

$ ../../sage    
----------------------------------------------------------------------
| Sage Version 5.4.rc1, Release Date: 2012-10-05                     |
| Type "notebook()" for the browser-based notebook interface.        |
| Type "help()" for help.                                            |
----------------------------------------------------------------------
**********************************************************************
*                                                                    *
* Warning: this is a prerelease version, and it may be unstable.     *
*                                                                    *
**********************************************************************
        top: unknown argument 'l'
usage:  top -hv | -bcisSH -d delay -n iterations [-u user | -U user] -p pid [,pid ...]

sage: 

here is my patch queue:

/usr/local/src/sage/sage-5.4.rc1/devel/sage$ hg qapplied
13579_secure_tmp.patch
trac_13579_fix_test_executable.patch
trac_13211_fix_gap_doctests.patch
trac_13211_pool_size.patch

PS. This seems to be due to failure to tell OSX from Linux!

Last edited 9 years ago by dimpase (previous) (diff)

Changed 9 years ago by dimpase

fix control flow in misc/memory_info.py

comment:71 Changed 9 years ago by dimpase

Well, why don't you actually check for Darwin/OSX? E.g.

import platform
if platform.system() == 'Darwin':
   # do OSX-thing...
else:
   # the rest...

comment:72 Changed 9 years ago by mmarco

I can't apply the patches.

sage: hg_sage.apply('http://trac.sagemath.org/sage_trac/raw-attachment/ticket/13211/trac_13211_fix_gap_doctests.patch')
Attempting to load remote file: http://trac.sagemath.org/sage_trac/raw-attachment/ticket/13211/trac_13211_fix_gap_doctests.patch
Loading: [.......]
cd "/home/mmarco/sage-5.3/devel/sage" && sage --hg import   "/home/mmarco/.sage/temp/neumann/583/tmp_0.patch"
applying /home/mmarco/.sage/temp/neumann/583/tmp_0.patch
patching file sage/interfaces/gap.py
Hunk #20 FAILED at 1663
1 out of 21 hunks FAILED -- saving rejects to file sage/interfaces/gap.py.rej
patching file sage/tests/cmdline.py
Hunk #1 FAILED at 516
1 out of 1 hunks FAILED -- saving rejects to file sage/tests/cmdline.py.rej
abort: patch failed to apply

comment:73 Changed 9 years ago by vbraun

You need at least sage-5.4.beta2 to apply the patches, I think.

comment:74 follow-up: Changed 9 years ago by vbraun

Ok should work now!

comment:75 in reply to: ↑ 74 Changed 9 years ago by dimpase

Replying to vbraun:

Ok should work now!

looks good. Another round of tests, and hopefully it's done.

comment:76 Changed 9 years ago by dimpase

There is an apparent incompatibility with the new GUAVA GAP package:

sage -t  --optional -force_lib devel/sage/sage/coding/linear_code.py
**********************************************************************
File "/usr/local/src/sage/sage-5.4.rc1/devel/sage-main/sage/coding/linear_code.py", line 1239:
    sage: C.covering_radius()  # requires optional GAP package Guava
Exception raised:
    Traceback (most recent call last):
      File "/usr/local/src/sage/sage-5.4.rc1/local/bin/ncadoctest.py", line 1231, in run_one_test
        self.run_one_example(test, example, filename, compileflags)
      File "/usr/local/src/sage/sage-5.4.rc1/local/bin/sagedoctest.py", line 38, in run_one_example
        OrigDocTestRunner.run_one_example(self, test, example, filename, compileflags)
      File "/usr/local/src/sage/sage-5.4.rc1/local/bin/ncadoctest.py", line 1172, in run_one_example
        compileflags, 1) in test.globs
      File "<doctest __main__.example_24[3]>", line 1, in <module>
        C.covering_radius()  # requires optional GAP package Guava###line 1239:
    sage: C.covering_radius()  # requires optional GAP package Guava
      File "/usr/local/src/sage/sage-5.4.rc1/local/lib/python/site-packages/sage/coding/linear_code.py", line 1245, in covering_radius
        C = gapG.GeneratorMatCode(gap(F))
      File "/usr/local/src/sage/sage-5.4.rc1/local/lib/python/site-packages/sage/interfaces/interface.py", line 584, in __call__
        return self._obj.parent().function_call(self._name, [self._obj] + list(args), kwds)
      File "/usr/local/src/sage/sage-5.4.rc1/local/lib/python/site-packages/sage/interfaces/gap.py", line 874, in function_call
        ['%s=%s'%(key,value.name()) for key, value in kwds.items()])))
      File "/usr/local/src/sage/sage-5.4.rc1/local/lib/python/site-packages/sage/interfaces/gap.py", line 546, in eval
        result = Expect.eval(self, input_line, **kwds)
      File "/usr/local/src/sage/sage-5.4.rc1/local/lib/python/site-packages/sage/interfaces/expect.py", line 1236, in eval
        for L in code.split('\n') if L != ''])
      File "/usr/local/src/sage/sage-5.4.rc1/local/lib/python/site-packages/sage/interfaces/gap.py", line 747, in _eval_line
        raise RuntimeError, message
    RuntimeError: Gap produced error output
    Error, Variable: 'GeneratorMatCode' must have a value

       executing GeneratorMatCode($sage8,$sage1);
**********************************************************************
.... etc

comment:77 Changed 9 years ago by dimpase

  • Cc ppurka added

comment:78 Changed 9 years ago by dimpase

  • Status changed from needs_review to positive_review

Positive review. The issues with optional packages, such as the above, and over here at sage-devel, should be tackled elsewhere.

comment:79 Changed 9 years ago by jdemeyer

  • Milestone changed from sage-5.5 to sage-pending

comment:80 Changed 9 years ago by vbraun

rebased for sage-5.4.rc2

comment:81 follow-up: Changed 9 years ago by jhpalmieri

While we are at it, move the gap install to $SAGE_LOCAL/gap/gap.x.y.z.

Would it make sense to use $SAGE_LOCAL/share/gap/gap.x.y.z instead?

comment:82 in reply to: ↑ 81 Changed 9 years ago by vbraun

Replying to jhpalmieri:

Would it make sense to use $SAGE_LOCAL/share/gap/gap.x.y.z instead?

No, /share/ isn't for binaries.

comment:83 Changed 9 years ago by jdemeyer

  • Milestone changed from sage-pending to sage-5.5

comment:84 follow-up: Changed 9 years ago by jdemeyer

  • Status changed from positive_review to needs_work

This fails on Skynet eno:

Host system:
Linux eno 3.3.7-1.fc16.x86_64 #1 SMP Tue May 22 13:59:39 UTC 2012 x86_64 x86_64 x86_64 GNU/Linux
****************************************************
C compiler: gcc
C compiler version:
Using built-in specs.
COLLECT_GCC=gcc
COLLECT_LTO_WRAPPER=/usr/local/gcc-4.7.0/x86_64-Linux-core2-fc/libexec/gcc/x86_64-unknown-linux-gnu/4.7.0/lto-wrapper
Target: x86_64-unknown-linux-gnu
Configured with: /usr/local/gcc-4.7.0/src/gcc-4.7.0/configure --enable-languages=c,c++,fortran --with-gnu-as --with-as=/usr/local/binutils-2.22/x86_64-Linux-core2-fc-gcc-4.6.2-rh/bin/as --with-gnu-ld --with-ld=/usr/local/binutils-2.22/x86_64-Linux-core2-fc-gcc-4.6.2-rh/bin/ld --with-gmp=/usr/local/mpir-2.5.1/x86_64-Linux-core2-fc-gcc-4.6.3-rh --with-mpfr=/usr/local/mpfr-3.1.0/x86_64-Linux-core2-fc-mpir-2.5.1-gcc-4.6.3-rh --with-mpc=/usr/local/mpc-0.9/x86_64-Linux-core2-fc-mpir-2.5.1-mpfr-3.1.0-gcc-4.6.3-rh --prefix=/usr/local/gcc-4.7.0/x86_64-Linux-core2-fc
Thread model: posix
gcc version 4.7.0 (GCC) 
****************************************************
spkg-install is using
VERSION = 4.5.6
GAP_DIR = gap-4.5.6
INSTALL_DIR = /home/buildbot/build/sage/eno-1/eno_full/build/sage-5.5.beta1/local/gap/gap-4.5.6
Applying patches...
patching file gap.shi
patching file tst/testinstall.g
Configuring GAP...
checking for gcc... gcc
checking whether the C compiler works... yes
checking for C compiler default output file name... a.out
checking for suffix of executables... 
checking whether we are cross compiling... no
checking for suffix of object files... o
checking whether we are using the GNU C compiler... yes
checking whether gcc accepts -g... yes
checking for gcc option to accept ISO C89... none needed
checking how to run the C preprocessor... gcc -E
checking for grep that handles long lines and -e... /bin/grep
checking for egrep... /bin/grep -E
checking for ANSI C header files... yes
checking for sys/types.h... yes
checking for sys/stat.h... yes
checking for stdlib.h... yes
checking for string.h... yes
checking for memory.h... yes
checking for strings.h... yes
checking for inttypes.h... yes
checking for stdint.h... yes
checking for unistd.h... yes
checking size of void *... 8
checking ABI bit size... 64
checking build system type... x86_64-unknown-linux-gnu
checking host system type... x86_64-unknown-linux-gnu
checking target system type... x86_64-unknown-linux-gnu
checking for gcc... (cached) gcc
checking whether we are using the GNU C compiler... (cached) yes
checking whether gcc accepts -g... (cached) yes
checking for gcc option to accept ISO C89... (cached) none needed
checking whether make sets $(MAKE)... yes
checking GAP config name... default64
configure: error: Could not locate GMP in the specified location
Error configuring GAP.

comment:85 in reply to: ↑ 84 ; follow-up: Changed 9 years ago by dimpase

Replying to jdemeyer:

This fails on Skynet eno:

Host system:
Linux eno 3.3.7-1.fc16.x86_64 #1 SMP Tue May 22 13:59:39 UTC 2012 x86_64 x86_64 x86_64 GNU/Linux
****************************************************
C compiler: gcc
C compiler version:
Using built-in specs.
COLLECT_GCC=gcc
COLLECT_LTO_WRAPPER=/usr/local/gcc-4.7.0/x86_64-Linux-core2-fc/libexec/gcc/x86_64-unknown-linux-gnu/4.7.0/lto-wrapper
Target: x86_64-unknown-linux-gnu
Configured with: /usr/local/gcc-4.7.0/src/gcc-4.7.0/configure --enable-languages=c,c++,fortran --with-gnu-as --with-as=/usr/local/binutils-2.22/x86_64-Linux-core2-fc-gcc-4.6.2-rh/bin/as --with-gnu-ld --with-ld=/usr/local/binutils-2.22/x86_64-Linux-core2-fc-gcc-4.6.2-rh/bin/ld --with-gmp=/usr/local/mpir-2.5.1/x86_64-Linux-core2-fc-gcc-4.6.3-rh --with-mpfr=/usr/local/mpfr-3.1.0/x86_64-Linux-core2-fc-mpir-2.5.1-gcc-4.6.3-rh --with-mpc=/usr/local/mpc-0.9/x86_64-Linux-core2-fc-mpir-2.5.1-mpfr-3.1.0-gcc-4.6.3-rh --prefix=/usr/local/gcc-4.7.0/x86_64-Linux-core2-fc
Thread model: posix
gcc version 4.7.0 (GCC) 

must everything be able to run with gcc 4.7.0, too?

comment:86 in reply to: ↑ 85 Changed 9 years ago by jdemeyer

Replying to dimpase:

must everything be able to run with gcc 4.7.0, too?

At least some gcc-4.7.x version should work.

comment:87 Changed 9 years ago by jdemeyer

The orphaned processes are still a problem for me. After playing around with a Sage version including this patch, I am seeing 9 gap processes running at 100% CPU.

comment:88 Changed 9 years ago by jdemeyer

  • Keywords orphaned processes build on eno added

comment:89 Changed 9 years ago by jdemeyer

On Itanium (Skynet iras):

gap(15123): unaligned access to 0x607ffffffecfa0ef, ip=0x400000000019cbd0

gap(15123): unaligned access to 0x607ffffffecfa0e7, ip=0x400000000019cbd0

gap(15123): unaligned access to 0x607ffffffecfa0df, ip=0x400000000019cbd0

gap(15123): unaligned access to 0x607ffffffecfa0d7, ip=0x400000000019cbd0

gap(15123): unaligned access to 0x607ffffffecfa0cf, ip=0x400000000019cbd0

sage -t  --long -force_lib devel/sage/sage/interfaces/gap.py
**********************************************************************
File "/home/buildbot/build/sage/iras-1/iras_full/build/sage-5.5.beta1/devel/sage-main/sage/interfaces/gap.py", line 886:
    sage: c = gap.trait_names()
Exception raised:
    Traceback (most recent call last):
      File "/home/buildbot/build/sage/iras-1/iras_full/build/sage-5.5.beta1/local/bin/ncadoctest.py", line 1231, in run_one_test
        self.run_one_example(test, example, filename, compileflags)
      File "/home/buildbot/build/sage/iras-1/iras_full/build/sage-5.5.beta1/local/bin/sagedoctest.py", line 38, in run_one_example
        OrigDocTestRunner.run_one_example(self, test, example, filename, compileflags)
      File "/home/buildbot/build/sage/iras-1/iras_full/build/sage-5.5.beta1/local/bin/ncadoctest.py", line 1172, in run_one_example
        compileflags, 1) in test.globs
      File "<doctest __main__.example_21[2]>", line 1, in <module>
        c = gap.trait_names()###line 886:
    sage: c = gap.trait_names()
      File "/home/buildbot/build/sage/iras-1/iras_full/build/sage-5.5.beta1/local/lib/python/site-packages/sage/interfaces/gap.py", line 1379, in trait_names
        self.__trait_names = eval(self.eval('NamesSystemGVars()')) + \
      File "<string>", line 2644
        gap(15082): unaligned access to 0x607ffffffe79200f, ip=0x400000000019cbd0
                  ^
    SyntaxError: invalid syntax
**********************************************************************

comment:90 follow-up: Changed 9 years ago by jdemeyer

On OS X 10.6 (William's bsd machine), I get the same build error as eno:

Host system:
Darwin bsd.math.washington.edu 10.8.0 Darwin Kernel Version 10.8.0: Tue Jun  7 16:32:41 PDT 2011; root:xnu-1504.15.3~1/RELEASE_X86_64 x86_64
****************************************************
C compiler: gcc
C compiler version:
Using built-in specs.
COLLECT_GCC=gcc
COLLECT_LTO_WRAPPER=/Users/buildbot/build/sage/bsd-1/bsd_full/build/sage-5.5.beta1/local/libexec/gcc/x86_64-apple-darwin10.8.0/4.6.3/lto-wrapper
Target: x86_64-apple-darwin10.8.0
Configured with: ../src/configure --prefix=/Users/buildbot/build/sage/bsd-1/bsd_full/build/sage-5.5.beta1/local --with-local-prefix=/Users/buildbot/build/sage/bsd-1/bsd_full/build/sage-5.5.beta1/local --with-gmp=/Users/buildbot/build/sage/bsd-1/bsd_full/build/sage-5.5.beta1/local --with-mpfr=/Users/buildbot/build/sage/bsd-1/bsd_full/build/sage-5.5.beta1/local --with-mpc=/Users/buildbot/build/sage/bsd-1/bsd_full/build/sage-5.5.beta1/local --with-system-zlib --disable-multilib  
Thread model: posix
gcc version 4.6.3 (GCC) 
****************************************************
spkg-install is using
VERSION = 4.5.6
GAP_DIR = gap-4.5.6
INSTALL_DIR = /Users/buildbot/build/sage/bsd-1/bsd_full/build/sage-5.5.beta1/local/gap/gap-4.5.6
Applying patches...
patching file gap.shi
patching file tst/testinstall.g
Configuring GAP...
checking for gcc... gcc
checking whether the C compiler works... yes
checking for C compiler default output file name... a.out
checking for suffix of executables... 
checking whether we are cross compiling... no
checking for suffix of object files... o
checking whether we are using the GNU C compiler... yes
checking whether gcc accepts -g... yes
checking for gcc option to accept ISO C89... none needed
checking how to run the C preprocessor... cpp
checking for grep that handles long lines and -e... /usr/bin/grep
checking for egrep... /usr/bin/grep -E
checking for ANSI C header files... yes
checking for sys/types.h... yes
checking for sys/stat.h... yes
checking for stdlib.h... yes
checking for string.h... yes
checking for memory.h... yes
checking for strings.h... yes
checking for inttypes.h... yes
checking for stdint.h... yes
checking for unistd.h... yes
checking size of void *... 8
checking ABI bit size... 64
checking build system type... x86_64-apple-darwin10.8.0
checking host system type... x86_64-apple-darwin10.8.0
checking target system type... x86_64-apple-darwin10.8.0
checking for gcc... (cached) gcc
checking whether we are using the GNU C compiler... (cached) yes
checking whether gcc accepts -g... (cached) yes
checking for gcc option to accept ISO C89... (cached) none needed
checking whether make sets $(MAKE)... yes
checking GAP config name... default64
configure: error: Could not locate GMP in the specified location
Error configuring GAP.

comment:91 follow-up: Changed 9 years ago by jdemeyer

Concerning the orphaned process: could it be that they appear during the build as opposed to when running Sage? Typical command line:

/release/merger/sage-5.5.beta2/local/gap/latest/bin/x86_64-unknown-linux-gnu-gcc-default64/gap -m 24m -l /release/merger/sage-5.5.beta2/local/gap/latest -r -b -p -T -o 6271720243 /release/merger/sage-5.5.beta2/local/share/sage/ext/gap/sage.g

comment:92 in reply to: ↑ 91 Changed 9 years ago by dimpase

Replying to jdemeyer:

Concerning the orphaned process: could it be that they appear during the build as opposed to when running Sage? Typical command line:

/release/merger/sage-5.5.beta2/local/gap/latest/bin/x86_64-unknown-linux-gnu-gcc-default64/gap -m 24m -l /release/merger/sage-5.5.beta2/local/gap/latest -r -b -p -T -o 6271720243 /release/merger/sage-5.5.beta2/local/share/sage/ext/gap/sage.g

well, I have seen an orphaned gap process a couple of times, but I don't even recall whether it was with this patch, or not.

comment:93 in reply to: ↑ 90 Changed 9 years ago by dimpase

Replying to jdemeyer:

On OS X 10.6 (William's bsd machine), I get the same build error as eno:

Oh, OK, this is a missing dependency, I think. GAP depends upon MPIR now, as it uses (pseudo)GMP. I'll attach a patch in a second.

Changed 9 years ago by dimpase

adding MPIR dependence

comment:94 Changed 9 years ago by dimpase

  • Description modified (diff)
  • Status changed from needs_work to needs_review

comment:95 Changed 9 years ago by vbraun

The question is not why are there GAP orphans, but why is the sage-cleaner not killing them once the parent Sage process quits?

I'll shortly add a patch to run gap with prctl --unaligned=silent on Itanium; this will get rid of the offending alignment warnings.
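
One way such a wrapper could look is sketched below. It is a hypothetical illustration, not the contents of trac_13211_itanium_fix.patch: the function name gap_command is made up, and it assumes that Itanium Linux reports the machine type 'ia64' and that the prctl utility is on the PATH there.

```python
import platform

def gap_command(gap_binary, args, machine=None):
    """Build the argument list used to start GAP.

    On Itanium (machine type 'ia64'), prefix the command with
    'prctl --unaligned=silent' so that the kernel does not print
    unaligned-access warnings into GAP's output.
    """
    if machine is None:
        machine = platform.machine()
    cmd = [gap_binary] + list(args)
    if machine == 'ia64':
        cmd = ['prctl', '--unaligned=silent'] + cmd
    return cmd

# The machine type is passed explicitly so the example runs on any host.
itanium_cmd = gap_command('gap', ['-r', '-b'], machine='ia64')
other_cmd = gap_command('gap', ['-r', '-b'], machine='x86_64')
```

Keeping the warnings out of the output matters because, as the gap.trait_names() failure above shows, the interface evaluates what GAP prints.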

comment:96 Changed 9 years ago by dimpase

  • Keywords build on eno removed

comment:97 Changed 9 years ago by jdemeyer

I'm currently building again to try to determine when the processes are created.

Changed 9 years ago by vbraun

Initial patch

comment:98 Changed 9 years ago by vbraun

  • Description modified (diff)

Positive review to dima's dependency patch.

I've also added the patch to suppress the itanium warnings. I haven't tested it on itanium yet (building sage-5.4.rc3 first). But I've verified that the prctl call gets rid of the warnings.

comment:99 Changed 9 years ago by vbraun

PS: The GAP command line includes -o 6271720243, this is clearly from trac_13211_pool_size.patch and not a process created during build.

comment:100 follow-up: Changed 9 years ago by jdemeyer

  • Status changed from needs_review to needs_work

I found out how to reproduce the orphans: running local/bin/sage-starts creates a gap orphan every time I run it.

comment:101 in reply to: ↑ 100 ; follow-up: Changed 9 years ago by dimpase

Replying to jdemeyer:

I found out how to reproduce the orphans: running local/bin/sage-starts creates a gap orphan every time I run it.

I can reproduce this neither on Debian x86_64 nor on OSX 10.6.8. Specifically, I cd to SAGE_ROOT and start ./local/bin/sage-starts there. I even tried to remove ~/.sage/, no difference.

It looks like we need more details on the system you see this. (Preferably, access to it...).

comment:102 in reply to: ↑ 101 ; follow-up: Changed 9 years ago by dimpase

Replying to dimpase:

Replying to jdemeyer:

I found out how to reproduce the orphans: running local/bin/sage-starts creates a gap orphan every time I run it.

I can reproduce this neither on Debian x86_64 nor on OSX 10.6.8. Specifically, I cd to SAGE_ROOT and start ./local/bin/sage-starts there. I even tried to remove ~/.sage/, no difference.

It looks like we need more details on the system you see this. (Preferably, access to it...).

OK, got it. I need to issue several local/bin/sage-starts within a short period of time on Debian x86_64. Then I get the orphans! Otherwise, not. A race condition, of sorts?

comment:103 in reply to: ↑ 102 ; follow-up: Changed 9 years ago by dimpase

Replying to dimpase:

Replying to dimpase:

Replying to jdemeyer:

I found out how to reproduce the orphans: running local/bin/sage-starts creates a gap orphan every time I run it.

I can reproduce this neither on Debian x86_64 nor on OSX 10.6.8. Specifically, I cd to SAGE_ROOT and start ./local/bin/sage-starts there. I even tried to remove ~/.sage/, no difference.

It looks like we need more details on the system you see this. (Preferably, access to it...).

OK, got it. I need to issue several local/bin/sage-starts within a short period of time on Debian x86_64. Then I get the orphans! Otherwise, not. A race condition, of sorts?

Oops, no, in fact, these are not orphans, in the sense that they are not staying running forever. They all finish within 20-30 seconds. A typical process looks as follows (very similar to Jeroen's case):

dima     17195     1 93 19:24 ?        00:00:25 /usr/local/src/sage/sage-5.4.rc3/local/gap/latest/bin/x86_64-unknown-linux-gnu-gcc-default64/gap -m 24m -l /usr/local/src/sage/sage-5.4.rc3/local/gap/latest -r -b -p -T -o 872851456 /usr/local/src/sage/sage-5.4.rc3/local/share/sage/ext/gap/sage.g

Jeroen, so you say that for you these processes don't finish by themselves?

comment:104 in reply to: ↑ 103 Changed 9 years ago by jdemeyer

Replying to dimpase:

Jeroen, so you say that for you these processes don't finish by themselves?

Yes, they do not finish by themselves (at least not within a day).

It looks like we need more details on the system you see this. (Preferably, access to it...).

sage.math.washington.edu: Ubuntu 8.04.4 LTS, x86_64.

comment:105 Changed 9 years ago by jdemeyer

Well, I spoke too soon, sometimes they do finish by themselves.

comment:106 follow-up: Changed 9 years ago by jdemeyer

But there's still the obvious question: what is that gap process doing and why does it keep running for a while?

comment:107 Changed 9 years ago by jdemeyer

Running strace on such a gap process shows:

getrusage(RUSAGE_SELF, {ru_utime={97, 710000}, ru_stime={0, 200000}, ...}) = 0
write(1, "@!", 2)                       = -1 EIO (Input/output error)
getrusage(RUSAGE_SELF, {ru_utime={97, 740000}, ru_stime={0, 200000}, ...}) = 0
write(1, "@!", 2)                       = -1 EIO (Input/output error)
getrusage(RUSAGE_SELF, {ru_utime={97, 770000}, ru_stime={0, 200000}, ...}) = 0
write(1, "@!", 2)                       = -1 EIO (Input/output error)
getrusage(RUSAGE_SELF, {ru_utime={97, 800000}, ru_stime={0, 200000}, ...}) = 0
write(1, "@!", 2)                       = -1 EIO (Input/output error)
getrusage(RUSAGE_SELF, {ru_utime={97, 830000}, ru_stime={0, 200000}, ...}) = 0

[...]

getrusage(RUSAGE_SELF, {ru_utime={116, 940000}, ru_stime={0, 200000}, ...}) = 0
write(1, "@!", 2)                       = -1 EIO (Input/output error)
--- SIGSEGV (Segmentation fault) @ 0 (0) ---
Process 12608 detached

It looks like it's stuck in an infinite loop until it finally segfaults, which is almost certainly not what's supposed to happen.

comment:108 Changed 9 years ago by jdemeyer

For those interested: a full strace can be found at http://boxen.math.washington.edu/home/jdemeyer/gap.trace

comment:109 in reply to: ↑ 106 Changed 9 years ago by dimpase

Replying to jdemeyer:

But there's still the obvious question: what is that gap process doing and why does it keep running for a while?

It's the same GAP process as the one that gets started during the "normal" Sage startup. It does the following things, as you can see by uncommenting the LogTo line at the end of sage.g.

gap> LoadPackage("ctbllib");
true
...
gap> SaveWorkspace("/var/folders/qW/qWY+4Ku1GF0WXrOsV+IDvk+++TM/-Tmp-//dotsageKTXveD/gap/workspace-8046660130267724445");
true
gap> 

The LoadPackage() calls are OK, they just load GAP's packages. The tough part is SaveWorkspace(), which dumps GAP's workspace into a binary file that can be loaded back later. Not in this case, of course, because we are invoked with --nodotsage, so it goes to waste.

So I suppose that's what gets stuck, and crashes, for some reason (e.g. not enough disk space?).

comment:110 Changed 9 years ago by vbraun

What's the content of your ~/.sage/tmp/<hostname>/<pid>/spawned_processes file?

comment:111 Changed 9 years ago by vbraun

PS: Writing the workspace dump succeeds as you can see from Jeroen's strace.

We don't specifically close down GAP afterwards, but just close the stdin/out pipes. This sends GAP into a tizzy, and it keeps trying to read from stdin after getting EIO (why would you do this?). Still, the real question remains: why is GAP not killed by the sage cleaner?

comment:112 Changed 9 years ago by jdemeyer

Now I see ENOSPC errors (I think not before):

open("/tmp/dotsagewHCJJ1/gap/workspace-5368401171496622154", O_WRONLY|O_CREAT|O_TRUNC, 0644) = 4
write(4, "GAP workspace\0004.5.6\00064 bit\0\10\7\6\5\4"..., 100000) = -1 ENOSPC (No space left on device)

But this doesn't make sense as there is plenty of space on /tmp and I can create large files manually there:

jdemeyer@sage:sage-5.5.beta0-gap$ dd if=/dev/zero of=/tmp/dotsagewHCJJ1/gap/workspace-5368401171496622154 bs=1024 count=100000
100000+0 records in
100000+0 records out
102400000 bytes (102 MB) copied, 0.158741 s, 645 MB/s

I am lost...

comment:113 Changed 9 years ago by jdemeyer

Could the ENOSPC be related to the fact that this is a tmpfs and that GAP reserves all the memory for itself?

comment:114 Changed 9 years ago by ppurka

Does gap create lots of small files in /tmp? If so, are you running out of inodes? You can check using df -i.

comment:115 Changed 9 years ago by jdemeyer

/tmp is as good as empty:

jdemeyer@sage:~$ df -h /tmp
Filesystem      Size  Used Avail Use% Mounted on
tmpfs            16G   25M   16G   1% /tmp
jdemeyer@sage:~$ df -i /tmp
Filesystem     Inodes IUsed  IFree IUse% Mounted on
tmpfs          524288  1253 523035    1% /tmp

(looking at the trace, gap doesn't create a lot of files in /tmp)
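For anyone who wants to run the same check from a script, here is a minimal sketch of what df -h and df -i report, using os.statvfs. The helper name space_and_inodes is made up for illustration and is not part of Sage.

```python
import os


def space_and_inodes(path):
    """Return (free_bytes, free_inodes) for the filesystem containing path."""
    st = os.statvfs(path)
    # Space available to unprivileged users, in bytes.
    free_bytes = st.f_bavail * st.f_frsize
    # Inodes available to unprivileged users.
    free_inodes = st.f_favail
    return free_bytes, free_inodes
```

Calling space_and_inodes("/tmp") just before SaveWorkspace() runs would show whether either number is really close to zero when the ENOSPC appears.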

comment:116 Changed 9 years ago by vbraun

Possibly a kernel bug? From the 2.6.38 changelog: "tmpfs: fix spurious ENOSPC when racing with unswap". GAP puts pressure on swap since it locks address space (and hence reduces the available swap).

comment:117 Changed 9 years ago by jdemeyer

Switching filesystems made the ENOSPC disappear.

comment:118 Changed 9 years ago by jdemeyer

But now I cannot reproduce the orphans anymore, they now crash much earlier:

write(4, "\0\0\0\0\0\0\0\0\0\0\0\0\202\30\0\0\0\0\0\0\300\3\0\0\0"..., 100000) = 100000
write(4, "\0\0\0\0\275\10\0\0\0\0\0\0\363\6\0\0\0\0\0\0\0\0\0\0\0"..., 100000) = 100000
write(4, "\0\0\0\275\10\0\0\0\0\0\0\306\v\0\0\0\0\0\0\0\0\0\0\0\0"..., 100000) = 100000
write(4, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\275"..., 86744) = 86744
close(4)                                = 0
write(1, "@n", 2)                       = 2
write(1, "true@J", 6)                   = 6
getrusage(RUSAGE_SELF, {ru_utime={2, 900000}, ru_stime={0, 140000}, ...}) = 0
getrusage(RUSAGE_SELF, {ru_utime={2, 900000}, ru_stime={0, 140000}, ...}) = 0
write(1, "@n", 2)                       = 2
write(1, "gap> ", 5)                    = 5
write(1, "@i", 2)                       = 2
getrusage(RUSAGE_SELF, {ru_utime={2, 900000}, ru_stime={0, 140000}, ...}) = 0
read(0, "", 1)                          = 0
write(1, "\r", 1)                       = -1 EIO (Input/output error)
write(1, "@f", 2)                       = -1 EIO (Input/output error)
write(1, "@f", 2)                       = -1 EIO (Input/output error)
write(1, "@f", 2)                       = -1 EIO (Input/output error)
write(1, "@f", 2)                       = -1 EIO (Input/output error)
write(1, "@f", 2)                       = -1 EIO (Input/output error)
write(1, "@f", 2)                       = -1 EIO (Input/output error)
write(1, "@f", 2)                       = -1 EIO (Input/output error)
write(1, "@f", 2)                       = -1 EIO (Input/output error)
write(1, "@f", 2)                       = -1 EIO (Input/output error)
write(1, "@f", 2)                       = -1 EIO (Input/output error)
write(1, "@f", 2)                       = -1 EIO (Input/output error)
write(1, "@f", 2)                       = -1 EIO (Input/output error)
write(1, "@f", 2)                       = -1 EIO (Input/output error)
--- SIGSEGV (Segmentation fault) @ 0 (0) ---
+++ killed by SIGSEGV +++

comment:119 Changed 9 years ago by jdemeyer

And "$DOT_SAGE/tmp/sage.math.washington.edu" is empty after Sage is finished.

comment:120 follow-up: Changed 9 years ago by jdemeyer

But for sage-cleaner, what matters is "$DOT_SAGE/temp/sage.math.washington.edu"

comment:121 Changed 9 years ago by jdemeyer

When running Sage interactively (as opposed to sage -c), gap gets closed properly because it gets sent "quit;\n"

write(4, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\201\20\0\0\0\0"..., 91115) = 91115
close(4)                                = 0
write(1, "@n", 2)                       = 2
write(1, "true@J", 6)                   = 6
getrusage(RUSAGE_SELF, {ru_utime={2, 790000}, ru_stime={0, 170000}, ...}) = 0
getrusage(RUSAGE_SELF, {ru_utime={2, 790000}, ru_stime={0, 170000}, ...}) = 0
write(1, "@n", 2)                       = 2
write(1, "gap> ", 5)                    = 5
write(1, "@i", 2)                       = 2
getrusage(RUSAGE_SELF, {ru_utime={2, 790000}, ru_stime={0, 170000}, ...}) = 0
read(0, "q", 1)                         = 1
write(1, "q", 1)                        = 1
read(0, "u", 1)                         = 1
write(1, "u", 1)                        = 1
read(0, "i", 1)                         = 1
write(1, "i", 1)                        = 1
read(0, "t", 1)                         = 1
write(1, "t", 1)                        = 1
read(0, ";", 1)                         = 1
write(1, ";", 1)                        = 1
read(0, "\n", 1)                        = 1
write(1, "\r", 1)                       = 1
write(1, "\n", 1)                       = 1
write(1, "@r", 2)                       = 2
write(1, "quit;@J", 7)                  = 7
getrusage(RUSAGE_SELF, {ru_utime={2, 790000}, ru_stime={0, 170000}, ...}) = 0
read(3, "", 20000)                      = 0
close(3)                                = 0
exit_group(0)                           = ?
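The clean shutdown in this trace (send "quit;\n", then wait for the child to exit) can be sketched as follows. This is only an illustration under stated assumptions: it uses a plain subprocess pipe rather than the pseudo-terminal that Sage's expect interface uses, and shutdown_child is a hypothetical helper, not an existing Sage function.

```python
import subprocess


def shutdown_child(proc, quit_command=b"quit;\n", timeout=5.0):
    """Ask a child process to quit, and kill it if it does not comply.

    proc must be a subprocess.Popen object created with stdin=PIPE.
    Returns the exit status of the child.
    """
    try:
        # Politely ask the interpreter to exit, then close our end.
        proc.stdin.write(quit_command)
        proc.stdin.flush()
        proc.stdin.close()
    except (OSError, ValueError):
        # The child is already gone or stdin is already closed.
        pass
    try:
        return proc.wait(timeout=timeout)
    except subprocess.TimeoutExpired:
        # Last resort, so that no orphan is left behind.
        proc.kill()
        return proc.wait()
```

The point of the design is that the parent never simply drops its end of the connection: it either sees the child exit or kills it.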

comment:122 Changed 9 years ago by jdemeyer

  • Keywords segmentation fault in child process added; orphaned processes removed

Regardless of sage-cleaner, I think it's clear that the gap Segmentation Fault should be fixed.

comment:123 in reply to: ↑ 120 Changed 9 years ago by dimpase

Replying to jdemeyer:

But for sage-cleaner, what matters is "$DOT_SAGE/temp/sage.math.washington.edu"

OK, so the cleaner does not clean if one starts with --nodotsage...

At least this should go another ticket, IMHO.

comment:124 Changed 9 years ago by jdemeyer

  • Keywords segmentation fault in child process removed
  • Work issues set to segmentation fault in child process

Changed 9 years ago by vbraun

Initial patch

comment:125 Changed 9 years ago by vbraun

  • Description modified (diff)
  • Status changed from needs_work to needs_review
  • Work issues segmentation fault in child process deleted

I've added a patch to explicitly quit Gap after writing the workspace.

The sage-cleaner indeed looks in .../temp/..., but SAGE_TMP is in .../tmp/... so the cleaner is currently broken. That's the fault of #13579.

comment:126 Changed 9 years ago by vbraun

I've made #13681 to deal with some fallout of the SAGE_TMP change that #13579 introduced.

comment:127 Changed 9 years ago by dimpase

  • Status changed from needs_review to positive_review

comment:128 Changed 9 years ago by jdemeyer

  • Status changed from positive_review to needs_work

After running some doctests again with this patch, I see a lot of orphan gap processes again. The strace is as before:

[...]
getrusage(RUSAGE_SELF, {ru_utime={9186, 870000}, ru_stime={6, 120000}, ...}) = 0
write(1, "@!", 2)                       = -1 EIO (Input/output error)
getrusage(RUSAGE_SELF, {ru_utime={9187, 210000}, ru_stime={6, 120000}, ...}) = 0
write(1, "@!", 2)                       = -1 EIO (Input/output error)
getrusage(RUSAGE_SELF, {ru_utime={9187, 550000}, ru_stime={6, 120000}, ...}) = 0
write(1, "@!", 2)                       = -1 EIO (Input/output error)
getrusage(RUSAGE_SELF, {ru_utime={9187, 910000}, ru_stime={6, 120000}, ...}) = 0
write(1, "@!", 2)                       = -1 EIO (Input/output error)
getrusage(RUSAGE_SELF, {ru_utime={9188, 260000}, ru_stime={6, 120000}, ...}) = 0
write(1, "@!", 2)                       = -1 EIO (Input/output error)
[...]

comment:129 follow-up: Changed 9 years ago by vbraun

  • Status changed from needs_work to needs_info

Again, the question is not why are there GAP orphans but why is the sage-cleaner not killing them once the parent Sage process quits?

comment:130 in reply to: ↑ 129 Changed 9 years ago by jdemeyer

Replying to vbraun:

Again, the question is not why are there GAP orphans but why is the sage-cleaner not killing them once the parent Sage process quits?

That's a question, I don't think it's the question. Sage should clean up for itself and sage-cleaner should only be a last resort.

comment:131 Changed 9 years ago by vbraun

I agree, but since you are the only one who is able to get the orphans (possibly in conjunction with a kernel bug) I'd be perfectly happy to let sage-cleaner handle this corner case.

comment:132 Changed 9 years ago by vbraun

On an unrelated note, the interleaved getrusage / write strace probably means that GAP is running something in the internal profiler.

comment:133 Changed 9 years ago by vbraun

Can you confirm that the offending gap process still doesn't include the "-L ...gapworkspace.." command line switch? This probably means that the processes originate when creating the workspace. Is this on a file system that disallows simultaneous writes from multiple processes to the same file?

comment:134 Changed 9 years ago by vbraun

Also, can you post a strace for the hanging GAP process? With trac_13211_quit_after_workspace.patch the 'gap_reset_workspace()' command quits correctly, so I suspect your problem is at a different place now.

comment:135 Changed 9 years ago by jdemeyer

To better test this, I disabled the cleaner (by putting sys.exit(0) in sage-cleaner).

Example strace of segfaulting process is attached.

comment:136 Changed 9 years ago by jdemeyer

I'm also seeing a lot of ENOMEM errors in the trace. Could these be caused by the huge pool size?

comment:137 Changed 9 years ago by jdemeyer

There is a potential infinite loop with _eval_line in gap.py: when it crashes, it keeps trying again over and over:

    def _eval_line(self, line, allow_use_file=True, wait_for_prompt=True, restart_if_needed=True):
        [...]
        try:
            [...]
        except (RuntimeError,TypeError),message:
            if 'EOF' in message[0] or E is None or not E.isalive():
                print "** %s crashed or quit executing '%s' **"%(self, line)
                print "Restarting %s and trying again"%self
                self._start()
                if line != '':
                    return self._eval_line(line, allow_use_file=allow_use_file)
                else:
                    return ''
            else:
                raise RuntimeError, message

Is this intentional? Probably the number of retries should be limited.

(this seems unrelated to this ticket but I thought I should mention it)
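A minimal sketch of what a bounded retry could look like. The names eval_with_retries, evaluate and restart are placeholders for illustration, not the actual gap.py API, and the default of one retry is an arbitrary choice.

```python
def eval_with_retries(evaluate, restart, line, max_retries=1):
    """Evaluate line, restarting the interpreter at most max_retries times.

    evaluate(line) is expected to raise RuntimeError when the child
    interpreter has crashed; restart() starts a fresh interpreter.
    """
    for attempt in range(max_retries + 1):
        try:
            return evaluate(line)
        except RuntimeError:
            if attempt == max_retries:
                # Give up instead of looping forever on a line that
                # crashes the interpreter every time.
                raise
            restart()
```

With max_retries=1 a command that reliably crashes the interpreter is tried twice and then raises, rather than restarting indefinitely.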

Last edited 9 years ago by jdemeyer (previous) (diff)

comment:138 Changed 9 years ago by vbraun

Did you apply trac_13211_pool_size.patch? Unless your machine has exabyte-sized swap it shouldn't even try to allocate such a large pool.

comment:139 Changed 9 years ago by vbraun

Oh I see, gap -o <number> doesn't work. It apparently requires a binary prefix.

comment:140 follow-up: Changed 9 years ago by vbraun

There is apparently an overflow in gap's argument parsing at 3*2^31:

(sage-sh) vbraun@localhost:~$ /home/vbraun/opt/sage-5.4.rc4/local/gap/latest/bin/x86_64-unknown-linux-gnu-gcc-default64/gap -l /home/vbraun/opt/sage-5.4.rc4/local/gap/latest -o 6442450943
 ┌───────┐   GAP, Version 4.5.6 of 16-Sep-2012 (free software, GPL)
 │  GAP  │   http://www.gap-system.org
 └───────┘   Architecture: x86_64-unknown-linux-gnu-gcc-default64
 Libs used:  gmp, readline
 Loading the library and packages ...
 Packages:   GAPDoc 1.5.1
 Try '?help' for help. See also  '?copyright' and  '?authors'
gap> 
(sage-sh) vbraun@localhost:~$ /home/vbraun/opt/sage-5.4.rc4/local/gap/latest/bin/x86_64-unknown-linux-gnu-gcc-default64/gap -l /home/vbraun/opt/sage-5.4.rc4/local/gap/latest -o 6442450944
gap: halving pool size.
gap: halving pool size.
gap: halving pool size.
gap: halving pool size.
gap: halving pool size.
gap: halving pool size.
gap: halving pool size.
gap: halving pool size.
gap: halving pool size.
gap: halving pool size.
gap: halving pool size.
gap: halving pool size.
gap: halving pool size.
gap: halving pool size.
gap: halving pool size.
gap: halving pool size.
gap: halving pool size.
gap: halving pool size.
gap: halving pool size.
gap: halving pool size.
gap: halving pool size.
gap: halving pool size.
gap: halving pool size.
gap: halving pool size.
gap: halving pool size.
gap: halving pool size.
gap: halving pool size.
gap: halving pool size.
gap: halving pool size.
gap: halving pool size.
 ┌───────┐   GAP, Version 4.5.6 of 16-Sep-2012 (free software, GPL)
 │  GAP  │   http://www.gap-system.org
 └───────┘   Architecture: x86_64-unknown-linux-gnu-gcc-default64
 Libs used:  gmp, readline
 Loading the library and packages ...
 Packages:   GAPDoc 1.5.1
 Try '?help' for help. See also  '?copyright' and  '?authors'

That explains why you get the ENOMEMs. But gap tries smaller and smaller mmaps until it succeeds, so this is not a real problem. It will eat the available swap space and put much more pressure on the virtual memory system, though. I'll report this issue upstream.

Last edited 9 years ago by vbraun (previous) (diff)

comment:141 follow-up: Changed 9 years ago by vbraun

I've updated the patch to use gap -o <number>m for a specific number of megabytes instead of specifying the pool size in bytes. This should avoid the argument parsing overflow.
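For illustration, the conversion could look roughly like the sketch below. The function name pool_size_option is made up, this is not the code in the patch, and it assumes that GAP's "m" suffix means 2^20 bytes.

```python
def pool_size_option(pool_bytes):
    """Format a pool size in bytes as a value for `gap -o`.

    The size is rounded down to whole megabytes and given an 'm'
    suffix, so that GAP never has to parse a large plain byte count.
    """
    mib = 2 ** 20  # assumption: GAP's "m" suffix means 2**20 bytes
    megabytes = max(1, pool_bytes // mib)
    return "%dm" % megabytes
```

For example, the byte count 3*2^31 = 6442450944 that triggered the "halving pool size" messages would be passed as 6144m.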

comment:142 in reply to: ↑ 140 Changed 9 years ago by dimpase

Replying to vbraun:

There is apparently an overflow in gap's argument parsing at 3*2^31: That explains why you get the ENOMEMs. But gap tries smaller and smaller mmaps until it succeeds, so this is not a real problem. It will eat the available swap space and put much more pressure on the virtual memory system, though. I'll report this issue upstream.

by the way, GAP finally has a real tracker!

comment:143 in reply to: ↑ 141 Changed 9 years ago by dimpase

  • Status changed from needs_info to needs_review

Replying to vbraun:

I've updated the patch to use gap -o <number>m for a specific number of megabytes instead of specifying the pool size in bytes. This should avoid the argument parsing overflow.

OK, I am testing this on the culprit machine (sage on the sagemath UW cluster), and I am not able to see any lingering GAP processes. I'll try a bit more, and unless I manage to reproduce them, I'll make it positive review...

comment:144 follow-ups: Changed 9 years ago by jdemeyer

After building and fully doctesting Sage, I still end up with 20 gap processes:

jdemeyer@sage:~$ ps -ef |grep gap
jdemeyer  6148     1 83 10:12 ?        03:35:26 /release/merger/sage-5.5.beta2/local/gap/latest/bin/x86_64-unknown-linux-gnu-gcc-default64/gap -m 24m -l /release/merger/sage-5.5.beta2/local/gap/latest -r -L /release/merger/sage-5.5.beta2/home/.sage/gap/workspace-1647326966196298359 -b -p -T -o 6274m /release/merger/sage-5.5.beta2/local/share/sage/ext/gap/sage.g
jdemeyer  6150     1 83 10:12 ?        03:36:51 /release/merger/sage-5.5.beta2/local/gap/latest/bin/x86_64-unknown-linux-gnu-gcc-default64/gap -m 24m -l /release/merger/sage-5.5.beta2/local/gap/latest -r -L /release/merger/sage-5.5.beta2/home/.sage/gap/workspace-1647326966196298359 -b -p -T -o 6274m /release/merger/sage-5.5.beta2/local/share/sage/ext/gap/sage.g
jdemeyer  6154     1 83 10:12 ?        03:35:23 /release/merger/sage-5.5.beta2/local/gap/latest/bin/x86_64-unknown-linux-gnu-gcc-default64/gap -m 24m -l /release/merger/sage-5.5.beta2/local/gap/latest -r -L /release/merger/sage-5.5.beta2/home/.sage/gap/workspace-1647326966196298359 -b -p -T -o 6274m /release/merger/sage-5.5.beta2/local/share/sage/ext/gap/sage.g
[...]

comment:145 Changed 9 years ago by jdemeyer

At least, the ENOMEM errors are gone!

comment:146 Changed 9 years ago by jdemeyer

gap.10799 shows the strace of a hanging process. At some point, the trace stops, it seems no further system calls are done.

I don't see the EIO errors and segmentation faults anymore, perhaps that was caused by ENOMEM?

comment:147 in reply to: ↑ 144 Changed 9 years ago by dimpase

Replying to jdemeyer:

After building and fully doctesting Sage, I still end up with 20 gap processes:

I tried applying the patches (and the spkg) on this ticket to Sage 5.4.rc4, and running make ptest. I have not got any gap processes left after it has finished. Have you been doing something else, something nonstandard?

comment:148 in reply to: ↑ 144 Changed 9 years ago by dimpase

Replying to jdemeyer:

After building and fully doctesting Sage, I still end up with 20 gap processes:

jdemeyer@sage:~$ ps -ef |grep gap
jdemeyer  6148     1 83 10:12 ?        03:35:26 /release/merger/sage-5.5.beta2/local/gap/latest/bin/x86_64-unknown-linux-gnu-gcc-default64/gap -m 24m -l /release/merger/sage-5.5.beta2/local/gap/latest -r -L /release/merger/sage-5.5.beta2/home/.sage/gap/workspace-1647326966196298359 -b -p -T -o 6274m /release/merger/sage-5.5.beta2/local/share/sage/ext/gap/sage.g
jdemeyer  6150     1 83 10:12 ?        03:36:51 /release/merger/sage-5.5.beta2/local/gap/latest/bin/x86_64-unknown-linux-gnu-gcc-default64/gap -m 24m -l /release/merger/sage-5.5.beta2/local/gap/latest -r -L /release/merger/sage-5.5.beta2/home/.sage/gap/workspace-1647326966196298359 -b -p -T -o 6274m /release/merger/sage-5.5.beta2/local/share/sage/ext/gap/sage.g
jdemeyer  6154     1 83 10:12 ?        03:35:23 /release/merger/sage-5.5.beta2/local/gap/latest/bin/x86_64-unknown-linux-gnu-gcc-default64/gap -m 24m -l /release/merger/sage-5.5.beta2/local/gap/latest -r -L /release/merger/sage-5.5.beta2/home/.sage/gap/workspace-1647326966196298359 -b -p -T -o 6274m /release/merger/sage-5.5.beta2/local/share/sage/ext/gap/sage.g
[...]

Where is .sage/ used in these calls to GAP? AFAIK there is no /release/merger/sage-5.5.beta2/home/, yet they list GAP workspace files /release/merger/sage-5.5.beta2/home/.sage/gap/workspace-1647326966196298359.

Weird. Could it be the reason for them to hang?

Last edited 9 years ago by dimpase (previous) (diff)

comment:149 Changed 9 years ago by jdemeyer

/release/merger/sage-5.5.beta2/home is a temporary $HOME directory for the merger. It's similar to using the --nodotsage command line option to sage.

comment:150 Changed 9 years ago by vbraun

The GAP developers acknowledged the option parsing overflow, will be fixed in the next stable release.

Changed 9 years ago by vbraun

Initial patch

comment:151 Changed 9 years ago by vbraun

Is the number of orphans related to the number of parallel doctest processes? The added patch fixes an issue where we abandoned one process instead of killing it if the cached workspace fails to load and needs to be recreated.

comment:152 Changed 9 years ago by vbraun

  • Description modified (diff)

comment:153 Changed 9 years ago by jdemeyer

I spoke too soon about the EIO errors and segmentation faults being gone, I still get them.

comment:154 follow-up: Changed 9 years ago by vbraun

I still don't understand why you get EIO. This means something is seriously foobared, no? When sage quits the other end of the pipe should be closed, resulting in EPIPE.

My theory is the following: the Sage expect interface can use temporary files to redirect stdin/out and this is where the problem lies. If you run with sage --nodotsage, this temporary file is on /tmp (see SAGE_TMP_INTERFACE). The tmpfs on sage.math has some bug that is triggered when you put pressure on the virtual memory, when most of the swap is locked in anonymous mmaps.

comment:155 Changed 9 years ago by vbraun

I ran some more doctests and GAP is not the only process that ends up in a write loop with EIO until it segfaults. E.g. Maxima does it, too. I guess your problem is that GAP does not segfault under certain circumstances.

comment:156 Changed 9 years ago by vbraun

The glibc manual says: EIO also occurs when a background process tries to read from the controlling terminal, and the normal action of stopping the process by sending it a SIGTTIN signal isn't working. This might happen if the signal is being blocked or ignored, or because the process group is orphaned. See section Job Control, for more information about job control, and section Signal Handling, for information about signals.

comment:157 follow-up: Changed 9 years ago by dimpase

I don't want to annoy Jeroen yet again, but I really think we are done with this ticket. If on an old system (Ubuntu 8.04 LTS will reach its EOL in the coming April) under some very unusual circumstances it doesn't quite work as expected, it should not hold the ticket up.

comment:158 in reply to: ↑ 157 Changed 9 years ago by jdemeyer

Replying to dimpase:

I don't want to annoy Jeroen yet again, but I really think we are done with this ticket. If on an old system (Ubuntu 8.04 LTS will reach its EOL in the coming April) under some very unusual circumstances it doesn't quite work as expected, it should not hold the ticket up.

If that old system happens to be the one on which Sage releases are made, it obviously holds up the ticket.

Changed 9 years ago by jdemeyer

First 1500 lines of an infinite strace

comment:159 follow-up: Changed 9 years ago by jdemeyer

  • Status changed from needs_review to needs_work

I doubt that /tmp has anything to do with it. The filedescriptor giving EIO is a deleted pseudo-terminal, not a temporary file. This can be seen from /proc/$PID/fd:

jdemeyer@sage:/tmp/gaptrace$ ls -l /proc/19589/fd
total 0
lrwx------ 1 jdemeyer jdemeyer 64 Nov 14 11:49 0 -> /dev/pts/71 (deleted)
lrwx------ 1 jdemeyer jdemeyer 64 Nov 14 11:49 1 -> /dev/pts/71 (deleted)
lrwx------ 1 jdemeyer jdemeyer 64 Nov 14 11:49 2 -> /dev/pts/71 (deleted)

I don't think it's a relevant discussion why we sometimes get Segmentation Faults and sometimes not.

comment:160 in reply to: ↑ 154 Changed 9 years ago by jdemeyer

Replying to vbraun:

When sage quits the other end of the pipe should be closed, resulting in EPIPE.

A pseudo-terminal is not a pipe.
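The difference is easy to check with a small experiment. The sketch below uses two made-up helper names. Note that CPython ignores SIGPIPE by default, which is the only reason the pipe case reports an errno instead of killing the process.

```python
import errno
import os


def errno_after_reader_closed_pipe():
    """Write to a pipe whose read end is closed; return the errno name."""
    read_fd, write_fd = os.pipe()
    os.close(read_fd)
    try:
        os.write(write_fd, b"x")
    except OSError as exc:
        return errno.errorcode[exc.errno]
    finally:
        os.close(write_fd)
    return None


def errno_after_master_closed_pty():
    """Write to a pty slave whose master is closed; return the errno name."""
    master_fd, slave_fd = os.openpty()
    os.close(master_fd)
    try:
        os.write(slave_fd, b"x")
    except OSError as exc:
        return errno.errorcode[exc.errno]
    finally:
        os.close(slave_fd)
    return None
```

The pipe version returns "EPIPE". On Linux I would expect the pty version to return "EIO", which is what the straces show; other platforms may behave differently.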

comment:161 in reply to: ↑ 159 Changed 9 years ago by dimpase

Replying to jdemeyer:

I doubt that /tmp has anything to do with it.

Somewhere near the end of sage/ext/gap/sage.g there is a commented out line LogTo("/tmp/gapsage.log"); Could you uncomment it and try to reproduce such a hanging GAP process, with this log on?

Also, perhaps add some debugging prints to each of the GAP functions in the file. Something like AppendTo("/tmp/sageg","OperationsAdmittingFirstArgument\n"); where OperationsAdmittingFirstArgument is the function name, and "/tmp/sageg" the name of the log file...

My guess is that the problem is in $SAGE.NewPager(), which tries to get $SAGE.tempfile and fails, either during this interaction, or upon attempting to open it.

Last edited 9 years ago by dimpase (previous) (diff)

comment:162 Changed 9 years ago by jdemeyer

I am seriously debugging GAP now to check for the problem.

comment:163 follow-up: Changed 9 years ago by jdemeyer

I'm pretty sure I found the bug. The culprit is the following from src/sysfiles.c:

/* utility to check return value of 'write'  */
ssize_t writeandcheck(int fd, const char *buf, size_t count) {
  int ret;
  ret = write(fd, buf, count);
  if (ret < 0)
  {
    ErrorQuit("Cannot write to file descriptor %d, see 'LastSystemError();'\n",
               fd, 0L);
  }
return ret;
}

If the pseudo-tty is closed, then write() fails with EIO, causing GAP to write an error message. We then get an infinite loop of failing to write the error message.
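To make the failure mode concrete, here is a sketch in Python (not the GAP C code, and not the writeandcheck.patch attached to this ticket) of a write wrapper that cannot loop. It reports a failed write at most once, and if the report itself cannot be written it exits anyway. The error_fd parameter is an assumption added for illustration.

```python
import os
import sys


def write_and_check(fd, data, error_fd=2):
    """Write data to fd; on failure, report once to error_fd and exit.

    The report is attempted a single time and a failure of the report
    itself is ignored, so a dead descriptor cannot trigger an endless
    chain of error messages about failing to write error messages.
    """
    try:
        return os.write(fd, data)
    except OSError as exc:
        message = "Cannot write to file descriptor %d: %s\n" % (fd, exc.strerror)
        try:
            os.write(error_fd, message.encode())
        except OSError:
            pass  # the error channel is dead too, nothing more to do
        sys.exit(1)
```

Even when error_fd is the same broken descriptor as fd, the function terminates with exit status 1 after a single failed report.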

comment:164 in reply to: ↑ 163 Changed 9 years ago by dimpase

Replying to jdemeyer:

It seems that ErrorQuit could just write to stderr in this case. In general, though, it's not clear to me how to deal with it, since such a workaround might break the interpreter's ability to recover from errors. Is it actually possible to reliably check that the pty is closed?

comment:165 follow-up: Changed 9 years ago by jdemeyer

Now that I know what the problem is, reproducing the Segmentation Fault is trivial. Within a Sage shell:

(sage-sh) jdemeyer@sage:sage-5.5.beta2$ gap &>/dev/full
Segmentation fault

comment:166 follow-up: Changed 9 years ago by vbraun

You'd still get an EPIPE if the process wasn't orphaned. Also, the problem is that GAP is not segfaulting in some circumstances when it is orphaned. If it segfaulted, you wouldn't get an orphan.

The underlying problem is that the expect interfaces don't always clean up processes before quitting Sage, hence leading to orphaned processes to start with. Only after the process is orphaned do you get EIOs.

comment:167 in reply to: ↑ 166 Changed 9 years ago by jdemeyer

Replying to vbraun:

You'd still get an EPIPE if the process wasn't orphaned.

I doubt that pseudo-terminals can cause an EPIPE (at least in Linux). Even if they could, you should also get a SIGPIPE signal, killing the process.

Also, the problem is that GAP is not segfaulting in some circumstances when it is orphaned.

I would not say that this is the problem. There is a bug in GAP which may or may not lead to a segfault. Relying that it will cause a segfault is silly.

The underlying problem is that the expect interfaces don't always clean up processes before quitting Sage, hence leading to orphaned processes to start with. Only after the process is orphaned do you get EIOs.

True. So either we fix GAP or we fix the Sage interface, and preferably both.

Changed 9 years ago by vbraun

maxima getting EIO & segfault too

comment:168 follow-up: Changed 9 years ago by vbraun

Just for the record, other programs Segfault after running into EIO for a while as well. I dare say that most interactive binaries are not meant to be run as orphans. I don't see much point in making sure that the GAP command line interface can be run without a controlling terminal.

comment:169 in reply to: ↑ 168 Changed 9 years ago by jdemeyer

Replying to vbraun:

Just for the record, other programs Segfault after running into EIO for a while as well.

Indeed, it seems ECL has exactly the same bug. But that doesn't mean the GAP bug (or the Sage bug controlling GAP if you want) shouldn't be fixed.

And the problem isn't limited to terminals: as I showed, it can also occur for example when writing to a filesystem which is full.
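
On Linux, `/dev/full` is a device on which every write fails with ENOSPC, so it is a convenient stand-in for a full filesystem. As a rough sketch (the helper name `probe_dev_full` is made up for this illustration), the failure can be observed from Python like this:

```python
import errno
import os


def probe_dev_full():
    """Return the errno raised by writing to /dev/full.

    Returns None if the device does not exist (non-Linux systems),
    and 0 if the write unexpectedly succeeds.
    """
    if not os.path.exists("/dev/full"):
        return None
    fd = os.open("/dev/full", os.O_WRONLY)
    try:
        os.write(fd, b"x")
    except OSError as exc:
        return exc.errno
    finally:
        os.close(fd)
    return 0


if __name__ == "__main__":
    code = probe_dev_full()
    if code:
        print(errno.errorcode[code])
```

Any program that reacts to this error by printing to the same descriptor will fail again in the same way.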

comment:170 in reply to: ↑ 165 Changed 9 years ago by dimpase

Replying to jdemeyer:

Now that I know what the problem is, reproducing the Segmentation Fault is trivial. Within a Sage shell:

(sage-sh) jdemeyer@sage:sage-5.5.beta2$ gap &>/dev/full
Segmentation fault

This is not the complete story. First off, you need to run with -T, which is meant to suppress the usual interactive behaviour, and it is with this option that the orphans in question arise. In fact, the segfault happens because stderr is full, not because stdout is full.

(sage-sh) dima@sage$ gap -T >/dev/full
Error, Cannot write to file descriptor 1, see 'LastSystemError();'

Error, Cannot write to file descriptor 1, see 'LastSystemError();'

Error, Cannot write to file descriptor 1, see 'LastSystemError();'

Error, Cannot write to file descriptor 1, see 'LastSystemError();'

Error, Cannot write to file descriptor 1, see 'LastSystemError();'

Syntax error: ; expected
^
Error, Cannot write to file descriptor 1, see 'LastSystemError();'

(sage-sh) dima@sage$

Thus if the stderr is OK, it sort of works (the Syntax error; is quite weird though, and might be an indication of a "more real" bug). By the way, gap-4.4 just hangs in this situation.

As we talk about the situation with -T on, we can patch, say, ErrorQuit for this case to print to stderr and quit immediately. This will not fix this completely, but at least it will make sure that with -T option the behaviour is as it should be.

An alternative is to change Sage so that GAP's stderr does not get redirected. I don't know how well this will work, though.

comment:171 follow-up: Changed 9 years ago by dimpase

  • Report Upstream changed from Completely fixed; Fix reported upstream to Reported upstream. No feedback yet.

I've created an issue on GAP bugtracker in regard to the comment 165: http://tracker.gap-system.org/issues/125

comment:172 in reply to: ↑ 171 Changed 9 years ago by jdemeyer

Replying to dimpase:

I've created an issue on GAP bugtracker in regard to the comment 165: http://tracker.gap-system.org/issues/125

Too bad one cannot even look at the tracker issue without logging in. I registered for an account, but it needs to be approved by a moderator. Let me know if anything interesting comes up from upstream.

comment:173 follow-up: Changed 9 years ago by jdemeyer

  • Description modified (diff)
  • Status changed from needs_work to needs_review

Changed 9 years ago by jdemeyer

Patch added to GAP in gap-4.5.6.p0

comment:174 in reply to: ↑ 173 ; follow-up: Changed 9 years ago by dimpase

  • Report Upstream changed from Reported upstream. No feedback yet. to Reported upstream. Developers acknowledge bug.
  • Status changed from needs_review to positive_review

Replying to jdemeyer:

New spkg with patch added: http://boxen.math.washington.edu/home/jdemeyer/spkg/gap-4.5.6.p0.spkg

Patch attached.

The patch makes gap &>/dev/full hang. (Well, this is consistent with GAP 4.4 behaviour). At least this is good enough, I suppose, to fix the orphans issue. I mark this as positive review, with understanding that orphans, which I can't reproduce anyway, are no longer there. I hope GAP people will fix this good and proper.

comment:175 in reply to: ↑ 174 ; follow-up: Changed 9 years ago by jdemeyer

Replying to dimpase:

The patch makes gap &>/dev/full hang.

Of course it "hangs", since it's waiting for user input. It's still an interactive GAP session, you can hit CTRL-D to exit.

comment:176 in reply to: ↑ 175 Changed 9 years ago by dimpase

Replying to jdemeyer:

Replying to dimpase:

The patch makes gap &>/dev/full hang.

Of course it "hangs", since it's waiting for user input. It's still an interactive GAP session, you can hit CTRL-D to exit.

Oh, right. I tried CTRL-C, in vain :-)

comment:177 follow-up: Changed 9 years ago by jdemeyer

  • Status changed from positive_review to needs_work

There is something wrong with the pool size patch. On the buildbot machine "snapperkob" (Ubuntu 12.04 x86_64), gap is started with "-o 53m" which is too little. It gives

RuntimeError: Gap produced error output
Error, exceeded the permitted memory (`-o' command line option)

   executing SaveWorkspace("/tmp/dotsageyds6nf/gap/workspace-8407479053605879120");

But there is plenty of memory available:

$ cat /proc/meminfo
MemTotal:        8012124 kB
MemFree:         2736044 kB
Buffers:          125968 kB
Cached:          4721000 kB
SwapCached:            0 kB
Active:          2978056 kB
Inactive:        1930440 kB
Active(anon):      61564 kB
Inactive(anon):     3648 kB
Active(file):    2916492 kB
Inactive(file):  1926792 kB
Unevictable:           0 kB
Mlocked:               0 kB
SwapTotal:        524284 kB
SwapFree:         524284 kB
Dirty:                20 kB
Writeback:             0 kB
AnonPages:         61552 kB
Mapped:            14396 kB
Shmem:              3688 kB
Slab:             261404 kB
SReclaimable:     243728 kB
SUnreclaim:        17676 kB
KernelStack:        1280 kB
PageTables:         3568 kB
NFS_Unstable:          0 kB
Bounce:                0 kB
WritebackTmp:          0 kB
CommitLimit:     4530344 kB
Committed_AS:     160128 kB
VmallocTotal:   34359738367 kB
VmallocUsed:      561128 kB
VmallocChunk:   34359173628 kB
HardwareCorrupted:     0 kB
AnonHugePages:         0 kB
HugePages_Total:       0
HugePages_Free:        0
HugePages_Rsvd:        0
HugePages_Surp:        0
Hugepagesize:       2048 kB
DirectMap4k:       47100 kB
DirectMap2M:     8173568 kB
Last edited 9 years ago by jdemeyer

comment:178 Changed 9 years ago by jdemeyer

  • Status changed from needs_work to positive_review
Last edited 9 years ago by jdemeyer

comment:179 Changed 9 years ago by jdemeyer

  • Status changed from positive_review to needs_work

comment:180 Changed 9 years ago by jdemeyer

  • Milestone changed from sage-5.5 to sage-5.6

comment:181 in reply to: ↑ 177 Changed 9 years ago by dimpase

Replying to jdemeyer:

There is something wrong with the pool size patch. On the buildbot machine "snapperkob" (Ubuntu 12.04 x86_64), gap is started with "-o 53m" which is too little.

I'll try to reproduce this on sage.combinat. This is the only Ubuntu 12.04 x86_64 I presently have access to.

comment:182 Changed 9 years ago by vbraun

The machine has only 512MB of swap? The pool defaults to 1/10th of the total swap.
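
As a worked check of the one-tenth rule against the `/proc/meminfo` output quoted above, the sketch below parses a few of those lines and computes a tenth of `SwapTotal`. The parser `parse_meminfo` is a hypothetical helper written for this illustration, not the code in `sage/misc/memory_info.py`.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a dict of byte counts.

    Values with a "kB" unit are converted to bytes; unit-less values
    (such as HugePages_Total) are kept as plain counts.
    """
    info = {}
    for line in text.splitlines():
        key, _, rest = line.partition(":")
        fields = rest.split()
        if not fields:
            continue
        value = int(fields[0])
        if fields[1:] == ["kB"]:
            value *= 1024
        info[key.strip()] = value
    return info


# Three lines copied from the snapperkob output in comment 177.
sample = """MemTotal:        8012124 kB
SwapTotal:        524284 kB
HugePages_Total:       0
"""

info = parse_meminfo(sample)
swap_tenth = info["SwapTotal"] // 10
print(swap_tenth)  # 53686681 bytes, i.e. roughly 51 MiB
```

That is well below the 75MB that the thread later treats as the minimum needed to run GAP.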

comment:183 follow-ups: Changed 9 years ago by vbraun

I've changed the pool size computation to default to at least 75MB.

comment:184 in reply to: ↑ 183 Changed 9 years ago by dimpase

Replying to vbraun:

I've changed the pool size computation to default to at least 75MB.

yes, I was just reading your patch. It's my fault that I missed this potential problem before (I am aware of a "modern" school that teaches that swap is dead, as there is so much RAM nowadays...).

Let me test it now.

comment:185 in reply to: ↑ 183 ; follow-up: Changed 9 years ago by dimpase

Replying to vbraun:

I've changed the pool size computation to default to at least 75MB.

no, you have set it to 75*1024**3, which is 75GB! You should do 75*1024**2 instead.

comment:186 in reply to: ↑ 185 Changed 9 years ago by dimpase

Replying to dimpase:

Replying to vbraun:

I've changed the pool size computation to default to at least 75MB.

no, you have set it to 75*1024**3, which is 75GB! You should do 75*1024**2 instead.

I also think that

   suggested_size = max(int(mem.available_swap() / 10),
                        int(mem.available_ram() / 50),   # in case you run without swap
                        75 * 1024**2)                    # about 75MB is the minimum to run GAP

is way too much for machines with a lot of RAM. E.g. I tried this on sage.combinat and got 3.6GB as the suggested_size. I'd rather propose

   suggested_size = min(150 * 1024**2,
                        max(int(mem.available_swap() / 10),
                            int(mem.available_ram() / 50),   # in case you run without swap
                            75 * 1024**2))                   # about 75MB is the minimum to run GAP

for 150MB is certainly good enough, but would not lead to problems if you have a hundred instances of Sage running on the same machine.
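
To make the effect of the proposed cap concrete, here is a standalone sketch of the same formula with the floor and cap as parameters. It takes plain byte counts rather than the `mem` object, and the function name and keyword names are assumptions for illustration. It shows the proposal in this comment, not necessarily what ended up in the patch.

```python
MiB = 1024 ** 2


def suggested_pool_size(swap_bytes, ram_bytes, floor=75 * MiB, cap=150 * MiB):
    """Clamp max(swap/10, ram/50) between floor and cap.

    Pass cap=None to get the uncapped variant.
    """
    size = max(swap_bytes // 10, ram_bytes // 50, floor)
    if cap is not None:
        size = min(size, cap)
    return size


# A machine without swap and little RAM falls back to the floor.
print(suggested_pool_size(0, 1024 * MiB) // MiB)                     # 75
# With 10 GiB of swap the uncapped rule suggests 1 GiB ...
print(suggested_pool_size(10 * 1024 * MiB, 0, cap=None) // MiB)      # 1024
# ... while the capped rule stops at 150 MiB.
print(suggested_pool_size(10 * 1024 * MiB, 0) // MiB)                # 150
```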

comment:187 Changed 9 years ago by jdemeyer

On 32-bit systems:

sage -t  --long -force_lib devel/sage/sage/misc/memory_info.py
**********************************************************************
File "/var/lib/buildbot/build/sage/arando-1/arando_full/build/sage-5.5.beta2/devel/sage-main/sage/misc/memory_info.py", line 350:
    sage: mem.total_ram()
Expected:
    4294967296
Got:
    4294967296L
**********************************************************************

comment:188 Changed 9 years ago by jhpalmieri

Would it make sense, in spkg-install, to either remove the old GAP installation or (probably better) to move it to local/gap/gap-4.4.12?

comment:189 follow-up: Changed 9 years ago by vbraun

Thanks, fixed the MB... it's been a long time since I've seen computations not finish because of O(1024^2) bytes of storage missing ;-)

I've also added the appropriate # 64-bit vs # 32-bit to the offending doctest.

The pool size is NOT allocated RAM, it is an anonymous mmap and will only be used if necessary. 150MB is definitely not enough for serious computations. If you want hundreds of Sage instances then that's fine, you just need enough swap. Or, if you prefer, you can set up your desktop without swap. Then the pool will be backed by actual RAM, but you hopefully were aware of that when you decided not to have a swap partition.
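
The distinction between reserving a pool and using it can be illustrated with Python's `mmap` module. This is only a sketch of an anonymous mapping in general: the helper name `reserve_pool` is invented here, and the flags GAP itself passes to `mmap` are not shown in this ticket.

```python
import mmap

MiB = 1024 ** 2


def reserve_pool(size):
    """Create an anonymous, zero-filled mapping of `size` bytes.

    The mapping is not backed by a file. On typical systems, pages are
    only materialised when they are first touched.
    """
    return mmap.mmap(-1, size)


pool = reserve_pool(64 * MiB)
pool[:4] = b"GAP!"          # touches only the first page
print(len(pool) // MiB)     # 64
```

The size of such a mapping is fixed when it is created, which matches the remark later in the thread that the pool address space cannot be changed after GAP has started.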

Finally, I don't see any benefit in moving/deleting the previous GAP install. We'll just run into unintended consequences.

comment:190 in reply to: ↑ 189 Changed 9 years ago by dimpase

Replying to vbraun:

The pool size is NOT allocated RAM, it is an anonymous mmap and will only be used if necessary. 150MB is definitely not enough for serious computations. If you want hundreds of Sage instances then that's fine, you just need enough swap.

Why do we need to set '-o' at all? GAP can perfectly well start with only '-m' (i.e. the initial amount of memory) set. '-o' is meant to set the maximal amount of RAM to be available, at least according to GAP's docs.

Last edited 9 years ago by dimpase

comment:191 follow-up: Changed 9 years ago by vbraun

If you don't specify '-o' then gap will just choose a pool size for you. The virtual memory pool address space can't be changed after GAP has started. The actual memory used can expand and contract. Of course the actual memory used is bounded by the pool size.

comment:192 in reply to: ↑ 191 ; follow-up: Changed 9 years ago by dimpase

Replying to vbraun:

If you don't specify '-o' then gap will just choose a pool size for you. The virtual memory pool address space can't be changed after GAP has started. The actual memory used can expand and contract. Of course the actual memory used is bounded by the pool size.

OK. Still, what about limiting the suggested_size to something (say, 500MB)? I agree that the 150MB I suggested above is on the low side.

Anyway, we should put somewhere in the documentation a remark that by default such-and-such max amount is allocated, but if you need more, then do set_gap_memory_pool_size() before doing any GAP stuff.

By the way, can one terminate, from Sage, all the running GAP subprocesses?

comment:193 in reply to: ↑ 192 ; follow-up: Changed 9 years ago by vbraun

Replying to dimpase:

OK. Still, what about limiting the suggested_size to something (say, 500MB) - I agree that 150MB I suggested above is on a low side.

I'm against absolute limits, it should be a fraction of the available resources. Otherwise you'll end up on record with "640kb is enough for everyone".

Anyway, we should put somewhere in the documentation a remark that by default such-and-such max amount is allocated, but if you need more, then do set_gap_memory_pool_size() before doing any GAP stuff.

It's in the set_gap_memory_pool_size docstring already.

By the way, can one terminate, from Sage, all the running GAP subprocesses?

The Sage expect interface does not keep a list of started subprocesses, so no. That would be a nice enhancement to the whole expect stuff, but please in another ticket.

comment:194 Changed 9 years ago by vbraun

  • Status changed from needs_work to needs_review

comment:195 in reply to: ↑ 193 Changed 9 years ago by dimpase

Replying to vbraun:

Replying to dimpase:

OK. Still, what about limiting the suggested_size to something (say, 500MB) - I agree that 150MB I suggested above is on a low side.

I'm against absolute limits, it should be a fraction of the available resources. Otherwise you'll end up on record with "640kb is enough for everyone".

I propose a limit that can be explicitly overridden, not something absolute. The reason is that on big machines a fraction of the resource can be too big to be meaningful, and will make sharing the machine very hard, as the first few processes will grab an enormous chunk of swap, and the remaining ones will suffer. I propose a limit that one would not even notice on a typical desktop (OK, if you think 500MB is too small, make it 750MB, or 1GB). And if you come to a super-duper machine to compute something huge then OK, please tell the system explicitly how much memory you might need.

As a matter of fact, I do not understand why GAP needs to reserve any swap at all. Just does not make sense to me. What is so special about GAP here?

comment:196 Changed 9 years ago by vbraun

By "absolute" limit I mean a fixed size (as opposed to relative to available resources). 640kb (or any other fixed amount) is not going to be good enough in the future.

The current GAP in Sage does soak up all the available swap and I don't see anybody complaining about it. It does make my desktop slow down to a crawl whenever the patchbot hits GAP, though. Maybe this patch is not the ideal solution, but it's a vast improvement over what we currently have.

As for the rest, read the mmap manpage. You either reserve swap or you get SIGSEGV when you run out of memory. And GAP is not written in a way that would handle that signal.

comment:197 Changed 9 years ago by dimpase

  • Status changed from needs_review to positive_review

comment:198 Changed 9 years ago by vbraun

FWIW, this report https://groups.google.com/d/msg/sage-release/qxv0lvoSfN4/dSYcy-xOG1AJ mentions orphans with the previous version of GAP as well.

comment:199 Changed 9 years ago by vbraun

I had accidentally dropped the workaround for the gap -o 3*2^31 option parser overflow; it's back in the trac_13211_pool_size.patch now.

Also, seems like the reporter on sage-release didn't get GAP orphans with the old version after all.

comment:200 follow-up: Changed 9 years ago by jdemeyer

  • Status changed from positive_review to needs_work

The memory size still isn't completely right, as I got this doctest error on snapperkob (Linux Ubuntu 12.04 x86_64, 8GB RAM + 0.5GB swap) and iras (Linux ia64, 4GB RAM + 2GB swap):

sage -t  --long -force_lib devel/sage/sage/groups/matrix_gps/matrix_group_morphism.py
**********************************************************************
File "/home/buildbot/build/sage/iras-1/iras_full/build/sage-5.6.beta0/devel/sage-main/sage/groups/matrix_gps/matrix_group_morphism.py", line 229:
    sage: f = O.hom([r*x*r_ for x in O.gens()])  # long time (19s on sage.math, 2011)
Exception raised:
    Traceback (most recent call last):
      File "/home/buildbot/build/sage/iras-1/iras_full/build/sage-5.6.beta0/local/bin/ncadoctest.py", line 1231, in run_one_test
        self.run_one_example(test, example, filename, compileflags)
      File "/home/buildbot/build/sage/iras-1/iras_full/build/sage-5.6.beta0/local/bin/sagedoctest.py", line 38, in run_one_example
        OrigDocTestRunner.run_one_example(self, test, example, filename, compileflags)
      File "/home/buildbot/build/sage/iras-1/iras_full/build/sage-5.6.beta0/local/bin/ncadoctest.py", line 1172, in run_one_example
        compileflags, 1) in test.globs
      File "<doctest __main__.example_7[17]>", line 1, in <module>
        f = O.hom([r*x*r_ for x in O.gens()])  # long time (19s on sage.math, 2011)###line 229:
    sage: f = O.hom([r*x*r_ for x in O.gens()])  # long time (19s on sage.math, 2011)
      File "/home/buildbot/build/sage/iras-1/iras_full/build/sage-5.6.beta0/local/lib/python/site-packages/sage/groups/matrix_gps/matrix_group.py", line 268, in hom
        return self.Hom(U)(x)
      File "/home/buildbot/build/sage/iras-1/iras_full/build/sage-5.6.beta0/local/lib/python/site-packages/sage/groups/matrix_gps/homset.py", line 114, in __call__
        im_gens, check=check)
      File "/home/buildbot/build/sage/iras-1/iras_full/build/sage-5.6.beta0/local/lib/python/site-packages/sage/groups/matrix_gps/matrix_group_morphism.py", line 75, in __init__
        phi0 = gap(self)
      File "/home/buildbot/build/sage/iras-1/iras_full/build/sage-5.6.beta0/local/lib/python/site-packages/sage/interfaces/interface.py", line 197, in __call__
        return self._coerce_from_special_method(x)
      File "/home/buildbot/build/sage/iras-1/iras_full/build/sage-5.6.beta0/local/lib/python/site-packages/sage/interfaces/interface.py", line 223, in _coerce_from_special_method
        return (x.__getattribute__(s))(self)
      File "sage_object.pyx", line 463, in sage.structure.sage_object.SageObject._gap_ (sage/structure/sage_object.c:4539)
      File "sage_object.pyx", line 439, in sage.structure.sage_object.SageObject._interface_ (sage/structure/sage_object.c:4139)
      File "/home/buildbot/build/sage/iras-1/iras_full/build/sage-5.6.beta0/local/lib/python/site-packages/sage/interfaces/interface.py", line 195, in __call__
        return cls(self, x, name=name)
      File "/home/buildbot/build/sage/iras-1/iras_full/build/sage-5.6.beta0/local/lib/python/site-packages/sage/interfaces/expect.py", line 1308, in __init__
        raise TypeError, x
    TypeError: Gap terminated unexpectedly while reading in a large line:
    Gap produced error output
    Error, exceeded the permitted memory (`-o' command line option)

       executing Read("/home/buildbot/.sage/temp/iras/26682/interface/tmp26707");
**********************************************************************

Since this never happened with the old GAP, I consider this a regression which should be fixed.

Last edited 9 years ago by jdemeyer

comment:201 Changed 9 years ago by jdemeyer

On hawk (OpenSolaris i386), I get this reproducible doctest error:

sage -t  --long -force_lib devel/sage/sage/interfaces/gap.py
gap: halving pool size.
**********************************************************************
File "/export/home/buildbot/build/sage/hawk-1/hawk_full/build/sage-5.6.beta0/devel/sage-main/sage/interfaces/gap.py", line 376:
    sage: gap('"finished computation"'); gap.interrupt(); gap('"ok"')
Exception raised:
    Traceback (most recent call last):
      File "/export/home/buildbot/build/sage/hawk-1/hawk_full/build/sage-5.6.beta0/local/bin/ncadoctest.py", line 1231, in run_one_test
        self.run_one_example(test, example, filename, compileflags)
      File "/export/home/buildbot/build/sage/hawk-1/hawk_full/build/sage-5.6.beta0/local/bin/sagedoctest.py", line 38, in run_one_example
        OrigDocTestRunner.run_one_example(self, test, example, filename, compileflags)
      File "/export/home/buildbot/build/sage/hawk-1/hawk_full/build/sage-5.6.beta0/local/bin/ncadoctest.py", line 1172, in run_one_example
        compileflags, 1) in test.globs
      File "<doctest __main__.example_6[5]>", line 1, in <module>
        gap('"finished computation"'); gap.interrupt(); gap('"ok"')###line 376:
    sage: gap('"finished computation"'); gap.interrupt(); gap('"ok"')
      File "/export/home/buildbot/build/sage/hawk-1/hawk_full/build/sage-5.6.beta0/local/lib/python/site-packages/sage/interfaces/gap.py", line 393, in interrupt
        E.sendline()
      File "/export/home/buildbot/build/sage/hawk-1/hawk_full/build/sage-5.6.beta0/local/lib/python/site-packages/pexpect.py", line 677, in sendline
        n = n + self.send (os.linesep)
      File "/export/home/buildbot/build/sage/hawk-1/hawk_full/build/sage-5.6.beta0/local/lib/python/site-packages/pexpect.py", line 669, in send
        c = os.write(self.child_fd, str)
    OSError: [Errno 22] Invalid argument
**********************************************************************
File "/export/home/buildbot/build/sage/hawk-1/hawk_full/build/sage-5.6.beta0/devel/sage-main/sage/interfaces/gap.py", line 1794:
    sage: gap_version()
Expected:
    doctest:...: DeprecationWarning: use gap.version() instead
    See http://trac.sagemath.org/13211 for details.
    '4.5.6'
Got:
    doctest:1: DeprecationWarning: use gap.version() instead
    See http://trac.sagemath.org/13211 for details.
    ** Gap crashed or quit executing 'VERSION;' **
    Restarting Gap and trying again
    '4.5.6'
**********************************************************************

It seems we're trying to write to a closed filedescriptor.

comment:202 in reply to: ↑ 200 Changed 9 years ago by vbraun

Replying to jdemeyer:

The memory size still isn't completely right, as I got this doctest error on snapperkob (Linux Ubuntu 12.04 x86_64, 8GB RAM + 0.5GB swap) and iras (Linux ia64, 4GB RAM + 2GB swap):

These are cases where the hardcoded minimum for the GAP memory pool is used (since they have insufficient swap space). I'll try to find a minimal value that is sufficient to run the long doctests, not just to start up GAP.

Changed 9 years ago by vbraun

Updated patch

comment:203 follow-up: Changed 9 years ago by vbraun

  • Status changed from needs_work to positive_review

Turns out we need 76MB for the long doctests, but we hardcoded 75MB. I've increased the minimum to 100MB to give us some headroom for future doctests. I'll see if I can reproduce the OpenSolaris bug on OpenIndiana (if anybody with access to hawk can debug it go for it, but I don't have an account). But that shouldn't stop us from shipping the update.

comment:204 in reply to: ↑ 203 ; follow-ups: Changed 9 years ago by jdemeyer

  • Status changed from positive_review to needs_work

Replying to vbraun:

I'll see if I can reproduce the OpenSolaris bug on OpenIndiana (if anybody with access to hawk can debug it go for it, but I don't have an account). But that shouldn't stop us from shipping the update.

A failure on a supported platform does stop the update...

comment:205 in reply to: ↑ 204 Changed 9 years ago by dimpase

Replying to jdemeyer:

Replying to vbraun:

I'll see if I can reproduce the OpenSolaris bug on OpenIndiana (if anybody with access to hawk can debug it go for it, but I don't have an account). But that shouldn't stop us from shipping the update.

A failure on a supported platform does stop the update...

I am having trouble building 5.5.rc1 on hawk. It errors out for no obvious reason.

comment:206 in reply to: ↑ 204 Changed 9 years ago by vbraun

Replying to jdemeyer:

A failure on a supported platform does stop the update...

The most glaring bug is of course that OpenSolaris is a fully supported platform. https://groups.google.com/d/topic/sage-devel/nRVBUQtL_d4/discussion

comment:207 Changed 9 years ago by jdemeyer

Another unrelated issue is that setting SAGE_DEBUG=yes doesn't correctly set -O0, as it's overwritten later in the command line by an -O2:

$ SAGE_DEBUG=yes ./sage -f gap-4.5.6.p0.spkg
[...]
gcc -I. -I../.. -DCONFIG_H  -I/release/merger/sage-5.6.beta0/local/include -D__GMP_MP_RELEASE=50002 -O0 -g3 -DDEBUG_MASTERPOINTERS -DDEBUG_GLOBAL_BAGS -DDEBUG_FUNCTIONS_BAGS -Wall -g -O2   -o intfuncs.o -c ../../src/intfuncs.c
[...]
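
When several `-O` options are given, GCC uses the last one, which is why the `-O2` wins here. A small sketch that picks out the effective optimization flag from a command line (the helper `effective_opt_level` is made up for this illustration):

```python
import shlex


def effective_opt_level(cmdline):
    """Return the last -O<level> option on a compiler command line, or None."""
    level = None
    for arg in shlex.split(cmdline):
        # "-o" (output file) is lowercase and does not match "-O".
        if arg.startswith("-O"):
            level = arg
    return level


# Shortened version of the gcc invocation quoted above.
cmd = ("gcc -I. -I../.. -DCONFIG_H -O0 -g3 -DDEBUG_MASTERPOINTERS "
       "-Wall -g -O2 -o intfuncs.o -c ../../src/intfuncs.c")
print(effective_opt_level(cmd))  # -O2
```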

comment:208 Changed 9 years ago by jdemeyer

FYI: GAP-4.5.7 has been released which fixes (amongst other things)

Numbers in memory options on the command line exceeding 2^32 could not be parsed correctly, even on 64-bit systems. [Reported by Volker Braun]
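
The following sketch shows how a value such as the 3*2^31 mentioned in comment 199 is mangled if it passes through a 32-bit quantity. It is an illustration under assumptions: the binary meaning of the `k`/`m`/`g` suffixes and the truncation mechanism are guesses, not taken from the GAP sources, and both function names are hypothetical.

```python
# Assumed multipliers for the size suffixes (binary units).
_SUFFIXES = {"k": 1024, "m": 1024 ** 2, "g": 1024 ** 3}


def parse_size(text):
    """Parse a size such as '53m' or '6g' into a number of bytes."""
    text = text.strip().lower()
    if text and text[-1] in _SUFFIXES:
        return int(text[:-1]) * _SUFFIXES[text[-1]]
    return int(text)


def truncate_to_32_bits(value):
    """Model the loss of the high bits in an unsigned 32-bit variable."""
    return value & 0xFFFFFFFF


requested = parse_size("6g")               # 3 * 2**31 bytes
print(requested)                           # 6442450944
print(truncate_to_32_bits(requested))      # 2147483648
```

Python integers do not overflow, so `parse_size` itself stays correct for any size. Only the explicit truncation reproduces the kind of loss described in the changelog entry.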

Changed 9 years ago by jdemeyer

Fix CFLAGS setting in GAP configure file

comment:209 Changed 9 years ago by jdemeyer

I reported the patch cflags.patch upstream, it should be added to the spkg.

comment:210 follow-up: Changed 9 years ago by jdemeyer

The "Special Update/Build Instructions" are quite unclear to me. I'm trying to update the package to GAP-4.5.7 but it's difficult to understand what all this means:

This is a stripped-down version of GAP.  The databases, which are arch
independent, are in a separate package and doc and tests are removed.

** IMPORTANT **:
   When you update this package, be sure to put the guava package
   in the package directory!!

Delete some of the documentation:
  cd doc
  rm *.bbl *.aux *.dvi *.idx *.ilg *.l* *.m* *.pdf *.toc  *.blg *.ind
  rm */*.bbl */*.aux */*.dvi */*.idx */*.ilg */*.l* */*.m* */*.pdf */*.ind */*.toc */*.blg

DATABASES (separated out to database_gap.spkg) except GAPDoc which is required:
  rm -rf small prim trans
  cd pkg
  rm -rf !(GAPDoc*)

Stuff that isn't GAP sources:
  rm -rf bin/*
  cd extern
  rm -rf !Makefile.in
Last edited 9 years ago by jdemeyer

comment:211 in reply to: ↑ 210 ; follow-up: Changed 9 years ago by dimpase

Replying to jdemeyer:

The "Special Update/Build Instructions" are quite unclear to me. I'm trying to update the package to GAP-4.5.7 but it's difficult to understand what all this means:

This is a stripped-down version of GAP.  The databases, which are arch
independent, are in a separate package and doc and tests are removed.

** IMPORTANT **:
   When you update this package, be sure to put the guava package
   in the package directory!!

somehow, guava, the favorite GAP package of DJ, used to enjoy a special deal. But not anymore, for a while already. This needs to be updated.

GAP packages (a selection) go to gap_packages optional package.

I hope the rest is clear.

Delete some of the documentation:
  cd doc
  rm *.bbl *.aux *.dvi *.idx *.ilg *.l* *.m* *.pdf *.toc  *.blg *.ind
  rm */*.bbl */*.aux */*.dvi */*.idx */*.ilg */*.l* */*.m* */*.pdf */*.ind */*.toc */*.blg

DATABASES (separated out to database_gap.spkg) except GAPDoc which is required:
  rm -rf small prim trans
  cd pkg
  rm -rf !(GAPDoc*)

Stuff that isn't GAP sources:
  rm -rf bin/*
  cd extern
  rm -rf !Makefile.in

comment:212 in reply to: ↑ 211 Changed 9 years ago by jdemeyer

Replying to dimpase:

I hope the rest is clear.

It's not.

When updating the sources is non-trivial, it's best to create a shell script (called spkg-src) which downloads and prepares src/.

Changed 9 years ago by jdemeyer

Fix interrupts in Solaris

comment:213 Changed 9 years ago by jdemeyer

  • Work issues set to Clarify src updates

I have a new spkg ready which includes two patches: cflags.patch and siginterrupt.patch. The latter one fixes the OpenSolaris issue and has also been reported upstream.

Somebody else should really clarify SPKG.txt on how to update src or write a shell script to do so.

comment:214 follow-up: Changed 9 years ago by vbraun

Call me an optimist, but I'm still hoping that upstream will clean up their build system and dist tarball. The layout of their dist tarball and the required packages are also in flux, so I don't see the value of a shell script. Having said that, the instructions are written in a manner that you just have to cd into the gap* source directory and execute them. E.g. there are directories gap4r5/small, gap4r5/prim, and gap4r5/trans. Then by "rm -rf small prim trans" I mean to check that the directory layout hasn't changed and then delete these directories.

comment:215 in reply to: ↑ 214 Changed 9 years ago by jdemeyer

Replying to vbraun:

Having said that, the instructions are written in a manner that you just have to cd into the gap* source directory and execute them.

Not quite. I don't know what rm -rf !(GAPDoc*) is supposed to do:

jdemeyer@sage:~/spkg/gap-4.5.7.p0/src/pkg$ rm -rf !(GAPDoc*)
bash: !: event not found

Besides, if you're writing out the commands in SPKG.txt, you might as well create an actual shell script which is easier to use than copy/pasting the code and putting the right "cd" statements in between.
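
As a sketch of what a scripted version of part of these steps could look like, the Python below removes the database directories and everything in `pkg` except entries matching `GAPDoc*`. It is not the spkg-src script that was eventually written. The function names are hypothetical, and only the directory names quoted from SPKG.txt are taken from the ticket.

```python
import fnmatch
import shutil
from pathlib import Path


def remove_path(path):
    """Delete a file, symlink or directory tree."""
    if path.is_dir() and not path.is_symlink():
        shutil.rmtree(path)
    else:
        path.unlink()


def remove_all_except(directory, keep_pattern):
    """Rough analogue of `rm -rf !(PATTERN)` run inside `directory`.

    Unlike the shell glob, this also removes hidden entries.
    """
    removed = []
    for entry in sorted(Path(directory).iterdir()):
        if not fnmatch.fnmatch(entry.name, keep_pattern):
            remove_path(entry)
            removed.append(entry.name)
    return removed


def prune_gap_sources(src):
    """Drop the databases and all packages except GAPDoc from a GAP tree."""
    src = Path(src)
    for name in ("small", "prim", "trans"):
        if (src / name).exists():
            shutil.rmtree(src / name)
    if (src / "pkg").is_dir():
        remove_all_except(src / "pkg", "GAPDoc*")
```

Matching names in code avoids the dependence on the interactive shell's glob settings that caused the "event not found" error above.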

comment:216 follow-up: Changed 9 years ago by vbraun

You don't like extended shell globs? Then there is no concise way to match with exceptions, I think.

shopt +extglob

Both the mercurial and subversion bash completion scripts enable this by default.

comment:217 in reply to: ↑ 216 Changed 9 years ago by jdemeyer

Replying to vbraun:

You don't like extended shell globs?

Never heard of this...

comment:218 Changed 9 years ago by jdemeyer

I guess you mean

shopt -s extglob

comment:219 Changed 9 years ago by vbraun

Yes, sorry.

comment:220 Changed 9 years ago by jdemeyer

  • Description modified (diff)

Changed 9 years ago by jdemeyer

Diff for the GAP spkg 4.5.6 -> 4.5.7.p0. For reference / review only.

comment:221 Changed 9 years ago by jdemeyer

  • Authors changed from Volker Braun to Volker Braun, Jeroen Demeyer
  • Status changed from needs_work to needs_review

I updated SPKG.txt and upgraded to GAP-4.5.7. needs review...

comment:222 Changed 9 years ago by jdemeyer

  • Work issues Clarify src updates deleted

comment:223 Changed 9 years ago by jdemeyer

  • Summary changed from Upgrade GAP to 4.5.6 to Upgrade GAP to 4.5.7

comment:224 follow-ups: Changed 9 years ago by vbraun

Did you check that nothing in the gap_packages and database_gap packages changed? Should I version bump them to 4.5.7 to be clear?

comment:225 in reply to: ↑ 224 ; follow-ups: Changed 9 years ago by dimpase

Replying to vbraun:

Did you check that nothing in the gap_package and database_gap packages changed? Should I version bump them to 4.5.7 to be clear?

At least some packages might have been changed (this is not in sync with the GAP releases, AFAIK).

Also, after updating with Jeroen's spkg, I still get gap.version 4.5.6, not 4.5.7.

comment:226 in reply to: ↑ 225 Changed 9 years ago by vbraun

Replying to dimpase:

Also, after updating with Jeroen's spkg, I still get gap.version 4.5.6, not 4.5.7.

Me, too. I verified that the upstream tarball shows the correct version (4.5.7) upon startup, so something is wrong with the spkg sources.

comment:227 in reply to: ↑ 225 Changed 9 years ago by jdemeyer

Replying to dimpase:

Also, after updating with Jeroen's spkg, I still get gap.version 4.5.6, not 4.5.7.

It works for me:

jdemeyer@sage:/release/merger/sage-5.6.beta1$ ./sage --gap
 *********   GAP, Version 4.5.7 of 14-Dec-2012 (free software, GPL)
 *  GAP  *   http://www.gap-system.org
 *********   Architecture: x86_64-unknown-linux-gnu-gcc-default64
 Libs used:  gmp, readline
 Loading the library and packages ...
 Packages:   GAPDoc 1.5.1
 Try '?help' for help. See also  '?copyright' and  '?authors'
gap>
jdemeyer@sage:/release/merger/sage-5.6.beta1$ ./sage
----------------------------------------------------------------------
| Sage Version 5.5.rc1, Release Date: 2012-12-18                     |
| Type "notebook()" for the browser-based notebook interface.        |
| Type "help()" for help.                                            |
----------------------------------------------------------------------
**********************************************************************
*                                                                    *
* Warning: this is a prerelease version, and it may be unstable.     *
*                                                                    *
**********************************************************************
sage: gap.version()
'4.5.7'

comment:228 Changed 9 years ago by jdemeyer

The doctest needs to be changed though to reflect version 4.5.7

comment:229 in reply to: ↑ 224 Changed 9 years ago by jdemeyer

Replying to vbraun:

Did you check that nothing in the gap_packages and database_gap packages changed?

No, I didn't check anything, nor did I realise that this was needed. Please go ahead and update them.

comment:230 follow-up: Changed 9 years ago by vbraun

The gap/latest symlink is not correctly created, which is why the old version is still used if you installed gap-4.5.6 previously. I'll fix that, too.
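A minimal sketch of how a `latest` symlink can be repointed when a newer version is installed over an older one. This is not the spkg's actual install script; the function name `point_latest_at` and the temporary link name are hypothetical, and the directory names follow the `gap-x.y.z` layout from the ticket description.

```python
import os


def point_latest_at(gap_root, version_dir):
    """Make gap_root/latest a symlink to version_dir, replacing any old link."""
    link = os.path.join(gap_root, "latest")
    # Build the new link under a temporary (hypothetical) name first ...
    tmp = os.path.join(gap_root, ".latest.tmp")
    if os.path.lexists(tmp):
        os.remove(tmp)
    os.symlink(version_dir, tmp)
    # ... then rename it over the old link, so "latest" is replaced in one step
    # instead of being left pointing at the previously installed version.
    os.replace(tmp, link)
    return link
```

Calling `point_latest_at(root, "gap-4.5.7")` after an earlier `point_latest_at(root, "gap-4.5.6")` leaves `latest` pointing at the newer directory.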

comment:231 Changed 9 years ago by jdemeyer

  • Description modified (diff)

comment:232 in reply to: ↑ 230 Changed 9 years ago by jdemeyer

Replying to vbraun:

The gap/latest symlink is not correctly created, which is why the old version is still used if you installed gap-4.5.6 previously. I'll fix that, too.

ok

comment:233 Changed 9 years ago by vbraun

  • Description modified (diff)

Changed 9 years ago by vbraun

spkg diff for review only

comment:234 Changed 9 years ago by vbraun

Nothing changed in the gap_packages and database_gap this time, but I bumped the version to match. In any case they need to be updated from the old gap-4.4 ones.

comment:235 Changed 9 years ago by vbraun

  • Status changed from needs_review to positive_review

Positive review to Jeroen's changes.

Changed 9 years ago by jdemeyer

Updated for GAP-4.5.7 (contains pool_size patch)

comment:236 follow-up: Changed 9 years ago by jdemeyer

  • Status changed from positive_review to needs_work

The memory size still isn't right, as I got this doctest error on snapperkob (Linux Ubuntu 12.04 x86_64, 8GB RAM + 0.5GB swap):

sage -t  --long -force_lib devel/sage/sage/combinat/root_system/weyl_group.py
**********************************************************************
File "/home/buildbot/build/sage/snapperkob/snapperkob_full/build/sage-5.6.beta1/devel/sage-main/sage/combinat/root_system/weyl_group.py", line 543:
    sage: all( WeylGroup(t).long_element() == WeylGroup(t).long_element_hardcoded() for t in types )  # long time (17s on sage.math, 2011)
Exception raised:
    Traceback (most recent call last):
      File "/home/buildbot/build/sage/snapperkob/snapperkob_full/build/sage-5.6.beta1/local/bin/ncadoctest.py", line 1231, in run_one_test
        self.run_one_example(test, example, filename, compileflags)
      File "/home/buildbot/build/sage/snapperkob/snapperkob_full/build/sage-5.6.beta1/local/bin/sagedoctest.py", line 38, in run_one_example
        OrigDocTestRunner.run_one_example(self, test, example, filename, compileflags)
      File "/home/buildbot/build/sage/snapperkob/snapperkob_full/build/sage-5.6.beta1/local/bin/ncadoctest.py", line 1172, in run_one_example
        compileflags, 1) in test.globs
      File "<doctest __main__.example_16[4]>", line 1, in <module>
        all( WeylGroup(t).long_element() == WeylGroup(t).long_element_hardcoded() for t in types )  # long time (17s on sage.math, 2011)###line 543:
    sage: all( WeylGroup(t).long_element() == WeylGroup(t).long_element_hardcoded() for t in types )  # long time (17s on sage.math, 2011)
      File "<doctest __main__.example_16[4]>", line 1, in <genexpr>
        all( WeylGroup(t).long_element() == WeylGroup(t).long_element_hardcoded() for t in types )  # long time (17s on sage.math, 2011)###line 543:
    sage: all( WeylGroup(t).long_element() == WeylGroup(t).long_element_hardcoded() for t in types )  # long time (17s on sage.math, 2011)
      File "/home/buildbot/build/sage/snapperkob/snapperkob_full/build/sage-5.6.beta1/local/lib/python/site-packages/sage/combinat/root_system/weyl_group.py", line 579, in long_element_hardcoded
        return self.__call__(m)
      File "/home/buildbot/build/sage/snapperkob/snapperkob_full/build/sage-5.6.beta1/local/lib/python/site-packages/sage/combinat/root_system/weyl_group.py", line 362, in __call__
        if not gap(g) in gap(self):
      File "/home/buildbot/build/sage/snapperkob/snapperkob_full/build/sage-5.6.beta1/local/lib/python/site-packages/sage/interfaces/interface.py", line 675, in __contains__
        return P._contains(x.name(), self.name())
      File "/home/buildbot/build/sage/snapperkob/snapperkob_full/build/sage-5.6.beta1/local/lib/python/site-packages/sage/interfaces/gap.py", line 815, in _contains
        return self.eval('%s in %s'%(v1,v2)) == "true"
      File "/home/buildbot/build/sage/snapperkob/snapperkob_full/build/sage-5.6.beta1/local/lib/python/site-packages/sage/interfaces/gap.py", line 574, in eval
        result = Expect.eval(self, input_line, **kwds)
      File "/home/buildbot/build/sage/snapperkob/snapperkob_full/build/sage-5.6.beta1/local/lib/python/site-packages/sage/interfaces/expect.py", line 1220, in eval
        for L in code.split('\n') if L != ''])
      File "/home/buildbot/build/sage/snapperkob/snapperkob_full/build/sage-5.6.beta1/local/lib/python/site-packages/sage/interfaces/gap.py", line 775, in _eval_line
        raise RuntimeError, message
    RuntimeError: Gap produced error output
    Error, exceeded the permitted memory (`-o' command line option)

       executing $sage3 in $sage11;
**********************************************************************

comment:237 in reply to: ↑ 236 Changed 9 years ago by dimpase

Replying to jdemeyer:

The memory size still isn't right, as I got this doctest error on snapperkob (Linux Ubuntu 12.04 x86_64, 8GB RAM + 0.5GB swap):

Same on my OS X 10.6.8 laptop. Actually, just the following suffices:

$ sage 
----------------------------------------------------------------------
| Sage Version 5.5.rc0, Release Date: 2012-11-17                     |
| Type "notebook()" for the browser-based notebook interface.        |
| Type "help()" for help.                                            |
----------------------------------------------------------------------
**********************************************************************
*                                                                    *
* Warning: this is a prerelease version, and it may be unstable.     *
*                                                                    *
**********************************************************************
sage: WeylGroup(['E',6]).long_element() == WeylGroup(['E',6]).long_element_hardcoded()
ERROR: An unexpected error occurred while tokenizing input
[...]
RuntimeError: Gap produced error output
Error, exceeded the permitted memory (`-o' command line option)

Repeating this line again in the same session produces no error, something I don't understand.

Changed 9 years ago by vbraun

Updated patch

comment:238 Changed 9 years ago by vbraun

  • Description modified (diff)
  • Status changed from needs_work to positive_review

I see, by far the largest computation with GAP is actually in combinat and not in the group theory stuff %-) We actually need about 220 MB, so I increased the minimum to 250 MB. I checked that make ptestlong runs without any further errors.
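As an illustration only, the effect of a 250 MB floor can be sketched as a simple clamp. The names `MIN_POOL_MB` and `pool_size_mb` are hypothetical and how the suggested value is derived from system memory is not shown here; the actual logic lives in the patch attached to this ticket.

```python
MIN_POOL_MB = 250  # floor quoted in this comment; everything else is assumed


def pool_size_mb(suggested_mb, minimum_mb=MIN_POOL_MB):
    """Return the memory pool size in MB, never going below the minimum."""
    return max(int(suggested_mb), minimum_mb)
```

With this floor, a suggested size of 220 MB (roughly what the Weyl group doctest needs) is raised to 250 MB, while larger suggestions pass through unchanged.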

Dima, running the command twice probably uses values that were cached in Sage.

comment:239 Changed 9 years ago by jdemeyer

Volker, your patch is missing memory_info.py for some reason.

comment:240 Changed 9 years ago by jdemeyer

Also, something else I just noted: in #12221, I determined it was better to completely unset TERM instead of setting it to "dumb". I don't remember why, but I do remember that it was the most reliable.

But I don't care very much since it seems to pass doctests...

comment:241 Changed 9 years ago by jdemeyer

  • Status changed from positive_review to needs_work

Changed 9 years ago by vbraun

Updated patch

Changed 9 years ago by vbraun

Updated patch

comment:242 Changed 9 years ago by vbraun

  • Description modified (diff)
  • Status changed from needs_work to positive_review

Strange that the file got lost, I didn't edit anything in that dir. I put memory_info.py back in and switched from "dumb" to no TERM.
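For readers unfamiliar with the distinction, a small sketch of the two options, assuming the child process is started with an explicit environment dictionary. The helper name `env_without_term` is hypothetical and is not taken from the patch.

```python
import os


def env_without_term(base_env=None):
    """Return a copy of the environment with TERM removed entirely."""
    env = dict(os.environ if base_env is None else base_env)
    # Unset TERM rather than setting env["TERM"] = "dumb".
    env.pop("TERM", None)
    return env
```

The result can be passed as the `env` argument when spawning the subprocess, for example `subprocess.Popen(cmd, env=env_without_term())`.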

comment:243 follow-up: Changed 9 years ago by jdemeyer

  • Merged in set to sage-5.6.beta1
  • Resolution set to fixed
  • Status changed from positive_review to closed

And there was much rejoicing...

comment:244 Changed 9 years ago by ppurka

Thanks! \o/

comment:245 in reply to: ↑ 243 Changed 9 years ago by SimonKing

Replying to jdemeyer:

And there was much rejoicing...

Congratulations! Although it means that I have to produce a new version of my group cohomology spkg - it seems that the new GAP version has a different syntax or function names or whatever else to cope with... :(

comment:246 Changed 9 years ago by schilly

Both optional SPKGs are on their way around the world :-)

comment:247 Changed 9 years ago by jhpalmieri

The database_gap spkg on the mirrors seems to be corrupted, as does the one on Volker's page in the ticket description:

palmieri@sage:sage$ ./sage -i http://www.stp.dias.ie/~vbraun/Sage/spkg/database_gap-4.5.7.spkg
Attempting to download package database_gap-4.5.7
>>> Downloading database_gap-4.5.7.spkg.
[............................................................]
database_gap-4.5.7
====================================================
Extracting package /scratch/palmieri/sage-5.5.rc0/spkg/optional/database_gap-4.5.7.spkg
-rw-r--r-- 1 palmieri palmieri 59654144 2012-12-29 17:33 /scratch/palmieri/sage-5.5.rc0/spkg/optional/database_gap-4.5.7.spkg
tar: Skipping to next header
tar: Error exit delayed from previous errors
Error: failed to extract /scratch/palmieri/sage-5.5.rc0/spkg/optional/database_gap-4.5.7.spkg

comment:248 Changed 9 years ago by vbraun

Fixed, the file was truncated. This is the correct checksum:

[vbraun@volker-desktop spkg]$ md5sum database_gap-4.5.7.spkg 
46b0a14437b1fe996cbbb482d00e5325  database_gap-4.5.7.spkg
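A truncated download like the one above can be detected before extraction by comparing checksums. The following is a sketch using Python's standard `hashlib`; the helper names `md5_of_file` and `is_intact` are hypothetical.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        # Read fixed-size chunks so large spkg files need not fit in memory.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def is_intact(path, expected_md5):
    """Return True if the file's MD5 matches the expected hex digest."""
    return md5_of_file(path) == expected_md5.lower()
```

For the file in question one would call `is_intact("database_gap-4.5.7.spkg", "46b0a14437b1fe996cbbb482d00e5325")`.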

comment:249 Changed 9 years ago by schilly

OK, I've replaced the faulty file on the master and the md5 sum matches now. Mirrors are updating as you read this!

comment:250 Changed 9 years ago by kcrisman

Just FYI, JP opened a followup at #13954 since this broke on Cygwin. No action necessary here, of course, and it looks like he has a fix in any case.

comment:251 follow-up: Changed 9 years ago by kcrisman

I get some weird stuff in interfaces/gap.py on OS X 10.4 with 5.6.beta2. Things like

sage: n = get_gap_memory_pool_size()
<snip>
    free_ram = int(free_ram[:-1]) * units[free_ram[-1]]
ValueError: invalid literal for int() with base 10: '33.1'

and lots and lots of other similar things. The new file misc/memory_info.py has similar problems. This was all introduced in this ticket, so I assume it is related... but maybe there was a followup I am unaware of? Thanks for any feedback.
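The ValueError arises because `int()` does not accept a string with a fractional part such as '33.1'. A hedged sketch of a more tolerant parser follows; the function name `parse_size` and the `UNITS` table are assumptions made for illustration, not the code in interfaces/gap.py or the fix in the follow-up ticket.

```python
# Assumed unit suffixes; the real table in the Sage source may differ.
UNITS = {"K": 1024, "M": 1024 ** 2, "G": 1024 ** 3}


def parse_size(text):
    """Parse a size such as '33.1M' or '2G' into a whole number of bytes."""
    text = text.strip()
    number, suffix = text[:-1], text[-1].upper()
    # Go through float() first so fractional values like '33.1' are accepted,
    # then truncate to an integer byte count.
    return int(float(number) * UNITS[suffix])
```

Here `parse_size('33.1M')` succeeds where `int('33.1')` would raise the error shown above.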

comment:252 in reply to: ↑ 251 ; follow-up: Changed 9 years ago by jdemeyer

There is a follow-up at #13880, but if that doesn't fix your problem, please post to sage-devel such that this can be fixed before releasing the final Sage 5.6.

comment:253 in reply to: ↑ 252 ; follow-up: Changed 9 years ago by kcrisman

There is a follow-up at #13880, but if that doesn't fix your problem, please post to sage-devel such that this can be fixed before releasing the final Sage 5.6.

I saw that, but it didn't seem to be quite the same issue... but I'll try it!

comment:254 in reply to: ↑ 253 ; follow-up: Changed 9 years ago by kcrisman

There is a follow-up at #13880, but if that doesn't fix your problem, please post to sage-devel such that this can be fixed before releasing the final Sage 5.6.

I saw that, but it didn't seem to be quite the same issue... but I'll try it!

Sorry, it didn't seem to do it. I've posted to sage-devel.

comment:255 in reply to: ↑ 254 Changed 9 years ago by jdemeyer

(never mind)

comment:256 Changed 9 years ago by jdemeyer

This caused a serious slow-down in congruence subgroups: https://groups.google.com/forum/?fromgroups#!topic/sage-devel/e3EDIRLuJXA
