#13211 closed enhancement (fixed)
Upgrade GAP to 4.5.7
Reported by:  kini  Owned by:  tbd 

Priority:  major  Milestone:  sage5.6 
Component:  packages: standard  Keywords:  
Cc:  ppurka, dimpase, mmarco, jhpalmieri, rbeezer, vbraun, burcin  Merged in:  sage5.6.beta1 
Authors:  Volker Braun, Jeroen Demeyer  Reviewers:  Dmitrii Pasechnik 
Report Upstream:  Reported upstream. Developers acknowledge bug.  Work issues:  
Branch:  Commit:  
Dependencies:  #13123, #13579  Stopgaps: 
Description
While we are at it, move the gap install to $SAGE_LOCAL/gap/gap.x.y.z. It's not cool to put anything but libraries into /lib. Also, make a symlink latest -> gapx.y.z so that not every script has to figure out the current version number. This follows what is usually done with java, another offender that can't install in a standards-compliant manner:
[vbraun@laptop ~]$ ll /usr/java
total 8
drwxr-xr-x. 3 root root 4096 Jul 9 17:37 jdk1.7.0_03
drwxr-xr-x. 8 root root 4096 Jul 9 17:37 jdk1.7.0_05
lrwxrwxrwx. 1 root root 21 Jul 9 17:37 latest -> /usr/java/jdk1.7.0_05
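For illustration, the versioned directory plus "latest" symlink layout could be sketched in Python roughly as follows. This is only a sketch under assumptions: the helper name install_versioned, the "gap-<version>" directory naming, and the temporary prefix are all hypothetical and are not taken from the spkg's actual install script.

```python
import os
import tempfile


def install_versioned(prefix, version):
    """Create prefix/gap-<version> and point prefix/latest at it.

    Hypothetical helper: the directory naming is an assumption made
    for this example, not the real spkg-install logic.
    """
    target = os.path.join(prefix, "gap-" + version)
    os.makedirs(target, exist_ok=True)

    link = os.path.join(prefix, "latest")
    tmp = link + ".tmp"
    if os.path.lexists(tmp):
        os.remove(tmp)
    # Use a relative link so the tree stays valid if prefix is moved.
    os.symlink(os.path.basename(target), tmp)
    # Renaming over the old link switches versions in a single step.
    os.replace(tmp, link)
    return link


prefix = tempfile.mkdtemp()
install_versioned(prefix, "4.5.6")
link = install_versioned(prefix, "4.5.7")

# Scripts only need to know about "latest", never the version number.
resolved = os.path.basename(os.path.realpath(link))
print(resolved)
```

Scripts that go through the link keep working across upgrades, which is the point of the java-style layout described above.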
Updated spkgs:
 http://www.stp.dias.ie/~vbraun/Sage/spkg/gap4.5.7.p1.spkg
 http://www.stp.dias.ie/~vbraun/Sage/spkg/gap_packages4.5.7.spkg
 http://www.stp.dias.ie/~vbraun/Sage/spkg/database_gap4.5.7.spkg
Apply to SAGE_ROOT
Apply
Attachments (18)
Change History (275)
comment:1 Changed 9 years ago by
 Cc mmarco added
 Description modified (diff)
 Summary changed from Upgrade GAP to 4.5.4 to Upgrade GAP to 4.5
comment:2 Changed 9 years ago by
 Cc jhpalmieri added
comment:3 Changed 9 years ago by
 Cc rbeezer added
comment:4 Changed 9 years ago by
 Cc vbraun added
comment:5 Changed 9 years ago by
 Cc burcin added
 Description modified (diff)
 Summary changed from Upgrade GAP to 4.5 to Upgrade GAP to 4.5.5
comment:6 Changed 9 years ago by
 Keywords rng added
And of course the random number generator (random group element) in GAP changed, yay!
comment:7 Changed 9 years ago by
It could also be a good moment to include as many gap packages as possible in the gap_packages spkg. I made a list with the licenses of some packages [1] and an spkg with those packages that have a GPL license and that I was able to install seamlessly [2].
It was all for version 4.4.12, but it should be easy to port to 4.5.
[1] https://docs.google.com/spreadsheet/ccc?key=0AvB7eBQW5NGdDN2X1NVQnUyQ24tQ05CQzRlVVh2Rmc
[2] https://docs.google.com/open?id=0B_B7eBQW5NGWUUzQzYxdFRqaXM
comment:8 followup: ↓ 9 Changed 9 years ago by
No, this is not a good moment to add features to gap. Make a separate ticket.
comment:9 in reply to: ↑ 8 Changed 9 years ago by
comment:10 Changed 9 years ago by
I've based it on #13341 and tried not to touch any cygwin stuff. Though it's probably safe to say that cygwin broke, I don't have a Windows machine to test on.
comment:11 Changed 9 years ago by
 Description modified (diff)
Also, the new gap prints escape codes on Fedora 17. I've communicated this to the GAP devs. Here:
[vbraun@laptop ~]$ echo '1+1;' | sage -gap -q | od -c
0000000 033 [ ? 1 0 3 4 h 2 \n
0000012
comment:12 Changed 9 years ago by
Here are test failures on my OpenSUSE laptop.
Most of the tests fail because of changes in the random generator or because of slightly changed error messages or because of changing documentation in GAP or simply because of testing against the version number.
Here are more serious problems:
File "/home/simon/SAGE/prerelease/sage5.2.rc0/devel/sagemain/sage/interfaces/gap.py", line 1330:
    sage: 'Centralizer' in s5.trait_names()
Exception raised:
    Traceback (most recent call last):
      File "/home/simon/SAGE/prerelease/sage5.2.rc0/local/bin/ncadoctest.py", line 1231, in run_one_test
        self.run_one_example(test, example, filename, compileflags)
      File "/home/simon/SAGE/prerelease/sage5.2.rc0/local/bin/sagedoctest.py", line 38, in run_one_example
        OrigDocTestRunner.run_one_example(self, test, example, filename, compileflags)
      File "/home/simon/SAGE/prerelease/sage5.2.rc0/local/bin/ncadoctest.py", line 1172, in run_one_example
        compileflags, 1) in test.globs
      File "<doctest __main__.example_46[3]>", line 1, in <module>
        'Centralizer' in s5.trait_names()###line 1330:
    sage: 'Centralizer' in s5.trait_names()
      File "/home/simon/SAGE/prerelease/sage5.2.rc0/local/lib/python/sitepackages/sage/interfaces/gap.py", line 1338, in trait_names
        v = eval(v)
      File "<string>", line 4
        "in", "ShallowCopy", <Attribute "Name",
                             ^
    SyntaxError: invalid syntax
In sage/coding/linear_code.py are a few crashes, like this:
File "/home/simon/SAGE/prerelease/sage5.2.rc0/devel/sagemain/sage/coding/linear_code.py", line 2047:
    sage: G.order()
Exception raised:
    Traceback (most recent call last):
    ...
    RuntimeError: Gap produced error output
    Error, Variable: '$sage16' must have a value
       executing Size($sage16);
**********************************************************************
File "/home/simon/SAGE/prerelease/sage5.2.rc0/devel/sagemain/sage/coding/linear_code.py", line 2716:
    sage: C.zeta_polynomial()
Exception raised:
    Traceback (most recent call last):
    ...
    RuntimeError: Gap produced error output
    Error, Variable: '$sage24' must have a value
       executing Print($sage24);
**********************************************************************
File "/home/simon/SAGE/prerelease/sage5.2.rc0/devel/sagemain/sage/coding/linear_code.py", line 599:
    sage: for B in self_orthogonal_binary_codes(7,3,4): print B; print B.gen_mat()
Expected:
    Linear code of length 4, dimension 1 over Finite Field of size 2
    [1 1 1 1]
    Linear code of length 6, dimension 2 over Finite Field of size 2
    [1 1 1 1 0 0]
    [0 1 0 1 1 1]
    Linear code of length 7, dimension 3 over Finite Field of size 2
    [1 0 1 1 0 1 0]
    [0 1 0 1 1 1 0]
    [0 0 1 0 1 1 1]
Got:
    Linear code of length 4, dimension 1 over Finite Field of size 2
    [1 1 1 1]
    ** Gap crashed or quit executing 'Read("/home/simon/.sage//temp/linux_sqwp.site/18857//interface//tmp18907");' **
    Restarting Gap and trying again
    Linear code of length 6, dimension 2 over Finite Field of size 2
    [1 1 1 1 0 0]
    [0 1 0 1 1 1]
    Linear code of length 7, dimension 3 over Finite Field of size 2
    [1 0 1 1 0 1 0]
    [0 1 0 1 1 1 0]
    [0 0 1 0 1 1 1]
**********************************************************************
Note the crash in the last example. A similar crash is:
File "/home/simon/SAGE/prerelease/sage5.2.rc0/devel/sagemain/sage/coding/code_constructions.py", line 530:
    sage: C.minimum_distance()
Expected:
    4
Got:
    ** Gap crashed or quit executing 'Read("/home/simon/.sage//temp/linux_sqwp.site/18774//interface//tmp18791");' **
    Restarting Gap and trying again
    4
Several tests in latin.py fail, probably because of changes in the random generator. Couldn't one test instead whether the squares really are latin?
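To make the suggestion concrete, a property-based check could look roughly like the sketch below. It is plain Python rather than Sage code, and both is_latin_square and the stand-in generator random_cyclic_latin_square are hypothetical names invented for this example; the real doctests in latin.py would call Sage's own generators instead.

```python
import random


def is_latin_square(rows):
    """Return True if every row and every column contains each symbol once."""
    n = len(rows)
    if n == 0 or any(len(r) != n for r in rows):
        return False
    symbols = set(rows[0])
    if len(symbols) != n:
        return False
    rows_ok = all(set(r) == symbols for r in rows)
    cols_ok = all(set(col) == symbols for col in zip(*rows))
    return rows_ok and cols_ok


def random_cyclic_latin_square(n, rng):
    """Stand-in generator: shuffle rows, columns and symbols of the
    cyclic square (i + j) mod n. Only used to have random input here."""
    perm_r = list(range(n))
    perm_c = list(range(n))
    perm_s = list(range(n))
    rng.shuffle(perm_r)
    rng.shuffle(perm_c)
    rng.shuffle(perm_s)
    return [[perm_s[(perm_r[i] + perm_c[j]) % n] for j in range(n)]
            for i in range(n)]


rng = random.Random()  # deliberately unseeded: the test must not care
square = random_cyclic_latin_square(5, rng)
print(is_latin_square(square))
```

A test written this way does not depend on the random number generator of the underlying GAP version, at the cost of no longer pinning the exact output.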
A strange one:
File "/home/simon/SAGE/prerelease/sage5.2.rc0/devel/sagemain/sage/tests/cmdline.py", line 359:
    sage: out
Expected:
    '120\n'
Got:
    '\x1b[?1034h120\n'
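The extra bytes in that last failure form a terminal control sequence (ESC, "[", parameters, final letter). As a sketch of one possible workaround on the test side, and not what the patch on this ticket does, such sequences could be filtered out before comparing. The helper name strip_csi is hypothetical.

```python
import re

# Control Sequence Introducer: ESC [ , then parameter bytes (0x30-0x3F),
# optional intermediate bytes (0x20-0x2F), then one final byte (0x40-0x7E).
CSI = re.compile(r"\x1b\[[0-?]*[ -/]*[@-~]")


def strip_csi(text):
    """Remove CSI escape sequences from captured command output."""
    return CSI.sub("", text)


out = "\x1b[?1034h120\n"
cleaned = strip_csi(out)
print(repr(cleaned))
```

Filtering only hides the symptom; whether the sequence should be emitted at all when output is not a terminal is the question raised with upstream.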
comment:13 Changed 9 years ago by
 Report Upstream changed from N/A to Reported upstream. No feedback yet.
I've isolated a test case for the gap crashes and sent it to the GAP developers.
comment:14 Changed 9 years ago by
 Description modified (diff)
 Keywords rng removed
 Report Upstream changed from Reported upstream. No feedback yet. to Reported upstream. Developers acknowledge bug.
The crashes are all due to a GAP garbage collection bug in Z/2Z-specific code. I've fixed all other doctest errors. The only remaining ones are
sage -t -force_lib devel/sage/sage/coding/code_constructions.py # 1 doctests failed
sage -t -force_lib devel/sage/sage/coding/linear_code.py # 3 doctests failed
comment:15 Changed 9 years ago by
 Report Upstream changed from Reported upstream. Developers acknowledge bug. to Fixed upstream, in a later stable release.
 Status changed from new to needs_review
I received a patch from the upstream developers, to be included in gap4.5.6. I've updated the gap spkg with the fix, now the Sage testsuite runs without errors. I also updated the gap_packages spkg with a fix for a function name clash in braid1.1 that broke the GAP testsuite (i.e. SAGE_CHECK=yes). So as far as I'm concerned we are good to go.
I would appreciate a speedy review of this ticket *nudge* *nudge*
comment:16 Changed 9 years ago by
Installing the gap4.5.5 spkg, the new gap packages and database gap spkgs worked fine on openSuse with SAGE_CHECK=yes. Now I'll start the Sage test suite.
comment:17 Changed 9 years ago by
The Sage test suite passes as well. hg status has nothing to complain about in any of the three packages, and SPKG.txt is updated in all cases. Hence, from my perspective, it is a positive review. However, I'll repeat on bsd.math, and perhaps someone else can repeat on openSolaris.
comment:18 Changed 9 years ago by
PS: Unfortunately most tests of the latest, to-be-reviewed version of my p_group_cohomology spkg fail with the new versions of Singular and GAP. But I guess that's my own problem...
comment:19 Changed 9 years ago by
Dear Volker and Simon,
Thanks very much for your work on this one. I've been becoming a lot more familiar with the GAP interface this summer and getting undergraduate students involved, so I really appreciate your work getting this organized and reviewed.
Rob
comment:20 followup: ↓ 21 Changed 9 years ago by
On two different OS X 10.7 machines, there is one doctest failure:
sage -t -long "devel/sage/sage/interfaces/gap.py"
**********************************************************************
File "/Users/palmieri/Desktop/Sage_stuff/sage_builds/sage5.3.rc0gap/devel/sage/sage/interfaces/gap.py", line 521:
    sage: a = gap(3)
Exception raised:
    Traceback (most recent call last):
      File "/Users/palmieri/Desktop/Sage_stuff/sage_builds/sage5.3.rc0gap/local/bin/ncadoctest.py", line 1231, in run_one_test
        self.run_one_example(test, example, filename, compileflags)
      File "/Users/palmieri/Desktop/Sage_stuff/sage_builds/sage5.3.rc0gap/local/bin/sagedoctest.py", line 38, in run_one_example
        OrigDocTestRunner.run_one_example(self, test, example, filename, compileflags)
      File "/Users/palmieri/Desktop/Sage_stuff/sage_builds/sage5.3.rc0gap/local/bin/ncadoctest.py", line 1172, in run_one_example
        compileflags, 1) in test.globs
      File "<doctest __main__.example_9[8]>", line 1, in <module>
        a = gap(Integer(3))###line 521:
    sage: a = gap(3)
      File "/Users/palmieri/Desktop/Sage_stuff/sage_builds/sage5.3.rc0gap/local/lib/python/sitepackages/sage/interfaces/interface.py", line 197, in __call__
        return self._coerce_from_special_method(x)
      File "/Users/palmieri/Desktop/Sage_stuff/sage_builds/sage5.3.rc0gap/local/lib/python/sitepackages/sage/interfaces/interface.py", line 223, in _coerce_from_special_method
        return (x.__getattribute__(s))(self)
      File "sage_object.pyx", line 463, in sage.structure.sage_object.SageObject._gap_ (sage/structure/sage_object.c:4529)
      File "sage_object.pyx", line 439, in sage.structure.sage_object.SageObject._interface_ (sage/structure/sage_object.c:4129)
      File "/Users/palmieri/Desktop/Sage_stuff/sage_builds/sage5.3.rc0gap/local/lib/python/sitepackages/sage/interfaces/interface.py", line 195, in __call__
        return cls(self, x, name=name)
      File "/Users/palmieri/Desktop/Sage_stuff/sage_builds/sage5.3.rc0gap/local/lib/python/sitepackages/sage/interfaces/expect.py", line 1330, in __init__
        raise TypeError, x
    TypeError: Gap produced error output
    Error, user interrupt

       executing $sage4:=3;;
**********************************************************************
I'll try to build on OpenSolaris, too.
comment:21 in reply to: ↑ 20 ; followup: ↓ 24 Changed 9 years ago by
Replying to jhpalmieri:
On two different OS X 10.7 machines, there is one doctest failure:
sage -t -long "devel/sage/sage/interfaces/gap.py"
**********************************************************************
File "/Users/palmieri/Desktop/Sage_stuff/sage_builds/sage5.3.rc0gap/devel/sage/sage/interfaces/gap.py", line 521:
    sage: a = gap(3)
Exception raised:
    Traceback (most recent call last):
    ...
I'll try to build on OpenSolaris, too.
Wow. That's not good. I suppose that there are a lot of further errors. If gap(3) breaks then I guess most other gap stuff will break, too.
FWIW, make ptest worked on my laptop and make ptestlong worked on bsd.math. The spkgs look fine, in terms of hg status and SPKG.txt.
comment:22 Changed 9 years ago by
That is the only error, actually. I did make ptestlong and everything else passed.
comment:23 Changed 9 years ago by
A little more data: I can only get the doctest to fail if the system is somewhat loaded (for example, doing parallel doctests). If the system is idle, the doctest passes consistently; if it's loaded, the doctest fails consistently.
(In more detail: I am remotely logging into the machine in my office at the university, and I know that no one else uses the machine. Running 'make ptestlong' gives a failure, running './sage -tp 2 devel/sage/sage/interfaces/' gives a failure, and running the doctest while I'm also building another installation of Sage gives a failure, while it passes when running the doctest by itself. It's quite repeatable.)
comment:24 in reply to: ↑ 21 ; followup: ↓ 25 Changed 9 years ago by
Replying to SimonKing:
Wow. That's not good. I suppose that there are a lot of further errors. If gap(3) breaks then I guess most other gap stuff will break, too.
This is where Ctrl-C is tested, for the record.
Edit: where automatic restarting of the GAP interpreter is tested.
comment:25 in reply to: ↑ 24 Changed 9 years ago by
Replying to vbraun:
This is where Ctrl-C is tested, for the record.
Edit: where automatic restarting of the GAP interpreter is tested.
You mean this one?
The following tests against a bug fixed at trac ticket #10296:
    sage: a = gap(3)
    sage: gap.eval('quit;')
    ''
    sage: a = gap(3)
    ** Gap crashed or quit executing '$sage...:=3;;' **
    Restarting Gap and trying again
    sage: a
    3
Does it fail in the first or in the second instance of a = gap(3)? Why does it report a user interrupt, when either nothing has happened at all (first instance) or GAP was not running (second instance)?
comment:26 Changed 9 years ago by
I could reproduce this in repeated testing on Linux. You actually need to scroll up a bit higher:
    sage: gap.interrupt(timeout=1) is not None
    True
    sage: gap._eval_using_file_cutoff = cutoff

The following tests against a bug fixed at trac ticket #10296:

    sage: a = gap(3)
and it dies in the first gap(3). Because our interrupt() method is crap, it sends the wrong quit string and GAP/readline interferes with Ctrl-C.
The updated patch fixes this. I still find a ~1% chance of sage.interfaces.gap doctests failing in random places, but that's probably on par for any pexpect interface.
comment:27 Changed 9 years ago by
Also, gap_console() was broken. Fixed it and added a meaningful doctest.
comment:28 Changed 9 years ago by
On OpenSolaris, I am getting doctest failures in interfaces/expect.py. The failures are not always the same, but here is a sample:
 http://sage.math.washington.edu/home/palmieri/misc/EXPECT1.log
 http://sage.math.washington.edu/home/palmieri/misc/EXPECT2.log
 http://sage.math.washington.edu/home/palmieri/misc/EXPECT3.log
I don't know what these have to do with this ticket, but they certainly seem to be caused by the spkg here.
By the way, the new library patch seems to fix the issues on OS X.
comment:29 followup: ↓ 30 Changed 9 years ago by
sage -t -long "devel/sage/sage/interfaces/expect.py"
**********************************************************************
File "/export/home/palmieri/testing/clean/sage5.3.rc0/devel/sage/sage/interfaces/expect.py", line 608:
    sage: L = [t[1] for t in f(range(5))]
Expected nothing
Got:
    [Errno 12] Not enough space
    Killing any remaining workers...
Disk is full, it seems.
comment:30 in reply to: ↑ 29 ; followup: ↓ 31 Changed 9 years ago by
Replying to vbraun:
Disk is full, it seems.
It looks like there is space to me, but which disk should I be looking at? I don't understand why after installing the spkg here, if I then do ./sage -f spkg/standard/gap4.4.12.p7.spkg, the doctest passes, and then if I do ./sage -f gap4.5.5.spkg, it fails again. Does the new version of GAP create temporary files somewhere new, compared to the old version?
comment:31 in reply to: ↑ 30 Changed 9 years ago by
Replying to jhpalmieri:
It looks like there is space to me, but which disk should I be looking at?
I didn't change anything. All temp files should go to $DOT_SAGE as before.
comment:32 followup: ↓ 34 Changed 9 years ago by
On bsd.math, at least when #12876 and its dependencies are added, I find:
sage -t -force_lib "devel/sage/sage/interfaces/gap.py"
**********************************************************************
File "/scratch/sking/sage5.3.rc1/devel/sage/sage/interfaces/gap.py", line 809:
    sage: gap(2)
Expected:
    2
Got:
    <BLANKLINE>
**********************************************************************
1 items had failures:
   1 of 4 in __main__.example_21
***Test Failed*** 1 failures.
For whitespace errors, see the file /Users/SimonKing/.sage//tmp/gap_70056.py
 [29.6 s]

The following tests failed:
 sage -t -force_lib "devel/sage/sage/interfaces/gap.py"
Total time for all tests: 29.7 seconds
I'd need to repeat without all the other patches, though, to be on the safe side.
comment:33 Changed 9 years ago by
OpenSolaris: I tried setting DOT_SAGE to /tmp/palmieri, and I still see the same failures with the new spkg, no failures with the old one. Any ideas?
comment:34 in reply to: ↑ 32 Changed 9 years ago by
Replying to SimonKing:
On bsd.math, at least when #12876 and its dependencies are added, I find:
sage -t -force_lib "devel/sage/sage/interfaces/gap.py"
**********************************************************************
File "/scratch/sking/sage5.3.rc1/devel/sage/sage/interfaces/gap.py", line 809:
    sage: gap(2)
Expected:
    2
Got:
    <BLANKLINE>
The error is reproducible, with just the new spkg and the doctest fix patch applied. That must be a really nasty side effect, because in an interactive session gap(2) returns 2.
comment:35 Changed 9 years ago by
I took some time to understand the gap pexpect interface. Altogether, I think it would be better to base it on the normal interface, which seems to be more stable and definitely has more exposure. But we are using the package mode (gap -p) and that is neither documented (apart from the source) nor does it behave particularly well when you send Ctrl-C. But that is definitely for another ticket.
I did add more checks after interrupt() that the pexpect interface is in a sane state, and restart gap if it is not. I ran 500+ iterations of the doctest and do not get any failures anymore. I'm pretty confident that this is at least as good as we had before, so please review.
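The general idea of "check for a sane state after interrupt(), restart otherwise" can be sketched as below. This is not the code from the patch: FakeInterpreter, its methods, and the probe expression are all hypothetical stand-ins so that the example runs without GAP or pexpect.

```python
class FakeInterpreter:
    """Hypothetical stand-in for a pexpect-driven subprocess."""

    def __init__(self):
        self.generation = 0      # how many times we have been restarted
        self.desynced = False    # True if stale output is in the pipe

    def interrupt(self):
        # Simulate an interrupt that leaves the conversation out of sync.
        self.desynced = True

    def eval(self, expr):
        if self.desynced:
            return "stale output"
        # Canned answers only; a real interface would talk to the process.
        return {"1+2": "3"}.get(expr, "")

    def restart(self):
        self.generation += 1
        self.desynced = False


def interrupt_and_resync(interp, probe="1+2", expected="3"):
    """Interrupt, then verify with a cheap probe that answers line up.

    If the probe does not come back as expected, the process is
    restarted rather than trusted.
    """
    interp.interrupt()
    if interp.eval(probe) != expected:
        interp.restart()
    return interp.eval(probe) == expected


interp = FakeInterpreter()
ok = interrupt_and_resync(interp)
print(ok, interp.generation)
```

Restarting throws away session state, so this trades lost variables for an interface that is known to be synchronized again.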
comment:36 Changed 9 years ago by
There was one doctest error in sage/interfaces/expect.py where there is a synchronization method for the expect interface. Since it doesn't know about the GAP package mode it actually desynchronizes the gap interface ;) Fixed in the updated patch.
comment:37 Changed 9 years ago by
On skynet machine mark (Solaris on sparc, Sage built with SAGE_INSTALL_GCC=yes), this version of Gap doesn't work:
$ ./sage -gap
 ********* GAP, Version 4.5.5 of 16Jul2012 (free software, GPL)
 * GAP * http://www.gapsystem.org
 ********* Architecture: sparcsunsolaris2.10gccdefault32
 Libs used: gmp
 Loading the library and packages ...
/home/palmieri/mark/sage5.3.rc1/spkg/bin/sage: line 400: 4079 Bus Error (core dumped) "$SAGE_LOCAL/bin/gap" "$@"
comment:38 Changed 9 years ago by
I'm not surprised that it doesn't work on SPARC. I'll send the authors a bug report. But since it's not a primary platform it shouldn't stop us from shipping the new GAP version.
comment:39 Changed 9 years ago by
I've investigated the SPARC issue and received a patch from the GAP developers. I've updated the spkg, now builds and tests fine on mark.
comment:40 followup: ↓ 41 Changed 9 years ago by
 Description modified (diff)
 Report Upstream changed from Fixed upstream, in a later stable release. to Completely fixed; Fix reported upstream
 Summary changed from Upgrade GAP to 4.5.5 to Upgrade GAP to 4.5.6
Naturally, nobody dared to review this ticket. So GAP released the next version in the meantime. Updated to gap4.5.6.
comment:41 in reply to: ↑ 40 Changed 9 years ago by
Replying to vbraun:
Naturally, nobody dared to review this ticket. So GAP released the next version in the meantime. Updated to gap4.5.6.
challenge accepted :/ expect a review real soon...
comment:42 Changed 9 years ago by
http://www.stp.dias.ie/~vbraun/Sage/spkg/gap4.5.6.spkg has uncommitted changes in SPKG.txt and the spkg itself is not compressed, making it over 50MB instead of under 10MB.
comment:43 Changed 9 years ago by
 Description modified (diff)
 Status changed from needs_review to positive_review
I've created the spkg file with the changes checked in and compressed it. The link is included in the modified ticket description. Otherwise, great stuff! Positive review.
comment:44 Changed 9 years ago by
 Milestone changed from sage5.4 to sage5.5
 Reviewers set to Dmitrii Pasechnik
comment:45 Changed 9 years ago by
 Dependencies set to #13123
 Status changed from positive_review to needs_work
This needs to be rebased to #13123.
comment:46 followup: ↓ 50 Changed 9 years ago by
I get the following errors on Mac OS X 10.4 PPC. I think they are all about the seed for random tests. Notice that they did pass in the past, and they are neither the old nor the new versions of the expected results from the patch here. Could the little/big-endian difference have any impact on this? I assume not, but otherwise it seems odd that they worked in the past and don't now. Of course, they ARE "random"...
sage -t "devel/sagemain/sage/algebras/group_algebra_new.py"
**********************************************************************
File "/Users/student/Desktop/sage5.4.beta1/devel/sagemain/sage/algebras/group_algebra_new.py", line 592:
    sage: GroupAlgebra(DihedralGroup(6), QQ).random_element()
Expected:
    1/95*(2,6)(3,5)  1/2*(1,3)(4,6)
Got:
    1/95*(1,3)(4,6)  1/2*(1,5,3)(2,6,4)
**********************************************************************
File "/Users/student/Desktop/sage5.4.beta1/devel/sagemain/sage/algebras/group_algebra_new.py", line 594:
    sage: GroupAlgebra(SU(2, 13), QQ).random_element(1)
Expected:
    1/2*[ 1 9*a + 2]
    [9*a + 2 12]
Got:
    1/2*[ 4 9*a + 2]
    [6*a + 10 1]
**********************************************************************
1 items had failures:
   2 of 5 in __main__.example_23
***Test Failed*** 2 failures.
For whitespace errors, see the file /Users/student/.sage//tmp/group_algebra_new_23167.py
 [83.0 s]
sage -t "devel/sagemain/sage/groups/matrix_gps/__init__.py"
 [2.8 s]
sage -t "devel/sagemain/sage/groups/matrix_gps/all.py"
 [0.9 s]
sage -t "devel/sagemain/sage/groups/matrix_gps/general_linear.py"
 [93.5 s]
sage -t "devel/sagemain/sage/groups/matrix_gps/homset.py"
 [30.6 s]
sage -t "devel/sagemain/sage/groups/matrix_gps/linear.py"
 [21.4 s]
sage -t "devel/sagemain/sage/groups/matrix_gps/matrix_group.py"
 [129.7 s]
sage -t "devel/sagemain/sage/groups/matrix_gps/matrix_group_element.py"
 [53.0 s]
sage -t "devel/sagemain/sage/groups/matrix_gps/matrix_group_morphism.py"
 [50.4 s]
sage -t "devel/sagemain/sage/groups/matrix_gps/orthogonal.py"
**********************************************************************
File "/Users/student/Desktop/sage5.4.beta1/devel/sagemain/sage/groups/matrix_gps/orthogonal.py", line 244:
    sage: GO( 3, GF(7), 0).random_element()
Expected:
    [1 0 0]
    [6 1 6]
    [5 0 6]
Got:
    [1 5 3]
    [0 1 0]
    [0 6 6]
**********************************************************************
File "/Users/student/Desktop/sage5.4.beta1/devel/sagemain/sage/groups/matrix_gps/orthogonal.py", line 142:
    sage: G.random_element()
Expected:
    [4 3 5 2]
    [6 6 4 0]
    [0 4 6 0]
    [4 4 5 1]
Got:
    [0 6 4 6]
    [2 5 0 2]
    [5 5 4 0]
    [1 0 3 4]
**********************************************************************
2 items had failures:
   1 of 6 in __main__.example_10
   1 of 6 in __main__.example_4
***Test Failed*** 2 failures.
For whitespace errors, see the file /Users/student/.sage//tmp/orthogonal_23218.py
 [39.1 s]
sage -t "devel/sagemain/sage/groups/matrix_gps/special_linear.py"
 [42.5 s]
sage -t "devel/sagemain/sage/groups/matrix_gps/symplectic.py"
**********************************************************************
File "/Users/student/Desktop/sage5.4.beta1/devel/sagemain/sage/groups/matrix_gps/symplectic.py", line 16:
    sage: G.random_element()
Expected:
    [5 4 6 0]
    [1 1 6 2]
    [5 5 0 6]
    [5 4 5 1]
Got:
    [2 3 2 1]
    [6 4 6 5]
    [1 2 5 2]
    [6 5 1 0]
**********************************************************************
1 items had failures:
   1 of 8 in __main__.example_0
***Test Failed*** 1 failures.
For whitespace errors, see the file /Users/student/.sage//tmp/symplectic_23230.py
 [30.4 s]
sage -t "devel/sagemain/sage/groups/matrix_gps/unitary.py"
**********************************************************************
File "/Users/student/Desktop/sage5.4.beta1/devel/sagemain/sage/groups/matrix_gps/unitary.py", line 26:
    sage: G.random_element()
Expected:
    [4*a + 1 4*a + 4 a + 4]
    [3*a + 3 3 3]
    [ a + 2 4*a + 1 3*a + 3]
Got:
    [2*a + 3 4 3*a + 2]
    [3*a + 4 a 3*a + 1]
    [ 4*a 2*a + 2 a + 2]
**********************************************************************
1 items had failures:
   1 of 10 in __main__.example_0
***Test Failed*** 1 failures.
For whitespace errors, see the file /Users/student/.sage//tmp/unitary_23236.py
 [31.7 s]
sage -t "devel/sagemain/sage/misc/randstate.pyx"
**********************************************************************
File "/Users/student/Desktop/sage5.4.beta1/devel/sagemain/sage/misc/randstate.pyx", line 57:
    sage: rtest()
Expected:
    (303, 0.266166246380421, 1/2*x^2  1/95*x  1/2, (1,3,2), [ 0, 0, 0, 0, 1 ], 963229057, 8045, 0.9661911734708414)
Got:
    (303, 0.266166246380421, 1/2*x^2  1/95*x  1/2, (1,2)(4,5), [ 0, 0, 0, 0, 1 ], 963229057, 8045, 0.9661911734708414)
**********************************************************************
File "/Users/student/Desktop/sage5.4.beta1/devel/sagemain/sage/misc/randstate.pyx", line 61:
    sage: rtest()
Expected:
    (978, 0.0557699430711638, 3*x^2  1/12, (1,3,2), [ 0, 1, 1, 0, 0 ], 1161603091, 60359, 0.8335077654199736)
Got:
    (978, 0.0557699430711638, 3*x^2  1/12, (1,2,3), [ 0, 1, 1, 0, 0 ], 1161603091, 60359, 0.8335077654199736)
**********************************************************************
File "/Users/student/Desktop/sage5.4.beta1/devel/sagemain/sage/misc/randstate.pyx", line 65:
    sage: rtest()
Expected:
    (207, 0.0141049486533456, 4*x^2 + 1/2, (1,3,2), [ 0, 0, 1, 0, 1 ], 637693405, 27695, 0.19982565117278328)
Got:
    (207, 0.0141049486533456, 4*x^2 + 1/2, (2,3), [ 0, 0, 1, 0, 1 ], 637693405, 27695, 0.19982565117278328)
**********************************************************************
File "/Users/student/Desktop/sage5.4.beta1/devel/sagemain/sage/misc/randstate.pyx", line 69:
    sage: rtest()
Expected:
    (303, 0.266166246380421, 1/2*x^2  1/95*x  1/2, (1,3,2), [ 0, 0, 0, 0, 1 ], 963229057, 8045, 0.9661911734708414)
Got:
    (303, 0.266166246380421, 1/2*x^2  1/95*x  1/2, (1,2)(4,5), [ 0, 0, 0, 0, 1 ], 963229057, 8045, 0.9661911734708414)
**********************************************************************
File "/Users/student/Desktop/sage5.4.beta1/devel/sagemain/sage/misc/randstate.pyx", line 73:
    sage: rtest()
Expected:
    (978, 0.0557699430711638, 3*x^2  1/12, (1,3,2), [ 0, 1, 1, 0, 0 ], 1161603091, 60359, 0.8335077654199736)
Got:
    (978, 0.0557699430711638, 3*x^2  1/12, (1,2,3), [ 0, 1, 1, 0, 0 ], 1161603091, 60359, 0.8335077654199736)
**********************************************************************
File "/Users/student/Desktop/sage5.4.beta1/devel/sagemain/sage/misc/randstate.pyx", line 77:
    sage: rtest()
Expected:
    (207, 0.0141049486533456, 4*x^2 + 1/2, (1,3,2), [ 0, 0, 1, 0, 1 ], 637693405, 27695, 0.19982565117278328)
Got:
    (207, 0.0141049486533456, 4*x^2 + 1/2, (2,3), [ 0, 0, 1, 0, 1 ], 637693405, 27695, 0.19982565117278328)
**********************************************************************
File "/Users/student/Desktop/sage5.4.beta1/devel/sagemain/sage/misc/randstate.pyx", line 88:
    sage: rtest()
Expected:
    (720, 0.612180244315804, x^2  x, (2,3), [ 1, 0, 0, 0, 0 ], 912534076, 14005, 0.9205331599518184)
Got:
    (720, 0.612180244315804, x^2  x, (1,3), [ 1, 0, 0, 0, 0 ], 912534076, 14005, 0.9205331599518184)
**********************************************************************
File "/Users/student/Desktop/sage5.4.beta1/devel/sagemain/sage/misc/randstate.pyx", line 224:
    sage: r1 = rtest(); r1
Expected:
    (303, 0.266166246380421, 1/2*x^2  1/95*x  1/2, (1,3,2), [ 0, 0, 0, 0, 1 ], 963229057, 8045, 0.9661911734708414)
Got:
    (303, 0.266166246380421, 1/2*x^2  1/95*x  1/2, (1,2)(4,5), [ 0, 0, 0, 0, 1 ], 963229057, 8045, 0.9661911734708414)
**********************************************************************
File "/Users/student/Desktop/sage5.4.beta1/devel/sagemain/sage/misc/randstate.pyx", line 227:
    sage: r2 = rtest(); r2
Expected:
    (105, 0.642309615982449, x^2  x  6, (1,2,3), [ 1, 0, 0, 1, 1 ], 14082860, 1271, 0.001767155077382232)
Got:
    (105, 0.642309615982449, x^2  x  6, (4,5), [ 1, 0, 0, 1, 1 ], 14082860, 1271, 0.001767155077382232)
**********************************************************************
File "/Users/student/Desktop/sage5.4.beta1/devel/sagemain/sage/misc/randstate.pyx", line 236:
    sage: with seed(1): rtest()
Expected:
    (978, 0.0557699430711638, 3*x^2  1/12, (1,3,2), [ 0, 1, 1, 0, 0 ], 1161603091, 60359, 0.8335077654199736)
Got:
    (978, 0.0557699430711638, 3*x^2  1/12, (1,2,3), [ 0, 1, 1, 0, 0 ], 1161603091, 60359, 0.8335077654199736)
**********************************************************************
File "/Users/student/Desktop/sage5.4.beta1/devel/sagemain/sage/misc/randstate.pyx", line 239:
    sage: r2m = rtest(); r2m
Expected:
    (105, 0.642309615982449, x^2  x  6, (1,2,3), [ 1, 0, 0, 1, 1 ], 14082860, 19769, 0.001767155077382232)
Got:
    (105, 0.642309615982449, x^2  x  6, (4,5), [ 1, 0, 0, 1, 1 ], 14082860, 19769, 0.001767155077382232)
**********************************************************************
File "/Users/student/Desktop/sage5.4.beta1/devel/sagemain/sage/misc/randstate.pyx", line 255:
    sage: with seed(1): rtest(); rtest();
Expected:
    (978, 0.0557699430711638, 3*x^2  1/12, (1,3,2), [ 0, 1, 1, 0, 0 ], 1161603091, 60359, 0.8335077654199736)
    (138, 0.0404945051288503, 2*x  24, (2,3), [ 1, 1, 1, 0, 1 ], 1966097838, 10234, 0.0033332230808060803)
Got:
    (978, 0.0557699430711638, 3*x^2  1/12, (1,2,3), [ 0, 1, 1, 0, 0 ], 1161603091, 60359, 0.8335077654199736)
    (138, 0.0404945051288503, 2*x  24, (2,3), [ 1, 1, 1, 0, 1 ], 1966097838, 10234, 0.0033332230808060803)
**********************************************************************
File "/Users/student/Desktop/sage5.4.beta1/devel/sagemain/sage/misc/randstate.pyx", line 274:
    sage: try:
              ctx.__enter__()
              rtest()
          finally:
              ctx.__exit__(None, None, None)
Expected:
    <sage.misc.randstate.randstate object at 0x...>
    (978, 0.0557699430711638, 3*x^2  1/12, (1,3,2), [ 0, 1, 1, 0, 0 ], 1161603091, 60359, 0.8335077654199736)
    False
Got:
    <sage.misc.randstate.randstate object at 0x155950c0>
    (978, 0.0557699430711638, 3*x^2  1/12, (1,2,3), [ 0, 1, 1, 0, 0 ], 1161603091, 60359, 0.8335077654199736)
    False
**********************************************************************
File "/Users/student/Desktop/sage5.4.beta1/devel/sagemain/sage/misc/randstate.pyx", line 703:
    sage: gap.Random(1, 10^50)
Expected:
    1496738263332555434474532297768680634540939580077
Got:
    97144566318213989637952954803537490912828430192472
**********************************************************************
comment:47 followup: ↓ 48 Changed 9 years ago by
In sage.misc.randstate.pyx there is a method set_seed_gap() that checks for big endianness and flips bytes around in that case. It could be that GAP's new random number code is actually big-endian clean. Can you modify set_seed_gap() and see if that fixes things?
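Why a byte flip could ever matter can be illustrated with the sketch below. It only models the general effect of reading a byte string as 32-bit integers on machines of different endianness; the helper names and the sample seed are hypothetical, and nothing here is GAP's or Sage's actual code.

```python
import struct


def words_as_seen(seed_bytes, byteorder):
    """Read the bytes as unsigned 32-bit ints, the way a raw pointer
    cast would on a machine with the given byte order."""
    n = len(seed_bytes) // 4
    fmt = ("<" if byteorder == "little" else ">") + "%dI" % n
    return struct.unpack(fmt, seed_bytes[:4 * n])


def swap_each_word(seed_bytes):
    """Reverse every complete 4-byte chunk (trailing bytes are dropped)."""
    n = len(seed_bytes) // 4
    return b"".join(seed_bytes[4 * i:4 * i + 4][::-1] for i in range(n))


seed = b"12345678"  # sample seed string, chosen only for the example

little = words_as_seen(seed, "little")
big = words_as_seen(seed, "big")
big_swapped = words_as_seen(swap_each_word(seed), "big")

print(little)
print(big)
print(big_swapped)
```

If the consumer already reads the seed in an endian-independent way, applying such a swap on big-endian hosts introduces a difference instead of removing one, which would be consistent with the PPC results reported in this thread.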
comment:48 in reply to: ↑ 47 Changed 9 years ago by
In sage.misc.randstate.pyx there is a method set_seed_gap() that checks for big endianness and flips bytes around in that case. It could be that GAP's new random number code is actually big-endian clean. Can you modify set_seed_gap() and see if that fixes things?
Well, what do you know! I didn't think Sage had any custom code for big endian... I'll try this now. It certainly sounds likely.
It does fix the problem in randstate.pyx, so I think you're right about the others. Testing now.
comment:49 Changed 9 years ago by
sage -t "devel/sage-main/sage/algebras/group_algebra_new.py" [81.4 s]
sage -t "devel/sage-main/sage/groups/matrix_gps/orthogonal.py" [31.4 s]
sage -t "devel/sage-main/sage/groups/matrix_gps/symplectic.py" [30.7 s]
sage -t "devel/sage-main/sage/groups/matrix_gps/unitary.py" [31.2 s]
sage -t "devel/sage-main/sage/misc/randstate.pyx" [109.4 s]
I just gutted that part of the code.

sage/misc/randstate.pyx
# HG changeset patch
# User Karl-Dieter Crisman <kcrisman@gmail.com>
# Date 1348512346 14400
# Node ID aedbdc34982e44f81636a8006f7fc44086604f43
# Parent e1f48782037500988a1ea3f3b0764ead26b6232f
Remove big-endian workaround

diff --git a/sage/misc/randstate.pyx b/sage/misc/randstate.pyx
--- a/sage/misc/randstate.pyx
+++ b/sage/misc/randstate.pyx
@@ -718,23 +718,6 @@
                 seed = ZZ.random_element(long(1)<<128)
                 classic_seed = seed
 
-                if sys.byteorder == 'big':
-                    # GAP's random number generator initialization
-                    # (in integer.c, in FuncInitRandomMT) takes its
-                    # seed as a string, then converts this string into
-                    # an array of 32-bit integers just by casting the
-                    # pointer. Thus, the result depends on the
-                    # endianness of the machine. As a workaround, we
-                    # swap the bytes in the string ourselves, so that
-                    # GAP always gets the same array of integers.
-
-                    seed = str(seed)
-                    new_seed = ''
-                    while len(seed) >= 4:
-                        new_seed += seed[3::-1]
-                        seed = seed[4:]
-                    seed = '"' + new_seed + '"'
-
                 mersenne_seed = seed
 
             prev_mersenne_seed = gap.Reset(gap.GlobalMersenneTwister, mersenne_seed)
I don't know why these never format quite right...
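The workaround removed in this patch existed because reinterpreting a byte string as an array of 32-bit integers gives different values on little-endian and big-endian machines. A minimal sketch of that effect, in plain Python with the struct module; this is an illustration only, not the GAP or Sage code, and the function names are hypothetical:

```python
import struct

def words_from_bytes(data, byteorder):
    """Interpret data as unsigned 32-bit integers.

    byteorder is '<' (little-endian) or '>' (big-endian); trailing bytes
    that do not fill a whole 4-byte word are ignored.
    """
    count = len(data) // 4
    return struct.unpack(byteorder + str(count) + "I", data[:4 * count])

def swap_each_word(data):
    """Reverse the bytes inside every complete 4-byte group."""
    count = len(data) // 4
    swapped = b"".join(data[4 * i:4 * i + 4][::-1] for i in range(count))
    return swapped + data[4 * count:]

seed = b"12345678"
little = words_from_bytes(seed, "<")                 # what a little-endian cast sees
big = words_from_bytes(seed, ">")                    # what a big-endian cast sees
big_after_swap = words_from_bytes(swap_each_word(seed), ">")
```

Swapping the bytes of each word before a big-endian read reproduces the little-endian integers, which is the idea behind the old workaround. If the new GAP code no longer casts the string this way, applying the swap would change the seed instead of normalizing it.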
comment:50 in reply to: ↑ 46 Changed 9 years ago by
Replying to kcrisman:
I get the following errors on Mac OS X 10.4 PPC. I think they are all about the seed for random tests. Notice that they did pass in the past, and they are neither the old nor the new versions of the expected results from the patch here. Could the little/big-endian have any impact on this?
Oh dear, now I see why it felt like good déjà vu: see #9867. Nice moldy bit-rotten ticket. It was never merged when the current ticket was prepared.
I've put #9867 up for review. Please close it and merge it here somehow...
Changed 9 years ago by
comment:51 Changed 9 years ago by
 Description modified (diff)
I have added the patch from #9867 here.
comment:52 Changed 9 years ago by
 Description modified (diff)
comment:53 Changed 9 years ago by
 Dependencies changed from #13123 to #13123, #13123
 Status changed from needs_work to positive_review
I've rebased it on #13123 (that is, on the as-of-yet unreleased sage-5.4.beta2) and folded in the endianness patch.
comment:54 Changed 9 years ago by
 Dependencies changed from #13123, #13123 to #13123
I don't think it depends on #13123 twice.
comment:55 Changed 9 years ago by
 Description modified (diff)
It seems the endianness patch is included in the other patch...
comment:56 Changed 9 years ago by
When testing this in a from-scratch built Sage, I got
sage -t -force_lib devel/sage/sage/tests/cmdline.py
**********************************************************************
File "/release/merger/sage-5.5.beta0/devel/sage-main/sage/tests/cmdline.py", line 408:
    sage: err
Expected:
    ''
Got:
    'gap: halving pool size.\n'
**********************************************************************
On subsequent tests, I did not get this message.
comment:57 Changed 9 years ago by
Looking at the GAP sources, this seems to be caused by a lack of memory...
comment:58 Changed 9 years ago by
Also, I somehow ended up with orphan GAP processes running:
jdemeyer@sage:/release/merger/sage-5.5.beta0$ ps -ef | grep gap
jdemeyer 10863 1 99 14:26 ? 01:21:46 /release/merger/sage-5.5.beta0/local/gap/latest/bin/x86_64-unknown-linux-gnu-gcc-default64/gap -m 24m -l /release/merger/sage-5.5.beta0/local/gap/latest -r -b -p -T -o 9999G /release/merger/sage-5.5.beta0/local/share/sage/ext/gap/sage.g
jdemeyer 17015 10589 0 15:49 pts/124 00:00:00 grep gap
jdemeyer 22854 1 99 14:52 ? 00:56:56 /release/merger/sage-5.5.beta0/local/gap/latest/bin/x86_64-unknown-linux-gnu-gcc-default64/gap -m 24m -l /release/merger/sage-5.5.beta0/local/gap/latest -r -b -p -T -o 9999G /release/merger/sage-5.5.beta0/local/share/sage/ext/gap/sage.g
comment:59 followup: ↓ 60 Changed 9 years ago by
It's not a lack of actual memory; it is a lack of swap space. GAP reserves address space and the corresponding swap space so that the address space could actually be used (potentially by swapping) if GAP were to use that much memory. If the initial workspace is too large, it retries with half the size, repeatedly, until it succeeds. The "halving pool size" message is harmless but cannot be disabled.
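The retry strategy described here can be sketched as follows. This is a simplified model in Python, not GAP's C code; try_reserve is a hypothetical callable that stands in for the actual reservation attempt and returns True on success:

```python
def reserve_with_halving(requested, try_reserve, minimum=1):
    """Return the first size that try_reserve accepts, halving after each failure.

    Returns None if even the smallest allowed size is refused.
    """
    minimum = max(1, minimum)  # guard against looping forever at size 0
    size = requested
    while size >= minimum:
        if try_reserve(size):
            return size
        size //= 2  # the step that GAP reports as "halving pool size"
    return None

# Illustration: ask for 9999 GiB on a machine that can back at most 3 GiB.
GIB = 2 ** 30
granted = reserve_with_halving(9999 * GIB, lambda size: size <= 3 * GIB)
refused = reserve_with_halving(8, lambda size: False)
```

In this model a very large request costs only a few extra attempts, since each failure halves the size; the price is one message per halving step.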
comment:60 in reply to: ↑ 59 Changed 9 years ago by
Replying to vbraun:
The "halving pool size" is harmless but cannot be disabled.
The printf() could be patched out... if it's harmless.
comment:61 Changed 9 years ago by
I've written upstream about this issue; hopefully it'll be fixed in a future release.
I'm not entirely happy with our allocation strategy: if you are running multiple Sage processes (e.g. on a server), then you quickly reserve most of the swap. I'll add a patch to tweak the pool size.
comment:62 Changed 9 years ago by
In any case, something needs to be done about the cmdline.py doctest failure due to that "error" message which isn't even an error.
Also: after running all doctests again, I ended up again with an orphaned gap process.
comment:63 Changed 9 years ago by
 Status changed from positive_review to needs_work
comment:64 followup: ↓ 65 Changed 9 years ago by
 Dependencies changed from #13123 to #13123, #13579
 Description modified (diff)
 Status changed from needs_work to needs_review
I've fixed the "halving pool size" doctest error and changed the workspace allocation to default to 1/10*(available swap). This should give you enough workspace without reserving all swap when you run a server (or parallel doctests).
I can't reproduce any gap orphans. I verified that the gap pid is correctly written to spawned_processes, so even if Sage gets kill -9'ed the sage-cleaner will get rid of orphans. Maybe sage-cleaner has some issue on your setup, Jeroen?
The additional trac_13211_pool_size.patch needs review... Dima? ;)
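As a rough sketch of the "one tenth of swap" default on Linux: the code below parses /proc/meminfo-style text and divides the swap size by ten. It is not the code from trac_13211_pool_size.patch; the function names are hypothetical, and using the SwapTotal field (rather than, say, SwapFree) is an assumption.

```python
import re

def swap_total_bytes(meminfo_text):
    """Extract SwapTotal (reported in kB) from /proc/meminfo-style text."""
    match = re.search(r"^SwapTotal:\s+(\d+)\s+kB", meminfo_text, re.MULTILINE)
    if match is None:
        return None  # field missing, e.g. not a Linux system
    return int(match.group(1)) * 1024

def default_pool_size(meminfo_text, divisor=10):
    """Return swap/divisor in bytes, or None if swap size is unknown."""
    swap = swap_total_bytes(meminfo_text)
    if swap is None:
        return None
    return swap // divisor

# Made-up sample input; on a real system one would read /proc/meminfo.
sample = "MemTotal:       16000000 kB\nSwapTotal:       8000000 kB\nSwapFree:        7000000 kB\n"
pool = default_pool_size(sample)
```

Parsing a string rather than opening the file directly keeps the function usable on systems without /proc, where the caller has to supply a fallback.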
comment:65 in reply to: ↑ 64 Changed 9 years ago by
Replying to vbraun:
I can't reproduce any gap orphans. I verified that the gap pid is correctly written to spawned_processes
I haven't looked at the code, but are you sure gap is started only from one place in Sage? Maybe the command line helps:
/release/merger/sage-5.5.beta0/local/gap/latest/bin/x86_64-unknown-linux-gnu-gcc-default64/gap -m 24m -l /release/merger/sage-5.5.beta0/local/gap/latest -r -b -p -T -o 9999G /release/merger/sage-5.5.beta0/local/share/sage/ext/gap/sage.g
comment:66 Changed 9 years ago by
I spent quite some time today trying to find any change that might affect where or how gap is started, but I didn't find any. I ran various doctests and tried to kill -9 the sage process, but never managed to produce any orphans that weren't cleaned up.
comment:67 Changed 9 years ago by
On MacOSX 10.6.8 I get
sage -t --long -force_lib "devel/sage/sage/misc/memory_info.py"
**********************************************************************
File "/usr/local/src/sage/sage-5.4.rc0/devel/sage/sage/misc/memory_info.py", line 99:
    sage: print "ignore this"; mem._parse_proc_meminfo() # random output
Exception raised:
    Traceback (most recent call last):
      File "/usr/local/src/sage/sage-5.4.rc0/local/bin/ncadoctest.py", line 1231, in run_one_test
        self.run_one_example(test, example, filename, compileflags)
      File "/usr/local/src/sage/sage-5.4.rc0/local/bin/sagedoctest.py", line 38, in run_one_example
        OrigDocTestRunner.run_one_example(self, test, example, filename, compileflags)
      File "/usr/local/src/sage/sage-5.4.rc0/local/bin/ncadoctest.py", line 1172, in run_one_example
        compileflags, 1) in test.globs
      File "<doctest __main__.example_3[4]>", line 1, in <module>
        print "ignore this"; mem._parse_proc_meminfo() # random output###line 99:
    sage: print "ignore this"; mem._parse_proc_meminfo() # random output
    AttributeError: 'MemoryInfo_guess' object has no attribute '_parse_proc_meminfo'
**********************************************************************
1 items had failures:
   1 of   6 in __main__.example_3
***Test Failed*** 1 failures.
For whitespace errors, see the file /Users/dima/.sage//tmp/memory_info_41483.py
         [2.6 s]
comment:68 followup: ↓ 69 Changed 9 years ago by
I see, raising an exception still makes # random doctests fail. Updated patch fixes this and adds special handling for OSX to get the ram size.
comment:69 in reply to: ↑ 68 ; followup: ↓ 70 Changed 9 years ago by
Replying to vbraun:
I see, raising an exception still makes # random doctests fail. Updated patch fixes this and adds special handling for OSX to get the ram size.
OK, this works on OSX now, good. Let me check on Linux...
comment:70 in reply to: ↑ 69 Changed 9 years ago by
Replying to dimpase:
Replying to vbraun:
I see, raising an exception still makes # random doctests fail. Updated patch fixes this and adds special handling for OSX to get the ram size.
OK, this works on OSX now, good. Let me check on Linux...
With the #13579 and #13211 patches applied, I have the following weirdness on Debian:
$ ../../sage
----------------------------------------------------------------------
| Sage Version 5.4.rc1, Release Date: 2012-10-05                     |
| Type "notebook()" for the browser-based notebook interface.        |
| Type "help()" for help.                                            |
----------------------------------------------------------------------
**********************************************************************
*                                                                    *
* Warning: this is a prerelease version, and it may be unstable.     *
*                                                                    *
**********************************************************************
top: unknown argument 'l'
usage: top -hv | -bcisSH -d delay -n iterations [-u user | -U user] -p pid [,pid ...]
sage:
here is my patch queue:
/usr/local/src/sage/sage-5.4.rc1/devel/sage$ hg qapplied
13579_secure_tmp.patch
trac_13579_fix_test_executable.patch
trac_13211_fix_gap_doctests.patch
trac_13211_pool_size.patch
PS. This seems to be due to failure to tell OSX from Linux!
comment:71 Changed 9 years ago by
Well, why don't you actually check for Darwin/OSX? E.g.
import platform
if platform.system() == 'Darwin':
    pass  # do OSX-thing...
else:
    pass  # the rest...
comment:72 Changed 9 years ago by
I can't apply the patches.
sage: hg_sage.apply('http://trac.sagemath.org/sage_trac/raw-attachment/ticket/13211/trac_13211_fix_gap_doctests.patch')
Attempting to load remote file: http://trac.sagemath.org/sage_trac/raw-attachment/ticket/13211/trac_13211_fix_gap_doctests.patch
Loading: [.......]
cd "/home/mmarco/sage-5.3/devel/sage" && sage -hg import "/home/mmarco/.sage/temp/neumann/583/tmp_0.patch"
applying /home/mmarco/.sage/temp/neumann/583/tmp_0.patch
patching file sage/interfaces/gap.py
Hunk #20 FAILED at 1663
1 out of 21 hunks FAILED -- saving rejects to file sage/interfaces/gap.py.rej
patching file sage/tests/cmdline.py
Hunk #1 FAILED at 516
1 out of 1 hunks FAILED -- saving rejects to file sage/tests/cmdline.py.rej
abort: patch failed to apply
comment:73 Changed 9 years ago by
You need at least sage5.4.beta2 to apply the patches, I think.
comment:74 followup: ↓ 75 Changed 9 years ago by
Ok should work now!
comment:75 in reply to: ↑ 74 Changed 9 years ago by
Replying to vbraun:
Ok should work now!
looks good. Another round of tests, and hopefully it's done.
comment:76 Changed 9 years ago by
There is an apparent incompatibility with the new GUAVA GAP package:
sage -t --optional -force_lib devel/sage/sage/coding/linear_code.py
**********************************************************************
File "/usr/local/src/sage/sage-5.4.rc1/devel/sage-main/sage/coding/linear_code.py", line 1239:
    sage: C.covering_radius() # requires optional GAP package Guava
Exception raised:
    Traceback (most recent call last):
      File "/usr/local/src/sage/sage-5.4.rc1/local/bin/ncadoctest.py", line 1231, in run_one_test
        self.run_one_example(test, example, filename, compileflags)
      File "/usr/local/src/sage/sage-5.4.rc1/local/bin/sagedoctest.py", line 38, in run_one_example
        OrigDocTestRunner.run_one_example(self, test, example, filename, compileflags)
      File "/usr/local/src/sage/sage-5.4.rc1/local/bin/ncadoctest.py", line 1172, in run_one_example
        compileflags, 1) in test.globs
      File "<doctest __main__.example_24[3]>", line 1, in <module>
        C.covering_radius() # requires optional GAP package Guava###line 1239:
    sage: C.covering_radius() # requires optional GAP package Guava
      File "/usr/local/src/sage/sage-5.4.rc1/local/lib/python/site-packages/sage/coding/linear_code.py", line 1245, in covering_radius
        C = gapG.GeneratorMatCode(gap(F))
      File "/usr/local/src/sage/sage-5.4.rc1/local/lib/python/site-packages/sage/interfaces/interface.py", line 584, in __call__
        return self._obj.parent().function_call(self._name, [self._obj] + list(args), kwds)
      File "/usr/local/src/sage/sage-5.4.rc1/local/lib/python/site-packages/sage/interfaces/gap.py", line 874, in function_call
        ['%s=%s'%(key,value.name()) for key, value in kwds.items()])))
      File "/usr/local/src/sage/sage-5.4.rc1/local/lib/python/site-packages/sage/interfaces/gap.py", line 546, in eval
        result = Expect.eval(self, input_line, **kwds)
      File "/usr/local/src/sage/sage-5.4.rc1/local/lib/python/site-packages/sage/interfaces/expect.py", line 1236, in eval
        for L in code.split('\n') if L != ''])
      File "/usr/local/src/sage/sage-5.4.rc1/local/lib/python/site-packages/sage/interfaces/gap.py", line 747, in _eval_line
        raise RuntimeError, message
    RuntimeError: Gap produced error output
    Error, Variable: 'GeneratorMatCode' must have a value
    executing GeneratorMatCode($sage8,$sage1);
**********************************************************************
.... etc
comment:77 Changed 9 years ago by
 Cc ppurka added
comment:78 Changed 9 years ago by
 Status changed from needs_review to positive_review
Positive review. The issues with optional packages, such as the one above and the one reported on sage-devel, should be tackled elsewhere.
comment:79 Changed 9 years ago by
 Milestone changed from sage5.5 to sagepending
comment:80 Changed 9 years ago by
rebased for sage5.4.rc2
comment:81 followup: ↓ 82 Changed 9 years ago by
While we are at it, move the gap install to $SAGE_LOCAL/gap/gap.x.y.z.
Would it make sense to use $SAGE_LOCAL/share/gap/gap.x.y.z instead?
comment:82 in reply to: ↑ 81 Changed 9 years ago by
Replying to jhpalmieri:
Would it make sense to use $SAGE_LOCAL/share/gap/gap.x.y.z instead?
No, /share/ isn't for binaries.
comment:83 Changed 9 years ago by
 Milestone changed from sagepending to sage5.5
comment:84 followup: ↓ 85 Changed 9 years ago by
 Status changed from positive_review to needs_work
This fails on Skynet eno:
Host system: Linux eno 3.3.71.fc16.x86_64 #1 SMP Tue May 22 13:59:39 UTC 2012 x86_64 x86_64 x86_64 GNU/Linux **************************************************** C compiler: gcc C compiler version: Using builtin specs. COLLECT_GCC=gcc COLLECT_LTO_WRAPPER=/usr/local/gcc4.7.0/x86_64Linuxcore2fc/libexec/gcc/x86_64unknownlinuxgnu/4.7.0/ltowrapper Target: x86_64unknownlinuxgnu Configured with: /usr/local/gcc4.7.0/src/gcc4.7.0/configure enablelanguages=c,c++,fortran withgnuas withas=/usr/local/binutils2.22/x86_64Linuxcore2fcgcc4.6.2rh/bin/as withgnuld withld=/usr/local/binutils2.22/x86_64Linuxcore2fcgcc4.6.2rh/bin/ld withgmp=/usr/local/mpir2.5.1/x86_64Linuxcore2fcgcc4.6.3rh withmpfr=/usr/local/mpfr3.1.0/x86_64Linuxcore2fcmpir2.5.1gcc4.6.3rh withmpc=/usr/local/mpc0.9/x86_64Linuxcore2fcmpir2.5.1mpfr3.1.0gcc4.6.3rh prefix=/usr/local/gcc4.7.0/x86_64Linuxcore2fc Thread model: posix gcc version 4.7.0 (GCC) **************************************************** spkginstall is using VERSION = 4.5.6 GAP_DIR = gap4.5.6 INSTALL_DIR = /home/buildbot/build/sage/eno1/eno_full/build/sage5.5.beta1/local/gap/gap4.5.6 Applying patches... patching file gap.shi patching file tst/testinstall.g Configuring GAP... checking for gcc... gcc checking whether the C compiler works... yes checking for C compiler default output file name... a.out checking for suffix of executables... checking whether we are cross compiling... no checking for suffix of object files... o checking whether we are using the GNU C compiler... yes checking whether gcc accepts g... yes checking for gcc option to accept ISO C89... none needed checking how to run the C preprocessor... gcc E checking for grep that handles long lines and e... /bin/grep checking for egrep... /bin/grep E checking for ANSI C header files... yes checking for sys/types.h... yes checking for sys/stat.h... yes checking for stdlib.h... yes checking for string.h... yes checking for memory.h... yes checking for strings.h... yes checking for inttypes.h... 
yes checking for stdint.h... yes checking for unistd.h... yes checking size of void *... 8 checking ABI bit size... 64 checking build system type... x86_64unknownlinuxgnu checking host system type... x86_64unknownlinuxgnu checking target system type... x86_64unknownlinuxgnu checking for gcc... (cached) gcc checking whether we are using the GNU C compiler... (cached) yes checking whether gcc accepts g... (cached) yes checking for gcc option to accept ISO C89... (cached) none needed checking whether make sets $(MAKE)... yes checking GAP config name... default64 configure: error: Could not locate GMP in the specified location Error configuring GAP.
comment:85 in reply to: ↑ 84 ; followup: ↓ 86 Changed 9 years ago by
Replying to jdemeyer:
This fails on Skynet
eno
:Host system: Linux eno 3.3.71.fc16.x86_64 #1 SMP Tue May 22 13:59:39 UTC 2012 x86_64 x86_64 x86_64 GNU/Linux **************************************************** C compiler: gcc C compiler version: Using builtin specs. COLLECT_GCC=gcc COLLECT_LTO_WRAPPER=/usr/local/gcc4.7.0/x86_64Linuxcore2fc/libexec/gcc/x86_64unknownlinuxgnu/4.7.0/ltowrapper Target: x86_64unknownlinuxgnu Configured with: /usr/local/gcc4.7.0/src/gcc4.7.0/configure enablelanguages=c,c++,fortran withgnuas withas=/usr/local/binutils2.22/x86_64Linuxcore2fcgcc4.6.2rh/bin/as withgnuld withld=/usr/local/binutils2.22/x86_64Linuxcore2fcgcc4.6.2rh/bin/ld withgmp=/usr/local/mpir2.5.1/x86_64Linuxcore2fcgcc4.6.3rh withmpfr=/usr/local/mpfr3.1.0/x86_64Linuxcore2fcmpir2.5.1gcc4.6.3rh withmpc=/usr/local/mpc0.9/x86_64Linuxcore2fcmpir2.5.1mpfr3.1.0gcc4.6.3rh prefix=/usr/local/gcc4.7.0/x86_64Linuxcore2fc Thread model: posix gcc version 4.7.0 (GCC)
must everything be able to run with gcc 4.7.0, too?
comment:86 in reply to: ↑ 85 Changed 9 years ago by
Replying to dimpase:
must everything be able to run with gcc 4.7.0, too?
At least some gcc4.7.x version should work.
comment:87 Changed 9 years ago by
The orphaned processes are still a problem for me. After playing around with a Sage version including this patch, I am seeing 9 gap processes running at 100% CPU.
comment:88 Changed 9 years ago by
 Keywords orphaned processes build on eno added
comment:89 Changed 9 years ago by
On Itanium (Skynet iras):
gap(15123): unaligned access to 0x607ffffffecfa0ef, ip=0x400000000019cbd0 gap(15123): unaligned access to 0x607ffffffecfa0e7, ip=0x400000000019cbd0 gap(15123): unaligned access to 0x607ffffffecfa0df, ip=0x400000000019cbd0 gap(15123): unaligned access to 0x607ffffffecfa0d7, ip=0x400000000019cbd0 gap(15123): unaligned access to 0x607ffffffecfa0cf, ip=0x400000000019cbd0 sage t long force_lib devel/sage/sage/interfaces/gap.py ********************************************************************** File "/home/buildbot/build/sage/iras1/iras_full/build/sage5.5.beta1/devel/sagemain/sage/interfaces/gap.py", line 886: sage: c = gap.trait_names() Exception raised: Traceback (most recent call last): File "/home/buildbot/build/sage/iras1/iras_full/build/sage5.5.beta1/local/bin/ncadoctest.py", line 1231, in run_one_test self.run_one_example(test, example, filename, compileflags) File "/home/buildbot/build/sage/iras1/iras_full/build/sage5.5.beta1/local/bin/sagedoctest.py", line 38, in run_one_example OrigDocTestRunner.run_one_example(self, test, example, filename, compileflags) File "/home/buildbot/build/sage/iras1/iras_full/build/sage5.5.beta1/local/bin/ncadoctest.py", line 1172, in run_one_example compileflags, 1) in test.globs File "<doctest __main__.example_21[2]>", line 1, in <module> c = gap.trait_names()###line 886: sage: c = gap.trait_names() File "/home/buildbot/build/sage/iras1/iras_full/build/sage5.5.beta1/local/lib/python/sitepackages/sage/interfaces/gap.py", line 1379, in trait_names self.__trait_names = eval(self.eval('NamesSystemGVars()')) + \ File "<string>", line 2644 gap(15082): unaligned access to 0x607ffffffe79200f, ip=0x400000000019cbd0 ^ SyntaxError: invalid syntax **********************************************************************
comment:90 followup: ↓ 93 Changed 9 years ago by
On OS X 10.6 (William's bsd machine), I get the same build error as eno:
Host system: Darwin bsd.math.washington.edu 10.8.0 Darwin Kernel Version 10.8.0: Tue Jun 7 16:32:41 PDT 2011; root:xnu1504.15.3~1/RELEASE_X86_64 x86_64 **************************************************** C compiler: gcc C compiler version: Using builtin specs. COLLECT_GCC=gcc COLLECT_LTO_WRAPPER=/Users/buildbot/build/sage/bsd1/bsd_full/build/sage5.5.beta1/local/libexec/gcc/x86_64appledarwin10.8.0/4.6.3/ltowrapper Target: x86_64appledarwin10.8.0 Configured with: ../src/configure prefix=/Users/buildbot/build/sage/bsd1/bsd_full/build/sage5.5.beta1/local withlocalprefix=/Users/buildbot/build/sage/bsd1/bsd_full/build/sage5.5.beta1/local withgmp=/Users/buildbot/build/sage/bsd1/bsd_full/build/sage5.5.beta1/local withmpfr=/Users/buildbot/build/sage/bsd1/bsd_full/build/sage5.5.beta1/local withmpc=/Users/buildbot/build/sage/bsd1/bsd_full/build/sage5.5.beta1/local withsystemzlib disablemultilib Thread model: posix gcc version 4.6.3 (GCC) **************************************************** spkginstall is using VERSION = 4.5.6 GAP_DIR = gap4.5.6 INSTALL_DIR = /Users/buildbot/build/sage/bsd1/bsd_full/build/sage5.5.beta1/local/gap/gap4.5.6 Applying patches... patching file gap.shi patching file tst/testinstall.g Configuring GAP... checking for gcc... gcc checking whether the C compiler works... yes checking for C compiler default output file name... a.out checking for suffix of executables... checking whether we are cross compiling... no checking for suffix of object files... o checking whether we are using the GNU C compiler... yes checking whether gcc accepts g... yes checking for gcc option to accept ISO C89... none needed checking how to run the C preprocessor... cpp checking for grep that handles long lines and e... /usr/bin/grep checking for egrep... /usr/bin/grep E checking for ANSI C header files... yes checking for sys/types.h... yes checking for sys/stat.h... yes checking for stdlib.h... yes checking for string.h... yes checking for memory.h... 
yes checking for strings.h... yes checking for inttypes.h... yes checking for stdint.h... yes checking for unistd.h... yes checking size of void *... 8 checking ABI bit size... 64 checking build system type... x86_64appledarwin10.8.0 checking host system type... x86_64appledarwin10.8.0 checking target system type... x86_64appledarwin10.8.0 checking for gcc... (cached) gcc checking whether we are using the GNU C compiler... (cached) yes checking whether gcc accepts g... (cached) yes checking for gcc option to accept ISO C89... (cached) none needed checking whether make sets $(MAKE)... yes checking GAP config name... default64 configure: error: Could not locate GMP in the specified location Error configuring GAP.
comment:91 followup: ↓ 92 Changed 9 years ago by
Concerning the orphaned process: could it be that they appear during the build as opposed to when running Sage? Typical command line:
/release/merger/sage-5.5.beta2/local/gap/latest/bin/x86_64-unknown-linux-gnu-gcc-default64/gap -m 24m -l /release/merger/sage-5.5.beta2/local/gap/latest -r -b -p -T -o 6271720243 /release/merger/sage-5.5.beta2/local/share/sage/ext/gap/sage.g
comment:92 in reply to: ↑ 91 Changed 9 years ago by
Replying to jdemeyer:
Concerning the orphaned process: could it be that they appear during the build as opposed to when running Sage? Typical command line:
/release/merger/sage-5.5.beta2/local/gap/latest/bin/x86_64-unknown-linux-gnu-gcc-default64/gap -m 24m -l /release/merger/sage-5.5.beta2/local/gap/latest -r -b -p -T -o 6271720243 /release/merger/sage-5.5.beta2/local/share/sage/ext/gap/sage.g
well, I have seen an orphaned gap process a couple of times, but I don't even recall whether it was with this patch, or not.
comment:93 in reply to: ↑ 90 Changed 9 years ago by
Replying to jdemeyer:
On OS X 10.6 (William's bsd machine), I get the same build error as eno:
Oh, OK, this is a missing dependency, I think. GAP depends upon MPIR now, as it uses (pseudo)GMP. I'll attach a patch in a second.
comment:94 Changed 9 years ago by
 Description modified (diff)
 Status changed from needs_work to needs_review
comment:95 Changed 9 years ago by
The question is not why there are GAP orphans, but why the sage-cleaner is not killing them once the parent Sage process quits.
I'll shortly add a patch to run gap with prctl --unaligned=silent on Itanium; this will get rid of the offending alignment warnings.
comment:96 Changed 9 years ago by
 Keywords build on eno removed
comment:97 Changed 9 years ago by
I'm currently building again to try to determine when the processes are created.
comment:98 Changed 9 years ago by
 Description modified (diff)
Positive review to dima's dependency patch.
I've also added the patch to suppress the Itanium warnings. I haven't tested it on Itanium yet (building sage-5.4.rc3 first), but I've verified that the prctl call gets rid of the warnings.
comment:99 Changed 9 years ago by
PS: The GAP command line includes -o 6271720243; this is clearly from trac_13211_pool_size.patch and not a process created during build.
comment:100 followup: ↓ 101 Changed 9 years ago by
 Status changed from needs_review to needs_work
I found out how to reproduce the orphans: running local/bin/sage-starts creates a gap orphan every time I run it.
comment:101 in reply to: ↑ 100 ; followup: ↓ 102 Changed 9 years ago by
Replying to jdemeyer:
I found out how to reproduce the orphans: running local/bin/sage-starts creates a gap orphan every time I run it.
I can reproduce this neither on Debian x86_64 nor on OSX 10.6.8. Specifically, I cd to SAGE_ROOT and start ./local/bin/sage-starts there.
I even tried removing ~/.sage/, with no difference.
It looks like we need more details on the system where you see this (preferably, access to it...).
comment:102 in reply to: ↑ 101 ; followup: ↓ 103 Changed 9 years ago by
Replying to dimpase:
Replying to jdemeyer:
I found out how to reproduce the orphans: running local/bin/sage-starts creates a gap orphan every time I run it.
I can reproduce this neither on Debian x86_64 nor on OSX 10.6.8. Specifically, I cd to SAGE_ROOT and start ./local/bin/sage-starts there. I even tried to remove ~/.sage/, no difference.
It looks like we need more details on the system you see this. (Preferably, access to it...).
OK, got it. I need to issue several local/bin/sage-starts within a short period of time on Debian x86_64. Then I get the orphans! Otherwise, not. A race condition, of sorts?
comment:103 in reply to: ↑ 102 ; followup: ↓ 104 Changed 9 years ago by
Replying to dimpase:
Replying to dimpase:
Replying to jdemeyer:
I found out how to reproduce the orphans: running local/bin/sage-starts creates a gap orphan every time I run it.
I can reproduce this neither on Debian x86_64 nor on OSX 10.6.8. Specifically, I cd to SAGE_ROOT and start ./local/bin/sage-starts there. I even tried to remove ~/.sage/, no difference.
It looks like we need more details on the system you see this. (Preferably, access to it...).
OK, got it. I need to issue several local/bin/sage-starts within a short period of time on Debian x86_64. Then I get the orphans! Otherwise, not. A race condition, of sorts?
Oops, no, in fact, these are not orphans, in the sense that they do not stay running forever. They all finish within 20-30 seconds. A typical process looks as follows (very similar to Jeroen's case):
dima 17195 1 93 19:24 ? 00:00:25 /usr/local/src/sage/sage-5.4.rc3/local/gap/latest/bin/x86_64-unknown-linux-gnu-gcc-default64/gap -m 24m -l /usr/local/src/sage/sage-5.4.rc3/local/gap/latest -r -b -p -T -o 872851456 /usr/local/src/sage/sage-5.4.rc3/local/share/sage/ext/gap/sage.g
Jeroen, so you say that for you these processes don't finish by themselves?
comment:104 in reply to: ↑ 103 Changed 9 years ago by
Replying to dimpase:
Jeroen, so you say that for you these processes don't finish by themselves?
Yes, they do not finish by themselves (at least not within a day).
It looks like we need more details on the system you see this. (Preferably, access to it...).
sage.math.washington.edu: Ubuntu 8.04.4 LTS, x86_64.
comment:105 Changed 9 years ago by
Well, I spoke too soon, sometimes they do finish by themselves.
comment:106 followup: ↓ 109 Changed 9 years ago by
But there's still the obvious question: what is that gap process doing and why does it keep running for a while?
comment:107 Changed 9 years ago by
Running strace on such a gap process shows:
getrusage(RUSAGE_SELF, {ru_utime={97, 710000}, ru_stime={0, 200000}, ...}) = 0
write(1, "@!", 2) = -1 EIO (Input/output error)
getrusage(RUSAGE_SELF, {ru_utime={97, 740000}, ru_stime={0, 200000}, ...}) = 0
write(1, "@!", 2) = -1 EIO (Input/output error)
getrusage(RUSAGE_SELF, {ru_utime={97, 770000}, ru_stime={0, 200000}, ...}) = 0
write(1, "@!", 2) = -1 EIO (Input/output error)
getrusage(RUSAGE_SELF, {ru_utime={97, 800000}, ru_stime={0, 200000}, ...}) = 0
write(1, "@!", 2) = -1 EIO (Input/output error)
getrusage(RUSAGE_SELF, {ru_utime={97, 830000}, ru_stime={0, 200000}, ...}) = 0
[...]
getrusage(RUSAGE_SELF, {ru_utime={116, 940000}, ru_stime={0, 200000}, ...}) = 0
write(1, "@!", 2) = -1 EIO (Input/output error)
--- SIGSEGV (Segmentation fault) @ 0 (0) ---
Process 12608 detached
It looks like it's stuck in an infinite loop until it finally segfaults, which is almost certainly not what's supposed to happen.
comment:108 Changed 9 years ago by
For those interested: a full strace can be found at http://boxen.math.washington.edu/home/jdemeyer/gap.trace
comment:109 in reply to: ↑ 106 Changed 9 years ago by
Replying to jdemeyer:
But there's still the obvious question: what is that gap process doing and why does it keep running for a while?
It's the same GAP process as the one that gets started during the "normal" Sage startup.
It does the following things, as you can see by uncommenting the LogTo line at the end of sage.g.
gap> LoadPackage("ctbllib");
true
...
gap> SaveWorkspace("/var/folders/qW/qWY+4Ku1GF0WXrOsV+IDvk+++TM/Tmp//dotsageKTXveD/gap/workspace8046660130267724445");
true
gap>
The LoadPackage() calls are OK; they just load GAP's packages.
The tough thing is SaveWorkspace(), which dumps GAP's workspace into a binary file that can be loaded back later. Well, not in this case of course, because we are invoked with nodotsage, so it goes to waste.
So I suppose that's what gets stuck, and crashes, for some reason (e.g. not enough disk space?).
comment:110 Changed 9 years ago by
What's the content of your ~/.sage/tmp/<hostname>/<pid>/spawned_processes file?
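For readers unfamiliar with the mechanism, a cleaner of this kind can be sketched as below. This is not the actual sage-cleaner code. The file layout (one entry per line, starting with a PID, optionally followed by the command) is an assumption, and both function names are hypothetical.

```python
import os
import signal

def parse_spawned_processes(text):
    """Return the PIDs listed at the start of each non-empty line."""
    pids = []
    for line in text.splitlines():
        fields = line.split()
        if fields and fields[0].isdigit():
            pids.append(int(fields[0]))
    return pids

def kill_leftovers(pids, kill=os.kill):
    """Send SIGKILL to each PID and return the ones that were signalled.

    The kill argument is injectable so the logic can be exercised
    without touching real processes.
    """
    signalled = []
    for pid in pids:
        try:
            kill(pid, signal.SIGKILL)
        except OSError:
            continue  # already gone, or not ours to kill
        signalled.append(pid)
    return signalled

# Dry run with a fake kill function; PIDs are taken from the ps output above.
def fake_kill(pid, sig):
    if pid == 22854:
        raise ProcessLookupError()

listed = parse_spawned_processes("10863 gap\n\nnot-a-pid\n22854\n")
signalled = kill_leftovers(listed, kill=fake_kill)
```

If the file is missing a PID, or the cleaner never runs after the parent exits, a child that ignores its closed pipes would be left running, which is the symptom under discussion.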
comment:111 Changed 9 years ago by
PS: Writing the workspace dump succeeds as you can see from Jeroen's strace.
We don't specifically close down GAP afterwards, but just close the stdin/stdout pipes. This sends GAP into a tizzy, and it keeps trying to read from stdin after getting EIO (why would you do this?). Still, the real question remains: why is GAP not killed by the sage-cleaner?
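For contrast, a command loop that stops once its input is gone might look like the sketch below. It is written in Python purely for illustration (GAP itself is C), and read_commands and ClosedTerminal are hypothetical names.

```python
import errno
import io

def read_commands(stream):
    """Yield input lines until EOF or until the terminal goes away."""
    while True:
        try:
            line = stream.readline()
        except OSError as exc:
            if exc.errno == errno.EIO:
                return  # the other end of the pty was closed: stop, don't retry
            raise
        if line == "":
            return  # end of file
        yield line.rstrip("\n")

class ClosedTerminal:
    """Hypothetical stand-in for a pty whose other end has been closed."""
    def readline(self):
        raise OSError(errno.EIO, "Input/output error")

normal = list(read_commands(io.StringIO("1+1;\nquit;\n")))
broken = list(read_commands(ClosedTerminal()))
```

A loop of this shape exits as soon as the first read fails, instead of spinning at 100% CPU on repeated EIO results.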
comment:112 Changed 9 years ago by
Now I see ENOSPC errors (I think not before):
open("/tmp/dotsagewHCJJ1/gap/workspace5368401171496622154", O_WRONLYO_CREATO_TRUNC, 0644) = 4 write(4, "GAP workspace\0004.5.6\00064 bit\0\10\7\6\5\4"..., 100000) = 1 ENOSPC (No space left on device)
But this doesn't make sense as there is plenty of space on /tmp
and I can create large files manually there:
jdemeyer@sage:sage5.5.beta0gap$ dd if=/dev/zero of=/tmp/dotsagewHCJJ1/gap/workspace5368401171496622154 bs=1024 count=100000
100000+0 records in
100000+0 records out
102400000 bytes (102 MB) copied, 0.158741 s, 645 MB/s
I am lost...
comment:113 Changed 9 years ago by
Could the ENOSPC be related to the fact that this is a tmpfs and that GAP reserves all the memory for itself?
comment:114 Changed 9 years ago by
Does gap create lots of small files in /tmp? If so, are you running out of inodes? You can check using df -i.
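The same check can be made from Python. The following is only a minimal sketch: the helper name filesystem_headroom is made up for illustration, and it uses the standard os.statvfs call rather than anything from Sage.

```python
import os
import tempfile


def filesystem_headroom(path):
    """Return (free_bytes, free_inodes) for the filesystem holding path.

    Hypothetical helper for illustration; it reports roughly what
    `df` and `df -i` show for an unprivileged user.
    """
    st = os.statvfs(path)
    free_bytes = st.f_bavail * st.f_frsize  # blocks available * fragment size
    free_inodes = st.f_favail               # inodes available
    return free_bytes, free_inodes


if __name__ == "__main__":
    free_bytes, free_inodes = filesystem_headroom(tempfile.gettempdir())
    print("free bytes: %d, free inodes: %d" % (free_bytes, free_inodes))
```

A write can fail with ENOSPC when either number reaches zero, which is why both are worth looking at.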
comment:115 Changed 9 years ago by
/tmp
is as good as empty:
jdemeyer@sage:~$ df -h /tmp
Filesystem  Size  Used  Avail  Use%  Mounted on
tmpfs        16G   25M    16G    1%  /tmp
jdemeyer@sage:~$ df -i /tmp
Filesystem  Inodes  IUsed   IFree  IUse%  Mounted on
tmpfs       524288   1253  523035     1%  /tmp
(looking at the trace, gap doesn't create a lot of files in /tmp
)
comment:116 Changed 9 years ago by
Possibly kernel bug? From 2.6.38 changelog: "tmpfs: fix spurious ENOSPC when racing with unswap". GAP puts pressure on swap since it locks address space (and hence reduces the available swap).
comment:117 Changed 9 years ago by
Switching filesystems made the ENOSPC disappear.
comment:118 Changed 9 years ago by
But now I cannot reproduce the orphans anymore, they now crash much earlier:
write(4, "\0\0\0\0\0\0\0\0\0\0\0\0\202\30\0\0\0\0\0\0\300\3\0\0\0"..., 100000) = 100000 write(4, "\0\0\0\0\275\10\0\0\0\0\0\0\363\6\0\0\0\0\0\0\0\0\0\0\0"..., 100000) = 100000 write(4, "\0\0\0\275\10\0\0\0\0\0\0\306\v\0\0\0\0\0\0\0\0\0\0\0\0"..., 100000) = 100000 write(4, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\275"..., 86744) = 86744 close(4) = 0 write(1, "@n", 2) = 2 write(1, "true@J", 6) = 6 getrusage(RUSAGE_SELF, {ru_utime={2, 900000}, ru_stime={0, 140000}, ...}) = 0 getrusage(RUSAGE_SELF, {ru_utime={2, 900000}, ru_stime={0, 140000}, ...}) = 0 write(1, "@n", 2) = 2 write(1, "gap> ", 5) = 5 write(1, "@i", 2) = 2 getrusage(RUSAGE_SELF, {ru_utime={2, 900000}, ru_stime={0, 140000}, ...}) = 0 read(0, "", 1) = 0 write(1, "\r", 1) = 1 EIO (Input/output error) write(1, "@f", 2) = 1 EIO (Input/output error) write(1, "@f", 2) = 1 EIO (Input/output error) write(1, "@f", 2) = 1 EIO (Input/output error) write(1, "@f", 2) = 1 EIO (Input/output error) write(1, "@f", 2) = 1 EIO (Input/output error) write(1, "@f", 2) = 1 EIO (Input/output error) write(1, "@f", 2) = 1 EIO (Input/output error) write(1, "@f", 2) = 1 EIO (Input/output error) write(1, "@f", 2) = 1 EIO (Input/output error) write(1, "@f", 2) = 1 EIO (Input/output error) write(1, "@f", 2) = 1 EIO (Input/output error) write(1, "@f", 2) = 1 EIO (Input/output error) write(1, "@f", 2) = 1 EIO (Input/output error)  SIGSEGV (Segmentation fault) @ 0 (0)  +++ killed by SIGSEGV +++
comment:119 Changed 9 years ago by
And "$DOT_SAGE/tmp/sage.math.washington.edu"
is empty after Sage is finished.
comment:120 followup: ↓ 123 Changed 9 years ago by
But for sagecleaner
, what matters is "$DOT_SAGE/temp/sage.math.washington.edu"
comment:121 Changed 9 years ago by
When running Sage interactively (as opposed to sage -c), gap gets closed properly because it gets sent "quit;\n":
write(4, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\201\20\0\0\0\0"..., 91115) = 91115 close(4) = 0 write(1, "@n", 2) = 2 write(1, "true@J", 6) = 6 getrusage(RUSAGE_SELF, {ru_utime={2, 790000}, ru_stime={0, 170000}, ...}) = 0 getrusage(RUSAGE_SELF, {ru_utime={2, 790000}, ru_stime={0, 170000}, ...}) = 0 write(1, "@n", 2) = 2 write(1, "gap> ", 5) = 5 write(1, "@i", 2) = 2 getrusage(RUSAGE_SELF, {ru_utime={2, 790000}, ru_stime={0, 170000}, ...}) = 0 read(0, "q", 1) = 1 write(1, "q", 1) = 1 read(0, "u", 1) = 1 write(1, "u", 1) = 1 read(0, "i", 1) = 1 write(1, "i", 1) = 1 read(0, "t", 1) = 1 write(1, "t", 1) = 1 read(0, ";", 1) = 1 write(1, ";", 1) = 1 read(0, "\n", 1) = 1 write(1, "\r", 1) = 1 write(1, "\n", 1) = 1 write(1, "@r", 2) = 2 write(1, "quit;@J", 7) = 7 getrusage(RUSAGE_SELF, {ru_utime={2, 790000}, ru_stime={0, 170000}, ...}) = 0 read(3, "", 20000) = 0 close(3) = 0 exit_group(0) = ?
comment:122 Changed 9 years ago by
 Keywords segmentation fault in child process added; orphaned processes removed
Regardless of sagecleaner
, I think it's clear that the gap Segmentation Fault should be fixed.
comment:123 in reply to: ↑ 120 Changed 9 years ago by
Replying to jdemeyer:
But for
sagecleaner
, what matters is "$DOT_SAGE/temp/sage.math.washington.edu"
OK, so the cleaner does not clean if one starts with nodotsage...
At least this should go another ticket, IMHO.
comment:124 Changed 9 years ago by
 Keywords segmentation fault in child process removed
 Work issues set to segmentation fault in child process
comment:125 Changed 9 years ago by
 Description modified (diff)
 Status changed from needs_work to needs_review
 Work issues segmentation fault in child process deleted
I've added a patch to explicitly quit Gap after writing the workspace.
The sagecleaner indeed looks in .../temp/..., but SAGE_TMP is in .../tmp/..., so the cleaner is currently broken. That's the fault of #13579.
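For illustration, explicitly shutting down a child instead of just dropping its pipes could look roughly like the sketch below. This is not the patch attached to this ticket. The function quit_child and the stand-in child script are invented for the example, and it targets Python 3's subprocess module rather than the pexpect-based Sage interface.

```python
import subprocess
import sys

# Stand-in for an interactive child such as GAP: it exits when it reads "quit;".
CHILD_SOURCE = (
    "import sys\n"
    "for line in sys.stdin:\n"
    "    if line.strip() == 'quit;':\n"
    "        break\n"
)


def quit_child(proc, quit_command=b"quit;\n", timeout=5):
    """Ask the child to quit, then kill it if it does not exit in time."""
    try:
        proc.stdin.write(quit_command)
        proc.stdin.flush()
        proc.stdin.close()
    except (OSError, ValueError):
        pass  # pipe already closed or child already gone
    try:
        return proc.wait(timeout=timeout)
    except subprocess.TimeoutExpired:
        proc.kill()  # last resort, so that no orphan is left behind
        return proc.wait()


if __name__ == "__main__":
    child = subprocess.Popen([sys.executable, "-c", CHILD_SOURCE],
                             stdin=subprocess.PIPE)
    print("exit status:", quit_child(child))
```

The point of the kill fallback is that the parent never relies on an external cleaner to reap the child.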
comment:126 Changed 9 years ago by
comment:127 Changed 9 years ago by
 Status changed from needs_review to positive_review
comment:128 Changed 9 years ago by
 Status changed from positive_review to needs_work
After running some doctests again with this patch, I see a lot of orphan gap processes again. The strace is as before:
[...]
getrusage(RUSAGE_SELF, {ru_utime={9186, 870000}, ru_stime={6, 120000}, ...}) = 0
write(1, "@!", 2) = -1 EIO (Input/output error)
getrusage(RUSAGE_SELF, {ru_utime={9187, 210000}, ru_stime={6, 120000}, ...}) = 0
write(1, "@!", 2) = -1 EIO (Input/output error)
getrusage(RUSAGE_SELF, {ru_utime={9187, 550000}, ru_stime={6, 120000}, ...}) = 0
write(1, "@!", 2) = -1 EIO (Input/output error)
getrusage(RUSAGE_SELF, {ru_utime={9187, 910000}, ru_stime={6, 120000}, ...}) = 0
write(1, "@!", 2) = -1 EIO (Input/output error)
getrusage(RUSAGE_SELF, {ru_utime={9188, 260000}, ru_stime={6, 120000}, ...}) = 0
write(1, "@!", 2) = -1 EIO (Input/output error)
[...]
comment:129 followup: ↓ 130 Changed 9 years ago by
 Status changed from needs_work to needs_info
Again, the question is not why are there GAP orphans but why is the sagecleaner not killing them once the parent Sage process quits?
comment:130 in reply to: ↑ 129 Changed 9 years ago by
Replying to vbraun:
Again, the question is not why are there GAP orphans but why is the sagecleaner not killing them once the parent Sage process quits?
That's a question, but I don't think it's the question. Sage should clean up after itself, and sagecleaner should only be a last resort.
comment:131 Changed 9 years ago by
I agree, but since you are the only one who is able to get the orphans (possibly in conjunction with a kernel bug) I'd be perfectly happy to let sagecleaner handle this corner case.
comment:132 Changed 9 years ago by
On an unrelated note, the interleaved getrusage / write strace probably means that GAP is running something in the internal profiler.
comment:133 Changed 9 years ago by
Can you confirm that the offending gap process still doesn't include the "L ...gapworkspace.." command line switch? This probably means that the processes originate when creating the workspace. Is this on a file system that disallows simultaneous writes from multiple processes to the same file?
comment:134 Changed 9 years ago by
Also, can you post a strace for the hanging GAP process? With trac_13211_quit_after_workspace.patch the 'gap_reset_workspace()' command quits correctly, so I suspect your problem is at a different place now.
comment:135 Changed 9 years ago by
To better test this, I disabled the cleaner (by putting sys.exit(0) in sagecleaner).
Example strace of segfaulting process is attached.
comment:136 Changed 9 years ago by
I'm also seeing a lot of ENOMEM errors in the trace. Could these be caused by the huge pool size?
comment:137 Changed 9 years ago by
There is a potential infinite loop with _eval_line in gap.py: when it crashes, it keeps trying again over and over:
def _eval_line(self, line, allow_use_file=True, wait_for_prompt=True, restart_if_needed=True):
    [...]
    try:
        [...]
    except (RuntimeError,TypeError),message:
        if 'EOF' in message[0] or E is None or not E.isalive():
            print "** %s crashed or quit executing '%s' **"%(self, line)
            print "Restarting %s and trying again"%self
            self._start()
            if line != '':
                return self._eval_line(line, allow_use_file=allow_use_file)
            else:
                return ''
        else:
            raise RuntimeError, message
Is this intentional? Probably the number of retries should be limited.
(this seems unrelated to this ticket but I thought I should mention it)
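One way to limit the number of retries is sketched below. It is an assumption-laden illustration, not Sage code: the names eval_with_retries, InterfaceCrashed, run_once and restart are all hypothetical stand-ins for the corresponding pieces of the expect interface.

```python
class InterfaceCrashed(RuntimeError):
    """Hypothetical signal that the child process died while evaluating."""


def eval_with_retries(run_once, restart, line, max_restarts=2):
    """Evaluate line, restarting the child at most max_restarts times."""
    restarts = 0
    while True:
        try:
            return run_once(line)
        except InterfaceCrashed:
            if restarts >= max_restarts:
                raise  # give up instead of looping forever
            restarts += 1
            restart()


if __name__ == "__main__":
    state = {"calls": 0}

    def flaky(line):
        # Crashes on the first call only, then echoes its input.
        state["calls"] += 1
        if state["calls"] == 1:
            raise InterfaceCrashed("EOF")
        return line

    print(eval_with_retries(flaky, lambda: None, "1+1;"))
```

A loop with a counter also avoids the unbounded recursion that the recursive call in the excerpt could produce.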
comment:138 Changed 9 years ago by
Did you apply trac_13211_pool_size.patch? Unless your machine has exabyte-sized swap it shouldn't even try to allocate such a large pool.
comment:139 Changed 9 years ago by
Oh I see, gap -o <number> doesn't work. It apparently requires a binary prefix.
comment:140 followup: ↓ 142 Changed 9 years ago by
There is apparently an overflow in gap's argument parsing at 3*2^31:
(sagesh) vbraun@localhost:~$ /home/vbraun/opt/sage5.4.rc4/local/gap/latest/bin/x86_64unknownlinuxgnugccdefault64/gap l /home/vbraun/opt/sage5.4.rc4/local/gap/latest o 6442450943 ┌───────┐ GAP, Version 4.5.6 of 16Sep2012 (free software, GPL) │ GAP │ http://www.gapsystem.org └───────┘ Architecture: x86_64unknownlinuxgnugccdefault64 Libs used: gmp, readline Loading the library and packages ... Packages: GAPDoc 1.5.1 Try '?help' for help. See also '?copyright' and '?authors' gap> (sagesh) vbraun@localhost:~$ /home/vbraun/opt/sage5.4.rc4/local/gap/latest/bin/x86_64unknownlinuxgnugccdefault64/gap l /home/vbraun/opt/sage5.4.rc4/local/gap/latest o 6442450944 gap: halving pool size. gap: halving pool size. gap: halving pool size. gap: halving pool size. gap: halving pool size. gap: halving pool size. gap: halving pool size. gap: halving pool size. gap: halving pool size. gap: halving pool size. gap: halving pool size. gap: halving pool size. gap: halving pool size. gap: halving pool size. gap: halving pool size. gap: halving pool size. gap: halving pool size. gap: halving pool size. gap: halving pool size. gap: halving pool size. gap: halving pool size. gap: halving pool size. gap: halving pool size. gap: halving pool size. gap: halving pool size. gap: halving pool size. gap: halving pool size. gap: halving pool size. gap: halving pool size. gap: halving pool size. ┌───────┐ GAP, Version 4.5.6 of 16Sep2012 (free software, GPL) │ GAP │ http://www.gapsystem.org └───────┘ Architecture: x86_64unknownlinuxgnugccdefault64 Libs used: gmp, readline Loading the library and packages ... Packages: GAPDoc 1.5.1 Try '?help' for help. See also '?copyright' and '?authors'
That explains why you get the ENOMEMs. But gap tries smaller and smaller mmaps until it succeeds, so this is not a real problem. It will eat the available swap space and put much more pressure on the virtual memory system, though. I'll report this issue upstream.
comment:141 followup: ↓ 143 Changed 9 years ago by
I've updated the patch to use gap -o <number>m for a specific number of megabytes instead of specifying the pool size in bytes. This should avoid the argument parsing overflow.
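As a rough sketch of the idea (the helper name pool_size_option is invented here, and the actual patch may compute the value differently), the byte count can be turned into a megabyte argument like this:

```python
MB = 1024 ** 2


def pool_size_option(size_in_bytes):
    """Return the GAP pool size switch as ['-o', '<n>m'].

    Hypothetical helper: passing whole megabytes keeps the number
    well below the 3*2**31 threshold reported above.
    """
    megabytes = max(1, size_in_bytes // MB)
    return ["-o", "%dm" % megabytes]


if __name__ == "__main__":
    # The value at which the overflow was observed, expressed in megabytes.
    print(pool_size_option(3 * 2 ** 31))
```

The byte value 6442450944 from the session above becomes 6144m, so the number that GAP has to parse stays small.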
comment:142 in reply to: ↑ 140 Changed 9 years ago by
Replying to vbraun:
There is apparently an overflow in gap's argument parsing at
3*2^31
: That explains why you get the ENOMEMs. But gap tries smaller and smaller mmaps until it succeeds, so this is not a real problem. It will eat the available swap space and put much more pressure on the virtual memory system, though. I'll report this issue upstream.
by the way, GAP finally has a real tracker!
comment:143 in reply to: ↑ 141 Changed 9 years ago by
 Status changed from needs_info to needs_review
Replying to vbraun:
I've updated the patch to use
gap o <number>m
for a specific number of megabytes instead of specifying the pool size in bytes. This should avoid the argument parsing overflow.
OK, I am testing this on the culprit machine (sage on sagemath UW cluster), and I am not able to see any lingering GAP processes. I'll try a bit more, and unless I succeed, I'll make it positive review...
comment:144 followups: ↓ 147 ↓ 148 Changed 9 years ago by
After building and fully doctesting Sage, I still end up with 20 gap processes:
jdemeyer@sage:~$ ps ef grep gap jdemeyer 6148 1 83 10:12 ? 03:35:26 /release/merger/sage5.5.beta2/local/gap/latest/bin/x86_64unknownlinuxgnugccdefault64/gap m 24m l /release/merger/sage5.5.beta2/local/gap/latest r L /release/merger/sage5.5.beta2/home/.sage/gap/workspace1647326966196298359 b p T o 6274m /release/merger/sage5.5.beta2/local/share/sage/ext/gap/sage.g jdemeyer 6150 1 83 10:12 ? 03:36:51 /release/merger/sage5.5.beta2/local/gap/latest/bin/x86_64unknownlinuxgnugccdefault64/gap m 24m l /release/merger/sage5.5.beta2/local/gap/latest r L /release/merger/sage5.5.beta2/home/.sage/gap/workspace1647326966196298359 b p T o 6274m /release/merger/sage5.5.beta2/local/share/sage/ext/gap/sage.g jdemeyer 6154 1 83 10:12 ? 03:35:23 /release/merger/sage5.5.beta2/local/gap/latest/bin/x86_64unknownlinuxgnugccdefault64/gap m 24m l /release/merger/sage5.5.beta2/local/gap/latest r L /release/merger/sage5.5.beta2/home/.sage/gap/workspace1647326966196298359 b p T o 6274m /release/merger/sage5.5.beta2/local/share/sage/ext/gap/sage.g [...]
comment:145 Changed 9 years ago by
At least, the ENOMEM errors are gone!
comment:146 Changed 9 years ago by
gap.10799 shows the strace of a hanging process. At some point, the trace stops, it seems no further system calls are done.
I don't see the EIO errors and segmentation faults anymore, perhaps that was caused by ENOMEM?
comment:147 in reply to: ↑ 144 Changed 9 years ago by
Replying to jdemeyer:
After building and fully doctesting Sage, I still end up with 20 gap processes:
I tried applying the patches (and the spkg) on this ticket to Sage 5.4.rc4, and running make ptest. I have not got any gap processes left after it has finished. Have you been doing something else, something nonstandard?
comment:148 in reply to: ↑ 144 Changed 9 years ago by
Replying to jdemeyer:
After building and fully doctesting Sage, I still end up with 20 gap processes:
jdemeyer@sage:~$ ps ef grep gap jdemeyer 6148 1 83 10:12 ? 03:35:26 /release/merger/sage5.5.beta2/local/gap/latest/bin/x86_64unknownlinuxgnugccdefault64/gap m 24m l /release/merger/sage5.5.beta2/local/gap/latest r L /release/merger/sage5.5.beta2/home/.sage/gap/workspace1647326966196298359 b p T o 6274m /release/merger/sage5.5.beta2/local/share/sage/ext/gap/sage.g jdemeyer 6150 1 83 10:12 ? 03:36:51 /release/merger/sage5.5.beta2/local/gap/latest/bin/x86_64unknownlinuxgnugccdefault64/gap m 24m l /release/merger/sage5.5.beta2/local/gap/latest r L /release/merger/sage5.5.beta2/home/.sage/gap/workspace1647326966196298359 b p T o 6274m /release/merger/sage5.5.beta2/local/share/sage/ext/gap/sage.g jdemeyer 6154 1 83 10:12 ? 03:35:23 /release/merger/sage5.5.beta2/local/gap/latest/bin/x86_64unknownlinuxgnugccdefault64/gap m 24m l /release/merger/sage5.5.beta2/local/gap/latest r L /release/merger/sage5.5.beta2/home/.sage/gap/workspace1647326966196298359 b p T o 6274m /release/merger/sage5.5.beta2/local/share/sage/ext/gap/sage.g [...]
Where is .sage/ used in these calls to GAP? AFAIK there is no /release/merger/sage5.5.beta2/home/
, yet they list GAP workspace files /release/merger/sage5.5.beta2/home/.sage/gap/workspace1647326966196298359
.
Weird. Could it be the reason for them to hang?
comment:149 Changed 9 years ago by
/release/merger/sage5.5.beta2/home is a temporary $HOME directory for the merger. It's similar to using the -nodotsage command line option to sage.
comment:150 Changed 9 years ago by
The GAP developers acknowledged the option parsing overflow; it will be fixed in the next stable release.
comment:151 Changed 9 years ago by
Is the number of orphans related to the number of parallel doctest processes? The added patch fixes an issue where we abandoned one process instead of killing it if the cached workspace fails to load and needs to be recreated.
comment:152 Changed 9 years ago by
 Description modified (diff)
comment:153 Changed 9 years ago by
I spoke too soon about the EIO errors and segmentation faults being gone, I still get them.
comment:154 followup: ↓ 160 Changed 9 years ago by
I still don't understand why you get EIO. This means something is seriously foobared, no? When sage quits the other end of the pipe should be closed, resulting in EPIPE.
My theory is the following: the Sage expect interface can use temporary files to redirect stdin/out and this is where the problem lies. If you run with sage -nodotsage, this temporary file is on /tmp (see SAGE_TMP_INTERFACE). The tmpfs on sage.math has some bug that is triggered when you put pressure on the virtual memory, when most of the swap is locked in anonymous mmaps.
comment:155 Changed 9 years ago by
I ran some more doctests and GAP is not the only process that ends up in a write loop with EIO until it segfaults. E.g. Maxima does it, too. I guess your problem is that GAP does not segfault under certain circumstances.
comment:156 Changed 9 years ago by
The glibc manual says: EIO also occurs when a background process tries to read from the controlling terminal, and the normal action of stopping the process by sending it a SIGTTIN signal isn't working. This might happen if signal is being blocked or ignored, or because the process group is orphaned. See section Job Control, for more information about job control, and section Signal Handling, for information about signals.
comment:157 followup: ↓ 158 Changed 9 years ago by
I don't want to annoy Jeroen yet again, but I really think we are done with this ticket. If on an old system (Ubuntu 8.04 LTS will reach its EOL in the coming April) under some very unusual circumstances it doesn't quite work as expected, it should not hold the ticket up.
comment:158 in reply to: ↑ 157 Changed 9 years ago by
Replying to dimpase:
I don't want to annoy Jeroen yet again, but I really think we are done with this ticket. If on an old system (Ubuntu 8.04 LTS will reach its EOL in the coming April) under some very unusual circumstances it doesn't quite work as expected, it should not hold the ticket up.
If that old system happens to be the one on which Sage releases are made, it obviously holds up the ticket.
comment:159 followup: ↓ 161 Changed 9 years ago by
 Status changed from needs_review to needs_work
I doubt that /tmp has anything to do with it. The file descriptor giving EIO is a deleted pseudo-terminal, not a temporary file. This can be seen from /proc/$PID/fd:
jdemeyer@sage:/tmp/gaptrace$ ls -l /proc/19589/fd
total 0
lrwx------ 1 jdemeyer jdemeyer 64 Nov 14 11:49 0 -> /dev/pts/71 (deleted)
lrwx------ 1 jdemeyer jdemeyer 64 Nov 14 11:49 1 -> /dev/pts/71 (deleted)
lrwx------ 1 jdemeyer jdemeyer 64 Nov 14 11:49 2 -> /dev/pts/71 (deleted)
I don't think it's a relevant discussion why we sometimes get Segmentation Faults and sometimes not.
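The same inspection can be scripted. The sketch below is Linux-specific and purely illustrative: open_fd_targets and deleted_terminals are made-up helper names, and it simply reads the /proc symlinks that ls shows.

```python
import os


def open_fd_targets(pid="self"):
    """Map each open file descriptor of pid to its /proc symlink target.

    Returns an empty dict where /proc/<pid>/fd is not available.
    """
    fd_dir = "/proc/%s/fd" % pid
    targets = {}
    try:
        names = os.listdir(fd_dir)
    except OSError:
        return targets
    for name in names:
        try:
            targets[int(name)] = os.readlink(os.path.join(fd_dir, name))
        except (OSError, ValueError):
            continue  # descriptor vanished in the meantime
    return targets


def deleted_terminals(targets):
    """Return the descriptors that point at a deleted pseudo-terminal."""
    return sorted(fd for fd, target in targets.items()
                  if target.startswith("/dev/pts/")
                  and target.endswith("(deleted)"))


if __name__ == "__main__":
    print(deleted_terminals(open_fd_targets()))
```

For the process listed above, this would report descriptors 0, 1 and 2.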
comment:160 in reply to: ↑ 154 Changed 9 years ago by
Replying to vbraun:
When sage quits the other end of the pipe should be closed, resulting in EPIPE.
A pseudoterminal is not a pipe.
comment:161 in reply to: ↑ 159 Changed 9 years ago by
Replying to jdemeyer:
I doubt that
/tmp
has anything to do with it.
Somewhere near the end of sage/ext/gap/sage.g there is a commented-out line LogTo("/tmp/gapsage.log");
Could you uncomment it and try to reproduce such a hanging GAP process, with this log on?
As well, perhaps add some debugging prints into each of the GAP functions in the file.
Something like AppendTo("/tmp/sageg","OperationsAdmittingFirstArgument\n"); where OperationsAdmittingFirstArgument is the function name, and "/tmp/sageg" the name of the log file...
My guess is that the problem is in $SAGE.NewPager(), which tries to get $SAGE.tempfile and fails, either during this interaction, or upon attempting to open it.
comment:162 Changed 9 years ago by
I am seriously debugging GAP now to check for the problem.
comment:163 followup: ↓ 164 Changed 9 years ago by
I'm pretty sure I found the bug. The culprit is the following from src/sysfiles.c:
/* utility to check return value of 'write' */
ssize_t writeandcheck(int fd, const char *buf, size_t count) {
  int ret;
  ret = write(fd, buf, count);
  if (ret < 0) {
    ErrorQuit("Cannot write to file descriptor %d, see 'LastSystemError();'\n", fd, 0L);
  }
  return ret;
}
If the pseudo-tty is closed, then write() fails with EIO, causing GAP to write an error message. We then get an infinite loop of failing to write the error message.
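The failure mode, and one possible guard against it, can be modelled in a few lines of Python. This is only a toy model and not the patch applied to GAP: the classes AlwaysFailingStream and Reporter are invented, and the guard shown (a re-entrancy flag that makes the second failure fatal) is just one option.

```python
import errno


class AlwaysFailingStream(object):
    """Stand-in for a closed pseudo-terminal: every write fails with EIO."""

    def __init__(self):
        self.attempts = 0

    def write(self, data):
        self.attempts += 1
        raise OSError(errno.EIO, "Input/output error")


class Reporter(object):
    """Toy model of writeandcheck/ErrorQuit with a re-entrancy guard."""

    def __init__(self, stream):
        self.stream = stream
        self.reporting = False

    def write_and_check(self, data):
        try:
            self.stream.write(data)
        except OSError as exc:
            self.error_quit("Cannot write: %s\n" % exc)

    def error_quit(self, message):
        if self.reporting:
            # Failing while reporting a failure: stop instead of recursing.
            raise SystemExit(1)
        self.reporting = True
        try:
            self.write_and_check(message)
        finally:
            self.reporting = False


if __name__ == "__main__":
    stream = AlwaysFailingStream()
    try:
        Reporter(stream).write_and_check("gap> ")
    except SystemExit:
        pass
    print("write attempts:", stream.attempts)
```

Without the flag, error_quit and write_and_check would call each other until the stack is exhausted, which is a plausible route to the segmentation fault seen in the traces.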
comment:164 in reply to: ↑ 163 Changed 9 years ago by
Replying to jdemeyer:
It seems that ErrorQuit could just write to stderr in this case. In general, though, it's not clear to me how to deal with it, since such a plug could perhaps break the ability of the interpreter to recover from errors.
Is it actually possible to reliably check that the pty is closed?
comment:165 followup: ↓ 170 Changed 9 years ago by
Now that I know what the problem is, reproducing the Segmentation Fault is trivial. Within a Sage shell:
(sagesh) jdemeyer@sage:sage5.5.beta2$ gap &>/dev/full
Segmentation fault
comment:166 followup: ↓ 167 Changed 9 years ago by
You'd still get an EPIPE if the process wasn't orphaned. Also, the problem is that GAP is not segfaulting in some circumstances when it is orphaned. If it segfaulted, you wouldn't get an orphan.
The underlying problem is that the expect interfaces don't always clean up processes before quitting Sage, hence leading to orphaned processes to start with. Only after the process is orphaned you get EIO's.
comment:167 in reply to: ↑ 166 Changed 9 years ago by
Replying to vbraun:
You'd still get a EPIPE if the process wasn't orphaned.
I doubt that pseudoterminals can cause an EPIPE (at least in Linux). Even if they would, you should also get a SIGPIPE signal, killing the process.
Also, the problem is that GAP is not segfaulting in some circumstances when it is orphaned.
I would not say that this is the problem. There is a bug in GAP which may or may not lead to a segfault. Relying on it causing a segfault is silly.
The underlying problem is that the expect interfaces don't always clean up processes before quitting Sage, hence leading to orphaned processes to start with. Only after the process is orphaned you get EIO's.
True. So either we fix GAP or we fix the Sage interface, and preferably both.
comment:168 followup: ↓ 169 Changed 9 years ago by
Just for the record, other programs segfault after running into EIO for a while as well. I dare say that most interactive binaries are not meant to be run as orphans. I don't see much point in making sure that the GAP command line interface can be run without a controlling terminal.
comment:169 in reply to: ↑ 168 Changed 9 years ago by
Replying to vbraun:
Just for the record, other programs Segfault after running into EIO for a while as well.
Indeed, it seems ECL has exactly the same bug. But that doesn't mean the GAP bug (or the Sage bug controlling GAP if you want) shouldn't be fixed.
And the problem isn't limited to terminals: as I showed, it can also occur for example when writing to a filesystem which is full.
comment:170 in reply to: ↑ 165 Changed 9 years ago by
Replying to jdemeyer:
Now that I know what the problem is, reproducing the Segmentation Fault is trivial. Within a Sage shell:
(sagesh) jdemeyer@sage:sage5.5.beta2$ gap &>/dev/full Segmentation fault
This is not the complete story. First off, you need to run with -T, which is meant to suppress the usual interactive behaviour, and this is the option with which the orphans in question arise. In fact, it is because stderr is full, not because stdout is full.
(sagesh) dima@sage$ gap -T >/dev/full
Error, Cannot write to file descriptor 1, see 'LastSystemError();'
Error, Cannot write to file descriptor 1, see 'LastSystemError();'
Error, Cannot write to file descriptor 1, see 'LastSystemError();'
Error, Cannot write to file descriptor 1, see 'LastSystemError();'
Error, Cannot write to file descriptor 1, see 'LastSystemError();'
Syntax error: ; expected
^
Error, Cannot write to file descriptor 1, see 'LastSystemError();'
(sagesh) dima@sage$
Thus if stderr is OK, it sort of works (the "Syntax error" is quite weird though, and might be an indication of a "more real" bug).
By the way, gap4.4 just hangs in this situation.
As we are talking about the situation with -T on, we can patch, say, ErrorQuit for this case to print to stderr and quit immediately. This will not fix this completely, but at least it will make sure that with the -T option the behaviour is as it should be.
An alternative is to change Sage so that GAP's stderr does not get redirected. I don't know how well this will work, though.
comment:171 followup: ↓ 172 Changed 9 years ago by
 Report Upstream changed from Completely fixed; Fix reported upstream to Reported upstream. No feedback yet.
I've created an issue on GAP bugtracker in regard to the comment 165: http://tracker.gapsystem.org/issues/125
comment:172 in reply to: ↑ 171 Changed 9 years ago by
Replying to dimpase:
I've created an issue on GAP bugtracker in regard to the comment 165: http://tracker.gapsystem.org/issues/125
Too bad one cannot even look at the tracker issue without logging in. I registered for an account, but it needs to be approved by a moderator. Let me know if anything interesting comes up from upstream.
comment:173 followup: ↓ 174 Changed 9 years ago by
 Description modified (diff)
 Status changed from needs_work to needs_review
New spkg with patch added: http://boxen.math.washington.edu/home/jdemeyer/spkg/gap4.5.6.p0.spkg
Patch attached.
comment:174 in reply to: ↑ 173 ; followup: ↓ 175 Changed 9 years ago by
 Report Upstream changed from Reported upstream. No feedback yet. to Reported upstream. Developers acknowledge bug.
 Status changed from needs_review to positive_review
Replying to jdemeyer:
New spkg with patch added: http://boxen.math.washington.edu/home/jdemeyer/spkg/gap4.5.6.p0.spkg
Patch attached.
The patch makes gap &>/dev/full hang. (Well, this is consistent with GAP 4.4 behaviour). At least this is good enough, I suppose, to fix the orphans issue. I mark this as positive review, with understanding that orphans, which I can't reproduce anyway, are no longer there. I hope GAP people will fix this good and proper.
comment:175 in reply to: ↑ 174 ; followup: ↓ 176 Changed 9 years ago by
Replying to dimpase:
The patch makes
gap &>/dev/full
hang.
Of course it "hangs", since it's waiting for user input. It's still an interactive GAP session, you can hit CTRLD to exit.
comment:176 in reply to: ↑ 175 Changed 9 years ago by
comment:177 followup: ↓ 181 Changed 9 years ago by
 Status changed from positive_review to needs_work
There is something wrong with the pool size patch. On the buildbot machine "snapperkob" (Ubuntu 12.04 x86_64), gap is started with "-o 53m", which is too little. It gives
RuntimeError: Gap produced error output
Error, exceeded the permitted memory (`-o' command line option)
executing SaveWorkspace("/tmp/dotsageyds6nf/gap/workspace8407479053605879120");
But there is plenty of memory available:
$ cat /proc/meminfo MemTotal: 8012124 kB MemFree: 2736044 kB Buffers: 125968 kB Cached: 4721000 kB SwapCached: 0 kB Active: 2978056 kB Inactive: 1930440 kB Active(anon): 61564 kB Inactive(anon): 3648 kB Active(file): 2916492 kB Inactive(file): 1926792 kB Unevictable: 0 kB Mlocked: 0 kB SwapTotal: 524284 kB SwapFree: 524284 kB Dirty: 20 kB Writeback: 0 kB AnonPages: 61552 kB Mapped: 14396 kB Shmem: 3688 kB Slab: 261404 kB SReclaimable: 243728 kB SUnreclaim: 17676 kB KernelStack: 1280 kB PageTables: 3568 kB NFS_Unstable: 0 kB Bounce: 0 kB WritebackTmp: 0 kB CommitLimit: 4530344 kB Committed_AS: 160128 kB VmallocTotal: 34359738367 kB VmallocUsed: 561128 kB VmallocChunk: 34359173628 kB HardwareCorrupted: 0 kB AnonHugePages: 0 kB HugePages_Total: 0 HugePages_Free: 0 HugePages_Rsvd: 0 HugePages_Surp: 0 Hugepagesize: 2048 kB DirectMap4k: 47100 kB DirectMap2M: 8173568 kB
comment:178 Changed 9 years ago by
 Status changed from needs_work to positive_review
comment:179 Changed 9 years ago by
 Status changed from positive_review to needs_work
comment:180 Changed 9 years ago by
 Milestone changed from sage5.5 to sage5.6
comment:181 in reply to: ↑ 177 Changed 9 years ago by
Replying to jdemeyer:
There is something wrong with the pool size patch. On the buildbot machine "snapperkob" (Ubuntu 12.04 x86_64), gap is started with "o 53m" which is too few.
I'll try to reproduce this on sage.combinat. This is the only Ubuntu 12.04 x86_64 I presently have access to.
comment:182 Changed 9 years ago by
The machine has only 512MB of swap? The pool defaults to 1/10th of the total swap.
comment:183 followups: ↓ 184 ↓ 185 Changed 9 years ago by
I've changed the pool size computation to default to at least 75MB.
comment:184 in reply to: ↑ 183 Changed 9 years ago by
Replying to vbraun:
I've changed the pool size computation to default to at least 75MB.
yes, I was just reading your patch. It's my fault that I missed this potential problem before (I am aware of a "modern" school that teaches that swap is dead, as there is so much RAM nowadays...).
Let me test it now.
comment:185 in reply to: ↑ 183 ; followup: ↓ 186 Changed 9 years ago by
Replying to vbraun:
I've changed the pool size computation to default to at least 75MB.
No, you have set it to 75*1024**3, which is 75GB! You should do 75*1024**2 instead.
comment:186 in reply to: ↑ 185 Changed 9 years ago by
Replying to dimpase:
Replying to vbraun:
I've changed the pool size computation to default to at least 75MB.
no, you have set it to 75*1024**3, which is 75GB! You should do 75*1024**2 instead.
I also think that
suggested_size = max(int(mem.available_swap() / 10),
                     int(mem.available_ram() / 50),   # in case you run without swap
                     75 * 1024**2 )   # about 75MB is the minimum to run GAP
is way too much for machines with a lot of RAM. E.g. I tried this on sage.combinat and got 3.6GB as the suggested_size.
I'd rather propose
suggested_size = min(150 * 1024**2,
                     max(int(mem.available_swap() / 10),
                         int(mem.available_ram() / 50),   # in case you run without swap
                         75 * 1024**2 ))   # about 75MB is the minimum to run GAP
for 150MB is certainly good enough, but would not lead to problems if you have a hundred instances of Sage running on the same machine.
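To make the arithmetic of the two proposals concrete, here is a small stand-alone sketch. The function suggested_pool_size and its cap parameter are illustrative only; it takes the swap and RAM figures as plain byte counts instead of calling the mem object from the patch.

```python
MB = 1024 ** 2
GB = 1024 ** 3


def suggested_pool_size(available_swap, available_ram, floor=75 * MB, cap=None):
    """Pool size in bytes: a fraction of swap or RAM, never below floor.

    cap is an optional upper bound, as in the min(150 * 1024**2, ...) variant.
    """
    size = max(available_swap // 10, available_ram // 50, floor)
    if cap is not None:
        size = min(size, cap)
    return size


if __name__ == "__main__":
    # The unit slip discussed above: 1024**3 is a GB, 1024**2 is a MB.
    print(75 * 1024 ** 3 // GB, "GB versus", 75 * 1024 ** 2 // MB, "MB")
    # A machine with little swap and little RAM still gets the 75MB floor.
    print(suggested_pool_size(512 * MB, 1 * GB) // MB, "MB")
```

With a cap of 150MB the result always lies between 75MB and 150MB, whereas without a cap it grows with the machine.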
comment:187 Changed 9 years ago by
On 32bit systems:
sage -t -long -force_lib devel/sage/sage/misc/memory_info.py
**********************************************************************
File "/var/lib/buildbot/build/sage/arando1/arando_full/build/sage5.5.beta2/devel/sagemain/sage/misc/memory_info.py", line 350:
    sage: mem.total_ram()
Expected:
    4294967296
Got:
    4294967296L
**********************************************************************
comment:188 Changed 9 years ago by
Would it make sense, in spkg-install, to either remove the old GAP installation or (probably better) to move it to local/gap/gap4.4.12?
comment:189 followup: ↓ 190 Changed 9 years ago by
Thanks, fixed the MB... it's been a long time since I've seen computations not finish because of O(1024^2) bytes of storage missing ;)
I've also added the appropriate # 64-bit vs # 32-bit to the offending doctest.
The pool size is NOT allocated RAM, it is an anonymous mmap and will only be used if necessary. 150MB is definitely not enough for serious computations. If you want hundreds of Sage instances then that's fine, you just need enough swap. Or, if you prefer, you can set up your desktop without swap. Then the pool will be backed by actual RAM, but you hopefully were aware of that when you decided not to have a swap partition.
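The difference between reserving address space and using memory can be tried out with Python's mmap module. This is a sketch under assumptions: reserve_pool is a made-up name, and whether untouched pages consume RAM or swap depends on the operating system's overcommit behaviour, which the code does not control.

```python
import mmap


def reserve_pool(size_in_bytes):
    """Create an anonymous memory mapping of the requested size.

    On typical Linux setups the pages are only backed by real memory
    once they are written to; that is OS behaviour, not a guarantee.
    """
    return mmap.mmap(-1, size_in_bytes)


if __name__ == "__main__":
    pool = reserve_pool(64 * 1024 ** 2)  # reserve 64MB of address space
    pool[0:5] = b"hello"                 # touch only the first page
    print(len(pool), bytes(pool[0:5]))
    pool.close()
```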
Finally, I don't see any benefit in moving/deleting the previous GAP install. We'll just run into unintended consequences.
comment:190 in reply to: ↑ 189 Changed 9 years ago by
Replying to vbraun:
The pool size is NOT allocated RAM, it is an anonymous mmap and will only be used if necessary. 150MB is definitely not enough for serious computations. If you want hundreds of Sage instances then that's fine, you just need enough swap.
Why do we need to set 'o' at all? GAP can perfectly well start with only 'm' (i.e. the initial amount of memory) set. 'o' is meant to set the maximal amount of RAM to be available, at least according to GAP's docs.
comment:191 followup: ↓ 192 Changed 9 years ago by
If you don't specify '-o' then GAP will just choose a pool size for you. The virtual memory pool address space can't be changed after GAP has started. The actual memory used can expand and contract. Of course the actual memory used is bounded by the pool size.
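The difference between reserving a pool and using it can be illustrated with Python's `mmap` module. This is a minimal sketch, not GAP's allocator: the 64 MB size is an arbitrary stand-in, and the remark about lazy page backing assumes a typical Linux-style virtual memory system.

```python
import mmap

POOL_SIZE = 64 * 1024 ** 2  # hypothetical pool size, small enough to run anywhere


def reserve_pool(size):
    """Reserve `size` bytes of address space as an anonymous mapping."""
    # fileno -1 requests a mapping that is not backed by any file.
    return mmap.mmap(-1, size)


pool = reserve_pool(POOL_SIZE)
pool_length = len(pool)      # the size is fixed when the mapping is created
# Assumption: only pages that are written to need physical backing.
pool[0:5] = b"hello"
first_bytes = pool[0:5]
pool.close()
```

The mapping's length is fixed at creation, which mirrors the point that the pool address space cannot be changed once GAP is running.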
comment:192 in reply to: ↑ 191 ; followup: ↓ 193 Changed 9 years ago by
Replying to vbraun:
If you don't specify '-o' then GAP will just choose a pool size for you. The virtual memory pool address space can't be changed after GAP has started. The actual memory used can expand and contract. Of course the actual memory used is bounded by the pool size.
OK. Still, what about limiting the suggested_size to something (say, 500MB)? I agree that the 150MB I suggested above is on the low side.
Anyway, we should put somewhere in the documentation a remark that by default such-and-such max amount is allocated, but if you need more, then call set_gap_memory_pool_size() before doing any GAP stuff.
By the way, can one terminate, from Sage, all the running GAP subprocesses?
comment:193 in reply to: ↑ 192 ; followup: ↓ 195 Changed 9 years ago by
Replying to dimpase:
OK. Still, what about limiting the suggested_size to something (say, 500MB)? I agree that the 150MB I suggested above is on the low side.
I'm against absolute limits; it should be a fraction of the available resources. Otherwise you'll end up on record with "640kb is enough for everyone".
Anyway, we should put somewhere in the documentation a remark that by default such-and-such max amount is allocated, but if you need more, then call set_gap_memory_pool_size() before doing any GAP stuff.
It's in the set_gap_memory_pool_size docstring already.
By the way, can one terminate, from Sage, all the running GAP subprocesses?
The Sage expect interface does not keep a list of started subprocesses, so no. That would be a nice enhancement to the whole expect stuff, but please in another ticket.
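The enhancement mentioned here, keeping a list of started subprocesses so they can all be stopped, might look roughly like the following. This is a hypothetical sketch built on the standard `subprocess` module; `ProcessRegistry` is an invented name and nothing here reflects how the Sage expect interface is actually structured.

```python
import subprocess
import sys


class ProcessRegistry:
    """Hypothetical helper: remembers child processes so all can be stopped."""

    def __init__(self):
        self._children = []

    def spawn(self, argv):
        """Start a child process and remember it."""
        child = subprocess.Popen(argv)
        self._children.append(child)
        return child

    def terminate_all(self, timeout=5.0):
        """Ask every live child to exit, killing any that ignore the request."""
        for child in self._children:
            if child.poll() is None:  # still running
                child.terminate()
        for child in self._children:
            try:
                child.wait(timeout=timeout)
            except subprocess.TimeoutExpired:
                child.kill()
                child.wait()
        stopped = len(self._children)
        self._children.clear()
        return stopped


if __name__ == "__main__":
    registry = ProcessRegistry()
    registry.spawn([sys.executable, "-c", "import time; time.sleep(60)"])
    print(registry.terminate_all(), "process(es) stopped")
```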
comment:194 Changed 9 years ago by
 Status changed from needs_work to needs_review
comment:195 in reply to: ↑ 193 Changed 9 years ago by
Replying to vbraun:
Replying to dimpase:
OK. Still, what about limiting the suggested_size to something (say, 500MB)? I agree that the 150MB I suggested above is on the low side.
I'm against absolute limits; it should be a fraction of the available resources. Otherwise you'll end up on record with "640kb is enough for everyone".
I propose a limit that can be explicitly overridden, not something absolute. The reason is that on big machines a fraction of the resource can be too big to be meaningful, and will make sharing the machine very hard, as the first few processes will grab an enormous chunk of swap, and the remaining ones will suffer. I propose a limit that one would not even notice on a typical desktop (OK, if you think 500MB is too small, make it 750MB, or 1GB). And if you come to a super-duper machine to compute something huge then OK, please tell the system explicitly how much memory you might need.
As a matter of fact, I do not understand why GAP needs to reserve any swap at all. Just does not make sense to me. What is so special about GAP here?
comment:196 Changed 9 years ago by
By "absolute" limit I mean a fixed size (as opposed to relative to available resources). 640kb (or any other fixed amount) is not going to be good enough in the future.
The current GAP in Sage does soak up all the available swap and I don't see anybody complaining about it. It does make my desktop slow down to a crawl whenever the patchbot hits GAP, though. Maybe this patch is not the ideal solution, but it's a vast improvement over what we currently have.
As for the rest, read the mmap manpage. You either reserve swap or you get SIGSEGV when you run out of memory. And GAP is not written in a way that would handle that signal.
comment:197 Changed 9 years ago by
 Status changed from needs_review to positive_review
comment:198 Changed 9 years ago by
FWIW, this report https://groups.google.com/d/msg/sagerelease/qxv0lvoSfN4/dSYcyxOG1AJ mentions orphans with the previous version of GAP as well.
comment:199 Changed 9 years ago by
I had accidentally dropped the workaround for the gap -o 3*2^31 option parser overflow; it's back in trac_13211_pool_size.patch now.
Also, it seems the reporter on sage-release didn't get GAP orphans with the old version after all.
comment:200 followup: ↓ 202 Changed 9 years ago by
 Status changed from positive_review to needs_work
The memory size still isn't completely right, as I got this doctest error on snapperkob (Linux Ubuntu 12.04 x86_64, 8GB RAM + 0.5GB swap) and iras (Linux ia64, 4GB RAM + 2GB swap):
sage -t -long -force_lib devel/sage/sage/groups/matrix_gps/matrix_group_morphism.py
**********************************************************************
File "/home/buildbot/build/sage/iras1/iras_full/build/sage5.6.beta0/devel/sagemain/sage/groups/matrix_gps/matrix_group_morphism.py", line 229:
    sage: f = O.hom([r*x*r_ for x in O.gens()]) # long time (19s on sage.math, 2011)
Exception raised:
    Traceback (most recent call last):
      File "/home/buildbot/build/sage/iras1/iras_full/build/sage5.6.beta0/local/bin/ncadoctest.py", line 1231, in run_one_test
        self.run_one_example(test, example, filename, compileflags)
      File "/home/buildbot/build/sage/iras1/iras_full/build/sage5.6.beta0/local/bin/sagedoctest.py", line 38, in run_one_example
        OrigDocTestRunner.run_one_example(self, test, example, filename, compileflags)
      File "/home/buildbot/build/sage/iras1/iras_full/build/sage5.6.beta0/local/bin/ncadoctest.py", line 1172, in run_one_example
        compileflags, 1) in test.globs
      File "<doctest __main__.example_7[17]>", line 1, in <module>
        f = O.hom([r*x*r_ for x in O.gens()]) # long time (19s on sage.math, 2011)###line 229: sage: f = O.hom([r*x*r_ for x in O.gens()]) # long time (19s on sage.math, 2011)
      File "/home/buildbot/build/sage/iras1/iras_full/build/sage5.6.beta0/local/lib/python/sitepackages/sage/groups/matrix_gps/matrix_group.py", line 268, in hom
        return self.Hom(U)(x)
      File "/home/buildbot/build/sage/iras1/iras_full/build/sage5.6.beta0/local/lib/python/sitepackages/sage/groups/matrix_gps/homset.py", line 114, in __call__
        im_gens, check=check)
      File "/home/buildbot/build/sage/iras1/iras_full/build/sage5.6.beta0/local/lib/python/sitepackages/sage/groups/matrix_gps/matrix_group_morphism.py", line 75, in __init__
        phi0 = gap(self)
      File "/home/buildbot/build/sage/iras1/iras_full/build/sage5.6.beta0/local/lib/python/sitepackages/sage/interfaces/interface.py", line 197, in __call__
        return self._coerce_from_special_method(x)
      File "/home/buildbot/build/sage/iras1/iras_full/build/sage5.6.beta0/local/lib/python/sitepackages/sage/interfaces/interface.py", line 223, in _coerce_from_special_method
        return (x.__getattribute__(s))(self)
      File "sage_object.pyx", line 463, in sage.structure.sage_object.SageObject._gap_ (sage/structure/sage_object.c:4539)
      File "sage_object.pyx", line 439, in sage.structure.sage_object.SageObject._interface_ (sage/structure/sage_object.c:4139)
      File "/home/buildbot/build/sage/iras1/iras_full/build/sage5.6.beta0/local/lib/python/sitepackages/sage/interfaces/interface.py", line 195, in __call__
        return cls(self, x, name=name)
      File "/home/buildbot/build/sage/iras1/iras_full/build/sage5.6.beta0/local/lib/python/sitepackages/sage/interfaces/expect.py", line 1308, in __init__
        raise TypeError, x
    TypeError: Gap terminated unexpectedly while reading in a large line:
    Gap produced error output
    Error, exceeded the permitted memory (`-o' command line option)
    executing Read("/home/buildbot/.sage/temp/iras/26682/interface/tmp26707");
**********************************************************************
Since this never happened with the old GAP, I consider this a regression which should be fixed.
comment:201 Changed 9 years ago by
On hawk (OpenSolaris i386), I get this reproducible doctest error:
sage -t -long -force_lib devel/sage/sage/interfaces/gap.py
gap: halving pool size.
**********************************************************************
File "/export/home/buildbot/build/sage/hawk1/hawk_full/build/sage5.6.beta0/devel/sagemain/sage/interfaces/gap.py", line 376:
    sage: gap('"finished computation"'); gap.interrupt(); gap('"ok"')
Exception raised:
    Traceback (most recent call last):
      File "/export/home/buildbot/build/sage/hawk1/hawk_full/build/sage5.6.beta0/local/bin/ncadoctest.py", line 1231, in run_one_test
        self.run_one_example(test, example, filename, compileflags)
      File "/export/home/buildbot/build/sage/hawk1/hawk_full/build/sage5.6.beta0/local/bin/sagedoctest.py", line 38, in run_one_example
        OrigDocTestRunner.run_one_example(self, test, example, filename, compileflags)
      File "/export/home/buildbot/build/sage/hawk1/hawk_full/build/sage5.6.beta0/local/bin/ncadoctest.py", line 1172, in run_one_example
        compileflags, 1) in test.globs
      File "<doctest __main__.example_6[5]>", line 1, in <module>
        gap('"finished computation"'); gap.interrupt(); gap('"ok"')###line 376: sage: gap('"finished computation"'); gap.interrupt(); gap('"ok"')
      File "/export/home/buildbot/build/sage/hawk1/hawk_full/build/sage5.6.beta0/local/lib/python/sitepackages/sage/interfaces/gap.py", line 393, in interrupt
        E.sendline()
      File "/export/home/buildbot/build/sage/hawk1/hawk_full/build/sage5.6.beta0/local/lib/python/sitepackages/pexpect.py", line 677, in sendline
        n = n + self.send (os.linesep)
      File "/export/home/buildbot/build/sage/hawk1/hawk_full/build/sage5.6.beta0/local/lib/python/sitepackages/pexpect.py", line 669, in send
        c = os.write(self.child_fd, str)
    OSError: [Errno 22] Invalid argument
**********************************************************************
File "/export/home/buildbot/build/sage/hawk1/hawk_full/build/sage5.6.beta0/devel/sagemain/sage/interfaces/gap.py", line 1794:
    sage: gap_version()
Expected:
    doctest:...: DeprecationWarning: use gap.version() instead
    See http://trac.sagemath.org/13211 for details.
    '4.5.6'
Got:
    doctest:1: DeprecationWarning: use gap.version() instead
    See http://trac.sagemath.org/13211 for details.
    ** Gap crashed or quit executing 'VERSION;' **
    Restarting Gap and trying again
    '4.5.6'
**********************************************************************
It seems we're trying to write to a closed file descriptor.
comment:202 in reply to: ↑ 200 Changed 9 years ago by
Replying to jdemeyer:
The memory size still isn't completely right, as I got this doctest error on snapperkob (Linux Ubuntu 12.04 x86_64, 8GB RAM + 0.5GB swap) and iras (Linux ia64, 4GB RAM + 2GB swap):
These are cases where the hardcoded minimum for the GAP memory pool is used (since they have insufficient swap space). I'll try to find a minimal value that is sufficient to run the long doctests, not just to start up GAP.
comment:203 followup: ↓ 204 Changed 9 years ago by
 Status changed from needs_work to positive_review
Turns out we need 76MB for the long doctests, but we hardcoded 75MB. I've increased the minimum to 100MB to give us some headroom for future doctests. I'll see if I can reproduce the OpenSolaris bug on OpenIndiana (if anybody with access to hawk can debug it go for it, but I don't have an account). But that shouldn't stop us from shipping the update.
comment:204 in reply to: ↑ 203 ; followups: ↓ 205 ↓ 206 Changed 9 years ago by
 Status changed from positive_review to needs_work
Replying to vbraun:
I'll see if I can reproduce the OpenSolaris bug on OpenIndiana (if anybody with access to hawk can debug it go for it, but I don't have an account). But that shouldn't stop us from shipping the update.
A failure on a supported platform does stop the update...
comment:205 in reply to: ↑ 204 Changed 9 years ago by
Replying to jdemeyer:
Replying to vbraun:
I'll see if I can reproduce the OpenSolaris bug on OpenIndiana (if anybody with access to hawk can debug it go for it, but I don't have an account). But that shouldn't stop us from shipping the update.
A failure on a supported platform does stop the update...
I am having trouble building 5.5.rc1 on hawk. It errors out for no obvious reason.
comment:206 in reply to: ↑ 204 Changed 9 years ago by
Replying to jdemeyer:
A failure on a supported platform does stop the update...
The most glaring bug is of course that OpenSolaris is a fully supported platform. https://groups.google.com/d/topic/sagedevel/nRVBUQtL_d4/discussion
comment:207 Changed 9 years ago by
Another unrelated issue is that setting SAGE_DEBUG=yes doesn't correctly set -O0, as it's overwritten later in the command line by an -O2:
$ SAGE_DEBUG=yes ./sage -f gap4.5.6.p0.spkg
[...]
gcc -I. -I../.. -DCONFIG_H -I/release/merger/sage5.6.beta0/local/include -D__GMP_MP_RELEASE=50002 -O0 -g3 -DDEBUG_MASTERPOINTERS -DDEBUG_GLOBAL_BAGS -DDEBUG_FUNCTIONS_BAGS -Wall -g -O2 -o intfuncs.o -c ../../src/intfuncs.c
[...]
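When several `-O` options appear on a GCC command line, the last one is the one that takes effect, which is why the trailing `-O2` defeats the debug setting. A minimal sketch of that rule follows; `effective_optimization` is a hypothetical helper for inspecting a logged command line, not something from the spkg.

```python
import re
import shlex


def effective_optimization(command_line):
    """Return the last -O option in a compiler command line, or None."""
    level = None
    for token in shlex.split(command_line):
        # Matches -O, -O0..-O9, -Os, -Og and -Ofast; the lowercase
        # output option "-o" is not matched.
        if re.fullmatch(r"-O(\d|s|g|fast)?", token):
            level = token
    return level


cmd = "gcc -I. -DCONFIG_H -O0 -g3 -Wall -g -O2 -o intfuncs.o -c ../../src/intfuncs.c"
print(effective_optimization(cmd))
```

A check like this over the build log would flag any file compiled with a level other than the one requested.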
comment:208 Changed 9 years ago by
FYI: GAP 4.5.7 has been released, which fixes (amongst other things)
Numbers in memory options on the command line exceeding 2^{32} could not be parsed correctly, even on 64-bit systems. [Reported by Volker Braun]
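One way to keep the numeric part of a memory option small is to pass the size in megabytes with a unit suffix instead of as a raw byte count. The sketch below is an illustration of that idea only: it assumes the `m` suffix that GAP accepts for memory options, and `gap_memory_option` is a hypothetical helper, not necessarily the workaround used in trac_13211_pool_size.patch.

```python
def gap_memory_option(pool_size_bytes):
    """Format a pool size for a memory option, rounded up to whole megabytes."""
    if pool_size_bytes <= 0:
        raise ValueError("pool size must be positive")
    megabytes = -(-pool_size_bytes // 1024 ** 2)  # ceiling division
    return "%dm" % megabytes


# 3*2^31 bytes does not fit in 32 bits, but 6144 megabytes easily does.
print(gap_memory_option(3 * 2 ** 31))  # 6144m
```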
comment:209 Changed 9 years ago by
I reported the patch cflags.patch upstream, it should be added to the spkg.
comment:210 followup: ↓ 211 Changed 9 years ago by
The "Special Update/Build Instructions" are quite unclear to me. I'm trying to update the package to GAP 4.5.7 but it's difficult to understand what all this means:
This is a stripped-down version of GAP. The databases, which are arch independent, are in a separate package and doc and tests are removed.
** IMPORTANT **: When you update this package, be sure to put the guava package in the package directory!!
Delete some of the documentation:
    cd doc
    rm *.bbl *.aux *.dvi *.idx *.ilg *.l* *.m* *.pdf *.toc *.blg *.ind
    rm */*.bbl */*.aux */*.dvi */*.idx */*.ilg */*.l* */*.m* */*.pdf */*.ind */*.toc */*.blg
DATABASES (separated out to database_gap.spkg) except GAPDoc which is required:
    rm -rf small prim trans
    cd pkg
    rm -rf !(GAPDoc*)
Stuff that isn't GAP sources:
    rm -rf bin/*
    cd extern
    rm -rf !Makefile.in
comment:211 in reply to: ↑ 210 ; followup: ↓ 212 Changed 9 years ago by
Replying to jdemeyer:
The "Special Update/Build Instructions" are quite unclear to me. I'm trying to update the package to GAP 4.5.7 but it's difficult to understand what all this means:
This is a stripped-down version of GAP. The databases, which are arch independent, are in a separate package and doc and tests are removed. ** IMPORTANT **: When you update this package, be sure to put the guava package in the package directory!!
Somehow guava, the favorite GAP package of DJ, used to enjoy a special deal. But not anymore, and not for a while already. This needs to be updated.
GAP packages (a selection) go into the gap_packages optional package.
I hope the rest is clear.
Delete some of the documentation:
    cd doc
    rm *.bbl *.aux *.dvi *.idx *.ilg *.l* *.m* *.pdf *.toc *.blg *.ind
    rm */*.bbl */*.aux */*.dvi */*.idx */*.ilg */*.l* */*.m* */*.pdf */*.ind */*.toc */*.blg
DATABASES (separated out to database_gap.spkg) except GAPDoc which is required:
    rm -rf small prim trans
    cd pkg
    rm -rf !(GAPDoc*)
Stuff that isn't GAP sources:
    rm -rf bin/*
    cd extern
    rm -rf !Makefile.in
comment:212 in reply to: ↑ 211 Changed 9 years ago by
Replying to dimpase:
I hope the rest is clear.
It's not.
When updating the sources is nontrivial, it's best to create a shell script (called spkg-src) which downloads and prepares src/.
comment:213 Changed 9 years ago by
 Work issues set to Clarify src updates
I have a new spkg ready which includes two patches: cflags.patch and siginterrupt.patch. The latter one fixes the OpenSolaris issue and has also been reported upstream.
Somebody else should really clarify SPKG.txt on how to update src or write a shell script to do so.
comment:214 followup: ↓ 215 Changed 9 years ago by
Call me an optimist, but I'm still hoping that upstream will clean up their build system and dist tarball. The layout of their dist tarball and the required packages are also in flux, so I don't see the value of a shell script. Having said that, the instructions are written in a manner that you just have to cd into the gap* source directory and execute them. E.g. there are directories gap4r5/small, gap4r5/prim, and gap4r5/trans. Then by "rm -rf small prim trans" I mean to check that the directory layout hasn't changed and then delete these directories.
comment:215 in reply to: ↑ 214 Changed 9 years ago by
Replying to vbraun:
Having said that, the instructions are written in a manner that you just have to cd into the gap* source directory and execute them.
Not quite. I don't know what rm -rf !(GAPDoc*) is supposed to do:
jdemeyer@sage:~/spkg/gap4.5.7.p0/src/pkg$ rm -rf !(GAPDoc*)
bash: !: event not found
Besides, if you're writing out the commands in SPKG.txt, you might as well create an actual shell script which is easier to use than copy/pasting the code and putting the right "cd" statements in between.
comment:216 followup: ↓ 217 Changed 9 years ago by
You don't like extended shell globs? Then there is no concise way to match with exceptions, I think.
shopt +extglob
Both the mercurial and subversion bash completion scripts enable this by default.
comment:217 in reply to: ↑ 216 Changed 9 years ago by
comment:218 Changed 9 years ago by
I guess you mean
shopt -s extglob
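With extglob enabled, `!(GAPDoc*)` expands to every entry that does not match `GAPDoc*`. The same selection can be sketched without relying on shell options. The helper below is hypothetical and, unlike the shell glob, it also removes hidden (dot) entries, so it is an approximation rather than an exact equivalent.

```python
import fnmatch
import shutil
from pathlib import Path


def remove_all_except(directory, keep_pattern):
    """Delete every entry in `directory` whose name does not match `keep_pattern`."""
    removed = []
    for entry in sorted(Path(directory).iterdir()):
        if fnmatch.fnmatchcase(entry.name, keep_pattern):
            continue  # kept, e.g. a GAPDoc directory
        if entry.is_dir() and not entry.is_symlink():
            shutil.rmtree(entry)
        else:
            entry.unlink()
        removed.append(entry.name)
    return removed
```

Calling `remove_all_except("pkg", "GAPDoc*")` from the source directory would correspond to the `cd pkg` step followed by the extended glob.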
comment:219 Changed 9 years ago by
Yes, sorry.
comment:220 Changed 9 years ago by
 Description modified (diff)
comment:221 Changed 9 years ago by
 Status changed from needs_work to needs_review
I updated SPKG.txt and upgraded to GAP 4.5.7. Needs review...
comment:222 Changed 9 years ago by
 Work issues Clarify src updates deleted
comment:223 Changed 9 years ago by
 Summary changed from Upgrade GAP to 4.5.6 to Upgrade GAP to 4.5.7
comment:224 followups: ↓ 225 ↓ 229 Changed 9 years ago by
Did you check that nothing in the gap_packages and database_gap packages changed? Should I version bump them to 4.5.7 to be clear?
comment:225 in reply to: ↑ 224 ; followups: ↓ 226 ↓ 227 Changed 9 years ago by
Replying to vbraun:
Did you check that nothing in the gap_packages and database_gap packages changed? Should I version bump them to 4.5.7 to be clear?
At least some packages might have been changed (this is not in sync with the GAP releases, AFAIK).
Also, after updating with Jeroen's spkg, I still get gap.version 4.5.6, not 4.5.7.
comment:226 in reply to: ↑ 225 Changed 9 years ago by
Replying to dimpase:
Also, after updating with Jeroen's spkg, I still get gap.version 4.5.6, not 4.5.7.
Me, too. I verified that the upstream tarball shows the correct version (4.5.7) upon startup, so something is wrong with the spkg sources.
comment:227 in reply to: ↑ 225 Changed 9 years ago by
Replying to dimpase:
Also, after updating with Jeroen's spkg, I still get gap.version 4.5.6, not 4.5.7.
It works for me:
jdemeyer@sage:/release/merger/sage5.6.beta1$ ./sage -gap
********* GAP, Version 4.5.7 of 14-Dec-2012 (free software, GPL)
* GAP *   http://www.gapsystem.org
********* Architecture: x86_64-unknown-linux-gnu-gcc-default64
Libs used: gmp, readline
Loading the library and packages ...
Packages: GAPDoc 1.5.1
Try '?help' for help. See also '?copyright' and '?authors'
gap>
jdemeyer@sage:/release/merger/sage5.6.beta1$ ./sage
Sage Version 5.5.rc1, Release Date: 2012-12-18
Type "notebook()" for the browser-based notebook interface.
Type "help()" for help.
**********************************************************************
*                                                                    *
* Warning: this is a prerelease version, and it may be unstable.     *
*                                                                    *
**********************************************************************
sage: gap.version()
'4.5.7'
comment:228 Changed 9 years ago by
The doctest needs to be changed though to reflect version 4.5.7
comment:229 in reply to: ↑ 224 Changed 9 years ago by
Replying to vbraun:
Did you check that nothing in the gap_packages and database_gap packages changed?
No, I didn't check anything, nor did I realise that this was needed. Please go ahead and update them.
comment:230 followup: ↓ 232 Changed 9 years ago by
The gap/latest symlink is not correctly created, which is why the old version is still used if you installed gap4.5.6 previously. I'll fix that, too.
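Re-pointing a `latest` symlink when an older version is already installed can be done by creating the new link under a temporary name and renaming it over the old one. The following is a minimal sketch under that assumption, intended for POSIX systems; `point_latest_at` and the directory names are hypothetical and this is not the code in spkg-install.

```python
import os


def point_latest_at(parent_dir, version_dir, link_name="latest"):
    """Make parent_dir/link_name a symlink to version_dir, replacing any old link."""
    link = os.path.join(parent_dir, link_name)
    tmp = link + ".tmp"
    if os.path.lexists(tmp):
        os.remove(tmp)           # leftover from an interrupted run
    os.symlink(version_dir, tmp)  # relative target, resolved inside parent_dir
    os.replace(tmp, link)         # rename over the old link in one step
    return os.readlink(link)
```

Because the rename replaces the link itself rather than following it, a stale link to the previous version cannot survive the upgrade.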
comment:231 Changed 9 years ago by
 Description modified (diff)
comment:232 in reply to: ↑ 230 Changed 9 years ago by
Replying to vbraun:
The gap/latest symlink is not correctly created, which is why the old version is still used if you installed gap4.5.6 previously. I'll fix that, too.
ok
comment:233 Changed 9 years ago by
 Description modified (diff)
comment:234 Changed 9 years ago by
Nothing changed in the gap_packages and database_gap this time, but I bumped the version to match. In any case they need to be updated from the old gap4.4 ones.
comment:235 Changed 9 years ago by
 Status changed from needs_review to positive_review
Positive review to Jeroen's changes.
comment:236 followup: ↓ 237 Changed 9 years ago by
 Status changed from positive_review to needs_work
The memory size still isn't right, as I got this doctest error on snapperkob (Linux Ubuntu 12.04 x86_64, 8GB RAM + 0.5GB swap):
sage -t -long -force_lib devel/sage/sage/combinat/root_system/weyl_group.py
**********************************************************************
File "/home/buildbot/build/sage/snapperkob/snapperkob_full/build/sage5.6.beta1/devel/sagemain/sage/combinat/root_system/weyl_group.py", line 543:
    sage: all( WeylGroup(t).long_element() == WeylGroup(t).long_element_hardcoded() for t in types ) # long time (17s on sage.math, 2011)
Exception raised:
    Traceback (most recent call last):
      File "/home/buildbot/build/sage/snapperkob/snapperkob_full/build/sage5.6.beta1/local/bin/ncadoctest.py", line 1231, in run_one_test
        self.run_one_example(test, example, filename, compileflags)
      File "/home/buildbot/build/sage/snapperkob/snapperkob_full/build/sage5.6.beta1/local/bin/sagedoctest.py", line 38, in run_one_example
        OrigDocTestRunner.run_one_example(self, test, example, filename, compileflags)
      File "/home/buildbot/build/sage/snapperkob/snapperkob_full/build/sage5.6.beta1/local/bin/ncadoctest.py", line 1172, in run_one_example
        compileflags, 1) in test.globs
      File "<doctest __main__.example_16[4]>", line 1, in <module>
        all( WeylGroup(t).long_element() == WeylGroup(t).long_element_hardcoded() for t in types ) # long time (17s on sage.math, 2011)###line 543: sage: all( WeylGroup(t).long_element() == WeylGroup(t).long_element_hardcoded() for t in types ) # long time (17s on sage.math, 2011)
      File "<doctest __main__.example_16[4]>", line 1, in <genexpr>
        all( WeylGroup(t).long_element() == WeylGroup(t).long_element_hardcoded() for t in types ) # long time (17s on sage.math, 2011)###line 543: sage: all( WeylGroup(t).long_element() == WeylGroup(t).long_element_hardcoded() for t in types ) # long time (17s on sage.math, 2011)
      File "/home/buildbot/build/sage/snapperkob/snapperkob_full/build/sage5.6.beta1/local/lib/python/sitepackages/sage/combinat/root_system/weyl_group.py", line 579, in long_element_hardcoded
        return self.__call__(m)
      File "/home/buildbot/build/sage/snapperkob/snapperkob_full/build/sage5.6.beta1/local/lib/python/sitepackages/sage/combinat/root_system/weyl_group.py", line 362, in __call__
        if not gap(g) in gap(self):
      File "/home/buildbot/build/sage/snapperkob/snapperkob_full/build/sage5.6.beta1/local/lib/python/sitepackages/sage/interfaces/interface.py", line 675, in __contains__
        return P._contains(x.name(), self.name())
      File "/home/buildbot/build/sage/snapperkob/snapperkob_full/build/sage5.6.beta1/local/lib/python/sitepackages/sage/interfaces/gap.py", line 815, in _contains
        return self.eval('%s in %s'%(v1,v2)) == "true"
      File "/home/buildbot/build/sage/snapperkob/snapperkob_full/build/sage5.6.beta1/local/lib/python/sitepackages/sage/interfaces/gap.py", line 574, in eval
        result = Expect.eval(self, input_line, **kwds)
      File "/home/buildbot/build/sage/snapperkob/snapperkob_full/build/sage5.6.beta1/local/lib/python/sitepackages/sage/interfaces/expect.py", line 1220, in eval
        for L in code.split('\n') if L != ''])
      File "/home/buildbot/build/sage/snapperkob/snapperkob_full/build/sage5.6.beta1/local/lib/python/sitepackages/sage/interfaces/gap.py", line 775, in _eval_line
        raise RuntimeError, message
    RuntimeError: Gap produced error output
    Error, exceeded the permitted memory (`-o' command line option)
    executing $sage3 in $sage11;
**********************************************************************
comment:237 in reply to: ↑ 236 Changed 9 years ago by
Replying to jdemeyer:
The memory size still isn't right, as I got this doctest error on snapperkob (Linux Ubuntu 12.04 x86_64, 8GB RAM + 0.5GB swap):
Same on my OS X 10.6.8 laptop. Actually, just the following suffices:
$ sage
Sage Version 5.5.rc0, Release Date: 2012-11-17
Type "notebook()" for the browser-based notebook interface.
Type "help()" for help.
**********************************************************************
*                                                                    *
* Warning: this is a prerelease version, and it may be unstable.     *
*                                                                    *
**********************************************************************
sage: WeylGroup(['E',6]).long_element() == WeylGroup(['E',6]).long_element_hardcoded()
ERROR: An unexpected error occurred while tokenizing input
[...]
RuntimeError: Gap produced error output
Error, exceeded the permitted memory (`-o' command line option)
Repeating this line again in the same session produces no error, something I don't understand.
comment:238 Changed 9 years ago by
 Description modified (diff)
 Status changed from needs_work to positive_review
I see, the by far largest computation with GAP is actually in combinat and not in the group theory stuff %) We actually need about 220 MB, so I increased the minimum to 250 MB. I checked that make ptestlong runs without any further errors.
Dima, running the command twice probably uses values that were cached in Sage.
comment:239 Changed 9 years ago by
Volker, your patch is missing memory_info.py for some reason.
comment:240 Changed 9 years ago by
Also, something else I just noted: in #12221, I determined it was better to completely unset TERM instead of setting it to "dumb". I don't remember why, but I do remember that was the most reliable.
But I don't care very much since it seems to pass doctests...
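Starting a child process with TERM absent, as opposed to set to "dumb", amounts to removing the variable from the environment handed to the child. A minimal sketch with the standard `subprocess` module follows; the helper names are hypothetical and the child here is a plain Python interpreter standing in for GAP.

```python
import os
import subprocess
import sys


def environment_without(name, base=None):
    """Return a copy of the environment with variable `name` removed."""
    env = dict(os.environ if base is None else base)
    env.pop(name, None)
    return env


def child_sees_term():
    """Start a child without TERM and report whether it still sees the variable."""
    result = subprocess.run(
        [sys.executable, "-c", "import os; print('TERM' in os.environ)"],
        env=environment_without("TERM"),
        capture_output=True, text=True, check=True,
    )
    return result.stdout.strip()
```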
comment:241 Changed 9 years ago by
 Status changed from positive_review to needs_work
comment:242 Changed 9 years ago by
 Description modified (diff)
 Status changed from needs_work to positive_review
Strange that the file got lost, I didn't edit anything in that dir. I put memory_info.py back in and switched from "dumb" to no TERM.
comment:243 followup: ↓ 245 Changed 8 years ago by
 Merged in set to sage5.6.beta1
 Resolution set to fixed
 Status changed from positive_review to closed
And there was much rejoicing...
comment:244 Changed 8 years ago by
Thanks! \o/
comment:245 in reply to: ↑ 243 Changed 8 years ago by
Replying to jdemeyer:
And there was much rejoicing...
Congratulations! Although it means that I have to produce a new version of my group cohomology spkg; it seems that the new GAP version has a different syntax or function names or whatever else to cope with... :(
comment:246 Changed 8 years ago by
Both optional SPKGs are on their way around the world :)
comment:247 Changed 8 years ago by
The database_gap spkg on the mirrors seems to be corrupted, as does the one on Volker's page in the ticket description:
palmieri@sage:sage$ ./sage -i http://www.stp.dias.ie/~vbraun/Sage/spkg/database_gap4.5.7.spkg
Attempting to download package database_gap4.5.7
>>> Downloading database_gap4.5.7.spkg.
[............................................................]
database_gap4.5.7
====================================================
Extracting package /scratch/palmieri/sage5.5.rc0/spkg/optional/database_gap4.5.7.spkg
-rw-r--r-- 1 palmieri palmieri 59654144 2012-12-29 17:33 /scratch/palmieri/sage5.5.rc0/spkg/optional/database_gap4.5.7.spkg
tar: Skipping to next header
tar: Error exit delayed from previous errors
Error: failed to extract /scratch/palmieri/sage5.5.rc0/spkg/optional/database_gap4.5.7.spkg
comment:248 Changed 8 years ago by
Fixed, the file was truncated. This is the correct checksum:
[vbraun@volkerdesktop spkg]$ md5sum database_gap4.5.7.spkg
46b0a14437b1fe996cbbb482d00e5325 database_gap4.5.7.spkg
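A truncated download can be detected before extraction by comparing the file's MD5 digest with the published one. The sketch below uses the standard `hashlib` module; the helper names are hypothetical, and the expected digest would be the value posted above.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hexadecimal MD5 digest of a file, read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def is_intact(path, expected_md5):
    """True if the file's digest matches the published checksum."""
    return md5_of_file(path) == expected_md5.lower()
```

For example, `is_intact("database_gap4.5.7.spkg", "46b0a14437b1fe996cbbb482d00e5325")` would be expected to return False for the truncated file.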
comment:249 Changed 8 years ago by
OK, I've replaced the faulty file on the master and the md5 sum matches now. Mirrors are updating as you read this!
comment:250 Changed 8 years ago by
Just FYI, JP opened a follow-up at #13954 since this broke on Cygwin. No action necessary here, of course, and it looks like he has a fix in any case.
comment:251 followup: ↓ 252 Changed 8 years ago by
I get some weird stuff in interfaces/gap.py on OS X 10.4 with 5.6.beta2. Things like
sage: n = get_gap_memory_pool_size()
<snip>
    free_ram = int(free_ram[:-1]) * units[free_ram[-1]]
ValueError: invalid literal for int() with base 10: '33.1'
and lots and lots of other similar things. The new file misc/memory_info.py has similar problems. This was all introduced in this ticket, so I assume it is related... but maybe there was a followup I am unaware of? Thanks for any feedback.
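The ValueError above arises because `int()` rejects a string with a fractional part such as '33.1'. A more tolerant parser can go through `float()` first. This is a minimal sketch: the suffix table `UNITS` and the helper `parse_size` are assumptions made for illustration, not the contents of memory_info.py or the fix in any follow-up ticket.

```python
UNITS = {"K": 1024, "M": 1024 ** 2, "G": 1024 ** 3}  # assumed suffixes


def parse_size(text):
    """Convert a string such as '33.1M' or '512K' to a whole number of bytes."""
    text = text.strip()
    if text and text[-1].upper() in UNITS:
        number, factor = text[:-1], UNITS[text[-1].upper()]
    else:
        number, factor = text, 1
    # float() accepts both '33' and '33.1'; int() alone rejects the latter.
    return int(float(number) * factor)
```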
comment:252 in reply to: ↑ 251 ; followup: ↓ 253 Changed 8 years ago by
There is a follow-up at #13880, but if that doesn't fix your problem, please post to sage-devel so that this can be fixed before releasing the final Sage 5.6.
comment:253 in reply to: ↑ 252 ; followup: ↓ 254 Changed 8 years ago by
There is a follow-up at #13880, but if that doesn't fix your problem, please post to sage-devel so that this can be fixed before releasing the final Sage 5.6.
I saw that, but it didn't seem to be quite the same issue... but I'll try it!
comment:254 in reply to: ↑ 253 ; followup: ↓ 255 Changed 8 years ago by
There is a follow-up at #13880, but if that doesn't fix your problem, please post to sage-devel so that this can be fixed before releasing the final Sage 5.6.
I saw that, but it didn't seem to be quite the same issue... but I'll try it!
Sorry, it didn't seem to do it. I've posted to sage-devel.
comment:255 in reply to: ↑ 254 Changed 8 years ago by
(never mind)
comment:256 Changed 8 years ago by
This caused a serious slowdown in congruence subgroups: https://groups.google.com/forum/?fromgroups#!topic/sagedevel/e3EDIRLuJXA
comment:257 Changed 8 years ago by
Reported upstream at http://tracker.gapsystem.org/issues/221
From the GAP website: "The current version is GAP 4.5.5 released on July 17th, 2012."