Opened 6 years ago
Closed 2 years ago
#18267 closed enhancement (fixed)
libgap for PermutationGroup
Reported by:  vdelecroix  Owned by:  embray 

Priority:  major  Milestone:  sage8.9 
Component:  interfaces  Keywords:  fpsac2019 
Cc:  nthiery, tscrim  Merged in:  
Authors:  Erik Bray  Reviewers:  Frédéric Chapoton, Vincent Delecroix, Travis Scrimshaw 
Report Upstream:  N/A  Work issues:  
Branch:  65f8a84 (Commits, GitHub, GitLab)  Commit:  65f8a84b3965bf80fc3b7fb6018a1be753f3eab2 
Dependencies:  #27946, #28354  Stopgaps: 
Description
As remarked in this question on ask.sagemath.org a lot of time is spent with pexpect when playing with permutation groups.
In this ticket:
 we interface libgap permutation group so that the following works
sage: p1=libgap.eval("(1,2,3)") sage: p2=libgap.eval("(3,4,5)") sage: p1.sage() (1,2,3) sage: G=libgap.Group([p1,p2]) sage: G.sage() Traceback (most recent call last): ... NotImplementedError: cannot construct equivalent Sage object
 we check that interesting GAP functions such as
ClosureGroup
,AllBlocks
, are available from libgap  we use them in
PermutationGroup
inplace of the current gap version
Change History (40)
comment:1 Changed 2 years ago by
 Milestone changed from sage6.7 to sagewishlist
comment:2 Changed 2 years ago by
comment:3 Changed 2 years ago by
 Owner changed from (none) to embray
I have a prototype for this in progress that is shaping up pretty well so far. I will likely need help with this part:
we check that interesting GAP functions such as ClosureGroup?, AllBlocks?, are available from libgap
as I don't know all the math well enough to write interesting tests for this. But I can get most of the skunkworks done and then anyone else who's interested can add on to it.
comment:4 Changed 2 years ago by
The I/O overhead on Cygwin adds up to something on the order of 20x in this test (specially, the _test_cellular
test). I tracked this down to being entirely coming from the pexpect communication. On Linux:
sage t long src/sage/combinat/symmetric_group_algebra.py [422 tests, 30.17 s]
On Cygwin (different machine, but with reasonably good specs):
sage t long src/sage/combinat/symmetric_group_algebra.py [422 tests, 613.16 s]
More than 10 minutes on this one test! It's by far one of the slowest tests on Cygwin. With my prototype of getting pexpect out of permutation groups I knocked this result down to (again on Cygwin):
sage t long src/sage/combinat/symmetric_group_algebra.py [422 tests, 25.49 s]
comment:5 Changed 2 years ago by
 Milestone changed from sagewishlist to sagepending
Unfortunately src/sage/groups/perm_gps/permgroup_named.py
still remains by far one of the slowest tests on Cygwin despite my fixes. Unclear as of yet why; maybe several tests using other code that still uses pexpect, or something else...
comment:6 Changed 2 years ago by
Well I found one more pexpect use in permgroup_named; fixing that one knocked it down from ~5 minutes to
sage t long src/sage/groups/perm_gps/permgroup_named.py [509 tests, 47.12 s]
Which is still about 5 times slower for some reason than that test is on Linux, but progress!
Some of the tests also intentionally use the pexpect interface directly, so nothing to do about that for now.
comment:7 Changed 2 years ago by
 Branch set to u/embray/prototype/permgpsnopexpectgap
 Cc nthiery added
 Commit set to 75d6b764cac83c2930dff648605de98adbc506db
 Dependencies set to #27946
 Status changed from new to needs_review
Here is my current prototype for getting pexpect out of permutation groups, and having them use libgap exclusively. For the large part it is quite effective at this. This fixes (among many other examples):
sage: G = SymmetricGroup(3) sage: %prun G.one() sage: G.one() () sage: gap._expect is None
where previously this was invoking a GAP process through pexpect.
Last 10 new commits:
618a0d4  Fix several predicate methods of permutation groups to use libgap.

7a0dd0e  Use libgap for permutation group comparison.

cc5381d  It seems these tests broke again.

d4e81e8  Use libgap in several more methods where it was trivial to make the change, and where it at least didn't break any local tests.

2e60ebe  These tests were marked 'random output' despite the explicit calls to set_random_seed(0), which should have obviated this.

3c8954a  Use libgap for direct_product

220c5e7  Use libgap for character_table

5ab41fb  Some fixes to make ClassFunction work properly with libgap.

db4eeda  Use libgap for PermutationGroup_generic._regular_subgroup_gap

75d6b76  Use libgap for PermutationGroup_generic.molien_series

comment:8 Changed 2 years ago by
 Status changed from needs_review to needs_work
Unfortunately there is still a handful of failing tests with this.
comment:9 Changed 2 years ago by
I think most of the problems are coming from my hacky attempt to make some GapElement
objects pickleable in fcfde85 which can make some surprising results, and is maybe not quite right yet. It should probably be handled in a separate ticket.
comment:10 Changed 2 years ago by
 Commit changed from 75d6b764cac83c2930dff648605de98adbc506db to c02c2f11a2c2228851419eb1aadd32fde75a2ad4
Branch pushed to git repo; I updated commit sha1. This was a forced push. Last 10 new commits:
075a48a  It seems these tests broke again.

dcc0b11  Use libgap in several more methods where it was trivial to make the change, and where it at least didn't break any local tests.

96c55c3  These tests were marked 'random output' despite the explicit calls to set_random_seed(0), which should have obviated this.

13b0e3c  Use libgap for direct_product

6928f69  Use libgap for character_table

77513fe  Some fixes to make ClassFunction work properly with libgap.

edf2a5d  Use libgap for PermutationGroup_generic._regular_subgroup_gap

6fbdbd1  Use libgap for PermutationGroup_generic.molien_series

42318a9  Fix minor test failure: This will now fail at trying to find a _libgap_init_ rather than a _gap_init_

c02c2f1  If the values passed to the ClassFunction constructor are already a libgap

comment:11 Changed 2 years ago by
Fixed the problems that I knew about, though there are likely still others. Although this is set to "needs_work" it would probably be interestingenough to start playing around with if anyone wants to review.
comment:12 Changed 2 years ago by
With the current version of this branch almost everything works except for a few trivial doctest failures due to new pseudorandom results in some tests.
However, I am also getting a bizarre memory error when running the tests for sage.combinat.tableau
that's similar to what I reported in #27681 (but later couldn't reproduce).
comment:14 Changed 2 years ago by
If anyone would like to help test this that would be great for moving forward on this work. In particular, it would be helpful to know if anyone can reproduce the test failure in sage.combinat.tableau
, since from my end that is the only major blocker to completing this work.
comment:15 Changed 2 years ago by
 Commit changed from c02c2f11a2c2228851419eb1aadd32fde75a2ad4 to cc6491009070408c9ed5dde6317ee0ac229458ca
Branch pushed to git repo; I updated commit sha1. New commits:
cc64910  Fixed slightly different output on a few tests due to PRNG differences related

comment:16 Changed 2 years ago by
 Status changed from needs_work to needs_review
All tests now pass for me on this ticket (have not tried optional packages).
The one bug I mentioned in comment:14 is now tracked at #28106 and is not directly caused by the changes I made here; rather it only exhibited due to the fact that using libgap in more places meant slightly more virtual memory was being allocated in some tests than previously, but this is expected.
So if you run into the bug with MemoryError
s in sage.combinat.tableau
you can pass the test runner memlimit=4000
or something like that.
comment:17 Changed 2 years ago by
I have tested a bit with python3. All doctests pass in tableaurelated files and in the modified files.
comment:18 Changed 2 years ago by
Tests pass with surface_dynamics that uses PermutationGroup
. In the same library, I also replaced many calls to gap
by libgap
and everything went smoothly.
Thanks for working on this!
comment:19 Changed 2 years ago by
 Milestone changed from sagepending to sage8.9
 Reviewers set to Frédéric Chapoton, Vincent Delecroix, Travis Scrimshaw
 Status changed from needs_review to positive_review
Everything also LGTM.
comment:20 Changed 2 years ago by
 Keywords fpsac2019 added
comment:21 Changed 2 years ago by
 Dependencies changed from #27946 to #27946, #28106
comment:22 Changed 2 years ago by
 Milestone changed from sage8.9 to sagepending
comment:23 Changed 2 years ago by
Any reason this is sagepending?
comment:24 Changed 2 years ago by
dependency needing work
comment:25 Changed 2 years ago by
 Dependencies changed from #27946, #28106 to #27946
 Milestone changed from sagepending to sage8.9
I think I'll just remove the dependency. It's not actually a dependency for this ticket, especially since my solution proposed there still doesn't seem to work. The relation between that issue and this ticket is much more indirect than I previously thought.
comment:26 Changed 2 years ago by
 Status changed from positive_review to needs_work
This breaks the gap interfaces when building with SAGE_DEBUG=yes:
********************************************************************** File "src/sage/interfaces/gap.py", line 732, in sage.interfaces.gap.Gap_generic._eval_line Failed example: a = gap(3) Exception raised: Traceback (most recent call last): File "/home/buildbotsage/slave/sage_git/build/local/lib/python2.7/sitepackages/sage/doctest/forker.py", line 681, in _run self.compile_and_execute(example, compiler, test.globs) File "/home/buildbotsage/slave/sage_git/build/local/lib/python2.7/sitepackages/sage/doctest/forker.py", line 1105, in compile_and_execute exec(compiled, globs) File "<doctest sage.interfaces.gap.Gap_generic._eval_line[8]>", line 1, in <module> a = gap(Integer(3)) File "/home/buildbotsage/slave/sage_git/build/local/lib/python2.7/sitepackages/sage/interfaces/interface.py", line 295, in __call__ result = self._coerce_from_special_method(x) File "/home/buildbotsage/slave/sage_git/build/local/lib/python2.7/sitepackages/sage/interfaces/interface.py", line 323, in _coerce_from_special_method return (x.__getattribute__(s))(self) File "sage/structure/sage_object.pyx", line 693, in sage.structure.sage_object.SageObject._gap_ (build/cythonized/sage/structure/sage_object.c:5929) return self._interface_(G) File "sage/structure/sage_object.pyx", line 669, in sage.structure.sage_object.SageObject._interface_ (build/cythonized/sage/structure/sage_object.c:5475) X = I(s) File "/home/buildbotsage/slave/sage_git/build/local/lib/python2.7/sitepackages/sage/interfaces/interface.py", line 288, in __call__ return cls(self, x, name=name) File "/home/buildbotsage/slave/sage_git/build/local/lib/python2.7/sitepackages/sage/interfaces/expect.py", line 1438, in __init__ self._name = parent._create(value, name=name) File "/home/buildbotsage/slave/sage_git/build/local/lib/python2.7/sitepackages/sage/interfaces/interface.py", line 491, in _create self.set(name, value) File "/home/buildbotsage/slave/sage_git/build/local/lib/python2.7/sitepackages/sage/interfaces/gap.py", line 1411, in set self._eval_line(cmd, allow_use_file=True) File "/home/buildbotsage/slave/sage_git/build/local/lib/python2.7/sitepackages/sage/interfaces/gap.py", line 746, in _eval_line expect_eof= (self._quit_string() in line)) File "/home/buildbotsage/slave/sage_git/build/local/lib/python2.7/sitepackages/sage/interfaces/gap.py", line 609, in _execute_line x = E.expect_list(self._compiled_full_pattern) File "/home/buildbotsage/slave/sage_git/build/local/lib/python2.7/sitepackages/pexpect/spawnbase.py", line 369, in expect_list return exp.expect_loop(timeout) File "/home/buildbotsage/slave/sage_git/build/local/lib/python2.7/sitepackages/pexpect/expect.py", line 111, in expect_loop incoming = spawn.read_nonblocking(spawn.maxread, timeout) File "/home/buildbotsage/slave/sage_git/build/local/lib/python2.7/sitepackages/pexpect/pty_spawn.py", line 469, in read_nonblocking self.isalive() File "/home/buildbotsage/slave/sage_git/build/local/lib/python2.7/sitepackages/pexpect/pty_spawn.py", line 704, in isalive alive = ptyproc.isalive() File "/home/buildbotsage/slave/sage_git/build/local/lib/python2.7/contextlib.py", line 35, in __exit__ self.gen.throw(type, value, traceback) File "/home/buildbotsage/slave/sage_git/build/local/lib/python2.7/sitepackages/pexpect/pty_spawn.py", line 25, in _wrap_ptyprocess_err raise ExceptionPexpect(*e.args) ExceptionPexpect: isalive() encountered condition where "terminated" is 0, but there was no child process. Did someone else call waitpid() on our process? ********************************************************************** [...] [227 tests, 92 failures, 42.83 s]  sage t long src/sage/interfaces/gap.py # 92 doctests failed 
comment:27 Changed 2 years ago by
 Status changed from needs_work to needs_info
I'm not really sure what makes you think that has anything to do with this ticket. That looks like the gap interpreter itself is crashing unexpectedly when running a debug build. But this ticket doesn't change anything about gap itself or the pexpect interface.
comment:28 Changed 2 years ago by
I only ran the tests with and without this ticket, I haven't looked at the code. If it is as you say then presumably its some sort of resource exhaustion while running the tests.
comment:29 Changed 2 years ago by
I'll still check if I can reproduce. It could be something like #28106
comment:30 Changed 2 years ago by
I can reproduce it, even with memlimit=0
. Truly baffling, but I guess there must be something...
comment:31 Changed 2 years ago by
Somehow it seems this has caused a regression of something that was supposed to be fixed by #10296. The problems start here:
The following tests against a bug fixed at :trac:`10296`:: sage: gap(3) 3 sage: gap.eval('quit;') '' sage: a = gap(3) ** Gap crashed or quit executing '\$sage...:=3;;' ** Restarting Gap and trying again sage: a 3
Instead of gap crashing normally and restarting, we get this error instead:
Exception raised: Traceback (most recent call last): File "/home/buildbotsage/slave/sage_git/build/local/lib/python2.7/sitepackages/sage/doctest/forker.py", line 681, in _run self.compile_and_execute(example, compiler, test.globs) File "/home/buildbotsage/slave/sage_git/build/local/lib/python2.7/sitepackages/sage/doctest/forker.py", line 1105, in compile_and_execute exec(compiled, globs) File "<doctest sage.interfaces.gap.Gap_generic._eval_line[8]>", line 1, in <module> a = gap(Integer(3)) File "/home/buildbotsage/slave/sage_git/build/local/lib/python2.7/sitepackages/sage/interfaces/interface.py", line 295, in __call__ result = self._coerce_from_special_method(x) File "/home/buildbotsage/slave/sage_git/build/local/lib/python2.7/sitepackages/sage/interfaces/interface.py", line 323, in _coerce_from_special_method return (x.__getattribute__(s))(self) File "sage/structure/sage_object.pyx", line 693, in sage.structure.sage_object.SageObject._gap_ (build/cythonized/sage/structure/sage_object.c:5929) return self._interface_(G) File "sage/structure/sage_object.pyx", line 669, in sage.structure.sage_object.SageObject._interface_ (build/cythonized/sage/structure/sage_object.c:5475) X = I(s) File "/home/buildbotsage/slave/sage_git/build/local/lib/python2.7/sitepackages/sage/interfaces/interface.py", line 288, in __call__ return cls(self, x, name=name) File "/home/buildbotsage/slave/sage_git/build/local/lib/python2.7/sitepackages/sage/interfaces/expect.py", line 1438, in __init__ self._name = parent._create(value, name=name) File "/home/buildbotsage/slave/sage_git/build/local/lib/python2.7/sitepackages/sage/interfaces/interface.py", line 491, in _create self.set(name, value) File "/home/buildbotsage/slave/sage_git/build/local/lib/python2.7/sitepackages/sage/interfaces/gap.py", line 1411, in set self._eval_line(cmd, allow_use_file=True) File "/home/buildbotsage/slave/sage_git/build/local/lib/python2.7/sitepackages/sage/interfaces/gap.py", line 746, in _eval_line expect_eof= (self._quit_string() in line)) File "/home/buildbotsage/slave/sage_git/build/local/lib/python2.7/sitepackages/sage/interfaces/gap.py", line 609, in _execute_line x = E.expect_list(self._compiled_full_pattern) File "/home/buildbotsage/slave/sage_git/build/local/lib/python2.7/sitepackages/pexpect/spawnbase.py", line 369, in expect_list return exp.expect_loop(timeout) File "/home/buildbotsage/slave/sage_git/build/local/lib/python2.7/sitepackages/pexpect/expect.py", line 111, in expect_loop incoming = spawn.read_nonblocking(spawn.maxread, timeout) File "/home/buildbotsage/slave/sage_git/build/local/lib/python2.7/sitepackages/pexpect/pty_spawn.py", line 469, in read_nonblocking self.isalive() File "/home/buildbotsage/slave/sage_git/build/local/lib/python2.7/sitepackages/pexpect/pty_spawn.py", line 704, in isalive alive = ptyproc.isalive() File "/home/buildbotsage/slave/sage_git/build/local/lib/python2.7/contextlib.py", line 35, in __exit__ self.gen.throw(type, value, traceback) File "/home/buildbotsage/slave/sage_git/build/local/lib/python2.7/sitepackages/pexpect/pty_spawn.py", line 25, in _wrap_ptyprocess_err raise ExceptionPexpect(*e.args) ExceptionPexpect: isalive() encountered condition where "terminated" is 0, but there was no child process. Did someone else call waitpid() on our process?
after which point the gap interface never recovers. It's only happening in the tests though. I can't reproduce interactively. Running the tests with serial
makes no difference.
comment:32 Changed 2 years ago by
 Commit changed from cc6491009070408c9ed5dde6317ee0ac229458ca to 65f8a84b3965bf80fc3b7fb6018a1be753f3eab2
Branch pushed to git repo; I updated commit sha1. This was a forced push. Last 10 new commits:
2fc1096  These tests were marked 'random output' despite the explicit calls to set_random_seed(0), which should have obviated this.

b123422  Use libgap for direct_product

f35a967  Use libgap for character_table

4aacf0f  Some fixes to make ClassFunction work properly with libgap.

4fe1132  Use libgap for PermutationGroup_generic._regular_subgroup_gap

3ef4b4b  Use libgap for PermutationGroup_generic.molien_series

9205aa6  Fix minor test failure: This will now fail at trying to find a _libgap_init_ rather than a _gap_init_

0d0e1bb  If the values passed to the ClassFunction constructor are already a libgap

c8c5ecf  Fixed slightly different output on a few tests due to PRNG differences related

65f8a84  Update this test to still use the gap pexpect interface.

comment:33 Changed 2 years ago by
 Status changed from needs_info to needs_review
Rebased on current develop, and the latest commit fixes the immediate problem in the test suite.
I believe there is still an underlying problem with libgap/gap interaction which is not yet addressed though (and is probably not new as of this ticket, but just more likely to occur with libgap being used more extensively).
comment:34 Changed 2 years ago by
This might be related to https://github.com/gapsystem/gap/issues/3380 , with libgap wait()
ing on random child processes inappropriatly.
comment:35 Changed 2 years ago by
It seems that, at least theoretically, we already do disable GAP's default SIGCHLD
handler.
comment:36 Changed 2 years ago by
 Status changed from needs_review to positive_review
Bizarrely, even after backing out that last test update, and rebuilding, I can't reproduce the issue anymore either. I think it might have actually had something to do with sagecleaner.
Please send this to the buildbots again. I really don't think the issue is caused by this ticket, but is something more ephemeral that it happens to provoke somehow.
comment:37 Changed 2 years ago by
Yes, this has got to have something to do with sagecleaner
. If I just start a sagecleaner
process on its own, then run ./sage t src/sage/interfaces/gap.py
I can reproduce the issue reliably.
comment:38 Changed 2 years ago by
comment:39 Changed 2 years ago by
 Dependencies changed from #27946 to #27946, #28354
comment:40 Changed 2 years ago by
 Branch changed from u/embray/prototype/permgpsnopexpectgap to 65f8a84b3965bf80fc3b7fb6018a1be753f3eab2
 Resolution set to fixed
 Status changed from positive_review to closed
The overhead of using pexpect for GAP is especially serious on Cygwin. I believe it to be one of the worst performance hits there, due to the I/O overhead.
It's so slow in fact that I wonder if there isn't a bug/defect in Cygwin related to this. It's normal that there is some additional overhead involved for I/O on Cygwin, but this is especially bad.