Opened 5 years ago

Last modified 3 years ago

#22626 closed enhancement

Upgrade to GAP 4.10 — at Version 305

Reported by: nthiery Owned by:
Priority: critical Milestone: sage-8.6
Component: interfaces Keywords: days85, libgap
Cc: alexk, dimpase, embray, fbissey, arojas, gh-sebasguts, jpflori, markuspf, nthiery, slelievre, vbraun, wstein, gh-timokau Merged in:
Authors: Nicolas M. Thiéry, Dima Pasechnik, Erik Bray, Jeroen Demeyer Reviewers:
Report Upstream: N/A Work issues: fix libgap workspace loading, etc, and much more work...
Branch: u/embray/spkgs/gap-410 (Commits, GitHub, GitLab) Commit: 7ccc120e3a606e044bdd9ef6835930075fa70a31
Dependencies: Stopgaps: error handling in libgap, documentation display

Status badges

Description (last modified by embray)

GAP 4.10 comes with a completely rewritten build system that will simplify our packaging. In particular, libGAP no longer needs to be a separate package.

What the branch does:

  • Remove the libgap spkg
  • Update the gap spkg to the new build system and build and install libgap
  • Replace gap.shi.patch by a plain gap startup script for Sage

Rationale: GAP used to provide a startup shell script. The GAP devs are in the process of getting rid of it and provide a very minimal one. They recommend to just write our own rather than patching it.

  • Update a few doctests w.r.t. changes of output of some GAP functions
  • Reworks how gap is installed in $SAGE_LOCAL: Rather than installing the entire source tree we just install the gap and libgap binaries to standard locations, and add a $SAGE_LOCAL/share/gap containing the GAP_ROOT_DIR which is stripped down to the minimum needed for GAP to work (standard libs and packages, docs, as well as the source code for introspection of kernel functions, but all build artifacts are carefully omitted).
  • Some of how this is done is still broken w.r.t. building compiled packages; need to work on it a bit more.
  • - Possibly controversial: The new libgap currently 'does not come' with symbol rewriting (foobar -> libGAP_foobar). This avoids messing around with GAP's sources; in particular opening the door for using a stock GAP from the OS distribution. However there always is a risk of name conflicts. And indeed, GAP's enums T_INT, T_FLOAT, ... conflict with Python's constants defined in structmember.h. This is hopefully not actually a problem in practice due to the way how Cython orders includes.

Something similar was started by Volker at #19915.

  • Removes configure.patch: it was patching configure for better GMP detection under Cygwin (#13954). This should not be needed anymore with the new build system and use of --with-gmp. If it is, upstream asked for it to be reported and they will fix it.
  • Removes additional patches for now--we would like to focus on being able to work with a system GAP as much as possible.
  • Revert #19726 (not needed anymore)

Status

Currently broken - crashes deep inside GAP error handling system after few simple commands.

Basic tests on libgap:

sage: libgap.eval("GAPInfo.Version")
sage: libgap.DihedralGroup(10).CharacterTable()
CharacterTable( <pc group of size 10 with 2 generators> )
sage: libgap.Group(libgap.eval("[(1,2,3),(1,2)]")).Size()
6

Running most relevant tests:

sage -tp 8 sage/groups sage/libs/gap

Current status: lots of errors

  • Still have some miscellaneous segfaults and other weird crashes
  • Still have work to do on improving error handling; replacing the built-in ErrorInner function might help here.
  • SIGINT handling by Python is broken by GAP_Intialize; need to work around this.

Testing packages with dynamic loading (e.g. IO):

Install IO:

cd $SAGE_LOCAL/share/gap/pkg
wget https://www.gap-system.org/pub/gap/gap4/tar.gz/packages/io-4.5.4.tar.gz
tar xvf io-4.5.4.tar.gz
cd io-4.5.4
./configure --with-gaproot=../..
make

Test it locally:

cd $SAGE_ROOT
gap
gap> LoadPackage("IO");
true

This does not yet work:

#W dlopen() error: /home/embray/src/sagemath/sage/local/share/gap/pkg/io-4.5.4/bin/x86_64-pc-linux-gnu-default64/io.so: undefine\
d symbol: ChangedBags
Error, module '/home/embray/src/sagemath/sage/local/share/gap/pkg/io-4.5.4/bin/x86_64-pc-linux-gnu-default64/io.so' not found
Error, was not in any namespace
Syntax warning: Unbound global variable in /home/embray/src/sagemath/sage/local/share/gap/lib/init.g:803
    ColorPrompt( UserPreference( "UseColorPrompt" ) );
               ^
Error, SetGasmanMessageStatus: function is not yet defined
true

This should be fixed once GAP's gap binary is built on top of libgap. See: https://github.com/markuspf/gap/issues/1 I believe this is fixed, but there are still some problems with the way this ticket is "installing" GAP for $SAGE_LOCAL.

Note:

  • Max Horn reviewed the list of GAP symbols we use in Sage; some have already changed in 4.9. See this pad for notes.

Upstream PRs

A few pull requests we made to improve use of GAP as a library:

Other open issues that don't have PRs yet:

Some other PRs useful to this effort:

Tarball: https://www.gap-system.org/pub/gap/gap-4.10/tar.gz/gap-4.10.0.tar.gz

Change History (306)

comment:1 Changed 5 years ago by nthiery

  • Description modified (diff)

comment:2 Changed 5 years ago by nthiery

  • Description modified (diff)

comment:3 Changed 5 years ago by nthiery

  • Description modified (diff)

comment:4 Changed 5 years ago by nthiery

  • Description modified (diff)

comment:5 Changed 5 years ago by nthiery

  • Description modified (diff)

comment:6 Changed 5 years ago by nthiery

  • Branch set to u/nthiery/upgrade_to_gap_4_9

comment:7 Changed 5 years ago by git

  • Commit set to 3011ac0908d667c0f245ca21859e336511106b5f

Branch pushed to git repo; I updated commit sha1. New commits:

1c645e0#22626: remove GAP's symbol prefixing in libgap: libGAP_Foo -> Foo
1ecc67b#22626: doctest update w.r.t. minor changes of output in GAP
e8ebbca#22626: GMP detection patch for cygwin should not be needed anymore
fd65b06#22626: Remove libgap spkg
9511733#22626: replace patch for GAP's startup script template in favor of a custom script
550625e#22626: remove GAP's symbol prefixing in libgap: libGAP_Foo -> Foo, and workaround GAP <-> Python symbol conflict
a9c859c#22626: updated gap spkg w.r.t. GAP's devel version and its new build system; also include compilation and installation of libgap
3011ac0Merge branch 'develop' into t/22626/upgrade_to_gap_4_9

comment:8 Changed 5 years ago by nthiery

  • Description modified (diff)

comment:9 Changed 5 years ago by nthiery

  • Description modified (diff)

comment:10 Changed 5 years ago by nthiery

  • Description modified (diff)

comment:11 Changed 5 years ago by nthiery

  • Description modified (diff)

comment:12 Changed 5 years ago by nthiery

  • Description modified (diff)

comment:13 Changed 5 years ago by nthiery

  • Cc vbraun dimpase wstein added
  • Keywords days85 libgap added
  • Work issues set to Wait for gap 4.9 release

comment:14 Changed 5 years ago by nthiery

  • Description modified (diff)

comment:15 Changed 5 years ago by fbissey

  • Cc fbissey added

comment:16 Changed 5 years ago by git

  • Commit changed from 3011ac0908d667c0f245ca21859e336511106b5f to 234c54b4b7e0495e343c3ab1b925e2c99f37e391

Branch pushed to git repo; I updated commit sha1. New commits:

234c54b#22626: revert #19726 as it won't be needed for gap 4.9

comment:17 Changed 5 years ago by nthiery

  • Description modified (diff)

Changed 5 years ago by nthiery

comment:18 Changed 5 years ago by nthiery

  • Description modified (diff)

comment:19 Changed 5 years ago by git

  • Commit changed from 234c54b4b7e0495e343c3ab1b925e2c99f37e391 to 431845f6da27d20c78b5e7a42b5f47bea866201c

Branch pushed to git repo; I updated commit sha1. New commits:

b301e4c#22626: ReST fix
431845f#22626: better work around for the GAP-Python name clash on T_INT, ...

comment:20 Changed 5 years ago by nthiery

  • Description modified (diff)

New commits:

b301e4c#22626: ReST fix
431845f#22626: better work around for the GAP-Python name clash on T_INT, ...

comment:21 Changed 5 years ago by git

  • Commit changed from 431845f6da27d20c78b5e7a42b5f47bea866201c to 7c04025083d61cab671df3302c32f353c4e28313

Branch pushed to git repo; I updated commit sha1. New commits:

7c0402522626: document the work around for the GAP-Python name clash on T_INT, ...

comment:22 follow-ups: Changed 5 years ago by fbissey

So if I understand the mechanics involved, at the moment you have to configure and install gap. Then clean and configure and install for libgap. So it is still behaving like "two packages" with the same source code. Do we have an idea on when we'll have just have gap linking on libgap which would also solve this particular problem?

comment:23 in reply to: ↑ 22 Changed 5 years ago by nthiery

Salut François,

Replying to fbissey:

So if I understand the mechanics involved, at the moment you have to configure and install gap. Then clean and configure and install for libgap. So it is still behaving like "two packages" with the same source code.

Exactly.

Do we have an idea on when we'll have just have gap linking on libgap which would also solve this particular problem?

The GAP developers are well aware of that and planning to implement it. They have not yet done yet just to be more incremental; their new build system is a big PR already :-) I would assume that this will be done before GAP 4.9 which they are planning for a couple months from now.

See: https://github.com/markuspf/gap/issues/2

Btw: feel free to comment / expand on the list there. I'll advertise this issue for additional feedback on the sage-packaging mailing list.

comment:24 in reply to: ↑ 22 ; follow-ups: Changed 5 years ago by jdemeyer

Replying to fbissey:

So if I understand the mechanics involved, at the moment you have to configure and install gap. Then clean and configure and install for libgap. So it is still behaving like "two packages" with the same source code.

For this reason, I wonder why you didn't keep the separation GAP + libGAP in Sage too. If it effectively behaves as two packages, it seems more natural to keep it as two packages in Sage too. Imagine for example that we need to patch GAP but not libGAP or conversely, that would be harder with the current approach. That being said, this is mostly bikeshedding. So, if the current setup works well, there might be no reason to change it.

comment:25 in reply to: ↑ 24 ; follow-up: Changed 5 years ago by dimpase

Replying to jdemeyer:

Replying to fbissey:

So if I understand the mechanics involved, at the moment you have to configure and install gap. Then clean and configure and install for libgap. So it is still behaving like "two packages" with the same source code.

For this reason, I wonder why you didn't keep the separation GAP + libGAP in Sage too. If it effectively behaves as two packages, it seems more natural to keep it as two packages in Sage too. Imagine for example that we need to patch GAP but not libGAP or conversely, that would be harder with the current approach. That being said, this is mostly bikeshedding. So, if the current setup works well, there might be no reason to change it.

Two completely separate packages would also mean that gap_packages should come in two different flavours, etc. I'd rather keep it simple and do not multiply these instances (but rather hope that libGAP will support all of GAP packages soon).

comment:26 in reply to: ↑ 25 ; follow-up: Changed 5 years ago by jdemeyer

Replying to dimpase:

Two completely separate packages would also mean that gap_packages should come in two different flavours, etc.

So, you are saying that we need to install gap_packages twice too? Once for GAP and once for libGAP?

comment:27 in reply to: ↑ 24 Changed 5 years ago by nthiery

Replying to jdemeyer:

For this reason, I wonder why you didn't keep the separation GAP + libGAP in Sage too.

I wanted to experiment with how far we were from having a proper single package, so as to give early feedback to the gap developers on any sticking points. And indeed, linking gap to libgap is basically the only sticking point. Since this point will be most likely resolved by the time this ticket gets merged in, we might as well shoot directly for the "right thing".

comment:28 in reply to: ↑ 26 ; follow-up: Changed 5 years ago by dimpase

Replying to jdemeyer:

Replying to dimpase:

Two completely separate packages would also mean that gap_packages should come in two different flavours, etc.

So, you are saying that we need to install gap_packages twice too? Once for GAP and once for libGAP?

no, not at all---I mean to say that there exist GAP packages (not part of Sage ATM) that break libGAP now. As some of these packages are very useful, it's important to have this fixed (but not at expense of having 2 separate gap_packages).

comment:29 in reply to: ↑ 28 Changed 5 years ago by nthiery

Replying to dimpase:

As some of these packages are very useful, it's important to have this fixed

YES. In fact that is my main motivation to work on the libgap integration: I badly want to use Semigroups that requires IO! We went through the details with Markus and Max, and the shared aim is indeed to have this resolved with the upgrade to 4.9.

comment:30 Changed 4 years ago by jpflori

  • Cc jpflori added

comment:31 Changed 4 years ago by jdemeyer

Is there any news here?

comment:32 Changed 4 years ago by slelievre

  • Cc slelievre added

GAP 4.9.0, a beta release for GAP 4.9, has been released.

This means we can start working on building Sage with it.

comment:33 follow-up: Changed 4 years ago by dimpase

Do you know if they now provide a replacement for libGAP?

comment:34 Changed 4 years ago by slelievre

  • Cc alexk markuspf added
  • Description modified (diff)

Cc-ing @alexk and @markuspf.

comment:35 in reply to: ↑ 33 Changed 4 years ago by nthiery

Replying to dimpase:

Do you know if they now provide a replacement for libGAP?

In principle yes: the branch I posted here last April was using their stock alpha version. Now the next step is to revive the branch and adapt it if needed.

I can work on this, but probably not at once. Any volunteer to take over welcome.

Markus: do you foresee any major changes in how GAP's lib is to be built that could affect what I was doing last April?

comment:36 follow-up: Changed 4 years ago by dimpase

I just noticed that in 4.9 smallgrps and transgrps are released under a GPL-compatible license. This is great! We really should make them (i.e. database_gap) standard Sage package, and do away with the extra hurdle of having database_gap optional.

comment:37 follow-up: Changed 4 years ago by jdemeyer

If you're willing to wait 2 months, this might be a good task for the Cernay workshop https://github.com/OpenDreamKit/OpenDreamKit/issues/251

comment:38 in reply to: ↑ 37 Changed 4 years ago by nthiery

Replying to jdemeyer:

If you're willing to wait 2 months, this might be a good task for the Cernay workshop https://github.com/OpenDreamKit/OpenDreamKit/issues/251

Yes indeed; it will be helpful to have GAP people under hand. I may just become impatient before that for some research project of mine, in which case I'll have a head start :-)

comment:39 in reply to: ↑ 36 Changed 4 years ago by nthiery

Replying to dimpase:

I just noticed that in 4.9 smallgrps and transgrps are released under a GPL-compatible license. This is great! We really should make them (i.e. database_gap) standard Sage package, and do away with the extra hurdle of having database_gap optional.

Yes! I can't wait for all the simplifications this will bring to us!

comment:40 in reply to: ↑ 22 ; follow-up: Changed 4 years ago by jdemeyer

Replying to fbissey:

So if I understand the mechanics involved, at the moment you have to configure and install gap. Then clean and configure and install for libgap. So it is still behaving like "two packages".

With the past and future modifications to the Sage build system in mind, I now object more strongly than before to treating two separate packages (GAP + libGAP) as one package.

The reason is that we now try to separate the build and install stages of packages. But that is incompatible with the recipe here of first "build and install GAP" and then "build and install libGAP".

comment:41 Changed 4 years ago by jdemeyer

For an example of two Sage packages with the same sources, see gcc and gfortran. IMHO, you should do the same for GAP and libGAP (unless they are merged upstream but my impression is that this has not happened yet).

comment:42 in reply to: ↑ 40 Changed 4 years ago by fbissey

Replying to jdemeyer:

Replying to fbissey:

So if I understand the mechanics involved, at the moment you have to configure and install gap. Then clean and configure and install for libgap. So it is still behaving like "two packages".

With the past and future modifications to the Sage build system in mind, I now object more strongly than before to treating two separate packages (GAP + libGAP) as one package.

The reason is that we now try to separate the build and install stages of packages. But that is incompatible with the recipe here of first "build and install GAP" and then "build and install libGAP".

Hopefully we will be getting something more serious at final release time, like with pari where you build libpari and then the gp executable. That would be the ideal scenario where it is really one package. Have to see how far they have gone with 4.9.

comment:43 Changed 4 years ago by jdemeyer

A lot of changes on this ticket (also the changes likely to lead to merge conflicts) are related to the unprefixing of libGAP symbols. To ease development, maybe we should try to do that separately from the upgrade to GAP 4.9.

I was thinking to do something like #19915 but changing only the Sage source code (not libGAP) and undoing the prefixing with macros like #define SomeGapFunction libGAP_SomeGapFunction. What do you think?

comment:44 follow-up: Changed 4 years ago by dimpase

there is also libgap unprefixing in Python/Cython? that has to be done. Perhaps separating these from C level unprefixing would help making it smoother.

comment:45 in reply to: ↑ 44 ; follow-up: Changed 4 years ago by jdemeyer

Replying to dimpase:

there is also libgap unprefixing in Python/Cython? that has to be done. Perhaps separating these from C level unprefixing would help making it smoother.

Yes, that is what I meant. Creating a separate ticket for unprefixing only at the Cython level by using macros to translate between the unprefixed names in Cython and the prefixed names in libGAP.

comment:46 in reply to: ↑ 45 ; follow-up: Changed 4 years ago by dimpase

Replying to jdemeyer:

Replying to dimpase:

there is also libgap unprefixing in Python/Cython? that has to be done. Perhaps separating these from C level unprefixing would help making it smoother.

Yes, that is what I meant. Creating a separate ticket for unprefixing only at the Cython level by using macros to translate between the unprefixed names in Cython and the prefixed names in libGAP.

I am not sure I understand. Currently in Cython on can do something like

    from sage.libs.gap.libgap import libgap
    g=libgap.ProjectiveGeneralLinearGroup(3,3)

How will this change under what you propose?

comment:47 in reply to: ↑ 46 Changed 4 years ago by jdemeyer

Replying to dimpase:

    from sage.libs.gap.libgap import libgap
    g=libgap.ProjectiveGeneralLinearGroup(3,3)

How will this change under what you propose?

The Python interface won't change at all. I am talking about how sage.libs.gap calls libGAP.

comment:48 Changed 4 years ago by jdemeyer

As far as I can tell, GAP 4.9 has not been released. Does anybody know how close we are to an actual release?

Last edited 4 years ago by jdemeyer (previous) (diff)

comment:49 Changed 4 years ago by slelievre

I think GAP 4.9.0 is considered a public beta for GAP 4.9.

The list of releases at

still displays GAP 4.8.10 as the latest release.

In the list of past releases there, the GAP 4.8, 4.7, 4.6, 4.5 series start at GAP 4.8.2, 4.7.2, 4.6.2, 4.5.4, so probably GAP 4.x.y with those x and lower y were considered beta.

Still, seeing how 4.9.0 works with Sage would be nice.

comment:50 Changed 3 years ago by jdemeyer

  • Dependencies set to #25273

comment:51 follow-up: Changed 3 years ago by jdemeyer

Where is the GAP source tarball which is supposed to be used here?

comment:52 Changed 3 years ago by jdemeyer

  • Branch changed from u/nthiery/upgrade_to_gap_4_9 to u/jdemeyer/upgrade_to_gap_4_9

comment:53 Changed 3 years ago by git

  • Commit changed from 7c04025083d61cab671df3302c32f353c4e28313 to b7278a120c5db710d1e11297b3dd1411d69d302b

Branch pushed to git repo; I updated commit sha1. This was a forced push. New commits:

72b5066Unprefixed libGAP interface
b7278a1Upgrade to GAP 4.9

comment:54 in reply to: ↑ 51 Changed 3 years ago by dimpase

Replying to jdemeyer:

Where is the GAP source tarball which is supposed to be used here?

here: https://www.gap-system.org/pub/gap/gap-4.9/beta/

comment:55 Changed 3 years ago by jdemeyer

  • Description modified (diff)

comment:56 in reply to: ↑ description Changed 3 years ago by jdemeyer

Replying to nthiery:

The branch attached to this ticket updates Sage to run on top of a branch of GAP by Markus Pfeiffer that adds libgap compilation and will be merged soon in the devel version of GAP.

It seems that this branch which "will be merged soon" is still not merged. It would be good to have an idea of whether or not that will happen.

comment:57 Changed 3 years ago by git

  • Commit changed from b7278a120c5db710d1e11297b3dd1411d69d302b to 40e644a16bafa77e4b93e72c215f53a4afb19cc1

Branch pushed to git repo; I updated commit sha1. This was a forced push. New commits:

d28b575Unprefixed libGAP interface
daf8754Unprefixed libGAP interface
40e644aUpgrade to GAP 4.9

comment:58 Changed 3 years ago by nthiery

According to https://github.com/gap-system/gap/pull/1205#issuecomment-385670578, Markus's libgap branch won't be merged in GAP 4.9: it's not yet ready enough. Bummer.

We are discussing with Sebastian and Thomas what we can do to move forward here. They are very interested as well by libgap for the integration between GAP and Julia.

Tentative steps:

  • As a warm up, upgrade our current gap and libgap spkg to 4.8.10
  • Upgrade our current gap and libgap spkg to 4.9; with the many changes to the build system, this may get more tricky
  • To make the switch to GAP's libgap less urgent, investigate whether we could change our current libgap spkg to not prefix GAP symbols, in order to be able to use GAP's dynamic packages; see #25273
  • On the GAP side: move toward more incremental changes. In particular, extract the changesets from Markus branch that are non controversial and ready, and get them merged in GAP (4.9? 4.10?)
Last edited 3 years ago by nthiery (previous) (diff)

comment:59 Changed 3 years ago by nthiery

  • Cc ghsebasguts added

comment:60 Changed 3 years ago by gh-antonio-rojas

  • Cc gh-antonio-rojas added

comment:61 Changed 3 years ago by slelievre

  • Description modified (diff)
  • Milestone changed from sage-8.0 to sage-8.3

GAP 4.9.1 was released in May 2018.

This is the first stable release of GAP 4.9 (GAP 4.9.0 was a beta for GAP 4.9).

comment:62 Changed 3 years ago by slelievre

  • Cc embray gh-sebasguts nthiery added; ghsebasguts removed

comment:63 Changed 3 years ago by gh-antonio-rojas

I'm confused - I just built GAP 4.9.1 from the upstream tarball and 'make install' *does* install /usr/lib/libgap.so and libgap's headers. Is this a different libgap than the one mentioned here to be included in 4.10?

comment:64 Changed 3 years ago by slelievre

One comment at pull request 1205 in GAP's github repo said GAP won't ship libGAP before GAP 4.10:

Maybe this was changed though.

comment:65 Changed 3 years ago by nthiery

  • Description modified (diff)

comment:66 follow-up: Changed 3 years ago by jdemeyer

Has anybody asked GAP developers for a policy on the prefixing?

In Cernay, I proposed that GAP should prefix some/all of their names like T_INT -> GAP_T_INT. For compiling GAP (+ GAP packages) itself, we can use a file with defines like

#define  T_INT  GAP_T_INT

to ensure backwards compatibility. This way, in GAP itself, one could use either T_INT or GAP_T_INT. Outside of GAP, only GAP_T_INT would work.

I think this is relatively easy to implement (I'm willing to do it) but I don't know what upstream GAP thinks of it.

comment:67 Changed 3 years ago by arojas

  • Cc arojas added; gh-antonio-rojas removed

comment:68 Changed 3 years ago by slelievre

  • Description modified (diff)

GAP 4.9.2 was released in July 2018.

comment:69 Changed 3 years ago by gh-timokau

  • Cc gh-timokau added

comment:70 Changed 3 years ago by nthiery

Hi,

For the record: I experimented a bit a month ago and a bit more this morning with Dima with the latest gap master that has make libgap and make install-libgap rules. See the following branch, which builds libgap from GAP's pristine sources:

https://git.sagemath.org/sage.git/log/?h=u/nthiery/gap-libgap

Note that, at this stage, Sage won't start with it; maybe even not compile.

Last edited 3 years ago by nthiery (previous) (diff)

comment:71 in reply to: ↑ 66 Changed 3 years ago by nthiery

Has anybody asked GAP developers for a policy on the prefixing?

In Cernay, I proposed that GAP should prefix some/all of their names like T_INT -> GAP_T_INT. For compiling GAP (+ GAP packages) itself, we can use a file with defines like

#define  T_INT  GAP_T_INT

to ensure backwards compatibility. This way, in GAP itself, one could use either T_INT or GAP_T_INT. Outside of GAP, only GAP_T_INT would work.

I think this is relatively easy to implement (I'm willing to do it) but I don't know what upstream GAP thinks of it.

That's the idea indeed. Eventually they will provide a standalone header file that will define the official GAP API, and where all symbols will be prefixed. To bootstrap the process, we (well mostly Max!) reviewed and annotated the symbols that are used in Sage:

https://hackmd.io/emNi76svSWCh1fBeLKqPdA#

comment:72 Changed 3 years ago by gh-timokau

Thank you for working on this! Sage using an up-to-date gap and doing away with the additional libgap package will simplify packaging.

comment:73 Changed 3 years ago by vdelecroix

  • Milestone changed from sage-8.3 to sage-8.4

update milestone 8.3 -> 8.4

comment:74 Changed 3 years ago by slelievre

GAP 4.9.3 was released.

comment:75 Changed 3 years ago by dimpase

  • Description modified (diff)
  • Summary changed from Upgrade to GAP 4.9 to Upgrade to GAP 4.10
  • Work issues changed from Wait for gap 4.9 release to Wait for gap 4.10 release

comment:76 Changed 3 years ago by dimpase

  • Description modified (diff)

comment:77 Changed 3 years ago by dimpase

  • Description modified (diff)

note that the prefix of the tarball should start with src/ - else one gets weird error messages from the python's tar.

GAPs make install target now works in a preliminary form - on Linux for sure.

comment:78 Changed 3 years ago by dimpase

  • Branch changed from u/jdemeyer/upgrade_to_gap_4_9 to u/dimpase/WIP/libgap410
  • Commit changed from 40e644a16bafa77e4b93e72c215f53a4afb19cc1 to f06912253d01557727041fe3b4d03e459fb90a11

This is WIP - builds, but libgap.pyx interaction loop has to be rewritten using new API.

Also, sage --gap is broken

gap: hmm, I cannot find 'lib/init.g' maybe use option '-l <gaproot>'?

New commits:

c766ad6Merge branch 'u/jdemeyer/upgrade_to_gap_4_9' of trac.sagemath.org:sage into libgap
5152637building GAP and libGAP from GAP
6bed83achanges for using libGAP from GAP - part 1
f069122removing util*

comment:79 Changed 3 years ago by dimpase

oops, one still needs initialize() from util.pyx, in some form...

comment:80 Changed 3 years ago by git

  • Commit changed from f06912253d01557727041fe3b4d03e459fb90a11 to 9c22e727f7a87271c26885e9cdc9b52583f598c5

Branch pushed to git repo; I updated commit sha1. New commits:

8c28e13Revert "removing util*"
9c22e72few comments

comment:81 Changed 3 years ago by nthiery

Great to see the progress! Sounds like a productive week in Siegen. Thanks so much.

comment:82 Changed 3 years ago by dimpase

  • Description modified (diff)

comment:83 Changed 3 years ago by git

  • Commit changed from 9c22e727f7a87271c26885e9cdc9b52583f598c5 to 14919d3d32074d36ece1da26815e5f5d7a24812b

Branch pushed to git repo; I updated commit sha1. New commits:

14919d3update util and the GAP package

comment:84 Changed 3 years ago by git

  • Commit changed from 14919d3d32074d36ece1da26815e5f5d7a24812b to 8b5ccbe6323feb8ded8cf71cf1d00faa71da59c5

Branch pushed to git repo; I updated commit sha1. New commits:

b3dc304Merge branch 'develop' into libgap
8a8a146use MarkBag instead of MARK_BAG
c5a5d4fno more *_BAG, and a call signature fix
d13fc80more compilation fixes
8b5ccberemoved debugging utils (they need a rewrite)

comment:85 Changed 3 years ago by git

  • Commit changed from 8b5ccbe6323feb8ded8cf71cf1d00faa71da59c5 to a0616deb562afcf9d5b1a8a4e02406b4a0479038

Branch pushed to git repo; I updated commit sha1. New commits:

a0616deadding --nointeract to libgap startup options

comment:86 Changed 3 years ago by dimpase

at this point I am at

sage: libgap(1)
Error, Variable: '_rich_repr_' must have a value
Error, Variable: '_ipython_canary_method_should_not_exist_' must have a\
 value
[... several more errors like above]
gap: panic, could not open *errout*!

comment:87 Changed 3 years ago by git

  • Commit changed from a0616deb562afcf9d5b1a8a4e02406b4a0479038 to 9cc327f6004c1a16819774fd34cf5fc48ff4a001

Branch pushed to git repo; I updated commit sha1. This was a forced push. New commits:

9cc327fadding --nointeract to libgap startup options

comment:88 follow-up: Changed 3 years ago by dimpase

In the current branch I am able to run

sage: g=libgap.SymmetricGroup(4)
sage: print(g.Order())
24
sage: i=g.Order().sage()
sage: i
24

So it's outputting GAP object that is broken ATM.

I'll try fixing the "old" GAP interface, perhaps it will help.

comment:89 in reply to: ↑ 88 ; follow-up: Changed 3 years ago by nthiery

Replying to dimpase:

In the current branch I am able to run

sage: g=libgap.SymmetricGroup(4)
sage: print(g.Order())
24
sage: i=g.Order().sage()
sage: i
24

Nice! Making good progress!

So it's outputting GAP object that is broken ATM. I'll try fixing the "old" GAP interface, perhaps it will help.

I am not sure what's broken above?

comment:90 in reply to: ↑ 89 Changed 3 years ago by dimpase

Replying to nthiery:

So it's outputting GAP object that is broken ATM. I'll try fixing the "old" GAP interface, perhaps it will help.

I am not sure what's broken above?

see comment:86 - displaying (lib)GAP objects is broken. Also,

sage: gap(1)
** Gap crashed or quit executing 'SetUserPreference("HistoryMaxLines", 30);' **
Restarting Gap and trying again
** Gap crashed or quit executing '\$sage1:=1;;' **
...

the latter is probably due to some scripts gotten messed up by the upgrade.

And error handling in libgap interface needs work, too.

comment:91 Changed 3 years ago by dimpase

libgap's display problem seems to be ipython-specific. Indeed

$ ./sage --python
Python 2.7.15 (default, Sep 15 2018, 05:05:36) 
[GCC 8.2.0] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> from sage.all import *
>>> libgap(1)
1

comment:92 Changed 3 years ago by dimpase

Some discussions related to libgap API:

and an accepted PR to fix a bug that was breaking libgap/Sage interface:

https://github.com/gap-system/gap/pull/2840

Last edited 3 years ago by dimpase (previous) (diff)

comment:93 Changed 3 years ago by git

  • Commit changed from 9cc327f6004c1a16819774fd34cf5fc48ff4a001 to ece8cfc39e5f4258a5ec2c8dc2a307f988218891

Branch pushed to git repo; I updated commit sha1. New commits:

ece8cfcfix gap() interface and gap_console, removed some old stuff

comment:94 Changed 3 years ago by git

  • Commit changed from ece8cfc39e5f4258a5ec2c8dc2a307f988218891 to 043fb95c1ee6360ef5df9ac879d643a662bf6dbc

Branch pushed to git repo; I updated commit sha1. New commits:

043fb95fixing all but one doctests in interfaces/gap.py

comment:95 Changed 3 years ago by dimpase

One error - tab completion does not work on GAP objects:

sage -t --long --warn-long 49.3 src/sage/interfaces/gap.py
**********************************************************************
File "src/sage/interfaces/gap.py", line 1628, in sage.interfaces.gap.GapElement._tab_completion
Failed example:
    'Centralizer' in s5._tab_completion()
Exception raised:
    Traceback (most recent call last):
      File "/mnt/opt/Sage/sage-dev/local/lib/python2.7/site-packages/sage/doctest/forker.py", line 659, in _run
        self.compile_and_execute(example, compiler, test.globs)
      File "/mnt/opt/Sage/sage-dev/local/lib/python2.7/site-packages/sage/doctest/forker.py", line 1070, in compile_and_execute
        exec(compiled, globs)
      File "<doctest sage.interfaces.gap.GapElement._tab_completion[1]>", line 1, in <module>
        'Centralizer' in s5._tab_completion()
      File "sage/misc/cachefunc.pyx", line 2316, in sage.misc.cachefunc.CachedMethodCallerNoArgs.__call__ (build/cythonized/sage/misc/cachefunc.c:13467)
        self.cache = f(self._instance)
      File "/mnt/opt/Sage/sage-dev/local/lib/python2.7/site-packages/sage/interfaces/gap.py", line 1633, in _tab_completion
        v = P.eval(r'\$SAGE.OperationsAdmittingFirstArgument(%s)'%self.name())
      File "/mnt/opt/Sage/sage-dev/local/lib/python2.7/site-packages/sage/interfaces/gap.py", line 583, in eval
        result = Expect.eval(self, input_line, **kwds)
      File "/mnt/opt/Sage/sage-dev/local/lib/python2.7/site-packages/sage/interfaces/expect.py", line 1354, in eval
        for L in code.split('\n') if L != ''])
      File "/mnt/opt/Sage/sage-dev/local/lib/python2.7/site-packages/sage/interfaces/gap.py", line 789, in _eval_line
        raise RuntimeError(message)
    RuntimeError: Gap produced error output
    Error, no method found! For debugging hints type ?Recovery from NoMethodFound
    Error, no 1st choice method found for `Iterator' on 1 arguments

       executing \$SAGE.OperationsAdmittingFirstArgument(x);
**********************************************************************
1 item had failures:
   1 of   3 in sage.interfaces.gap.GapElement._tab_completion

comment:96 Changed 3 years ago by dimpase

  • Stopgaps set to tab completion on GAP objects, error handling in libgap

comment:97 Changed 3 years ago by git

  • Commit changed from 043fb95c1ee6360ef5df9ac879d643a662bf6dbc to 0128112b501a6c66d21b570dde09526ecf63c9b1

Branch pushed to git repo; I updated commit sha1. New commits:

0128112few important Cython types fixed

comment:98 Changed 3 years ago by git

  • Commit changed from 0128112b501a6c66d21b570dde09526ecf63c9b1 to 732cbdd54e24ac35c3fd52c75fb7608ad41603ce

Branch pushed to git repo; I updated commit sha1. New commits:

c4d128epass the correct error callback function
732cbddremoving libgap_enter/exit

comment:99 Changed 3 years ago by dimpase

in gap.* interface, tab completion is broken due to outdated code in src/ext/gap/sage.g: with GAP 4.10 one gets

gap> Read("sage.g");
gap> s5:=SymmetricGroup(5);
Sym( [ 1 .. 5 ] )
gap> \$SAGE.OperationsAdmittingFirstArgument(s5);
Error, no method found! For debugging hints type ?Recovery from NoMethodFound
Error, no 1st choice method found for `Iterator' on 1 arguments at /mnt/opt/Sage/sage-dev/local/gap/latest/lib/methsel2.g:250 called from
OPERATIONS[i + 1] at sage.g:69 called from
<function "LastReadValue">( <arguments> )
 called from read-eval loop at *stdin*:4
type 'quit;' to quit to outer loop

in GAP 4.8.6 this works:

gap> Read("sage.g");
gap> s5:=SymmetricGroup(5);
Sym( [ 1 .. 5 ] )
gap> \$SAGE.OperationsAdmittingFirstArgument(s5);
[ <Operation "ViewObj">, <Operation "ViewString">, <Operation "NameFunction">, <Operation "SetNameFunction">, 
...

I've asked here, as this is due to format change of internal GAP data...

Last edited 3 years ago by dimpase (previous) (diff)

comment:100 Changed 3 years ago by dimpase

Very similar outdated code is in OperationInspector.operations in sage/libs/gap/operations.py.

comment:101 Changed 3 years ago by git

  • Commit changed from 732cbdd54e24ac35c3fd52c75fb7608ad41603ce to 41218248115922fdae5c4ebb01a03a7df7bafc91

Branch pushed to git repo; I updated commit sha1. New commits:

4121824fix for tab completion for GAP interface

comment:102 Changed 3 years ago by git

  • Commit changed from 41218248115922fdae5c4ebb01a03a7df7bafc91 to 1d60fdd386b98b11474df00074039f3b3ef26bca

Branch pushed to git repo; I updated commit sha1. New commits:

1d60fddfix tab completion in libgap interface

comment:103 Changed 3 years ago by dimpase

  • Dependencies #25273 deleted
  • Stopgaps changed from tab completion on GAP objects, error handling in libgap to error handling in libgap

broken error handling:

sage: a=libgap(1)
sage: getattr(a,'_rich_repr_',None)
Error, Variable: '_rich_repr_' must have a value
sage: 

(with a default argument, getattr should just return it, with raising anything)

comment:104 Changed 3 years ago by dimpase

  • Stopgaps changed from error handling in libgap to error handling in libgap, documentation display

Please see https://github.com/gap-system/gap/issues/2874 for the current sticking point, which is error handling and processing of error messages. Basically, the question is whether it's possible to retain the behaviour of libGAP without patches to the GAP's kernel.

I'd like someone who knows more about libGAP error handling to have a look at badly butchered by me parts in the branch dealing with it, and suggest what should be done.

Another (easier) thing to fix is GAP and libgap documentation display.

comment:105 Changed 3 years ago by dimpase

  • Status changed from new to needs_info

comment:106 Changed 3 years ago by git

  • Commit changed from 1d60fdd386b98b11474df00074039f3b3ef26bca to e352f7e4ad34dfc1422afc04506332ef7674755a

Branch pushed to git repo; I updated commit sha1. New commits:

e352f7emodest progress - can do libgap(1)

comment:107 Changed 3 years ago by dimpase

  • Status changed from needs_info to needs_work
  • Work issues changed from Wait for gap 4.10 release to work...

the interaction with ipython (which tries to get things like _rich_repr_ from GAP to display GAP objects) is far from done. The current problem:

sage: a=libgap(41)
sage: a
41
sage: b=libgap(42)
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
...
ValueError: libGAP: PrintTo: cannot open stream for output
Error, Length: <list> must be a list (not a boolean or fail)
...

this looks like a bug in my new (bad) implementation of GAP objects printing in Sage.

comment:108 Changed 3 years ago by dimpase

perhaps we should rather try rebasing libGAP to GAP 4.10, which looks a less daunting task now than a total replacement with "native" GAP's libgap, get it working, and then replace all the libgap_* functions by the native ones.

comment:109 follow-up: Changed 3 years ago by git

  • Commit changed from e352f7e4ad34dfc1422afc04506332ef7674755a to 406cae67f059056df29413b8fa36ac9964eb3c5b

Branch pushed to git repo; I updated commit sha1. New commits:

406cae6still the interaction with ipython is broken

comment:110 Changed 3 years ago by dimpase

On the present branch, I get

sage: a=libgap(1)
sage: a
*** trouble in PRINT_OR_APPEND_TO_STREAM***
...
*** trouble in PRINT_OR_APPEND_TO_STREAM***
1

after this GAP is pretty much nuked. This happens during ipython's attempts to pretty-print a, so that it tries to find _rich_repr_ etc. Specifically, at least on the system I am running this, the trouble starts not immediately, but where _rich_png_ (or the previous thing in the loop) is checked for by get_real_method in IPython/utils/dir2.py.

If I run the same from ./sage --python (after loading Sage there as usual) at least this much (and more) works.

I don't have a good idea how to debug this, so I'd like to pass the token now.

Last edited 3 years ago by dimpase (previous) (diff)

comment:111 Changed 3 years ago by dimpase

  • Status changed from needs_work to needs_review

oops, wrong ticket

Last edited 3 years ago by dimpase (previous) (diff)

comment:112 Changed 3 years ago by dimpase

  • Status changed from needs_review to needs_work

comment:113 Changed 3 years ago by embray

As long as we're upgrading GAP we should make sure DESTDIR builds work now; see #25044 which this conflicts with. But if this can get done soon we can skip #25044 and incorporate it into this. I'll give you a patch for it if you want.

I'm working on testing this on Cygwin now.

comment:114 Changed 3 years ago by embray

Okay, I am again confused as to what upstream tarball to use, because the one in the ticket description no longer matches the checksum.

comment:115 Changed 3 years ago by dimpase

I just checked that md5=017738e0ba9e166a19084d68123d3e1f as in GAP package checksum matches the md5 of https://github.com/dimpase/gap/releases/download/dev/gap-4.dev.tar.gz

Perhaps your download just got corrupted?

comment:116 follow-up: Changed 3 years ago by thansen

GAP 4.9.3 was uploaded to Debian unstable yesterday, the sagemath Debian package is broken until we can use GAP 4.9. Is the patch here already enough to use libgap from GAP 4.9 directly? It does not apply cleanly to sage 8.4. Or would you say updating the separate libgap to 4.9 is a better idea?

comment:117 in reply to: ↑ 116 Changed 3 years ago by dimpase

Replying to thansen:

GAP 4.9.3 was uploaded to Debian unstable yesterday, the sagemath Debian package is broken until we can use GAP 4.9. Is the patch here already enough to use libgap from GAP 4.9 directly? It does not apply cleanly to sage 8.4.

the patch here is very far from a working one.

Or would you say updating the separate libgap to 4.9 is a better idea?

If you could manage this, it would be great and very useful, but it's a very nontrivial amount of work, due to substantial changes in GAP internals that happened in GAP 4.9.

comment:118 follow-up: Changed 3 years ago by thansen

I see. If there was progress on this from you guys it would be very much appreciated. The freeze period for the next Debian release starts in January. If we can't fix the problem in time, sage will not be in the next Debian release, which will be the stable release for two years.

comment:119 in reply to: ↑ 118 Changed 3 years ago by jdemeyer

Replying to thansen:

The freeze period for the next Debian release starts in January.

For those of us who are not Debian experts, where is this documented? And what kind of changes are still allowed during the "freeze" period? For example, suppose for the sake of argument that this SageMath ticket is only merged in February, could it still be added as a patch to the Debian SageMath package?

comment:120 Changed 3 years ago by thansen

It is documented here: https://release.debian.org/buster/freeze_policy.html

The broken sagemath will eventually be automatically removed from Debian testing, which will become the next Debian release. After February 12 it cannot migrate back to testing even if it is fixed. The earlier transition freeze on January 12 means that no more library updates with SONAME change can be done (which might be relevant for a sage update too).

comment:121 follow-up: Changed 3 years ago by thansen

About the example you mentioned: This ticket does not need to be merged, we just need a usable patch. It seems that the gap maintainer is open to providing libgap.la, see https://bugs.debian.org/912862 . It is not yet clear if gap 4.10 is released in time to be included before the freeze.

comment:122 Changed 3 years ago by embray

  • Milestone changed from sage-8.4 to sage-8.5

This ticket mentions GAP 4.10, but should we not be focusing instead on GAP 4.9? Or would getting Sage working against GAP 4.10 imply support for 4.9 as well? I would want to check against both, and starting with the existing release (4.9) seems simpler...

Is there something at the system integration level I could be able to help with? ISTM this needs to be a high priority.

comment:123 in reply to: ↑ 121 Changed 3 years ago by embray

Replying to thansen:

About the example you mentioned: This ticket does not need to be merged, we just need a usable patch. It seems that the gap maintainer is open to providing libgap.la, see https://bugs.debian.org/912862 . It is not yet clear if gap 4.10 is released in time to be included before the freeze.

If that's the case, for the sake of integration, it might make more sense to pin GAP in Debia to 4.9 and focus on getting a Sage out that works against GAP 4.9 instead of focusing on 4.10. Given the amount of time left to do this, waiting on an uncertain GAP 4.10 package(s) landing in Debian seems a bit risky...

comment:124 Changed 3 years ago by dimpase

  • Description modified (diff)
  • Priority changed from major to critical

comment:125 Changed 3 years ago by dimpase

I think getting Sage's libGAP work with GAP 4.9 and/or 4.10 is much harder than in used to be in 4.8.x, due to major changes to GAP internals (although I have not tried it yet).

comment:126 follow-up: Changed 3 years ago by nthiery

Posting on behalf of Alexander Konovalov who does not yet have a trac account:

http://www.gap-system.org/pub/gap/gap-4.10/

has now GAP 4.10.0. The website will be updated in due course, but that means that GAP 4.10 is out.

That's excellent news. Now it's up to Sage to catch up.

comment:127 Changed 3 years ago by embray

Great! From what I saw yesterday it seemed like there were some doubts about how soon 4.10 would be released. But this being the case I rescind my previous comment.

comment:128 in reply to: ↑ 109 ; follow-up: Changed 3 years ago by embray

Replying to git:

Branch pushed to git repo; I updated commit sha1. New commits:

406cae6still the interaction with ipython is broken

If I'm to understand correctly, you just added this patch for debugging purposes, right? It's not intended to go into the final product is it? I'm going to look into this problem.

comment:129 in reply to: ↑ 128 Changed 3 years ago by dimpase

Replying to embray:

Replying to git:

Branch pushed to git repo; I updated commit sha1. New commits:

406cae6still the interaction with ipython is broken

If I'm to understand correctly, you just added this patch for debugging purposes, right? It's not intended to go into the final product is it? I'm going to look into this problem.

That's right, it's for debugging purposes (although it merely shows that something is going wrong, not more :-)). You might like to coordinate this with Jeroen, as he's going to work on this branch too...

comment:130 follow-up: Changed 3 years ago by embray

I'm updating it right now to the 4.10 release. Jeroen is probably a better person to work on the LibGAP integration just because he has more experience with it. But if there any specific segfaults or anything I could look at I will. I'll take a look, for example, at the tab-completion issue...

comment:131 Changed 3 years ago by embray

I'm also rebasing this branch on current develop, which needs a bit of work...

comment:132 in reply to: ↑ 126 Changed 3 years ago by dimpase

Replying to nthiery:

Posting on behalf of Alexander Konovalov who does not yet have a trac account:

http://www.gap-system.org/pub/gap/gap-4.10/

Alex should be aware that he can post on trac using his github account!

comment:133 Changed 3 years ago by embray

Another thing that it seems still hasn't been done is to update the spkg-install for GAP to use DESTDIR-based installation. I'll just work on that simultaneously with this. If we have to upgrade GAP anyways that should be part of the process (and may expose some flaws in GAP's new installation process...)

Last edited 3 years ago by embray (previous) (diff)

comment:134 Changed 3 years ago by embray

Needed to modify build/make/deps to directly make GAP a build dependency of sagelib as well, which is now needed (and remove the obsolete reference to the libgap spkg).

comment:135 Changed 3 years ago by jdemeyer

  • Authors changed from Nicolas M. Thiéry, ... to Nicolas M. Thiéry, Dima Pasechnik, Jeroen Demeyer
  • Description modified (diff)

comment:136 in reply to: ↑ 130 ; follow-up: Changed 3 years ago by jdemeyer

Replying to embray:

I'm updating it right now to the 4.10 release.

Are you actually doing that? I haven't seen any update on the branch here.

comment:137 Changed 3 years ago by jdemeyer

  • Authors changed from Nicolas M. Thiéry, Dima Pasechnik, Jeroen Demeyer to Nicolas M. Thiéry, Dima Pasechnik, Erik Bray, Jeroen Demeyer

comment:138 Changed 3 years ago by jdemeyer

  • Description modified (diff)

comment:139 Changed 3 years ago by jdemeyer

Thanks to this Cython PR (by me!), the libGAP T_INT thing might not be an issue anymore: https://github.com/cython/cython/pull/2006

Cython will still include <structmember.h> but only after all user code. It's still very fragile to have two different meanings for T_INT in the same source file (one enum value and one #define), but it should work for now. Of course, nobody can promise that it will remain working in the future.

comment:140 in reply to: ↑ 136 ; follow-up: Changed 3 years ago by embray

Replying to jdemeyer:

Replying to embray:

I'm updating it right now to the 4.10 release.

Are you actually doing that? I haven't seen any update on the branch here.

Just locally. It doesn't even build for me yet so no point in pushing.

comment:141 Changed 3 years ago by embray

I should clarify: GAP itself compiles just fine. But "installation", such that there is any, is still a partially-broken headache, and I'm experimenting how best to incorporate a GAP installation into Sage.

For now I'll probably keep with something similar to what we have currently, of having a GAP_DIR under $SAGE_LOCAL/gap. But I'm trying to decide, in that case, what the best thing to do with libgap might be.

Though I'm also trying to see if we can do something entirely different with the installation. There's something close to a standard "make install" but it isn't quite right, and I don't know how it works with GAP packages.

Last edited 3 years ago by embray (previous) (diff)

comment:142 follow-up: Changed 3 years ago by embray

It seems that currently make install actually just installs the libtool wrapper script for gap, and not the plain gap executable. I'm not sure if this is intentional or not--normally this would not make a whole lot of sense I think. Though one advantage to it is that the libtool wrapper script adds the "GAP_ROOT" that it was built in to PATH, and thus it seems capable of finding its way to the original GAP_ROOT it was installed from. This doesn't really work though, from the perspective of system packaging. The paths hard-coded in the libtool wrapper script are to the build directory itself, so when we move all the GAP_ROOT files to a new location such as $SAGE_LOCAL/gap/gap-4.10 it doesn't work properly with that GAP_ROOT.

Better to do what we did traditionally of having bin/gap as a shell script that wraps the real gap executable and passes an appropriate -l flag. Dima's branch already has something to that effect, though rather than patching bin/gap.sh I think we should just install our own custom gap script, such as the one that's currently in this branch but unused.

I'm also experimenting with adding the actual executable under something like bin/gap-bin. I don't think we need all the files from the GAP source tree, much less build artifacts. I think we can have a minimal GAP_ROOT of sorts (e.g. under share/gap) that includes grp/, lib/, pkg/ at a minimum... Maybe doc/? And possibly also src/ (so that PageSource can work for kernel functions) but I'm not sure if that's needed or not...

Last edited 3 years ago by embray (previous) (diff)

comment:143 follow-up: Changed 3 years ago by embray

So far this "relatively minimal" GAP install seems to be working pretty well, and is much cleaner. I'm sure there are some glitches here and there, but I don't have the expertise to really know how to break it. A lot of the test suites under tst/ seem to work, or mostly work.

comment:144 in reply to: ↑ 143 Changed 3 years ago by dimpase

Replying to embray:

So far this "relatively minimal" GAP install seems to be working pretty well, and is much cleaner. I'm sure there are some glitches here and there, but I don't have the expertise to really know how to break it. A lot of the test suites under tst/ seem to work, or mostly work.

Breaking libGAP should be easy, but GAP itself should be OK, I think.

comment:145 Changed 3 years ago by embray

FWIW I'm also developing and testing this on Cygwin, just for some extra wind in my face.

comment:146 Changed 3 years ago by embray

Note: Also need to manually install config.h into the headers. There's even a note in GAP's Makefile: # TODO: take care of config.h :(

comment:147 Changed 3 years ago by embray

Progress on my branch so far here.

Currently it builds, but is broken (trying to do anything with libgap just hangs).

Last edited 3 years ago by embray (previous) (diff)

comment:148 in reply to: ↑ 140 Changed 3 years ago by jdemeyer

Replying to embray:

Just locally. It doesn't even build for me yet so no point in pushing.

Depends. If it's an improvement over Dima's branch...

The question is basically: if I want to work on the libGAP interface, on which branch should I work?

comment:149 Changed 3 years ago by dimpase

I'm building Erik's branch now, once done, I'll be able to say how well it does its job (Erik works on Windows, so this makes things even more interesting for him).

comment:150 Changed 3 years ago by dimpase

Erik's branch does refer to *.gz tarball, but GAP has .*bz2 tarball. Thus

--- a/build/pkgs/gap/checksums.ini
+++ b/build/pkgs/gap/checksums.ini
@@ -1,4 +1,4 @@
-tarball=gap-VERSION.tar.gz
-sha1=ab95077df0775bc39dbf13becdc05818935a53d2
-md5=4d3cd775a7dcf5582b8b4dbc2b484bd4
-cksum=1444102593
+tarball=gap-VERSION.tar.bz2
+sha1=60d0b6b0127758da4270569c6ff42ad2364cac7e
+md5=7a94c54d2b1190d6471084b8fcbda6a9
+cksum=40709676

comment:151 Changed 3 years ago by dimpase

  • Work issues changed from work... to fix libgap workspace loading, etc, and much more work...

Apart from this little issue, Erik's branch (u/embray/spkgs/gap-410) is OK---in the sense I can basically do what I was able to do on my branch. E.g. (after not forgetting to remove the stale (lib)GAP workspaces - all bets are off if they are present, e.g. it might be why Erik's job hang, probably something to rectify in the installation prcedure) I can do

sage: a=libgap(1)
sage: b=libgap(1)
sage: c=a+b
sage: c.sage()
2

and then get a segfault as follows:

sage: a
---------------------------------------------------------------------------
SignalError                               Traceback (most recent call last)
<ipython-input-5-3f786850e387> in <module>()
----> 1 a

/home/scratch2/dimpase/sage/sage/local/lib/python2.7/site-packages/IPython/core/displayhook.pyc in __call__(self, result)
    244             self.start_displayhook()
    245             self.write_output_prompt()
--> 246             format_dict, md_dict = self.compute_format_data(result)
    247             self.update_user_ns(result)
    248             self.fill_exec_result(result)

/home/scratch2/dimpase/sage/sage/local/lib/python2.7/site-packages/IPython/core/displayhook.pyc in compute_format_data(self, result)
    148 
    149         """
--> 150         return self.shell.display_formatter.format(result)
    151 
    152     # This can be set to True by the write_output_prompt method in a subclass

/home/scratch2/dimpase/sage/sage/local/lib/python2.7/site-packages/sage/repl/display/formatter.pyc in format(self, obj, include, exclude)
    211             exclude = self.default_mime()
    212         ipy_format, ipy_metadata = super(SageDisplayFormatter, self).format(
--> 213             obj, include=include, exclude=exclude)
    214         if not ipy_format:
    215             return sage_format, sage_metadata

/home/scratch2/dimpase/sage/sage/local/lib/python2.7/site-packages/IPython/core/formatters.pyc in format(self, obj, include, exclude)
    171             md = None
    172             try:
--> 173                 data = formatter(obj)
    174             except:
    175                 # FIXME: log the exception

<decorator-gen-9> in __call__(self, obj)

/home/scratch2/dimpase/sage/sage/local/lib/python2.7/site-packages/IPython/core/formatters.pyc in catch_format_error(method, self, *args, **kwargs)
    215     """show traceback on failed format call"""
    216     try:
--> 217         r = method(self, *args, **kwargs)
    218     except NotImplementedError:
    219         # don't warn on NotImplementedErrors

/home/scratch2/dimpase/sage/sage/local/lib/python2.7/site-packages/IPython/core/formatters.pyc in __call__(self, obj)
    334                 return printer(obj)
    335             # Finally look for special method names
--> 336             method = get_real_method(obj, self.print_method)
    337             if method is not None:
    338                 return method()

/home/scratch2/dimpase/sage/sage/local/lib/python2.7/site-packages/IPython/utils/dir2.pyc in get_real_method(obj, name)
     72 
     73     try:
---> 74         m = getattr(obj, name, None)
     75     except Exception:
     76         return None

/home/scratch2/dimpase/sage/sage/local/lib/python2.7/site-packages/sage/libs/gap/element.pyx in sage.libs.gap.element.GapElement.__getattr__ (build/cythonized/sage/libs/gap/element.c:6741)()
    581         try:
    582             proxy = make_GapElement_MethodProxy\
--> 583                 (self.parent(), gap_eval(name), self)
    584         except ValueError:
    585             raise AttributeError('name "'+str(name)+'" is not defined in GAP.')

/home/scratch2/dimpase/sage/sage/local/lib/python2.7/site-packages/sage/libs/gap/util.pyx in sage.libs.gap.util.gap_eval (build/cythonized/sage/libs/gap/util.c:5412)()
    310 #    print("gap_string: "+gap_string+"\n")
    311     try:
--> 312             sig_on()
    313             result = GAP_EvalString(cmd)
    314             nresults = LEN_LIST(result)

SignalError: Segmentation fault

This is the sort of crashes I traced to ipython trying to do a rich display of a, basically, making GAP repeatedly report errors.

It seems that now I can reproduce this hanging, it seems that libgap workspace does not load correctly, so to start Sage and libgap I have to manually remove it in ~/.sage/gap/.

comment:152 in reply to: ↑ 142 ; follow-up: Changed 3 years ago by fbissey

Replying to embray:

It seems that currently make install actually just installs the libtool wrapper script for gap, and not the plain gap executable. I'm not sure if this is intentional or not--normally this would not make a whole lot of sense I think. Though one advantage to it is that the libtool wrapper script adds the "GAP_ROOT" that it was built in to PATH, and thus it seems capable of finding its way to the original GAP_ROOT it was installed from. This doesn't really work though, from the perspective of system packaging. The paths hard-coded in the libtool wrapper script are to the build directory itself, so when we move all the GAP_ROOT files to a new location such as $SAGE_LOCAL/gap/gap-4.10 it doesn't work properly with that GAP_ROOT.

Better to do what we did traditionally of having bin/gap as a shell script that wraps the real gap executable and passes an appropriate -l flag. Dima's branch already has something to that effect, though rather than patching bin/gap.sh I think we should just install our own custom gap script, such as the one that's currently in this branch but unused.

I'm also experimenting with adding the actual executable under something like bin/gap-bin. I don't think we need all the files from the GAP source tree, much less build artifacts. I think we can have a minimal GAP_ROOT of sorts (e.g. under share/gap) that includes grp/, lib/, pkg/ at a minimum... Maybe doc/? And possibly also src/ (so that PageSource can work for kernel functions) but I'm not sure if that's needed or not...

This is not my experience here with just plain gap-4.10.0 package

fbissey@moonloop ~/sandbox/local-gap/bin $ ldd -r gap
        linux-vdso.so.1 (0x00007ffc039c2000)
        libgmp.so.10 => /usr/lib64/libgmp.so.10 (0x00007fa163a4a000)
        libz.so.1 => /lib64/libz.so.1 (0x00007fa163833000)
        libreadline.so.7 => /lib64/libreadline.so.7 (0x00007fa1635e6000)
        libm.so.6 => /lib64/libm.so.6 (0x00007fa16325c000)
        libdl.so.2 => /lib64/libdl.so.2 (0x00007fa163058000)
        libutil.so.1 => /lib64/libutil.so.1 (0x00007fa162e55000)
        libc.so.6 => /lib64/libc.so.6 (0x00007fa162a8d000)
        libncurses.so.6 => /lib64/libncurses.so.6 (0x00007fa162831000)
        /lib64/ld-linux-x86-64.so.2 (0x00007fa1645f2000)
fbissey@moonloop ~/sandbox/local-gap/bin $ file gap
gap: ELF 64-bit LSB pie executable, x86-64, version 1 (SYSV), dynamically linked, interpreter /lib64/ld-linux-x86-64.so.2, for GNU/Linux 3.2.0, with debug_info, not stripped

I just get plain executable binaries not libtool scripts after install. Both with the tarball in the description and the one from github (which is lighter because it doesn't include gap packages).

comment:153 Changed 3 years ago by embray

I was getting outright hanging as soon as libgap is initialized (the sage.libs.gap.util.initialize function. It would hang so badly that I could only terminate the process with a SIGKILL, and it would leave my terminal in some invalid state (I have to type reset to get it back--the characters aren't even echo'd). I should note, this was on Cygwin. I will try now on Linux and see if I get any further.

Regardless, libgap probably shouldn't be doing any monkeying around with the terminal, and that might be part of the problem on Cygwin.

Dima, from where did you download the 4.10.0 tarball? The one I got from gap-system.org was a tar.gz as in my branch.

comment:154 in reply to: ↑ 152 Changed 3 years ago by embray

Replying to fbissey:

This is not my experience here with just plain gap-4.10.0 package

fbissey@moonloop ~/sandbox/local-gap/bin $ ldd -r gap
        linux-vdso.so.1 (0x00007ffc039c2000)
        libgmp.so.10 => /usr/lib64/libgmp.so.10 (0x00007fa163a4a000)
        libz.so.1 => /lib64/libz.so.1 (0x00007fa163833000)
        libreadline.so.7 => /lib64/libreadline.so.7 (0x00007fa1635e6000)
        libm.so.6 => /lib64/libm.so.6 (0x00007fa16325c000)
        libdl.so.2 => /lib64/libdl.so.2 (0x00007fa163058000)
        libutil.so.1 => /lib64/libutil.so.1 (0x00007fa162e55000)
        libc.so.6 => /lib64/libc.so.6 (0x00007fa162a8d000)
        libncurses.so.6 => /lib64/libncurses.so.6 (0x00007fa162831000)
        /lib64/ld-linux-x86-64.so.2 (0x00007fa1645f2000)
fbissey@moonloop ~/sandbox/local-gap/bin $ file gap
gap: ELF 64-bit LSB pie executable, x86-64, version 1 (SYSV), dynamically linked, interpreter /lib64/ld-linux-x86-64.so.2, for GNU/Linux 3.2.0, with debug_info, not stripped

I just get plain executable binaries not libtool scripts after install. Both with the tarball in the description and the one from github (which is lighter because it doesn't include gap packages).

I can't tell exactly what file you're looking at, but if the bin/ directory you reference above is the one that gets created in the gap source three, that's the normal executable. I'm taking about the gap that gets installed to $PREFIX when you run make install.

comment:155 Changed 3 years ago by embray

It's possible this is only a bug on Cygwin too. I haven't played with it on Linux yet, but I am beginning to now.

comment:156 follow-ups: Changed 3 years ago by embray

On Linux I was able to reproduce Dima's results. The first libgap() call is a little slow, which is to be expected--the GAP interface is initialized lazily. After that it works up to outputting a, whereas a.sage() and the like works. I also get the tab-completion segfault.

Jeroen: do you have anything to report from your investigation so far? I'd like to look into these segfaults but I don't want to duplicate your effort if you've already made significant progress.

comment:157 Changed 3 years ago by embray

Ooh, interesting. After my initial crash I started Sage up again, tried libgap(1), and it hanged just like on Cygwin. The process just sits without any CPU usage and does not respond to ctrl-C. It does respond to a SIGTERM. Could this have something to do with a corrupt GAP workspace?

comment:158 Changed 3 years ago by embray

I suspected something like this: GAP is installing its own readline callbacks, overriding the Python REPL's. I think it should not be doing that at all if --nointeract.

comment:159 in reply to: ↑ 156 Changed 3 years ago by dimpase

Replying to embray:

On Linux I was able to reproduce Dima's results. The first libgap() call is a little slow, which is to be expected--the GAP interface is initialized lazily. After that it works up to outputting a, whereas a.sage() and the like works. I also get the tab-completion segfault.

Jeroen: do you have anything to report from your investigation so far? I'd like to look into these segfaults but I don't want to duplicate your effort if you've already made significant progress.

I suspect that I forgot that we must not hold pointers to GAP objects - as GAP’s GC is moving things, this would be asking for trouble. So this could be a rich source of bugs I introduced in my branch :(

comment:160 Changed 3 years ago by dimpase

https://github.com/gap-system/gap/pull/2988 is introducing more things to libgap API, have a look (and say)

comment:161 Changed 3 years ago by dimpase

GAP 4.10 tarballs are here: http://www.gap-system.org/Releases/

comment:162 Changed 3 years ago by embray

Apparently passing the -E argument disables readline support so we should do that. Trying it now.

comment:163 Changed 3 years ago by embray

The issue with "corrupt" (?) gap workspaces seems to affect old versions too. After a crash, I switched over to a Sage build with GAP 4.8 just to compare something, and calling libgap hanged again. I have to rm -rf the old workspace to get it to work again. Nevermind; was still running the development Sage. Old Sage does not appear to be impacted.

Last edited 3 years ago by embray (previous) (diff)

comment:164 Changed 3 years ago by embray

This comment from Markus seems relevant: https://github.com/gap-system/gap/pull/2702#issuecomment-414960197 I am observing something that sounds like what he describes, and it would explain the terminal non-responsiveness. I'm not exactly sure where or why this is happening though.

comment:165 Changed 3 years ago by embray

-E fixed problems with readline, but we're still having problems in error handling. Somehow the error handling code--particularly manipulation of the ERROR_OUTPUT stream, results in a stack overflow \o/

comment:166 follow-up: Changed 3 years ago by dimpase

libgap is started in sage.libs.gap.util.initialize(), and you see that

argv[11] = "--nointeract"

i.e. --nointeract is passed. So you add -E to this list, right?

The GAP patch you disabled in your branch makes the stack overflow you see a bit more transparent, I think...

comment:167 follow-up: Changed 3 years ago by dimpase

in my branch in sage.libs.gap.util.initialize() I have cdef int argc = 11, and this seems to be off by 1. Is it the root of all the evil?

comment:168 in reply to: ↑ 167 Changed 3 years ago by embray

Replying to dimpase:

in my branch in sage.libs.gap.util.initialize() I have cdef int argc = 11, and this seems to be off by 1. Is it the root of all the evil?

No, that seems right to me...? No, you're right. It's off by one.

Last edited 3 years ago by embray (previous) (diff)

comment:169 in reply to: ↑ 166 Changed 3 years ago by embray

Replying to dimpase:

libgap is started in sage.libs.gap.util.initialize(), and you see that

argv[11] = "--nointeract"

i.e. --nointeract is passed. So you add -E to this list, right?

Basically. I replaced -T with -E, since -T is implied by --no-interact (which should probably also imply -E, but it doesn't).

However, as you pointed out, there's an off-by-one and now I'm not so sure if --no-interact was being processed at all.

Last edited 3 years ago by embray (previous) (diff)

comment:170 Changed 3 years ago by embray

There's what I would consider a bug in GAP, where the function PRINT_OR_APPEND_TO_STREAM goes:

    /* try to open the file for output                                     */
    i = OpenOutputStream(stream);
    if ( ! i ) {
        ErrorQuit( "%s: cannot open stream for output", (Int)funcname, 0L );
        return 0;
    }

but if this is called to print to the ERROR_OUTPUT stream during ErrorQuit then we get an infinite recursion. This is the root cause of the segfault's we're seeing. What I'm not sure about is why OpenOutputStream would fail, since we close ERROR_OUTPUT after each error, and then re-open it as a new stream. That part bothers me. But I think GAP should check here if stream == ERROR_OUTPUT and if so reset ERROR_OUTPUT to its default value as a last resort, or something to that effect.

comment:171 follow-up: Changed 3 years ago by dimpase

Erik, do you have your updated branch somewhere online? My last attempt to build your branch ends with a glorious collision of GAP's T_INT and Python's T_INT...

Yes, I agree that PRINT_OR_APPEND_TO_STREAM is a problem, and the patch in my branch I mentioned in comment 166 provides a crude workaround. I already mentioned this to GAP devs more than once, see e.g. https://github.com/gap-system/gap/issues/2487

comment:172 follow-up: Changed 3 years ago by embray

No, I don't have an updated branch. I'm not seeing that problem before.

I think I have some idea what the issue is. The problem is that because we call sig_error() in our error handler, we jump out of GAP's eval loop before it can perform certain cleanup, including closing the output stream that it's using to capture output in GAP_EvalString.

So we're leaving things in a slightly fragile state; and more importantly we're not cleaning up after ourselves. I believe that's the case here. I wonder if we should forgo sig_error() entirely and just let GAP_EvalString return naturally. The error handler can just record somewhere that an error occurred, and we can raise an exception from within gap_eval`.

comment:173 Changed 3 years ago by embray

Yep. This patch fixes it:

  • src/sage/libs/gap/gap_includes.pxd

    diff --git a/src/sage/libs/gap/gap_includes.pxd b/src/sage/libs/gap/gap_includes.pxd
    index 06a82ee..f348e2c 100644
    a b cdef extern from "<gap/libgap-api.h>": 
    129129    ctypedef void (*CallbackFunc)()
    130130    void GAP_Initialize(int argc, char ** argv, char ** env,
    131131        CallbackFunc, CallbackFunc)
    132     Obj GAP_EvalString(const char *)
     132    Obj GAP_EvalString(const char *) except *
    133133    Obj GAP_ValueGlobalVariable(const char *)
    134134#    void libgap_start_interaction(char* inputline)
    135135#    char* libgap_get_output()
  • src/sage/libs/gap/util.pyx

    diff --git a/src/sage/libs/gap/util.pyx b/src/sage/libs/gap/util.pyx
    index aa94e89..fb1e5e1 100644
    a b cdef initialize(): 
    221221    argv[8] = "64m"
    222222
    223223    argv[9] = "-q"    # no prompt!
    224     argv[10] = "-T"    # no debug loop
    225     argv[11] = "--nointeract"
     224    argv[10] = "-E"    # don't use readline as this will interfere with Python
     225    argv[11] = "--nointeract"  # Implies -T
    226226    argv[12] = NULL
    227227    cdef int argc = 11   # argv[argc] must be NULL
    228228
    cdef initialize(): 
    237237
    238238    # Initialize GAP and capture any error messages
    239239    # The initialization just prints error and does not use the error handler
    240     # libgap_initialize(argc, argv)
    241240    try:
    242241        GAP_Initialize(argc, argv, environ, &gasman_callback, &error_handler)
    243242    except RuntimeError as msg:
    244243        raise RuntimeError('libGAP initialization failed\n' + msg)
    245244
    246     # gap_error_msg = char_to_str(libgap_get_output())
    247     GAP_EvalString('libgap_errout := \"\"; ERROR_OUTPUT := OutputTextString(libgap_errout, false);')
    248 
    249     # The error handler is called if a GAP evaluation fails, e.g. 1/0
    250     # libgap_set_error_handler(&error_handler)
     245    # Set the ERROR_OUTPUT global in GAP to an output stream in which to
     246    # receive error output
     247    GAP_EvalString('libgap_errout := ""; '
     248                   'ERROR_OUTPUT := OutputTextString(libgap_errout, false);')
    251249
    252250    # Prepare global GAP variable to hold temporary GAP objects
    253251    global reference_holder
    cdef Obj gap_eval(str gap_string) except? NULL: 
    307305    # so that Cython doesn't dereference it before libGAP is done with
    308306    # its contents.
    309307    cmd = str_to_bytes(gap_string + ';\n')
    310 #    print("gap_string: "+gap_string+"\n")
    311308    try:
    312             sig_on()
    313             result = GAP_EvalString(cmd)
    314             nresults = LEN_LIST(result)
    315 #            print("nresults="+str(nresults)+"\n")
    316             if nresults > 1: # to mimick the old libGAP
    317                 raise ValueError('can only evaluate a single statement')
    318             result = ELM_LIST(result, 1) # 1-indexed!
    319 #            print("result's length: "+str(LEN_LIST(result))+"\n")
    320             if ELM_LIST(result, 1) != GAP_True:
    321                 # libgap_call_error_handler()
    322                         print("An error occurred, but libGAP has no handler set")
    323                         return GAP_False # needs work
    324             sig_off()
     309        sig_on()
     310        result = GAP_EvalString(cmd)
     311
     312        # If an error occurred in GAP_EvalString we won't even get
     313        # here if the error handler was set; but in case it wasn't
     314        # let's still check the result...
     315        nresults = LEN_LIST(result)
     316        if nresults > 1:  # to mimick the old libGAP
     317            # TODO: Get rid of this restriction eventually?
     318            raise ValueError('can only evaluate a single statement')
     319
     320        result = ELM_LIST(result, 1) # 1-indexed!
     321        if ELM_LIST(result, 1) != GAP_True:
     322            raise RuntimeError("an error occurred, but libGAP has no "
     323                               "error handler set")
    325324    except RuntimeError as msg:
    326             raise ValueError('libGAP: '+str(msg).strip())
     325        raise ValueError(f'libGAP: {msg}')
     326    finally:
     327        sig_off()
    327328
    328329    return ELM_LIST(result, 2)
    329330
    cdef void hold_reference(Obj obj): 
    357358
    358359cdef void error_handler():
    359360    """
    360     The libgap error handler
     361    The libgap error handler.
    361362
    362     We call ``sig_error()`` which causes us to jump back to the Sage
    363     signal handler. Since we wrap libGAP C calls in ``sig_on`` /
    364     ``sig_off`` blocks, this then jumps back to the ``sig_on`` where
    365     the ``RuntimeError`` we raise here will be seen.
     363    If an error occurred we set a RuntimeError; when the original
     364    GAP_EvalString returns this exception will be raised.
     365
     366    TODO: We should probably prevent re-entering this function if we
     367    are already handling an error; if there is an error in our stream
     368    handling code below it could result in a stack overflow.
    366369    """
    367370    cdef Obj r
     371    cdef char *msg
     372
     373    # TODO: Do we need/want this ClearError??
     374    ClearError()
     375
     376    # Close the error stream: This flushes any remaining output and closes
     377    # the stream for further writing; reset ERROR_OUTPUT to something sane
     378    # just in case (trying to print to a closed stream segfaults GAP)
     379    GAP_EvalString('CloseStream(ERROR_OUTPUT); '
     380                   'ERROR_OUTPUT := "*errout*"; '
     381                   'MakeImmutable(libgap_errout);');
    368382    r = GAP_ValueGlobalVariable("libgap_errout")
    369     GAP_EvalString("CloseStream(ERROR_OUTPUT);");
    370     GAP_EvalString('libgap_errout := \"\"; ERROR_OUTPUT := OutputTextString(libgap_errout, false);')
    371383
    372     msg_py = char_to_str(CSTR_STRING(r))
    373     msg_py = msg_py.replace('For debugging hints type ?Recovery from NoMethodFound\n', '')
     384    # Grab a pointer to the C string underlying the GAP string libgap_errout
     385    # then copy it to a Python str (char_to_str contains an implicit strcpy)
     386    msg = CSTR_STRING(r)
     387    if msg != NULL:
     388        msg_py = char_to_str(msg)
     389        msg_py = msg_py.replace('For debugging hints type ?Recovery from '
     390                                'NoMethodFound\n', '')
     391    else:
     392        # Shouldn't happen but just in case...
     393        msg_py = "An unknown error occurred in libGAP"
     394
     395    # Reset ERROR_OUTPUT with a new text string stream
     396    GAP_EvalString('libgap_errout := ""; '
     397                   'ERROR_OUTPUT := OutputTextString(libgap_errout, false);')
     398
    374399    PyErr_SetObject(RuntimeError, msg_py)
    375     ClearError()
    376     sig_error()
    377400
    378401
    379402############################################################################

Now printing GAP elements works, and it also fixes tab-completion. So a lot of basic stuff works with this. I make no guarantees about memory leaks, reference counting, etc.

Last edited 3 years ago by embray (previous) (diff)

comment:174 Changed 3 years ago by dimpase

you still have wrong argc count on line 227.

comment:175 Changed 3 years ago by dimpase

But otherwise it's a lot of progress, great!!!

comment:176 in reply to: ↑ 156 Changed 3 years ago by jdemeyer

Replying to embray:

Jeroen: do you have anything to report from your investigation so far? I'd like to look into these segfaults but I don't want to duplicate your effort if you've already made significant progress.

No, on the contrary: I'm not doing anything for the moment since I don't want to interfere were your (Erik and Dima) work.

comment:177 in reply to: ↑ 171 ; follow-up: Changed 3 years ago by jdemeyer

Replying to dimpase:

My last attempt to build your branch ends with a glorious collision of GAP's T_INT and Python's T_INT...

Which branch is that? According to my earlier analysis, that shouldn't happen. So I'd like to see that branch.

comment:178 in reply to: ↑ 172 Changed 3 years ago by jdemeyer

Replying to embray:

The problem is that because we call sig_error() in our error handler, we jump out of GAP's eval loop before it can perform certain cleanup, including closing the output stream that it's using to capture output in GAP_EvalString.

If so, the problem is not limited to sig_error() but also to interrupts. So it seems like your proposed solution of not using sig_error() will break interrupts, which I do not consider acceptable.

Would there be a way of patching GAP to make GAP_EvalString stateless?

comment:179 follow-ups: Changed 3 years ago by embray

I don't know what you mean that it would "break interrupts". Either way handling an interrupt with cysignals and not returning to GAP is going to break things.

There's definitely still a problem somewhere:

sage: b = libgap.eval('Sleep(5);')
Caught error at top-most level, probably quit from library loading

It sleeps, then displays that message immediately and quits the process. Strangely, the only source I can find for that message is at the end of InitializeGap which should have already exited. I think there is still some strange things going on in GAP's error handling...

On the other hand if I do:

sage: f = libgap.eval('Sleep;')
sage: f
<Gap function "Sleep">
sage: f(5)

it works. But when I hit Ctrl-C:

sage: f(10)
^CCaught error at top-most level, probably quit from library loading

and the process quits again.

Last edited 3 years ago by embray (previous) (diff)

comment:180 in reply to: ↑ 179 Changed 3 years ago by jdemeyer

Replying to embray:

I don't know what you mean that it would "break interrupts".

(A) If the keep the sig_on() statements, then CTRL-C would break GAP in the same way as sig_error() breaks GAP.

(B) If you remove the sig_on() statements, then GAP can no longer be interrupted with CTRL-C.

I consider both alternatives not acceptable.

Last edited 3 years ago by jdemeyer (previous) (diff)

comment:181 Changed 3 years ago by embray

I'm not even sure yet how GAP handles signals. Do you know anything about it? I haven't looked into it yet.

comment:182 Changed 3 years ago by embray

There is a RegisterSyLongjmpObserver that might be useful. I'm not sure if it's used at all in signal handling though.

Last edited 3 years ago by embray (previous) (diff)

comment:183 Changed 3 years ago by jdemeyer

Looking at the GAP sources right now... GAP seems to handle interrupts similar to CPython, by setting some global state. The function below is called by the SIGINT handler:

/****************************************************************************
**
*F  InterruptExecStat() . . . . . . . . interrupt the execution of statements
**
**  'InterruptExecStat'  interrupts the execution of   statements at the next
**  possible moment.  It is called from 'SyAnsIntr' if an interrupt signal is
**  received.  It is never called on systems that do not support signals.  On
**  those systems the executors test 'SyIsIntr' at regular intervals.
**
**  'InterruptExecStat' changes all entries   in the executor  dispatch table
**  'ExecStatFuncs'  to point to  'ExecIntrStat',  which changes  the entries
**  back, calls 'Error', and redispatches after a return from the break-loop.
*/
void InterruptExecStat ( void )
{
    /* remember the original entries from the table 'ExecStatFuncs'        */
    STATE(CurrExecStatFuncs) = IntrExecStatFuncs;
}

comment:184 follow-up: Changed 3 years ago by embray

Okay. GAP's own signal handling (which just handles SIGINT) uses InterruptExecStat to signal to the eval loop that an interrupt has occurred, so it will break out and return an error at the first opportunity. We can probably make use of that.

comment:185 Changed 3 years ago by embray

I see we've got our noses on the same trail :) I'm going to step away for a bit anyways.

comment:186 in reply to: ↑ 184 ; follow-up: Changed 3 years ago by jdemeyer

Replying to embray:

We can probably make use of that.

This will need new hooks in cysignals though. Also, all sig_on() and sig_off() statements should be removed from the GAP interface.

Given that this is not the first priority, I'll let you guys work further. We can deal with interrupts once we have a mostly-finished GAP interface.

comment:187 in reply to: ↑ 186 ; follow-up: Changed 3 years ago by embray

Replying to jdemeyer:

Replying to embray:

We can probably make use of that.

This will need new hooks in cysignals though. Also, all sig_on() and sig_off() statements should be removed from the GAP interface.

I was wondering about that. I'm not sure cysignals is needed at all really, though if it can provide a wrapper interface to make things easier that might be nice. I think it would be sufficient, when entering and leaving GAP_EvalString, to just register/unregister a custom signal handler, for SIGINT at the very least.

We also need to be careful around GAP_Initialize to restore any existing SIGINT handler that gets clobbered by GAP's initialization...

comment:188 in reply to: ↑ 187 Changed 3 years ago by jdemeyer

Replying to embray:

to just register/unregister a custom signal handler, for SIGINT at the very least.

Since that involves system calls, that would be less efficient than doing something within cysignals. I bet that this would cause a noticable slowdown for a calculation like 1 + 1.

comment:189 in reply to: ↑ 179 Changed 3 years ago by embray

Replying to embray:

I don't know what you mean that it would "break interrupts". Either way handling an interrupt with cysignals and not returning to GAP is going to break things.

There's definitely still a problem somewhere:

sage: b = libgap.eval('Sleep(5);')
Caught error at top-most level, probably quit from library loading

It sleeps, then displays that message immediately and quits the process. Strangely, the only source I can find for that message is at the end of InitializeGap which should have already exited. I think there is still some strange things going on in GAP's error handling...

The reason for this appears to be a bug in how we're processing the result list from GAP_EvalString--in particular it is the equivalent of a Python IndexError on the list. GAP is treating this as a sort of internal error, meaning it jumps to the last invocation of its TRY_IF_NO_ERROR/CATCH_ERROR macros. But since in our code we're not in a TRY/CATCH, perversely the last TRY is back in InitializeGap, so it invokes a longjmp back there.

It seems we can work around that by using the ELM_PLIST function directly rather than the higher-level generic ELM_LIST. The lower-level function does not invoke GAP error handling. I've added another comment about this here: https://github.com/gap-system/gap/pull/2988/files#r233021437

comment:190 Changed 3 years ago by embray

Another annoyance--albeit one that I'm not sure if Sage's libGAP handles either--is that if GAP Panic(...)s it will just exit(1) the process and I'm not sure if there's any way around that currently...

comment:191 in reply to: ↑ 177 Changed 3 years ago by dimpase

Replying to jdemeyer:

Replying to dimpase:

My last attempt to build your branch ends with a glorious collision of GAP's T_INT and Python's T_INT...

Which branch is that? According to my earlier analysis, that shouldn't happen. So I'd like to see that branch.

Well, it was something cooked up a bit, it could be I just messed things up. I hope Erik can update his branch and share it, so that I can try again...

comment:192 Changed 3 years ago by jdemeyer

  • Description modified (diff)

comment:193 follow-up: Changed 3 years ago by embray

  • Description modified (diff)

Should I just go ahead and attach my branch to this ticket?

It's up to date with my current work, which mostly works pretty well except for three major issues that I'm aware of:

1) The weirdness with hanging during GAP initialization if there is an existing workspace. This might have something to do with the issue Markus has mentioned w.r.t. spawning a shell (??) for some reason. I'm going to look into that next since I consider it the most severe issue for getting something usable. I'm sure I can figure it out quickly; I just haven't investigated it at all yet.

2) General issues regarding signal handling. The situation there mostly isn't great--I found some cases where even GAP's eval loop can't be interrupted. I agree with Jeroen that ideally that's "not acceptable", but I believe we may need to compromise in the near term (and the situation isn't all that better in current Sage either).

3) The Panic issue. I've confirmed that affects current Sage as well. I believe Markus has made some comment to suggest he's thinking about that (among other related issues) so we may have a solution at some point (I think the obvious one, at least broadly speaking, is just having a hook for what Panic actually does). But I'm not going to worry about that for the purposes of GAP 4.10 integration since it's at least not a regression.

comment:194 follow-up: Changed 3 years ago by dimpase

regarding 1), I guess it is just off by one argc thing I already mentioned, cause in the case of workspace loading, argc is right, so it does notexpect a shell...

by all means, change the branch to your branch...

Last edited 3 years ago by dimpase (previous) (diff)

comment:195 in reply to: ↑ 194 Changed 3 years ago by embray

Replying to dimpase:

regarding 1), I guess it is just off by one argc thing I already mentioned, cause in the case of workspace loading, argc is right, so it does notexpect a shell...

by all means, change the branch to your branch...

I don't think so, because I fixed that in my branch and it still happens. I previously thought it might have something to do with the workspace being corrupted since I was causing segfaults, but now all it takes is for me to start sage, do one operation with GAP (enough to initialize the workspace), exit sage, and then try to start GAP again...

comment:196 Changed 3 years ago by embray

  • Branch changed from u/dimpase/WIP/libgap410 to u/embray/spkgs/gap-410
  • Commit changed from 406cae67f059056df29413b8fa36ac9964eb3c5b to 30d1f7fa95fcc9855077c5ed6587748eed1a4cce

My semi-working branch...


Last 10 new commits:

39e6c46modest progress - can do libgap(1)
981e099still the interaction with ipython is broken
17df139Fix more libGAP prefixes
e76f38dremove bogus libgap dependency; add gap as a dependency for sagelib
bcce303updated gap package to the 4.10.0 final release
ad2ea55minor source updates to account for new installation layout
2c21db0Fix some argument handling:
a797f42some improvements to robustness of the ERROR_OUTPUT handling when handling
e58b9edAllow GAP_EvalString to raise a Python exception
30d1f7fcysignals docs sage sig_on() should be outside try

comment:197 Changed 3 years ago by dimpase

  • Description modified (diff)

link to the tarball used by the branch

comment:198 follow-up: Changed 3 years ago by embray

Thanks--sorry about that. No reason not to use the bz2-compressed tarball if it's smaller, either. I just haven't bothered with it yet. If you want to make a public branch off mine to mess around with things feel free to do so.

comment:199 Changed 3 years ago by embray

To my surprise, it's not in LoadWorkspace that it hangs. Regardless, there's something about there being an existing workspace to load that causes it to crash. Continuing my gdb session...

comment:200 in reply to: ↑ 198 Changed 3 years ago by dimpase

Replying to embray:

Thanks--sorry about that. No reason not to use the bz2-compressed tarball if it's smaller, either. I just haven't bothered with it yet. If you want to make a public branch off mine to mess around with things feel free to do so.

I'll have time in about 7 hours - taking a flight from Boston to London tonight. Please update your branch with whatever good things find...

comment:201 Changed 3 years ago by embray

Ah, there's a global function called POST_RESTORE, which I guess gets set by the workspace, which is where it hangs. It also appears in lib/session.g which is probably worth looking at.

comment:202 Changed 3 years ago by embray

Ah, the goldmine is in GAPInfo.PostRestoreFuncs. All kind of system-level and terminal-related stuff in here:

[ function (  )
        local option;
        for option in GAPInfo.CommandLineOptionData do
            if IsBound( option.short ) then
                GAPInfo.CommandLineOptionCanonicalName.(option.short) := option.short;
                if IsBound( option.long ) then
                    GAPInfo.CommandLineOptionCanonicalName.(option.long) := option.short;
                fi;
            else
                GAPInfo.CommandLineOptionCanonicalName.(option.long) := option.long;
            fi;
        od;
        return;
    end, function (  )
        local j, i, CommandLineOptions, opt, InitFiles, line, word, value, padspace;
        GAPInfo.KernelInfo := KERNEL_INFO(  );
        GAPInfo.KernelVersion := GAPInfo.KernelInfo.KERNEL_VERSION;
        GAPInfo.BuildVersion := GAPInfo.KernelInfo.BUILD_VERSION;
        GAPInfo.BuildDateTime := GAPInfo.KernelInfo.BUILD_DATETIME;
        GAPInfo.Architecture := GAPInfo.KernelInfo.GAP_ARCHITECTURE;
        GAPInfo.SystemCommandLine := GAPInfo.KernelInfo.COMMAND_LINE;
        GAPInfo.SystemEnvironment := GAPInfo.KernelInfo.ENVIRONMENT;
        GAPInfo.RootPaths := GAPInfo.KernelInfo.GAP_ROOT_PATHS;
        if IsBound( GAPInfo.SystemEnvironment.HOME ) then
            GAPInfo.UserHome := GAPInfo.SystemEnvironment.HOME;
        else
            GAPInfo.UserHome := fail;
        fi;
        if IsBound( GAPInfo.KernelInfo.DOT_GAP_PATH ) then
            GAPInfo.UserGapRoot := GAPInfo.KernelInfo.DOT_GAP_PATH;
        else
            GAPInfo.UserGapRoot := fail;
        fi;
        GAPInfo.DirectoriesPrograms := false;
        GAPInfo.DirectoryCurrent := false;
        GAPInfo.DirectoriesLibrary := rec(
             );
        GAPInfo.DirectoriesTemporary := [  ];
        GAPInfo.DirectoriesSystemPrograms := [  ];
        if IsBound( GAPInfo.SystemEnvironment.PATH ) then
            j := 1;
            for i in [ 1 .. LENGTH( GAPInfo.SystemEnvironment.PATH ) ] do
                if GAPInfo.SystemEnvironment.PATH[i] = ':' then
                    if i > j then
                        ADD_LIST_DEFAULT( GAPInfo.DirectoriesSystemPrograms,
                         MakeImmutable( GAPInfo.SystemEnvironment.PATH{[ j .. i - 1 ]} ) );
                    fi;
                    j := i + 1;
                fi;
            od;
            if j <= LENGTH( GAPInfo.SystemEnvironment.PATH ) then
                ADD_LIST_DEFAULT( GAPInfo.DirectoriesSystemPrograms,
                 MakeImmutable(
                   GAPInfo.SystemEnvironment.PATH{[ j .. LENGTH( GAPInfo.SystemEnvironment.PATH ) ]} )
                 );
            fi;
        fi;
        CommandLineOptions := rec(
             );
        for opt in GAPInfo.CommandLineOptionData do
            if IsBound( opt.short ) then
                CommandLineOptions.(opt.short) := SHALLOW_COPY_OBJ( opt.default );
            else
                CommandLineOptions.(opt.long) := SHALLOW_COPY_OBJ( opt.default );
            fi;
        od;
        InitFiles := [  ];
        line := GAPInfo.SystemCommandLine;
        i := 2;
        while i <= LENGTH( line ) do
            word := line[i];
            i := i + 1;
            if word = "" then
                PRINT_TO( "*errout*", "Ignoring empty command line argument\n" );
            elif word[1] = '-' and (LENGTH( word ) = 2 or word[2] = '-') then
                opt := word{[ 2 .. LENGTH( word ) ]};
                if opt[1] = '-' then
                    opt := opt{[ 2 .. LENGTH( opt ) ]};
                fi;
                if not IsBound( GAPInfo.CommandLineOptionCanonicalName.(opt) ) then
                    PRINT_TO( "*errout*", "Unrecognised command line option: ", word, "\n" );
                else
                    opt := GAPInfo.CommandLineOptionCanonicalName.(opt);
                    value := CommandLineOptions.(opt);
                    if IS_BOOL( value ) then
                        CommandLineOptions.(opt) := not CommandLineOptions.(opt);
                    elif IS_INT( value ) then
                        CommandLineOptions.(opt) := CommandLineOptions.(opt) + 1;
                    elif i <= LENGTH( line ) then
                        if IS_STRING_REP( value ) then
                            CommandLineOptions.(opt) := line[i];
                            i := i + 1;
                        elif IS_LIST( value ) then
                            ADD_LIST_DEFAULT( CommandLineOptions.(opt), line[i] );
                            i := i + 1;
                        fi;
                    else
                        PRINT_TO( "*errout*", "Command line option ", word, " needs an argument.\n" );
                    fi;
                fi;
            else
                ADD_LIST_DEFAULT( InitFiles, word );
            fi;
        od;
        CommandLineOptions.g := CommandLineOptions.g mod 3;
        CommandLineOptions.E := GAPInfo.KernelInfo.HAVE_LIBREADLINE;
        if CommandLineOptions.nointeract then
            CommandLineOptions.T := true;
            CommandLineOptions.norepl := true;
        fi;
        MakeImmutable( CommandLineOptions );
        MakeImmutable( InitFiles );
        if CommandLineOptions.L = "" or CommandLineOptions.R then
            GAPInfo.CommandLineOptionsPrev := [  ];
            GAPInfo.InitFilesPrev := [  ];
        else
            ADD_LIST_DEFAULT( GAPInfo.CommandLineOptionsPrev, GAPInfo.CommandLineOptions );
            ADD_LIST_DEFAULT( GAPInfo.InitFilesPrev, GAPInfo.InitFiles );
        fi;
        GAPInfo.CommandLineOptions := CommandLineOptions;
        GAPInfo.InitFiles := InitFiles;
        if GAPInfo.CommandLineOptions.D then
            InfoRead1 := Print;
        fi;
        padspace := function ( strlen, len )
              local i;
              for i in [ strlen + 1 .. len ] do
                  PRINT_TO( "*errout*", " " );
              od;
              return;
          end;
        if GAPInfo.CommandLineOptions.h then
            PRINT_TO( "*errout*", "usage: gap [OPTIONS] [FILES]\n",
             "       run the Groups, Algorithms and Programming system, Version ",
             GAPInfo.KernelVersion, "\n\n" );
            for i in [ 1 .. LENGTH( GAPInfo.CommandLineOptionData ) ] do
                if
                 IsBound( GAPInfo.CommandLineOptionData[i] )
                    and IsBound( GAPInfo.CommandLineOptionData[i].help ) then
                    opt := GAPInfo.CommandLineOptionData[i];
                    if IsBound( opt.short ) then
                        PRINT_TO( "*errout*", " -", opt.short );
                        if IsBound( opt.long ) then
                            PRINT_TO( "*errout*", ", --", opt.long );
                            padspace( 4 + LENGTH( opt.long ), 16 );
                        else
                            padspace( 0, 16 );
                        fi;
                        if IsBound( opt.arg ) then
                            PRINT_TO( "*errout*", " ", opt.arg );
                            padspace( LENGTH( opt.arg ) + 1, 8 );
                        else
                            padspace( 0, 8 );
                        fi;
                    else
                        PRINT_TO( "*errout*", "   " );
                        PRINT_TO( "*errout*", "  --", opt.long );
                        padspace( 4 + LENGTH( opt.long ), 16 );
                        if IsBound( opt.arg ) then
                            PRINT_TO( "*errout*", " ", opt.arg );
                            padspace( LENGTH( opt.arg ) + 1, 8 );
                        else
                            padspace( 0, 8 );
                        fi;
                    fi;
                    if IsBound( opt.long ) and LENGTH( opt.long ) > 12 then
                        PRINT_TO( "*errout*", "\n" );
                        padspace( 0, 3 + 16 + 8 + 3 );
                    else
                        PRINT_TO( "*errout*", "   " );
                    fi;
                    PRINT_TO( "*errout*", opt.help[1], "\n" );
                    for j in [ 2 .. LENGTH( opt.help ) ] do
                        padspace( 0, 3 + 16 + 8 + 3 );
                        PRINT_TO( "*errout*", opt.help[j], "\n" );
                    od;
                else
                    if not IsBound( GAPInfo.CommandLineOptionData[i] ) then
                        PRINT_TO( "*errout*", "\n" );
                    fi;
                fi;
            od;
            PRINT_TO( "*errout*", "\n",
             "  Boolean options (b,q,e,r,A,D,E,M,N,T,X,Y) toggle the current value\n",
             "  each time they are called. Default actions are indicated first.\n", "\n" );
            QUIT_GAP(  );
        fi;
        return;
    end, function (  )
        if DEBUG_LOADING then
            InfoRead1 := Print;
        else
            InfoRead1 := Ignore;
        fi;
        MAKE_READ_WRITE_GLOBAL( "DEBUG_LOADING" );
        UNBIND_GLOBAL( "DEBUG_LOADING" );
        ASS_GVAR( "BreakOnError", not GAPInfo.CommandLineOptions.T );
        ASS_GVAR( "AlwaysPrintTracebackOnError", GAPInfo.CommandLineOptions.alwaystrace );
        ASS_GVAR( "SilentNonInteractiveErrors", false );
        return;
    end, function (  )
        local nondigits, digits, i, char, have, need, haveint, needint;
        if
         not IS_STRING( GAPInfo.NeedKernelVersion )
              and not GAPInfo.KernelVersion in GAPInfo.NeedKernelVersion
            or IS_STRING( GAPInfo.NeedKernelVersion )
              and GAPInfo.KernelVersion <> GAPInfo.NeedKernelVersion then
            Print( "\n\n", "You are running a GAP kernel which does not fit with the library.\n\n" );
            nondigits := [  ];
            digits := "0123456789";
            for i in [ 0 .. 255 ] do
                char := CHAR_INT( i );
                if not char in digits then
                    ADD_LIST_DEFAULT( nondigits, char );
                fi;
            od;
            have := SplitStringInternal( GAPInfo.KernelVersion, nondigits, "" );
            need := SplitStringInternal( GAPInfo.NeedKernelVersion, nondigits, "" );
            haveint := [  ];
            needint := [  ];
            for i in [ 1 .. 3 ] do
                haveint[i] := IntHexString( have[i] );
                needint[i] := IntHexString( need[i] );
            od;
            Print( "Current kernel version:   ", haveint[1], ".", haveint[2], ".", haveint[3], "\n" );
            Print( "Library requires version: ", needint[1], ".", needint[2], ".", needint[3], "\n" );
            if haveint > needint then
                Print( "The GAP kernel is newer than the library.\n\n" );
            else
                Print(
                 "The GAP kernel is older than the library. Perhaps you forgot to recompile?\n\n" );
            fi;
            Error( "Update to correct kernel version!\n\n" );
        fi;
        return;
    end, function (  )
        local tmp, a;
        tmp := [  ];
        for a in REC_NAMES( OPER_FLAGS ) do
            ADD_LIST( tmp, OPER_FLAGS.(a) );
            Unbind( OPER_FLAGS.(a) );
        od;
        for a in tmp do
            OPER_FLAGS.(MASTER_POINTER_NUMBER( a[1] )) := a;
        od;
        return;
    end, function (  )
        ASS_GVAR( "ERROR_COUNT", 0 );
        ASS_GVAR( "ErrorLevel", 0 );
        ;
        ASS_GVAR( "QUITTING", false );
        return;
    end, function (  )
        local env, pos, enc, a, PositionSublist;
        PositionSublist := function ( str, sub )
              local i;
              for i in [ 1 .. Length( str ) - Length( sub ) + 1 ] do
                  if str{[ i .. i + Length( sub ) - 1 ]} = sub then
                      return i;
                  fi;
              od;
              return fail;
          end;
        if not IsBound( GAPInfo.TermEncodingOverwrite ) then
            if IsList( GAPInfo.SystemEnvironment ) then
                env := rec(
                     );
                for a in GAPInfo.SystemEnvironment do
                    pos := Position( a, '=' );
                    env.(a{[ 1 .. pos - 1 ]}) := a{[ pos + 1 .. Length( a ) ]};
                od;
            else
                env := GAPInfo.SystemEnvironment;
            fi;
            enc := fail;
            if IsBound( env.LC_CTYPE ) then
                enc := env.LC_CTYPE;
            fi;
            if enc = fail and IsBound( env.LC_ALL ) then
                enc := env.LC_ALL;
            fi;
            if enc = fail and IsBound( env.LANG ) then
                enc := env.LANG;
            fi;
            if enc <> fail then
                enc := STRING_LOWER( enc );
                if PositionSublist( enc, "utf-8" ) <> fail or PositionSublist( enc, "utf8" ) <> fail
                    then
                    GAPInfo.TermEncoding := "UTF-8";
                fi;
            fi;
            if not IsBound( GAPInfo.TermEncoding ) then
                GAPInfo.TermEncoding := "ISO-8859-1";
            fi;
        else
            GAPInfo.TermEncoding := GAPInfo.TermEncodingOverwrite;
        fi;
        MakeImmutable( GAPInfo.TermEncoding );
        return;
    end, function (  )
        if
         not (GAPInfo.CommandLineOptions.q or GAPInfo.CommandLineOptions.b
               or GAPInfo.CommandLineOptions.L <> "") then
            ShowKernelInformation(  );
        fi;
        return;
    end, function (  )
        ASS_GVAR( "IN_LOGGING_MODE", false );
        return;
    end, function (  )
        if IsBool( GAPInfo.DirectoryCurrent ) then
            GAPInfo.DirectoryCurrent := Directory( "./" );
        fi;
        return GAPInfo.DirectoryCurrent;
    end, function (  )
        READ_GAP_ROOT( "gap.ini" );
        return;
    end, function (  )
        if not GAPInfo.CommandLineOptions.O and UserPreference( "ReadObsolete" ) <> false
            and not IsBound( GAPInfo.Read_obsolete_gd ) then
            ReadLib( "obsolete.gd" );
            GAPInfo.Read_obsolete_gd := true;
        fi;
        return;
    end, function (  )
        local env, pos, enc, a;
        if not IsBound( GAPInfo.TermEncodingOverwrite ) then
            if IsList( GAPInfo.SystemEnvironment ) then
                env := rec(
                     );
                for a in GAPInfo.SystemEnvironment do
                    pos := Position( a, '=' );
                    env.(a{[ 1 .. pos - 1 ]}) := a{[ pos + 1 .. Length( a ) ]};
                od;
            else
                env := GAPInfo.SystemEnvironment;
            fi;
            enc := fail;
            if IsBound( env.LC_CTYPE ) then
                enc := env.LC_CTYPE;
            fi;
            if enc = fail and IsBound( env.LC_ALL ) then
                enc := env.LC_ALL;
            fi;
            if enc = fail and IsBound( env.LANG ) then
                enc := env.LANG;
            fi;
            if
             enc <> fail and (PositionSublist( enc, ".UTF-8" ) <> fail
                  or PositionSublist( enc, ".utf8" ) <> fail) then
                GAPInfo.TermEncoding := "UTF-8";
            fi;
            if not IsBound( GAPInfo.TermEncoding ) then
                GAPInfo.TermEncoding := "ISO-8859-1";
            fi;
        else
            GAPInfo.TermEncoding := GAPInfo.TermEncodingOverwrite;
        fi;
        MakeImmutable( GAPInfo.TermEncoding );
        return;
    end, function (  )
        local banner, msg, pair, excludedpackages, name, record;
        if IsBoundGlobal( "BANNER" ) then
            banner := ValueGlobal( "BANNER" );
            MakeReadWriteGlobal( "BANNER" );
            UnbindGlobal( "BANNER" );
        fi;
        BindGlobal( "BANNER", false );
        if GAPInfo.CommandLineOptions.L = "" then
            msg := "entering AutoloadPackages (no workspace)";
        else
            msg := Concatenation( "entering AutoloadPackages (workspace ", GAPInfo.CommandLineOptions.L
                , ")" );
        fi;
        LogPackageLoadingMessage( PACKAGE_DEBUG, msg, "GAP" );
        SetInfoLevel( InfoPackageLoading, UserPreference( "InfoPackageLoadingLevel" ) );
        GAPInfo.ExcludeFromAutoload := [  ];
        GAPInfo.PackagesInfoInitialized := false;
        InitializePackagesInfoRecords(  );
        GAPInfo.delayedImplementationParts := [  ];
        if ForAny( GAPInfo.Dependencies.NeededOtherPackages, function ( p )
                  return not IsBound( GAPInfo.PackagesLoaded.(p[1]) );
              end ) then
            LogPackageLoadingMessage( PACKAGE_DEBUG,
             Concatenation( [ "trying to load needed packages" ],
               List( GAPInfo.Dependencies.NeededOtherPackages, function ( pair )
                      return Concatenation( pair[1], " (", pair[2], ")" );
                  end ) ), "GAP" );
            if GAPInfo.CommandLineOptions.A then
                PushOptions( rec(
                    OnlyNeeded := true ) );
            fi;
            for pair in GAPInfo.Dependencies.NeededOtherPackages do
                if LoadPackage( pair[1], pair[2], false ) <> true then
                    LogPackageLoadingMessage( PACKAGE_ERROR, Concatenation( "needed package ",
                       pair[1], " cannot be loaded" ), "GAP" );
                    Error( "failed to load needed package `", pair[1], "' (version ", pair[2], ")" );
                fi;
            od;
            LogPackageLoadingMessage( PACKAGE_DEBUG, "needed packages loaded", "GAP" );
            if GAPInfo.CommandLineOptions.A then
                PopOptions(  );
            fi;
        fi;
        if not IsBound( GAPInfo.LibraryLoaded ) then
            LogPackageLoadingMessage( PACKAGE_DEBUG, [ "read the impl. part of the GAP library" ],
             "GAP" );
            ReadGapRoot( "lib/read.g" );
            GAPInfo.LibraryLoaded := true;
            GAPInfo.LoadPackageLevel := GAPInfo.LoadPackageLevel + 1;
            LoadPackage_ReadImplementationParts( GAPInfo.delayedImplementationParts, false );
            GAPInfo.LoadPackageLevel := GAPInfo.LoadPackageLevel - 1;
        fi;
        Unbind( GAPInfo.delayedImplementationParts );
        MakeReadWriteGlobal( "BANNER" );
        UnbindGlobal( "BANNER" );
        if IsBound( banner ) then
            BindGlobal( "BANNER", banner );
        fi;
        if GAPInfo.CommandLineOptions.A then
            LogPackageLoadingMessage( PACKAGE_DEBUG,
             "omitting packages suggested via \"PackagesToLoad\" (-A option)", "GAP" );
        elif ValueOption( "OnlyNeeded" ) = true then
            LogPackageLoadingMessage( PACKAGE_DEBUG,
             [ "omitting packages suggested via \"PackagesToLoad\"", " ('OnlyNeeded' option)" ], "GAP"
             );
        elif ForAny( List( UserPreference( "PackagesToLoad" ), LowercaseString ), function ( p )
                  return not IsBound( GAPInfo.PackagesLoaded.(p) );
              end ) then
            excludedpackages := List( UserPreference( "ExcludeFromAutoload" ), LowercaseString );
            LogPackageLoadingMessage( PACKAGE_DEBUG,
             Concatenation( [ "trying to load suggested packages" ], UserPreference( "PackagesToLoad"
                 ) ), "GAP" );
            for name in UserPreference( "PackagesToLoad" ) do
                if LowercaseString( name ) in excludedpackages then
                    LogPackageLoadingMessage( PACKAGE_DEBUG,
                     Concatenation( "excluded from autoloading: ", name ), "GAP" );
                elif not IsBound( GAPInfo.PackagesLoaded.(LowercaseString( name )) ) then
                    LogPackageLoadingMessage( PACKAGE_DEBUG,
                     Concatenation( "considering for autoloading: ", name ), "GAP" );
                    if LoadPackage( name, false ) <> true then
                        LogPackageLoadingMessage( PACKAGE_DEBUG, Concatenation( "suggested package ",
                           name, " cannot be loaded" ), "GAP" );
                    else
                        LogPackageLoadingMessage( PACKAGE_DEBUG, Concatenation( name, " loaded" ),
                         "GAP" );
                    fi;
                fi;
            od;
            LogPackageLoadingMessage( PACKAGE_DEBUG, "suggested packages loaded", "GAP" );
        fi;
        LogPackageLoadingMessage( PACKAGE_DEBUG,
         "call LoadPackageDocumentation for not loaded packages", "GAP" );
        for name in RecNames( GAPInfo.PackagesInfo ) do
            if not IsBound( GAPInfo.PackagesLoaded.(name) ) then
                record := First( GAPInfo.PackagesInfo.(name), IsRecord );
                if record <> fail then
                    LoadPackageDocumentation( record );
                fi;
            fi;
        od;
        LogPackageLoadingMessage( PACKAGE_DEBUG,
         "LoadPackageDocumentation for not loaded packages done", "GAP" );
        Unbind( GAPInfo.ExcludeFromAutoload );
        LogPackageLoadingMessage( PACKAGE_DEBUG, "return from AutoloadPackages", "GAP" );
        return;
    end, function (  )
        local xy;
        xy := [  ];
        if GAPInfo.CommandLineOptions.x <> "" then
            xy[1] := SMALLINT_STR( GAPInfo.CommandLineOptions.x );
        fi;
        if GAPInfo.CommandLineOptions.y <> "" then
            xy[2] := SMALLINT_STR( GAPInfo.CommandLineOptions.y );
        fi;
        if xy <> [  ] then
            SizeScreen( xy );
        fi;
        if GAPInfo.CommandLineOptions.g = 0 then
            SetGasmanMessageStatus( "none" );
        elif GAPInfo.CommandLineOptions.g = 1 then
            SetGasmanMessageStatus( "full" );
        else
            SetGasmanMessageStatus( "all" );
        fi;
        GAPInfo.ViewLength := UserPreference( "ViewLength" );
        ColorPrompt( UserPreference( "UseColorPrompt" ) );
        return;
    end, function (  )
        if not GAPInfo.CommandLineOptions.O and UserPreference( "ReadObsolete" ) <> false
            and not IsBound( GAPInfo.Read_obsolete_gi ) then
            ReadLib( "obsolete.gi" );
            GAPInfo.Read_obsolete_gi := true;
        fi;
        return;
    end, function (  )
        if UserPreference( "SaveAndRestoreHistory" ) = true then
            InstallAtExit( SaveCommandLineHistory );
            ReadCommandLineHistory(  );
        fi;
        return;
    end, function (  )
        if READ_GAP_ROOT( "gaprc" ) then
            GAPInfo.HasReadGAPRC := true;
        elif not IsExistingFile( GAPInfo.UserGapRoot ) then
            if not (GAPInfo.CommandLineOptions.r or GAPInfo.HasReadGAPRC) then
                if ARCH_IS_UNIX(  ) then
                    GAPInfo.gaprc := SHALLOW_COPY_OBJ( GAPInfo.UserHome );
                    if IsString( GAPInfo.gaprc ) then
                        APPEND_LIST_INTR( GAPInfo.gaprc, "/.gaprc" );
                    fi;
                else
                    GAPInfo.gaprc := "gap.rc";
                fi;
                if IsString( GAPInfo.gaprc ) and READ( GAPInfo.gaprc ) then
                    Info( InfoWarning, 1, "You are using an old ", GAPInfo.gaprc, " file. " );
                    Info( InfoWarning, 1, "See '?Ref: The former .gaprc file' for hints to upgrade." );
                    GAPInfo.HasReadGAPRC := true;
                fi;
            fi;
        fi;
        return;
    end, function (  )
        if not (GAPInfo.CommandLineOptions.q or GAPInfo.CommandLineOptions.b) then
            if GAPInfo.CommandLineOptions.L <> "" then
                ShowKernelInformation(  );
            fi;
            ShowPackageInformation(  );
        fi;
        return;
    end, function (  )
        local i, status;
        for i in [ 1 .. Length( GAPInfo.InitFiles ) ] do
            status := READ_NORECOVERY( GAPInfo.InitFiles[i] );
            if status = fail then
                PRINT_TO( "*errout*", "Reading file \"", GAPInfo.InitFiles[i],
                 "\" has been aborted.\n" );
                if i < Length( GAPInfo.InitFiles ) then
                    PRINT_TO( "*errout*",
                     "The remaining files on the command line will not be read.\n" );
                fi;
                break;
            elif status = false then
                PRINT_TO( "*errout*", "Could not read file \"", GAPInfo.InitFiles[i], "\".\n" );
            fi;
        od;
        return;
    end ]

comment:203 Changed 3 years ago by jdemeyer

  • Description modified (diff)

comment:204 in reply to: ↑ 193 ; follow-up: Changed 3 years ago by jdemeyer

Replying to embray:

I think the obvious one, at least broadly speaking, is just having a hook for what Panic actually does.

The obvious thing to do would be to treat this as any other error condition.

comment:205 in reply to: ↑ 204 Changed 3 years ago by embray

Replying to jdemeyer:

Replying to embray:

I think the obvious one, at least broadly speaking, is just having a hook for what Panic actually does.

The obvious thing to do would be to treat this as any other error condition.

Yes, but I don't know that that's necessarily the right thing to do in the case of GAP. Much of GAP's error handling code is actually written in the GAP language (in the error.g library), where as Panic() seems primarily reserved for cases where the GAP kernel cannot be guaranteed to work anymore, much handle errors correctly.

A similar, but more subtle approach, would be to bypass most of the normal higher-level error-handling functionality, but still use JUMP_TO_CATCH to allow the error to be handled. If the GAP interpreter process wants to just exit upon such errors it can do that in the appropriate CATCH handler.

That's probably still naïve but I would bet it's closer to what they might have in mind to do. Of course, we can always ask...

comment:206 Changed 3 years ago by embray

  • Description modified (diff)

I was slightly barking up the wrong tree with the PostRestoreFuncs. Turns out it's hanging right after that in the function called SESSION. That's where it invokes that problematic SHELL() call:

 36 BIND_GLOBAL("SESSION",
 37     function()
 38     local   prompt;
 39
 40     if GAPInfo.CommandLineOptions.q then
 41         prompt := "";
 42     else
 43         prompt := "gap> ";
 44     fi;
 45
 46     SHELL( GetBottomLVars(), # in global context
 47         false, # no return
 48         false, # no return  obj
 49         3,     # set last, last2 and last3 each command
 50         true,  # set time after each command
 51         prompt,
 52         function()
 53             if IsBound(OnGAPPromptHook) and IsFunction(OnGAPPromptHook) then
 54                 OnGAPPromptHook();
 55             else
 56                 return;
 57             fi;
 58         end,
 59         "*stdin*",
 60         "*stdout*",
 61         true);
 62
 63     BreakOnError := false;
 64 end);

I'm still no exactly sure what this is for or why it causes libgap to hang in Sage. But it seems we can avoid it by using the --norepl option. What's strange is that --nointeract should imply --norepl but that doesn't appear to be working. That would have been the case with the off-by-one error in argc that Dima pointed out. But I fixed that and it didn't seem to help :/

Perhaps this is simple...

Last edited 3 years ago by embray (previous) (diff)

comment:207 Changed 3 years ago by jdemeyer

  • Description modified (diff)

Erik, please take care not the change the ticket description again... you accidentally reverted a chance that I made twice now.

comment:208 follow-up: Changed 3 years ago by embray

This is basically the issue that was reported in https://github.com/gap-system/gap/pull/2840/files, but it appears to still be broken in some way if I load an existing workspace. The --nointeract and/or --norepl options get clobbered.

comment:209 Changed 3 years ago by embray

Sorry. Dangers of using an outdated platform.

comment:210 in reply to: ↑ 208 Changed 3 years ago by embray

Replying to embray:

This is basically the issue that was reported in https://github.com/gap-system/gap/pull/2840/files, but it appears to still be broken in some way if I load an existing workspace. The --nointeract and/or --norepl options get clobbered.

I realized somewhat too late that although this PR was merged well before 4.10.0 was released, that fix was never included in the release. So either we wait for 4.10.1 (and see if we can ensure that this fix is included) or we patch it, or work around it somehow. I'm looking into maybe using a gaprc file to hack around it, but I don't have high hopes for that...

comment:211 follow-up: Changed 3 years ago by embray

Ok. Putting the following gaprc in GAP_ROOT works around it:

if GAPInfo.CommandLineOptions.norepl then
    # GAP 4.10.0 has a bug that an interactive session will be started
    # even if --norepl was set; see https://github.com/gap-system/gap/pull/2840
    # To work around this we redefine the SESSION function to a no-op
    MAKE_READ_WRITE_GLOBAL("SESSION");
    UNBIND_GLOBAL("SESSION");
    BIND_GLOBAL("SESSION", function() end);
fi;

my one concern is that if there is a gaprc in the UserGapRoot this will be ignored. I'm not 100% sure if that's the way it works or not. Need to double-check.

Last edited 3 years ago by embray (previous) (diff)

comment:212 in reply to: ↑ 211 Changed 3 years ago by embray

Replying to embray:

my one concern is that if there is a gaprc in the UserGapRoot this will be ignored. I'm not 100% sure if that's the way it works or not. Need to double-check.

Yes. That does appear to be a problem... Another option might be to install this via a gap.ini, but that would theoretically have the same problem I think.

comment:213 Changed 3 years ago by embray

A third possibility, which may be good enough for Sage's case, would be to install a pre-configured "default workspace" that includes this fix, and just always load that workspace if no other is found.

comment:214 follow-up: Changed 3 years ago by embray

Ah, best of all (I feel like I discovered this once a long time ago too): If we just pass a filename argument to libgap's init it will read that filename too, after everything else. This happens during the PostRestoreFuncs processing so it's still early enough, before the session is initialized.

Therefore for Sage's libgap initialization we can include a file with any number of patches to GAP.

comment:215 Changed 3 years ago by git

  • Commit changed from 30d1f7fa95fcc9855077c5ed6587748eed1a4cce to ed8fc3058db547b2af7ab1e72e468a60aa92895c

Branch pushed to git repo; I updated commit sha1. New commits:

28535a5remove (apparently?) superfluous gap_eval
97be615when a statement in libgap.eval returns NULL, just return Python's None instead; makes for empty output to the command prompt
2b7bdcdUse ELM_PLIST directly instead of ELM_LIST when processing GAP_EvalString
ed8fc30install a special file called sage.gaprc that is read at the end of libgap

comment:216 Changed 3 years ago by embray

^ An attempt at a fix. Seems to work, and I'm pretty happy with the approach.

Last edited 3 years ago by embray (previous) (diff)

comment:217 in reply to: ↑ 214 Changed 3 years ago by embray

Replying to embray:

Ah, best of all (I feel like I discovered this once a long time ago too): If we just pass a filename argument to libgap's init it will read that filename too, after everything else. This happens during the PostRestoreFuncs processing so it's still early enough, before the session is initialized.

Therefore for Sage's libgap initialization we can include a file with any number of patches to GAP.

I see now that the pexpect interface does something similar with sage.g. It just lives in SAGE_EXTCODE rather than being installed in the package itself (I obviously prefer the latter). For libgap we probably shouldn't need most of the stuff in sage.g but I'm not sure.

comment:218 Changed 3 years ago by git

  • Commit changed from ed8fc3058db547b2af7ab1e72e468a60aa92895c to f3a909cdec30f70891b515fed05c53072fdfec09

Branch pushed to git repo; I updated commit sha1. New commits:

f3a909cdifferent way of repr-ing a GAP element more consistent with the previous method

comment:219 follow-up: Changed 3 years ago by embray

It appears that next we need to focus on the GAP pexpect interface, which is also broken. It just hangs as soon as you try to do anything :(

comment:220 in reply to: ↑ 219 Changed 3 years ago by embray

Replying to embray:

It appears that next we need to focus on the GAP pexpect interface, which is also broken. It just hangs as soon as you try to do anything :(

It appears to get lost in gap_reset_workspace while loading the packages 'sonata' and 'guava'. It doesn't run into problems with other packages. So there are two problems:

1) The fact that something is broken when loading those packages

2) The fact that Sage's pexpect parser for GAP seems to go into an infinite loop upon those error conditions (it even prints "Warning: this should never happen"--so whatever "this" is apparently can happen, and the parser does not behave well when it does.

comment:221 follow-up: Changed 3 years ago by alexk

Looking at this ticket. Of course it became very long and I can't get an overview of all thread, but some comments while scrolling through it:

1) There is a smaller archive https://www.gap-system.org/pub/gap/gap4core/gap-4.10.0-core.zip of the core GAP system and four _required_ packages, if it helps to reduce the bandwidth. You can then get matching package archive from https://www.gap-system.org/pub/gap/gap4pkgs/ (ask me which). Also, may be helpful to reduce interaction with packages

2) From doc/dev - PostRestoreFunctions are called after a workspace has been loaded to finish linking up the kernel and workspace. Some of those from GAPInfo.PostRestoreFunctions may be added by packages (GAPDoc, GRAPE).

3) Did you get it to work without the workspace already?

4) "A lot of the test suites under tst/ seem to work, or mostly work" - the word "mostly" sounds alarming. see e.g. https://travis-ci.org/gap-system/gap-docker-stable-4.10-testsuite - everything should work there.

5) doc directory of GAP - of course, please include it. Allow the user to use it from GAP command line via ?<search_term> mechanism.

All from me for now!

Last edited 3 years ago by alexk (previous) (diff)

comment:222 follow-up: Changed 3 years ago by alexk

Try to load GUAVA and SONATA in GAP session and check their output:

gap> LoadPackage("sonata");
#I  You may wish to install the xgap package
#I  and enjoy the graphic capabilities of SONATA.

  ___________________________________________________________________________
 /        ___
||       /   \                 /\    Version 2.9.1
||      ||   ||  |\    |      /  \               /\       Erhard Aichinger
 \___   ||   ||  |\\   |     /____\_____________/__\      Franz Binder
     \  ||   ||  | \\  |    /      \     ||    /    \     Juergen Ecker
     ||  \___/   |  \\ |   /        \    ||   /      \    Peter Mayr
     ||          |   \\|  /          \   ||               Christof Noebauer
 \___/           |    \|                 ||

 System    Of   Nearrings     And      Their Applications
 Info: https://gap-packages.github.io/sonata/

true
gap> LoadPackage("guava");

   ____                          |
  /            \           /   --+--  Version 3.14
 /      |    | |\\        //|    |
|    _  |    | | \\      // |     the GUAVA Group
|     \ |    | |--\\    //--|     
 \     ||    | |   \\  //   |     
  \___/  \___/ |    \\//    |      
                                  

true
gap>

You may want to use the 2nd argument to disable their banners:

gap> LoadPackage("sonata",false);
#I  You may wish to install the xgap package
#I  and enjoy the graphic capabilities of SONATA.
true
gap> LoadPackage("guava",false);
true
gap> 

comment:223 in reply to: ↑ 221 Changed 3 years ago by embray

Replying to alexk:

Looking at this ticket. Of course it became very long and I can't get an overview of all thread, but some comments while scrolling through it:

Yes, sorry for the length. We are just taking notes here as we go I think. Some point soon I will make a summary progress report and link to it in the description, perhaps giving some indication as to where to skip to in the comments.

comment:224 in reply to: ↑ 222 Changed 3 years ago by embray

Replying to alexk:

Try to load GUAVA and SONATA in GAP session and check their output:

It works normally, but when I run gap with the -p flag ("enable/disable package output mode") it hangs like:

@p1.@n ┌───────┐@n   GAP 4.10.0 of 01-Nov-2018@J@n │  GAP  │   https://www.gap-system.org@J@n └───────┘@n   Architecture: x86_64-pc-linux-gnu-default64@J@n Configuration:  gmp 6.0.0@n@J@n Loading the library @nand packages ...@J@!968002+@"06191+@#378371+@$19131+@%13743+@&63556+@!90963+@"6494+@#15447+@$0255+@%45192+@&63556+@!47803+@"4843+@#72844+@$0625+@%03152+@&63556+@!63803+@"1393+@#19753+@$8863+@%17602+@&63556+@!03712+@"6513+@#27151+@$3431+@%03171+@&63556+@!75471+@"1752+@#44851+@$1441+@%75241+@&63556+@!85661+@"6242+@#31881+@$0691+@%54511+@&63556+@!52121+@"3761+@#3368+@$687+@%1669+@&63556+@!48091+@"5051+@#45203+@$7951+@%1287+@&63556+@!18252+@"6241+@#42751+@$3632+@%6495+@&63556+@!5723+@"464+@#77637+@$1581+@%2345+@&63556+@!365+@"93+@#242821+@$4492+@%4735+@&63556+@n Packages:   GAPDoc 1.6.2@n, PrimGrp 3.3.2@n, SmallGrp 1.3@n, TransGrp 2.0.4@n@J@n Try '??help' for help. See also '?copyright', '?cite' and '?authors'@J@ngap> @iLoadPackage("sonata");
LoadPackage("sonata");
@rLoadPackage("sonata");@J@w3+XCN

I don't fully understand what -p means or what all the output above is, though I know that Sage's pexect interface to GAP uses this debug output extensively to understand what's going on.

Same if I do LoadPackage("sonata", false), so at least it has nothing to do with the banner??

If I press enter a few times while it's hanging (or really any other input) it eventually outputs

@fError, @fwindow system: @fIllegal Answer@f at /home/embray/src/sagemath/sage/local/share/gap/pkg/xgap-4.28/lib/color.gi:160@f called from@f@J@f<function "CreateColors">( <arguments> )@J@f called from read-eval loop @fat /home/embray/src/sagemath/sage/local/share/gap/pkg/xgap-4.28/lib/color.gi:220@J@nyou can 'quit;' to quit to outer loop, or@J@nyou can 'return;' to continue@J@fbrk> @e

@r@J@fbrk> @e
Last edited 3 years ago by embray (previous) (diff)

comment:225 Changed 3 years ago by embray

I see now---p is really intended for use by xgap and the like; it also just happens to be taken advantage of for Sage's pexpect interface as well (not unreasonably). This makes me wonder though why a couple of packages would stall out in this case. I hope it's not actually trying to talk to an X server...

comment:226 follow-up: Changed 3 years ago by embray

We need some way, when starting GAP for Sage's pexpect interface, to force "no, xgap is really not available"

comment:227 Changed 3 years ago by dimpase

Note that we are installing GAP with the patch

--- a/lib/package.gi    2016-11-12 14:25:15.000000000 +0000
+++ b/lib/package.gi    2017-01-23 22:45:39.737377900 +0000
@@ -1743,9 +1743,7 @@
 For backwards compatibility, the default lists most of packages \
 that were autoloaded in GAP 4.4 (add or remove packages as you like)."
     ],
-  default:= [ "autpgrp", "alnuth", "crisp", "ctbllib", "factint", "fga", 
-              "irredsol", "laguna", "polenta", "polycyclic", "resclasses", 
-              "sophus", "tomlib" ],
+  default:= [],
   values:= function() return RecNames( GAPInfo.PackagesInfo ); end,
   multi:= true,
   ) );

Perhaps it has become somewhat outdated in GAP 4.10.

comment:228 in reply to: ↑ 226 ; follow-up: Changed 3 years ago by arojas

Replying to embray:

We need some way, when starting GAP for Sage's pexpect interface, to force "no, xgap is really not available"

FWIW we (Arch) faced a similar issue some time ago:

https://bugs.archlinux.org/task/55174

My "fix" was to patch out loading xgap from sonata/PackageInfo.g

comment:229 Changed 3 years ago by dimpase

I've noticed that after installing this branch I still have gap in SAGE_LOCAL/bin pointing at the wrong location of GAP executable.

In fact, I also get gap-bin in SAGE_LOCAL/bin, and by right, gap should be a link to the latter.

Correction - one should change the script to point to the right GAP_DIR too, it seems...

#!/bin/sh
GAP_DIR=$SAGE_LOCAL/gap/latest
GAP_EXE=$GAP_DIR
exec "$GAP_EXE/gap" -l "$GAP_DIR" "$@"

It seems that spkg-install should remove the whole SAGE_LOCAL/gap/ to make sure to stale copies of GAP are used after the update.

Last edited 3 years ago by dimpase (previous) (diff)

comment:230 in reply to: ↑ 228 ; follow-up: Changed 3 years ago by embray

Replying to arojas:

Replying to embray:

We need some way, when starting GAP for Sage's pexpect interface, to force "no, xgap is really not available"

FWIW we (Arch) faced a similar issue some time ago:

https://bugs.archlinux.org/task/55174

My "fix" was to patch out loading xgap from sonata/PackageInfo.g

My solution is to add the following to sage.g:

  • src/ext/gap/sage.g

    diff --git a/src/ext/gap/sage.g b/src/ext/gap/sage.g
    index 2a4ce13..dafb003 100644
    a b  
    1 
    21#
    32# SAGE support utilities to read into the GAP session.
    43#
     4
     5# Prevent loading the xgap package; we use the -p flag to GAP in order to
     6# communicate with it via the pexpect interface; this is normally used by
     7# for an xgap window to communicate with GAP, so unfortunatelly setting this
     8# flag also allows the xgap package to be loaded and for some packages to
     9# attempt to communicate with a "window handler" that doesn't exist.
     10# Therefore we must explicitly disable loading of the xgap package.
     11
     12# Don't use SetUserPreference since that leads to reloading the workspace,
     13# which confusing the pexpect interface
     14if IsBound(GAPInfo.ExcludeFromAutoload) then
     15    Append(GAPInfo.ExcludeFromAutoload, "xgap");
     16else
     17    GAPInfo.ExcludeFromAutoload := [ "xgap" ];
     18fi;
     19
    520\$SAGE := rec();
    621
    722\$SAGE.OldPager := Pager;

This might be slightly fragile, as I don't think GAPInfo.ExcludeFromAutoload is meant to be manipulated directly (normally it should be set via SetUserPreference, but it's too late to do that in sage.g). Seems to work though.

The patch Dima mentioned isn't relevant here because none of those default packages use xgap. I don't think we should rely on that being patched out anyways.

comment:231 follow-up: Changed 3 years ago by embray

Dima, can I ask you about this patch?

  • src/sage/interfaces/gap.py

    commit 3d12d9c432e551c57241cf615d57e7cbd80fc23a
    Author: Dima Pasechnik <dimpase@gmail.com>
    Date:   Wed Sep 26 12:18:55 2018 +0100
    
        fixing all but one doctests in interfaces/gap.py
    
    diff --git a/src/sage/interfaces/gap.py b/src/sage/interfaces/gap.py
    index 55d41aa..6cf740c 100644
    a b class GapFunction(ExpectFunction): 
    16661666
    16671667            sage: print(gap.SymmetricGroup.__doc__)
    16681668            <BLANKLINE>
    1669             50 Group Libraries
     1669              50.1-12 SymmetricGroup
    16701670            <BLANKLINE>
    1671             When you start GAP, it already knows several groups. Currently GAP initially
    1672             knows the following groups:
     1671              ‣ SymmetricGroup( [filt, ]deg ) ─────────────────────────────────── function
    16731672            ...
    16741673        """
    16751674        M = self._parent

This test fails for me now. It gives the full output of that documentation file starting with "50 Group Libraries"...

That is not great though. If I ask for gap.SymmetricGroup.__doc__ it should only display the relevant section of that document, not the whole document. I wonder if there's some patch I removed, or some other reason to expect that asking for ? SymmetricGroup should only start on the relevant line...

comment:232 Changed 3 years ago by embray

I think I see the problem. This is a bit buggy in general (even before this ticket...)

comment:233 Changed 3 years ago by embray

We have some other serious problems with error handling, in particular in the areas where we go outside of what is currently possible with the libgap API--particularly when we do things like call functions directly with CALL_<N>ARGS. Unfortunately I don't see a great way around that right now. The current, very limited API provided to us has a way to get a pointer to a global variable (e.g. with GAP_ValueGlobalVariable) but then no way to really do anything with it--in particular no way to pass it as an argument to a function--without going outside the API.

That means that before errors occur somewhere down in the GAP kernel resulting in a longjump to STATE(ReadJmpError), we need to always ensure that it has somewhere to jump to that makes sense.

Currently when we wrap those calls in sig_on() and sig_off() we at least get a segfault handled by cysignals, but we can lose any useful error reporting from GAP if an error occurs somewhere that doesn't go through our error handling. Further confusing matters, there's an internal function ReadEvalError() that's kind of like a raise which just jump to the last TRY_IF_NO_ERROR. And unlike JUMP_TO_CALL it does not process our custom error handler. It can also just jump off into the weeds if we're not careful.

comment:234 Changed 3 years ago by embray

I think it would mostly clear things up if we either did away with sig_on() (or left it there for last-resort error handling) and added a similar macro, based on GAP's TRY_IF_NO_ERROR macro, the effect of which is also to set a return point for longjmps. If we don't do that then GAP can wind up jumping to some weird places, or even invalid longjmp buffers (resulting in segfaults).

comment:235 Changed 3 years ago by gh-ChrisJefferson

Re: 2b7bdcd. Don't replace ELM_LIST / LEN_LIST with ELM_PLIST / LEN_PLIST. You are going to get horrible crashes you will never fix.

LEN_LIST should be fine, use ELM0_LIST if you want to get the 0 for unbound elements of the list.

comment:236 in reply to: ↑ 230 Changed 3 years ago by dimpase

Replying to embray:

Replying to arojas:

Replying to embray:

We need some way, when starting GAP for Sage's pexpect interface, to force "no, xgap is really not available"

FWIW we (Arch) faced a similar issue some time ago:

https://bugs.archlinux.org/task/55174

My "fix" was to patch out loading xgap from sonata/PackageInfo.g

My solution is to add the following to sage.g: [...] This might be slightly fragile, as I don't think GAPInfo.ExcludeFromAutoload is meant to be manipulated directly (normally it should be set via SetUserPreference, but it's too late to do that in sage.g). Seems to work though.

xgap is a 20+ years old X11 code. It's not working on many X11 setups (won't build, or builds, but the resulting package does not work), is tricky get working on OSX or Windows, and there is a Jupyter-based replacement being worked on. I don't think Sage can realistically support it.

comment:237 in reply to: ↑ 231 Changed 3 years ago by dimpase

Replying to embray:

Dima, can I ask you about this patch?

  • src/sage/interfaces/gap.py

    commit 3d12d9c432e551c57241cf615d57e7cbd80fc23a
    Author: Dima Pasechnik <dimpase@gmail.com>
    Date:   Wed Sep 26 12:18:55 2018 +0100
    
        fixing all but one doctests in interfaces/gap.py
    
    diff --git a/src/sage/interfaces/gap.py b/src/sage/interfaces/gap.py
    index 55d41aa..6cf740c 100644
    a b class GapFunction(ExpectFunction): 
    16661666
    16671667            sage: print(gap.SymmetricGroup.__doc__)
    16681668            <BLANKLINE>
    1669             50 Group Libraries
     1669              50.1-12 SymmetricGroup
    16701670            <BLANKLINE>
    1671             When you start GAP, it already knows several groups. Currently GAP initially
    1672             knows the following groups:
     1671              ‣ SymmetricGroup( [filt, ]deg ) ─────────────────────────────────── function
    16731672            ...
    16741673        """
    16751674        M = self._parent

This test fails for me now. It gives the full output of that documentation file starting with "50 Group Libraries"...

That is not great though. If I ask for gap.SymmetricGroup.__doc__ it should only display the relevant section of that document, not the whole document. I wonder if there's some patch I removed, or some other reason to expect that asking for ?

No, you get the same with the Sage's current GAP 4.8.6.

SymmetricGroup should only start on the relevant line...

Sure.

I don't have an answer to this immediately - this particular change was a result of testing on a custom tarball I built from source; perhaps the upstream tarball packages docs in a bit different way. Let's postpone this until other things are done here.

comment:238 Changed 3 years ago by alexk

What I see with my limited knowledge of SageMath internals, you have interfaces/gap.py which mentions chapter 50 Group Libraries and in it 50.1-12 SymmetricGroup. If you check whether that section exists at http://www.gap-system.org/Manuals/doc/ref/chap50.html, you will see that it is now 50.1-10. GAP does not promise that the order and numbers of manual sections are the same, so for SageMath the best way would be to avoid such references.

Last edited 3 years ago by alexk (previous) (diff)

comment:239 Changed 3 years ago by alexk

Regarding XGAP and other packages, I'd prefer using user preferences over patches to GAP and /or packages. Furthermore, this is what is going on in SageMath with SONATA and XGAP. XGAP needs to be started from xgap.sh script - it can not be loaded via LoadPackage. The -p option to start GAP is reserved for XGAP. If you start GAP with -p, then you trick SONATA into assuming that XGAP is loaded. Can you do not call GAP with -p option, and in this case you do not to do anything with XGAP and SONATA at all.

comment:240 follow-up: Changed 3 years ago by alexk

P.S. rereading some earlier posts, it seems that you need -p for Sage's pexpect interface. Fine then. I suggest only that instead of tweaking GAPInfo.ExcludeFromAutoload in gap/sage.g you use another user preference:

##  These packages are not regarded as available. This doesn't work for
##  packages which are needed by the GAP library, or which are already
##  loaded in a workspace.
# SetUserPreference( "PackagesToIgnore", [ ] );

This way, you ignore this package if GAP is loaded for SageMath, but if it is compiled properly, it still can be used if one works with GAP directly.

comment:241 in reply to: ↑ 240 Changed 3 years ago by embray

Replying to alexk:

P.S. rereading some earlier posts, it seems that you need -p for Sage's pexpect interface. Fine then. I suggest only that instead of tweaking GAPInfo.ExcludeFromAutoload in gap/sage.g you use another user preference:

##  These packages are not regarded as available. This doesn't work for
##  packages which are needed by the GAP library, or which are already
##  loaded in a workspace.
# SetUserPreference( "PackagesToIgnore", [ ] );

This way, you ignore this package if GAP is loaded for SageMath, but if it is compiled properly, it still can be used if one works with GAP directly.

The problem with this is where to put the SetUserPreference call. It should go somewhere like a gap.ini but the thing I don't like gap.ini and gaprc handling is that it will only read the first one it finds. So even if I install a customized gap.ini to be read by Sage, if the user has a UserGapRoot containing their own gap.ini it will take full precedence, and there is no way, for example, to inherit also from a "system" gap.ini.

Therefore the solution that Sage has apparently used for some time is to read a file, passed on the command line, containing additional customizations when starting GAP. This ensures that it will always be read. But I found that putting SetUserPreference in at such a late stage in the initialization didn't work, especially when loading an existing workspace. I might be wrong about that but when I tried this it didn't work.

comment:242 Changed 3 years ago by embray

Yes, I tried it again and calling SetUserPreference late, after GAP initialization has already mostly completed, appears to have no effect if I then start loading some packages. I have to modify GAPInfo.ExcludeFromAutoload directly.

This would be avoidable if there were a sort of gap.ini that can be loaded early on in initialization that I could guarantee will be run, but unless I'm missing something that does not appear to be quite possible...

Last edited 3 years ago by embray (previous) (diff)

comment:243 Changed 3 years ago by alexk

I agree that you need to tweak GAPInfo in gap/sage.g - but it does not make sense to tweak ExcludeFromAutoload there - you better change PackagesToIgnore. XGAP is not autoloaded at all, so adding it there is not semantically correct.

But perhaps a better way would be to make gap.ini working, by passing a directory that contains the gap.ini file as needed by Sage to GAP using the -l option. See 9.2 GAP Root Directories to start with.

comment:244 Changed 3 years ago by alexk

P.S. -r option will disable reading user's .gap/gap.ini

comment:245 Changed 3 years ago by alexk

proof of concept:

$ gap -b -r -l "~/sage/;~/gap-4.10.0/"

Reading sage/gap.ini

comment:246 Changed 3 years ago by embray

I believe we do currently use -r actually, so maybe there's something to be said for that, though it's also unfortunate: Perhaps the user has some things they want to set, and we're now crippling that capability. Of course, that is the current case as well, and is not so good.

I'm also not sure where the best place would be to put a Sage-specific gap.ini. For Sage's own gap package we're setting up our own GAP root into which we could put such things, so that's straightforward.

But I really want Sage to be able to work better with a system version of GAP and in that case it's not so obvious. If we do something like -l /path/to/sage-specific-gap-root;/path/to/system/gap/root would it still be able to access packages and libraries in the system GAP root?

comment:247 follow-up: Changed 3 years ago by alexk

1) If the user has own ~/.gap that may be what they use to customise their own GAP installation - perhaps what they want to set for other GAP, not the one in Sage?

2) If you have own GAP root already, putting it there seems natural to me

3) yes, sure - it will access packages and libraries in the system GAP root. Just try my example without -b to the the full GAP banner.

Last edited 3 years ago by alexk (previous) (diff)

comment:248 in reply to: ↑ 247 Changed 3 years ago by embray

Replying to alexk:

1) If the user has own ~/.gap that may be what they use to customise their own GAP installation - perhaps what they want to set for other GAP, not the one in Sage?

Point being, I want to do away with the notion of a "GAP in Sage" being necessary as opposed to an existing GAP installation. Sage should work and integrate with an existing GAP and not provide its own GAP as its own remote universe separate from the GAP installation that a user might already be using. Practically-speaking that isn't totally possible yet, but I am trying to move in that direction.

3) yes, sure - it will access packages and libraries in the system GAP root. Just try my example without -b to the the full GAP banner.

Okay, I'll give that a try then. That said, I'm still not sure I want to rely on -r since that hinders a user from working in GAP directly and in Sage with the same configuration (to the extent possible).

comment:249 Changed 3 years ago by embray

Some progress: With a few mostly trivial exceptions I have all the tests in the following modules working

  • sage.interfaces.gap
  • sage.libs.gap.element
  • sage.libs.gap.libgap
  • sage.libs.gap.util

Some of the others not so much. For example sage.libs.gap.all_documented_functions times out. In that case I have to wonder if it's just because I have too many packages available or something. Will have to investigate that more later.

Things are still something of a mess so I'll work next Monday or Tuesday on cleaning up some more before I push my work. The biggest mess is still in the area of error handling, but I think I have something of a handle on it. I'm also still concerned about memory leaks, though I don't have any immediate evidence that this is a problem.

Last edited 3 years ago by embray (previous) (diff)

comment:250 Changed 3 years ago by embray

Even if I reduce the number of packages down to the minimum required for GAP to work, sage.libs.gap.all_documented_functions takes ages, and eventually times out. On GAP 4.8.6 it's also quite slow for me--takes about two minutes. But now it's even worse. Is this really something we need/want, and is there a reason it's quite so slow...?

comment:251 follow-up: Changed 3 years ago by dimpase

Let's limit the scope of this ticket to concentrate on the core problems: error handling and interaction with GAP's GC, something that needs a lot of skills. We can iron out the kinks in e.g. sage.libs.gap.all_documented_functions and in GAP workspaces handling later.

comment:252 in reply to: ↑ 251 Changed 3 years ago by embray

Replying to dimpase:

Let's limit the scope of this ticket to concentrate on the core problems: error handling and interaction with GAP's GC, something that needs a lot of skills. We can iron out the kinks in e.g. sage.libs.gap.all_documented_functions and in GAP workspaces handling later.

I agree, you're right. And there's more work to be done on that. I have something that mostly works, but I want to think a little more systematically about it and come up with something a bit cleaner.

Related to that, I think, alas, there is still a need for some kind of libgap_enter/exit functions like from Volker's libgap. Obviously the idea of the official "libgap API" would be such that nothing like this is needed. But right now there are still many areas where we are (more or less unavoidably) bypassing that and calling GAP kernel functions directly.

GASMAN needs to know where the bottom of the execution stack is (at least as far as GAP objects are present) in order to properly mark bags that are initialized in global variables, and I believe we may have some cases like that. I think this might explain some weird segfaults and other errors. I'm getting from time to time. The problem is that GAP assumes that everything stems from an initial InitializeGap call that's at the bottom of the stack. But when using GAP as a library that is clearly not true.

I'm not sure what the GAP developers intend us to do in this case, as this is clearly a problem for anyone using GAP as a library. Do we have to manage references ourselves even for local variables? That would seem fragile, whereas the old libgap_enter() very clearly marks the "bottom" of the stack so far as GASMAN needs to be concerned. I will ask.

Last edited 3 years ago by embray (previous) (diff)

comment:253 follow-up: Changed 3 years ago by dimpase

I received repeated assurances from GAP devs that these libgap_enter/libgap_exit funcs are not needed.

My understanding is that GASMAN searches the stack for pointers, and so the local vars are safe.

comment:254 in reply to: ↑ 253 ; follow-up: Changed 3 years ago by embray

Replying to dimpase:

I received repeated assurances from GAP devs that these libgap_enter/libgap_exit funcs are not needed.

My understanding is that GASMAN searches the stack for pointers, and so the local vars are safe.

I'm not sure that's true. I may be wrong, but the global variable StackBottomBags (which in official GAP, as opposed to the old libgap, is declared static so we can't easily do anything with it) is set once, when the InitBags function is called from InitializeGap.

Garbage collections then either search up from this location if StackBottomBags is below the top of the stack. Or if, for some reason, StackBottomBags is above, it will search down from there. But not both. ISTM that this implementation detail is just meant to account for both architectures were the stack grows up or down, but it still bakes in an apparent assumption that StackBottomBags is initialized from some stack frame outside any where GAP objects are initialized.

Since InitializeGap is called in some arbitrary stack frame that may be either above or below future stack frames that call GAP functions, this does not seem safe unless we ensure that all future calls to GAP functions happen to occur in a deeper stack frame than when InitializeGap is called, which means it needs to be called early in the Python interpreter startup, just for example.

Again, I'm probably missing something but I have my doubts about this...

Garbage collection issue aside, I also believe some kind of libgap_enter() is necessary for error handling to work correctly, or else you can wind up in situations where if an error occurs you can wind up longjmp-ing to some weird place, like back into the InitializeGap call, for example. The only other way around is to do as you originally did, and have our error_handler longjmp completely out of GAP to a location of our choosing, but that can cause GAP not to perform necessary error cleanup of its own (such as closing streams).

Last edited 3 years ago by embray (previous) (diff)

comment:255 Changed 3 years ago by embray

Another relatively minor, but difficult to deal with issue I encountered last week but only just now looked into:

Syntax errors are not handled "correctly" insofar as it ignores the custom ERROR_OUTPUT variable that has been provided, and just prints the error directly to the libc stderr. No bueno. It's a minor thing, but this alone has me thinking that the new libgap is not fully ready for prime-time. But if we can get stuff like this fixed in a GAP 4.10.x and set that as the minimum required version, that might be good enough, both for Sage and for downstream packagers. I'll open an issue about it.

On a broader note, I really wish GAP relied a lot less at the lower levels on printing to I/O streams for error reporting, and instead set an error message somewhere along with an error indicator (there is STATE(NrError) but no equivalent for storing error messages). Something along the lines of having something similar to a PyErr_Occurred() would be more helpful for libraries to use, and would still integrate easily into the GAP REPL (where printing becomes appropriate). I have a similar complaint about how ViewObj works. I will open issues about these as well but I don't know if we can expect a fix anytime soon. In the meantime it is okay (but not ideal) to work around by capturing the relevant I/O streams (but in the case of syntax errors even that workaround is broken :(

Last edited 3 years ago by embray (previous) (diff)

comment:256 follow-up: Changed 3 years ago by gh-ChrisJefferson

Some general comments from a gap developer.

We know the libgap is far from complete -- we will accept PRs to add functionality and (if it doesn't require Ganges to other code) are happy to add new functions in minor 4.10 releases.

For GASMAN, yes it does support stacks going up and down -- GASMAN over the years has run on all kinds of systems, including Amigas and Ataris. For Sage's purpose, the best option would be to the stack base at he base of the stack -- is it possible to get that in Python? If not, a function which sets the pointer to some value above any stack values is sufficient.

Could Sage try to run GAP in multiple threads? In that case we would have to track all the stacks. That is also possible, but would require a GAP core change.

We have had trouble in the past with error entering an infinite loop, and are happy to accept PRs to fix it. Any changes must not change functionality for existing GAP users, and break any packages (I realise that is a high bar). It is already possible (with care) to replace the break loop, and that replacement could store the error generated.

Changes to ViewObj? should be done with discussion. We are generally awarez and not entirely happy, with some printing functionality, but it's so fundamental it's very hard to change anything without breaking dozens of packages.

comment:257 in reply to: ↑ 256 Changed 3 years ago by embray

Replying to gh-ChrisJefferson:

For GASMAN, yes it does support stacks going up and down -- GASMAN over the years has run on all kinds of systems, including Amigas and Ataris. For Sage's purpose, the best option would be to the stack base at he base of the stack -- is it possible to get that in Python? If not, a function which sets the pointer to some value above any stack values is sufficient.

What do you mean by "a function which sets the pointer to some value above any stack values"? Right now nothing like that exists officially, although I could probably do it in some gnarly hack. Otherwise we would need a patch to GAP.

The best I could do would be to call GAP_Initialize as early as possible in the Python interpreter start-up, but for Sage at least--and I think likely for other applications--this would not be acceptable. Sage initializes GAP lazily only when it is first used. Adding GAP initialization to Sage start-up would noticeably impact startup time, especially in cases when the workspace needs to be reinitialized.

In any case, IIUC, setting the stack base for GAP from InitializeGap is not good for GAP used as a library, since there's no way to guarantee that the InitializeGap call will remain a base of future stack frames.

I'm not positive yet, but I think this is a reason for some problems I'm having with bag pointers being moved from under my feet once the number of GAP objects in memory grows large. For heap objects it's not a problem--they remain valid because we track our references to them and use the kindly-provided markBagsCallback interface to mark them all. But then when I attempt function calls (e.g. Filter) on large lists of GAP objects things start moving under my feet.

Could Sage try to run GAP in multiple threads? In that case we would have to track all the stacks. That is also possible, but would require a GAP core change.

In theory sure--anything could try to run GAP code multi-threaded. But that's never been safe AFAIK so I'm not worried about that here. In Python multi-threading is not very common anyways since limitations in the Python interpreter make it less useful for performance purposes, so multi-process parallelism is more common.

comment:258 follow-up: Changed 3 years ago by gh-ChrisJefferson

Is there any simple way to get a pointer to something early in the stack, before anything else that might call GAP? Does Sage run any early C function which doesn't return, where it could stash the stack pointer? Looking at Python itself, a _PyMain object is put on the stack very early on, which is they passed by pointer. Is it possible to get that pointer back?

EDIT: _PyCoreConfig also looks like it is set up very early and stored on the stack.

Last edited 3 years ago by gh-ChrisJefferson (previous) (diff)

comment:259 in reply to: ↑ 258 Changed 3 years ago by embray

Replying to gh-ChrisJefferson:

Is there any simple way to get a pointer to something early in the stack, before anything else that might call GAP? Does Sage run any early C function which doesn't return, where it could stash the stack pointer? Looking at Python itself, a _PyMain object is put on the stack very early on, which is they passed by pointer. Is it possible to get that pointer back?

EDIT: _PyCoreConfig also looks like it is set up very early and stored on the stack.

I was thinking about something like that, but in the end it still wouldn't help, at least not with GAP_Initialize() as it's currently defined:

void GAP_Initialize(int              argc,
                    char **          argv,
                    char **          env,
                    GAP_CallbackFunc markBagsCallback,
                    GAP_CallbackFunc errorCallback)
{
    InitializeGap(&argc, argv, env);
    SetExtraMarkFuncBags(markBagsCallback);
    STATE(JumpToCatchCallback) = errorCallback;

    GAP_True = True;
    GAP_False = False;
    GAP_Fail = Fail;
}

Here it passes &argc to IntializeGap, where the compiler effectively makes argc a stack local variable and thus &argc is a pointer to the GAP_Initialize stack frame.

I could, as you say, create (or abuse) some area earlier on the stack, and temporarily shove the value for argc in there. Then I would have manually re-implement GAP_Initialize so that I can pass that pointer to InitializeGap.

Point being, of course, none of this should be necessary, and there ought to be a way to set the stack bottom for GASMAN any time to wherever is appropriate (this is what libgap_enter did, primarily).

comment:260 follow-up: Changed 3 years ago by gh-ChrisJefferson

There are two seperate issues here:

1) We need a way to set the stack bottom in GASMAN. Indeed there is no such function. As has been said multiple times, the libgap api was only introduced a short while ago as a well-defined place to put functionality needed by code outside GAP, and we can easily add a function to it to allow passing in a stack base.

2) Do we need something more like the existing libgap_enter/leave, which kept updating this pointer all the time, and had to be called in pairs. I don't think we do, as long as we can get some pointer into the stack quite high up.

comment:261 in reply to: ↑ 254 ; follow-up: Changed 3 years ago by jdemeyer

Replying to embray:

I also believe some kind of libgap_enter() is necessary for error handling to work correctly

Isn't this exactly the sig_on()/sig_error() use case? Earlier, you said that longjmping using sig_error() was problematic.

comment:262 in reply to: ↑ 260 Changed 3 years ago by embray

Replying to gh-ChrisJefferson:

There are two seperate issues here:

1) We need a way to set the stack bottom in GASMAN. Indeed there is no such function. As has been said multiple times, the libgap api was only introduced a short while ago as a well-defined place to put functionality needed by code outside GAP, and we can easily add a function to it to allow passing in a stack base.

That's fine. I know (now) that it's incomplete. My fear is that it won't be "ready" enough to fully replace Sage's use of its custom GAP fork in time for the next release, and in particular in time for the upcoming Debian freeze. Regardless, now we must work together to discover these kinds of issues and resolve them.

2) Do we need something more like the existing libgap_enter/leave, which kept updating this pointer all the time, and had to be called in pairs. I don't think we do, as long as we can get some pointer into the stack quite high up.

Yes, that is possible. But there are some problems with that:

  • It places the burden on libgap users to have to worry about issues like stack frames, and ensuring that this is set up early in their application and in such a way that all future function calls are children to some call in which this is set up, which may be not always be obvious or trivial, especially (for example) if trying to implement a generic pyGAP library or the like. It could be done, but would require hacks.
  • It involves a kind of "non-local" behavior that is hard to debug. Actual calls to GAP functionality may not occur in an application until far from where this "stack bottom pointer" is initialized. It's also inefficient. I could grab and abuse a pointer to a local variable in Py_Main for example, but from there a typical code eval may be 30-40 stack frames deep, if not much more, none of which have anything to do with GAP. I haven't evaluated exactly how much that affects the efficiency of GAP's garbage collection, but it's certainly adding needless overhead.
  • Instead, having explicit libgap_enter/exit functions clearly demarcates where GAP-related code is expected to run. It is both clearer to the developer (they don't even have to understand what it's for or why--just document "all code that uses libgap must be demarcated with these functions/macros)"; it's clearer for the reader of existing code ("this is where the GAP stuff starts/ends"); and it's more efficient for the garbage collector ("this marks the end of stack frames I'm responsible for").

An alternative might be to design an extensive libgap API such that this kind of memory management responsibility is completely taken away from the user, by never allowing them to allocate new Bags in their own local variables, but instead ensures that all GAP objects returned by the API are allocated from some pool that is entirely managed by GAP. Then all GAP API calls could begin with the effective equivalent of "libgap_enter" while taking that responsibility entirely away from the user. I would definitely prefer only using functions from GAP that are designated as part of some public API. I know the API has already expanded significantly since the 4.10.0 release, but I don't think it has this feature in terms of memory management.

comment:263 in reply to: ↑ 261 ; follow-up: Changed 3 years ago by embray

Replying to jdemeyer:

Replying to embray:

I also believe some kind of libgap_enter() is necessary for error handling to work correctly

Isn't this exactly the sig_on()/sig_error() use case? Earlier, you said that longjmping using sig_error() was problematic.

libgap_enter/exit() has nothing really to do with error handling. The problem with calling our own longjmp from within a custom error handler is that it causes GAP's own internal error cleanup (in CATCH_ERROR blocks) blocks to be bypassed.

comment:264 in reply to: ↑ 263 ; follow-up: Changed 3 years ago by embray

Replying to embray:

Replying to jdemeyer:

Replying to embray:

I also believe some kind of libgap_enter() is necessary for error handling to work correctly

Isn't this exactly the sig_on()/sig_error() use case? Earlier, you said that longjmping using sig_error() was problematic.

libgap_enter/exit() has nothing really to do with error handling. The problem with calling our own longjmp from within a custom error handler is that it causes GAP's own internal error cleanup (in CATCH_ERROR blocks) blocks to be bypassed.

Sorry, I think I misread what you were responding to here, since I was apparently the one who mentioned libgap_enter in the context of error handling. What I meant here was that in addition to setting the stack bottom, libgap_enter might also be needed to work as something like sig_on(), but that uses an alternative location for the jmp_buf (specifically GAP's internal STATE(ReadErrorJump) as opposed to cysignals' cysigs.env). To that end I think an extension to cysignals might be useful for allowing it to work with an alternative location of the jmp_buf--for example as an alternative to sig_on().

comment:265 Changed 3 years ago by embray

Also I haven't pushed updates to my branch yet, but I already have a macro that sets sigsetjmp(STATE(ReadErrorJump)) before calls to any GAP functions. This is called after sig_on() so we get the best of both worlds. But I also think it would be possible, and handy, to have a way to merge the two into a single sig_on_altbuf call or something.

comment:266 Changed 3 years ago by git

  • Commit changed from f3a909cdec30f70891b515fed05c53072fdfec09 to 478d554716f58275be53342819f3c8d927092fcb

Branch pushed to git repo; I updated commit sha1. New commits:

478d554some error-handling 'improvement'

comment:267 in reply to: ↑ 264 ; follow-up: Changed 3 years ago by jdemeyer

Replying to embray:

What I meant here was that in addition to setting the stack bottom, libgap_enter might also be needed to work as something like sig_on(), but that uses an alternative location for the jmp_buf (specifically GAP's internal STATE(ReadErrorJump) as opposed to cysignals' cysigs.env). To that end I think an extension to cysignals might be useful for allowing it to work with an alternative location of the jmp_buf--for example as an alternative to sig_on().

One immediate problem: would that be a jmp_buf or a sigjmp_buf? What cysignals uses depends on the platform: on Linux and OS X, it's sigjmp_buf. On other platforms, it's jmp_buf.

Does GAP really have to perform a longjmp()? Couldn't we propose to change GAP to use a callback function instead? That would be much more user friendly and it's also how other libraries implement error handling.

comment:268 follow-up: Changed 3 years ago by jdemeyer

The current libGAP-4.8 uses a callback mechanism for error handling (see libgap_set_error_handler). IMHO, porting that to GAP-4.10 would be a much better solution than messing with setjmp()/longjmp() calls.

comment:269 Changed 3 years ago by git

  • Commit changed from 478d554716f58275be53342819f3c8d927092fcb to 025740df47e6a3dd00929c0903e441d8584a1476

Branch pushed to git repo; I updated commit sha1. New commits:

025740da little bit of gap_includes cleanup

comment:270 in reply to: ↑ 267 ; follow-ups: Changed 3 years ago by embray

Replying to jdemeyer:

Replying to embray:

What I meant here was that in addition to setting the stack bottom, libgap_enter might also be needed to work as something like sig_on(), but that uses an alternative location for the jmp_buf (specifically GAP's internal STATE(ReadErrorJump) as opposed to cysignals' cysigs.env). To that end I think an extension to cysignals might be useful for allowing it to work with an alternative location of the jmp_buf--for example as an alternative to sig_on().

One immediate problem: would that be a jmp_buf or a sigjmp_buf? What cysignals uses depends on the platform: on Linux and OS X, it's sigjmp_buf. On other platforms, it's jmp_buf.

GAP also has macros for these that depend on platform availability (e.g. syJmp_buf, which is a sigjmp_buf where available). As long as these macros are used it should be safe for a GAP and a cysignals that were configured and compiled for the same platform.

Does GAP really have to perform a longjmp()? Couldn't we propose to change GAP to use a callback function instead? That would be much more user friendly and it's also how other libraries implement error handling.

I don't think it really does. I wrote something a little like you're suggesting here. Internally it uses these longjmps to effectively raise/catch exceptions, and that's a design choice that it's not my place to question. But I think for integration as a library it would be nicer and simpler for everyone if it never long-jumped out of the GAP library at all, and instead just set some "error occurred" indicator with a way to retrieve the most recent error message (a la PyErr_Occurred + PyErr_Fetch).

comment:271 Changed 3 years ago by embray

Also, part of the problem is that as of 4.10.0, the official libGAP API does not offer as much functionality as needed, so we have to call some internal functions directly, and this means we have to provide some sensible return point for when errors occur.

comment:272 in reply to: ↑ 270 ; follow-up: Changed 3 years ago by jdemeyer

Replying to embray:

Internally it uses these longjmps to effectively raise/catch exceptions, and that's a design choice that it's not my place to question.

If if turns out that this makes error handling difficult when using GAP as a library, then it is our place to question that design choice. I never said that we should drop the ReadJmpError thing, we could add the callback as additional error handling mechanism. That's what PARI does for example: it allows a longjmp()-based error handler which is mostly for internal use and it has an error callback function which is used by our Python bindings to PARI.

comment:273 follow-up: Changed 3 years ago by gh-ChrisJefferson

I can't imagine what you would suggest we use instead of longjmp? If a GAP function raises an error, we need to return from multiple functions, back to some well-defined calling point (in practice, back to the user prompt). In a C system, you can really only do that with longjmp.

Of course, we can stop those internal longjmps from escaping a well-defined libgap API, by adding a longjmp in each ABI function that could cause an Error.

comment:274 in reply to: ↑ 273 Changed 3 years ago by jdemeyer

Replying to gh-ChrisJefferson:

I can't imagine what you would suggest we use instead of longjmp?

Various libraries (in particular, PARI, NTL, GLPK and Volker Braun's libGAP) call a callback function to handle errors. Of course, that callback function might also call some variant of longjmp(). In Sage, we use sig_on() to define a single setjmp() jump point that the various callback functions jump to.

A callback function also has the advantage that it can take arguments: you could pass it some info about the error and it may store that or take different actions depending on the error.

comment:275 follow-up: Changed 3 years ago by gh-ChrisJefferson

I'm leaving this thread. It isn't being productive, and asking that GAP stops using longjmp shows a fundamental misunderstanding of.. well C. Or rewriting the entire of GAP basically from scratch.

comment:276 Changed 3 years ago by git

  • Commit changed from 025740df47e6a3dd00929c0903e441d8584a1476 to a407140ae046980369a23934b5e13e1892a19f18

Branch pushed to git repo; I updated commit sha1. New commits:

a407140more gap_includes cleanup: just use IS_LIST and IS_REC

comment:277 in reply to: ↑ 275 Changed 3 years ago by jdemeyer

Replying to gh-ChrisJefferson:

asking that GAP stops using longjmp

Nobody ever asked that. See 272

comment:278 in reply to: ↑ 270 Changed 3 years ago by jdemeyer

Replying to embray:

instead just set some "error occurred" indicator with a way to retrieve the most recent error message (a la PyErr_Occurred + PyErr_Fetch).

But that would require adding error propagation support to all (at least, a lot) GAP functions, no?

In Python for example, most C API functions return a PyObject* and a return value NULL means that an error occurred. This is a good convention, but it does require support for error propagation in virtually all Python/C API functions.

comment:279 in reply to: ↑ 268 ; follow-up: Changed 3 years ago by embray

Replying to jdemeyer:

The current libGAP-4.8 uses a callback mechanism for error handling (see libgap_set_error_handler). IMHO, porting that to GAP-4.10 would be a much better solution than messing with setjmp()/longjmp() calls.

I should add, this has already been done. It is the errorCallback function that is passed to GAP_Initialize. This serves the same purpose and sets a callback in the same place in the code. It doesn't strictly help the problem because this callback is in a location right before GAP longjmps to its CATCH_ERROR handler (if any). This is fine, but the problem is that sometimes this does not jump to a sensible place, but rather just to wherever the last TRY_IF_NO_ERROR macro was, which is not always in a sensible place (yet) for calls to the public API.

comment:280 in reply to: ↑ 272 Changed 3 years ago by embray

Replying to jdemeyer:

Replying to embray:

Internally it uses these longjmps to effectively raise/catch exceptions, and that's a design choice that it's not my place to question.

If if turns out that this makes error handling difficult when using GAP as a library, then it is our place to question that design choice. I never said that we should drop the ReadJmpError thing, we could add the callback as additional error handling mechanism. That's what PARI does for example: it allows a longjmp()-based error handler which is mostly for internal use and it has an error callback function which is used by our Python bindings to PARI.

The use of that mechanism does not, on its own, cause any problems and is perfectly reasonable to use for error handling within the GAP kernel. The only problem is that it's applied somewhat inconsistently at the moment, from the standpoint of third-party API consumers (who, ideally, should not be exposed to this implementation detail at all). Many of the problems I'm encountering related to this are because we're still currently using internal implementation details like CALL_<N>ARGS without setting up the appropriate error handling that the GAP kernel itself otherwise might.

Fortunately in GAP's master branch there's now a GAP_CallFuncArray, for example, that gives an "official" implementation allowing us to avoid (eventually) using these internal details. The API still doesn't get some of the error-handling details quite right (it does not fix the problem I describe above) but it does provide the appropriate area for making these improvements.

comment:281 in reply to: ↑ 279 Changed 3 years ago by jdemeyer

Replying to embray:

It doesn't strictly help the problem because this callback is in a location right before GAP longjmps to its CATCH_ERROR handler (if any). This is fine, but the problem is that sometimes this does not jump to a sensible place, but rather just to wherever the last TRY_IF_NO_ERROR macro was, which is not always in a sensible place (yet) for calls to the public API.

I think I understand what you are saying: there is no way to detect whether the TRY_IF_NO_ERROR block has ended. I.e. you would want something like

TRY_IF_NO_ERROR
{
  enable_error_recovery = 1;
  ....
  enable_error_recovery = 0;
}

and then check enable_error_recovery in the error handler to know whether one should longjmp() to the location set by TRY_IF_NO_ERROR?

comment:282 follow-up: Changed 3 years ago by embray

Yes, that's definitely part of it. After a TRY_IF_NO_ERROR there's no resetting of the mechanism to disable future jumps to that point.

I had some notes about this further up this already confusing ticket, but for example I had some cases where it was possible, if some error occurred very early on in Sage's libgap API, it could result in a longjmp back to InitializeGap (which just happened to be where the last actual TRY_IF_NO_ERROR was). What the GAP API needs is some reasonable terminal location to jump to (preferably within the GAP kernel code itself, or possibly the public API code in libgap-api.c) that can set an error indicator and return.

Incidentally, almost all (and there are some exception) of these jumps originate from calls to the function JUMP_TO_CATCH (this is where the error callback is called as well), and within GAP almost all calls to JUMP_TO_CATCH originate from the default error handler ErrorInner, which is implemented in GAP code in error.g. Chris mentioned to me that the GAP test suite runner actually messes with this in some way (I haven't looked yet), and that the GAP Juypter kernel also replaces ErrorInner. We might consider doing the same, which might simplify some issues. That's still not ideal, and should not be necessary--some internal cleanup is still needed. But replacing ErrorInner might be acceptable as a temporary workaround.

comment:283 Changed 3 years ago by embray

I also can't speak to how JUMP_TO_CATCH may be used, if at all, by third-party libraries. Really it shouldn't be used at all, but it might be somewhere. It's worth doing a search on...

comment:284 Changed 3 years ago by embray

It seems that, at least of the "officially supported" packages, the Browse and SCSCP packages mess around with ErrorInner as well, albeit in ways that I think at least aren't likely to come up when using GAP as a library (e.g. in the SCSCP package this only happens in running its included SCSCP server, which one generally would not do from library code...)

comment:285 in reply to: ↑ 282 Changed 3 years ago by jdemeyer

Replying to embray:

Yes, that's definitely part of it. After a TRY_IF_NO_ERROR there's no resetting of the mechanism to disable future jumps to that point.

And it doesn't look easy to add such a mechanism now.

What the GAP API needs is some reasonable terminal location to jump to (preferably within the GAP kernel code itself, or possibly the public API code in libgap-api.c) that can set an error indicator and return.

I would prefer not a "terminal location to jump to" but a callback function to be called.

Concretely, I would suggest the following:

  • add a macro to reset TRY_IF_NO_ERROR. This could be implemented as memset(STATE(ReadJmpError), 0, sizeof(syJmp_buf));
  • use this macro in GAP in the relevant places, in particular in InitializeGap.
  • have a way to check whether we are inside a TRY_IF_NO_ERROR block or not. Implemented by checking whether syJmp_buf consists of all-zero bytes.
  • in the GAP error callback in Sage, do the following:
    • if we are inside TRY_IF_NO_ERROR: return from the error callback
    • otherwise: call sig_error

comment:286 follow-up: Changed 3 years ago by embray

I don't see why you would need/want sig_error at all. I'm trying to think about ways this could be usefully implemented for all users of GAP as a library. My thinking is that the TRY_IF_NO_ERROR stuff should never leak out of the GAP kernel code at all, which is what I meant by a "terminal location to jump to"--that is--within GAP's own code. At this point it would check whether an error occurred, and should have a way of setting some indication of that fact that can be checked easily from third-party code, along with retrieving the error message. Being able to get a stack trace might be nice too but less important. I don't think a "callback function" is necessary at all, but could also be used I suppose. Though it would be more useful if said callback function were passed the actual error message, rather than having to wade through ErrorInner as we currently do, and read the error message out of a stream object that we're then also responsible for resetting :(

comment:287 in reply to: ↑ 286 Changed 3 years ago by jdemeyer

Replying to embray:

Though it would be more useful if said callback function were passed the actual error message

Of course. That's one of the advantages that a callback would give.

A callback function can run arbitrary code. So whatever specific mechanism that you want to use, you could always implement that using a callback. A callback can call longjmp() for example. Or a callback could set some global variables that you would want to check later.

It's also pretty standard to have an error callback function: PARI, NTL, GLPK have it.

comment:288 Changed 3 years ago by embray

Most sage.libs.gap doctests working, modulo some minor issues. But still getting segfaults during fresh docbuild :(

Suspect garbage collection problems regarding stack pointer issue, but that isn't confirmed.

comment:289 Changed 3 years ago by embray

Also have some doubts regarding interrupt handing being screwed up by InitializeGap.

comment:290 Changed 3 years ago by dimpase

failures during fresh docbuild might indicate multithreading problems. Does “make -j1 ...” make a difference?

comment:291 Changed 3 years ago by git

  • Commit changed from a407140ae046980369a23934b5e13e1892a19f18 to ec4c3a03e18d5789338b40e1e4fa22b56f8b0020

Branch pushed to git repo; I updated commit sha1. New commits:

ec4c3a0fixes soem goofy bugs with the spkg-install for gap

comment:292 Changed 3 years ago by git

  • Commit changed from ec4c3a03e18d5789338b40e1e4fa22b56f8b0020 to 7ccc120e3a606e044bdd9ef6835930075fa70a31

Branch pushed to git repo; I updated commit sha1. This was a forced push. New commits:

7ccc120fixes some goofy bugs with the spkg-install for gap

comment:293 Changed 3 years ago by embray

I found a neat thing while digging around in the GAP code. It seems there is already basically a way to "throw" an "exception" with the JUMP_TO_CATCH calls.

JUMP_TO_CATCH already accepts a payload argument which can be any GAP object. It stows this away in a thread-local state variable called ThrownObject.

At the moment there is no particular convention for what should go there, and it is not used at all, for anything. But it certainly could be.

Perhaps it was once used for something else. Most existing JUMP_TO_CATCH pass some integer between 0 and 3,but the meaning of this is long lost and the GAP git repository doesn't go back far enough to find out what it might have once meant. I'd be curious to dig around on that a bit more.

But I could implement and document a convention for how this is intended to be used, for example, to make the last error message or something like that available. This would be useful as part of my larger vision for cleaning up GAP's error handling.

comment:294 Changed 3 years ago by embray

  • Description modified (diff)

Added some more current status updates.

comment:295 Changed 3 years ago by embray

I think at least some of the problems I'm having now are related to the issue that the old writeandcheck patch was intended to fix. I removed that patch just to see if it was still needed and it seems very much like it is. That issue is also closely related to the issue I opened here: https://github.com/gap-system/gap/issues/3028

I'm going to try re-applying the patch to see if it fixes any of the known problems I'm having, and then will work with the GAP team to find a fix that's acceptable to them. It's definitely a bug in GAP regardless of interaction with Sage.

comment:296 Changed 3 years ago by dimpase

so that's something renamed to echoandcheck(), right?

comment:297 Changed 3 years ago by embray

There's now echoandcheck and SyWriteandcheck where are nearly exact duplicates of each other except for a minor difference. The latter seems to be more directly affected but echoandcheck might be too.

I cobbled together an updated version of Jeroen's original patch for this, though I'm not exactly sure if that version is gonna fly, since I think there is some deeper cleanup of error handling needed.

The updated patch seems to have helped with some issues, but not others that I was hoping it might fix... :/

comment:298 Changed 3 years ago by embray

Definitely having some problems with gap + fork() and/or multiprocessing in general. Not exactly sure what the deal is yet.

comment:299 Changed 3 years ago by dimpase

Well, are you on Windows? This might be an extra weight to carry around...

comment:300 Changed 3 years ago by embray

Nope.

Interesting; I see now that there is some evil interaction between libgap and libecl (either the actual library or something to do with Sage's wrapper around it). Will investigate more next week.

comment:301 Changed 3 years ago by embray

Ah, if libgap happens to be initialized before libecl this is what happens:

/home/embray/src/sagemath/sage/local/lib/python2.7/site-packages/sage/interfaces/maxima_lib.py in <module>()
     91 from sage.symbolic.ring import SR
     92
---> 93 from sage.libs.ecl import EclObject, ecl_eval
     94
     95 from .maxima_abstract import (MaximaAbstract, MaximaAbstractFunction,

/home/embray/src/sagemath/sage/local/lib/python2.7/site-packages/sage/libs/ecl.pyx in init sage.libs.ecl (build/cythonized/sage/libs/ecl.c:13344)()
   1342     return ecl_wrap(o)
   1343
-> 1344 init_ecl()

/home/embray/src/sagemath/sage/local/lib/python2.7/site-packages/sage/libs/ecl.pyx in sage.libs.ecl.init_ecl (build/cythonized/sage/libs/ecl.c:5603)()
    271     cdef sigaction_t sig_test
    272     sigaction(SIGCHLD, NULL, &sig_test)
--> 273     assert sage_action[SIGCHLD].sa_handler == NULL  # Sage does not set SIGCHLD handler
    274     assert sig_test.sa_handler == NULL              # And ECL bootup did not set one

However, under some circumstances (I reproduced this under pdb) the AssertionError is being caught and ignored, so libecl initialization is incomplete and broken. Future ecl calls result in a segfault. Neat!

comment:302 Changed 3 years ago by embray

I didn't notice this before since it's set up by the InitKernel function in one of the GAP kernel modules (src/iostream.c), but GAP also installs its own special SIGCHLD handler, which it does even if it hasn't started any child processes.

We still want GAP to be able to clean up whatever I/O-related state it cleans up in this SIGCHLD handler. This need was anticipated somewhat in a CheckChildStatusChanged function that is not used by GAP itself, but made available for wrapping by applications that set their own SIGCHLD handler. This may need to be made available by the libgap API. I still need to think more about how best to handle this in Sage though. It would also be nice if Cysignals had a helper, in the form of a context manager perhaps, to save restore all registered signal handlers.

In any case, for now, just disabling GAP's SIGCHLD handler fixed most of the more severe problems I've been having.

comment:303 Changed 3 years ago by embray

Still have a number of doctests failing, but a lot more of them seem to have to do with changes to GAP itself, rather than with the interface to it. I'm not completely positive about that though. I need to go through the failing tests and classify the types of errors. Still a few segfaults--I suspect mostly to do with the problems with garbage collection of stack local variables (see https://trac.sagemath.org/ticket/22626#comment:252)

comment:304 Changed 3 years ago by dimpase

OK, this all sounds very promising!

Sorry for actually removing all these libgap_enter/exit pairs, it seems we need to put them back in now...

comment:305 Changed 3 years ago by embray

  • Description modified (diff)

I tried to describe the issue with premature garbage collection of stack local variables here: https://github.com/gap-system/gap/issues/3089 I'm still not 100% sure that this is the cause for some of the problems I'm seeing and need to try to perform a more careful analysis of a real-world case of this. But I'm not sure what else would be causing some of the problems I'm seeing with objects being randomly replaced with other objects.

Note: See TracTickets for help on using tickets.