Opened 4 years ago

Last modified 3 weeks ago

#22191 closed enhancement

update ECL to 16.1.3 or - rather - 20.4.24 — at Version 127

Reported by: dimpase Owned by:
Priority: critical Milestone: sage-9.2
Component: packages: standard Keywords: upgrade, ecl
Cc: rws, fbissey, jdemeyer, charpent, jpflori, arojas, gh-timokau, saraedum, infinity0, slelievre, thansen, embray, mjo Merged in:
Authors: Nils Bruin, Dima Pasechnik, Erik Bray Reviewers: Dima Pasechnik, Nils Bruin, François Bissey
Report Upstream: Fixed upstream, but not in a stable release. Work issues:
Branch: public/ticket-22191 (Commits) Commit: 4b4255e39af6d1baf19ba0af9d11e924501946eb
Dependencies: Stopgaps:

Description (last modified by dimpase)

ECL-20.4.24 has been released (on 2020/04/24)

https://common-lisp.net/project/ecl/posts/ECL-20424-release.html

This most probably obsoletes this:

ECL 16.1.3 has been released.

We should update, as it will resolve some Maxima bugs, and is essential for resolving other Maxima bugs.

Tarball: https://common-lisp.net/project/ecl/static/files/release/ecl-16.1.3.tgz

Change History (128)

comment:1 Changed 4 years ago by dimpase

  • Branch set to u/dimpase/ecl16.1.3
  • Commit set to 33e59cb6e6d5b6b16f0b23eacc566d613ffccd07
  • Description modified (diff)

New commits:

3ed32fbUpgrade Jupyter packages, add enum34
72993f2Merge branch 'u/jdemeyer/upgrade_jupyter_packages' of trac.sagemath.org:sage into develop
2a291abMerge branch 'develop' of trac.sagemath.org:sage into develop
bd5a04fUpdate source to 2.11.0, fix checksum.
8356446Merge branch 'u/charpent/upgrade_git_to_2_10_2__or_2_11___' of trac.sagemath.org:sage into newgit
33e59cbupdate ECL to 16.1.3 - most of our patches are obsolete

comment:2 Changed 4 years ago by dimpase

oops, somewhat wrong branch... testing anyway

comment:3 Changed 4 years ago by git

  • Commit changed from 33e59cb6e6d5b6b16f0b23eacc566d613ffccd07 to eff60815383ca3b3120a7a26e98e1712072c70ac

Branch pushed to git repo; I updated commit sha1. This was a forced push. New commits:

eff6081update ECL to 16.1.3 - most of our patches are obsolete

comment:4 Changed 4 years ago by dimpase

  • Authors set to Dima Pasechnik
  • Status changed from new to needs_review

comment:5 Changed 4 years ago by fbissey

  • Cc fbissey added

comment:6 Changed 4 years ago by dimpase

The FPE problem on #18920 in comment 99 might be due to ECL 16.1.3 throwing an FPE exception at

> (/ 1.0 0.0)

Condition of type: DIVISION-BY-ZERO
#<a DIVISION-BY-ZERO>

while 16.1.2 happily returns

#.ext::single-float-positive-infinity

comment:7 Changed 4 years ago by dimpase

with 6.1.3 there is apparently new --no-trap-fpe which emulates the old behaviour. So we somehow have to figure out where to put this option.

comment:8 Changed 4 years ago by fbissey

If it is an option for ecl then somewhere in sage/libs/ecl.pyx is the place, probably in ecl_init but I have to dig a bit.

comment:9 follow-up: Changed 4 years ago by fbissey

There is already an option ECL_OPT_TRAP_SIGFPE is it equivalent?

Changed 4 years ago by dimpase

crash gdb log

comment:10 in reply to: ↑ 9 Changed 4 years ago by dimpase

Replying to fbissey:

There is already an option ECL_OPT_TRAP_SIGFPE is it equivalent?

I guess it should be. Perhaps it's something more subtle... I attach the crash dump.

comment:11 Changed 4 years ago by git

  • Commit changed from eff60815383ca3b3120a7a26e98e1712072c70ac to ac8348c462b11a67ee22f77e265afc0ce0572620

Branch pushed to git repo; I updated commit sha1. Last 10 new commits:

62b5825some more easy doctest fixes
3c28022more easy doctest fixes
5d86413Merge branch 'public/t18920' of trac.sagemath.org:sage into max5381
100d9d6workaround for #<a ARITHMETIC-ERROR> (18920#comment:60
79eb82ffix for maxima bug #3236
814e8edreflect patch's dependence on an earlier maxima tree commit
af7e5ffnew scaling of dynatomic polynomial
68da1e4Merge branch 'public/t18920' of trac.sagemath.org:sage into ecl1613
6bae4b1bumped up for 5.39.0. Sage builds (with ecl 16.1.3)
ac8348cdisable SIGFPE traps

comment:12 Changed 4 years ago by git

  • Commit changed from ac8348c462b11a67ee22f77e265afc0ce0572620 to a06f00245bfb652270eeea78248348777d006538

Branch pushed to git repo; I updated commit sha1. This was a forced push. New commits:

3ed32fbUpgrade Jupyter packages, add enum34
72993f2Merge branch 'u/jdemeyer/upgrade_jupyter_packages' of trac.sagemath.org:sage into develop
2a291abMerge branch 'develop' of trac.sagemath.org:sage into develop
6027df9Merge branch 'develop' of trac.sagemath.org:sage into develop
b9ee72dupdate ECL to 16.1.3 - most of our patches are obsolete
a06f002disable SIGFPE traps

comment:13 Changed 4 years ago by git

  • Commit changed from a06f00245bfb652270eeea78248348777d006538 to ad25254d3445265211081b50501af670daf4dade

Branch pushed to git repo; I updated commit sha1. This was a forced push. New commits:

377e0edupdate ECL to 16.1.3 - most of our patches are obsolete
ad25254disable SIGFPE traps

comment:14 Changed 4 years ago by dimpase

  • Status changed from needs_review to needs_work
  • Work issues set to sort out SIGFPE

this is a clean update branch for ECL 16.1.3 - needs work to sort out SIGFPE.

comment:15 follow-up: Changed 4 years ago by nbruin

I'm pretty sure the change in behaviour comes from this check-in on ECL: https://gitlab.com/embeddable-common-lisp/ecl/commit/c2b2941768f39544f45f24e19a30081d316eeb71

Clearly, this code will be setting FPE generation flags in more places than before. Probably ECL doesn't go out of its way to reset FPE flags when ECL is exiting. So probably, after executing ECL code, it is now the case that (often?) the FPE flags are set the way ECL wants; not the way we want. So in addition to restoring the SIG handlers, we should probably sprinkle an appropriate amount of fesetenv calls around the ECL entry and exit wrappers.

Last edited 4 years ago by nbruin (previous) (diff)

comment:16 Changed 4 years ago by dimpase

Upstream tells me that https://gitlab.com/embeddable-common-lisp/ecl/blob/develop/src/lsp/top.lsp#L1456 makes write-error patch unnecessary. So we should remove it too.

Last edited 4 years ago by dimpase (previous) (diff)

comment:17 in reply to: ↑ 15 Changed 4 years ago by dimpase

  • Cc jdemeyer added

Replying to nbruin:

I'm pretty sure the change in behaviour comes from this check-in on ECL: https://gitlab.com/embeddable-common-lisp/ecl/commit/c2b2941768f39544f45f24e19a30081d316eeb71

Clearly, this code will be setting FPE generation flags in more places than before.

but this ECL macro is not about the case at hand, as I believe we do have fenv.h available. Or you mean that Sage does not use fenv.h?

Probably ECL doesn't go out of its way to reset FPE flags when ECL is exiting. So probably, after executing ECL code, it is now the case that (often?) the FPE flags are set the way ECL wants; not the way we want. So in addition to restoring the SIG handlers, we should probably sprinkle an appropriate amount of fesetenv calls around the ECL entry and exit wrappers.

this does not explain why simply setting ECL_OPT_TRAP_SIGFPE to 0 solves the issue for division by 0, but not for FP overflow.

However, I don't understand why doing anything with ECL_OPT_TRAP_SIGFPE has any effect whatsoever (assuming we do not preserve it among other ECL's interrupts such as SIGINT). Shouldn't

    #and put the Sage signal handlers back
    for i in range(1,32):
        sigaction(i, &sage_action[i], NULL)

in ecl.pyx be overriding whatever ECL did with SIGFPE?

comment:18 Changed 4 years ago by dimpase

The experimental branch using ECLs SIGFPE handler is on girt trac: u/dimpase/ecl16.1.3exp So you can have a look --- it also doesn't work.

comment:19 follow-up: Changed 4 years ago by nbruin

I am not very familiar with this kind of coding issues and/or with SIG handling in general.

If we're doing our sig handler setting properly then it shouldn't matter what sig handlers ECL installs when we run non-ECL code, because we switch them out. As far as I can see this is the case. I see no evidence that an ECL sig handler is interfering with our work. The trace-back doesn't mention ECL code at all. Just an uncaught signal.

It seems that the sage action for SIGFPE is to coredump. That's what we're seeing.

There are flags that govern when SIGFPE is raised (rather than propagating a NaN or something similar). The ECL macro I pointed at will be inserted in ECL generated code quite a bit, so I expect that macro to be executed ALL THE TIME. Plus, the patch was made exactly to fix an issue of unraised SIGFPEs (CL apparently prefers the exceptions over NaNs? by default).

We are not taking measures to set fenv to our liking after executing ECL, and one change in the new ECL version is certainly to change fenv more often. So I think it's worth a try to save & restore's sage's own fenv across ECL calls.

It could be that ECL changes its FPE raise mask depending on the value of ECL_OPT_TRAP_SIGFPE. I haven't checked. That would be consistent with observed behaviour.

comment:20 in reply to: ↑ 19 ; follow-ups: Changed 4 years ago by dimpase

Replying to nbruin:

I am not very familiar with this kind of coding issues and/or with SIG handling in general.

If we're doing our sig handler setting properly then it shouldn't matter what sig handlers ECL installs when we run non-ECL code, because we switch them out. As far as I can see this is the case. I see no evidence that an ECL sig handler is interfering with our work. The trace-back doesn't mention ECL code at all. Just an uncaught signal.

But why do we get an uncaught FP overflow signal? That's because booting ECL up destroys Sages' sig handler, even if we switch ECLs SIGFPE handler off. How exactly this is possible is not clear to me.

It seems that the sage action for SIGFPE is to coredump. That's what we're seeing.

no, it is not. Sage's action on FP overflow is to return infinity. Why does ECL continue to mess around with it, even if told not to? In fact, ECL 16.1.3 has a new command line switch, -no-trap-fp, and it works. (i.e. you get ECLs infinity if you try to convert a too big number to float if this switch is on, and you get an interrupt if that switch is off)

There are flags that govern when SIGFPE is raised (rather than propagating a NaN or something similar). The ECL macro I pointed at will be inserted in ECL generated code quite a bit, so I expect that macro to be executed ALL THE TIME. Plus, the patch was made exactly to fix an issue of unraised SIGFPEs (CL apparently prefers the exceptions over NaNs? by default).

We are not taking measures to set fenv to our liking after executing ECL, and one change in the new ECL version is certainly to change fenv more often. So I think it's worth a try to save & restore's sage's own fenv across ECL calls.

It could be that ECL changes its FPE raise mask depending on the value of ECL_OPT_TRAP_SIGFPE. I haven't checked. That would be consistent with observed behaviour.

comment:21 Changed 4 years ago by nbruin

  • Branch changed from u/dimpase/ecl16.1.3 to u/nbruin/ecl16.1.3

comment:22 in reply to: ↑ 20 Changed 4 years ago by nbruin

  • Commit changed from ad25254d3445265211081b50501af670daf4dade to a2cadc04eb91409ad1f88b5f4d6ad80d5dfc3451

Replying to dimpase:

In fact, ECL 16.1.3 has a new command line switch, -no-trap-fp, and it works.

The corresponding ECL command is (si::trap-fpe t nil), so we could just pass that to ECL just after cl_boot. That might be easier/more efficient than passing it in the argv.

This should put ECL in agreement about what the fenv FPE raise flags are supposed to be (and then it also doesn't matter what SIGFPE handler ECL uses, because the signals will not occur, so we don't need to restore ECL's SIGFPE handler for ECL code either.

Last edited 4 years ago by nbruin (previous) (diff)

comment:23 follow-up: Changed 4 years ago by git

  • Commit changed from a2cadc04eb91409ad1f88b5f4d6ad80d5dfc3451 to ea11dd53248462bf2859fb3cbe34bfd47f006682

Branch pushed to git repo; I updated commit sha1. New commits:

ea11dd5Restore "fenv" upon entry and exit of ECL code

comment:24 in reply to: ↑ 23 Changed 4 years ago by nbruin

Replying to git:

Branch pushed to git repo; I updated commit sha1. New commits:

ea11dd5Restore "fenv" upon entry and exit of ECL code

OK, committing the change before pushing is probably helpful.

comment:25 Changed 4 years ago by git

  • Commit changed from ea11dd53248462bf2859fb3cbe34bfd47f006682 to 35e9d3d1610bade7ae7d5fac735244be25e009eb

Branch pushed to git repo; I updated commit sha1. This was a forced push. New commits:

377e0edupdate ECL to 16.1.3 - most of our patches are obsolete
35e9d3dset ECL to not trap floating point exceptions, so that SIGFPE will not occur

comment:26 Changed 4 years ago by nbruin

OK, that should do the trick! I guess ECL_OPT_TRAP_SIGFPE doesn't have much to do with what we need because it's supposed to control what happens with SIGFPE if ECL is catching signals, whereas we're interested in preventing FP exceptions from generating signals at all. the routine si:trap-fpe is exactly the one to set that behaviour, in a way that is maintained in ECL's environment tracking as well.

comment:27 follow-up: Changed 4 years ago by charpent

  • Cc charpent added

Just as a memento for future testing :

With the current ECL Maxima 5.39.0 gives :

(%i3) taylor(integrate(f(x),x),x,x_0,2); 

taylor: unable to expand at a point specified in:
'integrate(f(x),x)
 -- an error. To debug this try: debugmode(true);

whereas :

%i2) taylor(integrate(f(x),x),x,x_0,2);

(%o2) 'at('integrate(f(x),x),x = x_0)+f(x_0)*(x-x_0)
                                     +(('at('diff(f(x),x,1),x = x_0))
                                      *(x-x_0)^2)
                                      /2

is expected since Maxima 5.38.1 at the latest...

comment:28 Changed 4 years ago by nbruin

DISCLAIMER: I just found out that all my testing was on 16.1.2. While I expect that my fix should work, I haven't actually successfully tested this. Other people who do have easy access to 16.1.3 can verify this easily.

comment:29 in reply to: ↑ 20 Changed 4 years ago by dimpase

OK, this appears to work. make ptest with this ECL 16.1.3 branch and Maxima from #18920 passes without any errors.

comment:30 Changed 4 years ago by dimpase

  • Report Upstream changed from N/A to Reported upstream. No feedback yet.

Daniel Kochmański (ECL lead dev) invited me to open an issue #347 on this.

comment:31 in reply to: ↑ 27 Changed 4 years ago by dimpase

Replying to charpent:

Just as a memento for future testing :

With the current ECL Maxima 5.39.0 gives :

(%i3) taylor(integrate(f(x),x),x,x_0,2); 

taylor: unable to expand at a point specified in:
'integrate(f(x),x)
 -- an error. To debug this try: debugmode(true);

this unfortunately stays the same with ECL 16.1.3. This is for Maxima/ECL people to fix, not for us.

comment:32 follow-ups: Changed 4 years ago by jdemeyer

It would be really good to explain on this ticket what the SIGFPE issue is all about.

comment:33 Changed 4 years ago by jdemeyer

  • Description modified (diff)

comment:34 in reply to: ↑ 32 Changed 4 years ago by dimpase

Replying to jdemeyer:

It would be really good to explain on this ticket what the SIGFPE issue is all about.

examples can be found on #18920. More specifically, in Sage Maxima built with ECL 16.1.3 and unpatched libs/ecl.pyx gives crash dumps like

sage: g(x)=taylor(log(x),x,1,6) # this initialises ECL/Maxima
sage: q = (2^53) * 2^971/1
sage: float(q) # boom, instead of "inf" as the answer.
...
Unhandled SIGFPE: An unhandled floating point exception occurred.
...

and

sage: g(x)=taylor(log(x),x,1,6); g
x |--> -1/6*(x - 1)^6 + 1/5*(x - 1)^5 - 1/4*(x - 1)^4 + 1/3*(x - 1)^3 - 1/2*(x - 1)^2 + x - 1
sage: plot(log(x),(x,0,2))
...
Unhandled SIGFPE: An unhandled floating point exception occurred.
...

comment:35 in reply to: ↑ 32 ; follow-up: Changed 4 years ago by nbruin

The mechanics of why SIGFPE is a problem with ECL 6.1.3 by default:

  • Apparently it is desirable in CL to raise floating point exceptions rather than propagate Inf and NaN by default (CL is a formally specified language, so it probably states that FP exceptions should be handled that way)
  • This is most easily done by using fenv.h functionality to configure floating point arithmetic to raise SIGFPE when these things arise. So, ECL does that, and has a SIGFPE handler in place.
  • We switch the ECL sighandlers in/out whenever we're entering/exiting significant ECL code. Sage is not expecting floating point exceptions to raise SIGFPE, so sage does not have a SIGFPE handler.
  • In order to fix a bug elsewhere in ECL, https://gitlab.com/embeddable-common-lisp/ecl/commit/c2b2941768f39544f45f24e19a30081d316eeb71 changed ECL to much more often update the fenv to reflect its desired state (previously, they were not doing this at all). ECL is not actually written to be called as a library, so it doesn't take any precautions to set/restore global state such as fenv upon entry/exit. Hence, after running ECL in its default configuration, you're liable to end up with an fenv in which SIGFPE can be raised. This is what was happening.

For solutions:

  • it's possible to tell ECL to NOT let floating point exceptions raise SIGFPE. Then ECL's SIGFPE handler is irrelevant, and the fenv state ECL wants is basically consistent with what we want. This is the solution currently used.
  • alternatively, we could let ECL do what it wants and set/restore fenv upon entry/exit of ECL code. fesetenv and fegetenv are the relevant routines from fenv.h. In addition, though, we should restore ECL's SIGFPE handler during ECL code execution, if we go that route (currently we don't).

This was one of these issues that's really easy to resolve, once you understand the problem.

The main confusion is that there are two components: whether floating point exceptions should raise SIGFPE, and how SIGFPE gets handled once it's raised. You might think these two components are controlled via the same mechanism, but they are not. It looks like ECL might be investigating if its configuration can be streamlined on these two issues: https://gitlab.com/embeddable-common-lisp/ecl/issues/347

Currently there are two options that seem relevant: ECL_OPT_TRAP_SIGFPE and --no-trap-fpe that feed into different places and control different aspects.

comment:36 in reply to: ↑ 35 ; follow-up: Changed 4 years ago by jdemeyer

Replying to nbruin:

This was one of these issues that's really easy to resolve, once you understand the problem.

So the issue is solved now?

comment:37 in reply to: ↑ 36 ; follow-up: Changed 4 years ago by dimpase

Replying to jdemeyer:

Replying to nbruin:

This was one of these issues that's really easy to resolve, once you understand the problem.

So the issue is solved now?

it is desirable to tell ECL to NOT let floating point exceptions raise SIGFPE before booting it up, not during the bootup, for this leaves a time window where SIGFPE can be raised. How can this be done?

comment:38 in reply to: ↑ 37 Changed 4 years ago by nbruin

Replying to dimpase:

it is desirable to tell ECL to NOT let floating point exceptions raise SIGFPE before booting it up, not during the bootup, for this leaves a time window where SIGFPE can be raised. How can this be done?

I think this is an academic issue, because in our current setting, no fp exceptions will occur in the window, and hence the way SIGFPE is handled is irrelevant. Anyway:

I assume ECL installs its SIGFPE handler before it executes any FP instructions and/or sets the fenv. The ECL's SIGFPE handler greatly eases the effect of SIGFPE. So if (si::trap-fpe T NIL) is executed right after cl_boot; before switching the SIGhandlers, I think the risk is smaller (I don't think there is a risk at all because no FP work occurs in this process, so it's hard to make an accurate estimate of the relative risks here).

Alternatively, you could trust that ECL knows what it's doing and execute cl_boot with an argv that contains --no-trap-fpe. With any luck, that should just get processed in the same way as command line options upon startup. It is then ECL's responsibility to do this before a SIGFPE actually occurs. In reality, this is essentially the same as just feeding ECL (si::trap-fpe T NIL) just after boot.

fenv is supposed to be thread-specific, so concurrency issues should not come into play, even if sage were ever to grow multiple threads.

comment:39 Changed 4 years ago by nbruin

  • Status changed from needs_work to needs_review
  • Work issues sort out SIGFPE deleted

Needs review?

comment:40 follow-up: Changed 4 years ago by dimpase

there is now https://gitlab.com/embeddable-common-lisp/ecl/issues/349 and also still write_error patch to remove.

comment:41 follow-up: Changed 4 years ago by jdemeyer

  • Status changed from needs_review to needs_work

You should document the change to src/sage/libs/ecl.pyx. At a minimum, there should be a link to this ticket.

comment:42 in reply to: ↑ 41 Changed 4 years ago by dimpase

Replying to jdemeyer:

You should document the change to src/sage/libs/ecl.pyx. At a minimum, there should be a link to this ticket.

Sure. I didn't set it up for review, Nils did. IMHO upstream is still working on this.

comment:43 Changed 3 years ago by jpflori

  • Cc jpflori added

comment:44 in reply to: ↑ 40 Changed 3 years ago by nbruin

Replying to dimpase:

there is now https://gitlab.com/embeddable-common-lisp/ecl/issues/349 and also still write_error patch to remove.

That ticket has now been closed and its fix was to change a test.

No movement on https://gitlab.com/embeddable-common-lisp/ecl/issues/347, so if updating ECL is relevant then doing so with the (si::trap-fpe T NIL) given here is probably a good way to go.

comment:45 Changed 2 years ago by thansen

  • Description modified (diff)
  • Summary changed from update ECL to 6.1.3 to update ECL to 16.1.3

comment:46 Changed 2 years ago by thansen

Any chance this could be finished soon? Some of the ECL maintainers in Debian would like to update the ECL package.

comment:47 follow-up: Changed 2 years ago by fbissey

If I am being honest we want a brand new release of ECL. It won't fix everything but we may be in a better position.

comment:48 in reply to: ↑ 47 Changed 2 years ago by dimpase

Replying to fbissey:

If I am being honest we want a brand new release of ECL. It won't fix everything but we may be in a better position.

I second this. 16.1.3 introduced tricky changes, and the upcoming release (they wanted to get something in late 2017, but OK...) is going again to be much different in the way exceptions are handled.

comment:49 Changed 2 years ago by gh-antonio-rojas

  • Cc gh-antonio-rojas added

comment:50 Changed 2 years ago by arojas

  • Cc arojas added; gh-antonio-rojas removed

comment:51 Changed 2 years ago by gh-timokau

  • Cc gh-timokau added

comment:52 Changed 2 years ago by saraedum

  • Cc saraedum added

comment:53 Changed 2 years ago by gh-timokau

In case we're really waiting for the next ecl release, here is the gitlab milestone and the rc testing issue (which was opened a year ago and got practically no attention).

comment:54 follow-up: Changed 20 months ago by slelievre

  • Keywords upgrade ecl added
  • Milestone changed from sage-7.6 to sage-8.6

Does it help with this ticket that upstream issue 347 was closed a month ago:

comment:55 Changed 20 months ago by slelievre

  • Report Upstream changed from Reported upstream. No feedback yet. to Fixed upstream, but not in a stable release.

Note: debian-science-sagemath discussion about upgrading to ECL 16.1.3:

comment:56 Changed 20 months ago by slelievre

  • Cc infinity0 slelievre thansen added

comment:57 in reply to: ↑ 54 ; follow-up: Changed 20 months ago by fbissey

Replying to slelievre:

Does it help with this ticket that upstream issue 347 was closed a month ago:

It is unclear why it was closed. Possibly because no one was caring. As far as it goes, some of the fix we need would be hard to rebase on top of 16.1.3 - we may just as well try a snapshot from git with all the uncertainty that it entails.

comment:58 in reply to: ↑ 57 ; follow-up: Changed 20 months ago by arojas

Replying to fbissey:

It is unclear why it was closed. Possibly because no one was caring. As far as it goes, some of the fix we need would be hard to rebase on top of 16.1.3 - we may just as well try a snapshot from git with all the uncertainty that it entails.

If I'm reading it correctly it was closed by https://gitlab.com/embeddable-common-lisp/ecl/commit/2a9084b105caf26c89bb16ed4dd0bf5aa7ceab59

Beckporting that commit to 16.1.3 still doesn't fix the SIGFPE crashes on Arch unfortunately.

comment:59 follow-up: Changed 20 months ago by fbissey

Oh right. I am obviously not used to gitlab UI for those. Did you also manage to backport the fix for https://gitlab.com/embeddable-common-lisp/ecl/issues/317 ?

comment:60 in reply to: ↑ 59 Changed 20 months ago by arojas

Replying to fbissey:

Oh right. I am obviously not used to gitlab UI for those. Did you also manage to backport the fix for https://gitlab.com/embeddable-common-lisp/ecl/issues/317 ?

I did now, still no improvement.

comment:61 in reply to: ↑ 58 ; follow-up: Changed 20 months ago by gh-spaghettisalat

Replying to arojas:

If I'm reading it correctly it was closed by https://gitlab.com/embeddable-common-lisp/ecl/commit/2a9084b105caf26c89bb16ed4dd0bf5aa7ceab59

Beckporting that commit to 16.1.3 still doesn't fix the SIGFPE crashes on Arch unfortunately.

As the ECL developer that made this change: Did you also set the ECL_OPT_TRAP_SIGFPE option to false, i.e. put an extra line ecl_set_option(ECL_OPT_TRAP_SIGFPE, 0); into the ecl.pyx file before testing this? Because otherwise if you use the default value of true, you still instruct ECL to generate and catch floating point exceptions.

comment:62 in reply to: ↑ 61 ; follow-ups: Changed 20 months ago by arojas

Replying to gh-spaghettisalat:

Replying to arojas:

If I'm reading it correctly it was closed by https://gitlab.com/embeddable-common-lisp/ecl/commit/2a9084b105caf26c89bb16ed4dd0bf5aa7ceab59

Beckporting that commit to 16.1.3 still doesn't fix the SIGFPE crashes on Arch unfortunately.

As the ECL developer that made this change: Did you also set the ECL_OPT_TRAP_SIGFPE option to false, i.e. put an extra line ecl_set_option(ECL_OPT_TRAP_SIGFPE, 0); into the ecl.pyx file before testing this? Because otherwise if you use the default value of true, you still instruct ECL to generate and catch floating point exceptions.

Yes, still no difference

comment:63 in reply to: ↑ 62 ; follow-up: Changed 20 months ago by fbissey

Replying to arojas:

Replying to gh-spaghettisalat:

Replying to arojas:

If I'm reading it correctly it was closed by https://gitlab.com/embeddable-common-lisp/ecl/commit/2a9084b105caf26c89bb16ed4dd0bf5aa7ceab59

Beckporting that commit to 16.1.3 still doesn't fix the SIGFPE crashes on Arch unfortunately.

As the ECL developer that made this change: Did you also set the ECL_OPT_TRAP_SIGFPE option to false, i.e. put an extra line ecl_set_option(ECL_OPT_TRAP_SIGFPE, 0); into the ecl.pyx file before testing this? Because otherwise if you use the default value of true, you still instruct ECL to generate and catch floating point exceptions.

Yes, still no difference

Is the patch set you are working with available somewhere? I would like to try what you have done myself. May be more eyeballs can detect an issue that escaped you.

comment:64 in reply to: ↑ 63 ; follow-up: Changed 20 months ago by arojas

Replying to fbissey:

Is the patch set you are working with available somewhere? I would like to try what you have done myself. May be more eyeballs can detect an issue that escaped you.

Here it is: http://pkgbuild.com/~arojas/sagemath-ecl-sigfpe.patch

ecl version is 16.1.3 with 2a9084b1 and c2b29417 backported

comment:65 in reply to: ↑ 64 Changed 20 months ago by fbissey

Replying to arojas:

Replying to fbissey:

Is the patch set you are working with available somewhere? I would like to try what you have done myself. May be more eyeballs can detect an issue that escaped you.

Here it is: http://pkgbuild.com/~arojas/sagemath-ecl-sigfpe.patch

ecl version is 16.1.3 with 2a9084b1 and c2b29417 backported

I was thinking of the backport patches as well actually.

comment:67 Changed 20 months ago by fbissey

Thank you. I must say my memory of that conversation - started more than a year ago - was making the backport patches way more complicated.

comment:68 Changed 20 months ago by arojas

  • Cc embray added

comment:69 follow-up: Changed 20 months ago by embray

Is this something we should also try to get done for 8.6, along with the GAP 4.10 upgrade? ISTM that would be best from the Debian perspective.

comment:70 in reply to: ↑ 69 Changed 20 months ago by fbissey

Replying to embray:

Is this something we should also try to get done for 8.6, along with the GAP 4.10 upgrade? ISTM that would be best from the Debian perspective.

ecl 16.1.3 has been out for a while. Us in the distro world have been stumped for a bit. If you can move it, it would be very much appreciated. Everyone has their own timetable and that would be more interesting to me in Gentoo than gap 4.10 (because I am literally in charge of gap amongst other things - and don't get me wrong getting gap 4.10 is important especially getting rid of libgap).

comment:71 Changed 20 months ago by dimpase

ECL has not done a new release in years. We were hoping their next release would address some of problems with signals and exceptions we found in 16.1.3 - something that needed tricky work on Sage side, only to be thrown away, as ECL is going to change in this area a lot...

comment:72 Changed 20 months ago by slelievre

Note: the ECL release manager tells me the current estimate for the ECL 16.2.0 release date is "first quarter of 2019".

comment:73 Changed 20 months ago by dimpase

So they should have a beta soon, no?

comment:74 Changed 20 months ago by gh-timokau

Well they have been "release candidate testing" without any activity for a year now (https://gitlab.com/embeddable-common-lisp/ecl/issues/333). I wouldn't bet on the Q1 2019 release.

comment:75 in reply to: ↑ 62 ; follow-ups: Changed 20 months ago by gh-spaghettisalat

Replying to arojas:

Replying to gh-spaghettisalat:

Replying to arojas:

If I'm reading it correctly it was closed by https://gitlab.com/embeddable-common-lisp/ecl/commit/2a9084b105caf26c89bb16ed4dd0bf5aa7ceab59

Beckporting that commit to 16.1.3 still doesn't fix the SIGFPE crashes on Arch unfortunately.

As the ECL developer that made this change: Did you also set the ECL_OPT_TRAP_SIGFPE option to false, i.e. put an extra line ecl_set_option(ECL_OPT_TRAP_SIGFPE, 0); into the ecl.pyx file before testing this? Because otherwise if you use the default value of true, you still instruct ECL to generate and catch floating point exceptions.

Yes, still no difference

Then I'm guessing, the branch at https://git.sagemath.org/sage.git/log/?h=u/nbruin/ecl16.1.3 also failed for you? Because all the patch of ​https://gitlab.com/embeddable-common-lisp/ecl/commit/2a9084b105caf26c89bb16ed4dd0bf5aa7ceab59 does is to call (si::trap-fpe t nil) during startup if the ECL_OPT_TRAP_SIGFPE option is false.

Anyway, I think the real issue is that maxima enables floating point overflow exceptions during startup (see the last line in the src/ecl-port.lisp file of the maxima source directory). Removing this line should fix the observed crashes also with an unpatched ECL 16.1.3 and without the need to explicitely call (si::trap-fpe t nil) in ecl.pyx (as long as the ECL_OPT_TRAP_SIGFPE option is false of course). This is also consistent with the observation in https://gitlab.com/embeddable-common-lisp/ecl/issues/317 that floating point overflows behaved differently than the other floating point exceptions.

comment:76 in reply to: ↑ 75 Changed 20 months ago by arojas

Replying to gh-spaghettisalat:

Then I'm guessing, the branch at https://git.sagemath.org/sage.git/log/?h=u/nbruin/ecl16.1.3 also failed for you?

You're right, the branch never worked for me

Anyway, I think the real issue is that maxima enables floating point overflow exceptions during startup (see the last line in the src/ecl-port.lisp file of the maxima source directory). Removing this line should fix the observed crashes also with an unpatched ECL 16.1.3 and without the need to explicitely call (si::trap-fpe t nil) in ecl.pyx (as long as the ECL_OPT_TRAP_SIGFPE option is false of course).

And you're right again. Removed that line in maxima and there are no more crashes.

comment:77 in reply to: ↑ 75 ; follow-up: Changed 20 months ago by nbruin

Replying to gh-spaghettisalat:

Anyway, I think the real issue is that maxima enables floating point overflow exceptions during startup (see the last line in the src/ecl-port.lisp file of the maxima source directory). Removing this line should fix the observed crashes also with an unpatched ECL 16.1.3 and without the need to explicitely call (si::trap-fpe t nil) in ecl.pyx (as long as the ECL_OPT_TRAP_SIGFPE option is false of course).

In that case, I suspect that for us the right place to fix it is to revert the effect of the line in ecl-port.lisp after initialization in maxima_lib. Then we don't have to patch maxima. Also, maxima may have a good reason to do this under normal operation, and sagemath does build maxima both for stand-alone use and for library use (and it uses the same fas for both!). Unless we have a good reason, we should probably not modify the stand-alone behaviour.

The relevant file would be sage/interfaces/maxima_lib.py, somewhere after the (require 'maxima). It already has a lot of lisp commands that go digging in the internals of maxima. The short window where maxima turns on fpe exceptions and we turn them off again shouldn't be a problem.

comment:78 follow-up: Changed 20 months ago by arojas

With this [1] patch in sagemath it no longer crashes for me, with unmodified ecl and maxima

[1] https://git.archlinux.org/svntogit/community.git/tree/trunk/sagemath-ecl-sigfpe.patch?h=packages/sagemath

comment:79 in reply to: ↑ 77 Changed 20 months ago by embray

Replying to nbruin:

Replying to gh-spaghettisalat:

Anyway, I think the real issue is that maxima enables floating point overflow exceptions during startup (see the last line in the src/ecl-port.lisp file of the maxima source directory). Removing this line should fix the observed crashes also with an unpatched ECL 16.1.3 and without the need to explicitely call (si::trap-fpe t nil) in ecl.pyx (as long as the ECL_OPT_TRAP_SIGFPE option is false of course).

In that case, I suspect that for us the right place to fix it is to revert the effect of the line in ecl-port.lisp after initialization in maxima_lib. Then we don't have to patch maxima. Also, maxima may have a good reason to do this under normal operation, and sagemath does build maxima both for stand-alone use and for library use (and it uses the same fas for both!). Unless we have a good reason, we should probably not modify the stand-alone behaviour.

The relevant file would be sage/interfaces/maxima_lib.py, somewhere after the (require 'maxima). It already has a lot of lisp commands that go digging in the internals of maxima. The short window where maxima turns on fpe exceptions and we turn them off again shouldn't be a problem.

+1 Sage should not depend on non-standard functionality supplied via patch in order to work.

comment:80 Changed 19 months ago by embray

Of the patches we still include for our ECL build, are there any that haven't been submitted upstream, or at least that still need upstream issues?

I know that I got flisten-bug.patch fixed here: https://gitlab.com/embeddable-common-lisp/ecl/merge_requests/51 so that patch can probably be backported to 16.1.3 if need be (and in any case, the bug it fixes was only, to my knowledge, affecting Cygwin, so I don't care about it for Linux).

Last edited 19 months ago by embray (previous) (diff)

comment:81 Changed 19 months ago by embray

Also, are we still doing the tarball patching with spkg-src? I did not do that, so I'm guessing it's obsolete.

comment:82 follow-up: Changed 19 months ago by embray

The difficulty/problem here is that ECL's si_trap_fpe still uncondtionally (when available) calls feenableexcept. The effect of (si::trap-fpe t nil) is effectively to call feenableexcept(FE_DIVBYZERO | FE_OVERFLOW | FE_UNDERFLOW | FE_INVALID) which means now SIGFPEs will be sent to the program that weren't otherwise.

The only effect of setting ECL_OPT_TRAP_SIGFPE = 0 is that ECL's signal handler won't be installed (actually this setting could be avoided entirely by saving/restoring ECL's SIGFPE handler like we do some of its other signal handlers).

However, (si::trap-fpe t nil) still results now in unhandled SIGFPEs in random places where previously floating point exceptions were just ignored, or handled manually.

comment:83 in reply to: ↑ 82 ; follow-up: Changed 19 months ago by gh-spaghettisalat

Replying to embray:

The difficulty/problem here is that ECL's si_trap_fpe still uncondtionally (when available) calls feenableexcept. The effect of (si::trap-fpe t nil) is effectively to call feenableexcept(FE_DIVBYZERO | FE_OVERFLOW | FE_UNDERFLOW | FE_INVALID) which means now SIGFPEs will be sent to the program that weren't otherwise.

No, the effect of (si::trap-fpe t nil) is to call feenableexcept(0). What you're describing happens with (si::trap-fpe t t).

The only effect of setting ECL_OPT_TRAP_SIGFPE = 0 is that ECL's signal handler won't be installed (actually this setting could be avoided entirely by saving/restoring ECL's SIGFPE handler like we do some of its other signal handlers).

The effect of ECL_OPT_TRAP_SIGFPE = 0 in ECL 16.1.3 is a) that no signal handler for SIGFPE exceptions is installed and b) that in contrast to ECL_OPT_TRAP_SIGFPE = 1 no floating point exceptions are enabled (i.e. feenableexcept is not called). Because of the feature request https://gitlab.com/embeddable-common-lisp/ecl/issues/347, the next ECL release changes this behaviour to call feenableexcept(0) during startup if ECL_OPT_TRAP_SIGFPE is 0.

However, (si::trap-fpe t nil) still results now in unhandled SIGFPEs in random places where previously floating point exceptions were just ignored, or handled manually.

No, it doesn't, as explained above.

comment:84 in reply to: ↑ 78 Changed 19 months ago by embray

Replying to arojas:

With this [1] patch in sagemath it no longer crashes for me, with unmodified ecl and maxima

[1] https://git.archlinux.org/svntogit/community.git/tree/trunk/sagemath-ecl-sigfpe.patch?h=packages/sagemath

This patch also works for me. It's not ideal but it's okay as a temporary workaround.

comment:85 in reply to: ↑ 83 Changed 19 months ago by embray

Replying to gh-spaghettisalat:

Replying to embray:

The difficulty/problem here is that ECL's si_trap_fpe still uncondtionally (when available) calls feenableexcept. The effect of (si::trap-fpe t nil) is effectively to call feenableexcept(FE_DIVBYZERO | FE_OVERFLOW | FE_UNDERFLOW | FE_INVALID) which means now SIGFPEs will be sent to the program that weren't otherwise.

No, the effect of (si::trap-fpe t nil) is to call feenableexcept(0). What you're describing happens with (si::trap-fpe t t).

You're right: I was having trouble understanding exactly how this code works. The problem doesn't come from (si::trap-fpe t nil), but rather the fact that maxima later enables traps for overflows (which then needs to be subsequently disabled, per the patch in comment:78).

I guess, what I would want, would be an option to effectively turn trap-fpe into a no-op, because it's still modifying the global floating point environment in a way that's rather opaque and detrimental when embedding.

Anyways, since we identified for now the one place where this was happening--loading maxima--I'm content for now to step away, though we should probably change our ecl_sig_on/off() to save/restore the floating point environment and clear any floating point exceptions.

Happy New Year and have a good night!

comment:86 Changed 19 months ago by embray

Here's my attempt to patch this more generally. It appears to work, but I'm not sure how robust it is. With this, it's not necessary to manually call si::trap-fpe anywhere, because ECL's FPE traps are saved/restored alongside its signal handlers by ecl_sig_on/off():

  • src/sage/libs/ecl.pyx

    diff --git a/src/sage/libs/ecl.pyx b/src/sage/libs/ecl.pyx
    index 8dd2895..fa24ac0 100644
    a b from __future__ import print_function, absolute_import 
    1616#adapted to work with pure Python types.
    1717
    1818from libc.stdlib cimport abort
    19 from libc.signal cimport SIGINT, SIGBUS, SIGSEGV, SIGCHLD
     19from libc.signal cimport SIGINT, SIGBUS, SIGSEGV, SIGCHLD, SIGFPE
    2020from libc.signal cimport raise_ as signal_raise
    2121from posix.signal cimport sigaction, sigaction_t
    2222cimport cysignals.signals
    cdef extern from "eclsig.h": 
    4848    void ecl_sig_off()
    4949    cdef sigaction_t ecl_sigint_handler
    5050    cdef sigaction_t ecl_sigbus_handler
     51    cdef sigaction_t ecl_sigfpe_handler
    5152    cdef sigaction_t ecl_sigsegv_handler
    5253    cdef mpz_t ecl_mpz_from_bignum(cl_object obj)
    5354    cdef cl_object ecl_bignum_from_mpz(mpz_t num)
     55    cdef int fegetexcept()
     56    cdef int feenableexcept(int)
     57    cdef int fedisableexcept(int)
     58    cdef int ecl_feflags
    5459
    5560cdef cl_object string_to_object(char * s):
    5661    return ecl_read_from_cstring(s)
    def init_ecl(): 
    239244    global ecl_has_booted
    240245    cdef char *argv[1]
    241246    cdef sigaction_t sage_action[32]
     247    cdef int sage_fpes
    242248    cdef int i
    243249
    244250    if ecl_has_booted:
    def init_ecl(): 
    258264    for i in range(1,32):
    259265        sigaction(i, NULL, &sage_action[i])
    260266
     267    sage_fpes = fegetexcept()
     268
    261269    #initialize ECL
    262270    ecl_set_option(ECL_OPT_SIGNAL_HANDLING_THREAD, 0)
    263271    cl_boot(1, argv)
    def init_ecl(): 
    265273    #save signal handler from ECL
    266274    sigaction(SIGINT, NULL, &ecl_sigint_handler)
    267275    sigaction(SIGBUS, NULL, &ecl_sigbus_handler)
     276    sigaction(SIGFPE, NULL, &ecl_sigfpe_handler)
    268277    sigaction(SIGSEGV, NULL, &ecl_sigsegv_handler)
    269278
     279    #save ECL's floating point exception flags
     280    ecl_feflags = fegetexcept()
     281
    270282    #verify that no SIGCHLD handler was installed
    271283    cdef sigaction_t sig_test
    272284    sigaction(SIGCHLD, NULL, &sig_test)
    def init_ecl(): 
    277289    for i in range(1,32):
    278290        sigaction(i, &sage_action[i], NULL)
    279291
     292    fedisableexcept(ecl_feflags)
     293    feenableexcept(sage_fpes)
     294
    280295    #initialise list of objects and bind to global variable
    281296    # *SAGE-LIST-OF-OBJECTS* to make it rooted in the reachable tree for the GC
    282297    list_of_objects=cl_cons(Cnil,cl_cons(Cnil,Cnil))
    def init_ecl(): 
    320335                    (values nil (princ-to-string cnd)))))
    321336        """))
    322337    safe_funcall_clobj=cl_eval(string_to_object(b"(symbol-function 'sage-safe-funcall)"))
    323 
    324     cl_eval(string_to_object("(si::trap-fpe T NIL)"))
    325338    ecl_has_booted = 1
    326339
    327340cdef cl_object ecl_safe_eval(cl_object form) except NULL:
  • src/sage/libs/eclsig.h

    diff --git a/src/sage/libs/eclsig.h b/src/sage/libs/eclsig.h
    index f9f2690..c1d5244 100644
    a b  
    99
    1010
    1111#include <signal.h>
     12
     13/* Rummage around to determine how ECL was configured */
     14#define ECL_AVOID_FPE_H  /* Prevent some local includes */
     15#include <ecl/config-internal.h>
     16
     17#ifdef HAVE_FENV_H
     18#include <fenv.h>
     19#ifndef FE_ALL_EXCEPT
     20#define FE_ALL_EXCEPT FE_DIVBYZERO | FE_OVERFLOW | FE_UNDERFLOW | FE_INVALID
     21#endif
     22#else
     23#ifndef FE_ALL_EXCEPT
     24#define FE_ALL_EXCEPT 0
     25#endif
     26#endif
     27
     28#ifndef HAVE_FEENABLEEXCEPT
     29/* These are GNU extensions */
     30#define fegetexcept() 0
     31#define feenablexcept(flags)
     32#define fdisableexcept(flags)
     33#endif
     34
    1235static struct sigaction ecl_sigint_handler;
    1336static struct sigaction ecl_sigbus_handler;
     37static struct sigaction ecl_sigfpe_handler;
    1438static struct sigaction ecl_sigsegv_handler;
    1539static struct sigaction sage_sigint_handler;
    1640static struct sigaction sage_sigbus_handler;
     41static struct sigaction sage_sigfpe_handler;
    1742static struct sigaction sage_sigsegv_handler;
     43static int ecl_feflags;
     44static int sage_feflags;
    1845
    1946static inline void set_ecl_signal_handler(void)
    2047{
    2148    sigaction(SIGINT, &ecl_sigint_handler, &sage_sigint_handler);
    2249    sigaction(SIGBUS, &ecl_sigbus_handler, &sage_sigbus_handler);
     50    sigaction(SIGFPE, &ecl_sigfpe_handler, &sage_sigfpe_handler);
    2351    sigaction(SIGSEGV, &ecl_sigsegv_handler, &sage_sigsegv_handler);
     52    /* sage_feflags should be 0; we don't set them otherwise */
     53    sage_feflags = fedisableexcept(FE_ALL_EXCEPT);
     54    feenableexcept(ecl_feflags);
    2455}
    2556
    2657static inline void unset_ecl_signal_handler(void)
    2758{
    2859    sigaction(SIGINT, &sage_sigint_handler, NULL);
    2960    sigaction(SIGBUS, &sage_sigbus_handler, NULL);
     61    sigaction(SIGFPE, &sage_sigfpe_handler, NULL);
    3062    sigaction(SIGSEGV, &sage_sigsegv_handler, NULL);
     63    ecl_feflags = fedisableexcept(FE_ALL_EXCEPT);
     64    feenableexcept(sage_feflags);
    3165}
    3266
    3367/* This MUST be a macro because sig_on() must be in the same

comment:87 Changed 19 months ago by embray

  • Authors changed from Dima Pasechnik to Dima Pasechnik, Erik Bray
  • Branch changed from u/nbruin/ecl16.1.3 to public/ticket-22191
  • Commit 35e9d3d1610bade7ae7d5fac735244be25e009eb deleted
  • Status changed from needs_work to needs_review

New branch based on the original one, rebased on 8.6.beta0, and including my floating point exception handling fixes. These non-standard FPE APIs are woefully underdocumented and confusing, but I think this has the right idea and all tests pass for me, docs build, etc. It should be a non-issue on platforms that don't have feenableexcept(), so we just treat it there as a no-op.

comment:88 Changed 19 months ago by git

  • Commit set to f2eea3db4ce90cb5743033c32753b9d9ade75703

Branch pushed to git repo; I updated commit sha1. New commits:

615b29fupdate ECL to 16.1.3 - most of our patches are obsolete
1a643faset ECL to not trap floating point exceptions, so that SIGFPE will not occur
f2eea3dimprove save/restore of floating point exception handling when entering/leaving libecl

comment:89 Changed 19 months ago by dimpase

some patches don't apply

Setting up build directory for ecl-16.1.3.p0
Finished extraction
Applying patches from ../patches...
Applying ../patches/flisten-bug.patch
patching file src/c/file.d
Hunk #1 FAILED at 5317.
1 out of 1 hunk FAILED -- saving rejects to file src/c/file.d.rej
Error applying '../patches/flisten-bug.patch'
************************************************************************
Error applying patches

comment:90 Changed 19 months ago by dimpase

  • Status changed from needs_review to needs_work

comment:91 Changed 19 months ago by embray

I had renamed that patch while updating, but looks like I didn't push the updated patch.

comment:92 Changed 19 months ago by git

  • Commit changed from f2eea3db4ce90cb5743033c32753b9d9ade75703 to ee5a10b5781f825b4da678f54c8b5589d152e438

Branch pushed to git repo; I updated commit sha1. New commits:

ee5a10bupdated this patch to apply cleanly to ECL 16.1.3

comment:93 Changed 19 months ago by embray

  • Status changed from needs_work to needs_review

comment:94 Changed 19 months ago by dimpase

  • Authors changed from Dima Pasechnik, Erik Bray to Nils Bruin, Dima Pasechnik, Erik Bray
  • Reviewers set to Dima Pasechnik

Nils, could you have a look at this?

comment:95 Changed 19 months ago by nbruin

It seems to me that the fedisableexcept/feenableexcept dance is supposed to switch between "ecl" FPE state and "sage" FPE state. However, that is not what those commands do. According to the documentation, fedisableexcept just disables the given exceptions (and leaves the other ones the way they were) and feenableexcept enables them. Furthermore, it seems these are supposedly glibc extensions.

When I look at the relevant manpage, it seems to indicate that fegetenv and fesetenv are defined in C99 and do set the FPE environment wholesale. So, would it be better to do:

fegetenv(&sage_fe_env)
fesetenv(&ecl_fe_env)
<do ecl stuff>
fegetenv(&ecl_fe_env)
fesetenv(&sagel_fe_env)

that might do a more comprehensive capture ...

comment:96 follow-up: Changed 19 months ago by embray

I thought of using feget/setenv, but the reason I didn't is simply because ECL does not use them, and rather uses fedisableexcept and feenableexcept. See: https://gitlab.com/embeddable-common-lisp/ecl/blob/2a9084b105caf26c89bb16ed4dd0bf5aa7ceab59/src/c/unixint.d#L1234

It's not clear to me or well documented how feenable/disableexcept interact with fesetenv. Presumably the latter should encompass the former, but I have not looked into it in more detail. I agree that the get/setenv functions look better overall and more portable.

So for now I'd be more comfortable just doing exactly the converse to what ECL is doing. In this code I use fedisableexcept(FE_ALL_EXCEPT) to just disable all floating point exceptions, followed by feenableexcept(...) for just the ECL flags, or just the Sage flags (which should be 0, since I'm not aware of anything in Sage or any of its other dependencies that, at least by default, sets FPE traps, but we save/restore it anyways just in case).

Last edited 19 months ago by embray (previous) (diff)

comment:97 Changed 19 months ago by dimpase

I don’t recall whether numpy sets FPE traps, or merely monitors for FPEs, but this did cause us trouble in the past, see #22799.

comment:98 in reply to: ↑ 96 ; follow-up: Changed 19 months ago by nbruin

Replying to embray:

I thought of using feget/setenv, but the reason I didn't is simply because ECL does not use them, and rather uses fedisableexcept and feenableexcept. See: https://gitlab.com/embeddable-common-lisp/ecl/blob/2a9084b105caf26c89bb16ed4dd0bf5aa7ceab59/src/c/unixint.d#L1234

OK, that makes sense. There is one point of concern: as far as I can see, these routines change which exceptions raise a signal. It doesn't change which flags are set. Can it be that sage or ecl get exception flag states they are not expecting because the state arose under a signal regime they are not used to? (getenv/setenv would take care of that because they'd swap the *entire* FP state)

Also: can we do something so that the patchbot can test this ticket? It looks like it can't find the ecl tarball at the moment.

Last edited 19 months ago by nbruin (previous) (diff)

comment:99 in reply to: ↑ 98 ; follow-up: Changed 19 months ago by embray

Replying to nbruin:

Replying to embray:

I thought of using feget/setenv, but the reason I didn't is simply because ECL does not use them, and rather uses fedisableexcept and feenableexcept. See: https://gitlab.com/embeddable-common-lisp/ecl/blob/2a9084b105caf26c89bb16ed4dd0bf5aa7ceab59/src/c/unixint.d#L1234

OK, that makes sense. There is one point of concern: as far as I can see, these routines change which exceptions raise a signal. It doesn't change which flags are set. Can it be that sage or ecl get exception flag states they are not expecting because the state arose under a signal regime they are not used to? (getenv/setenv would take care of that because they'd swap the *entire* FP state)

I'm afraid I don't exactly follow you here. Maybe you'd have to describe a specific case. However, the answer is probably "Yes, that's possible, but I don't currently have a case where that's occurring so I'm not going to worry about it until I do."

Also: can we do something so that the patchbot can test this ticket? It looks like it can't find the ecl tarball at the moment.

I don't know. The patchbot has some code for divining a link to the upstream tarball from the ticket description and downloading it, so I don't know why it's not working in this case.

comment:100 in reply to: ↑ 99 Changed 19 months ago by nbruin

Replying to embray:

I'm afraid I don't exactly follow you here. Maybe you'd have to describe a specific case.

I'd expect that signalling-based code would clear exception flags as it goes along, expecting a clear set of exception flags as normal state.

With NaN-producing code, the exception flags would usually pile up. Entering signalling-based code could therefore easily be confused if you feed it a state attained by NaN-producing code (it would be hard to tell what exception triggered the signal if all the flags are on ...)

It might be that ECL functions reasonably well with that kind of state poorly defined ... It seems it can turn on/off signals by itself already.

comment:101 Changed 19 months ago by embray

I'm sorry but I'm still not following. What do you mean by "entering signalling-based code"? Do you have some specific suggestions or are you just speculating at this point? I would like to move on here, that's all...

comment:102 Changed 19 months ago by nbruin

OK, I don't know enough about IEEE floats. Someone else with more expertise should take a look at this.

The fact that only part of the FP state gets swapped (which exceptions get trapped) and another part (exception state flags) does not, looks suspicious.

I would normally think that a little more rigorous argument than "it doesn't seem to cause problems right now" would be warranted before implementing such a shortcut in an interface. Someone with more expertise in scientific computing software using IEEE floats can probably make a better assessment on whether this is something to worry about.

comment:103 follow-up: Changed 19 months ago by embray

It's not a question about IEEE floats at all per se, just the API.

And this is literally just doing the converse of what ECL does. I'm not worried about a "more rigorous argument" for a problem I don't have. Maybe that's just the pragmatist in me but I really don't have more time to spend on this unless someone can demonstrate an actual problem.

I do remember reading somewhere that fedisableexcept also clears any exception flags, but I forget where, so I might need to double-check that. If it helps we could also add an explicit feclearexcept() but that's as far as I'm willing to go right now. I don't think there's any point in otherwise delaying this further after...looks...2 years.

comment:104 Changed 19 months ago by dimpase

  • Reviewers changed from Dima Pasechnik to Dima Pasechnik, Nils Bruin
  • Status changed from needs_review to positive_review

with numerics moving to GPUs and TPUs, it's more and more a purely academic matter of sticking, or not, to IEEE floats standard. Let's go ahead with this, and hope that a new ECL release happens soon...

comment:105 in reply to: ↑ 103 Changed 19 months ago by embray

Replying to embray:

I do remember reading somewhere that fedisableexcept also clears any exception flags, but I forget where, so I might need to double-check that. If it helps we could also add an explicit feclearexcept() but that's as far as I'm willing to go right now.

I did go ahead and check on this since now I'm thinking I must have imagined or misremembered that. Modifying the exception masks does not cause any pending exceptions to be cleared, so theoretically this could result in some odd bugs I guess.

Nils has a point that it would be better to save/restore the entire FPU environment but my point still stands that for the present purpose it's mostly academic. I'm really only concerned about saving/reverting exactly what ECL is (potentially) doing and nothing more. Clearing any pending exceptions first does make sense though so I'll add that in real quick.

comment:106 Changed 19 months ago by git

  • Commit changed from ee5a10b5781f825b4da678f54c8b5589d152e438 to 85fcb48922383cd1c2ac36412ed761a507cb9537
  • Status changed from positive_review to needs_review

Branch pushed to git repo; I updated commit sha1 and set ticket back to needs_review. New commits:

85fcb48add feclearexcept to ecl_sig_on/off to clear any pending floating point status flags before modifying the traps

comment:107 Changed 19 months ago by embray

  • Status changed from needs_review to positive_review

I tested the changes myself and it's not a very significant difference over what was already reviewed, so setting back to positive_review.

comment:108 Changed 19 months ago by dimpase

OK, I ran tests too, just to make sure, it all looks good.

comment:109 Changed 19 months ago by embray

  • Milestone changed from sage-8.6 to sage-8.7

Well, clearly it was too late for this, though I believe this should have been part of 8.6. Changing milestone since it wasn't (and it was more a nice-to-have I think than critical).

If downstream wants to use sagemath with this or a newer ECL they can simply use the patch from this branch.

comment:110 follow-ups: Changed 19 months ago by vbraun

  • Status changed from positive_review to needs_work

Doesn't work on OSX, maxima then fails to build with

for l in ecl; do for d in / /numerical /numerical/slatec; do .././install-sh -c -d binary-$l$d; done; done
ecl -norc \
           -eval '(progn  (load "../lisp-utils/defsystem.lisp") (funcall (intern (symbol-name :operate-on-system) :mk) "maxima" :compile :verbose t) (build-maxima-lib))' \
           -eval '(ext:quit)'
dyld: Library not loaded: @libdir@/libecl.16.1.dylib
  Referenced from: /Users/buildslave-sage/slave/sage_git/build/local/bin/ecl
  Reason: image not found
make[5]: *** [binary-ecl/maxima] Abort trap: 6
make[5]: Target `all' not remade because of errors.

comment:111 in reply to: ↑ 110 Changed 19 months ago by dimpase

Replying to vbraun:

Doesn't work on OSX, maxima then fails to build with

Does building ECL actually work on OSX (what version?) ? It looks as if ECL is broken in some way then.

comment:112 in reply to: ↑ 110 Changed 19 months ago by gh-spaghettisalat

Replying to vbraun:

Doesn't work on OSX, maxima then fails to build with

for l in ecl; do for d in / /numerical /numerical/slatec; do .././install-sh -c -d binary-$l$d; done; done
ecl -norc \
           -eval '(progn  (load "../lisp-utils/defsystem.lisp") (funcall (intern (symbol-name :operate-on-system) :mk) "maxima" :compile :verbose t) (build-maxima-lib))' \
           -eval '(ext:quit)'
dyld: Library not loaded: @libdir@/libecl.16.1.dylib
  Referenced from: /Users/buildslave-sage/slave/sage_git/build/local/bin/ecl
  Reason: image not found
make[5]: *** [binary-ecl/maxima] Abort trap: 6
make[5]: Target `all' not remade because of errors.

This is probably same as https://gitlab.com/embeddable-common-lisp/ecl/issues/398, fixed by https://gitlab.com/embeddable-common-lisp/ecl/commit/612eeb5ed1623c4c7cb71029aab39107caf1cdba

comment:113 Changed 18 months ago by fbissey

Looks like totally it.

comment:114 Changed 18 months ago by git

  • Commit changed from 85fcb48922383cd1c2ac36412ed761a507cb9537 to 61d68b4a3872e3ae05e3a42f6aa6d960b7847be9

Branch pushed to git repo; I updated commit sha1. New commits:

28def8dMerge branch 'develop' into ticket-22191
61d68b4Add upstream patch for OS X build

comment:115 Changed 18 months ago by fbissey

  • Status changed from needs_work to needs_review

Added upstream patch. It applies without problem and doesn't break things on linux. But it has to be tested on OS X.

comment:116 Changed 18 months ago by fbissey

I wonder if #27225 will fix the darwin build bot for this ticket once it is in.

comment:117 Changed 18 months ago by git

  • Commit changed from 61d68b4a3872e3ae05e3a42f6aa6d960b7847be9 to 4b4255e39af6d1baf19ba0af9d11e924501946eb

Branch pushed to git repo; I updated commit sha1. New commits:

4b4255eMerge branch 'develop' into ticket-22191

comment:118 Changed 18 months ago by fbissey

Just rebasing on top of 8.5.beta5. Hopefully the patchbot will stop choking on python-3.7.2 and we can see if things work for real.

comment:119 Changed 18 months ago by fbissey

  • Reviewers changed from Dima Pasechnik, Nils Bruin to Dima Pasechnik, Nils Bruin, François Bissey
  • Status changed from needs_review to positive_review

Patchbot still breaking on something unrelated to this ticket. Putting to positive review to have some real testing.

comment:120 Changed 18 months ago by vbraun

  • Status changed from positive_review to needs_work

How is this unrelated (see patchbot)?

[sagelib-8.7.beta5] build/cythonized/sage/libs/ecl.c:4814:36: error: 'ecl_sigfpe_handler' undeclared (first use in this function); did you mean 'ecl_sigbus_handler'?

comment:121 Changed 17 months ago by embray

  • Milestone changed from sage-8.7 to sage-8.8

Ticket retargeted after milestone closed (if you don't believe this ticket is appropriate for the Sage 8.8 release please retarget manually)

comment:122 Changed 14 months ago by embray

  • Milestone changed from sage-8.8 to sage-8.9

Tickets still needing working or clarification should be moved to the next release milestone at the soonest (please feel free to revert if you think the ticket is close to being resolved).

comment:123 Changed 14 months ago by dimpase

Is it possible to revive the current branch? It used to work at some point.

comment:124 Changed 7 months ago by embray

  • Milestone changed from sage-8.9 to sage-9.1

Ticket retargeted after milestone closed

comment:125 Changed 6 months ago by mkoeppe

on macOS (merged with 9.1.beta3):

ImportError: dlopen(/Users/mkoeppe/s/sage/sage-rebasing/worktree-algebraic-2018-spring/local/lib/python3.7/site-packages/sage/libs/ecl.cpython-37m-darwin.so, 2): Symbol not found: _fedisableexcept
  Referenced from: /Users/mkoeppe/s/sage/sage-rebasing/worktree-algebraic-2018-spring/local/lib/python3.7/site-packages/sage/libs/ecl.cpython-37m-darwin.so
  Expected in: flat namespace
 in /Users/mkoeppe/s/sage/sage-rebasing/worktree-algebraic-2018-spring/local/lib/python3.7/site-packages/sage/libs/ecl.cpython-37m-darwin.so

comment:126 Changed 4 months ago by mkoeppe

  • Milestone changed from sage-9.1 to sage-9.2

comment:127 Changed 3 months ago by dimpase

  • Description modified (diff)
  • Summary changed from update ECL to 16.1.3 to update ECL to 16.1.3 or - rather - 20.4.24
Note: See TracTickets for help on using tickets.