Opened 3 years ago

Last modified 30 hours ago

#22191 positive_review enhancement

update ECL to 6.1.3 — at Version 33

Reported by: dimpase Owned by:
Priority: critical Milestone: sage-9.2
Component: packages: standard Keywords: upgrade, ecl
Cc: rws, fbissey, jdemeyer, charpent, jpflori, arojas, gh-timokau, saraedum, infinity0, slelievre, thansen, embray, mjo Merged in:
Authors: Dima Pasechnik Reviewers:
Report Upstream: Fixed upstream, but not in a stable release. Work issues: sort out SIGFPE
Branch: u/nbruin/ecl16.1.3 (Commits) Commit: 35e9d3d1610bade7ae7d5fac735244be25e009eb
Dependencies: Stopgaps:

Description (last modified by jdemeyer)

ECL 6.1.3 has been released.

We should update, as it will resolve some Maxima bugs, and is essential for resolving other Maxima bugs.

Tarball: https://common-lisp.net/project/ecl/static/files/release/ecl-16.1.3.tgz

Change History (34)

comment:1 Changed 3 years ago by dimpase

  • Branch set to u/dimpase/ecl16.1.3
  • Commit set to 33e59cb6e6d5b6b16f0b23eacc566d613ffccd07
  • Description modified (diff)

New commits:

3ed32fbUpgrade Jupyter packages, add enum34
72993f2Merge branch 'u/jdemeyer/upgrade_jupyter_packages' of trac.sagemath.org:sage into develop
2a291abMerge branch 'develop' of trac.sagemath.org:sage into develop
bd5a04fUpdate source to 2.11.0, fix checksum.
8356446Merge branch 'u/charpent/upgrade_git_to_2_10_2__or_2_11___' of trac.sagemath.org:sage into newgit
33e59cbupdate ECL to 16.1.3 - most of our patches are obsolete

comment:2 Changed 3 years ago by dimpase

oops, somewhat wrong branch... testing anyway

comment:3 Changed 3 years ago by git

  • Commit changed from 33e59cb6e6d5b6b16f0b23eacc566d613ffccd07 to eff60815383ca3b3120a7a26e98e1712072c70ac

Branch pushed to git repo; I updated commit sha1. This was a forced push. New commits:

eff6081update ECL to 16.1.3 - most of our patches are obsolete

comment:4 Changed 3 years ago by dimpase

  • Authors set to Dima Pasechnik
  • Status changed from new to needs_review

comment:5 Changed 3 years ago by fbissey

  • Cc fbissey added

comment:6 Changed 3 years ago by dimpase

The FPE problem on #18920 in comment 99 might be due to ECL 16.1.3 throwing an FPE exception at

> (/ 1.0 0.0)

Condition of type: DIVISION-BY-ZERO
#<a DIVISION-BY-ZERO>

while 16.1.2 happily returns

#.ext::single-float-positive-infinity

comment:7 Changed 3 years ago by dimpase

with 6.1.3 there is apparently new --no-trap-fpe which emulates the old behaviour. So we somehow have to figure out where to put this option.

comment:8 Changed 3 years ago by fbissey

If it is an option for ecl then somewhere in sage/libs/ecl.pyx is the place, probably in ecl_init but I have to dig a bit.

comment:9 follow-up: Changed 3 years ago by fbissey

There is already an option ECL_OPT_TRAP_SIGFPE is it equivalent?

Changed 3 years ago by dimpase

crash gdb log

comment:10 in reply to: ↑ 9 Changed 3 years ago by dimpase

Replying to fbissey:

There is already an option ECL_OPT_TRAP_SIGFPE is it equivalent?

I guess it should be. Perhaps it's something more subtle... I attach the crash dump.

comment:11 Changed 3 years ago by git

  • Commit changed from eff60815383ca3b3120a7a26e98e1712072c70ac to ac8348c462b11a67ee22f77e265afc0ce0572620

Branch pushed to git repo; I updated commit sha1. Last 10 new commits:

62b5825some more easy doctest fixes
3c28022more easy doctest fixes
5d86413Merge branch 'public/t18920' of trac.sagemath.org:sage into max5381
100d9d6workaround for #<a ARITHMETIC-ERROR> (18920#comment:60
79eb82ffix for maxima bug #3236
814e8edreflect patch's dependence on an earlier maxima tree commit
af7e5ffnew scaling of dynatomic polynomial
68da1e4Merge branch 'public/t18920' of trac.sagemath.org:sage into ecl1613
6bae4b1bumped up for 5.39.0. Sage builds (with ecl 16.1.3)
ac8348cdisable SIGFPE traps

comment:12 Changed 3 years ago by git

  • Commit changed from ac8348c462b11a67ee22f77e265afc0ce0572620 to a06f00245bfb652270eeea78248348777d006538

Branch pushed to git repo; I updated commit sha1. This was a forced push. New commits:

3ed32fbUpgrade Jupyter packages, add enum34
72993f2Merge branch 'u/jdemeyer/upgrade_jupyter_packages' of trac.sagemath.org:sage into develop
2a291abMerge branch 'develop' of trac.sagemath.org:sage into develop
6027df9Merge branch 'develop' of trac.sagemath.org:sage into develop
b9ee72dupdate ECL to 16.1.3 - most of our patches are obsolete
a06f002disable SIGFPE traps

comment:13 Changed 3 years ago by git

  • Commit changed from a06f00245bfb652270eeea78248348777d006538 to ad25254d3445265211081b50501af670daf4dade

Branch pushed to git repo; I updated commit sha1. This was a forced push. New commits:

377e0edupdate ECL to 16.1.3 - most of our patches are obsolete
ad25254disable SIGFPE traps

comment:14 Changed 3 years ago by dimpase

  • Status changed from needs_review to needs_work
  • Work issues set to sort out SIGFPE

this is a clean update branch for ECL 16.1.3 - needs work to sort out SIGFPE.

comment:15 follow-up: Changed 3 years ago by nbruin

I'm pretty sure the change in behaviour comes from this check-in on ECL: https://gitlab.com/embeddable-common-lisp/ecl/commit/c2b2941768f39544f45f24e19a30081d316eeb71

Clearly, this code will be setting FPE generation flags in more places than before. Probably ECL doesn't go out of its way to reset FPE flags when ECL is exiting. So probably, after executing ECL code, it is now the case that (often?) the FPE flags are set the way ECL wants; not the way we want. So in addition to restoring the SIG handlers, we should probably sprinkle an appropriate amount of fesetenv calls around the ECL entry and exit wrappers.

Last edited 3 years ago by nbruin (previous) (diff)

comment:16 Changed 3 years ago by dimpase

Upstream tells me that https://gitlab.com/embeddable-common-lisp/ecl/blob/develop/src/lsp/top.lsp#L1456 makes write-error patch unnecessary. So we should remove it too.

Last edited 3 years ago by dimpase (previous) (diff)

comment:17 in reply to: ↑ 15 Changed 3 years ago by dimpase

  • Cc jdemeyer added

Replying to nbruin:

I'm pretty sure the change in behaviour comes from this check-in on ECL: https://gitlab.com/embeddable-common-lisp/ecl/commit/c2b2941768f39544f45f24e19a30081d316eeb71

Clearly, this code will be setting FPE generation flags in more places than before.

but this ECL macro is not about the case at hand, as I believe we do have fenv.h available. Or you mean that Sage does not use fenv.h?

Probably ECL doesn't go out of its way to reset FPE flags when ECL is exiting. So probably, after executing ECL code, it is now the case that (often?) the FPE flags are set the way ECL wants; not the way we want. So in addition to restoring the SIG handlers, we should probably sprinkle an appropriate amount of fesetenv calls around the ECL entry and exit wrappers.

this does not explain why simply setting ECL_OPT_TRAP_SIGFPE to 0 solves the issue for division by 0, but not for FP overflow.

However, I don't understand why doing anything with ECL_OPT_TRAP_SIGFPE has any effect whatsoever (assuming we do not preserve it among other ECL's interrupts such as SIGINT). Shouldn't

    #and put the Sage signal handlers back
    for i in range(1,32):
        sigaction(i, &sage_action[i], NULL)

in ecl.pyx be overriding whatever ECL did with SIGFPE?

comment:18 Changed 3 years ago by dimpase

The experimental branch using ECLs SIGFPE handler is on girt trac: u/dimpase/ecl16.1.3exp So you can have a look --- it also doesn't work.

comment:19 follow-up: Changed 3 years ago by nbruin

I am not very familiar with this kind of coding issues and/or with SIG handling in general.

If we're doing our sig handler setting properly then it shouldn't matter what sig handlers ECL installs when we run non-ECL code, because we switch them out. As far as I can see this is the case. I see no evidence that an ECL sig handler is interfering with our work. The trace-back doesn't mention ECL code at all. Just an uncaught signal.

It seems that the sage action for SIGFPE is to coredump. That's what we're seeing.

There are flags that govern when SIGFPE is raised (rather than propagating a NaN or something similar). The ECL macro I pointed at will be inserted in ECL generated code quite a bit, so I expect that macro to be executed ALL THE TIME. Plus, the patch was made exactly to fix an issue of unraised SIGFPEs (CL apparently prefers the exceptions over NaNs? by default).

We are not taking measures to set fenv to our liking after executing ECL, and one change in the new ECL version is certainly to change fenv more often. So I think it's worth a try to save & restore's sage's own fenv across ECL calls.

It could be that ECL changes its FPE raise mask depending on the value of ECL_OPT_TRAP_SIGFPE. I haven't checked. That would be consistent with observed behaviour.

comment:20 in reply to: ↑ 19 ; follow-ups: Changed 3 years ago by dimpase

Replying to nbruin:

I am not very familiar with this kind of coding issues and/or with SIG handling in general.

If we're doing our sig handler setting properly then it shouldn't matter what sig handlers ECL installs when we run non-ECL code, because we switch them out. As far as I can see this is the case. I see no evidence that an ECL sig handler is interfering with our work. The trace-back doesn't mention ECL code at all. Just an uncaught signal.

But why do we get an uncaught FP overflow signal? That's because booting ECL up destroys Sages' sig handler, even if we switch ECLs SIGFPE handler off. How exactly this is possible is not clear to me.

It seems that the sage action for SIGFPE is to coredump. That's what we're seeing.

no, it is not. Sage's action on FP overflow is to return infinity. Why does ECL continue to mess around with it, even if told not to? In fact, ECL 16.1.3 has a new command line switch, -no-trap-fp, and it works. (i.e. you get ECLs infinity if you try to convert a too big number to float if this switch is on, and you get an interrupt if that switch is off)

There are flags that govern when SIGFPE is raised (rather than propagating a NaN or something similar). The ECL macro I pointed at will be inserted in ECL generated code quite a bit, so I expect that macro to be executed ALL THE TIME. Plus, the patch was made exactly to fix an issue of unraised SIGFPEs (CL apparently prefers the exceptions over NaNs? by default).

We are not taking measures to set fenv to our liking after executing ECL, and one change in the new ECL version is certainly to change fenv more often. So I think it's worth a try to save & restore's sage's own fenv across ECL calls.

It could be that ECL changes its FPE raise mask depending on the value of ECL_OPT_TRAP_SIGFPE. I haven't checked. That would be consistent with observed behaviour.

comment:21 Changed 3 years ago by nbruin

  • Branch changed from u/dimpase/ecl16.1.3 to u/nbruin/ecl16.1.3

comment:22 in reply to: ↑ 20 Changed 3 years ago by nbruin

  • Commit changed from ad25254d3445265211081b50501af670daf4dade to a2cadc04eb91409ad1f88b5f4d6ad80d5dfc3451

Replying to dimpase:

In fact, ECL 16.1.3 has a new command line switch, -no-trap-fp, and it works.

The corresponding ECL command is (si::trap-fpe t nil), so we could just pass that to ECL just after cl_boot. That might be easier/more efficient than passing it in the argv.

This should put ECL in agreement about what the fenv FPE raise flags are supposed to be (and then it also doesn't matter what SIGFPE handler ECL uses, because the signals will not occur, so we don't need to restore ECL's SIGFPE handler for ECL code either.

Last edited 3 years ago by nbruin (previous) (diff)

comment:23 follow-up: Changed 3 years ago by git

  • Commit changed from a2cadc04eb91409ad1f88b5f4d6ad80d5dfc3451 to ea11dd53248462bf2859fb3cbe34bfd47f006682

Branch pushed to git repo; I updated commit sha1. New commits:

ea11dd5Restore "fenv" upon entry and exit of ECL code

comment:24 in reply to: ↑ 23 Changed 3 years ago by nbruin

Replying to git:

Branch pushed to git repo; I updated commit sha1. New commits:

ea11dd5Restore "fenv" upon entry and exit of ECL code

OK, committing the change before pushing is probably helpful.

comment:25 Changed 3 years ago by git

  • Commit changed from ea11dd53248462bf2859fb3cbe34bfd47f006682 to 35e9d3d1610bade7ae7d5fac735244be25e009eb

Branch pushed to git repo; I updated commit sha1. This was a forced push. New commits:

377e0edupdate ECL to 16.1.3 - most of our patches are obsolete
35e9d3dset ECL to not trap floating point exceptions, so that SIGFPE will not occur

comment:26 Changed 3 years ago by nbruin

OK, that should do the trick! I guess ECL_OPT_TRAP_SIGFPE doesn't have much to do with what we need because it's supposed to control what happens with SIGFPE if ECL is catching signals, whereas we're interested in preventing FP exceptions from generating signals at all. the routine si:trap-fpe is exactly the one to set that behaviour, in a way that is maintained in ECL's environment tracking as well.

comment:27 follow-up: Changed 3 years ago by charpent

  • Cc charpent added

Just as a memento for future testing :

With the current ECL Maxima 5.39.0 gives :

(%i3) taylor(integrate(f(x),x),x,x_0,2); 

taylor: unable to expand at a point specified in:
'integrate(f(x),x)
 -- an error. To debug this try: debugmode(true);

whereas :

%i2) taylor(integrate(f(x),x),x,x_0,2);

(%o2) 'at('integrate(f(x),x),x = x_0)+f(x_0)*(x-x_0)
                                     +(('at('diff(f(x),x,1),x = x_0))
                                      *(x-x_0)^2)
                                      /2

is expected since Maxima 5.38.1 at the latest...

comment:28 Changed 3 years ago by nbruin

DISCLAIMER: I just found out that all my testing was on 16.1.2. While I expect that my fix should work, I haven't actually successfully tested this. Other people who do have easy access to 16.1.3 can verify this easily.

comment:29 in reply to: ↑ 20 Changed 3 years ago by dimpase

OK, this appears to work. make ptest with this ECL 16.1.3 branch and Maxima from #18920 passes without any errors.

comment:30 Changed 3 years ago by dimpase

  • Report Upstream changed from N/A to Reported upstream. No feedback yet.

Daniel Kochmański (ECL lead dev) invited me to open an issue #347 on this.

comment:31 in reply to: ↑ 27 Changed 3 years ago by dimpase

Replying to charpent:

Just as a memento for future testing :

With the current ECL Maxima 5.39.0 gives :

(%i3) taylor(integrate(f(x),x),x,x_0,2); 

taylor: unable to expand at a point specified in:
'integrate(f(x),x)
 -- an error. To debug this try: debugmode(true);

this unfortunately stays the same with ECL 16.1.3. This is for Maxima/ECL people to fix, not for us.

comment:32 Changed 3 years ago by jdemeyer

It would be really good to explain on this ticket what the SIGFPE issue is all about.

comment:33 Changed 3 years ago by jdemeyer

  • Description modified (diff)
Note: See TracTickets for help on using tickets.