Opened 4 years ago

Closed 4 years ago

#23448 closed defect (fixed)

configure fails when run with a posix shell supporting $LINENO (e.g. dash 0.5.9 and many others)

Reported by: tornaria Owned by:
Priority: critical Milestone: sage-8.2
Component: build: configure Keywords: sd87
Cc: jdemeyer Merged in:
Authors: Gonzalo Tornaría Reviewers: Dima Pasechnik
Report Upstream: N/A Work issues:
Branch: 89212b4 (Commits, GitHub, GitLab) Commit: 89212b447cda068b0596de93e59ca93a9104ad1e
Dependencies: Stopgaps:

Status badges

Description (last modified by jdemeyer)

When using an unpatched recent version of dash to run configure, this happens:

$ wget http://gondor.apana.org.au/~herbert/dash/files/dash-0.5.9.1.tar.gz
[...]
$ tar xf dash-0.5.9.1.tar.gz
$ cd dash-0.5.9.1
$ ./configure && make
[...]
$ cd ..
$ wget http://files.sagemath.org/devel/sage-8.0.rc2.tar.gz
$ tar xf sage-8.0.rc2.tar.gz
$ cd sage-8.0.rc2
$ ../dash-0.5.9.1/src/dash ./configure
$ ../dash-0.5.9.1/src/dash ./configure
checking for a BSD-compatible install... /usr/bin/install -c
checking whether build environment is sane... yes
checking for a thread-safe mkdir -p... /bin/mkdir -p
checking for gawk... gawk
checking whether make sets $(MAKE)... yes
checking whether make supports nested variables... yes
checking whether to enable maintainer-specific portions of Makefiles... yes
configure: WARNING: you should use --build, --host, --target
configure: WARNING: you should use --build, --host, --target
checking for a BSD-compatible install... /usr/bin/install -c
checking whether build environment is sane... yes
checking for a thread-safe mkdir -p... /bin/mkdir -p
checking for gawk... gawk
checking whether make sets $(MAKE)... yes
checking whether make supports nested variables... yes
checking whether to enable maintainer-specific portions of Makefiles... yes
checking for root user... no
checking build system type... Invalid configuration `x': machine `x' not recognized
configure: error: bash config/config.sub x failed

My /bin/sh happens to be this shell, so just running make is broken in my system.

After much suffering, I found out the code at fault in configure.ac:

#---------------------------------------------------------
# We need to run this configure script with bash
if test -z "$BASH_VERSION$CONFIG_SHELL"
then
    CONFIG_SHELL=bash
    export CONFIG_SHELL
    if $CONFIG_SHELL -c "exit 0"
    then
        exec $CONFIG_SHELL $0 "$@"
[...]

introduced by e03f296685f986f89e1e12974f4c5ead9a9ff440.

I guess the idea is to re-run configure ($0) with the same arguments ($@) but using bash. However, by the time we reach the line exec $CONFIG_SHELL $0 "$@", the positional parameters have been obliterated by previous tests. Indeed, in the example above it happens to run bash ./configure x make which produces the error.

When using debian's dash, it turns out that before reaching this code configure decides to rerun itself and for that reason it sets CONFIG_SHELL=/bin/sh (to avoid a recursion).

Hence, the test above is wrong (IMO it should only test $BASH_VERSION).

Change History (33)

comment:1 Changed 4 years ago by tornaria

After much suffering, I found out the code at fault in configure.ac:

#---------------------------------------------------------
# We need to run this configure script with bash
if test -z "$BASH_VERSION$CONFIG_SHELL"
then
    CONFIG_SHELL=bash
    export CONFIG_SHELL
    if $CONFIG_SHELL -c "exit 0"
    then
        exec $CONFIG_SHELL $0 "$@"
[...]

introduced by e03f296685f986f89e1e12974f4c5ead9a9ff440.

I guess the idea is to re-run configure ($0) with the same arguments ($@) but using bash. However, by the time we reach the line exec $CONFIG_SHELL $0 "$@", the positional parameters have been obliterated by previous tests. Indeed, in the example above it happens to run bash ./configure x make which produces the error.

When using debian's dash, it turns out that before reaching this code configure decides to rerun itself and for that reason it sets CONFIG_SHELL=/bin/sh (to avoid a recursion).

Hence, the test above is wrong (IMO it should only test $BASH_VERSION).

comment:2 Changed 4 years ago by tornaria

  • Branch set to u/tornaria/23448
  • Commit set to 60ac90025ebfb0b4a8617922d7e09346458b103a

New commits:

60ac900(#23448) fix configure when run with dash

comment:3 Changed 4 years ago by git

  • Commit changed from 60ac90025ebfb0b4a8617922d7e09346458b103a to e77704be65648f5915df5f78573eb26fbc22cee8

Branch pushed to git repo; I updated commit sha1. New commits:

e77704b(#23448) fix configure when run with old dash

comment:4 Changed 4 years ago by tornaria

The first patch makes configure work with new versions of dash, but it breaks with old versions of dash (unpatched 0.5.6, and also version 0.5.7-4+b1 from debian jessie.

This happens because test -z "$BASH_VERSION$CONFIG_SHELL" is incorrect, it should instead be test -z "$BASH_VERSION", which is what the second patch does.

comment:5 Changed 4 years ago by tornaria

  • Status changed from new to needs_review

comment:6 Changed 4 years ago by tornaria

  • Cc jdemeyer added

comment:7 Changed 4 years ago by jdemeyer

  • Description modified (diff)
  • Status changed from needs_review to needs_info

I'm a bit lost here. I think I understood the problem, and I understand what your patch does. However, I don't see how that actually fixes the problem.

comment:8 Changed 4 years ago by jdemeyer

General side comment: when fixing a bug, three things have to be asked:

(1) what is the problem?

(2) what does the patch do?

(3) how does (2) actually fix (1)?

In this case, I am missing (3).

comment:9 Changed 4 years ago by tornaria

Thanks, I tried to explain it above, but I will be more detailed:

(1) by the time configure reaches the line exec $CONFIG_SHELL $0 "$@", the positional parameters have been obliterated by previous tests, so what is actually run is exec bash ./configure x make which is bad (configure interprets x as the target machine which fails).

(2) the patch ​60ac900 moves the block of code to before calling other tests.

(3) this fixes (1) for recent dash because now exec $CONFIG_SHELL $0 "$@" uses the correct positional parameters.

comment:10 Changed 4 years ago by tornaria

It remains to explain e77704b.

I'm actually confused as to the intent of doing test -z "$BASH_VERSION$CONFIG_SHELL", maybe this is so that if the user sets CONFIG_SHELL then we leave it unchanged.

But the problem is that for some older versions of dash the variable CONFIG_SHELL is written to if not set, so the test fails (that seems to be the status quo anyway, so nothing that ​60ac900 changes.

Maybe it's better to change ​60ac900 so that the code in question is run even earlier, before CONFIG_SHELL gets a chance to be written to.

comment:11 follow-up: Changed 4 years ago by tornaria

I understand better now.

The first thing configure does is to check whether the current shell is "good enough" to run configure. If not, it will try sh, bash, ksh and sh5 in /bin or /usr/bin.

When we run configure on debian, /bin/sh is rejected, and so CONFIG_SHELL ends up being set to /bin/bash and the whole configure script is run under bash, so everything is fine.

When I run configure on my system, /bin/sh is ok, so configure keeps going with CONFIG_SHELL unset until it reaches the test -z "$BASH_VERSION$CONFIG_SHELL" which is true. As I explained above, this ends up running exec bash $0 "$@" which is wrong unless it is run very early. This is what ​60ac900 fixes.

As to e77704b, it is necessary when the following conditions are met:

  1. I run configure under an old version of dash (or any "not good enough" shell)
  2. /bin/sh is a new version of dash (or any "good enough shell" which is not bash)

For example, I could be doing

$ ../dash-0.5.6/src/dash ./configure

What happens in this case is that configure decides that the current shell (old version of dash) is not good, so it looks for a better one. It decides /bin/sh is good enough so it sets CONFIG_SHELL=/bin/sh and re-execs itself under this shell. Now the test -z "$BASH_VERSION$CONFIG_SHELL" is false, and so configure is not reexecuted with bash and things break later on [1].

The only way to fix this I could find is to change the test to test -z "$BASH_VERSION", which is what e77704b is doing.

The suggestion of moving the test earlier is not possible: if placed before the line

AC_INIT([Sage], SAGE_VERSION, [sage-devel@googlegroups.com])

it is too early and it won't be included in configure.

But after this line it is already too late (CONFIG_SHELL is already set in the cases where it is a problem).

[1] to be explicit, the first problem is the substitution "${!need_to_install}" which is not supported by dash.

comment:12 Changed 4 years ago by tornaria

  • Authors set to tornaria
  • Status changed from needs_info to needs_review

comment:13 Changed 4 years ago by jdemeyer

  • Status changed from needs_review to needs_info

Author name

comment:14 Changed 4 years ago by jdemeyer

Wouldn't it make sense to move the block

if test -z "$BASH_VERSION"

even higher in the file, as early as possible?

comment:15 Changed 4 years ago by tornaria

  • Authors changed from tornaria to Gonzalo Tornaría
  • Branch changed from u/tornaria/23448 to u/tornaria/23448-v2
  • Commit changed from e77704be65648f5915df5f78573eb26fbc22cee8 to 16d2c3a5bc98b45ec3174d2e30e06889705e0fa6
  • Status changed from needs_info to needs_review

I've changed the patch so that this is done immediately after AC_INIT, as per this paragraph in autoconf documentation:

If your configure script does its own option processing, it should inspect ‘$@’ or ‘$*’ immediately after calling AC_INIT, because other Autoconf macros liberally use the set command to process strings, and this has the side effect of updating ‘$@’ and ‘$*’.

https://www.gnu.org/savannah-checkouts/gnu/autoconf/manual/autoconf-2.69/html_node/Initializing-configure.html

I put the new patch as a single commit in a new branch, I'm not sure if that's proper etiquette, please let me know otherwise.


New commits:

16d2c3aTrac #23448: fix configure when run with dash

comment:16 in reply to: ↑ 11 Changed 4 years ago by tornaria

Replying to tornaria:

The first thing configure does is to check whether the current shell is "good enough" to run configure. If not, it will try sh, bash, ksh and sh5 in /bin or /usr/bin.

For the record: apparently what it means to be "good enough" is to support $LINENO. It turns out recent dash added or fixed support for $LINENO.

Version 0, edited 4 years ago by tornaria (next)

comment:17 Changed 4 years ago by git

  • Commit changed from 16d2c3a5bc98b45ec3174d2e30e06889705e0fa6 to de73482e6cd4d6ea667c867a7b6197c8ef29ef8c

Branch pushed to git repo; I updated commit sha1. New commits:

de73482Trac #23448: change the way we run bash

comment:18 Changed 4 years ago by tornaria

I found another issue with the way bash is run.

It turns out that if we set CONFIG_SHELL=bash, this ends up into the #! for config.status. Eventually we get an error when this script is run later.

~The simplest solution is to just run exec bash $0 "$@" without touching CONFIG_SHELL. This way configure is run under bash, and we let it decide which shell to put in the #! for config.status.~

The proper fix is to use command -v bash to get full path.

Please ignore de73482 above, it was replaced with ​89212b4 below.

(This could also affect #23451)

Last edited 4 years ago by tornaria (previous) (diff)

comment:19 Changed 4 years ago by tornaria

  • Branch changed from u/tornaria/23448-v2 to u/tornaria/23448-v3
  • Commit changed from de73482e6cd4d6ea667c867a7b6197c8ef29ef8c to 89212b447cda068b0596de93e59ca93a9104ad1e

Final list of commits for this ticket, on branch u/tornaria/23448-v3:

16d2c3aTrac #23448: fix configure when run with dash
89212b4Trac #23448: use full path for bash in configure
Last edited 4 years ago by tornaria (previous) (diff)

comment:20 in reply to: ↑ description ; follow-ups: Changed 4 years ago by jdemeyer

Replying to tornaria:

My /bin/sh happens to be this shell, so just running make is broken in my system.

Simple proposal: change the Makefile to run bash configure instead of ./configure and error out in configure if the shell is not bash.

I think that is a much simpler solution which would also solve your problem.

Comments?

comment:21 in reply to: ↑ 20 ; follow-up: Changed 4 years ago by tornaria

Replying to jdemeyer:

Simple proposal: change the Makefile to run bash configure instead of ./configure and error out in configure if the shell is not bash.

I think that is a much simpler solution which would also solve your problem.

I wouldn't say the current proposal is not simple, in fact my current patch has this in configure.ac:

# The following needs to be immediately after calling AC_INIT
#------------------------------------------------------------
# We need to run this configure script with bash
if test -z "$BASH_VERSION"
then
    CONFIG_SHELL=`command -v bash`
    export CONFIG_SHELL
    if $CONFIG_SHELL -c "exit 0"
    then
        exec $CONFIG_SHELL $0 "$@"
    else
        AC_MSG_NOTICE([The 'bash' shell is needed to build AC_PACKAGE_NAME])
        AC_MSG_NOTICE([All modern systems will have the 'bash' shell installed somewhere])
        if test -d /opt/OpenSource/bin
        then
           AC_MSG_NOTICE([On HP-UX you may try adding /opt/OpenSource/bin to your path])
        fi  
        if test -d /opt/pware/bin
        then
           AC_MSG_NOTICE([On AIX you may try adding /opt/pware/bin to your path])
        fi  
        AC_MSG_ERROR(['bash' not found])
    fi  
fi

while for your proposal it would be:

#------------------------------------------------------------
# We need to run this configure script with bash
if test -z "$BASH_VERSION"
then
    AC_MSG_NOTICE([The 'bash' shell is needed to build AC_PACKAGE_NAME])
    AC_MSG_NOTICE([All modern systems will have the 'bash' shell installed somewhere])
    if test -d /opt/OpenSource/bin
    then
       AC_MSG_NOTICE([On HP-UX you may try adding /opt/OpenSource/bin to your path])
    fi  
    if test -d /opt/pware/bin
    then
       AC_MSG_NOTICE([On AIX you may try adding /opt/pware/bin to your path])
    fi  
    AC_MSG_ERROR(['bash' not found])
fi

So it's essentially the same except the first one will try to find bash and reexec instead, we avoid tracking all the paths that lead to running configure.

Note that I didn't write the code above, it was already written and the only problem was that it was moved to run too late. With the warning I added (needs to be immediately after calling AC_INIT) it should be safe.

comment:22 in reply to: ↑ 20 Changed 4 years ago by tornaria

Replying to jdemeyer:

I think that is a much simpler solution which would also solve your problem.

Another comment: it just happens that I'm running a new version of dash, while most others (e.g. debian, ubuntu, etc) are running an older version.

This will be a problem for everybody once the main distros pick up a new version of dash, so fixing this now is investing for the future :-)

(same applies to #23451)

comment:23 in reply to: ↑ 21 ; follow-up: Changed 4 years ago by jdemeyer

Replying to tornaria:

So it's essentially the same except the first one will try to find bash and reexec instead

"apples and oranges are essentially the same except they are different" :-)

As this ticket shows, it's precisely the reexec part which is fragile, so I suggest just dropping that.

comment:24 in reply to: ↑ 23 Changed 4 years ago by tornaria

Replying to jdemeyer:

Replying to tornaria:

So it's essentially the same except the first one will try to find bash and reexec instead

"apples and oranges are essentially the same except they are different" :-)

As this ticket shows, it's precisely the reexec part which is fragile, so I suggest just dropping that.

Not quite, the "fragile" part is using the positional parameters "$0" and "$@", but that's very clearly documented in the autoconf manual that the parameters must be used immediately after calling AC_INIT since other macros will change their value.

I would argue that if configure requires bash, then the #! line should have /bin/bash instead, but that may have other problems (maybe bash is in a different location, etc), it's a bit silly to ship a script with "#! /bin/sh" for which the correct way to run it is bash ./configure.

The standard way to instal a random program from source is to run

./configure
make

which won't work in your proposal.

Last edited 4 years ago by tornaria (previous) (diff)

comment:25 Changed 4 years ago by dimpase

  • Milestone changed from sage-8.0 to sage-8.2
  • Reviewers set to Dima Pasechnik
  • Status changed from needs_review to needs_work

I am seeing a similar problem on FreeBSD, where /bin/sh is not bash or dash. However, I have /bin/bash, pointing to /usr/local/bin/bash, and this is my default shell.

My current workaround is to simply edit ./configure to replace every /bin/sh with /bin/bash.

The branch on this ticket does not work for me, as /bin/sh is not the right shell to propagate into the build files by ./configure: I get some silly errors while running make (which then tries to use /bin/sh instead of /bin/bash).

I think we should give up on even trying to run configure with /bin/sh, make bash a Sage pre-req, and use /bin/bash (or whatever the proper shebang invocation would be e.g. /usr/bin/env bash) consistently throughout.

Last edited 4 years ago by dimpase (previous) (diff)

comment:26 Changed 4 years ago by dimpase

  • Summary changed from configure fails when run with dash to configure fails when run with dash or BSD sh

comment:27 Changed 4 years ago by dimpase

Would it be acceptable to simply add to ./bootstrap the sed editing of the generated ./configure to replace all the occurence of /bin/sh with the output of command -v bash ?

comment:28 Changed 4 years ago by tornaria

  • Status changed from needs_work to needs_review
  • Summary changed from configure fails when run with dash or BSD sh to configure fails when run with a posix shell supporting $LINENO (e.g. dash 0.5.9 and many others)

If my solution doesn't fix your bug, then it may be different bug.

Can you please document clearly what fails before/after this patches? I suggest that you include detailed information of the exact shell with version and where to get it / how to build it, as I did in my report.

AFAICT the behaviour with any posix-compatible shell would be to run configure under bash. This was broken before, and it should be fixed after.

Did you remember to run ./bootstrap after switching branches?

I just tried to run configure with the three BSD shells that I have available in my distro:

[*] loksh-6.2_1                        A Linux port of OpenBSD's ksh
[*] mksh-R56b_1                        The MirBSD Korn Shell
[*] oksh-0.5.9_2                       OpenBSD's version of ksh ported to Linux

I can tell you that those three exhibit the same behaviour as dash 0.5.9.1, namely configure fails in the same way before my patch, and works after my patch.

While I was at it, I also tried all the other posix shells I have available in my distro:

[*] heirloom-sh-050706_3                A portable variant of the traditional Unix shell
[*] ksh-2017.12.31_1                    AT&T's Korn shell (ksh93)
[*] yash-2.46_1                         Yet another shell (POSIX-compliant)
[*] zsh-5.4.2_1                         The Z SHell

with the same result (configure is broken before, works after).

Note that I am NOT claiming one can build the whole of sage with those shells, but only that configure can be run successfully. In fact, #23451 reports and fixes two issues down the line.

Also note that this works by switching to bash, so it's still a requirement, but it will pick bash from your PATH so you don't need to put it in /bin/bash (which you can't unless you are root)


The patch I proposed fixes the issue I reported in a reasonable way without breaking anything that already works. It is really simple, just changing the order of some lines to match what is required explicitly by autoconf documentation, and some little details. I explained and documented everything that's going on.

I documented above 8 different shells for which configure is broken and fixed by this.

The fact that there may be OTHER bugs that my patch won't fix is a non sequitur for rejecting it, and I even reported at least one more (#23451).

I spent a fair number of hours back in SD87 to understand and diagnose this and the other bug, fix it and document the reasons for the bug and the fix, and the situation is that 6 months after I still cannot build Sage in void linux.

More important, as I argued above, is that is only matter of time until "more mainstream" distros pick the new version of dash that will break sage building.


Let me summarise the situation.

Before the patch:

  1. sage can be built if /bin/sh is bash or an old version of dash (I guess any posix shell that doesn't support $LINENO)
  2. configure fails if /bin/sh is a posix shell that supports $LINENO

After the patch:

  1. nothing changes for situation (a)
  2. configure works if /bin/sh is a posix shell that supports $LINENO (tested with 8 different shells)
  3. sage can be completely built after some minor additional fixes to a couple of packages (#23451)

Can we please:

  1. limit the scope of this ticket to fixing the configure step of sage when /bin/sh is a posix shell that supports $LINENO
  2. limit the review to checking my claim that
    1. the patch won't break building in systems where it now works
    2. it fixes the configure step under other shells (including dash 0.5.9.1)

Fixing other steps of the build is a non-goal of the current ticket.

comment:29 follow-up: Changed 4 years ago by dimpase

  • Reviewers changed from Dima Pasechnik to Jeroen Demeyer, Dima Pasechnik
  • Status changed from needs_review to positive_review

OK, sorry for noise. It looks good.

comment:30 in reply to: ↑ 29 Changed 4 years ago by tornaria

Replying to dimpase:

OK, sorry for noise. It looks good.

Thanks, let me know if I can help you diagnosing the issue you have with FreeBSD.

comment:31 follow-up: Changed 4 years ago by jdemeyer

  • Reviewers changed from Jeroen Demeyer, Dima Pasechnik to Dima Pasechnik

I never said that I agreed with the current branch.

comment:32 in reply to: ↑ 31 Changed 4 years ago by dimpase

Replying to jdemeyer:

I never said that I agreed with the current branch.

I take "reviewer" in sense of publishing, not as "signed off by".

This patch is an improvement for freebsd and other non-*ash shells too.

comment:33 Changed 4 years ago by vbraun

  • Branch changed from u/tornaria/23448-v3 to 89212b447cda068b0596de93e59ca93a9104ad1e
  • Resolution set to fixed
  • Status changed from positive_review to closed
Note: See TracTickets for help on using tickets.