Opened 23 months ago

Last modified 19 months ago

#24575 needs_work defect

on Arch make+guile is broken

Reported by: vdelecroix
Owned by:
Priority: critical
Milestone: sage-8.2
Component: packages: standard
Keywords:
Cc: embray, vbraun, charpent, defeo, jpflori
Merged in:
Authors: Erik Bray, Vincent Delecroix
Reviewers: Erik Bray, Vincent Delecroix
Report Upstream: Reported upstream. No feedback yet.
Work issues:
Branch: u/vdelecroix/24575 (Commits)
Commit: 0d8a4a954cee8958b24ece2067bf5191dcbcd28e
Dependencies: #24885
Stopgaps:

Description (last modified by gh-dimpase)

Under conditions that are not completely clear, but likely related to mismatched versions of system libraries, a make built with the Guile plugin may fail to build a number of Sage packages. For example, while building Sage 8.2.beta3 on the latest Arch Linux one gets

make: symbol lookup error: /usr/lib/libguile-2.2.so.1: undefined symbol: GC_move_disappearing_link

as libguile looks at the LD_LIBRARY_PATH containing a different version of libgc.so than the one it needs.
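
One way to check this diagnosis is to compare which libgc the dynamic loader would pick for make with and without Sage's library directory on LD_LIBRARY_PATH. The following is only a sketch: the helper name libgc_from_ldd is hypothetical, and it assumes a GNU/Linux system where ldd prints lines of the form "name => path (address)".

```shell
#!/bin/sh
# Hypothetical helper: read `ldd` output on stdin and print the path that
# libgc.so.* resolves to (prints nothing if libgc is not linked at all).
libgc_from_ldd() {
    sed -n 's/^[[:space:]]*libgc\.so[^ ]* => \([^ ]*\) .*/\1/p'
}

# Usage (assumes ldd is available and SAGE_LOCAL is set, e.g. inside sage-sh):
#   ldd "$(command -v make)" | libgc_from_ldd
#   LD_LIBRARY_PATH="$SAGE_LOCAL/lib" ldd "$(command -v make)" | libgc_from_ldd
# If the two commands print different paths, make would load Sage's libgc.
```

If both invocations print the same system path, the mismatch described here is probably not what is breaking make.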

See also this report on sage-devel.

After deactivating the gc package, the compilation went fine.

However, this cannot be reproduced on other Linux systems.

The workaround in the branch consists of setting the environment variable LD_PRELOAD so that make uses the system gc. The workaround has to be applied to the two standard packages R and rpy2.

The three other packages flint, arb and deformation fail to build for the same reason, and we apply a small patch to stop them from redefining LD_LIBRARY_PATH.

Upstream issues

Attachments (7)

flint-2.5.2.p2.log (3.5 KB) - added by vdelecroix 22 months ago.
rpy2-2.8.2.p0.log (19.5 KB) - added by vdelecroix 22 months ago.
flint-2.5.2-check_on_base.log.gz (216.9 KB) - added by vdelecroix 21 months ago.
flint-2.5.2-check_with_DLPATH_ADD_empty.log.gz (59.5 KB) - added by vdelecroix 21 months ago.
flint-2.5.2-check_with_DLPATH_ADD_pwd.log.gz (160.9 KB) - added by vdelecroix 21 months ago.
flint-2.5.2-check_with_sdh_preload.log.gz (101.0 KB) - added by vdelecroix 21 months ago.
flint-2.5.2-check_with_DLPATH_ADD_empty_but_not_on_make_install.log.gz (55.9 KB) - added by vdelecroix 21 months ago.

Change History (142)

comment:1 Changed 23 months ago by vdelecroix

  • Description modified (diff)

comment:2 Changed 23 months ago by vdelecroix

  • Description modified (diff)

comment:3 Changed 23 months ago by charpent

  • Cc charpent added

A (possibly naive) suggestion is to do (almost) what we already do for gcc and related, but simpler:

  • Depend on gc
  • Test for it (and its version) in the main configure file
    • If found (and sufficient): use that (possibly symlinking the relevant header/library files)
    • Else: install "our" version.

Drawback: a newer version might break compatibility. A code review of its use is necessary.

BTW: can one use pkg-config in the main configure file? I think not (alas...).
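
The decision described in this suggestion could be sketched roughly as below. Everything here is an assumption made for illustration: the helper names version_ge and choose_gc are hypothetical, the minimum version is only an example, `sort -V` is taken from GNU coreutils, and the usage note assumes the system gc is registered with pkg-config under the module name bdw-gc (whether pkg-config is usable in the main configure file is exactly the open question above).

```shell
#!/bin/sh
# Hypothetical helper: succeed if version $1 >= version $2.
# Relies on `sort -V` (version sort), available in GNU coreutils.
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Hypothetical helper: print "system" if the detected gc version ($1, empty
# when no gc was found) satisfies the minimum ($2); otherwise print "sage".
choose_gc() {
    if [ -n "$1" ] && version_ge "$1" "$2"; then
        echo system
    else
        echo sage
    fi
}

# Usage sketch (module name and minimum version are assumptions):
#   found="$(pkg-config --modversion bdw-gc 2>/dev/null)"
#   choose_gc "$found" 7.6.2
```

This only covers the version test; the drawback mentioned above (a newer gc breaking compatibility) would still need a code review.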

comment:4 Changed 23 months ago by dimpase

At what point do you get this error? While building R? (guessing the latter from the subject of sage-devel post)

comment:5 follow-up: Changed 23 months ago by dimpase

how come make depends on guile for you? I don't see it.

$ ldd `which gmake`
	linux-vdso.so.1 (0x00007ffe861ce000)
	libdl.so.2 => /lib64/libdl.so.2 (0x00007f0e5b6d3000)
	libc.so.6 => /lib64/libc.so.6 (0x00007f0e5b30f000)
	/lib64/ld-linux-x86-64.so.2 (0x00007f0e5b8d7000)

I've just installed gc-7.6.2 systemwide, and things work for me with Sage, so far.

comment:6 Changed 23 months ago by dimpase

By the way, there is #23700, which would give you the same major gc version as you apparently need (although I fail to see why).

comment:7 follow-up: Changed 23 months ago by dimpase

IMHO, if you can nuke a system utility by installing a library with normal user privileges, then you have a huge security hole. Thus I don't think it's something to fix in Sage.

comment:8 Changed 23 months ago by vbraun

I think the "symbol lookup error" is just being echoed by make; it's not from make failing to find a symbol. This happens while R is compiling the MASS package, and the output is clearly being filtered.

Apparently R sets LD_LIBRARY_PATH while compiling packages so Sage's libraries take precedence over system ones. Which inevitably leads to problems, which is why we removed that from the Sage build system a while ago.

comment:9 Changed 23 months ago by dimpase

No, R is not setting LD_LIBRARY_PATH, it is merely respecting it. I think we have a case of Sage being built with LD_LIBRARY_PATH set to something, and also guile (indeed, it has nothing to do with Sage AFAIK) involved in the environment somehow; and guile (perhaps invoked from .bashrc?) made to use wrong gc version from Sage.

To reproduce this one needs to have libgc.so.X have the same X in $SAGE_LOCAL/lib and in /usr/lib. On my system X=1 in the former and X=2 in the latter, and libguile is linked to libgc.so.2. So I went and made

$ ln -sf libgc.so.1 libgc.so.2

in $SAGE_LOCAL/lib. After this I duly get

$ LD_LIBRARY_PATH=./local/lib guile
guile: symbol lookup error: /usr/lib64/libguile-2.2.so.1: undefined symbol: GC_move_disappearing_link

Needless to say, R still builds just fine for me after this hack.
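
The precondition stated in this comment (the same X in libgc.so.X in both directories) can be checked with a small script. This is a sketch under assumptions: the helper names gc_soname_majors and gc_soname_clash are hypothetical, and it only looks at file names, not at the soname recorded inside the libraries.

```shell
#!/bin/sh
# Hypothetical helper: list the major numbers X of the libgc.so.X entries
# found in directory $1, one per line, without duplicates.
gc_soname_majors() {
    for f in "$1"/libgc.so.*; do
        [ -e "$f" ] || continue      # skip the pattern itself if nothing matched
        base="${f##*/}"              # e.g. libgc.so.1.0.3
        rest="${base#libgc.so.}"     # e.g. 1.0.3
        echo "${rest%%.*}"           # e.g. 1
    done | sort -u
}

# Hypothetical helper: succeed if directories $1 and $2 both contain a
# libgc.so.X with the same X.
gc_soname_clash() {
    a="$(gc_soname_majors "$1")"
    b="$(gc_soname_majors "$2")"
    for x in $a; do
        for y in $b; do
            if [ "$x" = "$y" ]; then
                return 0
            fi
        done
    done
    return 1
}

# Usage (paths are examples): gc_soname_clash "$SAGE_LOCAL/lib" /usr/lib
```

A clash reported by this check means the failure could occur on that machine, not that it will.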

comment:10 Changed 23 months ago by vdelecroix

To be precise, my $LD_LIBRARY_PATH is empty, so it should not have anything to do with this. What about charpent's proposal in comment:3?

comment:11 Changed 23 months ago by dimpase

You have a strange setup on your system, which involves guile in building Sage. Perhaps it is something in your shell configuration, I do not know. Or something is wrong with your linker settings or its cache. Guile is a system library which you can only make fail this way by setting LD_LIBRARY_PATH. But Sage does not do that; something else does.

Last edited 23 months ago by dimpase (previous) (diff)

comment:12 in reply to: ↑ 7 ; follow-up: Changed 23 months ago by embray

Replying to dimpase:

IMHO, if you can nuke a system utility by installing a library with normal user privileges, then you have a huge security hole. Thus I don't think it's something to fix in Sage.

That's not what's going on here so please don't mischaracterize it as a "huge security hole". It's quite normal to have a broken setup where one executable is linking at runtime with the wrong version of some shared library. This is the Linux version of "DLL hell" (albeit less severe).

comment:13 in reply to: ↑ 12 ; follow-up: Changed 23 months ago by dimpase

Replying to embray:

That's not what's going on here so please don't mischaracterize it as a "huge security hole". It's quite normal to have a broken setup where one executable is linking at runtime with the wrong version of some shared library. This is the Linux version of "DLL hell" (albeit less severe).

One needs to set LD_LIBRARY_PATH for this to happen. If on the other hand you succeed in replacing the system library with one at your account *for all the users*, regardless of the environment, then yes, you have hacked the system via a security hole.

Anyhow, there is no Sage bug to fix here; that's what I have been trying to say all along. Unless I see a meaningful explanation of how libguile is relevant to building Sage, I'd tend to set this to wontfix.

comment:14 in reply to: ↑ 13 ; follow-up: Changed 23 months ago by embray

Replying to dimpase:

One needs to set LD_LIBRARY_PATH for this to happen. If on the other hand you succeed in replacing the system library with one at your account *for all the users*, regardless of the environment, then yes, you have hacked the system via a security hole.

I...don't see any evidence that that's happening here.

comment:15 in reply to: ↑ 14 Changed 23 months ago by dimpase

Replying to embray:

I...don't see any evidence that that's happening here.

I have not said I see this, either. What I see is an unexplained attempt to invoke (lib)guile during the Sage build.

comment:16 follow-up: Changed 23 months ago by vbraun

Somebody who can reproduce the original problem should make the R build more verbose and try again...

comment:17 Changed 23 months ago by dimpase

According to Vincent this can happen on his system while building Flint:

I restarted a build from scratch and I don't believe that R is responsible in any way. This new build stopped on flint, pointing at the same library issue

make: symbol lookup error: /usr/lib/libguile-2.2.so.1: undefined symbol: GC_move_disappearing_link

comment:18 in reply to: ↑ 16 Changed 23 months ago by vdelecroix

Replying to vbraun:

Somebody who can reproduce the original problem should make the R build more verbose and try again...

on it

comment:19 Changed 23 months ago by vdelecroix

It failed on flint, but debug mode is not very helpful (run in the flint source directory):

(sage-sh)$ make --debug
GNU Make 4.2.1
Built for x86_64-unknown-linux-gnu
Copyright (C) 1988-2016 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
Reading makefiles...
Updating makefiles....
Updating goal targets....
 File 'all' does not exist.
   File 'library' does not exist.
  Must remake target 'library'.
make: symbol lookup error: /usr/lib/libguile-2.2.so.1: undefined symbol: GC_move_disappearing_link
make: *** [Makefile:173: library] Error 127

comment:20 follow-up: Changed 23 months ago by dimpase

Can you try starting guile at (sage-sh)$ prompt?

comment:21 in reply to: ↑ 20 Changed 23 months ago by vdelecroix

Replying to dimpase:

Can you try starting guile at (sage-sh)$ prompt?

It works fine

(sage-sh) $ guile
GNU Guile 2.2.3
Copyright (C) 1995-2017 Free Software Foundation, Inc.

Guile comes with ABSOLUTELY NO WARRANTY; for details type `,show w'.
This program is free software, and you are welcome to redistribute it
under certain conditions; type `,show c' for details.

Enter `,help' for help.
scheme@(guile-user)> quit()
$1 = #<procedure quit args>
While compiling expression:
Syntax error:
unknown location: unexpected syntax in form ()
scheme@(guile-user)> ()
Last edited 23 months ago by vdelecroix (previous) (diff)

comment:22 Changed 23 months ago by vdelecroix

Still with flint: when ./configure is run without the --with-gmp option, the build succeeds

(sage-sh)$ ./configure --disable-static --prefix="$SAGE_LOCAL"
Configuring...x86_64-Linux
Testing __builtin_popcountl...yes
Testing native popcount...yes
Testing __thread...yes
Testing fenv...yes
FLINT was successfully configured.
(sage-sh) $ make
mkdir -p build
make[1]: Entering directory '/opt/sage-bis/local/var/tmp/sage/build/flint-2.5.2.p1/src'
    CC   build/printf.lo
    CC   build/fprintf.lo
    CC   build/sprintf.lo
    CC   build/scanf.lo
    CC   build/fscanf.lo
    CC   build/sscanf.lo
    CC   build/clz_tab.lo
    CC   build/memory_manager.lo
    CC   build/version.lo
    CC   build/profiler.lo
    CC   build/thread_support.lo
...

But with --with-gmp set, it fails

(sage-sh) $ ./configure --disable-static --prefix="$SAGE_LOCAL" --with-gmp="$SAGE_LOCAL"
Configuring...x86_64-Linux
Testing __builtin_popcountl...yes
Testing native popcount...yes
Testing __thread...yes
Testing fenv...yes
FLINT was successfully configured.
(sage-sh) $ make
make: symbol lookup error: /usr/lib/libguile-2.2.so.1: undefined symbol: GC_move_disappearing_link
make: *** [Makefile:173: library] Error 127

comment:23 in reply to: ↑ 5 ; follow-up: Changed 23 months ago by vdelecroix

Replying to dimpase:

how come make depends on guile for you? I don't see it.

$ ldd `which gmake`
	linux-vdso.so.1 (0x00007ffe861ce000)
	libdl.so.2 => /lib64/libdl.so.2 (0x00007f0e5b6d3000)
	libc.so.6 => /lib64/libc.so.6 (0x00007f0e5b30f000)
	/lib64/ld-linux-x86-64.so.2 (0x00007f0e5b8d7000)

I've just installed gc-7.6.2 systemwide, and things work for me with Sage, so far.

What is gmake? I got

(sage-sh) $ ldd `which make`
        linux-vdso.so.1 (0x00007fff3ccf6000)
        libguile-2.2.so.1 => /usr/lib/libguile-2.2.so.1 (0x00007f88df74d000)
        libdl.so.2 => /usr/lib/libdl.so.2 (0x00007f88df549000)
        libpthread.so.0 => /usr/lib/libpthread.so.0 (0x00007f88df32b000)
        libc.so.6 => /usr/lib/libc.so.6 (0x00007f88def74000)
        libgc.so.1 => /usr/lib/libgc.so.1 (0x00007f88ded0a000)
        libffi.so.6 => /usr/lib/libffi.so.6 (0x00007f88deb01000)
        libunistring.so.2 => /usr/lib/libunistring.so.2 (0x00007f88de77f000)
        libgmp.so.10 => /usr/lib/libgmp.so.10 (0x00007f88de4ec000)
        libltdl.so.7 => /usr/lib/libltdl.so.7 (0x00007f88de2e2000)
        libcrypt.so.1 => /usr/lib/libcrypt.so.1 (0x00007f88de0aa000)
        libm.so.6 => /usr/lib/libm.so.6 (0x00007f88ddd5e000)
        /lib64/ld-linux-x86-64.so.2 => /usr/lib64/ld-linux-x86-64.so.2 (0x00007f88dfa7a000)
        libatomic_ops.so.1 => /usr/lib/libatomic_ops.so.1 (0x00007f88ddb5c000)

comment:24 in reply to: ↑ 23 ; follow-ups: Changed 23 months ago by dimpase

Replying to vdelecroix:

What is gmake?

For me make is a link to gmake, but that's not important. What's important is that your make is linked with libguile (and a slew of its dependencies, including libgc), and this is unusual (I had never heard of it, although it is not crazy; see https://www.gnu.org/software/make/manual/html_node/Guile-Integration.html)

I got

(sage-sh) $ ldd `which make`
        linux-vdso.so.1 (0x00007fff3ccf6000)
        libguile-2.2.so.1 => /usr/lib/libguile-2.2.so.1 (0x00007f88df74d000)
        libdl.so.2 => /usr/lib/libdl.so.2 (0x00007f88df549000)
        libpthread.so.0 => /usr/lib/libpthread.so.0 (0x00007f88df32b000)
        libc.so.6 => /usr/lib/libc.so.6 (0x00007f88def74000)
        libgc.so.1 => /usr/lib/libgc.so.1 (0x00007f88ded0a000)
        libffi.so.6 => /usr/lib/libffi.so.6 (0x00007f88deb01000)
        libunistring.so.2 => /usr/lib/libunistring.so.2 (0x00007f88de77f000)
        libgmp.so.10 => /usr/lib/libgmp.so.10 (0x00007f88de4ec000)
        libltdl.so.7 => /usr/lib/libltdl.so.7 (0x00007f88de2e2000)
        libcrypt.so.1 => /usr/lib/libcrypt.so.1 (0x00007f88de0aa000)
        libm.so.6 => /usr/lib/libm.so.6 (0x00007f88ddd5e000)
        /lib64/ld-linux-x86-64.so.2 => /usr/lib64/ld-linux-x86-64.so.2 (0x00007f88dfa7a000)
        libatomic_ops.so.1 => /usr/lib/libatomic_ops.so.1 (0x00007f88ddb5c000)

So we see that at this point make appears to be correctly linked.

What do you see if in that directory (at sage-sh prompt) you run make -v rather than make? (More precisely, I'd like to understand whether it's the generated Flint's Makefile that breaks it, or it's just make itself)

And what does ldd /usr/lib/libguile-2.2.so.1 show?

comment:25 in reply to: ↑ 24 Changed 23 months ago by vdelecroix

Replying to dimpase:

What do you see if in that directory (at sage-sh prompt) you run make -v rather than make? (More precisely, I'd like to understand whether it's the generated Flint's Makefile that breaks it, or it's just make itself)

(sage-sh) $ make -v
GNU Make 4.2.1
Built for x86_64-unknown-linux-gnu
Copyright (C) 1988-2016 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.

Please also read comment:22: make does not look broken when I do not configure with gmp.

And what does ldd /usr/lib/libguile-2.2.so.1 show?

(sage-sh) $ ldd /usr/lib/libguile-2.2.so.1
        linux-vdso.so.1 (0x00007fff17387000)
        libgc.so.1 => /usr/lib/libgc.so.1 (0x00007fed43ae3000)
        libffi.so.6 => /usr/lib/libffi.so.6 (0x00007fed438da000)
        libunistring.so.2 => /usr/lib/libunistring.so.2 (0x00007fed43558000)
        libgmp.so.10 => /usr/lib/libgmp.so.10 (0x00007fed432c5000)
        libltdl.so.7 => /usr/lib/libltdl.so.7 (0x00007fed430bb000)
        libcrypt.so.1 => /usr/lib/libcrypt.so.1 (0x00007fed42e83000)
        libm.so.6 => /usr/lib/libm.so.6 (0x00007fed42b37000)
        libpthread.so.0 => /usr/lib/libpthread.so.0 (0x00007fed42919000)
        libc.so.6 => /usr/lib/libc.so.6 (0x00007fed42562000)
        /usr/lib64/ld-linux-x86-64.so.2 (0x00007fed4407a000)
        libdl.so.2 => /usr/lib/libdl.so.2 (0x00007fed4235e000)
        libatomic_ops.so.1 => /usr/lib/libatomic_ops.so.1 (0x00007fed4215c000)

comment:26 in reply to: ↑ 24 Changed 23 months ago by fbissey

Replying to dimpase:

for me make is a link to gmake, but it's not important. What's important is that your make is linked with libguile (and a slew of its dependencies, including libgc), and this is not usual (I never heard of it---

Since GNU make version 4 you can extend make with guile bindings. Building make with this extension is a configuration option, and whether to enable it is usually decided by the distro. Binary distros usually enable all possible options unless they have "reservations". On Gentoo it is an option that is off by default.
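
To find out whether a given make was built with the guile extension, one can inspect GNU make's built-in .FEATURES variable, which lists "guile" when the extension is compiled in. The snippet below is a sketch; the helper name has_guile_feature is hypothetical, and the usage note assumes GNU make 4.0 or later.

```shell
#!/bin/sh
# Hypothetical helper: succeed if the word "guile" appears in the
# space-separated feature list passed as $1.
has_guile_feature() {
    case " $1 " in
        *" guile "*) return 0 ;;
        *) return 1 ;;
    esac
}

# Usage (assumes GNU make >= 4.0; the makefile is read from stdin via -f -):
#   features="$(printf 'all: ; @echo $(.FEATURES)\n' | make -s -f -)"
#   if has_guile_feature "$features"; then echo "make has guile support"; fi
```

Running `ldd` on the make binary, as done elsewhere in this ticket, gives the same information from the linking side.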

comment:27 Changed 23 months ago by dimpase

I've built make from source with --with-guile against guile version 2.2 (which required changing one character in line 171 of configure.ac,

[ PKG_CHECK_MODULES([GUILE], [guile-2.2], [have_guile=yes],

(2.2 instead of 2.0 - this probably explains why I was unable to build it the gentoo way?), getting

$ ldd `which make`
	linux-vdso.so.1 (0x00007ffc6d3c3000)
	libguile-2.2.so.1 => /usr/lib64/libguile-2.2.so.1 (0x00007f932105f000)
	libgc.so.2 => /usr/lib64/libgc.so.2 (0x00007f9320de6000)
	libdl.so.2 => /lib64/libdl.so.2 (0x00007f9320be2000)
	libpthread.so.0 => /lib64/libpthread.so.0 (0x00007f93209c2000)
	libc.so.6 => /lib64/libc.so.6 (0x00007f93205fe000)
	libffi.so.6 => /usr/lib64/libffi.so.6 (0x00007f93203f5000)
	libunistring.so.2 => /usr/lib64/libunistring.so.2 (0x00007f932007c000)
	libgmp.so.10 => /usr/lib64/libgmp.so.10 (0x00007f931fdf3000)
	libltdl.so.7 => /usr/lib64/libltdl.so.7 (0x00007f931fbe9000)
	libcrypt.so.1 => /lib64/libcrypt.so.1 (0x00007f931f9b1000)
	libm.so.6 => /lib64/libm.so.6 (0x00007f931f66f000)
	/lib64/ld-linux-x86-64.so.2 (0x00007f93213b0000)

but I cannot reproduce this. It might be the version difference, but the resulting make happily builds Flint even if I do export LD_LIBRARY_PATH=$SAGE_LOCAL/lib; make.

Needless to say, this export breaks guile:

$ guile
guile: symbol lookup error: /usr/lib64/libguile-2.2.so.1: undefined symbol: GC_move_disappearing_link

Or it might be that the Makefile generated by Flint does not trigger the guile extension in my case, and does trigger it in Vincent's case?

comment:28 follow-up: Changed 23 months ago by fbissey

The trigger is just execution. If the symbol is not resolved you get this. There are a couple of things to remember:

  • for the problem to happen, the soname of the libgc in sage and on the system needs to be the same
  • while the sonames are the same, the libgc in sage doesn't have the same symbols as the one on the system

So either libgc shouldn't have the same soname (upstream not bumping the number properly) or libgc is not configured with the same features in sage and on the system.
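
Whether two builds of libgc with the same soname really export different symbols can be checked with nm. The following is only a sketch: the helper name defines_symbol is hypothetical, it assumes binutils nm, and the paths in the usage note are examples.

```shell
#!/bin/sh
# Hypothetical helper: read `nm -D --defined-only` output on stdin and
# succeed if symbol $1 is listed (compared against the last field of a line).
defines_symbol() {
    awk -v sym="$1" '
        { name = $NF; sub(/@.*/, "", name) }   # drop any @VERSION suffix
        name == sym { found = 1 }
        END { exit !found }
    '
}

# Usage (assumes binutils nm; paths are examples):
#   nm -D --defined-only /usr/lib/libgc.so.1 | defines_symbol GC_move_disappearing_link
#   nm -D --defined-only "$SAGE_LOCAL/lib/libgc.so.1" | defines_symbol GC_move_disappearing_link
```

If the first check succeeds and the second fails, the two libraries share a soname but not an interface, which matches the situation described above.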

comment:29 in reply to: ↑ 28 Changed 23 months ago by dimpase

Replying to fbissey:

The trigger is just execution. If the symbol is not resolved you get this. There are a couple of things to remember:

  • for the problem to happen, the soname of the libgc in sage and on the system needs to be the same
  • while the sonames are the same, the libgc in sage doesn't have the same symbols as the one on the system

So either libgc shouldn't have the same soname (upstream not bumping the number properly) or libgc is not configured with the same features in sage and on the system.

This does crash guile:

$ ldd `which guile`
	linux-vdso.so.1 (0x00007ffd5e36a000)
	libguile-2.2.so.1 => /usr/lib64/libguile-2.2.so.1 (0x00007f4c66c70000)
	libpthread.so.0 => /lib64/libpthread.so.0 (0x00007f4c66a50000)
	libc.so.6 => /lib64/libc.so.6 (0x00007f4c6668c000)
	libgc.so.2 => /usr/lib64/libgc.so.2 (0x00007f4c66413000)
	libffi.so.6 => /usr/lib64/libffi.so.6 (0x00007f4c6620a000)
	libunistring.so.2 => /usr/lib64/libunistring.so.2 (0x00007f4c65e91000)
	libgmp.so.10 => /usr/lib64/libgmp.so.10 (0x00007f4c65c08000)
	libltdl.so.7 => /usr/lib64/libltdl.so.7 (0x00007f4c659fe000)
	libdl.so.2 => /lib64/libdl.so.2 (0x00007f4c657fa000)
	libcrypt.so.1 => /lib64/libcrypt.so.1 (0x00007f4c655c2000)
	libm.so.6 => /lib64/libm.so.6 (0x00007f4c65280000)
	/lib64/ld-linux-x86-64.so.2 (0x00007f4c66fc1000)
(sage-sh) dima@hilbert:sage-dev$ LD_LIBRARY_PATH=$SAGE_LOCAL/lib guile
guile: symbol lookup error: /usr/lib64/libguile-2.2.so.1: undefined symbol: GC_move_disappearing_link

as I created a link to the wrong libgc (see comment 9):

$ ls -l $SAGE_LOCAL/lib/libgc*
-rw-r--r-- 1 dima dima 946784 Dec 30 09:59 /home/dima/Sage/sage-dev/local/lib/libgc.a
lrwxrwxrwx 1 dima dima     14 Dec 30 09:59 /home/dima/Sage/sage-dev/local/lib/libgc.so -> libgc.so.1.0.3
lrwxrwxrwx 1 dima dima     14 Dec 30 09:59 /home/dima/Sage/sage-dev/local/lib/libgc.so.1 -> libgc.so.1.0.3
-rwxr-xr-x 1 dima dima 702568 Dec 30 09:59 /home/dima/Sage/sage-dev/local/lib/libgc.so.1.0.3
lrwxrwxrwx 1 dima dima     10 Jan 20 22:54 /home/dima/Sage/sage-dev/local/lib/libgc.so.2 -> libgc.so.1

(libgc.so.1 is the wrong one, namely Sage's gc 7.2.) But make with guile works just fine:

$ LD_LIBRARY_PATH=$SAGE_LOCAL/lib /home/dima/bin/make -v
GNU Make 4.2.1
Built for x86_64-pc-linux-gnu
Copyright (C) 1988-2016 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.

even though it is linked to the same libguile:

$ ldd /home/dima/bin/make 
	linux-vdso.so.1 (0x00007ffd2959e000)
	libguile-2.2.so.1 => /usr/lib64/libguile-2.2.so.1 (0x00007fb4e1418000)
	libgc.so.2 => /usr/lib64/libgc.so.2 (0x00007fb4e119f000)
	libdl.so.2 => /lib64/libdl.so.2 (0x00007fb4e0f9b000)
	libpthread.so.0 => /lib64/libpthread.so.0 (0x00007fb4e0d7b000)
	libc.so.6 => /lib64/libc.so.6 (0x00007fb4e09b7000)
	libffi.so.6 => /usr/lib64/libffi.so.6 (0x00007fb4e07ae000)
	libunistring.so.2 => /usr/lib64/libunistring.so.2 (0x00007fb4e0435000)
	libgmp.so.10 => /usr/lib64/libgmp.so.10 (0x00007fb4e01ac000)
	libltdl.so.7 => /usr/lib64/libltdl.so.7 (0x00007fb4dffa2000)
	libcrypt.so.1 => /lib64/libcrypt.so.1 (0x00007fb4dfd6a000)
	libm.so.6 => /lib64/libm.so.6 (0x00007fb4dfa28000)
	/lib64/ld-linux-x86-64.so.2 (0x00007fb4e1769000)

just as a sanity check:

$ LD_LIBRARY_PATH=$SAGE_LOCAL/lib ldd /home/dima/bin/make 
	linux-vdso.so.1 (0x00007ffccbbfe000)
	libguile-2.2.so.1 => /usr/lib64/libguile-2.2.so.1 (0x00007ff033858000)
	libgc.so.2 => /home/dima/Sage/sage-dev/local/lib/libgc.so.2 (0x00007ff0334f9000)
	libdl.so.2 => /lib64/libdl.so.2 (0x00007ff0332f5000)
	libpthread.so.0 => /lib64/libpthread.so.0 (0x00007ff0330d5000)
	libc.so.6 => /lib64/libc.so.6 (0x00007ff032d11000)
	libffi.so.6 => /usr/lib64/libffi.so.6 (0x00007ff032b08000)
	libunistring.so.2 => /usr/lib64/libunistring.so.2 (0x00007ff03278f000)
	libgmp.so.10 => /usr/lib64/libgmp.so.10 (0x00007ff032506000)
	libltdl.so.7 => /usr/lib64/libltdl.so.7 (0x00007ff0322fc000)
	libcrypt.so.1 => /lib64/libcrypt.so.1 (0x00007ff0320c4000)
	libm.so.6 => /lib64/libm.so.6 (0x00007ff031d82000)
	/lib64/ld-linux-x86-64.so.2 (0x00007ff033ba9000)

So even though the linker ought to resolve libgc.so.2 to Sage's copy, as ldd shows, make does not fail (I can also run an actual build with this setup, not just -v).

comment:30 follow-up: Changed 22 months ago by defeo

  • Cc defeo added

Just wanted to confirm I'm experiencing the same problem on Arch. I have no more insights than you guys.

Has Antonio Rojas popped up in the discussion yet? He might have already seen this error while packaging for Arch.

comment:31 Changed 22 months ago by dimpase

One trivial way out is to upgrade our gc, see #23700

comment:32 in reply to: ↑ 30 Changed 22 months ago by dimpase

Replying to defeo:

Just wanted to confirm I'm experiencing the same problem on Arch. I have no more insights than you guys.

Has Antonio Rojas popped up in the discussion yet? He might have already seen this error while packaging for Arch.

I think that Arch guys forgot to bump up the version of libgc, for it is still libgc.so.1 (while on gentoo the same libgc is named libgc.so.2) cf comments 25 and 27 above.

Note that Arch most probably uses system libgc in its build of Sage, as it's not listed here https://www.archlinux.org/packages/community/x86_64/sagemath/

Last edited 22 months ago by dimpase (previous) (diff)

comment:33 Changed 22 months ago by dimpase

Could anyone who can reproduce this check whether #23000 fixes the problem?

comment:34 follow-up: Changed 22 months ago by dimpase

oops, typo, it should be "Could anyone who can reproduce this check whether #23700 fixes the problem?"

comment:35 Changed 22 months ago by dimpase

  • Report Upstream changed from N/A to Reported upstream. No feedback yet.

I've asked on bug-make@gnu.org whether this is a GNU make bug.

Last edited 22 months ago by dimpase (previous) (diff)

comment:36 follow-ups: Changed 22 months ago by dimpase

  • Report Upstream changed from Reported upstream. No feedback yet. to Reported upstream. Developers deny it's a bug.

Well, I am not convinced, and am still waiting for an answer to this.

comment:37 in reply to: ↑ 36 ; follow-up: Changed 22 months ago by embray

Replying to dimpase:

Well, I am not convinced, and am still waiting for an answer to this.

Your comment about maybe statically linking libguile makes some sense, but then you'd have to also statically link any of its dependencies as well, including libgc or else it wouldn't solve the problem.

comment:38 in reply to: ↑ 37 Changed 22 months ago by dimpase

Replying to embray:

Replying to dimpase:

Well, I am not convinced, and am still waiting for an answer to this.

Your comment about maybe statically linking libguile makes some sense, but then you'd have to also statically link any of its dependencies as well, including libgc or else it wouldn't solve the problem.

They could also load the Guile extension only if they need it. (And/or have a configuration option of turning it off).

comment:39 in reply to: ↑ 36 Changed 22 months ago by embray

That would make sense too.

Anyways, I'm increasingly convinced that the problem here is in the affected distros. I'm gonna try an Arch VM and see if I can reproduce...

comment:40 in reply to: ↑ 34 ; follow-ups: Changed 22 months ago by defeo

Replying to dimpase:

oops, typo, it should be "Could anyone who can reproduce this check whether #23700 fixes the problem?"

Not for me.

  1. I checked out the ticket and ran make. Same failure.
  2. I ran make distclean, then make again. I got this failure:
[patch-2.7.5] Using cached file /home/defeo/sage/upstream/patch-2.7.5.tar.gz
[patch-2.7.5] patch-2.7.5
[patch-2.7.5] ====================================================
[patch-2.7.5] Setting up build directory for patch-2.7.5
[patch-2.7.5] Traceback (most recent call last):
[patch-2.7.5]   File "/home/defeo/sage/build/bin/sage-uncompress-spkg", line 23, in <module>
[patch-2.7.5]     run()
[patch-2.7.5]   File "/home/defeo/sage/build/bin/../sage_bootstrap/uncompress/cmdline.py", line 72, in run
[patch-2.7.5]     unpack_archive(archive, dirname)
[patch-2.7.5]   File "/home/defeo/sage/build/bin/../sage_bootstrap/uncompress/action.py", line 68, in unpack_archive
[patch-2.7.5]     archive.extractall(members=archive.names)
[patch-2.7.5]   File "/home/defeo/sage/build/bin/../sage_bootstrap/uncompress/tar_file.py", line 90, in extractall
[patch-2.7.5]     members=members)
[patch-2.7.5]   File "/usr/lib/python3.6/tarfile.py", line 2007, in extractall
[patch-2.7.5]     numeric_owner=numeric_owner)
[patch-2.7.5]   File "/usr/lib/python3.6/tarfile.py", line 2049, in extract
[patch-2.7.5]     numeric_owner=numeric_owner)
[patch-2.7.5] TypeError: _extract_member() got an unexpected keyword argument 'set_attrs'
[patch-2.7.5] ************************************************************************
[patch-2.7.5] Error: failed to extract /home/defeo/sage/upstream/patch-2.7.5.tar.gz
[patch-2.7.5] ************************************************************************

comment:41 in reply to: ↑ 40 Changed 22 months ago by dimpase

a duplicate comment, sorry.

Last edited 22 months ago by dimpase (previous) (diff)

comment:42 in reply to: ↑ 40 Changed 22 months ago by dimpase

Replying to defeo:

Replying to dimpase:

oops, typo, it should be "Could anyone who can reproduce this check whether #23700 fixes the problem?"

Not for me.

  1. I checked out the ticket and ran make. Same failure.
  2. I ran make distclean, then make again. I got this failure:
[patch-2.7.5] Using cached file /home/defeo/sage/upstream/patch-2.7.5.tar.gz
[patch-2.7.5] patch-2.7.5
[patch-2.7.5] ====================================================
[patch-2.7.5] Setting up build directory for patch-2.7.5
[patch-2.7.5] Traceback (most recent call last):
[patch-2.7.5]   File "/home/defeo/sage/build/bin/sage-uncompress-spkg", line 23, in <module>
[patch-2.7.5]     run()
[patch-2.7.5]   File "/home/defeo/sage/build/bin/../sage_bootstrap/uncompress/cmdline.py", line 72, in run
[patch-2.7.5]     unpack_archive(archive, dirname)
[patch-2.7.5]   File "/home/defeo/sage/build/bin/../sage_bootstrap/uncompress/action.py", line 68, in unpack_archive
[patch-2.7.5]     archive.extractall(members=archive.names)
[patch-2.7.5]   File "/home/defeo/sage/build/bin/../sage_bootstrap/uncompress/tar_file.py", line 90, in extractall
[patch-2.7.5]     members=members)
[patch-2.7.5]   File "/usr/lib/python3.6/tarfile.py", line 2007, in extractall
[patch-2.7.5]     numeric_owner=numeric_owner)
[patch-2.7.5]   File "/usr/lib/python3.6/tarfile.py", line 2049, in extract
[patch-2.7.5]     numeric_owner=numeric_owner)
[patch-2.7.5] TypeError: _extract_member() got an unexpected keyword argument 'set_attrs'
[patch-2.7.5] ************************************************************************
[patch-2.7.5] Error: failed to extract /home/defeo/sage/upstream/patch-2.7.5.tar.gz
[patch-2.7.5] ************************************************************************

this looks like system's Python is nuked too. Do you have funky stuff in your LD_LIBRARY_PATH or in PATH? Nothing to do with gc, that's certain.

comment:43 Changed 22 months ago by dimpase

Or perhaps it's simply due to your python being python3 (or a very new python3, which has not been tested...)

comment:44 follow-up: Changed 22 months ago by dimpase

yep, I have this error if I set my system Python to python3.5, too.

Thus, set python to python2, and repeat please.

comment:45 in reply to: ↑ 44 Changed 22 months ago by dimpase

Replying to dimpase:

yep, I have this error if I set my system Python to python3.5, too.

Thus, set python to python2, and repeat please.

This tar_file py3 problem is now #24830 (which has nothing to do with the current ticket)

comment:46 Changed 22 months ago by defeo

Ok, it compiled with Python2. Now, it might be thanks to #23700, or thanks to Python2... who knows? :)

comment:47 Changed 22 months ago by dimpase

  • Dependencies set to #23700
  • Status changed from new to needs_review

#23700 is reported to fix this issue.

(As well as using a guile-less make, I presume.)

comment:48 Changed 22 months ago by embray

Neat, I was able to reproduce this in an Arch Linux Docker image. So at least there's that.

comment:49 Changed 22 months ago by dimpase

Does #23700 cure it?

comment:50 Changed 22 months ago by embray

I haven't tried. But a workaround that did work was to add LD_PRELOAD=/usr/lib/libgc.so. So a full workaround might look something like:

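if [ "$UNAME" = "Linux" ]; then
    # Ask the dynamic linker which libgc the system make (via libguile) resolves to.
    LIBGC="$(ldd "$(which make)" | sed -n 's/\s*libgc\.so.* => \(.\+\) .*/\1/p')"
    if [ -n "$LIBGC" ]; then
        # Force that copy to be loaded ahead of the one in $SAGE_LOCAL/lib.
        export LD_PRELOAD="$LIBGC"
    fi
fi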

This finds the libgc that is needed by libguile (and by extension make) and ensures it's the one that's used, not the one from Sage. Sucks, but it works, and is kind of necessary.

A similar LD_PRELOAD trick might be able to solve #24605 as well, but I haven't tested that yet.
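
To see the mismatch directly, one can compare which libgc the dynamic linker resolves for make with and without Sage's library directory on LD_LIBRARY_PATH. This is only a diagnostic sketch: resolved_lib_path is a hypothetical helper name, and the snippet assumes that ldd is available and that SAGE_LOCAL points at the Sage installation prefix.

```shell
# Hypothetical helper: read `ldd` output on stdin and print the resolved
# path of the first library whose name starts with $1 (e.g. libgc.so).
resolved_lib_path() {
    awk -v lib="$1" 'index($1, lib) == 1 && $2 == "=>" && $3 ~ /^\// { print $3; exit }'
}

make_bin="$(command -v make)"
if [ -n "$make_bin" ] && command -v ldd >/dev/null 2>&1; then
    # What the system make resolves on its own.
    echo "system:    $(ldd "$make_bin" | resolved_lib_path libgc.so)"
    # What it resolves once Sage's lib directory shadows the system one.
    # SAGE_LOCAL is assumed to be set by the Sage build environment.
    echo "with Sage: $(LD_LIBRARY_PATH="${SAGE_LOCAL:-/nonexistent}/lib" ldd "$make_bin" | resolved_lib_path libgc.so)"
fi
```

If make is not linked against libguile, both lines come out empty, which would suggest the system is not affected.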

Last edited 22 months ago by embray (previous) (diff)

comment:51 follow-up: Changed 22 months ago by embray

  • Authors set to Erik Bray
  • Branch set to u/embray/build/ticket-24575
  • Commit set to 71c63fd0d9043a134568ad2a018f65c69888c52a
  • Reviewers set to Erik Bray

I've gone ahead and added my workaround. I would recommend using this even with #23700, just because really we should always be using the libgc from the system when invoking make (where applicable), even if the libgc in Sage happens, by some luck, to be compatible with the system's version.

In principle this workaround is needed for any build process that adds $SAGE_LOCAL/lib to $LD_LIBRARY_PATH. In general this should not be done at all, but there is at least one other case I know of in Sage: python. So this might also be worth extracting into a helper function for pre-loading certain libraries when needed...
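
Such a helper could look roughly like the sketch below. It is not the implementation from any Sage branch: the name preload_system_lib and its two arguments are made up for illustration, and it assumes a Linux system where ldd is available.

```shell
# Hypothetical helper: prepend the system copy of a library, as resolved
# for a given executable, to LD_PRELOAD.
# Usage: preload_system_lib <executable> <library-name-prefix>
preload_system_lib() {
    local exe lib path
    exe="$(command -v "$1")" || return 0   # executable not found: nothing to do
    lib="$2"
    # Resolve with LD_LIBRARY_PATH unset so the system copy is the one reported.
    path="$( (unset LD_LIBRARY_PATH; ldd "$exe") 2>/dev/null |
             awk -v lib="$lib" 'index($1, lib) == 1 && $3 ~ /^\// { print $3; exit }')" || true
    if [ -n "$path" ]; then
        # Keep any existing preloads; entries are colon-separated.
        export LD_PRELOAD="${path}${LD_PRELOAD:+:$LD_PRELOAD}"
    fi
}

# Example call before invoking make in an install script:
#   preload_system_lib make libgc.so
```

Resolving the path with LD_LIBRARY_PATH unset matters: otherwise ldd may report the very copy in $SAGE_LOCAL/lib that the preload is meant to bypass.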


New commits:

71c63fd Add the workaround to https://trac.sagemath.org/ticket/24575

comment:52 in reply to: ↑ 51 Changed 22 months ago by vdelecroix

  • Dependencies #23700 deleted

Replying to embray:

I've gone ahead and added my workaround. I would recommend using this even with #23700, just because really we should always be using the libgc from the system when invoking make (where applicable), even if the libgc in Sage happens, by some luck, to be compatible with the system's version.

In principle this workaround is needed for any build process that adds $SAGE_LOCAL/lib to $LD_LIBRARY_PATH. In general this should not be done at all, but there is at least one other case I know of in Sage: python. So this might also be worth extracting into a helper function for pre-loading certain libraries when needed...

Thanks Erik for analyzing the problem and providing the workaround! I definitely did not want to consider #23700 as a solution. (I am now compiling from scratch for checking)

Note that your fix is focused towards libgc so that the same kind of trouble might appear with another library in the future. But I consider this as fine for now. Wouldn't it be possible to exclude libgc from the list of packages to install when already present (and up to date) on the system?

Changed 22 months ago by vdelecroix

comment:53 Changed 22 months ago by vdelecroix

flint build is failing (for the same reason as R did), see flint-2.5.2.p2.log. Should we apply the same strategy here?

Changed 22 months ago by vdelecroix

comment:54 Changed 22 months ago by vdelecroix

Replying to vdelecroix:

flint build is failing (for the same reason as R did), see flint-2.5.2.p2.log. Should we apply the same strategy here?

Same also with arb and the Python package rpy2 (rpy2-2.8.2.p0.log). After adding the workaround to the three spkg-install scripts, the build completes.

Though I did not check the optional packages.

comment:55 Changed 22 months ago by vdelecroix

  • Branch changed from u/embray/build/ticket-24575 to u/vdelecroix/24575
  • Commit changed from 71c63fd0d9043a134568ad2a018f65c69888c52a to f33c5e60c6655d2af7d97047dcba996600f37193

New commits:

f33c5e6 Same workaround for arb, flint and rpy2

comment:56 Changed 22 months ago by vdelecroix

  • Description modified (diff)

comment:57 follow-up: Changed 22 months ago by dimpase

  • Status changed from needs_review to needs_work

Shouldn't we do the LD_PRELOAD in the script that calls spkg-install, rather than repeat this boilerplate? (And the same for spkg-check, by the way).

This would also take care of all the non-standard packages.

comment:58 in reply to: ↑ 57 ; follow-up: Changed 22 months ago by vdelecroix

Replying to dimpase:

Shouldn't we do the LD_PRELOAD in the script that calls spkg-install, rather than repeat this boilerplate? (And the same for spkg-check, by the way).

This would also take care of all the non-standard packages.

I don't think so. This workaround takes care of fragile makefiles until a better solution is found. Having it globally applied would be a nightmare for debugging as well as upstream communication. It is also likely that the workarounds will be removed one by one.

comment:59 in reply to: ↑ 58 Changed 22 months ago by dimpase

Replying to vdelecroix:

Replying to dimpase:

Shouldn't we do the LD_PRELOAD in the script that calls spkg-install, rather than repeat this boilerplate? (And the same for spkg-check, by the way).

This would also take care of all the non-standard packages.

I don't think so. This workaround takes care of fragile makefiles until a better solution is found.

A better solution is not to use Guile-enabled make, at least not until it is built in a way ensuring one can use it for hacking on Guile dependencies.

Having it globally applied would be a nightmare for debugging as well as upstream communication. It is also likely that the workarounds will be removed one by one.

There is nothing to communicate to package upstream here, I think. You cannot ban their use of LD_LIBRARY_PATH. Then, LD_PRELOAD is a pretty standard way to deal with these issues. It has cured all these issues so far; why do you want to keep getting reports of such-and-such package mysteriously breaking while a Guile-enabled make is used?

comment:60 follow-up: Changed 22 months ago by mkoeppe

I agree with Vincent here. This workaround should only be used for the known packages with an LD_LIBRARY_PATH problem, and this should be reported as a bug upstream.

FLINT seems to be getting a CMake build system to replace its handwritten one, which will likely eliminate this problem.

comment:61 Changed 22 months ago by embray

Over in #24885 I already implemented a more generic solution for this, but I didn't push the branch yet. That should be used instead.

comment:62 in reply to: ↑ 60 Changed 22 months ago by embray

Replying to mkoeppe:

I agree with Vincent here. This workaround should only be used for the known packages with an LD_LIBRARY_PATH problem, and this should be reported as a bug upstream.

The fact that they use LD_LIBRARY_PATH is not a bug IMO, though it would be better, at least in some cases, if they used LD_PRELOAD instead for specific libraries.

comment:63 Changed 22 months ago by embray

  • Branch changed from u/vdelecroix/24575 to u/embray/build/ticket-24575
  • Commit changed from f33c5e60c6655d2af7d97047dcba996600f37193 to 454221ac40ec282c469ff2043465811b7364313b
  • Dependencies set to #24885

Reworked on top of #24885


New commits:

ba1b5ee Add helper function that implements the workaround from https://trac.sagemath.org/ticket/24575 more generically.
6103df0 Add the workaround to https://trac.sagemath.org/ticket/24575
2ecaa74 Replace this with sdh_preload_lib
454221a Same issue applies to arb, flint, and rpy2

comment:64 Changed 22 months ago by embray

  • Status changed from needs_work to needs_review

comment:65 Changed 22 months ago by vdelecroix

  • Reviewers changed from Erik Bray to Erik Bray, Vincent Delecroix
  • Status changed from needs_review to needs_work

I am currently testing optional packages; at least deformation has the same symptoms (I will provide a proper commit with all of them once finished).

comment:66 Changed 22 months ago by vdelecroix

All right, for deformation, after setting sdh_preload_lib I got a different, unrelated error:

[deformation-d05941b]     CC   ../build/perm/../perm.lo
[deformation-d05941b] /usr/bin/ld: -r and -pie may not be used together
[deformation-d05941b] collect2: error: ld returned 1 exit status
[deformation-d05941b] make[4]: *** [../Makefile.subdirs:55: ../build/perm/../perm.lo] Error 1

comment:67 Changed 22 months ago by vdelecroix

  • Branch changed from u/embray/build/ticket-24575 to u/vdelecroix/24575
  • Commit changed from 454221ac40ec282c469ff2043465811b7364313b to 1ac3afaefe81fb21c43be7964c07f7cd7d529cb9

Concerning optional packages, only deformation needs the workaround (It appears that I also have some unrelated build failures #23533, #24901, #24902 and #24903).

Erik, Dima, Matthias: I consider the branch ready for positive review. As I added a commit on top of the branch, I will let somebody else finish the review.


New commits:

1ac3afa Same issue applies to optional package deformation

comment:68 Changed 22 months ago by vdelecroix

  • Status changed from needs_work to needs_review

comment:69 Changed 22 months ago by mkoeppe

Our package perl_term_readline_gnu also has some LD_LIBRARY_PATH stuff...

comment:70 follow-up: Changed 22 months ago by mkoeppe

I think it's better to patch out this LD_LIBRARY_PATH stuff from the packages' Makefiles. Like this: https://github.com/mkoeppe/deformation/commit/0d732b13e901b777aca000ff502a5d5aa8d690bf

comment:71 Changed 22 months ago by mkoeppe

  • Cc jpflori added

comment:72 in reply to: ↑ 70 ; follow-ups: Changed 22 months ago by vdelecroix

Replying to mkoeppe:

I think it's better to patch out this LD_LIBRARY_PATH stuff from the packages' Makefiles. Like this: https://github.com/mkoeppe/deformation/commit/0d732b13e901b777aca000ff502a5d5aa8d690bf

Note that flint Makefile contains the very same lines... would you suggest that the same operation should be applied there?

comment:73 in reply to: ↑ 72 Changed 22 months ago by vdelecroix

Replying to vdelecroix:

Replying to mkoeppe:

I think it's better to patch out this LD_LIBRARY_PATH stuff from the packages' Makefiles. Like this: https://github.com/mkoeppe/deformation/commit/0d732b13e901b777aca000ff502a5d5aa8d690bf

Note that flint Makefile contains the very same lines... would you suggest that the same operation should be applied there?

As well as arb.

comment:74 Changed 22 months ago by jpflori

flint, arb and deformation share almost the same build system indeed. Except I did not push the -r/pie fix to deformation.

comment:75 in reply to: ↑ 72 Changed 22 months ago by mkoeppe

Replying to vdelecroix:

would you suggest that the same operation should be applied there?

Yes, probably.

comment:76 Changed 22 months ago by vdelecroix

  • Description modified (diff)
  • Report Upstream changed from Reported upstream. Developers deny it's a bug. to Reported upstream. No feedback yet.

comment:77 follow-up: Changed 22 months ago by mkoeppe

And for R, it may be enough to remove the bottom lines of etc/ldpaths.in.

comment:78 Changed 22 months ago by vdelecroix

  • Description modified (diff)

comment:79 follow-up: Changed 22 months ago by mkoeppe

Note for all these packages, it is to be seen whether upstream would accept these changes: Some of these libraries may in fact have valid reasons for adjusting LD_LIBRARY_PATH in the context of their build system quirks. But in our setup, since we make sure that all libraries are installed with a full rpath, none of these LD_LIBRARY_PATH things are necessary.

comment:80 in reply to: ↑ 79 Changed 22 months ago by vdelecroix

Replying to mkoeppe:

Note for all these packages, it is to be seen whether upstream would accept these changes: Some of these libraries may in fact have valid reasons for adjusting LD_LIBRARY_PATH in the context of their build system quirks. But in our setup, since we make sure that all libraries are installed with a full rpath, none of these LD_LIBRARY_PATH things are necessary.

At least I asked for flint/arb (see ticket description). Even if upstream remains unchanged we would have two options

  • adding a patch removing the 5 initial lines of Makefile.in (compiles fine on my computer)
  • use Erik's workaround with LD_PRELOAD

Does anybody have a preference?

comment:81 Changed 22 months ago by mkoeppe

I (clearly) have a strong preference for patching.

comment:82 in reply to: ↑ 77 ; follow-up: Changed 22 months ago by mkoeppe

Replying to mkoeppe:

And for R, it may be enough to remove the bottom lines of etc/ldpaths.in.

@charpent: I see that the sage R package contains various patches from you, in particular regarding directories. Would this change, removing the set up of LD_LIBRARY_PATH (and DYLD_FALLBACK_LIBRARY_PATH on macOS), make sense to you?

comment:83 in reply to: ↑ 82 Changed 22 months ago by dimpase

Replying to mkoeppe:

Replying to mkoeppe:

And for R, it may be enough to remove the bottom lines of etc/ldpaths.in.

@charpent: I see that the sage R package contains various patches from you, in particular regarding directories. Would this change, removing the set up of LD_LIBRARY_PATH (and DYLD_FALLBACK_LIBRARY_PATH on macOS), make sense to you?

IMHO this would break the Java support in R (if Java is installed in a non-standard place, which is not very unusual), and probably other R packages that might install or use shared libs.

comment:84 Changed 21 months ago by git

  • Commit changed from 1ac3afaefe81fb21c43be7964c07f7cd7d529cb9 to f4697df10175ba42a821f848942d3c174c550d76

Branch pushed to git repo; I updated commit sha1. This was a forced push. New commits:

0aaff21 same issue with rpy2
f4697df Patching flint/arb/deformation

comment:85 Changed 21 months ago by vdelecroix

  • Description modified (diff)

comment:86 Changed 21 months ago by vdelecroix

  • Authors changed from Erik Bray to Erik Bray, Vincent Delecroix

comment:87 Changed 21 months ago by vdelecroix

  • Description modified (diff)

comment:88 Changed 21 months ago by git

  • Commit changed from f4697df10175ba42a821f848942d3c174c550d76 to f5f25dc71fa05c4ac81d18392880a3358a9c9957

Branch pushed to git repo; I updated commit sha1. New commits:

f5f25dc specify upstream issues in patches

comment:89 follow-ups: Changed 21 months ago by embray

I'm -1 on patching things out if it isn't necessary to, or if the reason to do so hasn't been fully understood. Are any of these patches even necessary? And if so, why are these packages manipulating LD_LIBRARY_PATH, and how are you sure that it isn't a necessary and valid thing to do in this case?

comment:90 Changed 21 months ago by embray

In particular, since I already provided a workaround, why not just use that same workaround for those packages as well?

comment:91 in reply to: ↑ 89 Changed 21 months ago by vdelecroix

Replying to embray:

I'm -1 on patching things out if it isn't necessary to, or if the reason to do so hasn't been fully understood. Are any of these patches even necessary? And if so, why are these packages manipulating LD_LIBRARY_PATH, and how are you sure that it isn't a necessary and valid thing to do in this case?

Matthias might be better placed to answer (see 81). We might also wait for upstream answers (see ticket description for the links). To my mind, keeping spkg-install as simple as possible is better (assuming the patch is correct).

I would also like to find a solution so that all these build troubles are merged soon (even if the fix has to change later on). The problem at the origin of this ticket affects all archlinux users.

comment:92 follow-ups: Changed 21 months ago by dimpase

Hell, we can provide a make spkg. To me, make is a tool that should work, no matter what. Otherwise, patching Sage packages is akin to trying to use a slot screwdriver on French recess screws, and then proceeding to saw slots in screws instead of picking up a correct tool...(sorry, my undergrad was in engineering :-))

comment:93 in reply to: ↑ 92 Changed 21 months ago by charpent

Replying to dimpase:

Hell, we can provide a make spkg. To me, make is a tool that should work, no matter what. Otherwise, patching Sage packages is akin to trying to use a slot screwdriver on French recess screws, and then proceeding to saw slots in screws instead of picking up a correct tool...(sorry, my undergrad was in engineering :-))

No problem. After all, my undergrad was even more remote...

comment:94 follow-up: Changed 21 months ago by embray

For flint and friends another workaround I found was to simply pass DLPATH_ADD= when calling make in its spkg-install.

comment:95 in reply to: ↑ 89 Changed 21 months ago by mkoeppe

Replying to embray:

why are these packages manipulating LD_LIBRARY_PATH, and how are you sure that it isn't a necessary and valid thing to do in this case?

We can be sure simply by testing that the build works, just like we do with any other package.

comment:96 in reply to: ↑ 94 Changed 21 months ago by mkoeppe

Replying to embray:

For flint and friends another workaround I found was to simply pass DLPATH_ADD= when calling make in its spkg-install.

This is a great solution, which I prefer over patching.
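
The reason this works is that a variable assigned on the make command line overrides an ordinary assignment inside the Makefile. The sketch below demonstrates that with a toy Makefile; it is a simplified stand-in, not FLINT's real Makefile, and only the variable name DLPATH_ADD is taken from the discussion above.

```shell
# Toy stand-in for a FLINT-style Makefile (not the real one): it gives
# DLPATH_ADD a default and prints the value that recipes would see.
tmp="$(mktemp -d)"
printf 'DLPATH_ADD = $(CURDIR)\nshow:\n\t@echo "DLPATH_ADD=[$(DLPATH_ADD)]"\n' > "$tmp/Makefile"

if command -v make >/dev/null 2>&1; then
    # Default: the Makefile's own assignment is used.
    default_out="$(make -f "$tmp/Makefile" show)"
    # A command-line assignment overrides it, here with an empty value.
    override_out="$(make -f "$tmp/Makefile" show DLPATH_ADD=)"
    echo "$default_out"
    echo "$override_out"
fi
rm -rf "$tmp"
```

The second call prints an empty value between the brackets, so a rule that prepends DLPATH_ADD to LD_LIBRARY_PATH would no longer add a directory of its own.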

comment:97 Changed 21 months ago by git

  • Commit changed from f5f25dc71fa05c4ac81d18392880a3358a9c9957 to 21442be9d659d74e29ce2a3a5dd85f55b7c90475

Branch pushed to git repo; I updated commit sha1. This was a forced push. New commits:

eb4fd11 patch for recent gcc's in deformation
21442be Fix for arb/flint/deformation

comment:98 Changed 21 months ago by mkoeppe

Thank you! Next I'd suggest that we also see how exactly $SAGE_LOCAL/lib ends up in R's LD_LIBRARY_PATH. The relevant file seems to be $SAGE_LOCAL/lib/R/etc/ldpaths. But I can't investigate this here on macOS.

comment:99 in reply to: ↑ 92 Changed 21 months ago by mkoeppe

Replying to dimpase:

we can provide a make spkg.

-1

comment:100 follow-up: Changed 21 months ago by vdelecroix

[EDITED]

At commit 21442be flint and arb do not pass their testsuite on my computer (when doing $ sage -f -c flint or same with arb/deformation)

  • without anything new in spkg-check I end up with the same libguile business
  • with DLPATH_ADD= it ends with the tests failing to load libflint/libarb/libdeformation (as I guess they are not installed at the time the tests are run).
  • with sdh_preload_lib it works fine

I will add a commit in a minute.

Last edited 21 months ago by vdelecroix (previous) (diff)

comment:101 Changed 21 months ago by git

  • Commit changed from 21442be9d659d74e29ce2a3a5dd85f55b7c90475 to 0d8a4a954cee8958b24ece2067bf5191dcbcd28e

Branch pushed to git repo; I updated commit sha1. New commits:

0d8a4a9 use sdh_preload_lib for flint/arb/deformation test-suites

comment:102 Changed 21 months ago by mkoeppe

Interesting; normally make check is supposed to test against the non-installed library, rather than the installed library. (The distinction does not matter to us because we run spkg-check after spkg-install.)

comment:103 Changed 21 months ago by mkoeppe

Vincent, could you try whether passing

$MAKE check DLPATH_ADD=`pwd`

works (without using sdh_preload_lib)?

Changed 21 months ago by vdelecroix

Changed 21 months ago by vdelecroix

Changed 21 months ago by vdelecroix

Changed 21 months ago by vdelecroix

comment:104 Changed 21 months ago by vdelecroix

See in attachment the full log of the sage -f -c flint with the various configurations

comment:105 Changed 21 months ago by mkoeppe

Thank you! Seems like FLINT forgets to pass the rpath linker option when it builds its test programs, relying on LD_LIBRARY_PATH instead.
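
One way to check this for a given test program is to look for an RPATH or RUNPATH entry in its dynamic section. The following is a sketch that assumes binutils' readelf is installed; rpath_entries is a hypothetical helper and TEST_BINARY is a hypothetical variable pointing at one of the built test executables.

```shell
# Filter `readelf -d` output (read on stdin) down to RPATH/RUNPATH entries.
rpath_entries() {
    grep -E '\((RPATH|RUNPATH)\)' || true   # no match is not an error here
}

# TEST_BINARY is a hypothetical variable naming a built test program.
bin="${TEST_BINARY:-}"
if [ -n "$bin" ] && command -v readelf >/dev/null 2>&1; then
    if [ -n "$(readelf -d "$bin" | rpath_entries)" ]; then
        echo "$bin: has an rpath/runpath entry"
    else
        echo "$bin: no rpath; it depends on LD_LIBRARY_PATH or system paths"
    fi
fi
```

A test program without such an entry can only find a not-yet-installed library through LD_LIBRARY_PATH, which would match the failures seen with DLPATH_ADD= in make check.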

comment:106 Changed 21 months ago by mkoeppe

Our patch use_ldflags_in_tests.patch (for FLINT) does not go far enough; it forgets to patch Makefile.subdirs.

comment:108 follow-up: Changed 21 months ago by gh-dimpase

  • Description modified (diff)
  • Priority changed from blocker to critical
  • Summary changed from conflicts with gc to on Arch make+guile is broken

The ticket description and the title were misleading. It's a bug in Arch that we are dealing with here, and it should not be a blocker. A broken make is nothing new, e.g. while in theory BSD Make should be able to build Sage, in practice it does not work, and one has to use GNU Make.

If you must work on Arch, install make without guile support. Certainly, improving various upstream build systems is a noble goal, but getting totally carried away with this is not a good idea.

comment:109 follow-up: Changed 21 months ago by mkoeppe

I agree that it's probably not a blocker because it only happens on a relatively obscure configuration. But I don't think your edit to the description was an improvement.

comment:110 in reply to: ↑ 109 Changed 21 months ago by gh-dimpase

  • Description modified (diff)

Replying to mkoeppe:

I agree that it's probably not a blocker because it only happens on a relatively obscure configuration. But I don't think your edit to the description was an improvement.

I've added a bit more detail pointing at the root cause of trouble. As well, #23700 will provide libgc compatible with the one needed by libguile, and the main problem will go away after it is merged.

comment:111 in reply to: ↑ 107 ; follow-up: Changed 21 months ago by vdelecroix

Replying to mkoeppe:

Fix for FLINT is here: https://github.com/mkoeppe/flint2/commit/bd2684891b6da6791ae2f52482a02f5b4cc56bd1

With the patch applied, tests run with DLPATH_ADD= in make check. Could you make it a proper upstream pull request?

comment:112 in reply to: ↑ 100 Changed 21 months ago by embray

Replying to vdelecroix:

[EDITED]

At commit 21442be flint and arb do not pass their testsuite on my computer (when doing $ sage -f -c flint or same with arb/deformation)

  • without anything new in spkg-check I end up with the same libguile business
  • with DLPATH_ADD= it ends with the tests failing to load libflint/libarb/libdeformation (as I guess they are not installed at the time the tests are run).

I didn't get an opportunity to comment on this last night, but FYI you don't need to add DLPATH_ADD= to the make install call. Just the first one that says make verbose. That works for me, and the make check tests pass.

comment:113 in reply to: ↑ 108 ; follow-up: Changed 21 months ago by embray

Replying to gh-dimpase:

The ticket description and the title were misleading. It's a bug in Arch that we are dealing with here, and it should not be a blocker. A broken make is nothing new, e.g. while in theory BSD Make should be able to build Sage, in practice it does not work, and one has to use GNU Make.

It's not a bug in Arch. If anything it's a bug in Sage...

comment:114 in reply to: ↑ 113 ; follow-up: Changed 21 months ago by dimpase

Replying to embray:

Replying to gh-dimpase:

The ticket description and the title were misleading. It's a bug in Arch that we are dealing with here, and it should not be a blocker. A broken make is nothing new, e.g. while in theory BSD Make should be able to build Sage, in practice it does not work, and one has to use GNU Make.

It's not a bug in Arch. If anything it's a bug in Sage...

As you cannot reproduce it on anything but the latest Arch, I don't really see how you can say this. The make they ship is not backwards-compatible, and they have not made any announcements to that extent, have not bumped the version up, have they? At least if they insist on this make, they should also provide a guile-less make in another package.

comment:115 in reply to: ↑ 114 ; follow-up: Changed 21 months ago by embray

Replying to dimpase:

Replying to embray:

Replying to gh-dimpase:

The ticket description and the title were misleading. It's a bug in Arch that we are dealing with here, and it should not be a blocker. A broken make is nothing new, e.g. while in theory BSD Make should be able to build Sage, in practice it does not work, and one has to use GNU Make.

It's not a bug in Arch. If anything it's a bug in Sage...

As you cannot reproduce it on anything but the latest Arch, I don't really see how you can say this. The make they ship is not backwards-compatible, and they have not made any announcements to that extent, have not bumped the version up, have they? At least if they insist on this make, they should also provide a guile-less make in another package.

I think you're being a tad Sage-centric. The kind of problem we're encountering here is a normal problem when you have two different versions of a library and you load the wrong version in an executable that expects the different version. There's certainly nothing wrong with Arch shipping a feature-complete version of GNU make (while it's a rare feature, it's probably used by at least some packages), and I can't blame them for not expecting that somebody might end up with their own copy of libgc on their shared library path, which is a highly unusual thing to be doing.

You do have a point that being a fundamental build tool, a make with additional dependencies is more likely to encounter a problem like this than many other tools, but this same sort of problem can happen with any other part of the build toolchain. The real problem is that Sage is insisting on using too many of its own packages for low-level dependencies :)

comment:116 follow-ups: Changed 21 months ago by embray

Another example where this kind of problem can occur (but by luck doesn't seem to), which has nothing to do with make or guile or gc: Sage ships its own libz for some reason. Well, libpython has libz as a dependency, and when building Python it also manipulates LD_LIBRARY_PATH. In this case it isn't really a problem, but if you had $SAGE_LOCAL/lib on LD_LIBRARY_PATH, and Sage's libz were incompatible with the system's libz, you would also have a problem.

comment:117 in reply to: ↑ 116 ; follow-up: Changed 21 months ago by dimpase

Replying to embray:

Another example where this kind of problem can occur (but by luck doesn't seem to), which has nothing to do with make or guile or gc: Sage ships its own libz for some reason. Well, libpython has libz as a dependency, and when building Python it also manipulates LD_LIBRARY_PATH. In this case it isn't really a problem, but if you had $SAGE_LOCAL/lib on LD_LIBRARY_PATH, and Sage's libz were incompatible with the system's libz, you would also have a problem.

Please have a look at comments 27-29 above. The only conclusion I can draw from them is that Arch has done something dodgy and very hard to reproduce: they probably screwed up the versioning of libgc, by not bumping it while upgrading, or something similar (or perhaps Gentoo has a different build setup for make+libguile, so that the result is not fragile...).

TL;DR: manipulating LD_LIBRARY_PATH while building with make+guile on Gentoo does not lead to a problem, while on Arch it does.

comment:118 in reply to: ↑ 115 ; follow-ups: Changed 21 months ago by charpent

Replying to embray:

[ Snip... ]

The real problem is that Sage is insisting on using too many of its own packages for low-level dependencies :)

Hear, hear !

This, IMNSHO, is a capital point that is involved in a *lot* of other parts of Sage. It stems from our insistence on having "known good" versions of almost everything we use, hence "our" version of Maxima, "our" version of R, "our" version of Sympy, etc., and even "our" version(s) of Python (!).

This modus operandi greatly simplifies the maintenance of the consistency of Sage with (sometimes wildly) varying versions of other people's software. But the drawback is that Sage ends up being a distribution of mathematics-related software and underlying utilities, which converges to a (not so small) Unix-like distribution.

An alternative would be to push the version-related variability of interface in specialized interface packages, presenting Sage with a uniform interface and adapting to the "other side" variability.

As far as I can tell, this alternative isn't used because it requires maintenance of the interface for each and every version of the interfaced software that can be met "in the wild". Which would be more work than maintaining one "Sage's own" version, it seems...

But at the point where we need "our" make, "our" gcc, "our" python(s), etc..., I wonder if this point of view shouldn't be re-assessed.

This ticket is probably not the right place to discuss it ; however, I think it should be opened on sage-devel.

Opinions ? Advice ?

comment:119 in reply to: ↑ 111 ; follow-up: Changed 21 months ago by mkoeppe

Replying to vdelecroix:

Replying to mkoeppe:

Fix for FLINT is here: https://github.com/mkoeppe/flint2/commit/bd2684891b6da6791ae2f52482a02f5b4cc56bd1

With the patch applied, tests run with DLPATH_ADD= in make check. Could you make it a proper upstream pull request?

https://github.com/wbhart/flint2/pull/449

comment:120 in reply to: ↑ 116 Changed 21 months ago by mkoeppe

Replying to embray:

Another example where this kind of problem can occur (but by luck doesn't seem to) [...] libz [...]

That's why I would recommend to use DLPATH_ADD= in all places where make is used in the spkg scripts for the FLINT-like packages, not just the minimal list of those where the current symptom (make-guile-gc-on-arch) is observed.

comment:121 in reply to: ↑ 118 Changed 21 months ago by mkoeppe

Replying to charpent:

Opinions ? Advice ?

The convenience for users of having a self contained sage distribution (in particular those who are stuck on a system without root access, or distributions without a comprehensive and well-maintained set of mathematical software packages) is important.

Using the distribution's installed packages instead of our own whenever we can is of course desirable, and in fact there is an ongoing effort to do so. See for example Erik's #24919.

Whether we can use some specific version is often times not a question of "interfacing", but to avoid critical bugs; and it seems problematic to build an interface with the purpose of working around bugs. When it is about features rather than bugs, again there is an ongoing effort already, #20382.

These tickets (and related ones) need technical discussion.

comment:122 follow-up: Changed 21 months ago by vdelecroix

Replying to embray:

Replying to vdelecroix:

[EDITED]

At commit 21442be flint and arb do not pass their testsuite on my computer (when doing $ sage -f -c flint or same with arb/deformation)

  • without anything new in spkg-check I end up with the same libguile business
  • with DLPATH_ADD= it ends with the tests failing to load libflint/libarb/libdeformation (as I guess they are not installed at the time the tests are run).

I didn't get an opportunity to comment on this last night, but FYI you don't need to add DLPATH_ADD= to the make install call. Just the first one that says make verbose. That works for me, and the make check tests pass.

Perhaps I misunderstood your suggestion but the following does not work on my computer

  • build/pkgs/flint/spkg-install

    diff --git a/build/pkgs/flint/spkg-install b/build/pkgs/flint/spkg-install
    index f6c185433f..29cdbee8f0 100644
    --- a/build/pkgs/flint/spkg-install
    +++ b/build/pkgs/flint/spkg-install
    @@ -38,7 +38,7 @@ if [ $? -ne 0 ]; then
     fi
     
     echo "Building FLINT shared library."
    -$MAKE verbose
    +$MAKE verbose DLPATH_ADD=
     if [ $? -ne 0 ]; then
         echo >&2 "Error: Failed to build FLINT shared library."
         exit 1

The log of the command sage -f flint is flint-2.5.2-check_with_DLPATH_ADD_empty_but_not_on_make_install.log.gz.

comment:123 in reply to: ↑ 117 Changed 21 months ago by mkoeppe

Replying to dimpase:

The only conclusion I can draw from it is that Arch has done something dodgy and very hard to reproduce: they probably screwed up the versioning of libgc, by not bumping it while upgrading, or something similar (or perhaps Gentoo has a different build setup for make+libguile, so that the result is not fragile...).

Shared library versioning is certainly a powerful mechanism for a distribution to keep consistency, but it cannot protect against shadowing system libraries by user-installed libraries via LD_LIBRARY_PATH.
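A simplified model of that shadowing, as a sketch only: the function below imitates a first-match search over a colon-separated path list, which is how directories listed in LD_LIBRARY_PATH normally get consulted before the system defaults. The helper name `first_match` and the directory layout are made up for the illustration, and the files are empty placeholders rather than real libraries.

```shell
#!/bin/sh
# first_match SEARCH_PATH LIBNAME
# Print the first DIR/LIBNAME that exists, scanning DIRs in order.
# Simplified model: ignores rpath, the ld.so cache and secure-execution mode.
first_match() {
    search_path=$1
    libname=$2
    old_ifs=$IFS
    IFS=:
    for dir in $search_path; do
        if [ -n "$dir" ] && [ -e "$dir/$libname" ]; then
            IFS=$old_ifs
            printf '%s\n' "$dir/$libname"
            return 0
        fi
    done
    IFS=$old_ifs
    return 1
}

# Two placeholder copies carrying the same soname, in hypothetical locations.
demo=$(mktemp -d)
mkdir -p "$demo/sage_local/lib" "$demo/usr/lib"
: > "$demo/sage_local/lib/libgc.so.1"
: > "$demo/usr/lib/libgc.so.1"

# The user-installed directory is listed first, so its copy wins even though
# the soname is identical to the system one.
resolved=$(first_match "$demo/sage_local/lib:$demo/usr/lib" libgc.so.1)
echo "$resolved"

rm -rf "$demo"
```

Because both files share one soname, nothing in the versioning scheme can tell the loader that the first hit is the wrong build for libguile.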

comment:124 in reply to: ↑ 119 Changed 21 months ago by vdelecroix

Replying to mkoeppe:

Replying to vdelecroix:

Replying to mkoeppe:

Fix for FLINT is here: https://github.com/mkoeppe/flint2/commit/bd2684891b6da6791ae2f52482a02f5b4cc56bd1

With the patch applied, tests run with DLPATH_ADD= in make check. Could you make it a proper upstream pull request?

https://github.com/wbhart/flint2/pull/449

Superseded by https://github.com/wbhart/flint2/pull/450 (sorry).

comment:125 in reply to: ↑ 118 Changed 21 months ago by embray

Replying to charpent:

This ticket is probably not the right place to discuss it; however, I think it should be opened on sage-devel.

Opinions? Advice?

While I agree with you 100%, this is not a new discussion, nor does one really need to be opened on sage-devel (been there, done that). There is already lots of work being done on that from different directions, much of it funded by OpenDreamKit.

comment:126 in reply to: ↑ 122 ; follow-up: Changed 21 months ago by embray

Replying to vdelecroix:

Replying to embray:

Replying to vdelecroix:

[EDITED]

At commit 21442be flint and arb do not pass their testsuite on my computer (when doing $ sage -f -c flint or same with arb/deformation)

  • without anything new in spkg-check I end up with the same libguile business
  • with DLPATH_ADD= it ends with the tests failing to load libflint/libarb/libdeformation (as I guess they are not installed at the time the tests are run).

I didn't get an opportunity to comment on this last night, but FYI you don't need to add DLPATH_ADD= to the make install call. Just the first one that says make verbose. That works for me, and the make check tests pass.

Perhaps I misunderstood your suggestion but the following does not work on my computer

  • build/pkgs/flint/spkg-install

    diff --git a/build/pkgs/flint/spkg-install b/build/pkgs/flint/spkg-install
    index f6c185433f..29cdbee8f0 100644
    --- a/build/pkgs/flint/spkg-install
    +++ b/build/pkgs/flint/spkg-install
    @@ -38,7 +38,7 @@ if [ $? -ne 0 ]; then
     fi
     
     echo "Building FLINT shared library."
    -$MAKE verbose
    +$MAKE verbose DLPATH_ADD=
     if [ $? -ne 0 ]; then
         echo >&2 "Error: Failed to build FLINT shared library."
         exit 1

The log of the command sage -f flint is flint-2.5.2-check_with_DLPATH_ADD_empty_but_not_on_make_install.log.gz.

Exactly this is all I needed for the build to work on arch.

comment:127 Changed 21 months ago by gh-dimpase

Do you actually need any of this with the new beta merging #23700? I'm not saying that this branch is not needed; I'm asking merely to understand the root causes...

comment:128 in reply to: ↑ 126 Changed 21 months ago by embray

Replying to embray:

Replying to vdelecroix:

Perhaps I misunderstood your suggestion but the following does not work on my computer

Exactly this is all I needed for the build to work on arch.

Scratch that--it's not working for me either. I must have made a mistake when I last tested it (perhaps I forgot to ensure that the gc package was installed in sage first). Now I am getting the same result.

comment:129 Changed 20 months ago by mmezzarobba

This is going to conflict with #25035: Erik, Vincent, in which order do you think we should handle the two tickets?

comment:130 Changed 20 months ago by embray

I don't have a strong preference. If there is a conflict I can resolve it later.

comment:131 Changed 20 months ago by mkoeppe

Let's finish this ticket by removing the unnecessary sdh_preload_lib from the packages with FLINT-like build systems.

Last edited 20 months ago by mkoeppe

comment:132 Changed 20 months ago by mkoeppe

I have created a follow-up ticket for a solution without the sdh_preload_lib with R (and rpy2) at #25170.

comment:133 follow-up: Changed 20 months ago by embray

By the way, #24919 provides a prototype for an easy to use (I think) generic mechanism for adding configure-time checks for system packages to use in favor of building copies of those packages for sage (just as we currently do for packages like gcc, curl, etc...).

In the long term the best solution to this particular issue would be for Sage to not install its own libgc unless it absolutely has to (which for a modern Arch Linux where this problem is occurring, it probably wouldn't have to, though that also might depend on whether or not one has the develop package installed). We could also maybe enable header-only installs of some packages.

I'll experiment with adding a configure-time check for libgc on top of #24919.
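As a rough sketch of what such a check could look like (this is not the mechanism from #24919, just an illustration): run a probe command and decide between the system library and a bundled build based on its exit status. The function name `choose_gc` is hypothetical, and the pkg-config probe assumes that the system libgc ships a `bdw-gc` pkg-config module.

```shell
#!/bin/sh
# choose_gc PROBE_COMMAND [ARGS...]
# Hypothetical helper: print "system" if the probe succeeds, else "bundled".
choose_gc() {
    if "$@" >/dev/null 2>&1; then
        echo system
    else
        echo bundled
    fi
}

# Deterministic demonstration with trivial probes.
with_probe_ok=$(choose_gc true)
with_probe_fail=$(choose_gc false)
echo "probe succeeds -> $with_probe_ok"
echo "probe fails    -> $with_probe_fail"

# Probe for this machine; assumes libgc is registered with pkg-config as
# "bdw-gc". A missing pkg-config simply counts as a failed probe.
echo "this machine   -> $(choose_gc pkg-config --exists bdw-gc)"
```

A real check would presumably also compare versions and confirm that the headers are present, since a runtime-only install is not enough to build against.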

comment:134 in reply to: ↑ 133 Changed 20 months ago by vdelecroix

Replying to embray:

By the way, #24919 provides a prototype for an easy to use (I think) generic mechanism for adding configure-time checks for system packages to use in favor of building copies of those packages for sage (just as we currently do for packages like gcc, curl, etc...).

Nice!

comment:135 Changed 19 months ago by mkoeppe

  • Status changed from needs_review to needs_work