Opened 3 years ago
Last modified 3 weeks ago
#11116 new defect
Pynac module not initialized before being used. This causes a crash on 64-bit OpenSolaris.
Reported by: | drkirkby | Owned by: | drkirkby |
---|---|---|---|
Priority: | major | Milestone: | sage-6.4 |
Component: | porting: Solaris | Keywords: | |
Cc: | burcin, jhpalmieri, robertwb, gagern, vbraun | Merged in: | |
Authors: | Reviewers: | ||
Report Upstream: | N/A | Work issues: | |
Branch: | Commit: | ||
Dependencies: | Stopgaps: |
Description (last modified by drkirkby)
Sage builds fully 64-bit on Solaris 10 (SPARC).
On 64-bit OpenSolaris or Solaris 10, the stats package R fails to build. Since R is an external program, one can just touch SAGE_ROOT/spkg/installed/r-$versions and get an almost complete Sage.
However, this 64-bit Sage crashes at startup with OpenSolaris on x86, as discussed at:
http://groups.google.com/group/sage-devel/browse_thread/thread/efc864c79fed92df?hl=en
(one would expect similar on Solaris 10 x86 and probably SPARC too).
A backtrace with gdb on a Sun Ultra 27 running OpenSolaris 06/2009 shows:
    drkirkby@hawk:~/64/sage-4.7.alpha3$ ./sage -gdb
    Building Sage on Solaris in 64-bit mode
    Creating SAGE_LOCAL/lib/sage-64.txt since it does not exist
    Detected SAGE64 flag
    Building Sage on Solaris in 64-bit mode
    ----------------------------------------------------------------------
    | Sage Version 4.7.alpha3, Release Date: 2011-03-31 |
    | Type notebook() for the GUI, and license() for information. |
    ----------------------------------------------------------------------
    **********************************************************************
    * *
    * Warning: this is a prerelease version, and it may be unstable. *
    * *
    **********************************************************************
    /export/home/drkirkby/64/sage-4.7.alpha3/local/bin/sage-ipython
    GNU gdb 6.8
    Copyright (C) 2008 Free Software Foundation, Inc.
    License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
    This is free software: you are free to change and redistribute it.
    There is NO WARRANTY, to the extent permitted by law. Type "show copying" and "show warranty" for details.
    This GDB was configured as "i386-pc-solaris2.11"...
    warning: Lowest section in /lib/amd64/libdl.so.1 is .dynamic at 00000000000000b0
    Python 2.6.4 (r264:75706, Apr 1 2011, 15:07:52)
    [GCC 4.5.0] on sunos5
    Type "help", "copyright", "credits" or "license" for more information.
    warning: Lowest section in /lib/amd64/libintl.so.1 is .dynamic at 00000000000000b0
    warning: Lowest section in /lib/amd64/libpthread.so.1 is .dynamic at 00000000000000b0
    Program received signal SIGSEGV, Segmentation fault.
    0x00000000003eb0a5 in ?? ()
    (gdb) bt
    #0 0x00000000003eb0a5 in ?? ()
    #1 0xfffffd7fff2ac5d1 in _Unwind_RaiseException_Body () from /lib/64/libc.so.1
    #2 0xfffffd7fff2ac855 in _Unwind_RaiseException () from /lib/64/libc.so.1
    #3 0xfffffd7ff91d6729 in __cxa_throw (obj=<value optimized out>, tinfo=<value optimized out>, dest=<value optimized out>) at ../../../../../gcc-4.5.0/libstdc++-v3/libsupc++/eh_throw.cc:78
    #4 0xfffffd7fcec6d5ff in GiNaC::function::find_function (name=@0x4a359b0, nparams=2) at function.cpp:1446
    #5 0xfffffd7fce9454ad in __pyx_f_4sage_8symbolic_8function_15BuiltinFunction__is_registered (__pyx_v_self=0x4a142f0) at sage/symbolic/function.cpp:7301
    #6 0xfffffd7fce950755 in __pyx_pf_4sage_8symbolic_8function_8Function___init__ (__pyx_v_self=0x4a142f0, __pyx_args=<value optimized out>, __pyx_kwds=<value optimized out>) at sage/symbolic/function.cpp:2374
    #7 0xfffffd7fffde7a70 in ?? ()
    #8 0x00000016745f5f63 in ?? ()
    #9 0x0000000004a0e5a8 in ?? ()
    #10 0x0000000004a142f0 in ?? ()
    #11 0x000000000000000b in ?? ()
    #12 0x0000000004a0e5a8 in ?? ()
    #13 0x0000000002c913e8 in ?? ()
    #14 0xfffffd7fd76c2b30 in module_members () from /export/home/drkirkby/64/sage-4.7.alpha3/local/lib//libpython2.6.so.1.0
    #15 0x0000002752657572 in ?? ()
    #16 0xfffffd7fd76d5c60 in ?? () from /export/home/drkirkby/64/sage-4.7.alpha3/local/lib//libpython2.6.so.1.0
    #17 0x2800000040520000 in ?? ()
    #18 0xfffffd7fd76d5920 in ?? () from /export/home/drkirkby/64/sage-4.7.alpha3/local/lib//libpython2.6.so.1.0
    #19 0x0000000000000005 in ?? ()
    #20 0x00000000049e4db8 in ?? ()
    ---Type <return> to continue, or q <return> to quit---
    #21 0x0000000000000000 in ?? ()
Burcin Erocal produced this Python call stack.
      File "/export/home/burcin/sage-4.7.alpha3/local/bin/sage-ipython", line 21, in <module>
        ipy_sage = IPython.Shell.start()
      File "/export/home/burcin/sage-4.7.alpha3/local/lib/python2.6/site-packages/IPython/Shell.py", line 1233, in start
        return shell(user_ns = user_ns)
      File "/export/home/burcin/sage-4.7.alpha3/local/lib/python2.6/site-packages/IPython/Shell.py", line 78, in __init__
        debug=debug,shell_class=shell_class)
      File "/export/home/burcin/sage-4.7.alpha3/local/lib/python2.6/site-packages/IPython/ipmaker.py", line 644, in make_IPython
        force_import(profmodname)
      File "/export/home/burcin/sage-4.7.alpha3/local/lib/python2.6/site-packages/IPython/ipmaker.py", line 66, in force_import
        __import__(modname)
      File "ipy_profile_sage.py", line 7, in <module>
        import sage.all_cmdline
      File "/export/home/burcin/sage-4.7.alpha3/local/lib/python2.6/site-packages/sage/all_cmdline.py", line 14, in <module>
        from sage.all import *
      File "/export/home/burcin/sage-4.7.alpha3/local/lib/python2.6/site-packages/sage/all.py", line 75, in <module>
        from sage.schemes.all import *
      File "/export/home/burcin/sage-4.7.alpha3/local/lib/python2.6/site-packages/sage/schemes/all.py", line 25, in <module>
        from hyperelliptic_curves.all import *
      File "/export/home/burcin/sage-4.7.alpha3/local/lib/python2.6/site-packages/sage/schemes/hyperelliptic_curves/all.py", line 1, in <module>
        from constructor import HyperellipticCurve
      File "/export/home/burcin/sage-4.7.alpha3/local/lib/python2.6/site-packages/sage/schemes/hyperelliptic_curves/constructor.py", line 11, in <module>
        from sage.schemes.generic.all import ProjectiveSpace
      File "/export/home/burcin/sage-4.7.alpha3/local/lib/python2.6/site-packages/sage/schemes/generic/all.py", line 4, in <module>
        from affine_space import AffineSpace, is_AffineSpace
      File "/export/home/burcin/sage-4.7.alpha3/local/lib/python2.6/site-packages/sage/schemes/generic/affine_space.py", line 24, in <module>
        import algebraic_scheme
      File "/export/home/burcin/sage-4.7.alpha3/local/lib/python2.6/site-packages/sage/schemes/generic/algebraic_scheme.py", line 143, in <module>
        import toric_variety
      File "/export/home/burcin/sage-4.7.alpha3/local/lib/python2.6/site-packages/sage/schemes/generic/toric_variety.py", line 236, in <module>
        from sage.geometry.cone import Cone, is_Cone
      File "/export/home/burcin/sage-4.7.alpha3/local/lib/python2.6/site-packages/sage/geometry/cone.py", line 174, in <module>
        from sage.combinat.posets.posets import FinitePoset
      File "/export/home/burcin/sage-4.7.alpha3/local/lib/python2.6/site-packages/sage/combinat/posets/posets.py", line 24, in <module>
        from sage.graphs.all import DiGraph
      File "/export/home/burcin/sage-4.7.alpha3/local/lib/python2.6/site-packages/sage/graphs/all.py", line 16, in <module>
        from graph_editor import graph_editor
      File "/export/home/burcin/sage-4.7.alpha3/local/lib/python2.6/site-packages/sage/graphs/graph_editor.py", line 22, in <module>
        from sagenb.misc.support import EMBEDDED_MODE
      File "/export/home/burcin/sage-4.7.alpha3/devel/sagenb/sagenb/misc/support.py", line 563, in <module>
        from sage.symbolic.all import Expression, SR
      File "/export/home/burcin/sage-4.7.alpha3/local/lib/python2.6/site-packages/sage/symbolic/all.py", line 9, in <module>
        from sage.symbolic.relation import solve, solve_mod, solve_ineq
      File "/export/home/burcin/sage-4.7.alpha3/local/lib/python2.6/site-packages/sage/symbolic/relation.py", line 314, in <module>
        from sage.calculus.calculus import maxima
      File "/export/home/burcin/sage-4.7.alpha3/local/lib/python2.6/site-packages/sage/calculus/calculus.py", line 374, in <module>
        from sage.symbolic.integration.integral import indefinite_integral, \
      File "/export/home/burcin/sage-4.7.alpha3/local/lib/python2.6/site-packages/sage/symbolic/integration/integral.py", line 129, in <module>
        indefinite_integral = IndefiniteIntegral()
      File "/export/home/burcin/sage-4.7.alpha3/local/lib/python2.6/site-packages/sage/symbolic/integration/integral.py", line 62, in __init__
        BuiltinFunction.__init__(self, "integrate", nargs=2)
Burcin writes on sage-devel:
It seems that cones.py looks for posets.py, which needs the graphs module, which initializes the graph_editor. The graph editor tries to see if it's in the notebook or the command line, but sagenb imports SR and Expression from sage.symbolic.all (line 563 of sagenb/misc/support.py). This tries to initialize the functions (integrate in this case) before pynac is initialized...
We need a better solution for making sure modules are initialized properly before anything is imported from them. I thought putting an __init__.py file in sage/symbolic/ with "import pynac" would solve the problem. However, it seems that python just ignores that file.
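For comparison, here is a minimal sketch of what standard CPython does with a package's `__init__.py`: it runs the package initializer before the body of any submodule imported from that package. The package name `demo_pkg_11116` and its files are throwaway stand-ins created in a temporary directory; nothing here touches Sage or pynac, and it does not explain why the file appeared to be ignored in the build discussed above.

```python
import importlib
import os
import sys
import tempfile


def import_order_demo():
    """Create a throwaway package and report the order its code ran in."""
    root = tempfile.mkdtemp()
    pkg_dir = os.path.join(root, "demo_pkg_11116")  # hypothetical package name
    os.mkdir(pkg_dir)

    # The package initializer records that it ran.
    with open(os.path.join(pkg_dir, "__init__.py"), "w") as f:
        f.write("ORDER = ['package __init__']\n")

    # The submodule appends to the list created by the initializer, so it
    # only works if the initializer has already been executed.
    with open(os.path.join(pkg_dir, "sub.py"), "w") as f:
        f.write("from demo_pkg_11116 import ORDER\n")
        f.write("ORDER.append('submodule body')\n")

    sys.path.insert(0, root)
    importlib.invalidate_caches()
    try:
        importlib.import_module("demo_pkg_11116.sub")
        return list(sys.modules["demo_pkg_11116"].ORDER)
    finally:
        sys.path.remove(root)


if __name__ == "__main__":
    print(import_order_demo())
```

If the same experiment behaves differently inside a Sage tree, that would point at something specific to how the package is laid out or installed rather than at Python's import rules.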
This is one of the very few issues preventing a complete 64-bit build on Solaris/OpenSolaris, so it would be nice to crack this one.
Attachments (1)
Change History (26)
comment:1 Changed 3 years ago by drkirkby
comment:2 follow-up: ↓ 3 Changed 3 years ago by fbissey
That's interesting you get this backtrace. That looks so familiar, I thought it wouldn't happen on a vanilla sage install. https://github.com/cschwan/sage-on-gentoo/issues#issue/40
comment:3 in reply to: ↑ 2 ; follow-up: ↓ 4 Changed 3 years ago by drkirkby
Replying to fbissey:
That's interesting you get this backtrace. That looks so familiar, I thought it wouldn't happen on a vanilla sage install. https://github.com/cschwan/sage-on-gentoo/issues#issue/40
Yes, it does look very similar. Robert Bradshaw had some ideas how to solve it, but it needs someone with more knowledge of the subject to sort this out.
Dave
comment:4 in reply to: ↑ 3 Changed 3 years ago by drkirkby
Replying to drkirkby:
Replying to fbissey:
That's interesting you get this backtrace. That looks so familiar, I thought it wouldn't happen on a vanilla sage install. https://github.com/cschwan/sage-on-gentoo/issues#issue/40
Yes, it does look very similar. Robert Bradshaw had some ideas how to solve it, but it needs someone with more knowledge of the subject to sort this out.
Dave
More knowledge than me I mean - not more knowledge than Robert.
Dave
comment:5 follow-up: ↓ 7 Changed 3 years ago by fbissey
Well, Martin, who reported the backtrace on GitHub (though it wasn't the first report of it), tracked it down to what he thinks is a bug in glibc (http://sources.redhat.com/bugzilla/show_bug.cgi?id=12453). I don't think Solaris uses glibc (but you can confirm that), so the fact that you get it there is very interesting. The glibc bug report above includes a test script for the problem; it would be interesting if you could run it. In sage-on-gentoo it started on amd64 and spread into x86 land later; the exact mix triggering the problem is still unknown. But if Robert can find a pure Python solution, that would be a relief. We are now giving users instructions for patching their glibc, which isn't nice.
comment:6 Changed 3 years ago by fbissey
- Cc burcin added; burchin removed
comment:7 in reply to: ↑ 5 Changed 3 years ago by drkirkby
- Description modified (diff)
Replying to fbissey:
Well Martin who reported the backtrace on github (but it wasn't the first report of it) tracked it down to what he thinks is a bug in glibc http://sources.redhat.com/bugzilla/show_bug.cgi?id=12453
I doubt this is a bug in glibc, as I'm 99% sure GCC uses the Sun C library here and not the GNU one. I think Burcin's diagnosis of the problem is more likely to be correct. I posted a comment to that effect on the Gentoo site.
There's a test program on the Redhat glibc site, but I can't get that to run. Probably a bashism that needs a newer version of bash than I have.
Dave
comment:8 follow-up: ↓ 10 Changed 3 years ago by fbissey
That's why the fact that you get it is so interesting. Patching glibc solved everyone's problem in Gentoo, but it may be that this is just a workaround. Or maybe it is more complex than that: the problem will only happen if a combination of elements is not behaving as it should.
Note that it has been observed to happen after updates that are seemingly unrelated to Sage packages, which suggests that there is indeed a complex mix leading to the failure. Fixing any one component of the mix is probably enough to prevent the problem altogether.
comment:9 Changed 3 years ago by drkirkby
Just to confirm, I created a 64-bit hello world program with gcc and used 'ldd' to see which libraries are linked:
    drkirkby@hawk:~$ gcc -m64 test.c
    drkirkby@hawk:~$ ldd a.out
            libc.so.1 => /lib/64/libc.so.1
            libm.so.2 => /lib/64/libm.so.2
    drkirkby@hawk:~$
so only the Sun libraries are being used.
I'm hoping Robert or Burcin can come up with a reliable fix for this at the Python level. At least I know that will solve the problem on OpenSolaris.
As with many things, if the behavior is not defined, one can get mysterious and not necessarily reproducible bugs. It sounds like there are a couple of things which are not defined fully.
Dave
comment:10 in reply to: ↑ 8 Changed 3 years ago by gagern
- Cc gagern added
Replying to fbissey:
the exact mix triggering the problem is still unknown.
Ingredients to the Linux/libc issue as far as I know them:
- Dynloading of a library with dependencies, so that multiple so files are loaded in response to a single dlopen call. The gtk python module is a likely candidate here.
- One of the deps must use the initial-exec flavour of thread-local variables. The proprietary nvidia OpenGL drivers (libnvidia-tls.so) do this. Iirc a line mentioning "R_X86_64_TPOFF64" in the output from "objdump -R" is a good indication for this.
- Another of the deps (from the same dlopen call) must be the place where things will later go wrong. In our case that was the C++ library, libstdc++, shipped with gcc.
- The latter dep must make use of its thread local vars (of local-dynamic flavour). In our case that was the C++ exception handling mechanism. So if no exception gets thrown, we won't encounter the issue here.
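The second ingredient can be checked mechanically. The following is a minimal sketch, assuming `objdump` from binutils is on the PATH and that a line mentioning `R_X86_64_TPOFF64` in the `objdump -R` output is the indicator described above. The helper names and the sample text are made up for illustration.

```python
import subprocess


def mentions_tpoff64(objdump_output):
    """Return True if any line of `objdump -R` output names R_X86_64_TPOFF64."""
    return any("R_X86_64_TPOFF64" in line
               for line in objdump_output.splitlines())


def library_uses_initial_exec_tls(path):
    """Run `objdump -R` on a shared library and scan its relocation records.

    Assumes a binutils `objdump` is installed; raises CalledProcessError if
    objdump itself fails on the file.
    """
    output = subprocess.check_output(["objdump", "-R", path])
    return mentions_tpoff64(output.decode("utf-8", "replace"))


# Hand-written sample resembling relocation records (not real output from
# any particular library).
SAMPLE = """\
DYNAMIC RELOCATION RECORDS
OFFSET           TYPE              VALUE
0000000000200fd8 R_X86_64_GLOB_DAT __gmon_start__
0000000000200fe0 R_X86_64_TPOFF64  some_tls_variable
"""

if __name__ == "__main__":
    print(mentions_tpoff64(SAMPLE))
```

Running `library_uses_initial_exec_tls` over every library pulled in by a single dlopen call would narrow down which dependency supplies that ingredient.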
Replying to fbissey:
Or maybe it is more complex than that: the problem will only happen if a combination of elements is not behaving as it should. Fixing any one component of the mix is probably enough to prevent the problem altogether.
I'd agree to both. So you can either fix the exception handling mechanism, or avoid the exception being thrown. Either one makes the problem vanish, although the other half of the problem might well resurface somewhere else later on.
Is there a chance of linking that OpenSolaris backtrace to actual code lines from the Sun C library, to see what's happening there?
comment:11 in reply to: ↑ description ; follow-up: ↓ 12 Changed 3 years ago by gagern
Replying to drkirkby:
Seems people using Python in combination with boost have encountered similar things before. Does their diagnosis or workaround apply to you in some way?
comment:12 in reply to: ↑ 11 Changed 3 years ago by drkirkby
Replying to gagern:
Replying to drkirkby:
Seems people using Python in combination with boost have encountered similar things before. Does their diagnosis or workaround apply to you in some way?
If I understand correctly, his problem was that he had compiled boost and his application with a different compiler from the one used for the Sun-compiled Python. But that is not the case here, as Python, boost and the rest of Sage are all built with the same compiler - Sage is not using Sun's Python.
Robert Bradshaw had some suggestions about how to solve this, but I don't know sufficient Python to implement them myself.
There are some other issues remaining when I comment out the code that's causing the crash, but I'm not sure if those are in any way related to the fact that I've commented out a section of code.
Dave
comment:13 Changed 3 years ago by fbissey
The issue from sage-on-gentoo seems to have disappeared on one of my machines. I am not completely sure whether Gentoo has already included Martin's patch or whether pynac-0.2.3, shipped in sage-4.7.1_alpha4, is responsible. It is probably worth giving 4.7.1_alpha4 a spin.
comment:14 Changed 3 years ago by burcin
The solution to this would be to import one of the objects mentioned in the chain I described lazily:
It seems that cones.py looks for posets.py, which needs the graphs module, which initializes the graph_editor. The graph editor tries to see if it's in the notebook or the command line, but sagenb imports SR and Expression from sage.symbolic.all (line 563 of sagenb/misc/support.py). This tries to initialize the functions (integrate in this case) before pynac is initialized...
This can be done by
- putting the import statement closer to the place where the object is needed in the source as opposed to the top of the file, or
- using Robert's LazyImport
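Both options can be sketched with the standard library alone. The `LazyObject` class below is a hypothetical stand-in, not Sage's LazyImport, and the `json` module only plays the role of an expensive or order-sensitive dependency such as `sage.symbolic.all`.

```python
import importlib


def encode_near_use(data):
    """Option 1: import inside the function that needs the object."""
    # The import runs on the first call, not when this module is loaded.
    from json import dumps
    return dumps(data)


class LazyObject(object):
    """Option 2: a proxy that imports its target on first use (hypothetical)."""

    def __init__(self, module_name, attr_name):
        self._module_name = module_name
        self._attr_name = attr_name
        self._target = None  # filled in by the first real use

    def _resolve(self):
        if self._target is None:
            module = importlib.import_module(self._module_name)
            self._target = getattr(module, self._attr_name)
        return self._target

    def __call__(self, *args, **kwargs):
        return self._resolve()(*args, **kwargs)

    def __getattr__(self, name):
        # Only reached for attributes not found on the proxy itself.
        if name.startswith("_"):
            raise AttributeError(name)
        return getattr(self._resolve(), name)


# Binding the name at module level no longer triggers the import.
dumps = LazyObject("json", "dumps")
```

Either way, the module that holds the name can be imported without dragging in the target module, so the target's initialization order stops mattering until the object is actually used.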
Note that quite a while ago, I wrote a script (sage-test-import in local/bin) to test for these problems. It imports each module in the Sage library individually, and checks if we got any errors. It would be a significant achievement if we can get a release to pass this test. This would go a long way towards making Sage more modular.
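The idea behind such a test can be sketched as follows. This is not the sage-test-import script itself, only an illustration of the approach: each module is imported in a fresh interpreter so that earlier imports cannot mask an ordering problem, and a crash shows up as a nonzero exit status. The function names are made up.

```python
import subprocess
import sys


def imports_cleanly(module_name):
    """Import one module in a fresh interpreter and report success.

    A Python exception and a crash of the child process (for example a
    segmentation fault) both produce a nonzero return code.
    """
    process = subprocess.Popen(
        [sys.executable, "-c", "import " + module_name],
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
    )
    process.communicate()
    return process.returncode == 0


def failing_imports(module_names):
    """Return the modules that cannot be imported on their own."""
    return [name for name in module_names if not imports_cleanly(name)]


if __name__ == "__main__":
    # Module names given on the command line are checked one by one.
    for name in failing_imports(sys.argv[1:]):
        print("FAILED: " + name)
```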
comment:15 Changed 3 years ago by drkirkby
I'll give the latest alpha a try if there's a chance the problem may have been fixed.
It would be really good to get this resolved, as basically I am having to give up an attempt at a 64-bit Solaris port due to this bug. I can't do anything until this is solved, and I don't have the knowledge to do it myself. Hence you may have noticed my absence on sage-devel. I really can't make any useful contribution to Sage until this issue is resolved.
I'll try a 64-bit build of the latest alpha. If this can be resolved, then there's a good chance of completing a 64-bit Solaris port, but without it solved, the port is effectively stalled.
Dave
Changed 3 years ago by burcin
comment:16 Changed 3 years ago by burcin
- Cc vbraun added
attachment:trac_11116-fix_imports.patch is a first attempt to clear up the circular dependencies. However, it still doesn't fix this problem.
Whatever I do, it seems that the initialization for libpynac.so is not run by the time modules in sage.functions are loaded. Is there a trick to make sure the library is initialized sooner?
I added Volker to the CC list, since he mentioned exactly this problem while working on pynac at SD31. :)
comment:17 follow-up: ↓ 18 Changed 3 years ago by vbraun
Having spent the whole day yesterday worrying about import ordering, I must say that we have way too many circular imports. This is also an issue because we currently call Cython with --disable-function-redefinition that changes the import ordering for cython files to an old and obsolete behavior. But Sage relies on it, otherwise many of its circular imports break.
It would be the wrong approach to require module X to load before module Y; this will just cause maintenance headaches down the road. Really, the problem is that module initializers do too much too early. If you start up Sage under a debugger then there are lots of non-trivial computations done in module initializers. Do we really need to construct some degree-20 polynomials every time Sage starts up? I don't think so. I would posit that
- Module initializers should never require anything outside of their own module.
- If there is the slightest doubt, make your initializer lazy.
To my mind, the problem here in this ticket is that the sage.symbolic.integration.integral module instantiates its IndefiniteIntegral class,
indefinite_integral = IndefiniteIntegral()
which in turn calls into pynac to register itself. Really there is no reason for this to be immediate, and it opens a can of worms about initialization order.
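A minimal sketch of the lazy alternative, assuming nothing about the real Sage classes: `REGISTERED` stands in for the pynac function table, and this `IndefiniteIntegral` is a toy class whose constructor has the same kind of side effect. The accessor name `get_indefinite_integral` is hypothetical.

```python
REGISTERED = []  # stand-in for the pynac function table (hypothetical)


class IndefiniteIntegral(object):
    """Toy class whose constructor registers itself, like the real one."""

    def __init__(self):
        REGISTERED.append("integrate")


_instance = None


def get_indefinite_integral():
    """Create the singleton on first request instead of at import time."""
    global _instance
    if _instance is None:
        _instance = IndefiniteIntegral()
    return _instance
```

Importing a module written this way has no side effects; registration happens on the first call to the accessor, by which time the rest of the library has finished loading.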
One could try to kludge around this and make sage.symbolic.function.Function.__init__ delay the function registration with pynac until pynac is ready, or initialize pynac explicitly. But then somebody will find a way to not only initialize a pynac function, but also use it inside a module initializer in a nontrivial way, and it would crash again.
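The delayed-registration kludge, and its limit, can be sketched like this. `FunctionRegistry` is a hypothetical stand-in for the pynac function table and does not reflect how pynac or `Function.__init__` is actually implemented.

```python
class FunctionRegistry(object):
    """Toy registry that queues registrations until it is marked ready."""

    def __init__(self):
        self.ready = False
        self._pending = []
        self.table = {}

    def register(self, name, nargs):
        if self.ready:
            # Assign the next free serial number to this function.
            self.table[(name, nargs)] = len(self.table)
        else:
            # Too early: remember the request and replay it later.
            self._pending.append((name, nargs))

    def mark_ready(self):
        self.ready = True
        pending, self._pending = self._pending, []
        for name, nargs in pending:
            self.register(name, nargs)

    def lookup(self, name, nargs):
        # Queuing cannot help here: a caller that needs the result before
        # initialization has nothing to get.
        if not self.ready:
            raise RuntimeError("registry used before initialization")
        return self.table[(name, nargs)]
```

Registration before `mark_ready()` is harmless in this sketch, but `lookup` still fails, which is the case described above where a module initializer not only creates a function but also uses it.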
comment:18 in reply to: ↑ 17 ; follow-up: ↓ 21 Changed 3 years ago by drkirkby
Replying to vbraun:
Really, the problem is that module initializers do too much too early. If you start up Sage under a debugger then there are lots of non-trivial computations done in module initializers. Do we really need to construct some degree-20 polynomials every time Sage starts up? I don't think so.
If this sort of stupidity is occurring, it is no wonder there are complaints of Sage starting slowly. For some people Sage is taking minutes to start.
If I am understanding you correctly, it seems this problem I noticed on OpenSolaris is just a symptom of a more serious implementation issue, which is the result of a lack of thought in the design of Sage.
comment:19 follow-up: ↓ 20 Changed 3 years ago by jhpalmieri
I wonder if the patch at #11043 might help.
comment:20 in reply to: ↑ 19 Changed 3 years ago by fbissey
Replying to jhpalmieri:
I wonder if the patch at #11043 might help.
It cannot hurt to try it. It is hard to know how far these imports are reaching.
comment:21 in reply to: ↑ 18 Changed 3 years ago by vbraun
Replying to drkirkby:
If this sort of stupidity is occurring, it is no wonder there are complaints of Sage starting slowly. For some people Sage is taking minutes to start.
I think that's unrelated and essentially due to hard drives or NFS. The CPU can still run circles around any filesystem access.
If I am understanding you correctly, it seems this problem I noticed on OpenSolaris is just a symptom of a more serious implementation issue, which is the result of a lack of thought in the design of Sage.
Well compared to the C++ static initializer hell this is a piece of cake ;-). We have the tools to easily make initializers lazy, we just need to use them.
#11043 doesn't touch symbolic stuff so I doubt it'll do anything.
comment:22 Changed 13 months ago by jdemeyer
- Milestone changed from sage-5.11 to sage-5.12
comment:23 Changed 7 months ago by vbraun_spam
- Milestone changed from sage-6.1 to sage-6.2
comment:24 Changed 4 months ago by vbraun_spam
- Milestone changed from sage-6.2 to sage-6.3
comment:25 Changed 3 weeks ago by vbraun_spam
- Milestone changed from sage-6.3 to sage-6.4
#7029 is semi-related to this.