#22021 closed defect (fixed)
OpenBLAS randomly crashes / deadlocks
Reported by: | vbraun | Owned by: | |
---|---|---|---|
Priority: | major | Milestone: | sage-7.5 |
Component: | linear algebra | Keywords: | random_fail |
Cc: | fbissey, jpflori, jdemeyer | Merged in: | |
Authors: | Volker Braun | Reviewers: | François Bissey |
Report Upstream: | N/A | Work issues: | |
Branch: | 0266727 (Commits) | Commit: | |
Dependencies: | Stopgaps: |
Description
OpenBLAS occasionally crashes or deadlocks, most often in src/sage/matrix/matrix_integer_dense.pyx
but also in other places. It's always some longish linear algebra computation. Examples in the comments.
Change History (42)
comment:1 Changed 4 years ago by
comment:2 Changed 4 years ago by
- Cc fbissey jpflori jdemeyer added
comment:3 Changed 4 years ago by
Here is a stack trace on a deadlocked test on OSX:
(lldb) process attach --pid 30237
Process 30237 stopped
* thread #1: tid = 0x1eea45e, 0x00007fff8c08d10a libsystem_kernel.dylib`__semwait_signal + 10, queue = 'com.apple.main-thread', stop reason = signal SIGSTOP
    frame #0: 0x00007fff8c08d10a libsystem_kernel.dylib`__semwait_signal + 10
libsystem_kernel.dylib`__semwait_signal:
->  0x7fff8c08d10a <+10>: jae 0x7fff8c08d114 ; <+20>
    0x7fff8c08d10c <+12>: movq %rax, %rdi
    0x7fff8c08d10f <+15>: jmp 0x7fff8c0877f2 ; cerror
    0x7fff8c08d114 <+20>: retq
Executable module set to "/Users/buildslave-sage/slave/sage_git/build/local/bin/python".
Architecture set to: x86_64-apple-macosx.
(lldb) bt
* thread #1: tid = 0x1eea45e, 0x00007fff8c08d10a libsystem_kernel.dylib`__semwait_signal + 10, queue = 'com.apple.main-thread', stop reason = signal SIGSTOP
  * frame #0: 0x00007fff8c08d10a libsystem_kernel.dylib`__semwait_signal + 10
    frame #1: 0x00007fff90e8f787 libsystem_pthread.dylib`pthread_join + 444
    frame #2: 0x000000010be46e9e libopenblas_sandybridgep-r0.2.19.dylib`blas_thread_shutdown_ + 238
    frame #3: 0x00007fff90e8fda7 libsystem_pthread.dylib`_pthread_fork_prepare + 85
    frame #4: 0x00007fff81da9a74 libSystem.B.dylib`libSystem_atfork_prepare + 24
    frame #5: 0x00007fff8f2a7f7c libsystem_c.dylib`fork + 12
    frame #6: 0x00007fff8f2a7310 libsystem_c.dylib`forkpty + 58
    frame #7: 0x000000010999f493 libpython2.7.dylib`posix_forkpty + 35
    frame #8: 0x00000001099514ef libpython2.7.dylib`PyEval_EvalFrameEx + 25631
    frame #9: 0x000000010995126e libpython2.7.dylib`PyEval_EvalFrameEx + 24990
    frame #10: 0x000000010995219c libpython2.7.dylib`PyEval_EvalCodeEx + 2124
    frame #11: 0x00000001098bc0dd libpython2.7.dylib`function_call + 349
    frame #12: 0x0000000109884403 libpython2.7.dylib`PyObject_Call + 67
    frame #13: 0x00000001098989ec libpython2.7.dylib`instancemethod_call + 140
    frame #14: 0x000000010b27f072 sagespawn.so`__pyx_pw_4sage_10interfaces_9sagespawn_9SageSpawn_3_spawnpty + 65 at sagespawn.c:4885
    frame #15: 0x000000010b27f031 sagespawn.so`__pyx_pw_4sage_10interfaces_9sagespawn_9SageSpawn_3_spawnpty + 123
    frame #16: 0x000000010b27efb6 sagespawn.so`__pyx_pw_4sage_10interfaces_9sagespawn_9SageSpawn_3_spawnpty(__pyx_self=<unavailable>, __pyx_args=<unavailable>, __pyx_kwds=<unavailable>) + 86
    frame #17: 0x0000000109884403 libpython2.7.dylib`PyObject_Call + 67
    frame #18: 0x000000010994de87 libpython2.7.dylib`PyEval_EvalFrameEx + 11703
    frame #19: 0x000000010995219c libpython2.7.dylib`PyEval_EvalCodeEx + 2124
    frame #20: 0x00000001099511cd libpython2.7.dylib`PyEval_EvalFrameEx + 24829
    frame #21: 0x000000010995219c libpython2.7.dylib`PyEval_EvalCodeEx + 2124
    frame #22: 0x00000001098bc0dd libpython2.7.dylib`function_call + 349
    frame #23: 0x0000000109884403 libpython2.7.dylib`PyObject_Call + 67
    frame #24: 0x00000001098989ec libpython2.7.dylib`instancemethod_call + 140
    frame #25: 0x000000010b27bb80 sagespawn.so`__Pyx_PyObject_Call(func=0x00000002403d9b90, arg=<unavailable>, kw=<unavailable>) + 64 at sagespawn.c:4885
    frame #26: 0x000000010b281fd0 sagespawn.so`__pyx_pw_4sage_10interfaces_9sagespawn_9SageSpawn_1__init__ + 1745 at sagespawn.c:1564
    frame #27: 0x000000010b2818ff sagespawn.so`__pyx_pw_4sage_10interfaces_9sagespawn_9SageSpawn_1__init__(__pyx_self=<unavailable>, __pyx_args=<unavailable>, __pyx_kwds=<unavailable>) + 111
    frame #28: 0x0000000109884403 libpython2.7.dylib`PyObject_Call + 67
    frame #29: 0x00000001098989ec libpython2.7.dylib`instancemethod_call + 140
    frame #30: 0x0000000109884403 libpython2.7.dylib`PyObject_Call + 67
    frame #31: 0x00000001099033ed libpython2.7.dylib`slot_tp_init + 109
    frame #32: 0x0000000109900d2a libpython2.7.dylib`type_call + 202
    frame #33: 0x0000000109884403 libpython2.7.dylib`PyObject_Call + 67
    frame #34: 0x000000010994eecd libpython2.7.dylib`PyEval_EvalFrameEx + 15869
    frame #35: 0x000000010995219c libpython2.7.dylib`PyEval_EvalCodeEx + 2124
    frame #36: 0x00000001098bbffc libpython2.7.dylib`function_call + 124
    frame #37: 0x0000000109884403 libpython2.7.dylib`PyObject_Call + 67
    frame #38: 0x00000001098989ec libpython2.7.dylib`instancemethod_call + 140
    frame #39: 0x0000000109884403 libpython2.7.dylib`PyObject_Call + 67
    frame #40: 0x000000010994eecd libpython2.7.dylib`PyEval_EvalFrameEx + 15869
    frame #41: 0x000000010995219c libpython2.7.dylib`PyEval_EvalCodeEx + 2124
    frame #42: 0x00000001099511cd libpython2.7.dylib`PyEval_EvalFrameEx + 24829
    frame #43: 0x000000010995219c libpython2.7.dylib`PyEval_EvalCodeEx + 2124
    frame #44: 0x00000001098bc0dd libpython2.7.dylib`function_call + 349
    frame #45: 0x0000000109884403 libpython2.7.dylib`PyObject_Call + 67
    frame #46: 0x000000010994de87 libpython2.7.dylib`PyEval_EvalFrameEx + 11703
    frame #47: 0x000000010995219c libpython2.7.dylib`PyEval_EvalCodeEx + 2124
    frame #48: 0x00000001098bc0dd libpython2.7.dylib`function_call + 349
    frame #49: 0x0000000109884403 libpython2.7.dylib`PyObject_Call + 67
    frame #50: 0x00000001098989ec libpython2.7.dylib`instancemethod_call + 140
    frame #51: 0x0000000109884403 libpython2.7.dylib`PyObject_Call + 67
    frame #52: 0x000000010994de87 libpython2.7.dylib`PyEval_EvalFrameEx + 11703
    frame #53: 0x000000010995219c libpython2.7.dylib`PyEval_EvalCodeEx + 2124
    frame #54: 0x0000000237667191 matrix_integer_dense.so`__Pyx_PyFunction_FastCallDict(func=0x0000000114422488, args=<unavailable>, nargs=<unavailable>, kwargs=0x9000000000000000) + 113 at matrix_integer_dense.c:57331
    frame #55: 0x00000002376affa6 matrix_integer_dense.so`__pyx_pw_4sage_6matrix_20matrix_integer_dense_20Matrix_integer_dense_157_singular_(__pyx_v_self=<unavailable>, __pyx_args=<unavailable>, __pyx_kwds=<unavailable>) + 6134 at matrix_integer_dense.c:48368
    frame #56: 0x000000010995169b libpython2.7.dylib`PyEval_EvalFrameEx + 26059
    frame #57: 0x000000010995219c libpython2.7.dylib`PyEval_EvalCodeEx + 2124
    frame #58: 0x00000001098bbffc libpython2.7.dylib`function_call + 124
    frame #59: 0x0000000109884403 libpython2.7.dylib`PyObject_Call + 67
    frame #60: 0x00000001098989ec libpython2.7.dylib`instancemethod_call + 140
    frame #61: 0x0000000109884403 libpython2.7.dylib`PyObject_Call + 67
    frame #62: 0x0000000109902f95 libpython2.7.dylib`slot_tp_call + 101
    frame #63: 0x0000000109884403 libpython2.7.dylib`PyObject_Call + 67
    frame #64: 0x000000010994eecd libpython2.7.dylib`PyEval_EvalFrameEx + 15869
    frame #65: 0x000000010995219c libpython2.7.dylib`PyEval_EvalCodeEx + 2124
    frame #66: 0x0000000109951373 libpython2.7.dylib`PyEval_EvalFrameEx + 25251
    frame #67: 0x000000010995219c libpython2.7.dylib`PyEval_EvalCodeEx + 2124
    frame #68: 0x00000001099511cd libpython2.7.dylib`PyEval_EvalFrameEx + 24829
    frame #69: 0x000000010995219c libpython2.7.dylib`PyEval_EvalCodeEx + 2124
    frame #70: 0x00000001099511cd libpython2.7.dylib`PyEval_EvalFrameEx + 24829
    frame #71: 0x000000010995219c libpython2.7.dylib`PyEval_EvalCodeEx + 2124
    frame #72: 0x00000001099511cd libpython2.7.dylib`PyEval_EvalFrameEx + 24829
    frame #73: 0x000000010995219c libpython2.7.dylib`PyEval_EvalCodeEx + 2124
    frame #74: 0x00000001098bbffc libpython2.7.dylib`function_call + 124
    frame #75: 0x0000000109884403 libpython2.7.dylib`PyObject_Call + 67
    frame #76: 0x00000001098989ec libpython2.7.dylib`instancemethod_call + 140
    frame #77: 0x0000000109884403 libpython2.7.dylib`PyObject_Call + 67
    frame #78: 0x0000000109902f95 libpython2.7.dylib`slot_tp_call + 101
    frame #79: 0x0000000109884403 libpython2.7.dylib`PyObject_Call + 67
    frame #80: 0x000000010994eecd libpython2.7.dylib`PyEval_EvalFrameEx + 15869
    frame #81: 0x000000010995219c libpython2.7.dylib`PyEval_EvalCodeEx + 2124
    frame #82: 0x00000001099511cd libpython2.7.dylib`PyEval_EvalFrameEx + 24829
    frame #83: 0x000000010995126e libpython2.7.dylib`PyEval_EvalFrameEx + 24990
    frame #84: 0x000000010995219c libpython2.7.dylib`PyEval_EvalCodeEx + 2124
    frame #85: 0x00000001098bbffc libpython2.7.dylib`function_call + 124
    frame #86: 0x0000000109884403 libpython2.7.dylib`PyObject_Call + 67
    frame #87: 0x00000001098989ec libpython2.7.dylib`instancemethod_call + 140
    frame #88: 0x0000000109884403 libpython2.7.dylib`PyObject_Call + 67
    frame #89: 0x00000001099033ed libpython2.7.dylib`slot_tp_init + 109
    frame #90: 0x0000000109900d2a libpython2.7.dylib`type_call + 202
    frame #91: 0x0000000109884403 libpython2.7.dylib`PyObject_Call + 67
    frame #92: 0x000000010994eecd libpython2.7.dylib`PyEval_EvalFrameEx + 15869
    frame #93: 0x000000010995126e libpython2.7.dylib`PyEval_EvalFrameEx + 24990
    frame #94: 0x000000010995219c libpython2.7.dylib`PyEval_EvalCodeEx + 2124
    frame #95: 0x00000001099511cd libpython2.7.dylib`PyEval_EvalFrameEx + 24829
    frame #96: 0x000000010995219c libpython2.7.dylib`PyEval_EvalCodeEx + 2124
    frame #97: 0x00000001099511cd libpython2.7.dylib`PyEval_EvalFrameEx + 24829
    frame #98: 0x000000010995219c libpython2.7.dylib`PyEval_EvalCodeEx + 2124
    frame #99: 0x00000001099511cd libpython2.7.dylib`PyEval_EvalFrameEx + 24829
    frame #100: 0x000000010995219c libpython2.7.dylib`PyEval_EvalCodeEx + 2124
    frame #101: 0x00000001099511cd libpython2.7.dylib`PyEval_EvalFrameEx + 24829
    frame #102: 0x000000010995219c libpython2.7.dylib`PyEval_EvalCodeEx + 2124
    frame #103: 0x00000001099511cd libpython2.7.dylib`PyEval_EvalFrameEx + 24829
    frame #104: 0x000000010995219c libpython2.7.dylib`PyEval_EvalCodeEx + 2124
    frame #105: 0x00000001099522b9 libpython2.7.dylib`PyEval_EvalCode + 25
    frame #106: 0x000000010997c2c3 libpython2.7.dylib`PyRun_FileExFlags + 291
    frame #107: 0x000000010997dd67 libpython2.7.dylib`PyRun_SimpleFileExFlags + 215
    frame #108: 0x0000000109997873 libpython2.7.dylib`Py_Main + 3443
    frame #109: 0x00007fff8f9385ad libdyld.dylib`start + 1
    frame #110: 0x00007fff8f9385ad libdyld.dylib`start + 1
comment:4 Changed 4 years ago by
Running
while true ; do ./sage -t --long src/sage/matrix/matrix_integer_dense.pyx ; done
usually hangs/crashes in about 10 - 100 tries.
comment:5 Changed 4 years ago by
Another possibility: The alarm() could mess up OpenBLAS's internal thread pool...
comment:6 Changed 4 years ago by
Here is a similar backtrace from Linux 64-bit Haswell-E (libopenblas_haswellp-r0.2.19.so)
#0  0x00007fa6cf0756bd in pthread_join () from /lib64/libpthread.so.0
#1  0x00007fa6b9dcb51e in blas_thread_shutdown_ () from /home/vbraun/Code/sage.git/local/lib/libopenblas.so.0
#2  0x00007fa6ce663aa5 in fork () from /lib64/libc.so.6
#3  0x00007fa6cec67840 in forkpty () from /lib64/libutil.so.1
#4  0x00007fa6cf3d6583 in posix_forkpty (self=<optimized out>, noargs=<optimized out>) at ./Modules/posixmodule.c:4012
#5  0x00007fa6cf391cca in call_function (oparg=<optimized out>, pp_stack=0x7ffcbbf7d5d0) at Python/ceval.c:4019
#6  PyEval_EvalFrameEx (f=f@entry=0x7f9e6fa8b830, throwflag=throwflag@entry=0) at Python/ceval.c:2681
#7  0x00007fa6cf391ca0 in fast_function (nk=<optimized out>, na=0, n=0, pp_stack=0x7ffcbbf7d6f0, func=<optimized out>) at Python/ceval.c:4121
#8  call_function (oparg=<optimized out>, pp_stack=0x7ffcbbf7d6f0) at Python/ceval.c:4056
#9  PyEval_EvalFrameEx (f=f@entry=0x7f9e72d799c0, throwflag=throwflag@entry=0) at Python/ceval.c:2681
#10 0x00007fa6cf392b2c in PyEval_EvalCodeEx (co=<optimized out>, globals=<optimized out>, locals=locals@entry=0x0, args=args@entry=0x7f9e740b9c80, argcount=2, kws=kws@entry=0x7f9e6f4b5e78, kwcount=4, defs=0x7fa6bedf0cc8, defcount=5, closure=0x0) at Python/ceval.c:3267
#11 0x00007fa6cf30c8ed in function_call (func=0x7fa6be8d8320, arg=0x7f9e740b9c68, kw=0x7f9e6fa5a910) at Objects/funcobject.c:526
#12 0x00007fa6cf2dbe33 in PyObject_Call (func=func@entry=0x7fa6be8d8320, arg=arg@entry=0x7f9e740b9c68, kw=kw@entry=0x7f9e6fa5a910) at Objects/abstract.c:2529
#13 0x00007fa6cf2ed01c in instancemethod_call (func=0x7fa6be8d8320, arg=0x7f9e740b9c68, kw=0x7f9e6fa5a910) at Objects/classobject.c:2602
#14 0x00007fa6be661ab9 in __Pyx_PyObject_Call (kw=0x7f9e6fa5a910, arg=0x7f9e6fb68990, func=0x7f9e6f8e73c0) at /home/vbraun/Sage/git/src/build/cythonized/sage/interfaces/sagespawn.c:4885
#15 __pyx_pf_4sage_10interfaces_9sagespawn_9SageSpawn_2_spawnpty (__pyx_self=<optimized out>, __pyx_v_kwds=0x7f9e6fa5a910, __pyx_v_args=<optimized out>, __pyx_v_self=<optimized out>) at /home/vbraun/Sage/git/src/build/cythonized/sage/interfaces/sagespawn.c:1797
#16 __pyx_pw_4sage_10interfaces_9sagespawn_9SageSpawn_3_spawnpty (__pyx_self=<optimized out>, __pyx_args=<optimized out>, __pyx_kwds=<optimized out>) at /home/vbraun/Sage/git/src/build/cythonized/sage/interfaces/sagespawn.c:1763
#17 0x00007fa6cf2dbe33 in PyObject_Call (func=func@entry=0x7fa6beb1aad0, arg=arg@entry=0x7f9e74261560, kw=kw@entry=0x7f9e6f4e0398) at Objects/abstract.c:2529
#18 0x00007fa6cf38cede in ext_do_call (nk=<optimized out>, na=2, flags=<optimized out>, pp_stack=0x7ffcbbf7dc30, func=0x7fa6beb1aad0) at Python/ceval.c:4348
#19 PyEval_EvalFrameEx (f=f@entry=0x7f9e72d79580, throwflag=throwflag@entry=0) at Python/ceval.c:2720
#20 0x00007fa6cf392b2c in PyEval_EvalCodeEx (co=<optimized out>, globals=<optimized out>, locals=locals@entry=0x0, args=<optimized out>, argcount=argcount@entry=5, kws=0x7f9e72d2c6c8, kwcount=0, defs=0x7fa6bed85068, defcount=3, closure=0x0) at Python/ceval.c:3267
comment:7 Changed 4 years ago by
Probably relevant: http://savannah.gnu.org/bugs/?47400
comment:8 Changed 4 years ago by
I am testing the OMP_NUM_THREADS=1 thing. It seems to apply even if you haven't compiled OpenBLAS with OpenMP. I have done quite a few loops without incident so far.
comment:9 Changed 4 years ago by
OMP_NUM_THREADS=1 seems effective in suppressing the problem, so it is probably a related issue. The test also runs quicker and uses fewer resources.
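For anyone trying to reproduce this, the variable can be tried per-command without touching the rest of the shell session. A minimal sketch (the Sage invocation from comment 4 is shown as a comment, since it assumes a Sage checkout):

```shell
# Real-world use (assumes a Sage checkout; not run here):
#   OMP_NUM_THREADS=1 ./sage -t --long src/sage/matrix/matrix_integer_dense.pyx
# Demonstration that the prefix form scopes the variable to the child only:
OMP_NUM_THREADS=1 sh -c 'echo "child: $OMP_NUM_THREADS"'
```

This avoids exporting the variable globally while testing whether it suppresses the hang.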
comment:10 Changed 4 years ago by
- Branch set to u/vbraun/openblas_randomly_crashes
comment:11 Changed 4 years ago by
- Commit set to 27f412b65b7b13ec908eebec9f26d7036a374174
comment:12 Changed 4 years ago by
But acceptable as long as we don't properly use OpenBLAS threading in Sage, if that ever happens. IIRC even LinBox assumes (or assumed, and now enforces at runtime) that OpenBLAS is single-threaded to get optimal performance.
comment:13 follow-up: ↓ 15 Changed 4 years ago by
Maybe setting OMP_NUM_THREADS by default to 1 at runtime would be less brutal?
comment:14 Changed 4 years ago by
See #21323 where a related discussion happened.
comment:15 in reply to: ↑ 13 Changed 4 years ago by
Replying to jpflori:
Maybe setting OMP_NUM_THREADS by default to 1 at runtime would be less brutal?
More complicated and error-prone - by that I mean someone will mess it up.
comment:16 Changed 4 years ago by
Doing it for each call from the code doesn't scale very well: you would have to find all the call sites and fix the non-Sage code, like cvxopt and scipy, as well. Good luck.
comment:17 follow-up: ↓ 18 Changed 4 years ago by
I mean setting it globally in sage-env...
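What a global default in sage-env might look like, as a hedged sketch (this is a hypothetical fragment, not the actual sage-env code): respect a value the user has already exported, otherwise default to one thread.

```shell
# Hypothetical sage-env-style fragment: set a default, never clobber the user.
unset OMP_NUM_THREADS   # only for a reproducible demo; sage-env would not do this
if [ -z "${OMP_NUM_THREADS:-}" ]; then
    OMP_NUM_THREADS=1
    export OMP_NUM_THREADS
fi
echo "OMP_NUM_THREADS=$OMP_NUM_THREADS"
```

The guard means a user who deliberately exports OMP_NUM_THREADS=8 before starting Sage keeps their setting.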
comment:18 in reply to: ↑ 17 Changed 4 years ago by
comment:19 follow-up: ↓ 20 Changed 4 years ago by
So, any conclusion? We are currently in pretty bad shape as far as running doctests is concerned; about 50% of the time the testsuite fails for me.
comment:20 in reply to: ↑ 19 Changed 4 years ago by
Replying to vbraun:
So, any conclusion? We are currently in pretty bad shape as far as running doctests is concerned; about 50% of the time the testsuite fails for me.
No bulletproof way, I guess. It can always be tampered with, one way or the other. Putting it in sage-env has the advantage that it can be picked up by distros, while they may not be willing to package their blas/lapack (whichever it is) single-threaded. From my sage-on-distro point of view, the only difficulty is when someone imports sage from python rather than starting the sage script (yes, you can do that; env.py makes that possible - _I_ made sure of that).
So after bashing it, I will say that sage-env offers the most flexibility - and flexibility always cuts both ways.
comment:21 follow-up: ↓ 22 Changed 4 years ago by
Globally setting OMP_NUM_THREADS=1 will affect all OpenMP programs, not just OpenBLAS. If that's what you really want then I'm fine with it, just pointing out the obvious.
comment:22 in reply to: ↑ 21 Changed 4 years ago by
Replying to vbraun:
Globally setting OMP_NUM_THREADS=1 will affect all OpenMP programs, not just OpenBLAS. If that's what you really want then I'm fine with it, just pointing out the obvious.
Thanks for pointing out the obvious. I guess your branch has the fewest side effects on Sage as a whole.
comment:23 Changed 4 years ago by
Is that a positive review? ;-)
comment:24 Changed 4 years ago by
- Reviewers set to François Bissey
- Status changed from new to needs_review
I was waiting for you to fill everything in and put it in "needs_review" but what the heck. Done!
comment:26 Changed 4 years ago by
- Status changed from positive_review to needs_work
Hmm fails on OSX
16387[openblas-0.2.19.p0] gfortran -m128bit-long-double -Wall -m64 -L/Users/buildslave-sage/slave/sage_git/build/local/lib -Wl,-rpath,/Users/buildslave-sage/slave/sage_git/build/local/lib -o sblat3 sblat3.o ../libopenblas_sandybridgep-r0.2.19.a -lpthread -lgfortran -lpthread -lgfortran
16388[openblas-0.2.19.p0] ld: file too small (length=0) file '../libopenblas_sandybridgep-r0.2.19.a' for architecture x86_64
16389[openblas-0.2.19.p0] collect2: error: ld returned 1 exit status
16390[openblas-0.2.19.p0] make[4]: *** [cblat1] Error 1
comment:27 Changed 4 years ago by
Looks like a test program problem again. I will investigate ASAP.
comment:28 Changed 4 years ago by
Hmm, I cannot reproduce it locally; my link line is slightly different:
gfortran -m128bit-long-double -Wall -m64 -L/Users/fbissey/build/sage/local/lib -Wl,-rpath,/Users/fbissey/build/sage/local/lib -o sblat3 sblat3.o ../libopenblas_haswell-r0.2.19.a -lgfortran -lgfortran
Notably, I don't have -lpthread, which is suspicious. Was this a "binary" build?
comment:29 Changed 4 years ago by
No, just a normal (incremental) build
comment:30 Changed 4 years ago by
There are other suspicious bits in the build log,
gcc -O2 -DMAX_STACK_ALLOC=2048 -DEXPRECISION -m128bit-long-double -Wall -m64 -DF_INTERFACE_GFORT -fPIC -DNO_WARMUP -DMAX_CPU_NUMBER=4 -DASMNAME=_ -DASMFNAME=__ -DNAME=_ -DCNAME= -DCHAR_NAME=\"_\" -DCHAR_CNAME=\"\" -DNO_AFFINITY -I.. -all_load -headerpad_max_install_names -install_name "/Users/fbissey/build/sage/local/var/tmp/sage/build/openblas-0.2.19/src/exports/../libopenblas_haswell-r0.2.19.dylib" -dynamiclib -o ../libopenblas_haswell-r0.2.19.dylib ../libopenblas_haswell-r0.2.19.a -Wl,-exported_symbols_list,osx.def -L/Users/fbissey/build/sage/local/lib -L/Users/fbissey/build/sage/local/lib/gcc/x86_64-apple-darwin16.1.0/5.4.0 -L/Users/fbissey/build/sage/local/lib/gcc/x86_64-apple-darwin16.1.0/5.4.0/../../.. -L/Users/fbissey/build/sage/local/lib -L/Users/fbissey/build/sage/local/lib/gcc/x86_64-apple-darwin16.1.0/5.4.0 -L/Users/fbissey/build/sage/local/lib/gcc/x86_64-apple-darwin16.1.0/5.4.0/../../.. -lgfortran -lSystem -lquadmath -lm -lSystem -lgfortran -lSystem -lquadmath -lm -lSystem
ld: warning: object file (/Users/fbissey/build/sage/local/lib/gcc/x86_64-apple-darwin16.1.0/5.4.0/libgcc.a(_muldi3.o)) was built for newer OSX version (10.9) than being linked (10.6)
ld: warning: object file (/Users/fbissey/build/sage/local/lib/gcc/x86_64-apple-darwin16.1.0/5.4.0/libgcc.a(_negdi2.o)) was built for newer OSX version (10.9) than being linked (10.6)
....
I get those after the tests are run and before the install (so it happens after your error). I wonder whether the problem they are alluding to is fatal in your build. It certainly merits investigation. Also, what is the exact version of OS X involved?
comment:31 Changed 4 years ago by
It looks like OpenBLAS wants to build compatible with 10.6. From Makefile.system:
ifeq ($(OSNAME), Darwin)
export MACOSX_DEPLOYMENT_TARGET=10.6
MD5SUM = md5 -r
endif
We currently set MACOSX_DEPLOYMENT_TARGET to the system value or 10.9, whichever is lower. I don't think OpenBLAS should override an external setting of that kind.
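The "system value or 10.9, whichever is lower" rule can be expressed in shell roughly like this (a sketch; the variable names are made up and the real logic lives in Sage's build scripts):

```shell
# Pick the lower of the machine's OS version and the 10.9 cap (names hypothetical).
system_target=10.12    # e.g. what `sw_vers -productVersion` might report
capped=$(printf '%s\n' "$system_target" 10.9 | sort -t. -k1,1n -k2,2n | head -n1)
echo "MACOSX_DEPLOYMENT_TARGET=$capped"
```

The field-wise numeric sort matters: a plain lexicographic comparison would wrongly order 10.12 before 10.9.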
comment:32 follow-up: ↓ 33 Changed 4 years ago by
Sorry, I don't have much sage time these days.
The correct env variable is OPENBLAS_NUM_THREADS.
See:
comment:33 in reply to: ↑ 32 Changed 4 years ago by
Replying to jpflori:
Sorry, I don't have much sage time these days. The correct env variable is OPENBLAS_NUM_THREADS. See:
That is the right variable for runtime, not build time. Volker got it right according to the same link. However, for some reason I think threads were not completely disabled in his build.
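To make the runtime/build-time distinction concrete, a hedged sketch (the make variable is the one documented by OpenBLAS's build system; this demo only shows the environment-variable side):

```shell
# Runtime: caps the thread pool of an already-built multithreaded OpenBLAS.
OPENBLAS_NUM_THREADS=1 sh -c 'echo "runtime cap: $OPENBLAS_NUM_THREADS"'
# Build time: threading is decided by make variables instead, e.g.
#   make USE_THREAD=0    # build OpenBLAS single-threaded outright
```

A library built with USE_THREAD=0 has no thread pool at all, so the runtime variable becomes irrelevant for it.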
comment:34 Changed 4 years ago by
I cannot figure out where the -lpthread originates in your build, Volker.
comment:35 Changed 4 years ago by
I think I found where you get pthread from, and that may be the root cause. We may have a "shell" accident depending on the version. It is still rather strange that I cannot reproduce it.
comment:36 Changed 4 years ago by
- Branch changed from u/vbraun/openblas_randomly_crashes to u/fbissey/openblas_randomly_crashes
- Commit changed from 27f412b65b7b13ec908eebec9f26d7036a374174 to 026672777b73c63b58800135796caa3981faf924
comment:37 Changed 4 years ago by
- Status changed from needs_work to needs_review
Two of the patches I have added to this branch have now been upstreamed. Only the last one, about the SMP variable use in ifdef, has not been submitted yet. That one cures the fact that you get -lpthread when you shouldn't. It feels like something other than a recent bash is used on that machine.
comment:38 Changed 4 years ago by
- Status changed from needs_review to positive_review
comment:39 Changed 4 years ago by
Followup at #22100
comment:40 Changed 4 years ago by
- Branch changed from u/fbissey/openblas_randomly_crashes to 026672777b73c63b58800135796caa3981faf924
- Resolution set to fixed
- Status changed from positive_review to closed
comment:41 Changed 3 years ago by
- Commit 026672777b73c63b58800135796caa3981faf924 deleted
Apparently this still happens sometimes in sage 8.1.beta6, I created #23933 for this.
comment:42 Changed 2 years ago by
You discarded the option to set OMP_NUM_THREADS=1. What about OPENBLAS_NUM_THREADS=1? This came up in #26118 (sage -tp does not scale to many cores).
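The two variables differ in scope, which is why the question matters for a parallel doctest run. A sketch of the scoping (per the OpenBLAS documentation, OPENBLAS_NUM_THREADS takes precedence over OMP_NUM_THREADS for a pthread build; this demo only shows that the narrower variable leaves OMP_NUM_THREADS alone):

```shell
# OPENBLAS_NUM_THREADS caps only OpenBLAS; OMP_NUM_THREADS would cap every
# OpenMP program in the session. Here only the narrower knob is set.
env -u OMP_NUM_THREADS OPENBLAS_NUM_THREADS=1 \
  sh -c 'echo "OPENBLAS=${OPENBLAS_NUM_THREADS} OMP=${OMP_NUM_THREADS:-unset}"'
```

With `sage -tp N`, each of the N worker processes inherits such variables, so a single-thread cap keeps the total thread count at roughly N instead of N times the core count.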