Opened 3 years ago

Last modified 22 months ago

#22822 new defect

openblas from Cygwin's package deadloops after fork

Reported by: embray Owned by:
Priority: minor Milestone: sage-wishlist
Component: porting: Cygwin Keywords:
Cc: Merged in:
Authors: Reviewers:
Report Upstream: Fixed upstream, but not in a stable release. Work issues:
Branch: Commit:
Dependencies: #24638 Stopgaps:

Description (last modified by embray)

After some pain and headscratching I see what's going on here. Normally when I build Sage for Cygwin I use Cygwin's system liblapack0 package using SAGE_ATLAS_LIB and selecting --with-blas=atlas at configure time.

The liblapack0 package provides a BLAS implementation--the Netlib reference implementation.In any case it works but I haven't done any performance comparisons.

When you install the liblapack0 package it installs the cygblas-0.dll in /usr/lib/lapack as well as an /etc/profile.d/ script that appends /usr/lib/lapack to $PATH so that any cygblas-0.dll is found there.

However, at some point I installed the Cygwin R package, which depends on Cygwin's openblas package which is actually the default BLAS on Cygwin (that is, it installs cygblas-0.dll to /bin. Since this comes earlier on the path, this is the BLAS that gets used if it's installed. That would be fine, except there is a problem with the package. It is not configured to use pthreads, and instead uses OpenBLAS's Windows threading driver. That works fine until you fork a process, at which case the entire threading state becomes invalid, but since the Windows threading driver doesn't know about fork it just carries on, and doesn't have enough error-handling to deal properly with this scenario. Any attempt to perform a multi-threaded operation after a fork goes into a dead wait for a thread that doesn't exist to become available. This is similar to the issue I described in #22694, though more severe since it just locks, rather than crashes.

This can be avoided a number of ways (I will also post something about this on the wiki):

  1. For now, definitely just don't install Cygwin's openblas package.
  2. Cygwin's openblas package should be fixed to be built with pthread support; I don't know why it isn't but I will report this...
  3. OpenBLAS could be fixed upstream to recognize building on Cygwin, and install post-fork handlers even when using the Windows threading driver (similar to what libgc does).
  4. Build either ATLAS or OpenBLAS in Sage and don't use the system package--in my experience this has generally worked too, though I haven't tested either one extensively. I need to do more work with both and see what issues arise. This option is certainly fine too, but it saves a lot of development time to use the system package.

Upstream PR for OpenBLAS: https://github.com/xianyi/OpenBLAS/pull/1450

Discussion on Cygwin mailing list: https://cygwin.com/ml/cygwin/2018-02/msg00057.html

Change History (7)

comment:1 Changed 2 years ago by embray

  • Description modified (diff)

comment:2 Changed 2 years ago by embray

  • Description modified (diff)
  • Report Upstream changed from Not yet reported upstream; Will do shortly. to Reported upstream. No feedback yet.

comment:3 Changed 2 years ago by embray

  • Description modified (diff)

Finally also brought this up on Cygwin. I meant to do that a long time ago, but it helps to have an actual patch to discuss.

comment:4 Changed 2 years ago by embray

  • Description modified (diff)

comment:5 Changed 2 years ago by jdemeyer

  • Dependencies set to #24638

Beware of conflicts with #24638.

comment:6 Changed 2 years ago by embray

AFIACT there is no conflict.

comment:7 Changed 22 months ago by embray

  • Report Upstream changed from Reported upstream. No feedback yet. to Fixed upstream, but not in a stable release.
Note: See TracTickets for help on using tickets.