Opened 5 years ago

Last modified 17 months ago

#22822 new defect

openblas from Cygwin's package deadloops after fork

Reported by: embray Owned by:
Priority: minor Milestone: sage-wishlist
Component: porting: Cygwin Keywords:
Cc: Merged in:
Authors: Reviewers:
Report Upstream: Fixed upstream, but not in a stable release. Work issues:
Branch: Commit:
Dependencies: #24638 Stopgaps:

Status badges

Description (last modified by embray)

After some pain and headscratching I see what's going on here. Normally when I build Sage for Cygwin I use Cygwin's system liblapack0 package using SAGE_ATLAS_LIB and selecting --with-blas=atlas at configure time.

The liblapack0 package provides a BLAS implementation--the Netlib reference implementation.In any case it works but I haven't done any performance comparisons.

When you install the liblapack0 package it installs the cygblas-0.dll in /usr/lib/lapack as well as an /etc/profile.d/ script that appends /usr/lib/lapack to $PATH so that any cygblas-0.dll is found there.

However, at some point I installed the Cygwin R package, which depends on Cygwin's openblas package which is actually the default BLAS on Cygwin (that is, it installs cygblas-0.dll to /bin. Since this comes earlier on the path, this is the BLAS that gets used if it's installed. That would be fine, except there is a problem with the package. It is not configured to use pthreads, and instead uses OpenBLAS's Windows threading driver. That works fine until you fork a process, at which case the entire threading state becomes invalid, but since the Windows threading driver doesn't know about fork it just carries on, and doesn't have enough error-handling to deal properly with this scenario. Any attempt to perform a multi-threaded operation after a fork goes into a dead wait for a thread that doesn't exist to become available. This is similar to the issue I described in #22694, though more severe since it just locks, rather than crashes.

This can be avoided a number of ways (I will also post something about this on the wiki):

  1. For now, definitely just don't install Cygwin's openblas package.
  2. Cygwin's openblas package should be fixed to be built with pthread support; I don't know why it isn't but I will report this...
  3. OpenBLAS could be fixed upstream to recognize building on Cygwin, and install post-fork handlers even when using the Windows threading driver (similar to what libgc does).
  4. Build either ATLAS or OpenBLAS in Sage and don't use the system package--in my experience this has generally worked too, though I haven't tested either one extensively. I need to do more work with both and see what issues arise. This option is certainly fine too, but it saves a lot of development time to use the system package.

Upstream PR for OpenBLAS: https://github.com/xianyi/OpenBLAS/pull/1450

Discussion on Cygwin mailing list: https://cygwin.com/ml/cygwin/2018-02/msg00057.html

Change History (8)

comment:1 Changed 5 years ago by embray

  • Description modified (diff)

comment:2 Changed 5 years ago by embray

  • Description modified (diff)
  • Report Upstream changed from Not yet reported upstream; Will do shortly. to Reported upstream. No feedback yet.

comment:3 Changed 5 years ago by embray

  • Description modified (diff)

Finally also brought this up on Cygwin. I meant to do that a long time ago, but it helps to have an actual patch to discuss.

comment:4 Changed 5 years ago by embray

  • Description modified (diff)

comment:5 Changed 5 years ago by jdemeyer

  • Dependencies set to #24638

Beware of conflicts with #24638.

comment:6 Changed 5 years ago by embray

AFIACT there is no conflict.

comment:7 Changed 4 years ago by embray

  • Report Upstream changed from Reported upstream. No feedback yet. to Fixed upstream, but not in a stable release.

comment:8 Changed 17 months ago by embray

FWIW I *think* this issue is fixed. The bug in OpenBLAS has been long-since fixed, and the Cygwin package has been updated as well.

What we don't have yet is getting Sage built against the system OpenBLAS. Once that's done and confirmed working, we can definitely close this issue.

Note: See TracTickets for help on using tickets.