Opened 3 years ago

Closed 3 years ago

Last modified 3 years ago

#25921 closed defect (fixed)

aborted tests on macOS 10.13.6

Reported by: gh-bryangingechen Owned by:
Priority: major Milestone: sage-8.6
Component: interfaces: optional Keywords:
Cc: Merged in:
Authors: Jeroen Demeyer Reviewers: John Palmieri
Report Upstream: N/A Work issues:
Branch: 688a288 (Commits, GitHub, GitLab) Commit: 688a288e16663fe2c4a881baa510ed797a46efa1
Dependencies: Stopgaps:

Status badges

Description

With 8.3rc2, testing of sage.doctest.external and sage.combinat.designs.ext_rep fail with similar messages.

From googling, I found this comment which https://github.com/ansible/ansible/issues/34056#issuecomment-352862252 but I do not have the knowledge to tell whether this is the same issue. Excerpts from output below; full logs attached.

sage -t --long src/sage/doctest/external.py
Running doctests with ID 2018-07-24-23-58-33-c07cfdbe.
Git branch: develop
Using --optional=mpir,python2,sage
Doctesting 1 file.
sage -t --long --warn-long 183.2 src/sage/doctest/external.py
    Killed due to abort
**********************************************************************
Tests run before process (pid=69718) failed:

[ snip ]

sage: sage.doctest.external._lookup('internet') # random ## line 302 ##
objc[69718]: +[__NSPlaceholderDate initialize] may have been in progress in another thread when fork() was called.
objc[69718]: +[__NSPlaceholderDate initialize] may have been in progress in another thread when fork() was called. We cannot safely call it or ignore it in the fork() child process. Crashing instead. Set a breakpoint on objc_initializeAfterForkError to debug.
------------------------------------------------------------------------
0   signals.so                          0x000000010e5956e8 print_backtrace + 40
1   ???                                 0x6974757469747362 0x0 + 7598827614126764898
------------------------------------------------------------------------
Unhandled SIGABRT: An abort() occurred.
This probably occurred because a *compiled* module has a bug
in it and is not properly wrapped with sig_on(), sig_off().
Python will now terminate.
------------------------------------------------------------------------

**********************************************************************
----------------------------------------------------------------------
sage -t --long --warn-long 183.2 src/sage/doctest/external.py  # Killed due to abort
----------------------------------------------------------------------
Total time for all tests: 1.6 seconds
    cpu time: 0.0 seconds
    cumulative wall time: 0.0 seconds
sage -t --long src/sage/combinat/designs/ext_rep.py
Running doctests with ID 2018-07-24-23-58-47-07be9643.
Git branch: develop
Using --optional=mpir,python2,sage
Doctesting 1 file.
sage -t --long --warn-long 183.2 src/sage/combinat/designs/ext_rep.py
    Killed due to abort
**********************************************************************
Tests run before process (pid=69771) failed:

[ snip ]

sage: from sage.combinat.designs import ext_rep ## line 549 ##
sage: file_loc = ext_rep.dump_to_tmpfile(ext_rep.v2_b2_k2_icgsa) ## line 550 ##
sage: proc = ext_rep.XTreeProcessor() ## line 551 ##
sage: s = ext_rep.open_extrep_url("file://" + file_loc) ## line 552 ##
objc[69771]: +[__NSPlaceholderDate initialize] may have been in progress in another thread when fork() was called.
objc[69771]: +[__NSPlaceholderDate initialize] may have been in progress in another thread when fork() was called. We cannot safely call it or ignore it in the fork() child process. Crashing instead. Set a breakpoint on objc_initializeAfterForkError to debug.
------------------------------------------------------------------------
0   signals.so                          0x0000000107ab66e8 print_backtrace + 40
1   ???                                 0x6974757469747362 0x0 + 7598827614126764898
------------------------------------------------------------------------
Unhandled SIGABRT: An abort() occurred.
This probably occurred because a *compiled* module has a bug
in it and is not properly wrapped with sig_on(), sig_off().
Python will now terminate.
------------------------------------------------------------------------

**********************************************************************
----------------------------------------------------------------------
sage -t --long --warn-long 183.2 src/sage/combinat/designs/ext_rep.py  # Killed due to abort
----------------------------------------------------------------------
Total time for all tests: 0.9 seconds
    cpu time: 0.0 seconds
    cumulative wall time: 0.0 seconds

Attachments (2)

crash_sage.combinat.designs.ext_rep.log (2.9 KB) - added by gh-bryangingechen 3 years ago.
crash_sage.doctest.external.log (4.7 KB) - added by gh-bryangingechen 3 years ago.

Download all attachments as: .zip

Change History (47)

Changed 3 years ago by gh-bryangingechen

Changed 3 years ago by gh-bryangingechen

comment:1 follow-up: Changed 3 years ago by jhpalmieri

I think this was caused by #25092. See comment 11 there. Setting the environment variable OBJC_DISABLE_INITIALIZE_FORK_SAFETY to YES works for me – that is, if I do this before running doctests, I don't get these failures.

comment:2 Changed 3 years ago by jhpalmieri

comment:3 in reply to: ↑ 1 Changed 3 years ago by gh-bryangingechen

Thanks! I missed those comments when I searched. I can confirm that setting the environment variable fixes these tests. Should I close this ticket as a duplicate then?

comment:4 follow-up: Changed 3 years ago by jhpalmieri

Well, is setting an environment variable a good enough solution? I don't know what the underlying problem is, and whether it's worth pursuing, so I think we should keep this open to see if anyone can make any progress on it.

comment:5 Changed 3 years ago by embray

There is also an open issue about this in Python: https://bugs.python.org/issue33725

comment:6 in reply to: ↑ 4 Changed 3 years ago by jdemeyer

Replying to jhpalmieri:

I don't know what the underlying problem is

I'd love to read a good explanation. All over the web you'll find different partial explanations about the problem and what the environment variable does but never with a lot of technical detail.

For example, there is talk about "multi-threaded applications" but what exactly is a multi-threaded application?

comment:7 follow-up: Changed 3 years ago by embray

I think the talk about "multi-threaded applications" is perhaps mostly a red-herring.

The issue is that Objective-C runtime functions are not allowed to be called between the fork() and exec() in a child process--if you do the runtime calls abort(). The OBJC_DISABLE_INITIALIZE_FORK_SAFETY=yes environment variable disables this, but then you really better be sure that you've registered a pre-fork handler (e.g. with pthread_atfork) that ensures the C runtime is in a safe state (e.g. no locks are held, etc.) since it won't guarantee this on its own.

comment:8 in reply to: ↑ 7 Changed 3 years ago by jdemeyer

Replying to embray:

I think the talk about "multi-threaded applications" is perhaps mostly a red-herring.

The issue is that Objective-C runtime functions are not allowed to be called between the fork() and exec() in a child process

So they basically broke fork()?

comment:9 follow-up: Changed 3 years ago by embray

That's basically my understanding, though upon rereading the linked article, it does pertain to multi-threaded applications insofar as this is only a problem when forking from a multi-threaded process. If there's just a single process I think it doesn't matter.

It may not be entirely wrong:

A process shall be created with a single thread. If a multi-threaded process calls fork(), the new process shall contain a replica of the calling thread and its entire address space, possibly including the states of mutexes and other resources. Consequently, to avoid errors, the child process may only execute async-signal-safe operations until such time as one of the exec functions is called. [THR] [Option Start] Fork handlers may be established by means of the pthread_atfork() function in order to maintain application invariants across fork() calls.

If the runtime itself can't guarantee that "the child process [will] only execute async-signal-safe operations until such time as one of the exec functions is called" then it's safer for it to blow up immediately than maybe work, or maybe result in even harder to debug errors.

I'm not saying it doesn't suck, but it seems to make some sense...

comment:10 Changed 3 years ago by embray

To put another way, because Objective-C doesn't have an equivalent of a GIL, forking a multi-threaded application is dangerous and tricky, because rather than make sure that their library actually ensures it's in a sane state before forking, the Apple developers seem to have thrown their hands up and made it everyone else's problem (glibc sometimes has bugs like this as well, but actually tries to fix them...)

comment:11 in reply to: ↑ 9 ; follow-up: Changed 3 years ago by jdemeyer

I'd still like to know under exactly which circumstances the objc error occurs (and none of the articles I saw answers that question).

comment:12 Changed 3 years ago by jhpalmieri

When I first saw this, I thought that I had traced it to #25092. Can you deduce anything from that? (What changed in cysignals 1.7.0 that could have led to this?) Is there anything I can do to provide more information about Sage's situation in particular, since my computers give me these errors?

comment:13 in reply to: ↑ 11 ; follow-up: Changed 3 years ago by embray

Replying to jdemeyer:

I'd still like to know under exactly which circumstances the objc error occurs (and none of the articles I saw answers that question).

I think the reason for that is that there are probably too many to list, and it's impossible to say exactly since it's all closed source.

comment:14 Changed 3 years ago by embray

In these examples it's happening because something calls __NSPlaceholderDate.initialize whatever that is. So we'd want to go into GDB and find out why that's being called, because it seems to be a common thread (no pun intended) in this specific case.

comment:15 Changed 3 years ago by embray

Looks like it probably stems from somewhere in urlopen.

Is it only these tests that fail?

comment:16 Changed 3 years ago by jhpalmieri

Yes, only these.

comment:17 in reply to: ↑ 13 Changed 3 years ago by jdemeyer

Replying to embray:

I think the reason for that is that there are probably too many to list, and it's impossible to say exactly since it's all closed source.

Yes and that's really bad. How can we fix a problem if nobody is able to say exactly what the problem is?

comment:18 follow-up: Changed 3 years ago by jdemeyer

If setting OBJC_DISABLE_INITIALIZE_FORK_SAFETY=yes fixes this, then we at least have an easy workaround.

Should we do that (at least as a temporary solution) in Sage 8.3?

comment:19 in reply to: ↑ 18 ; follow-up: Changed 3 years ago by jhpalmieri

Replying to jdemeyer:

If setting OBJC_DISABLE_INITIALIZE_FORK_SAFETY=yes fixes this, then we at least have an easy workaround.

This only needs to be done when running doctests, by the way. I can't get the same commands to fail when run in Sage interactively.

comment:20 follow-up: Changed 3 years ago by jhpalmieri

From this point of view, we could just change the doctests:

  • src/sage/doctest/external.py

    diff --git a/src/sage/doctest/external.py b/src/sage/doctest/external.py
    index 9cbc42b178..de0b0e1e2f 100644
    a b def _lookup(software): 
    299299   
    300300    EXAMPLES::
    301301
    302         sage: sage.doctest.external._lookup('internet') # random
     302        sage: sage.doctest.external._lookup('latex') # random
    303303        True
    304304    """
    305305    if software in external_software:
    class AvailableSoftware(object): 
    332332         'octave',
    333333         'pandoc',
    334334         'scilab']
    335         sage: 'internet' in available_software # random
     335        sage: 'latex' in available_software # random
    336336        True
    337         sage: available_software.issuperset(set(['internet','latex'])) # random
     337        sage: available_software.issuperset(set(['maple','latex'])) # random
    338338        True
    339339    """
    340340    def __init__(self):
    class AvailableSoftware(object): 
    360360        EXAMPLES::
    361361
    362362            sage: from sage.doctest.external import available_software
    363             sage: 'internet' in available_software # random
     363            sage: 'macaulay2' in available_software # random
    364364            True
    365365        """
    366366        try:
    class AvailableSoftware(object): 
    386386        EXAMPLES::
    387387
    388388            sage: from sage.doctest.external import available_software
    389             sage: available_software.issuperset(set(['internet','latex','magma'])) # random
     389            sage: available_software.issuperset(set(['gurobi','latex','magma'])) # random
    390390            True
    391391        """
    392392        for item in other:

Treating the symptom rather than the disease, but if this only manifests itself in doctests, maybe that's okay.

comment:21 in reply to: ↑ 19 ; follow-up: Changed 3 years ago by jdemeyer

Replying to jhpalmieri:

I can't get the same commands to fail when run in Sage interactively.

Well, it seems that you need to fork() which would be the case when using multiprocessing for example.

comment:22 in reply to: ↑ 20 Changed 3 years ago by jdemeyer

Replying to jhpalmieri:

From this point of view, we could just change the doctests:

I'm -1 to this. As far as I know, there is nothing wrong with those doctests, so why should we change them?

comment:23 in reply to: ↑ 21 ; follow-up: Changed 3 years ago by jhpalmieri

Replying to jdemeyer:

Replying to jhpalmieri:

I can't get the same commands to fail when run in Sage interactively.

Well, it seems that you need to fork() which would be the case when using multiprocessing for example.

In combination with urlopen, as embray points out.

Replying to jdemeyer:

Replying to jhpalmieri:

From this point of view, we could just change the doctests:

I'm -1 to this. As far as I know, there is nothing wrong with those doctests, so why should we change them?

If someone can figure out a way to fix what's wrong, that's great. On the other hand, I don't think they test anything particularly compelling, since the same commands work fine interactively. So if it turns out to be easy to fix, I am happy to fix it, but if it is complicated, perhaps we could spend our efforts fixing more important things. Or maybe I'm underestimating somehow the importance of these failures.

Put another way, how is changing the doctests different, qualitatively, from doing export OBJC_DISABLE_INITIALIZE_FORK_SAFETY=yes before running doctests? Both bypass the problem without solving it.

comment:24 in reply to: ↑ 23 Changed 3 years ago by jdemeyer

Replying to jhpalmieri:

Put another way, how is changing the doctests different, qualitatively, from doing export OBJC_DISABLE_INITIALIZE_FORK_SAFETY=yes before running doctests? Both bypass the problem without solving it.

Your proposal only works around those specific doctest failures. There could easily be tests added in the future with the same problem. And I believe that the problem can also occur interactively when using multiprocessing.

Note that I didn't mean to set OBJC_DISABLE_INITIALIZE_FORK_SAFETY=yes only before running doctests, I meant doing that always (in sage-env for example). So that should work around all cases of the problem, not just the one specific test failure.

comment:25 follow-ups: Changed 3 years ago by jhpalmieri

Okay, I can see that. Are there dangers to setting OBJC_DISABLE_INITIALIZE_FORK_SAFETY=yes always? (It would be good for me to know, since I am currently setting this in my .bashrc file to avoid these doctest failures.)

comment:26 in reply to: ↑ 25 ; follow-up: Changed 3 years ago by gh-bryangingechen

Thanks for following up on this everyone.

Replying to jhpalmieri:

Okay, I can see that. Are there dangers to setting OBJC_DISABLE_INITIALIZE_FORK_SAFETY=yes always? (It would be good for me to know, since I am currently setting this in my .bashrc file to avoid these doctest failures.)

I just wanted to point out that according to your previous comment ticket:25092#comment:20 OBJC_DISABLE_INITIALIZE_FORK_SAFETY=YES is required (note the caps).

comment:27 in reply to: ↑ 26 Changed 3 years ago by jhpalmieri

Replying to gh-bryangingechen:

Thanks for following up on this everyone.

Replying to jhpalmieri:

Okay, I can see that. Are there dangers to setting OBJC_DISABLE_INITIALIZE_FORK_SAFETY=yes always? (It would be good for me to know, since I am currently setting this in my .bashrc file to avoid these doctest failures.)

I just wanted to point out that according to your previous comment ticket:25092#comment:20 OBJC_DISABLE_INITIALIZE_FORK_SAFETY=YES is required (note the caps).

Yes, you're right, it has to be YES, not yes.

comment:28 in reply to: ↑ 25 ; follow-up: Changed 3 years ago by jdemeyer

Replying to jhpalmieri:

Okay, I can see that. Are there dangers to setting OBJC_DISABLE_INITIALIZE_FORK_SAFETY=yes always?

"dangers" in which sense? It's removing a safety check, so it could be considered a bad idea. On the other hand, my impression is that this safety check is not very good so maybe it deserves to be disabled.

One remaining question to be answered: should we do this in Sage 8.3? My feeling says yes, because other people using the same OS X version might be affected by it.

comment:29 in reply to: ↑ 28 ; follow-up: Changed 3 years ago by embray

Replying to jdemeyer:

Replying to jhpalmieri:

Okay, I can see that. Are there dangers to setting OBJC_DISABLE_INITIALIZE_FORK_SAFETY=yes always?

"dangers" in which sense? It's removing a safety check, so it could be considered a bad idea. On the other hand, my impression is that this safety check is not very good so maybe it deserves to be disabled.

One remaining question to be answered: should we do this in Sage 8.3? My feeling says yes, because other people using the same OS X version might be affected by it.

I wouldn't, in general. Maybe just for running the test suite at the most, but even then it's iffy.

I'll investigate a bit more what's going on with urlopen. Perhaps there are some pre-fork or post-fork cleanups we can do to mitigate that specific case. I might need to get my hands on a machine with OSX 10.13, but not necessarily either...

comment:30 in reply to: ↑ 29 Changed 3 years ago by jdemeyer

Replying to embray:

just for running the test suite

In general, I am against testsuite-specific fixes. So I think that this is the worst possible option.

I'll investigate a bit more what's going on with urlopen. Perhaps there are some pre-fork or post-fork cleanups we can do to mitigate that specific case.

I wonder how you'll do that without information from Apple. It reminds me how annoying closed-source software is.

comment:31 Changed 3 years ago by embray

I wonder how you'll do that without information from Apple. It reminds me how annoying closed-source software is.

I don't know, we'll see. It would not be my first time reverse engineering Apple code. You do know what my old job was right...

comment:32 Changed 3 years ago by jdemeyer

ping? We are still seeing reports of people saying that they need to set OBJC_DISABLE_INITIALIZE_FORK_SAFETY=YES. I recall my suggestion to set that (always and unconditionally) in sage-env.

But before doing that, I'd like some advice from the OS X experts.

comment:33 follow-ups: Changed 3 years ago by embray

Seems like the thing to do, unfortunately. Should probably also set it in sage.env. I wonder what the status is of dealing with this issue in Python itself.

comment:34 in reply to: ↑ 33 Changed 3 years ago by jdemeyer

Replying to embray:

I wonder what the status is of dealing with this issue in Python itself.

They recently merged a PR to disable a failing test: https://github.com/python/cpython/pull/11043

Apart from that, nothing seems to be happening in CPython.

comment:35 Changed 3 years ago by jdemeyer

What bothers me is that the error message from Apple may have been in progress in another thread when fork() was called is in most cases a false positive.

The way I understand things (I certainly may be wrong), that error message protects against a crash that may happen if one thread of an application is calling fork() while another thread is executing objective-C code. Those are very precise conditions which are highly unlikely to occur in Sage (and probably Python in general): multi-threading is used in Sage, but only in low-level C libraries which wouldn't run objective-C code. Moreover, I don't think we are ever running multiple threads of Python code (typically, the C code creates threads and joins them before passing control back to Python). So any time that os.fork() is called, only a single thread is running.

But the error message triggers any time that objective-C code is called after fork(), regardless of whether threads are involved, let alone that those threads are executing objective-C code.

So it seems to me that disabling the error message is the right thing to do.

comment:36 Changed 3 years ago by embray

You can't necessarily say that for sure, because I think a number of standard posix interfaces on macOS are implemented on top of lower-level Objective-C code, as one can often see in stack traces. I remember at one point I traced exactly what function was causing this error in Sage, when checking for internet access. I don't remember exactly but I think it was something standard like gethostbyname().

But I agree, it's especially annoying that the message is printed unconditionally, and worse that the abort() happens even in the case of false positives, which as you write are going to be far more common. So setting this environment variable is the only logical choice.

In the odd case where some bizarre crash is happening on macOS, it should be understood that the case this is trying to prevent is a possibility. This is just Apple aggressively making us aware of their bug (rather than just fixing the damn bug which you'd think with nearly a trillion dollars in their pocket they could do--this isn't rocket science).

comment:37 Changed 3 years ago by jdemeyer

  • Branch set to u/jdemeyer/aborted_tests_on_macos_10_13_6

comment:38 Changed 3 years ago by jdemeyer

  • Authors set to Jeroen Demeyer
  • Commit set to 688a288e16663fe2c4a881baa510ed797a46efa1
  • Status changed from new to needs_review

New commits:

688a288Set OBJC_DISABLE_INITIALIZE_FORK_SAFETY=YES

comment:39 in reply to: ↑ 33 Changed 3 years ago by jhpalmieri

This works for me.

Replying to embray:

Should probably also set it in sage.env.

Can you explain why?

comment:40 follow-up: Changed 3 years ago by embray

Although not yet fully operative I want the sage package to eventually be able to work correctly with some sensible defaults without having to source sage-env, so the package itself should also put this in the environment if it's needed for some of its functionality to work properly.

comment:41 Changed 3 years ago by jhpalmieri

That makes sense to me. Jeroen, any comments?

comment:42 in reply to: ↑ 40 Changed 3 years ago by jdemeyer

Replying to embray:

Although not yet fully operative I want the sage package to eventually be able to work correctly with some sensible defaults without having to source sage-env, so the package itself should also put this in the environment if it's needed for some of its functionality to work properly.

Maybe, but not in this ticket. What you propose should fit in a larger discussion about how to handle environment variables. It doesn't make sense to me to special-case this one environment variable OBJC_DISABLE_INITIALIZE_FORK_SAFETY.

comment:43 Changed 3 years ago by jhpalmieri

  • Reviewers set to John Palmieri
  • Status changed from needs_review to positive_review

This fixes the problem and I think it should be merged quickly; enhancements to sage.env should be on another ticket. Separate topics should be on separate tickets unless they are inextricably entwined, and I don't think that's the case for sage.env.

comment:44 Changed 3 years ago by vbraun

  • Branch changed from u/jdemeyer/aborted_tests_on_macos_10_13_6 to 688a288e16663fe2c4a881baa510ed797a46efa1
  • Resolution set to fixed
  • Status changed from positive_review to closed

comment:45 Changed 3 years ago by embray

  • Milestone changed from sage-8.4 to sage-8.6

This tickets were closed as fixed after the Sage 8.5 release.

Note: See TracTickets for help on using tickets.