Opened 4 years ago

Closed 20 months ago

#17650 closed defect (wontfix)

alarm broken on cygwin

Reported by: gouezel Owned by: embray
Priority: major Milestone: sage-duplicate/invalid/wontfix
Component: porting: Cygwin Keywords: ppl, cygwin
Cc: vbraun, jpflori Merged in:
Authors: Sebastien Gouezel Reviewers:
Report Upstream: N/A Work issues:
Branch: u/gouezel/alarm_broken_on_cygwin (Commits) Commit: 457bfa5062b248f4c36abeda94e84ed7c0acb877
Dependencies: Stopgaps:

Description (last modified by gouezel)

On cygwin (32 and 64), the alarm mechanism is broken:

sage: alarm(0.1); sum(xrange(100000000))
4999999950000000

instead of the expected interrupt.

I traced the bug by dichotomy to a ppl component named watchdog. Its role is similar to alarm, involving signals and interrupts. I guess that, due to the peculiarities of cygwin's implementation of signals, watchdog somehow intercepts the python signals and does dot send them back.

watchdog is not an optional component of ppl, so it can not be disabled by a configure flag. However, it is disabled on systems that don't have setitimer. Cygwin has setitimer, but a simple hack of ppl's makefile can hide it, thereby disabling watchdog and fixing the sage alarm mechanism. This is done in the proposed patch.

A better fix would be to dig into the ppl and cygwin signal mechanism, of course...

Change History (20)

comment:1 Changed 4 years ago by gouezel

Should it be considered as a PPL bug that the methods are not defined in the header? Or a cython bug that this suffices to break the interrupts mechanism? Or a cygwin bug since it does not seem to be a problem on other platforms?

Last edited 4 years ago by gouezel (previous) (diff)

comment:2 Changed 4 years ago by gouezel

The PPL code seems to be correct, I am stuck on this one...

Last edited 4 years ago by gouezel (previous) (diff)

comment:3 Changed 4 years ago by gouezel

  • Report Upstream changed from N/A to Reported upstream. No feedback yet.

comment:4 Changed 4 years ago by gouezel

  • Description modified (diff)
  • Report Upstream changed from Reported upstream. No feedback yet. to N/A

comment:5 Changed 4 years ago by gouezel

  • Branch set to u/gouezel/alarm_broken_on_cygwin

comment:6 Changed 4 years ago by gouezel

  • Cc vbraun jpflori added
  • Commit set to 1a8ecd257246a869941d6407eda677a489e019fa
  • Description modified (diff)
  • Keywords ppl cygwin added
  • Status changed from new to needs_review

New commits:

1a8ecd2disable ppl watchdog on cygwin

comment:7 Changed 4 years ago by gouezel

  • Authors set to Sebastien Gouezel

comment:8 Changed 4 years ago by jdemeyer

  • Status changed from needs_review to needs_info

What I don't like about this ticket is that it's not clear that the "ppl watchdog" is not needed. What does it do and why does Sage not need it?

comment:9 follow-up: Changed 4 years ago by gouezel

Excellent question. There is essentially no documentation on this ppl watchdog that I could find. It seems to be a classical watchdog mechanism, i.e., it detects if some computation loops forever, and in this case it interrupts it automatically and resets things to a nicer state. In particular, it should not play any role in non-buggy situations.

It is not necessary to PPL (since it is disabled on some platforms, those which don't have setitimer), and I do not see how it could be relevant to sage. So, my impression is that it is safe to disable it (note that the patch only disables it on cygwin, so it will not break anything on officially supported platforms). On cygwin, with the patch, sage -t all does not show any ppl-related failure. By the way, without the patch, sage -t all hangs forever and eats all available memory, since some tests rely on alarm to be interrupted! So, in my opinion, the patch gives a definitive improvement on cygwin.

comment:10 Changed 4 years ago by gouezel

  • Status changed from needs_info to needs_review

comment:11 in reply to: ↑ 9 Changed 4 years ago by jdemeyer

  • Status changed from needs_review to needs_work

Replying to gouezel:

note that the patch only disables it on cygwin, so it will not break anything on officially supported platforms

For me, that's actually a reason to be against this patch. Either we need it and we shouldn't disable it on Cygwin, or we don't need it and we can just disable it everywhere.

comment:12 Changed 4 years ago by gouezel

I get your point. Unfortunately, I can't say for sure that watchdog is not helpful to ppl in rare situations (for instance, maybe it uses some algorithms whose termination is not guaranteed, counting on watchdog to exit bad situations), so I would rather keep watchdog wherever possible (i.e., not on cygwin, where setitimer is not fully functional).

I leave the ticket as needs_work until someone more knowledgeable on ppl steps up (or someone who really needs the patch on cygwin pushes for it)

comment:13 Changed 4 years ago by git

  • Commit changed from 1a8ecd257246a869941d6407eda677a489e019fa to 457bfa5062b248f4c36abeda94e84ed7c0acb877

Branch pushed to git repo; I updated commit sha1. New commits:

457bfa5 #17650: touch files in ppl spkg_install to avoid reconfiguration

comment:14 follow-up: Changed 3 years ago by jpflori

I'm just posting random links potentially related:

According to http://trac.sagemath.org/ticket/10039#comment:49 there used to be a --disable-watchdog configure flag, too bad it disappeared.

I guess we should open a bug on ppl bugtracker (if one exists).

Unfortunately it seems cygwin does not ship ppl anymore, so we cannot look at what they would have done. It could still be useful to post on the cygwin ml's.

@sebastien: can you provide a minimal C file reproducing the issue? Maybe something inspired in the aforementioned 2008 post can help.

comment:15 Changed 3 years ago by jdemeyer

The post does indeed look related. I don't understand why it was posted on PPL-devel, since it seems to have nothing to do with PPL really.

@sebastien: can you provide a minimal C file reproducing the issue?

That would be great, especially to send to the upstream bug trackers (PPL and/or Cygwin).

comment:16 in reply to: ↑ 14 Changed 3 years ago by gouezel

Replying to jpflori:

@sebastien: can you provide a minimal C file reproducing the issue? Maybe something inspired in the aforementioned 2008 post can help.

I just tried, but unfortunately I was not able to reproduce the issue with plain C or C++ files (for instance, taking the file in the link you give and adding ppl headers and initialization does not break the itimer mechanism).

For the record, steps to reproduce the issue using python:

File alarm.pyx:

import signal

def essai(n):
  signal.setitimer(signal.ITIMER_REAL, 0.5 , 0)
  return sum(xrange(n))

File alarme_c.cpp

#include <ppl.hh>

File setup.py

from distutils.core import setup
from Cython.Build import cythonize
from distutils.extension import Extension

setup(
    ext_modules = cythonize(Extension("alarme", ['alarme.pyx', 'alarme_c.cpp'], libraries = ['ppl']))
)

File essai.py

import alarme

print alarme.essai(100000000)

Compile the extension with

python setup.py build_ext --inplace

Then python essai.py results in 4999999950000000, no interruption.

Commenting out the header inclusion in alarme_c.cpp, then one gets Alarm clock instead.

comment:17 Changed 3 years ago by embray

  • Owner changed from (none) to embray

comment:18 Changed 23 months ago by embray

  • Milestone sage-6.5 deleted

comment:19 Changed 23 months ago by gouezel

  • Milestone set to sage-duplicate/invalid/wontfix
  • Status changed from needs_work to positive_review

duplicate of #21190

comment:20 Changed 20 months ago by embray

  • Resolution set to wontfix
  • Status changed from positive_review to closed

Closing tickets in the sage-duplicate/invalid/wontfix module with positive_review (i.e. someone has confirmed they should be closed).

Note: See TracTickets for help on using tickets.