Opened 2 years ago

Closed 20 months ago

#26516 closed defect (fixed)

Wrong sig_on_count in SharedMeatAxe

Reported by: SimonKing Owned by:
Priority: major Milestone: sage-8.7
Component: interfaces Keywords: libgap sig_on_count
Cc: Merged in:
Authors: Simon King Reviewers: Jeroen Demeyer
Report Upstream: N/A Work issues:
Branch: 6f876e7 (Commits) Commit: 6f876e7753f18a0b38273ae5f3e4cb09139ffb2c
Dependencies: Stopgaps:

Description (last modified by jdemeyer)

When running doctests in my applications of SharedMeatAxe?, sig_on_count reveals problems.

Change History (32)

comment:1 in reply to: ↑ description Changed 2 years ago by jdemeyer

Replying to SimonKing:

Proposed fix: move the sig_off() call to the "finally" clause, directly before libgap_exit()

No, that finally is meant for libgap_exit. So you need a nested try/finally:

try:
    sig_on()
    try:
        ...
    finally:
        sig_off
finally:
    libgap_exit()

or something like that.

comment:2 Changed 2 years ago by jdemeyer

  • Description modified (diff)

comment:3 in reply to: ↑ description ; follow-up: Changed 2 years ago by jdemeyer

Replying to SimonKing:

But in my applications, I am in fact catching errors raised by libgap.

Can you elaborate on what you mean with that? As far as I know, libgap is a C library, so how could it raise Python exceptions?

Last edited 2 years ago by jdemeyer (previous) (diff)

comment:4 in reply to: ↑ 3 ; follow-up: Changed 2 years ago by SimonKing

Replying to jdemeyer:

Replying to SimonKing:

But in my applications, I am in fact catching errors raised by libgap.

Can you elaborate on what you mean with that? As far as I know, libgap is a C library, so how could it raise Python exceptions?

Right. I use "libgap" synonymous to SageMath's libgap wrapper.

Anyway, I am doing things like this:

    try:
        G = G.MinimalGeneratingSet().Group()
    except (RuntimeError, ValueError):
        G = G.SmallGeneratingSet().Group()

For some of my groups, the attempt to compute a minimal generating set fails, and if I understand correctly, the libgap wrapper would call sig_on(), then kindly ask libgap to compute the minimal generating set, and when this fails would not call sig_off() before raising a Python error.

EDIT: I tested and found that this is not giving a sig_on_count() failure. But I am now going through the actual tests of my applications and will hopefully be able to create a minimal example.

Last edited 2 years ago by SimonKing (previous) (diff)

comment:5 in reply to: ↑ 4 ; follow-up: Changed 2 years ago by jdemeyer

Replying to SimonKing:

if I understand correctly, the libgap wrapper would call sig_on(), then kindly ask libgap to compute the minimal generating set, and when this fails would not call sig_off() before raising a Python error.

No, the code that you posted in this ticket is correct in that sense. Libgap error handling in Sage uses sig_error() (see error_handler in src/sage/libs/gap/util.pyx). For that, there must not be a sig_off() call.

But I am now going through the actual tests of my applications and will hopefully be able to create a minimal example.

Please do.

comment:6 in reply to: ↑ 5 ; follow-up: Changed 2 years ago by SimonKing

Replying to jdemeyer:

Replying to SimonKing:

if I understand correctly, the libgap wrapper would call sig_on(), then kindly ask libgap to compute the minimal generating set, and when this fails would not call sig_off() before raising a Python error.

No, the code that you posted in this ticket is correct in that sense. Libgap error handling in Sage uses sig_error() (see error_handler in src/sage/libs/gap/util.pyx). For that, there must not be a sig_off() call.

I see. In that case, my original assessment was very likely wrong. I have moved my cohomology programs from using gap-via-pexpect to using gap-via-libgap, and so it seemed very likely to me that the problem came from that move, even more so when it seemed to me that sig_off() was forgotten in basically all try-except-finally clauses in sage/libs/gap/*.

Currently, my smallest example triggering sig_on_count to complain is an example in which another library results in an error in an example that previously worked. So, very likely the wrong sig_on/off count comes from there, not from libgap.

comment:7 in reply to: ↑ 6 Changed 2 years ago by SimonKing

Replying to SimonKing:

Currently, my smallest example triggering sig_on_count to complain is an example in which another library results in an error in an example that previously worked. So, very likely the wrong sig_on/off count comes from there, not from libgap.

Actually it seems that the problem rather lies in the SharedMeatAxe wrapper, not in the libgap wrapper. Namely, when wrapped libmtx functions are called, they are encapsulated into sig_on/sig_off, but the wrapper also translates propagated C errors into Python errors. So, presumable when such error is raised, sig_off ought to be called.

Question on the way to proceed:

  • Change the topic of this ticket from libgap wrapper to libmtx wrapper?
  • Resolve this ticket as "invalid/won't fix" and open a new ticket addressing the libmtx wrapper?

comment:8 follow-ups: Changed 2 years ago by SimonKing

Technical question about sig_on/sig_off, sig_check, and sig_error.

Assume that in the error handler of the libmtx wrapper I call sig_error, similar to what is done in the case of libgap. Would it be needed to henceforth only use sig_on/sig_off, but not sig_check? Namely, when calling sig_error, it is required that sig_on has been called before.

And if that's really the case, then maybe it would be good to know about the use cases of the different sig_* functions. So far, I thought it is like this:

  • Use sig_on/sig_off arount a library call that potentially takes a long time.
  • Use sig_check when there is a tight loop involving library calls, as that's faster than sig_on/sig_off.
  • I didn't know at all about sig_error.

So, is it really the case that I have to accept the slow-down that comes from being forced to use sig_on/off instead of sig_check?

In any case, I realised that I need to go through matrix_gfpn_dense, as there are many library calls that are in fact not wrapped into sig_on/off pairs.

comment:9 in reply to: ↑ 8 ; follow-up: Changed 2 years ago by jdemeyer

  • Description modified (diff)
  • Summary changed from Wrong sig_on_count in libgap to Wrong sig_on_count in SharedMeatAxe

Replying to SimonKing:

Technical question about sig_on/sig_off, sig_check, and sig_error.

Assume that in the error handler of the libmtx wrapper I call sig_error, similar to what is done in the case of libgap. Would it be needed to henceforth only use sig_on/sig_off, but not sig_check? Namely, when calling sig_error, it is required that sig_on has been called before.

Absolutely.

And if that's really the case, then maybe it would be good to know about the use cases of the different sig_* functions. So far, I thought it is like this:

  • Use sig_on/sig_off arount a library call that potentially takes a long time.
  • Use sig_check when there is a tight loop involving library calls, as that's faster than sig_on/sig_off.

That's more or less correct.

  • I didn't know at all about sig_error.

sig_error is quite specialized to handle errors from a C library in a callback. But for SharedMeatAxe, this doesn't seem to be needed since it has a way to deal with errors already.

Conclusion: the best solution is probably to wrap errors from SharedMeatAxe as

sig_on()
try:
    # MeatAxe call which might raise a Python exception
finally:
    sig_off()

comment:10 in reply to: ↑ 8 Changed 2 years ago by jdemeyer

Replying to SimonKing:

  • Use sig_check when there is a tight loop involving library calls, as that's faster than sig_on/sig_off.

Actually, a more important reason is not so much that it's faster but safer too. sig_on()/sig_off() is really "use this only when you really know what you are doing" while sig_check() is pretty foolproof.

comment:11 in reply to: ↑ 9 ; follow-up: Changed 2 years ago by SimonKing

I suppose changing title and description means I should use this ticket to fix the libmtx wrapper, rather than create a new ticket.

Replying to jdemeyer:

sig_error is quite specialized to handle errors from a C library in a callback. But for SharedMeatAxe, this doesn't seem to be needed since it has a way to deal with errors already.

In SharedMeatAxe, in contrast to MeatAxe, I am consequently propagating error values, so that a program calling a library function can take proper action when an error occurs. MeatAxe does make it possible to have a custom error handler, which I am using here of course.

Maybe I am misunderstanding the role of sig_*. Namely, the library functions have an error return value, and Cython knows about it (it is declared in the .pxd files). So, a Python error is raised regardless whether sig_on/sig_off is involved or not.

Is it correct that the only added bonus of using sig_on/sig_off is that the user can interrupt with ctrl-c?

comment:12 in reply to: ↑ 11 ; follow-up: Changed 2 years ago by jdemeyer

Replying to SimonKing:

Is it correct that the only added bonus of using sig_on/sig_off is that the user can interrupt with ctrl-c?

Yes.

comment:13 in reply to: ↑ 12 Changed 2 years ago by SimonKing

Replying to jdemeyer:

Replying to SimonKing:

Is it correct that the only added bonus of using sig_on/sig_off is that the user can interrupt with ctrl-c?

Yes.

OK, in that case I should use

sig_on()
try:
    <library call>
finally:
    sig_off()

only for functions that potentially run long enough so that it makes sense for them being interruptible.

comment:14 follow-up: Changed 2 years ago by SimonKing

Is it ok if Python stuff happens between sig_on and sig_off, such as

        try:
            mat = MatAlloc(self.Data.Field, self._nrows, right._ncols)
            MatMulStrassen(mat, self.Data, right.Data)
            return new_mtx(mat, self)
        finally:
            sig_off()

?

Or should, in particular, return statements happen after the `finally: sig_off()'?

comment:15 in reply to: ↑ 14 ; follow-up: Changed 2 years ago by jdemeyer

Replying to SimonKing:

Is it ok if Python stuff happens between sig_on and sig_off

You should try hard to avoid that as it's unsafe.

Or should, in particular, return statements happen after the `finally: sig_off()'?

There is nothing in particular about return statements. There just shouldn't be any kind of Python object manipulation (especially no creation of new Python objects).

comment:16 in reply to: ↑ 15 Changed 2 years ago by SimonKing

Replying to jdemeyer:

Replying to SimonKing:

Is it ok if Python stuff happens between sig_on and sig_off

You should try hard to avoid that as it's unsafe.

Or should, in particular, return statements happen after the `finally: sig_off()'?

There is nothing in particular about return statements. There just shouldn't be any kind of Python object manipulation (especially no creation of new Python objects).

Good, then I guess the following is bad, as new_mtx creates a python object (namely: a !Matrix_gfpn_dense):

        sig_on()
        try:
            mat = MatAlloc(self.Data.Field, self.Data.Nor+other.Data.Nor, self.Data.Noc)
            memcpy(mat.Data, self.Data.Data, FfCurrentRowSize*self.Data.Nor)
            memcpy(MatGetPtr(mat, self.Data.Nor), other.Data.Data, FfCurrentRowSize*other.Data.Nor)
            return new_mtx(mat, self)
        finally:
            sig_off()

comment:17 Changed 2 years ago by SimonKing

  • Branch set to u/SimonKing/MeatAxe_sig_on_count

comment:18 Changed 2 years ago by SimonKing

  • Authors set to Simon King
  • Commit set to 3772613b0401c025950666d26ad30e0d37965018
  • Status changed from new to needs_review

I am sorry that I cannot provide a new test, but I think the code is a bit safer now.


New commits:

3772613Fix sig_on/off count in matrix_gfpn_dense

comment:19 Changed 2 years ago by jdemeyer

I assume that you actually tested this? No patchbot with meataxe has tested this so far.

comment:20 Changed 2 years ago by jdemeyer

It would be safer to do self._converter.field_to_int calls outside of sig_on, as that involves Python calls.

While looking at that code, I also noticed a potential bug: #26546

comment:21 Changed 2 years ago by jdemeyer

  • Status changed from needs_review to needs_work

comment:22 Changed 2 years ago by SimonKing

I suggest to base this on top of #26546

comment:23 Changed 2 years ago by git

  • Commit changed from 3772613b0401c025950666d26ad30e0d37965018 to a3b9049f5deb2c3db20a0cbf9af10d198795d677

Branch pushed to git repo; I updated commit sha1. New commits:

cda646eFieldConverter_class: new methods fel_to_field and field_to_fel
81808c0Mention that this is for Cython only
a3b9049Merge branch 't/26546/matrix_gfpn_dense__exceptions_ignored_in_field_to_int__' into MeatAxe_sig_on_count

comment:24 Changed 2 years ago by SimonKing

  • Status changed from needs_work to needs_review

While merging with #26546 I had to resolve a couple of conflicts. Now all tests pass, I think that the usage of sig_on/sig_off is safer, and thus it is ready for review.

comment:25 Changed 2 years ago by jdemeyer

I'd rather review this after the dependency has actually been merged. Feel free to remind me when that happens.

comment:26 follow-up: Changed 21 months ago by dimpase

  • Milestone changed from sage-8.5 to sage-8.6

on this branch with installed meataxe, and #22626 and #26856, merged all the tests pass.

comment:27 in reply to: ↑ 26 Changed 21 months ago by SimonKing

Replying to jdemeyer:

I'd rather review this after the dependency has actually been merged. Feel free to remind me when that happens.

Replying to dimpase:

on this branch with installed meataxe, and #22626 and #26856, merged all the tests pass.

Is that the reminder? ;-)

comment:28 Changed 21 months ago by jdemeyer

  • Branch changed from u/SimonKing/MeatAxe_sig_on_count to u/jdemeyer/MeatAxe_sig_on_count

comment:29 Changed 21 months ago by jdemeyer

  • Commit changed from a3b9049f5deb2c3db20a0cbf9af10d198795d677 to 6f876e7753f18a0b38273ae5f3e4cb09139ffb2c

Rebased to 8.6.beta0 and put back this comment (assuming that it was removed by accident)

-            # Inverting a singular matrix causes MatInverse to raise a
-            # ZeroDivisionError (see sage_meataxe_error_handler).
-            # So we need to support exceptions here.

New commits:

6f876e7Fix sig_on/off count in matrix_gfpn_dense

comment:30 Changed 21 months ago by jdemeyer

  • Reviewers set to Jeroen Demeyer
  • Status changed from needs_review to positive_review

comment:31 Changed 21 months ago by embray

  • Milestone changed from sage-8.6 to sage-8.7

Retarging tickets optimistically to the next milestone. If you are responsible for this ticket (either its reporter or owner) and don't believe you are likely to complete this ticket before the next release (8.7) please retarget this ticket's milestone to sage-pending or sage-wishlist.

comment:32 Changed 20 months ago by vbraun

  • Branch changed from u/jdemeyer/MeatAxe_sig_on_count to 6f876e7753f18a0b38273ae5f3e4cb09139ffb2c
  • Resolution set to fixed
  • Status changed from positive_review to closed
Note: See TracTickets for help on using tickets.