Opened 5 years ago

Closed 4 years ago

#21628 closed defect (duplicate)

AbstractLinearCode.canonical_representative not properly wrapped in sig_on and sig_off

Reported by: jsrn Owned by:
Priority: major Milestone: sage-duplicate/invalid/wontfix
Component: coding theory Keywords: cython, linear_code
Cc: tfeulner Merged in:
Authors: Reviewers: Johan Rosenkilde, Jeroen Demeyer
Report Upstream: N/A Work issues:
Branch: u/jsrn/21628_canonical_representative_interrupting (Commits, GitHub, GitLab) Commit: 40269b97a50f39e55cc6413a6b49eecafcb565ff
Dependencies: #21651 Stopgaps:

Status badges

Description

sage: C = LinearCode(random_matrix(GF(47), 12, 43))
sage: C.canonical_representative()
<wait half an eternity or try to use C-c>

Pressing C-c either seems to have no effect, throws an exception which is not caught and computation continues, or segfaults.

This is very unfortunate because this is a function commonly tried but interrupted due to long run-time.

Change History (17)

comment:1 Changed 5 years ago by jsrn

  • Branch set to u/jsrn/21628_canonical_representative_interrupting

comment:2 Changed 5 years ago by jsrn

  • Authors set to Johan Rosenkilde
  • Commit set to 40269b97a50f39e55cc6413a6b49eecafcb565ff
  • Status changed from new to needs_review

I've added calls to sig_check in the constructor LinearCodeAutGroupCanLabel (which might call itself recursively), as well as sig_on/off the call to PartitionRefinementLinearCode. The latter is, AFAICT, the main work-horse of the algorithm. Both are Cython, but there's more C-like code in PartitionRefinementLinearCode (in codecan.pyx), so this might be why it used to behave so badly when interrupted.

I'm not exactly sure how the interruption works, but it now seems to me to be responsive and not crash. I also don't know if there might be memory leaks left after the interruption :-S


New commits:

40269b9Insert sig_on/off around expensive check and sig_check on recursive calls

comment:3 follow-up: Changed 5 years ago by jdemeyer

  • Status changed from needs_review to needs_work

The call to PartitionRefinementLinearCode is a Python call, so sig_on() is inappropriate. sig_on() is only meant to wrap calls to external libraries that you have no control over.

In this case, you should use sig_check() in the Cython code instead.

comment:4 in reply to: ↑ 3 ; follow-ups: Changed 5 years ago by jsrn

Replying to jdemeyer:

The call to PartitionRefinementLinearCode is a Python call, so sig_on() is inappropriate. sig_on() is only meant to wrap calls to external libraries that you have no control over.

In this case, you should use sig_check() in the Cython code instead.

Thanks. Of course you're the mastermind behind cysignals, but the documentation of cysignals *does* say: "You should put sig_on() before and sig_off() after any Cython code which could potentially take a long time.". Which is why I thought it would be OK to do that.

Further, removing that sig_on/sig_off pair, I get the following segfault when interrupting:

------------------------------------------------------------------------
Unhandled SIGSEGV: A segmentation fault occurred.
This probably occurred because a *compiled* module has a bug
in it and is not properly wrapped with sig_on(), sig_off().
Python will now terminate.
------------------------------------------------------------------------

Further leading me to believe sig_on/sig_off was the way to go.

I've tried to look closer at PartitionRefinementLinearCode: it quickly calls into sage/groups/perm_gps/partn_ref2/refinement_generic.pyx which is a very complicated, mutually recursive backtracking algorithm, and it's not very clear what the "expensive loop" is. Exactly why is it not OK to use sig_on/sig_off in such a case?

Also, it's not clear to me why Sage segfaults on C-c since all the code I can see is just Cython code. If this is expected behaviour, why is the following message printed if sig_on/sig_off is not the right way to solve such an issue?

Best, Johan

comment:5 in reply to: ↑ 4 ; follow-up: Changed 5 years ago by jdemeyer

Well, the cysignals documentation also has a warning box with

The code inside sig_on() should be pure C or Cython code. If you call any Python code or manipulate any Python object (even something trivial like x = []), an interrupt can mess up Python’s internal state. When in doubt, try to use sig_check() instead.

comment:6 in reply to: ↑ 4 Changed 5 years ago by jdemeyer

Replying to jsrn:

Further, removing that sig_on/sig_off pair, I get the following segfault when interrupting:

------------------------------------------------------------------------
Unhandled SIGSEGV: A segmentation fault occurred.
This probably occurred because a *compiled* module has a bug
in it and is not properly wrapped with sig_on(), sig_off().
Python will now terminate.
------------------------------------------------------------------------

That's probably an independent problem. I strongly suggest that you first fix this segfault before moving on. It is pointless to build on top of a broken foundation.

Do you have a GDB backtrace (if you have gdb installed, such a backtrace will be shown automatically) or a way for me to reproduce that segfault?

comment:7 in reply to: ↑ 5 Changed 5 years ago by jsrn

Replying to jdemeyer:

Well, the cysignals documentation also has a warning box with

The code inside sig_on() should be pure C or Cython code. If you call any Python code or manipulate any Python object (even something trivial like x = []), an interrupt can mess up Python’s internal state. When in doubt, try to use sig_check() instead.

Haha, good point :-)

That's probably an independent problem. I strongly suggest that you first fix this segfault before moving on. It is pointless to build on top of a broken foundation.

Do you have a GDB backtrace (if you have gdb installed, such a backtrace will be shown automatically) or a way for me to reproduce that segfault?

Hmm, OK. What I run to produce the segfault is the following:

sage: C = LinearCode(random_matrix(GF(47), 25, 35))
sage: C.canonical_representative()
< Hit C-c and watch the fireworks >

comment:8 follow-up: Changed 5 years ago by jsrn

The backtrace from gdb is:

/home/jsrn/local/sage/sage_devel/local/lib/python2.7/site-packages/cysignals/signals.so(+0x43c5)[0x7f872e4e73c5]
/home/jsrn/local/sage/sage_devel/local/lib/python2.7/site-packages/cysignals/signals.so(+0x4415)[0x7f872e4e7415]
/home/jsrn/local/sage/sage_devel/local/lib/python2.7/site-packages/cysignals/signals.so(+0x6f07)[0x7f872e4e9f07]
/usr/lib/libpthread.so.0(+0x11080)[0x7f87369d8080]
/home/jsrn/local/sage/sage_devel/local/lib/python2.7/site-packages/sage/coding/codecan/codecan.so(+0xcec0)[0x7f856f699ec0]
/home/jsrn/local/sage/sage_devel/local/lib/python2.7/site-packages/sage/groups/perm_gps/partn_ref2/refinement_generic.so(+0x10150)[0x7f857b071150]
/home/jsrn/local/sage/sage_devel/local/lib/python2.7/site-packages/sage/coding/codecan/codecan.so(+0x2caf1)[0x7f856f6b9af1]
/home/jsrn/local/sage/sage_devel/local/lib/libpython2.7.so.1.0(+0xbe6b5)[0x7f8736ca26b5]
/home/jsrn/local/sage/sage_devel/local/lib/python2.7/site-packages/sage/coding/codecan/autgroup_can_label.so(+0x5c90)[0x7f856f8f8c90]
/home/jsrn/local/sage/sage_devel/local/lib/python2.7/site-packages/sage/coding/codecan/autgroup_can_label.so(+0x2a74f)[0x7f856f91d74f]
/home/jsrn/local/sage/sage_devel/local/lib/libpython2.7.so.1.0(PyObject_Call+0x43)[0x7f8736c36e53]
/home/jsrn/local/sage/sage_devel/local/lib/libpython2.7.so.1.0(+0x6403c)[0x7f8736c4803c]
/home/jsrn/local/sage/sage_devel/local/lib/libpython2.7.so.1.0(PyObject_Call+0x43)[0x7f8736c36e53]
/home/jsrn/local/sage/sage_devel/local/lib/libpython2.7.so.1.0(PyEval_CallObjectWithKeywords+0x47)[0x7f8736ce69d7]
/home/jsrn/local/sage/sage_devel/local/lib/libpython2.7.so.1.0(PyInstance_New+0x6c)[0x7f8736c4fe1c]
/home/jsrn/local/sage/sage_devel/local/lib/python2.7/site-packages/sage/coding/codecan/autgroup_can_label.so(+0x5c90)[0x7f856f8f8c90]
/home/jsrn/local/sage/sage_devel/local/lib/python2.7/site-packages/sage/coding/codecan/autgroup_can_label.so(+0x34d19)[0x7f856f927d19]
/home/jsrn/local/sage/sage_devel/local/lib/libpython2.7.so.1.0(PyObject_Call+0x43)[0x7f8736c36e53]
/home/jsrn/local/sage/sage_devel/local/lib/libpython2.7.so.1.0(+0x6403c)[0x7f8736c4803c]
/home/jsrn/local/sage/sage_devel/local/lib/libpython2.7.so.1.0(PyObject_Call+0x43)[0x7f8736c36e53]
/home/jsrn/local/sage/sage_devel/local/lib/libpython2.7.so.1.0(PyEval_CallObjectWithKeywords+0x47)[0x7f8736ce69d7]
/home/jsrn/local/sage/sage_devel/local/lib/libpython2.7.so.1.0(PyInstance_New+0x6c)[0x7f8736c4fe1c]
/home/jsrn/local/sage/sage_devel/local/lib/libpython2.7.so.1.0(PyObject_Call+0x43)[0x7f8736c36e53]
/home/jsrn/local/sage/sage_devel/local/lib/libpython2.7.so.1.0(PyEval_EvalFrameEx+0x3adf)[0x7f8736ceaa7f]
/home/jsrn/local/sage/sage_devel/local/lib/libpython2.7.so.1.0(PyEval_EvalCodeEx+0x8bc)[0x7f8736cedb3c]
/home/jsrn/local/sage/sage_devel/local/lib/libpython2.7.so.1.0(+0x8390d)[0x7f8736c6790d]
/home/jsrn/local/sage/sage_devel/local/lib/python2.7/site-packages/sage/misc/cachefunc.so(+0xf563)[0x7f872ce17563]
/home/jsrn/local/sage/sage_devel/local/lib/python2.7/site-packages/sage/misc/cachefunc.so(+0xaeb0)[0x7f872ce12eb0]
/home/jsrn/local/sage/sage_devel/local/lib/python2.7/site-packages/sage/misc/cachefunc.so(+0x1f946)[0x7f872ce27946]
/home/jsrn/local/sage/sage_devel/local/lib/libpython2.7.so.1.0(PyObject_Call+0x43)[0x7f8736c36e53]
/home/jsrn/local/sage/sage_devel/local/lib/libpython2.7.so.1.0(PyEval_EvalFrameEx+0x3adf)[0x7f8736ceaa7f]
/home/jsrn/local/sage/sage_devel/local/lib/libpython2.7.so.1.0(PyEval_EvalCodeEx+0x8bc)[0x7f8736cedb3c]
/home/jsrn/local/sage/sage_devel/local/lib/libpython2.7.so.1.0(PyEval_EvalFrameEx+0x5075)[0x7f8736cec015]
/home/jsrn/local/sage/sage_devel/local/lib/libpython2.7.so.1.0(PyEval_EvalCodeEx+0x8bc)[0x7f8736cedb3c]
/home/jsrn/local/sage/sage_devel/local/lib/libpython2.7.so.1.0(PyEval_EvalCode+0x19)[0x7f8736cedc39]
/home/jsrn/local/sage/sage_devel/local/lib/libpython2.7.so.1.0(PyEval_EvalFrameEx+0x5d60)[0x7f8736cecd00]
/home/jsrn/local/sage/sage_devel/local/lib/libpython2.7.so.1.0(PyEval_EvalCodeEx+0x8bc)[0x7f8736cedb3c]
/home/jsrn/local/sage/sage_devel/local/lib/libpython2.7.so.1.0(PyEval_EvalFrameEx+0x5075)[0x7f8736cec015]
/home/jsrn/local/sage/sage_devel/local/lib/libpython2.7.so.1.0(PyEval_EvalCodeEx+0x8bc)[0x7f8736cedb3c]
/home/jsrn/local/sage/sage_devel/local/lib/libpython2.7.so.1.0(PyEval_EvalFrameEx+0x5075)[0x7f8736cec015]
/home/jsrn/local/sage/sage_devel/local/lib/libpython2.7.so.1.0(PyEval_EvalCodeEx+0x8bc)[0x7f8736cedb3c]
/home/jsrn/local/sage/sage_devel/local/lib/libpython2.7.so.1.0(PyEval_EvalFrameEx+0x5075)[0x7f8736cec015]
/home/jsrn/local/sage/sage_devel/local/lib/libpython2.7.so.1.0(PyEval_EvalCodeEx+0x8bc)[0x7f8736cedb3c]
/home/jsrn/local/sage/sage_devel/local/lib/libpython2.7.so.1.0(PyEval_EvalFrameEx+0x5075)[0x7f8736cec015]
/home/jsrn/local/sage/sage_devel/local/lib/libpython2.7.so.1.0(PyEval_EvalCodeEx+0x8bc)[0x7f8736cedb3c]
/home/jsrn/local/sage/sage_devel/local/lib/libpython2.7.so.1.0(PyEval_EvalFrameEx+0x5075)[0x7f8736cec015]
/home/jsrn/local/sage/sage_devel/local/lib/libpython2.7.so.1.0(PyEval_EvalCodeEx+0x8bc)[0x7f8736cedb3c]
/home/jsrn/local/sage/sage_devel/local/lib/libpython2.7.so.1.0(PyEval_EvalFrameEx+0x5075)[0x7f8736cec015]
/home/jsrn/local/sage/sage_devel/local/lib/libpython2.7.so.1.0(PyEval_EvalCodeEx+0x8bc)[0x7f8736cedb3c]
/home/jsrn/local/sage/sage_devel/local/lib/libpython2.7.so.1.0(PyEval_EvalCode+0x19)[0x7f8736cedc39]
/home/jsrn/local/sage/sage_devel/local/lib/libpython2.7.so.1.0(PyRun_FileExFlags+0x8a)[0x7f8736d1227a]
/home/jsrn/local/sage/sage_devel/local/lib/libpython2.7.so.1.0(PyRun_SimpleFileExFlags+0xd5)[0x7f8736d13655]
/home/jsrn/local/sage/sage_devel/local/lib/libpython2.7.so.1.0(Py_Main+0xc6f)[0x7f8736d29daf]
/usr/lib/libc.so.6(__libc_start_main+0xf1)[0x7f8735f3e291]
python(_start+0x2a)[0x40067a]

comment:9 in reply to: ↑ 8 ; follow-up: Changed 5 years ago by jdemeyer

Replying to jsrn:

The backtrace from gdb is

No, it's not :-)

That's the backtrace from glibc. The backtrace from gdb should be below that.

comment:10 in reply to: ↑ 9 ; follow-up: Changed 5 years ago by jsrn

Replying to jdemeyer:

Replying to jsrn:

The backtrace from gdb is

No, it's not :-)

That's the backtrace from glibc. The backtrace from gdb should be below that.

Ah ok. It says "Attaching gdb to process id 10318." and then "Saved trace to <blah>". But that log file is empty.

comment:11 in reply to: ↑ 10 Changed 5 years ago by jdemeyer

Replying to jsrn:

Ah ok. It says "Attaching gdb to process id 10318." and then "Saved trace to <blah>". But that log file is empty.

Probably you don't have gdb or it's the wrong version...

comment:12 Changed 5 years ago by jsrn

I do have gdb since gdb launches when I type gdb. It's not hard to test.

How do I know if it's "the right version"? It's 7.11.1. My gcc is 6.1.1. Presumably they fit together since they're in my package manager?

comment:13 Changed 5 years ago by jdemeyer

  • Dependencies set to #21651

comment:14 Changed 5 years ago by jdemeyer

I created #21651 for the existing segfault issue.

comment:15 Changed 4 years ago by jdemeyer

As far as I can tell, this issue is completely fixed by #21651.

comment:16 Changed 4 years ago by jsrn

  • Milestone changed from sage-7.4 to sage-duplicate/invalid/wontfix

Yes, I think so. I tried some days ago to make it crash again, but was unable. I wanted to get back to it and try harder, but never got around to it. Since you also tried since then, it's probably true that the issue got fixed.

Let's close it.

comment:17 Changed 4 years ago by jdemeyer

  • Authors Johan Rosenkilde deleted
  • Resolution set to duplicate
  • Reviewers set to Johan Rosenkilde, Jeroen Demeyer
  • Status changed from needs_work to closed
Note: See TracTickets for help on using tickets.