Opened 7 years ago

Closed 7 years ago

#14059 closed defect (fixed)

Fix refcount/deallocation of integers

Reported by: SimonKing Owned by: rlm
Priority: blocker Milestone: sage-5.7
Component: memleak Keywords:
Cc: nbruin, robertwb, ncohen Merged in: sage-5.7.beta4
Authors: Simon King Reviewers: Jean-Pierre Flori
Report Upstream: N/A Work issues:
Branch: Commit:
Dependencies: Stopgaps:

Description (last modified by jpflori)

In #2435, the deallocation function in sage/rings/integer.pyx became as follows:

cdef void fast_tp_dealloc(PyObject* o):

    # If there is room in the pool for a used integer object,
    # then put it in rather than deallocating it.

    global integer_pool, integer_pool_count
    
    if integer_pool_count < integer_pool_size:
    
        # Here we free any extra memory used by the mpz_t by
        # setting it to a single limb. 
        if (<__mpz_struct *>( <char *>o + mpz_t_offset))._mp_alloc > 10:
            _mpz_realloc(<mpz_t *>( <char *>o + mpz_t_offset), 10)
            
        # It's cheap to zero out an integer, so do it here. 
        (<__mpz_struct *>( <char *>o + mpz_t_offset))._mp_size = 0
        
        # And add it to the pool.
        integer_pool[integer_pool_count] = o
        integer_pool_count += 1
        return

    # Again, we move to the mpz_t and clear it. See above, why this is evil.
    # The clean version of this line would be:
    #   mpz_clear(<mpz_t>(<char *>o + mpz_t_offset))

    mpz_free((<__mpz_struct *>( <char *>o + mpz_t_offset) )._mp_d, 0)

    # Free the object. This assumes that Py_TPFLAGS_HAVE_GC is not
    # set. If it was set another free function would need to be
    # called.

    PyObject_FREE(o)

The purpose of #2435 was to fix a memory leak introduced in #1337. Problem: One has to manually incref ONE, which I find a bit suspicious.

When I build sage-5.7.beta2 in debug version (as by #13864), Sage crashes at exit, while attempting PyObject_FREE(o). I will attach backtraces in comments.

The problem is the workaround for ONE (which is not cdefed so get collected at exit) got removed (accidently?) at #12615. We simplify the workaround nd completetly remove the use of ONE and use the new one from #12615. There still are strange things going one with allocation and deallocation before and after setting up the hooked functions with debug builds of Python, but with the current setting everything seems, so the workarounds are sufficient.

Apply trac_14059.patch.

Attachments (1)

trac_14059.patch (6.2 KB) - added by jpflori 7 years ago.
Cleanup hooked functions and integer allocation.

Download all attachments as: .zip

Change History (61)

comment:1 Changed 7 years ago by SimonKing

Starting and quitting Sage yields (with the optional gdb package from #13866)

sage: 
Exiting Sage (CPU time 0m0.06s, Wall time 0m7.51s).
Debug memory block at address p=0x1c817e0: API '�'
    18302628885633695743 bytes originally requested
    The 7 pad bytes at p-7 are not all FORBIDDENBYTE (0xfb):
        at p-7: 0xcb *** OUCH
        at p-6: 0xcb *** OUCH
        at p-5: 0xcb *** OUCH
        at p-4: 0xcb *** OUCH
        at p-3: 0xcb *** OUCH
        at p-2: 0xcb *** OUCH
        at p-1: 0xcb *** OUCH
    Because memory is corrupted at the start, the count of bytes requested
       may be bogus, and checking the trailing pad bytes may segfault.
    The 8 pad bytes at tail=0xfe00000001c817df are ------------------------------------------------------------------------
/home/simon/SAGE/debug/sage-5.7.beta2/local/lib/libcsage.so(print_backtrace+0x31)[0x7fa099a7210d]
/home/simon/SAGE/debug/sage-5.7.beta2/local/lib/libcsage.so(sigdie+0x3d)[0x7fa099a7228f]
/home/simon/SAGE/debug/sage-5.7.beta2/local/lib/libcsage.so(sage_signal_handler+0x199)[0x7fa099a71ae3]
/lib64/libpthread.so.0(+0xfd00)[0x7fa09cf83d00]
/home/simon/SAGE/debug/sage-5.7.beta2/local/lib/libpython2.7.so.1.0(_PyObject_DebugDumpAddress+0x292)[0x7fa09d24eebe]
/home/simon/SAGE/debug/sage-5.7.beta2/local/lib/libpython2.7.so.1.0(_PyObject_DebugCheckAddressApi+0x114)[0x7fa09d24ec1e]
/home/simon/SAGE/debug/sage-5.7.beta2/local/lib/libpython2.7.so.1.0(_PyObject_DebugFreeApi+0x36)[0x7fa09d24e8de]
/home/simon/SAGE/debug/sage-5.7.beta2/local/lib/libpython2.7.so.1.0(_PyObject_DebugFree+0x1d)[0x7fa09d24e78a]
/home/simon/SAGE/debug/sage-5.7.beta2/local/lib/python2.7/site-packages/sage/rings/integer.so(+0x80292)[0x7fa08a30d292]
/home/simon/SAGE/debug/sage-5.7.beta2/local/lib/libpython2.7.so.1.0(_Py_Dealloc+0x35)[0x7fa09d24cbe4]
/home/simon/SAGE/debug/sage-5.7.beta2/local/lib/libpython2.7.so.1.0(+0xace8e)[0x7fa09d23de8e]
/home/simon/SAGE/debug/sage-5.7.beta2/local/lib/libpython2.7.so.1.0(_Py_Dealloc+0x35)[0x7fa09d24cbe4]
/home/simon/SAGE/debug/sage-5.7.beta2/local/lib/libpython2.7.so.1.0(+0xace8e)[0x7fa09d23de8e]
/home/simon/SAGE/debug/sage-5.7.beta2/local/lib/libpython2.7.so.1.0(_Py_Dealloc+0x35)[0x7fa09d24cbe4]
/home/simon/SAGE/debug/sage-5.7.beta2/local/lib/libpython2.7.so.1.0(_PyImport_Fini+0x80)[0x7fa09d306b45]
/home/simon/SAGE/debug/sage-5.7.beta2/local/lib/libpython2.7.so.1.0(Py_Finalize+0x58)[0x7fa09d31ab86]
/home/simon/SAGE/debug/sage-5.7.beta2/local/lib/libpython2.7.so.1.0(Py_Main+0xec1)[0x7fa09d33776b]
python(main+0x20)[0x4007b4]
/lib64/libc.so.6(__libc_start_main+0xed)[0x7fa09c5a723d]
python[0x4006d9]
------------------------------------------------------------------------
Attaching gdb to process id 29767.
GNU gdb (GDB) 7.5.1
Copyright (C) 2012 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-unknown-linux-gnu".
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>.
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
0x00007fa09cf838bd in waitpid () from /lib64/libpthread.so.0

Stack backtrace
---------------
No symbol table info available.
#1  0x00007fa099a72250 in print_enhanced_backtrace () from /home/simon/SAGE/debug/sage-5.7.beta2/local/lib/libcsage.so
No symbol table info available.
#2  0x00007fa099a722c2 in sigdie () from /home/simon/SAGE/debug/sage-5.7.beta2/local/lib/libcsage.so
No symbol table info available.
#3  0x00007fa099a71ae3 in sage_signal_handler () from /home/simon/SAGE/debug/sage-5.7.beta2/local/lib/libcsage.so
No symbol table info available.
#4  <signal handler called>
No symbol table info available.
#5  0x00007fa09d24eebe in _PyObject_DebugDumpAddress (p=0x1c817e0) at Objects/obmalloc.c:1649
        q = 0x1c817e0 ""
        tail = 0xfe00000001c817df <Address 0xfe00000001c817df out of bounds>
        nbytes = 18302628885633695743
        serial = 29890528
        i = 0
        ok = 1
        id = -53 '\313'
#6  0x00007fa09d24ec1e in _PyObject_DebugCheckAddressApi (api=111 'o', p=0x1c817e0) at Objects/obmalloc.c:1590
        q = 0x1c817e0 ""
        msgbuf = "bad ID: Allocated using API '\313', verified using API 'o'\000\000\000\000\000\000\000\000"
        msg = 0x7fff6ce1a910 "bad ID: Allocated using API '\313', verified using API 'o'"
        nbytes = 28403168
        tail = 0x67 <Address 0x67 out of bounds>
        i = 32767
        id = -53 '\313'
#7  0x00007fa09d24e8de in _PyObject_DebugFreeApi (api=111 'o', p=0x1c817e0) at Objects/obmalloc.c:1478
        q = 0x1c817d0 "\375\377\377\377\377\377\377\377\313\313\313\313\313\313\313", <incomplete sequence \313>
        nbytes = 28429568
#8  0x00007fa09d24e78a in _PyObject_DebugFree (p=0x1c817e0) at Objects/obmalloc.c:1422
No locals.
#9  0x00007fa08a30d292 in __pyx_f_4sage_5rings_7integer_fast_tp_dealloc (__pyx_v_o=0x1c817e0) at sage/rings/integer.c:35775
        __pyx_t_1 = 0
#10 0x00007fa09d24cbe4 in _Py_Dealloc (op=0x1c817e0) at Objects/object.c:2243
        dealloc = 0x7fa08a30d1c4 <__pyx_f_4sage_5rings_7integer_fast_tp_dealloc>
#11 0x00007fa09d23de8e in dict_dealloc (mp=0x1ccef60) at Objects/dictobject.c:985
        ep = 0x1cd31a8
        fill = 25
#12 0x00007fa09d24cbe4 in _Py_Dealloc (op=0x1ccef60) at Objects/object.c:2243
        dealloc = 0x7fa09d23dd5b <dict_dealloc>
#13 0x00007fa09d23de8e in dict_dealloc (mp=0x673ba0) at Objects/dictobject.c:985
        ep = 0x1b86bd8
        fill = 277
#14 0x00007fa09d24cbe4 in _Py_Dealloc (op=0x673ba0) at Objects/object.c:2243
        dealloc = 0x7fa09d23dd5b <dict_dealloc>
#15 0x00007fa09d306b45 in _PyImport_Fini () at Python/import.c:244
No locals.
#16 0x00007fa09d31ab86 in Py_Finalize () at Python/pythonrun.c:470
        interp = 0x602010
        tstate = 0x6020a0
#17 0x00007fa09d33776b in Py_Main (argc=3, argv=0x7fff6ce1ad68) at Modules/main.c:664
        c = -1
        sts = 0
        command = 0x0
        filename = 0x7fff6ce1b856 "/home/simon/SAGE/debug/sage-5.7.beta2/local/bin/sage-ipython"
        module = 0x0
        fp = 0x684a90
        p = 0x0
        unbuffered = 0
        skipfirstline = 0
        stdin_is_interactive = 1
        help = 0
        version = 0
        saw_unbuffered_flag = 0
        cf = {cf_flags = 0}
#18 0x00000000004007b4 in main (argc=3, argv=0x7fff6ce1ad68) at ./Modules/python.c:23
No locals.


Cython backtrace (newest frame = last)
--------------------------------------
#0  0x0000000000400794 in main()
#1  0x00007fa09d3368aa in Py_Main()
#2  0x00007fa09d31ab2e in Py_Finalize()
#3  0x00007fa09d306ac5 in _PyImport_Fini()
#4  0x00007fa09d24cbaf in _Py_Dealloc()
#5  0x00007fa09d23dd5b in dict_dealloc()
#6  0x00007fa09d24cbaf in _Py_Dealloc()
#7  0x00007fa09d23dd5b in dict_dealloc()
#8  0x00007fa09d24cbaf in _Py_Dealloc()
#9  0x00007fa08a30d1c4 in fast_tp_dealloc() at /home/simon/SAGE/debug/sage-5.7.beta2/devel/sage-main/sage/rings/integer.pyx:6053
  6048    
  6049        # Free the object. This assumes that Py_TPFLAGS_HAVE_GC is not
  6050        # set. If it was set another free function would need to be
  6051        # called.
  6052    
> 6053        PyObject_FREE(o)
  6054    
  6055    
  6056    hook_fast_tp_functions()
  6057    from sage.misc.allocator cimport hook_tp_functions
#10 0x00007fa09d24e76d in _PyObject_DebugFree()
#11 0x00007fa09d24e8a8 in _PyObject_DebugFreeApi()
#12 0x00007fa09d24eb0a in _PyObject_DebugCheckAddressApi()
#13 0x00007fa09d24ec2c in _PyObject_DebugDumpAddress()
#14 0x00007fa09d617a10 in __restore_rt()
#15 0x00007fa099a7194a in sage_signal_handler()
#16 0x00007fa099a72252 in sigdie()
#17 0x00007fa099a7212b in print_enhanced_backtrace()
#18 0x00007fa09cf83860 in waitpid()


Saved trace to /home/simon/.sage/crash_logs/sage_crash_WekcH8.log
------------------------------------------------------------------------
Unhandled SIGSEGV: A segmentation fault occurred in Sage.
This probably occurred because a *compiled* component of Sage has a bug
in it and is not properly wrapped with sig_on(), sig_off(). You might
want to run Sage under gdb with 'sage -gdb' to debug this.
Sage will now terminate.
------------------------------------------------------------------------
../../sage: Zeile 135: 29767 Speicherzugriffsfehler  "$SAGE_ROOT/spkg/bin/sage" "$@"
Last edited 7 years ago by SimonKing (previous) (diff)

comment:2 Changed 7 years ago by jpflori

I can indeed repodruce the problem.

comment:3 Changed 7 years ago by SimonKing

The same with sage -gdb yields

sage: 
Exiting Sage (CPU time 0m2.67s, Wall time 0m27.17s).
[Thread 0x7fffef932700 (LWP 29861) exited]
Debug memory block at address p=0x1c817e0: API '�'
    18302628885633695743 bytes originally requested
    The 7 pad bytes at p-7 are not all FORBIDDENBYTE (0xfb):
        at p-7: 0xcb *** OUCH
        at p-6: 0xcb *** OUCH
        at p-5: 0xcb *** OUCH
        at p-4: 0xcb *** OUCH
        at p-3: 0xcb *** OUCH
        at p-2: 0xcb *** OUCH
        at p-1: 0xcb *** OUCH
    Because memory is corrupted at the start, the count of bytes requested
       may be bogus, and checking the trailing pad bytes may segfault.
    The 8 pad bytes at tail=0xfe00000001c817df are 
Program received signal SIGSEGV, Segmentation fault.
0x00007ffff7a29ebe in _PyObject_DebugDumpAddress (p=0x1c817e0) at Objects/obmalloc.c:1649
1649    Objects/obmalloc.c: Datei oder Verzeichnis nicht gefunden.
(gdb) bt
#0  0x00007ffff7a29ebe in _PyObject_DebugDumpAddress (p=0x1c817e0) at Objects/obmalloc.c:1649
#1  0x00007ffff7a29c1e in _PyObject_DebugCheckAddressApi (api=111 'o', p=0x1c817e0) at Objects/obmalloc.c:1590
#2  0x00007ffff7a298de in _PyObject_DebugFreeApi (api=111 'o', p=0x1c817e0) at Objects/obmalloc.c:1478
#3  0x00007ffff7a2978a in _PyObject_DebugFree (p=0x1c817e0) at Objects/obmalloc.c:1422
#4  0x00007fffe4a40292 in __pyx_f_4sage_5rings_7integer_fast_tp_dealloc (__pyx_v_o=<sage.rings.integer.Integer at remote 0x1c817e0>) at sage/rings/integer.c:35775
#5  0x00007ffff7a27be4 in _Py_Dealloc (op=<sage.rings.integer.Integer at remote 0x1c817e0>) at Objects/object.c:2243
#6  0x00007ffff7a18e8e in dict_dealloc (mp=
    {'gmp_randrange': <built-in method gmp_randrange of builtin_function_or_method object at remote 0x239aba0>, 'LCM_list': <built-in method LCM_list of builtin_function_or_method object at remote 0x1b23c18>, '_test_mpz_set_longlong': <built-in method _test_mpz_set_longlong of builtin_function_or_method object at remote 0x1b16d08>, 'initialized': False, 'operator': <module at remote 0x8a7f68>, 'make_integer': <built-in method make_integer of builtin_function_or_method object at remote 0x1c8b588>, '__package__': <unknown at remote 0x1b1cdc0>, 'IntegerWrapper': <type sage.rings.integer.IntegerWrapper at remote 0x7fffe4c74d20>, 'CoercionException': <type CoercionException at remote 0x16d53e0>, 'ONE': <sage.rings.integer.Integer at remote 0x1c817e0>, 'clear_mpz_globals': <built-in function clear_mpz_globals>, '__pyx_capi__': {'smallInteger': <PyCapsule at remote 0x1a2bac0>}, 'integer_ring': <module at remote 0x1b30e30>, 'Integer': <type sage.rings.integer.Integer at remote 0x7fffe4c74980>, '__doc__': "File: sage/ring...(truncated)) at Objects/dictobject.c:985
#7  0x00007ffff7a27be4 in _Py_Dealloc (op=
    {'gmp_randrange': <built-in method gmp_randrange of builtin_function_or_method object at remote 0x239aba0>, 'LCM_list': <built-in method LCM_list of builtin_function_or_method object at remote 0x1b23c18>, '_test_mpz_set_longlong': <built-in method _test_mpz_set_longlong of builtin_function_or_method object at remote 0x1b16d08>, 'initialized': False, 'operator': <module at remote 0x8a7f68>, 'make_integer': <built-in method make_integer of builtin_function_or_method object at remote 0x1c8b588>, '__package__': <unknown at remote 0x1b1cdc0>, 'IntegerWrapper': <type sage.rings.integer.IntegerWrapper at remote 0x7fffe4c74d20>, 'CoercionException': <type CoercionException at remote 0x16d53e0>, 'ONE': <sage.rings.integer.Integer at remote 0x1c817e0>, 'clear_mpz_globals': <built-in function clear_mpz_globals>, '__pyx_capi__': {'smallInteger': <PyCapsule at remote 0x1a2bac0>}, 'integer_ring': <module at remote 0x1b30e30>, 'Integer': <type sage.rings.integer.Integer at remote 0x7fffe4c74980>, '__doc__': "File: sage/ring...(truncated)) at Objects/object.c:2243
#8  0x00007ffff7a18e8e in dict_dealloc (mp=
    {<unknown at remote 0x23365b8>: <unknown at remote 0x237d650>, <unknown at remote 0x26c1428>: <unknown at remote 0x26903a0>, <unknown at remote 0x290b100>: <unknown at remote 0x293db00>, <unknown at remote 0x17865c0>: <unknown at remote 0x17b9b70>, <unknown at remote 0x2b92640>: <unknown at remote 0x2caa5e0>, <unknown at remote 0x2413100>: <unknown at remote 0x2407470>, <unknown at remote 0x2a67a50>: <unknown at remote 0x2a28140>, <unknown at remote 0x5d5eeb0>: <unknown at remote 0x5d9f410>, <unknown at remote 0x13a78c8>: <unknown at remote 0x13ee740>, <unknown at remote 0x2e0e880>: <unknown at remote 0x2e22b60>, <unknown at remote 0x24b2998>: <unknown at remote 0x2508e50>, <unknown at remote 0x2a67bc0>: <unknown at remote 0x2a2b0c0>, <unknown at remote 0x2b924c0>: <unknown at remote 0x2aec980>, <unknown at remote 0x3579040>: <unknown at remote 0x3522620>, 'gc': <unknown at remote 0x98eba0>, <unknown at remote 0x2b21998>: <unknown at remote 0x2ab35e0>, <unknown at remote 0x19483d8>: <unknown at remote 0x197fd...(truncated)) at Objects/dictobject.c:985
#9  0x00007ffff7a27be4 in _Py_Dealloc (op=
    {<unknown at remote 0x23365b8>: <unknown at remote 0x237d650>, <unknown at remote 0x26c1428>: <unknown at remote 0x26903a0>, <unknown at remote 0x290b100>: <unknown at remote 0x293db00>, <unknown at remote 0x17865c0>: <unknown at remote 0x17b9b70>, <unknown at remote 0x2b92640>: <unknown at remote 0x2caa5e0>, <unknown at remote 0x2413100>: <unknown at remote 0x2407470>, <unknown at remote 0x2a67a50>: <unknown at remote 0x2a28140>, <unknown at remote 0x5d5eeb0>: <unknown at remote 0x5d9f410>, <unknown at remote 0x13a78c8>: <unknown at remote 0x13ee740>, <unknown at remote 0x2e0e880>: <unknown at remote 0x2e22b60>, <unknown at remote 0x24b2998>: <unknown at remote 0x2508e50>, <unknown at remote 0x2a67bc0>: <unknown at remote 0x2a2b0c0>, <unknown at remote 0x2b924c0>: <unknown at remote 0x2aec980>, <unknown at remote 0x3579040>: <unknown at remote 0x3522620>, 'gc': <unknown at remote 0x98eba0>, <unknown at remote 0x2b21998>: <unknown at remote 0x2ab35e0>, <unknown at remote 0x19483d8>: <unknown at remote 0x197fd...(truncated)) at Objects/object.c:2243
#10 0x00007ffff7ae1b45 in _PyImport_Fini () at Python/import.c:244
#11 0x00007ffff7af5b86 in Py_Finalize () at Python/pythonrun.c:470
#12 0x00007ffff7b1276b in Py_Main (argc=3, argv=0x7fffffffd168) at Modules/main.c:664
#13 0x00000000004007b4 in main (argc=3, argv=0x7fffffffd168) at ./Modules/python.c:23
(gdb) q

comment:4 follow-up: Changed 7 years ago by SimonKing

One detail:

sage: quit
'Use Ctrl-D (i.e. EOF), %Exit, or %Quit to exit without confirmation.'

I.e., quit does not work as normally. Is this due to the debug version? Or what is happening here?

comment:5 in reply to: ↑ 4 Changed 7 years ago by jpflori

Replying to SimonKing:

One detail:

sage: quit
'Use Ctrl-D (i.e. EOF), %Exit, or %Quit to exit without confirmation.'

I.e., quit does not work as normally. Is this due to the debug version? Or what is happening here?

At least, this is not due to Python being built with --pydebug as it happens on my 5.7.beta[1|2] with SAGE_DEBUG=yes but without the correct Python spkg. This does not happen on 5.6 (with SAGE_DEBUG as well I think). Maybe an upgrade to ipython ?

comment:6 Changed 7 years ago by SimonKing

What is the original purpose of fast_tp_dealloc? If I understand it correctly, if there is enough room left on the stack, then dead integers are put there (and I guess get freed in a whole bunch, which is cheaper than freeing them one by one), otherwise fast_tp_dealloc explicitly removes them.

And I see that the Integer class has a __dealloc__ method; but it seems that

hook_tp_functions(global_dummy_Integer, NULL, <newfunc>&fast_tp_new, NULL, &fast_tp_dealloc, False)

is then used to override the dealloc, or what?

And concerning the global_dummy_Integer, I see the comment

# We use a global Integer element to steal all the references
# from.  DO NOT INITIALIZE IT AGAIN and DO NOT REFERENCE IT!
cdef Integer global_dummy_Integer
global_dummy_Integer = Integer()

Sounds like black magic to me. Robert, can you elaborate how it all works?

comment:7 Changed 7 years ago by SimonKing

> grep global_dummy_Integer -R sage | grep REF
sage/rings/integer.c:  if_Py_TRACE_REFS_then_PyObject_INIT(__pyx_v_new, ((PyObject *)__pyx_v_4sage_5rings_7integer_global_dummy_Integer)->ob_type);
sage/rings/integer.c:  __pyx_v_4sage_5rings_7integer_global_dummy_Integer = ((struct __pyx_obj_4sage_5rings_7integer_Integer *)Py_None); Py_INCREF(Py_None);
sage/rings/integer.c:  __Pyx_XGOTREF(((PyObject *)__pyx_v_4sage_5rings_7integer_global_dummy_Integer));
sage/rings/integer.c:  __Pyx_DECREF(((PyObject *)__pyx_v_4sage_5rings_7integer_global_dummy_Integer));

I am not expert enough to see whether the lines above gives rise to a wrong reference count.

comment:8 Changed 7 years ago by SimonKing

Interesting. When I add a print statement to the beginning of fast_tp_new and fast_tp_dealloc then the crash vanishes.

comment:9 Changed 7 years ago by SimonKing

The Integer's __cinit__ method gets called 7 times at startup -- even though it appears to get overriden by the hook_tp_functions line. When one creates an integer after startup, then only fast_tp_new is called, but not __cinit__. And __dealloc__ never gets called.

comment:10 follow-up: Changed 7 years ago by SimonKing

I wonder about this message when Sage crashes:

    The 7 pad bytes at p-7 are not all FORBIDDENBYTE (0xfb):
        at p-7: 0xcb *** OUCH
        at p-6: 0xcb *** OUCH
        at p-5: 0xcb *** OUCH
        at p-4: 0xcb *** OUCH
        at p-3: 0xcb *** OUCH
        at p-2: 0xcb *** OUCH
        at p-1: 0xcb *** OUCH

Could it be that fast_tp_new writes to the wrong location when it does

        (<__mpz_struct *>( <char *>new + mpz_t_offset) )._mp_d = <mp_ptr>mpz_alloc(GMP_LIMB_BITS >> 3)

?

comment:11 Changed 7 years ago by SimonKing

At least for educational purposes, the following comments have to change:

        # We take the address 'new' and move mpz_t_offset bytes (chars)
        # to the address of 'value'. We treat that address as a pointer
        # to a mpz_t struct and allocate memory for the _mp_d element of
        # that struct. We allocate one limb.
        #
        # What is done here is potentially very dangerous as it reaches
        # deeply into the internal structure of GMP. Consequently things
        # may break if a new release of GMP changes some internals. To
        # emphasize this, this is what the GMP manual has to say about
        # the documentation for the struct we are using:
        #
        #  "This chapter is provided only for informational purposes and the
        #  various internals described here may change in future GMP releases.
        #  Applications expecting to be compatible with future releases should use
        #  only the documented interfaces described in previous chapters."
        #
        # If this line is used Sage is not such an application.
        #
        # The clean version of the following line is:
        #
        # mpz_init( <mpz_t>(<char *>new + mpz_t_offset) )
        #
        # We save time both by avoiding an extra function call and
        # because the rest of the mpz struct was already initialized
        # fully using the memcpy above.

        (<__mpz_struct *>( <char *>new + mpz_t_offset) )._mp_d = <mp_ptr>mpz_alloc(GMP_LIMB_BITS >> 3)

Namely, when I uncomment mpz_init( <mpz_t>(<char *>new + mpz_t_offset) ) and comment the last line, then the module won't build:

sage/rings/integer.c: In function ‘__pyx_f_4sage_5rings_7integer_fast_tp_new’:
sage/rings/integer.c:35637:18: error: cast specifies array type
error: command 'gcc' failed with exit status 1
Error installing modified sage library code.

So, the "clean version" doesn't even build.

comment:12 follow-up: Changed 7 years ago by jpflori

I think the point of this black magic is to keep a pool of pointers to Sage integers rather than allocating/freeing one each time a new one is needed/deleted.

So when you want a new integer and there are preallocated ones left you just grab one and when you want to delete one, and there is room left in the pool, you put it back there.

FLINT more or less uses a similar strategy (in single threaded mode).

The commented out line makes sense, inside the memory allocated for a Sage integer, we move to the address reserved for the mpz structure and initialize it, I guess its just illegal to cast to mpz_t type which is a typedef for an array of mpz of size one, something like mpz* should be used.

Not quite sure what the global dummy integer is used for, obviously for dark magic refcounting, but still unclear to me.

The point of the global dummy integer is to fool.

comment:13 in reply to: ↑ 10 Changed 7 years ago by jpflori

Replying to SimonKing:

I wonder about this message when Sage crashes:

    The 7 pad bytes at p-7 are not all FORBIDDENBYTE (0xfb):
        at p-7: 0xcb *** OUCH
        at p-6: 0xcb *** OUCH
        at p-5: 0xcb *** OUCH
        at p-4: 0xcb *** OUCH
        at p-3: 0xcb *** OUCH
        at p-2: 0xcb *** OUCH
        at p-1: 0xcb *** OUCH

Could it be that fast_tp_new writes to the wrong location when it does

        (<__mpz_struct *>( <char *>new + mpz_t_offset) )._mp_d = <mp_ptr>mpz_alloc(GMP_LIMB_BITS >> 3)

?

The size allocated looks fine, the address of the pointer as well.

comment:14 Changed 7 years ago by ncohen

  • Cc ncohen added

comment:15 in reply to: ↑ 12 Changed 7 years ago by SimonKing

Replying to jpflori:

I think the point of this black magic is to keep a pool of pointers to Sage integers rather than allocating/freeing one each time a new one is needed/deleted.

Yep, I understood that much.

The commented out line makes sense, inside the memory allocated for a Sage integer, we move to the address reserved for the mpz structure and initialize it, I guess its just illegal to cast to mpz_t type which is a typedef for an array of mpz of size one, something like mpz* should be used.

So, you think one needs to write mpz_init( <__mpz_struct*>(<char *>new + mpz_t_offset)instead of mpz_init( <mpz_t>(<char *>new + mpz_t_offset), and similar for deallocation?

I tried

  • sage/rings/integer.pyx

    diff --git a/sage/rings/integer.pyx b/sage/rings/integer.pyx
    a b  
    59895989        #
    59905990        # The clean version of the following line is:
    59915991        #
    5992         #  mpz_init( <mpz_t>(<char *>new + mpz_t_offset) )
     5992        mpz_init( <__mpz_struct*>(<char *>new + mpz_t_offset) )
    59935993        #
    59945994        # We save time both by avoiding an extra function call and
    59955995        # because the rest of the mpz struct was already initialized
    59965996        # fully using the memcpy above.
    59975997
    5998         (<__mpz_struct *>( <char *>new + mpz_t_offset) )._mp_d = <mp_ptr>mpz_alloc(GMP_LIMB_BITS >> 3)
     5998        #(<__mpz_struct *>( <char *>new + mpz_t_offset) )._mp_d = <mp_ptr>mpz_alloc(GMP_LIMB_BITS >> 3)
    59995999
    60006000    # This line is only needed if Python is compiled in debugging mode
    60016001    # './configure --with-pydebug' or SAGE_DEBUG=yes. If that is the
     
    60426042
    60436043    # Again, we move to the mpz_t and clear it. See above, why this is evil.
    60446044    # The clean version of this line would be:
    6045     #   mpz_clear(<mpz_t>(<char *>o + mpz_t_offset))
    6046 
    6047     mpz_free((<__mpz_struct *>( <char *>o + mpz_t_offset) )._mp_d, 0)
     6045    mpz_clear(<__mpz_struct *>(<char *>o + mpz_t_offset))
     6046
     6047    #mpz_free((<__mpz_struct *>( <char *>o + mpz_t_offset) )._mp_d, 0)
    60486048
    60496049    # Free the object. This assumes that Py_TPFLAGS_HAVE_GC is not
    60506050    # set. If it was set another free function would need to be

but it still segfaults at exit.

comment:16 Changed 7 years ago by jpflori

I did not say it could solve the segfault :) In fact it should do basically the same thing as before, but using the proper functions.

comment:17 follow-up: Changed 7 years ago by jpflori

The problem occurs when deallocating ONE, which is allocated before the hooks are setup, with the hooked method.

Creating ONE after hooking up the method (seemingly) solves the problem.

comment:18 in reply to: ↑ 17 ; follow-up: Changed 7 years ago by SimonKing

Replying to jpflori:

The problem occurs when deallocating ONE, which is allocated before the hooks are setup, with the hooked method.

OK. But ONE is not the only integer that gets created before setting the hook.

The created integers are:

  • 0
  • 1
  • ONE (hence, the integer 1 is created, and then another 1 is created and assigned to ONE)
  • -1
  • 0
  • 1
  • global_dummy_Integer (which is 0)

And them the hooks are established.

Creating ONE after hooking up the method (seemingly) solves the problem.

But probably we should take care of the other elements as well.

By the way: There is another way to make the segfault vanish. We could simply insert print statements, that print the memory addresses of the integers when they are created/deleted/taken from the pool/put into the pool. It works (but don't take it as a serious suggestion...)!

So, how did you find out that ONE is deallocated with the hooked method?

comment:19 follow-up: Changed 7 years ago by jpflori

Hummm, I don't see the Py_INCREF(ONE) you mention in the ticket description in my 5.7.beta1 and 5.7.beta2 install (it is in my 5.6 install).

comment:20 Changed 7 years ago by jpflori

That's "because" of #12615.

comment:21 Changed 7 years ago by jpflori

And it was indeed merged in 5.7.beta1.

comment:22 in reply to: ↑ 18 ; follow-up: Changed 7 years ago by jpflori

Replying to SimonKing:

Replying to jpflori:

The problem occurs when deallocating ONE, which is allocated before the hooks are setup, with the hooked method.

OK. But ONE is not the only integer that gets created before setting the hook.

I agree it is a problem.

The created integers are:

  • 0
  • 1
  • ONE (hence, the integer 1 is created, and then another 1 is created and assigned to ONE)
  • -1
  • 0
  • 1
  • global_dummy_Integer (which is 0)

And them the hooks are established.

Creating ONE after hooking up the method (seemingly) solves the problem.

But probably we should take care of the other elements as well.

Yup.

By the way: There is another way to make the segfault vanish. We could simply insert print statements, that print the memory addresses of the integers when they are created/deleted/taken from the pool/put into the pool. It works (but don't take it as a serious suggestion...)!

So, how did you find out that ONE is deallocated with the hooked method?

Looking at the enhanced backtrace. A dict is deallocated and our tools are smart enough to print the name of the object in the dict with there address and one step further the fats_tp_dealloc method is called on the address corresponding to ONE.

comment:23 in reply to: ↑ 19 Changed 7 years ago by SimonKing

Replying to jpflori:

Hummm, I don't see the Py_INCREF(ONE) you mention in the ticket description in my 5.7.beta1 and 5.7.beta2 install (it is in my 5.6 install).

Line 3584 has Py_INCREF(ONE) (still in sage-5.7.beta2).

But it seems to be different from the incref introduced in #2435. The incref we currently find was hg commit number 4442 of David Roe, who doesn't mention a ticket number in the commit message.

comment:24 Changed 7 years ago by jpflori

I'm speaking of line 5651 which was removed in #12615 and was the fix of #2435.

comment:25 in reply to: ↑ 22 Changed 7 years ago by SimonKing

Replying to jpflori:

The created integers are:

  • 0
  • 1
  • ONE (hence, the integer 1 is created, and then another 1 is created and assigned to ONE)
  • -1
  • 0
  • 1
  • global_dummy_Integer (which is 0)

And them the hooks are established.

Creating ONE after hooking up the method (seemingly) solves the problem.

But probably we should take care of the other elements as well.

Yup.

It is obvious where the integers between ONE and global_dummy_Integer come from:

ONE = Integer(1)
cdef long small_pool_min = -1
cdef long small_pool_max = 1
cdef list small_pool = [Integer(k) for k in range(small_pool_min, small_pool_max+1)]
cdef inline Integer smallInteger(long value):
    cdef Integer z
    if small_pool_min <= value <= small_pool_max:
        return <Integer>small_pool[value - small_pool_min]
    else:
        z = PY_NEW(Integer)
        mpz_set_si(z.value, value)
        return z

Before ONE, the function zero_one_elements() creates 0 and 1.

So, our current guess is that we should move the creation of the dummy and the hooking to the very beginning of the module, right?

Last edited 7 years ago by SimonKing (previous) (diff)

comment:26 follow-up: Changed 7 years ago by jpflori

We should also get rid of ONE as we now have one.

And I'm not sure what the remaining Py_INCREF really is for... Does not the assignment automatically increase the refcount?

comment:27 follow-up: Changed 7 years ago by jpflori

Don't really now why, but the other integers are cdef'ed which I guess is why they don't cause trouble. See http://trac.sagemath.org/sage_trac/ticket/2435#comment:4

comment:28 in reply to: ↑ 26 Changed 7 years ago by SimonKing

Replying to jpflori:

We should also get rid of ONE as we now have one.

Worth trying, anyway. We could define ONE=one.

And I'm not sure what the remaining Py_INCREF really is for... Does not the assignment automatically increase the refcount?

Looks mysterious. Perhaps it is explained in #2435 or #12615. Robert?

comment:29 in reply to: ↑ 27 Changed 7 years ago by SimonKing

Replying to jpflori:

Don't really now why, but the other integers are cdef'ed which I guess is why they don't cause trouble. See http://trac.sagemath.org/sage_trac/ticket/2435#comment:4

... which says: "Any easy fix is to cdef it or incref it, but that isn't what the real problem is." Hence, the incref was a work-around, but no solution.

comment:30 Changed 7 years ago by jpflori

About the other INCREF, unfortunately: http://trac.sagemath.org/sage_trac/ticket/2435#comment:6

comment:31 Changed 7 years ago by jpflori

I think the removal of #12615 is a typo, it really makes no sense.

comment:32 Changed 7 years ago by SimonKing

  • Authors set to Simon King
  • Status changed from new to needs_review

With the patch, sage starts and quits without a crash. I don't know if it is ready for review, but you may have a look.

comment:33 follow-up: Changed 7 years ago by jpflori

I don't really think your solution is right. As I mentioned these small integers won't be deallocated anyway as they are cdefed, so we don't care.

And I'm not really sure that doing anything after the global dummy is created rather than after the hook is in place (I'm aware of http://trac.sagemath.org/sage_trac/ticket/2435#comment:10, but that might just be cosmic rays).

For further investigation, the Py_INCREF of the global dummy in sage/misc/allocator.pyx seems useless as well now as it is cdefed and won't be collected.

comment:34 Changed 7 years ago by jpflori

Anyway, I think we should wait for Robert feedback.

comment:35 follow-up: Changed 7 years ago by jpflori

Maybe we are just f*ing up with Python debug memory api.

What Python rants about is that the first bytes before the actual object ONE, which should have been allocated in PyObject_DebugMallocAPI and set to FORBIDDENBYTE are in fact worth CLEANBYTE, which is what the debug memory api sets for the space reserved for the object itself...

comment:36 Changed 7 years ago by jpflori

Any clue on how to speak with GDB before Sage is initialized?

comment:37 Changed 7 years ago by jpflori

Or its Cython which is too hackish. When global_dummy_Integer is initialized, the preceding bytes are worth CLEANBYTE rather than FORBIDDENBYTE, so there is no chance deallocation will fail with a debug build.

comment:38 in reply to: ↑ 33 ; follow-up: Changed 7 years ago by SimonKing

Replying to jpflori:

I don't really think your solution is right. As I mentioned these small integers won't be deallocated anyway as they are cdefed, so we don't care.

Anyway. I believe it should be cleaner to avoid integers being created in one way and then deleted in a different way.

And I'm not really sure that doing anything after the global dummy is created rather than after the hook is in place.

I can't parse that phrase. Do you say: There should be no code whatsoever between the creation of the global dummy and the creation of the hook?

For further investigation, the Py_INCREF of the global dummy in sage/misc/allocator.pyx seems useless as well now as it is cdefed and won't be collected.

I removed the Py_INCREF, and Sage starts and quits without problem. But would it hurt to have the incref?

comment:39 in reply to: ↑ 38 Changed 7 years ago by jpflori

Replying to SimonKing:

Replying to jpflori:

I don't really think your solution is right. As I mentioned these small integers won't be deallocated anyway as they are cdefed, so we don't care.

Anyway. I believe it should be cleaner to avoid integers being created in one way and then deleted in a different way.

I agree.

And I'm not really sure that doing anything after the global dummy is created rather than after the hook is in place.

I can't parse that phrase. Do you say: There should be no code whatsoever between the creation of the global dummy and the creation of the hook?

I meant that following what you posted you should have tried to create the integers different from the dummy one just after the dummy one and not after the hooks are in place.

But as stated above, I agree that it is cleaner to create everything after the hooks are in place, except for dummy of course cos its not possible. And anyway these integers are cdefed so wont be garbage collected automatically, although you can trigger their deletion by calling del on them.

For further investigation, the Py_INCREF of the global dummy in sage/misc/allocator.pyx seems useless as well now as it is cdefed and won't be collected.

I removed the Py_INCREF, and Sage starts and quits without problem. But would it hurt to have the incref?

I don't think having code no one understands is a good thing. And it would make a small slowdown to keep this incref :) And prevent a proper deallocation of ONE when Sage quits, if we ever want or manage to do that.

comment:40 in reply to: ↑ 35 ; follow-up: Changed 7 years ago by SimonKing

Replying to jpflori:

Maybe we are just f*ing up with Python debug memory api.

What Python rants about is that the first bytes before the actual object ONE, which should have been allocated in PyObject_DebugMallocAPI and set to FORBIDDENBYTE are in fact worth CLEANBYTE, which is what the debug memory api sets for the space reserved for the object itself...

What is the problem with it? Can we prevent it? Does it make sense to prevent it when we are not in debug mode?

comment:41 in reply to: ↑ 40 Changed 7 years ago by jpflori

Replying to SimonKing:

Replying to jpflori:

Maybe we are just f*ing up with Python debug memory api.

What Python rants about is that the first bytes before the actual object ONE, which should have been allocated in PyObject_DebugMallocAPI and set to FORBIDDENBYTE are in fact worth CLEANBYTE, which is what the debug memory api sets for the space reserved for the object itself...

What is the problem with it? Can we prevent it? Does it make sense to prevent it when we are not in debug mode?

It is not a problem outside of debug mode I guess.

In debug mode the problem is that it seems that the object has been malloced/created without these surrounding FORBIDDENBYTE (and some additional info including the size allocated, and what API was used), but when it is deallocaed (what happened because a Py_INCREF was removed in #12615, and ONE was not cdefed), the dealloc functions look for these bytes (that's the OUCH part), and then try to reach something based on the size allocated which should be stored next to these bytes. Unfortunately this additional info is not present (not written when alloced? overwritten later? I don't know...) so the dealloc function gets a random size (look at "18302628885633695743 bytes originally requested" in your first comment) and it leads to a segfault.

comment:42 Changed 7 years ago by SimonKing

Robert, do you agree that letting global_dummy_Integer be the only integer created without pool and with __cinit__ is a valid solution?

comment:43 follow-up: Changed 7 years ago by SimonKing

Concerning removal of the INCREF: I could imagine removing it is a bad idea. After all, the allocator appears to be of general use. Hence, we have no guarantee that in other (potential) uses of sage.misc.allocator we could guarantee externally that the dummy will live forever.

Plus: hook_tp_functions (and thus Py_INCREF) is called exactly once. I don't think that the slowdown caused by it really matters...

comment:44 in reply to: ↑ 43 Changed 7 years ago by jpflori

Replying to SimonKing:

Concerning removal of the INCREF: I could imagine removing it is a bad idea. After all, the allocator appears to be of general use. Hence, we have no guarantee that in other (potential) uses of sage.misc.allocator we could guarantee externally that the dummy will live forever.

I meant the other INCREF mentioned here http://trac.sagemath.org/sage_trac/ticket/2435#comment:6 and with no info on why it would be needed.

About the INCREF on dummy, my point is that it is useless as dummy is cdefed and won't be automatically collected. But indeedit could be called on something not cdefed so let's keep it.

Plus: hook_tp_functions (and thus Py_INCREF) is called exactly once. I don't think that the slowdown caused by it really matters...

Yeah, even the other which gets called every time the method is called won't be that terrible, I was kidding.

comment:45 Changed 7 years ago by jpflori

In fact I don't even feel that it is necessary to ensure that the global dummy never gets collected. I mean what's the matter if it gets collected when Sage quits and we are sure no other Integer will be created ? This should be the case for a more general use of sage/misc/allocator.pyx whence the unnecessity of the INCREF in general.

comment:46 Changed 7 years ago by jpflori

Here comes another proposition of patch.

comment:47 Changed 7 years ago by jpflori

Hum, the problem with this BYTE marker has really something to do with the change of allocator/deallocator.

If I remove the call to hook_fast_tp_functions, then the integers created, even during an interactive session, are preceded by CLEANBYTE rather than FORBIDDENBYTE and deallocated without problems.

So I guess I'll have to have a closer look at Python's code... I must have missed what the default allocator for Integer is, but it does not finish in PyObject_Malloc I guess.

comment:48 Changed 7 years ago by SimonKing

The alternative patch is incomplete: It removes ONE, but it does not change the import line "from sage.rings.integer import ONE" in sage/structure/factorization_integer.py.

It seems to me that the alternative patch is not essentially different from the original patch, but mainly adds more comments - or did I miss something? Could you turn it into a referee patch?

comment:49 Changed 7 years ago by jpflori

Yup, I'll do that.

Changed 7 years ago by jpflori

Cleanup hooked functions and integer allocation.

comment:50 Changed 7 years ago by jpflori

  • Description modified (diff)
  • Reviewers set to Jean-Pierre Flori
  • Status changed from needs_review to positive_review

If Robert has any comment, feel free to change the status of the ticket or enlighten us.

comment:51 follow-up: Changed 7 years ago by SimonKing

One question: Why is it a problem that ONE was created via __cinit__ but deleted via fast_tp_dealloc? After all, if the pool is not involved, this function does the same as __dealloc__ plus decref. Isn't that the same as without the hooks?

comment:52 in reply to: ↑ 51 Changed 7 years ago by jpflori

Replying to SimonKing:

One question: Why is it a problem that ONE was created via __cinit__ but deleted via fast_tp_dealloc? After all, if the pool is not involved, this function does the same as __dealloc__ plus decref. Isn't that the same as without the hooks?

That's quite exactly what I've been ranting about and hoped we could better understand although the problem is not at this level but deeper it seems.

A first guess is that somehow the deep PyObject_Malloc function used before the hooks are setup is not the debug version and do not produce the FORBIDDENBYTE markers (and additional info including the size alloced) and the PyObject_Free function is not as well and does not expect them and everything is fine if we try to deallocate ONE (you can put an explicit del statement and check there is not crash, or completely remove the hooks setup). But when the hooks are setup, these two functions get replaced by the Malloc and Free debug functions which set and expect the FORBIDDENBYTE markers (and some additional info). So at dealloc time it bangs. What I don't understand is why these two last debug version would not used from the start. After all were living in a debug build of Python...

comment:53 Changed 7 years ago by jpflori

So our workaround here is just to make sure that Integer allocated before the hooks are setup do not get deallocated, even when Sage start.

It has two flavours:

  • cdef the Integer, so there is no automatic collection,
  • artificially increase its refcount so that it never reaches zero.

And we use both.

comment:54 follow-up: Changed 7 years ago by jpflori

Ok got it.

The problem is that the dummy is allocated with the PyObject_GC_Malloc function and that adds an additional header of 32 bytes before the object itself. And if you look 36 bytes before the object itself you indeed find the FORBIDDENBYTE markers.

comment:55 in reply to: ↑ 54 Changed 7 years ago by SimonKing

Replying to jpflori:

Ok got it.

The problem is that the dummy is allocated with the PyObject_GC_Malloc function and that adds an additional header of 32 bytes before the object itself.

And fast_tp_new can not use that? What is the difference between PyObject_GC_Malloc and PyObject_MALLOC? I guess it's about the cyclic garbage collection, which seemingly we don't want here. But if we don't want it, why are integers created with PyObject_GC_Malloc before establishing the hook?

comment:56 follow-up: Changed 7 years ago by jpflori

Because I guess that all cdef'ed classes are created with GC enabled (by default?).

comment:57 in reply to: ↑ 56 Changed 7 years ago by SimonKing

Replying to jpflori:

Because I guess that all cdef'ed classes are created with GC enabled (by default?).

OK. But what is the reason for not using it in the hooked functions?

comment:58 follow-up: Changed 7 years ago by jpflori

Don't know. Maybe speed?

As you can see in sage/misc/allocator.pyx we pass a flag to explicitely disable GC.

comment:59 in reply to: ↑ 58 Changed 7 years ago by nbruin

Replying to jpflori:

Don't know. Maybe speed?

Indeed. Types that are not container type (types that do not hold references to other python objects) need not be tracked by GC. I don't think Cython supports "not GC tracked" by default. Basically all Sage objects formally are container types because they have a _parent pointer or something like that.

Integers do too, but since ZZ is supposed to be immortal anyway, it doesn't matter.

It means that

del ZZ

would not lead to ZZ actually being deallocated (even if there are no other references to it anymore, because ZZ.one_element would have a pointer to ZZ that can't be discovered to be part of the cycle by GC.

comment:60 Changed 7 years ago by jdemeyer

  • Merged in set to sage-5.7.beta4
  • Resolution set to fixed
  • Status changed from positive_review to closed
Note: See TracTickets for help on using tickets.