Opened 7 years ago

Last modified 4 weeks ago

#13447 needs_work defect

Make libsingular multivariate polynomial rings collectable — at Version 16

Reported by: nbruin Owned by: rlm
Priority: major Milestone: sage-9.1
Component: memleak Keywords:
Cc: SimonKing, malb, vbraun, gagern, robertwb, ylchapuy, jpflori, burcin Merged in:
Authors: Reviewers:
Report Upstream: N/A Work issues: Input from libsingular experts
Branch: Commit:
Dependencies: Stopgaps:

Description (last modified by SimonKing)

Presently, #715 + #11521 help not permanently keeping parent in memory. In the process we uncovered a hard-but-consistently triggerable problem with the collection of MPolynomialRing_libsingular. We have only observed the problem on bsd.math.washington.edu, MacOSX 10.6 on x86_64.

The present work-around is to permanently store references to these upon creation, thus preventing collection. It would be nice if we could properly solve the problem (or at least establish that the problem is specific to bsd.math)

Apply trac_13447-consolidated_refcount.patch

Change History (18)

comment:1 Changed 7 years ago by nbruin

On 5.4-beta0 + #715 + #11521, there is a doctest failure on bsd.math.washington.edu, an x86_64 machine running MacOSX 10.6:

bash-3.2$ ../../sage -t sage/misc/cachefunc.pyx
sage -t  "devel/sage-main/sage/misc/cachefunc.pyx"          
The doctested process was killed by signal 11
         [12.7 s]
 
----------------------------------------------------------------------
The following tests failed:


        sage -t  "devel/sage-main/sage/misc/cachefunc.pyx" # Killed/crashed
Total time for all tests: 12.7 seconds

The segmentation fault happens reliably, but is hard to study because

  • running the same examples in an interactive session does not trigger the problem
  • running with sage -t -gdb (yes, that's possible!) or sage -t --verbose does not trigger the problem.
  • the order of tests in the file seems fairly important. You can change and

delete some tests but not others. The likely explanation is that a garbage collection has to be triggered under the right conditions, so that the memory corruption (which likely happens upon deallocate somewhere) happens in the right spot.

The segfault happens in the doctest for CachedMethodCaller._instance_call (line 1038 in the sage source; example_27 in the file ~/.sage/tmp/cachefunc_*.py left after doctesting), in the line

            sage: P.<a,b,c,d> = QQ[]

Further instrumentation showed that the segfault happens in sage/libs/singular/ring. pyx, in singular_ring_new, in the part that copies the strings over.

+    sys.stderr.write("before _names allocation\n")
     _names = <char**>omAlloc0(sizeof(char*)*(len(names)))
+    sys.stderr.write("after _names allocation\n")

     for i from 0 <= i < n:
         _name = names[i]
+        sys.stderr.write("calling omStrDup for i=%s with name=%s\n"%(i,names[i])
        _names[i] = omStrDup(_name)
+        sys.stderr.write("after omStrDup\n")

The call _omStrDup segfaults for i=1. Unwinding the _omStrDup call:

     for i from 0 <= i < n:
         _name = names[i]
+        sys.stderr.write("calling omStrDup for i=%s with name=%s\n"%(i,names[i]))
-        _names[i] = omStrDup(_name)
+        j = 0
+        while <bint> _name[j]:
+            j+=1
+        j+=1     #increment to include the 0
+        sys.stderr.write("string length (including 0) seems to be %s\n"%j)
+        copiedname =  <char*>omAlloc(sizeof(char)*(j+perturb))
+        sys.stderr.write("Done reserving memory buffer; got address %x\n"%(<long>copiedname))
+        for 0 <= offset < j:
+            sys.stderr.write("copying character nr %s\n"%offset)
+            copiedname[offset] = _name[offset]
+        _names[i] = copiedname
+        sys.stderr.write("after omStrDup\n")

shows that it's actually the omAlloc call segfaulting. For perturb=7 or higher, the segfault does not happen. For perturb a lower value it does. Given that the omAlloc addresses returned on earlier calls do not seem close to a page boundary, the only way omAlloc can fail is basically by a corrupted freelist an 8-byte bin. Likely culprits:

  • a double free (although I'd expect that would trigger problems on more architectures)
  • someone writing out-of-bounds in omAlloc-managed memory.

Perhaps someone claiming an <int>,<int> structure and storing a <void *> in the second one?

Note the <char*> to <long> cast in the print statement. With an <int>, the compiler complains about loss of precision, but not with <long>. I haven't checked whether <long> is really 64 bits on this machine, though. I have tried and the problem seems to persist with the old singular (5.4b0 has a recently upgraded singular).

It would help a lot if someone could build singular to use plain malloc throughout and then use valgrind or a similar tool, which should be able to immediately catch a double free or out-of-bounds error. If the root of the problem is not OSX-specific, this would even show up on other architectures.

See also [http://trac.sagemath.org/sage_trac/ticket/715#comment:295 #715,comment 295] and below for some more details on how the diagnosis above was obtained.

comment:2 Changed 7 years ago by nbruin

  • Summary changed from Make libsindular multivariate polynomial rings collectable to Make libsingular multivariate polynomial rings collectable

comment:3 Changed 7 years ago by nbruin

OK, I did a little experiment and tried to build singular with plain malloc rather than omalloc. In principle, omalloc has an --with-emulate... flag, but the API offered in that mode is woefully incomplete for singular use. I tried to extend it a little. Very rough result:

http://sage.math.washington.edu/home/nbruin/singular-3-1-5.malloc.spkg

One problem is supplying a memdup, which needs to know the size of an allocated block from its pointer. On BSD, you can use malloc_size for that. On linux one could use malloc_usable_size. The rest is a whole swath of routines that need to be supplied.

The package above is very dirty, but on bsd.math it did provide me with an apparently working libsingular. The singular executable produced didn't seem usable, so keep your old one!

The doctest passes! Not exactly what we were hoping for, but it does make a double-free unlikely. That would have been detected. Corruption after freeing could still be possible, since malloc allocates way bigger blocks, so freelist data is likely missed.

There is of course also the possibility of some routine writing out of bounds, which is less likely to trigger problems with malloc too.

Singular people might be interested in incorporating the changes to omalloc (and preferrably extend them a little bit) so that --with-emulate... becomes a viable option for Singular debugging. Then you can valgrind singular code.

comment:4 Changed 7 years ago by SimonKing

  • Cc SimonKing added

comment:5 Changed 7 years ago by SimonKing

  • Report Upstream changed from N/A to Reported upstream. No feedback yet.

I have contacted Hans Schönemann.

comment:6 follow-up: Changed 7 years ago by SimonKing

  • Cc malb added

I think Martin should be Cc for this as well.

I am not sure if changing to malloc is an acceptable option for Singular. If I understand correctly, omalloc is very important for having a good performance in Gröbner basis computations.

comment:7 in reply to: ↑ 6 Changed 7 years ago by nbruin

Replying to SimonKing:

I am not sure if changing to malloc is an acceptable option for Singular.

I am sure it is not acceptable for production, but being able to swap out omalloc for debugging can be very useful. That's why I tried. I understand that there are great tools to do memory management debugging and omalloc puts them all out of play because it hides all memory alloc/free activity.

It seems omalloc has its own tools but I wasn't able to get them working, I've seen indications that they don't work on 64 bits, and there's a good chance they're not as good as the general ones because they're for a smaller market.

I'm sure someone more familiar with the Singular and omalloc code bases can make a more informed decision on whether having the option of straight malloc memory management for debugging is worthwhile. My initial finding is that it quite likely can be done with relatively small modifications. I got it to more or less work in an evening, while being unfamiliar with the code.

Last edited 7 years ago by nbruin (previous) (diff)

comment:8 follow-up: Changed 7 years ago by nbruin

At least the problem is a real one. I've found a similar iMac:

    Hardware Overview:

      Model Name: iMac
      Model Identifier: iMac10,1
      Processor Name: Intel Core 2 Duo
      Processor Speed: 3.06 GHz
      Number Of Processors: 1
      Total Number Of Cores: 2
      L2 Cache: 3 MB
      Memory: 4 GB
      Bus Speed: 1.07 GHz
      Boot ROM Version: IM101.00CC.B00
      SMC Version (system): 1.52f9

    System Software Overview:

      System Version: Mac OS X 10.6.8 (10K549)
      Kernel Version: Darwin 10.8.0
      64-bit Kernel and Extensions: No
      Time since boot: 5 days 8:46

and bsd.math.washington.edu:

    Hardware Overview:

      Model Name: Mac Pro
      Model Identifier: MacPro5,1
      Processor Name: Quad-Core Intel Xeon
      Processor Speed: 2.4 GHz
      Number Of Processors: 2
      Total Number Of Cores: 8
      L2 Cache (per core): 256 KB
      L3 Cache (per processor): 12 MB
      Memory: 32 GB
      Processor Interconnect Speed: 5.86 GT/s
      Boot ROM Version: MP51.007F.B03
      SMC Version (system): 1.39f11
      SMC Version (processor tray): 1.39f11

    System Software Overview:

      System Version: Mac OS X 10.6.8 (10K549)
      Kernel Version: Darwin 10.8.0
      64-bit Kernel and Extensions: Yes
      Time since boot: 16 days 6:14

Both these machines exhibit the same problem that on 5.4b0 + #715 + #11521, the doctest for cachefunc.pyx segfaults in the same spot. Note that the iMac claims to not have a 64-bit kernel. Sage is compiled to be 64-bit on that machine, though (and seems to work).

Have we actually established that this bug does not occur on newer OSX versions?

comment:9 in reply to: ↑ 8 ; follow-up: Changed 7 years ago by SimonKing

Replying to nbruin:

Have we actually established that this bug does not occur on newer OSX versions?

And have we actually established that this problem does not occur with older Singular versions? I am not totally sure, but I think the problem with #715+#11521 first emerged in sage-5.4.beta0, when Singular-3-1-5 was merged.

comment:10 in reply to: ↑ 9 Changed 7 years ago by nbruin

Replying to SimonKing:

And have we actually established that this problem does not occur with older Singular versions?

Quoting from comment:1

I have tried and the problem seems to persist with the old singular (5.4b0 has a recently upgraded singular).

In the mean time, a bit of googling led me to OSX's "GuardMalloc?". While sage+singular-malloc does not crash on the doctest, it does crash when run with

export DYLD_INSERT_LIBRARIES=/usr/lib/libgmalloc.dylib

Since gmalloc is a memory manager that places each allocation on its own page with protected/unmapped memory as close as possible around the block and that unmaps the block as soon as freed (I'm just parroting the manpage), a segfault is likely due to an access-after-free or access-out-of-bounds -- the one that would normally cause the corruption and then the segfault much later. (that's the whole idea of replacing omalloc -- I don't think it's doable to get omalloc to segfault on an access-after-free). This all comes at a significant speed penalty of course, so experiments are painful and I wouldn't even be able to interpret the backtrace/coredump if I got it (I'd hope that the gmalloc-induced segfault would be reproducible in gdb). It would really be useful if the test file would be pared down to an absolute minimum. That's basically just a backtracking search on which elements can be removed while still triggering a segfault.

However, I think this is a strong indication that there is a real memory violation at the base of this and that it is tracable.

comment:11 follow-up: Changed 7 years ago by SimonKing

I tried to track the problem as follows:

  • sage/libs/singular/ring.pyx

    diff --git a/sage/libs/singular/ring.pyx b/sage/libs/singular/ring.pyx
    a b  
    1616
    1717include "../../ext/stdsage.pxi"
    1818
     19import sys
     20
    1921from sage.libs.gmp.types cimport __mpz_struct
    2022from sage.libs.gmp.mpz cimport mpz_init_set_ui, mpz_init_set
    2123
     
    495497    cdef object r = wrap_ring(existing_ring)
    496498    refcount = ring_refcount_dict.pop(r)
    497499    ring_refcount_dict[r] = refcount+1
     500    sys.stderr.write("reference %d to %d, wrapper %d\n"%(refcount+1,<size_t>existing_ring, id(r)))
     501    sys.stderr.flush()
    498502    return existing_ring
    499503
    500504
     
    536540
    537541    cdef ring_wrapper_Py r = wrap_ring(doomed)
    538542    refcount = ring_refcount_dict.pop(r)
     543    sys.stderr.write("dereference level %d of %d, wrapper %d\n"%(refcount-1,<size_t>doomed, id(r)))
     544    sys.stderr.flush()
    539545    if refcount > 1:
    540546        ring_refcount_dict[r] = refcount-1
    541547        return
  • sage/rings/polynomial/multi_polynomial_libsingular.pyx

    diff --git a/sage/rings/polynomial/multi_polynomial_libsingular.pyx b/sage/rings/polynomial/multi_polynomial_libsingular.pyx
    a b  
    151151    sage: b-j*c
    152152    b - 1728*c
    153153"""
     154import sys
    154155
    155156# The Singular API is as follows:
    156157#
     
    242243
    243244import sage.libs.pari.gen
    244245import polynomial_element
    245 
     246from sage.libs.singular.ring cimport wrap_ring
    246247cdef class MPolynomialRing_libsingular(MPolynomialRing_generic):
    247248
    248249    def __cinit__(self):
     
    364365        from sage.rings.polynomial.polynomial_element import PolynomialBaseringInjection
    365366        base_inject = PolynomialBaseringInjection(base_ring, self)
    366367        self.register_coercion(base_inject)
     368        sys.stderr.write("At %d, creating %s\n"%(<size_t>self._ring, self))
     369        sys.stderr.flush()
    367370
    368371    def __dealloc__(self):
    369372        r"""
     
    390393            sage: _ = gc.collect()
    391394        """
    392395        if self._ring != NULL:  # the constructor did not raise an exception
     396            from sage.libs.singular.ring import ring_refcount_dict
     397            try:
     398                level = ring_refcount_dict[wrap_ring(self._ring)]
     399            except KeyError:
     400                level = -1
     401            if level > 1:
     402                sys.stderr.write("WARNING: %d\n"%(<size_t>self._ring))
     403            else:
     404                sys.stderr.write("__dealloc__: %s\n"%(<size_t>self._ring))
     405            sys.stderr.flush()
    393406            singular_ring_delete(self._ring)
    394407
    395408    def __copy__(self):

Then, I ran python -t on the segfaulting test. Observation: It happens precisely twice that "WARNING" is printed, i.e., the __dealloc__ method is called even though there remain multiple references to the underlying libsingular ring.

In both cases it is QQ['a','b','c','d']. Here is a snipped from the output:

reference 2 to 4409548912, wrapper 4302568952
reference 3 to 4409548912, wrapper 4302569000
reference 4 to 4409548912, wrapper 4302568952
At 4409548912, creating Multivariate Polynomial Ring in a, b, c, d over Rational Field
reference 5 to 4409548912, wrapper 4302569000
reference 6 to 4409548912, wrapper 4302568952
reference 7 to 4409548912, wrapper 4302569000
reference 8 to 4409548912, wrapper 4302568952
reference 9 to 4409548912, wrapper 4302569000
reference 10 to 4409548912, wrapper 4302568952
reference 2 to 4409549416, wrapper 4302568928
reference 3 to 4409549416, wrapper 4302569000
dereference level 9 of 4409548912, wrapper 4302568928
dereference level 8 of 4409548912, wrapper 4302568952
dereference level 7 of 4409548912, wrapper 4302568928
dereference level 6 of 4409548912, wrapper 4302568952
dereference level 5 of 4409548912, wrapper 4302568928
dereference level 4 of 4409548912, wrapper 4302568952
dereference level 3 of 4409548912, wrapper 4302568928
WARNING: 4409548912
dereference level 2 of 4409548912, wrapper 4302568952
dereference level 1 of 4409548912, wrapper 4302568928
dereference level 0 of 4409548912, wrapper 4302568952

However, I am not totally sure whether this indicates a problem, because in both cases the remaining references are immediately removed. Also, it is always the case that 4 references are set to the libsingular ring before actually creating the polynomial ring in Sage.

One last observation: You may notice a libsingular ring at address 4409549416 that is referenced here as well, aparently in the middle of the construction of QQ['a','b','c','d']. It is later used for QQ['x','y','z']. The last report before the segfault is

reference 32 to 4409549416, wrapper 4302568952

Seems like a wild-goose chase to me, though.

comment:12 in reply to: ↑ 11 Changed 7 years ago by SimonKing

Replying to SimonKing:

One last observation: You may notice a libsingular ring at address 4409549416 that is referenced here as well, aparently in the middle of the construction of QQ['a','b','c','d']. It is later used for QQ['x','y','z']. The last report before the segfault is

reference 32 to 4409549416, wrapper 4302568952

And this ring is in fact currRing when it crashes.

comment:13 Changed 7 years ago by nbruin

OK! good progress. Instrumenting sagedoc.py a little bit we can indeed see the order in which the doctests are executed:

__main__
__main__.change_warning_output
__main__.check_with_tolerance
__main__.example_0
__main__.example_1
__main__.example_10
__main__.example_11
__main__.example_12
__main__.example_13
__main__.example_14
__main__.example_15
__main__.example_16
__main__.example_17
__main__.example_18
__main__.example_19
__main__.example_2
__main__.example_20
__main__.example_21
__main__.example_22
__main__.example_23
__main__.example_24
__main__.example_25
__main__.example_26
__main__.example_27
Unhandled SIGSEGV

so that indeed seems to be alphabetical order.

Now let's run the doctests with singular-using-malloc. Result: No segfault. OSX comes with gmalloc, which is a guarded malloc for debugging purposes. It places every allocation on a separate page and unmaps that page upon freeing. So, any access-after-free leads to a segfault. Now we do get a segfault and it happens a lot sooner than example_27. In fact, now the segfault survives in gdb. The error happens when executing

G = I.groebner_basis()###line 921:_sage_    >>> G = I.groebner_basis()

Here's a session with gdb once the segfault has happened. I think I have been able to extract enough data to point at the probably problem.

Program received signal EXC_BAD_ACCESS, Could not access memory.
Reason: KERN_INVALID_ADDRESS at address: 0x00000001850dbf44
__pyx_f_4sage_4libs_8singular_8function_call_function (__pyx_v_self=0x190ab8960, __pyx_v_args=0x190a8e810, __pyx_v_R=0x19c39be70, __pyx_optional_args=<value temporarily unavailable, due to optimizations>) at sage/libs/singular/function.cpp:13253
13253       currRingHdl->data.uring->ref = (currRingHdl->data.uring->ref - 1);
####NB: This is line 1410 in sage/libs/singular/function.pyx
(gdb) print currRingHdl
$1 = (idhdl) 0x17c2b5fd0
(gdb) print currRingHdl->data
$2 = {
  i = -2062696816,
  uring = 0x1850dbe90,
  p = 0x1850dbe90,
  n = 0x1850dbe90,
  uideal = 0x1850dbe90,
  umap = 0x1850dbe90,
  umatrix = 0x1850dbe90,
  ustring = 0x1850dbe90 <Address 0x1850dbe90 out of bounds>,
  iv = 0x1850dbe90,
  bim = 0x1850dbe90,
  l = 0x1850dbe90,
  li = 0x1850dbe90,
  pack = 0x1850dbe90,
  pinf = 0x1850dbe90
}
(gdb) print currRingHdl->data.uring
$3 = (ring) 0x1850dbe90
(gdb) print currRingHdl->data.uring->ref
Cannot access memory at address 0x1850dbf44
(gdb) print  *__pyx_v_si_ring
$10 = {
  idroot = 0x0, 
  order = 0x19c3cbff0, 
  block0 = 0x19c3cdff0, 
  block1 = 0x19c3cfff0, 
  parameter = 0x0, 
  minpoly = 0x0, 
  minideal = 0x0, 
  wvhdl = 0x19c3c9fe0, 
  names = 0x19c3bdfe0, 
  ordsgn = 0x19c3ddfe0, 
  typ = 0x19c3dffd0, 
  NegWeightL_Offset = 0x0, 
  VarOffset = 0x19c3d9ff0, 
  qideal = 0x0, 
  firstwv = 0x0, 
  PolyBin = 0x104ee8440, 
  ringtype = 0, 
  ringflaga = 0x0, 
  ringflagb = 0, 
  nr2mModul = 0, 
  nrnModul = 0x0, 
  options = 100663424, 
  ch = 0, 
  ref = 0, 
  float_len = 0, 
  float_len2 = 0, 
  N = 3, 
  P = 0, 
  OrdSgn = 1, 
  firstBlockEnds = 3, 
  real_var_start = 0, 
  real_var_end = 0, 
  isLPring = 0, 
  VectorOut = 0, 
  ShortOut = 0, 
  CanShortOut = 1, 
  LexOrder = 0, 
  MixedOrder = 0, 
  ComponentOrder = -1, 
  ExpL_Size = 3, 
  CmpL_Size = 3, 
  VarL_Size = 1, 
  BitsPerExp = 20, 
  ExpPerLong = 3, 
  pCompIndex = 2, 
  pOrdIndex = 0, 
  OrdSize = 1, 
  VarL_LowIndex = 1, 
  MinExpPerLong = 3, 
  NegWeightL_Size = 0, 
  VarL_Offset = 0x19c3e3ff0, 
  bitmask = 1048575, 
  divmask = 1152922604119523329, 
  p_Procs = 0x19c3e7f80, 
  pFDeg = 0x104a80150 <pDeg(spolyrec*, sip_sring*)>, 
  pLDeg = 0x104a80920 <pLDegb(spolyrec*, int*, sip_sring*)>, 
  pFDegOrig = 0x104a80150 <pDeg(spolyrec*, sip_sring*)>, 
  pLDegOrig = 0x104a80920 <pLDegb(spolyrec*, int*, sip_sring*)>, 
  p_Setm = 0x104a7ff40 <p_Setm_TotalDegree(spolyrec*, sip_sring*)>, 
  cf = 0x11e487e70, 
  algring = 0x0, 
  _nc = 0x0
}
(gdb) print __pyx_v_si_ring
$11 = (ip_sring *) 0x19c3c5e90
(gdb) print ((struct __pyx_obj_4sage_5rings_10polynomial_28multi_polynomial_libsingular_MPolynomialRing_libsingular *)__pyx_v_R)->_ring
$12 = (ip_sring *) 0x19c3c5e90
(gdb) print ((struct __pyx_obj_4sage_5rings_10polynomial_6plural_NCPolynomialRing_plural *)__pyx_v_R)->_ring
$13 = (ip_sring *) 0x10019ff30
####NB: so PY_TYPE_CHECK(R, MPolynomialRing_libsingular) is true
(gdb) print (__pyx_v_si_ring != currRing)
$15 = false
####NB: does this mean that rChangeCurrRing(si_ring) got executed or that si_ring already equalled currRing?
(gdb) print (currRingHdl->data.uring != currRing)
$16 = true
####NB: of course, that's why we segfault on the statement that follows:
####NB:       currRingHdl.data.uring.ref -= 1
(gdb) print *(currRingHdl->data.uring)
Cannot access memory at address 0x1850dbe90
####NB: It looks like currRingHdl.data.uring has been unbound.
####NB: naturally, changing a field on that pointer will corrupt memory (or in this case
####NB: because gmalloc has unmapped the page, cause a segfault)
####NB: Could it be that the code here should really test for uring being still valid?
####NB: (if it can do that at all)?

So I think the issue is in sage.lib.singular.function.call_function:

...
    if currRingHdl.data.uring!= currRing:
        currRingHdl.data.uring.ref -= 1
        currRingHdl.data.uring = currRing # ref counting?
        currRingHdl.data.uring.ref += 1
...

The evidence points absolutely to currRingHdl.data.uring pointing to unallocated (probably freed) memory. The access then of course can have all kinds of effects. At this point it is probably doable for a LibSingular expert to reason about the code whether uring should always be valid at this point (I suspect not).

It looks suspicious to me that sage.libs.singular.ring.singular_ring_delete does do a careful dance to zero out the currRing variable but doesn't seem to care about currRngHdl. I also find it worrying that there apparently is a refcount system right on the ring structures (as you can see above) and yet in singular_ring_delete a separate refcounting dict is used. One would think the same refcounting system should be borrowed by singular_ring_new and singular_ring_delete. It looks to me the code above thinks that by increasing ...uring.ref the reference is protected, but singular_ring_delete doesn't seem to take into account this refcount. It could well be that I'm misinterpreting the code and that this is all perfectly safe, though.

Libsingular specialists: Keep in mind that in principle, singular code can get executed in rather awkward moments, possibly as part of clean-ups of circular garbage and call-backs on weakref cleanup, where equality might be tested of objects that are soon to be deallocated themselves.

The fickleness of the bug is consistent with this condition arising during a cyclic garbage collection with just the right amount of junk around. That would make the occurrence of the bug depend on just about everything in memory. Or at least if you depend on the corruption leading to a segfault, it depends on which location exactly gets corrupted.

I think we might be getting close to a badge for debugging excellence here!

Last edited 7 years ago by nbruin (previous) (diff)

Changed 7 years ago by nbruin

take into account both refcount_dict and ring*.ref fields on deletion.

Changed 7 years ago by nbruin

Consolidate two refcount systems (cruft not yet removed from patch)

comment:14 Changed 7 years ago by nbruin

OK, two independent patches. Either prevents the segfault. I may just have removed the symptom, but not the cause.

If I'm correctly understanding the problem, trac_13447-consolidated_refcount.patch should be the preferred solution. However, my unfamiliarity with (lib)singular's intricacies might have caused an oversight somewhere. I think my interpretation is consistent with the use in sage.lib.singular.function.call_function, which is my main source of inspiration.

If people agree, we can clean out the cruft remaining from the refcounting method implemented locally.

comment:15 Changed 7 years ago by nbruin

  • Status changed from new to needs_info
  • Work issues set to Input from libsingular experts

comment:16 Changed 7 years ago by SimonKing

  • Cc vbraun gagern added
  • Description modified (diff)

Replying to nbruin:

If I'm correctly understanding the problem, trac_13447-consolidated_refcount.patch should be the preferred solution.

I didn't test the patch yet. However, it seems very straight forward to me: There already is a refcounting, and thus one should use it. I am Cc'ing Volker Braun and Martin von Gagern, the authors of #11339. Does trac_13447-consolidated_refcount.patch make sense to you as well?

Keeping a double refcount (as with trac_13447-double_refcount.patch seems suspicious to me.

Perhaps one should let the patchbots test it? Thus, I'll add this as dependency for #715, and for the patchbot:

Apply trac_13447-consolidated_refcount.patch

PS: You really deserve a badge for debugging excellence! Do I understand correctly that the bug is not on the side of Singular? I'll inform Hans accordingly.

Note: See TracTickets for help on using tickets.