Opened 7 years ago

Closed 7 years ago

Last modified 7 years ago

#13731 closed defect (fixed)

Fix libsingular memory management

Reported by: nbruin Owned by: rlm
Priority: major Milestone: sage-5.6
Component: memleak Keywords:
Cc: SimonKing, fbissey, malb Merged in: sage-5.6.beta0
Authors: Nils Bruin, Simon King Reviewers: Nils Bruin
Report Upstream: Fixed upstream, in a later stable release. Work issues:
Branch: Commit:
Dependencies: Stopgaps:

Description (last modified by jdemeyer)

We created a new patchlevel for the singular-3-1-5 spkg. It contains backports of some upstream fixes of several memory corruptions.

If one does export SINGULAR_XALLOC=yes before installing the spkg, Singular's usual memory manager "omalloc" will be replaced by "xalloc". This is a compatibility layer on top of malloc. The resulting Singular executable and library will be slower, but allows the use of some debug tools.

Here is the background.

We have several indications that interaction with libsingular's memory management is tricky. Debugging is hard, because libsingular uses omalloc for its memory management and that hides all allocation/deallocation from conventional memory checking tools. Singular's allocation library is relatively pluggable, though, and (at a speed penalty) one can mostly switch it to using the system malloc for everything.

Preliminary packages for linux and for osx using malloc are found here:

With the help from libsingular-devel, we managed to use xalloc instead of malloc. We now obtain Singular executable and library, and can still use debugging tools.

Install

The spkg contains singular_15435.patch and singular_part_of_changeset_baadc0f7.patch, which backport upstream fixes. It usually builds with omalloc. If export SINGULAR_XALLOC=yes, then it builds with xalloc, which is a compatibility layer for omalloc on top of malloc.

Bugs fixed

With the preliminary linux spkg, one obtains:

$ export MALLOC_CHECK_ 3
$ sage
...
sage: A.<x,y> = FreeAlgebra(QQ, 2)
sage: P.<x,y> = A.g_algebra({y*x:-x*y})
sage: x*y
*** glibc detected *** /usr/local/sage/5.5b1/local/bin/python: free(): 

gdb backtrace:

#0  0x00000031cfe36285 in raise () from /lib64/libc.so.6
#1  0x00000031cfe37b9b in abort () from /lib64/libc.so.6
#2  0x00000031cfe7774e in __libc_message () from /lib64/libc.so.6
#3  0x00000031cfe7da76 in malloc_printerr () from /lib64/libc.so.6
#4  0x00007fffd82d5c34 in gnc_mm_Mult_nn (F0=<optimized out>,
G0=<optimized out>, r=0x2faa720) at gring.cc:507
#5  0x00007fffd82d6abc in gnc_p_Mult_mm_Common (p=0x326e8c0,
m=<optimized out>, side=0, r=0x2faa720) at gring.cc:424
#6  0x00007fffd76c7860 in nc_mm_Mult_pp (p=0x342aaf0, r=0x2faa720,
m=0x2fb6530) at /usr/local/sage/5.5b1/local/include/singular/gring.h:
171
#7  pp_Mult_qq (r=0x2faa720, q=0x342aaf0, p=0x2fb6530) at /usr/local/
sage/5.5b1/local/include/singular/pInline2.h:667
#8  __pyx_f_4sage_4libs_8singular_10polynomial_singular_polynomial_mul
(__pyx_v_ret=0x7fffffffb3c8, __pyx_v_p=0x2fb6530, __pyx_v_q=0x342aaf0,
__pyx_v_r=0x2faa720)
    at sage/libs/singular/polynomial.cpp:3895
#9  0x00007fffd7b096f8 in
__pyx_f_4sage_5rings_10polynomial_6plural_19NCPolynomial_plural__mul_
(__pyx_v_left=0x7fffbe6bd758, __pyx_v_right=0x7fffbe6bd878,
__pyx_skip_dispatch=<optimized out>)
    at sage/rings/polynomial/plural.cpp:12060

running it in valgrind gets you complaints about the same locations.

On OSX, MALLOC_CHECK_ is not available. But gmalloc does detect the crash.

The new spkg fixes that problem

Testing

Meanwhile two of the underlying problems where found and fixed upstream. http://sage.math.washington.edu/home/SimonKing/SAGE/spkgs/singular-3-1-5.p2.spkg contains backports of these fixes. With the exeption of one unrelated problem in sage/graphs/, the whole Sage test suite passes on Linux with export MALLOC_CHECK_=yes.

Attachments (5)

sage_trac_13731.patch (3.7 KB) - added by SimonKing 7 years ago.
A patch for the singular spkg to produce a libsingular that uses malloc
singular_15435.patch (3.7 KB) - added by SimonKing 7 years ago.
Backporting Singular changeset 15435
singular_part_of_changeset_baadc0f7.patch (645 bytes) - added by SimonKing 7 years ago.
Backporting a small part of Singular changeset baadc0f7
singular_xalloc.patch (827.7 KB) - added by SimonKing 7 years ago.
Build Singular and libsingular with omalloc replaced by xalloc
xalloc-makefile.diff (6.8 KB) - added by jdemeyer 7 years ago.
Patch added to the spkg

Download all attachments as: .zip

Change History (142)

comment:1 Changed 7 years ago by nbruin

See also #13447 for previous uses of libsingular-malloc, as well as this sage-devel thread

comment:2 follow-up: Changed 7 years ago by nbruin

We can also use ElectricFence? as a malloc substitute:

$ export LD_PRELOAD=libefence.so
$ export EF_ALLOW_MALLOC_0=1     #apparently sage/python does a malloc(0) somewhere
$ echo 512000 | sudo tee /proc/sys/vm/max_map_count #efence uses mprotect A LOT
$ ./sage -t --gdb devel/sage/sage/libs/singular/function.pyx 

This gives an error in a different location. Traceback:

#0  0x00007fffce9fb3f2 in List<CanonicalForm>::isEmpty(this=0x7fffa515b000) at ./templates/ftmpl_list.cc:256
#1  0x00007fffce96a350 in multiFactorize (F=..., v=...) at facFactorize.cc:710
#2  0x00007fffce905e9d in ratSqrfFactorize (v=..., G=...) at facFactorize.h:45
#3  ratFactorize (G=..., v=..., substCheck=<optimized out>) at facFactorize.h:125
#4  0x00007fffce90303c in factorize (f=..., issqrfree=false) at cf_factor.cc:651
#5  0x00007fffce7a5b44 in singclap_factorize (f=0x7fff9e16dfd0, v=0x7fffffffb548, with_exps=0) at clapsing.cc:835
#6  0x00007fffce720d50 in jjFAC_P (res=0x7fff9e16ffc0, u=0x7fff9ea7afc0) at iparith.cc:3995
#7  0x00007fffce729c42 in iiExprArith1 (res=0x7fff9e16ffc0, a=0x7fff9ea7afc0, op=424) at iparith.cc:7869
#8  0x00007fffce12df55 in __pyx_f_4sage_4libs_8singular_8function_17KernelCallHandler_handle_call(__pyx_v_self=0x7fff9e967b48, __pyx_v_argument_list=0x7fff9e975b40, __pyx_optional_args=<optimized out>) at sage/lib/singular/function.cpp:11286 

All these error point in the same direction, though: As soon as we start interacting with the singular interpreter we get exposed to whatever memory management it does and we're in big trouble. Use of libsingular on more basic levels usually doesn't expose us to libsingular memory management, so no problems arise.

comment:3 Changed 7 years ago by SimonKing

  • Cc SimonKing added

comment:4 in reply to: ↑ description ; follow-up: Changed 7 years ago by pcpa

$ export MALLOC_CHECK_ 3
$ sage ... sage: A.<x,y> = FreeAlgebra (QQ, 2)
sage: P.<x,y> = A.g_algebra({y*x:-x*y})
sage: x*y ***
glibc detected *** /usr/local/sage/5.5b1/local/bin/python: free():

I do not see this in my current sagemath-5.4 rpm package, but I suspect it may have been either corrected or worked around by one of the patches at http://pkgs.fedoraproject.org/cgit/Singular.git/tree most likely Singular-undefined.patch, but the padding added by Singular-M2_memutil_debuggging.patch may be actually hiding some problem. Or, it may be something different, the singular rpm was known to not build properly or generate proper binaries if an older singular was installed but I did not care much about that because official rpms are built in a clean chroot.

comment:5 in reply to: ↑ 4 ; follow-up: Changed 7 years ago by nbruin

Replying to pcpa:

I do not see this in my current sagemath-5.4 rpm package

Thanks for checking!

Are you sure your singular gets built to use the glibc malloc for all small allocations and not omalloc on "malloc"-ed pages? You may try setting EXTRA=0 to remove the padding. I don't expect that padding comes into play, though. I suspect a double free or a free on a pointer that doesn't point to a malloc-ed block to begin with.

comment:6 in reply to: ↑ 5 ; follow-up: Changed 7 years ago by pcpa

Replying to nbruin:

Replying to pcpa:

I do not see this in my current sagemath-5.4 rpm package

Thanks for checking!

Are you sure your singular gets built to use the glibc malloc for all small allocations and not omalloc on "malloc"-ed pages? You may try setting EXTRA=0 to remove the padding. I don't expect that padding comes into play, though. I suspect a double free or a free on a pointer that doesn't point to a malloc-ed block to begin with.

Sorry, I completely misunderstood what was mean't by the spkg and the '--with-emulate-omalloc' option. I will try to debug more later, but I cannot build the Singular rpm with that option as it always fails in the kernel subdirectory due to missing omEmulate* declarations.

comment:7 in reply to: ↑ 6 Changed 7 years ago by nbruin

Sorry, I completely misunderstood what was mean't by the spkg and the '--with-emulate-omalloc' option. I will try to debug more later, but I cannot build the Singular rpm with that option as it always fails in the kernel subdirectory due to missing omEmulate* declarations.

Yes, my edits in the linked-to singular spkgs add declarations to make those errors go away. I didn't exactly respect the rest of build options in the process, though (because I didn't understand them). Cleaning that up to a well-defined patch to the singular package would be a great help.

comment:8 Changed 7 years ago by SimonKing

I am trying to build Sage-5.5.rc0 with the malloc singular spkg for linux. Platform: openSuse 12.1, on Intel Core2 Duo.

What I tried: Untar the sage tarball, replace the usual Singular spkg with yours, and type make. Singular fails to build, because of

g++ -O2 -I. -I.. -I/home/simon/SAGE/debug/sage-5.5.rc0/local -I/home/simon/SAGE/debug/sage-5.5.rc0/local/include -I/home/simon/SAGE/debug/sage-5.5.rc0/local/include -I/home/simon/SAGE/debug/sage-5.5.rc0/local/include  -I/usr/local/include  -DNDEBUG -DOM_NDEBUG -Dx86_64_Linux -DHAVE_CONFIG_H -DESINGULAR -DPROTO -o ESingular emacs.cc ../kernel/fegetopt.o \
-rdynamic -L/home/simon/SAGE/debug/sage-5.5.rc0/local/kernel -L../kernel -lkernel -L/home/simon/SAGE/debug/sage-5.5.rc0/local/lib -L/home/simon/SAGE/debug/sage-5.5.rc0/local/lib  -L/usr/local/lib  -ldl -lm -lsingfac -lsingcf -lntl -lgmp -lreadline -lncurses -lnsl -lm  -lnsl -lbsd  -lomalloc -lpthread  ../kernel/mmalloc.o
/usr/lib64/gcc/x86_64-suse-linux/4.6/../../../../x86_64-suse-linux/bin/ld: cannot find -lbsd

Could it be that it is a problem with SAGE_ROOT/local/lib versus SAGE_ROOT/local/lib64? That used sometimes to be a problem with openSuse.

comment:9 Changed 7 years ago by SimonKing

No, libdir is used in spkg-install. But I wonder: The original Singular spkg contains three Makefiles: in dyn_modules/python, in calldummy/ and in omalloc/Misc/dlmalloc. But the malloc Singular package from here contains 14 Makefiles. Could it be that some of them should better be created by ./configure?

comment:10 Changed 7 years ago by SimonKing

The spkg also contains several files created by configure scripts. Is that really intended?

comment:11 Changed 7 years ago by SimonKing

When I removed all autogenerated files from your spkg, installing singular-3-1-5.malloc-linux succeeded!

I guess you are aware that one is supposed to leave the Singular source files untouched when creating an spkg - after all, Singular-malloc is for testing only.

comment:12 Changed 7 years ago by SimonKing

Building sage-5.5.rc0 did almost succeed.

...
gcc -fno-strict-aliasing -fwrapv -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -fPIC -I/home/simon/SAGE/debug/sage-5.5.rc0/local/include/singular/ -I/home/simon/SAGE/debug/sage-5.5.rc0/local/include -I/home/simon/SAGE/debug/sage-5.5.rc0/local/include/csage -I/home/simon/SAGE/debug/sage-5.5.rc0/devel/sage/sage/ext -I/home/simon/SAGE/debug/sage-5.5.rc0/local/include/python2.7 -c sage/algebras/letterplace/free_algebra_element_letterplace.cpp -o build/temp.linux-x86_64-2.7/sage/algebras/letterplace/free_algebra_element_letterplace.o -w
cc1plus: warning: command line option ‘-Wstrict-prototypes’ is valid for Ada/C/ObjC but not for C++ [enabled by default]
cc1plus: warning: command line option ‘-Wstrict-prototypes’ is valid for Ada/C/ObjC but not for C++ [enabled by default]
In file included from /home/simon/SAGE/debug/sage-5.5.rc0/local/include/singular/structs.h:13:0,
                 from /home/simon/SAGE/debug/sage-5.5.rc0/local/include/libsingular.h:6,
                 from sage/algebras/letterplace/free_algebra_element_letterplace.cpp:283:
/home/simon/SAGE/debug/sage-5.5.rc0/local/include/omalloc.h:784:30: fatal error: omalloc/omMalloc.h: Datei oder Verzeichnis nicht gefundenIn file included from /home/simon/SAGE/debug/sage-5.5.rc0/local/include/singular/structs.h:13:0,
                 from /home/simon/SAGE/debug/sage-5.5.rc0/local/include/libsingular.h:6,
                 from sage/algebras/letterplace/free_algebra_letterplace.cpp:283:
/home/simon/SAGE/debug/sage-5.5.rc0/local/include/omalloc.h:784:30: fatal error: omalloc/omMalloc.h: Datei oder Verzeichnis nicht gefunden
compilation terminated.

So, that looks like coming from Plural.

On the other hand, your example in the ticket description does use plural.

What went wrong here? I did erase src/omalloc/omMalloc.h from your spkg, since it is not part of the "original" Singular spkg (hence, I thought it was autogenerated).

comment:13 follow-up: Changed 7 years ago by SimonKing

PS: Here is a part of the install log for singular-malloc:

updating cache .././config.cache
creating ./config.status
creating Makefile
creating omConfig.h
creating omlimits.h
creating omMalloc.h
configuring in factory
running /bin/sh ./configure  --prefix=/home/simon/SAGE/debug/sage-5.5.rc0/local --exec-prefix=/home/simon/SAGE/debug/sage-5.5.rc0/local --bindir=/home/simon/SAGE/debug/sage-5.5.rc0/local/bin --libdir=/home/simon/SAGE/debug/sage-5.5.rc0/local/lib --with-apint=gmp --with-malloc=system --with-emulate-omalloc --with-NTL --without-MP --without-lex --without-Boost --without-flint --enable-Singular --enable-factory --enable-libfac --enable-IntegerProgramming --disable-doc --with-debug --includedir=/home/simon/SAGE/debug/sage-5.5.rc0/local/include --enable-omalloc --with-external-config_h='/home/simon/SAGE/debug/sage-5.5.rc0/spkg/build/singular-3-1-5.malloc-linux/src/Singular/omSingularConfig.h' --with-track-custom --enable-Plural --with-factory --with-libfac --with-Singular=yes --cache-file=.././config.cache --srcdir=.

So, the missing header actually is created.

Also, I don't understand why omMalloc.h is needed: In a previous Sage installation with the standard Singular 3-1-5 package, I can't find omMalloc.h in SAGE_ROOT/local/....

comment:14 in reply to: ↑ 13 Changed 7 years ago by nbruin

Replying to SimonKing:

So, the missing header is actually is created.

Yes, indeed, and that is the one I copy over manually because I didn't figure out how to do it in the install script (that should be straightforward though)

Also, I don't understand why omMalloc.h is needed: In a previous Sage installation with the standard Singular 3-1-5 package, I can't find omMalloc.h in SAGE_ROOT/local/....

It might not be. The reason why it is needed now is because of the rough way I mutilated the omalloc source in Singular. Its configuration process is a big tangle of pasting together header files, so when I finally found a way to effect the desired changes I was already happy. That involved knocking out some #ifdef ... #endif preprocessor directives in the source. I'm sure it's straightforward to fix that a little more nicely (I wasn't aiming for production code at the time, just a quick tool to debug stuff). That activated a #include <omalloc/omAlloc.h> as well. I don't know if errors happen if you delete that line. Anyway, the libsingular*.pyx files import the header file in which that include ends up, so that's why presently, that file needs to be in the include paths used by sage -b.

I wasn't exaggerating when I called the package rather untidy ...

Last edited 7 years ago by nbruin (previous) (diff)

comment:15 Changed 7 years ago by fbissey

  • Cc fbissey added

comment:16 Changed 7 years ago by SimonKing

I managed to build Sage with Singular-malloc. What I did to make it work:

  • open the spkg
  • remove all autogenerated files
  • change src/omalloc/Makefile.in so that the freshly built omMalloc.h is installed in $(includedir)/omalloc (creating that directory first)

Apparently the singular executable is not usable (when I try to start it, it simply seems to hang).

I can reproduce the crash from the ticket description:

simon@linux-sqwp:~/SAGE/debug/sage-5.5.rc0> export MALLOC_CHECK_=3
simon@linux-sqwp:~/SAGE/debug/sage-5.5.rc0> ./sage
----------------------------------------------------------------------
| Sage Version 5.5.rc0, Release Date: 2012-11-17                     |
| Type "notebook()" for the browser-based notebook interface.        |
| Type "help()" for help.                                            |
----------------------------------------------------------------------
**********************************************************************
*                                                                    *
* Warning: this is a prerelease version, and it may be unstable.     *
*                                                                    *
**********************************************************************
sage: A.<x,y> = FreeAlgebra(QQ, 2)
sage: P.<x,y> = A.g_algebra({y*x:-x*y})
sage: x*y
*** glibc detected *** python: free(): invalid pointer: 0x0000000002806c90 ***
======= Backtrace: =========
/lib64/libc.so.6(+0x766d6)[0x7fa9ed9cc6d6]
/home/simon/SAGE/debug/sage-5.5.rc0/local/lib/libsingular.so(_Z14gnc_mm_Mult_nnPiS_P9sip_sring+0x294)[0x7fa9d35c7c54]
/home/simon/SAGE/debug/sage-5.5.rc0/local/lib/libsingular.so(_Z20gnc_p_Mult_mm_CommonP8spolyrecS0_iP9sip_sring+0x1ec)[0x7fa9d35c8acc]
/home/simon/SAGE/debug/sage-5.5.rc0/local/lib/python2.7/site-packages/sage/libs/singular/polynomial.so(+0x4bd0)[0x7fa9d29a9bd0]
/home/simon/SAGE/debug/sage-5.5.rc0/local/lib/python2.7/site-packages/sage/rings/polynomial/plural.so(+0x1fbd8)[0x7fa9d2deebd8]
/home/simon/SAGE/debug/sage-5.5.rc0/local/lib/python2.7/site-packages/sage/structure/element.so(+0x2a399)[0x7fa9e4bed399]
...

So, that's a good platform to try to hunt the Heisenbugs from #12215, #715 and friends once and for all.

comment:17 follow-up: Changed 7 years ago by SimonKing

How do I make it create a core dump, and how do I open it with gdb? I recall something with "ulimit -c", but I can't recall the details...

sage -gdb does not work, by the way:

simon@linux-sqwp:~/SAGE/debug/sage-5.5.rc0> ./sage -gdb
----------------------------------------------------------------------
| Sage Version 5.5.rc0, Release Date: 2012-11-17                     |
| Type "notebook()" for the browser-based notebook interface.        |
| Type "help()" for help.                                            |
----------------------------------------------------------------------
**********************************************************************
*                                                                    *
* Warning: this is a prerelease version, and it may be unstable.     *
*                                                                    *
**********************************************************************
/home/simon/SAGE/debug/sage-5.5.rc0/local/bin/sage-ipython
GNU gdb (GDB) SUSE (7.3-41.1.2)
Copyright (C) 2011 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-suse-linux".
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>...
Reading symbols from /home/simon/SAGE/debug/sage-5.5.rc0/local/bin/python...done.
[Thread debugging using libthread_db enabled]
Python 2.7.3 (default, Nov 21 2012, 11:21:15) 
[GCC 4.6.2] on linux2
Type "help", "copyright", "credits" or "license" for more information.
---------------------------------------------------------------------------
ImportError                               Traceback (most recent call last)

/home/simon/SAGE/debug/sage-5.5.rc0/local/lib/python2.7/site-packages/IPython/ipmaker.pyc in force_import(modname, force_reload)
     61         reload(sys.modules[modname])
     62     else:
---> 63         __import__(modname)
     64         
     65 

ImportError: No module named ipy_profile_sage
Error importing ipy_profile_sage - perhaps you should run %upgrade?
WARNING: Loading of ipy_profile_sage failed.

sage: A.<x,y> = FreeAlgebra(QQ, 2)
------------------------------------------------------------
   File "<ipython console>", line 1
     A.<x,y> = FreeAlgebra(QQ, 2)
       ^
SyntaxError: invalid syntax

comment:18 in reply to: ↑ 17 ; follow-up: Changed 7 years ago by nbruin

Replying to SimonKing:

How do I make it create a core dump, and how do I open it with gdb? I recall something with "ulimit -c", but I can't recall the details...

I'd guess ulimit -c unlimited.

sage -gdb does not work, by the way:

Yes, I had the same problem (and it's not singular-malloc's fault). It seems to be a bad interaction between ipython and gdb. sage -t -gdb does work, so I've just worked around it by wrapping it in a little test file: tester.py:

def t():
    r"""
    sage: A.<x,y> = FreeAlgebra(QQ, 2)
    sage: P.<x,y> = A.g_algebra({y*x:-x*y})
    sage: x*y
    """
    pass

Congratulations on getting my franken-spkg to build ...

comment:19 in reply to: ↑ 18 Changed 7 years ago by SimonKing

Replying to nbruin:

Replying to SimonKing:

How do I make it create a core dump, and how do I open it with gdb? I recall something with "ulimit -c", but I can't recall the details...

I'd guess ulimit -c unlimited.

Thanks!

And then, it is gdb -c <name_of_core_dump>, right?

comment:20 Changed 7 years ago by SimonKing

By the way, what is the ultimate aim of this ticket? Replacing Singular's memory management by malloc makes Singular very slow, if I understand correctly. So, the aim is to provide a tool to investigate flaws in Sage-Singular interrelation, right?

So, could one say that the aim is to provide an optional package?

comment:21 Changed 7 years ago by SimonKing

First observation: The crash has nothing to do with coercion.

sage: A.<x,y> = FreeAlgebra(QQ, 2)
sage: P.<x,y> = A.g_algebra({y*x:-x*y})
sage: x._mul_(y)
*** glibc detected *** python: free(): invalid pointer: 0x0000000002dd86c0 ***

That's a good news, because the absence of coercion makes life a lot easier...

comment:22 Changed 7 years ago by SimonKing

Now the bad news: By inserting print statements into the _mul_ method, I found that the crash occurs in the line

        singular_polynomial_mul(&_p, left._poly,
                                 (<NCPolynomial_plural>right)._poly,
                                 (<NCPolynomialRing_plural>left._parent)._ring)

I was somehow hoping that it would happen in the new_NCP function.

comment:23 Changed 7 years ago by SimonKing

News become worse: The crash happens in pp_Mult_qq, so that I start to believe it might be an upstream problem.

comment:24 Changed 7 years ago by SimonKing

The pp_Mult_qq function has three arguments: Two pointers p,q to polynomials, and a pointer r to a ring.

When I tested, the hex address of the polynomials resp. the ring being pointed at were

p = 0x2dd3e10
q = 0x2dd38a0
r = 0x2dfa7e0

But I can't relate these figures to the address reported by the backtrace:

*** glibc detected *** python: free(): invalid pointer: 0x0000000002bb4ee0 ***

Do you have an idea how this can be interpreted?

comment:25 Changed 7 years ago by SimonKing

Looking into the upstream sources and trying to find what the gdb backtrace is telling, I see in line 667 of pInline2.h:

PINLINE2 poly pp_Mult_qq(poly p, poly q, const ring r)
{
  poly last;
  if (p == NULL || q == NULL) return NULL;

  if (pNext(p) == NULL)
  {
#ifdef HAVE_PLURAL
    if (rIsPluralRing(r))
      return nc_mm_Mult_pp(p, q, r);
#endif
    return r->p_Procs->pp_Mult_mm(q, p, r, last);
  }

and line 667 is return nc_mm_Mult_pp(p, q, r). That function is defined in gring.h (why in a header file, by the way?). I find no indication of "free" being used there, except that it calls r->GetNC()->p_Procs.mm_Mult_pp(...), and I have no idea where that method is being defined.

So, I'll ask upstream.

comment:26 Changed 7 years ago by SimonKing

I should have read your backtrace more carefully. It mentions gnc_mm_Mult_nn, and that's what I am analysing now. A shame that one needs to recompile so many things for testing it...

comment:27 follow-up: Changed 7 years ago by SimonKing

OK, the error occurs in the function poly gnc_mm_Mult_nn(int *F0, int *G0, const ring r), whose definition in kernel/gring.cc starts at line 463.

I inserted some print statements, and found the crash occurs in this chunk of code:

  if (iF<=jG)
    /* i.e. no mixed exp_num , MERGE case */
  {
    p_MemAdd_LengthGeneral(F, G, ExpSize/sizeof(long));
    p_SetExpV(out,F,r);
    p_Setm(out,r);
    //    omFreeSize((ADDRESS)F,ExpSize);
    printf("second freeT F,rN\n");
    freeT(F,rN);
    printf("second freeT G,rN\n");
    freeT(G,rN);
    printf("second return out\n");
    return(out);
  }

The last line printed is second freeT F,rN. Hence, F seems to be corrupted. Here is ho F is created:

  int *F=(int *)omAlloc0((rN+1)*sizeof(int));
  int *G=(int *)omAlloc0((rN+1)*sizeof(int));

  memcpy(F, F0,(rN+1)*sizeof(int));
  // pExpVectorCopy(F,F0);
  memcpy(G, G0,(rN+1)*sizeof(int));
  //  pExpVectorCopy(G,G0);
  F[0]=0; /* important for p_MemAdd */
  G[0]=0;

Is it a problem that omAlloc0 is explicitly called? Or is it just a macro, and actually defaults to malloc with your spkg?

comment:28 in reply to: ↑ 27 ; follow-up: Changed 7 years ago by nbruin

Replying to SimonKing:

The last line printed is second freeT F,rN. Hence, F seems to be corrupted. Here is ho F is created:

  int *F=(int *)omAlloc0((rN+1)*sizeof(int));
  int *G=(int *)omAlloc0((rN+1)*sizeof(int));

  memcpy(F, F0,(rN+1)*sizeof(int));
  // pExpVectorCopy(F,F0);
  memcpy(G, G0,(rN+1)*sizeof(int));
  //  pExpVectorCopy(G,G0);
  F[0]=0; /* important for p_MemAdd */
  G[0]=0;

Is it a problem that omAlloc0 is explicitly called? Or is it just a macro, and actually defaults to malloc with your spkg?

That all should be safe. (omalloc already had an omEmulateAlloc config option, but it was missing declarations and therefore Singular failed to compile. I added those. Part of it was an omMemDup routine that takes only a pointer, so it needs to know how large the block pointed to actually is. I needed the linux-specific malloc_usable_size for this. It looks into the 8 bytes preceding the block, which makes valgrind complain)

The actual declarations are for the most part near omalloc.h:777, which comes from omAllocDecl.h:280.

#define omAlloc0(size)                          omEmulateAlloc0(size)

which is only defined in omAllocEmulate.c:

void* omEmulateAlloc0(size_t size)
{
  void* addr = OM_MALLOC_MALLOC(size);
  memset(addr, 0, size);
  return addr;
}

where 'OM_MALLOC_MALLOC' is 'malloc' as far as I can tell (the number of indirections involved in omalloc probably is very useful but it drove me to despair). So that looks perfectly safe to me. Furthermore, I didn't touch that file.

I suppose the corruption happens somewhere lower down. Incidentally, the valgrind report points to exactly those two snippets. So valgrind reporting for these things is really very good.

comment:29 in reply to: ↑ 28 ; follow-up: Changed 7 years ago by SimonKing

Replying to nbruin:

That all should be safe.

... which is bad, because I need someone that I can blame }:->

I suppose the corruption happens somewhere lower down. Incidentally, the valgrind report points to exactly those two snippets. So valgrind reporting for these things is really very good.

It is not clear to me what you mean by "those two snippets": The definition of omEmulateAlloc0, or the snippets that I provided in my previous post?

comment:30 in reply to: ↑ 29 ; follow-ups: Changed 7 years ago by nbruin

Replying to SimonKing:

It is not clear to me what you mean by "those two snippets":

This one:

the snippets that I provided in my previous post?

But don't take my word for it. The report is in the sage-devel thread:

==20553== Invalid read of size 8
==20553==    at 0x2694CB7C: gnc_mm_Mult_nn(int*, int*, sip_sring*) (gring.cc:503)
==20553==    by 0x2694DABB: gnc_p_Mult_mm_Common(spolyrec*, spolyrec*, int, sip_sring*) (gring.cc:424)
==20553==    by 0x269C7726: p_Power(spolyrec*, int, sip_sring*) (gring.h:181)
...
==20553==  Address 0x40c39c48 is 8 bytes inside a block of size 12 alloc'd
==20553==    at 0x4A0762F: malloc (vg_replace_malloc.c:270)
==20553==    by 0x26BADA45: omEmulateAlloc0 (omAllocEmulate.c:15)
==20553==    by 0x2694C9EE: gnc_mm_Mult_nn(int*, int*, sip_sring*) (gring.cc:472)
==20553==    by 0x2694DABB: gnc_p_Mult_mm_Common(spolyrec*, spolyrec*, int, sip_sring*) (gring.cc:424)
==20553==    by 0x269C7726: p_Power(spolyrec*, int, sip_sring*) (gring.h:181)

Wait a minute ... valgrind complains EARLIER! Line 503 rather than 507. So valgrind already seems unhappy with

    p_MemAdd_LengthGeneral(F, G, ExpSize/sizeof(long));

It's not immediately clear to me that ExpSize/sizeof(long) really is (at most) rN+1 (see p_MemAdd.h for the definition). The singular source is now my most-grepped directory.

comment:31 in reply to: ↑ 30 Changed 7 years ago by nbruin

It's not immediately clear to me that ExpSize/sizeof(long) really is (at most) rN+1 (see p_MemAdd.h for the definition).

In fact, it certainly is not:

int ExpSize=(((rN+1)*sizeof(int)+sizeof(long)-1)/sizeof(long))*sizeof(long);

so ExpSize gets rounded up. If sizeof(int)=4 and sizeof(long)=8 then for rN=2 we get (rN+1)*sizeof(int)=12 and ExpSize=16 (which, by the valgrind report, is probably exactly the case we're in). So p_MemAdd_LengthGeneral is definitely writing out of bounds. I think this code should simply read:

  int *F=(int *)omAlloc0(ExpSize);
  int *G=(int *)omAlloc0(ExpSize);

  memcpy(F, F0,(rN+1)*sizeof(int));
  // pExpVectorCopy(F,F0);
  memcpy(G, G0,(rN+1)*sizeof(int));

EDIT: Oops, don't change the memcpy, since the source may not have such a large allocation. The target is 0-initialized, so the extra space should not be a problem.

God knows why they add an array of int as an array of long in

p_MemAdd_LengthGeneral(F, G, ExpSize/sizeof(long));

Are they sure there's no carry (I guess, if these are non-negative signed ints the sign bit would buffer any carry).

Is it viable to simply ditch Singular :-?

By the way, since most architectures with sizeof(long)=8 would be 8-byte aligned anyway, I doubt that this error would ever lead to real trouble. But the sloppiness surely is worrying.

Last edited 7 years ago by nbruin (previous) (diff)

comment:32 follow-ups: Changed 7 years ago by nbruin

OK, making the changes outlined above do make

 ./sage -t devel/sage/sage/libs/singular/function.pyx

pass with MALLOC_CHECK_=3. So: yes, singular-malloc found a real issue in the singular code (please report upstream!).

Bugs like this *could* trip up singular on architectures that have default 4-byte alignment and sizeof(long) = 8. From omTables.c we see that bins that have a length that are not a multiple of 8 are used in that case:

#ifdef OM_ALIGN_8

size_t om_BinSize [SIZEOF_OM_BIN_PAGE / MIN_BIN_BLOCKS] =
{   8,  16,  24,  32,
    40,  48,  56,  64, 72,
    80,  96, 112, 128, 144,
    160, 192, 224,
    ((SIZEOF_OM_BIN_PAGE / (MIN_BIN_BLOCKS + INCR_FACTOR*9)) / 8)*8,
    ((SIZEOF_OM_BIN_PAGE / (MIN_BIN_BLOCKS + INCR_FACTOR*6)) / 8)*8,
    ((SIZEOF_OM_BIN_PAGE / (MIN_BIN_BLOCKS + INCR_FACTOR*4)) / 8)*8,
    ((SIZEOF_OM_BIN_PAGE / (MIN_BIN_BLOCKS + INCR_FACTOR*2)) / 8)*8,
    ((SIZEOF_OM_BIN_PAGE / (MIN_BIN_BLOCKS + INCR_FACTOR))   / 8)*8,
    OM_MAX_BLOCK_SIZE};

#else /* ! OM_ALIGN_8 */

size_t om_BinSize [SIZEOF_OM_BIN_PAGE / MIN_BIN_BLOCKS] =
{   8,  12,  16,  20,
    24,  28,  32,
    40,  48,  56,  64,
    80,  96, 112, 128,
    160, 192, 224,
    ((SIZEOF_OM_BIN_PAGE / (MIN_BIN_BLOCKS + INCR_FACTOR*9)) / SIZEOF_STRICT_ALIGNMENT)*SIZEOF_STRICT_ALIGNMENT,
    ((SIZEOF_OM_BIN_PAGE / (MIN_BIN_BLOCKS + INCR_FACTOR*6)) / SIZEOF_STRICT_ALIGNMENT)*SIZEOF_STRICT_ALIGNMENT,
    ((SIZEOF_OM_BIN_PAGE / (MIN_BIN_BLOCKS + INCR_FACTOR*4)) / SIZEOF_STRICT_ALIGNMENT)*SIZEOF_STRICT_ALIGNMENT,
    ((SIZEOF_OM_BIN_PAGE / (MIN_BIN_BLOCKS + INCR_FACTOR*2)) / SIZEOF_STRICT_ALIGNMENT)*SIZEOF_STRICT_ALIGNMENT,
    ((SIZEOF_OM_BIN_PAGE / (MIN_BIN_BLOCKS + INCR_FACTOR))   / SIZEOF_STRICT_ALIGNMENT)*SIZEOF_STRICT_ALIGNMENT,
    OM_MAX_BLOCK_SIZE};

#endif /* OM_ALIGN_8 */

Running the same file with valgrind does give a couple of reports of the form:

==30816== Conditional jump or move depends on uninitialised value(s)
==30816==    at 0x26942A01: hGetmem(int, int**, monrec*) (hutil.cc:1026)
==30816==    by 0x26940296: hHilbStep(int*, int**, int, int*, int, int*, int) (hilb.cc:142)
==30816==    by 0x2694047C: hHilbStep(int*, int**, int, int*, int, int*, int) (hilb.cc:186)
==30816==    by 0x26940AEB: hSeries(sip_sideal*, intvec*, int, intvec*, sip_sideal*, sip_sring*) (hilb.cc:285)
==30816==    by 0x26A62E0D: khCheck(sip_sideal*, intvec*, intvec*, int&, int&, skStrategy*) (khstd.cc:75)
==30816==    by 0x2697EE07: bba(sip_sideal*, sip_sideal*, intvec*, intvec*, skStrategy*) (kstd2.cc:1214)
==30816==    by 0x26973F5A: kStd(sip_sideal*, sip_sideal*, tHomog, intvec**, intvec*, int, int, intvec*) (kstd1.cc:1819)
==30816==    by 0x268B3B27: jjSTD_HILB(sleftv*, sleftv*, sleftv*) (iparith.cc:3321)

and also (and this one seems to be what ElectricFence trips on):

==30816== Invalid read of size 8
==30816==    at 0x26B8B3F2: List<CanonicalForm>::isEmpty() const (ftmpl_list.cc:256)
==30816==    by 0x26AFA34F: multiFactorize(CanonicalForm const&, Variable const&) (facFactorize.cc:710)
==30816==    by 0x26A95E9C: ratFactorize(CanonicalForm const&, Variable const&, bool) (facFactorize.h:45)
==30816==    by 0x26A9303B: factorize(CanonicalForm const&, bool) (cf_factor.cc:651)
==30816==    by 0x26935B43: singclap_factorize(spolyrec*, intvec**, int) (clapsing.cc:835)
==30816==    by 0x268B0D4F: jjFAC_P(sleftv*, sleftv*) (iparith.cc:3995)
==30816==    by 0x268B9C41: iiExprArith1(sleftv*, sleftv*, int) (iparith.cc:7869)
==30816==    by 0x270C9F54: __pyx_f_4sage_4libs_8singular_8function_17KernelCallHandler_handle_call(__pyx_obj_4sage_4libs_8singular_8function_KernelCallHandler*, __pyx_obj_4sage_4libs_8singular_8function_Converter*, __pyx_opt_args_4sage_4libs_8singular_8function_17KernelCallHandler_handle_call*) (function.cpp:11286)
...
==30816==  Address 0x40c08620 is 0 bytes after a block of size 32 alloc'd
==30816==    at 0x4A06A97: operator new[](unsigned long) (vg_replace_malloc.c:363)
==30816==    by 0x26AFA10B: multiFactorize(CanonicalForm const&, Variable const&) (facFactorize.cc:688)
==30816==    by 0x26A95E9C: ratFactorize(CanonicalForm const&, Variable const&, bool) (facFactorize.h:45)
==30816==    by 0x26A9303B: factorize(CanonicalForm const&, bool) (cf_factor.cc:651)
==30816==    by 0x26935B43: singclap_factorize(spolyrec*, intvec**, int) (clapsing.cc:835)
==30816==    by 0x268B0D4F: jjFAC_P(sleftv*, sleftv*) (iparith.cc:3995)
==30816==    by 0x268B9C41: iiExprArith1(sleftv*, sleftv*, int) (iparith.cc:7869)
==30816==    by 0x270C9F54: __pyx_f_4sage_4libs_8singular_8function_17KernelCallHandler_handle_call(__pyx_obj_4sage_4libs_8singular_8function_KernelCallHandler*, __pyx_obj_4sage_4libs_8singular_8function_Converter*, __pyx_opt_args_4sage_4libs_8singular_8function_17KernelCallHandler_handle_call*) (function.cpp:11286)

and one further report:

==30816== Invalid read of size 4
==30816==    at 0x269CE137: rOrd_is_Totaldegree_Ordering(sip_sring*) (ring.cc:2074)
==30816==    by 0x269CFE12: rComplete(sip_sring*, int) (ring.cc:3469)
==30816==    by 0x269D315C: rDefault(int, int, char**, int, int*, int*, int*, int**) (ring.cc:150)
==30816==    by 0x269D3214: rDefault(int, int, char**) (ring.cc:167)
==30816==    by 0x269C06F9: nInitChar(sip_sring*) (numbers.cc:267)
==30816==    by 0x269CEF07: rComplete(sip_sring*, int) (ring.cc:3546)
==30816==    by 0x268FA13F: rInit(sleftv*, sleftv*, sleftv*) (ipshell.cc:5127)
==30816==    by 0x268D7607: yyparse() (grammar.y:1288)
==30816==    by 0x268B9C41: iiExprArith1(sleftv*, sleftv*, int) (iparith.cc:7869)
==30816==    by 0x268D7D32: yyparse() (grammar.y:659)
==30816==    by 0x268ED5A7: iiAllStart(procinfo*, char*, feBufferTypes, int) (iplib.cc:316)
==30816==    by 0x268ED73E: iiPStart(idrec*, sleftv*) (iplib.cc:417)
==30816==    by 0x268EDC60: iiMake_proc(idrec*, sip_package*, sleftv*) (iplib.cc:539)
...
==30816==  Address 0x40a7b208 is 0 bytes after a block of size 8 alloc'd
==30816==    at 0x4A0762F: malloc (vg_replace_malloc.c:270)
==30816==    by 0x269D31BD: rDefault(int, int, char**) (ring.cc:157)
==30816==    by 0x269C06F9: nInitChar(sip_sring*) (numbers.cc:267)
==30816==    by 0x269CEF07: rComplete(sip_sring*, int) (ring.cc:3546)
==30816==    by 0x268FA13F: rInit(sleftv*, sleftv*, sleftv*) (ipshell.cc:5127)
==30816==    by 0x268D7607: yyparse() (grammar.y:1288)
==30816==    by 0x268B9C41: iiExprArith1(sleftv*, sleftv*, int) (iparith.cc:7869)
==30816==    by 0x268D7D32: yyparse() (grammar.y:659)
==30816==    by 0x268ED5A7: iiAllStart(procinfo*, char*, feBufferTypes, int) (iplib.cc:316)
==30816==    by 0x268ED73E: iiPStart(idrec*, sleftv*) (iplib.cc:417)
==30816==    by 0x268EDC60: iiMake_proc(idrec*, sip_package*, sleftv*) (iplib.cc:539)
Last edited 7 years ago by nbruin (previous) (diff)

comment:33 in reply to: ↑ 32 Changed 7 years ago by nbruin

==30816== Invalid read of size 4
==30816==    at 0x269CE137: rOrd_is_Totaldegree_Ordering(sip_sring*) (ring.cc:2074)
...

This is like shooting fish in a barrel. ring.cc:157:

...
  int *order = (int *) omAlloc(2* sizeof(int));
...
  return rDefault(ch,N,n,2,order,block0,block1);

calling ring.cc:127 (the variable order above gets passed as the parameter ord in the code below):

  ring r=(ring) omAlloc0Bin(sip_sring_bin);
...
  r->order = ord;
...
  rComplete(r);

calling ring.cc:3469:

  if (rOrd_is_Totaldegree_Ordering(r) || rOrd_is_WeightedDegree_Ordering(r))

calling ring.cc:2065:

BOOLEAN rOrd_is_Totaldegree_Ordering(ring r)
{
  // Hmm.... what about Syz orderings?
  return (rVar(r) > 1 &&
          ((rHasSimpleOrder(r) &&
           (rOrder_is_DegOrdering((rRingOrder_t)r->order[0]) ||
            rOrder_is_DegOrdering(( rRingOrder_t)r->order[1]))) ||
           (rHasSimpleOrderAA(r) &&
            (rOrder_is_DegOrdering((rRingOrder_t)r->order[1]) ||
             rOrder_is_DegOrdering((rRingOrder_t)r->order[2])))));
}

As you see, that last line ...r->ord[2] is indeed out of bound on a 2*sizeof(int) block. The valgrind session seems to indicate this code path is indeed exercised.

comment:34 in reply to: ↑ 30 Changed 7 years ago by SimonKing

Replying to nbruin:

It's not immediately clear to me that ExpSize/sizeof(long) really is (at most) rN+1

Do you mean (rN+1)*sizeof(int)?

comment:35 Changed 7 years ago by SimonKing

  • Report Upstream changed from N/A to Reported upstream. Developers acknowledge bug.

For the record: The problem is reported upstream, and Hannes seems to acknowledge that there is a problem.

comment:36 follow-up: Changed 7 years ago by SimonKing

I am a bit puzzled: I thought that I had posted a comment roughly as follows, but it seems to have vanished.

It would take some time to get a fix into a stable Singular release, and it would take even more time until the next stable Singular release is in Sage.

Hence: Could you create diff files for your fixes in the Singular source code (I am not talking about the malloc thing but about the bug fixes), so that one can put them into the patches/ directory of the Singular spkg?

comment:37 in reply to: ↑ 36 ; follow-up: Changed 7 years ago by nbruin

Replying to SimonKing:

Hence: Could you create diff files for your fixes in the Singular source code (I am not talking about the malloc thing but about the bug fixes), so that one can put them into the patches/ directory of the Singular spkg?

For the bug that tripped MALLOC_CHECK_: sure. That's just a 2-line change which I'm fairly confident is safe to make.

-  int *F=(int *)omAlloc0((rN+1)*sizeof(int));
+  int *F=(int *)omAlloc0(ExpSize);
-  int *G=(int *)omAlloc0((rN+1)*sizeof(int));
+  int *G=(int *)omAlloc0(ExpSize);

For the second bug: I have no idea how to fix that. Just doing

-  int *order = (int *) omAlloc(2* sizeof(int))
+  int *order = (int *) omAlloc(3* sizeof(int))

may work but is a bit questionable.

Also I have no experience in making patch files for the singular source. It may be a bit more robust to take whatever change Singular makes to their development version and backport that.

Also: we're not done yet. There's still a branch on what valgrind thinks is an uninitialized value and there's the List<CanonicalForm>::isEmpty() issue which ventures into templated code, so I'm at a loss for that.

Purpose of this ticket: I guess it's vague. Feel free to specify. I was actually hoping that after seeing that singular-malloc can expose issues that are otherwise hard to find, Singular might be interesting in adopting it and including it as a honest configure option (resolving the problem with the executable too). Then we can just put the proper doc in the spkg. If they don't do that, we could still do so but as a patch to singular. My goal was just to get rid of Heisenbugs in our communication with Singular.

comment:38 in reply to: ↑ 37 Changed 7 years ago by SimonKing

Replying to nbruin:

Replying to SimonKing:

Hence: Could you create diff files for your fixes in the Singular source code (I am not talking about the malloc thing but about the bug fixes), so that one can put them into the patches/ directory of the Singular spkg?

For the bug that tripped MALLOC_CHECK_: sure. That's just a 2-line change which I'm fairly confident is safe to make.

-  int *F=(int *)omAlloc0((rN+1)*sizeof(int));
+  int *F=(int *)omAlloc0(ExpSize);
-  int *G=(int *)omAlloc0((rN+1)*sizeof(int));
+  int *G=(int *)omAlloc0(ExpSize);

For the second bug: I have no idea how to fix that. Just doing

-  int *order = (int *) omAlloc(2* sizeof(int))
+  int *order = (int *) omAlloc(3* sizeof(int))

may work but is a bit questionable.

Can you suggest that as a fix on the singular ticket? Or shall I communicate it?

Also I have no experience in making patch files for the singular source.

That's easy.

  • Create a diff with context (is that called "unified"? I am not sure).
  • Give it a good name and perhaps a commit message, and put it into the patches/ folder of the Singular spkg. Just look what you currently find in that folder. The point is: The patches placed into the patches/ folder are automatically applied before building Singular.
  • Do not change the files in src/! They are supposed to be identical with the official Singular 3-1-5 source files.
  • Rename the spkg into singular-3-1-5.p2.spkg.
  • Update SPKG.txt, describing the new patch.
  • Both the patches/ folder and SPKG.txt are under version control, hence, you also need to hg commit.
  • Pack the spkg, post it somewhere, and ask me or someone else for a review.

The same patch could be put into your malloc-Singular/patches/, so that we can test whether the problem is really solved. And if it is, then the patch in the usual omalloc Singular should be fine to go.

It may be a bit more robust to take whatever change Singular makes to their development version and backport that.

Well, let's see what the Singular team thinks of your suggestion...

If they react quickly, then we could take the patch from Singular trac. If they are too slow, we take your patch from Sage trac.

Also: we're not done yet. There's still a branch on what valgrind thinks is an uninitialized value and there's the List<CanonicalForm>::isEmpty() issue which ventures into templated code, so I'm at a loss for that.

Sure. Could be that we will open further Singular trac tickets, and could be that there will be two or three more patches in patches/...

Purpose of this ticket: I guess it's vague. Feel free to specify. I was actually hoping that after seeing that singular-malloc can expose issues that are otherwise hard to find, Singular might be interesting in adopting it and including it as a honest configure option (resolving the problem with the executable too). Then we can just put the proper doc in the spkg.

I see that the patches/ directory contains a sub-directory patches/conditional/ containing a patch sage_debug.patch and a patch for cygwin. The changes that you've done in the Singular source code to use malloc could be put into patches/conditional/. Probably spkg-install needs to be changed to take the new option into account.

If they don't do that, we could still do so but as a patch to singular.

I think it will be a patch to the Singular spkg in any case. Waiting for Singular 3-1-6 (which will hopefully contain a fix for the memory problem) is probably no option.

comment:39 follow-up: Changed 7 years ago by SimonKing

I've put the following two files into the patches/ folder of the singular-3-1-5.malloc-linux spkg:

  • sage_trac13731_gnc_mm_Mult_nn.patch
    • src/kernel/gring.cc

      # HG changeset patch
      # User Simon King <simon.king@uni-jena.de>
      # Date 1353701669 -3600
      # Node ID 0f5e07666bc133a0d38cb846120c0103c4c2b0ae
      # Parent  3eb11c08dd9611b7fdba4622cc0d1660527f9e96
      #13731: Fix allocated size of F and G in gnc_mm_Mult_nn
      
      diff --git a/src/kernel/gring.cc b/src/kernel/gring.cc
      a b  
      469469  int rN=r->N;
      470470  int ExpSize=(((rN+1)*sizeof(int)+sizeof(long)-1)/sizeof(long))*sizeof(long);
      471471
      472   int *F=(int *)omAlloc0((rN+1)*sizeof(int));
      473   int *G=(int *)omAlloc0((rN+1)*sizeof(int));
       472  int *F=(int *)omAlloc0(ExpSize);
       473  int *G=(int *)omAlloc0(ExpSize);
      474474
      475475  memcpy(F, F0,(rN+1)*sizeof(int));
      476476  // pExpVectorCopy(F,F0);
  • patches/sage_trac13731_size_of_order.patch
    • src/kernel/ring.cc

      # HG changeset patch
      # User Simon King <simon.king@uni-jena.de>
      # Date 1353701944 -3600
      # Node ID 3ab3d75bf03794d94037c3e33711fdd773d858dc
      # Parent  0f5e07666bc133a0d38cb846120c0103c4c2b0ae
      #13731: Fix allocated size for order (in ring.cc)
      
      diff --git a/src/kernel/ring.cc b/src/kernel/ring.cc
      a b  
      154154ring rDefault(int ch, int N, char **n)
      155155{
      156156  /*order: lp,0*/
      157   int *order = (int *) omAlloc(2* sizeof(int));
       157  int *order = (int *) omAlloc(3* sizeof(int));
      158158  int *block0 = (int *)omAlloc0(2 * sizeof(int));
      159159  int *block1 = (int *)omAlloc0(2 * sizeof(int));
      160160  /* ringorder dp for the first block: var 1..N */

However, after re-installing the package, the crash persists!

sage: A.<x,y> = FreeAlgebra(QQ, 2)
sage: P.<x,y> = A.g_algebra({y*x:-x*y})
sage: x*y
*** glibc detected *** python: free(): invalid pointer: 0x0000000002bb11c0 ***
======= Backtrace: =========
/lib64/libc.so.6(+0x766d6)[0x7f8243e866d6]
/home/simon/SAGE/debug/sage-5.5.rc0/local/lib/libsingular.so(_Z14gnc_mm_Mult_nnPiS_P9sip_sring+0x294)[0x7f8229a81c54]
/home/simon/SAGE/debug/sage-5.5.rc0/local/lib/libsingular.so(_Z20gnc_p_Mult_mm_CommonP8spolyrecS0_iP9sip_sring+0x1ec)[0x7f8229a82acc]
/home/simon/SAGE/debug/sage-5.5.rc0/local/lib/python2.7/site-packages/sage/libs/singular/polynomial.so(+0x4bd0)[0x7f8228e63bd0]
/home/simon/SAGE/debug/sage-5.5.rc0/local/lib/python2.7/site-packages/sage/rings/polynomial/plural.so(+0x1fbd8)[0x7f82292a8bd8]
/home/simon/SAGE/debug/sage-5.5.rc0/local/lib/python2.7/site-packages/sage/structure/element.so(+0x2a399)[0x7f823b0a7399]
/home/simon/SAGE/debug/sage-5.5.rc0/local/lib/libpython2.7.so.1.0(+0x48a4f)[0x7f8244a63a4f]

That's bad. Didn't you say the change fixes it for you?

comment:40 Changed 7 years ago by SimonKing

Aha! You also changed spkg-install, so that the patches are actually not applied! That is of course the reason why it didn't work.

comment:41 in reply to: ↑ 39 ; follow-up: Changed 7 years ago by nbruin

Replying to SimonKing:

I've put the following two files into the patches/ folder of the singular-3-1-5.malloc-linux spkg:

[...]

Perhaps backport this instead (should be pretty straightforward to replace the patches):

http://www.singular.uni-kl.de:8002/trac/changeset/15435

It looks like the second issue is also addressed. Are the other issues within your reach to investigate? I'm not really comfortable tracing through templated C++.

The FC16 packaged valgrind didn't work for me due to unsupported machine instructions, but the latest valgrind release fixes that and installing that in /usr/local was a breeze! I can thoroughly recommend using it. The information it gives seem spot-on. (I was able to diagnose both these issues by just looking at the valgrind report and tracing through the code from allocation to access error. It won't always be that straightforward, but one can hope)

With

http://sage.math.washington.edu/home/nbruin/sage.supp

the amount of false (or at least for me uninteresting) reports is bearable.

It's unlikely, but the errors we've found could lead to random corruptions and hence explain the peculiar system-dependent bugs we've experienced.

I'm afraid we're not done yet, though: Although we haven't found a GC mismatch here, we're pretty sure they exist. Also these corruptions should cause bad problems all around, not just after introducing GC.

comment:42 in reply to: ↑ 32 Changed 7 years ago by nbruin

==30816== Conditional jump or move depends on uninitialised value(s)
==30816==    at 0x26942A01: hGetmem(int, int**, monrec*)

I think this one is benign, because rewriting the code to the functionally equivalent

hutil.cc:1020:

scfmon hGetmem(int lm, scfmon old, monp monmem)
{
   scfmon x = monmem->mo;
   int  lx = monmem->a;
   if ((x==NULL) || (lm > lx))
   {
-    if ((x!=NULL)&&(lx>0)) omFreeSize((ADDRESS)x, lx * sizeof(scmon));
+    if (x!=NULL) if (lx>0) omFreeSize((ADDRESS)x, lx * sizeof(scmon));
     monmem->mo = x = (scfmon)omAlloc(lm * sizeof(scmon));
     monmem->a = lm;
   }
   memcpy(x, old, lm * sizeof(scmon));
   return x;
}

makes the reports go away. I guess gcc optimizes out the shortcutting because it's not a benefit here and valgrind doesn't see through the fact that the undefined value doesn't influence the outcome.

comment:43 in reply to: ↑ 2 Changed 7 years ago by nbruin

Replying to nbruin:

#0  0x00007fffce9fb3f2 in List<CanonicalForm>::isEmpty(this=0x7fffa515b000) at ./templates/ftmpl_list.cc:256
#1  0x00007fffce96a350 in multiFactorize (F=..., v=...) at facFactorize.cc:710

I think this one is rather straightforward too (and a genuine out-of-bounds reference!): facFactorize.cc:688

    CFList* bufAeval2= new CFList [A.level() - 2];
...
    evaluationWRTDifferentSecondVars (bufAeval2, bufEvaluation, A);

    for (int j= 0; j < A.level() - 1; j++)
    {
      if (!bufAeval2[j].isEmpty())
        counter++;
    }

so this queries all elements bufAeval2[0],...,bufAeval2[A.level()-2]. However, if this allocates an array as it does in C, then the new command above only creates bufAeval2[0],...,bufAeval2[A.level()-3] (i.e., A.level()-2 of them, but 0-based.

The initialization by evaluationWRTDifferentSecondVars seems to corroborate that:

facFqFactorize.cc:1778

void
evaluationWRTDifferentSecondVars (CFList*& Aeval, const CFList& evaluation,
                                  const CanonicalForm& A)
{
  CanonicalForm tmp;
  CFList tmp2;
  CFListIterator iter;
  for (int i= A.level(); i > 2; i--)
  {
...
    if (preserveDegree)
      Aeval [i - 3]= tmp2;
    else
      Aeval [i - 3]= CFList();
  }
}

So only Aeval[0], ..., Aeval[A.level()-3] get initialized.

Thus, the reference to !bufAeval2[j].isEmpty() with j = A.level() - 2 indeed seems out of bounds to me. The abundance of bound errors in Singular is really making me feel uncomfortable. The Singular team should really take their memory audits a little more seriously. They're playing russian roulette with mathematical correctness.

comment:44 in reply to: ↑ 41 Changed 7 years ago by SimonKing

Replying to nbruin:

Replying to SimonKing:

I've put the following two files into the patches/ folder of the singular-3-1-5.malloc-linux spkg:

[...]

Perhaps backport this instead (should be pretty straightforward to replace the patches):

http://www.singular.uni-kl.de:8002/trac/changeset/15435

Yes, that looks like the right thing to do.

My plan is to take your Singular-malloc spkg and turn it into a "proper" spkg, in the sense that the changes that you did to the source code will become patches that are optionally applied (say, if one does export SAGE_SINGULA_MALLOC=yes).

The backported fix will of course be an unconditional patch.

It looks like the second issue is also addressed. Are the other issues within your reach to investigate? I'm not really comfortable tracing through templated C++.

Neither am I. But at least (in contrast to me!) you are comfortable to work with valgrind! Perhaps you could open a ticket to make your .supp file available to people? After all, there is a totally outdated valgrind spkg...

comment:45 follow-up: Changed 7 years ago by SimonKing

Now I try to patch the standard Singular spkg so that one can optionally use malloc. But I have two questions on how you created the singular-malloc spkg.

Recall that I have removed (or at least I tried to remove) all auto-generated files that I found in your spkg in order to make it work on my openSuse laptop. The changes compared with the original sources are:

M src/Singular/Makefile.in
M src/Singular/Minor.h
M src/Singular/configure
M src/Singular/configure.in
M src/Singular/sing_win.cc
M src/factory/assert.h
M src/factory/cf_chinese.cc
M src/factory/cf_util.cc
M src/factory/cf_util.h
M src/factory/facBivar.h
M src/factory/facFactorize.h
M src/factory/facFqBivar.cc
M src/omalloc/Makefile.in
M src/omalloc/omAllocDecl.h
M src/omalloc/omAllocFunc.c
M src/omalloc/omMallocSystem.h
M src/omalloc/omalloc.c
M src/omalloc/pmalloc.h
? src/factory/gen_cf_gmp.sh
? src/omalloc/omTables.h
? src/omalloc/omTables.inc

So, first question: Is it correct that all three new files are needed for malloc support? I'd rather think that gen_cf_gmp.sh should not be here, because it contains explicit paths:

#!/bin/sh
GMP_H_T="gen_cf_gmp.o: gen_cf_gmp.cc /usr/local/sage/5.5b1/local/include/gmp.h \ "
GMP_H=`echo $GMP_H_T| sed -e 's/^.*gmp.cc//' -e 's/ .$//'`
echo generating cf_gmp.h from $GMP_H
cat $GMP_H | grep -v __GMP_DECLSPEC_XX |grep -v std::FILE > cf_gmp.h

Second question: Is it not the case that src/Singular/configure is auto-generated from src/Singular/configure.in? Or is it really essential that src/Singular/configure is as in your spkg?

comment:46 Changed 7 years ago by SimonKing

PS: Actually it seems to me that the changes in src/Singular/configure.in and src/Singular/configure are due to the patch patches/singular_trac_443.patch that is to be applied anyway.

comment:47 Changed 7 years ago by SimonKing

src/omalloc/omTables.inc doesn't state that it is auto-generated, but it certainly looks like.

So, third question: Did you write src/omalloc/omTables.inc yourself? Or can I safely delete it?

comment:48 in reply to: ↑ 45 ; follow-up: Changed 7 years ago by nbruin

Replying to SimonKing:

M src/omalloc/omAllocDecl.h
M src/omalloc/omAllocFunc.c
M src/omalloc/omMallocSystem.h
M src/omalloc/omalloc.c
M src/omalloc/pmalloc.h

So, first question: Is it correct that all three new files are needed for malloc support?

I didn't make any new files and I only seriously edited things in the omalloc directory. I may have done some rough hacking in config files to ensure that the changes took effect, but I believe that it should be possible to do without by someone who know how to pass the correct options to configure etc.

To make the "spkg" I only did a "make clean" and a command to make the dir into a spkg. I disabled the "patch application" in spkg-install because that made it fail the second time I ran it (which is inconvenient when you're editing the package!)

Note that malloc_usable_size for glibc has the same function as malloc_size on BSD (OSX), so if there's a convenient config option to choose between the two you can make the spkg so that it compiles on both.

The omalloc package is obviously written in such a way that it can build in different ways, depending on config options passed at build-time. What I did was resurrect one of those options (it didn't provide all the functionality that singular needs). It should be possible to adapt my changes accordingly. Then it's not essential to conditionally apply the patches. It's just that I have zero experience with all the "#ifdef" voodoo that seems to be associated with production-quality code.

I found that "meld" was a very convenient way of browsing the changes I'd made (comparing against a pristine directory). It can do a graphically displayed diff on an entire tree. It's what I used to make the linux package from the OSX one I had.

comment:49 in reply to: ↑ 48 ; follow-up: Changed 7 years ago by SimonKing

Replying to nbruin:

Replying to SimonKing:

M src/omalloc/omAllocDecl.h
M src/omalloc/omAllocFunc.c
M src/omalloc/omMallocSystem.h
M src/omalloc/omalloc.c
M src/omalloc/pmalloc.h

So, first question: Is it correct that all three new files are needed for malloc support?

I didn't make any new files and I only seriously edited things in the omalloc directory.

OK, so, I assume you only edited the five files indicated above, and did no other changes by yourself.

I may have done some rough hacking in config files to ensure that the changes took effect, but I believe that it should be possible to do without by someone who know how to pass the correct options to configure etc.

Well, I don't know.

To make the "spkg" I only did a "make clean" and a command to make the dir into a spkg.

That means that all patches from the patches/ directory are applied in your code, and if make clean is imperfect then it also means that some auto-generated (machine dependent) files are in your code.

I disabled the "patch application" in spkg-install because that made it fail the second time I ran it (which is inconvenient when you're editing the package!)

Sure, because you started with code that already contains all patches.

Note that malloc_usable_size for glibc has the same function as malloc_size on BSD (OSX), so if there's a convenient config option to choose between the two you can make the spkg so that it compiles on both.

Hm. I see the following definition

src/omalloc/dlmalloc.h

/* define to -1 if you want that this implementation provides
   malloc/calloc/realloc/free funcs */
#ifndef OM_PROVIDE_MALLOC
#define OM_PROVIDE_MALLOC 0
#endif

#ifndef HAVE__USR_INCLUDE_MALLOC_H
#undef  HAVE__USR_INCLUDE_MALLOC_H
#endif

#ifdef HAVE__USR_INCLUDE_MALLOC_H
#define HAVE_USR_INCLUDE_MALLOC_H 1
#endif

#define OM_MALLOC_MALLOC   mALLOc
#define OM_MALLOC_REALLOC  rEALLOc
#define OM_MALLOC_FREE     fREe
#define OM_MALLOC_VALLOC   vALLOc
#define OM_MALLOC_VFREE(addr, size) OM_MALLOC_FREE(addr)
#define OM_MALLOC_SIZEOF_ADDR(addr) malloc_usable_size(addr)
#define cfree cFREe

I don't totally understand the comment "define to -1 if you want that this implementation provides malloc/calloc/realloc/free func". Does that perhaps mean that export OM_PROVIDE_MALLOC=-1 is all what one need to use system malloc?

But I wonder: You say that malloc_usable_size is not available on OSX. Then, the line

#define OM_MALLOC_SIZEOF_ADDR(addr) malloc_usable_size(addr)

sounds like a bug to me. I'll ask Hannes.

comment:50 in reply to: ↑ 49 ; follow-up: Changed 7 years ago by nbruin

Replying to SimonKing:

But I wonder: You say that malloc_usable_size is not available on OSX. Then, the line

#define OM_MALLOC_SIZEOF_ADDR(addr) malloc_usable_size(addr)

sounds like a bug to me. I'll ask Hannes.

That definition only occurs in dlmalloc.h, right? I don't think other files import that header file, so probably dlmalloc only gets used under certain configure options. I wasn't able to trace through how omalloc gets configured/built. It seems to be a value for configure:

 --with-malloc=system|dlmalloc|gmalloc|pmalloc|external

(given that singular builds just fine on OSX this cannot be a real problem)

But yes, these changes are best made with input from someone familiar with omalloc, so hopefully Hannes can help.

comment:51 in reply to: ↑ 50 ; follow-up: Changed 7 years ago by SimonKing

Replying to nbruin:

It seems to be a value for configure:

 --with-malloc=system|dlmalloc|gmalloc|pmalloc|external

Then the question arises: Why is it not enough to pass the --with-malloc=system to Singular? Why did the additional changes become necessary that you made on five files in src/omalloc/?

For example, you edited src/omalloc/omAllocDecl.h to make the emulation unconditional. Is that really needed, if there is a configuration option anyway?

But yes, these changes are best made with input from someone familiar with omalloc, so hopefully Hannes can help.

I asked, but he didn't reply yet. Well, it's week-end, after all...

comment:52 in reply to: ↑ 51 ; follow-up: Changed 7 years ago by nbruin

Replying to SimonKing:

Replying to nbruin:

It seems to be a value for configure:

 --with-malloc=system|dlmalloc|gmalloc|pmalloc|external

Then the question arises: Why is it not enough to pass the --with-malloc=system to Singular? Why did the additional changes become necessary that you made on five files in src/omalloc/?

I think I can explain that: omalloc is written to work on top of another memory manager. It doesn't call mmap or sbrk by itself. So I think this option selects which malloc is used to serve omalloc with pages.

dlmalloc is simply a full-fledged memory manager that happens to be shipped with omalloc and can be used as a backend. If you look in dlmalloc.c, there is actually an implementation of malloc_usable_size. So I guess you can build omalloc in a way that is independent of the malloc offered by the clib and just uses system calls to mmap or sbrk directly.

The other configuration options (you'll have to read up on it to be sure) deal with what services omalloc provides and how it does it.

There's an option that selects whether omalloc should offer the standard 'malloc/realloc/free' interface to its own allocators.

There is also an option that selects how omalloc does its job: whether it should do what it is designed for or whether it should just honour the requests it receives by passing them on 1-1 to the underlying memory allocator. That's the option that had fallen into disuse within singular and had become defunct. I tried to bring its functionality back to the level required for singular. I hacked my way to getting omalloc to compile in that mode. However, I'm sure this can be done in a nicer fashion.

For example, you edited src/omalloc/omAllocDecl.h to make the emulation unconditional. Is that really needed, if there is a configuration option anyway?

No I'm sure it's not. I didn't figure out a way that positively confirmed for me that just passing in an option helped (partially because I couldn't interact directly with the singular configure script: the spkg-install plays games too in order to build both an executable and libsingular). But someone more familiar with how configure and conditional compilation setups usually work can probably very easily put that in order.

comment:53 in reply to: ↑ 52 ; follow-ups: Changed 7 years ago by SimonKing

Replying to nbruin:

I think I can explain that: omalloc is written to work on top of another memory manager.

So, that is --with-malloc=....

There's an option that selects whether omalloc should offer the standard 'malloc/realloc/free' interface to its own allocators.

What does that mean? If one uses omalloc, why should one not want that it offers malloc/realloc/free?

There is also an option that selects how omalloc does its job: whether it should do what it is designed for or whether it should just honour the requests it receives by passing them on 1-1 to the underlying memory allocator.

So, that's the option that we need, right? Is that --with-emulate-omalloc that you added to spkg-install?

I guess I am now experiencing the same that you did: If --with-emulate-omalloc is the only change, then installing the spkg fails with

In file included from ../kernel/bigintmat.h:13:0,
                 from bigintmat.cc:10:
../kernel/intvec.h: In constructor ‘intvec::intvec(int)’:
../kernel/intvec.h:26:18: error: ‘omEmulateAlloc0’ was not declared in this scope
bigintmat.cc: In member function ‘int* bigintmat::getwid(int)’:
bigintmat.cc:379:31: error: ‘omStrDup’ was not declared in this scope
bigintmat.cc: In member function ‘void bigintmat::pprint(int)’:
bigintmat.cc:414:18: error: ‘omEmulateAlloc0’ was not declared in this scope
bigintmat.cc:421:32: error: ‘omStrDup’ was not declared in this scope

comment:54 in reply to: ↑ 53 Changed 7 years ago by nbruin

Replying to SimonKing:

There's an option that selects whether omalloc should offer the standard 'malloc/realloc/free' interface to its own allocators.

What does that mean? If one uses omalloc, why should one not want that it offers malloc/realloc/free?

It selects whether omalloc should offer its services under those names, i.e., as a drop-in replacement for the standard libc memory manager. That's not what Singular does. It uses the various omalloc routines, which offer extra options. Otherwise, changing Singular would have been a matter of "dropping out" the drop-in replacement. Instead we have to jump through hoops to get omalloc to use libc's malloc (or whatever drop-in replacement we want to use, like ElectricFence?) behind the scenes.

comment:55 in reply to: ↑ 53 Changed 7 years ago by nbruin

Replying to SimonKing:

I guess I am now experiencing the same that you did: If --with-emulate-omalloc is the only change, then installing the spkg fails with

In file included from ../kernel/bigintmat.h:13:0,
                 from bigintmat.cc:10:
../kernel/intvec.h: In constructor ‘intvec::intvec(int)’:
../kernel/intvec.h:26:18: error: ‘omEmulateAlloc0’ was not declared in this scope
bigintmat.cc: In member function ‘int* bigintmat::getwid(int)’:
bigintmat.cc:379:31: error: ‘omStrDup’ was not declared in this scope
bigintmat.cc: In member function ‘void bigintmat::pprint(int)’:
bigintmat.cc:414:18: error: ‘omEmulateAlloc0’ was not declared in this scope
bigintmat.cc:421:32: error: ‘omStrDup’ was not declared in this scope

Yay! indeed. Those are some of the routines I provided. So yes, it's probably best if you just use my changes for information and as hints for how to provide the missing functionality, but then integrate it more cleanly (I had a couple of rounds of these; the hard one being omMemDup, which you don't seem to hit yet). The omEmulateAlloc0 is a weird one since that IS implemented in omAllocEmulate.c I think I solved that one by simply including the relevant declaration in omAllocDecl.h.

Changed 7 years ago by SimonKing

A patch for the singular spkg to produce a libsingular that uses malloc

comment:56 Changed 7 years ago by SimonKing

What I did to produce a singular spkg that produces a malloc-libsingular:

  • Drop attachment:sage_trac_13731.patch into the spkg's patches/ directory
  • Apply the following change to spkg-install:
    • spkg-install

      diff --git a/spkg-install b/spkg-install
      a b  
      134134                --libdir="$SAGE_LOCAL"/lib \
      135135                --with-apint=gmp \
      136136                --with-malloc=system \
       137                --with-emulate-omalloc \
      137138                --with-NTL \
      138139                --without-MP \
      139140                --without-lex \

With that, I seem to get a working libsingular, and I get the due segfault with export MALLOC_CHECK_=3 in the example of the ticket description. However, I do not obtain a working Singular executable.

I am now testing whether the spkg works on bsd.math as well.

comment:57 Changed 7 years ago by SimonKing

Unfortunately, it does not work on bsd.math: The problem appears to be that there is no malloc.h available:

In file included from ./omalloc.h:785:0,
                 from omalloc.c:16:
../omalloc/omMalloc.h:12:20: fatal error: malloc.h: No such file or directory
compilation terminated.

That's really strange. Shouldn't malloc be available system wide?

comment:58 Changed 7 years ago by fbissey

bsd.math is OS X, isn't it? I think I had trouble with that recently because malloc.h is in a subdirectory - not at the top level. You would need an extra -I I guess. I'll see if I can find it.

comment:59 Changed 7 years ago by fbissey

It is in /usr/include/malloc/

comment:60 follow-up: Changed 7 years ago by SimonKing

I am puzzled.

When I put

#if defined(ix86Mac_darwin) || defined(x86_64Mac_darwin) || defined(ppcMac_darwin)
#include <malloc/malloc.h>
#else
#include <malloc.h>
#endif

in src/omalloc/omMallocSystem.h (which is responsible for including malloc.h), things do still not work and it still complains that it can not find malloc.h. But in the installation log, I see the line

checking uname for singular... (cached) x86_64Mac-darwin

and omMalloc.h is created only after that line. Hence I don't understand why it doesn't try to pick up malloc/malloc.h.

comment:61 in reply to: ↑ 60 Changed 7 years ago by SimonKing

Replying to SimonKing:

#if defined(ix86Mac_darwin) || defined(x86_64Mac_darwin) || defined(ppcMac_darwin)
#include <malloc/malloc.h>
#else
#include <malloc.h>
#endif
checking uname for singular... (cached) x86_64Mac-darwin

How nasty! It defines x86_64Mac-darwin, but the person who originally wrote the check for the platforms wrote x86_64Mac_darwin!!

comment:62 follow-up: Changed 7 years ago by fbissey

Rool eyes.... but not completely surprised.

comment:63 in reply to: ↑ 62 Changed 7 years ago by SimonKing

Replying to fbissey:

Rool eyes.... but not completely surprised.

But on the other hand, it can not be "-" (minus sign!) between Mac and darwin. Anyway. There must be a test on which platform we are. How to?

comment:64 follow-up: Changed 7 years ago by fbissey

I am not sure what you would get for a non-mac darwin target, otherwise the easiest thing is probably gcc -dumpmachine, On my 10.5.8 machine:

BlueFerniMac1:~ frb15$ gcc -dumpmachine
i686-apple-darwin9

comment:65 in reply to: ↑ 64 Changed 7 years ago by SimonKing

Replying to fbissey:

I am not sure what you would get for a non-mac darwin target, otherwise the easiest thing is probably gcc -dumpmachine, On my 10.5.8 machine:

BlueFerniMac1:~ frb15$ gcc -dumpmachine
i686-apple-darwin9

Thank you! But I would rather expect that those things are done during configuration, and then the type of the platform is stored in a variable. I'm asking Hans Schönemann how Singular deals with different platforms.

Changed 7 years ago by SimonKing

Backporting Singular changeset 15435

comment:66 Changed 7 years ago by SimonKing

I have backported Singular changeset 15435.

When I put attachment:sage_trac_13731.patch and attachment:singular_15435.patch into the spkg's patches/ directory and apply the change to spkg-install mentioned in comment:56, then I get

simon@linux-sqwp:~/SAGE/debug/sage-5.5.rc0> export MALLOC_CHECK_=3
simon@linux-sqwp:~/SAGE/debug/sage-5.5.rc0> ./sage 
...
sage: A.<x,y> = FreeAlgebra(QQ, 2)
sage: P.<x,y> = A.g_algebra({y*x:-x*y})
sage: x._mul_(y)
x*y

So, indeed the original problem got fixed upstream.

Plans:

  • My suggestion is to put the second patch (the backported fix) into our standard Singular spkg.
  • We should of course follow the original plan and see whether we can find further sloppy allocations in Singular, so that #715 and friends can be made work in a reliable way.
  • Can someone tell me how to tell autoconf that it shall create a configure script that tells the preprocessor that we are on Darwin and hence must include malloc/malloc.h in place of malloc.h?

comment:67 follow-up: Changed 7 years ago by fbissey

After more thinking I think you may want to use AC_CANONICAL_TARGET it is usually used for cross compilation but it should work in this case. It is much better than testing gcc since we may want to use clang.

comment:68 follow-up: Changed 7 years ago by nbruin

Just a quick comment: The normal way of using malloc is via #include <stdlib.h>. It's only because we need malloc_usable_size or malloc_size that we need malloc.h. You're already choosing between those using an #if defined(...), so why not use the same mechanism for the compile? Given that this code should only be relevant for *debug* builds of singular, I don't think it's worth overhauling the process by which omalloc produces its header field.

The likely programmatic injection point for changes seems to be the makeheader script which is invoked in the makefile rule for malloc.h, but I'm not sure that that has capabilities of building header files based on conditionals by default.

comment:69 in reply to: ↑ 67 Changed 7 years ago by SimonKing

Replying to fbissey:

After more thinking I think you may want to use AC_CANONICAL_TARGET it is usually used for cross compilation but it should work in this case. It is much better than testing gcc since we may want to use clang.

My question was actually not so much about "How to detect the platform with autoconf?", because I see in the existing configure.in how it can be done. The question is more "How to define a symbol, say, HAVE_DARWIN so that the preprocessor is aware of it?".

I tried to do export HAVE_DARWIN=1 in spkg-build, but it didn't work. I tried AC_DEFINE(HAVE_DARWIN,1), but it didn't work. Here, "didn't work" means that the preprocessor has always been in the else part of the following lines:

#ifdef HAVE_DARWIN
#include "malloc/malloc.h"
#else
#include "malloc.h"
#endif

comment:70 in reply to: ↑ 68 Changed 7 years ago by SimonKing

Replying to nbruin:

Just a quick comment: The normal way of using malloc is via #include <stdlib.h>. It's only because we need malloc_usable_size or malloc_size that we need malloc.h. You're already choosing between those using an #if defined(...), so why not use the same mechanism for the compile?

Well, I don't know whether that way of chosing works. It definitely does not for the #include "malloc.h" versus #include "malloc/malloc.h" thingy.

The likely programmatic injection point for changes seems to be the makeheader script which is invoked in the makefile rule for malloc.h, but I'm not sure that that has capabilities of building header files based on conditionals by default.

I haven't looked into that.> Given that this code should only be relevant for *debug* builds of singular, I don't think it's worth overhauling the process by which omalloc produces its header field.

Probably right. We should better focus on using the debug spkg to detect more memory corruptions.

Just to be on the safe side: Does attachment:singular_15435.patch fix all upstream problems that we have detected so far?

  • The one from the ticket description is fixed.
  • Is the one from comment:33 addressed?
  • Is the one from comment:42 addressed? I understand that one isn't critical, though.
  • The one from comment:43 does not seem to be addressed, and I think I didn't mention it upstream, yet.

comment:71 follow-up: Changed 7 years ago by fbissey

I think AC_DEFINE puts stuff in config.h and you would have to include config.h before this piece of code. Having a code file changed from configure is quite some work. I would have to look it up.

comment:72 in reply to: ↑ 71 ; follow-up: Changed 7 years ago by SimonKing

Replying to fbissey:

I think AC_DEFINE puts stuff in config.h and you would have to include config.h before this piece of code.

That's probably it! Is it needed to use AC_SUBS as well? I read somewhere that this would do something with a header file, so, it is not clear to me whether AC_DEFINE is enough or AC_SUBS is needed in addition.

comment:73 in reply to: ↑ 72 ; follow-up: Changed 7 years ago by fbissey

Replying to SimonKing:

Replying to fbissey:

I think AC_DEFINE puts stuff in config.h and you would have to include config.h before this piece of code.

That's probably it! Is it needed to use AC_SUBS as well? I read somewhere that this would do something with a header file, so, it is not clear to me whether AC_DEFINE is enough or AC_SUBS is needed in addition.

I have read more about it at http://www.gnu.org/savannah-checkouts/gnu/autoconf/manual/autoconf-2.69/html_node/Defining-Symbols.html and it goes in config.h if AC_CONFIG_HEADERS is called. AC_SUBST usually acts on Makefile.in preferably generated from Makefile.am and you have to put some code for it to work IIRC. Substituting in a specific myfile.c.in is something else altogether, I need to look it up.

I don't know that stuff from the top of my head unfortunately.

comment:74 in reply to: ↑ 73 Changed 7 years ago by SimonKing

Replying to fbissey:

I have read more about it at http://www.gnu.org/savannah-checkouts/gnu/autoconf/manual/autoconf-2.69/html_node/Defining-Symbols.html and it goes in config.h if AC_CONFIG_HEADERS is called.

Thank you! So, I'd need to add AC_CONFIG_HEADERS as well, because the only place where AC_CONFIG_HEADER is used is in gfanlib/configure.ac.

AC_SUBST usually acts on Makefile.in preferably generated from Makefile.am and you have to put some code for it to work IIRC.

Makefile.am does not exist in the Singular sources. However, AC_SUBST is used in several places.

Anyway, I guess inserting AC_CONFIG_HEADERS is the easiest thing to do.

comment:75 follow-up: Changed 7 years ago by fbissey

I wouldn't add AC_CONFIG_HEADERS if it isn't already there. Are you working on the top level or at the Singular folder level? configure.in in Singular does call AC_CONFIG_HEADERS and puts definitions in mod2.h.

comment:76 in reply to: ↑ 75 Changed 7 years ago by SimonKing

Replying to fbissey:

I wouldn't add AC_CONFIG_HEADERS if it isn't already there. Are you working on the top level or at the Singular folder level? configure.in in Singular does call AC_CONFIG_HEADERS and puts definitions in mod2.h.

No, it doesn't use AC_CONFIG_HEADERS but AC_CONFIG_HEADER. That's why grep didn't find it.

Even better: omalloc/configure.in uses it as well - hence, this should be the right place to put the missing symbol.

comment:77 Changed 7 years ago by SimonKing

I am slightly shocked that omalloc/configure.in uses a hardcoded path to malloc.h:

AC_CHECK_HEADERS(unistd.h sys/mman.h fcntl.h /usr/include/malloc.h)

Isn't that bad?

comment:78 Changed 7 years ago by SimonKing

  • Cc malb added

Ccing Marting Albrecht, because he knows the Singular spkg.

Next thing I wonder about: The spkg contains src/omalloc/configure, but on top it says that this is auto-generated. By consequence, when I tried to change src/omalloc/configure.in, my changes had no effect.

So, my first guess was that configure has to be removed - but that didn't work.

Martin, do you think patching src/omalloc/configure directly is the right thing to do? Or what should I do in order to change the configuration in src/omalloc?

comment:79 follow-up: Changed 7 years ago by malb

Simon, configure is generated from configure.in using autoconf. However, Singular requires a special autoconf version (I think it is a particularly old one) which makes fixing things in its configure script tricky. I think this should go on [libsingular-devel]. Hans can help, I think.

comment:80 in reply to: ↑ 79 Changed 7 years ago by SimonKing

Replying to malb:

Simon, configure is generated from configure.in using autoconf. However, Singular requires a special autoconf version (I think it is a particularly old one) which makes fixing things in its configure script tricky. I think this should go on [libsingular-devel]. Hans can help, I think.

I am not subscribed to libsingular-devel. Can you provide a link?

FYI:

I think this ticket has two purposes.

  1. Create a Singular spkg that builds both on linux and osx and replaces omalloc by malloc, so that it is easier to track down memory corruptions in Singular that became apparent by #715 and friends. It is just a tool to analyse things - in fact, it wouldn't even yield a working Singular executable.
  2. The study of memory corruptions should help to fix them - and the fixes should not only be upstream, but should be backported to a new patch level of our Singular 3-1-5 spkg.

The problem with src/omalloc/configure only concerns purpose 1. Hence, if directly editing src/omalloc/configure is dirty but works, then I wouldn't mind to do it, just so that it builds on osx.

Fortunately, the fixes in purpose 2. would probably not involve the configuration scripts.

Last edited 7 years ago by SimonKing (previous) (diff)

comment:81 Changed 7 years ago by SimonKing

Sorry, when posting the previous comment, I forgot to delete two paragraphs before submitting...

comment:82 follow-up: Changed 7 years ago by SimonKing

On libsingular-devel, Hans suggests to use xalloc, not malloc to detect memory-related problems.

Nils, do you know how to do those things?

comment:83 in reply to: ↑ 82 Changed 7 years ago by nbruin

Replying to SimonKing:

Nils, do you know how to do those things?

No, I don't, but it looks like you're learning how to on libsingular-devel already. The xalloc approach seems to be what I was contemplating before I read up on malloc_size and malloc_usable_size. It's interesting to see that the singular people thought it was easier to write a wrapper from scratch than to amend omalloc. I can sympathize with that, although it wasn't so bad in the end.

If xalloc does result in a usable executable, that would be great! I wonder what prevents us from getting that.

It may be that once you've built xalloc.so, you can just do LD_PRELOAD=xalloc.so to pre-empt the normal omalloc routines. That's how things like ElectricFence capture malloc action (provided omalloc doesn't get statically linked)

comment:84 Changed 7 years ago by SimonKing

Sorry, my knowledge of automake is too limited. So far, I did not succeed to create a package that replaces omalloc by xalloc.

What I did (see also the thread in libsingular):

  • replace omalloc/ by the contents of xalloc/ in Singular Spielwiese
  • run aclocal in omalloc
  • run autoconf in omalloc
  • run automake --install-missing in omalloc
  • run automake again in omalloc
  • add the line install-nolns: install to the resulting Makefile.in
  • turn all this into a conditional patch, that is applied if export SINGULAR_XALLOC=yes was done.

When I try to install the spkg, it fails with

make install-nolns in omalloc
make[1]: Entering directory `/home/simon/SAGE/debug/sage-5.5.rc0/spkg/build/singular-3-1-5.p2/src/omalloc'
/bin/sh ./libtool  --tag=CC   --mode=compile gcc -DPACKAGE_NAME=\"xalloc\" -DPACKAGE_TARNAME=\"xalloc\" -DPACKAGE_VERSION=\"3.1.2.sw\" -DPACKAGE_STRING=\"xalloc\ 3.1.2.sw\" -DPACKAGE_BUGREPORT=\"\" -DPACKAGE_URL=\"\" -DPACKAGE=\"xalloc\" -DVERSION=\"3.1.2.sw\" -DSTDC_HEADERS=1 -DHAVE_SYS_TYPES_H=1 -DHAVE_SYS_STAT_H=1 -DHAVE_STDLIB_H=1 -DHAVE_STRING_H=1 -DHAVE_MEMORY_H=1 -DHAVE_STRINGS_H=1 -DHAVE_INTTYPES_H=1 -DHAVE_STDINT_H=1 -DHAVE_UNISTD_H=1 -DHAVE_DLFCN_H=1 -DLT_OBJDIR=\".libs/\" -I.  -I. -DHAVE_CONFIG_H -DNDEBUG -DOM_NDEBUG -I/home/simon/SAGE/debug/sage-5.5.rc0/local/include -I/home/simon/SAGE/debug/sage-5.5.rc0/local/include  -O3  -O2 -g  -fPIC -MT libomalloc_la-dummy.lo -MD -MP -MF .deps/libomalloc_la-dummy.Tpo -c -o libomalloc_la-dummy.lo `test -f 'dummy.c' || echo './'`dummy.c
mv -f .deps/libomalloc_la-dummy.Tpo .deps/libomalloc_la-dummy.Plo
mv: Aufruf von stat für „.deps/libomalloc_la-dummy.Tpo“ nicht möglich: Datei oder Verzeichnis nicht gefunden
make[1]: *** [libomalloc_la-dummy.lo] Fehler 1
make[1]: Leaving directory `/home/simon/SAGE/debug/sage-5.5.rc0/spkg/build/singular-3-1-5.p2/src/omalloc'
make: *** [install-nolns] Fehler 1
Unable to build and install Singular
Error building Singular (error in build_singular).

Apparently the .deps directory was automatically created by the script ./depcomp, that belongs to those installed by automake --install-missing. Its content after the installation failed:

simon@linux-sqwp:~/SAGE/debug/sage-5.5.rc0> ls spkg/build/singular-3-1-5.p2/src/omalloc/.deps/
libomalloc_g_la-dummy.Plo  libomalloc_la-dummy.Plo

And to be honest, I don't see the point of the dummy.c file in xalloc, which is:

#include "omalloc.h"

int om_sing_opt_show_mem; /* dummy */
struct omInfo_s om_Info; /* dummy */
struct omOpts_s om_Opts; /* dummy */

comment:85 Changed 7 years ago by SimonKing

With the help of the good people of libsingular-devel, I managed to create a new singular spkg.

When you export SINGULAR_XALLOC=yes then the spkg creates a libsingular library and a Singular executable (yes, it starts and can compute things!!) with omalloc replaced by xalloc (a compatibility layer on top of malloc). Otherwise, the spkg just builds with omalloc.

The good news:

  1. Singular works.
    simon@linux-sqwp:~/SAGE/debug/sage-5.5.rc0> ./sage -singular
                         SINGULAR                                 /  Development
     A Computer Algebra System for Polynomial Computations       /   version 3-1-5
                                                               0<
     by: W. Decker, G.-M. Greuel, G. Pfister, H. Schoenemann     \   Jul 2012
    FB Mathematik der Universitaet, D-67653 Kaiserslautern        \
    > ring r;
    > x;
    x
    > x*y+z;
    xy+z
    > quit;
    Auf Wiedersehen.
    
    There is no crash in the example above!
  2. Sage starts and quits just fine, even if libsingular is involved:
    simon@linux-sqwp:~/SAGE/debug/sage-5.5.rc0> ./sage 
    ----------------------------------------------------------------------
    | Sage Version 5.5.rc0, Release Date: 2012-11-17                     |
    | Type "notebook()" for the browser-based notebook interface.        |
    | Type "help()" for help.                                            |
    ----------------------------------------------------------------------
    **********************************************************************
    *                                                                    *
    * Warning: this is a prerelease version, and it may be unstable.     *
    *                                                                    *
    **********************************************************************
    sage: P.<x,y> = QQ[]
    sage: x*y
    x*y
    sage: quit
    Exiting Sage (CPU time 0m0.05s, Wall time 0m9.80s).
    
  3. We can reproduce the error from the ticket description:
    simon@linux-sqwp:~/SAGE/debug/sage-5.5.rc0> export MALLOC_CHECK_=3
    simon@linux-sqwp:~/SAGE/debug/sage-5.5.rc0> ./sage 
    ----------------------------------------------------------------------
    | Sage Version 5.5.rc0, Release Date: 2012-11-17                     |
    | Type "notebook()" for the browser-based notebook interface.        |
    | Type "help()" for help.                                            |
    ----------------------------------------------------------------------
    **********************************************************************
    *                                                                    *
    * Warning: this is a prerelease version, and it may be unstable.     *
    *                                                                    *
    **********************************************************************
    sage: A.<x,y> = FreeAlgebra(QQ, 2)
    sage: P.<x,y> = A.g_algebra({y*x:-x*y})
    sage: x._mul_(y)
    *** glibc detected *** python: free(): invalid pointer: 0x0000000002cf4920 ***
    ======= Backtrace: =========
    /lib64/libc.so.6(+0x766d6)[0x7f4331a486d6]
    /home/simon/SAGE/debug/sage-5.5.rc0/local/lib/libsingular.so(_Z14gnc_mm_Mult_nnPiS_P9sip_sring+0x301)[0x7f431764de91]
    /home/simon/SAGE/debug/sage-5.5.rc0/local/lib/libsingular.so(_Z20gnc_p_Mult_mm_CommonP8spolyrecS0_iP9sip_sring+0x21c)[0x7f431764ef5c]
    /home/simon/SAGE/debug/sage-5.5.rc0/local/lib/python2.7/site-packages/sage/libs/singular/polynomial.so(+0x4c60)[0x7f4316a2cc60]
    /home/simon/SAGE/debug/sage-5.5.rc0/local/lib/python2.7/site-packages/sage/rings/polynomial/plural.so(+0x1d278)[0x7f4316e70278]
    /home/simon/SAGE/debug/sage-5.5.rc0/local/lib/python2.7/site-packages/sage/rings/polynomial/plural.so(+0xbb63)[0x7f4316e5eb63]
    ...
    
  4. Even better: We additionally get a crash when we do the example without MALLOC_CHECK, namely when quitting Sage:
    simon@linux-sqwp:~/SAGE/debug/sage-5.5.rc0> export MALLOC_CHECK_=
    simon@linux-sqwp:~/SAGE/debug/sage-5.5.rc0> ./sage
    ----------------------------------------------------------------------
    | Sage Version 5.5.rc0, Release Date: 2012-11-17                     |
    | Type "notebook()" for the browser-based notebook interface.        |
    | Type "help()" for help.                                            |
    ----------------------------------------------------------------------
    **********************************************************************
    *                                                                    *
    * Warning: this is a prerelease version, and it may be unstable.     *
    *                                                                    *
    **********************************************************************
    sage: A.<x,y> = FreeAlgebra(QQ, 2)
    sage: P.<x,y> = A.g_algebra({y*x:-x*y})
    sage: x._mul_(y)
    x*y
    sage: quit
    Exiting Sage (CPU time 0m0.18s, Wall time 0m7.89s).
    /home/simon/SAGE/debug/sage-5.5.rc0/local/lib/libcsage.so(print_backtrace+0x31)[0x7f657e22bc77]
    /home/simon/SAGE/debug/sage-5.5.rc0/local/lib/libcsage.so(sigdie+0x14)[0x7f657e22bca9]
    /home/simon/SAGE/debug/sage-5.5.rc0/local/lib/libcsage.so(sage_signal_handler+0x216)[0x7f657e22b886]
    /lib64/libpthread.so.0(+0xfd00)[0x7f6583601d00]
    /home/simon/SAGE/debug/sage-5.5.rc0/local/lib/libpython2.7.so.1.0(+0x129940)[0x7f6583938940]
    /home/simon/SAGE/debug/sage-5.5.rc0/local/lib/libpython2.7.so.1.0(PyGC_Collect+0x24)[0x7f6583939694]
    /home/simon/SAGE/debug/sage-5.5.rc0/local/lib/libpython2.7.so.1.0(Py_Finalize+0x116)[0x7f6583925316]
    /home/simon/SAGE/debug/sage-5.5.rc0/local/lib/libpython2.7.so.1.0(Py_Exit+0x8)[0x7f6583924308]
    /home/simon/SAGE/debug/sage-5.5.rc0/local/lib/libpython2.7.so.1.0(+0x115434)[0x7f6583924434]
    /home/simon/SAGE/debug/sage-5.5.rc0/local/lib/libpython2.7.so.1.0(PyErr_PrintEx+0x205)[0x7f65839246d5]
    /home/simon/SAGE/debug/sage-5.5.rc0/local/lib/libpython2.7.so.1.0(PyRun_SimpleFileExFlags+0x1fe)[0x7f6583924b4e]
    /home/simon/SAGE/debug/sage-5.5.rc0/local/lib/libpython2.7.so.1.0(Py_Main+0xbd5)[0x7f6583938035]
    /lib64/libc.so.6(__libc_start_main+0xed)[0x7f6582c2523d]
    python[0x4006e1]
    
    ------------------------------------------------------------------------
    Unhandled SIGSEGV: A segmentation fault occurred in Sage.
    This probably occurred because a *compiled* component of Sage has a bug
    in it and is not properly wrapped with sig_on(), sig_off(). You might
    want to run Sage under gdb with 'sage -gdb' to debug this.
    Sage will now terminate.
    ------------------------------------------------------------------------
    /home/simon/SAGE/debug/sage-5.5.rc0/spkg/bin/sage: Zeile 310: 25245 Speicherzugriffsfehler  sage-ipython "$@" -i
    

Conclusion

I think this spkg is something we can work with to detect errors.

comment:86 Changed 7 years ago by SimonKing

PS:

  • I did not test whether this spkg builds on OSX.
  • The next step should be to include the bug fix from singular_15435.patch and similar upstream fixes, and see whether the crash persists...
Last edited 7 years ago by SimonKing (previous) (diff)

comment:87 Changed 7 years ago by SimonKing

PPS: I am afraid the bug fixes and the xalloc patch do not commute. Hence, when including the fixes, I need to update the xalloc patch as well.

comment:88 Changed 7 years ago by SimonKing

Hooray, it does build on OSX (bsd.math)!!

Changed 7 years ago by SimonKing

Backporting a small part of Singular changeset baadc0f7

comment:89 Changed 7 years ago by SimonKing

  • Authors set to Nils Bruin, Simon King
  • Description modified (diff)
  • Report Upstream changed from Reported upstream. Developers acknowledge bug. to Fixed upstream, in a later stable release.

comment:90 follow-up: Changed 7 years ago by SimonKing

The new Singular spkg is intended to eventually become a new patch level of the standard Singular spkg.

The spkg is not ready for review yet (for example, SPKG.txt needs to be updated, and there are uncommited changes). But it contains two bug fixes from upstream, it builds on Linux and OSX, and one can opionally build libsingular and the Singular executable based on xalloc.

The disadvantage is that there are much more changes to the upstream sources compared with Nils' spkg. So, it could very well be that in the xalloc version one sees bugs that are actually artefacts of totally stripping down omalloc.

Here is what happens with the new spkg, build with export SINGULAR_XALLOC=yes on my !openSuse laptop:

Without MALLOC_CHECK

  • Sage can not quit after executing a very basic polynomial arithmetics:
    sage: P.<x,y> = QQ[]
    sage: x*y
    x*y
    sage: quit
    Exiting Sage (CPU time 0m0.04s, Wall time 0m3.72s).
    
    The shell prompt does not return, ctrl-C has no effect - I have to pkill all Python processes! Note that P.<x,y>=QQ[] alone does not trigger the problem.
  • In the example from the ticket description, one does not get a crash when multiplying two elements of a g-algebra, but one gets a crash when quitting Sage afterwards:
    sage: A.<x,y> = FreeAlgebra(QQ, 2)
    sage: P.<x,y> = A.g_algebra({y*x:-x*y})
    sage: x*y
    x*y
    sage: quit
    Exiting Sage (CPU time 0m0.19s, Wall time 0m19.09s).
    /home/simon/SAGE/debug/sage-5.5.rc0/local/lib/libcsage.so(print_backtrace+0x31)[0x7f951301ac77]
    /home/simon/SAGE/debug/sage-5.5.rc0/local/lib/libcsage.so(sigdie+0x14)[0x7f951301aca9]
    /home/simon/SAGE/debug/sage-5.5.rc0/local/lib/libcsage.so(sage_signal_handler+0x216)[0x7f951301a886]
    /lib64/libpthread.so.0(+0xfd00)[0x7f95183f0d00]
    /home/simon/SAGE/debug/sage-5.5.rc0/local/lib/libpython2.7.so.1.0(+0x129940)[0x7f9518727940]
    /home/simon/SAGE/debug/sage-5.5.rc0/local/lib/libpython2.7.so.1.0(PyGC_Collect+0x24)[0x7f9518728694]
    /home/simon/SAGE/debug/sage-5.5.rc0/local/lib/libpython2.7.so.1.0(Py_Finalize+0x116)[0x7f9518714316]
    /home/simon/SAGE/debug/sage-5.5.rc0/local/lib/libpython2.7.so.1.0(Py_Exit+0x8)[0x7f9518713308]
    /home/simon/SAGE/debug/sage-5.5.rc0/local/lib/libpython2.7.so.1.0(+0x115434)[0x7f9518713434]
    /home/simon/SAGE/debug/sage-5.5.rc0/local/lib/libpython2.7.so.1.0(PyErr_PrintEx+0x205)[0x7f95187136d5]
    /home/simon/SAGE/debug/sage-5.5.rc0/local/lib/libpython2.7.so.1.0(PyRun_SimpleFileExFlags+0x1fe)[0x7f9518713b4e]
    /home/simon/SAGE/debug/sage-5.5.rc0/local/lib/libpython2.7.so.1.0(Py_Main+0xbd5)[0x7f9518727035]
    /lib64/libc.so.6(__libc_start_main+0xed)[0x7f9517a1423d]
    python[0x4006e1]
    
    ------------------------------------------------------------------------
    Unhandled SIGSEGV: A segmentation fault occurred in Sage.
    This probably occurred because a *compiled* component of Sage has a bug
    in it and is not properly wrapped with sig_on(), sig_off(). You might
    want to run Sage under gdb with 'sage -gdb' to debug this.
    Sage will now terminate.
    ------------------------------------------------------------------------
    /home/simon/SAGE/debug/sage-5.5.rc0/spkg/bin/sage: Zeile 310:  5308 Speicherzugriffsfehler  sage-ipython "$@" -i
    

With export MALLOC_CHECK=3

  • Sage crashes when quitting after executing a very basic polynomial arithmetics:
    sage: P.<x,y> = QQ[]
    sage: x*y
    x*y
    sage: 
    Exiting Sage (CPU time 0m0.05s, Wall time 0m26.09s).
    /home/simon/SAGE/debug/sage-5.5.rc0/local/lib/libcsage.so(print_backtrace+0x31)[0x7fcc69d1ac77]
    /home/simon/SAGE/debug/sage-5.5.rc0/local/lib/libcsage.so(sigdie+0x14)[0x7fcc69d1aca9]
    /home/simon/SAGE/debug/sage-5.5.rc0/local/lib/libcsage.so(sage_signal_handler+0x216)[0x7fcc69d1a886]
    /lib64/libpthread.so.0(+0xfd00)[0x7fcc6f339d00]
    /home/simon/SAGE/debug/sage-5.5.rc0/local/lib/libpython2.7.so.1.0(+0x1296d8)[0x7fcc6f6706d8]
    /home/simon/SAGE/debug/sage-5.5.rc0/local/lib/libpython2.7.so.1.0(+0x75bca)[0x7fcc6f5bcbca]
    /home/simon/SAGE/debug/sage-5.5.rc0/local/lib/libpython2.7.so.1.0(+0x129977)[0x7fcc6f670977]
    /home/simon/SAGE/debug/sage-5.5.rc0/local/lib/libpython2.7.so.1.0(PyGC_Collect+0x24)[0x7fcc6f671694]
    /home/simon/SAGE/debug/sage-5.5.rc0/local/lib/libpython2.7.so.1.0(Py_Finalize+0x116)[0x7fcc6f65d316]
    /home/simon/SAGE/debug/sage-5.5.rc0/local/lib/libpython2.7.so.1.0(Py_Exit+0x8)[0x7fcc6f65c308]
    /home/simon/SAGE/debug/sage-5.5.rc0/local/lib/libpython2.7.so.1.0(+0x115434)[0x7fcc6f65c434]
    /home/simon/SAGE/debug/sage-5.5.rc0/local/lib/libpython2.7.so.1.0(PyErr_PrintEx+0x205)[0x7fcc6f65c6d5]
    /home/simon/SAGE/debug/sage-5.5.rc0/local/lib/libpython2.7.so.1.0(PyRun_SimpleFileExFlags+0x1fe)[0x7fcc6f65cb4e]
    /home/simon/SAGE/debug/sage-5.5.rc0/local/lib/libpython2.7.so.1.0(Py_Main+0xbd5)[0x7fcc6f670035]
    /lib64/libc.so.6(__libc_start_main+0xed)[0x7fcc6e95d23d]
    python[0x4006e1]
    
    ------------------------------------------------------------------------
    Unhandled SIGSEGV: A segmentation fault occurred in Sage.
    This probably occurred because a *compiled* component of Sage has a bug
    in it and is not properly wrapped with sig_on(), sig_off(). You might
    want to run Sage under gdb with 'sage -gdb' to debug this.
    Sage will now terminate.
    ------------------------------------------------------------------------
    /home/simon/SAGE/debug/sage-5.5.rc0/spkg/bin/sage: Zeile 310:  5393 Speicherzugriffsfehler  sage-ipython "$@" -i
    
  • There is a memory corruption detected already when creating the g-algebra. The Sage or shell prompt does not appear, ctrl-C has no effect, I need to pkill python:
    sage: A.<x,y> = FreeAlgebra(QQ, 2)
    sage: P.<x,y> = A.g_algebra({y*x:-x*y})
    *** glibc detected *** python: malloc(): memory corruption: 0x0000000002c8db40 ***
    

Two upstream bugs were fixed in the new spkg. Now the question is whether the findings above will point to further upstream bugs or are simply consequences of the rather brutal approach of replacing omalloc with xalloc.

Nils, can you analyse what is happening here? You are a lot better than I in using gdb and valgrind.

Version 0, edited 7 years ago by SimonKing (next)

comment:91 in reply to: ↑ 90 Changed 7 years ago by nbruin

sage: P.<x,y> = QQ[]
sage: x*y
x*y

I put this in a file to run as a doctest using valgrind and gdb. The crash happens in python's gcmodule, so the memory corruption is severe.

==12827== Invalid write of size 8
==12827==    at 0x26A58673: pp_Mult_mm__FieldQ_LengthThree_OrdGeneral (pp_Mult_mm__T.cc:48)
==12827==    by 0x27748935: __pyx_f_4sage_4libs_8singular_10polynomial_singular_polynomial_mul(spolyrec**, spolyrec*, spolyrec*, sip_sring*) (pInline2.h:673)
==12827==    by 0x265BCE3F: __pyx_f_4sage_5rings_10polynomial_28multi_polynomial_libsingular_23MPolynomial_libsingular__mul_(__pyx_obj_4sage_5rings_10polynomial_28multi_polynomial_libsingular_MPolynomial_libsingular*, __pyx_obj_4sage_9structure_7element_RingElement*, int) (multi_polynomial_libsingular.cpp:16475)
==12827==    by 0x154E9EA8: __pyx_pw_4sage_9structure_7element_11RingElement_11__mul__ (element.c:14091)
...
==12827==  Address 0x101597c0 is 0 bytes after a block of size 16 alloc'd
==12827==    at 0x4A0762F: malloc (vg_replace_malloc.c:270)
==12827==    by 0x26A5865E: pp_Mult_mm__FieldQ_LengthThree_OrdGeneral (omalloc.h:85)
==12827==    by 0x27748935: __pyx_f_4sage_4libs_8singular_10polynomial_singular_polynomial_mul(spolyrec**, spolyrec*, spolyrec*, sip_sring*) (pInline2.h:673)
==12827==    by 0x265BCE3F: __pyx_f_4sage_5rings_10polynomial_28multi_polynomial_libsingular_23MPolynomial_libsingular__mul_(__pyx_obj_4sage_5rings_10polynomial_28multi_polynomial_libsingular_MPolynomial_libsingular*, __pyx_obj_4sage_9structure_7element_RingElement*, int) (multi_polynomial_libsingular.cpp:16475)
==12827==    by 0x154E9EA8: __pyx_pw_4sage_9structure_7element_11RingElement_11__mul__ (element.c:14091)
==12827==    by 0x4C5435E: binary_op1 (abstract.c:945)

==12827== Invalid write of size 8
==12827==    at 0x26A5867F: pp_Mult_mm__FieldQ_LengthThree_OrdGeneral (pp_Mult_mm__T.cc:49)
==12827==    by 0x27748935: __pyx_f_4sage_4libs_8singular_10polynomial_singular_polynomial_mul(spolyrec**, spolyrec*, spolyrec*, sip_sring*) (pInline2.h:673)
==12827==    by 0x265BCE3F: __pyx_f_4sage_5rings_10polynomial_28multi_polynomial_libsingular_23MPolynomial_libsingular__mul_(__pyx_obj_4sage_5rings_10polynomial_28multi_polynomial_libsingular_MPolynomial_libsingular*, __pyx_obj_4sage_9structure_7element_RingElement*, int) (multi_polynomial_libsingular.cpp:16475)
==12827==    by 0x154E9EA8: __pyx_pw_4sage_9structure_7element_11RingElement_11__mul__ (element.c:14091)
==12827==    by 0x4C5435E: binary_op1 (abstract.c:945)
...
==12827==  Address 0x101597c8 is 8 bytes after a block of size 16 alloc'd
==12827==    at 0x4A0762F: malloc (vg_replace_malloc.c:270)
==12827==    by 0x26A5865E: pp_Mult_mm__FieldQ_LengthThree_OrdGeneral (omalloc.h:85)
==12827==    by 0x27748935: __pyx_f_4sage_4libs_8singular_10polynomial_singular_polynomial_mul(spolyrec**, spolyrec*, spolyrec*, sip_sring*) (pInline2.h:673)
==12827==    by 0x265BCE3F: __pyx_f_4sage_5rings_10polynomial_28multi_polynomial_libsingular_23MPolynomial_libsingular__mul_(__pyx_obj_4sage_5rings_10polynomial_28multi_polynomial_libsingular_MPolynomial_libsingular*, __pyx_obj_4sage_9structure_7element_RingElement*, int) (multi_polynomial_libsingular.cpp:16475)
==12827==    by 0x154E9EA8: __pyx_pw_4sage_9structure_7element_11RingElement_11__mul__ (element.c:14091)
==12827==    by 0x4C5435E: binary_op1 (abstract.c:945)

Similar message 16 bytes past and one invalid write Address 0x101597d8 is not stack'd, malloc'd or (recently) free'd

Also:

==12827== Invalid read of size 8
==12827==    at 0x269A4A56: nlNormalize(snumber*&) (longrat.cc:1122)
==12827==    by 0x269C7E50: p_Normalize(spolyrec*, sip_sring*) (polys.cc:647)
==12827==    by 0x265BD052: __pyx_f_4sage_5rings_10polynomial_28multi_polynomial_libsingular_23MPolynomial_libsingular__mul_(__pyx_obj_4sage_5rings_10polynomial_28multi_polynomial_libsingular_MPolynomial_libsingular*, __pyx_obj_4sage_9structure_7element_RingElement*, int) (multi_polynomial_libsingular.cpp:31792)
==12827==    by 0x154E9EA8: __pyx_pw_4sage_9structure_7element_11RingElement_11__mul__ (element.c:14091)
==12827==    by 0x4C5435E: binary_op1 (abstract.c:945)
...
==12827==  Address 0x101597c0 is 0 bytes after a block of size 16 alloc'd
==12827==    at 0x4A0762F: malloc (vg_replace_malloc.c:270)
==12827==    by 0x26A5865E: pp_Mult_mm__FieldQ_LengthThree_OrdGeneral (omalloc.h:85)
==12827==    by 0x27748935: __pyx_f_4sage_4libs_8singular_10polynomial_singular_polynomial_mul(spolyrec**, spolyrec*, spolyrec*, sip_sring*) (pInline2.h:673)
==12827==    by 0x265BCE3F: __pyx_f_4sage_5rings_10polynomial_28multi_polynomial_libsingular_23MPolynomial_libsingular__mul_(__pyx_obj_4sage_5rings_10polynomial_28multi_polynomial_libsingular_MPolynomial_libsingular*, __pyx_obj_4sage_9structure_7element_RingElement*, int) (multi_polynomial_libsingular.cpp:16475)
==12827==    by 0x154E9EA8: __pyx_pw_4sage_9structure_7element_11RingElement_11__mul__ (element.c:14091)
==12827==    by 0x4C5435E: binary_op1 (abstract.c:945)

==12827== Invalid read of size 8
==12827==    at 0x269CBC3D: p_String0(spolyrec*, sip_sring*, sip_sring*) (polys0.cc:90)
==12827==    by 0x27748087: __pyx_f_4sage_4libs_8singular_10polynomial_singular_polynomial_str(spolyrec*, sip_sring*) (polynomial.cpp:4266)
==12827==    by 0x2658F9B1: __pyx_pw_4sage_5rings_10polynomial_28multi_polynomial_libsingular_23MPolynomial_libsingular_33_repr_(_object*, _object*) (multi_polynomial_libsingular.cpp:17242)
==12827==    by 0x4C59402: PyObject_Call (abstract.c:2529)
==12827==    by 0x10CD2408: __pyx_pw_4sage_9structure_11sage_object_10SageObject_5__repr__ (sage_object.c:1790)
...
==12827==  Address 0x101597d8 is not stack'd, malloc'd or (recently) free'd

==12827== Invalid read of size 8
==12827==    at 0x269CB926: writemon(spolyrec*, int, sip_sring*) (polys0.cc:25)
==12827==    by 0x269CBC82: p_String0(spolyrec*, sip_sring*, sip_sring*) (polys0.cc:92)
==12827==    by 0x27748087: __pyx_f_4sage_4libs_8singular_10polynomial_singular_polynomial_str(spolyrec*, sip_sring*) (polynomial.cpp:4266)
==12827==    by 0x2658F9B1: __pyx_pw_4sage_5rings_10polynomial_28multi_polynomial_libsingular_23MPolynomial_libsingular_33_repr_(_object*, _object*) (multi_polynomial_libsingular.cpp:17242)
...
==12827==  Address 0x101597c0 is 0 bytes after a block of size 16 alloc'd
==12827==    at 0x4A0762F: malloc (vg_replace_malloc.c:270)
==12827==    by 0x26A5865E: pp_Mult_mm__FieldQ_LengthThree_OrdGeneral (omalloc.h:85)
==12827==    by 0x27748935: __pyx_f_4sage_4libs_8singular_10polynomial_singular_polynomial_mul(spolyrec**, spolyrec*, spolyrec*, sip_sring*) (pInline2.h:673)
==12827==    by 0x265BCE3F: __pyx_f_4sage_5rings_10polynomial_28multi_polynomial_libsingular_23MPolynomial_libsingular__mul_(__pyx_obj_4sage_5rings_10polynomial_28multi_polynomial_libsingular_MPolynomial_libsingular*, __pyx_obj_4sage_9structure_7element_RingElement*, int) (multi_polynomial_libsingular.cpp:16475)
==12827==    by 0x154E9EA8: __pyx_pw_4sage_9structure_7element_11RingElement_11__mul__ (element.c:14091)

etc. I tracked down the other bugs from similar information, so good luck! My guess is that this PolyBin? is a little more required than assumed.

This is really just running the test and copy/paste from valgrind! It's much faster if you can do this yourself. }}}

comment:92 follow-up: Changed 7 years ago by nbruin

Indeed, kernel/pp_Mult_mm__T.cc:32:

  omBin bin = ri->PolyBin;
  DECLARE_LENGTH(const unsigned long length = ri->ExpL_Size);
  const unsigned long* m_e = m->exp;
  pAssume(!n_IsZero(ln,ri));
  pAssume1(p_GetComp(m, ri) == 0 || p_MaxComp(p, ri) == 0);
  number tmp;

  do
  {
    tmp = n_Mult(ln, pGetCoeff(p), ri);
#ifdef HAVE_ZERODIVISORS
    if (! n_IsZero(tmp, ri))
    {
#endif
      p_AllocBin( pNext(q), bin, ri);
      q = pNext(q);
      pSetCoeff0(q, tmp);

I haven't looked into what happens here exactly, but r->PolyBin does get passed around. It might not be deref problem, though, but: omalloc/omalloc.h:142 (xalloc really)

#define omTypeAllocBin(T,P,B)    P=(T)omAlloc(sizeof(B))
#define omTypeAlloc(T,P,S)       P=(T)omAlloc(S)
#define omTypeAlloc0Bin(T,P,B)   P=(T)omAlloc0(B)

looks suspicious to me. The two allocbin routines look incompatible to me. I think they are supposed to have the same interface type.

comment:93 in reply to: ↑ 92 Changed 7 years ago by SimonKing

Replying to nbruin:

I haven't looked into what happens here exactly, but r->PolyBin does get passed around. It might not be deref problem, though, but: omalloc/omalloc.h:142 (xalloc really)

#define omTypeAllocBin(T,P,B)    P=(T)omAlloc(sizeof(B))
#define omTypeAlloc(T,P,S)       P=(T)omAlloc(S)
#define omTypeAlloc0Bin(T,P,B)   P=(T)omAlloc0(B)

looks suspicious to me. The two allocbin routines look incompatible to me. I think they are supposed to have the same interface type.

Could very well be. The "sizeof(...)" was my addition - without it, the compiler would complain, and indeed looking into the code it seems that a size was expected. Perhaps I should add the "sizeof" to the other two macros as well, even though the compiler did not complain about them.

Or, yet another possibility: If I am not mistaken, the bins don't really matter when omalloc is ripped out. Hence, one could perhaps simply do nothing instead?

Alternatively, one could think of defining typedef size_t* omBin; rather than the current typedef size_t omBin, so that at least we have a pointer which we can allocate memory for.

comment:94 Changed 7 years ago by nbruin

Well, pNext(q) (soon to be q) gets allocated via:

#define p_AllocBin(p, bin, r)   omTypeAllocBin(poly, p, bin)

If I'm not mistaken, a poly is a pointer to what is ultimately

struct  spolyrec
{
  poly      next;           // next needs to be the first field
  number    coef;           // and coef the second --- do not change this !!!
  unsigned long exp[VARS];  // make sure that exp is aligned
};

so whatever you allocate, it had better have enough room for that.

As far as I understand, a "bin" is a (collection of) page(s) that gets used for allocation of same-sized memory blocks. So I'd expect an alloc that uses a bin reference to get the size to alocate from the bin rather than as a parameter. Here, things get allocated from the "poly" bin. So perhaps you don't need bins per se, but I think you do need to keep enough of the bin to know what size memory blocks are associated with it. The above code doesn't seem to do that (none of it).

I would think (T)omAlloc(sizeof(*T)) or something like that would be closer to what's required. At least (assuming T is a pointer type) it would allocate a block that's big enough to let a T type point to. But I may well be misreading the code.

Last edited 7 years ago by nbruin (previous) (diff)

comment:95 Changed 7 years ago by nbruin

OK, my C skills are insufficient for this:

#include<stdio.h>

typedef struct{
  int a; int b; int c; int d; int e;
} composite;

typedef composite *pointer;

void main() {
  pointer v;
  printf("sizeof(composite)=%d\n",sizeof(composite));
  printf("sizeof(t)=%d\n",sizeof(t));
  printf("sizeof(*v)=%d\n",sizeof(*v));
};

produces the desired output:

sizeof(composite)=20
sizeof(pointer)=8
sizeof(*v)=20

So if I have my hands on an instance of a "pointer" variable, I can get the size of the structure pointed to by dereferencing. However, if I want to get this straight from the pointer type there doesn't seem to be a syntax for it. sizeof(*pointer) gives an error ... How do you get that information?

SOLVED:

  printf("sizeof(*(pointer)NULL)=%d\n",sizeof(*(pointer)NULL));
Last edited 7 years ago by nbruin (previous) (diff)

comment:96 Changed 7 years ago by nbruin

OK, this trick doesn't work. There are calls of the form

omTypeAllocBin(void*, addr, InternalInteger_bin);

which prevent and there are calls to omAllocBin anyway that only have the bin, not a type hint for how much memory should be allocated. So removing bins completely will simply not work: you need them at least as a storage container for block lengths associated to them.

Short: The AllocBin routines seem to allocate from a bin, not the bin itself. I think you can take this back to libsingular-devel. Hannes claims that xalloc is a sufficient omalloc replacement, so he probably has a solution to this (which I bet is resurrecting omBin_s and supporting rudimentary initialization of it. We only need the sizeW field on these things)

In fact, from the line

omBin slists_bin = omGetSpecBin(sizeof(slists));

I bet that you can declare

typedef struct omBin_s {
  size_t sizeW;
  size_t sizeB;
} omBin_t;
typedef omBin_t* omBin;

static inline omBin omGetSpecBin(size_t S)
{   omBin bin = (omBin)malloc(sizeof(omBin_t));
    bin->sizeB = S;
    bin->sizeW = S/sizeof(long);
    return bin;
}

static inline void omUnGetSpecBin(omBin bin)
{
    free(bin);
}

#define omTypeAllocBin(T,P,B)    P=(T)omAlloc(B->sizeB)

etc.

Last edited 7 years ago by nbruin (previous) (diff)

comment:97 follow-up: Changed 7 years ago by nbruin

NO!! It's much easier. Hannes is right: just store bin->sizeB directly in the bin. Then

#define omTypeAllocBin(T,P,B)    P=(T)omAlloc(B)

is the correct description, since the bin now simply IS the size_t.

comment:98 in reply to: ↑ 97 Changed 7 years ago by SimonKing

Replying to nbruin:

NO!! It's much easier. Hannes is right: just store bin->sizeB directly in the bin. Then

#define omTypeAllocBin(T,P,B)    P=(T)omAlloc(B)

is the correct description, since the bin now simply IS the size_t.

Sorry, I don't get that.

typedef size_t  omBin;

static inline omBin omGetSpecBin(size_t S)
{   return (omBin)sizeof(omBin_t);
}

or what?

And remove omUnGetSpecBin?

And then

#define omTypeAllocBin(T,P,B)    P=(T)omAlloc(B)

?

comment:99 Changed 7 years ago by SimonKing

PS: Could you please have a look at the file omalloc/Makefile that my patch creates? It contains

install-nolns: install
install-libsingular: install

Both targets install-nolns and install-libsingular are requested. Does that mean that the compilation is done twice? If this is so, how can I prevent it?

comment:100 Changed 7 years ago by SimonKing

No. It doesn't work in the way I just tried. Now Sage won't even start.

comment:101 Changed 7 years ago by SimonKing

I have updated the new spkg and posted a new version of singular_xalloc.patch.

Now it seems to work!

simon@linux-sqwp:~/SAGE/debug/sage-5.5.rc0> ./sage 
----------------------------------------------------------------------
| Sage Version 5.5.rc0, Release Date: 2012-11-17                     |
| Type "notebook()" for the browser-based notebook interface.        |
| Type "help()" for help.                                            |
----------------------------------------------------------------------
**********************************************************************
*                                                                    *
* Warning: this is a prerelease version, and it may be unstable.     *
*                                                                    *
**********************************************************************
sage: A.<x,y> = FreeAlgebra(QQ, 2)
sage: P.<x,y> = A.g_algebra({y*x:-x*y})
sage: x*y
x*y
sage: quit
Exiting Sage (CPU time 0m0.48s, Wall time 0m10.86s).
simon@linux-sqwp:~/SAGE/debug/sage-5.5.rc0> export MALLOC_CHECK_=3
simon@linux-sqwp:~/SAGE/debug/sage-5.5.rc0> ./sage 
----------------------------------------------------------------------
| Sage Version 5.5.rc0, Release Date: 2012-11-17                     |
| Type "notebook()" for the browser-based notebook interface.        |
| Type "help()" for help.                                            |
----------------------------------------------------------------------
**********************************************************************
*                                                                    *
* Warning: this is a prerelease version, and it may be unstable.     *
*                                                                    *
**********************************************************************
sage: P.<x,y> = QQ[]
sage: x*y
x*y
sage: quit
Exiting Sage (CPU time 0m0.04s, Wall time 0m6.09s).
simon@linux-sqwp:~/SAGE/debug/sage-5.5.rc0> ./sage 
----------------------------------------------------------------------
| Sage Version 5.5.rc0, Release Date: 2012-11-17                     |
| Type "notebook()" for the browser-based notebook interface.        |
| Type "help()" for help.                                            |
----------------------------------------------------------------------
**********************************************************************
*                                                                    *
* Warning: this is a prerelease version, and it may be unstable.     *
*                                                                    *
**********************************************************************
sage: A.<x,y> = FreeAlgebra(QQ, 2)
sage: P.<x,y> = A.g_algebra({y*x:-x*y})
sage: x*y
x*y
sage: quit
Exiting Sage (CPU time 0m0.17s, Wall time 0m8.22s).

My suggestion: Using the doc tests (run with MALLOC_CHECK_=3), we try to detect further crashes. This may result in further up- or downstream fixes. Eventually, the new spkg shall become standard, with all the fixes backported, and with optional xalloc support.

comment:102 Changed 7 years ago by SimonKing

I am now running make test with MALLOC_CHECK_=3 and it seems to work quit well!

However, it seems that the spkg should also install omalloc/mylimits.h: It turns out that ./sage -br only works since there is mylimits.h around from building the original singular spkg.

comment:103 Changed 7 years ago by SimonKing

PS: Normally, mylimits.h just includes "limits.h" - but in IRIX-6 machines that wouldn't work, and so on these machines the header omlimits.h generated by the configure script would be included.

Hence, I will soon create a new spkg that installs mylimits.h including limits.h - but the xalloc-version of the spkg will not work on IRIX-6 machines.

Changed 7 years ago by SimonKing

Build Singular and libsingular with omalloc replaced by xalloc

comment:104 Changed 7 years ago by SimonKing

PPS: The spkg installing mylimits.h is already posted.

comment:105 Changed 7 years ago by SimonKing

make test reports

        sage -t  -force_lib "devel/sage/sage/graphs/graph_generators.py" # Killed/crashed
        sage -t  -force_lib "devel/sage/sage/misc/sageinspect.py"
Total time for all tests: 6884.7 seconds
make: *** [test] Fehler 132
simon@linux-sqwp:~/SAGE/debug/sage-5.5.rc0> echo $SINGULAR_XALLOC
yes
simon@linux-sqwp:~/SAGE/debug/sage-5.5.rc0> echo $MALLOC_CHECK_
3

To be on the safe side, can you verify that I did not misspell the environment variable? I.e., is there a trailing underscore?

Note that the error in sageinspect is due to the fact that I also had #11768 applied, which needs work. I don't know if the crash in graph_generators is related.

Is that good or bad?

  • It is good, since the xalloc version of Singular works very well, and the crashes we discussed here have apparently been fixed upstream (and the fixes backported). Hence, I am now going to finalise the spkg for review.
  • It is bad, since I think we had hoped that we see easy-to-analyse crashes in garbage collection - after all, I am using sage-5.5.rc0, which has some of the weak reference patches applied. But if there is no crash, there is nothing to analyse.

However, if I recall correctly, the crashes did only occur when also considering #12215, not only #715 and #11521.

comment:106 Changed 7 years ago by SimonKing

PS: The graph_generators test pass fine if MALLOC_CHECK_ is not defined. Hence, we see a new bug here. I'll report on sage-devel.

comment:107 Changed 7 years ago by SimonKing

PPS: The graph_generators test also crashes with the default omalloc Singular. Hence, it definitely is unrelated.

comment:108 follow-up: Changed 7 years ago by SimonKing

PPPS:

All tests passed!
Total time for all tests: 6376.5 seconds
bash-3.2$ echo $MALLOC_CHECK_
3

with the xalloc singular on bsd.math!!

comment:109 in reply to: ↑ 108 ; follow-up: Changed 7 years ago by nbruin

Replying to SimonKing:

PPPS:

All tests passed!
Total time for all tests: 6376.5 seconds
bash-3.2$ echo $MALLOC_CHECK_
3

with the xalloc singular on bsd.math!!

That variable doesn't do anything on OSX, so congratulations that xalloc works on OSX, but you haven't done stringent memory checking yet. Try

export DYLD_INSERT_LIBRARIES=/usr/lib/libgmalloc.dylib

instead. see man libgmalloc for various options to select what should be guarded. (warning: It will run MUCH slower and take MUCH more memory. It seems gmalloc is about as strict as ElectricFence which is also incredibly slow. Valgrind is speedy by comparison)

comment:110 in reply to: ↑ 109 Changed 7 years ago by SimonKing

Replying to nbruin:

That variable doesn't do anything on OSX, so congratulations that xalloc works on OSX, but you haven't done stringent memory checking yet. Try

export DYLD_INSERT_LIBRARIES=/usr/lib/libgmalloc.dylib

instead. see man libgmalloc for various options to select what should be guarded. (warning: It will run MUCH slower and take MUCH more memory. It seems gmalloc is about as strict as ElectricFence which is also incredibly slow. Valgrind is speedy by comparison)

Thank you!

First result:

----------------------------------------------------------------------
| Sage Version 5.5.rc0, Release Date: 2012-11-17                     |
| Type "notebook()" for the browser-based notebook interface.        |
| Type "help()" for help.                                            |
----------------------------------------------------------------------
**********************************************************************
*                                                                    *
* Warning: this is a prerelease version, and it may be unstable.     *
*                                                                    *
**********************************************************************
GuardMalloc: Allocations will be placed on 16 byte boundaries.
GuardMalloc:  - Some buffer overruns may not be noticed.
GuardMalloc:  - Applications using vector instructions (e.g., SSE or Altivec) should work.
GuardMalloc: GuardMalloc version 23
GuardMalloc: Allocations will be placed on 16 byte boundaries.
GuardMalloc:  - Some buffer overruns may not be noticed.
GuardMalloc:  - Applications using vector instructions (e.g., SSE or Altivec) should work.
GuardMalloc: GuardMalloc version 23
GuardMalloc: Allocations will be placed on 16 byte boundaries.
GuardMalloc:  - Some buffer overruns may not be noticed.
GuardMalloc:  - Applications using vector instructions (e.g., SSE or Altivec) should work.
GuardMalloc: GuardMalloc version 23
GuardMalloc: Allocations will be placed on 16 byte boundaries.
GuardMalloc:  - Some buffer overruns may not be noticed.
GuardMalloc:  - Applications using vector instructions (e.g., SSE or Altivec) should work.
GuardMalloc: GuardMalloc version 23
sage: A.<x,y> = FreeAlgebra(QQ, 2)
sage: P.<x,y> = A.g_algebra({y*x:-x*y})
sage: x*y
x*y
sage: quit
Exiting Sage (CPU time 0m24.65s, Wall time 1m15.86s).

So, the bug really is fixed, on OSX as well.

Second result: Yes, it is incredibly slow!

Nevertheless, I just started sage -testall under gmalloc on bsd.math. Let's see whether it will finish this year...

comment:111 Changed 7 years ago by SimonKing

How can I increase the timeout for doctests? According to what google gave me, I tried export TIMEOUT=1800, but that didn't work.

Last edited 7 years ago by SimonKing (previous) (diff)

comment:112 Changed 7 years ago by SimonKing

Aha! It seems to require SAGE_TIMEOUT!

comment:113 Changed 7 years ago by SimonKing

Even a timeout of 1800 seconds is not enough for any test. I tried sage/schemes (because memory problems often show up there), but every single test times out.

comment:114 follow-up: Changed 7 years ago by nbruin

I think running sage -t --gdb ... disables the timeout. If you're just testing a single file you might as well do that (in screen of course--it'll be a while before you check back). It's how I got thetraceback for the first heisenbug. It also leaves you immediately with an environment where you can examine the backtrace etc.

comment:115 in reply to: ↑ 114 Changed 7 years ago by SimonKing

Replying to nbruin:

I think running sage -t --gdb ... disables the timeout. If you're just testing a single file you might as well do that (in screen of course--it'll be a while before you check back). It's how I got thetraceback for the first heisenbug. It also leaves you immediately with an environment where you can examine the backtrace etc.

Well, I first wanted to have a "quick" overview: Is there any problematic test? Namely, on Linux, everything works just fine with MALLOC_CHECK_ (except the known problem with the graphs). And if there is another crash on OSX in the quick test then I wanted to analyse it more deeply.

But the only test that did not timeout with SAGE_TIMEOUT=1800 was a test that normally takes 3.5 seconds...

Is there any other memory checker than gmalloc that I can use on OSX, that is faster and finds some of the issues that gmalloc would find?

comment:116 Changed 7 years ago by nbruin

Concerning the packaging of xalloc in the singular package. Presently:

$ ls -l /tmp/singular_xalloc.patch
-r--------. 1  847572 Dec  1 10:05 /tmp/singular_xalloc.patch
$ grep '^-' /tmp/singular_xalloc.patch |wc
  26089  106795  824416
$ grep '^+' /tmp/singular_xalloc.patch |wc
    374    1168   10286

so it's a large patch: 800kB on 8MB for the spkg. I guess it'll compress well, so the ration is not quite 10%, but still: The patch mainly consists of exactly listing all the files in the omalloc directory in order to delete them. That means that we're now shipping omalloc twice in order to be able to remove it.

I propose to code the patch differently:

  • put xalloc into its own subdir; inside next to omalloc probably. You can do this by patch or something else.
  • If SINGULAR_XALLOC is yes, do explicit mv and or ln -s commands to more the original omalloc out of the way and replace it by xalloc
  • the changes to the singular source itself should of course remain in the patch file.

Perhaps it's not according to the letter of "use patch to change src", but patch is just inappropriate for a rip out and replace operation like this.

comment:117 Changed 7 years ago by SimonKing

  • Description modified (diff)
  • Status changed from new to needs_review

I have produced a new patchlevel for the singular spkg. The optional patch is now smaller, since I simply erase the content of the omalloc/ folder, so that the removed files are not part of the patch.

Needs review!

comment:118 Changed 7 years ago by SimonKing

With singular-xalloc and using gmalloc on bsd.math, I tested sage/schemes. Of course, most tests timed out. But these few did fail differently:

        sage -t  "devel/sage/sage/schemes/toric/divisor_class.pyx"
        sage -t  "devel/sage/sage/schemes/toric/fano_variety.py"
        sage -t  "devel/sage/sage/schemes/toric/homset.py"
        sage -t  "devel/sage/sage/schemes/toric/library.py"

I don't know why, but there was no test.log for it. Hence, I am now repeating these tests separately.

comment:119 follow-up: Changed 7 years ago by nbruin

Valgrind gives suspicious records for two of these tests:

sage -t --valgrind "devel/sage/sage/schemes/toric/fano_variety.py"

==20768== Syscall param open(filename) points to unaddressable byte(s)
==20768==    at 0x31CFEE41F0: __open_nocancel (in /lib64/libc-2.14.90.so)
==20768==    by 0x31CFE79508: _IO_file_fopen@@GLIBC_2.2.5 (in /lib64/libc-2.14.90.so)
==20768==    by 0x31CFE6E265: __fopen_internal (in /lib64/libc-2.14.90.so)
==20768==    by 0x31D6275A06: std::__basic_file<char>::open(char const*, std::_Ios_Openmode, int) (in /usr/lib64/libstdc++.so.6.0.16)
==20768==    by 0x31D627947B: std::basic_filebuf<char, std::char_traits<char> >::open(char const*, std::_Ios_Openmode) (in /usr/lib64/libstdc++.so.6.0.16)
==20768==    by 0x31D627A9D2: std::basic_ofstream<char, std::char_traits<char> >::basic_ofstream(char const*, std::_Ios_Openmode) (in /usr/lib64/libstdc++.so.6.0.16)
==20768==    by 0x2D69E6E2: LinBox::commentator() (commentator.h:857)
==20768==    by 0x2D7BF15E: unsigned long& LinBox::rank<LinBox::BlasMatrix<LinBox::PID_integer>, LinBox::HybridSpecifier>(unsigned long&, LinBox::BlasMatrix<LinBox::PID_integer> const&, LinBox::RingCategories::IntegerTag const&, LinBox::HybridSpecifier const&) (rank.h:629)
==20768==    by 0x2D6A864C: linbox_integer_dense_rank(__mpz_struct (**) [1], unsigned long, unsigned long) (rank.h:110)
==20768==    by 0x2B4B25FC: __pyx_pf_4sage_6matrix_20matrix_integer_dense_20Matrix_integer_dense_100_rank_linbox (matrix_integer_dense.c:25464)
==20768==    by 0x4C59402: PyObject_Call (abstract.c:2529)
==20768==    by 0x2B4BFC5E: __pyx_pw_4sage_6matrix_20matrix_integer_dense_20Matrix_integer_dense_99rank (matrix_integer_dense.c:25357)
==20768==    by 0x4C59402: PyObject_Call (abstract.c:2529)
==20768==    by 0x2BB90582: __pyx_pw_4sage_6matrix_21matrix_rational_dense_21Matrix_rational_dense_83rank (matrix_rational_dense.c:21989)
==20768==    by 0x4C59402: PyObject_Call (abstract.c:2529)
==20768==    by 0x2973737D: __pyx_pw_4sage_6matrix_7matrix2_6Matrix_7solve_right (matrix2.c:4604)
==20768==    by 0x4CFC624: PyEval_EvalFrameEx (ceval.c:4021)
==20768==    by 0x4CFE274: PyEval_EvalCodeEx (ceval.c:3253)
==20768==    by 0x4CFC69F: PyEval_EvalFrameEx (ceval.c:4117)
==20768==    by 0x4CFE274: PyEval_EvalCodeEx (ceval.c:3253)
==20768==    by 0x4CFC69F: PyEval_EvalFrameEx (ceval.c:4117)
==20768==  Address 0x0 is not stack'd, malloc'd or (recently) free'd

I don't know how serious it is to call open with a zero pointer. Probably not at all...

sage -t "devel/sage/sage/schemes/toric/library.py":

==21248== Invalid read of size 8
==21248==    at 0x3432C24F: __pyx_f_4sage_9functions_8prime_pi_7PrimePi_prime_phi_small (prime_pi.c:2190)
==21248==    by 0x3432A937: __pyx_f_4sage_9functions_8prime_pi_7PrimePi_prime_phi_large (prime_pi.c:1624)
==21248==    by 0x3432B349: __pyx_pf_4sage_9functions_8prime_pi_7PrimePi_4__call__ (prime_pi.c:1489)
==21248==    by 0x3432BBBC: __pyx_pw_4sage_9functions_8prime_pi_7PrimePi_5__call__ (prime_pi.c:1202)
==21248==    by 0x4C59402: PyObject_Call (abstract.c:2529)
==21248==    by 0x2ADCA63A: __pyx_pw_4sage_3ext_13multi_modular_22MultiModularBasis_base_5__init__ (multi_modular.c:4498)
==21248==    by 0x4CB5DC7: type_call (typeobject.c:737)
==21248==    by 0x4C59402: PyObject_Call (abstract.c:2529)
==21248==    by 0x2B4CCDF4: __pyx_pw_4sage_6matrix_20matrix_integer_dense_20Matrix_integer_dense_45_multiply_multi_modular (matrix_integer_dense.c:12884)
==21248==    by 0x4C59402: PyObject_Call (abstract.c:2529)
==21248==    by 0x2B4BE7C6: __pyx_f_4sage_6matrix_20matrix_integer_dense_20Matrix_integer_dense__matrix_times_matrix_ (matrix_integer_dense.c:10112)
==21248==    by 0x43DCC37E: __pyx_f_4sage_6matrix_6action_18MatrixMatrixAction__call_ (action.c:3179)
==21248==    by 0x151550FE: __pyx_f_4sage_9structure_6coerce_24CoercionModel_cache_maps_bin_op (coerce.c:6785)
==21248==    by 0x14CE7775: __pyx_pw_4sage_9structure_7element_6Matrix_5__mul__ (element.c:19222)
==21248==    by 0x4C5435E: binary_op1 (abstract.c:945)
==21248==    by 0x4C572A7: PyNumber_Multiply (abstract.c:1216)
==21248==    by 0x4CFA2F0: PyEval_EvalFrameEx (ceval.c:1264)
==21248==    by 0x4CFE274: PyEval_EvalCodeEx (ceval.c:3253)
==21248==    by 0x4CFC69F: PyEval_EvalFrameEx (ceval.c:4117)
==21248==    by 0x4CFD3CA: PyEval_EvalFrameEx (ceval.c:4107)
==21248==    by 0x4CFD3CA: PyEval_EvalFrameEx (ceval.c:4107)
==21248==    by 0x4CFE274: PyEval_EvalCodeEx (ceval.c:3253)
==21248==    by 0x4CFC69F: PyEval_EvalFrameEx (ceval.c:4117)
==21248==    by 0x4CFD3CA: PyEval_EvalFrameEx (ceval.c:4107)
==21248==    by 0x4CFE274: PyEval_EvalCodeEx (ceval.c:3253)
==21248==  Address 0x3f42f390 is 0 bytes after a block of size 336 alloc'd
==21248==    at 0x4A0762F: malloc (vg_replace_malloc.c:270)
==21248==    by 0x3432B173: __pyx_pf_4sage_9functions_8prime_pi_7PrimePi_4__call__ (memory.h:32)
==21248==    by 0x3432BBBC: __pyx_pw_4sage_9functions_8prime_pi_7PrimePi_5__call__ (prime_pi.c:1202)
==21248==    by 0x4C59402: PyObject_Call (abstract.c:2529)
==21248==    by 0x2ADCA63A: __pyx_pw_4sage_3ext_13multi_modular_22MultiModularBasis_base_5__init__ (multi_modular.c:4498)
==21248==    by 0x4CB5DC7: type_call (typeobject.c:737)
==21248==    by 0x4C59402: PyObject_Call (abstract.c:2529)
==21248==    by 0x2B4CCDF4: __pyx_pw_4sage_6matrix_20matrix_integer_dense_20Matrix_integer_dense_45_multiply_multi_modular (matrix_integer_dense.c:12884)
==21248==    by 0x4C59402: PyObject_Call (abstract.c:2529)
==21248==    by 0x2B4BE7C6: __pyx_f_4sage_6matrix_20matrix_integer_dense_20Matrix_integer_dense__matrix_times_matrix_ (matrix_integer_dense.c:10112)
==21248==    by 0x43DCC37E: __pyx_f_4sage_6matrix_6action_18MatrixMatrixAction__call_ (action.c:3179)
==21248==    by 0x151550FE: __pyx_f_4sage_9structure_6coerce_24CoercionModel_cache_maps_bin_op (coerce.c:6785)
==21248==    by 0x14CE7775: __pyx_pw_4sage_9structure_7element_6Matrix_5__mul__ (element.c:19222)
==21248==    by 0x4C5435E: binary_op1 (abstract.c:945)
==21248==    by 0x4C572A7: PyNumber_Multiply (abstract.c:1216)
==21248==    by 0x4CFA2F0: PyEval_EvalFrameEx (ceval.c:1264)
==21248==    by 0x4CFE274: PyEval_EvalCodeEx (ceval.c:3253)
==21248==    by 0x4CFC69F: PyEval_EvalFrameEx (ceval.c:4117)
==21248==    by 0x4CFD3CA: PyEval_EvalFrameEx (ceval.c:4107)
==21248==    by 0x4CFD3CA: PyEval_EvalFrameEx (ceval.c:4107)
==21248==    by 0x4CFE274: PyEval_EvalCodeEx (ceval.c:3253)
==21248==    by 0x4CFC69F: PyEval_EvalFrameEx (ceval.c:4117)
==21248==    by 0x4CFD3CA: PyEval_EvalFrameEx (ceval.c:4107)
==21248==    by 0x4CFE274: PyEval_EvalCodeEx (ceval.c:3253)
==21248==    by 0x4CFC69F: PyEval_EvalFrameEx (ceval.c:4117)

This looks like an off-by-one on an array of small primes. Indeed, in

sage/functions/prime_pi.pyx:196

        self.primes = <long*> sage_malloc(self.primeCount*sizeof(long))

whereas in

sage/functions/prime_pi.pyx:263

        for i in range(4, self.primeCount):
            if prime >= m_max: break
            x = N / prime
            if preTransition:
                if prime*prime <= x:
                    sum -= self.prime_phi_small(x, prime) - i
                else:
                    sum -= self.prime_phi_small(x) - i
                    preTransition = False
            else:
                sum -= self.prime_phi_small(x) - i
            prime = self.primes[i + 1]
        return sum

The last line indeed gets prime from an unallocated address. However, that value doesn't get used. So if this instruction succeeds I don't think it'll do harm, but in extremely unlucky situations (when self.primes is just at the edge of a page) this could lead to a segfault. The same thing happens at other places in this file too.

The sanitary thing is to allocate one "long" more. Then we'll only read memory that's guaranteed to exist.

Last edited 7 years ago by nbruin (previous) (diff)

comment:120 Changed 7 years ago by nbruin

  • Milestone changed from sage-5.6 to sage-5.5
  • Status changed from needs_review to positive_review

doctests check out as far as I can test. Business rules for spkg have been followed, so I think this can get a positive review.

The new spkg doesn't actually contain code written by me, so I think I'm eligible to give a positive review.

comment:121 Changed 7 years ago by jdemeyer

  • Milestone changed from sage-5.5 to sage-5.6
  • Reviewers set to Nils Bruin

comment:122 in reply to: ↑ 119 ; follow-up: Changed 7 years ago by SimonKing

Replying to nbruin:

Valgrind gives suspicious records for two of these tests:

...

sage -t "devel/sage/sage/schemes/toric/library.py":

...

This looks like an off-by-one on an array of small primes. Indeed, in

sage/functions/prime_pi.pyx:196

        self.primes = <long*> sage_malloc(self.primeCount*sizeof(long))

whereas in

sage/functions/prime_pi.pyx:263

        for i in range(4, self.primeCount):
            if prime >= m_max: break
            x = N / prime
            if preTransition:
                if prime*prime <= x:
                    sum -= self.prime_phi_small(x, prime) - i
                else:
                    sum -= self.prime_phi_small(x) - i
                    preTransition = False
            else:
                sum -= self.prime_phi_small(x) - i
            prime = self.primes[i + 1]
        return sum

The last line indeed gets prime from an unallocated address. However, that value doesn't get used. So if this instruction succeeds I don't think it'll do harm, but in extremely unlucky situations (when self.primes is just at the edge of a page) this could lead to a segfault. The same thing happens at other places in this file too.

The sanitary thing is to allocate one "long" more. Then we'll only read memory that's guaranteed to exist.

OK, but that wouldn't be an upstream bug, right?

Shall it then be dealt with on a different ticket, or shall we provide a patch for the Sage library, in addition to the new Singular spkg?

comment:123 in reply to: ↑ 122 Changed 7 years ago by nbruin

Replying to SimonKing:

OK, but that wouldn't be an upstream bug, right? Shall it then be dealt with on a different ticket, or shall we provide a patch for the Sage library, in addition to the new Singular spkg?

Absolutely a separate ticket! Those issues have nothing to do with singular. They are just diagnosed by running doctests through Valgrind.

comment:124 Changed 7 years ago by jdemeyer

  • Merged in set to sage-5.6.beta0
  • Resolution set to fixed
  • Status changed from positive_review to closed

comment:125 Changed 7 years ago by jdemeyer

  • Description modified (diff)

comment:126 follow-up: Changed 7 years ago by jdemeyer

Where did the Makefile for xalloc come from? It's written in a non-autotools way and should be fixed. It's probably easy to fix, but it should be reported "upstream", whoever that is.

comment:127 Changed 7 years ago by jdemeyer

  • Merged in sage-5.6.beta0 deleted
  • Resolution fixed deleted
  • Status changed from closed to new

comment:128 in reply to: ↑ 126 ; follow-up: Changed 7 years ago by SimonKing

Replying to jdemeyer:

Where did the Makefile for xalloc come from? It's written in a non-autotools way and should be fixed. It's probably easy to fix, but it should be reported "upstream", whoever that is.

It originally came from upstream (i.e., Hans Schönemann), but the "install" bits are my modifications. I have no idea how to use autotools to do it correctly.

comment:129 in reply to: ↑ 128 Changed 7 years ago by SimonKing

Replying to SimonKing:

Replying to jdemeyer:

Where did the Makefile for xalloc come from? It's written in a non-autotools way and should be fixed. It's probably easy to fix, but it should be reported "upstream", whoever that is.

It originally came from upstream (i.e., Hans Schönemann), but the "install" bits are my modifications. I have no idea how to use autotools to do it correctly.

PS: I don't think that it would be needed to inform upstream, because the optional xalloc support is not and (to my knowledge) will not be part of an official Singular release.

comment:130 Changed 7 years ago by jdemeyer

I'll fix the Makefile, hang on...

comment:131 Changed 7 years ago by jdemeyer

  • Description modified (diff)
  • Status changed from new to needs_review

Changed 7 years ago by jdemeyer

Patch added to the spkg

comment:132 Changed 7 years ago by jdemeyer

I changed the spkg, the changes are here: xalloc-makefile.diff. Needs review.

comment:133 Changed 7 years ago by vbraun

I'm trying to run Sage under the electric fence memory debugger and also ran into the prime_phi issue. It seems you managed to discuss it in detail but then failed to open an actual ticket for it. This is now #13861

comment:134 Changed 7 years ago by vbraun

  • Status changed from needs_review to positive_review

Jeroen's changes look good to me.

comment:135 Changed 7 years ago by jdemeyer

  • Merged in set to sage-5.6.beta0
  • Resolution set to fixed
  • Status changed from positive_review to closed

comment:136 Changed 7 years ago by SimonKing

Sorry for asking so late (after resolving the ticket as fixed). But perhaps it has been a bad idea to use a new environment variable SINGULAR_XALLOC to trigger the use of xalloc. What do you think: Would it be a better idea to build singular with xalloc if SAGE_DEBUG is set to "yes", as in #13864?

comment:137 Changed 7 years ago by vbraun

Switching SINGULAR_XALLOC to SAGE_DEBUG will be done in #13876.

Note: See TracTickets for help on using tickets.