Opened 7 years ago

Last modified 7 years ago

#13731 closed defect

Fix libsingular memory management — at Version 89

Reported by: nbruin Owned by: rlm
Priority: major Milestone: sage-5.6
Component: memleak Keywords:
Cc: SimonKing, fbissey, malb Merged in:
Authors: Nils Bruin, Simon King Reviewers:
Report Upstream: Fixed upstream, in a later stable release. Work issues:
Branch: Commit:
Dependencies: Stopgaps:

Description (last modified by SimonKing)

We have several indications that interaction with libsingular's memory management is tricky. Debugging is hard, because libsingular uses omalloc for its memory management and that hides all allocation/deallocation from conventional memory checking tools. Singular's allocation library is relatively pluggable, though, and (at a speed penalty) one can mostly switch it to using the system malloc for everything.

For that purpose, one can use one of the following:

With the linux spkg, one obtains:

$ export MALLOC_CHECK_ 3
$ sage
...
sage: A.<x,y> = FreeAlgebra(QQ, 2)
sage: P.<x,y> = A.g_algebra({y*x:-x*y})
sage: x*y
*** glibc detected *** /usr/local/sage/5.5b1/local/bin/python: free(): 

gdb backtrace:

#0  0x00000031cfe36285 in raise () from /lib64/libc.so.6
#1  0x00000031cfe37b9b in abort () from /lib64/libc.so.6
#2  0x00000031cfe7774e in __libc_message () from /lib64/libc.so.6
#3  0x00000031cfe7da76 in malloc_printerr () from /lib64/libc.so.6
#4  0x00007fffd82d5c34 in gnc_mm_Mult_nn (F0=<optimized out>,
G0=<optimized out>, r=0x2faa720) at gring.cc:507
#5  0x00007fffd82d6abc in gnc_p_Mult_mm_Common (p=0x326e8c0,
m=<optimized out>, side=0, r=0x2faa720) at gring.cc:424
#6  0x00007fffd76c7860 in nc_mm_Mult_pp (p=0x342aaf0, r=0x2faa720,
m=0x2fb6530) at /usr/local/sage/5.5b1/local/include/singular/gring.h:
171
#7  pp_Mult_qq (r=0x2faa720, q=0x342aaf0, p=0x2fb6530) at /usr/local/
sage/5.5b1/local/include/singular/pInline2.h:667
#8  __pyx_f_4sage_4libs_8singular_10polynomial_singular_polynomial_mul
(__pyx_v_ret=0x7fffffffb3c8, __pyx_v_p=0x2fb6530, __pyx_v_q=0x342aaf0,
__pyx_v_r=0x2faa720)
    at sage/libs/singular/polynomial.cpp:3895
#9  0x00007fffd7b096f8 in
__pyx_f_4sage_5rings_10polynomial_6plural_19NCPolynomial_plural__mul_
(__pyx_v_left=0x7fffbe6bd758, __pyx_v_right=0x7fffbe6bd878,
__pyx_skip_dispatch=<optimized out>)
    at sage/rings/polynomial/plural.cpp:12060

running it in valgrind gets you complaints about the same locations. Meanwhile two of the underlying problems where found and fixed upstream. http://sage.math.washington.edu/home/SimonKing/SAGE/spkgs/singular-3-1-5.p2.spkg contains backports of these fixes.

Change History (92)

comment:1 Changed 7 years ago by nbruin

See also #13447 for previous uses of libsingular-malloc, as well as this sage-devel thread

comment:2 follow-up: Changed 7 years ago by nbruin

We can also use ElectricFence? as a malloc substitute:

$ export LD_PRELOAD=libefence.so
$ export EF_ALLOW_MALLOC_0=1     #apparently sage/python does a malloc(0) somewhere
$ echo 512000 | sudo tee /proc/sys/vm/max_map_count #efence uses mprotect A LOT
$ ./sage -t --gdb devel/sage/sage/libs/singular/function.pyx 

This gives an error in a different location. Traceback:

#0  0x00007fffce9fb3f2 in List<CanonicalForm>::isEmpty(this=0x7fffa515b000) at ./templates/ftmpl_list.cc:256
#1  0x00007fffce96a350 in multiFactorize (F=..., v=...) at facFactorize.cc:710
#2  0x00007fffce905e9d in ratSqrfFactorize (v=..., G=...) at facFactorize.h:45
#3  ratFactorize (G=..., v=..., substCheck=<optimized out>) at facFactorize.h:125
#4  0x00007fffce90303c in factorize (f=..., issqrfree=false) at cf_factor.cc:651
#5  0x00007fffce7a5b44 in singclap_factorize (f=0x7fff9e16dfd0, v=0x7fffffffb548, with_exps=0) at clapsing.cc:835
#6  0x00007fffce720d50 in jjFAC_P (res=0x7fff9e16ffc0, u=0x7fff9ea7afc0) at iparith.cc:3995
#7  0x00007fffce729c42 in iiExprArith1 (res=0x7fff9e16ffc0, a=0x7fff9ea7afc0, op=424) at iparith.cc:7869
#8  0x00007fffce12df55 in __pyx_f_4sage_4libs_8singular_8function_17KernelCallHandler_handle_call(__pyx_v_self=0x7fff9e967b48, __pyx_v_argument_list=0x7fff9e975b40, __pyx_optional_args=<optimized out>) at sage/lib/singular/function.cpp:11286 

All these error point in the same direction, though: As soon as we start interacting with the singular interpreter we get exposed to whatever memory management it does and we're in big trouble. Use of libsingular on more basic levels usually doesn't expose us to libsingular memory management, so no problems arise.

comment:3 Changed 7 years ago by SimonKing

  • Cc SimonKing added

comment:4 in reply to: ↑ description ; follow-up: Changed 7 years ago by pcpa

$ export MALLOC_CHECK_ 3
$ sage ... sage: A.<x,y> = FreeAlgebra (QQ, 2)
sage: P.<x,y> = A.g_algebra({y*x:-x*y})
sage: x*y ***
glibc detected *** /usr/local/sage/5.5b1/local/bin/python: free():

I do not see this in my current sagemath-5.4 rpm package, but I suspect it may have been either corrected or worked around by one of the patches at http://pkgs.fedoraproject.org/cgit/Singular.git/tree most likely Singular-undefined.patch, but the padding added by Singular-M2_memutil_debuggging.patch may be actually hiding some problem. Or, it may be something different, the singular rpm was known to not build properly or generate proper binaries if an older singular was installed but I did not care much about that because official rpms are built in a clean chroot.

comment:5 in reply to: ↑ 4 ; follow-up: Changed 7 years ago by nbruin

Replying to pcpa:

I do not see this in my current sagemath-5.4 rpm package

Thanks for checking!

Are you sure your singular gets built to use the glibc malloc for all small allocations and not omalloc on "malloc"-ed pages? You may try setting EXTRA=0 to remove the padding. I don't expect that padding comes into play, though. I suspect a double free or a free on a pointer that doesn't point to a malloc-ed block to begin with.

comment:6 in reply to: ↑ 5 ; follow-up: Changed 7 years ago by pcpa

Replying to nbruin:

Replying to pcpa:

I do not see this in my current sagemath-5.4 rpm package

Thanks for checking!

Are you sure your singular gets built to use the glibc malloc for all small allocations and not omalloc on "malloc"-ed pages? You may try setting EXTRA=0 to remove the padding. I don't expect that padding comes into play, though. I suspect a double free or a free on a pointer that doesn't point to a malloc-ed block to begin with.

Sorry, I completely misunderstood what was mean't by the spkg and the '--with-emulate-omalloc' option. I will try to debug more later, but I cannot build the Singular rpm with that option as it always fails in the kernel subdirectory due to missing omEmulate* declarations.

comment:7 in reply to: ↑ 6 Changed 7 years ago by nbruin

Sorry, I completely misunderstood what was mean't by the spkg and the '--with-emulate-omalloc' option. I will try to debug more later, but I cannot build the Singular rpm with that option as it always fails in the kernel subdirectory due to missing omEmulate* declarations.

Yes, my edits in the linked-to singular spkgs add declarations to make those errors go away. I didn't exactly respect the rest of build options in the process, though (because I didn't understand them). Cleaning that up to a well-defined patch to the singular package would be a great help.

comment:8 Changed 7 years ago by SimonKing

I am trying to build Sage-5.5.rc0 with the malloc singular spkg for linux. Platform: openSuse 12.1, on Intel Core2 Duo.

What I tried: Untar the sage tarball, replace the usual Singular spkg with yours, and type make. Singular fails to build, because of

g++ -O2 -I. -I.. -I/home/simon/SAGE/debug/sage-5.5.rc0/local -I/home/simon/SAGE/debug/sage-5.5.rc0/local/include -I/home/simon/SAGE/debug/sage-5.5.rc0/local/include -I/home/simon/SAGE/debug/sage-5.5.rc0/local/include  -I/usr/local/include  -DNDEBUG -DOM_NDEBUG -Dx86_64_Linux -DHAVE_CONFIG_H -DESINGULAR -DPROTO -o ESingular emacs.cc ../kernel/fegetopt.o \
-rdynamic -L/home/simon/SAGE/debug/sage-5.5.rc0/local/kernel -L../kernel -lkernel -L/home/simon/SAGE/debug/sage-5.5.rc0/local/lib -L/home/simon/SAGE/debug/sage-5.5.rc0/local/lib  -L/usr/local/lib  -ldl -lm -lsingfac -lsingcf -lntl -lgmp -lreadline -lncurses -lnsl -lm  -lnsl -lbsd  -lomalloc -lpthread  ../kernel/mmalloc.o
/usr/lib64/gcc/x86_64-suse-linux/4.6/../../../../x86_64-suse-linux/bin/ld: cannot find -lbsd

Could it be that it is a problem with SAGE_ROOT/local/lib versus SAGE_ROOT/local/lib64? That used sometimes to be a problem with openSuse.

comment:9 Changed 7 years ago by SimonKing

No, libdir is used in spkg-install. But I wonder: The original Singular spkg contains three Makefiles: in dyn_modules/python, in calldummy/ and in omalloc/Misc/dlmalloc. But the malloc Singular package from here contains 14 Makefiles. Could it be that some of them should better be created by ./configure?

comment:10 Changed 7 years ago by SimonKing

The spkg also contains several files created by configure scripts. Is that really intended?

comment:11 Changed 7 years ago by SimonKing

When I removed all autogenerated files from your spkg, installing singular-3-1-5.malloc-linux succeeded!

I guess you are aware that one is supposed to leave the Singular source files untouched when creating an spkg - after all, Singular-malloc is for testing only.

comment:12 Changed 7 years ago by SimonKing

Building sage-5.5.rc0 did almost succeed.

...
gcc -fno-strict-aliasing -fwrapv -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -fPIC -I/home/simon/SAGE/debug/sage-5.5.rc0/local/include/singular/ -I/home/simon/SAGE/debug/sage-5.5.rc0/local/include -I/home/simon/SAGE/debug/sage-5.5.rc0/local/include/csage -I/home/simon/SAGE/debug/sage-5.5.rc0/devel/sage/sage/ext -I/home/simon/SAGE/debug/sage-5.5.rc0/local/include/python2.7 -c sage/algebras/letterplace/free_algebra_element_letterplace.cpp -o build/temp.linux-x86_64-2.7/sage/algebras/letterplace/free_algebra_element_letterplace.o -w
cc1plus: warning: command line option ‘-Wstrict-prototypes’ is valid for Ada/C/ObjC but not for C++ [enabled by default]
cc1plus: warning: command line option ‘-Wstrict-prototypes’ is valid for Ada/C/ObjC but not for C++ [enabled by default]
In file included from /home/simon/SAGE/debug/sage-5.5.rc0/local/include/singular/structs.h:13:0,
                 from /home/simon/SAGE/debug/sage-5.5.rc0/local/include/libsingular.h:6,
                 from sage/algebras/letterplace/free_algebra_element_letterplace.cpp:283:
/home/simon/SAGE/debug/sage-5.5.rc0/local/include/omalloc.h:784:30: fatal error: omalloc/omMalloc.h: Datei oder Verzeichnis nicht gefundenIn file included from /home/simon/SAGE/debug/sage-5.5.rc0/local/include/singular/structs.h:13:0,
                 from /home/simon/SAGE/debug/sage-5.5.rc0/local/include/libsingular.h:6,
                 from sage/algebras/letterplace/free_algebra_letterplace.cpp:283:
/home/simon/SAGE/debug/sage-5.5.rc0/local/include/omalloc.h:784:30: fatal error: omalloc/omMalloc.h: Datei oder Verzeichnis nicht gefunden
compilation terminated.

So, that looks like coming from Plural.

On the other hand, your example in the ticket description does use plural.

What went wrong here? I did erase src/omalloc/omMalloc.h from your spkg, since it is not part of the "original" Singular spkg (hence, I thought it was autogenerated).

comment:13 follow-up: Changed 7 years ago by SimonKing

PS: Here is a part of the install log for singular-malloc:

updating cache .././config.cache
creating ./config.status
creating Makefile
creating omConfig.h
creating omlimits.h
creating omMalloc.h
configuring in factory
running /bin/sh ./configure  --prefix=/home/simon/SAGE/debug/sage-5.5.rc0/local --exec-prefix=/home/simon/SAGE/debug/sage-5.5.rc0/local --bindir=/home/simon/SAGE/debug/sage-5.5.rc0/local/bin --libdir=/home/simon/SAGE/debug/sage-5.5.rc0/local/lib --with-apint=gmp --with-malloc=system --with-emulate-omalloc --with-NTL --without-MP --without-lex --without-Boost --without-flint --enable-Singular --enable-factory --enable-libfac --enable-IntegerProgramming --disable-doc --with-debug --includedir=/home/simon/SAGE/debug/sage-5.5.rc0/local/include --enable-omalloc --with-external-config_h='/home/simon/SAGE/debug/sage-5.5.rc0/spkg/build/singular-3-1-5.malloc-linux/src/Singular/omSingularConfig.h' --with-track-custom --enable-Plural --with-factory --with-libfac --with-Singular=yes --cache-file=.././config.cache --srcdir=.

So, the missing header actually is created.

Also, I don't understand why omMalloc.h is needed: In a previous Sage installation with the standard Singular 3-1-5 package, I can't find omMalloc.h in SAGE_ROOT/local/....

comment:14 in reply to: ↑ 13 Changed 7 years ago by nbruin

Replying to SimonKing:

So, the missing header is actually is created.

Yes, indeed, and that is the one I copy over manually because I didn't figure out how to do it in the install script (that should be straightforward though)

Also, I don't understand why omMalloc.h is needed: In a previous Sage installation with the standard Singular 3-1-5 package, I can't find omMalloc.h in SAGE_ROOT/local/....

It might not be. The reason why it is needed now is because of the rough way I mutilated the omalloc source in Singular. Its configuration process is a big tangle of pasting together header files, so when I finally found a way to effect the desired changes I was already happy. That involved knocking out some #ifdef ... #endif preprocessor directives in the source. I'm sure it's straightforward to fix that a little more nicely (I wasn't aiming for production code at the time, just a quick tool to debug stuff). That activated a #include <omalloc/omAlloc.h> as well. I don't know if errors happen if you delete that line. Anyway, the libsingular*.pyx files import the header file in which that include ends up, so that's why presently, that file needs to be in the include paths used by sage -b.

I wasn't exaggerating when I called the package rather untidy ...

Last edited 7 years ago by nbruin (previous) (diff)

comment:15 Changed 7 years ago by fbissey

  • Cc fbissey added

comment:16 Changed 7 years ago by SimonKing

I managed to build Sage with Singular-malloc. What I did to make it work:

  • open the spkg
  • remove all autogenerated files
  • change src/omalloc/Makefile.in so that the freshly built omMalloc.h is installed in $(includedir)/omalloc (creating that directory first)

Apparently the singular executable is not usable (when I try to start it, it simply seems to hang).

I can reproduce the crash from the ticket description:

simon@linux-sqwp:~/SAGE/debug/sage-5.5.rc0> export MALLOC_CHECK_=3
simon@linux-sqwp:~/SAGE/debug/sage-5.5.rc0> ./sage
----------------------------------------------------------------------
| Sage Version 5.5.rc0, Release Date: 2012-11-17                     |
| Type "notebook()" for the browser-based notebook interface.        |
| Type "help()" for help.                                            |
----------------------------------------------------------------------
**********************************************************************
*                                                                    *
* Warning: this is a prerelease version, and it may be unstable.     *
*                                                                    *
**********************************************************************
sage: A.<x,y> = FreeAlgebra(QQ, 2)
sage: P.<x,y> = A.g_algebra({y*x:-x*y})
sage: x*y
*** glibc detected *** python: free(): invalid pointer: 0x0000000002806c90 ***
======= Backtrace: =========
/lib64/libc.so.6(+0x766d6)[0x7fa9ed9cc6d6]
/home/simon/SAGE/debug/sage-5.5.rc0/local/lib/libsingular.so(_Z14gnc_mm_Mult_nnPiS_P9sip_sring+0x294)[0x7fa9d35c7c54]
/home/simon/SAGE/debug/sage-5.5.rc0/local/lib/libsingular.so(_Z20gnc_p_Mult_mm_CommonP8spolyrecS0_iP9sip_sring+0x1ec)[0x7fa9d35c8acc]
/home/simon/SAGE/debug/sage-5.5.rc0/local/lib/python2.7/site-packages/sage/libs/singular/polynomial.so(+0x4bd0)[0x7fa9d29a9bd0]
/home/simon/SAGE/debug/sage-5.5.rc0/local/lib/python2.7/site-packages/sage/rings/polynomial/plural.so(+0x1fbd8)[0x7fa9d2deebd8]
/home/simon/SAGE/debug/sage-5.5.rc0/local/lib/python2.7/site-packages/sage/structure/element.so(+0x2a399)[0x7fa9e4bed399]
...

So, that's a good platform to try to hunt the Heisenbugs from #12215, #715 and friends once and for all.

comment:17 follow-up: Changed 7 years ago by SimonKing

How do I make it create a core dump, and how do I open it with gdb? I recall something with "ulimit -c", but I can't recall the details...

sage -gdb does not work, by the way:

simon@linux-sqwp:~/SAGE/debug/sage-5.5.rc0> ./sage -gdb
----------------------------------------------------------------------
| Sage Version 5.5.rc0, Release Date: 2012-11-17                     |
| Type "notebook()" for the browser-based notebook interface.        |
| Type "help()" for help.                                            |
----------------------------------------------------------------------
**********************************************************************
*                                                                    *
* Warning: this is a prerelease version, and it may be unstable.     *
*                                                                    *
**********************************************************************
/home/simon/SAGE/debug/sage-5.5.rc0/local/bin/sage-ipython
GNU gdb (GDB) SUSE (7.3-41.1.2)
Copyright (C) 2011 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-suse-linux".
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>...
Reading symbols from /home/simon/SAGE/debug/sage-5.5.rc0/local/bin/python...done.
[Thread debugging using libthread_db enabled]
Python 2.7.3 (default, Nov 21 2012, 11:21:15) 
[GCC 4.6.2] on linux2
Type "help", "copyright", "credits" or "license" for more information.
---------------------------------------------------------------------------
ImportError                               Traceback (most recent call last)

/home/simon/SAGE/debug/sage-5.5.rc0/local/lib/python2.7/site-packages/IPython/ipmaker.pyc in force_import(modname, force_reload)
     61         reload(sys.modules[modname])
     62     else:
---> 63         __import__(modname)
     64         
     65 

ImportError: No module named ipy_profile_sage
Error importing ipy_profile_sage - perhaps you should run %upgrade?
WARNING: Loading of ipy_profile_sage failed.

sage: A.<x,y> = FreeAlgebra(QQ, 2)
------------------------------------------------------------
   File "<ipython console>", line 1
     A.<x,y> = FreeAlgebra(QQ, 2)
       ^
SyntaxError: invalid syntax

comment:18 in reply to: ↑ 17 ; follow-up: Changed 7 years ago by nbruin

Replying to SimonKing:

How do I make it create a core dump, and how do I open it with gdb? I recall something with "ulimit -c", but I can't recall the details...

I'd guess ulimit -c unlimited.

sage -gdb does not work, by the way:

Yes, I had the same problem (and it's not singular-malloc's fault). It seems to be a bad interaction between ipython and gdb. sage -t -gdb does work, so I've just worked around it by wrapping it in a little test file: tester.py:

def t():
    r"""
    sage: A.<x,y> = FreeAlgebra(QQ, 2)
    sage: P.<x,y> = A.g_algebra({y*x:-x*y})
    sage: x*y
    """
    pass

Congratulations on getting my franken-spkg to build ...

comment:19 in reply to: ↑ 18 Changed 7 years ago by SimonKing

Replying to nbruin:

Replying to SimonKing:

How do I make it create a core dump, and how do I open it with gdb? I recall something with "ulimit -c", but I can't recall the details...

I'd guess ulimit -c unlimited.

Thanks!

And then, it is gdb -c <name_of_core_dump>, right?

comment:20 Changed 7 years ago by SimonKing

By the way, what is the ultimate aim of this ticket? Replacing Singular's memory management by malloc makes Singular very slow, if I understand correctly. So, the aim is to provide a tool to investigate flaws in Sage-Singular interrelation, right?

So, could one say that the aim is to provide an optional package?

comment:21 Changed 7 years ago by SimonKing

First observation: The crash has nothing to do with coercion.

sage: A.<x,y> = FreeAlgebra(QQ, 2)
sage: P.<x,y> = A.g_algebra({y*x:-x*y})
sage: x._mul_(y)
*** glibc detected *** python: free(): invalid pointer: 0x0000000002dd86c0 ***

That's a good news, because the absence of coercion makes life a lot easier...

comment:22 Changed 7 years ago by SimonKing

Now the bad news: By inserting print statements into the _mul_ method, I found that the crash occurs in the line

        singular_polynomial_mul(&_p, left._poly,
                                 (<NCPolynomial_plural>right)._poly,
                                 (<NCPolynomialRing_plural>left._parent)._ring)

I was somehow hoping that it would happen in the new_NCP function.

comment:23 Changed 7 years ago by SimonKing

News become worse: The crash happens in pp_Mult_qq, so that I start to believe it might be an upstream problem.

comment:24 Changed 7 years ago by SimonKing

The pp_Mult_qq function has three arguments: Two pointers p,q to polynomials, and a pointer r to a ring.

When I tested, the hex address of the polynomials resp. the ring being pointed at were

p = 0x2dd3e10
q = 0x2dd38a0
r = 0x2dfa7e0

But I can't relate these figures to the address reported by the backtrace:

*** glibc detected *** python: free(): invalid pointer: 0x0000000002bb4ee0 ***

Do you have an idea how this can be interpreted?

comment:25 Changed 7 years ago by SimonKing

Looking into the upstream sources and trying to find what the gdb backtrace is telling, I see in line 667 of pInline2.h:

PINLINE2 poly pp_Mult_qq(poly p, poly q, const ring r)
{
  poly last;
  if (p == NULL || q == NULL) return NULL;

  if (pNext(p) == NULL)
  {
#ifdef HAVE_PLURAL
    if (rIsPluralRing(r))
      return nc_mm_Mult_pp(p, q, r);
#endif
    return r->p_Procs->pp_Mult_mm(q, p, r, last);
  }

and line 667 is return nc_mm_Mult_pp(p, q, r). That function is defined in gring.h (why in a header file, by the way?). I find no indication of "free" being used there, except that it calls r->GetNC()->p_Procs.mm_Mult_pp(...), and I have no idea where that method is being defined.

So, I'll ask upstream.

comment:26 Changed 7 years ago by SimonKing

I should have read your backtrace more carefully. It mentions gnc_mm_Mult_nn, and that's what I am analysing now. A shame that one needs to recompile so many things for testing it...

comment:27 follow-up: Changed 7 years ago by SimonKing

OK, the error occurs in the function poly gnc_mm_Mult_nn(int *F0, int *G0, const ring r), whose definition in kernel/gring.cc starts at line 463.

I inserted some print statements, and found the crash occurs in this chunk of code:

  if (iF<=jG)
    /* i.e. no mixed exp_num , MERGE case */
  {
    p_MemAdd_LengthGeneral(F, G, ExpSize/sizeof(long));
    p_SetExpV(out,F,r);
    p_Setm(out,r);
    //    omFreeSize((ADDRESS)F,ExpSize);
    printf("second freeT F,rN\n");
    freeT(F,rN);
    printf("second freeT G,rN\n");
    freeT(G,rN);
    printf("second return out\n");
    return(out);
  }

The last line printed is second freeT F,rN. Hence, F seems to be corrupted. Here is ho F is created:

  int *F=(int *)omAlloc0((rN+1)*sizeof(int));
  int *G=(int *)omAlloc0((rN+1)*sizeof(int));

  memcpy(F, F0,(rN+1)*sizeof(int));
  // pExpVectorCopy(F,F0);
  memcpy(G, G0,(rN+1)*sizeof(int));
  //  pExpVectorCopy(G,G0);
  F[0]=0; /* important for p_MemAdd */
  G[0]=0;

Is it a problem that omAlloc0 is explicitly called? Or is it just a macro, and actually defaults to malloc with your spkg?

comment:28 in reply to: ↑ 27 ; follow-up: Changed 7 years ago by nbruin

Replying to SimonKing:

The last line printed is second freeT F,rN. Hence, F seems to be corrupted. Here is ho F is created:

  int *F=(int *)omAlloc0((rN+1)*sizeof(int));
  int *G=(int *)omAlloc0((rN+1)*sizeof(int));

  memcpy(F, F0,(rN+1)*sizeof(int));
  // pExpVectorCopy(F,F0);
  memcpy(G, G0,(rN+1)*sizeof(int));
  //  pExpVectorCopy(G,G0);
  F[0]=0; /* important for p_MemAdd */
  G[0]=0;

Is it a problem that omAlloc0 is explicitly called? Or is it just a macro, and actually defaults to malloc with your spkg?

That all should be safe. (omalloc already had an omEmulateAlloc config option, but it was missing declarations and therefore Singular failed to compile. I added those. Part of it was an omMemDup routine that takes only a pointer, so it needs to know how large the block pointed to actually is. I needed the linux-specific malloc_usable_size for this. It looks into the 8 bytes preceding the block, which makes valgrind complain)

The actual declarations are for the most part near omalloc.h:777, which comes from omAllocDecl.h:280.

#define omAlloc0(size)                          omEmulateAlloc0(size)

which is only defined in omAllocEmulate.c:

void* omEmulateAlloc0(size_t size)
{
  void* addr = OM_MALLOC_MALLOC(size);
  memset(addr, 0, size);
  return addr;
}

where 'OM_MALLOC_MALLOC' is 'malloc' as far as I can tell (the number of indirections involved in omalloc probably is very useful but it drove me to despair). So that looks perfectly safe to me. Furthermore, I didn't touch that file.

I suppose the corruption happens somewhere lower down. Incidentally, the valgrind report points to exactly those two snippets. So valgrind reporting for these things is really very good.

comment:29 in reply to: ↑ 28 ; follow-up: Changed 7 years ago by SimonKing

Replying to nbruin:

That all should be safe.

... which is bad, because I need someone that I can blame }:->

I suppose the corruption happens somewhere lower down. Incidentally, the valgrind report points to exactly those two snippets. So valgrind reporting for these things is really very good.

It is not clear to me what you mean by "those two snippets": The definition of omEmulateAlloc0, or the snippets that I provided in my previous post?

comment:30 in reply to: ↑ 29 ; follow-ups: Changed 7 years ago by nbruin

Replying to SimonKing:

It is not clear to me what you mean by "those two snippets":

This one:

the snippets that I provided in my previous post?

But don't take my word for it. The report is in the sage-devel thread:

==20553== Invalid read of size 8
==20553==    at 0x2694CB7C: gnc_mm_Mult_nn(int*, int*, sip_sring*) (gring.cc:503)
==20553==    by 0x2694DABB: gnc_p_Mult_mm_Common(spolyrec*, spolyrec*, int, sip_sring*) (gring.cc:424)
==20553==    by 0x269C7726: p_Power(spolyrec*, int, sip_sring*) (gring.h:181)
...
==20553==  Address 0x40c39c48 is 8 bytes inside a block of size 12 alloc'd
==20553==    at 0x4A0762F: malloc (vg_replace_malloc.c:270)
==20553==    by 0x26BADA45: omEmulateAlloc0 (omAllocEmulate.c:15)
==20553==    by 0x2694C9EE: gnc_mm_Mult_nn(int*, int*, sip_sring*) (gring.cc:472)
==20553==    by 0x2694DABB: gnc_p_Mult_mm_Common(spolyrec*, spolyrec*, int, sip_sring*) (gring.cc:424)
==20553==    by 0x269C7726: p_Power(spolyrec*, int, sip_sring*) (gring.h:181)

Wait a minute ... valgrind complains EARLIER! Line 503 rather than 507. So valgrind already seems unhappy with

    p_MemAdd_LengthGeneral(F, G, ExpSize/sizeof(long));

It's not immediately clear to me that ExpSize/sizeof(long) really is (at most) rN+1 (see p_MemAdd.h for the definition). The singular source is now my most-grepped directory.

comment:31 in reply to: ↑ 30 Changed 7 years ago by nbruin

It's not immediately clear to me that ExpSize/sizeof(long) really is (at most) rN+1 (see p_MemAdd.h for the definition).

In fact, it certainly is not:

int ExpSize=(((rN+1)*sizeof(int)+sizeof(long)-1)/sizeof(long))*sizeof(long);

so ExpSize gets rounded up. If sizeof(int)=4 and sizeof(long)=8 then for rN=2 we get (rN+1)*sizeof(int)=12 and ExpSize=16 (which, by the valgrind report, is probably exactly the case we're in). So p_MemAdd_LengthGeneral is definitely writing out of bounds. I think this code should simply read:

  int *F=(int *)omAlloc0(ExpSize);
  int *G=(int *)omAlloc0(ExpSize);

  memcpy(F, F0,(rN+1)*sizeof(int));
  // pExpVectorCopy(F,F0);
  memcpy(G, G0,(rN+1)*sizeof(int));

EDIT: Oops, don't change the memcpy, since the source may not have such a large allocation. The target is 0-initialized, so the extra space should not be a problem.

God knows why they add an array of int as an array of long in

p_MemAdd_LengthGeneral(F, G, ExpSize/sizeof(long));

Are they sure there's no carry (I guess, if these are non-negative signed ints the sign bit would buffer any carry).

Is it viable to simply ditch Singular :-?

By the way, since most architectures with sizeof(long)=8 would be 8-byte aligned anyway, I doubt that this error would ever lead to real trouble. But the sloppiness surely is worrying.

Last edited 7 years ago by nbruin (previous) (diff)

comment:32 follow-ups: Changed 7 years ago by nbruin

OK, making the changes outlined above do make

 ./sage -t devel/sage/sage/libs/singular/function.pyx

pass with MALLOC_CHECK_=3. So: yes, singular-malloc found a real issue in the singular code (please report upstream!).

Bugs like this *could* trip up singular on architectures that have default 4-byte alignment and sizeof(long) = 8. From omTables.c we see that bins that have a length that are not a multiple of 8 are used in that case:

#ifdef OM_ALIGN_8

size_t om_BinSize [SIZEOF_OM_BIN_PAGE / MIN_BIN_BLOCKS] =
{   8,  16,  24,  32,
    40,  48,  56,  64, 72,
    80,  96, 112, 128, 144,
    160, 192, 224,
    ((SIZEOF_OM_BIN_PAGE / (MIN_BIN_BLOCKS + INCR_FACTOR*9)) / 8)*8,
    ((SIZEOF_OM_BIN_PAGE / (MIN_BIN_BLOCKS + INCR_FACTOR*6)) / 8)*8,
    ((SIZEOF_OM_BIN_PAGE / (MIN_BIN_BLOCKS + INCR_FACTOR*4)) / 8)*8,
    ((SIZEOF_OM_BIN_PAGE / (MIN_BIN_BLOCKS + INCR_FACTOR*2)) / 8)*8,
    ((SIZEOF_OM_BIN_PAGE / (MIN_BIN_BLOCKS + INCR_FACTOR))   / 8)*8,
    OM_MAX_BLOCK_SIZE};

#else /* ! OM_ALIGN_8 */

size_t om_BinSize [SIZEOF_OM_BIN_PAGE / MIN_BIN_BLOCKS] =
{   8,  12,  16,  20,
    24,  28,  32,
    40,  48,  56,  64,
    80,  96, 112, 128,
    160, 192, 224,
    ((SIZEOF_OM_BIN_PAGE / (MIN_BIN_BLOCKS + INCR_FACTOR*9)) / SIZEOF_STRICT_ALIGNMENT)*SIZEOF_STRICT_ALIGNMENT,
    ((SIZEOF_OM_BIN_PAGE / (MIN_BIN_BLOCKS + INCR_FACTOR*6)) / SIZEOF_STRICT_ALIGNMENT)*SIZEOF_STRICT_ALIGNMENT,
    ((SIZEOF_OM_BIN_PAGE / (MIN_BIN_BLOCKS + INCR_FACTOR*4)) / SIZEOF_STRICT_ALIGNMENT)*SIZEOF_STRICT_ALIGNMENT,
    ((SIZEOF_OM_BIN_PAGE / (MIN_BIN_BLOCKS + INCR_FACTOR*2)) / SIZEOF_STRICT_ALIGNMENT)*SIZEOF_STRICT_ALIGNMENT,
    ((SIZEOF_OM_BIN_PAGE / (MIN_BIN_BLOCKS + INCR_FACTOR))   / SIZEOF_STRICT_ALIGNMENT)*SIZEOF_STRICT_ALIGNMENT,
    OM_MAX_BLOCK_SIZE};

#endif /* OM_ALIGN_8 */

Running the same file with valgrind does give a couple of reports of the form:

==30816== Conditional jump or move depends on uninitialised value(s)
==30816==    at 0x26942A01: hGetmem(int, int**, monrec*) (hutil.cc:1026)
==30816==    by 0x26940296: hHilbStep(int*, int**, int, int*, int, int*, int) (hilb.cc:142)
==30816==    by 0x2694047C: hHilbStep(int*, int**, int, int*, int, int*, int) (hilb.cc:186)
==30816==    by 0x26940AEB: hSeries(sip_sideal*, intvec*, int, intvec*, sip_sideal*, sip_sring*) (hilb.cc:285)
==30816==    by 0x26A62E0D: khCheck(sip_sideal*, intvec*, intvec*, int&, int&, skStrategy*) (khstd.cc:75)
==30816==    by 0x2697EE07: bba(sip_sideal*, sip_sideal*, intvec*, intvec*, skStrategy*) (kstd2.cc:1214)
==30816==    by 0x26973F5A: kStd(sip_sideal*, sip_sideal*, tHomog, intvec**, intvec*, int, int, intvec*) (kstd1.cc:1819)
==30816==    by 0x268B3B27: jjSTD_HILB(sleftv*, sleftv*, sleftv*) (iparith.cc:3321)

and also (and this one seems to be what ElectricFence trips on):

==30816== Invalid read of size 8
==30816==    at 0x26B8B3F2: List<CanonicalForm>::isEmpty() const (ftmpl_list.cc:256)
==30816==    by 0x26AFA34F: multiFactorize(CanonicalForm const&, Variable const&) (facFactorize.cc:710)
==30816==    by 0x26A95E9C: ratFactorize(CanonicalForm const&, Variable const&, bool) (facFactorize.h:45)
==30816==    by 0x26A9303B: factorize(CanonicalForm const&, bool) (cf_factor.cc:651)
==30816==    by 0x26935B43: singclap_factorize(spolyrec*, intvec**, int) (clapsing.cc:835)
==30816==    by 0x268B0D4F: jjFAC_P(sleftv*, sleftv*) (iparith.cc:3995)
==30816==    by 0x268B9C41: iiExprArith1(sleftv*, sleftv*, int) (iparith.cc:7869)
==30816==    by 0x270C9F54: __pyx_f_4sage_4libs_8singular_8function_17KernelCallHandler_handle_call(__pyx_obj_4sage_4libs_8singular_8function_KernelCallHandler*, __pyx_obj_4sage_4libs_8singular_8function_Converter*, __pyx_opt_args_4sage_4libs_8singular_8function_17KernelCallHandler_handle_call*) (function.cpp:11286)
...
==30816==  Address 0x40c08620 is 0 bytes after a block of size 32 alloc'd
==30816==    at 0x4A06A97: operator new[](unsigned long) (vg_replace_malloc.c:363)
==30816==    by 0x26AFA10B: multiFactorize(CanonicalForm const&, Variable const&) (facFactorize.cc:688)
==30816==    by 0x26A95E9C: ratFactorize(CanonicalForm const&, Variable const&, bool) (facFactorize.h:45)
==30816==    by 0x26A9303B: factorize(CanonicalForm const&, bool) (cf_factor.cc:651)
==30816==    by 0x26935B43: singclap_factorize(spolyrec*, intvec**, int) (clapsing.cc:835)
==30816==    by 0x268B0D4F: jjFAC_P(sleftv*, sleftv*) (iparith.cc:3995)
==30816==    by 0x268B9C41: iiExprArith1(sleftv*, sleftv*, int) (iparith.cc:7869)
==30816==    by 0x270C9F54: __pyx_f_4sage_4libs_8singular_8function_17KernelCallHandler_handle_call(__pyx_obj_4sage_4libs_8singular_8function_KernelCallHandler*, __pyx_obj_4sage_4libs_8singular_8function_Converter*, __pyx_opt_args_4sage_4libs_8singular_8function_17KernelCallHandler_handle_call*) (function.cpp:11286)

and one further report:

==30816== Invalid read of size 4
==30816==    at 0x269CE137: rOrd_is_Totaldegree_Ordering(sip_sring*) (ring.cc:2074)
==30816==    by 0x269CFE12: rComplete(sip_sring*, int) (ring.cc:3469)
==30816==    by 0x269D315C: rDefault(int, int, char**, int, int*, int*, int*, int**) (ring.cc:150)
==30816==    by 0x269D3214: rDefault(int, int, char**) (ring.cc:167)
==30816==    by 0x269C06F9: nInitChar(sip_sring*) (numbers.cc:267)
==30816==    by 0x269CEF07: rComplete(sip_sring*, int) (ring.cc:3546)
==30816==    by 0x268FA13F: rInit(sleftv*, sleftv*, sleftv*) (ipshell.cc:5127)
==30816==    by 0x268D7607: yyparse() (grammar.y:1288)
==30816==    by 0x268B9C41: iiExprArith1(sleftv*, sleftv*, int) (iparith.cc:7869)
==30816==    by 0x268D7D32: yyparse() (grammar.y:659)
==30816==    by 0x268ED5A7: iiAllStart(procinfo*, char*, feBufferTypes, int) (iplib.cc:316)
==30816==    by 0x268ED73E: iiPStart(idrec*, sleftv*) (iplib.cc:417)
==30816==    by 0x268EDC60: iiMake_proc(idrec*, sip_package*, sleftv*) (iplib.cc:539)
...
==30816==  Address 0x40a7b208 is 0 bytes after a block of size 8 alloc'd
==30816==    at 0x4A0762F: malloc (vg_replace_malloc.c:270)
==30816==    by 0x269D31BD: rDefault(int, int, char**) (ring.cc:157)
==30816==    by 0x269C06F9: nInitChar(sip_sring*) (numbers.cc:267)
==30816==    by 0x269CEF07: rComplete(sip_sring*, int) (ring.cc:3546)
==30816==    by 0x268FA13F: rInit(sleftv*, sleftv*, sleftv*) (ipshell.cc:5127)
==30816==    by 0x268D7607: yyparse() (grammar.y:1288)
==30816==    by 0x268B9C41: iiExprArith1(sleftv*, sleftv*, int) (iparith.cc:7869)
==30816==    by 0x268D7D32: yyparse() (grammar.y:659)
==30816==    by 0x268ED5A7: iiAllStart(procinfo*, char*, feBufferTypes, int) (iplib.cc:316)
==30816==    by 0x268ED73E: iiPStart(idrec*, sleftv*) (iplib.cc:417)
==30816==    by 0x268EDC60: iiMake_proc(idrec*, sip_package*, sleftv*) (iplib.cc:539)
Last edited 7 years ago by nbruin (previous) (diff)

comment:33 in reply to: ↑ 32 Changed 7 years ago by nbruin

==30816== Invalid read of size 4
==30816==    at 0x269CE137: rOrd_is_Totaldegree_Ordering(sip_sring*) (ring.cc:2074)
...

This is like shooting fish in a barrel. ring.cc:157:

...
  int *order = (int *) omAlloc(2* sizeof(int));
...
  return rDefault(ch,N,n,2,order,block0,block1);

calling ring.cc:127 (the variable order above gets passed as the parameter ord in the code below):

  ring r=(ring) omAlloc0Bin(sip_sring_bin);
...
  r->order = ord;
...
  rComplete(r);

calling ring.cc:3469:

  if (rOrd_is_Totaldegree_Ordering(r) || rOrd_is_WeightedDegree_Ordering(r))

calling ring.cc:2065:

BOOLEAN rOrd_is_Totaldegree_Ordering(ring r)
{
  // Hmm.... what about Syz orderings?
  return (rVar(r) > 1 &&
          ((rHasSimpleOrder(r) &&
           (rOrder_is_DegOrdering((rRingOrder_t)r->order[0]) ||
            rOrder_is_DegOrdering(( rRingOrder_t)r->order[1]))) ||
           (rHasSimpleOrderAA(r) &&
            (rOrder_is_DegOrdering((rRingOrder_t)r->order[1]) ||
             rOrder_is_DegOrdering((rRingOrder_t)r->order[2])))));
}

As you see, that last line ...r->ord[2] is indeed out of bound on a 2*sizeof(int) block. The valgrind session seems to indicate this code path is indeed exercised.

comment:34 in reply to: ↑ 30 Changed 7 years ago by SimonKing

Replying to nbruin:

It's not immediately clear to me that ExpSize/sizeof(long) really is (at most) rN+1

Do you mean (rN+1)*sizeof(int)?

comment:35 Changed 7 years ago by SimonKing

  • Report Upstream changed from N/A to Reported upstream. Developers acknowledge bug.

For the record: The problem is reported upstream, and Hannes seems to acknowledge that there is a problem.

comment:36 follow-up: Changed 7 years ago by SimonKing

I am a bit puzzled: I thought that I had posted a comment roughly as follows, but it seems to have vanished.

It would take some time to get a fix into a stable Singular release, and it would take even more time until the next stable Singular release is in Sage.

Hence: Could you create diff files for your fixes in the Singular source code (I am not talking about the malloc thing but about the bug fixes), so that one can put them into the patches/ directory of the Singular spkg?

comment:37 in reply to: ↑ 36 ; follow-up: Changed 7 years ago by nbruin

Replying to SimonKing:

Hence: Could you create diff files for your fixes in the Singular source code (I am not talking about the malloc thing but about the bug fixes), so that one can put them into the patches/ directory of the Singular spkg?

For the bug that tripped MALLOC_CHECK_: sure. That's just a 2-line change which I'm fairly confident is safe to make.

-  int *F=(int *)omAlloc0((rN+1)*sizeof(int));
+  int *F=(int *)omAlloc0(ExpSize);
-  int *G=(int *)omAlloc0((rN+1)*sizeof(int));
+  int *G=(int *)omAlloc0(ExpSize);

For the second bug: I have no idea how to fix that. Just doing

-  int *order = (int *) omAlloc(2* sizeof(int))
+  int *order = (int *) omAlloc(3* sizeof(int))

may work but is a bit questionable.

Also I have no experience in making patch files for the singular source. It may be a bit more robust to take whatever change Singular makes to their development version and backport that.

Also: we're not done yet. There's still a branch on what valgrind thinks is an uninitialized value and there's the List<CanonicalForm>::isEmpty() issue which ventures into templated code, so I'm at a loss for that.

Purpose of this ticket: I guess it's vague. Feel free to specify. I was actually hoping that after seeing that singular-malloc can expose issues that are otherwise hard to find, Singular might be interesting in adopting it and including it as a honest configure option (resolving the problem with the executable too). Then we can just put the proper doc in the spkg. If they don't do that, we could still do so but as a patch to singular. My goal was just to get rid of Heisenbugs in our communication with Singular.

comment:38 in reply to: ↑ 37 Changed 7 years ago by SimonKing

Replying to nbruin:

Replying to SimonKing:

Hence: Could you create diff files for your fixes in the Singular source code (I am not talking about the malloc thing but about the bug fixes), so that one can put them into the patches/ directory of the Singular spkg?

For the bug that tripped MALLOC_CHECK_: sure. That's just a 2-line change which I'm fairly confident is safe to make.

-  int *F=(int *)omAlloc0((rN+1)*sizeof(int));
+  int *F=(int *)omAlloc0(ExpSize);
-  int *G=(int *)omAlloc0((rN+1)*sizeof(int));
+  int *G=(int *)omAlloc0(ExpSize);

For the second bug: I have no idea how to fix that. Just doing

-  int *order = (int *) omAlloc(2* sizeof(int))
+  int *order = (int *) omAlloc(3* sizeof(int))

may work but is a bit questionable.

Can you suggest that as a fix on the singular ticket? Or shall I communicate it?

Also I have no experience in making patch files for the singular source.

That's easy.

  • Create a diff with context (is that called "unified"? I am not sure).
  • Give it a good name and perhaps a commit message, and put it into the patches/ folder of the Singular spkg. Just look what you currently find in that folder. The point is: The patches placed into the patches/ folder are automatically applied before building Singular.
  • Do not change the files in src/! They are supposed to be identical with the official Singular 3-1-5 source files.
  • Rename the spkg into singular-3-1-5.p2.spkg.
  • Update SPKG.txt, describing the new patch.
  • Both the patches/ folder and SPKG.txt are under version control, hence, you also need to hg commit.
  • Pack the spkg, post it somewhere, and ask me or someone else for a review.

The same patch could be put into your malloc-Singular/patches/, so that we can test whether the problem is really solved. And if it is, then the patch in the usual omalloc Singular should be fine to go.

It may be a bit more robust to take whatever change Singular makes to their development version and backport that.

Well, let's see what the Singular team thinks of your suggestion...

If they react quickly, then we could take the patch from Singular trac. If they are too slow, we take your patch from Sage trac.

Also: we're not done yet. There's still a branch on what valgrind thinks is an uninitialized value and there's the List<CanonicalForm>::isEmpty() issue which ventures into templated code, so I'm at a loss for that.

Sure. Could be that we will open further Singular trac tickets, and could be that there will be two or three more patches in patches/...

Purpose of this ticket: I guess it's vague. Feel free to specify. I was actually hoping that after seeing that singular-malloc can expose issues that are otherwise hard to find, Singular might be interesting in adopting it and including it as a honest configure option (resolving the problem with the executable too). Then we can just put the proper doc in the spkg.

I see that the patches/ directory contains a sub-directory patches/conditional/ containing a patch sage_debug.patch and a patch for cygwin. The changes that you've done in the Singular source code to use malloc could be put into patches/conditional/. Probably spkg-install needs to be changed to take the new option into account.

If they don't do that, we could still do so but as a patch to singular.

I think it will be a patch to the Singular spkg in any case. Waiting for Singular 3-1-6 (which will hopefully contain a fix for the memory problem) is probably no option.

comment:39 follow-up: Changed 7 years ago by SimonKing

I've put the following two files into the patches/ folder of the singular-3-1-5.malloc-linux spkg:

  • sage_trac13731_gnc_mm_Mult_nn.patch
    • src/kernel/gring.cc

      # HG changeset patch
      # User Simon King <simon.king@uni-jena.de>
      # Date 1353701669 -3600
      # Node ID 0f5e07666bc133a0d38cb846120c0103c4c2b0ae
      # Parent  3eb11c08dd9611b7fdba4622cc0d1660527f9e96
      #13731: Fix allocated size of F and G in gnc_mm_Mult_nn
      
      diff --git a/src/kernel/gring.cc b/src/kernel/gring.cc
      a b  
      469469  int rN=r->N;
      470470  int ExpSize=(((rN+1)*sizeof(int)+sizeof(long)-1)/sizeof(long))*sizeof(long);
      471471
      472   int *F=(int *)omAlloc0((rN+1)*sizeof(int));
      473   int *G=(int *)omAlloc0((rN+1)*sizeof(int));
       472  int *F=(int *)omAlloc0(ExpSize);
       473  int *G=(int *)omAlloc0(ExpSize);
      474474
      475475  memcpy(F, F0,(rN+1)*sizeof(int));
      476476  // pExpVectorCopy(F,F0);
  • patches/sage_trac13731_size_of_order.patch
    • src/kernel/ring.cc

      # HG changeset patch
      # User Simon King <simon.king@uni-jena.de>
      # Date 1353701944 -3600
      # Node ID 3ab3d75bf03794d94037c3e33711fdd773d858dc
      # Parent  0f5e07666bc133a0d38cb846120c0103c4c2b0ae
      #13731: Fix allocated size for order (in ring.cc)
      
      diff --git a/src/kernel/ring.cc b/src/kernel/ring.cc
      a b  
      154154ring rDefault(int ch, int N, char **n)
      155155{
      156156  /*order: lp,0*/
      157   int *order = (int *) omAlloc(2* sizeof(int));
       157  int *order = (int *) omAlloc(3* sizeof(int));
      158158  int *block0 = (int *)omAlloc0(2 * sizeof(int));
      159159  int *block1 = (int *)omAlloc0(2 * sizeof(int));
      160160  /* ringorder dp for the first block: var 1..N */

However, after re-installing the package, the crash persists!

sage: A.<x,y> = FreeAlgebra(QQ, 2)
sage: P.<x,y> = A.g_algebra({y*x:-x*y})
sage: x*y
*** glibc detected *** python: free(): invalid pointer: 0x0000000002bb11c0 ***
======= Backtrace: =========
/lib64/libc.so.6(+0x766d6)[0x7f8243e866d6]
/home/simon/SAGE/debug/sage-5.5.rc0/local/lib/libsingular.so(_Z14gnc_mm_Mult_nnPiS_P9sip_sring+0x294)[0x7f8229a81c54]
/home/simon/SAGE/debug/sage-5.5.rc0/local/lib/libsingular.so(_Z20gnc_p_Mult_mm_CommonP8spolyrecS0_iP9sip_sring+0x1ec)[0x7f8229a82acc]
/home/simon/SAGE/debug/sage-5.5.rc0/local/lib/python2.7/site-packages/sage/libs/singular/polynomial.so(+0x4bd0)[0x7f8228e63bd0]
/home/simon/SAGE/debug/sage-5.5.rc0/local/lib/python2.7/site-packages/sage/rings/polynomial/plural.so(+0x1fbd8)[0x7f82292a8bd8]
/home/simon/SAGE/debug/sage-5.5.rc0/local/lib/python2.7/site-packages/sage/structure/element.so(+0x2a399)[0x7f823b0a7399]
/home/simon/SAGE/debug/sage-5.5.rc0/local/lib/libpython2.7.so.1.0(+0x48a4f)[0x7f8244a63a4f]

That's bad. Didn't you say the change fixes it for you?

comment:40 Changed 7 years ago by SimonKing

Aha! You also changed spkg-install, so that the patches are actually not applied! That is of course the reason why it didn't work.

comment:41 in reply to: ↑ 39 ; follow-up: Changed 7 years ago by nbruin

Replying to SimonKing:

I've put the following two files into the patches/ folder of the singular-3-1-5.malloc-linux spkg:

[...]

Perhaps backport this instead (should be pretty straightforward to replace the patches):

http://www.singular.uni-kl.de:8002/trac/changeset/15435

It looks like the second issue is also addressed. Are the other issues within your reach to investigate? I'm not really comfortable tracing through templated C++.

The FC16 packaged valgrind didn't work for me due to unsupported machine instructions, but the latest valgrind release fixes that and installing that in /usr/local was a breeze! I can thoroughly recommend using it. The information it gives seem spot-on. (I was able to diagnose both these issues by just looking at the valgrind report and tracing through the code from allocation to access error. It won't always be that straightforward, but one can hope)

With

http://sage.math.washington.edu/home/nbruin/sage.supp

the amount of false (or at least for me uninteresting) reports is bearable.

It's unlikely, but the errors we've found could lead to random corruptions and hence explain the peculiar system-dependent bugs we've experienced.

I'm afraid we're not done yet, though: Although we haven't found a GC mismatch here, we're pretty sure they exist. Also these corruptions should cause bad problems all around, not just after introducing GC.

comment:42 in reply to: ↑ 32 Changed 7 years ago by nbruin

==30816== Conditional jump or move depends on uninitialised value(s)
==30816==    at 0x26942A01: hGetmem(int, int**, monrec*)

I think this one is benign, because rewriting the code to the functionally equivalent

hutil.cc:1020:

scfmon hGetmem(int lm, scfmon old, monp monmem)
{
   scfmon x = monmem->mo;
   int  lx = monmem->a;
   if ((x==NULL) || (lm > lx))
   {
-    if ((x!=NULL)&&(lx>0)) omFreeSize((ADDRESS)x, lx * sizeof(scmon));
+    if (x!=NULL) if (lx>0) omFreeSize((ADDRESS)x, lx * sizeof(scmon));
     monmem->mo = x = (scfmon)omAlloc(lm * sizeof(scmon));
     monmem->a = lm;
   }
   memcpy(x, old, lm * sizeof(scmon));
   return x;
}

makes the reports go away. I guess gcc optimizes out the shortcutting because it's not a benefit here and valgrind doesn't see through the fact that the undefined value doesn't influence the outcome.

comment:43 in reply to: ↑ 2 Changed 7 years ago by nbruin

Replying to nbruin:

#0  0x00007fffce9fb3f2 in List<CanonicalForm>::isEmpty(this=0x7fffa515b000) at ./templates/ftmpl_list.cc:256
#1  0x00007fffce96a350 in multiFactorize (F=..., v=...) at facFactorize.cc:710

I think this one is rather straightforward too (and a genuine out-of-bounds reference!): facFactorize.cc:688

    CFList* bufAeval2= new CFList [A.level() - 2];
...
    evaluationWRTDifferentSecondVars (bufAeval2, bufEvaluation, A);

    for (int j= 0; j < A.level() - 1; j++)
    {
      if (!bufAeval2[j].isEmpty())
        counter++;
    }

so this queries all elements bufAeval2[0],...,bufAeval2[A.level()-2]. However, if this allocates an array as it does in C, then the new command above only creates bufAeval2[0],...,bufAeval2[A.level()-3] (i.e., A.level()-2 of them, but 0-based.

The initialization by evaluationWRTDifferentSecondVars seems to corroborate that:

facFqFactorize.cc:1778

void
evaluationWRTDifferentSecondVars (CFList*& Aeval, const CFList& evaluation,
                                  const CanonicalForm& A)
{
  CanonicalForm tmp;
  CFList tmp2;
  CFListIterator iter;
  for (int i= A.level(); i > 2; i--)
  {
...
    if (preserveDegree)
      Aeval [i - 3]= tmp2;
    else
      Aeval [i - 3]= CFList();
  }
}

So only Aeval[0], ..., Aeval[A.level()-3] get initialized.

Thus, the reference to !bufAeval2[j].isEmpty() with j = A.level() - 2 indeed seems out of bounds to me. The abundance of bound errors in Singular is really making me feel uncomfortable. The Singular team should really take their memory audits a little more seriously. They're playing russian roulette with mathematical correctness.

comment:44 in reply to: ↑ 41 Changed 7 years ago by SimonKing

Replying to nbruin:

Replying to SimonKing:

I've put the following two files into the patches/ folder of the singular-3-1-5.malloc-linux spkg:

[...]

Perhaps backport this instead (should be pretty straightforward to replace the patches):

http://www.singular.uni-kl.de:8002/trac/changeset/15435

Yes, that looks like the right thing to do.

My plan is to take your Singular-malloc spkg and turn it into a "proper" spkg, in the sense that the changes that you did to the source code will become patches that are optionally applied (say, if one does export SAGE_SINGULA_MALLOC=yes).

The backported fix will of course be an unconditional patch.

It looks like the second issue is also addressed. Are the other issues within your reach to investigate? I'm not really comfortable tracing through templated C++.

Neither am I. But at least (in contrast to me!) you are comfortable to work with valgrind! Perhaps you could open a ticket to make your .supp file available to people? After all, there is a totally outdated valgrind spkg...

comment:45 follow-up: Changed 7 years ago by SimonKing

Now I try to patch the standard Singular spkg so that one can optionally use malloc. But I have two questions on how you created the singular-malloc spkg.

Recall that I have removed (or at least I tried to remove) all auto-generated files that I found in your spkg in order to make it work on my openSuse laptop. The changes compared with the original sources are:

M src/Singular/Makefile.in
M src/Singular/Minor.h
M src/Singular/configure
M src/Singular/configure.in
M src/Singular/sing_win.cc
M src/factory/assert.h
M src/factory/cf_chinese.cc
M src/factory/cf_util.cc
M src/factory/cf_util.h
M src/factory/facBivar.h
M src/factory/facFactorize.h
M src/factory/facFqBivar.cc
M src/omalloc/Makefile.in
M src/omalloc/omAllocDecl.h
M src/omalloc/omAllocFunc.c
M src/omalloc/omMallocSystem.h
M src/omalloc/omalloc.c
M src/omalloc/pmalloc.h
? src/factory/gen_cf_gmp.sh
? src/omalloc/omTables.h
? src/omalloc/omTables.inc

So, first question: Is it correct that all three new files are needed for malloc support? I'd rather think that gen_cf_gmp.sh should not be here, because it contains explicit paths:

#!/bin/sh
GMP_H_T="gen_cf_gmp.o: gen_cf_gmp.cc /usr/local/sage/5.5b1/local/include/gmp.h \ "
GMP_H=`echo $GMP_H_T| sed -e 's/^.*gmp.cc//' -e 's/ .$//'`
echo generating cf_gmp.h from $GMP_H
cat $GMP_H | grep -v __GMP_DECLSPEC_XX |grep -v std::FILE > cf_gmp.h

Second question: Is it not the case that src/Singular/configure is auto-generated from src/Singular/configure.in? Or is it really essential that src/Singular/configure is as in your spkg?

comment:46 Changed 7 years ago by SimonKing

PS: Actually it seems to me that the changes in src/Singular/configure.in and src/Singular/configure are due to the patch patches/singular_trac_443.patch that is to be applied anyway.

comment:47 Changed 7 years ago by SimonKing

src/omalloc/omTables.inc doesn't state that it is auto-generated, but it certainly looks like.

So, third question: Did you write src/omalloc/omTables.inc yourself? Or can I safely delete it?

comment:48 in reply to: ↑ 45 ; follow-up: Changed 7 years ago by nbruin

Replying to SimonKing:

M src/omalloc/omAllocDecl.h
M src/omalloc/omAllocFunc.c
M src/omalloc/omMallocSystem.h
M src/omalloc/omalloc.c
M src/omalloc/pmalloc.h

So, first question: Is it correct that all three new files are needed for malloc support?

I didn't make any new files and I only seriously edited things in the omalloc directory. I may have done some rough hacking in config files to ensure that the changes took effect, but I believe that it should be possible to do without by someone who know how to pass the correct options to configure etc.

To make the "spkg" I only did a "make clean" and a command to make the dir into a spkg. I disabled the "patch application" in spkg-install because that made it fail the second time I ran it (which is inconvenient when you're editing the package!)

Note that malloc_usable_size for glibc has the same function as malloc_size on BSD (OSX), so if there's a convenient config option to choose between the two you can make the spkg so that it compiles on both.

The omalloc package is obviously written in such a way that it can build in different ways, depending on config options passed at build-time. What I did was resurrect one of those options (it didn't provide all the functionality that singular needs). It should be possible to adapt my changes accordingly. Then it's not essential to conditionally apply the patches. It's just that I have zero experience with all the "#ifdef" voodoo that seems to be associated with production-quality code.

I found that "meld" was a very convenient way of browsing the changes I'd made (comparing against a pristine directory). It can do a graphically displayed diff on an entire tree. It's what I used to make the linux package from the OSX one I had.

comment:49 in reply to: ↑ 48 ; follow-up: Changed 7 years ago by SimonKing

Replying to nbruin:

Replying to SimonKing:

M src/omalloc/omAllocDecl.h
M src/omalloc/omAllocFunc.c
M src/omalloc/omMallocSystem.h
M src/omalloc/omalloc.c
M src/omalloc/pmalloc.h

So, first question: Is it correct that all three new files are needed for malloc support?

I didn't make any new files and I only seriously edited things in the omalloc directory.

OK, so, I assume you only edited the five files indicated above, and did no other changes by yourself.

I may have done some rough hacking in config files to ensure that the changes took effect, but I believe that it should be possible to do without by someone who know how to pass the correct options to configure etc.

Well, I don't know.

To make the "spkg" I only did a "make clean" and a command to make the dir into a spkg.

That means that all patches from the patches/ directory are applied in your code, and if make clean is imperfect then it also means that some auto-generated (machine dependent) files are in your code.

I disabled the "patch application" in spkg-install because that made it fail the second time I ran it (which is inconvenient when you're editing the package!)

Sure, because you started with code that already contains all patches.

Note that malloc_usable_size for glibc has the same function as malloc_size on BSD (OSX), so if there's a convenient config option to choose between the two you can make the spkg so that it compiles on both.

Hm. I see the following definition

src/omalloc/dlmalloc.h

/* define to -1 if you want that this implementation provides
   malloc/calloc/realloc/free funcs */
#ifndef OM_PROVIDE_MALLOC
#define OM_PROVIDE_MALLOC 0
#endif

#ifndef HAVE__USR_INCLUDE_MALLOC_H
#undef  HAVE__USR_INCLUDE_MALLOC_H
#endif

#ifdef HAVE__USR_INCLUDE_MALLOC_H
#define HAVE_USR_INCLUDE_MALLOC_H 1
#endif

#define OM_MALLOC_MALLOC   mALLOc
#define OM_MALLOC_REALLOC  rEALLOc
#define OM_MALLOC_FREE     fREe
#define OM_MALLOC_VALLOC   vALLOc
#define OM_MALLOC_VFREE(addr, size) OM_MALLOC_FREE(addr)
#define OM_MALLOC_SIZEOF_ADDR(addr) malloc_usable_size(addr)
#define cfree cFREe

I don't totally understand the comment "define to -1 if you want that this implementation provides malloc/calloc/realloc/free func". Does that perhaps mean that export OM_PROVIDE_MALLOC=-1 is all what one need to use system malloc?

But I wonder: You say that malloc_usable_size is not available on OSX. Then, the line

#define OM_MALLOC_SIZEOF_ADDR(addr) malloc_usable_size(addr)

sounds like a bug to me. I'll ask Hannes.

comment:50 in reply to: ↑ 49 ; follow-up: Changed 7 years ago by nbruin

Replying to SimonKing:

But I wonder: You say that malloc_usable_size is not available on OSX. Then, the line

#define OM_MALLOC_SIZEOF_ADDR(addr) malloc_usable_size(addr)

sounds like a bug to me. I'll ask Hannes.

That definition only occurs in dlmalloc.h, right? I don't think other files import that header file, so probably dlmalloc only gets used under certain configure options. I wasn't able to trace through how omalloc gets configured/built. It seems to be a value for configure:

 --with-malloc=system|dlmalloc|gmalloc|pmalloc|external

(given that singular builds just fine on OSX this cannot be a real problem)

But yes, these changes are best made with input from someone familiar with omalloc, so hopefully Hannes can help.

comment:51 in reply to: ↑ 50 ; follow-up: Changed 7 years ago by SimonKing

Replying to nbruin:

It seems to be a value for configure:

 --with-malloc=system|dlmalloc|gmalloc|pmalloc|external

Then the question arises: Why is it not enough to pass the --with-malloc=system to Singular? Why did the additional changes become necessary that you made on five files in src/omalloc/?

For example, you edited src/omalloc/omAllocDecl.h to make the emulation unconditional. Is that really needed, if there is a configuration option anyway?

But yes, these changes are best made with input from someone familiar with omalloc, so hopefully Hannes can help.

I asked, but he didn't reply yet. Well, it's week-end, after all...

comment:52 in reply to: ↑ 51 ; follow-up: Changed 7 years ago by nbruin

Replying to SimonKing:

Replying to nbruin:

It seems to be a value for configure:

 --with-malloc=system|dlmalloc|gmalloc|pmalloc|external

Then the question arises: Why is it not enough to pass the --with-malloc=system to Singular? Why did the additional changes become necessary that you made on five files in src/omalloc/?

I think I can explain that: omalloc is written to work on top of another memory manager. It doesn't call mmap or sbrk by itself. So I think this option selects which malloc is used to serve omalloc with pages.

dlmalloc is simply a full-fledged memory manager that happens to be shipped with omalloc and can be used as a backend. If you look in dlmalloc.c, there is actually an implementation of malloc_usable_size. So I guess you can build omalloc in a way that is independent of the malloc offered by the clib and just uses system calls to mmap or sbrk directly.

The other configuration options (you'll have to read up on it to be sure) deal with what services omalloc provides and how it does it.

There's an option that selects whether omalloc should offer the standard 'malloc/realloc/free' interface to its own allocators.

There is also an option that selects how omalloc does its job: whether it should do what it is designed for or whether it should just honour the requests it receives by passing them on 1-1 to the underlying memory allocator. That's the option that had fallen into disuse within singular and had become defunct. I tried to bring its functionality back to the level required for singular. I hacked my way to getting omalloc to compile in that mode. However, I'm sure this can be done in a nicer fashion.

For example, you edited src/omalloc/omAllocDecl.h to make the emulation unconditional. Is that really needed, if there is a configuration option anyway?

No I'm sure it's not. I didn't figure out a way that positively confirmed for me that just passing in an option helped (partially because I couldn't interact directly with the singular configure script: the spkg-install plays games too in order to build both an executable and libsingular). But someone more familiar with how configure and conditional compilation setups usually work can probably very easily put that in order.

comment:53 in reply to: ↑ 52 ; follow-ups: Changed 7 years ago by SimonKing

Replying to nbruin:

I think I can explain that: omalloc is written to work on top of another memory manager.

So, that is --with-malloc=....

There's an option that selects whether omalloc should offer the standard 'malloc/realloc/free' interface to its own allocators.

What does that mean? If one uses omalloc, why should one not want that it offers malloc/realloc/free?

There is also an option that selects how omalloc does its job: whether it should do what it is designed for or whether it should just honour the requests it receives by passing them on 1-1 to the underlying memory allocator.

So, that's the option that we need, right? Is that --with-emulate-omalloc that you added to spkg-install?

I guess I am now experiencing the same that you did: If --with-emulate-omalloc is the only change, then installing the spkg fails with

In file included from ../kernel/bigintmat.h:13:0,
                 from bigintmat.cc:10:
../kernel/intvec.h: In constructor ‘intvec::intvec(int)’:
../kernel/intvec.h:26:18: error: ‘omEmulateAlloc0’ was not declared in this scope
bigintmat.cc: In member function ‘int* bigintmat::getwid(int)’:
bigintmat.cc:379:31: error: ‘omStrDup’ was not declared in this scope
bigintmat.cc: In member function ‘void bigintmat::pprint(int)’:
bigintmat.cc:414:18: error: ‘omEmulateAlloc0’ was not declared in this scope
bigintmat.cc:421:32: error: ‘omStrDup’ was not declared in this scope

comment:54 in reply to: ↑ 53 Changed 7 years ago by nbruin

Replying to SimonKing:

There's an option that selects whether omalloc should offer the standard 'malloc/realloc/free' interface to its own allocators.

What does that mean? If one uses omalloc, why should one not want that it offers malloc/realloc/free?

It selects whether omalloc should offer its services under those names, i.e., as a drop-in replacement for the standard libc memory manager. That's not what Singular does. It uses the various omalloc routines, which offer extra options. Otherwise, changing Singular would have been a matter of "dropping out" the drop-in replacement. Instead we have to jump through hoops to get omalloc to use libc's malloc (or whatever drop-in replacement we want to use, like ElectricFence?) behind the scenes.

comment:55 in reply to: ↑ 53 Changed 7 years ago by nbruin

Replying to SimonKing:

I guess I am now experiencing the same that you did: If --with-emulate-omalloc is the only change, then installing the spkg fails with

In file included from ../kernel/bigintmat.h:13:0,
                 from bigintmat.cc:10:
../kernel/intvec.h: In constructor ‘intvec::intvec(int)’:
../kernel/intvec.h:26:18: error: ‘omEmulateAlloc0’ was not declared in this scope
bigintmat.cc: In member function ‘int* bigintmat::getwid(int)’:
bigintmat.cc:379:31: error: ‘omStrDup’ was not declared in this scope
bigintmat.cc: In member function ‘void bigintmat::pprint(int)’:
bigintmat.cc:414:18: error: ‘omEmulateAlloc0’ was not declared in this scope
bigintmat.cc:421:32: error: ‘omStrDup’ was not declared in this scope

Yay! indeed. Those are some of the routines I provided. So yes, it's probably best if you just use my changes for information and as hints for how to provide the missing functionality, but then integrate it more cleanly (I had a couple of rounds of these; the hard one being omMemDup, which you don't seem to hit yet). The omEmulateAlloc0 is a weird one since that IS implemented in omAllocEmulate.c I think I solved that one by simply including the relevant declaration in omAllocDecl.h.

Changed 7 years ago by SimonKing

A patch for the singular spkg to produce a libsingular that uses malloc

comment:56 Changed 7 years ago by SimonKing

What I did to produce a singular spkg that produces a malloc-libsingular:

  • Drop attachment:sage_trac_13731.patch into the spkg's patches/ directory
  • Apply the following change to spkg-install:
    • spkg-install

      diff --git a/spkg-install b/spkg-install
      a b  
      134134                --libdir="$SAGE_LOCAL"/lib \
      135135                --with-apint=gmp \
      136136                --with-malloc=system \
       137                --with-emulate-omalloc \
      137138                --with-NTL \
      138139                --without-MP \
      139140                --without-lex \

With that, I seem to get a working libsingular, and I get the due segfault with export MALLOC_CHECK_=3 in the example of the ticket description. However, I do not obtain a working Singular executable.

I am now testing whether the spkg works on bsd.math as well.

comment:57 Changed 7 years ago by SimonKing

Unfortunately, it does not work on bsd.math: The problem appears to be that there is no malloc.h available:

In file included from ./omalloc.h:785:0,
                 from omalloc.c:16:
../omalloc/omMalloc.h:12:20: fatal error: malloc.h: No such file or directory
compilation terminated.

That's really strange. Shouldn't malloc be available system wide?

comment:58 Changed 7 years ago by fbissey

bsd.math is OS X, isn't it? I think I had trouble with that recently because malloc.h is in a subdirectory - not at the top level. You would need an extra -I I guess. I'll see if I can find it.

comment:59 Changed 7 years ago by fbissey

It is in /usr/include/malloc/

comment:60 follow-up: Changed 7 years ago by SimonKing

I am puzzled.

When I put

#if defined(ix86Mac_darwin) || defined(x86_64Mac_darwin) || defined(ppcMac_darwin)
#include <malloc/malloc.h>
#else
#include <malloc.h>
#endif

in src/omalloc/omMallocSystem.h (which is responsible for including malloc.h), things do still not work and it still complains that it can not find malloc.h. But in the installation log, I see the line

checking uname for singular... (cached) x86_64Mac-darwin

and omMalloc.h is created only after that line. Hence I don't understand why it doesn't try to pick up malloc/malloc.h.

comment:61 in reply to: ↑ 60 Changed 7 years ago by SimonKing

Replying to SimonKing:

#if defined(ix86Mac_darwin) || defined(x86_64Mac_darwin) || defined(ppcMac_darwin)
#include <malloc/malloc.h>
#else
#include <malloc.h>
#endif
checking uname for singular... (cached) x86_64Mac-darwin

How nasty! It defines x86_64Mac-darwin, but the person who originally wrote the check for the platforms wrote x86_64Mac_darwin!!

comment:62 follow-up: Changed 7 years ago by fbissey

Rool eyes.... but not completely surprised.

comment:63 in reply to: ↑ 62 Changed 7 years ago by SimonKing

Replying to fbissey:

Rool eyes.... but not completely surprised.

But on the other hand, it can not be "-" (minus sign!) between Mac and darwin. Anyway. There must be a test on which platform we are. How to?

comment:64 follow-up: Changed 7 years ago by fbissey

I am not sure what you would get for a non-mac darwin target, otherwise the easiest thing is probably gcc -dumpmachine, On my 10.5.8 machine:

BlueFerniMac1:~ frb15$ gcc -dumpmachine
i686-apple-darwin9

comment:65 in reply to: ↑ 64 Changed 7 years ago by SimonKing

Replying to fbissey:

I am not sure what you would get for a non-mac darwin target, otherwise the easiest thing is probably gcc -dumpmachine, On my 10.5.8 machine:

BlueFerniMac1:~ frb15$ gcc -dumpmachine
i686-apple-darwin9

Thank you! But I would rather expect that those things are done during configuration, and then the type of the platform is stored in a variable. I'm asking Hans Schönemann how Singular deals with different platforms.

Changed 7 years ago by SimonKing

Backporting Singular changeset 15435

comment:66 Changed 7 years ago by SimonKing

I have backported Singular changeset 15435.

When I put attachment:sage_trac_13731.patch and attachment:singular_15435.patch into the spkg's patches/ directory and apply the change to spkg-install mentioned in comment:56, then I get

simon@linux-sqwp:~/SAGE/debug/sage-5.5.rc0> export MALLOC_CHECK_=3
simon@linux-sqwp:~/SAGE/debug/sage-5.5.rc0> ./sage 
...
sage: A.<x,y> = FreeAlgebra(QQ, 2)
sage: P.<x,y> = A.g_algebra({y*x:-x*y})
sage: x._mul_(y)
x*y

So, indeed the original problem got fixed upstream.

Plans:

  • My suggestion is to put the second patch (the backported fix) into our standard Singular spkg.
  • We should of course follow the original plan and see whether we can find further sloppy allocations in Singular, so that #715 and friends can be made work in a reliable way.
  • Can someone tell me how to tell autoconf that it shall create a configure script that tells the preprocessor that we are on Darwin and hence must include malloc/malloc.h in place of malloc.h?

comment:67 follow-up: Changed 7 years ago by fbissey

After more thinking I think you may want to use AC_CANONICAL_TARGET it is usually used for cross compilation but it should work in this case. It is much better than testing gcc since we may want to use clang.

comment:68 follow-up: Changed 7 years ago by nbruin

Just a quick comment: The normal way of using malloc is via #include <stdlib.h>. It's only because we need malloc_usable_size or malloc_size that we need malloc.h. You're already choosing between those using an #if defined(...), so why not use the same mechanism for the compile? Given that this code should only be relevant for *debug* builds of singular, I don't think it's worth overhauling the process by which omalloc produces its header field.

The likely programmatic injection point for changes seems to be the makeheader script which is invoked in the makefile rule for malloc.h, but I'm not sure that that has capabilities of building header files based on conditionals by default.

comment:69 in reply to: ↑ 67 Changed 7 years ago by SimonKing

Replying to fbissey:

After more thinking I think you may want to use AC_CANONICAL_TARGET it is usually used for cross compilation but it should work in this case. It is much better than testing gcc since we may want to use clang.

My question was actually not so much about "How to detect the platform with autoconf?", because I see in the existing configure.in how it can be done. The question is more "How to define a symbol, say, HAVE_DARWIN so that the preprocessor is aware of it?".

I tried to do export HAVE_DARWIN=1 in spkg-build, but it didn't work. I tried AC_DEFINE(HAVE_DARWIN,1), but it didn't work. Here, "didn't work" means that the preprocessor has always been in the else part of the following lines:

#ifdef HAVE_DARWIN
#include "malloc/malloc.h"
#else
#include "malloc.h"
#endif

comment:70 in reply to: ↑ 68 Changed 7 years ago by SimonKing

Replying to nbruin:

Just a quick comment: The normal way of using malloc is via #include <stdlib.h>. It's only because we need malloc_usable_size or malloc_size that we need malloc.h. You're already choosing between those using an #if defined(...), so why not use the same mechanism for the compile?

Well, I don't know whether that way of chosing works. It definitely does not for the #include "malloc.h" versus #include "malloc/malloc.h" thingy.

The likely programmatic injection point for changes seems to be the makeheader script which is invoked in the makefile rule for malloc.h, but I'm not sure that that has capabilities of building header files based on conditionals by default.

I haven't looked into that.> Given that this code should only be relevant for *debug* builds of singular, I don't think it's worth overhauling the process by which omalloc produces its header field.

Probably right. We should better focus on using the debug spkg to detect more memory corruptions.

Just to be on the safe side: Does attachment:singular_15435.patch fix all upstream problems that we have detected so far?

  • The one from the ticket description is fixed.
  • Is the one from comment:33 addressed?
  • Is the one from comment:42 addressed? I understand that one isn't critical, though.
  • The one from comment:43 does not seem to be addressed, and I think I didn't mention it upstream, yet.

comment:71 follow-up: Changed 7 years ago by fbissey

I think AC_DEFINE puts stuff in config.h and you would have to include config.h before this piece of code. Having a code file changed from configure is quite some work. I would have to look it up.

comment:72 in reply to: ↑ 71 ; follow-up: Changed 7 years ago by SimonKing

Replying to fbissey:

I think AC_DEFINE puts stuff in config.h and you would have to include config.h before this piece of code.

That's probably it! Is it needed to use AC_SUBS as well? I read somewhere that this would do something with a header file, so, it is not clear to me whether AC_DEFINE is enough or AC_SUBS is needed in addition.

comment:73 in reply to: ↑ 72 ; follow-up: Changed 7 years ago by fbissey

Replying to SimonKing:

Replying to fbissey:

I think AC_DEFINE puts stuff in config.h and you would have to include config.h before this piece of code.

That's probably it! Is it needed to use AC_SUBS as well? I read somewhere that this would do something with a header file, so, it is not clear to me whether AC_DEFINE is enough or AC_SUBS is needed in addition.

I have read more about it at http://www.gnu.org/savannah-checkouts/gnu/autoconf/manual/autoconf-2.69/html_node/Defining-Symbols.html and it goes in config.h if AC_CONFIG_HEADERS is called. AC_SUBST usually acts on Makefile.in preferably generated from Makefile.am and you have to put some code for it to work IIRC. Substituting in a specific myfile.c.in is something else altogether, I need to look it up.

I don't know that stuff from the top of my head unfortunately.

comment:74 in reply to: ↑ 73 Changed 7 years ago by SimonKing

Replying to fbissey:

I have read more about it at http://www.gnu.org/savannah-checkouts/gnu/autoconf/manual/autoconf-2.69/html_node/Defining-Symbols.html and it goes in config.h if AC_CONFIG_HEADERS is called.

Thank you! So, I'd need to add AC_CONFIG_HEADERS as well, because the only place where AC_CONFIG_HEADER is used is in gfanlib/configure.ac.

AC_SUBST usually acts on Makefile.in preferably generated from Makefile.am and you have to put some code for it to work IIRC.

Makefile.am does not exist in the Singular sources. However, AC_SUBST is used in several places.

Anyway, I guess inserting AC_CONFIG_HEADERS is the easiest thing to do.

comment:75 follow-up: Changed 7 years ago by fbissey

I wouldn't add AC_CONFIG_HEADERS if it isn't already there. Are you working on the top level or at the Singular folder level? configure.in in Singular does call AC_CONFIG_HEADERS and puts definitions in mod2.h.

comment:76 in reply to: ↑ 75 Changed 7 years ago by SimonKing

Replying to fbissey:

I wouldn't add AC_CONFIG_HEADERS if it isn't already there. Are you working on the top level or at the Singular folder level? configure.in in Singular does call AC_CONFIG_HEADERS and puts definitions in mod2.h.

No, it doesn't use AC_CONFIG_HEADERS but AC_CONFIG_HEADER. That's why grep didn't find it.

Even better: omalloc/configure.in uses it as well - hence, this should be the right place to put the missing symbol.

comment:77 Changed 7 years ago by SimonKing

I am slightly shocked that omalloc/configure.in uses a hardcoded path to malloc.h:

AC_CHECK_HEADERS(unistd.h sys/mman.h fcntl.h /usr/include/malloc.h)

Isn't that bad?

comment:78 Changed 7 years ago by SimonKing

  • Cc malb added

Ccing Marting Albrecht, because he knows the Singular spkg.

Next thing I wonder about: The spkg contains src/omalloc/configure, but on top it says that this is auto-generated. By consequence, when I tried to change src/omalloc/configure.in, my changes had no effect.

So, my first guess was that configure has to be removed - but that didn't work.

Martin, do you think patching src/omalloc/configure directly is the right thing to do? Or what should I do in order to change the configuration in src/omalloc?

comment:79 follow-up: Changed 7 years ago by malb

Simon, configure is generated from configure.in using autoconf. However, Singular requires a special autoconf version (I think it is a particularly old one) which makes fixing things in its configure script tricky. I think this should go on [libsingular-devel]. Hans can help, I think.

comment:80 in reply to: ↑ 79 Changed 7 years ago by SimonKing

Replying to malb:

Simon, configure is generated from configure.in using autoconf. However, Singular requires a special autoconf version (I think it is a particularly old one) which makes fixing things in its configure script tricky. I think this should go on [libsingular-devel]. Hans can help, I think.

I am not subscribed to libsingular-devel. Can you provide a link?

FYI:

I think this ticket has two purposes.

  1. Create a Singular spkg that builds both on linux and osx and replaces omalloc by malloc, so that it is easier to track down memory corruptions in Singular that became apparent by #715 and friends. It is just a tool to analyse things - in fact, it wouldn't even yield a working Singular executable.
  2. The study of memory corruptions should help to fix them - and the fixes should not only be upstream, but should be backported to a new patch level of our Singular 3-1-5 spkg.

The problem with src/omalloc/configure only concerns purpose 1. Hence, if directly editing src/omalloc/configure is dirty but works, then I wouldn't mind to do it, just so that it builds on osx.

Fortunately, the fixes in purpose 2. would probably not involve the configuration scripts.

Last edited 7 years ago by SimonKing (previous) (diff)

comment:81 Changed 7 years ago by SimonKing

Sorry, when posting the previous comment, I forgot to delete two paragraphs before submitting...

comment:82 follow-up: Changed 7 years ago by SimonKing

On libsingular-devel, Hans suggests to use xalloc, not malloc to detect memory-related problems.

Nils, do you know how to do those things?

comment:83 in reply to: ↑ 82 Changed 7 years ago by nbruin

Replying to SimonKing:

Nils, do you know how to do those things?

No, I don't, but it looks like you're learning how to on libsingular-devel already. The xalloc approach seems to be what I was contemplating before I read up on malloc_size and malloc_usable_size. It's interesting to see that the singular people thought it was easier to write a wrapper from scratch than to amend omalloc. I can sympathize with that, although it wasn't so bad in the end.

If xalloc does result in a usable executable, that would be great! I wonder what prevents us from getting that.

It may be that once you've built xalloc.so, you can just do LD_PRELOAD=xalloc.so to pre-empt the normal omalloc routines. That's how things like ElectricFence capture malloc action (provided omalloc doesn't get statically linked)

comment:84 Changed 7 years ago by SimonKing

Sorry, my knowledge of automake is too limited. So far, I did not succeed to create a package that replaces omalloc by xalloc.

What I did (see also the thread in libsingular):

  • replace omalloc/ by the contents of xalloc/ in Singular Spielwiese
  • run aclocal in omalloc
  • run autoconf in omalloc
  • run automake --install-missing in omalloc
  • run automake again in omalloc
  • add the line install-nolns: install to the resulting Makefile.in
  • turn all this into a conditional patch, that is applied if export SINGULAR_XALLOC=yes was done.

When I try to install the spkg, it fails with

make install-nolns in omalloc
make[1]: Entering directory `/home/simon/SAGE/debug/sage-5.5.rc0/spkg/build/singular-3-1-5.p2/src/omalloc'
/bin/sh ./libtool  --tag=CC   --mode=compile gcc -DPACKAGE_NAME=\"xalloc\" -DPACKAGE_TARNAME=\"xalloc\" -DPACKAGE_VERSION=\"3.1.2.sw\" -DPACKAGE_STRING=\"xalloc\ 3.1.2.sw\" -DPACKAGE_BUGREPORT=\"\" -DPACKAGE_URL=\"\" -DPACKAGE=\"xalloc\" -DVERSION=\"3.1.2.sw\" -DSTDC_HEADERS=1 -DHAVE_SYS_TYPES_H=1 -DHAVE_SYS_STAT_H=1 -DHAVE_STDLIB_H=1 -DHAVE_STRING_H=1 -DHAVE_MEMORY_H=1 -DHAVE_STRINGS_H=1 -DHAVE_INTTYPES_H=1 -DHAVE_STDINT_H=1 -DHAVE_UNISTD_H=1 -DHAVE_DLFCN_H=1 -DLT_OBJDIR=\".libs/\" -I.  -I. -DHAVE_CONFIG_H -DNDEBUG -DOM_NDEBUG -I/home/simon/SAGE/debug/sage-5.5.rc0/local/include -I/home/simon/SAGE/debug/sage-5.5.rc0/local/include  -O3  -O2 -g  -fPIC -MT libomalloc_la-dummy.lo -MD -MP -MF .deps/libomalloc_la-dummy.Tpo -c -o libomalloc_la-dummy.lo `test -f 'dummy.c' || echo './'`dummy.c
mv -f .deps/libomalloc_la-dummy.Tpo .deps/libomalloc_la-dummy.Plo
mv: Aufruf von stat für „.deps/libomalloc_la-dummy.Tpo“ nicht möglich: Datei oder Verzeichnis nicht gefunden
make[1]: *** [libomalloc_la-dummy.lo] Fehler 1
make[1]: Leaving directory `/home/simon/SAGE/debug/sage-5.5.rc0/spkg/build/singular-3-1-5.p2/src/omalloc'
make: *** [install-nolns] Fehler 1
Unable to build and install Singular
Error building Singular (error in build_singular).

Apparently the .deps directory was automatically created by the script ./depcomp, that belongs to those installed by automake --install-missing. Its content after the installation failed:

simon@linux-sqwp:~/SAGE/debug/sage-5.5.rc0> ls spkg/build/singular-3-1-5.p2/src/omalloc/.deps/
libomalloc_g_la-dummy.Plo  libomalloc_la-dummy.Plo

And to be honest, I don't see the point of the dummy.c file in xalloc, which is:

#include "omalloc.h"

int om_sing_opt_show_mem; /* dummy */
struct omInfo_s om_Info; /* dummy */
struct omOpts_s om_Opts; /* dummy */

comment:85 Changed 7 years ago by SimonKing

With the help of the good people of libsingular-devel, I managed to create a new singular spkg.

When you export SINGULAR_XALLOC=yes then the spkg creates a libsingular library and a Singular executable (yes, it starts and can compute things!!) with omalloc replaced by xalloc (a compatibility layer on top of malloc). Otherwise, the spkg just builds with omalloc.

The good news:

  1. Singular works.
    simon@linux-sqwp:~/SAGE/debug/sage-5.5.rc0> ./sage -singular
                         SINGULAR                                 /  Development
     A Computer Algebra System for Polynomial Computations       /   version 3-1-5
                                                               0<
     by: W. Decker, G.-M. Greuel, G. Pfister, H. Schoenemann     \   Jul 2012
    FB Mathematik der Universitaet, D-67653 Kaiserslautern        \
    > ring r;
    > x;
    x
    > x*y+z;
    xy+z
    > quit;
    Auf Wiedersehen.
    
    There is no crash in the example above!
  2. Sage starts and quits just fine, even if libsingular is involved:
    simon@linux-sqwp:~/SAGE/debug/sage-5.5.rc0> ./sage 
    ----------------------------------------------------------------------
    | Sage Version 5.5.rc0, Release Date: 2012-11-17                     |
    | Type "notebook()" for the browser-based notebook interface.        |
    | Type "help()" for help.                                            |
    ----------------------------------------------------------------------
    **********************************************************************
    *                                                                    *
    * Warning: this is a prerelease version, and it may be unstable.     *
    *                                                                    *
    **********************************************************************
    sage: P.<x,y> = QQ[]
    sage: x*y
    x*y
    sage: quit
    Exiting Sage (CPU time 0m0.05s, Wall time 0m9.80s).
    
  3. We can reproduce the error from the ticket description:
    simon@linux-sqwp:~/SAGE/debug/sage-5.5.rc0> export MALLOC_CHECK_=3
    simon@linux-sqwp:~/SAGE/debug/sage-5.5.rc0> ./sage 
    ----------------------------------------------------------------------
    | Sage Version 5.5.rc0, Release Date: 2012-11-17                     |
    | Type "notebook()" for the browser-based notebook interface.        |
    | Type "help()" for help.                                            |
    ----------------------------------------------------------------------
    **********************************************************************
    *                                                                    *
    * Warning: this is a prerelease version, and it may be unstable.     *
    *                                                                    *
    **********************************************************************
    sage: A.<x,y> = FreeAlgebra(QQ, 2)
    sage: P.<x,y> = A.g_algebra({y*x:-x*y})
    sage: x._mul_(y)
    *** glibc detected *** python: free(): invalid pointer: 0x0000000002cf4920 ***
    ======= Backtrace: =========
    /lib64/libc.so.6(+0x766d6)[0x7f4331a486d6]
    /home/simon/SAGE/debug/sage-5.5.rc0/local/lib/libsingular.so(_Z14gnc_mm_Mult_nnPiS_P9sip_sring+0x301)[0x7f431764de91]
    /home/simon/SAGE/debug/sage-5.5.rc0/local/lib/libsingular.so(_Z20gnc_p_Mult_mm_CommonP8spolyrecS0_iP9sip_sring+0x21c)[0x7f431764ef5c]
    /home/simon/SAGE/debug/sage-5.5.rc0/local/lib/python2.7/site-packages/sage/libs/singular/polynomial.so(+0x4c60)[0x7f4316a2cc60]
    /home/simon/SAGE/debug/sage-5.5.rc0/local/lib/python2.7/site-packages/sage/rings/polynomial/plural.so(+0x1d278)[0x7f4316e70278]
    /home/simon/SAGE/debug/sage-5.5.rc0/local/lib/python2.7/site-packages/sage/rings/polynomial/plural.so(+0xbb63)[0x7f4316e5eb63]
    ...
    
  4. Even better: We additionally get a crash when we do the example without MALLOC_CHECK, namely when quitting Sage:
    simon@linux-sqwp:~/SAGE/debug/sage-5.5.rc0> export MALLOC_CHECK_=
    simon@linux-sqwp:~/SAGE/debug/sage-5.5.rc0> ./sage
    ----------------------------------------------------------------------
    | Sage Version 5.5.rc0, Release Date: 2012-11-17                     |
    | Type "notebook()" for the browser-based notebook interface.        |
    | Type "help()" for help.                                            |
    ----------------------------------------------------------------------
    **********************************************************************
    *                                                                    *
    * Warning: this is a prerelease version, and it may be unstable.     *
    *                                                                    *
    **********************************************************************
    sage: A.<x,y> = FreeAlgebra(QQ, 2)
    sage: P.<x,y> = A.g_algebra({y*x:-x*y})
    sage: x._mul_(y)
    x*y
    sage: quit
    Exiting Sage (CPU time 0m0.18s, Wall time 0m7.89s).
    /home/simon/SAGE/debug/sage-5.5.rc0/local/lib/libcsage.so(print_backtrace+0x31)[0x7f657e22bc77]
    /home/simon/SAGE/debug/sage-5.5.rc0/local/lib/libcsage.so(sigdie+0x14)[0x7f657e22bca9]
    /home/simon/SAGE/debug/sage-5.5.rc0/local/lib/libcsage.so(sage_signal_handler+0x216)[0x7f657e22b886]
    /lib64/libpthread.so.0(+0xfd00)[0x7f6583601d00]
    /home/simon/SAGE/debug/sage-5.5.rc0/local/lib/libpython2.7.so.1.0(+0x129940)[0x7f6583938940]
    /home/simon/SAGE/debug/sage-5.5.rc0/local/lib/libpython2.7.so.1.0(PyGC_Collect+0x24)[0x7f6583939694]
    /home/simon/SAGE/debug/sage-5.5.rc0/local/lib/libpython2.7.so.1.0(Py_Finalize+0x116)[0x7f6583925316]
    /home/simon/SAGE/debug/sage-5.5.rc0/local/lib/libpython2.7.so.1.0(Py_Exit+0x8)[0x7f6583924308]
    /home/simon/SAGE/debug/sage-5.5.rc0/local/lib/libpython2.7.so.1.0(+0x115434)[0x7f6583924434]
    /home/simon/SAGE/debug/sage-5.5.rc0/local/lib/libpython2.7.so.1.0(PyErr_PrintEx+0x205)[0x7f65839246d5]
    /home/simon/SAGE/debug/sage-5.5.rc0/local/lib/libpython2.7.so.1.0(PyRun_SimpleFileExFlags+0x1fe)[0x7f6583924b4e]
    /home/simon/SAGE/debug/sage-5.5.rc0/local/lib/libpython2.7.so.1.0(Py_Main+0xbd5)[0x7f6583938035]
    /lib64/libc.so.6(__libc_start_main+0xed)[0x7f6582c2523d]
    python[0x4006e1]
    
    ------------------------------------------------------------------------
    Unhandled SIGSEGV: A segmentation fault occurred in Sage.
    This probably occurred because a *compiled* component of Sage has a bug
    in it and is not properly wrapped with sig_on(), sig_off(). You might
    want to run Sage under gdb with 'sage -gdb' to debug this.
    Sage will now terminate.
    ------------------------------------------------------------------------
    /home/simon/SAGE/debug/sage-5.5.rc0/spkg/bin/sage: Zeile 310: 25245 Speicherzugriffsfehler  sage-ipython "$@" -i
    

Conclusion

I think this spkg is something we can work with to detect errors.

comment:86 Changed 7 years ago by SimonKing

PS:

  • I did not test whether this spkg builds on OSX.
  • The next step should be to include the bug fix from singular_15435.patch and similar upstream fixes, and see whether the crash persists...
Last edited 7 years ago by SimonKing (previous) (diff)

comment:87 Changed 7 years ago by SimonKing

PPS: I am afraid the bug fixes and the xalloc patch do not commute. Hence, when including the fixes, I need to update the xalloc patch as well.

comment:88 Changed 7 years ago by SimonKing

Hooray, it does build on OSX (bsd.math)!!

Changed 7 years ago by SimonKing

Backporting a small part of Singular changeset baadc0f7

comment:89 Changed 7 years ago by SimonKing

  • Authors set to Nils Bruin, Simon King
  • Description modified (diff)
  • Report Upstream changed from Reported upstream. Developers acknowledge bug. to Fixed upstream, in a later stable release.
Note: See TracTickets for help on using tickets.