#13731 closed defect (fixed)
Fix libsingular memory management
Reported by:  nbruin  Owned by:  rlm 

Priority:  major  Milestone:  sage5.6 
Component:  memleak  Keywords:  
Cc:  SimonKing, fbissey, malb  Merged in:  sage5.6.beta0 
Authors:  Nils Bruin, Simon King  Reviewers:  Nils Bruin 
Report Upstream:  Fixed upstream, in a later stable release.  Work issues:  
Branch:  Commit:  
Dependencies:  Stopgaps: 
Description (last modified by )
We created a new patchlevel for the singular315 spkg. It contains backports of some upstream fixes of several memory corruptions.
If one does export SINGULAR_XALLOC=yes
before installing the spkg, Singular's usual memory manager "omalloc" will be replaced by "xalloc". This is a compatibility layer on top of malloc. The resulting Singular executable and library will be slower, but allows the use of some debug tools.
Here is the background.
We have several indications that interaction with libsingular's memory management is tricky. Debugging is hard, because libsingular uses omalloc for its memory management and that hides all allocation/deallocation from conventional memory checking tools. Singular's allocation library is relatively pluggable, though, and (at a speed penalty) one can mostly switch it to using the system malloc for everything.
Preliminary packages for linux and for osx using malloc are found here:
 http://sage.math.washington.edu/home/nbruin/singular315.malloclinux.spkg (for linux). It replaces omalloc by the system malloc and tries to keep the changes small. It does build a libsingular library, but the Singular executable does not work.
 http://sage.math.washington.edu/home/nbruin/singular315.malloc.spkg (for OSX). Same as the previous item.
With the help from libsingulardevel, we managed to use xalloc instead of malloc. We now obtain Singular executable and library, and can still use debugging tools.
Install
The spkg contains singular_15435.patch and singular_part_of_changeset_baadc0f7.patch, which backport upstream fixes.
It usually builds with omalloc. If export SINGULAR_XALLOC=yes
, then it builds with xalloc, which is a compatibility layer for omalloc on top of malloc.
Bugs fixed
With the preliminary linux spkg, one obtains:
$ export MALLOC_CHECK_ 3 $ sage ... sage: A.<x,y> = FreeAlgebra(QQ, 2) sage: P.<x,y> = A.g_algebra({y*x:x*y}) sage: x*y *** glibc detected *** /usr/local/sage/5.5b1/local/bin/python: free():
gdb backtrace:
#0 0x00000031cfe36285 in raise () from /lib64/libc.so.6 #1 0x00000031cfe37b9b in abort () from /lib64/libc.so.6 #2 0x00000031cfe7774e in __libc_message () from /lib64/libc.so.6 #3 0x00000031cfe7da76 in malloc_printerr () from /lib64/libc.so.6 #4 0x00007fffd82d5c34 in gnc_mm_Mult_nn (F0=<optimized out>, G0=<optimized out>, r=0x2faa720) at gring.cc:507 #5 0x00007fffd82d6abc in gnc_p_Mult_mm_Common (p=0x326e8c0, m=<optimized out>, side=0, r=0x2faa720) at gring.cc:424 #6 0x00007fffd76c7860 in nc_mm_Mult_pp (p=0x342aaf0, r=0x2faa720, m=0x2fb6530) at /usr/local/sage/5.5b1/local/include/singular/gring.h: 171 #7 pp_Mult_qq (r=0x2faa720, q=0x342aaf0, p=0x2fb6530) at /usr/local/ sage/5.5b1/local/include/singular/pInline2.h:667 #8 __pyx_f_4sage_4libs_8singular_10polynomial_singular_polynomial_mul (__pyx_v_ret=0x7fffffffb3c8, __pyx_v_p=0x2fb6530, __pyx_v_q=0x342aaf0, __pyx_v_r=0x2faa720) at sage/libs/singular/polynomial.cpp:3895 #9 0x00007fffd7b096f8 in __pyx_f_4sage_5rings_10polynomial_6plural_19NCPolynomial_plural__mul_ (__pyx_v_left=0x7fffbe6bd758, __pyx_v_right=0x7fffbe6bd878, __pyx_skip_dispatch=<optimized out>) at sage/rings/polynomial/plural.cpp:12060
running it in valgrind gets you complaints about the same locations.
On OSX, MALLOC_CHECK_
is not available. But gmalloc does detect the crash.
The new spkg fixes that problem
Testing
Meanwhile two of the underlying problems where found and fixed upstream. http://sage.math.washington.edu/home/SimonKing/SAGE/spkgs/singular315.p2.spkg contains backports of these fixes. With the exeption of one unrelated problem in sage/graphs/, the whole Sage test suite passes on Linux with export MALLOC_CHECK_=yes
.
Attachments (5)
Change History (142)
comment:1 Changed 8 years ago by
comment:2 followup: ↓ 43 Changed 8 years ago by
We can also use ElectricFence? as a malloc substitute:
$ export LD_PRELOAD=libefence.so $ export EF_ALLOW_MALLOC_0=1 #apparently sage/python does a malloc(0) somewhere $ echo 512000  sudo tee /proc/sys/vm/max_map_count #efence uses mprotect A LOT $ ./sage t gdb devel/sage/sage/libs/singular/function.pyx
This gives an error in a different location. Traceback:
#0 0x00007fffce9fb3f2 in List<CanonicalForm>::isEmpty(this=0x7fffa515b000) at ./templates/ftmpl_list.cc:256 #1 0x00007fffce96a350 in multiFactorize (F=..., v=...) at facFactorize.cc:710 #2 0x00007fffce905e9d in ratSqrfFactorize (v=..., G=...) at facFactorize.h:45 #3 ratFactorize (G=..., v=..., substCheck=<optimized out>) at facFactorize.h:125 #4 0x00007fffce90303c in factorize (f=..., issqrfree=false) at cf_factor.cc:651 #5 0x00007fffce7a5b44 in singclap_factorize (f=0x7fff9e16dfd0, v=0x7fffffffb548, with_exps=0) at clapsing.cc:835 #6 0x00007fffce720d50 in jjFAC_P (res=0x7fff9e16ffc0, u=0x7fff9ea7afc0) at iparith.cc:3995 #7 0x00007fffce729c42 in iiExprArith1 (res=0x7fff9e16ffc0, a=0x7fff9ea7afc0, op=424) at iparith.cc:7869 #8 0x00007fffce12df55 in __pyx_f_4sage_4libs_8singular_8function_17KernelCallHandler_handle_call(__pyx_v_self=0x7fff9e967b48, __pyx_v_argument_list=0x7fff9e975b40, __pyx_optional_args=<optimized out>) at sage/lib/singular/function.cpp:11286
All these error point in the same direction, though: As soon as we start interacting with the singular interpreter we get exposed to whatever memory management it does and we're in big trouble. Use of libsingular on more basic levels usually doesn't expose us to libsingular memory management, so no problems arise.
comment:3 Changed 8 years ago by
 Cc SimonKing added
comment:4 in reply to: ↑ description ; followup: ↓ 5 Changed 8 years ago by
$ export MALLOC_CHECK_ 3 $ sage ... sage: A.<x,y> = FreeAlgebra (QQ, 2) sage: P.<x,y> = A.g_algebra({y*x:x*y}) sage: x*y *** glibc detected *** /usr/local/sage/5.5b1/local/bin/python: free():
I do not see this in my current sagemath5.4 rpm package, but I suspect it may have been either corrected or worked around by one of the patches at http://pkgs.fedoraproject.org/cgit/Singular.git/tree most likely Singularundefined.patch
, but the padding added by SingularM2_memutil_debuggging.patch
may be actually hiding some problem. Or, it may be something different, the singular rpm was known to not build properly or generate proper binaries if an older singular was installed but I did not care much about that because official rpms are built in a clean chroot.
comment:5 in reply to: ↑ 4 ; followup: ↓ 6 Changed 8 years ago by
Replying to pcpa:
I do not see this in my current sagemath5.4 rpm package
Thanks for checking!
Are you sure your singular gets built to use the glibc malloc for all small allocations and not omalloc on "malloc"ed pages? You may try setting EXTRA=0
to remove the padding. I don't expect that padding comes into play, though. I suspect a double free or a free on a pointer that doesn't point to a malloced block to begin with.
comment:6 in reply to: ↑ 5 ; followup: ↓ 7 Changed 8 years ago by
Replying to nbruin:
Replying to pcpa:
I do not see this in my current sagemath5.4 rpm package
Thanks for checking!
Are you sure your singular gets built to use the glibc malloc for all small allocations and not omalloc on "malloc"ed pages? You may try setting
EXTRA=0
to remove the padding. I don't expect that padding comes into play, though. I suspect a double free or a free on a pointer that doesn't point to a malloced block to begin with.
Sorry, I completely misunderstood what was mean't by the spkg and the 'withemulateomalloc' option. I will try to debug more later, but I cannot build the Singular rpm with that option as it always fails in the kernel subdirectory due to missing omEmulate* declarations.
comment:7 in reply to: ↑ 6 Changed 8 years ago by
Sorry, I completely misunderstood what was mean't by the spkg and the 'withemulateomalloc' option. I will try to debug more later, but I cannot build the Singular rpm with that option as it always fails in the kernel subdirectory due to missing omEmulate* declarations.
Yes, my edits in the linkedto singular spkgs add declarations to make those errors go away. I didn't exactly respect the rest of build options in the process, though (because I didn't understand them). Cleaning that up to a welldefined patch to the singular package would be a great help.
comment:8 Changed 8 years ago by
I am trying to build Sage5.5.rc0 with the malloc singular spkg for linux. Platform: openSuse
12.1, on Intel Core2 Duo.
What I tried: Untar the sage tarball, replace the usual Singular spkg with yours, and type make. Singular fails to build, because of
g++ O2 I. I.. I/home/simon/SAGE/debug/sage5.5.rc0/local I/home/simon/SAGE/debug/sage5.5.rc0/local/include I/home/simon/SAGE/debug/sage5.5.rc0/local/include I/home/simon/SAGE/debug/sage5.5.rc0/local/include I/usr/local/include DNDEBUG DOM_NDEBUG Dx86_64_Linux DHAVE_CONFIG_H DESINGULAR DPROTO o ESingular emacs.cc ../kernel/fegetopt.o \ rdynamic L/home/simon/SAGE/debug/sage5.5.rc0/local/kernel L../kernel lkernel L/home/simon/SAGE/debug/sage5.5.rc0/local/lib L/home/simon/SAGE/debug/sage5.5.rc0/local/lib L/usr/local/lib ldl lm lsingfac lsingcf lntl lgmp lreadline lncurses lnsl lm lnsl lbsd lomalloc lpthread ../kernel/mmalloc.o /usr/lib64/gcc/x86_64suselinux/4.6/../../../../x86_64suselinux/bin/ld: cannot find lbsd
Could it be that it is a problem with SAGE_ROOT/local/lib
versus SAGE_ROOT/local/lib64
? That used sometimes to be a problem with openSuse
.
comment:9 Changed 8 years ago by
No, libdir is used in spkginstall. But I wonder: The original Singular spkg contains three Makefiles: in dyn_modules/python, in calldummy/ and in omalloc/Misc/dlmalloc. But the malloc Singular package from here contains 14 Makefiles. Could it be that some of them should better be created by ./configure
?
comment:10 Changed 8 years ago by
The spkg also contains several files created by configure scripts. Is that really intended?
comment:11 Changed 8 years ago by
When I removed all autogenerated files from your spkg, installing singular315.malloclinux succeeded!
I guess you are aware that one is supposed to leave the Singular source files untouched when creating an spkg  after all, Singularmalloc is for testing only.
comment:12 Changed 8 years ago by
Building sage5.5.rc0 did almost succeed.
... gcc fnostrictaliasing fwrapv DNDEBUG g fwrapv O3 Wall Wstrictprototypes fPIC I/home/simon/SAGE/debug/sage5.5.rc0/local/include/singular/ I/home/simon/SAGE/debug/sage5.5.rc0/local/include I/home/simon/SAGE/debug/sage5.5.rc0/local/include/csage I/home/simon/SAGE/debug/sage5.5.rc0/devel/sage/sage/ext I/home/simon/SAGE/debug/sage5.5.rc0/local/include/python2.7 c sage/algebras/letterplace/free_algebra_element_letterplace.cpp o build/temp.linuxx86_642.7/sage/algebras/letterplace/free_algebra_element_letterplace.o w cc1plus: warning: command line option ‘Wstrictprototypes’ is valid for Ada/C/ObjC but not for C++ [enabled by default] cc1plus: warning: command line option ‘Wstrictprototypes’ is valid for Ada/C/ObjC but not for C++ [enabled by default] In file included from /home/simon/SAGE/debug/sage5.5.rc0/local/include/singular/structs.h:13:0, from /home/simon/SAGE/debug/sage5.5.rc0/local/include/libsingular.h:6, from sage/algebras/letterplace/free_algebra_element_letterplace.cpp:283: /home/simon/SAGE/debug/sage5.5.rc0/local/include/omalloc.h:784:30: fatal error: omalloc/omMalloc.h: Datei oder Verzeichnis nicht gefundenIn file included from /home/simon/SAGE/debug/sage5.5.rc0/local/include/singular/structs.h:13:0, from /home/simon/SAGE/debug/sage5.5.rc0/local/include/libsingular.h:6, from sage/algebras/letterplace/free_algebra_letterplace.cpp:283: /home/simon/SAGE/debug/sage5.5.rc0/local/include/omalloc.h:784:30: fatal error: omalloc/omMalloc.h: Datei oder Verzeichnis nicht gefunden compilation terminated.
So, that looks like coming from Plural.
On the other hand, your example in the ticket description does use plural.
What went wrong here? I did erase src/omalloc/omMalloc.h from your spkg, since it is not part of the "original" Singular spkg (hence, I thought it was autogenerated).
comment:13 followup: ↓ 14 Changed 8 years ago by
PS: Here is a part of the install log for singularmalloc:
updating cache .././config.cache creating ./config.status creating Makefile creating omConfig.h creating omlimits.h creating omMalloc.h configuring in factory running /bin/sh ./configure prefix=/home/simon/SAGE/debug/sage5.5.rc0/local execprefix=/home/simon/SAGE/debug/sage5.5.rc0/local bindir=/home/simon/SAGE/debug/sage5.5.rc0/local/bin libdir=/home/simon/SAGE/debug/sage5.5.rc0/local/lib withapint=gmp withmalloc=system withemulateomalloc withNTL withoutMP withoutlex withoutBoost withoutflint enableSingular enablefactory enablelibfac enableIntegerProgramming disabledoc withdebug includedir=/home/simon/SAGE/debug/sage5.5.rc0/local/include enableomalloc withexternalconfig_h='/home/simon/SAGE/debug/sage5.5.rc0/spkg/build/singular315.malloclinux/src/Singular/omSingularConfig.h' withtrackcustom enablePlural withfactory withlibfac withSingular=yes cachefile=.././config.cache srcdir=.
So, the missing header actually is created.
Also, I don't understand why omMalloc.h
is needed: In a previous Sage installation with the standard Singular 315 package, I can't find omMalloc.h
in SAGE_ROOT/local/...
.
comment:14 in reply to: ↑ 13 Changed 8 years ago by
Replying to SimonKing:
So, the missing header is actually is created.
Yes, indeed, and that is the one I copy over manually because I didn't figure out how to do it in the install script (that should be straightforward though)
Also, I don't understand why
omMalloc.h
is needed: In a previous Sage installation with the standard Singular 315 package, I can't findomMalloc.h
inSAGE_ROOT/local/...
.
It might not be. The reason why it is needed now is because of the rough way I mutilated the omalloc source in Singular. Its configuration process is a big tangle of pasting together header files, so when I finally found a way to effect the desired changes I was already happy. That involved knocking out some #ifdef ... #endif
preprocessor directives in the source. I'm sure it's straightforward to fix that a little more nicely (I wasn't aiming for production code at the time, just a quick tool to debug stuff). That activated a #include <omalloc/omAlloc.h>
as well.
I don't know if errors happen if you delete that line.
Anyway, the libsingular*.pyx
files import the header file in which that include ends up, so that's why presently, that file needs to be in the include paths used by sage b
.
I wasn't exaggerating when I called the package rather untidy ...
comment:15 Changed 8 years ago by
 Cc fbissey added
comment:16 Changed 8 years ago by
I managed to build Sage with Singularmalloc. What I did to make it work:
 open the spkg
 remove all autogenerated files
 change src/omalloc/Makefile.in so that the freshly built
omMalloc.h
is installed in$(includedir)/omalloc
(creating that directory first)
Apparently the singular executable is not usable (when I try to start it, it simply seems to hang).
I can reproduce the crash from the ticket description:
simon@linuxsqwp:~/SAGE/debug/sage5.5.rc0> export MALLOC_CHECK_=3 simon@linuxsqwp:~/SAGE/debug/sage5.5.rc0> ./sage   Sage Version 5.5.rc0, Release Date: 20121117   Type "notebook()" for the browserbased notebook interface.   Type "help()" for help.   ********************************************************************** * * * Warning: this is a prerelease version, and it may be unstable. * * * ********************************************************************** sage: A.<x,y> = FreeAlgebra(QQ, 2) sage: P.<x,y> = A.g_algebra({y*x:x*y}) sage: x*y *** glibc detected *** python: free(): invalid pointer: 0x0000000002806c90 *** ======= Backtrace: ========= /lib64/libc.so.6(+0x766d6)[0x7fa9ed9cc6d6] /home/simon/SAGE/debug/sage5.5.rc0/local/lib/libsingular.so(_Z14gnc_mm_Mult_nnPiS_P9sip_sring+0x294)[0x7fa9d35c7c54] /home/simon/SAGE/debug/sage5.5.rc0/local/lib/libsingular.so(_Z20gnc_p_Mult_mm_CommonP8spolyrecS0_iP9sip_sring+0x1ec)[0x7fa9d35c8acc] /home/simon/SAGE/debug/sage5.5.rc0/local/lib/python2.7/sitepackages/sage/libs/singular/polynomial.so(+0x4bd0)[0x7fa9d29a9bd0] /home/simon/SAGE/debug/sage5.5.rc0/local/lib/python2.7/sitepackages/sage/rings/polynomial/plural.so(+0x1fbd8)[0x7fa9d2deebd8] /home/simon/SAGE/debug/sage5.5.rc0/local/lib/python2.7/sitepackages/sage/structure/element.so(+0x2a399)[0x7fa9e4bed399] ...
So, that's a good platform to try to hunt the Heisenbugs from #12215, #715 and friends once and for all.
comment:17 followup: ↓ 18 Changed 8 years ago by
How do I make it create a core dump, and how do I open it with gdb? I recall something with "ulimit c", but I can't recall the details...
sage gdb
does not work, by the way:
simon@linuxsqwp:~/SAGE/debug/sage5.5.rc0> ./sage gdb   Sage Version 5.5.rc0, Release Date: 20121117   Type "notebook()" for the browserbased notebook interface.   Type "help()" for help.   ********************************************************************** * * * Warning: this is a prerelease version, and it may be unstable. * * * ********************************************************************** /home/simon/SAGE/debug/sage5.5.rc0/local/bin/sageipython GNU gdb (GDB) SUSE (7.341.1.2) Copyright (C) 2011 Free Software Foundation, Inc. License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html> This is free software: you are free to change and redistribute it. There is NO WARRANTY, to the extent permitted by law. Type "show copying" and "show warranty" for details. This GDB was configured as "x86_64suselinux". For bug reporting instructions, please see: <http://www.gnu.org/software/gdb/bugs/>... Reading symbols from /home/simon/SAGE/debug/sage5.5.rc0/local/bin/python...done. [Thread debugging using libthread_db enabled] Python 2.7.3 (default, Nov 21 2012, 11:21:15) [GCC 4.6.2] on linux2 Type "help", "copyright", "credits" or "license" for more information.  ImportError Traceback (most recent call last) /home/simon/SAGE/debug/sage5.5.rc0/local/lib/python2.7/sitepackages/IPython/ipmaker.pyc in force_import(modname, force_reload) 61 reload(sys.modules[modname]) 62 else: > 63 __import__(modname) 64 65 ImportError: No module named ipy_profile_sage Error importing ipy_profile_sage  perhaps you should run %upgrade? WARNING: Loading of ipy_profile_sage failed. sage: A.<x,y> = FreeAlgebra(QQ, 2)  File "<ipython console>", line 1 A.<x,y> = FreeAlgebra(QQ, 2) ^ SyntaxError: invalid syntax
comment:18 in reply to: ↑ 17 ; followup: ↓ 19 Changed 8 years ago by
Replying to SimonKing:
How do I make it create a core dump, and how do I open it with gdb? I recall something with "ulimit c", but I can't recall the details...
I'd guess ulimit c unlimited
.
sage gdb
does not work, by the way:
Yes, I had the same problem (and it's not singularmalloc's fault). It seems to be a bad interaction between ipython and gdb. sage t gdb
does work, so I've just worked around it by wrapping it in a little test file:
tester.py
:
def t(): r""" sage: A.<x,y> = FreeAlgebra(QQ, 2) sage: P.<x,y> = A.g_algebra({y*x:x*y}) sage: x*y """ pass
Congratulations on getting my frankenspkg to build ...
comment:19 in reply to: ↑ 18 Changed 8 years ago by
comment:20 Changed 8 years ago by
By the way, what is the ultimate aim of this ticket? Replacing Singular's memory management by malloc makes Singular very slow, if I understand correctly. So, the aim is to provide a tool to investigate flaws in SageSingular interrelation, right?
So, could one say that the aim is to provide an optional package?
comment:21 Changed 8 years ago by
First observation: The crash has nothing to do with coercion.
sage: A.<x,y> = FreeAlgebra(QQ, 2) sage: P.<x,y> = A.g_algebra({y*x:x*y}) sage: x._mul_(y) *** glibc detected *** python: free(): invalid pointer: 0x0000000002dd86c0 ***
That's a good news, because the absence of coercion makes life a lot easier...
comment:22 Changed 8 years ago by
Now the bad news: By inserting print statements into the _mul_ method, I found that the crash occurs in the line
singular_polynomial_mul(&_p, left._poly, (<NCPolynomial_plural>right)._poly, (<NCPolynomialRing_plural>left._parent)._ring)
I was somehow hoping that it would happen in the new_NCP
function.
comment:23 Changed 8 years ago by
News become worse: The crash happens in pp_Mult_qq
, so that I start to believe it might be an upstream problem.
comment:24 Changed 8 years ago by
The pp_Mult_qq
function has three arguments: Two pointers p,q to polynomials, and a pointer r to a ring.
When I tested, the hex address of the polynomials resp. the ring being pointed at were
p = 0x2dd3e10 q = 0x2dd38a0 r = 0x2dfa7e0
But I can't relate these figures to the address reported by the backtrace:
*** glibc detected *** python: free(): invalid pointer: 0x0000000002bb4ee0 ***
Do you have an idea how this can be interpreted?
comment:25 Changed 8 years ago by
Looking into the upstream sources and trying to find what the gdb backtrace is telling, I see in line 667 of pInline2.h
:
PINLINE2 poly pp_Mult_qq(poly p, poly q, const ring r) { poly last; if (p == NULL  q == NULL) return NULL; if (pNext(p) == NULL) { #ifdef HAVE_PLURAL if (rIsPluralRing(r)) return nc_mm_Mult_pp(p, q, r); #endif return r>p_Procs>pp_Mult_mm(q, p, r, last); }
and line 667 is return nc_mm_Mult_pp(p, q, r)
. That function is defined in gring.h
(why in a header file, by the way?). I find no indication of "free" being used there, except that it calls r>GetNC()>p_Procs.mm_Mult_pp(...)
, and I have no idea where that method is being defined.
So, I'll ask upstream.
comment:26 Changed 8 years ago by
I should have read your backtrace more carefully. It mentions gnc_mm_Mult_nn
, and that's what I am analysing now. A shame that one needs to recompile so many things for testing it...
comment:27 followup: ↓ 28 Changed 8 years ago by
OK, the error occurs in the function poly gnc_mm_Mult_nn(int *F0, int *G0, const ring r)
, whose definition in kernel/gring.cc
starts at line 463.
I inserted some print statements, and found the crash occurs in this chunk of code:
if (iF<=jG) /* i.e. no mixed exp_num , MERGE case */ { p_MemAdd_LengthGeneral(F, G, ExpSize/sizeof(long)); p_SetExpV(out,F,r); p_Setm(out,r); // omFreeSize((ADDRESS)F,ExpSize); printf("second freeT F,rN\n"); freeT(F,rN); printf("second freeT G,rN\n"); freeT(G,rN); printf("second return out\n"); return(out); }
The last line printed is second freeT F,rN
. Hence, F seems to be corrupted. Here is ho F is created:
int *F=(int *)omAlloc0((rN+1)*sizeof(int)); int *G=(int *)omAlloc0((rN+1)*sizeof(int)); memcpy(F, F0,(rN+1)*sizeof(int)); // pExpVectorCopy(F,F0); memcpy(G, G0,(rN+1)*sizeof(int)); // pExpVectorCopy(G,G0); F[0]=0; /* important for p_MemAdd */ G[0]=0;
Is it a problem that omAlloc0
is explicitly called? Or is it just a macro, and actually defaults to malloc with your spkg?
comment:28 in reply to: ↑ 27 ; followup: ↓ 29 Changed 8 years ago by
Replying to SimonKing:
The last line printed is
second freeT F,rN
. Hence, F seems to be corrupted. Here is ho F is created:int *F=(int *)omAlloc0((rN+1)*sizeof(int)); int *G=(int *)omAlloc0((rN+1)*sizeof(int)); memcpy(F, F0,(rN+1)*sizeof(int)); // pExpVectorCopy(F,F0); memcpy(G, G0,(rN+1)*sizeof(int)); // pExpVectorCopy(G,G0); F[0]=0; /* important for p_MemAdd */ G[0]=0;Is it a problem that
omAlloc0
is explicitly called? Or is it just a macro, and actually defaults to malloc with your spkg?
That all should be safe. (omalloc already had an omEmulateAlloc config option, but it was missing declarations and therefore Singular failed to compile. I added those. Part of it was an omMemDup
routine that takes only a pointer, so it needs to know how large the block pointed to actually is. I needed the linuxspecific malloc_usable_size
for this. It looks into the 8 bytes preceding the block, which makes valgrind complain)
The actual declarations are for the most part near omalloc.h:777
, which comes from omAllocDecl.h:280.
#define omAlloc0(size) omEmulateAlloc0(size)
which is only defined in omAllocEmulate.c
:
void* omEmulateAlloc0(size_t size) { void* addr = OM_MALLOC_MALLOC(size); memset(addr, 0, size); return addr; }
where 'OM_MALLOC_MALLOC' is 'malloc' as far as I can tell (the number of indirections involved in omalloc probably is very useful but it drove me to despair). So that looks perfectly safe to me. Furthermore, I didn't touch that file.
I suppose the corruption happens somewhere lower down. Incidentally, the valgrind report points to exactly those two snippets. So valgrind reporting for these things is really very good.
comment:29 in reply to: ↑ 28 ; followup: ↓ 30 Changed 8 years ago by
Replying to nbruin:
That all should be safe.
... which is bad, because I need someone that I can blame }:>
I suppose the corruption happens somewhere lower down. Incidentally, the valgrind report points to exactly those two snippets. So valgrind reporting for these things is really very good.
It is not clear to me what you mean by "those two snippets": The definition of omEmulateAlloc0
, or the snippets that I provided in my previous post?
comment:30 in reply to: ↑ 29 ; followups: ↓ 31 ↓ 34 Changed 8 years ago by
Replying to SimonKing:
It is not clear to me what you mean by "those two snippets":
This one:
the snippets that I provided in my previous post?
But don't take my word for it. The report is in the sagedevel
thread:
==20553== Invalid read of size 8 ==20553== at 0x2694CB7C: gnc_mm_Mult_nn(int*, int*, sip_sring*) (gring.cc:503) ==20553== by 0x2694DABB: gnc_p_Mult_mm_Common(spolyrec*, spolyrec*, int, sip_sring*) (gring.cc:424) ==20553== by 0x269C7726: p_Power(spolyrec*, int, sip_sring*) (gring.h:181) ... ==20553== Address 0x40c39c48 is 8 bytes inside a block of size 12 alloc'd ==20553== at 0x4A0762F: malloc (vg_replace_malloc.c:270) ==20553== by 0x26BADA45: omEmulateAlloc0 (omAllocEmulate.c:15) ==20553== by 0x2694C9EE: gnc_mm_Mult_nn(int*, int*, sip_sring*) (gring.cc:472) ==20553== by 0x2694DABB: gnc_p_Mult_mm_Common(spolyrec*, spolyrec*, int, sip_sring*) (gring.cc:424) ==20553== by 0x269C7726: p_Power(spolyrec*, int, sip_sring*) (gring.h:181)
Wait a minute ... valgrind complains EARLIER! Line 503 rather than 507. So valgrind already seems unhappy with
p_MemAdd_LengthGeneral(F, G, ExpSize/sizeof(long));
It's not immediately clear to me that ExpSize/sizeof(long)
really is (at most) rN+1
(see p_MemAdd.h
for the definition). The singular source is now my mostgrepped directory.
comment:31 in reply to: ↑ 30 Changed 8 years ago by
It's not immediately clear to me that
ExpSize/sizeof(long)
really is (at most)rN+1
(seep_MemAdd.h
for the definition).
In fact, it certainly is not:
int ExpSize=(((rN+1)*sizeof(int)+sizeof(long)1)/sizeof(long))*sizeof(long);
so ExpSize
gets rounded up. If sizeof(int)=4
and sizeof(long)=8
then for rN=2
we get (rN+1)*sizeof(int)=12
and ExpSize=16
(which, by the valgrind report, is probably exactly the case we're in). So p_MemAdd_LengthGeneral
is definitely writing out of bounds. I think this code should simply read:
int *F=(int *)omAlloc0(ExpSize); int *G=(int *)omAlloc0(ExpSize); memcpy(F, F0,(rN+1)*sizeof(int)); // pExpVectorCopy(F,F0); memcpy(G, G0,(rN+1)*sizeof(int));
EDIT: Oops, don't change the memcpy, since the source may not have such a large allocation. The target is 0initialized, so the extra space should not be a problem.
God knows why they add an array of int as an array of long in
p_MemAdd_LengthGeneral(F, G, ExpSize/sizeof(long));
Are they sure there's no carry (I guess, if these are nonnegative signed ints the sign bit would buffer any carry).
Is it viable to simply ditch Singular :?
By the way, since most architectures with sizeof(long)=8
would be 8byte aligned anyway, I doubt that this error would ever lead to real trouble. But the sloppiness surely is worrying.
comment:32 followups: ↓ 33 ↓ 42 Changed 8 years ago by
OK, making the changes outlined above do make
./sage t devel/sage/sage/libs/singular/function.pyx
pass with MALLOC_CHECK_=3
. So: yes, singularmalloc
found a real issue in the singular code (please report upstream!).
Bugs like this *could* trip up singular on architectures that have default 4byte alignment and sizeof(long) = 8
. From omTables.c
we see that bins that have a length that are not a multiple of 8 are used in that case:
#ifdef OM_ALIGN_8 size_t om_BinSize [SIZEOF_OM_BIN_PAGE / MIN_BIN_BLOCKS] = { 8, 16, 24, 32, 40, 48, 56, 64, 72, 80, 96, 112, 128, 144, 160, 192, 224, ((SIZEOF_OM_BIN_PAGE / (MIN_BIN_BLOCKS + INCR_FACTOR*9)) / 8)*8, ((SIZEOF_OM_BIN_PAGE / (MIN_BIN_BLOCKS + INCR_FACTOR*6)) / 8)*8, ((SIZEOF_OM_BIN_PAGE / (MIN_BIN_BLOCKS + INCR_FACTOR*4)) / 8)*8, ((SIZEOF_OM_BIN_PAGE / (MIN_BIN_BLOCKS + INCR_FACTOR*2)) / 8)*8, ((SIZEOF_OM_BIN_PAGE / (MIN_BIN_BLOCKS + INCR_FACTOR)) / 8)*8, OM_MAX_BLOCK_SIZE}; #else /* ! OM_ALIGN_8 */ size_t om_BinSize [SIZEOF_OM_BIN_PAGE / MIN_BIN_BLOCKS] = { 8, 12, 16, 20, 24, 28, 32, 40, 48, 56, 64, 80, 96, 112, 128, 160, 192, 224, ((SIZEOF_OM_BIN_PAGE / (MIN_BIN_BLOCKS + INCR_FACTOR*9)) / SIZEOF_STRICT_ALIGNMENT)*SIZEOF_STRICT_ALIGNMENT, ((SIZEOF_OM_BIN_PAGE / (MIN_BIN_BLOCKS + INCR_FACTOR*6)) / SIZEOF_STRICT_ALIGNMENT)*SIZEOF_STRICT_ALIGNMENT, ((SIZEOF_OM_BIN_PAGE / (MIN_BIN_BLOCKS + INCR_FACTOR*4)) / SIZEOF_STRICT_ALIGNMENT)*SIZEOF_STRICT_ALIGNMENT, ((SIZEOF_OM_BIN_PAGE / (MIN_BIN_BLOCKS + INCR_FACTOR*2)) / SIZEOF_STRICT_ALIGNMENT)*SIZEOF_STRICT_ALIGNMENT, ((SIZEOF_OM_BIN_PAGE / (MIN_BIN_BLOCKS + INCR_FACTOR)) / SIZEOF_STRICT_ALIGNMENT)*SIZEOF_STRICT_ALIGNMENT, OM_MAX_BLOCK_SIZE}; #endif /* OM_ALIGN_8 */
Running the same file with valgrind does give a couple of reports of the form:
==30816== Conditional jump or move depends on uninitialised value(s) ==30816== at 0x26942A01: hGetmem(int, int**, monrec*) (hutil.cc:1026) ==30816== by 0x26940296: hHilbStep(int*, int**, int, int*, int, int*, int) (hilb.cc:142) ==30816== by 0x2694047C: hHilbStep(int*, int**, int, int*, int, int*, int) (hilb.cc:186) ==30816== by 0x26940AEB: hSeries(sip_sideal*, intvec*, int, intvec*, sip_sideal*, sip_sring*) (hilb.cc:285) ==30816== by 0x26A62E0D: khCheck(sip_sideal*, intvec*, intvec*, int&, int&, skStrategy*) (khstd.cc:75) ==30816== by 0x2697EE07: bba(sip_sideal*, sip_sideal*, intvec*, intvec*, skStrategy*) (kstd2.cc:1214) ==30816== by 0x26973F5A: kStd(sip_sideal*, sip_sideal*, tHomog, intvec**, intvec*, int, int, intvec*) (kstd1.cc:1819) ==30816== by 0x268B3B27: jjSTD_HILB(sleftv*, sleftv*, sleftv*) (iparith.cc:3321)
and also (and this one seems to be what ElectricFence
trips on):
==30816== Invalid read of size 8 ==30816== at 0x26B8B3F2: List<CanonicalForm>::isEmpty() const (ftmpl_list.cc:256) ==30816== by 0x26AFA34F: multiFactorize(CanonicalForm const&, Variable const&) (facFactorize.cc:710) ==30816== by 0x26A95E9C: ratFactorize(CanonicalForm const&, Variable const&, bool) (facFactorize.h:45) ==30816== by 0x26A9303B: factorize(CanonicalForm const&, bool) (cf_factor.cc:651) ==30816== by 0x26935B43: singclap_factorize(spolyrec*, intvec**, int) (clapsing.cc:835) ==30816== by 0x268B0D4F: jjFAC_P(sleftv*, sleftv*) (iparith.cc:3995) ==30816== by 0x268B9C41: iiExprArith1(sleftv*, sleftv*, int) (iparith.cc:7869) ==30816== by 0x270C9F54: __pyx_f_4sage_4libs_8singular_8function_17KernelCallHandler_handle_call(__pyx_obj_4sage_4libs_8singular_8function_KernelCallHandler*, __pyx_obj_4sage_4libs_8singular_8function_Converter*, __pyx_opt_args_4sage_4libs_8singular_8function_17KernelCallHandler_handle_call*) (function.cpp:11286) ... ==30816== Address 0x40c08620 is 0 bytes after a block of size 32 alloc'd ==30816== at 0x4A06A97: operator new[](unsigned long) (vg_replace_malloc.c:363) ==30816== by 0x26AFA10B: multiFactorize(CanonicalForm const&, Variable const&) (facFactorize.cc:688) ==30816== by 0x26A95E9C: ratFactorize(CanonicalForm const&, Variable const&, bool) (facFactorize.h:45) ==30816== by 0x26A9303B: factorize(CanonicalForm const&, bool) (cf_factor.cc:651) ==30816== by 0x26935B43: singclap_factorize(spolyrec*, intvec**, int) (clapsing.cc:835) ==30816== by 0x268B0D4F: jjFAC_P(sleftv*, sleftv*) (iparith.cc:3995) ==30816== by 0x268B9C41: iiExprArith1(sleftv*, sleftv*, int) (iparith.cc:7869) ==30816== by 0x270C9F54: __pyx_f_4sage_4libs_8singular_8function_17KernelCallHandler_handle_call(__pyx_obj_4sage_4libs_8singular_8function_KernelCallHandler*, __pyx_obj_4sage_4libs_8singular_8function_Converter*, __pyx_opt_args_4sage_4libs_8singular_8function_17KernelCallHandler_handle_call*) (function.cpp:11286)
and one further report:
==30816== Invalid read of size 4 ==30816== at 0x269CE137: rOrd_is_Totaldegree_Ordering(sip_sring*) (ring.cc:2074) ==30816== by 0x269CFE12: rComplete(sip_sring*, int) (ring.cc:3469) ==30816== by 0x269D315C: rDefault(int, int, char**, int, int*, int*, int*, int**) (ring.cc:150) ==30816== by 0x269D3214: rDefault(int, int, char**) (ring.cc:167) ==30816== by 0x269C06F9: nInitChar(sip_sring*) (numbers.cc:267) ==30816== by 0x269CEF07: rComplete(sip_sring*, int) (ring.cc:3546) ==30816== by 0x268FA13F: rInit(sleftv*, sleftv*, sleftv*) (ipshell.cc:5127) ==30816== by 0x268D7607: yyparse() (grammar.y:1288) ==30816== by 0x268B9C41: iiExprArith1(sleftv*, sleftv*, int) (iparith.cc:7869) ==30816== by 0x268D7D32: yyparse() (grammar.y:659) ==30816== by 0x268ED5A7: iiAllStart(procinfo*, char*, feBufferTypes, int) (iplib.cc:316) ==30816== by 0x268ED73E: iiPStart(idrec*, sleftv*) (iplib.cc:417) ==30816== by 0x268EDC60: iiMake_proc(idrec*, sip_package*, sleftv*) (iplib.cc:539) ... ==30816== Address 0x40a7b208 is 0 bytes after a block of size 8 alloc'd ==30816== at 0x4A0762F: malloc (vg_replace_malloc.c:270) ==30816== by 0x269D31BD: rDefault(int, int, char**) (ring.cc:157) ==30816== by 0x269C06F9: nInitChar(sip_sring*) (numbers.cc:267) ==30816== by 0x269CEF07: rComplete(sip_sring*, int) (ring.cc:3546) ==30816== by 0x268FA13F: rInit(sleftv*, sleftv*, sleftv*) (ipshell.cc:5127) ==30816== by 0x268D7607: yyparse() (grammar.y:1288) ==30816== by 0x268B9C41: iiExprArith1(sleftv*, sleftv*, int) (iparith.cc:7869) ==30816== by 0x268D7D32: yyparse() (grammar.y:659) ==30816== by 0x268ED5A7: iiAllStart(procinfo*, char*, feBufferTypes, int) (iplib.cc:316) ==30816== by 0x268ED73E: iiPStart(idrec*, sleftv*) (iplib.cc:417) ==30816== by 0x268EDC60: iiMake_proc(idrec*, sip_package*, sleftv*) (iplib.cc:539)
comment:33 in reply to: ↑ 32 Changed 8 years ago by
==30816== Invalid read of size 4 ==30816== at 0x269CE137: rOrd_is_Totaldegree_Ordering(sip_sring*) (ring.cc:2074) ...
This is like shooting fish in a barrel. ring.cc:157
:
... int *order = (int *) omAlloc(2* sizeof(int)); ... return rDefault(ch,N,n,2,order,block0,block1);
calling ring.cc:127
(the variable order
above gets passed as the parameter ord
in the code below):
ring r=(ring) omAlloc0Bin(sip_sring_bin); ... r>order = ord; ... rComplete(r);
calling ring.cc:3469
:
if (rOrd_is_Totaldegree_Ordering(r)  rOrd_is_WeightedDegree_Ordering(r))
calling ring.cc:2065
:
BOOLEAN rOrd_is_Totaldegree_Ordering(ring r) { // Hmm.... what about Syz orderings? return (rVar(r) > 1 && ((rHasSimpleOrder(r) && (rOrder_is_DegOrdering((rRingOrder_t)r>order[0])  rOrder_is_DegOrdering(( rRingOrder_t)r>order[1])))  (rHasSimpleOrderAA(r) && (rOrder_is_DegOrdering((rRingOrder_t)r>order[1])  rOrder_is_DegOrdering((rRingOrder_t)r>order[2]))))); }
As you see, that last line ...r>ord[2]
is indeed out of bound on a 2*sizeof(int)
block. The valgrind session seems to indicate this code path is indeed exercised.
comment:34 in reply to: ↑ 30 Changed 8 years ago by
Replying to nbruin:
It's not immediately clear to me that
ExpSize/sizeof(long)
really is (at most)rN+1
Do you mean (rN+1)*sizeof(int)
?
comment:35 Changed 8 years ago by
 Report Upstream changed from N/A to Reported upstream. Developers acknowledge bug.
For the record: The problem is reported upstream, and Hannes seems to acknowledge that there is a problem.
comment:36 followup: ↓ 37 Changed 8 years ago by
I am a bit puzzled: I thought that I had posted a comment roughly as follows, but it seems to have vanished.
It would take some time to get a fix into a stable Singular release, and it would take even more time until the next stable Singular release is in Sage.
Hence: Could you create diff files for your fixes in the Singular source code (I am not talking about the malloc thing but about the bug fixes), so that one can put them into the patches/
directory of the Singular spkg?
comment:37 in reply to: ↑ 36 ; followup: ↓ 38 Changed 8 years ago by
Replying to SimonKing:
Hence: Could you create diff files for your fixes in the Singular source code (I am not talking about the malloc thing but about the bug fixes), so that one can put them into the
patches/
directory of the Singular spkg?
For the bug that tripped MALLOC_CHECK_
: sure. That's just a 2line change which I'm fairly confident is safe to make.
 int *F=(int *)omAlloc0((rN+1)*sizeof(int)); + int *F=(int *)omAlloc0(ExpSize);  int *G=(int *)omAlloc0((rN+1)*sizeof(int)); + int *G=(int *)omAlloc0(ExpSize);
For the second bug: I have no idea how to fix that. Just doing
 int *order = (int *) omAlloc(2* sizeof(int)) + int *order = (int *) omAlloc(3* sizeof(int))
may work but is a bit questionable.
Also I have no experience in making patch files for the singular source. It may be a bit more robust to take whatever change Singular makes to their development version and backport that.
Also: we're not done yet. There's still a branch on what valgrind thinks is an uninitialized value and there's the List<CanonicalForm>::isEmpty()
issue which ventures into templated code, so I'm at a loss for that.
Purpose of this ticket: I guess it's vague. Feel free to specify. I was actually hoping that after seeing that singularmalloc can expose issues that are otherwise hard to find, Singular might be interesting in adopting it and including it as a honest configure option (resolving the problem with the executable too). Then we can just put the proper doc in the spkg. If they don't do that, we could still do so but as a patch to singular. My goal was just to get rid of Heisenbugs in our communication with Singular.
comment:38 in reply to: ↑ 37 Changed 8 years ago by
Replying to nbruin:
Replying to SimonKing:
Hence: Could you create diff files for your fixes in the Singular source code (I am not talking about the malloc thing but about the bug fixes), so that one can put them into the
patches/
directory of the Singular spkg?For the bug that tripped
MALLOC_CHECK_
: sure. That's just a 2line change which I'm fairly confident is safe to make. int *F=(int *)omAlloc0((rN+1)*sizeof(int)); + int *F=(int *)omAlloc0(ExpSize);  int *G=(int *)omAlloc0((rN+1)*sizeof(int)); + int *G=(int *)omAlloc0(ExpSize);For the second bug: I have no idea how to fix that. Just doing
 int *order = (int *) omAlloc(2* sizeof(int)) + int *order = (int *) omAlloc(3* sizeof(int))may work but is a bit questionable.
Can you suggest that as a fix on the singular ticket? Or shall I communicate it?
Also I have no experience in making patch files for the singular source.
That's easy.
 Create a diff with context (is that called "unified"? I am not sure).
 Give it a good name and perhaps a commit message, and put it into the
patches/
folder of the Singular spkg. Just look what you currently find in that folder. The point is: The patches placed into thepatches/
folder are automatically applied before building Singular.  Do not change the files in
src/
! They are supposed to be identical with the official Singular 315 source files.  Rename the spkg into
singular315.p2.spkg
.  Update
SPKG.txt
, describing the new patch.  Both the
patches/
folder andSPKG.txt
are under version control, hence, you also need tohg commit
.  Pack the spkg, post it somewhere, and ask me or someone else for a review.
The same patch could be put into your mallocSingular/patches/
, so that we can test whether the problem is really solved. And if it is, then the patch in the usual omalloc Singular should be fine to go.
It may be a bit more robust to take whatever change Singular makes to their development version and backport that.
Well, let's see what the Singular team thinks of your suggestion...
If they react quickly, then we could take the patch from Singular trac. If they are too slow, we take your patch from Sage trac.
Also: we're not done yet. There's still a branch on what valgrind thinks is an uninitialized value and there's the
List<CanonicalForm>::isEmpty()
issue which ventures into templated code, so I'm at a loss for that.
Sure. Could be that we will open further Singular trac tickets, and could be that there will be two or three more patches in patches/
...
Purpose of this ticket: I guess it's vague. Feel free to specify. I was actually hoping that after seeing that singularmalloc can expose issues that are otherwise hard to find, Singular might be interesting in adopting it and including it as a honest configure option (resolving the problem with the executable too). Then we can just put the proper doc in the spkg.
I see that the patches/
directory contains a subdirectory patches/conditional/
containing a patch sage_debug.patch
and a patch for cygwin. The changes that you've done in the Singular source code to use malloc could be put into patches/conditional/
. Probably spkginstall needs to be changed to take the new option into account.
If they don't do that, we could still do so but as a patch to singular.
I think it will be a patch to the Singular spkg in any case. Waiting for Singular 316 (which will hopefully contain a fix for the memory problem) is probably no option.
comment:39 followup: ↓ 41 Changed 8 years ago by
I've put the following two files into the patches/
folder of the singular315.malloclinux spkg:
sage_trac13731_gnc_mm_Mult_nn.patch

src/kernel/gring.cc
# HG changeset patch # User Simon King <simon.king@unijena.de> # Date 1353701669 3600 # Node ID 0f5e07666bc133a0d38cb846120c0103c4c2b0ae # Parent 3eb11c08dd9611b7fdba4622cc0d1660527f9e96 #13731: Fix allocated size of F and G in gnc_mm_Mult_nn diff git a/src/kernel/gring.cc b/src/kernel/gring.cc
a b 469 469 int rN=r>N; 470 470 int ExpSize=(((rN+1)*sizeof(int)+sizeof(long)1)/sizeof(long))*sizeof(long); 471 471 472 int *F=(int *)omAlloc0( (rN+1)*sizeof(int));473 int *G=(int *)omAlloc0( (rN+1)*sizeof(int));472 int *F=(int *)omAlloc0(ExpSize); 473 int *G=(int *)omAlloc0(ExpSize); 474 474 475 475 memcpy(F, F0,(rN+1)*sizeof(int)); 476 476 // pExpVectorCopy(F,F0);

 patches/sage_trac13731_size_of_order.patch

src/kernel/ring.cc
# HG changeset patch # User Simon King <simon.king@unijena.de> # Date 1353701944 3600 # Node ID 3ab3d75bf03794d94037c3e33711fdd773d858dc # Parent 0f5e07666bc133a0d38cb846120c0103c4c2b0ae #13731: Fix allocated size for order (in ring.cc) diff git a/src/kernel/ring.cc b/src/kernel/ring.cc
a b 154 154 ring rDefault(int ch, int N, char **n) 155 155 { 156 156 /*order: lp,0*/ 157 int *order = (int *) omAlloc( 2* sizeof(int));157 int *order = (int *) omAlloc(3* sizeof(int)); 158 158 int *block0 = (int *)omAlloc0(2 * sizeof(int)); 159 159 int *block1 = (int *)omAlloc0(2 * sizeof(int)); 160 160 /* ringorder dp for the first block: var 1..N */

However, after reinstalling the package, the crash persists!
sage: A.<x,y> = FreeAlgebra(QQ, 2) sage: P.<x,y> = A.g_algebra({y*x:x*y}) sage: x*y *** glibc detected *** python: free(): invalid pointer: 0x0000000002bb11c0 *** ======= Backtrace: ========= /lib64/libc.so.6(+0x766d6)[0x7f8243e866d6] /home/simon/SAGE/debug/sage5.5.rc0/local/lib/libsingular.so(_Z14gnc_mm_Mult_nnPiS_P9sip_sring+0x294)[0x7f8229a81c54] /home/simon/SAGE/debug/sage5.5.rc0/local/lib/libsingular.so(_Z20gnc_p_Mult_mm_CommonP8spolyrecS0_iP9sip_sring+0x1ec)[0x7f8229a82acc] /home/simon/SAGE/debug/sage5.5.rc0/local/lib/python2.7/sitepackages/sage/libs/singular/polynomial.so(+0x4bd0)[0x7f8228e63bd0] /home/simon/SAGE/debug/sage5.5.rc0/local/lib/python2.7/sitepackages/sage/rings/polynomial/plural.so(+0x1fbd8)[0x7f82292a8bd8] /home/simon/SAGE/debug/sage5.5.rc0/local/lib/python2.7/sitepackages/sage/structure/element.so(+0x2a399)[0x7f823b0a7399] /home/simon/SAGE/debug/sage5.5.rc0/local/lib/libpython2.7.so.1.0(+0x48a4f)[0x7f8244a63a4f]
That's bad. Didn't you say the change fixes it for you?
comment:40 Changed 8 years ago by
Aha! You also changed spkginstall, so that the patches are actually not applied! That is of course the reason why it didn't work.
comment:41 in reply to: ↑ 39 ; followup: ↓ 44 Changed 8 years ago by
Replying to SimonKing:
I've put the following two files into the
patches/
folder of the singular315.malloclinux spkg:
[...]
Perhaps backport this instead (should be pretty straightforward to replace the patches):
http://www.singular.unikl.de:8002/trac/changeset/15435
It looks like the second issue is also addressed. Are the other issues within your reach to investigate? I'm not really comfortable tracing through templated C++.
The FC16 packaged valgrind didn't work for me due to unsupported machine instructions, but the latest valgrind release fixes that and installing that in /usr/local was a breeze! I can thoroughly recommend using it. The information it gives seem spoton. (I was able to diagnose both these issues by just looking at the valgrind report and tracing through the code from allocation to access error. It won't always be that straightforward, but one can hope)
With
http://sage.math.washington.edu/home/nbruin/sage.supp
the amount of false (or at least for me uninteresting) reports is bearable.
It's unlikely, but the errors we've found could lead to random corruptions and hence explain the peculiar systemdependent bugs we've experienced.
I'm afraid we're not done yet, though: Although we haven't found a GC mismatch here, we're pretty sure they exist. Also these corruptions should cause bad problems all around, not just after introducing GC.
comment:42 in reply to: ↑ 32 Changed 8 years ago by
==30816== Conditional jump or move depends on uninitialised value(s) ==30816== at 0x26942A01: hGetmem(int, int**, monrec*)
I think this one is benign, because rewriting the code to the functionally equivalent
hutil.cc:1020:
scfmon hGetmem(int lm, scfmon old, monp monmem) { scfmon x = monmem>mo; int lx = monmem>a; if ((x==NULL)  (lm > lx)) {  if ((x!=NULL)&&(lx>0)) omFreeSize((ADDRESS)x, lx * sizeof(scmon)); + if (x!=NULL) if (lx>0) omFreeSize((ADDRESS)x, lx * sizeof(scmon)); monmem>mo = x = (scfmon)omAlloc(lm * sizeof(scmon)); monmem>a = lm; } memcpy(x, old, lm * sizeof(scmon)); return x; }
makes the reports go away. I guess gcc optimizes out the shortcutting because it's not a benefit here and valgrind doesn't see through the fact that the undefined value doesn't influence the outcome.
comment:43 in reply to: ↑ 2 Changed 8 years ago by
Replying to nbruin:
#0 0x00007fffce9fb3f2 in List<CanonicalForm>::isEmpty(this=0x7fffa515b000) at ./templates/ftmpl_list.cc:256 #1 0x00007fffce96a350 in multiFactorize (F=..., v=...) at facFactorize.cc:710
I think this one is rather straightforward too (and a genuine outofbounds reference!): facFactorize.cc:688
CFList* bufAeval2= new CFList [A.level()  2]; ... evaluationWRTDifferentSecondVars (bufAeval2, bufEvaluation, A); for (int j= 0; j < A.level()  1; j++) { if (!bufAeval2[j].isEmpty()) counter++; }
so this queries all elements bufAeval2[0],...,bufAeval2[A.level()2]
. However,
if this allocates an array as it does in C, then the new
command above only
creates
bufAeval2[0],...,bufAeval2[A.level()3]
(i.e., A.level()2 of them, but
0based.
The initialization by evaluationWRTDifferentSecondVars
seems to corroborate
that:
facFqFactorize.cc:1778
void evaluationWRTDifferentSecondVars (CFList*& Aeval, const CFList& evaluation, const CanonicalForm& A) { CanonicalForm tmp; CFList tmp2; CFListIterator iter; for (int i= A.level(); i > 2; i) { ... if (preserveDegree) Aeval [i  3]= tmp2; else Aeval [i  3]= CFList(); } }
So only Aeval[0], ..., Aeval[A.level()3]
get initialized.
Thus, the reference to !bufAeval2[j].isEmpty()
with j = A.level()  2
indeed seems out of bounds to me. The abundance of bound errors in Singular is really making me feel uncomfortable. The Singular team should really take their memory audits a little more seriously. They're playing russian roulette with mathematical correctness.
comment:44 in reply to: ↑ 41 Changed 8 years ago by
Replying to nbruin:
Replying to SimonKing:
I've put the following two files into the
patches/
folder of the singular315.malloclinux spkg:[...]
Perhaps backport this instead (should be pretty straightforward to replace the patches):
Yes, that looks like the right thing to do.
My plan is to take your Singularmalloc spkg and turn it into a "proper" spkg, in the sense that the changes that you did to the source code will become patches that are optionally applied (say, if one does export SAGE_SINGULA_MALLOC=yes
).
The backported fix will of course be an unconditional patch.
It looks like the second issue is also addressed. Are the other issues within your reach to investigate? I'm not really comfortable tracing through templated C++.
Neither am I. But at least (in contrast to me!) you are comfortable to work with valgrind! Perhaps you could open a ticket to make your .supp
file available to people? After all, there is a totally outdated valgrind spkg...
comment:45 followup: ↓ 48 Changed 8 years ago by
Now I try to patch the standard Singular spkg so that one can optionally use malloc. But I have two questions on how you created the singularmalloc spkg.
Recall that I have removed (or at least I tried to remove) all autogenerated files that I found in your spkg in order to make it work on my openSuse
laptop. The changes compared with the original sources are:
M src/Singular/Makefile.in M src/Singular/Minor.h M src/Singular/configure M src/Singular/configure.in M src/Singular/sing_win.cc M src/factory/assert.h M src/factory/cf_chinese.cc M src/factory/cf_util.cc M src/factory/cf_util.h M src/factory/facBivar.h M src/factory/facFactorize.h M src/factory/facFqBivar.cc M src/omalloc/Makefile.in M src/omalloc/omAllocDecl.h M src/omalloc/omAllocFunc.c M src/omalloc/omMallocSystem.h M src/omalloc/omalloc.c M src/omalloc/pmalloc.h ? src/factory/gen_cf_gmp.sh ? src/omalloc/omTables.h ? src/omalloc/omTables.inc
So, first question: Is it correct that all three new files are needed for malloc support? I'd rather think that gen_cf_gmp.sh
should not be here, because it contains explicit paths:
#!/bin/sh GMP_H_T="gen_cf_gmp.o: gen_cf_gmp.cc /usr/local/sage/5.5b1/local/include/gmp.h \ " GMP_H=`echo $GMP_H_T sed e 's/^.*gmp.cc//' e 's/ .$//'` echo generating cf_gmp.h from $GMP_H cat $GMP_H  grep v __GMP_DECLSPEC_XX grep v std::FILE > cf_gmp.h
Second question: Is it not the case that src/Singular/configure
is autogenerated from src/Singular/configure.in
? Or is it really essential that src/Singular/configure
is as in your spkg?
comment:46 Changed 8 years ago by
PS: Actually it seems to me that the changes in src/Singular/configure.in
and src/Singular/configure
are due to the patch patches/singular_trac_443.patch
that is to be applied anyway.
comment:47 Changed 8 years ago by
src/omalloc/omTables.inc
doesn't state that it is autogenerated, but it certainly looks like.
So, third question: Did you write src/omalloc/omTables.inc
yourself? Or can I safely delete it?
comment:48 in reply to: ↑ 45 ; followup: ↓ 49 Changed 8 years ago by
Replying to SimonKing:
M src/omalloc/omAllocDecl.h M src/omalloc/omAllocFunc.c M src/omalloc/omMallocSystem.h M src/omalloc/omalloc.c M src/omalloc/pmalloc.hSo, first question: Is it correct that all three new files are needed for malloc support?
I didn't make any new files and I only seriously edited things in the omalloc
directory. I may have done some rough hacking in config files to ensure that the changes took effect, but I believe that it should be possible to do without by someone who know how to pass the correct options to configure etc.
To make the "spkg" I only did a "make clean" and a command to make the dir into a spkg. I disabled the "patch application" in spkginstall because that made it fail the second time I ran it (which is inconvenient when you're editing the package!)
Note that malloc_usable_size
for glibc has the same function as malloc_size
on BSD (OSX), so if there's a convenient config option to choose between the two you can make the spkg so that it compiles on both.
The omalloc package is obviously written in such a way that it can build in different ways, depending on config options passed at buildtime. What I did was resurrect one of those options (it didn't provide all the functionality that singular needs). It should be possible to adapt my changes accordingly. Then it's not essential to conditionally apply the patches. It's just that I have zero experience with all the "#ifdef" voodoo that seems to be associated with productionquality code.
I found that "meld" was a very convenient way of browsing the changes I'd made (comparing against a pristine directory). It can do a graphically displayed diff on an entire tree. It's what I used to make the linux package from the OSX one I had.
comment:49 in reply to: ↑ 48 ; followup: ↓ 50 Changed 8 years ago by
Replying to nbruin:
Replying to SimonKing:
M src/omalloc/omAllocDecl.h M src/omalloc/omAllocFunc.c M src/omalloc/omMallocSystem.h M src/omalloc/omalloc.c M src/omalloc/pmalloc.hSo, first question: Is it correct that all three new files are needed for malloc support?
I didn't make any new files and I only seriously edited things in the
omalloc
directory.
OK, so, I assume you only edited the five files indicated above, and did no other changes by yourself.
I may have done some rough hacking in config files to ensure that the changes took effect, but I believe that it should be possible to do without by someone who know how to pass the correct options to configure etc.
Well, I don't know.
To make the "spkg" I only did a "make clean" and a command to make the dir into a spkg.
That means that all patches from the patches/
directory are applied in your code, and if make clean
is imperfect then it also means that some autogenerated (machine dependent) files are in your code.
I disabled the "patch application" in spkginstall because that made it fail the second time I ran it (which is inconvenient when you're editing the package!)
Sure, because you started with code that already contains all patches.
Note that
malloc_usable_size
for glibc has the same function asmalloc_size
on BSD (OSX), so if there's a convenient config option to choose between the two you can make the spkg so that it compiles on both.
Hm. I see the following definition
src/omalloc/dlmalloc.h
/* define to 1 if you want that this implementation provides malloc/calloc/realloc/free funcs */ #ifndef OM_PROVIDE_MALLOC #define OM_PROVIDE_MALLOC 0 #endif #ifndef HAVE__USR_INCLUDE_MALLOC_H #undef HAVE__USR_INCLUDE_MALLOC_H #endif #ifdef HAVE__USR_INCLUDE_MALLOC_H #define HAVE_USR_INCLUDE_MALLOC_H 1 #endif #define OM_MALLOC_MALLOC mALLOc #define OM_MALLOC_REALLOC rEALLOc #define OM_MALLOC_FREE fREe #define OM_MALLOC_VALLOC vALLOc #define OM_MALLOC_VFREE(addr, size) OM_MALLOC_FREE(addr) #define OM_MALLOC_SIZEOF_ADDR(addr) malloc_usable_size(addr) #define cfree cFREe
I don't totally understand the comment "define to 1 if you want that this implementation provides malloc/calloc/realloc/free func". Does that perhaps mean that export OM_PROVIDE_MALLOC=1
is all what one need to use system malloc?
But I wonder: You say that malloc_usable_size
is not available on OSX. Then, the line
#define OM_MALLOC_SIZEOF_ADDR(addr) malloc_usable_size(addr)
sounds like a bug to me. I'll ask Hannes.
comment:50 in reply to: ↑ 49 ; followup: ↓ 51 Changed 8 years ago by
Replying to SimonKing:
But I wonder: You say that
malloc_usable_size
is not available on OSX. Then, the line#define OM_MALLOC_SIZEOF_ADDR(addr) malloc_usable_size(addr)sounds like a bug to me. I'll ask Hannes.
That definition only occurs in dlmalloc.h
, right? I don't think other files import that header file, so probably dlmalloc
only gets used under certain configure options. I wasn't able to trace through how omalloc gets configured/built. It seems to be a value for configure:
withmalloc=systemdlmallocgmallocpmallocexternal
(given that singular builds just fine on OSX this cannot be a real problem)
But yes, these changes are best made with input from someone familiar with omalloc, so hopefully Hannes can help.
comment:51 in reply to: ↑ 50 ; followup: ↓ 52 Changed 8 years ago by
Replying to nbruin:
It seems to be a value for configure:
withmalloc=systemdlmallocgmallocpmallocexternal
Then the question arises: Why is it not enough to pass the withmalloc=system
to Singular? Why did the additional changes become necessary that you made on five files in src/omalloc/
?
For example, you edited src/omalloc/omAllocDecl.h
to make the emulation unconditional. Is that really needed, if there is a configuration option anyway?
But yes, these changes are best made with input from someone familiar with omalloc, so hopefully Hannes can help.
I asked, but he didn't reply yet. Well, it's weekend, after all...
comment:52 in reply to: ↑ 51 ; followup: ↓ 53 Changed 8 years ago by
Replying to SimonKing:
Replying to nbruin:
It seems to be a value for configure:
withmalloc=systemdlmallocgmallocpmallocexternalThen the question arises: Why is it not enough to pass the
withmalloc=system
to Singular? Why did the additional changes become necessary that you made on five files insrc/omalloc/
?
I think I can explain that: omalloc is written to work on top of another memory manager. It doesn't call mmap
or sbrk
by itself. So I think this option selects which malloc is used to serve omalloc with pages.
dlmalloc
is simply a fullfledged memory manager that happens to be shipped with omalloc and can be used as a backend. If you look in dlmalloc.c
, there is actually an implementation of malloc_usable_size
. So I guess you can build omalloc in a way that is independent of the malloc offered by the clib and just uses system calls to mmap
or sbrk
directly.
The other configuration options (you'll have to read up on it to be sure) deal with what services omalloc provides and how it does it.
There's an option that selects whether omalloc should offer the standard 'malloc/realloc/free' interface to its own allocators.
There is also an option that selects how omalloc does its job: whether it should do what it is designed for or whether it should just honour the requests it receives by passing them on 11 to the underlying memory allocator. That's the option that had fallen into disuse within singular and had become defunct. I tried to bring its functionality back to the level required for singular. I hacked my way to getting omalloc to compile in that mode. However, I'm sure this can be done in a nicer fashion.
For example, you edited
src/omalloc/omAllocDecl.h
to make the emulation unconditional. Is that really needed, if there is a configuration option anyway?
No I'm sure it's not. I didn't figure out a way that positively confirmed for me that just passing in an option helped (partially because I couldn't interact directly with the singular configure script: the spkginstall plays games too in order to build both an executable and libsingular). But someone more familiar with how configure and conditional compilation setups usually work can probably very easily put that in order.
comment:53 in reply to: ↑ 52 ; followups: ↓ 54 ↓ 55 Changed 8 years ago by
Replying to nbruin:
I think I can explain that: omalloc is written to work on top of another memory manager.
So, that is withmalloc=...
.
There's an option that selects whether omalloc should offer the standard 'malloc/realloc/free' interface to its own allocators.
What does that mean? If one uses omalloc, why should one not want that it offers malloc/realloc/free
?
There is also an option that selects how omalloc does its job: whether it should do what it is designed for or whether it should just honour the requests it receives by passing them on 11 to the underlying memory allocator.
So, that's the option that we need, right? Is that withemulateomalloc
that you added to spkginstall?
I guess I am now experiencing the same that you did: If withemulateomalloc
is the only change, then installing the spkg fails with
In file included from ../kernel/bigintmat.h:13:0, from bigintmat.cc:10: ../kernel/intvec.h: In constructor ‘intvec::intvec(int)’: ../kernel/intvec.h:26:18: error: ‘omEmulateAlloc0’ was not declared in this scope bigintmat.cc: In member function ‘int* bigintmat::getwid(int)’: bigintmat.cc:379:31: error: ‘omStrDup’ was not declared in this scope bigintmat.cc: In member function ‘void bigintmat::pprint(int)’: bigintmat.cc:414:18: error: ‘omEmulateAlloc0’ was not declared in this scope bigintmat.cc:421:32: error: ‘omStrDup’ was not declared in this scope
comment:54 in reply to: ↑ 53 Changed 8 years ago by
Replying to SimonKing:
There's an option that selects whether omalloc should offer the standard 'malloc/realloc/free' interface to its own allocators.
What does that mean? If one uses omalloc, why should one not want that it offers
malloc/realloc/free
?
It selects whether omalloc should offer its services under those names, i.e., as a dropin replacement for the standard libc memory manager. That's not what Singular does. It uses the various omalloc routines, which offer extra options. Otherwise, changing Singular would have been a matter of "dropping out" the dropin replacement. Instead we have to jump through hoops to get omalloc to use libc's malloc (or whatever dropin replacement we want to use, like ElectricFence?) behind the scenes.
comment:55 in reply to: ↑ 53 Changed 8 years ago by
Replying to SimonKing:
I guess I am now experiencing the same that you did: If
withemulateomalloc
is the only change, then installing the spkg fails withIn file included from ../kernel/bigintmat.h:13:0, from bigintmat.cc:10: ../kernel/intvec.h: In constructor ‘intvec::intvec(int)’: ../kernel/intvec.h:26:18: error: ‘omEmulateAlloc0’ was not declared in this scope bigintmat.cc: In member function ‘int* bigintmat::getwid(int)’: bigintmat.cc:379:31: error: ‘omStrDup’ was not declared in this scope bigintmat.cc: In member function ‘void bigintmat::pprint(int)’: bigintmat.cc:414:18: error: ‘omEmulateAlloc0’ was not declared in this scope bigintmat.cc:421:32: error: ‘omStrDup’ was not declared in this scope
Yay! indeed. Those are some of the routines I provided. So yes, it's probably best if you just use my changes for information and as hints for how to provide the missing functionality, but then integrate it more cleanly (I had a couple of rounds of these; the hard one being omMemDup, which you don't seem to hit yet). The omEmulateAlloc0
is a weird one since that IS implemented in omAllocEmulate.c
I think I solved that one by simply including the relevant declaration in omAllocDecl.h
.
comment:56 Changed 8 years ago by
What I did to produce a singular spkg that produces a malloclibsingular:
 Drop attachment:sage_trac_13731.patch into the spkg's
patches/
directory  Apply the following change to spkginstall:

spkginstall
diff git a/spkginstall b/spkginstall
a b 134 134 libdir="$SAGE_LOCAL"/lib \ 135 135 withapint=gmp \ 136 136 withmalloc=system \ 137 withemulateomalloc \ 137 138 withNTL \ 138 139 withoutMP \ 139 140 withoutlex \

With that, I seem to get a working libsingular, and I get the due segfault with export MALLOC_CHECK_=3
in the example of the ticket description. However, I do not obtain a working Singular executable.
I am now testing whether the spkg works on bsd.math as well.
comment:57 Changed 8 years ago by
Unfortunately, it does not work on bsd.math: The problem appears to be that there is no malloc.h available:
In file included from ./omalloc.h:785:0, from omalloc.c:16: ../omalloc/omMalloc.h:12:20: fatal error: malloc.h: No such file or directory compilation terminated.
That's really strange. Shouldn't malloc be available system wide?
comment:58 Changed 8 years ago by
bsd.math is OS X, isn't it? I think I had trouble with that recently because malloc.h is in a subdirectory  not at the top level. You would need an extra I I guess. I'll see if I can find it.
comment:59 Changed 8 years ago by
It is in /usr/include/malloc/
comment:60 followup: ↓ 61 Changed 8 years ago by
I am puzzled.
When I put
#if defined(ix86Mac_darwin)  defined(x86_64Mac_darwin)  defined(ppcMac_darwin) #include <malloc/malloc.h> #else #include <malloc.h> #endif
in src/omalloc/omMallocSystem.h
(which is responsible for including malloc.h), things do still not work and it still complains that it can not find malloc.h. But in the installation log, I see the line
checking uname for singular... (cached) x86_64Macdarwin
and omMalloc.h
is created only after that line. Hence I don't understand why it doesn't try to pick up malloc/malloc.h
.
comment:61 in reply to: ↑ 60 Changed 8 years ago by
Replying to SimonKing:
#if defined(ix86Mac_darwin)  defined(x86_64Mac_darwin)  defined(ppcMac_darwin) #include <malloc/malloc.h> #else #include <malloc.h> #endifchecking uname for singular... (cached) x86_64Macdarwin
How nasty! It defines x86_64Macdarwin
, but the person who originally wrote the check for the platforms wrote x86_64Mac_darwin
!!
comment:62 followup: ↓ 63 Changed 8 years ago by
Rool eyes.... but not completely surprised.
comment:63 in reply to: ↑ 62 Changed 8 years ago by
Replying to fbissey:
Rool eyes.... but not completely surprised.
But on the other hand, it can not be "" (minus sign!) between Mac and darwin. Anyway. There must be a test on which platform we are. How to?
comment:64 followup: ↓ 65 Changed 8 years ago by
I am not sure what you would get for a nonmac darwin target, otherwise the easiest thing is probably gcc dumpmachine, On my 10.5.8 machine:
BlueFerniMac1:~ frb15$ gcc dumpmachine i686appledarwin9
comment:65 in reply to: ↑ 64 Changed 8 years ago by
Replying to fbissey:
I am not sure what you would get for a nonmac darwin target, otherwise the easiest thing is probably gcc dumpmachine, On my 10.5.8 machine:
BlueFerniMac1:~ frb15$ gcc dumpmachine i686appledarwin9
Thank you! But I would rather expect that those things are done during configuration, and then the type of the platform is stored in a variable. I'm asking Hans Schönemann how Singular deals with different platforms.
comment:66 Changed 8 years ago by
I have backported Singular changeset 15435.
When I put attachment:sage_trac_13731.patch and attachment:singular_15435.patch into the spkg's patches/ directory and apply the change to spkginstall mentioned in comment:56, then I get
simon@linuxsqwp:~/SAGE/debug/sage5.5.rc0> export MALLOC_CHECK_=3 simon@linuxsqwp:~/SAGE/debug/sage5.5.rc0> ./sage ... sage: A.<x,y> = FreeAlgebra(QQ, 2) sage: P.<x,y> = A.g_algebra({y*x:x*y}) sage: x._mul_(y) x*y
So, indeed the original problem got fixed upstream.
Plans:
 My suggestion is to put the second patch (the backported fix) into our standard Singular spkg.
 We should of course follow the original plan and see whether we can find further sloppy allocations in Singular, so that #715 and friends can be made work in a reliable way.
 Can someone tell me how to tell autoconf that it shall create a configure script that tells the preprocessor that we are on Darwin and hence must include malloc/malloc.h in place of malloc.h?
comment:67 followup: ↓ 69 Changed 8 years ago by
After more thinking I think you may want to use AC_CANONICAL_TARGET it is usually used for cross compilation but it should work in this case. It is much better than testing gcc since we may want to use clang.
comment:68 followup: ↓ 70 Changed 8 years ago by
Just a quick comment: The normal way of using malloc
is via #include <stdlib.h>
. It's only because we need malloc_usable_size
or malloc_size
that we need malloc.h
. You're already choosing between those using an #if defined(...)
, so why not use the same mechanism for the compile?
Given that this code should only be relevant for *debug* builds of singular, I don't think it's worth overhauling the process by which omalloc produces its header field.
The likely programmatic injection point for changes seems to be the makeheader
script which is invoked in the makefile rule for malloc.h
, but I'm not sure that that has capabilities of building header files based on conditionals by default.
comment:69 in reply to: ↑ 67 Changed 8 years ago by
Replying to fbissey:
After more thinking I think you may want to use AC_CANONICAL_TARGET it is usually used for cross compilation but it should work in this case. It is much better than testing gcc since we may want to use clang.
My question was actually not so much about "How to detect the platform with autoconf?", because I see in the existing configure.in how it can be done. The question is more "How to define a symbol, say, HAVE_DARWIN
so that the preprocessor is aware of it?".
I tried to do export HAVE_DARWIN=1
in spkgbuild, but it didn't work. I tried AC_DEFINE(HAVE_DARWIN,1)
, but it didn't work. Here, "didn't work" means that the preprocessor has always been in the else
part of the following lines:
#ifdef HAVE_DARWIN #include "malloc/malloc.h" #else #include "malloc.h" #endif
comment:70 in reply to: ↑ 68 Changed 8 years ago by
Replying to nbruin:
Just a quick comment: The normal way of using
malloc
is via#include <stdlib.h>
. It's only because we needmalloc_usable_size
ormalloc_size
that we needmalloc.h
. You're already choosing between those using an#if defined(...)
, so why not use the same mechanism for the compile?
Well, I don't know whether that way of chosing works. It definitely does not for the #include "malloc.h"
versus #include "malloc/malloc.h"
thingy.
The likely programmatic injection point for changes seems to be the
makeheader
script which is invoked in the makefile rule formalloc.h
, but I'm not sure that that has capabilities of building header files based on conditionals by default.
I haven't looked into that.> Given that this code should only be relevant for *debug* builds of singular, I don't think it's worth overhauling the process by which omalloc produces its header field.
Probably right. We should better focus on using the debug spkg to detect more memory corruptions.
Just to be on the safe side: Does attachment:singular_15435.patch fix all upstream problems that we have detected so far?
 The one from the ticket description is fixed.
 Is the one from comment:33 addressed?
 Is the one from comment:42 addressed? I understand that one isn't critical, though.
 The one from comment:43 does not seem to be addressed, and I think I didn't mention it upstream, yet.
comment:71 followup: ↓ 72 Changed 8 years ago by
I think AC_DEFINE puts stuff in config.h and you would have to include config.h before this piece of code. Having a code file changed from configure is quite some work. I would have to look it up.
comment:72 in reply to: ↑ 71 ; followup: ↓ 73 Changed 8 years ago by
Replying to fbissey:
I think AC_DEFINE puts stuff in config.h and you would have to include config.h before this piece of code.
That's probably it! Is it needed to use AC_SUBS
as well? I read somewhere that this would do something with a header file, so, it is not clear to me whether AC_DEFINE
is enough or AC_SUBS
is needed in addition.
comment:73 in reply to: ↑ 72 ; followup: ↓ 74 Changed 8 years ago by
Replying to SimonKing:
Replying to fbissey:
I think AC_DEFINE puts stuff in config.h and you would have to include config.h before this piece of code.
That's probably it! Is it needed to use
AC_SUBS
as well? I read somewhere that this would do something with a header file, so, it is not clear to me whetherAC_DEFINE
is enough orAC_SUBS
is needed in addition.
I have read more about it at http://www.gnu.org/savannahcheckouts/gnu/autoconf/manual/autoconf2.69/html_node/DefiningSymbols.html and it goes in config.h if AC_CONFIG_HEADERS is called. AC_SUBST usually acts on Makefile.in preferably generated from Makefile.am and you have to put some code for it to work IIRC. Substituting in a specific myfile.c.in is something else altogether, I need to look it up.
I don't know that stuff from the top of my head unfortunately.
comment:74 in reply to: ↑ 73 Changed 8 years ago by
Replying to fbissey:
I have read more about it at http://www.gnu.org/savannahcheckouts/gnu/autoconf/manual/autoconf2.69/html_node/DefiningSymbols.html and it goes in config.h if AC_CONFIG_HEADERS is called.
Thank you! So, I'd need to add AC_CONFIG_HEADERS
as well, because the only place where AC_CONFIG_HEADER
is used is in gfanlib/configure.ac
.
AC_SUBST usually acts on Makefile.in preferably generated from Makefile.am and you have to put some code for it to work IIRC.
Makefile.am does not exist in the Singular sources. However, AC_SUBST
is used in several places.
Anyway, I guess inserting AC_CONFIG_HEADERS
is the easiest thing to do.
comment:75 followup: ↓ 76 Changed 8 years ago by
I wouldn't add AC_CONFIG_HEADERS if it isn't already there. Are you working on the top level or at the Singular folder level? configure.in in Singular does call AC_CONFIG_HEADERS and puts definitions in mod2.h.
comment:76 in reply to: ↑ 75 Changed 8 years ago by
Replying to fbissey:
I wouldn't add AC_CONFIG_HEADERS if it isn't already there. Are you working on the top level or at the Singular folder level? configure.in in Singular does call AC_CONFIG_HEADERS and puts definitions in mod2.h.
No, it doesn't use AC_CONFIG_HEADERS but AC_CONFIG_HEADER. That's why grep didn't find it.
Even better: omalloc/configure.in uses it as well  hence, this should be the right place to put the missing symbol.
comment:77 Changed 8 years ago by
I am slightly shocked that omalloc/configure.in uses a hardcoded path to malloc.h:
AC_CHECK_HEADERS(unistd.h sys/mman.h fcntl.h /usr/include/malloc.h)
Isn't that bad?
comment:78 Changed 8 years ago by
 Cc malb added
Ccing Marting Albrecht, because he knows the Singular spkg.
Next thing I wonder about: The spkg contains src/omalloc/configure, but on top it says that this is autogenerated. By consequence, when I tried to change src/omalloc/configure.in, my changes had no effect.
So, my first guess was that configure has to be removed  but that didn't work.
Martin, do you think patching src/omalloc/configure directly is the right thing to do? Or what should I do in order to change the configuration in src/omalloc?
comment:79 followup: ↓ 80 Changed 8 years ago by
Simon, configure
is generated from configure.in
using autoconf
. However, Singular requires a special autoconf version (I think it is a particularly old one) which makes fixing things in its configure script tricky. I think this should go on [libsingulardevel]. Hans can help, I think.
comment:80 in reply to: ↑ 79 Changed 8 years ago by
Replying to malb:
Simon,
configure
is generated fromconfigure.in
usingautoconf
. However, Singular requires a special autoconf version (I think it is a particularly old one) which makes fixing things in its configure script tricky. I think this should go on [libsingulardevel]. Hans can help, I think.
I am not subscribed to libsingulardevel. Can you provide a link?
FYI:
I think this ticket has two purposes.
 Create a Singular spkg that builds both on linux and osx and replaces omalloc by malloc, so that it is easier to track down memory corruptions in Singular that became apparent by #715 and friends. It is just a tool to analyse things  in fact, it wouldn't even yield a working Singular executable.
 The study of memory corruptions should help to fix them  and the fixes should not only be upstream, but should be backported to a new patch level of our Singular 315 spkg.
The problem with src/omalloc/configure only concerns purpose 1. Hence, if directly editing src/omalloc/configure is dirty but works, then I wouldn't mind to do it, just so that it builds on osx.
Fortunately, the fixes in purpose 2. would probably not involve the configuration scripts.
comment:81 Changed 8 years ago by
Sorry, when posting the previous comment, I forgot to delete two paragraphs before submitting...
comment:82 followup: ↓ 83 Changed 8 years ago by
On libsingulardevel, Hans suggests to use xalloc, not malloc to detect memoryrelated problems.
Nils, do you know how to do those things?
comment:83 in reply to: ↑ 82 Changed 8 years ago by
Replying to SimonKing:
Nils, do you know how to do those things?
No, I don't, but it looks like you're learning how to on libsingulardevel already. The xalloc
approach seems to be what I was contemplating before I read up on malloc_size
and malloc_usable_size
. It's interesting to see that the singular people thought it was easier to write a wrapper from scratch than to amend omalloc
. I can sympathize with that, although it wasn't so bad in the end.
If xalloc does result in a usable executable, that would be great! I wonder what prevents us from getting that.
It may be that once you've built xalloc.so, you can just do LD_PRELOAD=xalloc.so
to preempt the normal omalloc routines. That's how things like ElectricFence
capture malloc action (provided omalloc doesn't get statically linked)
comment:84 Changed 8 years ago by
Sorry, my knowledge of automake is too limited. So far, I did not succeed to create a package that replaces omalloc by xalloc.
What I did (see also the thread in libsingular):
 replace omalloc/ by the contents of xalloc/ in Singular Spielwiese
 run
aclocal
in omalloc  run
autoconf
in omalloc  run
automake installmissing
in omalloc  run
automake
again in omalloc  add the line
installnolns: install
to the resultingMakefile.in
 turn all this into a conditional patch, that is applied if
export SINGULAR_XALLOC=yes
was done.
When I try to install the spkg, it fails with
make installnolns in omalloc make[1]: Entering directory `/home/simon/SAGE/debug/sage5.5.rc0/spkg/build/singular315.p2/src/omalloc' /bin/sh ./libtool tag=CC mode=compile gcc DPACKAGE_NAME=\"xalloc\" DPACKAGE_TARNAME=\"xalloc\" DPACKAGE_VERSION=\"3.1.2.sw\" DPACKAGE_STRING=\"xalloc\ 3.1.2.sw\" DPACKAGE_BUGREPORT=\"\" DPACKAGE_URL=\"\" DPACKAGE=\"xalloc\" DVERSION=\"3.1.2.sw\" DSTDC_HEADERS=1 DHAVE_SYS_TYPES_H=1 DHAVE_SYS_STAT_H=1 DHAVE_STDLIB_H=1 DHAVE_STRING_H=1 DHAVE_MEMORY_H=1 DHAVE_STRINGS_H=1 DHAVE_INTTYPES_H=1 DHAVE_STDINT_H=1 DHAVE_UNISTD_H=1 DHAVE_DLFCN_H=1 DLT_OBJDIR=\".libs/\" I. I. DHAVE_CONFIG_H DNDEBUG DOM_NDEBUG I/home/simon/SAGE/debug/sage5.5.rc0/local/include I/home/simon/SAGE/debug/sage5.5.rc0/local/include O3 O2 g fPIC MT libomalloc_ladummy.lo MD MP MF .deps/libomalloc_ladummy.Tpo c o libomalloc_ladummy.lo `test f 'dummy.c'  echo './'`dummy.c mv f .deps/libomalloc_ladummy.Tpo .deps/libomalloc_ladummy.Plo mv: Aufruf von stat für „.deps/libomalloc_ladummy.Tpo“ nicht möglich: Datei oder Verzeichnis nicht gefunden make[1]: *** [libomalloc_ladummy.lo] Fehler 1 make[1]: Leaving directory `/home/simon/SAGE/debug/sage5.5.rc0/spkg/build/singular315.p2/src/omalloc' make: *** [installnolns] Fehler 1 Unable to build and install Singular Error building Singular (error in build_singular).
Apparently the .deps directory was automatically created by the script ./depcomp
, that belongs to those installed by automake installmissing. Its content after the installation failed:
simon@linuxsqwp:~/SAGE/debug/sage5.5.rc0> ls spkg/build/singular315.p2/src/omalloc/.deps/ libomalloc_g_ladummy.Plo libomalloc_ladummy.Plo
And to be honest, I don't see the point of the dummy.c
file in xalloc, which is:
#include "omalloc.h" int om_sing_opt_show_mem; /* dummy */ struct omInfo_s om_Info; /* dummy */ struct omOpts_s om_Opts; /* dummy */
comment:85 Changed 8 years ago by
With the help of the good people of libsingulardevel, I managed to create a new singular spkg.
When you export SINGULAR_XALLOC=yes
then the spkg creates a libsingular library and a Singular executable (yes, it starts and can compute things!!) with omalloc replaced by xalloc (a compatibility layer on top of malloc). Otherwise, the spkg just builds with omalloc.
The good news:
 Singular works.
simon@linuxsqwp:~/SAGE/debug/sage5.5.rc0> ./sage singular SINGULAR / Development A Computer Algebra System for Polynomial Computations / version 315 0< by: W. Decker, G.M. Greuel, G. Pfister, H. Schoenemann \ Jul 2012 FB Mathematik der Universitaet, D67653 Kaiserslautern \ > ring r; > x; x > x*y+z; xy+z > quit; Auf Wiedersehen.
There is no crash in the example above!  Sage starts and quits just fine, even if libsingular is involved:
simon@linuxsqwp:~/SAGE/debug/sage5.5.rc0> ./sage   Sage Version 5.5.rc0, Release Date: 20121117   Type "notebook()" for the browserbased notebook interface.   Type "help()" for help.   ********************************************************************** * * * Warning: this is a prerelease version, and it may be unstable. * * * ********************************************************************** sage: P.<x,y> = QQ[] sage: x*y x*y sage: quit Exiting Sage (CPU time 0m0.05s, Wall time 0m9.80s).
 We can reproduce the error from the ticket description:
simon@linuxsqwp:~/SAGE/debug/sage5.5.rc0> export MALLOC_CHECK_=3 simon@linuxsqwp:~/SAGE/debug/sage5.5.rc0> ./sage   Sage Version 5.5.rc0, Release Date: 20121117   Type "notebook()" for the browserbased notebook interface.   Type "help()" for help.   ********************************************************************** * * * Warning: this is a prerelease version, and it may be unstable. * * * ********************************************************************** sage: A.<x,y> = FreeAlgebra(QQ, 2) sage: P.<x,y> = A.g_algebra({y*x:x*y}) sage: x._mul_(y) *** glibc detected *** python: free(): invalid pointer: 0x0000000002cf4920 *** ======= Backtrace: ========= /lib64/libc.so.6(+0x766d6)[0x7f4331a486d6] /home/simon/SAGE/debug/sage5.5.rc0/local/lib/libsingular.so(_Z14gnc_mm_Mult_nnPiS_P9sip_sring+0x301)[0x7f431764de91] /home/simon/SAGE/debug/sage5.5.rc0/local/lib/libsingular.so(_Z20gnc_p_Mult_mm_CommonP8spolyrecS0_iP9sip_sring+0x21c)[0x7f431764ef5c] /home/simon/SAGE/debug/sage5.5.rc0/local/lib/python2.7/sitepackages/sage/libs/singular/polynomial.so(+0x4c60)[0x7f4316a2cc60] /home/simon/SAGE/debug/sage5.5.rc0/local/lib/python2.7/sitepackages/sage/rings/polynomial/plural.so(+0x1d278)[0x7f4316e70278] /home/simon/SAGE/debug/sage5.5.rc0/local/lib/python2.7/sitepackages/sage/rings/polynomial/plural.so(+0xbb63)[0x7f4316e5eb63] ...
 Even better: We additionally get a crash when we do the example without
MALLOC_CHECK
, namely when quitting Sage:simon@linuxsqwp:~/SAGE/debug/sage5.5.rc0> export MALLOC_CHECK_= simon@linuxsqwp:~/SAGE/debug/sage5.5.rc0> ./sage   Sage Version 5.5.rc0, Release Date: 20121117   Type "notebook()" for the browserbased notebook interface.   Type "help()" for help.   ********************************************************************** * * * Warning: this is a prerelease version, and it may be unstable. * * * ********************************************************************** sage: A.<x,y> = FreeAlgebra(QQ, 2) sage: P.<x,y> = A.g_algebra({y*x:x*y}) sage: x._mul_(y) x*y sage: quit Exiting Sage (CPU time 0m0.18s, Wall time 0m7.89s). /home/simon/SAGE/debug/sage5.5.rc0/local/lib/libcsage.so(print_backtrace+0x31)[0x7f657e22bc77] /home/simon/SAGE/debug/sage5.5.rc0/local/lib/libcsage.so(sigdie+0x14)[0x7f657e22bca9] /home/simon/SAGE/debug/sage5.5.rc0/local/lib/libcsage.so(sage_signal_handler+0x216)[0x7f657e22b886] /lib64/libpthread.so.0(+0xfd00)[0x7f6583601d00] /home/simon/SAGE/debug/sage5.5.rc0/local/lib/libpython2.7.so.1.0(+0x129940)[0x7f6583938940] /home/simon/SAGE/debug/sage5.5.rc0/local/lib/libpython2.7.so.1.0(PyGC_Collect+0x24)[0x7f6583939694] /home/simon/SAGE/debug/sage5.5.rc0/local/lib/libpython2.7.so.1.0(Py_Finalize+0x116)[0x7f6583925316] /home/simon/SAGE/debug/sage5.5.rc0/local/lib/libpython2.7.so.1.0(Py_Exit+0x8)[0x7f6583924308] /home/simon/SAGE/debug/sage5.5.rc0/local/lib/libpython2.7.so.1.0(+0x115434)[0x7f6583924434] /home/simon/SAGE/debug/sage5.5.rc0/local/lib/libpython2.7.so.1.0(PyErr_PrintEx+0x205)[0x7f65839246d5] /home/simon/SAGE/debug/sage5.5.rc0/local/lib/libpython2.7.so.1.0(PyRun_SimpleFileExFlags+0x1fe)[0x7f6583924b4e] /home/simon/SAGE/debug/sage5.5.rc0/local/lib/libpython2.7.so.1.0(Py_Main+0xbd5)[0x7f6583938035] /lib64/libc.so.6(__libc_start_main+0xed)[0x7f6582c2523d] python[0x4006e1]  Unhandled SIGSEGV: A segmentation fault occurred in Sage. This probably occurred because a *compiled* component of Sage has a bug in it and is not properly wrapped with sig_on(), sig_off(). You might want to run Sage under gdb with 'sage gdb' to debug this. Sage will now terminate.  /home/simon/SAGE/debug/sage5.5.rc0/spkg/bin/sage: Zeile 310: 25245 Speicherzugriffsfehler sageipython "$@" i
Conclusion
I think this spkg is something we can work with to detect errors.
comment:86 Changed 8 years ago by
PS:
 I did not test whether this spkg builds on OSX.
 The next step should be to include the bug fix from singular_15435.patch and similar upstream fixes, and see whether the crash persists...
comment:87 Changed 8 years ago by
PPS: I am afraid the bug fixes and the xalloc patch do not commute. Hence, when including the fixes, I need to update the xalloc patch as well.
comment:88 Changed 8 years ago by
Hooray, it does build on OSX (bsd.math)!!
comment:89 Changed 8 years ago by
 Description modified (diff)
 Report Upstream changed from Reported upstream. Developers acknowledge bug. to Fixed upstream, in a later stable release.
comment:90 followup: ↓ 91 Changed 8 years ago by
The new Singular spkg is intended to eventually become a new patch level of the standard Singular spkg.
The spkg is not ready for review yet (for example, SPKG.txt needs to be updated, and there are uncommited changes). But it contains two bug fixes from upstream, it builds on Linux and OSX, and one can optionally build libsingular and the Singular executable based on xalloc.
The disadvantage is that there are much more changes to the upstream sources compared with Nils' spkg. So, it could very well be that in the xalloc version one sees bugs that are actually artefacts of totally stripping down omalloc.
Here is what happens with the new spkg, build with export SINGULAR_XALLOC=yes
on my openSuse
laptop:
Without MALLOC_CHECK
 Sage can not quit after executing a very basic polynomial arithmetics:
sage: P.<x,y> = QQ[] sage: x*y x*y sage: quit Exiting Sage (CPU time 0m0.04s, Wall time 0m3.72s).
The shell prompt does not return, ctrlC has no effect  I have to pkill all Python processes! Note thatP.<x,y>=QQ[]
alone does not trigger the problem.  In the example from the ticket description, one does not get a crash when multiplying two elements of a galgebra, but one gets a crash when quitting Sage afterwards:
sage: A.<x,y> = FreeAlgebra(QQ, 2) sage: P.<x,y> = A.g_algebra({y*x:x*y}) sage: x*y x*y sage: quit Exiting Sage (CPU time 0m0.19s, Wall time 0m19.09s). /home/simon/SAGE/debug/sage5.5.rc0/local/lib/libcsage.so(print_backtrace+0x31)[0x7f951301ac77] /home/simon/SAGE/debug/sage5.5.rc0/local/lib/libcsage.so(sigdie+0x14)[0x7f951301aca9] /home/simon/SAGE/debug/sage5.5.rc0/local/lib/libcsage.so(sage_signal_handler+0x216)[0x7f951301a886] /lib64/libpthread.so.0(+0xfd00)[0x7f95183f0d00] /home/simon/SAGE/debug/sage5.5.rc0/local/lib/libpython2.7.so.1.0(+0x129940)[0x7f9518727940] /home/simon/SAGE/debug/sage5.5.rc0/local/lib/libpython2.7.so.1.0(PyGC_Collect+0x24)[0x7f9518728694] /home/simon/SAGE/debug/sage5.5.rc0/local/lib/libpython2.7.so.1.0(Py_Finalize+0x116)[0x7f9518714316] /home/simon/SAGE/debug/sage5.5.rc0/local/lib/libpython2.7.so.1.0(Py_Exit+0x8)[0x7f9518713308] /home/simon/SAGE/debug/sage5.5.rc0/local/lib/libpython2.7.so.1.0(+0x115434)[0x7f9518713434] /home/simon/SAGE/debug/sage5.5.rc0/local/lib/libpython2.7.so.1.0(PyErr_PrintEx+0x205)[0x7f95187136d5] /home/simon/SAGE/debug/sage5.5.rc0/local/lib/libpython2.7.so.1.0(PyRun_SimpleFileExFlags+0x1fe)[0x7f9518713b4e] /home/simon/SAGE/debug/sage5.5.rc0/local/lib/libpython2.7.so.1.0(Py_Main+0xbd5)[0x7f9518727035] /lib64/libc.so.6(__libc_start_main+0xed)[0x7f9517a1423d] python[0x4006e1]  Unhandled SIGSEGV: A segmentation fault occurred in Sage. This probably occurred because a *compiled* component of Sage has a bug in it and is not properly wrapped with sig_on(), sig_off(). You might want to run Sage under gdb with 'sage gdb' to debug this. Sage will now terminate.  /home/simon/SAGE/debug/sage5.5.rc0/spkg/bin/sage: Zeile 310: 5308 Speicherzugriffsfehler sageipython "$@" i
With export MALLOC_CHECK=3
 Sage crashes when quitting after executing a very basic polynomial arithmetics:
sage: P.<x,y> = QQ[] sage: x*y x*y sage: quit Exiting Sage (CPU time 0m0.05s, Wall time 0m26.09s). /home/simon/SAGE/debug/sage5.5.rc0/local/lib/libcsage.so(print_backtrace+0x31)[0x7fcc69d1ac77] /home/simon/SAGE/debug/sage5.5.rc0/local/lib/libcsage.so(sigdie+0x14)[0x7fcc69d1aca9] /home/simon/SAGE/debug/sage5.5.rc0/local/lib/libcsage.so(sage_signal_handler+0x216)[0x7fcc69d1a886] /lib64/libpthread.so.0(+0xfd00)[0x7fcc6f339d00] /home/simon/SAGE/debug/sage5.5.rc0/local/lib/libpython2.7.so.1.0(+0x1296d8)[0x7fcc6f6706d8] /home/simon/SAGE/debug/sage5.5.rc0/local/lib/libpython2.7.so.1.0(+0x75bca)[0x7fcc6f5bcbca] /home/simon/SAGE/debug/sage5.5.rc0/local/lib/libpython2.7.so.1.0(+0x129977)[0x7fcc6f670977] /home/simon/SAGE/debug/sage5.5.rc0/local/lib/libpython2.7.so.1.0(PyGC_Collect+0x24)[0x7fcc6f671694] /home/simon/SAGE/debug/sage5.5.rc0/local/lib/libpython2.7.so.1.0(Py_Finalize+0x116)[0x7fcc6f65d316] /home/simon/SAGE/debug/sage5.5.rc0/local/lib/libpython2.7.so.1.0(Py_Exit+0x8)[0x7fcc6f65c308] /home/simon/SAGE/debug/sage5.5.rc0/local/lib/libpython2.7.so.1.0(+0x115434)[0x7fcc6f65c434] /home/simon/SAGE/debug/sage5.5.rc0/local/lib/libpython2.7.so.1.0(PyErr_PrintEx+0x205)[0x7fcc6f65c6d5] /home/simon/SAGE/debug/sage5.5.rc0/local/lib/libpython2.7.so.1.0(PyRun_SimpleFileExFlags+0x1fe)[0x7fcc6f65cb4e] /home/simon/SAGE/debug/sage5.5.rc0/local/lib/libpython2.7.so.1.0(Py_Main+0xbd5)[0x7fcc6f670035] /lib64/libc.so.6(__libc_start_main+0xed)[0x7fcc6e95d23d] python[0x4006e1]  Unhandled SIGSEGV: A segmentation fault occurred in Sage. This probably occurred because a *compiled* component of Sage has a bug in it and is not properly wrapped with sig_on(), sig_off(). You might want to run Sage under gdb with 'sage gdb' to debug this. Sage will now terminate.  /home/simon/SAGE/debug/sage5.5.rc0/spkg/bin/sage: Zeile 310: 5393 Speicherzugriffsfehler sageipython "$@" i
 There is a memory corruption detected already when creating the galgebra. The Sage or shell prompt does not appear, ctrlC has no effect, I need to pkill python:
sage: A.<x,y> = FreeAlgebra(QQ, 2) sage: P.<x,y> = A.g_algebra({y*x:x*y}) *** glibc detected *** python: malloc(): memory corruption: 0x0000000002c8db40 ***
Two upstream bugs were fixed in the new spkg. Now the question is whether the findings above will point to further upstream bugs or are simply consequences of the rather brutal approach of replacing omalloc with xalloc.
Nils, can you analyse what is happening here? You are a lot better than I in using gdb and valgrind.
comment:91 in reply to: ↑ 90 Changed 8 years ago by
sage: P.<x,y> = QQ[] sage: x*y x*y
I put this in a file to run as a doctest using valgrind and gdb. The crash happens in python's gcmodule, so the memory corruption is severe.
==12827== Invalid write of size 8 ==12827== at 0x26A58673: pp_Mult_mm__FieldQ_LengthThree_OrdGeneral (pp_Mult_mm__T.cc:48) ==12827== by 0x27748935: __pyx_f_4sage_4libs_8singular_10polynomial_singular_polynomial_mul(spolyrec**, spolyrec*, spolyrec*, sip_sring*) (pInline2.h:673) ==12827== by 0x265BCE3F: __pyx_f_4sage_5rings_10polynomial_28multi_polynomial_libsingular_23MPolynomial_libsingular__mul_(__pyx_obj_4sage_5rings_10polynomial_28multi_polynomial_libsingular_MPolynomial_libsingular*, __pyx_obj_4sage_9structure_7element_RingElement*, int) (multi_polynomial_libsingular.cpp:16475) ==12827== by 0x154E9EA8: __pyx_pw_4sage_9structure_7element_11RingElement_11__mul__ (element.c:14091) ... ==12827== Address 0x101597c0 is 0 bytes after a block of size 16 alloc'd ==12827== at 0x4A0762F: malloc (vg_replace_malloc.c:270) ==12827== by 0x26A5865E: pp_Mult_mm__FieldQ_LengthThree_OrdGeneral (omalloc.h:85) ==12827== by 0x27748935: __pyx_f_4sage_4libs_8singular_10polynomial_singular_polynomial_mul(spolyrec**, spolyrec*, spolyrec*, sip_sring*) (pInline2.h:673) ==12827== by 0x265BCE3F: __pyx_f_4sage_5rings_10polynomial_28multi_polynomial_libsingular_23MPolynomial_libsingular__mul_(__pyx_obj_4sage_5rings_10polynomial_28multi_polynomial_libsingular_MPolynomial_libsingular*, __pyx_obj_4sage_9structure_7element_RingElement*, int) (multi_polynomial_libsingular.cpp:16475) ==12827== by 0x154E9EA8: __pyx_pw_4sage_9structure_7element_11RingElement_11__mul__ (element.c:14091) ==12827== by 0x4C5435E: binary_op1 (abstract.c:945) ==12827== Invalid write of size 8 ==12827== at 0x26A5867F: pp_Mult_mm__FieldQ_LengthThree_OrdGeneral (pp_Mult_mm__T.cc:49) ==12827== by 0x27748935: __pyx_f_4sage_4libs_8singular_10polynomial_singular_polynomial_mul(spolyrec**, spolyrec*, spolyrec*, sip_sring*) (pInline2.h:673) ==12827== by 0x265BCE3F: __pyx_f_4sage_5rings_10polynomial_28multi_polynomial_libsingular_23MPolynomial_libsingular__mul_(__pyx_obj_4sage_5rings_10polynomial_28multi_polynomial_libsingular_MPolynomial_libsingular*, __pyx_obj_4sage_9structure_7element_RingElement*, int) (multi_polynomial_libsingular.cpp:16475) ==12827== by 0x154E9EA8: __pyx_pw_4sage_9structure_7element_11RingElement_11__mul__ (element.c:14091) ==12827== by 0x4C5435E: binary_op1 (abstract.c:945) ... ==12827== Address 0x101597c8 is 8 bytes after a block of size 16 alloc'd ==12827== at 0x4A0762F: malloc (vg_replace_malloc.c:270) ==12827== by 0x26A5865E: pp_Mult_mm__FieldQ_LengthThree_OrdGeneral (omalloc.h:85) ==12827== by 0x27748935: __pyx_f_4sage_4libs_8singular_10polynomial_singular_polynomial_mul(spolyrec**, spolyrec*, spolyrec*, sip_sring*) (pInline2.h:673) ==12827== by 0x265BCE3F: __pyx_f_4sage_5rings_10polynomial_28multi_polynomial_libsingular_23MPolynomial_libsingular__mul_(__pyx_obj_4sage_5rings_10polynomial_28multi_polynomial_libsingular_MPolynomial_libsingular*, __pyx_obj_4sage_9structure_7element_RingElement*, int) (multi_polynomial_libsingular.cpp:16475) ==12827== by 0x154E9EA8: __pyx_pw_4sage_9structure_7element_11RingElement_11__mul__ (element.c:14091) ==12827== by 0x4C5435E: binary_op1 (abstract.c:945)
Similar message 16 bytes past and one invalid write Address 0x101597d8 is not stack'd, malloc'd or (recently) free'd
Also:
==12827== Invalid read of size 8 ==12827== at 0x269A4A56: nlNormalize(snumber*&) (longrat.cc:1122) ==12827== by 0x269C7E50: p_Normalize(spolyrec*, sip_sring*) (polys.cc:647) ==12827== by 0x265BD052: __pyx_f_4sage_5rings_10polynomial_28multi_polynomial_libsingular_23MPolynomial_libsingular__mul_(__pyx_obj_4sage_5rings_10polynomial_28multi_polynomial_libsingular_MPolynomial_libsingular*, __pyx_obj_4sage_9structure_7element_RingElement*, int) (multi_polynomial_libsingular.cpp:31792) ==12827== by 0x154E9EA8: __pyx_pw_4sage_9structure_7element_11RingElement_11__mul__ (element.c:14091) ==12827== by 0x4C5435E: binary_op1 (abstract.c:945) ... ==12827== Address 0x101597c0 is 0 bytes after a block of size 16 alloc'd ==12827== at 0x4A0762F: malloc (vg_replace_malloc.c:270) ==12827== by 0x26A5865E: pp_Mult_mm__FieldQ_LengthThree_OrdGeneral (omalloc.h:85) ==12827== by 0x27748935: __pyx_f_4sage_4libs_8singular_10polynomial_singular_polynomial_mul(spolyrec**, spolyrec*, spolyrec*, sip_sring*) (pInline2.h:673) ==12827== by 0x265BCE3F: __pyx_f_4sage_5rings_10polynomial_28multi_polynomial_libsingular_23MPolynomial_libsingular__mul_(__pyx_obj_4sage_5rings_10polynomial_28multi_polynomial_libsingular_MPolynomial_libsingular*, __pyx_obj_4sage_9structure_7element_RingElement*, int) (multi_polynomial_libsingular.cpp:16475) ==12827== by 0x154E9EA8: __pyx_pw_4sage_9structure_7element_11RingElement_11__mul__ (element.c:14091) ==12827== by 0x4C5435E: binary_op1 (abstract.c:945) ==12827== Invalid read of size 8 ==12827== at 0x269CBC3D: p_String0(spolyrec*, sip_sring*, sip_sring*) (polys0.cc:90) ==12827== by 0x27748087: __pyx_f_4sage_4libs_8singular_10polynomial_singular_polynomial_str(spolyrec*, sip_sring*) (polynomial.cpp:4266) ==12827== by 0x2658F9B1: __pyx_pw_4sage_5rings_10polynomial_28multi_polynomial_libsingular_23MPolynomial_libsingular_33_repr_(_object*, _object*) (multi_polynomial_libsingular.cpp:17242) ==12827== by 0x4C59402: PyObject_Call (abstract.c:2529) ==12827== by 0x10CD2408: __pyx_pw_4sage_9structure_11sage_object_10SageObject_5__repr__ (sage_object.c:1790) ... ==12827== Address 0x101597d8 is not stack'd, malloc'd or (recently) free'd ==12827== Invalid read of size 8 ==12827== at 0x269CB926: writemon(spolyrec*, int, sip_sring*) (polys0.cc:25) ==12827== by 0x269CBC82: p_String0(spolyrec*, sip_sring*, sip_sring*) (polys0.cc:92) ==12827== by 0x27748087: __pyx_f_4sage_4libs_8singular_10polynomial_singular_polynomial_str(spolyrec*, sip_sring*) (polynomial.cpp:4266) ==12827== by 0x2658F9B1: __pyx_pw_4sage_5rings_10polynomial_28multi_polynomial_libsingular_23MPolynomial_libsingular_33_repr_(_object*, _object*) (multi_polynomial_libsingular.cpp:17242) ... ==12827== Address 0x101597c0 is 0 bytes after a block of size 16 alloc'd ==12827== at 0x4A0762F: malloc (vg_replace_malloc.c:270) ==12827== by 0x26A5865E: pp_Mult_mm__FieldQ_LengthThree_OrdGeneral (omalloc.h:85) ==12827== by 0x27748935: __pyx_f_4sage_4libs_8singular_10polynomial_singular_polynomial_mul(spolyrec**, spolyrec*, spolyrec*, sip_sring*) (pInline2.h:673) ==12827== by 0x265BCE3F: __pyx_f_4sage_5rings_10polynomial_28multi_polynomial_libsingular_23MPolynomial_libsingular__mul_(__pyx_obj_4sage_5rings_10polynomial_28multi_polynomial_libsingular_MPolynomial_libsingular*, __pyx_obj_4sage_9structure_7element_RingElement*, int) (multi_polynomial_libsingular.cpp:16475) ==12827== by 0x154E9EA8: __pyx_pw_4sage_9structure_7element_11RingElement_11__mul__ (element.c:14091)
etc. I tracked down the other bugs from similar information, so good luck! My guess is that this PolyBin? is a little more required than assumed.
This is really just running the test and copy/paste from valgrind! It's much faster if you can do this yourself. }}}
comment:92 followup: ↓ 93 Changed 8 years ago by
Indeed, kernel/pp_Mult_mm__T.cc:32
:
omBin bin = ri>PolyBin; DECLARE_LENGTH(const unsigned long length = ri>ExpL_Size); const unsigned long* m_e = m>exp; pAssume(!n_IsZero(ln,ri)); pAssume1(p_GetComp(m, ri) == 0  p_MaxComp(p, ri) == 0); number tmp; do { tmp = n_Mult(ln, pGetCoeff(p), ri); #ifdef HAVE_ZERODIVISORS if (! n_IsZero(tmp, ri)) { #endif p_AllocBin( pNext(q), bin, ri); q = pNext(q); pSetCoeff0(q, tmp);
I haven't looked into what happens here exactly, but r>PolyBin
does get passed around. It might not be deref problem, though, but:
omalloc/omalloc.h:142
(xalloc really)
#define omTypeAllocBin(T,P,B) P=(T)omAlloc(sizeof(B)) #define omTypeAlloc(T,P,S) P=(T)omAlloc(S) #define omTypeAlloc0Bin(T,P,B) P=(T)omAlloc0(B)
looks suspicious to me. The two allocbin routines look incompatible to me. I think they are supposed to have the same interface type.
comment:93 in reply to: ↑ 92 Changed 8 years ago by
Replying to nbruin:
I haven't looked into what happens here exactly, but
r>PolyBin
does get passed around. It might not be deref problem, though, but:omalloc/omalloc.h:142
(xalloc really)#define omTypeAllocBin(T,P,B) P=(T)omAlloc(sizeof(B)) #define omTypeAlloc(T,P,S) P=(T)omAlloc(S) #define omTypeAlloc0Bin(T,P,B) P=(T)omAlloc0(B)looks suspicious to me. The two allocbin routines look incompatible to me. I think they are supposed to have the same interface type.
Could very well be. The "sizeof(...)" was my addition  without it, the compiler would complain, and indeed looking into the code it seems that a size was expected. Perhaps I should add the "sizeof" to the other two macros as well, even though the compiler did not complain about them.
Or, yet another possibility: If I am not mistaken, the bins don't really matter when omalloc is ripped out. Hence, one could perhaps simply do nothing instead?
Alternatively, one could think of defining typedef size_t* omBin;
rather than the current typedef size_t omBin
, so that at least we have a pointer which we can allocate memory for.
comment:94 Changed 8 years ago by
Well, pNext(q)
(soon to be q
) gets allocated via:
#define p_AllocBin(p, bin, r) omTypeAllocBin(poly, p, bin)
If I'm not mistaken, a poly
is a pointer to what is ultimately
struct spolyrec { poly next; // next needs to be the first field number coef; // and coef the second  do not change this !!! unsigned long exp[VARS]; // make sure that exp is aligned };
so whatever you allocate, it had better have enough room for that.
As far as I understand, a "bin" is a (collection of) page(s) that gets used for allocation of samesized memory blocks. So I'd expect an alloc that uses a bin reference to get the size to alocate from the bin rather than as a parameter. Here, things get allocated from the "poly" bin. So perhaps you don't need bins per se, but I think you do need to keep enough of the bin to know what size memory blocks are associated with it. The above code doesn't seem to do that (none of it).
I would think (T)omAlloc(sizeof(*T))
or something like that would be closer to what's required.
At least (assuming T is a pointer type) it would allocate a block that's big enough to let a T type point to. But I may well be misreading the code.
comment:95 Changed 8 years ago by
OK, my C skills are insufficient for this:
#include<stdio.h> typedef struct{ int a; int b; int c; int d; int e; } composite; typedef composite *pointer; void main() { pointer v; printf("sizeof(composite)=%d\n",sizeof(composite)); printf("sizeof(t)=%d\n",sizeof(t)); printf("sizeof(*v)=%d\n",sizeof(*v)); };
produces the desired output:
sizeof(composite)=20 sizeof(pointer)=8 sizeof(*v)=20
So if I have my hands on an instance of a "pointer" variable, I can get the size of the structure pointed to by dereferencing. However, if I want to get this straight from the pointer type there doesn't seem to be a syntax for it. sizeof(*pointer)
gives an error ... How do you get that information?
SOLVED:
printf("sizeof(*(pointer)NULL)=%d\n",sizeof(*(pointer)NULL));
comment:96 Changed 8 years ago by
OK, this trick doesn't work. There are calls of the form
omTypeAllocBin(void*, addr, InternalInteger_bin);
which prevent
and there are calls to omAllocBin
anyway that only have the bin, not a type hint for how much memory should be allocated. So removing bins completely will simply not work: you need them at least as a storage container for block lengths associated to them.
Short: The AllocBin
routines seem to allocate from a bin, not the bin itself. I think you can take this back to libsingulardevel. Hannes claims that xalloc is a sufficient omalloc replacement, so he probably has a solution to this (which I bet is resurrecting omBin_s and supporting rudimentary initialization of it. We only need the sizeW field on these things)
In fact, from the line
omBin slists_bin = omGetSpecBin(sizeof(slists));
I bet that you can declare
typedef struct omBin_s { size_t sizeW; size_t sizeB; } omBin_t; typedef omBin_t* omBin; static inline omBin omGetSpecBin(size_t S) { omBin bin = (omBin)malloc(sizeof(omBin_t)); bin>sizeB = S; bin>sizeW = S/sizeof(long); return bin; } static inline void omUnGetSpecBin(omBin bin) { free(bin); } #define omTypeAllocBin(T,P,B) P=(T)omAlloc(B>sizeB)
etc.
comment:97 followup: ↓ 98 Changed 8 years ago by
NO!! It's much easier. Hannes is right: just store bin>sizeB directly in the bin. Then
#define omTypeAllocBin(T,P,B) P=(T)omAlloc(B)
is the correct description, since the bin now simply IS the size_t.
comment:98 in reply to: ↑ 97 Changed 8 years ago by
Replying to nbruin:
NO!! It's much easier. Hannes is right: just store bin>sizeB directly in the bin. Then
#define omTypeAllocBin(T,P,B) P=(T)omAlloc(B)is the correct description, since the bin now simply IS the size_t.
Sorry, I don't get that.
typedef size_t omBin; static inline omBin omGetSpecBin(size_t S) { return (omBin)sizeof(omBin_t); }
or what?
And remove omUnGetSpecBin
?
And then
#define omTypeAllocBin(T,P,B) P=(T)omAlloc(B)
?
comment:99 Changed 8 years ago by
PS: Could you please have a look at the file omalloc/Makefile
that my patch creates? It contains
installnolns: install installlibsingular: install
Both targets installnolns
and installlibsingular
are requested. Does that mean that the compilation is done twice? If this is so, how can I prevent it?
comment:100 Changed 8 years ago by
No. It doesn't work in the way I just tried. Now Sage won't even start.
comment:101 Changed 8 years ago by
I have updated the new spkg and posted a new version of singular_xalloc.patch.
Now it seems to work!
simon@linuxsqwp:~/SAGE/debug/sage5.5.rc0> ./sage   Sage Version 5.5.rc0, Release Date: 20121117   Type "notebook()" for the browserbased notebook interface.   Type "help()" for help.   ********************************************************************** * * * Warning: this is a prerelease version, and it may be unstable. * * * ********************************************************************** sage: A.<x,y> = FreeAlgebra(QQ, 2) sage: P.<x,y> = A.g_algebra({y*x:x*y}) sage: x*y x*y sage: quit Exiting Sage (CPU time 0m0.48s, Wall time 0m10.86s). simon@linuxsqwp:~/SAGE/debug/sage5.5.rc0> export MALLOC_CHECK_=3 simon@linuxsqwp:~/SAGE/debug/sage5.5.rc0> ./sage   Sage Version 5.5.rc0, Release Date: 20121117   Type "notebook()" for the browserbased notebook interface.   Type "help()" for help.   ********************************************************************** * * * Warning: this is a prerelease version, and it may be unstable. * * * ********************************************************************** sage: P.<x,y> = QQ[] sage: x*y x*y sage: quit Exiting Sage (CPU time 0m0.04s, Wall time 0m6.09s). simon@linuxsqwp:~/SAGE/debug/sage5.5.rc0> ./sage   Sage Version 5.5.rc0, Release Date: 20121117   Type "notebook()" for the browserbased notebook interface.   Type "help()" for help.   ********************************************************************** * * * Warning: this is a prerelease version, and it may be unstable. * * * ********************************************************************** sage: A.<x,y> = FreeAlgebra(QQ, 2) sage: P.<x,y> = A.g_algebra({y*x:x*y}) sage: x*y x*y sage: quit Exiting Sage (CPU time 0m0.17s, Wall time 0m8.22s).
My suggestion: Using the doc tests (run with MALLOC_CHECK_=3
), we try to detect further crashes. This may result in further up or downstream fixes. Eventually, the new spkg shall become standard, with all the fixes backported, and with optional xalloc support.
comment:102 Changed 8 years ago by
I am now running make test
with MALLOC_CHECK_=3
and it seems to work quit well!
However, it seems that the spkg should also install omalloc/mylimits.h
: It turns out that ./sage br
only works since there is mylimits.h
around from building the original singular spkg.
comment:103 Changed 8 years ago by
PS: Normally, mylimits.h just includes "limits.h"  but in IRIX6 machines that wouldn't work, and so on these machines the header omlimits.h generated by the configure script would be included.
Hence, I will soon create a new spkg that installs mylimits.h including limits.h  but the xallocversion of the spkg will not work on IRIX6 machines.
comment:104 Changed 8 years ago by
PPS: The spkg installing mylimits.h is already posted.
comment:105 Changed 8 years ago by
make test
reports
sage t force_lib "devel/sage/sage/graphs/graph_generators.py" # Killed/crashed sage t force_lib "devel/sage/sage/misc/sageinspect.py" Total time for all tests: 6884.7 seconds make: *** [test] Fehler 132 simon@linuxsqwp:~/SAGE/debug/sage5.5.rc0> echo $SINGULAR_XALLOC yes simon@linuxsqwp:~/SAGE/debug/sage5.5.rc0> echo $MALLOC_CHECK_ 3
To be on the safe side, can you verify that I did not misspell the environment variable? I.e., is there a trailing underscore?
Note that the error in sageinspect is due to the fact that I also had #11768 applied, which needs work. I don't know if the crash in graph_generators is related.
Is that good or bad?
 It is good, since the xalloc version of Singular works very well, and the crashes we discussed here have apparently been fixed upstream (and the fixes backported). Hence, I am now going to finalise the spkg for review.
 It is bad, since I think we had hoped that we see easytoanalyse crashes in garbage collection  after all, I am using sage5.5.rc0, which has some of the weak reference patches applied. But if there is no crash, there is nothing to analyse.
However, if I recall correctly, the crashes did only occur when also considering #12215, not only #715 and #11521.
comment:106 Changed 8 years ago by
PS: The graph_generators test pass fine if MALLOC_CHECK_
is not defined. Hence, we see a new bug here. I'll report on sagedevel.
comment:107 Changed 8 years ago by
PPS: The graph_generators test also crashes with the default omalloc Singular. Hence, it definitely is unrelated.
comment:108 followup: ↓ 109 Changed 8 years ago by
PPPS:
All tests passed! Total time for all tests: 6376.5 seconds bash3.2$ echo $MALLOC_CHECK_ 3
with the xalloc singular on bsd.math!!
comment:109 in reply to: ↑ 108 ; followup: ↓ 110 Changed 8 years ago by
Replying to SimonKing:
PPPS:
All tests passed! Total time for all tests: 6376.5 seconds bash3.2$ echo $MALLOC_CHECK_ 3with the xalloc singular on bsd.math!!
That variable doesn't do anything on OSX, so congratulations that xalloc works on OSX, but you haven't done stringent memory checking yet. Try
export DYLD_INSERT_LIBRARIES=/usr/lib/libgmalloc.dylib
instead. see man libgmalloc
for various options to select what should be guarded.
(warning: It will run MUCH slower and take MUCH more memory. It seems gmalloc is about as strict as ElectricFence
which is also incredibly slow. Valgrind is speedy by comparison)
comment:110 in reply to: ↑ 109 Changed 8 years ago by
Replying to nbruin:
That variable doesn't do anything on OSX, so congratulations that xalloc works on OSX, but you haven't done stringent memory checking yet. Try
export DYLD_INSERT_LIBRARIES=/usr/lib/libgmalloc.dylibinstead. see
man libgmalloc
for various options to select what should be guarded. (warning: It will run MUCH slower and take MUCH more memory. It seems gmalloc is about as strict asElectricFence
which is also incredibly slow. Valgrind is speedy by comparison)
Thank you!
First result:
  Sage Version 5.5.rc0, Release Date: 20121117   Type "notebook()" for the browserbased notebook interface.   Type "help()" for help.   ********************************************************************** * * * Warning: this is a prerelease version, and it may be unstable. * * * ********************************************************************** GuardMalloc: Allocations will be placed on 16 byte boundaries. GuardMalloc:  Some buffer overruns may not be noticed. GuardMalloc:  Applications using vector instructions (e.g., SSE or Altivec) should work. GuardMalloc: GuardMalloc version 23 GuardMalloc: Allocations will be placed on 16 byte boundaries. GuardMalloc:  Some buffer overruns may not be noticed. GuardMalloc:  Applications using vector instructions (e.g., SSE or Altivec) should work. GuardMalloc: GuardMalloc version 23 GuardMalloc: Allocations will be placed on 16 byte boundaries. GuardMalloc:  Some buffer overruns may not be noticed. GuardMalloc:  Applications using vector instructions (e.g., SSE or Altivec) should work. GuardMalloc: GuardMalloc version 23 GuardMalloc: Allocations will be placed on 16 byte boundaries. GuardMalloc:  Some buffer overruns may not be noticed. GuardMalloc:  Applications using vector instructions (e.g., SSE or Altivec) should work. GuardMalloc: GuardMalloc version 23 sage: A.<x,y> = FreeAlgebra(QQ, 2) sage: P.<x,y> = A.g_algebra({y*x:x*y}) sage: x*y x*y sage: quit Exiting Sage (CPU time 0m24.65s, Wall time 1m15.86s).
So, the bug really is fixed, on OSX as well.
Second result: Yes, it is incredibly slow!
Nevertheless, I just started sage testall
under gmalloc on bsd.math. Let's see whether it will finish this year...
comment:111 Changed 8 years ago by
How can I increase the timeout for doctests? According to what google gave me, I tried export TIMEOUT=1800
, but that didn't work.
comment:112 Changed 8 years ago by
Aha! It seems to require SAGE_TIMEOUT
!
comment:113 Changed 8 years ago by
Even a timeout of 1800 seconds is not enough for any test. I tried sage/schemes (because memory problems often show up there), but every single test times out.
comment:114 followup: ↓ 115 Changed 8 years ago by
I think running sage t gdb ...
disables the timeout. If you're just testing a single file you might as well do that (in screen
of courseit'll be a while before you check back). It's how I got thetraceback for the first heisenbug. It also leaves you immediately with an environment where you can examine the backtrace etc.
comment:115 in reply to: ↑ 114 Changed 8 years ago by
Replying to nbruin:
I think running
sage t gdb ...
disables the timeout. If you're just testing a single file you might as well do that (inscreen
of courseit'll be a while before you check back). It's how I got thetraceback for the first heisenbug. It also leaves you immediately with an environment where you can examine the backtrace etc.
Well, I first wanted to have a "quick" overview: Is there any problematic test? Namely, on Linux, everything works just fine with MALLOC_CHECK_ (except the known problem with the graphs). And if there is another crash on OSX in the quick test then I wanted to analyse it more deeply.
But the only test that did not timeout with SAGE_TIMEOUT=1800
was a test that normally takes 3.5 seconds...
Is there any other memory checker than gmalloc that I can use on OSX, that is faster and finds some of the issues that gmalloc would find?
comment:116 Changed 8 years ago by
Concerning the packaging of xalloc
in the singular package. Presently:
$ ls l /tmp/singular_xalloc.patch r. 1 847572 Dec 1 10:05 /tmp/singular_xalloc.patch $ grep '^' /tmp/singular_xalloc.patch wc 26089 106795 824416 $ grep '^+' /tmp/singular_xalloc.patch wc 374 1168 10286
so it's a large patch: 800kB on 8MB for the spkg. I guess it'll compress well, so the ration is not quite 10%, but still: The patch mainly consists of exactly listing all the files in the omalloc
directory in order to delete them. That means that we're now shipping omalloc
twice in order to be able to remove it.
I propose to code the patch differently:
 put xalloc into its own subdir; inside next to
omalloc
probably. You can do this by patch or something else.  If
SINGULAR_XALLOC
isyes
, do explicitmv
and orln s
commands to more the originalomalloc
out of the way and replace it byxalloc
 the changes to the singular source itself should of course remain in the patch file.
Perhaps it's not according to the letter of "use patch to change src", but patch is just inappropriate for a rip out and replace
operation like this.
comment:117 Changed 8 years ago by
 Description modified (diff)
 Status changed from new to needs_review
I have produced a new patchlevel for the singular spkg. The optional patch is now smaller, since I simply erase the content of the omalloc/ folder, so that the removed files are not part of the patch.
Needs review!
comment:118 Changed 8 years ago by
With singularxalloc and using gmalloc on bsd.math, I tested sage/schemes. Of course, most tests timed out. But these few did fail differently:
sage t "devel/sage/sage/schemes/toric/divisor_class.pyx" sage t "devel/sage/sage/schemes/toric/fano_variety.py" sage t "devel/sage/sage/schemes/toric/homset.py" sage t "devel/sage/sage/schemes/toric/library.py"
I don't know why, but there was no test.log for it. Hence, I am now repeating these tests separately.
comment:119 followup: ↓ 122 Changed 8 years ago by
Valgrind gives suspicious records for two of these tests:
sage t valgrind "devel/sage/sage/schemes/toric/fano_variety.py"
==20768== Syscall param open(filename) points to unaddressable byte(s) ==20768== at 0x31CFEE41F0: __open_nocancel (in /lib64/libc2.14.90.so) ==20768== by 0x31CFE79508: _IO_file_fopen@@GLIBC_2.2.5 (in /lib64/libc2.14.90.so) ==20768== by 0x31CFE6E265: __fopen_internal (in /lib64/libc2.14.90.so) ==20768== by 0x31D6275A06: std::__basic_file<char>::open(char const*, std::_Ios_Openmode, int) (in /usr/lib64/libstdc++.so.6.0.16) ==20768== by 0x31D627947B: std::basic_filebuf<char, std::char_traits<char> >::open(char const*, std::_Ios_Openmode) (in /usr/lib64/libstdc++.so.6.0.16) ==20768== by 0x31D627A9D2: std::basic_ofstream<char, std::char_traits<char> >::basic_ofstream(char const*, std::_Ios_Openmode) (in /usr/lib64/libstdc++.so.6.0.16) ==20768== by 0x2D69E6E2: LinBox::commentator() (commentator.h:857) ==20768== by 0x2D7BF15E: unsigned long& LinBox::rank<LinBox::BlasMatrix<LinBox::PID_integer>, LinBox::HybridSpecifier>(unsigned long&, LinBox::BlasMatrix<LinBox::PID_integer> const&, LinBox::RingCategories::IntegerTag const&, LinBox::HybridSpecifier const&) (rank.h:629) ==20768== by 0x2D6A864C: linbox_integer_dense_rank(__mpz_struct (**) [1], unsigned long, unsigned long) (rank.h:110) ==20768== by 0x2B4B25FC: __pyx_pf_4sage_6matrix_20matrix_integer_dense_20Matrix_integer_dense_100_rank_linbox (matrix_integer_dense.c:25464) ==20768== by 0x4C59402: PyObject_Call (abstract.c:2529) ==20768== by 0x2B4BFC5E: __pyx_pw_4sage_6matrix_20matrix_integer_dense_20Matrix_integer_dense_99rank (matrix_integer_dense.c:25357) ==20768== by 0x4C59402: PyObject_Call (abstract.c:2529) ==20768== by 0x2BB90582: __pyx_pw_4sage_6matrix_21matrix_rational_dense_21Matrix_rational_dense_83rank (matrix_rational_dense.c:21989) ==20768== by 0x4C59402: PyObject_Call (abstract.c:2529) ==20768== by 0x2973737D: __pyx_pw_4sage_6matrix_7matrix2_6Matrix_7solve_right (matrix2.c:4604) ==20768== by 0x4CFC624: PyEval_EvalFrameEx (ceval.c:4021) ==20768== by 0x4CFE274: PyEval_EvalCodeEx (ceval.c:3253) ==20768== by 0x4CFC69F: PyEval_EvalFrameEx (ceval.c:4117) ==20768== by 0x4CFE274: PyEval_EvalCodeEx (ceval.c:3253) ==20768== by 0x4CFC69F: PyEval_EvalFrameEx (ceval.c:4117) ==20768== Address 0x0 is not stack'd, malloc'd or (recently) free'd
I don't know how serious it is to call open with a zero pointer. Probably not at all...
sage t "devel/sage/sage/schemes/toric/library.py"
:
==21248== Invalid read of size 8 ==21248== at 0x3432C24F: __pyx_f_4sage_9functions_8prime_pi_7PrimePi_prime_phi_small (prime_pi.c:2190) ==21248== by 0x3432A937: __pyx_f_4sage_9functions_8prime_pi_7PrimePi_prime_phi_large (prime_pi.c:1624) ==21248== by 0x3432B349: __pyx_pf_4sage_9functions_8prime_pi_7PrimePi_4__call__ (prime_pi.c:1489) ==21248== by 0x3432BBBC: __pyx_pw_4sage_9functions_8prime_pi_7PrimePi_5__call__ (prime_pi.c:1202) ==21248== by 0x4C59402: PyObject_Call (abstract.c:2529) ==21248== by 0x2ADCA63A: __pyx_pw_4sage_3ext_13multi_modular_22MultiModularBasis_base_5__init__ (multi_modular.c:4498) ==21248== by 0x4CB5DC7: type_call (typeobject.c:737) ==21248== by 0x4C59402: PyObject_Call (abstract.c:2529) ==21248== by 0x2B4CCDF4: __pyx_pw_4sage_6matrix_20matrix_integer_dense_20Matrix_integer_dense_45_multiply_multi_modular (matrix_integer_dense.c:12884) ==21248== by 0x4C59402: PyObject_Call (abstract.c:2529) ==21248== by 0x2B4BE7C6: __pyx_f_4sage_6matrix_20matrix_integer_dense_20Matrix_integer_dense__matrix_times_matrix_ (matrix_integer_dense.c:10112) ==21248== by 0x43DCC37E: __pyx_f_4sage_6matrix_6action_18MatrixMatrixAction__call_ (action.c:3179) ==21248== by 0x151550FE: __pyx_f_4sage_9structure_6coerce_24CoercionModel_cache_maps_bin_op (coerce.c:6785) ==21248== by 0x14CE7775: __pyx_pw_4sage_9structure_7element_6Matrix_5__mul__ (element.c:19222) ==21248== by 0x4C5435E: binary_op1 (abstract.c:945) ==21248== by 0x4C572A7: PyNumber_Multiply (abstract.c:1216) ==21248== by 0x4CFA2F0: PyEval_EvalFrameEx (ceval.c:1264) ==21248== by 0x4CFE274: PyEval_EvalCodeEx (ceval.c:3253) ==21248== by 0x4CFC69F: PyEval_EvalFrameEx (ceval.c:4117) ==21248== by 0x4CFD3CA: PyEval_EvalFrameEx (ceval.c:4107) ==21248== by 0x4CFD3CA: PyEval_EvalFrameEx (ceval.c:4107) ==21248== by 0x4CFE274: PyEval_EvalCodeEx (ceval.c:3253) ==21248== by 0x4CFC69F: PyEval_EvalFrameEx (ceval.c:4117) ==21248== by 0x4CFD3CA: PyEval_EvalFrameEx (ceval.c:4107) ==21248== by 0x4CFE274: PyEval_EvalCodeEx (ceval.c:3253) ==21248== Address 0x3f42f390 is 0 bytes after a block of size 336 alloc'd ==21248== at 0x4A0762F: malloc (vg_replace_malloc.c:270) ==21248== by 0x3432B173: __pyx_pf_4sage_9functions_8prime_pi_7PrimePi_4__call__ (memory.h:32) ==21248== by 0x3432BBBC: __pyx_pw_4sage_9functions_8prime_pi_7PrimePi_5__call__ (prime_pi.c:1202) ==21248== by 0x4C59402: PyObject_Call (abstract.c:2529) ==21248== by 0x2ADCA63A: __pyx_pw_4sage_3ext_13multi_modular_22MultiModularBasis_base_5__init__ (multi_modular.c:4498) ==21248== by 0x4CB5DC7: type_call (typeobject.c:737) ==21248== by 0x4C59402: PyObject_Call (abstract.c:2529) ==21248== by 0x2B4CCDF4: __pyx_pw_4sage_6matrix_20matrix_integer_dense_20Matrix_integer_dense_45_multiply_multi_modular (matrix_integer_dense.c:12884) ==21248== by 0x4C59402: PyObject_Call (abstract.c:2529) ==21248== by 0x2B4BE7C6: __pyx_f_4sage_6matrix_20matrix_integer_dense_20Matrix_integer_dense__matrix_times_matrix_ (matrix_integer_dense.c:10112) ==21248== by 0x43DCC37E: __pyx_f_4sage_6matrix_6action_18MatrixMatrixAction__call_ (action.c:3179) ==21248== by 0x151550FE: __pyx_f_4sage_9structure_6coerce_24CoercionModel_cache_maps_bin_op (coerce.c:6785) ==21248== by 0x14CE7775: __pyx_pw_4sage_9structure_7element_6Matrix_5__mul__ (element.c:19222) ==21248== by 0x4C5435E: binary_op1 (abstract.c:945) ==21248== by 0x4C572A7: PyNumber_Multiply (abstract.c:1216) ==21248== by 0x4CFA2F0: PyEval_EvalFrameEx (ceval.c:1264) ==21248== by 0x4CFE274: PyEval_EvalCodeEx (ceval.c:3253) ==21248== by 0x4CFC69F: PyEval_EvalFrameEx (ceval.c:4117) ==21248== by 0x4CFD3CA: PyEval_EvalFrameEx (ceval.c:4107) ==21248== by 0x4CFD3CA: PyEval_EvalFrameEx (ceval.c:4107) ==21248== by 0x4CFE274: PyEval_EvalCodeEx (ceval.c:3253) ==21248== by 0x4CFC69F: PyEval_EvalFrameEx (ceval.c:4117) ==21248== by 0x4CFD3CA: PyEval_EvalFrameEx (ceval.c:4107) ==21248== by 0x4CFE274: PyEval_EvalCodeEx (ceval.c:3253) ==21248== by 0x4CFC69F: PyEval_EvalFrameEx (ceval.c:4117)
This looks like an offbyone on an array of small primes. Indeed, in
sage/functions/prime_pi.pyx
:196
self.primes = <long*> sage_malloc(self.primeCount*sizeof(long))
whereas in
sage/functions/prime_pi.pyx
:263
for i in range(4, self.primeCount): if prime >= m_max: break x = N / prime if preTransition: if prime*prime <= x: sum = self.prime_phi_small(x, prime)  i else: sum = self.prime_phi_small(x)  i preTransition = False else: sum = self.prime_phi_small(x)  i prime = self.primes[i + 1] return sum
The last line indeed gets prime
from an unallocated address. However, that value doesn't get used. So if this instruction succeeds I don't think it'll do harm, but in extremely unlucky situations (when self.primes
is just at the edge of a page) this could lead to a segfault. The same thing happens at other places in this file too.
The sanitary thing is to allocate one "long" more. Then we'll only read memory that's guaranteed to exist.
comment:120 Changed 8 years ago by
 Milestone changed from sage5.6 to sage5.5
 Status changed from needs_review to positive_review
doctests check out as far as I can test. Business rules for spkg have been followed, so I think this can get a positive review.
The new spkg doesn't actually contain code written by me, so I think I'm eligible to give a positive review.
comment:121 Changed 8 years ago by
 Milestone changed from sage5.5 to sage5.6
 Reviewers set to Nils Bruin
comment:122 in reply to: ↑ 119 ; followup: ↓ 123 Changed 8 years ago by
Replying to nbruin:
Valgrind gives suspicious records for two of these tests:
...
sage t "devel/sage/sage/schemes/toric/library.py"
:...
This looks like an offbyone on an array of small primes. Indeed, in
sage/functions/prime_pi.pyx
:196self.primes = <long*> sage_malloc(self.primeCount*sizeof(long))whereas in
sage/functions/prime_pi.pyx
:263for i in range(4, self.primeCount): if prime >= m_max: break x = N / prime if preTransition: if prime*prime <= x: sum = self.prime_phi_small(x, prime)  i else: sum = self.prime_phi_small(x)  i preTransition = False else: sum = self.prime_phi_small(x)  i prime = self.primes[i + 1] return sumThe last line indeed gets
prime
from an unallocated address. However, that value doesn't get used. So if this instruction succeeds I don't think it'll do harm, but in extremely unlucky situations (whenself.primes
is just at the edge of a page) this could lead to a segfault. The same thing happens at other places in this file too.The sanitary thing is to allocate one "long" more. Then we'll only read memory that's guaranteed to exist.
OK, but that wouldn't be an upstream bug, right?
Shall it then be dealt with on a different ticket, or shall we provide a patch for the Sage library, in addition to the new Singular spkg?
comment:123 in reply to: ↑ 122 Changed 8 years ago by
Replying to SimonKing:
OK, but that wouldn't be an upstream bug, right? Shall it then be dealt with on a different ticket, or shall we provide a patch for the Sage library, in addition to the new Singular spkg?
Absolutely a separate ticket! Those issues have nothing to do with singular. They are just diagnosed by running doctests through Valgrind.
comment:124 Changed 8 years ago by
 Merged in set to sage5.6.beta0
 Resolution set to fixed
 Status changed from positive_review to closed
comment:125 Changed 8 years ago by
 Description modified (diff)
comment:126 followup: ↓ 128 Changed 8 years ago by
Where did the Makefile for xalloc come from? It's written in a nonautotools way and should be fixed. It's probably easy to fix, but it should be reported "upstream", whoever that is.
comment:127 Changed 8 years ago by
 Merged in sage5.6.beta0 deleted
 Resolution fixed deleted
 Status changed from closed to new
comment:128 in reply to: ↑ 126 ; followup: ↓ 129 Changed 8 years ago by
Replying to jdemeyer:
Where did the Makefile for xalloc come from? It's written in a nonautotools way and should be fixed. It's probably easy to fix, but it should be reported "upstream", whoever that is.
It originally came from upstream (i.e., Hans Schönemann), but the "install" bits are my modifications. I have no idea how to use autotools to do it correctly.
comment:129 in reply to: ↑ 128 Changed 8 years ago by
Replying to SimonKing:
Replying to jdemeyer:
Where did the Makefile for xalloc come from? It's written in a nonautotools way and should be fixed. It's probably easy to fix, but it should be reported "upstream", whoever that is.
It originally came from upstream (i.e., Hans Schönemann), but the "install" bits are my modifications. I have no idea how to use autotools to do it correctly.
PS: I don't think that it would be needed to inform upstream, because the optional xalloc support is not and (to my knowledge) will not be part of an official Singular release.
comment:130 Changed 8 years ago by
I'll fix the Makefile, hang on...
comment:131 Changed 8 years ago by
 Description modified (diff)
 Status changed from new to needs_review
comment:132 Changed 8 years ago by
I changed the spkg, the changes are here: xallocmakefile.diff. Needs review.
comment:133 Changed 8 years ago by
I'm trying to run Sage under the electric fence memory debugger and also ran into the prime_phi
issue. It seems you managed to discuss it in detail but then failed to open an actual ticket for it. This is now #13861
comment:134 Changed 8 years ago by
 Status changed from needs_review to positive_review
Jeroen's changes look good to me.
comment:135 Changed 8 years ago by
 Merged in set to sage5.6.beta0
 Resolution set to fixed
 Status changed from positive_review to closed
comment:136 Changed 8 years ago by
Sorry for asking so late (after resolving the ticket as fixed). But perhaps it has been a bad idea to use a new environment variable SINGULAR_XALLOC
to trigger the use of xalloc. What do you think: Would it be a better idea to build singular with xalloc if SAGE_DEBUG
is set to "yes", as in #13864?
comment:137 Changed 8 years ago by
Switching SINGULAR_XALLOC
to SAGE_DEBUG
will be done in #13876.
See also #13447 for previous uses of libsingularmalloc, as well as this sagedevel thread