Opened 6 months ago
Closed 6 months ago
#15312 closed defect (fixed)
Fix use of freed memory in Symmetrica
Reported by: | zabrocki | Owned by: | |
---|---|---|---|
Priority: | critical | Milestone: | sage-5.13 |
Component: | combinatorics | Keywords: | symmetric functions, symmetrica, memleak |
Cc: | sage-combinat, saliola, aschilling, zabrocki, mguaypaq, darij, tscrim, mhansen, jdemeyer | Merged in: | sage-5.13.beta2 |
Authors: | Mike Zabrocki | Reviewers: | Jeroen Demeyer |
Report Upstream: | N/A | Work issues: | |
Branch: | Commit: | ||
Dependencies: | #13413 | Stopgaps: |
Description (last modified by jdemeyer)
The following bug seems to appear on Macs only (linux machines do not seem to be effected).
To begin with, let's do a change of basis in a small degree:
sage: p = SymmetricFunctions(QQ).powersum() sage: s = SymmetricFunctions(QQ).schur() sage: s(p[2,2]) s[1, 1, 1, 1] - s[2, 1, 1] + 2*s[2, 2] - s[3, 1] + s[4]
Now let's do one in a larger degree:
sage: time g = s(p([1]*47)) Time: CPU 19.16 s, Wall: 20.91 s
Now let's do that first one again:
sage: s(p[2,2]) s[1, 1, 1, 1] - s[2, 1, 1] + 4571483302*s[2, 2] - s[3, 1] + s[4]
That's not the correct answer ! And the next time you ask Sage, it gives different, still incorrect, answers:
sage: s(p[2,2]) s[1, 1, 1, 1] - s[2, 1, 1] + 4614252243*s[2, 2] - s[3, 1] + s[4] sage: s(p[2,2]) s[1, 1, 1, 1] - s[2, 1, 1] + 4634718110*s[2, 2] - s[3, 1] + s[4] sage: s(p[2,2]) s[1, 1, 1, 1] - s[2, 1, 1] + 4631047636*s[2, 2] - s[3, 1] + s[4]
In this discussion on sage-combinat-devel, Anne noticed that the problem is "not symmetrica per se, but some wrapper functions around symmetrica, as the following example shows":
sage: f = eval('symmetrica.t_POWSYM_SCHUR') sage: f({Partition([2,2]):Integer(1)}) s[1, 1, 1, 1] - s[2, 1, 1] + 2*s[2, 2] - s[3, 1] + s[4] sage: time g = f({Partition([1]*47):Integer(1)}) Time: CPU 13.03 s, Wall: 13.04 s sage: f({Partition([2,2]):Integer(1)}) s[1, 1, 1, 1] - s[2, 1, 1] + 2*s[2, 2] - s[3, 1] + s[4] sage: Sym = SymmetricFunctions(QQ) sage: p = Sym.power() sage: s = Sym.schur() sage: s(p[2,2]) s[1, 1, 1, 1] - s[2, 1, 1] + 4631790164*s[2, 2] - s[3, 1] + s[4] sage: f({Partition([2,2]):Integer(1)}) s[1, 1, 1, 1] - s[2, 1, 1] + 2*s[2, 2] - s[3, 1] + s[4]
And Travis commented that:
When (random) big numbers appear like that, it almost always is an integer overflow. I suspect the communication with Symmetrica goes through the native integer. Is this correct? If so, then that's where I'd say the problem lies.
This is a bug which is carried over from #13413 . When we realized that there are actually two bugs and one has been corrected, these tickets were split.
spkg: http://garsia.math.yorku.ca/~zabrocki/symmetrica-2.0.p9.spkg
apply: trac_15312_doctest-mz.patch
Attachments (5)
Change History (43)
comment:1 Changed 6 months ago by zabrocki
- Cc sage-combinat saliola aschilling zabrocki mguaypaq darij tscrim mhansen added
- Keywords symmetric functions symmetrica memleak added
Changed 6 months ago by zabrocki
comment:2 Changed 6 months ago by zabrocki
comment:3 Changed 6 months ago by zabrocki
I want to get this bug resolved too. The bug in #13413 sent me into a bit of panic because I realized that very few calculations using the symmetric function package weren't questionable (sorry for the double negative). I now have slightly more confidence in symmetric function calculations, but only a little more than before.
I started reading Matheiu's comment carefully and he states : "the problem appears at size 47". However I ran his program on my computer (OS 10.8.5 with 2.6 GHz Intel Core i7 processor) and on a mac in my office with copies of sage that had #13413 applied and not. The error seemed to appear somewhere between degree 47 and 51 depending on some non-deterministic effect. Mostly it was starting at 48.
comment:4 Changed 6 months ago by zabrocki
To help identify the problem I need to know precisely what steps trigger the error:
sage: from sage.libs.symmetrica.all import t_POWSYM_SCHUR as PS sage: g = PS({Partition([1]*50):ZZ(1)}) sage: PS({Partition([1]*4):ZZ(1)}) s[1, 1, 1, 1] + 3*s[2, 1, 1] + 2*s[2, 2] + 3*s[3, 1] + s[4] sage: PS({Partition([1]*4):QQ(1)}) s[1, 1, 1, 1] + 3*s[2, 1, 1] + 2*s[2, 2] + 508150636*s[3, 1] + s[4]
So it seems that POWSYM to SCHUR over the integers has no visible problems, but POWSYM to SCHUR over the rationals is having the overflow, but only after we've done a sufficiently large calculation (over the integers or rationals).
I'm having a hard time identifying the problem inside of the symmetrica package because if you create a c program with a command scan(POW_SYM,a); then it allows you to enter a power symmetric function with all kinds of coefficients, but none of them are rational.
enter kind of object integer [1]vector [2]partition [3]bruch [4]permutation [6] skewpart [7]tableaux [8]polynom [9]schurfunk [10]matrix [11] homsym [13]schubert [14]kostka [16]symchar [18]word [19] list [20]longint [22]powersum [28]mon. sym. [29]groupalg. [32] elm. sym. [33]fin. field [35]reihe [36]cyclotomic [41]monopoly [42] radical [43]bitvector [44]laurent [45]barperm [46] what kind:? 1
comment:5 Changed 6 months ago by darij
"bruch" means "fraction" in German, so maybe that is what a rational should be?
comment:6 Changed 6 months ago by zabrocki
Thanks for that one. I should have tried it since I didn't know what it was. I thought bruch was not quite breakfast and not quite lunch, but it comes with a slice of canteloupe at the end. You don't get completely what you would at breakfast, but you get a good meal.
comment:7 Changed 6 months ago by zabrocki
Ok. I think that I can recreate the error in Symmetrica now.
Here is my test.c program:
#include "def.h" #include "macro.h" ANFANG scan(POW_SYM,a); t_POWSYM_SCHUR(a, b); freeall(a); freeall(b); a=callocobject(); b=callocobject(); scan(POW_SYM,a); t_POWSYM_SCHUR(a, b); println(b); ENDE
In the first scan I enter a power sum p_{1^50} with coefficient integer 1. In the second scan I enter p_{1^5} with a bruch coefficient of 1/1 and my output is:
-6917520240879580904 11111 -20.752587.120249.330592 1112 -27.670133.758739.499376 113 -20.752613.517859.918472 122 -20.752587.120249.330592 14 -20.752613.517859.918472 23 -6917520240879580904 5
comment:8 Changed 6 months ago by zabrocki
Just to make sure that it really is the first calculation that is messing things up, I tried running the same program with a power sum p_{1^5} with coefficient integer 1 and in the second scan I enter p_{1^5} with a bruch coefficient of 1/1. In this case my output is as expected:
1 11111 4 1112 6 113 5 122 4 14 5 23 1 5
I don't have a fix, but it seems to be a garbage collection error and a problem with Macs.
comment:9 Changed 6 months ago by zabrocki
I get the same sort of problem with the following test.c program. This seems to show that the problem is with the 'freeall' and not with the fact that I reuse the a and b variables. ANFANG is making a bunch of calls to callocobject for variables a through h and ENDE is doing a bunch of freeall calls for the same variables.
#include "def.h" #include "macro.h" ANFANG scan(POW_SYM,a); t_POWSYM_SCHUR(a, b); freeall(a); freeall(b); scan(POW_SYM,c); t_POWSYM_SCHUR(c, d); println(d); a = callocobject(); b = callocobject(); ENDE
comment:10 Changed 6 months ago by zabrocki
I should remark that if move the freeall(a) and freeall(b) lines after the println(d) statement then I do not get the same error.
This leads me to believe that if c has rational (bruch) coefficients that it is because when the program starts to allocate the memory for the bruch coefficients that certain bits are not being initialized to 0 and they need to be. I don't know why this error would be triggered only when b uses up a lot of memory. Maybe if b is something huge then the the memory collection reuses memory and if b is small then the allocation of c and d memory is 'clean' and is initialized to 0 already.
I should also remark that the coefficients that are printed out in the println(d) are not random (as we are seeing in sage) in that if I insert several copies of println(d) in the program they always printout the same random numbers.
comment:11 Changed 6 months ago by zabrocki
Here is something integeresting^{(tm)}, if I put the freeall(a) and freeall(b) after the scan(POW_SYM,c) line then I get the correct answers. As in...
#include "def.h" #include "macro.h" ANFANG scan(POW_SYM,a); t_POWSYM_SCHUR(a, b); scan(POW_SYM,c); freeall(a); freeall(b); t_POWSYM_SCHUR(c, d); println(d); println(d); a = callocobject(); b = callocobject(); ENDE
This means that the problem seems to be in the initialization of c (through the scan) and not in the calculation of d.
comment:12 Changed 6 months ago by zabrocki
Schur enough, if the lines after t_POWSYM_SCHUR(a, b); are:
freeall(a); freeall(b); scan(POW_SYM,c); println(c);
Then the output of c is -8070441737306486504 11111.
comment:13 Changed 6 months ago by zabrocki
I'm getting closer. When my program is simply:
#include "def.h" #include "macro.h" ANFANG scan(POW_SYM,a); t_POWSYM_SCHUR(a, b); freeall(a); freeall(b); scan_bruch(c); println(c); a = callocobject(); b = callocobject(); ENDE
and my input for a is p_{1^50} with coefficient integer 1 and my input for c is 1/1 then when it prints out c I see 2305851790110509326.
comment:14 Changed 6 months ago by zabrocki
The error seems to be if the denominator is 1. If my input for c is a fraction where the denominator is not 1 (e.g. 1/2 or 7/12) then the output is correct. If my input for c is 1/1 or 14/7 then my output for c is some random integer (like -1152912740476237544).
comment:15 Changed 6 months ago by zabrocki
I can also trigger this error by scaning two bruch's and adding them together and then using as input any two rational numbers that add to a whole number. Their sum seems to be a random integer.
comment:16 Changed 6 months ago by zabrocki
Guess where the most German is used in the C code? Anne or Darij, can I assume that a good place to to look for this bug in the functions that begin with the word kuerzen in the bruch.c file?
comment:17 Changed 6 months ago by zabrocki
OK, the following just looks wrong to me and it is in kuerzen_integer_integer and this same issue seems to appear in several places.
ggterg = ggt_i(S_B_UI(bruch),S_B_OI(bruch)); if (ggterg == S_B_UI(bruch)) { freeself_bruch(bruch); M_I_I(S_B_OI(bruch) / ggterg,bruch); goto ende; }
It seems to say (1) calculate the gcd of the numerator (S_B_OI(bruch)) and the denominator (S_B_UI(bruch)) (2) if the gcd is equal to the denominator then freeself_bruch(bruch) and then divide ... and at this point I scream wait! don't touch S_B_OI(bruch) because bruch is already free!
So I want to change this to the following:
ggterg = ggt_i(S_B_UI(bruch),S_B_OI(bruch)); if (ggterg == S_B_UI(bruch)) { tmp = S_B_OI(bruch); freeself_bruch(bruch); M_I_I(tmp / ggterg,bruch); goto ende; }
When I do this...guess what!?!?! I don't seem to see the random integer in the output when I use the test.c from comment 13. I think that this means I have found the bug.
Changed 6 months ago by zabrocki
comment:18 Changed 6 months ago by zabrocki
OK. I have added a file called bruch.c and this should replace the file bruch.c that is currently in the spkg. Sorry, I probably should have done a diff of some kind rather than the file itself, but its too late now.
Jeroen, can you please replace this in the spkg? As far as I can tell, the 8 or so lines that I changed fixes the original Symmetrica bug.
comment:19 Changed 6 months ago by zabrocki
OK, I'm not a complete NULL. I can create my own spkg.
My initial tests seem to indicate that this will fix the bug.
Apply: trac_15312_doctest-mz.patch
comment:20 Changed 6 months ago by zabrocki
- Dependencies set to #13413
comment:21 Changed 6 months ago by zabrocki
- Status changed from new to needs_review
Changed 6 months ago by zabrocki
comment:22 Changed 6 months ago by zabrocki
The file bruch.diff is the only change in the spkg from #13413.
comment:23 Changed 6 months ago by jdemeyer
- Cc jdemeyer added
- Description modified (diff)
comment:24 Changed 6 months ago by jdemeyer
- Status changed from needs_review to needs_work
You should never change files directly in the src/ directory in a spkg. Add a patch in the patches/ directory and document what you did in SPKG.txt. Commit all changes and please put up a diff of the full spkg (such a diff should look like [attachment:symmetrica-2.0.p8.diff:ticket:13413).
comment:25 Changed 6 months ago by jdemeyer
Are you sure the change that you made is ok now? Are you sure that the memory that tmp points to isn't freed by freeself_bruch(bruch)? Would it not be even safer to free bruch even later?
comment:26 Changed 6 months ago by mhansen
tmp is a uint32_t and not a pointer -- that's what the S_B_UI does.
comment:27 Changed 6 months ago by jdemeyer
You're right, it's not a pointer, so it's safe.
comment:28 Changed 6 months ago by zabrocki
I've changed the spkg and included a diff file. Does it look ok now? The diff file doesn't quite have the same header information as yours.
In looking for other instances of this bug, I noticed that other places (e.g. kuerzen_longint_integer) have a similar fix that looks like it was corrected before my time.
INT wi = S_B_OI(bruch); freeself_bruch(bruch); M_I_I(wi,bruch);
comment:29 Changed 6 months ago by jdemeyer
1) The changes inside the spkg need to be committed (hg commit):
jdemeyer@boxen:~/spkg/symmetrica-2.0.p9$ hg status M SPKG.txt ? patches/bruch.patch
2) The directory src/ should be reverted to the exact contents it had before, including timestamps (in practice, simply remove src/ and copy it from the previous spkg):
jdemeyer@boxen:~/spkg/symmetrica-2.0.p9$ ls -l src/b* -rw-r--r-- 1 jdemeyer jdemeyer 23190 Dec 6 2007 src/bar.c -rw-r--r-- 1 jdemeyer jdemeyer 3467 Dec 7 2007 src/bar.doc -rw-r--r-- 1 jdemeyer jdemeyer 35117 Dec 6 2007 src/bi.c -rw-r--r-- 1 jdemeyer jdemeyer 348 Dec 7 2007 src/bi.doc -rw-r--r-- 1 jdemeyer jdemeyer 77748 Dec 6 2007 src/boe.c -rw-r--r-- 1 jdemeyer jdemeyer 7150 Dec 7 2007 src/boe.doc -rw-r--r-- 1 jdemeyer jdemeyer 59384 Oct 21 21:38 src/bruch.c -rw-r--r-- 1 jdemeyer jdemeyer 2122 Dec 7 2007 src/bruch.doc
3) The spkg doesn't install, because of problems with the filenames inside bruch.patch:
$ ./sage -i http://garsia.math.yorku.ca/~zabrocki/symmetrica-2.0.p9.spkg [...] can't find file to patch at input line 3 Perhaps you used the wrong -p or --strip option? The text leading up to this was: -------------------------- |--- ../src/bruch.c 2013-10-21 15:38:23.000000000 -0400 |+++ ../../symmetrica-2.0.p9/src/bruch.c 2013-10-21 13:35:00.000000000 -0400 --------------------------
comment:30 Changed 6 months ago by jdemeyer
Mike: excellent analysis of the problem by the way!
comment:31 Changed 6 months ago by jdemeyer
- Summary changed from fix integer overflow (?) in conversion of powersums to Schur functions to Fix use of freed memory in Symmetrica
comment:32 follow-up: ↓ 33 Changed 6 months ago by zabrocki
I've fixed the spkg and I think it is all right now. The src directory has the original time stamps. I committed the changes and hg status is fine. The spkg installs (not sure what I did wrong there, but should be ok now). What did you do to create the diff file?
comment:33 in reply to: ↑ 32 Changed 6 months ago by jdemeyer
comment:34 Changed 6 months ago by jdemeyer
- Reviewers set to Jeroen Demeyer
In trac_15312_doctest-mz.patch, you are missing a ::
Changed 6 months ago by zabrocki
Changed 6 months ago by zabrocki
comment:35 Changed 6 months ago by zabrocki
- Status changed from needs_work to needs_review
comment:36 Changed 6 months ago by jdemeyer
- Status changed from needs_review to positive_review
comment:37 Changed 6 months ago by zabrocki
Thank you! Having this fixed is a load off of my mind.
comment:38 Changed 6 months ago by jdemeyer
- Merged in set to sage-5.13.beta2
- Resolution set to fixed
- Status changed from positive_review to closed
The attachement symmetrica-test.py is a test file that is described in Comment 3 of #13413