Opened 5 months ago

Closed 4 months ago

# Fix use of freed memory in Symmetrica

Reported by: Owned by: zabrocki critical sage-5.13 combinatorics symmetric functions, symmetrica, memleak sage-combinat, saliola, aschilling, zabrocki, mguaypaq, darij, tscrim, mhansen, jdemeyer sage-5.13.beta2 Mike Zabrocki Jeroen Demeyer N/A #13413

The following bug seems to appear on Macs only (linux machines do not seem to be effected).

To begin with, let's do a change of basis in a small degree:

```    sage: p = SymmetricFunctions(QQ).powersum()
sage: s = SymmetricFunctions(QQ).schur()
sage: s(p[2,2])
s[1, 1, 1, 1] - s[2, 1, 1] + 2*s[2, 2] - s[3, 1] + s[4]
```

Now let's do one in a larger degree:

```    sage: time g = s(p([1]*47))
Time: CPU 19.16 s, Wall: 20.91 s
```

Now let's do that first one again:

```    sage: s(p[2,2])
s[1, 1, 1, 1] - s[2, 1, 1] + 4571483302*s[2, 2] - s[3, 1] + s[4]
```

That's not the correct answer ! And the next time you ask Sage, it gives different, still incorrect, answers:

```    sage: s(p[2,2])
s[1, 1, 1, 1] - s[2, 1, 1] + 4614252243*s[2, 2] - s[3, 1] + s[4]
sage: s(p[2,2])
s[1, 1, 1, 1] - s[2, 1, 1] + 4634718110*s[2, 2] - s[3, 1] + s[4]
sage: s(p[2,2])
s[1, 1, 1, 1] - s[2, 1, 1] + 4631047636*s[2, 2] - s[3, 1] + s[4]
```

In this discussion on sage-combinat-devel, Anne noticed that the problem is "not symmetrica per se, but some wrapper functions around symmetrica, as the following example shows":

```sage: f = eval('symmetrica.t_POWSYM_SCHUR')
sage: f({Partition([2,2]):Integer(1)})
s[1, 1, 1, 1] - s[2, 1, 1] + 2*s[2, 2] - s[3, 1] + s[4]
sage: time g = f({Partition([1]*47):Integer(1)})
Time: CPU 13.03 s, Wall: 13.04 s
sage: f({Partition([2,2]):Integer(1)})
s[1, 1, 1, 1] - s[2, 1, 1] + 2*s[2, 2] - s[3, 1] + s[4]
sage: Sym = SymmetricFunctions(QQ)
sage: p = Sym.power()
sage: s = Sym.schur()
sage: s(p[2,2])
s[1, 1, 1, 1] - s[2, 1, 1] + 4631790164*s[2, 2] - s[3, 1] + s[4]
sage: f({Partition([2,2]):Integer(1)})
s[1, 1, 1, 1] - s[2, 1, 1] + 2*s[2, 2] - s[3, 1] + s[4]
```

And Travis commented that:

When (random) big numbers appear like that, it almost always is an integer overflow. I suspect the communication with Symmetrica goes through the native integer. Is this correct? If so, then that's where I'd say the problem lies.

This is a bug which is carried over from #13413 . When we realized that there are actually two bugs and one has been corrected, these tickets were split.

### comment:1 Changed 5 months ago by zabrocki

• Cc sage-combinat saliola aschilling zabrocki mguaypaq darij tscrim mhansen added
• Keywords symmetric functions symmetrica memleak added

### comment:2 Changed 5 months ago by zabrocki

The attachement symmetrica-test.py is a test file that is described in Comment 3 of #13413

### comment:3 Changed 5 months ago by zabrocki

I want to get this bug resolved too. The bug in #13413 sent me into a bit of panic because I realized that very few calculations using the symmetric function package weren't questionable (sorry for the double negative). I now have slightly more confidence in symmetric function calculations, but only a little more than before.

I started reading Matheiu's comment carefully and he states : "the problem appears at size 47". However I ran his program on my computer (OS 10.8.5 with 2.6 GHz Intel Core i7 processor) and on a mac in my office with copies of sage that had #13413 applied and not. The error seemed to appear somewhere between degree 47 and 51 depending on some non-deterministic effect. Mostly it was starting at 48.

### comment:4 Changed 5 months ago by zabrocki

To help identify the problem I need to know precisely what steps trigger the error:

```sage: from sage.libs.symmetrica.all import t_POWSYM_SCHUR as PS
sage: g = PS({Partition([1]*50):ZZ(1)})
sage: PS({Partition([1]*4):ZZ(1)})
s[1, 1, 1, 1] + 3*s[2, 1, 1] + 2*s[2, 2] + 3*s[3, 1] + s[4]
sage: PS({Partition([1]*4):QQ(1)})
s[1, 1, 1, 1] + 3*s[2, 1, 1] + 2*s[2, 2] + 508150636*s[3, 1] + s[4]
```

So it seems that POWSYM to SCHUR over the integers has no visible problems, but POWSYM to SCHUR over the rationals is having the overflow, but only after we've done a sufficiently large calculation (over the integers or rationals).

I'm having a hard time identifying the problem inside of the symmetrica package because if you create a c program with a command scan(POW_SYM,a); then it allows you to enter a power symmetric function with all kinds of coefficients, but none of them are rational.

```enter kind of object
integer     [1]vector      [2]partition   [3]bruch       [4]permutation [6]
skewpart    [7]tableaux    [8]polynom     [9]schurfunk  [10]matrix     [11]
homsym     [13]schubert   [14]kostka     [16]symchar    [18]word       [19]
list       [20]longint    [22]powersum   [28]mon. sym.  [29]groupalg.  [32]
elm. sym.  [33]fin. field [35]reihe      [36]cyclotomic [41]monopoly   [42]
what kind:? 1
```

### comment:5 Changed 5 months ago by darij

"bruch" means "fraction" in German, so maybe that is what a rational should be?

### comment:6 Changed 5 months ago by zabrocki

Thanks for that one. I should have tried it since I didn't know what it was. I thought bruch was not quite breakfast and not quite lunch, but it comes with a slice of canteloupe at the end. You don't get completely what you would at breakfast, but you get a good meal.

### comment:7 Changed 5 months ago by zabrocki

Ok. I think that I can recreate the error in Symmetrica now.

Here is my test.c program:

```#include "def.h"
#include "macro.h"

ANFANG
scan(POW_SYM,a);
t_POWSYM_SCHUR(a, b);
freeall(a);
freeall(b);
a=callocobject();
b=callocobject();
scan(POW_SYM,a);
t_POWSYM_SCHUR(a, b);
println(b);
ENDE
```

In the first scan I enter a power sum p_{1^50} with coefficient integer 1. In the second scan I enter p_{1^5} with a bruch coefficient of 1/1 and my output is:

```-6917520240879580904 11111   -20.752587.120249.330592  1112   -27.670133.758739.499376
113   -20.752613.517859.918472  122   -20.752587.120249.330592  14   -20.752613.517859.918472
23  -6917520240879580904 5
```

### comment:8 Changed 5 months ago by zabrocki

Just to make sure that it really is the first calculation that is messing things up, I tried running the same program with a power sum p_{1^5} with coefficient integer 1 and in the second scan I enter p_{1^5} with a bruch coefficient of 1/1. In this case my output is as expected:

```1 11111  4 1112  6 113  5 122  4 14  5 23  1 5
```

I don't have a fix, but it seems to be a garbage collection error and a problem with Macs.

### comment:9 Changed 5 months ago by zabrocki

I get the same sort of problem with the following test.c program. This seems to show that the problem is with the 'freeall' and not with the fact that I reuse the a and b variables. ANFANG is making a bunch of calls to callocobject for variables a through h and ENDE is doing a bunch of freeall calls for the same variables.

```#include "def.h"
#include "macro.h"

ANFANG
scan(POW_SYM,a);
t_POWSYM_SCHUR(a, b);
freeall(a);
freeall(b);
scan(POW_SYM,c);
t_POWSYM_SCHUR(c, d);
println(d);
a = callocobject();
b = callocobject();
ENDE
```

### comment:10 Changed 5 months ago by zabrocki

I should remark that if move the freeall(a) and freeall(b) lines after the println(d) statement then I do not get the same error.

This leads me to believe that if c has rational (bruch) coefficients that it is because when the program starts to allocate the memory for the bruch coefficients that certain bits are not being initialized to 0 and they need to be. I don't know why this error would be triggered only when b uses up a lot of memory. Maybe if b is something huge then the the memory collection reuses memory and if b is small then the allocation of c and d memory is 'clean' and is initialized to 0 already.

I should also remark that the coefficients that are printed out in the println(d) are not random (as we are seeing in sage) in that if I insert several copies of println(d) in the program they always printout the same random numbers.

### comment:11 Changed 5 months ago by zabrocki

Here is something integeresting(tm), if I put the freeall(a) and freeall(b) after the scan(POW_SYM,c) line then I get the correct answers. As in...

```#include "def.h"
#include "macro.h"

ANFANG
scan(POW_SYM,a);
t_POWSYM_SCHUR(a, b);
scan(POW_SYM,c);
freeall(a);
freeall(b);
t_POWSYM_SCHUR(c, d);
println(d);
println(d);
a = callocobject();
b = callocobject();
ENDE
```

This means that the problem seems to be in the initialization of c (through the scan) and not in the calculation of d.

### comment:12 Changed 5 months ago by zabrocki

Schur enough, if the lines after t_POWSYM_SCHUR(a, b); are:

```freeall(a);
freeall(b);
scan(POW_SYM,c);
println(c);
```

Then the output of c is -8070441737306486504 11111.

### comment:13 Changed 5 months ago by zabrocki

I'm getting closer. When my program is simply:

```#include "def.h"
#include "macro.h"

ANFANG
scan(POW_SYM,a);
t_POWSYM_SCHUR(a, b);
freeall(a);
freeall(b);
scan_bruch(c);
println(c);
a = callocobject();
b = callocobject();
ENDE
```

and my input for a is p_{1^50} with coefficient integer 1 and my input for c is 1/1 then when it prints out c I see 2305851790110509326.

### comment:14 Changed 5 months ago by zabrocki

The error seems to be if the denominator is 1. If my input for c is a fraction where the denominator is not 1 (e.g. 1/2 or 7/12) then the output is correct. If my input for c is 1/1 or 14/7 then my output for c is some random integer (like -1152912740476237544).

### comment:15 Changed 5 months ago by zabrocki

I can also trigger this error by scaning two bruch's and adding them together and then using as input any two rational numbers that add to a whole number. Their sum seems to be a random integer.

### comment:16 Changed 5 months ago by zabrocki

Guess where the most German is used in the C code? Anne or Darij, can I assume that a good place to to look for this bug in the functions that begin with the word kuerzen in the bruch.c file?

### comment:17 Changed 5 months ago by zabrocki

OK, the following just looks wrong to me and it is in kuerzen_integer_integer and this same issue seems to appear in several places.

```    ggterg = ggt_i(S_B_UI(bruch),S_B_OI(bruch));

if (ggterg == S_B_UI(bruch)) {
freeself_bruch(bruch);
M_I_I(S_B_OI(bruch) / ggterg,bruch);
goto ende;
}
```

It seems to say (1) calculate the gcd of the numerator (S_B_OI(bruch)) and the denominator (S_B_UI(bruch)) (2) if the gcd is equal to the denominator then freeself_bruch(bruch) and then divide ... and at this point I scream wait! don't touch S_B_OI(bruch) because bruch is already free!

So I want to change this to the following:

```    ggterg = ggt_i(S_B_UI(bruch),S_B_OI(bruch));

if (ggterg == S_B_UI(bruch)) {
tmp = S_B_OI(bruch);
freeself_bruch(bruch);
M_I_I(tmp / ggterg,bruch);
goto ende;
}
```

When I do this...guess what!?!?! I don't seem to see the random integer in the output when I use the test.c from comment 13. I think that this means I have found the bug.

### comment:18 Changed 5 months ago by zabrocki

OK. I have added a file called bruch.c and this should replace the file bruch.c that is currently in the spkg. Sorry, I probably should have done a diff of some kind rather than the file itself, but its too late now.

Jeroen, can you please replace this in the spkg? As far as I can tell, the 8 or so lines that I changed fixes the original Symmetrica bug.

### comment:19 Changed 5 months ago by zabrocki

OK, I'm not a complete NULL. I can create my own spkg.

My initial tests seem to indicate that this will fix the bug.

Apply: trac_15312_doctest-mz.patch​

### comment:20 Changed 5 months ago by zabrocki

• Authors set to Mike Zabrocki
• Dependencies set to #13413

### comment:21 Changed 5 months ago by zabrocki

• Status changed from new to needs_review

### comment:22 Changed 5 months ago by zabrocki

The file bruch.diff is the only change in the spkg from #13413.

### comment:23 Changed 5 months ago by jdemeyer

• Description modified (diff)

### comment:24 Changed 5 months ago by jdemeyer

• Status changed from needs_review to needs_work

You should never change files directly in the src/ directory in a spkg. Add a patch in the patches/ directory and document what you did in SPKG.txt. Commit all changes and please put up a diff of the full spkg (such a diff should look like [attachment:symmetrica-2.0.p8.diff:ticket:13413).

### comment:25 Changed 5 months ago by jdemeyer

Are you sure the change that you made is ok now? Are you sure that the memory that tmp points to isn't freed by freeself_bruch(bruch)? Would it not be even safer to free bruch even later?

### comment:26 Changed 5 months ago by mhansen

tmp is a uint32_t and not a pointer -- that's what the S_B_UI does.

### comment:27 Changed 5 months ago by jdemeyer

You're right, it's not a pointer, so it's safe.

### comment:28 Changed 5 months ago by zabrocki

I've changed the spkg and included a diff file. Does it look ok now? The diff file doesn't quite have the same header information as yours.

In looking for other instances of this bug, I noticed that other places (e.g. kuerzen_longint_integer) have a similar fix that looks like it was corrected before my time.

```            INT wi = S_B_OI(bruch);
freeself_bruch(bruch);
M_I_I(wi,bruch);
```

### comment:29 Changed 5 months ago by jdemeyer

1) The changes inside the spkg need to be committed (hg commit):

```jdemeyer@boxen:~/spkg/symmetrica-2.0.p9\$ hg status
M SPKG.txt
? patches/bruch.patch
```

2) The directory src/ should be reverted to the exact contents it had before, including timestamps (in practice, simply remove src/ and copy it from the previous spkg):

```jdemeyer@boxen:~/spkg/symmetrica-2.0.p9\$ ls -l src/b*
-rw-r--r-- 1 jdemeyer jdemeyer 23190 Dec  6  2007 src/bar.c
-rw-r--r-- 1 jdemeyer jdemeyer  3467 Dec  7  2007 src/bar.doc
-rw-r--r-- 1 jdemeyer jdemeyer 35117 Dec  6  2007 src/bi.c
-rw-r--r-- 1 jdemeyer jdemeyer   348 Dec  7  2007 src/bi.doc
-rw-r--r-- 1 jdemeyer jdemeyer 77748 Dec  6  2007 src/boe.c
-rw-r--r-- 1 jdemeyer jdemeyer  7150 Dec  7  2007 src/boe.doc
-rw-r--r-- 1 jdemeyer jdemeyer 59384 Oct 21 21:38 src/bruch.c
-rw-r--r-- 1 jdemeyer jdemeyer  2122 Dec  7  2007 src/bruch.doc
```

3) The spkg doesn't install, because of problems with the filenames inside bruch.patch:

```\$ ./sage -i http://garsia.math.yorku.ca/~zabrocki/symmetrica-2.0.p9.spkg
[...]
can't find file to patch at input line 3
Perhaps you used the wrong -p or --strip option?
The text leading up to this was:
--------------------------
|--- ../src/bruch.c     2013-10-21 15:38:23.000000000 -0400
|+++ ../../symmetrica-2.0.p9/src/bruch.c        2013-10-21 13:35:00.000000000 -0400
--------------------------
```

### comment:30 Changed 5 months ago by jdemeyer

Mike: excellent analysis of the problem by the way!

### comment:31 Changed 5 months ago by jdemeyer

• Summary changed from fix integer overflow (?) in conversion of powersums to Schur functions to Fix use of freed memory in Symmetrica

### comment:32 follow-up: ↓ 33 Changed 5 months ago by zabrocki

I've fixed the spkg and I think it is all right now. The src directory has the original time stamps. I committed the changes and hg status is fine. The spkg installs (not sure what I did wrong there, but should be ok now). What did you do to create the diff file?

### comment:33 in reply to: ↑ 32 Changed 5 months ago by jdemeyer

What did you do to create the diff file?

After committing, hg export tip.

### comment:34 Changed 5 months ago by jdemeyer

• Reviewers set to Jeroen Demeyer

In trac_15312_doctest-mz.patch, you are missing a ::

### comment:35 Changed 5 months ago by zabrocki

• Status changed from needs_work to needs_review

### comment:36 Changed 5 months ago by jdemeyer

• Status changed from needs_review to positive_review

### comment:37 Changed 5 months ago by zabrocki

Thank you! Having this fixed is a load off of my mind.

### comment:38 Changed 4 months ago by jdemeyer

• Merged in set to sage-5.13.beta2
• Resolution set to fixed
• Status changed from positive_review to closed
Note: See TracTickets for help on using tickets.