Opened 6 months ago

Closed 6 months ago

#15312 closed defect (fixed)

Fix use of freed memory in Symmetrica

Reported by: zabrocki Owned by:
Priority: critical Milestone: sage-5.13
Component: combinatorics Keywords: symmetric functions, symmetrica, memleak
Cc: sage-combinat, saliola, aschilling, zabrocki, mguaypaq, darij, tscrim, mhansen, jdemeyer Merged in: sage-5.13.beta2
Authors: Mike Zabrocki Reviewers: Jeroen Demeyer
Report Upstream: N/A Work issues:
Branch: Commit:
Dependencies: #13413 Stopgaps:

Description (last modified by jdemeyer)

The following bug seems to appear on Macs only (linux machines do not seem to be effected).

To begin with, let's do a change of basis in a small degree:

    sage: p = SymmetricFunctions(QQ).powersum()
    sage: s = SymmetricFunctions(QQ).schur()
    sage: s(p[2,2])
    s[1, 1, 1, 1] - s[2, 1, 1] + 2*s[2, 2] - s[3, 1] + s[4]

Now let's do one in a larger degree:

    sage: time g = s(p([1]*47))
    Time: CPU 19.16 s, Wall: 20.91 s

Now let's do that first one again:

    sage: s(p[2,2])
    s[1, 1, 1, 1] - s[2, 1, 1] + 4571483302*s[2, 2] - s[3, 1] + s[4]

That's not the correct answer ! And the next time you ask Sage, it gives different, still incorrect, answers:

    sage: s(p[2,2])
    s[1, 1, 1, 1] - s[2, 1, 1] + 4614252243*s[2, 2] - s[3, 1] + s[4]
    sage: s(p[2,2])
    s[1, 1, 1, 1] - s[2, 1, 1] + 4634718110*s[2, 2] - s[3, 1] + s[4]
    sage: s(p[2,2])
    s[1, 1, 1, 1] - s[2, 1, 1] + 4631047636*s[2, 2] - s[3, 1] + s[4]

In this discussion on sage-combinat-devel, Anne noticed that the problem is "not symmetrica per se, but some wrapper functions around symmetrica, as the following example shows":

sage: f = eval('symmetrica.t_POWSYM_SCHUR')
sage: f({Partition([2,2]):Integer(1)})
s[1, 1, 1, 1] - s[2, 1, 1] + 2*s[2, 2] - s[3, 1] + s[4]
sage: time g = f({Partition([1]*47):Integer(1)})
Time: CPU 13.03 s, Wall: 13.04 s
sage: f({Partition([2,2]):Integer(1)})
s[1, 1, 1, 1] - s[2, 1, 1] + 2*s[2, 2] - s[3, 1] + s[4]
sage: Sym = SymmetricFunctions(QQ)
sage: p = Sym.power()
sage: s = Sym.schur()
sage: s(p[2,2])
s[1, 1, 1, 1] - s[2, 1, 1] + 4631790164*s[2, 2] - s[3, 1] + s[4]
sage: f({Partition([2,2]):Integer(1)})
s[1, 1, 1, 1] - s[2, 1, 1] + 2*s[2, 2] - s[3, 1] + s[4]

And Travis commented that:

When (random) big numbers appear like that, it almost always is an integer overflow. I suspect the communication with Symmetrica goes through the native integer. Is this correct? If so, then that's where I'd say the problem lies.

This is a bug which is carried over from #13413 . When we realized that there are actually two bugs and one has been corrected, these tickets were split.

spkg: http://garsia.math.yorku.ca/~zabrocki/symmetrica-2.0.p9.spkg

apply: trac_15312_doctest-mz.patch

Attachments (5)

symmetrica-test.py (530 bytes) - added by zabrocki 6 months ago.
bruch.c (58.1 KB) - added by zabrocki 6 months ago.
bruch.diff (1.2 KB) - added by zabrocki 6 months ago.
trac_15312_doctest-mz.patch (1.0 KB) - added by zabrocki 6 months ago.
symmetrica-2.0.p9.diff (2.4 KB) - added by zabrocki 6 months ago.

Download all attachments as: .zip

Change History (43)

comment:1 Changed 6 months ago by zabrocki

  • Cc sage-combinat saliola aschilling zabrocki mguaypaq darij tscrim mhansen added
  • Keywords symmetric functions symmetrica memleak added

Changed 6 months ago by zabrocki

comment:2 Changed 6 months ago by zabrocki

The attachement symmetrica-test.py is a test file that is described in Comment 3 of #13413

comment:3 Changed 6 months ago by zabrocki

I want to get this bug resolved too. The bug in #13413 sent me into a bit of panic because I realized that very few calculations using the symmetric function package weren't questionable (sorry for the double negative). I now have slightly more confidence in symmetric function calculations, but only a little more than before.

I started reading Matheiu's comment carefully and he states : "the problem appears at size 47". However I ran his program on my computer (OS 10.8.5 with 2.6 GHz Intel Core i7 processor) and on a mac in my office with copies of sage that had #13413 applied and not. The error seemed to appear somewhere between degree 47 and 51 depending on some non-deterministic effect. Mostly it was starting at 48.

comment:4 Changed 6 months ago by zabrocki

To help identify the problem I need to know precisely what steps trigger the error:

sage: from sage.libs.symmetrica.all import t_POWSYM_SCHUR as PS
sage: g = PS({Partition([1]*50):ZZ(1)})
sage: PS({Partition([1]*4):ZZ(1)})
s[1, 1, 1, 1] + 3*s[2, 1, 1] + 2*s[2, 2] + 3*s[3, 1] + s[4]
sage: PS({Partition([1]*4):QQ(1)})
s[1, 1, 1, 1] + 3*s[2, 1, 1] + 2*s[2, 2] + 508150636*s[3, 1] + s[4]

So it seems that POWSYM to SCHUR over the integers has no visible problems, but POWSYM to SCHUR over the rationals is having the overflow, but only after we've done a sufficiently large calculation (over the integers or rationals).

I'm having a hard time identifying the problem inside of the symmetrica package because if you create a c program with a command scan(POW_SYM,a); then it allows you to enter a power symmetric function with all kinds of coefficients, but none of them are rational.

enter kind of object
integer     [1]vector      [2]partition   [3]bruch       [4]permutation [6]
skewpart    [7]tableaux    [8]polynom     [9]schurfunk  [10]matrix     [11]
homsym     [13]schubert   [14]kostka     [16]symchar    [18]word       [19]
list       [20]longint    [22]powersum   [28]mon. sym.  [29]groupalg.  [32]
elm. sym.  [33]fin. field [35]reihe      [36]cyclotomic [41]monopoly   [42]
radical    [43]bitvector  [44]laurent    [45]barperm    [46]
what kind:? 1

comment:5 Changed 6 months ago by darij

"bruch" means "fraction" in German, so maybe that is what a rational should be?

comment:6 Changed 6 months ago by zabrocki

Thanks for that one. I should have tried it since I didn't know what it was. I thought bruch was not quite breakfast and not quite lunch, but it comes with a slice of canteloupe at the end. You don't get completely what you would at breakfast, but you get a good meal.

comment:7 Changed 6 months ago by zabrocki

Ok. I think that I can recreate the error in Symmetrica now.

Here is my test.c program:

#include "def.h"
#include "macro.h"

ANFANG
scan(POW_SYM,a);
t_POWSYM_SCHUR(a, b);
freeall(a);
freeall(b);
a=callocobject();
b=callocobject();
scan(POW_SYM,a);
t_POWSYM_SCHUR(a, b);
println(b);
ENDE

In the first scan I enter a power sum p_{1^50} with coefficient integer 1. In the second scan I enter p_{1^5} with a bruch coefficient of 1/1 and my output is:

-6917520240879580904 11111   -20.752587.120249.330592  1112   -27.670133.758739.499376 
 113   -20.752613.517859.918472  122   -20.752587.120249.330592  14   -20.752613.517859.918472 
 23  -6917520240879580904 5  

comment:8 Changed 6 months ago by zabrocki

Just to make sure that it really is the first calculation that is messing things up, I tried running the same program with a power sum p_{1^5} with coefficient integer 1 and in the second scan I enter p_{1^5} with a bruch coefficient of 1/1. In this case my output is as expected:

1 11111  4 1112  6 113  5 122  4 14  5 23  1 5 

I don't have a fix, but it seems to be a garbage collection error and a problem with Macs.

comment:9 Changed 6 months ago by zabrocki

I get the same sort of problem with the following test.c program. This seems to show that the problem is with the 'freeall' and not with the fact that I reuse the a and b variables. ANFANG is making a bunch of calls to callocobject for variables a through h and ENDE is doing a bunch of freeall calls for the same variables.

#include "def.h"
#include "macro.h"

ANFANG
scan(POW_SYM,a); 
t_POWSYM_SCHUR(a, b); 
freeall(a);
freeall(b);
scan(POW_SYM,c); 
t_POWSYM_SCHUR(c, d); 
println(d); 
a = callocobject();
b = callocobject();
ENDE

comment:10 Changed 6 months ago by zabrocki

I should remark that if move the freeall(a) and freeall(b) lines after the println(d) statement then I do not get the same error.

This leads me to believe that if c has rational (bruch) coefficients that it is because when the program starts to allocate the memory for the bruch coefficients that certain bits are not being initialized to 0 and they need to be. I don't know why this error would be triggered only when b uses up a lot of memory. Maybe if b is something huge then the the memory collection reuses memory and if b is small then the allocation of c and d memory is 'clean' and is initialized to 0 already.

I should also remark that the coefficients that are printed out in the println(d) are not random (as we are seeing in sage) in that if I insert several copies of println(d) in the program they always printout the same random numbers.

comment:11 Changed 6 months ago by zabrocki

Here is something integeresting(tm), if I put the freeall(a) and freeall(b) after the scan(POW_SYM,c) line then I get the correct answers. As in...

#include "def.h"
#include "macro.h"

ANFANG
scan(POW_SYM,a);
t_POWSYM_SCHUR(a, b);
scan(POW_SYM,c);
freeall(a);
freeall(b);
t_POWSYM_SCHUR(c, d);
println(d);
println(d);
a = callocobject();
b = callocobject();
ENDE

This means that the problem seems to be in the initialization of c (through the scan) and not in the calculation of d.

comment:12 Changed 6 months ago by zabrocki

Schur enough, if the lines after t_POWSYM_SCHUR(a, b); are:

freeall(a);
freeall(b);
scan(POW_SYM,c);
println(c);

Then the output of c is -8070441737306486504 11111.

comment:13 Changed 6 months ago by zabrocki

I'm getting closer. When my program is simply:

#include "def.h"
#include "macro.h"

ANFANG
scan(POW_SYM,a); 
t_POWSYM_SCHUR(a, b); 
freeall(a);
freeall(b);
scan_bruch(c); 
println(c);
a = callocobject();
b = callocobject();
ENDE

and my input for a is p_{1^50} with coefficient integer 1 and my input for c is 1/1 then when it prints out c I see 2305851790110509326.

comment:14 Changed 6 months ago by zabrocki

The error seems to be if the denominator is 1. If my input for c is a fraction where the denominator is not 1 (e.g. 1/2 or 7/12) then the output is correct. If my input for c is 1/1 or 14/7 then my output for c is some random integer (like -1152912740476237544).

comment:15 Changed 6 months ago by zabrocki

I can also trigger this error by scaning two bruch's and adding them together and then using as input any two rational numbers that add to a whole number. Their sum seems to be a random integer.

comment:16 Changed 6 months ago by zabrocki

Guess where the most German is used in the C code? Anne or Darij, can I assume that a good place to to look for this bug in the functions that begin with the word kuerzen in the bruch.c file?

comment:17 Changed 6 months ago by zabrocki

OK, the following just looks wrong to me and it is in kuerzen_integer_integer and this same issue seems to appear in several places.

    ggterg = ggt_i(S_B_UI(bruch),S_B_OI(bruch));

    if (ggterg == S_B_UI(bruch)) {
        freeself_bruch(bruch);
        M_I_I(S_B_OI(bruch) / ggterg,bruch);
        goto ende;
        }

It seems to say (1) calculate the gcd of the numerator (S_B_OI(bruch)) and the denominator (S_B_UI(bruch)) (2) if the gcd is equal to the denominator then freeself_bruch(bruch) and then divide ... and at this point I scream wait! don't touch S_B_OI(bruch) because bruch is already free!

So I want to change this to the following:

    ggterg = ggt_i(S_B_UI(bruch),S_B_OI(bruch));

    if (ggterg == S_B_UI(bruch)) {
        tmp = S_B_OI(bruch);
        freeself_bruch(bruch);
        M_I_I(tmp / ggterg,bruch);
        goto ende;
        }

When I do this...guess what!?!?! I don't seem to see the random integer in the output when I use the test.c from comment 13. I think that this means I have found the bug.

Changed 6 months ago by zabrocki

comment:18 Changed 6 months ago by zabrocki

OK. I have added a file called bruch.c and this should replace the file bruch.c that is currently in the spkg. Sorry, I probably should have done a diff of some kind rather than the file itself, but its too late now.

Jeroen, can you please replace this in the spkg? As far as I can tell, the 8 or so lines that I changed fixes the original Symmetrica bug.

comment:19 Changed 6 months ago by zabrocki

OK, I'm not a complete NULL. I can create my own spkg.

symmetrica-2.0.p9.spkg

My initial tests seem to indicate that this will fix the bug.

Apply: trac_15312_doctest-mz.patch​

comment:20 Changed 6 months ago by zabrocki

  • Authors set to Mike Zabrocki
  • Dependencies set to #13413

comment:21 Changed 6 months ago by zabrocki

  • Status changed from new to needs_review

Changed 6 months ago by zabrocki

comment:22 Changed 6 months ago by zabrocki

The file bruch.diff is the only change in the spkg from #13413.

comment:23 Changed 6 months ago by jdemeyer

  • Cc jdemeyer added
  • Description modified (diff)

comment:24 Changed 6 months ago by jdemeyer

  • Status changed from needs_review to needs_work

You should never change files directly in the src/ directory in a spkg. Add a patch in the patches/ directory and document what you did in SPKG.txt. Commit all changes and please put up a diff of the full spkg (such a diff should look like [attachment:symmetrica-2.0.p8.diff:ticket:13413).

See http://sagemath.org/doc/developer/patching_spkgs.html

comment:25 Changed 6 months ago by jdemeyer

Are you sure the change that you made is ok now? Are you sure that the memory that tmp points to isn't freed by freeself_bruch(bruch)? Would it not be even safer to free bruch even later?

comment:26 Changed 6 months ago by mhansen

tmp is a uint32_t and not a pointer -- that's what the S_B_UI does.

comment:27 Changed 6 months ago by jdemeyer

You're right, it's not a pointer, so it's safe.

comment:28 Changed 6 months ago by zabrocki

I've changed the spkg and included a diff file. Does it look ok now? The diff file doesn't quite have the same header information as yours.

In looking for other instances of this bug, I noticed that other places (e.g. kuerzen_longint_integer) have a similar fix that looks like it was corrected before my time.

            INT wi = S_B_OI(bruch);
            freeself_bruch(bruch);
            M_I_I(wi,bruch);

comment:29 Changed 6 months ago by jdemeyer

1) The changes inside the spkg need to be committed (hg commit):

jdemeyer@boxen:~/spkg/symmetrica-2.0.p9$ hg status
M SPKG.txt
? patches/bruch.patch

2) The directory src/ should be reverted to the exact contents it had before, including timestamps (in practice, simply remove src/ and copy it from the previous spkg):

jdemeyer@boxen:~/spkg/symmetrica-2.0.p9$ ls -l src/b*
-rw-r--r-- 1 jdemeyer jdemeyer 23190 Dec  6  2007 src/bar.c
-rw-r--r-- 1 jdemeyer jdemeyer  3467 Dec  7  2007 src/bar.doc
-rw-r--r-- 1 jdemeyer jdemeyer 35117 Dec  6  2007 src/bi.c
-rw-r--r-- 1 jdemeyer jdemeyer   348 Dec  7  2007 src/bi.doc
-rw-r--r-- 1 jdemeyer jdemeyer 77748 Dec  6  2007 src/boe.c
-rw-r--r-- 1 jdemeyer jdemeyer  7150 Dec  7  2007 src/boe.doc
-rw-r--r-- 1 jdemeyer jdemeyer 59384 Oct 21 21:38 src/bruch.c
-rw-r--r-- 1 jdemeyer jdemeyer  2122 Dec  7  2007 src/bruch.doc

3) The spkg doesn't install, because of problems with the filenames inside bruch.patch:

$ ./sage -i http://garsia.math.yorku.ca/~zabrocki/symmetrica-2.0.p9.spkg
[...]
can't find file to patch at input line 3
Perhaps you used the wrong -p or --strip option?
The text leading up to this was:
--------------------------
|--- ../src/bruch.c     2013-10-21 15:38:23.000000000 -0400
|+++ ../../symmetrica-2.0.p9/src/bruch.c        2013-10-21 13:35:00.000000000 -0400
--------------------------

comment:30 Changed 6 months ago by jdemeyer

Mike: excellent analysis of the problem by the way!

comment:31 Changed 6 months ago by jdemeyer

  • Summary changed from fix integer overflow (?) in conversion of powersums to Schur functions to Fix use of freed memory in Symmetrica

comment:32 follow-up: Changed 6 months ago by zabrocki

I've fixed the spkg and I think it is all right now. The src directory has the original time stamps. I committed the changes and hg status is fine. The spkg installs (not sure what I did wrong there, but should be ok now). What did you do to create the diff file?

comment:33 in reply to: ↑ 32 Changed 6 months ago by jdemeyer

Replying to zabrocki:

What did you do to create the diff file?

After committing, hg export tip.

comment:34 Changed 6 months ago by jdemeyer

  • Reviewers set to Jeroen Demeyer

In trac_15312_doctest-mz.patch, you are missing a ::

Changed 6 months ago by zabrocki

Changed 6 months ago by zabrocki

comment:35 Changed 6 months ago by zabrocki

  • Status changed from needs_work to needs_review

comment:36 Changed 6 months ago by jdemeyer

  • Status changed from needs_review to positive_review

comment:37 Changed 6 months ago by zabrocki

Thank you! Having this fixed is a load off of my mind.

comment:38 Changed 6 months ago by jdemeyer

  • Merged in set to sage-5.13.beta2
  • Resolution set to fixed
  • Status changed from positive_review to closed
Note: See TracTickets for help on using tickets.