Opened 9 years ago

Closed 9 years ago

#12004 closed defect (duplicate)

copying a linear program using Coin solver consumes enormous amounts of memory

Reported by: john_perry Owned by: ncohen
Priority: major Milestone: sage-duplicate/invalid/wontfix
Component: linear programming Keywords: linear programming, Coin, copy, memory
Cc: Merged in:
Authors: Reviewers: Nathann Cohen
Report Upstream: Reported upstream. Developers acknowledge bug. Work issues:
Branch: Commit:
Dependencies: #12220 Stopgaps:

Status badges

Description

The following Sage program quickly explodes memory usage:

P = MixedIntegerLinearProgram(solver="Coin")
P.add_constraint(P[0]+P[1]==1)
while True:
  P = copy(P)
  w = solve(P)

Change History (30)

comment:1 Changed 9 years ago by john_perry

One problem may lie in line 259 of mip.pyx:

cdef MixedIntegerLinearProgram p = MixedIntegerLinearProgram(solver="GLPK")

but changing this does not resolve the problem. In any case, I'm not sure what's the best way to fix this; p has to be initialized to something. When I first did it, I initialized p by checking isinstance(self._backend,...), but that leads to import problems if CPLEX or Coin are not installed. That didn't fix the memory leak, so I didn't see the point in submitting even a preliminary patch. I could add an attribute that would indicate what the backend is, but there's probably a better way.

comment:2 follow-up: Changed 9 years ago by ncohen

Ohhhhhh !! Now I remember why there is this solver="GLPK" where it shouldn't be, inside of the code ! It's because three lines later the _backend variable is overwritten anyway :

p._backend = (<GenericBackend> self._backend).copy()

So the solver used by the copy of the MILP will indeed be the same as the current solver, and all is well under the sun. Now, for some reason the bug could come from the fact that even though the p._backend is overwritten, the former object is not freed. Would a del p._backend just before the assigning a new value to p._backend solve the problem ? :-)

Nathann

comment:3 in reply to: ↑ 2 Changed 9 years ago by john_perry

Replying to ncohen:

Would a del p._backend just before the assigning a new value to p._backend solve the problem ? :-)

I had tried something like this, but Cython complains:

sage/numerical/mip.pyx:281:13: Cannot delete C attribute of extension type

Is there a command that lets me delete the backend? I can see that there is a command that is commented out, void del_OsiCbcSolverInterface..., but I haven't yet figured out whether I should uncomment or use that.

comment:4 follow-up: Changed 9 years ago by john_perry

FWIW, I'm fairly convinced that initializing with the GLPK backend is not the problem. I tried the following modifications:

  • change __init__ in MixedIntegerLinearProgram? so that solver==None actually gives no backend, not the GLPK backend;
  • in __copy__, initialize with solver==None.

I know it is a bad, bad idea to change the behavior of an initializer this way, but I thought it would give unequivocal evidence. I've changed my version back.

Indeed, even in this case, copying a Coin solver consumes enormous memory. It seems easier to break out of the loop, though. In fact, within 15 loops it consumes 1.6GB RAM. That's not a typo: only 15!!! 8-0

So the problem really does lie elsewhere.

comment:5 in reply to: ↑ 4 ; follow-up: Changed 9 years ago by ncohen

So the problem really does lie elsewhere.

Did you add a "print 'Hey'" in the dealloc method of the COIN backend, just to see whether the backends are really removed each time you overwrite the value of P ?

Nathann

comment:6 in reply to: ↑ 5 Changed 9 years ago by john_perry

Replying to ncohen:

Did you add a "print 'Hey'" in the dealloc method of the COIN backend, just to see whether the backends are really removed each time you overwrite the value of P ?

No, but I tried it now, and the backends really are removed: I get a "hey" with every copy.

In addition, the explosion in memory occurs regardless of whether we add constraints or not. You could simply run P = copy(P) several times, and memory usage jumps up.

comment:7 follow-up: Changed 9 years ago by john_perry

Maybe I'm wrong (probably) but the exponential growth suggests to me that, during the copy, some sort of nesting is going on. If the problem were merely a failure to deallocate, the growth should be merely linear, shouldn't it?

Python has a garbage collector -- is there any way to peek inside the structure of P, and see what constitutes it?

comment:8 in reply to: ↑ 7 ; follow-up: Changed 9 years ago by ncohen

Maybe I'm wrong (probably) but the exponential growth suggests to me that, during the copy, some sort of nesting is going on. If the problem were merely a failure to deallocate, the growth should be merely linear, shouldn't it?

Yep, right O_o

Python has a garbage collector -- is there any way to peek inside the structure of P, and see what constitutes it?

I've got no idea how to do that. And if that were the case, it would have to deal with Cython too !

Nathann

comment:9 in reply to: ↑ 8 Changed 9 years ago by john_perry

Replying to ncohen:

Maybe I'm wrong (probably) but the exponential growth suggests to me that, during the copy, some sort of nesting is going on. If the problem were merely a failure to deallocate, the growth should be merely linear, shouldn't it?

Yep, right O_o

I was afraid of that.

Python has a garbage collector -- is there any way to peek inside the structure of P, and see what constitutes it?

I've got no idea how to do that. And if that were the case, it would have to deal with Cython too !

I tried it (import gc; gc.get_referents(P)). It isn't really helpful.

I also tried the following: instead of calling the copy constructor, I called the clone method. I'm referring to this function in OsiCbcSolverInterface.hpp:

virtual OsiSolverInterface * clone(bool copyData = true) const;

That required modifying the .pxd file, of course. I got it to compile & run, but the memory problems persist. Still chugging away at this...

comment:10 Changed 9 years ago by john_perry

I tried the following code:

from sage.numerical.backends.coin_backend import CoinBackend

p = CoinBackend()
while True: p = copy(p)

We do not see an explosion of memory on this. The problem lies elsewhere.

Geez... I should have tried this a long time ago.

comment:11 Changed 9 years ago by john_perry

Ahh... spoke too soon. If I change the above to,

from sage.numerical.backends.coin_backend import CoinBackend

p = CoinBackend()
while True: p = p.copy()

then the memory problems persist.

comment:12 Changed 9 years ago by john_perry

I figured out how to implement the copy constructor: in the .pxd file, declare under cdef cppclass c_OsiCbcSolverInterface "OsiCbcSolverInterface"

  c_OsiCbcSolverInterface(c_OsiCbcSolverInterface &si)

In the .pyx file, I then call

        p.si = new c_OsiCbcSolverInterface(<c_OsiCbcSolverInterface>self.si)

It runs, but we still get exponential growth of memory.

I'm running out of ideas here. Do you have a copy of the Coin source, and can you see if there is exponential growth if we do the same thing there?

comment:13 Changed 9 years ago by ncohen

Hello again !

Did you try copying several times the line p.si = new [...] ? This way, each time you do a copy() in Sage the COIN interface actually does several of them, if the memory explodes faster you will be sure it comes from their code !

On my computer I am having some trouble installing CBC right now ^^;

Nathann

comment:14 Changed 9 years ago by john_perry

Here's what I tried. In coin_backend.copy I changed line 990,

p.si = new c_OsiCbcSolverInterface(<OsiSolverInterface *> self.si)

to

for each in xrange(20): p.si = new c_OsiCbcSolverInterface(<OsiSolverInterface *> self.si)

That gave me no problems.

Then I changed it to two lines:

p.si = new c_OsiCbcSolverInterface(<OsiSolverInterface *> self.si)
for each in xrange(20): p.si = new c_OsiCbcSolverInterface(<OsiSolverInterface *> p.si)

and quickly regretted it!

So, it looks as if that's the problem.

comment:15 Changed 9 years ago by john_perry

I have successfully replicated the problem in C++ directly, using the following short program:

#include "include/coin/OsiCbcSolverInterface.hpp"
#include <stdio.h>

int main() {
  OsiCbcSolverInterface * si = new OsiCbcSolverInterface(NULL);
  
  for (int i = 0; i < 13; i++) {
    printf("%d ", i);
    si = new OsiCbcSolverInterface(si);
  }
  printf("\n");
  
  while (true) {}
  
  delete si;
}

An enormous amount of memory is consumed thereby. Since it's possible that this has already been resolved upstream, I will test with the latest version of Coin (2.7.5) in a moment.

comment:16 Changed 9 years ago by john_perry

Same problem in 2.7.5.

comment:17 Changed 9 years ago by ncohen

Hmmm.... So the issue is to "report upstream" ?

Nathann

comment:18 Changed 9 years ago by john_perry

I want to look carefully at the documentation first, and see if there's another way to do what we're trying to do. Maybe this approach is not what they actually want (but I doubt it). I'd also like to see if I can identify the precise location of the error.

comment:19 Changed 9 years ago by john_perry

  • Owner changed from ncohen to john_perry
  • Report Upstream changed from N/A to Not yet reported upstream; Will do shortly.

The source code (Cbc/src/OsiCbc/OsiCbcSolverInterface.cpp) suggests we're doing the right thing: when supplied a solver, the constructor tries to clone it. I can see that they carry out several further steps, cloning the solver again, but that's for a different class member (referenceSolver_). Nothing strikes me as obviously wrong.

I will report it upstream and try to find out what's going on.

comment:20 Changed 9 years ago by john_perry

  • Report Upstream changed from Not yet reported upstream; Will do shortly. to Reported upstream. Developers acknowledge bug.
  • Status changed from new to needs_info

Upstream replies pretty quickly, once you get the right mailing list. :-)

I may be wrong but in my experience OsiCbcSolverInterface? has several problems and should not be used. If your problem is a linear program, you should use OsiClpSolverInterface? instead. Once you load the problem in an object of class OsiClpSolverInterface?, you can use the clone() method to obtain copies.

I can verify that modifying the test program above in this way means we can clone without an exponential memory leak:

#include "include/coin/OsiClpSolverInterface.hpp"
#include <stdio.h>

int main() {
  OsiClpSolverInterface * si = new OsiClpSolverInterface();
  OsiClpSolverInterface * newsi;
  
  for (int i = 0; i < 13; i++) {
    printf("%d ", i);
    newsi = dynamic_cast <OsiClpSolverInterface *>((*si).clone());
    delete si;
    si = newsi;
  }
  printf("\n");
  
  while (true) {}
  
}

(FWIW putting newsi = ...; delete si; si = newsi; didn't affect the previous version of the code, either.)

So, now I have a question. Is there a reason we're using the OsiCbcSolverInterface instead of the OsiClpSolverInterface? or is it okay to switch interfaces?

comment:21 Changed 9 years ago by john_perry

  • Owner changed from john_perry to ncohen

And how did I end up the owner of this ticket, anyway?

comment:22 follow-up: Changed 9 years ago by ncohen

Well.... The main reason is that if I make no mistake Clp is their *linear* solver and Cbc their *integer* solver. So if we switch to OsiClpSolverInterface? I guess we immediately loose the ability to solve integer programs O_o

Nathann

comment:23 in reply to: ↑ 22 Changed 9 years ago by john_perry

Replying to ncohen:

Well.... The main reason is that if I make no mistake Clp is their *linear* solver and Cbc their *integer* solver. So if we switch to OsiClpSolverInterface? I guess we immediately loose the ability to solve integer programs O_o Nathann

This was a question I had in mind, by the way. Doesn't MILP return float() answers? In fact, I've often found it to return non-integer answers, so that in some of my code, I've had to hack a way to create approximate rational answers (which will suffice for my purposes).

I understand your problem, and I'll ask with them. One option that would work for me would be to create a separate Clp interface, and work with that. I guess that in this case, we couldn't package it with Coin.

comment:24 Changed 9 years ago by john_perry

...we couldn't package it with Coin.

Sorry, I meant "we couldn't package it with MILP".

comment:25 Changed 9 years ago by ncohen

  • Status changed from needs_info to needs_review

John reported the bug upstream, they acknowledge it is a bug and do not intend to fix it :-D

The reason is that the C++ class that is currently use to deal with Coin is "a mistake" according to the developpers. Anyway the package will eventually be upgraded by #12220, and the rewriting that it will require should solve this bug along.

Nathann

comment:26 Changed 9 years ago by ncohen

  • Status changed from needs_review to positive_review

comment:27 Changed 9 years ago by ncohen

  • Milestone changed from sage-4.8 to sage-duplicate/invalid/wontfix

comment:28 Changed 9 years ago by jdemeyer

  • Authors john_perry deleted
  • Reviewers set to Nathann Cohen

comment:29 Changed 9 years ago by jdemeyer

  • Dependencies set to #12220

comment:30 Changed 9 years ago by jdemeyer

  • Resolution set to duplicate
  • Status changed from positive_review to closed
Note: See TracTickets for help on using tickets.