Opened 8 years ago

Closed 8 years ago

Last modified 7 years ago

#11606 closed enhancement (fixed)

simplify constraints in linear programs

Reported by: john_perry Owned by: ncohen
Priority: major Milestone: sage-5.0
Component: linear programming Keywords: sd32
Cc: ncohen Merged in: sage-5.0.beta1
Authors: John Perry Reviewers: Nathann Cohen
Report Upstream: N/A Work issues:
Branch: Commit:
Dependencies: Stopgaps:

Description (last modified by john_perry)

MixedIntegerLinearProgram? doesn't notice when it is given constraints that already exist in the program, or that are constant multiples. A simple example:

sage: lp = MixedIntegerLinearProgram()
sage: for each in xrange(10):
....:     lp.add_constraint(lp[0]-lp[1],min=1)
....:     
sage: lp.show()
Maximization:
 
Constraints:
  1.0 <= x_0 -x_1 
  1.0 <= x_0 -x_1 
  1.0 <= x_0 -x_1 
  1.0 <= x_0 -x_1 
  1.0 <= x_0 -x_1 
  1.0 <= x_0 -x_1 
  1.0 <= x_0 -x_1 
  1.0 <= x_0 -x_1 
  1.0 <= x_0 -x_1 
  1.0 <= x_0 -x_1 
Variables:
  x_0 is a continuous variable (min=0.0, max=+oo)
  x_1 is a continuous variable (min=0.0, max=+oo)

Notice that the same constraint appears 10 different times.

Apply:

This patch depends on #11588 and #11607.

Attachments (3)

trac_11606_add_only_new_constraints_to_lp.patch (4.4 KB) - added by john_perry 8 years ago.
trac_11606_simplify_with_sets.patch (4.7 KB) - added by john_perry 8 years ago.
trac_11606_add_only_new_constraints_to_lp_using_sets.patch (5.3 KB) - added by john_perry 8 years ago.

Download all attachments as: .zip

Change History (43)

comment:1 Changed 8 years ago by john_perry

To emphasize the need for this, in a sample application I'm working on, the number of constraints is decreased by an order of magnitude (93 vs. 787); in larger systems the different would likely be much larger.

Changed 8 years ago by john_perry

comment:2 Changed 8 years ago by john_perry

  • Authors set to john_perry
  • Status changed from new to needs_review

The attached file is a first attempt at solving the problem. An optional argument, check_redundant, which defaults to False, is used to control whether to invoke the change. If so:

  • it proceeds row by row of the constraint matrix
  • it checks whether the same variables are nonzero
  • if so, it computes the ratio r between the coefficients of the "leading" variables
  • if it is negative, add the constraint

(if r is negative, the constraint is not so obviously redundant)

  • if the ratio of the lower & upper bounds disagrees with r, add the constraint
  • now check the ratio of the remaining variables in the current row; if it differs from r, add the constraint

This is meant to be a preliminary implementation rather than an efficient implementation; the best option is probably to visit the matrix & compare, but this would require going into each back end, which is probably a bad idea right now.

comment:3 Changed 8 years ago by kcrisman

  • Cc ncohen added

comment:4 Changed 8 years ago by john_perry

  • Keywords sd32 added

comment:5 follow-ups: Changed 8 years ago by ncohen

  • Status changed from needs_review to needs_info

Hello John !

I was thinking again about what I told you by email : in order to simplify constraints more efficiently, one could add the constraints to a Set object when adding it to the solvers, and add a new constraint only when it is not already included in the set. The search is then much more efficient, and handled by Python. If you also want to test for constraints which may be formaly different (i.e. the original constraint multiplied by two), you but have to insert the "normalized" constraint in the set instead of the one which is given.

This would mean doubling in memory the size of the stored constraints, as it would mean that the LP constraint's are remembered both by the backend and by a Set object in the MILP class, but this is also what is happening right now when returning the list of all constraints. This could also be rewritten as an interator to avoid the cost of memory, then again have no idea how large your LP can be, but perhaps if you have many constraints this Set trick would help you much more.

In this case, the simplify_constraints would be a parameter of the MixedntegerLinearProgram? init method, so that the Set object is created and maintained from beginning to end when the user asks the constraints to be simplified.

Well, my two cents :-)

athann

comment:6 in reply to: ↑ 5 ; follow-up: Changed 8 years ago by john_perry

Replying to ncohen:

If you also want to test for constraints which may be formaly different (i.e. the original constraint multiplied by two), you but have to insert the "normalized" constraint in the set instead of the one which is given.

I had thought of this, too. I think the only reason I didn't do it at the time is a law I've seen attributed to David Knuth: premature optimization is the root of all evil. I can certainly try it.

This could also be rewritten as an interator to avoid the cost of memory...

I'm not quite following how to do this.

then again have no idea how large your LP can be...

Hard data: a problem that exhausted memory on a 4GB machine a month ago computed yesterday in about 40MB.

In this case, the simplify_constraints would be a parameter of the MixedntegerLinearProgram? init method, so that the Set object is created and maintained from beginning to end when the user asks the constraints to be simplified.

I'll try that, thanks.

comment:7 in reply to: ↑ 6 ; follow-up: Changed 8 years ago by ncohen

This could also be rewritten as an interator to avoid the cost of memory...

I'm not quite following how to do this.

Oh. For the moment your patch returns a list of tuples, representing the constraints. Instead of that (which requires to build the whole list first, then to return it, thus doubling the memory cost), python can return the list element by element using the "yield" keyword. This would avoid doubling the memory, especially if you just want to check something for each constraint returned.

comment:8 in reply to: ↑ 7 Changed 8 years ago by john_perry

Replying to ncohen:

This could also be rewritten as an interator to avoid the cost of memory...

I'm not quite following how to do this.

Oh. For the moment your patch returns a list of tuples, representing the constraints. Instead of that (which requires to build the whole list first, then to return it, thus doubling the memory cost), python can return the list element by element using the "yield" keyword. This would avoid doubling the memory, especially if you just want to check something for each constraint returned.

Are you thinking of #11607 here? This patch only affects the return value in one place; it returns None, and then only if the constraint is redundant.

comment:9 in reply to: ↑ 5 Changed 8 years ago by john_perry

Replying to ncohen:

I was thinking again about what I told you by email : in order to simplify constraints more efficiently, one could add the constraints to a Set object when adding it to the solvers, and add a new constraint only when it is not already included in the set. The search is then much more efficient, and handled by Python. If you also want to test for constraints which may be formaly different (i.e. the original constraint multiplied by two), you but have to insert the "normalized" constraint in the set instead of the one which is given.

I have a working implementation along these lines, but there's one difficulty. In order to normalize the constraint, the code divides by the coefficient of the variable with the smallest index. For example, given the constraint 2*x_1-2*x_0<=2, the code converts this to x_1-x_0<=1.

In order to do this, I want to use the min() command to determine the smallest non-zero key, determine its coefficient, and divide:

    i = min(f.keys())
    c = f[i]
    C = [(v,coeff/c) for (v,coeff) in f.iteritems() if v != -1]

The problem with this is that min is a parameter to the function, so this provokes an error.

I can rename the parameters as min_value and max_value, but this would break people's existing code. Alternately, I can define self._min = min in __init__, and call self._min() where I would ordinarily call min().

I am attaching a patch that uses the latter approach, but if you have suggestions for a better solution, I'm all ears. It wouldn't surprise me if there is one.

BTW, does it matter if I use Set or set? You wrote the former, but currently I'm using the latter.

comment:10 Changed 8 years ago by john_perry

  • Description modified (diff)

comment:11 Changed 8 years ago by john_perry

  • Status changed from needs_info to needs_review

More rigorous testing showed that I had overlooked an issue with the keys (variables whose coefficients are zero). Fixed in the revision of the patch.

(Just noticed: it helps if one executes hg qrefresh before hg export tip...)

Changed 8 years ago by john_perry

comment:12 Changed 8 years ago by ncohen

  • Status changed from needs_review to needs_work

Hello John ! :-)

I really didn't think at first this method would need to use either min or max later on :-)

Hopefully, this trick should work :

from __builtin__ import min as min_function

Could you also add in the explanation of the check_redundant parameter what is a "redundant" constraint ? Something like "(two constraints are redundant when one can be obtained from the other scalar by multiplication)" or something of the kind ? :-)

Oh, and it looks like there are some missing "::" before the beginning of Python code in the doctests, so those will not be properly formatted when building the documentation.

sage -docbuild reference html

Two other remarks :

  • Set is a Sage version of set, the latter being a purely Python thing. The interest of Set is that it is hashable and immutable, while the "set" can be modified. When you do not need the elements to be immutable, I'm told the best is to use set. And this I was probably told after writing this part :-)
  • I thought that
    min( x for x in range(3) if x != 0 )
    
    would be faster than
    min([ x for x in range(3) if x != 0 ])
    

as you do not create the list first, but it looks like the iterators have a cost too :

sage: %timeit min( x for x in range(10000) if x != 0 )
125 loops, best of 3: 5.51 ms per loop
sage: %timeit min([ x for x in range(10000) if x != 0 ])
125 loops, best of 3: 4.56 ms per loop
sage: %timeit min(x for x in range(10) if x != 0)
625 loops, best of 3: 7.41 µs per loop
sage: %timeit min([ x for x in range(10) if x != 0 ])
625 loops, best of 3: 6.81 µs per loop

Nathann

comment:13 Changed 8 years ago by john_perry

  • Description modified (diff)

Alright! The new patch imports min via __builtin__, and fixes those issues with the documentation. The doctests take about twice as long to run on my machine now. :-(

Regarding the iterators: now I understand what you mean. Actually, if memory serves (and it may deceive -- not sure) I think I tried that when defining C, but Cython defied me, claiming that generators weren't acceptable. That's when I noticed that you had placed you definition of C in a list. So, I didn't think to use it in the min_function (maybe I even tried).

comment:14 Changed 8 years ago by john_perry

  • Status changed from needs_work to needs_review

comment:15 Changed 8 years ago by john_perry

By the way, when reading the section on docstring markup in Conventions for Coding in Sage, I noticed this:

Warning: Functions whose names start with an underscore do not currently appear in the reference manual, so avoid putting crucial documentation in their docstrings. In particular, if you are defining a class, you might put a long informative docstring after the class definition, not for the __init__ method.

Should we move that documentation out of __init__?

(You have to scroll fairly far down to find the warning, or else do a search.)

comment:16 Changed 8 years ago by john_perry

Forgot about copying -- we would want to copy the flag for checking redundancy, as well as the constraints.

comment:17 Changed 8 years ago by john_perry

Forgot about copying sets, not just assigning pointers...

comment:18 Changed 8 years ago by john_perry

  • Status changed from needs_review to needs_work

Something isn't working correctly with the normalization, sometimes. I'm trying to figure out what changed that is causing the issue I'm seeing...

comment:19 Changed 8 years ago by john_perry

  • Status changed from needs_work to needs_info

As I suspected, it's a typing problem. Normalization performs a division, and 1/2 simplifies to 0 if 1 is an int. If 1 is an Integer or a Rational, then 1/2 simplifies to a Rational.

This strikes me as a matter best left to the client: the normalization shouldn't choose the type of the coefficients of the constraints; it's not even clear we can guess this a priori. So this isn't a bug after all.

But, I'd like someone else's opinion.

comment:20 Changed 8 years ago by john_perry

  • Description modified (diff)

I just noticed that the patch included depends on the patch for #11588 (copying a linear program), and changed the description to reflect this.

comment:21 Changed 8 years ago by john_perry

  • Description modified (diff)

Yep, depends on #11607, too. That can be removed, and I'll do so in the near future.

comment:22 follow-up: Changed 8 years ago by ncohen

but Cython defied me, claiming that generators weren't acceptable.

Oops, that's right. The current version of Cython doesn't like generators, this feature should be included in the next Cython release, and I assure you that we have been waiting for generators in Cython for a loooooooong time here. There are patches I haven't written yet \ waiting for that, others that contain as a comment "Change this as soon as we have iterators".. Sigh :-)

Should we move that documentation out of init?

Well, the current documentation of the init method of the MixedIntegerLinearProgram? class also appears just before, in the section's documentation (and that appears in the doc). But you are right : it's probably best to remove it just to avoid later mistakes.

Actually, it should just contain the line "Constructor" and the doctests. The command

sage -coverage file.pyx

is a statistic telling you whether all the functions have documentation and tests. It's nice to have around, but that's also why this init function should have a minimal set of doctests -- so that the output of this software doesn't report that some doc is missing :-)

This strikes me as a matter best left to the client: the normalization shouldn't choose the type of the coefficients of the constraints

You are right in general, though in this special situation the answer has been taken from us. When a constraint is added, it is forwarded to the solver backend which is the only place where it is saved. Now, all of CPLEX/GLPK/Coin only accept floats as data. As mip.pyx is\

a Cython file, the best is probably to deal with float variables. By the way, why do you create so many lists ? Instead of

i = min_function([v for (v,coeff) in f.iteritems() if coeff != 0])
c = f[i]

What about

cdef int i = 0
cdef float c

for i, c in f.iteritems():
    if c!= 0:
        break

Sorry about the delay again. I'm getting back to the point where I can deal with tasks on-the-fly, and not with the ones that should have been settled the week before :-)

Nathann

comment:23 in reply to: ↑ 22 ; follow-up: Changed 8 years ago by john_perry

Replying to ncohen:

Well, the current documentation of the init method of the MixedIntegerLinearProgram? class also appears just before, in the section's documentation (and that appears in the doc). But you are right : it's probably best to remove it just to avoid later mistakes.

Actually, it should just contain the line "Constructor" and the doctests.

Will do.

This strikes me as a matter best left to the client: the normalization shouldn't choose the type of the coefficients of the constraints

You are right in general, though in this special situation the answer has been taken from us. When a constraint is added, it is forwarded to the solver backend which is the only place where it is saved. Now, all of CPLEX/GLPK/Coin only accept floats as data. As mip.pyx is a Cython file, the best is probably to deal with float variables.

Are you saying I should coerce everything to floats, or leave it be? I thought to leave it be in case some future back end accepted rationals; the backends can do their own coercion.

By the way, why do you create so many lists ?

You seem to think I know something about proper P/Cython programming. :-) Thanks for the suggestion; that will help speed things up a bit, though I think my real problem lies elsewhere.

Sorry about the delay again. I'm getting back to the point where I can deal with tasks on-the-fly, and not with the ones that should have been settled the week before :-)

No problem. I'm in a bit of a fix myself, so it may take a few days before another patch is ready, anyway.

By the way, do you know what those colored balls next to "opened: ... weeks ago" mean?

comment:24 in reply to: ↑ 23 ; follow-up: Changed 8 years ago by ncohen

Are you saying I should coerce everything to floats, or leave it be? I thought to leave it be in case some future back end accepted rationals; the backends can do their own coercion.

Well, I think the doubles wouldn't hurt. And I guess it would be the same for rationals if it gets implemented. The constraints in the LP would not change, all that matters is that equivalent constraints get coerced to the same values :-)

that will help speed things up a bit, though I think my real problem lies elsewhere.

Yep, unfortunately :-)

By the way, do you know what those colored balls next to "opened: ... weeks ago" mean?

Oh. Did you try clicking on them ? The patches uploaded on a trac ticket are automatically applied and tested on a Sage install, and this ball prints the status of this operation. Browse the other tickets waiting for review and see how it changes ;-)

Nathann

comment:25 in reply to: ↑ 24 Changed 8 years ago by john_perry

Replying to ncohen:

Oh. Did you try clicking on them ?

I did, and I don't seem any status of an automatic patch and test. It seems only to summarize the data. Then there's a long horizontal line & white space.

The only thing I've noticed is that different patches sometimes have different colors.

comment:27 in reply to: ↑ 26 Changed 8 years ago by john_perry

comment:28 Changed 8 years ago by ncohen

(On this ticket the circle is yellow and you will have more complete information if yo click on it)

comment:29 Changed 8 years ago by john_perry

  • Status changed from needs_info to needs_review

The new patch reflects changes described above.

comment:30 Changed 8 years ago by john_perry

The only change was to rebase the patch against the revised #11607.

comment:31 follow-up: Changed 8 years ago by john_perry

*bump* any chance of a review?

comment:32 in reply to: ↑ 31 Changed 8 years ago by ncohen

Replying to john_perry:

*bump* any chance of a review?

Rightrightrightright. You did well :-D

Nathann

comment:33 Changed 8 years ago by ncohen

  • Status changed from needs_review to needs_work

Hellooooooo again !

I have less than 30 minutes of battery left but it should be ok :-D

So. First, and before anything else, this patch needs to be rebased on the current version of Sage. This patch defines two methods (constraints and number of contraints) which have been added by another patch already, so Sage refuses to run when this patch is applied.

Short of this, the patch looks good, so once this is settled it should be good to go :-)

Nathann

comment:34 Changed 8 years ago by john_perry

  • Status changed from needs_work to needs_review

I have rebased the patch against sage-4.8.alpha4. This patch no longer introduces the function constraints() that existed in the previous patch, and that caused the conflict. It does not provoke any doctest failures on my version of sage-4.8.alpha4.

However, I did notice that three doctest failures currently occur with sage-4.8.alpha4:

File "/Applications/sage-4.8.alpha4/devel/sage-main/sage/numerical/mip.pyx", line 523:
    sage: p.show()
Expected:
...
File "/Applications/sage-4.8.alpha4/devel/sage-main/sage/numerical/mip.pyx", line 685:
    sage: p.write_lp(SAGE_TMP+"/lp_problem.lp")
Exception raised:
...
File "/Applications/sage-4.8.alpha4/devel/sage-main/sage/numerical/mip.pyx", line 322:
    sage: p
Expected:
...

Since these exist in an unpatched sage-4.8.alpha4, they have no bearing on this ticket, but is this a known issue? I'll open a new ticket if not. My machine is a MacBook? Pro running OSX 10.6.8.

comment:35 Changed 8 years ago by ncohen

  • Authors changed from john_perry to John Perry
  • Reviewers set to Nathann Cohen
  • Status changed from needs_review to positive_review

Positive review to this patch, which passes all tests ! :-)

John, I do not see the errors you mentionned on my alpha4. The fact that the write_lp and p.show return errors lead me to think that you have CBC installed. Is this true ? :-) Through OsiCbcSolverInterface?, I did not find how to write those files and some "names" in the structure of a LP were not available (like the "name of the lp", or the "name of the objective function"), and because of that I did not enable the names in the CBc interface. Perhaps this is one of the things that would be fixed by a real interface with Cbc. But anyway, I guess those bugs only appear on your install because of Cbc. They can also be "fixed" by adding "solver= GLPK"in the doctests above.

Thank you for this patch ! :-)

Nathann

comment:36 Changed 8 years ago by jdemeyer

  • Milestone changed from sage-4.8 to sage-5.0

comment:37 follow-up: Changed 8 years ago by john_perry

Weird. I saw jdemeyer's change yesterday, but not ncohen's change two days ago.

John, I do not see the errors you mentionned on my alpha4. The fact that the write_lp and p.show return errors lead me to think that you have CBC installed. Is this true ? :-)

Bingo!

As I see, Cbc is a real problem for us, even though it seems good for me personally... then again, maybe not if I look at it long enough :-(

comment:38 in reply to: ↑ 37 Changed 8 years ago by ncohen

As I see, Cbc is a real problem for us, even though it seems good for me personally... then again, maybe not if I look at it long enough :-(

Well, if you find out how to solve a MIP using Coin's library without using OsiCbcSolverInterface?, just let me know. This solver is driving me mad :-p

Nathann

comment:39 Changed 8 years ago by jdemeyer

  • Merged in set to sage-5.0.beta1
  • Resolution set to fixed
  • Status changed from positive_review to closed

comment:40 Changed 7 years ago by davidloeffler

Apply trac_11606_add_only_new_constraints_to_lp_using_sets.patch

(for legacy patchbots running 4.8)

Note: See TracTickets for help on using tickets.