Opened 4 years ago

Last modified 5 days ago

#12121 needs_work defect

floor/ceil can be very slow at integral values

Reported by: dsm Owned by: AlexGhitza
Priority: major Milestone: sage-7.2
Component: basic arithmetic Keywords:
Cc: kcrisman., jdemeyer Merged in:
Authors: Vincent Delecroix Reviewers: Marc Mezzarobba
Report Upstream: N/A Work issues:
Branch: u/vdelecroix/12121 (Commits) Commit: 602e5158c5f37c6900a1dee8e27a228114af1319
Dependencies: Stopgaps:

Description (last modified by vdelecroix)

Reported (in slightly different form) on ask.sagemath.org:

sage: %timeit floor(log(3)/log(2))
625 loops, best of 3: 586 µs per loop
sage: %timeit floor(log(4)/log(2))
5 loops, best of 3: 3.79 s per loop

This happens because ceil and floor first try to increase the precision of a coercion of the input argument to a RealInterval by 100 bits from 53 to 20000 before finally trying a full_simplify, which succeeds. The RealInterval rounds all fail because the interval is always of the form (2 - epsilon, 2 + epsilon) and endpoints have different ceilings.

The proposal is to:

  • first found an interval of reasonably small diameter (arbitrarily set to 2-30) and see whether this is enough to decide the ceiling
  • then check equality with the only available candidate (possibly doing some simplification)
  • start further refinement of the interval

With the branch applied math.floor and numpy.floor are used directly

sage: floor(1.2r)
1.0
sage: type(_)
<type 'float'>

which is distinct from the current Sage behavior

sage: floor(1.2r)
1
sage: type(_)
<type 'sage.rings.integer.Integer'>

Change History (51)

comment:1 Changed 4 years ago by dsm

  • Description modified (diff)

comment:2 Changed 4 years ago by dsm

  • Description modified (diff)

comment:3 Changed 4 years ago by dsm

Note to self (and others): we can use is_int to decide when we should test for equality. Test (once) the first time there's only one integer in the interval. If you have equality there, you're done. If not, you never will (or won't be able to prove it, anyway) and can carry on.

comment:4 Changed 4 years ago by kini

comment:5 Changed 3 years ago by jdemeyer

  • Milestone changed from sage-5.11 to sage-5.12

comment:6 Changed 2 years ago by vbraun_spam

  • Milestone changed from sage-6.1 to sage-6.2

comment:7 Changed 2 years ago by vbraun_spam

  • Milestone changed from sage-6.2 to sage-6.3

comment:8 Changed 22 months ago by vbraun_spam

  • Milestone changed from sage-6.3 to sage-6.4

comment:9 follow-up: Changed 11 months ago by zimmerma

I'm not sure if this should go to this ticket, but the following never returns:

sage: z
(11/9*sqrt(3)*sqrt(2) + 3)^(1/3) + 1/3/(11/9*sqrt(3)*sqrt(2) + 3)^(1/3) - 1
sage: floor(z)

Even floor(z, maximum_bits=53) loops infinitely.
Should I open a separate ticket?

comment:10 Changed 3 months ago by ajagekar.akshay

  • Branch set to u/ajagekar.akshay/Trac12121

comment:11 Changed 3 months ago by ajagekar.akshay

  • Authors set to Akshay Ajagekar
  • Commit set to 6b75d32ab5cc279a8566f0bc67acc2501bbb833e
  • Status changed from new to needs_review

New commits:

6b75d32Trac #12121: floor/ceil can be very slow at integral values

comment:12 Changed 3 months ago by ajagekar.akshay

  • Authors Akshay Ajagekar deleted
  • Branch u/ajagekar.akshay/Trac12121 deleted
  • Commit 6b75d32ab5cc279a8566f0bc67acc2501bbb833e deleted

comment:13 in reply to: ↑ 9 Changed 3 months ago by vdelecroix

Replying to zimmerma:

I'm not sure if this should go to this ticket, but the following never returns:

sage: z
(11/9*sqrt(3)*sqrt(2) + 3)^(1/3) + 1/3/(11/9*sqrt(3)*sqrt(2) + 3)^(1/3) - 1
sage: floor(z)

Even floor(z, maximum_bits=53) loops infinitely.

Whereas the following actually works

sage: bool(z == 1)
True

Should I open a separate ticket?

I think that "very slow" includes "infinite amount of time". For me it is worth it to also fix this kind of endless loops in this ticket.

comment:14 follow-up: Changed 3 months ago by vdelecroix

  • Milestone changed from sage-6.4 to sage-7.1
  • Status changed from needs_review to needs_work

@ajagekar.akshay: why did you remove your branch? The commit lacks some examples (as the one of comment:9).

comment:15 in reply to: ↑ 14 ; follow-up: Changed 3 months ago by ajagekar.akshay

Replying to vdelecroix:

@ajagekar.akshay: why did you remove your branch? The commit lacks some examples (as the one of comment:9).

Sorry but I pushed that branch without testing, some tests failed.

comment:16 in reply to: ↑ 15 Changed 3 months ago by vdelecroix

Replying to ajagekar.akshay:

Replying to vdelecroix:

@ajagekar.akshay: why did you remove your branch? The commit lacks some examples (as the one of comment:9).

Sorry but I pushed that branch without testing, some tests failed.

That is not a problem. Tickets can have different status: closed, positive_review, needs_review, needs_work, new. If there is no branch associated to a ticket it makes no sense to keep it into "needs review" state. (I already changed it to needs_work)

comment:17 Changed 3 months ago by vdelecroix

Indeed, your code was good... and there is a bug somewhere else in the conversion from SR to the real interval fields:

sage: RealIntervalField(256)(SR(10^50 + 10^(-50))).is_int()
(True, 100000000000000000000000000000000000000000000000000)

Sorry, I was wrong. The method is_int is not intended to test if the interval is actually restricted to one integer! It only tests if the interval contains only one integer.

Last edited 3 months ago by vdelecroix (previous) (diff)

comment:18 Changed 3 months ago by vdelecroix

  • Authors set to Vincent Delecroix
  • Branch set to u/vdelecroix/12121
  • Commit set to 9821cadaa866f28a0178ce4c75b03dbc82aabd30
  • Status changed from needs_work to needs_review

New commits:

9821cadTrac 12121: fix floor/ceil functions

comment:19 Changed 3 months ago by git

  • Commit changed from 9821cadaa866f28a0178ce4c75b03dbc82aabd30 to 800d0ee77a1628f92faaf1b1e159d4aa3f1ed966

Branch pushed to git repo; I updated commit sha1. This was a forced push. New commits:

800d0eeTrac 12121: fix floor/ceil functions

comment:20 Changed 3 months ago by git

  • Commit changed from 800d0ee77a1628f92faaf1b1e159d4aa3f1ed966 to 50bdaf3b31aecc326045e48524d2aa4d98186549

Branch pushed to git repo; I updated commit sha1. New commits:

50bdaf3Trac 12121: work around a bug with symbolics

comment:21 Changed 3 months ago by vdelecroix

I pushed a tiny fix because we currently have

sage: log(8) / log(2) == 3
False

See the discussion at this sage-devel thread.

comment:22 Changed 2 months ago by mmezzarobba

#15786 is a partial duplicate. I find the fix posted there a bit cleaner, though it needs to be rebased.

Last edited 2 months ago by mmezzarobba (previous) (diff)

comment:23 Changed 2 months ago by vdelecroix

Rebased on what!?

There is one thing I am not happy with. I think we should try to make the interval much smaller than 1 in

while x_interval.absolute_diameter() >= 1:
    bits *= 2
    x_interval = RealIntervalField(bits)(x)

before trying to simplify the expression. I guess that 2-30 would be much more reasonable and avoid many attempts of (costly) simplification. What do you think?

comment:24 Changed 2 months ago by mmezzarobba

Sorry if I wasn't clear. I'm saying that the other branch (the one at #15786) should be rebased.

comment:25 Changed 2 months ago by rws

Be aware you might be using symbolics when doing bool( == ). To not end up with surprises you might want to break out the actual functionality you need out of Expression.__nonzero__ or review #16397 and use cmp.

comment:26 Changed 2 months ago by git

  • Commit changed from 50bdaf3b31aecc326045e48524d2aa4d98186549 to 688ebfa077f09ee9a2bc988833ddd4f0fc65a4a1

Branch pushed to git repo; I updated commit sha1. New commits:

688ebfaTrac 12121: code improvements

comment:27 Changed 2 months ago by vdelecroix

Replying to rws:

Be aware you might be using symbolics when doing bool( == ). To not end up with surprises you might want to break out the actual functionality you need out of Expression.__nonzero__ or review #16397 and use cmp.

You mean that there is no way to check that expr1 == expr2? I do not want to copy/paste anything from other place. If testing equality is not available, there is a big problem.

Using cmp for that purpose is a bad idea. The purpose of cmp in Python 2 is to sort things out. Not to compare.

comment:28 follow-up: Changed 2 months ago by rws

Testing equality of symbolics in general is undecidable. If you remove all expressions with variables however, it is easy: convert symbolic constants and function expressions to float as in N(), compare. Of course there are precision problems but that's nothing in comparison to what you get with variables.

The idea to abuse cmp is not mine. Somewhere here RLF(1) < RLF(sqrt(2)) for example, symbolic cmp is called. Should I file a bug report for such usage?

EDIT: typos

Last edited 2 months ago by rws (previous) (diff)

comment:29 in reply to: ↑ 28 Changed 2 months ago by mmezzarobba

Replying to rws:

Testing equality of symbolics in general is undecidable. If you remove all expressions with variables however, it is easy

It is undecidable even without variables.

The idea to abuse cmp is not mine. Somewhere here RLF(1) < RLF(sqrt(2)) for example, symbolic cmp is called. Should I file a bug report for such usage?

I'd say it probably is a bug. As to whether you should file a bug report, I don't know—I doubt anyone really reads them, and the issue is probably already covered somewhere in the myriad of known bugs with comparisons...

comment:30 follow-up: Changed 2 months ago by vdelecroix

Replying to rws:

Testing equality of symbolics in general is undecidable. If you remove all expressions with variables however, it is easy: convert symbolic constants and function expressions to float as in N(), compare. Of course there are precision problems but that's nothing in comparison to what you get with variables.

What I need is a method that either returns a reliable answer True or False or raise an error which can either be This comparison is meaningless or I don't know how to compare this.

comment:31 in reply to: ↑ 30 ; follow-up: Changed 2 months ago by rws

Replying to vdelecroix:

What I need is a method that either returns a reliable answer True or False or raise an error which can either be This comparison is meaningless or I don't know how to compare this.

Then bool( == ) is right for the moment and may be replaced with #19040. As said don't be surprised if it takes a long time.

comment:32 Changed 2 months ago by git

  • Commit changed from 688ebfa077f09ee9a2bc988833ddd4f0fc65a4a1 to a5ef94a5b27b302d813ac83538558b39a39c7267

Branch pushed to git repo; I updated commit sha1. New commits:

a5ef94aTrac 12121: note about #19040 + extra if

comment:33 in reply to: ↑ 31 Changed 2 months ago by vdelecroix

Replying to rws:

Replying to vdelecroix:

What I need is a method that either returns a reliable answer True or False or raise an error which can either be This comparison is meaningless or I don't know how to compare this.

Then bool( == ) is right for the moment and may be replaced with #19040. As said don't be surprised if it takes a long time.

I added a note about #19040.

comment:34 Changed 2 months ago by vdelecroix

  • Description modified (diff)
  • Milestone changed from sage-7.1 to sage-7.2

comment:35 Changed 2 months ago by mmezzarobba

  • Reviewers set to Marc Mezzarobba
  • Status changed from needs_review to needs_work

Hi Vincent,

Sorry for my unclear comments above, I didn't look closely enough at your code before posting them.

Beside the issue with comparisons raised by Ralf, I find the code on your branch a bit complicated, and I don't like the fact that you drop the maximum_bits parameters at the risk of looping forever if you cannot prove that the input is an integer. I pushed to u/mmezzarobba/12121-ceil a rough attempt to fix these issues (in the case of ceil only at the moment, and not well tested yet), please tell me what you think of it.

comment:36 Changed 7 weeks ago by git

  • Commit changed from a5ef94a5b27b302d813ac83538558b39a39c7267 to 60f0c291adf28344156661ea7623c5721ed4a06b

Branch pushed to git repo; I updated commit sha1. This was a forced push. New commits:

60f0c29Trac 12121: Fix floor/ceil

comment:37 Changed 7 weeks ago by vdelecroix

  • Status changed from needs_work to needs_review

All right. I factorized the two implementations in a new function incremental_rounding. It is cleaner and the parameter maximum_bits is reintroduced.

comment:38 Changed 7 weeks ago by git

  • Commit changed from 60f0c291adf28344156661ea7623c5721ed4a06b to d53c406171d73a0138d25336732728b41409c8b9

Branch pushed to git repo; I updated commit sha1. This was a forced push. New commits:

d53c406Trac 12121: Fix floor/ceil

comment:39 Changed 7 weeks ago by mmezzarobba

I fear I won't have time to review your new version in the next few days at least, but from a quick look at it there are a lot of things I don't understand. In no particular order:

  • why do you make maximum_bits an Integer?
  • what don't you like about unique_integer()?
  • is it really better to have an absolute bound for the diameter that makes us suspect we found an exact integer, rather than something that depends on the precision?
  • why do you insist on using == on symbolic expressions instead of is_trivial_zero()?
  • are you sure you want to raise an error when maximum_bits does not suffice to conclude? this is a symbolic function that may be buried deep in the middle of a symbolic expression; returning unevaluated seems more reasonable to me...
  • do we really need two loops that do essentially the same thing (including raising errors with the exact same message)?

comment:40 Changed 7 weeks ago by git

  • Commit changed from d53c406171d73a0138d25336732728b41409c8b9 to 0e9b2f21d6a6bc42e35302316ac0f6b3b7451450

Branch pushed to git repo; I updated commit sha1. New commits:

0e9b2f2Trac 12121: remove __call__ and fix round

comment:41 follow-ups: Changed 7 weeks ago by vdelecroix

I removed the __call__ in both Function_floor and Function_ceil. The code is now much simpler. Though there was some adaptation needed in symbolic/expression.pyx.

Replying to mmezzarobba:

I fear I won't have time to review your new version in the next few days at least, but from a quick look at it there are a lot of things I don't understand. In no particular order:

  • why do you make maximum_bits an Integer?

all right. int is fine as well.

  • what don't you like about unique_integer()?

an assert does not cost anything. And unique_integer silently fails if the interval does not enclose a unique integer.

  • is it really better to have an absolute bound for the diameter that makes us suspect we found an exact integer, rather than something that depends on the precision?

the precision of what? there is the field used for the evaluation which is different from the diameter of the interval. If you have more than one integer in your interval which one are you using to test equality?

  • why do you insist on using == on symbolic expressions instead of is_trivial_zero()?

Because I want to check equality with an integer. Not if it is a trivial equality.

  • are you sure you want to raise an error when maximum_bits does not suffice to conclude? this is a symbolic function that may be buried deep in the middle of a symbolic expression; returning unevaluated seems more reasonable to me...

Done with an example.

  • do we really need two loops that do essentially the same thing (including raising errors with the exact same message)?

The equality test is potentially costly. And we want to avoid it as much as possible. In particular, it makes no sense to test this equality within each step of the loop as it is in your version. On a related note, I noticed that for round you need to test equality with elements of ZZ + 1/2.

comment:42 in reply to: ↑ 41 Changed 7 weeks ago by vdelecroix

Replying to vdelecroix:

Replying to mmezzarobba:

  • do we really need two loops that do essentially the same thing (including raising errors with the exact same message)?

The equality test is potentially costly. And we want to avoid it as much as possible. In particular, it makes no sense to test this equality within each step of the loop as it is in your version. On a related note, I noticed that for round you need to test equality with elements of ZZ + 1/2.

And I also would like to use the very same function incremental_rounding for elements of QQbar. For the very same reason, you only want very lately the equality test.

comment:43 Changed 7 weeks ago by git

  • Commit changed from 0e9b2f21d6a6bc42e35302316ac0f6b3b7451450 to 313c497daaec6324dfe4ffdeb684844e301a3f61

Branch pushed to git repo; I updated commit sha1. New commits:

313c497Trac 12121: fix doctests

comment:44 Changed 7 weeks ago by vdelecroix

  • Description modified (diff)

comment:45 Changed 4 weeks ago by vdelecroix

ping?!

comment:46 Changed 4 weeks ago by mmezzarobba

Sorry, I have about zero time for Sage development before at least 1-2 weeks. All I can say is that I wasn't completely convinced by your answers and would need to think things over more carefully. If someone wants to review the ticket in the meantime, please do.

comment:47 Changed 2 weeks ago by git

  • Commit changed from 313c497daaec6324dfe4ffdeb684844e301a3f61 to 602e5158c5f37c6900a1dee8e27a228114af1319

Branch pushed to git repo; I updated commit sha1. This was a forced push. New commits:

9bfef80Trac 12121: Fix floor/ceil
c8abe9cTrac 12121: remove __call__ and fix round
602e515Trac 12121: fix doctests

comment:48 Changed 2 weeks ago by vdelecroix

rebased on 7-2.rc1

comment:49 in reply to: ↑ 41 Changed 7 days ago by mmezzarobba

  • Status changed from needs_review to needs_work

Okay, I'm back. Sorry that it took so long.

Replying to vdelecroix:

  • what don't you like about unique_integer()?

an assert does not cost anything. And unique_integer silently fails if the interval does not enclose a unique integer.

Uh? No, it doesn't.

  • is it really better to have an absolute bound for the diameter that makes us suspect we found an exact integer, rather than something that depends on the precision?

the precision of what? there is the field used for the evaluation which is different from the diameter of the interval. If you have more than one integer in your interval which one are you using to test equality?

The precision of the interval computation, i.e. bits. The idea being that if we found an interval of width (say) 2⁻²⁰ containing an integer by computing with 1000 bits of precision, we may want to see if the width of the interval keeps decreasing when the precision increases and only run the symbolic part of the algorithm if that's the case. But I agree that this is a very minor issue at best.

  • why do you insist on using == on symbolic expressions instead of is_trivial_zero()?

Because I want to check equality with an integer. Not if it is a trivial equality.

I mean using is_trivial_zero() after calling full_simplify() & co, as in my version: this is safer than relying on == and essentially as powerful.

  • do we really need two loops that do essentially the same thing (including raising errors with the exact same message)?

The equality test is potentially costly. And we want to avoid it as much as possible. In particular, it makes no sense to test this equality within each step of the loop as it is in your version.

Yes—as I said, my version was just a rough sketch of the changes I'd b tempted to make, nothing finished. As the code I was starting with used to loop forever in the typical situation where mine would do the symbolic test repeatedly (that is, x ∈ ℤ integer but we don't manage to prove it), I thought the additional cost would be acceptable. ;-) But anyway, it is not hard to do the test only once while avoiding the code duplication.

On a related note, I noticed that for round you need to test equality with elements of ZZ + 1/2.

Couldn't you just compute ceil(x-1/2)?


Now for some comments on the current code:

  • To summarize the above, I still think the main logic in incremental_rounding() could be shortened to something like (not tested):
        unique_rounding = getattr(RealIntervalFieldElement, 'unique_' + mode)
        r = RR.one() >> 20
    
        bits = 64
        candidate = None
        while bits < maximum_bits:
            interval = RealIntervalField(bits)(x) # may raise a TypeError
            try:
                return unique_rounding(interval)
            except ValueError:
                pass
            if candidate is None and interval.absolute_diameter() > r:
                candidate = interval.unique_integer()
                try:
                    delta = x - candidate
                    if (delta.is_zero()
                            or SR(delta).full_simplify().canonicalize_radical()
                                        .is_trivial_zero()
                            or QQbar(delta).is_zero()):
                        return candidate
                    except (TypeError, ValueError):
                        pass
            bits *= 2
    
  • I'd also make incremental_rounding() private and dispense with the argument checking (and perhaps move it to real_mpfi if your plan is to use it from elsewhere)—but I'm okay with keeping it as it is. An advantage of making it private is that you could take the “unique rounding” function on intervals as input instead of accessing it via getattr(). Another option would be to introduce a common base class for Function_floor, Function_ceil and Function_round.
  • I'm a little uneasy about the changes you made to BuiltinFunction.__call__(). The fact that it used to convert non-Element inputs to Elements looks intentional and pretty reasonable to me. Is it really necessary to change that behavior? That being said, I'm a bit lost in the maze of Function.__call__, BuiltinFunction.__call__, _eval_, _evalf_, _evalf_try_ and friends, so if you tell me you are confident that the change is correct I'll trust you!
  • If these changes stay, then I guess this
    if module is not None:
        func = getattr(module, self._name, None)
        if func is None and self._alt_name is not None:
            func = getattr(module, self._name, None)
                                        ^^^^^
    
    should be _alt_name.
  • Note that these changes also make
    sage: sin(numpy.int32(0))
    0.0
    
    which is at odds with
    sage: sin(ZZ(0)).parent()
    Integer Ring
    
    (perhaps not ideal, but predictable at least).
  • In Function_*, what is the point of calling _evalf_() from _eval_()?
  • And why isn't the logic for choosing maximum_bits in _evalf_() the same in floor, ceil and round?
  • I wouldn't bother with checking that x is not a relation. First, _eval_() methods of individual functions are probably not the right place for that (either BuiltinFunction.__call__() or perhaps methods ceil(), floor() etc. in a future subclass RelationalExpression of Expression would be more reasonable). Besides, various other nonsensical inputs (e.g. series, booleans) are accepted without error, so it is a bit strange to have an ad hoc check dealing with this one.

comment:50 follow-up: Changed 5 days ago by nbruin

You may be interested in #20624. It looks like implementing _evalf_ by calling _eval_ is VERY bad: currently evaluation of ceil may lead to running out the python call stack before doing anything useful. Obviously, this takes some time. Inheriting from BuiltinFunction is a real bugtrap: its init reassigns a whole bunch of methods.

comment:51 in reply to: ↑ 50 Changed 5 days ago by rws

  • Cc kcrisman. jdemeyer added; kcrisman removed

Replying to nbruin:

Inheriting from BuiltinFunction is a real bugtrap: its init reassigns a whole bunch of methods.

That must be the reason why Sage crashes all the time. Seriously, I agree the construction of functions is messy and it limits the developer somewhat, see "other symbolic function tickets" in http://trac.sagemath.org/wiki/symbolics/functions. Note also there are a bunch of tickets needing review there. However, I think the design is sound. You just have to read how other functions (the more recently implemented) are using it. In the end, I or someone will be transferring the Python you write to Pynac, anyway.

Cc: the author of the _evalf_try_ mechanism.

Note: See TracTickets for help on using tickets.