Opened 3 years ago

Closed 3 years ago

Last modified 3 years ago

#24244 closed enhancement (fixed)

Fast check for C long

Reported by: jdemeyer Owned by:
Priority: major Milestone: sage-8.1
Component: cython Keywords:
Cc: Merged in:
Authors: Jeroen Demeyer Reviewers: Travis Scrimshaw
Report Upstream: N/A Work issues:
Branch: 70592a8 (Commits) Commit:
Dependencies: #24252 Stopgaps:

Description (last modified by jdemeyer)

Implement a fast variant of pyobject_to_long, which could be used to replace checks of the form isinstance(x, int).

The goal of these functions is to provide a single place to check for integers and convert them to a C long (at least in Cython code, where performance matters most). Various places in Sage check for integers in different ways. First of all, this is an inconsistency which should be fixed.

In Python 3, the types int/long are changed, so many of the existing checks will need to be changed anyway. So we might as well provide one function which should be able to replace most of these checks. And ideally, we want to do this without losing too much performance. For that reason, I avoid PyLong_FromLong in favour of a hand-written implementation.

The function pyobject_to_long has signature

cdef bint integer_check_long(x, long* value, int* err)

It does several things: first of all, it checks whether x is "integer-like". This means that it is either a Python int or long, a Sage Integer or it supports __index__. The return value of the function is whether or not it is an integer. If it is an integer, it tries to convert the value to a C long and stores it in *value. Errors are passed through the *err variable.

This interface looks overcomplicated on first sight, but the complexity is needed to support the various use cases of this function. Also note that it is an inline function, so the compiler should be able to optimize away the unused variables.

Use case 1: a function can take different kinds of input and want to check whether the input is an integer. Typically, this would be done with a long if chain like

if isinstance(x, ...):
    ...
elif isinstance(x, ...):
    ...
elif integer_check_long(x, &value, &err):
    # Process value and err as needed
elif isinstance(x, ...):
    ...

Use case 2: convert an integer to a C long purely for performance. In this case, you want to treat overflow the same as the input not being an integer:

cdef int err = -1
integer_check_long(x, &value, &err)
if err == 0:
    # Conversion successful, use value

Change History (32)

comment:1 Changed 3 years ago by jdemeyer

  • Branch set to u/jdemeyer/fast_check_for_c_long

comment:2 Changed 3 years ago by git

  • Commit set to dd27cd5b59823f5f5ff7c1a80d0afc962c29dc7d

Branch pushed to git repo; I updated commit sha1. This was a forced push. New commits:

eeee3deMove long.pxd to arith
dd27cd5New function integer_check_long()

comment:3 Changed 3 years ago by jdemeyer

  • Dependencies set to #24245

comment:4 Changed 3 years ago by git

  • Commit changed from dd27cd5b59823f5f5ff7c1a80d0afc962c29dc7d to b1ea838fc07baf8d6247a15f8db067ade9596e4a

Branch pushed to git repo; I updated commit sha1. This was a forced push. New commits:

b1ea838New functions integer_check_long() and integer_check_long_py()

comment:5 Changed 3 years ago by jdemeyer

  • Status changed from new to needs_review

comment:6 Changed 3 years ago by jdemeyer

  • Dependencies changed from #24245 to #24252
  • Status changed from needs_review to needs_work

comment:7 Changed 3 years ago by git

  • Commit changed from b1ea838fc07baf8d6247a15f8db067ade9596e4a to 4d76bc133b7b93cf3a3baaad782caedee692ae0a

Branch pushed to git repo; I updated commit sha1. This was a forced push. New commits:

bb114c9Fake Integer interface
4d76bc1New functions integer_check_long() and integer_check_long_py()

comment:8 Changed 3 years ago by jdemeyer

  • Status changed from needs_work to needs_review

comment:9 follow-up: Changed 3 years ago by tscrim

I think it would be better if integer_check_long and integer_check_long_py returned the error and 0 if it was completely successful. Then pyobject_to_long would then interpret the error using the enum and do the correct Python error raising. I feel like you are basically doing this, but in a roundabout way by passing err.

comment:10 Changed 3 years ago by tscrim

Also, in #24248, you do not even use the return bint of integer_check_long_py.

comment:11 in reply to: ↑ 9 Changed 3 years ago by jdemeyer

  • Description modified (diff)

comment:12 Changed 3 years ago by git

  • Commit changed from 4d76bc133b7b93cf3a3baaad782caedee692ae0a to 70592a850d7677928592c022760d6183bd81364f

Branch pushed to git repo; I updated commit sha1. New commits:

70592a8Fix compiler warning

comment:13 follow-ups: Changed 3 years ago by tscrim

Use case 2 could just become:

if integer_check_long(x, &value) == 0:
    # Conversion successful, use value

which feels much more natural to me.

However, I agree that use case 1 is a bit more of a hassle because you could not subsequently assign the error message (this wouldn't be a problem in, e.g., C because of the looser syntax rules). Although you can get around this by expanding out the elif as an else-if (but it looks fugly). Do you have a specific case where this does come up?

comment:14 in reply to: ↑ 13 Changed 3 years ago by jdemeyer

Replying to tscrim:

Use case 2 could just become:

if integer_check_long(x, &value) == 0:
    # Conversion successful, use value

which feels much more natural to me.

Sure, I can simplify use case 1 and use case 2 individually. But I see no way to simplify things in such a way that both use cases still work. Alternatively, I could simply add another layer for use case 2 and make a new function

if integer_check_long_no_overflow(x, &value):
    # Conversion successful, use value

Finally, let me stress that this complexity shouldn't affect performance since we are dealing with inline functions and compilers are good at optimization.

comment:15 in reply to: ↑ 13 ; follow-up: Changed 3 years ago by jdemeyer

Replying to tscrim:

Do you have a specific case where this does come up?

I don't know exactly what you mean, but you can look at my work in progress at #24247. There are various calls to integer_check_long(_py) there.

comment:16 Changed 3 years ago by jdemeyer

Let me know what you would suggest to do with this ticket.

comment:17 in reply to: ↑ 15 ; follow-ups: Changed 3 years ago by tscrim

  • Reviewers set to Travis Scrimshaw

Replying to jdemeyer:

Replying to tscrim:

Do you have a specific case where this does come up?

I don't know exactly what you mean, but you can look at my work in progress at #24247. There are various calls to integer_check_long(_py) there.

Do you have a place on a future ticket where use case 1 applies? I see use case 2 on #24247 and this one:

            if integer_check_long_py(right, &value, &err):
                if not err:
                    return (<Element>left)._pow_long(value)
                else:
                    return (<Element>left)._pow_int(right)

which could be simplified to

            err = integer_check_long_py(right, &value):
            if not err:
                return (<Element>left)._pow_long(value)
            elif err == ERR_OVERFLOW:
                return (<Element>left)._pow_int(right)

However, if you see a clear benefit to take on the more complicated function description and return values, then you can set this to a positive review. I just am thinking a little bit about the potential future maintenance of this function (I also don't think there will be any [noticeable] performance difference) and the places where it is used.

Last edited 3 years ago by tscrim (previous) (diff)

comment:18 in reply to: ↑ 17 Changed 3 years ago by jdemeyer

Here is a concrete proposal: we keep the complicated signature for now (if anything, simply to prevent conflicts with current tickets depending on this one). Once we have a clearer idea on how these functions are used, we can revisit and change things if needed.

comment:19 in reply to: ↑ 17 Changed 3 years ago by jdemeyer

Replying to tscrim:

I also don't think there will be any [noticeable] performance difference

Fun observation: compiler warnings can be used to get data on how the compiler is able to optimize the code. For example, in #24247 I'm writing essentially

        cdef long value
        cdef int err
        if integer_check_long(y, &value, &err):
            if not err:
                return (<Element>x)._pow_long(value)

In this case, the compiler does not give a warning that err and value might be uninitialized. This means that the compiler is able to inline and optimize all code paths of integer_check_long to prove that err and value are set to a known value whenever they are accessed in the code snippet above.

comment:20 Changed 3 years ago by jdemeyer

  • Status changed from needs_review to positive_review

comment:21 follow-up: Changed 3 years ago by embray

Oh, I just saw this from #24293 (I don't look at every ticket opened...)

As for "integer_fake" is there maybe a name for it that better reflects its purpose"? I was thinking something like "integer_proto" (for prototype) since providing early access to the Integer prototype seems to be its primary purpose for existing. "integer_fake" to me sounds like something for testing, or some other odd purpose.

comment:22 in reply to: ↑ 21 Changed 3 years ago by jdemeyer

Replying to embray:

As for "integer_fake" is there maybe a name for it that better reflects its purpose"? I was thinking something like "integer_proto" (for prototype) since providing early access to the Integer prototype seems to be its primary purpose for existing.

I think that integer_fake or integer_proto are equally meaningless. What would "prototype" mean in this context? I don't mind changing the name, but I'm not convinced with integer_proto either.

"integer_fake" to me sounds like something for testing, or some other odd purpose.

"odd purpose" totally applies here :-)

comment:23 Changed 3 years ago by jdemeyer

Thinking more about it, I actually like how the name sounds like it should not be used. integer_proto sounds too official somehow.

comment:24 Changed 3 years ago by tscrim

When I read proto(type), my first thought was the prototype design pattern, but that's right, in C it is called prototyping. However, since this is only mimicking that, I think fake is a little more accurate as to the class's purpose.

comment:25 Changed 3 years ago by embray

I mean, it should be used, just for a specific narrow case. Alright I won't push on it; just something called "fake" is a little jarring to me to see outside of, say, test code.

comment:26 follow-up: Changed 3 years ago by embray

BTW, I should add, this is really cool. Definitely something we should have done in the first place, in retrospect.

comment:27 in reply to: ↑ 26 Changed 3 years ago by jdemeyer

Replying to embray:

BTW, I should add, this is really cool. Definitely something we should have done in the first place, in retrospect.

Thanks!

comment:28 follow-up: Changed 3 years ago by embray

Okay okay, what about "integer_decl", since it's basically a forward declaration IIUC.

comment:29 in reply to: ↑ 28 Changed 3 years ago by jdemeyer

Replying to embray:

Okay okay, what about "integer_decl", since it's basically a forward declaration IIUC.

It's a fake forward declaration which shouldn't be used.

comment:30 Changed 3 years ago by vbraun

  • Branch changed from u/jdemeyer/fast_check_for_c_long to 70592a850d7677928592c022760d6183bd81364f
  • Resolution set to fixed
  • Status changed from positive_review to closed

comment:31 Changed 3 years ago by jmantysalo

  • Branch changed from 70592a850d7677928592c022760d6183bd81364f to u/jmantysalo/70592a850d7677928592c022760d6183bd81364f

comment:32 Changed 3 years ago by jmantysalo

  • Branch changed from u/jmantysalo/70592a850d7677928592c022760d6183bd81364f to 70592a850d7677928592c022760d6183bd81364f
  • Commit 70592a850d7677928592c022760d6183bd81364f deleted

Arghs, my typo.

Note: See TracTickets for help on using tickets.