Opened 6 years ago

Closed 6 years ago

#21448 closed enhancement (fixed)

Avoid underscored arithmetic methods in Python

Reported by: Jeroen Demeyer Owned by:
Priority: major Milestone: sage-7.4
Component: performance Keywords:
Cc: Nicolas M. Thiéry Merged in:
Authors: Jeroen Demeyer Reviewers: Nicolas M. Thiéry
Report Upstream: N/A Work issues:
Branch: ca0f7ba (Commits, GitHub, GitLab) Commit: ca0f7ba75740682514cef1c22d3515c41ffb8411
Dependencies: #20767 Stopgaps:

Status badges

Description (last modified by Jeroen Demeyer)

In categories, it is better to write x + y instead of x._add_(y) since the latter can be a lot slower if _add_ is implemented in Cython.

We also remove several redundant implementations of _sub_ where they coincide with the default implementation from ModuleElement.

Attachments (1)

Single-underscore benchmark.ipynb (4.6 KB) - added by Jeroen Demeyer 6 years ago.
Benchmark

Download all attachments as: .zip

Change History (18)

comment:1 Changed 6 years ago by Jeroen Demeyer

Description: modified (diff)

comment:2 Changed 6 years ago by Jeroen Demeyer

Description: modified (diff)

comment:3 Changed 6 years ago by Jeroen Demeyer

Description: modified (diff)

comment:4 Changed 6 years ago by Jeroen Demeyer

Description: modified (diff)

comment:5 Changed 6 years ago by Jeroen Demeyer

Branch: u/jdemeyer/ticket/21448

comment:6 Changed 6 years ago by Jeroen Demeyer

Commit: ca0f7ba75740682514cef1c22d3515c41ffb8411
Status: newneeds_review

Last 10 new commits:

343131aFix doctest for Trac #20767
1f964c720767: minor documentation reformating / improvements
6b1f65f20767: additional test, use Element rather than RingElement in some tests, minor doc improvement
35b64a8Fix division action
2e8f9e4Return exception instead of error message
59a04e4Improve documentation and tests
2d4094820767: Proofreading doc + doc of cdef _add_ trick + TODO about _add_long / _mul_long
fec4abeFurther doc additions
25c9d7dAdd tests for _add_long and _mul_long
ca0f7baAvoid underscored arithmetic methods in Python

comment:7 Changed 6 years ago by Travis Scrimshaw

This is counterintuitive to me:

In categories, it is better to write x + y instead of x._add_(y) since the latter can be a lot slower if _add_ is implemented in Cython.

I would think that avoiding the additional call to __add__ (which does some additional checks) would result in a speed gain. Does this have to do with how cpdef functions are implemented in Cython?

comment:8 in reply to:  7 Changed 6 years ago by Jeroen Demeyer

Rule number 1 of optimizing Python: a method is slow. Almost anything is faster than a method lookup.

comment:9 in reply to:  7 Changed 6 years ago by Jeroen Demeyer

And the call of x + y to x.__add__(y) is optimized by Python, so it is reasonably fast. Compare:

sage: timeit('a + a', number=10^6, repeat=20)
1000000 loops, best of 20: 62.3 ns per loop
sage: timeit('a._add_(a)', number=10^6, repeat=20)
1000000 loops, best of 20: 115 ns per loop
sage: timeit('a.__add__(a)', number=10^6, repeat=20)
1000000 loops, best of 20: 162 ns per loop

I don't really have an explanation of why __add__ is slower than _add_, I would have expected them to be equally slow.

comment:10 Changed 6 years ago by Jeroen Demeyer

On the topic of methods being slow, note the difference between:

sage: a = 1
sage: timeit('parent(a)', number=10^6, repeat=20)
1000000 loops, best of 20: 74.6 ns per loop
sage: timeit('a.parent()', number=10^6, repeat=20)
1000000 loops, best of 20: 93.5 ns per loop

comment:11 Changed 6 years ago by Nicolas M. Thiéry

I checked the diff and it does what the ticket claims.

Just for a complete confirmation, could you post it a new benchmark where the code to be benchmarked is called 100 times in a loop inside a function? Just to make sure that there is no interference from the fact that we are timing at very low granularity.

Once confirmed, you can set a positive review on my behalf.

Thanks!

comment:12 Changed 6 years ago by Nicolas M. Thiéry

Reviewers: Nicolas M. Thiéry

comment:13 Changed 6 years ago by Travis Scrimshaw

Thank you for the explanations Jeroen!

comment:14 Changed 6 years ago by Jeroen Demeyer

I made a benchmark which should be close to possible real-life usage. The conclusion:

With _add_: 5 loops, best of 3: 85.2 ms per loop

With +: 5 loops, best of 3: 78.1 ms per loop

Last edited 6 years ago by Jeroen Demeyer (previous) (diff)

comment:15 Changed 6 years ago by Jeroen Demeyer

Status: needs_reviewpositive_review

comment:16 Changed 6 years ago by Jeroen Demeyer

Note that the benchmark also shows that implementing arithmetic in the category is really slow:

With _add_ implemented in Cython: 125 loops, best of 3: 5.57 ms per loop

With _sub_ implemented in the category: 5 loops, best of 3: 78.1 ms per loop

Last edited 6 years ago by Jeroen Demeyer (previous) (diff)

Changed 6 years ago by Jeroen Demeyer

Benchmark

comment:17 Changed 6 years ago by Volker Braun

Branch: u/jdemeyer/ticket/21448ca0f7ba75740682514cef1c22d3515c41ffb8411
Resolution: fixed
Status: positive_reviewclosed
Note: See TracTickets for help on using tickets.