Opened 3 years ago

Closed 3 years ago

#25317 closed enhancement (fixed)

Special-case pol*term, term*pol for generic polynomials

Reported by: mmezzarobba Owned by:
Priority: major Milestone: sage-8.3
Component: basic arithmetic Keywords:
Cc: Merged in:
Authors: Marc Mezzarobba, Travis Scrimshaw Reviewers: Travis Scrimshaw, Marc Mezzarobba
Report Upstream: N/A Work issues:
Branch: d37db8d (Commits, GitHub, GitLab) Commit: d37db8da4c69d6d7fc8982e653b9c412c1d0b77f
Dependencies: Stopgaps:

Status badges

Description (last modified by mmezzarobba)

sage: Pol0.<t> = ZZ[]
....: Pol1.<x> = Pol0[]
....: p = ((t^2 - 3*t)*x^10 + (-3*t^2 - 1)*x^9 + (-t^2 - t + 1)*x^8 + (-t^2 - 1)*x^7 + (-4*t^2 + 14*t - 4)*x^6 + (2*t^2 + 2)*x^5 + (-t^2 + 2*t + 2)*x^4 + (3*t^2 - t + 3)*x^3 + (t^2 - t + 156)*x^2 + (7*t - 3)*x - t^2 + 2*t)
....: q = 2*x

8.3.beta0:

sage: %timeit p*q
The slowest run took 2129.02 times longer than the fastest. This could mean that an intermediate result is being cached.
100000 loops, best of 3: 8.02 µs per loop

This ticket:

The slowest run took 8.63 times longer than the fastest. This could mean that an intermediate result is being cached.
100000 loops, best of 3: 2.21 µs per loop

Change History (8)

comment:1 Changed 3 years ago by mmezzarobba

  • Status changed from new to needs_review

comment:2 Changed 3 years ago by mmezzarobba

  • Type changed from PLEASE CHANGE to enhancement

comment:3 follow-up: Changed 3 years ago by tscrim

Perhaps it would be better to have a specialized cdef scale_and_shift method that can be overwritten by the specific data structures in order to avoid creating an intermediate object? In fact, the generic _lmul_ and _rmul_ just lift the coefficient to the polynomial ring and multiply. I would also rewrite _mul_ and _rmul_ to use this scale_and_shift with a shift of 0 to avoid code duplication.

comment:4 in reply to: ↑ 3 ; follow-up: Changed 3 years ago by mmezzarobba

Replying to tscrim:

Perhaps it would be better to have a specialized cdef scale_and_shift method that can be overwritten by the specific data structures in order to avoid creating an intermediate object? In fact, the generic _lmul_ and _rmul_ just lift the coefficient to the polynomial ring and multiply. I would also rewrite _mul_ and _rmul_ to use this scale_and_shift with a shift of 0 to avoid code duplication.

Yes, why not... I'm not sure it is worth the pain now—I'm not really trying to make this operation as fast as possible, just to avoid the worst of the overhead (especially in the Karatsuba range). So I'm not going to implement your suggestion now, but perhaps later, and in any case I can review the implementation if you want to do it yourself.

Thanks for you comments in any case!

comment:5 in reply to: ↑ 4 Changed 3 years ago by tscrim

Replying to mmezzarobba:

Replying to tscrim:

Perhaps it would be better to have a specialized cdef scale_and_shift method that can be overwritten by the specific data structures in order to avoid creating an intermediate object? In fact, the generic _lmul_ and _rmul_ just lift the coefficient to the polynomial ring and multiply. I would also rewrite _mul_ and _rmul_ to use this scale_and_shift with a shift of 0 to avoid code duplication.

Yes, why not... I'm not sure it is worth the pain now—I'm not really trying to make this operation as fast as possible, just to avoid the worst of the overhead (especially in the Karatsuba range). So I'm not going to implement your suggestion now, but perhaps later, and in any case I can review the implementation if you want to do it yourself.

I will do it today then.

comment:6 Changed 3 years ago by tscrim

  • Branch changed from u/mmezzarobba/generic_pol_times_term to u/tscrim/generic_pol_times_term-25317
  • Commit changed from 42034410ada1eb5e8667f8821cae489a4f0a1659 to d37db8da4c69d6d7fc8982e653b9c412c1d0b77f
  • Reviewers set to Travis Scrimshaw

Running the same test as in the description.

8.3.beta0:

sage: %timeit p*q
The slowest run took 2129.02 times longer than the fastest. This could mean that an intermediate result is being cached.
100000 loops, best of 3: 8.02 µs per loop

Your branch:

sage: %timeit p*q
The slowest run took 9.36 times longer than the fastest. This could mean that an intermediate result is being cached.
100000 loops, best of 3: 2.78 µs per loop

My branch:

The slowest run took 8.63 times longer than the fastest. This could mean that an intermediate result is being cached.
100000 loops, best of 3: 2.21 µs per loop

So my changes get us an extra ~20%.


New commits:

4a9c966Merge branch 'u/mmezzarobba/generic_pol_times_term' of git://trac.sagemath.org/sage into u/tscrim/generic_pol_times_term
d37db8dImproving the speed of single term multiplication.

comment:7 Changed 3 years ago by mmezzarobba

  • Authors changed from Marc Mezzarobba to Marc Mezzarobba, Travis Scrimshaw
  • Description modified (diff)
  • Reviewers changed from Travis Scrimshaw to Travis Scrimshaw, Marc Mezzarobba
  • Status changed from needs_review to positive_review

Great, thanks!

comment:8 Changed 3 years ago by vbraun

  • Branch changed from u/tscrim/generic_pol_times_term-25317 to d37db8da4c69d6d7fc8982e653b9c412c1d0b77f
  • Resolution set to fixed
  • Status changed from positive_review to closed
Note: See TracTickets for help on using tickets.