Opened 3 years ago

Closed 3 years ago

# Special-case pol*term, term*pol for generic polynomials

Reported by: Owned by: mmezzarobba major sage-8.3 basic arithmetic Marc Mezzarobba, Travis Scrimshaw Travis Scrimshaw, Marc Mezzarobba N/A d37db8d d37db8da4c69d6d7fc8982e653b9c412c1d0b77f

```sage: Pol0.<t> = ZZ[]
....: Pol1.<x> = Pol0[]
....: p = ((t^2 - 3*t)*x^10 + (-3*t^2 - 1)*x^9 + (-t^2 - t + 1)*x^8 + (-t^2 - 1)*x^7 + (-4*t^2 + 14*t - 4)*x^6 + (2*t^2 + 2)*x^5 + (-t^2 + 2*t + 2)*x^4 + (3*t^2 - t + 3)*x^3 + (t^2 - t + 156)*x^2 + (7*t - 3)*x - t^2 + 2*t)
....: q = 2*x
```

8.3.beta0:

```sage: %timeit p*q
The slowest run took 2129.02 times longer than the fastest. This could mean that an intermediate result is being cached.
100000 loops, best of 3: 8.02 µs per loop
```

This ticket:

```The slowest run took 8.63 times longer than the fastest. This could mean that an intermediate result is being cached.
100000 loops, best of 3: 2.21 µs per loop
```

### comment:1 Changed 3 years ago by mmezzarobba

• Status changed from new to needs_review

### comment:2 Changed 3 years ago by mmezzarobba

• Type changed from PLEASE CHANGE to enhancement

### comment:3 follow-up: ↓ 4 Changed 3 years ago by tscrim

Perhaps it would be better to have a specialized `cdef scale_and_shift` method that can be overwritten by the specific data structures in order to avoid creating an intermediate object? In fact, the generic `_lmul_` and `_rmul_` just lift the coefficient to the polynomial ring and multiply. I would also rewrite `_mul_` and `_rmul_` to use this `scale_and_shift` with a shift of 0 to avoid code duplication.

### comment:4 in reply to: ↑ 3 ; follow-up: ↓ 5 Changed 3 years ago by mmezzarobba

Perhaps it would be better to have a specialized `cdef scale_and_shift` method that can be overwritten by the specific data structures in order to avoid creating an intermediate object? In fact, the generic `_lmul_` and `_rmul_` just lift the coefficient to the polynomial ring and multiply. I would also rewrite `_mul_` and `_rmul_` to use this `scale_and_shift` with a shift of 0 to avoid code duplication.

Yes, why not... I'm not sure it is worth the pain now—I'm not really trying to make this operation as fast as possible, just to avoid the worst of the overhead (especially in the Karatsuba range). So I'm not going to implement your suggestion now, but perhaps later, and in any case I can review the implementation if you want to do it yourself.

Thanks for you comments in any case!

### comment:5 in reply to: ↑ 4 Changed 3 years ago by tscrim

Perhaps it would be better to have a specialized `cdef scale_and_shift` method that can be overwritten by the specific data structures in order to avoid creating an intermediate object? In fact, the generic `_lmul_` and `_rmul_` just lift the coefficient to the polynomial ring and multiply. I would also rewrite `_mul_` and `_rmul_` to use this `scale_and_shift` with a shift of 0 to avoid code duplication.

Yes, why not... I'm not sure it is worth the pain now—I'm not really trying to make this operation as fast as possible, just to avoid the worst of the overhead (especially in the Karatsuba range). So I'm not going to implement your suggestion now, but perhaps later, and in any case I can review the implementation if you want to do it yourself.

I will do it today then.

### comment:6 Changed 3 years ago by tscrim

• Branch changed from u/mmezzarobba/generic_pol_times_term to u/tscrim/generic_pol_times_term-25317
• Commit changed from 42034410ada1eb5e8667f8821cae489a4f0a1659 to d37db8da4c69d6d7fc8982e653b9c412c1d0b77f
• Reviewers set to Travis Scrimshaw

Running the same test as in the description.

8.3.beta0:

```sage: %timeit p*q
The slowest run took 2129.02 times longer than the fastest. This could mean that an intermediate result is being cached.
100000 loops, best of 3: 8.02 µs per loop
```

```sage: %timeit p*q
The slowest run took 9.36 times longer than the fastest. This could mean that an intermediate result is being cached.
100000 loops, best of 3: 2.78 µs per loop
```

My branch:

```The slowest run took 8.63 times longer than the fastest. This could mean that an intermediate result is being cached.
100000 loops, best of 3: 2.21 µs per loop
```

So my changes get us an extra ~20%.

New commits:

 ​4a9c966 `Merge branch 'u/mmezzarobba/generic_pol_times_term' of git://trac.sagemath.org/sage into u/tscrim/generic_pol_times_term` ​d37db8d `Improving the speed of single term multiplication.`

### comment:7 Changed 3 years ago by mmezzarobba

• Authors changed from Marc Mezzarobba to Marc Mezzarobba, Travis Scrimshaw
• Description modified (diff)
• Reviewers changed from Travis Scrimshaw to Travis Scrimshaw, Marc Mezzarobba
• Status changed from needs_review to positive_review

Great, thanks!

### comment:8 Changed 3 years ago by vbraun

• Branch changed from u/tscrim/generic_pol_times_term-25317 to d37db8da4c69d6d7fc8982e653b9c412c1d0b77f
• Resolution set to fixed
• Status changed from positive_review to closed
Note: See TracTickets for help on using tickets.