Opened 8 years ago

Closed 2 years ago

#12657 closed enhancement (fixed)

Better implementation of Frobenius automorphism

Reported by: roed
Owned by: roed
Priority: major
Milestone: sage-8.1
Component: padics
Keywords: sd87
Cc: jpflori, caruso
Merged in:
Authors: Ander Steele, Ricky Magner
Reviewers: GaYee Park, Xavier Caruso
Report Upstream: N/A
Work issues:
Branch: 5c9d33c (Commits)
Commit: 5c9d33c72a8993cde39ebd9d1ccf774bfb65b07d
Dependencies: #14304
Stopgaps:

Description

The Frobenius automorphism of p-adic fields was added in #8241. The implementation is horribly inefficient and should use Hensel lifting directly rather than decomposing the element into a list of Teichmüller representatives.

This optimization should probably wait on #12555.
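
For orientation, here is a minimal sketch of the Hensel-lifting idea (illustrative only, not the code eventually merged): the Frobenius sends the generator a of the unramified extension to the unique root of the defining polynomial that is congruent to a^p modulo p, and that root can be lifted by Newton iteration. The helper name frob_gen_sketch and the naive precision handling are assumptions for the example.

{{{#!python
# Illustrative sketch, not the implementation from this ticket: Hensel-lift the
# Frobenius image of the generator of an unramified extension R = Zq(p^n, prec).
def frob_gen_sketch(R):
    f = R.defining_polynomial()   # minimal polynomial of the generator over Zp
    df = f.derivative()
    p = R.prime()
    x = R.gen()**p                # Frobenius of the generator modulo p
    prec = R.precision_cap()
    k = 1
    while k < prec:
        x = x - f(x) / df(x)      # Newton step; f'(x) is a unit since f is separable mod p
        k *= 2
    return R(x)                   # coerce back into R in case division moved us to Qq

# Hypothetical session:
# sage: R.<a> = Zq(5^4, prec=20)
# sage: b = frob_gen_sketch(R)
# sage: b.residue() == a.residue()**5
# True
}}}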

Change History (29)

comment:1 Changed 6 years ago by jdemeyer

  • Milestone changed from sage-5.11 to sage-5.12

comment:2 Changed 6 years ago by roed

  • Dependencies set to #14304

The method that needs to be improved is frobenius_unram in src/sage/libs/linkages/padics/unram_shared.pxi. This may want to wait on some more p-adic polynomial developments, but could also be done now.

comment:3 Changed 6 years ago by vbraun_spam

  • Milestone changed from sage-6.1 to sage-6.2

comment:4 Changed 5 years ago by vbraun_spam

  • Milestone changed from sage-6.2 to sage-6.3

comment:5 Changed 5 years ago by vbraun_spam

  • Milestone changed from sage-6.3 to sage-6.4

comment:6 Changed 5 years ago by jpflori

  • Cc jpflori added

comment:7 Changed 2 years ago by roed

  • Keywords sd87 added

comment:8 Changed 2 years ago by asteele

  • Branch set to u/asteele/better_implementation_of_frobenius_automorphism

comment:9 Changed 2 years ago by git

  • Commit set to ffcb86ca34e2f45641ec8639b2a045527f5c1e61

Branch pushed to git repo; I updated commit sha1. New commits:

9a280e0 Add cached method for computing frob of generator of unramified extension. Speeds up general frob
ffcb86c Make frob_gen() method private

comment:10 Changed 2 years ago by git

  • Commit changed from ffcb86ca34e2f45641ec8639b2a045527f5c1e61 to 3a03a51b0875bc8d72e3f51b787245938e464493

Branch pushed to git repo; I updated commit sha1. New commits:

3a03a51 Add doctest for _frob_gen()

comment:11 Changed 2 years ago by asteele

  • Status changed from new to needs_review

comment:12 Changed 2 years ago by gpark

  • Reviewers set to GaYee Park
  • Status changed from needs_review to positive_review

All tests passed.

comment:13 Changed 2 years ago by tscrim

  • Milestone changed from sage-6.4 to sage-8.1
  • Status changed from positive_review to needs_work

Author name missing.

comment:14 Changed 2 years ago by asteele

  • Authors set to asteele, Ricky Magner
  • Status changed from needs_work to needs_review

comment:15 Changed 2 years ago by caruso

  • Cc caruso added
  • Reviewers changed from GaYee Park to GaYee Park, Xavier Caruso

I don't understand why you need to redo Newton iteration in frobenius_unram.

Alternatively, instead of storing the powers of frob_a in a dictionary, you can use Horner's scheme.
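
For concreteness, a minimal sketch of the Horner variant being suggested (illustrative names, not the code on the branch): since Frobenius fixes Zp, applying it to x = sum_i c_i a^i amounts to evaluating sum_i c_i frob(a)^i, which Horner's scheme does with one extension multiplication per coefficient and no table of powers.

{{{#!python
# Illustrative sketch of the Horner suggestion (not the branch's code).
# coefs: Zp-coefficients of x with respect to the generator a, constant term first.
# frob_a: precomputed Frobenius image of the generator.
def frobenius_via_horner(coefs, frob_a):
    result = frob_a.parent().zero()
    for c in reversed(coefs):          # highest-degree coefficient first
        result = result * frob_a + c   # one multiplication in the extension per step
    return result
}}}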

comment:16 follow-up: Changed 2 years ago by asteele

We redid the Newton iteration in _frob_gen() because that was easier (for me) than finding & adapting an existing method. The only method I see is .hensel_lift(p,e) in sage.rings.polynomial.polynomial_element.Polynomial, and I don't see the advantage of adapting that.

As for the Horner scheme: our experiments suggested that the expensive operation was multiplication (in large-degree unramified extensions). For example, in R.<a> = Zq(37^100, 100), addition of two random elements takes ~80 microseconds, while multiplication takes ~80 milliseconds. My naive guess is that caching the powers of frob_a in the dictionary and paying the cost of addition will be much faster than paying the repeated cost of multiplication. In any case, storing the powers in a dict or computing the polynomials via Horner is much faster than the previous method of computing Frobenius.
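
As a rough illustration of the caching strategy described above (hypothetical helper, not the branch's code): precompute the powers of frob_a once per ring and reuse them, so that later calls only pay for multiplications by Zp-coefficients and for additions.

{{{#!python
# Illustrative sketch of the cached-powers strategy (not the branch's code).
# power_cache: a dict shared between calls for the same ring, e.g. cache = {}.
def frobenius_via_cached_powers(coefs, frob_a, power_cache):
    R = frob_a.parent()
    for i in range(len(coefs)):
        if i not in power_cache:
            power_cache[i] = R.one() if i == 0 else power_cache[i - 1] * frob_a
    # Only the cache build multiplies two extension elements; the rest is
    # multiplication by Zp-coefficients plus additions.
    return sum(coefs[i] * power_cache[i] for i in range(len(coefs)))
}}}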

comment:17 Changed 2 years ago by roed

  • Authors changed from asteele, Ricky Magner to Ander Steele, Ricky Magner

comment:18 in reply to: ↑ 16 ; follow-up: Changed 2 years ago by caruso

Replying to asteele:

We redid the Newton iteration in _frob_gen() because that was easier (for me) than finding & adapting an existing method. The only method I see is .hensel_lift(p,e) in sage.rings.polynomial.polynomial_element.Polynomial, and I don't see the advantage of adapting that.

I meant to say that you redid the Newton iteration both in _frob_gen() and in frobenius_unram(). Is there a reason for that?

As for the Horner scheme: our experiments suggested that the expensive operation was multiplication (in large-degree unramified extensions). For example, in R.<a> = Zq(37^100, 100), addition of two random elements takes ~80 microseconds, while multiplication takes ~80 milliseconds. My naive guess is that caching the powers of frob_a in the dictionary and paying the cost of addition will be much faster than paying the repeated cost of multiplication. In any case, storing the powers in a dict or computing the polynomials via Horner is much faster than the previous method of computing Frobenius.

I don't understand why the Horner method would be more costly in terms of multiplications. On the contrary, I think it is (a bit) less costly.

Last edited 2 years ago by caruso (previous) (diff)

comment:19 Changed 2 years ago by git

  • Commit changed from 3a03a51b0875bc8d72e3f51b787245938e464493 to 62cec73f79f300e700cc683b9a2486d796777905

Branch pushed to git repo; I updated commit sha1. New commits:

ea18e00 fix whitespace
62cec73 Remove duplicated code

comment:20 in reply to: ↑ 18 Changed 2 years ago by asteele

Aha, thanks for catching that! That was a mistake, and has been fixed.

Replying to caruso:

Replying to asteele:

We redid the Newton iteration in _frob_gen() because that was easier (for me) than finding & adapting an existing method. The only method I see is .hensel_lift(p,e) in sage.rings.polynomial.polynomial_element.Polynomial, and I don't see the advantage of adapting that.

I meant to say that you redid the Newton iteration both in _frob_gen() and in frobenius_unram(). Is there a reason for that?

I just tested the Horner method and it is roughly 10 times slower in Zq(5^100, 100). It seems that multiplying elements of the extension by elements of Q_p is very fast, but multiplying two elements of the extension is generally slow. Caching the expensive multiplications (and maybe performing more multiplications overall) appears to be a good strategy in very high degree.

As for the Horner scheme: our experiments suggested that the expensive operation was multiplication (in large-degree unramified extensions). For example, in R.<a> = Zq(37^100, 100), addition of two random elements takes ~80 microseconds, while multiplication takes ~80 milliseconds. My naive guess is that caching the powers of frob_a in the dictionary and paying the cost of addition will be much faster than paying the repeated cost of multiplication. In any case, storing the powers in a dict or computing the polynomials via Horner is much faster than the previous method of computing Frobenius.

I don't understand why the Horner method would be more costly in terms of multiplications. On the contrary, I think it is (a bit) less costly.

comment:21 follow-up: Changed 2 years ago by caruso

Let me disagree :-).

On my laptop, with your current implementation, it takes about 1s to compute the Frobenius of a random element over Zq(5^100,100) (without taking into account the computation of the Frobenius of the generator), while it takes less than 0.1s using the Horner scheme (cf. the attached implementation). It is thus about 10 times faster.

By the way, as far as I understand, an element of Zq is represented as P(a), where a is the generator and P is a polynomial over Z. What we currently do is write all the coefficients of P in radix p (this is done by the method list), and then just afterwards recompose them into actual integers. It would be better to avoid this useless computation and access the coefficients of P directly, but I'm not sure it's easy to do.
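
For anyone wanting to reproduce the comparison, a session along these lines should do it (timings are machine-dependent; the figures above are from my laptop):

{{{
# Hypothetical Sage session for reproducing the timings discussed above.
sage: R.<a> = Zq(5^100, prec=100)
sage: x = R.random_element()
sage: y = x.frobenius()        # the first call also computes the Frobenius of the generator
sage: timeit('x.frobenius()')  # compare before/after the branch; numbers vary by machine
}}}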

comment:22 Changed 2 years ago by caruso

  • Branch changed from u/asteele/better_implementation_of_frobenius_automorphism to u/caruso/better_implementation_of_frobenius_automorphism

comment:23 Changed 2 years ago by git

  • Commit changed from 62cec73f79f300e700cc683b9a2486d796777905 to 8ae9f8aff0648ed4a3a90e137a6640945f208140

Branch pushed to git repo; I updated commit sha1. New commits:

8ae9f8a Use Horner scheme

comment:24 in reply to: ↑ 21 Changed 2 years ago by asteele

Fantastic! I see your implementation of Horner is much better than what I was doing, and I have reproduced the performance increase on k8s: after _frob_gen() is cached, your scheme computes the Frobenius of random elements of Zq(5^100,100) in approximately 60 ms, while the previous method was taking 800 ms. Your current implementation is definitely the way to go (and is roughly 1000 times faster than what is currently in Sage).

Replying to caruso:

Let me disagree :-).

On my laptop, with your current implementation, it takes about 1s to compute the Frobenius of a random element over Zq(5^100,100) (without taking into account the computation of the Frobenius of the generator), while it takes less than 0.1s using the Horner scheme (cf. the attached implementation). It is thus about 10 times faster.

By the way, as far as I understand, an element of Zq is represented as P(a), where a is the generator and P is a polynomial over Z. What we currently do is write all the coefficients of P in radix p (this is done by the method list), and then just afterwards recompose them into actual integers. It would be better to avoid this useless computation and access the coefficients of P directly, but I'm not sure it's easy to do.

comment:25 Changed 2 years ago by chapoton

Maybe the iteration for i in range(len(coefs)): could be replaced by just for ci in coefs:?
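
In other words, a toy example of the loop form being suggested (the list and the accumulator here are placeholders, not the ticket's code):

{{{#!python
# Loop over the coefficients themselves rather than over their indices.
coefs = [3, 1, 4, 1, 5]
total = 0
for ci in coefs:        # instead of: for i in range(len(coefs)): total += coefs[i]
    total += ci
}}}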

comment:26 Changed 2 years ago by asteele

  • Branch changed from u/caruso/better_implementation_of_frobenius_automorphism to u/asteele/better_implementation_of_frobenius_automorphism

comment:27 Changed 2 years ago by asteele

  • Commit changed from 8ae9f8aff0648ed4a3a90e137a6640945f208140 to 5c9d33c72a8993cde39ebd9d1ccf774bfb65b07d

I've cleaned up the code a little bit, removing the previous method (which contained the non-Pythonic iteration for i in range(len(coefs))). The code now uses Xavier's implementation of Horner's scheme, passes all doctests, and, I think, is ready for review.


New commits:

5c9d33c remove extraneous code

comment:28 Changed 2 years ago by caruso

  • Status changed from needs_review to positive_review

Ok, positive review (though it would be even faster to use the polynomial method of ticket #14825, but that ticket still needs work).
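
For reference, the further speedup mentioned here would come from reading the coefficients off directly; assuming the polynomial() method proposed in #14825 behaves as described there, the inner loop could look roughly like this (hedged sketch, not code from either ticket):

{{{#!python
# Hedged sketch: if x.polynomial() returns x as a polynomial over Zp in the
# generator (as proposed in #14825), the coefficients are available without
# going through list() and recombining radix-p digits.
def frobenius_via_polynomial(x, frob_a):
    coefs = x.polynomial().list()      # Zp-coefficients of x, constant term first
    result = frob_a.parent().zero()
    for c in reversed(coefs):
        result = result * frob_a + c   # Horner evaluation, as sketched earlier
    return result
}}}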

comment:29 Changed 2 years ago by vbraun

  • Branch changed from u/asteele/better_implementation_of_frobenius_automorphism to 5c9d33c72a8993cde39ebd9d1ccf774bfb65b07d
  • Resolution set to fixed
  • Status changed from positive_review to closed