Opened 7 years ago
Closed 2 years ago
#12657 closed enhancement (fixed)
Better implementation of Frobenius automorphism
Reported by:  roed  Owned by:  roed 

Priority:  major  Milestone:  sage8.1 
Component:  padics  Keywords:  sd87 
Cc:  jpflori, caruso  Merged in:  
Authors:  Ander Steele, Ricky Magner  Reviewers:  GaYee Park, Xavier Caruso 
Report Upstream:  N/A  Work issues:  
Branch:  5c9d33c (Commits)  Commit:  5c9d33c72a8993cde39ebd9d1ccf774bfb65b07d 
Dependencies:  #14304  Stopgaps: 
Change History (29)
comment:1 Changed 6 years ago by
 Milestone changed from sage5.11 to sage5.12
comment:2 Changed 6 years ago by
 Dependencies set to #14304
comment:3 Changed 6 years ago by
 Milestone changed from sage6.1 to sage6.2
comment:4 Changed 5 years ago by
 Milestone changed from sage6.2 to sage6.3
comment:5 Changed 5 years ago by
 Milestone changed from sage6.3 to sage6.4
comment:6 Changed 5 years ago by
 Cc jpflori added
comment:7 Changed 2 years ago by
 Keywords sd87 added
comment:8 Changed 2 years ago by
 Branch set to u/asteele/better_implementation_of_frobenius_automorphism
comment:9 Changed 2 years ago by
 Commit set to ffcb86ca34e2f45641ec8639b2a045527f5c1e61
comment:10 Changed 2 years ago by
 Commit changed from ffcb86ca34e2f45641ec8639b2a045527f5c1e61 to 3a03a51b0875bc8d72e3f51b787245938e464493
Branch pushed to git repo; I updated commit sha1. New commits:
3a03a51  Add doctest for _frob_gen()

comment:11 Changed 2 years ago by
 Status changed from new to needs_review
comment:12 Changed 2 years ago by
 Reviewers set to GaYee Park
 Status changed from needs_review to positive_review
All tests passed.
comment:13 Changed 2 years ago by
 Milestone changed from sage6.4 to sage8.1
 Status changed from positive_review to needs_work
Author name missing.
comment:14 Changed 2 years ago by
 Status changed from needs_work to needs_review
comment:15 Changed 2 years ago by
 Cc caruso added
 Reviewers changed from GaYee Park to GaYee Park, Xavier Caruso
I don't understand why you need to redo Newton iteration in frobenius_unram.
Alternatively, instead of storing the powers of frob_a in a dictionary, you could use the Horner scheme.
comment:16 followup: ↓ 18 Changed 2 years ago by
We redid the Newton iteration in _frob_gen() because that was easier (for me) than finding & adapting an existing method. The only method I see is .hensel_lift(p,e) in sage.rings.polynomial.polynomial_element.Polynomial, and I don't see the advantage of adapting that.
As for the Horner scheme: our experiments suggested that the expensive operation was multiplication (in large-degree unramified extensions). For example, in R.<a> = Zq(37^100, 100), addition of two random elements takes ~80 microseconds, while multiplication takes ~80 milliseconds. My naive guess is that caching the powers of frob_a in the dictionary and paying the cost of addition will be much faster than paying the repeated cost of multiplication. In any case, storing the powers in a dict or computing the polynomials via Horner is much faster than the previous method of computing Frobenius.
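The caching strategy described above can be sketched in plain Python (a hedged model only: integers stand in for p-adic ring elements, and the names are illustrative, not the actual Sage internals):

```python
# Sketch of evaluating P(frob_a) by caching powers of the evaluation point,
# so each expensive multiplication is paid only once, even across repeated
# calls with different coefficient lists.

def make_power_cached_evaluator(x):
    """Return a function evaluating sum(c_i * x**i) for coefficient lists,
    reusing previously computed powers of x across calls."""
    powers = [1]  # powers[i] == x**i, grown lazily

    def evaluate(coefs):
        while len(powers) < len(coefs):
            powers.append(powers[-1] * x)  # one multiplication per new power
        return sum(c * p for c, p in zip(coefs, powers))

    return evaluate
```

With x in a large unramified extension, the additions in the final sum are cheap compared to the multiplications building the power table, which is the trade-off this comment describes.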
comment:17 Changed 2 years ago by
comment:18 in reply to: ↑ 16 ; followup: ↓ 20 Changed 2 years ago by
Replying to asteele:
We redid the Newton iteration in _frob_gen() because that was easier (for me) than finding & adapting an existing method. The only method I see is .hensel_lift(p,e) in sage.rings.polynomial.polynomial_element.Polynomial, and I don't see the advantage of adapting that.
I wanted to say that you redid Newton iteration in _frob_gen() and in frobenius_unram(). Is there a reason for that? (In fact, I'm wondering whether the auxiliary function _frob_gen() is really necessary.)
As for the Horner scheme: our experiments suggested that the expensive operation was multiplication (in large-degree unramified extensions). For example, in R.<a> = Zq(37^100, 100), addition of two random elements takes ~80 microseconds, while multiplication takes ~80 milliseconds. My naive guess is that caching the powers of frob_a in the dictionary and paying the cost of addition will be much faster than paying the repeated cost of multiplication. In any case, storing the powers in a dict or computing the polynomials via Horner is much faster than the previous method of computing Frobenius.
I don't understand why the Horner method would be more costly in terms of multiplications. On the contrary, I think it is (a bit) less costly.
comment:19 Changed 2 years ago by
 Commit changed from 3a03a51b0875bc8d72e3f51b787245938e464493 to 62cec73f79f300e700cc683b9a2486d796777905
comment:20 in reply to: ↑ 18 Changed 2 years ago by
Aha, thanks for catching that! That was a mistake, and has been fixed.
Replying to caruso:
Replying to asteele:
We redid the Newton iteration in _frob_gen() because that was easier (for me) than finding & adapting an existing method. The only method I see is .hensel_lift(p,e) in sage.rings.polynomial.polynomial_element.Polynomial, and I don't see the advantage of adapting that.
I wanted to say that you redid Newton iteration in _frob_gen() and in frobenius_unram(). Is there a reason for that?
I just tested the Horner method and it is roughly 10 times slower in Zq(5^100, 100). It seems that multiplying elements of the extension by elements of Q_p is very fast, but multiplying elements of the extension together is generally slow. Caching the expensive multiplications (even if it means performing more multiplications overall) appears to be a good strategy in very high degree.
As for the Horner scheme: our experiments suggested that the expensive operation was multiplication (in large-degree unramified extensions). For example, in R.<a> = Zq(37^100, 100), addition of two random elements takes ~80 microseconds, while multiplication takes ~80 milliseconds. My naive guess is that caching the powers of frob_a in the dictionary and paying the cost of addition will be much faster than paying the repeated cost of multiplication. In any case, storing the powers in a dict or computing the polynomials via Horner is much faster than the previous method of computing Frobenius.
I don't understand why the Horner method would be more costly in terms of multiplications. On the contrary, I think it is (a bit) less costly.
comment:21 followup: ↓ 24 Changed 2 years ago by
Let me disagree :).
On my laptop, with your current implementation, it takes about 1s to compute the Frobenius of a random element over Zq(5^100, 100) (without taking into account the computation of the Frobenius of the generator), while it takes less than 0.1s using the Horner scheme (cf. implementation attached). It is thus about 10 times faster.
By the way, as far as I understand, an element of Zq is represented as P(a), where a is the generator and P is a polynomial over Z. What we do currently is to write all the coefficients of P in radix p (this is done by the method list), and then, just after, recompose them into actual integers. It would be better to avoid this useless computation and access the coefficients of P directly, but I'm not sure it's easy to do.
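For comparison with the caching approach above, the Horner scheme being discussed can be sketched in plain Python (integers stand in for ring elements; this models the idea, not the attached Sage implementation):

```python
def horner(coefs, x):
    """Evaluate sum(coefs[i] * x**i) with Horner's rule: exactly one
    multiplication by x and one addition per coefficient, working from
    the highest-degree coefficient down."""
    result = 0
    for c in reversed(coefs):
        result = result * x + c
    return result
```

The point made in this comment is that each multiplication here is by x itself, i.e. degree(P) multiplications in total, which is no more than the power-table approach needs.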
comment:22 Changed 2 years ago by
 Branch changed from u/asteele/better_implementation_of_frobenius_automorphism to u/caruso/better_implementation_of_frobenius_automorphism
comment:23 Changed 2 years ago by
 Commit changed from 62cec73f79f300e700cc683b9a2486d796777905 to 8ae9f8aff0648ed4a3a90e137a6640945f208140
Branch pushed to git repo; I updated commit sha1. New commits:
8ae9f8a  Use Horner scheme

comment:24 in reply to: ↑ 21 Changed 2 years ago by
Fantastic! I see your implementation of Horner is much better than what I was doing, and I have reproduced the performance increase on k8s: after _frob_gen() is cached, your scheme computes the Frobenius of random elements of Zq(5^100, 100) in approximately 60 ms, while the previous method was taking 800 ms. Your current implementation is definitely the way to go (and is roughly 1000 times faster than what is currently in Sage).
Replying to caruso:
Let me disagree :).
On my laptop, with your current implementation, it takes about 1s to compute the Frobenius of a random element over Zq(5^100, 100) (without taking into account the computation of the Frobenius of the generator), while it takes less than 0.1s using the Horner scheme (cf. implementation attached). It is thus about 10 times faster.
By the way, as far as I understand, an element of Zq is represented as P(a), where a is the generator and P is a polynomial over Z. What we do currently is to write all the coefficients of P in radix p (this is done by the method list), and then, just after, recompose them into actual integers. It would be better to avoid this useless computation and access the coefficients of P directly, but I'm not sure it's easy to do.
comment:25 Changed 2 years ago by
Maybe the iteration for i in range(len(coefs)): could be replaced by just for ci in coefs:?
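The suggested simplification amounts to iterating directly over the coefficients instead of indexing by position, as in this small illustrative sketch (the surrounding loop body in the branch is not reproduced here):

```python
coefs = [4, 0, 7]

# Index-based iteration (the style being replaced):
total_indexed = 0
for i in range(len(coefs)):
    total_indexed += coefs[i]

# Direct iteration (the suggested, more Pythonic form):
total_direct = 0
for ci in coefs:
    total_direct += ci

assert total_indexed == total_direct
```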
comment:26 Changed 2 years ago by
 Branch changed from u/caruso/better_implementation_of_frobenius_automorphism to u/asteele/better_implementation_of_frobenius_automorphism
comment:27 Changed 2 years ago by
 Commit changed from 8ae9f8aff0648ed4a3a90e137a6640945f208140 to 5c9d33c72a8993cde39ebd9d1ccf774bfb65b07d
I've cleaned up the code a little bit, removing the previous method (which contained the non-Pythonic iteration for i in range(len(coefs))). The code now uses Xavier's implementation of Horner's scheme, passes all doctests and, I think, is ready for review.
New commits:
5c9d33c  remove extraneous code

comment:28 Changed 2 years ago by
 Status changed from needs_review to positive_review
Ok, positive review (though it would be even faster to use the method polynomial of ticket #14825, but it still needs work).
comment:29 Changed 2 years ago by
 Branch changed from u/asteele/better_implementation_of_frobenius_automorphism to 5c9d33c72a8993cde39ebd9d1ccf774bfb65b07d
 Resolution set to fixed
 Status changed from positive_review to closed
The method that needs to be improved is frobenius_unram in src/sage/libs/linkages/padics/unram_shared.pxi. This may want to wait on some more p-adic polynomial developments, but could also be done now.