Opened 2 months ago

Closed 2 months ago

#33961 closed enhancement (fixed)

compute square roots modulo powers of two in polynomial time

Reported by: lorenz Owned by:
Priority: major Milestone: sage-9.7
Component: algebra Keywords:
Cc: Merged in:
Authors: Lorenz Panny Reviewers: Travis Scrimshaw
Report Upstream: N/A Work issues:
Branch: aa06a21 (Commits, GitHub, GitLab) Commit: aa06a211bd923aeb7ebc51b7de088a1e224646b4
Dependencies: Stopgaps:

Status badges

Description

Currently, square_root_mod_prime_power() in integer_mod.pyx contains the following:

    if p == 2:
        if e == 1:
            return a
        # TODO: implement something that isn't totally idiotic.
        for x in a.parent():
            if x**2 == a:
                return x

In this patch, we do so.

Change History (9)

comment:1 Changed 2 months ago by git

  • Commit changed from edf55782390bcb960a054f623788aec766fde19f to 1d1551e9bdfc1fe816e3915dad1abec21864d116

Branch pushed to git repo; I updated commit sha1. This was a forced push. New commits:

1d1551emore efficient algorithm for square roots modulo 2^n

comment:2 Changed 2 months ago by lorenz

  • Status changed from new to needs_review

comment:3 Changed 2 months ago by git

  • Commit changed from 1d1551e9bdfc1fe816e3915dad1abec21864d116 to 0b6382da3b8714b10c48f1ce55ba2ff0f55e7623

Branch pushed to git repo; I updated commit sha1. This was a forced push. New commits:

0b6382dmore efficient algorithm for square roots modulo 2^n

comment:4 Changed 2 months ago by lorenz

  • Summary changed from compute square roots in modulo powers of two in polynomial time to compute square roots modulo powers of two in polynomial time

comment:5 Changed 2 months ago by tscrim

How does this compare against the old algorithm for small values? I am wondering if we want to impose a cutoff and do the dumb way when n is small. (Cf. bubble sort versus more efficient search algorithms for small lists.)

comment:6 Changed 2 months ago by git

  • Commit changed from 0b6382da3b8714b10c48f1ce55ba2ff0f55e7623 to aa06a211bd923aeb7ebc51b7de088a1e224646b4

Branch pushed to git repo; I updated commit sha1. New commits:

aa06a21faster 2-adic lifting for square roots modulo 2^n

comment:7 Changed 2 months ago by lorenz

I revisited the choice of algorithm and it's both simpler and even faster now. :-)

Example:

sage: for e in range(1,19):
....:     set_random_seed(0)
....:     tests = [Zmod(2^e).random_element()^2 for _ in range(1000)]
....:     print(f'{e:<2}', end=' | ', flush=True)
....:     %timeit for y in tests: y.sqrt()

Old:

1  | 95 µs ± 1.32 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
2  | 100 µs ± 1.78 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
3  | 103 µs ± 2.18 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
4  | 113 µs ± 3.67 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
5  | 122 µs ± 369 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
6  | 145 µs ± 948 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
7  | 332 µs ± 1.31 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
8  | 426 µs ± 5.57 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
9  | 669 µs ± 7.18 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
10 | 1.03 ms ± 3.1 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
11 | 1.78 ms ± 7.09 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
12 | 3.26 ms ± 24.4 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
13 | 6.13 ms ± 27.3 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
14 | 486 ms ± 1.6 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
15 | 1.01 s ± 3.67 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
16 | 1.9 s ± 15.4 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
17 | 4.11 s ± 79.3 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
18 | 7.66 s ± 51.3 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

New:

1  | 89.5 µs ± 397 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
2  | 90 µs ± 177 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
3  | 95.2 µs ± 505 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
4  | 103 µs ± 193 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
5  | 120 µs ± 372 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
6  | 141 µs ± 526 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
7  | 338 µs ± 2.17 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
8  | 428 µs ± 3.18 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
9  | 696 µs ± 9 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
10 | 1.03 ms ± 2.95 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
11 | 1.75 ms ± 2.83 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
12 | 3.2 ms ± 18.5 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
13 | 5.98 ms ± 20.3 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
14 | 17.9 ms ± 93.7 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
15 | 18.4 ms ± 107 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
16 | 18.1 ms ± 77.5 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
17 | 18.4 ms ± 62.9 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
18 | 19 ms ± 42.9 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

The reason for the jump at n ≥ 14 is that n ≤ 13 is implemented by the IntegerMod_int specialization, which has its own version of .sqrt() using brute force. Thus, the new code is only used for n ≥ 14 anyway, where it clearly outperforms the old (visibly exponential-time) version.

comment:8 Changed 2 months ago by tscrim

  • Reviewers set to Travis Scrimshaw
  • Status changed from needs_review to positive_review

Thank you. LGTM.

comment:9 Changed 2 months ago by vbraun

  • Branch changed from public/square_roots_modulo_powers_of_2 to aa06a211bd923aeb7ebc51b7de088a1e224646b4
  • Resolution set to fixed
  • Status changed from positive_review to closed
Note: See TracTickets for help on using tickets.